Fathi, Vida (2021) Reinforcement Learning in Linear Quadratic Deep Structured Teams with Performance Guarantee. Masters thesis, Concordia University.
Text (application/pdf): Fathi_MASc_S2021 (1).pdf (1MB), Accepted Version. Restricted to Repository staff only. Available under License Spectrum Terms of Access.
Abstract
In this thesis, the global convergence of model-based and model-free gradient descent and natural policy gradient descent algorithms is studied for a class of linear quadratic deep structured teams. In such systems, agents are partitioned into a few sub-populations wherein the agents in each sub-population are coupled in the dynamics and cost function through a set of linear regressions of the states and actions of all agents. Every agent observes its local state and the linear regressions of the states, called deep states. For a sufficiently small risk factor and/or a sufficiently large population, we prove that model-based policy gradient methods globally converge to the optimal solution. For an arbitrary number of agents, we develop model-free policy gradient and natural policy gradient algorithms for the special case of a risk-neutral cost function. The proposed algorithms are scalable with respect to the number of agents because the dimension of their policy space is independent of the number of agents in each sub-population. Furthermore, the connection between the model-based and model-free methods is investigated for systems with unknown nonlinear terms that have bounded Lipschitz constants. As an extension, the existence of a near-optimal solution in the convex vicinity of the initial controllers obtained from model-based LQR methods is proved. We show that these initial control strategies are derived by solving an algebraic Riccati equation (ARE) obtained by neglecting the nonlinear terms. Finally, we provide convergence guarantees to the optimal solution using a derivative-free policy gradient approach. Simulations confirm the validity of the analytical results.
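To illustrate the last two steps of the abstract, the sketch below (not the thesis code) initializes a controller from an algebraic Riccati equation and then refines it with a derivative-free (zeroth-order) policy gradient update for a single-agent LQR surrogate. The matrices, horizon, smoothing radius, and step size are illustrative assumptions, not values from the thesis.

```python
# Minimal sketch, assuming an illustrative discrete-time LQR surrogate:
# x_{t+1} = A x_t + B u_t, per-stage cost x'Qx + u'Ru, policy u_t = -K x_t.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Illustrative (assumed) model matrices.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.5]])

def lqr_gain_from_are(A, B, Q, R):
    """Model-based initialization: gain from the ARE (nonlinear terms neglected)."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def rollout_cost(K, horizon=50, n_init=20):
    """Average finite-horizon cost of u_t = -K x_t over random initial states."""
    total = 0.0
    for _ in range(n_init):
        x = rng.standard_normal((2, 1))
        for _ in range(horizon):
            u = -K @ x
            total += float(x.T @ Q @ x + u.T @ R @ u)
            x = A @ x + B @ u
    return total / n_init

def zeroth_order_pg(K, iters=50, radius=0.05, step=1e-4, n_samples=20):
    """Derivative-free policy gradient: estimate the gradient from cost
    differences under random perturbations of K on a sphere, then step."""
    d = K.size
    for _ in range(iters):
        grad = np.zeros_like(K)
        for _ in range(n_samples):
            U = rng.standard_normal(K.shape)
            U *= radius / np.linalg.norm(U)      # random direction of norm `radius`
            grad += (rollout_cost(K + U) - rollout_cost(K - U)) * U
        grad *= d / (2.0 * n_samples * radius**2)  # two-point smoothed estimator
        K = K - step * grad
    return K

K0 = lqr_gain_from_are(A, B, Q, R)     # controller initialized from the ARE
K_refined = zeroth_order_pg(K0)        # model-free, derivative-free refinement
print("initial cost:", rollout_cost(K0), "refined cost:", rollout_cost(K_refined))
```

The two-point estimator above is one standard derivative-free gradient approximation; the deep-structured-team setting additionally exploits the fact that the gain acts on the local state and the deep state, so its dimension does not grow with the number of agents.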
Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (Masters)
Authors: Fathi, Vida
Institution: Concordia University
Degree Name: M.A.Sc.
Program: Electrical and Computer Engineering
Date: 1 March 2021
Thesis Supervisor(s): Aghdam, Amir
ID Code: 988088
Deposited By: Vida Fathi
Deposited On: 29 Jun 2021 22:35
Last Modified: 29 Jun 2021 22:35