[1] H. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control and Optimization, vol. 6, pp. 131–147, 1968.
[2] M. Hüttenrauch, A. Šošić, and G. Neumann, "Guided deep reinforcement learning for swarm systems," arXiv preprint arXiv:1709.06011, 2017.
[3] J. A. Calvo and I. Dusparic, "Heterogeneous multi-agent deep reinforcement learning for traffic lights control," in Proceedings of AICS, 2018, pp. 2–13.
[4] J. Arabneydi and A. G. Aghdam, "Deep teams: Decentralized decision making with finite and infinite number of agents," IEEE Transactions on Automatic Control, DOI: 10.1109/TAC.2020.2966035, 2020.
[5] J. Arabneydi, A. G. Aghdam, and R. P. Malhamé, "Explicit sequential equilibria in LQ deep structured games and weighted mean-field games," conditionally accepted in Automatica, 2020.
[6] J. Arabneydi and A. G. Aghdam, "Deep structured teams with linear quadratic model: Partial equivariance and gauge transformation," [Online]. Available: https://arxiv.org/abs/1912.03951, 2019.
[7] ——, "Deep structured teams and games with Markov-chain model: Finite and infinite number of players," submitted, 2019.
[8] J. Arabneydi, M. Roudneshin, and A. G. Aghdam, "Reinforcement learning in deep structured teams: Initial results with finite and infinite valued features," in Proceedings of the IEEE Conference on Control Technology and Applications, 2020.
[9] M. Roudneshin, J. Arabneydi, and A. G. Aghdam, "Reinforcement learning in nonzero-sum linear quadratic deep structured games: Global convergence of policy optimization," in Proceedings of the 59th IEEE Conference on Decision and Control, 2020.
[10] J. Arabneydi, "New concepts in team theory: Mean field teams and reinforcement learning," Ph.D. dissertation, Department of Electrical and Computer Engineering, McGill University, Montreal, Canada, 2016.
[11] J. Arabneydi and A. Mahajan, "Linear quadratic mean field teams: Optimal and approximately optimal decentralized solutions," [Online]. Available: https://arxiv.org/abs/1609.00056, 2016.
[12] ——, "Team-optimal solution of finite number of mean-field coupled LQG subsystems," in Proceedings of the 54th IEEE Conference on Decision and Control, 2015, pp. 5308–5313.
[13] M. Baharloo, J. Arabneydi, and A. G. Aghdam, "Near-optimal control strategy in leader-follower networks: A case study for linear quadratic mean-field teams," in Proceedings of the 57th IEEE Conference on Decision and Control, 2018, pp. 3288–3293.
[14] ——, "Minmax mean-field team approach for a leader-follower network: A saddle-point strategy," IEEE Control Systems Letters, vol. 4, no. 1, pp. 121–126, 2019.
[15] J. Arabneydi, M. Baharloo, and A. G. Aghdam, "Optimal distributed control for leader-follower networks: A scalable design," in Proceedings of the 31st IEEE Canadian Conference on Electrical and Computer Engineering, 2018, pp. 1–4.
[16] J. Arabneydi and A. G. Aghdam, "Optimal dynamic pricing for binary demands in smart grids: A fair and privacy-preserving strategy," in Proceedings of the American Control Conference, 2018, pp. 5368–5373.
[17] G. Qu, C. Yu, S. Low, and A. Wierman, "Combining model-based and model-free methods for nonlinear control: A provably convergent policy gradient approach," arXiv preprint arXiv:2006.07476, 2020.
[18] V. Fathi, J. Arabneydi, and A. G. Aghdam, "Reinforcement learning in linear quadratic deep structured teams: Global convergence of policy gradient methods," arXiv preprint arXiv:2011.14393, 2020.
[19] M. Fazel, R. Ge, S. M. Kakade, and M. Mesbahi, "Global convergence of policy gradient methods for the linear quadratic regulator," arXiv preprint arXiv:1801.05039, 2018.
[20] Y. Luo, Z. Yang, Z. Wang, and M. Kolar, "Natural actor-critic converges globally for hierarchical linear quadratic regulator," arXiv preprint arXiv:1912.06875, 2019.
[21] L. Lewis, Z. Yang, L. Yuchen, and Z. Wang, "Decentralized policy gradient method for mean-field linear quadratic regulator with global convergence," in Proceedings of the International Conference on Machine Learning (ICML), 2020.
[22] K. Zhang, B. Hu, and T. Başar, "Policy optimization for H2 linear control with H∞ robustness guarantee: Implicit regularization and global convergence," arXiv preprint arXiv:1910.09496, 2019.
[23] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, "Regret bounds for robust adaptive control of the linear quadratic regulator," in Advances in Neural Information Processing Systems, 2018, pp. 4188–4197.
[24] B. T. Polyak, "Gradient methods for the minimisation of functionals," USSR Computational Mathematics and Mathematical Physics, vol. 3, no. 4, pp. 864–878, 1963.
[25] D. Malik, A. Pananjady, K. Bhatia, K. Khamaru, P. L. Bartlett, and M. J. Wainwright, "Derivative-free methods for policy optimization: Guarantees for linear quadratic systems," Journal of Machine Learning Research, vol. 21, no. 21, pp. 1–51, 2020.
[26] A. D. Flaxman, A. T. Kalai, and H. B. McMahan, "Online convex optimization in the bandit setting: Gradient descent without a gradient," in Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2005, pp. 385–394.
[27] Z. Yang, Y. Chen, M. Hong, and Z. Wang, "Provably global convergence of actor-critic: A case for linear quadratic regulator with ergodic cost," in Advances in Neural Information Processing Systems, 2019, pp. 8353–8365.
[28] B. Liu, L. Wang, and M. Liu, "Lifelong federated reinforcement learning: A learning architecture for navigation in cloud robotic systems," IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4555–4562, 2019.
[29] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, "Deep reinforcement learning framework for autonomous driving," Electronic Imaging, vol. 2017, no. 19, pp. 70–76, 2017.
[30] J. R. Vázquez-Canteli and Z. Nagy, "Reinforcement learning for demand response: A review of algorithms and modeling techniques," Applied Energy, vol. 235, pp. 1072–1089, 2019.
[31] F. Gama and S. Sojoudi, "Graph neural networks for decentralized linear-quadratic control," arXiv preprint arXiv:2011.05360, 2020.
[32] D. Jacobson, "Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games," IEEE Transactions on Automatic Control, vol. 18, no. 2, pp. 124–131, 1973.
[33] P. Whittle, "Risk-sensitive linear/quadratic/Gaussian control," Advances in Applied Probability, vol. 13, no. 4, pp. 764–777, 1981.
[34] T. Başar and P. Bernhard, H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach. Birkhäuser Basel, 2008.
[35] M. Jin and J. Lavaei, "Stability-certified reinforcement learning: A control-theoretic perspective," IEEE Access, 2020.