[1] A. E. Bryson and Y.-C. Ho, Applied Optimal Control: Optimization, Estimation, and Control. Routledge, 2018.
[2] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.
[3] M. Gan, J. Zhao, and C. Zhang, “Extended adaptive optimal control of linear systems with unknown dynamics using adaptive dynamic programming,” Asian Journal of Control, vol. 23, no. 2, pp. 1097–1106, 2021.
[4] Y. Jiang and Z.-P. Jiang, “Global adaptive dynamic programming for continuous-time nonlinear systems,” IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2917–2929, 2015.
[5] D. Bertsekas, Reinforcement Learning and Optimal Control. Athena Scientific, 2019.
[6] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[8] C. Szepesvári, Algorithms for Reinforcement Learning. Springer Nature, 2022.
[9] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, 2017.
[10] N. Matni, A. Proutiere, A. Rantzer, and S. Tu, “From self-tuning regulators to reinforcement learning and back again,” in 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 3724–3740.
[11] F. A. Yaghmaie, F. Gustafsson, and L. Ljung, “Linear quadratic control using model-free reinforcement learning,” IEEE Transactions on Automatic Control, 2022.
[12] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, “Benchmarking deep reinforcement learning for continuous control,” in International Conference on Machine Learning. PMLR, 2016, pp. 1329–1338.
[13] S. G. Khan, G. Herrmann, F. L. Lewis, T. Pipe, and C. Melhuish, “Reinforcement learning and optimal adaptive control: An overview and implementation examples,” Annual Reviews in Control, vol. 36, no. 1, pp. 42–59, 2012.
[14] D. Zhao, Z. Xia, and D. Wang, “Model-free optimal control for affine nonlinear systems with convergence analysis,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 4, pp. 1461–1468, 2014.
[15] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3133–3174, 2019.
[16] L. Rodrigues and S. Givigi, “Analysis and design of quadratic neural networks for regression, classification, and Lyapunov control of dynamical systems,” arXiv preprint arXiv:2207.13120, 2022.
[17] B. Bartan and M. Pilanci, “Neural spectrahedra and semidefinite lifts: Global convex optimization of polynomial activation neural networks in fully polynomial-time,” arXiv preprint arXiv:2101.02429, 2021.
[18] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
[19] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Courier Corporation, 2007.
[20] V. Sima, Algorithms for Linear-Quadratic Optimization. CRC Press, 2021.
[21] R. E. Kalman et al., “Contributions to the theory of optimal control,” Bol. Soc. Mat. Mexicana, vol. 5, no. 2, pp. 102–119, 1960.
[22] J. C. Doyle, “Guaranteed margins for LQG regulators,” IEEE Transactions on Automatic Control, vol. 23, no. 4, pp. 756–757, 1978.
[23] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[24] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Transactions on Automatic Control, vol. 13, no. 1, pp. 114–115, 1968.
[25] D. Tailor and D. Izzo, “Learning the optimal state-feedback via supervised imitation learning,” Astrodynamics, vol. 3, pp. 361–374, 2019.
[26] J. L. Proctor, S. L. Brunton, and J. N. Kutz, “Dynamic mode decomposition with control,” SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142–161, 2016.
[27] J. Boedecker, J. T. Springenberg, J. Wülfing, and M. Riedmiller, “Approximate real-time optimal control based on sparse Gaussian process models,” in 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2014, pp. 1–8.
[28] D. Zeidler, S. Frey, K.-L. Kompa, and M. Motzkus, “Evolutionary algorithms and their application to optimal control studies,” Physical Review A, vol. 64, no. 2, p. 023420, 2001.
[29] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[30] A. S. Polydoros and L. Nalpantidis, “Survey of model-based reinforcement learning: Applications on robotics,” Journal of Intelligent & Robotic Systems, vol. 86, no. 2, pp. 153–173, 2017.
[31] R. Tedrake, “Underactuated robotics: Algorithms for walking, running, swimming, flying, and manipulation,” Course Notes for MIT, vol. 6, 2016.
[32] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. John Wiley & Sons, 2013.
[33] H. Liu, B. Kiumarsi, Y. Kartal, A. Taha Koru, H. Modares, and F. L. Lewis, “Reinforcement learning applications in unmanned vehicle control: A comprehensive overview,” Unmanned Systems, vol. 11, no. 1, pp. 17–26, 2023.
[34] V. Pong, S. Gu, M. Dalal, and S. Levine, “Temporal difference models: Model-free deep RL for model-based control,” arXiv preprint arXiv:1802.09081, 2018.
[35] D. Liu, S. Xue, B. Zhao, B. Luo, and Q. Wei, “Adaptive dynamic programming for control: A survey and recent advances,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 1, pp. 142–160, 2020.
[36] D. V. Prokhorov and D. C. Wunsch, “Adaptive critic designs,” IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 997–1007, 1997.
[37] B. Pang and Z.-P. Jiang, “Robust reinforcement learning: A case study in linear quadratic regulation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, 2021, pp. 9303–9311.
[38] S. J. Bradtke, B. E. Ydstie, and A. G. Barto, “Adaptive linear quadratic control using policy iteration,” in Proceedings of 1994 American Control Conference (ACC’94), vol. 3. IEEE, 1994, pp. 3475–3479.
[39] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 41, no. 1, pp. 14–25, 2010.
[40] S. A. A. Rizvi and Z. Lin, “Experience replay-based output feedback Q-learning scheme for optimal output tracking control of discrete-time linear systems,” International Journal of Adaptive Control and Signal Processing, vol. 33, no. 12, pp. 1825–1842, 2019.
[41] B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, and M.-B. Naghibi-Sistani, “Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics,” Automatica, vol. 50, no. 4, pp. 1167–1175, 2014.
[42] B. Kiumarsi, F. L. Lewis, M.-B. Naghibi-Sistani, and A. Karimpour, “Optimal tracking control of unknown discrete-time linear systems using input-output measured data,” IEEE Transactions on Cybernetics, vol. 45, no. 12, pp. 2770–2779, 2015.
[43] J. Na, G. Herrmann, and K. G. Vamvoudakis, “Adaptive optimal observer design via approximate dynamic programming,” in 2017 American Control Conference (ACC). IEEE, 2017, pp. 3288–3293.
[44] J. Li, Z. Xiao, P. Li, and Z. Ding, “Networked controller and observer design of discrete-time systems with inaccurate model parameters,” ISA Transactions, vol. 98, pp. 75–86, 2020.
[45] L. Rodrigues and S. Givigi, “System identification and control using quadratic neural networks,” IEEE Control Systems Letters, vol. 7, pp. 2209–2214, 2023.
[46] H. Kwakernaak and R. Sivan, Linear Optimal Control Systems. Wiley-Interscience, 1969, vol. 1072.
[47] C. J. C. H. Watkins, “Learning from delayed rewards,” Ph.D. dissertation, King’s College, Cambridge, 1989.
[48] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, pp. 279–292, 1992.
[49] R. Kalman, P. L. Falb, and M. A. Arbib, “Controllability and observability for linear systems,” IEEE Transactions on Automatic Control, vol. 9, no. 3, pp. 291–292, 1964.
[50] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press, 1996.
[51] W. L. Root and H. W. Lee, “A Riccati equation arising in stochastic control,” SIAM Journal on Control, vol. 8, no. 4, pp. 401–414, 1970.
[52] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Dover Publications, 1990.
[53] K. Ogata, Modern Control Engineering, 5th ed. Prentice Hall, 2010.