
Linear quadratic control using reinforcement learning and quadratic neural networks

Asri, Soroush (2023) Linear quadratic control using reinforcement learning and quadratic neural networks. Masters thesis, Concordia University.

Text (application/pdf): Asri_MASc_S2024.pdf - Accepted Version. Available under License Spectrum Terms of Access. 755kB.

Abstract

This thesis focuses on the application of reinforcement learning (RL) techniques to the design of optimal controllers and observers for linear time-invariant (LTI) systems, namely the linear quadratic regulator (LQR), the linear quadratic tracker (LQT), and the linear quadratic estimator (LQE), using measured data. The closed-form solution and wide-ranging engineering applications of linear quadratic (LQ) problems have made them a preferred benchmark for assessing RL algorithms. The primary contribution is the introduction of novel policy iteration (PI) methods in which the value-function approximator (VFA) is a two-layer quadratic neural network (QNN) trained through convex optimization. To the best of our knowledge, this is the first time that a QNN trained by convex optimization is employed as the VFA. The main advantage is that the QNN’s input-output mapping has an analytical expression as a quadratic form, which can then be used to obtain an analytical linear expression for the policy improvement step. This is in stark contrast to available techniques that must train a second neural network to obtain the improved policy. Because the QNN’s input-output mapping is quadratic and the value function of LQ problems is a quadratic form, the QNN is a suitable VFA candidate. The thesis designs the LQR and LQT without requiring the system model, and designs the LQE correction term provided that the system model is given. The thesis also establishes convergence of the learning algorithm to the LQ solution provided one starts from a stabilizing policy. To assess the proposed approach, extensive simulations are conducted in MATLAB, demonstrating the effectiveness of the developed method. Furthermore, the proposed observer is designed for a nonlinear pendulum with a given linearized model, and it is shown that this observer outperforms one based only on the linearized model, demonstrating the adaptability of the approach to nonlinear systems.
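As an illustration of the policy-iteration structure described in the abstract (not of the thesis's QNN-based method), the following minimal Python sketch fits the quadratic Q-function of a discrete-time LQR policy from sampled data by ordinary least squares, standing in for the convex-optimization-trained QNN, and then applies the analytic policy-improvement step that follows from the quadratic form. All system matrices, cost weights, sample sizes, and the initial stabilizing gain below are illustrative assumptions.

import numpy as np

def quad_features(z):
    # Quadratic monomials z_i * z_j for i <= j; a quadratic form z^T S z is linear in them.
    idx = np.triu_indices(len(z))
    return np.outer(z, z)[idx]

def evaluate_policy(X, U, Xn, K, Q, R):
    # Least-squares fit of the quadratic Q-function of the policy u = -K x from data
    # (a simple placeholder for the convex-optimization-trained QNN of the thesis).
    nz = X.shape[1] + U.shape[1]
    Phi, y = [], []
    for x, u, xn in zip(X, U, Xn):
        z = np.concatenate([x, u])
        zn = np.concatenate([xn, -K @ xn])            # next action follows the current policy
        Phi.append(quad_features(z) - quad_features(zn))
        y.append(x @ Q @ x + u @ R @ u)               # one-step quadratic cost
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((nz, nz))
    H[np.triu_indices(nz)] = h
    return (H + H.T) / 2                              # recover the symmetric quadratic form

def improve_policy(H, n):
    # Analytic policy improvement from the quadratic form: u = -(Huu^-1 Hux) x.
    Hux, Huu = H[n:, :n], H[n:, n:]
    return np.linalg.solve(Huu, Hux)

# Illustrative second-order example; all numbers below are assumptions.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[1.0, 1.0]])                            # assumed initial stabilizing gain

for _ in range(10):                                   # policy-iteration loop
    X = rng.standard_normal((200, 2))                 # sampled states
    U = X @ (-K.T) + 0.1 * rng.standard_normal((200, 1))   # policy plus exploration noise
    Xn = X @ A.T + U @ B.T                            # next states (measured data would replace this)
    H = evaluate_policy(X, U, Xn, K, Q, R)
    K = improve_policy(H, n=2)

print("learned state-feedback gain K:\n", K)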

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (Masters)
Authors: Asri, Soroush
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Electrical and Computer Engineering
Date: 15 November 2023
Thesis Supervisor(s): Rodrigues, Luis
ID Code: 993308
Deposited By: Soroush Asri
Deposited On: 05 Jun 2024 15:17
Last Modified: 05 Jun 2024 15:17

