This thesis focuses on the application of reinforcement learning (RL) techniques to design optimal controllers and observers for linear time-invariant (LTI) systems, namely the linear quadratic regulator (LQR), the linear quadratic tracker (LQT), and the linear quadratic estimator (LQE), using measured data. The closed-form solution and wide-ranging engineering applications of linear quadratic (LQ) problems have made them a preferred benchmark for assessing RL algorithms. The primary contribution lies in the introduction of novel policy iteration (PI) methods in which the value-function approximator (VFA) is a two-layer quadratic neural network (QNN) trained through convex optimization. To the best of our knowledge, this is the first time a convex-optimization-trained QNN is employed as the VFA. The main advantage is that the QNN's input-output mapping has an analytical expression as a quadratic form, which can then be used to obtain an analytical linear expression for policy improvement. This is in stark contrast to available techniques that must train a second neural network to obtain the policy improvement. Owing to the quadratic input-output mapping of the QNN and the quadratic form of the value function in LQ problems, the QNN is a suitable VFA candidate. The thesis designs the LQR and LQT without requiring the system model, and designs the LQE correction term provided that the system model is given. The thesis establishes convergence of the learning algorithm to the LQ solution, provided one starts from a stabilizing policy. To assess the proposed approach, extensive simulations are conducted in MATLAB, demonstrating the effectiveness of the developed method. Furthermore, the proposed observer is designed for a nonlinear pendulum with a given linearized model and is shown to improve on an observer that uses only the linearized model, demonstrating the method's adaptability to nonlinear systems.
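For context, a minimal sketch of the standard LQ action-value (Q-function) structure that motivates this choice of VFA is given below; the block partition of the matrix $H$ into $H_{xx}$, $H_{xu}$, $H_{ux}$, $H_{uu}$ is the conventional notation for this structure and is not necessarily the exact parameterization used in the thesis:
\[
Q(x,u) \;=\;
\begin{bmatrix} x \\ u \end{bmatrix}^{\top}
\begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix},
\qquad
u^{+} \;=\; \arg\min_{u} Q(x,u) \;=\; -\,H_{uu}^{-1} H_{ux}\, x .
\]
Because the improved policy follows in closed form from the quadratic coefficients, no second neural network is needed for the policy-improvement step.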