Shams, Ayman (2026) Reinforcement Learning Methods in Continuous Control: Analyzing Strategies for Goal Selection and Experience Replay for Different Robotic Tasks. Masters thesis, Concordia University.
Text (application/pdf), 2MB: Shams_MCompSc_S2026.pdf (Accepted Version). Available under License Spectrum Terms of Access.
Abstract
Surgical robots require highly precise control to assist surgeons during complex and prolonged procedures. Like humans, robots operating in surgical environments must be sample-efficient and able to learn from limited real-world experience. Reinforcement Learning (RL) provides a powerful framework for this, enabling robotic agents to acquire adaptive control strategies from interaction-based feedback rather than explicit programming. However, safety and data constraints allow these agents only minimal interaction with their environment, which makes efficient learning essential.
Inspired by how humans revisit past experiences to learn from mistakes, this work explores methods that improve sample efficiency and adaptability in continuous control tasks. We leverage Hindsight Experience Replay (HER), which relabels failed trajectories as successful attempts toward alternative goals, to accelerate convergence in sparse-reward settings. To further improve learning stability under sparse-reward and safety-constrained conditions, we integrate Truncated Quantile Critics (TQC), a distributional actor–critic method that reduces value overestimation and improves robustness in continuous control. Building on this, we propose a curriculum-based hierarchical reinforcement learning framework using the Options paradigm. The framework organizes learning into stages of increasing complexity, from basic subgoals such as alignment and stabilization to advanced tasks such as dexterous manipulation, with each stage realized as a temporally extended skill that has its own policy and termination condition.
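The HER relabeling step can be sketched as follows. This is a minimal, standard-library illustration, not the thesis implementation: the `Transition` layout, the `her_relabel` and `sparse_reward` names, the 0/-1 reward with tolerance `eps`, and the restriction to the "future" and "final" goal-selection strategies (two of the strategies described in the original HER paper) are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass
class Transition:
    obs: Vec
    action: Vec
    next_obs: Vec
    achieved_goal: Vec  # goal actually reached after this action (assumed field layout)
    desired_goal: Vec   # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Vec, desired: Vec, eps: float = 0.05) -> float:
    """Assumed sparse reward: 0 if within eps of the goal, else -1."""
    return 0.0 if math.dist(achieved, desired) < eps else -1.0


def her_relabel(episode: Sequence[Transition], k: int = 4, strategy: str = "future",
                rng: Optional[random.Random] = None) -> List[Transition]:
    """Return the original transitions plus k goal-relabeled copies of each one."""
    if rng is None:
        rng = random.Random()
    last = len(episode) - 1
    out: List[Transition] = list(episode)  # keep the real (usually failed) experience
    for t, tr in enumerate(episode):
        for _ in range(k):
            if strategy == "future":
                idx = rng.randint(t, last)  # a goal achieved at step t or later
            elif strategy == "final":
                idx = last  # the goal achieved at the end of the episode
            else:
                raise ValueError(f"unknown strategy: {strategy}")
            goal = episode[idx].achieved_goal
            # Recompute the reward as if `goal` had been the target all along.
            out.append(Transition(tr.obs, tr.action, tr.next_obs, tr.achieved_goal,
                                  goal, sparse_reward(tr.achieved_goal, goal)))
    return out


# Toy episode: the object slides along x and never reaches the desired goal (10, 10).
episode = [Transition((float(i),), (1.0,), (float(i + 1),), (float(i + 1), 0.0),
                      (10.0, 10.0), -1.0) for i in range(3)]
buffer = her_relabel(episode, k=1, rng=random.Random(0))
print(len(buffer), buffer[-1].reward)  # 6 0.0
```

Every original transition carries reward -1, but the relabeled copy of the last step is a success, so a sparse-reward learner receives a non-trivial signal from an episode that failed its original goal.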
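The truncation that gives TQC its name can be illustrated on the target side of the critic update. In this sketch (assuming a SAC-style entropy term, with hypothetical names `tqc_target_atoms` and `drop_per_net`), the quantiles predicted by all target critics for the next state-action pair are pooled and sorted, the largest `drop_per_net * n_critics` atoms are discarded, and the Bellman backup is applied to the atoms that remain. In the full method these atoms serve as targets for each critic's quantile Huber loss, which is omitted here.

```python
from typing import List, Sequence


def tqc_target_atoms(next_quantiles: Sequence[Sequence[float]], reward: float, done: bool,
                     gamma: float = 0.99, drop_per_net: int = 2,
                     alpha: float = 0.0, next_log_prob: float = 0.0) -> List[float]:
    """Truncated distributional Bellman target for a single transition.

    next_quantiles[n][m] is quantile m predicted by target critic n at (s', a'),
    with a' sampled from the current policy. alpha * next_log_prob is an assumed
    SAC-style entropy term; alpha=0 drops it.
    """
    n_critics = len(next_quantiles)
    # Pool the atoms of all critics into one sorted mixture distribution.
    pooled = sorted(z for critic in next_quantiles for z in critic)
    # Discard the largest atoms to counter value overestimation.
    keep = len(pooled) - drop_per_net * n_critics
    if keep <= 0:
        raise ValueError("drop_per_net removes every atom")
    not_done = 0.0 if done else 1.0
    return [reward + gamma * not_done * (z - alpha * next_log_prob) for z in pooled[:keep]]


# Two critics with three quantiles each; dropping one atom per critic removes 4.0 and 5.0.
atoms = tqc_target_atoms([[1.0, 2.0, 3.0], [0.0, 4.0, 5.0]], reward=1.0, done=False,
                         gamma=0.5, drop_per_net=1)
print(atoms)  # [1.0, 1.5, 2.0, 2.5]
```

Because only the lower atoms survive, the target is deliberately biased downward, and the number of dropped atoms controls how strongly the overestimation common in actor–critic methods is counteracted.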
Within this hierarchical structure, complex manipulation behaviors are decomposed into meaningful phases (e.g., grasp stabilization, lifting, controlled motion, and precise placement), and a high-level controller learns to sequence these skills efficiently. Our experiments show that combining curriculum learning, HER, and hierarchical Options-based control with distributional critics substantially improves learning stability, convergence speed, and overall success rates compared with standard RL approaches. By uniting goal relabeling, hierarchical control, and structured skill progression, the resulting framework improves data efficiency in high-dimensional robotic systems. This work advances the development of autonomous surgical and assistive robots capable of mastering complex real-world tasks with limited data and high reliability.
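The Options structure described above, in which each skill has its own policy and termination condition and a high-level controller decides which skill runs next, can be sketched with a toy two-phase task. Everything here is hypothetical: the `Option` class, the additive toy dynamics in `step`, and the "align" and "lift" skills. The fixed-order `first_unfinished` chooser only stands in for the learned high-level controller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = Dict[str, float]
Action = Dict[str, float]


@dataclass
class Option:
    """A temporally extended skill: an intra-option policy plus a termination condition."""
    name: str
    policy: Callable[[State], Action]
    terminate: Callable[[State], bool]  # deterministic termination condition (assumption)


def first_unfinished(options: Sequence[Option], state: State) -> Optional[Option]:
    """Stand-in for a learned high-level controller: pick skills in fixed curriculum order."""
    for opt in options:
        if not opt.terminate(state):
            return opt
    return None


def step(state: State, action: Action) -> State:
    """Hypothetical toy dynamics: add each action delta to the matching state variable."""
    return {key: value + action.get(key, 0.0) for key, value in state.items()}


def run_options(state: State, options: Sequence[Option],
                max_steps: int = 100) -> Tuple[State, List[str], int]:
    """Execute options until all have terminated or the step budget runs out."""
    trace: List[str] = []
    steps = 0
    while steps < max_steps:
        opt = first_unfinished(options, state)
        if opt is None:
            break  # every skill's termination condition already holds
        trace.append(opt.name)
        # Commit to the chosen skill until its own termination condition fires.
        while not opt.terminate(state) and steps < max_steps:
            state = step(state, opt.policy(state))
            steps += 1
    return state, trace, steps


align = Option("align", lambda s: {"x": 0.1 if s["x"] < 0.5 else -0.1},
               lambda s: abs(s["x"] - 0.5) < 0.05)
lift = Option("lift", lambda s: {"z": 0.1}, lambda s: s["z"] >= 0.3)
final_state, trace, n_steps = run_options({"x": 0.0, "z": 0.0}, [align, lift])
print(trace, n_steps)  # ['align', 'lift'] 8
```

The key design point is temporal abstraction: the controller makes one decision per skill rather than one per time step, and control returns to it only when the active skill's termination condition is met.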
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Shams, Ayman |
| Institution: | Concordia University |
| Degree Name: | M. Comp. Sc. |
| Program: | Computer Science |
| Date: | 13 March 2026 |
| Thesis Supervisor(s): | Fevens, Thomas |
| ID Code: | 996817 |
| Deposited By: | Ayman Shams |
| Deposited On: | 29 Jun 2026 14:59 |
| Last Modified: | 29 Jun 2026 14:59 |