Reinforcement Learning Methods in Continuous Control: Analyzing Strategies for Goal Selection and Experience Replay for Different Robotic Tasks

Shams, Ayman (2026) Reinforcement Learning Methods in Continuous Control: Analyzing Strategies for Goal Selection and Experience Replay for Different Robotic Tasks. Masters thesis, Concordia University.

Text (application/pdf)
Shams_MCompSc_S2026.pdf - Accepted Version
Available under License Spectrum Terms of Access.
2MB

Abstract

Surgical robots require highly precise control to assist doctors during complex and prolonged procedures. Like human learners, robots in surgical environments must be sample-efficient, learning from limited real-world experience. Reinforcement Learning (RL) provides a framework for this, enabling robotic agents to acquire adaptive control strategies through interaction-based feedback rather than explicit programming. However, safety and data constraints permit these agents only minimal interaction with their environment, making efficient learning essential.

Inspired by how humans revisit past experiences to learn from mistakes, this work explores methods that improve sample efficiency and adaptability in continuous control tasks. We use Hindsight Experience Replay (HER) to relabel failed trajectories as successful attempts toward alternative goals, which accelerates convergence in sparse-reward settings. To further improve learning stability under sparse and safety-constrained conditions, we integrate Truncated Quantile Critics (TQC), a distributional actor–critic method that reduces value overestimation and improves robustness in continuous control. Building on this, we propose a curriculum-based hierarchical reinforcement learning framework using the Options paradigm. The framework organizes learning into progressively complex stages, from basic subgoals such as alignment and stabilization to advanced tasks like dexterous manipulation, each realized as a temporally extended skill with a dedicated policy and termination condition.
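
As a rough illustration of the two replay-side ideas named above, the following minimal sketch shows HER goal relabeling (using the common "final" strategy, in which the goal is replaced by the state actually reached at the end of the episode) and the truncation step of TQC (pooling the quantile atoms of all critics and discarding the largest ones). It is not taken from the thesis: the `Transition` fields, the tolerance `tol`, the reward convention (0 on success, -1 otherwise), and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from math import dist
from typing import List, Sequence, Tuple

Goal = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """Hypothetical goal-conditioned transition (fields are illustrative)."""
    achieved_goal: Goal   # goal-space position reached after the step
    desired_goal: Goal    # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0 within tolerance of the goal, -1 otherwise."""
    return 0.0 if dist(achieved, goal) <= tol else -1.0


def her_relabel_final(episode: Sequence[Transition], tol: float = 0.05) -> List[Transition]:
    """Relabel every transition with the goal actually achieved at episode end."""
    new_goal = episode[-1].achieved_goal
    return [
        replace(t, desired_goal=new_goal,
                reward=sparse_reward(t.achieved_goal, new_goal, tol))
        for t in episode
    ]


def truncate_pooled_quantiles(critic_quantiles: Sequence[Sequence[float]],
                              drop_per_critic: int) -> List[float]:
    """Pool all critics' quantile atoms, sort, and drop the largest ones.

    With N critics, the top drop_per_critic * N atoms are discarded, which
    lowers the value estimate and counteracts overestimation.
    """
    pooled = sorted(q for critic in critic_quantiles for q in critic)
    keep = len(pooled) - drop_per_critic * len(critic_quantiles)
    if keep <= 0:
        raise ValueError("truncation would discard every atom")
    return pooled[:keep]


# A failed episode: the goal (2, 0) is never reached.
failed = [
    Transition((0.0, 0.0), (2.0, 0.0), -1.0),
    Transition((0.5, 0.0), (2.0, 0.0), -1.0),
    Transition((1.0, 0.0), (2.0, 0.0), -1.0),
]
relabeled = her_relabel_final(failed)
truncated = truncate_pooled_quantiles([[1.0, 4.0], [2.0, 3.0]], drop_per_critic=1)
```

After relabeling, the last transition counts as a success for the substituted goal, so the replay buffer contains a non-trivial reward signal even though the original task failed. In a full TQC update the kept atoms would additionally be shifted by the reward, discount, and entropy term, which this sketch omits.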

Within this hierarchical structure, complex manipulation behaviors are decomposed into meaningful phases (e.g., grasp stabilization, lifting, controlled motion, and precise placement), while a high-level controller learns to sequence these skills efficiently. Our experiments show that integrating curriculum learning with HER and hierarchical Options-based control, together with distributional critics, substantially improves learning stability, convergence speed, and overall success rates compared to standard RL approaches. The resulting framework combines goal relabeling, hierarchical control, and structured skill progression to enhance data efficiency in high-dimensional robotic systems. This work advances the development of autonomous surgical and assistive robots capable of mastering complex real-world tasks with limited data and high reliability.
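
The decomposition into phases described above can be pictured with a small sketch of the Options idea: each skill bundles a policy with a termination condition, and a higher-level loop runs the skills in order. This is a toy under stated assumptions, not the thesis's framework: the state is a single gripper height, the dynamics are simple addition, and the skill sequence is fixed by hand, whereas the abstract describes a high-level controller that learns the sequencing. All names (`Option`, `run_options`, `lift`, `place`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Option:
    """A temporally extended skill: a policy plus a termination condition."""
    name: str
    policy: Callable[[float], float]       # state -> action (height change)
    terminates: Callable[[float], bool]    # state -> True once the skill is done


def run_options(state: float,
                options: Sequence[Option],
                max_steps_per_option: int = 100) -> Tuple[float, List[Tuple[str, int]]]:
    """Execute each option until it terminates; return final state and a trace.

    The trace records (option name, primitive steps taken) per option.
    """
    trace: List[Tuple[str, int]] = []
    for opt in options:
        steps = 0
        while not opt.terminates(state) and steps < max_steps_per_option:
            state = state + opt.policy(state)   # toy dynamics: action is added to height
            steps += 1
        trace.append((opt.name, steps))
    return state, trace


# Two illustrative skills acting on a 1-D gripper height.
lift = Option("lift", policy=lambda h: 0.25, terminates=lambda h: h >= 1.0)
place = Option("place", policy=lambda h: -0.5, terminates=lambda h: h <= 0.0)

final_height, trace = run_options(0.0, [lift, place])
```

Here `lift` runs for four primitive steps and `place` for two before their termination conditions fire. Replacing the fixed list with a learned policy over options would give the high-level controller its role in choosing which skill to run next.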

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Shams, Ayman
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 13 March 2026
Thesis Supervisor(s): Fevens, Thomas
ID Code: 996817
Deposited By: Ayman Shams
Deposited On: 29 Jun 2026 14:59
Last Modified: 29 Jun 2026 14:59
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
