This Ph.D. dissertation studies the control of multi-agent reinforcement learning (MARL) and multi-agent deep reinforcement learning (MADRL) systems under adversarial attacks. Several attacks are investigated, and defence (mitigation) algorithms are proposed to support consensus control and reliable data transmission.

First, we studied the consensus problem of a leaderless, homogeneous MARL system using actor-critic algorithms, with and without malicious agents. To improve the system's performance, we considered several distance-based immediate reward functions. We proposed four immediate reward functions based on the Euclidean, n-norm, and Chebyshev distances, and rigorously established which reward function performs best in terms of the cumulative reward of each agent and of the entire team. These claims were proven theoretically, and simulations confirmed the theoretical findings.

Next, we examined whether modifying the malicious agent's neural network (NN) structure, together with a compatible combination of the mean squared error (MSE) loss function and the sigmoid activation function, can mitigate the destructive effect of the malicious agent on the performance of the leaderless, homogeneous MARL system. In addition to the theoretical support, simulations confirmed the theoretical findings.

We then studied gradient-based adversarial attacks on cluster-based, heterogeneous MADRL systems with time-delayed data transmission using deep Q-network (DQN) algorithms. We introduced two novel observation types, termed on-time and time-delayed observations, corresponding to the cases in which the transmission channel is idle and the data arrive either on time or with a delay. By taking the distance between neighbouring agents into account, we presented a novel immediate reward function that appends a distance-based term to the previously used reward to improve the MADRL system's performance.
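The distance-based immediate reward functions described above can be sketched as follows. This is a minimal illustration only: the exact formulations appear in the dissertation body, so the function names, the sign convention (negative distance, so that approaching consensus increases the reward), and the choice of n = 4 for the n-norm are assumptions made here for clarity.

```python
import numpy as np

def euclidean_reward(agent_pos, target_pos):
    """Immediate reward: negative Euclidean (2-norm) distance to the consensus point."""
    return -np.linalg.norm(agent_pos - target_pos, ord=2)

def n_norm_reward(agent_pos, target_pos, n=4):
    """Immediate reward based on a general n-norm distance (n is an assumed example value)."""
    return -np.linalg.norm(agent_pos - target_pos, ord=n)

def chebyshev_reward(agent_pos, target_pos):
    """Immediate reward: negative Chebyshev (infinity-norm) distance."""
    return -np.linalg.norm(agent_pos - target_pos, ord=np.inf)

# Example: an agent at (3, 4) relative to a consensus point at the origin.
agent = np.array([3.0, 4.0])
target = np.zeros(2)
r_euclid = euclidean_reward(agent, target)      # -5.0
r_cheby = chebyshev_reward(agent, target)       # -4.0
r_n = n_norm_reward(agent, target, n=4)         # between -5.0 and -4.0
```

Under this sign convention, each agent's cumulative reward grows as the team contracts toward consensus, which is the quantity the dissertation compares across the four reward functions.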
We considered three types of gradient-based attacks to investigate the robustness of data transmission in the proposed system, and we proposed two defence methods to reduce the effects of these malicious attacks. The theoretical results are illustrated and verified with simulation examples.

Finally, we investigated the robustness of data transmission between the agents of a cluster-based, heterogeneous MADRL system under a gradient-based adversarial attack. We proposed an algorithm that combines a DQN approach with a proportional feedback controller to defend against the fast gradient sign method (FGSM) attack and improve the DQN agent's performance. Simulation results are included to verify the presented results.
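The FGSM perturbation and the proportional-feedback correction mentioned above can be sketched in miniature. This is a toy illustration, not the dissertation's algorithm: a simple quadratic loss stands in for the DQN so the gradient is available in closed form, and the step size `eps`, the gain `k_p`, and the availability of a clean reference estimate are all assumptions made here.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    """FGSM: shift the input by eps in the sign direction of the loss gradient."""
    return x + eps * np.sign(grad)

def proportional_correction(x_received, x_reference, k_p=0.5):
    """Proportional feedback: pull a (possibly attacked) observation
    back toward a reference estimate with gain k_p."""
    return x_received + k_p * (x_reference - x_received)

# Toy loss L(x) = 0.5 * ||x - t||^2, so grad_x L = x - t.
t = np.array([1.0, -1.0])                 # clean reference observation
x = np.array([1.5, -0.5])                 # observation before the attack
grad = x - t                              # closed-form gradient of the toy loss
x_adv = fgsm_perturb(x, grad, eps=0.1)    # attacked observation
x_def = proportional_correction(x_adv, t) # defended observation
```

In this toy setting the proportional correction provably halves the deviation from the reference for `k_p = 0.5`; in the dissertation the controller instead acts on the DQN agent's received data to counter the FGSM distortion.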