In this thesis, deep reinforcement learning (DRL) is used for intelligent formation control and obstacle avoidance in multi-agent systems through reward shaping. The objective of this work is to study the application of the proximal policy optimization (PPO) algorithm for maneuvering a formation of agents around obstacles. Each agent in the multi-agent system is modeled as a holonomic second-order integrator, and the formation is allowed to shrink while maintaining its shape in order to navigate around obstacles and drive the geometric centroid of the formation toward the goal. We investigated both angle-based and bearing-based rewards. Experiments were carried out in a two-dimensional simulation environment with different numbers of agents and multiple obstacles between the formation and the goal. Curriculum learning was used to train the agents in environments with different initializations for the agents, goal, and obstacles. Simulation results demonstrate the effectiveness of the different reward schemes.
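As an illustrative sketch (not the thesis implementation), the holonomic second-order integrator model mentioned above can be written as a simple discrete-time update in which the policy's action is the agent's acceleration; the time step `DT` and the state layout `[px, py, vx, vy]` are assumptions for this example.

```python
import numpy as np

DT = 0.1  # integration time step (assumed value, for illustration)

def step(state, accel, dt=DT):
    """Advance one 2-D double-integrator agent by a single time step.

    state: [px, py, vx, vy]; accel: [ax, ay] (the control input).
    """
    pos, vel = state[:2], state[2:]
    new_vel = vel + accel * dt
    # Exact update for piecewise-constant acceleration over the step.
    new_pos = pos + vel * dt + 0.5 * accel * dt ** 2
    return np.concatenate([new_pos, new_vel])

# An agent starting at rest accelerates along +x for one step.
s = step(np.zeros(4), np.array([1.0, 0.0]))
```

In a PPO training loop, the learned policy would map each agent's observation to the acceleration command `accel`, and the shaped reward would be computed from the resulting positions.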