Elasticity is a key feature of cloud computing and a major contributor to its popularity. It is defined as the automatic provisioning and de-provisioning of resources to match workload changes over time. Service High Availability (HA), defined as providing at least 99.999% service availability, is one of cloud computing’s major challenges, and maintaining service HA while scaling in or out is even more challenging. Recently, an architecture has been proposed for managing HA. Following this architecture, an Elasticity Engine has been introduced that is capable of managing resources based on application-level provisioning or de-provisioning alerts while preserving HA. In contrast to the prevailing monitoring solutions, which report workload at the Virtual Machine (VM) level, the Elasticity Engine requires a monitoring solution that monitors service-level workload and triggers alerts accordingly. In this thesis, we propose an approach and an architecture for monitoring HA applications at the service level. The approach starts by monitoring the application components in the traditional manner. The workload of each component is then mapped to that component’s respective service assignment. The resource usage of all the components providing a service is aggregated and mapped to the service-level workload using a distributed client-server architecture. This approach distinguishes between the different HA states, active or standby, that a component can be assigned at runtime, and it adapts to situations where switchovers happen under the control of the SA Forum middleware, for example due to failures. The proposed monitoring architecture has been implemented and integrated with the Elasticity Engine to test its effectiveness and overhead.
The results show that the implemented and integrated prototypes achieve elasticity in a cluster based on service-level workload while keeping the monitoring overhead within 5% of the cluster’s total resources.
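
As an illustration of the aggregation step summarized above, the following is a minimal sketch and not the thesis implementation. It assumes each monitored component reports a sample carrying its current service assignment, its HA state, and a CPU usage figure. The names `ComponentSample`, `aggregate_service_workload`, and `check_alert`, the threshold values, and the choice to count only active assignments are all hypothetical and serve only to make the idea concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSample:
    """One component-level measurement (hypothetical record layout)."""
    component: str
    service: str       # service the component is currently assigned to
    ha_state: str      # "active" or "standby", as assigned at runtime
    cpu_percent: float


def aggregate_service_workload(samples, counted_states=("active",)):
    """Sum component usage per service, counting only the given HA states.

    Counting only active assignments is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for sample in samples:
        if sample.ha_state in counted_states:
            totals[sample.service] += sample.cpu_percent
    return dict(totals)


def check_alert(workload, scale_out_at=80.0, scale_in_at=20.0):
    """Map a service-level workload to an alert (thresholds are illustrative)."""
    if workload >= scale_out_at:
        return "scale_out"
    if workload <= scale_in_at:
        return "scale_in"
    return None


samples = [
    ComponentSample("comp1", "web", "active", 30.0),
    ComponentSample("comp2", "web", "active", 55.0),
    ComponentSample("comp3", "web", "standby", 5.0),
    ComponentSample("comp4", "db", "active", 10.0),
]
service_workload = aggregate_service_workload(samples)
alerts = {name: check_alert(load) for name, load in service_workload.items()}
```

Because the HA state is read from each sample rather than fixed per component, a switchover in this sketch only changes which samples are counted for a service in the next aggregation round; no reconfiguration of the aggregation itself is needed.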