
Automating the Upgrade of IaaS Cloud Systems



Nabi, Mina (2019) Automating the Upgrade of IaaS Cloud Systems. PhD thesis, Concordia University.

Text (application/pdf)
Nabi_PhD_F2019.pdf - Accepted Version


The different resources providing an Infrastructure as a Service (IaaS) cloud service may need to be upgraded several times throughout their life-cycle for different reasons, for instance to fix discovered bugs, to add new features, or to fix a security threat. An IaaS cloud provider is committed to each tenant by a service level agreement (SLA) which indicates the terms of commitment, e.g. the level of availability, that have to be respected even during upgrades. However, the service delivered by the IaaS cloud provider may be affected during the upgrade. This may violate the SLA, which in turn will impact other services relying on the IaaS. Our goal in this thesis is to devise an approach and a framework for automating the upgrade of IaaS cloud systems with minimal impact on the services and in compliance with the SLAs.
The upgrade of IaaS cloud systems under availability constraints inherits all the challenges of the upgrade of traditional clustered systems and faces other cloud-specific challenges. Challenges shared with clustered systems include the potential dependencies between resources, potential incompatibilities along dependencies during the upgrade, potential system configuration inconsistencies due to upgrade failures, and the minimization of the amount of resources used to complete the upgrade. Dependencies of the application layer on the IaaS layer are an added challenge that must be handled properly. In addition, the dynamic nature of the cloud environment poses a new challenge. A cloud system evolves, even during the upgrade, according to the workload changes by scaling in/out. This mechanism (referred to as autoscaling) may interfere with the upgrade process in different ways.
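To illustrate how autoscaling can constrain an upgrade, the following minimal sketch computes how many hosts could be taken out of service at once while still leaving room for the current load plus some scale-out headroom. It is not the method of the thesis: the function name, its parameters, and the simple uniform-capacity model are assumptions made only for illustration.

```python
import math


def max_batch_size(total_hosts, vms_per_host, current_vms, scale_out_reserve_vms):
    """Return how many hosts may be upgraded at once (hypothetical model).

    Assumes every host can carry the same number of VMs and that
    scale_out_reserve_vms is the headroom the provider wants to keep
    available for autoscaling while the upgrade is in progress.
    """
    # Hosts that must stay in service to carry the load plus the reserve.
    needed_hosts = math.ceil((current_vms + scale_out_reserve_vms) / vms_per_host)
    # Never report a negative batch: zero means the upgrade has to wait.
    return max(0, total_hosts - needed_hosts)


if __name__ == "__main__":
    # 10 hosts, 4 VMs each, 20 VMs running, room kept for 8 more VMs.
    print(max_batch_size(10, 4, 20, 8))
```

Because the workload changes while the upgrade runs, a value like this would have to be recomputed before each batch rather than fixed once at the start.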
In this thesis, we define an upgrade management framework for the upgrade of IaaS cloud systems under SLA constraints. This framework addresses all the aforementioned challenges in an integrated manner. The proposed framework automatically upgrades an IaaS cloud system from a current configuration to a desired one, according to the upgrade requests specified by the administrator. It consists of two distinct components, one to coordinate the upgrade, and the other to execute the necessary upgrade actions on the infrastructure resources. For the coordination of the upgrade process, we propose a new approach to automatically identify and schedule the appropriate upgrade methods and actions for implementing the upgrade requests in an iterative manner, taking into account the vendors’ descriptions of the infrastructure components, the SLAs with the tenants, and the status of the system. This approach is also capable of handling new upgrade requests even during ongoing upgrades, which makes it suitable for continuous delivery. In case of failures, the proposed approach automatically issues localized retry and undo recovery operations as appropriate for the failed upgrade actions to preserve the consistency of the system configuration.
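The retry and undo recovery described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the framework's implementation: `run_with_recovery`, the `(do, undo)` pair representation, and the `max_retries` parameter are hypothetical names, and the sketch assumes that an undo callable is safe to run on a partially applied action.

```python
def run_with_recovery(actions, max_retries=1):
    """Run (do, undo) pairs in order; retry failures, then undo.

    Returns True if every action succeeded. If an action still fails
    after max_retries retries, the failed action and the actions
    completed before it are undone in reverse order and False is returned.
    """
    completed = []  # undo callables of the actions that succeeded so far
    for do, undo in actions:
        for _attempt in range(max_retries + 1):
            try:
                do()
                break  # success, stop retrying this action
            except Exception:
                continue  # retry the same action
        else:
            # Retries exhausted: restore the previous configuration.
            undo()
            for previous_undo in reversed(completed):
                previous_undo()
            return False
        completed.append(undo)
    return True
```

A caller would pass one pair per upgrade action, for example a callable that installs a new package version together with one that reinstalls the old version. The sketch undoes the whole sequence, whereas the approach in the thesis localizes recovery to the failed actions.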
To demonstrate the feasibility of the proposed upgrade management framework, we present a proof of concept (PoC) for the upgrade of IaaS compute and its application in an OpenStack cluster. In this PoC, we target the challenge that is new to the upgrade of the IaaS cloud compared to clustered systems, i.e. unexpected interference between the autoscaling and the upgrade processes. In addition, a prototype of the proposed upgrade approach for coordinating the upgrade of all kinds of IaaS resources has been implemented and is discussed in this thesis. We also provide an informal validation and a rigorous analysis of the main properties of our approach. Finally, we conduct experiments to evaluate our approach with respect to SLA constraints of availability and elasticity. The results show that our approach avoids the outage at the application level and reduces SLA violations during the upgrade, compared to the traditional upgrade method used by cloud providers.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (PhD)
Authors: Nabi, Mina
Institution: Concordia University
Degree Name: Ph. D.
Program: Electrical and Computer Engineering
Date: 3 July 2019
Thesis Supervisor(s): Khendek, Ferhat and Toeroe, Maria
Keywords: High Availability; Cloud; IaaS; Upgrade; Scaling
ID Code: 985903
Deposited By: MINA NABI
Deposited On: 14 Nov 2019 18:23
Last Modified: 14 Nov 2019 18:23


All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
