The Cloud’s Cloudy Moment: A Systematic Survey of Public Cloud Service Outage

Zheng Li, Mingfei Liang, Liam O’Brien, He Zhang

Abstract


Inadequate service availability is the top concern when employing Cloud computing. It has been recognized that zero downtime is impossible for large-scale Internet services. Nevertheless, by learning from their own past mistakes and those of others, Cloud vendors can minimize the risk of future downtime, or at least keep downtime short. To help Cloud providers summarize such lessons, we performed a systematic survey of public Cloud service outage events. The survey followed the standard, rigorous methodology of Evidence-Based Software Engineering. This paper reports the results of the survey, including that: (1) no Cloud vendor can avoid suffering service outages; (2) Cloud service outages can happen at any time, and each Cloud vendor has experienced violations of its Service Level Agreements during the past years; (3) Climate and Age are two influential factors related to outage locations; and (4) Power Outage and Routing/Network Issue are two common classes of Cloud service outage causes. In addition to these findings, our work produced a lessons framework by classifying the outage root causes. The framework can in turn be used to organize outage lessons for reference by Cloud providers. In our future work, this lessons framework will be expanded smoothly to include potentially new root causes.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.