A string of disruptions to Amazon Web Services (AWS) cloud servers in the final months of 2021 affected holiday package deliveries, streaming and gaming apps, personal cash transfer apps, home security cameras and a number of Amazon products, including Alexa and Kindle. These outages underscore how urgent and important cloud security is among every business's security priorities. Cloud technologies are here to stay, and they present new challenges for IT and security leaders.
Most companies are already in some stage of transformation and adoption, whether private, public or hybrid cloud, to leverage the significant advantages of scalability, reliability and high availability enabled by cloud technologies. Beyond the benefits, macro trends and global events like the COVID-19 pandemic have further accelerated adoption of cloud for collaboration, meetings, online education, gaming, non-profit fundraising and more.
While we all benefit from the expansion of the cloud, increased use also puts tremendous pressure on cloud infrastructures to deliver. Expectations are high, and people have come to treat the cloud like a basic utility, as critical as electricity, water or gas. As a result, the impact of an outage can be far-reaching, and public cloud companies come under more scrutiny when one occurs. Market and industry research firms estimate that major public cloud platforms experience an outage about once a quarter. The widely publicized string of AWS outages, however, came at the tail end of several other high-profile cloud outages. In November, a Google Cloud outage affected Snapchat and Spotify, and in October, a Meta outage took out Facebook, Instagram, WhatsApp and Messenger. These events indicate that cloud security incidents are increasing in both frequency and impact on our daily lives.
Outages can happen for various reasons. Hackers are constantly running thousands of virtual machines on the cloud to find weaknesses in any public or private cloud infrastructure. These “bot farms” are used to hold companies for ransom with ransomware or to prevent them from serving their legitimate customers by keeping their systems occupied with denial-of-service (DoS) traffic. Software bugs can also affect availability. (It’s worth pointing out that the November Google Cloud outage was related to a software bug that has since been fixed.)
Six ways to prepare your team to minimize cloud security risk from outages
Don’t let the next outage catch you off guard! While cloud vendors must be prepared to stay ahead of these threats and react quickly when they occur, it’s equally important for those leveraging the cloud to be prepared as well. As with any situation, preparation can help manage the risks.
- Analyze the readiness of your team to respond to critical situations. Develop short-, medium- and long-term plans to address areas of risk.
- Ensure you have a business continuity and disaster recovery plan. Understand your specific business and application profile, and ensure everyone on your technology team has the same understanding of what it means to be “up and running.” Develop a clear plan for activities, with clearly assigned responsibilities required to bring systems back online.
- Review your testing plan for disaster, availability and outage scenarios. Determine how long it will take for your team to get systems back up and running in case of an outage. Your application and DevOps leadership should drive this process and have automated tests run periodically to check for preparedness.
- Review the commitments you’ve made to customers in your service-level agreements (SLAs). Can you quantify the costs to reimburse customers in the case of an unexpected outage? Do you need to revise these terms?
- Ask your product and technology leadership to outline the business impact of an outage qualitatively and quantitatively. For example, how will the impact be measured in terms of both lost revenue and loss of goodwill?
- Assess how your current cloud architecture can reduce the impact of service outages. Given complexities and costs, a multicloud approach is not a cure-all for resiliency. Ensure your team has maximized resiliency through proper application design and implementation, thoroughly tested through chaos engineering. Looking to improve resiliency? Consider a Multi-AZ deployment across multiple regions.
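As a rough illustration of the quantitative side of the questions above (SLA reimbursements and lost revenue), here is a minimal Python sketch. Every name and number in it, including `estimate_outage_impact`, the hourly revenue figure and the flat credit rate, is a hypothetical assumption for illustration, not a figure from any real SLA. Loss of goodwill is qualitative and is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class OutageImpact:
    """Hypothetical breakdown of the direct cost of one outage."""
    lost_revenue: float
    sla_credits: float

    @property
    def total(self) -> float:
        return self.lost_revenue + self.sla_credits


def estimate_outage_impact(
    outage_hours: float,
    revenue_per_hour: float,
    monthly_contract_value: float,
    credit_rate: float,
) -> OutageImpact:
    """Estimate direct outage cost.

    Assumes revenue is lost evenly for every hour of downtime and that
    customers are owed a flat fraction (credit_rate) of their monthly
    contract value. Real SLAs usually use tiered credits instead.
    """
    lost_revenue = outage_hours * revenue_per_hour
    sla_credits = monthly_contract_value * credit_rate
    return OutageImpact(lost_revenue=lost_revenue, sla_credits=sla_credits)


if __name__ == "__main__":
    # Illustrative inputs only: a 4-hour outage, 10,000 in revenue per hour,
    # 200,000 in monthly contracts and a 10% credit owed to customers.
    impact = estimate_outage_impact(4, 10_000, 200_000, 0.10)
    print(f"Lost revenue: {impact.lost_revenue:,.0f}")
    print(f"SLA credits:  {impact.sla_credits:,.0f}")
    print(f"Total:        {impact.total:,.0f}")
```

Running the same calculation for a few outage lengths gives product and technology leadership a shared starting point for deciding whether current SLA terms need revising.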
Three ways to manage security risk with cloud providers
Ensure your team has a common understanding of the commitments and response plans of your cloud vendors and that these are accurately accounted for in your overall cloud security risk mitigation plan.
- Review the terms of your vendors’ SLAs. What commitments are your cloud providers making to you regarding their service? What reimbursements will they provide for reduced availability?
- Consider SLAs in quantifiable terms and make sure they are appropriate to the level of business impact. The higher the risk, the more you have to factor in the guarantees from your vendors.
Most public cloud providers calculate SLAs on a monthly basis. This means that if a cloud provider guarantees 99.99% uptime, its services can be down for about four to five minutes in a given month (roughly 4.3 minutes in a 30-day month). A single seven- to eight-hour outage drops monthly uptime to about 99%. So, make sure you have appropriate credits included in your SLAs.
- Consider the implications of cloud availability for your unique business requirements. Your cloud strategy and risk will vary if you leverage infrastructure as a service (IaaS) or platform as a service (PaaS) capability from the cloud vendor.
In the case of IaaS, the vendor is only responsible for ensuring the uptime of the systems under its control, such as the hardware behind the services. In this scenario, you are responsible for ensuring applications are secure and properly designed to protect against intrusions and outages. Depending upon the business context, the cost-risk-benefit trade-off of having multiple availability zones or multiple support regions may make this additional responsibility worth it to you.
In the case of PaaS (for example, Amazon DynamoDB or Azure SQL services), vendors own the responsibility of ensuring the uptime of their services. You still need to make the right choices and design for higher security and lower risk, but it is less of a burden.
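The monthly SLA arithmetic described above can be checked with a few lines of Python. This is a sketch that assumes a 30-day month and counts all downtime against the SLA; the function names are illustrative, and real provider SLAs define measurement periods and exclusions in their own terms.

```python
def allowed_downtime_minutes(uptime_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime permitted in one month at the given uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


def uptime_after_outage(outage_hours: float, days_in_month: int = 30) -> float:
    """Monthly uptime percentage after a single outage of the given length."""
    total_hours = days_in_month * 24
    return 100 * (1 - outage_hours / total_hours)


if __name__ == "__main__":
    # 99.99% of a 30-day month leaves roughly 4.3 minutes of downtime.
    print(f"{allowed_downtime_minutes(99.99):.2f} minutes allowed at 99.99%")
    # A 7.2-hour outage is 1% of a 720-hour month.
    print(f"{uptime_after_outage(7.2):.2f}% uptime after a 7.2-hour outage")
```

Comparing the output against the uptime figure in your vendor's SLA shows how far a single long outage can push a month below the guaranteed level, which is the point at which service credits would normally apply.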
To learn more about what you can do to strengthen cloud security, visit our Connect page to contact our team.