Disaster Preparedness with Acquia Cloud: Contingency Plans & Crisis Comm
by Michael Lemire
Hurricane Sandy has highlighted the importance of continuity planning. Acquia’s contingency and communications plans are designed to keep our services and products available through extreme weather events like this one, and to ensure that problems are detected quickly and communicated accurately to our customers.
High Availability and Disaster Recovery at Acquia
Acquia has a comprehensive set of procedures and policies in place to ensure the availability of our cloud products and services. Acquia Cloud, Acquia’s Platform-as-a-Service product, and Drupal Gardens, our Software-as-a-Service product, are built on Amazon Web Services (AWS) infrastructure. Acquia Cloud uses Amazon Elastic Compute Cloud (EC2) infrastructure and technology to provide high availability and disaster recovery capabilities that match each customer’s requirements.
Acquia Cloud customers can choose the level of redundancy appropriate for their web properties. We offer both single/shared server environments and highly available, redundant architecture distributed across multiple data centers. In addition, Acquia can provide dedicated, fully redundant hot-standby architecture that is geographically distributed across the United States and/or the European Union. Acquia Cloud also lets customers create ad hoc or scheduled backups of the code and data that make up their sites, so they can host their own disaster recovery sites.
Detecting and Escalating Continuity Issues
Acquia has a variety of products and services that are mission critical to our customers, including our core platforms, Acquia Cloud and Drupal Gardens, as well as Acquia Network (the web-based management interface that our customers use to manage Acquia Cloud sites), Acquia Search, Acquia Insight and our marketing sites. All of these systems require redundant, diverse automated monitoring, as well as geographically distributed personnel on call 24x7 to respond to any platform issue. To ensure continuous monitoring and clear escalation paths during any contingency, Acquia’s operations team, which maintains the monitoring systems and responds to their events, is currently staffed in the United States (on both the East and West coasts), Europe and Australia.
All mission critical systems and platforms are extensively monitored both from inside and outside our environment. Any issue that causes a disruption – whether an issue affecting a single customer’s site, or a platform issue that may affect many of our customers – kicks off immediate escalation to our support teams, as well as any additional teams whose expertise might be required. Internal communication channels are redundant.
Communication During a Crisis
During the past year, Acquia has enhanced both its internal and external communications plans and procedures to ensure that we communicate with our clients accurately and consistently.
The escalation of a service issue can be initiated either by a customer, through a phone call or a support ticket, or by Acquia’s monitoring systems. Internal escalation takes place within 10 minutes of an issue being detected. If the problem has not been resolved within 15 minutes, we notify customers through our status page at http://status.acquia.com and tweet status updates on our support Twitter account, @acquia_support. If the problem continues for 30 minutes, we email affected customers. The status page and the Twitter feed are updated every 30 minutes until the issue is resolved. Within two hours of resolution, we publish an initial post-mortem, and within 48 hours we provide a detailed post-mortem in a published post and by email.
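The timeline above is essentially a time-driven escalation schedule. The Python sketch below encodes it as a function that, given how long an incident has been open, lists which steps should already have happened, plus a helper for the post-mortem deadlines. It is illustrative only: the function and constant names are hypothetical, it calls no Acquia, status page or Twitter API, and it makes two assumptions the timeline does not spell out: the 30-minute status refreshes are counted from the first public notice at 15 minutes, and both post-mortem deadlines are counted from the moment of resolution.

```python
from typing import List, Tuple

# Hypothetical constants mirroring the published timeline (minutes after detection).
INTERNAL_ESCALATION_MIN = 10  # internal escalation must happen by this point
PUBLIC_NOTICE_MIN = 15        # status page + @acquia_support tweet if still unresolved
EMAIL_MIN = 30                # email to affected customers if still unresolved
UPDATE_INTERVAL_MIN = 30      # assumption: refresh cadence counted from the first public notice

# Post-mortem deadlines, assumed to be measured from resolution (in minutes).
INITIAL_POSTMORTEM_MIN = 2 * 60
DETAILED_POSTMORTEM_MIN = 48 * 60


def due_actions(minutes_open: int) -> List[Tuple[int, str]]:
    """Return (deadline_minute, action) pairs already due for an incident
    that is still unresolved `minutes_open` minutes after detection."""
    if minutes_open < 0:
        raise ValueError("minutes_open must be non-negative")
    schedule = [
        (INTERNAL_ESCALATION_MIN, "internal escalation"),
        (PUBLIC_NOTICE_MIN, "status page + tweet"),
        (EMAIL_MIN, "email affected customers"),
    ]
    # Periodic status page / Twitter refreshes while the incident stays open.
    t = PUBLIC_NOTICE_MIN + UPDATE_INTERVAL_MIN
    while t <= minutes_open:
        schedule.append((t, "status page + tweet update"))
        t += UPDATE_INTERVAL_MIN
    # Keep only the steps whose deadline has been reached, in time order.
    return sorted(step for step in schedule if step[0] <= minutes_open)


def postmortem_deadlines(resolved_at_min: int) -> Tuple[int, int]:
    """Return (initial, detailed) post-mortem deadlines in minutes after detection,
    given the minute at which the incident was resolved."""
    return (resolved_at_min + INITIAL_POSTMORTEM_MIN,
            resolved_at_min + DETAILED_POSTMORTEM_MIN)
```

For example, `due_actions(50)` lists the internal escalation, the first public notice, the customer email and one status refresh at minute 45, while `postmortem_deadlines(50)` gives deadlines of 170 and 2930 minutes after detection. Modeling each step as a deadline rather than an event makes it easy to check, during an incident, whether any communication is overdue.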
Ideally, contingency plans are never needed, but because we are responsible for providing mission-critical websites for our customers, we continually work to improve our procedures. Acquia’s focus on improving continuity planning includes:
• Deploying highly available services.
• Maintaining continuous, redundant 24x7 monitoring to detect issues as they occur.
• Escalating and resolving problems as quickly as possible.
• Communicating with our customers promptly and effectively.
By placing a high priority on these procedures, Acquia works to ensure world-class availability.