Constructing a Fault-Tolerant, Highly Available Cloud Infrastructure for your Drupal Site [December 12, 2012]
For mission-critical Drupal sites, building a highly available and highly resilient infrastructure to ensure your site doesn't fail can be extremely challenging. Many organizations have already tried to "DIY" and have realized the maintenance and 24x7 support requirements are too costly, or generally unmanageable. As the infrastructure focus shifts to the Cloud to offload management and save money, many organizations are still struggling to architect the right environment that they can trust won't fail and won't lose data.
This webinar focuses on best practices for building a highly available infrastructure from industry veteran Andrew Kenney. Andrew has held senior positions on both sides of the coin: building resilient hosting platforms on top of traditional datacenters at Catalog.com and ONEsite, and on top of the cloud at Acquia.
This webinar will be highly technical, and attendees will learn:
• What's unique about Drupal that requires specific infrastructure choices for load balancers, the Web tier, database and file systems
• Best practices for building redundant and highly available environments
• How to architect, automate and test data backup and failover processes
Hannah: Today's webinar is: Constructing a Fault-Tolerant, Highly Available Cloud Infrastructure for your Drupal Site.
Speaking first we have Jess Iandiorio, who is the Senior Director of Cloud Products Marketing, and then we have Andrew Kenney, who is the VP of Platform Engineering.
Jess, you take it away now.
Jess: Great, thank you very much, Hannah. Thanks everybody for taking the time to attend today, we have some great content, and we have a new speaker for our Webinar Series. For those of you who attend meetings you know we do three to five per week.
Andrew Kenney has been with the organization since mid-summer, and we are really excited to have him. He comes to us from ONEsite and now heads our Platform Engineering, and he is the point person on all things Acquia Cloud specifically. He'll speak in just a few minutes.
Thank you, Andrew.
Just to cue up what we are going to talk about today: we want our customers to be able to focus on Web innovation. Creating killer websites is hard, so we want you to be able to spend all the time you possibly can figuring out how to optimize and create a really, really cool experience on your website. Hosting that website shouldn't be as much of a challenge.
The topic today is designing a fault-tolerant, highly available system and the point of the matter is, if your site is mission-critical how do you avoid a crisis, and why do you need this type of infrastructure?
Andrew has some great background around designing highly-available infrastructure and systems, and he's going to go through best practices and then I'll come back towards the end just to give a little bit of information about Acquia Cloud as it relates to all the content he's going to cover, but he's just going to talk generally about best practices and how you could go about doing this yourself.
Again, please ask your questions in the Q&A tab as we go, and we'll get to them as we can. For the content today, first Andrew is going to discuss the challenges that Drupal sites can have when it comes to hosting: what makes them complex and why you would want a tuned infrastructure in order to have high availability. He'll go through the types of scenarios that can cause failure, how you can go about creating high availability and resiliency, and the resource challenges that some organizations may incur, and then he'll go through practical steps and best practices around designing for failure, and how you can actually architect and automate the failover as well. He'll close with some information on how you can test failure as well.
With that, I'm going to hand it over to Andrew, and I'm here in the background if you have any questions for me, otherwise I'll moderate Q&A and I'll be back towards the end.
Andrew: Thanks, Jess. It's nice to meet you, everyone. Feel free to ask questions as we go, or we can take them at the wrap-up; I'm more than willing to be interrupted.
Many of you may be familiar with Drupal and its status as a great PHP content management system, but even with it being well engineered and having a decade-plus of enhancements, there are a number of issues with hosting Drupal. These issues are always present, whether you're hosting in your own datacenter or on a bare-metal server at, let's say, Rackspace or SoftLayer, but they are even more challenging when you're dealing with Cloud hosting.
The Cloud is great at a lot of things, but some of these more legacy applications, or very complex and extensive applications, may have issues which you can solve with modules, solve with great platform engineering, or just work around in other ways.
One of these issues is that Drupal expects a POSIX file system. This essentially means that Drupal and all of its file input/output calls were designed on the assumption that there's a hard drive underneath the Web server or, if not a hard drive, an NFS server or a Samba server: some sort of underlying file system. This is as opposed to some newer applications, which may be built by default to store files inside Amazon S3, inside Akamai NetStorage, or inside a document-oriented database like CouchDB.
Drupal has come a long way, especially in Drupal 7, in making it so that you can enable modules that use PHP file streams instead of direct fopen … legacy Unix file operations, but there are a number of different versions of Drupal and they don't all support this, and there are not a lot of great file system options inside the Cloud. At the end of the day Drupal still expects to have that file system there.
Another issue: you may make five queries on a given page, or you may make 50 queries on a given page, and when you're running everything on a single server this is not necessarily a big deal. You may have latency in the hundredths of milliseconds. When you're running on a single server in the Cloud it may be the same latency, but once you split things up, even within the same availability zone in Amazon, you may have your Web server on one rack and your database on a rack that is a few miles away.
This latency, even if it's only one millisecond or 10 milliseconds per query, can dramatically add up. One of the key challenges in dealing with Drupal, both when scaling horizontally and just in the Cloud in general, is how you deal with high-latency MySQL operations. Do you improve the efficiency of the overall page and use fewer dynamic modules, or fewer … 12-way left joins in Views and different modules? Do you implement more caching? There are a lot of options here but, in general, Drupal still needs to do a lot of work on improving its performance at the database layer.
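To make the arithmetic concrete, here is a minimal sketch of how per-query network latency accumulates on a single page. The query counts and latencies are the ones mentioned in the talk; the function name is hypothetical, and the sketch assumes queries run strictly one after another with no caching, which is a simplification.

```python
def added_page_latency_ms(queries_per_page: int, round_trip_ms: float) -> float:
    """Time a page spends waiting on database round trips.

    Assumes (for this sketch only) that queries run sequentially.
    """
    return queries_per_page * round_trip_ms


if __name__ == "__main__":
    for queries in (5, 50):
        for rtt_ms in (0.01, 1.0, 10.0):
            total = added_page_latency_ms(queries, rtt_ms)
            print(f"{queries:>2} queries x {rtt_ms:>5} ms = {total:g} ms per page")
```

Under these assumptions, 50 queries at 10 milliseconds each add half a second to every uncached page view, which is why reducing query counts and adding caching both matter.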
On a similar, related note, Drupal is not built with partition tolerance in mind: Drupal expects to have a master database that it can commit transactions to. It doesn't have any automatic sharding built in, the kind where, if you lose, let's say, the shard holding the articles on your website, your article section may go down but you'll still have your photo galleries and your other node-driven elements.
Some other new-generation applications may be able to deal with the loss of a single backend database node; maybe they're using a database like Riak or Cassandra that has great partition tolerance built into it. Unfortunately MySQL doesn't do that unless you implement sharding manually. We can scale out and scale up Drupal at the MySQL layer, and we can have highly available MySQL, but at the end of the day, if you lose your MySQL layer you are essentially going to lose your entire application.
One of the other issues with Drupal hosting is that there's a shortage of talent: a shortage of people who have really driven Drupal at a massive scale. There are companies like … the Economists of the world, who are a Top 50 Internet site powered by Drupal, and there's talent at the White House giving back to Drupal, but there's still a lack of good dev ops expertise in terms of scaling that … for an organization that runs hundreds of Drupal sites. How do you go deploy this, either on your internal infrastructure in, let's say, a university IT department, or on Rackspace or at a traditional datacenter company?
Drupal has its own challenges, and one of those challenges is: how do you find great engineering, operations, or dev ops people to go help you with your Drupal projects?
Now there are a number of ways, as you may all be aware, that an application can die in a traditional datacenter. It may be someone tripping over a power cord, it may be that you lose your Internet access or one of your upstream ISPs, or you have a DDoS attack.
Many of these also go for the Cloud, but the Cloud also introduces other, more complex scenarios. You can still have machine loss, and Amazon exacerbates this because that machine loss may be even more random and unpredictable. Amazon may announce that a machine is going to be unavailable on a given day, which is great, and probably something that your traditional IT department or infrastructure provider didn't give you unless they were very good at their jobs.
There's still a chance that at any given moment an Amazon machine may just go down and become unavailable, and you really have no introspection into why this happened. The hypervisor layer, all of the hardware abstraction, is not available to you. Amazon shields you, Rackspace Cloud shields you; all these different clouds shield you from knowing what's going on at the machine layer. Or there may just be a service outage: Amazon may lose a datacenter, and Rackspace, just this weekend, had issues in the Dallas region with its Cloud customers.
You never know when your actual infrastructure and service provider is going to have a hiccup. There may just be network disruption: this could be packet loss, or it could be routes being malformed going to different regions or different countries. There are a lot of different ways that the network can impact your application, and it's not just traffic coming to your website, it's also your website talking with its Memcache layer, talking with its database layer, all these different things.
One of the key pain points of the Amazon Cloud specifically is its file system, if you're using Elastic Block Store (EBS). There have been a lot of horror stories out there about EBS outages that have taken down Amazon's EC2 systems or anything that's backed by EBS. In general it's hard to run, like I said before, a POSIX file system at scale. EBS is an interesting technology, but it's still in its infancy. Amazon, although it's focused on reliability and performance for EBS, has a lot of work to do to improve it, and even people like Rackspace are just now deploying their own EBS-like subsystems with OpenStack.
Your website may also fail just from a traffic spike. This may be legitimate traffic: maybe someone talked about your website on a radio program or TV broadcast, or maybe you got linked from the homepage of TechCrunch or Slashdot. But a traffic spike could also be someone maliciously trying to take down your website. The Cloud doesn't necessarily make this any worse, other than the fact that you may have little to no control over your network infrastructure. You do not have access to a network engineer who can pinpoint exactly which upstream networks all this traffic is coming from and implement routing changes or firewall changes to deal with it, so the Cloud may make it harder for you to control this.
Your control plane, your ability to manage your servers in the Cloud, may go down entirely; this is one of the issues that crops up on Amazon when they have an outage. The API may go away so that you can't do anything. You have to be able to engineer around this and ensure that your applications will survive even if you can't spin up new servers or resize them and do different things like that.
Another path to system failure is that your backups may fail. It's easy enough to take backups of servers and volumes and all these different things in the Cloud, but you have no guarantee, even when the API says that a backup has completed, that it's actually done. This may be better than traditional hosting, but there's still a lot of progress to be made in engineering to accommodate this.
In general, everyone wants to have a highly available and resilient website. There are obviously different levels of SLAs: some people may be happy if the website can sustain an hour of downtime, while other organizations may feel their website is mission-critical and even a blip of a few minutes is just too much, because of actual financial transactions or just the publicity if the website is down.
In general, your Drupal hosting should be engineered with high availability and resiliency in mind. To do this you should plan for failure, because that's a given in the Cloud: just know that at any given time a server may die, and you should have either a hot standby or a process in place to go spin up a new server. This means that you want to make sure that your deployment and your configuration are as automated as possible.
This may be a Puppet configuration, it may be a CFEngine configuration, or it may just be Chef or a bash script that says, "This is how I spin up a new machine and install the necessary packages. This is how I check out my Drupal code from GitHub." At the end of the day, when you're woken up by a pager at 2:00 in the morning, you don't want to have to go think about how you built the server. You want to have a script to go spin it up or, ideally, you want to use tools to have it fail over automatically, so you actually have no blips.
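As an illustration of the "script to spin it up" idea, here is a minimal sketch of a rebuild script, written in Python rather than Puppet, Chef, or bash. The package list, repository URL, and docroot are hypothetical placeholders, and the script defaults to a dry run that only prints the commands it would execute.

```python
import shlex
import subprocess

# Hypothetical values, for illustration only; substitute your own.
PACKAGES = ["apache2", "php5", "php5-mysql", "git"]
REPO_URL = "https://github.com/example/my-drupal-site.git"
DOCROOT = "/var/www/drupal"


def build_steps(packages, repo_url, docroot):
    """Return the ordered commands needed to rebuild a Web node."""
    return [
        ["apt-get", "update"],
        ["apt-get", "install", "-y", *packages],
        ["git", "clone", repo_url, docroot],
    ]


def provision(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for step in steps:
        print("+", shlex.join(step))
        if not dry_run:
            subprocess.run(step, check=True)  # stop at the first failure


if __name__ == "__main__":
    provision(build_steps(PACKAGES, REPO_URL, DOCROOT), dry_run=True)
```

Keeping the steps as data makes the 2:00 a.m. procedure reviewable and repeatable; a real configuration management tool adds idempotence and drift detection on top of this.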
Obviously, to have no blips means that you need to have this configured automatically. You should have no single points of failure; that's the ideal in any engineering organization: any consumer-facing or internal-facing website or application should have no single points of failure. In a traditional datacenter that would mean having dual UPSs, having dual upstream power supplies and network connectivity, having two sets of hard drives in each machine or having RAID, having multiple servers, and having your application load-distributed across … geographic regions.
There are lots of single points of failure out there. The Cloud abstracts a lot of this, so in general it's a great idea to run in the Cloud because you don't have to worry about the underlying network infrastructure; you can just spin a server up in one of the five Amazon East Coast availability zones, and you don't have to worry about any of the hardware requirements or the power, or any of those things. To have no single points of failure, you have to have two of everything. Or, if you can tolerate some downtime, you can use Amazon's CloudFormation along with CloudWatch to quickly spin up a server from one of your images and just boot it up that way, but it's definitely good to have two of everything, at least.
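The "two of everything" rule can be checked mechanically against an inventory. The following is a minimal sketch under the assumption that each server is described by a hypothetical (name, tier, zone) tuple; it reports any tier that does not span at least two availability zones.

```python
from collections import defaultdict


def single_points_of_failure(servers):
    """Return the tiers that live in fewer than two zones.

    `servers` is an iterable of (name, tier, zone) tuples; this
    inventory format is an assumption made for the sketch.
    """
    zones_by_tier = defaultdict(set)
    for _name, tier, zone in servers:
        zones_by_tier[tier].add(zone)
    return sorted(tier for tier, zones in zones_by_tier.items() if len(zones) < 2)


if __name__ == "__main__":
    inventory = [
        ("web-1", "web", "us-east-1a"),
        ("web-2", "web", "us-east-1b"),
        ("db-1", "database", "us-east-1a"),
        ("db-2", "database", "us-east-1b"),
        ("files-1", "files", "us-east-1a"),
    ]
    print(single_points_of_failure(inventory))  # prints ['files']
```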
You will want to monitor everything. As I said before, you can use CloudWatch to monitor your servers, you can use Nagios installations, and you can use Pingdom to make sure that your website is up, but you want everything monitored. Is your website itself up, with Drupal returning the homepage? Do you actually want to submit a transaction to create a new node and validate that this node is there, using tools like Selenium?
Do you want to just make sure that MySQL is running? Do you want to see what the CPU health is, or how much network activity there is? One of the other things is that you want to monitor your monitoring system. Maybe you trust that Amazon's CloudWatch isn't going down, and maybe you trust Pingdom not to go down, but if you're running Nagios and your Nagios server goes down, you can't sustain an outage like that. You don't want that to happen at 2:00 in the morning and then have someone tell you on Monday morning that your website has been down all weekend, so it's a good idea to monitor the monitor servers.
Backing up all your data is key for resiliency and business continuity: ensuring that your Drupal file system is backed up, your MySQL database is backed up, and your configurations are all there. This includes not just backing up but validating that your backups are working, because many of us may have been in an organization where, yes, the DBA did back up a server, but when the primary server failed and someone tried to restore it from the backup, they found out that, oh, well, it's missing one of the databases or one of the sets of tables. Or maybe the configuration or the password wasn't actually backed up, so there's no way to even log in to that new database server.
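One way to catch the "missing tables" problem before an emergency is to restore each dump into a scratch database and compare what comes back against a list of expected tables. The sketch below uses Python's built-in SQLite as a stand-in for MySQL, which is an assumption made purely so the example is self-contained; the function name and table names are hypothetical.

```python
import sqlite3


def missing_tables(dump_sql: str, expected_tables) -> set:
    """Restore a SQL dump into a scratch database and report which
    expected tables did not come back."""
    conn = sqlite3.connect(":memory:")  # scratch database, discarded afterwards
    try:
        conn.executescript(dump_sql)
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    restored = {name for (name,) in rows}
    return set(expected_tables) - restored


if __name__ == "__main__":
    dump = """
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO node VALUES (1, 'Hello');
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    """
    print(missing_tables(dump, {"node", "users", "variable"}))  # prints {'variable'}
```

A non-empty result means the backup would not have restored a complete site, which is exactly the surprise the talk warns about.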
It's a very good idea to always go and test all of your backups, and this also includes testing emergency procedures. Most organizations have to have business continuity plans, but no plan is flawless, and plans have to be continually iterated on, just like software. The only way to ensure that the plan works is to actually engage that plan and test it, so it's my recommendation that if you have a failover datacenter, or you have a way to fail over your website, you test those failover plans.
Maybe you only do it once a year, or maybe you do it every Saturday morning at a certain time, if you can engineer it so there's no hiccup for your end users, or maybe your website has no traffic at a given point in the week, but it's a great idea to actually go test those emergency procedures.
In general, there are challenges with Drupal management, and just resource challenges. The Cloud tells you that your developers no longer have to worry about all the pesky details that are necessary to launch and maintain a website, and that you don't have to have any operations staff … don't invest in the hype. I think a lot of engineers have always felt that the operations team is just a bottleneck in their process, and once they have validated that their code is good, either in their own opinion or by running their own system tests or unit tests, they want to just push that live, and that's one of the principles of continuous integration.
The reality is that developers aren't necessarily great at managing server configurations, or at engineering a way to deploy software without any hiccup for the end user, who may load a page and then have an AJAX call refresh it. We want to make sure that there's no delay in the process, that the code doesn't impact the server, and that the server configurations are maintained.
Operations staff are still very, very necessary, and you have to plan for failure and plan your failover process in reality. It's very hard to find people who are great at operations and who also understand an engineer's mindset, so dev ops is a resource challenge.
Here's an example of how we design for failure. Here at Acquia, we plan for failure; we engineer different solutions for different clients' budgets to make sure that we give them something that will make their stakeholders, internally and externally, happy. We have multi-availability-zone hosting, so for all of our managed Cloud customers, when we launch one server we'll also have a backup server in another zone.
We replicate data from one zone to the other. If there's any service interruption in one zone, we serve data from the other zone. This includes the actual Web node layer, the Apache servers that are serving the raw Drupal files, and it includes the file system. Here we use GlusterFS to replicate the Drupal file system from server to server and from availability zone to availability zone.
It's also the MySQL layer: we'll have a master database server in its region, or we may have a slave against those master database servers, but the point is ensuring that all the data is always in two places, so that anytime there's a hiccup in one Amazon availability zone it won't impact your Drupal website.
Sometimes that's not enough. There have been a number of outages in Amazon's recent history where maybe one availability zone goes down but, due to a control system failure or other issues with the infrastructure, multiple zones are impacted. We have the ability to do multi-region hosting, so this may be out of the East Coast and the West Coast, U.S. West, and maybe the … our own facilities.
It really depends on what the organization wants, but multi-region hosting gives businesses the peace of mind and the confidence that if there is a natural disaster that wipes out all of U.S. East, or a colossal, cascading failure in U.S. East or one of these different regions, your data is always there, your website is always in another region, and you're not going to experience catastrophic delays in bringing your website back up.
During Hurricane Sandy there were a number of organizations that learned this lesson when they had their datacenters in, let's say, Con Edison's facilities in Manhattan. Maybe they were in multiple datacenters there, but it's possible for an entire city to lose power for, potentially, a week, or to have catastrophic water damage to the equipment. It's always important to have your website available from multiple regions, and we offer that for our clients.
One of the other key things in preventing failure is making sure that you understand the responsibilities and the security model for all the stakeholders in your website. You have the public consumer, who is responsible for their browser, for how they engage with the site, and for ensuring they don't distribute their passwords to unauthorized people.
You have Amazon, who is responsible for the network layer and for … ensuring that two different machine images on the hypervisor don't have the ability to disrupt each other; for making sure that the physical medium of the servers and the facilities are all locked down; and for making sure that customers using security groups can't go from one machine to the other, or have a database call go from one rack to the other for different clients.
Then you have Acquia, who is responsible for the software, the servers, and the platform-as-a-service layer with Drupal hosting. We are in charge of the operating system patches, we are in charge of configuring all of the security modules for Apache, and we are in charge of recommending to you, through the Acquia Network Insight tools, that you need to update your Drupal modules to ensure high security. We do all these things, but that brings it back to you. At the end of the day you're responsible for your application: your developers are the ones who make changes to it and implement new architectural things that may need to be security tested, or who choose not to update a module for one reason or another.
There's a shared security model here which covers security, availability, and compliance. There may be a Federal customer who has to have things enabled a certain way just to comply with a FISMA or FedRAMP accreditation. Obviously security can impact the overall availability of your website: if you don't engineer for security up-front, then hackers can take down your machine or compromise your data, and you won't want your website back online until you've validated exactly what has changed.
It's very important to understand the shared security model as you're planning for failure. Another thing I briefly touched on before was monitoring. This includes both monitoring your infrastructure and application and monitoring for the security threats I just mentioned. At Acquia we use a number of different monitoring systems, which I'll go into detail on, including Nagios and our own 24/7/365 operations staff, but we also use third-party software to scan our machines to ensure that they are up-to-date and have no open ports that may be an issue, no daemons running that are going to be an issue, and no other vulnerabilities.
This includes Rapid7, OSSEC, monitoring the logs, and thwarting any … issues found during security scans. It's important to monitor your infrastructure both to make sure the service is available and to make sure there are no security holes.
Back to monitoring: we have a very robust monitoring system. It's one of the systems we have to have, since we have 4,000-plus servers in Amazon's Cloud. All the Web servers and database servers and the Git and SVN servers, all these different types of servers, are monitored by something we call a mon server, and these mon servers check to make sure the websites are up, check to make sure that MySQL and Memcache are running, all these different things.
The mon servers also monitor each other; you can see that from the line from mon server to mon server at the top, so they monitor each other in the same zone. They may also choose to monitor a mon server in another region, so that if we lose an entire region we get a notice about it.
The mon servers may also be outside of Amazon's Cloud: we may run monitoring from someone like Rackspace, just to have our own business continuity and best-of-breed monitoring, to ensure that if there is a hiccup or service interruption in one of the Amazon regions, we catch it. It's important to have external validation of the experience, and we may just use something like Pingdom to ensure your website is always there and operating within the bounds of its SLA.
There are all sorts of ways to do monitoring, but it's important to have the assurance that your monitoring servers are working, and that each monitor that goes down has something else to alert you that it's down, so you don't impact your support or operations team in trying to recover from an issue.
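The rule that every monitor must itself be watched by something else can be verified from a simple map of who watches whom. This is a minimal sketch; the server names and the dictionary format are hypothetical and are not a description of Acquia's actual mon server configuration.

```python
def unwatched_monitors(watches: dict) -> set:
    """Return the monitors that no *other* monitor is watching.

    `watches` maps each monitor's name to the list of monitors it checks.
    A monitor checking itself does not count as being watched.
    """
    watched = set()
    for watcher, targets in watches.items():
        watched.update(target for target in targets if target != watcher)
    return set(watches) - watched


if __name__ == "__main__":
    watches = {
        "mon-east-1": ["mon-east-2"],
        "mon-east-2": ["mon-east-1", "mon-west-1"],
        "mon-west-1": ["mon-east-1"],
        "mon-external": ["mon-west-1"],
    }
    print(unwatched_monitors(watches))  # prints {'mon-external'}
```

In this example the external monitor would need a watcher of its own, for instance a third-party service, before the setup satisfies the rule.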
Architecting high availability and resiliency into your monitoring infrastructure is very important. Another thing is just being able to recover from failure: this includes having database backups; having file system snapshots, so you can recover all the Drupal files; making sure that all your EBS volumes are backed up; pushing those snapshots all the way over to S3; and making sure that the file system is replicated using a distributed file system technology like Gluster. With all of this, you can potentially recover from catastrophic data failure, because having backups is important.
You can choose whether you want to have these backups live, with live replication of MySQL or the file system, or just hourly snapshots, or weekly snapshots; that depends on your level of risk and how much you want to spend on these things.
In terms of preventing failure, we utilize a number of these different possibilities. You can use Amazon Elastic Load Balancers, with multiple servers behind an ELB, and these servers can be distributed across multiple zones. For example, we use ELBs for a client like the MTA of New York, where they wanted to ensure that if Hurricane Sandy wiped out one of the Amazon availability zones, we could still serve their Drupal website from the other availability zones.
We also use our own load balancers in our backend to distribute traffic between all the different Web nodes, so one availability zone may forward requests to the other availability zone, or you can do round robin. There's different logic in there to distribute the requests to all the healthy Web nodes, and to make sure that we don't send traffic to any unhealthy Web node while our operations team or automated systems recover from the reason it's unhealthy.
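The round-robin behavior described here, skipping unhealthy nodes, can be sketched in a few lines. This is an illustrative toy and not Acquia's load balancer; the class name and node names are hypothetical, and health status is set by hand rather than by real health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out Web nodes in rotation, skipping any marked unhealthy."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._healthy = set(self._nodes)
        self._ring = cycle(self._nodes)

    def mark_unhealthy(self, node):
        self._healthy.discard(node)

    def mark_healthy(self, node):
        if node in self._nodes:
            self._healthy.add(node)

    def next_node(self):
        if not self._healthy:
            raise RuntimeError("no healthy Web nodes available")
        # At most one full lap is needed to find a healthy node.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node in self._healthy:
                return node
        raise RuntimeError("no healthy Web nodes available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-a", "web-b", "web-c"])
    lb.mark_unhealthy("web-b")
    print([lb.next_node() for _ in range(4)])
    # prints ['web-a', 'web-c', 'web-a', 'web-c']
```

Once the unhealthy node has been repaired, calling `mark_healthy` puts it back into the rotation without restarting the balancer.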
We also have the ability to use a DNS switch to take a database that has catastrophically failed, or has replication lag or something, out of service. We always choose, at Acquia, to ensure that all your data transactions are committed. We'd rather incur a minimal service disruption than have any data loss, where you're potentially losing a file a user is uploading, or an account being created, or some other … We have people building software-as-a-service businesses on top of us, so data-loss protection is very important to us, and so we utilize a DNS switch mechanism to make sure that database traffic all flows to the other database server.
For the larger, multi-region sites, we actually use a manual DNS switch to switch from one region to the other. This prevents flapping, where an issue turns into something even worse, such as having data written to both regions. The DNS switch allows us and our clients to fail their website over when they choose to, and then, when everything is status quo again, they can fail back.
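The two ideas above, preferring a short disruption over data loss and avoiding flapping, can be combined in a small decision sketch. Everything here is an assumption made for illustration: the class and host names are hypothetical, the rule "only switch when the secondary has zero replication lag" is one possible way to protect committed transactions, and no real DNS API is called. Unlike the fully manual multi-region switch described in the talk, this sketch fails over automatically and keeps only the fail-back manual.

```python
class DnsFailover:
    """Tracks which database host a DNS name should point at."""

    def __init__(self, primary: str, secondary: str):
        self.primary = primary
        self.secondary = secondary
        self.active = primary

    def report(self, primary_healthy: bool, secondary_lag_s: float) -> str:
        """Fail over when safe, but never fail back automatically."""
        if self.active == self.primary and not primary_healthy:
            if secondary_lag_s == 0:
                self.active = self.secondary
            # Otherwise keep pointing at the primary and accept a short
            # disruption rather than lose transactions the secondary
            # has not applied yet.
        return self.active

    def fail_back(self) -> str:
        """Manual step, run by an operator once things are stable again."""
        self.active = self.primary
        return self.active


if __name__ == "__main__":
    dns = DnsFailover("db-primary", "db-secondary")
    print(dns.report(primary_healthy=False, secondary_lag_s=5.0))  # db-primary
    print(dns.report(primary_healthy=False, secondary_lag_s=0))    # db-secondary
    print(dns.report(primary_healthy=True, secondary_lag_s=0))     # db-secondary
    print(dns.fail_back())                                         # db-primary
```

Because the primary recovering does not move traffic back by itself, a node that is bouncing up and down cannot cause writes to alternate between the two hosts.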
As I said before, it's very important to test all of your procedures, and this includes your failover process. It should be scripted so you can fail over to your secondary database server, or shut down one of your Web nodes and have it auto-heal itself. People like Netflix are brilliant about this: they have their Simian Army, as they call it, with which they can randomly shut down servers, and shut down entire zones, and ensure that everything recovers.
There are a lot of best practices out there in terms of actually testing failover, and these failover systems and the extra redundancy that you've added to eliminate single points of failure are key in other, non-disaster scenarios. Maybe you are upgrading your version of Drupal, or you're rolling out a new module and you need to add a new database or alter a table, going through that process within Drupal.
You can fail over to one of your given database nodes and then apply the module's schema changes to that node without impacting your end users. There are ways to use these systems in your normal course of business to make sure that you use the available nodes to their full capacity and minimize the impact on your stakeholders.
Jess, do you want to talk about why you wouldn't want to do everything yourself?
Jess: Sure, yeah. Thank you so much, Andrew. I think that was a really good overview, and hopefully people on the phone were able to take some notes and think about, if you want to try this yourself, what are the best practices that you should be following.
Of course, Acquia Cloud exists, and as I'm in marketing I would be remiss not to talk about why you'd want to look at Acquia. The reasons why our customers have chosen to leave DIY mainly fall into these three buckets. One is that they don't have a core competency around hosting, let alone high availability, and if that core competency doesn't exist, it's much easier and much more cost-effective to work with a provider who has it as their core competency and can provide the infrastructure resources as well as the support for it.
Another main reason people will come to Acquia is they don’t have the resources or have no desire to have the resources to support their site meeting 24x7 resources available in order to make sure that the site is always up and running optimally, so Acquia is in a unique position to respond to both Drupal application issues as well as the infrastructure issues. We don’t make code changes for our customers but we always are aware of what's going on with your site, and can help you very quickly identify the root cause of an outage and resolve it quickly with you.
Then one of the other reasons is it can be a struggle when you're trying to do this yourself, either hosting on premise and you have purchase servers from someone or if you’ve actually gone straight to Amazon or Rackspace. Oftentimes people have found themselves in between sort of blame game and a lot of finger-pointing if the site goes down, their instinct would be to call the provider and if that provider says, "Hey, it's not us, lights are on, you have service," then you have to turn around and try to talk to your application team, what's wrong, and so there can be a lot of back and forth, a lot to time wasted and what you really is your site up and running.
Those are reasons to not try and do this yourself. Of course you're welcome to, but if you try and you haven’t had success, the reasons to go forward with Acquia are our White Glove service, so, again, fully managing on a 24x7 basis the Drupal application support as well as the infrastructure support, as well as our Drupal expertise. We have about 90 professionals employed here at Acquia across operations, who are able to scale up and down your application.
We have engineers, we have Drupal support professionals, and they can help you either on a break-fix basis or in an advisory capacity to understand what you should be doing with your Drupal site between the code and configuration to make it run more optimally in the Cloud, so that’s a great piece of what we offer. Of course, there is also everything covered today in terms of our high availability offerings and our ability to create full backups and redundancy across availability zones as well as Amazon Regions.
We are getting to the close here, if you have some questions I'd encourage you to start thinking about them and put them into the Q&A.
The last two slides here just showcase the two options that we have if you would like to look at hosting with Acquia. Dev Cloud is a single-server, self-service instance, so you have a fully-dedicated single server, you manage it yourself and you get access to all of our great tools that allow you to implement continuous integration best practices.
This screen shot you're seeing here is just a quick overview of what our user interface looks like for developers. We have separate dev, staging and prod environments pre-established for our customers, and very easy-to-use drag and drop tools that allow you to push code, files and databases across the different environments while implementing the necessary testing in the background, to make sure that you have never made a change to your code that could potentially harm your production site.
The other alternative is Managed Cloud, and this is the White Glove service offering where we promise your best day will never become your worst day, with some unplanned traffic spike that ends up taking your site down. We'll manage your application and your infrastructure for you. Our infrastructure is Drupal tuned with all the different aspects that Andrew has talked about. We've used exclusively Open Source technologies as part of what we add to Amazon's resources, and we've made all the decisions that need to be made to ensure high availability across all the layers of your stack.
With that, we'll get to questions, and we have one that came in. "Can you run your own flavor of Drupal on Acquia's HA architecture?"
Andrew: The answer is, yes. You can use any version of Drupal and I think we are running Drupal 6, 7 and 8 websites right now. You can install any Drupal modules you want, and we have a list of which HA extensions we support. We support most popular modules out there. There's always going to be a day when maybe there is some random security module or some media module that needs something, and we may need to go install it for you or recommend alternatives. You can … people have taken Drupal and just … for lack of a better word, just bastardized it, and built these kind of crazy applications on top or rewritten chunks of it, and that also works with our HA architecture.
Our expertise is in Drupal Core, but our professional services and our technical account managers are great at analyzing applications and understanding how to improve their performance, so by now we support pretty much any … the platform can host any PHP application, or static application. It's optimized for Drupal, but the underlying MySQL and file system and Memcache and all these different requirements for a Drupal website; the HA capabilities of that work across the board.
Jess: And we do have multiple instances where customers have come to us, and they’ve got their application running in our Cloud environment fine, but they came to us from hosting directly with Rackspace or Amazon and they found it to be either unreliable or it just wasn’t cost-effective for them because of the amount of resources that had to be thrown at the custom code.
Another good thing about Acquia is that by becoming a customer you can have access to all these tools that help test the quality of your code and configuration. When you have extensive amounts of custom code brought into our environment, we can help you quickly figure out how to tune it, and if there are big issues that are the culprit for why you would need to constantly increase the amount of resources you're consuming, we can let you know what those issues are and we can do a site audit through [PS 00:38:37] like Andrew mentioned.
Our hope and our promise to our customers is that you're lowering your total cost of ownership if you're giving us the hosting piece of it along with the maintenance. If there's a situation for any of our customers where we are continually having to assign more resources because of an issue with the quality of your application, that’s when we'll intervene and suggest, as a cost-savings measure, that you work with our PS team to do a site audit so we can help you figure out how to make the site better and therefore use fewer resources.
Andrew: In a lot of cases we can just throw more hardware at a problem and have that be a Band-Aid, but it's in both our best interest and the best interest of the customer, in terms of both their [builds 00:39:17] as well as having an application that will last for many, many more years, to have our team recommend: this is what you should not have done, this is how you can best use this module, or this other recommendation, to go have a more highly-optimized website for the future.
Jess: The next question is, "Why did Acquia choose Amazon to standardize the software, Cloud, on?"
Andrew: Acquia has been around for the past four or five years and Amazon was the original Cloud company. I was at the Amazon re:Invent conference a couple weeks ago and one of the key agencies there said, "Amazon is number one in the Cloud and there is no number two." We chose Amazon because it was the best horse at the time, and we continue to choose Amazon because it's still the best choice.
Amazon is … their release cycle for new product features and new price changes and all these things is accelerating. They continue to invest in new regions, and Amazon is still a great choice to go reduce your total cost of ownership by increasing your agility and your velocity to go build new websites and deploy new things, and move things off your traditional IT vendor to the Cloud, and so we are very, very strong believers in Amazon.
Jess: "Does Acquia have experts in all tiers … of a Drupal architecture across the database, the OS, caching?" Then the marketing person is going to take a stab at this, where it's a [crosstalk 00:40:50]…
Andrew: We definitely have experts at all different levels. The RBS team may go and we have some Red Hat experts, we have some … a bunch of experts so they can go recommend different options, for people who don’t host with us. Internally we are all gone to based-hosting so that that may be the expertise about operations staff. For database, we have operations staff dedicated just to MySQL. We have support contracts with key MySQL consulting or software companies for any questions that we can't handle.
It's one of the ways that we go scale if you don’t have to go pay at the corner of the world a 10 grand fee for something that we can just go ask them. For caching, we have people that have … helped design some of the larger Drupal sites out there and lived through them being under heavy traffic storms, people that may go contribute to Drupal Core caching modules, be it Memcache or Redis caching and all these different capabilities. With [Aegir 00:41:56], we don’t have to use Aegir internally but we do interact with it and support it. A lot of our big university or government clients may be using Aegir in their internal IT department and they may go and choose to use us for maybe some of the flagship sites or for some other purpose. Yes, we do have experience across the board.
Jess: [Ken 00:42:22], unless you have any questions that came in straight to you, that looks like the rest of the questions that we have for the day. Hopefully you found this valuable. You’ve got 15 minutes back in your day, and hopefully you can find good use for that.
Thank you so much, Andrew, for joining us, I really appreciate it and the content was great.
Andrew: Thank you.
Jess: Again, thanks everybody for your time; and the recording will be available within 48 hours if you'd like to take another listen, and you can always reach out directly to Andrew or myself with any further questions. Thank you.
Andrew: Thanks everybody.