How to Scale MySQL in Support of Drupal [October 23, 2012]

CDNs, caching (such as with memcached or Redis), and read-only MySQL slave servers can scale Drupal for anonymous, read-only traffic. However, these approaches do not help scale Drupal to handle increased dynamic page generation (such as for logged-in users) or heavy loads (such as high comment volume). Also, Drupal 7's support for read-only slave servers is incomplete, with many contrib modules not supporting it at all.

This presentation will describe a solution for horizontally scaling MySQL in a way that allows elastically scalable read and write capacity while maintaining 100% compatibility with existing Drupal core and contrib modules.

In this webinar you will learn how to:
• Improve your Drupal site's response time under heavy load.
• Handle increased data volume as your site gains popularity.
• Dramatically reduce the cost of operating the database servers that power your site.

Publish on date: Wednesday, October 24, 2012

Speaker 1: Hi, everyone. Welcome to today's webinar, How to Scale MySQL in Support of Drupal, with Amrith Kumar, the CTO at ParElastic, and Barry Jaspan, the Lead Architect here at Acquia.

Barry: Hi, I'm Barry Jaspan. I'm the Lead Architect for Acquia Cloud. We've been working to scale Drupal and host it for several years now. Everyone likes to think that if you put something in the cloud you immediately get complete and automatic horizontal scalability of all components, and that would be terrific. It turns out that some parts of hosting Drupal are easily scalable and some are not. The database has traditionally been one of the very, very difficult parts. With MySQL, you can use a bigger and bigger server, and eventually you hit the wall: you can't get bigger servers. Lots of people have come up with lots of possible solutions to this. Obviously, there's database sharding, and there are various NoSQL solutions. There are systems like Cassandra that are fundamentally different kinds of databases and that may scale horizontally better. Drupal doesn't use those; Drupal is really designed to work with a relational database. So the hunt has been on for the best way to move the pieces around among what's possible while keeping a relational database abstraction, and to get the best combination of working with existing systems and giving us horizontal scalability. There are lots of different players in this space.

When I was introduced to ParElastic, it looked to me like a pretty compelling option, so we've been working with them some. As a disclaimer: I'm on their technical advisory board, and I have a very small amount of stock in the company. I'm doing that because it looked interesting to me. It may or may not be the best solution for all possible cases, but it looked like a really good way to scale up a system like Drupal, which really wants to keep using a relational database and have it look pretty much like a normal single-database install. So with that, I'm going to hand you off to Amrith, and he's going to talk about it.

Amrith: My name is Amrith. I'm the founder and CTO of ParElastic. For the next 40 minutes or so, we're going to be talking about databases, database scalability, and how that impacts your Drupal site. I'd like to make this as interactive as possible, so if you have short questions, post them in the Q&A tab and we'll try to get to them immediately if they're topical, or otherwise at the end of the presentation.

Thanks, Barry, for the intro. You have probably all experienced situations where your site becomes popular and its response time is adversely affected by the performance of your database.

Drupal, as an application, relies on a database. We're going to be talking about several ways to improve the scalability of your site: some solutions reduce the load on the database, and some scale the database itself. I also want to talk to you about the ParElastic solution, which helps improve your Drupal experience as a whole.

Drupal is a very sophisticated web application, and everything you see on a Drupal page is in fact stored in a database somewhere, normally a MySQL database. When you request a page from Drupal, some portions of it may come out of the database. When people interact with your site, some of the things they do eventually get stored in the database, and that's the MySQL database under the covers. Depending on how heavily you load your database, your response time may suffer, and a lot of very novel techniques have been used to improve database response.

On this slide we look at some of those techniques. One of them is caching, or a CDN. When you have content that can be distributed across the network, you put it out at various places, using something like Akamai, and you make sure that when somebody wants to access that content, they get it from somewhere near them. If the CDN doesn't help, you may be able to use an application accelerator such as Varnish; if that doesn't help, you may use something like memcached. But when you look at the structure, the reason we put things like Varnish and memcached above MySQL is largely to reduce the load on the database. You have a trade-off here. On a typical Drupal site, an anonymous user tends to get most of their content from either the CDN or Varnish. For an authenticated user, somebody who logs in to your site, the request may in some cases be served from the CDN or Varnish; in other cases it goes all the way back to your database.
If you're logged in to a website, you expect personalized content, and cached content is not good enough. That's why, to improve interactivity with your site, you do in fact have to go all the way back to the database in some cases. Over time, we've noticed that anonymous users are becoming less common. More and more people want to interact with your site and expect to receive content that is customized for them, and therefore a larger and larger load is placed on your database at all times. You'll also notice that a lot of people treat something like Facebook as a de facto single sign-on: they're signed in to Facebook, they have a cookie on their system, and they expect the content they get to be customized for them. So you tend to see a larger load on your underlying database, and in a lot of cases that can cause an issue for your site.
We talked about several alternatives to reduce the database load: CDNs, accelerators, and caching. The shortcoming is that many of these work only when the user is anonymous; they're not a good solution for an authenticated user. Also, they work very, very well when people are just reading or you're just rendering a website, but not very well when people are actually interacting. If they're posting a comment, you still need to go all the way back to the database. At the end of the day, scaling your database is really what you need to do.
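The trade-off described above can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not Drupal's actual page cache: anonymous requests are served from a shared page cache (standing in for Varnish or memcached), while authenticated requests always reach the database. The names `PageCache`, `render_page`, and `query_database` are illustrative assumptions.

```python
from typing import Callable, Dict, Optional


class PageCache:
    """A toy shared page cache standing in for Varnish or memcached."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def get(self, path: str) -> Optional[str]:
        return self._pages.get(path)

    def set(self, path: str, html: str) -> None:
        self._pages[path] = html


def render_page(path: str, user: Optional[str], cache: PageCache,
                query_database: Callable[[str, Optional[str]], str]) -> str:
    """Serve anonymous users from the cache; personalize for logged-in users."""
    if user is None:
        cached = cache.get(path)
        if cached is not None:
            return cached  # cache hit: the database is never touched
        html = query_database(path, None)
        cache.set(path, html)  # populate the cache for the next anonymous visitor
        return html
    # Authenticated users expect personalized content, so every request
    # goes all the way back to the database.
    return query_database(path, user)
```

With 100 anonymous requests for the same page, the database is queried once; with 100 logged-in requests, it is queried 100 times. That is why a growing share of authenticated traffic pushes load back onto MySQL.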

There are several alternatives to scaling your database and we have a couple listed here on the slide:
Going from left to right, one option is to get a bigger and bigger server. MySQL is a fine database, and on really, really powerful hardware, MySQL can perform very well. One of the downsides with large hardware, especially if you happen to be running in the cloud, is that the hardware gets exponentially more expensive while the performance does not grow at the same rate. In other words, you can pay a lot more for a server but may not get proportionally more performance.

The other thing, as Barry mentioned earlier, is that there is a finite limit on how big a server you can get. The largest server you can get on Amazon, for example, is sometimes not big enough for your site or your database. What do you do then? That is the issue with the scale-up approach, if you will. Scale-up is a scalability approach where you just go to larger and larger hardware. The issues with scale-up are cost and a hard limit: you can't go beyond a certain size.

Some people have come up with a solution called "sharding." The middle column on the slide shows what sharding looks like. Sharding, for those of you who are not familiar with it, is an application architecture where the application distributes its data into shards. Shards are independent databases, and the application takes on the responsibility of saying, "If I want to get this data, which database do I need to go to?" The picture looks awfully complicated. It's not meant to scare you; that really is what sharding looks like. The thing at the very top is your Drupal application, and between your application and the collection of databases there is this whole infrastructure which is, among other things, complex, fragile, and very costly for you to write. Also, if you go down this route, it's not standard. What you're really after is scaling the database; what you end up with is a fragile solution that may not in fact scale your database.
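To make "the application takes on the responsibility" concrete, here is a minimal, hypothetical sketch of application-level shard routing in Python. The shard connection strings and the choice of CRC32 over the user ID are assumptions for illustration. In a sharded design, every query site in the application has to call something like `shard_for` before it can talk to a database.

```python
import zlib
from typing import List

# Hypothetical shard connection strings; a real deployment would also hold
# open connections, credentials, failover settings, and so on.
SHARDS: List[str] = [
    "mysql://db-shard-0/drupal",
    "mysql://db-shard-1/drupal",
    "mysql://db-shard-2/drupal",
    "mysql://db-shard-3/drupal",
]


def shard_for(user_id: int, shards: List[str] = SHARDS) -> str:
    """Pick the shard that owns this user's rows.

    CRC32 gives a hash that is stable across processes (unlike Python's
    built-in hash() for strings), so every web node agrees on placement.
    """
    key = str(user_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]


def shards_for_query(user_ids: List[int]) -> List[str]:
    """A query touching several users may fan out to several shards,
    and the application must merge the results itself."""
    return sorted({shard_for(uid) for uid in user_ids})
```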

Also, as Barry suggested, some people have attempted to use Cassandra or other NoSQL databases. These require wholesale rewrites of Drupal. They are totally non-standard, and in many cases they're unproven and risky. I would be the first to admit that you can optimize a single code path with a NoSQL database and make it much, much faster than any relational database ever can. The downside of this approach is that every other code path through the application becomes ridiculously slow.

I'll give you just one example: somebody I spoke with tried to scale Drupal using a NoSQL database. They were able to get very good response time on typical page loads and modestly good response time on comments. But they had to disable the tag cloud entirely, because the tag cloud brought the server to its knees. That's the fundamental problem with NoSQL databases: you optimize a primary code path, but other code paths become totally sub-optimal. That's also one of the good things about a relational database: you have a normalized data structure under the covers, which is the way Drupal stores data in the database, and you can ask general queries in a standard query language and get good response time for most queries.
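The tag cloud is a good example of an ad hoc question that a normalized relational schema answers with one standard query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL. It uses a simplified, hypothetical two-table schema (`tags` and `node_tags`) rather than Drupal's real taxonomy tables. The SQL itself is standard and should run unchanged on MySQL.

```python
import sqlite3
from typing import List, Tuple


def tag_cloud(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, int]]:
    """Return (tag name, number of tagged nodes), most-used tags first."""
    sql = """
        SELECT t.name, COUNT(nt.nid) AS uses
        FROM tags AS t
        JOIN node_tags AS nt ON nt.tid = t.tid
        GROUP BY t.tid, t.name
        ORDER BY uses DESC, t.name
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


def demo() -> List[Tuple[str, int]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tags (tid INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE node_tags (nid INTEGER NOT NULL, tid INTEGER NOT NULL,
                                PRIMARY KEY (nid, tid));
    """)
    conn.executemany("INSERT INTO tags VALUES (?, ?)",
                     [(1, "drupal"), (2, "mysql"), (3, "scaling")])
    # Node-to-tag links: three nodes tagged "drupal", two "mysql", one "scaling".
    conn.executemany("INSERT INTO node_tags VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 3)])
    return tag_cloud(conn)
```

`demo()` returns `[('drupal', 3), ('mysql', 2), ('scaling', 1)]`. No special access path had to be designed in advance for this question; the join and aggregate are expressed directly in SQL.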

The implication of these options is that if you go down the sharding or NoSQL path, for example, you're writing totally custom code, and you don't have standard module support for it. If you just get bigger servers, you're limited by the size of the server. In some cases that may be big enough, but in many cases it has proved not to be. There are well-documented situations where even the largest server available to you is not big enough once your site really becomes popular. When that happens, you really need to look for another kind of solution.

Another kind of solution, which some people have talked about, is database replication with masters and slaves. On the slide, you see a client at the very top talking to Drupal. Drupal sends all of its writes, and most of its reads, to the MySQL master, which replicates to a collection of slaves, and Drupal directs some reads to one of the slaves. The solution works okay in some cases, but now, when Drupal does a read from a slave, it has to know whether the relevant writes have been replicated to that slave yet. You may not get the latest content on every read. In some cases this is acceptable; in many cases it is not, and therefore we need to look at other alternatives.
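The consistency problem with read/write splitting can be shown with a toy model. This is a hypothetical Python simulation, not MySQL's replication protocol. The slave applies the master's writes only after a fixed number of ticks (the `lag` parameter is an assumption), so a read sent to the slave immediately after a write can return stale data.

```python
from typing import Dict, List, Optional, Tuple


class ReplicatedStore:
    """A master that ships writes to one slave after a fixed delay."""

    def __init__(self, lag: int) -> None:
        self.lag = lag
        self.clock = 0
        self.master: Dict[str, str] = {}
        self.slave: Dict[str, str] = {}
        self._pending: List[Tuple[int, str, str]] = []  # (apply_at, key, value)

    def tick(self) -> None:
        """Advance time and apply any writes whose delay has elapsed."""
        self.clock += 1
        still_pending = []
        for apply_at, key, value in self._pending:
            if apply_at <= self.clock:
                self.slave[key] = value
            else:
                still_pending.append((apply_at, key, value))
        self._pending = still_pending

    def write(self, key: str, value: str) -> None:
        # Writes always go to the master; the slave sees them later.
        self.master[key] = value
        self._pending.append((self.clock + self.lag, key, value))

    def read(self, key: str, from_slave: bool = True) -> Optional[str]:
        # Offloading reads to the slave risks returning an old value.
        source = self.slave if from_slave else self.master
        return source.get(key)
```

Right after `write("comment:1", "Hello")`, `read("comment:1")` returns `None` until `lag` ticks have passed, while `read("comment:1", from_slave=False)` returns the new value. The application has to decide, per query, whether that staleness is acceptable.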
To recap the scalability alternatives we've talked about so far: with replication, the one we discussed most recently, reads are targeted at slaves and writes at the master. Standard MySQL replication runs between the master and the slaves, and the challenge is that you don't know in all cases whether you're getting the latest content.

There's the option of sharding, which is an application architecture. You have to make wholesale modifications to Drupal, and once you go the route of sharding, you won't always be able to get a consistent, holistic view of your data.

There's the option of scaling up to larger hardware, which comes with the challenges that it's ridiculously expensive in many cases and that sometimes you can't scale up far enough.

Finally, there's the option of a NoSQL or NewSQL database, where you have to modify the Drupal application substantially. It's no longer a standard Drupal application, and you're on your own as far as maintaining it. To summarize, these options bring a combination of: high operating costs, because a large server is very, very costly to operate; high maintenance costs, because once you've modified Drupal's code, you're on the hook to maintain every piece of it; lack of standards, because NoSQL has no standard query language like SQL, so you need to learn the specific APIs of whichever NoSQL implementation you choose; and dramatically increased risk, because not all of these solutions have had the same industry rigor, if you will, that MySQL has had. MySQL has been around for well over a decade and is, for the most part, a very, very stable database.

All of these things have severely hurt the time to market of people who have gone down these paths. What people are really looking for is a solution that just works with Drupal or other web applications and makes the database scale easily in the cloud.

Let me talk about ParElastic, because that's exactly what we do. Drupal relies on relational databases like MySQL; Drupal is coded to work with MySQL. The challenge is that MySQL is not always perfectly scalable in the cloud. What ParElastic does is take a group of database servers and make them work together as if they were a single database server.

Let me say that again: we take a group of servers, distribute the load horizontally across them, and make all of those servers appear as if they were a single server. ParElastic is built for people who build interactive web apps, digital gaming, e-commerce, social networks… all of the kinds of things people use software like Drupal for. If you are using Drupal multi-site or building a multi-tenant SaaS application, ParElastic is ideal for you, and I will talk about some of the reasons why.

Finally, if you wish to offer a database as a service, or some hosted application as a service, ParElastic is ideally suited for you because it virtualizes the database. The reason you need this is that we all recognize database scalability as a serious impediment to the way you innovate and grow your application, and we want to mitigate the risks of the other solutions: cost, complexity, and lack of standardization.

Let's talk about how ParElastic does it. ParElastic allows you to scale your database on demand. Depending on the load Drupal is placing on your database, you can add storage and you can add processing, and, most importantly, because you can add and remove these on the fly, you only pay for what you use. Also very important: we are not creating a brand-new database; we're just using existing MySQL. If you're running in the Amazon cloud, this could be EC2 instances running MySQL, an RDS instance, or a collection of RDS instances. We take a collection of database instances and make them work together as if they were a single database instance. This dramatically reduces your risk and your disruption. It's a database you know and love, so all of your existing knowledge and skills around databases carry over. Since ParElastic makes a group of MySQL databases look like a single MySQL database, the Drupal application just perceives a single MySQL database and it just works. That's the simplicity of it for Drupal.

What the Drupal application sees, on the left-hand side of the slide, is a single MySQL database. Under the covers, it's actually a collection of MySQL databases all working together as if they were a single database, and that's what ParElastic does. We virtualize the database for you, so you just interact with what you think of as a single database, and we take care of scalability behind the scenes.
How does all this work?

Let's start with a couple of slides about your current configuration. You have Drupal, and it talks to a standard MySQL server through PDO. That's what you have today. With ParElastic, you make no change to Drupal and no change to your PDO setup; you just talk to ParElastic. It's absolutely as simple as that.
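In Drupal 7 the database connection is configured in the `$databases` array in `settings.php`, and Drupal's MySQL driver builds a PDO DSN from it, roughly as sketched here. The following Python sketch mirrors that idea to show what "no change to Drupal" means in practice. Assuming the proxy accepts the same database name and credentials, only the host and perhaps the port change. The hostnames, port numbers, and the helper name `pdo_mysql_dsn` are hypothetical. The `mysql:host=...;port=...;dbname=...` string follows PDO's MySQL DSN format.

```python
from typing import Dict, Union

Settings = Dict[str, Union[str, int]]


def pdo_mysql_dsn(settings: Settings) -> str:
    """Build a PDO-style MySQL DSN from Drupal-like connection settings.

    Falls back to MySQL's standard port 3306 when no port is given.
    """
    port = settings.get("port") or 3306
    return f"mysql:host={settings['host']};port={port};dbname={settings['database']}"


# Today: Drupal talks to a single MySQL server (hypothetical host name).
direct: Settings = {"host": "mysql-01.internal", "database": "drupal",
                    "username": "drupal", "password": "secret"}

# With a MySQL-compatible proxy in front (hypothetical host name and port),
# and assuming it accepts the same database name and credentials, only the
# endpoint differs; the application code is untouched.
proxied: Settings = dict(direct, host="parelastic.internal", port=3307)
```

`pdo_mysql_dsn(direct)` yields `mysql:host=mysql-01.internal;port=3306;dbname=drupal`, and `pdo_mysql_dsn(proxied)` yields `mysql:host=parelastic.internal;port=3307;dbname=drupal`.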

Behind the scenes, there's a collection of MySQL servers. Again, if you're running in Amazon's cloud, they could be EC2 instances running MySQL, or they could be RDS instances. ParElastic transparently partitions your data across this collection of databases, and all of these databases then work together as if they were one single database. The application doesn't see that there are multiple databases.
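The following is a heavily simplified picture of "many databases that look like one." It is a hypothetical Python sketch, not ParElastic's design, which the talk does not detail. Rows are partitioned by primary key across several in-memory SQLite databases standing in for MySQL servers. The caller gets a single `insert`/`get_title`/`count` interface without knowing how many partitions exist.

```python
import sqlite3
from typing import List


class PartitionedTable:
    """Spread one logical table across N databases behind a single interface."""

    def __init__(self, partitions: int) -> None:
        self.conns: List[sqlite3.Connection] = []
        for _ in range(partitions):
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
            self.conns.append(conn)

    def _conn_for(self, nid: int) -> sqlite3.Connection:
        # Simple modulo placement, chosen only to keep the sketch short.
        return self.conns[nid % len(self.conns)]

    def insert(self, nid: int, title: str) -> None:
        conn = self._conn_for(nid)
        conn.execute("INSERT INTO node (nid, title) VALUES (?, ?)", (nid, title))
        conn.commit()

    def get_title(self, nid: int) -> str:
        # A point lookup touches exactly one partition.
        row = self._conn_for(nid).execute(
            "SELECT title FROM node WHERE nid = ?", (nid,)).fetchone()
        if row is None:
            raise KeyError(nid)
        return row[0]

    def count(self) -> int:
        # Fan the query out to every partition and combine the partial results.
        return sum(c.execute("SELECT COUNT(*) FROM node").fetchone()[0]
                   for c in self.conns)
```

After inserting nodes 1 through 10 into `PartitionedTable(3)`, `count()` returns 10 even though no single partition holds more than four rows.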

A very important aspect of ParElastic, and part of the intellectual property in our patent, is that depending on the workload on your system and the amount of data you have, you can add additional MySQL servers. Contrast this with something like sharding. If you have some number of shards in your application and you decide you want more shards, you have to go through a very time-consuming and risky operation called resharding, where you literally redistribute all of your data onto the new set of servers. Our patented technology lets you distribute data across a collection of servers such that when you add new servers, you don't need to do any redistribution.
That, I think, is a very important aspect of our product, and it makes it particularly applicable to a Drupal site as the site grows. Over time a Drupal site accumulates more and more data: you publish more content, you get more comments, and you have more data in your MySQL database. You can't start off with the number of database servers you expect to need in some number of years. You start with the number of servers you think you'll need in the next six months, and over time you grow to whatever you need. The important thing is that you can add servers without redistributing data, which is very important if you want a truly elastic architecture.
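The cost of resharding is easy to quantify for the most common naive scheme, which places each row on server `key % N`. The Python sketch below illustrates the problem only; it does not show how ParElastic avoids redistribution, which the talk attributes to patented technology. It counts how many rows land on a different server when a fifth server is added to four.

```python
def fraction_moved(keys: range, old_servers: int, new_servers: int) -> float:
    """Fraction of keys whose server changes under modulo placement."""
    moved = sum(1 for k in keys if k % old_servers != k % new_servers)
    return moved / len(keys)
```

For keys 0 to 99,999, `fraction_moved(range(100_000), 4, 5)` returns `0.8`. A key stays put only when `k % 4 == k % 5`, which happens for 4 of every 20 consecutive keys. So 80% of the data must be copied to a different server just to add 25% more capacity.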

The other part of our system, protected by another patent, is the ability to distribute processing across a collection of servers, shown here on the left of your screen in orange. When you have variable workloads that require processing, ParElastic can spin up additional databases, again off-the-shelf MySQL database instances, where we do this processing. There is no persistent data and no application involved; they are used only to perform processing on the fly. That's part of how ParElastic is elastic and able to deal with the highly variable workloads your application might face at any point in time.

Now, when you look at something like Drupal multi-site, you have multiple websites, multiple web properties. In a traditional Drupal multi-site setup, each site has its own database, that database sits on a server, and some number of sites might all share a single server. With ParElastic it's a little bit different: each site has what it believes is a database of its own, a virtual database, and that virtual database is distributed on ParElastic infrastructure, some combination of storage and processing that can grow and shrink based on the demand on your system at any point in time. For Drupal multi-site we have some special optimizations, which are particularly good if you are an enterprise with multiple web properties on Drupal.

Finally, nobody wants an outage on their site. Those of you who are using Amazon probably know that there were some outages in the last couple of days; high availability is really, really important for today's websites. ParElastic itself is highly available. You can run multiple instances of ParElastic, you can run them in multiple availability zones, and you can distribute your load across all of them with a standard network load balancer.
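Spreading load across several engine instances amounts to plain load balancing, provided each instance can stand in for any of its peers. Here is a minimal, hypothetical round-robin balancer in Python that skips instances failing a health check. The instance names are assumptions. In practice this job belongs to a standard network load balancer rather than to application code.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Rotate across healthy backends, skipping any marked as down."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = backends
        self.healthy: Dict[str, bool] = {b: True for b in backends}
        self._cycle = itertools.cycle(backends)

    def mark(self, backend: str, is_healthy: bool) -> None:
        # Called by a health checker when a backend goes down or comes back.
        self.healthy[backend] = is_healthy

    def pick(self) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy backends available")
```

With `["engine-a", "engine-b", "engine-c"]` and `engine-b` marked down, successive `pick()` calls alternate between `engine-a` and `engine-c`. The failed instance simply stops receiving traffic.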

Also, because we don't make any changes to the underlying database (it's standard MySQL under the covers), you can run MySQL replication on your storage side as well. So you can have a primary copy in one availability zone and a mirror in another availability zone, with multiple ParElastic instances talking to them. If at any point you face a failure in your underlying infrastructure, your site is not compromised.

Again, the Drupal application running up top needs to know very, very little about any of this, because with ParElastic the entire database is virtualized and we deal with all of it for you.

To quickly summarize: we adaptively provision resources based on your demand. You don't have to provision for peaks, and you don't have to provision storage for your demand two years down the road; you provision for what you need today. ParElastic automatically distributes data across all of these servers, and since the data is distributed across a collection of servers, we can scale both reads and writes. If you're using Drupal multi-site or building a SaaS application, the ability to virtualize your database is really, really important. Finally, ParElastic makes all of your data, no matter which server it happens to reside on, appear as one single database, so you get the equivalent of cross-shard operations in a sharded architecture.

In a typical sharded architecture, you query one shard or another, and it's up to the application to consolidate the results. You don't have to do anything like that with ParElastic. You just write your query as if you were talking to a single database, and it's up to ParElastic to give you the single answer.
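For contrast, this is the kind of consolidation code a sharded application has to write for itself. Suppose a hypothetical "ten most recent comments" query is sent to every shard with `ORDER BY created DESC LIMIT 10`. The application must then merge the per-shard lists into one correctly ordered answer. The Python sketch below uses `heapq.merge` from the standard library; the data shapes are assumptions.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

Comment = Tuple[int, str]  # (created timestamp, comment text)


def top_n_across_shards(per_shard: Iterable[List[Comment]], n: int) -> List[Comment]:
    """Merge per-shard results, each already sorted newest-first, into one top-N list.

    Each shard must return its own top n rows; otherwise a row that belongs
    in the global top n could be missing from the input.
    """
    merged = heapq.merge(*per_shard, key=lambda c: c[0], reverse=True)
    return list(islice(merged, n))
```

Given shard results `[(9, "c"), (4, "a")]` and `[(7, "b"), (1, "d")]`, `top_n_across_shards(..., 3)` returns `[(9, "c"), (7, "b"), (4, "a")]`.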

The important benefit to you: without ParElastic, you end up provisioning for peak demand, which in many cases means provisioning a lot of excess capacity. With ParElastic, you provision only for what you need and change your provisioning dynamically based on the load on your system. This has a real impact on your bottom line.
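The difference between provisioning for peak and provisioning for current load is simple arithmetic. In this hypothetical Python sketch, the hourly demand profile and per-server capacity are made-up inputs, not figures from the talk. It counts the server-hours needed when a fixed fleet is sized for the busiest hour versus when the fleet tracks demand hour by hour.

```python
import math
from typing import List


def servers_needed(load: float, capacity_per_server: float) -> int:
    """Smallest number of servers that can carry this load (at least one)."""
    return max(1, math.ceil(load / capacity_per_server))


def server_hours(hourly_load: List[float], capacity_per_server: float,
                 elastic: bool) -> int:
    """Total server-hours over the period.

    elastic=False sizes a fixed fleet for the peak hour and runs it the whole time;
    elastic=True adds or removes servers each hour to match demand.
    """
    per_hour = [servers_needed(load, capacity_per_server) for load in hourly_load]
    if elastic:
        return sum(per_hour)
    return max(per_hour) * len(hourly_load)
```

Consider a hypothetical day with 20 quiet hours needing one server and a 4-hour evening spike needing five. Peak provisioning costs `5 * 24 = 120` server-hours, while tracking demand costs `20 * 1 + 4 * 5 = 40`.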

Finally, unlike the other solutions, sharding or NoSQL, which require you to make changes to your application, ParElastic requires almost no changes to your application. There's a one-time step to get your data into ParElastic, and then your application just works. Without ParElastic, you spend a lot of time building your underlying infrastructure. With ParElastic, you don't have those issues; you just deal with your application and innovate in it, which is really what you want to do anyway.

When we're talking about things like this, it's always important to discuss the performance of the system, so here are some results from a standard benchmark. The red line is the performance of standard MySQL (native) running on an m1.xlarge system in Amazon's cloud. It had about a terabyte of data; I think some of the large tables held about two billion rows. The exact same data was loaded onto 5x m1.xlarge systems with ParElastic in front of them, and in both cases we used standard MySQL 5.5. The graph shows the response time for a standard database transaction with MySQL in red and with ParElastic in green. The Y axis is response time in milliseconds, so higher is worse. You don't want to be waiting three and a half seconds for a response to your queries. With ParElastic, you wait barely half a second: 587 milliseconds. The performance is about six times better. We have five servers, and we pay less than twice as much as a single m1.xlarge, so you have a dramatic improvement in performance here.
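The "about six times" figure follows directly from the two response times quoted for the benchmark, taking "three and a half seconds" as roughly 3,500 ms. A quick check in Python:

```python
native_ms = 3500      # single m1.xlarge running MySQL 5.5 (approximate, from the talk)
parelastic_ms = 587   # ParElastic over 5x m1.xlarge, as quoted

speedup = native_ms / parelastic_ms
print(f"{speedup:.2f}x faster response time")  # prints "5.96x faster response time"
```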

As the load on your system goes up, you don't get the rapidly decaying performance you see with traditional MySQL; you get a gradual increase in response time, which is definitely manageable. What's not shown on this graph is what happens with very small numbers of users. ParElastic as a product sits between your application and the collection of databases, so it introduces a small overhead. With very small numbers of users, for example a single user, or with very, very short queries, that overhead may actually dominate. So that's a quick note on performance. The important thing to note is that here you're distributing the load across five m1.xlarge servers; you could equally well distribute it across much larger servers.

The databases we use in ParElastic are elastically scalable, and depending on the load on your system, you can spin up more servers on demand. The mechanism for doing that is policy-based: you monitor the load on your application and determine how many more servers you want to spin up, or whether you want to spin servers down. From ParElastic's point of view, when we process a query, we look at what resources that query needs and provision those resources from a pool provisioned by whoever is running the application. If you were running ParElastic on Amazon, for example, you would provision a set of servers and inform ParElastic of them. If you determine that your application requires more servers, you spin up more servers, inform ParElastic that they're available for use, and we continue on using them. On the fly, we can also provide you with intelligence that says you actually do need more servers. The numbers shown here are with MySQL on Amazon EC2; the numbers with RDS are similar. The primary difference between MySQL on EC2 on the one hand and RDS on the other is that with RDS you get some of the managed options Amazon provides, while with MySQL on EC2 you have to do that yourself. Performance in both cases is similar. For the numbers shown here, we used standard EBS volumes; even with provisioned IOPS, you don't see a dramatic change in performance.
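As described, scaling is policy-based: the operator monitors load, decides to add or remove servers, and tells ParElastic which servers are available. A threshold policy of that kind might look like the hypothetical Python sketch below. The utilization thresholds, the minimum pool size, and the function name are all assumptions. The function only recommends a pool size; it does not call any ParElastic or AWS API.

```python
from typing import List


def recommend_pool_size(cpu_utilization: List[float], current_servers: int,
                        scale_up_at: float = 0.75, scale_down_at: float = 0.30,
                        min_servers: int = 2) -> int:
    """Suggest a new server count from recent CPU utilization samples (0.0 to 1.0).

    Averaging several samples and moving one server at a time keeps a single
    noisy reading from triggering a large change in the pool.
    """
    if not cpu_utilization:
        return current_servers  # no data: leave the pool alone
    avg = sum(cpu_utilization) / len(cpu_utilization)
    if avg > scale_up_at:
        return current_servers + 1
    if avg < scale_down_at and current_servers > min_servers:
        return current_servers - 1
    return current_servers
```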

What are some of the benefits to somebody who uses ParElastic with an application like Drupal?

The most important one is that there are absolutely no lines of Drupal code that need to be changed. There is a one-time activity you need to go through, which answers "How do you migrate onto ParElastic?" and "How do you make sure that all the modules you want to use create their tables and distribute their data correctly?" But once you've done that, there are absolutely no lines of code to change, and installing Drupal uses the standard Drupal installer. You go back to focusing on innovation in Drupal rather than on how to innovate in the data tier. Your user experience is dramatically better, if you look at this performance chart again: it's not going to be very good if your query response time climbs to three and a half seconds, whereas at half a second you're in much better territory. Your operating cost is much lower because you provision and use only the resources you need at any time. Because we're not a totally new database, just standard MySQL under the covers, your risk is dramatically reduced. And since we deal with all of the issues of scalability for you, your time to market improves dramatically as well.

I've also included some contact information with this presentation, and if there are any questions, now is the time to start putting them in.

Barry: There are two so far. The first, which maybe you covered already, is what happens when a single node fails. You talked about how you use MySQL replication, but maybe not about using ParElastic itself to do replication or duplication. The related question is about ParElastic itself: is ParElastic a single point of failure?

Amrith: OK. Thanks, Barry. Those are both good questions, and I'll try to take them one at a time. First, what happens when a single node fails, and is there built-in redundancy? Let me switch back to the slide on high availability. A single node that fails could be one of a couple of things. It could be a node running ParElastic; those are the nodes on top. Those are not single points of failure, because if one node fails, the query can be redirected to another node. Or, under the covers, the failed node could be a MySQL node, one of the storage nodes in the lower right-hand part of the picture. Again, standard MySQL replication is used there, so if a single node fails, that node has a copy somewhere else. At that point, it's a matter of recognizing that the node has failed, performing a failover, and then directing all of your queries to the alternate node. Replication in this particular case is standard MySQL replication; at this point we're not involved in the replication process itself, though there are some things we're thinking about in that area that would make for product enhancements in the future. The second question was about…

Barry: ParElastic being a single point of failure?

Amrith: Right: whether the ParElastic servers at the top are another single point of failure. It's a very, very good question, and it goes to the real heart of why ParElastic as a product is innovative and cool. If you're building a traditional website and you have a component in that site which is stateless, it's very, very easy to scale it horizontally. For example, if I need more web servers, I can spin them up, because a web server fundamentally has no state beyond the state of the current transaction it's processing. You can have a load balancer and spin web nodes up and down on the fly. The issue is how you spin databases up and down, because they are stateful. Since the ParElastic engine itself is largely stateless, we can have a large number of ParElastic engines as well, and you can dynamically send traffic to any one of many ParElastic instances. All the ParElastic instances talk to the same underlying storage infrastructure, so we're not a single point of failure either. I hope that answers the question. If not, just…

Barry: One way I can address this: we have not deployed ParElastic yet; we are working with it and experimenting with it. All of our cloud database clusters are high availability. We run two masters in an active/passive configuration. All of the reads and writes always go to one of the masters, but if that one fails, then all the reads and writes go to the other master. One is always active; one is passive. What we are planning to do (I believe this is what we’re doing; I’m not directly involved in it) is run the ParElastic engine on both of the database masters. So whichever one you’re talking to, you’re actually talking to ParElastic, and ParElastic may then be talking to both of the database masters, or to a collection of other storage nodes, or whatever else you might be doing.

Effectively, ParElastic doesn’t really change the story of whether or not you have a single point of failure. If you’re running on one MySQL server, you have a single point of failure. If you’re doing what we’re doing, with two database masters, heartbeat or some other method of failover between them, and active/passive replication set up, then you don’t have a single point of failure, and ParElastic will not add one. What ParElastic won’t do, based on what Amrith has explained so far, is magically take a single database server and make it no longer a single point of failure. If you’ve only got one underlying server, that’s your situation. Of course, you wouldn’t use ParElastic if one server was sufficient; under most circumstances that would just be wasted overhead. Actually, with the multi-tenant feature it could be used anyway, but… all right.
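The active/passive arrangement Barry describes depends on detecting that the active master has failed. A minimal Python sketch of heartbeat-based promotion, assuming each master reports a heartbeat timestamp and is declared failed after a hypothetical `TIMEOUT` of 5 seconds without one; the names and timeout are illustrative, not Acquia’s actual configuration.

```python
from dataclasses import dataclass, field
from typing import Dict

TIMEOUT = 5.0  # hypothetical: seconds of silence before a master is declared dead


@dataclass
class MasterPair:
    active: str   # receives all reads and writes
    passive: str  # standby kept in sync by replication (not modeled)
    last_beat: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, master: str, now: float) -> None:
        self.last_beat[master] = now

    def _alive(self, master: str, now: float) -> bool:
        return now - self.last_beat.get(master, float("-inf")) <= TIMEOUT

    def check(self, now: float) -> str:
        """Promote the passive master if the active one has gone silent."""
        if not self._alive(self.active, now) and self._alive(self.passive, now):
            self.active, self.passive = self.passive, self.active
        return self.active


pair = MasterPair("db-1", "db-2")
pair.heartbeat("db-1", 0.0)
pair.heartbeat("db-2", 0.0)
first = pair.check(3.0)      # db-1 beat 3 s ago: still active
pair.heartbeat("db-2", 8.0)  # db-1 has stopped sending heartbeats
second = pair.check(9.0)     # db-1 silent for 9 s, db-2 fresh: fail over
print(first, second)  # db-1 db-2
```

As Barry notes, ParElastic running on both masters would sit above this mechanism; it neither adds nor removes the need for it.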

Amrith: Thanks, Barry. Something you didn’t include, and it’s probably too late to put in here, is what Doug called the “marching app slides,” where you showed how different kinds of transactions, joins, and queries play out across this kind of environment.

Barry: I did not have those slides in this presentation, but I can certainly go over that with anybody who’s interested; I don’t know if we can do that here.

Amrith: OK. You might be able to—

Barry: Some of the questions actually asked for some of that, so I’m going to go into that.

Amrith: One thing I’ll say is, if you have time to put it into the 1:00 PM webinar, people could watch that; they’re both going to be online.

Barry: We’ll do that, OK.

Amrith: There are a couple more questions here about where we store the data, what data is stored where, and so on and so forth. Let me talk about some of that. Where is the data physically stored? We’re not offering a service. What we’re offering is software which you run in your Amazon account, for example, or in your own data center; we have people doing both. Some run it on Amazon’s cloud; some run it from their own data center. As a matter of fact, we have some people running our software in other clouds as well. The data is physically stored wherever you choose it to be. If you happen to be in the UK and are using Amazon’s UK availability zone, that’s where your data is going to be. By the way, ParElastic is not a Drupal-only solution; it’s a generic database solution. So if you have some other database app which is subject to, for example, an EU data protection policy that says identifying information about a person in an EU country can’t move out of the EU, you’re still safe there.

There were a couple of questions about what happens with failures and single points of failure, and I think Barry addressed them. I’m going to talk about one of the other questions, which is: what data is stored in a persistent instance and what is stored in a dynamic instance? Consider what any database does when it processes a query. Let’s talk about simple MySQL, no ParElastic involved: you’re attempting to join two tables, then perform an aggregation, then sort the results and produce the output of the query. The query plans MySQL produces for this often generate what are called temp tables.

What ParElastic logically does is treat the processing nodes as the place where you store temp tables. Your persistent data, every piece of content in your website, is stored in the persistent storage part of ParElastic. When we’re processing queries, if we need to do a join (and this is the important thing: only if we need to, not in all cases) where some data happens to be on one MySQL server and some data happens to be on another MySQL server, which is what some people call a “cross-shard” operation, then and only then will we use the processing side: we get the data from the different places and perform the operation on the processing side. This is particularly useful because MySQL as a database is really, really fast if you are just using it for high-performance indexed access to your data. MySQL performance gets a little worse when you start to do things like joins, and even worse when you start to do things like aggregations and sorts. What ParElastic says is that those operations can be done equally well on the processing nodes, and the processing nodes are truly a dynamic, elastic resource. If you require that kind of processing, don’t bog down your storage servers with it; do it somewhere else.
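The split between pushing a query down to a single storage node and running a cross-shard join on a processing node can be sketched with Python’s built-in `sqlite3` module. Here three in-memory SQLite databases stand in for two MySQL storage nodes and one processing node; the `placement` map, the table contents, and `comment_counts` are hypothetical, and ParElastic’s real query planner is certainly more involved.

```python
import sqlite3

# Two stand-in storage nodes, each holding one table.
shard_a = sqlite3.connect(":memory:")
shard_a.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
shard_a.executemany("INSERT INTO node VALUES (?, ?)", [(1, "Hello"), (2, "Scaling")])

shard_b = sqlite3.connect(":memory:")
shard_b.execute("CREATE TABLE comment (cid INTEGER PRIMARY KEY, nid INTEGER)")
shard_b.executemany("INSERT INTO comment VALUES (?, ?)", [(10, 1), (11, 2), (12, 2)])

# Hypothetical placement map: which storage node holds each table.
placement = {"node": shard_a, "comment": shard_b}


def comment_counts():
    """Join node and comment, count comments per node, sort by count descending."""
    if placement["node"] is placement["comment"]:
        # Same storage node: push the whole query down to it.
        conn = placement["node"]
    else:
        # Cross-shard: copy the needed rows into temp tables on a processing node.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TEMP TABLE node (nid INTEGER, title TEXT)")
        conn.executemany("INSERT INTO node VALUES (?, ?)",
                         placement["node"].execute("SELECT nid, title FROM node"))
        conn.execute("CREATE TEMP TABLE comment (cid INTEGER, nid INTEGER)")
        conn.executemany("INSERT INTO comment VALUES (?, ?)",
                         placement["comment"].execute("SELECT cid, nid FROM comment"))
    # The join, aggregation and sort all run on the chosen connection,
    # so in the cross-shard case the storage nodes only serve simple scans.
    return conn.execute(
        "SELECT n.title, COUNT(c.cid) AS n_comments "
        "FROM node n JOIN comment c ON c.nid = n.nid "
        "GROUP BY n.nid ORDER BY n_comments DESC"
    ).fetchall()


print(comment_counts())  # [('Scaling', 2), ('Hello', 1)]
```

Because the processing connection only holds temporary copies, it can be thrown away after the query, which is in the spirit of treating the processing tier as an elastic resource.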

There was a question about how you pay for ParElastic. ParElastic is something we sell as a license, based on the number of servers under management, that is, the persistent storage under management. Feel free to ping me and we can talk more about prices and things like that, but for the most part I’d like to keep this a technical presentation and stay out of marketing and sales.

Another thing several people have asked in the past is how ParElastic compares with, for example, MySQL Cluster. MySQL Cluster is, for the most part, the NDB storage engine. Its performance is really, really good if you can keep all of your data in memory, and it drops off rapidly if your data exceeds the capacity of memory. The place where MySQL Cluster is particularly applicable is if you have really high-end hardware, you’re running it in your own data center, and performance measured in a small number of milliseconds, or a reasonable number of microseconds, is really important to you.

That’s not really the use case if you’re running a database in the cloud, where you can’t guarantee, for example, that all of your data is going to be in memory; as a matter of fact, running MySQL Cluster on Amazon produces some very, very peculiar performance results. There is a dramatic improvement you can get with scale-out that you really can’t get in a cost-effective way with scale-up, and that’s the big-picture difference between MySQL Cluster and ParElastic. On an entirely different level, ParElastic works with standard MySQL, so you don’t have to change your storage; ParElastic is not a storage engine. We sit between the app and the collection of MySQL servers. If you happen to be using NDB for some data, you continue to use NDB for that data. If you want to use some other high-performance storage engine, you can go ahead and use that. That’s unlike the MySQL Cluster option, where you’re pretty much bound to use the NDB storage engine. You could be one of the brave pioneers doing that and helping them find the bugs in it, but it’s not something that performs very well in the cloud. Other questions?

There are a couple of questions I’m seeing about screenshots and administration consoles, which goes back to what Barry just mentioned. A prior version of this presentation had a lot of that content; in the interest of time I removed some of it. What I will do is put some of that content back. There is going to be a repeat of this webinar at 1:00 this afternoon, so the version of the slides available online will include that information. Check back; I think within 48 hours that information will be in the slides. Or, another option: my email address and phone number are right here. Drop me a note. I’m happy to talk with you about these and a lot more details. A couple of people have asked about price lists and things like that. Michael Aubin, whose phone number is here as well, is the person you should speak with about that. Other questions?

Speaker 1: All right. Thank you for the great presentation, and thank you, everyone, for attending. Again, the slides and recording of the webinar will be posted on the Acquia website and will be emailed out to you. Have a great day!
