Pipe Dream: Geographically Distributed Drupal
by Barry Jaspan
The speed of light is, unfortunately, still a constant. If your Drupal site has users in San Francisco, New York, London, Tokyo, Delhi, and Australia (and whose doesn't?), you've had no good way to give all of them fast access to your site. No matter where you put your master database server, most people have to cross an ocean to access it. Perhaps you can put read-only slave databases with local web servers in locations around the world, but then the remote users still have a long haul when they want to log in and create content---which is, after all, what your Drupal site is for.
I am experimenting with an approach to solving this problem that allows users to log in and create content using web and slave database servers that are geographically close to them while maintaining a single consistent Drupal site. It does not require multiple active database master servers and all the intractable problems that causes. My system, called Pipe Dream, intercepts database-changing operations at the remote locations and sends them over a message queue (a.k.a. a pipe) to the primary location where they are replayed.
Shameless plug: I'll be presenting my work on Pipe Dream at Drupalcon Copenhagen assuming my session gets selected. See you there!
The basic idea is simple. First consider the normal, local-only case:
A user's web browser submits a form via HTTP POST to a web server, which then writes the new data into the database. Later, the web server reads from the database to generate pages.
Pipe Dream changes this process to use a message queue between remote and primary servers. A message queue is an asynchronous, "fire and forget" communication channel; it works like sending an email message. The sender can deliver a message into the queue and immediately go on to other tasks; eventually the recipient will get and act on the message. Here's how Pipe Dream works:
The web browser submits the form via HTTP POST to the web server. The Pipe Dream module intercepts the form data before it is "submitted" (in the Drupal Form API sense) and redirects it to the primary master server via a message queue. A message queue listener on the primary location receives the message and re-submits it to the master server which writes the new data into the database. Eventually the new database changes are replicated to the remote MySQL slave server. In a healthy system, this whole process will probably only take a few seconds so the new data will be available at all of the remote locations right away.
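To make the flow concrete, here is a minimal Python sketch of the steps above. It is not Pipe Dream or Drupal code: the in-memory queue, the two lists standing in for the master and slave databases, and every function name are assumptions made up for illustration.

```python
import queue

# Hypothetical stand-ins; none of these names come from Pipe Dream itself.
pipe = queue.Queue()   # the message queue between remote and primary sites
primary_db = []        # the master database at the primary location
remote_db = []         # the read-only slave at the remote location


def remote_submit(form_id, values):
    """Remote web server: intercept the submission and enqueue it."""
    pipe.put({"form_id": form_id, "values": values})
    # Fire and forget: the user gets a response without waiting for the master.
    return "Thank you for your submission."


def primary_listener():
    """Primary location: replay every queued submission against the master."""
    replayed = 0
    while True:
        try:
            message = pipe.get_nowait()
        except queue.Empty:
            return replayed
        primary_db.append((message["form_id"], message["values"]))
        replayed += 1


def replicate():
    """Stand-in for MySQL replication: copy new master rows to the slave."""
    remote_db.extend(primary_db[len(remote_db):])
```

Calling `remote_submit()`, then `primary_listener()`, then `replicate()` walks one submission through the whole loop; until the last step runs, the remote slave does not contain the new row, which is the delay discussed below.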
The advantage of this system is that the remote users only ever talk to the web and database servers near them. When new content is submitted, the POST request is processed and returns right away. The content will appear later, but in the meantime the user experiences a fast web site and can move on to other things.
While the basic idea is simple, unsurprisingly, the details are complex. Here are some of the issues I have identified and how I am thinking of addressing them:
* Delay. Yes, it is true that a user will post a comment and then may not see it immediately when the page reloads. I don't care. Pipe Dream can display a message: "Thank you for your submission. It will appear shortly." We're talking about user-generated content web sites, not a nuclear missile control system.
* User login. The previous item notwithstanding, when a user logs in to a remote location, it won't do to tell them "Thank you for logging in. Please refresh the page a few times until your username appears in the upper-right corner." So the user login form actually has to be processed locally, issuing a local session cookie. Pipe Dream could replay the login at the primary location so the user will be logged in "everywhere" but probably the login can just remain local. Logins from the primary location can still be replicated everywhere, because the session table can have multiple entries for a single user id.
* Caches. Much like user login, it is important for cache tables to be stored locally. It might easily be the case that content viewed frequently in a remote location is rarely viewed in the primary location. This turns out to be a non-issue because Drupal 7 uses the REPLACE INTO query for cache updates. All cache clearing and cache entry operations at the primary location will be replicated out, and all cache entry operations at the remote locations will be stored locally.
* Validation. Form submissions can fail for a number of legitimate reasons such as a missing title field. Pipe Dream will allow forms to be validated locally and only deliver them to the primary location if they pass. Of course, it is possible that a form will pass validation remotely and still fail at the primary location due to a race condition; perhaps a referenced node or term just got deleted. The easiest answer is "I don't care." A better answer is for Pipe Dream to store the form submission in a table of failed submits (which of course gets replicated to the remote locations) and then show the user a block containing forms to be re-submitted. When the user clicks an entry in the block, the form re-appears filled in with validation errors displayed.
* Form API complexity. Drupal 7 forms can have a variety of behaviors and structures, including multiple submit buttons; Pipe Dream will not always know how the form is supposed to behave. For example, node forms have Preview, Add More, Save, and Delete. The first should run locally, the last two should run at the primary location, and Add More is an AJAX operation that I haven't figured out what to do with yet. Pipe Dream will probably be able to handle all "simple" forms that have only a single primary Submit button and no AJAX plus a selected list of forms for which there is a special-purpose handler. Probably Pipe Dream will define a hook that lets modules tell it how to handle their non-standard forms. In the initial implementation, I only plan to support very standard forms; nodes and comments are what everyone is going to care about anyway.
* Images. These might be easy or tricky depending on the way Pipe Dream intercepts form submissions. I have not actually tried it yet.
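The validation and Form API items above can be sketched together. The following Python is a rough illustration only, under stated assumptions: the routing table, the form and button names, the `failed_submits` list (standing in for the replicated table of failed submits), and the toy title check are all hypothetical, and the hook Pipe Dream might eventually define is not shown.

```python
# Hypothetical routing table: which button on which form runs where.
LOCAL, PRIMARY = "local", "primary"
ROUTES = {
    ("node_form", "Preview"): LOCAL,
    ("node_form", "Save"): PRIMARY,
    ("node_form", "Delete"): PRIMARY,
    ("user_login", "Log in"): LOCAL,
}

failed_submits = []  # stands in for the replicated table of failed submits


def route(form_id, button):
    """Return where a submission should run, or None if unsupported."""
    return ROUTES.get((form_id, button))


def validate(values):
    """Toy validation: a title is required."""
    return [] if values.get("title") else ["Title field is required."]


def handle_remote(form_id, button, values, send):
    """Remote side: validate locally, forward only valid PRIMARY submissions."""
    where = route(form_id, button)
    if where is None:
        return "unsupported"
    if validate(values):
        return "invalid"
    if where == LOCAL:
        return "handled locally"
    send({"form_id": form_id, "button": button, "values": values})
    return "queued"


def replay_at_primary(message, still_valid):
    """Primary side: re-validate; record failures for later re-submission."""
    if not still_valid(message):
        failed_submits.append(message)
        return False
    return True
```

Add More is deliberately missing from the table, so `route("node_form", "Add More")` returns `None` and the submission is reported as unsupported rather than guessed at.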
There are several different approaches to implementing Pipe Dream. My initial plan as described above is to base it on form interception. chx likes the idea of a new database driver that passes all non-SELECT queries (instead of form submits) to the primary location. There are probably other options.
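For comparison, the database-driver idea could look roughly like the Python sketch below. It is not a Drupal database driver; the class name, the callbacks, and the crude "starts with SELECT" test are assumptions for illustration.

```python
def is_read_query(sql):
    """Crude check: treat only SELECT statements as reads."""
    return sql.lstrip().upper().startswith("SELECT")


class ForwardingConnection:
    """Hypothetical wrapper: run reads locally, forward writes to the primary."""

    def __init__(self, local_execute, forward):
        self.local_execute = local_execute  # runs a query on the local slave
        self.forward = forward              # puts a message on the queue

    def execute(self, sql, params=()):
        if is_read_query(sql):
            return self.local_execute(sql, params)
        # Non-SELECT queries are not run here; they are replayed at the primary.
        self.forward((sql, tuple(params)))
        return None
```

A real driver would also have to decide which writes stay local, such as the session and cache tables discussed above.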
As of this writing, Pipe Dream is an early-stage research project. I have it working with basic node forms with title, body, and taxonomy terms, using a dumb SQL-based message queue. I plan to have a functioning demo and code in the Pipe Dream module at drupal.org later this summer.
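For readers wondering what a dumb SQL-based message queue might look like, here is one minimal possibility in Python with SQLite. It is a guess at the general shape, not the queue Pipe Dream uses: the table name, the JSON payload, and the single-consumer assumption are all made up for this sketch.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) a hypothetical queue table in a SQLite database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pipe_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    return db


def enqueue(db, message):
    """Append a message; the sender does not wait for the recipient."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO pipe_queue (payload) VALUES (?)",
            (json.dumps(message),),
        )


def dequeue(db):
    """Remove and return the oldest message, or None if the queue is empty.

    Assumes a single consumer; concurrent consumers would need locking.
    """
    with db:
        row = db.execute(
            "SELECT id, payload FROM pipe_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("DELETE FROM pipe_queue WHERE id = ?", (row[0],))
        return json.loads(row[1])
```

Messages come back in the order they were inserted, which matters when a later submission depends on an earlier one.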