Migrating the Drupal way. Part II: saving those old URLs

For the second part of my migration series, I want to discuss the importance of preserving URLs from your old site and show some examples of how to capture them in Drupal. Search engine traffic and other referrals are invaluable to the success of a site. I've managed sites that have received upwards of 100,000 referrals a day from Google alone.

And that doesn't count the external links and bookmarks that point to your pages. If you are thinking about migrating to Drupal and have not considered the importance of maintaining old URLs, you definitely should. Losing that amount of traffic could be catastrophic, but luckily the solution is reasonably simple.

So, how do you process all of those old links? You could capture them with a generic "page not found" message, but this is frustrating for users and will inevitably hurt your search engine rankings. Thankfully, with a small amount of configuration, you can still send your visitors to the right place in your new site. Here are a few options to maintain URLs using Apache's mod_rewrite module and a couple of Drupal modules.

Mapping a previously used ID to a NID

If you are fortunate enough to be able to map a previously used ID number to a Drupal NID, you can easily use mod_rewrite to remap the URL. Let's say that the old URL looked like http://example.com/index.asp?id=123, but your new Drupal site uses a URL like http://example.com/node/123. The following rules remap the old URL internally, so the user never notices the change. Place them before the Drupal rewrite rules in your .htaccess file:

# Match a request for index.asp
RewriteCond %{REQUEST_URI} ^/index\.asp$
# Match a query string like id=[some number] and capture that number
# ([0-9]+ requires at least one digit, so a bare "id=" is not matched)
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# Rewrite to a Drupal-friendly URL using the captured number
# Note that %1 is a backreference from a RewriteCond,
# whereas $1 is a backreference from a RewriteRule
RewriteRule ^.*$ index.php?q=node/%1 [L]

For a broader match, you could drop the first RewriteCond; the rule would then catch a request for any path whose query string is id=[some number]. Conveniently, Apache doesn't need to know what to do with an ASP file extension, because these URLs are rewritten as requests to index.php.
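Before deploying rules like these, it can help to check the patterns against a sample of old URLs. The sketch below is a hypothetical PHP helper (legacy_id_to_drupal_path() is not part of Drupal or Apache) that mirrors the two RewriteConds with preg_match(), using [0-9]+ so that an empty id= is rejected. mod_rewrite also uses PCRE-style regular expressions, but this only approximates how Apache evaluates the rules; the $broad flag models dropping the first RewriteCond.

```php
<?php
// Hypothetical helper: mirrors the index.asp RewriteConds so sample legacy
// URLs can be checked before deploying. Returns the Drupal path the rule
// would produce, or NULL when the rule would not fire.
function legacy_id_to_drupal_path(string $requestUri, string $queryString, bool $broad = false): ?string
{
    // First RewriteCond: only requests for /index.asp (skipped for the broader match)
    if (!$broad && !preg_match('#^/index\.asp$#', $requestUri)) {
        return null;
    }
    // Second RewriteCond: the query string must be exactly id=[digits]
    if (!preg_match('/^id=([0-9]+)$/', $queryString, $matches)) {
        return null;
    }
    // $matches[1] plays the role of the %1 backreference
    return 'node/' . $matches[1];
}

// Check a few sample legacy URLs
var_dump(legacy_id_to_drupal_path('/index.asp', 'id=123'));         // string(8) "node/123"
var_dump(legacy_id_to_drupal_path('/default.asp', 'id=123'));       // NULL
var_dump(legacy_id_to_drupal_path('/default.asp', 'id=123', true)); // string(8) "node/123"
var_dump(legacy_id_to_drupal_path('/index.asp', 'id='));            // NULL
```

Running a list of real URLs from your old server logs through a check like this is a cheap way to spot patterns that are too strict or too loose.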

I should note that it is outside the scope of this post to explain how to create new nodes with a NID that matches the ID of an imported item from your old site. I'll save that for another day.

Mapping a previously used numeric file name to a NID

Perhaps your former CMS creates static HTML files but uses an ID predictably in the file name. The following rule captures the digits matched by the parenthesized group and appends them to index.php?q=node/. For example, requesting file5.html returns the same content as node/5.

# Map a file name with a predictable number to a Drupal NID
RewriteRule ^file([0-9]+)\.html$ index.php?q=node/$1 [L]
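As with the previous rule, a quick PHP check can confirm which file names the pattern catches. This is a hypothetical helper (legacy_filename_to_drupal_path() is illustrative only) that applies the same regular expression, with [0-9]+ so that a plain file.html is not mapped. In a .htaccess file at the document root, the RewriteRule pattern sees the path without its leading slash, so the sample inputs omit it too.

```php
<?php
// Hypothetical helper: applies the same pattern as
//   RewriteRule ^file([0-9]+)\.html$ index.php?q=node/$1 [L]
function legacy_filename_to_drupal_path(string $path): ?string
{
    if (preg_match('/^file([0-9]+)\.html$/', $path, $matches)) {
        // $matches[1] corresponds to the $1 backreference
        return 'node/' . $matches[1];
    }
    return null;
}

foreach (['file5.html', 'file123.html', 'file.html', 'other5.html'] as $old) {
    $new = legacy_filename_to_drupal_path($old);
    echo $old . ' => ' . ($new ?? 'no match') . "\n";
}
```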

The Added Bytes website has a really convenient mod_rewrite cheat sheet if you want to learn more.

Creating individual URL aliases in Drupal

In some cases you might not want to use mod_rewrite to process these URLs. It might be too much overhead for your server, or you might not have a predictable ID as a reference.

Using the Path module, which comes with Drupal core, you can add aliases for system paths. To enter aliases manually, navigate to Admin -> Site Building -> URL aliases -> Add. For example, entering node/5 as the existing system path and 2008/12/12/my-seo-page.html as the alias maps your old URL 2008/12/12/my-seo-page.html to the Drupal URL node/5.

Alternatively, you can add these aliases when you edit each node under the heading URL path settings.

A programmatic solution

The above example of using the Path Module is great when you only have a handful of pages to alias. But consider a migration that has 250,000 pages to remap. If you are using a migration script, you can create these aliases on the fly during the import process using Drupal's path_set_alias() function.

<?php
// Build a node object for the imported item
$node = new stdClass();
$node->title = $old_title;
// etc.

// Next you would want to create some logic to reconstruct your old URL.
// For example.com/2008/10/18/my-old-url.html, use:
$old_url = "2008/10/18/my-old-url.html";

// Save the node in the database
node_save($node);

// Save the new alias based on your old URL
if ($node->nid) {
  path_set_alias('node/' . $node->nid, $old_url, NULL, 'en');
}
?>
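The "logic to reconstruct your old URL" depends entirely on how your previous CMS built its addresses. As a hedged example, suppose (an assumption, not something every CMS does) that old URLs followed a YYYY/MM/DD/slug.html scheme built from the publish date and a lowercased, hyphenated title. A hypothetical helper like build_old_url() could then produce the $old_url value to pass to path_set_alias(). If your source data already stores the old path, use that directly instead.

```php
<?php
// Hypothetical helper: assumes the old CMS built URLs as
// YYYY/MM/DD/slug.html from the publish date and the title.
// Adjust the pattern to whatever your old site actually used.
function build_old_url(string $publishedDate, string $title): string
{
    $date = new DateTimeImmutable($publishedDate);
    // Replace each run of non-alphanumeric characters with a single hyphen,
    // trim hyphens from the ends, then lowercase the result
    $slug = strtolower(trim(preg_replace('/[^A-Za-z0-9]+/', '-', $title), '-'));
    // No leading slash: Drupal aliases are stored relative to the site root
    return $date->format('Y/m/d') . '/' . $slug . '.html';
}

echo build_old_url('2008-10-18', 'My Old URL'), "\n"; // 2008/10/18/my-old-url.html
```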

Redirecting as a solution

If you want to send your visitors to the correct page and use your new Drupal-style URLs, you can use a permanent redirect. The 301 HTTP status code tells your visitors' browsers that the old URL has a new permanent home. Search engines should also respect a 301 and index the new URL in place of the old one. With the rewrite rules above, you can accomplish this by using the flags [R=301,L], as demonstrated below:

# Match a request for index.asp
RewriteCond %{REQUEST_URI} ^/index\.asp$
# Match a query string like id=[some number] and capture that number
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# Permanently redirect. Note the ? at the end of the address,
# which is necessary to avoid appending the original query string
RewriteRule ^.*$ http://example.com/node/%1? [R=301,L]
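Why the trailing question mark matters: by default mod_rewrite passes the original query string through unchanged; if the substitution contains a "?", the text after it replaces the original query string, so a substitution ending in a bare "?" erases it. The sketch below is a hypothetical PHP model of that documented behavior (rewrite_target() is illustrative, and it ignores the QSA flag). It takes the substitution with %1 already expanded.

```php
<?php
// Hypothetical model of mod_rewrite's query string handling, ignoring [QSA]:
// no "?" in the substitution means the original query string is carried over;
// a "?" means the text after it replaces the original, so a trailing "?" erases it.
function rewrite_target(string $substitution, string $originalQuery): string
{
    $pos = strpos($substitution, '?');
    if ($pos === false) {
        // No "?": the original query string is passed through unchanged
        return $originalQuery === '' ? $substitution : $substitution . '?' . $originalQuery;
    }
    $base = substr($substitution, 0, $pos);
    $newQuery = substr($substitution, $pos + 1);
    // A trailing "?" leaves an empty query, so the target has no query string
    return $newQuery === '' ? $base : $base . '?' . $newQuery;
}

echo rewrite_target('http://example.com/node/123', 'id=123'), "\n";  // http://example.com/node/123?id=123
echo rewrite_target('http://example.com/node/123?', 'id=123'), "\n"; // http://example.com/node/123
```

The same rule explains why the earlier internal rewrite to index.php?q=node/%1 needs no trailing "?": its substitution already contains a new query string, which replaces id=123.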

The above example will redirect the user to a URL like /node/123. So, what if you want to permanently redirect to a more friendly URL alias? A great option is to use the Global Redirect module for Drupal.

The Global Redirect module solution

When the Path module creates an alias, it does not remove access to the original path, so /node/123 and /your-aliased-url both show the same content. This can hurt your search engine rankings, because search engines frown on duplicate content. The Global Redirect module solves this problem by issuing a 301 redirect from /node/123 to /your-aliased-url. No configuration is necessary; it works automatically for all of your URL aliases.

This module can also be used in conjunction with mod_rewrite rules; the first set of mod_rewrite rules in this post is a good example. Assume that you have a URL alias like /your-aliased-url for /node/123. Using mod_rewrite, you can map /index.asp?id=123 to /node/123, and Global Redirect will then permanently redirect to your alias. This effectively sends all requests for /index.asp?id=123 to /your-aliased-url.
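The two-step chain can be simulated to make the order of operations concrete. This is a hypothetical sketch, not Global Redirect's actual code: resolve_old_request() and the $aliases array are illustrative, with the array standing in for Drupal's url_alias table. Step one is the internal mod_rewrite mapping; step two is the alias check that turns node/123 into a 301.

```php
<?php
// Hypothetical simulation of the chain described above. The $aliases array
// stands in for Drupal's url_alias table; Global Redirect itself runs inside
// Drupal and is not called here.
function resolve_old_request(string $requestUri, string $queryString, array $aliases): ?array
{
    // Step 1: the internal mod_rewrite mapping, /index.asp?id=N to node/N
    if ($requestUri !== '/index.asp' || !preg_match('/^id=([0-9]+)$/', $queryString, $m)) {
        return null; // not a legacy URL; Drupal handles it normally
    }
    $systemPath = 'node/' . $m[1];

    // Step 2: a Global Redirect-style check, 301 to the alias when one exists
    if (isset($aliases[$systemPath])) {
        return ['status' => 301, 'target' => '/' . $aliases[$systemPath]];
    }
    // No alias: Drupal renders the node at the rewritten system path
    return ['status' => 200, 'target' => $systemPath];
}

$aliases = ['node/123' => 'your-aliased-url'];
foreach (['id=123', 'id=456'] as $query) {
    $result = resolve_old_request('/index.asp', $query, $aliases);
    echo $query, ' => ', $result['status'], ' ', $result['target'], "\n";
}
```

Only requests whose node has an alias end in a redirect; the rest are still served, just at their system path.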

As you can see, there are many options for capturing your old URLs. No matter what solution you choose, be sure that people coming to your site are seeing what they expect.

With a little bit of planning, a migration won't discourage your visitors and you won't hurt your valuable relationship with search engines.