Migrating the Drupal way. Part II: saving those old URLs

For the second part of my migration blog, I want to touch on the importance of maintaining URLs from your old site and demonstrate some ways to capture them in Drupal. Search engine traffic and other referrals are invaluable to the success of a site. I've managed sites that received upwards of 100,000 referrals a day from Google alone, and that does not count all of the external links and bookmarks pointing at your pages. If you are thinking about migrating to Drupal and have not considered how to maintain your old URLs, you definitely should. Losing that amount of traffic could be catastrophic, but luckily the solution is reasonably simple.

So, how do you process all of those old links? You could capture them with a generic "page not found" message, but this is frustrating for users and will inevitably hurt your search engine rankings. Thankfully, with a small amount of configuration, you can still send your visitors to the right place in your new site. Here are a few options to maintain URLs using Apache's mod_rewrite module and a couple of Drupal modules.

Mapping a previously used ID to a NID

If you are fortunate enough to be able to map a previously used ID number to a Drupal NID, then you can very easily use mod_rewrite to remap the URL. Let's say that the old URL looked like http://example.com/index.asp?id=123, but your new Drupal site uses a URL like http://example.com/node/123. The following rules remap the old URL internally, so the visitor never sees the change. Place them before the Drupal rewrite rules in your .htaccess file:

# Match a request for index.asp (the dot is escaped so it matches literally)
RewriteCond %{REQUEST_URI} ^/index\.asp$
# Match a query string like id=[some number] and capture that number
# ([0-9]+ requires at least one digit, so an empty id is not matched)
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# Rewrite to a Drupal friendly URL using the captured number
# Note that %1 is a backreference from a RewriteCond
# where $1 is a backreference from a RewriteRule
RewriteRule ^.*$ index.php?q=node/%1 [L]

For a broader match, you could drop the first RewriteCond, and the rule would then capture any request whose query string is exactly "id=[some number]", whatever the file name. The convenient thing here is that Apache doesn't need to know what to do with an ASP file extension because these URLs are rewritten as requests to index.php.

I should note that it is outside the scope of this post to explain how to create new nodes with a NID that matches the ID of an imported item from your old site. I'll save that for another day.

Mapping a previously used numeric file name to a NID

Perhaps your former CMS creates static HTML files, but uses an ID predictably in the file name. The following example captures the digits matched by the pattern in parentheses and appends them to index.php?q=node/. For example, requesting file5.html would return the same contents as node/5.

# Map a filename with a predictable number to a Drupal NID
# ([0-9]+ requires at least one digit, so "file.html" is not matched)
RewriteRule ^file([0-9]+)\.html$ index.php?q=node/$1 [L]
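Before editing .htaccess, it can help to check a list of old URLs against the same mapping logic offline. The following is only a sketch in PHP that mirrors the two mappings described so far; the function name old_url_to_drupal_path() is made up for this illustration, and it does not replace testing the real rules in Apache.

```php
<?php
/**
 * Hypothetical helper: mirror the two rewrite mappings in PHP so that a
 * list of old URLs can be checked before the rules go live.
 * Returns the Drupal path (e.g. "node/123"), or NULL when nothing matches.
 */
function old_url_to_drupal_path($url) {
  $parts = parse_url($url);
  $path = isset($parts['path']) ? $parts['path'] : '';
  $query = isset($parts['query']) ? $parts['query'] : '';

  // Same idea as the index.asp rule: the file name must be index.asp
  // and the query string must be exactly id=[some number].
  if ($path == '/index.asp' && preg_match('/^id=([0-9]+)$/', $query, $m)) {
    return 'node/' . $m[1];
  }

  // Same idea as the numeric file name rule: file[some number].html.
  if (preg_match('#^/file([0-9]+)\.html$#', $path, $m)) {
    return 'node/' . $m[1];
  }

  return NULL;
}

// Try a few sample URLs.
$samples = array(
  'http://example.com/index.asp?id=123',
  'http://example.com/file5.html',
  'http://example.com/about.html',
);
foreach ($samples as $url) {
  $result = old_url_to_drupal_path($url);
  print $url . ' => ' . ($result === NULL ? '(no match)' : $result) . "\n";
}
```

The first two samples map to node/123 and node/5, and the third reports no match. Running a script like this over an export of your old URLs is a cheap way to spot pages that none of your rules cover.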

The Added Bytes website has a really convenient mod_rewrite cheat sheet if you want to learn more.

Creating individual URL aliases in Drupal

In some cases you might not want to use mod_rewrite to process these URLs. It might be too much overhead for your server, or you might not have a predictable ID as a reference. Using the Path module, which comes with Drupal core, you can add aliases for system paths. Manually enter aliases by navigating to Admin -> Site Building -> URL aliases -> Add. The following example would map your old URL 2008/12/12/my-seo-page.html to the Drupal URL node/5.

[Screenshot: Add a path alias]

Alternatively, you can add these aliases when you edit each node under the heading URL path settings.

A programmatic solution

The above example of using the Path Module is great when you only have a handful of pages to alias. But consider a migration that has 250,000 pages to remap. If you are using a migration script, you can create these aliases on the fly during the import process using Drupal's path_set_alias() function.

<?php
// Run this script from the Drupal root so the relative path resolves.
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$node = new stdClass();

// Here you would look up your old article information and populate the node object.
// $old_title is a placeholder for a value read from your old data.
$old_title = 'My old article';
$node->type = 'page'; // use a content type that exists on your site
$node->title = $old_title; // etc.

// Next you would want to create some logic to reconstruct your old URL
// example.com/2008/10/18/my-old-url.html -- use:
$old_url = "2008/10/18/my-old-url.html";

// Save the node in the database
node_save($node);

// Save the new alias based on your old URL
if (!empty($node->nid)) {
  path_set_alias('node/' . $node->nid, $old_url, NULL, 'en');
}
?>
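The "logic to reconstruct your old URL" depends entirely on how your old site built its addresses. As a rough sketch, assuming the old site used a YYYY/MM/DD/slug.html pattern like the example above and that you have a publish timestamp and a slug for each article, a helper could look like this. The function name build_old_url() and both of its parameters are hypothetical.

```php
<?php
/**
 * Hypothetical helper: rebuild a date-based legacy URL.
 * Assumes the old site used YYYY/MM/DD/slug.html.
 *
 * @param int $published Unix timestamp of the original publish date.
 * @param string $slug The old file name without the .html extension.
 */
function build_old_url($published, $slug) {
  // gmdate() formats the date in UTC, so the result does not depend on
  // the server's timezone setting.
  return gmdate('Y/m/d', $published) . '/' . $slug . '.html';
}

// 2008-10-18 00:00:00 UTC as a Unix timestamp.
$published = gmmktime(0, 0, 0, 10, 18, 2008);
print build_old_url($published, 'my-old-url') . "\n";
// prints 2008/10/18/my-old-url.html
```

If your old CMS built its URLs from local time rather than UTC, articles published close to midnight could land on the wrong day, so compare a sample of generated URLs against the real ones before importing everything.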

Redirecting as a solution

If you want to send your visitors to the correct page and use your new Drupal-style URLs, you can use a permanent redirect. The 301 HTTP status code tells browsers and search engines that the old URL has a new permanent home, and search engines should update their index accordingly. Starting from the RewriteRules above, you can accomplish this with the flags [R=301,L], as demonstrated below:

# Match a request for index.asp (the dot is escaped so it matches literally)
RewriteCond %{REQUEST_URI} ^/index\.asp$
# Match a query string like id=[some number] and capture that number
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# Permanently redirect - note the ? at the end of the address
# which is necessary to not append the original query string
RewriteRule ^.*$ http://example.com/node/%1? [R=301,L]

The above example will redirect the user to a URL like /node/123. So, what if you want to permanently redirect to a more friendly URL alias? A great option is to use the Global Redirect module for Drupal.

The Global Redirect module solution

When the Path module creates an alias, it does not remove access to the original path. So, /node/123 and /your-aliased-url will both show the same content. This can hurt your search engine rankings because search engines frown on duplicate content. The Global Redirect module solves this problem by creating a 301 redirect from /node/123 to /your-aliased-url. No configuration is necessary; it works automatically for all of your URL aliases.

This module can also be used in conjunction with mod_rewrite rules (the first set of mod_rewrite rules in this post is a good example). Assume that you have a URL alias like /your-aliased-url for /node/123. Using mod_rewrite, you can map /index.asp?id=123 to /node/123, and Global Redirect will then permanently redirect to your alias. This effectively sends all requests for /index.asp?id=123 to /your-aliased-url.
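To make the two-step chain concrete, here is a simplified illustration in PHP. It is not Global Redirect's actual code; the function name alias_redirect_target() and the $aliases lookup table are stand-ins invented for this sketch, where a real site would consult Drupal's alias storage.

```php
<?php
/**
 * Simplified illustration only, not Global Redirect's actual code.
 * $aliases is a hypothetical lookup table of system path => alias.
 * Returns the alias to redirect to with a 301, or NULL to serve the
 * requested path as it is.
 */
function alias_redirect_target($requested_path, array $aliases) {
  if (isset($aliases[$requested_path]) && $aliases[$requested_path] != $requested_path) {
    return $aliases[$requested_path];
  }
  return NULL;
}

$aliases = array('node/123' => 'your-aliased-url');

// Step 1: mod_rewrite has already mapped /index.asp?id=123 to node/123.
// Step 2: that system path has an alias, so the visitor is sent there.
$target = alias_redirect_target('node/123', $aliases);
print ($target === NULL ? 'no redirect' : '301 -> /' . $target) . "\n";
```

A path that has no alias returns NULL here, which corresponds to the page being served at its node/NID address without any redirect.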

As you can see, there are many options for capturing your old URLs. No matter what solution you choose, be sure that people coming to your site are seeing what they expect. With a little bit of planning, a migration won't discourage your visitors and you won't hurt your valuable relationship with search engines.

Comments

Posted by jackbravo (not verified).

What if the old site was using some weird CMS without rewrite rules (maybe they weren't active), and all the site URLs look like index.php?option=com_content&task=view&id=13&Itemid=29?

Is it worth preserving those URLs, or should I just start from scratch?

Posted by Kevin Hankens.

Yeah, it can get crazy complex, but definitely worth it if you can map things predictably.

For example, you could do something like the following:

# redirect: /index.php?option=com_content&task=view&id=13&Itemid=29
# to: /your-view/13/29/com_content/view
RewriteCond %{REQUEST_URI} ^/index\.php$
# [^&]* stops each capture at the next parameter separator
RewriteCond %{QUERY_STRING} ^option=([^&]*)&task=([^&]*)&id=([^&]*)&Itemid=([^&]*)$
RewriteRule ^.*$ http://example.com/your-view/%3/%4/%1/%2? [R=301,L]

You could use the redirected values as arguments for Views or custom modules. Depending on your setup, it might be as simple as just grabbing the id field and redirecting to node/xxx.
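One limitation of a single query string pattern is that it expects the parameters in a fixed order. If your old links list them in varying order, the same mapping could be done in PHP instead. This is only a sketch: the function name legacy_query_to_path() is hypothetical, and "your-view" is the placeholder path from the example above.

```php
<?php
/**
 * Hypothetical helper: turn a legacy query string into the new path,
 * regardless of the order in which the parameters appear.
 * Returns NULL when a parameter is missing or an ID is not numeric.
 */
function legacy_query_to_path($query_string) {
  $params = array();
  parse_str($query_string, $params);

  // All four parameters must be present as plain, non-empty strings.
  foreach (array('option', 'task', 'id', 'Itemid') as $key) {
    if (!isset($params[$key]) || !is_string($params[$key]) || $params[$key] === '') {
      return NULL;
    }
  }

  // Only accept numeric IDs, so arbitrary input never reaches the new URL.
  if (!preg_match('/^[0-9]+$/', $params['id']) || !preg_match('/^[0-9]+$/', $params['Itemid'])) {
    return NULL;
  }

  return 'your-view/' . $params['id'] . '/' . $params['Itemid'] . '/'
    . rawurlencode($params['option']) . '/' . rawurlencode($params['task']);
}

print legacy_query_to_path('option=com_content&task=view&id=13&Itemid=29') . "\n";
// prints your-view/13/29/com_content/view
```

The resulting path could then be used as the target of a 301 redirect, with the segments passed along as arguments in the way described above.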

Posted by Greg Knaddison.

FWIW, I tend to let Pathauto create the aliases during the import stage (or use Pathauto's bulk generate feature) and, simultaneously, create 301 redirects from the old urls to the new urls using path_redirect: http://drupal.org/project/path_redirect

I'd rather have the content fit with the scheme of the aliases that are on the new site but obviously don't want to lose the old inbound traffic / link juice.

Posted by lauggh (not verified).

It sounds like it would be really easy to port a static HTML site over to Drupal 5 using node_import and path_redirect, but Acquia Drupal only supports D6. Would the best strategy be to:

1> migrate to D5
2> upgrade to D6
3> port to Acquia Drupal

Please advise.

Posted by Kevin Hankens.

Yeah, the node_import module only has a 5.x version at this point, so you would have to start with a D5 installation. The path alias creation is super easy though. Enable the Path module and node_import module, then in your CSV import file, just include a field mapped to "path" and the import will automatically create the alias.

Also, be sure to check out the Getting Started Guide on our downloads page for information about upgrading to Acquia Drupal.

Posted by Kevin Hankens.

Importing a CSV to D6: Also, I neglected to mention that you can skip the 5.x install and import directly to an Acquia Drupal 6.x site by using your own import script. Using the API tools mentioned in this article and the previous one, you can use a CSV file to import your data directly into the database.

An example would be to use the PHP file() function to read in a CSV file line by line. Then using the explode() function, you could grab each element in that row and use them to create the node object, save, repeat.
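As a rough sketch of that approach, the following reads a CSV file with file() and explode() and returns one array per row, keyed by the header line. The function name read_simple_csv() and the column names "title" and "path" are assumptions made for this example. It also assumes a simple file with no quoted commas, because explode() splits on every comma; PHP's fgetcsv() handles quoted fields if your data needs it.

```php
<?php
/**
 * Hypothetical helper: read a simple CSV file into rows keyed by the
 * header line. Assumes no field contains a comma.
 */
function read_simple_csv($filename) {
  $lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
  if ($lines === FALSE || count($lines) == 0) {
    return array();
  }

  // The first line holds the column names.
  $header = array_map('trim', explode(',', array_shift($lines)));

  $rows = array();
  foreach ($lines as $line) {
    $fields = array_map('trim', explode(',', $line));
    if (count($fields) != count($header)) {
      continue; // skip rows that do not match the header
    }
    $rows[] = array_combine($header, $fields);
  }
  return $rows;
}

// Write a tiny sample file so the sketch can be run on its own.
$filename = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($filename, "title,path\nMy old article,2008/10/18/my-old-url.html\n");

foreach (read_simple_csv($filename) as $row) {
  // In a real import script, this is where you would build the node
  // object, call node_save(), and then create the alias from $row['path'].
  print $row['title'] . ' -> ' . $row['path'] . "\n";
}
unlink($filename);
```

Note that file() loads the whole file into memory at once, so for a very large import you may prefer to read the file one line at a time.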

Posted by Kevin Hankens.

@Greg - we should be co-writing these things :)

I appreciate the added thoughts, thanks!!

Posted by Prodigy (not verified).

Fantastic Article! I'm moving from another CMS to Drupal and already have good SEO rankings that I don't want to lose.

How does this look?

OLD: www.purple.com/detail.aspx?ID=618

NEW: www.purple.com/node/33

# Match a request for detail.aspx
RewriteCond %{REQUEST_URI} ^/detail.aspx$
# Match a query string like id=[some number] and capture that number
RewriteCond %{QUERY_STRING} ^id=([0-9]*)$
# Permanently redirect - note the ? at the end of the address
# which is necessary to not append the original query string
RewriteRule ^.*$ http://purple.com/node/%1? [R=301,L]

Of course Drupal uses "node/3" for all of its URLs .. but what if we alias the "node/3" to be something like "cars/lexus".

Global Redirect will fix the above automatically?

Thank you so much! Acquia rocks!

Posted by jmjohn (not verified).

New Drupal handbook page on how to preserve NIDs and URL paths during migration. More examples of mod_rewrite, some ideas, and some SQL scripts.

http://drupal.org/node/570906