Migrating the Drupal way. Part I: creating a node.

Update: Acquia has released a great migration whitepaper to help you get ready to move to Drupal.

My position with Acquia will find me helping out with a lot of migrations and upgrades. I'm going to embark on a multi-part blog series to discuss some of the common techniques that I use when moving clients to Drupal.

Migrating to Drupal can seem intimidating if you already maintain a database-driven website. However, populating a Drupal site with your current content might be easier than you think. Whether you are migrating from a popular CMS or a fully custom application, you can easily use Drupal modules to mimic your current data structures and migrate your data using a simple custom PHP script. I should note that while there are several different methods to accomplish this task, this happens to be my favorite.

When interacting with Drupal, it's a good idea to do things the Drupal way. Fortunately, Drupal core allows you to bootstrap Drupal and use all of its API functionality outside of a normal Drupal instance. For yours truly, learning about this has been a godsend because it provides a fast, simple way to migrate data.

Creating a basic node

When writing an import script, you will need to bootstrap Drupal to use the API functions. Using drupal_bootstrap($phase), you can load Drupal up to a certain loading phase by designating a $phase argument. The value of $phase allows you to specifically load the site configuration, database layer, modules and other requisite functionality. For our purposes, we will use drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL) to make sure that we have access to the whole API.

Note: Make sure that you create this script in the root of your Drupal installation.

<?php
// Bootstrap Drupal
require 'includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
?>
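
If the script cannot live in the Drupal root, one option is to change into that directory before including bootstrap.inc, since Drupal 6 resolves its includes relative to the current working directory. The following is a minimal sketch rather than part of the original script: the helper name example_bootstrap_drupal() and the DRUPAL_ROOT environment variable are assumptions made for illustration.

```php
<?php
/**
 * Bootstrap Drupal from an arbitrary directory.
 *
 * Hypothetical helper: returns TRUE if Drupal was bootstrapped, or FALSE if
 * $root does not look like a Drupal installation.
 */
function example_bootstrap_drupal($root) {
  if (!is_file($root . '/includes/bootstrap.inc')) {
    return FALSE;
  }
  // Drupal expects the working directory to be its root.
  chdir($root);
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
  return TRUE;
}

// DRUPAL_ROOT is an assumed environment variable; fall back to the current directory.
$root = getenv('DRUPAL_ROOT') ? getenv('DRUPAL_ROOT') : getcwd();
if (!example_bootstrap_drupal($root)) {
  print "No Drupal installation found in $root\n";
}
```

Returning FALSE instead of failing on the require lets the import script stop with a readable message when the path is wrong.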

For a simple example, we will create a basic node object and save it in our Drupal database using node_save().

<?php
// Construct the new node object.
$node = new stdClass();

// Your script will probably pull this information from a database.
$node->title = "My imported node";
$node->body = "The body of my imported node.\n\nAdditional Information";
$node->type = 'story';   // Your specified content type
$node->created = time();
$node->changed = $node->created;
$node->status = 1;
$node->promote = 1;
$node->sticky = 0;
$node->format = 1;       // Filtered HTML
$node->uid = 1;          // UID of content owner
$node->language = 'en';
// If known, the taxonomy TID values can be added as an array.
$node->taxonomy = array(2,3,1,);

node_save($node);
?>

Creating more complex nodes

The script above will create a new node with a title and body that is published and promoted to the homepage. However, the process becomes slightly more complicated if you have more data than simple title and body fields. The CCK module is a popular method to extend your nodes by adding any number of custom fields. When Drupal displays your content, CCK adds your custom fields to the node object using hook_nodeapi(). Luckily, you can replicate this by adding your own fields in the import script. So, how can you find out the structure of these fields? One really easy method is to use the Devel module.

[Figure: CCK Custom Fields. The Devel module can be used to show how Drupal sees your node object.]

Using the Devel module

The Devel module is a great way to see, among other things, the structure of the node object which is invaluable in this case. After installing the module and viewing a node you will see new tabs: Dev load and Dev render. Click the Dev load tab, then click the "... (Object) stdClass" header to expand the node object definition. Here you will find some familiar data like nid, type, etc. Near the bottom, you will see some other definitions that begin with "field_". These should resemble the CCK fields that you created for your node type.

Depending on your CCK definitions, the assignments in your import script might look like one of the following:

[Figure: Devel Load Module Tab. Here you can see some examples of how CCK has added fields to the node object.]

<?php
$node->field_text_field[0]['value'] = "value 1";
$node->field_text_field[1]['value'] = "2nd value";
$node->field_nodereference[0]['nid'] = 58;
?>
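
When a legacy record carries several values for one field, it can help to build the delta-indexed arrays with a small helper instead of writing each assignment by hand. This is only a sketch: example_cck_items() is a hypothetical helper, and the field names are the ones used in the assignments above.

```php
<?php
/**
 * Build a CCK-style field array: one entry per delta, keyed by column.
 *
 * $column is 'value' for text fields and 'nid' for node references, matching
 * the structure shown on the Dev load tab.
 */
function example_cck_items(array $values, $column = 'value') {
  $items = array();
  // array_values() renumbers the deltas from 0 regardless of the input keys.
  foreach (array_values($values) as $delta => $value) {
    $items[$delta] = array($column => $value);
  }
  return $items;
}

$node = new stdClass();
$node->field_text_field = example_cck_items(array('value 1', '2nd value'));
$node->field_nodereference = example_cck_items(array(58), 'nid');
```

The resulting properties have the same shape as the hand-written assignments, so the node can be passed to node_save() in the same way.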

Add these assignments to your import script and you will start to see the power of the Drupal API. Let's say you are migrating from another CMS with a number of related fields, categories, images, etc. You could expand this script to iterate through your old database and map all of the related elements to a corresponding node object. Execute your script, and all of your old data will now become Drupal data! The best part about using the API is that it takes care of everything from search indexing to path aliases, along with all of the other little things we might overlook.
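
The iteration described above can be sketched as follows. This is a minimal example under stated assumptions: the legacy column names (headline, content, posted_at, is_live) and the helper example_row_to_node() are hypothetical, and the hard-coded array stands in for rows you would fetch from the old database.

```php
<?php
/**
 * Map one row from the old site to a Drupal 6 style node object.
 *
 * The column names used here are hypothetical; adjust them to your schema.
 */
function example_row_to_node(array $row, $type = 'story') {
  $node = new stdClass();
  $node->type = $type;
  $node->title = trim($row['headline']);
  $node->body = $row['content'];
  $node->created = strtotime($row['posted_at']);
  $node->changed = $node->created;
  $node->status = empty($row['is_live']) ? 0 : 1;
  $node->promote = 0;
  $node->sticky = 0;
  $node->format = 1;  // Filtered HTML
  $node->uid = 1;     // UID of content owner
  $node->language = 'en';
  return $node;
}

// Stand-in for rows fetched from the legacy database.
$legacy_rows = array(
  array('headline' => ' First post ', 'content' => 'Hello.', 'posted_at' => '2009-01-15 12:00:00', 'is_live' => 1),
  array('headline' => 'Draft', 'content' => 'Not ready.', 'posted_at' => '2009-02-01 09:30:00', 'is_live' => 0),
);

$nodes = array();
foreach ($legacy_rows as $row) {
  $node = example_row_to_node($row);
  // node_save() only exists after drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL).
  if (function_exists('node_save')) {
    node_save($node);
  }
  $nodes[] = $node;
}
```

Keeping the mapping in its own function means it can be checked against a few sample rows before anything is written to the Drupal database.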

Migrating to Drupal can seem like a daunting task, but when doing things the Drupal way it's quite straightforward. Whether you are planning a migration of 100 nodes or 100,000 nodes, proper scripting can make it seem like a breeze!

Comments

Posted on by Greg Knaddison.

node_save is one way to create the node, but my preference is drupal_execute which has the benefit of creating the node in a more Drupalish way (i.e. executing the validation from modules that care about the node prior to it being saved). node_save is probably faster, but I'd rather have valid data than fast data.

There is a really good guide about this at http://drupal.org/node/178506#comment-895418

Also I'm sad we never got to meet while you were in Boulder (you were out here, right? or did you just work for velonews from somewhere else?). Hopefully we'll get to meet up in D.C.

Posted on by mikey_p (not verified).

One trick I've picked up after a myriad of different imports is node_object_prepare.

Take for example your code above. If I just want to fill in status, promote, and sticky, and set the date to the current time:

<?php
// Construct the new node object.
$node = new stdClass();
$node->type = 'story';   // Your specified content type

node_object_prepare($node); // fills in default values for the uid, status, promote, sticky, date, created, and revision properties

// Your script will probably pull this information from a database.

$node->title = "My imported node";
$node->body = "The body of my imported node.\n\nAdditional Information";

// SNIP
?>

This usually sets defaults for most items, and you can always override the uid, date or other items later. The biggest benefit of this is the invocation of hook_prepare and hook_nodeapi (with op 'prepare'). If you have other modules that take advantage of these hooks, and want them to work on your imported nodes, then you will need to call node_object_prepare with your node at some point in your import script.

Note that you could just as easily call this from the end of your import script as well, but then you wouldn't have the chance to override its values.

Posted on by nico059 (not verified).

For information, I just add this line at the beginning of the script:

<?php
require 'modules/node/node.pages.inc';
?>

to be able to use node_object_prepare

Thanks, guys, for your tricks. You saved my day!

Posted on by Kevin Hankens.

Awesome tips, thanks guys!

@Greg, I was only in Boulder for a year - hardly enough time to even settle in :) Definitely look me up in DC!

Posted on by isaac77 (not verified).

Thanks for the tips, everyone!

I'm having trouble using node_save with a cck nodereference field (on Drupal 5.x). Anyone know of further documentation on that?

A number of entries at drupal.org mention the problem (e.g. http://drupal.org/node/275754 ) but haven't helped. It seems that using a select list on a custom form to set the value will work, but programmatically setting the value using the syntax suggested above ($node->field_nodereference[0]['nid'] = 58;) will fail. Advice?

Posted on by Kevin Hankens.

Did you stumble upon the CCK import doc? Specifically check out the comment on clearing the cache.

The code above works fine in 6.x, but I didn't test it in 5. In the past, I've done nodereference imports for 5.x by populating the database field manually after creating the node. You could use something like the above script and then add:

<?php
// create your node without populating the nodereference fields

// check your schema for the proper table and column
// the table name is wrapped in {} so that Drupal can apply any table prefix
$result = db_query("UPDATE {content_field_noderef} SET field_noderef_nid = %d WHERE nid = %d AND delta = %d LIMIT 1", $referenced_nid, $nid, $delta);
?>

The only problem I ran into is addressed in the above links.

Good luck!

Posted on by jp.stacey (not verified).

Thanks for the summary, Kevin: the little fiddly bits like status can cause node_save to fail silently, and it's really hard with just the Devel module alone to work out precisely what's the bare minimum needed to save a node.

If anyone's interested, Node factory is really good at handling the bare bones of node creation (basically the second PHP block in your post). It's still considered bleeding-edge by its maintainer, because I think he wants to nail CCK support, but for one-time imports I'd happily use it to set up basic nodes without reservation.

Posted on by mikey_p (not verified).

Another big point, in the drupal_execute() vs. node_save() decision, is that using node_save() will bypass most validation for nodes such as required and non-required fields, allowed values for CCK fields, length of fields and any custom validation you've added in a custom module.

This can be a big advantage or a huge shock depending on how you look at it. I tend to prefer using node_save() over drupal_execute() for this reason. I don't particularly care for sanitizing my clients' data for them, unless we agree to not import anything that doesn't validate, but many times in the course of importing there is an occasional missing field which can throw errors with drupal_execute().
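
For comparison, a drupal_execute() import in Drupal 6 might look roughly like the sketch below. It assumes a 'story' content type and an existing user named 'admin'; the helper example_story_form_state() is hypothetical and only builds the values array that the node form expects.

```php
<?php
/**
 * Build the $form_state for submitting a node form programmatically.
 *
 * Keys mirror the node form's fields; 'name' is the author's username.
 */
function example_story_form_state($title, $body, $author_name) {
  $form_state = array();
  $form_state['values']['title'] = $title;
  $form_state['values']['body'] = $body;
  $form_state['values']['name'] = $author_name;
  // The operation must match the label of the form's submit button.
  $form_state['values']['op'] = function_exists('t') ? t('Save') : 'Save';
  return $form_state;
}

// 'admin' is an assumed username; use an account that exists on your site.
$form_state = example_story_form_state('My imported node', 'The body of my imported node.', 'admin');

// Only runs inside a fully bootstrapped Drupal 6 site.
if (function_exists('drupal_execute')) {
  // The node form lives in node.pages.inc, which is not loaded by default.
  module_load_include('inc', 'node', 'node.pages');
  $node = (object) array('type' => 'story');
  drupal_execute('story_node_form', $form_state, $node);
  // Validation failures are collected as form errors instead of halting the script.
  $errors = form_get_errors();
}
```

Checking form_get_errors() after each call is one way to log the rows that failed validation and decide later whether to fix them or fall back to node_save().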

Posted on by jmjohn (not verified).

This is really great. Seeing the thinking here on migration is really useful. I hope that you write some more blog entries.

Thanks!

Posted on by tgeller (not verified).

This IS very handy. Would you mind if I used your code in an upcoming video I'm doing for Lynda.com?

---
Tom Geller * San Francisco * http://www.tomgeller.com
http://www.gellerguides.com * http://www.savemyhomebook.com

Posted on by JSafro (not verified).

Thanks for the code examples, Kevin. They were a huge help. I used them to create a FAPI module that saves both nodes and users. I'll post some code examples below:

<?php
// Take in form data
// Create a new user + content_profile if necessary
// Log in user
// Save calculator_input node
function pr_calculator_form_submit( $form_id, &$form_state ){
    global $user;

    // Report success
    drupal_set_message( t('Thank you for using the calculator.  Your results are below.') );

    // Check DB for existing user w matching email address
    $local_user = pr_calculator_find_user( $user, $form_state['values']['email'] );

    // Deal w case where user does not exist
    if( empty($local_user->uid) ) {
        $user_data = array(
            'is_new' => TRUE,
            'status' => 1,
            'mail' => $form_state['values']['email'],
            'name' => $form_state['values']['email'],
            'pass' => substr( $form_state['values']['email'], 0, 8 ),
            'roles' => array( 2=>TRUE),
        );
        $local_user = user_save( NULL, $user_data );
        watchdog(
            'user',
            'New user: %name (%email).',
            array('%name' => $form_state['values']['email'], '%email' => $form_state['values']['email']),
            WATCHDOG_NOTICE,
            l(t('edit'), 'user/'. $local_user->uid .'/edit')
        );
        user_authenticate( array('name'=>$user_data['name'], 'pass'=>$user_data['pass']) );

        // Now create the profile
        $time = time();
        $owner_uid = (isset($local_user->uid) && !empty($local_user->uid)?$local_user->uid:1);
        $profile_node = new stdClass();
        $profile_node->title = $local_user->name;
        $profile_node->body = '';
        $profile_node->type = 'profile';   // Your specified content type
        $profile_node->created = $time;
        $profile_node->changed = $time;
        $profile_node->status = 1;
        $profile_node->promote = 0;
        $profile_node->sticky = 0;
        $profile_node->format = 1;       // Filtered HTML
        $profile_node->uid = $owner_uid; // UID of content owner
        $profile_node->field_first_name[0]['value'] = $form_state['values']['first_name'];
        $profile_node->field_last_name[0]['value'] = $form_state['values']['last_name'];
        $profile_node->field_title[0]['value'] = $form_state['values']['title'];
        $profile_node->field_company[0]['value'] = $form_state['values']['company'];
        node_save($profile_node);
    }

    // Assemble node
    $time = time();
    $owner_uid = (isset($local_user->uid) && !empty($local_user->uid)?$local_user->uid:1);
    $create_node = new stdClass();
    $create_node->title = 'Calculate how much you can save!';
    $create_node->body = '';
    $create_node->type = 'calculator_input';   // Your specified content type
    $create_node->created = $time;
    $create_node->changed = $time;
    $create_node->status = 1;
    $create_node->promote = 0;
    $create_node->sticky = 0;
    $create_node->format = 1;       // Filtered HTML
    $create_node->uid = $owner_uid; // UID of content owner
    $create_node->field_avg_billing_rate[0]['value'] = $form_state['values']['avg_billing_rate'];
    $create_node->field_num_matrixes[0]['value'] = $form_state['values']['num_matrixes'];
    $create_node->field_avg_hours_per_week[0]['value'] = $form_state['values']['avg_hours_per_week'];
    node_save($create_node);

    // Get nid
    $new_node = node_load( array(
        'type' => 'calculator_input',
        'created' => $time,
        'changed' => $time,
        'uid' => $owner_uid
    ) );

    // Redirect to the new node page
    drupal_redirect_form( array('#redirect' => 'node/'.$new_node->nid ) );
}
?>

Posted on by polarshift (not verified).

What if you wanted to create the node with a specific node ID instead of using the next sequence number? You cannot assign a nid using node_save(). Any ideas?

Posted on by markroyko (not verified).

Thanks for these great tips. I was wondering if there's a good way to import pages and map them to a menu item programmatically. (i.e. import an 'about us' page, then map that page/path to the primary menu. I can see how the path is set, but not how mapping to a particular menu item is accomplished.) Any thoughts?

Thanks!

Posted on by jmjohn (not verified).

How do you create node IDs in a programmatic fashion? When I use node_save() it fails to create nodes if I specify a nid.

Posted on by eon (not verified).

For new nodes, set $node['nid']='' (nothing)

For my imports, I always have to check whether the node has already been imported, since our import data may have been edited. In that case, if I've found an existing node (by checking the cck field where I stored the imported data's 'original' id), I load that node and set $node['updated'] to time() and $node['revision'] to 1 (create revision) or 0 (do not).

There is no reason to ever manually specify a node['nid'] (other than updates, as stated above). In D6 it's an auto-increment field. If you need to store an original id (as we do), you'll have to put it in a cck field.
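
The update-or-insert check described above might look like the following sketch. It assumes the original ID is stored in a CCK text field named field_legacy_id and that the lookup of the existing node (for example with node_load()) has already happened. The field name, the row keys, and the helper example_prepare_node() are all hypothetical.

```php
<?php
/**
 * Prepare a node for node_save(): update $existing if given, else build a new one.
 *
 * The row keys (id, title, body) and field_legacy_id are assumptions.
 */
function example_prepare_node($existing, array $row) {
  if ($existing) {
    $node = $existing;
    // Ask node_save() to keep the old version as a revision.
    $node->revision = 1;
  }
  else {
    $node = new stdClass();
    $node->type = 'story';
    $node->uid = 1;
    $node->status = 1;
    // No nid is set, so node_save() inserts the node and assigns the next nid.
    $node->field_legacy_id[0]['value'] = $row['id'];
  }
  $node->title = $row['title'];
  $node->body = $row['body'];
  return $node;
}

$row = array('id' => 42, 'title' => 'About us', 'body' => 'Updated text.');

// First import: nothing found, so a fresh node object is built.
$new = example_prepare_node(NULL, $row);

// Re-import: a stand-in for a node that node_load() returned earlier.
$existing = new stdClass();
$existing->nid = 7;
$existing->title = 'Old title';
$updated = example_prepare_node($existing, $row);

// In a bootstrapped script, each result would then be passed to node_save().
```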

Posted on by justageek (not verified).

Anyone got an idea of how to create multiple images with your new nodes, where the images are the standard cck image widget type?

Posted on by nhunter (not verified).

I'm trying to import data using an insert if it doesn't already exist, update if it does strategy. Luckily, the record id of the import data file is the same as the nid of the existing node (if any such).

I perform a full bootstrap, but when I call node_load($nid) to determine whether the first node exists I get garbage back. I stepped into the code and the problem seems to be related to Drupal not properly unserializing the schema data after retrieving it from the cache.

node_load()
-> drupal_schema_fields_sql()
-> drupal_get_schema()
-> cache_get('schema')

Am I missing something? Is there something else I need to do after bootstrapping drupal in order to access existing nodes?

Thanks.