
Migrate 2.6 - framework changes

The Migrate module provides services for migrating data from various sources (other CMS frameworks, external web services, or other Drupal installations) into the local Drupal environment. It has been used to migrate sites such as The Economist, Examiner.com, Stanford Law School, and PayPal and eBay's developer forums (x.com) to Drupal.

Migrate 2.6 is now in beta. Today I will cover some of the most significant changes at the framework level, for those who have been developing their data migrations using Migrate 2.5 and earlier. There have also been significant UI changes, and the introduction of a framework for implementing wizard-style UIs for defining migrations, which will be covered in future posts.

Registration and construction

In Migrate 2.5, we deprecated auto-registration (the detection of Migration subclasses and registration of corresponding migrations) and introduced a static registration method via hook_migrate_api() - see my previous blog post for more on this. With Migrate 2.6, we have removed the auto-registration capability entirely, cleaned up migration construction, and enhanced the use of migration groups.

First, let's be clear on the distinction between registration and construction, which sometimes confuses people. Registration is the definition of a persistent migration instance - the storage of a row in the migrate_status table describing the distinct attributes of a given migration. More than one migration may be registered from a given Migration class, distinguished by unique machine names and distinct arguments. Construction is the creation of a PHP object within a Drupal page request or drush command invocation - this populates an instance of a class derived from Migration (or MigrationBase) with the information from the migrate_status table, by calling MigrationBase::getInstance($machine_name).

Originally, the registration and construction processes were mixed together, generating all sorts of chicken-and-egg issues (for example, should the constructor parameters come from the code calling getInstance() or from the database?). In addition, the MigrationBase and Migration constructors took a MigrateGroup parameter, while migrations built off DynamicMigration took an arguments array instead. More recent versions of PHP have become pickier about consistent constructor parameters (as they should be), and it was simply confusing trying to remember what to pass to the parent constructor. So, to make the developer experience cleaner and to simplify the code, the way things work now is:

  1. A migration must be explicitly registered (either statically through hook_migrate_api() or dynamically by calling registerMigration()) before it can be used.
  2. On a migration registration operation (i.e., a call to registerMigration(), or a request to register the static migrations described in hook_migrate_api()), if the migration machine name already exists in migrate_status, its data (group name and arguments) is updated with the arguments passed to registerMigration().
  3. getInstance() is now invoked with only a machine name (the class name and arguments parameters are deprecated and unused), and it always constructs the migration object from the migrate_status table (formerly it would use the additional passed parameters if present).
  4. All Migration classes should accept an array of arguments as their first parameter and pass it on to the parent constructor. For backwards compatibility, if a MigrateGroup object or NULL is the first parameter, it will be handled as it was formerly.
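The rules above can be modeled in a few lines of plain PHP. This is a minimal sketch, not Migrate's implementation. SketchStatusTable, SketchMigration and SketchTermMigration are hypothetical names, and an in-memory array stands in for the migrate_status table. Registering an existing machine name updates its stored row. getInstance() builds the object from that row alone. The migration class takes an arguments array and passes it to its parent constructor.

```php
<?php
// Hypothetical sketch, not Migrate's code: an in-memory array stands in
// for the migrate_status table.

abstract class SketchMigration {
  protected $arguments = array();

  // 2.6 convention: the first constructor parameter is an arguments array.
  public function __construct(array $arguments) {
    $this->arguments = $arguments;
  }

  public function getArguments() {
    return $this->arguments;
  }
}

class SketchTermMigration extends SketchMigration {
  public function __construct(array $arguments) {
    // Subclass setup would go here; the arguments are passed on unchanged.
    parent::__construct($arguments);
  }
}

class SketchStatusTable {
  // Machine name => array('class_name' => ..., 'arguments' => ...).
  private $rows = array();

  // Registration: store a row, or update it if the machine name exists.
  public function register($machine_name, $class_name, array $arguments) {
    $this->rows[$machine_name] = array(
      'class_name' => $class_name,
      'arguments' => $arguments,
    );
  }

  // Construction: build the object from the stored row alone.
  public function getInstance($machine_name) {
    if (!isset($this->rows[$machine_name])) {
      throw new RuntimeException("Migration '$machine_name' is not registered.");
    }
    $class = $this->rows[$machine_name]['class_name'];
    return new $class($this->rows[$machine_name]['arguments']);
  }
}

$status = new SketchStatusTable();
$status->register('MyTerm', 'SketchTermMigration', array('destination_vocab' => 'tags'));
// Re-registering the same machine name replaces the stored arguments.
$status->register('MyTerm', 'SketchTermMigration', array('destination_vocab' => 'categories'));
$migration = $status->getInstance('MyTerm');
```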

Groups

Although we've had a MigrateGroup class for a long time, up to now it has been little more than a tag on migrations to group them (e.g., to run a set of migrations with drush migrate-import --group=my_group). Groups are now much more useful objects, supporting display names in addition to the machine names they always had, and storing arguments that apply to all migrations in the group (such as database credentials).

Groups now may be registered similarly to migrations, either statically (through hook_migrate_api) or dynamically (by calling MigrateGroup::register). In addition, primarily for backwards compatibility, a group (with a title equal to its machine name and an empty argument list) is implicitly registered when a group_name is assigned to a migration via its argument list.
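The implicit registration rule can be sketched the same way. This is also hypothetical code rather than Migrate's. sketch_register_migration() and the two arrays standing in for the migrate_group and migrate_status tables are invented for illustration. An explicitly registered group keeps its title and arguments. A group that is only named in a migration's group_name argument gets a title equal to its machine name and no arguments.

```php
<?php
// Hypothetical sketch, not Migrate's code: $groups and $status are
// in-memory stand-ins for the migrate_group and migrate_status tables.

function sketch_register_migration(array &$groups, array &$status, $machine_name, array $arguments) {
  if (isset($arguments['group_name'])) {
    $group_name = $arguments['group_name'];
    if (!isset($groups[$group_name])) {
      // Implicit registration: title equals the machine name, no arguments.
      $groups[$group_name] = array('title' => $group_name, 'arguments' => array());
    }
  }
  $status[$machine_name] = $arguments;
}

// An explicitly registered group keeps its own title and arguments.
$groups = array(
  'my_group' => array(
    'title' => 'My Group',
    'arguments' => array('default_format' => 'filtered_html'),
  ),
);
$status = array();
sketch_register_migration($groups, $status, 'MyTerm', array('group_name' => 'my_group'));
sketch_register_migration($groups, $status, 'LegacyUser', array('group_name' => 'legacy'));
```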

To statically register a migration group, and assign a migration to it:

<?php
function example_migrate_api() {
  $api = array(
    // Declares the Migrate API version this implementation targets.
    'api' => 2,
    'groups' => array(
      'my_group' => array(
        'title' => t('My Group'),
        // Group arguments are shared by every migration in the group.
        'default_format' => 'filtered_html',
      ),
    ),
    'migrations' => array(
      'MyTerm' => array(
        'class_name' => 'MyTermMigration',
        'group_name' => 'my_group',
        'destination_vocab' => 'categories',
      ),
    ),
  );
  return $api;
}
?>

In this example, the 'default_format' argument is saved to the migrate_group table, and the 'destination_vocab' argument is saved to the migrate_status table. When the MyTerm migration is constructed, its argument array will contain both those values (note that if the migration arguments included 'default_format' => 'plain_text', this would override the value from the group).
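PHP's array union operator can express the precedence between group and migration arguments. The sketch below assumes this simple merge for illustration. sketch_effective_arguments() is a hypothetical helper, not a Migrate API, and the module's actual merging code may differ in detail.

```php
<?php
// Hypothetical sketch, not Migrate's code: group arguments act as
// defaults, and migration arguments win on any shared key.

function sketch_effective_arguments(array $group_arguments, array $migration_arguments) {
  // The + operator keeps the left-hand value when both arrays share a key.
  return $migration_arguments + $group_arguments;
}

$group_arguments = array('default_format' => 'filtered_html');
$migration_arguments = array(
  'group_name' => 'my_group',
  'destination_vocab' => 'categories',
);
$effective = sketch_effective_arguments($group_arguments, $migration_arguments);

// Setting the key on the migration overrides the group's value.
$overridden = sketch_effective_arguments(
  $group_arguments,
  $migration_arguments + array('default_format' => 'plain_text')
);
```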

Encryption of migration/group arguments

Now, while we're on the subject of arguments, here's another enhancement to this area. To support scenarios where migrations are defined via a UI rather than code, where database credentials may need to be submitted and stored, we would like to encrypt those credentials before saving them to the database. Such encryption is now directly supported by the Migrate module, by listing any migration or group arguments needing some security in an 'encrypted_arguments' argument. So, imagine you're in a form submit handler where you've gathered database credentials for a set of migrations you're defining. You want to encrypt the credentials and save them as group arguments, so they're available to all the individual migrations within the group. You can register the group like this:

<?php
$values = $form_state['values'];
$group_arguments = array(
  'username' => $values['username'],
  'password' => $values['password'],
  'default_filter' => 'filtered_html',
  // Names of the arguments above that should be encrypted before saving.
  'encrypted_arguments' => array('username', 'password'),
);
MigrateGroup::register(check_plain($values['group_name']), check_plain($values['group_title']), $group_arguments);
?>

Note that to have your arguments encrypted, you must either have the encrypt module enabled or the PHP mcrypt extension installed. Also note that the protection here is weaker than that for, say, user passwords. Those are stored as one-way hashes, whereas we must be able to decrypt the credentials to pass them on to the database server. Thus, someone who obtains full access to your server has the information necessary to decrypt the values and obtain access to the referenced database server as well. Do not mistake this encryption support for fully securing your credentials. It is one additional hurdle for a malicious hacker, but you should also take other steps, such as firewalling the source database so that only the web servers running the migration have access to port 3306 (assuming a migration from MySQL).
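The following sketch shows why this protection is reversible by design. It is hypothetical code, not Migrate's. It uses PHP's openssl extension (AES-256-CBC) as a stand-in for the encrypt module or mcrypt, and sketch_encrypt_arguments() and sketch_decrypt_arguments() are invented names. Only the values listed in 'encrypted_arguments' are transformed, and anyone holding the key can recover them.

```php
<?php
// Hypothetical sketch, not Migrate's code: openssl (AES-256-CBC) stands in
// for the encrypt module or mcrypt that Migrate actually relies on.

define('SKETCH_CIPHER', 'aes-256-cbc');

function sketch_encrypt_arguments(array $arguments, $key) {
  $names = isset($arguments['encrypted_arguments']) ? $arguments['encrypted_arguments'] : array();
  foreach ($names as $name) {
    if (isset($arguments[$name])) {
      $iv = random_bytes(openssl_cipher_iv_length(SKETCH_CIPHER));
      $ciphertext = openssl_encrypt($arguments[$name], SKETCH_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
      // Keep the IV with the ciphertext; both are needed to decrypt.
      $arguments[$name] = base64_encode($iv . $ciphertext);
    }
  }
  return $arguments;
}

function sketch_decrypt_arguments(array $arguments, $key) {
  $names = isset($arguments['encrypted_arguments']) ? $arguments['encrypted_arguments'] : array();
  $iv_length = openssl_cipher_iv_length(SKETCH_CIPHER);
  foreach ($names as $name) {
    if (isset($arguments[$name])) {
      $raw = base64_decode($arguments[$name]);
      $arguments[$name] = openssl_decrypt(substr($raw, $iv_length), SKETCH_CIPHER, $key, OPENSSL_RAW_DATA, substr($raw, 0, $iv_length));
    }
  }
  return $arguments;
}

// The key must be readable by the web server, which is exactly why this
// is weaker than one-way password hashing.
$key = random_bytes(32);
$stored = sketch_encrypt_arguments(array(
  'username' => 'migrator',
  'password' => 's3cret',
  'default_filter' => 'filtered_html',
  'encrypted_arguments' => array('username', 'password'),
), $key);
$loaded = sketch_decrypt_arguments($stored, $key);
```

A production scheme would also authenticate the ciphertext (for example with an HMAC). However, as the paragraph above points out, the decryption key still has to be available to the server.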

Default field handler

Up to now, supporting custom field types has required writing an explicit field handler for each one. However, at the data level many, if not most, field types are simply one or more columns in a field table to which incoming data values can be mapped directly without modification. Since we can query which columns each field type supports, a single field handler can handle a good number of field types without custom code. Such a field handler, MigrateDefaultFieldHandler, now exists. It is applied to any field whose type doesn't correspond to a registered field handler. It exposes every column of the field after the first one as a mappable subfield, and copies mapped data to the primary column (which is mapped to the field name) and to the subfields.
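The column-to-subfield rule can be sketched in plain PHP. This is a hypothetical model of the behavior described, not MigrateDefaultFieldHandler's code. The helper names, the 'field_weight' field and its 'value' and 'unit' columns are invented. Mapped values are represented as a simple array keyed by field name and by 'field:column' subfield names.

```php
<?php
// Hypothetical sketch, not MigrateDefaultFieldHandler's code: the value
// mapped to the field name fills the first column, and each later column
// is exposed as a "field_name:column" subfield.

function sketch_subfields($field_name, array $columns) {
  $subfields = array();
  foreach (array_slice($columns, 1) as $column) {
    $subfields[] = $field_name . ':' . $column;
  }
  return $subfields;
}

function sketch_field_items($field_name, array $columns, array $mapped) {
  $primary = array_shift($columns);
  $values = isset($mapped[$field_name]) ? array_values((array) $mapped[$field_name]) : array();
  $items = array();
  foreach ($values as $delta => $value) {
    $item = array($primary => $value);
    foreach ($columns as $column) {
      $key = $field_name . ':' . $column;
      $subvalues = isset($mapped[$key]) ? array_values((array) $mapped[$key]) : array();
      if (isset($subvalues[$delta])) {
        $item[$column] = $subvalues[$delta];
      }
    }
    $items[] = $item;
  }
  return $items;
}

// A hypothetical custom field type with 'value' and 'unit' columns.
$columns = array('value', 'unit');
$subfields = sketch_subfields('field_weight', $columns);
$items = sketch_field_items('field_weight', $columns, array(
  'field_weight' => array(5, 12),
  'field_weight:unit' => array('kg', 'lb'),
));
```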

Importing changed records

Migrate is sometimes used not just to do a bulk import of legacy data into a brand-new Drupal installation, but for continuous import of remote data. Even in the classic migration scenario, a full bulk import is typically done shortly before the launch of the new site, followed by a delta migration to pick up the last changes from the legacy site. In both cases you usually want to pick up not only new content (which is the default effect of running migrate-import a second time), but also changes to previously imported content. That requires being able to identify what has changed, and up to now the only mechanism available has been highwater marks: a timestamp on the legacy content that you can trust to be updated whenever relevant changes are made. Of course, your source data doesn't always have a convenient timestamp. One particular case near and dear to our hearts is Drupal user data. With Migrate 2.6, we have introduced the 'track_changes' option to MigrateSource. If you pass 'track_changes' => 1 in the options array of your source constructor, Migrate will hash the source data after prepareRow() is called and save the hash in the map table on initial import. On subsequent imports, it hashes the incoming source data again and compares it to the saved hash. If the hashes don't match, the object is reimported and the new hash saved; otherwise it is skipped. This is slower than highwater marks, but when you must pick up changed records and don't have a good highwater mark field in the source data, this is the way to go.
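Here is a minimal sketch of the hash comparison. It assumes md5() over serialize() as the hashing scheme (Migrate's actual hashing may differ) and uses an in-memory array as the map table. sketch_import_changed() is a hypothetical helper that returns the source IDs it would import.

```php
<?php
// Hypothetical sketch, not Migrate's code: md5() over serialize() is an
// assumed hashing scheme, and $map is an in-memory stand-in for the map
// table, keyed by source ID.

function sketch_hash_row(array $row) {
  return md5(serialize($row));
}

// Returns the source IDs that would be imported, updating $map in place.
function sketch_import_changed(array $source_rows, array &$map) {
  $imported = array();
  foreach ($source_rows as $id => $row) {
    // In Migrate, the hash is computed after prepareRow() has run.
    $hash = sketch_hash_row($row);
    if (!isset($map[$id]) || $map[$id] !== $hash) {
      // New or changed: (re)import it and save the new hash.
      $imported[] = $id;
      $map[$id] = $hash;
    }
    // Otherwise the hashes match and the row is skipped.
  }
  return $imported;
}

$map = array();
$first_run = sketch_import_changed(array(
  1 => array('name' => 'alice'),
  2 => array('name' => 'bob'),
), $map);
// Only row 2 has changed in the source since the first run.
$second_run = sketch_import_changed(array(
  1 => array('name' => 'alice'),
  2 => array('name' => 'robert'),
), $map);
```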

Apply field default values

Through Migrate 2.5, if you had NULL/empty values for fields coming in with your source data, and did not set a defaultValue on the field mapping, the fields would end up empty even if the field definition itself had a default value configured. With Migrate 2.6, in this scenario (where you have not overridden the field's default value with defaultValue()), the field's configured default value will be applied.
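The resulting precedence can be sketched as a small function. It is hypothetical. The 'default_value' key stands in for a mapping's defaultValue(), and treating NULL and the empty string as "empty" is an assumption about what counts as a missing value.

```php
<?php
// Hypothetical sketch, not Migrate's code: the 2.6 precedence between a
// source value, a mapping's defaultValue() and the field's own default.

function sketch_resolve_field_value($source_value, array $mapping, $field_default) {
  if ($source_value !== NULL && $source_value !== '') {
    // Source data wins when present.
    return $source_value;
  }
  if (array_key_exists('default_value', $mapping)) {
    // An explicit defaultValue() on the mapping overrides the field default.
    return $mapping['default_value'];
  }
  // New in 2.6: fall back to the field's configured default value.
  return $field_default;
}

$mapping_with_default = array('default_value' => 'draft');
$plain_mapping = array();
```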

hook_migrate_api_alter()

An alter hook is now available for hook_migrate_api(), allowing static registrations to be modified - for example, a custom module could build on top of a more general one and change some migration arguments to fit a different use case.
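The plain-PHP example below sketches what an alter implementation does. It collects one module's hook_migrate_api() result and lets a second module change an argument by reference. Keying the collected registrations by module name, and the custom_ module prefix, are assumptions for illustration. Consult the Migrate module's API documentation for the exact structure passed to hook_migrate_api_alter().

```php
<?php
// Hypothetical sketch, not Migrate's code: registrations collected from
// hook_migrate_api() are passed by reference to alter implementations
// before they are registered. Keying them by module name is an assumption.

function example_migrate_api() {
  return array(
    'api' => 2,
    'migrations' => array(
      'MyTerm' => array(
        'class_name' => 'MyTermMigration',
        'destination_vocab' => 'categories',
      ),
    ),
  );
}

// A hypothetical custom module retargets the general module's migration.
function custom_migrate_api_alter(array &$apis) {
  $apis['example']['migrations']['MyTerm']['destination_vocab'] = 'tags';
}

$apis = array('example' => example_migrate_api());
custom_migrate_api_alter($apis);
```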

Disable urlencoding of file paths

When migrating files using file paths provided in the source data, sometimes the paths require urlencoding and sometimes they don't. In hindsight the default should probably have been to leave them unaltered, with an option to do urlencoding, but in practice when the current file handling was being developed all our real-world migration projects required the urlencoding, so it gets done by default. We now offer the option to disable the urlencoding (e.g., if your incoming paths are already encoded) by setting the urlencode option to 0:

<?php
$this->addFieldMapping('field_my_image', 'encoded_image_path');
// The source paths are already encoded, so don't urlencode them again.
$this->addFieldMapping('field_my_image:urlencode')
     ->defaultValue(0);
?>
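Encoding an already-encoded path a second time shows why this option is needed. The sketch assumes rawurlencode() applied to each path segment, which is not necessarily Migrate's exact scheme, and sketch_encode_path() is a hypothetical helper. It handles relative paths only, since encoding a full URL this way would also mangle the scheme.

```php
<?php
// Hypothetical sketch, not Migrate's code: rawurlencode() per path segment
// is an assumed encoding scheme. Relative paths only.

function sketch_encode_path($path, $urlencode = TRUE) {
  if (!$urlencode) {
    // Corresponds to setting the urlencode subfield to 0.
    return $path;
  }
  return implode('/', array_map('rawurlencode', explode('/', $path)));
}

$encoded = sketch_encode_path('images/my photo.jpg');
// Encoding an already-encoded path turns "%20" into "%2520".
$double = sketch_encode_path($encoded);
$kept = sketch_encode_path($encoded, FALSE);
```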

Increase time limit for batch processing

  1. Do you run your migrations in the UI rather than via drush commands? Read http://drupal.org/node/1806824.
  2. If you feel you MUST run migrations through the UI - read it again.
  3. You're absolutely sure you must use the UI, but the overhead of the Batch API is killing your migration throughput? You can improve things by increasing the PHP time limit just for batch invocations (without setting it globally) - just set $this->batchTimeLimit to the desired time limit (in seconds) in your migration constructor.
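For step 3, the setting goes in the migration constructor. So that the sketch runs outside Drupal, it declares a minimal stand-in for Migrate's Migration class, but only when the real class is not loaded. MyNodeMigration, the getBatchTimeLimit() accessor and the 300-second value are hypothetical.

```php
<?php
// Stand-in for Migrate's Migration class so this runs outside Drupal. It is
// declared only when the real class is absent and models nothing but the
// two properties used here.
if (!class_exists('Migration')) {
  abstract class Migration {
    protected $arguments = array();
    protected $batchTimeLimit;

    public function __construct(array $arguments = array()) {
      $this->arguments = $arguments;
    }
  }
}

// Hypothetical migration class.
class MyNodeMigration extends Migration {
  public function __construct(array $arguments) {
    parent::__construct($arguments);
    // PHP time limit, in seconds, applied only to Batch API (UI) runs.
    $this->batchTimeLimit = 300;
    // Source, destination, map and field mappings would follow here.
  }

  // Accessor for this sketch only, so the value can be inspected.
  public function getBatchTimeLimit() {
    return $this->batchTimeLimit;
  }
}

$migration = new MyNodeMigration(array('group_name' => 'my_group'));
```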

Spoiler alert: You can now use the UI to run migrations via drush, thus avoiding the performance issues of using the Batch API. I will discuss this in my next blog post, covering the UI changes in Migrate 2.6.

Comments

Posted by Tim Kamanin (not verified).

Mike, thanks a lot for this overview! I previously wrote two posts about the Migrate 2.6 beta showing how to apply its new features in practice:

1) Migrate Module Caveats. May Save You Some Time... and Hair http://timonweb.com/migrate-module-caveats-may-save-you-some-time-and-hair

2) Using Hash Value (track_changes) To Detect Source Data Changes in Migrate for Drupal 7 http://timonweb.com/using-hash-value-trackchanges-to-detect-source-data-...

Posted by cameronbprince (not verified).

Thanks for the write up Mike. I can't imagine life without Migrate.

Posted by Do Not CallStev....

In regards to updating changed records, I remember using Migrate v1 back in D6 where I was maintaining product inventory from a parts supplier. IIRC, all I did was use the --update option with the drush command, and it would update my Drupal content with changes from the source (I'd post the link to the g.d.o discussion, but the spam filter won't let me). How are the changes detailed here different?

Posted by mryan.

track_changes and --update are independent. You would use the --update flag in drush when you want to update ALL records from the source. You can use the track_changes feature so that the migration by default (without --update) recognizes which records have changed and reimports ONLY those records, leaving alone any whose source data has not changed.

Posted by Matt-H (not verified).

Thank you for this and your previous articles. I hope you will consider either updating some of your previous Migrate-related articles, http://www.acquia.com/blog/drupal-drupal-data-migration-part-1-basics and www.acquia.com/blog/drupal-drupal-data-migration-part-2-architecture, or writing a follow-up, so the examples reflect the changes that come with Migrate 2.6. Thank you.
