Taking Migrations from Madness to Gladness
by Jenn Sramek
For every new and amazing Drupal application we build, there is almost always a corresponding data migration. This major aspect of enterprise projects can involve complex business and data analysis, detailed data mapping, migration from multiple third-party data and analytics sources, test migrations, and data validation.
Data is always an area where our clients have much more expertise than we could hope to gain during a standard professional services engagement, so they are heavily involved in this aspect of the project, including planning, validating test migrations, signoff of final plans, and especially, final data validation. This kind of participation is often relatively new to the people on the team. With all of these forces at play, and looming deadlines throughout, it can be a struggle to accurately plan and set expectations, and it can be even more complex in situations where many business units are coming together under a single, shared platform.
So, how do you plan for all of this?
Know what you are getting into.
Before you Begin: Clean Data
Do whatever you can to clean up the data you have if you are migrating to a new platform. Data structure is created over time (sometimes over a long time and multiple systems), and it may require some effort to find owners for all the data you have. Find the people within your organization who know the data. Their signoff is important in deciding where to draw the lines on what to migrate (and what not to).
Once you have found those people:
- Delete or archive old or unused content. This reduces data "cruft", which makes the work of migration easier and also eases the effort of data validation for you.
- Purge expired and outdated user accounts. Migration can be seen as an opportunity to update the user base to only active accounts, or can be seen as a way to welcome inactive users back. Determine your path forward, and update user content accordingly, purging outdated or duplicate accounts before moving to the new system.
- Review roles and permissions and consider which are actually used. Developing a new user role and permissions system is almost certainly a part of the work of the new site development, and hidden complexity lies in having too many roles, unused roles, or duplicate roles. Extra roles also complicate data validation and increase the cost of QA overall. Streamline as much as possible.
- Gain as much knowledge as possible about what your data means. It is eventually going to be necessary for your team to know exactly what each field of data represents, and whether or not you want to migrate it to the new site. If you wait to start this process until development begins it can draw out the migration schedule. The more (or more complex) your data, the longer this process can take. Get as much of a head start as possible before the development team is engaged.
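As one illustration of the account cleanup described above, here is a minimal sketch in plain Python that flags inactive and duplicate accounts for review. The record layout (`id`, `email`, `last_login`), the function name, and the two-year inactivity threshold are all assumptions made for this example; your own user export and your own definition of "inactive" will differ.

```python
from datetime import datetime, timedelta


def find_accounts_to_review(users, now, inactive_after_days=730):
    """Return (inactive_ids, duplicate_pairs) for a list of user dicts.

    Assumed layout: each user has 'id', 'email', and 'last_login'
    (a datetime, or None if the account never logged in).
    """
    cutoff = now - timedelta(days=inactive_after_days)
    inactive = [
        u["id"] for u in users
        if u["last_login"] is None or u["last_login"] < cutoff
    ]

    # Treat emails that differ only by case or stray whitespace as duplicates.
    first_seen = {}
    duplicates = []
    for u in users:
        key = u["email"].strip().lower()
        if key in first_seen:
            duplicates.append((first_seen[key], u["id"]))
        else:
            first_seen[key] = u["id"]
    return inactive, duplicates


# Hypothetical sample data.
users = [
    {"id": 1, "email": "ana@example.com", "last_login": datetime(2024, 5, 1)},
    {"id": 2, "email": "Ana@Example.com ", "last_login": datetime(2019, 1, 1)},
    {"id": 3, "email": "bo@example.com", "last_login": None},
]
inactive, duplicates = find_accounts_to_review(users, now=datetime(2024, 6, 1))
print(inactive)    # [2, 3]
print(duplicates)  # [(1, 2)]
```

The output is a list for the data owners to review, not a list to delete automatically. Whether an inactive account is purged or welcomed back is still a business decision.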
The Migration Team
The migration team is responsible for ensuring that your data moves to the new platform as seamlessly as possible. This team is made up of members of both client and development teams:
- Client - Project/Program Manager(s) - help keep the activities of migration in sync with development, help to ensure timely signoff of incremental milestones in migration.
- Client - Product Owner - owns the final data migration plan signoff, including confirming what not to migrate. Owns the final signoff of the final, completed migration or migration script, depending on project scope.
- Client - Data or Business Analyst - owns the handoff of technical details about migration to the migration specialist. The person in this role helps to define why each piece of data is used, and its role in critical business functions of the website. They ensure that no critical data is lost, and may work with the client team internally to identify and map data.
- Client - Data Validation Team - these may be those responsible for specific business units, community moderators, or other experts in your data. Their role is knowing the data intimately enough to execute test cases and determine their validity and completeness. Those in this role should be interacting with your data as a regular part of their daily work, making validation much more precise, and much easier.
- Acquia - Engagement Manager - a management expert at Acquia responsible for overall project coordination and success, and coordinating between client and Acquia teams, but also possibly third-party data providers, other vendors related to migration activities, and managing the on-time achievement of milestones.
- Migration Specialist - a technical expert at Acquia responsible for the execution of the migration planning and development tasks through final signoff.
- Site Architect - a technical site development expert at Acquia, who coordinates closely with the migration team to ensure technical alignment (the correct field types are developed and configured to "capture" the data correctly and the site architecture retains relationships between data elements).
Migration cannot begin the moment the project begins, because there are development dependencies. The content types and fields must be developed before the content can be migrated into them. Critical things to consider in planning are as follows:
- It is never too early to start source data cleanup!
- Enable access to data by the migration team as soon as possible in the project schedule.
- Consider the priority of different data types, and plan for development of those aspects of the site first, followed by less critical items.
- Consider that migration can be done incrementally, but this greatly expands the time it takes the client team to validate, so the scope of validation work should be considered when planning.
- Care should be taken to consider the sunset date of any currently used applications you may be migrating from. It is unwise to run the project schedule up to the day the old system contract ends, because it removes the ability to review the data in the context of the old platform, which can make validation difficult.
Initial and Ongoing Data Sharing
The first act of the Migration Specialist is to gain full access to your data for their initial evaluation. This may include the following:
- Assess the size of the database - How long does it take to transmit/copy? How many of each content type are there?
- Initial assessment of data quality - Is the data consistent among similar content? Is the data model standard? Is this a known system we have worked with before, and what knowledge can we leverage toward this effort?
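The questions above ("How many of each content type are there?" and "Is the data consistent among similar content?") can be roughed out with a short script once a data dump is available. The following is a sketch only, in plain Python, assuming the dump has already been loaded as a list of dictionaries with a `type` key. The function name and sample records are hypothetical.

```python
from collections import Counter, defaultdict


def profile_records(records):
    """Count records per content type and report how often each field is filled."""
    counts = Counter(r["type"] for r in records)

    filled = defaultdict(Counter)
    for r in records:
        for field, value in r.items():
            # Empty strings and None both count as "not filled in".
            if field != "type" and value not in (None, ""):
                filled[r["type"]][field] += 1

    fill_rates = {
        ctype: {field: n / counts[ctype] for field, n in fields.items()}
        for ctype, fields in filled.items()
    }
    return counts, fill_rates


# Hypothetical sample records.
records = [
    {"type": "article", "title": "A", "summary": "x"},
    {"type": "article", "title": "B", "summary": ""},
    {"type": "event", "title": "C", "date": "2015-01-01"},
]
counts, fill_rates = profile_records(records)
print(counts["article"], counts["event"])  # 2 1
print(fill_rates["article"]["summary"])    # 0.5
```

A field that is filled in for only a small share of records is a good prompt for the client team: is it old data, an edge case, or something critical that only one group uses?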
It is best to accomplish the step of initial data sharing as soon as possible at the start of an engagement, to allow for security clearance (often required) and the setup of a secure environment for migration work. In addition to general access, there are a few special considerations:
- Making sure that your data can be shared securely with your development team, and if there is internal or external security certification, that any requirements for secure data handling have been met.
- Arranging a means for regular, secure dumps of data from your current provider(s) to your development team. This is especially important if you are dealing with large amounts of changing data.
Unpacking the Data
Unpacking and understanding the data from the old system(s) and mapping it to the new is the shared work of the migration team and the client team. They answer the question, "What does all this mean?" and help retain as much meaning as possible as the data is mapped to the new system. This can involve several activities, all of which need to be planned for:
- Data cleanup (see above) - Sometimes in all this dumping and analysis, what comes up is the need to clean up the data before it is migrated. Accomplishing this before migration may reduce cost by decreasing old edge cases in the data, by minimizing the amount and kind of data to be migrated, and by reducing the amount of time it takes to validate the data on the client side.
- The need to bring in other people - Sometimes it becomes clear that there is data that nobody on the team knows about. They may not know its importance, what role it plays in relation to other data, or whether or not other groups might use the data. This calls for more inclusion of data owners, to ensure that their data is migrated as needed and nothing critical is left behind.
- Adjustments to scope may be needed if something is discovered in the data that does not map to scope (additional content types or a higher volume of data, for example).
Data Mapping and Signoff
Data mapping can be the most tedious part of the migration process as it involves a great deal of back and forth, client involvement, and small adjustments. In our approach to migration:
- We begin by mapping our initial data analysis into a shared spreadsheet that allows the Migration Specialist and the client team to work together to indicate into which specific content type and field each piece of data must be migrated. This spreadsheet includes all of the data fields in the system(s) you are migrating from, and all of the content types and fields that have been planned for the new system.
- As part of this process, the client determines which specific fields will not be migrated. This may be because they represent old data, old content types that will not be re-created, or data that is not business critical.
- This process completes with client signoff on the scope of migration and the destination of each data field. This defines the final scope of data migration, even if it differs from the original scope, so this may require adjustment to the scope document.
- The Migration Specialist will use the signed off migration planning document to develop the code and scripts to accomplish the migration.
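One way to picture the signed-off mapping is as a lookup from each source field to either a destination (content type, field) pair or an explicit "do not migrate" marker. The sketch below is plain Python and is not the spreadsheet format or migration code we actually use. The field names, the `SKIP` marker, and `check_mapping` are all hypothetical. It shows the two checks that matter before signoff: every source field has a decision, and every destination exists in the planned site.

```python
SKIP = None  # hypothetical marker: field was reviewed and will not be migrated

# Hypothetical mapping: source field -> (content type, field) or SKIP.
mapping = {
    "node.title": ("article", "title"),
    "node.body": ("article", "body"),
    "node.legacy_rating": SKIP,
}


def check_mapping(source_fields, mapping, destination_fields):
    """Return (unmapped, unknown_targets).

    unmapped: source fields with no decision recorded at all.
    unknown_targets: source fields mapped to a destination that is not planned.
    """
    unmapped = sorted(f for f in source_fields if f not in mapping)
    unknown_targets = sorted(
        src for src, dest in mapping.items()
        if dest is not SKIP and dest not in destination_fields
    )
    return unmapped, unknown_targets


source_fields = ["node.title", "node.body", "node.legacy_rating", "node.author_bio"]
destination_fields = {("article", "title"), ("article", "body")}

unmapped, unknown_targets = check_mapping(source_fields, mapping, destination_fields)
print(unmapped)         # ['node.author_bio']
print(unknown_targets)  # []
```

Recording skipped fields explicitly, rather than simply leaving them out, is what makes the "what not to migrate" decision visible at signoff.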
Initial Test Migrations and Signoff
The Migration Specialist may execute many test migrations (partial or complete) before you will see results of an initial migration. During this time, they may be configuring and adjusting the code using the Migrate module (which we typically use as the basis for migrations), testing scripts to combine or separate data fields, and essentially getting the migration script to come as close as possible to migrating all fields correctly.
Once they are substantially complete, you will be invited to review the initial test migration. Some adjustment can be made to the migration plan based on the results of this initial migration (adjusting the destination or source fields for certain data, adjusting field labels to be more clear, and other small adjustments based on the added clarity gained from being able to see the data in place). Major changes at this stage may lead to scope additions, so the results of this step should be discussed in detail to document any adjustments that are in scope, and any items coming up that might require additional scope.
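A simple first check when reviewing a test migration is to reconcile record counts between the old system and the new one. The sketch below is plain Python under assumed inputs (two dictionaries of counts per content type, with hypothetical numbers). It does not replace the field-by-field review by people who know the data. It only points to where to look first.

```python
def compare_counts(source_counts, migrated_counts):
    """Return {content_type: (source, migrated)} for every type whose counts differ."""
    types = set(source_counts) | set(migrated_counts)
    return {
        t: (source_counts.get(t, 0), migrated_counts.get(t, 0))
        for t in sorted(types)
        if source_counts.get(t, 0) != migrated_counts.get(t, 0)
    }


# Hypothetical counts from the old system and from a test migration.
source_counts = {"article": 120, "event": 40, "page": 15}
migrated_counts = {"article": 120, "event": 38}

mismatches = compare_counts(source_counts, migrated_counts)
print(mismatches)  # {'event': (40, 38), 'page': (15, 0)}
```

A mismatch is not automatically an error. Content types that were deliberately left out of the migration plan will show up here too, so the result should be read alongside the signed-off mapping.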
In the case of rapid development, sites with a large amount of data that continues to be updated as development is underway, or high-risk migrations, this work may be split up into incremental migrations. These may be organized around the development schedule for specific content types and fields, the development of interface elements that enable or aid in data validation, or the date on which specific data becomes available for migration. The general plan for whether the migration is incremental or addressed in a test-migrations-followed-by-final-migration manner should be put in place during initial planning. This incremental approach may allow more flexibility for the development team, but may require more time of the data validation team on the client side, since the validation is broken up into phases.
Final Migration and Signoff
Once we gain signoff on either test migration(s) or the series of incremental migrations (depending on the project), the Migration Specialist plans to execute the final content migration. For sites that are newly launching, or replacing existing sites, this may be followed with one or more differential migrations, to bring over content that may have been posted on the old site during cutover. It is recommended that for sites that are re-launching, the old site is put in read-only mode during cutover to reduce the risk of lost content, and to avoid end-user confusion and frustration over what they may experience as "temporarily missing" content.
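The idea behind a differential migration can be sketched as selecting only the source records that were created or changed after the previous migration run. The example below is plain Python and assumes each source row carries a `changed` timestamp. The row layout and function name are hypothetical, and a real migration would also need to handle deletions and updates to records that were already migrated.

```python
from datetime import datetime


def select_for_differential(source_rows, last_migrated_at):
    """Return rows created or changed after the previous migration run."""
    return [r for r in source_rows if r["changed"] > last_migrated_at]


# Hypothetical source rows; 'changed' is the last-modified time on the old site.
source_rows = [
    {"id": 101, "changed": datetime(2015, 3, 1, 9, 0)},
    {"id": 102, "changed": datetime(2015, 3, 2, 18, 30)},
    {"id": 103, "changed": datetime(2015, 3, 3, 8, 15)},
]
final_migration_ran_at = datetime(2015, 3, 2, 12, 0)

delta = select_for_differential(source_rows, final_migration_ran_at)
print([r["id"] for r in delta])  # [102, 103]
```

Putting the old site in read-only mode during cutover keeps this delta empty, which is one reason it reduces risk.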
Signoff occurs after the last migration effort is complete and the client team has reviewed the final migration on production. At signoff, the migration is considered complete, and the Migration Specialist may leave the development team entirely, or be available for consultation as needed, depending on the specific situation.
If you are working with multiple departments or agencies moving toward a shared platform, or are working with a very large amount of data and therefore a very large data validation team, please review Joshua Smith's excellent blog post about how to leverage a special advisory group or committee to support this high-risk aspect of the effort.