Mastering Content Migration - Automated or Manual

Migrations are never a walk in the park, but they don’t have to be a nightmare either. Whether you're transitioning from a legacy system or upgrading your current platform, a successful migration will start by evaluating your content. Which content will be migrated? How will it need to be transformed based on your content strategy design goals, and any new capabilities in the destination system?

Let’s look at these questions in detail.

Analysis phase

The process starts by gathering data on the source content and developing a content inventory, which can be prepared by querying the source data, crawling the site or using system reports. The content inventory is a key document for estimating the effort to migrate content, understanding the destination structure and deciding whether the migration is a candidate for automation.

Now it’s important to understand a bit more about the content, and build some metrics by asking questions of each type of content:

  • How much content needs migration of each type?
  • How complex is each type of content?
  • How "structured" is each type of data?
  • How much processing or manipulation is required when moved to the destination site?
  • Does this content depend on any other?
  • What SEO impact may arise from moving this content?

A good automation candidate will typically have a high volume of content and a consistent structure and format that can be mapped to the new site.
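As a rough illustration of how the first few questions could be turned into numbers, here is a minimal Python sketch. It assumes the source content has already been exported as a list of records with a `type` and a `fields` dictionary; that record shape, the `build_inventory` name and the "consistency" metric (the share of items that populate the most common set of fields) are all assumptions made for this example, not part of any particular migration tool.

```python
from collections import Counter, defaultdict


def build_inventory(items):
    """Summarize volume and structural consistency per content type.

    Hypothetical input shape: {"type": str, "fields": {name: value}}.
    """
    shapes_by_type = defaultdict(Counter)
    for item in items:
        # The set of populated field names stands in for the item's "shape".
        shape = frozenset(
            name for name, value in item["fields"].items()
            if value not in (None, "", [])
        )
        shapes_by_type[item["type"]][shape] += 1

    inventory = {}
    for content_type, shapes in shapes_by_type.items():
        volume = sum(shapes.values())
        most_common_count = shapes.most_common(1)[0][1]
        inventory[content_type] = {
            "volume": volume,
            # 1.0 means every item populates exactly the same fields.
            "consistency": most_common_count / volume,
        }
    return inventory


sample = [
    {"type": "news", "fields": {"title": "A", "body": "x", "image": "a.png"}},
    {"type": "news", "fields": {"title": "B", "body": "y", "image": "b.png"}},
    {"type": "news", "fields": {"title": "C", "body": "z", "image": ""}},
    {"type": "landing", "fields": {"title": "Home", "html": "<div>...</div>"}},
]
inventory = build_inventory(sample)
for content_type, metrics in inventory.items():
    print(content_type, metrics)
```

A type with high volume and consistency close to 1.0 would match the description of a good automation candidate; complexity, dependencies and SEO impact still need a human assessment.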

Behold the content inventory

The content inventory provides insight into the usage of entities across an entire platform and the dependencies for each site. Each content type is evaluated as a Good/Potential/Bad candidate for automation depending on volume, complexity and utilization.
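The Good/Potential/Bad rating could be sketched as a simple rule, shown below in Python. The thresholds, the 1 to 5 complexity scale and the use of "number of sites using the type" as the utilization measure are assumptions chosen for illustration only; a real project would agree on its own criteria.

```python
def classify_candidate(volume, complexity, sites_using,
                       high_volume=500, low_volume=50, widely_used=5):
    """Rate a content type as an automation candidate.

    All thresholds are hypothetical defaults:
      volume      -- number of items of this type across the platform
      complexity  -- subjective score from 1 (simple) to 5 (very complex)
      sites_using -- how many sites use this type (utilization)
    """
    # Too little content, or too complex to map reliably.
    if volume < low_volume or complexity >= 4:
        return "Bad"
    # Simple structure plus either high volume or wide reuse.
    if complexity <= 2 and (volume >= high_volume or sites_using >= widely_used):
        return "Good"
    return "Potential"


print(classify_candidate(volume=1200, complexity=2, sites_using=8))
print(classify_candidate(volume=300, complexity=3, sites_using=2))
print(classify_candidate(volume=30, complexity=1, sites_using=1))
```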

[Figure: Content Inventory]

This example illustrates a platform migration with many regional sites. The inventory will help to:

  • prioritize the order in which features need to be developed to roll out sites and inform the migration roadmap.
  • highlight cases where particular sites have a high volume of content.
  • help group or cluster sites with dependencies on common features.
  • identify "snowflake" sites with unique features that may require consideration and forward planning.

Evaluation

For each type of content that is a potential candidate for automation, we investigate further to determine what shape the content needs to take on the destination site. Each content type is evaluated, and the following questions are asked with regard to purpose and value:

  • How relevant is the content to the business? Is the content migrated for historical or archival purposes? Does legislation mandate that the content must be publicly available "as is"?
  • Is the maintainability of this content of value to the business?
  • How structured is the source content? How well does it map to the destination?
  • Does the content need to be in a form that editors can maintain or update?

Depending on the answers, various approaches can be evaluated that determine the overall effort to migrate. For example, in some cases, the content is required; however, the effort to transform is not justified or content changes are not anticipated.

Some of those approaches that we can encounter are:

  • Migrate structured source content and map it to structured content types on the destination.
  • Migrate unstructured content into structured content types.
  • Migrate source content into a structured content type, but some areas will require manual curation or re-entry.
  • Migrate unstructured content into a “Legacy” unstructured content type — this reduces development effort but limits the long-term maintainability.

When mapping the content to the destination site, SEO and incoming links also need to be considered. This is particularly important if a new, "clean" or sanitized URL structure is adopted: legacy URL paths must redirect users and search engines to the new location.
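One way to keep track of this is to generate a redirect map as part of the migration. The Python sketch below assumes each migrated page is described by a `legacy_path`, a `section` and a `title`, and that the new URL scheme is `/section/title-slug`; those field names and that scheme are hypothetical and only serve the example.

```python
import re
import unicodedata


def slugify(text):
    """Lowercase, strip accents and replace non-alphanumeric runs with '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def build_redirects(pages):
    """Map each legacy path to its new clean path, refusing collisions."""
    redirects = {}
    seen = set()
    for page in pages:
        new_path = f"/{slugify(page['section'])}/{slugify(page['title'])}"
        if new_path in seen:
            # Two pages ending up on the same URL needs a human decision.
            raise ValueError(f"Duplicate destination path: {new_path}")
        seen.add(new_path)
        if page["legacy_path"] != new_path:
            redirects[page["legacy_path"]] = new_path
    return redirects


pages = [
    {"legacy_path": "/index.php?id=42", "section": "News", "title": "Año Nuevo 2020!"},
    {"legacy_path": "/about_us.html", "section": "Company", "title": "About Us"},
]
redirects = build_redirects(pages)
for old, new in redirects.items():
    print(f"{old} -> {new}")
```

The resulting map can then be loaded into whatever redirect mechanism the destination platform provides.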

Considerations: When a manual migration is a valid option

Finally, I’d highlight three considerations to take into account when facing this decision.

The first one is the problem to solve. Automation will not necessarily solve issues of content quality, relevance or purpose in the source system, for example, so such issues are a good indicator that manual work is required.

There may also be a temptation to do a "like for like" and focus on building the "same as before"; however, this approach can result in missing out on new opportunities. This is common, in particular, when the new system provides new capabilities such as layout building, branded components and structures that were not previously available for customers and editors alike.

The second one is cost: there is a trade-off between the time a developer spends building and testing migration scripts and the time a content editor spends manually recreating pages.

  • Manual migration may be a more suitable option when the effort to manually migrate the volume of content, including any manual intervention necessary, is less than or close to the total development effort.
  • When estimating effort for an automated migration, it's important to include the time spent reviewing, validating and checking the output.
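The trade-off can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below is only an estimate under stated assumptions: the parameter names, the 10% tolerance used for "close to", and the idea that a share of automated items still needs a full manual fix are all invented for this example.

```python
def compare_effort(item_count, manual_minutes_per_item, dev_hours,
                   review_minutes_per_item, manual_fix_share=0.0,
                   tolerance=0.10):
    """Compare manual and automated migration effort, in hours.

    manual_fix_share -- assumed fraction of automated items that still
                        need to be redone by hand
    tolerance        -- how far above the automated effort the manual
                        effort may be and still count as "close to" it
    """
    manual_hours = item_count * manual_minutes_per_item / 60
    # Automated effort includes development, review and manual fixes.
    review_hours = item_count * review_minutes_per_item / 60
    fix_hours = item_count * manual_fix_share * manual_minutes_per_item / 60
    automated_hours = dev_hours + review_hours + fix_hours

    manual_wins = manual_hours <= automated_hours * (1 + tolerance)
    return {
        "manual_hours": manual_hours,
        "automated_hours": automated_hours,
        "recommendation": "manual" if manual_wins else "automated",
    }


small = compare_effort(item_count=80, manual_minutes_per_item=15,
                       dev_hours=40, review_minutes_per_item=2)
large = compare_effort(item_count=5000, manual_minutes_per_item=15,
                       dev_hours=80, review_minutes_per_item=2,
                       manual_fix_share=0.05)
print(small["recommendation"], large["recommendation"])
```

With these made-up figures, 80 pages are cheaper to recreate by hand, while 5,000 items justify the development effort even after review and fixes are counted.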

Finally, I would consider the hybrid approach. In cases where the vision, purpose or structure has changed so significantly that review and editorial changes are necessary, the hybrid approach is the right tool: a partial migration can establish the basic structure, such as images, terms, paths and menus, and the editor reformats the content post-migration.

Alejandro Moreno Lopez, Software Engineer / Technical Architect, Acquia

Alejandro Moreno Lopez is a software engineer/architect who has worked in different roles, from pure development, to leading small and medium teams, to co-founding three different companies with different partners in technology, content management, online marketing and tourism. He's also been lucky enough to work on and lead some of the biggest enterprise projects in Drupal, like Royal Mail, Parcelforce, and BBC.

Before joining Acquia, Alejandro was the technical architect at BBC, leading two teams responsible for the delivery and innovation of two of the biggest brands in the corporation: BBC GoodFood and BBC TopGear.

He also worked with one of the biggest consultancies in the world, Capgemini, where he enjoyed the delivery and learning curve of complex tools and architectures of the enterprise world.

Born in sunny Alicante, Alejandro moved to London exchanging sun, good weather and exceptional food for interesting projects, incredible teams and amazing challenges. He still doesn’t regret that choice... most of the time.

He loves learning languages, everything boarding (skateboarding, snowboarding, kitesurfing, …), cycling, and spending time with his daughter, his wife, and his French Bulldog.