WordPress blogs migration with Sitecore Data Exchange Framework (DEF)

A couple of months ago I gave a talk for the Sitecore User Group Ecuador. As part of the talk I presented a POC (it's actually more than just a POC) that migrates existing WordPress blogs into a Sitecore instance. In this post I will explain how I achieved this using DEF. However, I won't delve into the details of how DEF works, since there are plenty of resources on the internet. At the end of the post you will find a link to GitHub with the code.

DEF uses a set of items (endpoints, pipeline steps, pipeline batches, etc.) to integrate with other systems. This POC is composed of:
  • One Endpoint
  • One Pipeline Batch
  • One Pipeline
  • Six Pipeline Steps

Endpoint:

An endpoint in DEF is an item that holds the configuration for an integration. In this case, the endpoint contains the URLs from which we are going to retrieve the posts, tags, and categories:



This endpoint item uses a custom template that inherits from the DEF "Base Endpoint" template and can be extended with as many new fields as we need. In this case I only needed three new single-line text fields for the URLs.

In order to be able to use this endpoint, we need to convert it to something that DEF can understand by using a converter. In the same endpoint item we can specify the converter in the "Converter Type" field. The converter class that I implemented is shown below:

 using System;
 using Sitecore.DataExchange.Models;
 using Sitecore.DataExchange.Repositories;  
 using Sitecore.Services.Core.Model;   
 using Sitecore.DataExchange.Converters.Endpoints;  
 namespace DEFExample.Website.Converters.Endpoints  
 {  
   //By inheriting from BaseEndpointConverter you get access to a number of methods that facilitate reading values from fields on a Sitecore item.  
   public class WordpressEndpointConverter : BaseEndpointConverter  
   {  
     private static readonly Guid TemplateId = Guid.Parse(IWordPress_Endpoint_Constants.TemplateIdString);  
     public WordpressEndpointConverter(IItemModelRepository repository) : base(repository)  
     {  
       //  
       //identify the template an item must be based  
       //on in order for the converter to be able to  
       //convert the item  
       this.SupportedTemplateIds.Add(TemplateId);  
     }  
     protected override void AddPlugins(ItemModel source, Endpoint endpoint)  
     {  
       //
       //create the plugin and populate it using values from the item
       var settings = new WordpressSettings
       {
         PostsUrl = GetStringValue(source, IWordPress_Endpoint_Constants.Posts_URL_FieldName),
         TagsUrl = GetStringValue(source, IWordPress_Endpoint_Constants.Tags_URL_FieldName),
         CategoriesUrl = GetStringValue(source, IWordPress_Endpoint_Constants.Categories_URL_FieldName)
       };
       //
       //add the plugin to the endpoint
       endpoint.AddPlugin(settings);  
     }  
   }  
 }  
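
The converter above populates a WordpressSettings plugin, and the processor shown later reads it back through endpoint.GetWordpressSettings(), but neither type appears in this post. The following is a minimal sketch of what they might look like, modeled on the plugin and extension-method pattern used in the DEF documentation examples. The namespace and the exact shape of both classes are assumptions; the real versions are in the linked repository.

```csharp
using Sitecore.DataExchange;
using Sitecore.DataExchange.Models;

// Assumed namespace; the repository may organize these types differently.
namespace DEFExample.Website.Models
{
    // Plugin that carries the three URLs read from the endpoint item.
    // IPlugin is the marker interface DEF uses for plugins.
    public class WordpressSettings : IPlugin
    {
        public string PostsUrl { get; set; }
        public string TagsUrl { get; set; }
        public string CategoriesUrl { get; set; }
    }

    // Convenience extension so that processors can fetch the plugin in one call.
    public static class EndpointExtensions
    {
        public static WordpressSettings GetWordpressSettings(this Endpoint endpoint)
        {
            // Returns null when the endpoint has no WordpressSettings plugin.
            return endpoint.GetPlugin<WordpressSettings>();
        }
    }
}
```

Keeping the settings in a plugin means that pipeline steps depend only on the plugin type and not on the Sitecore item that stores the URLs.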

Pipeline and Pipeline Steps:

Pipelines are basically a grouping of pipeline steps that perform certain actions. Pipelines in DEF are similar to those in Sitecore, and the concept of pipeline steps is similar to that of processors.

In order to make it more intuitive, I separated the different integration steps into the following pipeline steps:


Each one of these steps accomplishes a specific objective (which can be easily understood from the item name). All the pipeline step items use a custom template that inherits from the "Base Pipeline Step" template that comes with DEF, to which I added an "EndpointFrom" droptree field. This field sets the endpoint that the pipeline step will use:



Under the Data Exchange Framework section pictured above, we also need to specify a converter and a processor. As explained above, a converter is a class that transforms a Sitecore item model into a DEF component, and a processor is the class that contains the logic executed by the pipeline step.

Below you can see the processor I used to read the categories from WordPress:

 using Sitecore.DataExchange.Attributes;  
 using Sitecore.DataExchange.Contexts;  
 using Sitecore.DataExchange.Models;  
 using Sitecore.DataExchange.Plugins;  
 using Sitecore.DataExchange.Processors.PipelineSteps;  
 using System;  
 using System.Collections.Generic;  
 using DEFExample.Website.Models;  
 using Examples.FileSystem;
 using Sitecore.Services.Core.Diagnostics;
 using DEFExample.Website.Helpers.Factories;  
 using DEFExample.Website.Helpers.Services;  
 namespace DEFExample.Website.Processors.PipelineSteps  
 {  
   [RequiredEndpointPlugins(typeof(WordpressSettings))]  
   public class ReadCategoriesStepProcessor : BaseReadDataStepProcessor  
   {  
     // protected static readonly string TotalNumberOfPages = "X-WP-TotalPages";  
     private readonly IWordpressService _wordpressService;
     public ReadCategoriesStepProcessor()  
     {  
       _wordpressService = WordpressServiceFactory.Build();  
     }  
     protected override void ReadData(Endpoint endpoint, PipelineStep pipelineStep, PipelineContext pipelineContext,  
       ILogger logger)  
     {  
       if (endpoint == null)  
       {  
         throw new ArgumentNullException(nameof(endpoint));  
       }  
       if (pipelineStep == null)  
       {  
         throw new ArgumentNullException(nameof(pipelineStep));  
       }  
       if (pipelineContext == null)  
       {  
         throw new ArgumentNullException(nameof(pipelineContext));  
       }  
       try  
       {  
         var settings = endpoint.GetWordpressSettings();  
         if (settings == null)  
         {  
           logger.Error("Empty WordPress settings");  
           return;  
         }  
         List<Category> categories = _wordpressService.Read<Category>(settings.CategoriesUrl, logger);  
         var categoriesData = new IterableDataSettings(categories);  
         pipelineContext.AddPlugin(categoriesData);  
       }  
       catch (Exception ex)  
       {  
         logger.Error($"Error in ReadCategoriesStepProcessor: {ex}");
         pipelineContext.CriticalError = true;  
       }  
     }  
   }  
 }  
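
The processor delegates the HTTP work to _wordpressService.Read<Category>(...), which is not shown in this post. The commented-out X-WP-TotalPages constant hints at pagination: the WordPress REST API returns collections one page at a time and reports the number of pages in that response header. Below is a rough sketch of how such a read could page through a collection. It is not the service from the repository: the class name, the method names, the async signature, and the use of System.Text.Json are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, not the IWordpressService implementation from the repository.
public static class WordpressPagingSketch
{
    // Response header in which the WordPress REST API reports the number of pages.
    public const string TotalPagesHeader = "X-WP-TotalPages";

    // Appends the standard "page" and "per_page" query parameters to a collection URL.
    public static string BuildPageUrl(string baseUrl, int page, int perPage)
    {
        var separator = baseUrl.Contains("?") ? "&" : "?";
        return $"{baseUrl}{separator}page={page}&per_page={perPage}";
    }

    // Falls back to a single page when the header is missing or is not a positive number.
    public static int ParseTotalPages(string headerValue)
    {
        return int.TryParse(headerValue, out var pages) && pages > 0 ? pages : 1;
    }

    // Requests page after page and collects the deserialized items into one list.
    public static async Task<List<T>> ReadAllAsync<T>(HttpClient client, string baseUrl, int perPage = 100)
    {
        // WordPress returns lower-case JSON names, so match properties case-insensitively.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        var items = new List<T>();
        var totalPages = 1;

        for (var page = 1; page <= totalPages; page++)
        {
            using (var response = await client.GetAsync(BuildPageUrl(baseUrl, page, perPage)))
            {
                response.EnsureSuccessStatusCode();

                // The first response tells us how many pages there are in total.
                if (page == 1 && response.Headers.TryGetValues(TotalPagesHeader, out var values))
                {
                    totalPages = ParseTotalPages(values.FirstOrDefault());
                }

                var json = await response.Content.ReadAsStringAsync();
                items.AddRange(JsonSerializer.Deserialize<List<T>>(json, options) ?? new List<T>());
            }
        }

        return items;
    }
}
```

A step processor would pass the URL stored in the endpoint settings (for example settings.CategoriesUrl) as baseUrl. On a .NET Framework based Sitecore installation, System.Text.Json has to be added as a NuGet package, or the deserialization call can be swapped for another JSON library.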

The other pipeline steps are similar (the code is linked at the bottom of the page), so I won't cover them here.

Pipeline Batches:

The other thing we need is a Pipeline Batch, which can be created using the "Pipeline Batch" template. Pipeline batches allow us to specify which pipelines will run and in what order:


Pipeline batches also show when the process last started and when it last finished. Another useful feature of this item is that it lets you view the logs for the integration process:

Unless a requirement calls for further customization of the batch, there is no need to specify custom converters or processors.

Pipeline Batches can run on a schedule (just like any other Sitecore scheduled task), or they can be run manually using the "Run Pipeline Batch" button that appears in the Data Exchange ribbon when we select the Pipeline Batch item:



If we would like to automatically run the Pipeline Batch we need to create a scheduled task. Creating scheduled tasks for pipeline batches is extremely easy. 
  1. Create a Command Item based on the "Run Selected Pipeline Batches Command" template
  2. In the Pipeline Batches section, select the pipeline batch that you want to schedule (You don't need to change the Type or Method field unless further customization is needed).
  3. Create a Schedule based on the "Schedule" template (like any other normal Sitecore schedule) and select the command created in the previous step.

After running the Pipeline Batch (manually or with a schedule) my Sitecore tree ended up looking like this:

As you can see, besides importing the blog posts, I also imported the categories and tags so that I could link them (using a treelist field) to the actual post.
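
For context on that linking: Sitecore list fields such as treelist store their selection as a pipe-separated string of item IDs. The sketch below shows how the raw value for a post's category or tag field could be built once the imported items' IDs are known. The helper is hypothetical and not part of the repository; in the POC the value would be produced by the DEF steps that write the post items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the post's repository.
public static class TreelistValueSketch
{
    // Formats item IDs the way Sitecore stores them in list fields:
    // upper-case GUIDs in braces, separated by pipes.
    public static string ToRawValue(IEnumerable<Guid> itemIds)
    {
        return string.Join("|", itemIds.Select(id => id.ToString("B").ToUpperInvariant()));
    }
}
```

An empty sequence produces an empty string, which corresponds to a treelist with nothing selected.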

The code can be downloaded from GitHub.

Quick tip: If the "Run Pipeline Batch" button is grayed out and you can't run the batch, it is most likely because you haven't enabled the tenant. Make sure to check the box below:


