The starting point to migrate any blog posts from your previous blog engine to Jekyll is the Jekyll-import doc site. Looking at the website you’ll see there are many importers available for all the major blogging platforms.

My previous blogging engine, Dexter Blog Engine, was not supported, so I opted to use the basic RSS importer.

Here are the steps I followed:

  1. I set up the Jekyll importer following the instructions on the official website.
  2. I exported the complete RSS feed of my blog (not only the first N posts) to a local file, so I could process or fix it further if needed.
  3. I tried a rough import, just to see if there were any major problems.

Here’s the command I used to start the import procedure:

ruby -rubygems -e 'require "jekyll-import"; JekyllImport::Importers::RSS.run({ "source": "migration/rss.txt" })'

I was lucky! I got no major problems (except fixing a couple of strings) and the posts showed up correctly: the titles, content and publish date were correctly imported.

Let’s look at the ‘problems’:

  • Some data were missing (the comments, the tags and the categories of the posts).
  • The permalinks didn’t match the previous format.

Let’s go through them one by one.

The Comments

I just gave up on them for now: they are not present in the RSS feed and I didn’t use an external service (like Disqus) on my previous blog. I have them in my backup database and, hopefully, I’ll have time to restore them later on.

The Categories

Same as the comments, they are not in the RSS feed; but honestly I didn’t like them too much, so it’s not a big loss!

The Tags

Looking at the exported RSS feed I could see that every post item has a series of <category> elements containing my old tags, but they were not imported. I guessed that the default RSS importer ignored them; it should be easy to solve this issue with post-processing or, even better, by customizing the importer.
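As a rough sketch of the "customize the importer" idea: the Ruby rss library that the importer loads parses each <category> element into an object. Assuming those objects respond to `content` (which should hold for the RSS 2.0 classes), the tags can be read directly. The sample feed and the `tags_per_post` helper below are made up for illustration.

```ruby
require 'rss'

# Made-up sample feed, only for illustration.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example blog</title>
      <link>http://www.example.com/</link>
      <description>Example feed</description>
      <item>
        <title>Post title</title>
        <link>http://www.example.com/blog/post/post-title</link>
        <description>Post body</description>
        <category>jekyll</category>
        <category>ruby</category>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: maps every post title to the list of tags
# found in its <category> elements.
def tags_per_post(xml)
  rss = RSS::Parser.parse(xml, false) # false = skip validation
  rss.items.each_with_object({}) do |item, result|
    result[item.title] = item.categories.map(&:content)
  end
end

p tags_per_post(SAMPLE_FEED)
```

On recent Ruby versions the rss library ships as a separate gem, so it may need to be installed first.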

The Permalinks

This one is pretty important. The Permalinks page describes how Jekyll builds the URL of each post. The expression that defines the permalink is configurable by editing the ‘permalink’ setting in the _config.yml file.

A possible solution is to use the Jekyll Redirect From plugin, but it requires adding a new option to the YAML front matter of every imported post. There are a few ways of doing that:

  • Manually edit every post… uhm!
  • Post process every file generated by the importer… not that good!
  • Change the permalink option in the YAML and tweak the importer a bit… I like this one!
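For reference, the post-processing option could look roughly like the sketch below. It assumes every imported file starts with YAML front matter between two `---` lines and that the plugin reads a `redirect_from` key from it; the `add_redirect` helper and the sample post are hypothetical.

```ruby
require 'yaml'

# Made-up sample of an imported post, only for illustration.
SAMPLE_POST = "---\nlayout: post\ntitle: Post title\n---\n\n<p>Hello</p>\n"

# Hypothetical helper: adds a redirect_from entry to the YAML front
# matter of one post and returns the new file content.
def add_redirect(post_text, old_path)
  # The front matter is the block between the first two '---' lines.
  match = /\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\z/m.match(post_text)
  raise ArgumentError, 'no YAML front matter found' unless match

  front_matter = YAML.safe_load(match[1]) || {}
  redirects = Array(front_matter['redirect_from'])
  front_matter['redirect_from'] = (redirects + [old_path]).uniq

  # to_yaml already emits the leading '---' line.
  "#{front_matter.to_yaml}---\n#{match[2]}"
end

puts add_redirect(SAMPLE_POST, '/blog/post/post-title')
```

A real script would loop over the files in _posts and write each result back, which is exactly the extra tooling I wanted to avoid.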

My old blog links looked like:

www.primordialcode.com/blog/post/post-title

I decided to keep this structure for Jekyll too, so in _config.yml I set:

permalink: /blog/post/:title

That solved half of the problem: the ‘title’ part of the URL did not match, because Jekyll uses a different algorithm to generate the slug. Jekyll uses the filename to fill in the title (optionally overridden by the ‘slug’ attribute in the YAML front matter), so what I needed to do was generate the filename in a way that matched the slug generated by Dexter Blog Engine.
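To make the relationship between filename and URL concrete, here is a rough sketch in plain Ruby. It is not Jekyll's actual code: the regular expression and both helpers are simplified stand-ins made up for illustration, they only handle the `:title` placeholder, and the date in the sample filename is invented.

```ruby
# Simplified illustration, NOT Jekyll's real implementation.
# Assumes post filenames follow the YYYY-MM-DD-title.ext convention.
POST_FILENAME = /\A(\d{4}-\d{2}-\d{2})-(.+)\.[^.]+\z/

# Hypothetical helper: returns the part of the filename that fills
# the :title placeholder.
def title_from_filename(filename)
  match = POST_FILENAME.match(File.basename(filename))
  raise ArgumentError, "not a post filename: #{filename}" unless match

  match[2]
end

# Hypothetical helper: expands only the :title placeholder. A slug
# from the front matter, when given, wins over the filename.
def expand_permalink(template, filename, slug: nil)
  template.strip.sub(':title', slug || title_from_filename(filename))
end

puts expand_permalink('/blog/post/:title', '_posts/2015-03-01-post-title.html')
# prints /blog/post/post-title
```

So as long as the importer writes the old slug into the filename, the new URL lines up with the old one.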

Looking at the exported RSS file I could see that every exported blog post had a <link> element containing the original absolute URL, so I could parse the correct filename from there. Let’s tweak the default RSS importer.
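The parsing step can be sketched on its own in a few lines of Ruby. This is a slightly more defensive variant of the idea, not the importer code: both helper names are hypothetical, the `http://` scheme is assumed, and the date is invented for the example.

```ruby
require 'uri'

# Hypothetical helper: returns the last path segment of a post URL,
# i.e. the slug generated by the old blog engine.
def slug_from_link(link)
  URI.parse(link.strip).path.split('/').reject(&:empty?).last
end

# Hypothetical helper: builds the Jekyll post filename from the
# publish date and the old slug.
def post_filename(date, link)
  "#{date.strftime('%Y-%m-%d')}-#{slug_from_link(link)}.html"
end

link = 'http://www.primordialcode.com/blog/post/post-title'
puts slug_from_link(link)                       # prints post-title
puts post_filename(Time.utc(2015, 3, 1), link)  # prints 2015-03-01-post-title.html
```

Going through URI keeps a trailing slash or a query string from ending up in the filename.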

Disclaimer: I am NOT a Ruby developer, I just messed around with things until it roughly worked!

At first I thought of creating a brand new importer starting from the existing one, but things turned out to be a lot easier than that: when you install a Ruby gem (at least on a Windows machine, I have no idea about other OSes), all the source code is installed too and you can modify that code directly. On my machine the source code for the importers can be found here: C:\Ruby22-x64\lib\ruby\gems\2.2.0\gems\jekyll-import-0.7.1\lib\jekyll-import\importers

I edited the rss.rb file and changed the importer to extract the filename from the <link> element rather than generating it from scratch by parsing the post title:

module JekyllImport
  module Importers
    class RSS < Importer
      def self.specify_options(c)
        c.option 'source', '--source NAME', 'The RSS file or URL to import'
      end

      def self.validate(options)
        if options['source'].nil?
          abort "Missing mandatory option --source."
        end
      end

      def self.require_deps
        JekyllImport.require_with_fallback(%w[
          rss
          rss/1.0
          rss/2.0
          open-uri
          fileutils
          safe_yaml
        ])
      end

      # Process the import.
      #
      # source - a URL or a local file String.
      #
      # Returns nothing.
      def self.process(options)
        source = options.fetch('source')

        content = ""
        open(source) { |s| content = s.read }
        rss = ::RSS::Parser.parse(content, false)

        raise "There doesn't appear to be any RSS items at the source (#{source}) provided." unless rss

        rss.items.each do |item|
          formatted_date = item.date.strftime('%Y-%m-%d')

          # original code
          #post_name = item.title.split(%r{ |!|/|:|&|-|$|,|\?|\*}).map do |i|
          #  i.downcase if i != ''
          #end.compact.join('-')

          # extract the filename from the old permalink
          post_name = item.link.split('/').last(1).join('')

          name = "#{formatted_date}-#{post_name}"

          header = {
            'layout' => 'post',
            'title' => item.title,
            # extract the tags from the categories (ugly ugly ugly!)
            'tags' => (item.categories.join(",")).gsub("<category>", "").gsub("</category>", "").split(",")
          }

          FileUtils.mkdir_p("_posts")

          File.open("_posts/#{name}.html", "w") do |f|
            f.puts header.to_yaml
            f.puts "---\n\n"
            f.puts item.description
          end
        end
      end
    end
  end
end

The code above also adds the correct tags back to the post YAML front matter, parsing them from the <category> elements exported in the RSS file.

In the end, customizing the importer saved me from writing another console application to post-process every imported file and add the missing pieces of information. It also saved me from installing the Jekyll Redirect From gem plugin.

cya next
