mkaz solutions log » home

Convert Wordpress Data to Jekyll

Date: Aug 12, 2011

I recently converted solog, this solutions log, from Wordpress to a Jekyll site. Jekyll is a static site generator, written in ruby. It is a pretty basic tool which takes a set of files, applies templates and generates a web site. No database, admin site, plug-ins or constant upgrades needed. It is simple, but for a basic website, simple is all I need.

To convert, I needed to extract the articles out of the Wordpress database and to basic files. I ran into some issues getting the included ruby import scripts to connect to the database, so I simply used MySQL’s xml output. However, the output format wasn’t easy to work with, so after struggling a bit on reading the data using scala, I switched to ruby’s XML-Simple which made it much easier.

Here are the scripts and procedures I used to convert Wordpress to Jekyll friendly files.

First dump database to XML

  mysql DB --xml -e "SELECT post_title, post_name, post_date, \
    post_content, ID, t.slug as category  FROM wp_posts p, \
    wp_term_relationships tr, wp_term_taxonomy tt, wp_terms t \
    WHERE post_status = 'publish' AND post_type = 'post' \
    AND p.id = tr.object_id \
    AND tr.term_taxonomy_id = tt.term_taxonomy_id \
    AND tt.term_id = t.term_id " >posts.xml

This outputs the content in a awkward XML format which all the elements have the same name and only different by attributes, thus the reason it was difficult to work with. Here’s a brief segment of the XML to highlight the point.

  <resultset ...>
    <row>
      <field name="post_title">How to Convert XML to JSON </field>
      <field name="post_name">how-to-convert-xml-to-json </field>
      <field name="post_date">2011-05-23 23:16:51 </field>
      <field name="post_content">Download lift-json jar from
      ...

To parse in ruby, I used the xml-simple library which converts the XML to ruby objects. I initially tried using the built-in REXML library but I ran into the same issues I did with scala, using an xpath query is trickier when dealing with attributes.

To install the library use: sudo gem install xml-simple

Convert XML to Files

Here’s the ruby script I used to convert the XML of all the posts to individual files.

  # Read in dump file and output markdown file  with metadata at top

  require 'xmlsimple'
  data = XmlSimple.xml_in('posts.xml', { 'KeyAttr' => 'field'})

  data['row'].each do |post|

      post['field'].each do |field|
          if (field['name'] == "post_title")
              $title = field['content']
          end
          if (field['name'] == "ID")
              $id = field['content']
          end
          if (field['name'] == "post_name")
              $postname = field['content']
          end
          if (field['name'] == "post_content")
              $content = field['content']
          end
          if (field['name'] == "post_date")
              $date = field['content']
          end
      end

      # write out file
      filename = $postname + ".md"
      myfile = File.new(filename, "w")
      myfile.puts("---")
      myfile.puts("layout: post")
      myfile.puts("title: " + $title)
      myfile.puts("id: " + $id)
      myfile.puts("date: " + $date)
      myfile.puts("---\n")
      myfile.puts($content)
      myfile.close

  end

I needed to clean up a bunch of special characters, such as smart quotes, dashes and apostrophes. I used the command-line tools find and sed to do that.

  # apostrophes
  find ./ -name "*.md" | xargs sed -i '' "s:í:':g"

  # left quote
  find ./ -name "*.md" | xargs sed -i '' 's:ì:":g'

  # right quote
  find ./ -name "*.md" | xargs sed -i '' 's:î:":g'

I then modified here and there to get how I wanted, I only had a dozen or so files I was dealing with, so the script isn’t perfect. I’ll come back when/if I decide to convert my long standing mkaz.com site with its hundreds of posts.

Hope it helps.


Additional Resources