I recently converted solog, this solutions log, from Wordpress to a Jekyll site. Jekyll is a static site generator, written in ruby. It is a pretty basic tool which takes a set of files, applies templates and generates a web site. No database, admin site, plug-ins or constant upgrades needed. It is simple, but for a basic website, simple is all I need.
To convert, I needed to extract the articles out of the Wordpress database and to basic files. I ran into some issues getting the included ruby import scripts to connect to the database, so I simply used MySQL’s xml output. However, the output format wasn’t easy to work with, so after struggling a bit on reading the data using scala, I switched to ruby’s XML-Simple which made it much easier.
Here are the scripts and procedures I used to convert Wordpress to Jekyll friendly files.
mysql DB --xml -e "SELECT post_title, post_name, post_date, \
post_content, ID, t.slug as category FROM wp_posts p, \
wp_term_relationships tr, wp_term_taxonomy tt, wp_terms t \
WHERE post_status = 'publish' AND post_type = 'post' \
AND p.id = tr.object_id \
AND tr.term_taxonomy_id = tt.term_taxonomy_id \
AND tt.term_id = t.term_id " >posts.xml
This outputs the content in a awkward XML format which all the elements have the same name and only different by attributes, thus the reason it was difficult to work with. Here’s a brief segment of the XML to highlight the point.
<resultset ...>
<row>
<field name="post_title">How to Convert XML to JSON </field>
<field name="post_name">how-to-convert-xml-to-json </field>
<field name="post_date">2011-05-23 23:16:51 </field>
<field name="post_content">Download lift-json jar from
...
To parse in ruby, I used the xml-simple library which converts the XML to ruby objects. I initially tried using the built-in REXML library but I ran into the same issues I did with scala, using an xpath query is trickier when dealing with attributes.
To install the library use: sudo gem install xml-simple
Here’s the ruby script I used to convert the XML of all the posts to individual files.
# Read in dump file and output markdown file with metadata at top
require 'xmlsimple'
data = XmlSimple.xml_in('posts.xml', { 'KeyAttr' => 'field'})
data['row'].each do |post|
post['field'].each do |field|
if (field['name'] == "post_title")
$title = field['content']
end
if (field['name'] == "ID")
$id = field['content']
end
if (field['name'] == "post_name")
$postname = field['content']
end
if (field['name'] == "post_content")
$content = field['content']
end
if (field['name'] == "post_date")
$date = field['content']
end
end
# write out file
filename = $postname + ".md"
myfile = File.new(filename, "w")
myfile.puts("---")
myfile.puts("layout: post")
myfile.puts("title: " + $title)
myfile.puts("id: " + $id)
myfile.puts("date: " + $date)
myfile.puts("---\n")
myfile.puts($content)
myfile.close
end
I needed to clean up a bunch of special characters, such as smart quotes, dashes and apostrophes. I used the command-line tools find and sed to do that.
# apostrophes
find ./ -name "*.md" | xargs sed -i '' "s:í:':g"
# left quote
find ./ -name "*.md" | xargs sed -i '' 's:ì:":g'
# right quote
find ./ -name "*.md" | xargs sed -i '' 's:î:":g'
I then modified here and there to get how I wanted, I only had a dozen or so files I was dealing with, so the script isn’t perfect. I’ll come back when/if I decide to convert my long standing mkaz.com site with its hundreds of posts.
Hope it helps.