Recently Globalnews.ca made the move to WordPress and we were tasked with migrating all their data from their existing custom MSSQL-based environment to the new WordPress environment hosted with WordPress.com VIP. We have a normal process for this kind of thing, and it goes something like this:
- Create a local WordPress install and get rid of the default posts, comments, and meta data.
- Put the clients old data into tables in the same WordPress database.
- Write a script that pulls the data from the old tables and uses a combination of WordPress functions and direct database queries to put it into the WordPress install.
- Check the data and if it’s not quite right blast out the posts, comments, and meta data and go back to step 3.
- When the data is exactly what we want in WordPress we use wp-cli to export the data in WXR format, and submit those files to WordPress.com VIP.
It sounds pretty straight forward, and usually it is, with the most difficult part obviously being the script that imports the data from the old tables into WordPress. Not this time however. This time the most difficult part turned out to be step 2, putting the clients data into the same database as WordPress. The problem was that the conversion from MSSQL to MySQL was no trivial matter.
Often you can use MySQL Workbench to migrate data from MSSQL to MySQL, so we started by spinning up a Windows cloud server with MSSQL. We imported the data into an MSSQL database on this new server, and then we installed MySQL and the MySQL Workbench. Due to the large amount of data being converted, we ran into memory issues several times and had to resize the cloud server, but once the memory issues were under control we realized that MySQL Workbench simply could not migrate many of the tables that we needed.
After many hours of digging and research, I finally found the problem. It turns out that there were a couple data types that were causing things to choke. In our case that was mostly limited to nvarchar and ntext. Why? Well because MSSQL doesn’t actually support UTF-8. What?! I know…I was surprised too, but it seems MSSQL doesn’t support the standard in character encoding. Instead it has nvarchar and ntext that don’t store as UTF-8 but offer similar output in a classic Microsoft-proprietary way.
I was able to work around this limitation by creating duplicate tables for each table that contained one of these field types, using nvarchar in place of varchar and ntext in place of text. Then I ran queries to select the data from the old tables and insert it into these newly created tables.
CREATE TABLE [dbo].[ImportFixed]( [RowNum] [bigint] NULL, [post_title] [nvarchar](200) NOT NULL, -- Previously varchar [content_html] [ntext] NULL -- Previously text ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO ---------------------- INSERT INTO [import].[dbo].[ImportFixed]( [RowNum], [post_title], [content_html] ) SELECT [RowNum], [post_title], [content_html] FROM [import].[dbo].[Import] GO
This then allowed MySQL Workbench to properly handle the encoding during it’s migration from MSSQL to MySQL. Even once the process is known it’s tedious and time consuming at best, but it seems to be very reliable and stable.