Solved: Migrate WordPress Posts to Ghost CMS: A Content Export Script

🚀 Executive Summary

TL;DR: Migrating WordPress posts to Ghost CMS is challenging due to incompatible XML exports. This article provides a Python script solution that directly connects to the WordPress database, extracts post content, and transforms it into a Ghost-compatible JSON format for seamless import.

🎯 Key Takeaways

  • The migration script requires Python 3.x, mysql-connector-python, and WordPress database credentials; a running Ghost CMS instance is needed only for the final import step.
  • The script queries key WordPress tables (wp_posts, wp_term_relationships, wp_term_taxonomy, wp_terms) to extract post content, titles, slugs, dates, statuses, and associated tags/categories.
  • WordPress data is transformed into Ghost’s expected JSON structure, mapping post_title to title, post_name to slug, post_content to html, and converting post_date to ISO 8601 format, while parsing tags into an array of objects.

Migrate WordPress Posts to Ghost CMS: A Content Export Script

Welcome to a new technical deep dive from TechResolve! As your digital presence evolves, so too do your platform needs. Many content creators and organizations start with WordPress for its versatility and vast ecosystem. However, for those seeking a more streamlined, performant, and focused blogging experience, Ghost CMS often emerges as an attractive alternative. Its modern interface, Markdown-first approach, and speed are compelling reasons for migration.

The challenge? Moving your meticulously crafted articles from WordPress to Ghost isn’t always a one-click affair. While WordPress offers XML exports, this format isn’t directly consumable by Ghost’s import utility without significant manual transformation. The thought of manually copying hundreds or thousands of posts is enough to deter even the most dedicated content strategists.

The Solution: This tutorial provides a comprehensive, step-by-step guide to building a custom Python script. This script will connect directly to your WordPress database, extract your valuable post content, and transform it into a Ghost-compatible JSON format, ready for a seamless import. Automate the tedious, minimize errors, and accelerate your transition to Ghost with confidence.

Prerequisites

Before we dive into the scripting, ensure you have the following tools and access in place:

  • Python 3.x: Installed on your local machine or a server where you’ll run the script.
  • WordPress Database Access: You will need credentials (hostname, username, password, database name) to connect to your WordPress MySQL or MariaDB database. This typically means direct access via a database client, phpMyAdmin, or SSH access to the server.
  • Python Libraries:

    • mysql-connector-python: For connecting to and querying your MySQL/MariaDB database.
    • json (built-in): For handling JSON data serialization.

You can install mysql-connector-python using pip:

  pip install mysql-connector-python
  • Ghost CMS Instance: While not strictly required for the export script itself, you will eventually need a running Ghost instance to import your generated JSON file.
  • Basic Understanding: Familiarity with Python, SQL queries, and the structure of WordPress data will be beneficial.

Step-by-Step Guide: Building Your Content Export Script

Step 1: Understanding the WordPress Database Structure

WordPress stores its content in a relational database, primarily across a few key tables. For posts, we are most interested in:

  • wp_posts: This table holds the core information for your posts, pages, attachments, and custom post types. Key columns include post_title, post_name (slug), post_content (the actual article HTML), post_date, and post_status (e.g., ‘publish’, ‘draft’).
  • wp_term_relationships: Links posts to terms (categories and tags).
  • wp_term_taxonomy: Defines the taxonomy for terms (e.g., ‘category’, ‘post_tag’).
  • wp_terms: Stores the actual names and slugs of categories and tags.

Our script will primarily query the wp_posts table for post content and then join with the taxonomy tables to fetch associated categories and tags.
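
To see how these four tables relate, the sketch below recreates a tiny version of them in an in-memory SQLite database and walks the same join path. The sample rows and the SQLite stand-in are assumptions for illustration only; a real WordPress site runs on MySQL or MariaDB and its tables have many more columns.

```python
import sqlite3

def demo_terms_by_post():
    """Build a toy copy of the four tables and return {post_title: [term names]}."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
        CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE wp_term_taxonomy (
            term_taxonomy_id INTEGER PRIMARY KEY, term_id INTEGER, taxonomy TEXT);
        CREATE TABLE wp_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);

        INSERT INTO wp_posts VALUES (1, 'Hello World'), (2, 'Untagged Post');
        INSERT INTO wp_terms VALUES (10, 'News'), (11, 'python');
        INSERT INTO wp_term_taxonomy VALUES (100, 10, 'category'), (101, 11, 'post_tag');
        INSERT INTO wp_term_relationships VALUES (1, 100), (1, 101);
    """)
    # Same join path as the migration query: post -> relationship -> taxonomy -> term
    rows = con.execute("""
        SELECT p.post_title, t.name
        FROM wp_posts p
        LEFT JOIN wp_term_relationships tr ON p.ID = tr.object_id
        LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
        LEFT JOIN wp_terms t ON tt.term_id = t.term_id
    """).fetchall()
    con.close()

    result = {}
    for title, term in rows:
        result.setdefault(title, [])
        if term is not None:  # LEFT JOIN yields NULL for posts without terms
            result[title].append(term)
    return {title: sorted(terms) for title, terms in result.items()}

if __name__ == "__main__":
    print(demo_terms_by_post())
```

The post without any terms still appears in the result, which is why the migration query uses LEFT JOIN rather than an inner join.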

Step 2: Connect to the WordPress Database and Extract Data

First, let’s establish a connection to your WordPress database using Python and retrieve the raw post data. Remember to replace the placeholder credentials with your actual database details.

import mysql.connector
import json
import re # For basic content cleanup later if needed

# --- Configuration ---
DB_CONFIG = {
    'host': 'your_wordpress_database_host',
    'user': 'your_wordpress_database_user',
    'password': 'your_wordpress_database_password',
    'database': 'your_wordpress_database_name'
}

def fetch_wordpress_posts():
    posts = []
    try:
        cnx = mysql.connector.connect(**DB_CONFIG)
        cursor = cnx.cursor(dictionary=True) # Fetch rows as dictionaries

        # SQL query to select posts, and their associated categories/tags
        # We focus on 'post' type posts with 'publish' status
        query = """
        SELECT 
            p.ID, 
            p.post_title, 
            p.post_name, 
            p.post_content, 
            p.post_date, 
            p.post_status,
            GROUP_CONCAT(DISTINCT t.name ORDER BY t.name SEPARATOR '|') AS tags_and_categories
        FROM wp_posts p
        LEFT JOIN wp_term_relationships tr ON p.ID = tr.object_id
        LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
            AND tt.taxonomy IN ('category', 'post_tag')
        LEFT JOIN wp_terms t ON tt.term_id = t.term_id
        WHERE p.post_type = 'post' AND p.post_status = 'publish'
        GROUP BY p.ID
        ORDER BY p.post_date ASC;
        """

        print("Executing database query...")
        cursor.execute(query)

        posts = cursor.fetchall()
        print(f"Fetched {len(posts)} posts from WordPress.")

    except mysql.connector.Error as err:
        print(f"Database error: {err}")
    finally:
        if 'cnx' in locals() and cnx.is_connected():
            cursor.close()
            cnx.close()
            print("Database connection closed.")
    return posts

# Example usage (will be called later in the main script)
# wordpress_posts_data = fetch_wordpress_posts()
# print(wordpress_posts_data[0] if wordpress_posts_data else "No posts found.")

Logic Explained:

  • We use mysql.connector to connect to the database.
  • cursor(dictionary=True) ensures that each row is returned as a dictionary, making it easier to access columns by name.
  • The SQL query performs a LEFT JOIN across wp_posts, wp_term_relationships, wp_term_taxonomy, and wp_terms to gather post details along with their associated tags and categories.
  • GROUP_CONCAT is used to collect all tags and categories for a post into a single string, separated by a pipe (|), which we’ll parse later.
  • The WHERE clause filters for entries that are actual ‘post’ types and have a ‘publish’ status, excluding drafts, pages, or other custom post types for this basic migration.
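
One caveat with GROUP_CONCAT: MySQL truncates its result at group_concat_max_len, which defaults to 1024 bytes, and a term name that itself contains a pipe would be split incorrectly later. A possible alternative, sketched below, is to select one row per post/term pair and group them in Python. The function name group_terms_by_post and the row keys post_id, name, and taxonomy are assumptions made for this example, not part of the script above.

```python
from collections import defaultdict

def group_terms_by_post(rows):
    """Group (post, term) rows into {post_id: [term names]} without duplicates.

    `rows` is an iterable of dicts with 'post_id', 'name', and 'taxonomy' keys,
    for example the rows of a dictionary cursor over a query that joins
    wp_term_relationships, wp_term_taxonomy, and wp_terms.
    """
    grouped = defaultdict(list)
    for row in rows:
        # Keep only real categories and tags; skip menus, post formats, etc.
        if row["taxonomy"] not in ("category", "post_tag"):
            continue
        name = row["name"].strip()
        if name and name not in grouped[row["post_id"]]:
            grouped[row["post_id"]].append(name)
    return dict(grouped)

if __name__ == "__main__":
    sample = [
        {"post_id": 1, "name": "News", "taxonomy": "category"},
        {"post_id": 1, "name": "python", "taxonomy": "post_tag"},
        {"post_id": 2, "name": "post-format-video", "taxonomy": "post_format"},
    ]
    print(group_terms_by_post(sample))
```

Because each term arrives in its own row, no separator character is needed and long tag lists are not cut off.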

Step 3: Transform Data for Ghost CMS

Ghost requires a specific JSON structure for importing content. We need to map our WordPress fields to Ghost’s expected format. Key fields for a Ghost post include title, slug, html (for content), status, and published_at. Tags are handled as an array of objects.

def transform_to_ghost_format(wp_posts):
    ghost_posts = []
    for post in wp_posts:
        # Map WordPress post_status to Ghost status
        ghost_status = 'published' if post['post_status'] == 'publish' else 'draft'

        # Parse tags and categories
        tags_raw = post['tags_and_categories']
        ghost_tags = []
        if tags_raw:
            # Split by '|' and create tag objects
            for tag_name in tags_raw.split('|'):
                if tag_name.strip(): # Ensure tag name is not empty
                    ghost_tags.append({"name": tag_name.strip()})

        ghost_post = {
            "title": post['post_title'],
            "slug": post['post_name'],
            "html": post['post_content'], # WordPress post_content is usually HTML
            "status": ghost_status,
            "published_at": post['post_date'].isoformat(), # Convert datetime to ISO 8601 string
            "created_at": post['post_date'].isoformat(),
            "updated_at": post['post_date'].isoformat(),
            "type": "post",
            "feature_image": None, # You might extend this to migrate featured images
            "tags": ghost_tags
            # You can add more fields like authors, custom_excerpt, etc.
        }
        ghost_posts.append(ghost_post)
    print(f"Transformed {len(ghost_posts)} posts into Ghost format.")
    return ghost_posts

Logic Explained:

  • The function iterates through each WordPress post dictionary fetched in Step 2.
  • post_status ‘publish’ maps to Ghost ‘published’, otherwise it’s ‘draft’.
  • The tags_and_categories string is split and converted into an array of objects, each with a name key, as required by Ghost.
  • post_date is converted to ISO 8601 format, which Ghost expects for date fields.
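  • post_content is used directly as html, on the assumption that it already contains usable HTML. Note that content written in the classic editor is stored without <p> tags (WordPress adds them at render time), so paragraphs in older posts may need to be wrapped before import.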
  • Placeholder values like feature_image are set to None but can be extended if you plan to migrate media.
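
WordPress stores post_date in the site's configured timezone and keeps the UTC equivalent in post_date_gmt, while isoformat() on a naive datetime emits no timezone offset at all. If exact publish times matter, one option is to select post_date_gmt as well and mark it explicitly as UTC. The helpers below are a sketch under that assumption; to_utc_iso and to_epoch_ms are hypothetical names, and which timestamp form your Ghost version accepts should be checked against its import documentation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_utc_iso(naive_utc):
    """Format a naive datetime that is known to be UTC as ISO 8601 with a 'Z' suffix."""
    if naive_utc is None:
        return None
    aware = naive_utc.replace(tzinfo=timezone.utc)
    return aware.isoformat(timespec="milliseconds").replace("+00:00", "Z")

def to_epoch_ms(naive_utc):
    """Convert a naive datetime that is known to be UTC to epoch milliseconds."""
    if naive_utc is None:
        return None
    aware = naive_utc.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (aware - EPOCH) // timedelta(milliseconds=1)

if __name__ == "__main__":
    example = datetime(2024, 1, 2, 3, 4, 5)
    print(to_utc_iso(example))
    print(to_epoch_ms(example))
```

Both helpers return None for a missing date so that a post without a GMT timestamp does not crash the transformation.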

Step 4: Generate the Ghost-Compatible JSON File

Now, we’ll combine the previous steps and write the transformed data into a JSON file that Ghost’s import utility can recognize. Ghost’s import format typically expects a root data object containing a posts array, along with a meta object.

def generate_ghost_import_json(ghost_posts, output_filename="ghost_import.json"):
    # Ghost import format wrapper
    ghost_data = {
        "meta": {
            "api_version": "v5.x" # Adjust to your Ghost version if necessary
        },
        "data": {
            "posts": ghost_posts,
            "tags": [], # You might want to extract unique tags here if not doing it per post
            "users": [], # You might want to extract and map authors here
            "settings": [] # Not typically used for post import
        }
    }

    try:
        with open(output_filename, 'w', encoding='utf-8') as f:
            json.dump(ghost_data, f, ensure_ascii=False, indent=4)
        print(f"Successfully generated Ghost import file: {output_filename}")
    except IOError as e:
        print(f"Error writing JSON file: {e}")

# --- Main script execution ---
if __name__ == "__main__":
    print("Starting WordPress to Ghost migration script...")

    # Step 2: Fetch data
    wp_data = fetch_wordpress_posts()

    if wp_data:
        # Step 3: Transform data
        ghost_ready_data = transform_to_ghost_format(wp_data)

        # Step 4: Generate JSON file
        generate_ghost_import_json(ghost_ready_data)

        print("\nMigration script finished. Your 'ghost_import.json' file is ready for import into Ghost CMS.")
    else:
        print("No WordPress posts found or an error occurred during fetching. Exiting.")

Logic Explained:

  • The generate_ghost_import_json function wraps the ghost_posts array within the required data.posts structure.
  • A meta.api_version field is included to record the Ghost version the file targets. If the import is rejected, compare the meta block with the one in a file exported from your own Ghost instance and adjust the fields and version number (e.g., v4.x, v5.x) to match.
  • The json.dump function writes the Python dictionary to a file as formatted JSON. ensure_ascii=False handles non-ASCII characters correctly, and indent=4 makes the output file human-readable.
  • The if __name__ == "__main__": block orchestrates the execution of the functions in the correct order.
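
Ghost's documented JSON import format describes tags as top-level data.tags entries linked to posts through a data.posts_tags array of post_id/tag_id pairs, rather than as objects nested inside each post. If the nested tags are not picked up by your Ghost version, a sketch like the following can derive both arrays from the transformed posts. The function name build_tag_relations, the sequential integer IDs, and the simple slug rule are assumptions for illustration.

```python
import re

def build_tag_relations(ghost_posts):
    """Return (posts, tags, posts_tags) with integer IDs linking posts to tags."""
    tag_ids = {}     # tag name -> tag id
    posts, tags, posts_tags = [], [], []
    for post_id, post in enumerate(ghost_posts, start=1):
        # Copy the post, give it an id, and drop the nested tag list.
        flat = {key: value for key, value in post.items() if key != "tags"}
        flat["id"] = post_id
        posts.append(flat)
        for tag in post.get("tags", []):
            name = tag["name"]
            if name not in tag_ids:
                tag_ids[name] = len(tag_ids) + 1
                # Naive slug: lowercase ASCII letters and digits joined by hyphens.
                slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
                tags.append({"id": tag_ids[name], "name": name, "slug": slug})
            posts_tags.append({"post_id": post_id, "tag_id": tag_ids[name]})
    return posts, tags, posts_tags

if __name__ == "__main__":
    sample = [{"title": "A", "tags": [{"name": "News"}]}]
    print(build_tag_relations(sample))
```

The three results would then be placed under data.posts, data.tags, and a new data.posts_tags key in the wrapper. The naive slug rule produces an empty slug for names with no ASCII letters or digits, so such tags would need separate handling.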

Once the script completes, you’ll have a ghost_import.json file. To import it into Ghost:

  1. Log in to your Ghost Admin panel.
  2. Navigate to Settings (gear icon).
  3. Go to the Labs section.
  4. Under Import content, click the Import button and select your generated ghost_import.json file.
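
Before uploading, a quick sanity check of the generated data can catch problems that would otherwise surface as a failed or partial import. The checks below (missing titles, empty or duplicate slugs) are assumptions about what is worth verifying, not rules taken from Ghost itself, and find_import_problems is a hypothetical helper.

```python
from collections import Counter

def find_import_problems(payload):
    """Return a list of human-readable problems found in a Ghost import payload."""
    problems = []
    posts = payload.get("data", {}).get("posts", [])
    if not posts:
        problems.append("no posts found under data.posts")
    slug_counts = Counter(post.get("slug") for post in posts)
    for index, post in enumerate(posts):
        if not post.get("title"):
            problems.append(f"post {index}: missing title")
        slug = post.get("slug")
        if not slug:
            problems.append(f"post {index}: missing slug")
        elif slug_counts[slug] > 1:
            problems.append(f"post {index}: duplicate slug '{slug}'")
    return problems

if __name__ == "__main__":
    demo = {"data": {"posts": [{"title": "Hello", "slug": "hello"}]}}
    print(find_import_problems(demo) or "No problems found.")
```

To check the real file, load ghost_import.json with json.load and pass the resulting dictionary to the function.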

Common Pitfalls and Troubleshooting

  • Database Connection Errors: Double-check your DB_CONFIG parameters (host, user, password, database). Ensure the MySQL user has SELECT permissions on the wp_posts, wp_term_relationships, wp_term_taxonomy, and wp_terms tables.
  • Content Formatting and Shortcodes: WordPress content often contains shortcodes (square-bracket tags that plugins or the theme expand at render time) or custom blocks that will not render correctly in Ghost. You might need to extend the transform_to_ghost_format function to parse and convert these, or perform manual cleanup in Ghost after import.
  • Missing Images: This script only migrates post content and text-based tags. Images embedded in your WordPress posts are typically hosted on your WordPress server. For a full migration, you would need a separate process to download these images and upload them to Ghost or a CDN, then update the image URLs within the post html content. This is a more advanced task outside the scope of this basic post migration.
  • Large Import Files: If you have an exceptionally large number of posts, the generated JSON file might be very big. Some Ghost hosting environments or browser limits might struggle with very large file uploads. If this happens, consider breaking your export into smaller batches (e.g., by year or date range).
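
For the shortcode issue, a simple first pass is to strip shortcode tags from post_content before export while keeping any text they wrap. The sketch below assumes the shortcode names are known in advance (gallery and caption are used here purely as examples) and relies on a regular expression, which will not handle nested or malformed shortcodes. strip_shortcodes is a hypothetical helper, not part of WordPress or Ghost.

```python
import re

def strip_shortcodes(html, names=("gallery", "caption")):
    """Remove opening, self-closing, and closing tags of the named shortcodes."""
    alternation = "|".join(re.escape(name) for name in names)
    # Matches [name ...], [name ... /] and [/name]; other bracketed text is untouched.
    pattern = re.compile(r"\[/?(?:%s)\b[^\]]*\]" % alternation)
    return pattern.sub("", html)

if __name__ == "__main__":
    sample = '<p>Intro</p>[gallery ids="1,2"]<p>[caption id="x"]Photo text[/caption]</p>'
    print(strip_shortcodes(sample))
```

It could be applied to post['post_content'] inside transform_to_ghost_format. Limiting the pattern to an explicit list of names avoids deleting ordinary bracketed text such as footnote markers.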

Conclusion and Next Steps

You’ve successfully built a robust Python script to automate the migration of your WordPress posts to Ghost CMS. This approach saves countless hours of manual effort, reduces the risk of human error, and provides a solid foundation for your new content platform. By leveraging direct database access and a programmatic transformation, you gain fine-grained control over your migration process.

This script serves as an excellent starting point. To further enhance your migration, consider these next steps:

  • Image Migration: Implement a strategy to download images from WordPress and upload them to Ghost’s storage, updating post content links accordingly.
  • Author Migration: Map WordPress authors to Ghost users, potentially creating new users in Ghost as part of the script.
  • Page Migration: Extend the script to handle WordPress pages (post_type = 'page').
  • Custom Fields: If you extensively use custom fields in WordPress, determine how to best integrate that data into Ghost (e.g., as custom post settings).
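
As a starting point for image migration, the first task is usually to find out which image URLs the exported posts reference. The sketch below collects the src attribute of every img tag using Python's built-in HTML parser; the class and function names are assumptions for illustration, and downloading or re-uploading the files is left out.

```python
from html.parser import HTMLParser

class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)

def collect_image_urls(html):
    """Return the unique image URLs in `html`, in order of first appearance."""
    parser = ImageSrcCollector()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.sources))

if __name__ == "__main__":
    sample = '<p><img src="https://example.com/a.png" alt=""><img src="/b.jpg"/></p>'
    print(collect_image_urls(sample))
```

Running this over each post's html field gives a download list, which can later be turned into a mapping from old URLs to new ones for rewriting the content.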

We hope this guide empowers you to make your move to Ghost CMS smoother and more efficient. Happy migrating!

Darian Vance

👉 Read the original article on TechResolve.blog
