🚀 Executive Summary
TL;DR: Migrating WordPress posts to Ghost CMS is challenging due to incompatible XML exports. This article provides a Python script solution that directly connects to the WordPress database, extracts post content, and transforms it into a Ghost-compatible JSON format for seamless import.
🎯 Key Takeaways
- The migration script requires Python 3.x, mysql-connector-python, WordPress database credentials, and a running Ghost CMS instance.
- The script queries key WordPress tables (wp_posts, wp_term_relationships, wp_term_taxonomy, wp_terms) to extract post content, titles, slugs, dates, statuses, and associated tags/categories.
- WordPress data is transformed into Ghost’s expected JSON structure, mapping post_title to title, post_name to slug, post_content to html, and converting post_date to ISO 8601 format, while parsing tags into an array of objects.
Migrate WordPress Posts to Ghost CMS: A Content Export Script
Welcome to a new technical deep dive from TechResolve! As your digital presence evolves, so too do your platform needs. Many content creators and organizations start with WordPress for its versatility and vast ecosystem. However, for those seeking a more streamlined, performant, and focused blogging experience, Ghost CMS often emerges as an attractive alternative. Its modern interface, Markdown-first approach, and speed are compelling reasons for migration.
The challenge? Moving your meticulously crafted articles from WordPress to Ghost isn’t always a one-click affair. While WordPress offers XML exports, this format isn’t directly consumable by Ghost’s import utility without significant manual transformation. The thought of manually copying hundreds or thousands of posts is enough to deter even the most dedicated content strategists.
The Solution: This tutorial provides a comprehensive, step-by-step guide to building a custom Python script. This script will connect directly to your WordPress database, extract your valuable post content, and transform it into a Ghost-compatible JSON format, ready for a seamless import. Automate the tedious, minimize errors, and accelerate your transition to Ghost with confidence.
Prerequisites
Before we dive into the scripting, ensure you have the following tools and access in place:
- Python 3.x: Installed on your local machine or a server where you’ll run the script.
- WordPress Database Access: You will need credentials (hostname, username, password, database name) to connect to your WordPress MySQL or MariaDB database. This typically means direct access via a database client, phpMyAdmin, or SSH access to the server.
- Python Libraries:
  - mysql-connector-python: For connecting to and querying your MySQL/MariaDB database.
  - json (built-in): For handling JSON data serialization.

You can install mysql-connector-python using pip:

    pip install mysql-connector-python
- Ghost CMS Instance: While not strictly required for the export script itself, you will eventually need a running Ghost instance to import your generated JSON file.
- Basic Understanding: Familiarity with Python, SQL queries, and the structure of WordPress data will be beneficial.
Step-by-Step Guide: Building Your Content Export Script
Step 1: Understanding the WordPress Database Structure
WordPress stores its content in a relational database, primarily across a few key tables. For posts, we are most interested in:
- wp_posts: This table holds the core information for your posts, pages, attachments, and custom post types. Key columns include post_title, post_name (slug), post_content (the actual article HTML), post_date, and post_status (e.g., ‘publish’, ‘draft’).
- wp_term_relationships: Links posts to terms (categories and tags).
- wp_term_taxonomy: Defines the taxonomy for terms (e.g., ‘category’, ‘post_tag’).
- wp_terms: Stores the actual names and slugs of categories and tags.
Our script will primarily query the wp_posts table for post content and then join with the taxonomy tables to fetch associated categories and tags.
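To make the relationship between these four tables concrete, here is a minimal sketch that rebuilds them in an in-memory SQLite database and runs the same chain of joins. SQLite is only a stand-in for MySQL/MariaDB here, the tables carry just a handful of columns, and the sample rows are invented for illustration.

```python
import sqlite3

def demo_term_join():
    """Build a tiny in-memory copy of the four tables and join them."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
        CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
        CREATE TABLE wp_term_taxonomy (
            term_taxonomy_id INTEGER PRIMARY KEY, term_id INTEGER, taxonomy TEXT);
        CREATE TABLE wp_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);

        -- Invented sample data: one post with a category and a tag, one with neither
        INSERT INTO wp_posts VALUES (1, 'Hello World'), (2, 'Untagged Post');
        INSERT INTO wp_terms VALUES (10, 'News', 'news'), (11, 'python', 'python');
        INSERT INTO wp_term_taxonomy VALUES (100, 10, 'category'), (101, 11, 'post_tag');
        INSERT INTO wp_term_relationships VALUES (1, 100), (1, 101);
    """)
    rows = con.execute("""
        SELECT p.ID, p.post_title, tt.taxonomy, t.name
        FROM wp_posts p
        LEFT JOIN wp_term_relationships tr ON p.ID = tr.object_id
        LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
        LEFT JOIN wp_terms t ON tt.term_id = t.term_id
        ORDER BY p.ID, t.name
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in demo_term_join():
        print(row)
```

Each term attached to a post yields one row, and a post without any terms still appears once with empty term columns. That is why the real query uses LEFT JOIN and then collapses the rows per post.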
Step 2: Connect to the WordPress Database and Extract Data
First, let’s establish a connection to your WordPress database using Python and retrieve the raw post data. Remember to replace the placeholder credentials with your actual database details.
import mysql.connector
import json
import re  # For basic content cleanup later if needed

# --- Configuration ---
DB_CONFIG = {
    'host': 'your_wordpress_database_host',
    'user': 'your_wordpress_database_user',
    'password': 'your_wordpress_database_password',
    'database': 'your_wordpress_database_name'
}

def fetch_wordpress_posts():
    posts = []
    cnx = None
    cursor = None
    try:
        cnx = mysql.connector.connect(**DB_CONFIG)
        cursor = cnx.cursor(dictionary=True)  # Fetch rows as dictionaries

        # SQL query to select posts, and their associated categories/tags
        # We focus on 'post' type posts with 'publish' status
        # The taxonomy filter keeps other taxonomies (e.g. post formats) out of the tag list
        query = """
            SELECT
                p.ID,
                p.post_title,
                p.post_name,
                p.post_content,
                p.post_date,
                p.post_status,
                GROUP_CONCAT(DISTINCT t.name ORDER BY t.name SEPARATOR '|') AS tags_and_categories
            FROM wp_posts p
            LEFT JOIN wp_term_relationships tr ON p.ID = tr.object_id
            LEFT JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
                AND tt.taxonomy IN ('category', 'post_tag')
            LEFT JOIN wp_terms t ON tt.term_id = t.term_id
            WHERE p.post_type = 'post' AND p.post_status = 'publish'
            GROUP BY p.ID
            ORDER BY p.post_date ASC;
        """

        print("Executing database query...")
        cursor.execute(query)
        posts = cursor.fetchall()
        print(f"Fetched {len(posts)} posts from WordPress.")
    except mysql.connector.Error as err:
        print(f"Database error: {err}")
    finally:
        # Close only what was actually opened
        if cnx is not None and cnx.is_connected():
            if cursor is not None:
                cursor.close()
            cnx.close()
            print("Database connection closed.")
    return posts

# Example usage (will be called later in the main script)
# wordpress_posts_data = fetch_wordpress_posts()
# print(wordpress_posts_data[0] if wordpress_posts_data else "No posts found.")
Logic Explained:
- We use mysql.connector to connect to the database.
- cursor(dictionary=True) ensures that each row is returned as a dictionary, making it easier to access columns by name.
- The SQL query performs a LEFT JOIN across wp_posts, wp_term_relationships, wp_term_taxonomy, and wp_terms to gather post details along with their associated tags and categories.
- GROUP_CONCAT is used to collect all tags and categories for a post into a single string, separated by a pipe (|), which we’ll parse later.
- The WHERE clause filters for entries that are actual ‘post’ types and have a ‘publish’ status, excluding drafts, pages, or other custom post types for this basic migration.
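If you would rather not keep real credentials in the script file, one option is to read them from environment variables and pass the result to mysql.connector.connect(**config) in place of DB_CONFIG. The sketch below is a suggestion, not part of the original script; the variable names (WP_DB_HOST and so on) and the load_db_config helper are hypothetical.

```python
import os

def load_db_config(env=None):
    """Build a mysql.connector-style config dict from environment variables.

    The WP_DB_* names are made up for this example; use whatever fits your setup.
    """
    env = os.environ if env is None else env
    required = {
        "host": "WP_DB_HOST",
        "user": "WP_DB_USER",
        "password": "WP_DB_PASSWORD",
        "database": "WP_DB_NAME",
    }
    # Fail early with a readable message instead of a connection error later
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    config = {key: env[var] for key, var in required.items()}
    # Optional port; 3306 is the MySQL default
    config["port"] = int(env.get("WP_DB_PORT", "3306"))
    return config

if __name__ == "__main__":
    sample_env = {
        "WP_DB_HOST": "localhost",
        "WP_DB_USER": "wp_reader",
        "WP_DB_PASSWORD": "example-password",
        "WP_DB_NAME": "wordpress",
    }
    print(load_db_config(sample_env))
```

Passing a plain dictionary as env, as in the usage lines, makes the helper easy to try without touching your shell environment.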
Step 3: Transform Data for Ghost CMS
Ghost requires a specific JSON structure for importing content. We need to map our WordPress fields to Ghost’s expected format. Key fields for a Ghost post include title, slug, html (for content), status, and published_at. Tags are handled as an array of objects.
def transform_to_ghost_format(wp_posts):
    ghost_posts = []
    for post in wp_posts:
        # Map WordPress post_status to Ghost status
        ghost_status = 'published' if post['post_status'] == 'publish' else 'draft'

        # Parse tags and categories
        tags_raw = post['tags_and_categories']
        ghost_tags = []
        if tags_raw:
            # Split by '|' and create tag objects
            for tag_name in tags_raw.split('|'):
                if tag_name.strip():  # Ensure tag name is not empty
                    ghost_tags.append({"name": tag_name.strip()})

        ghost_post = {
            "title": post['post_title'],
            "slug": post['post_name'],
            "html": post['post_content'],  # WordPress post_content is usually HTML
            "status": ghost_status,
            "published_at": post['post_date'].isoformat(),  # Convert datetime to ISO 8601 string
            "created_at": post['post_date'].isoformat(),
            "updated_at": post['post_date'].isoformat(),
            "type": "post",
            "feature_image": None,  # You might extend this to migrate featured images
            "tags": ghost_tags
            # You can add more fields like authors, custom_excerpt, etc.
        }
        ghost_posts.append(ghost_post)

    print(f"Transformed {len(ghost_posts)} posts into Ghost format.")
    return ghost_posts
Logic Explained:
- The function iterates through each WordPress post dictionary fetched in Step 2.
- A post_status of ‘publish’ maps to Ghost’s ‘published’; any other value becomes ‘draft’.
- The tags_and_categories string is split and converted into an array of objects, each with a name key, as required by Ghost.
- post_date is converted to ISO 8601 format, which Ghost expects for date fields.
- post_content is used directly as html, assuming it’s already in a suitable HTML format from WordPress’s rich text editor.
- Placeholder values like feature_image are set to None but can be extended if you plan to migrate media.
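The two conversions described above can be tried in isolation. One caveat: post_date holds the site’s local time without an offset, so the ISO string carries no timezone. WordPress also stores a post_date_gmt column in wp_posts; the sketch below assumes you extend the Step 2 query to select it, which the original query does not do. The helper names split_terms and to_utc_iso are hypothetical.

```python
from datetime import datetime, timezone

def split_terms(raw, separator="|"):
    """Turn a GROUP_CONCAT string such as 'News|python' into Ghost-style tag objects."""
    if not raw:
        return []
    return [{"name": name.strip()} for name in raw.split(separator) if name.strip()]

def to_utc_iso(naive_utc):
    """Label a naive datetime (assumed to already be UTC) and format it as ISO 8601."""
    return naive_utc.replace(tzinfo=timezone.utc).isoformat()

if __name__ == "__main__":
    # Invented sample row; post_date_gmt is NOT selected by the original query
    sample_row = {
        "post_title": "Hello World",
        "post_date_gmt": datetime(2023, 5, 1, 9, 30, 0),
        "tags_and_categories": "News| python ||",
    }
    print(split_terms(sample_row["tags_and_categories"]))
    print(to_utc_iso(sample_row["post_date_gmt"]))
```

With the sample row, the tag string yields two tag objects (empty segments are dropped) and the timestamp is rendered with an explicit +00:00 offset.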
Step 4: Generate the Ghost-Compatible JSON File
Now, we’ll combine the previous steps and write the transformed data into a JSON file that Ghost’s import utility can recognize. Ghost’s import format typically expects a root data object containing a posts array, along with a meta object.
def generate_ghost_import_json(ghost_posts, output_filename="ghost_import.json"):
    # Ghost import format wrapper
    ghost_data = {
        "meta": {
            "api_version": "v5.x"  # Adjust to your Ghost version if necessary
        },
        "data": {
            "posts": ghost_posts,
            "tags": [],  # You might want to extract unique tags here if not doing it per post
            "users": [],  # You might want to extract and map authors here
            "settings": []  # Not typically used for post import
        }
    }
    try:
        with open(output_filename, 'w', encoding='utf-8') as f:
            json.dump(ghost_data, f, ensure_ascii=False, indent=4)
        print(f"Successfully generated Ghost import file: {output_filename}")
    except IOError as e:
        print(f"Error writing JSON file: {e}")

# --- Main script execution ---
if __name__ == "__main__":
    print("Starting WordPress to Ghost migration script...")

    # Step 2: Fetch data
    wp_data = fetch_wordpress_posts()

    if wp_data:
        # Step 3: Transform data
        ghost_ready_data = transform_to_ghost_format(wp_data)

        # Step 4: Generate JSON file
        generate_ghost_import_json(ghost_ready_data)
        print("\nMigration script finished. Your 'ghost_import.json' file is ready for import into Ghost CMS.")
    else:
        print("No WordPress posts found or an error occurred during fetching. Exiting.")
Logic Explained:
- The generate_ghost_import_json function wraps the ghost_posts array within the required data.posts structure.
- A meta.api_version field is included, which is good practice for Ghost imports. Adjust the version number (e.g., v4.x, v5.x) to match your Ghost instance if issues arise, though Ghost is generally forward-compatible.
- The json.dump function writes the Python dictionary to a file as formatted JSON. ensure_ascii=False handles non-ASCII characters correctly, and indent=4 makes the output file human-readable.
- The if __name__ == "__main__": block orchestrates the execution of the functions in the correct order.
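Ghost’s custom-migration documentation describes a relational layout in which tags live in data.tags and are linked to posts through a data.posts_tags array of post_id/tag_id pairs. If the nested tags arrays from Step 3 are not picked up by your Ghost version, the following sketch shows one way to derive that layout from the same data. The helper names, the sequential ids, and the simple slug rule are assumptions made for this example, and the result has not been checked against any particular Ghost release.

```python
import re

def slugify(name):
    """Very simple ASCII slug; names without ASCII letters or digits fall back to 'tag'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "tag"

def build_relational_data(ghost_posts):
    """Split posts with nested tags into posts, tags, and posts_tags lists."""
    tags_by_name = {}
    posts, posts_tags = [], []
    for post_id, post in enumerate(ghost_posts, start=1):
        # Copy the post without its nested tags and give it a file-local id
        flat_post = {key: value for key, value in post.items() if key != "tags"}
        flat_post["id"] = post_id
        posts.append(flat_post)
        for tag in post.get("tags", []):
            name = tag["name"]
            if name not in tags_by_name:
                tags_by_name[name] = {
                    "id": len(tags_by_name) + 1,
                    "name": name,
                    "slug": slugify(name),
                }
            posts_tags.append({"post_id": post_id, "tag_id": tags_by_name[name]["id"]})
    return {"posts": posts, "tags": list(tags_by_name.values()), "posts_tags": posts_tags}

if __name__ == "__main__":
    sample = [
        {"title": "A", "slug": "a", "tags": [{"name": "News"}, {"name": "Python Tips"}]},
        {"title": "B", "slug": "b", "tags": [{"name": "News"}]},
    ]
    print(build_relational_data(sample))
```

The returned dictionary could replace the posts and tags entries inside the data object built by generate_ghost_import_json. Note that two different tag names can collapse to the same slug under this rule, so review the tags list before importing.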
Once the script completes, you’ll have a ghost_import.json file. To import it into Ghost:
- Log in to your Ghost Admin panel.
- Navigate to Settings (gear icon).
- Go to the Labs section.
- Under Import content, click the Import button and select your generated ghost_import.json file.
Common Pitfalls and Troubleshooting
- Database Connection Errors: Double-check your DB_CONFIG parameters (host, user, password, database). Ensure the MySQL user has SELECT permissions on the wp_posts, wp_term_relationships, wp_term_taxonomy, and wp_terms tables.
- Content Formatting and Shortcodes: WordPress often uses shortcodes or custom blocks that may not render correctly in Ghost. You might need to extend the transform_to_ghost_format function to parse and convert these or perform manual cleanup in Ghost after import.
- Missing Images: This script only migrates post content and text-based tags. Images embedded in your WordPress posts are typically hosted on your WordPress server. For a full migration, you would need a separate process to download these images and upload them to Ghost or a CDN, then update the image URLs within the post html content. This is a more advanced task outside the scope of this basic post migration.
- Large Import Files: If you have an exceptionally large number of posts, the generated JSON file might be very big. Some Ghost hosting environments or browser limits might struggle with very large file uploads. If this happens, consider breaking your export into smaller batches (e.g., by year or date range).
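For the shortcode pitfall, a first pass can be done with the re module the script already imports. The sketch below lists the shortcode names found in a post and removes the bracketed tags while keeping the content between them. The regular expression is a rough heuristic written for this example, not WordPress’s own shortcode parser, so it can also match ordinary bracketed words in prose; the [caption] sample is only an illustration.

```python
import re

# Matches [name], [name attr="x"], and [/name]; names must start with a letter
SHORTCODE_RE = re.compile(r"\[/?[A-Za-z][\w-]*(?:\s[^\]]*)?\]")
# Same idea, but captures the name of opening shortcodes only
SHORTCODE_NAME_RE = re.compile(r"\[([A-Za-z][\w-]*)(?:\s[^\]]*)?\]")

def list_shortcodes(html):
    """Return the sorted, de-duplicated shortcode names that appear in the HTML."""
    return sorted(set(SHORTCODE_NAME_RE.findall(html)))

def strip_shortcodes(html):
    """Remove shortcode tags but keep whatever text or markup they wrapped."""
    return SHORTCODE_RE.sub("", html)

if __name__ == "__main__":
    sample = '[caption id="a" width="300"]<img src="x.jpg"> A caption[/caption]'
    print(list_shortcodes(sample))
    print(strip_shortcodes(sample))
```

Running list_shortcodes over all posts before the import gives you an inventory of what needs attention; strip_shortcodes could then be applied to post_content inside transform_to_ghost_format for the shortcodes you decide to drop.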
Conclusion and Next Steps
You’ve successfully built a robust Python script to automate the migration of your WordPress posts to Ghost CMS. This approach saves countless hours of manual effort, reduces the risk of human error, and provides a solid foundation for your new content platform. By leveraging direct database access and a programmatic transformation, you gain fine-grained control over your migration process.
This script serves as an excellent starting point. To further enhance your migration, consider these next steps:
- Image Migration: Implement a strategy to download images from WordPress and upload them to Ghost’s storage, updating post content links accordingly.
- Author Migration: Map WordPress authors to Ghost users, potentially creating new users in Ghost as part of the script.
- Page Migration: Extend the script to handle WordPress pages (post_type = 'page').
- Custom Fields: If you extensively use custom fields in WordPress, determine how to best integrate that data into Ghost (e.g., as custom post settings).
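As a starting point for the image migration step, you first need to know which image URLs your posts reference. Here is a minimal sketch using Python’s built-in html.parser; the class and function names are hypothetical, and downloading, re-uploading, and rewriting the URLs are left out.

```python
from html.parser import HTMLParser

class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def collect_image_urls(html):
    """Return image URLs in the order they appear in the given HTML string."""
    parser = ImageSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources

if __name__ == "__main__":
    sample_html = (
        '<p><img src="https://example.com/wp-content/uploads/a.png" alt="A">'
        '<img alt="no src"/><img src="/b.jpg"/></p>'
    )
    print(collect_image_urls(sample_html))
```

Calling collect_image_urls on each post’s html field yields the list of files to fetch from the WordPress server before the links are updated.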
We hope this guide empowers you to make your move to Ghost CMS smoother and more efficient. Happy migrating!
👉 Read the original article on TechResolve.blog