📚 Table of Contents

1. Understanding the Craigslist-like Classifieds Platform Design
2. Functional Requirements Analysis
   2.1. User Types
   2.2. Listing Details
   2.3. Filters
3. Non-Functional Requirements
4. System Capacity Planning
   4.1. Key Assumptions
   4.2. Post Volume
   4.3. Storage Requirements
   4.4. Write Traffic
   4.5. Read Traffic
   4.6. Daily Storage Growth
5. API Design
   5.1. Post Management APIs
   5.2. User Management APIs
   5.3. System API
6. Database Schema
   6.1. Users Table
   6.2. Posts Table
   6.3. Images Table
   6.4. Reports Table
7. Storage Architecture
8. Image Upload Strategy
   8.1. Direct Client Upload
   8.2. Upload Through Backend
   8.3. Hybrid Approach
9. Read and Write Flow
   9.1. Write Flow
   9.2. Read Flow
10. Geolocation Partitioning
   10.1. Why Use Geolocation Partitioning?
   10.2. Structure Details
   10.3. Database Sharding
   10.4. Elasticsearch Indexing Strategy
   10.5. Object Storage Organization
   10.6. CDN and GeoDNS Implementation
   10.7. Request Routing Logic
   10.8. Real World Example
   10.9. Implementation Challenges
11. Search Design
   11.1. Document Structure
   11.2. Query Patterns
   11.3. Scaling Strategy
12. Optional Analytics System
13. Key Design Decisions Explained
   13.1. Why Use a Hybrid Upload Strategy?
   13.2. Why Geographic Partitioning?
   13.3. Why 7-Day Auto-Expiration?
   13.4. Why Object Storage and CDN for Images?
14. Summary
Understanding the Craigslist-like Classifieds Platform Design
The document outlines a comprehensive system design for a Craigslist-style classifieds platform that allows users to post, browse, and respond to classified listings. Let me walk through each major component in detail.
Functional Requirements Analysis
User Types
The system supports two primary user types:
Viewers: Users who browse the platform without posting content. They can:
- Browse and search through listings
- View detailed information about specific listings
- Apply filters to narrow down search results
- Contact sellers/posters
- Report inappropriate content

Posters: Users who create and manage listings. They can:
- Create, update, and delete their own listings
- Renew posts every 7 days to keep them active
- Search through and manage their own listings
- Upload up to 10 images per listing (1MB each)
- Potentially upload videos (marked as an extra feature)
Listing Details
Each listing contains:
- Title: Brief description of the item/service
- Description: Detailed information
- Price: Listed in a single currency format
- Location: Geographic information about where the item/service is available
- Photos: Up to 10 images
- Auto-deletion: Posts automatically expire after 7 days
Filters
The system supports filtering by:
- Neighborhood: Geographic area within a city
- Price range: Minimum and maximum price values
- Item condition: Categorical value (e.g., "like new", "good", "fair")
- The design allows for additional filters based on user/application needs
Non-Functional Requirements
These requirements define the quality attributes of the system:
- Scalability: The system must support up to 10 million users per city
- High Availability: 99.9% uptime guarantee (equals about 8.8 hours of downtime per year)
- Performance: 99th percentile latency under 1 second for read/search operations
- Security: Authentication required for users who want to post listings
System Capacity Planning
The capacity planning section provides detailed calculations for the expected scale:
Key Assumptions
- 10 million users per city
- 10% (1 million) are active posters; the remaining 9 million are viewers
- Each active poster creates 10 posts per day
- Each post has 1KB of metadata plus 10 images of 1MB each
- Posts expire after 7 days
Post Volume
- Daily: 10 million new posts per city (1M posters × 10 posts each)
- Total active posts at any time: 70 million (10M/day × 7 days), since each post remains active for 7 days before deletion
Storage Requirements
- Metadata: 70GB (70M posts × 1KB)
- Images: 700TB (70M posts × 10MB)
- This shows why object storage and CDN are critical architectural components
Write Traffic
- Post creation: 116 posts/second average, 232/second at peak
- Image uploads: 580/second average, 1,160/second at peak (accounting for retries)
Read Traffic
- Post views: 2,083/second average, 4,000/second at peak
- Image views: 20,000/second average, 40,000/second at peak
Daily Storage Growth
- Metadata: 10GB/day
- Images: 100TB/day
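These back-of-envelope figures can be reproduced with a few lines of arithmetic. This is a sketch using only the assumptions stated above:

```python
# Back-of-envelope capacity check using the stated assumptions:
# 1M active posters, 10 posts/day each, 1KB metadata + 10 x 1MB images
# per post, and 7-day retention.
SECONDS_PER_DAY = 86_400

daily_posts = 1_000_000 * 10                      # 10M new posts/day per city
active_posts = daily_posts * 7                    # 70M posts live at any time

metadata_gb = active_posts * 1 / 1_000_000        # 1KB each -> 70 GB
images_tb = active_posts * 10 / 1_000_000         # 10MB each -> 700 TB

avg_post_writes = daily_posts / SECONDS_PER_DAY   # ~116 posts/second average
peak_post_writes = 2 * round(avg_post_writes)     # 232/second, doubling the average

print(metadata_gb, images_tb, round(avg_post_writes), peak_post_writes)
```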
API Design
The API is RESTful and divided into three main categories:
Post Management APIs
- GET /post/{id}: Retrieve a specific post
- DELETE /post/{id}: Remove a post
- GET /post?search=...: Search for posts with filters
- POST /post: Create a new post
- PUT /post: Update an existing post
- POST /report: Report abusive content
- POST /contact: Contact a poster
- DELETE /old_posts: System endpoint to remove expired posts
User Management APIs
- POST /signup: Create a new user account
- POST /login: Authenticate a user
- DELETE /user: Delete a user account
System API
- GET /health: System health check endpoint
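To make the routing concrete, here is a minimal, hypothetical dispatcher for two of the routes above; the handler bodies are stand-ins (a real service would hit the database or search index):

```python
import re

# Hypothetical handlers; real implementations would query storage.
def get_post(post_id):
    return {"id": post_id, "title": "example listing"}

def health():
    return {"status": "ok"}

# Route table: HTTP verb + compiled path pattern -> handler.
ROUTES = [
    ("GET", re.compile(r"^/post/(?P<id>\d+)$"), lambda m: get_post(int(m.group("id")))),
    ("GET", re.compile(r"^/health$"), lambda m: health()),
]

def dispatch(method, path):
    for verb, pattern, handler in ROUTES:
        match = pattern.match(path)
        if verb == method and match:
            return handler(match)
    return {"error": "not found"}
```

`dispatch("GET", "/post/42")` returns the post payload; anything not in the table falls through to an error response.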
Database Schema
The database uses a relational model with four main tables:
Users Table
Stores basic user information:
CREATE TABLE Users (
    id SERIAL PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    signup_ts BIGINT
);
Posts Table
Contains all listing information:
CREATE TABLE Posts (
    id SERIAL PRIMARY KEY,
    created_at BIGINT,
    poster_id INT REFERENCES Users(id),
    location_id INT,
    title TEXT,
    description TEXT,
    price INT,
    condition TEXT,
    country_code CHAR(2),
    state TEXT,
    city TEXT,
    street_number INT,
    street_name TEXT,
    zip_code TEXT,
    phone_number BIGINT,
    email TEXT
);
Images Table
Tracks images associated with posts:
CREATE TABLE Images (
    id SERIAL PRIMARY KEY,
    ts BIGINT,
    post_id INT REFERENCES Posts(id),
    image_address TEXT
);
Reports Table
Records abuse reports:
CREATE TABLE Reports (
    id SERIAL PRIMARY KEY,
    ts BIGINT,
    post_id INT REFERENCES Posts(id),
    user_id INT REFERENCES Users(id),
    abuse_type TEXT,
    message TEXT
);
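The 7-day auto-expiration (the `DELETE /old_posts` endpoint) reduces to a single delete over the `created_at` column. A runnable sketch, using SQLite's `INTEGER PRIMARY KEY` as a stand-in for the Postgres `SERIAL` type:

```python
import sqlite3
import time

# In-memory SQLite stand-in for the Posts table above (columns trimmed
# to the ones the expiry query touches).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Posts (
        id INTEGER PRIMARY KEY,
        created_at INTEGER,  -- unix timestamp, as in the BIGINT column
        title TEXT
    )
""")

now = int(time.time())
week = 7 * 24 * 3600
conn.execute("INSERT INTO Posts (created_at, title) VALUES (?, ?)", (now, "fresh post"))
conn.execute("INSERT INTO Posts (created_at, title) VALUES (?, ?)", (now - 8 * 24 * 3600, "stale post"))

# What DELETE /old_posts would run: remove posts older than 7 days.
conn.execute("DELETE FROM Posts WHERE created_at < ?", (now - week,))
remaining = [row[0] for row in conn.execute("SELECT title FROM Posts")]
print(remaining)  # ['fresh post']
```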
Storage Architecture
The system uses a multi-tiered storage approach:
- SQL Database: For structured data like metadata, user information, and reports
- Object Storage (S3-like): For storing large binary files like images
- CDN (Content Delivery Network): For efficiently serving images from edge locations close to users
Image Upload Strategy
The document presents three approaches:
Direct Client Upload
- Client sends post metadata to backend
- Backend creates a post record and returns the post ID plus pre-signed URLs for image uploads
- Client uploads images directly to object storage
- Client notifies backend when uploads are complete
Advantages:
- Highly scalable and efficient
- Reduces backend load
- More cost-effective
Disadvantages:
- More complex error handling
- Risk of incomplete uploads if client disconnects
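The pre-signed URL in step 2 can be sketched with a stdlib HMAC signature. This is a simplified stand-in for what S3-style pre-signing does; the secret, host name, and URL format are illustrative, not S3's real scheme:

```python
import hashlib
import hmac
import time

# Backend-only signing key (illustrative). The backend signs the object path
# plus an expiry timestamp so the client can upload directly to object
# storage without the image bytes ever passing through the backend.
SECRET = b"backend-only-signing-key"

def presign_upload(object_path, expires_in=900, now=None):
    now = int(time.time()) if now is None else now
    expires = now + expires_in
    sig = hmac.new(SECRET, f"{object_path}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{object_path}?expires={expires}&sig={sig}"

def verify_upload(object_path, expires, sig, now):
    # Object storage rejects the PUT if the URL expired or was tampered with.
    if now > expires:
        return False
    expected = hmac.new(SECRET, f"{object_path}:{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

The backend would return one such URL per image alongside the new post ID.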
Upload Through Backend
- Client sends metadata and image files to backend
- Backend stores data and handles uploading to object storage
Advantages:
- Better validation and control
- Simpler client implementation
Disadvantages:
- Creates a scalability bottleneck
- Increases backend resource requirements
Hybrid Approach (Recommended)
Combines elements of both approaches for optimal balance of control and scalability.
Read and Write Flow
Write Flow
- Client sends post metadata to the backend
- Backend stores metadata in SQL database and returns a post ID
- Client uploads images directly to object storage
- Database sharding is used to handle high write volumes:
  - Multiple write databases
  - Partitioning by city ID or consistent hashing
  - Each write node has associated read replicas
Read Flow
- Client requests a post by ID
- Load balancer directs request to nearest read replica
- Backend fetches metadata from SQL database
- Images are served from CDN/object storage
Geolocation Partitioning
Geolocation partitioning is a critical architectural strategy for a classifieds platform that divides data along geographic boundaries. Let me break down why it’s important and how each component works:
Why Use Geolocation Partitioning?
Performance Benefits
- Faster Queries: Most users search for listings in their own city or region, so keeping related data together reduces query latency
- Reduced Data Scope: Limits searches to relevant geographic areas instead of scanning the entire database
- Localized Caching: Improves cache hit rates by focusing on locally relevant content
Scalability Advantages
- Independent Scaling: Each region can scale based on its own traffic patterns and user base
- Fault Isolation: Issues in one region don’t affect others
- Optimized Resource Allocation: High-traffic cities can get more resources than low-traffic areas
Structure Details
Hierarchical Organization
The system organizes data in a geographic hierarchy:
- Country: Top level (e.g., US, Canada, UK)
- State/Province: Middle level (e.g., New York, California)
- City: Lowest level (e.g., NYC, San Francisco)
This mirrors how users think about locations when posting or searching for items.
Database Sharding
- Geographic Shards: Each city or region gets its own database shard (or shares with similar-sized regions)
- Consistent Hashing: Distributes cities across shards evenly and minimizes resharding impact
- Mapping Table: Maintains a lookup service that maps cities to their corresponding database shards
Example:
Shard 1: New York City, Chicago, Los Angeles
Shard 2: Miami, Seattle, Denver
Shard 3: Boston, Washington DC, San Francisco
The mapping table would contain entries like:
"new-york-city" → Shard 1
"seattle" → Shard 2
"boston" → Shard 3
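A minimal sketch of the city→shard lookup: the explicit mapping table is consulted first, with a stable hash as a fallback for unmapped cities. (The fallback here is a simple modulo, not a full consistent-hash ring, and the entries mirror the example above.)

```python
import hashlib

NUM_SHARDS = 3

# Explicit mapping table, consulted first (entries from the example above).
CITY_TO_SHARD = {
    "new-york-city": 1,
    "seattle": 2,
    "boston": 3,
}

def shard_for_city(city):
    """Return the database shard for a city. Unknown cities fall back to a
    deterministic hash so every city always lands on the same shard."""
    if city in CITY_TO_SHARD:
        return CITY_TO_SHARD[city]
    digest = hashlib.md5(city.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS + 1
```

A production system would replace the modulo fallback with consistent hashing so that adding a shard remaps only a small fraction of cities.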
Elasticsearch Indexing Strategy
Separate Indexes per Region
- Each city/region gets its own search index (e.g., posts_us_ny_nyc)
- Benefits:
  - Smaller indexes are faster to search
  - Index settings can be tuned for local language and search patterns
  - Index operations (updates, rebuilds) affect only one region
Geo_Point Fields
- Special Elasticsearch data type optimized for location-based queries
- Enables powerful queries like:
  - “Find all listings within 5 miles of downtown”
  - “Sort listings by distance from my current location”
  - “Show me items in this neighborhood”
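A query like “listings within 5 miles of a point, cheapest-relevant first” maps to Elasticsearch's `geo_distance` filter and `_geo_distance` sort. A sketch of the query body (the `location` field name and coordinates are assumptions, following the document's geo_point mapping):

```python
# Build an Elasticsearch query body: geo_distance filter on the "location"
# geo_point field, sorted by distance, with an optional price cap.
def nearby_listings_query(lat, lon, radius_miles=5, max_price=None):
    query = {
        "query": {
            "bool": {
                "filter": [
                    {"geo_distance": {
                        "distance": f"{radius_miles}mi",
                        "location": {"lat": lat, "lon": lon},
                    }}
                ]
            }
        },
        "sort": [
            {"_geo_distance": {
                "location": {"lat": lat, "lon": lon},
                "order": "asc",
                "unit": "mi",
            }}
        ],
    }
    if max_price is not None:
        query["query"]["bool"]["filter"].append(
            {"range": {"price": {"lte": max_price}}}
        )
    return query
```

The body would be sent to the regional index (e.g., `posts_us_ny_nyc`) via the search API.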
Federated Search
- For queries that span multiple regions:
  - System sends parallel queries to relevant regional indexes
  - Results are merged, sorted, and returned to the user
- Example: “Show me furniture listings in NYC, Boston, and Philadelphia”
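The federated fan-out-and-merge can be sketched with a thread pool. Here an in-memory dict stands in for the regional indexes (the index names and scores are illustrative, not real data):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the regional Elasticsearch indexes: each maps to a list of
# (listing, relevance_score) hits.
FAKE_INDEXES = {
    "posts_us_ny_nyc": [("nyc sofa", 0.9), ("nyc table", 0.4)],
    "posts_us_ma_boston": [("boston desk", 0.7)],
}

def query_region(index_name, term):
    # Substring match stands in for real full-text scoring.
    return [hit for hit in FAKE_INDEXES.get(index_name, []) if term in hit[0]]

def federated_search(index_names, term, limit=10):
    """Query each regional index in parallel, then merge by score."""
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda name: query_region(name, term), index_names)
    merged = [hit for hits in result_lists for hit in hits]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:limit]
```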
Object Storage Organization
Path-Based Organization
- Images stored using paths that reflect geography, e.g. s3://images/us/ny/nyc/post123.jpg
- Benefits:
  - Logical organization matches application structure
  - Easy to identify content by location
  - Simplifies backup and retention policies by region
Example:
s3://images/us/ny/nyc/post123.jpg
s3://images/us/ca/sf/post456.jpg
s3://images/ca/on/toronto/post789.jpg
CDN and GeoDNS Implementation
CDN Edge Nodes
- Content Delivery Network caches images at edge locations worldwide
- When a user views a listing, images are served from the nearest edge server
- Benefits:
  - Reduced image load times (often 10x faster than from origin)
  - Lower origin server load
  - Better user experience, especially for mobile users
GeoDNS Routing
- DNS system determines user’s approximate location
- Routes requests to nearest server cluster
- Example:
  - User in Chicago → Midwest regional servers
  - User in Paris → European regional servers
Request Routing Logic
Region Inference
The system determines which region’s data to access using multiple methods:
- Post ID: IDs can encode region information (e.g., nyc-12345)
- User IP: Approximate user location from IP address
- Explicit Tags: User-selected region or location preferences
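Resolving these three signals in priority order is a few lines of code. A sketch, using the `nyc-12345` ID format from the example (the region list is illustrative):

```python
KNOWN_REGIONS = {"nyc", "sf", "boston"}  # illustrative region codes

def infer_region(post_id, ip_region=None, explicit_region=None):
    """Resolve a region using the three methods above, in priority order:
    explicit user preference, region encoded in the post ID, then IP."""
    if explicit_region:
        return explicit_region
    prefix, _, rest = post_id.partition("-")
    if rest and prefix in KNOWN_REGIONS:
        return prefix
    return ip_region
```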
Routing Implementation
- API Gateway: Routes API requests to appropriate regional services
- Load Balancer: Distributes traffic across servers within a region
- Service Discovery: Maintains registry of available services by region
Real World Example
When a user in San Francisco searches for “used bicycle under $200”:
- GeoDNS routes them to West Coast servers
- System identifies SF as their location (from IP or preferences)
- Query goes to the SF Elasticsearch index (posts_us_ca_sf)
- Results include only SF listings, with data from the SF database shard
- Images load from nearby CDN edge nodes in California
If the user expands their search to include Oakland and San Jose:
- System performs parallel queries across all three city indexes
- Results are merged, filtered by price, and returned to the user
Implementation Challenges
- Cross-Region Searches: Need efficient algorithms for merging results
- Region Mapping Maintenance: Keeping the city→shard mapping updated
- Data Migration: Moving data when resharding or rebalancing
- Consistency: Ensuring consistent experience across regions
This geolocation partitioning architecture enables the classifieds platform to scale efficiently to millions of users while maintaining fast response times and a localized user experience.
Search Design
The system uses Elasticsearch for fast, scalable search functionality:
Document Structure
- Core fields: Post ID, title, price, description
- Location stored as geo_point for spatial queries
- Filter fields: price, condition, neighborhood
Query Patterns
- Full-text search across title and description
- Range filtering for price
- Categorical filtering for condition
- Geospatial filtering by location
Scaling Strategy
- One index per geographic region
- Optimized shard configuration
- Asynchronous indexing via Kafka to Elasticsearch consumers
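The asynchronous indexing pipeline can be sketched with an in-process queue standing in for Kafka and a plain dict standing in for the Elasticsearch index (both are simplifications of the real components):

```python
import queue
import threading

# Kafka stand-in: post events are enqueued on write and indexed
# asynchronously, so post creation never waits on Elasticsearch.
events = queue.Queue()
search_index = {}  # stand-in for the regional Elasticsearch index

def indexing_consumer():
    while True:
        post = events.get()
        if post is None:  # shutdown sentinel
            break
        search_index[post["id"]] = post  # "index" the document
        events.task_done()

consumer = threading.Thread(target=indexing_consumer)
consumer.start()

# Producer side: the write path publishes an event and returns immediately.
events.put({"id": "nyc-1", "title": "used bicycle", "price": 150})
events.put(None)
consumer.join()
```

In production the consumer would batch documents into Elasticsearch's bulk API and commit Kafka offsets only after a successful bulk response.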
Optional Analytics System
For future growth, the design includes an analytics capability:
- Log collection for user actions (searches, views, reports)
- Kafka for streaming log data
- Data warehouse (Redshift or BigQuery) for storage
- Analysis for trends and abuse detection
Key Design Decisions Explained
Why Use a Hybrid Upload Strategy?
The hybrid approach balances control and scalability. By having the backend manage metadata but allowing direct image uploads, the system avoids becoming a bottleneck while maintaining control over the core listing data.
Why Geographic Partitioning?
Most classified listings are location-specific, with users typically searching within their own city or region. Geographic partitioning aligns the data storage with this usage pattern, improving performance and reducing query scope.
Why 7-Day Auto-Expiration?
This policy keeps content fresh and significantly reduces storage requirements. Without this limitation, the storage needs would grow unbounded over time.
Why Object Storage and CDN for Images?
With 700TB of image data and 40,000 image requests per second at peak, traditional file storage would be inadequate. Object storage offers cost-effective scalability, while CDNs provide low-latency global delivery.
Summary
This Craigslist-like system design demonstrates careful consideration of:
- Scale: Supporting 10 million users per city with 10 million daily posts
- Performance: Ensuring fast response times through caching, CDN, and read replicas
- Cost-efficiency: Using appropriate storage tiers and auto-expiration policies
- Geographic organization: Aligning system architecture with usage patterns
- Evolution path: Starting simple and evolving to microservices as needed
The design balances technical sophistication with practical implementation concerns, providing a solid foundation for a large-scale classifieds platform.
Design with Read & Write Replicas
Design with Search Engine
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli