I Built an AI Baby Meme Generator with a Two-Stage Prompt Pipeline — Here’s How It Works

Last month I shipped babymeme.art — a free AI baby meme generator with 7 preset styles. You pick a style like “Gangster” or “Cursed,” type something like “eating pizza,” and get a meme-ready image in under 5 seconds.

Here’s the full technical breakdown of how it works, the two-stage prompt pipeline I designed, and what I learned building it.

The Architecture

User Input ("eating pizza")
       │
       ▼
┌─────────────────────┐
│  Style Template      │  ← 7 preset prompt templates
│  prefix + input +    │
│  suffix              │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Stage 1: Gemini     │  ← Prompt enhancement via OpenRouter
│  (prompt engineer)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Stage 2: Flux.1     │  ← Image generation via fal.ai
│  Schnell (4 steps)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Canvas API          │  ← Client-side meme text overlay
│  (Impact font)       │
└─────────────────────┘

Tech stack: Next.js 14 (App Router), TypeScript, Tailwind CSS, Supabase (auth + credits), PayPal (payments), Vercel (hosting).

The Prompt Template System

The core insight: users are lazy. They type “pizza” or “eating pizza” — not a detailed 100-word prompt. So I built 7 style templates that wrap their input into a complete, high-quality prompt.

Each template has a prefix and suffix:

// lib/styles.ts
{
  id: 'gangster',
  promptPrefix:
    'Hilarious photo of a tough-looking toddler with an exaggerated ' +
    'gangster attitude, wearing oversized designer sunglasses, heavy ' +
    'gold chains, a tiny leather jacket, making a serious tough-guy ' +
    'face while',
  promptSuffix:
    ', fisheye lens shot, 90s hip hop music video aesthetic, dramatic ' +
    'flash photography, high contrast, absurd humor, viral meme quality, 8k.',
}

The prompt builder sandwiches user input between them:

// lib/promptBuilder.ts
interface Style {
  promptPrefix: string;
  promptSuffix: string;
  negativePrompt?: string; // not used by Flux (see Stage 2)
}

export function buildPrompt({ userInput, style }: { userInput: string; style: Style }) {
  let processedInput = userInput.trim();

  // Handle lazy single-word input: "pizza" becomes "doing pizza"
  if (!processedInput.includes(' ')) {
    processedInput = `doing ${processedInput}`;
  }

  return {
    prompt: `${style.promptPrefix} ${processedInput}${style.promptSuffix}`,
    negativePrompt: style.negativePrompt
  };
}

So when a user types “eating pizza” with the Gangster style, the actual prompt becomes:

“Hilarious photo of a tough-looking toddler with an exaggerated gangster attitude, wearing oversized designer sunglasses, heavy gold chains, a tiny leather jacket, making a serious tough-guy face while eating pizza, fisheye lens shot, 90s hip hop music video aesthetic, dramatic flash photography, high contrast, absurd humor, viral meme quality, 8k.”

Seven styles, each with a distinct visual personality:

Style             Vibe
----------------  -------------------------------------------
Gangster          90s hip hop, gold chains, fisheye lens
Cursed            Security camera, uncanny valley, VHS grain
Giant Knitted     Crochet wool kaiju in miniature city
Dramatic Crying   Oscar-worthy performance, tears flying
Chubby            Michelin rolls, soft pastel, adorable
Cyberpunk         Neon, holographic, sci-fi
Pixar             3D render, Disney character style

Stage 1: Gemini Prompt Enhancement

The style template gets the job done, but a second pass with an LLM makes it noticeably better. I pipe the assembled prompt through Gemini 2.0 Flash (free tier via OpenRouter) for enhancement:

// lib/image-generation/gemini-enhancer.ts
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST', // fetch defaults to GET; a JSON body needs an explicit POST
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-2.0-flash-exp:free',
    messages: [{
      role: 'system',
      content: `You are an expert prompt engineer for Flux image
        generation models. Enhance this prompt. Keep it under
        120 words. Return ONLY the enhanced prompt.
        ALWAYS include: "five fingers on each hand,
        anatomically correct hands"
        NEVER use negative phrasing — Flux doesn't support it.`
    }, {
      role: 'user',
      content: basePrompt
    }]
  })
});

Key decisions here:

  1. Free model — Gemini 2.0 Flash Exp on OpenRouter costs nothing. For a prompt enhancement pass, you don’t need GPT-4.
  2. 120-word cap — Flux Schnell handles shorter prompts better. Longer prompts tend to confuse it.
  3. No negative phrasing — Flux doesn’t support negative prompts the way Stable Diffusion does. Instead of “no extra fingers,” you say “five fingers on each hand.” This took me a while to figure out.
  4. Graceful fallback — If Gemini fails, I just use the original template prompt. The system never blocks on enhancement failure.
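
A minimal sketch of that fallback logic. The enhancePrompt name is mine, standing in for the OpenRouter call above:

// enhancePrompt wraps the Gemini call shown earlier (hypothetical name)
declare function enhancePrompt(basePrompt: string): Promise<string | null>;

export async function enhanceOrFallback(basePrompt: string): Promise<string> {
  try {
    const enhanced = await enhancePrompt(basePrompt);
    return enhanced?.trim() || basePrompt; // an empty response falls back too
  } catch {
    return basePrompt; // never block generation on enhancement failure
  }
}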

Stage 2: Flux.1 Schnell on fal.ai

For image generation, I went with Flux.1 Schnell via fal.ai’s API:

// lib/image-generation/fal-flux-client.ts
const response = await fetch('https://fal.run/fal-ai/flux/schnell', {
  method: 'POST', // same gotcha as above: the JSON body requires POST
  headers: {
    'Authorization': `Key ${process.env.FAL_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: finalPrompt,
    image_size: 'landscape_4_3',
    num_inference_steps: 4,
    num_images: 1,
    enable_safety_checker: true,
  }),
});

Why Flux Schnell?

  • Speed — 4 inference steps, generates in 2-3 seconds
  • Quality — Surprisingly good for meme-style images
  • Cost — Around $0.003 per image on fal.ai
  • No negative prompt needed — Simpler API surface

I originally planned to use Replicate, but fal.ai’s cold start times were better for my use case. When every second counts in a consumer app, that matters.

Client-Side Meme Text with Canvas API

After generation, users can add classic meme text (white Impact font, black stroke). This runs entirely client-side:

// lib/memeCanvas.ts — simplified
export function renderMemeToCanvas(
  canvas: HTMLCanvasElement,
  image: HTMLImageElement,
  options: { topText: string; bottomText: string }
) {
  const ctx = canvas.getContext('2d');
  if (!ctx) return; // getContext can return null; bail out rather than crash

  ctx.drawImage(image, 0, 0, canvas.width, canvas.height);

  const fontSize = Math.floor(canvas.width / 14);
  ctx.font = `bold ${fontSize}px Impact, sans-serif`;
  ctx.textAlign = 'center';
  ctx.fillStyle = 'white';
  ctx.strokeStyle = 'black';
  ctx.lineWidth = fontSize / 6;

  // Draw with stroke first, then fill (classic meme style)
  if (options.topText) {
    ctx.strokeText(options.topText.toUpperCase(), canvas.width / 2, fontSize + 10);
    ctx.fillText(options.topText.toUpperCase(), canvas.width / 2, fontSize + 10);
  }

  if (options.bottomText) {
    const y = canvas.height - fontSize / 2;
    ctx.strokeText(options.bottomText.toUpperCase(), canvas.width / 2, y);
    ctx.fillText(options.bottomText.toUpperCase(), canvas.width / 2, y);
  }
}

One gotcha: cross-origin images. The AI-generated images come from fal.ai’s CDN, so you must set crossOrigin = 'anonymous' when loading them into the canvas — otherwise toDataURL() throws a security error on download.

const img = new Image();
img.crossOrigin = 'anonymous'; // This line saves you 2 hours of debugging
img.src = imageUrl;
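
With that in place, downloading the finished meme is a few lines (canvas being the element renderMemeToCanvas drew into):

// Trigger a PNG download; toDataURL() only works on an untainted canvas
const link = document.createElement('a');
link.download = 'baby-meme.png';
link.href = canvas.toDataURL('image/png');
link.click();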

Content Safety Layer

With any AI image generator, content filtering is non-negotiable. I built a three-layer system:

Layer 1: Keyword blocklist — Catches obvious banned terms before they hit the API.
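
A minimal sketch of this first layer; the term list here is a placeholder, the real one lives server-side:

// lib/safety/blocklist.ts (hypothetical path and shape)
const BLOCKED_TERMS: string[] = [/* actual terms omitted */];

export function findBlockedTerm(input: string): string | undefined {
  const lower = input.toLowerCase();
  return BLOCKED_TERMS.find((term) => lower.includes(term));
}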

Layer 2: Context-aware replacement — For the Gangster style, if someone types “gun,” it auto-replaces to “water gun” instead of blocking outright. Fewer frustrated users, same safety outcome.

Layer 3: Built-in safety checker — fal.ai’s enable_safety_checker: true as the final guardrail.

export function autoReplaceWords(input: string, styleId: string): string {
  if (styleId === 'gangster') {
    // Word boundaries so "gunner" or "begun" are left alone
    return input.replace(/\bgun\b/gi, 'water gun')
                .replace(/\bweapon\b/gi, 'toy weapon');
  }
  return input;
}

Credit System with Atomic Operations

Users get 5 free credits on signup, can buy packages ($2.99/50 credits), or subscribe to Pro ($9.99/month for 300 credits). The credit deduction uses Supabase RPC for atomicity:

// Credits are deducted before generation starts.
// If generation fails, credits are refunded.
const creditResult = await useCredit(user.id);

if (!creditResult.success) {
  return NextResponse.json(
    { error: creditResult.error },
    { status: 402 } // Payment Required
  );
}

// ... generate image ...

if (!imageResult.success) {
  // Refund on failure
  await refundCredit(user.id, creditResult.credit_type);
}

Priority order: subscription credits > free credits > purchased credits. This way purchased credits (which never expire) are the last to be consumed.
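
A sketch of the TypeScript side of that call. The use_credit function name and return shape are my assumptions; the point is that the check-and-decrement happens in a single Postgres transaction, so two concurrent requests can't spend the same credit twice:

// lib/credits.ts (hypothetical): one RPC call, one atomic transaction
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export async function useCredit(userId: string) {
  // The SQL function picks the bucket (subscription > free > purchased)
  const { data, error } = await supabase.rpc('use_credit', { p_user_id: userId });
  if (error || !data) {
    return { success: false as const, error: error?.message ?? 'Insufficient credits' };
  }
  return { success: true as const, credit_type: data.credit_type as string };
}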

SEO for Programmatic Pages

Each of the 7 styles gets its own page at /styles/[slug] with unique metadata, JSON-LD structured data, and content:

// app/styles/[slug]/page.tsx
export async function generateMetadata({ params }) {
  const style = getStyleBySlug(params.slug);
  return {
    title: style.seoTitle,
    description: style.seoDescription,
    openGraph: {
      images: [{ url: `/api/og?style=${style.id}` }],
    },
  };
}
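
The JSON-LD part isn't shown above; here's a minimal sketch of how it can be rendered in the same page component, with schema fields that are my guesses rather than the site's actual markup (style comes from getStyleBySlug as before):

// app/styles/[slug]/page.tsx (continued): hypothetical JSON-LD payload
const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'WebApplication',
  name: style.seoTitle,
  applicationCategory: 'EntertainmentApplication',
};

// Rendered in the JSX:
// <script
//   type="application/ld+json"
//   dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
// />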

The OG images are generated dynamically at the edge using Next.js ImageResponse:

// app/api/og/route.tsx
import { ImageResponse } from 'next/og';
import type { NextRequest } from 'next/server';

export const runtime = 'edge';

export async function GET(request: NextRequest) {
  const style = request.nextUrl.searchParams.get('style');
  return new ImageResponse(
    <div style={{ /* gradient background, emoji, title */ }}>
      {styleEmojis[style]} {styleNames[style]}
    </div>,
    { width: 1200, height: 630 }
  );
}

This gives every style page a unique social share image without storing any static files.

What I’d Do Differently

1. Start with fal.ai from day one. I initially set up Replicate, then switched to fal.ai for faster cold starts. Wasted a day on the migration.

2. Use an LLM for content filtering instead of keyword lists. Keyword blocklists have false positives (“assassin” contains “ass”). An LLM-based classifier would be smarter, but costs more per request.

3. Add image caching earlier. Same prompt + same style = same image. Could save a lot on API costs with a simple hash-based cache.
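
Two of these are cheap to sketch. An LLM-based filter could reuse the OpenRouter setup from Stage 1; the model choice and prompt wording below are illustrative, not from the repo:

// Hypothetical moderation pass on the free Gemini tier
async function isSafePrompt(input: string): Promise<boolean> {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'google/gemini-2.0-flash-exp:free',
      messages: [
        { role: 'system', content: 'Reply with exactly SAFE or UNSAFE: is this meme prompt appropriate for a family-friendly image generator?' },
        { role: 'user', content: input },
      ],
    }),
  });
  const json = await res.json();
  return json.choices?.[0]?.message?.content?.trim().toUpperCase() === 'SAFE';
}

And the cache key for idea 3 is just a hash of everything that determines the output:

// Same style + same final prompt = same key, so the stored image URL can be reused
import { createHash } from 'crypto';

export function imageCacheKey(styleId: string, finalPrompt: string): string {
  return createHash('sha256').update(`${styleId}:${finalPrompt}`).digest('hex');
}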

Numbers After 2 Weeks

  • Built and deployed in ~12 days
  • 7 style templates
  • ~3 second average generation time
  • Cost per image: ~$0.003 (fal.ai) + ~$0 (Gemini free tier)
  • Deployed on Vercel free tier

Try It

babymeme.art — 5 free credits, no credit card required. Pick a style, type something funny, get a meme.

If you’re building something with AI image generation, the two-stage pipeline (LLM enhancement + fast diffusion model) is worth trying. The quality difference between a raw user prompt and an LLM-enhanced one is significant, and using a free model for enhancement keeps the cost near zero.

Have questions about the architecture or prompt engineering? Drop a comment below.
