I Built an AI Baby Meme Generator with a Two-Stage Prompt Pipeline — Here’s How It Works

Last month I shipped babymeme.art — a free AI baby meme generator with 7 preset styles. You pick a style like “Gangster” or “Cursed,” type something like “eating pizza,” and get a meme-ready image in under 5 seconds.

Here’s the full technical breakdown of how it works, the two-stage prompt pipeline I designed, and what I learned building it.

The Architecture

User Input ("eating pizza")
       │
       ▼
┌─────────────────────┐
│  Style Template      │  ← 7 preset prompt templates
│  prefix + input +    │
│  suffix              │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Stage 1: Gemini     │  ← Prompt enhancement via OpenRouter
│  (prompt engineer)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Stage 2: Flux.1     │  ← Image generation via fal.ai
│  Schnell (4 steps)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Canvas API          │  ← Client-side meme text overlay
│  (Impact font)       │
└─────────────────────┘

Tech stack: Next.js 14 (App Router), TypeScript, Tailwind CSS, Supabase (auth + credits), PayPal (payments), Vercel (hosting).

The Prompt Template System

The core insight: users are lazy. They type “pizza” or “eating pizza” — not a detailed 100-word prompt. So I built 7 style templates that wrap their input into a complete, high-quality prompt.

Each template has a prefix and suffix:

// lib/styles.ts
{
  id: 'gangster',
  promptPrefix:
    'Hilarious photo of a tough-looking toddler with an exaggerated ' +
    'gangster attitude, wearing oversized designer sunglasses, heavy ' +
    'gold chains, a tiny leather jacket, making a serious tough-guy ' +
    'face while',
  promptSuffix:
    ', fisheye lens shot, 90s hip hop music video aesthetic, dramatic ' +
    'flash photography, high contrast, absurd humor, viral meme quality, 8k.',
}

The prompt builder sandwiches user input between them:

// lib/promptBuilder.ts
interface Style {
  promptPrefix: string;
  promptSuffix: string;
  negativePrompt?: string; // not used by Flux (see Stage 2)
}

export function buildPrompt({ userInput, style }: { userInput: string; style: Style }) {
  let processedInput = userInput.trim();

  // Handle lazy single-word input: "pizza" becomes "doing pizza"
  if (!processedInput.includes(' ')) {
    processedInput = `doing ${processedInput}`;
  }

  return {
    prompt: `${style.promptPrefix} ${processedInput}${style.promptSuffix}`,
    negativePrompt: style.negativePrompt
  };
}

So when a user types “eating pizza” with the Gangster style, the actual prompt becomes:

“Hilarious photo of a tough-looking toddler with an exaggerated gangster attitude, wearing oversized designer sunglasses, heavy gold chains, a tiny leather jacket, making a serious tough-guy face while eating pizza, fisheye lens shot, 90s hip hop music video aesthetic, dramatic flash photography, high contrast, absurd humor, viral meme quality, 8k.”

Seven styles, each with a distinct visual personality:

Style             Vibe
----------------  -------------------------------------------
Gangster          90s hip hop, gold chains, fisheye lens
Cursed            Security camera, uncanny valley, VHS grain
Giant Knitted     Crochet wool kaiju in miniature city
Dramatic Crying   Oscar-worthy performance, tears flying
Chubby            Michelin rolls, soft pastel, adorable
Cyberpunk         Neon, holographic, sci-fi
Pixar             3D render, Disney character style

Stage 1: Gemini Prompt Enhancement

The style template gets the job done, but a second pass with an LLM makes it noticeably better. I pipe the assembled prompt through Gemini 2.0 Flash (free tier via OpenRouter) for enhancement:

// lib/image-generation/gemini-enhancer.ts
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST', // fetch defaults to GET; a JSON body needs an explicit POST
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-2.0-flash-exp:free',
    messages: [{
      role: 'system',
      content: `You are an expert prompt engineer for Flux image
        generation models. Enhance this prompt. Keep it under
        120 words. Return ONLY the enhanced prompt.
        ALWAYS include: "five fingers on each hand,
        anatomically correct hands"
        NEVER use negative phrasing — Flux doesn't support it.`
    }, {
      role: 'user',
      content: basePrompt
    }]
  })
});

Key decisions here:

  1. Free model — Gemini 2.0 Flash Exp on OpenRouter costs nothing. For a prompt enhancement pass, you don’t need GPT-4.
  2. 120-word cap — Flux Schnell handles shorter prompts better. Longer prompts tend to confuse it.
  3. No negative phrasing — Flux doesn’t support negative prompts the way Stable Diffusion does. Instead of “no extra fingers,” you say “five fingers on each hand.” This took me a while to figure out.
  4. Graceful fallback — If Gemini fails, I just use the original template prompt. The system never blocks on enhancement failure.
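
A minimal sketch of that fallback logic. The enhancePrompt name is mine, standing in for the OpenRouter call above:

// enhancePrompt wraps the Gemini call shown earlier (hypothetical name)
declare function enhancePrompt(basePrompt: string): Promise<string | null>;

export async function enhanceOrFallback(basePrompt: string): Promise<string> {
  try {
    const enhanced = await enhancePrompt(basePrompt);
    return enhanced?.trim() || basePrompt; // an empty response falls back too
  } catch {
    return basePrompt; // never block generation on enhancement failure
  }
}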

Stage 2: Flux.1 Schnell on fal.ai

For image generation, I went with Flux.1 Schnell via fal.ai’s API:

// lib/image-generation/fal-flux-client.ts
const response = await fetch('https://fal.run/fal-ai/flux/schnell', {
  method: 'POST', // same gotcha as above: the JSON body requires POST
  headers: {
    'Authorization': `Key ${process.env.FAL_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: finalPrompt,
    image_size: 'landscape_4_3',
    num_inference_steps: 4,
    num_images: 1,
    enable_safety_checker: true,
  }),
});

Why Flux Schnell?

  • Speed — 4 inference steps, generates in 2-3 seconds
  • Quality — Surprisingly good for meme-style images
  • Cost — Around $0.003 per image on fal.ai
  • No negative prompt needed — Simpler API surface

I originally planned to use Replicate, but fal.ai’s cold start times were better for my use case. When every second counts in a consumer app, that matters.

Client-Side Meme Text with Canvas API

After generation, users can add classic meme text (white Impact font, black stroke). This runs entirely client-side:

// lib/memeCanvas.ts — simplified
export function renderMemeToCanvas(
  canvas: HTMLCanvasElement,
  image: HTMLImageElement,
  options: { topText: string; bottomText: string }
) {
  const ctx = canvas.getContext('2d');
  if (!ctx) return; // getContext can return null; bail out rather than crash

  ctx.drawImage(image, 0, 0, canvas.width, canvas.height);

  const fontSize = Math.floor(canvas.width / 14);
  ctx.font = `bold ${fontSize}px Impact, sans-serif`;
  ctx.textAlign = 'center';
  ctx.fillStyle = 'white';
  ctx.strokeStyle = 'black';
  ctx.lineWidth = fontSize / 6;

  // Draw with stroke first, then fill (classic meme style)
  if (options.topText) {
    ctx.strokeText(options.topText.toUpperCase(), canvas.width / 2, fontSize + 10);
    ctx.fillText(options.topText.toUpperCase(), canvas.width / 2, fontSize + 10);
  }

  if (options.bottomText) {
    const y = canvas.height - fontSize / 2;
    ctx.strokeText(options.bottomText.toUpperCase(), canvas.width / 2, y);
    ctx.fillText(options.bottomText.toUpperCase(), canvas.width / 2, y);
  }
}

One gotcha: cross-origin images. The AI-generated images come from fal.ai’s CDN, so you must set crossOrigin = 'anonymous' when loading them into the canvas — otherwise toDataURL() throws a security error on download.

const img = new Image();
img.crossOrigin = 'anonymous'; // This line saves you 2 hours of debugging
img.src = imageUrl;
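
With that in place, downloading the finished meme is a few lines (canvas being the element renderMemeToCanvas drew into):

// Trigger a PNG download; toDataURL() only works on an untainted canvas
const link = document.createElement('a');
link.download = 'baby-meme.png';
link.href = canvas.toDataURL('image/png');
link.click();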

Content Safety Layer

With any AI image generator, content filtering is non-negotiable. I built a three-layer system:

Layer 1: Keyword blocklist — Catches obvious banned terms before they hit the API.
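
A minimal sketch of this first layer; the term list here is a placeholder, the real one lives server-side:

// lib/safety/blocklist.ts (hypothetical path and shape)
const BLOCKED_TERMS: string[] = [/* actual terms omitted */];

export function findBlockedTerm(input: string): string | undefined {
  const lower = input.toLowerCase();
  return BLOCKED_TERMS.find((term) => lower.includes(term));
}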

Layer 2: Context-aware replacement — For the Gangster style, if someone types “gun,” it auto-replaces to “water gun” instead of blocking outright. Fewer frustrated users, same safety outcome.

Layer 3: Built-in safety checker — fal.ai’s enable_safety_checker: true as the final guardrail.

export function autoReplaceWords(input: string, styleId: string): string {
  if (styleId === 'gangster') {
    // Word boundaries so "gunner" or "begun" are left alone
    return input.replace(/\bgun\b/gi, 'water gun')
                .replace(/\bweapon\b/gi, 'toy weapon');
  }
  return input;
}

Credit System with Atomic Operations

Users get 5 free credits on signup, can buy packages ($2.99/50 credits), or subscribe to Pro ($9.99/month for 300 credits). The credit deduction uses Supabase RPC for atomicity:

// Credits are deducted before generation starts.
// If generation fails, credits are refunded.
const creditResult = await useCredit(user.id);

if (!creditResult.success) {
  return NextResponse.json(
    { error: creditResult.error },
    { status: 402 } // Payment Required
  );
}

// ... generate image ...

if (!imageResult.success) {
  // Refund on failure
  await refundCredit(user.id, creditResult.credit_type);
}

Priority order: subscription credits > free credits > purchased credits. This way purchased credits (which never expire) are the last to be consumed.
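
A sketch of the TypeScript side of that call. The use_credit function name and return shape are my assumptions; the point is that the check-and-decrement happens in a single Postgres transaction, so two concurrent requests can't spend the same credit twice:

// lib/credits.ts (hypothetical): one RPC call, one atomic transaction
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export async function useCredit(userId: string) {
  // The SQL function picks the bucket (subscription > free > purchased)
  const { data, error } = await supabase.rpc('use_credit', { p_user_id: userId });
  if (error || !data) {
    return { success: false as const, error: error?.message ?? 'Insufficient credits' };
  }
  return { success: true as const, credit_type: data.credit_type as string };
}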

SEO for Programmatic Pages

Each of the 7 styles gets its own page at /styles/[slug] with unique metadata, JSON-LD structured data, and content:

// app/styles/[slug]/page.tsx
export async function generateMetadata({ params }) {
  const style = getStyleBySlug(params.slug);
  return {
    title: style.seoTitle,
    description: style.seoDescription,
    openGraph: {
      images: [{ url: `/api/og?style=${style.id}` }],
    },
  };
}
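
The JSON-LD part isn't shown above; here's a minimal sketch of how it can be rendered in the same page component, with schema fields that are my guesses rather than the site's actual markup (style comes from getStyleBySlug as before):

// app/styles/[slug]/page.tsx (continued): hypothetical JSON-LD payload
const jsonLd = {
  '@context': 'https://schema.org',
  '@type': 'WebApplication',
  name: style.seoTitle,
  applicationCategory: 'EntertainmentApplication',
};

// Rendered in the JSX:
// <script
//   type="application/ld+json"
//   dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
// />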

The OG images are generated dynamically at the edge using Next.js ImageResponse:

// app/api/og/route.tsx
import { ImageResponse } from 'next/og';
import type { NextRequest } from 'next/server';

export const runtime = 'edge';

export async function GET(request: NextRequest) {
  const style = request.nextUrl.searchParams.get('style');
  return new ImageResponse(
    <div style={{ /* gradient background, emoji, title */ }}>
      {styleEmojis[style]} {styleNames[style]}
    </div>,
    { width: 1200, height: 630 }
  );
}

This gives every style page a unique social share image without storing any static files.

What I’d Do Differently

1. Start with fal.ai from day one. I initially set up Replicate, then switched to fal.ai for faster cold starts. Wasted a day on the migration.

2. Use an LLM for content filtering instead of keyword lists. Keyword blocklists have false positives (“assassin” contains “ass”). An LLM-based classifier would be smarter, but costs more per request.

3. Add image caching earlier. Same prompt + same style = same image. Could save a lot on API costs with a simple hash-based cache.
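
Two of these are cheap to sketch. An LLM-based filter could reuse the OpenRouter setup from Stage 1; the model choice and prompt wording below are illustrative, not from the repo:

// Hypothetical moderation pass on the free Gemini tier
async function isSafePrompt(input: string): Promise<boolean> {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'google/gemini-2.0-flash-exp:free',
      messages: [
        { role: 'system', content: 'Reply with exactly SAFE or UNSAFE: is this meme prompt appropriate for a family-friendly image generator?' },
        { role: 'user', content: input },
      ],
    }),
  });
  const json = await res.json();
  return json.choices?.[0]?.message?.content?.trim().toUpperCase() === 'SAFE';
}

And the cache key for idea 3 is just a hash of everything that determines the output:

// Same style + same final prompt = same key, so the stored image URL can be reused
import { createHash } from 'crypto';

export function imageCacheKey(styleId: string, finalPrompt: string): string {
  return createHash('sha256').update(`${styleId}:${finalPrompt}`).digest('hex');
}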

Numbers After 2 Weeks

  • Built and deployed in ~12 days
  • 7 style templates
  • ~3 second average generation time
  • Cost per image: ~$0.003 (fal.ai) + ~$0 (Gemini free tier)
  • Deployed on Vercel free tier

Try It

babymeme.art — 5 free credits, no credit card required. Pick a style, type something funny, get a meme.

If you’re building something with AI image generation, the two-stage pipeline (LLM enhancement + fast diffusion model) is worth trying. The quality difference between a raw user prompt and an LLM-enhanced one is significant, and using a free model for enhancement keeps the cost near zero.

Have questions about the architecture or prompt engineering? Drop a comment below.
