Last month I shipped babymeme.art — a free AI baby meme generator with 7 preset styles. You pick a style like “Gangster” or “Cursed,” type something like “eating pizza,” and get a meme-ready image in under 5 seconds.
Here’s the full technical breakdown of how it works, the two-stage prompt pipeline I designed, and what I learned building it.
The Architecture
```
User Input ("eating pizza")
           │
           ▼
┌─────────────────────┐
│ Style Template      │ ← 7 preset prompt templates
│ prefix + input +    │
│ suffix              │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Stage 1: Gemini     │ ← Prompt enhancement via OpenRouter
│ (prompt engineer)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Stage 2: Flux.1     │ ← Image generation via fal.ai
│ Schnell (4 steps)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Canvas API          │ ← Client-side meme text overlay
│ (Impact font)       │
└─────────────────────┘
```
Tech stack: Next.js 14 (App Router), TypeScript, Tailwind CSS, Supabase (auth + credits), PayPal (payments), Vercel (hosting).
The Prompt Template System
The core insight: users are lazy. They type “pizza” or “eating pizza” — not a detailed 100-word prompt. So I built 7 style templates that wrap their input into a complete, high-quality prompt.
Each template has a prefix and suffix:
```ts
// lib/styles.ts
{
  id: 'gangster',
  promptPrefix:
    'Hilarious photo of a tough-looking toddler with an exaggerated ' +
    'gangster attitude, wearing oversized designer sunglasses, heavy ' +
    'gold chains, a tiny leather jacket, making a serious tough-guy face while',
  promptSuffix:
    ', fisheye lens shot, 90s hip hop music video aesthetic, dramatic ' +
    'flash photography, high contrast, absurd humor, viral meme quality, 8k.',
}
```
The prompt builder sandwiches user input between them:
```ts
// lib/promptBuilder.ts
export function buildPrompt({ userInput, style }) {
  let processedInput = userInput.trim();

  // Handle lazy single-word input: "pizza" → "doing pizza"
  if (!processedInput.includes(' ')) {
    processedInput = `doing ${processedInput}`;
  }

  return {
    prompt: `${style.promptPrefix} ${processedInput}${style.promptSuffix}`,
    negativePrompt: style.negativePrompt,
  };
}
```
So when a user types “eating pizza” with the Gangster style, the actual prompt becomes:
“Hilarious photo of a tough-looking toddler with an exaggerated gangster attitude, wearing oversized designer sunglasses, heavy gold chains, a tiny leather jacket, making a serious tough-guy face while eating pizza, fisheye lens shot, 90s hip hop music video aesthetic, dramatic flash photography, high contrast, absurd humor, viral meme quality, 8k.”
Seven styles, each with a distinct visual personality:
| Style | Vibe |
|---|---|
| Gangster | 90s hip hop, gold chains, fisheye lens |
| Cursed | Security camera, uncanny valley, VHS grain |
| Giant Knitted | Crochet wool kaiju in miniature city |
| Dramatic Crying | Oscar-worthy performance, tears flying |
| Chubby | Michelin rolls, soft pastel, adorable |
| Cyberpunk | Neon, holographic, sci-fi |
| Pixar | 3D render, Disney character style |
Stage 1: Gemini Prompt Enhancement
The style template gets the job done, but a second pass with an LLM makes it noticeably better. I pipe the assembled prompt through Gemini 2.0 Flash (free tier via OpenRouter) for enhancement:
```ts
// lib/image-generation/gemini-enhancer.ts
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-2.0-flash-exp:free',
    messages: [
      {
        role: 'system',
        content: `You are an expert prompt engineer for Flux image generation
models. Enhance this prompt. Keep it under 120 words. Return ONLY the
enhanced prompt.
ALWAYS include: "five fingers on each hand, anatomically correct hands"
NEVER use negative phrasing — Flux doesn't support it.`,
      },
      { role: 'user', content: basePrompt },
    ],
  }),
});
```
Key decisions here:
- Free model — Gemini 2.0 Flash Exp on OpenRouter costs nothing. For a prompt enhancement pass, you don’t need GPT-4.
- 120-word cap — Flux Schnell handles shorter prompts better. Longer prompts tend to confuse it.
- No negative phrasing — Flux doesn’t support negative prompts the way Stable Diffusion does. Instead of “no extra fingers,” you say “five fingers on each hand.” This took me a while to figure out.
- Graceful fallback — If Gemini fails, I just use the original template prompt. The system never blocks on enhancement failure.
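The fallback logic amounts to a small wrapper. A minimal sketch of the pattern — `enhancer` here is a hypothetical stand-in for the OpenRouter call above, not the real client:

```typescript
// Minimal sketch of the graceful-fallback pattern. `enhancer` stands in
// for the OpenRouter call above (a hypothetical wrapper, not the real one).
async function enhanceWithFallback(
  basePrompt: string,
  enhancer: (prompt: string) => Promise<string>
): Promise<string> {
  try {
    const enhanced = await enhancer(basePrompt);
    // Guard against empty or whitespace-only responses as well as errors
    return enhanced.trim().length > 0 ? enhanced : basePrompt;
  } catch {
    return basePrompt; // never block generation on enhancement failure
  }
}
```

The user always gets an image: worst case it's generated from the template prompt instead of the enhanced one.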
Stage 2: Flux.1 Schnell on fal.ai
For image generation, I went with Flux.1 Schnell via fal.ai’s API:
```ts
// lib/image-generation/fal-flux-client.ts
const response = await fetch('https://fal.run/fal-ai/flux/schnell', {
  method: 'POST',
  headers: {
    'Authorization': `Key ${process.env.FAL_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: finalPrompt,
    image_size: 'landscape_4_3',
    num_inference_steps: 4,
    num_images: 1,
    enable_safety_checker: true,
  }),
});
```
Why Flux Schnell?
- Speed — 4 inference steps, generates in 2-3 seconds
- Quality — Surprisingly good for meme-style images
- Cost — Around $0.003 per image on fal.ai
- No negative prompt needed — Simpler API surface
I originally planned to use Replicate, but fal.ai’s cold start times were better for my use case. When every second counts in a consumer app, that matters.
Client-Side Meme Text with Canvas API
After generation, users can add classic meme text (white Impact font, black stroke). This runs entirely client-side:
```ts
// lib/memeCanvas.ts — simplified
export function renderMemeToCanvas(
  canvas: HTMLCanvasElement,
  image: HTMLImageElement,
  options: { topText: string; bottomText: string }
) {
  const ctx = canvas.getContext('2d');
  if (!ctx) return;

  ctx.drawImage(image, 0, 0, canvas.width, canvas.height);

  const fontSize = Math.floor(canvas.width / 14);
  ctx.font = `bold ${fontSize}px Impact, sans-serif`;
  ctx.textAlign = 'center';
  ctx.fillStyle = 'white';
  ctx.strokeStyle = 'black';
  ctx.lineWidth = fontSize / 6;

  // Draw with stroke first, then fill (classic meme style)
  if (options.topText) {
    ctx.strokeText(options.topText.toUpperCase(), canvas.width / 2, fontSize + 10);
    ctx.fillText(options.topText.toUpperCase(), canvas.width / 2, fontSize + 10);
  }
  if (options.bottomText) {
    const y = canvas.height - fontSize / 2;
    ctx.strokeText(options.bottomText.toUpperCase(), canvas.width / 2, y);
    ctx.fillText(options.bottomText.toUpperCase(), canvas.width / 2, y);
  }
}
```
One gotcha: cross-origin images. The AI-generated images come from fal.ai’s CDN, so you must set crossOrigin = 'anonymous' when loading them into the canvas — otherwise toDataURL() throws a security error on download.
```ts
const img = new Image();
img.crossOrigin = 'anonymous'; // This line saves you 2 hours of debugging
img.src = imageUrl;
```
Content Safety Layer
With any AI image generator, content filtering is non-negotiable. I built a three-layer system:
Layer 1: Keyword blocklist — Catches obvious banned terms before they hit the API.
Layer 2: Context-aware replacement — For the Gangster style, if someone types “gun,” it auto-replaces to “water gun” instead of blocking outright. Fewer frustrated users, same safety outcome.
Layer 3: fal.ai’s built-in safety checker — enable_safety_checker: true as the final guardrail.
```ts
export function autoReplaceWords(input: string, styleId: string): string {
  if (styleId === 'gangster') {
    return input
      .replace(/\bgun\b/gi, 'water gun')
      .replace(/\bweapon\b/gi, 'toy weapon');
  }
  return input;
}
```
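Layer 1 can be almost as small. A sketch of a word-boundary blocklist check — the `BLOCKED` list here is illustrative, not the real one:

```typescript
// Sketch of a layer-1 blocklist check. Word boundaries avoid the classic
// substring false positive (a bare substring match would flag innocent
// words that merely contain a blocked term). BLOCKED is illustrative.
const BLOCKED = ['badword', 'worseword'];

function isBlocked(input: string): boolean {
  return BLOCKED.some((word) => new RegExp(`\\b${word}\\b`, 'i').test(input));
}
```

Word boundaries reduce false positives but don't eliminate them, which is why the layers behind it exist.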
Credit System with Atomic Operations
Users get 5 free credits on signup, can buy packages ($2.99/50 credits), or subscribe to Pro ($9.99/month for 300 credits). The credit deduction uses Supabase RPC for atomicity:
```ts
// Credits are deducted before generation starts;
// if generation fails, they are refunded.
const creditResult = await useCredit(user.id);
if (!creditResult.success) {
  return NextResponse.json(
    { error: creditResult.error },
    { status: 402 } // Payment Required
  );
}

// ... generate image ...

if (!imageResult.success) {
  // Refund on failure
  await refundCredit(user.id, creditResult.credit_type);
}
```
Priority order: subscription credits > free credits > purchased credits. This way purchased credits (which never expire) are the last to be consumed.
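The priority rule reduces to a short selection function. A sketch — type and field names are illustrative, not the real Supabase schema:

```typescript
// Sketch of the credit-priority rule. Names are illustrative,
// not the real schema.
type CreditBalance = { subscription: number; free: number; purchased: number };
type CreditType = 'subscription' | 'free' | 'purchased';

// Spend expiring credits first: subscription, then free; purchased
// credits (which never expire) are consumed last.
function pickCreditType(b: CreditBalance): CreditType | null {
  if (b.subscription > 0) return 'subscription';
  if (b.free > 0) return 'free';
  if (b.purchased > 0) return 'purchased';
  return null; // out of credits → the API responds with 402
}
```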
SEO for Programmatic Pages
Each of the 7 styles gets its own page at /styles/[slug] with unique metadata, JSON-LD structured data, and content:
```tsx
// app/styles/[slug]/page.tsx
export async function generateMetadata({ params }) {
  const style = getStyleBySlug(params.slug);
  return {
    title: style.seoTitle,
    description: style.seoDescription,
    openGraph: {
      images: [{ url: `/api/og?style=${style.id}` }],
    },
  };
}
```
The OG images are generated dynamically at the edge using Next.js ImageResponse:
```tsx
// app/api/og/route.tsx
export const runtime = 'edge';

export async function GET(request: NextRequest) {
  const style = request.nextUrl.searchParams.get('style');
  return new ImageResponse(
    <div style={{ /* gradient background, emoji, title */ }}>
      {styleEmojis[style]} {styleNames[style]}
    </div>,
    { width: 1200, height: 630 }
  );
}
```
This gives every style page a unique social share image without storing any static files.
What I’d Do Differently
1. Start with fal.ai from day one. I initially set up Replicate, then switched to fal.ai for faster cold starts. Wasted a day on the migration.
2. Use an LLM for content filtering instead of keyword lists. Keyword blocklists have false positives (“assassin” contains “ass”). An LLM-based classifier would be smarter, but costs more per request.
3. Add image caching earlier. Same prompt + same style (with a fixed seed) = same image. Could save a lot on API costs with a simple hash-based cache.
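The cache key is the cheap part. A sketch of the idea, assuming a Node runtime — the in-memory `Map` stands in for whatever store you'd actually use (Redis, KV), and the names are illustrative:

```typescript
import { createHash } from 'node:crypto';

// Sketch of the hash-based cache idea: identical prompt + style + size
// always produce the same key, so a repeat request can reuse a stored
// image URL. The Map is a placeholder for a real store (Redis/KV).
const imageCache = new Map<string, string>(); // key → generated image URL

function cacheKey(prompt: string, styleId: string, imageSize: string): string {
  return createHash('sha256')
    .update(`${styleId}|${imageSize}|${prompt.trim().toLowerCase()}`)
    .digest('hex');
}
```

Normalizing the prompt (trim + lowercase) before hashing catches near-duplicate requests like "Eating pizza" vs "eating pizza".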
Numbers After 2 Weeks
- Built and deployed in ~12 days
- 7 style templates
- ~3 second average generation time
- Cost per image: ~$0.003 (fal.ai) + ~$0 (Gemini free tier)
- Deployed on Vercel free tier
Try It
babymeme.art — 5 free credits, no credit card required. Pick a style, type something funny, get a meme.
If you’re building something with AI image generation, the two-stage pipeline (LLM enhancement + fast diffusion model) is worth trying. The quality difference between a raw user prompt and an LLM-enhanced one is significant, and using a free model for enhancement keeps the cost near zero.
Have questions about the architecture or prompt engineering? Drop a comment below.