Over the past year, three terms have dominated conversations around AI coding:
Spec / Plan / Design Document
There’s a growing belief that if a model can first generate a comprehensive spec, and an agent can then execute against it, complex tasks can be automated end-to-end.
It sounds reasonable.
In practice, it rarely works that way.
The problem isn’t that specs are unimportant.
The problem is that we’re generating them in the wrong way.
That’s precisely the gap CodeFlicker is designed to address.
1. We’ve Misunderstood What a Spec Actually Is
In most AI IDEs, a “spec” typically means:
- A document generated before coding
- A description of implementation steps or architecture
- A one-shot artifact that drives downstream execution
This mindset is inherited from traditional software engineering.
But in AI-native workflows, this definition breaks down.
The real value of a spec is not the “plan” itself.
It’s whether the spec encodes sufficient context.
A spec is not a plan.
A spec is an explicit representation of context.
If the context is wrong or incomplete, the spec merely amplifies the error.
2. Why One-Shot Spec Generation Fails by Design
Most AI coding workflows implicitly assume:
Given partial context, ask the model to generate a complete, usable spec in one go.
This assumption is structurally flawed.
2.1 Models Are Biased Toward Early Convergence
Large language models are optimized to produce coherent, well-structured answers quickly.
They are not optimized to:
- Exhaustively explore edge cases
- Construct counterfactual reasoning paths
- Challenge their own assumptions
- Perform adversarial validation
Yet a production-grade spec requires exactly that:
- Coverage of boundary conditions
- Explicit trade-offs
- Rejected alternatives and their rationale
- Hidden constraints surfaced explicitly
This runs counter to the model’s default generation dynamics.
2.2 When Business Context Is Missing, the Model Hallucinates Plausibility
In a coding agent scenario, the model can access:
- Repository structure
- APIs and type information
- Limited commit history
Call this technical context.
But what actually determines task success is often:
- Why is this feature being built now?
- What defines success?
- Which approaches were previously rejected?
- What constraints are non-negotiable but not expressed in code?
That is business context.
And business context is rarely encoded in the repository.
When that layer is missing, a one-shot spec becomes a plausible fiction.
3. Why Small Tasks Seem Fine — and Large Tasks Collapse
There’s a critical complexity boundary here.
For small tasks (0.5–1 engineering day):
- Context is narrow
- Constraints are visible
- Failure cost is low
One-shot planning can occasionally succeed.
For multi-week initiatives:
- Context is deeply entangled
- Implicit constraints dominate
- Directional errors are expensive
One-shot specs almost always degrade.
This isn’t a model capability issue.
It’s a complexity scaling issue.
4. Specs Shouldn’t Be Generated. They Should Be Discussed.
This is where CodeFlicker takes a fundamentally different approach.
Instead of encouraging users to “generate a spec,” CodeFlicker introduces:
Discuss Mode
This is not a casual chat mode.
It’s an engineered workflow constraint.
In Discuss Mode:
- The model’s convergence is intentionally slowed
- It is prevented from outputting a full solution prematurely
- Assumptions must be surfaced
- Edge cases are interrogated
- Rejected paths are documented
The spec is not produced in one shot.
It emerges through structured dialogue as a progressively refined context model.
This shifts the problem from document generation to epistemic alignment.
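To make the constraint concrete, here is a minimal, hypothetical sketch of the idea. The names and structure are illustrative assumptions, not CodeFlicker's actual implementation: a discussion session that refuses to converge on a spec while assumptions are unstated or questions remain open.

```python
# Conceptual sketch only (not CodeFlicker's implementation): Discuss Mode
# treated as a gate that blocks convergence while questions remain open.
from dataclasses import dataclass, field

@dataclass
class DiscussSession:
    assumptions: list[str] = field(default_factory=list)     # must be surfaced explicitly
    open_questions: list[str] = field(default_factory=list)  # unresolved edge cases
    rejected_paths: list[str] = field(default_factory=list)  # alternatives plus rationale

    def may_emit_spec(self) -> bool:
        # Premature convergence is blocked: no spec while questions are still
        # open or no assumptions have been surfaced yet.
        return bool(self.assumptions) and not self.open_questions
```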
5. Where the Efficiency Actually Comes From
We compared three development paradigms on initiatives of roughly 10 person-days (PD) each.
1. Traditional Development (~10 PD)
- 1–2 days writing a design doc
- Remaining time spent coding and debugging
2. Typical AI-Assisted Coding (~8 PD)
- Human still writes the design
- Agent accelerates 0.5-day sub-tasks
- Saves ~1–2 PD
AI acts as a productivity multiplier, not a structural transformer.
3. Discuss + Plan-First Workflow (~2.5 PD)
- First 2 days in Discuss Mode
- Systematically surface business constraints and assumptions
- Produce a highly detailed, execution-grade spec
- Freeze context before implementation
- Main implementation completed in 4–6 hours
The efficiency gain does not come from typing code faster.
It comes from eliminating directional error early.
Complexity is absorbed upfront rather than leaking into execution.
6. The Dual-Mode Workflow in CodeFlicker
In CodeFlicker, complex tasks typically follow:
Discuss → Plan → Execute
Step 1: Discuss
Output: outline.md
Captures:
- Core decisions and trade-offs
- Explicit constraints and “no-go zones”
- Rejected approaches and rationale
The goal is clarity, not solution generation.
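As a rough illustration of the context this step captures, the outline could be thought of as structured data rather than free-form prose. The field names below are assumptions for the sake of the example, not CodeFlicker's actual outline.md format.

```python
# Illustrative only: one way the content of outline.md could be modeled.
from dataclasses import dataclass, field

@dataclass
class RejectedApproach:
    approach: str
    rationale: str

@dataclass
class Outline:
    decisions: list[str] = field(default_factory=list)     # core decisions and trade-offs
    no_go_zones: list[str] = field(default_factory=list)   # explicit constraints
    rejected: list[RejectedApproach] = field(default_factory=list)
```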
Step 2: Plan
Outputs: tech-design.md and plan.md
The technical design is decomposed into verifiable tasks with explicit acceptance criteria.
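A minimal sketch of what such a decomposition might look like as data. The schema and the example task are assumptions for illustration, not the tool's actual plan.md format.

```python
# Illustrative sketch of a plan entry with explicit acceptance criteria.
from dataclasses import dataclass, field

@dataclass
class PlanTask:
    task_id: str
    description: str
    acceptance_criteria: list[str]                        # explicit, verifiable checks
    depends_on: list[str] = field(default_factory=list)   # ordering between tasks

plan = [
    PlanTask(
        task_id="T1",
        description="Add rate limiting to the public API",  # hypothetical example task
        acceptance_criteria=[
            "429 returned above the configured threshold",
            "existing clients under the limit are unaffected",
        ],
    ),
]
```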
Step 3: Execute
- Code is generated against a frozen plan
- The agent produces roughly 70–90% of the implementation
- Code is reviewed against the plan's constraints
At this stage, the agent no longer guesses intent.
It executes against a contract.
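As a hedged illustration of "executing against a contract", a review step could mechanically check generated changes against the frozen plan. The function and its parameters below are hypothetical, not part of CodeFlicker's API.

```python
# Illustrative sketch: reviewing generated changes against frozen plan constraints.
def review_against_contract(changed_files: list[str],
                            no_go_zones: list[str],
                            acceptance_criteria: dict[str, bool]) -> list[str]:
    """Return human-readable violations instead of letting the agent re-guess intent."""
    violations = [
        f"touched no-go zone: {path}"
        for path in changed_files
        if any(path.startswith(zone) for zone in no_go_zones)
    ]
    violations += [
        f"unmet acceptance criterion: {criterion}"
        for criterion, met in acceptance_criteria.items()
        if not met
    ]
    return violations
```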
7. The Real Bottleneck in AI Coding
The limiting factor in AI coding isn’t code generation quality.
It’s contextual completeness.
Without context, a spec is formatting.
With context, a spec becomes an execution contract.
Discuss Mode in CodeFlicker exists to:
- Delay premature convergence
- Surface hidden assumptions
- Freeze boundaries before execution
- Construct a high-fidelity context model
Only then can an agent operate reliably in production-scale tasks.
Conclusion
Specs are not a silver bullet.
A one-shot spec simply helps an AI move faster in the wrong direction.
But a spec that emerges through structured discussion can serve as a stable execution foundation.
If you’re building multi-week features with AI tooling, the missing piece might not be a better model.
It might be an environment that supports discussion-driven spec formation.
That’s the problem CodeFlicker is built to solve.