Traditional AI agents operate with a fundamental limitation: they’re single agents trying to handle everything at once. While they can be impressively capable for straightforward tasks, they struggle when faced with complex, multi-step challenges that require sustained reasoning, strategic planning, and coordinated execution.
The problems are threefold:
- Limited Context Memory: Even with extended context windows, single-agent systems quickly exhaust their memory when dealing with multi-step tasks. Every file read, every search result, every intermediate finding consumes precious context space, leaving little room for actual reasoning.
- Lack of Specialization: A single agent must simultaneously handle strategic planning, investigation, implementation, and verification. This jack-of-all-trades approach often leads to shallow execution. The same agent analyzing system architecture is also writing code and running tests, with no clear separation of concerns or focused expertise.
- No Compound Intelligence: Traditional agents can’t meaningfully build on their own discoveries. Each action happens in isolation, with no accumulation of knowledge. There’s no concept of “this investigation discovered X, now another agent can use that knowledge to do Y more effectively.”
Enter Deep Agent architecture – a fundamentally different approach that addresses these limitations through multi-agent collaboration, forced specialization, and a shared Context Store. This is the same advanced pattern powering production tools like Claude Code, Manus, and Deep Research.
This article explores an experimental implementation built entirely from scratch (no frameworks!) to reveal what’s actually under the hood of these systems. By building without abstractions, we can see exactly how each component works – from context management to agent coordination to action parsing.
The key innovation? A shared Context Store that enables “compound intelligence” where the system gets smarter as it works:
- Knowledge Accumulation: Every discovery becomes a permanent building block
- No Redundant Work: Agents never rediscover the same information
- Focused Execution: Each agent receives only the contexts it needs
You’ll learn how multi-agent collaboration creates compound intelligence, how architectural constraints can improve system design, and how the Context Store transforms the way AI agents work together – all by examining a real implementation built from the ground up.
System Overview: Learning Through Agentic Code Assistant
To understand Deep Agent architecture in practice, we’ll dive deep into Agentic Code Assistant – an experimental, open-source implementation built entirely from scratch without using any agent frameworks. This from-the-ground-up approach reveals exactly how Deep Agent systems work internally, making visible the patterns that frameworks typically hide.
By examining this real system, we’ll explore how Deep Agent architecture patterns work in practice: multi-agent coordination, context management, forced delegation, and the intricate details that make these systems effective.
Throughout this article, we’ll use Agentic Code Assistant as our reference implementation to learn the principles, patterns, and practical considerations that power production Deep Agent systems.
What Makes Deep Agents Different
Unlike traditional agents that operate in short loops or perform simple tool calls, Deep Agents:
- Plan their actions strategically before execution
- Manage evolving context through persistent knowledge stores
- Delegate subtasks to specialised sub-agents with focused expertise
- Maintain state across long, complex interactions
- Build compound intelligence through knowledge sharing
Core Capabilities
The system transforms high-level coding requests into executed implementations through four key capabilities:
- Strategic Task Decomposition: When you submit a complex task like “Add authentication to the API,” the system doesn’t immediately start coding. Instead, it analyzes the request, breaks it into logical subtasks (investigation, planning, implementation, verification), and executes them systematically.
- Sustained Reasoning Across Long Interactions: Multi-step tasks that would overwhelm a single agent’s context window become manageable through structured delegation and knowledge passing. The system maintains coherent progress even across dozens of turns.
- Knowledge Sharing Through the Context Store: Every discovery – from “the user model is in /models/user.py” to “authentication uses JWT tokens” – becomes a reusable knowledge artifact. Subsequent agents in the workflow access this accumulated knowledge without redundant investigation.
- Built-in Verification: Implementation isn’t the end. The system systematically verifies changes through test execution and validation, ensuring quality before considering a task complete.
The Typical Workflow
Here’s how the system handles a real request: an Explorer investigates the codebase first, a Coder then implements with the relevant contexts injected, and a verification Explorer confirms the result.
This workflow demonstrates the system’s core innovation: knowledge flows through the agents via distilled contexts, with each step building meaningfully on previous discoveries. The Coder doesn’t waste time searching for the user model – the Explorer already found it and reported it as a context. The verification Explorer receives implementation contexts from the Coder, enabling focused testing. Importantly, the Orchestrator never sees the Explorer’s full investigation history or the Coder’s entire working context – only the relevant, distilled knowledge artifacts. This keeps the orchestration layer lean and focused on strategy.
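To make the flow concrete, here is a deliberately simplified Python sketch of this workflow. The names (context_store, run_explorer, run_coder) are illustrative assumptions, not the project’s actual API:

# A simplified sketch of the workflow above (illustrative only; these names
# are hypothetical, not the project's API).
context_store = {}  # shared knowledge base, keyed by context id

def run_explorer(instruction, context_ids):
    # Reads files and searches freely, then returns only distilled contexts.
    return {"user_model_location": "User model lives in /models/user.py"}

def run_coder(instruction, context_ids):
    # Receives pre-loaded contexts, implements, and reports implementation contexts.
    return {"auth_implementation": "Added JWT middleware in /middleware/auth.py"}

# The Orchestrator never touches code; it only routes contexts between tasks.
context_store.update(run_explorer("Find the user model and auth patterns", []))
context_store.update(run_coder("Implement auth middleware", ["user_model_location"]))
context_store.update(run_explorer("Verify the auth implementation", ["auth_implementation"]))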
Architecture Deep-Dive: How Deep Agent Systems Work
Now let’s examine the technical architecture that enables this compound intelligence. The system is built on three specialised agents, a persistent knowledge store, and carefully designed architectural constraints.
Multi-Agent Design: Specialisation Through Separation
The system employs three distinct agent types, each with specific responsibilities and access levels:
The Orchestrator (Lead Architect)
The Orchestrator is the strategic brain of the system – but with a crucial limitation: it has no direct access to code. This forced delegation pattern (which we’ll explore in detail shortly) shapes its entire behavior.
Responsibilities:
- Analyze incoming tasks and create strategic execution plans
- Decompose complex requests into focused subtasks
- Manage the Context Store (decide what knowledge to create and when to use it)
- Track all tasks via the Task Store
- Coordinate subagent execution
- Synthesize results and verify completion
Key Constraint: Cannot read, write, or modify code directly. Must delegate all code-related work to subagents.
Why This Matters: By removing direct code access, the architecture forces proper task decomposition. The Orchestrator can’t take shortcuts – it must think strategically about what information is needed, who should gather it, and how to coordinate execution. This constraint encourages better system design.
The Explorer (Read-Only Investigator)
Explorers are investigation and verification specialists with read-only access to the codebase.
Responsibilities:
- Investigate codebase structure and patterns
- Search for specific implementations or conventions
- Verify implementations through test execution
- Gather information and report contexts containing discoveries
- Validate that changes work as expected
Key Constraint: Read-only access – can execute code and tests, but cannot modify files.
Why This Matters: Separation of investigation from implementation prevents the common anti-pattern of “quick fixes while exploring.” Explorers focus purely on understanding and verification. They might read dozens of files during investigation, but report back only the essential contexts – the distilled knowledge. This produces thorough, accurate knowledge artifacts without polluting the Orchestrator’s context with raw investigation data.
The Coder (Implementation Specialist)
Coders are implementation specialists with full read-write access to the codebase.
Responsibilities:
- Implement features and bug fixes
- Make code modifications (create, edit, delete files)
- Run tests to validate their own implementations
- Report changes with contexts describing what was implemented and how
- Self-validate before reporting back
Key Constraint: Operates with specific implementation instructions from the Orchestrator, with all necessary context pre-loaded.
Why This Matters: Coders receive everything they need upfront – architectural knowledge from Explorers, clear instructions from the Orchestrator, and relevant contexts from the Context Store. They can focus entirely on implementation quality. When done, they report back with contexts containing implementation details, not their entire working history. This keeps the knowledge base focused on outcomes, not process.
The Context Store Innovation: The Key to Compound Intelligence
The Context Store is the system’s most important innovation. It’s a shared knowledge base that enables true compound intelligence through knowledge accumulation and reuse during task execution.
The Problem It Solves
Traditional AI agents suffer from “knowledge amnesia”:
- Repetitive Discovery: Each agent rediscovers the same information, wasting context and time
- Wasted Resources: Precious tokens spent rediscovering known information
- Context Explosion: Every agent carries full context history, quickly exhausting available context windows
- No Learning: Knowledge gained by one agent is lost to the next
The Context Store solves this by making every discovery shareable and reusable. Sub-agents maintain their full context only while running, then report back only the relevant contexts they discovered. This keeps the Orchestrator’s context lean while enabling compound intelligence.
How It Works
The system treats knowledge as persistent artifacts with a clear lifecycle:
- Context Creation: The Orchestrator identifies what knowledge is needed and creates named context “slots”:
context_refs:
  - user_model_schema
  - database_configuration
  - authentication_patterns
- Knowledge Discovery: Subagents (typically Explorers) investigate and populate these contexts. While running, sub-agents maintain their full context including all file reads, searches, and intermediate findings. However, when they complete, they report back only the distilled, relevant contexts:
{
  "id": "authentication_patterns",
  "content": "The codebase uses JWT tokens for authentication. Implementation in /middleware/auth.py. User model includes email and hashed_password fields. Token generation uses jose library with HS256 algorithm.",
  "metadata": {
    "created_by": "explorer",
    "task_id": "investigate_auth"
  }
}
- Centralized Storage: Contexts are stored centrally, indexed by ID, and available to all agents during task execution. This means:
- Sub-agents work with full context while running (investigating, reading files, etc.)
- Only relevant, distilled contexts are reported back to the Orchestrator
- The Orchestrator’s context stays lean and focused on strategy
- Future sub-agents receive only the specific contexts they need
- Context Injection: When launching new subagents, the Orchestrator injects relevant contexts:
task_create:
  agent_name: coder
  title: Implement auth middleware
  context_refs:
    - authentication_patterns  # Reused from previous task
    - user_model_schema        # Reused from previous task
    - api_route_structure      # Newly discovered
- Knowledge Building: Each task adds to the knowledge base. The first task discovers auth patterns. The second task implements auth and adds implementation details. The third task adds test patterns. Each builds on the previous.
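To make this lifecycle concrete, here is a minimal sketch of what a context store along these lines could look like. The class and method names (Context, ContextStore, add, get_many) are illustrative assumptions, not the project’s actual API:

from dataclasses import dataclass, field

@dataclass
class Context:
    id: str
    content: str
    metadata: dict = field(default_factory=dict)

class ContextStore:
    """Central, in-memory store of distilled knowledge artifacts (illustrative)."""

    def __init__(self):
        self._contexts = {}  # context id -> Context

    def add(self, context: Context) -> None:
        # A discovery is persisted once; no agent has to rediscover it later.
        self._contexts[context.id] = context

    def get_many(self, ids):
        # The Orchestrator injects only the contexts a sub-agent actually needs.
        return [self._contexts[i] for i in ids if i in self._contexts]

# An Explorer's report populates the store; a later Coder task consumes it.
store = ContextStore()
store.add(Context(
    id="authentication_patterns",
    content="JWT auth implemented in /middleware/auth.py (HS256, jose library).",
    metadata={"created_by": "explorer", "task_id": "investigate_auth"},
))
for ctx in store.get_many(["authentication_patterns", "user_model_schema"]):
    print(ctx.id, "->", ctx.content)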
Context Types
The system supports multiple context types, each capturing different knowledge:
- Code Context: File contents, function signatures, class hierarchies
- Architecture Context: System design, dependencies, patterns in use
- Test Context: Test results, coverage information, verification outcomes
- Bug Context: Error analysis, root causes, reproduction steps
- Implementation Context: Changes made, approach used, trade-offs considered
Real-World Impact
Consider this scenario within a complex task execution:
Main Task: “Add comprehensive error tracking with logging to the API”
Key Observations:
- Explorer’s full context (all 20 files read, searches performed) is discarded – only the distilled contexts remain
- New Explorer receives logging_config context (no need to rediscover)
- Coder receives logging_config + error_patterns + log_patterns contexts – zero context wasted on rediscovering infrastructure
- Final Explorer receives all relevant contexts for focused verification
This is compound intelligence in action: knowledge accumulates during execution, each agent builds on previous discoveries, and the Orchestrator’s context remains focused on orchestration rather than drowning in raw investigation data.
The Forced Delegation Pattern: Architectural Constraint as Design Principle
One of the most interesting aspects of this architecture is what the Orchestrator cannot do: access code directly. This isn’t a limitation – it’s an intentional design constraint that dramatically improves system behavior. The Orchestrator agent has no access to file reading, writing, or editing tools. If it needs to know what’s in a file, it must delegate to an Explorer. If it wants code modified, it must delegate to a Coder. There are no shortcuts.
Why This Design Choice
- Forces Proper Task Decomposition: Without shortcuts available, the Orchestrator must think systematically:
- What information do we need? (Launch Explorer)
- What should be built? (Plan based on Explorer findings)
- How should it be implemented? (Provide clear instructions to Coder)
- Is it correct? (Launch Explorer to verify)
Quick fixes and ad-hoc changes become impossible, encouraging methodical execution.
- Encourages Strategic Thinking: When you can’t implement directly, you must think strategically about delegation:
- What contexts does the Coder need?
- Has an Explorer already discovered this information?
- Should verification happen before or after the next step?
This leads to better-planned, more coherent implementations.
- Creates Clear Separation of Concerns: Strategy and implementation become truly separate:
- Orchestrator: “What needs to happen and why”
- Coder: “How to implement it”
- Explorer: “What exists and whether it works”
Each agent can focus on its expertise without role confusion.
- Prevents Anti-Patterns: Common failure modes become impossible:
- Can’t mix investigation with hasty implementation
- Can’t skip verification because “it should work”
- Can’t bypass proper knowledge sharing to save time
Impact on System Behavior
In practice, this constraint produces notably more systematic behavior. Forced delegation is more verbose than letting a single agent improvise, but it yields more thorough, better-verified implementations.
Stateless Turn-Based Execution
The system follows a clear, reproducible execution pattern:
Each Turn:
- LLM receives current state + history + feedback from previous turn
- LLM outputs actions (XML-wrapped YAML)
- System executes all actions in the turn
- System provides feedback (success/error/results)
- Cycle repeats until task completion or turn limit
This stateless design means agents don’t carry hidden state – everything is explicit in the turn history.
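As a rough sketch of this loop (the helpers call_llm, parse_actions, and execute_action are placeholders passed in as dependencies, not the project’s actual functions):

def run_agent(call_llm, parse_actions, execute_action, instruction, max_turns=20):
    """Minimal stateless turn loop: everything an agent knows is in `history`."""
    history = []
    feedback = ""
    for _ in range(max_turns):
        # 1. LLM receives current state + history + feedback from the previous turn
        response = call_llm(instruction, history, feedback)
        # 2. LLM outputs actions (XML-wrapped YAML); parse and validate them
        actions = parse_actions(response)
        # 3.-4. Execute every action and build structured feedback for the next turn
        feedback = "\n".join(execute_action(a) for a in actions)
        history.append((response, feedback))
        # 5. Stop when the agent emits a terminal action (report / finish)
        if any(a.get("name") in ("report", "finish") for a in actions):
            break
    return history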
Technical Implementation: How the System Actually Works
Now that we understand the architecture conceptually, let’s examine how it’s actually implemented.
The Action System: How Agents Communicate
Agents communicate intentions through structured actions using a hybrid XML/YAML format that’s both LLM-friendly and human-readable.
Action Format
Actions are XML tags containing YAML-structured parameters. For example, the YAML body of a task_create action looks like this:
agent_name: explorer
title: Find authentication patterns
description: |
  Investigate the codebase to discover:
  - Location of user model
  - Authentication library in use
  - Existing auth middleware
  - Database configuration for users
context_refs:
  - database_config
  - user_model_schema
auto_launch: true
Why This Format?
- XML tags are easy for LLMs to generate reliably
- YAML content is human-readable and supports complex structures
- Pydantic validation ensures type safety and catches errors early
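For illustration, the validation stage might use a Pydantic model along these lines. The field names mirror the task_create example above; the class name and exact schema are assumptions, not the project’s code:

from pydantic import BaseModel, Field

class TaskCreateAction(BaseModel):
    # Mirrors the YAML body of the task_create example above.
    agent_name: str                              # "explorer" or "coder"
    title: str
    description: str = ""
    context_refs: list[str] = Field(default_factory=list)
    auto_launch: bool = False

# Parsed YAML is validated before execution, so malformed actions fail fast.
action = TaskCreateAction(
    agent_name="explorer",
    title="Find authentication patterns",
    description="Investigate the codebase to discover auth-related code.",
    context_refs=["database_config", "user_model_schema"],
    auto_launch=True,
)
print(action.agent_name, action.context_refs)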
Actions by Agent Type
Each agent type has access to specific actions based on its role:
Orchestrator Actions:
- task_create: Create new subagent tasks
- launch_subagent: Launch created tasks
- add_context: Add knowledge to Context Store
- finish: Mark task complete and return to user
Explorer Actions:
- file (read mode): Read file contents
- file (metadata mode): Get file information
- search (grep/glob): Search for code patterns
- bash: Execute commands (read-only)
- todo: Track investigation progress
- scratchpad: Take notes
- report: Report findings with contexts
Coder Actions:
- file (read/write/edit/multi_edit): Full file manipulation
- bash: Execute commands (including installations)
- search (grep/glob): Find code to modify
- todo: Track implementation progress
- scratchpad: Document approach
- report: Report implementation with contexts
Note the access level differences: Explorers can only read files, while Coders can modify them. The Orchestrator can’t access files at all.
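One straightforward way to enforce these access levels is a per-agent action registry. The sketch below is illustrative; the action names and mapping are assumptions, not the project’s exact configuration:

# Hypothetical agent-type -> permitted-actions mapping.
ALLOWED_ACTIONS = {
    "orchestrator": {"task_create", "launch_subagent", "add_context", "finish"},
    "explorer": {"file_read", "file_metadata", "search", "bash", "todo", "scratchpad", "report"},
    "coder": {"file_read", "file_write", "file_edit", "bash", "search", "todo", "scratchpad", "report"},
}

def check_action(agent_type: str, action_name: str) -> None:
    # Forced delegation in one check: the Orchestrator simply has no file actions.
    if action_name not in ALLOWED_ACTIONS[agent_type]:
        raise PermissionError(f"{agent_type} is not allowed to run '{action_name}'")

check_action("explorer", "file_read")          # fine: read-only investigation
try:
    check_action("orchestrator", "file_read")  # the Orchestrator must delegate instead
except PermissionError as err:
    print(err)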
Action Parsing Pipeline
The system processes actions through a multi-stage pipeline:
Pipeline Stages:
- XML Extraction: Parse LLM response to find XML tags
- YAML Parsing: Extract and parse YAML content from each tag
- Pydantic Validation: Validate structure and types
- Handler Routing: Route to appropriate specialized handler
- Execution: Execute the action
- Feedback Generation: Create structured feedback for next turn
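A minimal sketch of the first stages of such a pipeline, using regex-based tag extraction and PyYAML (the tag list and function name here are assumptions):

import re
import yaml  # PyYAML

ACTION_TAGS = ("task_create", "launch_subagent", "add_context", "file",
               "search", "bash", "todo", "scratchpad", "report", "finish")

def extract_actions(llm_response: str):
    """Stages 1-2: find XML action tags and parse their YAML bodies."""
    actions = []
    for tag in ACTION_TAGS:
        for body in re.findall(rf"<{tag}>(.*?)</{tag}>", llm_response, flags=re.DOTALL):
            params = yaml.safe_load(body) or {}
            # Stage 3 (not shown): validate `params` against the action's Pydantic model.
            actions.append({"name": tag, "params": params})
    return actions

print(extract_actions("<file>\npath: /src/auth.py\n</file>"))
# -> [{'name': 'file', 'params': {'path': '/src/auth.py'}}]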
Example Flow:
# The LLM output includes a file action whose YAML body is:
"""
path: /src/auth.py
"""
Communication Protocol: Reports and Context Flow
Sub-agents communicate results through structured reports that contain contexts – the distilled knowledge from their execution:
{
  "contexts": [
    {
      "id": "auth_middleware_location",
      "content": "Authentication middleware is located at /src/middleware/auth.py. It uses JWT tokens with HS256 algorithm..."
    },
    {
      "id": "test_results",
      "content": "Ran authentication test suite: 15 tests, all passing. Verified: token generation, validation, expiration..."
    }
  ],
  "comments": "Investigation complete. All authentication code documented and verified.",
  "meta": {
    "num_turns": 5,
    "total_input_tokens": 12450,
    "total_output_tokens": 3820,
    "model": "openai/gpt-5"
  }
}
This reporting mechanism is crucial for context efficiency:
While the sub-agent ran, it maintained full context including:
- All files it read
- All search results
- All intermediate findings
- All bash command outputs
But when reporting back, it distills this into only the relevant contexts. The Orchestrator never sees the raw investigation data – only the refined knowledge.
Knowledge Flow:
Key Points:
- Sub-agent works with full context during execution
- Sub-agent reports only distilled contexts when complete
- Context Store persists contexts centrally
- Orchestrator remains lean, working only with strategic information
- Future agents receive only the specific contexts they need
This creates compound intelligence without context explosion. Each agent builds on previous discoveries, but the system doesn’t accumulate unnecessary detail.
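Concretely, the hand-off could look like this small sketch, where the Orchestrator folds a sub-agent’s report into the Context Store and discards everything else (the function name and field handling are illustrative assumptions):

def absorb_report(context_store: dict, report: dict) -> str:
    """Keep only the distilled contexts from a sub-agent report; drop raw history."""
    for ctx in report.get("contexts", []):
        context_store[ctx["id"]] = ctx["content"]
    new_ids = ", ".join(ctx["id"] for ctx in report.get("contexts", []))
    # The Orchestrator sees a short summary plus the new context ids, nothing more.
    return f"{report.get('comments', '')} New contexts: {new_ids}"

store = {}
report = {
    "contexts": [{"id": "auth_middleware_location",
                  "content": "Auth middleware lives at /src/middleware/auth.py (JWT, HS256)."}],
    "comments": "Investigation complete.",
}
print(absorb_report(store, report))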
Project Structure: Code Organization
The implementation is organized into clear modules:
src/
├── core/                          # Core system implementation
│   ├── agent/                     # Base agent classes and task definitions
│   │   ├── agent.py               # Abstract Agent base class
│   │   └── agent_task.py          # AgentTask data model
│   ├── orchestrator/              # Orchestrator implementation
│   │   ├── orchestrator_agent.py
│   │   ├── orchestrator_hub.py
│   │   └── factory.py             # Agent factory
│   ├── context/                   # Context Store implementation
│   │   └── context_store.py
│   ├── task/                      # Task management
│   │   ├── task_store.py
│   │   └── task_manager.py
│   ├── action/                    # Action system
│   │   ├── action_handler.py
│   │   ├── action_parser.py
│   │   └── handlers/              # Specialized action handlers
│   ├── file/                      # File operations
│   │   └── file_manager.py
│   ├── search/                    # Search operations
│   │   └── search_manager.py
│   ├── bash/                      # Command execution
│   │   └── docker_executor.py
│   └── llm/                       # LLM client
│       └── litellm_client.py
├── system_msgs/                   # Agent system prompts v0.1
│   └── md_files/
│       ├── orchestrator_sys_msg_v0.1.md
│       ├── explorer_sys_msg_v0.1.md
│       └── coder_sys_msg_v0.1.md
└── misc/                          # Logging and utilities
    └── pretty_log.py
test/                              # Test scripts and examples
├── real_simple_task.py            # Simple file creation example
└── real_advanced_task.py          # Complex API server example
Design Principles:
- Clear separation: Each module has a single, focused responsibility
- Dependency injection: Components receive dependencies explicitly
- Type safety: Pydantic models throughout for validation
- Testability: Components can be tested in isolation
Docker-Based Execution Environment
The system executes code in isolated Docker containers to ensure:
- Safety: No risk to the host system
- Reproducibility: Consistent environment across runs
- Isolation: Each test run is independent
# From real_simple_task.py
executor = DockerExecutor(container_name)
executor.execute("mkdir /workspace")
executor.execute("cd /workspace")
# Agents execute code in this container.
# All file operations happen inside Docker.
This makes the system safe to experiment with – agents can install dependencies, run code, and make changes without affecting your actual development environment.
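For reference, a minimal executor along these lines could be built on `docker exec` via `subprocess`. This is a sketch under that assumption; the project’s DockerExecutor may work differently:

import subprocess

class SimpleDockerExecutor:
    """Run shell commands inside a long-lived container (illustrative sketch)."""

    def __init__(self, container_name: str, workdir: str = "/workspace"):
        self.container_name = container_name
        self.workdir = workdir

    def execute(self, command: str, timeout: int = 60) -> str:
        # Every command runs inside the container, never on the host.
        result = subprocess.run(
            ["docker", "exec", "-w", self.workdir, self.container_name,
             "bash", "-lc", command],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout + result.stderr

# executor = SimpleDockerExecutor("agentic-sandbox")
# print(executor.execute("python --version"))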
Practical Applications: Real-World Examples
Theory is interesting, but let’s see the system in action with real examples from the codebase.
Example 1: Simple File Creation
The simplest example demonstrates the core workflow:
Task: “Create a file called ‘hello.txt’ with content ‘Hello world! Ending with a newline.’”
What Happens:
# From test/real_simple_task.py
instruction = (
"Create a file called 'hello.txt' with content 'Hello world! Ending with a newline.'"
)
agent_task = AgentTask(task_id="test_task", instruction=instruction)
result = orchestrator.run_task(agent_task, max_turns=15)
Execution Flow:
- Orchestrator receives the task
- Orchestrator creates a Coder task with clear instructions
- Coder executes: creates hello.txt with the requested content and self-checks it
- Coder reports back with an implementation context
- Orchestrator verifies the report and marks the task complete
- Result: hello.txt exists with the exact requested content
Even for this simple task, the system follows proper delegation patterns. The Orchestrator doesn’t create the file itself – it delegates to a specialist.
Example 2: Complex API Server Implementation
A more realistic example shows the system handling multi-step complexity:
Task: "Create and run a server on port 3000 with GET /fib endpoint. It should accept a query param /fib?n={number} and return the nth Fibonacci number as a JSON object. Handle error cases for missing/invalid parameters."
# From test/real_advanced_task.py
instruction = """Create and run a server on port 3000 that has a single GET endpoint: /fib.
It should expect a query param /fib?n={some number} and return the nth Fibonacci number
as a JSON object with a key 'result'.
If the query param is not provided, it should return a 400 Bad Request error.
If the query param is not an integer, it should return a 400 Bad Request error.
Automatically choose to install any dependencies if it is required."""
Execution Flow:
- Orchestrator analyzes the task and plans subtasks
- Orchestrator launches an Explorer to investigate the environment
- Explorer investigates available runtimes, frameworks, and package managers
- Explorer reports environment contexts
- Orchestrator synthesizes an implementation plan
- Orchestrator launches a Coder with the relevant contexts
- Coder implements: installs dependencies, writes the /fib endpoint, and starts the server (a possible implementation is sketched after this list)
- Coder reports implementation contexts (e.g. fibonacci_api_implementation, api_error_handling)
- Orchestrator verifies: an Explorer tests valid, missing, and invalid query parameters
- Result: server running on port 3000 with verified success and error behavior
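For illustration only, the Coder’s output for the implementation step might look something like this minimal Flask app. This is hypothetical; the real code depends on what the Explorer found in the environment:

from flask import Flask, jsonify, request

app = Flask(__name__)

def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

@app.route("/fib")
def fib_endpoint():
    raw = request.args.get("n")
    if raw is None:
        return jsonify({"error": "missing query param 'n'"}), 400   # missing param
    try:
        n = int(raw)
    except ValueError:
        return jsonify({"error": "'n' must be an integer"}), 400    # invalid param
    return jsonify({"result": fib(n)})

if __name__ == "__main__":
    app.run(port=3000)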
Key Observations:
- Strategic Investigation: Explorer investigated environment before implementation
- Context Reuse: Coder received environment information without rediscovering it
- Systematic Implementation: Breaking task into subtasks (install, implement, test)
- Built-in Verification: Explorer tested multiple cases to ensure correctness
- Knowledge Accumulation: Contexts created (fibonacci_api_implementation, api_error_handling) available for future tasks
Use Case Patterns
The system excels in several common software development scenarios:
1. Feature Implementation
Pattern: Investigate → Plan → Implement → Verify
Example: “Add user profile endpoints to API”
- Explorer investigates: existing user model, database schema, API patterns
- Orchestrator plans: endpoints needed, validation requirements
- Coder implements: profile GET/PUT endpoints with validation
- Explorer verifies: tests all endpoints, edge cases
Benefit: Implementations follow existing patterns and conventions
2. Bug Investigation and Fix
Pattern: Reproduce → Investigate → Diagnose → Fix → Verify
Example: “Fix authentication failing for expired tokens”
- Explorer reproduces: creates test case showing failure
- Explorer investigates: traces auth flow, identifies token validation issue
- Orchestrator diagnoses: token expiration check missing
- Coder fixes: adds expiration validation
- Explorer verifies: all auth tests passing, expired tokens rejected
Benefit: Systematic approach ensures root cause is addressed, not just symptoms
3. Code Refactoring
Pattern: Understand → Plan → Refactor → Validate
Example: “Refactor database connection handling”
- Explorer maps: all current database connection code
- Orchestrator plans: centralized connection management approach
- Coder refactors: creates connection pool, updates all usage sites
- Explorer validates: all tests passing, no broken connections
Benefit: Strategic planning prevents breaking changes
4. Test Generation
Pattern: Understand → Identify Gaps → Generate → Validate
Example: “Add comprehensive tests for authentication”
- Explorer understands: reads all auth code, identifies untested paths
- Orchestrator identifies: missing edge case tests (expired tokens, invalid formats, etc.)
- Coder generates: comprehensive test suite
- Explorer validates: runs tests, confirms coverage improvement
Benefit: Tests are comprehensive and understand the actual implementation
Getting Started: Try It Yourself
Want to run the system and see it in action? Check the GitHub repository to learn how to get started.
Conclusion: Building Toward Truly Capable AI Development Tools
Let’s step back and examine what this architecture teaches us about the future of AI-powered software development.
Key Takeaways
- Specialization Creates Compound Intelligence
The system demonstrates that multiple specialized agents can achieve more than a single generalist agent. By dividing responsibilities – strategy (Orchestrator), investigation (Explorer), and implementation (Coder) – each agent can focus on its expertise. The whole becomes greater than the sum of its parts.
- The Context Store Enables Compound Intelligence Without Context Explosion
The Context Store is transformative. Traditional agents rediscover the same information repeatedly, wasting tokens and time. The innovation here is dual: knowledge accumulates (each agent builds on previous discoveries), but context stays lean (sub-agents report only distilled contexts, not their full working context). This enables compound intelligence without overwhelming the system with detail.
- Architectural Constraints Can Improve Design
The forced delegation pattern – preventing the Orchestrator from directly accessing code – seems like a limitation but produces better outcomes. By removing shortcuts, the architecture encourages systematic thinking, proper decomposition, and thorough verification. Sometimes constraints clarify thinking.
- Stateless Turn-Based Execution Provides Clarity
The explicit turn structure (LLM → Actions → Execution → Feedback) creates reproducible, debuggable behavior. Each turn has clear inputs and outputs. This transparency makes the system easier to understand, debug, and improve.
Why This Matters
This project demonstrates production-grade agent architecture patterns in an accessible, educational format. Built entirely from scratch without frameworks, it reveals the actual mechanics behind systems like Claude Code, Manus, and Deep Research – showing not just what they do, but how they do it.
More importantly, it bridges the gap between research and practical tools. The concepts here – multi-agent collaboration, context-efficient knowledge sharing, forced delegation patterns – aren’t just interesting theoretically. They solve real problems that single-agent systems face: context explosion, lack of specialization, and inability to build compound intelligence. By building from scratch, we can see exactly how each pattern addresses these challenges.
Future Potential
The architecture is designed for extensibility. Imagine additional specialized agents:
- Security Agent: Focused on vulnerability detection and secure coding patterns
- Documentation Agent: Creates and maintains comprehensive documentation
- Performance Agent: Profiles code and implements optimizations
- Test Agent: Specialized in test generation and coverage analysis
Each would contribute to the Context Store during task execution, building an increasingly sophisticated understanding through their specialized investigations.
The Task Store could enable even longer-running tasks with failure recovery. If a task fails, the system could analyze the failure, adjust the approach, and retry – learning from mistakes within the same execution.
The Context Store could become a true knowledge graph, with contexts linking to related contexts, enabling semantic queries during task execution: “Find all contexts related to authentication” or “Show me everything discovered about the user model.”
Call to Action
The project is open-source under Apache 2.0 license. Built from scratch without frameworks, the codebase is designed to be readable and extensible – perfect for learning how these systems actually work. Whether you’re:
- A developer curious about agent architectures and what’s under the hood
- A researcher exploring multi-agent systems and coordination patterns
- An engineer building AI-powered tools who wants to understand the internals
- A student learning about AI application architecture from first principles
There’s value in exploring the code, trying the examples, and understanding how production Deep Agent systems work at a fundamental level.
GitHub Repository: https://github.com/yourusername/agentic-code-assistant
Try the examples. Read the code. Experiment with different LLM models. Contribute improvements. Build additional agents. The goal is education and exploration.
The future of AI systems isn’t single agents trying to do everything. It’s specialised agents working together, sharing distilled knowledge through efficient context management, and building compound intelligence without context explosion. This project demonstrates that future – today.
Happy coding.