Key Takeaways
- Three Core Patterns: AI agent systems can be effectively structured using three sub-agent patterns: Synchronous (for immediate results), Asynchronous (for parallel, non-blocking tasks), and Scheduled (for future execution).
- Context Over Concurrency: The primary value of sub-agents is not parallel execution but efficient context management. By delegating tasks, the parent agent's context remains lean, preventing token limits and performance degradation, reducing context tokens by over 90% in some cases.
- Asynchronous by Default: Building AI agents requires a shift from traditional synchronous software design to an asynchronous model. Treating sub-agents like colleagues you delegate tasks to allows for scalable, flexible, and non-blocking user experiences.
- Generalize First, Specialize Later: Avoid premature optimization. Start with a generalist parent agent and only create specialized sub-agents when driven by clear, evidence-based needs like divergent model requirements, security boundaries, or regulatory compliance.
- Unified Context is Key: The most significant long-term challenge is context management. A unified context layer, such as a semantic graph, is superior to simply increasing an LLM's context window, as it prevents context pollution and enables scalable, economically viable agent systems.
In software engineering, Sub-Agent Architecture is a design pattern where a primary orchestrating AI model delegates specific, scoped tasks to secondary, independent AI processes (sub-agents) to manage context limits, parallelize execution, and maintain system stability.
While many AI agent systems look impressive in controlled demos, they frequently fail in production due to bloated context windows, sluggish response times, and degraded conversational quality. Recently, Dan Farrelly, co-founder of Inngest, shared a compelling framework addressing this exact issue based on his experience building general-purpose AI agents. He argues that any production-grade AI system inevitably requires exactly three sub-agent patterns to function reliably: Synchronous, Asynchronous, and Scheduled. This isn't merely a technical implementation detail; it represents a fundamental shift in how we must architect AI systems to actually deliver sustained value without collapsing under their own context limits.
The True Purpose of Sub-Agents: Context Compression
Before diving into the three specific patterns, it's critical to understand why sub-agents exist in the first place. Dan Farrelly provides a sharp definition: a sub-agent is an independent LLM execution context generated by a parent agent to handle a tightly scoped task. The parent agent describes the objective, provisions the necessary tools, and waits for the result. The critical architectural constraint is that the sub-agent must execute entirely within its own context window, preventing its operational noise from polluting the parent's memory.
The core objective here is context compression. The Cursor team recently highlighted this exact principle on the Latent Space podcast. The sub-agent structure ensures the parent agent never has to absorb the sprawling, messy execution trajectory of a delegated task. Consider a scenario where a sub-agent reads 8 files and executes 15 distinct tool calls to solve a problem. The parent agent doesn't need that raw data; it only needs to ingest a concise, 750-token summary of the outcome.
In my analysis, this reveals the fundamental value proposition of the sub-agent pattern. Many assume its primary benefit is enabling parallel execution. However, as Dan Farrelly's work clarifies, the real return on investment comes from context management, not parallelism. By keeping the parent agent's context lean, it can handle a greater number of tasks within a single conversation without hitting context limits or degrading response quality. In testing the Utah project, Farrelly found that implementing sub-agents reduced the number of tokens added to the parent agent's context by over 90%.
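The arithmetic behind this compression claim is easy to demonstrate. The following is a minimal, self-contained sketch (not from the original article) that simulates the scenario above: a sub-agent burns through a messy trajectory of file reads and tool calls inside its own context, while the parent absorbs only a ~750-token summary. All names (`AgentContext`, `run_subagent`) and the per-call token counts are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """A toy context window, tracked as (label, token_count) entries."""
    entries: list = field(default_factory=list)

    def add(self, label: str, tokens: int) -> None:
        self.entries.append((label, tokens))

    @property
    def total_tokens(self) -> int:
        return sum(t for _, t in self.entries)

def run_subagent(task: str) -> tuple[str, int]:
    """Simulate a sub-agent: it accumulates a sprawling trajectory in its
    OWN context and hands back only a compact summary."""
    subagent_ctx = AgentContext()
    for i in range(8):                      # e.g. 8 file reads
        subagent_ctx.add(f"read_file_{i}", 1200)
    for i in range(15):                     # e.g. 15 distinct tool calls
        subagent_ctx.add(f"tool_call_{i}", 400)
    summary = f"Completed '{task}': 8 files inspected, 15 tool calls, fix applied."
    return summary, subagent_ctx.total_tokens

parent = AgentContext()
summary, subagent_tokens = run_subagent("fix failing test")
parent.add("subagent_summary", 750)         # only the summary lands upstream

compression = 1 - parent.total_tokens / subagent_tokens
print(f"sub-agent context: {subagent_tokens} tokens")
print(f"parent absorbed:   {parent.total_tokens} tokens ({compression:.0%} saved)")
```

With these illustrative numbers, the parent absorbs 750 tokens instead of 15,600, a saving of roughly 95 percent, in line with the >90% figure Farrelly reports.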
Pattern 1: The Synchronous "Wait and See"

The synchronous pattern is the most straightforward. The parent agent spins up a sub-agent and blocks execution, waiting for the result to return. The sub-agent executes its tool calls, finishes the job, and passes a summarized result back up the chain. Dan recommends using this pattern when the parent agent fundamentally requires the answer before it can proceed with its own logic—such as a data query, analytical step, or code generation task where the output dictates the immediate next step.
I view the synchronous pattern essentially as a complex function call. You invoke it, you wait, and you get a return value. While the result does enter the parent agent's context window, the >90% compression rate of the summary means the parent can delegate dozens of times before context limits become a bottleneck. The design is elegant: it maintains clear control flow while keeping the system scalable.
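The function-call analogy can be made concrete. Here is a minimal sketch, with the LLM loop stubbed out so the blocking control flow is visible; the names (`run_subagent_sync`, `parent_turn`) and the example task are my own assumptions, not from the article.

```python
def run_subagent_sync(objective: str, tools: list) -> str:
    """Spin up a sub-agent, block until it finishes, and return only its
    compact summary. A real implementation would run an LLM tool-calling
    loop here; this stub just makes the control flow visible."""
    # ... sub-agent executes its tool calls inside its OWN context window ...
    return f"summary of '{objective}' (via {', '.join(tools)})"

def parent_turn(user_request: str) -> str:
    # The parent cannot answer without the query result, so it waits --
    # exactly like a slow, expensive function call.
    result = run_subagent_sync("query Q3 sales data", tools=["sql"])
    return f"{user_request} -> {result}"

answer = parent_turn("How did Q3 go?")
print(answer)
```

Only `result`, the compressed summary, ever enters the parent's context; the sub-agent's raw tool-call trajectory stays behind the function boundary.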
Pattern 2: The Asynchronous "Fire and Forget"

The asynchronous pattern, in my view, represents a necessary paradigm shift from traditional software design. Legacy systems are built on a foundation of synchronous, predictable execution. But in the world of AI agents, we must architect for a degree of asynchronicity and non-determinism. The most effective mental model is to treat a sub-agent as a colleague: you delegate a task, state "let me know when you're done," and immediately move on to your next priority.
This approach allows multiple asynchronous sub-agents to operate in parallel without complex coordination logic. The efficiency gains are obvious, but the more profound benefit is the inherent flexibility and scalability of the system. This is where the architectural integrity of your agent system is tested. The parent agent must maintain a clean, high-level state, offloading not just the task but the detailed operational context to the sub-agent. At Epsilla, we're building our Agent-as-a-Service platform around this principle, using a semantic graph to manage these distributed contexts so the parent agent isn't polluted with transient operational data.
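The "delegate and move on" model maps naturally onto ordinary async primitives. Below is a small sketch using Python's `asyncio`: the parent fires off three sub-agents, hands each a "let me know when you're done" callback, and continues without blocking on any individual task. All names and tasks are illustrative assumptions; for the demo we gather at the end only so the inbox can be inspected.

```python
import asyncio

async def run_subagent(task: str, notify) -> None:
    """A fire-and-forget sub-agent: does its work, then calls back."""
    await asyncio.sleep(0.01)              # stand-in for a long-running LLM task
    await notify(f"done: {task}")

async def parent() -> list:
    inbox = []

    async def notify(msg: str) -> None:    # "let me know when you're done"
        inbox.append(msg)

    # Delegate three tasks and move on immediately -- no blocking,
    # no coordination logic between the sub-agents.
    tasks = [asyncio.create_task(run_subagent(t, notify))
             for t in ("research", "summarize", "draft-email")]

    # The parent would keep serving the user here; for this demo we simply
    # wait for everything to finish before draining the inbox.
    await asyncio.gather(*tasks)
    return inbox

results = asyncio.run(parent())
print(results)
```

In a production system the callback would persist results to durable storage (or, as discussed later, emit an event ID) rather than append to an in-memory list, so the parent can pick them up across turns.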
Pattern 3: The Scheduled "Future Execution"

Most will overlook the scheduled pattern, but it's arguably the most strategically significant. Here, the parent agent instructs a sub-agent to execute at a specific point in the future. This is the correct pattern for intelligent follow-ups, dynamic reminders, and periodic checks—any scenario where "later" must mean "using the latest data at the time of execution."
I particularly appreciate the rationale for using two distinct tools instead of a single tool with a mode parameter. Models are fundamentally better at selection (choosing from a list of tools) than at parameter optimization (reading a parameter description and setting it correctly). Separate tools also translate to cleaner logs, clearer tracing, and radically easier debugging. This attention to detail reflects a seasoned engineer's deep understanding of system maintainability.
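To illustrate the two-tools-over-one-mode-parameter point, here is a hypothetical sketch of what the tool surface might look like. The tool names, schema shape, and dispatcher are my own assumptions, not the article's actual implementation; the point is that the model picks between two narrowly scoped tools rather than setting a `mode` flag correctly on one combined tool.

```python
# Two narrowly scoped tool definitions: the model only has to SELECT one.
TOOLS = [
    {
        "name": "run_subagent_now",
        "description": "Delegate a task to a sub-agent and run it immediately.",
        "parameters": {"task": {"type": "string"}},
    },
    {
        "name": "schedule_subagent",
        "description": "Delegate a task to a sub-agent to run at a future time.",
        "parameters": {
            "task": {"type": "string"},
            "run_at": {"type": "string", "description": "ISO-8601 timestamp"},
        },
    },
]

def dispatch(tool_name: str, args: dict) -> str:
    """Separate handlers per tool mean each pattern gets its own log lines
    and traces, which is what makes debugging radically easier."""
    if tool_name == "run_subagent_now":
        return f"[sync] running: {args['task']}"
    if tool_name == "schedule_subagent":
        return f"[scheduled] {args['task']} at {args['run_at']}"
    raise ValueError(f"unknown tool: {tool_name}")

print(dispatch("run_subagent_now", {"task": "summarize thread"}))
print(dispatch("schedule_subagent",
               {"task": "send follow-up", "run_at": "2025-06-01T09:00:00Z"}))
```

A single `run_subagent(task, mode=...)` tool would force the model to read and correctly apply a parameter description on every call; two tools reduce that to a selection problem, which models handle more reliably.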
This brings to mind a familiar cycle in software engineering: the explosion of microservices followed by a strategic retreat to modular monoliths. A point is always reached where the overhead begins to outweigh the benefits. The same principle applies to generalist versus specialist agents. Starting simple and specializing only when absolutely necessary is almost always the more intelligent strategy.
When Specialization Becomes a Necessity
Despite the strong case for generalist agents, Dan Farrelly acknowledges clear scenarios where specialization is the superior approach. The decision should be driven by concrete needs, not architectural dogma.
- Divergent Model Requirements: One task requires vision capabilities, while another needs rapid classification. Routing these to different, optimized models is a clear win.
- Security Boundaries: Agent A requires access to sensitive customer data, while Agent B interacts only with public information. Isolating these functions into specialized agents is a security imperative.
- Regulatory Compliance: Certain domains, like finance or healthcare, demand auditable and independent processing pipelines for compliance.
- Empirical Evidence from Validated Evaluations: Your own metrics and evaluations consistently demonstrate that a specialized agent outperforms a generalist one on a specific, critical task.

The guiding principle is this: specialization must be driven by measured necessity, not architectural aesthetics. Begin with a generalist approach and specialize only when the data and operational requirements make the case for you. This discipline applies not just to AI agent systems but to nearly all software design. We are often seduced by the pursuit of a "perfect architecture," but true wisdom lies in recognizing when simplicity is the optimal solution. This pragmatic stance is, in my view, the key to building sustainable agentic systems. The pace of technological advancement is so rapid that a specialization deemed essential today could be rendered obsolete by a new model capability tomorrow. Maintaining system flexibility and adaptability is far more valuable than perfecting an architecture for the present moment.
Future Trajectories in Agent Orchestration
Dan's team is currently employing a generalist method that applies across all three sub-agent patterns—Sync, Async, and Scheduled. However, they are actively exploring several compelling new frontiers.
- Self-Iterating Agents: With the introduction of scheduled sub-agents, the logical next step is to ask: why can't an agent continuously iterate on itself and the system it inhabits? This introduces fascinating possibilities for self-improvement, with cost and budget naturally serving as the primary constraints.
- Orchestration Awareness: Asynchronous and scheduled sub-agents return event IDs. By exposing these through APIs and context, any agent or sub-agent within the system can gain awareness of what is running, where it's running, and retrieve its status and results. The architecture evolves from a simple loop-and-fan-out model into a true, interconnected network.
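A minimal version of this event-ID-based awareness can be sketched as a shared registry that any agent in the network can query. This is an illustrative assumption about how such a system might be structured, not Inngest's actual API; the class and method names (`EventRegistry`, `start`, `complete`, `status`, `running`) are hypothetical.

```python
import uuid

class EventRegistry:
    """A minimal shared registry: async/scheduled sub-agents get event IDs,
    and any agent can look up what is running and fetch results by ID."""

    def __init__(self):
        self._events = {}

    def start(self, task: str) -> str:
        """Register a newly launched sub-agent and return its event ID."""
        event_id = str(uuid.uuid4())
        self._events[event_id] = {"task": task, "status": "running", "result": None}
        return event_id

    def complete(self, event_id: str, result: str) -> None:
        """Mark a sub-agent's task as done and attach its result."""
        self._events[event_id].update(status="done", result=result)

    def status(self, event_id: str) -> dict:
        """Any agent in the network can inspect any event by ID."""
        return dict(self._events[event_id])

    def running(self) -> list:
        """List event IDs of sub-agents still in flight."""
        return [eid for eid, e in self._events.items()
                if e["status"] == "running"]

registry = EventRegistry()
eid = registry.start("compile weekly report")
# ... later, any other agent can check on it by ID ...
print(registry.status(eid)["status"])       # running
registry.complete(eid, "report.pdf generated")
print(registry.status(eid)["result"])
```

Exposing this registry through tools and context is what turns a loop-and-fan-out topology into an interconnected network: awareness of in-flight work is no longer private to the parent that launched it.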
- Callbacks and Advanced Patterns: The current reference implementation uses channels as a callback mechanism for task completion. But what would a more generalized sub-agent system with different callback types look like? Imagine callbacks that directly update a task in Linear, store citations from a research task in a database, or trigger a report generation. While tools can be used for these actions, callbacks provide a degree of guaranteed execution that is critical for robust systems.

These explorations reveal a potential evolutionary path for AI agent systems: from simple request-response patterns to complex, network-aware collaboration, and ultimately, to autonomous, continuous optimization. Each step is built upon a solid, pragmatic foundation rather than a pursuit of novelty for its own sake.
My Reflections on the Core Challenges
Reading through Dan Farrelly's analysis crystallizes several core insights for me.
First, the design of AI agent systems is fundamentally an exercise in managing complexity. The natural tendency is to solve problems by adding more features and more layers of specialization, but we often overlook the power of simplicity. The value of the three sub-agent patterns lies not in their intricacy, but in how cleanly they map to real-world operational needs.
Second, context management is a profoundly underestimated problem. We see LLM context windows expanding from 4K to 32K and now to over 100K, and it's easy to assume that context length is a solved issue. Dan's practical experience proves otherwise. Even with massive context windows, effective context management remains paramount. A system that can maintain clarity and efficiency over a long-running conversation is infinitely more sustainable than one that relies on the brute force of an enormous, and expensive, context window.
This challenge is the core reason we architected Epsilla as an Agent-as-a-Service platform built on a Semantic Graph. The brute-force approach of stuffing everything into a prompt is inefficient and leads to context pollution, where the parent agent becomes bogged down by the detailed outputs of its sub-agents. The real solution isn't a bigger window; it's a smarter, unified context layer. By managing state and relationships within a semantic graph, our platform allows parent and sub-agents to operate with clean, relevant context, accessing the full history of operations without overloading the prompt. This is how you build systems that are not only powerful but also scalable and economically viable.
The adoption of asynchronous thinking is particularly critical in the architecture of AI agent systems. Traditional software engineering often prioritizes synchronicity and predictability. However, when tackling complex, multi-step tasks, embracing asynchronicity is what unlocks a superior user experience. A user should never be blocked, waiting for a long-running task to complete before they can continue the dialogue. The system must be able to delegate tasks to the background and notify the user upon completion. This design paradigm more closely mirrors the natural cadence of human collaboration and workflow.
The tension between generalization and specialization is a perennial theme in engineering. The history of software is a pendulum swinging between extremes: from monoliths to microservices and back to modular monoliths; from general-purpose tools to specialized solutions and back again. Dan Farrelly’s advice to start with a generalist agent and only specialize when presented with unequivocal evidence is a piece of battle-tested wisdom. It avoids premature optimization and the costly overhead of maintaining a fleet of specialized agents before the problem space truly demands it.
Ultimately, the most valuable aspect of this entire discussion is its pragmatism. It is not an exercise in promoting an idealized, theoretical architecture or showcasing exotic technology. It is a distillation of real-world practice and hard-won insights. In the hyper-accelerated field of AI agents, this grounded, execution-focused approach is not just valuable; it is essential for survival.
For those of you currently building agentic systems, I urge you to seriously consider these three sub-agent patterns. Resist the allure of overly complex architectures. Begin with the simplest implementation that meets the immediate need and allow actual user demand to drive your design decisions. The core imperative is to maintain system flexibility, as today's best practices in this domain are likely to be obsolete tomorrow.
Most importantly, focus on the fundamentals that will determine long-term success: user experience, system maintainability, and above all, context management. The latter is the true center of gravity. A seamless user experience and a maintainable system are downstream consequences of a robust context strategy. This is precisely the problem space where we operate at Epsilla. We posit that robust context and state management shouldn't be a recurring, application-level concern but a dedicated, foundational service. By offloading this complexity to a unified platform, like an Agent-as-a-Service model built on a Semantic Graph, each sub-agent—and the parent agent—can retrieve precisely the state and context it needs, without polluting the primary reasoning loop. This is how you build scalable, reliable, and truly intelligent systems.
FAQ: AI Sub-Agent Architecture
Q1: What is the main benefit of using sub-agents instead of one large AI model? A: The primary benefit is context compression. By delegating tasks to sub-agents, the main orchestrator agent avoids filling its context window with unnecessary execution details, preventing hallucinations, reducing token costs, and maintaining high reasoning quality over long sessions.
Q2: When should I use a synchronous vs. asynchronous sub-agent? A: Use a synchronous sub-agent when the main agent absolutely needs the result of the sub-agent's task to determine its next step. Use an asynchronous sub-agent for independent tasks where the main agent can continue interacting with the user without blocking.
Q3: How do Scheduled sub-agents maintain context if they run days later? A: Scheduled sub-agents rely on persistent state management outside the LLM's active memory. They rehydrate their context at the moment of execution by querying a central database or Semantic Graph, ensuring they act on the most current data rather than stale information.

