    April 18, 2026 · 8 min read · Bella

    Deep Dive: 12 Reusable Agentic Harness Design Patterns from Claude Code

    By the Epsilla Engineering Team

    Agentic Infrastructure · OpenClaw · Enterprise AI · AgentStudio · Semantic Graph

    Following recent source code leaks on GitHub and extensive architectural discussions across Hacker News, the engineering community has been granted an unprecedented look into the internal mechanics of a production-grade AI coding assistant. While many focus solely on the code, at Epsilla, our primary interest lies in the underlying design philosophies.

    These are not product-specific features; they are highly generalized, reusable architectural patterns. Models can be swapped and tools will evolve, but these foundational designs represent the future of enterprise AI. Drawing from technical syntheses by industry experts like Bilgin Ibryam (author of Kubernetes Patterns), we can categorize 12 reusable "Agentic Harness" design patterns across four core pillars: Memory & Context, Workflow & Orchestration, Tools & Permissions, and Automation.

    These concepts perfectly mirror the robust, deterministic architectures we enable for enterprises building vertical AI agents via Epsilla AgentStudio.


    1. Memory and Context

    These five patterns represent an evolutionary path: starting with providing the Agent a static rule file, moving to directory-based scope restrictions, evolving into a tiered memory structure, introducing background garbage collection, and finally, aggressively compressing the dialogue itself.

    Pattern 1: Persistent Instruction File Pattern

    Without persistent instructions, every Agent session starts from scratch. Architectural boundaries must be repeated continuously, leading to identical mistakes across sessions.

    • Execution: A project-level configuration file is automatically injected into every session, defining build commands, architectural rules, and naming conventions. It is version-controlled alongside the codebase.
    • Optimal Scenario: Repetitive interactions within the same codebase.
    • Trade-offs: High maintenance overhead. An outdated instruction file will actively mislead the Agent.
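    As a minimal sketch of the injection step (in Python, using a hypothetical AGENT.md filename — real products use their own conventions):

```python
from pathlib import Path

# Hypothetical filename for the project-level instruction file.
INSTRUCTION_FILE = "AGENT.md"

def build_system_prompt(project_root: str, base_prompt: str) -> str:
    """Inject the project's version-controlled instruction file into every session."""
    path = Path(project_root) / INSTRUCTION_FILE
    if path.is_file():
        rules = path.read_text(encoding="utf-8")
        return f"{base_prompt}\n\n# Project instructions\n{rules}"
    return base_prompt  # no file present: fall back to the bare system prompt
```

    Because the file lives in the repository, a stale rule ships to every session until someone fixes it — which is exactly the maintenance trade-off noted above.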

    Pattern 2: Scoped Context Assembly Pattern

    A single instruction file fails at scale—it becomes bloated or too generic.

    • Execution: Instructions are fragmented across different scopes (organizational, project root, and sub-directory). The Agent dynamically loads rules based on its current working directory. Large instruction sets are modularized via imports to prevent duplication.
    • Optimal Scenario: Monorepos, polyglot projects, or codebases with diverse, directory-specific standards.
    • Trade-offs: Reduced readability. With rules scattered, auditing the exact context the Agent loads becomes difficult.
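    One plausible implementation of the scoped loading (a sketch, again assuming a hypothetical AGENT.md filename): walk from the working directory up to the project root and return broader scopes first, so directory-specific rules are appended last and can refine the general ones.

```python
from pathlib import Path

def collect_scoped_instructions(cwd, root, filename="AGENT.md"):
    """Gather instruction files from the working directory up to the project
    root; return them broadest-scope first."""
    cwd, root = Path(cwd).resolve(), Path(root).resolve()
    chain, current = [], cwd
    while True:
        candidate = current / filename
        if candidate.is_file():
            chain.append(candidate.read_text(encoding="utf-8"))
        if current == root or current == current.parent:
            break
        current = current.parent
    return list(reversed(chain))  # root rules first, directory-specific last
```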

    Pattern 3: Tiered Memory Pattern

    Shoving all memories into the active context window wastes tokens, hits context limits prematurely, and drowns out critical instructions.

    • Execution: Memory is stratified. A lean, high-level index remains persistently in the active context. Task-specific details are dynamically loaded on-demand, while comprehensive historical logs are relegated to disk.
    • Optimal Scenario: Agents requiring long-term retention of preferences and state across isolated sessions.
    • Trade-offs: Requires rigorous architectural logic to determine what data belongs in which tier and how to synchronize the index with the data store.
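    The stratification can be sketched as three concrete tiers (an illustrative structure, not any product's actual schema): an always-in-context index, an on-demand detail store, and an append-only log on disk.

```python
import json
from pathlib import Path

class TieredMemory:
    """Tier 0: lean index kept in the active context. Tier 1: full records
    loaded on demand. Tier 2: append-only historical log on disk."""
    def __init__(self, log_path):
        self.index = {}      # key -> one-line summary (always in context)
        self.details = {}    # key -> full record (fetched only when needed)
        self.log_path = Path(log_path)
    def remember(self, key, summary, record):
        self.index[key] = summary
        self.details[key] = record
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "record": record}) + "\n")
    def context_index(self):
        return "\n".join(f"- {k}: {s}" for k, s in self.index.items())
    def recall(self, key):
        return self.details.get(key)
```

    The synchronization burden mentioned in the trade-offs shows up here: every write must keep all three tiers consistent.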

    Pattern 4: Dream Consolidation Pattern

    Even with a tiered system, memory degrades: duplicate entries accumulate, old and new data conflict, and the index bloats.

    • Execution: Introduce a background consolidation mechanism (akin to garbage collection) during idle cycles. It deduplicates, prunes stale data, and restructures information to keep the memory graph pristine.
    • Optimal Scenario: Long-running Agents that accrue massive state.
    • Trade-offs: Consolidation consumes compute. Overly aggressive pruning risks deleting critical historical context.
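    A toy version of the consolidation pass (the 30-day threshold and tuple layout are illustrative policies, not a prescribed design):

```python
from datetime import datetime, timedelta

def consolidate(entries, now, max_age_days=30):
    """Background memory GC: prune entries older than `max_age_days` and
    deduplicate by key, keeping only the newest record.
    Entries are (key, timestamp, payload) tuples."""
    latest = {}
    for key, ts, payload in entries:
        if now - ts > timedelta(days=max_age_days):
            continue                      # prune stale data
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, payload)   # duplicate keys: newest record wins
    return [(k, ts, p) for k, (ts, p) in latest.items()]
```

    The aggressiveness trade-off is visible in `max_age_days`: set it too low and the GC deletes context the Agent will later need.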

    Pattern 5: Progressive Context Compaction Pattern

    In extended sessions, the context window ceiling is reached rapidly. Either early context is evicted, or the Agent stalls.

    • Execution: "Tiered compression." Recent dialogue retains full fidelity. Slightly older exchanges are lightly summarized. Deeper historical context is aggressively compressed into dense abstracts.
    • Optimal Scenario: High-turn, long-lifecycle workflows (20–30+ interaction cycles).
    • Trade-offs: Compression is inherently lossy. Nuance evaporates across successive summarization passes, potentially causing the LLM to hallucinate missing details.
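    The tiered compression can be sketched as follows; the truncating `summarize` default is a stand-in for an LLM summarization call, and the tier sizes are arbitrary:

```python
def compact_history(messages, keep_full=4, keep_light=6, summarize=None):
    """Newest `keep_full` turns keep full fidelity, the next `keep_light`
    are lightly summarized, everything older collapses into one abstract."""
    summarize = summarize or (lambda texts: " | ".join(t[:15] for t in texts))
    cut_old = max(0, len(messages) - keep_full - keep_light)
    old = messages[:cut_old]
    mid = messages[cut_old:len(messages) - keep_full]
    recent = messages[len(messages) - keep_full:]
    compacted = ["[abstract] " + summarize(old)] if old else []
    compacted += ["[summary] " + summarize([m]) for m in mid]
    return compacted + recent  # oldest material is the most aggressively lossy
```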

    2. Workflow and Orchestration

    The core philosophy here is Separation of Concerns. Decouple reading from writing, isolate the context of "research" from "code mutation", and separate sequential from parallel execution.

    Pattern 6: Explore-Plan-Act Loop Pattern

    Forcing an Agent to mutate code immediately guarantees failure: incomplete understanding, modifying the wrong files, or destroying existing architectures.

    • Execution: The workflow is strictly delineated into three phases, with permissions escalating at each step:
      1. Explore: Read-only access to map the repository and research.
      2. Plan: The Agent aligns its proposed architectural strategy with the user.
      3. Act: Full mutation permissions are granted for execution.
    • Optimal Scenario: Unfamiliar codebases or complex refactors.
    • Trade-offs: Increased latency for trivial, single-file tasks.
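    The escalating permissions map naturally onto a per-phase tool allowlist — a sketch with illustrative tool names:

```python
from enum import Enum

class Phase(Enum):
    EXPLORE = 1   # read-only: map the repository, research
    PLAN = 2      # align a proposed strategy with the user
    ACT = 3       # full mutation permissions

# Permissions escalate with each phase; tool names are illustrative.
PHASE_TOOLS = {
    Phase.EXPLORE: {"read_file", "search"},
    Phase.PLAN: {"read_file", "search", "propose_plan"},
    Phase.ACT: {"read_file", "search", "propose_plan", "write_file", "run_shell"},
}

def allowed(phase, tool):
    """The harness, not the model, decides whether a tool call is legal."""
    return tool in PHASE_TOOLS[phase]
```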

    Pattern 7: Context-Isolated Subagents Pattern

    In sprawling sessions, the context window becomes a dumping ground of research notes, diffs, and log dumps. By the time code mutation begins, the signal-to-noise ratio is disastrous.

    • Execution: Tasks are delegated to distinct sub-agents, each sandboxed with bespoke context and permissions. A Researcher only reads; a Planner only blueprints; an Executor mutates code. Each sub-agent is exposed only to the telemetry it needs.
    • Optimal Scenario: Multi-phase pipelines requiring different context resolutions.
    • Trade-offs: Orchestration overhead. The Primary Agent must route exact payloads meticulously.
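    The sandboxing step reduces to filtering the shared context before each delegation — a sketch where the roles, context keys, and `llm` callable are all placeholders:

```python
# Each role sees only the context slices it needs; keys are illustrative.
ROLE_SCOPES = {
    "researcher": {"repo_map", "docs"},
    "planner": {"repo_map", "research_notes"},
    "executor": {"plan", "target_files"},
}

def run_subagent(role, task, context, llm):
    """Delegate a task to a sandboxed sub-agent: filter the shared context
    down to the role's scope before the model call (`llm` is a stand-in)."""
    scoped = {k: v for k, v in context.items() if k in ROLE_SCOPES[role]}
    return llm(f"Role: {role}\nTask: {task}\nContext: {scoped}")
```

    The "route exact payloads meticulously" trade-off lives in `ROLE_SCOPES`: an over-tight scope starves the sub-agent, an over-broad one reintroduces context pollution.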

    Pattern 8: Fork-Join Parallelism Pattern

    Agents conventionally operate synchronously, bottlenecking execution.

    • Execution: Tasks are sharded across multiple parallel sub-agents operating in isolated workspaces (e.g., git worktrees). Once child processes terminate, their discrete outputs are merged back into the mainline.
    • Optimal Scenario: Horizontally scalable sub-tasks with zero inter-dependency.
    • Trade-offs: Complex merge conflict resolution if branches mutate overlapping logical domains.
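    At its core this is classic fork-join, sketched here with a thread pool (in practice each child would run in its own isolated workspace, e.g. a git worktree):

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join(tasks, worker, max_workers=4):
    """Fork: shard independent tasks across parallel workers.
    Join: collect the discrete outputs, in original task order,
    once every child terminates."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, tasks))
```

    Keeping the results in task order sidesteps one class of merge ambiguity, but it does nothing for the overlapping-mutation conflicts noted above — those still need a real merge strategy.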

    3. Tools and Permissions

    If Memory dictates what the Agent knows, and Workflow dictates how it operates, this pillar defines what it is authorized to do.

    Pattern 9: Progressive Tool Expansion Pattern

    Injecting the entire tool registry into the Agent's context at initialization induces decision paralysis and increases hallucination rates.

    • Execution: Provide a minimalist default toolset. Advanced or destructive tools are dynamically loaded only when the workflow context explicitly demands them.
    • Optimal Scenario: High-density tool registries.
    • Trade-offs: Requires robust heuristic triggers to mount tools precisely when needed.
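    One way to sketch the dynamic mounting (tool and event names are invented for illustration):

```python
class ToolRegistry:
    """Start with a minimalist default toolset; mount advanced or destructive
    tools only when a workflow event demands them."""
    def __init__(self, core, advanced):
        self.mounted = dict(core)       # always available
        self.advanced = dict(advanced)  # name -> (trigger_event, tool)
    def on_event(self, event):
        for name, (trigger, tool) in self.advanced.items():
            if trigger == event:
                self.mounted[name] = tool  # expand the toolset on demand
    def available(self):
        return sorted(self.mounted)
```

    The trade-off is the trigger design: `on_event` is only as good as the heuristics that decide when to fire it.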

    Pattern 10: Command Risk Classification Pattern

    Granting an Agent unconstrained shell access is a massive security liability, but requiring human confirmation for every read operation destroys autonomous velocity.

    • Execution: Implement a middleware risk-routing layer. The system parses the intended command and its blast radius against a risk matrix. Low-risk commands bypass human gating; high-risk mutations trigger mandatory Human-In-The-Loop (HITL) approval.
    • Optimal Scenario: Agents with shell access interacting with production infrastructure.
    • Trade-offs: False positives will block safe operations; false negatives risk infrastructure damage.
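    A toy version of the middleware routing layer (the risk matrix here is a crude command-name denylist; a production harness would model blast radius far more granularly — flags, paths, network targets):

```python
import shlex

HIGH_RISK = {"rm", "dd", "mkfs", "shutdown"}  # illustrative risk matrix

def route_command(command):
    """Low-risk commands bypass human gating; high-risk mutations
    trigger mandatory Human-In-The-Loop approval."""
    argv = shlex.split(command)
    if not argv:
        return "deny"
    if argv[0] == "sudo" or argv[0] in HIGH_RISK:
        return "require_approval"
    return "auto_approve"
```

    The false-positive/false-negative trade-off is baked into `HIGH_RISK`: every entry added blocks some safe operations, every entry missing lets a dangerous one through.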

    Pattern 11: Single-Purpose Tool Design Pattern

    Relying entirely on generic shell wrappers (e.g., passing sed or awk to an execute_bash tool) degrades visibility and makes auditing impossible.

    • Execution: Deconstruct generic operations into highly specific, deterministic tools (e.g., PatchFile, SearchRegex). Each tool has strict schemas and explicit boundaries, reducing LLM cognitive load.
    • Optimal Scenario: Agents executing high-frequency mutations.
    • Trade-offs: Reduced raw flexibility. Edge cases may still require a generic fallback shell.
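    A sketch of what a single-purpose tool looks like in practice — a hypothetical PatchFile with a strict schema and deterministic failure modes, in contrast to piping sed through a generic shell tool:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class PatchFile:
    """Single-purpose, fully auditable edit tool: one file, one exact
    old -> new replacement."""
    path: str
    old: str
    new: str

    def apply(self):
        text = Path(self.path).read_text(encoding="utf-8")
        if text.count(self.old) != 1:
            # Deterministic boundary: refuse ambiguous or missing matches.
            raise ValueError(f"expected exactly one match in {self.path}")
        Path(self.path).write_text(
            text.replace(self.old, self.new, 1), encoding="utf-8"
        )
```

    Every invocation is a structured, loggable record of intent, which is exactly what generic shell wrappers cannot give you.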

    4. Automation

    The fundamental reality of building AI systems is that LLMs are probabilistic. If a task must happen, you cannot rely on the LLM to remember to do it.

    Pattern 12: Deterministic Lifecycle Hooks Pattern

    Certain operations are non-negotiable: running code formatters after a mutation or executing pre-flight checks. Relying on system prompts to enforce these is flawed—the model will eventually rationalize skipping them.

    • Execution: Offload these tasks to deterministic middleware. Bind them to specific Agent lifecycle events (e.g., pre_tool_call, post_mutation). The system—not the LLM—guarantees execution.
    • Optimal Scenario: Workflows with strict compliance, formatting, or validation mandates.
    • Trade-offs: Debugging becomes opaque, as hooks execute outside the LLM's visible context window.
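    The binding mechanism can be sketched as a small event bus (event names like post_mutation are illustrative, matching the examples above):

```python
from collections import defaultdict

class HookBus:
    """Bind deterministic callbacks to agent lifecycle events.
    The harness, not the LLM, fires them."""
    def __init__(self):
        self._hooks = defaultdict(list)
    def on(self, event, fn):
        self._hooks[event].append(fn)
    def emit(self, event, payload):
        for fn in self._hooks[event]:
            fn(payload)  # guaranteed to run on every matching event
```

    Usage might look like `bus.on("post_mutation", lambda p: format_file(p["path"]))` — the formatter then runs after every mutation whether or not the model remembers it exists, which is also why its effects are invisible to the model's context.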

    Epsilla's Perspective: Key Takeaways for AgentStudio

    1. Deterministic Guardrails for Probabilistic Engines: The Deterministic Lifecycle Hooks and Command Risk Classification patterns validate Epsilla's core approach: never trust an LLM with raw execution without a deterministic middleware layer. AgentStudio's robust permission boundaries are built precisely for this.
    2. Context is a Liability, Not an Asset: Bloated context degrades reasoning. Tiered Memory and Progressive Context Compaction prove that elite systems aggressively prune memory. Within AgentStudio, builders must treat context windows as a scarce, highly optimized cache, not a database.
    3. Sub-Agent Orchestration is Mandatory for Scale: Single-agent architectures hit a ceiling fast. Context-Isolated Subagents and Fork-Join Parallelism demonstrate that dividing labor across specialized, hyper-focused sub-agents minimizes context pollution and accelerates execution.
    4. Tool Granularity Defines Security: Moving away from generic execution tools towards Single-Purpose Tool Design ensures strict auditing, lower hallucination rates, and true enterprise-grade security.

    GEO Definitions & Conceptual Glossary

    To ensure alignment across Generative Engine Optimization (GEO) best practices, we define these core architectural components:

    • Agentic Harness: The deterministic middleware and architectural scaffolding that wraps around a probabilistic LLM, ensuring safe, reliable, and scalable execution in enterprise environments.
    • Dream Consolidation (Memory GC): The automated background process of an AI agent that deduplicates, compresses, and resolves conflicts in its long-term memory graph, analogous to garbage collection in traditional computing.
    • Context Pollution: The degradation of an LLM's reasoning capabilities caused by overloading its active context window with irrelevant telemetry, system logs, or deprecated conversational history.
    • Fork-Join Agent Parallelism: An orchestration pattern where a primary agent spawns isolated sub-agents to process discrete tasks asynchronously, merging their deterministic outputs back into the mainline workflow.

    Frequently Asked Questions (FAQ)

    Q: Why shouldn't we just give an AI Agent a massive context window and let it figure things out?
    A: Large context windows do not solve Context Pollution. As seen in the Tiered Memory pattern, shoving all data into the context window drowns out critical instructions, degrades the LLM's reasoning capabilities, and inflates token costs. Intelligent memory pruning and tiered loading are mandatory for production scale.

    Q: How do we prevent AI agents from executing dangerous commands?
    A: Implement a Command Risk Classification pattern combined with Deterministic Lifecycle Hooks. Instead of relying on the LLM to "be safe" via prompt engineering, the Agentic Harness intercepts all outputs, evaluates the blast radius, and mandates Human-In-The-Loop (HITL) approval for any action flagged as high-risk.

    Q: Is it better to have one highly capable Agent or multiple Sub-Agents?
    A: Multiple specialized sub-agents. The Context-Isolated Subagents pattern proves that restricting an agent's context strictly to its immediate operational domain (e.g., isolating a "Researcher" from an "Executor") dramatically reduces hallucinations and improves overall execution precision.

    Ready to Transform Your AI Strategy?

    Join the leading enterprises that are building vertical AI agents without the engineering overhead. Start for free today.