The era of cutting-edge AI research being driven solely by "carbohydrate computers"—humans pushing boundaries between coffee breaks and status meetings—is fading. Late last night, AI luminary Andrej Karpathy open-sourced a project that perfectly encapsulates the next paradigm: the autonomous AI researcher.
Dubbed autoresearch, the project is a shockingly minimalist (roughly 630 lines of code) framework designed to let an AI Agent conduct experiments overnight while you sleep. Anyone with a single GPU can now run a tireless, self-evolving research lab.
While autoresearch is currently designed to optimize LLM training, the underlying framework validates a structural shift that will soon apply to every domain in the enterprise.
The 630-Line Autonomous Loop
The core concept is elegant: give an AI Agent a small but authentic LLM training environment and let it autonomously run experiments all night.
The Agent executes a relentless loop:
- Reads context and previous results.
- Proposes targeted code modifications.
- Runs a fast, repeatable experiment (strictly capped at 5 minutes).
- Obtains an objective scalar score (val_bpb, validation bits per byte).
- Commits the winning changes via Git (or rolls back).
- Repeats the process indefinitely on a feature branch.

The brilliance lies in the 5-minute constraint. No matter how the AI modifies the model architecture, batch size, or optimizer (say, Muon or AdamW), the fixed time budget ensures apples-to-apples comparisons. The agent is forced to find the optimal model configuration for your specific hardware.
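Stripped to its essentials, the loop above can be sketched as follows. This is an illustrative sketch, not the actual autoresearch code: the function names (propose_patch, run_experiment) are hypothetical stand-ins for the agent's edit and training steps, and the exact git commands may differ. The val_bpb conversion, though, is the standard one: cross-entropy loss in nats converted to bits, normalized per byte of validation text.

```python
import math
import subprocess

TIME_BUDGET_S = 5 * 60  # every run gets the same fixed 5-minute budget


def val_bpb(total_loss_nats: float, total_bytes: int) -> float:
    """Validation bits per byte: loss in nats -> bits, per byte of text."""
    return total_loss_nats / (math.log(2) * total_bytes)


def loop_step(propose_patch, run_experiment, best_score: float) -> float:
    """One pass of the overnight loop.

    propose_patch and run_experiment are hypothetical callables standing
    in for the agent's code edit and the capped training run.
    """
    propose_patch("train.py")              # agent edits architecture/optimizer
    score = run_experiment(TIME_BUDGET_S)  # fixed budget keeps runs comparable
    if score < best_score:                 # lower bits per byte is better
        subprocess.run(["git", "commit", "-am", "keep improvement"], check=True)
        return score
    # no improvement: discard the edit and keep the previous best
    subprocess.run(["git", "checkout", "--", "train.py"], check=True)
    return best_score
```

Because every candidate is judged by the same scalar under the same time budget, "commit or roll back" reduces to a single comparison, which is what makes the loop safe to run unattended.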
The New Division of Labor: Code vs. Markdown
To make this work, Karpathy radically simplified the codebase into three files:
- prepare.py: The fixed constants and data-loading scripts. (The Agent cannot touch this.)
- train.py: The training loop and model architecture. (The Agent modifies this freely.)
- program.md: The baseline instructions and prompts. (The human edits this.)

This split represents the new division of labor in the Agentic era: the human's job is to write the prompt (.md); the AI's job is to write the code (.py).
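To make the split concrete, here is a hypothetical illustration of the kind of baseline brief a program.md could carry. This is not the actual file from the repository, just a sketch of the human-edited side of the contract:

```markdown
# Experiment brief (edited by the human)

Goal: minimize validation bits per byte (val_bpb) within a fixed
5-minute training run.

Rules:
- You may only modify train.py.
- prepare.py (data loading, fixed constants) is off-limits.
- Commit a change only if val_bpb improves; otherwise roll back.
- Record each result before starting the next experiment.
```

Everything the agent is allowed to vary lives in train.py; everything that keeps experiments comparable lives in prepare.py and this brief.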

As Karpathy noted, he is already running a scaled-up version of this agent for his nanochat project across 8 H100 GPUs, allowing the system to continuously evolve without human intervention. To spin this up yourself, you simply point an agent like Claude, Codex, or OpenClaw at the repository and say: "Look at program.md and start a new experiment."
The Epsilla Perspective: Composable Agentic Workflows
Karpathy's experiment is a microcosm of what we are building at Epsilla.
What autoresearch demonstrates is that raw coding ability is no longer the bottleneck. The value has migrated up the stack to orchestration and boundary definition. You don't just want an AI running wild; you need a system where humans define the constraints (the prepare.py) and the objectives (the program.md), allowing the AI to handle the hyper-iteration (the train.py).
At Epsilla, we call this the "Deterministic Core + Generative Edge."
When enterprises deploy agents, they cannot rely on fragile, black-box wrappers. They need Composable Agentic Workflows. Our platform allows you to construct robust, deterministic pipelines for your high-risk operations (the Core), while unleashing autonomous agents to iterate, research, and execute at the boundaries (the Edge).
Just as the human in Karpathy's setup now focuses on iterating the .md file rather than writing the model code, Epsilla lets enterprise operators focus on designing the business logic and guardrails, while our Agent-as-a-Service infrastructure handles the autonomous execution.
The future of work isn't humans writing software. It's humans writing the rules, and agents building the future.