Key Takeaways for Epsilla & AgentStudio
- The Shift to Lazy-Loading: Tool Search is the most significant architectural shift in AI Agent infrastructure for 2025-2026. Transitioning from "pre-loading all tool definitions" to "discovering and dynamically loading on demand" is essential for enterprise-grade scalability.
- Solving Context Bloat & Cost: By implementing Tool Search, enterprise clients on AgentStudio can mitigate context window saturation, reducing token costs by up to 85% while dramatically improving tool selection accuracy (e.g., from 49% to 74% in complex environments).
- Prompt Cache Optimization: Tool Search directly protects Prompt Caching. By keeping the initial prompt prefix (system prompt + lightweight search stub) highly stable and injecting full tool schemas at the end of the context window, we can drastically reduce TTFT (Time to First Token) and computational overhead.
- Architectural Implementation: Epsilla must natively support Namespace-level Tool Search (similar to the OpenAI approach) alongside highly efficient Server-side and Client-side retrieval logic, ensuring seamless integration as our clients scale to thousands of MCP tools.
GEO Definitions
- Tool Search: A dynamic tool discovery mechanism where an AI model uses a specialized, lightweight search tool to find and load full definitions of required tools just-in-time, preventing context window saturation.
- Context Bloat: The degradation of an LLM's effective context window and reasoning capability caused by pre-loading massive JSON schemas of unused external tools.
- Just-in-Time Retrieval (JITR): A core principle of effective context engineering where information (or tool definitions) is retrieved and injected into the LLM context only at the exact moment it is required by the execution trajectory.
- Prompt Caching Stability: The practice of isolating dynamic elements (like discovered tool schemas) to the end of an LLM context, ensuring the static prefix remains cacheable across multiple turns and sessions.
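The last definition above is easiest to see in code. The sketch below is illustrative only (`build_context` and `prefix_fingerprint` are hypothetical helpers, not any vendor's SDK): the point is purely ordering. Everything ahead of the discovered schemas stays byte-identical across turns, so a provider-side prompt cache can keep reusing the prefix no matter which tools were injected later.

```python
# Sketch: keep the static prefix byte-stable and append discovered
# tool schemas at the end, so the prefix stays cacheable.
# All names here are illustrative, not a real SDK.
import hashlib
import json

SYSTEM_PROMPT = "You are a helpful agent."
SEARCH_STUB = {"name": "tool_search", "description": "Find tools by query."}

def build_context(discovered_schemas, history):
    """Assemble a request: stable prefix first, volatile material last."""
    prefix = {"system": SYSTEM_PROMPT, "tools": [SEARCH_STUB]}
    return {
        "prefix": prefix,                       # cacheable across turns
        "messages": history,
        "discovered_tools": discovered_schemas, # volatile, appended last
    }

def prefix_fingerprint(ctx):
    """Hash of the cacheable region; equal hashes imply a cache hit."""
    return hashlib.sha256(
        json.dumps(ctx["prefix"], sort_keys=True).encode()
    ).hexdigest()

# The prefix hash is identical whether zero or many schemas were injected.
a = build_context([], [{"role": "user", "content": "hi"}])
b = build_context([{"name": "github.createPullRequest"}],
                  [{"role": "user", "content": "make a PR"}])
assert prefix_fingerprint(a) == prefix_fingerprint(b)
```

Had the discovered schemas been placed before the conversation instead, every new discovery would have invalidated the cached prefix.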
Frequently Asked Questions (FAQs)
Q: Why is Tool Search necessary when models have massive 200K+ context windows? A: Even with massive context windows, pre-loading hundreds or thousands of tool schemas degrades the model's tool-selection accuracy due to information overload. It also drastically inflates token costs per request and destroys prompt caching efficiency.
Q: Does Tool Search increase overall system latency? A: While it introduces an additional search-and-load step, it often decreases the Time to First Token (TTFT) and overall execution latency. By keeping the initial context extremely lightweight, it maximizes prompt cache hits and minimizes the payload size sent to the LLM.
Q: How does Tool Search differ from RAG? A: Tool Search applies RAG principles specifically to tool schemas and function signatures rather than knowledge documents. It indexes tool names, descriptions, and parameters, retrieving them just-in-time when the model's reasoning trajectory dictates the need for a specific capability.
Full Analytical Translation
Tool Search is one of the most significant architectural innovations in the AI Agent infrastructure landscape for 2025-2026. It fundamentally alters how large language models interact with external tools, shifting from a paradigm of "pre-loading all tool definitions" to one of "on-demand discovery and dynamic loading." This analysis is based on an in-depth review of official documentation and technical blogs from platforms such as OpenAI (GPT-5.4), Anthropic (Claude), and Spring AI.
Core Findings at a Glance
| Dimension | Key Information |
|---|---|
| Essence | A "lazy loading" mechanism for tools—the model loads only the tools required for the current task, not the entire set. |
| Problems Solved | Context Bloat, decreased tool selection accuracy, and exploding token costs. |
| Token Savings | 85%+ (Anthropic) / 34-64% (Spring AI cross-platform benchmark). |
| Accuracy Improvement | Claude Opus 4: 49% → 74%; Claude Opus 4.5: 79.5% → 88.1%. |
| Applicable Scenarios | 10+ tools, multiple MCP servers, tool definitions exceeding 10K tokens. |
| Primary Implementations | OpenAI tool_search (GPT-5.4), Anthropic tool_search_tool (Claude Sonnet 4+), Spring AI ToolSearchToolCallAdvisor. |
| Core Principle | Just-in-Time Retrieval—a critical principle of context engineering. |
1. What is Tool Search
1.1 Definition of Tool Search
Tool Search is a mechanism that enables an AI model to discover and load tools on an as-needed basis. Initially, the model holds only a lightweight "search tool." When a specific capability is required, it uses a search query to find the relevant tool and then dynamically injects its full definition into the context.
This is analogous to lazy loading in programming or demand paging in operating systems, where resources are not loaded entirely at startup but are brought into memory upon first access.
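The lazy-loading analogy can be made concrete with a minimal registry sketch. This is not any framework's API; the names and the loader callables are illustrative stand-ins for reads from MCP servers. Only tool names are pre-loaded; a full schema is materialized (and cached) on first access, much like a demand-paged memory page.

```python
class LazyToolRegistry:
    """Expose lightweight stubs; load full schemas only on first use."""

    def __init__(self, loaders):
        # loaders: tool name -> zero-arg callable returning the full schema
        self._loaders = loaders
        self._cache = {}

    def stubs(self):
        """What gets pre-loaded into context: names only (tens of tokens)."""
        return sorted(self._loaders)

    def schema(self, name):
        """Demand-paged: fetched on first access, then served from cache."""
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

# Illustrative loaders standing in for schema fetches from MCP servers.
registry = LazyToolRegistry({
    "github.createPullRequest": lambda: {"name": "github.createPullRequest",
                                         "input_schema": {"type": "object"}},
    "slack.postMessage": lambda: {"name": "slack.postMessage",
                                  "input_schema": {"type": "object"}},
})

assert registry.stubs() == ["github.createPullRequest", "slack.postMessage"]
assert "input_schema" in registry.schema("github.createPullRequest")
```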
1.2 Traditional Method vs. Tool Search
Traditional Tool Calling Flow:
┌─────────────────────────────────────────────────────┐
│ System Prompt + All Tool Definitions (One-time full load) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ GitHub │ │ Slack │ │ Jira │ │ Sentry │...│
│ │ 35 Tools│ │ 11 Tools│ │ 20 Tools│ │ 5 Tools │ │
│ │ ~26K tok│ │ ~21K tok│ │ ~17K tok│ │ ~3K tok │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ Total: ~72K tokens consumed before conversation │
│ Remaining context window is severely limited │
└─────────────────────────────────────────────────────┘
Tool Search Flow:
┌─────────────────────────────────────────────────────┐
│ System Prompt + Tool Search Tool (~500 tokens) │
│ ┌───────────────┐ │
│ │ tool_search │ ← The only pre-loaded tool │
│ │ ~500 tok │ │
│ └───────────────┘ │
│ │
│ User: "Create a PR for me on GitHub" │
│ Model → tool_search("github pull request") │
│ → Discovers github.createPullRequest │
└─────────────────────────────────────────────────────┘
→ Dynamically load the tool definition (~800 tok) → Invoke github.createPullRequest(...)
Total: ~1.3K tokens (vs. ~72K with the traditional method)
Savings: ~98% of the token overhead from tool definitions
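The flow above can be sketched as a two-phase loop. Everything here is illustrative — the naive keyword scoring, the in-memory index, and the schema table are placeholders; a real deployment would run this search server-side (as in the OpenAI and Anthropic implementations) or client-side in the agent runtime, typically over embeddings rather than word overlap.

```python
# Phase 1: a searchable index of lightweight descriptions (always in memory).
TOOL_INDEX = {
    "github.createPullRequest": "Open a pull request on a GitHub repository.",
    "github.listIssues": "List issues in a GitHub repository.",
    "slack.postMessage": "Post a message to a Slack channel.",
}

# Full schemas live outside the model context until discovered.
FULL_SCHEMAS = {name: {"name": name, "input_schema": {"type": "object"}}
                for name in TOOL_INDEX}

def tool_search(query, k=2):
    """Rank tools by naive keyword overlap with the query (illustrative)."""
    words = set(query.lower().split())
    scored = sorted(
        TOOL_INDEX,
        key=lambda n: -len(words & set(TOOL_INDEX[n].lower().split())),
    )
    return scored[:k]

def run_turn(user_query, active_tools):
    """Phase 2: inject only the discovered schemas into the next request."""
    for name in tool_search(user_query):
        active_tools[name] = FULL_SCHEMAS[name]  # dynamic, just-in-time load
    return active_tools

tools = run_turn("create a github pull request for me", {})
assert "github.createPullRequest" in tools
assert "slack.postMessage" not in tools  # never entered the context
```

The model only ever pays for the search stub plus the handful of schemas the query actually surfaced.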
1.3 Core Design Philosophy
Tool Search embodies the principle of Just-in-Time Retrieval, a core methodology in context engineering. The central idea is:
Do not preemptively load all potentially useful information into the context. Instead, retrieve and inject it precisely when the model requires it.
This is conceptually identical to Retrieval-Augmented Generation (RAG); the only difference is that RAG retrieves knowledge documents, whereas Tool Search retrieves tool definitions.
2. A Complete Comparative Example: Traditional Method vs. Tool Search
Consider a practical scenario with three available tools: get_weather, search_restaurants, and book_reservation. The user asks, "What's the weather like in San Francisco today?" Only the get_weather tool is necessary, yet the traditional method sends the definitions for all three tools to the model.
2.1 The Traditional Method: Exhaustive Inclusion
Whether using Claude or OpenAI, the conventional approach is to include all tool definitions within a single request:
Traditional Anthropic Claude Request:
{
  "model": "claude-3-sonnet-20240229",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What's the weather like in San Francisco today?"}
  ],
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather information for a specified location.",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "The city name."},
          "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["location"]
      }
    },
    {
      "name": "search_restaurants",
      "description": "Search for nearby restaurants based on location, cuisine, and price range.",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "The city name."},
          "cuisine": {"type": "string", "description": "The type of cuisine, e.g., Chinese, Italian."},
          "price_range": {"type": "string", "enum": ["$", "$$", "$$$", "$$$$"]},
          "open_now": {"type": "boolean", "description": "Whether to show only currently open restaurants."}
        },
        "required": ["location"]
      }
    },
    {
      "name": "book_reservation",
      "description": "Book a table at a specified restaurant.",
      "input_schema": {
        "type": "object",
        "properties": {
          "restaurant_id": {"type": "string"},
          "date": {"type": "string", "description": "Reservation date (YYYY-MM-DD)."},
          "time": {"type": "string", "description": "Reservation time (HH:MM)."},
          "party_size": {"type": "integer"}
        },
        "required": ["restaurant_id"]
      }
    }
  ]
}
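For contrast, a Tool Search-style version of the same request pre-loads only a search stub and marks the three tools as deferred. The payload below is a hypothetical sketch assembled in Python: the `tool_search` stub and the `defer_loading` flag are illustrative placeholders, not documented fields of either vendor's API.

```python
def deferred(name, description):
    """Lightweight stub: name and one-line description only; the full
    schema is loaded on demand. `defer_loading` is an illustrative flag."""
    return {"name": name, "description": description, "defer_loading": True}

request = {
    "model": "claude-3-sonnet-20240229",
    "max_tokens": 1024,
    "messages": [
        {"role": "user",
         "content": "What's the weather like in San Francisco today?"}
    ],
    "tools": [
        # The only fully loaded tool: the search stub (a few hundred tokens).
        {"name": "tool_search",
         "description": "Search the tool catalog and load matching definitions."},
        # Deferred entries: discoverable by search, not yet in context.
        deferred("get_weather", "Get current weather for a location."),
        deferred("search_restaurants", "Find nearby restaurants."),
        deferred("book_reservation", "Book a restaurant table."),
    ],
}

# Only the search stub ships a full definition up front.
full_schemas = [t for t in request["tools"] if not t.get("defer_loading")]
assert [t["name"] for t in full_schemas] == ["tool_search"]
```

The model answers the weather question after one search-and-load round trip for `get_weather`; the restaurant tools never consume context.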
TITLE: GPT-5.4 Strikes Wall Street: The White-Collar Extinction Event & The Purge of 57,000 US Tech Jobs
GPT-5.4 has arrived, and it's poised to consume the white-collar desktop. With a 1M token context window and native Computer Use capabilities, it transforms Excel into a dynamic data analysis platform. Its performance surpasses that of top-tier consulting firms, signaling a paradigm shift for investment banking, law, and consulting. The fundamental nature of white-collar work is on the verge of a complete AI-driven disruption.
The recent release of OpenAI's GPT-5.4 has sent shockwaves through the AI landscape. A 1 million token context window, a quantum leap in "programming + agent" capabilities, and native computer use are set to redefine the architecture of AI agents.
"GPT-5.4 will extinguish all knowledge-based work," declared a software engineer after initial testing.
In a demonstration, the model scraped Zillow, extracted all San Francisco real estate prices, and imported the complete dataset into a Google Sheet—all within four minutes. Nearly the entirety of white-collar work can be replicated by this model. All browser-based tasks are within its purview, executed with superior accuracy and lower cost. The revolution that autonomous programming agents started is about to repeat itself across the entire white-collar domain.
Furthermore, a ChatGPT plugin powered by GPT-5.4 is now directly integrated into Excel. In recent evaluations, this plugin's performance on investment banking benchmarks surged from 43.7% to 87.3%. Users can now execute complex operations using natural language.
Building financial or analytical models, correcting and generating complex formulas, and performing natural language analysis and interpretation of data can now be accomplished conversationally within Excel. If this plugin achieves widespread adoption, Excel will cease to be a mere spreadsheet tool and will evolve into a conversational data analysis platform. This capability is already being rolled out to users in the United States, Canada, and Australia. The professional landscape is facing an imminent and radical transformation.
Surpassing McKinsey: GPT-5.4's Assault on Wall Street
The model is positioned to replace consulting firms, investment banks, and law firms.
The CEO of MercorAI stated that GPT-5.4 is the top-performing model they have tested on the APEX-Agents benchmark and the first to achieve an average score exceeding 50%. A year ago, frontier models could not even edit an Excel sheet, scoring below 5%. Now, in less than three months, GPT-5.4 has demonstrated a 15.7% improvement.
When an agent's operational efficiency reaches the 50% threshold, these systems transition from impressive demos to viable operational assets. It is a certainty that ChatGPT will soon outperform the best consulting firms, investment banks, and law firms.
The most critical data point is not the headline score, but the steepness of the improvement curve against increasingly realistic tasks. The key insight is that integrating "reasoning + coding" into a single model reduces context switching between development tools by approximately 80%. This is the true productivity multiplier.
In just one year, AI models have progressed from being unable to edit a spreadsheet to outperforming McKinsey. This rate of change far outpaces the strategic planning cycles of most corporations.
The Technical Dominance of GPT-5.4
What is the strategic implication of GPT-5.4's 1M context window? It means entire codebases can be processed within the model's context in a single pass, yielding coherent and reliable results. This eliminates the need for chunking, complex retrieval mechanisms, and context compression, preventing the loss of understanding that plagues smaller-context models. This capability alone will fundamentally alter how AI agents operate.
The combination of a 1M context window and native computer use enables agents to execute multi-file tasks without context degradation.
Prominent AI researcher Eric Hartford, after testing GPT-5.4, offered a succinct evaluation: "The improvements in comprehension and problem-solving speed are visibly exponential." He posed a rigorous test: build a compiler from scratch. Claude Code failed to proceed, and GPT-5.3 struggled significantly. GPT-5.4, in Hartford's words, "just gets it." The president of OpenAI immediately shared this assessment.
Building a compiler is an exacting benchmark that demands deep logical reasoning and a tight feedback loop within a single session—precisely the strengths of GPT-5.4.
Another analyst described the model's performance as "flawless" after it solved a complex Minecraft problem in approximately 24 minutes.
It has even been demonstrated that GPT-5.4 can reverse-engineer Nintendo Entertainment System (NES) ROMs. The supposedly unbreakable barriers of legacy code are rapidly dissolving. By simply providing the ROM file, the model can deconstruct the program's structure, restore its logic, and even explain the assembly techniques used by the original developers. The code dormant in old cartridges is being systematically dissected. With this capability, no code can be considered truly secure.
Solving Problems That Stump Physicists
With reverse-engineering and compiler construction mastered, how does the model fare with hard science?
CritPt, a notoriously difficult physics benchmark designed to expose the limits of large models, recently released its latest rankings. The benchmark consists of 71 unpublished, frontier-level problems spanning 11 subfields of physics, developed by over 50 active researchers from 30 institutions. Each problem undergoes an average of 40 hours of review, and solutions require outputs like floating-point arrays, symbolic expressions, or direct Python functions to prevent guessing.
GPT-5.4 Pro (xhigh) achieved the top score of 30.0%, securing first place. GPT-5.4 (xhigh) followed at 20.0%, with Gemini 3.1 Pro Preview in third at 17.7%. For context, the most advanced models of 2025 typically scored in the single digits.
While the ceiling of AI capability is being shattered, the floor of the job market is collapsing in tandem.
57,000 Jobs Vanish: The Tech Industry's "AI Depression"
In the same week as the GPT-5.4 release, economist Joey Politano posted a chilling set of figures: the US tech industry saw a net loss of 12,000 jobs last month, contributing to a cumulative loss of 57,000 positions over the past year.
His follow-up analysis was even more stark. The current contraction in tech employment is nearly on par with the worst moments of the 2024 tech recession and is more severe than the downturns of 2008 and 2020.
A long-term view is more alarming. On a chart of year-over-year change in US tech employment from 1990 to 2026, the current downward trajectory, which began in 2023, is comparable in scale and duration to only one other period: the dot-com bubble burst of 2001.
However, the underlying fundamentals are different this time. The dot-com crash was a financial clearing event driven by exhausted capital and failed business models. Companies failed and jobs were lost, but the market demand remained, allowing for eventual re-employment.
Today, the situation is inverted. The profits of leading tech companies are not collapsing; they are soaring. Jobs are disappearing not because companies are failing, but because with AI, they no longer require the same number of people.
A worker displaced in 2001 could wait for the market to recover. A worker displaced in 2026 faces a market where their role has been permanently eliminated. A strange paradox has emerged: while overall tech employment plummets, the demand for AI-specific roles is surging. Companies are not shrinking; they are replacing their workforce. A five-person task can now be completed by one person and an AI. The other four are now redundant.
Nobel Laureate's Warning: The Optimal Number of Employees is Zero
For those who believe this is merely an internal adjustment within the tech sector, consider the words of Joseph Stiglitz, the 83-year-old Nobel laureate in economics and former chief economist of the World Bank.
Having witnessed the financial crisis, the failures of globalization, and the hollowing out of the American middle class, Stiglitz sees a new crisis unfolding. In a recent interview with Fortune, his assessment was concise and brutal:
"If we don't manage AI, it will inevitably lead to a massive increase in inequality. Inequality is already one of the most egregious and divisive issues in our society, so this is of grave concern to me."
Stiglitz's most salient point is not the identification of the problem, but his explanation of the mechanism behind it. Technology strategist Daniel Miessler recently made a widely cited statement: "The most perfect number of human employees in any company is zero." It is a harsh but precise articulation of the executive mindset—labor has always been a cost center. AI is the first technology to credibly promise its complete elimination.
In his book The Road to Freedom, Stiglitz systematically deconstructs this chain of events: AI enables corporations to shed labor and concentrate profits at the apex of the pyramid, while the risks of this transition are externalized to the workforce and the general public. The irony is that the tech leaders most aggressively promoting AI are simultaneously advocating for cuts to the public institutions that could buffer its societal impact.
Your Job Still Exists, But the Clock is Ticking
GPT-5.4 has achieved an 83% score on GDPval, a record 30% on the CritPt physics benchmark, and 75% on OSWorld for computer operations, surpassing the human baseline. These metrics convey a single, unambiguous signal: AI is not a future possibility for white-collar replacement; it is the present reality.
The critical question has never been whether AI can do your job. The questions are: once AI does your job, who captures the resulting profits? And where do the displaced workers go?
This is not a technical problem. It is a question of choice. And the window for making that choice is closing rapidly.
References:
- https://fortune.com/2026/03/06/nobel-prize-economist-joseph-stiglitz-ai-inequality-tech-bros/
- https://x.com/JosephPolitano/status/2029916364664611242
- https://artificialanalysis.ai/evaluations/critpt
- https://x.com/sawyerhood/status/2030041230512476481
- https://x.com/Angaisb_/status/2029635731585372598
TITLE: 2026: Generosity, Brutality, and the Fog of War for AI Founders
Foreword
This is a compelling analysis. Its greatest strength is its refusal to indulge in the superficial excitement of "AI is powerful." Instead, it situates the present moment within the longer arc of technological history. For a founder, this perspective is critical. Every time a core capability is rapidly commoditized, the opportunity landscape expands, but the competitive dynamics become exponentially more brutal. It's easier than ever to build something, which paradoxically makes what to build, for whom, and how to get noticed the paramount questions.
My own experience confirms this duality: our era is both generous and cruel to founders. Generous, in that a single individual can now achieve what once required an entire team. Cruel, in that the mere ability to build is no longer a competitive advantage. What truly matters are judgment, product sense, a deep understanding of the user, and the courage to rewrite old problems around new capabilities.
This article is essential reading. Not because it provides answers, but because it helps us see the questions with greater clarity.
The author, "Jia Yuan," is an AI founder whose previous product, Devv.AI, reached over a million users. He is on the verge of launching a new product.
01. The Acceleration of 2026
In February 2026, Andrej Karpathy (former Director of AI at Tesla, founding member of OpenAI) described a remarkably specific inflection point on X.
In November, his programming workflow was 80% writing code by hand and 20% delegating to an agent. By December, the ratio had fully inverted: 80% of his work was directing an agent in natural language, with the remaining 20% spent on edits and final touches.
He recounted a recent series of tasks he assigned to an AI agent using natural language: log into a remote server, configure SSH keys, install and test a model, build a web UI, configure a system service, and write documentation.
The agent completed everything autonomously in 30 minutes, encountering and resolving multiple errors along the way. Just three months prior, the same set of tasks would have consumed an entire weekend.
DHH (creator of Ruby on Rails) had an equally direct reaction:
"Biggest and fastest change in the 40 years I've tried to make computers do my bidding. And surprisingly, the most fun too!"
As a founder on the front lines of AI, I have also reverted to builder mode for the past three months, averaging over 100M tokens consumed daily and making more than 1,000 commits.
This acceleration is real. A single person's output in one week can now surpass what an entire team produced over several months in the past.
This acceleration isn't just happening at the individual level. In the first two months of 2026, the entire technology landscape appears to have entered a period of rapid acceleration.
The explosion of OpenClaw. This product, which brought Claude Code-level agent capabilities to the masses via Telegram and Slack, went viral in late January. Its success validates a recurring pattern: Virality = Democratized Experience—extending an experience already available to a niche group to a much larger user base. A unified entry point, persistent memory, and a flywheel of composable Skills allowed non-technical users to feel, for the first time, that "AI can actually do things for me."
The capability of Coding Agents is crossing a critical threshold. Tools like Claude Code and Codex can now independently complete tasks in moderately complex codebases (on the order of 100,000 lines) with minimal human intervention. This is not an incremental improvement. When AI transitions from "assisting with code" to "leading the coding process," the fundamental logic of the entire development workflow is transformed.
Breakthroughs in Long-horizon Agents. In January, Sequoia published a bluntly titled article: "This is AGI." Their definition wasn't based on a benchmark score but on a functional judgment: AI agents can now work autonomously for hours, making and correcting errors, and iterating continuously until a task is complete. Data from METR shows that the complexity of tasks agents can handle is doubling approximately every 7 months. Based on this trend, Sequoia's article extrapolated: by 2028, agents could independently complete complex tasks equivalent to a full day of human expert work; by 2034, a full year's worth; by 2037, a century's worth.
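The gaps between those milestone years follow directly from the 7-month doubling law. A quick sanity check (the helper function and rounding are mine, not METR's):

```python
import math

DOUBLING_MONTHS = 7  # METR: agent task horizon doubles roughly every 7 months

def months_to_scale(factor: float) -> float:
    """Months needed for the task horizon to grow by the given factor."""
    return DOUBLING_MONTHS * math.log2(factor)

# One expert-day -> one expert-year is roughly a 365x jump:
print(round(months_to_scale(365) / 12, 1))  # ~5.0 years, matching 2028 -> 2034
# One expert-year -> one expert-century is a 100x jump:
print(round(months_to_scale(100) / 12, 1))  # ~3.9 years, matching 2034 -> 2037
```

The exponential compounding is why the later jumps (a year to a century) take fewer calendar years than the earlier ones.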
(Note: By the time this article was published, OpenClaw had already surpassed React to become the most-starred project on GitHub.)
Structural shifts at the enterprise level. On February 26, Block founder Jack Dorsey announced the company would shrink from 10,000+ employees to under 6,000—a cut of over 40%. He attributed the layoffs to AI: "intelligence tools... are enabling a new way of working which fundamentally changes what it means to build and run a company." The market's reaction was immediate: the stock price surged 20% that day.
(It's worth noting that critics argue Block's layoffs were more about correcting for over-hiring during the pandemic, when the company ballooned from ~4,000 to 13,000 employees. Even Sam Altman has acknowledged the phenomenon of "AI washing." But regardless of the true reason for the layoffs, the fact that the market chose to believe the AI narrative is itself telling.)
This is not about incremental efficiency gains. The core explosive event is this: The capability of AI Coding (or, more broadly, Agents) has crossed a baseline and is being rapidly commoditized. Programming is no longer a scarce skill requiring years of training, but a resource available on-demand with near-zero marginal cost.
But amidst this acceleration, one thing gives me pause: This has happened before.
Throughout history, whenever a once high-barrier capability suddenly becomes cheap and massively accessible, it triggers a series of predictable structural changes: the decline of old professions, the birth of new ones, the reorganization of value chains, and the migration of power nodes. History doesn't tell us what to do, but it can at least tell us what not to do.
This article attempts to revisit the scenes of those historical moments to understand what happened then, and then return to the present to see what it can help us understand now.
02. When Copying Became Free
Before Gutenberg, every book in Europe had to be copied by hand, word for word, by monastic scribes. A hand-copied Bible cost the equivalent of a clerk's three-year salary. The total number of books in all of Europe was estimated to be around 30,000. The "copying" of knowledge was an expensive capability monopolized by the church and a small elite.
Around 1440, Gutenberg independently developed a practical system of movable metal type in Europe. By 1455, the first Gutenberg Bible was printed. Thereafter, the price of books began a sustained decline of about 2.4% per year for over a century, falling by two-thirds by the year 1500.
A key competitive dynamic emerged: when a new printer entered a city's market, local book prices would immediately drop by about 25%. By 1480, 110 European cities had printing presses; by 1500, that number had reached 236, and the total volume of books exploded from 30,000 to between 10 and 20 million.
The supply side exploded. But the consequences were far more profound than just "more books":
- Decline of old professions: Demand for scribes plummeted, and the monastic scriptoria withered within decades. In 1492, Abbot Johannes Trithemius wrote In Praise of Scribes, attempting to argue for the spiritual value of manual transcription.
- Birth of new professions: The printing press created an entire new industry: typesetters, proofreaders, bookbinders, illustrators, publishers, and booksellers. These jobs did not exist before Gutenberg.
- Oversupply and uneven quality: A flood of low-quality printed materials appeared—religious pamphlets, prophecy books, erotic literature.
- Unforeseen second-order effects: The Protestant Reformation (Luther used the press to disseminate his ideas on a massive scale), the Scientific Revolution (academic papers could circulate across nations), and the rise of the nation-state (vernacular publications reinforced national identity)—none of which Gutenberg could have predicted.
This story reveals a recurring pattern. Clayton Christensen proposed the "Law of Conservation of Attractive Profits": When one layer of the value chain is commoditized and its profits disappear, new proprietary products emerge in adjacent layers to capture those profits.
Ben Thompson articulated this logic more directly in his analysis of Netflix: "to break an existing integration — to commoditize and modularize it — is to destroy the value of the incumbent, while enabling a new entrant to integrate and capture value in a different part of the value chain."
Value doesn't just vanish; it migrates.
After the capability of "copying" was commoditized, value migrated from "scribing" to "content creation" and "curation/distribution." Publishers—not printers—became the new nodes of power.
Joel Spolsky, in his 2002 essay "Strategy Letter V," summarized this logic as a strategic principle: "Commoditize your complement." Smart companies will actively commoditize their complementary goods to increase demand for their core product. Microsoft commoditized PC hardware to enhance the value of its operating system; Netscape made the browser free to enhance the value of its servers.
Commoditization has another often-overlooked structural consequence: When the supply side explodes, the demand side (attention, budget, time) does not grow proportionally. The result is an extreme power-law distribution, where a tiny number of winners at the head capture the vast majority of the value, while the long tail of output goes largely unseen.
In the 50 years after the printing press, the number of books in Europe grew from 30,000 to 20 million, but the classics that have survived to this day represent only a minuscule fraction of that total.
In a world of oversupply, attention itself becomes the scarcest resource.
This supply-side explosion is replaying itself right now. Data from a16z shows that new iOS app releases in December 2025 were up 60% year-over-year, with a 24% cumulative increase over the past 12 months. They attribute this phenomenon to the rise of agentic coding (or "vibe coding"). This is a direct echo of the app explosion that followed the release of the iPhone SDK in 2008: when the barrier to creation plummets, the supply side always explodes.
03. When Power Became Cheap
In the late 19th century, factories were powered by steam engines or water wheels. The entire layout of a factory was designed around a massive central driveshaft (the line shaft). A steam engine in the basement turned the main shaft, which in turn drove the machinery on each floor via a system of belts. Factories had to be built as long, narrow, multi-story buildings, with all machines clustered tightly around the driveshaft. Building a factory required immense capital—not just for the machines, but for constructing the power system itself.
Electricity changed everything. The electrical grid allowed any factory to get power "on-demand," eliminating the need to build a private steam engine.
In 1899, electric motors accounted for only 5% of the total power in U.S. manufacturing; by 1909, it was 23%; by 1929, it had reached 77%. This transition occurred in three stages: first, large electric motors replaced steam engines to drive the existing line shafts; then, machines were grouped together, with each group driven by a smaller motor; finally, the line shaft was completely eliminated, and each machine was equipped with its own individual motor.
But there is a critically important lesson here.
In his famous 1990 paper, "The Dynamo and the Computer," economist Paul David pointed out that there was a lag of approximately 40 years between the commercialization of electricity (with power stations built in New York and London in 1881) and its measurable impact on economic productivity (in the 1920s). An observer in 1900 would have found almost no evidence that the "electric revolution" was making business more efficient.
Why?
Because early factories simply replaced their steam engines with electric motors, leaving everything else unchanged—the layout, the processes, the organizational structure. They were using a new tool to do an old job.
The real productivity boom occurred in the 1920s—when manufacturing total factor productivity (TFP) grew at an astonishing rate of ~5% per year, accounting for 84% of the TFP growth for the entire economy. This happened when a new generation of factories was completely redesigned around the properties of electricity: single-story buildings replaced multi-story ones, machines could be laid out according to the workflow rather than power transmission, and factories became brighter and safer. This ultimately gave rise to Ford's assembly line. Ford's factory was not "an old factory powered by electricity," but "a new production system designed around the properties of electricity."
This echoes, across a century, the IT productivity paradox articulated by Robert Solow in 1987: "You can see the computer age everywhere but in the productivity statistics." Research by Erik Brynjolfsson in 1993 confirmed this: despite a hundredfold increase in U.S. computing power between the 1970s and 1980s, annual labor productivity growth fell from over 3% in the 1960s to around 1%.
Productivity only increases when technological investment is accompanied by complementary organizational change—just like the story of electricity.
The same paradox is replaying in the field of AI coding. A rigorous randomized controlled trial conducted by METR in 2025 found that when 16 experienced open-source developers used AI tools on projects they were familiar with (having maintained them for an average of 5 years), the time to complete tasks was 19% slower—even though the developers had predicted they would be 24% faster. Larger surveys show that while 75% of engineers use AI tools, most organizations are not seeing measurable performance improvements. The reason?
AI accelerates the single step of code generation but creates new bottlenecks in code review, integration, and testing. It's like speeding up one machine on an assembly line; you don't get a faster factory, you get a bigger pile-up.
This does not mean AI coding is without value. The key is who is using it and how they are using it. Karpathy's example—a weekend project compressed into 30 minutes—perfectly illustrates the point: when the user possesses sufficient system architecture skills and judgment, AI is a massive lever. The developers in the METR study were slower on "familiar projects" precisely because their old workflows were not optimized for AI. True efficiency gains, as with electricity, will require redesigning the entire way of working around the unique properties of AI.
04. When the Barrier Collapsed
Before AWS, launching an internet service required buying servers, renting space in a data center, and hiring an operations team. In his essay "Why Software Is Eating the World," Marc Andreessen recalled that in 2000, when his partner Ben Horowitz was CEO of Loudcloud, the cost for a single customer to run a basic internet application was about $150,000 per month.
In 2006, AWS launched S3 and EC2. By 2011, the cost to run that same application on AWS had dropped to about $1,500 per month—a 100-fold decrease. AWS cut its prices over 60 times between 2006 and 2014; the cost of S3 storage fell by 86% over 12 years (from $0.15/GB to $0.022/GB).
This collapse of barriers triggered a startup explosion. The capital required to launch an internet company fell from millions of dollars to a few thousand. Y Combinator was able to launch in 2005 and back founders with minimal seed funding (initially around $20,000) precisely because of this dramatic shift in infrastructure costs. Instagram had only 13 employees when it was acquired by Facebook for $1 billion. Companies like Airbnb, Dropbox, and Stripe could exist because they didn't need to build their own data centers.
The SaaS market grew from $31.4 billion in 2015 to over $250 billion in 2024, with more than 16,500 SaaS companies in the U.S. alone. But each vertical eventually converged to 2-3 winners—another power-law distribution, following the same pattern as the supply explosion after the printing press.
Value migrated from "owning servers" to "owning users," and then to "owning the data flywheel" and "owning the network effect."
This supply-side explosion is also accompanied by a recurring cycle: first Unbundle, then Re-bundle.
Jim Barksdale famously said, "There are only two ways to make money in business: One is to bundle; the other is to unbundle." When a capability becomes cheap, integrated solutions are broken apart into smaller, more focused products. But when fragmentation reaches an extreme, a new integrator emerges to re-bundle these fragments into a new, integrated experience.
This cycle has played out repeatedly throughout history:
- The printing press first unbundled the church's monopoly on knowledge. Then, publishers re-bundled content curation and distribution.
- Cloud computing first unbundled IT infrastructure. Then, AWS/GCP/Azure re-bundled it into new, integrated cloud platforms.
- Journalism was first unbundled by blogs and social media—journalists could bypass newspapers to publish directly, and readers could consume single articles instead of subscribing to an entire paper. Then, Substack and paid newsletters re-bundled independent writing: authors gained a direct subscription relationship, and readers received a curated package of content. Value migrated from "owning the printing press" to "owning the reader's trust."
05. The Laws of Commoditization
Three stories spanning centuries—the printing press, the electric motor, the cloud server—follow the same laws:
| Layer Being Commoditized | Layer Where Value Migrates |
|---|---|
| Scribing | Content Creation & Publishing |
| Factory Power | Production Process Design |
| Server Infrastructure | Application Experience & Network Effects |
| Code Writing | Problem Definition, Product Judgment, User Acquisition |
AI is commoditizing coding, but it is not commoditizing the definition of what problem to solve.
When "how to build it" is no longer the bottleneck, "what to build" and "for whom" become the critical dimensions of differentiation.
The same power-law distribution is re-emerging in the AI agent space: countless copycats are appearing, but the Matthew effect is extremely strong—not because the followers are incompetent, but because in a world of oversupply, attention itself is the ultimate scarce resource. Using AI to build a traditional SaaS product faster—"AI helps you build a CRM quicker"—is fundamentally just replacing the steam engine with an electric motor. The defensive moat is paper-thin.
The real opportunity lies in redesigning product forms around the new reality of zero-marginal-cost code production.
The AI coding space is currently in its unbundling phase: the value of standardized tools is decreasing, while the value of long-tail, custom-built tools is increasing. But history teaches us that re-bundling will inevitably follow.
06. Where We Are Now
Economist Carlota Perez proposed an influential framework describing how every technological revolution passes through two major periods.
- Installation Period: The new technology enters the market, infrastructure is built, and financial capital pours in, creating a speculative bubble. This period is characterized by chaos, experimentation, and overinvestment.
- Turning Point: The bubble bursts, a recession follows, and institutional frameworks begin to adapt to the new technology.
- Deployment Period: The technology is widely adopted into the mainstream. If the institutional arrangements are right, a "golden age" can begin, where the full potential of the technology is unleashed.
| Technological Revolution | Installation Period | Turning Point | Deployment Period |
|---|---|---|---|
| Railways | 1830s-1840s (Railway Mania) | 1847 Railway Bubble Burst | 1850s-1870s |
| Electricity/Heavy Industry | 1880s-1920s | 1929 Great Depression | 1930s-1960s |
| Internet/IT | 1990s | 2000 Dot-com Bubble | 2003-2020s |
| AI | 2023-? | ? | ? |
If Perez's framework holds, AI is currently in the early stages of its Installation Period—characterized by massive capital influx, a proliferation of labs, and extremely crowded consensus bets. This stage perfectly matches what we are seeing: oversupply, countless copycats, and a strong Matthew effect. The latter half of the Installation Period typically features a speculative bubble. Only after that bubble bursts do we enter the true "Deployment Period," when the infrastructure has matured, institutional frameworks have adapted, and the full potential of the technology begins to be realized.
According to this framework, the greatest value creation typically occurs in the Deployment Period, not the Installation Period.
07. What Might Happen
Programmers will not disappear, but the definition of a "programmer" will change.
Scribes did not vanish overnight—commissioned manuscripts were still being produced decades after the invention of the printing press—and steam-powered factories did not disappear the moment electricity arrived. Programmers will likewise persist. But the differentiating factor in competition will shift from "the ability to write code" to "system design and architectural judgment."
There is an important distinction to be made here: Technology replaces "tasks," not "people." Work that can be broken down into explicit steps—whether cognitive (like data entry) or physical (like assembly line work)—will be automated. Work that requires judgment, creativity, and complex communication will be amplified by technology. The result is that high-end skills become more valuable, mid-level skills are commoditized, and the workforce is polarized.
Similarly, the value of a programmer will migrate from the increasingly routine task of "writing code" to the non-routine tasks of system architecture judgment, product intuition, taste, and the debugging and integration of complex systems.
The biggest winners will not be "the people who can write code fastest with AI."
In every historical instance of commoditization, the biggest winners were not those who executed faster, but those who redefined the rules of the game. Gutenberg was not the biggest winner; publishers and authors were. The electric utilities were not the biggest winners; Ford was. AWS is certainly a winner, but so are Airbnb and Stripe—they leveraged commoditized infrastructure to create business models that were previously impossible.
When coding is commoditized, the winners will likely not be those who use AI to code fastest, but those who leverage zero-marginal-cost code production to redefine product forms, distribution methods, or value capture models.
After unbundling, the opportunity for re-bundling is brewing.
We are currently in the unbundling phase—standardized tools are being disaggregated, and a long tail of personalized tools is emerging (consider the recent shift from SaaS to personalized Agents). But if historical patterns hold, these fragmented, long-tail tools will eventually require a new integration layer. This could be an "app store" for discovering and reusing AI-generated disposable tools, a "composable platform" that allows users to assemble multiple long-tail tools like Lego bricks, or an "AI-native operating system" that treats the ability to generate, run, and manage code as a fundamental primitive.
And the forty-year lesson of electricity reminds us to be patient.
The way we use AI today—making it write traditional software faster—is likely just the "replacing the steam engine with an electric motor" phase. The true "assembly line moment"—redesigning the entire software paradigm around the unique properties of AI—may still be years or even longer away. But when it arrives, it could give rise to entirely new product forms that were previously impossible:
- Disposable Software: Custom tools built for a specific user in a specific context, and then discarded.
- Adaptive Software: Applications that generate and modify their own code in real-time based on user behavior.
- Hyper-Long-Tail Software: Bespoke products built for every incredibly niche need.
But there is one crucial caveat: The pace of AI development is far faster than that of the electrical age. The 40-year lag of the electric revolution was partly due to the long cycle of building physical infrastructure—power grids, factories, and training workers all took time. The "infrastructure" for AI is software and compute, with iteration cycles measured in months. A 2025 McKinsey survey found that organizations that redesigned their end-to-end workflows before adopting AI were nearly three times more likely to see significant financial returns. This suggests that the "assembly line moment" will not wait 40 years. It could be just a few years away.
08. The Worst of Times for Startups
If the preceding analysis is correct, then for founders, this is both the best of times and the most brutal of times.
The good side is obvious: The barrier to building a product has never been lower. A single person, over a weekend, with a few hundred dollars in API costs, can create something that once required a team and months of work. The distance from idea to prototype has been compressed to its limit. "Can it be built?" is no longer the question.
But this is precisely the beginning of hell mode.
When everyone can build products quickly, the act of "building" itself ceases to be a competitive advantage. What you can build in a weekend, others can too. Your innovation today will be replicated tomorrow.
This leads to several brutal realities:
- Exponentially increased competition. Every market is crowded. Because the barrier to entry is lower, more people enter. Because the iteration speed is faster, everyone is shipping constantly. You are no longer racing against a few competitors; you are racing against everyone on the internet who can think of the same idea.
- Attention as the ultimate bottleneck. In a world of oversupply, being seen is harder than building. Dozens of new products launch on Product Hunt every day. New AI tools are demoed on X every hour. The cost of acquiring user attention—whether through paid acquisition or content marketing—is rising rapidly, while the differentiation of the products themselves is diminishing.
- An extreme winner-take-all Matthew effect. History shows that after every supply-side explosion, value becomes hyper-concentrated at the top. This means that mid-level success may disappear. You either become a top player in your category or you struggle to survive in the long tail.
- The crumbling of moats. The traditional moats of software companies—technical complexity, engineering team size, years of accumulated code—are becoming fragile in the face of AI. Nicolas Bustamante analyzed the ten major moats of vertical software in the age of LLMs and found that five are crumbling (learned interfaces, custom business logic, public data access, scarce talent, bundling) while five remain strong (proprietary data, regulatory compliance, network effects, transaction embedding, system-of-record status). The key insight is that the moats being destroyed are precisely those that once prevented competitors from entering the market.
Simply put: If your advantage is in "how you do it," you are being commoditized. If your advantage is in "what you have" (data, users, compliance), you are becoming safer.
09. Surviving in Hell Mode
So, in this hell mode, what strategies might be effective?
- Don't use AI to do old things. Using AI to build a traditional SaaS faster is just replacing the steam engine with an electric motor. The question you need to ask is: If the cost of code production were zero, what product forms would become possible that were impossible before? Disposable software? Adaptive software? Hyper-personalized experiences? (Of course, in the short term, there is an arbitrage window for "doing old things faster with AI"—you can capture market share with lower costs and higher speed before competitors catch up. But this window will close quickly, because what you can do, others can do too.)
- Build your moat outside the code. If the code itself is no longer a barrier, then the barrier must come from somewhere else: unique data assets, strong user relationships, hard-to-replicate distribution channels, or brand and community.
- Speed is still important, but direction is more important than speed. In a world where everyone can execute quickly, judgment—knowing what to build and for whom—becomes the real differentiator. Slowing down to think about the right questions may be more valuable than quickly executing the wrong answers.
- Embrace unbundling, while looking for the re-bundling opportunity. We are currently in the unbundling phase—long-tail tools are emerging, and standardized products are being disaggregated. But history tells us that re-bundling is inevitable. Ask yourself: What integration layer will these fragmented tools eventually need? Who will provide it?
- Accept that this is a long game. The chaos of the Installation Period may last for several more years. This is not an era of "quickly find PMF and then scale," but an era that requires constant adaptation and redefinition of oneself. Patience and resilience may be more important than any single skill.
10. No Creation Without Destruction
Every major commoditization in history has been accompanied by a particular kind of pain: those who had accumulated an advantage in the old order find their advantage evaporating. The scribe's decade of practice becomes worthless in the face of the printing press. The factory owner's massive investment in a driveshaft system becomes a liability in the age of electricity. The programmer's years of accumulated coding skill are being matched by AI on a timescale of months.
But the other side of "no creation without destruction" is this: The disappearance of old advantages also means the disappearance of old barriers.
Those who were previously excluded for lack of resources, teams, or engineering capability can now compete. Things that once required hundreds of people and tens of millions of dollars can now be started by a single person over a weekend.
This is why we are at the beginning of a transformation.
Not because AI will replace everyone's job—history shows that technology rarely "eliminates" professions outright; it more often redefines their content.
But because: When a core capability is commoditized, the entire value chain is reorganized. And the moment of value chain reorganization is precisely the moment when new players enter the field and new rules are written.
Gutenberg did not know the printing press would trigger the Protestant Reformation. Ford did not know the assembly line would reshape the middle class. When AWS launched in 2006, no one could have predicted that companies like Airbnb and Stripe would become possible as a result.
Similarly, we do not know today what new product forms, new business models, or new ways of creating value will emerge when coding is fully commoditized.
But one thing is certain: Those who are first to understand the new rules and first to redesign themselves—whether as individuals, teams, or companies—around the new capabilities will have the advantage in the new order.
No creation without destruction.
References
- Autor, D.H., Levy, F., & Murnane, R.J. (2003). "The Skill Content of Recent Technological Change: An Empirical Exploration." The Quarterly Journal of Economics, 118(4), 1279-1333.
- Brynjolfsson, E. (1993). "The Productivity Paradox of Information Technology." Communications of the ACM, 36(12), 66-77.
- Christensen, C. & Raynor, M. (2003). The Innovator's Solution. Harvard Business School Press.
- David, P.A. (1990). "The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox." American Economic Review, 80(2), 355-361.
- Dittmar, J. (2011). "Information Technology and Economic Change: The Impact of The Printing Press." The Quarterly Journal of Economics, 126(3), 1133-1172.
- Perez, C. (2002). Technological Revolutions and Financial Capital. Edward Elgar Publishing.
- Solow, R. (1987). "We'd better watch out." New York Times Book Review, July 12, p.36.
- Andreessen, M. (2011). "Why Software Is Eating the World." Wall Street Journal.
- Bustamante, N. (2026). "The Crumbling Workflow Moat."
- Grady, P. & Huang, S. (2026). "2026: This is AGI." Sequoia Capital.
- Spolsky, J. (2002). "Strategy Letter V."
- Thompson, B. (2015). "Netflix and the Conservation of Attractive Profits." Stratechery.
- Additional data sourced from a16z, McKinsey & Company, METR, and historical economic analyses.
┌──────────────────────────────────────────────────────────────────────────┐
│ API Server-Side Tool Index                                               │
│ (Initially inaccessible to the model; incurs no token cost)              │
│                                                                          │
│ get_weather:        Complete input_schema...                             │
│ search_restaurants: Complete input_schema...                             │
│ book_reservation:   Complete input_schema...                             │
│                                                                          │
│ Awaiting injection into the model's context upon a `tool_search` query.  │
└──────────────────────────────────────────────────────────────────────────┘
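The index above can be sketched as a small registry that exposes only names and short descriptions for search, releasing a full schema only on demand. Everything here is illustrative (the class names are invented, and a naive keyword-overlap score stands in for BM25), not a real vendor API:

```python
from dataclasses import dataclass

@dataclass
class ToolEntry:
    name: str
    description: str    # short, searchable text
    input_schema: dict  # full JSON schema, kept out of context until needed

class ToolIndex:
    """Server-side registry: full definitions live here, not in the prompt."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolEntry] = {}

    def register(self, entry: ToolEntry) -> None:
        self._tools[entry.name] = entry

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # Naive keyword overlap stands in for BM25 scoring.
        terms = set(query.lower().split())
        scored = []
        for t in self._tools.values():
            hits = len(terms & set(f"{t.name} {t.description}".lower().split()))
            if hits:
                scored.append((hits, t.name))
        scored.sort(reverse=True)
        return [name for _, name in scored[:top_k]]

    def load(self, name: str) -> dict:
        # Called only after a search hit: just-in-time schema injection.
        t = self._tools[name]
        return {"name": t.name, "description": t.description,
                "input_schema": t.input_schema}

index = ToolIndex()
index.register(ToolEntry("get_weather", "current weather forecast conditions",
                         {"type": "object",
                          "properties": {"location": {"type": "string"}}}))
index.register(ToolEntry("book_reservation", "book a restaurant table",
                         {"type": "object",
                          "properties": {"time": {"type": "string"}}}))

hits = index.search("weather forecast current conditions")
print(hits)  # ['get_weather']
```

Only the `search` stub's definition ever occupies the stable prompt prefix; the schemas returned by `load` are paid for only when actually needed.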
Model Response Trajectory: Search, then Invoke.
{
  "role": "assistant",
  "content": [
    { "type": "text",
      "text": "Let me search for weather-related tools." },
    { "type": "server_tool_use",
      "id": "srvtoolu_01XYZ",
      "name": "tool_search_tool_bm25",
      "input": { "query": "weather forecast current conditions" } },
    { "type": "tool_search_tool_result",
      "tool_use_id": "srvtoolu_01XYZ",
      "content": {
        "type": "tool_search_tool_search_result",
        "tool_references": [
          { "type": "tool_reference", "tool_name": "get_weather" }
        ]
      } },
    { "type": "text",
      "text": "Found the weather tool. Let me query the weather for San Francisco." },
    { "type": "tool_use",
      "id": "toolu_01ABC",
      "name": "get_weather",
      "input": { "location": "San Francisco", "unit": "celsius" } }
  ],
  "stop_reason": "tool_use"
}
Upon a search hit, the model's context window is dynamically updated—the full definition of `get_weather` is appended:
┌────────────────────────────────────────────────────────────────────┐
│ Model Context Window (Post-Search)                                 │
│                                                                    │
│ [System Prompt] ← Cache Hit                                        │
│ [tool_search_tool_bm25 Definition] ← Cache Hit                     │
│ [User Message] ← Cache Hit                                         │
│ [Model: "Let me search for weather-related tools."] ← New Content  │
│ [tool_search → tool_reference: get_weather]                        │
│ [Model: "Found the weather tool..."]                               │
│                                                                    │
│   ┌─────────────────────────────────────┐                          │
│   │ get_weather Tool Definition         │                          │
│   │ (name, description, input_schema)   │                          │
│   └─────────────────────────────────────┘                          │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
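The cache discipline in the diagram (static prefix untouched, discovered schemas appended at the tail) can be expressed as one small assembly step. The message shapes and field names below are simplified placeholders I chose for illustration, not any provider's actual wire format:

```python
def build_context(system_prompt, search_stub, turns, discovered_tools):
    """Order the context so the static prefix stays byte-identical
    across turns (cacheable) and dynamic schemas land at the end."""
    prefix = [
        {"type": "system", "text": system_prompt},   # stable -> cache hit
        {"type": "tool_stub", "name": search_stub},  # lightweight search tool
    ]
    # Full schemas discovered via tool search are appended after the
    # conversation; splicing them into the prefix would invalidate the cache.
    tail = [{"type": "tool_def", "name": t["name"],
             "schema": t["input_schema"]} for t in discovered_tools]
    return prefix + turns + tail

# The prefix is identical before and after a tool is discovered:
before = build_context("You are an agent.", "tool_search_tool_bm25",
                       [{"type": "user", "text": "Weather in SF?"}], [])
after = build_context("You are an agent.", "tool_search_tool_bm25",
                      [{"type": "user", "text": "Weather in SF?"},
                       {"type": "assistant", "text": "Searching tools..."}],
                      [{"name": "get_weather",
                        "input_schema": {"location": "string"}}])
assert before[:2] == after[:2]  # stable prefix -> prompt cache survives
print(after[-1]["name"])  # get_weather
```

Because the first two entries never change between calls, a prefix-matching prompt cache keeps hitting across turns, which is where the TTFT savings come from.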
TITLE: GPT-5.4 and the Great White-Collar Extinction Event
Executive Summary
GPT-5.4 has arrived, and it's poised to consume the entire white-collar desktop environment. With a 1 million token context window and native computer use capabilities, it transforms tools like Excel into conversational data analysis platforms. Its performance is already surpassing that of top-tier consulting firms, signaling an existential threat to investment banking, law, and consulting. The era of AI-driven workforce disruption is no longer a future hypothetical; it is the present reality.
The Extinction-Level Event for Knowledge Work
"GPT-5.4 will render all knowledge-based jobs obsolete." This was the stark conclusion of a software engineer after initial testing. In a demonstration, the model scraped Zillow, extracted all San Francisco real estate prices, and imported the formatted data into a Google Sheet—all within four minutes.
Nearly every task performed by a white-collar professional can now be replicated by GPT-5.4. All browser-based workflows are within its capabilities, executed with superior accuracy and at a fraction of the cost. The revolution that autonomous coding agents started is about to repeat itself across the entire professional landscape.
A ChatGPT plugin powered by GPT-5.4 is now directly integrated into Excel. In recent investment banking benchmarks, this plugin's performance skyrocketed from 43.7% to 87.3%. Users can now perform complex operations using natural language. Building financial or analytical models, correcting and generating intricate formulas, and conducting nuanced data analysis and interpretation are now conversational tasks within Excel. This transforms the spreadsheet from a mere tool into a dynamic, dialogue-driven data analysis platform, currently accessible to users in the United States, Canada, and Australia. The traditional professional toolkit is being fundamentally and irrevocably disrupted.
Surpassing McKinsey: GPT-5.4's Assault on Wall Street
The CEO of MercorAI confirmed that GPT-5.4 is the top-performing model tested on their APEX-Agents benchmark, and critically, it is the first to achieve an average score exceeding 50%. A year ago, frontier models struggled to even edit an Excel sheet, scoring below 5%. Today, GPT-5.4 has demonstrated a 15.7% improvement in less than three months.
When an agent's operational efficiency crosses the 50% threshold, it transitions from an impressive demo to a viable operational asset. The trajectory is clear: AI systems will soon outperform the best consulting firms, investment banks, and law firms. The most alarming metric is not the headline score, but the steepness of the improvement curve against increasingly realistic and complex tasks.
A key insight from this advancement is that integrating reasoning and coding capabilities into a single model eliminates approximately 80% of the context-switching overhead between development tools. This is the true productivity multiplier. In just one year, AI has evolved from being incapable of editing a spreadsheet to surpassing the analytical capabilities of McKinsey. This rate of progress far outpaces the strategic planning cycles of most corporations.
Unprecedented Capabilities: The Technical Leap of GPT-5.4
The significance of GPT-5.4's 1 million token context window cannot be overstated. It means entire codebases can be processed in a single pass, yielding coherent and reliable results. This eliminates the need for chunking, complex retrieval-augmented generation (RAG) systems, and context compression, mitigating the risk of comprehension loss. This capability alone will fundamentally reshape the architecture of AI agents.
The combination of a 1M context window and native computer use allows the agent to execute multi-file tasks without losing context. Prominent AI researcher Eric Hartford, after testing GPT-5.4, noted, "Its comprehension and problem-solving speed have visibly ascended to a new level." He benchmarked it with a difficult task: building a compiler from scratch. While previous models like Claude Code stalled and GPT-5.3 struggled, GPT-5.4, in his words, "just gets it."
Building a compiler is an exceptionally rigorous benchmark that demands deep logical reasoning and a tight feedback loop within a single session—precisely the strengths of GPT-5.4. Another developer reported that the model solved a complex problem in Minecraft in approximately 24 minutes. More astonishingly, GPT-5.4 has demonstrated the ability to reverse-engineer Nintendo Entertainment System (NES) ROMs. By ingesting the raw ROM, the model can deconstruct the program's structure, restore its logic, and even explain the original assembly-level programming techniques. Codebases once considered opaque are now being systematically dissected. In this new paradigm, no code is truly secure.
Conquering Hard Science
Beyond software engineering, GPT-5.4 is making inroads into hard science. The latest rankings from CritPt, a notoriously difficult physics benchmark designed to expose the limits of large models, have been released. The benchmark consists of 71 unsolved, cutting-edge problems spanning 11 subfields of physics, curated by over 50 frontline researchers from 30 institutions. Each problem undergoes an average of 40 hours of review, and solutions require outputs like floating-point arrays, symbolic expressions, or direct Python functions to prevent guessing.
GPT-5.4 Pro (xhigh) achieved the top score of 30.0%, with GPT-5.4 (xhigh) at 20.0%. The third-place model, Gemini 3.1 Pro Preview, scored 17.7%. For context, the most advanced models of 2025 typically scored in the single digits.
The Economic Fallout: 57,000 Tech Jobs Vanish
As the ceiling of AI capability is shattered, the floor of the labor market is collapsing. In the same week as the GPT-5.4 release, economist Joey Politano highlighted alarming figures: the U.S. tech sector saw a net loss of 12,000 jobs last month, contributing to a total of 57,000 jobs eliminated over the past year.
The current contraction in tech employment is nearly on par with the most severe period of the 2024 tech recession and is worse than the downturns of 2008 and 2020. An analysis of U.S. tech employment data from 1990 to 2026 reveals that the current downward trajectory, which began in 2023, is comparable in scale and duration only to the dot-com bubble burst of 2001.
However, the underlying mechanics of this downturn are fundamentally different. The 2001 crash was a financial clearing event driven by exhausted capital and failed business models. Companies folded, but the market demand remained, allowing for eventual re-employment. Today, the situation is inverted. The profits of leading tech companies are not collapsing; they are soaring. Jobs are disappearing not because companies are failing, but because with AI, they no longer require the same number of people.
An employee displaced in 2001 could wait for the market to recover. An employee displaced in 2026 faces a permanent structural change. Paradoxically, while overall tech employment plummets, the demand for AI-specific roles is surging. Companies are not shrinking; they are re-staffing, or more accurately, replacing human capital with AI-augmented capital. A five-person team's workload can now be handled by one person and an AI. The other four are now redundant.
Nobel Laureate's Warning: The Optimal Number of Employees is Zero
For those who believe this is merely an internal adjustment within the tech sector, Nobel laureate economist Joseph Stiglitz offers a sobering perspective. In a recent interview with Fortune, the 83-year-old former Chief Economist of the World Bank stated, "If we don't manage AI, it will inevitably lead to greater inequality. Inequality is already a terribly corrosive and serious problem in our society, so this worries me a great deal."
Stiglitz identifies the underlying mechanism driving this trend. As technology strategist Daniel Miessler aptly put it, "The perfect number of human employees in any company is zero." This statement, while jarring, accurately reflects the core incentive of capital: labor is a cost center. AI is the first technology that credibly promises to eliminate this cost.
In his book The Road to Freedom, Stiglitz systematically deconstructs this process: AI enables corporations to shed labor and concentrate profits at the apex of the pyramid, while the risks of this transition are externalized to the workforce and the public.
The Countdown Has Begun
GPT-5.4's benchmark scores—83% on GDPval, a record 30% on the CritPt physics benchmark, and 75% on OSWorld computer operation tasks (surpassing the human baseline)—send a clear signal: AI is not a future possibility for white-collar replacement; it is the present reality.
The critical question is not whether AI can do your job. The critical questions are: who captures the profits generated by this new efficiency, and where do the displaced workers go? This is not a technical problem; it is a choice. And the window to make that choice is closing rapidly.
Analysis of a Unified Tool-Calling Execution Model
The following demonstrates a streamlined, multi-phase tool-calling process observed in a leading AI model.
Model Input
The process is initiated with a standard user query and a set of available tools defined in JSON.
{
"model": "gpt-5.4",
"input": [
{"role": "user", "content": "What is the weather like in San Francisco today?"}
],
"tools": [
{"type": "tool_search"},
{
"type": "function",
"name": "get_weather",
"description": "Retrieves current weather information for a specified location.",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
},
"defer_loading": true
},
{
"type": "function",
"name": "search_restaurants",
"description": "Searches for nearby restaurants based on location and cuisine.",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"},
"cuisine": {"type": "string"},
"price_range": {"type": "string", "enum": ["$", "$$", "$$$", "$$$$"]}
},
"required": ["location"]
},
"defer_loading": true
},
{
"type": "function",
"name": "book_reservation",
"description": "Books a reservation at a specified restaurant.",
"parameters": {
"type": "object",
"properties": {
"restaurant_id": {"type": "string"},
"date": {"type": "string"},
"time": {"type": "string"},
"party_size": {"type": "integer"}
},
"required": ["restaurant_id", "date", "time", "party_size"]
},
"defer_loading": true
}
]
}
Execution Flow: Search, Load, and Call in a Single Response
The model's response architecture unifies tool search, tool loading, and the final function call into a single model response.
{
"output": [
{
"type": "tool_search_call",
"id": "ts_01ABC",
"execution": "server",
"call_id": null
},
{
"type": "tool_search_output",
"id": "tso_01ABC",
"tools": ["get_weather"]
},
{
"type": "function_call",
"id": "fc_01XYZ",
"call_id": "call_01XYZ",
"name": "get_weather",
"arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"
}
]
}
Core Insight
A critical detail emerges from the response structure in a hosted environment: the presence of `execution: "server"` paired with `call_id: null`. This pattern signifies that the tool search and selection logic is executed entirely on the platform's backend. Consequently, the developer's implementation is simplified: it only needs to process the final, resolved `function_call` object. This abstracts away the intermediate steps of tool discovery, enhancing efficiency and reducing implementation complexity.
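Concretely, a client consuming this hosted pattern only has to branch on the item type. A minimal Python sketch, assuming a response shaped like the example above; the registry and the stubbed weather handler are hypothetical stand-ins, not part of any SDK:

```python
# Minimal sketch of a client-side dispatcher for the hosted pattern.
# Server-resolved search steps are skipped; only the final function_call
# items are executed. The registry below is a hypothetical local stub.
import json

def run_local_tool(name, arguments, registry):
    """Look up a locally implemented tool and invoke it with parsed args."""
    handler = registry[name]
    return handler(**json.loads(arguments))

def handle_output(output_items, registry):
    results = []
    for item in output_items:
        # tool_search_call / tool_search_output were already resolved
        # server-side (execution: "server"), so we can skip them.
        if item["type"] != "function_call":
            continue
        results.append(run_local_tool(item["name"], item["arguments"], registry))
    return results

# Example usage with a stubbed weather tool:
registry = {"get_weather": lambda location, unit="celsius": f"18°{unit[0].upper()} in {location}"}
output = [
    {"type": "tool_search_call", "id": "ts_01ABC", "execution": "server", "call_id": None},
    {"type": "tool_search_output", "id": "tso_01ABC", "tools": ["get_weather"]},
    {"type": "function_call", "id": "fc_01XYZ", "call_id": "call_01XYZ",
     "name": "get_weather", "arguments": "{\"location\": \"San Francisco\", \"unit\": \"celsius\"}"},
]
print(handle_output(output, registry))  # → ['18°C in San Francisco']
```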
Pattern B: Client-Executed Tool Search
This pattern is optimal for scenarios where tool discovery is dependent on external systems, such as project state or tenant configurations.
{
"model": "gpt-5.4",
"input": [
{"role": "user", "content": "What's the weather like in San Francisco today?"}
],
"tools": [
{
"type": "tool_search",
"execution": "client",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
},
{
"type": "function",
"name": "get_weather",
"description": "Get weather",
"parameters": {},
"defer_loading": true
}
]
}
Turn 1 — The model issues a search request and pauses execution:
{
"output": [
{
"type": "tool_search_call",
"id": "ts_01ABC",
"execution": "client",
"call_id": "call_search_01",
"arguments": "{\"query\": \"weather\"}"
}
],
"status": "incomplete"
}
After your application executes the search, it returns the results to continue the dialogue:
{
"input": [
{
"type": "tool_search_output",
"call_id": "call_search_01",
"tools": [
{
"type": "function",
"name": "get_weather",
"description": "Retrieves current weather information for a specified location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
}
}
]
}
]
}
Turn 2 — The model invokes the loaded tool:
{
"output": [
{
"type": "function_call",
"name": "get_weather",
"arguments": "{\"location\": \"San Francisco\"}"
}
]
}
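The two-turn exchange above can be driven by a small piece of client code: receive the `tool_search_call`, run whatever local search logic you like, and answer with a `tool_search_output`. A hedged sketch in Python (the catalog contents and the naive keyword scoring are illustrative, not the platform's algorithm):

```python
# Client side of Pattern B: turn a tool_search_call into a
# tool_search_output by searching a local catalog. Illustrative only.
import json

CATALOG = [
    {"type": "function", "name": "get_weather",
     "description": "Retrieves current weather information for a specified location",
     "parameters": {"type": "object",
                    "properties": {"location": {"type": "string"},
                                   "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}},
                    "required": ["location"]}},
    {"type": "function", "name": "book_reservation",
     "description": "Books a reservation at a specified restaurant",
     "parameters": {"type": "object", "properties": {}, "required": []}},
]

def search_tools(query, catalog=CATALOG):
    """Score each tool by query-term overlap with its name + description."""
    terms = set(query.lower().split())
    scored = []
    for tool in catalog:
        haystack = f"{tool['name']} {tool['description']}".lower()
        score = sum(1 for t in terms if t in haystack)
        if score:
            scored.append((score, tool))
    scored.sort(key=lambda s: -s[0])
    return [tool for _, tool in scored]

def build_search_output(call):
    """Build the tool_search_output payload the model expects back."""
    query = json.loads(call["arguments"])["query"]
    return {"type": "tool_search_output",
            "call_id": call["call_id"],
            "tools": search_tools(query)}

call = {"type": "tool_search_call", "execution": "client",
        "call_id": "call_search_01", "arguments": "{\"query\": \"weather\"}"}
out = build_search_output(call)
print([t["name"] for t in out["tools"]])  # → ['get_weather']
```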
The Namespace Method (Recommended for Large-Scale Tool Management)
A best practice, often highlighted in technical forums and API documentation, is to organize tools into namespaces, particularly when dealing with a large number of functions. This strategy lets the model initially process only the high-level namespace description and then load the full toolset on demand.
{
"model": "gpt-5.4",
"tools": [
{"type": "tool_search"},
{
"type": "namespace",
"name": "dining",
"description": "Tools related to dining: search for restaurants, view menus, make reservations.",
"tools": [
{
"type": "function",
"name": "search_restaurants",
"description": "Search for nearby restaurants",
"defer_loading": true,
"parameters": {}
}
]
}
]
}
2.3.2 tool_search: Dynamic Tool Discovery and Deferred Loading
The tool_search function enables a model to dynamically search for and load the appropriate tools from a large, predefined set, even if those tools are not initially loaded into the context window. This mechanism is critical for building scalable agentic systems with extensive capabilities, as it circumvents the context window limitations that would arise from loading hundreds or thousands of tool definitions simultaneously.
When tool_search is enabled, the model can issue a search query if the user's request cannot be fulfilled by the tools already in its active context. The system then executes this search against the complete set of available tools, identifies the most relevant ones, and loads their full definitions into the context for the model to use.
Core tool_search Workflow
- Initial State: The model's context contains only a minimal set of essential tools, or simply the `tool_search` function itself.
- User Request: The user issues a prompt, for example, "Book a table for two at a Michelin-starred restaurant in Paris for tomorrow night."
- Tool Search Invocation: The model determines that it lacks the specific tool for restaurant reservations in its current context. It invokes `tool_search` with a query derived from the user's intent.

{ "type": "tool_search_call", "query": "find a tool to book a restaurant reservation" }

- System Execution: The system searches the available tool definitions. The search algorithm, often based on techniques like BM25, matches the query against tool names, descriptions, and parameter definitions.
- Result and Context Update: The search returns the most relevant tool(s). The system then injects the full definition of the discovered tool, such as `book_reservation`, into the context.

{
  "type": "tool_search_output",
  "results": [
    {
      "type": "function",
      "name": "book_reservation",
      "description": "Books a restaurant reservation.",
      "parameters": {
        "type": "object",
        "properties": {
          "restaurant_name": { "type": "string", "description": "The name of the restaurant." },
          "party_size": { "type": "integer", "description": "The number of people in the party." },
          "datetime": { "type": "string", "description": "The desired date and time for the reservation, in ISO 8601 format." }
        },
        "required": ["restaurant_name", "party_size", "datetime"]
      }
    }
  ]
}

- Final Tool Call: With the complete tool definition now in context, the model can formulate a precise and valid tool call to fulfill the original request.

{
  "type": "function_call",
  "name": "book_reservation",
  "arguments": {
    "restaurant_name": "Le Cinq",
    "party_size": 2,
    "datetime": "2026-04-20T20:00:00Z"
  }
}
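The "System Execution" step typically relies on lexical ranking such as BM25. A compact, self-contained BM25 scorer over tool descriptions, as an illustrative sketch (the corpus and the k1/b constants are assumptions, not the platform's actual values):

```python
# Toy BM25 ranker: scores tool name+description strings against a query.
import math

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices ordered best match first."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return sorted(range(n), key=lambda i: -scores[i])

docs = [
    "get_weather retrieves current weather information for a location",
    "book_reservation books a restaurant reservation",
    "search_flights searches for available flights",
]
print(bm25_rank("book a restaurant reservation", docs))  # → [1, 0, 2]
```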
Deferred Loading (defer_loading)
The defer_loading: true flag is the key that enables this "just-in-time" tool loading. When a tool is marked with this flag, only its name and a high-level description are initially visible to the model. The detailed parameter schema is withheld until the tool is explicitly discovered and loaded via a tool_search operation.
Before defer_loading (High Context Usage): All tool schemas are loaded upfront, consuming significant context space.
{
"tools": [
{
"type": "function",
"name": "get_weather",
"description": "Get the weather for a specified location.",
"parameters": { ...full schema... }
},
{
"type": "function",
"name": "book_reservation",
"description": "Book a restaurant reservation.",
"parameters": { ...full schema... }
},
{
"type": "function",
"name": "search_flights",
"description": "Search for available flights.",
"parameters": { ...full schema... }
}
]
}
After defer_loading (Optimized Context Usage): Only names and descriptions are present initially. The full schema is loaded on demand.
{
"tools": [
{
"type": "function",
"name": "get_weather",
"description": "Get the weather for a specified location.",
"defer_loading": true,
"parameters": {}
},
{
"type": "function",
"name": "book_reservation",
"description": "Book a restaurant reservation.",
"defer_loading": true,
"parameters": {}
},
{
"type": "function",
"name": "search_flights",
"description": "Search for available flights.",
"defer_loading": true,
"parameters": {}
}
]
}
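The before/after contrast can be expressed as a small transformation: strip the parameter schema from every tool marked `defer_loading` to obtain the initial model view. A sketch under those assumptions (the helper name is ours, not an API):

```python
# Produce the initial model view from a full tool list: deferred tools
# keep name + description but lose their parameter schema until loaded.
def initial_view(tools):
    view = []
    for tool in tools:
        if tool.get("defer_loading"):
            # Name and description stay visible; the schema is withheld
            # until the tool is discovered via tool_search.
            view.append({"type": tool["type"], "name": tool["name"],
                         "description": tool["description"],
                         "defer_loading": True, "parameters": {}})
        else:
            view.append(tool)
    return view

full = [
    {"type": "function", "name": "get_weather",
     "description": "Get the weather for a specified location.",
     "defer_loading": True,
     "parameters": {"type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"]}},
]
print(initial_view(full)[0]["parameters"])  # → {}
```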
Tool Grouping with Namespaces
To manage large and complex toolsets effectively, platforms allow grouping related tools into namespaces. This provides a hierarchical structure that aids both the model's comprehension and the search process. When tools are organized into namespaces, the model can first identify a relevant namespace and then search for a specific tool within that narrowed scope.
Example: Namespace Configuration
{
"type": "tool_search",
"namespaces": [
{
"type": "namespace",
"name": "dining",
"description": "Tools related to dining: searching for restaurants, booking reservations.",
"tools": [
{
"type": "function",
"name": "find_restaurants",
"description": "Finds restaurants based on criteria like cuisine and price.",
"defer_loading": true,
"parameters": {...}
},
{
"type": "function",
"name": "book_reservation",
"description": "Books a restaurant reservation.",
"defer_loading": true,
"parameters": {...}
}
]
},
{
"type": "namespace",
"name": "weather",
"description": "Tools related to weather and environment: checking forecasts, air quality, UV index.",
"tools": [
{
"type": "function",
"name": "get_weather",
"description": "Gets the weather for a specified location.",
"defer_loading": true,
"parameters": {...}
}
]
}
]
}
Initial Model View:
┌─────────────────────────────────────────────────────┐
│ [tool_search Tool]                                  │
│ [namespace: dining — "Tools related to dining..."]  │
│ [namespace: weather — "Tools related to weather..."]│
│                                                     │
│ ⚠️ The model only sees namespace names and          │
│ descriptions. Function parameter schemas are not    │
│ loaded until tool_search loads a specific namespace.│
└─────────────────────────────────────────────────────┘
When the model determines it needs weather-related functionality, tool_search loads the weather namespace, at which point the full parameter definition for get_weather enters the context.
2.4 Platform Comparison
| Dimension | Anthropic | OpenAI |
|---|---|---|
| Custom input_schema | Supported | Supported |
| Search Result Count | 3-5 most relevant tools | Undisclosed limit; multiple Namespaces can be loaded |
| Max Tools | 10,000 | Undisclosed |
| Model Requirements | Sonnet 4.0+ / Opus 4.0+ (Haiku not supported) | GPT-4.5+ |
2.5 Summary of Key Design Differences
Anthropic's Design Philosophy: Search as a "Specialized Tool"
- The search tool has a distinct type identifier (e.g., `tool_search_tool_bm25_20251119`) and is executed server-side as a `server_tool_use`.
- Search results are automatically expanded via a `tool_reference` mechanism, eliminating the need for developers to manually inject tool definitions.
- It offers developers a choice between BM25 and Regex search algorithms.
- Any standard tool can also function as a custom search provider, provided it returns a `tool_reference`.
OpenAI's Design Philosophy: Search as a "System Capability"
- The search tool declaration is minimalist (`{"type": "tool_search"}`), functioning as a built-in system capability.
- It places a stronger emphasis on `Namespace` grouping, as the model is primarily trained to search at the Namespace level.
- For a single deferred function, the model can still see the function name and description; only the parameter schema is deferred.
- It distinguishes between two modes: `Hosted` (search on OpenAI's servers) and `Client-executed` (search within the developer's application).
- The `Client-executed` mode permits complete customization of search parameters and the returned toolset.
Core Concept: Aligned. Implementation Path: Divergent.
- Anthropic: Tool-level granularity → BM25/Regex search → `tool_reference` for automatic expansion.
- OpenAI: Namespace-level granularity → Hosted/Client search → `tool_search_output` for manual/automatic injection.
3. What Problems Does Tool Search Solve?
3.1 Problem 1: Context Bloat
With the proliferation of the Model Context Protocol (MCP) ecosystem, the number of tools connected to a single AI agent is growing exponentially. The token consumption in real-world scenarios is substantial:
| MCP Server | Tool Count | Token Consumption |
|---|---|---|
| GitHub | 35 | ~26,000 |
| Slack | 11 | ~21,000 |
| Jira | 20 | ~17,000 |
| Sentry | 5 | ~3,000 |
| Grafana | 5 | ~3,000 |
| Splunk | 2 | ~2,000 |
| Total | 78 | ~72,000 |
A /context command executed in a production environment illustrates the issue: system tools alone consume 16.6k tokens (8.3%)—and this represents a configuration with only a moderate number of tools.
Our internal benchmarks show that tool definitions have consumed up to 134,000 tokens in complex configurations. A single Docker MCP server with 135 tool definitions can consume approximately 125,000 tokens.
This means that in a 200K context window, over a third of the available space can be occupied by tool definitions before a conversation even begins, severely compressing the capacity for dialogue history, system prompts, and actual reasoning.
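The table's total and the "over a third" claim can be checked with a few lines of arithmetic (context size of 200K as stated above; per-server figures from the table):

```python
# Verify the MCP token-consumption totals and the context-share claim.
tokens = {"GitHub": 26_000, "Slack": 21_000, "Jira": 17_000,
          "Sentry": 3_000, "Grafana": 3_000, "Splunk": 2_000}
total = sum(tokens.values())
context_window = 200_000

print(total)                          # → 72000
print(f"{total / context_window:.0%}")  # → 36%
```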
3.2 Problem 2: Decreased Tool Selection Accuracy
Empirical data demonstrates that when the number of available tools exceeds 30-50, the model's tool selection accuracy declines significantly. This is particularly true when tool names are similar (e.g., notification-send-user vs. notification-send-channel), leading to confusion and incorrect tool invocation.
This is an information overload problem. The accuracy of selecting the most relevant book from a library of 100 is inherently lower than selecting from a curated set of 5.
Our internal evaluation data highlights the improvement:
| Model | Without Tool Search | With Tool Search | Improvement |
|---|---|---|---|
| Claude 4 Opus | 49% | 74% | +25pp |
| Claude 4.5 Opus | 79.5% | 88.1% | +8.6pp |
3.3 Problem 3: Exploding Token Costs
Every API request must carry the complete set of tool definitions. Even if a user asks a simple, unrelated question, the full token cost for all tool definitions is incurred.
Consider a configuration with 5 servers and 78 tools:
- Tool Definition Overhead per Request: ~72,000 tokens
- Assumption: 10,000 API calls per day
- Result: 720 million tokens consumed daily by tool definitions alone.
- Cost: At the input price of a model like Claude Sonnet 4 ($3/M tokens), the daily cost for tool definitions alone is approximately $2,160.
With the implementation of Tool Search, the tool definition overhead per request is reduced to approximately 3,000-5,000 tokens, which includes the search tool itself plus 3-5 on-demand loaded tools. This achieves a cost reduction of 85-95%.
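The cost arithmetic above can be re-derived directly (token counts, request volume, and the $3/M input price as assumed in the scenario):

```python
# Daily token and dollar cost of shipping full tool definitions on every
# request, versus a Tool Search configuration (~5K tokens/request).
tokens_per_request = 72_000
requests_per_day = 10_000
price_per_million = 3.0  # $/M input tokens (Claude Sonnet 4 class)

daily_tokens = tokens_per_request * requests_per_day
daily_cost = daily_tokens / 1_000_000 * price_per_million
print(daily_tokens, daily_cost)  # → 720000000 2160.0

# With Tool Search: search stub plus 3-5 on-demand tools ≈ 3-5K tokens.
lean_cost = 5_000 * requests_per_day / 1_000_000 * price_per_million
print(f"{1 - lean_cost / daily_cost:.0%} saved")  # → 93% saved
```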
3.4 Problem 4: Prompt Caching Invalidation
Conventionally, if different requests require distinct subsets of tools, the variation in tool definitions will break the Prompt Cache. This forces a reprocessing of the entire tool list for each request.
The design of Tool Search elegantly resolves this issue. Deferred-loading tools are completely excluded from the initial prompt; only the Tool Search tool itself and a small set of core tools are included. This allows the system prompt and core tool definitions to be stably cached. In some leading implementations, newly discovered tools are injected at the end of the context window, further protecting the cache of preceding content.
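The cache-friendly layout described here can be sketched as a context builder that keeps the prefix byte-stable across turns and only appends dynamic material at the tail. Structure and names below are illustrative, not any provider's API:

```python
# Stable prefix (system prompt + search stub + core tools) first;
# messages and discovered tool schemas only ever grow at the tail,
# so a prefix cache keyed on leading content keeps hitting.
def build_context(system_prompt, core_tools, discovered_tools, messages):
    stable_prefix = [system_prompt, {"type": "tool_search"}, *core_tools]
    return stable_prefix + messages + discovered_tools

turn1 = build_context("You are a helpful agent.", [], [],
                      [{"role": "user", "content": "hi"}])
turn2 = build_context("You are a helpful agent.", [],
                      [{"name": "get_weather"}],
                      [{"role": "user", "content": "hi"},
                       {"role": "assistant", "content": "hello"}])

# The leading items are identical across turns, so the prefix is cacheable:
print(turn1[:2] == turn2[:2])  # → True
```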
4. Comparative Analysis of Platform Implementations
4.1 Anthropic Claude: The Tool Search Tool
Release Date: November 24, 2025 (Beta), with continuous iteration.
Supported Models: Claude Sonnet 4.0+, Claude Opus 4.0+ (Haiku not supported).
Two Search Variants:
| Variant | Type Identifier | Query Method | Use Case |
|---|---|---|---|
| Regex | tool_search_tool_regex_20251119 | Python Regular Expression | Precise pattern matching, for when tool naming conventions are known. |
| BM25 | tool_search_tool_bm25_20251119 | Natural Language Query | Fuzzy search, semantic matching. |
Core API Design:
{
"tools": [
// 1. Declare the search tool (always loaded)
{"type": "tool_search_tool_bm25_20251119", "name": "tool_search_tool_bm25"},
// 2. Mark tools for deferred loading
{
"name": "github.createPullRequest",
"description": "Create a pull request in a GitHub repository",
"input_schema": {...},
"defer_loading": true // Critical flag: Excludes from initial context
},
// 3. Retain high-frequency tools for immediate loading
{
"name": "search_files",
"description": "Search files in workspace",
"input_schema": {...}
// No defer_loading → Immediately available
}
]
}
MCP Server-Level Deferred Loading:
{
"type": "mcp_toolset",
"mcp_server_name": "github",
"default_config": {"defer_loading": true}, // Entire server defaults to deferred loading
"configs": {
"search_repos": {"defer_loading": false} // Exception for high-frequency tools
}
}
Response Format (Search → Discover → Call):
{
"content": [
// Step 1: Model decides to search
{"type": "server_tool_use", "name": "tool_search_tool_bm25",
"input": {"query": "github pull request"}},
// Step 2: Return search results (tool_reference is auto-expanded to the full definition)
{"type": "tool_search_tool_result",
"content": {"tool_references": [
{"type": "tool_reference", "tool_name": "github.createPullRequest"}
]}},
// Step 3: Model calls the discovered tool
{"type": "tool_use", "name": "github.createPullRequest",
"input": {"repo": "myorg/myrepo", "title": "Fix bug #123", ...}}
]
}
Key Metrics:
- Context consumption reduced from ~77K tokens to ~8.7K tokens (an 85%+ reduction).
- Each search returns the 3-5 most relevant tools.
- Supports a catalog of up to 10,000 tools.
- Maximum regex query length: 200 characters.
- Also supports custom client-side search implementations (by returning a `tool_reference` block).
4.2 OpenAI GPT-5.4: Tool Search
Release Date: March 2026 (with GPT-5.4)
Two Execution Modes:
| Mode | Executor | Use Case |
|---|---|---|
| Hosted | OpenAI Server-side | Tools are known at request time; the simplest approach. |
| Client-executed | Developer Application | Tool discovery depends on external systems like project/tenant state. |
Core Design Differences:
OpenAI's implementation emphasizes the concept of Namespaces. The model is primarily trained to search at the Namespace and MCP server levels, rather than for individual functions. For lazy loading of individual functions, the model still sees the function name and description; what is actually deferred is the parameter schema.
Recommendation: Group lazy-loaded functions into Namespaces (fewer than 10 functions per Namespace), using clear, high-level descriptions to guide the model on when to load each group.
Hosted Mode Workflow:
- Developer declares all tools (including `defer_loading: true`).
- Model receives the request, seeing only tool names/descriptions plus the `tool_search` tool.
- Model determines a deferred tool is needed.
- The API automatically executes a search, returning a `tool_search_call` + `tool_search_output`.
- The loaded tool is injected at the end of the context (to protect the cache).
- Model invokes the now-loaded tool.
Client-executed Mode Workflow:
- Model issues a `tool_search_call` (with search parameters).
- Developer application executes custom search logic.
- Returns a `tool_search_output` (containing the definitions of tools to be loaded).
- Model invokes the loaded tools in a subsequent turn.
Cache Optimization: Newly discovered tools are injected at the end of the context window, ensuring the cache for preceding content remains intact.
4.3 Spring AI: Cross-Platform Tool Search Tool
Release Date: December 2025
Positioning: Abstracting the Tool Search pattern from a platform-specific feature into a portable, cross-LLM framework capability.
Spring AI implements dynamic tool discovery via the ToolSearchToolCallAdvisor (which inherits from ToolCallAdvisor), enabling it to run on all Spring AI-supported LLMs, including OpenAI, Anthropic, Gemini, Ollama, and Azure OpenAI.
Three Pluggable Search Strategies:
| Strategy | Implementation Class | Use Case |
|---|---|---|
| Semantic Search | VectorToolSearcher | Natural language queries, fuzzy matching |
| Keyword Search | LuceneToolSearcher | Exact term matching, known tool names |
| Regex Matching | RegexToolSearcher | Tool name patterns (e.g., get_*_data) |
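The regex strategy in the table can be illustrated independently of Spring AI (which implements it in Java). This Python sketch shows both glob-style matching for patterns like get_*_data and true regex matching over a hypothetical tool registry:

```python
# Pattern-based tool lookup: glob-style (fnmatch) and regex variants.
import fnmatch
import re

TOOLS = ["get_sales_data", "get_user_data", "send_email", "get_weather"]

def glob_search(pattern, names=TOOLS):
    """Match tool names against a glob pattern such as get_*_data."""
    return [n for n in names if fnmatch.fnmatch(n, pattern)]

def regex_search(pattern, names=TOOLS):
    """Match tool names against a regular expression."""
    return [n for n in names if re.search(pattern, n)]

print(glob_search("get_*_data"))        # → ['get_sales_data', 'get_user_data']
print(regex_search(r"^get_\w+_data$"))  # → ['get_sales_data', 'get_user_data']
```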
Cross-Platform Benchmark Results (28 tools, Lucene search):
| Model | Traditional Method (tokens) | Tool Search (tokens) | Savings Ratio |
|---|---|---|---|
| Gemini 3 Pro | 5,375 | 2,165 | 60% |
| GPT-5 Mini | 7,175 | 4,706 | 34% |
| Claude Sonnet 4.5 | 17,342 | 6,273 | 64% |
Key Finding: Token savings primarily stem from a reduction in Prompt Tokens—in Tool Search mode, only the definitions of the discovered tools are included in the prompt.
4.4 Three-Platform Comparison Summary
| Dimension | Anthropic Claude | OpenAI GPT-5.4 | Spring AI |
|---|---|---|---|
| Search Type | Regex + BM25 | Hosted + Client-executed | Vector + Lucene + Regex |
| Lazy Loading Granularity | Single Tool / MCP Server | Single Function / Namespace / MCP | Any Registered Tool |
| Search Executor | Server-side (with support for custom client) | Server-side or Client-side | Client-side (within the framework) |
| Caching Strategy | Deferred tools are excluded from the initial prompt | New tools are injected at the end of the context | Depends on the underlying LLM's implementation |
| Max Tools | 10,000 | Undisclosed limit | No hard limit |
| Cross-Model Support | Claude only | GPT-5.4 only | All supported LLMs |
5. Technical Deep Dive: How Tool Search Works
5.1 System Architecture
TITLE: 2026: Generosity, Brutality, and the Fog of War for AI Founders
A Note from the Epsilla Team:
We strongly recommend this article.
What resonates most is its refusal to linger on the surface-level excitement of "AI is powerful." Instead, it situates the present moment within the longer arc of technological history. For founders, this perspective is critical. Every time a core capability is rapidly commoditized, the opportunities expand, but the competition becomes exponentially more brutal. It's easier than ever to build something, which paradoxically makes the questions of what to build, for whom, and how to get noticed paramount.
Our own experience confirms this duality. This era is both generous and cruel to entrepreneurs. Generous, in that a single individual can now produce what once required an entire team. Cruel, in that the mere ability to build is no longer an advantage. What truly matters are judgment, product sense, a deep understanding of the user, and the courage to rewrite old problems around new capabilities.
This article is worth your time. Not because it provides answers, but because it helps us see the questions with greater clarity.
The author is an AI entrepreneur who previously built Devv.AI, a product that reached over a million users, and is preparing to launch a new venture.
01. The Acceleration of 2026
In February 2026, Andrej Karpathy (former Director of AI at Tesla, founding member of OpenAI) described a remarkably specific inflection point on X.
In November, his programming workflow was 80% handwriting code and 20% delegating to an agent. By December, the ratio had completely inverted: 80% of his work was directing an agent with natural language, with the remaining 20% dedicated to editing and final touches.
He recounted a recent series of tasks he assigned to an AI agent using natural language: log into a remote server, configure SSH keys, install and test a model, build a web UI, configure system services, and write documentation.
The agent completed everything autonomously in 30 minutes, encountering and resolving multiple issues on its own. Just three months prior, the same set of tasks would have consumed an entire weekend.
DHH (creator of Ruby on Rails) had an equally direct reaction:
"Biggest and fastest change in the 40 years I've tried to make computers do my bidding. And surprisingly, the most fun too!"
As a founder on the front lines of AI, I've also reverted to builder mode for the past three months, averaging more than 100 million tokens consumed daily and making more than 1,000 commits.
This acceleration is real. The output of one person for one week can now exceed what an entire team produced over several months in the past.
This acceleration isn't just happening at the individual level. In the first two months of 2026, the entire technology landscape appears to have entered a period of rapid acceleration.
- The Emergence of OpenClaw. This product, which brought Claude Code-level agent capabilities to the masses via Telegram/Slack, went viral at the end of January. Its success validates a recurring pattern: Virality = Democratized Experience—extending an experience already available to a niche group to a much larger user base. A unified entry point, persistent memory, and a flywheel of combinable Skills gave non-technical users their first real taste of "AI can actually do things for me."
- Coding Agents Crossing a Critical Threshold. Tools like Claude Code and Codex are now capable of independently completing tasks in moderately complex codebases (on the order of 100,000 lines) with minimal human intervention. This is not an incremental improvement. When AI transitions from "assisting with code" to "leading the coding process," the fundamental logic of the entire development workflow changes.
- Breakthroughs in Long-Horizon Agents. In January, Sequoia published a bluntly titled article: "This is AGI." Their definition wasn't based on a benchmark score but on a functional assessment: AI agents can now work autonomously for hours, making and correcting errors, and iterating continuously until a task is complete. Data from METR indicates that the complexity of tasks agents can handle is doubling approximately every 7 months. Sequoia extrapolated this trend in their article: by 2028, they could independently complete complex tasks equivalent to a full day of human expert work; by 2034, a full year's worth; and by 2037, a century's worth. (Note: By the time this article was published, OpenClaw had already surpassed React to become the most-starred project on GitHub.)
- Structural Shifts at the Enterprise Level. On February 26, Block founder Jack Dorsey announced the company would shrink from 10,000+ employees to under 6,000—a cut of over 40%. He attributed the layoffs to AI: "intelligence tools... are enabling a new way of working which fundamentally changes what it means to build and run a company." The market's reaction was immediate: the stock price surged 20% that day. (It's worth noting that critics argue Block's layoffs were more about correcting for over-hiring during the pandemic, when the company grew from ~4,000 to 13,000 people. Even Sam Altman has acknowledged the phenomenon of "AI washing." But regardless of the true reason for the layoffs, the fact that the market chose to believe the AI narrative is itself telling.)
This is not incremental efficiency gain. The core explosion is this: The capability of AI Coding (or, more broadly, Agents) has crossed a baseline and is being rapidly commoditized. Programming is no longer a scarce skill requiring years of training, but a resource available on-demand at near-zero marginal cost.
But amidst this acceleration, one thing compels me to pause and think: This has happened before.
Historically, whenever a high-barrier capability suddenly becomes cheap and massively accessible, it triggers a predictable series of structural changes: the decline of old professions, the birth of new ones, the reorganization of value chains, and the migration of power. History doesn't tell us what to do, but it can at least tell us what not to do.
This article attempts to revisit those historical moments, to see what actually happened, and then return to the present to understand what lessons they hold.
02. When Copying Became Free
Before Gutenberg, every book in Europe was copied by hand, word for word, by monastic scribes. A single handwritten copy of the Bible cost the equivalent of three years' salary for a clerk. The total number of books in all of Europe was estimated to be around 30,000. The "copying" of knowledge was an expensive capability monopolized by the church and a small elite.
Around 1440, Gutenberg independently developed a practical system of movable metal type in Europe. By 1455, the first Gutenberg Bible was printed. Thereafter, the price of books began a sustained decline of about 2.4% per year for over a century, falling by two-thirds by the year 1500.
A key competitive dynamic emerged: when a new printer entered a city's market, local book prices would immediately drop by about 25%. By 1480, 110 European cities had printing presses; by 1500, that number exceeded 236, and the total volume of books exploded from 30,000 to between 10 and 20 million.
The supply side exploded. But the consequences went far beyond "more books":
- Decline of Old Professions: Demand for scribes plummeted. The monastic scriptoria withered into obsolescence within decades. In 1492, Abbot Johannes Trithemius wrote In Praise of Scribes, attempting to argue for the spiritual value of manual transcription.
- Birth of New Professions: The printing press created an entire new industry: typesetters, proofreaders, bookbinders, illustrators, publishers, and booksellers. These jobs did not exist before Gutenberg.
- Oversupply and Uneven Quality: A flood of low-quality printed materials appeared—religious pamphlets, prophecy books, and erotic literature.
- Unforeseeable Second-Order Effects: The Protestant Reformation (Luther used the press to disseminate his ideas on a massive scale), the Scientific Revolution (academic papers could circulate across nations), and the rise of the nation-state (vernacular publications reinforced national identity)—none of which Gutenberg could have predicted.
This story reveals a recurring pattern. Clayton Christensen proposed the "Law of Conservation of Attractive Profits": When one layer of the value chain is commoditized and its profits disappear, an adjacent layer will see the emergence of new proprietary products to capture that profit.
Ben Thompson articulated this logic more directly in his analysis of Netflix: "breaking up an existing integrated system—commoditizing and modularizing it—destroys the value of the incumbent, while allowing a new entrant to integrate a different part of the value chain and capture new value."
(Source: "Netflix and the Conservation of Attractive Profits" by Ben Thompson)
Value doesn't just vanish; it migrates.
After the capability of "copying" was commoditized, value migrated from "transcription" to "content creation" and "curation/distribution." Publishers—not printers—became the new nodes of power.
Joel Spolsky summarized this logic as a strategic principle in his 2002 essay, "Strategy Letter V": "Commoditize your complement." Smart companies actively commoditize their complementary goods to increase demand for their core product. Microsoft commoditized PC hardware to enhance the value of its operating system; Netscape made the browser free to enhance the value of its servers.
Commoditization has another often-overlooked structural consequence: When the supply side explodes, the demand side (attention, budget, time) does not grow proportionally. The result is an extreme power-law distribution, where a tiny number of winners at the head capture the vast majority of the value, while the long tail of output goes largely unseen.
In the 50 years after the printing press, the number of books in Europe grew from 30,000 to 20 million, but the classics that have survived to this day represent a minuscule fraction of that total.
In a world of oversupply, attention itself becomes the scarcest resource.
This supply-side explosion is replaying itself right now. Data from a16z shows that new iOS app releases in December 2025 were up 60% year-over-year, with a 24% cumulative increase over the past 12 months. They attribute this phenomenon to the rise of agentic coding (also called "vibe coding"). This is a direct echo of the app explosion that followed the release of the iPhone SDK in 2008: when the barrier to creation plummets, the supply side always explodes.
03. When Power Became Cheap
In the late 19th century, factories were powered by steam engines or water wheels. The entire layout of a factory was designed around a massive central driveshaft (the line shaft). A steam engine in the basement turned the main shaft, which in turn drove the machinery on each floor via a system of belts. Factories had to be built as long, narrow, multi-story buildings, with all machines clustered tightly around the driveshaft. Building a factory required immense capital—not just for the machinery, but for a self-contained power system.
Electricity changed everything. The electrical grid allowed any factory to get power "on-demand" without building its own steam engine.
In 1899, electric motors accounted for only 5% of the total power in U.S. manufacturing. By 1909, it was 23%; by 1929, it had reached 77%. This transition occurred in three stages: first, large electric motors replaced steam engines to drive the existing line shafts. Next, machines were grouped together, with each group driven by a smaller motor. Finally, the line shaft was completely eliminated, and each machine was equipped with its own individual motor.
But there is a crucial lesson here.
In his famous 1990 paper, "The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox," economist Paul David pointed out that there was a lag of approximately 40 years between the commercialization of electricity (power stations were built in New York and London in 1881) and a measurable increase in economy-wide productivity (the 1920s). An observer in 1900 would have found almost no evidence that the "electric revolution" was making business more efficient.
Why?
Because early factories just replaced the steam engine with an electric motor and changed nothing else—not the layout, not the processes, not the organization. They were using a new tool to do an old job.
The real productivity boom occurred in the 1920s—when manufacturing total factor productivity (TFP) grew at an astonishing rate of ~5% per year, accounting for 84% of TFP growth in the entire economy—once a new generation of factories was completely redesigned around the properties of electricity. Single-story buildings replaced multi-story ones. Machines could be arranged according to the workflow, not the power transmission system. Factories became brighter and safer. This ultimately gave rise to Henry Ford's assembly line. Ford's factory was not "an old factory powered by electricity," but "a new production system designed around the properties of electricity."
This echoes Robert Solow's 1987 IT productivity paradox: "You can see the computer age everywhere but in the productivity statistics." Research by Erik Brynjolfsson in 1993 confirmed this: despite a hundredfold increase in U.S. computing power between the 1970s and 1980s, annual labor productivity growth fell from over 3% in the 1960s to around 1%.
Productivity only increases when technological investment is accompanied by complementary organizational change—exactly like the story of electricity.
The same paradox is replaying in the field of AI coding. A rigorous randomized controlled trial conducted by METR in 2025 found that when 16 experienced open-source developers used AI tools on their own familiar projects (which they had maintained for an average of 5 years), their task completion time was 19% slower—even though they had predicted they would be 24% faster. Larger surveys show that 75% of engineers use AI tools, but most organizations see no measurable performance improvement. What's the reason?
AI accelerates the single step of code generation but creates new bottlenecks in code review, integration, and testing. It's like speeding up one machine on an assembly line; you don't get a faster factory, you get a bigger pile-up.
This doesn't mean AI coding has no value. The key is who is using it and how they are using it. Karpathy's example—a weekend project compressed into 30 minutes—perfectly illustrates the point: when the user has sufficient system architecture and judgment capabilities, AI is a massive lever. The developers in the METR study slowed down on "familiar projects" precisely because their old workflows were not optimized for AI. True efficiency gains, as with electricity, will require redesigning the entire way of working around the unique properties of AI.
04. When the Barrier to Entry Collapsed
Before AWS, launching an internet service required buying servers, renting data center space, and hiring an operations team. In his essay "Why Software Is Eating the World," Marc Andreessen recalled that in 2000, when his partner Ben Horowitz was CEO of Loudcloud, the cost for a single customer to run a basic internet application was about $150,000 per month.
In 2006, AWS launched S3 and EC2. By 2011, the same application could run on AWS for about $1,500 per month—a 100-fold cost reduction. AWS cut prices over 60 times between 2006 and 2014; the cost of S3 storage dropped by 86% over 12 years (from $0.15/GB to $0.022/GB).
This collapse in barriers triggered a startup explosion. The capital required to launch an internet company fell from millions to a few thousand dollars. Y Combinator was able to launch in 2005 and back founders with minimal seed funding (initially around $20,000) precisely because of this dramatic shift in infrastructure costs. Instagram had only 13 employees when it was acquired by Facebook for $1 billion. Companies like Airbnb, Dropbox, and Stripe could exist because they didn't need to build their own data centers.
The SaaS market grew from $31.4 billion in 2015 to over $250 billion in 2024, with more than 16,500 SaaS companies in the U.S. alone. But each vertical eventually converged to 2-3 winners—another power-law distribution, following the same pattern as the supply explosion after the printing press.
Value migrated from "owning servers" to "owning users," and then to "owning the data flywheel" and "owning the network effect."
This supply-side explosion is also accompanied by a recurring cycle: First Unbundle, then Re-bundle.
Jim Barksdale famously said, "There are only two ways to make money in business: One is to bundle; the other is to unbundle." When a capability becomes cheap, integrated solutions are broken apart into smaller, more focused products. But when fragmentation reaches an extreme, new integrators emerge to re-bundle these fragments into a new, cohesive experience.
This cycle has played out repeatedly throughout history:
- The printing press first unbundled the church's monopoly on knowledge. Then, publishers re-bundled content curation and distribution.
- Cloud computing first unbundled IT infrastructure. Then, AWS/GCP/Azure re-bundled it into new, integrated cloud platforms.
- Journalism was first unbundled by blogs and social media—journalists could bypass newspapers to publish directly, and readers could consume single articles instead of subscribing to an entire paper. Then, Substack and paid newsletters re-bundled independent writing: authors gained a direct subscription relationship, and readers received a curated package of content. Value migrated from "owning the printing press" to "owning the reader's trust."
05. The Laws of Commoditization
Three stories spanning centuries—the printing press, the electric motor, and the cloud server—follow the same laws:
| Layer Being Commoditized | Layer Where Value Migrated |
|---|---|
| Scribing | Content Creation & Publishing |
| Factory Power | Production Process Design |
| Server Infrastructure | Application Experience & Network Effects |
| Code Writing | Problem Definition, Product Judgment, User Acquisition |
*AI is commoditizing coding, but it is not commoditizing what problem to solve.*
When "how to build it" is no longer the bottleneck, "what to build" and "for whom" become the critical dimensions of differentiation.
The same power-law distribution is re-emerging in the AI agent space: countless replicas are appearing, but the Matthew effect is extremely strong—not because the followers are inferior, but because in a world of oversupply, attention itself is the ultimate scarce resource. Using AI to build a traditional SaaS product faster—"AI helps you build a CRM quicker"—is fundamentally just replacing the steam engine with an electric motor. It offers very little defensibility.
The real opportunity lies in redesigning product forms around the new reality of zero-marginal-cost code production.
The AI coding space is currently in its unbundling phase: the value of standardized tools is decreasing, while the value of long-tail, custom-built tools is increasing. But history tells us that re-bundling will inevitably follow.
06. Where We Are Now
Economist Carlota Perez proposed an influential framework describing how every technological revolution passes through two major periods, separated by a turning point.
- Installation Period: The new technology enters the market, infrastructure is built, and financial capital pours in, creating a speculative bubble. This period is characterized by chaos, experimentation, and overinvestment.
- Turning Point: The bubble bursts, a recession follows, and institutional frameworks begin to adapt to the new technology.
- Deployment Period: The technology is widely adopted into the mainstream. If the institutional arrangements are right, a "golden age" can emerge, where the full potential of the technology is unleashed.
| Technological Revolution | Installation Period | Turning Point | Deployment Period |
|---|---|---|---|
| Railways | 1830s-1840s (Railway Mania) | 1847 Railway Bubble Burst | 1850s-1870s |
| Electricity/Heavy Industry | 1880s-1920s | 1929 Great Depression | 1930s-1960s |
| Internet/IT | 1990s | 2000 Dot-com Bubble | 2003-2020s |
| AI | 2023-? | ? | ? |
If Perez's framework holds, AI is currently in the early stages of its Installation Period—characterized by massive capital inflows, a proliferation of labs, and extremely crowded consensus bets. This stage perfectly matches what we're seeing: oversupply, countless copycats, and a strong Matthew effect. The latter half of the Installation Period typically features a speculative bubble. Only after that bubble bursts do we enter the true "Deployment Period," when the infrastructure has matured, institutions have adapted, and the technology's full potential begins to be realized.
According to this framework, the greatest value creation typically occurs in the Deployment Period, not the Installation Period.
07. What Might Happen
Programmers won't disappear, but the definition of a "programmer" will change.
Scribes didn't vanish overnight (handwritten manuscripts were still being commissioned decades after the invention of printing), and steam-powered factories didn't immediately disappear with the advent of electricity. But the basis of competitive differentiation will shift from "the ability to write code" to "system design and architectural judgment."
There is an important distinction to be made here: Technology replaces "tasks," not "people." Work that can be broken down into explicit steps—whether cognitive (like data entry) or physical (like assembly line work)—will be automated. Work that requires judgment, creativity, and complex communication will be amplified. The result is a polarization: high-end skills become more valuable, mid-level skills are commoditized, and the workforce is pushed to either end.
Similarly, the value of a programmer will migrate from the increasingly routine task of "writing code" to system architecture judgment, product intuition, taste, and the debugging and integration of complex systems—which remain non-routine tasks.
The biggest winners will not be "the people who use AI to code the fastest."
In every historical commoditization, the biggest beneficiaries were not those who executed faster, but those who redefined the rules of the game. Gutenberg wasn't the biggest winner; publishers and authors were. The electric utilities weren't the biggest winners; Ford was. AWS is certainly a winner, but so are Airbnb and Stripe—they used commoditized infrastructure to create business models that were previously impossible.
Once coding is commoditized, the winners will likely not be those who code fastest with AI, but those who leverage zero-marginal-cost code production to redefine product forms, distribution methods, or value capture models.
After unbundling, the opportunity for re-bundling is brewing.
We are currently in the unbundling phase—standardized tools are being disaggregated, and long-tail personalized tools are emerging (consider the recent shift from SaaS to personalized Agents). But if historical patterns hold, these fragmented, long-tail tools will eventually require a new integration layer. This could be an "app store" for discovering and reusing AI-generated disposable tools, a "composable platform" that lets users assemble multiple long-tail tools like LEGO bricks, or an "AI-native operating system" that treats the ability to generate, run, and manage code as a fundamental primitive.
The forty-year lesson of electricity reminds us to be patient.
The way we use AI today—making it write traditional software faster—is likely just the "replacing the steam engine with an electric motor" phase. The true "assembly line moment"—redesigning the entire software paradigm around the unique capabilities of AI—may still be years or even longer away. But when it arrives, it could give rise to entirely new product forms that were previously impossible:
- Disposable Software: Custom tools built for a specific user in a specific context, used once and then discarded.
- Adaptive Software: Applications that generate and modify their own code in real-time based on user behavior.
- Hyper-Long-Tail Software: Bespoke products built for every incredibly niche need.
But there is one critical caveat: The pace of AI development is far faster than that of the electrical age. The 40-year lag of the electric revolution was partly due to the long cycles of physical infrastructure build-out—power grids, factories, and worker training all took time. The "infrastructure" for AI is software and compute, with iteration cycles measured in months. A 2025 McKinsey survey found that organizations that redesigned their end-to-end workflows before adopting AI were nearly three times more likely to see significant financial returns than others. This suggests: The "assembly line moment" won't take 40 years. It could be just a few years away.
08. The Worst of Times for Startups
If the preceding analysis is correct, then for founders, this is both the best of times and the most brutal of times.
The good side is obvious: the barrier to building a product has never been lower. One person, one weekend, and a few hundred dollars in API calls can now produce what once took a team months to build. The distance from idea to prototype has been compressed to its limit. "Can it be built?" is no longer the question.
But this is precisely the start of "hell mode":
When everyone can build products quickly, the act of "building" itself is no longer a competitive advantage. What you can build in a weekend, others can too. Your innovation today will be replicated tomorrow.
This leads to several harsh realities:
- Exponentially Increased Competition. Every market is crowded. Because the barrier to entry is lower, more people enter. Because the iteration speed is faster, everyone is shipping constantly. You are no longer racing against a few competitors; you are racing against everyone on the internet who can think of the same idea.
- Attention as the Ultimate Bottleneck. In a world of oversupply, being seen is harder than building. Product Hunt features dozens of new products every day. X is a constant stream of new AI tool demos. The cost of acquiring user attention—whether through paid acquisition or content marketing—is rising rapidly, while product differentiation is declining.
- An Extreme Winner-Take-All Dynamic. History shows that after every supply-side explosion, value concentrates heavily at the top. This means that mid-level success may disappear. You either become a top player in your category or you struggle to survive in the long tail.
- Moats are Collapsing. The traditional moats of software companies—technical complexity, engineering team size, years of accumulated code—are becoming fragile in the face of AI. Nicolas Bustamante analyzed the ten major moats of vertical software in the age of LLMs and found: five are collapsing (learned interfaces, custom business logic, public data access, scarce talent, bundling) while five remain strong (proprietary data, regulatory compliance, network effects, workflow integration, system of record status). The key insight is that the moats being destroyed are precisely those that once prevented competitors from entering.
Simply put: *If your advantage is in how you do things, you are being commoditized. If your advantage is in what you have (data, users, regulatory licenses), you are becoming safer.*
09. Surviving in Hell Mode
So, in this "hell mode," what strategies might be effective?
- Don't Use AI to Do Old Things. Using AI to build a traditional SaaS product faster is just replacing the steam engine with an electric motor. The question you need to ask is: If the cost of code production were zero, what product forms would become possible that were impossible before? Disposable software? Adaptive software? Hyper-personalized experiences? (Of course, in the short term, there is an arbitrage window for "using AI to do old things faster." You can capture market share with lower costs and greater speed before competitors react. But this window will close quickly, because what you can do, others can do too.)
- Build Your Moat Outside the Code. If the code itself is no longer a defensible barrier, then the barrier must come from somewhere else: unique data assets, strong user relationships, hard-to-replicate distribution channels, or brand and community.
- Speed Still Matters, but Direction Matters More. In a world where everyone can execute quickly, judgment—knowing what to build and for whom—becomes the real differentiator. Slowing down to think about the right questions may be more valuable than quickly executing on the wrong answers.
- Embrace Unbundling, While Looking for Re-bundling Opportunities. We are in the unbundling phase—long-tail tools are emerging, and standardized products are being disaggregated. But history tells us that re-bundling is inevitable. Ask yourself: What integration layer will these fragmented tools eventually need? Who will provide it?
- Accept That This is a Long Game. The chaos of the Installation Period may last for several more years. This is not an era of "quickly find PMF and then scale." It is an era that demands constant adaptation and redefinition. Patience and resilience may be more important than any single skill.
10. No Creation Without Destruction
Every major commoditization in history has been accompanied by a particular kind of pain: those who had accumulated an advantage in the old order find their advantage evaporating. The scribe's decade of calligraphic practice becomes worthless in the face of the printing press. The factory owner's massive investment in a driveshaft system becomes a liability in the age of electricity. The programmer's years of accumulated coding skill are being matched by AI on a timescale of months.
But the other side of "no creation without destruction" is this: the disappearance of old advantages also means the disappearance of old barriers.
Those who were previously excluded for lack of resources, teams, or engineering capability can now compete. Things that once required hundreds of people and tens of millions of dollars can now be started by one person over a weekend.
This is why we are at the beginning of a transformation.
Not because AI will replace everyone's job—history shows that technology rarely "eliminates" professions outright; it more often redefines their content.
But because: When a core capability is commoditized, the entire value chain is reorganized.
And the moment of value chain reorganization is precisely the moment when new players enter the field and new rules are written.
Gutenberg did not know the printing press would fuel the Protestant Reformation. Ford did not know the assembly line would reshape the middle class. When AWS launched in 2006, no one could have predicted that companies like Airbnb and Stripe would become possible as a result.
Likewise, we do not know today what new product forms, new business models, or new ways of creating value will emerge once coding is fully commoditized.
But one thing is certain: Those who are first to understand the new rules and first to redesign themselves—whether as individuals, teams, or companies—around the new capabilities will have the advantage in the new order.
No creation without destruction.
Turn 1: Model sees toolSearchTool only
User → "Help me plan my outfit and shopping for today in Amsterdam."
Model → calls toolSearchTool(query="current time date") → Search Result: ["currentTime"]

Turn 2: Model sees toolSearchTool + currentTime
Model → calls toolSearchTool(query="weather location") → ["weather"]
Model → calls currentTime("Amsterdam") → "2025-12-08T11:30"

Turn 3: Model sees toolSearchTool + currentTime + weather
Model → calls toolSearchTool(query="clothing shops") → ["clothing"]
Model → calls weather("Amsterdam", "2025-12-08T11:30") → "Sunny, 15°C"

Turn 4: Model sees all discovered tools
Model → calls clothing("Amsterdam", "2025-12-08T11:30") → ["H&M", "Zara", "Uniqlo"]
Model → generates the final answer, synthesizing weather, outfit advice, and store recommendations.
Throughout this entire interaction, only 3 of the 28 registered tools were actually loaded into the context. The remaining 25 never consumed context window resources.
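The lazy-loading loop above can be sketched in a few lines. The registry, tool names, and keyword matcher below are illustrative stand-ins, not the provider's actual tool-search API (which would typically rank with BM25 or embeddings):

```python
# Minimal sketch of just-in-time tool loading (illustrative, not a real API).
REGISTRY = {
    "currentTime": {"description": "Get current local time for a city"},
    "weather":     {"description": "Get weather for a city and time"},
    "clothing":    {"description": "Find clothing shops in a city"},
    # ... imagine 25 more registered tools that are never loaded
}

def tool_search(query: str) -> list[str]:
    """Return tools whose descriptions contain every query term."""
    terms = set(query.lower().split())
    return [name for name, t in REGISTRY.items()
            if terms <= set(t["description"].lower().split())]

loaded: dict[str, dict] = {}  # full definitions currently in the model's context

for query in ["current time", "weather city", "clothing city"]:
    for name in tool_search(query):
        loaded[name] = REGISTRY[name]  # inject full definition on demand

# Only the 3 discovered tools consume context; the rest never load.
print(sorted(loaded))  # → ['clothing', 'currentTime', 'weather']
```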
5.3 Analogy to RAG
Tool Search is fundamentally the application of Retrieval-Augmented Generation (RAG) principles to the domain of tool management.
| Dimension | RAG (Knowledge Retrieval) | Tool Search (Tool Retrieval) |
|---|---|---|
| Retrieval Object | Documents / Knowledge Fragments | Tool Definitions (name, description, parameter schema) |
| Indexing Method | Vector Embeddings, BM25 | BM25, Regex, Semantic Embeddings |
| Trigger Condition | User query | Model determines a need for a specific capability |
| Injection Location | Retrieved passages inserted into the prompt | Tool definitions appended to the end of the context |
| Objective | Reduce hallucinations, provide factual knowledge | Reduce context bloat, improve tool selection accuracy |
6. Synergistic Capabilities: Advanced Features for Tool Search
Alongside Tool Search, Anthropic introduced two complementary capabilities. Together, these three components form a complete "Advanced Tool Use" system.
6.1 Programmatic Tool Calling
Problem: In traditional tool use, each individual tool call requires a full model inference cycle, and all intermediate results are fed back into the context window.
Solution: The model writes Python code to orchestrate multiple tool calls. Intermediate results are processed within the code execution environment (a sandbox), and only the final, synthesized result is passed back into the model's context.
# Model-generated orchestration code, executed in a sandbox that exposes
# the tool functions (get_team_members, get_expenses), the budgets table,
# and supports top-level await
import asyncio
import json

# Fetch the roster, then gather every member's Q3 expenses concurrently
team = await get_team_members("engineering")
expenses = await asyncio.gather(*[
    get_expenses(m["id"], "Q3") for m in team
])

# Keep only members whose Q3 spend exceeds their level's travel budget
exceeded = [
    {"name": m["name"], "spent": sum(e["amount"] for e in exp)}
    for m, exp in zip(team, expenses)
    if sum(e["amount"] for e in exp) > budgets[m["level"]]["travel_limit"]
]

# Only this final result enters the model context
print(json.dumps(exceeded))
Result: Token consumption was reduced from 43,588 to 27,297 (a 37% reduction), with a simultaneous increase in accuracy.
6.2 Tool Use Examples
Problem: A JSON Schema defines a tool's structure but fails to convey usage patterns—such as when to pass specific optional parameters or conventions for ID formats.
Solution: Provide input_examples directly within the tool definition, allowing the model to learn the correct invocation patterns from concrete examples.
Result: Accuracy on complex parameter handling increased from 72% to 90%.
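As a sketch, a definition carrying such examples might look like the following. The tool itself and its schema are hypothetical; `input_examples` is the feature described above:

```python
# Illustrative tool definition: the schema defines structure, while
# input_examples convey conventions (ID formats, optional-parameter usage).
tool = {
    "name": "search_customer_orders",
    "description": "Search customer orders by date range, status, or amount.",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "status": {"type": "string"},
            "since": {"type": "string"},
        },
        "required": ["customer_id"],
    },
    "input_examples": [
        {"customer_id": "CUST-001234"},                  # minimal call
        {"customer_id": "CUST-001234",                   # filtered call
         "status": "shipped", "since": "2025-11-01"},
    ],
}

# Sanity check: every example satisfies the schema's required fields.
required = set(tool["input_schema"]["required"])
assert all(required <= ex.keys() for ex in tool["input_examples"])
print(len(tool["input_examples"]))  # → 2
```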
6.3 The Synergistic Relationship
In the agent tool-use lifecycle, these capabilities act in sequence: Tool Search discovers, Tool Use Examples teach, and Programmatic Tool Calling executes.
- Tool Search → Discover the correct tool
- "Find the tool I need"
- Tool Use Examples → Call the tool correctly
- "Learn how to use this tool"
- Programmatic TC → Execute tools efficiently
- "Orchestrate multi-step calls programmatically, returning only critical results"
The optimal practice is to introduce these capabilities progressively based on identified bottlenecks:
- Frequent parameter errors → Start by adding Tool Use Examples.
- Bloated intermediate results → Start by adding Programmatic Tool Calling.
- Excessive tool definitions → Start by adding Tool Search.
7. Practical Implementation Guide
7.1 When to Use Tool Search
Recommended Use Cases:
- The system has 10+ available tools.
- Building a system with multiple MCP (Model Context Protocol) servers (e.g., GitHub + Slack + Jira + ...).
- Tool definitions consume more than 10K tokens.
- Encountering issues with tool selection accuracy.
- The tool library is expected to grow over time.
Not Recommended For:
- Fewer than 10 tools.
- All tools are utilized in every request.
- Tool definitions are extremely concise (totaling <100 tokens).
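These thresholds can be folded into a simple decision helper. This is a sketch that merely encodes the lists above; the function name and signature are my own:

```python
def should_use_tool_search(num_tools: int, definition_tokens: int,
                           all_tools_used_every_request: bool) -> bool:
    """Encode the guidance above: defer-load for large or token-heavy
    tool sets, but not when every tool is used on every request."""
    if all_tools_used_every_request:
        return False
    # Recommended at 10+ tools, or when definitions exceed ~10K tokens.
    return num_tools >= 10 or definition_tokens > 10_000

print(should_use_tool_search(150, 55_000, False))  # large MCP fleet → True
print(should_use_tool_search(6, 800, False))       # small toolset → False
```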
7.2 Best Practices
- Keep 3-5 high-frequency tools consistently loaded.

```json
{
  "name": "search_files",
  "description": "Search files in workspace",
  "input_schema": {...}
  // No defer_loading → always available, no search required
}
```

- Tool names and descriptions must be clear and searchable.

```json
// Effective Naming
{"name": "search_customer_orders", "description": "Search for customer orders by date range, status, or total amount. Returns order details including items, shipping, and payment info."}

// Ineffective Naming
{"name": "query_db_orders", "description": "Execute order query"}
```

- Use consistent namespace prefixes.

```
github_create_pr, github_list_issues, github_merge_pr
slack_send_message, slack_list_channels
jira_create_ticket, jira_update_status
```

- Outline available capabilities in the system prompt.

> You can search for tools in the following categories:
> - Slack message and channel management
> - GitHub repository and PR operations
> - Jira ticket tracking
> - Sentry error monitoring
>
> Use tool search to find specific functions.
- Utilize Namespace Grouping (OpenAI). Limit each namespace to under 10 functions. Use a clear description to summarize the namespace's capabilities, enabling the model to effectively decide when to load a specific group.
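The prefix convention also makes the "under 10 functions per namespace" rule mechanically checkable. Tool names are taken from the examples above; the check itself is a sketch:

```python
from collections import defaultdict

# Tool names following the namespace-prefix convention.
tools = [
    "github_create_pr", "github_list_issues", "github_merge_pr",
    "slack_send_message", "slack_list_channels",
    "jira_create_ticket", "jira_update_status",
]

namespaces: dict[str, list[str]] = defaultdict(list)
for name in tools:
    prefix = name.split("_", 1)[0]      # namespace = text before the first "_"
    namespaces[prefix].append(name)

# OpenAI-style guidance: keep each namespace under 10 functions.
oversized = sorted(ns for ns, members in namespaces.items() if len(members) >= 10)

print(sorted(namespaces), oversized)    # → ['github', 'jira', 'slack'] []
```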
7.3 Trade-offs
| Benefits | Costs |
|---|---|
| Drastically reduces token consumption | Extra search step adds 1-2 API round-trips |
| Improves tool selection accuracy | Search may miss relevant tools (depends on description quality) |
| Protects the Prompt Cache | Slightly increases implementation complexity |
| Supports scaling to thousands of tools | Requires maintaining clear tool descriptions and naming conventions |
8. The Deep Interaction Between Tool Search and Prompt Cache
The Prompt Cache is a core mechanism in current LLM APIs for reducing cost and latency. The introduction of Tool Search has a profound impact on caching behavior—it solves existing problems while introducing new design constraints. This topic has generated significant discussion across developer communities like GitHub and Hacker News.
8.1 Prompt Cache Fundamentals: The Prefix Matching Principle
The core principle of Prompt Caching is prefix matching—the API caches Key-Value (KV) tensors from the beginning of a request up to a specific breakpoint. Subsequent requests that share an identical prefix can reuse these pre-computed tensors, thereby skipping redundant calculations.
Request 1: [System Prompt] [Tool Definitions] [Turn 1 History] ↑ Cache ends here
Request 2: [System Prompt] [Tool Definitions] [Turn 1 History] [Turn 2 History] |─── Cache Hit (Reused) ───| |─── New Computation ───|
The critical constraint: any change within the prefix invalidates the entire cache that follows it.
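The prefix-matching rule can be illustrated with a toy simulation over message lists. Real caches match KV tensors over tokens at designated breakpoints, not strings, but the failure mode is the same:

```python
def cached_prefix_len(cached: list[str], request: list[str]) -> int:
    """Length of the shared prefix whose precomputed KV tensors can be reused."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n

turn1 = ["<system prompt>", "<tool definitions>", "user: turn 1"]
turn2 = turn1 + ["assistant: reply", "user: turn 2"]
print(cached_prefix_len(turn1, turn2))       # → 3 (full prior request reused)

# Any change inside the prefix invalidates everything after it:
changed = ["<system prompt>", "<tool definitions v2>", "user: turn 1"]
print(cached_prefix_len(turn1, changed))     # → 1 (match stops at the change)
```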
Platform Implementation Differences
| Platform | Caching Method | Min. Token Threshold | Cache Retention Time | Cost Discount |
|---|---|---|---|---|
| OpenAI | Automatic (Prefix) | 1,024 tokens | 5-10 min (Standard) / 24 hr (Extended) | 50-90% off input price |
| Anthropic | Developer Controlled | 1,024 tokens | 5 min / 1 hr | 90% discount on cached tokens |
| Google (Gemini) | Automatic + Explicit | 4,096 tokens | Configurable | 75% discount |
8.2 Why Traditional Tool Definitions Invalidate Caching
In a conventional architecture without Tool Search, the full definition of every tool is part of the cacheable prefix. This leads to several common cache invalidation scenarios.
Scenario 1: Adding or Removing Tools
It is a well-documented anti-pattern to add or remove tools mid-session. As many engineering teams have discovered, changing the tool set during a conversation is one of the most common ways developers inadvertently break prompt caching. Because the tools are part of the cached prefix, adding or removing even a single tool invalidates the cache for the entire conversation.
Scenario 2: Changes in Tool Order
Request A: [System Prompt] [Tool_1] [Tool_2] [Tool_3] [Conversation...] // Cache is established
Request B: [System Prompt] [Tool_2] [Tool_1] [Tool_3] [Conversation...] // Entire cache is invalidated!
↑ Mismatch begins here
The non-deterministic ordering of tool definitions, such as iterating over a HashMap, is a subtle but lethal cache killer.
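A simple defence is to sort tools by name before serializing the request, so map iteration order can never perturb the prefix. A minimal sketch:

```python
import json

# The same tool set, iterated in two different orders (e.g. HashMap order).
tools_a = {"search_files": {"description": "Search files in workspace"},
           "create_pr": {"description": "Open a pull request"}}
tools_b = {"create_pr": {"description": "Open a pull request"},
           "search_files": {"description": "Search files in workspace"}}

def serialize_tools(tools: dict) -> str:
    """Sort by tool name so the serialized prefix is byte-identical
    across requests, regardless of the map's iteration order."""
    ordered = [{"name": name, **spec} for name, spec in sorted(tools.items())]
    return json.dumps(ordered, sort_keys=True)

print(serialize_tools(tools_a) == serialize_tools(tools_b))  # → True
```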
Scenario 3: Updates to Tool Parameters
Modifying tool parameters, such as updating a description or adding an optional field, also breaks the cache prefix. For instance, we have observed production systems where simply updating the list of callable agents available to an AgentTool unexpectedly triggered a global cache invalidation.
8.3 How Tool Search Protects the Cache
Tool Search ingeniously resolves these caching challenges through two key design principles.
Design 1: Deferred Tools Are Excluded from the Initial Prefix
The primary mechanism is straightforward: deferred tools are excluded from the initial prompt entirely. They are only added to the context after the model searches for and selects them, ensuring the system prompt and core tool definitions remain stable and cacheable.
Traditional Model (Fragile Cache): [System Prompt] [Full definitions for Tools 1-50: ~55K tokens] [Conversation...] ↑ Any tool change here breaks the cache
Tool Search Model (Stable Cache): [System Prompt] [tool_search_tool: ~500 tokens] [3-5 core tools] [Conversation...] ↑ Remains constant, ensuring high cache stability
This design reduces the initial prefix from a volatile ~55K tokens to a stable ~3-5K tokens.
Design 2: Newly Discovered Tools Are Injected at the End of the Context
Leading implementations of Tool Search emphasize that when a deferred tool is found, its definition is injected into the context after the user's query, not into the original system prompt area. This preserves the integrity of the initial prefix for all subsequent turns in the conversation.
Tool search is engineered to preserve the model's cache. When new tools are discovered by the model, they are injected at the end of the context window.
Turn 1: [System Prompt] [Search Tool] [Core Tools] [Turn 1] ↑ github.createPR is loaded at the end
Turn 2: [System Prompt] [Search Tool] [Core Tools] [Turn 1] [Turn 2] [github.createPR] ├──────── Cache hit (prefix remains unchanged) ────────┤ ├── New content ──┤
Because the prefix remains identical, the cache is not invalidated by dynamically loaded tools.
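This append-only discipline can be captured in a small context-builder sketch. The class and the message placeholders are illustrative, not a provider API:

```python
class AgentContext:
    """Frozen prefix plus append-only suffix (illustrative structure)."""

    def __init__(self, system_prompt: str, search_stub: str, core_tools: list[str]):
        self.prefix = [system_prompt, search_stub, *core_tools]  # never mutated
        self.suffix: list[str] = []                              # append-only

    def add_turn(self, message: str) -> None:
        self.suffix.append(message)

    def inject_discovered_tool(self, definition: str) -> None:
        # Injected at the END of the context, never into the prefix.
        self.suffix.append(definition)

    def render(self) -> list[str]:
        return self.prefix + self.suffix

ctx = AgentContext("<system>", "<tool_search stub>", ["<search_files>"])
prefix_before = list(ctx.prefix)
ctx.add_turn("user: open a PR for this branch")
ctx.inject_discovered_tool("<github.createPR full schema>")

# The prefix is byte-identical, so the next turn still hits the cache.
print(ctx.prefix == prefix_before)  # → True
```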
8.4 Caching Engineering Practices in Production
Our engineering team treats prompt caching as a core pillar of our platform infrastructure, designing the entire system architecture around its optimization. The following are key lessons from our internal implementation.
Principle: Monitor Cache Hit Rate Like Uptime
We alert on cache breaks and treat them as incidents. A few percentage points of cache miss rate can dramatically affect cost and latency.
Cache-Friendly Design for "Plan Mode"
An intuitive approach to implementing a "plan mode" for an agent is to replace its toolset with read-only tools. This, however, breaks the cache.
Our implementation maintains the full toolset at all times. We introduce EnterPlanMode and ExitPlanMode as tools themselves, using a system message to inform the model of its current state. The tool definitions never change.
// Bad: Switching toolsets
Enter Plan Mode → Remove Write, Edit tools → Cache miss!
// Correct: Using tools to represent state
Enter Plan Mode → Send system message "You are now in Plan Mode" → Cache hit
(EnterPlanMode/ExitPlanMode are always in the tool list)
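A minimal sketch of this pattern, with the mode tools always present and state carried by messages. The dispatcher and message strings are illustrative:

```python
# Toolset is fixed forever; mode transitions are tool calls + a system message.
TOOLS = ["Read", "Write", "Edit", "EnterPlanMode", "ExitPlanMode"]

state = {"mode": "normal"}
messages: list[str] = []

def handle_tool_call(name: str) -> None:
    """Mode tools mutate state and emit a message; TOOLS never changes,
    so the cached prefix containing the definitions stays valid."""
    if name == "EnterPlanMode":
        state["mode"] = "plan"
        messages.append("system: You are now in Plan Mode (read-only).")
    elif name == "ExitPlanMode":
        state["mode"] = "normal"
        messages.append("system: Plan Mode ended.")

toolset_before = list(TOOLS)
handle_tool_call("EnterPlanMode")
print(state["mode"], TOOLS == toolset_before)  # → plan True
```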
The Role of Tool Search in a Cached Context
Our solution is defer_loading. Instead of removing tools, we send lightweight stubs—just the tool name, with defer_loading: true—that the model can "discover" via a ToolSearch tool when needed. The full tool schemas are only loaded when the model selects them. This ensures the initial prompt prefix, containing the stubs, remains stable and cacheable.
Cache-Safe Context Compaction
When the context window is exhausted and requires compaction, our system executes the compaction request using the identical system prompt, user context, and tool definitions as the parent conversation. This allows the compaction request to reuse the parent's cached prefix instead of computing from scratch.
8.5 Academic Research: A Quantitative Evaluation of Caching Strategies
In January 2026, a research team published a paper, "Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks" (arXiv:2601.06007), which provided the first systematic quantification of different caching strategies under agentic workloads.
Experimental Setup:
- 4 Models: GPT-5.2, GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro
- 500+ agent sessions, 10,000-token system prompt
- 3 caching strategies vs. a no-cache baseline
Three Caching Strategies:
| Strategy | Description | Cache Scope |
|---|---|---|
| Full Context | Caches the entire context without restriction. | System Prompt + Tools + Dialogue History + Tool Results |
| System Prompt Only | Caches only the system prompt, breaking the cache after it. | System Prompt |
| Exclude Tool Results | Excludes dynamic tool outputs from the cache. | System Prompt + Tools + Dialogue (excluding tool results) |
Core Findings:

| Metric | Improvement (average across all models) |
|---|---|
| Cost reduction | 41-80% |
| Time to First Token (TTFT) reduction | 13-31% |
Key Conclusions:
- Full-context caching can be counterproductive. Naively enabling full-context caching can paradoxically increase latency, as dynamic tool calls and results may trigger cache writes for content that will not be reused across sessions.
- The "System Prompt Only" strategy is the most stable across both cost and latency dimensions. This aligns perfectly with the design philosophy of Tool Search—maintaining a stable, cacheable prefix (System Prompt + Search Tool) while appending dynamic content at the end.
- The strategy for maximum cost savings does not necessarily maximize latency reduction. The choice of caching strategy must be aligned with the primary optimization target (cost vs. speed).
8.6 Community Discussion and Analysis
GitHub Issue #19436: Multi-Tier Cache Optimization
A community user, @guillaume-paradise, proposed a detailed optimization strategy employing multi-level cache Time-to-Live (TTL) based on content change frequency:
- Tier 1 (1-hour TTL): System Prompt + Tool Definitions ← Minimal changes
- Tier 2 (1-hour TTL): User configurations (e.g., CLAUDE.md) ← Occasional changes
- Tier 3 (5-minute TTL): Skills, Hooks, Environment Variables ← Per-session changes
The user calculated the potential benefits: assuming 10 sessions per hour, the current single-TTL approach (5 minutes for all) requires 10 cache writes. The multi-tier strategy would require only 1 write and 9 reads, reducing cache operation costs by 56%.
This issue was marked as a duplicate of #2603, indicating that the request is already being tracked by the Anthropic team.
GitHub Issue #525 (Agent SDK) & #124 (TypeScript SDK): SDK-Level Support for defer_loading
Multiple community users have reported that while the Anthropic API supports defer_loading, this capability has not yet been exposed in the official SDKs. A typical user report states:
"12 SDK tools: ~6,000 tokens; 8 MCP tools: ~5,285 tokens. This consumes 15,000-20,000 tokens before the conversation even begins."
Another enterprise-level use case notes:
"With 150+ MCP tools, the context window is significantly consumed before any work starts."
These issues highlight a persistent gap between the API-level availability of features like Tool Search and their integration into official SDKs and products.
The ToolCaching Paper (arXiv:2601.15335): A Specialized Framework for Tool Caching
A January 2026 paper introduced the Value-Aware Adaptive Caching (VAAC) algorithm, specifically designed to optimize caching in LLM tool-use scenarios. It synthesizes request frequency, time decay, and cache value to achieve:
- An 11% increase in cache hit rate
- A 34% reduction in latency
This represents an evolutionary step from generic prefix matching to tool-aware intelligent caching.
8.7 Design Summary: Tool Search from a Caching Perspective
┌────────────────────────────────────────────────────────────┐
│           Synergy of Prompt Cache & Tool Search            │
│                                                            │
│  Cache-Friendly Prefix (Stable & Immutable)      ← Cached  │
│    System Prompt                                           │
│    + Tool Search Tool (~500 tok)                           │
│    + 3-5 Core Tools (defer_loading: false)                 │
│    + Deferred tool stubs (name only, defer_loading: true)  │
│                                                            │
│  Dynamic Content (Not cached / Session-cached)             │
│    Conversation History                                    │
│    + Tool call results               ← Changes every turn  │
│    + Dynamically loaded full tool definitions              │
│      (injected at the end)                                 │
└────────────────────────────────────────────────────────────┘
### Core Design Principles:
1. **Static Before Dynamic:** System prompts and search tools are placed at the very beginning to ensure cross-session cache hits.
2. **Additive Only:** Tool definitions are never removed. Loading is managed via `defer_loading`.
3. **Stub Placeholders:** Deferred tools persist as lightweight stubs, maintaining a stable prefix order.
4. **Append-Only Injection:** Newly discovered tools are appended to the end of the context, preserving the prefix structure.
5. **State via Messages, Not Tool Mutation:** State transitions, such as entering a "planning mode," are communicated through system messages rather than by altering the toolset.
## 9. Industry Impact and Future Outlook
### 9.1 The Paradigm Shift: From "Function Calling" to a "Tool Ecosystem"
The introduction of Tool Search marks a new phase in the evolution of AI Agent tool utilization:
| Phase | Timeframe | Characteristics |
| :--- | :--- | :--- |
| **Function Calling** | 2023 | Models learn to invoke functions based on a JSON Schema. |
| **Multi-Tool Orchestration** | 2024 | Parallel calls, chained calls, and dependency management between tools. |
| **Tool Ecosystem** | 2025-2026 | Dynamic discovery, on-demand loading, and scaled management of thousands of tools. |
This evolutionary path mirrors the progression in software engineering from static linking to dynamic linking and, ultimately, to microservice discovery.
### 9.2 The Symbiotic Relationship Between MCP and Tool Search
The Model Context Protocol (MCP) provides a standard interface for AI models to connect with external tools, but it also exacerbates the "tool explosion" problem. Tool Search is the critical infrastructure that enables MCP to scale effectively:
MCP allows an Agent to connect to infinite tools
↓
The resulting tool quantity overwhelms the context window
↓
Tool Search enables the Agent to *manage* infinite tools
↓
A truly scalable Agent tool ecosystem emerges
### 9.3 Future Directions
1. **More Intelligent Search Strategies:** Evolving from BM25/Regex to deep semantic understanding, incorporating user intent and historical usage patterns for personalized tool recommendations.
2. **Tool Metadata Standardization:** The MCP protocol may natively support `defer_loading` and search-related metadata in future iterations.
3. **Cross-Agent Tool Sharing:** Enabling multiple collaborating Agents to share tool discovery results, eliminating redundant searches.
4. **Tool Quality Assessment:** Integrating LLM-as-Judge patterns to validate the correctness of a selected tool post-discovery.
5. **Predictive Loading:** Proactively anticipating and preparing required tools based on conversation history and task type.
## 10. Conclusion
Tool Search is not an isolated feature; it is a foundational piece of infrastructure for transitioning AI Agents from experimental toys to production-grade systems. It resolves a core conflict:
*The capabilities required by an Agent are constantly expanding (requiring more tools), yet the model's attention and context space are finite.*
By shifting the paradigm from "full pre-loading" to "on-demand discovery," Tool Search empowers an AI Agent to access hundreds or thousands of tools while maintaining high precision. This is the prerequisite for building genuinely practical AI Agent systems.
The near-simultaneous launch of this capability by OpenAI and Anthropic, with rapid adoption by open-source frameworks like Spring AI for cross-platform abstraction, indicates that Tool Search has become an industry consensus. As the MCP ecosystem continues to expand, Tool Search is set to become a standard component in every AI agent developer's toolkit.
References
[1] Anthropic. "Introducing Advanced Tool Use on the Claude Developer Platform." https://www.anthropic.com/engineering/advanced-tool-use
[2] Anthropic. "Tool Search Tool Documentation." https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
[3] OpenAI. "Using GPT-5.4: Tool Search." https://developers.openai.com/api/docs/guides/latest-model
[4] OpenAI. "Tool Search Guide." https://developers.openai.com/api/docs/guides/tools-tool-search
[5] Spring AI. "Smart Tool Selection: 34-64% Token Savings with Dynamic Tool Discovery." December 11, 2025. https://spring.io/blog/2025/12/11/spring-ai-tool-search-tools-tzolov
[6] TechBuddies. "How Claude Code's New MCP Tool Search Slashes Context Bloat." January 18, 2026. https://www.techbuddies.io/2026/01/18/how-claude-codes-new-mcp-tool-search-slashes-context-bloat-and-supercharges-ai-agents/
[7] MoneyControl. "OpenAI launches GPT-5.4 with major accuracy gains and new tool search system." https://www.moneycontrol.com/technology/openai-launches-gpt-5.4-with-major-accuracy-gains-and-new-tool-search-system-article-13852424.html
[8] Dev Genius. "AI Agent Tool Overload? Cut Token Usage by 99% While Scaling to 1,000+ Tools." https://blog.devgenius.io/ai-agent-tool-overload-cut-token-usage-by-99-while-scaling-to-1-000-tools-fc91f8e2b6ab
[9] Anthropic. "Effective Context Engineering for AI Agents." https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[10] "Lessons from Building Claude Code: Prompt Caching Is Everything." TechTwitter. https://www.techtwitter.com/articles/lessons-from-building-claude-code-prompt-caching-is-everything
[11] "Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks." arXiv:2601.06007v2. https://arxiv.org/html/2601.06007v2
[12] "ToolCaching: Towards Efficient Caching for LLM Tool-calling." arXiv:2601.15335. https://arxiv.org/abs/2601.15335

