MAKER: Shattering the Illusion of Thinking with Million-Step, Zero-Error LLM Reasoning
Updated on November 13, 2025
[Figure: MAKER million-step zero-error LLM reasoning visualization]
For AI to solve problems at the scale of human organizations and societies—from constructing skyscrapers to managing national logistics—it must execute vast numbers of steps flawlessly. Yet, despite remarkable breakthroughs in reasoning and tool use, Large Language Models (LLMs) have consistently failed at tasks requiring long, dependent sequences of actions.
This is the challenge MAKER addresses. Developed by researchers at Cognizant AI Lab in collaboration with UT Austin, MAKER is the first system to successfully solve a task requiring over one million LLM steps with zero errors. This achievement introduces a new paradigm for scaling AI: Massively Decomposed Agentic Processes (MDAPs).
If you are a developer looking to build robust AI systems, a solo founder aiming for scalable operations, or a designer sketching agentic workflows, MAKER provides a blueprint for reliable, large-scale AI development.

The LLM Reliability Cliff
Current LLMs suffer from a persistent error rate that prevents scale-up. When tasks involve many dependent logical steps, even small errors compound quickly, leading to catastrophic failure.
Experiments on benchmarks such as the Towers of Hanoi vividly demonstrate this “reliability cliff”: standard models perform well on small instances but fail completely once the puzzle grows beyond roughly eight disks. Even a system with just a 1% per-step error rate is expected to fail within about 100 steps, nowhere near the million steps the hardest tasks require.
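The arithmetic behind the cliff is easy to check. Under a simple independence assumption (an illustration, not the paper's exact error model), the chance of finishing s dependent steps with per-step success p is p^s:

```python
# Illustrative model: independent per-step errors, so the chance of
# completing all s dependent steps is p ** s.

def completion_probability(per_step_success: float, steps: int) -> float:
    """P(every one of `steps` dependent steps succeeds)."""
    return per_step_success ** steps

# A 99%-reliable agent finishes a 100-step task only about a third of the time,
# and a million-step task essentially never.
p_100 = completion_probability(0.99, 100)          # ~0.366
p_million = completion_probability(0.99, 1_048_575)  # effectively zero
```

This is why no amount of incremental per-step improvement alone closes the gap: the exponent, not the base, dominates.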

MAKER tackles this fundamental liability by shifting the focus from constantly improving a single “intelligent” LLM to designing an inherently error-tolerant system architecture.

Understanding MAKER: Scaling Intelligence Through Structure
MAKER—which stands for Maximal Agentic decomposition, first-to-ahead-by-K Error correction, and Red-flagging—is an implementation of the MDAP framework.
The core insight is that reliability can be achieved through extreme decomposition and local error correction rather than through ever-smarter individual models. The results suggest that MDAPs can efficiently solve problems at the level of organizations and societies without relying solely on continual LLM improvement.
MAKER relies on three core components:
1. Maximal Agentic Decomposition (MAD)
For long tasks, LLMs performing multi-step reasoning often become unreliable as their context increases. MAD solves this by breaking the task into the smallest possible subtasks, assigning each to a focused microagent.
- Microagents, Micro-roles: Each agent is assigned only a single subtask (maximal decomposition, m=1). This limits the agent’s context to the minimal information needed for that single step.
- Efficiency: This extreme focus allows the use of smaller, non-reasoning LLMs with limited context sizes, which were found to be more cost-effective for long-range tasks within the MAKER framework.
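As a sketch of what m=1 decomposition looks like in practice (the interface and prompt wording here are illustrative assumptions, not taken from the paper):

```python
# Sketch of maximal decomposition (m=1): each microagent sees only the
# current state and is asked for exactly one move. The class and function
# names are hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass(frozen=True)
class StepTask:
    state: tuple     # minimal context: the current peg configuration only
    step_index: int  # which of the s steps this agent is responsible for

def microagent_prompt(task: StepTask) -> str:
    """Build the minimal prompt for a single-step microagent."""
    return (
        f"Pegs: {task.state}\n"
        "Output exactly one move in the form 'move disk X from peg A to peg B'."
    )

# Each LLM call receives only this tiny prompt, never the full history.
prompt = microagent_prompt(StepTask(state=((3, 2, 1), (), ()), step_index=0))
```

Because the prompt carries no accumulated history, context never grows with the task, which is what makes small, cheap models viable.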
2. First-to-ahead-by-k Voting
Modularity enables effective and scalable error correction at the subtask level. MAKER uses a multi-agent voting scheme: multiple agents independently attempt to solve the same single step.
- Local Consensus: Candidate actions are sampled until one action has achieved k more votes than any other. This is known as “First-to-ahead-by-k voting”.
- Scaling Efficiency: The necessary vote threshold, k_min, grows only logarithmically (Θ(ln s)) with the total number of steps (s). This is a key finding: when combined with MAD, the overall expected cost of solving the entire task scales log-linearly (Θ(s ln s)). In contrast, if agents handle multiple steps (m>1), the cost grows exponentially.
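The voting rule itself fits in a few lines. A minimal sketch, where `sample_action` stands in for one microagent attempt (i.e., one LLM call); the helper names are hypothetical:

```python
import random
from collections import Counter

def first_to_ahead_by_k(sample_action, k: int, max_samples: int = 10_000):
    """Sample candidate actions until one leads every rival by k votes.

    `sample_action` is a zero-argument callable standing in for a single
    microagent attempt. Returns the consensus action.
    """
    votes = Counter()
    for _ in range(max_samples):
        votes[sample_action()] += 1
        leader, lead_count = votes.most_common(1)[0]
        runner_up = max((c for a, c in votes.items() if a != leader), default=0)
        if lead_count - runner_up >= k:
            return leader
    raise RuntimeError("no consensus reached within max_samples")

# Toy usage: an agent that picks the right move 80% of the time.
rng = random.Random(0)
flaky = lambda: "A->C" if rng.random() < 0.8 else "A->B"
consensus = first_to_ahead_by_k(flaky, k=3)
```

For a two-candidate race this is the classic gambler's-ruin setup, which is where the logarithmic growth of k_min comes from.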
3. Red-Flagging
To boost the per-step success rate (p), MAKER uses “red-flagging” to discard responses that indicate increased risk of errors, especially correlated errors.
- Indicators of Confusion: MAKER flags responses that are overly long or incorrectly formatted. Preliminary experiments showed that longer answers tend to have more errors, and incorrect formatting often correlates with flawed reasoning.
- Mitigation: By discarding these responses and resampling, MAKER increases the success rate (p) and meaningfully reduces correlated errors, ensuring localized failures don’t propagate.
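A red-flag filter can be as simple as a length cap plus a format check. In this sketch, the threshold and the expected move format are assumptions for illustration, not values from the paper:

```python
import re

# Hypothetical red-flag filter: discard responses that are overly long
# (a proxy for confused reasoning) or that fail to match the expected
# one-move output format. Threshold and pattern are illustrative.

MOVE_PATTERN = re.compile(r"^move disk \d+ from peg [A-C] to peg [A-C]$")
MAX_CHARS = 80  # assumed length cutoff

def red_flagged(response: str) -> bool:
    """Return True if the response should be discarded and resampled."""
    text = response.strip()
    return len(text) > MAX_CHARS or not MOVE_PATTERN.match(text)
```

Flagged responses are simply thrown away and the step is resampled, so the filter never has to be clever, only conservative.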
The Proof: Solving the 20-Disk Towers of Hanoi
To validate MAKER, researchers applied it to the Towers of Hanoi puzzle with 20 disks. This configuration requires 2²⁰ - 1, or 1,048,575, dependent steps. Every single step had to be executed correctly.
Using gpt-4.1-mini (a non-reasoning model chosen for its cost-effectiveness), and setting the voting threshold to k=3, the full MAKER system solved the problem perfectly. This successful execution of over one million LLM steps with zero errors establishes that scaling LLM-based systems to large time horizons is possible.
The process exhibited exponential convergence toward a zero-error solution, confirming MAKER’s theoretical efficiency.
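For scale, the plan MAKER executed is the classic optimal Hanoi sequence; generating it with textbook recursion confirms the step count:

```python
# Ground truth for the benchmark: the optimal n-disk Towers of Hanoi
# solution has exactly 2**n - 1 moves, every one dependent on the last.

def hanoi_moves(n: int, src="A", aux="B", dst="C"):
    """Yield the optimal move sequence as (disk, from_peg, to_peg) tuples."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, dst, aux)
    yield (n, src, dst)
    yield from hanoi_moves(n - 1, aux, src, dst)

total = sum(1 for _ in hanoi_moves(20))  # 1,048,575 dependent steps
```

Every one of those moves had to be decided correctly by the voting pipeline; a single wrong move anywhere invalidates the run.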

Implications for AI Development, Design, and Scaling
The MAKER architecture provides critical insights for developers, designers, and solo founders building the next generation of AI products:
1. Development and Agent Design
MAKER’s success hinges on Extreme Decomposition, mirroring principles found in microservices architecture:
- Modularity: Each microagent can be tailored to a specific task.
- Independent Development: Agents can be updated and tested in isolation.
- Design for Failure: The system is inherently designed to tolerate the failure of individual agents through voting/error correction.
For developers, this suggests that investment should focus on creating highly specialized, minimal-context microagents rather than continually chasing the latest, largest monolithic LLM.
2. Scaling and Cost Management (For Solo Founders)
With MDAPs, you can maintain a high probability of success on large tasks simply by increasing k (the vote threshold). Crucially, the system’s total cost still scales only log-linearly with the number of steps.
- This framework lets you select the most cost-effective LLM, i.e., the one minimizing cost per call divided by per-step success rate (c/p). Surprisingly, smaller, non-reasoning models often provide the best reliability-per-dollar when used in MAKER.
- The total cost of running MAKER scales much more efficiently than using a single agent or a partially decomposed system.
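To see the logarithmic scaling concretely, here is a back-of-the-envelope calculation using the two-candidate gambler's-ruin idealization (a simplification of the paper's analysis, not its exact formula): under first-to-ahead-by-k voting, a step is decided correctly with probability 1 / (1 + (q/p)^k), and the smallest adequate k barely moves as the task grows ten-thousand-fold:

```python
# Two-candidate gambler's-ruin idealization of first-to-ahead-by-k voting:
# p = per-attempt success, q = 1 - p = per-attempt error rate.

def per_step_success(p: float, k: int) -> float:
    """Chance the vote settles on the correct action for one step."""
    q = 1.0 - p
    return 1.0 / (1.0 + (q / p) ** k)

def min_k(p: float, steps: int, target: float = 0.99) -> int:
    """Smallest vote threshold k whose end-to-end success meets `target`."""
    k = 1
    while per_step_success(p, k) ** steps < target:
        k += 1
    return k

# With p = 0.99 per attempt, a 100-step task needs k = 3, while the
# full 1,048,575-step task still needs only k = 5.
```

A ~10,000x increase in task length raises the required threshold by just two votes, which is the practical face of k_min = Θ(ln s).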
3. Safety and Control (For Founders and Enthusiasts)
MAKER presents an alternative path to advanced AI that comes with substantially reduced risks compared to relying on ever-smarter single models.
- Transparency and Auditing: Because each step has a clearly defined and limited focus, the agents’ actions are easier to sandbox, audit, and control.
- Reduced Collusion Risk: Running multiple focused agents independently on each step substantially reduces the ability of agents to collude to produce harmful actions.
- Model Size and Risk: The ability to use smaller LLMs for the vast majority of the work mitigates risks associated with powerful, less-controlled models.
The Future of Agentic AI
While MAKER demonstrated flawless execution of a known plan in the Towers of Hanoi, the next frontier for AI development is extending this framework to handle creative insights—planning, idea generation, and verification.
By decomposing the entire problem-solving pipeline, including the creative parts, and applying MDAP principles, developers can automate complex processes where the total number of steps and the specific subtask types are unknown beforehand.
MAKER proves that dependable, large-scale intelligence can be achieved with systems that are smaller, safer, and more controllable. The future of AI doesn’t depend solely on building bigger models, but on designing smarter, distributed systems that simply do not fail.
Built an AI tool you want to share? I’ve compiled a curated list of AI directories where you can submit your AI projects. Each directory includes my personal review, submission process details, and quality indicators to help you choose the best platforms for your launch.
MAKER was described in the preprint “Solving a Million-Step LLM Task with Zero Errors,” authored by Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, and others, and featured in the blog post “Shattering the Illusion: MAKER Achieves Million-Step, Zero-Error LLM Reasoning”.