AI & Technology·

February 21, 2026

Context Engineering: The New Discipline That Replaced Prompt Engineering

Prompt engineering is dead. Context engineering — the art of designing dynamic systems that feed LLMs the right information at the right time — is the skill that separates production AI from playground demos.

8 min read

In June 2025, Andrej Karpathy — former Tesla AI lead and OpenAI founding member — posted a single observation that crystallized what thousands of AI engineers were already experiencing: the era of prompt engineering is over. What replaced it is something far more demanding, more systematic, and more powerful. He called it context engineering.

“Context engineering is the delicate art and science of filling the context window with just the right information for the next step,” Karpathy wrote. Shopify CEO Tobi Lutke amplified the signal: “I really like the term context engineering over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.”

By July 2025, Gartner made it official: “Context engineering is in, and prompt engineering is out.” This was not a rebrand. It was a recognition that building reliable AI systems requires engineering the entire informational environment a model sees — not just crafting clever prompts.

What Is Context Engineering?

Philipp Schmid, Technical Lead at AWS, offers the canonical definition: “Context engineering is the discipline of designing and building dynamic systems that provide the right information and tools, in the right format, at the right time, to give an LLM everything it needs to accomplish a task.”

Think of it this way: prompt engineering is about how you ask. Context engineering is about everything the model sees. In production AI agent systems, the actual prompt may represent a tiny fraction of the total context. The rest comprises conversation history, retrieved documents, tool definitions, agent state, memory, and dynamically assembled knowledge.

Anthropic's engineering team distills the guiding principle: “Find the smallest possible set of high-signal tokens that maximize the likelihood of desired outcomes.” This is context engineering in a sentence — maximum signal, minimum noise.

The Karpathy Mental Model: LLM as CPU, Context as RAM

Karpathy proposed a powerful analogy: think of an LLM as a CPU, and its context window as RAM. Just as a CPU needs the right data loaded into RAM to compute effectively, an LLM needs the right tokens loaded into its context window to reason effectively.

Context engineering is memory management for AI. Too little context and the model lacks what it needs. Too much irrelevant context and costs spike while performance degrades. The engineering challenge is loading precisely the right information at precisely the right time.

The Seven Components of Complete Context

Schmid identified seven essential components that make up complete context for any AI system:

1. Instructions / System Prompt — Behavioral guidelines, rules, persona definitions, and examples that shape the model's behavior.

2. User Prompt — The immediate task, question, or instruction.

3. State / History (Short-term Memory) — The current conversation thread and prior exchanges that maintain continuity.

4. Long-Term Memory — Persistent knowledge across sessions: user preferences, project summaries, learned patterns.

5. Retrieved Information (RAG) — External, current data pulled from documents, databases, and APIs.

6. Available Tools — Definitions of callable functions the model can use to take actions.

7. Structured Output — Specified response formats like JSON schemas that constrain and focus the model's output.

Four Core Strategies from LangChain

LangChain's engineering blog identifies four primary context engineering strategies that form a practical framework:

Write Context

Save information outside the context window for later retrieval. This includes scratchpads where agents document findings through note-taking, and persistent memories generated via self-reflection. Products like ChatGPT, Cursor, and Windsurf already implement this pattern.

Select Context

Pull relevant information into the context window strategically. Methods include reading from scratchpads, embedding-based retrieval, knowledge graphs, and fixed procedural rules files like CLAUDE.md. LangChain found that RAG-based tool selection increases accuracy by 3x compared to loading all tools into context.

Compress Context

Retain only necessary tokens. Techniques include summarization across agent trajectories and at agent-to-agent handoff boundaries. Claude Code triggers auto-compaction at 95% context capacity, preserving architectural decisions while discarding redundant tool outputs.

Isolate Context

Split context across components to prevent overload. This includes multi-agent architectures with specialized sub-agents, environment-based isolation using code sandboxes, and state-based isolation that exposes only relevant fields at each step.

How Context Fails: Four Patterns to Watch For

Drew Breunig identified four critical failure patterns that every AI engineer should know:

Context Poisoning occurs when a hallucination or error enters the context and gets repeatedly referenced, compounding downstream errors. When Google's Gemini agent hallucinated while playing Pokemon, its poisoned goals section led to nonsensical strategies for the rest of the session.

Context Distraction happens when context grows so long that the model over-focuses on accumulated information, neglecting its training knowledge. The context becomes a distraction rather than an aid.

Context Confusion arises when superfluous information generates low-quality responses. The model uses irrelevant data to inform its output, producing answers that sound plausible but miss the point.

Context Clash happens when new information and tools conflict with existing prompt content, creating contradictions the model cannot resolve cleanly.

Lessons from Anthropic: Building Claude Code

Anthropic published a landmark engineering post detailing three techniques developed from building Claude Code that represent some of the most battle-tested context engineering strategies available:

Compaction: When approaching context limits, summarize conversation contents and reinitiate with a compressed summary. The key insight is to preserve architectural decisions and unresolved issues while discarding redundant tool outputs. Start by maximizing recall, then improve precision by eliminating superfluous content.

Structured Note-Taking: Agents write notes persisted outside the context window, then retrieve and consult them later. When Claude played Pokemon as a test case, this technique enabled it to maintain precise tallies across thousands of game steps — something impossible with context alone.

Sub-Agent Architectures: Specialized sub-agents handle focused tasks with clean context windows. The main agent coordinates the high-level plan while sub-agents perform deep technical work, each returning condensed summaries of 1,000 to 2,000 tokens. This isolates context pollution and keeps each agent focused.

Production Lessons from Manus

Manus, an AI agent platform, published some of the most practical production lessons for context engineering. Their most striking insight: KV-cache hit rate is the single most important metric for a production-stage AI agent. With Claude Sonnet, cached tokens cost $0.30 per million tokens versus $3.00 uncached — a 10x cost difference. Manus experiences roughly a 100:1 input-to-output token ratio, making cache efficiency critical.

Their key practices include maintaining stable prompt prefixes for cache hits, implementing append-only context that never retroactively modifies earlier messages, and using deterministic serialization with stable JSON key ordering. They also recommend masking tools rather than removing them mid-iteration, since dynamically changing tool definitions breaks the KV-cache.

Perhaps their most counterintuitive lesson: preserve error information. Erasing failed actions and error traces removes evidence the model needs to adapt its strategy. And avoid few-shot pattern entrapment — uniform context creates brittle agents that repeat the same patterns. Introducing structured variation breaks repetitive decision loops.

Context Engineering for AI Coding Assistants

The AI coding assistant space is where context engineering is most visibly reshaping developer workflows. Each major tool takes a different approach to the same fundamental challenge: how to give a model enough codebase context to be useful without overwhelming it.

Claude Code uses a terminal-native approach with 200,000-token context windows. It pre-loads CLAUDE.md files at session start for project-wide conventions, supports path-scoped rules, sub-agents with separate context windows, and MCP servers for external tool integration. It auto-compacts at 95% context capacity.

Cursor analyzes multiple files simultaneously with project-wide context understanding. It uses .cursorrules files paired with example code for project-specific guidance. GitHub Copilot examines code around the cursor, open files, and repository metadata, using copilot-instructions.md for project context.

A new ecosystem is emerging around project-specific context files — CLAUDE.md, .cursorrules, copilot-instructions.md — that tell AI assistants how to work with a codebase. As Martin Fowler's ThoughtWorks team observed: “The most important variable is no longer which AI you choose, but how well you define the project-specific context it works from.”

The Model Context Protocol: Infrastructure for Context

Anthropic introduced the Model Context Protocol (MCP) in November 2024 as an open standard for connecting AI systems to external data sources and tools. MCP solves the N-times-M integration problem by providing a universal protocol with three core primitives: Resources for information retrieval, Tools for performing actions, and Prompts for reusable templates.

MCP has been adopted by OpenAI, Google DeepMind, and major enterprise systems. It represents the infrastructure layer of context engineering — standardizing how context flows into models across tools and platforms.

The Numbers: Why Context Engineering Matters

LangChain's 2025 State of Agent Engineering Report surveyed 1,340 respondents and found that 57% of organizations now have AI agents in production, yet 32% cite quality as the top barrier — with most failures traced to poor context management, not LLM capability limitations.

Real-world impact is measurable. Microsoft reported a 26% increase in completed coding tasks and 65% fewer errors with AI code helpers using proper context engineering. Five Sigma Insurance achieved an 80% reduction in processing errors and 25% increase in adjuster productivity. Retailers report 10x improvements in personalized offer success rates.

The token economics alone make the case. Current context windows range from 8,000 tokens to over 1 million, but a typical enterprise monorepo spans several million tokens — far exceeding any window. Research confirms the “lost in the middle” phenomenon: models struggle to access information buried in the middle of long contexts. Context engineering is not optional — it is the difference between AI that works and AI that fails.

A Systems Thinker's Take

From a game design and systems thinking perspective, context engineering is deeply familiar. Game designers have always understood that the environment shapes behavior more than individual instructions. A well-designed game does not tell players what to do — it creates conditions where the desired behavior emerges naturally.

Context engineering follows the same principle. You do not tell the model how to reason — you create an informational environment where good reasoning emerges. The feedback loops are identical: provide the right constraints, the right information at the right time, and the system performs. Overload it with noise, and it degrades. This is not prompt crafting. It is systems design.

As Anthropic concluded: “Even with advancing capabilities, treating context as a precious, finite resource will remain central to building reliable, effective agents.” And as Cognizant CIO Neal Ramasamy warned in February 2026: “Teams that take the time to codify that context, implement runtime lineage, and build responsible governance into execution will move faster with fewer surprises as they scale agents.”

Context engineering is not a trend. It is the engineering discipline that makes AI systems work in production. Whether you are building agents, coding assistants, or any LLM-powered product — the context is the product.