
Context Engineering for Developers: The Complete Guide

Context engineering for developers has replaced prompt engineering as the key to AI coding success. Learn the five core strategies—selection, compression, ordering, isolation, and format optimization—plus how to implement context engineering for AI agents in enterprise codebases today.

Thierry Donneau-Golencer
15 min read
December 1, 2025

Why is context engineering more important than prompt engineering?

Context engineering for developers has replaced prompt engineering as the key determinant of AI coding agent success. This discipline of architecting your AI agent's entire information ecosystem determines whether teams ship reliable code or generate expensive technical debt. Developers who master what information their agents see, when they see it, and how it's structured are seeing the biggest productivity impact.

Imagine this: you spent twenty minutes crafting the perfect prompt for your coding agent. You were specific about the requirements, clear about the constraints, and even included a few examples. The agent churned for a minute and generated beautiful, idiomatic code that compiled on the first try.

Then you deployed it and watched three microservices go down.

The agent had ignored your authentication layer, bypassed your data validation patterns, and introduced a dependency that conflicted with your existing stack! None of this was mentioned in your prompt because you assumed the AI would just... know. After all, it's been trained on millions of repositories, right?

Here's the key thing you need to know about working with AI coding agents in 2026: the prompt isn't the problem. The context is.

While the industry spent 2023 and 2024 obsessing over prompt engineering, the best teams quietly figured out something more fundamental. By mid-2025, when Andrej Karpathy and Shopify CEO Tobi Lütke started talking about "context engineering," they weren't coining a buzzword. They were naming the discipline that actually determines whether your AI coding agents ship reliable code or generate expensive technical debt.

In 2026, the teams shipping reliable AI-generated code won’t be the ones with clever prompts. They'll be the ones who mastered what information their agents see, when they see it, and how it's structured.

This is the complete guide to why context engineering matters, how it works, and how to implement it in your workflow today.

Context engineering for AI coding agents summary infographic: Architecting the information ecosystem

What is context engineering?

Context engineering is the discipline of architecting your AI agent's entire information ecosystem: not just the prompt, but all the information a model has access to, including codebase context, git history, dependencies, tool definitions, team standards, and retrieved documentation. The context engineering definition in software development is straightforward: it's the practice of curating all the information an AI agent needs to produce code that actually works in your system.

Think of it this way: Prompt engineering is like giving someone a task: "Fix the authentication bug in the login service." Context engineering is ensuring they have access to your codebase, know which authentication library you use, understand your security requirements, can see recent changes to the auth module, and know which test patterns your team follows. 

In short, context engineering for AI agents means providing the full operational picture—not just the task instruction.

The Full Context Stack:

  1. System prompts & instructions (the traditional "prompt")
  2. Codebase context (relevant files, functions, architectural patterns)
  3. Git history & recent changes (what happened before this task)
  4. Dependencies & imported libraries (what's available to use)
  5. Tool definitions (debuggers, linters, test runners the agent can invoke)
  6. Team standards & patterns (AGENTS.md files, style guides)
  7. Conversation history (context across multiple sessions)
  8. Retrieved documentation (API docs, examples, architecture decision records)
The full context stack for AI coding agents

Traditional Retrieval-Augmented Generation (RAG) focuses on one piece of this puzzle. Context engineering is the entire discipline of managing all these pieces together, in the right order, at the right granularity level, and with the right structure.

LangChain's team puts it well. Context engineering encompasses three facets: 

  • Instructional context (what to do)
  • Knowledge context (facts and domain information)
  • Tools context (capabilities and their results). 

A robust AI coding assistant needs all three, orchestrated correctly.

Why isn't prompt engineering enough anymore?

Prompt engineering fails at scale because models struggle with large contexts ("lost-in-the-middle" phenomenon), costs scale linearly with context size, and single prompts can't capture the architectural knowledge, patterns, and tribal wisdom that determine whether AI-generated code actually works in your system.

The debate around context engineering vs prompt engineering is settled. Context engineering wins. The limitations of prompt engineering became painfully obvious in 2025 when teams tried to scale AI coding assistants beyond demos. Consider the context window paradox: models now advertise 1 million, even 2 million token context windows. Sounds amazing, right? Throw your entire codebase at the AI and let it figure things out.

Except that's not how it works in practice.

Research from Stanford and UC Berkeley found that model correctness starts dropping around 32,000 tokens, even for models claiming much larger windows. The problem is "lost-in-the-middle": when context grows massive, models struggle to attend to information buried in the middle. They focus on the beginning and end, but everything else becomes noise.

And it gets worse: cost and latency scale linearly with context size. Every token you include costs money and adds milliseconds to response time.
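
To make that scaling concrete, here is a rough back-of-the-envelope sketch in Python. The per-token price and prefill-latency figures are invented placeholders, not any provider's published rates; substitute your own numbers.

# Illustration of linear scaling: cost and latency grow with input tokens.
# The rates below are made-up placeholders for illustration only.
PRICE_PER_1K_INPUT_TOKENS_USD = 0.003   # assumed, not a real price list
LATENCY_PER_1K_TOKENS_MS = 15           # assumed prefill latency

def estimate_call(context_tokens, calls_per_day):
    cost_per_call = context_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS_USD
    added_latency_ms = context_tokens / 1000 * LATENCY_PER_1K_TOKENS_MS
    return {
        "cost_per_call_usd": round(cost_per_call, 4),
        "daily_cost_usd": round(cost_per_call * calls_per_day, 2),
        "added_latency_ms": round(added_latency_ms),
    }

print(estimate_call(10_000, 500))    # ~$0.03/call, ~$15/day, ~150 ms added
print(estimate_call(100_000, 500))   # ~$0.30/call, ~$150/day, ~1,500 ms added

Ten times the context means roughly ten times the input cost and prefill latency on every single agent call.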

The lesson is counterintuitive but critical: More context doesn't equal better performance. Optimal density wins. This aligns with findings from the AI Productivity Paradox Report—more AI usage doesn't automatically mean more productivity.

Here's what breaks when you rely on prompts alone:

  • Architecture violations: The agent generates code that compiles perfectly but violates your system's architectural patterns because it never saw your design principles.
  • Repetitive questions: The agent asks about information already discussed three sessions ago because it has no memory of previous interactions.
  • Inconsistent patterns: When modifying multiple files, the agent uses different naming conventions in each because it's seeing them in isolation.
  • Hallucinated dependencies: The agent confidently imports libraries that don't exist in your project because it's relying on training data, not your actual package.json.
  • Context overflow: Critical instructions get lost when the context window fills up with too much information.

AI coding agent failure modes when relying solely on prompts

What should humans do vs. LLMs?

Humans must define the architecture, curate the context strategy, establish quality gates, and write clear specifications. LLMs should execute code generation, select relevant context at runtime, compress conversation history, and apply patterns—but only within the guardrails humans establish.

This division of labor is critical and often misunderstood. Many teams assume they can hand off context engineering to the AI itself. Recent research challenges this assumption.

What humans must own:

  • Specification quality: Defining what success looks like, with clear acceptance criteria.
  • Context architecture: Deciding what categories of information agents should access.
  • Pattern documentation: Codifying architectural decisions, anti-patterns, and tribal knowledge.
  • Quality gates: Establishing human-in-the-loop checkpoints before agents generate irreversible changes.
  • Evaluation frameworks: Defining how to measure whether context engineering improvements actually work.

What LLMs can handle (with guardrails):

  • Runtime context selection: Retrieving relevant files, functions, and documentation for a given task.
  • Context compression: Summarizing conversation history while preserving key decisions.
  • Pattern application: Following documented standards consistently across files.
  • Incremental refinement: Learning from execution feedback to improve context over time.

The ACE (Agentic Context Engineering) framework from Stanford demonstrates this division well. Their system uses separate roles: a Generator that produces code, a Reflector that extracts lessons from successes and failures, and a Curator that integrates insights into structured context updates. This modular approach, where different components handle generation, evaluation, and curation, mirrors how human teams actually work.

Crucially, ACE's research found that contexts should function as "comprehensive, evolving playbooks" rather than concise summaries. Unlike humans, who often benefit from condensed information, LLMs are more effective when provided with detailed, domain-specific context. The model can filter relevance at inference time, but only if the relevant information is present to begin with.
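
A minimal sketch of that kind of modular loop is below. The function and attribute names (run_task, execute_tests, result.passed) are hypothetical stand-ins to illustrate the division of roles; this is not the ACE implementation.

# Sketch of a generate -> reflect -> curate loop. All helpers here are
# hypothetical stand-ins; ACE's actual components and interfaces differ.
def run_task(task, playbook, llm, execute_tests):
    for attempt in range(3):
        # Generator: produce code using the current playbook as context
        code = llm(f"{playbook}\n\nTask: {task}\n\nWrite the code.")
        result = execute_tests(code)            # ground-truth feedback
        # Reflector: extract a lesson from the outcome
        lesson = llm(
            f"Task: {task}\nOutcome: {result.summary}\n"
            "State one concrete, reusable lesson for future attempts."
        )
        # Curator: merge the lesson into the playbook as a delta update,
        # rather than rewriting the playbook from scratch
        playbook = playbook + "\n- " + lesson.strip()
        if result.passed:
            return code, playbook
    return code, playbook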

What are the five core context engineering strategies?

The five strategies are: (1) Context selection - retrieving only the most relevant pieces from your codebase; (2) Context compression - retaining critical information while reducing token count; (3) Context ordering - positioning information where models will attend to it; (4) Context isolation - splitting context across specialized agents; and (5) Format optimization - structuring information for maximum comprehension.


Strategy #1: Context Selection (Write Less, Include More)

The first principle sounds paradoxical: to give your agent more useful information, you should include less total information.

Instead of dumping 100 files into the context window, you intelligently identify the five files that actually matter for the current task, plus function signatures from 15 others for reference.

The best coding assistants use sophisticated retrieval techniques: semantic search over embeddings to find conceptually related code, AST-based chunking at function and class boundaries, hybrid search combining keyword matching with semantic similarity, and reranking to prioritize the most relevant results.
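
A simplified sketch of hybrid selection with a blended score is shown below, assuming you already have an embedding function and a lexical scorer such as BM25; the chunk format and the 0.7/0.3 weights are illustrative assumptions, not a recommendation.

# Hybrid retrieval sketch: combine semantic and keyword scores, then keep
# only the top few chunks. embed(), keyword_score(), and the chunk list
# are assumed to exist; real assistants also chunk on AST boundaries.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_context(query, chunks, embed, keyword_score, k=5):
    q_vec = embed(query)
    scored = []
    for chunk in chunks:                      # chunk: {"text": ..., "vector": ...}
        semantic = cosine(q_vec, chunk["vector"])
        lexical = keyword_score(query, chunk["text"])   # e.g. BM25
        scored.append((0.7 * semantic + 0.3 * lexical, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]  # only the few chunks that matter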

Practical example for an authentication task:

Needed:
  • The auth middleware file (500 lines)
  • The user model definition (200 lines)
  • Function signatures from the database layer (100 lines)
  • Your authentication configuration (50 lines)

Not needed:
  • The entire frontend codebase (50,000 lines)
  • Unrelated backend services (30,000 lines)
  • Database schema migration history (5,000 lines)
  • Other configurations for logging, rate limiting, CORS

Example of context needed (and not needed) for an authentication task

Strategy #2: Context Compression (Keep History, Lose Weight)

Your coding agent has been working for 40 turns across three files. It's accumulated git commit history, test results, error messages, and intermediate attempts. The context window is at 95% capacity. What do you do?

Context compression lets you retain the information that matters while drastically reducing token count. When Claude Code hits 95% capacity, for example, it triggers "auto-compact" and summarizes the full trajectory of your interaction. You lose the verbatim conversation but keep the architectural insights and decisions made.

The ACE framework addresses this with "incremental delta updates." Instead of regenerating contexts in full, it produces compact updates that are merged into existing context. This prevents what researchers call "context collapse," where iterative rewriting gradually erodes important details.
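
A rough sketch of threshold-triggered compaction in this spirit is below; the 95% trigger, the message format, and the summarize helper are assumptions for illustration, not any tool's actual internals.

# When the transcript nears the window limit, summarize the older turns
# and keep only the summary plus the most recent exchanges.
def maybe_compact(messages, count_tokens, summarize, window=200_000, keep_recent=10):
    used = sum(count_tokens(m["content"]) for m in messages)
    if used < 0.95 * window:
        return messages                       # plenty of room, do nothing
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(
        "Summarize this coding session. Preserve decisions made, files "
        "touched, constraints discovered, and open questions:\n"
        + "\n".join(m["content"] for m in old)
    )
    return [{"role": "system", "content": f"Session summary: {summary}"}] + recent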

Strategy #3: Context Ordering (Position Matters More Than You Think)

Where information appears in your context window dramatically affects whether the AI uses it correctly.

The "lost-in-the-middle" phenomenon is well-documented. Models attend strongly to information at the beginning and end of their context window, but information in the middle gets lost. This isn't a bug—it's how attention mechanisms work at scale

Recommended ordering for coding agents:

[1. BEGINNING - Critical constraints and rules]
- System prompt
- AGENTS.md coding standards
- Critical "DO NOT" instructions
- Security requirements

[2. EARLY - Available capabilities]
- Tool definitions (debugger, linter, test runner)
- API documentation for key libraries
- Architectural overview

[3. MIDDLE - General context]
- Repository structure
- Related code examples
- Historical context and patterns

[4. LATE - Current state]
- Recent git commits
- Current branch changes
- Test results and error messages

[5. END - Immediate task]
- Files currently being edited
- Specific functions to modify
- User's exact request

Why does this order work?

  • Critical rules at the start prevent them from being overridden by examples in the middle. If your security requirement is "all API calls must check authentication," put it at position 1, not position 500.
  • Current work at the end leverages recency bias. The model naturally focuses more attention on the most recent information, making it perfect for "here's what I'm editing right now."
  • Examples in the middle provide reference without overwhelming the immediate task. They're available if needed but don't distract from current requirements.

Real-world example from production deployments: teams that moved their AGENTS.md standards from the middle to the very beginning of context saw 35-40% reductions in code style violations. Same information, different position, dramatically different results.
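
A minimal sketch of assembling a context window in this order is below, under the assumption that each section has already been selected and rendered as a string; the section names simply map to the layout above.

# Assemble the context window so critical rules sit first and the
# immediate task sits last. Inputs are assumed to be pre-selected strings.
def build_context(standards, tools, examples, recent_state, task):
    sections = [
        ("CRITICAL RULES", standards),        # beginning: constraints, DO NOTs
        ("AVAILABLE TOOLS", tools),           # early: capabilities
        ("REFERENCE", examples),              # middle: repo structure, examples
        ("CURRENT STATE", recent_state),      # late: recent commits, test output
        ("TASK", task),                       # end: files being edited, request
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)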

Strategy #4: Context Isolation (Divide and Conquer)

Sometimes the best context engineering decision is to split context across multiple specialized agents rather than giving one agent everything.

This is context isolation: the principle that for complex tasks, multiple agents with focused contexts outperform a single agent with massive context.

Here is an example:

Planning Agent
├─ Context: Architecture docs + task requirements
├─ Output: Implementation plan
└─ Tokens: 8,000
Coder Agent (Backend)
├─ Context: Relevant backend modules + coding standards + plan
├─ Output: Backend implementation
└─ Tokens: 15,000
Coder Agent (Frontend)
├─ Context: Relevant frontend files + UI patterns + plan
├─ Output: Frontend implementation
└─ Tokens: 12,000
Test Agent
├─ Context: Implementation + test patterns + coverage requirements
├─ Output: Test suites
└─ Tokens: 10,000

Each agent sees exactly what it needs and nothing more. The Planning Agent doesn't need implementation details. The Coder Agents don't need test frameworks. The Test Agent doesn't need architectural context.

Note that while context isolation can be helpful, there are cases where it is not appropriate. Recent Cognition research shows that multi-agent systems can be fragile: decision-making becomes too dispersed, and parallel actions carry conflicting implicit decisions that undermine reliability.

For those scenarios, single-threaded architectures with intelligent context compression may prove more dependable.  

The TL;DR: don't build multi-agents for parallelism if they need shared context. Build them for specialization with clear boundaries.
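
A simplified orchestration sketch of the pipeline above is shown below; run_agent and load are hypothetical stand-ins, and the point is only that each role receives its own narrow slice of context plus the explicit hand-off (the plan).

# Each specialized agent gets a focused context and a clear hand-off.
# run_agent() and load() are hypothetical stand-ins for your own tooling.
def implement_feature(requirements, run_agent, load):
    plan = run_agent(
        role="planner",
        context=[load("architecture_docs"), requirements],
    )
    backend = run_agent(
        role="backend_coder",
        context=[load("backend_modules"), load("coding_standards"), plan],
    )
    frontend = run_agent(
        role="frontend_coder",
        context=[load("frontend_files"), load("ui_patterns"), plan],
    )
    tests = run_agent(
        role="tester",
        context=[backend, frontend, load("test_patterns")],
    )
    return plan, backend, frontend, tests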

Strategy #5: Format Optimization (Structure is Signal)

How you structure information affects both token efficiency and model comprehension:

  • YAML/XML is more token-efficient than JSON
  • Markdown with clear headers helps models navigate structure
  • Code blocks with language tags enable syntax-aware parsing
  • Structured schemas are faster to process than prose descriptions
  • Tables work better than paragraphs for comparative data
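
To illustrate the first point, here is a small sketch that serializes the same tool definition as JSON and as YAML; character counts stand in for token counts here, and the exact savings depend on the tokenizer.

# The same tool definition rendered as JSON and YAML. YAML drops braces,
# quotes, and commas, which usually translates into fewer tokens.
import json
import yaml   # PyYAML

tool = {
    "name": "run_tests",
    "description": "Run the unit test suite",
    "parameters": {"path": "string", "verbose": "boolean"},
}

as_json = json.dumps(tool, indent=2)
as_yaml = yaml.safe_dump(tool, sort_keys=False)
print(len(as_json), len(as_yaml))   # the YAML rendering is noticeably shorter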

What makes context engineering hard on enterprise codebases?

Enterprise environments expose four structural challenges that generic context-engineering advice does not address: unclear task specifications, generic context files, lack of human-in-the-loop guardrails, and no reliable framework for measuring what improvements actually work.

At Faros AI, we have been avid users of coding assistants for over two years, with our team leveraging Claude Code, Cursor, Devin, and GitHub Copilot heavily. This experience led us to build Clara, our context engineering solution for enterprise codebases.

As the industry coalesced around OpenAI's AGENTS.md standard in 2025, we invested heavily in building enhanced context files for various repositories.

Our initial approach mirrored what most teams were doing: create comprehensive documentation covering architectural patterns, coding standards, common pitfalls, and best practices—essentially a developer onboarding guide optimized for AI consumption.

The results were modest. Agents with access to detailed AGENTS.md files performed slightly better than those without them. We observed that agent variability seemed to be a stronger factor than rules-file optimization: running the same agent twice with identical context produced vastly different results. In addition, generic guidelines applied weakly across all scenarios. A rule like "follow DRY principles" helped in theory but didn't prevent the specific anti-patterns unique to each codebase.

As part of this journey, we identified four recurring blockers that consistently undermine agent reliability. 


Challenge #1: Most task specifications are too vague for any agent to succeed

Across multiple teams, we found that only a small fraction of engineering tickets include enough clarity for either a human or an AI to implement correctly. Missing objectives, implicit constraints, and ambiguous acceptance criteria are far more common than most organizations realize. Well-crafted, context-rich tickets are the foundation.

Bad context starts with bad specs and AI amplifies the gaps.

Challenge #2: Crafting the correct context files

The industry gravitated toward AGENTS.md files in 2025, but in many organizations the default response to unreliable agent behavior has been to keep adding more information: longer AGENTS.md files, sprawling architectural notes, encyclopedic design documents. Yet oversized or generic context files work directly against the core principles of effective context engineering.

Enterprise context often fails in three ways:

  • Too big (violates Strategy #1: Selection and Strategy #2: Compression):
    When context files exceed a few thousand tokens, critical rules get buried in the middle, exactly where models pay the least attention. This triggers the same "lost-in-the-middle" failures that make large windows unreliable.

  • Too generic (violates Strategy #1: Selection and Strategy #5: Format Optimization):
    Broad advice like “use consistent patterns” or “follow best practices” does not translate into actionable constraints for an agent. Models need codebase-specific, granular, example-driven context to behave consistently.

  • Poorly structured (violates Strategy #3: Ordering):
    Even when relevant information exists, it’s often buried deep in a flat file structure. To be effective, rules files need to be project and repo-specific. 

The result is that teams unintentionally create context that’s expensive to process and ineffective at guiding agent behavior.
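
For contrast, here is a hypothetical excerpt showing the difference between a generic rule and the kind of codebase-specific, example-driven rule that actually constrains an agent; the paths and error types are invented for illustration.

Too generic:
  • Follow DRY principles and use consistent error handling.

Concrete and repo-specific:
  • All HTTP handlers in services/api/ must return errors through lib/errors.AppError; never raise a raw exception from a handler.
    Good: raise AppError(code="AUTH_401", message="token expired")
    Bad: raise Exception("token expired")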

Challenge #3: There are no guardrails to stop wasteful or runaway agent sessions

Most AI coding tools still lack structured human-in-the-loop checkpoints. Without these, agents may produce long sequences of misguided changes before anyone intervenes. Enterprises often discover issues only after the agent has already produced irrelevant or incorrect work.

In traditional engineering, humans act as friction. In agent workflows, friction has to be designed.

Challenge #4: There is no established measurement framework for context engineering

Even when teams experiment with better context ordering or retrieval strategies, they lack a consistent way to measure whether those changes improved outcomes. Agent evaluations are noisy, non-deterministic, and highly context-dependent. Today, there is no shared standard for evaluating context engineering itself.

Conclusion

Context engineering reveals how much invisible tribal knowledge exists in every codebase, including patterns never documented, anti-patterns silently avoided, architectural decisions made once and never written down. AI agents expose these gaps ruthlessly. They don't have the benefit of osmosis through code reviews or hallway conversations. They only know what you explicitly provide.

This creates an opportunity. The work of context engineering—codifying patterns, documenting failure modes, structuring specifications—makes codebases more maintainable for humans too.

But it also creates a challenge that the industry hasn't solved: manually maintaining comprehensive context doesn't scale, there are no standard workflows for human-in-the-loop intervention, and we lack measurement frameworks to evaluate what actually works.

Clever prompts make for impressive demos. Engineered context makes for shippable software.

The teams that master this distinction and build the infrastructure to support it won't just ship more code faster. They'll ship better code with fewer iterations, less technical debt, and higher developer satisfaction. That's the promise of context engineering, and we're just beginning to understand what it takes to deliver it at enterprise scale. 

Frequently Asked Questions

What's the difference between context engineering and RAG?

RAG (Retrieval-Augmented Generation) focuses on retrieving relevant information from a knowledge base, which is one piece of the context puzzle. Context engineering is the entire discipline of managing all context sources together: system prompts, retrieved documents, tool definitions, conversation history, and more, in the right order and structure.

Where should I start if my organization is new to context engineering?

Start with the cheapest, highest-leverage move: better task specs. Add clear objectives, constraints, and success criteria to your Jira tickets, then create one small, repo-specific AGENTS.md with concrete examples of “good” and “bad” patterns.

How do I handle context engineering for legacy codebases with little documentation?

Legacy codebases are actually where context engineering provides the most value. Start by using your AI assistant to generate documentation from the code itself. Have it describe architectural patterns, identify implicit conventions, and flag inconsistencies. Then have senior engineers review and correct these descriptions. This creates documentation and context simultaneously. The key insight: the process of building context for AI agents often produces the documentation your team should have written years ago.
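
One way to bootstrap that process is sketched below, assuming you have a repo checkout and a chat-style assistant you can call from code; ask_llm and the file-selection choices are illustrative, and a senior engineer should review everything the model produces.

# Draft starter documentation for a legacy repo by showing the model the
# file tree plus a handful of representative files. ask_llm() is a
# hypothetical wrapper around whatever assistant you already use.
from pathlib import Path

def draft_context_doc(repo_root, key_files, ask_llm):
    tree = "\n".join(str(p.relative_to(repo_root))
                     for p in Path(repo_root).rglob("*.py"))
    samples = "\n\n".join(Path(repo_root, f).read_text()[:4000] for f in key_files)
    prompt = (
        "Based on this repository structure and code samples, describe the "
        "architectural patterns, implicit conventions, and inconsistencies "
        "you can observe. Output a draft AGENTS.md for human review.\n\n"
        f"FILE TREE:\n{tree}\n\nSAMPLES:\n{samples}"
    )
    return ask_llm(prompt)   # always have senior engineers correct the draft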


How do I know if my context is “good enough” for a coding agent?

A good test for "good enough": could a human engineer do the task based on the same inputs? If your ticket, context files, and linked docs would still leave a mid-level engineer guessing, your agent will guess too. Start by improving spec clarity and surfacing non-negotiable constraints at the top of the context.

What are the most common context engineering mistakes?

Three mistakes dominate: (1) Putting everything in one giant file instead of using folder-specific or task-specific context; (2) Writing rules as abstract principles instead of concrete examples; (3) Never pruning, and letting your context files grow indefinitely without removing outdated or contradictory entries. A fourth emerging mistake: assuming the AI will "figure out" implicit context from your codebase without explicit guidance.

How does context engineering change when working with different AI coding tools?

The core principles apply universally, but implementation differs. Claude Code uses CLAUDE.md files and has built-in auto-compaction. Cursor uses .cursorrules and project-level context. GitHub Copilot relies more heavily on open files and repository structure. The key is understanding how each tool constructs its context window and optimizing for that specific mechanism. Don't assume context that works in one tool transfers directly to another.

Can good context engineering compensate for a bad task specification?

No. Context engineering and task specification are complementary, not substitutes. Context tells the agent how to work in your codebase; the specification tells it what to build. A perfectly engineered context can't rescue a vague requirement like "improve the login flow." You'll get technically correct code that doesn't solve the actual problem. Invest in both: clear specifications define success, and good context ensures the path to success follows your standards.

How often should I update my context files?

Treat context like code: update it when you discover a gap. The trigger should be agent failures: when an agent makes a mistake that better context would have prevented, add that lesson immediately. Beyond reactive updates, do a quarterly review to prune outdated entries (deprecated patterns, old library versions) and consolidate redundant rules. Avoid scheduled rewrites; incremental updates preserve institutional knowledge better than periodic overhauls.

Thierry Donneau-Golencer

Thierry is Head of Product at Faros AI, where he builds solutions to empower teams and drive engineering excellence. His previous roles include AI research (Stanford Research Institute), an AI startup (Tempo AI, acquired by Salesforce), and large-scale business AI (Salesforce Einstein AI).

