Six parallel research agents. Two hundred plus sources. One consolidated view of what context engineering actually is, and how to do it well. This is the distilled output behind the nv:context skill.
The consensus definition, distilled from Karpathy, Schmid, Chase, Osmani, and Anthropic. One sentence everyone in the field has independently converged on.
The systematic discipline of designing dynamic systems that provide the right information and tools, in the right format, at the right time, so an LLM can accomplish a task.
Prompt engineering is a subset. Prompts are one layer inside a much larger information system.
Retrieval is one of four operations. Picking docs is the easy part of context.
The system prompt is one of seven components. Treating it as everything is the failure mode.
Most agent failures are not model failures. They are context failures.
The single most counterintuitive finding in the entire research corpus. Every law that follows is a strategy for resolving it.
Distilled from every source. Each one is a tested response to the core paradox. The skill itself condenses these into eight working rules. The synthesis is the fuller source.
LLM-generated config files reduce success by 3% and increase costs 20% or more. Human-written files only marginally help. Only include what agents cannot discover by reading your code.
One executable command with full flags outperforms three paragraphs of description. One code snippet showing a pattern outperforms lengthy explanations.
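The density rule as a config fragment (the commands and flags are hypothetical examples, not recommendations):

```markdown
<!-- Dense: one runnable command, zero ambiguity -->
Run tests: `pytest -q --maxfail=1 --disable-warnings`

<!-- Diffuse: the same information at many times the token cost -->
To run the tests, first make sure your environment is set up correctly,
then invoke the standard test runner with options that give fast feedback...
```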
Every token in your config competes with the actual task. Frontier LLMs follow ~150 to 200 instructions consistently. Claude Code's system prompt already uses ~50. Target under 200 lines per root file.
Root file orients. Subdirectory files scope details. Skills load on demand. MCP retrieves dynamically. Never dump everything into one file.
Keep system instructions and tool definitions static. Append dynamic content at the end. Cached tokens cost 10x less. A single changed token at the start invalidates the entire cache.
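A minimal sketch of cache-friendly prompt assembly; the names are illustrative, not any particular SDK's API:

```python
# Static prefix: byte-identical on every call, so the provider's KV cache
# can reuse it. Only dynamic content goes at the end.
SYSTEM = "You are a code-review agent. Always run tests before approving."
TOOLS = "Tools: search(query), read_file(path), run(cmd)"

def build_prompt(history: list[str], task: str) -> str:
    # Stable prefix first, conversation turns and the current task last.
    return "\n".join([SYSTEM, TOOLS, *history, f"Current task: {task}"])

turn1 = build_prompt([], "triage the failing build")
turn2 = build_prompt(["(turn 1 tool output...)"], "triage the failing build")
```

Editing SYSTEM mid-session would change the first tokens and invalidate the whole cache; appending history instead keeps the shared prefix intact across turns.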
Every configuration must include three buckets: Always do, Ask first, and Never do. The single most common helpful constraint: "Never commit secrets."
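A sketch of the three buckets as they might appear in a root config file (the specific rules are examples, not prescriptions):

```markdown
## Always
- Run the test suite before reporting a task as done.

## Ask first
- Database migrations, dependency upgrades, deleting files.

## Never
- Never commit secrets. Never force-push shared branches.
```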
Don't use LLMs for linting, formatting, or style enforcement. Use hooks and scripts. Reserve agent config for things only an LLM can do.
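Formatting, for example, can be delegated to a deterministic hook. This sketch follows the general shape of Claude Code's settings hooks; the matcher and command are placeholders, so verify the schema against the current docs:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```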
Write state to files, read selectively. The todo.md pattern pushes objectives into recent attention. Compression should be reversible: keep URLs and paths even when dropping content.
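A minimal sketch of the todo.md pattern; the file name and helper are illustrative:

```python
from pathlib import Path

TODO = Path("todo.md")

def refresh_todo(done: list[str], remaining: list[str]) -> str:
    """Rewrite todo.md every turn; the agent re-reads it and appends the
    text to the end of its context, pushing the objectives back into
    recent attention instead of letting them decay mid-window."""
    lines = [f"- [x] {step}" for step in done]
    lines += [f"- [ ] {step}" for step in remaining]
    TODO.write_text("\n".join(lines) + "\n")
    return TODO.read_text()
```

Usage: `refresh_todo(["reproduce bug"], ["patch", "re-run suite"])` after each completed step.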
Share memory by communicating, don't communicate by sharing memory. Sub-agents work best with focused, isolated contexts. Each one explores extensively but returns condensed summaries.
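The isolate-and-condense flow as a sketch; `explore` and `summarize` are stand-ins for a real agent runner and an LLM condensation step:

```python
def explore(task: str) -> str:
    # Stand-in for a long tool-using investigation; in practice this is
    # thousands of tokens of grep/read/test output.
    return f"[verbose transcript while investigating: {task}]" * 50

def summarize(transcript: str, limit: int = 200) -> str:
    # Stand-in for an LLM condensation step; here it just truncates.
    return transcript[:limit]

def run_subagent(task: str) -> str:
    # Each sub-agent gets a fresh, focused context; the full transcript
    # never leaves this function.
    return summarize(explore(task))

def orchestrate(tasks: list[str]) -> list[str]:
    # The orchestrator sees only condensed summaries: share memory by
    # communicating, not by sharing one giant window.
    return [run_subagent(t) for t in tasks]
```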
Start minimal. Observe agent mistakes. Add instructions that prevent those specific mistakes. Prune regularly. Treat config like code.
Independently identified by three different teams. If you do anything to context, you are doing one of these four things.
Persist information beyond a single turn.
Scratchpads, memories, files, todo.md
Retrieve only the relevant information.
RAG, tool selection, memory retrieval, glob/grep
Reduce tokens while retaining meaning.
Summarization, observation masking, tool result clearing
Partition context across boundaries.
Sub-agents, fresh context windows, scoped rules
Independently identified by Anthropic, LangChain, and Google ADK.
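The compress operation, sketched with the reversibility constraint from the laws above: drop bulk content but keep every URL and path so anything dropped can be re-fetched. The regex and truncation are illustrative:

```python
import re

def compress_tool_result(text: str, keep: int = 120) -> str:
    # Keep a short head plus every URL and absolute path, so dropped
    # content stays recoverable (reversible compression).
    refs = sorted(set(re.findall(r"https?://\S+|/[\w./-]+", text)))
    head = text[:keep].rstrip()
    return head + f"\n[truncated; refs kept: {', '.join(refs)}]"
```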
Philipp Schmid's framework. Every token your model sees belongs to exactly one of these seven slots. If you cannot name the slot, you cannot engineer it.
Agent identity, behavior rules, boundaries. The persistent persona.
The current task or question. The thing the user actually wants done.
Conversation turns, tool outputs, intermediate results from earlier in the session.
Cross-session facts, preferences, domain knowledge that survives a context reset.
RAG results, file contents, documentation pulled in for the specific task.
Tool descriptions and parameters. The actions the agent can take from here.
The expected response format or schema. Constraints on what comes back.
Framework by Philipp Schmid · Google DeepMind
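The seven slots can be made literal in code. A sketch in which every piece of context must be claimed by exactly one named slot (slot names paraphrase Schmid's labels):

```python
SLOTS = ("system", "user_input", "history", "memory",
         "retrieved", "tools", "output_format")

def assemble(context: dict) -> str:
    # Refuse to build a prompt with an unnamed slot: if you cannot name
    # the slot, you cannot engineer it.
    unknown = set(context) - set(SLOTS)
    if unknown:
        raise ValueError(f"unengineerable content: {unknown}")
    return "\n\n".join(f"[{s}]\n{context[s]}" for s in SLOTS if s in context)
```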
The current state of agent config files as of April 2026. Different tools, different formats, mostly the same purpose.
@imports, skills, hooks integration.
Pragmatic recommendation. Start with AGENTS.md (widest support) plus CLAUDE.md (deepest features). Keep overlap minimal. Generate other files only for tools you actually use.
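One low-overlap layout, assuming your Claude Code version supports @imports in CLAUDE.md (file contents are illustrative):

```markdown
<!-- AGENTS.md: the portable root file most tools read -->
# Project guide
Build with `make build`. Test with `make test`.

<!-- CLAUDE.md: Claude-specific extras only, importing the shared base -->
@AGENTS.md
Use skills for release workflows; never edit generated files in `dist/`.
```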
A handful of statistics from the corpus that change how you think about context engineering once you have read them.
Auto-generated config files reduce agent success by 3% and increase costs by 20% or more vs no config at all. Human-written files only marginally help.
Experienced developers are 19% slower with AI tools when context is poorly managed, yet estimate afterward that they were 20% faster. A 39-percentage-point perception gap.
Cached tokens cost 10x less than uncached tokens. A single changed token at the start of your prompt invalidates the entire KV cache. Stable prefixes pay for themselves immediately.
Frontier LLMs reliably follow 150 to 200 instructions before attention falls off. Claude Code's own system prompt already burns about 50 of those. You have less budget than you think.
The maturity level of most repositories on the L0 to L6 framework. Goal of nv:context: get teams to L5 or L6 (path-scoped rules, active maintenance, skills, MCP, dynamic capability loading).
Tool calls per task at Manus, across four framework rebuilds. Their five production patterns (KV-cache, tool masking, file system memory, error preservation, diversity injection) are battle-tested at scale.