−3%
Auto-generated configs reduce success
ETH Zürich↗
LLM-generated AGENTS.md files cut agent success by 3% and increase costs by
20%+. Human-written files help only marginally (+4%). Most repos would do
better deleting their config than running /init.
19%
Experienced devs slower with bad context
METR · controlled study↗
In a controlled study, 16 senior developers were 19% slower with AI tools,
yet estimated they had been 20% faster. That 39-point perception gap is the
cost of context failure.
40%
Sweet spot beats max utilization
Dex Horthy · HumanLayer↗
Using 40% of the context window outperforms using 90%.
A focused 300-token context can outperform an unfocused 113K-token context.
More tokens is not the answer.
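The 40% rule can be enforced mechanically before each request. A minimal sketch, assuming a 200K-token window; `within_budget` and both constants are hypothetical names, not part of any real SDK:

```python
# Sketch: enforce a ~40% context-budget target before sending a request.
# CONTEXT_WINDOW and TARGET_RATIO are illustrative values.

CONTEXT_WINDOW = 200_000   # model's total token window (example value)
TARGET_RATIO = 0.40        # the "sweet spot" utilization from the stat above

def within_budget(prompt_tokens: int) -> bool:
    """True if the assembled prompt stays under the 40% target."""
    return prompt_tokens <= CONTEXT_WINDOW * TARGET_RATIO

assert within_budget(80_000)       # exactly at the 40% line
assert not within_budget(180_000)  # 90% utilization: past the sweet spot
```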
85% drop
Context degrades on a curve
Anthropic · Manus production↗
At 60% capacity context is safe. At 70% precision drops.
At 85% hallucinations begin. Compact proactively. Never wait for
the auto-compact at 95%.
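The thresholds above translate directly into a proactive trigger. A sketch with a hypothetical helper; the ratios are the cited numbers, the action names are invented for illustration:

```python
# Sketch: proactive compaction trigger based on the thresholds above.
# compaction_action is a hypothetical helper; the 60/70/85% ratios
# come from the Anthropic/Manus figures cited in the card.

def compaction_action(used_tokens: int, window: int) -> str:
    """Map context utilization to a recommended action."""
    ratio = used_tokens / window
    if ratio < 0.60:
        return "safe"            # plenty of headroom
    if ratio < 0.70:
        return "plan-compact"    # precision starts dropping past 70%
    if ratio < 0.85:
        return "compact-now"     # compact before hallucinations begin
    return "compact-urgent"      # never wait for auto-compact at 95%

assert compaction_action(100_000, 200_000) == "safe"         # 50% used
assert compaction_action(160_000, 200_000) == "compact-now"  # 80% used
```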
~150 max
The instruction budget is finite
Frontier LLM benchmarks↗
Frontier models follow 150–200 instructions consistently. Claude Code’s
system prompt already uses ~50. Every line in your CLAUDE.md
competes with the actual task for attention.
10x
Stable prefixes cut cost 10x
Manus · KV-cache law↗
Cached tokens cost 10× less than uncached. Keep system instructions
and tool definitions static; append dynamic content at the end. A single
changed token at the start invalidates the entire cache.
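The ordering rule is simple to apply when assembling prompts: static parts first, byte-for-byte identical across requests, dynamic parts last. A minimal sketch; the strings and `build_prompt` are illustrative, not a real API:

```python
# Sketch: assemble prompts so the cacheable prefix never changes.
# The KV-cache keys on the exact token prefix, so any edit near the
# start invalidates everything after it. All names are illustrative.

SYSTEM = "You are a coding agent."                          # static
TOOLS = "[tool definitions, serialized deterministically]"  # static

def build_prompt(dynamic_context: str, user_msg: str) -> str:
    # Static prefix first, identical on every request, so the provider
    # can reuse its cache; per-request content is appended at the end.
    return "\n".join([SYSTEM, TOOLS, dynamic_context, user_msg])

a = build_prompt("file A contents", "fix the bug")
b = build_prompt("file B contents", "add a test")

# Both requests share the full static prefix: a cache hit up to TOOLS.
prefix = "\n".join([SYSTEM, TOOLS]) + "\n"
assert a.startswith(prefix) and b.startswith(prefix)
```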