Most AI teams blame model quality when projects fail. The model wasn't accurate enough. The training data was noisy. The prompts needed tuning. These are convenient explanations because they point to fixable, technical problems. But the real killer is invisible. It's not in the code. It's not in the data. It's in the space between decisions — the accumulated cost of choices nobody wrote down.
We call this context debt.
## What Is Context Debt?
Context debt is the compounding cost of undocumented architectural decisions. Unlike technical debt — bad code you chose to ship because you needed to move fast — context debt is invisible. Technical debt lives in the codebase where anyone can find it. Context debt lives in Slack threads, hallway conversations, whiteboard photos nobody saved, and the memories of engineers who've since left the company.
Every team carries context debt. The question is whether they're aware of it. When someone asks "why did we choose Postgres over DynamoDB for this service?" and the answer is "I think Sarah decided that, but she left in Q3" — that's context debt. The decision might have been brilliant. It might have been wrong. Nobody knows, because nobody wrote it down.
In traditional software, context debt is expensive but survivable. In AI systems, it's lethal.
## How Context Debt Kills AI Projects
The pattern is always the same. A team builds a proof-of-concept. They select a model — GPT-4, Claude, an open-source alternative. They configure parameters, set temperature values, define prompt templates. The PoC works. Leadership is excited. The team ships it to production on a compressed timeline.
Six months later, the system is behaving strangely. Outputs have drifted. Costs are climbing. A new compliance requirement arrives and nobody can explain why the system processes data the way it does. The original architect left for a startup in Q2. The engineer who chose the embedding model is on parental leave. The PM who defined the acceptance criteria moved to a different product.
The new team inherits a working system and zero understanding of why it works. They spend three weeks reverse-engineering decisions that should have taken an hour to read. They discover the model was chosen because of a specific token-limit constraint that no longer exists. The temperature was set to 0.3 because of a hallucination incident in testing that nobody documented. The prompt template includes a system instruction that references a business rule that was changed two quarters ago.
Every one of these discoveries is context debt being paid back — with interest.
## The Numbers
According to AI Plumber's governance framework analysis, 70% of AI project failures trace back to governance gaps — not model performance. Context debt is the single largest contributor to those governance gaps.
- 3-5x longer debugging time without documented decisions
- 40% longer onboarding when ADRs are absent
- 70% of AI failures are governance failures
Teams without documented decision trails spend 3-5x longer debugging production issues. Onboarding new engineers takes 40% longer when architectural decisions aren't recorded. These aren't estimates — they're patterns observed across dozens of AI deployments, from Series B startups to Fortune 500 enterprises.
## Architecture Decision Records: The Antidote
The fix is not revolutionary. It's not a new tool or a new framework. It's a practice that's been available since Michael Nygard first described it in 2011: Architecture Decision Records (ADRs).
An ADR is a short document that captures four things: the decision that was made, the context that prompted it, the options that were considered, and the rationale for choosing one over the others. It is not a design document. It is not a specification. It is a decision receipt — proof that a deliberate choice was made, and a record of why.
ADRs take 15-30 minutes to write. They save weeks of reverse-engineering. The ROI is not debatable.
## What an ADR Looks Like
Here's a real example — an ADR for choosing a model routing strategy:
```markdown
## Date: 2026-02-15
## Author: K. Van Lysebetten

## Context
Our customer support pipeline processes ~12,000 requests/day. Current setup sends
all requests to Claude Sonnet, costing ~€4,200/month. 60% of requests are simple
FAQ-type queries that don't require advanced reasoning.

## Options Considered
1. Single model (Claude Sonnet for everything) — simple, expensive
2. Two-tier routing (Haiku for simple, Sonnet for complex) — moderate complexity
3. Three-tier routing (Haiku/Sonnet/Opus) — complex, maximum cost optimization
4. Self-hosted open-source for Tier 1 — lowest cost, highest ops burden

## Decision
Option 2: Two-tier routing with complexity classifier.
Route simple queries (FAQ, status checks, basic info) to Haiku.
Route complex queries (complaints, technical issues, multi-turn) to Sonnet.

## Rationale
- Projected 45% cost reduction (€4,200 → €2,300/month)
- Complexity classifier achieves 94% accuracy on test set
- Option 3 rejected: marginal cost saving vs. added complexity
- Option 4 rejected: ops burden exceeds cost saving at current scale
- Revisit if volume exceeds 50,000 requests/day
```
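The two-tier routing decision captured in the ADR can be sketched in a few lines. This is a minimal illustration, not the team's actual implementation: the keyword heuristic stands in for their classifier, and the model identifiers are placeholders.

```python
# Illustrative two-tier router. The marker list and model names are
# assumptions for the sketch, not values from the ADR's real system.
SIMPLE_MARKERS = {"faq", "status", "hours", "pricing", "reset password"}

def classify(query: str) -> str:
    """Crude complexity classifier: queries containing known FAQ-style
    markers are 'simple'; everything else is 'complex'."""
    q = query.lower()
    return "simple" if any(m in q for m in SIMPLE_MARKERS) else "complex"

def route(query: str) -> str:
    """Two-tier routing: cheap model for simple queries, stronger
    model for complex ones."""
    return "claude-haiku" if classify(query) == "simple" else "claude-sonnet"
```

In a real deployment the classifier would itself be a model or a trained heuristic, and its accuracy threshold (the ADR cites 94%) would be monitored as part of the routing decision's "revisit" condition.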
## Paying Down Context Debt
You don't need to document every decision ever made. You don't need to retrofit two years of architectural choices into ADRs. That's a recipe for burnout and abandonment.
Start with one question: which decisions would cause the most confusion if the author left tomorrow? Write those down first. The model selection decision. The data pipeline architecture. The cost guardrail thresholds. The compliance interpretation that shaped your access control design.
ADRs require discipline about every future decision, not a retrofit of every past one. Make it a team norm: every significant decision gets a receipt. No receipt, no merge.
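A "no receipt, no merge" norm can be enforced mechanically in CI. Here is a minimal sketch, assuming ADRs live under `docs/adr/` and architecture-sensitive code under the listed path prefixes; both conventions are assumptions to adapt to your repository layout.

```python
# Sketch of an ADR merge gate: a change set that touches architecture-
# sensitive paths must add or update a decision record in the same PR.
# Path conventions below are illustrative assumptions.
SENSITIVE_PREFIXES = ("pipelines/", "models/", "infra/")
ADR_PREFIX = "docs/adr/"

def needs_adr(changed_files: list[str]) -> bool:
    """True when the change set touches sensitive paths but
    includes no new or updated ADR."""
    touches_architecture = any(
        f.startswith(SENSITIVE_PREFIXES) for f in changed_files
    )
    includes_adr = any(f.startswith(ADR_PREFIX) for f in changed_files)
    return touches_architecture and not includes_adr
```

Wired into a CI job, this turns the norm into a blocking check rather than a reminder that depends on reviewer memory.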
## Context Debt in Agentic Systems
If context debt is dangerous in traditional AI systems, it's catastrophic in agentic ones. Agentic AI systems — multi-agent orchestrations, autonomous workflows, AI agents with write access to production — involve probabilistic outputs, model selection tradeoffs, and governance decisions that are rarely self-documenting.
When a traditional API returns an unexpected result, you check the logs. When an AI agent behaves unexpectedly in production, you need to understand not just what happened, but why the system was configured to allow it. Which agent had write access? Who decided the escalation threshold? What was the rationale for the model routing configuration? Without documented decision trails, teams spend weeks performing archaeological excavation on their own systems.
According to AI Plumber's governance-first framework, the six infrastructure pipes required before any agent gets write access all generate decisions that must be documented. Identity management decisions. Audit logging scope decisions. Cost guardrail threshold decisions. Escalation policy decisions. Each one is a potential context debt liability if left undocumented.
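One way to keep those governance settings out of context debt is to make each setting carry a pointer to the decision record that justifies it, and to reject any setting without one. The file names and keys below are illustrative assumptions, not part of any specific framework.

```python
# Sketch: governance settings annotated with the ADR that justifies them.
# Keys, values, and ADR paths are hypothetical examples.
GUARDRAILS = {
    "cost_ceiling_eur_per_day": {"value": 150, "adr": "docs/adr/0012-cost-guardrails.md"},
    "escalation_threshold": {"value": 0.8, "adr": "docs/adr/0015-escalation-policy.md"},
    "agent_write_access": {"value": False, "adr": None},  # undocumented: context debt
}

def undocumented_settings(guardrails: dict) -> list[str]:
    """Return the names of settings with no decision record attached."""
    return [name for name, cfg in guardrails.items() if not cfg.get("adr")]
```

A validation step like this, run at deploy time, surfaces undocumented decisions before they harden into liabilities.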
The teams that survive agentic AI deployment are the ones that treat documentation as infrastructure — not as an afterthought, but as a prerequisite. Context debt at scale is not a nuisance. It's a governance failure waiting to happen.