Observation 20 March 2026
Adam Crow — Advisor

The context window is the new database

Executive Summary

Enterprise AI projects keep failing at the same point: not the model, not the engineering, but the data supplied to the model at inference time. The context window has become the most critical layer in AI system design, and most teams are treating it like a skip bin. This piece argues for treating it with the same architectural rigour you'd give a production database.

The model is not the bottleneck

There's a persistent belief in enterprise AI that the next model release will solve the quality problem. GPT-5 will hallucinate less. Claude 4 will reason better. Gemini will finally understand your domain.

I keep seeing this play out with clients. They upgrade to the latest model, run the same messy data through it, and get the same disappointing results. Then they blame the model. But the pattern is consistent: when you clean up what goes into the context window, output quality jumps dramatically, regardless of which model you're running. Anthropic's own prompt engineering documentation makes this point explicitly. So does every serious benchmark. The quality of what goes in matters more than the sophistication of what processes it.

Google DeepMind's work on context distillation points in the same direction. The research community is broadly aligned here: model capability has outpaced the data infrastructure feeding it. We're past the point where model upgrades are the bottleneck.

"The most common failure mode in enterprise AI isn't model capability. It's context contamination."

— Token Theory

What happens when context is unstructured

Here's a scenario I see constantly. A customer service agent backed by an LLM needs to answer a question about a client's account. The relevant information is spread across Salesforce, Xero, HubSpot, and three internal spreadsheets. The standard approach: dump whatever you can retrieve into the context window and hope the model sorts it out.

What actually happens is the model receives 15,000 tokens of duplicated records, conflicting dates, incomplete document fragments, and metadata noise. It produces a confident, articulate, and subtly wrong answer. Because the data it was reasoning over was subtly wrong.

This is the part that catches people out. The output looks good. It reads well. But when you trace it back to the source data, you find the model has confidently merged two different John Smiths, or cited a financial figure from last quarter as if it were current. The failure mode isn't obvious until it hits production.

Unstructured vs. Structured Context

The context window as an architectural concern

Database architects spend years on schemas, normalisation, indexing, and query optimisation. The context window deserves the same attention. Functionally, it is the database the model queries at inference time. Except it has no schema, no indexing, no normalisation, and no query planner. Everything enters as flat text, and the model reconstructs meaning from scratch on every single request.
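To make "a schema for the context window" concrete, here is a minimal sketch of what one could look like. The field names and shapes are my own illustration, not a standard: each fact carries the metadata a database row would, namely identity, provenance, and time.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ContextEntry:
    """One fact destined for the context window, with the metadata
    a database row would carry: identity, provenance, and time."""
    entity_id: str         # canonical ID after entity resolution
    source: str            # originating system, e.g. "salesforce"
    observed_at: datetime  # when the fact was true, for temporal ordering
    summary: str           # short form, always included
    detail: str = ""       # long form, included only on demand

def render(entries: list[ContextEntry], include_detail: bool = False) -> str:
    """Serialise entries oldest-first so the model sees events in order."""
    lines = []
    for e in sorted(entries, key=lambda e: e.observed_at):
        lines.append(f"[{e.observed_at:%Y-%m-%d}] ({e.source}) {e.summary}")
        if include_detail and e.detail:
            lines.append(f"  detail: {e.detail}")
    return "\n".join(lines)
```

Even this toy version enforces what flat text cannot: every fact arrives dated, attributed, and in a predictable shape, so the model stops reconstructing meaning from scratch.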

This is where the concept of a context layer comes in. Instead of treating the context window as a dumping ground, you design an intermediate layer that:

  • Resolves entities across source systems (is "John Smith" in Salesforce the same as "J. Smith" in HubSpot?)
  • Deduplicates records while preserving provenance
  • Structures information hierarchically, with summaries first and detail on demand
  • Optimises for token efficiency without losing semantic fidelity
  • Maintains temporal ordering so the model understands what happened when
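A minimal sketch of such a layer, covering the first, second, and last bullets. The record shape and the alias map are hypothetical; a real entity-resolution step needs fuzzy matching, not a lookup table.

```python
def build_context(records: list[dict], aliases: dict[str, str]) -> list[dict]:
    """Toy context layer: resolve entities via an alias map, deduplicate
    while preserving provenance, and sort by timestamp.

    Each record is assumed to look like:
      {"name": ..., "source": ..., "ts": ..., "fact": ...}
    """
    # Entity resolution: map "J. Smith" and "John Smith" to one canonical name.
    resolved = [dict(r, name=aliases.get(r["name"], r["name"])) for r in records]

    # Deduplicate on (entity, fact); merge sources so provenance survives.
    merged: dict[tuple, dict] = {}
    for r in resolved:
        key = (r["name"], r["fact"])
        if key in merged:
            merged[key]["sources"].add(r["source"])
        else:
            merged[key] = {**r, "sources": {r["source"]}}
            del merged[key]["source"]

    # Temporal ordering so the model understands what happened when.
    return sorted(merged.values(), key=lambda r: r["ts"])
```

The point is not the thirty lines of code; it is that these decisions get made deterministically, before inference, instead of being delegated to the model on every request.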

Token economics reinforce the argument

There's a straightforward economic case too. At current pricing (roughly $3-15 per million input tokens for frontier models), every token of noise has a direct cost. An enterprise running 10,000 AI-assisted interactions per day with 5,000 tokens of unnecessary context is burning $150-750 per day on noise. Over a year, that's $55K-$275K in wasted inference cost, before you even count the downstream cost of wrong answers.
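The arithmetic is easy to reproduce. The interaction counts and per-token prices below are the illustrative figures from above, not measurements:

```python
def annual_noise_cost(interactions_per_day: int,
                      noise_tokens: int,
                      price_per_million: float) -> float:
    """Yearly cost of input tokens that carry no signal."""
    daily = interactions_per_day * noise_tokens / 1_000_000 * price_per_million
    return daily * 365

# 10,000 interactions/day, 5,000 wasted tokens each:
low = annual_noise_cost(10_000, 5_000, 3.0)    # $54,750/yr at $3/M tokens
high = annual_noise_cost(10_000, 5_000, 15.0)  # $273,750/yr at $15/M tokens
```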

Now, prompt caching has made large context windows cheaper. That's real progress. But caching a disorganised mess still produces disorganised reasoning. Caching makes the noise cheaper to send; it doesn't help the model think more clearly. When you structure data before it enters the window, you can deliver the same semantic payload in 40-60% fewer tokens. That's a cost saving and an accuracy improvement.
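One concrete lever is serialisation format: the same record as a pretty-printed API dump versus a compact line format. The record below is invented for illustration, and character counts are only a rough proxy for tokens, but the direction of the saving holds.

```python
import json

record = {
    "customer_name": "John Smith",
    "account_status": "active",
    "last_invoice_amount": 1240.00,
    "last_invoice_date": "2026-02-14",
}

# Verbose form: what a raw API dump typically looks like in the window.
verbose = json.dumps(record, indent=4)

# Compact form: same facts, one terse line per field, no JSON punctuation.
compact = "\n".join(f"{k}: {v}" for k, v in record.items())

print(len(verbose), len(compact))  # compact is the shorter payload
```

Multiply that per-record saving across hundreds of retrieved records per request and the 40-60% figure stops looking ambitious.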

What this means in practice

I've seen this pattern repeat across every engagement. The client arrives thinking they need a better model, a fancier embedding, or a more sophisticated RAG pipeline. What they actually need is someone to sit down with their data, map the entity relationships across their systems, and build a context layer that gives the model clean, structured, deduplicated information.

It's unglamorous work. It doesn't make for exciting demos. But it's the difference between an AI system that impresses in a proof-of-concept and one that actually works in production, at scale, for months without someone babysitting it.

Sources & Further Reading
  • Anthropic, "Long Context Prompting Tips" (docs.anthropic.com) — practical guidance on structuring context for better model performance
  • Liu et al., "Lost in the Middle: How Language Models Use Long Contexts", TACL 2024 — foundational research on how context position affects model attention
  • Google DeepMind, research on context distillation — compressing context while preserving reasoning quality
  • Gartner, "Top Strategic Technology Trends for AI", 2025 — data quality consistently ranked as primary barrier to enterprise AI adoption
  • Simon Willison (simonwillison.net) — ongoing practical analysis of LLM context management and prompt engineering

Interested in working together?

Let's discuss what's possible for your organisation.

hello@tokentheory.ai