Greppable
Making data speak the language of agents.
Token Theory partnered with the Greppable team to architect and build a data language and Claude Code plugin from the ground up, giving AI agents pre-indexed knowledge about codebases, schemas, and architecture so they spend less time exploring and more time building.
Greppable
Developer Tools
8 weeks
2025

The opportunity
AI coding agents like Claude Code are powerful, but they start every session blind. The first thing they do is explore: grepping files, reading directory structures, tracing imports, burning through tool calls and tokens just to understand what they're working with. On large codebases, this exploration phase can consume a significant chunk of both the context window and the session budget before any real work begins.
The Greppable team saw a clear gap. What if agents could start every session with structured, pre-indexed knowledge about the codebase? Not a static README, but living artifacts: code maps, schema definitions, architecture diagrams, and API contracts, all in a format agents can grep and parse natively. And what if the agent's own discoveries could be captured as persistent session memory, so the next session picks up where the last one left off?
What we built
The GDL Format Suite
We designed and implemented seven specialised formats under the Greppable Data Language (GDL) umbrella. Each format addresses a distinct data domain while sharing a unified, grep-native syntax that agents can search, filter, and transform using standard Unix tooling.
- GDL - Grep-native data records (the core format)
- GDL-S - Database schema mapping for relational structures
- GDL-C - Code structure maps for navigating large codebases
- GDL-D - Visual knowledge diagrams rendered as text
- GDL-M - Three-tier agent memory systems (short, mid, long-term)
- GDL-U - Unstructured document indexing for search
- GDL-A - API contract maps for integration surfaces
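To make "grep-native" concrete, here is a minimal sketch of what a line-oriented record format enables. The record syntax below is purely illustrative (invented for this example, not the published GDL spec), and the Python `grep` helper stands in for the Unix tool:

```python
import re

# Hypothetical grep-native records: one record per line,
# pipe-delimited fields with a leading record-type tag.
# (Illustrative syntax only -- not the published GDL spec.)
GDL_SAMPLE = """\
@fn|src/auth/login.py|authenticate|args=email,password|returns=Session
@fn|src/auth/tokens.py|refresh_token|args=token|returns=Token
@cls|src/models/user.py|User|fields=id,email,created_at
@dep|src/auth/login.py|imports|src/models/user.py
"""

def grep(pattern: str, text: str) -> list[str]:
    """Line-oriented filter, equivalent to `grep PATTERN file`."""
    return [line for line in text.splitlines() if re.search(pattern, line)]

# One pass answers "which functions live under src/auth/?" --
# no directory walk, no file reads, no follow-up tool calls.
auth_fns = grep(r"^@fn\|src/auth/", GDL_SAMPLE)
print(auth_fns)
```

Because every record is a single line with a typed prefix, any standard Unix filter (`grep`, `awk`, `cut`, `sort`) becomes a query engine over the artifact.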
The Claude Code Plugin
The format suite ships as a Claude Code plugin, distributed via the plugin marketplace and installable in under two minutes. On setup, the plugin scans the entire codebase and generates a set of GDL artifacts: code indexes (.gdlc) mapping every file, export, and import; architecture diagrams (.gdld) showing system flows and dependencies; schema maps (.gdls) for database structures; and API contracts (.gdla) for integration surfaces. These artifacts are injected at the start of every Claude Code session, giving the agent a pre-built mental model of the project before it writes a single line of code. The plugin uses Claude Code's hook system to intercept session events and tool calls, injecting the right artifact at the right moment so the agent gets structural context without having to ask for it.
Session memory
As Claude works through a codebase, it discovers patterns, resolves ambiguities, and makes architectural decisions. The plugin captures those insights as persistent GDL-M memory artifacts. The next session starts with everything the agent learned last time: no re-explaining project conventions, no re-discovering the same code paths. The agent accumulates institutional knowledge the same way a human developer does. In benchmarks, the memory layer delivered the highest ROI of any GDL format, cutting tool calls by 91% and tokens by 42% on memory-heavy tasks, with a visible compounding effect: later sessions benefited from what earlier sessions had already discovered.
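One way a three-tier memory could work is a promotion policy: new discoveries land in short-term memory, and entries that keep proving useful migrate toward long-term storage. The sketch below is an assumption about the mechanics (the actual GDL-M implementation is not documented here), meant only to illustrate the short/mid/long-term idea:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Illustrative three-tier memory (not the actual plugin code).
    Entries that are recalled during a session get promoted one tier."""
    tiers: dict[str, list[dict]] = field(
        default_factory=lambda: {"short": [], "mid": [], "long": []}
    )

    def record(self, insight: str) -> None:
        # New discoveries always land in short-term memory.
        self.tiers["short"].append({"insight": insight, "hits": 0})

    def recall(self, keyword: str) -> list[str]:
        # Search all tiers; each hit counts toward promotion.
        found = []
        for tier in ("long", "mid", "short"):
            for entry in self.tiers[tier]:
                if keyword in entry["insight"]:
                    entry["hits"] += 1
                    found.append(entry["insight"])
        return found

    def end_session(self) -> None:
        # Hypothetical promotion policy: anything used this session
        # moves up one tier; unused entries stay where they are.
        for src, dst in (("mid", "long"), ("short", "mid")):
            used = [e for e in self.tiers[src] if e["hits"] > 0]
            self.tiers[dst].extend(used)
            self.tiers[src] = [e for e in self.tiers[src] if e["hits"] == 0]

mem = MemoryStore()
mem.record("DB access goes through repositories in src/repos/")
mem.recall("repositories")   # the insight proves useful this session
mem.end_session()            # so it is promoted short -> mid
```

The compounding effect described above falls out of a policy like this: each session both consumes and refines the memory artifacts left by the previous one.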
Deep context architecture
Every format was token-optimised from the start. We architected the syntax to minimise the token footprint when consumed by LLMs while preserving full semantic fidelity. In controlled evaluations, orientation tasks (understanding what a codebase does and how it's structured) saw a 76% reduction in tool calls and 41% fewer tokens. Schema queries dropped 63% in tool calls. Across the full eval suite, agent accuracy rose from 94.8% to 99.1%. The effect scales with project size: marginal on small repos under 50 files, a clear win at 50-500 files, and essential above 5,000, where raw grep on source files starts to break down.
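A toy comparison shows where the token savings come from. Character count is only a rough proxy for LLM token count (real savings depend on the tokenizer), but the direction holds: terse, delimiter-based records carry far less syntactic overhead than the same fact in a verbose encoding. The record syntax here is the same illustrative one as above, not the published spec:

```python
import json

# The same fact, encoded two ways.
record_compact = "@fn|src/auth/login.py|authenticate|args=email,password"
record_json = json.dumps({
    "type": "function",
    "file": "src/auth/login.py",
    "name": "authenticate",
    "args": ["email", "password"],
})

# Characters are a crude stand-in for tokens; the delimiter format
# drops the quoting, key names, and punctuation that JSON repeats
# on every single record.
savings = 1 - len(record_compact) / len(record_json)
print(len(record_compact), len(record_json), f"~{savings:.0%} fewer characters")
```

Multiplied across thousands of index records injected into a context window, this overhead is what the format design works to eliminate.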
Documentation & developer experience
We built the full documentation site at greppable.ai/docs, including interactive examples, format specifications, and integration guides. The site was designed for both human developers and AI agents consuming the docs via context windows. Every page is structured for scannability and grep-friendliness.
The impact
The Claude Code plugin turned Greppable from a format specification into something developers actually feel in their workflow. Sessions start faster, agents ask fewer redundant questions, and the accumulated session memory means the agent gets sharper the more you use it. Greppable launched free and open-source on the Claude Code plugin marketplace, removing all barriers to adoption.
- Seven production-ready formats and a Claude Code plugin shipped in 8 weeks
- 76% fewer tool calls on orientation tasks, 91% on memory-heavy tasks, benchmarked across a structured eval suite
- Up to 44% token reduction per session, cutting real inference costs for developers
- Agent response quality up across the board: accuracy +0.6, actionability +1.0, overall quality +0.7
- Persistent session memory with compounding returns. Later sessions benefit from everything earlier sessions discovered
- Two-minute setup via the Claude Code plugin marketplace. No configuration beyond initial scan
Interested in working together?
Let's discuss what's possible for your organisation.
hello@tokentheory.ai