I have a 5-repo monorepo. Shared core, four brand-variant fronts, one admin module. Every day I ask Claude Code questions like “where is useBaseCart consumed across the platform” or “what overrides the cart button in BrandA versus BrandB”. And every day Claude Code does the same thing.
It runs Glob to find files. Then it runs Grep to find references. Then it reads the top three results. Then it asks itself if that was enough. Then it runs Glob again. Then Grep again.
Fourteen tool calls. Forty seconds. A few thousand fresh input tokens, every time.
This is fine for one question. It is not fine when it is the third cross-repo question of the day and you are watching tokens burn on exploration that the model could have learned once and reused forever. Today’s post is about how I got tired of that, what I tried first, what I deleted, and why jarvis-brain looks the way it does.
What “burning tokens” actually looks like
I ran a benchmark. Fifty questions across five categories - code discovery, usage tracing, cross-repo, dependency path, architecture. Ten questions each. Two runs: Claude Code native (only Glob, Grep, Read) versus Claude Code plus jarvis-brain as an MCP tool.
The headline I care about today is the baseline. Without brain, on the same fifty questions:
- 4,145 fresh input tokens spent on exploration
- 14 tool calls on the hardest cross-repo questions
- 44 seconds average for architecture-deep questions, 189 seconds on the worst single one
- Total wall time across all fifty: 36 minutes
The cache helps. Anthropic’s prompt cache is aggressive, and dollar cost stays low because of it. But the fresh input tokens - the ones the model has to read for the first time on every call - those scale with how often Claude Code re-explores the same code. And it re-explores constantly, because tool results do not become permanent memory.
This is not a Claude Code bug. Glob and Grep are the right primitives when you have no other map of the codebase. They are universal. They work on every repo with zero setup. The price is that you pay for the same exploration in tokens, every time.
The use case nobody talks about
The pain scales with what kind of code you have. My pain is what I will call, with the names changed, a multi-brand commerce platform.
One core - the shared engine. Composables, base components, business logic, around eighty percent of the actual code. Four brand-variant fronts on top - same engine, different design system per brand, different content, occasional component overrides where a brand needs something the core does not give it. Plus an admin module that consumes a couple of those brands.
This is a V-commerce pattern. White-label e-commerce platforms work this way. Multi-tenant SaaS frontends work this way. Test suites for consistent applications work this way - one test harness, N variants of the same flow. Agencies that fork a core for each client work this way.
If you have ever asked Claude Code “is there a local override of AddToCart.vue in BrandA” you know the shape of the problem. It is not “find the file”. It is “find the file, check three other repos for variants, check which one wins by Nuxt layer priority, check who calls it, check if there are brand-specific composables in the way”. This is not what Glob is for. Glob will find you ten AddToCart.vue files across five repos and leave you to figure out which one matters.
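To make the "which one wins" step concrete, here is a minimal sketch of resolving an override by layer priority. It is not Nuxt's actual resolver and not jarvis-brain code: the layer names, the file index, and the `resolve_winner` helper are all hypothetical, and it assumes the layers are listed highest priority first.

```python
from typing import Optional

# Hypothetical layer stack for one brand, highest priority first (assumption):
# the brand's own files shadow whatever the shared core ships.
LAYERS = ["brand-a", "core"]

# Hypothetical index of which layer ships which component file.
FILES = {
    "brand-a": {"components/AddToCart.vue"},
    "brand-b": {"components/CartButton.vue"},
    "core": {"components/AddToCart.vue", "components/CartButton.vue"},
}


def resolve_winner(
    path: str, layers: list[str], files: dict[str, set[str]]
) -> Optional[str]:
    """Return the first layer, in priority order, that provides `path`."""
    for layer in layers:
        if path in files.get(layer, set()):
            return layer
    return None  # no layer in this stack ships the file


# BrandA has a local override, so it shadows the core copy.
winner = resolve_winner("components/AddToCart.vue", LAYERS, FILES)
# BrandA has no CartButton.vue of its own, so the core copy is used.
fallback = resolve_winner("components/CartButton.vue", LAYERS, FILES)
```

Glob can produce the contents of `FILES`. What it cannot produce is the ordering in `LAYERS`, and that ordering is the part that answers the question.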
What I tried first, and deleted
The obvious move was a naive RAG. Embed the codebase with Voyage or OpenAI, dump it into a vector store, give Claude Code a search tool. People do this. There are starter repos for it.
I built a prototype. I deleted it.
Two problems. First, embeddings of source code are bad at code structure. They are good at “find me a function that does X conceptually”. They are bad at “find me every consumer of useBaseCart in forge-core”. The first question is semantic. The second one is structural. A vector store does not know that useBaseCart is a name, not a phrase.
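A rough illustration of what "structural" means here: treat identifiers as whole names and look them up exactly. This is a sketch under assumptions, not how jarvis-brain indexes code. The file paths and contents are invented, and a real extractor would parse the code rather than run a regex over it.

```python
import re
from collections import defaultdict

# Hypothetical file contents, invented for illustration.
SOURCES = {
    "brand-a/components/MiniCart.vue": (
        "import { useBaseCart } from 'forge-core'\n"
        "const cart = useBaseCart()"
    ),
    "brand-b/composables/useCart.ts": "import { useBaseCart } from 'forge-core'",
    "core/docs/cart.md": "The base cart holds line items.",
    "core/composables/useBaseCartTotals.ts": "export function useBaseCartTotals() {}",
}

# A whole JavaScript-style identifier: useBaseCartTotals is one token,
# so it never counts as a hit for useBaseCart.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def build_symbol_index(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map every identifier to the set of files that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in sources.items():
        for name in IDENTIFIER.findall(text):
            index[name].add(path)
    return index


def consumers(index: dict[str, set[str]], symbol: str) -> list[str]:
    """Exact-name lookup: no similarity score, no near matches."""
    return sorted(index.get(symbol, set()))


INDEX = build_symbol_index(SOURCES)
hits = consumers(INDEX, "useBaseCart")
```

The lookup returns the two files that actually use `useBaseCart` and skips both the prose document about the "base cart" and the similarly named `useBaseCartTotals`. A similarity search has no reason to draw that line.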
Second, naive RAG is another tool to learn. Claude Code already has Glob, Grep, Read. They are native, predictable, and cheap to call. A custom search tool sits next to them and requires prompt engineering to use well. Every new tool is friction. The tool that fixes “Claude Code burns tokens on Glob” should not be “here is a tool Claude Code has to remember to use instead of Glob”.
I considered a smarter Grep wrapper. Same problem in a different wrapper - still fourteen tool calls, still token burn, just with marginally better filtering.
What I needed was not a better retrieval tool. It was a different access path. The structure of the codebase should be pre-computed once, stored, and served to Claude Code as a native MCP tool that feels like Glob but answers like a senior engineer who has read the code.
That is the pivot. Not “build a search engine”. Build the map, then expose it through the protocol Claude Code already speaks.
The architecture in one paragraph
jarvis-brain extracts a graph from your code. Nodes are functions, components, composables, types, files. Edges are imports, calls, overrides, parent-child layer relationships. Built once per repo, then merged across repos into a federated master graph that knows which brand front overrides which core component, and which design system token gets used where. The graph lives in a SQLite FTS5 index for full-text queries, plus a JSON structure for traversal. It is served to Claude Code through five MCP tools that look and feel like built-in primitives - brain_query, brain_graph, brain_path, brain_explain, brain_ffcss.
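The shape of that graph can be sketched in a few lines. This is an illustrative toy, not the jarvis-brain schema or the implementation behind any of the five MCP tools: the node ids, the edge list, and the helper names (`build_adjacency`, `incoming`, `shortest_path`) are assumptions made for the example.

```python
from collections import deque
from typing import Optional

# Hypothetical federated edges as (source, kind, target).
# Node ids are prefixed with the repo they come from.
EDGES = [
    ("brand-a:MiniCart.vue", "imports", "core:useBaseCart"),
    ("brand-a:AddToCart.vue", "overrides", "core:AddToCart.vue"),
    ("core:AddToCart.vue", "calls", "core:useBaseCart"),
    ("admin:OrderPanel.vue", "imports", "brand-a:MiniCart.vue"),
]


def build_adjacency(edges):
    """Build forward and reverse adjacency maps once, up front."""
    forward, reverse = {}, {}
    for src, kind, dst in edges:
        forward.setdefault(src, []).append((kind, dst))
        reverse.setdefault(dst, []).append((kind, src))
    return forward, reverse


def incoming(reverse, node: str, kind: Optional[str] = None) -> list[str]:
    """Who points at `node`, optionally filtered by edge kind."""
    return sorted(src for k, src in reverse.get(node, []) if kind is None or k == kind)


def shortest_path(forward, start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search along edge direction; None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _kind, nxt in forward.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


FORWARD, REVERSE = build_adjacency(EDGES)
cart_consumers = incoming(REVERSE, "core:useBaseCart")
overriders = incoming(REVERSE, "core:AddToCart.vue", kind="overrides")
admin_to_cart = shortest_path(FORWARD, "admin:OrderPanel.vue", "core:useBaseCart")
```

Each of the three queries at the bottom is a dictionary lookup or a short traversal over data that already exists. That is the difference from re-deriving the same answer with a fresh round of Glob and Grep.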
That last detail is the point. They are not yet another search interface. They are MCP-native. Claude Code sees them in the same tool list as Glob and decides when to use them based on what the question needs.
Tomorrow’s post is how the indexing actually works, the FTS5 trick that makes useBaseCart queryable by typing “Base”, and the benchmark breakdown across all five categories - including the categories where brain does not beat Grep.
What you can poke at right now
brain.sdet.it is the live demo. Brain Website on the front, jarvis-brain backend behind it. Log in to a public demo group, browse the graph, query it, look at the FFCSS token federation. The interesting part is brain.sdet.it/benchmark/ - the full numbers from the fifty-question benchmark, rendered as a static report.
Tomorrow the public repo drops. An educational distillate under AGPL-3.0 - extractors, federation, prompts, MCP tools, FTS5 query layer, benchmark methodology. Enough to learn the pattern, not enough to run a production multi-tenant deployment. That is the line, and Part 3 is where I explain why the line is there.
For today, one question worth sitting with: how many fresh input tokens did your Claude Code session burn yesterday on exploration that did not need to happen?
#FromTheField - series continues tomorrow.