Yesterday’s post framed the problem: Claude Code on a 5-repo monorepo burns thousands of fresh input tokens per cross-repo question, because Glob and Grep have no memory of code structure. Today’s post is the engine that fixes it - how the graph gets built, how it gets queried, and where it does and does not beat the native primitives.
The public repo dropped this morning under AGPL-3.0. It is an educational distillate, not a production deployment kit. The shape is in there; the multi-tenant scaffolding is not.
Two paths into the graph
Code has to become a graph before Claude Code can query it. There are two ways to make that happen, and jarvis-brain supports both.
graph TD
A[Source repos] --> B[Path A: LLM extraction]
A --> C[Path B: CC-local bootstrap]
B -->|Qwen local / Gemini fallback| D[Per-repo graph.json]
C -->|/brain-extract skill, zero LLM cost| D
D --> E[Federation: merge + cross-repo edges + design tokens]
E --> F[Group master graph]
F --> G[FTS5 index + JSON traversal]
G --> H[5 MCP tools served to Claude Code]
Path A is the LLM pipeline. Triggered by a webhook on push, or manually via the admin API. The extractor reads source files and asks a language model to identify nodes (functions, components, composables, types) and relationships (imports, calls, overrides, parent-child layer chains). For day-to-day work it runs against a local Qwen instance reachable through a reverse SSH tunnel - free, fast, good enough for incremental updates under twenty changed files. For deep re-extractions or when Qwen is offline, it falls back to Gemini Flash, with Pro available for the harder runs. Cost is metered and tracked. The output is one graph.json per repo.
Path B is CC-local bootstrap. Sometimes you want a graph for a repo you have not seeded yet, and you do not want to pay Gemini tokens to do the first pass. So you open Claude Code in the target repo, run the /brain-extract skill, and let Claude Code’s native analysis produce the same graph.json schema. You push the result to the graphs repo, the server picks it up, and the graph is live. Zero LLM API cost - Claude Code’s subscription does the work.
Both paths produce identical graph schemas. Both land in the same federation pipeline. The server does not care which one wrote the file. This matters because the audit-precision path (Path B) and the day-to-day-merges path (Path A) have very different cost profiles, and you want them to coexist on the same engine.
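The post does not show the graph.json schema, so the following is a hypothetical minimal shape: the field names (`id`, `kind`, `repo`, `source`, `target`, `relation`) and the `validate_graph` helper are assumptions made for illustration. It sketches why the server can stay indifferent to which path wrote the file: a structural check never looks at who produced the data.

```python
# Hypothetical minimal schema; the real graph.json fields may differ.
REQUIRED_NODE_KEYS = {"id", "kind", "repo"}
REQUIRED_EDGE_KEYS = {"source", "target", "relation"}


def validate_graph(graph: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the file is usable."""
    problems = []
    for node in graph.get("nodes", []):
        missing = REQUIRED_NODE_KEYS - set(node)
        if missing:
            problems.append(f"node {node.get('id', '?')} missing {sorted(missing)}")
    for edge in graph.get("edges", []):
        missing = REQUIRED_EDGE_KEYS - set(edge)
        if missing:
            problems.append(f"edge missing {sorted(missing)}")
    return problems
```

A file from Path A and a file from Path B would go through the same check before entering federation.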
Federation is the next stage. Per-repo graphs get merged into a per-group master graph. Cross-repo imports are detected (the core primitives consumed by each front). Design system tokens (CSS custom properties, SCSS variables) are tracked across consumers - canonical definitions flagged, DRY violations flagged, override chains recorded. The result is a single master graph per group that knows the full ecosystem.
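A minimal sketch of the merge step, under stated assumptions: node ids are local to a repo, a target written as `other-repo:name` points outside the repo, and `federate` is a hypothetical name. The real federation code also handles design tokens, which this sketch leaves out.

```python
def federate(repo_graphs: dict[str, dict]) -> dict:
    """Merge per-repo graphs into one master graph and tag cross-repo edges."""
    nodes = {}
    edges = []
    for repo, graph in repo_graphs.items():
        for node in graph["nodes"]:
            # Namespace ids by repo so identical names in two repos do not collide.
            nodes[f"{repo}:{node['id']}"] = {**node, "repo": repo}
    for repo, graph in repo_graphs.items():
        for edge in graph["edges"]:
            source = f"{repo}:{edge['source']}"
            # Assumption: "other-repo:name" is how a per-repo graph names an outside target.
            target = edge["target"] if ":" in edge["target"] else f"{repo}:{edge['target']}"
            if source not in nodes or target not in nodes:
                continue  # unresolved reference; a real pipeline would log it
            edges.append({
                "source": source,
                "target": target,
                "relation": edge["relation"],
                "cross_repo": nodes[source]["repo"] != nodes[target]["repo"],
            })
    return {"nodes": nodes, "edges": edges}
```

Precomputing the `cross_repo` flag at merge time is the part Grep would otherwise have to rediscover on every question.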
The camelCase FTS5 trick
The query layer is SQLite FTS5. This is mostly boring infrastructure, except for one decision that determines whether the whole system feels good or feels broken: how do you handle identifier names like useBaseCart?
FTS5’s default unicode61 tokenizer splits on whitespace and punctuation, so useBaseCart is indexed as a single token. A search for “Base” returns nothing, because FTS5 matches whole tokens (or token prefixes) and “Base” sits in the middle of one.
You can write a custom tokenizer. That is the textbook answer. It is also a maintenance burden, an upgrade-path landmine, and adds a C dependency. I did not want it.
The trick is preprocessing at index time instead of a custom tokenizer. Every identifier emits two values into the index: the original (useBaseCart) plus a space-split version (use Base Cart). FTS5’s default tokenizer indexes both. A user searching for “Base” hits the space-split version. A user searching for the exact name hits the original. Same column, same query, same tokenizer. One extra line of Python in the indexer.
It is less elegant than a custom tokenizer. It also takes ten minutes to implement, has no upgrade risk, and survives every SQLite version. The decision is recorded in the architecture notes as “FTS5 camelCase = preprocessing at index time, not custom tokenizer”. The same pattern works for kebab-case, snake_case, or any compound identifier convention.
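A minimal sketch of the preprocessing idea, not the repo's actual indexer. The table layout, the `searchable` and `search` helpers and the regex are assumptions, and it needs a Python build whose SQLite includes FTS5 (most do).

```python
import re
import sqlite3

# Split where a lowercase letter or digit meets an uppercase letter,
# and between an acronym and the next capitalised word ("HTTPServer").
_CAMEL = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def searchable(identifier: str) -> str:
    """Return the original identifier followed by its space-split form."""
    split = _CAMEL.sub(" ", identifier)
    return identifier if split == identifier else f"{identifier} {split}"


db = sqlite3.connect(":memory:")
# `name` is stored for display only; `search_text` is the single indexed column.
db.execute("CREATE VIRTUAL TABLE nodes USING fts5(name UNINDEXED, search_text)")
for name in ("useBaseCart", "CartSummary", "formatPrice"):
    db.execute("INSERT INTO nodes VALUES (?, ?)", (name, searchable(name)))


def search(term: str) -> list[str]:
    """Run a plain-word FTS5 query and return matching identifier names."""
    rows = db.execute("SELECT name FROM nodes WHERE nodes MATCH ? ORDER BY rank", (term,))
    return [row[0] for row in rows]
```

With this in place, `search("Base")` finds useBaseCart through the split form, and `search("useBaseCart")` finds it through the original. The sketch passes the term straight to MATCH, so input containing FTS5 query syntax or punctuation would need quoting first.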
The 5 MCP tools
Once the graph is built and indexed, it gets served to Claude Code through five MCP tools. They are not a web search interface. They are MCP-native primitives that Claude Code sees in the same tool list as Glob and Read, and picks based on the shape of the question.
| Tool | What it does |
|---|---|
| brain_query | Free-text search with FTS5, returns ranked hits plus two-hop neighbors and cross-repo hints |
| brain_graph | Returns the raw graph.json for a repo or the group master, for traversal in code |
| brain_path | Shortest path between two nodes - “how does this core primitive reach this UI feature” |
| brain_explain | Node detail plus inbound/outbound neighbors plus git-blame provenance, zero LLM cost |
| brain_ffcss | Design system tokens: list them, count usage per repo, surface DRY violations |
The point of MCP-native is that Claude Code does not need a system prompt update to use them. They are tools in a list. The model decides. Adding a sixth tool tomorrow is one API contract, not a re-engineering of how questions get routed.
brain_explain is the underrated one. Zero LLM cost - it is a pure graph lookup with git-blame metadata attached. For “who wrote this and what depends on it” questions it replaces a Read + Grep + git log sequence with a single tool call.
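Of the five tools, brain_path is the easiest to illustrate in isolation. The sketch below is a plain breadth-first search over directed edges. It is an assumption about how such a tool could work, not the code in the repo, and the `shortest_path` name and the `(source, target)` edge tuples are made up for the example.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns a node list or None."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in adjacency.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route between the two nodes
```

Because the `seen` set stops a node from being visited twice, the search terminates even when the graph contains cycles.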
The benchmark - 50 questions on a 5-repo monorepo
Two runs, fifty questions each, same code, same model. One run with Claude Code native (Glob, Grep, Read only). One run with jarvis-brain MCP enabled on top. Categories: code discovery, usage tracing, cross-repo, dependency path, architecture - ten questions each.
Headline numbers:
| Metric | Baseline (CC native) | With jarvis-brain | Delta |
|---|---|---|---|
| Total wall time | 36m 44s | 26m 03s | -29.1% |
| Fresh input tokens | 4,145 | 2,003 | -51.7% |
| Total dollar cost | $12.21 | $12.30 | +0.7% |
| Tool calls (avg) | 4.38 | 4.58 | +5% |
| Errors | 0 | 1 | - |
The time saving is 10m 41s across the full run. The fresh-token saving is the model reading half as much source material the first time. The dollar number is flat because Anthropic’s prompt cache absorbs almost all the input-token differential - the cached reads cost a fraction of fresh reads, and the cache is identical between the two runs at the system-prompt level.
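For anyone who wants to recompute the deltas, the arithmetic is short. The input numbers come from the headline table; the `pct_delta` helper is just an illustration of the convention that a negative delta means a reduction against baseline.

```python
def pct_delta(baseline: float, candidate: float) -> float:
    """Signed percentage change from baseline to candidate, one decimal place."""
    return round((candidate - baseline) / baseline * 100, 1)


wall_baseline = 36 * 60 + 44  # 36m 44s in seconds
wall_brain = 26 * 60 + 3      # 26m 03s in seconds
saved_seconds = wall_baseline - wall_brain

deltas = {
    "wall time": pct_delta(wall_baseline, wall_brain),
    "fresh input tokens": pct_delta(4145, 2003),
    "dollar cost": pct_delta(12.21, 12.30),
    "tool calls": pct_delta(4.38, 4.58),
}
```

The wall-time and token deltas come out negative, while the cost and tool-call deltas come out slightly positive: $12.30 is nine cents more than $12.21.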
By category, the breakdown is sharper:
| Category | Baseline mean | Brain mean | Delta |
|---|---|---|---|
| Architecture | 105.6s | 49.9s | -53% |
| Cross-repo | 50.5s | 39.3s | -22% |
| Usage tracing | 19.6s | 18.0s | -8% |
| Dependency path | 26.7s | 27.6s | +3% |
| Code discovery | 17.9s | 21.5s | +20% |
Architecture questions are where the graph carries the most signal: “what are the god-nodes in this codebase”, “where do cross-repo overrides cluster”, “which layer has the densest internal edges”. Cross-repo questions next, because the federation pre-computes what Grep would otherwise have to derive from scratch.
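As a sanity check, the per-category means can be rolled back up into the total wall time, given ten questions per category. The variable names below are illustrative; the numbers are the ones in the two tables.

```python
QUESTIONS_PER_CATEGORY = 10

# Mean seconds per question: architecture, cross-repo, usage tracing,
# dependency path, code discovery.
baseline_means = [105.6, 50.5, 19.6, 26.7, 17.9]
brain_means = [49.9, 39.3, 18.0, 27.6, 21.5]

baseline_total = round(sum(baseline_means) * QUESTIONS_PER_CATEGORY)
brain_total = round(sum(brain_means) * QUESTIONS_PER_CATEGORY)
```

The rebuilt totals land within a second of the headline wall times (36m 44s is 2204 seconds, 26m 03s is 1563 seconds); the small gap on the baseline side is rounding in the per-category means.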
Where brain does not win
Code discovery loses. Twenty percent worse, on average. The question shape is “find file X” or “find files starting with Y” - exactly what Glob is built for. Going through brain_query adds a hop without changing the answer. The model still ends up at the same .vue file; it just took an extra tool call to get there.
Dependency path is essentially a tie. The graph has the data but the model is just as happy to chase imports through Grep and Read on simple chains. Brain wins when the chain is long or crosses repo boundaries; otherwise the native approach is equivalent.
One error in fifty - a question about circular dependency detection where the graph traversal hit an edge case and returned no answer at all. Baseline got it right with Grep. The honest number is 49/50 correct, not 50/50. The fix is queued; not shipped yet.
Cost is the one I expected to win on and did not. Anthropic’s cache is aggressive enough that the savings on fresh input tokens evaporate at the bill level. If you are paying API costs the way you pay them today, brain does not cut your bill. What it cuts is wall time and exploration burn - the things that affect how fast you ship, not how much your provider charges.
What ships in the public repo today
github.com/darco81/jarvis-brain-core, AGPL-3.0. The shape of the engine:
- brain/extractors/ - how source becomes structured node/edge JSON
- brain/federation/ - merging per-repo graphs, detecting cross-repo edges, design token federation
- brain/llm/prompts.py - the extraction prompts
- brain/api/mcp.py plus mcp_tools.py - the five MCP tools and their schemas
- brain/api/query.py plus query_path.py - FTS5 with the camelCase preprocessing trick
- brain/viz/ - the graph visualization adapter
- benchmark/ - methodology, sample questions, runner
What is not in there is the production scaffolding: auth, webhook handlers, admin UI, cost tracking, alerting, the worker queue, deployment configs, the multi-tenant config schema. You can clone this and read the method. You cannot clone this and run a multi-tenant production deployment of it. That is the line, and it is the line on purpose.
Live demo still at brain.sdet.it. Benchmark report rendered as a static page at brain.sdet.it/benchmark/ - all fifty questions, all categories, both runs side by side.
Setup for tomorrow
Tomorrow is Part 3, and it is the post for the question you should actually ask before adopting this: when does the engine pay off, and when is Grep already enough.
Plus the V0.5 tier - what the same engine looks like when you stop pretending you have one repo and start treating a design system org with ten consumer fronts as the unit of work. Cross-repo dedup, token federation, atomic patches across consumers. Different problem class, same architecture underneath.
#FromTheField - day 3 lands tomorrow morning.