The public toolkit audits one route and shows you the truth about it. The method behind it ran on three ecommerce platforms - the kind with carts, variant selectors, payment steps and a hundred routes. This is the part about the gap between the two.

First, the tool finding something

Part 1 was a clean site. Five A’s, nothing to fix - which made the point about a single score lying, but isn’t much of a demo. So the toolkit ships a slow-demo: a deliberately mis-built Nuxt 3 page where every problem is planted on purpose and clearly labelled as synthetic. No client data, ever. Just the shape of the real patterns.

Pointed at it, the deterministic static pass alone finds seven anti-patterns, no browser and no model required:

runtimeCompiler: true in the Nuxt config, components: { global: true }, an image with no width or height, an image with no loading attribute, a watch with deep: true, and two dependencies - lodash and moment - declared and never imported.

Here is the report the static pass prints, headline first:

  • Core Web Vitals: unmeasured (run with --url for a measured grade)
  • Lighthouse perf: n/a (source-only run)
  • Provisional findings grade: C (64/100)
Area | Grade | Score | Findings
bundle | B | 81 | 4
runtime | A | 95 | 1
network | A | 100 | 0
ssr-hydration | A | 100 | 0
assets | B | 88 | 2

Quick Wins (Impact x Effort 6-9):

Problem | Area | File/Resource | Impact | Effort | Score | Fix
runtimeCompiler: true ships the Vue template compiler to the client. | bundle | examples/slow-demo/nuxt.config.ts:6 | 3 | 3 | 9 | Remove runtimeCompiler and precompile templates at build time, unless runtime templates are genuinely required.
<img> is missing explicit width/height, which can cause layout shift (CLS). | assets | examples/slow-demo/pages/index.vue:8 | 3 | 3 | 9 | Add intrinsic width and height (or aspect-ratio) so the browser reserves space before the image loads.
Global component registration (global: true) forces every component into the entry bundle. | bundle | examples/slow-demo/nuxt.config.ts:10 | 2 | 3 | 6 | Drop global: true and rely on Nuxt auto-import / explicit imports so components code-split per route.

Medium (Impact x Effort 3-5):

Problem | Area | File/Resource | Impact | Effort | Score | Fix
Deep watcher (deep: true) recursively tracks every nested property. | runtime | examples/slow-demo/pages/index.vue:40 | 2 | 2 | 4 | Watch a specific getter/key, use shallowRef/shallowReactive, or restructure state so a deep watch is unnecessary.
<img> has no loading attribute. | assets | examples/slow-demo/pages/index.vue:8 | 1 | 3 | 3 | Add loading="lazy" for below-the-fold images; keep the LCP/above-the-fold image eager.
Dependency “lodash” is never imported in source (candidate for removal). | bundle | - | 1 | 3 | 3 | Confirm it is not used in config/runtime, then remove it.
Dependency “moment” is never imported in source (candidate for removal). | bundle | - | 1 | 3 | 3 | Confirm it is not used in config/runtime, then remove it.

Every finding comes with a location where one exists (nuxt.config.ts:6, index.vue:8; the two unused dependencies are package-level, so they show "-"), the impact stated in the message, a concrete fix, and an Impact x Effort score that sorts it into Quick Wins, Medium or Low. That is the whole report a developer needs: where, what, how, and in what order.
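The ordering step is mechanical enough to sketch. This is a minimal TypeScript version, not the toolkit's code: the Finding shape and the triage and bucketOf functions are hypothetical, the 6-9 and 3-5 thresholds come from the report headings, and treating scores below 3 as Low is an assumption. Fed the seven slow-demo findings, it reproduces the split in the report: three Quick Wins, four Medium.

```typescript
// Hypothetical finding shape; field names are illustrative, not the toolkit's real types.
type Bucket = "Quick Win" | "Medium" | "Low";

interface Finding {
  problem: string;
  area: string;
  location: string | null; // null for package-level findings such as unused dependencies
  impact: number;
  effort: number;
}

// Thresholds from the report headings: 6-9 = Quick Wins, 3-5 = Medium.
// Anything below 3 counting as Low is an assumption.
function bucketOf(score: number): Bucket {
  if (score >= 6) return "Quick Win";
  if (score >= 3) return "Medium";
  return "Low";
}

// Score each finding as Impact x Effort, bucket it, and sort highest score first.
function triage(findings: Finding[]): Array<Finding & { score: number; bucket: Bucket }> {
  return findings
    .map((f) => {
      const score = f.impact * f.effort;
      return { ...f, score, bucket: bucketOf(score) };
    })
    .sort((a, b) => b.score - a.score);
}

// The seven slow-demo findings, with Impact/Effort values taken from the report above.
const slowDemo: Finding[] = [
  { problem: "runtimeCompiler: true", area: "bundle", location: "nuxt.config.ts:6", impact: 3, effort: 3 },
  { problem: "<img> missing width/height", area: "assets", location: "index.vue:8", impact: 3, effort: 3 },
  { problem: "global: true", area: "bundle", location: "nuxt.config.ts:10", impact: 2, effort: 3 },
  { problem: "deep: true watcher", area: "runtime", location: "index.vue:40", impact: 2, effort: 2 },
  { problem: "<img> missing loading", area: "assets", location: "index.vue:8", impact: 1, effort: 3 },
  { problem: "unused lodash", area: "bundle", location: null, impact: 1, effort: 3 },
  { problem: "unused moment", area: "bundle", location: null, impact: 1, effort: 3 },
];

for (const f of triage(slowDemo)) {
  console.log(`${f.bucket.padEnd(9)} ${f.score}  ${f.problem} (${f.location ?? "-"})`);
}
```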

Those are the definite ones - pattern-matchable without running anything. The other half of the method only shows up in a full audit, with a URL to measure: the five AI specialists read the measured floor alongside the source and add the findings that need interpretation rather than pattern-matching. A style object literal inside a v-for, which a static rule can’t safely flag. A run of independent awaits that should have been parallel. A payload pulling every field when it needs three. Those are judgement calls on top of measurement, and they are exactly what the static pass deliberately does not guess at.

The split is the whole architecture in one example. The static pass catches what is definitely wrong, on its own, fast. The specialists, reading real measurements, catch what is contextually wrong. Neither could do the other’s job.
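Of the specialists' examples, the independent awaits are the easiest to picture. A minimal sketch, with hypothetical fetchProduct, fetchReviews and fetchStock standing in for real calls and setTimeout simulating latency. The code only shows the mechanics; whether the second call really is independent of the first is the part that needs the source read in context, which is why it sits with the specialists rather than the static pass.

```typescript
// Hypothetical stand-ins for three independent data calls on a product page.
// setTimeout simulates ~100 ms of network latency per call.
function delay<T>(ms: number, value: T): Promise<T> {
  return new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));
}

const fetchProduct = () => delay(100, { id: 1, name: "Demo product" });
const fetchReviews = () => delay(100, ["Great", "Fine"]);
const fetchStock = () => delay(100, 12);

// Waterfall: each await waits for the previous call, so latencies add up (~300 ms).
async function loadSequential() {
  const product = await fetchProduct();
  const reviews = await fetchReviews();
  const stock = await fetchStock();
  return { product, reviews, stock };
}

// Parallel: the calls don't depend on each other, so start them together (~100 ms).
async function loadParallel() {
  const [product, reviews, stock] = await Promise.all([fetchProduct(), fetchReviews(), fetchStock()]);
  return { product, reviews, stock };
}

// Time one loader and print the elapsed wall-clock time.
async function timed(label: string, load: () => Promise<unknown>): Promise<void> {
  const start = Date.now();
  await load();
  console.log(`${label}: ~${Date.now() - start} ms`);
}

timed("sequential", loadSequential).then(() => timed("parallel", loadParallel));
```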

A small, honest moment

While building the static pass, the slow-demo caught a bug in my own analyzer. The deep-watcher check was matching deep: true anywhere it appeared - including in plain data objects that have nothing to do with a Vue watcher. A false positive. The fixture surfaced it, and the fix was to require the match to sit next to an actual watch() call.
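The shape of that fix can be sketched in a few lines. This is an illustration, not the toolkit's analyzer: naiveDeepWatchLines, deepWatchLines and their helpers are hypothetical, the paren matching ignores strings and comments, and it only looks at Composition API watch(...) calls (an Options API watch block would need its own rule). On the sample, the naive version flags the plain data object on line 1; the stricter one flags only the option passed to watch().

```typescript
// Naive check (the buggy version): flags `deep: true` anywhere, including plain data objects.
function naiveDeepWatchLines(source: string): number[] {
  const lines: number[] = [];
  const re = /deep\s*:\s*true/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) lines.push(lineOf(source, m.index));
  return lines;
}

// Stricter check: only flag `deep: true` inside the argument list of a watch(...) call.
function deepWatchLines(source: string): number[] {
  const lines: number[] = [];
  const re = /\bwatch\s*\(/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const open = m.index + m[0].length - 1; // index of the "(" after watch
    const args = source.slice(open, matchingParen(source, open) + 1);
    const deep = /deep\s*:\s*true/.exec(args);
    if (deep) lines.push(lineOf(source, open + deep.index));
  }
  return lines;
}

// Index of the ")" closing the "(" at `open`. Naive: does not skip parens in strings or comments.
function matchingParen(source: string, open: number): number {
  let depth = 0;
  for (let i = open; i < source.length; i++) {
    if (source[i] === "(") depth++;
    else if (source[i] === ")" && --depth === 0) return i;
  }
  return source.length - 1; // unbalanced: treat the rest of the file as the argument list
}

// 1-based line number of a character index.
function lineOf(source: string, index: number): number {
  return source.slice(0, index).split("\n").length;
}

// A plain data object on line 1, a real deep watcher on lines 2-6.
const sample = [
  "const filters = { deep: true, label: 'Deep clean' };",
  "watch(",
  "  () => state.cart,",
  "  (cart) => save(cart),",
  "  { deep: true },",
  ");",
].join("\n");

console.log(naiveDeepWatchLines(sample)); // flags lines 1 and 5
console.log(deepWatchLines(sample)); // flags line 5 only
```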

I mention it because a tool that finds problems should be held to the standard it sets. The fixture that demonstrates the tool also stress-tested it, and it was wrong before it was right. That is what dogfooding is for.

The line between public and Pro

Everything I have described is the public toolkit, AGPL-3.0, clone and run. It is the complete method on a single route: the deterministic floor, the five specialists, the split headline, a guided fix walkthrough. If you want to understand how a context-first performance audit works, it is all there, working, on real measurements.

What it is not is the production engagement. The three ecommerce audits this distillate came from needed things that don’t belong in a public, single-route tool:

Multiple routes, audited as a set, with findings deduplicated across them - because the same bloated payload on forty product pages is one root cause, not forty findings.

A local runtime - the specialists running through Ollama on the machine, no source leaving the building - because some client repositories cannot have their code sent to a hosted model, full stop.

A real auto-fix engine with a verifier loop - apply a change, re-measure, confirm the metric actually moved - rather than the guided walkthrough the public tool ships.

React, Svelte and Angular specialists, because the public v0.1 is honestly Vue/Nuxt-first and I would rather ship one framework done properly than four done from guesswork.
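The first of those, deduplication across routes, is at heart a grouping problem. A minimal sketch, assuming findings can be keyed by rule plus offending resource (choosing that key is the hard part in practice, and the rule ids here are hypothetical): forty product pages sharing one oversized payload collapse to a single root cause that lists every affected route.

```typescript
// Hypothetical per-route finding produced by a single-route audit.
interface RouteFinding {
  route: string;
  rule: string; // e.g. "oversized-payload" (illustrative rule id)
  resource: string; // the file or request the rule points at
}

// One root cause, with every route it shows up on.
interface RootCause {
  rule: string;
  resource: string;
  routes: string[];
}

// Assumed grouping key: same rule + same offending resource = same root cause.
function dedupeAcrossRoutes(findings: RouteFinding[]): RootCause[] {
  const byCause = new Map<string, RootCause>();
  for (const f of findings) {
    const key = `${f.rule}::${f.resource}`;
    const entry: RootCause = byCause.get(key) ?? { rule: f.rule, resource: f.resource, routes: [] };
    if (!entry.routes.includes(f.route)) entry.routes.push(f.route);
    byCause.set(key, entry);
  }
  // Widest-reaching causes first.
  return Array.from(byCause.values()).sort((a, b) => b.routes.length - a.routes.length);
}

// Forty product pages sharing one bloated payload, plus one unrelated cart finding.
const findings: RouteFinding[] = Array.from({ length: 40 }, (_, i) => ({
  route: `/product/${i + 1}`,
  rule: "oversized-payload",
  resource: "/api/product",
}));
findings.push({ route: "/cart", rule: "missing-dimensions", resource: "pages/cart.vue" });

const causes = dedupeAcrossRoutes(findings);
console.log(`${findings.length} findings -> ${causes.length} root causes`); // 41 findings -> 2 root causes
```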

That is the Pro line. Not features hidden behind a paywall for the sake of it - the parts that only make sense at production scale, on real client constraints, with the years of ecommerce auditing that the niche specialists encode.
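The verifier loop from that list is the piece most worth picturing, because its control flow is what separates it from a guided walkthrough. A sketch under heavy assumptions: CandidateFix, verifyFix and the simulated metric are hypothetical, and a real loop would re-measure several times and compare medians, because a single run is noisy. The point is the order of operations: measure, apply, re-measure, and roll back anything that did not move the metric.

```typescript
// Hypothetical fix handle: apply/revert change the code under test.
interface CandidateFix {
  id: string;
  apply(): Promise<void>;
  revert(): Promise<void>;
}

interface VerifyResult {
  id: string;
  before: number;
  after: number;
  kept: boolean;
}

// Measure, apply, re-measure; keep the fix only if a lower-is-better metric (e.g. LCP in ms)
// improved by at least minGain, otherwise roll it back.
async function verifyFix(
  fix: CandidateFix,
  measure: () => Promise<number>,
  minGain: number,
): Promise<VerifyResult> {
  const before = await measure();
  await fix.apply();
  const after = await measure();
  const kept = before - after >= minGain;
  if (!kept) await fix.revert();
  return { id: fix.id, before, after, kept };
}

// Simulated page: a single number standing in for a real measurement.
let lcpMs = 2400;
const measureLcp = async () => lcpMs;

const helpful: CandidateFix = {
  id: "helpful-change",
  apply: async () => { lcpMs -= 300; },
  revert: async () => { lcpMs += 300; },
};
const cosmetic: CandidateFix = {
  id: "cosmetic-change",
  apply: async () => { lcpMs -= 5; },
  revert: async () => { lcpMs += 5; },
};

(async () => {
  console.log(await verifyFix(helpful, measureLcp, 50)); // kept: 2400 -> 2100
  console.log(await verifyFix(cosmetic, measureLcp, 50)); // rolled back: a 5 ms gain is below the threshold
})();
```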

The other half lives somewhere else

One thing the three audits included that this tool deliberately does not: the backend. Load testing, server telemetry, database and search root-cause, log mining - all of that was real work on those platforms, and none of it fits a frontend performance tool. It is a different product with a different shape, and it gets its own story another time. A frontend audit that pretends to also be a load test is two tools done badly. This one stays honest about its edge: it audits the frontend, and it says so.

Why /perf:fix doesn’t just fix it

The public tool has a /perf:fix command, and it is deliberately modest. It will walk you through a finding - explain what is wrong, show the before and after, let you apply it - and it will auto-apply at most one or two trivial mechanical changes as a demo. A loading="lazy" here, a font-display: swap there.
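Even a "trivial" mechanical fix carries a judgement call. A minimal sketch of adding loading="lazy", assuming a hypothetical addLazyLoading helper that works on raw template text: it skips images that already declare loading, and it leaves the first image eager on the assumption, not a measurement, that it is the above-the-fold/LCP candidate. Get that guess wrong and the "fix" delays the largest paint.

```typescript
// Hypothetical mechanical fix: add loading="lazy" to <img> tags that declare no loading attribute.
// The first <img> stays eager on the assumption that it is the above-the-fold/LCP image;
// that is a guess, not a measurement. The regex also breaks on ">" inside attribute values.
function addLazyLoading(template: string): string {
  let seen = 0;
  return template.replace(/<img\b[^>]*>/g, (tag) => {
    seen++;
    if (seen === 1 || /\bloading\s*=/.test(tag)) return tag; // keep first image and explicit choices
    return tag.replace(/^<img\b/, '<img loading="lazy"');
  });
}

const pageTemplate = [
  '<img src="/hero.jpg" width="1200" height="600">',
  '<img src="/thumb-1.jpg">',
  '<img src="/thumb-2.jpg" loading="eager">',
].join("\n");

console.log(addLazyLoading(pageTemplate));
```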

This is the shape of one finding as /perf:fix surfaces it - where, what, how, and at what priority, all from the tool’s own report:

Where | What | How | Priority
assets / index.vue:8 | <img> missing width/height (CLS risk) | Add intrinsic width/height or aspect-ratio | Quick Win (Impact 3 x Effort 3 = 9)

It will not auto-apply the rest, and that is a design decision, not a missing feature.

Performance fixes are mostly architectural. Splitting a bundle, restructuring hydration, parallelising a request waterfall, adding route rules - these are decisions with trade-offs, not mechanical patches. The WCAG toolkit taught me this lesson in a different domain: the auto-fixable share of real findings is small, and the honest framing is that the tool discovers and explains while a human decides. Anyone selling you an AI that auto-fixes performance is selling you a tool that will confidently make your site worse.

So the tool finds, measures, root-causes and ranks. You fix. If you want someone who has done the fixing across three production ecommerce platforms, that is the conversation.

Where this leaves the series

Three parts, one thesis, proven on real measurements throughout.

A single performance score lies, and my own portfolio proved it - C on the score, green on every vital, A on every area. The floor underneath is deterministic and stable where it counts - score, verdict and grade identical run to run - which is what lets the AI on top be trusted at all. And the method scales from this one-route distillate to multi-route, multi-framework, local-runtime production work - the part that was never going to fit in a public repo.

The public toolkit is the education. The production engagement is the niche. Both are honest about which is which.

More on the Pro tier and consulting: sdet.it/services.

Series #04 ends here.

#FromTheField