Context-First QA, Part 3: The Roadmap

Two days ago: the thesis. 1000 tasks, $700 vs $40, 70% never hits an LLM.

Yesterday: the map. Ten layers, A through J. Three-Layer Architecture as backbone.

Today: the calendar.

Nine weeks. Four code drops. Five standalone series episodes. Plus how to engage if you want this pattern in your stack - and why June 15 is your forcing function.

The ten layers, one more time

Before we get to the calendar, a quick recap:

A Input Layer (typed QAContext from messy sources)
B Decision Layer (70% without LLM, the deterministic gate)
C Output Layer (Atlassian ADF, the 303 redirect trick)
D Orchestration (partial continuations, heartbeats)
E HITL Safety (action queue state machine)
F Vendor-Agnostic Infra (swap providers, not orchestrators)
G Cost & Telemetry (per-task attribution)
H Operational Discipline (five production gotchas)
I Multi-Process Glue (one script, three backends)
J Approval UX (HITL UI that doesn’t feel like work)
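To make layer A a little more concrete, here is a minimal Python sketch of turning a messy payload into a typed context. The field names on QAContext, the from_raw helper, and the shape of the raw dict are all hypothetical; the code in the episode branches may look quite different.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class QAContext:
    """Hypothetical typed context; field names are illustrative only."""
    ticket_key: str
    summary: str
    acceptance_criteria: tuple[str, ...]
    figma_url: Optional[str] = None


def from_raw(raw: dict[str, Any]) -> QAContext:
    """Normalize a messy payload into a typed QAContext, failing loudly."""
    key = str(raw.get("key", "")).strip()
    if not key:
        raise ValueError("ticket key is required")
    summary = str(raw.get("summary") or "").strip()
    # Criteria may arrive as a list or as a newline-separated string.
    criteria = raw.get("acceptance_criteria") or []
    if isinstance(criteria, str):
        criteria = criteria.splitlines()
    cleaned = tuple(c.strip() for c in criteria if c and c.strip())
    figma = raw.get("figma_url") or None
    return QAContext(key, summary, cleaned, figma)
```

The point of the sketch is that everything downstream receives one frozen, validated object instead of raw API responses.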

Out of these ten, here’s what publishes when.

The 9-week sequence

Week	#	Theme	Mode
May 26-28	#04	Performance audit 5-agent (WCAG ecosystem)	standalone, code
Jun 2-4	#05	Multi-page WCAG (V0.4 build story)	standalone, code
Jun 9-11	#06	CDAT pattern (Page Objects reinvented)	standalone, code
Jun 16-18	#07	Figma-to-code deterministic	standalone, code
Jun 23-25	#08	6 portals agent-ready in 70 minutes	standalone, narrative
Jul 8-9	#09	Episode C “ADF Without Tears” ⭐	mini-portal, code drop
coming	#10	Episode B “70% Without an LLM”	mini-portal, code drop
Jul 14-16	#11	Episode A “Input Layer”	mini-portal, code drop
Jul 21-23	#12	Episode E “HITL Safety”	mini-portal, code drop

Five standalone series episodes first - they’re already-shipped pieces of the broader ecosystem (WCAG, CDAT, Figma, agent-ready portals). Then four mini-portal code drops from the Context-First QA series itself, in the order listed above: C, B, A, E. Each lands in a repo branch you can clone, read, and adapt.

The mini-portal repo: darco81/context-first-qa-patterns, AGPL-3.0. Main branch is an index; each episode branch contains the distilled code for that layer. After publication, branches merge to main with an aggregated README.

The full loop

Here’s the end-to-end production flow. One ticket in Jira, one comment out, full audit trail in between.

sequenceDiagram
    actor Dev as Engineer
    participant Jira
    participant ETL as Deterministic ETL
    participant AI as AI Judge
    participant ADF as ADF Publisher

    Dev->>Jira: ticket created
    Jira->>ETL: webhook trigger
    ETL->>ETL: parallel fetch (Jira+Figma+Playwright)
    ETL->>ETL: enrichment, typing, validation

    alt 70% case (deterministic verdict)
        ETL->>ADF: pass/fail report
        ADF->>Jira: comment with audit trail
    else 30% case (needs judgment)
        ETL->>AI: structured QAContext
        AI->>AI: bounded decision (small scope)
        AI->>ADF: validated output schema
        ADF->>Jira: comment with audit trail
    end

Maps were components. Loop is integration. Each layer in Part 2 maps to a participant or a transition in this diagram.
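The alt/else branch in the diagram can be sketched in a few lines of Python. This is an illustration under assumptions, not the production code: the Checks shape, deterministic_verdict, run_loop, and the injected ai_judge callable are hypothetical names. The only external fact relied on is the minimal Atlassian Document Format envelope (a doc node containing a paragraph with a text node).

```python
from typing import Callable, Optional

Checks = dict[str, Optional[bool]]  # None marks a check the ETL could not decide


def deterministic_verdict(checks: Checks) -> Optional[str]:
    """Return 'pass' or 'fail' when every check is conclusive, else None."""
    if not checks or any(result is None for result in checks.values()):
        return None
    return "pass" if all(checks.values()) else "fail"


def to_adf_comment(text: str) -> dict:
    """Wrap plain text in a minimal Atlassian Document Format document."""
    return {
        "version": 1,
        "type": "doc",
        "content": [
            {"type": "paragraph", "content": [{"type": "text", "text": text}]}
        ],
    }


def run_loop(checks: Checks, ai_judge: Callable[[Checks], str]) -> dict:
    """Route one ticket: deterministic branch first, judge only as fallback."""
    verdict = deterministic_verdict(checks)
    route = "deterministic"
    if verdict is None:
        verdict = ai_judge(checks)
        route = "ai_judge"
        if verdict not in {"pass", "fail"}:  # validate the judge's output
            raise ValueError(f"unexpected verdict: {verdict!r}")
    return to_adf_comment(f"QA verdict: {verdict} (route: {route})")
```

Recording the route in the comment is one simple way to keep the audit trail honest about which branch produced the verdict.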

What I’m NOT publishing - and why

Six of the ten layers stay in pitch mode for now: D, F, G, H, I, and J, plus the production-grade versions of A and E. They don’t become public code drops in this window.

Why:

D Orchestration is multi-tenant and tied to a specific dispatcher infrastructure. The architecture is teachable; the operational scaffolding is not.
F Vendor-Agnostic Infra is mid-refactor as I write this. I’d rather publish it once it’s done than publish a snapshot that breaks in a month.
G Cost & Telemetry has compliance-adjacent observability concerns. A public distillation would require enough redaction to be misleading.
H Operational Discipline is the “five gotchas in production” piece. Each gotcha is one paragraph; the pattern is teachable in a single article rather than a code drop.
I Multi-Process Glue is the most production-environment-coupled. It assumes a specific shell setup, a specific CI pipeline, a specific dev workflow. Better as consulting work than a public repo.

This isn’t withholding for its own sake. It’s the three-tier model in action:

Tier 1 (public, AGPL): the method. Architecture, decisions, working code on representative data. You clone, you see, you adapt.
Tier 2 (commercial): production-ready implementation. Multi-tenant, compliance-aware, scaled.
Tier 3 (enterprise): design system federation, cross-repo audit, full toolchain integration.

Public version gives you the method. Production version takes 2-4 weeks of work to implement against your stack. That’s where I come in.

June 15: the forcing function

If your team is already running Agent SDK pipelines for QA, performance audits, or accessibility checks, June 15, 2026 is the date in your calendar.

That’s when Anthropic separates programmatic SDK usage from interactive Claude Code subscription windows, and bills SDK traffic at full API rates from a dedicated monthly credit. Today, agentic pipelines borrow against the same rate-limit budget your developers use to write code. After June 15, they don’t - they have their own line item, billed per token.

Which means the cost math from Part 1 stops being theoretical. From mid-June, every naive “LLM everywhere” workflow you ship is an explicit invoice item. Every deterministic floor you add subtracts directly from that invoice.
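As a rough way to see how a deterministic floor moves that invoice, here is a small Python sketch. Only the $700 per 1000 tasks figure and the 70% share come from this series; the per-call cost of the bounded judge is a hypothetical placeholder that you would replace with your own measurement. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def naive_cost_cents(tasks: int, llm_cents_per_task: int) -> int:
    """Every task goes to the LLM."""
    return tasks * llm_cents_per_task


def floor_cost_cents(tasks: int, deterministic_pct: int,
                     bounded_cents_per_task: int) -> int:
    """Only tasks that miss the deterministic floor reach the LLM."""
    llm_tasks = tasks * (100 - deterministic_pct) // 100
    return llm_tasks * bounded_cents_per_task


# $700 per 1000 tasks (Part 1) works out to 70 cents per task.
naive = naive_cost_cents(1000, 70)
# 13 cents per bounded call is a placeholder, not a measured number.
floored = floor_cost_cents(1000, 70, 13)
```

Under these assumed inputs, each extra point of deterministic coverage removes a fixed slice of the per-token bill, which is why the floor is worth measuring before the billing change lands.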

If you’ve been pricing this work as “fits within my Max plan,” the answer changes in four weeks. The deterministic-first pipelines I’m publishing over the next eight episodes were built before this announcement - but they’re now the most direct way to keep your QA AI costs predictable through Q3.

Better to ship the floor in May than to discover the bill in July.

How to engage

I work with 3-5 teams a year. Long engagements - not “AI tool for QA” consultations, not “let’s prototype something,” not “can you write some tests for us.”

What I do:

Build the deterministic floor under your existing QA, WCAG, or performance audit work.
Wire LLM judgment into the right slot - bounded, testable, attributable.
Set up the toolchain so the next person inheriting it can read it.

What I don’t do:

Hand you an “AI test writer.” The premise of this series is that you don’t want one.
Promise specific accuracy numbers without measuring your codebase first.
Take on work where the goal is volume over reproducibility.

If you’re shopping for an “AI test writer,” we won’t be a fit. If you have an existing QA or audit pipeline and you want to add deterministic intelligence on top, a DM is the right next step.

More on services: sdet.it/services (the portal is mid-launch, the DM works today).

Next Tuesday

Series #03 wraps here. The Context-First QA mini-portal is now public infrastructure - bookmark the calendar; the episodes land on schedule.

Next up: Performance audit, 5 specialists in parallel dispatch. Same Lead orchestrator pattern as the WCAG toolkit (Series #01, May 5-7). Different domain. Different numbers - 7 hours with AI vs 16 billable vs a full week classic.

Map is upside-down. Loop is bounded. Audit trail is trustable.

That’s the deal.
