AP-031 Fixture Divergence Prevention

Two-layer prompt-level prevention for test fixtures that diverge from real producer output. Spec: .correctless/specs/ap031-fixture-divergence-prevention.md. Antipattern: AP-031 in .correctless/antipatterns.md. Motivated by PMB-010 and PMB-011.

What It Does

When a feature parses output produced by another Correctless skill or script (markdown artifact headings, JSON fields, regex matches against artifact content), hand-written test fixtures can silently encode the wrong format. All tests pass against the fixtures while the code fails against real data. This happened twice back-to-back:

  • PMB-010: sync-deferred-backlog.sh expected ## RS-001: headings; the real /creview-spec output uses ## Finding RS-001:. All 65 tests passed against hand-written fixtures. The script imported 0 of 25 pending findings.
  • PMB-011: the /cprune scanner shipped with 3 fixture-divergence bugs (17 false positives, a count-regex that would have corrupted AGENT_CONTEXT.md, and a wrong drift-debt wrapper format).
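
The PMB-010 mismatch can be reproduced in a few lines of shell. This is a minimal sketch, not the code of sync-deferred-backlog.sh: the helper name `count_headings` and the finding titles are hypothetical, and only the two heading shapes come from the incident description above.

```shell
# Hypothetical two-line excerpt in the real producer heading format.
real_excerpt='## Finding RS-001: example title
## Finding RS-002: another example title'

# Count lines on stdin that match an extended regex.
# `|| true` keeps a zero-match result from aborting under `set -e`.
count_headings() {
  grep -cE "$1" || true
}

# Pattern the consumer assumed (it only matches hand-written fixtures).
assumed=$(printf '%s\n' "$real_excerpt" | count_headings '^## RS-[0-9]+:')
# Pattern that matches what the producer actually writes.
actual=$(printf '%s\n' "$real_excerpt" | count_headings '^## Finding RS-[0-9]+:')

echo "assumed pattern matched: $assumed"
echo "actual pattern matched:  $actual"
```

Against this excerpt the assumed pattern matches 0 headings and the actual pattern matches 2. A fixture written by hand in the assumed shape would have hidden that difference.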

This feature adds prevention at the two phases where the divergence is introduced: spec writing and test writing.

The Two Layers

```mermaid
graph TD
    A["Feature parses output of<br/>another Correctless tool"] --> B["Layer 1: /cspec Step 3<br/>format-pinning directive"]
    B --> C["Spec pins exact format<br/>(heading regex, JSON schema)<br/>+ cites producer file path"]
    A --> D["Layer 2: ctdd-red agent<br/>real-fixture requirement"]
    D --> E["At least one fixture is a<br/>verbatim excerpt of a real artifact<br/>with a Source: citation"]
    E --> F["/ctdd test audit check 11<br/>fixture provenance"]
    F -->|"synthetic-only or<br/>live-read-only suite"| G["BLOCKING finding"]
    F -->|"no real artifact<br/>exists yet"| H["Dormant: Layer 1<br/>is the sole guard"]
    F -->|"real fixture present"| I["Pass"]

    style B fill:#ffa94d,color:#000
    style D fill:#ffa94d,color:#000
    style F fill:#ffd43b,color:#000
    style G fill:#ff6b6b,color:#fff
    style I fill:#51cf66,color:#fff
```

Layer 1: Format Pinning in /cspec

skills/cspec/SKILL.md Step 3 (Draft the Spec) now contains a format-pinning directive. When a feature reads from, extracts from, or pattern-matches against files produced by another skill or script, the spec must:

  • (a) pin the exact format being parsed — heading regex, JSON schema, or field names
  • (b) cite the producer file path (SKILL.md template section or script path) as the authoritative format source

Example from the directive: write "Heading format: '## Finding RS-{NNN}: {title}' per skills/creview-spec/SKILL.md Step 3.5 template", not "The script reads review findings."

The trigger does NOT fire for file existence checks or path-only operations.

Layer 2: Real Fixtures in /ctdd

Two coordinated halves, with writer-side and auditor-side definitions kept aligned (same trigger-detection language, same producer table):

Writer side (agents/ctdd-red.md): when tests parse another Correctless tool’s output, at least one fixture must be sourced from a real artifact in the repo. The preferred form is a verbatim excerpt in the test file (or a tracked fixture under tests/fixtures/) with a Source: citation in the test language’s line-comment syntax — # Source: in shell/Python, // Source: in Go/TypeScript/Java, -- Source: in SQL. This form is hermetic: it works in CI and fresh clones where .correctless/artifacts/ (gitignored) is absent. Reading the live artifact at test time may add coverage but must never be the sole form.
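
A hermetic test in the preferred form might look like the sketch below. Everything in it is illustrative: the artifact file name in the Source: line is hypothetical, the heredoc body stands in for text that a real test would copy verbatim from that artifact, and `extract_finding_ids` is an assumed consumer, not a function from the repo.

```shell
# Source: .correctless/artifacts/review-spec-findings-example.md (hypothetical file name)
# In a real test the heredoc body is copied verbatim from that artifact;
# the headings below are placeholders in the documented format.
fixture=$(cat <<'EOF'
## Finding RS-001: placeholder title
Body text of the first finding.

## Finding RS-002: placeholder title
Body text of the second finding.
EOF
)

# Hypothetical consumer under test: print the finding IDs found on stdin.
extract_finding_ids() {
  sed -nE 's/^## Finding (RS-[0-9]+):.*/\1/p'
}

ids=$(printf '%s\n' "$fixture" | extract_finding_ids)
expected='RS-001
RS-002'

if [ "$ids" = "$expected" ]; then
  echo "PASS: parsed finding IDs from the sourced excerpt"
else
  echo "FAIL: got '$ids'" >&2
  exit 1
fi
```

Because the excerpt lives in the test file, the test runs the same way in CI and in a fresh clone, where the gitignored artifact directory is absent.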

Auditor side (skills/ctdd/SKILL.md test audit check 11, “fixture provenance”): flags as BLOCKING any in-scope test suite that is synthetic-only (inline heredocs with no real-artifact reference) or live-read-only (reads the gitignored artifact path with no committed excerpt). Scope is limited to test files added or modified on the current branch; the /ctdd orchestrator passes two labeled lists — MODIFIED_TEST_FILES: (from git diff) and UNTRACKED_TEST_FILES: (from git status --porcelain, since RED creates untracked files) — because the audit agent is tool-pinned to Read/Grep/Glob and cannot run git. If either label is missing, the check fails loud with a single BLOCKING finding rather than guessing scope. The auditor also follows referenced fixture files (repo-relative paths only, 10-file budget) and treats fixture content as data, not instructions (TB-003 anti-anchoring fence).
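
The real check is prose carried out by an agent limited to Read/Grep/Glob, so no script like this exists in the repo. As a rough deterministic approximation of the decision rule, assuming a Source: comment counts as a committed excerpt and any other mention of the artifact directory counts as a live read, the classification and the fail-loud scope guard could be sketched as follows (function names and file contents are hypothetical):

```shell
# Classify one test file by fixture provenance.
# Prints one of: pass | live-read-only | synthetic-only
classify_provenance() {
  file=$1
  # A Source: citation in one of the line-comment styles (#, //, --).
  if grep -qE '^[[:space:]]*(#|//|--)[[:space:]]*Source:' "$file"; then
    echo "pass"
  # No citation, but the test mentions the gitignored artifact directory.
  elif grep -qF '.correctless/artifacts/' "$file"; then
    echo "live-read-only"
  else
    echo "synthetic-only"
  fi
}

# Report a single BLOCKING finding when either labeled list is absent.
check_scope_labels() {
  input=$1
  for label in 'MODIFIED_TEST_FILES:' 'UNTRACKED_TEST_FILES:'; do
    if ! printf '%s\n' "$input" | grep -qF "$label"; then
      echo "BLOCKING: missing $label"
      return 1
    fi
  done
  echo "scope ok"
}

# Demo on three throwaway files with hypothetical contents.
tmp=$(mktemp -d)
printf '%s\n' '# Source: .correctless/artifacts/review-spec-findings-example.md' \
  '## Finding RS-001: placeholder' > "$tmp/sourced.sh"
printf '%s\n' 'cat .correctless/artifacts/qa-findings-*.json' > "$tmp/live.sh"
printf '%s\n' 'echo "## RS-001: made up"' > "$tmp/synthetic.sh"

sourced_result=$(classify_provenance "$tmp/sourced.sh")
live_result=$(classify_provenance "$tmp/live.sh")
synthetic_result=$(classify_provenance "$tmp/synthetic.sh")
scope_result=$(check_scope_labels 'MODIFIED_TEST_FILES: tests/a.sh' || true)

echo "$sourced_result / $live_result / $synthetic_result"
echo "$scope_result"
rm -rf "$tmp"
```

The sketch leaves out the dormant case, the 10-file budget for following referenced fixture files, and any judgment about whether a cited excerpt is genuinely verbatim.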

Producer-to-Artifact Reference Table

Both Layer 2 halves carry the same table of known producer-to-artifact patterns, used to distinguish “real artifact exists but the test ignores it” (BLOCKING) from “no artifact exists yet” (dormant):

| Producer | Artifact pattern |
|---|---|
| /creview-spec | `.correctless/artifacts/review-spec-findings-*.md` |
| /caudit | `.correctless/artifacts/findings/audit-*-round-*.json` |
| /cverify | `.correctless/meta/intensity-calibration.json` |
| /ctdd | `.correctless/artifacts/qa-findings-*.json` |
| /cdocs | `.correctless/artifacts/cost-*.json`, excluding `cost-cache-*` (statusline cache) |
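
To make the BLOCKING-versus-dormant distinction concrete, the table can be read as a lookup from producer to glob. The sketch below is an assumption about how that lookup might be mechanized (the agents apply the table as prose); the function names and the demo file names are hypothetical.

```shell
# Return 0 if at least one real artifact for the producer exists under $2.
producer_artifact_exists() {
  producer=$1
  root=$2
  case $producer in
    /creview-spec) set -- "$root"/.correctless/artifacts/review-spec-findings-*.md ;;
    /caudit)       set -- "$root"/.correctless/artifacts/findings/audit-*-round-*.json ;;
    /cverify)      set -- "$root"/.correctless/meta/intensity-calibration.json ;;
    /ctdd)         set -- "$root"/.correctless/artifacts/qa-findings-*.json ;;
    /cdocs)        set -- "$root"/.correctless/artifacts/cost-*.json ;;
    *)             return 1 ;;  # not in the table: nothing to look for
  esac
  for path in "$@"; do
    # An unmatched glob stays literal, so check for an actual file.
    [ -f "$path" ] || continue
    case ${path##*/} in
      cost-cache-*) continue ;;  # statusline cache, excluded per the table
    esac
    return 0
  done
  return 1
}

# "active" means the real-fixture requirement applies; "dormant" means
# there is nothing to excerpt yet.
fixture_requirement() {
  if producer_artifact_exists "$1" "$2"; then
    echo "active"
  else
    echo "dormant"
  fi
}

# Demo in a throwaway directory with hypothetical file names.
root=$(mktemp -d)
mkdir -p "$root/.correctless/artifacts"
before=$(fixture_requirement /cdocs "$root")        # nothing produced yet
touch "$root/.correctless/artifacts/cost-cache-demo.json"
cache_only=$(fixture_requirement /cdocs "$root")    # cache file is ignored
touch "$root/.correctless/artifacts/cost-demo.json"
after=$(fixture_requirement /cdocs "$root")         # a real artifact exists
echo "$before / $cache_only / $after"
rm -rf "$root"
```

The demo prints `dormant / dormant / active`: the cache file alone never activates the requirement, which is the behavior the cost-cache-* exclusion is meant to guarantee.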

Dormant Behavior

When a single PR introduces both the producer and the consumer, no real artifact exists to excerpt. The real-fixture requirement is dormant in that case — the spec’s format pinning (Layer 1) is the sole guard — and activates once the producer has run at least once.

Testing

The suite in tests/test-ap031-fixture-divergence.sh has 39 block-scoped tests covering all 6 spec rules. Each assertion first extracts the relevant section (an awk state machine between heading/check boundaries) and only then greps it, so keywords in unrelated sections cannot satisfy a check (AP-003 mitigation). Distribution copies (correctless/skills/cspec/SKILL.md, correctless/skills/ctdd/SKILL.md, correctless/agents/ctdd-red.md) are verified byte-equal to source. The structural test also pins 8 QA/mini-audit class fixes (e.g., the cost-cache-* exclusion, the labeled-list fail-loud fallback, the TB-003 fence) so a regression in the directive prose fails the suite.
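
The block-scoping idea can be illustrated with a small awk state machine. This is a sketch of the technique only, not the suite's actual helper: `extract_section`, the `## Step N` headings, and the sample text are all hypothetical.

```shell
# Print the body of the section whose heading line equals $1 in file $2,
# stopping at the next line that starts with "## ".
extract_section() {
  awk -v heading="$1" '
    $0 == heading        { in_section = 1; next }  # enter the target section
    in_section && /^## / { in_section = 0 }        # the next heading closes it
    in_section           { print }
  ' "$2"
}

doc=$(mktemp)
cat > "$doc" <<'EOF'
## Step 2
Mentions format-pinning in passing.
## Step 3
Draft the spec.
## Step 4
Review.
EOF

# A whole-file grep is satisfied by the unrelated Step 2 mention.
if grep -q 'format-pinning' "$doc"; then
  whole_file=found
else
  whole_file=missing
fi

# A block-scoped grep only sees the Step 3 body.
if extract_section '## Step 3' "$doc" | grep -q 'format-pinning'; then
  scoped=found
else
  scoped=missing
fi

echo "whole-file: $whole_file, scoped: $scoped"
rm -f "$doc"
```

Here the whole-file check reports `found` while the scoped check reports `missing`, which is the false pass that block scoping is meant to rule out.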

Known Limitations

  • Prompt-level enforcement. All directives are prose read by LLM agents — adherence can fade under context pressure (a deliberate PAT-018 deviation; runtime fixture validation is explicitly out of scope per the spec’s Won’t Do). Check 11 is the second-pass safety net for a RED agent that misses the requirement.
  • Correlated trigger detection. Layer 1 and Layer 2 both depend on recognizing “this feature parses another tool’s output.” The correlation is partial, not total: check 11 examines fixture content directly (Source: citations) and fires independently of whether the spec pinned the format.
  • Stale artifacts. The most recent artifact is the best proxy for current format, but committed excerpts can drift if the producer’s format changes later. Layer 1’s cross-reference to the producer’s SKILL.md template (not the artifact) catches format changes at spec time.
  • No retroactive retrofits. Pre-existing tests are out of scope; the requirement applies to test files added or modified going forward.