Dev Journal

2026-05-22 — Cross-Feature Intelligence Layer

This feature closes a gap in the pipeline’s memory: each /cspec run previously started from a near-blank slate despite the pipeline accumulating rich historical data across features. Six data sources existed (deferred review findings, Devil’s Advocate reports, override patterns, lens recommendations, debug investigations, and phase effectiveness history) but none were surfaced to the spec agent during brainstorm. The cross-feature intelligence layer aggregates them into a single JSON brief that /cspec reads as advisory context.

The core implementation is scripts/cross-feature-intel.sh (876 lines), a deterministic bash script that reads each source, extracts structured entries with per-source parsing logic, applies recency filtering (90-day exclusion) and optional file-scope filtering (for debug investigations that have file-scoped data), and caps the output at 30 entries with a per-section minimum guarantee (each non-empty section retains at least 1 entry even at the cap). The six extraction routines each handle a different data format: deferred findings use jq to filter status: "open" entries from the JSON backlog; Devil’s Advocate reports are parsed by their ## DA-NNN: markdown headings, with severity extracted from either inline bold or subsection formats; overrides are collapsed by reason hash (sha256 first 8 chars) across multiple run files; lens recommendations are collapsed by name with a promotion_candidate flag at count >= 3; debug investigations extract Root Cause text from markdown, with file refs matched via an intentionally narrow, project-conventional path regex; phase effectiveness collapses post-merge bugs by the phase that should have caught them. The script follows PAT-003 (phase-transition script conventions): it lives in scripts/, sources lib.sh, accepts CLI arguments, outputs to stdout, and always exits 0.
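The capping rule and the override collapse key can be sketched in Python. This illustrates the logic only and is not the bash implementation: the names `cap_entries` and `reason_key`, the dict-of-lists shape, and the fill order (reserve one slot per non-empty section, then hand out the remaining budget in section order) are all assumptions. The journal specifies only the cap of 30, the one-entry minimum, and the 8-character sha256 prefix.

```python
import hashlib


def reason_key(reason: str) -> str:
    """First 8 hex chars of the SHA-256 of an override reason (collapse key)."""
    return hashlib.sha256(reason.encode("utf-8")).hexdigest()[:8]


def cap_entries(sections: dict[str, list], cap: int = 30) -> dict[str, list]:
    """Trim to at most `cap` entries total, keeping >= 1 per non-empty section.

    Assumption: entries inside each section are already ordered by priority.
    """
    non_empty = {name: items for name, items in sections.items() if items}
    # Pass 1: reserve one slot for every non-empty section.
    kept = {name: items[:1] for name, items in non_empty.items()}
    remaining = cap - len(kept)
    # Pass 2: hand out the remaining budget in section order (hypothetical policy).
    for name, items in non_empty.items():
        if remaining <= 0:
            break
        extra = items[1 : 1 + remaining]
        kept[name].extend(extra)
        remaining -= len(extra)
    return kept
```

With six sources and a cap of 30, the reservation pass can never exceed the cap on its own; a different fill policy (for example round-robin) would satisfy the same two constraints.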

The /cspec integration adds Step 0a after the initial brainstorm exchange. Once the user has described their feature, /cspec invokes the script with --scope set to the feature’s likely affected files and presents a 3-5 entry summary framed as context. The design decision that required the most thought was DD-004: whether to use an <UNTRUSTED_RESEARCH_BRIEF> fence (like TB-007 for external web content) or a prose anti-anchoring directive. The intelligence brief contains internal project data that was human-reviewed at creation time, so the risk is cognitive anchoring (over-weighting historical patterns) rather than prompt injection, and the prose directive was chosen. The directive includes calibration examples (weight-when: same files, 3+ recurrence, security-related; dismiss-when: different module, near 90-day boundary, irrelevant pattern) following the PMB-007 lesson that uncalibrated directives cause agents to default to the lowest-friction interpretation. PRH-003 prevents the amplification vector where brief content gets interpolated into spec rules. This asymmetry with TB-007 is documented in ARCHITECTURE.md TB-003’s “Mitigation variant” subsection.

The /cstatus integration adds 3-state intelligence health reporting: no data (fresh projects with no pipeline history), stale (brief older than 7 days with remediation guidance), and current (brief age and section entry counts). The dormancy pattern (PAT-019) handles pre-upgrade projects where the script doesn’t exist yet – no intelligence health section appears at all. The feature introduced ABS-037 for the brief’s sole-writer contract, with /cspec as read-only consumer and /cstatus as metadata reader. SFG protection was intentionally omitted for the brief itself (advisory and regenerable from source artifacts), though the script is protected by the existing scripts/*.sh glob.
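The three health states plus the dormant case reduce to a small decision function. The sketch below is a Python illustration, not the /cstatus implementation: `intel_health` and its parameters are hypothetical names, and using "no brief file exists" as the signal for "no data" is an assumption.

```python
import time
from typing import Optional

STALE_AFTER_SECONDS = 7 * 24 * 60 * 60  # "older than 7 days" from the journal


def intel_health(
    script_exists: bool,
    brief_mtime: Optional[float],
    now: Optional[float] = None,
) -> Optional[str]:
    """Return 'no data', 'stale', 'current', or None when the check is dormant.

    brief_mtime is the brief's modification time in epoch seconds, or None
    when no brief has been generated yet (assumed proxy for "no data").
    """
    if not script_exists:
        return None  # PAT-019 dormancy: pre-upgrade project, omit the section
    if brief_mtime is None:
        return "no data"
    now = time.time() if now is None else now
    return "stale" if now - brief_mtime > STALE_AFTER_SECONDS else "current"
```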

2026-05-18 — DA-002 Debt Sprint (Workflow-Advance Decomposition)

This feature addresses the single largest file in the project: hooks/workflow-advance.sh at 1,368 lines with 23 command functions. Every feature that adds a phase-transition gate touches this file, and it was growing at roughly 50 lines per feature. The Devil’s Advocate assessment (DA-002) flagged it as a maintainability risk for a single-maintainer project. The solution decomposes it into a thin dispatcher that sources three module files from scripts/wf/.

The decomposition follows the DD-002 design decision: modules are sourced, not executed. They share the dispatcher’s variable scope (REPO_ROOT, CONFIG_FILE, ARTIFACTS_DIR, etc.) and all helper functions. This avoids parameter passing and keeps the same runtime model – callers of workflow-advance.sh see identical behavior. The dispatcher sets SCRIPT_DIR (DD-006) before sourcing any module, and modules use $SCRIPT_DIR instead of BASH_SOURCE[0] for path resolution. This is critical because BASH_SOURCE[0] inside a sourced module resolves to the module file, not the dispatcher – every relative path in the original monolith would break without this fix.

The three modules are grouped by responsibility (DD-001): transitions.sh holds phase transition commands (review, tests, impl, qa, fix, done, verify, documented, audit-start, audit-done, etc.), utility.sh holds operational commands (init, reset, override, status, status-all, diagnose, help), and metadata.sh holds state modification commands (set-intensity, resolve-drift, spec-update). The grouping keeps each file under 500 lines while avoiding the overhead of one-file-per-command (23 files would be excessive). Helper functions used across modules remain in the dispatcher – moving them to lib.sh would require parameterizing dispatcher-local variables for no benefit (DD-003).

The second major change replaces the hardcoded test command in workflow-config.json. The old command was a 3,372-character string manually listing 86 test file names – each new test required editing this string (AP-024 class, the same bug class that PMB-003 caught in setup). The new command uses for f in tests/test-*.sh with an explicit test-helpers.sh exclusion and echoes each filename before execution (DD-004). This required renaming tests/test.sh to tests/test-core.sh (the old name didn’t match the test-*.sh glob) and removing its inline invocations of other test files (each test now runs independently via the glob, eliminating double-execution). The rename cascaded through 12+ test files that contained registration checks verifying their own inclusion in workflow-config.json, ci.yml, and the old test.sh – all updated to check glob discoverability instead. CI’s ci.yml was updated in parallel.
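The discovery rule (match test-*.sh, skip the sourced helper) is easy to state outside of bash. The Python sketch below mirrors that rule for illustration; `discover_tests` is a hypothetical name, and sorting the result is an assumption made for determinism.

```python
from pathlib import Path


def discover_tests(tests_dir: str) -> list[str]:
    """Return test-*.sh file names in sorted order, skipping the sourced helper."""
    names = sorted(p.name for p in Path(tests_dir).glob("test-*.sh"))
    # test-helpers.sh matches the glob but is a sourced harness, not a test.
    return [name for name in names if name != "test-helpers.sh"]
```

Under this rule a file named test.sh is never discovered, which is why the rename to test-core.sh was needed.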

Infrastructure updates: setup now creates the scripts/wf/ subdirectory and installs module files to .correctless/scripts/wf/ with manifest tracking. sync.sh propagates scripts/wf/*.sh to correctless/scripts/wf/. hooks/sensitive-file-guard.sh DEFAULTS include scripts/wf/ to prevent LLM agents from modifying module files directly. Four drift debt items (DRIFT-001, 003, 004, 008) were triaged – two resolved (the underlying concerns are now structurally addressed by other features), two wont-fix (the original proposed fixes are superseded by phase separation and per-round diff review). A drift debt cadence check was added to /cspec Step 0: if 2+ items are open, it emits an advisory before brainstorm.

2026-05-15 — Dashboard Visual Redesign

This feature is a complete visual and UX overhaul of the project dashboard generated by scripts/build-dashboard.sh. The data collection pipeline (bash Steps 0-13) is unchanged – only the HTML/CSS/JS rendering layer (Step 15) was rewritten. Three files changed: the source script, the distribution copy, and the test file (46 new tests for redesign-specific assertions).

The visual identity moves away from the generic GitHub-like palette. Custom fonts (DM Sans for body text, DM Serif Display for headings) are loaded from Google Fonts via a <link> tag with an onerror handler that resets CSS variables --font-body and --font-display to system fonts if the CDN fails. A placeholder SRI hash is present on the font link but is not functional – Google Fonts returns different CSS per user-agent, making static SRI impractical. The onerror fallback is the real safety net, and this tradeoff is documented in QA-001. The accent color shifts from blue (#4361ee / #58a6ff) to warm amber/gold (#c8842d in light mode, #dba14a in dark). Both light and dark modes have distinct, polished color palettes defined through CSS variables – light mode uses warm off-white backgrounds (#faf8f5) while dark mode uses deep purple-tinted darks (#121018).

The layout system changes from flat <h2>-separated sections to a card-based hierarchy. Three card CSS classes (card, section-card, health-verdict) provide different levels of visual containment with box-shadow, border-radius, and border properties. A new .value-narrative section sits near the top of the Metrics view (before Quality Trajectory), prominently displaying the total findings caught pre-merge as a large stat number, escape metrics when available, and a pipeline phase distribution breakdown. This addresses R-002’s goal of making the dashboard’s value immediately obvious to a first-time viewer. The Artifact Browser retains its spec-centric structure with search, status indicators, content tabs, and right panel – the redesign updates its typography and card styling to match the new visual system.

2026-05-14 — Project Dashboard UI

This feature replaces scripts/generate-dashboard.sh with a proper skill (/cdashboard) backed by scripts/build-dashboard.sh. The old script generated a flat HTML dashboard with metrics sections; the new version adds a second view — an Artifact Browser that lets users browse specs, verifications, review findings, research briefs, architecture docs, QA findings, and audit history as rendered markdown directly in the dashboard.

The script collects artifact data by globbing .correctless/ directories (specs, verification, artifacts, findings) and inlines everything as a JSON block inside a <script type="application/json"> tag. The browser-side JavaScript renders markdown using marked.js v14.0.0 with DOMPurify v3.2.4 for sanitization — both loaded from CDN with SRI hashes. This addresses TB-003 (LLM-generated content rendered as HTML), since artifact markdown files contain prose written by LLM agents that could include script tags or event handlers. The </script> injection vector is closed by escaping all </ sequences as <\/ in the inlined JSON before embedding.
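The escaping step can be shown in a few lines of Python. This is a sketch of the technique rather than the bash code in build-dashboard.sh, and `inline_json_for_script_tag` is a hypothetical name. It relies on the fact that `\/` is a legal JSON escape for `/`, so the escaped text parses back to the original data.

```python
import json


def inline_json_for_script_tag(data) -> str:
    """Serialize data for embedding inside <script type="application/json">.

    Every "</" becomes "<\\/" so a literal "</script>" inside artifact text
    cannot terminate the enclosing tag.
    """
    return json.dumps(data).replace("</", "<\\/")


payload = {"body": "text </script><script>alert(1)</script>"}
blob = inline_json_for_script_tag(payload)
```

Parsing `blob` with a JSON parser yields `payload` again, while the raw text no longer contains any `</` sequence.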

The R-007 migration was the most involved part: deleting the old script from source, distribution, and installed locations, then updating all references across ARCHITECTURE.md (ABS-026 consumer list), cmetrics SKILL.md, session-cost tests, sync.sh, FEATURES.md, CLAUDE.md, AGENT_CONTEXT.md, and six test files with hardcoded count assertions. The skill itself is minimal — a 34-line SKILL.md that invokes the bash script and handles the passthrough fallback when artifact reading fails. ABS-032 documents the sole-writer contract. The output directory .correctless/dashboard/ is gitignored since the dashboard is regenerated on demand.

2026-05-09 — UX Review Lens

Four of nine post-merge bugs (PMB-004 path hallucination, PMB-006 fork stalling, PMB-008 lost findings, PMB-009 silent truncation) are fundamentally UX failures – silent breakage, missing recovery paths, lost output – that no existing review lens would have caught. QA checks correctness, Hacker checks security, Performance checks speed, but nothing asks “does this work from the user’s perspective?” This feature adds UX review agents to all four quality review integration points.

The implementation adds a UX agent to /creview-spec (6th adversarial agent, spawned at high+ intensity), /creview (first-ever parallel subagent in the single-pass review), /ctdd mini-audit (5th specialist agent alongside cross-component, hostile-input, resource-bounds, upgrade-compatibility), and /caudit (new UX preset with 5 specialized roles). The UX agent at each integration point evaluates through four sub-lenses: new-user (path discovery, zero-state behavior, first-run errors), upgrade (silent behavioral changes, migration path clarity, config backward compatibility), offboarding (residual state, orphaned artifacts, graceful degradation), and recovery (error messages on failure, resumption paths, state consistency, output persistence). The /caudit UX preset adds a fifth sub-lens – cross-session continuity – that checks for workflow state persistence across sessions, conversation context dependency, and fresh-session artifact path resolution. This fifth sub-lens is scoped to /caudit because cross-session continuity is only meaningfully testable through multi-session audit scenarios.

Each UX agent prompt includes PMB calibration examples (at least 3 of PMB-004, PMB-006, PMB-008, PMB-009) as concrete instances of what BLOCKING UX failures look like per AP-028 (uncalibrated severity gate). The fail-open design (R-008) means UX agent failures never gate progression – consistent with all other review lenses. Output format varies by integration point: UX-xxx IDs in /creview-spec and /creview, MA-xxx with ux-review LENS in /ctdd, confidence-tiered bounty format in /caudit.

2026-05-08 — Pipeline Completeness Verification

PMB-009 exposed a silent truncation bug in /cauto: the pipeline stopped after TDD+simplify (2 of 7 steps at high intensity) when the Skill tool’s forked execution exhausted context capacity. The Skill tool reported “completed” with no error – workflow state showed done instead of documented. The pipeline is resumable on re-invocation, but the silent truncation breaks the “run to completion” assumption.

The fix adds a two-layer verification mechanism. First, /cauto writes a pipeline manifest (.correctless/artifacts/pipeline-manifest-{branch_slug}.json) as its very first action after the phase gate. The manifest records expected_steps (canonical step list based on intensity: standard gets 6 steps, high+ gets 7 including cupdate-arch), completed_steps (updated after each step), and status (in_progress vs completed). On resumption, /cauto reads the existing manifest and reports which steps were missed. Second, /cstatus checks for incomplete manifests and reports them as a dormant check (per PAT-019 – skips silently when no manifest exists, fires only when one is found incomplete). The canonical step enum (ctdd, simplify, cverify, cupdate-arch, cdocs, consolidation, pr) is defined in ABS-031 and verified by structural tests.
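The manifest bookkeeping can be sketched as follows. This is a Python illustration under stated assumptions: the field names and the step list come from the journal, but the helper names, the `high_plus` flag, and treating the listed enum order as execution order are hypothetical.

```python
CANONICAL_STEPS = ["ctdd", "simplify", "cverify", "cupdate-arch", "cdocs", "consolidation", "pr"]


def expected_steps(high_plus: bool) -> list[str]:
    """All 7 steps at high+ intensity; standard drops cupdate-arch (6 steps)."""
    if high_plus:
        return list(CANONICAL_STEPS)
    return [step for step in CANONICAL_STEPS if step != "cupdate-arch"]


def new_manifest(high_plus: bool) -> dict:
    """Manifest as written at pipeline start, before any step has run."""
    return {
        "expected_steps": expected_steps(high_plus),
        "completed_steps": [],
        "status": "in_progress",
    }


def missed_steps(manifest: dict) -> list[str]:
    """Expected steps that have not been recorded as completed."""
    done = set(manifest["completed_steps"])
    return [step for step in manifest["expected_steps"] if step not in done]


def record_step(manifest: dict, step: str) -> None:
    """Mark a step done; flip status once every expected step is recorded."""
    if step not in manifest["completed_steps"]:
        manifest["completed_steps"].append(step)
    if not missed_steps(manifest):
        manifest["status"] = "completed"
```

In the PMB-009 scenario, a manifest with only ctdd and simplify recorded stays in_progress, and `missed_steps` lists the five steps that never ran.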

2026-05-08 — Escape Metrics in Audit Pipeline

Added escape rate tracking to the /caudit convergence pipeline. After each audit round, the pipeline now computes and logs the escape rate – findings from round N+1 that should have been caught in round N. This feeds into /cmetrics as a quality signal: a declining escape rate across rounds means agents are getting better at catching issues on the first pass. The metric is advisory and never gates progression.

2026-05-08 — Autonomous Skill Contract

The /cauto pipeline previously stalled at every human decision point – doc approval, architecture entry triage, refactoring confirmation. Each pause broke the pipeline’s execution model and caused PMB-006-class stalls. This feature solves the stalling problem by adding a formal contract for how skills behave when dispatched autonomously.

The core mechanism is an interaction_mode field in every SKILL.md’s YAML frontmatter. The field has three values: autonomous (5 skills like chelp and cmetrics that already run to completion without input), interactive (2 skills – csetup and cspec – that inherently require Socratic human interaction), and hybrid (22 skills that have decision points but can provide sensible defaults). The field is documentation-only – it is NOT parsed by the Claude Code plugin loader (ENV-007 documents the loader only reads name, description, tools, model). Instead, /cauto reads it via the Read tool when planning dispatch, and structural tests verify every skill has it.
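A structural check of this kind might look like the sketch below. It is a Python illustration, not the project's test: the function names are hypothetical, and the frontmatter scan is deliberately naive (plain key: value lines between the first two --- delimiters) rather than a real YAML parser.

```python
from typing import Optional

VALID_MODES = {"autonomous", "interactive", "hybrid"}


def interaction_mode(skill_md: str) -> Optional[str]:
    """Return the interaction_mode value from leading frontmatter, or None."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body is never scanned
        key, sep, value = line.partition(":")
        if sep and key.strip() == "interaction_mode":
            return value.strip().strip("\"'")
    return None


def skills_missing_mode(skills: dict[str, str]) -> list[str]:
    """Names of skills whose SKILL.md text lacks a valid interaction_mode."""
    return sorted(
        name for name, text in skills.items()
        if interaction_mode(text) not in VALID_MODES
    )
```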

Each autonomous and hybrid skill gained a ## Autonomous Defaults section listing decision points with unique IDs (AD-001, AD-002, etc.) and rationale. The interesting design decision was the escalate: always marker for certain decisions that MUST get human input regardless of mode. For hybrid skills with context: fork (cdevadv, cpostmortem, credteam, cverify), these decisions cannot actually reach the human during execution – the fork prevents follow-up input. The deferred escalation mechanism (R-011) resolves this: the skill applies the default, flags escalation_deferred: true, and returns it in structured output. /cauto collects these and surfaces them at pipeline end as a confirmation gate before PR creation (R-013). This means fork+hybrid skills can participate in the autonomous pipeline without architectural changes to fork semantics.

The JSONL artifact (.correctless/artifacts/autonomous-decisions-{branch_slug}.jsonl) follows the ABS-029/audit-record.sh pattern: a dedicated writer script (scripts/autonomous-decision-writer.sh) with subcommands (append/read/path), SFG protection on both the script and the JSONL file, and /cauto as the sole invoker. Skills return decisions in a structured AUTONOMOUS_DECISIONS_START/AUTONOMOUS_DECISIONS_END block; /cauto parses and persists them. The AD-UNLISTED fallback (R-014) handles decision points not listed in a skill’s defaults – they use the first option and get flagged as deferred escalations, making incomplete defaults sections visible rather than silently wrong. The fail-open design (R-005) means that if mode: autonomous is absent from the prompt, skills run interactively – a stall is annoying but safe, while silently applying defaults when the user expects to be asked is worse.

2026-04-25 — Statusline Live Cost

The session cost analysis feature (compute-session-cost.sh) takes ~2 seconds to run – far too slow for the statusline’s 50ms budget. This feature bridges the gap with a background-refresh cache: the statusline reads a lightweight JSON cache file synchronously (<5ms), and when the cache is stale (>30 seconds), spawns compute-session-cost.sh in the background to regenerate it.

The background refresh mechanism uses three defenses against concurrency bugs. First, a lock file (.correctless/artifacts/cost-cache.lock) containing the PID of the background process prevents double spawns – if a second render fires while a computation is running, it sees the lock, checks kill -0, and skips the refresh. Second, a trap EXIT in the background subshell ensures the lock file is cleaned up even if the process dies abnormally. Third, atomic writes via mktemp + mv prevent the statusline from reading a half-written cache file. The lock file is written by the statusline before disown but after & – the $! PID is only available after the background spawn, creating a minimal TOCTOU gap that QA-001 acknowledged as inherent to bash semantics.
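The liveness check and the atomic write translate directly to other languages. The Python sketch below is an analogue of the bash mechanism, not the statusline code: all function names are hypothetical, `os.kill(pid, 0)` stands in for kill -0, and `tempfile.mkstemp` plus `os.replace` stands in for mktemp + mv. It assumes a POSIX system; `os.kill` behaves differently on Windows.

```python
import json
import os
import tempfile

STALE_AFTER_SECONDS = 30  # cache older than this triggers a background refresh


def write_cache_atomically(path: str, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def refresh_in_progress(lock_path: str) -> bool:
    """True when the lock file names a PID that is still alive."""
    try:
        with open(lock_path) as handle:
            pid = int(handle.read().strip())
        if pid <= 0:
            return False  # never signal a process group by accident
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False  # no lock, unreadable lock, or stale PID
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def should_spawn_refresh(cache_age_seconds: float, lock_path: str) -> bool:
    """Refresh only when the cache is stale and no live refresh holds the lock."""
    return cache_age_seconds > STALE_AFTER_SECONDS and not refresh_in_progress(lock_path)
```

Readers of the cache either see the old file or the complete new one, never a partial write, because the rename is the only operation that touches the target path.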

Two helpers were extracted to support the display: phase_display_name() converts raw workflow phases (tdd-impl -> GREEN) and was factored out of the existing phase display logic (which used inline case branches), and fmt_cost_nonzero() uses awk to format a decimal only when non-zero. The cost display format – $47.23 ($12.50 in GREEN) – appends to the existing Section 4 content after QA rounds and duration. When cost is zero or the cache doesn’t exist, the cost portion is omitted entirely, keeping the statusline clean during early workflow phases before any cost accrues.
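The display rule can be summarized in a short Python sketch. It is illustrative only: `format_cost` is a hypothetical name, the only phase mapping taken from the journal is tdd-impl -> GREEN, and both the pass-through for unknown phases and the omission of the parenthetical when the phase cost is zero are assumptions.

```python
PHASE_LABELS = {"tdd-impl": "GREEN"}  # the only mapping the journal states


def phase_display_name(raw_phase: str) -> str:
    """Display label for a raw phase; unknown phases pass through (assumption)."""
    return PHASE_LABELS.get(raw_phase, raw_phase)


def format_cost(total_usd: float, phase_usd: float = 0.0, raw_phase: str = "") -> str:
    """Render '$47.23 ($12.50 in GREEN)', or '' when there is no cost to show."""
    if total_usd <= 0:
        return ""  # omit the cost portion entirely
    text = f"${total_usd:.2f}"
    if phase_usd > 0 and raw_phase:
        text += f" (${phase_usd:.2f} in {phase_display_name(raw_phase)})"
    return text
```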

The compute-session-cost.sh extensions (--cache and --phase flags) follow a clean separation: --cache changes the output format (lightweight JSON to stdout instead of full artifact to file), and --phase computes current_phase_cost_usd by filtering the by_phase array. The caller (statusline background subshell) handles file placement, maintaining the script’s single-responsibility as a computation engine per ABS-026 (cost artifact contract).

2026-04-22 — Skill Path Discovery

PMB-004 surfaced a class of bug where skills reference workflow artifacts by concept (“Read the spec artifact”) without specifying how to discover the file path. This works on the Correctless repo itself because conversation context from a preceding /cspec run carries the path forward. On other projects in fresh sessions, the agent hallucinates paths – /creview-spec tried three wrong locations before giving up. The fix is straightforward: each skill now calls workflow-advance.sh status and reads the Spec: line, matching the pattern already used by /creview and /ctdd.

Four skills were fixed: /creview-spec (step 2 replaced entirely), /cverify (removed the vague “from workflow state or .correctless/specs/” fallback), /cpostmortem (added workflow state lookup with a .correctless/specs/ fallback for post-merge postmortems), and /csummary (added workflow-advance.sh status call to replace state file reading). The changes are text-only – prompt edits, not code. The distribution copies were synced via sync.sh.

The structural guard in test-architecture-drift.sh is the more interesting contribution. It maintains two explicit lists – MUST_HAVE_DISCOVERY (8 skills that must have at least one path discovery token) and EXCLUDED_FROM_DISCOVERY (20 skills that don’t need single-spec discovery). Every skill directory that isn’t _shared must appear in exactly one list, or the test fails. This is the same list-based classification pattern used by REG-001 (test registration guard) – a new skill being added to skills/ will fail the drift test until the author decides whether it needs path discovery. The skill_body() helper was extracted to test-helpers.sh (shared harness) since both test-skill-path-discovery.sh and test-architecture-drift.sh need it to strip YAML frontmatter before checking skill content.
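The exactly-one-list rule is compact enough to sketch. The Python below illustrates the classification check only; `classification_errors` and its message strings are hypothetical, and the real guard is a bash test.

```python
def classification_errors(skill_dirs, must_have, excluded):
    """Return messages for skills not in exactly one list ('_shared' is exempt)."""
    errors = []
    for name in sorted(skill_dirs):
        if name == "_shared":
            continue
        hits = (name in must_have) + (name in excluded)
        if hits == 0:
            errors.append(f"{name}: unclassified")
        elif hits == 2:
            errors.append(f"{name}: in both lists")
    return errors
```

A newly added skill directory produces an "unclassified" error until its author places it in one of the two lists, which is the forcing function the guard is meant to provide.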

AP-025 was added to antipatterns.md documenting the bug class. The Correctless Learnings in CLAUDE.md got a PMB-004 entry. No new architecture patterns were introduced – this feature applies existing conventions (PAT-001 source-to-dist sync, structural guard classification) to a new context.

2026-04-19 — Test Harness Extraction

The 14 newest test files all had the same ~30-line boilerplate block: pass(), fail(), section(), skip(), counter variables, color definitions, preamble (set -uo pipefail, cd to repo root), and summary(). The duplication was a natural consequence of each test file being authored by a fresh TDD agent that couldn’t know about helpers that didn’t exist yet. Once the pattern stabilized across enough files, extraction became purely mechanical.

The interesting part was the variant classification. Not all 14 files duplicated the same subset. Variant A files (8 files like test-carchitect.sh and test-session-cost.sh) had the full boilerplate — complete pass/fail/section/skip functions, counter init, colors, and preamble. Variant B files (test-dev-journal.sh, test-qa-uncertain.sh) had minimal one-liner pass/fail and counters but no section/skip/colors. Variant C files (test-sensitive-file-guard.sh, test-auto-policy.sh, test-allowed-tools-check.sh) never defined pass/fail at all — they used file-specific assert helpers (assert_eq, file_contains) that directly incremented PASS/FAIL. These files only needed the harness for the preamble and counter initialization.

The variable normalization in test-architecture-drift.sh was a minor surprise — it used FAILED_INVS (invariants) instead of the standard FAILED_IDS, an artifact of being written before the naming convention settled. The harness uses FAILED_IDS, so the migration required updating references in both the fail() calls and the summary function.

The registration guard updates (QA-001) were the most consequential QA finding. test-ci-hook-wiring.sh and test-architecture-drift.sh both enumerate test-*.sh files and expect every match to be registered in CI and workflow-config.json. But test-helpers.sh is a sourced helper, not a standalone test — running it directly would just define functions and exit. Both guards now explicitly skip it. This is the same naming-convention tension noted in the QA class fix: test-helpers.sh matches test-*.sh but is semantically different from the test files it serves. A naming convention like helpers-test.sh would avoid this, but changing it now would break the 14 source lines that already reference the path.

2026-04-20 — Session Cost Analysis

The token-tracking PostToolUse hook has been writing zeros for total_cost_usd and token counts since it was introduced. The fields don’t exist in Claude Code’s PostToolUse contract (tracked as #11008). Every /cmetrics dashboard, every calibration entry, every “cost by phase” section has been showing zeros or deriving cost from nothing. This feature replaces the phantom data with real USD cost computed from Claude Code’s session transcripts.

The key insight is that Claude Code already records everything needed — session transcripts in ~/.claude/projects/ contain per-turn model, token counts (input, output, cache write, cache read), and branch context. The challenge is deduplication: streaming produces ~3.14x inflation with multiple JSONL entries per API call sharing the same message.id. Taking the last entry per unique ID (the final streaming response with complete token counts) eliminates the inflation cleanly.
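Last-entry-wins deduplication is a one-pass dictionary update. The Python sketch below shows the idea; the flat `message_id` and `output_tokens` keys are simplifying assumptions about the entry shape, not the actual transcript schema.

```python
def dedupe_by_message_id(entries: list[dict]) -> list[dict]:
    """Keep the last entry seen for each message id, in first-seen id order."""
    latest: dict[str, dict] = {}
    for entry in entries:
        # Later streaming chunks overwrite earlier ones for the same id.
        latest[entry["message_id"]] = entry
    return list(latest.values())


entries = [
    {"message_id": "m1", "output_tokens": 10},
    {"message_id": "m1", "output_tokens": 250},
    {"message_id": "m2", "output_tokens": 40},
    {"message_id": "m1", "output_tokens": 300},
]
deduped = dedupe_by_message_id(entries)
```

Summing tokens over `entries` would count m1 three times; summing over `deduped` counts only its final, complete record.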

Phase attribution was the most interesting design decision. The script reads the audit trail for phase transitions and assigns each transcript turn to the phase active at its timestamp. Subagents spawned during GREEN that complete during QA are attributed to QA (completion-time, not spawn-time) — spawn-time attribution would require correlating parent tool_use IDs with subagent transcript IDs, adding complexity for marginal accuracy gain. The script always undercounts by the invoking /cdocs session’s cost since it runs before the session ends.
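Timestamp-based attribution amounts to finding the latest transition at or before each turn. A minimal Python sketch, assuming the transitions are available as (start_time, phase) pairs sorted by time; `phase_at` and the "unknown" default are hypothetical.

```python
from bisect import bisect_right


def phase_at(
    transitions: list[tuple[float, str]],
    timestamp: float,
    default: str = "unknown",
) -> str:
    """Phase active at `timestamp`, given (start_time, phase) pairs sorted by time."""
    starts = [start for start, _ in transitions]
    index = bisect_right(starts, timestamp) - 1
    return transitions[index][1] if index >= 0 else default
```

With transitions at t=100 (GREEN) and t=200 (QA), a subagent spawned at t=150 whose turns are recorded at t=250 is attributed to QA, because only the recorded timestamp is looked up.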

The adversarial review (F-02) caught a significant design flaw: the original spec included a cross-project fallback scan that would search all ~/.claude/projects/ directories for matching cwd patterns. This creates information leakage between projects. The fix was clean — two discovery paths only: candidate derivation from repo root, plus a config override for non-standard layouts.

The pricing validation ($500/M ceiling) catches a likely confusion between per-token and per-million-token values. The 6-decimal precision invariant (total_cost_usd == sum(by_phase) == sum(by_subagent)) ensures the two orthogonal breakdowns account for 100% of cost without floating-point drift.
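Both checks can be sketched with decimal arithmetic. This Python version is an illustration, not the script's method: the function names are hypothetical, and using `Decimal` quantized to six places is a swapped-in technique chosen here to make the equality exact.

```python
from decimal import Decimal

MAX_USD_PER_MILLION = Decimal("500")
SIX_PLACES = Decimal("0.000001")


def validate_rate(usd_per_million: Decimal) -> None:
    """Reject rates above the $500/M ceiling described in the journal."""
    if usd_per_million > MAX_USD_PER_MILLION:
        raise ValueError(f"rate {usd_per_million} exceeds ${MAX_USD_PER_MILLION}/M ceiling")


def q6(value) -> Decimal:
    """Quantize a number (or numeric string) to 6 decimal places."""
    return Decimal(str(value)).quantize(SIX_PLACES)


def breakdowns_consistent(total, by_phase: dict, by_subagent: dict) -> bool:
    """total == sum(by_phase) == sum(by_subagent) at 6-decimal precision."""
    phase_sum = sum((q6(v) for v in by_phase.values()), Decimal("0"))
    agent_sum = sum((q6(v) for v in by_subagent.values()), Decimal("0"))
    return q6(total) == phase_sum == agent_sum
```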

2026-04-20 — Dashboard Trend Insights

The dashboard started as a data dump — it showed what happened but not whether things are improving. This feature adds four trend sections that answer “is Correctless working?” by transforming raw counts into trajectory views.

The QA Rounds Trend reuses the same horizontal bar visual as Quality Trajectory but maps QA rounds: N from workflow-history.md entries. A declining bar length over time means the workflow is learning — fewer QA rounds needed per feature. The data was already parsed in Step 3; the new section just renders it differently.

Intensity Accuracy reads calibration entries (already parsed in Step 7) and compares recommended_intensity against actual_intensity using an ordinal map. The three buckets (agreed/raised/lowered) surface whether the system’s intensity recommendations match human judgment. With 11 calibration entries, the data is starting to be meaningful — alpha was raised from standard to high, everything else agreed or was lowered.
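The ordinal comparison can be sketched as below. This is a Python illustration with hypothetical function names; the map contains only the two tiers the journal names (standard and high), and a real map would list every tier the project defines.

```python
from collections import Counter

# Only the two tiers named in the journal; the full tier list is not given there.
ORDINAL = {"standard": 0, "high": 1}


def classify(recommended: str, actual: str) -> str:
    """'agreed', 'raised', or 'lowered' relative to the recommendation."""
    delta = ORDINAL[actual] - ORDINAL[recommended]
    if delta == 0:
        return "agreed"
    return "raised" if delta > 0 else "lowered"


def accuracy_buckets(entries: list[dict]) -> Counter:
    """Count calibration entries per bucket."""
    return Counter(
        classify(e["recommended_intensity"], e["actual_intensity"]) for e in entries
    )
```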

Override Rate shows per-feature override counts as bars with a one-line mean summary. The data comes from workflow-history.md’s Overrides: N field, which is only present when >0 (per the /cdocs convention). Features with 0 overrides show empty bars. The mean is a simple arithmetic average — useful as a monitoring signal for gate misclassification (AP-023).

Fix Rate reads findings with status fields and computes fixed/total with a percentage bar. The dual degradation (no findings at all vs. findings without status fields) matches the spec’s R-006 requirement. The “Fix status data not available” message covers older qa-findings files that predate the status field.

Section ordering (R-005) was the most constraint-heavy rule — 7 ordering assertions verify the full narrative flow: Project Summary, Quality Trajectory, QA Rounds Trend, Pipeline Phase Distribution, Fix Rate, Antipattern Health, Intensity Accuracy, Override Rate, Cost by Phase, Drift Debt, Dev Journal. The test extracts line numbers via grep -n and compares numerically.

2026-04-19 — Project Dashboard

The dashboard is the first feature that reads across nearly every artifact Correctless produces — workflow history, QA findings, antipatterns, calibration entries, drift debt, token logs, dev journal, overrides, and project config. Building the parser surface in pure bash (awk for markdown, jq for JSON, grep for pattern matching) was the natural choice given the project’s zero-external-dependency stance, but it makes the implementation brittle to format changes. The spec explicitly accepted this risk: if the format changes, the parser breaks visibly (empty sections), not silently (wrong data).

The most interesting section is antipattern dormancy detection. It cross-references AP-xxx IDs against the last 5 qa-findings files to determine whether an antipattern is still firing. Antipatterns with Status: Structurally enforced are marked resolved. This closes the loop on the antipattern lifecycle — you can now see which antipatterns were caught early and stopped recurring because the workflow learned from them.
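The cross-reference can be sketched in Python. This illustrates the idea and is not the bash/awk implementation: `antipattern_states` is a hypothetical name, the "active" and "dormant" labels are illustrative, and the ID pattern assumes the AP-NNN form used throughout the journal.

```python
import re

AP_ID = re.compile(r"\bAP-\d+\b")


def antipattern_states(statuses, findings_texts, window=5):
    """Map each antipattern ID to 'resolved', 'active', or 'dormant'.

    statuses: {ap_id: status string}
    findings_texts: qa-findings documents ordered oldest to newest
    """
    # IDs mentioned anywhere in the most recent `window` findings files.
    firing = {ap for text in findings_texts[-window:] for ap in AP_ID.findall(text)}
    states = {}
    for ap_id, status in statuses.items():
        if status == "Structurally enforced":
            states[ap_id] = "resolved"
        elif ap_id in firing:
            states[ap_id] = "active"
        else:
            states[ap_id] = "dormant"
    return states
```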

The HTML generation uses a small vanilla JS DOM builder (h(tag, attrs, ...children)) instead of template literals or string concatenation. This keeps the inline script readable and avoids the escaping nightmares that come with embedding data-derived content in HTML strings. The data is injected as a JSON blob in a <script type="application/json"> tag, parsed once, and rendered entirely client-side.

Dark/light mode is handled via prefers-color-scheme and CSS custom properties. Horizontal bars are drawn with inline <div> widths. There are no charting libraries, no CDN links, and no fetch calls. The file opens correctly via file:// protocol, which matters because this is a local development tool, not a hosted dashboard.

2026-04-18 — Agent Hook for Internal Import Enforcement

This feature introduces the first agent hook in Correctless, establishing a new hook type alongside the existing bash script hooks. The key insight is that some enforcement checks require LLM reasoning — reading ARCHITECTURE.md, parsing YAML entrypoints, matching glob patterns against import paths — which cannot be done deterministically in a bash script. Claude Code’s agent hook type (type: "agent") solves this by spawning a lightweight sub-agent (Haiku by default) that can read files and reason about the result.

The implementation is a single JSON file at hooks/import-guard.json containing the hook configuration and an embedded prompt. The prompt decomposes the check into six sequential steps: (1) is this a test file? (2) do entrypoints exist in ARCHITECTURE.md? (3) read the test_helpers allow-list, (4) parse entrypoints YAML, (5) check imports against scope globs, (6) decide allow/deny. Each step has a clear early-exit path, and the prompt includes language-aware import patterns for Go, TypeScript/JavaScript, Python, and Rust with explicit allow for unsupported languages.

The most interesting design decision was making the deny reason unconditionally include escalation guidance (“ask the user for guidance”) rather than tracking retry counts. Agent hooks are stateless — they have no persistent state between invocations. The original spec (R-012) described retry counting, but review correctly identified that this is impossible in the agent hook model. The unconditional guidance means every deny is self-documenting without requiring the hook to track anything.

The setup script was extended with a second hook discovery loop for hooks/*.json files. This loop reads hook_type, type, matcher, prompt, and timeout from the JSON file and constructs the settings.json entry differently from command hooks: {type: "agent", prompt: ..., timeout: ...} instead of {type: "command", command: ..., timeout_ms: ...}. The idempotency logic checks for existing agent hooks and updates matcher/prompt/timeout on re-run without duplicating entries. sync.sh was updated with JSON-specific propagation and bidirectional staleness detection (both source-has-but-dist-missing and dist-has-but-source-missing cases).

The workflow.test_helpers config field is the escape hatch for false positives. Test helper packages (e.g., pkg/handlers/testutil/) that live within an entrypoint’s scope but are legitimately imported in tests can be allow-listed via glob patterns. This was the primary risk mitigation from the spec — agent hooks return {ok: false} with no override mechanism, so false positives are hard walls. The allow-list plus the deny reason’s explicit guidance on how to add to it makes the hook self-correcting.
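
A rough deterministic model of the allow-list decision is sketched below. The real check is performed by the agent following the prompt, not by code; the function name, the wording of the reason string, and the use of `fnmatch`-style globs are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def check_import(import_path: str, scope_globs: list, test_helper_globs: list) -> dict:
    """Decide allow/deny for one import found in a test file."""
    # The allow-list wins first: helpers inside a scope are legitimate.
    if any(fnmatchcase(import_path, glob) for glob in test_helper_globs):
        return {"ok": True}
    for glob in scope_globs:
        if fnmatchcase(import_path, glob):
            # Every deny carries the escalation guidance unconditionally.
            return {
                "ok": False,
                "reason": (
                    f"{import_path} is inside entrypoint scope {glob}. "
                    "If this is a test helper, add a glob to workflow.test_helpers; "
                    "otherwise ask the user for guidance."
                ),
            }
    return {"ok": True}
```

With `scope_globs=["pkg/handlers/*"]`, an import of `pkg/handlers/testutil/fake` is denied until `pkg/handlers/testutil/*` is added to the allow-list, after which it is allowed.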

2026-04-19 — Upgrade Compatibility Lens

PMB-003 exposed a gap in the review pipeline: setup had a hardcoded 2-file script list that silently went stale across 5 PRs, leaving 16 of 18 scripts uninstalled on user projects. No pipeline phase ever asked “what happens to an existing user who upgrades?” This feature closes that gap by adding the upgrade compatibility question to both /creview-spec (spec-level) and /ctdd (implementation-level).

The implementation is entirely prompt-level. In /creview-spec (skills/creview-spec/SKILL.md), a 5th adversarial agent – the Upgrade Compatibility Auditor – was added to the high+ intensity agent roster. It receives the same self-assessment input as the other four agents but examines the spec through a 5-item upgrade checklist: (1) does the spec account for installation of new scripts/hooks, and is the mechanism complete (glob vs hardcoded list)? (2) do new config keys have defaults? (3) do schema changes address backward compatibility? (4) do removals include migration paths? (5) do features depending on new artifacts degrade gracefully? At standard intensity (which only spawns 3 agents), the upgrade agent is not spawned – upgrade issues are primarily implementation-level bugs, making the mini-audit’s code-level check more reliable at catching them.

In /ctdd (skills/ctdd/SKILL.md), a 4th mini-audit specialist was added alongside cross-component, hostile-input, and resource-bounds. Unlike the review agent, this one runs at all intensity levels because it examines the actual git diff, not the spec. The same 5-item checklist is used but reframed for implementation: “does the install/setup mechanism install all new files?” instead of “does the spec account for installation?” Both prompts reference AP-024 and PMB-003 as concrete examples of the bug class, giving the agent historical context for what upgrade failures look like in this project.

The count updates were the most mechanical part but the most error-prone. In creview-spec: “4 adversarial agents” became “5 adversarial agents” in 6 locations (progress announcement, agent spawning text, intensity tier description, task list, checkpoint phases, agent_role enum). In ctdd: “3 specialist agents” became “4 specialist agents” in the progress announcement, and upgrade-compatibility was added to the LENS enum, agent_role enum, and token tracking. The standard-intensity count in creview-spec stayed at 3 – the upgrade agent is intentionally gated behind high+ there. The full redundancy design (both review and mini-audit ask the same questions) means the upgrade compatibility check fires twice per feature at high intensity: once during spec review (catching design omissions) and once during mini-audit (catching implementation omissions).

2026-04-18 — /carchitect Phase 1: Entrypoint-Aware TDD

Phase 1 closes the gap between /carchitect’s machine-referenceable entrypoints (Phase 0, ABS-023) and /ctdd’s test-writing behavior. Before this feature, the RED phase test agent would read ARCHITECTURE.md but had no specific instruction to use the entrypoints section when writing integration tests. The test audit had no check for tests that bypass entrypoints by importing internal packages directly. PR #70 added Entry/Through/Exit contracts (ABS-024) that tell the test audit what shape a test should have — but nothing told the RED phase agent to write tests through entrypoints from the start.

The implementation is entirely prompt-level changes to skills/ctdd/SKILL.md. Three new paragraphs were added to the RED phase test agent’s blockquoted instructions: one instructing the agent to read entrypoints and match rule scope to entrypoint scope globs before writing integration tests (R-001), one instructing it to read Key Patterns/Layer Conventions/Trust Boundaries and respect layer access constraints (R-002), and one providing a graceful fallback with a No documented entrypoint comment marker when no entrypoints section exists (R-004). The Read context list was updated to emphasize “especially the Entrypoints section and Key Patterns” (R-003).

In the test audit section, a new check 10 (Internal import bypass detection) was added after the existing check 9 (Entry contract verification). Check 10 reads entrypoints from ARCHITECTURE.md, builds a map of scope globs to entrypoint names, and checks each [integration] test file for import statements referencing paths within an entrypoint’s scope. The check is language-aware with patterns for Go (import "pkg/..."), TypeScript/JavaScript (import ... from / require()), Python (from pkg import / import pkg), and Rust (use crate:: / mod). Unsupported languages get an ADVISORY skip. The check explicitly excludes self-imports of the entrypoint itself (R-007) and skips entirely when no entrypoints are documented (R-008). When check 10 and check 9 both fire on the same test, they consolidate into a single finding (R-005).
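
The shape of check 10 can be approximated in code. The sketch below is a simplified Python model covering only two of the listed languages; the regexes, status strings, and function name are assumptions, and the self-import exclusion (R-007) and the consolidation with check 9 (R-005) are left out for brevity.

```python
import os
import re
from fnmatch import fnmatchcase

# Simplified import patterns for two of the supported languages.
IMPORT_PATTERNS = {
    ".go": re.compile(r'^\s*import\s+"([^"]+)"', re.MULTILINE),
    ".py": re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.MULTILINE),
}


def find_import_bypasses(filename: str, source: str, entrypoint_scopes: dict):
    """Return (status, findings) for one [integration] test file.

    entrypoint_scopes maps entrypoint name -> scope glob.
    """
    if not entrypoint_scopes:
        return "skipped", []        # no documented entrypoints (R-008)
    pattern = IMPORT_PATTERNS.get(os.path.splitext(filename)[1])
    if pattern is None:
        return "advisory-skip", []  # unsupported language
    findings = []
    for imported in pattern.findall(source):
        # Python module paths use dots; compare them as slash-separated paths.
        path = imported.replace(".", "/") if filename.endswith(".py") else imported
        for name, scope in entrypoint_scopes.items():
            if fnmatchcase(path, scope):
                findings.append((name, imported))
    return "checked", findings
```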

The test file (tests/test-carchitect-phase1.sh) is structural — it verifies prompt text presence in the SKILL.md file via grep patterns rather than testing LLM behavior. This is the same approach used for the integration-test-contracts tests (check 9). 33 assertions across 9 rules cover the mechanical envelope: required phrases, check descriptions, severity levels, consolidation logic, language patterns, and documentation updates. The ABS-023 consumer description was updated from “transitive consumer” to “direct consumer” to reflect that /ctdd now reads entrypoints directly in both the RED phase and test audit, not just through Entry/Through/Exit contracts.

2026-04-26 — Harness Fingerprint + Model Upgrade Detection

This feature exists to close a class of silent regression that the 4.6 → 4.7 audit (OPUS_4_7_MIGRATION.md) made painfully visible: when Anthropic ships a new model or tweaks harness defaults inside an existing model version, the workflow regresses silently. Three findings surfaced in one audit session, none caught by tests, none surfaced by metrics. The “uncontracted model defaults” antipattern (logged in MEMORY.md as a class) is the underlying issue — Correctless’s correctness model implicitly depends on a single Anthropic version’s behavioral defaults (length caps, parallel-tool-call preferences, anti-defensive code priors, in-context skill inlining), and there was no mechanism to notice when those changed.

The implementation is two bundled mechanisms. First, a deterministic fingerprint in scripts/harness-fingerprint.sh: it computes the literal string "{model_name}|{HARNESS_VERSION}" (no hashing, so it is debuggable by reading the file directly) where HARNESS_VERSION is a manually-bumped integer constant at the top of the script. The maintainer increments this when a behavioral change is observed (heuristic OQ-006: >20% delta in any metric across consecutive same-model runs, or a manually-noticed shift like a 4.7-style audit pattern). The script runs at every /cspec Step -1 via a structural marker and emits a one-time version_bumped advisory per session, gated by a flag file at .correctless/artifacts/harness-notified-{session-id}.flag. Session-id derivation lives in lib.sh as get_current_session_id() (cross-platform: ps -o lstart= → /proc/{pid}/stat → PID-only fallback): a single source of truth, with no per-skill drift permitted. A new locked_update_file() helper in lib.sh mirrors locked_update_state for arbitrary file paths (BND-002 / ME-4 round-2).
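
A minimal Python sketch of the fingerprint and the once-per-session advisory follows. The production script is bash; the constant's value, the function names, the treatment of a missing stored fingerprint, and the advisory message format are assumptions for illustration.

```python
from pathlib import Path

HARNESS_VERSION = 3  # hypothetical value; the real constant is bumped by hand


def fingerprint(model_name: str, harness_version: int = HARNESS_VERSION) -> str:
    """Literal, unhashed key: readable directly from the stored file."""
    return f"{model_name}|{harness_version}"


def check_fingerprint(model_name: str, stored, artifacts_dir: str, session_id: str):
    """Return a version_bumped advisory at most once per session, else None."""
    current = fingerprint(model_name)
    if stored is None or stored == current:
        return None  # nothing to compare against, or nothing changed
    flag = Path(artifacts_dir) / f"harness-notified-{session_id}.flag"
    if flag.exists():
        return None  # this session was already notified
    flag.touch()     # gate any further advisories in this session
    return f"version_bumped: {stored} -> {current}"
```

Because the flag file name includes the session id, a second session still sees the advisory once, while repeated runs inside one session stay quiet.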

Second, a /cmodelupgrade skill (the project’s 29th skill — Analysis category) is the sole writer of .correctless/meta/model-baselines.json. Given the current {model}+{HARNESS_VERSION} key, it reads four data sources per feature — intensity-calibration.json for qa_rounds + total_tokens, cost-*.json glob (via ABS-026 — never a hardcoded slug list, mitigates AP-024 and PMB-003), workflow-state-*.json for phase_count, and the baseline file itself. Aggregation uses an explicit three-tier bootstrap: exact-match pool (entries tagged with current harness_version) → pre-fingerprint pool (entries from before /cverify recorded the field, used with explicit “pre-fingerprint baseline” label) → no-baseline mode (clear message, exit 0 — never compares against zero, mitigates DA-004 self-referential metrics). The skill spawns no subagents (ME-12) — all aggregation, comparison, and report rendering happens inline in the orchestrator’s context.
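
The three-tier bootstrap reduces to a small selection function. The sketch below is a Python illustration, not the skill's actual logic (which runs inline in the orchestrator); the entry shape and the tier labels returned are assumptions.

```python
def select_baseline_pool(entries: list, harness_version: int):
    """Pick the comparison pool using the three-tier bootstrap order."""
    # Tier 1: entries tagged with the current harness_version.
    exact = [e for e in entries if e.get("harness_version") == harness_version]
    if exact:
        return "exact-match", exact
    # Tier 2: entries recorded before the field existed at all.
    legacy = [e for e in entries if "harness_version" not in e]
    if legacy:
        return "pre-fingerprint baseline", legacy
    # Tier 3: nothing usable; the caller reports this and never compares to zero.
    return "no-baseline", []
```

Entries tagged with a different harness version fall into neither pool, so a version bump with no legacy data lands in no-baseline mode rather than comparing against stale numbers.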

Several design decisions are worth surfacing for future modifiers. The v1 spec proposed an LLM probe to introspect distinctive harness substrings; round-2 of /creview-spec rejected it (CR-2) as compounding-uncertain — undefined channel between agent and script, stability uncertainty, a new trust boundary, susceptibility to negation-spoofing. A literal version constant is testable, has no trust boundary, and aligns with how harness changes actually get noticed in practice (a human observes “things feel wrong” within one session). Hashing was dropped (HI-1 round-2): neither the model name nor the version is secret, and the literal key is debuggable. Sole-writer enforcement is structural via hooks/sensitive-file-guard.sh (PRH-002 — directly mitigates AP-022 dead-code-in-security-paths) — the hook blocks Edit/Write AND Bash redirects (>, >>, tee) for the fingerprint file, the baseline file, AND scripts/harness-fingerprint.sh itself. PRH-006 lifecycle scoping addressed CR-1 round-2: the script’s protection activates after the first commit lands so the implementation agent can create it during /ctdd GREEN, then the protection is permanent.

The verification surfaced one acceptable drift item (DRIFT-001): the live harness-fingerprint.json lacks schema_version because it was first written before MA-UC-001’s schema_version fix landed. The writer is now correct for all future writes (and any rewrite triggered by version bump or corruption recovery — both verified by test_ma_uc_001_schema_version); the live file will self-heal on next rewrite per BND-004’s fail-open posture. INV-007/INV-009/INV-014 are intentionally structural-only because their entry path is the Skill tool, which isn’t bash-testable end-to-end (per QA-002 — accepted limitation). All other 21 invariants and 11 prohibitions/boundaries have mechanical tests that would fail on regression. 110 tests pass in tests/test-harness-fingerprint.sh; cross-suite coverage in test-architecture-drift.sh, test-sensitive-file-guard.sh, test-allowed-tools-check.sh, test-scripts-namespace-migration.sh, and test-skill-path-discovery.sh covers the integration points.

2026-04-28 — Harness-Fingerprint R2 Hardening

This feature exists because the R2 audit of the harness-fingerprint R1 fix batch had a 71% defect rate. The R1 round patched specific instances (“this command leaks”, “this redirect bypasses”); each subsequent R2 specialist round found a route around the previous patch. The lesson, recorded as PMB-002 and the autonomous-fix-defect-rate feedback note in MEMORY.md, is that audit-fix rounds are themselves untested code — and when the underlying extractor is enumeration-based, every fix is a one-instance patch that the next round routes around. The remedy is to close the bug class structurally: the extractor must be incapable of “missing” a write command because it never enumerates them in the first place.

Three architectural pieces shipped together. The first is canonicalize_path in scripts/lib.sh, a pure-bash segment-stack walker. It is total over arbitrary byte sequences (INV-001), idempotent (INV-003), produces no //, no . segments, no .. segments on absolute paths, no trailing / (INV-002), recognizes ASCII . only as a path-segment dot (INV-002a: Unicode lookalikes U+2024, U+FF0E, U+2026 pass through as ordinary bytes), performs no shell expansion of glob characters (INV-004), and runs in <50ms on 1024-byte input (INV-012). The function lives in lib.sh because both hooks/sensitive-file-guard.sh and hooks/workflow-gate.sh consume it via ABS-001, with no per-hook reimplementation. INV-001a closes a subtle fail-open class: empty stdout on non-empty input would let the matcher receive an empty target and skip pattern comparison, so the function’s contract explicitly forbids that. Property-based tests (tests/test-canonicalize-path.sh) use a pinned seed (RANDOM=42), 1000 inputs, and a corpus where each of the dangerous bytes (*, ?, [, ], /, ., space, \t, \n, $, backtick, (, {) appears in at least 50 inputs; failures hex-dump via xxd for replay.
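
For readers who want the segment-stack idea in executable form, here is a Python sketch with a pinned-seed property check. It is not a port of the bash function: the handling of relative paths (leading `..` kept, empty result rendered as `.`) and the `check_properties` helper are assumptions chosen to satisfy the invariants listed above.

```python
import random


def canonicalize_path(path: str) -> str:
    """Normalize a path with a segment stack; no filesystem or glob access."""
    if path == "":
        return ""
    absolute = path.startswith("/")
    stack = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue                      # drops '//' runs and '.' segments
        if segment == "..":
            if stack and stack[-1] != "..":
                stack.pop()               # step back over a real segment
            elif not absolute:
                stack.append("..")        # relative paths keep leading '..'
            continue                      # on absolute paths '..' at root is dropped
        stack.append(segment)             # glob characters stay literal
    body = "/".join(stack)
    if absolute:
        return "/" + body
    return body or "."                    # never empty for non-empty input


def check_properties(seed: int = 42, count: int = 1000) -> bool:
    """Replayable property check over inputs built from dangerous bytes."""
    rng = random.Random(seed)
    alphabet = "*?[]/. \t\n$`({ab"
    for _ in range(count):
        raw = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 24)))
        out = canonicalize_path(raw)
        assert out != "", repr(raw)
        assert canonicalize_path(out) == out, repr(raw)
        assert "//" not in out, repr(raw)
        assert out == "/" or not out.endswith("/"), repr(raw)
        if raw.startswith("/"):
            assert ".." not in out.split("/") and "." not in out.split("/"), repr(raw)
    return True
```

Each assertion message carries `repr(raw)`, which plays the role the xxd hex dump plays in the bash suite: a failing input can be replayed exactly.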

The second piece is the hooks/sensitive-file-guard.sh refactor. The old _extract_bash_targets did per-command dispatch — a chain of case branches (cp), mv), tee), dd), etc.) trying to enumerate “which Bash commands write.” Every R2 round found a missing branch. The new extractor has no per-command dispatch: the default branch over-extracts every non-flag token as a candidate (INV-006), and _check_file_against_patterns filters via canonical-form match. Redirect operators are detected first — >, >>, 1>, 2>, &> in both whitespace-separated (cmd > file) and inline-attached (cmd>file) forms (INV-007). Process substitution sub-tokenizes a single level (INV-007a). _has_write_pattern was extended to flag interpreter+eval-flag chains (bash -c ..., perl -e ..., python -c ..., /usr/bin/env perl ...) — INV-013, with a regression test in tests/test-workflow-gate.sh confirming workflow-gate.sh consumes the shared function via ABS-001 with no local redefinition (INV-013a). Both target and protected pattern flow through canonicalize_path before reaching the matcher (INV-005, INV-008, PRH-004). At hook source-time, a v1 sentinel probe verifies canonicalize_path is present and behaves correctly — if missing or wrong, the hook exits 2 fail-closed before any policy runs (INV-005a). This closes the partial-upgrade class where lib.sh and the guard could end up out-of-sync mid-deploy. PRH-002 makes the no-per-command-dispatch rule structural: 28 disallowed tokens are enumerated in INV-006a’s structural test as a permanent ban list.
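
The over-extraction idea can be modeled in a few lines. The Python sketch below is an assumption-laden illustration, not the hook's tokenizer: it ignores quoting and process substitution, and the regex and function name are invented for the example.

```python
import re

# A redirect operator, optionally prefixed by fd 1 or 2 when the digit
# starts a word (so the "2" in "file2>out" stays part of the filename).
REDIRECT = re.compile(r"(?:(?<!\S)[12])?(?:&>|>>?)")


def extract_candidates(command: str) -> list:
    """Return every non-flag token as a possible write target."""
    # Pad operators with spaces so 'cmd>file' and 'cmd > file' tokenize alike.
    spaced = REDIRECT.sub(lambda m: f" {m.group(0)} ", command)
    candidates = []
    for token in spaced.split():
        if REDIRECT.fullmatch(token):
            continue  # the operator itself is not a target
        if token.startswith("-"):
            continue  # flags are skipped
        candidates.append(token)  # no per-command dispatch: keep everything else
    return candidates
```

The command name itself ends up in the candidate list. That is harmless under this design, because the later canonical-form match against protected patterns is what filters candidates, not the extractor.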

The third piece is the --version flag and VERSION_OVERRIDE env-var removal from scripts/harness-fingerprint.sh. AUTH-R2-001 surfaced a confused-deputy class: the testability flag was the autonomous-bump escape hatch, so anything that could pass --version N could also forge a fingerprint. The hardening: strip both surfaces from production (PRH-003 / INV-009); HARNESS_VERSION=N becomes the sole production input. Tests now inject specific versions via a feature-specific helper at tests/harness-fingerprint-test-helpers.sh (make_test_harness_script <version> <workdir>) that copies the production script to $workdir/harness-fp-test-XXXXXX.sh (mktemp), substitutes the constant via POSIX sed, validates the substitution, and co-locates a copy of lib.sh so SCRIPT_DIR/lib.sh resolves to the under-test source. Critically, the destination filename does NOT match the protected pattern in DEFAULTS (BND-003: */scripts/harness-fingerprint.sh is the protected glob, the helper writes to $workdir/harness-fp-test-...sh). Per the Finding #8 amendment from /creview-spec, the helper lives in a feature-specific file rather than the shared tests/test-helpers.sh, which keeps the test surface for one feature out of the global helper namespace.

The migration shipped as two commits per INV-011. The first removes --version from production (intentionally leaving the test suite red); the second migrates the tests to the helper (restoring green). The split makes the production-security decision and the test-infrastructure decision independently revertable — if the helper approach turns out wrong later, the tests can be reverted without re-adding --version to production (which would re-open AUTH-R2-001). Loud failure during the migration window (red tests with explicit fingerprint-mismatch messages) is the deliberate signal, not a silent degradation.

Supporting wiring includes a new path-scoped rule file .claude/rules/canonicalize-path.md (PAT-017), the second dogfood usage of ABS-009 after PAT-001’s hooks-pretooluse rule. Frontmatter declares paths: [scripts/lib.sh] so the body loads into editing context whenever an agent opens lib.sh; tests/test-architecture-drift.sh enforces the rule-file shape (existence, frontmatter, See-link from ARCHITECTURE.md, in-file pointer comment). The setup script now greps for VERSION_OVERRIDE in the existing scripts/harness-fingerprint.sh before installation; if found (pre-R2 install), it force-reinstalls with a clear notice referencing INV-009/PRH-003 (INV-014). This closes the upgrade-path break Finding #7 surfaced — without setup-side detection, an existing user’s pipeline could end up with the post-R2 hook (which probes the new lib.sh) and a pre-R2 script (which still carries VERSION_OVERRIDE) and silently misbehave. PAT-016 was promoted from AP-024 (PMB-003 — frequency 16 missing scripts across 5 features) per /cspec Step 8 approval as a side benefit of this work; the glob-over-directory rule with mandatory count-match drift test now lives full-body in ARCHITECTURE.md.

Verification ran 533 tests across the 7 directly-affected suites with 0 failures. Every spec rule has at least one targeted test; every property-based test has pinned seeds and explicit failure-replay. The verification report flagged one acceptable smell (.correctless/scripts/antipattern-scan.sh exits 1 when stdout is redirected — pre-existing, not a regression from R2) and no drift. The work merged via squash-commit 081f842 as PR #86; this dev-journal entry was written post-merge against main because the branch was already merged when /cdocs ran.

2026-04-30 — Audit Findings Persistence Contract

Corrective action for PMB-005. The original failure was simple in shape: /caudit’s persistence step was described in skill prose (“write per-round findings to audit-{preset}-{date}-round-{N}.json”), nothing enforced it, and on 2026-04-26 a hacker R1 audit transitioned audit-done cleanly with no round-JSON written. The findings existed only as commit-message prose on the squash-deleted audit branch. /cmetrics then derived “days since last Olympics” from history.md mtime — last touched 2026-04-04 — and reported the audit as 16 days stale when it had run the day before. Same shape as silent-telemetry-failure (token tracking 2026-04-14) and AP-022 (dead-code-in-security-paths 2026-04-26). The advisory step looked completed in the orchestrator’s mental model but was never structurally verified.

The feature lands three coupled mechanisms in a single PR. First, the cmd_audit_done precondition gate at hooks/workflow-advance.sh:788. Reads .audit.type and .started_at from the workflow state file, validates .audit.type against ^[a-z][a-z0-9-]{0,31}$ BEFORE glob expansion (so a corrupted state with .audit.type=* or ../etc cannot escape the findings dir — MA-003 instance fix), then iterates audit-{preset}-*-round-*.json and accepts the first file whose started_at field equals state’s started_at byte-for-byte. Content-based string equality, not mtime — robust to ENV-003 (filesystem mtime unreliable after git checkout/git clone/git rebase, exactly the operations a developer might run mid-audit). The gate honors the existing override sentinel for emergencies (INV-008) and writes an audit-specific log entry (gate: "audit-done", bypass_target: "cmd_audit_done") so /cmetrics’s separate audit-done override counter can flag routine bypasses on this gate without conflating them with generic audit-phase overrides. The remediation message names all three load-bearing facts — the literal string Audit findings missing, the expected glob with the actual preset substituted, and the started_at ISO timestamp from state (INV-001a) — so the user can fix the gap from the message alone without reading the source.
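
The ordering that matters here (validate the preset, then glob, then compare content) can be shown in a short Python sketch. The real gate is bash inside hooks/workflow-advance.sh; the function name, the error text, and the choice to skip unreadable files are assumptions.

```python
import glob
import json
import os
import re

PRESET_RE = re.compile(r"^[a-z][a-z0-9-]{0,31}$")


def find_round_file(findings_dir: str, preset: str, started_at: str):
    """Return the first round-JSON whose started_at equals the state's value."""
    # Validate BEFORE building the glob, so '*' or '../etc' never reaches it.
    if not PRESET_RE.fullmatch(preset):
        raise ValueError(f"invalid audit type: {preset!r}")
    pattern = os.path.join(glob.escape(findings_dir), f"audit-{preset}-*-round-*.json")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                data = json.load(handle)
        except (OSError, ValueError):
            continue  # unreadable or malformed files cannot satisfy the gate
        # Exact string equality on content; file mtime is never consulted.
        if isinstance(data, dict) and data.get("started_at") == started_at:
            return path
    return None
```

A `None` result corresponds to the "Audit findings missing" remediation path; a file left over from an earlier audit of the same preset does not pass, because its `started_at` differs.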

Second, scripts/audit-record.sh, the sole writer (ABS-029, INV-006). It follows the PAT-003 phase-transition CLI conventions: it lives in scripts/, sources lib.sh, and accepts CLI positionals. It exits 0 on success and non-zero on failure, writing error messages to stderr; on success, stdout is a single line containing the absolute path of the written file, so path=$(audit-record.sh write-round ...) is consumable. Two subcommands: write-round <preset> <round> <findings-file>|- and append-history <preset> <summary-file>|-. The script’s _state_file helper reads branch_slug from lib.sh and locates .correctless/artifacts/workflow-state-{slug}.json, deliberately with NO ls -t mtime fallback (MA-001: picking the most recently modified state file across branches would let the writer attribute one branch’s audit to another branch’s started_at, exactly the cross-branch contamination the gate’s content match exists to prevent). Path construction is isolated from external state per PRH-003: only CLI positional args and the hardcoded .correctless/artifacts/findings/ base directory contribute. Reading state’s started_at for the JSON content is permitted because that’s content, not path. The TTY-stdin guard ([ ! -t 0 ] check on - stdin form) emits a clear error rather than blocking forever in interactive testing. append-history uses >> append-only with flock -w 5 to serialize concurrent writers; on lock timeout it emits a warning to stderr and exits 0 (history append failure does NOT block round-JSON write or gate transition, per PRH-004’s non-blocking advisory contract). A trap on EXIT/INT/TERM/HUP cleans up the tmp file used during the atomic write phase (QA-R4-005 fix).
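
The write-round behavior (path built only from arguments plus a fixed base, atomic write, temp-file cleanup) can be sketched in Python. This is not the bash script: the function signature, the `date` parameter, the date validation, and the `base_dir` override used for testing are assumptions.

```python
import json
import os
import re
import tempfile

FINDINGS_DIR = ".correctless/artifacts/findings"
PRESET_RE = re.compile(r"^[a-z][a-z0-9-]{0,31}$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def write_round(preset, round_no, date, started_at, findings, base_dir=FINDINGS_DIR):
    """Atomically write one round-JSON and return its absolute path."""
    if not PRESET_RE.fullmatch(preset) or not DATE_RE.fullmatch(date):
        raise ValueError("preset or date has an unexpected shape")
    os.makedirs(base_dir, exist_ok=True)
    # The path uses only the arguments and the base directory.
    name = f"audit-{preset}-{date}-round-{int(round_no)}.json"
    final = os.path.abspath(os.path.join(base_dir, name))
    fd, tmp = tempfile.mkstemp(dir=base_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            # started_at is content copied from state, not part of the path.
            json.dump({"started_at": started_at, "findings": findings}, handle)
        os.replace(tmp, final)  # atomic rename within one directory
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)      # cleanup if the write failed midway
    return final
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes the final `os.replace` atomic.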

Third, /cmetrics’s multi-signal staleness consumer (INV-005, PRH-005). Replaces the original PMB-005 single-mtime read with max(history.md mtime, latest round-JSON mtime) per preset, with an explicit “no data” label when both signals are absent — never silently zero or “infinite” without the label. The consumer side is intentionally mtime-based and fail-open: ENV-003 says mtime is unreliable post-git-op, but /cmetrics is advisory and a slightly-wrong staleness number does not corrupt workflow state. The gate is the authoritative content-based check; the consumer is the advisory mtime-based reading. Layer separation is intentional and documented in INV-005’s “acknowledged residual risk” note. The same /cmetrics change adds a separate audit-done override counter alongside the generic override counter — routine audit-done overrides are the AP-023 recurrence pattern for this gate specifically and warrant their own counter in the Override Health section.
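
The multi-signal reading amounts to taking the newer of two timestamps and refusing to print a number when neither exists. A Python sketch, with the function name and the seconds-since-epoch inputs as assumptions:

```python
def staleness_days(history_mtime, round_mtime, now):
    """Days since the newest available signal, or 'no data' if none exist.

    Each mtime is seconds since the epoch, or None when the file is absent.
    """
    signals = [m for m in (history_mtime, round_mtime) if m is not None]
    if not signals:
        return "no data"  # explicit label, never a silent 0 or infinity
    newest = max(signals)
    return int((now - newest) // 86400)
```

In the PMB-005 scenario the history file is weeks old but a round-JSON was written a day ago, so the reading follows the round-JSON instead of reporting the audit as stale.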

Sensitive-file-guard protection of the writer script itself follows the 2026-04-26 sole-writer convention from CLAUDE.md (harness-fingerprint precedent). DEFAULTS in hooks/sensitive-file-guard.sh gain scripts/audit-record.sh and .correctless/scripts/audit-record.sh plus the bare basename — the test suite verifies blocks against Edit/Write/MultiEdit AND Bash redirects (>, >>, tee, cat | tee, 2>, &>) targeting both source and install-mirror paths (INV-009, four test functions). This is the AP-022 mitigation pattern applied identically: structural enforcement that the writer script cannot be silently replaced by an autonomous agent, which would make the contract unenforceable without anyone noticing.

The structural-test landscape includes some honest weakness markers. INV-006 (“audit-record.sh is the sole writer”) and PRH-005 (“/cmetrics never derives staleness from a single signal”) are both grep-based on skill prose and acknowledged AP-003-class: they catch the obvious case but produce false negatives on rephrased text and false positives on reads. The spec accepts the limit and pairs each with a load-bearing complementary test: PRH-001’s command-name grep (audit-record.sh write-round) is robust to rephrasing and catches the writer-fanout class for INV-006; the behavioral fixture test test_inv005_max_picks_newer_signal (creates an audit-qa-history.md 30 days old, a round-JSON today, asserts the staleness reading uses today) is the load-bearing complement for PRH-005. The pattern of pairing weak-but-cheap structural with strong behavioral is the same shape PAT-017’s tests use.

The rule cluster has 22 entries — 10 invariants, 5 prohibitions, 2 boundary conditions, 4 environment assumptions, 1 architectural addition. ABS-029 sits between ABS-028 and ## Patterns per the spec’s exact placement directive. AP-026 (advisory-prose artifact-write contract) was added to antipatterns.md with the 2026-04-26 incident as its frequency-1 case study; the “How to catch it” prescribes the four-step pattern this feature dogfooded — declare ABS, gate-enforce at phase transition, structural test, multi-signal consumer. PMB-005 is the postmortem entry. Override count of 3 reflects mid-feature stale .claude/hooks/workflow-gate.sh requiring manual resync from source (a known-class instance of the install-drift problem MA-009 / MA-016 flagged); the source hooks/workflow-gate.sh already had tdd-audit in its allowlist, so the syncs were mechanical resyncs, not policy changes. Class fix for the install-drift class itself (hash-pin / version-pin of installed hooks) is deferred to a follow-up — out of scope for a feature focused on findings persistence.

2026-05-06 — carchitect Phase 3: Architecture Adherence Auditor

Phase 3 of the /carchitect roadmap closes the loop between architecture documentation and auditing. Phase 0 reverse-engineered the codebase into a structured ARCHITECTURE.md. Phase 1 made the TDD agent read entrypoints from that document. Phase 2 made the spec agent architecture-aware. Phase 3 makes the auditor architecture-aware — /caudit now spawns an Architecture Adherence Checker agent in every preset that mechanically verifies the codebase against the documented PAT-xxx, ABS-xxx, and TB-xxx entries.

The implementation is entirely prompt-level — a new agent prompt template in skills/caudit/SKILL.md with a corresponding row added to each of the three preset agent tables (QA, Hacker, Performance). Each preset gives the agent a different hostile lens: QA gets “Every documented pattern is violated somewhere,” Hacker gets “Every trust boundary has an unguarded crossing,” and Performance gets “Every layer convention hides a performance shortcut.” The agent’s four check types map directly to the /carchitect roadmap’s four planned capabilities: pattern compliance (layer convention adherence), abstraction invariant checking (dependency direction violations), trust boundary enforcement (anti-pattern presence), and undocumented pattern detection (architecture drift). The last type is informational — it surfaces conventions appearing in 3+ files without a PAT-xxx entry, candidates for /cupdate-arch to formalize.

Three edge-case behaviors are handled by prompt instructions. The dormant-signal fallback (R-004) instructs the agent to emit zero findings and a “skipped” message when ARCHITECTURE.md is missing, has placeholder markers, or has no PAT/ABS/TB entries — it never infers architecture, that is /carchitect’s job. The staleness warning (R-005) uses git log -1 --format='%ai' to compare ARCHITECTURE.md’s last commit date against the most recent source commit; a 30-day gap triggers a single SUSPICIOUS-tier advisory. The exception handling (R-003) instructs the agent to recognize TB-xxx sub-entries (the TB-NNNx pattern where NNN matches the parent and x is a lowercase letter) as documented scoped exceptions, avoiding false positive submissions for intentional deviations like TB-001a or TB-004c.
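
Two of these behaviors are mechanical enough to express as code. The Python sketch below is illustrative only, since the agent applies them from prompt text: the three-digit reading of NNN, the inclusive 30-day comparison, and both function names are assumptions. The date format matches what `git log -1 --format='%ai'` prints.

```python
import re
from datetime import datetime

# TB-NNNx: a parent id followed by exactly one lowercase letter.
SUB_ENTRY = re.compile(r"(TB-\d{3})[a-z]")


def is_scoped_exception(entry_id: str, parent_id: str) -> bool:
    """True when entry_id is a documented sub-entry of parent_id."""
    match = SUB_ENTRY.fullmatch(entry_id)
    return bool(match) and match.group(1) == parent_id


def is_stale(doc_commit: str, source_commit: str, threshold_days: int = 30) -> bool:
    """Compare two '%ai' dates, e.g. '2026-05-06 10:00:00 +0000'."""
    fmt = "%Y-%m-%d %H:%M:%S %z"
    gap = datetime.strptime(source_commit, fmt) - datetime.strptime(doc_commit, fmt)
    return gap.days >= threshold_days
```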

The architecture_ref field (R-006) is the feature’s contribution to the findings data model. Each finding from this agent carries the specific PAT-xxx, ABS-xxx, or TB-xxx identifier that was violated (or null for undocumented-pattern findings). This field is additive — existing findings without it are valid. The triage agent uses it for deduplication (same entry violated in the same file = same finding regardless of description text). The Regression Hunter (R-010) was updated to read architecture_ref from prior round-JSON files for recurring architecture violation detection, with graceful absence handling for prior runs that predate the field.
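
The deduplication rule can be sketched as a key on (architecture_ref, file). This Python version is a guess at the shape, not the triage agent's actual procedure; the `file` field name and the pass-through of findings without a reference are assumptions.

```python
def dedupe_findings(findings: list) -> list:
    """Collapse findings that violate the same entry in the same file."""
    seen = set()
    unique = []
    for finding in findings:
        ref = finding.get("architecture_ref")
        if ref is None:
            # Older findings and undocumented-pattern findings have no ref;
            # they remain valid and are passed through untouched.
            unique.append(finding)
            continue
        key = (ref, finding.get("file"))
        if key in seen:
            continue  # same entry + same file: duplicate regardless of wording
        seen.add(key)
        unique.append(finding)
    return unique
```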

The testing approach is keyword-presence (AP-003 class) because all rules are prompt-level instructions. The 48 tests in test-carchitect-phase3.sh verify that the instruction text is present in the SKILL.md prompt for each rule — including the hostile lens framing per preset, the dormant fallback text, the staleness warning mechanism, the TB sub-entry exception handling, the architecture_ref field in the JSON schema example, the four check type descriptions, and the read-only tool access constraint. The source-to-dist sync (PAT-001) is verified by the existing SYNC-001 assertion. This is the standard testing limitation for prompt-level skill modifications — the tests verify instruction presence, not that the LLM follows those instructions at runtime.

2026-05-22 — Review-Driven Mini-Audit Lenses

This feature closes a knowledge gap between Correctless’s review and TDD phases. Prior to this change, the mini-audit spawned six fixed adversarial lenses on every feature regardless of its risk profile — a payments feature got the same “resource bounds” lens as a documentation change. Meanwhile, /creview-spec and /creview deeply analyzed each feature’s specific risks, but that analysis evaporated between phases. The bridge is a structured artifact: review agents write lens recommendations, the mini-audit consumes them, and outcomes are tracked for auditability.

The implementation touches five skill files (skills/ctdd/SKILL.md, skills/creview-spec/SKILL.md, skills/creview/SKILL.md, skills/cmetrics/SKILL.md, skills/cwtf/SKILL.md) and one workflow module (scripts/wf/transitions.sh). The core mechanism is prompt-level: review skills are instructed to write a lens-recommendations-{branch_slug}.json artifact after synthesis, and /ctdd’s mini-audit section is extended to read that artifact, select up to 2 recommended lenses within an 8-agent budget, and instantiate them via a custom lens agent template. The template uses the UNTRUSTED_RECOMMENDATION fence pattern (same as fix-diff-reviewer’s UNTRUSTED_DIFF fence) to wrap the review-generated focus areas and severity guidance — ensuring the custom lens agent treats them as directional guidance, not authoritative instructions. This is the TB-003 / TB-005 mitigation pattern applied to a new trust boundary crossing: LLM-generated review findings flowing into mini-audit agent context.

The design has three key constraints. First, recommended lenses are additive — they never displace the 6 default lenses, especially the two core lenses (hostile-input, cross-component) that catch universal bug classes (PRH-001). Second, review agents write structured recommendations (name, focus areas, severity guidance), not full agent system prompts — the mini-audit owns prompt construction (PRH-002). This prevents review agents from bypassing the mini-audit’s severity calibration, output format contract, and fail-open behavior. Third, the recommendation artifact never gates any pipeline phase transition (PRH-003). This is essential because standard-intensity workflows may not run /creview-spec at all — gating on recommendations would break those workflows.

The artifact schema (ABS-036) follows established patterns: branch-scoped by filename (PAT-004), dormant degradation when absent (PAT-019), gitignored under .correctless/artifacts/. The cmd_done gate in scripts/wf/transitions.sh emits a non-blocking warning when the artifact exists but has no outcomes field — a warning, not a gate, consistent with PRH-003. The LENS field in qa-findings JSON is now an open enum, accepting both the 6 fixed lens values and any recommended lens name. This avoids the cascading test updates that would follow each time a review recommends a novel lens concept. The priority heuristic for selecting which 2 of N recommended lenses to run (CRITICAL/HIGH findings first, then source agent diversity) ensures the most important lenses are chosen when the budget is exceeded. Unselected lenses are logged with ran: false and failure_reason: "budget exceeded" in outcomes for full auditability.
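
One way to read the priority heuristic is as a greedy pick: severity tier first, then a preference for source agents not yet represented. The Python sketch below follows that reading; the `severity` and `source_agent` field names and the greedy formulation are assumptions, since the real selection is prompt-level.

```python
def select_lenses(recommended: list, budget: int = 2):
    """Return (selected, outcomes) for the recommended lenses."""
    remaining = list(recommended)
    selected = []
    used_agents = set()
    while remaining and len(selected) < budget:
        best = min(
            remaining,
            key=lambda r: (
                r["severity"] not in ("CRITICAL", "HIGH"),  # high tier first
                r["source_agent"] in used_agents,           # then agent diversity
            ),
        )  # min() keeps the earliest candidate on ties
        remaining.remove(best)
        selected.append(best)
        used_agents.add(best["source_agent"])
    outcomes = [{"name": r["name"], "ran": True} for r in selected]
    # Everything left over is recorded, not silently dropped.
    outcomes += [
        {"name": r["name"], "ran": False, "failure_reason": "budget exceeded"}
        for r in remaining
    ]
    return selected, outcomes
```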

Testing follows the standard keyword-presence approach (AP-003 class) for prompt-level rules, with 80 assertions across 19 spec rules in tests/test-review-driven-lenses.sh. The test file also verifies structural properties: the allowed-tools update in /creview (INV-011), the LENS enum extension in qa-findings schema (INV-012), the non-blocking warning in scripts/wf/transitions.sh (INV-006), and the ABS-036 entry in ARCHITECTURE.md. Two existing test files were updated to accommodate the wording change from “spawns 6 specialist agents” to “spawns the 6 default specialist agents” in the mini-audit progress announcement.

2026-05-23 — Review Intelligence Consumer

This feature completes the second consumer integration for the cross-feature intelligence brief (ABS-037). The parent feature (cross-feature-intelligence) built the aggregation script and wired /cspec as the first consumer; this feature extends /creview-spec and /creview to also consume the brief during their Historical Pattern Integration/Findings sections. The core design decision is the separation between reading and writing: review skills read the brief file directly via jq (no script invocation), preserving the invariant that only /cspec triggers regeneration and occurrence count increments. Without this separation, a single feature pipeline (/cspec -> /creview-spec -> /creview) would trigger three regeneration cycles, crossing the 3-occurrence threshold within one pipeline run and defeating the feedback loop dampener entirely.

The implementation touches both review skill SKILL.md files with identical Intelligence Brief Integration sections. Each section includes an anti-anchoring directive adapted for the review context (distinct from /cspec’s brainstorm examples), a jq command with client-side occurrences >= 3 filtering, and dormant degradation per PAT-019. The Bash(*cross-feature-intel*) allowed-tools pattern was added to both skills’ frontmatter to enable the jq read. The critical structural constraint is INV-003/PRH-001: the 6 adversarial agents in /creview-spec and the single-pass agent in /creview must never see brief data. Only the orchestrator reads the brief during synthesis, preserving the unanchored adversarial analysis that is the review’s primary value. Tests verify this by grepping agent definition files for cross-feature-intel references.

The script (scripts/cross-feature-intel.sh) gains occurrence tracking machinery. On each regeneration: existing entries get their occurrences field incremented by 1, new entries start at 1, and entries that leave the brief (filtered out by scope) have their count preserved in a _dormant_counts metadata section for future re-appearance. Pre-occurrence-tracking entries (without an occurrences field) are treated as 0, so the first run seeds them at 1 — a conservative default that means the dampener works correctly from day one. The _dormant_counts section is capped at 100 entries with alphabetical eviction (an approximation — age tracking was considered but deferred for v1). The atomic write uses an echo+tmp+mv pattern rather than locked_update_file() from lib.sh because the script writes complete JSON from scratch rather than applying a jq filter to existing content — a distinction acknowledged in QA-001.
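
The increment, seed, and dormant rules can be sketched as follows. This is an illustration only: it uses flat tab-separated files instead of the real JSON brief, the function name update_occurrences_sketch is hypothetical, and it assumes bash 4+ for associative arrays.

```bash
# Hypothetical sketch of the occurrence-tracking rules.
# $1: previous counts, one "key<TAB>count" per line (assumed flat format)
# $2: keys present in this regeneration, one per line
# stdout: "key<TAB>count" for live entries, "key<TAB>count<TAB>dormant" for
#         entries that left the brief (count preserved, not incremented)
update_occurrences_sketch() {
  local prev_file=$1 current_file=$2 key count
  declare -A prev=() seen=()
  while IFS=$'\t' read -r key count; do
    [[ -n $key ]] && prev[$key]=$count
  done < "$prev_file"
  while IFS= read -r key; do
    [[ -n $key ]] || continue
    seen[$key]=1
    # Entries with no prior count are treated as 0, so they seed at 1.
    printf '%s\t%d\n' "$key" $(( ${prev[$key]:-0} + 1 ))
  done < "$current_file"
  for key in "${!prev[@]}"; do
    [[ -n ${seen[$key]:-} ]] || printf '%s\t%d\tdormant\n' "$key" "${prev[$key]}"
  done
}
```

Writing the result to a temporary file and renaming it over the target (the echo+tmp+mv pattern mentioned above) would keep readers from seeing a half-written file.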

The --min-occurrences N flag provides script-side filtering for stdout output only: entries below the threshold are excluded from stdout but their occurrence counts are always tracked in the on-disk file. This flag exists for potential future consumers that want filtered output without implementing their own jq filter, but the current review skill integration uses client-side jq filtering directly (INV-002). ABS-037 was updated from “idempotent” to “stateful” and its consumer list now includes both review skills with a note explaining they are pure consumers, not regeneration triggers. TB-003’s mitigation variant text was updated to list /creview-spec and /creview as anti-anchoring directive consumers alongside /cspec.

Two smaller additions round out the feature. /cstatus gains threshold proximity reporting — when the brief exists, it reports how many entries are at each occurrence count below the threshold (e.g., “5 entries at 2/3 occurrences, 3 entries at 1/3”), providing diagnostic visibility for why intelligence is not surfacing in reviews. Review findings artifacts gain an Intelligence brief: metadata line recording consumption status (“consumed” vs “dormant”), providing a persistent record distinguishing “intelligence was unavailable” from “intelligence found nothing relevant.” The 58 tests in tests/test-review-intel-consumer.sh cover all 16 spec rules with keyword-presence and behavioral tests, including boundary conditions for first-ever generation (BND-002), all-below-threshold (BND-001), and entry leave/re-enter via _dormant_counts (BND-003) with corruption handling.

2026-05-24 — Documentation and Artifact Pruning Skill

This feature adds /cprune, the first maintenance-oriented skill in the project. After 71 features and 57 days of development, documentation artifacts accumulate without any removal mechanism – ARCHITECTURE.md has 37 ABS entries, antipatterns.md has 31 AP entries, and there are 373 artifact files in .correctless/artifacts/. When referenced files are deleted (via refactoring, feature removal, or branch cleanup), the entries that reference them become context-token waste and anchor agents on outdated information. /cprune addresses this with a scanner-plus-orchestrator architecture that detects staleness candidates mechanically and handles disposition through two modes.

The core mechanism is scripts/prune-scan.sh (777 lines), a standalone bash scanner that accepts --category and --base flags and outputs a JSON array of staleness candidates. The scanner covers 9 categories: architecture entries (ABS/PAT/TB/ENV with all-dead file references), antipatterns (AP-xxx with all-dead test/script references), CLAUDE.md learnings (feature-specific entries with all-dead references), orphaned artifacts (files for branches that no longer exist), stale deferred findings (open findings whose source review artifact was deleted), AGENT_CONTEXT.md count drift (stated vs actual counts), cross-reference consistency (stale Enforced-at paths), completed specs (merged 30+ days ago), and drift debt (resolved/wont-fix entries older than 90 days). The scanner sources scripts/lib.sh (ABS-001) for branch_slug() and shared utilities, and it uses a deterministic extraction approach – backtick paths, Enforced at fields, Test fields, See-links, and path patterns in Violated when fields. The key design decision for architecture entry detection is the “all-dead” criterion: an entry is only a staleness candidate when ALL extracted file paths are dead. Entries with at least one live reference are never candidates (PRH-003). Class-level antipatterns and conventions/postmortems in CLAUDE.md are excluded from staleness detection regardless of file reference status – the class transcends the instance.
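
A minimal sketch of the “all-dead” criterion, assuming the file references have already been extracted and are repo-relative. The function name all_refs_dead_sketch is hypothetical, and treating an entry with no extracted references as a non-candidate is an assumption of this sketch, not something the scanner is known to do.

```bash
# Hypothetical sketch: an entry is a staleness candidate only when ALL of its
# extracted file references are dead.
# $1: repository root; remaining args: repo-relative paths extracted from the entry
all_refs_dead_sketch() {
  local base=$1 ref
  shift
  # Assumption: no extracted references means "cannot judge", so not a candidate.
  (( $# > 0 )) || return 1
  for ref in "$@"; do
    if [[ -e "$base/$ref" ]]; then
      return 1   # one live reference is enough to keep the entry
    fi
  done
  return 0
}
```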

The skill definition at skills/cprune/SKILL.md orchestrates the scanner with two execution modes. Autonomous mode (invoked by /cauto via mode: autonomous in the prompt) auto-executes only low-risk actions: orphaned artifact cleanup, AGENT_CONTEXT.md count corrections, resolved drift-debt removal (90+ days), and spec archiving (90+ days post-merge). It skips categories where >50% of entries are flagged (BND-002 safety valve – this typically indicates a major refactor, not staleness) and entirely excludes CLAUDE.md (PRH-002 – too high-risk for autonomous editing). Interactive mode presents all candidates in a formatted report with per-category disposition options (execute all, review individually, skip). The archive-not-delete design (DD-001/INV-004) ensures documentation entries are moved to dedicated archive files rather than deleted: .correctless/ARCHITECTURE_DEPRECATED.md for architecture entries, .correctless/antipatterns-archived.md for antipatterns, .correctless/CLAUDE_LEARNINGS_ARCHIVED.md for CLAUDE.md learnings. Archived entries retain their original IDs, and the archive write must complete before the source removal – crash-safe ordering that prevents entry loss.
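
The >50% safety valve is plain integer arithmetic. A sketch, with a hypothetical function name and the assumption that the caller already knows the flagged and total entry counts for a category:

```bash
# Hypothetical sketch of the BND-002 safety valve.
# Returns 0 (skip the category) when strictly more than half of its entries
# are flagged; exactly 50% does not trip the valve.
category_exceeds_valve_sketch() {
  local flagged=$1 total=$2
  (( total > 0 && flagged * 2 > total ))
}
```

Comparing flagged * 2 against total avoids integer-division rounding at the boundary.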

The /cauto integration uses intensity-aware placement (DD-005/INV-012). At high+ intensity, /cprune runs after /cupdate-arch – architecture docs are being updated anyway, so pruning alongside ensures they are both accurate and lean. At standard intensity, /cupdate-arch is skipped entirely, so /cprune runs after /cverify instead. In both cases, /cprune is an internal orchestration action excluded from the ABS-031 canonical step name enum (same pattern as the Step 7.5 backlog sweep). The /cstatus integration (INV-013) runs a lightweight threshold check via the scanner and surfaces a “pruning recommended” signal when orphaned artifacts exceed 10 or stale architecture entries exceed 3. This check is dormant (PAT-019) when scripts/prune-scan.sh is not installed, ensuring no errors on projects that have not adopted this skill.

Security enforcement follows established conventions. The scanner script and all three archive files are protected by hooks/sensitive-file-guard.sh (INV-016), preventing LLM agents from modifying staleness detection logic or injecting entries into archive files. ABS-038 declares the archive file contract with /cprune as the sole writer. The /cauto consolidation step (Step 8.1) staging allowlist includes all three archive files (INV-017), ensuring archive changes during the pipeline are committed. /cprune is explicitly read-only for deferred-findings.json (PRH-004) – it reports stale deferred findings but delegates status updates to /ctriage, avoiding a 5th writer on the ABS-033 multi-writer contract. The 116 tests in tests/test-cprune.sh cover all 19 invariants, 4 prohibitions, 4 boundary conditions, the ABS-038 architecture entry, determinism, and edge cases including empty archives, bulk warnings, no-remote fallback, and real ARCHITECTURE.md entry fixtures (per AP-031).

2026-06-03 — Disallowed-Tools Frontmatter

This feature adds disallowed-tools frontmatter to 12 skills that should never edit source files, applying PAT-018 (structural enforcement over prompt-level instruction) as a defense-in-depth layer alongside the existing allowed-tools whitelist. Claude Code v2.1.150 introduced disallowed-tools in skill YAML frontmatter, which structurally removes listed tools from the model while the skill is active – a blocklist complementing the allowed-tools allowlist.

The 12 skills are split into two groups based on their write requirements. Group A (chelp, cstatus, cdashboard) produces no file output via Write, so all five write-capable tools are disallowed: Edit, Write, MultiEdit, NotebookEdit, CreateFile. Group B (cexplain, cwtf, cmetrics, csummary, cpr-review, cmaintain, cmodel, cmodelupgrade, ctriage) writes artifacts via the Write tool (e.g., .correctless/artifacts/wtf-*), so only four tools are disallowed – Write is retained. The remaining 20 skills are exempt because they legitimately use Edit/Write for source file modifications (e.g., /ctdd writes tests, /creview edits specs).

The implementation touches 24 SKILL.md files (12 source + 12 distribution copies) with a single frontmatter line each. The test file (tests/test-disallowed-tools.sh, 339 lines, 117 assertions) covers 7 spec rules. R-005 is the most interesting structurally: it extracts tool basenames by stripping sub-pattern scoping (e.g., Write(.correctless/artifacts/wtf-*) yields Write) and checks that the disallowed set is disjoint from the allowed set. A sub-rule enforces that Group B skills specifically do not disallow Write. R-007 implements a full partition test – every skill in the skills/ directory must be classified as Group A, Group B, or Exempt. This structural drift test ensures that any new skill added to the project triggers a test failure until the developer classifies it, preventing silent omission of write-protection on read-only skills.
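
The R-005 check can be approximated as below. This is a sketch, not the code in tests/test-disallowed-tools.sh: both function names are hypothetical, and it assumes the frontmatter values have already been read into comma-separated strings.

```bash
# Hypothetical sketch: strip sub-pattern scoping, e.g.
# "Write(.correctless/artifacts/wtf-*)" yields "Write".
tool_basename_sketch() {
  printf '%s\n' "${1%%(*}"
}

# Returns 0 when no disallowed tool basename also appears in the allowed list.
# $1: allowed-tools value, $2: disallowed-tools value (comma-separated, assumed)
tool_sets_disjoint_sketch() {
  local a d
  local -a allowed_list=() disallowed_list=()
  IFS=',' read -ra allowed_list <<< "$1"
  IFS=',' read -ra disallowed_list <<< "$2"
  for d in "${disallowed_list[@]}"; do
    d=$(tool_basename_sketch "${d// /}")
    for a in "${allowed_list[@]}"; do
      a=$(tool_basename_sketch "${a// /}")
      if [[ $a == "$d" ]]; then
        return 1   # same tool is both allowed and disallowed
      fi
    done
  done
  return 0
}
```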

ENV-011 was added to ARCHITECTURE.md for the Claude Code v2.1.150 version dependency. On older versions, the disallowed-tools key is silently ignored – no crash, no enforcement. The allowed-tools whitelist handles protection alone. This graceful degradation means the feature never breaks backward compatibility. The defense-in-depth framing is deliberate: neither layer alone is sufficient (the allowed-tools list could be misconfigured; disallowed-tools could be unsupported), but together they provide both “only these tools” and “never these tools” constraints on the same skill.

2026-06-12 — AP-031 Fixture Divergence Prevention

This feature is the structural answer to two back-to-back postmortems with the same root cause. PMB-010: sync-deferred-backlog.sh parsed review findings with a heading regex expecting ## RS-001: while the real /creview-spec output writes ## Finding RS-001: — all 65 tests passed against hand-written fixtures encoding the wrong format, and the script silently imported 0 of 25 pending findings. PMB-011: the /cprune scanner shipped with three more instances of the same class (17 false positives from basename fixtures, a count regex that matched PAT-003 script before the actual count, drift-debt fixtures missing the real {"drift_debt": [...]} wrapper). The class is “test fixtures diverge from real producer output” — AP-031 in the antipattern catalog. The bet here is that the divergence is introduced at exactly two moments (spec writing and test writing), so prevention belongs in the prompts that govern those moments rather than in a runtime validator.
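
The PMB-010 failure mode is easy to reproduce in miniature. In this sketch the helper name count_headings_sketch is hypothetical; only the two heading shapes (## RS-001: versus ## Finding RS-001:) come from the postmortem.

```bash
# Hypothetical sketch: count lines on stdin that match an ERE heading pattern.
# grep -c prints 0 and exits non-zero when nothing matches, hence the "|| true".
count_headings_sketch() {
  grep -cE -- "$1" || true
}

# A parser written against the fixture format finds nothing in real output:
#   count_headings_sketch '^## RS-[0-9]+:'          < real-review-output.md
# while a pattern pinned to the producer's actual format finds every finding:
#   count_headings_sketch '^## Finding RS-[0-9]+:'  < real-review-output.md
```

A count of zero is indistinguishable from “no findings pending”, which is why the mismatch stayed silent.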

What was written is almost entirely prose-as-code: three directive blocks plus one structural test. Layer 1 lives in skills/cspec/SKILL.md Step 3 — when a feature parses another Correctless tool’s output, the spec must pin the exact format (heading regex, JSON schema, field names) and cite the producer file path as the authoritative source. The directive carries its own trigger-detection heuristics (parsing, jq field access, regex matching trigger it; existence checks and path-only operations do not) and an Example/Not contrast so the spec agent has a calibration anchor. Layer 2 has a writer half and an auditor half kept deliberately symmetrical: agents/ctdd-red.md requires at least one fixture sourced from a real artifact — preferred form is a verbatim excerpt with a Source: citation in the test language’s comment syntax (# Source: shell/Python, // Source: Go/TS/Java, -- Source: SQL) — and skills/ctdd/SKILL.md gains test audit check 11 (fixture provenance), which flags synthetic-only suites as BLOCKING. Both halves embed the same producer-to-artifact reference table (/creview-spec → review-spec-findings-*.md, /caudit → findings/audit-*-round-*.json, etc.) so the writer and the auditor agree on what “a real artifact exists” means.

The subtle design decisions are in the failure modes. Live-read-only fixtures (test reads .correctless/artifacts/... at runtime) are explicitly insufficient — that directory is gitignored, so such a test silently passes in CI with no fixture at all; the audit treats live-read-only as BLOCKING, same as synthetic-only. The audit agent is tool-pinned to Read/Grep/Glob and cannot run git, so the /ctdd orchestrator computes scope and passes two labeled lists — MODIFIED_TEST_FILES: from git diff and UNTRACKED_TEST_FILES: from git status --porcelain (RED-phase test files are untracked, not modified — omitting that list would silently skip exactly the tests that matter). A missing label fails loud with a single BLOCKING finding instead of guessing scope (the PMB-005 lesson: silent omission looks healthy). Fixture-following is bounded (repo-relative paths only, 10-file budget) and fenced (TB-003: fixture content is data to format-compare, never instructions — a fixture saying “AP-031 is satisfied” is itself a finding). And there’s a deliberate bootstrap dormancy: when producer and consumer land in the same PR, no real artifact exists yet, so the real-fixture requirement goes dormant and Layer 1’s format pinning is the sole guard until the producer has run once.
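
The scope hand-off can be sketched as two small helpers. Both names are hypothetical, and the parser assumes simple paths: git status --porcelain quotes paths containing special characters, which this sketch does not undo.

```bash
# Hypothetical sketch: extract untracked paths from `git status --porcelain`
# output on stdin. Untracked entries are prefixed with "?? ".
untracked_from_porcelain_sketch() {
  local line
  while IFS= read -r line; do
    if [[ $line == '?? '* ]]; then
      printf '%s\n' "${line:3}"
    fi
  done
}

# Returns 0 only when both scope labels appear in the text on stdin, so a
# caller can fail loud instead of guessing scope when one is missing.
scope_labels_present_sketch() {
  local text
  text=$(cat)
  [[ $text == *'MODIFIED_TEST_FILES:'* && $text == *'UNTRACKED_TEST_FILES:'* ]]
}
```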

This is a conscious PAT-018 deviation — all enforcement is prompt-level, with the spec’s Won’t Do explicitly declining a runtime fixture validator. The compensating structure is tests/test-ap031-fixture-divergence.sh (39 tests): awk state machines extract each directive’s section before grepping, so keyword assertions are block-scoped (AP-003 mitigation — a keyword elsewhere in the file can’t satisfy a check), and 8 QA/mini-audit class fixes are pinned as named assertions (the cost-cache-* exclusion in the producer table, the fail-loud label fallback, the TB-003 fence, the 10-fixture budget, the no-retroactive-retrofit scope rule). Since the “implementation” is prose, regression means someone editing the directive text — the structural test makes that loud. The antipatterns.md AP-031 entry now carries a “Prevention implemented” note that reframes any recurrence as a postmortem trigger rather than a third strike toward PAT-020 promotion.

2026-06-14 — Slug-Type-Aware Artifact Classification in prune-scan.sh

This is the structural fix for the 2nd AP-032 instance: the prune-scan scanner. Before this feature, scan_artifacts in scripts/prune-scan.sh had a single mental model: every pattern in artifact_patterns is branch-slug-named (feature-<name>-<md5[:6]>), so to find orphans, match every artifact filename against the live branch-slug set using substring search. But the repo had quietly been using three different slug conventions all along: branch-slug for workflow-state, token-log, audit-trail, pipeline-manifest, and autonomous-decisions; task-slug (bare task name, no feature- prefix, no hash) for qa-findings and audit-mini; and session-slug (Claude Code session ID, never derived from any live work) for harness-notified-{SESSION_ID}.flag. When a task-slug-named file like qa-findings-prune-scan-slug-aware-matching.json was matched against the live branch-slug set, the match failed and the live file was flagged as a low-risk deletion candidate. Autonomous /cprune would have deleted it. UX-R2-014 had patched the qa-findings instance specifically by removing it from the patterns list, but the bug class was untouched: any future task-slug-named pattern would silently exhibit the same data-loss vector.

The fix is _classify_artifact_pattern — a bash function in scripts/prune-scan.sh that maps every artifact_patterns entry to exactly one of branch-slug, task-slug, session-slug, or unclassified. It’s total over artifact_patterns (every pattern has a case branch) and defined exactly once (structural test INV-001). When the safety belt checks whether an artifact protects live work, it consults the live-slug set that matches the pattern’s classification — branch-slug patterns against live_branch_slugs (computed via branch_slug() from scripts/lib.sh), task-slug patterns against live_task_slugs (derived from basename(.spec_file, ".md") for each workflow-state-*.json whose .branch is in the live branch set — no .task fallback per EA-003), session-slug patterns are never live-prunable, and unclassified patterns are skipped with an observable skipped_unclassified JSON entry plus stderr advisory (never silently dropped). The producer-pattern table in .correctless/specs/prune-scan-slug-aware.md is the source of truth — INV-008 parses it and the artifact_patterns= assignment line via sed (no prose-grep, no source-and-read) and asserts bidirectional coverage with an allowlist cap of 5.
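
The shape of such a classifier is a total case statement. The sketch below is not the real _classify_artifact_pattern: the function name is hypothetical and the pattern strings are assumed from the artifact families named in this entry.

```bash
# Hypothetical sketch: map an artifact pattern to exactly one slug type.
# The default branch makes the function total; unknown patterns come back as
# "unclassified" so the caller can report them instead of dropping them.
classify_artifact_pattern_sketch() {
  case $1 in
    workflow-state-*|token-log-*|audit-trail-*|pipeline-manifest-*|autonomous-decisions-*)
      echo branch-slug ;;
    qa-findings-*|audit-mini-*)
      echo task-slug ;;
    harness-notified-*)
      echo session-slug ;;
    *)
      echo unclassified ;;
  esac
}
```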

The other half of the fix is the matching primitive itself. The pre-feature scanner used grep -F "$slug" and unquoted =~ $slug — both substring primitives that cannot distinguish feature-foo-abc from feature-foo-def, or qa-findings-foo from qa-findings-foo-2. The new primitive is bash [[ regex with [-.] delimited-token boundaries: [[ $f =~ ^(.+-)?$slug([-.]|$) ]]. Substring primitives are structurally banned by the new prune-scan-substring-match rule in scripts/antipattern-scan.sh check_shell(). Slug values pass through _slug_is_safe validation at extraction boundaries AND ERE metacharacters are escaped via _escape_ere_metachars before regex interpolation — dual defense ensures malformed slugs are rejected at the boundary AND that any slug slipping through cannot exploit ERE metachar interpretation. MA-001 was the round-2 mini-audit finding that surfaced the ERE escape requirement; the fix is the _escape_ere_metachars helper at lines 222-228 plus the _slug_is_safe gate at lines 251-256.
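
A sketch of the delimited-token match with escaping. The regex shape is the one quoted above; the helper bodies are assumptions and are not the real _escape_ere_metachars or _slug_is_safe.

```bash
# Hypothetical sketch: backslash-escape ERE metacharacters so a slug is
# matched literally when interpolated into a bash regex.
escape_ere_sketch() {
  local in=$1 out='' c i
  for (( i = 0; i < ${#in}; i++ )); do
    c=${in:i:1}
    case $c in
      '\'|'.'|'['|']'|'('|')'|'{'|'}'|'*'|'+'|'?'|'^'|'$'|'|') out+="\\$c" ;;
      *) out+=$c ;;
    esac
  done
  printf '%s' "$out"
}

# Returns 0 when SLUG appears in FILE as a whole token: preceded by the start
# of the name or a "-", and followed by "-", "." or the end of the name.
slug_matches_file_sketch() {
  local file=$1 slug re
  slug=$(escape_ere_sketch "$2")
  re="^(.+-)?${slug}([-.]|\$)"
  [[ $file =~ $re ]]
}
```

With this shape, slug foo matches qa-findings-foo.json but not qa-findings-foobar.json, and a dot inside a slug no longer acts as a wildcard.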

Six fail-closed paths were added at safety-belt boundaries that pre-feature silently collapsed: (1) empty live-branch-slug set, when git branch returns no live branches, fails non-zero with stderr advisory instead of proceeding with an empty set that would classify every artifact as orphaned (F-001 fix at scripts/prune-scan.sh:744); (2) empty live-task-slug set with the same fail-closed posture; (3) missing realpath — _realpath_tool_available probes for realpath/readlink -f at scan entry, and when neither is available, the scanner exits non-zero with stderr advisory; it never silently falls back to lexical canonicalize_path for symlink-equivalence decisions (this is PAT-020, the canonicalization-fallback antipattern); (4) workflow-state mid-write TOCTOU, where identity comparison uses content-based started_at string equality (primary) then composite task|branch (fallback) then sha256(file) (last resort), never mtime — extending the ABS-029 content-based-match convention to cross-worktree scenarios where the same logical workflow-state may be observed at different paths (MA2-002 fix); (5) non-git BASE_DIR aborts with stderr advisory rather than swallowing git errors silently; (6) lib.sh sourcing failure, where scan_artifacts aborts when branch_slug() isn’t defined after sourcing rather than calling an undefined function.
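
The first two fail-closed paths share one shape, sketched here with a hypothetical function name and message text:

```bash
# Hypothetical sketch: refuse to continue with an empty live-slug set.
# $1: label used in the advisory; remaining args: the members of the set.
# An empty set would make every artifact look orphaned, so fail non-zero.
require_nonempty_live_set_sketch() {
  local label=$1
  shift
  if (( $# == 0 )); then
    printf 'prune-scan: %s is empty; refusing to classify artifacts as orphaned\n' \
      "$label" >&2
    return 1
  fi
  return 0
}
```

A caller would pass the expanded array, for example require_nonempty_live_set_sketch live_branch_slugs "${slugs[@]}" || exit 1, where slugs is whatever array holds the computed set.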

Schema migration was the other major design decision. The pre-feature scanner emitted a bare JSON array of candidates. The new scanner emits a wrapped object {candidates, skipped_unclassified, protection_set, protection_status}. Consumers (/cprune and /cstatus) read .candidates for the candidate list and have visibility into which patterns were skipped (and why), what protection set was applied (live_branch_slugs, live_task_slugs, session_id), and what protection status was achieved (branch_slug_set_populated, task_slug_set_populated, realpath_available). BND-001 verified the consumer migration — reading the top-level value as an array fails the test. Both skills/cprune/SKILL.md and skills/cstatus/SKILL.md were migrated in the same PR to maintain the consumer contract.

ABS-040 introduces the baseline manifest: a JSON file under .correctless/meta/ recording the operator-acknowledged pattern set. Sole writer is scripts/prune-scan.sh --update-baseline, never set as a side effect of scanning. Autonomous /cprune runs, /cstatus runs, and default-mode /cprune runs all leave the baseline untouched. Update happens only when /cprune SKILL.md invokes the scanner with --update-baseline after interactive human confirmation. For any pattern present in current artifact_patterns but absent from the baseline, candidates emitted via that pattern carry risk: "medium" regardless of safety-belt outcome — preventing auto-promotion of newly-added patterns to low risk without human review. When the baseline file is missing or corrupt, the scanner fails closed to all-medium (INV-011a); it does not proceed as if baseline equaled current set. The baseline file is SFG-protected.

The producer-pattern table approach (INV-008) is itself a deliberate AP-031 satisfaction. Rather than a runtime pattern registry that could drift from the implementation, the spec is the authoritative source. The structural test parses both the spec table and the bash artifact_patterns= assignment via sed and asserts bidirectional coverage at CI time. The real-fixture requirement is satisfied by tests/fixtures/prune-scan/wfstate-real-sample.json — a verbatim 17-line excerpt of a real workflow-state JSON cited via # Source: comment. The fixture exercises the _workflow_state_identity content-based fence — the same primary→fallback→last-resort chain as ABS-029 applied at a new context (cross-worktree state identity instead of audit findings persistence).

The QA round caught 4 BLOCKING findings (F-001 empty live_branch_slugs, F-002 INV-018 dead-code, F-003 silent branch_slug failure, F-004 unescaped task slug in bash ERE) and the mini-audit caught 7 more across rounds 1 and 2 (MA-001 metachar escape, MA-002 pattern_is_new default, MA-003 baseline shape validation, MA-005 parent symlink bypass, MA2-001 realpath fallback, MA2-002 workflow-state identity, MA2-004 set -f noglob). All 11 are fixed and traceable in code via F-NNN fix: / MA-NNN fix: / MA2-NNN fix: comments. 61 tests in tests/test-prune-scan-slug-aware.sh cover all 18 invariants, 2 prohibitions, 2 boundary conditions, the extended EA-001, and the antipattern-scan rule registration. The cverify pass found pre-existing test failures in tests/test-cprune.sh (INV-013-d AP-033 pipefail flake, INV-016-a/b cprune SFG gap) that are unrelated to this branch; they are flagged for the next debt sprint.

AP-032 is now at frequency 2. Both instances share the same shape: extraction step correct, resolution step incomplete. The cprune-skill 2026-05-24 instance was basename resolution against literal paths (file_exists("lib.sh") returns false when the file lives at scripts/lib.sh). This instance is substring slug matching against delimited tokens. A 3rd instance promotes AP-032 to a PAT-xxx structural rule: “any tool that resolves named references (paths, slugs, identifiers) against on-disk artifacts must define explicit resolution semantics, not lift the comparison primitive from convenience.”

2026-06-15 — Fix-diff reviewer class-shaped bug lens + SFG lift-and-restore backstop

What & why. PMB-019 recurred a class-shaped bug: PR #124 fixed one ARG_MAX site in scripts/build-dashboard.sh and missed the sibling read_file_json helper using the same --arg "$content" pattern. This feature adds a class-shaped bug detection lens to the fix-diff reviewer (agents/fix-diff-reviewer.md): when a fix is scope-narrowed (one site of a multi-instance pattern), the reviewer greps same-directory same-extension sibling modules before approving and emits a HIGH finding unless a SIBLING-DEFERRED: marker enumerates the deferrals. Because the reviewer’s deliverable file is itself SFG-protected (AP-037), the feature also ships an SFG lift-and-restore backstop subsystem so future PRs can develop the guarded file.

What was written. 21 invariants CS-001..CS-021. New: scripts/build-caudit-prompt.sh (the /caudit Step 6a <UNTRUSTED_FINDING_DESCRIPTION> + <PRE_PR_BASE_MARKERS> fence producer), scripts/build-pre-pr-base-markers.sh, scripts/check-no-pending-sfg-lift.sh (CS-012a final-state backstop), a cmd_done sentinel gate in the workflow dispatcher, a dedicated sfg-lift-check CI job, .claude/rules/sfg-deliverable.md, ABS-041 in ARCHITECTURE.md, and ~1300 lines of structural+behavioral tests in tests/test-fix-diff-reviewer-agent.sh (the CS-007 cardinality checklist asserts membership-equality over the 20-ID set).

How it works / hard-won lessons. Three QA rounds and two mini-audit rounds each found real bugs, almost all of the same class the feature exists to prevent: (1) the done-gate sentinel was twice dead — first never written, then keyed its filename on HEAD with content==HEAD so the mismatch branch was unreachable; the fix is a single fixed-name .correctless/artifacts/test-success.sha holding the SHA the suite last passed at, with a behavioral test that constructs the mismatch and asserts refusal. (2) The CS-011 fence producer was prose-only (test-only helper), reopening AP-026 — promoted to a real coded invocation in Step 6a. (3) The producer itself reintroduced PMB-019: jq --arg on unbounded descriptions silently lost findings, and the ceiling fix bounded only the diff while pre-PR-base markers stayed unbounded — fixed class-wide by routing every artifact-sized value through a single stdin→file→cap chokepoint and reserving trusted close fences in a tail post-assembly truncation can’t reach. (4) Character-delimited fences were forgeable (inject a fake rules block / forge a pre-PR-base marker) — fixed with per-invocation nonce-delimited fences + content neutralization. Patterns used: ABS-029 (gate-enforced phase-transition artifact contract), ABS-035 (dispatcher keeps zero cmd_* definitions — the gate is a _-prefixed helper), ABS-010 (byte-equal distribution mirror), PAT-018 (the read-scope deny-list is a prompt-level fallback; the structural Read-guard is deferred to a /carchitect cycle per OQ-010).
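
The fixed-name sentinel check from lesson (1) reduces to a content comparison. A sketch, with a hypothetical function name and the assumption that the caller supplies the current HEAD SHA (for example from git rev-parse HEAD):

```bash
# Hypothetical sketch of the done-gate sentinel comparison.
# $1: sentinel path (the entry above uses .correctless/artifacts/test-success.sha)
# $2: current HEAD SHA
# Returns 0 only when the sentinel exists, is non-empty, and equals HEAD.
done_gate_allows_sketch() {
  local sentinel=$1 head_sha=$2 recorded
  [[ -r $sentinel ]] || return 1   # never written: refuse
  recorded=$(<"$sentinel")
  [[ -n $recorded && $recorded == "$head_sha" ]]
}
```

Because the filename no longer depends on HEAD, a stale sentinel from an earlier commit produces a reachable mismatch instead of a missing file.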

The meta-lesson. Every fix round that scope-narrowed (patched one site) sprouted a sibling the next round caught — the feature’s own thesis applied to its own development. The final producer-hardening round explicitly inventoried every argv site and every unbounded body component before declaring the class closed.

2026-06-16 — Cross-Model Spec Review via codex

What was built and why. /creview-spec already had a dormant “external review” path — a config block and a Step 3 stub that never did anything. This feature makes it real: codex (GPT-5.5) becomes a first-class adversarial spec reviewer that runs alongside Claude’s six review agents. The motivation is the project’s founding principle pushed one step further — not just “a different agent grades the work,” but a different model from a different vendor with a different failure distribution. A spec that survives both Claude’s lenses and an independent GPT-5.5 read is materially harder to get wrong. The path stays off by default and silently dormant when no external model is configured, so existing users see no change until they opt in.

What code was written. Two new scripts plus skill wiring. scripts/external-review-run.sh (~660 LOC) is the producer: subcommands review (invoke codex, capture findings), record (append a run to history), set-disposition, pending, and findings-block. cmd_review builds a local -a argv array, injects --sandbox read-only and the --output-schema/--output-last-message flags itself, runs codex with the spec on stdin (never argv), then routes the result through _validate_invocation (closed allowlist), _sanitize_findings (parse-gate, caps, EXT- renamespace, severity coercion), and _within_size_ceiling (4 MiB). It reuses build-caudit-prompt.sh’s _gen_nonce/_neutralize_fences verbatim for the untrusted-output fence. scripts/config-update.sh (~190 LOC) is the sanctioned writer for the two config fields (set-external-model, set-require-external-review) using jq --arg/--argjson + atomic temp+mv, so config never transits a shell redirect. /csetup, /creview-spec (Step 3/3.5), and /cstatus got the wiring; both new scripts were added to the SFG DEFAULTS.

How it works — the trust model. codex is treated as doubly untrusted: its output is shaped exactly like the review findings the orchestrator acts on (TB-008), and its config is the first config input treated as untrusted-against-tampering rather than owner-trusted (TB-001c). The defenses compose: the invocation is a closed allowlist with bin-realpath + flag-shape + model-charset + clamped-timeout validation; the output is parse-gated, bounded, and nonce-fenced before it reaches Claude’s reasoning; and codex findings are advisory-only — renamespaced EXT-NNN, surfaced at the Step 4 human disposition gate, never auto-incorporated. The read-only sandbox bounds writes, not egress — the egress boundary is the opt-in config gate (INV-005 auto-off-when-absent) plus the INV-014 config-time and INV-022 per-run disclosures, deliberately not the sandbox.

The CRITICAL the mini-audit caught (and the design lesson). QA round 1 and the red-team probe both passed, but the mini-audit’s hostile-input lens found that --sandbox read-only lived in the config’s base_args — so a tampered or hand-edited config that dropped it would run codex unsandboxed while every test still passed. That is the AP-022 dead-code-in-security-paths shape exactly: the guard existed but a config could route around it. The fix moves sandbox injection into the producer unconditionally and strips any config attempt to set --sandbox, with a regression test that captures the real argv and asserts the flag is present regardless of config. A round-2 hostile-input re-attack confirmed the fix holds against every bypass vector tried. This is the load-bearing argument for narrow tool allowlists and producer-side enforcement: security-relevant flags belong in code the config cannot reach, not in config the user can edit. Patterns used: ABS-042 (sole-writer producer), ABS-003 (locked_update_file for the history append), INV-009 fence reuse from build-caudit-prompt.sh (don’t reinvent neutralization), PAT-018 (structural enforcement over prompt-level for the sandbox flag).
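
Producer-side injection can be sketched as an argv filter. This is an illustration only: the function name and the global REVIEW_ARGV array are hypothetical (the real cmd_review is described as using a local array), and only the --sandbox read-only flag is taken from this entry.

```bash
# Hypothetical sketch: build the reviewer argv from config-supplied base args,
# dropping any attempt to set --sandbox, then appending the enforced value.
# Result is left in the global array REVIEW_ARGV.
build_review_argv_sketch() {
  REVIEW_ARGV=()
  while (( $# > 0 )); do
    case $1 in
      --sandbox)   shift; (( $# > 0 )) && shift ;;  # drop the flag and its value
      --sandbox=*) shift ;;                         # drop the joined form
      *)           REVIEW_ARGV+=("$1"); shift ;;
    esac
  done
  # Injected unconditionally, so no config content can remove it.
  REVIEW_ARGV+=(--sandbox read-only)
}
```

A regression test in this style asserts on the final array rather than on the config, which is what makes a tampered config visible.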

2026-06-17 — /cchores — Autonomous Issue-Resolution Pipeline

/cchores is the issue→PR sibling of /cauto: a fully-autonomous overnight pipeline that resolves one open GitHub issue end to end and opens a PR that closes it, or aborts fail-closed with an evidence-preserving issue comment. It ingests attacker-controllable issue content and acts on a public surface (push/PR/comment) with no human checkpoint, so its security invariants are load-bearing.

The deterministic spine is coded: cchores-select-candidates.sh (idempotent exact-ref candidate filter), cchores-regression-oracle.sh (committed-substrate flake oracle, fail-closed + CI-superset gate), redact-secrets.sh (whole-buffer multiline fail-closed redactor), cchores-fence-issue.sh/cchores-fence-lib.sh (per-invocation nonce fence over untrusted issue body), cchores-emit.sh (egress chokepoint: redact + per-sink cap, truncate-after-redact), and cchores_slug() (charset-bounded branch slug). The LLM orchestrator (skills/cchores/SKILL.md) composes these; /cdebug gained an autonomous contract (agents/cdebug-fix.md leaf fix agent, structured outcome block, fail-closed jq -e parse-gate) while preserving interactive behavior (INV-006f).
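
The truncate-after-redact ordering in the egress chokepoint matters because a cap applied first can cut a secret below the redactor's match length. A minimal sketch, assuming a made-up secret shape (tok_ plus 20 or more alphanumerics) and hypothetical function names; the real redactor is whole-buffer and multiline, which this line-oriented sed is not.

```shell
#!/usr/bin/env bash
# Hypothetical secret shape for illustration only: "tok_" followed by 20 or
# more alphanumerics. The real redactor's patterns are not shown in this journal.
redact() {
  LC_ALL=C sed -E 's/tok_[A-Za-z0-9]{20,}/[REDACTED]/g'
}

# Safe order: redact the whole buffer first, then apply the byte cap ($1).
emit_capped() {
  redact | head -c "$1"
}

# Unsafe order, shown for contrast: the cap can shorten a secret so that the
# redactor no longer recognizes it, and the prefix leaks.
emit_truncate_first() {
  head -c "$1" | redact
}
```

With a 14-byte cap and the input "key tok_ABCDEFGHIJKLMNOPQRSTUVWXYZ tail", the safe order emits "key [REDACTED]" while the unsafe order emits the partial secret "key tok_ABCDEF".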

Key decision: the 8-lens security mini-audit found the ingress fence and egress redaction were specified-but-unwired (LLM-obedience prose) while every coded helper was sound. Since the spec mandated coded enforcement, these were wired as coded chokepoints — the right posture for a no-human-checkpoint feature. Arch entries TB-009, TB-004d, ABS-043, ABS-044, ABS-030-revision document the surface. Deferred: DF-001/QA-004 (cdebug-fix Bash allowlist breadth — human tool-surface decision).

2026-06-25 — SFG re-scope: perimeter to write-target-only guardrail

What and why. hooks/sensitive-file-guard.sh (SFG) is a Claude Code PreToolUse hook that blocks the agent from writing protected files (.env, *.pem, .correctless/ state files, etc.). It had been specced as a security perimeter with a deliberately over-extracting Bash target extractor — _extract_bash_targets flagged every non-flag token as a candidate write target once _has_write_pattern fired (which it does on the ubiquitous 2>/dev/null idiom and bare interpreter use). Per PMB-020 / AP-040 that framing is a category error: a cooperative-loop PreToolUse hook can only ever be a guardrail/speedbump (trivially evaded by naming a directory, routing through an interpreter, or any ungated tool). The cost was paid entirely in false-positive friction — 15 false blocks across two dogfood sessions, every one a read/invocation/restore, zero write attacks. This feature right-sizes SFG to block only genuine write destinations.

What was written. _extract_bash_targets was rewritten from token-driven to destination-driven (PRH-001 — no unconditional token-emit branch). New helpers: _excise_process_subs (strips process-substitution spans), _mask_quoted_operators (a length-preserving quote/comment/escape-aware scan that neutralizes operator bytes inside quoted spans and after a word-boundary #), _mask_opaque_operands (interpreter/here-string operand opacity, INV-005), _segment_command (per-segment positional writer detection, INV-020), _extract_writer_dests + _extract_inplace_operand (cp/mv/install/ln final-positional, tee all-args, sed -i/perl -i operand, dd of=, truncate), _redirect_op_suffix (recognizes a live redirect operator as a token suffix), and _emit_dest (sink-device exclusion + dynamic-dest fail-open). Hook-scope LC_ALL=C was added (INV-019); the BLOCKED message was re-framed (INV-014). scripts/lib.sh’s _has_write_pattern/get_target_file are frozen (INV-011, golden-hash pinned). A doc-coherence sweep scoped ABS-029/030/035/038/040/041/042 (+ ABS-012/016) to the guardrail boundary and added the authoritative ABS-045; CHANGELOG/README/AGENT_CONTEXT/cmodelupgrade and the PAT-001 rule-file carve-out were updated.

How it works. The hook still runs the frozen _has_write_pattern pre-filter first (so the firing set is a superset of the emit set, INV-016); only after it fires does the destination-driven extractor run. The extractor emits a token ONLY via the redirect branch (INV-002) or the writer-command branch (INV-003); everything else (reads, invocations, interpreter/eval operands, git restores, process-subs, unresolvable/dynamic destinations) yields the empty set and is therefore allowed (INV-007 fail-open, structurally guaranteed). The masking is quote/comment/escape-aware, so an operator inside a quoted argument (for example, echoing a string that mentions a redirect) is not mistaken for a real redirect. Emitted destinations still flow through canonicalize_path (PAT-017, INV-010) before matching against DEFAULTS.

The hard part — masking cost. The byte-walk masking is O(n^2) by construction in bash (each tail-slice is O(remaining length)). Three cap iterations were needed: an output-accumulation fix (array-join), a trigger-byte-count cap (which proved structurally broken — its byte set omitted the redirect operators that defeat the masker fast path, it was paddable via comments/heredoc bodies, and the per-iteration tail-slice was O(L) regardless), and finally a simple raw-length cap (_SFG_LENGTH_CAP=12288) checked O(1) at the top before any byte loop (EA-006). Commands over the cap fail open (a >12 KiB command writing a protected path is non-naive — accepted per the guardrail framing). The trade-off: a long single-blob redirect now fails open too, but that path was the one carrying the O(n^2)/bypass bugs.
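
The final cap can be sketched as a single length comparison made before any byte-walk starts. The function name over_length_cap is hypothetical; the cap value is the one given above.

```shell
#!/usr/bin/env bash
_SFG_LENGTH_CAP=12288   # value from the entry above

# Returns 0 when the command is over the cap (the caller then fails open),
# 1 otherwise. The decision uses one length lookup, so no bash-level
# per-byte loop runs before it.
over_length_cap() {
  local LC_ALL=C          # count bytes, not locale-dependent characters
  local cmd=$1
  (( ${#cmd} > _SFG_LENGTH_CAP ))
}
```

Because the check looks only at raw length, padding with comments or heredoc bodies cannot move a command under the cap, which was the weakness of the trigger-byte-count variant.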

Patterns and design decisions. Tests are hook-integration only (drive the full hook via a stdin JSON envelope, assert the exit code — RS-006); function-level extractor calls are forbidden because they bypass the pre-filter and prove nothing about the deployed gate. The corpus uses real-bash differential oracles (a backslash-by-operator parity sweep and a write-redirect-operator completeness sweep) rather than hand-derived expectations — the hand-rolled lexer drifted twice (&>> then <> missed), so the completeness sweep is the structural class-fix that makes future operator drift a test failure rather than an escaped bug. Structural tripwires (PRH-001 default-arm, INV-019 LC_ALL=C, INV-006 sink-device, INV-011 golden-hash) are paired with behavioral proofs (behavior-primary, structural-as-tripwire — PMB-016/AP-036). Exotic write forms (cp -t / install -d / ln one-arg / IFS-byte dests / interpreter-mediated / git-restores / over-cap-size) are accepted fail-open non-goals, documented in the STRIDE/EA sections — the guardrail catches the naive accidental clobber, not the motivated adversary.
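
A real-bash differential oracle of the kind described above can be sketched like this. The helper name op_creates_target is hypothetical: it asks bash itself whether a given redirect operator creates its target, so expectations are never hand-derived.

```shell
#!/usr/bin/env bash
# Hypothetical oracle: run ": OP target" under real bash in a throwaway
# directory and report whether the target file appeared.
# Returns 0 if the operator created the file, 1 if not, 2 on setup failure.
op_creates_target() {
  local op=$1 dir rc=1
  dir=$(mktemp -d) || return 2
  ( cd "$dir" && bash -c ": $op target" ) >/dev/null 2>&1 || true
  if [ -e "$dir/target" ]; then rc=0; fi
  rm -rf "$dir"
  return "$rc"
}
```

A completeness sweep would iterate a candidate operator list and fail whenever the oracle reports a write that the extractor's lexer does not classify as a redirect, which is how a missed operator such as <> becomes a test failure.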

2026-06-26 — Reduce sensitive-file-guard to Edit/Write-tool-path only

What was built and why. Shortly after the #205 re-scope (journaled 2026-06-25) rebuilt _extract_bash_targets into a destination-driven extractor, this feature deleted that whole path. The premise (PMB-020/AP-040): SFG is a PreToolUse hook in a cooperative dev-loop, so it is a guardrail/speedbump, not a security perimeter. The Bash-redirect leg it kept was trivially evadable (bash -c, naming a directory, interpreters), yet it carried ~90% of the hook’s complexity (~10 of #205’s defects were quote/comment/backslash masking bugs and the three O(n^2) iterations) and produced constant false-positive friction by blocking the agent’s own reads/invocations. So the leg was removed entirely: the hook now matches only tool_input.file_path for the Edit/Write tool family and fast-paths every Bash command to exit 0. Net 1005→320 lines.

What code was written. hooks/sensitive-file-guard.sh lost _extract_bash_targets, _strip_quotes, _excise_process_subs, _mask_quoted_operators, _mask_opaque_operands, _segment_command, _extract_writer_dests, _extract_inplace_operand, _redirect_op_suffix, _emit_dest, and the _SFG_LENGTH_CAP block. STEP 3 gained an unconditional Bash) exit 0 arm placed after the STEP-2 JSON parse but before _source_lib_sh/config (DD-1, INV-001). The retained Edit/Write path is unchanged: _source_lib_sh for canonicalize_path (PAT-017) + config_file, the v1 canonicalize sentinel probe (fail-closed on a divergent lib.sh), set -f + LC_ALL=C. scripts/lib.sh and hooks/workflow-gate.sh were left untouched (INV-004 — _has_write_pattern survives for workflow-gate’s independent use). The bulk of the diff is the doc-coherence sweep across ARCHITECTURE.md (ABS-045 narrowed; ABS-027/012/016/029/030/035/038/040/041/042 + TB-001a/b amended with a durable downgrade marker), CLAUDE.md conventions, README, CHANGELOG (framed as a security DOWNGRADE), FEATURES.md, and the two .claude/rules files. A new test, tests/test-sfg-doc-coherence.sh, mechanically enforces that sweep; tests/test-sfg-rescope.sh was deleted and the Bash-block assertions across five test files were inverted (exit 2 → exit 0).

How the adversarial phases shaped it. The QA/mini-audit rounds were where the value landed. Round-1 QA caught that the doc-coherence dangling-ref check had forced a rewrite of CLAUDE.md’s append-only PMB ledger — fixed by carving PMBs out via a shared Postmortem-stripping helper. Mini-audit round 1 found the enumerated reject-substring corpus had missed root-level FEATURES.md; round 2 found it also missed a .claude/rules file — the allowlist kept silently missing surfaces (the AP-026/AP-036 prose-drift class the feature is ironically about), so the corpus was generalized to a git ls-files '*.md'-derived set (minus journals/archives/specs and skills/agents, which keep a narrow 4-phrase leg). The most important catch was a genuine fail-closed bug: an array/object/non-string tool_name made jq @sh emit multiple shell tokens, so eval "$_PARSED" ran a bogus command and the hook exited 127 — not the fail-closed exit 2 that INV-006 now requires of the sole remaining fail-closed path. Fixed by guarding tool_name to a scalar string in the jq filter (non-string → jq errors → empty parse → exit 2) and coercing the other interpolated fields to scalars. The probe round produced nothing usable: the Agent-tool isolation: worktree branched the probe worktrees from main rather than the unpushed feature tip, so every probe’s substrate-presence precondition (PMB-015) tripped and the round aborted substrate-invalid — a clean catch, not a false pass.
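
The scalar guard on tool_name can be illustrated with a small jq filter. This is a sketch under assumptions: parse_tool_name and parse_status are hypothetical names, and the real hook parses more fields than this.

```shell
#!/usr/bin/env bash
# Hypothetical parse step: emit tool_name only when it is a JSON string.
# Any other type makes jq raise an error, so the caller sees a non-zero
# status and empty output instead of multiple shell tokens.
parse_tool_name() {
  jq -r 'if (.tool_name | type) == "string"
         then .tool_name
         else error("tool_name is not a string")
         end' 2>/dev/null
}

# Map a parse failure (or an empty name) to the fail-closed code 2.
# Printing 0 here only means "the parse succeeded", not "the edit is allowed".
parse_status() {
  local name
  if name=$(parse_tool_name) && [ -n "$name" ]; then
    echo 0
  else
    echo 2
  fi
}
```

With this shape, an array or object tool_name never reaches an eval; it collapses to the same exit-2 path as malformed JSON.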

Design decisions worth recording. (1) Bash) exit 0 inside the hook rather than dropping Bash from the registered matcher — keeps the change self-contained (OQ-002 defers the matcher change). (2) The accepted residual is the whole non-cmd_*-gated DEFAULTS set, not three files — only ABS-029/041 have a content gate that detects a forged Bash write at the next phase transition; for everything else (ABS-030/035/038/040/042, harness-fingerprint, preferences.md→eval) a Bash-mediated write is now unguarded and undetected, accepted because the Bash leg was always evadable and these are dev-workflow state files, not secrets. (3) A cross-cutting find for a follow-up: hooks/workflow-gate.sh:51-63 uses the identical eval + jq @sh parse and shares the non-scalar-tool_name crash (fail-open there) — out of scope here per INV-004, recommended as a separate fix plus a PAT-001 amendment mandating scalar coercion in the bulk-parse convention.

2026-07-01 — InstructionsLoaded hook (direct rule-load observability)

What & why. Feature B of the path-scoped-rules-pat001 line. PAT-001’s measurement gate (does loading a rule file into editing context reduce clause-5 fail-open regressions?) was accepted 2026-04-14 via an indirect git-archaeology proxy. This feature upgrades that signal to direct runtime observation: a fail-open InstructionsLoaded hook logs every .claude/rules/*.md load, and /cwtf shows those loads beside hook-edits so a human can judge whether a rule was in context around an edit. It does not re-open the accepted gate.

What was written. hooks/instructions-loaded.sh (new, + correctless/ mirror): parses harness stdin JSON, canonicalizes file_path via canonicalize_path (PAT-017) and prefix-checks .claude/rules/, then appends one jq -n --arg/--argjson-serialized line to the gitignored .correctless/meta/instructions-loaded.jsonl. hooks/audit-trail.sh gained an additive session_id field (canonical null when empty). setup’s register_hooks() was refactored from a hardcoded 2-type case into a single KNOWN_HOOK_TYPES associative type→timeout map + shared _upsert_command_hook/_upsert_agent_hook helpers used across all four registration seams (fresh, existing-update, drift-repair, invalid-regen) — adding a type now needs no bespoke arm. skills/cwtf/SKILL.md gained a “Rule-Load Observability” section whose runnable cwtf:rule-load-extract block joins the two logs for display. Tests: test-instructions-loaded.sh, test-instructions-loaded-cwtf.sh, test-audit-trail.sh, widened test-ci-hook-wiring.sh, plus real captured-payload + real-format audit-trail fixtures.

How it works / patterns. The hook follows the fail-open telemetry pattern (PAT-005 shape; set -f+LC_ALL=C, no strict mode, every path exits 0 — correct because InstructionsLoaded exit codes are harness-ignored, ENV-012). Log-line construction via jq -n (never string interpolation) is the TB-010 anti-forgery contract (INV-004). The /cwtf consumer follows the ABS-046 JSONL contract: per-line jq -R 'fromjson? | objects' (never jq -s; the | objects guard is load-bearing on jq 1.7, which CI runs — a bare fromjson? aborts the stream on a valid-non-object line, PMB-001). The feature is deliberately human-judged (PRH-005) — no automated MG-001/MG-002 classifier — because the round-3 multi-agent review found the cross-file-join design concentrated ~8 CRITICAL/HIGH failure modes all biasing optimistic.
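
The log-line construction rule (jq -n, never string interpolation) can be sketched as below. The function name and the field names are assumptions for illustration, not the hook's real schema.

```shell
#!/usr/bin/env bash
# Hypothetical log-line builder; field names are illustrative only.
# jq -n with --arg/--argjson serializes each value as JSON, so quotes and
# newlines inside a path are escaped within one string instead of forging
# extra fields or extra log lines.
make_log_line() {
  local path=$1 epoch=$2
  jq -cn --arg path "$path" --argjson ts "$epoch" \
    '{event: "rule_load", path: $path, ts: $ts}'
}
```

A path such as one containing a double quote followed by a newline and a fake JSON object still yields exactly one line whose event field is the literal set by the filter.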

Non-obvious decisions. (1) The firing model was empirically confirmed as per-open/first-load (not session-batched) by capturing a real payload from the live 2.1.185 harness. This was the make-or-break unknown, and it is attested in the verification report for INV-012b. (2) The /cwtf hook-edit filter canonicalizes .file to repo-relative (ltrimstr($root+"/")) before anchoring on ^hooks/ or ^\.correctless/hooks/, because real audit-trail .file values are a mix of absolute and relative paths. Anchoring without canonicalization missed every absolute path (the dogfood repo’s common case) and masked a dead channel. This bug survived three fix-rounds because fixtures used only relative paths (AP-031 fixture/producer divergence); the fix added real absolute-path fixtures. (3) The log is intentionally unbounded and not SFG-protected: it is per-session telemetry, not a security asset (PRH-002), and forgery only misleads a human who has the liveness counts in view.
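
Decision (2) can be illustrated in plain bash. The real filter does this inside jq with ltrimstr; the sketch below is a bash analogue with a hypothetical function name, shown only to make the canonicalize-then-anchor order concrete.

```shell
#!/usr/bin/env bash
# Hypothetical bash analogue of the /cwtf hook-edit filter.
# $1 = repo root, $2 = .file value from the audit trail (absolute or relative).
is_hook_edit() {
  local root=${1%/} file=$2
  file=${file#"$root"/}            # canonicalize: strip "<root>/" if present
  case "$file" in
    hooks/*|.correctless/hooks/*) return 0 ;;   # anchored at the start
    *)                            return 1 ;;
  esac
}
```

Without the prefix strip, an absolute value such as /repo/hooks/a.sh never starts with hooks/, so an anchored match silently drops it; with it, absolute and relative values classify the same way.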

2026-07-04 — Sanctioned sole-writer for SFG-protected meta artifacts

What was built and why. Three .correctless/meta/*.json artifacts — intensity-calibration.json, pat001-measurement-due.json, model-baselines.json — are in the sensitive-file-guard (SFG) DEFAULTS list, so the skills documented to write them (/cverify, /cdocs, /cmodelupgrade) were silently blocked whenever they tried an Edit/Write. Each write no-op’d, and the downstream consumers (/cspec intensity recommendations, /cmetrics, /cmodelupgrade regression reports) kept reading frozen data while looking healthy — the silent-telemetry-failure class, filed as #189/#192/#226. This is AP-037: the protected asset is the deliverable and the guard has no legitimate-write affordance. The feature adds one Bash-invoked sanctioned writer (SFG does not inspect Bash after sfg-edit-write-only), rewires the three skills onto it, and closes the AP-037 class structurally for meta json.

What was written. scripts/meta-record.sh (+ the .correctless/scripts/ mirror) dispatches three operations, each with a hardcoded destination (PRH-005): calibration-append (append one object to calibration_entries[] from stdin, deep-equal-preserving), pat001-set-created-at <sha> (set created_at_commit only when the field is present and literally null — the fix for the old blanket-scan that matched absent keys and polluted sibling files), and baselines-write <model>|<version> (key-merge preserving siblings + schema_version, never a whole-file overwrite). A CI/test-only registry scripts/sanctioned-meta-writers.tsv maps every protected meta file to its writer; the class-closure test reads it but the writer does not (DD-007). scripts/meta-pollution-detect.sh is an advisory detector for pre-existing #226/#192 pollution, surfaced in /cstatus. The three SKILL.md files had their Write(...meta...json) grants dropped and Bash(*meta-record.sh*) added, plus prose to echo the meta-record: FAILED token verbatim. scripts/lib.sh lock helpers were re-gated (see below).
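
The "present and literally null" guard for pat001-set-created-at can be sketched in jq. In jq, reading an absent key also yields null, so a bare null comparison matches absent keys; the has() check is what separates the two cases. The function name here is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: set created_at_commit only when the key exists AND its
# value is null. An absent key or an already-set value leaves the document
# unchanged. Input is assumed to be a JSON object on stdin.
set_created_at_if_null() {
  local sha=$1
  jq -c --arg sha "$sha" '
    if has("created_at_commit") and .created_at_commit == null
    then .created_at_commit = $sha
    else .
    end'
}
```

A caller would still need to turn the unchanged case into the "no change" outcome; this sketch only shows the predicate.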

How it works. The writer reuses the ABS-003 lock helpers (_acquire_state_lock/_release_state_lock) directly and hand-rolls only a tri-state read-validate-decide-atomic-rename body inside the lock (PRH-006/DD-008) — it deliberately does not call the two-state locked_update_file, which cannot express the three exit states and would deadlock if wrapped in a pre-lock on the same .lock. The three exit states are: 0+success line (write landed, valid JSON), 0+no change: <reason> (guarded no-op, no bytes rewritten), and non-zero + the mechanical meta-record: FAILED <file>: <reason> stdout token (rejected/failed). That token is the seam the skills echo so fail-loud is provable rather than prompt-level. Input is byte-capped at 64 KB with wc -c and passed via stdin/temp-file, never argv (AP-039). The destination and its nearest existing parents get a fail-closed realpath/readlink -f symlink verdict (_realpath_tool_available, PAT-020) before any mkdir/temp and again before mv — never the lexical canonicalize_path.
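
The three exit states and the byte cap can be sketched as follows. This is a simplified illustration with a hypothetical function name: the lock, the JSON validation, and the symlink verdict described above are all omitted, and the success-line wording is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical tri-state writer sketch. Only the three outcomes and the
# byte cap are shown; locking, validation, and symlink checks are omitted.
MAX_BYTES=65536   # 64 KB input cap

write_tristate() {
  local dest=$1 tmp size
  tmp=$(mktemp "${dest}.tmp.XXXXXX") || {
    echo "meta-record: FAILED $dest: cannot create temp file"; return 1; }
  # Read at most cap+1 bytes from stdin, so oversize input is detected
  # without buffering all of it. Input never travels through argv.
  head -c "$((MAX_BYTES + 1))" >"$tmp"
  size=$(wc -c <"$tmp" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    rm -f "$tmp"
    echo "meta-record: FAILED $dest: input exceeds $MAX_BYTES bytes"; return 1
  fi
  if [ -f "$dest" ] && cmp -s "$tmp" "$dest"; then
    rm -f "$tmp"
    echo "no change: content identical"; return 0   # guarded no-op
  fi
  mv -f "$tmp" "$dest" || {
    rm -f "$tmp"; echo "meta-record: FAILED $dest: rename failed"; return 1; }
  echo "ok: wrote $dest"                            # landed via rename
}
```

Because the FAILED token is printed on stdout with a fixed prefix, a calling skill can echo it verbatim and a test can grep for it, which is what makes the fail-loud behavior checkable.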

Non-obvious decisions. (1) The ABS-003 lock primitive was re-gated through mkdir → O_EXCL → ln → flock (with an ln/mkdir fallback). mkdir was non-atomic on the sandbox overlay (INV-007, two contenders both return 0); an O_EXCL pid-create passed locally but failed CI (empty-pid grace-loop raced under real parallelism); an ln hard-link create-with-content was better but still flaked once on the jq-1.8 CI runner (1 double-hold in the 25-way stress) — file-creation locks with PID stale-detection can’t guarantee mutual exclusion under adversarial scheduling. The fix is kernel flock on a persistent ${sf}.flock fd (race-free, auto-released on death), the default on Linux/CI, with the ln/mkdir dir-lock kept for macOS portability (command -v flock; CORRECTLESS_LOCK_IMPL=ln forces the fallback; deterministic R-tests use it, QA-002 stresses the flock path). Traps hit along the way: a 2>/dev/null on an exec line is PERMANENT (it silenced _fail’s stderr diagnostic process-wide — wrap fd execs in a { …; } 2>/dev/null group); never delete a flock lockfile on release (reopens a double-hold window). Lesson: file-creation locks are fundamentally racy under extreme scheduling — reach for flock when true exclusion matters, and prove concurrency changes on CI, not just locally. (2) The unknown-field schema policy is permissive, not strict: an unknown extra field on a calibration entry is accepted and preserved, so forward-compatible producer-schema growth cannot cause silent data loss (INV-002/RS-007). (3) baselines-write is a key-merge, not a whole-file write, specifically to bound the single-writer blast radius the codex spec review flagged (EXT-002) — a schema_version mismatch fails loud rather than clobbering real-but-wrong data. (4) AP-037 recursion: the writer is itself SFG-protected (INV-005), so building the class-closing deliverable required the AP-037 lift-and-restore affordance (.claude/rules/sfg-deliverable.md, ABS-041) on its own guard. 
Mechanism honesty (PMB-020/AP-040): SFG is a cooperative-loop guardrail, not a security perimeter — “sole writer” means the sanctioned/expected path enforced against agent Edit/Write, with wrong-content protection coming from the writer’s validation and the append-only/key-merge tests; out-of-band Bash writes are an accepted non-goal.
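
The flock path from decision (1) can be sketched as a small wrapper. The name with_flock is hypothetical, and the sketch assumes bash 4.1+ (for {fd} allocation) and the util-linux flock(1) utility; the ln/mkdir fallback is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command while holding an exclusive kernel flock
# on a persistent lockfile. $1 = lockfile, remaining args = command to run.
with_flock() {
  local lockfile=$1 fd rc
  shift
  # Group the exec so 2>/dev/null applies only to this group; a bare
  # "exec ... 2>/dev/null" would silence stderr for the rest of the process.
  { exec {fd}>>"$lockfile"; } 2>/dev/null || return 1
  if ! flock -x "$fd"; then
    exec {fd}>&-
    return 1
  fi
  "$@"
  rc=$?
  exec {fd}>&-   # closing the fd releases the lock; the lockfile is never deleted
  return "$rc"
}
```

A stress check in the spirit of the one described above would launch many concurrent read-increment-write updates through the wrapper and assert that the final counter equals the number of contenders.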

2026-07-05 — Audit-Trail File-Repo Attribution (#244)

What was built and why. hooks/audit-trail.sh is a PostToolUse telemetry hook that records every file edit into a per-workflow JSONL trail. It derived the artifacts dir, branch slug, state/trail/config paths, the record branch field, and the full-mode adherence file all from the hook’s cwd. When the harness edited a file in a sibling git repo/worktree while cwd was a different repo, the event landed under the wrong repo’s trail — or, if cwd had no .correctless/artifacts dir, was dropped silently. This is the silent-telemetry-failure class the project has a standing antipattern for: /cmetrics/adherence looked healthy while measuring the wrong (or no) repo. The fix reattributes each event to the edited file’s own repo.

What code was written. One production file, hooks/audit-trail.sh (plus tests/test-audit-trail.sh and the synced correctless/hooks/ mirror). New hook-local functions: _resolve_file_repo (walk up from the path’s nearest existing ancestor dir, run git --no-optional-locks -C <dir> rev-parse --show-toplevel; prints root + rc 0 in a repo, nothing + rc 1 otherwise — the distinction attribution needs to no-op vs. log to the wrong repo), _nearest_existing_dir, and _resolve_cached (memoization). The old cwd fast-path ([ -d ".correctless/artifacts" ] || exit 0 before stdin was even read) was removed; the main flow now parses stdin first, resolves each edited file to its repo, groups records per-repo preserving input order, and processes each group in _process_repo (slug/STATE_FILE/CONFIG_FILE/TRAIL/branch/adherence all derived from F).

How it works — the two bugs the workflow caught. The resolver is authoritative because git -C <dir> rev-parse returns the innermost enclosing repo for a directory. The first implementation memoized with a prefix cache (first known repo root that is a prefix of the file wins), which the QA agent found silently misattributes a nested-repo/submodule file to its parent when the parent resolved first (QA-001). The fix — memoize by the file’s nearest-existing directory instead — is both correct for nested repos and matches R-006’s stated O(unique nearest-existing dirs) cost target. The mini-audit’s hostile-input lens then found that the multi-file fan-out multiplexes paths through newline/TAB separators, and a Unix path may legally contain a newline, so a crafted file_path could split into two paths and forge a phantom record into a second repo’s trail (MA-001) — the exact silent-misattribution class the feature exists to kill, reintroduced through the delimiter. Fixed with a fail-open guard that skips any resolved path/repo containing a newline or TAB (a dropped telemetry record is safe; a forged cross-repo record is the bug).
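
The resolver idea and the delimiter guard can be sketched together. This is a re-creation under assumptions, not the hook's code: the function names differ from the real ones, and memoization is left out.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the resolver; names are illustrative.

# Walk up from a path (which may not exist yet) to the nearest existing dir.
nearest_existing_dir() {
  local p=$1
  while [ ! -d "$p" ]; do
    p=$(dirname -- "$p")
  done
  printf '%s\n' "$p"
}

# Print the innermost enclosing repo root and return 0, or print nothing and
# return 1 (not in a repo, or a path that is unsafe to multiplex).
resolve_file_repo() {
  local file=$1 dir root
  # Fail open on delimiter bytes: skip the record rather than risk one path
  # splitting into two when paths are joined with newline/TAB separators.
  case "$file" in *$'\n'*|*$'\t'*) return 1 ;; esac
  dir=$(nearest_existing_dir "$file")
  root=$(git --no-optional-locks -C "$dir" rev-parse --show-toplevel 2>/dev/null) || return 1
  case "$root" in *$'\n'*|*$'\t'*) return 1 ;; esac
  printf '%s\n' "$root"
}
```

Asking git from the file's own nearest directory is what makes nested repos resolve correctly; a cache keyed on that directory preserves the property, while a cache keyed on "first known root that is a prefix" does not.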

Design decisions that aren’t obvious. (1) The resolver is deliberately local to the hook, not extracted into lib.sh — lib.sh is SFG-protected, so editing it would trip the AP-037 self-guard and require the lift-and-restore dance. This duplicates #242’s walk idiom in workflow-gate.sh; that’s an accepted, documented ABS-001 deferral (spec R-008) with a follow-up to extract a shared try_repo_root_for. (2) No security hardening (safe.directory/GIT_CEILING/-c core.*/timeout): per PMB-020/AP-040 this hook is a cooperative-loop guardrail on fail-open telemetry, not a security perimeter — an earlier (dropped) version of the spec grew a whole planted-repo-defense edifice that a 7-lens review flagged as the PMB-020 category error, prompting the narrowing to audit-trail-only. (3) The empty-branch guard: git -C F branch --show-current is empty on a detached HEAD; that empty value must never be passed bare into branch_slug() (lib.sh:105 treats an empty arg like no arg and falls back to the cwd branch), so an empty branch is a hard no-op. (4) PAT-005 fail-open is preserved on every path — missing git, resolver miss, malformed stdin all degrade to exit 0. Uncovered pre-existing observations (Bash get_target_file not write-aware; prune-scan adherence-*.json slug bug; MultiEdit edits[].file_path shape) were left out of scope for follow-up issues.

2026-07-06 — Generated test-count artifact (agent-context-count-sync, #219)

What was built and why. GitHub #219 is a deadlock between two /cchores rules: INV-010 forbids the chore diff from touching four shared prose docs (AGENT_CONTEXT.md, ARCHITECTURE.md, CLAUDE.md, README.md), while tests/test-ap031-fixture-divergence.sh R-006(c) required AGENT_CONTEXT.md to document a test-file count >= actual. A /cchores fix whose TDD repro is a net-new test file bumps actual, stales the doc, and the one edit that clears it is INV-010-forbidden — so the verified fix aborts (observed on PR #218 and #252). The root cause is a derivable fact hard-coded in an edit-restricted prose doc. Option 2 (this feature) decouples it: the authoritative count moves to a tracked, unprotected, generated artifact tests/test-inventory.json that any actor regenerates freely, and R-006(c) checks that artifact against a recomputed actual. INV-010 is left completely unchanged — the pivot deletes the whole class of hazards a cross-model review flagged against an exception-based Option 1.

What code was written. One new production script scripts/gen-test-inventory.sh (+ the correctless/scripts/ mirror) with write/count subcommands; a new tracked artifact tests/test-inventory.json ({"schema_version":1,"test_file_count":N}); the repointed R-006(c) block in tests/test-ap031-fixture-divergence.sh; consumer-scoped regeneration wiring + allowed-tools in skills/{cchores,ctdd,cdocs}/SKILL.md (+ mirrors); the AGENT_CONTEXT.md Tests-row converted to ~110 test scripts + pointer (INV-007); and ABS-048. Tests: tests/test-gen-test-inventory.sh (generator, resolver, count universe, validation matrix) and tests/test-test-inventory-wiring.sh (SFG/INV-010, wiring mechanism, distribution, allowed-tools).

How it works — the mechanism and the bugs the workflow caught. The single shared count command (INV-002) computes “actual” over the git index (git ls-files --cached -z -- 'tests/test*.sh', direct children only) so an untracked scratch file cannot perturb it and a clean CI checkout matches; R-006(c) obtains actual only from count, so writer and consumer can never drift. The load-bearing subtlety is staging order: because the universe is the index, the wiring must stage the net-new test files before regenerating and then stage the artifact into the same commit — the negative arm of the #219 repro test proves this ordering is load-bearing. The count logic was revised across three rounds and each revision was itself untested code (PMB-002): QA-001 made a bare git pipe fail-loud with set -o pipefail, but the mini-audit’s hostile-input lens found that fix had linearized the NUL stream (tr '\0' '\n') so a committed filename containing a newline split into extra counted lines and inflated the count identically in writer and consumer — a silently-wrong R-006(c) PASS (MA-H1). The fix keeps RS="\0" in awk (one git record ≤ one count) with a ^tests/test[^/]*\.sh$ filter re-asserting the shape. The round-2 re-audit then found the env -u GIT_DIR -u GIT_WORK_TREE pin was enumeration-incomplete — git ls-files --cached honors GIT_INDEX_FILE too — and replaced it with env -i PATH=… HOME=… LC_ALL=C git -C "$ROOT" … (clear the whole ambient env, allowlist the essentials), the 2026-04-28 “enumeration is class-incomplete” lesson applied to git env vars (MA-R2-001).
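
The count discipline (index-only universe, NUL-delimited records, cleared git environment) can be sketched as below. Two things here are substitutions made for this example: the function name is hypothetical, and a bash read loop stands in for the awk RS="\0" filter. The sketch also does not propagate a git failure, which the real command does.

```shell
#!/usr/bin/env bash
# Hypothetical count sketch. One NUL-terminated git record is at most one
# count, so a filename containing a newline cannot inflate the total.
# $1 = repo root.
count_test_files() {
  local root=$1 n=0 f base
  while IFS= read -r -d '' f; do
    base=${f#tests/}
    # Direct children of tests/ only: test*.sh with no further slash.
    if [[ $f == tests/test*.sh && $base != */* ]]; then
      n=$((n + 1))
    fi
  done < <(env -i PATH="$PATH" HOME="${HOME:-/}" LC_ALL=C \
             git -C "$root" ls-files --cached -z -- 'tests/test*.sh')
  printf '%s\n' "$n"
}
```

Clearing the whole environment with env -i and allowlisting only what git needs means an ambient GIT_INDEX_FILE, GIT_DIR, or GIT_WORK_TREE cannot redirect the count, without having to enumerate each variable.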

Design decisions that aren’t obvious. (1) The artifact is deliberately NOT a sole-writer and NOT in SFG DEFAULTS — the inverse of the ABS-029/030/042/047 family. It must stay unprotected so every actor can regenerate it; adding that protection re-arms the #219 deadlock. ABS-048 carries a mandatory load-bearing deviation note so a future SFG-hardening reflex (or an audit) does not “correct” the absence. It borrows only the tri-state FAILED-token exit discipline from meta-record.sh, not the lock or the protection. (2) The repo root is resolved from the generator’s own ${BASH_SOURCE[0]} via a marker-confirmed two-layout discriminator (source scripts/ → root ..; installed .correctless/scripts/ → root ../.., but only when the R-006(c) consumer marker actually exists at the installed candidate root) — never $PWD, never git rev-parse --show-toplevel (which breaks in probe worktrees and under GIT_DIR). This keeps write and count resolving the same tests/ in every wired context and makes a downstream install with no consumer no-op gracefully. (3) The AGENT_CONTEXT.md Tests-row uses a ~ prefix specifically so prune-scan.sh scan_counts’s digit-anchored extractor skips it — without that, autonomous /cprune would “correct” it back to an exact figure and re-arm the drift loop the feature exists to kill. (4) The adversarial probe round was skipped: isolation: "worktree" pins to the base-ref (main), not the feature commit, so it cannot see unmerged work — a documented safe degradation; the mini-audit (which reads the committed tree) carried the adversarial load instead.

2026-07-06 — /cchores Protected-File Affordance (PRH-003 v2)

What was built and why. /cchores was fail-closed against every SFG-protected file — a 2026-07-06 no-op run found 13 of 53 backlog bugs unreachable because the fix touched a DEFAULTS path. This feature is the v2 unblock: /cchores <N> explicit-issue mode treats the human’s explicit issue number as tacit authorization to fix a conservative, non-security subset of protected infra, gated three ways — mode (explicit only; no-arg is unchanged v1), eligibility (# affordance-tagged DEFAULTS lines only), and scope (a branch- and file-scoped per-run marker). Currently only scripts/prune-scan.sh and scripts/harness-fingerprint.sh carry the # affordance tag; secrets, security/sole-writer guards, lib.sh, and state artifacts are # other-floor/# secret-floor and never reachable.

What code was written. hooks/sensitive-file-guard.sh gained a single-source 3-way DEFAULTS tag (# affordance/# secret-floor/# other-floor) from which is_secret_floor()/is_affordance_eligible() derive (QA-001 deleted the old _SFG_LEGACY_EXACT_LINE_MIRROR duplicate); a main-guard so the classification helpers can be sourced side-effect-free; and the _sfg_affordance_allows allowlist check plus the _sfg_marker_binds_branch/_sfg_custom_match STEP-9 helpers. Two new scripts: scripts/chores-authorize.sh (sole marker writer with write --issue N / clear / check / check-capability subcommands, registered in scripts/sanctioned-chores-writers.tsv) and scripts/cchores-diff-check.sh (mode-aware diff/pre-selection gate + --check-classification set-equality immutability gate). scripts/cchores-emit.sh gained the INV-010 banner flags. skills/cchores/SKILL.md was rewired (PRH-003 v2 mode-gating, capability handshake, allowed-tools for both scripts, disallowed-tools marker exclusion). Docs: ABS-049, ABS-045 rewrite, the PAT-001 second carve-out, and the sfg-deliverable.md AP-037 enumeration.

How the ALLOW/BLOCK path works. On each Edit/Write target, STEP 9 canonicalizes, checks DEFAULTS/custom match (no match → allow), then classifies the tag. # secret-floor is deny-first (BLOCK before the allowlist is ever consulted). # other-floor/untagged/custom → BLOCK. Only # affordance reaches the allowlist, which itself fails closed on a present-but-unparseable/wrong-type custom_patterns config (MA-006/MA-012 — the user’s re-protection must win, so a config-degrade cannot convert a BLOCK into an ALLOW), then on a custom_patterns overlap, then in _sfg_affordance_allows: resolve the target’s own worktree (never the hook cwd — AP-035), require a parseable marker binding the byte-exact branch + numeric chore/issue-<N>-* name + matching manifest run_id + allowed_paths membership. Any failure or ambiguity exits 2; only the fully-verified predicate exits 0.
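
The ordering of the checks can be summarized as a small decision table. This is a sketch with a hypothetical function name and pre-computed string inputs; the real hook derives each input from DEFAULTS tags, the config file, and the marker.

```shell
#!/usr/bin/env bash
# Hypothetical decision table. Returns 0 (ALLOW) or 2 (BLOCK), nothing else.
# $1 match:   "matched" if the target hit DEFAULTS/custom, anything else if not
# $2 tag:     secret-floor | other-floor | affordance | (anything else)
# $3 config:  "ok" if custom_patterns parsed cleanly
# $4 overlap: "none" if no custom_patterns entry re-protects the target
# $5 marker:  "verified" if branch, run_id, and allowed_paths all check out
sfg_decide() {
  local match=$1 tag=$2 config=$3 overlap=$4 marker=$5
  [ "$match" = "matched" ] || return 0     # no match: allow
  case "$tag" in
    secret-floor) return 2 ;;              # deny-first, allowlist never consulted
    affordance)   ;;                       # only this tag reaches the allowlist
    *)            return 2 ;;              # other-floor, untagged, custom
  esac
  [ "$config" = "ok" ] || return 2         # degraded config cannot turn BLOCK into ALLOW
  [ "$overlap" = "none" ] || return 2      # the user's re-protection wins
  [ "$marker" = "verified" ] || return 2   # any failure or ambiguity blocks
  return 0
}
```

Every check after the tag classification can only move the result toward BLOCK, so there is a single path to exit 0 for a protected file.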

Patterns. ABS-049 (branch+file-scoped conditional-allow allowlist + per-run marker) See-links ABS-045 (the SFG capability boundary, rewritten to a TWO-non-strict-behaviors posture). PAT-001 gains a second carve-out worded to NOT loosen clause 5 (every failure path stays exit-2). chores-authorize.sh joins the sole-writer family (ABS-029/030/042/047) and the AP-037 lift-and-restore deliverable enumeration. AP-040/PMB-020 mechanism honesty is threaded through every invariant: SFG is a cooperative-loop guardrail, never a perimeter.

Design decisions that aren’t obvious. (1) Conservative eligible set — the affordance is limited to non-security infra whose fix cannot weaken a security control, a sole-writer contract, SFG’s own matching, or run state; everything else defaults to # other-floor. This deliberately keeps the ABS-029/030/042/047 sole-writer scripts NON-eligible even though they are “just scripts.” (2) Guardrail-not-perimeter framing — the marker/allowlist stops only the naive Edit/Write; the authoritative confinement against injection is the marker-independent cchores-diff-check.sh secret-floor + shared-doc legs plus never-merge + redaction + the INV-010 banner. Leg (c) out-of-scope and the SFG allowlist are honestly labeled guardrail-only because the marker is Bash-forgeable. (3) Fail-closed everywhere — the git/jq/manifest reads are guarded (2>/dev/null || …), the hook exits only 0 or 2, and the ALLOW-gating config read fails closed independently of the deny-list degrade (MA-006/MA-012). (4) Honest-scoped INV-005 — the per-run run_id nonce makes a leaked marker inert against a later run, but the MA-011 fix scoped both the spec wording and the writer comment to that exact claim and named the crash-window manual edit as an accepted residual, rather than overclaiming unconditional inertness. (5) QA-004 escalated, not autonomously resolved — the marker binds to /cchores’s real chore-run manifest by filename (via branch_slug()), but adding a documented run_id field to /cchores’s INV-007 schema and ratifying minting ownership is a cross-skill interface change carried forward for human adjudication.
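The guarded-read shape mentioned in (3) might look like the wrapper below. It is a sketch, not the hook's code: guarded_read is a hypothetical helper name, and treating an empty result as a failure is an assumption made here for illustration.

```shell
# Run a reader command with stderr silenced. Any failure, or an empty result
# (an assumption of this sketch), returns 2 so the caller blocks instead of
# proceeding on missing data.
guarded_read() {
  local out
  out=$("$@" 2>/dev/null) || return 2
  [ -n "$out" ] || return 2
  printf '%s\n' "$out"
}
```

A caller would then write something like branch=$(guarded_read git -C "$worktree" rev-parse --abbrev-ref HEAD) || exit 2, where worktree is a placeholder variable; the read either yields a usable value or ends the hook with the BLOCK code.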

2026-07-10 — Design Contract Lens Registry (design-contract-lens-sync)

What was built and why. Over PMB-013 → PMB-020 the project documented eight new /creview-spec Design Contract Checker lenses — cardinality, tool-surface, content-fidelity, extraction-rejection, authoring-affordance, gate-scope, unbounded-input-bounded-medium, mechanism-capability-mismatch — each recorded in CLAUDE.md as a “Design Contract Checker addition.” But the enforcing agent (agents/review-spec-design-contract.md) carried one generic lens and referenced none of them, and neither the agent nor the /creview-spec preamble loads CLAUDE.md. The documented lenses never reached the reviewer meant to apply them — PMB-016 (corrective-action described-but-not-implemented, AP-036) reproduced by the contract mechanism against itself, surfaced as DA-001 in the 2026-07-10 devil’s-advocate report. The clinching detail: the original spec draft seeded only seven lenses and dropped PMB-017’s authoring-affordance — the exact documented-but-unwired gap this feature exists to close, reproduced in the seed. The seed was re-derived to eight by enumerating every “Design Contract Checker” addition in CLAUDE.md rather than trusting a hand count.

What code was written. A new registry agents/design-contract-lenses.tsv (4-column TSV: lens_id, keyword, source_pmb, summary) seeded with the eight lenses (DCL-001 cardinality/PMB-013 … DCL-008 mechanism-capability-mismatch/PMB-020). agents/review-spec-design-contract.md (+ its correctless/agents/ mirror) gained a ## PMB-derived lenses section with one bullet per registry row — each tagged with its DCL-NNN id, the registry keyword verbatim, a directive term, and a concrete when/if condition — plus a row-format template, a worked example, an “adding a lens” runbook, a seed-retirement note, and a migration-seam note. tests/test-design-contract-lens-sync.sh (73 tests) is the enforcer. skills/cpostmortem/SKILL.md (+ mirror) Step 3 gained the prompt-level convention note. Companion bumps: _typos.toml (8 keywords + registry exclude), CONTRIBUTING.md and tests/test-inventory.json (111→112 test files), and an incidental relative-dates fix in tests/test-cross-feature-intel.sh (PMB-001’s 2026-04-10 fixture crossed the intel script’s 90-day recency window on ~2026-07-09 — an AP-024/bound-drift keep-the-suite-green fix). ABS-050 was authored during /cupdate-arch (index line in .correctless/ARCHITECTURE.md + full body in docs/architecture/abstractions.md).
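A well-formedness check for the 4-column registry could be sketched as below. This is an illustrative guess at the shape, covering only a header check, CRLF rejection, field count, and the DCL-NNN id format; the real validate_registry in the test file may differ in both rules and exit codes.

```shell
# Returns 0 when the file looks like a well-formed 4-column registry,
# non-zero otherwise. Sketch only; the rule set is an assumption.
validate_registry() {
  awk -F'\t' '
    NR == 1 {
      # An exact header match also rejects a leading BOM or a trailing CR.
      if ($0 != "lens_id\tkeyword\tsource_pmb\tsummary") { bad = 1; exit }
      next
    }
    /\r$/                         { bad = 1; exit }  # CRLF line ending
    NF != 4                       { bad = 1; exit }  # missing or extra field
    $1 !~ /^DCL-[0-9][0-9][0-9]$/ { bad = 1; exit }  # id must be DCL-NNN
    END { if (bad || NR == 0) exit 1 }
  ' "$1" 2>/dev/null
}
```

Keeping the check as a callable function means each malformed fixture can be passed to it directly and asserted to fail, without running the rest of the suite.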

How it works. The registry is the single machine-read source of truth; the agent bullets are derived from it; the test binds the two by set-equality both ways — every registry lens_id is referenced in the agent (completeness, INV-001) and every DCL-NNN in the agent maps to a registry row (no-orphans, INV-002), catching both documented-not-implemented and orphan/typo ids. Two extractors share one pinned token regex (DCL-[0-9]{3} with non-alnum/line-edge boundaries) but differ in scope: a full-file scan catches an orphan DCL-999 placed outside the section (which a section-only scan would miss), while a section-scoped scan drives the per-lens bullet checks. A >= 8 non-empty floor on both sets before comparison closes the vacuous-pass trap (two empty sets are trivially “equal” — the AP-022 dead-code shape). Anti-gaming (INV-005) requires each bullet to carry keyword + directive (excluding the keyword span) + a when/if condition + ≥24-char post-strip body, so a bare stub fails; whole-word matching is done in awk (no \b, no grep -o/-w/-P per EA-002/ENV-006). Registry well-formedness (INV-003) is a callable validate_registry driven by a 15-case malformed-fixture suite (BOM, CRLF, wrong header, embedded-tab 5th field, 2-digit id, etc.).
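The extraction and two-way comparison described above can be sketched like this. The function names (extract_ids, registry_ids, sets_match) are hypothetical and the real test is more detailed; the sketch only shows the boundary-checked token scan in awk and why the non-empty floor has to run before the equality check.

```shell
# Print each DCL-NNN token in a file once, sorted. A token counts only when the
# characters on both sides are non-alphanumeric or a line edge (awk only, no \b).
extract_ids() {
  awk '{
    pos = 1
    while (match(substr($0, pos), /DCL-[0-9][0-9][0-9]/)) {
      start = pos + RSTART - 1
      pre   = (start > 1) ? substr($0, start - 1, 1) : ""
      post  = substr($0, start + RLENGTH, 1)
      if (pre !~ /[A-Za-z0-9]/ && post !~ /[A-Za-z0-9]/) {
        print substr($0, start, RLENGTH)
      }
      pos = start + RLENGTH
    }
  }' "$1" | sort -u
}

# First column of every non-header registry row, sorted and de-duplicated.
registry_ids() {
  awk -F'\t' 'NR > 1 && $1 != "" { print $1 }' "$1" | sort -u
}

# Usage: sets_match LEFT RIGHT FLOOR (LEFT and RIGHT are sorted, unique lists).
sets_match() {
  local left="$1" right="$2" floor="$3" n_left n_right
  n_left=$(printf '%s\n' "$left" | awk 'NF { c++ } END { print c + 0 }')
  n_right=$(printf '%s\n' "$right" | awk 'NF { c++ } END { print c + 0 }')
  # Floor first: two empty sets would otherwise compare as equal.
  [ "$n_left" -ge "$floor" ] && [ "$n_right" -ge "$floor" ] || return 1
  # Identical sorted, unique lists mean equality in both directions.
  [ "$left" = "$right" ]
}
```

A call such as sets_match "$(registry_ids registry.tsv)" "$(extract_ids agent.md)" 8 (file names are placeholders) fails on a missing lens, on an orphan id, and on two empty inputs alike.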

Non-obvious design decisions. (1) Registry-as-SSOT is the INVERSE authority direction of the ABS-047 sole-writer family (DD-005). In ABS-047 the registry is derived: the SFG DEFAULTS heredoc is the source of truth and the completeness test enumerates the primary set. Here the registry is the source of truth and the agent is derived from it (same completeness-test technique, opposite authority direction), so this feature authored its own ABS-050 entry rather than folding under ABS-047. (2) The substance loop iterates the LIVE registry, not the hard-coded seed (QA-003). An earlier draft checked substance against the 8-row seed array, which would let a future DCL-009 row escape the keyword/condition/body-floor checks: the exact dead-substance-check trap the feature exists to prevent, reproduced one level up. The fix iterates registry_lens_pairs parsed from the real file, with a DCL-009 test-the-test proving the loop covers rows beyond the seed. (3) PRH-001 self-scan tightening from codex (CV-001, HIGH): the self-scan whitelist originally matched any line with grep + the single-quoted CLAUDE.md needle without pinning the trailing file argument, so a prohibited grep '<needle>' CLAUDE.md (reading CLAUDE.md, the coupling PRH-001 forbids) would have passed. Tightened to strip the needle and reject any residual guidance-file token, refactored into a callable prh001_scan_source with three new test-the-test assertions. (4) CLAUDE.md → registry stays a prompt-level /cpostmortem Step-3 convention (DD-004), deliberately NOT a prose-scan: scanning CLAUDE.md to derive the lens set would re-introduce AP-031/AP-036 (convention-blind extraction over a prose document), the exact origin class. The structural leg is registry↔agent; the author-facing leg is the cpostmortem instruction. (5) Source-only registry (DD-001): sync.sh mirrors only agents/*.md, so the .tsv never ships; INV-009’s property-general find correctless/ -name 'design-contract-lenses.tsv' (not a hardcoded path) is the sole backstop against a future relocation shipping it. (6) Mechanism honesty (AP-040/PMB-020): INV-005 proves wiring, keyword-binding, and the presence of a trigger, not semantic correctness (R-A); an author who writes a plausible-but-wrong condition passes, with PR review + the mini-audit lens-body-anti-gaming lens as the backstop. The residual is named, not silently over-claimed.
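The strip-then-reject idea behind the PRH-001 tightening in (3) can be illustrated with a per-line check. This is a hypothetical reconstruction: prh001_line_ok is an invented name, the needle is assumed to be the literal 'CLAUDE.md', and CLAUDE.md is assumed to be the only guidance file in play; the real prh001_scan_source may be structured differently.

```shell
# Returns 0 when a source line is acceptable, 1 when it would read the
# guidance file. Sketch under the assumptions stated in the lead-in.
prh001_line_ok() {
  local line="$1" rest
  # Lines that never mention the guidance file are out of scope.
  case "$line" in *CLAUDE.md*) ;; *) return 0 ;; esac
  # Only a grep carrying the single-quoted needle can be whitelisted.
  case "$line" in *grep*"'CLAUDE.md'"*) ;; *) return 1 ;; esac
  # Strip every quoted needle, then reject any guidance-file token left over.
  rest=$(printf '%s\n' "$line" | sed "s/'CLAUDE\.md'//g")
  case "$rest" in *CLAUDE.md*) return 1 ;; esac
  return 0
}
```

Under this sketch, grep -c 'CLAUDE.md' agents/review.md passes because nothing remains once the needle is stripped, while grep 'CLAUDE.md' CLAUDE.md fails on the residual file argument, which is the case a needle-only whitelist would have let through.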