Harness Fingerprint + Model Upgrade Detection
Detect when Anthropic ships a new model or silently updates harness defaults, and produce a regression report against a captured baseline. Spec: `.correctless/specs/harness-fingerprint.md`. Architecture: ABS-027, ABS-028.
What It Does
Correctless’s correctness model implicitly depends on a single Anthropic model version’s uncontracted behavioral defaults — length caps, parallel-tool-call preferences, anti-defensive code priors, in-context skill inlining. When the model or the harness changes, the workflow can regress silently. The 4.6 → 4.7 audit (OPUS_4_7_MIGRATION.md) made this concrete: 3 distinct findings, none surfaced by metrics, none caught by tests.
This feature ships two bundled mechanisms:
- Deterministic fingerprint: `scripts/harness-fingerprint.sh` computes the literal string `"{model_name}|{HARNESS_VERSION}"` (no hashing, so it can be debugged by reading the file directly), where `HARNESS_VERSION` is a manually bumped integer constant maintained by the human. `/cspec` invokes the script at Step -1, advisory-only. When the fingerprint differs from the stored value, a one-time `version_bumped` notification is shown.
- `/cmodelupgrade` skill: compares the current `{model}+{HARNESS_VERSION}` combination's per-feature pipeline metrics against a stored baseline and emits a regression report. It is read-only on the fingerprint store and the sole writer of the baseline file.
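The fingerprint rule above fits in a few lines of shell. This is a minimal sketch, assuming the model name and the previously stored fingerprint arrive as plain strings (the real script reads them from Claude Code's environment and from `harness-fingerprint.json`). `compute_fingerprint` and `classify_fingerprint` are hypothetical names, and the `new` / `version_bumped` status strings are illustrative, borrowed loosely from the `check` and `/cstatus` wording elsewhere in this document.

```bash
#!/usr/bin/env bash
# Sketch only: compute_fingerprint and classify_fingerprint are hypothetical
# helpers, not the interface of scripts/harness-fingerprint.sh.
set -euo pipefail

HARNESS_VERSION=1   # stand-in for the maintainer-bumped integer constant

# Build the literal, unhashed fingerprint "{model_name}|{HARNESS_VERSION}".
compute_fingerprint() {
  local model_name=$1 harness_version=$2
  printf '%s|%s\n' "$model_name" "$harness_version"
}

# Compare the current fingerprint with the stored one ("" = nothing stored yet).
classify_fingerprint() {
  local current=$1 stored=$2
  if [ -z "$stored" ]; then
    echo "new"
  elif [ "$current" = "$stored" ]; then
    echo "unchanged"
  else
    echo "version_bumped"   # any difference, model or version, triggers the notice
  fi
}

current=$(compute_fingerprint "claude-opus-4-7" "$HARNESS_VERSION")
echo "fingerprint=$current"
echo "status=$(classify_fingerprint "$current" "claude-opus-4-7|1")"
```

Because the stored value is the literal string rather than a hash, a mismatch such as `claude-opus-4-7|1` versus `claude-opus-4-7|2` shows at a glance that only `HARNESS_VERSION` moved.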
How It Works
```text
┌─────────────────┐      ┌────────────────────────┐      ┌──────────────────────┐
│ /cspec Step -1  │ ───▶ │ harness-fingerprint.sh │ ───▶ │ harness-fingerprint  │
└─────────────────┘      │ (sole writer)          │      │ .json                │
                         └────────────────────────┘      └──────────────────────┘
                                      │                              │
                                      ▼                              ▼
                         ┌────────────────────────┐      ┌──────────────────────┐
                         │ harness-notified-      │      │ /cstatus advisory    │
                         │ {session-id}.flag      │      │ line                 │
                         │ (per-session dedup)    │      └──────────────────────┘
                         └────────────────────────┘                  │
                                                                     ▼
                                                         ┌──────────────────────┐
                                                         │ /cmodelupgrade       │
                                                         │ (sole writer of      │
                                                         │ model-baselines)     │
                                                         └──────────────────────┘
```
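The per-session dedup box in the flow can be illustrated with a flag file that is created at most once. This is a sketch under stated assumptions: `notify_once` and its arguments are hypothetical, and only the `harness-notified-{session-id}.flag` naming pattern comes from this document. It relies on bash's noclobber option so that creating the flag doubles as the "already notified?" check.

```bash
#!/usr/bin/env bash
# Sketch only: notify_once is a hypothetical helper; the flag-file name
# pattern harness-notified-{session-id}.flag is the only part taken from
# this document.
set -euo pipefail

notify_once() {
  local artifacts_dir=$1 session_id=$2
  local flag="$artifacts_dir/harness-notified-$session_id.flag"
  mkdir -p "$artifacts_dir"
  # With noclobber (set -C) the redirect fails if the flag already exists,
  # so two concurrent callers cannot both create it.
  if ( set -C; : > "$flag" ) 2>/dev/null; then
    echo "shown"        # first time in this session: display the notice
  else
    echo "suppressed"   # flag present: the notice was already displayed
  fi
}

demo_dir=$(mktemp -d)
notify_once "$demo_dir" "sess-123"   # prints: shown
notify_once "$demo_dir" "sess-123"   # prints: suppressed
rm -rf "$demo_dir"
```

Keying the flag by session id means a new session sees the notice again, which matches the "one-time per session" behavior described above.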
Configuration
No project-level config knobs. The bumped integer constant lives in `scripts/harness-fingerprint.sh`:

```bash
# ============================================================================
# HARNESS_VERSION — INTEGER CONSTANT (PRH-006)
#
# Bumped manually by the maintainer when an Anthropic harness update is
# observed (see OQ-006 in spec for heuristic). Bumping this value triggers a
# version_bumped signal on the next /cspec invocation in any open session.
# DO NOT bump autonomously — sensitive-file-guard protects this script from
# autonomous Edit/Write once committed.
# ============================================================================
HARNESS_VERSION=1
```
When to bump (OQ-006 heuristic):
- A `/cmodelupgrade` regression report shows a >20% delta in any metric across consecutive same-model+version runs, OR
- The maintainer notices a behavioral change manually (spec quality drops, QA round counts climb without explanation, a 4.7-style audit pattern surfaces).
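The first trigger in the OQ-006 heuristic is a plain relative-delta test. A minimal sketch for a single metric, assuming the baseline and current values are already available as numbers; `delta_exceeds_20pct` is a hypothetical helper, and how `/cmodelupgrade` actually aggregates runs is not specified here. A zero baseline is treated as "any change counts", which is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: delta_exceeds_20pct is a hypothetical helper illustrating the
# OQ-006 ">20% delta" rule for a single metric; how /cmodelupgrade actually
# computes deltas is not specified here.
set -euo pipefail

# Exit 0 (true) when |current - baseline| / baseline > 0.20.
delta_exceeds_20pct() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    if (b == 0) {                # relative delta undefined: any change counts
      if (c != 0) { exit 0 } else { exit 1 }
    }
    d = (c - b) / b
    if (d < 0) d = -d
    if (d > 0.20) { exit 0 } else { exit 1 }
  }'
}

# Example: mean QA rounds per feature, baseline vs. current run.
if delta_exceeds_20pct 2.0 2.6; then
  echo "qa_rounds: >20% shift, consider bumping HARNESS_VERSION"
fi
```

The test is symmetric: a drop from 10 to 7 (30%) qualifies just like a climb, while 10 to 11 (10%) does not.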
Files Touched / Added
| Path | Role |
|---|---|
| `scripts/harness-fingerprint.sh` | Fingerprint script (sole writer of the fingerprint file) |
| `skills/cmodelupgrade/SKILL.md` | Regression report skill (sole writer of the baseline file) |
| `templates/test-features/baseline.md` | Reference feature template scaffolded by `/csetup` Step 2.6 |
| `.correctless/meta/harness-fingerprint.json` | Fingerprint store (`{fingerprint, harness_version, model, timestamp, schema_version}`) |
| `.correctless/meta/model-baselines.json` | Baseline metrics keyed by `{model}+{HARNESS_VERSION}` (`schema_version: 1`) |
| `.correctless/test-features/baseline.md` | User-editable scaffolded reference feature (idempotent: `/csetup` never overwrites it) |
| `.correctless/artifacts/harness-notified-{session-id}.flag` | Per-session notification dedup |
| `scripts/lib.sh` | Adds `get_current_session_id()` (cross-platform via `ps -o lstart=` → `/proc/{pid}/stat` → PID fallback) and `locked_update_file()` (generic locked read-modify-write) |
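The table lists `locked_update_file()` only as a "generic locked read-modify-write". The sketch below shows one way such a helper can work; it is not the `scripts/lib.sh` implementation. The name `sketch_locked_update`, its FILE-plus-filter signature, and the choice of a `mkdir` lock are all assumptions. `mkdir` is used because it is atomic and available on both Linux and macOS, which does not ship `flock(1)` by default.

```bash
#!/usr/bin/env bash
# Sketch only: sketch_locked_update is a hypothetical stand-in for the
# locked read-modify-write that scripts/lib.sh provides as
# locked_update_file(); the real signature and locking primitive may differ.
set -euo pipefail

# Usage: sketch_locked_update FILE FILTER [ARGS...]
# FILTER reads the current contents on stdin and writes the new contents
# to stdout.
sketch_locked_update() {
  local file=$1; shift
  local lock="$file.lock.d" tries=0 tmp status=0
  until mkdir "$lock" 2>/dev/null; do       # mkdir is atomic: only one winner
    tries=$((tries + 1))
    if [ "$tries" -ge 50 ]; then
      echo "could not lock $file" >&2
      return 1
    fi
    sleep 0.1
  done
  tmp=$(mktemp "$file.XXXXXX")               # same directory as the target
  # A missing file is treated as empty input.
  if { if [ -f "$file" ]; then cat "$file"; fi; } | "$@" > "$tmp"; then
    mv "$tmp" "$file"                        # atomic rename on one filesystem
  else
    status=1
    rm -f "$tmp"
  fi
  rmdir "$lock"
  return "$status"
}

# Demo: increment a counter stored in a file.
increment() { local n; read -r n || n=0; echo $((n + 1)); }
demo_file="$(mktemp -d)/counter.txt"
sketch_locked_update "$demo_file" increment
sketch_locked_update "$demo_file" increment
cat "$demo_file"   # prints: 2
```

Writing to a temp file and renaming means readers never see a half-written JSON file, even though they do not take the lock themselves.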
Integration Points
- `/cspec` Step -1: runs `harness-fingerprint.sh check` before the Socratic brainstorm (marker: `<!-- correctless:harness-fingerprint:invocation -->`).
- `/cstatus` Section 3a: emits the `Harness: model={X} version={Y} fingerprint={hash[:8]} status={ok|new|version-bumped}` advisory line.
- `/cverify`: writes the `harness_version` field on every new calibration entry (BND-005 prerequisite; without this, the post-fingerprint pool stays empty and the three-tier lookup collapses).
- `/csetup` Step 2.6: scaffolds `templates/test-features/baseline.md` to `.correctless/test-features/baseline.md` (idempotent guard via `[ ! -f ]`).
- `/cauto` Auto Run Report: surfaces any `harness-notified-*.flag` files in "What to Review First" (INV-016).
- `hooks/sensitive-file-guard.sh`: protects `scripts/harness-fingerprint.sh`, `.correctless/meta/harness-fingerprint.json`, and `.correctless/meta/model-baselines.json` from non-sanctioned writers (Edit/Write AND Bash redirects).
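The `/csetup` Step 2.6 idempotent guard can be sketched as follows. `scaffold_baseline` is a hypothetical helper; only the `[ ! -f ]` guard and the two paths come from this document, and the demo runs in a throwaway directory standing in for the project root.

```bash
#!/usr/bin/env bash
# Sketch only: scaffold_baseline is a hypothetical helper mirroring the
# /csetup Step 2.6 behavior (copy the template only when the destination
# does not exist yet).
set -euo pipefail

scaffold_baseline() {
  local template=$1 dest=$2
  if [ ! -f "$dest" ]; then            # idempotent guard: never overwrite
    mkdir -p "$(dirname "$dest")"
    cp "$template" "$dest"
    echo "scaffolded $dest"
  else
    echo "kept existing $dest"         # preserve the user's edits
  fi
}

# Demo in a throwaway directory standing in for the project root.
root=$(mktemp -d)
mkdir -p "$root/templates/test-features"
echo "# reference feature" > "$root/templates/test-features/baseline.md"
scaffold_baseline "$root/templates/test-features/baseline.md" \
  "$root/.correctless/test-features/baseline.md"
```

Running it a second time after the user has edited `.correctless/test-features/baseline.md` leaves the edits in place, which is what keeps the reference feature user-editable.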
Examples
Verify the script is wired (smoke check)
```bash
bash scripts/harness-fingerprint.sh check
# fingerprint=claude-opus-4-7|1
# status=unchanged
# model=claude-opus-4-7
# harness_version=1
# notified=false
```
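Because `check` prints one `key=value` pair per line, a wrapper script can pick out a single field. The sketch below assumes that line format stays stable, which this document does not promise; `check_field` is a hypothetical helper, and the sample text is copied from the output above. The `awk` program reads all of its input instead of exiting at the first match, so the upstream script never receives SIGPIPE under `pipefail`.

```bash
#!/usr/bin/env bash
# Sketch only: check_field is a hypothetical helper. It assumes the `check`
# subcommand keeps printing one key=value pair per line, as in the sample
# above; that format is not documented as a stable contract.
set -euo pipefail

# Print the value of KEY from key=value lines on stdin (first match only).
check_field() {
  awk -v k="$1" 'index($0, k "=") == 1 && !found { found = 1; print substr($0, length(k) + 2) }'
}

sample_output='fingerprint=claude-opus-4-7|1
status=unchanged
model=claude-opus-4-7
harness_version=1
notified=false'

# In real use: status=$(bash scripts/harness-fingerprint.sh check | check_field status)
status=$(printf '%s\n' "$sample_output" | check_field status)
if [ "$status" != "unchanged" ]; then
  echo "harness fingerprint changed: status=$status" >&2
fi
```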
Capture an initial baseline
```
# Run /cauto on the controlled-baseline reference feature first
# (after /csetup has scaffolded .correctless/test-features/baseline.md)
/cauto
# Then capture the baseline (requires ≥2 qualifying runs, surfaces source slugs)
/cmodelupgrade --capture-baseline
```
Inspect the fingerprint state
```bash
cat .correctless/meta/harness-fingerprint.json
# {
#   "fingerprint": "claude-opus-4-7|1",
#   "harness_version": 1,
#   "model": "claude-opus-4-7",
#   "timestamp": "2026-04-26T22:17:39Z",
#   "schema_version": 1
# }
```
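Since the fingerprint is stored unhashed, a script can confirm that the `fingerprint` field really equals `{model}|{harness_version}`. This is a rough sketch: `json_field` and `check_consistency` are hypothetical helpers, and the `sed` extraction only understands the flat, one-key-per-line layout shown above (no nesting, no escaped quotes). With `jq` installed, `jq -r .fingerprint FILE` is the more robust way to read a field.

```bash
#!/usr/bin/env bash
# Sketch only: json_field is a hypothetical helper that understands just the
# flat, one-key-per-line layout shown above (no nesting, no escaped quotes).
# If jq is installed, `jq -r .fingerprint FILE` is the robust alternative.
set -euo pipefail

json_field() {
  local key=$1 file=$2
  sed -n "s/^[[:space:]]*\"${key}\":[[:space:]]*\"\{0,1\}\([^\",]*\)\"\{0,1\},\{0,1\}[[:space:]]*\$/\1/p" "$file"
}

# By construction, fingerprint should equal "{model}|{harness_version}".
check_consistency() {
  local file=$1 fp expected
  fp=$(json_field fingerprint "$file")
  expected="$(json_field model "$file")|$(json_field harness_version "$file")"
  if [ "$fp" = "$expected" ]; then echo "consistent"; else echo "mismatch: $fp vs $expected"; fi
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
{
  "fingerprint": "claude-opus-4-7|1",
  "harness_version": 1,
  "model": "claude-opus-4-7",
  "timestamp": "2026-04-26T22:17:39Z",
  "schema_version": 1
}
EOF
check_consistency "$demo"   # prints: consistent
```

Against a real checkout the same call would be `check_consistency .correctless/meta/harness-fingerprint.json`.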
Known Limitations
- `model_name` is not tamper-resistant (EA-005). It is sourced from Claude Code’s environment, so an autonomous agent with write access to env or session metadata could spoof its own model name. The single-user dev-tool threat model accepts this.
- Mid-session changes are not detected (EA-003). Sessions started before a `HARNESS_VERSION` bump continue with the old fingerprint until restart.
- Per-skill granularity is out of scope. The report is per-feature only; per-skill reporting would require audit-trail to record per-phase `qa_rounds` and token-tracking to backfill from cost artifacts (both upstream changes are deferred).
- No automatic detection of behavioral change (EA-004). Bumping `HARNESS_VERSION` requires human judgment. The mechanism is reactive to its own observations (a >20% metric shift within one key surfaces in `/cmodelupgrade`), not predictive.
Test Coverage
`tests/test-harness-fingerprint.sh`: 110 passed, 0 failed. Covers INV-001..019, PRH-001..006, and BND-001..005. Cross-suite coverage lives in `test-architecture-drift.sh` (ABS-027 presence), `test-sensitive-file-guard.sh` (HF-002 redirect block + HF-006 Edit block), `test-allowed-tools-check.sh` (cmodelupgrade frontmatter), `test-scripts-namespace-migration.sh` (HF-PMB003: `harness-fingerprint.sh` installed), and `test-skill-path-discovery.sh` (R-005(g)-cmodelupgrade).
See Also
- Spec: `.correctless/specs/harness-fingerprint.md`
- Verification: `.correctless/verification/harness-fingerprint-verification.md`
- Architecture: ABS-027 (fingerprint store contract) and ABS-028 (test-features baseline contract) in `.correctless/ARCHITECTURE.md`
- Skill: `/cmodelupgrade`