Slug-Type-Aware Artifact Classification
Closes the 2nd instance of AP-032 by giving scripts/prune-scan.sh an explicit slug-type model. Spec: .correctless/specs/prune-scan-slug-aware.md. Architecture: ABS-039, ABS-040, PAT-020. Antipattern: AP-032 (2nd instance).
What It Does
scripts/prune-scan.sh scans .correctless/artifacts/ for orphaned files left behind by deleted or unknown branches. The pre-feature scanner treated every pattern in artifact_patterns as branch-slug-named, but the repo actually uses three different slug conventions: branch-slug (feature-<name>-<md5[:6]>), task-slug (bare task slug for qa-findings-*.json, audit-mini-*.json, etc.), and session-slug (Claude Code session ID, never live-prunable). When a task-slug-named file was matched against the live branch-slug set, the match failed and the live file was flagged as a low-risk deletion candidate. Autonomous /cprune would then delete it.
This feature gives the scanner an explicit slug-type model: _classify_artifact_pattern maps every pattern to exactly one of branch-slug, task-slug, session-slug, or unclassified; safety-belt matching consults the live-slug set that matches the classification; substring slug primitives are structurally banned; and the scanner output migrates from a bare JSON array to a wrapped object {candidates, skipped_unclassified, protection_set, protection_status} so consumers know which patterns were skipped and what protection set was applied.
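A minimal sketch of how a total pattern classifier in the style of _classify_artifact_pattern might be written in bash. Only qa-findings-*.json and audit-mini-*.json come from this page (as task-slug patterns). The branch-slug and session-slug pattern strings below are hypothetical placeholders, and the real function's case branches may differ.

```bash
#!/usr/bin/env bash
# Sketch only: map an artifact pattern string to exactly one slug type.
# The task-slug patterns come from this page; the branch-slug and
# session-slug entries are HYPOTHETICAL placeholders.
_classify_artifact_pattern() {
  # Quoted case patterns compare the pattern string literally, so
  # 'qa-findings-*.json' matches only that exact string, not every
  # filename the glob would match.
  case $1 in
    'qa-findings-*.json' | 'audit-mini-*.json')
      printf 'task-slug\n' ;;
    'example-branch-artifact-*.json')    # hypothetical
      printf 'branch-slug\n' ;;
    'example-session-artifact-*.json')   # hypothetical
      printf 'session-slug\n' ;;
    *)
      # Unknown patterns are never guessed; the caller routes them to
      # skipped_unclassified and emits a stderr advisory.
      printf 'unclassified\n' ;;
  esac
}

# Sample list for illustration; the real artifact_patterns assignment
# lives in scripts/prune-scan.sh.
artifact_patterns=('qa-findings-*.json' 'audit-mini-*.json' 'prune-test-synthetic-*.json')
for pattern in "${artifact_patterns[@]}"; do
  printf '%-32s %s\n' "$pattern" "$(_classify_artifact_pattern "$pattern")"
done
```

Quoting each case label is the key design choice. An unquoted label would glob-match, so a pattern string such as qa-findings-extra-*.json would silently land in the task-slug branch instead of falling through to unclassified.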
How It Works
```mermaid
graph TD
    A["artifact_patterns[]"] --> B["_classify_artifact_pattern"]
    B -->|branch-slug| C["match against live_branch_slugs<br/>(branch_slug from lib.sh)"]
    B -->|task-slug| D["match against live_task_slugs<br/>(basename(.spec_file, '.md'))"]
    B -->|session-slug| E["never live-prunable<br/>(skip)"]
    B -->|unclassified| F["skipped_unclassified[]<br/>+ stderr advisory"]
    C --> G["delimited-token regex<br/>[[ $f =~ ^(.+-)?$slug([-.]|$) ]]"]
    D --> G
    G -->|"match: skip"| H["live file — never candidate"]
    G -->|"no match: candidate"| I["wrapped JSON object<br/>{candidates, skipped_unclassified,<br/>protection_set, protection_status}"]
    E --> I
    F --> I
    I --> J["/cprune SKILL.md<br/>reads .candidates"]
    I --> K["/cstatus<br/>reads .candidates"]
    style B fill:#dcedc8,stroke:#333,stroke-width:2px
    style G fill:#fff3e0,stroke:#333,stroke-width:2px
    style I fill:#e3f2fd,stroke:#333,stroke-width:2px
```
The classification function is total over artifact_patterns — every pattern must have a mapping or the structural test (INV-001) fails. The pattern-to-slug-type mapping lives in the spec as a producer-pattern table (INV-008), cross-referenced at CI time against both the function’s case branches and the artifact_patterns= assignment line. Drift in either direction fails the test.
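The CI cross-reference described above is a two-way set comparison. A minimal sketch, assuming the spec table's pattern column and the artifact_patterns= values have already been extracted into newline-separated lists. That extraction step and the helper name _pattern_set_drift are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report patterns present in one list but not the other.
# Both arguments are newline-separated pattern lists. Output follows
# `comm -3`: lines only in $1 are unindented, lines only in $2 are
# prefixed with a tab. Empty output means no drift in either direction.
_pattern_set_drift() {
  # Sort both sides with the same collation so comm sees sorted input.
  LC_ALL=C comm -3 \
    <(printf '%s\n' "$1" | LC_ALL=C sort -u) \
    <(printf '%s\n' "$2" | LC_ALL=C sort -u)
}

# Illustrative inputs only.
spec_table_patterns=$'qa-findings-*.json\naudit-mini-*.json'
assignment_patterns=$'audit-mini-*.json\nqa-findings-*.json\nprune-test-synthetic-*.json'

drift=$(_pattern_set_drift "$spec_table_patterns" "$assignment_patterns")
if [[ -n $drift ]]; then
  printf 'pattern drift between spec table and artifact_patterns:\n%s\n' "$drift" >&2
fi
```

The same comparison can be repeated for the spec table against the patterns listed in the function's case branches. This covers both drift directions mentioned above.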
Slug matching uses a delimited-token bash [[ =~ ]] regex with a [-.] character class as the token boundary, so it distinguishes feature-foo-abc from feature-foo-def and qa-findings-foo from qa-findings-foo-2. Substring primitives are structurally banned by the prune-scan-substring-match rule in scripts/antipattern-scan.sh check_shell(): grep -F "$slug", case "$f" in *"$slug"*), and an unquoted =~ $slug. Two defenses back this up. _slug_is_safe validates slug values at extraction boundaries, and _escape_ere_metachars escapes ERE metacharacters before a slug is interpolated into a regex. The first rejects malformed slugs; the second ensures that any slug that slips through cannot be interpreted as ERE syntax.
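A sketch of the delimited-token match with both defenses. This is written under stated assumptions: the allowlist in _slug_is_safe and the helper name _slug_matches_file are hypothetical, and the real helpers may use different rules. The regex is the one shown in the diagram, built into a variable after escaping.

```bash
#!/usr/bin/env bash
# Sketch of delimited-token slug matching. The allowlist and the name
# _slug_matches_file are HYPOTHETICAL.

# Reject empty slugs and anything outside a conservative character set.
_slug_is_safe() {
  local re='^[A-Za-z0-9][A-Za-z0-9._-]*$'
  [[ -n $1 && $1 =~ $re ]]
}

# Backslash-escape the POSIX ERE special characters . [ \ ( ) * + ? { | ^ $
_escape_ere_metachars() {
  local in=$1 out='' ch i
  local meta='.[\()*+?{|^$'
  for (( i = 0; i < ${#in}; i++ )); do
    ch=${in:i:1}
    if [[ $meta == *"$ch"* ]]; then
      out+="\\$ch"
    else
      out+=$ch
    fi
  done
  printf '%s' "$out"
}

# Succeed when FILE contains SLUG as a whole token: at the start or after
# a '-', and followed by '-', '.', or end of string. Return 2 when the
# slug fails validation so the caller can fail closed.
_slug_matches_file() {
  local file=$1 slug=$2 escaped re
  _slug_is_safe "$slug" || return 2
  escaped=$(_escape_ere_metachars "$slug")
  re="^(.+-)?${escaped}([-.]|\$)"
  [[ $file =~ $re ]]
}

_slug_matches_file 'qa-findings-foo.json' 'foo' && echo 'qa-findings-foo.json: protected'
_slug_matches_file 'qa-findings-foobar.json' 'foo' || echo 'qa-findings-foobar.json: not protected by slug foo'
```

Escaping still matters even with validation in place, because this allowlist admits '.', the one ERE metacharacter a slug could plausibly contain. Without escaping, slug a.b would also protect a file named axb.json.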
Safety-Belt Completion
Six fail-closed paths were added in this feature so the safety belt cannot silently collapse:
- Empty live-branch-slug set: when git branch returns no live branches (corrupted repo, fresh clone), the scanner exits non-zero with a stderr advisory instead of proceeding with an empty set, which would otherwise classify every artifact as orphaned.
- Empty live-task-slug set: same fail-closed behavior when .spec_file lookups return zero task slugs.
- Missing realpath: _realpath_tool_available probes for realpath or readlink -f at scan entry. If neither is available, the scanner exits non-zero with a stderr advisory. It never silently falls back to lexical canonicalize_path for symlink-equivalence decisions (PAT-020).
- Workflow-state mid-write TOCTOU: identity comparison uses content-based started_at string equality (primary), then the composite task|branch (fallback), then sha256(file) (last resort). It never uses mtime, extending the ABS-029 content-based-match convention.
- Non-git BASE_DIR: the scanner aborts with a stderr advisory rather than proceeding with git errors silently swallowed.
- lib.sh sourcing failure: when branch_slug() is not defined after sourcing, scan_artifacts aborts before calling the missing function.
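The fail-closed paths above can be sketched as small predicates that a scan entry point checks before doing any work. This is a hedged illustration. Apart from _realpath_tool_available and branch_slug, every helper name is hypothetical. The identity helper takes already-extracted field values because this page does not say how workflow-state JSON is parsed.

```bash
#!/usr/bin/env bash
# Sketch of fail-closed preflight checks; helper names other than
# _realpath_tool_available and branch_slug are HYPOTHETICAL.

# Return non-zero (with a stderr advisory) when a protection set is empty.
_require_nonempty_set() {
  local label=$1; shift
  if (( $# == 0 )); then
    printf 'prune-scan: %s is empty; refusing to scan\n' "$label" >&2
    return 1
  fi
}

# Probe for a symlink-resolving tool; never fall back to lexical paths.
_realpath_tool_available() {
  command -v realpath >/dev/null 2>&1 && return 0
  readlink -f / >/dev/null 2>&1
}

# True only if the directory is inside a git work tree.
_base_dir_is_git() {
  git -C "$1" rev-parse --is-inside-work-tree >/dev/null 2>&1
}

# True only if a shell function with this name is defined.
_function_defined() {
  declare -F "$1" >/dev/null
}

# Print the SHA-256 of a file, or nothing if no hashing tool works.
_sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum -- "$1" 2>/dev/null | cut -d' ' -f1
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 -- "$1" 2>/dev/null | cut -d' ' -f1
  fi
}

# Tiered identity check for two workflow-state snapshots. Arguments:
#   $1/$2 started_at values, $3/$4 "task|branch" composites, $5/$6 file paths.
# The first tier where both sides are non-empty decides; mtime is never used.
_same_wfstate_identity() {
  if [[ -n $1 && -n $2 ]]; then [[ $1 == "$2" ]]; return; fi
  if [[ -n $3 && -n $4 ]]; then [[ $3 == "$4" ]]; return; fi
  local ha hb
  ha=$(_sha256_of "$5"); hb=$(_sha256_of "$6")
  # Empty hashes (missing file or tool) never count as a match.
  [[ -n $ha && $ha == "$hb" ]]
}

# Entry checks, each aborting the scan on failure.
_preflight() {
  _base_dir_is_git "$1" || { echo 'prune-scan: BASE_DIR is not a git work tree' >&2; return 1; }
  _function_defined branch_slug || { echo 'prune-scan: branch_slug missing after sourcing lib.sh' >&2; return 1; }
  _realpath_tool_available || { echo 'prune-scan: neither realpath nor readlink -f is available' >&2; return 1; }
}
```

Each predicate returns non-zero rather than substituting a default. That is what keeps an empty set or a missing tool from quietly widening the candidate list.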
Baseline Manifest (ABS-040)
.correctless/meta/prune-pattern-baseline.json records the operator-acknowledged pattern set. Schema: {"patterns": [...], "updated_at": "{ISO}", "schema_version": 1}.
The sole writer is scripts/prune-scan.sh --update-baseline. The scanner never updates the baseline as a side effect of scanning: autonomous /cprune runs, /cstatus runs, and default-mode /cprune runs all leave it untouched. The baseline is updated only when /cprune SKILL.md invokes the scanner with --update-baseline after interactive human confirmation.
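A hedged sketch of how a --update-baseline path could write the manifest in the schema shown above. _write_baseline and _json_escape are hypothetical names. The write-to-temp-then-rename step is an assumption about how a sole writer might avoid leaving a half-written file, not a description of the actual script.

```bash
#!/usr/bin/env bash
# Sketch: write {"patterns": [...], "updated_at": "...", "schema_version": 1}.
# HYPOTHETICAL helpers; not taken from scripts/prune-scan.sh.

# Escape backslashes and double quotes for a JSON string.
# Control characters are not handled in this sketch.
_json_escape() {
  local in=$1 out='' ch i
  for (( i = 0; i < ${#in}; i++ )); do
    ch=${in:i:1}
    case $ch in
      '\' | '"') out+="\\$ch" ;;
      *) out+=$ch ;;
    esac
  done
  printf '%s' "$out"
}

# Usage: _write_baseline DEST PATTERN...
_write_baseline() {
  local dest=$1; shift
  local dir tmp p sep='' body=''
  dir=$(dirname -- "$dest")
  mkdir -p -- "$dir" || return 1
  for p in "$@"; do
    body+="$sep\"$(_json_escape "$p")\""
    sep=', '
  done
  # Write to a temp file in the same directory, then rename over DEST,
  # so readers never observe a partially written manifest.
  tmp=$(mktemp "$dir/.baseline.XXXXXX") || return 1
  printf '{"patterns": [%s], "updated_at": "%s", "schema_version": 1}\n' \
    "$body" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >"$tmp" || { rm -f -- "$tmp"; return 1; }
  mv -f -- "$tmp" "$dest"
}
```

Keeping the write in one function that only the explicit flag reaches mirrors the sole-writer rule. No scan path needs to call it.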
For any pattern present in current artifact_patterns but absent from the baseline, candidates emitted via that pattern carry risk: "medium" (interactive-only) with reason text Newly added pattern '{pattern}' — first scan after upgrade; review before deletion. This prevents auto-promotion of newly-added patterns to low risk without human review. The baseline file is SFG-protected.
When the baseline file is missing or corrupt, the scanner fails closed to all-medium (INV-011a) — it does not proceed as if baseline equaled current set.
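A minimal sketch of the tiering rule described above. It assumes the caller has already read the baseline's patterns array and knows whether parsing succeeded. _risk_for_pattern is a hypothetical name. The sketch follows this page literally: a pattern absent from the baseline yields medium, and so does any pattern when the baseline is missing or corrupt. This page does not describe how the real scanner combines this rule with its other risk signals.

```bash
#!/usr/bin/env bash
# Sketch: choose the risk tier for candidates emitted via a pattern.
#   $1 pattern
#   $2 risk the scanner would otherwise assign
#   $3 "1" if the baseline was read and parsed; anything else means
#      missing or corrupt
#   remaining args: patterns recorded in the baseline
_risk_for_pattern() {
  local pattern=$1 default_risk=$2 baseline_ok=$3
  shift 3
  if [[ $baseline_ok != 1 ]]; then
    printf 'medium\n'   # unusable baseline: fail closed (INV-011a)
    return
  fi
  local p
  for p in "$@"; do
    # Quoted right-hand side: literal string comparison, not a glob.
    if [[ $p == "$pattern" ]]; then
      printf '%s\n' "$default_risk"
      return
    fi
  done
  printf 'medium\n'     # newly added pattern: interactive-only review
}
```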
Schema Migration
Before this feature, prune-scan.sh emitted a bare JSON array of candidates. After this feature, it emits a wrapped object:
```json
{
  "candidates": [...],
  "skipped_unclassified": [
    {"pattern": "prune-test-synthetic-*.json", "reason": "unclassified pattern", "files": [...]}
  ],
  "protection_set": {
    "live_branch_slugs": ["feature-foo-abc123", "..."],
    "live_task_slugs": ["prune-scan-slug-aware-matching", "..."],
    "session_id": "..."
  },
  "protection_status": {
    "branch_slug_set_populated": true,
    "task_slug_set_populated": true,
    "realpath_available": true
  }
}
```
Consumers must read .candidates for the candidate list. /cprune SKILL.md and /cstatus SKILL.md were both migrated in the same PR. Reading the top-level value as an array fails the consumer migration check in tests/test-prune-scan-slug-aware.sh BND-001.
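A sketch of a consumer that reads .candidates from the wrapped object and refuses the legacy bare-array shape. It assumes jq is available. _read_candidates is a hypothetical helper name, not taken from /cprune or /cstatus.

```bash
#!/usr/bin/env bash
# Sketch: emit each candidate as one compact JSON object per line.
# Requires jq. HYPOTHETICAL helper; not the /cprune or /cstatus code.
_read_candidates() {
  local json=$1
  # jq -e sets the exit status from the last output: false -> non-zero.
  if ! jq -e 'type == "object" and (.candidates | type == "array")' \
      <<<"$json" >/dev/null 2>&1; then
    printf 'prune-scan output is not the wrapped object; refusing to read it\n' >&2
    return 1
  fi
  jq -c '.candidates[]' <<<"$json"
}
```

A consumer can capture the scanner's stdout (however it invokes the scanner) and pipe _read_candidates into a while IFS= read -r loop. Checking the shape first turns a stale bare-array producer into a loud error instead of an empty candidate list.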
AP-032 Class Status
This is the 2nd confirmed instance of AP-032 (extraction correct, resolution incomplete). The frequency is now 2; the promotion threshold remains 3. A 3rd instance promotes AP-032 to a PAT-xxx structural rule: “any tool that resolves named references (paths, slugs, identifiers) against on-disk artifacts must define explicit resolution semantics, not lift the comparison primitive from convenience.”
The first instance (cprune-skill, 2026-05-24) was basename resolution against literal paths; this instance is substring slug matching against delimited tokens. Both have the same shape — the extraction step worked, the resolution step had an incomplete model of what counts as a match.
Testing
61 tests in tests/test-prune-scan-slug-aware.sh cover all 18 invariants, 2 prohibitions, 2 boundary conditions, the extended EA-001 environment assumption, and the new antipattern-scan rule registration. The real-fixture requirement (AP-031) is satisfied by tests/fixtures/prune-scan/wfstate-real-sample.json, a verbatim excerpt of a real workflow-state JSON file whose origin is cited in a # Source: comment.
Several pre-existing test failures in tests/test-cprune.sh (INV-013-d, INV-016-a/b) are unrelated — INV-013-d is the AP-033 pipefail+grep SIGPIPE flake (PMB-012) and INV-016-a/b are gaps in cprune-skill’s SFG protection that pre-date this branch.
Known Limitations
- Verification reports under .correctless/verification/ are also task-slug-named, but they live in a directory the scanner does not currently cover. If a future scanner extension covers .correctless/verification/, it will need the same slug-type-aware classification.
- The scanner only operates within .correctless/artifacts/. Path-traversal cleanup outside that directory is out of scope.
- The risk-tier policy (/cprune autonomous-eligibility rules) is unchanged: low is still auto-eligible. The fix ensures that live artifacts never reach low risk, through better classification; it does not change downstream policy.