@mcp-graph-workflow/agent-graph-flow 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. package/README.md +40 -0
  2. package/dist/cli/index.d.ts +1 -0
  3. package/dist/cli/index.js +12842 -0
  4. package/dist/index.d.ts +43 -0
  5. package/dist/index.js +48 -0
  6. package/package.json +142 -0
  7. package/src/skills/analyze/ambiguity-audit.md +46 -0
  8. package/src/skills/analyze/decompose-prd.md +26 -0
  9. package/src/skills/analyze/grill-me.md +26 -0
  10. package/src/skills/analyze/to-prd.md +57 -0
  11. package/src/skills/any/code-detachment.md +26 -0
  12. package/src/skills/any/lessons-consult.md +26 -0
  13. package/src/skills/any/wip-one.md +26 -0
  14. package/src/skills/design/design-an-interface.md +26 -0
  15. package/src/skills/design/seam-audit.md +26 -0
  16. package/src/skills/domain/crypto/common-mistakes.md +71 -0
  17. package/src/skills/domain/ml/common-mistakes.md +55 -0
  18. package/src/skills/domain/rag/chunk-overlap-strategy.md +27 -0
  19. package/src/skills/domain/sqlite-perf/fts5-tuning.md +25 -0
  20. package/src/skills/domain/sqlite-perf/wal-mode.md +26 -0
  21. package/src/skills/domain/systems/common-mistakes.md +62 -0
  22. package/src/skills/domain/testing/vitest-isolation.md +31 -0
  23. package/src/skills/domain/typescript/zod-v4-migration.md +27 -0
  24. package/src/skills/implement/anti-hallucination.md +28 -0
  25. package/src/skills/implement/pure-decision-pattern.md +26 -0
  26. package/src/skills/implement/tracer-bullet-tdd.md +26 -0
  27. package/src/skills/plan/budget-aware-picking.md +26 -0
  28. package/src/skills/plan/plan-sprint.md +26 -0
  29. package/src/skills/plan/to-issues.md +67 -0
  30. package/src/skills/review/citation-coverage-review.md +26 -0
  31. package/src/skills/review/deep-module-review.md +26 -0
  32. package/src/skills/review/zoom-out.md +34 -0
  33. package/src/skills/validate/dod-checklist.md +30 -0
  34. package/src/skills/validate/harness-regression-check.md +26 -0
@@ -0,0 +1,27 @@
1
+ ---
2
+ domain: rag
3
+ topic: chunk-overlap-strategy
4
+ triggers: [retrieval_quality, lost_context, chunk_boundary]
5
+ discovered_at: 2026-04-28T00:00:00.000Z
6
+ source_task: seed
7
+ confidence: 0.75
8
+ ---
9
+
10
+ # Chunk Overlap Strategy
11
+
12
+ Default to ~15% overlap between adjacent chunks. Information at chunk
13
+ boundaries gets lost when there's no overlap; queries that span the cut
14
+ return either chunk and miss the bridging context.
15
+
16
+ ## When to apply
17
+
18
+ - Retrieval quality drops on multi-sentence reasoning.
19
+ - Tables, code blocks, or definitions span chunk boundaries.
20
+ - Citations point to the wrong chunk by one position.
21
+
22
+ ## Sizing rule
23
+
24
+ - Token budget for a chunk: 256–512 tokens.
25
+ - Overlap: 15% of chunk size (38–76 tokens).
26
+ - For code chunks, prefer overlapping by full statements, not byte counts —
27
+ splitting an `if/else` block in half makes both chunks unhelpful.
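The sizing rule can be sketched as a fixed sliding window. This assumes the text has already been tokenized; the tokenizer is not part of this note. `chunkWithOverlap` and `overlapTokens` are hypothetical helpers. Rounding the overlap down reproduces the 38- and 76-token figures for 256- and 512-token chunks. The sketch does not implement the statement-aware splitting recommended for code.

```typescript
// Overlap in tokens for a given chunk size, rounded down (256 -> 38, 512 -> 76 at 15%).
function overlapTokens(chunkSize: number, overlapRatio = 0.15): number {
  return Math.floor(chunkSize * overlapRatio);
}

// Slide a fixed-size window over pre-tokenized text. Adjacent chunks share
// overlapTokens(chunkSize, overlapRatio) tokens, so context at the cut survives.
function chunkWithOverlap<T>(tokens: readonly T[], chunkSize = 512, overlapRatio = 0.15): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  const stride = chunkSize - overlapTokens(chunkSize, overlapRatio);
  if (stride <= 0) {
    throw new Error("overlapRatio must be below 1");
  }
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + chunkSize));
    // Stop once this window reaches the end; another step would only repeat the tail.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}
```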
@@ -0,0 +1,25 @@
1
+ ---
2
+ domain: sqlite-perf
3
+ topic: fts5-tuning
4
+ triggers: [slow_search, fts5_queries, search_relevance]
5
+ discovered_at: 2026-04-28T00:00:00.000Z
6
+ source_task: seed
7
+ confidence: 0.78
8
+ ---
9
+
10
+ # FTS5 Tuning
11
+
12
+ `bm25()` ranking + `unicode61` tokenizer + content-rowid linking is the
13
+ default fast path. Use external content tables to avoid double-storing.
14
+
15
+ ## When to apply
16
+
17
+ - Search queries scan thousands of rows.
18
+ - Relevance ranking feels off (`bm25()` weighs all columns equally unless given per-column weights).
19
+ - Index size is bloating the database.
20
+
21
+ ## Tips
22
+
23
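+ - After bulk-loading the content table of an external content index without sync triggers, rebuild it: `INSERT INTO fts(fts) VALUES('rebuild');`. For large inserts into a regular FTS5 table, `VALUES('optimize')` merges index segments instead.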
24
+ - Use `MATCH` with prefix tokens (`foo*`) instead of `LIKE`.
25
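+ - For Portuguese/Spanish content, keep diacritic folding on: `unicode61` defaults to `remove_diacritics 1`; `remove_diacritics 2` also folds codepoints that carry more than one diacritic.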
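The following sketch shows the fast path. It assumes a hypothetical `docs(id INTEGER PRIMARY KEY, title, body)` table, indexed by an external content FTS5 table `docs_fts`. The SQL uses the documented FTS5 options `content`, `content_rowid`, and `tokenize`, plus the `bm25()` auxiliary function. The optional arguments to `bm25()` are per-column weights in declaration order. Lower scores rank better, so the query sorts ascending. `toPrefixMatchQuery` is a hypothetical helper that turns user input into quoted prefix tokens, which also stops FTS5 operators typed by users from being parsed as syntax. External content tables are not kept in sync automatically; triggers on `docs` (not shown) are the usual approach.

```typescript
// External content table: the index stores only tokens; rows live in `docs`.
const CREATE_DOCS_FTS = `
  CREATE VIRTUAL TABLE docs_fts USING fts5(
    title, body,
    content = 'docs', content_rowid = 'id',
    tokenize = 'unicode61 remove_diacritics 2'
  )`;

// bm25(docs_fts, 10.0, 1.0): a title hit weighs 10x a body hit. Smaller is better.
const RANKED_SEARCH = `
  SELECT docs.id, docs.title
  FROM docs_fts JOIN docs ON docs.id = docs_fts.rowid
  WHERE docs_fts MATCH ?
  ORDER BY bm25(docs_fts, 10.0, 1.0)
  LIMIT 20`;

// Quote each whitespace-separated term (doubling embedded quotes) and mark it
// as a prefix token; terms are ANDed. Returns "" for blank input, so callers
// should skip the search in that case.
function toPrefixMatchQuery(userInput: string): string {
  return userInput
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"*`)
    .join(" ");
}
```

For example, the input `foo ba"r` becomes `"foo"* "ba""r"*`, which is then bound to the `MATCH ?` parameter.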
@@ -0,0 +1,26 @@
1
+ ---
2
+ domain: sqlite-perf
3
+ topic: wal-mode
4
+ triggers: [slow_writes, lock_contention, concurrent_readers]
5
+ discovered_at: 2026-04-28T00:00:00.000Z
6
+ source_task: seed
7
+ confidence: 0.85
8
+ ---
9
+
10
+ # SQLite WAL Mode
11
+
12
+ Write-Ahead Logging lets readers and a single writer run concurrently without
13
+ blocking each other. Enable with `PRAGMA journal_mode = WAL;` once per database;
14
+ the setting persists in the file header.
15
+
16
+ ## When to apply
17
+
18
+ - Writes are slow under read pressure.
19
+ - `SQLITE_BUSY` errors appear in logs.
20
+ - Multiple processes/threads need to read while one writes.
21
+
22
+ ## Trade-offs
23
+
24
+ - A `-wal` and `-shm` sidecar file appear next to the database.
25
+ - Long-running readers can prevent the WAL from being checkpointed.
26
+ - Backups must include the WAL file or use `VACUUM INTO`.
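`PRAGMA journal_mode = WAL` returns the journal mode actually in effect, so a startup check can confirm that the switch happened. For example, an in-memory database reports `memory` and cannot use WAL. The sketch injects a hypothetical `runPragma` callback so it does not depend on a particular driver. The callback receives the pragma body and returns the first column of the first row. If the project used better-sqlite3, the adapter could be `(body) => String(db.pragma(body, { simple: true }))`.

```typescript
// Hypothetical driver adapter: executes `PRAGMA <body>` and returns the first
// column of the first result row as text.
type RunPragma = (body: string) => string;

// Switch to WAL and verify it took effect. The mode persists in the file
// header, so running this on every open is harmless.
function enableWal(runPragma: RunPragma): void {
  const mode = runPragma("journal_mode = WAL").toLowerCase();
  if (mode !== "wal") {
    throw new Error(`expected journal_mode "wal", got "${mode}"`);
  }
}
```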
@@ -0,0 +1,62 @@
1
+ ---
2
+ domain: systems
3
+ topic: common-mistakes
4
+ triggers: [concurrency, distributed_systems, performance, deadlock, race_condition]
5
+ discovered_at: 2026-04-30T00:00:00.000Z
6
+ source_task: extracta-paper2code
7
+ confidence: 0.8
8
+ ---
9
+
10
+ # Systems Implementation — Common Mistakes
11
+
12
+ Patterns where the code compiles but the *behavior* is wrong under load.
13
+
14
+ ## Concurrency
15
+
16
+ - **Mutex held across await** — JS/Python async: holding a lock while
17
+   awaiting unrelated slow I/O serializes every task that needs the lock.
18
+   Do that I/O outside the critical section; lock only the shared-state update.
19
+ - **Read-modify-write without lock** — `count++` is three ops. Under
20
+ concurrency you lose increments. Use atomic counters or a single
21
+ `UPDATE ... SET col = col + 1`.
22
+ - **Double-checked locking** — works in Java/C++ only with proper memory
23
+   fences (`volatile` in Java 5+, atomics in C++11); without them, the
24
+   unsynchronized first check can see a partially-constructed object.
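The read-modify-write bullet applies to single-threaded JavaScript as soon as an `await` sits between the read and the write. The minimal sketch below uses a hypothetical async key-value store, `FakeStore`, standing in for a database. `unsafeIncrement` reads, yields, then writes, so two concurrent calls both read 0 and one increment is lost. `store.increment` performs the read and the write with no yield in between, which is the in-memory analogue of `UPDATE ... SET col = col + 1`.

```typescript
// Hypothetical async store; each call yields once to simulate I/O latency.
class FakeStore {
  private data = new Map<string, number>();

  async get(key: string): Promise<number> {
    await Promise.resolve();
    return this.data.get(key) ?? 0;
  }

  async set(key: string, value: number): Promise<void> {
    await Promise.resolve();
    this.data.set(key, value);
  }

  // Read and write happen in one synchronous step after the yield,
  // so concurrent callers cannot interleave between them.
  async increment(key: string): Promise<number> {
    await Promise.resolve();
    const next = (this.data.get(key) ?? 0) + 1;
    this.data.set(key, next);
    return next;
  }
}

// Read-modify-write across an await: another caller can read the same value.
async function unsafeIncrement(store: FakeStore, key: string): Promise<void> {
  const current = await store.get(key);
  await store.set(key, current + 1);
}

async function demoLostUpdate(): Promise<{ unsafe: number; atomic: number }> {
  const a = new FakeStore();
  await Promise.all([unsafeIncrement(a, "n"), unsafeIncrement(a, "n")]);

  const b = new FakeStore();
  await Promise.all([b.increment("n"), b.increment("n")]);

  return { unsafe: await a.get("n"), atomic: await b.get("n") };
}
```

`demoLostUpdate()` resolves to `{ unsafe: 1, atomic: 2 }`: two increments were issued in each case, but the unsafe path lost one.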
25
+
26
+ ## Distributed systems
27
+
28
+ - **At-most-once vs at-least-once** — brokers and retrying clients usually
29
+   deliver at-least-once; idempotency keys turn that into exactly-once effects.
30
+ Treat duplicates as the default, not the edge case.
31
+ - **Clock skew** — never assume two machines' wall clocks agree to within
32
+   100 ms. Use logical clocks or relative times for ordering.
33
+ - **Network partitions are not "rare"** — expect multi-AZ deployments to
34
+   hit them routinely. Treat retries with backoff and jitter as
35
+   always-on behavior, not a failure-mode path.
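Always-on retries can be sketched with exponential backoff and "full jitter": a random delay between 0 and the capped exponential bound. Full jitter is one common way to add jitter, not the only one. `retryWithBackoff` and `backoffDelayMs` are hypothetical helpers. `sleep` and `random` are injectable so the schedule can be tested deterministically. Retries produce duplicates, so the retried operation should carry an idempotency key, as the first bullet in this list recommends.

```typescript
// Full jitter: uniform delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs: number, capMs: number, random: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

interface RetryOptions {
  maxAttempts: number;
  baseMs: number;
  capMs: number;
  sleep?: (ms: number) => Promise<void>;
  random?: () => number;
}

async function retryWithBackoff<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; the last error is rethrown instead.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, opts.baseMs, opts.capMs, opts.random));
      }
    }
  }
  throw lastError;
}
```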
36
+
37
+ ## Performance
38
+
39
+ - **N+1 queries** — list view fetches N records then issues N follow-up
40
+ queries. JOIN or use a batch loader (DataLoader pattern).
41
+ - **Wrong cache key granularity** — caching at request level invalidates
42
+ too aggressively; per-user often invalidates not enough. The right
43
+ granularity is the smallest stable subset of the data.
44
+ - **Synchronous I/O in event loop** — Node `fs.readFileSync` in a
45
+ request handler blocks the whole process. Same for Python asyncio
46
+ with sync DB drivers.
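The batch-loader fix for N+1 can be sketched without a library. `load()` calls made in the same tick are queued and resolved by a single `batchFn` call, so a list view issues one `WHERE id IN (...)` style query instead of N. `BatchLoader` is a hypothetical class in the spirit of the DataLoader pattern named above. It does not reproduce the `dataloader` package's API, and it does no caching.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<Map<K, V>>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V | undefined) => void;
  reject: (error: unknown) => void;
}

class BatchLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V | undefined> {
    return new Promise<V | undefined>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key queued in this tick schedules one flush after the current synchronous code.
      if (this.queue.length === 1) {
        void Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const uniqueKeys = Array.from(new Set(batch.map((p) => p.key)));
      const results = await this.batchFn(uniqueKeys); // one round trip for the whole batch
      for (const p of batch) p.resolve(results.get(p.key));
    } catch (error) {
      for (const p of batch) p.reject(error);
    }
  }
}
```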
47
+
48
+ ## Storage
49
+
50
+ - **Index on the wrong column order** — composite index on (user, ts)
51
+   helps `WHERE user=? AND ts > ?`; doesn't help `WHERE ts > ?`. Put
52
+   equality-filtered columns first and range-filtered columns last.
53
+ - **WAL not checkpointed** — long-running readers in SQLite WAL mode
54
+ prevent checkpoint, growing the `-wal` file unboundedly.
55
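+ - **Migration without backfill** — adding a NOT NULL column with no default to
56
+   a populated table is rejected by PostgreSQL and SQLite; with a default, some engines or versions rewrite the table under a lock. Add it nullable, backfill in batches, then enforce NOT NULL.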
57
+
58
+ ## When to escalate
59
+
60
+ If a task describes "scale to N users" or "handle bursts", the AC needs
61
+ explicit numbers (latency p95, throughput) and an answer to: what
62
+ happens when the bound is exceeded? Mark UNSPECIFIED otherwise.
@@ -0,0 +1,31 @@
1
+ ---
2
+ domain: testing
3
+ topic: vitest-isolation
4
+ triggers: [flaky_tests, shared_state, test_pollution]
5
+ discovered_at: 2026-04-28T00:00:00.000Z
6
+ source_task: seed
7
+ confidence: 0.8
8
+ ---
9
+
10
+ # Vitest Test Isolation
11
+
12
+ Each test should create its own SqliteStore (`SqliteStore.open(":memory:")`)
13
+ and tear it down in `afterEach`. Module-level singletons leak state between tests
14
+ in the same file, and across files when Vitest isolation is off (`isolate: false`).
15
+
16
+ ## When to apply
17
+
18
+ - Tests pass alone but fail when run together.
19
+ - Order-dependent failures.
20
+ - Mutable module-level caches (`Map`, `Set`) that aren't cleared.
21
+
22
+ ## Pattern
23
+
24
+ ```ts
25
+ let store: SqliteStore;
26
+ beforeEach(() => { store = SqliteStore.open(":memory:"); });
27
+ afterEach(() => { store.close(); });
28
+ ```
29
+
30
+ For singletons that can't be replaced, expose a `_reset()` test-only hook
31
+ and call it in `beforeEach` — never share a real DB file across tests.
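The `_reset()` hook can be sketched with a module-level cache that cannot be injected, plus a test-only reset. `lookupCache`, `getCached`, and `cacheSize` are hypothetical names. In a Vitest suite, the hook would be wired as `beforeEach(() => _reset())`.

```typescript
// Module-level singleton: survives across tests that share this module instance.
const lookupCache = new Map<string, string>();

function getCached(key: string, compute: (key: string) => string): string {
  const hit = lookupCache.get(key);
  if (hit !== undefined) return hit;
  const value = compute(key);
  lookupCache.set(key, value);
  return value;
}

function cacheSize(): number {
  return lookupCache.size;
}

// Test-only hook: clears module state so each test starts from empty.
function _reset(): void {
  lookupCache.clear();
}
```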
@@ -0,0 +1,27 @@
1
+ ---
2
+ domain: typescript
3
+ topic: zod-v4-migration
4
+ triggers: [zod_upgrade, schema_migration, type_inference_break]
5
+ discovered_at: 2026-04-28T00:00:00.000Z
6
+ source_task: seed
7
+ confidence: 0.82
8
+ ---
9
+
10
+ # Zod v4 Migration
11
+
12
+ Always import from `zod/v4` in this project: `import { z } from 'zod/v4'`.
13
+ Never import from `zod` — that path resolves to v3 in some environments and
14
+ silently changes runtime behavior.
15
+
16
+ ## Common breaks
17
+
18
+ - `z.record()` now requires both key and value schemas.
19
+ - `z.string().email()` is deprecated; use the top-level `z.email()` instead.
20
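+ - `.default()` now short-circuits on `undefined` input and its value must match the output type; use `.prefault()` to get v3 behavior (check `.optional().default()` chains).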
21
+ - `safeParse` errors expose `issues` (no longer `errors`).
22
+
23
+ ## Migration recipe
24
+
25
+ 1. Bulk replace `from "zod"` → `from "zod/v4"`.
26
+ 2. Run `tsc --noEmit` and fix the schema-shape regressions case by case.
27
+ 3. Re-run unit tests for each schema; runtime parsing is stricter in v4.
@@ -0,0 +1,28 @@
1
+ ---
2
+ name: anti-hallucination
3
+ description: Forbidden phrases + citation requirements for code and rationale
4
+ category: implement
5
+ phases: [IMPLEMENT, REVIEW]
6
+ ---
7
+
8
+ # anti-hallucination
9
+
10
+ ## When to use
11
+
12
+ Always — when writing comments, commit messages, PR descriptions, or rationale strings. Per `.claude/rules/anti-hallucination.md`.
13
+
14
+ ## Steps
15
+
16
+ 1. Re-read the change before commit. Search for forbidden phrases:
17
+ "standard practice", "typically", "obviously", "best practice",
18
+ "as expected", "common pattern", "generally", "normally".
19
+ 2. Replace each with a citation (`§EPIC-...`, `§ADR-...`, RFC) or a measurement.
20
+ 3. If you can't cite, the claim is probably wrong — delete it.
21
+ 4. Add §-citations to any new file in src/core/.
22
+ 5. Run `analyze(mode='citation_groundedness')` before finish_task.
23
+
24
+ ## Anti-patterns
25
+
26
+ - Hedging ("I think this is fine") instead of citing.
27
+ - Decorative §-tags that don't actually trace to a spec.
28
+ - Ignoring the linter when it flags banned phrases — fix, don't suppress.
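Step 1 can be mechanized with a forbidden-phrase scanner that runs over a commit message, a comment, or a PR body. This is a minimal sketch. `findForbiddenPhrases` is a hypothetical helper, not the project's linter, and its phrase list mirrors step 1. It matches whole words case-insensitively and reports 1-based line numbers, so each hit can be replaced with a citation or a measurement.

```typescript
const FORBIDDEN_PHRASES = [
  "standard practice", "typically", "obviously", "best practice",
  "as expected", "common pattern", "generally", "normally",
];

interface PhraseHit {
  line: number; // 1-based
  phrase: string;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findForbiddenPhrases(text: string): PhraseHit[] {
  const hits: PhraseHit[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const phrase of FORBIDDEN_PHRASES) {
      // Whole-word, case-insensitive match ("Typically," counts; "atypically" does not).
      const re = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "i");
      if (re.test(lineText)) hits.push({ line: i + 1, phrase });
    }
  });
  return hits;
}
```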
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: pure-decision-pattern
3
+ description: Extract pure logic from I/O so tests stay fast and deterministic
4
+ category: implement
5
+ phases: [IMPLEMENT]
6
+ ---
7
+
8
+ # pure-decision-pattern
9
+
10
+ ## When to use
11
+
12
+ Any module that decides something based on inputs (rate limit, threshold check, status mapping). Keep the decision pure; let the caller orchestrate I/O.
13
+
14
+ ## Steps
15
+
16
+ 1. Identify the decision: a function `(input) → output` with no side effects.
17
+ 2. Move the decision to its own file under src/core/.../<name>.ts.
18
+ 3. Test the decision with deterministic inputs only. No DB, no clock, no fs.
19
+ 4. Caller (MCP tool / hook handler) injects clocks, DB connections, env reads.
20
+ 5. For env-driven toggles, expose a tiny `isXDisabled(env)` helper that the caller checks.
21
+
22
+ ## Anti-patterns
23
+
24
+ - Importing `process.env` deep in core (couples to node + makes test setup awful).
25
+ - Reading DB inside the decision — breaks the unit test boundary.
26
+ - Branching on `Date.now()` directly — inject a clock fn.
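The pattern is illustrated below with a hypothetical fixed-window rate limiter. `decideRateLimit` is pure: state, current time, and limits go in; a decision and the next state come out. `isRateLimitDisabled` is the tiny env helper from step 5. The env var name `RATE_LIMIT_DISABLED` and all function names are assumptions for illustration.

```typescript
interface RateWindow {
  windowStartMs: number;
  count: number;
}

type RateDecision =
  | { allow: true; next: RateWindow }
  | { allow: false; retryAfterMs: number; next: RateWindow };

// Pure: no clock, DB, or env access. Same inputs always give the same output.
function decideRateLimit(state: RateWindow, nowMs: number, limit: number, windowMs: number): RateDecision {
  const current = nowMs - state.windowStartMs < windowMs ? state : { windowStartMs: nowMs, count: 0 };
  if (current.count < limit) {
    return { allow: true, next: { ...current, count: current.count + 1 } };
  }
  return { allow: false, retryAfterMs: current.windowStartMs + windowMs - nowMs, next: current };
}

// Env toggle helper; the caller passes `process.env` (or a plain test object) in.
function isRateLimitDisabled(env: Record<string, string | undefined>): boolean {
  return env.RATE_LIMIT_DISABLED === "1";
}

// Caller side, where the I/O lives (loadWindow/saveWindow/clock are hypothetical):
//   if (!isRateLimitDisabled(process.env)) {
//     const decision = decideRateLimit(await loadWindow(), clock(), 60, 60_000);
//     await saveWindow(decision.next);
//   }
```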
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: tracer-bullet-tdd
3
+ description: Red-Green-Refactor with one shot through every layer first
4
+ category: implement
5
+ phases: [IMPLEMENT]
6
+ ---
7
+
8
+ # tracer-bullet-tdd
9
+
10
+ ## When to use
11
+
12
+ Implementing a feature that touches multiple layers (e.g., MCP tool → core function → store). Get a thin slice end-to-end working before fattening any layer.
13
+
14
+ ## Steps
15
+
16
+ 1. Write the **smallest** test that exercises every layer (skinny e2e).
17
+ 2. Stub each layer with the minimum code to make it red, then green.
18
+ 3. Commit the tracer; the diff shows the architecture in one screen.
19
+ 4. Now widen each layer: add cases to the unit tests of the layer that needs them.
20
+ 5. Refactor only after green; never refactor red code.
21
+
22
+ ## Anti-patterns
23
+
24
+ - Building one layer fully before the next exists ("layer-by-layer waterfall").
25
+ - Writing 10 tests up front then implementing — long red period kills feedback.
26
+ - Skipping refactor because tests pass — debt accumulates per layer.
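A first tracer for the MCP tool → core function → store example might look like the sketch below, with every layer as thin as it can be. All names (`NoteStore`, `addNote`, `handleAddNoteTool`) are hypothetical, and the store is an in-memory stand-in. The skinny end-to-end check is a plain function rather than code for a specific test runner.

```typescript
// Layer 3: store. An in-memory map stands in for the real store.
interface NoteStore {
  save(id: string, text: string): void;
  get(id: string): string | undefined;
}

function createMemoryStore(): NoteStore {
  const rows = new Map<string, string>();
  return {
    save: (id, text) => { rows.set(id, text); },
    get: (id) => rows.get(id),
  };
}

// Layer 2: core function, with only the logic the first test needs.
function addNote(store: NoteStore, id: string, text: string): { id: string } {
  store.save(id, text.trim());
  return { id };
}

// Layer 1: tool handler that adapts tool arguments to the core call.
function handleAddNoteTool(store: NoteStore, args: { id: string; text: string }): { ok: true; id: string } {
  const { id } = addNote(store, args.id, args.text);
  return { ok: true, id };
}

// Skinny e2e: enters at the tool layer and asserts on what the store holds.
function tracerBulletTest(): void {
  const store = createMemoryStore();
  const result = handleAddNoteTool(store, { id: "n1", text: "  hello  " });
  if (!result.ok || result.id !== "n1" || store.get("n1") !== "hello") {
    throw new Error("tracer bullet failed");
  }
}
```

Once this passes, each layer can be widened with its own unit tests, as step 4 describes.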
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: budget-aware-picking
3
+ description: Pick next task respecting cost budget — prefer XS/S when low
4
+ category: plan
5
+ phases: [PLAN, IMPLEMENT]
6
+ ---
7
+
8
+ # budget-aware-picking
9
+
10
+ ## When to use
11
+
12
+ When LLM cost is approaching the run cap or the sprint budget. Prevents tail-end blowups by biasing toward small tasks.
13
+
14
+ ## Steps
15
+
16
+ 1. Read current spend via `metrics(action='session_cost')`.
17
+ 2. If totalUsd / capUsdPerRun > 0.8, filter ready tasks to xpSize XS/S.
18
+ 3. Sort remaining: priority ASC, depth ASC, id stable tiebreak.
19
+ 4. If no XS/S available, fall back to any size (still escalates approval).
20
+ 5. Document the bias in `decisions.record` so the audit trail shows why.
21
+
22
+ ## Anti-patterns
23
+
24
+ - Letting an L task run with 5% budget remaining — likely mid-task abort.
25
+ - Hard-stopping at 80% without fallback — unfinished critical path.
26
+ - Re-picking the same task after cost-fallback engages without re-budgeting.
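Steps 2 to 4 fit in one pure function, sketched below. The `ReadyTask` shape, with a numeric `priority` (lower first) and a `depth`, is hypothetical. The spend figures would come from `metrics(action='session_cost')` in the caller. The returned flags are what step 5 would record.

```typescript
type XpSize = "XS" | "S" | "M" | "L" | "XL";

interface ReadyTask {
  id: string;
  priority: number; // lower = more urgent
  depth: number;
  xpSize: XpSize;
}

interface PickResult {
  task: ReadyTask | undefined;
  smallOnly: boolean; // budget bias engaged
  fellBack: boolean; // no XS/S task was ready, so any size was allowed
}

// priority ASC, depth ASC, then id as a stable tiebreak.
function compareTasks(a: ReadyTask, b: ReadyTask): number {
  return a.priority - b.priority || a.depth - b.depth || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
}

function pickNextTask(ready: ReadyTask[], totalUsd: number, capUsdPerRun: number, threshold = 0.8): PickResult {
  const first = (tasks: ReadyTask[]) => [...tasks].sort(compareTasks)[0];
  if (totalUsd / capUsdPerRun <= threshold) {
    return { task: first(ready), smallOnly: false, fellBack: false };
  }
  const small = ready.filter((t) => t.xpSize === "XS" || t.xpSize === "S");
  if (small.length > 0) {
    return { task: first(small), smallOnly: true, fellBack: false };
  }
  return { task: first(ready), smallOnly: true, fellBack: true };
}
```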
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: plan-sprint
3
+ description: Build a sprint with capacity + WIP=1 + dependency-respecting order
4
+ category: plan
5
+ phases: [PLAN]
6
+ ---
7
+
8
+ # plan-sprint
9
+
10
+ ## When to use
11
+
12
+ End of ANALYZE/DESIGN, before IMPLEMENT. You have decomposed atomic tasks and need a sprint that fits the team capacity.
13
+
14
+ ## Steps
15
+
16
+ 1. Compute capacity: hours_available × focus_factor (0.65 default).
17
+ 2. Sort tasks by `depends_on` (topological), then priority ASC.
18
+ 3. Greedily pack until capacity exhausted; respect xpSize budget.
19
+ 4. Run `analyze(mode='design_ready')` and `sync_stack_docs` before unfreezing.
20
+ 5. Verify WIP=1 enforceable: no two tasks share the same primary owner.
21
+
22
+ ## Anti-patterns
23
+
24
+ - Filling capacity to 100% (kills cycle time per Little's Law).
25
+ - Ignoring blocked tasks until sprint starts — pre-resolve in PLAN.
26
+ - Sprint with mixed phases (analyze + implement) — split per phase.
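Steps 1 to 3 are sketched below. Capacity uses the 0.65 default focus factor. Ordering uses Kahn's topological sort, breaking ties by priority ASC and then id. Packing is greedy: it skips tasks that do not fit and tasks whose dependency was left out. The `SprintTask` shape and its `estimateHours` field are assumptions, and dependencies outside the candidate list are treated as already done.

```typescript
interface SprintTask {
  id: string;
  priority: number; // lower first
  dependsOn: string[];
  estimateHours: number;
}

function sprintCapacityHours(hoursAvailable: number, focusFactor = 0.65): number {
  return hoursAvailable * focusFactor;
}

const byPriorityThenId = (a: SprintTask, b: SprintTask): number =>
  a.priority - b.priority || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);

// Kahn's algorithm: repeatedly take the best-ranked task whose in-list dependencies are placed.
function orderByDependencies(tasks: SprintTask[]): SprintTask[] {
  const byId = new Map(tasks.map((t) => [t.id, t] as [string, SprintTask]));
  const remainingDeps = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    const deps = t.dependsOn.filter((d) => byId.has(d));
    remainingDeps.set(t.id, deps.length);
    for (const d of deps) dependents.set(d, [...(dependents.get(d) ?? []), t.id]);
  }
  const ready = tasks.filter((t) => remainingDeps.get(t.id) === 0);
  const ordered: SprintTask[] = [];
  while (ready.length > 0) {
    ready.sort(byPriorityThenId);
    const next = ready.shift()!;
    ordered.push(next);
    for (const id of dependents.get(next.id) ?? []) {
      const left = (remainingDeps.get(id) ?? 0) - 1;
      remainingDeps.set(id, left);
      if (left === 0) ready.push(byId.get(id)!);
    }
  }
  if (ordered.length !== tasks.length) throw new Error("dependency cycle among sprint tasks");
  return ordered;
}

// Greedy packing in dependency order; a task is skipped if it would exceed
// capacity or if one of its in-list dependencies was not packed.
function packSprint(tasks: SprintTask[], capacityHours: number): SprintTask[] {
  const inList = new Set(tasks.map((t) => t.id));
  const packed = new Set<string>();
  const sprint: SprintTask[] = [];
  let used = 0;
  for (const t of orderByDependencies(tasks)) {
    const depsOk = t.dependsOn.every((d) => !inList.has(d) || packed.has(d));
    if (!depsOk || used + t.estimateHours > capacityHours) continue;
    sprint.push(t);
    packed.add(t.id);
    used += t.estimateHours;
  }
  return sprint;
}
```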
@@ -0,0 +1,67 @@
1
+ ---
2
+ name: to-issues
3
+ description: Break a plan or PRD into independently grabbable issues using tracer-bullet vertical slices; complements plan_sprint by producing the issue list that feeds it
4
+ category: plan
5
+ phases: [PLAN]
6
+ ---
7
+
8
+ # to-issues
9
+
10
+ Port of `skills-main/to-issues`. Use after `to-prd` (or when the PRD already exists) to produce the slice list that `plan_sprint` and `gh issue create` consume.
11
+
12
+ ## Process
13
+
14
+ ### 1. Gather context
15
+
16
+ Work from the PRD node or GitHub issue passed in. `gh issue view <n>` if you need comments.
17
+
18
+ ### 2. Draft vertical slices
19
+
20
+ Each slice is a **tracer bullet** — thin path through ALL layers (schema, API, UI, tests). NOT a horizontal layer cut.
21
+
22
+ Rules:
23
+ - Each slice is independently demoable
24
+ - Prefer many thin slices over few fat ones
25
+ - Mark slices **HITL** (needs human decision/review) or **AFK** (agent can ship alone). Prefer AFK.
26
+
27
+ ### 3. Quiz the user
28
+
29
+ Present the slices as a numbered table:
30
+
31
+ | # | Title | Type | Blocked by | Stories covered |
32
+ |---|---|---|---|---|
33
+ | 1 | … | AFK | none | US-1, US-2 |
34
+
35
+ Ask:
36
+ - Granularity right? (too coarse / too fine)
37
+ - Dependencies correct?
38
+ - Any slices need to merge or split?
39
+ - HITL/AFK assignments correct?
40
+
41
+ Iterate until approved.
42
+
43
+ ### 4. File issues
44
+
45
+ For each slice, create with `gh issue create` in dependency order so blockers get real numbers first.
46
+
47
+ ```markdown
48
+ ## Parent
49
+ #<parent-issue-number>
50
+
51
+ ## What to build
52
+ End-to-end behavior for this slice. No layer-by-layer how-to.
53
+
54
+ ## Acceptance criteria
55
+ - [ ] …
56
+
57
+ ## Blocked by
58
+ - #<n> (or "None — can start immediately")
59
+ ```
60
+
61
+ Do not close or modify the parent issue.
62
+
63
+ ## Anti-patterns
64
+
65
+ - Horizontal slicing ("schema PR, then API PR, then UI PR") — none of those demo on their own
66
+ - Treating every slice as HITL — block on human availability instead of shipping
67
+ - Skipping the quiz step — user will reject the breakdown later, costing more
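Step 4's ordering rule, that blockers get real numbers first, can be sketched as a loop. The loop files any slice whose blockers already have issue numbers, then substitutes those numbers into the body template. `Slice`, `fileSlices`, and `renderIssueBody` are hypothetical. `createIssue` is injected; in practice it might wrap `gh issue create` with `--title` and `--body`, which prints the new issue's URL, and parse the number from that URL.

```typescript
interface Slice {
  key: string; // local placeholder id used before real numbers exist
  title: string;
  type: "AFK" | "HITL";
  blockedBy: string[]; // keys of other slices
  whatToBuild: string;
  acceptance: string[];
}

type CreateIssue = (title: string, body: string) => Promise<number>;

function renderIssueBody(slice: Slice, parentNumber: number, blockerNumbers: number[]): string {
  const blocked = blockerNumbers.length > 0
    ? blockerNumbers.map((n) => `- #${n}`)
    : ["- None (can start immediately)"];
  return [
    "## Parent", `#${parentNumber}`, "",
    "## What to build", slice.whatToBuild, "",
    "## Acceptance criteria", ...slice.acceptance.map((ac) => `- [ ] ${ac}`), "",
    "## Blocked by", ...blocked,
  ].join("\n");
}

// File slices so every blocker is created before the slices it blocks.
async function fileSlices(slices: Slice[], parentNumber: number, createIssue: CreateIssue): Promise<Map<string, number>> {
  const numbers = new Map<string, number>();
  const pending = [...slices];
  while (pending.length > 0) {
    const idx = pending.findIndex((s) => s.blockedBy.every((b) => numbers.has(b)));
    if (idx === -1) throw new Error("cycle or unknown blocker among slices");
    const slice = pending.splice(idx, 1)[0];
    const blockerNumbers = slice.blockedBy.map((b) => numbers.get(b)!);
    numbers.set(slice.key, await createIssue(slice.title, renderIssueBody(slice, parentNumber, blockerNumbers)));
  }
  return numbers;
}
```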
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: citation-coverage-review
3
+ description: Verify every new core file has at least one traceable §-citation
4
+ category: review
5
+ phases: [REVIEW, HANDOFF]
6
+ ---
7
+
8
+ # citation-coverage-review
9
+
10
+ ## When to use
11
+
12
+ In PR review for any change touching src/core/. Citations anchor implementation back to design intent (EPIC, ADR, RFC).
13
+
14
+ ## Steps
15
+
16
+ 1. Run `analyze(mode='citation_groundedness')`.
17
+ 2. For each src/core/*.ts in the diff lacking §-citation, request one in review.
18
+ 3. Acceptable forms: `§EPIC-22.A4`, `§ADR-0049`, `RFC 7232 §2.3`.
19
+ 4. Skip src/tests/, src/cli/, src/web/ (caller-side discipline only).
20
+ 5. citation-coverage-guard hook (E21.T01) runs the check at task:post-complete; this skill is for human review depth.
21
+
22
+ ## Anti-patterns
23
+
24
+ - Bulk-adding cosmetic §-tags that don't trace anywhere.
25
+ - Approving "no citation needed for utilities" when the utility encodes a real design choice.
26
+ - Ignoring the hook warning thinking "it's just advisory".
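The presence check behind step 2 is sketched below, assuming the changed files are available as path/content pairs. `filesMissingCitation` is hypothetical. It only checks that a citation in one of the step 3 forms exists; whether that citation actually traces to a spec is still the reviewer's job. Restricting the check to `src/core/` already excludes the directories skipped in step 4. Nested folders under `src/core/` are included here, which is an assumption about the glob.

```typescript
// Accepts the forms from step 3: §EPIC-22.A4, §ADR-0049, RFC 7232 §2.3.
const CITATION = /§(?:EPIC|ADR)-[A-Za-z0-9.]+|RFC \d+ §\d+(?:\.\d+)*/;

interface ChangedFile {
  path: string;
  content: string;
}

function needsCitation(path: string): boolean {
  return path.startsWith("src/core/") && path.endsWith(".ts");
}

// Presence check only: a decorative tag still passes, so review depth matters.
function filesMissingCitation(changed: ChangedFile[]): string[] {
  return changed
    .filter((f) => needsCitation(f.path) && !CITATION.test(f.content))
    .map((f) => f.path);
}
```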
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: deep-module-review
3
+ description: Audit a module's depth ratio + interface surface before merge
4
+ category: review
5
+ phases: [REVIEW]
6
+ ---
7
+
8
+ # deep-module-review
9
+
10
+ ## When to use
11
+
12
+ In code review of a new module or significant change. Goal: keep modules deep (small interface, large impl) per Ousterhout.
13
+
14
+ ## Steps
15
+
16
+ 1. Run `analyze(mode='deep_module', dir=<changed-files>)`.
17
+ 2. Reject any new file with depth='shallow' (ratio > 0.5) unless intentional facade.
18
+ 3. For 'medium', ask: can any export be made internal? Does any import only need one symbol?
19
+ 4. Check: function names describe behavior, not implementation.
20
+ 5. Block merge if shallowCandidates > 0 without justification in PR description.
21
+
22
+ ## Anti-patterns
23
+
24
+ - Approving "looks fine" without running the analyzer.
25
+ - Letting test helpers leak into production exports because they're "small".
26
+ - Accepting helper modules with 10 exports — usually a bag of utilities, not a module.
@@ -0,0 +1,34 @@
1
+ ---
2
+ name: zoom-out
3
+ description: Step up one abstraction layer; produce a map of relevant modules and their callers when you don't know an area of code well
4
+ category: review
5
+ phases: [REVIEW, ANALYZE]
6
+ ---
7
+
8
+ # zoom-out
9
+
10
+ Port of `skills-main/zoom-out`. Use when you (or a collaborator) hit code in an unfamiliar area and need orientation before changing it.
11
+
12
+ ## When to use
13
+
14
+ - Reviewing a PR in a subsystem you've never touched
15
+ - About to refactor without a clear map of impact
16
+ - A bug report points into a module whose role you can't articulate
17
+
18
+ ## Output shape
19
+
20
+ A short tree (≤ 1 page) covering:
21
+
22
+ 1. **The module itself** — purpose in one sentence
23
+ 2. **Direct callers** — who depends on this (use `code_intelligence` to find them)
24
+ 3. **Direct dependencies** — what this depends on
25
+ 4. **Sibling modules** — others playing the same role at the same layer
26
+ 5. **Owning epic / requirement node** — pull from the graph if a node references this code path
27
+
28
+ Keep it factual. The point is a map, not commentary.
29
+
30
+ ## Anti-patterns
31
+
32
+ - Diving into implementation details — that's "zoom IN"; this skill is the opposite
33
+ - Skipping the callers — without them you can't tell which behaviors are load-bearing
34
+ - Inventing structure to fit a pattern you saw elsewhere — describe what's actually there
@@ -0,0 +1,30 @@
1
+ ---
2
+ name: dod-checklist
3
+ description: Definition of Done — 9 checks before update_status(done)
4
+ category: validate
5
+ phases: [VALIDATE, IMPLEMENT]
6
+ ---
7
+
8
+ # dod-checklist
9
+
10
+ ## When to use
11
+
12
+ Before calling `finish_task` or `update_status(done)`. The pipeline runs these automatically; this skill is the explicit human-readable form for review.
13
+
14
+ ## Steps
15
+
16
+ 1. has_acceptance_criteria — task or parent has AC. **Required.**
17
+ 2. ac_quality_pass — score ≥ 60 (INVEST). **Required.**
18
+ 3. no_unresolved_blockers — no `depends_on` to non-done. **Required.**
19
+ 4. status_flow_valid — passed through `in_progress` before `done`. **Required.**
20
+ 5. has_description — non-empty.
21
+ 6. not_oversized — L/XL must have subtasks.
22
+ 7. has_testable_ac — at least 1 AC has GIVEN/WHEN/THEN.
23
+ 8. has_estimate — xpSize OR estimateMinutes set.
24
+ 9. has_test_files — testFiles populated.
25
+
26
+ ## Anti-patterns
27
+
28
+ - Marking done when test_gate is failing — violates DoD #4 spirit.
29
+ - Hand-editing AC after starting work to fit "what was done".
30
+ - Skipping ac_quality_pass with a "trivially fixed in next task" excuse.
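The nine checks can be written as a pure evaluator. This is a minimal sketch: the pipeline's real implementation is not shown here, and the `DodTask` field names are hypothetical. Checks 1 to 4 are marked required, as in the list. The sketch treats checks 5 to 9 as warnings that do not block `done`, which is an assumption. It reads "GIVEN/WHEN/THEN" as the three keywords appearing in that order.

```typescript
interface DodTask {
  acceptanceCriteria: string[]; // own or inherited from the parent
  acQualityScore: number; // INVEST score, 0 to 100
  dependsOnStatuses: string[];
  statusHistory: string[];
  description: string;
  xpSize?: "XS" | "S" | "M" | "L" | "XL";
  subtaskCount: number;
  estimateMinutes?: number;
  testFiles: string[];
}

interface DodResult {
  name: string;
  required: boolean;
  passed: boolean;
}

function evaluateDod(t: DodTask): { results: DodResult[]; canMarkDone: boolean } {
  const results: DodResult[] = [
    { name: "has_acceptance_criteria", required: true, passed: t.acceptanceCriteria.length > 0 },
    { name: "ac_quality_pass", required: true, passed: t.acQualityScore >= 60 },
    { name: "no_unresolved_blockers", required: true, passed: t.dependsOnStatuses.every((s) => s === "done") },
    { name: "status_flow_valid", required: true, passed: t.statusHistory.includes("in_progress") },
    { name: "has_description", required: false, passed: t.description.trim().length > 0 },
    { name: "not_oversized", required: false, passed: !(t.xpSize === "L" || t.xpSize === "XL") || t.subtaskCount > 0 },
    {
      name: "has_testable_ac",
      required: false,
      passed: t.acceptanceCriteria.some((ac) => /\bGIVEN\b[\s\S]*\bWHEN\b[\s\S]*\bTHEN\b/i.test(ac)),
    },
    { name: "has_estimate", required: false, passed: t.xpSize !== undefined || t.estimateMinutes !== undefined },
    { name: "has_test_files", required: false, passed: t.testFiles.length > 0 },
  ];
  // Only required checks gate `done` in this sketch; the rest are reported.
  return { results, canMarkDone: results.every((r) => !r.required || r.passed) };
}
```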
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: harness-regression-check
3
+ description: Compare harness scores before/after to gate merge
4
+ category: validate
5
+ phases: [VALIDATE, REVIEW]
6
+ ---
7
+
8
+ # harness-regression-check
9
+
10
+ ## When to use
11
+
12
+ In VALIDATE before promoting to REVIEW; in REVIEW before merging. Harness < 70 = elevated hallucination risk.
13
+
14
+ ## Steps
15
+
16
+ 1. Read baseline from last green commit: `analyze(mode='harness_trend')`.
17
+ 2. Run `analyze(mode='harness_scan')` on current state.
18
+ 3. If delta ≤ -5 pts: investigate which dimension regressed (type, test, naming, error handling, etc.).
19
+ 4. If delta ≤ -10 pts: block merge. Open an investigation task before continuing.
20
+ 5. Persist the new score so the next session sees it as baseline.
21
+
22
+ ## Anti-patterns
23
+
24
+ - Treating harness as vanity metric — it predicts review effort.
25
+ - Boosting one dimension (e.g., adding empty JSDoc) to mask another regression.
26
+ - Ignoring the warning at start_task because "I'll clean up at the end".