cursordoctrine 0.1.1 → 0.2.1
- package/README.md +3 -3
- package/linux/hooks/anti-slop-audit.sh +36 -11
- package/linux/hooks/anti-slop.md +54 -56
- package/linux/hooks/final-review.md +29 -14
- package/linux/hooks/final-review.sh +31 -9
- package/linux/hooks/hook-common.sh +70 -0
- package/linux/hooks/self-review.md +7 -0
- package/linux/hooks/subagent-stop-review.sh +1 -1
- package/package.json +2 -2
- package/skills/anti-slop/SKILL.md +15 -8
- package/windows/hooks/anti-slop-audit.ps1 +55 -11
- package/windows/hooks/anti-slop.md +54 -56
- package/windows/hooks/final-review.md +29 -14
- package/windows/hooks/final-review.ps1 +25 -8
- package/windows/hooks/hook-common.ps1 +41 -0
- package/windows/hooks/self-review.md +7 -0
- package/windows/hooks/subagent-stop-review.ps1 +1 -1
package/README.md
CHANGED
|
@@ -10,7 +10,7 @@ A small set of Cursor hooks that make the agent review its own work without bolt
|
|
|
10
10
|
2. **Hand the model its own edits back.** After each agent edit, a self-review prompt (plus minimal-edit and anti-slop advisories when they trip) is stashed and delivered on the next turn. The model reads its own diff, fixes real bugs, and stays quiet otherwise.
|
|
11
11
|
3. **Gate blast radius.** One permission gate denies a short, explicit list of dangerous commands (`rm -rf /`, `curl | sh`, force-push, `npm publish`, ...). Everything else is allowed.
|
|
12
12
|
|
|
13
|
-
When an implementation finishes, a stop hook fires exactly one final review pass over everything that changed — then stops. Delegated work gets the same treatment: a subagent that edited files reviews its own implementation before its result returns to the parent, and its edits are folded into the parent's final review. Every bound is enforced twice: in the script and in `hooks.json`.
|
|
13
|
+
When an implementation finishes, a stop hook fires exactly one final review pass over everything that changed — then stops. The review runs across five axes, the first of which is **intent trace**: the hook extracts your last user message from the transcript and prepends it to the review so the model must trace every diff hunk back to a concrete request. Anything untraceable is a hallucinated requirement and gets reverted — this is the only detector that catches "clean code, wrong feature," which no later axis and no linter can see. Delegated work gets the same treatment: a subagent that edited files reviews its own implementation before its result returns to the parent, and its edits are folded into the parent's final review. Every bound is enforced twice: in the script and in `hooks.json`.
|
|
14
14
|
|
|
15
15
|
This setup is for Cursor only. It installs into `~/.cursor` and `~/.agents/hooks` and touches nothing in your projects.
|
|
16
16
|
|
|
@@ -41,7 +41,7 @@ The two folders are functionally identical. Windows runs everything through `pws
|
|
|
41
41
|
| Session | `sessionStart` | `inject-doctrine` reads the doctrine + user rules and emits them as `additional_context`. |
|
|
42
42
|
| Every turn | `postToolUse` | Folds completed subagents' edit markers into this conversation's marker, then drains the conversation's pending feedback file into `additional_context`. One-shot, keyed by conversation id. |
|
|
43
43
|
| Shell | `beforeShellExecution` | `permission-gate` checks the command against a deny list. Allow by default, deny by list, fail open. |
|
|
44
|
-
| Edit | `afterFileEdit` + `stop` | `self-review-trigger` stashes the review prompt per edit; `minimal-edit-audit` and `anti-slop-audit` append advisories when thresholds trip; `final-review` fires one end-of-implementation pass. |
|
|
44
|
+
| Edit | `afterFileEdit` + `stop` | `self-review-trigger` stashes the review prompt per edit; `minimal-edit-audit` and `anti-slop-audit` append advisories when thresholds trip (new deps / premature abstraction / redundant comments / Tier 3 operational slop: retry-without-backoff, await-in-loop, telemetry spam); `final-review` fires one end-of-implementation pass. |
|
|
45
45
|
| Subagent | `subagentStop` | `subagent-stop-review` fires one in-subagent final review when a delegated run edited files, before the result returns to the parent. Marker-gated and flag-braked like `final-review`. |
|
|
46
46
|
|
|
47
47
|
## Install
|
|
@@ -59,7 +59,7 @@ No Node? Open `INSTALL.md`, paste its contents into a Cursor agent chat on the t
|
|
|
59
59
|
|
|
60
60
|
Prerequisites: `git` everywhere; `pwsh` on Windows; `bash` plus `jq` or `python3` on Linux.
|
|
61
61
|
|
|
62
|
-
The anti-slop skill (`skills/anti-slop/` — SKILL.md and the duplication scanner) installs to `~/.cursor/skills/anti-slop/`. The
|
|
62
|
+
The anti-slop skill (`skills/anti-slop/` — SKILL.md and the duplication scanner) installs to `~/.cursor/skills/anti-slop/`. The hook checklist (`~/.agents/hooks/anti-slop.md`, 13 items) is the canonical slop detector for both per-edit advisories and final-review axis 4. The final review runs the scanner from the skill path first when available.
|
|
63
63
|
|
|
64
64
|
## Tuning and kill switches
|
|
65
65
|
|
|
package/linux/hooks/anti-slop-audit.sh
CHANGED
|
@@ -12,6 +12,8 @@
|
|
|
12
12
|
# *Strategy / *Singleton / *Facade / *Builder / *Visitor / *Decorator
|
|
13
13
|
# class, or CQRS / Event-Sourcing / DDD vocabulary
|
|
14
14
|
# * redundant comments that merely restate the next line of code
|
|
15
|
+
# * operational slop (Tier 3): retry-without-backoff, await-in-loop,
|
|
16
|
+
# telemetry spam (>= 6 log/print statements added in one edit)
|
|
15
17
|
#
|
|
16
18
|
# Fires when a static signal trips OR the edit added a substantial block of
|
|
17
19
|
# new source (>= ANTI_SLOP_CHECKLIST_LINES, default 40). Otherwise silent.
|
|
@@ -102,6 +104,31 @@ redundant="$(printf '%s\n' "$added" |
|
|
|
102
104
|
fi
|
|
103
105
|
done | sort -u | head -n 4)"
|
|
104
106
|
|
|
107
|
+
# --- signal 4: operational slop (Tier 3) ------------------------------------
|
|
108
|
+
# Retry-without-backoff: a retry construct with no sleep/backoff/setTimeout in
|
|
109
|
+
# the added lines. Seed-grade (high precision); the model judges.
|
|
110
|
+
ops_flags=""
|
|
111
|
+
if printf '%s\n' "$added" | grep -qE '\b(retry|retryCount|retries|maxRetries|attempt)[A-Za-z0-9_]*\b'; then
|
|
112
|
+
if ! printf '%s\n' "$added" | grep -qE '\b(sleep|setTimeout|backoff|back_off|exponential|jitter|delay)[A-Za-z0-9_]*\b'; then
|
|
113
|
+
ops_flags="${ops_flags}- RETRY WITHOUT BACKOFF: a retry construct was added but no sleep/backoff/setTimeout is visible in this edit's added lines. Unbounded retries = retry storms + token/cost burn; add bounded backoff or confirm the runtime already throttles.
|
|
114
|
+
"
|
|
115
|
+
fi
|
|
116
|
+
fi
|
|
117
|
+
# Awaited IO call co-occurring with a loop construct on the same edit. N+1 in
|
|
118
|
+
# agent/edge code, not just SQL. The model judges streaming vs serial-await.
|
|
119
|
+
if printf '%s\n' "$added" | grep -qE '\b(for|while|forEach|map|filter|reduce|flatMap|for[[:space:]]+await|async[[:space:]]+for)\b'; then
|
|
120
|
+
if printf '%s\n' "$added" | grep -qE '\bawait[[:space:]]+(fetch|ctx\.db|ctx\.run|client\.|axios|prisma\.|supabase\.|db\.|repo\.)'; then
|
|
121
|
+
ops_flags="${ops_flags}- AWAIT IN LOOP: a loop construct and an awaited IO call both appear in this edit. Sequential awaits in a loop = N+1 / serial latency; confirm whether Promise.all / a batch call / a single query is the right primitive. (If this is genuinely a streaming pattern, ignore.)
|
|
122
|
+
"
|
|
123
|
+
fi
|
|
124
|
+
fi
|
|
125
|
+
# Telemetry spam seed: 6+ log/print statements added in one file.
|
|
126
|
+
log_count="$(printf '%s\n' "$added" | grep -cE '\b(console\.(log|debug|info|warn|error)|print\(|fmt\.Print|std::cout|NSLog|System\.out\.println|println!|dbg!|console\.dir)\b')"
|
|
127
|
+
if [ "$log_count" -ge 6 ]; then
|
|
128
|
+
ops_flags="${ops_flags}- TELEMETRY SPAM: ${log_count} log/print statements added in this one edit. Debug-level telemetry that nobody reads is slop; consolidate or remove (kept only if this is a real logging entrypoint).
|
|
129
|
+
"
|
|
130
|
+
fi
|
|
131
|
+
|
|
105
132
|
# --- decide whether to fire -------------------------------------------------
|
|
106
133
|
added_code="$(printf '%s\n' "$added" | grep -cE '[^[:space:]]')"
|
|
107
134
|
checklist_lines="${ANTI_SLOP_CHECKLIST_LINES:-40}"
|
|
@@ -118,6 +145,7 @@ flags=""
|
|
|
118
145
|
"
|
|
119
146
|
[ -n "$redundant" ] && flags="${flags}- REDUNDANT COMMENTS: $(printf '%s' "$redundant" | paste -sd '|' -) - delete comments that restate the code; keep only WHY.
|
|
120
147
|
"
|
|
148
|
+
flags="${flags}${ops_flags}"
|
|
121
149
|
|
|
122
150
|
if [ -z "$flags" ] && [ "$substantial" = "0" ]; then exit 0; fi
|
|
123
151
|
|
|
@@ -126,17 +154,14 @@ checklist_file="$HOME/.agents/hooks/anti-slop.md"
|
|
|
126
154
|
checklist=""
|
|
127
155
|
[ -f "$checklist_file" ] && checklist="$(cat "$checklist_file")"
|
|
128
156
|
if [ -z "$checklist" ]; then
|
|
129
|
-
checklist='ANTI-SLOP
|
|
130
|
-
1
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
8. Cargo cult - delete any construct whose reason you cannot state.
|
|
138
|
-
9. Architecture - respect the project'"'"'s layering and boundaries.
|
|
139
|
-
10. Redundant comments restating code - delete; keep only WHY.'
|
|
157
|
+
checklist='ANTI-SLOP — read ~/.agents/hooks/anti-slop.md (13 items). Fallback if missing:
|
|
158
|
+
1–10: edge cases, duplication, conventions, deps, premature abstraction,
|
|
159
|
+
accidental complexity, tests (no tautologies), cargo cult, architecture,
|
|
160
|
+
redundant comments / prompt residue.
|
|
161
|
+
11: semantic contracts (behavior change without name/signature change).
|
|
162
|
+
12: operational slop (retry w/o backoff, await-in-loop, telemetry spam).
|
|
163
|
+
13: change surface (too many files for a simple request).
|
|
164
|
+
Fix guilty items now. Never revert what the user asked for.'
|
|
140
165
|
fi
|
|
141
166
|
|
|
142
167
|
flag_block=""
|
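The three Tier 3 seeds added to `anti-slop-audit.sh` above can be restated outside the shell for experimentation. This is a minimal sketch in Python, not the package's code: `operational_flags` and the short flag strings are hypothetical names, and the patterns are transcribed from the hunk. One deliberate difference, stated as an assumption about how the hook's regex behaves: the log pattern here drops the trailing `\b`, because a word boundary right after `print\(` or `println!` appears to require a word character as the next character, so a line such as `print("x")` would likely go uncounted.

```python
import re

# Patterns transcribed from the hook's grep expressions (illustrative copy).
RETRY = re.compile(r"\b(retry|retryCount|retries|maxRetries|attempt)\w*\b")
BACKOFF = re.compile(r"\b(sleep|setTimeout|backoff|back_off|exponential|jitter|delay)\w*\b")
LOOP = re.compile(r"\b(for|while|forEach|map|filter|reduce|flatMap)\b")
AWAIT_IO = re.compile(
    r"\bawait\s+(fetch|ctx\.db|ctx\.run|client\.|axios|prisma\.|supabase\.|db\.|repo\.)"
)
# Assumption: no trailing \b here, so print("...") and println!(...) are counted.
LOG = re.compile(
    r"\b(console\.(log|debug|info|warn|error|dir)|print\(|fmt\.Print|std::cout"
    r"|NSLog|System\.out\.println|println!|dbg!)"
)


def operational_flags(added_lines, log_threshold=6):
    """Return seed-grade flags for the added lines of one edit (hypothetical helper)."""
    def has(pattern):
        return any(pattern.search(line) for line in added_lines)

    flags = []
    # Retry vocabulary with no backoff vocabulary anywhere in the added lines.
    if has(RETRY) and not has(BACKOFF):
        flags.append("RETRY WITHOUT BACKOFF")
    # A loop keyword and an awaited IO call co-occurring in the same edit.
    if has(LOOP) and has(AWAIT_IO):
        flags.append("AWAIT IN LOOP")
    # Count lines (not occurrences) that contain a log/print call, like grep -c.
    log_count = sum(1 for line in added_lines if LOG.search(line))
    if log_count >= log_threshold:
        flags.append(f"TELEMETRY SPAM ({log_count})")
    return flags
```

As in the hook, these are co-occurrence seeds over the added lines only; they do not prove the awaited call sits inside the loop, which is why the advisory leaves the verdict to the model.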
package/linux/hooks/anti-slop.md
CHANGED
|
@@ -1,56 +1,54 @@
|
|
|
1
|
-
ANTI-SLOP SELF-REVIEW — you just edited a file
|
|
2
|
-
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
not a
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
Hard constraints: never revert
|
|
54
|
-
|
|
55
|
-
targeted edits, then stop. The bar: would this pass a senior review at a top
|
|
56
|
-
engineering org without a single "why is this here?" comment.
|
|
1
|
+
ANTI-SLOP SELF-REVIEW — you just edited a file (or you are auditing the
|
|
2
|
+
session diff at final review). Code that runs but should not ship.
|
|
3
|
+
|
|
4
|
+
Intent trace (Tier 0 — hallucinated requirements, scope drift) runs FIRST at
|
|
5
|
+
stop via final-review axis 0, not here. This checklist covers code-shape and
|
|
6
|
+
cost slop. Apply every item; if guilty, FIX with Edit — delete, inline, drop.
|
|
7
|
+
Do not explain. If clean, say nothing.
|
|
8
|
+
|
|
9
|
+
1. EDGE CASES — Happy path only? Check null / empty / zero / boundary / error
|
|
10
|
+
inputs the task implies.
|
|
11
|
+
|
|
12
|
+
2. DUPLICATION — Logic that already exists in this repo? Call it; do not
|
|
13
|
+
re-implement. Same function in many files (isRecord-class) → one source.
|
|
14
|
+
|
|
15
|
+
3. CONVENTIONS — Match the FILE's style, naming, structure, error-handling,
|
|
16
|
+
imports. Not your defaults.
|
|
17
|
+
|
|
18
|
+
4. DEPENDENCIES — New library for something stdlib or an existing dep covers?
|
|
19
|
+
Remove it. A dependency must earn its place.
|
|
20
|
+
|
|
21
|
+
5. PREMATURE ABSTRACTION — Factory / Repository / Mediator / Strategy / Builder /
|
|
22
|
+
CQRS / Event Sourcing / DDD: is there a REAL problem with 2–3 call sites
|
|
23
|
+
TODAY? "Future flexibility" is not a reason. Delete and write direct code.
|
|
24
|
+
|
|
25
|
+
6. ACCIDENTAL COMPLEXITY — Could a junior read this in 30 seconds? Flatten
|
|
26
|
+
indirection, generics, config, layers that do not earn their keep.
|
|
27
|
+
|
|
28
|
+
7. TESTS (epistemic slop) — Assert real OUTCOMES and edge cases, not "it runs",
|
|
29
|
+
not a mirror of the implementation, not expect(true).toBe(true). A test
|
|
30
|
+
that cannot fail is slop.
|
|
31
|
+
|
|
32
|
+
8. CARGO CULT — Can you state WHY each non-obvious construct is there? Remove
|
|
33
|
+
what you cannot justify. A shape you have seen ≠ a shape you need.
|
|
34
|
+
|
|
35
|
+
9. ARCHITECTURE — Respect layering and boundaries. No reaching across layers,
|
|
36
|
+
no business logic in the wrong place, no breaking project constraints.
|
|
37
|
+
|
|
38
|
+
10. REDUNDANT COMMENTS — Delete comments that restate the code ("// increment
|
|
39
|
+
i"). Keep only WHY, never WHAT. No prompt residue ("in a real app...").
|
|
40
|
+
|
|
41
|
+
11. SEMANTIC CONTRACTS (Tier 1) — Did any existing function's BEHAVIOR change
|
|
42
|
+
without its name, signature, or docstring changing? Names are contracts.
|
|
43
|
+
deleteUser() that now soft-deletes is silent contract break.
|
|
44
|
+
|
|
45
|
+
12. OPERATIONAL SLOP (Tier 3) — Retry loops without backoff/sleep/jitter?
|
|
46
|
+
await fetch / ctx.db / prisma inside a for/while/map? Six or more
|
|
47
|
+
console.log / print added in one edit? Token burn with no user value →
|
|
48
|
+
remove or bound.
|
|
49
|
+
|
|
50
|
+
13. CHANGE SURFACE (Tier 5) — Did a simple request touch many files? Every
|
|
51
|
+
file in the diff must trace to the task. Trim unrelated hunks.
|
|
52
|
+
|
|
53
|
+
Hard constraints: never revert what the USER asked for — slop is what got added
|
|
54
|
+
on top. At most a few targeted edits, then stop.
|
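Checklist item 12 asks for retries to be bounded and backed off. As an illustration of what a passing shape might look like, here is a minimal Python sketch; `retry_with_backoff` and its parameters are hypothetical and not part of the package. It caps the number of attempts, doubles the delay per attempt up to a ceiling, and applies jitter.

```python
import random
import time


def retry_with_backoff(op, max_attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep, rng=random.random):
    """Call op() until it succeeds or max_attempts calls have failed.

    sleep and rng are injectable so the policy can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # bounded: the last failure propagates to the caller
            # Exponential backoff capped at max_delay, scaled by jitter in [0, 1).
            delay = min(max_delay, base_delay * 2 ** attempt) * rng()
            sleep(delay)
```

Injecting `sleep` and `rng` keeps the policy deterministic under test; in production the defaults apply.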
|
package/linux/hooks/final-review.md
CHANGED
|
@@ -1,11 +1,20 @@
|
|
|
1
1
|
FINAL REVIEW — you just finished an implementation. Before you treat it as done,
|
|
2
|
-
audit EVERYTHING you changed this session across the
|
|
2
|
+
audit EVERYTHING you changed this session across the five axes below and FIX what
|
|
3
3
|
fails. Do NOT revert the behaviour the user asked for. If an axis is already
|
|
4
4
|
clean, say so in one line — do not manufacture work.
|
|
5
5
|
|
|
6
6
|
Start by re-reading the diff. Scope the review to your session's changes and the
|
|
7
7
|
code they touch.
|
|
8
8
|
|
|
9
|
+
## 0. Intent trace (HIGHEST PRIORITY — run first)
|
|
10
|
+
The hook extracted your last user message as "ORIGINAL REQUEST" above. For every
|
|
11
|
+
hunk in the diff, answer: which part of the request forced this change? Anything
|
|
12
|
+
that cannot trace to the request is a HALLUCINATED REQUIREMENT — a feature,
|
|
13
|
+
flag, refactor, abstraction, dependency, or "nice to have" that nobody asked for.
|
|
14
|
+
Revert each one. "Clean code, wrong feature" is the worst failure mode and no
|
|
15
|
+
later axis can catch it. This axis outranks all others. (If no ORIGINAL REQUEST
|
|
16
|
+
is present — sandboxed verify run, no transcript — skip this axis.)
|
|
17
|
+
|
|
9
18
|
## 1. Correctness
|
|
10
19
|
- The logic does what the task requires — no off-by-one, inverted condition,
|
|
11
20
|
wrong operator, wrong return value, wrong import path.
|
|
@@ -36,17 +45,23 @@ code they touch.
|
|
|
36
45
|
- Add the missing tests; delete tautological ones.
|
|
37
46
|
|
|
38
47
|
## 4. Anti-slop
|
|
39
|
-
|
|
40
|
-
|
|
48
|
+
Axis 0 already caught intent drift. This axis catches code-shape and cost slop
|
|
49
|
+
across the whole session diff.
|
|
50
|
+
|
|
51
|
+
Step A — mechanical scan (if available):
|
|
52
|
+
If `~/.cursor/skills/anti-slop/scripts/scan_slop.py` exists, run:
|
|
41
53
|
python ~/.cursor/skills/anti-slop/scripts/scan_slop.py --all
|
|
42
|
-
If it does NOT exist,
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
54
|
+
If it does NOT exist, skip Step A (not a failure; do not hunt for the file).
|
|
55
|
+
|
|
56
|
+
Step B — canonical checklist (always):
|
|
57
|
+
Read `~/.agents/hooks/anti-slop.md` and apply ALL 13 items to every hunk you
|
|
58
|
+
changed this session. That file is the single source of truth for slop
|
|
59
|
+
detection — items 1–10 are structural/code, 11 is semantic contracts, 12 is
|
|
60
|
+
operational slop (retries, await-in-loop, telemetry spam), 13 is change
|
|
61
|
+
surface. Fix every hit; consolidate clones to one source of truth.
|
|
62
|
+
|
|
63
|
+
Step C — session footprint (also in the header above):
|
|
64
|
+
If "Session footprint" shows >5 files or the request was simple, justify each
|
|
65
|
+
file or trim. Unjustified files are slop.
|
|
66
|
+
|
|
67
|
+
Fix with edits now; re-run the scan (if Step A ran) and the tests; then stop.
|
|
package/linux/hooks/final-review.sh
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
#!/usr/bin/env bash
|
|
2
2
|
# final-review.sh - stop hook (Cursor, Linux).
|
|
3
3
|
#
|
|
4
|
-
# ONE comprehensive end-of-implementation review across
|
|
5
|
-
# correctness, reliability, coverage, and anti-slop. When the agent finishes
|
|
4
|
+
# ONE comprehensive end-of-implementation review across five axes:
|
|
5
|
+
# intent, correctness, reliability, coverage, and anti-slop. When the agent finishes
|
|
6
6
|
# an implementation that touched files, Cursor auto-submits this hook's
|
|
7
7
|
# `followup_message` as the next user turn, so the model re-audits everything
|
|
8
8
|
# it changed this session and FIXES what fails.
|
|
@@ -76,11 +76,11 @@ if [ -z "$body" ]; then
|
|
|
76
76
|
released on every path, no races, input validated at the boundary.
|
|
77
77
|
3. Coverage - behaviour-bearing changes have real tests; RUN the suite if present;
|
|
78
78
|
no tautological tests.
|
|
79
|
-
4. Anti-slop -
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
79
|
+
4. Anti-slop - read ~/.agents/hooks/anti-slop.md and apply all 13 items to
|
|
80
|
+
the session diff. If ~/.cursor/skills/anti-slop/scripts/scan_slop.py exists,
|
|
81
|
+
run `python ~/.cursor/skills/anti-slop/scripts/scan_slop.py --all` first.
|
|
82
|
+
Consolidate clones; drop premature abstraction, unneeded deps, operational
|
|
83
|
+
slop (retries, await-in-loop, log spam), unjustified files.
|
|
84
84
|
Fix now, re-run the scan + tests, then stop. If an axis is clean, say so in one line.'
|
|
85
85
|
fi
|
|
86
86
|
body="$(expand_agent_paths "$body")"
|
|
@@ -94,9 +94,31 @@ done <<EOF
|
|
|
94
94
|
$edited
|
|
95
95
|
EOF
|
|
96
96
|
file_list="$(printf '%s' "$file_list" | head -n 30)"
|
|
97
|
-
msg="FINAL REVIEW (end of implementation) - correctness, reliability, coverage, anti-slop.
|
|
98
97
|
|
|
99
|
-
|
|
98
|
+
# Tier 0: extract the last user <user_query> from the transcript so the model
|
|
99
|
+
# can trace every diff hunk back to a concrete request. Anything untraceable is
|
|
100
|
+
# a hallucinated requirement. Empty when there is no transcript or no
|
|
101
|
+
# user_query (sandboxed verify runs, fresh installs) - the axis is then a no-op.
|
|
102
|
+
user_query="$(extract_last_user_query "$input")"
|
|
103
|
+
intent_block=""
|
|
104
|
+
[ -n "$user_query" ] && intent_block="ORIGINAL REQUEST (your last user message, for intent trace):
|
|
105
|
+
---
|
|
106
|
+
$user_query
|
|
107
|
+
---
|
|
108
|
+
|
|
109
|
+
"
|
|
110
|
+
|
|
111
|
+
# Tier 5: cross-file change-surface metric. The per-file afterFileEdit audits
|
|
112
|
+
# miss the 50-file rename case; this seeds the whole-session footprint so the
|
|
113
|
+
# model can judge whether the change surface is proportional to the request.
|
|
114
|
+
unique_files="$(printf '%s\n' "$edited" | grep -c -v '^$')"
|
|
115
|
+
surface_block="Session footprint: ${unique_files} file(s) touched. If a simple request produced >5 files or >200 lines, justify each file's inclusion or trim.
|
|
116
|
+
|
|
117
|
+
"
|
|
118
|
+
|
|
119
|
+
msg="FINAL REVIEW (end of implementation) - intent, correctness, reliability, coverage, anti-slop.
|
|
120
|
+
|
|
121
|
+
${surface_block}${intent_block}Files you changed this session:
|
|
100
122
|
$file_list
|
|
101
123
|
|
|
102
124
|
$body"
|
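The message assembly in the last `final-review.sh` hunk (footprint line, optional ORIGINAL REQUEST block, file list, review body) can be sketched as a pure function. This is a Python illustration under assumptions, not the hook itself: `build_final_review_message` is a hypothetical name, and the file list is assumed to be one path per line, since the loop that formats it is not shown in the diff.

```python
def build_final_review_message(edited, user_query, body, max_files=30):
    """Compose the follow-up text in the order the hook uses (illustrative)."""
    # Count non-empty lines, like `grep -c -v '^$'` over the edit marker.
    files = [line for line in edited.splitlines() if line != ""]
    surface_block = (
        f"Session footprint: {len(files)} file(s) touched. If a simple request "
        "produced >5 files or >200 lines, justify each file's inclusion or trim.\n\n"
    )
    # The intent block is omitted entirely when no user query was extracted.
    intent_block = ""
    if user_query:
        intent_block = (
            "ORIGINAL REQUEST (your last user message, for intent trace):\n"
            f"---\n{user_query}\n---\n\n"
        )
    # Assumption: one path per line, capped like `head -n 30`.
    file_list = "\n".join(files[:max_files])
    return (
        "FINAL REVIEW (end of implementation) - intent, correctness, "
        "reliability, coverage, anti-slop.\n\n"
        f"{surface_block}{intent_block}Files you changed this session:\n"
        f"{file_list}\n\n{body}"
    )
```

Note that the footprint counts every edited file while the listing is capped at 30, so the two numbers can differ on large sessions.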
|
package/linux/hooks/hook-common.sh
CHANGED
|
@@ -105,6 +105,76 @@ resolve_agent_path() {
|
|
|
105
105
|
esac
|
|
106
106
|
}
|
|
107
107
|
|
|
108
|
+
# extract_last_user_query <json> -> text of the last <user_query> in this
|
|
109
|
+
# conversation's transcript, or '' if there is none. Capped at 2000 chars.
|
|
110
|
+
#
|
|
111
|
+
# This is the Tier 0 intent-trace primitive: the final-review hook prepends the
|
|
112
|
+
# extracted request to its followup so the model must trace every diff hunk back
|
|
113
|
+
# to it. Anything untraceable is a hallucinated requirement.
|
|
114
|
+
#
|
|
115
|
+
# Walks the JSONL backward via tac (preferred) or a portable awk fallback; finds
|
|
116
|
+
# the first (last) user record whose content carries a <user_query> tag.
|
|
117
|
+
extract_last_user_query() {
|
|
118
|
+
local json="$1"
|
|
119
|
+
local tp
|
|
120
|
+
tp="$(json_get "$json" transcript_path)"
|
|
121
|
+
[ -n "$tp" ] && [ -f "$tp" ] || return 0
|
|
122
|
+
|
|
123
|
+
local reversed
|
|
124
|
+
if command -v tac >/dev/null 2>&1; then
|
|
125
|
+
reversed="$(tac "$tp" 2>/dev/null)"
|
|
126
|
+
else
|
|
127
|
+
# Portable fallback: awk NR table reversed.
|
|
128
|
+
reversed="$(awk '{a[NR]=$0} END {for(i=NR;i>=1;i--) print a[i]}' "$tp" 2>/dev/null)"
|
|
129
|
+
fi
|
|
130
|
+
[ -n "$reversed" ] || return 0
|
|
131
|
+
|
|
132
|
+
# Pull the text out via python3 if available (handles JSON content arrays);
|
|
133
|
+
# fall back to a pure-grep that handles string-typed content.
|
|
134
|
+
if have_py; then
|
|
135
|
+
printf '%s' "$reversed" | python3 -c '
|
|
136
|
+
import json, re, sys
|
|
137
|
+
try:
|
|
138
|
+
for line in sys.stdin:
|
|
139
|
+
line = line.strip()
|
|
140
|
+
if not line or "\"role\"" not in line:
|
|
141
|
+
continue
|
|
142
|
+
try:
|
|
143
|
+
rec = json.loads(line)
|
|
144
|
+
except Exception:
|
|
145
|
+
continue
|
|
146
|
+
if not isinstance(rec, dict) or rec.get("role") != "user":
|
|
147
|
+
continue
|
|
148
|
+
msg = rec.get("message") or {}
|
|
149
|
+
content = msg.get("content")
|
|
150
|
+
text = ""
|
|
151
|
+
if isinstance(content, str):
|
|
152
|
+
text = content
|
|
153
|
+
elif isinstance(content, list):
|
|
154
|
+
for p in content:
|
|
155
|
+
if isinstance(p, dict) and p.get("type") == "text" and p.get("text"):
|
|
156
|
+
text += p["text"]
|
|
157
|
+
m = re.search(r"<user_query>\s*(.+?)\s*</user_query>", text, re.S)
|
|
158
|
+
if m:
|
|
159
|
+
q = m.group(1).strip()
|
|
160
|
+
if len(q) > 2000:
|
|
161
|
+
q = q[:2000] + "..."
|
|
162
|
+
print(q)
|
|
163
|
+
break
|
|
164
|
+
except Exception:
|
|
165
|
+
pass
|
|
166
|
+
' 2>/dev/null
|
|
167
|
+
return 0
|
|
168
|
+
fi
|
|
169
|
+
|
|
170
|
+
# No python3: best-effort grep for the common case where the user message
|
|
171
|
+
# is the only place <user_query> appears in a line. Imperfect but bounded.
|
|
172
|
+
printf '%s' "$reversed" |
|
|
173
|
+
grep -m1 -oE '<user_query>[^<]*</user_query>' 2>/dev/null |
|
|
174
|
+
sed -E 's@</?user_query>@@g' |
|
|
175
|
+
head -c 2000
|
|
176
|
+
}
|
|
177
|
+
|
|
108
178
|
# merge_subagent_edit_markers <json> <parent_cid> -> 0 if anything was folded
|
|
109
179
|
#
|
|
110
180
|
# Subagent edits fire afterFileEdit under the SUBAGENT's conversation_id, so
|
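The backward walk in `extract_last_user_query` can be tried in isolation. This is a standalone Python sketch under assumptions: `last_user_query` is a hypothetical name, it takes the transcript text rather than a path, and the record shape (`role`, `message.content` as a string or a list of text parts) is inferred from the embedded Python in the hunk, not from Cursor documentation.

```python
import json
import re

USER_QUERY = re.compile(r"<user_query>\s*(.+?)\s*</user_query>", re.S)


def last_user_query(jsonl_text, cap=2000):
    """Return the most recent <user_query> text in a JSONL transcript, or ''."""
    # Walk newest-first so the first hit is the last request in the conversation.
    for line in reversed(jsonl_text.splitlines()):
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # tolerate non-JSON lines
        if not isinstance(rec, dict) or rec.get("role") != "user":
            continue
        msg = rec.get("message")
        content = msg.get("content") if isinstance(msg, dict) else None
        if isinstance(content, str):
            text = content
        elif isinstance(content, list):
            # Concatenate the text parts of a structured content array.
            text = "".join(
                p["text"] for p in content
                if isinstance(p, dict) and p.get("type") == "text" and p.get("text")
            )
        else:
            text = ""
        match = USER_QUERY.search(text)
        if match:
            query = match.group(1).strip()
            return query[:cap] + "..." if len(query) > cap else query
    return ""
```

A user record without a `<user_query>` tag does not end the search; the walk continues to older records, matching the loop in the hook.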
|
package/linux/hooks/self-review.md
CHANGED
|
@@ -19,6 +19,13 @@ file. Your job, on this turn, is to:
|
|
|
19
19
|
- **Logic bugs that the user would actually care about**: a function
|
|
20
20
|
that returns the wrong thing, an off-by-one, a missing `return`, a
|
|
21
21
|
wrong import path.
|
|
22
|
+
- **Semantic contracts**: did any existing function's BEHAVIOR change
|
|
23
|
+
without its name, signature, or docstring changing? Names are
|
|
24
|
+
contracts. `deleteUser()` that now soft-deletes, a getter that now
|
|
25
|
+
writes, a function that used to throw on bad input and now silently
|
|
26
|
+
returns null — these are silent contract breaks that callers will
|
|
27
|
+
rely on and break against. If behavior changed, the name, signature,
|
|
28
|
+
or docstring must reflect it.
|
|
22
29
|
4. If you find a real bug, **fix it with `Edit`**, then say nothing.
|
|
23
30
|
Do not report it. Do not explain it. The user will see the fix
|
|
24
31
|
in the next message; the bug is gone.
|
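The semantic-contract bullet added above is easiest to see side by side. The following Python sketch is purely illustrative; `UserStore` and both method names are hypothetical. It keeps `delete_user` meaning what its name says and gives the soft-delete behavior its own honest name, instead of silently changing what `delete_user` does.

```python
from dataclasses import dataclass, field


@dataclass
class UserStore:
    # Maps user id -> record dict, e.g. {"name": "ada", "deleted": False}.
    rows: dict = field(default_factory=dict)

    def delete_user(self, user_id):
        """Hard delete: the row is gone afterwards, as the name promises."""
        self.rows.pop(user_id, None)

    def deactivate_user(self, user_id):
        """Soft delete under its own name: the row stays, flagged as deleted."""
        if user_id in self.rows:
            self.rows[user_id]["deleted"] = True
```

Had `delete_user` been edited to set the flag instead, every existing caller that assumed the row disappears would keep compiling and quietly misbehave, which is the break the bullet describes.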
|
package/linux/hooks/subagent-stop-review.sh
CHANGED
|
@@ -4,7 +4,7 @@
|
|
|
4
4
|
# Counterpart of final-review.sh for delegated work. afterFileEdit DOES fire
|
|
5
5
|
# inside subagents (verified: a subagent run left its edits in
|
|
6
6
|
# session-edits-<subagent-cid>.txt), but subagents get no `stop` event, so
|
|
7
|
-
# that marker is never drained and the
|
|
7
|
+
# that marker is never drained and the five-axis review never fires for
|
|
8
8
|
# delegated implementations. This hook closes the loop: when a subagent
|
|
9
9
|
# finishes and ITS conversation has a session-edits marker, return ONE
|
|
10
10
|
# followup_message so the subagent audits its own implementation before the
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "cursordoctrine",
|
|
3
|
-
"version": "0.
|
|
4
|
-
"description": "Thin self-review hooks for Cursor — the model is the auditor.
|
|
3
|
+
"version": "0.2.1",
|
|
4
|
+
"description": "Thin self-review hooks for Cursor — the model is the auditor. Intent-trace final review (Tier 0), unified 13-item anti-slop checklist, operational slop detection.",
|
|
5
5
|
"bin": {
|
|
6
6
|
"cursordoctrine": "bin/cli.mjs"
|
|
7
7
|
},
|
|
package/skills/anti-slop/SKILL.md
CHANGED
|
@@ -223,10 +223,15 @@ management*, not token volume — one source of truth per concept.
|
|
|
223
223
|
|
|
224
224
|
## Automatic final review
|
|
225
225
|
|
|
226
|
-
The `stop` hook (
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
226
|
+
The `stop` hook (`~/.agents/hooks/final-review.ps1` on Windows,
|
|
227
|
+
`~/.agents/hooks/final-review.sh` on Linux) fires after the agent finishes an
|
|
228
|
+
implementation that edited files. It extracts the last `<user_query>` from the
|
|
229
|
+
session transcript (Tier 0 intent trace), reports session footprint (Tier 5),
|
|
230
|
+
and auto-submits a `followup_message` so the model audits five axes: intent,
|
|
231
|
+
correctness, reliability, coverage, anti-slop. Axis 4 delegates to this skill's
|
|
232
|
+
scanner (`scan_slop.py --all`) and the canonical checklist at
|
|
233
|
+
`~/.agents/hooks/anti-slop.md` (13 items, including semantic contracts,
|
|
234
|
+
operational slop, and change surface). One bounded pass per implementation.
|
|
230
235
|
|
|
231
236
|
## Hard constraints
|
|
232
237
|
|
|
@@ -259,9 +264,11 @@ Diff: {before} → {after} lines. Tests: {pass | n/a}
|
|
|
259
264
|
| Install path | `~/.cursor/skills/anti-slop/` |
|
|
260
265
|
| Invoke | `/anti-slop`, or "remove the AI slop" |
|
|
261
266
|
| Scanner | `python scripts/scan_slop.py --all` |
|
|
262
|
-
| Final review | automatic via `stop` hook |
|
|
267
|
+
| Final review | automatic via `stop` hook (`final-review.ps1` / `final-review.sh`) |
|
|
268
|
+
| Hook checklist | `~/.agents/hooks/anti-slop.md` (13 items; per-edit + final-review axis 4) |
|
|
263
269
|
|
|
264
270
|
The scanner is stdlib-only and needs Python 3.9+. Pairs with the **anti-slop
|
|
265
|
-
hook** (advisory
|
|
266
|
-
**
|
|
267
|
-
|
|
271
|
+
audit hook** (`anti-slop-audit.ps1` / `.sh`, advisory per edit), the **stop
|
|
272
|
+
hook** (`final-review.ps1` / `.sh`, five-axis session review incl. intent
|
|
273
|
+
trace), and **minimal-editing** (smallest-diff). This skill is the active
|
|
274
|
+
"delete it now" layer those only nudge toward.
|
|
package/windows/hooks/anti-slop-audit.ps1
CHANGED
|
@@ -13,6 +13,8 @@
|
|
|
13
13
|
# *Strategy / *Singleton / *Facade / *Builder / *Visitor / *Decorator
|
|
14
14
|
# class, or CQRS / Event-Sourcing / DDD vocabulary
|
|
15
15
|
# * redundant comments that merely restate the next line of code
|
|
16
|
+
# * operational slop (Tier 3): retry-without-backoff, await-in-loop,
|
|
17
|
+
# telemetry spam (>= 6 log/print statements added in one edit)
|
|
16
18
|
#
|
|
17
19
|
# Deferred to the model (semantic - no regex can judge these without drowning
|
|
18
20
|
# the user in false positives): edge cases, duplicated logic, ignored
|
|
@@ -128,6 +130,50 @@ foreach ($a in $added) {
|
|
|
128
130
|
if ($redundant.Count -ge 4) { break }
|
|
129
131
|
}
|
|
130
132
|
|
|
133
|
+
# --- signal 4: operational slop (Tier 3) ----------------------------------
|
|
134
|
+
# Retry-without-backoff: a retry loop or recursive retry without an obvious
|
|
135
|
+
# sleep/backoff/setTimeout nearby. The whole-file body is scanned so the
|
|
136
|
+
# backoff can sit above or below the retry; this is deliberately seed-grade
|
|
137
|
+
# (high precision), not a verdict.
|
|
138
|
+
$opsFlags = New-Object System.Collections.Generic.List[string]
|
|
139
|
+
$bodyHas = {
|
|
140
|
+
param($pat)
|
|
141
|
+
foreach ($a in $added) { if ($a -match $pat) { return $true } }
|
|
142
|
+
return $false
|
|
143
|
+
}
|
|
144
|
+
$retryWord = '\b(retry|retryCount|retries|maxRetries|attempt)\w*\b'
|
|
145
|
+
$backoffWord = '\b(sleep|setTimeout|backoff|back_off|exponential|jitter|delay)\w*\b'
|
|
146
|
+
if (& $bodyHas $retryWord) {
|
|
147
|
+
$noBackoff = $true
|
|
148
|
+
foreach ($a in $added) { if ($a -match $backoffWord) { $noBackoff = $false; break } }
|
|
149
|
+
if ($noBackoff) {
|
|
150
|
+
$opsFlags.Add("- RETRY WITHOUT BACKOFF: a retry construct was added but no sleep/backoff/setTimeout is visible in this edit's added lines. Unbounded retries = retry storms + token/cost burn; add bounded backoff or confirm the runtime already throttles.")
|
|
151
|
+
}
|
|
152
|
+
}
|
|
153
|
+
|
|
154
|
+
# `await` (or `await ctx.db`) inside a loop construct on its own line — N+1 in
|
|
155
|
+
# agent/edge code, not just SQL. We seed on the added-line co-occurrence of a
|
|
156
|
+
# loop keyword and an awaited call; the model judges whether it is genuinely a
|
|
157
|
+
# sequential-await loop (real slop) or a legit streaming pattern.
|
|
158
|
+
$loopWord = '\b(for|while|forEach|map|filter|reduce|flatMap|for\s+await|async\s+for)\b'
|
|
159
|
+
$awaitCall = '\bawait\s+(fetch|ctx\.db|ctx\.run|client\.|axios|prisma\.|supabase\.|db\.|repo\.)'
|
|
160
|
+
if (& $bodyHas $loopWord) {
|
|
161
|
+
$awaitInLoop = $false
|
|
162
|
+
foreach ($a in $added) { if ($a -match $awaitCall) { $awaitInLoop = $true; break } }
|
|
163
|
+
if ($awaitInLoop) {
|
|
164
|
+
$opsFlags.Add("- AWAIT IN LOOP: a loop construct and an awaited IO call both appear in this edit. Sequential awaits in a loop = N+1 / serial latency; confirm whether Promise.all / a batch call / a single query is the right primitive. (If this is genuinely a streaming pattern, ignore.)")
|
|
165
|
+
}
|
|
166
|
+
}
|
|
167
|
+
|
|
168
|
+
# Telemetry spam seed: 6+ console.log / print / fmt.Print / std::cout::<< added
|
|
169
|
+
# in one file. Models paste debug prints liberally; six is well past intent.
|
|
170
|
+
$logRe = '\b(console\.(log|debug|info|warn|error)|print\(|fmt\.Print|std::cout|NSLog|System\.out\.println|println!|dbg!|console\.dir)\b'
|
|
171
|
+
$logCount = 0
|
|
172
|
+
foreach ($a in $added) { if ($a -match $logRe) { $logCount++ } }
|
|
173
|
+
if ($logCount -ge 6) {
|
|
174
|
+
$opsFlags.Add("- TELEMETRY SPAM: $logCount log/print statements added in this one edit. Debug-level telemetry that nobody reads is slop; consolidate or remove (kept only if this is a real logging entrypoint).")
|
|
175
|
+
}
|
|
176
|
+
 # --- decide whether to fire ----------------------------------------------
 $srcRe = '\.(ts|tsx|js|jsx|mjs|cjs|py|go|rs|java|kt|kts|cs|cpp|cc|cxx|c|h|hpp|rb|php|swift|scala|m|mm|sh|ps1|lua|dart|ex|exs|vue|svelte)$'
 $addedCode = 0
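The two seeds added in the hunk above (await-in-loop and telemetry spam) are plain regex co-occurrence checks over the added lines. The following is a minimal Python sketch of the same idea, not the hook itself: the function name `seed_ops_flags` is hypothetical, the loop check is applied to the added lines because the original `$bodyHas` helper is not visible here, and the log pattern is written without a trailing word boundary so that calls like `print("x")` are counted.

```python
import re

# Patterns adapted from the PowerShell hunk. re.IGNORECASE mirrors the
# case-insensitive default of PowerShell's -match operator.
LOOP_WORD = re.compile(
    r"\b(for|while|forEach|map|filter|reduce|flatMap)\b", re.IGNORECASE
)
AWAIT_CALL = re.compile(
    r"\bawait\s+(fetch|ctx\.db|ctx\.run|client\.|axios|prisma\.|supabase\.|db\.|repo\.)",
    re.IGNORECASE,
)
# Assumption of this sketch: no trailing \b, so print("x") and println!("x") match.
LOG_RE = re.compile(
    r"\b(console\.(log|debug|info|warn|error|dir)|print\(|fmt\.Print|std::cout"
    r"|NSLog|System\.out\.println|println!|dbg!)",
    re.IGNORECASE,
)
LOG_THRESHOLD = 6  # same cut-off as the hook


def seed_ops_flags(added_lines):
    """Return advisory flag names seeded by simple co-occurrence checks."""
    flags = []
    has_loop = any(LOOP_WORD.search(line) for line in added_lines)
    has_await_io = any(AWAIT_CALL.search(line) for line in added_lines)
    if has_loop and has_await_io:
        flags.append("AWAIT IN LOOP")
    log_count = sum(1 for line in added_lines if LOG_RE.search(line))
    if log_count >= LOG_THRESHOLD:
        flags.append("TELEMETRY SPAM")
    return flags
```

The checks only seed a question for the model. A loop keyword and an awaited call appearing in the same edit does not prove that the await sits inside the loop, which is why the advisory text asks for confirmation rather than asserting a defect.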
@@ -139,6 +185,7 @@ $flags = New-Object System.Collections.Generic.List[string]
 if ($depAdded) { $flags.Add("- DEPENDENCY: " + $base + " gained a dependency - is it necessary, or do the stdlib / existing deps already cover it?") }
 if ($patterns.Count -gt 0) { $flags.Add("- PREMATURE ABSTRACTION: " + ($patterns -join ', ') + " - is there a real, present problem (2-3+ call sites that exist today) that needs it? If it is speculative, delete it and write the direct code.") }
 if ($redundant.Count -gt 0) { $flags.Add("- REDUNDANT COMMENTS: " + ($redundant -join ' | ') + " - delete comments that restate the code; keep only WHY.") }
+$flags.AddRange($opsFlags)

 if ($flags.Count -eq 0 -and -not $substantial) { exit 0 }

@@ -148,17 +195,14 @@ $checklist = ''
 if (Test-Path -LiteralPath $checklistFile) { $checklist = Get-Content -Raw -LiteralPath $checklistFile }
 if (-not $checklist) {
     $checklist = @'
-ANTI-SLOP
-1
-[lines 153-158: text not preserved in this view]
-8. Cargo cult - delete any construct whose reason you cannot state.
-9. Architecture - respect the project's layering and boundaries.
-10. Redundant comments restating code - delete; keep only WHY.
+ANTI-SLOP — read ~/.agents/hooks/anti-slop.md (13 items). Fallback if missing:
+1–10: edge cases, duplication, conventions, deps, premature abstraction,
+accidental complexity, tests (no tautologies), cargo cult, architecture,
+redundant comments / prompt residue.
+11: semantic contracts (behavior change without name/signature change).
+12: operational slop (retry w/o backoff, await-in-loop, telemetry spam).
+13: change surface (too many files for a simple request).
+Fix guilty items now. Never revert what the user asked for.
 '@
 }

@@ -1,56 +1,54 @@
-ANTI-SLOP SELF-REVIEW — you just edited a file
-[lines 2-28: text not preserved in this view]
-not a
-[lines 30-52: text not preserved in this view]
-Hard constraints: never revert
-[line 54: text not preserved in this view]
-targeted edits, then stop. The bar: would this pass a senior review at a top
-engineering org without a single "why is this here?" comment.
+ANTI-SLOP SELF-REVIEW — you just edited a file (or you are auditing the
+session diff at final review). Code that runs but should not ship.
+
+Intent trace (Tier 0 — hallucinated requirements, scope drift) runs FIRST at
+stop via final-review axis 0, not here. This checklist covers code-shape and
+cost slop. Apply every item; if guilty, FIX with Edit — delete, inline, drop.
+Do not explain. If clean, say nothing.
+
+1. EDGE CASES — Happy path only? Check null / empty / zero / boundary / error
+   inputs the task implies.
+
+2. DUPLICATION — Logic that already exists in this repo? Call it; do not
+   re-implement. Same function in many files (isRecord-class) → one source.
+
+3. CONVENTIONS — Match the FILE's style, naming, structure, error-handling,
+   imports. Not your defaults.
+
+4. DEPENDENCIES — New library for something stdlib or an existing dep covers?
+   Remove it. A dependency must earn its place.
+
+5. PREMATURE ABSTRACTION — Factory / Repository / Mediator / Strategy / Builder /
+   CQRS / Event Sourcing / DDD: is there a REAL problem with 2–3 call sites
+   TODAY? "Future flexibility" is not a reason. Delete and write direct code.
+
+6. ACCIDENTAL COMPLEXITY — Could a junior read this in 30 seconds? Flatten
+   indirection, generics, config, layers that do not earn their keep.
+
+7. TESTS (epistemic slop) — Assert real OUTCOMES and edge cases, not "it runs",
+   not a mirror of the implementation, not expect(true).toBe(true). A test
+   that cannot fail is slop.
+
+8. CARGO CULT — Can you state WHY each non-obvious construct is there? Remove
+   what you cannot justify. A shape you have seen ≠ a shape you need.
+
+9. ARCHITECTURE — Respect layering and boundaries. No reaching across layers,
+   no business logic in the wrong place, no breaking project constraints.
+
+10. REDUNDANT COMMENTS — Delete comments that restate the code ("// increment
+    i"). Keep only WHY, never WHAT. No prompt residue ("in a real app...").
+
+11. SEMANTIC CONTRACTS (Tier 1) — Did any existing function's BEHAVIOR change
+    without its name, signature, or docstring changing? Names are contracts.
+    deleteUser() that now soft-deletes is a silent contract break.
+
+12. OPERATIONAL SLOP (Tier 3) — Retry loops without backoff/sleep/jitter?
+    await fetch / ctx.db / prisma inside a for/while/map? Six or more
+    console.log / print added in one edit? Token burn with no user value →
+    remove or bound.
+
+13. CHANGE SURFACE (Tier 5) — Did a simple request touch many files? Every
+    file in the diff must trace to the task. Trim unrelated hunks.
+
+Hard constraints: never revert what the USER asked for — slop is what got added
+on top. At most a few targeted edits, then stop.
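Item 12 of the checklist above names two operational patterns and asks for them to be removed or bounded. The sketch below shows one possible shape of each fix in Python, under stated assumptions: `fetch_user` is a hypothetical IO call with a sleep standing in for network latency, and the retry helper treats `ConnectionError` as the only transient failure. It is an illustration of the checklist item, not code from the package.

```python
import asyncio
import random


async def fetch_user(user_id):
    """Hypothetical IO call; the sleep stands in for network latency."""
    await asyncio.sleep(0.01)
    return {"id": user_id}


async def load_serial(user_ids):
    # Await-in-loop: each call waits for the previous one to finish.
    return [await fetch_user(uid) for uid in user_ids]


async def load_batched(user_ids):
    # Same results in the same order, but the calls run concurrently.
    return await asyncio.gather(*(fetch_user(uid) for uid in user_ids))


async def retry_with_backoff(op, attempts=4, base_delay=0.01):
    """Bounded retry: exponential backoff with jitter, re-raising after the last attempt."""
    for attempt in range(attempts):
        try:
            return await op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            # Delay doubles per attempt; the random factor spreads out retries.
            await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Whether concurrency is the right fix depends on the callee: a single batch query is often better than many concurrent calls, and a genuine streaming loop should stay sequential.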
@@ -1,11 +1,20 @@
 FINAL REVIEW — you just finished an implementation. Before you treat it as done,
-audit EVERYTHING you changed this session across the
+audit EVERYTHING you changed this session across the five axes below and FIX what
 fails. Do NOT revert the behaviour the user asked for. If an axis is already
 clean, say so in one line — do not manufacture work.

 Start by re-reading the diff. Scope the review to your session's changes and the
 code they touch.

+## 0. Intent trace (HIGHEST PRIORITY — run first)
+The hook extracted your last user message as "ORIGINAL REQUEST" above. For every
+hunk in the diff, answer: which part of the request forced this change? Anything
+that cannot trace to the request is a HALLUCINATED REQUIREMENT — a feature,
+flag, refactor, abstraction, dependency, or "nice to have" that nobody asked for.
+Revert each one. "Clean code, wrong feature" is the worst failure mode and no
+later axis can catch it. This axis outranks all others. (If no ORIGINAL REQUEST
+is present — sandboxed verify run, no transcript — skip this axis.)
+
 ## 1. Correctness
 - The logic does what the task requires — no off-by-one, inverted condition,
   wrong operator, wrong return value, wrong import path.
@@ -36,17 +45,23 @@ code they touch.
 - Add the missing tests; delete tautological ones.

 ## 4. Anti-slop
-[lines 39-40: text not preserved in this view]
+Axis 0 already caught intent drift. This axis catches code-shape and cost slop
+across the whole session diff.
+
+Step A — mechanical scan (if available):
+If `~/.cursor/skills/anti-slop/scripts/scan_slop.py` exists, run:
     python ~/.cursor/skills/anti-slop/scripts/scan_slop.py --all
-If it does NOT exist,
-[lines 43-44: text not preserved in this view]
--
-[lines 46-52: text not preserved in this view]
+If it does NOT exist, skip Step A (not a failure; do not hunt for the file).
+
+Step B — canonical checklist (always):
+Read `~/.agents/hooks/anti-slop.md` and apply ALL 13 items to every hunk you
+changed this session. That file is the single source of truth for slop
+detection — items 1–10 are structural/code, 11 is semantic contracts, 12 is
+operational slop (retries, await-in-loop, telemetry spam), 13 is change
+surface. Fix every hit; consolidate clones to one source of truth.
+
+Step C — session footprint (also in the header above):
+If "Session footprint" shows >5 files or the request was simple, justify each
+file or trim. Unjustified files are slop.
+
+Fix with edits now; re-run the scan (if Step A ran) and the tests; then stop.
@@ -1,7 +1,7 @@
 # final-review.ps1 - stop hook (Cursor).
 #
-# ONE comprehensive end-of-implementation review across
-# correctness, reliability, coverage, and anti-slop. When the agent finishes an
+# ONE comprehensive end-of-implementation review across five axes:
+# intent, correctness, reliability, coverage, and anti-slop. When the agent finishes an
 # implementation that touched files, Cursor auto-submits this hook's
 # `followup_message` as the next user turn, so the model re-audits everything it
 # changed this session and FIXES what fails - the model-as-auditor pattern over
@@ -86,11 +86,11 @@ FINAL REVIEW - audit everything you changed this session and FIX what fails
    released on every path, no races, input validated at the boundary.
 3. Coverage - behaviour-bearing changes have real tests; RUN the suite if present;
    no tautological tests.
-4. Anti-slop -
-[lines 90-93: text not preserved in this view]
+4. Anti-slop - read ~/.agents/hooks/anti-slop.md and apply all 13 items to
+   the session diff. If ~/.cursor/skills/anti-slop/scripts/scan_slop.py exists,
+   run `python ~/.cursor/skills/anti-slop/scripts/scan_slop.py --all` first.
+   Consolidate clones; drop premature abstraction, unneeded deps, operational
+   slop (retries, await-in-loop, log spam), unjustified files.
 Fix now, re-run the scan + tests, then stop. If an axis is clean, say so in one line.
 '@
 }
@@ -98,7 +98,24 @@ $body = Expand-AgentPaths $body

 $resolved = @($edited | ForEach-Object { Resolve-AgentPath $_ })
 $fileList = ($resolved | Select-Object -First 30) -join "`n "
-[line 101: text not preserved in this view]
+
+# Tier 0: extract the last user <user_query> from the transcript so the model
+# can trace every diff hunk back to a concrete request. Anything untraceable is
+# a hallucinated requirement. Empty when there is no transcript or no user_query
+# (sandboxed verify runs, fresh installs) — the axis is then a no-op.
+$userQuery = Get-LastUserQuery $obj
+$intentBlock = ''
+if ($userQuery) {
+    $intentBlock = "ORIGINAL REQUEST (your last user message, for intent trace):`n---`n$userQuery`n---`n`n"
+}
+
+# Tier 5: cross-file change-surface metric. The per-file afterFileEdit audits
+# miss the 50-file rename case; this seeds the whole-session footprint so the
+# model can judge whether the change surface is proportional to the request.
+$uniqueFiles = @($edited | Select-Object -Unique).Count
+$surfaceBlock = "Session footprint: $uniqueFiles file(s) touched. If a simple request produced >5 files or >200 lines, justify each file's inclusion or trim.`n`n"
+
+$msg = "FINAL REVIEW (end of implementation) - intent, correctness, reliability, coverage, anti-slop.`n`n${surfaceBlock}${intentBlock}Files you changed this session:`n $fileList`n`n$body"

 # Arm the one-shot brake BEFORE emitting, so a crash after emit can't re-fire.
 New-Item -ItemType File -Path $flag -Force -ErrorAction SilentlyContinue | Out-Null
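The hunk above assembles the follow-up message from three parts: a session-footprint line, an optional ORIGINAL REQUEST block, and the file list followed by the review body. A rough Python equivalent of that assembly is sketched below to make the ordering explicit. The function name `build_followup` and its parameters are hypothetical; only the message strings are taken from the hunk.

```python
def build_followup(edited, user_query, body, max_listed=30):
    """Assemble the final-review prompt: footprint, optional intent block, file list, body."""
    unique_files = len(set(edited))
    surface = (
        f"Session footprint: {unique_files} file(s) touched. "
        "If a simple request produced >5 files or >200 lines, "
        "justify each file's inclusion or trim.\n\n"
    )
    intent = ""
    if user_query:
        # Omitted entirely when no request could be extracted, so axis 0 becomes a no-op.
        intent = (
            "ORIGINAL REQUEST (your last user message, for intent trace):\n"
            f"---\n{user_query}\n---\n\n"
        )
    file_list = "\n  ".join(edited[:max_listed])
    return (
        "FINAL REVIEW (end of implementation) - intent, correctness, "
        "reliability, coverage, anti-slop.\n\n"
        + surface
        + intent
        + "Files you changed this session:\n  "
        + file_list
        + "\n\n"
        + body
    )
```

Note that only the file count is computed here, as in the hunk. The ">200 lines" part of the footprint text is left for the model to judge from the diff.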
@@ -73,6 +73,47 @@ function Resolve-AgentPath([string]$p) {
     return ConvertTo-FwdPath $p
 }

+# Extract the last user <user_query> from a Cursor transcript JSONL. The
+# transcript holds one {role, message} record per line; we walk backward from
+# the end, find the last user turn whose content has a <user_query> tag, and
+# return its text. Returns '' if there is no transcript or no user_query.
+# Capped at 2000 chars so the follow-up prompt stays bounded.
+#
+# This is the Tier 0 intent-trace primitive: the final-review hook prepends the
+# extracted request to its followup so the model must trace every diff hunk back
+# to it. Anything untraceable is a hallucinated requirement.
+function Get-LastUserQuery($obj) {
+    $tp = ''
+    if ($obj -and $obj.PSObject.Properties['transcript_path']) { $tp = [string]$obj.transcript_path }
+    if (-not $tp -or -not (Test-Path -LiteralPath $tp)) { return '' }
+    $lines = @(Get-Content -LiteralPath $tp -ErrorAction SilentlyContinue)
+    for ($i = $lines.Count - 1; $i -ge 0; $i--) {
+        $line = $lines[$i]
+        if (-not $line -or $line -notmatch '"role"\s*:\s*"user"') { continue }
+        try {
+            $rec = $line | ConvertFrom-Json -ErrorAction SilentlyContinue
+        } catch { continue }
+        if (-not $rec -or -not $rec.message) { continue }
+        $content = $rec.message.content
+        if (-not $content) { continue }
+        # content is an array of {type:text,text:...} or a plain string
+        $text = ''
+        if ($content -is [string]) {
+            $text = $content
+        } else {
+            foreach ($part in $content) {
+                if ($part.type -eq 'text' -and $part.text) { $text += $part.text }
+            }
+        }
+        if ($text -match '(?s)<user_query>\s*(.+?)\s*</user_query>') {
+            $q = $Matches[1].Trim()
+            if ($q.Length -gt 2000) { $q = $q.Substring(0, 2000) + '...' }
+            return $q
+        }
+    }
+    return ''
+}
+
 # Subagent edits fire afterFileEdit under the SUBAGENT's conversation_id, so
 # their session-edits markers are invisible to the parent's stop-hook review.
 # Subagent transcripts live at <transcripts>/<parent-cid>/subagents/<sub-cid>.jsonl,
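The backward walk in `Get-LastUserQuery` above can be restated compactly. The Python sketch below follows the same steps (scan from the end, keep only user records, accept string or list content, extract the `<user_query>` body, cap at 2000 characters). It assumes the record shape described in the hunk's comments, and it takes the transcript lines as a list rather than a path so it can be exercised without a file. The name `last_user_query` is hypothetical.

```python
import json
import re

USER_QUERY_RE = re.compile(r"<user_query>\s*(.+?)\s*</user_query>", re.DOTALL)
MAX_CHARS = 2000  # same cap as the PowerShell helper


def last_user_query(jsonl_lines):
    """Return the text of the most recent user <user_query>, or '' if none is found."""
    for line in reversed(jsonl_lines):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the hook
        if not isinstance(rec, dict) or rec.get("role") != "user":
            continue
        message = rec.get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if isinstance(content, str):
            text = content
        elif isinstance(content, list):
            # Concatenate the text parts of a structured content array.
            text = "".join(
                part["text"]
                for part in content
                if isinstance(part, dict)
                and part.get("type") == "text"
                and isinstance(part.get("text"), str)
            )
        else:
            continue
        match = USER_QUERY_RE.search(text)
        if match:
            query = match.group(1).strip()
            if len(query) > MAX_CHARS:
                query = query[:MAX_CHARS] + "..."
            return query
    return ""
```

Returning an empty string on every failure path matches the hook's behaviour: a missing or unreadable transcript disables the intent trace rather than blocking the review.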
@@ -19,6 +19,13 @@ file. Your job, on this turn, is to:
    - **Logic bugs that the user would actually care about**: a function
      that returns the wrong thing, an off-by-one, a missing `return`, a
      wrong import path.
+   - **Semantic contracts**: did any existing function's BEHAVIOR change
+     without its name, signature, or docstring changing? Names are
+     contracts. `deleteUser()` that now soft-deletes, a getter that now
+     writes, a function that used to throw on bad input and now silently
+     returns null — these are silent contract breaks: callers rely on the
+     old behavior and will break against the new one. If behavior changed,
+     the name, signature, or docstring must reflect it.
 4. If you find a real bug, **fix it with `Edit`**, then say nothing.
    Do not report it. Do not explain it. The user will see the fix
    in the next message; the bug is gone.
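To make the semantic-contract bullet in the hunk above concrete, here is a small Python illustration. Both functions and the dictionary-based store are hypothetical. The point is that a soft delete gets its own name and docstring instead of silently replacing the behaviour behind `delete_user`.

```python
def delete_user(store, user_id):
    """Remove the user record entirely."""
    store.pop(user_id, None)


def deactivate_user(store, user_id):
    """Soft delete: keep the record but mark it inactive."""
    if user_id in store:
        store[user_id]["active"] = False
```

A caller that relies on `delete_user` freeing the id keeps working, and any code that wants the soft-delete behaviour has to opt in by name.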
@@ -3,7 +3,7 @@
 # Counterpart of final-review.ps1 for delegated work. afterFileEdit DOES fire
 # inside subagents (verified: a poteto subagent run left ~58 entries in
 # session-edits-<subagent-cid>.txt), but subagents get no `stop` event, so
-# that marker is never drained and the
+# that marker is never drained and the five-axis review never fires for
 # delegated implementations. This hook closes the loop: when a subagent
 # finishes and ITS conversation has a session-edits marker, return ONE
 # followup_message so the subagent audits its own implementation before the
|