cursordoctrine 0.1.0 → 0.2.0

package/README.md CHANGED
@@ -10,7 +10,7 @@ A small set of Cursor hooks that make the agent review its own work without bolt
10
10
  2. **Hand the model its own edits back.** After each agent edit, a self-review prompt (plus minimal-edit and anti-slop advisories when they trip) is stashed and delivered on the next turn. The model reads its own diff, fixes real bugs, and stays quiet otherwise.
11
11
  3. **Gate blast radius.** One permission gate denies a short, explicit list of dangerous commands (`rm -rf /`, `curl | sh`, force-push, `npm publish`, ...). Everything else is allowed.
12
12
 
13
- When an implementation finishes, a stop hook fires exactly one final review pass over everything that changed — then stops. Delegated work gets the same treatment: a subagent that edited files reviews its own implementation before its result returns to the parent, and its edits are folded into the parent's final review. Every bound is enforced twice: in the script and in `hooks.json`.
13
+ When an implementation finishes, a stop hook fires exactly one final review pass over everything that changed — then stops. The review runs across five axes, the first of which is **intent trace**: the hook extracts your last user message from the transcript and prepends it to the review so the model must trace every diff hunk back to a concrete request. Anything untraceable is a hallucinated requirement and gets reverted — this is the only detector that catches "clean code, wrong feature," which no later axis and no linter can see. Delegated work gets the same treatment: a subagent that edited files reviews its own implementation before its result returns to the parent, and its edits are folded into the parent's final review. Every bound is enforced twice: in the script and in `hooks.json`.
14
14
 
15
15
  This setup is for Cursor only. It installs into `~/.cursor` and `~/.agents/hooks` and touches nothing in your projects.
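The permission gate in step 3 can be pictured as a small deny-list matcher. The sketch below is illustrative Python, not the package's script: the patterns in `DENY_PATTERNS` and the name `permission_gate` are assumptions modelled on the examples the README names, and the shipped list may differ.

```python
import re

# Hypothetical deny patterns, modelled on the README's examples only.
DENY_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",                 # rm -rf /
    r"\bcurl\b[^|]*\|\s*(sh|bash)\b",        # curl ... | sh
    r"\bgit\s+push\b.*(--force|\s-f\b)",     # force-push
    r"\bnpm\s+publish\b",                    # npm publish
]


def permission_gate(command):
    """Return "deny" for listed commands and "allow" for everything else."""
    try:
        for pattern in DENY_PATTERNS:
            if re.search(pattern, command):
                return "deny"
    except Exception:
        # Fail open: a broken gate must not block the agent.
        return "allow"
    return "allow"
```

Allow-by-default keeps the gate cheap to reason about: only the explicit list can block a command, and any internal error falls through to "allow".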
16
16
 
@@ -41,7 +41,7 @@ The two folders are functionally identical. Windows runs everything through `pws
41
41
  | Session | `sessionStart` | `inject-doctrine` reads the doctrine + user rules and emits them as `additional_context`. |
42
42
  | Every turn | `postToolUse` | Folds completed subagents' edit markers into this conversation's marker, then drains the conversation's pending feedback file into `additional_context`. One-shot, keyed by conversation id. |
43
43
  | Shell | `beforeShellExecution` | `permission-gate` checks the command against a deny list. Allow by default, deny by list, fail open. |
44
- | Edit | `afterFileEdit` + `stop` | `self-review-trigger` stashes the review prompt per edit; `minimal-edit-audit` and `anti-slop-audit` append advisories when thresholds trip; `final-review` fires one end-of-implementation pass. |
44
+ | Edit | `afterFileEdit` + `stop` | `self-review-trigger` stashes the review prompt per edit; `minimal-edit-audit` and `anti-slop-audit` append advisories when thresholds trip (new deps / premature abstraction / redundant comments / Tier 3 operational slop: retry-without-backoff, await-in-loop, telemetry spam); `final-review` fires one end-of-implementation pass. |
45
45
  | Subagent | `subagentStop` | `subagent-stop-review` fires one in-subagent final review when a delegated run edited files, before the result returns to the parent. Marker-gated and flag-braked like `final-review`. |
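The `postToolUse` row describes a one-shot drain keyed by conversation id. A minimal Python sketch of that behaviour follows, assuming one `feedback-<conversation id>.txt` file per conversation in a pending directory. The function name and the returned dictionary shape are hypothetical.

```python
from pathlib import Path


def drain_pending_feedback(pending_dir, conversation_id):
    """Read this conversation's pending feedback once, then delete the file."""
    pending = Path(pending_dir) / f"feedback-{conversation_id}.txt"
    try:
        text = pending.read_text(encoding="utf-8")
    except FileNotFoundError:
        return {}  # nothing stashed for this conversation
    pending.unlink()  # one-shot: the next turn finds no file
    return {"additional_context": text}
```

Keying the file by conversation id keeps feedback from one conversation out of another, and deleting it after the read means the prompt is delivered once.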
46
46
 
47
47
  ## Install
@@ -12,6 +12,8 @@
12
12
  # *Strategy / *Singleton / *Facade / *Builder / *Visitor / *Decorator
13
13
  # class, or CQRS / Event-Sourcing / DDD vocabulary
14
14
  # * redundant comments that merely restate the next line of code
15
+ # * operational slop (Tier 3): retry-without-backoff, await-in-loop,
16
+ # telemetry spam (>= 6 log/print statements added in one edit)
15
17
  #
16
18
  # Fires when a static signal trips OR the edit added a substantial block of
17
19
  # new source (>= ANTI_SLOP_CHECKLIST_LINES, default 40). Otherwise silent.
@@ -102,6 +104,31 @@ redundant="$(printf '%s\n' "$added" |
102
104
  fi
103
105
  done | sort -u | head -n 4)"
104
106
 
107
+ # --- signal 4: operational slop (Tier 3) ------------------------------------
108
+ # Retry-without-backoff: a retry construct with no sleep/backoff/setTimeout in
109
+ # the added lines. Seed-grade (high precision); the model judges.
110
+ ops_flags=""
111
+ if printf '%s\n' "$added" | grep -qE '\b(retry|retryCount|retries|maxRetries|attempt)[A-Za-z0-9_]*\b'; then
112
+ if ! printf '%s\n' "$added" | grep -qE '\b(sleep|setTimeout|backoff|back_off|exponential|jitter|delay)[A-Za-z0-9_]*\b'; then
113
+ ops_flags="${ops_flags}- RETRY WITHOUT BACKOFF: a retry construct was added but no sleep/backoff/setTimeout is visible in this edit's added lines. Unbounded retries = retry storms + token/cost burn; add bounded backoff or confirm the runtime already throttles.
114
+ "
115
+ fi
116
+ fi
117
+ # Awaited IO call co-occurring with a loop construct on the same edit. N+1 in
118
+ # agent/edge code, not just SQL. The model judges streaming vs serial-await.
119
+ if printf '%s\n' "$added" | grep -qE '\b(for|while|forEach|map|filter|reduce|flatMap|for[[:space:]]+await|async[[:space:]]+for)\b'; then
120
+ if printf '%s\n' "$added" | grep -qE '\bawait[[:space:]]+(fetch|ctx\.db|ctx\.run|client\.|axios|prisma\.|supabase\.|db\.|repo\.)'; then
121
+ ops_flags="${ops_flags}- AWAIT IN LOOP: a loop construct and an awaited IO call both appear in this edit. Sequential awaits in a loop = N+1 / serial latency; confirm whether Promise.all / a batch call / a single query is the right primitive. (If this is genuinely a streaming pattern, ignore.)
122
+ "
123
+ fi
124
+ fi
125
+ # Telemetry spam seed: 6+ log/print statements added in one edit.
126
+ log_count="$(printf '%s\n' "$added" | grep -cE '\b(console\.(log|debug|info|warn|error)|print\(|fmt\.Print|std::cout|NSLog|System\.out\.println|println!|dbg!|console\.dir)')"
127
+ if [ "$log_count" -ge 6 ]; then
128
+ ops_flags="${ops_flags}- TELEMETRY SPAM: ${log_count} log/print statements added in this one edit. Debug-level telemetry that nobody reads is slop; consolidate or remove (kept only if this is a real logging entrypoint).
129
+ "
130
+ fi
131
+
105
132
  # --- decide whether to fire -------------------------------------------------
106
133
  added_code="$(printf '%s\n' "$added" | grep -cE '[^[:space:]]')"
107
134
  checklist_lines="${ANTI_SLOP_CHECKLIST_LINES:-40}"
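The three Tier 3 signals in this hunk are co-occurrence checks over the added lines, not a parse of the code. The Python sketch below restates them under that assumption. The patterns are adapted from the hunk, and the log pattern is shortened and carries no trailing word boundary, so that calls such as `print("x")` count. The function name `ops_flags` and the short flag labels are illustrative.

```python
import re

RETRY = re.compile(r"\b(retry|retryCount|retries|maxRetries|attempt)[A-Za-z0-9_]*\b")
BACKOFF = re.compile(r"\b(sleep|setTimeout|backoff|back_off|exponential|jitter|delay)[A-Za-z0-9_]*\b")
LOOP = re.compile(r"\b(for|while|forEach|map|filter|reduce|flatMap)\b")
AWAIT_IO = re.compile(r"\bawait\s+(fetch|ctx\.db|ctx\.run|client\.|axios|prisma\.|supabase\.|db\.|repo\.)")
LOG = re.compile(r"\b(console\.(log|debug|info|warn|error|dir)|print\(|fmt\.Print|System\.out\.println)")


def ops_flags(added_lines, log_threshold=6):
    """Return seed flags for the model to judge; none of them is a verdict."""
    def any_match(rx):
        return any(rx.search(line) for line in added_lines)

    flags = []
    # A retry word with no backoff word anywhere in the added lines.
    if any_match(RETRY) and not any_match(BACKOFF):
        flags.append("RETRY WITHOUT BACKOFF")
    # A loop keyword and an awaited IO call both present in the edit.
    if any_match(LOOP) and any_match(AWAIT_IO):
        flags.append("AWAIT IN LOOP")
    # Count the lines that add a log/print statement.
    if sum(1 for line in added_lines if LOG.search(line)) >= log_threshold:
        flags.append("TELEMETRY SPAM")
    return flags
```

Because the checks only look at whether words appear somewhere in the same edit, they can fire on unrelated lines. That is why each flag asks the model to confirm instead of blocking the edit.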
@@ -118,6 +145,7 @@ flags=""
118
145
  "
119
146
  [ -n "$redundant" ] && flags="${flags}- REDUNDANT COMMENTS: $(printf '%s' "$redundant" | paste -sd '|' -) - delete comments that restate the code; keep only WHY.
120
147
  "
148
+ flags="${flags}${ops_flags}"
121
149
 
122
150
  if [ -z "$flags" ] && [ "$substantial" = "0" ]; then exit 0; fi
123
151
 
@@ -1,11 +1,20 @@
1
1
  FINAL REVIEW — you just finished an implementation. Before you treat it as done,
2
- audit EVERYTHING you changed this session across the four axes below and FIX what
2
+ audit EVERYTHING you changed this session across the five axes below and FIX what
3
3
  fails. Do NOT revert the behaviour the user asked for. If an axis is already
4
4
  clean, say so in one line — do not manufacture work.
5
5
 
6
6
  Start by re-reading the diff. Scope the review to your session's changes and the
7
7
  code they touch.
8
8
 
9
+ ## 0. Intent trace (HIGHEST PRIORITY — run first)
10
+ The hook extracted your last user message as "ORIGINAL REQUEST" above. For every
11
+ hunk in the diff, answer: which part of the request forced this change? Anything
12
+ that cannot trace to the request is a HALLUCINATED REQUIREMENT — a feature,
13
+ flag, refactor, abstraction, dependency, or "nice to have" that nobody asked for.
14
+ Revert each one. "Clean code, wrong feature" is the worst failure mode and no
15
+ later axis can catch it. This axis outranks all others. (If no ORIGINAL REQUEST
16
+ is present — sandboxed verify run, no transcript — skip this axis.)
17
+
9
18
  ## 1. Correctness
10
19
  - The logic does what the task requires — no off-by-one, inverted condition,
11
20
  wrong operator, wrong return value, wrong import path.
@@ -83,11 +83,42 @@ if [ -z "$body" ]; then
83
83
  drop premature abstraction, unneeded deps, redundant comments, dead helpers.
84
84
  Fix now, re-run the scan + tests, then stop. If an axis is clean, say so in one line.'
85
85
  fi
86
-
87
- file_list="$(printf '%s\n' "$edited" | head -n 30 | sed 's/^/ /')"
88
- msg="FINAL REVIEW (end of implementation) - correctness, reliability, coverage, anti-slop.
89
-
90
- Files you changed this session:
86
+ body="$(expand_agent_paths "$body")"
87
+
88
+ file_list=""
89
+ while IFS= read -r p; do
90
+ [ -n "$p" ] || continue
91
+ rp="$(resolve_agent_path "$p")"
92
+ file_list="${file_list} ${rp}"$'\n'
93
+ done <<EOF
94
+ $edited
95
+ EOF
96
+ file_list="$(printf '%s' "$file_list" | head -n 30)"
97
+
98
+ # Tier 0: extract the last user <user_query> from the transcript so the model
99
+ # can trace every diff hunk back to a concrete request. Anything untraceable is
100
+ # a hallucinated requirement. Empty when there is no transcript or no
101
+ # user_query (sandboxed verify runs, fresh installs) - the axis is then a no-op.
102
+ user_query="$(extract_last_user_query "$input")"
103
+ intent_block=""
104
+ [ -n "$user_query" ] && intent_block="ORIGINAL REQUEST (your last user message, for intent trace):
105
+ ---
106
+ $user_query
107
+ ---
108
+
109
+ "
110
+
111
+ # Tier 5: cross-file change-surface metric. The per-file afterFileEdit audits
112
+ # miss the 50-file rename case; this seeds the whole-session footprint so the
113
+ # model can judge whether the change surface is proportional to the request.
114
+ unique_files="$(printf '%s\n' "$edited" | sort -u | grep -c -v '^$')"
115
+ surface_block="Session footprint: ${unique_files} file(s) touched. If a simple request produced >5 files or >200 lines, justify each file's inclusion or trim.
116
+
117
+ "
118
+
119
+ msg="FINAL REVIEW (end of implementation) - intent, correctness, reliability, coverage, anti-slop.
120
+
121
+ ${surface_block}${intent_block}Files you changed this session:
91
122
  $file_list
92
123
 
93
124
  $body"
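The follow-up message built in this hunk has a fixed order: headline, session footprint, optional original request, file list, then the review body. A Python sketch of that assembly is shown below. The function name `build_final_review` is hypothetical, and the footprint count is deduplicated here as an assumption about the intended meaning of "unique files".

```python
def build_final_review(edited, body, user_query=""):
    """Assemble the final-review follow-up in the order the hook uses."""
    files = [p for p in edited if p.strip()]
    unique_files = len(dict.fromkeys(files))  # order-preserving dedupe
    surface = (
        f"Session footprint: {unique_files} file(s) touched. "
        "If a simple request produced >5 files or >200 lines, "
        "justify each file's inclusion or trim.\n\n"
    )
    intent = ""
    if user_query:  # no transcript or no user_query: the axis is a no-op
        intent = (
            "ORIGINAL REQUEST (your last user message, for intent trace):\n"
            f"---\n{user_query}\n---\n\n"
        )
    file_list = "\n".join(f"  {p}" for p in files[:30])
    return (
        "FINAL REVIEW (end of implementation) - intent, correctness, "
        "reliability, coverage, anti-slop.\n\n"
        f"{surface}{intent}Files you changed this session:\n{file_list}\n\n{body}"
    )
```

Putting the request ahead of the file list means the model reads what was asked before it reads what it changed.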
@@ -87,6 +87,94 @@ is_cursor_config_path() {
87
87
  esac
88
88
  }
89
89
 
90
+ # Expand ~/ in agent-facing text to an absolute profile path (bash expands ~,
91
+ # but stop-hook followups should still emit literals agents can copy-paste).
92
+ expand_agent_paths() {
93
+ local text="$1"
94
+ local home="${HOME%/}"
95
+ text="${text//\~\//$home/}"
96
+ printf '%s' "$text"
97
+ }
98
+
99
+ # Normalize a file path for agent prompts (expand ~).
100
+ resolve_agent_path() {
101
+ local p="$1"
102
+ case "$p" in
103
+ "~/"*) printf '%s' "$HOME/${p#\~/}" ;;
104
+ *) printf '%s' "$p" ;;
105
+ esac
106
+ }
107
+
108
+ # extract_last_user_query <json> -> text of the last <user_query> in this
109
+ # conversation's transcript, or '' if there is none. Capped at 2000 chars.
110
+ #
111
+ # This is the Tier 0 intent-trace primitive: the final-review hook prepends the
112
+ # extracted request to its followup so the model must trace every diff hunk back
113
+ # to it. Anything untraceable is a hallucinated requirement.
114
+ #
115
+ # Walks the JSONL backward via tac (preferred) or a portable awk fallback; finds
116
+ # the most recent user record whose content carries a <user_query> tag.
117
+ extract_last_user_query() {
118
+ local json="$1"
119
+ local tp
120
+ tp="$(json_get "$json" transcript_path)"
121
+ [ -n "$tp" ] && [ -f "$tp" ] || return 0
122
+
123
+ local reversed
124
+ if command -v tac >/dev/null 2>&1; then
125
+ reversed="$(tac "$tp" 2>/dev/null)"
126
+ else
127
+ # Portable fallback: buffer every line in awk, then print them in reverse order.
128
+ reversed="$(awk '{a[NR]=$0} END {for(i=NR;i>=1;i--) print a[i]}' "$tp" 2>/dev/null)"
129
+ fi
130
+ [ -n "$reversed" ] || return 0
131
+
132
+ # Pull the text out via python3 if available (handles JSON content arrays);
133
+ # fall back to a pure-grep that handles string-typed content.
134
+ if have_py; then
135
+ printf '%s' "$reversed" | python3 -c '
136
+ import json, re, sys
137
+ try:
138
+     for line in sys.stdin:
139
+         line = line.strip()
140
+         if not line or "\"role\"" not in line:
141
+             continue
142
+         try:
143
+             rec = json.loads(line)
144
+         except Exception:
145
+             continue
146
+         if not isinstance(rec, dict) or rec.get("role") != "user":
147
+             continue
148
+         msg = rec.get("message") or {}
149
+         content = msg.get("content")
150
+         text = ""
151
+         if isinstance(content, str):
152
+             text = content
153
+         elif isinstance(content, list):
154
+             for p in content:
155
+                 if isinstance(p, dict) and p.get("type") == "text" and p.get("text"):
156
+                     text += p["text"]
157
+         m = re.search(r"<user_query>\s*(.+?)\s*</user_query>", text, re.S)
158
+         if m:
159
+             q = m.group(1).strip()
160
+             if len(q) > 2000:
161
+                 q = q[:2000] + "..."
162
+             print(q)
163
+             break
164
+ except Exception:
165
+     pass
166
+ ' 2>/dev/null
167
+ return 0
168
+ fi
169
+
170
+ # No python3: best-effort grep for the common case where the user message
171
+ # is the only place <user_query> appears in a line. Imperfect but bounded.
172
+ printf '%s' "$reversed" |
173
+ grep -m1 -oE '<user_query>[^<]*</user_query>' 2>/dev/null |
174
+ sed -E 's@</?user_query>@@g' |
175
+ head -c 2000
176
+ }
177
+
90
178
  # merge_subagent_edit_markers <json> <parent_cid> -> 0 if anything was folded
91
179
  #
92
180
  # Subagent edits fire afterFileEdit under the SUBAGENT's conversation_id, so
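To make the record shapes concrete, here is a standalone Python sketch of the same backward walk, run against a tiny in-memory transcript. The record layout (`role`, `message.content` as a string or a list of text parts) is inferred from the helper in this hunk and is not a documented Cursor format. The function name `last_user_query` is hypothetical.

```python
import json
import re

QUERY_RE = re.compile(r"<user_query>\s*(.+?)\s*</user_query>", re.S)


def last_user_query(jsonl_text, cap=2000):
    """Return the newest <user_query> text in a JSONL transcript, or ''."""
    for line in reversed(jsonl_text.splitlines()):
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # skip blank or malformed lines
        if not isinstance(rec, dict) or rec.get("role") != "user":
            continue
        msg = rec.get("message")
        if not isinstance(msg, dict):
            continue
        content = msg.get("content")
        if isinstance(content, list):
            # Join the text parts of a structured content array.
            content = "".join(
                (p.get("text") or "")
                for p in content
                if isinstance(p, dict) and p.get("type") == "text"
            )
        if not isinstance(content, str):
            continue
        match = QUERY_RE.search(content)
        if match:
            query = match.group(1).strip()
            return query[:cap] + "..." if len(query) > cap else query
    return ""
```

Walking from the end means an earlier request in the same conversation never shadows the latest one, and the cap keeps the follow-up prompt bounded.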
@@ -24,7 +24,8 @@ cid="$(safe_conversation_id "$input")"
24
24
 
25
25
  fold_note=""
26
26
  if merge_subagent_edit_markers "$input" "$cid"; then
27
- fold_note="SUBAGENT WORK DETECTED - a subagent of this conversation edited files (its edits fired hooks in ITS context, not yours). YOU are the auditor of its work: audit its diff (git status / git diff on the files it touched) against ~/.agents/hooks/self-review.md. Fix real bugs; stay silent otherwise. Its files are folded into this conversation's end-of-implementation review."
27
+ self_review="$(expand_agent_paths "$HOME/.agents/hooks/self-review.md")"
28
+ fold_note="SUBAGENT WORK DETECTED - a subagent of this conversation edited files (its edits fired hooks in ITS context, not yours). YOU are the auditor of its work: audit its diff (git status / git diff on the files it touched) against $self_review. Fix real bugs; stay silent otherwise. Its files are folded into this conversation's end-of-implementation review."
28
29
  fi
29
30
 
30
31
  pending_file="$(hooks_pending_dir)/feedback-$cid.txt"
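The follow-up text is meant to carry literal absolute paths, since an agent tool may not resolve `~`. A small Python sketch of that expansion follows. It mirrors the two shell helpers in spirit only, and the optional `home` parameter is an addition for illustration.

```python
from pathlib import Path


def expand_agent_paths(text, home=None):
    """Replace every "~/" in agent-facing text with an absolute home prefix."""
    base = home if home is not None else str(Path.home())
    # Drop trailing separators and use forward slashes for copy-paste safety.
    base = base.rstrip("/\\").replace("\\", "/")
    return text.replace("~/", base + "/")
```

For example, `expand_agent_paths("see ~/.agents/hooks/self-review.md", home="/home/dev")` yields a path the agent can paste into a tool call unchanged.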
@@ -19,6 +19,13 @@ file. Your job, on this turn, is to:
19
19
  - **Logic bugs that the user would actually care about**: a function
20
20
  that returns the wrong thing, an off-by-one, a missing `return`, a
21
21
  wrong import path.
22
+ - **Semantic contracts**: did any existing function's BEHAVIOR change
23
+ without its name, signature, or docstring changing? Names are
24
+ contracts. `deleteUser()` that now soft-deletes, a getter that now
25
+ writes, a function that used to throw on bad input and now silently
26
+ returns null — these are silent contract breaks that callers will
27
+ rely on and break against. If behavior changed, the name, signature,
28
+ or docstring must reflect it.
22
29
  4. If you find a real bug, **fix it with `Edit`**, then say nothing.
23
30
  Do not report it. Do not explain it. The user will see the fix
24
31
  in the next message; the bug is gone.
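A short Python illustration of the semantic-contract check added in this hunk. All names here are hypothetical. The first pair shows a silent break: the function body now soft-deletes while its name and docstring still promise removal, so a caller written against the old contract miscounts. The second pair shows the same behaviour with an honest name.

```python
def delete_user(rows, user_id):
    """Remove the user's row."""
    # Silent contract break: the behaviour changed to a soft delete,
    # but the name and docstring above were left untouched.
    rows[user_id]["deleted"] = True


def count_users(rows):
    """Caller written against the old contract: every row is a live user."""
    return len(rows)


def soft_delete_user(rows, user_id):
    """Mark the user as deleted; the row is kept."""
    rows[user_id]["deleted"] = True


def count_active_users(rows):
    """Count rows that are not soft-deleted."""
    return sum(1 for row in rows.values() if not row["deleted"])
```

After `delete_user`, `count_users` still reports the user, which is the kind of downstream break the review prompt asks the model to look for.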
@@ -73,12 +73,22 @@ behaviour the task asked for):
73
73
  1. Correctness - logic, edge cases (null/empty/zero/boundary), language traps, security.
74
74
  2. Reliability - error paths handled, no swallowed errors, resources released.
75
75
  3. Coverage - behaviour-bearing changes have real tests; RUN the suite if present.
76
- 4. Anti-slop - no duplicate helpers, premature abstraction, unneeded deps,
77
- redundant comments, dead code.
76
+ 4. Anti-slop - if ~/.cursor/skills/anti-slop/scripts/scan_slop.py exists, run
77
+ `python ~/.cursor/skills/anti-slop/scripts/scan_slop.py --all`; otherwise
78
+ apply ~/.agents/hooks/anti-slop.md to the session diff.
78
79
  If an axis is clean, say so in one line. Then stop.'
79
80
  fi
81
+ body="$(expand_agent_paths "$body")"
80
82
 
81
- file_list="$(printf '%s\n' "$edited" | head -n 30 | sed 's/^/ /')"
83
+ file_list=""
84
+ while IFS= read -r p; do
85
+ [ -n "$p" ] || continue
86
+ rp="$(resolve_agent_path "$p")"
87
+ file_list="${file_list} ${rp}"$'\n'
88
+ done <<EOF
89
+ $edited
90
+ EOF
91
+ file_list="$(printf '%s' "$file_list" | head -n 30)"
82
92
  msg="SUBAGENT FINAL REVIEW - you just finished delegated implementation work. Before your result returns to the parent agent, audit it.
83
93
 
84
94
  Files you changed this run:
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "cursordoctrine",
3
- "version": "0.1.0",
4
- "description": "Thin self-review hooks for Cursor — the model is the auditor. One command installs the doctrine, the hook pack, and the anti-slop skill.",
3
+ "version": "0.2.0",
4
+ "description": "Thin self-review hooks for Cursor — the model is the auditor. One command installs the doctrine, the hook pack, and the anti-slop skill. Adds an intent-trace final-review axis that catches clean-code-wrong-feature (the worst AI slop).",
5
5
  "bin": {
6
6
  "cursordoctrine": "bin/cli.mjs"
7
7
  },
@@ -13,6 +13,8 @@
13
13
  # *Strategy / *Singleton / *Facade / *Builder / *Visitor / *Decorator
14
14
  # class, or CQRS / Event-Sourcing / DDD vocabulary
15
15
  # * redundant comments that merely restate the next line of code
16
+ # * operational slop (Tier 3): retry-without-backoff, await-in-loop,
17
+ # telemetry spam (>= 6 log/print statements added in one edit)
16
18
  #
17
19
  # Deferred to the model (semantic - no regex can judge these without drowning
18
20
  # the user in false positives): edge cases, duplicated logic, ignored
@@ -128,6 +130,50 @@ foreach ($a in $added) {
128
130
  if ($redundant.Count -ge 4) { break }
129
131
  }
130
132
 
133
+ # --- signal 4: operational slop (Tier 3) ----------------------------------
134
+ # Retry-without-backoff: a retry loop or recursive retry with no obvious
135
+ # sleep/backoff/setTimeout in the same edit. Only this edit's added lines are
136
+ # scanned, so a backoff added anywhere in the edit, above or below the retry,
137
+ # clears the flag. This is deliberately seed-grade, not a verdict.
138
+ $opsFlags = New-Object System.Collections.Generic.List[string]
139
+ $bodyHas = {
140
+ param($pat)
141
+ foreach ($a in $added) { if ($a -match $pat) { return $true } }
142
+ return $false
143
+ }
144
+ $retryWord = '\b(retry|retryCount|retries|maxRetries|attempt)\w*\b'
145
+ $backoffWord = '\b(sleep|setTimeout|backoff|back_off|exponential|jitter|delay)\w*\b'
146
+ if (& $bodyHas $retryWord) {
147
+ $noBackoff = $true
148
+ foreach ($a in $added) { if ($a -match $backoffWord) { $noBackoff = $false; break } }
149
+ if ($noBackoff) {
150
+ $opsFlags.Add("- RETRY WITHOUT BACKOFF: a retry construct was added but no sleep/backoff/setTimeout is visible in this edit's added lines. Unbounded retries = retry storms + token/cost burn; add bounded backoff or confirm the runtime already throttles.")
151
+ }
152
+ }
153
+
154
+ # An awaited IO call (`await fetch`, `await ctx.db`, ...) co-occurring with a loop
155
+ # construct in the same edit: N+1 in agent/edge code, not just SQL. We seed on the
156
+ # added-line co-occurrence of a loop keyword and an awaited call; the model judges
157
+ # whether it is a sequential-await loop (real slop) or a legitimate streaming pattern.
158
+ $loopWord = '\b(for|while|forEach|map|filter|reduce|flatMap|for\s+await|async\s+for)\b'
159
+ $awaitCall = '\bawait\s+(fetch|ctx\.db|ctx\.run|client\.|axios|prisma\.|supabase\.|db\.|repo\.)'
160
+ if (& $bodyHas $loopWord) {
161
+ $awaitInLoop = $false
162
+ foreach ($a in $added) { if ($a -match $awaitCall) { $awaitInLoop = $true; break } }
163
+ if ($awaitInLoop) {
164
+ $opsFlags.Add("- AWAIT IN LOOP: a loop construct and an awaited IO call both appear in this edit. Sequential awaits in a loop = N+1 / serial latency; confirm whether Promise.all / a batch call / a single query is the right primitive. (If this is genuinely a streaming pattern, ignore.)")
165
+ }
166
+ }
167
+
168
+ # Telemetry spam seed: 6+ console.log / print / fmt.Print / std::cout statements
169
+ # added in one edit. Models paste debug prints liberally; six is well past intent.
170
+ $logRe = '\b(console\.(log|debug|info|warn|error)|print\(|fmt\.Print|std::cout|NSLog|System\.out\.println|println!|dbg!|console\.dir)'
171
+ $logCount = 0
172
+ foreach ($a in $added) { if ($a -match $logRe) { $logCount++ } }
173
+ if ($logCount -ge 6) {
174
+ $opsFlags.Add("- TELEMETRY SPAM: $logCount log/print statements added in this one edit. Debug-level telemetry that nobody reads is slop; consolidate or remove (kept only if this is a real logging entrypoint).")
175
+ }
176
+
131
177
  # --- decide whether to fire ----------------------------------------------
132
178
  $srcRe = '\.(ts|tsx|js|jsx|mjs|cjs|py|go|rs|java|kt|kts|cs|cpp|cc|cxx|c|h|hpp|rb|php|swift|scala|m|mm|sh|ps1|lua|dart|ex|exs|vue|svelte)$'
133
179
  $addedCode = 0
@@ -139,6 +185,7 @@ $flags = New-Object System.Collections.Generic.List[string]
139
185
  if ($depAdded) { $flags.Add("- DEPENDENCY: " + $base + " gained a dependency - is it necessary, or do the stdlib / existing deps already cover it?") }
140
186
  if ($patterns.Count -gt 0) { $flags.Add("- PREMATURE ABSTRACTION: " + ($patterns -join ', ') + " - is there a real, present problem (2-3+ call sites that exist today) that needs it? If it is speculative, delete it and write the direct code.") }
141
187
  if ($redundant.Count -gt 0) { $flags.Add("- REDUNDANT COMMENTS: " + ($redundant -join ' | ') + " - delete comments that restate the code; keep only WHY.") }
188
+ $flags.AddRange($opsFlags)
142
189
 
143
190
  if ($flags.Count -eq 0 -and -not $substantial) { exit 0 }
144
191
 
@@ -1,11 +1,20 @@
1
1
  FINAL REVIEW — you just finished an implementation. Before you treat it as done,
2
- audit EVERYTHING you changed this session across the four axes below and FIX what
2
+ audit EVERYTHING you changed this session across the five axes below and FIX what
3
3
  fails. Do NOT revert the behaviour the user asked for. If an axis is already
4
4
  clean, say so in one line — do not manufacture work.
5
5
 
6
6
  Start by re-reading the diff. Scope the review to your session's changes and the
7
7
  code they touch.
8
8
 
9
+ ## 0. Intent trace (HIGHEST PRIORITY — run first)
10
+ The hook extracted your last user message as "ORIGINAL REQUEST" above. For every
11
+ hunk in the diff, answer: which part of the request forced this change? Anything
12
+ that cannot trace to the request is a HALLUCINATED REQUIREMENT — a feature,
13
+ flag, refactor, abstraction, dependency, or "nice to have" that nobody asked for.
14
+ Revert each one. "Clean code, wrong feature" is the worst failure mode and no
15
+ later axis can catch it. This axis outranks all others. (If no ORIGINAL REQUEST
16
+ is present — sandboxed verify run, no transcript — skip this axis.)
17
+
9
18
  ## 1. Correctness
10
19
  - The logic does what the task requires — no off-by-one, inverted condition,
11
20
  wrong operator, wrong return value, wrong import path.
@@ -94,12 +94,31 @@ FINAL REVIEW - audit everything you changed this session and FIX what fails
94
94
  Fix now, re-run the scan + tests, then stop. If an axis is clean, say so in one line.
95
95
  '@
96
96
  }
97
+ $body = Expand-AgentPaths $body
97
98
 
98
- $fileList = ($edited | Select-Object -First 30) -join "`n "
99
- $msg = "FINAL REVIEW (end of implementation) - correctness, reliability, coverage, anti-slop.`n`nFiles you changed this session:`n $fileList`n`n$body"
99
+ $resolved = @($edited | ForEach-Object { Resolve-AgentPath $_ })
100
+ $fileList = ($resolved | Select-Object -First 30) -join "`n "
101
+
102
+ # Tier 0: extract the last user <user_query> from the transcript so the model
103
+ # can trace every diff hunk back to a concrete request. Anything untraceable is
104
+ # a hallucinated requirement. Empty when there is no transcript or no user_query
105
+ # (sandboxed verify runs, fresh installs) — the axis is then a no-op.
106
+ $userQuery = Get-LastUserQuery $obj
107
+ $intentBlock = ''
108
+ if ($userQuery) {
109
+ $intentBlock = "ORIGINAL REQUEST (your last user message, for intent trace):`n---`n$userQuery`n---`n`n"
110
+ }
111
+
112
+ # Tier 5: cross-file change-surface metric. The per-file afterFileEdit audits
113
+ # miss the 50-file rename case; this seeds the whole-session footprint so the
114
+ # model can judge whether the change surface is proportional to the request.
115
+ $uniqueFiles = @($edited | Select-Object -Unique).Count
116
+ $surfaceBlock = "Session footprint: $uniqueFiles file(s) touched. If a simple request produced >5 files or >200 lines, justify each file's inclusion or trim.`n`n"
117
+
118
+ $msg = "FINAL REVIEW (end of implementation) - intent, correctness, reliability, coverage, anti-slop.`n`n${surfaceBlock}${intentBlock}Files you changed this session:`n $fileList`n`n$body"
100
119
 
101
120
  # Arm the one-shot brake BEFORE emitting, so a crash after emit can't re-fire.
102
- try { New-Item -ItemType File -Path $flag -Force | Out-Null } catch { }
121
+ New-Item -ItemType File -Path $flag -Force -ErrorAction SilentlyContinue | Out-Null
103
122
 
104
123
  Write-HookJson @{ followup_message = $msg }
105
124
  exit 0
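The ordering in this hunk (arm the flag, then emit) is what makes the review one-shot. A Python sketch of the idea follows, assuming the hook checks the same flag file before firing. That check is not shown in this hunk, and the function name is hypothetical.

```python
import json
from pathlib import Path


def emit_final_review_once(flag, msg):
    """Return the follow-up JSON the first time, None once the flag exists."""
    flag = Path(flag)
    if flag.exists():
        return None  # brake already armed: do not re-fire
    try:
        flag.touch()  # arm BEFORE emitting, so a crash after emit cannot re-fire
    except OSError:
        pass  # best effort, like -ErrorAction SilentlyContinue
    return json.dumps({"followup_message": msg})
```

If the flag were written after the message was emitted, a crash between the two steps would leave the brake unarmed and the review could fire again on the next stop.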
@@ -51,6 +51,69 @@ function ConvertTo-FwdPath([string]$p) {
51
51
  return $p.Replace('\', '/')
52
52
  }
53
53
 
54
+ # Expand ~/ in agent-facing text to an absolute profile path. pwsh and many
55
+ # agent tools do not resolve ~ on Windows; stop-hook followups must be literal.
56
+ function Expand-AgentPaths([string]$text) {
57
+ if (-not $text) { return $text }
58
+ $homeFwd = $HOME.TrimEnd('\', '/').Replace('\', '/')
59
+ return $text.Replace('~/', "$homeFwd/")
60
+ }
61
+
62
+ # Normalize a file path for agent prompts (expand ~, forward slashes).
63
+ function Resolve-AgentPath([string]$p) {
64
+ if (-not $p) { return $p }
65
+ $p = $p.Trim()
66
+ if ($p -match '^~[\\/]') {
67
+ $p = Join-Path $HOME ($p.Substring(2))
68
+ }
69
+ if (Test-Path -LiteralPath $p -ErrorAction SilentlyContinue) {
70
+ $resolved = Resolve-Path -LiteralPath $p -ErrorAction SilentlyContinue
71
+ if ($resolved) { return ConvertTo-FwdPath $resolved.Path }
72
+ }
73
+ return ConvertTo-FwdPath $p
74
+ }
75
+
76
+ # Extract the last user <user_query> from a Cursor transcript JSONL. The
77
+ # transcript holds one {role, message} JSON record per line; we walk backward from the
78
+ # end, find the last user turn whose content has a <user_query> tag, and return
79
+ # its text. Returns '' if there is no transcript or no user_query. Capped at
80
+ # 2000 chars so the follow-up prompt stays bounded.
81
+ #
82
+ # This is the Tier 0 intent-trace primitive: the final-review hook prepends the
83
+ # extracted request to its followup so the model must trace every diff hunk back
84
+ # to it. Anything untraceable is a hallucinated requirement.
85
+ function Get-LastUserQuery($obj) {
86
+ $tp = ''
87
+ if ($obj -and $obj.PSObject.Properties['transcript_path']) { $tp = [string]$obj.transcript_path }
88
+ if (-not $tp -or -not (Test-Path -LiteralPath $tp)) { return '' }
89
+ $lines = @(Get-Content -LiteralPath $tp -ErrorAction SilentlyContinue)
90
+ for ($i = $lines.Count - 1; $i -ge 0; $i--) {
91
+ $line = $lines[$i]
92
+ if (-not $line -or $line -notmatch '"role"\s*:\s*"user"') { continue }
93
+ try {
94
+ $rec = $line | ConvertFrom-Json -ErrorAction SilentlyContinue
95
+ } catch { continue }
96
+ if (-not $rec -or -not $rec.message) { continue }
97
+ $content = $rec.message.content
98
+ if (-not $content) { continue }
99
+ # content is an array of {type:text,text:...} or a plain string
100
+ $text = ''
101
+ if ($content -is [string]) {
102
+ $text = $content
103
+ } else {
104
+ foreach ($part in $content) {
105
+ if ($part.type -eq 'text' -and $part.text) { $text += $part.text }
106
+ }
107
+ }
108
+ if ($text -match '(?s)<user_query>\s*(.+?)\s*</user_query>') {
109
+ $q = $Matches[1].Trim()
110
+ if ($q.Length -gt 2000) { $q = $q.Substring(0, 2000) + '...' }
111
+ return $q
112
+ }
113
+ }
114
+ return ''
115
+ }
116
+
54
117
  # Subagent edits fire afterFileEdit under the SUBAGENT's conversation_id, so
55
118
  # their session-edits markers are invisible to the parent's stop-hook review.
56
119
  # Subagent transcripts live at <transcripts>/<parent-cid>/subagents/<sub-cid>.jsonl,
@@ -23,7 +23,8 @@ $cid = Get-SafeConversationId $obj
23
23
 
24
24
  $foldNote = ''
25
25
  if (Merge-SubagentEditMarkers $obj $cid) {
26
- $foldNote = "SUBAGENT WORK DETECTED - a subagent of this conversation edited files (its edits fired hooks in ITS context, not yours). YOU are the auditor of its work: audit its diff (git status / git diff on the files it touched) against ~/.agents/hooks/self-review.md. Fix real bugs; stay silent otherwise. Its files are folded into this conversation's end-of-implementation review."
26
+ $selfReview = Expand-AgentPaths (Join-Path $HOME '.agents/hooks/self-review.md')
27
+ $foldNote = "SUBAGENT WORK DETECTED - a subagent of this conversation edited files (its edits fired hooks in ITS context, not yours). YOU are the auditor of its work: audit its diff (git status / git diff on the files it touched) against $selfReview. Fix real bugs; stay silent otherwise. Its files are folded into this conversation's end-of-implementation review."
27
28
  }
28
29
 
29
30
  $pendingFile = Join-Path (Get-HooksPendingDir) "feedback-$cid.txt"
@@ -19,6 +19,13 @@ file. Your job, on this turn, is to:
19
19
  - **Logic bugs that the user would actually care about**: a function
20
20
  that returns the wrong thing, an off-by-one, a missing `return`, a
21
21
  wrong import path.
22
+ - **Semantic contracts**: did any existing function's BEHAVIOR change
23
+ without its name, signature, or docstring changing? Names are
24
+ contracts. `deleteUser()` that now soft-deletes, a getter that now
25
+ writes, a function that used to throw on bad input and now silently
26
+ returns null — these are silent contract breaks that callers will
27
+ rely on and break against. If behavior changed, the name, signature,
28
+ or docstring must reflect it.
22
29
  4. If you find a real bug, **fix it with `Edit`**, then say nothing.
23
30
  Do not report it. Do not explain it. The user will see the fix
24
31
  in the next message; the bug is gone.
@@ -73,13 +73,16 @@ behaviour the task asked for):
73
73
  1. Correctness - logic, edge cases (null/empty/zero/boundary), language traps, security.
74
74
  2. Reliability - error paths handled, no swallowed errors, resources released.
75
75
  3. Coverage - behaviour-bearing changes have real tests; RUN the suite if present.
76
- 4. Anti-slop - no duplicate helpers, premature abstraction, unneeded deps,
77
- redundant comments, dead code.
76
+ 4. Anti-slop - if ~/.cursor/skills/anti-slop/scripts/scan_slop.py exists, run
77
+ `python ~/.cursor/skills/anti-slop/scripts/scan_slop.py --all`; otherwise
78
+ apply ~/.agents/hooks/anti-slop.md to the session diff.
78
79
  If an axis is clean, say so in one line. Then stop.
79
80
  '@
80
81
  }
82
+ $body = Expand-AgentPaths $body
81
83
 
82
- $fileList = ($edited | Select-Object -First 30) -join "`n "
84
+ $resolved = @($edited | ForEach-Object { Resolve-AgentPath $_ })
85
+ $fileList = ($resolved | Select-Object -First 30) -join "`n "
83
86
  $msg = "SUBAGENT FINAL REVIEW - you just finished delegated implementation work. Before your result returns to the parent agent, audit it.`n`nFiles you changed this run:`n $fileList`n`n$body"
84
87
 
85
88
  # Arm the one-shot brake BEFORE emitting, so a crash after emit can't re-fire.