invar-tools 1.10.0__py3-none-any.whl → 1.12.0__py3-none-any.whl
- invar/core/doc_edit.py +187 -0
- invar/core/doc_parser.py +563 -0
- invar/core/ts_sig_parser.py +6 -3
- invar/mcp/handlers.py +436 -0
- invar/mcp/server.py +351 -156
- invar/node_tools/ts-query.js +396 -0
- invar/shell/commands/doc.py +409 -0
- invar/shell/commands/guard.py +29 -0
- invar/shell/commands/init.py +72 -13
- invar/shell/commands/perception.py +302 -6
- invar/shell/doc_tools.py +459 -0
- invar/shell/fs.py +15 -14
- invar/shell/prove/crosshair.py +3 -0
- invar/shell/prove/guard_ts.py +13 -10
- invar/shell/py_refs.py +156 -0
- invar/shell/skill_manager.py +17 -15
- invar/shell/ts_compiler.py +238 -0
- invar/templates/examples/typescript/patterns.md +193 -0
- invar/templates/skills/develop/SKILL.md.jinja +46 -0
- invar/templates/skills/review/SKILL.md.jinja +205 -493
- {invar_tools-1.10.0.dist-info → invar_tools-1.12.0.dist-info}/METADATA +58 -8
- {invar_tools-1.10.0.dist-info → invar_tools-1.12.0.dist-info}/RECORD +27 -18
- {invar_tools-1.10.0.dist-info → invar_tools-1.12.0.dist-info}/WHEEL +0 -0
- {invar_tools-1.10.0.dist-info → invar_tools-1.12.0.dist-info}/entry_points.txt +0 -0
- {invar_tools-1.10.0.dist-info → invar_tools-1.12.0.dist-info}/licenses/LICENSE +0 -0
- {invar_tools-1.10.0.dist-info → invar_tools-1.12.0.dist-info}/licenses/LICENSE-GPL +0 -0
- {invar_tools-1.10.0.dist-info → invar_tools-1.12.0.dist-info}/licenses/NOTICE +0 -0
|
@@ -1,337 +1,158 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: review
|
|
3
|
-
description:
|
|
3
|
+
description: Adversarial code review. Code is GUILTY until proven INNOCENT. Every round spawns isolated subagent reviewing FULL scope.
|
|
4
4
|
_invar:
|
|
5
|
-
version: "
|
|
5
|
+
version: "7.0"
|
|
6
6
|
managed: skill
|
|
7
7
|
---
|
|
8
8
|
<!--invar:skill-->
|
|
9
9
|
|
|
10
|
-
# Review
|
|
10
|
+
# Review Skill (Adversarial)
|
|
11
11
|
|
|
12
|
-
|
|
13
|
-
> **Mindset:** REJECTION-FIRST. Code is GUILTY until proven INNOCENT.
|
|
14
|
-
> **Success Metric:** Issues FOUND, not code approved. Zero issues = you failed to look hard enough.
|
|
15
|
-
> **Workflow:** Two-step loop: Review → Fix → Review → Fix → ... (full scope each round, no separate "verify" step).
|
|
12
|
+
## Mandatory Rules (MUST follow, NO exceptions)
|
|
16
13
|
|
|
17
|
-
|
|
14
|
+
1. **EVERY round MUST spawn isolated subagent** (Task tool with model=opus)
|
|
15
|
+
2. **EVERY round reviews FULL scope** (all files, not just changes)
|
|
16
|
+
3. **Code is GUILTY until proven INNOCENT**
|
|
17
|
+
4. **NO user confirmation between rounds** — just do it
|
|
18
|
+
5. **MAX_ROUNDS = 5**
|
|
18
19
|
|
|
19
|
-
|
|
20
|
-
|-------|---------|----------|
|
|
21
|
-
| (default) | Same context | Reviewing **others' code** only |
|
|
22
|
-
| `--deep` | **Isolated agent** | Self-review, before merge, maximum objectivity |
|
|
20
|
+
**Violation = Review Invalid.** If you skip subagent or review only changes, the review is worthless.
|
|
23
21
|
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
**`--deep` mode:** Spawns isolated agent with no conversation history. **Required when:**
|
|
27
|
-
- You wrote or modified the code being reviewed (self-review)
|
|
28
|
-
- Before merge/PR
|
|
29
|
-
- Maximum objectivity needed
|
|
30
|
-
|
|
31
|
-
### ⚠️ Same-Context Review Limitations (CRITICAL)
|
|
32
|
-
|
|
33
|
-
**Same-context review CANNOT be objective for self-written code because:**
|
|
34
|
-
|
|
35
|
-
| Cognitive Bias | Effect |
|
|
36
|
-
|----------------|--------|
|
|
37
|
-
| **Intent over code** | You "know" what it's supposed to do, so you don't see what it actually does |
|
|
38
|
-
| **Context memory** | You "remember" reading code, so you skip re-reading carefully |
|
|
39
|
-
| **Confirmation bias** | You look for "code works" evidence, not "code fails" evidence |
|
|
40
|
-
| **Completion pressure** | Subconscious goal becomes "finish review" not "find bugs" |
|
|
41
|
-
|
|
42
|
-
**Evidence:** In DX-71 review, same-context missed 2 CRITICAL + 4 MAJOR issues that
|
|
43
|
-
isolated agent found immediately. "Fresh eyes" claims don't work in same context.
|
|
44
|
-
|
|
45
|
-
### Mandatory Self-Review Detection (DX-72)
|
|
46
|
-
|
|
47
|
-
**Before starting review, you MUST check:**
|
|
48
|
-
|
|
49
|
-
```
|
|
50
|
-
If ANY file in review scope was edited by agent this session:
|
|
51
|
-
┌──────────────────────────────────────────────────────────────┐
|
|
52
|
-
│ 🚨 SELF-REVIEW DETECTED — Isolation Required │
|
|
53
|
-
│ │
|
|
54
|
-
│ You modified files in the review scope this session. │
|
|
55
|
-
│ Same-context review has proven cognitive blind spots. │
|
|
56
|
-
│ │
|
|
57
|
-
│ Options: │
|
|
58
|
-
│ [1] Use --deep (RECOMMENDED) — Spawn isolated agent │
|
|
59
|
-
│ [2] Acknowledge risk — User explicitly accepts limitations │
|
|
60
|
-
│ │
|
|
61
|
-
│ If user says "continue" or "quick review": │
|
|
62
|
-
│ → Proceed but add WARNING to final report │
|
|
63
|
-
│ → Report MUST state: "Self-review without isolation" │
|
|
64
|
-
└──────────────────────────────────────────────────────────────┘
|
|
65
|
-
```
|
|
66
|
-
|
|
67
|
-
**Default action:** If user doesn't specify, use `--deep` for self-review.
|
|
68
|
-
|
|
69
|
-
### --deep Mode Execution
|
|
70
|
-
|
|
71
|
-
When `--deep` is selected:
|
|
72
|
-
|
|
73
|
-
1. Collect minimal inputs:
|
|
74
|
-
- Files to review
|
|
75
|
-
- Contracts (if available)
|
|
76
|
-
- Test files (if available)
|
|
77
|
-
|
|
78
|
-
2. Spawn Task agent with:
|
|
79
|
-
- **Adversarial Code Reviewer persona** (see Appendix)
|
|
80
|
-
- NO conversation history
|
|
81
|
-
- Only the collected inputs
|
|
82
|
-
|
|
83
|
-
3. Isolated agent returns structured review report
|
|
84
|
-
|
|
85
|
-
4. Main agent fixes issues (if any)
|
|
22
|
+
---
|
|
86
23
|
|
|
87
|
-
|
|
24
|
+
## Scope Classification (DX-75)
|
|
88
25
|
|
|
89
|
-
|
|
26
|
+
**Before starting, classify the scope:**
|
|
90
27
|
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
else:
|
|
97
|
-
quality_met = True
|
|
98
|
-
```
|
|
28
|
+
| Classification | Criteria | Strategy |
|
|
29
|
+
|----------------|----------|----------|
|
|
30
|
+
| **SMALL** | <5 files AND <1500 lines | THOROUGH (no enumeration) |
|
|
31
|
+
| **MEDIUM** | 5-10 files OR 1500-5000 lines | HYBRID (enum + open) |
|
|
32
|
+
| **LARGE** | >10 files OR >5000 lines | CHUNKED (parallel subagents) |
|
|
99
33
|
|
|
100
|
-
**Why
|
|
101
|
-
-
|
|
102
|
-
-
|
|
103
|
-
- Round 2 in same context drifts to "verify my fixes" not "find problems"
|
|
34
|
+
**Why different strategies?**
|
|
35
|
+
- SMALL: Pre-enumeration causes "checklist mentality" — you only verify listed items, miss variants
|
|
36
|
+
- LARGE: Without enumeration, attention drifts — later files get less scrutiny
|
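A minimal sketch of the scope classification table added above, assuming the two inputs are a file count and a total line count. The function name `classify_scope` and the `STRATEGY` mapping are illustrative and are not part of the package. The table leaves one case open (for example 7 files with 6000 lines matches both MEDIUM and LARGE); this sketch assumes LARGE takes precedence.

```python
def classify_scope(file_count: int, total_lines: int) -> str:
    """Return SMALL, MEDIUM, or LARGE for a review scope (hypothetical helper)."""
    # Assumption: LARGE wins as soon as either dimension exceeds its upper bound.
    if file_count > 10 or total_lines > 5000:
        return "LARGE"
    # SMALL requires BOTH dimensions to stay under their lower bounds.
    if file_count < 5 and total_lines < 1500:
        return "SMALL"
    # Everything in between is MEDIUM.
    return "MEDIUM"


# Strategy names taken from the table; the mapping itself is illustrative.
STRATEGY = {"SMALL": "THOROUGH", "MEDIUM": "HYBRID", "LARGE": "CHUNKED"}
```

A caller would look up the strategy with `STRATEGY[classify_scope(n_files, n_lines)]` before spawning the first reviewer.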
|
104
37
|
|
|
105
38
|
---
|
|
106
39
|
|
|
107
|
-
##
|
|
108
|
-
|
|
109
|
-
**This skill IS for:**
|
|
110
|
-
- Finding bugs and logic errors in existing code
|
|
111
|
-
- Verifying contract semantic value
|
|
112
|
-
- Auditing escape hatches
|
|
113
|
-
- Security review
|
|
114
|
-
|
|
115
|
-
**This skill is NOT for:**
|
|
116
|
-
- Implementing new features → switch to `/develop`
|
|
117
|
-
- Understanding how code works → switch to `/investigate`
|
|
118
|
-
- Deciding on architecture → switch to `/propose`
|
|
119
|
-
|
|
120
|
-
**Drift detection:** If you're writing significant new code (not fixes) → STOP, you're in wrong skill.
|
|
121
|
-
|
|
122
|
-
## Auto-Loop Configuration
|
|
40
|
+
## Strategy: THOROUGH (SMALL scope)
|
|
123
41
|
|
|
124
42
|
```
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
43
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
44
|
+
│ THOROUGH STRATEGY (for SMALL scope) │
|
|
45
|
+
│ ───────────────────────────────────────────────────────────│
|
|
46
|
+
│ │
|
|
47
|
+
│ ⚠️ DO NOT pre-enumerate issues or patterns │
|
|
48
|
+
│ ⚠️ DO NOT use grep/sig to "find issues first" │
|
|
49
|
+
│ │
|
|
50
|
+
│ Instead: │
|
|
51
|
+
│ 1. Read each file COMPLETELY, line by line │
|
|
52
|
+
│ 2. Apply checklist A-G as you read │
|
|
53
|
+
│ 3. Trust your judgment to find issues │
|
|
54
|
+
│ 4. Look for VARIANTS and EDGE CASES │
|
|
55
|
+
│ │
|
|
56
|
+
│ Why: Pre-enumeration narrows focus to known patterns. │
|
|
57
|
+
│ Small scope = you CAN read everything thoroughly. │
|
|
58
|
+
│ This finds issues that pattern matching misses. │
|
|
59
|
+
└─────────────────────────────────────────────────────────────┘
|
|
128
60
|
```
|
|
129
61
|
|
|
130
|
-
|
|
131
|
-
**DO NOT ask "Proceed with fixes?" or similar — just fix and continue.**
|
|
132
|
-
|
|
133
|
-
## Prime Directive: Reject Until Proven Correct
|
|
134
|
-
|
|
135
|
-
**You are the PROSECUTOR, not the defense attorney.**
|
|
136
|
-
|
|
137
|
-
| Trap | Reality Check |
|
|
138
|
-
|------|---------------|
|
|
139
|
-
| "Seems fine" | You failed to find the bug |
|
|
140
|
-
| "Makes sense" | You're rationalizing, not reviewing |
|
|
141
|
-
| "Edge case is unlikely" | Edge cases ARE bugs |
|
|
142
|
-
| "Comment explains it" | Comments don't fix code |
|
|
143
|
-
| "Assessed as acceptable" | "Assessed" ≠ "Fixed" |
|
|
144
|
-
|
|
145
|
-
## Role Separation (CRITICAL)
|
|
146
|
-
|
|
147
|
-
**You play TWO distinct roles that cycle AUTOMATICALLY:**
|
|
148
|
-
|
|
149
|
-
| Role | Allowed Actions | Forbidden |
|
|
150
|
-
|------|-----------------|-----------|
|
|
151
|
-
| **REVIEWER** | Find issues (full scope), declare quality_met | Write code, rationalize issues |
|
|
152
|
-
| **FIXER** | Implement fixes only | Declare quality_met, dismiss issues |
|
|
153
|
-
|
|
154
|
-
**Role Transition Markers (REQUIRED):**
|
|
62
|
+
## Strategy: HYBRID (MEDIUM scope)
|
|
155
63
|
|
|
156
64
|
```
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
|
|
162
|
-
|
|
163
|
-
|
|
65
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
66
|
+
│ HYBRID STRATEGY (for MEDIUM scope) │
|
|
67
|
+
│ ───────────────────────────────────────────────────────────│
|
|
68
|
+
│ │
|
|
69
|
+
│ Phase 0: ENUMERATE (Main Agent) │
|
|
70
|
+
│ ┌─────────────────────────────────────────────────────┐ │
|
|
71
|
+
│ │ Use grep/invar_sig to find: │ │
|
|
72
|
+
│ │ - All @pre/@post contracts │ │
|
|
73
|
+
│ │ - All @invar:allow escape hatches │ │
|
|
74
|
+
│ │ - Hardcoded strings (secrets?) │ │
|
|
75
|
+
│ │ - subprocess/exec/eval calls │ │
|
|
76
|
+
│ │ - bare except clauses │ │
|
|
77
|
+
│ │ Create issue_map with file:line for each │ │
|
|
78
|
+
│ └─────────────────────────────────────────────────────┘ │
|
|
79
|
+
│ │
|
|
80
|
+
│ Phase 1: GUIDED REVIEW (Isolated Subagent) │
|
|
81
|
+
│ ┌─────────────────────────────────────────────────────┐ │
|
|
82
|
+
│ │ Pass issue_map to subagent │ │
|
|
83
|
+
│ │ Subagent verifies each item │ │
|
|
84
|
+
│ │ Reports: "Checked N/M items from issue_map" │ │
|
|
85
|
+
│ └─────────────────────────────────────────────────────┘ │
|
|
86
|
+
│ │
|
|
87
|
+
│ Phase 2: OPEN DISCOVERY (Same Subagent) │
|
|
88
|
+
│ ┌─────────────────────────────────────────────────────┐ │
|
|
89
|
+
│ │ "Now forget the issue_map. │ │
|
|
90
|
+
│ │ Look for issues NOT in the map: │ │
|
|
91
|
+
│ │ - Variants of listed patterns │ │
|
|
92
|
+
│ │ - Logic errors │ │
|
|
93
|
+
│ │ - Edge cases" │ │
|
|
94
|
+
│ │ Reports: "Found N additional issues" │ │
|
|
95
|
+
│ └─────────────────────────────────────────────────────┘ │
|
|
96
|
+
└─────────────────────────────────────────────────────────────┘
|
|
164
97
|
```
|
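Phase 0 of the HYBRID strategy above could be approximated with plain regular expressions. The sketch below is an assumption about how an `issue_map` might be built; the function name, the entry fields, and the patterns (especially the crude secret heuristic) are hypothetical and do not reproduce what `grep` or `invar_sig` actually emit.

```python
import re

# Illustrative patterns for the items listed in Phase 0; all are assumptions.
PATTERNS = {
    "contract": re.compile(r"@(pre|post)\b"),
    "escape_hatch": re.compile(r"@invar:allow"),
    "hardcoded_secret": re.compile(r"(?i)(password|secret|token|api_key)\s*=\s*['\"]"),
    "dangerous_call": re.compile(r"\b(subprocess\.\w+|exec|eval)\s*\("),
    "bare_except": re.compile(r"^\s*except\s*:"),
}


def build_issue_map(sources: dict[str, str]) -> list[dict]:
    """Scan {path: source text} and record one entry per matching line."""
    issue_map = []
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for kind, pattern in PATTERNS.items():
                if pattern.search(line):
                    issue_map.append(
                        {"kind": kind, "location": f"{path}:{lineno}", "text": line.strip()}
                    )
    return issue_map
```

Each entry carries a `file:line` location, which is what the guided pass needs in order to report "Checked N/M items from issue_map".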
|
165
98
|
|
|
166
|
-
|
|
167
|
-
|
|
168
|
-
## Quality Gate Authority
|
|
169
|
-
|
|
170
|
-
**ONLY the Reviewer role can declare `quality_met`.**
|
|
171
|
-
|
|
172
|
-
Before declaring exit:
|
|
173
|
-
1. Re-read EVERY issue found
|
|
174
|
-
2. For each issue, verify: "Is this ACTUALLY fixed, or did I rationalize it?"
|
|
175
|
-
3. Ask: "Would I accept this excuse from someone else's code?"
|
|
176
|
-
|
|
177
|
-
**Self-Check Questions:**
|
|
178
|
-
- Did I write code AND declare quality_met? → Role confusion detected
|
|
179
|
-
- Did I say "assessed" instead of "fixed"? → Rationalization detected
|
|
180
|
-
- Did any MAJOR become a comment instead of code? → Fix failed
|
|
181
|
-
|
|
182
|
-
## Fault-Finding Persona
|
|
183
|
-
|
|
184
|
-
Assume:
|
|
185
|
-
- The code has bugs until proven otherwise
|
|
186
|
-
- The contracts may be meaningless ceremony
|
|
187
|
-
- The implementer may have rationalized poor decisions
|
|
188
|
-
- Escape hatches may be abused
|
|
189
|
-
- **Your own fixes may introduce new bugs**
|
|
190
|
-
|
|
191
|
-
You ARE here to:
|
|
192
|
-
- Find bugs, logic errors, edge cases
|
|
193
|
-
- Challenge whether contracts have semantic value
|
|
194
|
-
- Check if code matches contracts (not if code "seems right")
|
|
195
|
-
|
|
196
|
-
## Fresh Eyes Mandate (Round 2+) — ENFORCED
|
|
197
|
-
|
|
198
|
-
**For rounds after the first, you MUST adopt "fresh eyes" mindset:**
|
|
199
|
-
|
|
200
|
-
> "I am a different reviewer who has never seen this code or the previous fixes."
|
|
201
|
-
|
|
202
|
-
| Trap | Correction |
|
|
203
|
-
|------|------------|
|
|
204
|
-
| "I just fixed this" | Irrelevant. Review it like new code. |
|
|
205
|
-
| "This was fine last round" | Maybe you missed something. Check again. |
|
|
206
|
-
| "The fix looks correct" | That's FIXER thinking. Find what's WRONG. |
|
|
207
|
-
|
|
208
|
-
### Why This Exists
|
|
209
|
-
|
|
210
|
-
Round 2+ in the same context naturally drifts toward "verify my fixes" instead of
|
|
211
|
-
"find all problems". This cognitive bias causes issues to slip through:
|
|
212
|
-
- Attention focuses on recently-fixed areas
|
|
213
|
-
- Brain skips content it "remembers" reading
|
|
214
|
-
- Subconscious goal becomes "complete task" not "find bugs"
|
|
215
|
-
|
|
216
|
-
### Mandatory Actions (Round 2+)
|
|
217
|
-
|
|
218
|
-
**Before declaring quality_met, you MUST:**
|
|
219
|
-
|
|
220
|
-
1. **RE-READ all files using Read tool**
|
|
221
|
-
```
|
|
222
|
-
❌ WRONG: Rely on context memory ("I already read this")
|
|
223
|
-
✅ RIGHT: Call Read() for each file in scope, every round
|
|
224
|
-
```
|
|
225
|
-
|
|
226
|
-
2. **Systematic audit per code block** (for documentation/examples)
|
|
227
|
-
```
|
|
228
|
-
For each code block:
|
|
229
|
-
- List all symbols USED (types, functions, classes)
|
|
230
|
-
- List all IMPORTS shown
|
|
231
|
-
- Verify: every used symbol has corresponding import
|
|
232
|
-
```
|
|
233
|
-
|
|
234
|
-
3. **Section-by-section explicit check**
|
|
235
|
-
```
|
|
236
|
-
□ Section 1 checked
|
|
237
|
-
□ Section 2 checked
|
|
238
|
-
□ Section 3 checked
|
|
239
|
-
... (every section, not "looks fine overall")
|
|
240
|
-
```
|
|
241
|
-
|
|
242
|
-
4. **Verbalize findings before exit**
|
|
243
|
-
```
|
|
244
|
-
❌ WRONG: "Verified fixes, looks good"
|
|
245
|
-
✅ RIGHT: "Re-read 5 files, checked 23 sections, found 0 new issues"
|
|
246
|
-
```
|
|
247
|
-
|
|
248
|
-
### Round 2+ Workflow Diagram
|
|
99
|
+
## Strategy: CHUNKED (LARGE scope)
|
|
249
100
|
|
|
250
101
|
```
|
|
251
|
-
|
|
252
|
-
|
|
253
|
-
|
|
254
|
-
│
|
|
255
|
-
│
|
|
256
|
-
│
|
|
257
|
-
│
|
|
258
|
-
│
|
|
259
|
-
│
|
|
260
|
-
│
|
|
261
|
-
│
|
|
262
|
-
│
|
|
263
|
-
│
|
|
264
|
-
│
|
|
265
|
-
│
|
|
266
|
-
│
|
|
267
|
-
│
|
|
268
|
-
|
|
102
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
103
|
+
│ CHUNKED STRATEGY (for LARGE scope) │
|
|
104
|
+
│ ───────────────────────────────────────────────────────────│
|
|
105
|
+
│ │
|
|
106
|
+
│ 1. Split files into chunks of ~3-5 files each │
|
|
107
|
+
│ │
|
|
108
|
+
│ 2. For each chunk (can be parallel): │
|
|
109
|
+
│ - Spawn isolated subagent │
|
|
110
|
+
│ - Use HYBRID strategy within chunk │
|
|
111
|
+
│ │
|
|
112
|
+
│ 3. Cross-chunk analysis: │
|
|
113
|
+
│ - Check cross-file dependencies │
|
|
114
|
+
│ - Check API consistency │
|
|
115
|
+
│ │
|
|
116
|
+
│ 4. Merge all findings, deduplicate │
|
|
117
|
+
│ │
|
|
118
|
+
│ Why: Prevents "attention fatigue" on file 8+ of 15. │
|
|
119
|
+
│ Each chunk gets fresh attention. │
|
|
120
|
+
└─────────────────────────────────────────────────────────────┘
|
|
269
121
|
```
|
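Steps 1 and 4 of the CHUNKED strategy above (split, then merge and deduplicate) can be sketched as follows. The helper names, the default chunk size of 4, and the `location`/`issue` keys on a finding are assumptions made for illustration; the skill text only says "~3-5 files each".

```python
def chunk_files(files: list[str], chunk_size: int = 4) -> list[list[str]]:
    """Split the scope into consecutive chunks; the last chunk may be smaller."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]


def merge_findings(per_chunk: list[list[dict]]) -> list[dict]:
    """Flatten findings from all chunks, dropping duplicates but keeping order."""
    seen = set()
    merged = []
    for findings in per_chunk:
        for finding in findings:
            # Assumption: two findings are duplicates if location and issue text match.
            key = (finding["location"], finding["issue"])
            if key not in seen:
                seen.add(key)
                merged.append(finding)
    return merged
```

The cross-chunk analysis in step 3 is not covered here, since it depends on how dependencies between files are discovered.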
|
270
122
|
|
|
271
|
-
|
|
272
|
-
1. Re-run the ENTIRE checklist (A through G)
|
|
273
|
-
2. Review ALL files in scope, not just recent fixes
|
|
274
|
-
3. Check if fixes introduced NEW issues
|
|
275
|
-
4. Look for issues you missed in previous rounds
|
|
276
|
-
|
|
277
|
-
## Entry Actions
|
|
278
|
-
|
|
279
|
-
### Context Refresh (DX-54)
|
|
280
|
-
|
|
281
|
-
Before any workflow action:
|
|
282
|
-
1. Read `.invar/context.md` (especially Key Rules section)
|
|
283
|
-
2. Display routing announcement
|
|
284
|
-
|
|
285
|
-
### Routing Announcement
|
|
286
|
-
|
|
287
|
-
```
|
|
288
|
-
📍 Routing: /review — [trigger, e.g. "review_suggested", "user requested review"]
|
|
289
|
-
Task: [review scope summary]
|
|
290
|
-
```
|
|
291
|
-
|
|
292
|
-
## Mode Selection
|
|
293
|
-
|
|
294
|
-
### Step 1: Check Self-Review (MANDATORY)
|
|
295
|
-
|
|
296
|
-
```python
|
|
297
|
-
# Pseudo-code for self-review detection
|
|
298
|
-
files_in_scope = get_review_scope()
|
|
299
|
-
files_edited_this_session = get_agent_edits()
|
|
300
|
-
|
|
301
|
-
if files_in_scope & files_edited_this_session:
|
|
302
|
-
# SELF-REVIEW DETECTED
|
|
303
|
-
if user_said("--deep") or user_said("deep review"):
|
|
304
|
-
mode = ISOLATED
|
|
305
|
-
elif user_said("quick") or user_said("continue"):
|
|
306
|
-
mode = SAME_CONTEXT
|
|
307
|
-
add_warning_to_report = True # "Self-review without isolation"
|
|
308
|
-
else:
|
|
309
|
-
# Default: recommend --deep, wait for user choice
|
|
310
|
-
show_self_review_warning()
|
|
311
|
-
mode = ISOLATED # Default to safe option
|
|
312
|
-
```
|
|
123
|
+
---
|
|
313
124
|
|
|
314
|
-
|
|
125
|
+
## 2-Step Loop (MANDATORY workflow)
|
|
315
126
|
|
|
316
|
-
Look for `review_suggested` warning:
|
|
317
127
|
```
|
|
318
|
-
|
|
319
|
-
|
|
320
|
-
|
|
128
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
129
|
+
│ Round N: │
|
|
130
|
+
│ │
|
|
131
|
+
│ 1. REVIEWER [Subagent] ─────────────────────────────────── │
|
|
132
|
+
│ • Spawn NEW isolated agent (Task tool) │
|
|
133
|
+
│ • Use strategy based on scope classification │
|
|
134
|
+
│ • Review ALL files in scope (full checklist A-G) │
|
|
135
|
+
│ • Return: issues[] or APPROVED │
|
|
136
|
+
│ │
|
|
137
|
+
│ 2. FIXER [Main Agent] ──────────────────────────────────── │
|
|
138
|
+
│ • Fix CRITICAL/MAJOR issues with CODE │
|
|
139
|
+
│ • Run invar_guard() │
|
|
140
|
+
│ • Cannot declare quality_met │
|
|
141
|
+
│ │
|
|
142
|
+
│ → Loop until: APPROVED OR max_rounds OR no_progress │
|
|
143
|
+
└─────────────────────────────────────────────────────────────┘
|
|
321
144
|
```
|
|
322
145
|
|
|
323
|
-
|
|
146
|
+
**Why new subagent each round?**
|
|
147
|
+
- Main agent has context contamination from fixing
|
|
148
|
+
- "Fresh eyes" impossible in same context
|
|
149
|
+
- Round 2+ drifts to "verify my fixes" not "find problems"
|
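The loop above, together with the `MAX_ROUNDS = 5` rule and the three exit reasons used later in the file, can be sketched as a small driver. This is an illustration under stated assumptions: `review` stands in for spawning a new isolated subagent and returns the CRITICAL/MAJOR issues it found (an empty list meaning APPROVED), and `fix` stands in for the main agent applying fixes and running the guard. Neither callable is a real package API.

```python
from typing import Callable

MAX_ROUNDS = 5  # taken from the mandatory rules


def run_review_loop(
    review: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
) -> tuple[str, int]:
    """Alternate REVIEWER and FIXER until one exit condition holds."""
    previous: set[str] | None = None
    for round_no in range(1, MAX_ROUNDS + 1):
        issues = review()  # hypothetical: a fresh subagent reviews the FULL scope
        if not issues:
            return "quality_met", round_no
        if previous is not None and set(issues) == previous:
            # Same issues in two consecutive rounds: fixing is not converging.
            return "no_improvement", round_no
        if round_no == MAX_ROUNDS:
            return "max_rounds", round_no
        fix(issues)  # the fixer never declares quality_met itself
        previous = set(issues)
    return "max_rounds", MAX_ROUNDS
```

Only the reviewer's result can end the loop with `quality_met`, which mirrors the rule that the fixer cannot declare it.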
|
324
150
|
|
|
325
|
-
|
|
326
|
-
|-----------|------|-------|
|
|
327
|
-
| Self-review detected | **Isolated** (default) | Unless user explicitly accepts risk |
|
|
328
|
-
| `review_suggested` present | **Isolated** | Guard recommends isolation |
|
|
329
|
-
| `--deep` flag | **Isolated** | User requested |
|
|
330
|
-
| Others' code, no triggers | **Quick** (same context) | Only valid for non-self code |
|
|
151
|
+
---
|
|
331
152
|
|
|
332
|
-
## Review Checklist
|
|
153
|
+
## Review Checklist (apply to ALL files)
|
|
333
154
|
|
|
334
|
-
> **Principle:** Only items requiring semantic judgment. Mechanical checks
|
|
155
|
+
> **Principle:** Only items requiring semantic judgment. Mechanical checks handled by Guard.
|
|
335
156
|
|
|
336
157
|
### A. Contract Semantic Value
|
|
337
158
|
|
|
@@ -341,273 +162,164 @@ WARNING: review_suggested - Low contract coverage
|
|
|
341
162
|
- [ ] Does @post verify meaningful output properties?
|
|
342
163
|
- Bad: `@post(lambda result: result is not None)`
|
|
343
164
|
- Good: `@post(lambda result: len(result) == len(input))`
|
|
344
|
-
|
|
345
165
|
- [ ] Could someone implement correctly from contracts alone?
|
|
346
166
|
- [ ] Are boundary conditions explicit in contracts?
|
|
347
167
|
|
|
348
168
|
### B. Doctest Coverage
|
|
349
|
-
|
|
350
|
-
- [ ] Do doctests cover boundary cases?
|
|
351
|
-
- [ ] Do doctests cover error cases?
|
|
169
|
+
|
|
170
|
+
- [ ] Do doctests cover normal, boundary, and error cases?
|
|
352
171
|
- [ ] Are doctests testing behavior, not just syntax?
|
|
353
172
|
|
|
354
173
|
### C. Code Quality
|
|
174
|
+
|
|
355
175
|
- [ ] Is duplicated code worth extracting?
|
|
356
176
|
- [ ] Is naming consistent and clear?
|
|
357
177
|
- [ ] Is complexity justified?
|
|
358
178
|
|
|
359
179
|
### D. Escape Hatch Audit
|
|
180
|
+
|
|
360
181
|
- [ ] Is each @invar:allow justification valid?
|
|
361
182
|
- [ ] Could refactoring eliminate the need?
|
|
362
|
-
- [ ] Is there a pattern suggesting systematic issues?
|
|
363
183
|
|
|
364
184
|
### E. Logic Verification
|
|
185
|
+
|
|
365
186
|
- [ ] Do contracts correctly capture intended behavior?
|
|
366
187
|
- [ ] Are there paths that bypass contract checks?
|
|
367
188
|
- [ ] Are there implicit assumptions not in contracts?
|
|
368
|
-
- [ ] Is there dead code or unreachable branches?
|
|
369
189
|
|
|
370
190
|
### F. Security
|
|
191
|
+
|
|
371
192
|
- [ ] Are inputs validated against security threats (injection, XSS)?
|
|
372
193
|
- [ ] No hardcoded secrets (API keys, passwords, tokens)?
|
|
373
194
|
- [ ] Are authentication/authorization checks correct?
|
|
374
|
-
- [ ] Is sensitive data properly protected?
|
|
375
195
|
|
|
376
|
-
### G. Error Handling
|
|
196
|
+
### G. Error Handling
|
|
197
|
+
|
|
377
198
|
- [ ] Are exceptions caught at appropriate level?
|
|
378
199
|
- [ ] Are error messages clear without leaking sensitive info?
|
|
379
|
-
- [ ] Are critical operations logged for debugging?
|
|
380
200
|
- [ ] Is there graceful degradation on failure?
|
|
381
201
|
|
|
382
|
-
|
|
383
|
-
|
|
384
|
-
These are checked by Guard or linters - don't duplicate:
|
|
385
|
-
- Core/Shell separation → Guard (forbidden_import, impure_call)
|
|
386
|
-
- Shell returns Result[T,E] → Guard (shell_result)
|
|
387
|
-
- Missing contracts → Guard (missing_contract)
|
|
388
|
-
- File/function size limits → Guard (file_size, function_size)
|
|
389
|
-
- Entry point thickness → Guard (entry_point_too_thick)
|
|
390
|
-
- Escape hatch count → Guard (review_suggested)
|
|
202
|
+
---
|
|
391
203
|
|
|
392
|
-
##
|
|
204
|
+
## Subagent Prompt Templates
|
|
393
205
|
|
|
394
|
-
|
|
206
|
+
### THOROUGH (SMALL scope)
|
|
395
207
|
|
|
396
|
-
|
|
208
|
+
```
|
|
209
|
+
You are an independent Adversarial Code Reviewer.
|
|
397
210
|
|
|
398
|
-
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
|
|
211
|
+
RULES:
|
|
212
|
+
1. Code is GUILTY until proven INNOCENT
|
|
213
|
+
2. You did NOT write this code — no emotional attachment
|
|
214
|
+
3. Find reasons to REJECT, not accept
|
|
215
|
+
4. Be specific: file:line + concrete fix
|
|
402
216
|
|
|
403
|
-
|
|
217
|
+
STRATEGY: THOROUGH READING
|
|
218
|
+
- Read each file COMPLETELY, line by line
|
|
219
|
+
- DO NOT pre-scan for patterns — just READ
|
|
220
|
+
- Look for VARIANTS and EDGE CASES
|
|
221
|
+
- Trust your judgment
|
|
404
222
|
|
|
405
|
-
|
|
406
|
-
┌─────────────────────────────────────────────────────────────────┐
|
|
407
|
-
│ START: round = 1, issues = [] │
|
|
408
|
-
│ │
|
|
409
|
-
│ ┌─────────────────────────────────────────────────────────┐ │
|
|
410
|
-
│ │ 🔍 REVIEWER [Round N] — Full Scope Review │ │
|
|
411
|
-
│ │ 1. Apply FULL checklist (A-G) to ENTIRE scope │ │
|
|
412
|
-
│ │ 2. Find ALL issues (don't stop at first) │ │
|
|
413
|
-
│ │ 3. Classify: CRITICAL / MAJOR / MINOR │ │
|
|
414
|
-
│ │ 4. Check previous fixes: CODE or just COMMENT? │ │
|
|
415
|
-
│ │ 5. Check if fixes introduced NEW issues │ │
|
|
416
|
-
│ │ 6. Update issues table │ │
|
|
417
|
-
│ │ │ │
|
|
418
|
-
│ │ EXIT CHECK: │ │
|
|
419
|
-
│ │ - IF no CRITICAL/MAJOR found → quality_met, EXIT │ │
|
|
420
|
-
│ │ - IF round >= MAX_ROUNDS → max_rounds, EXIT │ │
|
|
421
|
-
│ │ - IF no progress (same issues 2 rounds) → EXIT │ │
|
|
422
|
-
│ │ - ELSE → AUTO-TRANSITION to FIXER │ │
|
|
423
|
-
│ └─────────────────────────────────────────────────────────┘ │
|
|
424
|
-
│ ↓ (automatic) │
|
|
425
|
-
│ ┌─────────────────────────────────────────────────────────┐ │
|
|
426
|
-
│ │ 🔧 FIXER [Round N] │ │
|
|
427
|
-
│ │ 1. Fix EACH CRITICAL/MAJOR issue with CODE │ │
|
|
428
|
-
│ │ 2. Run invar_guard() after fixes │ │
|
|
429
|
-
│ │ 3. NO declaring quality_met (forbidden) │ │
|
|
430
|
-
│ │ 4. round++ │ │
|
|
431
|
-
│ │ 5. AUTO-TRANSITION to REVIEWER [Round N+1] │ │
|
|
432
|
-
│ └─────────────────────────────────────────────────────────┘ │
|
|
433
|
-
│ ↓ (automatic, fresh eyes) │
|
|
434
|
-
│ [LOOP BACK TO REVIEWER] │
|
|
435
|
-
│ │
|
|
436
|
-
│ EXIT: Generate final report │
|
|
437
|
-
└─────────────────────────────────────────────────────────────────┘
|
|
438
|
-
```
|
|
223
|
+
SCOPE: [list all files]
|
|
439
224
|
|
|
440
|
-
|
|
441
|
-
full-scope audit with the same rigor as Round 1. This prevents the "verification
|
|
442
|
-
mindset" trap where standards unconsciously lower after fixing.
|
|
225
|
+
Apply checklist A-G to each file.
|
|
443
226
|
|
|
444
|
-
|
|
227
|
+
OUTPUT FORMAT:
|
|
228
|
+
## Verdict: APPROVED | NEEDS WORK | REJECTED
|
|
229
|
+
## Critical Issues (must fix)
|
|
230
|
+
| ID | File:Line | Issue | Fix |
|
|
231
|
+
## Major Issues (should fix)
|
|
232
|
+
| ID | File:Line | Issue | Fix |
|
|
233
|
+
## Minor Issues (backlog)
|
|
234
|
+
| ID | File:Line | Issue | Fix |
|
|
235
|
+
```
|
|
445
236
|
|
|
446
|
-
|
|
237
|
+
### HYBRID (MEDIUM scope)
|
|
447
238
|
|
|
448
|
-
```markdown
|
|
449
|
-
## Review State
|
|
450
|
-
- **Round:** N / MAX_ROUNDS
|
|
451
|
-
- **Role:** REVIEWER | FIXER
|
|
452
|
-
- **Issues Found:** [count]
|
|
453
|
-
- **Issues Fixed:** [count]
|
|
454
|
-
- **Guard Status:** PASS | FAIL
|
|
455
239
|
```
|
|
240
|
+
You are an independent Adversarial Code Reviewer.
|
|
241
|
+
|
|
242
|
+
RULES:
|
|
243
|
+
1. Code is GUILTY until proven INNOCENT
|
|
244
|
+
2. You did NOT write this code — no emotional attachment
|
|
245
|
+
3. Find reasons to REJECT, not accept
|
|
246
|
+
4. Be specific: file:line + concrete fix
|
|
456
247
|
|
|
457
|
-
|
|
248
|
+
STRATEGY: HYBRID (two passes)
|
|
458
249
|
|
|
459
|
-
|
|
460
|
-
|
|
461
|
-
|
|
462
|
-
| MAJOR-2 | MAJOR | 1 | - | ❌ Unfixed | Fix was comment, not code |
|
|
463
|
-
| MAJOR-3 | MAJOR | 2 | - | 🆕 New | Found in Round 2 review |
|
|
464
|
-
| MINOR-1 | MINOR | 1 | - | ⏭️ Backlog | Deferred (non-blocking) |
|
|
250
|
+
PASS 1 - GUIDED:
|
|
251
|
+
Using this issue_map, verify each potential issue:
|
|
252
|
+
[issue_map from Phase 0]
|
|
465
253
|
|
|
466
|
-
|
|
467
|
-
- ✅ Fixed — Actually fixed with CODE (not comments)
|
|
468
|
-
- ❌ Unfixed — Fix failed, was just a comment, or not addressed
|
|
469
|
-
- 🆕 New — Found in a later round (fix may have introduced it, or missed earlier)
|
|
470
|
-
- ⏭️ Backlog — MINOR, deferred to later (non-blocking)
|
|
254
|
+
Report: "Verified X/Y items from issue_map"
|
|
471
255
|
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
256
|
+
PASS 2 - OPEN DISCOVERY:
|
|
257
|
+
Now FORGET the issue_map. Read the code fresh.
|
|
258
|
+
Look for issues NOT in the map:
|
|
259
|
+
- Variants of listed patterns
|
|
260
|
+
- Logic errors
|
|
261
|
+
- Edge cases
|
|
476
262
|
|
|
477
|
-
|
|
263
|
+
Report: "Found N additional issues not in issue_map"
|
|
478
264
|
|
|
479
|
-
|
|
265
|
+
SCOPE: [list all files]
|
|
480
266
|
|
|
481
|
-
|
|
482
|
-
|
|
483
|
-
|
|
484
|
-
|
|
|
485
|
-
|
|
267
|
+
OUTPUT FORMAT:
|
|
268
|
+
## Verdict: APPROVED | NEEDS WORK | REJECTED
|
|
269
|
+
## From Issue Map (Pass 1)
|
|
270
|
+
| ID | File:Line | Issue | Fix |
|
|
271
|
+
## Additional Findings (Pass 2)
|
|
272
|
+
| ID | File:Line | Issue | Fix |
|
|
273
|
+
```
|
|
486
274
|
|
|
487
|
-
|
|
275
|
+
---
|
|
488
276
|
|
|
489
|
-
|
|
277
|
+
## Exit Conditions
|
|
490
278
|
|
|
491
279
|
| Condition | Exit Reason | Result |
|
|
492
280
|
|-----------|-------------|--------|
|
|
493
|
-
|
|
|
494
|
-
|
|
|
495
|
-
|
|
|
496
|
-
|
|
497
|
-
**quality_met requires ALL of:**
|
|
498
|
-
1. Current round's FULL SCOPE review found zero CRITICAL/MAJOR
|
|
499
|
-
2. All previous issues verified as fixed (with code, not comments)
|
|
500
|
-
3. Guard passes
|
|
501
|
-
4. Issues table complete with evidence
|
|
502
|
-
|
|
503
|
-
**Automatic quality_not_met:**
|
|
504
|
-
- Any MAJOR "fixed" with comment instead of code
|
|
505
|
-
- Any issue marked "assessed" or "acceptable"
|
|
506
|
-
- Fixer role declared quality_met (role violation)
|
|
507
|
-
- Same CRITICAL/MAJOR persists for 2+ rounds
|
|
281
|
+
| Subagent returns APPROVED | `quality_met` | Ready for merge |
|
|
282
|
+
| round >= 5 | `max_rounds` | Manual review needed |
|
|
283
|
+
| Same issues 2 rounds | `no_improvement` | Architectural issue |
|
|
508
284
|
|
|
509
|
-
|
|
510
|
-
not when fixes are applied. This ensures the final state is actually reviewed.
|
|
285
|
+
---
|
|
511
286
|
|
|
512
|
-
## Exit Report
|
|
287
|
+
## Exit Report
|
|
513
288
|
|
|
514
|
-
```
|
|
289
|
+
```
|
|
515
290
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
516
291
|
📋 REVIEW COMPLETE
|
|
517
292
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
518
293
|
|
|
519
|
-
**
|
|
520
|
-
**
|
|
521
|
-
**
|
|
522
|
-
**
|
|
523
|
-
**
|
|
294
|
+
**Scope:** SMALL | MEDIUM | LARGE
|
|
295
|
+
**Strategy:** THOROUGH | HYBRID | CHUNKED
|
|
296
|
+
**Exit:** quality_met | max_rounds | no_improvement
|
|
297
|
+
**Rounds:** N / 5
|
|
298
|
+
**Guard:** PASS | FAIL
|
|
524
299
|
|
|
525
300
|
## Issues Table
|
|
526
|
-
|
|
527
|
-
| Issue | Severity | Found | Fixed | Status | Evidence |
|
|
528
|
-
|-------|----------|-------|-------|--------|----------|
|
|
529
|
-
| MAJOR-1 | MAJOR | R1 | R1 | ✅ Fixed | Code at file.py:123 |
|
|
530
|
-
| MAJOR-2 | MAJOR | R2 | R2 | ✅ Fixed | Added validation |
|
|
531
|
-
| ... | ... | ... | ... | ... | ... |
|
|
301
|
+
| Issue | Severity | Round | Status | Evidence |
|
|
532
302
|
|
|
533
303
|
## Round Summary
|
|
304
|
+
| Round | Found | Fixed |
|
|
534
305
|
|
|
535
|
-
|
|
536
|
-
|-------|--------------|--------------|----------------|
|
|
537
|
-
| 1 | 3 | 3 | 0 |
|
|
538
|
-
| 2 | 1 | 1 | 0 |
|
|
539
|
-
| 3 | 0 | - | - | ← quality_met
|
|
540
|
-
|
|
541
|
-
## Self-Check (Final Review Round)
|
|
542
|
-
|
|
543
|
-
- [x] Applied FULL checklist (A-G) with fresh eyes
|
|
544
|
-
- [x] All fixes are CODE, not comments
|
|
545
|
-
- [x] No "assessed as acceptable" rationalizations
|
|
546
|
-
- [x] Guard passes after all changes
|
|
547
|
-
- [x] Role separation maintained throughout
|
|
548
|
-
|
|
549
|
-
## Self-Review Warning (if applicable)
|
|
550
|
-
|
|
551
|
-
⚠️ **This was a same-context self-review.** Cognitive biases may have caused
|
|
552
|
-
issues to be missed. For higher confidence, run `--deep` review before merge.
|
|
553
|
-
|
|
554
|
-
Known blind spots in self-review:
|
|
555
|
-
- Exception handlers that silently lose data
|
|
556
|
-
- Path traversal / security issues in user input
|
|
557
|
-
- Edge cases in validation logic
|
|
558
|
-
- Documentation-implementation mismatches
|
|
559
|
-
|
|
560
|
-
## Recommendation
|
|
561
|
-
|
|
562
|
-
- [x] Ready for merge (quality_met)
|
|
563
|
-
- [ ] Needs manual review (max_rounds)
|
|
564
|
-
- [ ] Architectural refactor needed (no_improvement)
|
|
565
|
-
|
|
566
|
-
**MINOR (Backlog):**
|
|
567
|
-
- [list deferred items]
|
|
306
|
+
✓ Final: guard PASS | X errors, Y warnings
|
|
568
307
|
```
|
|
569
|
-
## Appendix: Adversarial Code Reviewer Persona
|
|
570
308
|
|
|
571
|
-
|
|
572
|
-
|
|
573
|
-
```
|
|
574
|
-
You are an independent Adversarial Code Reviewer.
|
|
309
|
+
---
|
|
575
310
|
|
|
576
|
-
|
|
577
|
-
1. Code is GUILTY until proven INNOCENT
|
|
578
|
-
2. You did NOT write this code — no emotional attachment
|
|
579
|
-
3. Find reasons to REJECT, not accept
|
|
580
|
-
4. Be specific and actionable (file:line, concrete fix)
|
|
581
|
-
5. Your job is to find bugs, not approve code
|
|
311
|
+
## Scope Boundaries
|
|
582
312
|
|
|
583
|
-
|
|
584
|
-
|
|
585
|
-
- Contracts (if available)
|
|
586
|
-
- Test files (if available)
|
|
313
|
+
**IS for:** Finding bugs, verifying contracts, security review
|
|
314
|
+
**NOT for:** New features → /develop | Understanding → /investigate
|
|
587
315
|
|
|
588
|
-
|
|
589
|
-
- Development conversation history
|
|
590
|
-
- Developer's explanations
|
|
591
|
-
- Prior context about design decisions
|
|
316
|
+
## Excluded (Covered by Guard)
|
|
592
317
|
|
|
593
|
-
|
|
594
|
-
|
|
595
|
-
|
|
596
|
-
|
|
597
|
-
3. Major issues (should fix)
|
|
598
|
-
4. Minor issues (nice to fix)
|
|
599
|
-
5. Positive observations (what's done well)
|
|
600
|
-
```
|
|
318
|
+
Don't duplicate mechanical checks:
|
|
319
|
+
- Core/Shell separation → Guard
|
|
320
|
+
- Missing contracts → Guard
|
|
321
|
+
- File/function size → Guard
|
|
601
322
|
|
|
602
323
|
<!--/invar:skill--><!--invar:extensions-->
|
|
603
|
-
<!--
|
|
604
|
-
EXTENSIONS REGION - USER EDITABLE
|
|
605
|
-
Add project-specific extensions here. This section is preserved on update.
|
|
606
|
-
|
|
607
|
-
Examples of what to add:
|
|
608
|
-
- Project-specific security review checklists
|
|
609
|
-
- Custom severity definitions
|
|
610
|
-
- Domain-specific code patterns to check
|
|
611
|
-
- Team code review standards
|
|
612
|
-
======================================================================== -->
|
|
324
|
+
<!-- User extensions preserved on update -->
|
|
613
325
|
<!--/invar:extensions-->
|