cc-reviewer 6.0.0 → 6.2.0
- package/commands/multi-consult.md +29 -0
- package/commands/multi-review.md +33 -8
- package/dist/handoff.d.ts +7 -2
- package/dist/handoff.js +44 -27
- package/dist/schema.d.ts +9 -0
- package/dist/schema.js +42 -7
- package/package.json +1 -1
@@ -34,6 +34,35 @@ If the question references the codebase, populate `relevantFiles` with the minim
 
 If the current working directory is `/etc`, `~`, `~/.ssh`, or any other clearly sensitive system path, **refuse**. Tell the user: "Please invoke `/multi-consult` from a project root — `<cwd>` looks sensitive." Do not call the tool.
 
+### 4. Extract criteria; clarify load-bearing assumptions BEFORE calling
+
+Pin what the question is being judged against. Once criteria are explicit, the panel's recommendation is anchored to them instead of floating — this is the fix for "ask twice, get a different answer." Stochastic re-runs converge much better against fixed criteria than against an under-specified question.
+
+**4a. Append a CRITERIA block to the end of `question`**, priority-ordered, each tagged `[stated]` or `[assumed]`:
+
+```
+CRITERIA (priority order):
+1. [stated] cost-per-request under $X / 1M ops
+2. [stated] team writes Go; minimize ops complexity
+3. [assumed] sustained ~10k QPS write rate
+4. [assumed] eventual consistency acceptable for analytics
+```
+
+- `[stated]` = explicit in the user's message or earlier conversation.
+- `[assumed]` = you needed to fix it to recommend; the user did NOT say.
+- Cap `[assumed]` at 3. If the top 3 don't fit, the question is too vague — bounce back to the user before calling.
+
+**4b. Pre-call clarification gate.** Scan your `[assumed]` criteria. If any is **load-bearing** (the recommendation would flip if the assumption is wrong), STOP and ask the user before invoking the tool:
+
+> "Before I consult the panel, I need to confirm: <restate assumption>. Is that right, or should I adjust to <plausible alternative>?"
+
+A burned panel call on a wrong assumed criterion costs more than the round-trip.
+
+**Skip the gate when:**
+- `[stated]` criteria fully pin the answer space (no assumptions needed).
+- The user told you to proceed without clarification.
+- Remaining assumptions are clearly incidental (would not flip the rec).
+
 ## Tool Invocation
 
 Call `multi_consult` with:
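Reviewer note: the step 4 rules added above (tagging, the cap of 3 on `[assumed]`, and the pre-call gate) can be read as a small decision procedure. The sketch below is one possible reading, not code from cc-reviewer. The names `Criterion`, `renderCriteria`, `preCallGate`, and the `loadBearing` flag are all hypothetical, and checking the cap before the "user said proceed" skip is an assumption about ordering that the text does not spell out.

```typescript
// Hypothetical types and helpers; none of these names come from cc-reviewer.
type Tag = "stated" | "assumed";

interface Criterion {
  tag: Tag;
  text: string;
  // True when the recommendation would flip if this assumption were wrong.
  loadBearing?: boolean;
}

const MAX_ASSUMED = 3; // cap from step 4a

// Render the priority-ordered CRITERIA block that step 4a appends to `question`.
function renderCriteria(criteria: Criterion[]): string {
  const lines = criteria.map((c, i) => `${i + 1}. [${c.tag}] ${c.text}`);
  return ["CRITERIA (priority order):", ...lines].join("\n");
}

type GateResult =
  | { action: "call" }
  | { action: "bounce"; reason: string }
  | { action: "ask"; assumptions: string[] };

// Step 4b: call the panel, bounce a vague question, or ask the user first.
function preCallGate(criteria: Criterion[], userSaidProceed = false): GateResult {
  const assumed = criteria.filter((c) => c.tag === "assumed");
  // Assumption: the cap applies even when the user asked to skip clarification.
  if (assumed.length > MAX_ASSUMED) {
    return { action: "bounce", reason: "more than 3 assumptions; question is too vague" };
  }
  if (userSaidProceed) return { action: "call" };
  const loadBearing = assumed.filter((c) => c.loadBearing === true).map((c) => c.text);
  return loadBearing.length > 0
    ? { action: "ask", assumptions: loadBearing }
    : { action: "call" };
}
```

With the four criteria from the example block, marking only the QPS assumption as load-bearing makes the gate return `ask` for that one item. If every assumption is incidental, the gate returns `call`.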
package/commands/multi-review.md
CHANGED
@@ -23,17 +23,27 @@ Use `/multi-review` when you want thorough parallel reviews from all available m
 
 ## Before Calling - PREPARE THE HANDOFF
 
-### 1. Summarize What You Did
+### 1. Summarize What You Did + State the Acceptance Bar
+
+Don't just say what you did — also state the bar the work needs to clear. The bar is what lets reviewers calibrate "material" vs "nice to have." Without it, reviewers default to general code-quality vibes, which produces drift across runs.
+
 ```
-"Implemented caching layer for the product catalog API using Redis.
-
+"Implemented caching layer for the product catalog API using Redis with cache invalidation on product updates.
+Bar: safe under concurrent updates (no stale reads on the next request) AND p95 read latency under 50ms."
 ```
 
-### 2. List Your Uncertainties
+### 2. List Your Uncertainties — Tag Load-Bearing vs Incidental
+
+Tag each uncertainty:
+- `[load-bearing]` = if your assumption here is wrong, the work is NOT shipping-ready
+- `[incidental]` = nice to verify but won't block ship
+
+Reviewers prioritize accordingly, and your synthesis can elevate `[load-bearing]` items above stylistic findings.
+
 ```
 UNCERTAINTIES:
-- "Is the cache
-- "
+- [load-bearing] "Is the cache invalidation race-free under concurrent updates?"
+- [incidental] "Is the TTL value optimal — could it be 60s instead of 30s?"
 ```
 
 ### 3. Ask Specific Questions
@@ -43,6 +53,16 @@ QUESTIONS:
 - "Is there a race condition in the invalidation logic?"
 ```
 
+### 4. Identify Decisions You Made
+
+If you chose between alternatives — caching strategy, retry policy, error-handling shape, schema design, etc. — list them with rationale. The handoff schema's `decisions[]` field gives the adversarial reviewer a concrete hook to attack the design choice rather than just hunt for bugs. Skip if the change is a straightforward bug fix with no design choice involved.
+
+```
+DECISIONS:
+1. Chose write-through cache over write-behind. Rationale: stronger read-after-write consistency at the cost of slightly slower writes; we prioritize correctness for catalog data.
+2. Chose 30s TTL with explicit invalidation on update. Rationale: TTL bounds staleness if invalidation misses; explicit invalidation catches the common path immediately.
+```
+
 ## Tool Invocation
 
 Call `multi_review` with:
@@ -67,14 +87,19 @@ Call `multi_review` with:
 ```
 SUMMARY:
 <what you did, 1-3 sentences>
+Bar: <what counts as shipping-ready — concrete acceptance criteria>
 
 UNCERTAINTIES (verify these):
-1. <uncertainty>
-2. <uncertainty>
+1. [load-bearing|incidental] <uncertainty>
+2. [load-bearing|incidental] <uncertainty>
 
 QUESTIONS:
 1. <question>
 
+DECISIONS:
+1. <choice>. Rationale: <why this over alternatives>
+2. <choice>. Rationale: <why this over alternatives>
+
 PRIORITY FILES:
 - <file>
 ```
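Reviewer note: the updated template (Bar line, tagged uncertainties, DECISIONS section) is plain text, so it can be produced by a small formatter. The sketch below is illustrative only. The `Handoff` shape and the `renderHandoff` function are hypothetical and are not the package's handoff schema. Listing `[load-bearing]` items before `[incidental]` ones is an assumption made here for readability, and the sketch omits the DECISIONS section when there are none, following the "skip if straightforward bug fix" guidance.

```typescript
// Hypothetical shapes; the real handoff schema in cc-reviewer may differ.
interface Uncertainty {
  tag: "load-bearing" | "incidental";
  text: string;
}
interface Decision {
  choice: string;
  rationale: string;
}
interface Handoff {
  summary: string;
  bar: string;
  uncertainties: Uncertainty[];
  questions: string[];
  decisions: Decision[];
  priorityFiles: string[];
}

function numbered(items: string[]): string[] {
  return items.map((item, i) => `${i + 1}. ${item}`);
}

function renderHandoff(h: Handoff): string {
  // Assumption: put ship blockers ahead of nice-to-verify items.
  const ordered = [
    ...h.uncertainties.filter((u) => u.tag === "load-bearing"),
    ...h.uncertainties.filter((u) => u.tag === "incidental"),
  ];
  const parts: string[] = [
    "SUMMARY:",
    h.summary,
    `Bar: ${h.bar}`,
    "",
    "UNCERTAINTIES (verify these):",
    ...numbered(ordered.map((u) => `[${u.tag}] ${u.text}`)),
    "",
    "QUESTIONS:",
    ...numbered(h.questions),
    "",
  ];
  // Omit DECISIONS entirely when no design choice was involved.
  if (h.decisions.length > 0) {
    const lines = h.decisions.map((d) => `${d.choice}. Rationale: ${d.rationale}`);
    parts.push("DECISIONS:", ...numbered(lines), "");
  }
  parts.push("PRIORITY FILES:", ...h.priorityFiles.map((f) => `- ${f}`));
  return parts.join("\n");
}
```

Note that `choice` is expected without a trailing period, because the formatter adds one before `Rationale:`.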
package/dist/handoff.d.ts
CHANGED
@@ -218,8 +218,13 @@ export declare function selectRole(focusAreas?: FocusArea[]): ReviewerRole;
 export declare const ADVERSARIAL_REVIEWER: ReviewerRole;
 /**
  * Build an adversarial handoff prompt with challenge-mode stance sections.
- *
- *
+ *
+ * Block structure ported from openai/codex-plugin-cc's adversarial-review
+ * prompt: tagged XML blocks (operating_stance, attack_surface, review_method,
+ * finding_bar, calibration_rules, grounding_rules, final_check) so the prompt
+ * has stable internal structure the reviewer can lean on. CC's handoff
+ * sections (uncertainties / decisions / questions / focus / files / focus
+ * instructions) are layered on after as our differentiator.
  */
 export declare function buildAdversarialHandoffPrompt(options: PromptOptions): string;
 export interface PromptOptions {
package/dist/handoff.js
CHANGED
@@ -186,8 +186,13 @@ export const ADVERSARIAL_REVIEWER = {
 };
 /**
  * Build an adversarial handoff prompt with challenge-mode stance sections.
- *
- *
+ *
+ * Block structure ported from openai/codex-plugin-cc's adversarial-review
+ * prompt: tagged XML blocks (operating_stance, attack_surface, review_method,
+ * finding_bar, calibration_rules, grounding_rules, final_check) so the prompt
+ * has stable internal structure the reviewer can lean on. CC's handoff
+ * sections (uncertainties / decisions / questions / focus / files / focus
+ * instructions) are layered on after as our differentiator.
  */
 export function buildAdversarialHandoffPrompt(options) {
     const { handoff } = options;
@@ -195,34 +200,36 @@ export function buildAdversarialHandoffPrompt(options) {
     const sections = [];
     // SECTION 1: ROLE
     sections.push(`# ROLE: ${role.name}\n\n${role.systemPrompt}`);
-    // SECTION 2: ADVERSARIAL STANCE
-    sections.push(
-
-
-
-
-for good intent, partial fixes, or likely follow-up work.
+    // SECTION 2: ADVERSARIAL STANCE — tagged blocks form the operating contract
+    sections.push(`<operating_stance>
+Default to skepticism.
+Assume the change can fail in subtle, high-cost, or user-visible ways until the evidence says otherwise.
+Do not give credit for good intent, partial fixes, or likely follow-up work.
+If something only works on the happy path, treat that as a real weakness.
 </operating_stance>
 
 <attack_surface>
-
-
-
-
-
-
-
-
+Prioritize the kinds of failures that are expensive, dangerous, or hard to detect:
+- auth, permissions, tenant isolation, and trust boundaries
+- data loss, corruption, duplication, and irreversible state changes
+- rollback safety, retries, partial failure, and idempotency gaps
+- race conditions, ordering assumptions, stale state, and re-entrancy
+- empty-state, null, timeout, and degraded dependency behavior
+- version skew, schema drift, migration hazards, and compatibility regressions
+- observability gaps that would hide failure or make recovery harder
 </attack_surface>
 
 <review_method>
-Actively try to disprove the change.
-missing guards, unhandled failure paths
-
+Actively try to disprove the change.
+Look for violated invariants, missing guards, unhandled failure paths, and assumptions that stop being true under stress.
+Trace how bad inputs, retries, concurrent actions, or partially completed operations move through the code.
+If the user supplied a focus area, weight it heavily, but still report any other material issue you can defend.
 </review_method>
 
 <finding_bar>
-
+Report only material findings.
+Do not include style feedback, naming feedback, low-value cleanup, or speculative concerns without evidence.
+Each finding must answer:
 1. What can go wrong?
 2. Why is this code path vulnerable?
 3. What is the likely impact?
@@ -230,15 +237,25 @@ Material findings only. Each must answer:
 </finding_bar>
 
 <calibration_rules>
-Prefer one strong finding over several weak ones.
-
+Prefer one strong finding over several weak ones.
+Do not dilute serious issues with filler.
+If the change looks safe, say so directly and return no findings.
 </calibration_rules>
 
 <grounding_rules>
-Be aggressive, but stay grounded.
-
-
-
+Be aggressive, but stay grounded.
+Every finding must be defensible from the repository context or tool outputs.
+Do not invent files, lines, code paths, incidents, attack chains, or runtime behavior you cannot support.
+If a conclusion depends on an inference, state that explicitly in the finding body and keep the confidence honest.
+</grounding_rules>
+
+<final_check>
+Before finalizing, check that each finding is:
+- adversarial rather than stylistic
+- tied to a concrete code location
+- plausible under a real failure scenario
+- actionable for an engineer fixing the issue
+</final_check>`);
     // SECTION 3: TASK (same as standard)
     sections.push(`## YOUR TASK
 
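Reviewer note: the prompt above is written as one long template literal, so a missing closing tag is easy to introduce and hard to spot in review. The sketch below shows one way the seven tagged blocks could be assembled from a map and checked for balance. It is a hypothetical alternative, not how `buildAdversarialHandoffPrompt` is implemented. The block names come from the JSDoc comment in the diff; `renderBlocks` and `findBrokenBlocks` are invented for illustration.

```typescript
// Block names from the JSDoc in the diff; helper names are hypothetical.
const BLOCK_ORDER = [
  "operating_stance",
  "attack_surface",
  "review_method",
  "finding_bar",
  "calibration_rules",
  "grounding_rules",
  "final_check",
] as const;

type BlockName = (typeof BLOCK_ORDER)[number];

// Wrap each body in its tag and join the blocks with blank lines, in fixed order.
function renderBlocks(bodies: Record<BlockName, string>): string {
  return BLOCK_ORDER
    .map((name) => `<${name}>\n${bodies[name].trim()}\n</${name}>`)
    .join("\n\n");
}

// Return the blocks that are missing, unclosed, or closed before they open.
function findBrokenBlocks(prompt: string): BlockName[] {
  return BLOCK_ORDER.filter((name) => {
    const open = prompt.indexOf(`<${name}>`);
    const close = prompt.indexOf(`</${name}>`);
    return open === -1 || close === -1 || close < open;
  });
}
```

A check like `findBrokenBlocks` could run in a unit test against the built prompt. It only looks at the first occurrence of each tag, so it is a smoke test rather than a parser.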
package/dist/schema.d.ts
CHANGED
@@ -11,6 +11,15 @@ export declare const ConfidenceLevel: z.ZodEnum<["verified", "high", "medium", "
 export type ConfidenceLevel = z.infer<typeof ConfidenceLevel>;
 export declare const ConfidenceScore: z.ZodNumber;
 export type ConfidenceScore = z.infer<typeof ConfidenceScore>;
+/**
+ * Sentinel used when a reviewer omits `confidence` on a finding, agreement,
+ * or disagreement. Confidence is required by the Zod schema, but external
+ * CLIs occasionally drop the field — rather than reject the whole review,
+ * normalization fills it with this midpoint value. 0.5 reads as "the
+ * reviewer did not commit to a confidence" without skewing the result
+ * toward "confidently right" or "confidently wrong".
+ */
+export declare const DEFAULT_FINDING_CONFIDENCE = 0.5;
 export declare const CodeLocation: z.ZodObject<{
     file: z.ZodString;
     line_start: z.ZodOptional<z.ZodNumber>;
package/dist/schema.js
CHANGED
@@ -12,6 +12,15 @@ export const SeverityLevel = z.enum(['critical', 'high', 'medium', 'low', 'info'
 export const ConfidenceLevel = z.enum(['verified', 'high', 'medium', 'low', 'uncertain']);
 // Numeric confidence score (0-1)
 export const ConfidenceScore = z.number().min(0).max(1);
+/**
+ * Sentinel used when a reviewer omits `confidence` on a finding, agreement,
+ * or disagreement. Confidence is required by the Zod schema, but external
+ * CLIs occasionally drop the field — rather than reject the whole review,
+ * normalization fills it with this midpoint value. 0.5 reads as "the
+ * reviewer did not commit to a confidence" without skewing the result
+ * toward "confidently right" or "confidently wrong".
+ */
+export const DEFAULT_FINDING_CONFIDENCE = 0.5;
 // =============================================================================
 // CODE LOCATION
 // =============================================================================
@@ -40,7 +49,7 @@ export const ReviewFinding = z.object({
         'other'
     ]).describe('Primary category of the finding'),
     severity: SeverityLevel.describe('Impact severity level'),
-    confidence: ConfidenceScore.describe('Confidence in this finding (0-1)'),
+    confidence: ConfidenceScore.describe('Confidence in this finding (0-1). Required. If the reviewer omits this field, normalizeReviewOutput fills it with DEFAULT_FINDING_CONFIDENCE (0.5) so the whole review is not dropped.'),
     title: z.string().max(120).describe('Brief title summarizing the issue'),
     description: z.string().describe('Detailed explanation of the finding'),
     location: CodeLocation.optional().describe('Where in the code this applies'),
@@ -269,10 +278,28 @@ export function getReviewOutputJsonSchema() {
         }
     };
 }
+/**
+ * Fill `confidence` with the sentinel when an object-shaped item omits it.
+ * Leaves non-object items alone so downstream Zod validation still surfaces
+ * a useful error for genuinely malformed entries.
+ */
+function fillMissingConfidence(item) {
+    if (!item || typeof item !== 'object' || Array.isArray(item))
+        return item;
+    const obj = item;
+    if (typeof obj.confidence === 'number')
+        return obj;
+    return { ...obj, confidence: DEFAULT_FINDING_CONFIDENCE };
+}
 /**
  * Normalize reviewer output that deviates from the strict schema.
  * Handles common patterns from external CLIs (e.g. Gemini returning
  * agreements as strings instead of objects, missing required fields).
+ *
+ * Notably: `confidence` is required on findings/agreements/disagreements,
+ * but reviewers occasionally drop it. Rather than reject the whole review,
+ * we fill the missing field with DEFAULT_FINDING_CONFIDENCE so the rest of
+ * the review survives validation.
  */
 function normalizeReviewOutput(parsed) {
     const normalized = { ...parsed };
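Reviewer note: the edge cases of `fillMissingConfidence` are easier to see with concrete inputs. The block below restates the helper from the hunk above with TypeScript annotations (the shipped file is plain JavaScript, so the types are an addition here) and runs it over a few sample items. The sample objects are invented for illustration.

```typescript
const DEFAULT_FINDING_CONFIDENCE = 0.5;

// Typed restatement of the helper in the diff; the logic is unchanged.
function fillMissingConfidence(item: unknown): unknown {
  if (!item || typeof item !== "object" || Array.isArray(item)) return item;
  const obj = item as Record<string, unknown>;
  if (typeof obj.confidence === "number") return obj;
  return { ...obj, confidence: DEFAULT_FINDING_CONFIDENCE };
}

const samples: unknown[] = [
  { title: "missing" },                       // confidence filled with 0.5
  { title: "kept", confidence: 0.9 },         // returned unchanged (same object)
  { title: "string", confidence: "high" },    // non-number is overwritten with 0.5
  { title: "out of range", confidence: 1.7 }, // still a number, so left untouched
  "not an object",                            // passed through as-is
  null,                                       // passed through as-is
];

const filled = samples.map((s) => fillMissingConfidence(s));
console.log(JSON.stringify(filled, null, 2));
```

Two behaviors follow from the `typeof ... === 'number'` test. A non-numeric value such as `"high"` is silently replaced by the sentinel. An out-of-range number such as `1.7` is kept, so presumably the `min(0).max(1)` bound on `ConfidenceScore` rejects it during later validation.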
@@ -280,13 +307,13 @@ function normalizeReviewOutput(parsed) {
     if (!normalized.reviewer) {
         normalized.reviewer = 'external';
     }
-    // Normalize agreements: string[] -> Agreement[]
+    // Normalize agreements: string[] -> Agreement[], then fill missing confidence
     if (Array.isArray(normalized.agreements)) {
         normalized.agreements = normalized.agreements.map((a) => {
             if (typeof a === 'string') {
-                return { original_claim: a, assessment: 'correct', confidence:
+                return { original_claim: a, assessment: 'correct', confidence: DEFAULT_FINDING_CONFIDENCE };
             }
-            return a;
+            return fillMissingConfidence(a);
         });
     }
     else {
@@ -296,6 +323,14 @@ function normalizeReviewOutput(parsed) {
     normalized.disagreements = normalized.disagreements ?? [];
     normalized.alternatives = normalized.alternatives ?? [];
     normalized.findings = normalized.findings ?? [];
+    // Fill missing confidence on findings and disagreements (Zod requires it;
+    // dropping the whole review for one missing scalar is worse than a sentinel)
+    if (Array.isArray(normalized.findings)) {
+        normalized.findings = normalized.findings.map(fillMissingConfidence);
+    }
+    if (Array.isArray(normalized.disagreements)) {
+        normalized.disagreements = normalized.disagreements.map(fillMissingConfidence);
+    }
     // Normalize optional response arrays — drop non-array values
     if (normalized.uncertainty_responses !== undefined && !Array.isArray(normalized.uncertainty_responses)) {
         delete normalized.uncertainty_responses;
@@ -435,7 +470,7 @@ export function parseLegacyMarkdownOutput(markdown, reviewer) {
             output.agreements.push({
                 original_claim: content.split(':')[0] || content,
                 assessment: 'correct',
-                confidence:
+                confidence: DEFAULT_FINDING_CONFIDENCE,
             });
         }
     }
@@ -450,7 +485,7 @@ export function parseLegacyMarkdownOutput(markdown, reviewer) {
             output.disagreements.push({
                 original_claim: content.split(':')[0] || content,
                 issue: 'incorrect',
-                confidence:
+                confidence: DEFAULT_FINDING_CONFIDENCE,
                 reason: content,
             });
         }
@@ -469,7 +504,7 @@ export function parseLegacyMarkdownOutput(markdown, reviewer) {
                 id: `legacy-${idx++}`,
                 category: 'other',
                 severity: 'medium',
-                confidence:
+                confidence: DEFAULT_FINDING_CONFIDENCE,
                 title: content.slice(0, 100),
                 description: content,
                 location: locationMatch ? {
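Reviewer note: taken together, the `normalizeReviewOutput` hunks apply the sentinel in three places: agreements given as strings, agreements given as objects, and findings/disagreements. The sketch below is a simplified stand-in that covers only those confidence-related steps. `normalizeConfidence` and `Loose` are hypothetical names. The body of the `else` branch for a non-array `agreements` is not visible in the diff, so defaulting it to an empty array is an assumption.

```typescript
const DEFAULT_FINDING_CONFIDENCE = 0.5;

type Loose = Record<string, unknown>;

// Same logic as fillMissingConfidence in the diff, with type annotations added.
function fillMissingConfidence(item: unknown): unknown {
  if (!item || typeof item !== "object" || Array.isArray(item)) return item;
  const obj = item as Loose;
  if (typeof obj.confidence === "number") return obj;
  return { ...obj, confidence: DEFAULT_FINDING_CONFIDENCE };
}

// Simplified stand-in: only the confidence-related steps of normalization.
function normalizeConfidence(parsed: Loose): Loose {
  const out: Loose = { ...parsed };

  // Agreements: bare strings become objects; objects get the sentinel if needed.
  if (Array.isArray(out.agreements)) {
    out.agreements = out.agreements.map((a: unknown) =>
      typeof a === "string"
        ? { original_claim: a, assessment: "correct", confidence: DEFAULT_FINDING_CONFIDENCE }
        : fillMissingConfidence(a),
    );
  } else {
    out.agreements = []; // assumption: the hidden else-branch defaults to []
  }

  // Findings and disagreements: default to [], then fill each item.
  for (const key of ["findings", "disagreements"]) {
    const value = out[key] ?? [];
    out[key] = Array.isArray(value) ? value.map((v: unknown) => fillMissingConfidence(v)) : value;
  }
  return out;
}

const review = normalizeConfidence({
  agreements: ["Cache invalidation is correct", { original_claim: "TTL is fine", assessment: "correct" }],
  findings: [{ id: "f1", title: "Race in invalidation" }],
});
console.log(JSON.stringify(review, null, 2));
```

A non-array `findings` value is passed through unchanged here, which mirrors the diff's `Array.isArray` guard and leaves the error for schema validation to report.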