solidity-argus 0.6.0 → 0.6.1

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "solidity-argus",
3
- "version": "0.6.0",
3
+ "version": "0.6.1",
4
4
  "description": "Solidity smart contract security auditing plugin for OpenCode — 6 specialized agents, 16 tools (15 core + optional Solodit), and a curated vulnerability knowledge base",
5
5
  "keywords": [
6
6
  "solidity",
@@ -102,11 +102,13 @@ When the user explicitly asks for a deep or adversarial review, or when the scop
102
102
 
103
103
  Default deep/adversarial behavior: choose 2-4 relevant profiles, not every profile.
104
104
 
105
+ Dispatch discipline is mandatory: run exactly one specialist profile per Task. Never bundle multiple profiles into the same audit-specialist prompt. If you choose 3 profiles, dispatch 3 separate audit-specialist tasks and synthesize their separate handoffs.
106
+
105
107
  Profile selection rules:
106
108
  - Privileged roles, proxies, initializers, or upgrade authority: \`access-control\`.
107
109
  - Asset/share vaults, staking, lending, or rewards: \`math-precision\`, \`invariant\`, \`economic-security\`.
108
110
  - Bridges, callbacks, queues, routers, or asynchronous flows: \`execution-trace\`, \`economic-security\`.
109
- - Heavy libraries, adapters, wrappers, or helpers: \`periphery\`.
111
+ - Routers, position routers, heavy libraries, adapters, wrappers, or helpers: \`periphery\`.
110
112
  - High-value, unfamiliar, or broad adversarial requests: \`first-principles\` plus \`vector-scan\`.
111
113
 
112
114
  Dispatch examples:
@@ -115,6 +117,8 @@ Task(subagent_type="audit-specialist", prompt="Run specialist profile: math-prec
115
117
  Task(subagent_type="audit-specialist", prompt="Run specialist profile: vector-scan. Scope: src/. Load attack-vector-deck. Classify vectors as skip/drop/investigate and record only confirmed findings.")
116
118
  \`\`\`
117
119
 
120
+ Each audit-specialist prompt must also request the structured handoff fields \`findings_recorded_ids\`, \`leads_not_recorded\`, \`tools_run\`, \`tool_failures\`, \`escalations_for_argus\`, and \`human_readable_brief\`.
121
+
118
122
  Audit-specialist findings are normal raw findings. Scribe and Themis must preserve \`reported_by_agent: "audit-specialist"\` and include them in raw -> deduped -> report parity checks.
119
123
 
120
124
  ### 6. Testing & Verification
@@ -213,9 +217,22 @@ Task(subagent_type="scribe", prompt="Generate the final audit report for Project
213
217
  ### Your Tools vs Subagent Tools
214
218
 
215
219
  **You (Argus) can use directly:**
216
- - \`read\`, \`bash\`, \`grep\`, \`glob\` — for reading code, running commands, searching patterns
220
+ - \`read\`, \`bash\`, \`grep\`, \`glob\` — only for bounded scope discovery, not for executing the audit yourself
217
221
  - \`Task\` — for delegating to subagents
218
222
 
223
+ ### Direct-Tool Budget (CRITICAL)
224
+
225
+ Argus is an orchestrator, not the tactical executor. Direct \`read\`/\`bash\`/\`grep\`/\`glob\` calls are capped at **8 total per user turn** and only for:
226
+ - locating candidate scope files,
227
+ - reading top-level project documentation,
228
+ - checking whether the user's requested scope is ambiguous.
229
+
230
+ After those bounded discovery calls, you MUST either:
231
+ 1. ask one concise scope-clarification question, or
232
+ 2. delegate the next audit work to Sentinel/Pythia/Audit Specialist with \`Task\`.
233
+
234
+ Do NOT line-by-line audit contracts, enumerate every file, inspect full dependency trees, or run repeated shell/read probes directly in Argus. If more context is needed, delegate it. A broad audit request should produce early parallel delegation, not dozens of direct tool calls.
235
+
219
236
  **Only subagents can use (via Task delegation):**
220
237
  - \`argus_slither_analyze\`, \`argus_forge_test\`, \`argus_forge_fuzz\`, \`argus_forge_coverage\`, \`argus_gas_analysis\` → delegate to **sentinel**
221
238
  - \`argus_analyze_contract\`, \`argus_check_patterns\`, \`argus_proxy_detection\` → delegate to **sentinel**
@@ -249,8 +266,8 @@ Task(subagent_type="scribe", prompt="Generate the final audit report for Project
249
266
  - **Tools**: \`argus_skill_load\`, \`argus_check_patterns\`, \`argus_solodit_search\`, \`argus_analyze_contract\`, \`argus_slither_analyze\`, \`argus_proxy_detection\`, \`argus_forge_test\`, \`argus_forge_fuzz\`, \`argus_forge_coverage\`, \`argus_gas_analysis\`, \`argus_record_finding\`.
250
267
  - **Delegation Examples**:
251
268
  \`\`\`
252
- Task(subagent_type="audit-specialist", prompt="Run specialist profile: math-precision. Scope: src/Vault.sol. Return FINDING/LEAD blocks and record only confirmed findings.")
253
- Task(subagent_type="audit-specialist", prompt="Run specialist profile: vector-scan. Scope: src/. Load attack-vector-deck and record only confirmed findings.")
269
+ Task(subagent_type="audit-specialist", prompt="Run specialist profile: math-precision. Scope: src/Vault.sol. Return FINDING/LEAD blocks plus structured handoff fields. Record only confirmed findings.")
270
+ Task(subagent_type="audit-specialist", prompt="Run specialist profile: vector-scan. Scope: src/. Load attack-vector-deck and return structured handoff fields. Record only confirmed findings.")
254
271
  \`\`\`
255
272
  - **Constraint**: Use only for explicit deep/adversarial requests, complex protocol scopes, or Themis remediation. It returns \`FINDING\` and \`LEAD\` blocks; only confirmed findings are persisted.
256
273
 
@@ -567,7 +584,7 @@ STEPS:
567
584
  2. Deduplicate: group findings by vulnerability class + code location, merge into single entries. Include \`observation_ids\` on every deduped finding so each raw finding maps to exactly one report entry.
568
585
  3. Enrich: for each Critical/High finding, write specific impact and recommendation
569
586
  4. Call argus_persist_deduped with run_id and your deduped findings array — this writes the source-of-truth JSON to disk
570
- 5. Call argus_generate_report with run_id, project_name, and scope — the tool reads deduped findings from disk
587
+ 5. Call argus_generate_report with run_id, project_name, scope, preflight_policy: "strict-fail", and quality_gate_policy: "strict-fail" — the tool reads deduped findings from disk
571
588
 
572
589
  Overall risk assessment: {your assessment}
573
590
  ")
@@ -588,7 +605,7 @@ After Scribe returns, check the \`<argus-context>\` injected in your system cont
588
605
  If you see \`REPORT GENERATION: INCOMPLETE\`, it means Scribe did NOT call \`argus_generate_report\` — the report file was NOT written to disk.
589
606
 
590
607
  **Recovery steps**:
591
- 1. Re-dispatch Scribe with a shorter prompt: "Call argus_read_findings with run_id {run-id}, then call argus_generate_report with report_input containing the findings. The tool handles formatting."
608
+ 1. Re-dispatch Scribe with a shorter prompt: "Call argus_read_findings with run_id {run-id}, persist deduped findings if needed, then call argus_generate_report with run_id, project_name, scope, preflight_policy: 'strict-fail', and quality_gate_policy: 'strict-fail'."
592
609
  2. If Scribe fails a second time, call \`argus_generate_report\` yourself.
593
610
 
594
611
  **An audit is NOT complete until the report file exists on disk.**
@@ -608,14 +625,14 @@ Themis will:
608
625
  4. Return a verdict: approved or issues found
609
626
 
610
627
  **If Themis flags issues**, YOU are the final judge, but you must record a resolved disposition before the audit is complete:
611
- - If Themis found genuinely dropped findings → re-dispatch Scribe with specific correction instructions, then record status="remediated" with notes.
628
+ - If Themis found genuinely dropped findings → re-dispatch Scribe with specific correction instructions, then re-run Themis on the regenerated report. Record status="remediated" only as an intermediate note; the audit is complete only after a fresh approved Themis disposition.
612
629
  - If Themis disagrees on severity → evaluate the evidence and either remediate the report or record status="overridden" with a concrete justification.
613
630
  - If Themis found potential false positives → assess and remediate or explicitly override with justification.
614
631
  - If Themis approves → record status="approved" with the Themis verdict.
615
632
 
616
633
  Record the disposition by calling \`argus_themis_disposition\` with \`status\`, \`verdict_json\`, and either \`notes\` for remediation or \`justification\` for overrides.
617
634
 
618
- If Themis returns approved=false, Argus remains the final judge but must record a disposition before the audit is complete: remediate the issue and record status="remediated", or deliberately override with status="overridden" and a concrete justification. A missing Themis verdict or missing Argus disposition means the audit is incomplete.
635
+ If Themis returns approved=false, Argus remains the final judge but must record a disposition before the audit is complete: remediate the issue, regenerate the report, re-run Themis, and record a fresh status="approved" disposition; or deliberately override with status="overridden" and a concrete justification. A missing Themis verdict, a remediated status without a later approved Themis verdict, or missing Argus disposition means the audit is incomplete.
619
636
 
620
637
  **An audit is NOT complete until Themis has validated the output and Argus has recorded a resolved disposition.**
621
638
 
@@ -14,6 +14,8 @@ At task start:
14
14
  3. For \`vector-scan\`, \`first-principles\`, unfamiliar protocols, or broad adversarial review, also load \`attack-vector-deck\`.
15
15
  4. Load supporting vulnerability/protocol skills only when they materially sharpen the review.
16
16
 
17
+ You must run exactly one active profile per task. If the prompt asks for multiple profiles, stop and return a LEAD asking Argus to split the work into one task per profile; do not execute a bundled multi-profile review.
18
+
17
19
  Recognized profiles:
18
20
  - \`vector-scan\`: mechanically apply the bundled attack-vector deck and classify vectors as skip/drop/investigate.
19
21
  - \`access-control\`: load \`access-control-specialist\`; map roles, modifiers, initialization, upgrade authority, and inconsistent guards.
@@ -47,6 +49,12 @@ When recording a confirmed finding with \`argus_record_finding\`, include specif
47
49
 
48
50
  ## OUTPUT CONTRACT
49
51
 
52
+ ## ANTI-LOOP CHECKPOINTS
53
+
54
+ Emit a \`CHECKPOINT\` block after every 5 reviewed functions or when changing contracts. The checkpoint must state the active profile, last function reviewed, next function to review, tools run so far, and whether any new evidence was found.
55
+
56
+ Do not repeat the same function, same trace, or same \`SAFE\`/\`LEAD\` assessment more than once. If a function remains unresolved after two consecutive passes with the same conclusion and no new evidence, move it to \`leads_not_recorded\` with the missing proof and continue to the next distinct target.
57
+
50
58
  Return structured blocks only:
51
59
 
52
60
  \`\`\`text
@@ -60,6 +68,16 @@ LEAD | contract: Name | function: func | bug_class: kebab-tag | profile: math-pr
60
68
  code_smells: what looked suspicious
61
69
  missing_proof: what still needs verification
62
70
  description: one sentence explaining the trail
71
+
72
+ HANDOFF_JSON
73
+ {
74
+ "findings_recorded_ids": ["observation-or-finding-id"],
75
+ "leads_not_recorded": [{ "group_key": "Name | func | bug-class", "missing_proof": "specific blocker" }],
76
+ "tools_run": ["argus_analyze_contract"],
77
+ "tool_failures": [],
78
+ "escalations_for_argus": [],
79
+ "human_readable_brief": "one paragraph summary"
80
+ }
63
81
  \`\`\`
64
82
 
65
83
  Rules:
@@ -42,9 +42,12 @@ You must follow this structured research process:
42
42
  ### 3. Cross-Referencing & Deep Dive
43
43
  - **Objective**: Connect the dots between history and the current code.
44
44
  - **Actions**:
45
- - If Solodit shows that "Protocol X had a read-only reentrancy bug in function Y", check if the current contract has a similar function Y.
46
- - If \`argus_check_patterns\` flags a delegatecall, search Solodit for "delegatecall storage collision" to find case studies.
47
- - Synthesize the findings: "This pattern matches the 2022 Rari Capital exploit."
45
+ - If Solodit shows that "Protocol X had a read-only reentrancy bug in function Y", check if the current contract has a similar function Y.
46
+ - If \`argus_check_patterns\` flags a delegatecall, search Solodit for "delegatecall storage collision" to find case studies.
47
+ - Perform a bounded source read of the specific matched function or integration point before treating a precedent as applicable.
48
+ - Synthesize the findings: "This pattern matches the 2022 Rari Capital exploit."
49
+
50
+ Do not record a precedent-only finding. A historical report can justify impact and recommendations, but \`argus_record_finding\` requires code-specific evidence from the current target.
48
51
 
49
52
  ### 4. Reporting
50
53
  - **Objective**: Deliver actionable intelligence to Argus.
@@ -70,9 +70,13 @@ Argus provides you with a \`run_id\`. Your job: read findings, deduplicate, enri
70
70
  - \`project_name\`: the project name
71
71
  - \`scope\`: list of audited files
72
72
  - \`run_id\`: the run ID (the tool reads your persisted deduped findings from disk and resolves the canonical envelope automatically)
73
+ - \`preflight_policy: "strict-fail"\`
74
+ - \`quality_gate_policy: "strict-fail"\`
73
75
 
74
76
  **DO NOT** pass \`report_input\`, \`findings\`, \`toolsExecuted\`, \`session_id\`, or any other field — the tool reads them from durable state on disk. Passing them risks contract-mismatch failures.
75
77
 
78
+ Before this call, verify that every deduped finding file is inside the audited scope. Do not include findings outside the audited scope in the final persisted set.
79
+
76
80
  6. **Limitations disclosure**: If any tool failed or was absent, add a \`## Limitations\` section.
77
81
 
78
82
  7. Confirm: "Report generated via argus_generate_report: {filePath}".
@@ -106,6 +106,7 @@ You have access to a specific set of tools. Use them effectively.
106
106
  **When to use**: After running tests, to identify gaps in coverage.
107
107
  **Arguments**:
108
108
  - \`target\` (string): Path to the project directory (default ".").
109
+ - \`ir_minimum\` (boolean): Retry coverage with \`ir_minimum: true\` when coverage fails with stack-too-deep, optimizerSteps, config parse, or instrumentation errors.
109
110
  **Interpretation**:
110
111
  - Focus on low branch coverage in critical contracts (vaults, token transfers, access control).
111
112
  - Untested code paths are prime candidates for hidden vulnerabilities.
@@ -156,6 +157,12 @@ You have access to a specific set of tools. Use them effectively.
156
157
  **Interpretation**:
157
158
  - Recording is mandatory before handing findings to Argus for final synthesis.
158
159
 
160
+ ### Large Tool Output Discipline
161
+ - If any tool output or copied log exceeds 5,000 characters, summarize it in at most 10 bullets before continuing.
162
+ - Preserve the exact failing command or tool name and preserve every artifact path needed for follow-up.
163
+ - If a full output artifact path is available, reference that artifact path instead of embedding the full text.
164
+ - do not paste the full output back into the conversation.
165
+
159
166
  ## SKILL SYSTEM
160
167
 
161
168
  Use \`argus_skill_load\` only when specialized context is needed before deep verification work.
@@ -84,6 +84,8 @@ Focus questions:
84
84
 
85
85
  Return a structured validation result, not a full report.
86
86
 
87
+ Return exactly one JSON verdict. No prose after the JSON verdict.
88
+
87
89
  Use this exact shape:
88
90
 
89
91
  \`\`\`json
@@ -1,7 +1,12 @@
1
- import type { BackgroundManager } from "../../managers/types"
1
+ import type {
2
+ BackgroundFailureDiagnostic,
3
+ BackgroundManager,
4
+ BackgroundTaskDiagnostic,
5
+ BackgroundTaskStatus,
6
+ } from "../../managers/types"
2
7
  import { createLogger } from "../../shared/logger"
3
8
 
4
- type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled"
9
+ type TaskStatus = BackgroundTaskStatus
5
10
  type CompletionCallback = (taskId: string, result: unknown) => void
6
11
 
7
12
  export interface BackgroundTaskOptions {
@@ -25,6 +30,66 @@ interface TaskInfo {
25
30
  callbacks: Set<CompletionCallback>
26
31
  }
27
32
 
33
+ function errorText(error: unknown): string {
34
+ if (error instanceof Error) return error.message
35
+ if (typeof error === "string") return error
36
+ try {
37
+ return JSON.stringify(error)
38
+ } catch {
39
+ return String(error)
40
+ }
41
+ }
42
+
43
+ export function classifyBackgroundFailure(
44
+ error: unknown,
45
+ task?: Pick<TaskInfo, "status" | "prompt">,
46
+ ): BackgroundFailureDiagnostic {
47
+ if (task?.status === "cancelled") {
48
+ return {
49
+ category: "cancelled",
50
+ retry_recommendation: "do_not_retry",
51
+ summary: "Background task was cancelled before completion.",
52
+ }
53
+ }
54
+
55
+ const text = errorText(error)
56
+ const lower = text.toLowerCase()
57
+ if (text.includes("This model does not support assistant message prefill")) {
58
+ return {
59
+ category: "model_error",
60
+ retry_recommendation: "retry_with_changes",
61
+ summary: "Provider rejected assistant prefill; retry with a fresh or shorter prompt.",
62
+ }
63
+ }
64
+ if (lower.includes("timed out")) {
65
+ const likelySizeRelated = (task?.prompt.length ?? 0) > 5_000
66
+ return {
67
+ category: "timeout",
68
+ retry_recommendation: likelySizeRelated ? "retry_with_changes" : "safe_to_retry",
69
+ summary: likelySizeRelated
70
+ ? "Background task timed out; retry with a shorter prompt or narrower scope."
71
+ : "Background task timed out; retrying is safe if upstream services are healthy.",
72
+ }
73
+ }
74
+ if (
75
+ lower.includes("argus tool") ||
76
+ lower.includes("command failed") ||
77
+ lower.includes("tool error") ||
78
+ lower.includes('"success":false')
79
+ ) {
80
+ return {
81
+ category: "tool_error",
82
+ retry_recommendation: "retry_with_changes",
83
+ summary: "Background task failed inside a tool or command invocation.",
84
+ }
85
+ }
86
+ return {
87
+ category: "unknown",
88
+ retry_recommendation: "retry_with_changes",
89
+ summary: text.length > 0 ? text : "Background task failed for an unknown reason.",
90
+ }
91
+ }
92
+
28
93
  export type Dispatcher = (
29
94
  agentName: string,
30
95
  prompt: string,
@@ -185,6 +250,23 @@ export function createBackgroundManager(
185
250
  return Promise.resolve(task.result)
186
251
  }
187
252
 
253
+ function getTaskStatus(taskId: string): Promise<BackgroundTaskDiagnostic | undefined> {
254
+ const task = tasks.get(taskId)
255
+ if (!task) return Promise.resolve(undefined)
256
+
257
+ if (task.status === "completed") {
258
+ return Promise.resolve({ status: task.status, result: task.result })
259
+ }
260
+ if (task.status === "failed" || task.status === "cancelled") {
261
+ return Promise.resolve({
262
+ status: task.status,
263
+ error: task.error,
264
+ diagnostic: classifyBackgroundFailure(task.error, task),
265
+ })
266
+ }
267
+ return Promise.resolve({ status: task.status })
268
+ }
269
+
188
270
  function onComplete(
189
271
  taskIdOrCallback: string | CompletionCallback,
190
272
  callback?: CompletionCallback,
@@ -227,6 +309,7 @@ export function createBackgroundManager(
227
309
  dispatch,
228
310
  cancel,
229
311
  getResult,
312
+ getTaskStatus,
230
313
  onComplete,
231
314
  getActiveCount,
232
315
  }
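Reviewer note: the new `getTaskStatus` method pairs each failed or cancelled task with a `retry_recommendation`. The sketch below shows one way a caller might act on it. The three types are copied from the `managers/types` hunk in this diff; `RetryPlan` and `planRetry` are hypothetical names that are not part of the package, and mapping `retry_with_changes` to a narrower prompt is an assumption drawn from the summaries in `classifyBackgroundFailure`.

```typescript
// Types copied from the managers/types hunk in this diff.
type BackgroundTaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled"

type BackgroundFailureDiagnostic = {
  category: "model_error" | "tool_error" | "timeout" | "cancelled" | "unknown"
  retry_recommendation: "safe_to_retry" | "retry_with_changes" | "do_not_retry"
  summary: string
}

type BackgroundTaskDiagnostic = {
  status: BackgroundTaskStatus
  result?: unknown
  error?: unknown
  diagnostic?: BackgroundFailureDiagnostic
}

// Hypothetical caller-side decision type; not defined anywhere in the package.
type RetryPlan = "wait" | "done" | "retry_same_prompt" | "retry_narrower_prompt" | "give_up"

// Hypothetical helper: turn a status snapshot into a next step.
function planRetry(task: BackgroundTaskDiagnostic | undefined): RetryPlan {
  // Unknown task IDs resolve to undefined, so there is nothing to retry.
  if (!task) return "give_up"
  if (task.status === "queued" || task.status === "running") return "wait"
  if (task.status === "completed") return "done"

  // Only failed or cancelled tasks reach this point.
  switch (task.diagnostic?.retry_recommendation) {
    case "safe_to_retry":
      return "retry_same_prompt"
    case "retry_with_changes":
      // Assumption: "changes" means a shorter prompt or a narrower scope.
      return "retry_narrower_prompt"
    default:
      // Covers "do_not_retry" and a missing diagnostic.
      return "give_up"
  }
}
```

A caller would typically await `getTaskStatus(taskId)` and pass the resolved value to a helper like this one, so that the retry policy lives in one place instead of being re-derived from error strings.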
@@ -155,15 +155,21 @@ function isResolvedThemisDisposition(value: unknown): boolean {
155
155
  if (disposition?.status === "approved") {
156
156
  return disposition.verdict?.approved === true
157
157
  }
158
- if (disposition?.status === "remediated") {
159
- return disposition.verdict?.approved === false && hasText(disposition.notes)
160
- }
161
158
  if (disposition?.status === "overridden") {
162
159
  return disposition.verdict?.approved === false && hasText(disposition.justification)
163
160
  }
164
161
  return false
165
162
  }
166
163
 
164
+ function isRemediatedThemisDisposition(value: unknown): boolean {
165
+ const disposition = asRecord(value) as ThemisDisposition | null
166
+ return (
167
+ disposition?.status === "remediated" &&
168
+ disposition.verdict?.approved === false &&
169
+ hasText(disposition.notes)
170
+ )
171
+ }
172
+
167
173
  function hasRejectedThemisVerdict(value: unknown): boolean {
168
174
  const verdict = asRecord(value) as ThemisVerdict | null
169
175
  return verdict?.approved === false
@@ -189,6 +195,16 @@ function collectThemisDispositionErrors(events: AuditEvent[]): string[] {
189
195
 
190
196
  if (hasResolvedDisposition) return []
191
197
 
198
+ const hasRemediatedDisposition = laterEvents.some((event) => {
199
+ if (event.type !== "tool.completed") return false
200
+ const payload = asRecord(event.payload)
201
+ return isRemediatedThemisDisposition(payload?.themisDisposition)
202
+ })
203
+
204
+ if (hasRemediatedDisposition) {
205
+ return ["remediated Themis disposition requires fresh approved Themis validation"]
206
+ }
207
+
192
208
  const hasUnresolvedRejection = laterEvents.some((event) => {
193
209
  if (event.type !== "tool.completed") return false
194
210
  const payload = asRecord(event.payload)
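Reviewer note: the hunk above moves `remediated` out of the set of resolved dispositions. The simplified sketch below restates that rule for a single disposition. It is an illustration only: `Disposition`, `dispositionState`, and this `hasText` are stand-ins written for the example, and the real validator works over a stream of audit events, not one record.

```typescript
// Simplified stand-in shapes; the package's ThemisDisposition type is not shown in this diff.
type Verdict = { approved: boolean }

type Disposition = {
  status: "approved" | "remediated" | "overridden"
  verdict?: Verdict
  notes?: string
  justification?: string
}

// Stand-in for the package's hasText helper (assumed to mean "non-empty string").
const hasText = (value: string | undefined): boolean =>
  typeof value === "string" && value.trim().length > 0

type DispositionState = "resolved" | "needs_fresh_approval" | "unresolved"

// Hypothetical helper mirroring the 0.6.1 rule for one disposition.
function dispositionState(d: Disposition): DispositionState {
  if (d.status === "approved") {
    return d.verdict?.approved === true ? "resolved" : "unresolved"
  }
  if (d.status === "overridden") {
    return d.verdict?.approved === false && hasText(d.justification) ? "resolved" : "unresolved"
  }
  // "remediated" is now only an intermediate state: it never resolves the audit by itself.
  return d.verdict?.approved === false && hasText(d.notes) ? "needs_fresh_approval" : "unresolved"
}
```

Under this reading, a remediated disposition with notes yields the new error message until a later approved disposition is recorded, which matches the prompt change that requires re-running Themis on the regenerated report.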
@@ -41,14 +41,30 @@ function formatDuration(startTime: number, endTime?: number): string {
41
41
  }
42
42
 
43
43
  function buildToolLedgerLine(auditState: AuditState): string {
44
- const taskDispatches = auditState.toolsExecuted.filter((tool) => tool.tool === "task").length
44
+ const taskTools = auditState.toolsExecuted.filter((tool) => tool.tool === "task")
45
+ const taskDispatches = taskTools.length
45
46
  const argusTools = auditState.toolsExecuted.filter((tool) => tool.tool !== "task").slice(-5)
46
47
  const entries = argusTools.map((tool) => {
47
48
  const status = tool.success ? "ok" : "failed"
48
49
  return `${tool.tool}=${status} findings=${tool.findingsCount} duration=${formatDuration(tool.startTime, tool.endTime)}`
49
50
  })
50
51
 
51
- if (taskDispatches > 0) entries.push(`task dispatches=${taskDispatches}`)
52
+ if (taskDispatches > 0) {
53
+ const bySubagent = new Map<string, number>()
54
+ for (const tool of taskTools) {
55
+ const subagent = tool.subagent_type ?? "unknown"
56
+ bySubagent.set(subagent, (bySubagent.get(subagent) ?? 0) + 1)
57
+ }
58
+ const subagentSummary = [...bySubagent.entries()]
59
+ .sort(([a], [b]) => a.localeCompare(b))
60
+ .map(([subagent, count]) => `${subagent}=${count}`)
61
+ .join(" ")
62
+ entries.push(
63
+ subagentSummary.length > 0
64
+ ? `task dispatches=${taskDispatches} (${subagentSummary})`
65
+ : `task dispatches=${taskDispatches}`,
66
+ )
67
+ }
52
68
  return entries.length > 0 ? entries.join("; ") : "none"
53
69
  }
54
70
 
@@ -1,5 +1,20 @@
1
1
  import type { AuditState } from "../state/types"
2
2
 
3
+ export type BackgroundTaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled"
4
+
5
+ export type BackgroundFailureDiagnostic = {
6
+ category: "model_error" | "tool_error" | "timeout" | "cancelled" | "unknown"
7
+ retry_recommendation: "safe_to_retry" | "retry_with_changes" | "do_not_retry"
8
+ summary: string
9
+ }
10
+
11
+ export type BackgroundTaskDiagnostic = {
12
+ status: BackgroundTaskStatus
13
+ result?: unknown
14
+ error?: unknown
15
+ diagnostic?: BackgroundFailureDiagnostic
16
+ }
17
+
3
18
  /**
4
19
  * BackgroundManager interface
5
20
  * Handles dispatching and managing background agent tasks
@@ -27,6 +42,12 @@ export interface BackgroundManager {
27
42
  */
28
43
  getResult(taskId: string): Promise<unknown>
29
44
 
45
+ /**
46
+ * Get task status and structured diagnostics for completed, failed, queued, and cancelled tasks.
47
+ * Unknown task IDs resolve to undefined.
48
+ */
49
+ getTaskStatus(taskId: string): Promise<BackgroundTaskDiagnostic | undefined>
50
+
30
51
  /**
31
52
  * Register a callback to be invoked when a task completes
32
53
  * @param callback - Function called with (taskId, result) when task finishes
@@ -0,0 +1,96 @@
1
+ import type { CanonicalFinding } from "../state/schemas"
2
+ import type { Finding } from "../state/types"
3
+
4
+ export type LineageCountMismatch = {
5
+ check: string
6
+ observation_count?: number
7
+ observation_ids_length: number
8
+ }
9
+
10
+ export type FindingLineageResult = {
11
+ valid: boolean
12
+ raw_count: number
13
+ mapped_count: number
14
+ duplicate_observation_ids: string[]
15
+ phantom_observation_ids: string[]
16
+ missing_observation_ids: string[]
17
+ count_mismatches: LineageCountMismatch[]
18
+ }
19
+
20
+ type FindingLike = Pick<Finding, "check"> & {
21
+ id?: string
22
+ observation_id?: string
23
+ observation_ids?: unknown
24
+ observation_count?: unknown
25
+ }
26
+
27
+ function sorted(values: Iterable<string>): string[] {
28
+ return Array.from(new Set(values)).sort((a, b) => a.localeCompare(b))
29
+ }
30
+
31
+ function observationIds(value: FindingLike): string[] {
32
+ if (!Array.isArray(value.observation_ids)) return []
33
+ return value.observation_ids.filter((id): id is string => typeof id === "string" && id.length > 0)
34
+ }
35
+
36
+ function rawObservationIds(rawFindings: CanonicalFinding[]): string[] {
37
+ return rawFindings
38
+ .map((finding) => finding.observation_id)
39
+ .filter((id): id is string => typeof id === "string" && id.length > 0)
40
+ }
41
+
42
+ export function validateFindingLineage(
43
+ rawFindings: CanonicalFinding[],
44
+ dedupedFindings: FindingLike[],
45
+ ): FindingLineageResult {
46
+ const rawIds = new Set(rawObservationIds(rawFindings))
47
+ const mappedIds: string[] = []
48
+ const seen = new Set<string>()
49
+ const duplicates = new Set<string>()
50
+ const countMismatches: LineageCountMismatch[] = []
51
+
52
+ for (const finding of dedupedFindings) {
53
+ const ids = observationIds(finding)
54
+ const suppliedCount = finding.observation_count
55
+
56
+ if (ids.length === 0 || (suppliedCount != null && suppliedCount !== ids.length)) {
57
+ countMismatches.push({
58
+ check: finding.check || finding.id || "(unknown finding)",
59
+ observation_count: typeof suppliedCount === "number" ? suppliedCount : undefined,
60
+ observation_ids_length: ids.length,
61
+ })
62
+ }
63
+
64
+ for (const id of ids) {
65
+ mappedIds.push(id)
66
+ if (seen.has(id)) {
67
+ duplicates.add(id)
68
+ }
69
+ seen.add(id)
70
+ }
71
+ }
72
+
73
+ const mappedSet = new Set(mappedIds)
74
+ const phantom = mappedIds.filter((id) => !rawIds.has(id))
75
+ const missing = Array.from(rawIds).filter((id) => !mappedSet.has(id))
76
+ const duplicateIds = sorted(duplicates)
77
+ const phantomIds = sorted(phantom)
78
+ const missingIds = sorted(missing)
79
+
80
+ countMismatches.sort((a, b) => a.check.localeCompare(b.check))
81
+
82
+ return {
83
+ valid:
84
+ duplicateIds.length === 0 &&
85
+ phantomIds.length === 0 &&
86
+ missingIds.length === 0 &&
87
+ countMismatches.length === 0 &&
88
+ mappedIds.length === rawIds.size,
89
+ raw_count: rawIds.size,
90
+ mapped_count: mappedIds.length,
91
+ duplicate_observation_ids: duplicateIds,
92
+ phantom_observation_ids: phantomIds,
93
+ missing_observation_ids: missingIds,
94
+ count_mismatches: countMismatches,
95
+ }
96
+ }
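Reviewer note: the new lineage validator checks that every raw `observation_id` maps to exactly one deduped finding. The condensed sketch below reproduces only the set logic (duplicates, phantoms, missing IDs) on a small made-up input. `Deduped` and `lineageSummary` are hypothetical names for this example, the IDs are invented, and the count-mismatch check from the real function is left out.

```typescript
// Hypothetical, trimmed-down shape of a deduped finding for this example.
type Deduped = { check: string; observation_ids: string[] }

// Deduplicate and sort IDs so the output is deterministic.
function sortIds(ids: Iterable<string>): string[] {
  return Array.from(new Set(ids)).sort((a, b) => a.localeCompare(b))
}

// Condensed stand-in for the set logic in validateFindingLineage.
function lineageSummary(rawIds: string[], deduped: Deduped[]) {
  const raw = new Set(rawIds)
  const seen = new Set<string>()
  const duplicates = new Set<string>()

  for (const finding of deduped) {
    for (const id of finding.observation_ids) {
      // An ID claimed by more than one report entry breaks the 1:1 mapping.
      if (seen.has(id)) duplicates.add(id)
      seen.add(id)
    }
  }

  return {
    duplicate_observation_ids: sortIds(duplicates),
    // Phantom: referenced by a deduped finding but absent from the raw findings.
    phantom_observation_ids: sortIds(Array.from(seen).filter((id) => !raw.has(id))),
    // Missing: present in the raw findings but dropped from the deduped set.
    missing_observation_ids: sortIds(Array.from(raw).filter((id) => !seen.has(id))),
  }
}

// Invented input: "o2" is claimed twice, "o9" has no raw source, "o3" is dropped.
const summary = lineageSummary(
  ["o1", "o2", "o3"],
  [
    { check: "reentrancy", observation_ids: ["o1", "o2"] },
    { check: "access-control", observation_ids: ["o2", "o9"] },
  ],
)
```

With this input the three lists each contain one ID (`o2`, `o9`, and `o3` respectively), so the real validator would report `valid: false` for an equivalent payload.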
@@ -16,6 +16,8 @@ export interface ReportPathOptions {
16
16
  outputDir: string
17
17
  /** Optional run_id for run-scoped naming */
18
18
  runId?: string
19
+ /** Optional caller-supplied report revision. Base report is revision 1. */
20
+ revision?: number
19
21
  }
20
22
 
21
23
  export interface ResolvedReportPath {
@@ -46,7 +48,7 @@ export function sanitizeContractName(name: string): string {
46
48
  }
47
49
 
48
50
  export function resolveReportPath(options: ReportPathOptions): ResolvedReportPath {
49
- const { contractName, date, outputDir, runId } = options
51
+ const { contractName, date, outputDir, runId, revision } = options
50
52
 
51
53
  if (!contractName || contractName.trim() === "") {
52
54
  throw new ReportPathError("contractName must not be empty")
@@ -54,12 +56,16 @@ export function resolveReportPath(options: ReportPathOptions): ResolvedReportPat
54
56
  if (!outputDir || outputDir.trim() === "") {
55
57
  throw new ReportPathError("outputDir must not be empty")
56
58
  }
59
+ if (revision != null && (!Number.isInteger(revision) || revision < 2)) {
60
+ throw new ReportPathError("revision must be an integer greater than or equal to 2")
61
+ }
57
62
 
58
63
  const resolvedDate = date ?? new Date()
59
64
  const dateStr = formatReportDate(resolvedDate)
60
65
  const sanitizedName = sanitizeContractName(contractName)
61
66
  const runIdSuffix = runId ? `-${runId.substring(0, 8)}` : ""
62
- const filename = `${sanitizedName}-security-audit-${dateStr}${runIdSuffix}.md`
67
+ const revisionSuffix = revision == null ? "" : `-r${revision}`
68
+ const filename = `${sanitizedName}-security-audit-${dateStr}${runIdSuffix}${revisionSuffix}.md`
63
69
  const filePath = join(outputDir, filename)
64
70
  const canonicalId = runId ?? filename
65
71
 
@@ -95,6 +95,7 @@ export interface ToolExecution {
95
95
  success: boolean
96
96
  findingsCount: number
97
97
  findingCounts?: FindingCounts
98
+ subagent_type?: string
98
99
  }
99
100
 
100
101
  export interface FindingCounts {
@@ -40,6 +40,8 @@ type ForgeCoverageResult = {
40
40
  report: ForgeCoverageReport
41
41
  executionTime: number
42
42
  error?: string
43
+ hint?: string
44
+ suggested_command?: string
43
45
  }
44
46
 
45
47
  export type ForgeCommandRunner = (
@@ -73,6 +75,36 @@ function isStackTooDeep(stderr: string): boolean {
73
75
  return /stack too deep/i.test(stderr)
74
76
  }
75
77
 
78
+ function classifyCoverageFailure(
79
+ stderr: string,
80
+ args: NormalizedForgeCoverageArgs,
81
+ ): Pick<ForgeCoverageResult, "hint" | "suggested_command"> | undefined {
82
+ if (
83
+ !/(optimizerSteps|unsupported optimizer|config parse|failed to parse|instrumentation)/i.test(
84
+ stderr,
85
+ )
86
+ ) {
87
+ return undefined
88
+ }
89
+
90
+ const command = buildCoverageCommand({ ...args, ir_minimum: true }).join(" ")
91
+ return {
92
+ hint:
93
+ `Forge coverage failed for ${args.target} while parsing or instrumenting project configuration. ` +
94
+ "If foundry.toml uses optimizerSteps or unsupported optimizer settings, run a scoped coverage command or temporarily adjust coverage-only config manually; Argus will not edit foundry.toml.",
95
+ suggested_command: command,
96
+ }
97
+ }
98
+
99
+ function shouldRetryWithIrMinimum(stderr: string): boolean {
100
+ return (
101
+ isStackTooDeep(stderr) ||
102
+ /(optimizerSteps|unsupported optimizer|config parse|failed to parse|instrumentation)/i.test(
103
+ stderr,
104
+ )
105
+ )
106
+ }
107
+
76
108
  function parsePercent(input: string): number {
77
109
  const match = input.match(/(\d+(?:\.\d+)?)%/)
78
110
  if (!match?.[1]) {
@@ -165,11 +197,15 @@ export async function executeForgeCoverage(
165
197
  const normalizedArgs = normalizeArgs(args, context)
166
198
  context.metadata({ title: `Run forge coverage: ${normalizedArgs.target}` })
167
199
 
168
- const fail = (error: string): ForgeCoverageResult => ({
200
+ const fail = (
201
+ error: string,
202
+ diagnostics?: Pick<ForgeCoverageResult, "hint" | "suggested_command">,
203
+ ): ForgeCoverageResult => ({
169
204
  success: false,
170
205
  report: { files: [], summary: { ...EMPTY_SUMMARY } },
171
206
  executionTime: Date.now() - startedAt,
172
207
  error,
208
+ ...diagnostics,
173
209
  })
174
210
 
175
211
  try {
@@ -181,7 +217,7 @@ export async function executeForgeCoverage(
181
217
  if (
182
218
  runResult.exitCode !== 0 &&
183
219
  !normalizedArgs.ir_minimum &&
184
- isStackTooDeep(runResult.stderr)
220
+ shouldRetryWithIrMinimum(runResult.stderr)
185
221
  ) {
186
222
  runResult = await runCommand(buildCoverageCommand(normalizedArgs, true), {
187
223
  signal: context.abort,
@@ -190,9 +226,9 @@ export async function executeForgeCoverage(
190
226
  }
191
227
 
192
228
  if (runResult.exitCode !== 0) {
193
- return fail(
194
- runResult.stderr.trim() || `forge coverage exited with code ${runResult.exitCode}`,
195
- )
229
+ const error =
230
+ runResult.stderr.trim() || `forge coverage exited with code ${runResult.exitCode}`
231
+ return fail(error, classifyCoverageFailure(error, normalizedArgs))
196
232
  }
197
233
 
198
234
  let report: ForgeCoverageReport
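
Note on the coverage changes above: the retry decision and the diagnostics decision share the config-failure pattern, but a stack-too-deep error triggers only the retry and produces no hint. The following is a minimal sketch of that split, not the package's implementation. `CoverageDiagnostics`, `buildCommandSketch`, `shouldRetry`, and `classify` are hypothetical names, and the command shape is an assumption because `buildCoverageCommand` is not shown in this diff.

```typescript
// Hypothetical, simplified shape; the real ForgeCoverageResult has more fields.
type CoverageDiagnostics = { hint: string; suggested_command: string }

// Patterns copied from the diff. No "g" flag, so .test() keeps no state between calls.
const CONFIG_FAILURE =
  /(optimizerSteps|unsupported optimizer|config parse|failed to parse|instrumentation)/i
const STACK_TOO_DEEP = /stack too deep/i

// Assumed command shape: buildCoverageCommand is not part of this diff.
function buildCommandSketch(irMinimum: boolean): string[] {
  const command = ["forge", "coverage"]
  if (irMinimum) command.push("--ir-minimum")
  return command
}

// Retry with --ir-minimum on either failure family.
function shouldRetry(stderr: string): boolean {
  return STACK_TOO_DEEP.test(stderr) || CONFIG_FAILURE.test(stderr)
}

// Attach a hint only for the config/instrumentation family.
function classify(stderr: string): CoverageDiagnostics | undefined {
  if (!CONFIG_FAILURE.test(stderr)) return undefined
  return {
    hint: "Coverage failed while parsing or instrumenting project configuration.",
    suggested_command: buildCommandSketch(true).join(" "),
  }
}
```

Under these assumptions, a caller would check `shouldRetry` once after the first run and call `classify` only on the final failing stderr, which mirrors the order of the hunks above.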
@@ -1,7 +1,8 @@
1
- import { mkdir, writeFile } from "node:fs/promises"
1
+ import { mkdir, readFile, writeFile } from "node:fs/promises"
2
2
  import { dirname } from "node:path"
3
3
  import { type ToolContext, tool } from "@opencode-ai/plugin"
4
4
  import { createAuditArtifactResolver } from "../shared/audit-artifact-resolver"
5
+ import { validateFindingLineage } from "../shared/lineage-validator"
5
6
  import { createLogger } from "../shared/logger"
6
7
  import { resolveProjectDir } from "../shared/project-utils"
7
8
  import { isNonEmptyString } from "../shared/type-guards"
@@ -22,6 +23,28 @@ export interface DedupedFindingsArtifact {
22
23
  findings: CanonicalFinding[]
23
24
  }
24
25
 
26
+ async function loadRawFindings(
27
+ runId: string,
28
+ projectDir: string,
29
+ ): Promise<CanonicalFinding[] | null> {
30
+ const findingsFile = createAuditArtifactResolver(runId, projectDir).paths().findingsFile
31
+ try {
32
+ const parsed = JSON.parse(await readFile(findingsFile, "utf8"))
33
+ if (!parsed || !Array.isArray(parsed.findings)) return null
34
+ return parsed.findings
35
+ } catch {
36
+ return null
37
+ }
38
+ }
39
+
40
+ function missingRawFindings(runId: string): string {
41
+ return JSON.stringify({
42
+ success: false,
43
+ error: "MissingRawFindingsError",
44
+ message: `Cannot verify deduped lineage because .argus/runs/${runId}/findings.json is missing or invalid`,
45
+ })
46
+ }
47
+
25
48
  export async function executePersistDeduped(
26
49
  args: PersistDedupedArgs,
27
50
  context: ToolContext,
@@ -55,6 +78,27 @@ export async function executePersistDeduped(
55
78
  const projectDir = resolveProjectDir(context)
56
79
  const resolver = createAuditArtifactResolver(args.run_id, projectDir)
57
80
  const dedupedPath = resolver.paths().dedupedFindingsFile
81
+ const rawFindings = await loadRawFindings(args.run_id, projectDir)
82
+
83
+ if (!rawFindings) {
84
+ return missingRawFindings(args.run_id)
85
+ }
86
+
87
+ const lineage = validateFindingLineage(rawFindings, findings)
88
+ if (!lineage.valid) {
89
+ return JSON.stringify({
90
+ success: false,
91
+ error: "LineageError",
92
+ lineage: {
93
+ raw_count: lineage.raw_count,
94
+ mapped_count: lineage.mapped_count,
95
+ duplicate_observation_ids: lineage.duplicate_observation_ids,
96
+ phantom_observation_ids: lineage.phantom_observation_ids,
97
+ missing_observation_ids: lineage.missing_observation_ids,
98
+ count_mismatches: lineage.count_mismatches,
99
+ },
100
+ })
101
+ }
58
102
 
59
103
  const artifact: DedupedFindingsArtifact = {
60
104
  run_id: args.run_id,
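
Note on the lineage gate above: `validateFindingLineage` lives in `../shared/lineage-validator`, which this diff does not show. The sketch below is one plausible way to produce the fields the caller reads (`valid`, `raw_count`, `mapped_count`, and the four diagnostic lists). The finding shapes, the `observation_count` field, and the meaning of `mapped_count` are assumptions made for illustration only.

```typescript
// Hypothetical, reduced finding shapes; the real CanonicalFinding is richer.
type RawFinding = { observation_id: string }
type DedupedFinding = {
  check: string
  observation_ids: string[]
  observation_count?: number // assumed field, not confirmed by the diff
}
type CountMismatch = { check: string; expected: number; actual: number }

function validateLineageSketch(raw: RawFinding[], deduped: DedupedFinding[]) {
  const rawIds = new Set(raw.map((f) => f.observation_id))
  const seen = new Set<string>()
  const duplicates = new Set<string>()
  const phantoms = new Set<string>()
  const countMismatches: CountMismatch[] = []

  for (const finding of deduped) {
    for (const id of finding.observation_ids) {
      if (seen.has(id)) duplicates.add(id) // claimed by more than one deduped finding
      seen.add(id)
      if (!rawIds.has(id)) phantoms.add(id) // never recorded as a raw observation
    }
    const declared = finding.observation_count
    if (declared !== undefined && declared !== finding.observation_ids.length) {
      countMismatches.push({
        check: finding.check,
        expected: declared,
        actual: finding.observation_ids.length,
      })
    }
  }

  // Raw observations that no deduped finding accounts for.
  const missing = Array.from(rawIds).filter((id) => !seen.has(id)).sort()

  return {
    valid:
      missing.length === 0 &&
      phantoms.size === 0 &&
      duplicates.size === 0 &&
      countMismatches.length === 0,
    raw_count: rawIds.size,
    mapped_count: rawIds.size - missing.length, // assumed meaning: raw ids that were mapped
    duplicate_observation_ids: Array.from(duplicates).sort(),
    phantom_observation_ids: Array.from(phantoms).sort(),
    missing_observation_ids: missing,
    count_mismatches: countMismatches,
  }
}
```

Compared with the removed `compareObservationLineage` helper later in this diff, which only reported missing and extra IDs, a validator of this kind can also report duplicated IDs and per-finding count mismatches.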
@@ -12,6 +12,8 @@ import type { AuditState } from "../state/types"
12
12
 
13
13
  type ReadFindingsArgs = {
14
14
  run_id: string
15
+ findings_offset?: number
16
+ findings_limit?: number
15
17
  }
16
18
 
17
19
  type ReportFinding = Omit<
@@ -42,6 +44,11 @@ type CompactReportInput = Omit<
42
44
  run_id: string
43
45
  findings: ReportFinding[]
44
46
  toolsExecuted: ReportToolExecution[]
47
+ findingsPage?: {
48
+ offset: number
49
+ limit: number
50
+ total: number
51
+ }
45
52
  }
46
53
 
47
54
  type ReadFindingsInlineResult = {
@@ -98,13 +105,31 @@ function stripInternalKeys(obj: object, keysToStrip: ReadonlySet<string>): Recor
98
105
  return result
99
106
  }
100
107
 
101
- function buildCompactInput(reportInput: ReportInput): CompactReportInput {
108
+ function normalizePageArgs(args: ReadFindingsArgs): { offset: number; limit: number } | null {
109
+ if (args.findings_offset == null && args.findings_limit == null) return null
110
+
111
+ const offset = args.findings_offset ?? 0
112
+ const limit = args.findings_limit ?? 50
113
+ if (!Number.isInteger(offset) || offset < 0) {
114
+ throw new Error("findings_offset must be a non-negative integer")
115
+ }
116
+ if (!Number.isInteger(limit) || limit < 1 || limit > 500) {
117
+ throw new Error("findings_limit must be an integer between 1 and 500")
118
+ }
119
+ return { offset, limit }
120
+ }
121
+
122
+ function buildCompactInput(
123
+ reportInput: ReportInput,
124
+ page: { offset: number; limit: number } | null = null,
125
+ ): CompactReportInput {
126
+ const rawFindings = page
127
+ ? reportInput.findings.slice(page.offset, page.offset + page.limit)
128
+ : reportInput.findings
102
129
  return {
103
130
  run_id: reportInput.run_id,
104
131
  projectDir: reportInput.projectDir,
105
- findings: reportInput.findings.map(
106
- (f) => stripInternalKeys(f, FINDING_INTERNAL_KEYS) as ReportFinding,
107
- ),
132
+ findings: rawFindings.map((f) => stripInternalKeys(f, FINDING_INTERNAL_KEYS) as ReportFinding),
108
133
  toolsExecuted: reportInput.toolsExecuted.map(
109
134
  (t) => stripInternalKeys(t, TOOL_EXECUTION_INTERNAL_KEYS) as ReportToolExecution,
110
135
  ),
@@ -118,6 +143,13 @@ function buildCompactInput(reportInput: ReportInput): CompactReportInput {
118
143
  ...(reportInput.proxyContracts && { proxyContracts: reportInput.proxyContracts }),
119
144
  ...(reportInput.patternVersion && { patternVersion: reportInput.patternVersion }),
120
145
  ...(reportInput.skillsLoaded && { skillsLoaded: reportInput.skillsLoaded }),
146
+ ...(page && {
147
+ findingsPage: {
148
+ offset: page.offset,
149
+ limit: page.limit,
150
+ total: reportInput.findings.length,
151
+ },
152
+ }),
121
153
  }
122
154
  }
123
155
 
@@ -339,7 +371,8 @@ export async function executeReadFindings(
339
371
 
340
372
  const projectDir = resolveProjectDir(context)
341
373
  const reportInput = readAuditStateAsReportInput(projectDir, runId)
342
- const compactInput = buildCompactInput(reportInput)
374
+ const page = normalizePageArgs(args)
375
+ const compactInput = buildCompactInput(reportInput, page)
343
376
 
344
377
  const inlineJson = JSON.stringify({
345
378
  success: true,
@@ -383,6 +416,14 @@ export const readFindingsTool = tool({
383
416
  "Read the materialized ReportInput artifact from disk for a given run. Returns the canonical findings, tools executed, scope, and all enrichment data. Scribe should call this before generating the report.",
384
417
  args: {
385
418
  run_id: tool.schema.string().describe("The run ID to read findings for."),
419
+ findings_offset: tool.schema
420
+ .number()
421
+ .optional()
422
+ .describe("Optional zero-based finding offset for paged inline retrieval."),
423
+ findings_limit: tool.schema
424
+ .number()
425
+ .optional()
426
+ .describe("Optional finding page size for inline retrieval (1-500)."),
386
427
  },
387
428
  async execute(args, context) {
388
429
  return executeReadFindings(args, context)
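
Note on the paging arguments above: when neither `findings_offset` nor `findings_limit` is given, the full list is returned and no `findingsPage` block is emitted. Otherwise the defaults are offset 0 and limit 50, and the limit is capped at 500. The sketch below restates that logic on plain arrays and adds a caller loop. `normalizePage`, `pageOf`, and `collectAll` are hypothetical names, and the loop is an assumption about how a consumer might walk the pages.

```typescript
type Page = { offset: number; limit: number }
type PagedResult<T> = { items: T[]; findingsPage?: Page & { total: number } }

// Mirrors the validation rules shown in the diff.
function normalizePage(offset?: number, limit?: number): Page | null {
  if (offset == null && limit == null) return null
  const resolvedOffset = offset ?? 0
  const resolvedLimit = limit ?? 50
  if (!Number.isInteger(resolvedOffset) || resolvedOffset < 0) {
    throw new Error("findings_offset must be a non-negative integer")
  }
  if (!Number.isInteger(resolvedLimit) || resolvedLimit < 1 || resolvedLimit > 500) {
    throw new Error("findings_limit must be an integer between 1 and 500")
  }
  return { offset: resolvedOffset, limit: resolvedLimit }
}

// Slice one page and report the unpaged total alongside it.
function pageOf<T>(items: T[], page: Page | null): PagedResult<T> {
  if (!page) return { items }
  return {
    items: items.slice(page.offset, page.offset + page.limit),
    findingsPage: { ...page, total: items.length },
  }
}

// Hypothetical consumer: keep requesting pages until the reported total is covered.
function collectAll<T>(items: T[], limit: number): T[] {
  const collected: T[] = []
  for (let offset = 0; ; offset += limit) {
    const { items: chunk, findingsPage } = pageOf(items, normalizePage(offset, limit))
    collected.push(...chunk)
    if (!findingsPage || offset + limit >= findingsPage.total) break
  }
  return collected
}
```

Because `total` is the length of the unpaged list, a consumer can stop as soon as `offset + limit` reaches it without issuing an extra empty request.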
@@ -1,7 +1,7 @@
1
1
  import { type ToolContext, tool } from "@opencode-ai/plugin"
2
2
  import { isNonEmptyString } from "../shared/type-guards"
3
3
  import { normalizeToCanonicalFinding } from "../state/adapters"
4
- import { SCHEMA_VERSION } from "../state/schemas"
4
+ import { type CanonicalFinding, SCHEMA_VERSION } from "../state/schemas"
5
5
  import type { ArgusAgentName } from "../state/types"
6
6
 
7
7
  type RecordFindingArgs = {
@@ -12,20 +12,7 @@ type RecordFindingArgs = {
12
12
  type RecordFindingResponse = {
13
13
  success: boolean
14
14
  count: number
15
- findings: Array<{
16
- id: string
17
- check: string
18
- severity: string
19
- confidence: string
20
- file: string
21
- description: string
22
- lines: [number, number]
23
- source: string
24
- reported_by_agent: string
25
- impact?: string
26
- recommendation?: string
27
- proofOfConcept?: string
28
- }>
15
+ findings: CanonicalFinding[]
29
16
  schema_version: string
30
17
  note: string
31
18
  enrichment_warnings?: string[]
@@ -202,20 +189,7 @@ export async function executeRecordFinding(
202
189
  const response: RecordFindingResponse = {
203
190
  success: true,
204
191
  count: findings.length,
205
- findings: findings.map((f) => ({
206
- id: f.id,
207
- check: f.check,
208
- severity: f.severity,
209
- confidence: f.confidence,
210
- file: f.file,
211
- description: f.description,
212
- lines: f.lines,
213
- source: f.source,
214
- reported_by_agent: f.reported_by_agent,
215
- ...(f.impact !== undefined ? { impact: f.impact } : {}),
216
- ...(f.recommendation !== undefined ? { recommendation: f.recommendation } : {}),
217
- ...(f.proofOfConcept !== undefined ? { proofOfConcept: f.proofOfConcept } : {}),
218
- })),
192
+ findings,
219
193
  schema_version: SCHEMA_VERSION,
220
194
  note: "Findings recorded to event journal. The system assigns the canonical run_id automatically — use the run_id from <argus-context> for Scribe dispatch.",
221
195
  ...(enrichmentWarnings.length > 0
@@ -9,6 +9,7 @@ import { createAuditArtifactResolver } from "../shared/audit-artifact-resolver"
9
9
  import type { DropDiagnostic } from "../shared/drop-diagnostics"
10
10
  import { createDropDiagnosticsCollector } from "../shared/drop-diagnostics"
11
11
  import { computeMissingKeyTools } from "../shared/key-tools"
12
+ import { validateFindingLineage } from "../shared/lineage-validator"
12
13
  import { createLogger } from "../shared/logger"
13
14
  import { resolveProjectDir } from "../shared/project-utils"
14
15
  import { resolveReportPath } from "../shared/report-path-resolver"
@@ -38,6 +39,8 @@ type ReportGeneratorArgs = {
38
39
  preflight_policy?: PreflightPolicy
39
40
  tool_coverage_policy?: ToolCoveragePolicy
40
41
  run_id?: string
42
+ revision?: number
43
+ force?: boolean
41
44
  }
42
45
 
43
46
  type FindingsCount = {
@@ -131,6 +134,30 @@ async function checkDuplicateWrite(
131
134
  return null
132
135
  }
133
136
 
137
+ async function checkSafeForceOverwrite(
138
+ filePath: string,
139
+ runId: string,
140
+ ): Promise<{ code: string; message: string } | null> {
141
+ if (!existsSync(filePath)) return null
142
+ try {
143
+ const existingContent = await Bun.file(filePath).text()
144
+ const existingRunId = extractReportRunId(existingContent)
145
+ if (existingRunId === runId) return null
146
+ return {
147
+ code: "INSECURE_OVERWRITE_REFUSED",
148
+ message:
149
+ existingRunId == null
150
+ ? `Refusing to force overwrite ${filePath}: existing file has no Argus report metadata.`
151
+ : `Refusing to force overwrite ${filePath}: existing report belongs to run_id "${existingRunId}", not "${runId}".`,
152
+ }
153
+ } catch (err) {
154
+ return {
155
+ code: "INSECURE_OVERWRITE_REFUSED",
156
+ message: `Refusing to force overwrite ${filePath}: existing file could not be read (${err instanceof Error ? err.message : String(err)}).`,
157
+ }
158
+ }
159
+ }
160
+
134
161
  const SEVERITY_ORDER: FindingSeverity[] = ["Critical", "High", "Medium", "Low", "Informational"]
135
162
 
136
163
  const SEVERITY_PREFIX: Record<FindingSeverity, string> = {
@@ -767,6 +794,23 @@ function shouldIncludeFinding(finding: Finding, threshold: SeverityThreshold): b
767
794
  return FINDING_WEIGHT[finding.severity] >= THRESHOLD_WEIGHT[threshold]
768
795
  }
769
796
 
797
+ function normalizeScopePath(value: string): string {
798
+ return value.replace(/^\.\//, "").replace(/\/+$|\\+$/g, "")
799
+ }
800
+
801
+ function isFindingInScope(finding: Finding, scope: string[]): boolean {
802
+ if (scope.length === 0) return true
803
+ const file = normalizeScopePath(finding.file)
804
+ return scope.some((entry) => {
805
+ const scoped = normalizeScopePath(entry)
806
+ return file === scoped || file.startsWith(`${scoped}/`)
807
+ })
808
+ }
809
+
810
+ function collectOutOfScopeFindings(findings: Finding[], scope: string[]): Finding[] {
811
+ return findings.filter((finding) => !isFindingInScope(finding, scope))
812
+ }
813
+
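
Note on the scope helpers above: matching is by normalized path prefix with a `/` separator, so `src` covers `src/Vault.sol` but not `src-legacy/Old.sol`. The sketch below reuses the two regular expressions from the diff on plain file strings. `isFileInScope` is a hypothetical stand-in for `isFindingInScope`, which takes a `Finding` object instead.

```typescript
// Same normalization as in the diff: drop a leading "./" and any trailing slashes or backslashes.
function normalizeScopePath(value: string): string {
  return value.replace(/^\.\//, "").replace(/\/+$|\\+$/g, "")
}

// Hypothetical string-based variant of isFindingInScope.
function isFileInScope(file: string, scope: string[]): boolean {
  if (scope.length === 0) return true // an empty scope accepts everything
  const normalizedFile = normalizeScopePath(file)
  return scope.some((entry) => {
    const scoped = normalizeScopePath(entry)
    return normalizedFile === scoped || normalizedFile.startsWith(`${scoped}/`)
  })
}

const examples: Array<[string, string[]]> = [
  ["./src/Vault.sol", ["src/"]],
  ["src-legacy/Old.sol", ["src"]],
  ["lib/forge-std/Test.sol", ["src", "test"]],
  ["src/Vault.sol", ["./"]],
]
for (const [file, scope] of examples) {
  console.log(file, JSON.stringify(scope), isFileInScope(file, scope))
}
```

One consequence worth knowing: a scope entry of `./` normalizes to the empty string and therefore matches no relative file path under this logic, so every finding would be reported as out of scope. Internal backslashes are also not converted, so Windows-style finding paths would not match forward-slash scope entries.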
770
814
  function calculateCounts(findings: Finding[]): FindingsCount {
771
815
  const counts = emptyCounts()
772
816
 
@@ -877,31 +921,6 @@ function hasDedupLineage(findings: Finding[]): boolean {
877
921
  })
878
922
  }
879
923
 
880
- function observationIdsForFinding(finding: Finding): string[] {
881
- const observationIds = (finding as { observation_ids?: unknown }).observation_ids
882
- if (Array.isArray(observationIds)) {
883
- return observationIds.filter((id): id is string => typeof id === "string" && id.length > 0)
884
- }
885
- return typeof finding.observation_id === "string" && finding.observation_id.length > 0
886
- ? [finding.observation_id]
887
- : []
888
- }
889
-
890
- function compareObservationLineage(
891
- eventFindings: Finding[],
892
- reportFindings: Finding[],
893
- ): { missing: string[]; extra: string[]; matches: boolean } {
894
- const expected = new Set(eventFindings.flatMap(observationIdsForFinding))
895
- const actual = new Set(reportFindings.flatMap(observationIdsForFinding))
896
- const missing = Array.from(expected)
897
- .filter((id) => !actual.has(id))
898
- .sort((a, b) => a.localeCompare(b))
899
- const extra = Array.from(actual)
900
- .filter((id) => !expected.has(id))
901
- .sort((a, b) => a.localeCompare(b))
902
- return { missing, extra, matches: missing.length === 0 && extra.length === 0 }
903
- }
904
-
905
924
  export function validateReportQuality(
906
925
  findings: Finding[],
907
926
  policy: QualityGatePolicy,
@@ -1211,6 +1230,18 @@ export async function executeReportGeneration(
1211
1230
  const qualityGatePolicy = args.quality_gate_policy ?? "warn"
1212
1231
  const toolCoveragePolicy = args.tool_coverage_policy ?? "enforce"
1213
1232
  const expectedRunId = resolveExpectedRunId(args, context, deps)
1233
+ const invalidRegenerationOptions =
1234
+ args.force === true && args.revision != null
1235
+ ? {
1236
+ code: "INVALID_REGENERATION_OPTIONS",
1237
+ message: "force and revision must not both be set.",
1238
+ }
1239
+ : args.revision != null && (!Number.isInteger(args.revision) || args.revision < 2)
1240
+ ? {
1241
+ code: "INVALID_REGENERATION_OPTIONS",
1242
+ message: "revision must be an integer greater than or equal to 2.",
1243
+ }
1244
+ : null
1214
1245
 
1215
1246
  // Ensure report-input.json is materialized before attempting disk lookup.
1216
1247
  // Scribe may call generate_report without calling read_findings first,
@@ -1235,6 +1266,18 @@ export async function executeReportGeneration(
1235
1266
  const preflightPolicy = args.preflight_policy ?? "warn"
1236
1267
  let preflightWarningSection: string | null = null
1237
1268
  const warningBullets: string[] = []
1269
+ const state = reportInputToAuditState(reportInput)
1270
+ const scope = args.scope.length > 0 ? args.scope : reportInput.scope
1271
+ const finalFindings = dedupeFindingsForFinalOutput(reportInput.findings)
1272
+ const outOfScopeFindings = collectOutOfScopeFindings(finalFindings, scope)
1273
+ if (outOfScopeFindings.length > 0) {
1274
+ const locations = outOfScopeFindings.map(formatLocation).join(", ")
1275
+ const message = `findings outside audited scope: ${locations}`
1276
+ if (preflightPolicy === "strict-fail") {
1277
+ throw new Error(`Preflight failed (strict-fail): ${message}`)
1278
+ }
1279
+ warningBullets.push(`- ${message}`)
1280
+ }
1238
1281
 
1239
1282
  // Hard gate: refuse to generate a report if key audit tools have not been executed
1240
1283
  if (toolCoveragePolicy !== "skip") {
@@ -1285,11 +1328,24 @@ export async function executeReportGeneration(
1285
1328
  const inputFindings = dedupeFindingsForFinalOutput(reportInput.findings)
1286
1329
  const hasLineage = hasDedupLineage(reportInput.findings)
1287
1330
  const shouldCheckParity = eventFindings.length === inputFindings.length || hasLineage
1331
+ const lineage = hasLineage
1332
+ ? validateFindingLineage(projectFindings(events), reportInput.findings)
1333
+ : null
1288
1334
  const parity = shouldCheckParity
1289
- ? hasLineage
1290
- ? compareObservationLineage(projectFindings(events), reportInput.findings)
1291
- : compareIssueFingerprintSets(eventFindings, inputFindings)
1292
- : { missing: [], extra: [], matches: true }
1335
+ ? lineage
1336
+ ? {
1337
+ missing: lineage.missing_observation_ids,
1338
+ extra: lineage.phantom_observation_ids,
1339
+ duplicates: lineage.duplicate_observation_ids,
1340
+ countMismatches: lineage.count_mismatches,
1341
+ matches: lineage.valid,
1342
+ }
1343
+ : {
1344
+ ...compareIssueFingerprintSets(eventFindings, inputFindings),
1345
+ duplicates: [],
1346
+ countMismatches: [],
1347
+ }
1348
+ : { missing: [], extra: [], duplicates: [], countMismatches: [], matches: true }
1293
1349
 
1294
1350
  if (!shouldCheckParity) {
1295
1351
  const unverifiableSummary = `event_findings=${eventFindings.length}, report_findings=${inputFindings.length}`
@@ -1320,6 +1376,14 @@ export async function executeReportGeneration(
1320
1376
  if (parity.extra.length > 0) {
1321
1377
  warningBullets.push(`- Extra ${parityLabel}: ${parity.extra.join(", ")}`)
1322
1378
  }
1379
+ if (parity.duplicates.length > 0) {
1380
+ warningBullets.push(`- Duplicate ${parityLabel}: ${parity.duplicates.join(", ")}`)
1381
+ }
1382
+ if (parity.countMismatches.length > 0) {
1383
+ warningBullets.push(
1384
+ `- Observation count mismatches: ${parity.countMismatches.map((item) => item.check).join(", ")}`,
1385
+ )
1386
+ }
1323
1387
  }
1324
1388
  } catch (err) {
1325
1389
  if (err instanceof Error && err.message.startsWith("Preflight failed (strict-fail)")) {
@@ -1341,9 +1405,6 @@ export async function executeReportGeneration(
1341
1405
  ].join("\n")
1342
1406
  }
1343
1407
 
1344
- const state = reportInputToAuditState(reportInput)
1345
- const scope = args.scope.length > 0 ? args.scope : reportInput.scope
1346
- const finalFindings = dedupeFindingsForFinalOutput(reportInput.findings)
1347
1408
  const findings = sortFindingsDeterministically(
1348
1409
  finalFindings.filter((finding) => shouldIncludeFinding(finding, threshold)),
1349
1410
  )
@@ -1442,6 +1503,7 @@ export async function executeReportGeneration(
1442
1503
  date: new Date(auditDate),
1443
1504
  outputDir: ".opencode/reports/",
1444
1505
  runId: runId || undefined,
1506
+ revision: args.revision,
1445
1507
  })
1446
1508
 
1447
1509
  const result: ReportGenerationResult = {
@@ -1454,6 +1516,11 @@ export async function executeReportGeneration(
1454
1516
  contractDiagnostics: diagnostics,
1455
1517
  }
1456
1518
 
1519
+ if (invalidRegenerationOptions) {
1520
+ result.error = invalidRegenerationOptions
1521
+ return result
1522
+ }
1523
+
1457
1524
  try {
1458
1525
  const loadConfig = deps.loadConfig ?? loadArgusConfig
1459
1526
  const projectDir = resolveProjectDir(context)
@@ -1468,14 +1535,28 @@ export async function executeReportGeneration(
1468
1535
  }
1469
1536
  return result
1470
1537
  }
1471
- const fullPath = path.join(resolvedOutput, canonicalFilename)
1538
+ const { filePath: fullPath } = resolveReportPath({
1539
+ contractName: args.project_name,
1540
+ date: new Date(auditDate),
1541
+ outputDir: resolvedOutput,
1542
+ runId: runId || undefined,
1543
+ revision: args.revision,
1544
+ })
1472
1545
 
1473
1546
  // Single-writer policy: check for duplicate writes with same run_id
1474
1547
  if (runId) {
1475
- const duplicateError = await checkDuplicateWrite(fullPath, runId)
1476
- if (duplicateError) {
1477
- result.error = duplicateError
1478
- return result
1548
+ if (args.force === true) {
1549
+ const forceError = await checkSafeForceOverwrite(fullPath, runId)
1550
+ if (forceError) {
1551
+ result.error = forceError
1552
+ return result
1553
+ }
1554
+ } else {
1555
+ const duplicateError = await checkDuplicateWrite(fullPath, runId)
1556
+ if (duplicateError) {
1557
+ result.error = duplicateError
1558
+ return result
1559
+ }
1479
1560
  }
1480
1561
  }
1481
1562
 
@@ -1505,6 +1586,10 @@ export const reportGeneratorTool = tool({
1505
1586
  .enum(["critical", "high", "medium", "low", "informational"])
1506
1587
  .default("informational"),
1507
1588
  preflight_policy: tool.schema.enum(["warn", "strict-fail"]).optional(),
1589
+ quality_gate_policy: tool.schema
1590
+ .enum(["warn", "strict-fail"])
1591
+ .optional()
1592
+ .describe("Controls whether report quality gate violations warn or fail generation."),
1508
1593
  tool_coverage_policy: tool.schema
1509
1594
  .enum(["enforce", "warn", "skip"])
1510
1595
  .optional()
@@ -1518,6 +1603,18 @@ export const reportGeneratorTool = tool({
1518
1603
  .describe(
1519
1604
  "The canonical run ID from <argus-context>. The tool reads the materialized report-input.json from disk using this ID.",
1520
1605
  ),
1606
+ revision: tool.schema
1607
+ .number()
1608
+ .optional()
1609
+ .describe(
1610
+ "Caller-supplied report revision. Must be an integer >= 2 and writes a -r{revision} file.",
1611
+ ),
1612
+ force: tool.schema
1613
+ .boolean()
1614
+ .optional()
1615
+ .describe(
1616
+ "Overwrite only the base canonical report path when existing Argus metadata matches the same run_id.",
1617
+ ),
1521
1618
  },
1522
1619
  async execute(args, context) {
1523
1620
  const result = await executeReportGeneration(args, context)
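
Note on the regeneration options above: `force` and `revision` are mutually exclusive, `revision` must be an integer of at least 2, and a forced overwrite is refused unless the existing file carries the same `run_id`. The sketch below models both checks as pure functions so they can be reasoned about without file I/O. `ExistingReport`, `validateRegenerationOptions`, and `checkForceOverwrite` are hypothetical names; the real code reads the file with `Bun.file` and extracts the run ID with `extractReportRunId`, neither of which is reproduced here.

```typescript
type ToolError = { code: string; message: string }

// Mirrors the nested ternary in executeReportGeneration.
function validateRegenerationOptions(force?: boolean, revision?: number): ToolError | null {
  if (force === true && revision != null) {
    return {
      code: "INVALID_REGENERATION_OPTIONS",
      message: "force and revision must not both be set.",
    }
  }
  if (revision != null && (!Number.isInteger(revision) || revision < 2)) {
    return {
      code: "INVALID_REGENERATION_OPTIONS",
      message: "revision must be an integer greater than or equal to 2.",
    }
  }
  return null
}

// Hypothetical stand-in for "what is currently on disk at the report path".
type ExistingReport = { exists: false } | { exists: true; runId: string | null }

// Allow the overwrite only when nothing exists or the run IDs match.
function checkForceOverwrite(existing: ExistingReport, runId: string): ToolError | null {
  if (!existing.exists) return null
  if (existing.runId === runId) return null
  return {
    code: "INSECURE_OVERWRITE_REFUSED",
    message:
      existing.runId == null
        ? "Refusing to force overwrite: existing file has no Argus report metadata."
        : `Refusing to force overwrite: existing report belongs to run_id "${existing.runId}", not "${runId}".`,
  }
}
```

The real `checkSafeForceOverwrite` also refuses when the existing file cannot be read, which this pure model leaves out.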
@@ -1,5 +1,13 @@
1
1
  import { createHash } from "node:crypto"
2
- import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs"
2
+ import {
3
+ existsSync,
4
+ mkdtempSync,
5
+ readdirSync,
6
+ readFileSync,
7
+ rmSync,
8
+ statSync,
9
+ writeFileSync,
10
+ } from "node:fs"
3
11
  import { tmpdir } from "node:os"
4
12
  import { dirname, isAbsolute, join, resolve } from "node:path"
5
13
  import { type ToolContext, tool } from "@opencode-ai/plugin"
@@ -63,6 +71,8 @@ export type SlitherAnalyzeResult = {
63
71
  executionTime: number
64
72
  errors: string[]
65
73
  error?: string
74
+ hint?: string
75
+ suggested_command?: string
66
76
  }
67
77
 
68
78
  function mapSeverity(impact?: string): FindingSeverity {
@@ -151,6 +161,50 @@ function shouldTryFlattenFallback(errors: string[], stderr: string): boolean {
151
161
  return FALLBACK_TRIGGERS.some((trigger) => combined.includes(trigger))
152
162
  }
153
163
 
164
+ function isMixedPragmaSlitherFailure(errors: string[], stderr: string): boolean {
165
+ const combined = [...errors, stderr].join(" ")
166
+ return (
167
+ /(CryticCompileError|Slither exited with code 1)/i.test(combined) &&
168
+ /(solc|pragma|requires different compiler version|different compiler version|compiler version)/i.test(
169
+ combined,
170
+ )
171
+ )
172
+ }
173
+
174
+ function containsSolidityFile(dir: string): boolean {
175
+ try {
176
+ for (const entry of readdirSync(dir)) {
177
+ const fullPath = join(dir, entry)
178
+ const stat = statSync(fullPath)
179
+ if (stat.isFile() && entry.endsWith(".sol")) return true
180
+ if (stat.isDirectory() && containsSolidityFile(fullPath)) return true
181
+ }
182
+ } catch {
183
+ return false
184
+ }
185
+ return false
186
+ }
187
+
188
+ function mixedPragmaDiagnostics(
189
+ args: SlitherArgs,
190
+ projectDir: string,
191
+ errors: string[],
192
+ stderr: string,
193
+ ): Pick<SlitherAnalyzeResult, "hint" | "suggested_command"> | undefined {
194
+ if (!isMixedPragmaSlitherFailure(errors, stderr)) return undefined
195
+
196
+ const target = resolve(projectDir, args.target)
197
+ const srcCandidate = join(target, "src")
198
+ const suggestion =
199
+ existsSync(srcCandidate) && containsSolidityFile(srcCandidate) ? srcCandidate : undefined
200
+ return {
201
+ hint: "Try narrowing target to a single-pragma subdirectory and check foundry.toml/remappings for mixed compiler or vendored dependency scope issues.",
202
+ suggested_command: suggestion
203
+ ? buildCommand({ ...args, target: suggestion }).join(" ")
204
+ : undefined,
205
+ }
206
+ }
207
+
154
208
  const parseSolcVersion = parseSolcVersionShared
155
209
  const extractContractNames = extractContractNamesShared
156
210
  const hasBinary = hasBinaryShared
@@ -488,7 +542,8 @@ export async function executeSlitherAnalyze(
488
542
  payload = JSON.parse(runResult.stdout) as SlitherPayload
489
543
  } catch (error) {
490
544
  const message = error instanceof Error ? error.message : "Unknown parse error"
491
- if (args.via_ir || shouldTryFlattenFallback(errors, runResult.stderr)) {
545
+ const diagnostics = mixedPragmaDiagnostics(args, projectDir, errors, runResult.stderr)
546
+ if (!diagnostics && (args.via_ir || shouldTryFlattenFallback(errors, runResult.stderr))) {
492
547
  const fallbackResult = await flattenFallback(args, context, {
493
548
  ...getDefaultFlattenDeps(),
494
549
  runCommand,
@@ -503,6 +558,7 @@ export async function executeSlitherAnalyze(
503
558
  executionTime: Date.now() - startedAt,
504
559
  errors,
505
560
  error: `Slither output parse error: ${message}`,
561
+ ...diagnostics,
506
562
  }
507
563
  }
508
564
 
@@ -513,9 +569,12 @@ export async function executeSlitherAnalyze(
513
569
  const findings = parseFindings(payload)
514
570
  const success = findings.length > 0 || (runResult.exitCode === 0 && payload.success !== false)
515
571
 
572
+ const diagnostics = mixedPragmaDiagnostics(args, projectDir, errors, runResult.stderr)
573
+
516
574
  if (
517
575
  !success &&
518
576
  findings.length === 0 &&
577
+ !diagnostics &&
519
578
  (args.via_ir || shouldTryFlattenFallback(errors, runResult.stderr))
520
579
  ) {
521
580
  const fallbackResult = await flattenFallback(args, context, {
@@ -532,6 +591,7 @@ export async function executeSlitherAnalyze(
532
591
  findings,
533
592
  executionTime: Date.now() - startedAt,
534
593
  errors,
594
+ ...diagnostics,
535
595
  }
536
596
  } catch (error) {
537
597
  const message = error instanceof Error ? error.message : "Unknown error"
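
Note on the Slither changes above: once a failure is classified as a mixed-pragma failure, the flatten fallback is skipped and the result carries `hint` and `suggested_command` instead. The sketch below restates the classifier with the two patterns from the diff and models the gating as a small decision function. `nextStep` and its `fallbackTriggered` parameter are hypothetical; they stand in for the `!diagnostics && (args.via_ir || shouldTryFlattenFallback(...))` condition.

```typescript
// Same two-pattern test as isMixedPragmaSlitherFailure in the diff.
function isMixedPragmaFailure(errors: string[], stderr: string): boolean {
  const combined = [...errors, stderr].join(" ")
  return (
    /(CryticCompileError|Slither exited with code 1)/i.test(combined) &&
    /(solc|pragma|requires different compiler version|different compiler version|compiler version)/i.test(
      combined,
    )
  )
}

type Step = "report-diagnostics" | "flatten-fallback" | "fail"

// Hypothetical decision function modeling the order of checks after a failed run.
function nextStep(
  errors: string[],
  stderr: string,
  viaIr: boolean,
  fallbackTriggered: boolean, // stands in for shouldTryFlattenFallback(errors, stderr)
): Step {
  if (isMixedPragmaFailure(errors, stderr)) return "report-diagnostics"
  if (viaIr || fallbackTriggered) return "flatten-fallback"
  return "fail"
}
```

Because the second pattern includes the bare alternative `solc`, any crytic-compile failure whose output mentions `solc` is classified as mixed-pragma under this logic, even when `via_ir` is set. That makes the diagnostics path take priority over the flatten fallback in those cases.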