solidity-argus 0.5.7 → 0.5.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AGENTS.md CHANGED
@@ -12,7 +12,7 @@ CLI: `argus doctor`, `argus init`, `argus install`.
 
  **Role**: Primary security audit orchestrator
  **Description**: Argus Panoptes, the All-Seeing Guardian. Coordinates full Solidity security audits by dispatching Sentinel (analysis), Pythia (research), Scribe (reporting), and Themis (validation). Follows a rigorous 7-step methodology: Reconnaissance, Automated Scanning, Manual Review, Attack Surface Mapping, Vulnerability Research, Testing & Verification, and Reporting.
- **Model**: anthropic/claude-opus-4-6
+ **Model**: anthropic/claude-opus-4-7
  **Tools**: 14 orchestrator-accessible argus_* tools (argus_slither_analyze, argus_analyze_contract, argus_check_patterns, argus_proxy_detection, argus_solodit_search, argus_forge_test, argus_gas_analysis, argus_forge_fuzz, argus_forge_coverage, argus_skill_load, argus_generate_report, argus_record_finding, argus_read_findings, argus_sync_knowledge). `argus_persist_deduped` is reserved for Scribe.
 
  ## sentinel
@@ -39,6 +39,6 @@ CLI: `argus doctor`, `argus init`, `argus install`.
  ## themis
 
  **Role**: Audit quality gate
- **Description**: Independent cross-validation agent running on GPT-5.4 (different LLM provider for reasoning diversity). Validates pipeline integrity: compares raw findings against Scribe's deduped output and the final report. Performs second-opinion research via Solodit and vulnerability skill checklists. Returns a structured verdict to Argus who makes the final decision. Dispatched by Argus after Scribe completes.
- **Model**: openai/gpt-5.4
+ **Description**: Independent cross-validation agent running on GPT-5.5 (different LLM provider for reasoning diversity). Validates pipeline integrity: compares raw findings against Scribe's deduped output and the final report. Performs second-opinion research via Solodit and vulnerability skill checklists. Returns a structured verdict to Argus who makes the final decision. Dispatched by Argus after Scribe completes.
+ **Model**: openai/gpt-5.5
  **Tools**: argus_read_findings, argus_solodit_search, argus_check_patterns, argus_skill_load, skill
package/README.md CHANGED
@@ -65,11 +65,11 @@ Argus will automatically:
 
  | Agent | Role | Model |
  |-------|------|-------|
- | `@argus` | Orchestrator — coordinates the full audit | claude-opus-4-6 |
+ | `@argus` | Orchestrator — coordinates the full audit | claude-opus-4-7 |
  | `@sentinel` | Static analysis & testing specialist | claude-sonnet-4-6 |
  | `@pythia` | Vulnerability researcher | claude-sonnet-4-6 |
  | `@scribe` | Audit report writer | claude-sonnet-4-6 |
- | `@themis` | Independent audit quality gate | gpt-5.4 |
+ | `@themis` | Independent audit quality gate | gpt-5.5 |
 
  ### @argus — The Orchestrator
  Argus Panoptes is the lead auditor. It follows a 7-step methodology (Reconnaissance, Automated Scanning, Manual Review, Attack Surface Mapping, Vulnerability Research, Testing & Verification, Reporting) and delegates to Sentinel, Pythia, Scribe, and Themis as needed.
@@ -284,11 +284,11 @@ Create `.argus/solidity-argus.jsonc` in your project root. `.opencode/solidity-a
  ```jsonc
  {
  "agents": {
- "argus": { "model": "anthropic/claude-opus-4-6" },
+ "argus": { "model": "anthropic/claude-opus-4-7" },
  "sentinel": { "model": "anthropic/claude-sonnet-4-6" },
  "pythia": { "model": "anthropic/claude-sonnet-4-6" },
  "scribe": { "model": "anthropic/claude-sonnet-4-6" },
- "themis": { "model": "openai/gpt-5.4" }
+ "themis": { "model": "openai/gpt-5.5" }
  },
 
  "tools": {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "solidity-argus",
- "version": "0.5.7",
+ "version": "0.5.9",
  "description": "Solidity smart contract security auditing plugin for OpenCode — 5 specialized agents, 15 tools (14 core + optional Solodit), and a curated vulnerability knowledge base",
  "keywords": [
  "solidity",
@@ -229,7 +229,7 @@ Task(subagent_type="scribe", prompt="Generate the final audit report for Project
  - **Constraint**: Only invoke Scribe after all analysis and testing are complete.
 
  ### **@themis** (The Quality Gate)
- - **Role**: Independent audit validation using a different LLM provider (GPT-5.4).
+ - **Role**: Independent audit validation using a different LLM provider (GPT-5.5).
  - **Tools**: \`argus_read_findings\`, \`argus_solodit_search\`, \`argus_check_patterns\`, \`argus_skill_load\`
  - **Delegation Examples**:
  \`\`\`
@@ -255,7 +255,7 @@ When building the final report or synthesizing findings:
  2. **Secondary source**: Tool transcript text (use only when durable evidence is unavailable or incomplete).
  3. **Never** synthesize findings from ephemeral background transcript retrieval alone if durable state evidence exists.
  4. **Manual-finding durability**: If Argus, Sentinel, or Pythia identifies a finding outside analyzer tool payloads, they must call \
- \`argus_record_finding\` before proceeding. The JSON payload MUST include \`impact\`, \`recommendation\`, and (for Critical/High) \`proofOfConcept\` fields.
+ \`argus_record_finding\` before proceeding. The JSON payload should include \`impact\`, \`recommendation\`, and \`proofOfConcept\` fields whenever they are known. Missing enrichment is recorded with warnings rather than rejected, but Scribe must enrich final Critical/High findings before reporting.
  5. **Report parity rule**: Scribe must not include findings in \`report_input\` unless they are event-backed (recorded via tools/events).
 
  **Bounded background fan-out**: For deep audits, limit concurrent high-context background delegations to max 2 at a time. Split larger workloads into sequential waves. This prevents retrieval blind spots from simultaneous long-running tasks.
@@ -365,7 +365,7 @@ Your subagents have access to these specialized tools. Know when to delegate eac
  "proofOfConcept": "Steps to reproduce or reference to PoC test"
  }
  \`\`\`
- - **CRITICAL**: For Critical and High findings, \`impact\`, \`recommendation\`, and \`proofOfConcept\` are MANDATORY. The quality gate will flag findings missing these fields. Preferred field names: \`check\`, \`file\`, \`lines\`. The aliases \`title\`/\`name\` → \`check\` and \`location\` → \`file\` are accepted but canonical names are preferred. Instruct Sentinel and Pythia accordingly when delegating.
+ - **CRITICAL**: For Critical and High final report findings, \`impact\`, \`recommendation\`, and \`proofOfConcept\` are MANDATORY. For any finding with \`source: "slither"\`, preserve the finding even when enrichment is not ready, but add these three fields before final Scribe persistence whenever possible. \`argus_record_finding\` warns on incomplete Slither enrichment instead of dropping the finding. Preferred field names: \`check\`, \`file\`, \`lines\`. The aliases \`title\`/\`name\` → \`check\` and \`location\` → \`file\` are accepted but canonical names are preferred. Instruct Sentinel and Pythia accordingly when delegating.
 
  - **\`argus_sync_knowledge\`**:
  - **Use**: Maintenance.
@@ -527,7 +527,7 @@ Scope: {list of audited files}
 
  STEPS:
  1. Call argus_read_findings with run_id above to load all findings
- 2. Deduplicate: group findings by vulnerability class + code location, merge into single entries
+ 2. Deduplicate: group findings by vulnerability class + code location, merge into single entries. Include \`observation_ids\` on every deduped finding so each raw finding maps to exactly one report entry.
  3. Enrich: for each Critical/High finding, write specific impact and recommendation
  4. Call argus_persist_deduped with run_id and your deduped findings array — this writes the source-of-truth JSON to disk
  5. Call argus_generate_report with run_id, project_name, and scope — the tool reads deduped findings from disk
@@ -538,7 +538,7 @@ Overall risk assessment: {your assessment}
 
  Scribe will:
  1. Read raw findings (may contain duplicates from different tools)
- 2. Semantically deduplicate (e.g., merge reentrancy-eth + reentrancy-cei-violation at same location)
+ 2. Semantically deduplicate (e.g., merge reentrancy-eth + reentrancy-cei-violation at same location) while preserving \`observation_ids\` lineage for every raw finding
  3. Enrich Critical/High findings with specific impact and recommendation text
  4. Persist deduped findings to disk via \`argus_persist_deduped\` (source-of-truth JSON)
  5. Call \`argus_generate_report\` with \`run_id\` — the tool reads from disk and renders markdown
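Step 2 of the Scribe instructions can be illustrated with a small grouping pass: group by vulnerability class plus code location, merge, and keep `observation_ids`. This is a sketch, not Scribe's logic. Scribe deduplicates semantically through the model, whereas the code below uses a hypothetical hard-coded `CLASS_OF` table to map check names to classes. The `RawFinding` and `DedupedFinding` shapes and the `dedupe` function are illustrative names, not package exports.

```typescript
// Illustrative shapes; field names follow the prompt text, not the package's types.
type RawFinding = {
  observation_id: string
  check: string
  file: string
  lines: [number, number]
  source: string
}

type DedupedFinding = {
  check: string
  file: string
  lines: [number, number]
  observation_ids: string[]
  observation_count: number
  sources: string[]
}

// Hypothetical mapping from detector names to a shared vulnerability class.
// Checks not listed keep their own name, so they never merge with other classes.
const CLASS_OF: Record<string, string> = {
  "reentrancy-eth": "reentrancy",
  "reentrancy-cei-violation": "reentrancy",
  "reentrancy-eth-withdraw-state-after-call": "reentrancy",
}

function dedupe(raw: RawFinding[]): DedupedFinding[] {
  const groups = new Map<string, DedupedFinding>()
  for (const f of raw) {
    const vulnClass = CLASS_OF[f.check] ?? f.check
    // Group key: vulnerability class + code location.
    const key = `${vulnClass}|${f.file}|${f.lines[0]}-${f.lines[1]}`
    const group = groups.get(key)
    if (group) {
      // Merge into the existing entry, keeping lineage for the raw finding.
      group.observation_ids.push(f.observation_id)
      group.observation_count += 1
      if (!group.sources.includes(f.source)) group.sources.push(f.source)
    } else {
      groups.set(key, {
        check: vulnClass,
        file: f.file,
        lines: f.lines,
        observation_ids: [f.observation_id],
        observation_count: 1,
        sources: [f.source],
      })
    }
  }
  return Array.from(groups.values())
}
```

Checks that are absent from the table keep their own name as the class. Distinct classes such as `default-visibility` and `dos-revert` therefore stay separate, which is what the Scribe PRESERVATION RULE asks for.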
@@ -103,11 +103,12 @@ You have two primary tools. Master them.
  "lines": [startLine, endLine],
  "source": "manual",
  "impact": "Specific impact based on the historical precedent (e.g., 'Total vault drain via flash loan, similar to $X loss in Protocol Y')",
- "recommendation": "Specific mitigation from the precedent audit report"
+ "recommendation": "Specific mitigation from the precedent audit report",
+ "proofOfConcept": "Steps to reproduce, exploit sketch, or reference to the historical exploit/audit evidence"
  }
  \`\`\`
 
- **CRITICAL**: For Critical and High findings, \`impact\` and \`recommendation\` are MANDATORY. The quality gate will flag findings missing these fields. Use your Solodit research to write specific, precedent-backed impact and recommendation text — not generic placeholders.
+ **CRITICAL**: For Critical and High final report findings, \`impact\`, \`recommendation\`, and \`proofOfConcept\` are MANDATORY. \`argus_record_finding\` preserves incomplete findings with warnings rather than dropping them, but Scribe must enrich them before final reporting. Use your Solodit research to write specific, precedent-backed impact, recommendation, and proof-of-concept text — not generic placeholders.
 
  **Interpretation**:
  - A finding is not report-ready until it has been recorded through this tool.
@@ -53,6 +53,7 @@ Argus provides you with a \`run_id\`. Your job: read findings, deduplicate, enri
  - Add "**Detected by:**" listing all tools/checks that flagged it
  - Example: reentrancy-eth + reentrancy-cei-violation + reentrancy-eth-withdraw-state-after-call at VulnerableVault.sol:18-23 → ONE finding
  - **PRESERVATION RULE**: Every raw finding MUST map to exactly one deduped finding. Only merge findings that are genuinely the SAME vulnerability at the SAME location. Different vulnerability classes (e.g., default-visibility vs dos-revert) are SEPARATE findings even if both are Informational. NEVER drop findings during deduplication.
+ - **LINEAGE RULE**: Every deduped finding MUST include \`observation_ids\` containing each raw finding's \`observation_id\`, plus \`observation_count\`, \`sources\`, and \`reported_by_agents\` when available. This lets \`argus_generate_report\` prove raw-to-deduped parity instead of emitting a "Finding parity not verifiable" warning.
 
  3. **Enrich** (MANDATORY for Critical/High):
  - Write specific \`impact\` (concrete consequence, not "could be exploited")
@@ -61,7 +62,7 @@ Argus provides you with a \`run_id\`. Your job: read findings, deduplicate, enri
 
  4. **Persist deduped findings**: Call \`argus_persist_deduped\` with:
  - \`run_id\`: the run ID from Argus
- - \`deduped_findings\`: JSON array of your deduped and enriched findings
+ - \`deduped_findings\`: JSON array of your deduped and enriched findings, including \`observation_ids\` lineage for every merged raw observation
 
  This writes the source-of-truth JSON to disk at \`.argus/runs/{run_id}/deduped-findings.json\`.
 
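Taken together, the LINEAGE and PRESERVATION rules give a checkable invariant: every raw `observation_id` appears in exactly one deduped finding's `observation_ids`. The sketch below is a minimal parity check under that reading. `checkParity` and its result shape are hypothetical, and the diff does not show how `argus_generate_report` actually performs its parity verification.

```typescript
// Only observation_ids matters for this check; the shape is illustrative.
type DedupedEntry = { check: string; observation_ids?: string[] }

type ParityResult = {
  ok: boolean
  missing: string[] // raw ids that no deduped entry claims (dropped findings)
  duplicated: string[] // raw ids claimed by more than one deduped entry
  unknown: string[] // claimed ids that match no raw finding
}

function checkParity(rawIds: string[], deduped: DedupedEntry[]): ParityResult {
  // Count how many deduped entries claim each observation id.
  const claims = new Map<string, number>()
  for (const entry of deduped) {
    for (const id of entry.observation_ids ?? []) {
      claims.set(id, (claims.get(id) ?? 0) + 1)
    }
  }
  const raw = new Set(rawIds)
  const missing = rawIds.filter((id) => !claims.has(id))
  const duplicated = Array.from(claims)
    .filter(([, n]) => n > 1)
    .map(([id]) => id)
  const unknown = Array.from(claims.keys()).filter((id) => !raw.has(id))
  return {
    ok: missing.length === 0 && duplicated.length === 0 && unknown.length === 0,
    missing,
    duplicated,
    unknown,
  }
}
```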
@@ -151,7 +151,7 @@ You have access to a specific set of tools. Use them effectively.
  }
  \`\`\`
 
- **CRITICAL**: For Critical and High findings, \`impact\`, \`recommendation\`, and \`proofOfConcept\` are MANDATORY. The quality gate will flag findings missing these fields. Do not use generic placeholders — be specific to the vulnerability.
+ **CRITICAL**: For Critical and High findings, \`impact\`, \`recommendation\`, and \`proofOfConcept\` are MANDATORY. For any finding with \`source: "slither"\`, preserve the finding even when enrichment is not ready, but add these three fields before final Scribe persistence whenever possible. \`argus_record_finding\` warns on incomplete Slither enrichment instead of dropping the finding. Do not use generic placeholders — be specific to the vulnerability.
 
  **Interpretation**:
  - Recording is mandatory before handing findings to Argus for final synthesis.
@@ -5,7 +5,7 @@ export const THEMIS_PROMPT = `You are **Themis**, the Quality Gate of Argus Pano
  You are the final validation and review agent in the audit pipeline. You do not run the full audit from scratch and you do not write the final report. You verify that the pipeline output is complete, consistent, and defensible.
 
  Model context:
- - You run on **OpenAI GPT-5.4-pro**.
+ - You run on **OpenAI GPT-5.5**.
  - This is intentionally a different provider than the other Argus agents (Claude) to increase reasoning diversity for final quality checks.
 
  Your core responsibilities are:
@@ -1,5 +1,6 @@
  import { existsSync, readdirSync, readFileSync } from "node:fs"
- import { basename, dirname, extname, join } from "node:path"
+ import { homedir } from "node:os"
+ import { basename, dirname, extname, join, resolve } from "node:path"
  import { loadArgusConfig } from "../../config/loader"
  import type { ArgusConfig } from "../../config/types"
  import { createLogger } from "../../shared/logger"
@@ -12,6 +13,8 @@ import {
  } from "../../skills/argus-skill-resolver"
  import { parseFrontmatter, validateSkillFrontmatter } from "../../skills/skill-schema"
  import { detectViaIr } from "../../tools/slither-tool"
+ import { DEFAULT_SOLODIT_PORT } from "../../tools/solodit-search-tool"
+ import { checkSoloditHealth } from "../../utils/solodit-health"
  import { cliOutput } from "../cli-output"
  import type { CliCommand } from "../types"
 
@@ -133,6 +136,143 @@ export function buildSkillHealthReport(
  }
  }
 
+ // ─────────────────────────────────────────────────────────────────────────────
+ // Install-drift detection
+ //
+ // OpenCode's plugin resolver walks up the filesystem looking up `node_modules`
+ // directories. A stale copy of solidity-argus hoisted to a higher-precedence
+ // location (typically `~/.cache/opencode/node_modules/solidity-argus`) will
+ // SHADOW the canonical install under `~/.cache/opencode/packages/...`. The
+ // shadowing install is loaded silently, leading to confusing failures like
+ // `undefined is not an object (evaluating 'result.toLowerCase')` on every MCP
+ // call (older versions lacked defensive guards in `tool.execute.after`).
+ //
+ // This check enumerates known install locations and flags drift.
+ // ─────────────────────────────────────────────────────────────────────────────
+
+ export type ArgusInstallSource =
+ | "current"
+ | "hoisted-cache"
+ | "package-cache"
+ | "user-config"
+ | "project-local"
+
+ export type ArgusInstall = {
+ source: ArgusInstallSource
+ path: string
+ version: string | null
+ }
+
+ export type InstallDriftReport = {
+ current: ArgusInstall | null
+ installs: ArgusInstall[]
+ errors: string[]
+ warnings: string[]
+ }
+
+ function readPackageVersion(packageRoot: string): string | null {
+ try {
+ const raw = readFileSync(join(packageRoot, "package.json"), "utf8")
+ const parsed = JSON.parse(raw) as { version?: unknown }
+ return typeof parsed.version === "string" ? parsed.version : null
+ } catch {
+ return null
+ }
+ }
+
+ function getCurrentArgusInstall(): ArgusInstall | null {
+ // doctor.ts lives at <packageRoot>/src/cli/commands/doctor.ts
+ const packageRoot = resolve(import.meta.dir, "../../..")
+ if (!existsSync(join(packageRoot, "package.json"))) return null
+ const version = readPackageVersion(packageRoot)
+ return { source: "current", path: packageRoot, version }
+ }
+
+ export function enumerateArgusInstallCandidates(
+ cwd: string,
+ home: string,
+ ): Array<{ source: ArgusInstallSource; path: string }> {
+ return [
+ {
+ source: "hoisted-cache",
+ path: join(home, ".cache", "opencode", "node_modules", "solidity-argus"),
+ },
+ {
+ source: "package-cache",
+ path: join(
+ home,
+ ".cache",
+ "opencode",
+ "packages",
+ "solidity-argus@latest",
+ "node_modules",
+ "solidity-argus",
+ ),
+ },
+ {
+ source: "user-config",
+ path: join(home, ".config", "opencode", "node_modules", "solidity-argus"),
+ },
+ {
+ source: "project-local",
+ path: join(cwd, "node_modules", "solidity-argus"),
+ },
+ ]
+ }
+
+ function findArgusInstalls(cwd: string, home: string): ArgusInstall[] {
+ const installs: ArgusInstall[] = []
+ for (const { source, path } of enumerateArgusInstallCandidates(cwd, home)) {
+ if (existsSync(path)) {
+ installs.push({ source, path, version: readPackageVersion(path) })
+ }
+ }
+ return installs
+ }
+
+ export function detectInstallDrift(
+ current: ArgusInstall | null,
+ installs: ArgusInstall[],
+ ): { errors: string[]; warnings: string[] } {
+ const errors: string[] = []
+ const warnings: string[] = []
+
+ const hoisted = installs.find((i) => i.source === "hoisted-cache")
+ const pkgCache = installs.find((i) => i.source === "package-cache")
+
+ // Highest-confidence error: hoisted cache shadows the canonical cache with a
+ // DIFFERENT version. OpenCode will load the wrong one.
+ if (hoisted && pkgCache && hoisted.version !== pkgCache.version) {
+ errors.push(
+ `Stale install shadowing canonical version:\n` +
+ ` ${hoisted.path} (v${hoisted.version ?? "unknown"})\n` +
+ ` shadows ${pkgCache.path} (v${pkgCache.version ?? "unknown"}).\n` +
+ ` OpenCode will load v${hoisted.version ?? "unknown"} instead of v${pkgCache.version ?? "unknown"}.\n` +
+ ` Fix: rm -rf "${hoisted.path}"`,
+ )
+ return { errors, warnings }
+ }
+
+ // Lower-confidence: hoisted install drifts from the version the doctor CLI
+ // is itself running as (typical when the user upgraded via bunx/opencode).
+ if (hoisted && current?.version && hoisted.version && hoisted.version !== current.version) {
+ warnings.push(
+ `Possible stale install (drift from running version):\n` +
+ ` ${hoisted.path} (v${hoisted.version}) differs from current (v${current.version}).\n` +
+ ` Fix: rm -rf "${hoisted.path}"`,
+ )
+ }
+
+ return { errors, warnings }
+ }
+
+ export function buildInstallDriftReport(cwd: string, home: string): InstallDriftReport {
+ const current = getCurrentArgusInstall()
+ const installs = findArgusInstalls(cwd, home)
+ const { errors, warnings } = detectInstallDrift(current, installs)
+ return { current, installs, errors, warnings }
+ }
+
  const NON_SKILL_FILENAMES = new Set(["README.md", "INVENTORY.md", "CHANGELOG.md", "LICENSE.md"])
 
  function scanMarkdownFiles(dir: string, maxDepth = 8): string[] {
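Concrete inputs make the branching in `detectInstallDrift` easier to follow. Below is a trimmed, self-contained copy named `classifyDrift`, with filesystem access removed and messages shortened. The paths and version numbers in the usage lines are made up for illustration.

```typescript
type InstallSource = "current" | "hoisted-cache" | "package-cache" | "user-config" | "project-local"
type Install = { source: InstallSource; path: string; version: string | null }

// Trimmed copy of detectInstallDrift: same branches, shorter messages, no I/O.
function classifyDrift(
  current: Install | null,
  installs: Install[],
): { errors: string[]; warnings: string[] } {
  const errors: string[] = []
  const warnings: string[] = []
  const hoisted = installs.find((i) => i.source === "hoisted-cache")
  const pkgCache = installs.find((i) => i.source === "package-cache")

  // Error: the hoisted copy shadows the canonical one with a different version.
  // A null (unreadable) version on only one side also counts as different.
  if (hoisted && pkgCache && hoisted.version !== pkgCache.version) {
    errors.push(`${hoisted.path} (v${hoisted.version ?? "unknown"}) shadows ${pkgCache.path}`)
    return { errors, warnings }
  }

  // Warning: the hoisted copy differs from the version the CLI is running as.
  if (hoisted && current?.version && hoisted.version && hoisted.version !== current.version) {
    warnings.push(`${hoisted.path} (v${hoisted.version}) differs from current (v${current.version})`)
  }
  return { errors, warnings }
}

const running: Install = { source: "current", path: "/tmp/argus", version: "0.5.9" }
const stale: Install = { source: "hoisted-cache", path: "/home/u/.cache/opencode/node_modules/solidity-argus", version: "0.5.7" }
const canonical: Install = {
  source: "package-cache",
  path: "/home/u/.cache/opencode/packages/solidity-argus@latest/node_modules/solidity-argus",
  version: "0.5.9",
}

// Stale hoisted copy next to a newer canonical install: one error.
console.log(classifyDrift(running, [stale, canonical]).errors.length) // prints 1
// Hoisted copy only: falls through to the running-version warning.
console.log(classifyDrift(running, [stale]).warnings.length) // prints 1
```

The early `return` in the error branch means the weaker running-version warning is never emitted alongside the shadowing error. The user therefore sees one actionable message.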
@@ -237,6 +377,22 @@ export const doctorCommand: CliCommand = {
  cliOutput.log(`${YELLOW}⚠${RESET} Project: no Solidity project detected`)
  }
 
+ const driftReport = buildInstallDriftReport(cwd, homedir())
+ if (driftReport.errors.length === 0 && driftReport.warnings.length === 0) {
+ const versionStr = driftReport.current?.version
+ ? ` (current: v${driftReport.current.version})`
+ : ""
+ cliOutput.log(`${GREEN}✓${RESET} Install drift: none detected${versionStr}`)
+ } else {
+ for (const err of driftReport.errors) {
+ cliOutput.log(`${RED}✗${RESET} Install drift: ${err}`)
+ hasFailure = true
+ }
+ for (const warn of driftReport.warnings) {
+ cliOutput.log(`${YELLOW}⚠${RESET} Install drift: ${warn}`)
+ }
+ }
+
  if (projectType === "foundry" && detectViaIr(cwd)) {
  cliOutput.log(
  `${YELLOW}⚠${RESET} via_ir: enabled in foundry.toml — Slither will use flatten fallback`,
@@ -305,21 +461,13 @@ export const doctorCommand: CliCommand = {
 
  const soloditEnabled = config?.solodit?.enabled !== false
  if (soloditEnabled) {
- try {
- const response = await fetch(
- "https://solodit.cyfrin.io/api/trpc/findings.get?batch=1&input=" +
- encodeURIComponent(JSON.stringify({ 0: "[]" })),
- {
- signal: AbortSignal.timeout(5000),
- },
- )
- if (response.ok) {
- cliOutput.log(`${GREEN}✓${RESET} Solodit API: reachable`)
- } else {
- cliOutput.log(`${YELLOW}⚠${RESET} Solodit API: returned ${response.status}`)
- }
- } catch {
- cliOutput.log(`${YELLOW}⚠${RESET} Solodit API: unreachable`)
+ const port = config?.solodit?.port ?? DEFAULT_SOLODIT_PORT
+ const status = await checkSoloditHealth(port, true)
+ if (status.reachable) {
+ cliOutput.log(`${GREEN}✓${RESET} Solodit MCP: reachable on port ${port}`)
+ } else {
+ const suffix = status.error ? ` (${status.error})` : ""
+ cliOutput.log(`${YELLOW}⚠${RESET} Solodit MCP: unreachable on port ${port}${suffix}`)
  }
  } else {
  cliOutput.log(`${YELLOW}⚠${RESET} Solodit: disabled in config`)
@@ -64,7 +64,8 @@ function addPluginToConfig(configPath: string): { added: boolean; ok: boolean }
 
  export const installCommand: CliCommand = {
  name: "install",
- description: "Register solidity-argus in your OpenCode config (use --global for ~/.config/opencode)",
+ description:
+ "Register solidity-argus in your OpenCode config (use --global for ~/.config/opencode)",
  async execute(args: string[]): Promise<number> {
  const isGlobal = args.includes("--global") || args.includes("-g")
  const local = localConfigPath()
@@ -85,7 +86,9 @@ export const installCommand: CliCommand = {
  ` Installing globally would write to ${global} and load solidity-argus in EVERY OpenCode session.`,
  )
  cliOutput.warn(` To install globally on purpose, re-run with: argus install --global`)
- cliOutput.warn(` To install for this project, first create an opencode.json in this directory.`)
+ cliOutput.warn(
+ ` To install for this project, first create an opencode.json in this directory.`,
+ )
 
  const proceed = await confirm("Install globally anyway?", false)
  if (!proceed) {
@@ -1,9 +1,9 @@
  export const DEFAULT_MODELS = {
- argus: "anthropic/claude-opus-4-6",
+ argus: "anthropic/claude-opus-4-7",
  sentinel: "anthropic/claude-sonnet-4-6",
  pythia: "anthropic/claude-sonnet-4-6",
  scribe: "anthropic/claude-sonnet-4-6",
- themis: "openai/gpt-5.4",
+ themis: "openai/gpt-5.5",
  } as const
 
  export const DEFAULT_STEPS = 50 as const
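This release changes the defaults for `argus` and `themis`, so a project that wants the previous models would need to pin them in `.argus/solidity-argus.jsonc`. The resolution rule visible for Themis in `createConfigHandler` is `argusConfig.agents?.themis?.model ?? DEFAULT_MODELS.themis`. The sketch below assumes the other agents follow the same pattern. `resolveModel` and `AgentOverrides` are hypothetical names.

```typescript
// Copy of the DEFAULT_MODELS table as of this release.
const DEFAULT_MODELS = {
  argus: "anthropic/claude-opus-4-7",
  sentinel: "anthropic/claude-sonnet-4-6",
  pythia: "anthropic/claude-sonnet-4-6",
  scribe: "anthropic/claude-sonnet-4-6",
  themis: "openai/gpt-5.5",
} as const

type AgentName = keyof typeof DEFAULT_MODELS

// Assumed shape of the "agents" block in the user config file.
type AgentOverrides = {
  agents?: Partial<Record<AgentName, { model?: string; steps?: number }>>
}

// A configured model wins; otherwise fall back to the shipped default.
function resolveModel(config: AgentOverrides, agent: AgentName): string {
  return config.agents?.[agent]?.model ?? DEFAULT_MODELS[agent]
}

// Pin Themis to the previous default while leaving Argus on the new one.
const pinned: AgentOverrides = { agents: { themis: { model: "openai/gpt-5.4" } } }
console.log(resolveModel(pinned, "themis")) // prints "openai/gpt-5.4"
console.log(resolveModel(pinned, "argus")) // prints "anthropic/claude-opus-4-7"
```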
@@ -14,7 +14,6 @@ import {
  releaseEventSink,
  } from "./features/persistent-state/event-sink"
  import {
- materializeFindings,
  materializeFindingsForRun,
  materializeReportInput,
  } from "./features/persistent-state/findings-materializer"
@@ -12,10 +12,7 @@ import type { CanonicalFinding, CanonicalToolExecution, ReportInput } from "../.
  import { SCHEMA_VERSION } from "../../state/schemas"
  import { readEvents } from "./event-sink"
 
- export type MaterializeFindingsTrigger =
- | "session.idle"
- | "session.deleted"
- | "tool.execute.after"
+ export type MaterializeFindingsTrigger = "session.idle" | "session.deleted" | "tool.execute.after"
 
  export interface MaterializeFindingsForRunOptions {
  failFast?: boolean
@@ -83,6 +83,54 @@ function asRecord(value: unknown): Record<string, unknown> | null {
  return null
  }
 
+ function isGenerateReportCompletion(event: AuditEvent): boolean {
+ if (event.type !== "tool.completed") return false
+ const payload = asRecord(event.payload)
+ if (!payload) return false
+ return payload.tool === "argus_generate_report" || payload.name === "argus_generate_report"
+ }
+
+ async function collectReportCompletenessErrors(events: AuditEvent[]): Promise<string[]> {
+ const errors: string[] = []
+ const reportEvents = events.filter(isGenerateReportCompletion)
+
+ for (const event of reportEvents) {
+ const payload = asRecord(event.payload)
+ const filePath = payload?.filePath
+ if (typeof filePath !== "string" || filePath.length === 0) continue
+
+ try {
+ const report = await Bun.file(filePath).text()
+ if (report.includes("## ⚠ Completeness Warning")) {
+ errors.push("generated report contains Completeness Warning")
+ }
+ } catch {
+ // Missing report files are handled by report-generation/tool-tracking gates.
+ }
+ }
+
+ return errors
+ }
+
+ function collectReportQualityGateErrors(events: AuditEvent[]): string[] {
+ const errors: string[] = []
+ const reportEvents = events.filter(isGenerateReportCompletion)
+
+ for (const event of reportEvents) {
+ const payload = asRecord(event.payload)
+ const qualityGates = asRecord(payload?.qualityGates)
+ if (qualityGates?.passed !== false) continue
+
+ const violations = Array.isArray(qualityGates.violations)
+ ? qualityGates.violations.filter((entry): entry is string => typeof entry === "string")
+ : []
+ const details = violations.length > 0 ? `: ${violations.join("; ")}` : ""
+ errors.push(`generated report failed quality gates${details}`)
+ }
+
+ return errors
+ }
+
  function collectParentChildIntegrityErrors(events: AuditEvent[]): string[] {
  const errors: string[] = []
  const parentByChild = new Map<string, string>()
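The quality-gate check reads only `tool.completed` events from `argus_generate_report`, and it turns an explicit `passed: false` into an error string. Below is a self-contained copy of the two relevant functions, run against three synthetic events. The `AuditEvent` type and the `asRecord` body are stand-ins, because the diff does not show them.

```typescript
// Stand-in event type; the real AuditEvent has more fields.
type AuditEvent = { type: string; payload?: unknown }

// Assumed behavior of asRecord: plain objects pass through, everything else is null.
function asRecord(value: unknown): Record<string, unknown> | null {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : null
}

function isGenerateReportCompletion(event: AuditEvent): boolean {
  if (event.type !== "tool.completed") return false
  const payload = asRecord(event.payload)
  if (!payload) return false
  return payload.tool === "argus_generate_report" || payload.name === "argus_generate_report"
}

function collectReportQualityGateErrors(events: AuditEvent[]): string[] {
  const errors: string[] = []
  for (const event of events.filter(isGenerateReportCompletion)) {
    const qualityGates = asRecord(asRecord(event.payload)?.qualityGates)
    // Only an explicit `passed: false` counts; a missing gate block is ignored.
    if (qualityGates?.passed !== false) continue
    // Non-string violation entries are silently discarded.
    const violations = Array.isArray(qualityGates.violations)
      ? qualityGates.violations.filter((v): v is string => typeof v === "string")
      : []
    const details = violations.length > 0 ? `: ${violations.join("; ")}` : ""
    errors.push(`generated report failed quality gates${details}`)
  }
  return errors
}

const events: AuditEvent[] = [
  { type: "tool.completed", payload: { tool: "argus_generate_report", qualityGates: { passed: false, violations: ["missing PoC", 42, "parity"] } } },
  { type: "tool.completed", payload: { name: "argus_generate_report", qualityGates: { passed: true } } },
  { type: "tool.started", payload: { tool: "argus_generate_report", qualityGates: { passed: false } } },
]
console.log(collectReportQualityGateErrors(events))
```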
@@ -257,17 +305,25 @@ export async function finalizeRun(
  const hasEventsAfterExistingFinalization =
  existingResult !== null && existingResult.finalizedIndex < events.length - 1
  if (existingResult?.invariantsPassed && !hasEventsAfterExistingFinalization) {
- return {
- success: existingResult.success,
- invariantsPassed: existingResult.invariantsPassed,
- errors: existingResult.errors,
- warnings: existingResult.warnings,
- runId: existingResult.runId,
- timestamp: existingResult.timestamp,
+ const reportErrors = [
+ ...(await collectReportCompletenessErrors(events)),
+ ...collectReportQualityGateErrors(events),
+ ]
+ if (reportErrors.length === 0) {
+ return {
+ success: existingResult.success,
+ invariantsPassed: existingResult.invariantsPassed,
+ errors: existingResult.errors,
+ warnings: existingResult.warnings,
+ runId: existingResult.runId,
+ timestamp: existingResult.timestamp,
+ }
  }
  }
 
  const { errors, warnings } = collectInvariantErrors(events)
+ errors.push(...(await collectReportCompletenessErrors(events)))
+ errors.push(...collectReportQualityGateErrors(events))
  const invariantsPassed = errors.length === 0
  const sessionId = events.at(-1)?.session_id ?? ""
 
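After the `finalizeRun` change, a cached, previously passing finalization is reused only when the report checks also come back clean. That decision can be isolated as a pure predicate. `canReuseFinalization` and the `ExistingResult` shape are hypothetical. The `reportErrors` argument stands in for the combined output of `collectReportCompletenessErrors` and `collectReportQualityGateErrors`.

```typescript
// Hypothetical, trimmed view of a stored finalization result.
type ExistingResult = { invariantsPassed: boolean; finalizedIndex: number }

// True when finalizeRun may return the cached result instead of re-running checks.
function canReuseFinalization(
  existing: ExistingResult | null,
  eventCount: number,
  reportErrors: string[],
): boolean {
  // Same condition as hasEventsAfterExistingFinalization in the diff.
  const hasLaterEvents = existing !== null && existing.finalizedIndex < eventCount - 1
  if (!existing?.invariantsPassed || hasLaterEvents) return false
  // New in this release: report completeness or quality-gate errors block reuse.
  return reportErrors.length === 0
}
```

When reuse is blocked, the same report errors are appended to the freshly collected invariant errors. As a result, `invariantsPassed` comes out `false` for that run.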
@@ -185,7 +185,7 @@ export function createConfigHandler(
  mode: "subagent",
  model: argusConfig.agents?.themis?.model ?? DEFAULT_MODELS.themis,
  steps: argusConfig.agents?.themis?.steps ?? DEFAULT_STEPS,
- description: "Audit quality gate — independent cross-validation (GPT-5.4)",
+ description: "Audit quality gate — independent cross-validation (GPT-5.5)",
  prompt: THEMIS_PROMPT,
  permission: {
  argus_read_findings: "allow",
@@ -62,6 +62,13 @@ const KNOWN_INPUT_FIELDS = new Set([
  "observationId",
  "observationFingerprint",
  "issueFingerprint",
+ "observation_ids",
+ "observationIds",
+ "observation_count",
+ "observationCount",
+ "reported_by_agents",
+ "reportedByAgents",
+ "sources",
  "elements",
  "location",
  ])
@@ -157,6 +164,20 @@ function pushValidationDiagnostics(errors: ValidationError[]): Diagnostic[] {
  }))
  }
 
+ function normalizeStringArray(value: unknown): string[] | undefined {
+ if (!Array.isArray(value)) return undefined
+ const strings = value.filter(
+ (item): item is string => typeof item === "string" && item.length > 0,
+ )
+ return strings.length > 0
+ ? Array.from(new Set(strings)).sort((a, b) => a.localeCompare(b))
+ : undefined
+ }
+
+ function normalizePositiveInteger(value: unknown): number | undefined {
+ return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : undefined
+ }
+
  export function normalizeToCanonicalFinding(
  raw: Finding | Record<string, unknown>,
  runId: string,
@@ -288,6 +309,16 @@ export function normalizeToCanonicalFinding(
  observationId,
  })
 
+ const observationIds =
+ normalizeStringArray(input.observation_ids) ?? normalizeStringArray(input.observationIds)
+ const reportedByAgents =
+ normalizeStringArray(input.reported_by_agents) ?? normalizeStringArray(input.reportedByAgents)
+ const sources = normalizeStringArray(input.sources)
+ const observationCount =
+ normalizePositiveInteger(input.observation_count) ??
+ normalizePositiveInteger(input.observationCount) ??
+ observationIds?.length
+
  const canonical: CanonicalFinding = {
  id: observationId,
  check,
@@ -302,6 +333,10 @@ export function normalizeToCanonicalFinding(
  issue_fingerprint: issueFingerprint,
  observation_fingerprint: observationFingerprint,
  observation_id: observationId,
+ observation_ids: observationIds,
+ observation_count: observationCount,
+ reported_by_agents: reportedByAgents,
+ sources,
  impact: typeof input.impact === "string" && input.impact.length > 0 ? input.impact : undefined,
  recommendation:
  typeof input.recommendation === "string" && input.recommendation.length > 0
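The new lineage fields accept both snake_case and camelCase spellings, and snake_case is checked first. When no explicit count is given, `observation_count` falls back to the number of IDs. Below is a self-contained copy of the two helpers from the diff, plus a hypothetical `normalizeLineage` wrapper that reproduces the fallback chain.

```typescript
// Copied from the diff: keep non-empty strings, de-duplicate, sort.
function normalizeStringArray(value: unknown): string[] | undefined {
  if (!Array.isArray(value)) return undefined
  const strings = value.filter((item): item is string => typeof item === "string" && item.length > 0)
  return strings.length > 0 ? Array.from(new Set(strings)).sort((a, b) => a.localeCompare(b)) : undefined
}

// Copied from the diff: only positive integers survive.
function normalizePositiveInteger(value: unknown): number | undefined {
  return typeof value === "number" && Number.isInteger(value) && value > 0 ? value : undefined
}

// Hypothetical wrapper around the fallback chain in normalizeToCanonicalFinding.
function normalizeLineage(input: Record<string, unknown>) {
  const observationIds =
    normalizeStringArray(input.observation_ids) ?? normalizeStringArray(input.observationIds)
  const observationCount =
    normalizePositiveInteger(input.observation_count) ??
    normalizePositiveInteger(input.observationCount) ??
    observationIds?.length
  return { observationIds, observationCount }
}

console.log(normalizeLineage({ observation_ids: ["obs-2", "obs-1", "obs-2", "", 7] }))
```

IDs are de-duplicated before `observationIds?.length` is taken, so a derived count reflects unique observations. An explicitly supplied `observation_count` is kept as-is, even when it disagrees with the ID list.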
@@ -85,7 +85,7 @@ export const persistDedupedTool = tool({
85
85
  deduped_findings: tool.schema
86
86
  .string()
87
87
  .describe(
88
- "Serialized JSON array of deduplicated and enriched findings. Each finding should have: check, severity, confidence, description, file, lines, source, impact, recommendation.",
88
+ "Serialized JSON array of deduplicated and enriched findings. Each finding should have: check, severity, confidence, description, file, lines, source, impact, recommendation, proofOfConcept, and observation_ids lineage proving which raw findings were merged.",
89
89
  ),
90
90
  },
91
91
  async execute(args, context) {
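The updated `deduped_findings` description now asks for `proofOfConcept` and an `observation_ids` lineage list on every entry. The sketch below builds one such payload. The field names come from the description. Every value (IDs, line numbers, text) is invented for illustration, and the `DedupedFinding` interface is a hypothetical typing, not the package's own type.

```typescript
// Hypothetical typing of one entry in the deduped_findings array.
interface DedupedFinding {
  check: string
  severity: "Critical" | "High" | "Medium" | "Low" | "Informational"
  confidence: "High" | "Medium" | "Low"
  description: string
  file: string
  lines: [number, number]
  source: string
  impact: string
  recommendation: string
  proofOfConcept: string
  // Lineage: IDs of the raw recorded observations merged into this entry.
  observation_ids: string[]
}

// Illustrative values only: two raw observations merged into one finding.
const mergedFinding: DedupedFinding = {
  check: "reentrancy-eth",
  severity: "High",
  confidence: "Medium",
  description: "withdraw() sends ETH before zeroing the caller's balance.",
  file: "src/Vault.sol",
  lines: [42, 58],
  source: "slither",
  impact: "A malicious contract can re-enter withdraw() and drain the vault.",
  recommendation: "Apply checks-effects-interactions or add a reentrancy guard.",
  proofOfConcept: "Attacker.receive() calls Vault.withdraw() again before the balance update.",
  observation_ids: ["obs-slither-001", "obs-manual-007"],
}

// The tool argument is a serialized JSON array, not a single object.
const dedupedFindingsArg = JSON.stringify([mergedFinding])
console.log(dedupedFindingsArg)
```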
@@ -28,6 +28,8 @@ type RecordFindingResponse = {
28
28
  }>
29
29
  schema_version: string
30
30
  note: string
31
+ enrichment_warnings?: string[]
32
+ enrichment_hint?: string
31
33
  }
32
34
 
33
35
  type ParseResult = { ok: true; data: Record<string, unknown>[] } | { ok: false; error: string }
@@ -79,6 +81,16 @@ function errorResponse(error: string): string {
79
81
  })
80
82
  }
81
83
 
84
+ function collectMissingEnrichmentFields(
85
+ finding: ReturnType<typeof normalizeToCanonicalFinding>["data"],
86
+ ): string[] {
87
+ const missing: string[] = []
88
+ if (!isNonEmptyString(finding.impact)) missing.push("impact")
89
+ if (!isNonEmptyString(finding.recommendation)) missing.push("recommendation")
90
+ if (!isNonEmptyString(finding.proofOfConcept)) missing.push("proofOfConcept")
91
+ return missing
92
+ }
93
+
82
94
  export async function executeRecordFinding(
83
95
  args: RecordFindingArgs,
84
96
  context: ToolContext,
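`collectMissingEnrichmentFields` replaces the old inline `!f.impact` style checks with `isNonEmptyString`. That helper is not shown in the diff. The sketch assumes it means "a string of length > 0", matching the `input.impact.length > 0` checks used in `normalizeToCanonicalFinding`. Under that assumption, the visible behavioural difference is that a truthy non-string value, which passed the old falsiness test, now counts as missing. `EnrichableFinding` is a hypothetical simplification of the normalized finding type.

```typescript
// Assumed definition; the diff calls isNonEmptyString but does not show it.
function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0
}

// Hypothetical, reduced stand-in for the normalized finding type.
interface EnrichableFinding {
  impact?: unknown
  recommendation?: unknown
  proofOfConcept?: unknown
}

function collectMissingEnrichmentFields(finding: EnrichableFinding): string[] {
  const missing: string[] = []
  if (!isNonEmptyString(finding.impact)) missing.push("impact")
  if (!isNonEmptyString(finding.recommendation)) missing.push("recommendation")
  if (!isNonEmptyString(finding.proofOfConcept)) missing.push("proofOfConcept")
  return missing
}

// An object-valued impact was truthy under the old check but is missing here.
console.log(JSON.stringify(collectMissingEnrichmentFields({ impact: { text: "x" }, recommendation: "Fix it" })))
// prints ["impact","proofOfConcept"]
```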
@@ -160,16 +172,21 @@ export async function executeRecordFinding(
160
172
  return errorResponse(`Failed to record finding(s): ${errors.join("; ")}`)
161
173
  }
162
174
 
163
- // Warn when Critical/High findings are missing enrichment fields
175
+ // Warn when report-quality enrichment is missing without dropping findings.
164
176
  const enrichmentWarnings: string[] = []
165
177
  const HIGH_SEVERITIES = new Set(["Critical", "High"])
166
178
  for (const f of findings) {
167
- if (!HIGH_SEVERITIES.has(f.severity)) continue
168
- const missing: string[] = []
169
- if (!f.impact) missing.push("impact")
170
- if (!f.recommendation) missing.push("recommendation")
171
- if (!f.proofOfConcept) missing.push("proofOfConcept")
179
+ const missing = collectMissingEnrichmentFields(f)
172
180
  if (missing.length > 0) {
181
+ if (f.source === "slither") {
182
+ enrichmentWarnings.push(
183
+ `[${f.severity}] Slither finding ${f.check} in ${f.file} is missing: ${missing.join(", ")}. The finding was recorded, but Scribe must enrich it before final reporting.`,
184
+ )
185
+ continue
186
+ }
187
+
188
+ if (!HIGH_SEVERITIES.has(f.severity)) continue
189
+
173
190
  enrichmentWarnings.push(
174
191
  `[${f.severity}] ${f.check} in ${f.file} is missing: ${missing.join(", ")}. Quality gate will flag this.`,
175
192
  )
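The reworked loop changes who gets warned. Any Slither finding with missing enrichment is warned about regardless of severity, and it is still recorded. Findings from other sources keep the old rule and are only warned about at Critical or High. The sketch isolates that branching. `RecordedFinding` and its precomputed `missing` field are hypothetical simplifications, and the Slither message is shortened.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low" | "Informational"

// Hypothetical reduced shape; `missing` stands in for the result of
// collectMissingEnrichmentFields(f) in the real loop.
interface RecordedFinding {
  check: string
  severity: Severity
  file: string
  source: string
  missing: string[]
}

const HIGH_SEVERITIES = new Set<Severity>(["Critical", "High"])

function enrichmentWarningsFor(findings: RecordedFinding[]): string[] {
  const warnings: string[] = []
  for (const f of findings) {
    if (f.missing.length === 0) continue
    // Slither: warn at every severity; the finding itself is kept.
    if (f.source === "slither") {
      warnings.push(
        `[${f.severity}] Slither finding ${f.check} in ${f.file} is missing: ${f.missing.join(", ")}.`,
      )
      continue
    }
    // Other sources: only Critical/High findings produce a warning.
    if (!HIGH_SEVERITIES.has(f.severity)) continue
    warnings.push(
      `[${f.severity}] ${f.check} in ${f.file} is missing: ${f.missing.join(", ")}. Quality gate will flag this.`,
    )
  }
  return warnings
}
```

A Low-severity Slither finding without `impact` now yields a warning, while a Medium manual finding with the same gap still does not.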
@@ -199,7 +216,7 @@ export async function executeRecordFinding(
199
216
  ? {
200
217
  enrichment_warnings: enrichmentWarnings,
201
218
  enrichment_hint:
202
- "Critical and High findings MUST include impact, recommendation, and proofOfConcept fields. Re-submit with these fields to pass the quality gate.",
219
+ "Critical and High findings MUST include impact, recommendation, and proofOfConcept fields. Slither findings should include all three fields before Scribe persists deduped findings; incomplete Slither records are preserved but will be flagged by report quality gates if not enriched downstream.",
203
220
  }
204
221
  : {}),
205
222
  }
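This hunk shows only the true branch of a conditional spread; the condition itself lies outside the displayed lines. Assuming it tests whether any enrichment warnings were collected, the pattern keeps `enrichment_warnings` and `enrichment_hint` absent from the response when there is nothing to report, which matches their new optional (`?`) declarations in `RecordFindingResponse`. The `recorded` field and the shortened hint are placeholders, not the tool's real response.

```typescript
// Hypothetical response shape; only the two optional keys mirror the diff.
interface ResponseSketch {
  recorded: number
  enrichment_warnings?: string[]
  enrichment_hint?: string
}

function buildResponse(recorded: number, enrichmentWarnings: string[]): ResponseSketch {
  return {
    recorded,
    // Spreading {} adds no keys, so the optional fields are omitted entirely
    // instead of being serialized as empty values.
    ...(enrichmentWarnings.length > 0
      ? {
          enrichment_warnings: enrichmentWarnings,
          enrichment_hint: "Critical and High findings MUST include impact, recommendation, and proofOfConcept fields.",
        }
      : {}),
  }
}

console.log(JSON.stringify(buildResponse(1, [])))
// prints {"recorded":1}
```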
@@ -215,13 +232,13 @@ export const recordFindingTool = tool({
215
232
  .string()
216
233
  .optional()
217
234
  .describe(
218
- 'Serialized JSON object for a single finding. Required fields: check (string, e.g. "reentrancy-eth"), severity (Critical|High|Medium|Low|Informational), confidence (High|Medium|Low), description (string), file (relative path, e.g. "src/Vault.sol"), lines ([startLine, endLine] tuple), source ("manual"). Optional: impact, recommendation, proofOfConcept (mandatory for Critical/High).',
235
+ 'Serialized JSON object for a single finding. Required fields: check (string, e.g. "reentrancy-eth"), severity (Critical|High|Medium|Low|Informational), confidence (High|Medium|Low), description (string), file (relative path, e.g. "src/Vault.sol"), lines ([startLine, endLine] tuple), source ("manual"|"slither"|"pattern"|"scvd"|"solodit"|"fuzz"). Optional: impact, recommendation, proofOfConcept (mandatory for Critical/High final report findings; strongly recommended for Slither-source findings before Scribe persistence).',
219
236
  ),
220
237
  findings: tool.schema
221
238
  .string()
222
239
  .optional()
223
240
  .describe(
224
- "Serialized JSON array of finding objects. Each object requires the same fields as the finding parameter: check, severity, confidence, description, file, lines, source. Aliases title/name → check and location → file are accepted but canonical names are preferred.",
241
+ "Serialized JSON array of finding objects. Each object requires the same fields as the finding parameter: check, severity, confidence, description, file, lines, source. impact, recommendation, and proofOfConcept are mandatory for Critical/High final report findings and strongly recommended for Slither-source findings before Scribe persistence. Aliases title/name → check and location → file are accepted but canonical names are preferred.",
225
242
  ),
226
243
  },
227
244
  async execute(args, context) {
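The `finding` description now lists six allowed `source` values instead of only `"manual"`. A caller could pre-check its argument against the required-field list before invoking the tool. The checker below is a hypothetical client-side helper built from the description text alone. It is not the tool's own validation, and it deliberately ignores the `title`/`name` and `location` aliases, so an aliased payload will be reported as missing `check` or `file`.

```typescript
// Required fields and allowed sources, as listed in the tool description.
const REQUIRED_FIELDS = ["check", "severity", "confidence", "description", "file", "lines", "source"] as const
const ALLOWED_SOURCES = new Set(["manual", "slither", "pattern", "scvd", "solodit", "fuzz"])

// Hypothetical pre-flight check for a serialized single-finding argument.
function checkFindingArg(json: string): string[] {
  const parsed: unknown = JSON.parse(json)
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["not a JSON object"]
  }
  const obj = parsed as Record<string, unknown>
  const problems: string[] = []
  for (const field of REQUIRED_FIELDS) {
    if (!(field in obj)) problems.push(`missing ${field}`)
  }
  // lines must be a [startLine, endLine] tuple of integers.
  const lines = obj.lines
  if (lines !== undefined && !(Array.isArray(lines) && lines.length === 2 && lines.every((n) => Number.isInteger(n)))) {
    problems.push("lines must be [startLine, endLine]")
  }
  if (typeof obj.source === "string" && !ALLOWED_SOURCES.has(obj.source)) {
    problems.push(`unknown source ${obj.source}`)
  }
  return problems
}

const slitherArg = JSON.stringify({
  check: "reentrancy-eth",
  severity: "High",
  confidence: "Medium",
  description: "External call before state update in withdraw().",
  file: "src/Vault.sol",
  lines: [42, 58],
  source: "slither",
})
console.log(checkFindingArg(slitherArg).length) // prints 0
```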
@@ -627,9 +627,7 @@ function parseReportInputPayload(
627
627
  dedupedArtifact.findings,
628
628
  effectiveRunId,
629
629
  projectDir,
630
- typeof dedupedArtifact.deduped_by === "string"
631
- ? dedupedArtifact.deduped_by
632
- : "scribe",
630
+ typeof dedupedArtifact.deduped_by === "string" ? dedupedArtifact.deduped_by : "scribe",
633
631
  )
634
632
  const merged: Record<string, unknown> = {
635
633
  ...baseInput,
@@ -658,10 +656,7 @@ function parseReportInputPayload(
658
656
  ) {
659
657
  merged.schema_version = SCHEMA_VERSION
660
658
  }
661
- if (
662
- typeof merged.projectDir !== "string" ||
663
- (merged.projectDir as string).length === 0
664
- ) {
659
+ if (typeof merged.projectDir !== "string" || (merged.projectDir as string).length === 0) {
665
660
  merged.projectDir = projectDir
666
661
  }
667
662
  if (!Array.isArray(merged.scope)) {
@@ -858,6 +853,38 @@ function sortFindingsDeterministically(findings: Finding[]): Finding[] {
858
853
  return [...findings].sort(compareFindingsDeterministically)
859
854
  }
860
855
 
856
+ function hasDedupLineage(findings: Finding[]): boolean {
857
+ return findings.some((finding) => {
858
+ const observationIds = (finding as { observation_ids?: unknown }).observation_ids
859
+ return Array.isArray(observationIds) && observationIds.length > 0
860
+ })
861
+ }
862
+
863
+ function observationIdsForFinding(finding: Finding): string[] {
864
+ const observationIds = (finding as { observation_ids?: unknown }).observation_ids
865
+ if (Array.isArray(observationIds)) {
866
+ return observationIds.filter((id): id is string => typeof id === "string" && id.length > 0)
867
+ }
868
+ return typeof finding.observation_id === "string" && finding.observation_id.length > 0
869
+ ? [finding.observation_id]
870
+ : []
871
+ }
872
+
873
+ function compareObservationLineage(
874
+ eventFindings: Finding[],
875
+ reportFindings: Finding[],
876
+ ): { missing: string[]; extra: string[]; matches: boolean } {
877
+ const expected = new Set(eventFindings.flatMap(observationIdsForFinding))
878
+ const actual = new Set(reportFindings.flatMap(observationIdsForFinding))
879
+ const missing = Array.from(expected)
880
+ .filter((id) => !actual.has(id))
881
+ .sort((a, b) => a.localeCompare(b))
882
+ const extra = Array.from(actual)
883
+ .filter((id) => !expected.has(id))
884
+ .sort((a, b) => a.localeCompare(b))
885
+ return { missing, extra, matches: missing.length === 0 && extra.length === 0 }
886
+ }
887
+
861
888
  export function validateReportQuality(
862
889
  findings: Finding[],
863
890
  policy: QualityGatePolicy,
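The three new helpers compare the set of raw observation IDs against the set of IDs claimed by the report. `missing` means an observation was silently dropped, and `extra` means the report cites an ID that was never recorded. The sketch restates them with a hypothetical minimal `LineageFinding` type in place of `Finding`, so that the set arithmetic can be run on its own. A finding without `observation_ids` falls back to its single `observation_id`, which is how raw event findings contribute to the expected set.

```typescript
// Hypothetical minimal stand-in for Finding: only the lineage fields.
interface LineageFinding {
  observation_id?: string
  observation_ids?: unknown
}

function observationIdsForFinding(finding: LineageFinding): string[] {
  const ids = finding.observation_ids
  if (Array.isArray(ids)) {
    return ids.filter((id): id is string => typeof id === "string" && id.length > 0)
  }
  return typeof finding.observation_id === "string" && finding.observation_id.length > 0
    ? [finding.observation_id]
    : []
}

function compareObservationLineage(eventFindings: LineageFinding[], reportFindings: LineageFinding[]) {
  const expected = new Set(eventFindings.flatMap(observationIdsForFinding))
  const actual = new Set(reportFindings.flatMap(observationIdsForFinding))
  // Sorted so that warning text is deterministic across runs.
  const missing = Array.from(expected).filter((id) => !actual.has(id)).sort((a, b) => a.localeCompare(b))
  const extra = Array.from(actual).filter((id) => !expected.has(id)).sort((a, b) => a.localeCompare(b))
  return { missing, extra, matches: missing.length === 0 && extra.length === 0 }
}

// Three raw observations, two of which were merged into one report finding.
const rawEvents: LineageFinding[] = [{ observation_id: "obs-1" }, { observation_id: "obs-2" }, { observation_id: "obs-3" }]
console.log(compareObservationLineage(rawEvents, [{ observation_ids: ["obs-1", "obs-2"] }, { observation_ids: ["obs-3"] }]).matches)
// prints true
```

Merging changes how many findings there are but not which observation IDs they cover, so the comparison tolerates legitimate deduplication while still catching dropped or fabricated observations.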
@@ -1154,7 +1181,7 @@ export async function executeReportGeneration(
1154
1181
  deps: ReportGenerationDependencies = {},
1155
1182
  ): Promise<ReportGenerationResult> {
1156
1183
  const includeExecutiveSummary = args.include_executive_summary ?? true
1157
- const threshold = args.severity_threshold ?? "low"
1184
+ const threshold = args.severity_threshold ?? "informational"
1158
1185
  const qualityGatePolicy = args.quality_gate_policy ?? "warn"
1159
1186
  const toolCoveragePolicy = args.tool_coverage_policy ?? "enforce"
1160
1187
  const expectedRunId = resolveExpectedRunId(args, context, deps)
@@ -1230,7 +1257,26 @@ export async function executeReportGeneration(
1230
1257
 
1231
1258
  const eventFindings = dedupeFindingsForFinalOutput(projectFindings(events))
1232
1259
  const inputFindings = dedupeFindingsForFinalOutput(reportInput.findings)
1233
- const parity = compareIssueFingerprintSets(eventFindings, inputFindings)
1260
+ const hasLineage = hasDedupLineage(reportInput.findings)
1261
+ const shouldCheckParity = eventFindings.length === inputFindings.length || hasLineage
1262
+ const parity = shouldCheckParity
1263
+ ? hasLineage
1264
+ ? compareObservationLineage(projectFindings(events), reportInput.findings)
1265
+ : compareIssueFingerprintSets(eventFindings, inputFindings)
1266
+ : { missing: [], extra: [], matches: true }
1267
+
1268
+ if (!shouldCheckParity) {
1269
+ const unverifiableSummary = `event_findings=${eventFindings.length}, report_findings=${inputFindings.length}`
1270
+ if (preflightPolicy === "strict-fail") {
1271
+ throw new Error(
1272
+ `Preflight failed (strict-fail): finding parity not verifiable (${unverifiableSummary}; missing observation_ids)`,
1273
+ )
1274
+ }
1275
+
1276
+ warningBullets.push(
1277
+ `- Finding parity not verifiable: ${unverifiableSummary}; deduped findings must include observation_ids to prove merged observations were preserved`,
1278
+ )
1279
+ }
1234
1280
 
1235
1281
  if (!parity.matches) {
1236
1282
  const mismatchSummary = `missing=${parity.missing.length}, extra=${parity.extra.length}`
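The new preflight picks one of three parity modes. Lineage, when any report finding carries `observation_ids`, always takes priority. Without lineage, issue fingerprints are compared only when both sides have the same number of findings. Otherwise parity is declared unverifiable, which throws under `strict-fail` and produces a warning bullet otherwise. Presumably the count check exists because Scribe may merge more aggressively than `dedupeFindingsForFinalOutput`, and a fingerprint diff across differing counts would report spurious mismatches. `chooseParityMode` and `handleUnverifiable` are hypothetical names used to isolate that logic. Policy values other than `"strict-fail"` are not shown in this diff.

```typescript
type ParityMode = "observation-lineage" | "issue-fingerprints" | "unverifiable"

// Equivalent to the shouldCheckParity / hasLineage branching in the hunk above.
function chooseParityMode(eventCount: number, reportCount: number, hasLineage: boolean): ParityMode {
  if (hasLineage) return "observation-lineage"
  if (eventCount === reportCount) return "issue-fingerprints"
  return "unverifiable"
}

// Hypothetical helper mirroring the strict-fail throw and the warning bullet.
function handleUnverifiable(eventCount: number, reportCount: number, preflightPolicy: string): string {
  const summary = `event_findings=${eventCount}, report_findings=${reportCount}`
  if (preflightPolicy === "strict-fail") {
    throw new Error(
      `Preflight failed (strict-fail): finding parity not verifiable (${summary}; missing observation_ids)`,
    )
  }
  return `- Finding parity not verifiable: ${summary}; deduped findings must include observation_ids to prove merged observations were preserved`
}

console.log(chooseParityMode(5, 3, false)) // prints unverifiable
```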
@@ -1241,11 +1287,12 @@ export async function executeReportGeneration(
1241
1287
  }
1242
1288
 
1243
1289
  warningBullets.push(`- Finding parity mismatch: ${mismatchSummary}`)
1290
+ const parityLabel = hasLineage ? "observation IDs" : "issue fingerprints"
1244
1291
  if (parity.missing.length > 0) {
1245
- warningBullets.push(`- Missing issue fingerprints: ${parity.missing.join(", ")}`)
1292
+ warningBullets.push(`- Missing ${parityLabel}: ${parity.missing.join(", ")}`)
1246
1293
  }
1247
1294
  if (parity.extra.length > 0) {
1248
- warningBullets.push(`- Extra issue fingerprints: ${parity.extra.join(", ")}`)
1295
+ warningBullets.push(`- Extra ${parityLabel}: ${parity.extra.join(", ")}`)
1249
1296
  }
1250
1297
  }
1251
1298
  } catch (err) {
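With two comparison modes, the mismatch bullets now name what was compared: "observation IDs" when lineage was used, "issue fingerprints" otherwise. The sketch collects the warning bullets for a given parity result. `parityWarningBullets` is a hypothetical name. It covers only the warn path, because the lines elided between the two hunks above (presumably a strict-fail branch) are not shown.

```typescript
interface ParityResult {
  missing: string[]
  extra: string[]
  matches: boolean
}

// Hypothetical helper reproducing the warn-path bullets from the hunk above.
function parityWarningBullets(parity: ParityResult, hasLineage: boolean): string[] {
  if (parity.matches) return []
  const bullets = [`- Finding parity mismatch: missing=${parity.missing.length}, extra=${parity.extra.length}`]
  const parityLabel = hasLineage ? "observation IDs" : "issue fingerprints"
  if (parity.missing.length > 0) bullets.push(`- Missing ${parityLabel}: ${parity.missing.join(", ")}`)
  if (parity.extra.length > 0) bullets.push(`- Extra ${parityLabel}: ${parity.extra.join(", ")}`)
  return bullets
}

console.log(parityWarningBullets({ missing: ["obs-3"], extra: [], matches: false }, true).join("\n"))
// prints:
// - Finding parity mismatch: missing=1, extra=0
// - Missing observation IDs: obs-3
```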