opmsec 0.1.4 → 0.1.5 (npm package version diff, via Mend)

Files changed (139)

package/docs/cli/register-agent.mdx CHANGED

@@ -1,11 +1,11 @@
 ---
 title: 'opm register-agent'
-description: 'Register a new security agent with ZK-verified benchmarks.'
+description: 'Register a new AI security agent with ZK-verified benchmarks.'
 ---
 # opm register-agent
-Register a new security agent with ZK-verified benchmarks. Agents must pass 10 benchmark cases with 100% accuracy and produce a valid ZK proof before they can submit scores on-chain.
+Register your own security agent. Must pass 10 benchmark cases with 100% accuracy and produce a valid ZK proof.
 ## Usage
@@ -15,66 +15,34 @@ opm register-agent --name <name> --model <model> [--system-prompt <prompt>]
 | Flag | Required | Description |
 |------|----------|-------------|
-| <code>--name</code> | Yes | Agent identifier (e.g. <code>my-security-agent</code>) |
-| <code>--model</code> | Yes | LLM model (e.g. <code>anthropic/claude-sonnet-4-20250514</code>) |
-| <code>--system-prompt</code> | No | Custom system prompt (defaults to OPM security auditor prompt) |
+| `--name` | Yes | Agent identifier |
+| `--model` | Yes | LLM model (e.g. `anthropic/claude-sonnet-4-20250514`) |
+| `--system-prompt` | No | Custom system prompt (defaults to OPM security auditor) |
+## Process
+1. **Validate** — Checks agent name, model, env vars
+2. **Benchmark** — Sends 10 labeled cases in a single LLM call (3 clean, 7 malicious)
+3. **ZK proof** — Generates proof of 100% accuracy without revealing test data
+4. **Register** — Calls `OPMRegistry.registerAgent()` with name, model, system prompt hash, proof hash
 ## Required Environment Variables
 | Variable | Purpose |
 |----------|---------|
-| <code>AGENT_PRIVATE_KEY</code> | Wallet that becomes the agent identity; must hold Base Sepolia ETH for gas |
-| <code>OPENROUTER_API_KEY</code> or <code>OPENAI_API_KEY</code> | Required to run benchmark LLM calls |
-## Process
+| `AGENT_PRIVATE_KEY` | Wallet that becomes the agent identity (needs Base Sepolia ETH) |
+| `OPENROUTER_API_KEY` or `OPENAI_API_KEY` | LLM access for benchmark calls |
-<Steps>
-  <Step title="Validate configuration">
-    Checks agent name, model, and required env vars.
-  </Step>
-  <Step title="Run benchmark suite">
-    Executes 10 labeled test cases across categories: clean, typosquat, malicious, cve, obfuscated, exfiltration, dependency_confusion.
-  </Step>
-  <Step title="Generate ZK proof">
-    Produces a zero-knowledge proof that the agent achieved 100% accuracy without revealing test data or expected outputs.
-  </Step>
-  <Step title="Register on-chain">
-    Calls <code>OPMRegistry.registerAgent()</code> with name, model, system prompt hash, and proof hash.
-  </Step>
-</Steps>
+## On Success
-## Benchmark Categories
+- BaseScan transaction link shown
+- Agent authorized to submit scores
+- Participates in all future package scans
-| Category | Count | Description |
-|----------|-------|-------------|
-| clean | 3 | Legitimate packages (string utils, math, validator) |
-| typosquat | 1 | Typosquat with credential exfiltration |
-| malicious | 2 | Postinstall shell, SSH key exfiltration |
-| cve | 1 | Known prototype pollution CVE |
-| obfuscated | 1 | Obfuscated reverse shell |
-| exfiltration | 1 | Env var exfiltration on import |
-| dependency_confusion | 1 | Internal scope shadowing + exfiltration |
+## On Failure
-## Requirements
+Shows which cases were misclassified, expected vs actual verdict, and rejection reason.
 <Warning>
-**100% accuracy** is required. The agent must pass all 10 benchmark cases. Each case is evaluated by score range and risk level ordinal.
+**100% accuracy required.** No partial passes. The ZK proof hides test data and individual results — only the commitment hash and proof hash go on-chain.
 </Warning>
-<Note>
-The ZK proof hides test data and individual results. Only the commitment hash and proof hash are stored on-chain.
-</Note>
-## Success
-On success:
-- Etherscan transaction link is shown
-- Agent is authorized to submit scores
-- Agent participates in the next package scan alongside existing agents
-## Failure
-On failure, the CLI shows:
-- Which benchmark cases failed and why
-- Expected vs actual risk level and score
-- Rejection reason (e.g. "Agent achieved 90% accuracy (100% required)")
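
Reviewer sketch (not part of the package): the pass/fail rule in the hunk above can be illustrated as follows. The `BenchmarkCase` shape and the `evaluateBenchmark` helper are hypothetical, and the sketch compares a single SAFE/FLAGGED verdict per case, whereas the 0.1.4 text says each case is evaluated by score range and risk level ordinal. Only the 100% requirement and the rejection message format come from the docs.

```typescript
// Hypothetical types: the opm CLI's internal shapes are not shown in this diff.
type Verdict = "SAFE" | "FLAGGED";

interface BenchmarkCase {
  id: string;
  expected: Verdict;
  actual: Verdict;
}

interface BenchmarkResult {
  passed: boolean;
  accuracyPercent: number;
  misclassified: BenchmarkCase[];
  rejectionReason: string | null;
}

// Passes only if every case matches its expected verdict (no partial passes).
function evaluateBenchmark(cases: BenchmarkCase[]): BenchmarkResult {
  const misclassified = cases.filter((c) => c.actual !== c.expected);
  const correct = cases.length - misclassified.length;
  const accuracyPercent =
    cases.length === 0 ? 0 : Math.round((correct / cases.length) * 100);
  const passed = cases.length > 0 && misclassified.length === 0;
  return {
    passed,
    accuracyPercent,
    misclassified,
    rejectionReason: passed
      ? null
      : `Agent achieved ${accuracyPercent}% accuracy (100% required)`,
  };
}

// Demo data: 3 clean and 7 malicious cases, with the last one misclassified.
const demoCases: BenchmarkCase[] = [];
for (let i = 0; i < 10; i++) {
  const expected: Verdict = i < 3 ? "SAFE" : "FLAGGED";
  const actual: Verdict = i === 9 ? "SAFE" : expected;
  demoCases.push({ id: `case-${i + 1}`, expected, actual });
}
const demoResult = evaluateBenchmark(demoCases);
console.log(demoResult.rejectionReason);
```

With one of ten cases wrong, the result carries the same message the removed "Failure" section quoted as an example.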

package/docs/cli/view.mdx CHANGED

@@ -1,52 +1,35 @@
 ---
 title: 'opm view / opm whois'
-description: 'Author profile and published packages.'
+description: 'Author profile, reputation, and published packages.'
 ---
 # opm view / opm whois
-Show author profile and published packages. Resolves ENS identity and displays on-chain reputation.
+Look up an author's ENS profile, OPM reputation, and published packages.
 ## Usage
 <CodeGroup>
-```bash View by ENS name
-opm view vitalik.eth
+```bash By ENS name
+opm view djpai.eth
 ```
 ```bash Whois (auto-appends .eth)
-opm whois vitalik
+opm whois djpai
 ```
 </CodeGroup>
 <Note>
-<code>opm view &lt;name.eth&gt;</code> shows author profile. <code>opm view &lt;pkg&gt;</code> (without <code>.eth</code>) delegates to <code>opm info</code>.
+`opm view <name.eth>` shows author profile. `opm view <pkg>` (without `.eth`) delegates to `opm info`.
 </Note>
-## Output
+## What It Shows
-### Identity
+- **Identity** — ENS name, wallet address, bio, URL, GitHub, Twitter, email, avatar
+- **Author stats** — Packages published, average reputation score
+- **OPM ENS records** — `opm.version`, `opm.fileverse`, `opm.risk_score`, `opm.packages`, contenthash
+- **Published packages** — Name, version, risk score, checksum, signature status, report link
-- **ENS name** — Resolved identity
-- **Address** — Wallet address
-- **Bio** — ENS description text record
-- **URL** — ENS url record
-- **GitHub** — <code>com.github</code> text record
-- **Twitter** — <code>com.twitter</code> text record
-- **Email** — ENS email record
-- **Avatar** — Rendered from ENS avatar record
-### Author Stats
-- **Packages published** — Count from OPMRegistry
-- **Avg reputation** — Risk badge (lower = better)
-### Published Packages
-For each package: name, version, risk score, checksum, signature status, and report URI link.
-<Note>
-No environment variables are required. ENS resolution uses public resolvers.
-</Note>
+No environment variables required. Uses public ENS resolvers.
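
Reviewer sketch (not part of the package): the argument routing described in the `<Note>` of this hunk could look roughly like this. The function names and the `ViewTarget` type are hypothetical. Leaving a name that already ends in `.eth` unchanged in `whoisToEnsName` is an assumption, since the docs only say that `whois` auto-appends `.eth`.

```typescript
// Hypothetical routing helpers; the real CLI's dispatch code is not in this diff.
type ViewTarget =
  | { kind: "author"; ensName: string } // handled by the author profile view
  | { kind: "package"; packageName: string }; // delegated to `opm info`

// `opm view <arg>`: a `.eth` suffix selects the author profile, anything else is a package.
function resolveViewTarget(arg: string): ViewTarget {
  return arg.endsWith(".eth")
    ? { kind: "author", ensName: arg }
    : { kind: "package", packageName: arg };
}

// `opm whois <arg>`: append `.eth` when it is missing (assumed to be idempotent).
function whoisToEnsName(arg: string): string {
  return arg.endsWith(".eth") ? arg : `${arg}.eth`;
}

console.log(resolveViewTarget("djpai.eth").kind);
console.log(resolveViewTarget("express").kind);
console.log(whoisToEnsName("djpai"));
```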

package/docs/concepts/ens-records.mdx ADDED

@@ -0,0 +1,44 @@
+---
+title: 'ENS Text Records'
+description: 'Package metadata stored on ENS for decentralized discovery.'
+---
+# ENS Text Records
+When you publish via `opm push`, package metadata is written to your ENS name as text records — creating a decentralized discovery layer independent of the smart contract.
+## Record Keys
+### Author-Level (on `djpai.eth`)
+| Key | Example |
+|-----|---------|
+| `opm.version` | `1.2.3` |
+| `opm.checksum` | `0x8a3f...` |
+| `opm.fileverse` | Fileverse report URI |
+| `opm.risk_score` | `12` |
+| `opm.signature` | ECDSA signature |
+| `opm.contract` | Registry contract address |
+| `opm.packages` | `express,lodash` |
+### Per-Package (namespaced)
+`opm.pkg.<name>.version`, `opm.pkg.<name>.checksum`, `opm.pkg.<name>.fileverse`, etc.
+## Subnames
+OPM creates ENS subnames like `express.djpai.eth` during `opm push` if you own the parent name. Each subname gets its own text records for per-package resolution.
+## Contenthash
+The ENS `contenthash` is set to the Fileverse IPFS CID — read directly from the Fileverse Portal smart contract. This makes audit reports discoverable via standard ENS contenthash resolution.
+## When Records Are Used
+- **`opm push`** — Writes all records after on-chain registration
+- **`opm info`** / **`opm view`** — Reads and displays records alongside on-chain data
+- **`opm install express@djpai.eth`** — Resolves the ENS author, reads metadata
+<Note>
+Writing records requires the signer (`OPM_SIGNING_KEY`) to own/manage the ENS name. Reading is permissionless.
+</Note>
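
Reviewer sketch (not part of the package): the key naming scheme in the added file can be illustrated with a couple of helpers. The helper names are hypothetical, and only the three per-package fields that the docs spell out (`version`, `checksum`, `fileverse`) are modeled. The docs end that list with "etc.", so the full set of fields is not known from this diff.

```typescript
// Only the per-package fields named explicitly in the docs; the full list is elided there.
const PACKAGE_FIELDS = ["version", "checksum", "fileverse"] as const;
type PackageField = (typeof PACKAGE_FIELDS)[number];

// Builds a namespaced key such as `opm.pkg.express.version`.
function packageRecordKey(pkg: string, field: PackageField): string {
  return `opm.pkg.${pkg}.${field}`;
}

// Splits the comma-separated `opm.packages` value, e.g. "express,lodash".
function parsePackagesRecord(value: string): string[] {
  return value
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Lists every per-package key a reader would look up for an author.
function allPackageKeys(packagesRecord: string): string[] {
  const keys: string[] = [];
  for (const pkg of parsePackagesRecord(packagesRecord)) {
    for (const field of PACKAGE_FIELDS) {
      keys.push(packageRecordKey(pkg, field));
    }
  }
  return keys;
}

console.log(allPackageKeys("express,lodash"));
```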

package/docs/concepts/multi-agent-consensus.mdx CHANGED

@@ -1,58 +1,40 @@
 ---
 title: 'Multi-Agent Consensus'
-description: 'Three LLMs run in parallel, submit scores on-chain, and aggregate via intelligence-weighted averaging.'
+description: 'Three LLMs scan in parallel, submit scores on-chain, aggregate via intelligence-weighted averaging.'
 ---
 # Multi-Agent Consensus
-OPM uses **three heterogeneous AI agents** that run in parallel to analyze packages. Each agent independently evaluates source code, version history, and CVE data, then submits its risk score and reasoning on-chain. Scores are aggregated using **intelligence-weighted averaging** to produce a final risk assessment.
+Three different AI models analyze every package in parallel. Each submits an independent risk score on-chain. Scores are aggregated using intelligence-weighted averaging.
-## Agent Configuration
+## Agent Lineup
-| Agent | OpenRouter (preferred) | OpenAI (fallback) |
-|-------|------------------------|--------------------|
+| Agent | OpenRouter | OpenAI fallback |
+|-------|-----------|----------------|
 | agent-1 | Claude Sonnet 4 | GPT-4.1 |
 | agent-2 | Gemini 2.5 Flash | GPT-4.1 Mini |
 | agent-3 | DeepSeek Chat | GPT-4.1 Nano |
-Model diversity reduces single-model blind spots and improves consensus reliability. Override models via `AGENT1_MODEL`, `AGENT2_MODEL`, and `AGENT3_MODEL`.
+Model diversity reduces single-model blind spots. Override via `AGENT1_MODEL`, `AGENT2_MODEL`, `AGENT3_MODEL`.
-## Analysis Pipeline
+## What Each Agent Does
-Each agent receives:
+1. Receives package source, `package.json`, version history, and any known CVEs
+2. Produces structured JSON: risk score (0-100), risk level, reasoning, vulnerability list, supply chain indicators
+3. Submits score on-chain via `OPMRegistry.submitScore()`
-- Packed tarball contents (scannable extensions: `.js`, `.ts`, `.mjs`, `.cjs`, `.json`)
-- `package.json` and dependency metadata
-- Version history and changelog context
-- CVE/OSV advisory data when available
+Each agent scores a version only once.
-Each agent produces structured JSON:
+## Intelligence-Weighted Scoring
-- **Risk score** (0–100) with categorical classification (LOW, MEDIUM, HIGH, CRITICAL)
-- **Vulnerability enumeration** with severity, category, file path, and evidence
-- **Supply chain indicators**: install scripts, native bindings, obfuscated code, network calls, filesystem access, process spawning, `eval` usage, environment variable access
-- **Version history analysis**: changelog risk, maintainer changes, dependency graph mutations
-- **Recommendation**: SAFE, CAUTION, WARN, or BLOCK
-## On-Chain Submission
-Agent wallets call `OPMRegistry.submitScore(name, version, riskScore, reasoning)` for each package version. Each agent may submit only once per version. Scores are stored in the contract's `versionData` mapping.
-## Intelligence-Weighted Aggregation
-Scores are aggregated using model weights from the **Artificial Analysis API**:
-- **Intelligence Index**: General reasoning and knowledge
-- **Coding Index**: Code generation and analysis capability
-Weights are applied to each agent's score before computing the mean. This favors higher-capability models when consensus is ambiguous.
+When `ARTIFICIAL_ANALYSIS_API_KEY` is set, scores are weighted by each model's Intelligence Index and Coding Index from the Artificial Analysis API. Higher-capability models carry more weight in the final score.
 <Note>
-If `ARTIFICIAL_ANALYSIS_API_KEY` is unset or the API is unavailable, OPM falls back to **equal weighting** (simple arithmetic mean).
+Without the API key, agents are weighted equally (simple mean).
 </Note>
-## Fallback Behavior
+## Provider Routing
-- **Provider**: When `OPENROUTER_API_KEY` is set, OPM routes through OpenRouter. Otherwise it uses `OPENAI_API_KEY` with the GPT-4.1 family.
-- **Force provider**: Set `LLM_PROVIDER=openrouter` or `LLM_PROVIDER=openai` to override auto-detection.
-- **OpenAI models**: `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano` for agent-1, agent-2, agent-3 respectively.
+- `OPENROUTER_API_KEY` set → routes through OpenRouter (multi-model)
+- Only `OPENAI_API_KEY` → falls back to GPT-4.1 family
+- Force a provider: `LLM_PROVIDER=openrouter` or `LLM_PROVIDER=openai`
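
Reviewer sketch (not part of the package): a minimal illustration of intelligence-weighted averaging with the equal-weight fallback described above. The docs do not say how the Intelligence Index and Coding Index are combined into a weight, so averaging the two is purely an assumption here. All type and function names are hypothetical.

```typescript
interface AgentScore {
  agent: string;
  riskScore: number; // 0-100
}

interface ModelIndices {
  intelligenceIndex: number;
  codingIndex: number;
}

// Weighted mean of agent scores. Falls back to a simple mean when indices are
// missing for any agent (mirrors the "no API key" case in the docs).
function aggregateRisk(
  scores: AgentScore[],
  indices?: Record<string, ModelIndices>
): number {
  if (scores.length === 0) throw new Error("at least one agent score is required");

  let weights: number[] = [];
  for (const s of scores) {
    const m = indices ? indices[s.agent] : undefined;
    if (m === undefined) {
      weights = []; // incomplete data: use equal weighting for everyone
      break;
    }
    // Assumption: the weight is the mean of the two indices.
    weights.push((m.intelligenceIndex + m.codingIndex) / 2);
  }

  const totalWeight = weights.reduce((a, b) => a + b, 0);
  if (weights.length !== scores.length || totalWeight <= 0) {
    return scores.reduce((acc, s) => acc + s.riskScore, 0) / scores.length;
  }
  return (
    scores.reduce((acc, s, i) => acc + s.riskScore * (weights[i] ?? 0), 0) /
    totalWeight
  );
}

const demoScores: AgentScore[] = [
  { agent: "agent-1", riskScore: 10 },
  { agent: "agent-2", riskScore: 20 },
  { agent: "agent-3", riskScore: 60 },
];
// Made-up index values, chosen only to make the arithmetic easy to follow.
const demoIndices: Record<string, ModelIndices> = {
  "agent-1": { intelligenceIndex: 60, codingIndex: 60 },
  "agent-2": { intelligenceIndex: 30, codingIndex: 30 },
  "agent-3": { intelligenceIndex: 10, codingIndex: 10 },
};

console.log(aggregateRisk(demoScores)); // equal weighting
console.log(aggregateRisk(demoScores, demoIndices)); // weighted
```

In the demo, the high score from the lowest-weighted agent pulls the weighted result up less than it pulls the simple mean.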

package/docs/concepts/on-chain-registry.mdx CHANGED

@@ -1,72 +1,45 @@
 ---
 title: 'On-chain Registry'
-description: 'OPMRegistry.sol stores packages, versions, author profiles, agent scores, and report URIs on Base Sepolia.'
+description: 'OPMRegistry.sol on Base Sepolia — packages, scores, authors, agents.'
 ---
 # On-chain Registry
-The **OPMRegistry** smart contract is the canonical on-chain store for package metadata, author profiles, agent scores, and report URIs. It implements a domain-specific form of the [ERC-8004 (Trustless Agents)](https://eips.ethereum.org/EIPS/eip-8004) three-registry architecture.
+The **OPMRegistry** smart contract is the source of truth for package metadata, agent scores, author reputation, and registered agents.
 ## Deployment
 | Property | Value |
 |----------|-------|
-| **Chain** | Base Sepolia |
-| **Chain ID** | 84532 |
-| **Contract Address** | `0x16684391fc9bf48246B08Afe16d1a57BFa181d48` |
-| **Explorer** | [BaseScan](https://sepolia.basescan.org/address/0x16684391fc9bf48246B08Afe16d1a57BFa181d48) |
+| **Chain** | Base Sepolia (84532) |
+| **Address** | [`0x16684391fc9bf48246B08Afe16d1a57BFa181d48`](https://sepolia.basescan.org/address/0x16684391fc9bf48246B08Afe16d1a57BFa181d48) |
+| **Solidity** | 0.8.20 |
-Override via `CONTRACT_ADDRESS` in your environment.
+## What It Stores
-## Data Structures
+| Data | Description |
+|------|-------------|
+| **Packages** | Name → version → checksum, signature, author, ENS name, report URI |
+| **Agent scores** | Per-version risk scores (0-100) with reasoning from each agent |
+| **Authors** | Wallet → ENS name, cumulative reputation, package count |
+| **Agents** | Authorized agents (owner-set or ZK-verified) with name, model, proof hash |
-| Struct | Fields |
-|--------|--------|
-| `AuthorProfile` | `addr`, `ensName`, `reputationTotal`, `reputationCount`, `packagesPublished` |
-| `AgentScore` | `agent`, `riskScore`, `reasoning` |
-| `VersionData` | `author`, `checksum`, `signature`, `reportURI`, `scores[]`, `exists` |
-| `Package` | `name`, `versions[]`, `exists` |
-| `RegisteredAgent` | `agentAddress`, `name`, `model`, `systemPromptHash`, `proofHash`, `registeredAt`, `active` |
+## Key Operations
-## Key Functions
+- **`registerPackage`** — Store a new version with checksum, signature, and ENS binding
+- **`submitScore`** — Agent submits risk score + reasoning (authorized agents only)
+- **`setReportURI`** — Attach Fileverse/IPFS report link to a version
+- **`registerAgent`** — Permissionless agent registration with ZK proof hash
+- **`getSafestVersion`** — Get the lowest-risk version in a lookback window
+- **`getPackageInfo`** — Full metadata + aggregate score for a version
-| Function | Access | Description |
-|----------|--------|-------------|
-| `registerPackage(name, version, checksum, sig, ensName)` | Public | Register a new package version with checksum, signature, and ENS binding |
-| `submitScore(name, version, riskScore, reasoning)` | Authorized agents | Submit a risk score (0–100) and reasoning for a package version |
-| `setReportURI(name, version, uri)` | Authorized agents | Attach a Fileverse report URI to a package version |
-| `getPackageInfo(name, version)` | View | Retrieve full metadata and aggregate score for a package version |
-| `getScores(name, version)` | View | Return all individual agent scores for a version |
-| `getAggregateScore(name, version)` | View | Compute mean risk score across all agent submissions |
-| `getSafestVersion(name, lookback)` | View | Return the lowest-risk version within a configurable lookback window |
-| `getVersions(name)` | View | List all registered versions of a package |
-| `getAuthorByAddress(addr)` | View | Retrieve author profile by Ethereum address |
-| `getAuthorByENS(ensName)` | View | Resolve author profile by ENS name |
-| `getAuthorReputation(addr)` | View | Compute author's mean risk score across all packages |
-| `registerAgent(name, model, systemPromptHash, proofHash)` | Public | Permissionless agent registration (requires valid ZK proof) |
-## Events
-| Event | Parameters |
-|-------|------------|
-| `PackageRegistered` | `name`, `version`, `author`, `ensName` |
-| `ScoreSubmitted` | `name`, `version`, `agent`, `riskScore`, `reasoning` |
-| `ReportURISet` | `name`, `version`, `uri` |
-| `AuthorRegistered` | `addr`, `ensName` |
-| `AgentAuthorized` | `agent`, `status` |
-| `AgentRegistered` | `agent`, `name`, `model`, `systemPromptHash`, `proofHash`, `timestamp` |
+See [Contract Functions](/contract/functions) for the complete API reference.
 ## Author Reputation
-Author reputation is computed as:
-```
-reputation = reputationTotal / reputationCount
-```
-Where `reputationTotal` accumulates each agent's score for every version of every package authored by that address. This provides a cumulative risk score average across all published packages.
+Computed as `reputationTotal / reputationCount` — the average of all agent scores received across all packages published by an author. Lower is better.
-## Risk Thresholds (Contract)
+## Risk Thresholds
 | Constant | Value |
 |----------|-------|
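
Reviewer sketch (not part of the package): an off-chain TypeScript model of two computations this file describes, author reputation (`reputationTotal / reputationCount`) and `getSafestVersion` with a lookback window. This is not the contract code. The data shapes, the skipping of unscored versions, and the tie-break toward the newer version are assumptions. `Math.floor` is used because Solidity integer division truncates.

```typescript
interface VersionScores {
  version: string;
  scores: number[]; // one risk score (0-100) per agent submission
}

// Mean score with truncating division, or null when there are no scores.
function aggregateScore(scores: number[]): number | null {
  if (scores.length === 0) return null;
  const total = scores.reduce((a, b) => a + b, 0);
  return Math.floor(total / scores.length);
}

// Author reputation as described in the docs: lower is better.
function authorReputation(reputationTotal: number, reputationCount: number): number {
  return reputationCount === 0 ? 0 : Math.floor(reputationTotal / reputationCount);
}

// Lowest-risk version among the most recent `lookback` versions.
// `versions` is assumed to be ordered oldest to newest.
function safestVersion(versions: VersionScores[], lookback: number): string | null {
  const recent = versions.slice(Math.max(0, versions.length - lookback));
  let best: { version: string; score: number } | null = null;
  for (const v of recent) {
    const score = aggregateScore(v.scores);
    if (score === null) continue; // assumption: unscored versions are skipped
    if (best === null || score <= best.score) {
      best = { version: v.version, score }; // ties go to the newer version
    }
  }
  return best === null ? null : best.version;
}

const demoVersions: VersionScores[] = [
  { version: "1.0.0", scores: [10, 20, 30] },
  { version: "1.1.0", scores: [5, 5, 8] },
  { version: "1.2.0", scores: [50, 60, 70] },
];
console.log(safestVersion(demoVersions, 2));
console.log(authorReputation(36, 3));
```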

package/docs/concepts/security-model.mdx CHANGED

@@ -1,76 +1,44 @@
 ---
 title: 'Security Model'
-description: 'OPM defense-in-depth: cryptographic attestation, multi-agent AI, CVE integration, supply chain checks, and ENS-based trust.'
+description: 'Five layers of defense: crypto, AI, CVEs, supply chain checks, and ENS trust.'
 ---
 # Security Model
-OPM employs a **defense-in-depth** architecture across five layers. No single layer is sufficient; together they mitigate supply chain injection, typosquatting, dependency confusion, maintainer takeover, and known vulnerability exploitation.
+OPM layers five independent defenses. No single layer is sufficient — together they catch supply chain injection, typosquatting, dependency confusion, maintainer takeover, and known CVEs.
-## 1. Cryptographic Layer
+## 1. Cryptographic Signing
-- **SHA-256 checksum**: Computed over the packed tarball; stored on-chain and verified at install time
-- **ECDSA signature**: secp256k1 signature over the checksum, derived from the author's Ethereum private key
-- **On-chain registration**: Checksum, signature, and author binding stored in `OPMRegistry` on Base Sepolia
+Every package gets a SHA-256 checksum and ECDSA signature (secp256k1) from the author's Ethereum wallet. Both are stored on-chain and verified at install time. Tampered tarballs are rejected.
-Installation is blocked if the tarball checksum does not match the on-chain value or if the signature verification fails.
+## 2. Multi-Agent AI Scanning
-## 2. AI Layer
+Three LLMs (Claude, Gemini, DeepSeek) analyze source code, dependency metadata, and version history in parallel. Each submits a risk score (0-100) on-chain. Scores are aggregated using intelligence-weighted averaging via the Artificial Analysis API.
-Three heterogeneous LLMs analyze packages in parallel:
+Agents flag: install scripts, native bindings, obfuscated code, network calls, filesystem access, process spawning, `eval` usage, and env var access.
-- Static analysis of source code, dependency metadata, and version history
-- Each agent submits a risk score (0–100) and reasoning on-chain
-- **Intelligence-weighted aggregation** via the Artificial Analysis API (Intelligence Index, Coding Index)
-- Fallback to equal weighting if the API is unavailable
+## 3. CVE Detection
-<CardGroup cols={2}>
-  <Card title="Agent 1" icon="robot">
-    Claude Sonnet 4 (OpenRouter) / GPT-4.1 (OpenAI fallback)
-  </Card>
-  <Card title="Agent 2" icon="robot">
-    Gemini 2.5 Flash (OpenRouter) / GPT-4.1 Mini (OpenAI fallback)
-  </Card>
-  <Card title="Agent 3" icon="robot">
-    DeepSeek Chat (OpenRouter) / GPT-4.1 Nano (OpenAI fallback)
-  </Card>
-</CardGroup>
+Real-time lookup against the OSV database (CVE + GHSA advisories) with CVSS v3 severity scoring. CRITICAL CVEs block installation. HIGH CVEs trigger warnings with suggested fix versions.
-## 3. CVE Layer
+## 4. Supply Chain Checks
-- **OSV (Open Source Vulnerabilities)**: Real-time CVE and GHSA advisory data
-- **GitHub Advisory Database**: Integrated via OSV API
-- **CVSS v3** base score computation for severity classification
-- **CRITICAL** severity: installation blocked
-- **HIGH** severity: warning with suggested fix version
+| Check | How |
+|-------|-----|
+| Typosquatting | Levenshtein distance against top npm packages + AI name similarity assessment |
+| Dependency confusion | Scoped vs unscoped name conflicts surfaced during `opm check` |
+| ChainPatrol blocklist | Fallback for packages not in the on-chain registry |
-## 4. Supply Chain Layer
+## 5. ENS Trust Layer
-| Check | Description |
-|-------|-------------|
-| **Typosquat detection** | Package names compared against npm search results and download-count differentials; AI agents assess name similarity |
-| **Dependency confusion** | Scoped vs unscoped name conflicts and internal package shadowing surfaced during `opm check` |
-| **ChainPatrol blocklist** | Fallback blocklist for packages absent from the on-chain registry (requires `CHAINPATROL_API_KEY`) |
-AI agents also flag: install scripts, native bindings, obfuscated code, network calls, filesystem access, process spawning, `eval` usage, and environment variable access.
-## 5. Trust Layer
-- **ENS identity resolution**: Author addresses resolved to ENS names (Sepolia, Mainnet fallback)
-- **On-chain author reputation**: Cumulative risk score average across all published packages
-- **Author profiles**: ENS text records (avatar, description, URL, GitHub, Twitter, email) for human verification
+Author addresses resolve to ENS names. Reputation is computed as the average risk score across all published packages. ENS text records (avatar, GitHub, Twitter) provide human-verifiable identity signals.
 ## Risk Thresholds
-| Range | Level | Effect |
-|-------|-------|--------|
+| Score | Level | What happens |
+|-------|-------|-------------|
 | 0–20 | LOW | Safe to install |
 | 21–40 | MEDIUM | Flagged for caution |
 | 41–70 | HIGH | Warnings triggered |
 | 71–100 | CRITICAL | High risk |
-| Threshold | Value | Behavior |
-|-----------|-------|----------|
-| Block threshold (CLI) | 80 | `opm push` blocks publication; `opm install` blocks installation |
-| `HIGH_RISK_THRESHOLD` (contract) | 70 | Packages above trigger warnings |
-| `MEDIUM_RISK_THRESHOLD` (contract) | 40 | Packages above flagged for caution |
+| **≥ 80** | — | **Blocks `opm push` and `opm install`** |
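
Reviewer sketch (not part of the package): the risk table above maps directly to a small classifier. The function names are hypothetical. The ranges and the block threshold of 80 are taken from the table.

```typescript
type RiskLevel = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// CLI block threshold from the table: scores at or above this block push and install.
const BLOCK_THRESHOLD = 80;

// Maps a 0-100 risk score to the level ranges in the table.
function riskLevel(score: number): RiskLevel {
  if (score < 0 || score > 100) {
    throw new RangeError(`risk score out of range: ${score}`);
  }
  if (score <= 20) return "LOW";
  if (score <= 40) return "MEDIUM";
  if (score <= 70) return "HIGH";
  return "CRITICAL";
}

function isBlocked(score: number): boolean {
  return score >= BLOCK_THRESHOLD;
}

for (const score of [20, 21, 70, 71, 79, 80]) {
  console.log(score, riskLevel(score), isBlocked(score) ? "blocked" : "allowed");
}
```

Note that CRITICAL starts at 71 while blocking starts at 80, so scores from 71 to 79 are CRITICAL but not blocked under this reading of the table.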

package/docs/concepts/zk-agent-verification.mdx CHANGED

@@ -1,82 +1,44 @@
 ---
 title: 'ZK Agent Verification'
-description: 'Permissionless agent registration requires passing a benchmark suite and proving 100% accuracy via zero-knowledge proofs.'
+description: 'Permissionless agent registration with zero-knowledge proof of 100% benchmark accuracy.'
 ---
 # ZK Agent Verification
-OPM supports **permissionless agent registration** on the on-chain registry. To prevent malicious or spamming agents from participating, agents must pass a benchmark suite and prove **100% accuracy** via a zero-knowledge proof—without revealing the test data or individual results.
+Anyone can register a security agent on OPM. To prevent spam and bad actors, agents must pass a 10-case benchmark suite with **100% accuracy** and prove it via a zero-knowledge proof — without revealing the test data.
 ## Benchmark Suite
-The benchmark consists of **10 labeled test cases** covering:
+10 labeled test cases covering:
-| Category | Description |
-|----------|-------------|
-| Clean packages | Legitimate, low-risk packages |
-| Typosquats | Name-similar malicious packages |
-| Env exfiltration | Environment variable exfiltration attempts |
-| Obfuscated code | Heavily obfuscated or minified payloads |
-| Postinstall attacks | Malicious `postinstall` scripts |
-| Known CVEs | Packages with known vulnerabilities |
-| Dependency confusion | Scoped vs unscoped name conflicts |
+| Category | Cases | Expected |
+|----------|-------|----------|
+| Clean packages | 3 | SAFE |
+| Typosquat | 1 | FLAGGED |
+| Malicious (postinstall, SSH exfil) | 2 | FLAGGED |
+| Known CVE | 1 | FLAGGED |
+| Obfuscated payload | 1 | FLAGGED |
+| Env var exfiltration | 1 | FLAGGED |
+| Dependency confusion | 1 | FLAGGED |
-Each test case has an **expected output**: a risk level (LOW, MEDIUM, HIGH, CRITICAL) and score range. Expected outputs are committed via a hash before the agent runs.
+## How It Works
-## Verification Flow
+1. **Commit** — Expected outputs hashed with a random salt → `commitmentHash`
+2. **Run** — Agent processes all 10 cases in a single LLM call
+3. **Compare** — Actual outputs checked against expected
+4. **Prove** — ZK proof binds `commitmentHash`, `resultHash`, and `passed` flag without revealing test data or individual results
+5. **Register** — If 100% accurate, call `OPMRegistry.registerAgent()` with the proof hash
-1. **Commitment**: Expected outputs hashed with a salt → `commitmentHash`
-2. **Execution**: Candidate agent runs against all 10 cases
-3. **Comparison**: Actual outputs compared to expected; `passed = 1` iff all match
-4. **Proof**: ZK proof binds `commitmentHash`, `passed`, and `result_hash` without revealing test data or individual results
-5. **Registration**: Only agents with `passed = 1` (100% accuracy) are registered on-chain
+The proof proves accuracy without leaking the labeled dataset, so future agents can't game the benchmark.
 ## Circom Circuit
-The `accuracy_verifier.circom` circuit implements the verification logic:
+`AccuracyVerifier(10)` in `packages/contracts/circuits/accuracy_verifier.circom`:
-- **Private inputs**: `expected[N]`, `actual[N]`, `salt`
-- **Public input**: `commitmentHash` (Poseidon hash of `salt` and `expected[0..N-1]`)
-- **Public outputs**: `passed` (1 if all match, 0 otherwise), `proofHash`
+- **Private inputs**: `expected[10]`, `actual[10]`, `salt`
+- **Public input**: `commitmentHash` (Poseidon hash of salt + expected values)
+- **Public outputs**: `passed` (1 if all match), `proofHash`
-The circuit:
-1. Verifies the commitment: `hash(salt, expected[0..N-1]) === commitmentHash`
-2. Checks equality for each test case: `expected[i] === actual[i]`
-3. Computes `passed` as the product of all match bits (1 iff all match)
-4. Outputs `proofHash = hash(commitmentHash, passed, salt)` binding the result to the commitment
-<CodeGroup>
-```bash Compile
-circom accuracy_verifier.circom --r1cs --wasm --sym -o build/
-```
-```bash Trusted setup
-snarkjs groth16 setup build/accuracy_verifier.r1cs pot12_final.ptau build/accuracy_verifier_0000.zkey
-snarkjs zkey contribute build/accuracy_verifier_0000.zkey build/accuracy_verifier_final.zkey --name="opm-ceremony"
-snarkjs zkey export verificationkey build/accuracy_verifier_final.zkey build/verification_key.json
-```
-```bash Prove
-snarkjs groth16 prove build/accuracy_verifier_final.zkey build/accuracy_verifier_js/accuracy_verifier.wasm input.json build/proof.json build/public.json
-```
-```bash Verify
-snarkjs groth16 verify build/verification_key.json build/public.json build/proof.json
-```
-</CodeGroup>
-## On-Chain Registration
-Agents call `OPMRegistry.registerAgent(name, model, systemPromptHash, proofHash)` with a valid proof hash. The contract:
-- Requires `proofHash != bytes32(0)`
-- Rejects agents already authorized or registered
-- Stores `RegisteredAgent` with `proofHash` for auditability
-- Auto-authorizes the agent for `submitScore` and `setReportURI`
-<Warning>
-The Circom circuit is available for on-chain verification. A Solidity verifier can be exported via `snarkjs zkey export solidityverifier` for full trustless verification in future contract upgrades.
-</Warning>
+<Note>
+Currently, `keccak256(proof)` is stored on-chain. Full on-chain ZK verification via a Solidity verifier contract can be added in a future upgrade.
+</Note>
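
Reviewer sketch (not part of the package): the removed 0.1.4 text says the circuit computes `passed` as the product of per-case match bits. That logic can be mirrored in plain TypeScript as below. This is not the Circom circuit and produces no proof. The Poseidon commitment and `proofHash` are omitted, and the encoding of verdicts as small integers is an assumption.

```typescript
// One bit per test case: 1 when the actual value equals the expected one.
function matchBits(expected: number[], actual: number[]): number[] {
  if (expected.length !== actual.length) {
    throw new Error("expected and actual must have the same length");
  }
  return expected.map((e, i) => (e === actual[i] ? 1 : 0));
}

// Product of all match bits: 1 only if every case matches, otherwise 0.
function passedFlag(expected: number[], actual: number[]): number {
  return matchBits(expected, actual).reduce((acc, bit) => acc * bit, 1);
}

// Assumed encoding: 0 = SAFE, 1 = FLAGGED (3 clean cases, 7 malicious ones).
const expectedVerdicts = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1];
const perfectRun = [...expectedVerdicts];
const oneMiss = [...expectedVerdicts];
oneMiss[9] = 0;

console.log(passedFlag(expectedVerdicts, perfectRun));
console.log(passedFlag(expectedVerdicts, oneMiss));
```

A product rather than a sum suits an arithmetic circuit: a single zero bit forces the output to zero, so no threshold comparison is needed.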