npm - create-verifiable-agent - Versions diffs - 1.0.0 - Mend

create-verifiable-agent 1.0.0

Files changed (14)

package/README.md +222 -0
package/bin/create-verifiable-agent.js +51 -0
package/demo/mythos-recipe.yaml +183 -0
package/demo/mythos.js +337 -0
package/package.json +49 -0
package/src/analyzer.js +216 -0
package/src/collab-card.js +94 -0
package/src/demo-loader.js +17 -0
package/src/generator.js +190 -0
package/src/html-extractor.js +262 -0
package/src/index.js +107 -0
package/src/notebook.js +277 -0
package/src/plan.js +49 -0
package/src/verifier.js +320 -0

package/README.md ADDED

@@ -0,0 +1,222 @@
+# create-verifiable-agent
+> Turn any GitHub repo (or local codebase) into a **verifiable multi-agent recipe** in one command — powered by **Claude Sonnet 4.6 + Computer Use API**.
+```bash
+npx create-verifiable-agent https://github.com/your/repo
+```
+---
+## What it does
+| Output | Description |
+|--------|-------------|
+| `recipe.yaml` | Multi-agent YAML recipe with 5 specialized agents |
+| `verification-report.yaml` | Self-consistency scores + provenance chain |
+| `notebook.md` | Interactive Markdown notebook with full workflow trace |
+| `collab-card.md` | Human-AI collaboration card with trust boundaries |
+---
+## The Mythos Demo
+In March 2025, the Mythos/Capybara internal AI strategy leaked publicly. The core failure: agents running straight to production with no verifier, no sandbox, no human gates.
+Run the simulation (no API key needed):
+```bash
+npx create-verifiable-agent --demo mythos --sandbox
+```
+It will catch all 5 simulated vulnerabilities:
+| ID | Severity | Finding |
+|----|----------|---------|
+| MYTH-001 | 🔴 CRITICAL | Hard-coded API keys in CI/CD |
+| MYTH-002 | 🔴 CRITICAL | No human-in-the-loop gates |
+| MYTH-003 | 🟠 HIGH | Missing self-consistency verification |
+| MYTH-004 | 🟠 HIGH | Computer Use without audit trail |
+| MYTH-005 | 🟡 MEDIUM | No sandbox mode |
+> **Note:** This is a fictional/educational simulation based on publicly available summaries. No real Mythos code or confidential data is included.
+---
+## Quick start
+### Prerequisites
+```bash
+# Requires Node.js >= 18
+# Requires npm >= 9
+export ANTHROPIC_API_KEY=sk-ant-...
+```
+### Run on a GitHub repo
+```bash
+npx create-verifiable-agent https://github.com/anthropics/anthropic-sdk-python
+```
+### Run on a local path
+```bash
+npx create-verifiable-agent ./my-project
+```
+### Safe sandbox mode (no real API calls)
+```bash
+npx create-verifiable-agent https://github.com/your/repo --sandbox
+```
+### Plan mode (default on the Pro plan; shows the plan before executing)
+```bash
+npx create-verifiable-agent https://github.com/your/repo
+# Shows plan, asks for confirmation before making any API calls
+```
+### Auto-accept (skip confirmation prompt)
+```bash
+npx create-verifiable-agent https://github.com/your/repo --accept-edits
+```
+---
+## CLI options
+```
+npx create-verifiable-agent [source] [options]
+Arguments:
+  source                  GitHub URL or local path (omit for Mythos demo)
+Options:
+  -o, --output <dir>      Output directory (default: ./agent-output)
+  --sandbox               Safe mode: no real API calls or mutations
+  --plan                  Show plan before executing (default: true)
+  --accept-edits          Auto-accept all edits (skip confirmation)
+  --demo <name>           Built-in demo: mythos (default)
+  --no-notebook           Skip the Markdown notebook
+  --no-collab-card        Skip the collaboration card
+  --model <model>         Claude model (default: claude-sonnet-4-6)
+  --max-files <n>         Max files to analyze (default: 50)
+  --api-key <key>         Anthropic API key
+  -V, --version           Show version
+  -h, --help              Show help
+```
+---
+## How it works
+```
+Source repo
+       │
+       ▼
+┌─────────────┐
+│  analyzer   │  Scans files, detects stack, extracts architecture
+└──────┬──────┘
+       │
+       ▼
+┌─────────────┐
+│   planner   │  Decomposes goal, assigns tasks to agents
+└──────┬──────┘
+       │ requires human approval ✋
+       ▼
+┌─────────────┐
+│  executor   │  Implements changes (sandbox only by default)
+└──────┬──────┘
+       │
+       ▼
+┌──────────────────┐
+│ computer_use     │  Browser/UI validation with screenshot audit trail
+│ agent            │  (Claude Computer Use API)
+└──────┬───────────┘
+       │
+       ▼
+┌─────────────┐
+│  verifier   │  Self-consistency (3 samples) + provenance chain
+└──────┬──────┘
+       │
+       ▼
+  Outputs ──► recipe.yaml + verification-report.yaml + notebook.md + collab-card.md
+```
+---
+## Verification loops
+### Self-consistency
+Every critical output is generated **3 times** and compared. If results diverge by more than 20%, the finding is escalated to human review.
+### Provenance tracking
+Every agent output is **SHA-256 hashed** and linked to its inputs, forming a tamper-evident chain you can audit later.
+### Human review gates
+By default, humans must approve:
+- The task plan (before executor runs)
+- Any file writes or shell commands
+- The final verification report
+---
+## Safety & sandbox mode
+`sandbox_mode: true` is the default. It:
+- Blocks all external API calls
+- Blocks all file mutations
+- Blocks all shell command execution
+- Enables full Computer Use screenshot audit trail
+- Requires human approval for every agent action
+To run in live mode, explicitly pass `--no-sandbox` after reviewing the plan.
+---
+## Architecture
+```
+create-verifiable-agent/
+├── bin/
+│   └── create-verifiable-agent.js   # CLI entry point
+├── src/
+│   ├── index.js                     # Main orchestrator
+│   ├── analyzer.js                  # Codebase analysis
+│   ├── generator.js                 # YAML recipe generation (Claude API)
+│   ├── verifier.js                  # Self-consistency + provenance
+│   ├── notebook.js                  # Interactive Markdown notebook
+│   ├── collab-card.js               # Human-AI collaboration card
+│   ├── plan.js                      # Plan mode UI
+│   └── demo-loader.js               # Demo context loader
+├── demo/
+│   ├── mythos.js                    # Mythos cyber-risk simulation
+│   └── mythos-recipe.yaml           # Pre-built Mythos recipe
+├── test/
+│   └── run-tests.js                 # Test suite
+├── demo-script.md                   # 60-second video demo script
+└── README.md
+```
+---
+## Run tests
+```bash
+npm test
+```
+---
+## Contributing
+Issues and PRs welcome at [kju4q/verifiable-agent-recipe](https://github.com/kju4q/verifiable-agent-recipe).
+---
+## License
+MIT
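
The README's self-consistency rule (three samples, escalate when they diverge by more than 20%) can be sketched as follows. This is a minimal illustration, not the code in `src/verifier.js`, which is not shown in this part of the diff. It assumes each sample has already been reduced to a numeric score and reads "diverge" as relative spread, `(max - min) / |mean|`; the README does not define the measure. The names `relativeSpread` and `checkSelfConsistency` are hypothetical. The threshold is a parameter because the README states 20% while the demo recipe's description states 15%.

```javascript
'use strict';

// Hypothetical helper: relative spread of numeric scores, (max - min) / |mean|.
// The README does not define "diverge"; this is one possible reading.
function relativeSpread(scores) {
  if (scores.length === 0) throw new Error('need at least one score');
  const max = Math.max(...scores);
  const min = Math.min(...scores);
  if (max === min) return 0;
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return mean === 0 ? Infinity : (max - min) / Math.abs(mean);
}

// Flags a finding for human review when too few samples were collected or
// when the samples disagree by more than `maxSpread`.
function checkSelfConsistency(scores, { minSamples = 3, maxSpread = 0.2 } = {}) {
  const spread = relativeSpread(scores);
  const escalate = scores.length < minSamples || spread > maxSpread;
  return { spread, escalate };
}

console.log(checkSelfConsistency([0.9, 0.9, 0.9])); // identical samples, no escalation
console.log(checkSelfConsistency([0.9, 0.5, 0.7])); // wide spread, escalated
```

Passing `{ maxSpread: 0.15 }` would apply the stricter figure from the demo recipe instead.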

package/bin/create-verifiable-agent.js ADDED

@@ -0,0 +1,51 @@
+#!/usr/bin/env node
+'use strict';
+const { Command } = require('commander');
+const chalk = require('chalk');
+const path = require('path');
+const { run } = require('../src/index');
+const program = new Command();
+console.log(chalk.cyan.bold('\n  create-verifiable-agent'));
+console.log(chalk.gray('  Claude Sonnet 4.6 + Computer Use  →  verifiable multi-agent recipe\n'));
+program
+  .name('create-verifiable-agent')
+  .description('Turn any GitHub repo or local codebase into a verifiable multi-agent YAML recipe')
+  .argument('[source]', 'GitHub URL or local path to analyze (omit to use Mythos demo)')
+  .option('-o, --output <dir>', 'output directory', './agent-output')
+  .option('--sandbox', 'run in safe sandbox mode (no real API calls, no mutations)', false)
+  .option('--plan', 'show plan before executing (default for Pro)', true)
+  .option('--accept-edits', 'auto-accept all edits without prompting', false)
+  .option('--demo <name>', 'run a built-in demo: mythos (default)', 'mythos')
+  .option('--no-notebook', 'skip generating the interactive Markdown notebook')
+  .option('--no-collab-card', 'skip generating the human-AI collaboration card')
+  .option('--model <model>', 'Claude model to use', 'claude-sonnet-4-6')
+  .option('--max-files <n>', 'max files to analyze', '50')
+  .option('--api-key <key>', 'Anthropic API key (or set ANTHROPIC_API_KEY env var)')
+  .version('1.0.0');
+program.parse(process.argv);
+const opts = program.opts();
+const source = program.args[0] || '__demo__';
+run({
+  source,
+  outputDir: path.resolve(opts.output),
+  sandbox: opts.sandbox,
+  planMode: opts.plan,
+  acceptEdits: opts.acceptEdits,
+  demo: source === '__demo__' ? opts.demo : null,
+  notebook: opts.notebook !== false,
+  collabCard: opts.collabCard !== false,
+  model: opts.model,
+  maxFiles: parseInt(opts.maxFiles, 10),
+  apiKey: opts.apiKey || process.env.ANTHROPIC_API_KEY,
+}).catch(err => {
+  console.error(chalk.red('\nFatal error:'), err.message);
+  if (process.env.DEBUG) console.error(err.stack);
+  process.exit(1);
+});
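
The mapping from parsed CLI state to the `run()` options above could be factored into a pure function so it can be unit-tested without spawning the CLI. The sketch below mirrors the fields in the script; `resolveRunOptions` is a hypothetical name, not part of the package, and the `--max-files` guard is an addition (the script as published passes `NaN` through when the value is not numeric).

```javascript
'use strict';
const path = require('node:path');

// Hypothetical helper (not part of the package): maps positional args and
// parsed options to the object that the bin script passes to run().
function resolveRunOptions(args, opts, env = process.env) {
  const source = args[0] || '__demo__';
  const maxFiles = Number.parseInt(opts.maxFiles, 10);
  if (!Number.isInteger(maxFiles) || maxFiles <= 0) {
    // Added guard, not in the original script.
    throw new Error(`--max-files must be a positive integer, got "${opts.maxFiles}"`);
  }
  return {
    source,
    outputDir: path.resolve(opts.output),
    sandbox: Boolean(opts.sandbox),
    planMode: opts.plan,
    acceptEdits: Boolean(opts.acceptEdits),
    // The demo name only applies when no source was given.
    demo: source === '__demo__' ? opts.demo : null,
    notebook: opts.notebook !== false,
    collabCard: opts.collabCard !== false,
    model: opts.model,
    maxFiles,
    apiKey: opts.apiKey || env.ANTHROPIC_API_KEY,
  };
}
```

With such a helper, the bin script would reduce to `run(resolveRunOptions(program.args, program.opts()))` plus the existing error handler.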

package/demo/mythos-recipe.yaml ADDED

@@ -0,0 +1,183 @@
+# Mythos / Capybara Cyber-Risk Simulation — Pre-built Recipe
+# FICTIONAL / EDUCATIONAL USE ONLY
+# Based on publicly available summaries of the March 2025 Mythos strategy leak
+metadata:
+  name: mythos-capybara-verifiable-agent
+  version: 1.0.0
+  description: >
+    Multi-agent recipe that would have caught the Mythos/Capybara security
+    vulnerabilities before the March 2025 leak became public.
+    EDUCATIONAL SIMULATION ONLY.
+  source_repo: mythos-capybara-sim
+  generated_at: "2026-03-31T00:00:00Z"
+  model: claude-sonnet-4-6
+  computer_use_enabled: true
+safety:
+  sandbox_mode: true
+  guardrails:
+    - no_destructive_writes
+    - no_external_api_calls_in_sandbox
+    - human_approval_required_for_mutations
+    - block_hardcoded_secrets
+    - require_screenshot_audit_trail
+    - minimum_confidence_threshold_0.85
+  plan_mode: true
+  accept_edits: false
+agents:
+  - id: secret_scanner
+    role: Secret & Credentials Scanner
+    model: claude-sonnet-4-6
+    tools: [read_file, grep, glob]
+    responsibilities:
+      - Scan all files for hard-coded API keys, tokens, passwords
+      - Check CI/CD configs for exposed credentials
+      - Flag any environment variable misuse
+    inputs: [source_repo_path]
+    outputs: [secret_scan_report, credential_risk_map]
+  - id: agent_architecture_auditor
+    role: Agent Architecture Auditor
+    model: claude-sonnet-4-6
+    tools: [read_file, glob]
+    responsibilities:
+      - Detect single-agent vs multi-agent patterns
+      - Flag missing human-in-the-loop gates
+      - Identify auto-approve / auto-execute patterns
+      - Check for verifier agent presence
+    inputs: [source_repo_path, codebase_summary]
+    outputs: [architecture_audit, gate_gap_report]
+  - id: computer_use_auditor
+    role: Computer Use Provenance Auditor
+    model: claude-sonnet-4-6
+    computer_use: true
+    tools: [screenshot, read_file, grep]
+    responsibilities:
+      - Verify Computer Use sessions have screenshot audit trails
+      - Confirm all UI interactions are logged and hashed
+      - Validate no unlogged browser sessions exist
+    inputs: [source_repo_path]
+    outputs: [cu_audit_report, screenshot_hash_manifest]
+  - id: risk_verifier
+    role: Multi-Sample Risk Verifier
+    model: claude-sonnet-4-6
+    tools: [read_file, bash]
+    responsibilities:
+      - Run self-consistency checks on risk scores (3 samples minimum)
+      - Flag any risk decision below 0.85 confidence
+      - Verify provenance chain for all findings
+    inputs: [secret_scan_report, architecture_audit, cu_audit_report]
+    outputs: [verification_report, confidence_scores, provenance_chain]
+    safety:
+      require_sandbox: true
+      min_samples: 3
+      confidence_threshold: 0.85
+  - id: remediation_planner
+    role: Remediation Planner
+    model: claude-sonnet-4-6
+    tools: [write_file, read_file]
+    responsibilities:
+      - Generate prioritized remediation plan
+      - Write fix templates for each finding
+      - Create human-AI collaboration card
+    inputs: [verification_report, gate_gap_report]
+    outputs: [remediation_plan, fix_templates, collab_card]
+    safety:
+      require_human_approval: true
+workflow:
+  - step: 1
+    agent: secret_scanner
+    action: scan_for_hardcoded_secrets
+    outputs_to: [risk_verifier]
+    expected_findings:
+      - MYTH-001  # Hard-coded OPENAI_KEY in ci/deploy.yml
+  - step: 2
+    agent: agent_architecture_auditor
+    action: audit_agent_patterns
+    outputs_to: [risk_verifier]
+    expected_findings:
+      - MYTH-002  # auto_approve=True in agent_runner.py
+      - MYTH-003  # Missing confidence thresholds in risk_scorer.py
+      - MYTH-005  # No sandbox_mode flag
+  - step: 3
+    agent: computer_use_auditor
+    action: verify_cu_provenance
+    outputs_to: [risk_verifier]
+    optional: false
+    expected_findings:
+      - MYTH-004  # No screenshot audit trail
+  - step: 4
+    agent: risk_verifier
+    action: multi_sample_verification
+    outputs_to: [remediation_planner]
+    requires_approval: false
+    self_consistency:
+      samples: 3
+      threshold: 0.85
+  - step: 5
+    agent: remediation_planner
+    action: generate_remediation_plan
+    outputs_to: null
+    requires_approval: true  # Human must sign off
+verification:
+  self_consistency:
+    enabled: true
+    method: multi_sample
+    samples: 3
+    threshold: 0.85
+    description: >
+      Each risk finding is scored 3 times. If scores diverge > 15%,
+      the finding is escalated to human review before any action.
+  provenance:
+    enabled: true
+    track_inputs: true
+    track_model_version: true
+    track_timestamps: true
+    hash_outputs: true
+    screenshot_audit: true
+    description: >
+      All agent outputs (including Computer Use screenshots) are SHA-256
+      hashed and linked to their inputs, forming a tamper-evident chain.
+  human_review_gates:
+    - after_risk_verifier
+    - before_remediation_planner
+    - before_any_file_write
+findings_simulated:
+  - id: MYTH-001
+    severity: CRITICAL
+    title: Hard-coded API keys in CI/CD pipeline
+    would_be_caught_at: step_1
+  - id: MYTH-002
+    severity: CRITICAL
+    title: No human-in-the-loop gates on autonomous agent
+    would_be_caught_at: step_2
+  - id: MYTH-003
+    severity: HIGH
+    title: Missing self-consistency verification on risk scores
+    would_be_caught_at: step_2
+  - id: MYTH-004
+    severity: HIGH
+    title: Computer Use agent without screenshot audit trail
+    would_be_caught_at: step_3
+  - id: MYTH-005
+    severity: MEDIUM
+    title: No sandbox mode — agents run against production
+    would_be_caught_at: step_2
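
The `provenance` section describes SHA-256 hashing each agent output and linking it to its inputs. One way such a chain could look is sketched below with Node's built-in `crypto` module. The record layout and the names `appendRecord` and `verifyChain` are assumptions for illustration; the package's actual format lives in `src/verifier.js`, which is not shown in this part of the diff.

```javascript
'use strict';
const { createHash } = require('node:crypto');

function sha256(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Returns a new chain with one record appended. Each record stores the hash
// of the agent's output, the hashes of its inputs, and the previous record's
// hash, so editing an earlier record invalidates every later link.
function appendRecord(chain, { agent, inputs, output }) {
  const prev = chain.length ? chain[chain.length - 1].recordHash : null;
  const body = {
    agent,
    inputHashes: inputs.map(sha256),
    outputHash: sha256(output),
    prev,
  };
  return [...chain, { ...body, recordHash: sha256(JSON.stringify(body)) }];
}

// Recomputes every record hash and checks each link to its predecessor.
function verifyChain(chain) {
  return chain.every((record, i) => {
    const { recordHash, ...body } = record;
    const expectedPrev = i === 0 ? null : chain[i - 1].recordHash;
    return body.prev === expectedPrev && sha256(JSON.stringify(body)) === recordHash;
  });
}

let chain = appendRecord([], {
  agent: 'secret_scanner',
  inputs: ['source_repo_path'],
  output: 'secret_scan_report',
});
chain = appendRecord(chain, {
  agent: 'risk_verifier',
  inputs: ['secret_scan_report'],
  output: 'verification_report',
});
console.log(verifyChain(chain)); // true
```

A chain like this only shows tampering if the latest `recordHash` is also kept somewhere the agents cannot rewrite. Otherwise an attacker could recompute every hash after an edit.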