create-verifiable-agent 1.0.0

package/README.md ADDED
@@ -0,0 +1,222 @@
+ # create-verifiable-agent
+
+ > Turn any GitHub repo (or local codebase) into a **verifiable multi-agent recipe** in one command, powered by **Claude Sonnet 4.6 + Computer Use API**.
+
+ ```bash
+ npx create-verifiable-agent https://github.com/your/repo
+ ```
+
+ ---
+
+ ## What it does
+
+ | Output | Description |
+ |--------|-------------|
+ | `recipe.yaml` | Multi-agent YAML recipe with 5 specialized agents |
+ | `verification-report.yaml` | Self-consistency scores + provenance chain |
+ | `notebook.md` | Interactive Markdown notebook with full workflow trace |
+ | `collab-card.md` | Human-AI collaboration card with trust boundaries |
+
+ ---
+
+ ## The Mythos Demo
+
+ In March 2025, the Mythos/Capybara internal AI strategy leaked publicly. The core failure: agents running straight to production with no verifier, no sandbox, no human gates.
+
+ Run the simulation (no API key needed):
+
+ ```bash
+ npx create-verifiable-agent --demo mythos --sandbox
+ ```
+
+ It will catch all 5 simulated vulnerabilities:
+
+ | ID | Severity | Finding |
+ |----|----------|---------|
+ | MYTH-001 | 🔴 CRITICAL | Hard-coded API keys in CI/CD |
+ | MYTH-002 | 🔴 CRITICAL | No human-in-the-loop gates |
+ | MYTH-003 | 🟠 HIGH | Missing self-consistency verification |
+ | MYTH-004 | 🟠 HIGH | Computer Use without audit trail |
+ | MYTH-005 | 🟡 MEDIUM | No sandbox mode |
+
+ > **Note:** This is a fictional/educational simulation based on publicly available summaries. No real Mythos code or confidential data is included.
+
+ ---
+
+ ## Quick start
+
+ ### Prerequisites
+
+ ```bash
+ node --version   # must be >= 18
+ npm --version    # must be >= 9
+ export ANTHROPIC_API_KEY=sk-ant-...
+ ```
+
+ ### Run on a GitHub repo
+
+ ```bash
+ npx create-verifiable-agent https://github.com/anthropics/anthropic-sdk-python
+ ```
+
+ ### Run on a local path
+
+ ```bash
+ npx create-verifiable-agent ./my-project
+ ```
+
+ ### Safe sandbox mode (no real API calls)
+
+ ```bash
+ npx create-verifiable-agent https://github.com/your/repo --sandbox
+ ```
+
+ ### Pro plan: plan mode (default, shows plan before executing)
+
+ ```bash
+ npx create-verifiable-agent https://github.com/your/repo
+ # Shows plan, asks for confirmation before making any API calls
+ ```
+
+ ### Auto-accept (skip confirmation prompt)
+
+ ```bash
+ npx create-verifiable-agent https://github.com/your/repo --accept-edits
+ ```
+
+ ---
+
+ ## CLI options
+
+ ```
+ npx create-verifiable-agent [source] [options]
+
+ Arguments:
+   source              GitHub URL or local path (omit for Mythos demo)
+
+ Options:
+   -o, --output <dir>  Output directory (default: ./agent-output)
+   --sandbox           Safe mode: no real API calls or mutations
+   --plan              Show plan before executing (default: true)
+   --accept-edits      Auto-accept all edits (skip confirmation)
+   --demo <name>       Built-in demo: mythos (default)
+   --no-notebook       Skip the Markdown notebook
+   --no-collab-card    Skip the collaboration card
+   --model <model>     Claude model (default: claude-sonnet-4-6)
+   --max-files <n>     Max files to analyze (default: 50)
+   --api-key <key>     Anthropic API key
+   -V, --version       Show version
+   -h, --help          Show help
+ ```
+
+ ---
+
+ ## How it works
+
+ ```
+ Source repo
+        │
+        ▼
+ ┌─────────────┐
+ │  analyzer   │  Scans files, detects stack, extracts architecture
+ └──────┬──────┘
+        │
+        ▼
+ ┌─────────────┐
+ │  planner    │  Decomposes goal, assigns tasks to agents
+ └──────┬──────┘
+        │  requires human approval ✋
+        ▼
+ ┌─────────────┐
+ │  executor   │  Implements changes (sandbox only by default)
+ └──────┬──────┘
+        │
+        ▼
+ ┌──────────────────┐
+ │  computer_use    │  Browser/UI validation with screenshot audit trail
+ │  agent           │  (Claude Computer Use API)
+ └──────────────────┘
+        │
+        ▼
+ ┌─────────────┐
+ │  verifier   │  Self-consistency (3 samples) + provenance chain
+ └──────┬──────┘
+        │
+        ▼
+ Outputs ──► recipe.yaml + verification-report.yaml + notebook.md + collab-card.md
+ ```
+
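The flow in the diagram can be sketched as a sequential pipeline with an approval gate in front of the executor. This is a minimal illustration, not the package's orchestrator: `runPipeline`, the stage bodies, and the `approve` callback are hypothetical names, and the sketch is kept synchronous for brevity.

```javascript
// Hypothetical sketch: stage names mirror the diagram, but the stage bodies
// and the approve() callback are stand-ins, not the package's real code.
function runPipeline(source, stages, approve) {
  const trace = [];
  let context = { source };
  for (const stage of stages) {
    // Gate: a stage marked needsApproval only runs if the reviewer agrees.
    if (stage.needsApproval && !approve(stage.name, context)) {
      trace.push({ stage: stage.name, status: 'rejected' });
      return { completed: false, trace };
    }
    context = stage.run(context);
    trace.push({ stage: stage.name, status: 'done' });
  }
  return { completed: true, output: context, trace };
}

const stages = [
  { name: 'analyzer', run: (ctx) => ({ ...ctx, stack: 'unknown' }) },
  { name: 'planner', run: (ctx) => ({ ...ctx, plan: ['task-1'] }) },
  { name: 'executor', needsApproval: true, run: (ctx) => ({ ...ctx, executed: true }) },
  { name: 'computer_use_agent', run: (ctx) => ({ ...ctx, screenshots: [] }) },
  { name: 'verifier', run: (ctx) => ({ ...ctx, verified: true }) },
];

// Example: a reviewer who approves everything lets all five stages run.
const result = runPipeline('./my-project', stages, () => true);
console.log(result.trace.map((t) => `${t.stage}:${t.status}`).join(' -> '));
```

If the reviewer rejects the plan, the function returns early and the trace ends at the rejected stage, so nothing downstream of the gate runs.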
+ ---
+
+ ## Verification loops
+
+ ### Self-consistency
+ Every critical output is generated **3 times** and compared. If results diverge by more than 20%, the finding is escalated to human review.
+
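A minimal sketch of the comparison described above, for numeric scores. The README does not say how divergence is measured, so this assumes the relative spread `(max - min) / mean`; `checkSelfConsistency` and its parameters are hypothetical names.

```javascript
// Hypothetical sketch: "divergence" is ASSUMED to be the relative spread
// (max - min) / mean of the sampled scores; the package may define it differently.
function checkSelfConsistency(scores, maxDivergence = 0.2) {
  if (scores.length < 3) {
    throw new Error('self-consistency needs at least 3 samples');
  }
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const spread = Math.max(...scores) - Math.min(...scores);
  // Guard against division by zero when all samples average to 0.
  let divergence;
  if (mean === 0) {
    divergence = spread === 0 ? 0 : Infinity;
  } else {
    divergence = spread / Math.abs(mean);
  }
  return { mean, divergence, escalate: divergence > maxDivergence };
}

// Three close samples stay below the 20% limit.
console.log(checkSelfConsistency([0.91, 0.89, 0.9]).escalate); // false
```

Other measures (standard deviation, pairwise agreement on categorical outputs) would fit the same shape; only the `divergence` computation changes.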
+ ### Provenance tracking
+ Every agent output is **SHA-256 hashed** and linked to its inputs, forming a tamper-evident chain you can audit later.
+
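One way such a chain could be built with Node's built-in `crypto` module is sketched below. The record layout (`agent`, `outputHash`, `inputHashes`, `prevHash`) and the function names are assumptions for illustration, not the format written by `verifier.js`.

```javascript
const crypto = require('crypto');

function sha256(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

// Hypothetical record layout: each record's hash covers the output hash,
// the hashes of its inputs, and the previous record's hash.
function appendRecord(chain, agent, output, inputHashes = []) {
  const prevHash = chain.length ? chain[chain.length - 1].hash : null;
  const outputHash = sha256(output);
  const hash = sha256(JSON.stringify({ agent, outputHash, inputHashes, prevHash }));
  const record = { agent, outputHash, inputHashes, prevHash, hash };
  chain.push(record);
  return record;
}

// Recomputes every hash; any edited field or reordered record breaks the chain.
function verifyChain(chain) {
  return chain.every((rec, i) => {
    const prevHash = i === 0 ? null : chain[i - 1].hash;
    const expected = sha256(JSON.stringify({
      agent: rec.agent,
      outputHash: rec.outputHash,
      inputHashes: rec.inputHashes,
      prevHash,
    }));
    return rec.prevHash === prevHash && rec.hash === expected;
  });
}

const chain = [];
const analysis = appendRecord(chain, 'analyzer', 'stack: node');
appendRecord(chain, 'planner', 'plan: task-1', [analysis.outputHash]);
console.log(verifyChain(chain)); // true
```

The chain is tamper-evident rather than tamper-proof: someone who can rewrite every record can recompute every hash, so the final hash would need to be stored or signed somewhere the agents cannot modify.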
+ ### Human review gates
+ By default, humans must approve:
+ - The task plan (before executor runs)
+ - Any file writes or shell commands
+ - The final verification report
+
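The gates listed above could be expressed as a small wrapper around each action. The action type strings and the `withApproval` helper are hypothetical; the README does not show how the package represents actions internally.

```javascript
// Hypothetical action types matching the gates listed above.
const GATED_ACTIONS = new Set(['task_plan', 'file_write', 'shell_command', 'final_report']);

// Runs `perform` only if the action is ungated or the reviewer approves it.
function withApproval(action, approve, perform) {
  if (GATED_ACTIONS.has(action.type) && !approve(action)) {
    return { performed: false, reason: `human rejected ${action.type}` };
  }
  return { performed: true, result: perform(action) };
}

// A read is not gated, so it runs even when the reviewer rejects everything.
const readResult = withApproval({ type: 'read_file', path: 'README.md' }, () => false, () => 'contents');
console.log(readResult.performed); // true
```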
+ ---
+
+ ## Safety & sandbox mode
+
+ `sandbox_mode: true` is the default in the recipe's `safety` block; on the command line, sandbox mode is enabled with `--sandbox`. It:
+ - Blocks all external API calls
+ - Blocks all file mutations
+ - Blocks all shell command execution
+ - Enables full Computer Use screenshot audit trail
+ - Requires human approval for every agent action
+
+ To run in live mode, omit `--sandbox` after reviewing the plan.
+
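The blocking behaviour in the list above can be pictured as a guard that records side-effecting operations instead of running them. This is a sketch under assumptions: `createSandbox` and the three operation kinds are hypothetical names, not the package's API.

```javascript
// Hypothetical guard: in sandbox mode, side-effecting operations are
// recorded in `blocked` instead of being executed.
function createSandbox(sandboxMode) {
  const SIDE_EFFECTS = new Set(['api_call', 'file_write', 'shell_exec']);
  const blocked = [];
  return {
    blocked,
    run(kind, description, fn) {
      if (sandboxMode && SIDE_EFFECTS.has(kind)) {
        blocked.push({ kind, description });
        return { skipped: true };
      }
      return { skipped: false, value: fn() };
    },
  };
}

const sandbox = createSandbox(true);
sandbox.run('file_write', 'write recipe.yaml', () => 'written');
sandbox.run('read_file', 'read package.json', () => '{}');
console.log(sandbox.blocked.length); // 1
```

Keeping the list of blocked operations makes a dry run useful as a preview: it shows exactly what a live run would have done.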
+ ---
+
+ ## Architecture
+
+ ```
+ create-verifiable-agent/
+ ├── bin/
+ │   └── create-verifiable-agent.js   # CLI entry point
+ ├── src/
+ │   ├── index.js                     # Main orchestrator
+ │   ├── analyzer.js                  # Codebase analysis
+ │   ├── generator.js                 # YAML recipe generation (Claude API)
+ │   ├── verifier.js                  # Self-consistency + provenance
+ │   ├── notebook.js                  # Interactive Markdown notebook
+ │   ├── collab-card.js               # Human-AI collaboration card
+ │   ├── plan.js                      # Plan mode UI
+ │   └── demo-loader.js               # Demo context loader
+ ├── demo/
+ │   ├── mythos.js                    # Mythos cyber-risk simulation
+ │   └── mythos-recipe.yaml           # Pre-built Mythos recipe
+ ├── test/
+ │   └── run-tests.js                 # Test suite
+ ├── demo-script.md                   # 60-second video demo script
+ └── README.md
+ ```
+
+ ---
+
+ ## Run tests
+
+ ```bash
+ npm test
+ ```
+
+ ---
+
+ ## Contributing
+
+ Issues and PRs welcome at [kju4q/verifiable-agent-recipe](https://github.com/kju4q/verifiable-agent-recipe).
+
+ ---
+
+ ## License
+
+ MIT
@@ -0,0 +1,51 @@
+ #!/usr/bin/env node
+ 'use strict';
+
+ const { Command } = require('commander');
+ const chalk = require('chalk');
+ const path = require('path');
+ const { run } = require('../src/index');
+
+ const program = new Command();
+
+ console.log(chalk.cyan.bold('\n create-verifiable-agent'));
+ console.log(chalk.gray(' Claude Sonnet 4.6 + Computer Use → verifiable multi-agent recipe\n'));
+
+ program
+   .name('create-verifiable-agent')
+   .description('Turn any GitHub repo or local codebase into a verifiable multi-agent YAML recipe')
+   .argument('[source]', 'GitHub URL or local path to analyze (omit to use Mythos demo)')
+   .option('-o, --output <dir>', 'output directory', './agent-output')
+   .option('--sandbox', 'run in safe sandbox mode (no real API calls, no mutations)', false)
+   .option('--plan', 'show plan before executing (default for Pro)', true)
+   .option('--accept-edits', 'auto-accept all edits without prompting', false)
+   .option('--demo <name>', 'run a built-in demo: mythos (default)', 'mythos')
+   .option('--no-notebook', 'skip generating the interactive Markdown notebook')
+   .option('--no-collab-card', 'skip generating the human-AI collaboration card')
+   .option('--model <model>', 'Claude model to use', 'claude-sonnet-4-6')
+   .option('--max-files <n>', 'max files to analyze', '50')
+   .option('--api-key <key>', 'Anthropic API key (or set ANTHROPIC_API_KEY env var)')
+   .version('1.0.0');
+
+ program.parse(process.argv);
+
+ const opts = program.opts();
+ const source = program.args[0] || '__demo__'; // no source argument: fall back to the built-in demo
+
+ run({
+   source,
+   outputDir: path.resolve(opts.output),
+   sandbox: opts.sandbox,
+   planMode: opts.plan,
+   acceptEdits: opts.acceptEdits,
+   demo: source === '__demo__' ? opts.demo : null, // --demo only applies when no source is given
+   notebook: opts.notebook !== false, // false only when --no-notebook is passed
+   collabCard: opts.collabCard !== false, // false only when --no-collab-card is passed
+   model: opts.model,
+   maxFiles: parseInt(opts.maxFiles, 10), // option values arrive as strings
+   apiKey: opts.apiKey || process.env.ANTHROPIC_API_KEY, // the flag takes precedence over the env var
+ }).catch(err => {
+   console.error(chalk.red('\nFatal error:'), err.message);
+   if (process.env.DEBUG) console.error(err.stack);
+   process.exit(1);
+ });
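In the entry point above, `parseInt(opts.maxFiles, 10)` yields `NaN` for input such as `--max-files abc`, and that value is passed straight to `run`. A validating parser could reject such input early. The sketch below assumes `run` expects a positive integer; `parsePositiveInt` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: accepts only strings of decimal digits that
// represent an integer greater than zero.
function parsePositiveInt(value, flagName) {
  const text = String(value).trim();
  if (!/^\d+$/.test(text) || Number(text) <= 0) {
    throw new Error(`${flagName} must be a positive integer, got "${value}"`);
  }
  return Number(text);
}

console.log(parsePositiveInt('50', '--max-files')); // 50
```

A function like this could also be supplied as the custom argument parser that commander's `.option()` accepts, so the check happens during argument parsing.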
@@ -0,0 +1,183 @@
+ # Mythos / Capybara Cyber-Risk Simulation: Pre-built Recipe
+ # FICTIONAL / EDUCATIONAL USE ONLY
+ # Based on publicly available summaries of the March 2025 Mythos strategy leak
+
+ metadata:
+   name: mythos-capybara-verifiable-agent
+   version: 1.0.0
+   description: >
+     Multi-agent recipe that would have caught the Mythos/Capybara security
+     vulnerabilities before the March 2025 leak became public.
+     EDUCATIONAL SIMULATION ONLY.
+   source_repo: mythos-capybara-sim
+   generated_at: "2026-03-31T00:00:00Z"
+   model: claude-sonnet-4-6
+   computer_use_enabled: true
+
+ safety:
+   sandbox_mode: true
+   guardrails:
+     - no_destructive_writes
+     - no_external_api_calls_in_sandbox
+     - human_approval_required_for_mutations
+     - block_hardcoded_secrets
+     - require_screenshot_audit_trail
+     - minimum_confidence_threshold_0.85
+   plan_mode: true
+   accept_edits: false
+
+ agents:
+   - id: secret_scanner
+     role: Secret & Credentials Scanner
+     model: claude-sonnet-4-6
+     tools: [read_file, grep, glob]
+     responsibilities:
+       - Scan all files for hard-coded API keys, tokens, passwords
+       - Check CI/CD configs for exposed credentials
+       - Flag any environment variable misuse
+     inputs: [source_repo_path]
+     outputs: [secret_scan_report, credential_risk_map]
+
+   - id: agent_architecture_auditor
+     role: Agent Architecture Auditor
+     model: claude-sonnet-4-6
+     tools: [read_file, glob]
+     responsibilities:
+       - Detect single-agent vs multi-agent patterns
+       - Flag missing human-in-the-loop gates
+       - Identify auto-approve / auto-execute patterns
+       - Check for verifier agent presence
+     inputs: [source_repo_path, codebase_summary]
+     outputs: [architecture_audit, gate_gap_report]
+
+   - id: computer_use_auditor
+     role: Computer Use Provenance Auditor
+     model: claude-sonnet-4-6
+     computer_use: true
+     tools: [screenshot, read_file, grep]
+     responsibilities:
+       - Verify Computer Use sessions have screenshot audit trails
+       - Confirm all UI interactions are logged and hashed
+       - Validate no unlogged browser sessions exist
+     inputs: [source_repo_path]
+     outputs: [cu_audit_report, screenshot_hash_manifest]
+
+   - id: risk_verifier
+     role: Multi-Sample Risk Verifier
+     model: claude-sonnet-4-6
+     tools: [read_file, bash]
+     responsibilities:
+       - Run self-consistency checks on risk scores (3 samples minimum)
+       - Flag any risk decision below 0.85 confidence
+       - Verify provenance chain for all findings
+     inputs: [secret_scan_report, architecture_audit, cu_audit_report]
+     outputs: [verification_report, confidence_scores, provenance_chain]
+     safety:
+       require_sandbox: true
+       min_samples: 3
+       confidence_threshold: 0.85
+
+   - id: remediation_planner
+     role: Remediation Planner
+     model: claude-sonnet-4-6
+     tools: [write_file, read_file]
+     responsibilities:
+       - Generate prioritized remediation plan
+       - Write fix templates for each finding
+       - Create human-AI collaboration card
+     inputs: [verification_report, gate_gap_report]
+     outputs: [remediation_plan, fix_templates, collab_card]
+     safety:
+       require_human_approval: true
+
+ workflow:
+   - step: 1
+     agent: secret_scanner
+     action: scan_for_hardcoded_secrets
+     outputs_to: [risk_verifier]
+     expected_findings:
+       - MYTH-001  # Hard-coded OPENAI_KEY in ci/deploy.yml
+
+   - step: 2
+     agent: agent_architecture_auditor
+     action: audit_agent_patterns
+     outputs_to: [risk_verifier]
+     expected_findings:
+       - MYTH-002  # auto_approve=True in agent_runner.py
+       - MYTH-003  # Missing confidence thresholds in risk_scorer.py
+       - MYTH-005  # No sandbox_mode flag
+
+   - step: 3
+     agent: computer_use_auditor
+     action: verify_cu_provenance
+     outputs_to: [risk_verifier]
+     optional: false
+     expected_findings:
+       - MYTH-004  # No screenshot audit trail
+
+   - step: 4
+     agent: risk_verifier
+     action: multi_sample_verification
+     outputs_to: [remediation_planner]
+     requires_approval: false
+     self_consistency:
+       samples: 3
+       threshold: 0.85
+
+   - step: 5
+     agent: remediation_planner
+     action: generate_remediation_plan
+     outputs_to: null
+     requires_approval: true  # Human must sign off
+
+ verification:
+   self_consistency:
+     enabled: true
+     method: multi_sample
+     samples: 3
+     threshold: 0.85
+     description: >
+       Each risk finding is scored 3 times. If scores diverge > 15%,
+       the finding is escalated to human review before any action.
+
+   provenance:
+     enabled: true
+     track_inputs: true
+     track_model_version: true
+     track_timestamps: true
+     hash_outputs: true
+     screenshot_audit: true
+     description: >
+       All agent outputs (including Computer Use screenshots) are SHA-256
+       hashed and linked to their inputs, forming a tamper-evident chain.
+
+   human_review_gates:
+     - after_risk_verifier
+     - before_remediation_planner
+     - before_any_file_write
+
+ findings_simulated:
+   - id: MYTH-001
+     severity: CRITICAL
+     title: Hard-coded API keys in CI/CD pipeline
+     would_be_caught_at: step_1
+
+   - id: MYTH-002
+     severity: CRITICAL
+     title: No human-in-the-loop gates on autonomous agent
+     would_be_caught_at: step_2
+
+   - id: MYTH-003
+     severity: HIGH
+     title: Missing self-consistency verification on risk scores
+     would_be_caught_at: step_2
+
+   - id: MYTH-004
+     severity: HIGH
+     title: Computer Use agent without screenshot audit trail
+     would_be_caught_at: step_3
+
+   - id: MYTH-005
+     severity: MEDIUM
+     title: No sandbox mode (agents run against production)
+     would_be_caught_at: step_2
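The recipe links each simulated finding to a workflow step in two places: `expected_findings` under `workflow` and `would_be_caught_at` under `findings_simulated`. A small cross-check can confirm the two agree. The sketch below hand-copies the relevant fields into a plain object, because Node.js has no built-in YAML parser; `findMismatches` is a hypothetical helper, not part of the package.

```javascript
// Subset of the recipe above, copied by hand into a plain object.
const recipe = {
  workflow: [
    { step: 1, agent: 'secret_scanner', expected_findings: ['MYTH-001'] },
    { step: 2, agent: 'agent_architecture_auditor', expected_findings: ['MYTH-002', 'MYTH-003', 'MYTH-005'] },
    { step: 3, agent: 'computer_use_auditor', expected_findings: ['MYTH-004'] },
    { step: 4, agent: 'risk_verifier' },
    { step: 5, agent: 'remediation_planner' },
  ],
  findings_simulated: [
    { id: 'MYTH-001', would_be_caught_at: 'step_1' },
    { id: 'MYTH-002', would_be_caught_at: 'step_2' },
    { id: 'MYTH-003', would_be_caught_at: 'step_2' },
    { id: 'MYTH-004', would_be_caught_at: 'step_3' },
    { id: 'MYTH-005', would_be_caught_at: 'step_2' },
  ],
};

// Returns the ids of findings whose would_be_caught_at step does not
// list them under expected_findings.
function findMismatches(r) {
  const byStep = new Map(
    r.workflow.map((s) => [`step_${s.step}`, s.expected_findings || []])
  );
  return r.findings_simulated
    .filter((f) => !(byStep.get(f.would_be_caught_at) || []).includes(f.id))
    .map((f) => f.id);
}

console.log(findMismatches(recipe)); // []
```

For the recipe as written, the two sections agree, so the check returns an empty list.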