clew-code 0.2.21 → 0.2.23

Files changed (67)
  1. package/README.md +45 -45
  2. package/dist/main.js +2906 -2722
  3. package/docs/architecture.html +91 -148
  4. package/docs/assets/clew-agent-loop.png +0 -0
  5. package/docs/assets/clew-general-architecture.png +0 -0
  6. package/docs/assets/clew-mcp-architecture.png +0 -0
  7. package/docs/assets/clew-p2p-swarm.png +0 -0
  8. package/docs/changelog.html +150 -0
  9. package/docs/cli-reference.html +90 -0
  10. package/docs/commands.html +133 -241
  11. package/docs/configuration.html +85 -147
  12. package/docs/contributing.html +91 -0
  13. package/docs/css/styles.css +272 -152
  14. package/docs/daemon.html +62 -129
  15. package/docs/features/bridge-mode.html +61 -66
  16. package/docs/features/evals.html +57 -149
  17. package/docs/features/searxng-search.html +58 -118
  18. package/docs/features/sentry-setup.html +61 -124
  19. package/docs/index.html +185 -148
  20. package/docs/installation.html +77 -105
  21. package/docs/internals/growthbook-ab-testing.html +69 -91
  22. package/docs/internals/hidden-features.html +81 -143
  23. package/docs/js/main.js +29 -0
  24. package/docs/loop.html +69 -181
  25. package/docs/mcp.html +99 -247
  26. package/docs/models.html +63 -92
  27. package/docs/permission-model.html +86 -102
  28. package/docs/plugins.html +84 -102
  29. package/docs/providers.html +87 -127
  30. package/docs/quick-start.html +81 -93
  31. package/docs/research-memory.html +71 -102
  32. package/docs/security.html +71 -0
  33. package/docs/skills.html +67 -117
  34. package/docs/swarm.html +78 -236
  35. package/docs/tools.html +183 -171
  36. package/docs/troubleshooting.html +86 -106
  37. package/docs/voice-mode.html +79 -0
  38. package/package.json +1 -1
  39. package/docs/architecture.th.html +0 -79
  40. package/docs/clew-code-architecture.html +0 -1126
  41. package/docs/commands.th.html +0 -269
  42. package/docs/configuration.th.html +0 -108
  43. package/docs/daemon.th.html +0 -73
  44. package/docs/features/bridge-mode.th.html +0 -62
  45. package/docs/features/evals.th.html +0 -62
  46. package/docs/features/searxng-search.th.html +0 -67
  47. package/docs/features/sentry-setup.th.html +0 -69
  48. package/docs/features/swarm.html +0 -156
  49. package/docs/generated/providers.html +0 -625
  50. package/docs/generated/tools.html +0 -558
  51. package/docs/index.th.html +0 -292
  52. package/docs/installation.th.html +0 -105
  53. package/docs/internals/growthbook-ab-testing.th.html +0 -60
  54. package/docs/internals/hidden-features.th.html +0 -107
  55. package/docs/loop.th.html +0 -227
  56. package/docs/mcp.th.html +0 -207
  57. package/docs/models.th.html +0 -61
  58. package/docs/permission-model.th.html +0 -67
  59. package/docs/plugins.th.html +0 -79
  60. package/docs/prompts-and-features.html +0 -806
  61. package/docs/providers.th.html +0 -81
  62. package/docs/quick-start.th.html +0 -89
  63. package/docs/research-memory.th.html +0 -72
  64. package/docs/skills.th.html +0 -90
  65. package/docs/swarm.th.html +0 -280
  66. package/docs/tools.th.html +0 -84
  67. package/docs/troubleshooting.th.html +0 -85
package/docs/daemon.html CHANGED
@@ -1,129 +1,62 @@
1
- <!DOCTYPE html>
2
- <html lang="en">
3
- <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>Autonomous Daemon — Clew</title>
7
- <meta name="description" content="24/7 autonomous background execution: task queue, agent loop, supervisor integration, and recurring tasks.">
8
- <link rel="preconnect" href="https://fonts.googleapis.com">
9
- <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
10
- <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
11
- <link rel="stylesheet" href="css/styles.css">
12
- <link rel="icon" type="image/svg+xml" href="./assets/clew.svg">
13
- </head>
14
- <body>
15
- <header class="header"></header>
16
- <div class="app">
17
- <aside class="sidebar" id="sidebar"></aside>
18
- <div class="sidebar-overlay" id="sidebarOverlay"></div>
19
- <div class="content-wrap">
20
- <main class="content">
21
- <div class="breadcrumbs"><a href="index.html">Home</a><span class="sep">/</span><span>Daemon Mode</span></div>
22
- <h1>Autonomous Daemon Mode</h1>
23
- <p class="section-subtitle">Run Clew as a 24/7 background daemon — task queue, agent loop, health checks, and supervisor auto-respawn for unattended autonomous operation.</p>
24
-
25
- <p>The autonomous system lives in <code>src/services/autonomous/</code> and consists of four main components: the <strong>task queue</strong>, <strong>agent loop</strong>, <strong>daemon entry point</strong>, and <strong>supervisor integration</strong>.</p>
26
-
27
- <h2>Architecture</h2>
28
- <pre><code> + Task Queue (taskQueue.ts)
29
- | File-backed persistent queue
30
- | Priorities · Leases · Dead-letter
31
- |
32
- + Agent Loop (agentLoop.ts)
33
- | Dequeue → Spawn worker → Monitor → Retry
34
- |
35
- + Daemon Mode (daemonMode.ts)
36
- | Background process entry point
37
- |
38
- + Supervisor (supervisorIntegration.ts)
39
- Health checks · Auto-respawn · State tracking</code></pre>
40
-
41
- <h2>Task Queue</h2>
42
- <p>The file-backed persistent queue (<code>src/services/autonomous/taskQueue.ts</code>) is the foundation of the autonomous system:</p>
43
- <ul>
44
- <li><strong>Persistence</strong> — Tasks survive process restarts via on-disk storage</li>
44
- <li><strong>Priorities</strong> — Urgent tasks skip ahead in the queue</li>
45
- <li><strong>Leases</strong> — Tasks are leased to workers with TTL; expired leases are retried</li>
47
- <li><strong>Dead-letter</strong> — Tasks that exhaust retries are moved to dead-letter for inspection</li>
48
- <li><strong>Scheduling</strong> — One-shot and recurring (cron) tasks supported</li>
49
- </ul>
50
-
51
- <h2>Agent Loop</h2>
52
- <p>The continuous agent loop (<code>src/services/autonomous/agentLoop.ts</code>) runs in the background:</p>
53
- <ol>
54
- <li><strong>Dequeue</strong> — Pull the highest-priority ready task</li>
55
- <li><strong>Spawn worker</strong> — Launch a worker session for the task</li>
56
- <li><strong>Monitor</strong> — Track progress, streaming output, and resource usage</li>
57
- <li><strong>Retry or complete</strong> — On failure, retry with backoff; on success, record result</li>
58
- <li><strong>Repeat</strong> — Check for new tasks and repeat the cycle</li>
59
- </ol>
60
-
61
- <h2>Daemon Entry Point</h2>
62
- <p><code>src/services/autonomous/daemonMode.ts</code> provides the background process entry point. When started in daemon mode, Clew:</p>
63
- <ul>
64
- <li>Detaches from the terminal and runs as a background process</li>
65
- <li>Logs output to a configurable log file</li>
66
- <li>Responds to signals for graceful shutdown</li>
67
- <li>Reports status to the supervisor for health tracking</li>
68
- </ul>
69
-
70
- <h2>Supervisor Integration</h2>
71
- <p><code>src/services/autonomous/supervisorIntegration.ts</code> ensures the daemon stays running:</p>
72
- <ul>
73
- <li><strong>Health checks</strong> — Periodic heartbeat and resource checks</li>
74
- <li><strong>Auto-respawn</strong> — Automatic restart on unexpected exit</li>
75
- <li><strong>State tracking</strong> — Current status, running tasks, error counts</li>
76
- <li><strong>Graceful degradation</strong> — Reduces polling frequency on repeated failures</li>
77
- </ul>
78
-
79
- <h2>Commands</h2>
80
- <table>
81
- <tr><th>Command</th><th>Description</th></tr>
82
- <tr><td><code>/daemon</code></td><td>Open interactive control panel; subcommands: start, stop, status, restart</td></tr>
83
- <tr><td><code>/task</code></td><td>Create scheduled or recurring tasks via interactive form</td></tr>
84
- <tr><td><code>/task list</code></td><td>List queued, running, and completed tasks</td></tr>
85
- <tr><td><code>/loop</code></td><td>Run a prompt or command on a recurring interval (<code>/loop 5m /check-deploy</code>)</td></tr>
86
- <tr><td><code>/agents</code></td><td>Manage agent configurations and daemon worker pools</td></tr>
87
- <tr><td><code>/tasks</code></td><td>List and manage background agent tasks</td></tr>
88
- </table>
89
-
90
- <h2>Task Scheduling</h2>
91
- <p>Scheduled tasks can be created through the interactive <code>/task</code> form or programmatically. Storage modes:</p>
92
- <ul>
93
- <li><strong>Durable</strong> — Persists to <code>.clew/scheduled_tasks.json</code>, survives restarts</li>
94
- <li><strong>Session-only</strong> — Kept in memory for the current session only</li>
95
- </ul>
96
-
97
- <p>Recurring tasks auto-expire after 30 days. One-shot tasks auto-delete after firing. Custom cron expressions are supported (standard 5-field format).</p>
98
-
99
- <pre><code>/task
100
- Name: Deploy health check
101
- Schedule: Daily
102
- Time: 09:00
103
- Prompt: Check deployment status and report
104
- Storage: Durable</code></pre>
105
-
106
- <h2>Architecture Files</h2>
107
- <table>
108
- <tr><th>File</th><th>Role</th></tr>
109
- <tr><td><code>src/services/autonomous/taskQueue.ts</code></td><td>Persistent task queue with priorities, leases, dead-letter</td></tr>
110
- <tr><td><code>src/services/autonomous/agentLoop.ts</code></td><td>Continuous 24/7 agent loop</td></tr>
111
- <tr><td><code>src/services/autonomous/daemonMode.ts</code></td><td>Background daemon entry point</td></tr>
112
- <tr><td><code>src/services/autonomous/supervisorIntegration.ts</code></td><td>Health checks, auto-respawn, state tracking</td></tr>
113
- </table>
114
-
115
- <footer class="footer">
116
- <span>Clew Code 0.2.14 — Open Source</span>
117
- <div class="footer-links">
118
- <a href="https://github.com/ClewCode/ClewCode">GitHub</a>
119
- <a href="https://github.com/ClewCode/ClewCode/issues">Issues</a>
120
- </div>
121
- </footer>
122
- </main>
123
- <nav class="toc-sidebar"></nav>
124
- </div>
125
- </div>
126
- <script src="js/main.js"></script>
127
- </body>
128
- </html>
129
-
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Daemon Mode — Clew Code</title>
7
+ <meta name="description" content="Run Clew Code as a background daemon for autonomous operations.">
8
+ <link rel="icon" type="image/svg+xml" href="assets/clew.svg">
9
+ <link rel="preconnect" href="https://fonts.googleapis.com">
10
+ <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
11
+ <link rel="stylesheet" href="css/styles.css">
12
+ </head>
13
+ <body>
14
+ <header class="header"></header>
15
+ <div id="sidebarOverlay" class="sidebar-overlay"></div>
16
+ <aside id="sidebar" class="sidebar"></aside>
17
+
18
+ <div class="content-wrap">
19
+ <div class="content">
20
+
21
+ <div class="breadcrumbs"><a href="index.html">Home</a><span class="sep">/</span><span class="current">Daemon Mode</span></div>
22
+
23
+ <h1>Daemon Mode</h1>
24
+ <p class="sub">Run Clew Code as a persistent background process for autonomous task execution, scheduled jobs, and continuous monitoring.</p>
25
+
26
+ <h2 id="overview">Overview</h2>
27
+ <p>Daemon mode keeps Clew Code running in the background, processing tasks from a persistent queue, executing scheduled jobs via cron, and coordinating with mesh peers — all without an active terminal session.</p>
28
+
29
+ <h2 id="daemon-commands">Daemon Commands</h2>
30
+ <pre><code class="language-bash">❯ /daemon # Open daemon dashboard
31
+ ❯ /daemon status # Check daemon status
32
+ </code></pre>
33
+
34
+ <h2 id="task-queue">Task Queue</h2>
35
+ <p>The daemon uses a file-backed persistent task queue with:</p>
36
+ <ul>
37
+ <li><strong>Lease-based concurrency</strong> — max 3 concurrent workers</li>
38
+ <li><strong>Exponential backoff retry</strong> — failed tasks are retried with increasing delays</li>
39
+ <li><strong>Dead-letter management</strong> — tasks that exceed retry limits are moved to dead-letter storage</li>
40
+ </ul>
41
+
42
+ <h2 id="scheduling">Scheduling</h2>
43
+ <p>Use cron syntax to schedule recurring tasks:</p>
44
+ <pre><code class="language-bash">❯ /task add "0 9 * * *" "daily standup summary"
45
+ ❯ /task list # list scheduled tasks
46
+ ❯ /task remove &lt;id&gt; # remove a task
47
+ </code></pre>
48
+
49
+ <h2 id="loop">Agent Loop</h2>
50
+ <p>The daemon integrates with the autonomous agent loop for 24/7 operation:</p>
51
+ <pre><code class="language-bash">❯ /loop start # start the autonomous loop
52
+ ❯ /loop stop # stop the loop
53
+ ❯ /loop status # check loop status
54
+ </code></pre>
55
+
56
+ <p>See <a href="loop.html">Agent Loop</a> for details.</p>
57
+ </div>
58
+ </div>
59
+
60
+ <script src="js/main.js"></script>
61
+ </body>
62
+ </html>
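Reviewer note on the Task Queue section of the new daemon.html above: the page names lease-based concurrency (max 3 workers), exponential backoff retry, and dead-letter storage, but does not show how they interact. The following is a minimal in-memory sketch of that interplay. Everything in it is an assumption made for illustration: the `LeaseQueue` class, the retry limit of 3, the 1-second base delay, and the 60-second lease TTL are hypothetical, file persistence is omitted, and none of it is taken from Clew Code's source.

```typescript
// Hypothetical sketch: names, defaults, and shapes are assumptions, not Clew Code's API.
type TaskState = "ready" | "leased" | "done" | "dead";

interface Task {
  id: string;
  state: TaskState;
  attempts: number;       // failed attempts so far
  notBefore: number;      // earliest time (ms) the task may be leased again
  leaseExpiresAt: number; // only meaningful while state === "leased"
}

interface QueueOptions {
  maxConcurrent: number; // the docs state 3
  maxAttempts: number;   // assumed retry limit
  baseDelayMs: number;   // assumed first backoff delay
  leaseTtlMs: number;    // assumed lease duration
}

const defaults: QueueOptions = { maxConcurrent: 3, maxAttempts: 3, baseDelayMs: 1000, leaseTtlMs: 60000 };

class LeaseQueue {
  private tasks = new Map<string, Task>();
  private opts: QueueOptions;

  constructor(opts: QueueOptions = defaults) {
    this.opts = opts;
  }

  enqueue(id: string, now: number): void {
    this.tasks.set(id, { id, state: "ready", attempts: 0, notBefore: now, leaseExpiresAt: 0 });
  }

  // Lease the next ready task, or return undefined if the worker cap is reached.
  lease(now: number): Task | undefined {
    this.reclaimExpired(now);
    const leased = [...this.tasks.values()].filter((t) => t.state === "leased").length;
    if (leased >= this.opts.maxConcurrent) return undefined;
    for (const t of this.tasks.values()) {
      if (t.state === "ready" && t.notBefore <= now) {
        t.state = "leased";
        t.leaseExpiresAt = now + this.opts.leaseTtlMs;
        return t;
      }
    }
    return undefined;
  }

  complete(id: string): void {
    const t = this.tasks.get(id);
    if (t && t.state === "leased") t.state = "done";
  }

  // Record a failure: retry with exponential backoff, or dead-letter the task.
  fail(id: string, now: number): void {
    const t = this.tasks.get(id);
    if (!t || t.state !== "leased") return;
    t.attempts += 1;
    if (t.attempts >= this.opts.maxAttempts) {
      t.state = "dead";
      return;
    }
    t.state = "ready";
    t.notBefore = now + this.opts.baseDelayMs * 2 ** (t.attempts - 1);
  }

  get(id: string): Task | undefined {
    return this.tasks.get(id);
  }

  // An expired lease is treated as a failed attempt.
  private reclaimExpired(now: number): void {
    for (const t of this.tasks.values()) {
      if (t.state === "leased" && t.leaseExpiresAt <= now) this.fail(t.id, now);
    }
  }
}
```

Time is passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to test. A real file-backed queue would also need to write each state change to disk before acknowledging it.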
package/docs/features/bridge-mode.html CHANGED
@@ -1,67 +1,62 @@
1
- <!DOCTYPE html>
2
- <html lang="en">
3
- <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>Bridge Mode — Remote Control & Collaboration — Clew</title>
7
- <meta name="description" content="WebSocket remote control and collaboration for Clew.">
8
- <link rel="preconnect" href="https://fonts.googleapis.com">
9
- <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
10
- <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
11
- <link rel="stylesheet" href="../css/styles.css">
12
- <link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
13
- </head>
14
- <body>
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Bridge Mode — Clew Code</title>
7
+ <meta name="description" content="Remote control and bridge mode for Clew Code.">
8
+ <link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
9
+ <link rel="preconnect" href="https://fonts.googleapis.com">
10
+ <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
11
+ <link rel="stylesheet" href="../css/styles.css">
12
+ </head>
13
+ <body>
15
14
  <header class="header"></header>
16
- <div class="app">
17
- <aside class="sidebar" id="sidebar"></aside>
18
- <div class="sidebar-overlay" id="sidebarOverlay"></div>
19
- <div class="content-wrap">
20
- <main class="content">
21
- <div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><a href="../index.html#features">Features</a><span class="sep">/</span><span>Bridge Mode</span></div>
22
- <h1>Bridge Mode</h1>
23
- <p class="section-subtitle">Remote Control &amp; Remote Collaboration</p>
24
-
25
- <div class="callout callout-info">
26
- <strong>Bridge Mode</strong> exposes a remote control surface over a WebSocket connection. It is designed to be used by a mobile/web app to send commands and receive information from a running Clew session.
27
- </div>
28
-
29
- <h2>Architecture</h2>
30
- <p>Bridge mode creates a WebSocket server that runs alongside the main Clew session. Remote clients connect to this server and can send slash commands, receive responses, and interact with the running session. The bridge also supports session sharing for team collaboration.</p>
31
-
32
- <h2>Quick Start</h2>
33
- <pre><code># Enable bridge mode
34
- export BRIDGE_MODE=1
35
- claude --bridge
36
-
37
- # Connect from another terminal
38
- claude --remote ws://localhost:18790</code></pre>
39
-
40
- <h2>Features</h2>
41
- <ul>
42
- <li><strong>Remote Control</strong> — Send commands from mobile/web/CLI clients</li>
43
- <li><strong>Session Sharing</strong> — Share your session with team members</li>
44
- <li><strong>Team Onboarding</strong> — Invite teammates to collaborate</li>
45
- <li><strong>Secure</strong> — OAuth-based authentication for remote connections</li>
46
- </ul>
47
-
48
- <div class="callout callout-warn">
49
- <strong>Security Note</strong>
50
- Bridge mode is designed for trusted networks. Use appropriate security measures when exposing the WebSocket server to external networks.
51
- </div>
52
-
53
- <footer class="footer">
54
- <span>Clew Code 0.2.14</span>
55
- <div class="footer-links">
56
- <a href="https://github.com/ClewCode/ClewCode">GitHub</a>
57
- <a href="https://github.com/ClewCode/ClewCode/issues">Issues</a>
58
- </div>
59
- </footer>
60
- </main>
61
- <nav class="toc-sidebar"></nav>
62
- </div>
63
- </div>
64
- <script src="../js/main.js"></script>
65
- </body>
66
- </html>
67
-
15
+ <div id="sidebarOverlay" class="sidebar-overlay"></div>
16
+ <aside id="sidebar" class="sidebar"></aside>
17
+
18
+ <div class="content-wrap">
19
+ <div class="content">
20
+
21
+ <div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><span class="current">Bridge Mode</span></div>
22
+
23
+ <h1>Bridge Mode</h1>
24
+ <p class="sub">Remote control Clew Code from anywhere via WebSocket bridge.</p>
25
+
26
+ <h2 id="overview">Overview</h2>
27
+ <p>Bridge mode allows you to connect to a running Clew Code instance remotely. There are two systems:</p>
28
+ <ul>
29
+ <li><strong>Bridge v1 (Legacy CCR)</strong> — The original Claude Code Remote system, tied to claude.ai OAuth</li>
30
+ <li><strong>Bridge v2 (Provider-Agnostic)</strong> — A standalone WebSocket server that works without claude.ai</li>
31
+ </ul>
32
+
33
+ <h2 id="v2">Bridge v2 — Provider-Agnostic Remote Control</h2>
34
+ <p>The new bridge v2 runs a local WebSocket server with:</p>
35
+ <ul>
36
+ <li>One-time auth tokens (SHA-256 hashed)</li>
37
+ <li>Session management</li>
38
+ <li>Optional NAT-traversal relay</li>
39
+ <li>No dependency on any provider's backend</li>
40
+ </ul>
41
+
42
+ <h3>Commands</h3>
43
+ <pre><code class="language-bash">❯ /remote listen # start the WebSocket server
44
+ ❯ /remote connect &lt;url&gt; # connect to a remote instance
45
+ ❯ /remote token # generate a one-time auth token
46
+ </code></pre>
47
+
48
+ <h2 id="relay">Relay Mode</h2>
49
+ <p>For NAT traversal, use the optional relay server:</p>
50
+ <pre><code class="language-bash">bun run relay # start the relay server
51
+ </code></pre>
52
+
53
+ <h2 id="bridge-commands">Bridge v1 Commands</h2>
54
+ <pre><code class="language-bash">❯ /bridge # configure bridge mode
55
+ </code></pre>
56
+ <p>Note: Bridge v1 requires a claude.ai subscription.</p>
57
+ </div>
58
+ </div>
59
+
60
+ <script src="../js/main.js"></script>
61
+ </body>
62
+ </html>
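Reviewer note on the Bridge v2 section of the new bridge-mode.html above: the page lists "one-time auth tokens (SHA-256 hashed)" without saying what that means in practice. One common reading is that the server keeps only a hash of each issued token and deletes it on first use. The sketch below illustrates that reading. It assumes a Node.js-compatible runtime, and the `TokenStore` class and its `issue` and `redeem` methods are hypothetical names, not the bridge's actual interface.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical sketch: TokenStore and its methods are illustrative names, not Clew Code's API.
function sha256(input: string): Buffer {
  return createHash("sha256").update(input).digest();
}

class TokenStore {
  // Only hashes are kept, so reading the store does not reveal usable tokens.
  private hashes: Buffer[] = [];

  // Create a random token, store its hash, and return the plaintext exactly once.
  issue(): string {
    const token = randomBytes(32).toString("hex");
    this.hashes.push(sha256(token));
    return token;
  }

  // Accept a token at most once: a successful match removes the stored hash.
  redeem(token: string): boolean {
    const candidate = sha256(token);
    // Both buffers are 32-byte digests, so timingSafeEqual's equal-length requirement holds.
    const index = this.hashes.findIndex((h) => timingSafeEqual(h, candidate));
    if (index === -1) return false;
    this.hashes.splice(index, 1);
    return true;
  }
}
```

Under this reading, `/remote token` would correspond to `issue()` and the first connection attempt to `redeem()`. Token expiry and session binding are left out of the sketch.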
package/docs/features/evals.html CHANGED
@@ -1,150 +1,58 @@
1
- <!DOCTYPE html>
2
- <html lang="en">
3
- <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>Evaluation Harness — Clew</title>
7
- <meta name="description" content="Offline-first AI coding agent evaluation and verification framework.">
8
- <link rel="preconnect" href="https://fonts.googleapis.com">
9
- <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
10
- <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
11
- <link rel="stylesheet" href="../css/styles.css">
12
- <link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
13
- </head>
14
- <body>
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Evaluation Harness — Clew Code</title>
7
+ <meta name="description" content="Built-in evaluation harness for testing provider and model performance.">
8
+ <link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
9
+ <link rel="preconnect" href="https://fonts.googleapis.com">
10
+ <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
11
+ <link rel="stylesheet" href="../css/styles.css">
12
+ </head>
13
+ <body>
15
14
  <header class="header"></header>
16
- <div class="app">
17
- <aside class="sidebar" id="sidebar"></aside>
18
- <div class="sidebar-overlay" id="sidebarOverlay"></div>
19
- <div class="content-wrap">
20
- <main class="content">
21
- <div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><a href="../index.html#features">Features</a><span class="sep">/</span><span>Evaluation Harness</span></div>
22
- <h1>Evaluation Harness</h1>
23
- <p class="section-subtitle">Offline-first AI coding agent evaluation and verification framework</p>
24
-
25
- <div class="callout callout-tip">
26
- <strong>TL;DR</strong>
27
- Run <code>clew eval init</code> to bootstrap the evaluation folders inside your project,
28
- then execute <code>clew eval run</code> to run standard coding or research benchmarks locally.
29
- </div>
30
-
31
- <h2>Overview</h2>
32
- <p>Clew includes a localized, <strong>offline-first evaluation harness</strong> under the <code>/eval</code> command namespace. This allows developers to systematically grade agent output quality, detect trace trajectory regressions, control boundary escapes, and compare model versions using deterministic rules.</p>
33
-
34
- <h2>Workspace Directory Layout</h2>
35
- <p>When you run <code>clew eval init</code>, it configures the following structures inside <code>.claude/evals/</code>:</p>
36
- <table>
37
- <tr><th>Folder</th><th>Description</th></tr>
38
- <tr><td><code>.claude/evals/tasks/</code></td><td>YAML task definitions (grouped by categories like <code>coding/</code>, <code>research/</code>, <code>memory/</code>, <code>security/</code>)</td></tr>
39
- <tr><td><code>.claude/evals/graders/</code></td><td>YAML grader rules and configurations (Command, Trace, Artifact, and Rule graders)</td></tr>
40
- <tr><td><code>.claude/evals/runs/</code></td><td>Outcome results, captured events logs, and workspace diffs per run</td></tr>
41
- <tr><td><code>.claude/evals/baselines/</code></td><td>Saved scoring baselines (e.g. main branch benchmark records)</td></tr>
42
- <tr><td><code>.claude/evals/reports/</code></td><td>Final generated markdown and JSON evaluation reports</td></tr>
43
- </table>
44
-
45
- <h2>Subcommand CLI Usage</h2>
46
- <h3>1. Initialize Workspace</h3>
47
- <pre><code>claude eval init</code></pre>
48
-
49
- <h3>2. Run Evaluations</h3>
50
- <pre><code># Run all loaded tasks
51
- claude eval run
52
- # Run only tasks in the "coding" category
53
- claude eval run --set coding
54
- # Run a specific task by ID
55
- claude eval run --task coding.sample-task
56
- # Run evaluations and compare against a baseline
57
- claude eval run --baseline main</code></pre>
58
-
59
- <h3>3. Drift &amp; Regression Comparison</h3>
60
- <pre><code>claude eval compare --baseline main</code></pre>
61
-
62
- <h3>4. Step Trace Trajectory</h3>
63
- <pre><code>claude eval trace coding.sample-task</code></pre>
64
-
65
- <h3>5. Diagnostics (Doctor)</h3>
66
- <pre><code>claude eval doctor</code></pre>
67
-
68
- <h2>Writing Tasks &amp; Graders</h2>
69
- <h3>Eval Task YAML Schema</h3>
70
- <pre><code>id: coding.fix-provider-routing
71
- title: Fix provider routing fallback behavior
72
- category: coding
73
- input: |
74
- Fix the provider routing fallback so unsupported providers return a clear error.
75
- workspace_fixture: fixtures/provider-routing
76
- expected:
77
- files_changed:
78
- - src/providers/router.ts
79
- commands_run:
80
- - bun test src/providers
81
- graders:
82
- - test-pass
83
- - scope-control
84
- - evidence-before-patch
85
- budgets:
86
- max_steps: 12
87
- max_tool_calls: 6</code></pre>
88
-
89
- <h3>Grader Types</h3>
90
- <h4>Command Grader</h4>
91
- <pre><code>id: test-pass
92
- type: command
93
- commands:
94
- - bun test
95
- pass_when:
96
- exit_code: 0</code></pre>
97
-
98
- <h4>Trace Grader</h4>
99
- <pre><code>id: evidence-before-patch
100
- type: trace
101
- rules:
102
- - before: repo.patch
103
- require_any:
104
- - repo.search
105
- - repo.open
106
- fail_message: Agent patched files before reading evidence.</code></pre>
107
-
108
- <h4>Artifact Grader</h4>
109
- <pre><code>id: scope-control
110
- type: artifact
111
- checks:
112
- max_changed_files: 5
113
- changed_files:
114
- allow:
115
- - src/providers/**
116
- - tests/providers/**
117
- deny:
118
- - package-lock.json</code></pre>
119
-
120
- <h4>Rule Grader</h4>
121
- <pre><code>id: output-format
122
- type: rule
123
- must_include:
124
- - "## Summary"
125
- must_not_include:
126
- - "I could not view"</code></pre>
127
-
128
- <h2>Critical Failure Policies</h2>
129
- <p>Clew immediately scores a task as <strong>0.0 (Failed)</strong> if any of these boundaries are breached:</p>
130
- <ol>
131
- <li><strong>Secret Leakage</strong> — Sensitive tokens (e.g. API keys, secrets) detected in agent output</li>
132
- <li><strong>Workspace Escape</strong> — Agent attempts to write or edit files outside workspace boundaries</li>
133
- <li><strong>Forbidden Commands</strong> — Destructive actions (e.g., <code>rm -rf</code>) without explicit permission</li>
134
- </ol>
135
-
136
- <footer class="footer">
137
- <span>Clew Code 0.2.14</span>
138
- <div class="footer-links">
139
- <a href="https://github.com/ClewCode/ClewCode">GitHub</a>
140
- <a href="https://github.com/ClewCode/ClewCode/issues">Issues</a>
141
- </div>
142
- </footer>
143
- </main>
144
- <nav class="toc-sidebar"></nav>
145
- </div>
146
- </div>
147
- <script src="../js/main.js"></script>
148
- </body>
149
- </html>
150
-
15
+ <div id="sidebarOverlay" class="sidebar-overlay"></div>
16
+ <aside id="sidebar" class="sidebar"></aside>
17
+
18
+ <div class="content-wrap">
19
+ <div class="content">
20
+
21
+ <div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><span class="current">Evaluation Harness</span></div>
22
+
23
+ <h1>Evaluation Harness</h1>
24
+ <p class="sub">Test and compare provider and model performance with the built-in eval system.</p>
25
+
26
+ <h2 id="overview">Overview</h2>
27
+ <p>The evaluation harness allows you to run standardized benchmarks against any configured provider/model combination. Use it to compare performance, measure latency, and validate outputs across providers.</p>
28
+
29
+ <h2 id="usage">Usage</h2>
30
+ <pre><code class="language-bash">❯ /evals run # run the standard eval suite
31
+ ❯ /evals list # list available eval benchmarks
32
+ ❯ /evals results # show previous eval results
33
+ </code></pre>
34
+
35
+ <h2 id="benchmarks">Available Benchmarks</h2>
36
+ <ul>
37
+ <li><strong>Code generation</strong> — function-level code synthesis</li>
38
+ <li><strong>Tool calling</strong> — accuracy of tool selection and argument generation</li>
39
+ <li><strong>Reasoning</strong> — multi-step logical reasoning</li>
40
+ <li><strong>Context comprehension</strong> — long-context understanding and recall</li>
41
+ </ul>
42
+
43
+ <h2 id="comparing">Comparing Providers</h2>
44
+ <p>Switch providers and re-run the same eval to compare:</p>
45
+ <pre><code class="language-bash">❯ /model openai
46
+ ❯ /evals run
47
+
48
+ ❯ /model deepseek-v4-flash
49
+ ❯ /evals run
50
+
51
+ ❯ /evals results # side-by-side comparison
52
+ </code></pre>
53
+ </div>
54
+ </div>
55
+
56
+ <script src="../js/main.js"></script>
57
+ </body>
58
+ </html>
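Reviewer note on the trace grader in the removed version of evals.html above: the `evidence-before-patch` rule requires that at least one of `repo.search` or `repo.open` appears in the trace before the first `repo.patch`. That ordering check is small enough to sketch. The `TraceRule` shape below mirrors the YAML fields (`before`, `require_any`, `fail_message`), but the `gradeTrace` function and the idea of a trace as a flat list of event names are assumptions for illustration, not the harness's real data model.

```typescript
// Hypothetical sketch: gradeTrace and these shapes are assumed, modeled on the YAML fields only.
interface TraceRule {
  before: string;       // the guarded event, e.g. "repo.patch"
  requireAny: string[]; // at least one of these must come first
  failMessage: string;
}

interface GradeResult {
  pass: boolean;
  message?: string;
}

function gradeTrace(events: string[], rule: TraceRule): GradeResult {
  const first = events.indexOf(rule.before);
  // If the guarded event never occurs, the rule has nothing to check.
  if (first === -1) return { pass: true };
  const sawEvidence = events.slice(0, first).some((e) => rule.requireAny.includes(e));
  return sawEvidence ? { pass: true } : { pass: false, message: rule.failMessage };
}

const evidenceBeforePatch: TraceRule = {
  before: "repo.patch",
  requireAny: ["repo.search", "repo.open"],
  failMessage: "Agent patched files before reading evidence.",
};
```

Only the first occurrence of the guarded event is checked here. Whether the real grader treats later patches separately is not stated in the docs.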