clew-code 0.2.21 → 0.2.22
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/main.js +1861 -1856
- package/docs/architecture.html +91 -148
- package/docs/assets/clew-agent-loop.png +0 -0
- package/docs/assets/clew-general-architecture.png +0 -0
- package/docs/assets/clew-mcp-architecture.png +0 -0
- package/docs/assets/clew-p2p-swarm.png +0 -0
- package/docs/changelog.html +150 -0
- package/docs/cli-reference.html +90 -0
- package/docs/commands.html +156 -265
- package/docs/configuration.html +85 -147
- package/docs/contributing.html +91 -0
- package/docs/css/styles.css +425 -425
- package/docs/daemon.html +62 -129
- package/docs/features/bridge-mode.html +61 -66
- package/docs/features/evals.html +57 -149
- package/docs/features/searxng-search.html +58 -118
- package/docs/features/sentry-setup.html +61 -124
- package/docs/index.html +137 -125
- package/docs/installation.html +77 -105
- package/docs/internals/growthbook-ab-testing.html +69 -91
- package/docs/internals/hidden-features.html +81 -143
- package/docs/js/main.js +29 -0
- package/docs/loop.html +69 -181
- package/docs/mcp.html +99 -247
- package/docs/models.html +69 -110
- package/docs/permission-model.html +86 -102
- package/docs/plugins.html +84 -102
- package/docs/providers.html +87 -127
- package/docs/quick-start.html +81 -93
- package/docs/research-memory.html +71 -102
- package/docs/security.html +71 -0
- package/docs/skills.html +67 -117
- package/docs/swarm.html +78 -236
- package/docs/tools.html +152 -151
- package/docs/troubleshooting.html +86 -106
- package/docs/voice-mode.html +79 -0
- package/package.json +1 -1
- package/docs/architecture.th.html +0 -79
- package/docs/clew-code-architecture.html +0 -1126
- package/docs/commands.th.html +0 -269
- package/docs/configuration.th.html +0 -108
- package/docs/daemon.th.html +0 -73
- package/docs/features/bridge-mode.th.html +0 -62
- package/docs/features/evals.th.html +0 -62
- package/docs/features/searxng-search.th.html +0 -67
- package/docs/features/sentry-setup.th.html +0 -69
- package/docs/features/swarm.html +0 -156
- package/docs/generated/providers.html +0 -625
- package/docs/generated/tools.html +0 -558
- package/docs/index.th.html +0 -292
- package/docs/installation.th.html +0 -105
- package/docs/internals/growthbook-ab-testing.th.html +0 -60
- package/docs/internals/hidden-features.th.html +0 -107
- package/docs/loop.th.html +0 -227
- package/docs/mcp.th.html +0 -207
- package/docs/models.th.html +0 -61
- package/docs/permission-model.th.html +0 -67
- package/docs/plugins.th.html +0 -79
- package/docs/prompts-and-features.html +0 -806
- package/docs/providers.th.html +0 -81
- package/docs/quick-start.th.html +0 -89
- package/docs/research-memory.th.html +0 -72
- package/docs/skills.th.html +0 -90
- package/docs/swarm.th.html +0 -280
- package/docs/tools.th.html +0 -84
- package/docs/troubleshooting.th.html +0 -85
package/docs/daemon.html
CHANGED
@@ -1,129 +1,62 @@
-<!DOCTYPE html>
-<html lang="en">
-<head>
-[old lines 4–62 not captured]
-<ul>
-<li>Detaches from the terminal and runs as a background process</li>
-<li>Logs output to a configurable log file</li>
-<li>Responds to signals for graceful shutdown</li>
-<li>Reports status to the supervisor for health tracking</li>
-</ul>
-
-<h2>Supervisor Integration</h2>
-<p><code>src/services/autonomous/supervisorIntegration.ts</code> ensures the daemon stays running:</p>
-<ul>
-<li><strong>Health checks</strong> — Periodic heartbeat and resource checks</li>
-<li><strong>Auto-respawn</strong> — Automatic restart on unexpected exit</li>
-<li><strong>State tracking</strong> — Current status, running tasks, error counts</li>
-<li><strong>Graceful degradation</strong> — Reduces polling frequency on repeated failures</li>
-</ul>
-
-<h2>Commands</h2>
-<table>
-<tr><th>Command</th><th>Description</th></tr>
-<tr><td><code>/daemon</code></td><td>Open interactive control panel; subcommands: start, stop, status, restart</td></tr>
-<tr><td><code>/task</code></td><td>Create scheduled or recurring tasks via interactive form</td></tr>
-<tr><td><code>/task list</code></td><td>List queued, running, and completed tasks</td></tr>
-<tr><td><code>/loop</code></td><td>Run a prompt or command on a recurring interval (<code>/loop 5m /check-deploy</code>)</td></tr>
-<tr><td><code>/agents</code></td><td>Manage agent configurations and daemon worker pools</td></tr>
-<tr><td><code>/tasks</code></td><td>List and manage background agent tasks</td></tr>
-</table>
-
-<h2>Task Scheduling</h2>
-<p>Scheduled tasks can be created through the interactive <code>/task</code> form or programmatically. Storage modes:</p>
-<ul>
-<li><strong>Durable</strong> — Persists to <code>.clew/scheduled_tasks.json</code>, survives restarts</li>
-<li><strong>Session-only</strong> — Kept in memory for the current session only</li>
-</ul>
-
-<p>Recurring tasks auto-expire after 30 days. One-shot tasks auto-delete after firing. Custom cron expressions are supported (standard 5-field format).</p>
-
-<pre><code>/task
-Name: Deploy health check
-Schedule: Daily
-Time: 09:00
-Prompt: Check deployment status and report
-Storage: Durable</code></pre>
-
-<h2>Architecture Files</h2>
-<table>
-<tr><th>File</th><th>Role</th></tr>
-<tr><td><code>src/services/autonomous/taskQueue.ts</code></td><td>Persistent task queue with priorities, leases, dead-letter</td></tr>
-<tr><td><code>src/services/autonomous/agentLoop.ts</code></td><td>Continuous 24/7 agent loop</td></tr>
-<tr><td><code>src/services/autonomous/daemonMode.ts</code></td><td>Background daemon entry point</td></tr>
-<tr><td><code>src/services/autonomous/supervisorIntegration.ts</code></td><td>Health checks, auto-respawn, state tracking</td></tr>
-</table>
-
-<footer class="footer">
-<span>Clew Code 0.2.14 — Open Source</span>
-<div class="footer-links">
-<a href="https://github.com/ClewCode/ClewCode">GitHub</a>
-<a href="https://github.com/ClewCode/ClewCode/issues">Issues</a>
-</div>
-</footer>
-</main>
-<nav class="toc-sidebar"></nav>
-</div>
-</div>
-<script src="js/main.js"></script>
-</body>
-</html>
-
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Daemon Mode — Clew Code</title>
+<meta name="description" content="Run Clew Code as a background daemon for autonomous operations.">
+<link rel="icon" type="image/svg+xml" href="assets/clew.svg">
+<link rel="preconnect" href="https://fonts.googleapis.com">
+<link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
+<link rel="stylesheet" href="css/styles.css">
+</head>
+<body>
+<header class="header"></header>
+<div id="sidebarOverlay" class="sidebar-overlay"></div>
+<aside id="sidebar" class="sidebar"></aside>
+
+<div class="content-wrap">
+<div class="content">
+
+<div class="breadcrumbs"><a href="index.html">Home</a><span class="sep">/</span><span class="current">Daemon Mode</span></div>
+
+<h1>Daemon Mode</h1>
+<p class="sub">Run Clew Code as a persistent background process for autonomous task execution, scheduled jobs, and continuous monitoring.</p>
+
+<h2 id="overview">Overview</h2>
+<p>Daemon mode keeps Clew Code running in the background, processing tasks from a persistent queue, executing scheduled jobs via cron, and coordinating with mesh peers — all without an active terminal session.</p>
+
+<h2 id="daemon-commands">Daemon Commands</h2>
+<pre><code class="language-bash">❯ /daemon # Open daemon dashboard
+❯ /daemon status # Check daemon status
+</code></pre>
+
+<h2 id="task-queue">Task Queue</h2>
+<p>The daemon uses a file-backed persistent task queue with:</p>
+<ul>
+<li><strong>Lease-based concurrency</strong> — max 3 concurrent workers</li>
+<li><strong>Exponential backoff retry</strong> — failed tasks are retried with increasing delays</li>
+<li><strong>Dead-letter management</strong> — tasks that exceed retry limits are moved to dead-letter storage</li>
+</ul>
+
+<h2 id="scheduling">Scheduling</h2>
+<p>Use cron syntax to schedule recurring tasks:</p>
+<pre><code class="language-bash">❯ /task add "0 9 * * *" "daily standup summary"
+❯ /task list # list scheduled tasks
+❯ /task remove <id> # remove a task
+</code></pre>
+
+<h2 id="loop">Agent Loop</h2>
+<p>The daemon integrates with the autonomous agent loop for 24/7 operation:</p>
+<pre><code class="language-bash">❯ /loop start # start the autonomous loop
+❯ /loop stop # stop the loop
+❯ /loop status # check loop status
+</code></pre>
+
+<p>See <a href="loop.html">Agent Loop</a> for details.</p>
+</div>
+</div>
+
+<script src="js/main.js"></script>
+</body>
+</html>
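The 0.2.22 daemon page describes its task queue only in outline: lease-based concurrency capped at 3 workers, exponential-backoff retries, and a dead-letter store for tasks that exceed the retry limit. The TypeScript sketch below shows one way these three mechanisms can fit together. It is not taken from src/services/autonomous/taskQueue.ts. The class and field names, the retry limit of 5, the 1 s base delay, and the 60 s lease duration are all assumptions, and the file-backed persistence is left out.

```typescript
// Sketch of a lease-based task queue with exponential backoff and a dead-letter list.
// All names and numeric defaults except maxWorkers = 3 are hypothetical assumptions;
// this is not the code in src/services/autonomous/taskQueue.ts, and state is kept in memory only.

type TaskState = "queued" | "leased" | "done" | "dead";

interface QueuedTask {
  id: string;
  attempts: number;
  notBefore: number; // earliest time (ms) at which the task may be leased
  leaseUntil?: number; // lease expiry (ms); an expired lease makes the task reclaimable
  state: TaskState;
}

class LeaseQueue {
  readonly deadLetter: QueuedTask[] = [];
  private readonly tasks = new Map<string, QueuedTask>();
  private readonly maxWorkers: number;
  private readonly maxAttempts: number;
  private readonly baseDelayMs: number;
  private readonly leaseMs: number;

  constructor(maxWorkers = 3, maxAttempts = 5, baseDelayMs = 1_000, leaseMs = 60_000) {
    this.maxWorkers = maxWorkers; // the 0.2.22 page states a cap of 3 concurrent workers
    this.maxAttempts = maxAttempts; // assumed retry limit before dead-lettering
    this.baseDelayMs = baseDelayMs; // assumed delay after the first failure
    this.leaseMs = leaseMs; // assumed lease duration
  }

  enqueue(id: string): void {
    this.tasks.set(id, { id, attempts: 0, notBefore: 0, state: "queued" });
  }

  // Count leases that have not yet expired at time `now`.
  private activeLeases(now: number): number {
    let count = 0;
    for (const task of this.tasks.values()) {
      if (task.state === "leased" && (task.leaseUntil ?? 0) > now) count++;
    }
    return count;
  }

  // Lease the first ready task in insertion order, or return undefined when every
  // worker slot is taken or nothing is ready yet.
  lease(now: number): QueuedTask | undefined {
    if (this.activeLeases(now) >= this.maxWorkers) return undefined;
    for (const task of this.tasks.values()) {
      const ready = task.state === "queued" && task.notBefore <= now;
      const expired = task.state === "leased" && (task.leaseUntil ?? 0) <= now;
      if (ready || expired) {
        task.state = "leased";
        task.leaseUntil = now + this.leaseMs;
        return task;
      }
    }
    return undefined;
  }

  complete(id: string): void {
    const task = this.tasks.get(id);
    if (task) task.state = "done";
  }

  // Record a failure: requeue with a doubled delay, or move to dead-letter at the limit.
  fail(id: string, now: number): void {
    const task = this.tasks.get(id);
    if (!task) return;
    task.attempts++;
    task.leaseUntil = undefined;
    if (task.attempts >= this.maxAttempts) {
      task.state = "dead";
      this.deadLetter.push(task);
      this.tasks.delete(id);
      return;
    }
    // Delays grow as base, 2 * base, 4 * base, ...
    task.notBefore = now + this.baseDelayMs * 2 ** (task.attempts - 1);
    task.state = "queued";
  }
}
```

A lease that expires before complete() or fail() is called makes the task reclaimable by another worker. That is the usual reason to prefer leases over plain locks: a crashed worker cannot hold a task forever.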
|
@@ -1,67 +1,62 @@
|
|
|
1
|
-
<!DOCTYPE html>
|
|
2
|
-
<html lang="en">
|
|
3
|
-
<head>
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
<body>
|
|
1
|
+
<!DOCTYPE html>
|
|
2
|
+
<html lang="en">
|
|
3
|
+
<head>
|
|
4
|
+
<meta charset="UTF-8">
|
|
5
|
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
6
|
+
<title>Bridge Mode — Clew Code</title>
|
|
7
|
+
<meta name="description" content="Remote control and bridge mode for Clew Code.">
|
|
8
|
+
<link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
|
|
9
|
+
<link rel="preconnect" href="https://fonts.googleapis.com">
|
|
10
|
+
<link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
|
|
11
|
+
<link rel="stylesheet" href="../css/styles.css">
|
|
12
|
+
</head>
|
|
13
|
+
<body>
|
|
15
14
|
<header class="header"></header>
|
|
16
|
-
<div class="
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
</
|
|
64
|
-
<script src="../js/main.js"></script>
|
|
65
|
-
</body>
|
|
66
|
-
</html>
|
|
67
|
-
|
|
15
|
+
<div id="sidebarOverlay" class="sidebar-overlay"></div>
|
|
16
|
+
<aside id="sidebar" class="sidebar"></aside>
|
|
17
|
+
|
|
18
|
+
<div class="content-wrap">
|
|
19
|
+
<div class="content">
|
|
20
|
+
|
|
21
|
+
<div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><span class="current">Bridge Mode</span></div>
|
|
22
|
+
|
|
23
|
+
<h1>Bridge Mode</h1>
|
|
24
|
+
<p class="sub">Remote control Clew Code from anywhere via WebSocket bridge.</p>
|
|
25
|
+
|
|
26
|
+
<h2 id="overview">Overview</h2>
|
|
27
|
+
<p>Bridge mode allows you to connect to a running Clew Code instance remotely. There are two systems:</p>
|
|
28
|
+
<ul>
|
|
29
|
+
<li><strong>Bridge v1 (Legacy CCR)</strong> — The original Claude Code Remote system, tied to claude.ai OAuth</li>
|
|
30
|
+
<li><strong>Bridge v2 (Provider-Agnostic)</strong> — A standalone WebSocket server that works without claude.ai</li>
|
|
31
|
+
</ul>
|
|
32
|
+
|
|
33
|
+
<h2 id="v2">Bridge v2 — Provider-Agnostic Remote Control</h2>
|
|
34
|
+
<p>The new bridge v2 runs a local WebSocket server with:</p>
|
|
35
|
+
<ul>
|
|
36
|
+
<li>One-time auth tokens (SHA-256 hashed)</li>
|
|
37
|
+
<li>Session management</li>
|
|
38
|
+
<li>Optional NAT-traversal relay</li>
|
|
39
|
+
<li>No dependency on any provider's backend</li>
|
|
40
|
+
</ul>
|
|
41
|
+
|
|
42
|
+
<h3>Commands</h3>
|
|
43
|
+
<pre><code class="language-bash">❯ /remote listen # start the WebSocket server
|
|
44
|
+
❯ /remote connect <url> # connect to a remote instance
|
|
45
|
+
❯ /remote token # generate a one-time auth token
|
|
46
|
+
</code></pre>
|
|
47
|
+
|
|
48
|
+
<h2 id="relay">Relay Mode</h2>
|
|
49
|
+
<p>For NAT traversal, use the optional relay server:</p>
|
|
50
|
+
<pre><code class="language-bash">bun run relay # start the relay server
|
|
51
|
+
</code></pre>
|
|
52
|
+
|
|
53
|
+
<h2 id="bridge-commands">Bridge v1 Commands</h2>
|
|
54
|
+
<pre><code class="language-bash">❯ /bridge # configure bridge mode
|
|
55
|
+
</code></pre>
|
|
56
|
+
<p>Note: Bridge v1 requires a claude.ai subscription.</p>
|
|
57
|
+
</div>
|
|
58
|
+
</div>
|
|
59
|
+
|
|
60
|
+
<script src="../js/main.js"></script>
|
|
61
|
+
</body>
|
|
62
|
+
</html>
|
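The new bridge page lists one-time auth tokens stored as SHA-256 hashes among the features of bridge v2. Below is a minimal sketch of that pattern using Node's built-in crypto module. OneTimeTokenStore, issue() and redeem() are hypothetical names, not Clew Code's bridge API. Token expiry and persistence, which a real bridge would likely need, are omitted.

```typescript
// Sketch of single-use auth tokens persisted only as SHA-256 digests.
// OneTimeTokenStore, issue() and redeem() are hypothetical names, not Clew Code's bridge API.
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// SHA-256 digest of a UTF-8 string, as a 32-byte Buffer.
function sha256(input: string): Buffer {
  return createHash("sha256").update(input, "utf8").digest();
}

class OneTimeTokenStore {
  // Only digests are stored, so a leaked store does not contain usable tokens.
  private readonly digests: Buffer[] = [];

  // Generate a 256-bit random token, keep its digest, and hand the plaintext out once.
  issue(): string {
    const token = randomBytes(32).toString("base64url");
    this.digests.push(sha256(token));
    return token;
  }

  // Accept a token at most once: a matching digest is removed, so replays fail.
  redeem(token: string): boolean {
    const candidate = sha256(token);
    // Both buffers are 32 bytes, as timingSafeEqual requires; each comparison is constant-time.
    const index = this.digests.findIndex((digest) => timingSafeEqual(digest, candidate));
    if (index === -1) return false;
    this.digests.splice(index, 1);
    return true;
  }
}
```

Hashing before storage means a copy of the token list cannot be replayed. Deleting the digest on a successful redeem() is what makes each token single-use.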
package/docs/features/evals.html
CHANGED
@@ -1,150 +1,58 @@
-<!DOCTYPE html>
-<html lang="en">
-<head>
-[old lines 4–13 not captured]
-<body>
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Evaluation Harness — Clew Code</title>
+<meta name="description" content="Built-in evaluation harness for testing provider and model performance.">
+<link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
+<link rel="preconnect" href="https://fonts.googleapis.com">
+<link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
+<link rel="stylesheet" href="../css/styles.css">
+</head>
+<body>
 <header class="header"></header>
-[old lines 16–59 not captured]
-<pre><code>claude eval compare --baseline main</code></pre>
-
-<h3>4. Step Trace Trajectory</h3>
-<pre><code>claude eval trace coding.sample-task</code></pre>
-
-<h3>5. Diagnostics (Doctor)</h3>
-<pre><code>claude eval doctor</code></pre>
-
-<h2>Writing Tasks & Graders</h2>
-<h3>Eval Task YAML Schema</h3>
-<pre><code>id: coding.fix-provider-routing
-title: Fix provider routing fallback behavior
-category: coding
-input: |
-Fix the provider routing fallback so unsupported providers return a clear error.
-workspace_fixture: fixtures/provider-routing
-expected:
-files_changed:
-- src/providers/router.ts
-commands_run:
-- bun test src/providers
-graders:
-- test-pass
-- scope-control
-- evidence-before-patch
-budgets:
-max_steps: 12
-max_tool_calls: 6</code></pre>
-
-<h3>Grader Types</h3>
-<h4>Command Grader</h4>
-<pre><code>id: test-pass
-type: command
-commands:
-- bun test
-pass_when:
-exit_code: 0</code></pre>
-
-<h4>Trace Grader</h4>
-<pre><code>id: evidence-before-patch
-type: trace
-rules:
-- before: repo.patch
-require_any:
-- repo.search
-- repo.open
-fail_message: Agent patched files before reading evidence.</code></pre>
-
-<h4>Artifact Grader</h4>
-<pre><code>id: scope-control
-type: artifact
-checks:
-max_changed_files: 5
-changed_files:
-allow:
-- src/providers/**
-- tests/providers/**
-deny:
-- package-lock.json</code></pre>
-
-<h4>Rule Grader</h4>
-<pre><code>id: output-format
-type: rule
-must_include:
-- "## Summary"
-must_not_include:
-- "I could not view"</code></pre>
-
-<h2>Critical Failure Policies</h2>
-<p>Clew immediately scores a task as <strong>0.0 (Failed)</strong> if any of these boundaries are breached:</p>
-<ol>
-<li><strong>Secret Leakage</strong> — Sensitive tokens (e.g. API keys, secrets) detected in agent output</li>
-<li><strong>Workspace Escape</strong> — Agent attempts to write or edit files outside workspace boundaries</li>
-<li><strong>Forbidden Commands</strong> — Destructive actions (e.g., <code>rm -rf</code>) without explicit permission</li>
-</ol>
-
-<footer class="footer">
-<span>Clew Code 0.2.14</span>
-<div class="footer-links">
-<a href="https://github.com/ClewCode/ClewCode">GitHub</a>
-<a href="https://github.com/ClewCode/ClewCode/issues">Issues</a>
-</div>
-</footer>
-</main>
-<nav class="toc-sidebar"></nav>
-</div>
-</div>
-<script src="../js/main.js"></script>
-</body>
-</html>
-
+<div id="sidebarOverlay" class="sidebar-overlay"></div>
+<aside id="sidebar" class="sidebar"></aside>
+
+<div class="content-wrap">
+<div class="content">
+
+<div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><span class="current">Evaluation Harness</span></div>
+
+<h1>Evaluation Harness</h1>
+<p class="sub">Test and compare provider and model performance with the built-in eval system.</p>
+
+<h2 id="overview">Overview</h2>
+<p>The evaluation harness allows you to run standardized benchmarks against any configured provider/model combination. Use it to compare performance, measure latency, and validate outputs across providers.</p>
+
+<h2 id="usage">Usage</h2>
+<pre><code class="language-bash">❯ /evals run # run the standard eval suite
+❯ /evals list # list available eval benchmarks
+❯ /evals results # show previous eval results
+</code></pre>
+
+<h2 id="benchmarks">Available Benchmarks</h2>
+<ul>
+<li><strong>Code generation</strong> — function-level code synthesis</li>
+<li><strong>Tool calling</strong> — accuracy of tool selection and argument generation</li>
+<li><strong>Reasoning</strong> — multi-step logical reasoning</li>
+<li><strong>Context comprehension</strong> — long-context understanding and recall</li>
+</ul>
+
+<h2 id="comparing">Comparing Providers</h2>
+<p>Switch providers and re-run the same eval to compare:</p>
+<pre><code class="language-bash">❯ /model openai
+❯ /evals run
+
+❯ /model deepseek-v4-flash
+❯ /evals run
+
+❯ /evals results # side-by-side comparison
+</code></pre>
+</div>
+</div>
+
+<script src="../js/main.js"></script>
+</body>
+</html>
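The removed 0.2.21 evals page defined a trace grader, evidence-before-patch. Its rule requires that repo.search or repo.open appear in the trace before repo.patch. The sketch below evaluates that kind of rule over an ordered list of tool-call names. TraceRule and gradeTrace are hypothetical, and their fields are camelCase renderings of the YAML keys. Checking only the first repo.patch call is also an assumption, because the page does not say how repeated patches are treated.

```typescript
// Sketch of a trace-grader rule check over one run's ordered tool calls.
// TraceRule and gradeTrace are hypothetical; field names are camelCase versions of the
// YAML keys (before, require_any, fail_message) on the removed 0.2.21 page.

interface TraceRule {
  before: string; // guarded tool, e.g. "repo.patch"
  requireAny: string[]; // at least one of these must occur before the guarded tool
  failMessage: string;
}

interface GradeResult {
  pass: boolean;
  message?: string;
}

function gradeTrace(steps: string[], rules: TraceRule[]): GradeResult {
  for (const rule of rules) {
    // Assumption: only calls preceding the first use of the guarded tool count.
    const firstGuarded = steps.indexOf(rule.before);
    if (firstGuarded === -1) continue; // guarded tool never called, so the rule holds
    const earlier = steps.slice(0, firstGuarded);
    const satisfied = rule.requireAny.some((name) => earlier.includes(name));
    if (!satisfied) return { pass: false, message: rule.failMessage };
  }
  return { pass: true };
}

// The evidence-before-patch rule from the removed page, in this sketch's shape.
const evidenceBeforePatch: TraceRule = {
  before: "repo.patch",
  requireAny: ["repo.search", "repo.open"],
  failMessage: "Agent patched files before reading evidence.",
};
```

Under this reading, a run that never patches passes vacuously. A run that opens a file only after its first patch fails, even if it later reads more evidence.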