clew-code 0.2.4 → 0.2.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (57)
  1. package/README.md +264 -292
  2. package/dist/clew-dev.js +5118 -2840
  3. package/dist/main.js +2358 -2133
  4. package/docs/_config.yml +1 -1
  5. package/docs/architecture.html +145 -166
  6. package/docs/architecture.th.html +2 -23
  7. package/docs/commands.html +1 -22
  8. package/docs/commands.th.html +1 -22
  9. package/docs/configuration.html +145 -166
  10. package/docs/configuration.th.html +2 -23
  11. package/docs/css/styles.css +22 -0
  12. package/docs/daemon.html +128 -160
  13. package/docs/daemon.th.html +2 -30
  14. package/docs/features/bridge-mode.html +98 -98
  15. package/docs/features/bridge-mode.th.html +1 -1
  16. package/docs/features/evals.html +181 -181
  17. package/docs/features/evals.th.html +1 -1
  18. package/docs/features/searxng-search.html +150 -150
  19. package/docs/features/searxng-search.th.html +1 -1
  20. package/docs/features/sentry-setup.html +156 -156
  21. package/docs/features/sentry-setup.th.html +1 -1
  22. package/docs/index.html +298 -333
  23. package/docs/index.th.html +1 -36
  24. package/docs/installation.html +103 -124
  25. package/docs/installation.th.html +2 -23
  26. package/docs/internals/growthbook-ab-testing.html +112 -112
  27. package/docs/internals/growthbook-ab-testing.th.html +1 -1
  28. package/docs/internals/hidden-features.html +147 -147
  29. package/docs/internals/hidden-features.th.html +1 -1
  30. package/docs/js/main.js +78 -7
  31. package/docs/loop.html +180 -0
  32. package/docs/loop.th.html +226 -0
  33. package/docs/mcp.html +246 -157
  34. package/docs/mcp.th.html +156 -60
  35. package/docs/models.html +1 -22
  36. package/docs/models.th.html +1 -22
  37. package/docs/peer.html +235 -0
  38. package/docs/peer.th.html +279 -0
  39. package/docs/permission-model.html +101 -122
  40. package/docs/permission-model.th.html +2 -23
  41. package/docs/plugins.html +101 -122
  42. package/docs/plugins.th.html +2 -23
  43. package/docs/providers.html +117 -138
  44. package/docs/providers.th.html +2 -23
  45. package/docs/quick-start.html +92 -120
  46. package/docs/quick-start.th.html +1 -29
  47. package/docs/research-memory.html +79 -111
  48. package/docs/research-memory.th.html +2 -30
  49. package/docs/skills.html +116 -137
  50. package/docs/skills.th.html +2 -23
  51. package/docs/taste.html +96 -29
  52. package/docs/taste.th.html +193 -54
  53. package/docs/tools.html +169 -190
  54. package/docs/tools.th.html +2 -23
  55. package/docs/troubleshooting.html +105 -126
  56. package/docs/troubleshooting.th.html +2 -23
  57. package/package.json +2 -2
@@ -1,181 +1,181 @@
- <!DOCTYPE html>
- <html lang="en">
- <head>
- <meta charset="UTF-8">
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>Evaluation Harness — Clew</title>
- <meta name="description" content="Offline-first AI coding agent evaluation and verification framework.">
- <link rel="preconnect" href="https://fonts.googleapis.com">
- <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
- <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
- <link rel="stylesheet" href="../css/styles.css">
- <link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
- </head>
- <body>
- <header class="header">
- <div class="header-inner">
- <a href="../index.html" class="logo">
- <span>Clew Code</span>
- </a>
- <nav class="header-nav">
- <a href="../index.html">Home</a>
- <a href="../index.html#features">Features</a>
- <a href="../index.html#commands">Commands</a>
- <a href="../quick-start.html" class="active">Docs</a>
- <a href="https://github.com/JonusNattapong/ClewCode" target="_blank">GitHub</a>
- <div class="lang-wrap">
- <button class="lang-btn">🌐</button>
- <div class="lang-menu">
- <a href="../../readme/README.zh.md">中文</a>
- <a href="../../readme/README.th.md">ไทย</a>
- <a href="../../readme/README.ja.md">日本語</a>
- <a href="../../readme/README.ko.md">한국어</a>
- <a href="../../readme/README.es.md">Español</a>
- <a href="../../readme/README.fr.md">Français</a>
- <a href="../../readme/README.de.md">Deutsch</a>
- <a href="../../readme/README.pt.md">Português</a>
- <a href="../../readme/README.vi.md">Tiếng Việt</a>
- <a href="../../readme/README.id.md">Bahasa Indonesia</a>
- <a href="../../readme/README.ru.md">Русский</a>
- <a href="../../readme/README.hi.md">हिन्दी</a>
- <a href="../../README.md">English</a>
- </div>
- </div>
- </nav>
- <button class="menu-btn" id="menuToggle" aria-label="Toggle navigation"><span></span><span></span><span></span></button>
- </div>
- </header>
- <div class="app">
- <aside class="sidebar" id="sidebar"></aside>
- <div class="sidebar-overlay" id="sidebarOverlay"></div>
- <div class="content-wrap">
- <main class="content">
- <div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><a href="../index.html#features">Features</a><span class="sep">/</span><span>Evaluation Harness</span></div>
- <h1>Evaluation Harness</h1>
- <p class="section-subtitle">Offline-first AI coding agent evaluation and verification framework</p>
-
- <div class="callout callout-tip">
- <strong>TL;DR</strong>
- Run <code>clew eval init</code> to bootstrap the evaluation folders inside your project,
- then execute <code>clew eval run</code> to run standard coding or research benchmarks locally.
- </div>
-
- <h2>Overview</h2>
- <p>Clew includes a localized, <strong>offline-first evaluation harness</strong> under the <code>/eval</code> command namespace. This allows developers to systematically grade agent output quality, detect trace trajectory regressions, control boundary escapes, and compare model versions using deterministic rules.</p>
-
- <h2>Workspace Directory Layout</h2>
- <p>When you run <code>clew eval init</code>, it configures the following structures inside <code>.claude/evals/</code>:</p>
- <table>
- <tr><th>Folder</th><th>Description</th></tr>
- <tr><td><code>.claude/evals/tasks/</code></td><td>YAML task definitions (grouped by categories like <code>coding/</code>, <code>research/</code>, <code>memory/</code>, <code>security/</code>)</td></tr>
- <tr><td><code>.claude/evals/graders/</code></td><td>YAML grader rules and configurations (Command, Trace, Artifact, and Rule graders)</td></tr>
- <tr><td><code>.claude/evals/runs/</code></td><td>Outcome results, captured events logs, and workspace diffs per run</td></tr>
- <tr><td><code>.claude/evals/baselines/</code></td><td>Saved scoring baselines (e.g. main branch benchmark records)</td></tr>
- <tr><td><code>.claude/evals/reports/</code></td><td>Final generated markdown and JSON evaluation reports</td></tr>
- </table>
-
- <h2>Subcommand CLI Usage</h2>
- <h3>1. Initialize Workspace</h3>
- <pre><code>claude eval init</code></pre>
-
- <h3>2. Run Evaluations</h3>
- <pre><code># Run all loaded tasks
- claude eval run
- # Run only tasks in the "coding" category
- claude eval run --set coding
- # Run a specific task by ID
- claude eval run --task coding.sample-task
- # Run evaluations and compare against a baseline
- claude eval run --baseline main</code></pre>
-
- <h3>3. Drift &amp; Regression Comparison</h3>
- <pre><code>claude eval compare --baseline main</code></pre>
-
- <h3>4. Step Trace Trajectory</h3>
- <pre><code>claude eval trace coding.sample-task</code></pre>
-
- <h3>5. Diagnostics (Doctor)</h3>
- <pre><code>claude eval doctor</code></pre>
-
- <h2>Writing Tasks &amp; Graders</h2>
- <h3>Eval Task YAML Schema</h3>
- <pre><code>id: coding.fix-provider-routing
- title: Fix provider routing fallback behavior
- category: coding
- input: |
- Fix the provider routing fallback so unsupported providers return a clear error.
- workspace_fixture: fixtures/provider-routing
- expected:
- files_changed:
- - src/providers/router.ts
- commands_run:
- - bun test src/providers
- graders:
- - test-pass
- - scope-control
- - evidence-before-patch
- budgets:
- max_steps: 12
- max_tool_calls: 6</code></pre>
-
- <h3>Grader Types</h3>
- <h4>Command Grader</h4>
- <pre><code>id: test-pass
- type: command
- commands:
- - bun test
- pass_when:
- exit_code: 0</code></pre>
-
- <h4>Trace Grader</h4>
- <pre><code>id: evidence-before-patch
- type: trace
- rules:
- - before: repo.patch
- require_any:
- - repo.search
- - repo.open
- fail_message: Agent patched files before reading evidence.</code></pre>
-
- <h4>Artifact Grader</h4>
- <pre><code>id: scope-control
- type: artifact
- checks:
- max_changed_files: 5
- changed_files:
- allow:
- - src/providers/**
- - tests/providers/**
- deny:
- - package-lock.json</code></pre>
-
- <h4>Rule Grader</h4>
- <pre><code>id: output-format
- type: rule
- must_include:
- - "## Summary"
- must_not_include:
- - "I could not view"</code></pre>
-
- <h2>Critical Failure Policies</h2>
- <p>Clew immediately scores a task as <strong>0.0 (Failed)</strong> if any of these boundaries are breached:</p>
- <ol>
- <li><strong>Secret Leakage</strong> — Sensitive tokens (e.g. API keys, secrets) detected in agent output</li>
- <li><strong>Workspace Escape</strong> — Agent attempts to write or edit files outside workspace boundaries</li>
- <li><strong>Forbidden Commands</strong> — Destructive actions (e.g., <code>rm -rf</code>) without explicit permission</li>
- </ol>
-
- <footer class="footer">
- <span>Clew v0.1.2</span>
- <div class="footer-links">
- <a href="https://github.com/JonusNattapong/ClewCode">GitHub</a>
- <a href="https://github.com/JonusNattapong/ClewCode/issues">Issues</a>
- </div>
- </footer>
- </main>
- <nav class="toc-sidebar"></nav>
- </div>
- </div>
- <script src="../js/main.js"></script>
- </body>
- </html>
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Evaluation Harness — Clew</title>
+ <meta name="description" content="Offline-first AI coding agent evaluation and verification framework.">
+ <link rel="preconnect" href="https://fonts.googleapis.com">
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+ <link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
+ <link rel="stylesheet" href="../css/styles.css">
+ <link rel="icon" type="image/svg+xml" href="../assets/clew.svg">
+ </head>
+ <body>
+ <header class="header">
+ <div class="header-inner">
+ <a href="../index.html" class="logo">
+ <span>Clew Code</span>
+ </a>
+ <nav class="header-nav">
+ <a href="../index.html">Home</a>
+ <a href="../index.html#features">Features</a>
+ <a href="../index.html#commands">Commands</a>
+ <a href="../quick-start.html" class="active">Docs</a>
+ <a href="https://github.com/JonusNattapong/ClewCode" target="_blank">GitHub</a>
+ <div class="lang-wrap">
+ <button class="lang-btn">🌐</button>
+ <div class="lang-menu">
+ <a href="../../readme/README.zh.md">中文</a>
+ <a href="../../readme/README.th.md">ไทย</a>
+ <a href="../../readme/README.ja.md">日本語</a>
+ <a href="../../readme/README.ko.md">한국어</a>
+ <a href="../../readme/README.es.md">Español</a>
+ <a href="../../readme/README.fr.md">Français</a>
+ <a href="../../readme/README.de.md">Deutsch</a>
+ <a href="../../readme/README.pt.md">Português</a>
+ <a href="../../readme/README.vi.md">Tiếng Việt</a>
+ <a href="../../readme/README.id.md">Bahasa Indonesia</a>
+ <a href="../../readme/README.ru.md">Русский</a>
+ <a href="../../readme/README.hi.md">हिन्दी</a>
+ <a href="../../README.md">English</a>
+ </div>
+ </div>
+ </nav>
+ <button class="menu-btn" id="menuToggle" aria-label="Toggle navigation"><span></span><span></span><span></span></button>
+ </div>
+ </header>
+ <div class="app">
+ <aside class="sidebar" id="sidebar"></aside>
+ <div class="sidebar-overlay" id="sidebarOverlay"></div>
+ <div class="content-wrap">
+ <main class="content">
+ <div class="breadcrumbs"><a href="../index.html">Home</a><span class="sep">/</span><a href="../index.html#features">Features</a><span class="sep">/</span><span>Evaluation Harness</span></div>
+ <h1>Evaluation Harness</h1>
+ <p class="section-subtitle">Offline-first AI coding agent evaluation and verification framework</p>
+
+ <div class="callout callout-tip">
+ <strong>TL;DR</strong>
+ Run <code>clew eval init</code> to bootstrap the evaluation folders inside your project,
+ then execute <code>clew eval run</code> to run standard coding or research benchmarks locally.
+ </div>
+
+ <h2>Overview</h2>
+ <p>Clew includes a localized, <strong>offline-first evaluation harness</strong> under the <code>/eval</code> command namespace. This allows developers to systematically grade agent output quality, detect trace trajectory regressions, control boundary escapes, and compare model versions using deterministic rules.</p>
+
+ <h2>Workspace Directory Layout</h2>
+ <p>When you run <code>clew eval init</code>, it configures the following structures inside <code>.claude/evals/</code>:</p>
+ <table>
+ <tr><th>Folder</th><th>Description</th></tr>
+ <tr><td><code>.claude/evals/tasks/</code></td><td>YAML task definitions (grouped by categories like <code>coding/</code>, <code>research/</code>, <code>memory/</code>, <code>security/</code>)</td></tr>
+ <tr><td><code>.claude/evals/graders/</code></td><td>YAML grader rules and configurations (Command, Trace, Artifact, and Rule graders)</td></tr>
+ <tr><td><code>.claude/evals/runs/</code></td><td>Outcome results, captured events logs, and workspace diffs per run</td></tr>
+ <tr><td><code>.claude/evals/baselines/</code></td><td>Saved scoring baselines (e.g. main branch benchmark records)</td></tr>
+ <tr><td><code>.claude/evals/reports/</code></td><td>Final generated markdown and JSON evaluation reports</td></tr>
+ </table>
+
+ <h2>Subcommand CLI Usage</h2>
+ <h3>1. Initialize Workspace</h3>
+ <pre><code>claude eval init</code></pre>
+
+ <h3>2. Run Evaluations</h3>
+ <pre><code># Run all loaded tasks
+ claude eval run
+ # Run only tasks in the "coding" category
+ claude eval run --set coding
+ # Run a specific task by ID
+ claude eval run --task coding.sample-task
+ # Run evaluations and compare against a baseline
+ claude eval run --baseline main</code></pre>
+
+ <h3>3. Drift &amp; Regression Comparison</h3>
+ <pre><code>claude eval compare --baseline main</code></pre>
+
+ <h3>4. Step Trace Trajectory</h3>
+ <pre><code>claude eval trace coding.sample-task</code></pre>
+
+ <h3>5. Diagnostics (Doctor)</h3>
+ <pre><code>claude eval doctor</code></pre>
+
+ <h2>Writing Tasks &amp; Graders</h2>
+ <h3>Eval Task YAML Schema</h3>
+ <pre><code>id: coding.fix-provider-routing
+ title: Fix provider routing fallback behavior
+ category: coding
+ input: |
+ Fix the provider routing fallback so unsupported providers return a clear error.
+ workspace_fixture: fixtures/provider-routing
+ expected:
+ files_changed:
+ - src/providers/router.ts
+ commands_run:
+ - bun test src/providers
+ graders:
+ - test-pass
+ - scope-control
+ - evidence-before-patch
+ budgets:
+ max_steps: 12
+ max_tool_calls: 6</code></pre>
+
+ <h3>Grader Types</h3>
+ <h4>Command Grader</h4>
+ <pre><code>id: test-pass
+ type: command
+ commands:
+ - bun test
+ pass_when:
+ exit_code: 0</code></pre>
+
+ <h4>Trace Grader</h4>
+ <pre><code>id: evidence-before-patch
+ type: trace
+ rules:
+ - before: repo.patch
+ require_any:
+ - repo.search
+ - repo.open
+ fail_message: Agent patched files before reading evidence.</code></pre>
+
+ <h4>Artifact Grader</h4>
+ <pre><code>id: scope-control
+ type: artifact
+ checks:
+ max_changed_files: 5
+ changed_files:
+ allow:
+ - src/providers/**
+ - tests/providers/**
+ deny:
+ - package-lock.json</code></pre>
+
+ <h4>Rule Grader</h4>
+ <pre><code>id: output-format
+ type: rule
+ must_include:
+ - "## Summary"
+ must_not_include:
+ - "I could not view"</code></pre>
+
+ <h2>Critical Failure Policies</h2>
+ <p>Clew immediately scores a task as <strong>0.0 (Failed)</strong> if any of these boundaries are breached:</p>
+ <ol>
+ <li><strong>Secret Leakage</strong> — Sensitive tokens (e.g. API keys, secrets) detected in agent output</li>
+ <li><strong>Workspace Escape</strong> — Agent attempts to write or edit files outside workspace boundaries</li>
+ <li><strong>Forbidden Commands</strong> — Destructive actions (e.g., <code>rm -rf</code>) without explicit permission</li>
+ </ol>
+
+ <footer class="footer">
+ <span>Clew Code v0.2.4</span>
+ <div class="footer-links">
+ <a href="https://github.com/JonusNattapong/ClewCode">GitHub</a>
+ <a href="https://github.com/JonusNattapong/ClewCode/issues">Issues</a>
+ </div>
+ </footer>
+ </main>
+ <nav class="toc-sidebar"></nav>
+ </div>
+ </div>
+ <script src="../js/main.js"></script>
+ </body>
+ </html>
@@ -74,7 +74,7 @@ claude eval run --task coding.sample-task</code></pre>
  <pre><code>claude eval compare --baseline main</code></pre>
 
  <footer class="footer">
- <span>Clew v0.1.2</span>
+ <span>Clew Code v0.2.4</span>
  <div class="footer-links">
  <a href="https://github.com/JonusNattapong/ClewCode">GitHub</a>
  <a href="https://github.com/JonusNattapong/ClewCode/issues">ปัญหา</a>