opencode-goal-mode 0.2.2 → 0.3.0

package/ARCHITECTURE.md CHANGED
@@ -8,12 +8,17 @@ configuration directory:
8
8
  gates). Each is a Markdown file: YAML frontmatter (mode, permissions, color,
9
9
  temperature) over a system-prompt body.
10
10
  2. **Commands** (`commands/*.md`) — slash commands (`/goal`, `/goal-contract`,
11
- `/goal-review`, `/goal-status`, `/goal-repair`, `/goal-final`) that bind a
12
- prompt template to an agent, some forced to run as subtasks.
11
+ `/goal-review`, `/goal-evidence-map`, `/goal-status`, `/goal-repair`,
12
+ `/goal-final`) that bind a prompt template to an agent, some forced to run as
13
+ subtasks.
13
14
  3. **The `goal-guard` plugin** (`plugins/goal-guard.js` + `plugins/goal-guard/`)
14
15
  — a runtime guard that enforces review discipline, blocks destructive shell
15
16
  commands, preserves state across compaction and restarts, and exposes
16
17
  first-class `goal_*` tools.
18
+ 4. **An experimental TUI companion** (`plugins/goal-sidebar.js`) — a separate
19
+ `{ tui }` plugin module that renders the active goal as a yellow sidebar
20
+ banner. It is *paired* with the server plugin purely through the on-disk state
21
+ snapshot (no extra IPC) and no-ops on any runtime without the slot API.
17
22
 
18
23
  This document focuses on the plugin, where the engineering lives.
19
24
 
@@ -41,13 +46,15 @@ as plugins. Each module is independently unit-tested.
41
46
  | `goal-guard/config.js` | Config resolution (defaults < env vars < plugin options). |
42
47
  | `goal-guard/state.js` | Per-session state records + the store (monotonic seq, LRU, persistence hooks). |
43
48
  | `goal-guard/persistence.js` | Atomic, debounced JSON persistence under the XDG state dir. |
44
- | `goal-guard/verdicts.js` | Verdict extraction (last-wins, anchored) and recording. |
49
+ | `goal-guard/verdicts.js` | Verdict extraction (last-wins, anchored), recording, and Reviewer Memory updates. |
45
50
  | `goal-guard/gates.js` | Required-gate computation and freshness. |
46
51
  | `goal-guard/completion.js` | `Goal Completed` claim evaluation. |
47
52
  | `goal-guard/events.js` | Shared edit/verification/evidence mutators. |
48
- | `goal-guard/summary.js` | State summaries and structured status reports. |
53
+ | `goal-guard/summary.js` | State summaries, status reports, and evidence-map projections. |
49
54
  | `goal-guard/system.js` | Live state block injected into the system prompt. |
50
- | `goal-guard/tools.js` | The `goal_status` / `goal_contract` / `goal_evidence` / `goal_reset` tools. |
55
+ | `goal-guard/summary.js` | Status/evidence projections, the short goal label, and the sidebar view. |
56
+ | `goal-guard/tools.js` | The `goal_status` / `goal_evidence_map` / `goal_reviewer_memory` / `goal_contract` / `goal_evidence` / `goal_reset` tools. |
57
+ | `goal-guard/sidebar-data.js` | Pure reader that projects the persisted snapshot into the sidebar banner model. |
51
58
  | `goal-guard/logger.js` | Best-effort logging/toasts over the OpenCode client. |
52
59
 
53
60
  ## Hooks used
@@ -88,7 +95,12 @@ re-running verification does not.
88
95
  A session record tracks: active flag, captured goal text, the Goal Contract,
89
96
  dirty flag and reasons, changed files, review-cycle count, the last edit/review/
90
97
  verification seq and timestamps, the verdict log and per-agent latest verdict,
91
- recorded evidence, and completion-rejection history.
98
+ recorded evidence, Reviewer Memory, and completion-rejection history.
99
+
100
+ Reviewer Memory stores bounded summaries of blocking reviewer findings. A fresh
101
+ FAIL opens or refreshes a finding for that reviewer; a fresh PASS from the same
102
+ reviewer marks its open findings resolved. The memory is injected into status and
103
+ system context so recurring review issues survive long sessions and restarts.
92
104
 
93
105
  ### Persistence
94
106
 
@@ -137,11 +149,13 @@ or any required gate is missing/stale.
137
149
 
138
150
  ## Custom tools
139
151
 
140
- The `tool` hook registers four tools (names are verbatim object keys):
152
+ The `tool` hook registers six tools (names are verbatim object keys):
141
153
 
142
154
  - `goal_contract` — record the Goal Contract; activates enforcement and fixes the
143
155
  required specialist gates.
144
156
  - `goal_evidence` — log a verification command + result into the ledger.
157
+ - `goal_evidence_map` — return the acceptance-criteria evidence map with reviewer status and next actions.
158
+ - `goal_reviewer_memory` — return open and recently resolved reviewer findings.
145
159
  - `goal_status` — return the authoritative gate/dirty/completion status.
146
160
  - `goal_reset` — clear the session's goal state (requires `confirm: true`).
147
161
 
@@ -149,6 +163,25 @@ The `@opencode-ai/plugin` import they need is isolated to `tools.js` and loaded
149
163
  via a guarded dynamic import, so if the host cannot resolve it the core guard
150
164
  hooks still load.
151
165
 
166
+ ## TUI companion (experimental)
167
+
168
+ `plugins/goal-sidebar.js` is a TUI plugin module — `export const tui = async (api)
169
+ => …` — distinct from the server plugin (`@opencode-ai/plugin` types it as a
170
+ `{ tui }` module, mutually exclusive with `{ server }`). It registers a
171
+ `sidebar_content` slot via `api.slots.register({ slots: { sidebar_content } })`
172
+ and renders, in the configured colour (`#FFD700` by default), the short goal
173
+ label plus a `passing/total gates · dirty/ready` line.
174
+
175
+ It is *paired* with the server plugin only through the persisted state file:
176
+ `sidebar-data.js` recomputes the same `stateBaseDir`/`projectKey` path the guard
177
+ writes to and projects the active session via `summary.sidebarView`. That keeps
178
+ the pure projection logic Node-testable (`tests/sidebar.test.mjs`) even though the
179
+ JSX renderer itself can only run inside OpenCode's (Bun) TUI runtime. Everything
180
+ in the `tui` entry is wrapped so a missing slot API, missing JSX runtime, or read
181
+ error degrades to rendering nothing — it can never break the TUI. The server plugin
182
+ also emits review-verdict and completion-unlock toasts (`toastOnReview`) so review
183
+ progress is visible even without the banner.
184
+
152
185
  ## Configuration
153
186
 
154
187
  `config.js` merges, in increasing precedence: built-in defaults, environment
@@ -172,9 +205,16 @@ manifest of the file hashes it wrote. On upgrade it distinguishes files it owns
172
205
 
173
206
  - `tests/shell.test.mjs` — the analyzer against the bypass and false-positive corpora.
174
207
  - `tests/plugin.test.mjs` — hook behavior, gating, verdicts, completion, tools, isolation.
208
+ - `tests/truthfulness-benchmark.test.mjs` — false-completion corpus and truthfulness scoring.
175
209
  - `tests/state.test.mjs` — store, seq ordering, eviction, persistence round-trips.
210
+ - `tests/sidebar.test.mjs` — short goal label, sidebar projection, snapshot reader, new destructive bins.
211
+ - `tests/toast.test.mjs` — review-verdict and completion-unlock toasts.
176
212
  - `tests/agents.test.mjs` / `tests/commands.test.mjs` — frontmatter and contracts.
177
213
  - `tests/install.test.mjs` — recursive copy, manifest upgrades, uninstall.
178
214
 
215
+ The shell guard's headline accuracy is measured on an external, third-party
216
+ corpus (`benchmarks/external.mjs` over `external-corpus.json`), not on the curated
217
+ fixtures — see [research/benchmarks.md](research/benchmarks.md).
218
+
179
219
  `npm run validate` runs the tests, the structural config validator, the publish
180
220
  readiness check, and an `npm pack --dry-run`.
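The Reviewer Memory rule added to ARCHITECTURE.md above (a fresh FAIL opens or refreshes a finding for that reviewer; a fresh PASS from the same reviewer resolves its open findings) can be sketched as follows. This is only an illustration of the stated rule, not the code in `goal-guard/verdicts.js`: `recordVerdict`, the record shape, and the `MAX_FINDINGS` bound are hypothetical names chosen for the example.

```javascript
// Hypothetical sketch of the Reviewer Memory rule; all names and shapes are assumptions.
const MAX_FINDINGS = 20; // assumed bound; the real limit is not stated in the diff

function recordVerdict(memory, { agent, verdict, summary, seq }) {
  if (verdict === "FAIL") {
    // A fresh FAIL refreshes the reviewer's open finding, or opens a new one.
    const open = memory.find((f) => f.agent === agent && f.status === "open");
    if (open) {
      open.summary = summary;
      open.lastSeq = seq;
    } else {
      memory.push({ agent, summary, status: "open", openedSeq: seq, lastSeq: seq });
    }
  } else if (verdict === "PASS") {
    // A fresh PASS from the same reviewer resolves its open findings.
    for (const f of memory) {
      if (f.agent === agent && f.status === "open") {
        f.status = "resolved";
        f.lastSeq = seq;
      }
    }
  }
  // Keep the memory bounded by dropping the oldest entries first.
  while (memory.length > MAX_FINDINGS) memory.shift();
  return memory;
}

const memory = [];
recordVerdict(memory, { agent: "security", verdict: "FAIL", summary: "token logged", seq: 1 });
recordVerdict(memory, { agent: "tests", verdict: "FAIL", summary: "no regression test", seq: 2 });
recordVerdict(memory, { agent: "security", verdict: "PASS", summary: "", seq: 3 });

const openAgents = memory.filter((f) => f.status === "open").map((f) => f.agent);
console.log(openAgents); // only the "tests" finding is still open
```

Keying findings by reviewer is what lets a PASS from one specialist clear its own findings without touching another reviewer's open items.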
package/CHANGELOG.md CHANGED
@@ -1,5 +1,32 @@
1
1
  # Changelog
2
2
 
3
+ ## v0.3.0
4
+
5
+ - Honest benchmarks: add an EXTERNAL corpus of 704 real third-party commands from
6
+ tldr-pages (`benchmarks/external.mjs`, `npm run bench:external`) as the headline
7
+ detection/false-positive measure (93.3% vs 53.8% legacy; ~0% real false
8
+ positives). Reframe the curated 71-command set and 9 completion cases as
9
+ regression *fixtures*, not measured accuracy, and reword the README/charts to
10
+ stop overclaiming.
11
+ - Stronger guard: block `mkfs.<fstype>` variants, `srm`, and `mkswap`
12
+ (genuine destructive commands the external corpus exposed as misses).
13
+ - Deeper TUI embedding: toast on each review verdict (PASS/FAIL) and once when the
14
+ last required gate clears (`toastOnReview`); `goal_status` now surfaces the goal.
15
+ - Experimental TUI sidebar banner (`plugins/goal-sidebar.js`): the active goal in
16
+ shining yellow with a live gate-status line, paired with the guard via persisted
17
+ state. No-ops on any runtime without the TUI slot API. New options
18
+ `sidebarBanner` / `sidebarColor` (`GOAL_GUARD_SIDEBAR_*`).
19
+ - Tighter `/goal` flow that seeds the Goal Contract via the `goal_contract` tool.
20
+
21
+ ## v0.2.4
22
+
23
+ - Add Reviewer Memory for unresolved/resolved reviewer findings across cycles.
24
+ - Add a False Completion Dataset and Benchmark Truthfulness Score for completion-claim enforcement.
25
+
26
+ ## v0.2.3
27
+
28
+ - Add `/goal-evidence-map` to map acceptance criteria to recorded verification evidence, gaps, and next actions.
29
+
3
30
  ## v0.2.2
4
31
 
5
32
  - Refresh source-backed research notes for OpenCode plugin/runtime facts and the Claude Code/Codex comparison.
package/README.md CHANGED
@@ -38,26 +38,47 @@ honest caveats, in [research/goal-mode-comparison.md](research/goal-mode-compari
38
38
  - **Destructive commands are blocked by a real shell tokenizer**, not a regex.
39
39
  Claude Code's own docs call Bash argument-matching *"fragile"*.
40
40
 
41
- ### Benchmark: shell-guard accuracy
41
+ ### Benchmarks (honest edition)
42
42
 
43
- The guard replaced a boundary-anchored regex classifier. On a labeled corpus of
44
- 71 real commands (`npm run bench` from a repository checkout, reproducible — see
45
- [research/benchmarks.md](research/benchmarks.md)):
43
+ The headline number is measured on commands **the analyzer was never fitted to**:
44
+ 704 real example commands from [tldr-pages](https://github.com/tldr-pages/tldr)
45
+ (common/linux/osx), authored by hundreds of contributors who have never seen
46
+ this guard. Ground-truth labels come from a deliberately simple, analyzer-*independent*
47
+ rule (see [build-external-corpus.mjs](benchmarks/build-external-corpus.mjs)).
48
+ Reproduce with `npm run bench` or `node benchmarks/external.mjs`.
46
49
 
47
- ![Destructive-command detection rate by family](docs/benchmarks/detection-by-family.svg)
50
+ ![Guard accuracy on real third-party commands](docs/benchmarks/external-scorecard.svg)
48
51
 
49
- ![Overall guard accuracy: detection rate vs false-positive rate](docs/benchmarks/overall-scorecard.svg)
50
-
51
- | | Legacy regex guard | Goal Mode analyzer |
52
+ | On 704 real third-party commands | Legacy regex guard | Goal Mode analyzer |
52
53
  | --- | --- | --- |
53
- | Destructive-command detection | **20.8%** | **100%** |
54
- | False positives on safe commands | **21.7%** | **0%** |
55
- | Obfuscated bypasses caught (`$(…)`, `bash -c`, `sudo -u`, interpreters) | 0% | 100% |
56
- | Remote exec (`curl \| sh`) caught | 0% | 100% |
57
-
58
- The deeper analysis costs a few microseconds per command on this machine
59
- (hundreds of thousands of classifications per second) negligible for a
60
- per-tool-call guard:
54
+ | Destructive-command detection | 53.8% | **93.3%** |
55
+ | False positives on safe commands | 0.2% | **0.2%** |
56
+
57
+ Honest caveats, because the point of this rewrite was to stop overclaiming:
58
+
59
+ - The ~7 remaining "misses" are almost all un-flagged single-target `rm <file>`,
60
+ which the guard **intentionally permits** (plain `rm` is common and the guard
61
+ blocks `rm -r`/`rm -f`, `$(rm …)`, `bash -c`, interpreters, etc.). Under a
62
+ strict every-`rm`-is-destructive labeling those count against it.
63
+ - The single counted false positive (`git filter-repo …`) actually *is* a
64
+ history-rewriting command, so the real-world false-positive rate is effectively
65
+ zero. `node benchmarks/external.mjs --json` lists every miss and false positive
66
+ so you can audit the disagreements yourself.
67
+
68
+ Two **curated fixture sets** also ship — and they are explicitly *fixtures*, not
69
+ an unbiased benchmark. They define the patterns the analyzer must catch and guard
70
+ against regressions, so they pass by construction; do not read the 100%/0% there
71
+ as measured accuracy:
72
+
73
+ - `benchmarks/corpus.mjs` — 71 destructive patterns (incl. `$(…)`, `bash -c`,
74
+ `sudo -u`, `/bin/rm`, `git -C … reset --hard`, `curl | sh`, interpreter
75
+ deletes) and their safe look-alikes (`git checkout -b`, `echo "rm -rf /"`).
76
+ - `benchmarks/completion-corpus.mjs` — 9 completion-claim policy cases (missing
77
+ review-cycle line, stale review after edit, missing contextual gate, inactive
78
+ session, custom marker). `npm run bench:truthfulness` prints them.
79
+
80
+ The analysis costs ~1µs per command (hundreds of thousands of classifications per
81
+ second) — negligible for a per-tool-call guard:
61
82
 
62
83
  ![Per-command analysis latency](docs/benchmarks/latency.svg)
63
84
 
@@ -72,8 +93,8 @@ per-tool-call guard:
72
93
  discovery, verification planning, and reviews to subagents.
73
94
  - Strict review gates for prompt compliance, diff review, verification, security,
74
95
  UX, operations, data, API, performance, tests, docs, quality, and final audit.
75
- - Slash commands: `/goal`, `/goal-contract`, `/goal-review`, `/goal-status`,
76
- `/goal-repair`, `/goal-final`.
96
+ - Slash commands: `/goal`, `/goal-contract`, `/goal-review`,
97
+ `/goal-evidence-map`, `/goal-status`, `/goal-repair`, `/goal-final`.
77
98
  - The `goal-guard` plugin:
78
99
  - **Quote-aware shell analysis** that blocks destructive and remote-exec
79
100
  commands (including ones that evade naive regexes — `$(rm -rf …)`,
@@ -83,14 +104,40 @@ per-tool-call guard:
83
104
  `Goal Not Completed` with the exact missing review gates.
84
105
  - **Contextual gating**: the goal text and changed files determine which
85
106
  specialist reviewers are required.
86
- - **Disk persistence**: review ledgers survive OpenCode restarts.
87
- - **Custom tools**: `goal_contract`, `goal_evidence`, `goal_status`,
88
- `goal_reset`.
107
+ - **Reviewer Memory**: blocking reviewer findings are carried across cycles,
108
+ surfaced in status/system context, and marked resolved by fresh PASS verdicts.
109
+ - **Disk persistence**: review ledgers and Reviewer Memory survive OpenCode restarts.
110
+ - **Custom tools**: `goal_contract`, `goal_evidence`, `goal_evidence_map`,
111
+ `goal_reviewer_memory`, `goal_status`, `goal_reset`.
89
112
  - **Live state injection** into the system prompt so the model always knows
90
113
  what the guard requires.
114
+ - **TUI toasts**: a toast on each review verdict (PASS/FAIL) and a single
115
+ "completion unlocked" toast the moment the last required gate clears.
116
+ - An **experimental** companion TUI plugin (`plugins/goal-sidebar.js`) that shows
117
+ the active goal as a shining-yellow banner in the sidebar with a compact gate
118
+ status line. See [TUI integration](#tui-integration).
91
119
  - A test suite validating the analyzer, plugin hooks, state store, install
92
120
  safety, and config compatibility.
93
121
 
122
+ ## TUI integration
123
+
124
+ Goal Mode is a **plugin pair**: the server-side `goal-guard` plugin owns
125
+ enforcement and writes its state to disk, and an experimental TUI plugin
126
+ (`plugins/goal-sidebar.js`) reads that same state to render a live banner.
127
+
128
+ - **Sidebar goal banner (experimental).** The current goal renders in shining
129
+ yellow in the sidebar (`sidebar_content` slot), with a `passing/total gates ·
130
+ dirty/ready` status line, and updates as reviews land. It requires a
131
+ TUI-plugin-capable OpenCode (one exposing `api.slots.register`); on any older
132
+ runtime it silently no-ops, so it can never break your TUI. Set
133
+ `sidebarBanner: false` (or `GOAL_GUARD_SIDEBAR_BANNER=0`) to disable, or
134
+ `sidebarColor` to recolour it. Because no local environment can run OpenCode's
135
+ TUI runtime, this banner is shipped best-effort and should be verified in your
136
+ own TUI.
137
+ - **Toasts.** Review verdicts and completion-unlock events surface as toasts
138
+ (`toastOnReview`), and blocked destructive commands / premature completions
139
+ toast as before (`toastOnBlock`).
140
+
94
141
  ## Install globally
95
142
 
96
143
  ```bash
@@ -152,17 +199,28 @@ Or via environment variables (`GOAL_GUARD_*`):
152
199
  | `maxSessions` / `GOAL_GUARD_MAX_SESSIONS` | `200` | Session cache size. |
153
200
  | `sessionTtlMs` / `GOAL_GUARD_SESSION_TTL_MS` | `86400000` | Idle session TTL. |
154
201
  | `toastOnBlock` / `GOAL_GUARD_TOAST_ON_BLOCK` | `true` | Toast when something is blocked. |
202
+ | `toastOnReview` / `GOAL_GUARD_TOAST_ON_REVIEW` | `true` | Toast on each review verdict and when completion unlocks. |
203
+ | `sidebarBanner` / `GOAL_GUARD_SIDEBAR_BANNER` | `true` | Show the experimental yellow goal banner in the TUI sidebar. |
204
+ | `sidebarColor` / `GOAL_GUARD_SIDEBAR_COLOR` | `#FFD700` | Foreground colour of the sidebar goal banner. |
155
205
 
156
206
  ## Custom tools
157
207
 
158
- The plugin registers four tools the model can call directly:
208
+ The plugin registers six tools the model can call directly:
159
209
 
160
210
  - `goal_contract` — record the Goal Contract (requirements, non-goals,
161
211
  acceptance criteria). Activates enforcement and fixes the required gates.
162
212
  - `goal_evidence` — record a verification command and result.
213
+ - `goal_evidence_map` — return the acceptance-criteria evidence map with
214
+ reviewer status, gaps, and next actions.
215
+ - `goal_reviewer_memory` — return unresolved and recently resolved reviewer findings.
163
216
  - `goal_status` — return the authoritative gate/dirty/completion status.
164
217
  - `goal_reset` — clear the session's goal state (requires `confirm: true`).
165
218
 
219
+ Use `/goal-evidence-map` when you need a read-only matrix of each acceptance
220
+ criterion against recorded evidence, reviewer status, gaps, and the next
221
+ required action. The command is backed by the `goal_evidence_map` tool, so it
222
+ uses persisted Goal Guard state rather than relying on transcript memory.
223
+
166
224
  ## Validation
167
225
 
168
226
  ```bash
@@ -215,7 +273,7 @@ git push --follow-tags
215
273
  ```
216
274
 
217
275
  For a version that is already bumped and reviewed, commit the current tree, tag
218
- the reviewed version (for example `v0.2.2`), push the branch and tag, then create
276
+ the reviewed version (for example `v0.2.4`), push the branch and tag, then create
219
277
  the GitHub Release. Ensure `NPM_TOKEN` has npm publish rights before publishing
220
278
  the release.
221
279
 
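The two headline rates in the README table are plain ratios: detection is the share of destructive-labeled commands the analyzer flags, and the false-positive rate is the share of safe-labeled commands it flags. The sketch below works through that arithmetic. The counts are assumptions chosen only to be consistent with the published figures (704 commands, roughly 7 misses, one counted false positive, and the build script's default safe limit of 600); they are not read from `external-corpus.json`, and `rate` is a hypothetical helper.

```javascript
// Percentage helper: share of `hits` among `total`, as a number in [0, 100].
function rate(hits, total) {
  return total === 0 ? 0 : (hits / total) * 100;
}

// ASSUMED counts, consistent with the README but not taken from the corpus file:
// 704 commands = 104 destructive + 600 safe, with 7 misses and 1 false positive.
const counts = { destructive: 104, destructiveFlagged: 97, safe: 600, safeFlagged: 1 };

const detection = rate(counts.destructiveFlagged, counts.destructive);
const falsePositive = rate(counts.safeFlagged, counts.safe);

console.log(`detection ${detection.toFixed(1)}%`);           // detection 93.3%
console.log(`false positives ${falsePositive.toFixed(1)}%`); // false positives 0.2%
```

Because each rate has its own denominator, enriching the corpus with every destructive example does not distort either number, though it does mean the 104:600 mix should not be read as a natural base rate.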
@@ -0,0 +1,177 @@
1
+ #!/usr/bin/env node
2
+ /**
3
+ * Build an EXTERNAL, third-party-authored shell-command corpus for the guard
4
+ * benchmark, so the reported detection / false-positive numbers measure
5
+ * real-world behavior instead of a self-authored set the analyzer was tuned on.
6
+ *
7
+ * Source: the tldr-pages project (https://github.com/tldr-pages/tldr, CC-BY).
8
+ * Every example command in the English `common`, `linux`, and `osx` pages is a
9
+ * real invocation documented by hundreds of contributors who have never seen
10
+ * this analyzer — so the analyzer cannot have been fitted to them.
11
+ *
12
+ * Ground-truth labels come from `labelDestructive()` below: a deliberately
13
+ * SIMPLE, transparent rule based on the primary utility and a fixed list of
14
+ * irreversible operations. It is intentionally independent of the analyzer's
15
+ * own classification logic. It is not perfect (no automatic labeler is) — the
16
+ * benchmark reports raw agreement and discloses the labeler so disagreements
17
+ * are auditable rather than hidden.
18
+ *
19
+ * Usage:
20
+ * node benchmarks/build-external-corpus.mjs --tldr /path/to/tldr [--limit 600]
21
+ * TLDR_DIR=/path/to/tldr node benchmarks/build-external-corpus.mjs
22
+ *
23
+ * Writes benchmarks/external-corpus.json (committed, so `npm run bench` is
24
+ * reproducible without a tldr checkout). Re-run this to regenerate it.
25
+ */
26
+
27
+ import { readFileSync, readdirSync, writeFileSync, existsSync } from "node:fs";
28
+ import { join, dirname } from "node:path";
29
+ import { fileURLToPath } from "node:url";
30
+ import { parseArgs } from "node:util";
31
+
32
+ const { values } = parseArgs({
33
+ options: {
34
+ tldr: { type: "string" },
35
+ limit: { type: "string", default: "600" },
36
+ },
37
+ });
38
+
39
+ const here = dirname(fileURLToPath(import.meta.url));
40
+ const tldrDir = values.tldr || process.env.TLDR_DIR;
41
+ const safeLimit = Math.max(50, Number.parseInt(values.limit, 10) || 600);
42
+
43
+ if (!tldrDir || !existsSync(tldrDir)) {
44
+ console.error(
45
+ "Need a tldr-pages checkout. Pass --tldr <dir> or set TLDR_DIR.\n" +
46
+ " git clone --depth 1 https://github.com/tldr-pages/tldr.git",
47
+ );
48
+ process.exit(1);
49
+ }
50
+
51
+ /** Pinned provenance for reproducibility — resolves a symbolic HEAD to its SHA. */
52
+ function tldrCommit() {
53
+ try {
54
+ const head = readFileSync(join(tldrDir, ".git", "HEAD"), "utf8").trim();
55
+ const ref = head.match(/^ref:\s*(.+)$/);
56
+ if (!ref) return head;
57
+ return readFileSync(join(tldrDir, ".git", ref[1]), "utf8").trim();
58
+ } catch {
59
+ return "unknown";
60
+ }
61
+ }
62
+
63
+ /**
64
+ * Turn a tldr example line into a real, literal shell command:
65
+ * - `{{placeholder}}` → its inner text (a realistic argument).
66
+ * - `[-f|--force]` / `[-r|--recursive]` alternative-flag notation → the first
67
+ * form (`-f`, `-r`), so the result is a command a shell would actually accept
68
+ * rather than tldr documentation syntax.
69
+ */
70
+ function fillPlaceholders(cmd) {
71
+ return cmd
72
+ .replace(/\{\{(.*?)\}\}/g, (_, inner) => String(inner).trim() || "arg")
73
+ .replace(/\[([^\]|]+)\|[^\]]+\]/g, (_, first) => String(first).trim());
74
+ }
75
+
76
+ /** Independent, transparent destructive-intent labeler (NOT the analyzer). */
77
+ function labelDestructive(cmd) {
78
+ const c = cmd.trim();
79
+ // Remote code execution: fetch piped into a shell.
80
+ if (/\b(curl|wget|fetch)\b[^|]*\|\s*(sudo\s+)?(sh|bash|zsh|dash|ksh)\b/.test(c)) return true;
81
+ // Strip a leading wrapper so `sudo rm` / `time rm` resolve to their target.
82
+ const stripped = c.replace(/^(sudo|time|nice|ionice|nohup|env)\s+(-\S+\s+)*/, "");
83
+ const m = stripped.match(/^(\/[^\s]*\/)?([a-zA-Z0-9_.-]+)\b(.*)$/);
84
+ if (!m) return false;
85
+ const bin = m[2];
86
+ const rest = m[3] || "";
87
+ const DESTRUCTIVE_BINS = new Set([
88
+ "rm", "rmdir", "shred", "srm", "dd", "mkfs", "fdisk", "parted",
89
+ "wipefs", "mkswap", "blkdiscard", "sgdisk", "unlink",
90
+ ]);
91
+ if (/^mkfs\./.test(bin)) return true;
92
+ if (DESTRUCTIVE_BINS.has(bin)) {
93
+ if (bin === "dd") return /\bof=\/dev\//.test(rest);
94
+ if (bin === "rmdir") return false; // only removes empty dirs
95
+ return true;
96
+ }
97
+ if (bin === "git") {
98
+ if (/\breset\s+--hard\b/.test(rest)) return true;
99
+ if (/\bclean\b.*\s-\S*f/.test(rest)) return true;
100
+ if (/\bpush\b.*(--force\b|\s-f\b)/.test(rest)) return true;
101
+ if (/\bbranch\b.*\s-D\b/.test(rest)) return true;
102
+ if (/\breflog\s+expire\b/.test(rest)) return true;
103
+ if (/\bgc\b.*--prune/.test(rest)) return true;
104
+ if (/\bfilter-branch\b/.test(rest)) return true;
105
+ }
106
+ return false;
107
+ }
108
+
109
+ const dirs = ["common", "linux", "osx"]
110
+ .map((d) => join(tldrDir, "pages", d))
111
+ .filter((d) => existsSync(d));
112
+
113
+ const seen = new Set();
114
+ const destructive = [];
115
+ const safe = [];
116
+
117
+ for (const dir of dirs) {
118
+ const family = dir.split(/[\\/]/).slice(-1)[0]; // split on "/" or "\" so Windows paths from join() resolve
119
+ for (const file of readdirSync(dir)) {
120
+ if (!file.endsWith(".md")) continue;
121
+ const page = file.replace(/\.md$/, "");
122
+ const text = readFileSync(join(dir, file), "utf8");
123
+ for (const line of text.split("\n")) {
124
+ const trimmed = line.trim();
125
+ // tldr example commands are fenced in single backticks on their own line.
126
+ if (!trimmed.startsWith("`") || !trimmed.endsWith("`") || trimmed.length < 4) continue;
127
+ const raw = fillPlaceholders(trimmed.slice(1, -1)).trim();
128
+ if (!raw || raw.length > 240) continue;
129
+ if (!/^[a-zA-Z/.~$]/.test(raw)) continue; // must start like a command
130
+ if (seen.has(raw)) continue;
131
+ seen.add(raw);
132
+ const entry = { cmd: raw, page, family };
133
+ if (labelDestructive(raw)) destructive.push(entry);
134
+ else safe.push(entry);
135
+ }
136
+ }
137
+ }
138
+
139
+ /** Deterministic evenly-spaced stride sample (no RNG, so the build is stable). */
140
+ function stride(list, target) {
141
+ if (list.length <= target) return list.slice();
142
+ const step = list.length / target;
143
+ const out = [];
144
+ for (let i = 0; i < target; i += 1) out.push(list[Math.floor(i * step)]);
145
+ return out;
146
+ }
147
+
148
+ // Enrich ALL destructive examples (they are rare in real docs) and stride-sample
149
+ // safe ones up to the limit. This is disclosed in the report so the imbalance is
150
+ // not mistaken for the natural base rate.
151
+ destructive.sort((a, b) => a.cmd.localeCompare(b.cmd));
152
+ safe.sort((a, b) => a.cmd.localeCompare(b.cmd));
153
+ const sampledSafe = stride(safe, safeLimit);
154
+
155
+ const corpus = {
156
+ source: "tldr-pages",
157
+ url: "https://github.com/tldr-pages/tldr",
158
+ license: "CC-BY-4.0",
159
+ commit: tldrCommit(),
160
+ pages: dirs.map((d) => d.split(/[\\/]/).slice(-2).join("/")),
161
+ labeler: "benchmarks/build-external-corpus.mjs labelDestructive() — independent of the analyzer",
162
+ totals: {
163
+ uniqueCommandsScanned: seen.size,
164
+ destructiveFound: destructive.length,
165
+ safeFound: safe.length,
166
+ safeSampled: sampledSafe.length,
167
+ },
168
+ entries: [...destructive, ...sampledSafe],
169
+ };
170
+
171
+ const outPath = join(here, "external-corpus.json");
172
+ writeFileSync(outPath, JSON.stringify(corpus, null, 2));
173
+ console.log(
174
+ `Wrote ${corpus.entries.length} external commands ` +
175
+ `(${destructive.length} destructive + ${sampledSafe.length}/${safe.length} safe sampled) ` +
176
+ `from ${seen.size} unique tldr examples @ ${corpus.commit.slice(0, 12)} → ${outPath}`,
177
+ );
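To make the two pure helpers in the build script above easier to follow, the sketch below copies `fillPlaceholders` and `stride` verbatim and runs them on small inputs. The example command and the ten-element list are illustrative inputs, not entries from the corpus.

```javascript
// Copied from the build script above so the example runs on its own.
function fillPlaceholders(cmd) {
  return cmd
    .replace(/\{\{(.*?)\}\}/g, (_, inner) => String(inner).trim() || "arg")
    .replace(/\[([^\]|]+)\|[^\]]+\]/g, (_, first) => String(first).trim());
}

function stride(list, target) {
  if (list.length <= target) return list.slice();
  const step = list.length / target;
  const out = [];
  for (let i = 0; i < target; i += 1) out.push(list[Math.floor(i * step)]);
  return out;
}

// Illustrative tldr-style line: placeholder plus alternative-flag notation.
const filled = fillPlaceholders("rm [-r|--recursive] {{path/to/directory}}");
console.log(filled); // rm -r path/to/directory

// Ten items sampled down to four: step is 2.5, so indices 0, 2, 5, 7 are kept.
const sample = stride([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 4);
console.log(sample); // [ 0, 2, 5, 7 ]
```

Since `stride` uses no random numbers and the inputs are sorted first, regenerating the corpus from the same tldr commit should select the same safe commands.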
@@ -0,0 +1,176 @@
1
+ /**
2
+ * Minimal dependency-free SVG chart generator for the benchmark report.
3
+ * Produces grouped bar charts that GitHub renders inline in the README.
4
+ */
5
+
6
+ const PALETTE = {
7
+ legacy: "#9aa0a6",
8
+ current: "#2da44e",
9
+ axis: "#d0d7de",
10
+ text: "#1f2328",
11
+ subtext: "#656d76",
12
+ grid: "#eaeef2",
13
+ bg: "#ffffff",
14
+ };
15
+
16
+ function esc(s) {
17
+ return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
18
+ }
19
+
20
+ /**
21
+ * Grouped vertical bar chart.
22
+ * @param {object} opts
23
+ * @param {string} opts.title
24
+ * @param {string} opts.subtitle
25
+ * @param {string[]} opts.groups x-axis group labels
26
+ * @param {Array<{name:string,color:string,values:number[]}>} opts.series
27
+ * @param {string} [opts.unit] appended to value labels (e.g. "%")
28
+ * @param {number} [opts.max] y-axis max (default 100)
29
+ */
30
+ export function groupedBarChart({ title, subtitle, groups, series, unit = "%", max = 100 }) {
31
+ const W = 720;
32
+ const H = 380;
33
+ const padL = 48;
34
+ const padR = 20;
35
+ const padT = 64;
36
+ const padB = 84;
37
+ const plotW = W - padL - padR;
38
+ const plotH = H - padT - padB;
39
+ const groupW = plotW / groups.length;
40
+ const barGap = 8;
41
+ const barW = (groupW - barGap * (series.length + 1)) / series.length;
42
+
43
+ const parts = [];
44
+ parts.push(`<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}" viewBox="0 0 ${W} ${H}" font-family="-apple-system,Segoe UI,Roboto,Helvetica,Arial,sans-serif">`);
45
+ parts.push(`<rect width="${W}" height="${H}" fill="${PALETTE.bg}"/>`);
46
+ parts.push(`<text x="${padL}" y="28" font-size="17" font-weight="700" fill="${PALETTE.text}">${esc(title)}</text>`);
47
+ if (subtitle) parts.push(`<text x="${padL}" y="47" font-size="12" fill="${PALETTE.subtext}">${esc(subtitle)}</text>`);
48
+
49
+ // Gridlines + y labels.
50
+ const ticks = 5;
51
+ for (let t = 0; t <= ticks; t += 1) {
52
+ const v = (max / ticks) * t;
53
+ const y = padT + plotH - (v / max) * plotH;
54
+ parts.push(`<line x1="${padL}" y1="${y.toFixed(1)}" x2="${W - padR}" y2="${y.toFixed(1)}" stroke="${PALETTE.grid}" stroke-width="1"/>`);
55
+ parts.push(`<text x="${padL - 8}" y="${(y + 4).toFixed(1)}" font-size="11" text-anchor="end" fill="${PALETTE.subtext}">${v}${unit}</text>`);
56
+ }
57
+
58
+ // Bars.
59
+ groups.forEach((g, gi) => {
60
+ const gx = padL + gi * groupW;
61
+ series.forEach((s, si) => {
62
+ const v = Math.max(0, Math.min(max, s.values[gi] ?? 0));
63
+ const bh = (v / max) * plotH;
64
+ const x = gx + barGap + si * (barW + barGap);
65
+ const y = padT + plotH - bh;
66
+ parts.push(`<rect x="${x.toFixed(1)}" y="${y.toFixed(1)}" width="${barW.toFixed(1)}" height="${bh.toFixed(1)}" rx="3" fill="${s.color}"/>`);
67
+ parts.push(`<text x="${(x + barW / 2).toFixed(1)}" y="${(y - 5).toFixed(1)}" font-size="11" font-weight="600" text-anchor="middle" fill="${PALETTE.text}">${Math.round(v)}${unit}</text>`);
68
+ });
69
+ parts.push(`<text x="${(gx + groupW / 2).toFixed(1)}" y="${(padT + plotH + 18).toFixed(1)}" font-size="11" text-anchor="middle" fill="${PALETTE.text}">${esc(g)}</text>`);
70
+ });
71
+
72
+ // Axis line.
73
+ parts.push(`<line x1="${padL}" y1="${padT + plotH}" x2="${W - padR}" y2="${padT + plotH}" stroke="${PALETTE.axis}" stroke-width="1.5"/>`);
74
+
75
+ // Legend.
76
+ const legendY = H - 26;
77
+ let lx = padL;
78
+ series.forEach((s) => {
79
+ parts.push(`<rect x="${lx}" y="${legendY - 10}" width="12" height="12" rx="2" fill="${s.color}"/>`);
80
+ parts.push(`<text x="${lx + 18}" y="${legendY}" font-size="12" fill="${PALETTE.text}">${esc(s.name)}</text>`);
81
+ lx += 24 + s.name.length * 7.2;
82
+ });
83
+
84
+ parts.push("</svg>");
85
+ return parts.join("\n");
86
+ }
87
+
88
+ /**
89
+ * Categorical capability matrix: rows = capabilities, columns = platforms,
90
+ * each cell colored by enforcement level. Honest, citable comparison.
91
+ * @param {object} opts
92
+ * @param {string[]} opts.columns
93
+ * @param {Array<{capability:string, cells:string[]}>} opts.rows cell ∈ levels keys
94
+ */
95
+ export function capabilityMatrix({ title, subtitle, columns, rows }) {
96
+ const levels = {
97
+ Enforced: { fill: "#2da44e", text: "#ffffff", label: "Enforced" },
98
+ Partial: { fill: "#d4a72c", text: "#1f2328", label: "Partial" },
99
+ "Prompt-only": { fill: "#dbe9d5", text: "#1f2328", label: "Prompt-only" },
100
+ None: { fill: "#eaeef2", text: "#656d76", label: "None" },
101
+ };
102
+ const W = 760;
103
+ const padL = 300;
104
+ const padT = 70;
105
+ const rowH = 38;
106
+ const colW = (W - padL - 16) / columns.length;
107
+ const legendH = 30;
108
+ const H = padT + rows.length * rowH + legendH + 16;
109
+
110
+ const parts = [];
111
+ parts.push(`<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}" viewBox="0 0 ${W} ${H}" font-family="-apple-system,Segoe UI,Roboto,Helvetica,Arial,sans-serif">`);
112
+ parts.push(`<rect width="${W}" height="${H}" fill="${PALETTE.bg}"/>`);
113
+ parts.push(`<text x="20" y="28" font-size="17" font-weight="700" fill="${PALETTE.text}">${esc(title)}</text>`);
114
+ if (subtitle) parts.push(`<text x="20" y="47" font-size="12" fill="${PALETTE.subtext}">${esc(subtitle)}</text>`);
115
+
116
+ // Column headers.
117
+ columns.forEach((c, ci) => {
118
+ const x = padL + ci * colW + colW / 2;
119
+ parts.push(`<text x="${x.toFixed(1)}" y="${padT - 8}" font-size="12.5" font-weight="700" text-anchor="middle" fill="${PALETTE.text}">${esc(c)}</text>`);
120
+ });
121
+
122
+ rows.forEach((r, ri) => {
123
+ const y = padT + ri * rowH;
124
+ parts.push(`<text x="${padL - 14}" y="${y + rowH / 2 + 4}" font-size="12" text-anchor="end" fill="${PALETTE.text}">${esc(r.capability)}</text>`);
125
+ r.cells.forEach((cell, ci) => {
126
+ const lv = levels[cell] || levels.None;
127
+ const x = padL + ci * colW + 4;
128
+ parts.push(`<rect x="${x.toFixed(1)}" y="${y + 4}" width="${(colW - 8).toFixed(1)}" height="${rowH - 8}" rx="4" fill="${lv.fill}"/>`);
129
+ parts.push(`<text x="${(x + (colW - 8) / 2).toFixed(1)}" y="${y + rowH / 2 + 4}" font-size="11" font-weight="600" text-anchor="middle" fill="${lv.text}">${lv.label}</text>`);
130
+ });
131
+ });
132
+
133
+ // Legend.
134
+ const ly = padT + rows.length * rowH + 22;
135
+ let lx = padL - 14;
136
+ for (const key of ["Enforced", "Partial", "Prompt-only", "None"]) {
137
+ const lv = levels[key];
138
+ parts.push(`<rect x="${lx}" y="${ly - 11}" width="12" height="12" rx="2" fill="${lv.fill}"/>`);
139
+ parts.push(`<text x="${lx + 17}" y="${ly}" font-size="11.5" fill="${PALETTE.text}">${esc(key)}</text>`);
140
+ lx += 30 + key.length * 7;
141
+ }
142
+
143
+ parts.push("</svg>");
144
+ return parts.join("\n");
145
+ }
146
+
147
+ /** Horizontal bar chart for a single-series scorecard with long labels. */
148
+ export function horizontalBarChart({ title, subtitle, rows, unit = "", max }) {
149
+ const W = 720;
150
+ const rowH = 38;
151
+ const padT = 64;
152
+ const padB = 24;
153
+ const padL = 230;
154
+ const padR = 70;
155
+ const H = padT + rows.length * rowH + padB;
156
+ const plotW = W - padL - padR;
157
+ const top = Math.max(max ?? Math.max(...rows.map((r) => r.value)) * 1.15, 1);
158
+
159
+ const parts = [];
160
+ parts.push(`<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}" viewBox="0 0 ${W} ${H}" font-family="-apple-system,Segoe UI,Roboto,Helvetica,Arial,sans-serif">`);
161
+ parts.push(`<rect width="${W}" height="${H}" fill="${PALETTE.bg}"/>`);
162
+ parts.push(`<text x="20" y="28" font-size="17" font-weight="700" fill="${PALETTE.text}">${esc(title)}</text>`);
163
+ if (subtitle) parts.push(`<text x="20" y="47" font-size="12" fill="${PALETTE.subtext}">${esc(subtitle)}</text>`);
164
+
165
+ rows.forEach((r, i) => {
166
+ const y = padT + i * rowH;
167
+ const bw = (Math.min(r.value, top) / top) * plotW;
168
+ parts.push(`<text x="${padL - 12}" y="${y + rowH / 2 + 4}" font-size="12" text-anchor="end" fill="${PALETTE.text}">${esc(r.label)}</text>`);
169
+ parts.push(`<rect x="${padL}" y="${y + 6}" width="${plotW}" height="${rowH - 16}" rx="3" fill="${PALETTE.grid}"/>`);
170
+ parts.push(`<rect x="${padL}" y="${y + 6}" width="${bw.toFixed(1)}" height="${rowH - 16}" rx="3" fill="${r.color || PALETTE.current}"/>`);
171
+ parts.push(`<text x="${(padL + bw + 8).toFixed(1)}" y="${y + rowH / 2 + 4}" font-size="12" font-weight="600" fill="${PALETTE.text}">${esc(r.display ?? r.value + unit)}</text>`);
172
+ });
173
+
174
+ parts.push("</svg>");
175
+ return parts.join("\n");
176
+ }
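The bar placement in `groupedBarChart` above relies on SVG's y axis growing downward: a value is clamped to the axis range, scaled to the plot height, and then subtracted from the baseline. The sketch below isolates that mapping using the same layout constants. `barGeometry` is a hypothetical helper introduced for illustration and does not exist in the module.

```javascript
// Layout constants as in groupedBarChart above.
const H = 380;
const padT = 64;
const padB = 84;
const plotH = H - padT - padB; // 232

// Map a value to the bar's height and top edge; SVG y grows downward,
// so taller bars get a smaller y.
function barGeometry(value, max = 100) {
  const v = Math.max(0, Math.min(max, value)); // clamp to the axis range
  const height = (v / max) * plotH;
  const y = padT + plotH - height;
  return { height: height.toFixed(1), y: y.toFixed(1) };
}

const half = barGeometry(50);
const clamped = barGeometry(150);
console.log(half);    // { height: '116.0', y: '180.0' }
console.log(clamped); // clamped to max: { height: '232.0', y: '64.0' }
```

Clamping before scaling keeps an out-of-range value from drawing a bar over the title area.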