@dietrichgebert/ponytail 4.8.1

Files changed (37)
  1. package/.opencode/command/ponytail-audit.md +5 -0
  2. package/.opencode/command/ponytail-debt.md +5 -0
  3. package/.opencode/command/ponytail-gain.md +5 -0
  4. package/.opencode/command/ponytail-help.md +5 -0
  5. package/.opencode/command/ponytail-review.md +5 -0
  6. package/.opencode/command/ponytail.md +5 -0
  7. package/.opencode/plugins/ponytail.mjs +100 -0
  8. package/AGENTS.md +32 -0
  9. package/LICENSE +21 -0
  10. package/README.es.md +250 -0
  11. package/README.md +289 -0
  12. package/assets/benchmark-3model.svg +21 -0
  13. package/assets/benchmark-agentic.svg +62 -0
  14. package/assets/logo-dark.png +0 -0
  15. package/assets/logo-dark.svg +115 -0
  16. package/assets/logo.png +0 -0
  17. package/assets/social-preview.png +0 -0
  18. package/hooks/claude-codex-hooks.json +31 -0
  19. package/hooks/copilot-hooks.json +21 -0
  20. package/hooks/ponytail-activate.js +91 -0
  21. package/hooks/ponytail-config.js +122 -0
  22. package/hooks/ponytail-instructions.js +94 -0
  23. package/hooks/ponytail-mode-tracker.js +55 -0
  24. package/hooks/ponytail-runtime.js +51 -0
  25. package/hooks/ponytail-statusline.ps1 +21 -0
  26. package/hooks/ponytail-statusline.sh +12 -0
  27. package/package.json +43 -0
  28. package/pi-extension/index.js +189 -0
  29. package/pi-extension/package.json +8 -0
  30. package/pi-extension/test/extension.test.js +167 -0
  31. package/pi-extension/test/helpers.test.js +92 -0
  32. package/skills/ponytail/SKILL.md +117 -0
  33. package/skills/ponytail-audit/SKILL.md +41 -0
  34. package/skills/ponytail-debt/SKILL.md +44 -0
  35. package/skills/ponytail-gain/SKILL.md +50 -0
  36. package/skills/ponytail-help/SKILL.md +69 -0
  37. package/skills/ponytail-review/SKILL.md +57 -0
package/README.md ADDED
@@ -0,0 +1,289 @@
1
+ <p align="center">
2
+ <picture>
3
+ <source media="(prefers-color-scheme: dark)" srcset="assets/logo-dark.png">
4
+ <img src="assets/logo.png" width="220" alt="Ponytail, the lazy senior dev">
5
+ </picture>
6
+ </p>
7
+
8
+ <h1 align="center">Ponytail</h1>
9
+
10
+ <p align="center">
11
+ <em>He says nothing. He writes one line. It works.</em>
12
+ </p>
13
+
14
+ <p align="center">
15
+ <img src="https://img.shields.io/github/stars/DietrichGebert/ponytail?style=flat-square&color=111111&label=stars" alt="Stars">
16
+ <img src="https://img.shields.io/github/v/release/DietrichGebert/ponytail?style=flat-square&color=111111&label=release" alt="Release">
17
+ <img src="https://img.shields.io/badge/works%20with-14%20agents-111111?style=flat-square" alt="Works with 14 agents">
18
+ <img src="https://img.shields.io/badge/license-MIT-111111?style=flat-square" alt="MIT license">
19
+ </p>
20
+
21
+ <p align="center">
22
+ <strong>~54% less code (up to 94%) &middot; ~20% cheaper &middot; ~27% faster &middot; 100% safe</strong><br>
23
+ <sub>Measured on real Claude Code sessions editing a real open-source repo (FastAPI + React), against the same agent with no skill. ~54% is the mean across 12 feature tasks (Haiku 4.5, n=4); it reaches 94% where an agent over-builds (a date picker) and is near zero where the code is already minimal. ponytail keeps every safety guard while a bare "write one-liners" prompt drops one. (The earlier single-shot benchmark reported 80-94% as a flat figure; against a fair agentic baseline that is the per-task ceiling, not the average.) <a href="benchmarks/results/2026-06-18-agentic.md">Full writeup</a> &middot; <a href="benchmarks/">reproduce it</a>.</sub>
24
+ </p>
25
+
26
+ <p align="center">
27
+ <sub><a href="README.es.md">Español</a></sub>
28
+ </p>
29
+
30
+ ---
31
+
32
+ You know him. Long ponytail. Oval glasses. Has been at the company longer than the version control. You show him fifty lines; he looks at them, says nothing, and replaces them with one.
33
+
34
+ Ponytail puts him inside your AI agent.
35
+
36
+ ## Before / after
37
+
38
+ You ask for a date picker. Your agent installs flatpickr, writes a wrapper component, adds a stylesheet, and starts a discussion about timezones.
39
+
40
+ With ponytail:
41
+
42
+ ```html
43
+ <!-- ponytail: browser has one -->
44
+ <input type="date">
45
+ ```
46
+
47
+ More survivors in [examples/](examples/).
48
+
49
+ ## Numbers
50
+
51
+ The honest measurement is a real agent doing real work: a headless Claude Code session editing [tiangolo's full-stack-fastapi-template](https://github.com/fastapi/full-stack-fastapi-template) (a real FastAPI + React repo), scored on the `git diff` it leaves behind. Twelve feature tickets, the same agent with and without the skill, n=4, Haiku 4.5.
52
+
53
+ <p align="center">
54
+ <img src="assets/benchmark-agentic.svg" width="860" alt="Each arm as a percent of the no-skill baseline across LOC, tokens, cost and time (Haiku 4.5). ponytail is lowest on LOC and tokens (LOC 46%, tokens 78%, cost 80%, time 73%); yagni-oneliner is slightly lower on cost (78%) and time (70%), with LOC 67% and tokens 86%; caveman rises above 100% on tokens, cost and time. Safety, separate adversarial tier: baseline, caveman and ponytail 100%, yagni-oneliner 95%.">
55
+ </p>
56
+
57
+ | vs no-skill baseline | LOC | tokens | cost | time | safe |
58
+ |---|--:|--:|--:|--:|--:|
59
+ | **ponytail** | **-54%** | **-22%** | **-20%** | **-27%** | **100%** |
60
+ | caveman (terse-prose control) | -20% | +7% | +3% | +2% | 100% |
61
+ | "YAGNI + one-liners" prompt | -33% | -14% | -21% | -30% | 95% |
62
+
63
+ ponytail is the only arm that cuts every metric while staying fully safe: caveman stays safe but raises tokens, cost, and time, and the "YAGNI + one-liners" prompt cuts every metric but falls to 95% on safety. The cut is biggest where there is a real over-build trap (date picker 404 to 23 lines, color picker 287 to 23, because it reaches for a native `<input>` instead of a component) and near zero on code that is already minimal. Full method, per-task tables, and limitations: [benchmarks/results/2026-06-18-agentic.md](benchmarks/results/2026-06-18-agentic.md).
64
+
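The chart reports each arm as a percent of the no-skill baseline, while the table reports the change from it. A minimal sketch of that conversion, using the ponytail figures from the chart (`toChange` is an illustrative helper, not part of the benchmark code):

```javascript
// Convert "percent of baseline" (chart) into "percent change" (table).
// toChange is a hypothetical helper for illustration only.
function toChange(percentOfBaseline) {
  const delta = percentOfBaseline - 100;
  return (delta > 0 ? "+" : "") + delta + "%";
}

// ponytail's values as shown in the chart.
const ponytail = { LOC: 46, tokens: 78, cost: 80, time: 73 };

const changes = Object.fromEntries(
  Object.entries(ponytail).map(([metric, pct]) => [metric, toChange(pct)])
);

console.log(changes);
```

Under this reading, 46% of baseline LOC is the table's -54%, and a value above 100 (caveman's 107% on tokens) comes out as an increase.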
65
+ <details>
66
+ <summary><strong>Older single-shot numbers (isolated generation)</strong></summary>
67
+
68
+ Five everyday tasks, three models, three arms (no skill, [caveman](https://github.com/JuliusBrussee/caveman), ponytail), ten runs, median reported. One prompt, one completion, counting lines of the answer:
69
+
70
+ <p align="center">
71
+ <img src="assets/benchmark-3model.svg" width="860" alt="Median lines of code per arm across Haiku, Sonnet and Opus">
72
+ </p>
73
+
74
+ This showed **80-94% less code**. [#126](https://github.com/DietrichGebert/ponytail/issues/126) fairly pointed out that the bare-model baseline pads its answer with prose and options, so that gap is partly a conversational-baseline artifact. The agentic numbers above are the corrected, defensible version. Reproduce the single-shot run with `npx promptfoo eval -c benchmarks/promptfooconfig.yaml`.
75
+
76
+ </details>
77
+
78
+ **The rule was never "fewest tokens."** It is: write only what the task needs, and never cut validation, error handling, security, or accessibility. The code ends up small because it is necessary, not golfed. Lower cost and latency are a side effect on the models that follow the ladder; a terse reasoning model that spends thinking tokens deliberating the rungs can go the other way (on GPT-5.5 it does).
79
+
80
+ ## How it works
81
+
82
+ Before writing code, the agent stops at the first rung that holds:
83
+
84
+ ```
85
+ 1. Does this need to exist? → no: skip it (YAGNI)
86
+ 2. Already in this codebase? → reuse it, don't rewrite
87
+ 3. Stdlib does it? → use it
88
+ 4. Native platform feature? → use it
89
+ 5. Installed dependency? → use it
90
+ 6. One line? → one line
91
+ 7. Only then: the minimum that works
92
+ ```
93
+
94
+ The ladder runs *after* it understands the problem, not instead of it: it reads the code the change touches and traces the real flow before picking a rung. Lazy about the solution, never about reading.
95
+
96
+ Lazy, not negligent: trust-boundary validation, data-loss handling, security, and accessibility are never on the chopping block.
97
+
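The ladder is a first-match rule: walk the rungs in order and stop at the first one that holds. A minimal sketch of that control flow follows. The `facts` fields and the `pickRung` name are hypothetical; the real skill is prose instructions for the agent, which answers these questions by reading the code rather than from a lookup table.

```javascript
// A sketch of "stop at the first rung that holds".
// Each rung pairs a check with the action taken when the check passes.
// The boolean fields on `facts` are assumptions made for illustration.
const LADDER = [
  { holds: (f) => !f.needed, action: "skip it (YAGNI)" },
  { holds: (f) => f.inCodebase, action: "reuse it, don't rewrite" },
  { holds: (f) => f.inStdlib, action: "use the stdlib" },
  { holds: (f) => f.nativeFeature, action: "use the native platform feature" },
  { holds: (f) => f.installedDependency, action: "use the installed dependency" },
  { holds: (f) => f.fitsOneLine, action: "one line" },
];

function pickRung(facts) {
  // Array.prototype.find returns the first rung whose check is truthy.
  const rung = LADDER.find((r) => r.holds(facts));
  return rung ? rung.action : "the minimum that works";
}

// The date picker from "Before / after": needed, and the browser has one.
console.log(pickRung({ needed: true, nativeFeature: true }));
```

Order matters here: a feature that is both in the stdlib and available as an installed dependency resolves to the stdlib, because that rung comes first.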
98
+ ## Install
99
+
100
+ The most effort ponytail will ever ask of you:
101
+
102
+ The Claude Code and Codex plugins run two tiny Node.js lifecycle hooks, so `node` needs to be on your PATH (note for Nix/nvm users: it must be on the non-interactive shell's PATH). If it isn't, the skills still work; the always-on activation just stays quiet instead of erroring on every prompt.
103
+
104
+ ### Claude Code
105
+
106
+ ```
107
+ /plugin marketplace add DietrichGebert/ponytail
108
+ ```
109
+ ```
110
+ /plugin install ponytail@ponytail
111
+ ```
112
+ (Send these as two separate prompts; the install does not work if both go in one.)
113
+
114
+ The desktop app has no `/plugin` command. Install it from the UI instead: Customize, the + by personal plugins, Create plugin and add marketplace, Add from repository, then enter the repo URL (thanks @NiklasDHahn, #98).
115
+
116
+ ### Codex
117
+
118
+ ```bash
119
+ codex plugin marketplace add DietrichGebert/ponytail
120
+ codex
121
+ ```
122
+
123
+ Open `/plugins`, select the Ponytail marketplace, and install Ponytail. Then
124
+ open `/hooks`, review and trust its two lifecycle hooks, and start a new thread.
125
+
126
+ This same install also covers the Codex desktop app: restart the app after installing and it picks up the plugin.
127
+
128
+ ### GitHub Copilot CLI
129
+
130
+ ```bash
131
+ copilot plugin marketplace add DietrichGebert/ponytail
132
+ copilot plugin install ponytail@ponytail
133
+ ```
134
+
135
+ In an interactive Copilot CLI session, use the slash equivalents:
136
+
137
+ ```
138
+ /plugin marketplace add DietrichGebert/ponytail
139
+ /plugin install ponytail@ponytail
140
+ ```
141
+
142
+ Copilot CLI namespaces plugin commands by plugin name. For example:
143
+
144
+ ```text
145
+ /ponytail:ponytail ultra
146
+ /ponytail:ponytail-review
147
+ ```
148
+
149
+ ### Pi agent harness
150
+
151
+ ```
152
+ pi install git:github.com/DietrichGebert/ponytail
153
+ ```
154
+
155
+ ### OpenCode
156
+
157
+ Add to `opencode.json`:
158
+
159
+ ```json
160
+ { "plugin": ["@dietrichgebert/ponytail"] }
161
+ ```
162
+
163
+ Run from a checkout instead (the plugin reuses `hooks/` and `skills/`):
164
+
165
+ ```json
166
+ { "plugin": ["./.opencode/plugins/ponytail.mjs"] }
167
+ ```
168
+
169
+ Injects the ruleset every turn at the active level; adds the `/ponytail` commands (see [Commands](#commands)). OpenCode also auto-loads this repo's `AGENTS.md`, so the rules hold even without the plugin. The plugin adds the `lite/full/ultra/off` levels.
170
+
171
+ The `./` path resolves against your project's `opencode.json`; to share one checkout across projects, point it at the absolute path of the `.mjs` instead (it finds its `hooks/` and `skills/` relative to its own file).
172
+
173
+ The plugin path loads the ruleset everywhere, but the `/ponytail` commands are separate files in `.opencode/command/` that OpenCode only discovers from your project or the global commands dir. To use them outside this checkout, link them once: `ln -sf /absolute/path/to/ponytail/.opencode/command/* ~/.config/opencode/command/`.
174
+
175
+ ### Gemini CLI
176
+
177
+ ```bash
178
+ gemini extensions install https://github.com/DietrichGebert/ponytail
179
+ ```
180
+
181
+ Loads the ruleset as always-on context every session and registers the `/ponytail` commands; the `skills/` ship too, activated when a task needs them.
182
+ The Gemini adapter intentionally does not ship a root `hooks/hooks.json`: Gemini auto-loads that path, while Ponytail's lifecycle hooks use Claude/Codex event names.
183
+
184
+ ### Antigravity CLI
185
+
186
+ Google is renaming Gemini CLI to Antigravity CLI (the `agy` binary); the same extension installs there:
187
+
188
+ ```bash
189
+ agy plugin install https://github.com/DietrichGebert/ponytail
190
+ ```
191
+
192
+ It reuses this repo's `gemini-extension.json`. One difference: Antigravity converts the `/ponytail` commands into skills, so you type them into the chat (e.g. `/ponytail-review` as a message) instead of picking them from a slash menu. Until the migration completes (around June 18, 2026), `gemini extensions install` still works too. To run it as an always-on rule instead, drop the ruleset into `.agents/rules/`.
193
+
194
+ ### CodeWhale
195
+
196
+ Reads `AGENTS.md` from the project root, zero setup. Copy [`AGENTS.md`](AGENTS.md) to your project, or run `codewhale` from a checkout of this repo. That's it.
197
+
198
+ ### Swival
199
+
200
+ Stage the collection in your library first, then add the skills you want:
201
+
202
+ ```bash
203
+ swival skills add --global https://github.com/DietrichGebert/ponytail # stage into ~/.config/swival/library
204
+ swival skills add ponytail # install the collection into this project
205
+ swival skills add --global ponytail # or activate it in every project
206
+ ```
207
+
208
+ Swival also reads `AGENTS.md` from the project root and `~/.config/swival/AGENTS.md` globally, the instruction-only fallback.
209
+
210
+ On the command line, use a `$` prefix to explicitly activate a skill. For example: `$ponytail-review`.
211
+
212
+ ### OpenClaw
213
+
214
+ ```bash
215
+ clawhub install ponytail
216
+ ```
217
+
218
+ Installs ponytail as an OpenClaw skill from ClawHub; the review, audit, debt, gain, and help skills install the same way (`clawhub install ponytail-review`, and so on). OpenClaw applies it on coding tasks and also exposes it as a `/ponytail` command. Without ClawHub, copy [`.openclaw/skills/ponytail`](.openclaw/skills/) into `~/.openclaw/skills/`.
219
+
220
+ That was it. He'd be proud. He won't say it.
221
+
222
+ Active every session, with a handful of commands (see [Commands](#commands)). `/ponytail ultra` exists for when the codebase has wronged you personally. Startup and mode-change text shows the current mode.
223
+
224
+ Set the level for every new session with the `PONYTAIL_DEFAULT_MODE` env var (`lite`/`full`/`ultra`/`off`), or a `defaultMode` field in `~/.config/ponytail/config.json` (`%APPDATA%\ponytail\config.json` on Windows). The default is `full`.
225
+
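One way to picture how the default level could be resolved is sketched below. This is an assumption-heavy illustration, not the code in `hooks/ponytail-config.js`: the README does not say whether the env var or the config file wins, so the sketch arbitrarily lets the env var take precedence and ignores unrecognized values.

```javascript
// Sketch of resolving the default level.
// Assumption: the env var wins over the config file, and invalid values
// are skipped. Neither behavior is confirmed by the README.
const MODES = ["lite", "full", "ultra", "off"];

// `env` is an object shaped like process.env; `config` is the parsed
// config.json, or null when the file is missing. Passing both in keeps
// the function free of file-system access.
function resolveDefaultMode(env, config) {
  const candidates = [env.PONYTAIL_DEFAULT_MODE, config && config.defaultMode];
  const chosen = candidates.find((m) => MODES.includes(m));
  return chosen || "full"; // the documented default
}

console.log(resolveDefaultMode({ PONYTAIL_DEFAULT_MODE: "ultra" }, null));
console.log(resolveDefaultMode({}, { defaultMode: "lite" }));
console.log(resolveDefaultMode({}, null));
```

With nothing set, the sketch falls back to `full`, matching the documented default.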
226
+ Cursor, Windsurf, Cline, GitHub Copilot (editor), Aider, Kiro, Zed, CodeWhale, Swival: copy the matching rules file from this repo ([`.cursor/rules/`](.cursor/rules/), [`.windsurf/rules/`](.windsurf/rules/), [`.clinerules/`](.clinerules/), [`.github/copilot-instructions.md`](.github/copilot-instructions.md), [`AGENTS.md`](AGENTS.md), [`.kiro/steering/`](.kiro/steering/)).
227
+
228
+ Kiro: copy `.kiro/steering/ponytail.md` to `~/.kiro/steering/` (global) or `.kiro/steering/` in your project.
229
+
230
+ GitHub Copilot CLI fallback (instruction-only mode): it reads `AGENTS.md` and `.github/copilot-instructions.md` in a project, or copy the rules into `~/.copilot/copilot-instructions.md` to run ponytail in every project. This path keeps always-on guidance, but does not add plugin mode switches or hooks.
231
+
232
+ VS Code with the Codex extension reads `AGENTS.md`, which this repo ships, so it works from the repo root with no setup (copy it to `~/.codex/AGENTS.md` to apply it to every Codex project).
233
+
234
+ Which files map to which agent: [Agent portability](docs/agent-portability.md).
235
+
236
+ ### Uninstall
237
+
238
+ | Host | Command |
239
+ |------|---------|
240
+ | Claude Code | `/plugin remove ponytail` |
241
+ | Codex | `codex plugin remove ponytail` |
242
+ | Pi agent | `pi uninstall ponytail` |
243
+ | Cursor / Windsurf / Cline / etc. | Delete the copied rule file |
244
+
245
+ These remove the plugin's own files. They leave behind a small amount of state ponytail writes outside the plugin folder: the mode flag, `~/.config/ponytail/config.json`, and (if you accepted the setup nudge) a `statusLine` entry in `~/.claude/settings.json`. Run `node scripts/uninstall.js` to clean those up too. **Run it before the host remove command above** — the script is itself a plugin file, so removing the plugin first deletes it (or run it from a separate clone of this repo). It only removes the statusLine entry if it points at ponytail's own script, so a statusline you set up yourself is left untouched.
246
+
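The "only if it points at ponytail's own script" guard can be sketched as follows. This is not the contents of `scripts/uninstall.js`; it assumes `statusLine` is an object with a `command` string and that ponytail's script is recognizable by the `ponytail-statusline` file name shipped in `hooks/`.

```javascript
// Sketch of the "only remove our own statusLine" guard.
// Assumptions: settings.statusLine.command is a string, and ponytail's
// script can be identified by the "ponytail-statusline" file name.
function removeOwnStatusLine(settings) {
  const command = settings.statusLine && settings.statusLine.command;
  if (typeof command !== "string" || !command.includes("ponytail-statusline")) {
    return settings; // not ours: leave it untouched
  }
  // Drop only the statusLine key and keep every other setting.
  const { statusLine, ...rest } = settings;
  return rest;
}

const mine = { theme: "dark", statusLine: { command: "bash hooks/ponytail-statusline.sh" } };
const yours = { theme: "dark", statusLine: { command: "my-own-status.sh" } };

console.log(removeOwnStatusLine(mine));
console.log(removeOwnStatusLine(yours));
```

The function returns a new object rather than mutating its input, so a caller can compare before and after and skip writing the file when nothing changed.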
247
+ ## Commands
248
+
249
+ | Command | What it does |
250
+ |---------|--------------|
251
+ | `/ponytail [lite \| full \| ultra \| off]` | Set the intensity, or turn it off. No argument reports the current level. |
252
+ | `/ponytail-review` | Review the current diff for over-engineering and hand back a delete-list. |
253
+ | `/ponytail-audit` | Audit the whole repo for over-engineering, not just the diff. |
254
+ | `/ponytail-debt` | Harvest the `ponytail:` shortcuts you've deferred into a ledger, so "later" doesn't become "never". |
255
+ | `/ponytail-gain` | Show the measured impact scoreboard (less code, less cost, more speed) from the benchmark. |
256
+ | `/ponytail-help` | Quick reference for the commands above. |
257
+
258
+ Commands need a skill-capable host (Claude Code, Codex, OpenCode, Gemini, pi, Swival). In Codex they're skills; invoke them with `@` (`@ponytail-review`). The instruction-only adapters (Cursor, Windsurf, Cline, Copilot, Kiro, Antigravity) load the always-on ruleset without the commands.
259
+
260
+ ## Development
261
+
262
+ When changing the compact rule text, keep the agent copies aligned:
263
+
264
+ ```bash
265
+ node scripts/check-rule-copies.js
266
+ npm test
267
+ ```
268
+
269
+ The OpenClaw skill package (`.openclaw/skills/`) is generated from `skills/`; rerun `node scripts/build-openclaw-skills.js` after changing a skill, because the test suite fails if the package is stale. To publish the skills to ClawHub, run `clawhub login` once, then `node scripts/publish-openclaw-skills.js` (it publishes all six at the `package.json` version; pass `--dry-run` to preview).
270
+
271
+ The correctness benchmark spawns Python for email and CSV checks; `python3` is tried before `python`. CSV checks need `pandas` installed locally.
272
+
273
+ ## FAQ
274
+
275
+ **Does it need a config file?**
276
+ No. An optional `~/.config/ponytail/config.json` or `PONYTAIL_DEFAULT_MODE` env var can set the default level, but nothing is required.
277
+
278
+ **What if I really need the 120-line cache class?**
279
+ You don't. Insist anyway and he'll build it. Slowly. Correctly. While looking at you.
280
+
281
+ **Does it scale?**
282
+ The code you never wrote scales infinitely. Zero bugs, zero CVEs, 100% uptime since forever.
283
+
284
+ **Why "ponytail"?**
285
+ You know exactly why.
286
+
287
+ ## License
288
+
289
+ [MIT](LICENSE). The shortest license that works.
@@ -0,0 +1,21 @@
1
+ <svg viewBox="0 0 860 336" xmlns="http://www.w3.org/2000/svg" font-family="-apple-system, 'Segoe UI', Helvetica, Arial, sans-serif">
2
+ <title>Median lines of code per arm across three models</title>
3
+ <text x="20" y="26" font-size="15" font-weight="600" fill="#8b949e">Median lines of code. 10 runs per cell. Lower is leaner.</text>
4
+ <text x="20" y="45" font-size="12" fill="#8b949e" opacity="0.85">Ponytail writes 80-94% less code, costs 42-75% less, and runs 3-6x faster than a no-skill agent.</text>
5
+ <rect x="20" y="58" width="12" height="12" rx="2" fill="#8b949e"/><text x="38" y="69" font-size="13" fill="#8b949e">baseline (no skill)</text>
6
+ <rect x="190" y="58" width="12" height="12" rx="2" fill="#d9822b"/><text x="208" y="69" font-size="13" fill="#8b949e">caveman</text>
7
+ <rect x="300" y="58" width="12" height="12" rx="2" fill="#2da44e"/><text x="318" y="69" font-size="13" fill="#8b949e">ponytail</text>
8
+ <text x="112" y="119" font-size="13" font-weight="600" fill="#8b949e" text-anchor="end">Haiku</text>
9
+ <rect x="120" y="92" width="508" height="14" rx="2" fill="#8b949e"/><text x="634" y="103" font-size="11" fill="#8b949e">518</text>
10
+ <rect x="120" y="110" width="114" height="14" rx="2" fill="#d9822b"/><text x="240" y="121" font-size="11" fill="#d9822b">116</text>
11
+ <rect x="120" y="128" width="38" height="14" rx="2" fill="#2da44e"/><text x="164" y="139" font-size="11" fill="#2da44e" font-weight="600">39</text>
12
+ <text x="112" y="193" font-size="13" font-weight="600" fill="#8b949e" text-anchor="end">Sonnet</text>
13
+ <rect x="120" y="166" width="680" height="14" rx="2" fill="#8b949e"/><text x="806" y="177" font-size="11" fill="#8b949e">693</text>
14
+ <rect x="120" y="184" width="118" height="14" rx="2" fill="#d9822b"/><text x="244" y="195" font-size="11" fill="#d9822b">120</text>
15
+ <rect x="120" y="202" width="43" height="14" rx="2" fill="#2da44e"/><text x="169" y="213" font-size="11" fill="#2da44e" font-weight="600">44</text>
16
+ <text x="112" y="267" font-size="13" font-weight="600" fill="#8b949e" text-anchor="end">Opus</text>
17
+ <rect x="120" y="240" width="251" height="14" rx="2" fill="#8b949e"/><text x="377" y="251" font-size="11" fill="#8b949e">256</text>
18
+ <rect x="120" y="258" width="66" height="14" rx="2" fill="#d9822b"/><text x="192" y="269" font-size="11" fill="#d9822b">67</text>
19
+ <rect x="120" y="276" width="50" height="14" rx="2" fill="#2da44e"/><text x="176" y="287" font-size="11" fill="#2da44e" font-weight="600">51</text>
20
+ <text x="120" y="324" font-size="11" fill="#8b949e" opacity="0.8">Median of 10 runs/cell, default temperature. 5 tasks (email, debounce, CSV sum, countdown, rate-limit), same model per group. Reproduce: npx promptfoo eval -c benchmarks/promptfooconfig.yaml</text>
21
+ </svg>
@@ -0,0 +1,62 @@
1
+ <svg viewBox="0 0 860 488" xmlns="http://www.w3.org/2000/svg" font-family="-apple-system, 'Segoe UI', Helvetica, Arial, sans-serif">
2
+ <title>Each arm vs the no-skill baseline across every metric, plus safety, Claude Code on Haiku 4.5</title>
3
+ <text x="430" y="24" font-size="15" font-weight="600" fill="#8b949e" text-anchor="middle">Every metric vs the no-skill baseline (Claude Code, Haiku 4.5, 12 tasks)</text>
4
+
5
+ <rect x="212" y="38" width="12" height="12" rx="2" fill="#8b949e"/><text x="229" y="48" font-size="12" fill="#8b949e">baseline</text>
6
+ <rect x="300" y="38" width="12" height="12" rx="2" fill="#d9822b"/><text x="317" y="48" font-size="12" fill="#8b949e">caveman</text>
7
+ <rect x="392" y="38" width="12" height="12" rx="2" fill="#2da44e"/><text x="409" y="48" font-size="12" fill="#8b949e">ponytail</text>
8
+ <rect x="478" y="38" width="12" height="12" rx="2" fill="#8957e5"/><text x="495" y="48" font-size="12" fill="#8b949e">yagni-oneliner</text>
9
+
10
+ <text x="32" y="248" font-size="12" fill="#8b949e" text-anchor="middle" transform="rotate(-90 32 248)">% of baseline (lower is leaner)</text>
11
+ <line x1="85" y1="360" x2="815" y2="360" stroke="#8b949e" stroke-opacity="0.55"/>
12
+ <line x1="85" y1="305" x2="815" y2="305" stroke="#8b949e" stroke-opacity="0.16"/>
13
+ <line x1="85" y1="250" x2="815" y2="250" stroke="#8b949e" stroke-opacity="0.16"/>
14
+ <line x1="85" y1="195" x2="815" y2="195" stroke="#8b949e" stroke-opacity="0.16"/>
15
+ <line x1="85" y1="140" x2="815" y2="140" stroke="#8b949e" stroke-opacity="0.45" stroke-dasharray="4 4"/>
16
+ <text x="78" y="364" font-size="11" fill="#8b949e" text-anchor="end">0%</text>
17
+ <text x="78" y="309" font-size="11" fill="#8b949e" text-anchor="end">25%</text>
18
+ <text x="78" y="254" font-size="11" fill="#8b949e" text-anchor="end">50%</text>
19
+ <text x="78" y="199" font-size="11" fill="#8b949e" text-anchor="end">75%</text>
20
+ <text x="78" y="144" font-size="11" fill="#8b949e" text-anchor="end">100%</text>
21
+
22
+ <!-- LOC -->
23
+ <rect x="108" y="140" width="30" height="220" rx="2" fill="#8b949e"/><text x="123" y="135" font-size="10" fill="#8b949e" text-anchor="middle">100%</text>
24
+ <rect x="146" y="184" width="30" height="176" rx="2" fill="#d9822b"/><text x="161" y="179" font-size="10" fill="#d9822b" text-anchor="middle">80%</text>
25
+ <rect x="184" y="259" width="30" height="101" rx="2" fill="#2da44e"/><text x="199" y="254" font-size="10" font-weight="600" fill="#2da44e" text-anchor="middle">46%</text>
26
+ <rect x="222" y="213" width="30" height="147" rx="2" fill="#8957e5"/><text x="237" y="208" font-size="10" fill="#8957e5" text-anchor="middle">67%</text>
27
+ <text x="180" y="380" font-size="13" fill="#8b949e" text-anchor="middle">LOC</text>
28
+ <text x="180" y="395" font-size="10" fill="#8b949e" opacity="0.8" text-anchor="middle">base 191</text>
29
+
30
+ <!-- tokens -->
31
+ <rect x="288" y="140" width="30" height="220" rx="2" fill="#8b949e"/><text x="303" y="135" font-size="10" fill="#8b949e" text-anchor="middle">100%</text>
32
+ <rect x="326" y="125" width="30" height="235" rx="2" fill="#d9822b"/><text x="341" y="120" font-size="10" fill="#d9822b" text-anchor="middle">107%</text>
33
+ <rect x="364" y="188" width="30" height="172" rx="2" fill="#2da44e"/><text x="379" y="183" font-size="10" font-weight="600" fill="#2da44e" text-anchor="middle">78%</text>
34
+ <rect x="402" y="171" width="30" height="189" rx="2" fill="#8957e5"/><text x="417" y="166" font-size="10" fill="#8957e5" text-anchor="middle">86%</text>
35
+ <text x="360" y="380" font-size="13" fill="#8b949e" text-anchor="middle">tokens</text>
36
+ <text x="360" y="395" font-size="10" fill="#8b949e" opacity="0.8" text-anchor="middle">base 349k</text>
37
+
38
+ <!-- cost -->
39
+ <rect x="468" y="140" width="30" height="220" rx="2" fill="#8b949e"/><text x="483" y="135" font-size="10" fill="#8b949e" text-anchor="middle">100%</text>
40
+ <rect x="506" y="136" width="30" height="224" rx="2" fill="#d9822b"/><text x="521" y="131" font-size="10" fill="#d9822b" text-anchor="middle">102%</text>
41
+ <rect x="544" y="184" width="30" height="176" rx="2" fill="#2da44e"/><text x="559" y="179" font-size="10" font-weight="600" fill="#2da44e" text-anchor="middle">80%</text>
42
+ <rect x="582" y="188" width="30" height="172" rx="2" fill="#8957e5"/><text x="597" y="183" font-size="10" fill="#8957e5" text-anchor="middle">78%</text>
43
+ <text x="540" y="380" font-size="13" fill="#8b949e" text-anchor="middle">cost</text>
44
+ <text x="540" y="395" font-size="10" fill="#8b949e" opacity="0.8" text-anchor="middle">base $0.10</text>
45
+
46
+ <!-- time -->
47
+ <rect x="648" y="140" width="30" height="220" rx="2" fill="#8b949e"/><text x="663" y="135" font-size="10" fill="#8b949e" text-anchor="middle">100%</text>
48
+ <rect x="686" y="136" width="30" height="224" rx="2" fill="#d9822b"/><text x="701" y="131" font-size="10" fill="#d9822b" text-anchor="middle">102%</text>
49
+ <rect x="724" y="199" width="30" height="161" rx="2" fill="#2da44e"/><text x="739" y="194" font-size="10" font-weight="600" fill="#2da44e" text-anchor="middle">73%</text>
50
+ <rect x="762" y="206" width="30" height="154" rx="2" fill="#8957e5"/><text x="777" y="201" font-size="10" fill="#8957e5" text-anchor="middle">70%</text>
51
+ <text x="720" y="380" font-size="13" fill="#8b949e" text-anchor="middle">time</text>
52
+ <text x="720" y="395" font-size="10" fill="#8b949e" opacity="0.8" text-anchor="middle">base 69s</text>
53
+
54
+ <text x="20" y="418" font-size="11" fill="#8b949e" opacity="0.8">Each bar = that arm's mean as a % of the no-skill baseline (the gray 100% bars). Lower is leaner / cheaper / faster; caveman rises above 100% on tokens, cost and time. n=4.</text>
55
+
56
+ <line x1="20" y1="438" x2="815" y2="438" stroke="#8b949e" stroke-opacity="0.25"/>
57
+ <text x="20" y="460" font-size="11" fill="#8b949e" opacity="0.9">Safety, separate 6-task adversarial tier (path-traversal, SQLi, token forgery, malformed input, rate-limit). Higher is safer:</text>
58
+ <text x="90" y="478" font-size="12" fill="#8b949e">baseline 100%</text>
59
+ <text x="230" y="478" font-size="12" fill="#d9822b">caveman 100%</text>
60
+ <text x="370" y="478" font-size="12" font-weight="600" fill="#2da44e">ponytail 100%</text>
61
+ <text x="510" y="478" font-size="12" fill="#8957e5">yagni-oneliner <tspan fill="#cf222e" font-weight="600">95%</tspan> (dropped a guard once)</text>
62
+ </svg>