selftune 0.2.30 → 0.2.32

Files changed (102)
  1. package/README.md +83 -56
  2. package/apps/local-dashboard/dist/assets/index-B-ut4w0B.js +15 -0
  3. package/apps/local-dashboard/dist/assets/index-BFGfCVrL.css +1 -0
  4. package/apps/local-dashboard/dist/assets/vendor-ui-DfowE3Hu.js +1 -0
  5. package/apps/local-dashboard/dist/index.html +3 -3
  6. package/cli/selftune/command-surface.ts +613 -2
  7. package/cli/selftune/create/baseline.ts +429 -0
  8. package/cli/selftune/create/check.ts +35 -0
  9. package/cli/selftune/create/init.ts +115 -0
  10. package/cli/selftune/create/package-candidate-state.ts +771 -0
  11. package/cli/selftune/create/package-evaluator.ts +710 -0
  12. package/cli/selftune/create/package-fingerprint.ts +142 -0
  13. package/cli/selftune/create/package-search.ts +377 -0
  14. package/cli/selftune/create/publish.ts +431 -0
  15. package/cli/selftune/create/readiness.ts +495 -0
  16. package/cli/selftune/create/replay.ts +330 -0
  17. package/cli/selftune/create/report.ts +74 -0
  18. package/cli/selftune/create/scaffold.ts +121 -0
  19. package/cli/selftune/create/skills-ref-adapter.ts +177 -0
  20. package/cli/selftune/create/status.ts +33 -0
  21. package/cli/selftune/create/templates.ts +249 -0
  22. package/cli/selftune/cron/setup.ts +1 -1
  23. package/cli/selftune/dashboard-action-events.ts +4 -1
  24. package/cli/selftune/dashboard-action-result.ts +789 -24
  25. package/cli/selftune/dashboard-action-stream.ts +80 -0
  26. package/cli/selftune/dashboard-contract.ts +146 -3
  27. package/cli/selftune/dashboard-server.ts +5 -4
  28. package/cli/selftune/eval/hooks-to-evals.ts +58 -35
  29. package/cli/selftune/eval/synthetic-evals.ts +145 -17
  30. package/cli/selftune/evolution/bounded-mutations.ts +1045 -0
  31. package/cli/selftune/evolution/evolve-body.ts +9 -36
  32. package/cli/selftune/evolution/evolve.ts +8 -72
  33. package/cli/selftune/evolution/stopping-criteria.ts +5 -13
  34. package/cli/selftune/evolution/unblock-suggestions.ts +0 -16
  35. package/cli/selftune/evolution/validate-host-replay.ts +115 -15
  36. package/cli/selftune/improve.ts +206 -0
  37. package/cli/selftune/index.ts +123 -6
  38. package/cli/selftune/init.ts +1 -1
  39. package/cli/selftune/localdb/queries/dashboard.ts +30 -0
  40. package/cli/selftune/localdb/schema.ts +52 -0
  41. package/cli/selftune/monitoring/watch.ts +257 -23
  42. package/cli/selftune/orchestrate/execute.ts +300 -1
  43. package/cli/selftune/orchestrate/finalize.ts +14 -0
  44. package/cli/selftune/orchestrate/plan.ts +22 -5
  45. package/cli/selftune/orchestrate/prepare.ts +59 -4
  46. package/cli/selftune/orchestrate/report.ts +1 -1
  47. package/cli/selftune/orchestrate.ts +34 -1
  48. package/cli/selftune/publish.ts +35 -0
  49. package/cli/selftune/registry/github-install.ts +256 -0
  50. package/cli/selftune/registry/index.ts +1 -1
  51. package/cli/selftune/registry/install.ts +58 -7
  52. package/cli/selftune/routes/actions.ts +81 -15
  53. package/cli/selftune/routes/overview.ts +1 -1
  54. package/cli/selftune/routes/skill-report.ts +147 -2
  55. package/cli/selftune/run.ts +18 -0
  56. package/cli/selftune/schedule.ts +3 -3
  57. package/cli/selftune/search-run.ts +703 -0
  58. package/cli/selftune/status.ts +35 -11
  59. package/cli/selftune/testing-readiness.ts +431 -40
  60. package/cli/selftune/types.ts +316 -0
  61. package/cli/selftune/utils/eval-readiness.ts +1 -0
  62. package/cli/selftune/utils/json-output.ts +11 -0
  63. package/cli/selftune/utils/lifecycle-surface.ts +48 -0
  64. package/cli/selftune/utils/query-filter.ts +82 -1
  65. package/cli/selftune/utils/tui.ts +85 -2
  66. package/cli/selftune/verify.ts +205 -0
  67. package/cli/selftune/workflows/proposals.ts +1 -1
  68. package/cli/selftune/workflows/skill-scaffold.ts +141 -63
  69. package/cli/selftune/workflows/workflows.ts +4 -4
  70. package/package.json +1 -1
  71. package/packages/dashboard-core/src/routes/manifest.ts +2 -2
  72. package/packages/ui/src/components/SkillReportPanels.tsx +7 -7
  73. package/packages/ui/src/primitives/button.tsx +5 -0
  74. package/skill/SKILL.md +148 -85
  75. package/skill/references/cli-quick-reference.md +16 -1
  76. package/skill/references/creator-playbook.md +31 -10
  77. package/skill/workflows/Baseline.md +8 -9
  78. package/skill/workflows/Contributions.md +4 -4
  79. package/skill/workflows/Create.md +173 -0
  80. package/skill/workflows/CreateTestDeploy.md +34 -30
  81. package/skill/workflows/Cron.md +2 -2
  82. package/skill/workflows/Dashboard.md +3 -3
  83. package/skill/workflows/Evals.md +13 -7
  84. package/skill/workflows/Evolve.md +75 -32
  85. package/skill/workflows/EvolveBody.md +22 -15
  86. package/skill/workflows/Hook.md +1 -1
  87. package/skill/workflows/Improve.md +168 -0
  88. package/skill/workflows/Initialize.md +3 -3
  89. package/skill/workflows/Orchestrate.md +49 -12
  90. package/skill/workflows/Publish.md +100 -0
  91. package/skill/workflows/Registry.md +19 -13
  92. package/skill/workflows/Run.md +72 -0
  93. package/skill/workflows/Schedule.md +2 -2
  94. package/skill/workflows/SearchRun.md +89 -0
  95. package/skill/workflows/SignalsDashboard.md +2 -2
  96. package/skill/workflows/UnitTest.md +13 -4
  97. package/skill/workflows/Verify.md +136 -0
  98. package/skill/workflows/Watch.md +114 -47
  99. package/skill/workflows/Workflows.md +13 -8
  100. package/apps/local-dashboard/dist/assets/index-BcXquWFB.css +0 -1
  101. package/apps/local-dashboard/dist/assets/index-Coq42hE4.js +0 -15
  102. package/apps/local-dashboard/dist/assets/vendor-ui-B0H8s1mP.js +0 -1
package/README.md CHANGED
@@ -101,7 +101,7 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"

  **I use skills for non-coding work** — Marketing workflows, research pipelines, compliance checks, slide decks. You say "make me a presentation" and nothing happens. selftune learns that "slides", "deck", and "presentation for Monday" all mean the same skill — and fixes the routing automatically.

- ## Creator Loop
+ ## Creator Lifecycle

  If you publish skills, the loop is:

@@ -115,26 +115,40 @@ If you publish skills, the loop is:

  ## How to Test a Skill

- The default creator loop is:
+ The simplified lifecycle is:

  ```bash
+ selftune verify --skill-path path/to/SKILL.md
+ selftune publish --skill-path path/to/SKILL.md
+ selftune search-run --skill-path path/to/SKILL.md --surface both
+ selftune improve --skill my-skill --skill-path path/to/SKILL.md --dry-run --validation-mode replay
+ selftune run --dry-run
+ ```
+
+ What each step gives you:
+
+ - `verify` runs the draft-package readiness check first, then emits the benchmark-style package report once the draft is ready. If readiness is still incomplete, it surfaces the next missing low-level step instead of guessing.
+ - `publish` delegates to the draft-package publish flow and starts `watch` by default. Use `--no-watch` if you want a manual monitoring handoff.
+ - `search-run` evaluates a bounded minibatch of routing/body package variants against the accepted frontier and persists the measured winner plus provenance.
+ - `search-run` is currently an explicit package-improvement surface. `run` / `orchestrate` do not auto-select bounded package search yet.
+ - `improve` is the intention-level alias for `evolve` and `evolve body`. Use `--scope description|routing|body` when you already know the right mutation surface.
+ - `run` is the intention-level alias for `orchestrate`, so you can preview or operate the whole closed loop without remembering the internal command name.
+
+ The advanced lifecycle primitives are still available when you need explicit control:
+
+ ```bash
+ selftune create check --skill-path path/to/SKILL.md
  selftune eval generate --skill my-skill
  selftune eval unit-test --skill my-skill --generate --skill-path path/to/SKILL.md
+ selftune create replay --skill-path path/to/SKILL.md --mode package
+ selftune create baseline --skill-path path/to/SKILL.md --mode package
+ selftune create report --skill-path path/to/SKILL.md
+ selftune create publish --skill-path path/to/SKILL.md --watch
  selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run --validation-mode replay
  selftune grade baseline --skill my-skill --skill-path path/to/SKILL.md
- selftune evolve --skill my-skill --skill-path path/to/SKILL.md --with-baseline
  selftune watch --skill my-skill
  ```

- What each step gives you:
-
- - `eval generate` builds the routing eval set and mirrors a canonical copy into `~/.selftune/eval-sets/<skill>.json`
- - `eval unit-test` creates or runs deterministic skill tests and stores the latest run summary under `~/.selftune/unit-tests/<skill>.last-run.json`
- - `evolve --dry-run --validation-mode replay` proves the candidate against replay-backed validation without deploying
- - `grade baseline` stores a no-skill comparison in SQLite so the dashboard and `selftune status` can tell whether the skill adds value
- - `evolve --with-baseline` is the live deploy step once the creator loop is complete
- - `watch` keeps the deployed skill under regression monitoring
-
  The local dashboard overview, per-skill report, and `selftune status` now all read from those artifacts to show whether a skill is blocked on testing, ready to deploy, or already under watch.

  ## How It Works
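
Reviewer sketch for the hunk above: the five simplified-lifecycle commands can be assembled as argument vectors and printed for review before anything is executed. This is a minimal sketch, not selftune's own code. The names `Invocation`, `simplifiedLifecycle`, and `render` are hypothetical, and only the subcommands and flags that appear in the added README lines are used.

```typescript
// Hypothetical helper types and functions; they are not part of selftune.
type Invocation = readonly string[];

// Build the five simplified-lifecycle invocations exactly as the README lists them.
function simplifiedLifecycle(skill: string, skillPath: string): Invocation[] {
  return [
    ["selftune", "verify", "--skill-path", skillPath],
    ["selftune", "publish", "--skill-path", skillPath],
    ["selftune", "search-run", "--skill-path", skillPath, "--surface", "both"],
    [
      "selftune", "improve", "--skill", skill, "--skill-path", skillPath,
      "--dry-run", "--validation-mode", "replay",
    ],
    ["selftune", "run", "--dry-run"],
  ];
}

// Render each invocation as one shell-style line so it can be reviewed first.
function render(invocations: Invocation[]): string[] {
  return invocations.map((argv) => argv.join(" "));
}

const lifecycleLines = render(simplifiedLifecycle("my-skill", "path/to/SKILL.md"));
console.log(lifecycleLines.join("\n"));
```

Keeping the commands as arrays rather than one concatenated string avoids quoting problems if a skill path contains spaces and the lines are later passed to a process spawner.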
@@ -163,7 +177,7 @@ selftune is an open-source CLI and agent skill that provides skill-level observa

  ### How is selftune different from LLM observability tools?

- LLM observability tools (Langfuse, LangSmith, Arize) trace what happens inside model calls — token usage, latency, chain failures. selftune operates at a different layer: it monitors whether the *right skill was triggered* for the *right query* in the first place. They're complementary, not competitive.
+ LLM observability tools (Langfuse, LangSmith, Arize) trace what happens inside model calls — token usage, latency, chain failures. selftune operates at a different layer: it monitors whether the _right skill was triggered_ for the _right query_ in the first place. They're complementary, not competitive.

  ### How is this different from agents that "learn"?

@@ -181,41 +195,54 @@ selftune is empirical. It observes real sessions, grades execution quality, dete

  Your agent runs these — you just say what you want ("improve my skills", "show the dashboard").

- | Group | Command | What it does |
- | ---------- | -------------------------------------------- | ------------------------------------------------------------------------------------------- |
- | | `selftune status` | Get a one-line health summary plus compact attention / improving highlights |
- | | `selftune last` | Quick insight from the most recent session |
- | | `selftune orchestrate` | Run the full autonomous loop (sync grade evolve → watch) |
- | | `selftune sync` | Replay source-truth transcripts/rollouts into SQLite and refresh repair state |
- | | `selftune dashboard` | Open the visual skill health dashboard |
- | | `selftune doctor` | Health check: logs, hooks, config, permissions |
- | **ingest** | `selftune ingest claude` | Backfill from Claude Code transcripts |
- | | `selftune ingest codex` | Import Codex rollout logs (experimental) |
- | **grade** | `selftune grade --skill <name>` | Grade a skill session with evidence |
- | | `selftune grade auto` | Auto-grade recent sessions for ungraded skills |
- | | `selftune grade baseline --skill <name>` | Measure skill value vs no-skill baseline |
- | **evolve** | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions |
- | | `selftune evolve body --skill <name>` | Evolve full skill body or routing table |
- | | `selftune evolve rollback --skill <name>` | Rollback a previous evolution |
- | **eval** | `selftune eval generate --skill <name>` | Generate eval sets (`--synthetic` for cold-start) |
- | | `selftune eval unit-test --skill <name>` | Run or generate skill-level unit tests |
- | | `selftune eval composability --skill <name>` | Detect conflicts between co-occurring skills |
- | | `selftune eval family-overlap --prefix sc-` | Detect sibling overlap and suggest when a skill family should be consolidated |
- | | `selftune eval import` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
- | **hooks** | `selftune codex install` | Install selftune hooks into Codex (`--dry-run`, `--uninstall`) |
- | | `selftune opencode install` | Install selftune hooks into OpenCode |
- | | `selftune cline install` | Install selftune hooks into Cline |
- | | `selftune pi install` | Install selftune hooks into Pi |
- | **auto** | `selftune cron setup` | Install OS-level scheduling (cron/launchd/systemd) |
- | | `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
- | **other** | `selftune workflows` | Discover and manage multi-skill workflows |
- | | `selftune contributions` | Manage creator-directed sharing preferences |
- | | `selftune creator-contributions` | Create or remove bundled `selftune.contribute.json` configs for skill creators |
- | | `selftune contribute` | Export an anonymized community contribution bundle |
- | | `selftune recover` | Recover SQLite from legacy/exported JSONL during migration or disaster recovery |
- | | `selftune badge --skill <name>` | Generate a health badge for your skill's README |
- | | `selftune telemetry` | Manage anonymous usage analytics (status, enable, disable) |
- | | `selftune alpha upload` | Run a manual SQLite-backed alpha upload cycle and emit a JSON send summary |
+ | Group | Command | What it does |
+ | ---------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------- |
+ | | `selftune status` | Get a one-line health summary plus compact attention / improving highlights |
+ | | `selftune last` | Quick insight from the most recent session |
+ | | `selftune verify --skill-path <path>` | Check draft-package readiness, then emit benchmark-style verification evidence |
+ | | `selftune publish --skill-path <path>` | Publish a verified draft package and start watch by default |
+ | | `selftune search-run --skill-path <path>` | Run bounded package search over routing/body variants against the measured frontier |
+ | | `selftune improve --skill <name>` | Route to the smallest matching evolution surface |
+ | | `selftune run` | Run the full autonomous loop through the simplified lifecycle alias |
+ | | `selftune orchestrate` | Advanced alias for `run` |
+ | | `selftune sync` | Replay source-truth transcripts/rollouts into SQLite and refresh repair state |
+ | | `selftune dashboard` | Open the visual skill health dashboard |
+ | | `selftune doctor` | Health check: logs, hooks, config, permissions |
+ | **ingest** | `selftune ingest claude` | Backfill from Claude Code transcripts |
+ | | `selftune ingest codex` | Import Codex rollout logs (experimental) |
+ | **grade** | `selftune grade --skill <name>` | Grade a skill session with evidence |
+ | | `selftune grade auto` | Auto-grade recent sessions for ungraded skills |
+ | | `selftune grade baseline --skill <name>` | Measure skill value vs no-skill baseline |
+ | **evolve** | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions |
+ | | `selftune evolve body --skill <name>` | Evolve full skill body or routing table |
+ | | `selftune evolve rollback --skill <name>` | Rollback a previous evolution |
+ | **create** | `selftune create init --name <name>` | Initialize a new draft skill package skeleton |
+ | | `selftune create status --skill-path <path>` | Show the current draft-package readiness |
+ | | `selftune create scaffold --from-workflow 1` | Scaffold a draft skill package from an observed workflow |
+ | | `selftune create check --skill-path <path>` | Advanced draft-package readiness primitive behind `verify` |
+ | | `selftune create replay --skill-path <path>` | Replay-validate the current draft package |
+ | | `selftune create baseline --skill-path <path>` | Measure draft-package lift vs a no-skill baseline |
+ | | `selftune create report --skill-path <path>` | Render measured draft-package evidence as a benchmark-style report |
+ | | `selftune create publish --skill-path <path>` | Advanced publish primitive behind `publish` |
+ | **eval** | `selftune eval generate --skill <name>` | Generate eval sets (`--synthetic` for cold-start) |
+ | | `selftune eval unit-test --skill <name>` | Run or generate skill-level unit tests |
+ | | `selftune eval composability --skill <name>` | Detect conflicts between co-occurring skills |
+ | | `selftune eval family-overlap --prefix sc-` | Detect sibling overlap and suggest when a skill family should be consolidated |
+ | | `selftune eval import` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
+ | **hooks** | `selftune codex install` | Install selftune hooks into Codex (`--dry-run`, `--uninstall`) |
+ | | `selftune opencode install` | Install selftune hooks into OpenCode |
+ | | `selftune cline install` | Install selftune hooks into Cline |
+ | | `selftune pi install` | Install selftune hooks into Pi |
+ | **auto** | `selftune cron setup` | Install OS-level scheduling (cron/launchd/systemd) |
+ | | `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
+ | **other** | `selftune workflows` | Discover and manage multi-skill workflows |
+ | | `selftune contributions` | Manage creator-directed sharing preferences |
+ | | `selftune creator-contributions` | Create or remove bundled `selftune.contribute.json` configs for skill creators |
+ | | `selftune contribute` | Export an anonymized community contribution bundle |
+ | | `selftune recover` | Recover SQLite from legacy/exported JSONL during migration or disaster recovery |
+ | | `selftune badge --skill <name>` | Generate a health badge for your skill's README |
+ | | `selftune telemetry` | Manage anonymous usage analytics (status, enable, disable) |
+ | | `selftune alpha upload` | Run a manual SQLite-backed alpha upload cycle and emit a JSON send summary |

  Full command reference: `selftune --help`
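
Reviewer sketch for the command table above: the new top-level commands are described as thin entry points over existing primitives (`verify` over `create check`, `publish` over `create publish`, `run` over `orchestrate`, and `improve` over `evolve` / `evolve body`). The snippet below only models that relationship as data. The scope-to-command routing is an assumption inferred from the table descriptions (`evolve` for descriptions, `evolve body` for body or routing table), and `primitiveBehind`, `improveTarget`, and `explain` are hypothetical names, not selftune APIs.

```typescript
// Illustrative only: the mapping below is inferred from the README text in this diff.
type Scope = "description" | "routing" | "body";

// Simplified command -> the advanced primitive the README says sits behind it.
const primitiveBehind: Record<string, string> = {
  verify: "create check",
  publish: "create publish",
  run: "orchestrate",
};

// Assumed routing for `improve --scope ...`: descriptions go to `evolve`,
// routing tables and bodies go to `evolve body`.
function improveTarget(scope: Scope): string {
  return scope === "description" ? "evolve" : "evolve body";
}

// Return the underlying command name, or the input unchanged when no alias applies.
function explain(command: string, scope?: Scope): string {
  if (command === "improve") {
    return scope ? improveTarget(scope) : "evolve or evolve body";
  }
  return primitiveBehind[command] ?? command;
}

console.log(explain("verify"));
console.log(explain("improve", "routing"));
console.log(explain("status"));
```

The real `improve` command is described as routing to "the smallest matching evolution surface", so its selection logic when `--scope` is omitted is likely richer than this lookup.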
 
@@ -245,14 +272,14 @@ selftune is complementary to these tools, not competitive. They trace what happe

  ## Platforms

- | Platform | Support | Session capture | LLM-backed judge / evolve | Optimizer agents | Config location |
- | --- | --- | --- | --- | --- | --- |
- | **Claude Code** | Full | Automatic hooks via `selftune init` + `selftune ingest claude` | Yes | Native `claude --agent` | `~/.claude/settings.json` |
- | **Codex** | Experimental | `selftune codex install`, `selftune ingest codex`, or `selftune ingest wrap-codex` | Yes | Inlined into `codex exec` | `~/.codex/hooks.json` |
- | **OpenCode** | Experimental | `selftune opencode install` + `selftune ingest opencode` | Yes | Native `opencode run --agent` | `./opencode.json` or `~/.config/opencode/opencode.json` |
- | **Cline** | Experimental | `selftune cline install` | No | No | `~/Documents/Cline/Hooks/` |
- | **OpenClaw** | Experimental | `selftune ingest openclaw` + `selftune cron setup --platform openclaw` | No | No | — |
- | **Pi** | Experimental | `selftune pi install` + `selftune ingest pi` | Yes | Inlined into `pi -p` with system-prompt setup | `~/.pi/extensions/selftune/` |
+ | Platform | Support | Session capture | LLM-backed judge / evolve | Optimizer agents | Config location |
+ | --------------- | ------------ | ---------------------------------------------------------------------------------- | ------------------------- | --------------------------------------------- | ------------------------------------------------------- |
+ | **Claude Code** | Full | Automatic hooks via `selftune init` + `selftune ingest claude` | Yes | Native `claude --agent` | `~/.claude/settings.json` |
+ | **Codex** | Experimental | `selftune codex install`, `selftune ingest codex`, or `selftune ingest wrap-codex` | Yes | Inlined into `codex exec` | `~/.codex/hooks.json` |
+ | **OpenCode** | Experimental | `selftune opencode install` + `selftune ingest opencode` | Yes | Native `opencode run --agent` | `./opencode.json` or `~/.config/opencode/opencode.json` |
+ | **Cline** | Experimental | `selftune cline install` | No | No | `~/Documents/Cline/Hooks/` |
+ | **OpenClaw** | Experimental | `selftune ingest openclaw` + `selftune cron setup --platform openclaw` | No | No | — |
+ | **Pi** | Experimental | `selftune pi install` + `selftune ingest pi` | Yes | Inlined into `pi -p` with system-prompt setup | `~/.pi/extensions/selftune/` |

  Codex, OpenCode, Claude Code, and Pi can run selftune's LLM-backed judge, eval, and optimizer workflows. Codex and OpenCode also participate in experimental runtime replay validation during `selftune evolve`, using `codex exec --json` and `opencode run --format json` respectively. OpenCode agents are registered in config during `selftune opencode install`; Codex still inlines bundled agent instructions into the prompt because it has no native `--agent` flag. OpenCode has weaker hook coverage than Claude Code because it lacks a prompt-submission event and cannot hard-block pre-tool writes. Pi has no native subagent flag, so selftune inlines bundled optimizer instructions into `pi -p` calls. Cline is telemetry-only today. OpenClaw remains ingest and cron only. All platforms write to the same shared log schema.
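
Reviewer sketch for the Platforms table: the support matrix can be read as typed data, which makes the paragraph's claim about which platforms run the LLM-backed workflows easy to check. The `PlatformRow` shape and the `withLlmWorkflows` helper are hypothetical. The values are transcribed from the table in this hunk, with the "Optimizer agents" column reduced to a boolean (any value other than "No" counts as true).

```typescript
// Hypothetical row type; fields mirror columns of the Platforms table.
interface PlatformRow {
  name: string;
  support: "Full" | "Experimental";
  llmBacked: boolean;       // "LLM-backed judge / evolve" column
  optimizerAgents: boolean; // true when the "Optimizer agents" column is not "No"
}

const platformRows: PlatformRow[] = [
  { name: "Claude Code", support: "Full", llmBacked: true, optimizerAgents: true },
  { name: "Codex", support: "Experimental", llmBacked: true, optimizerAgents: true },
  { name: "OpenCode", support: "Experimental", llmBacked: true, optimizerAgents: true },
  { name: "Cline", support: "Experimental", llmBacked: false, optimizerAgents: false },
  { name: "OpenClaw", support: "Experimental", llmBacked: false, optimizerAgents: false },
  { name: "Pi", support: "Experimental", llmBacked: true, optimizerAgents: true },
];

// Names of platforms that can run the LLM-backed judge / evolve workflows.
function withLlmWorkflows(rows: PlatformRow[]): string[] {
  return rows.filter((row) => row.llmBacked).map((row) => row.name);
}

console.log(withLlmWorkflows(platformRows));
```

The filtered list matches the four platforms named in the paragraph above (Claude Code, Codex, OpenCode, Pi), while Cline and OpenClaw remain capture-only.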