selftune 0.2.16 → 0.2.19

Files changed (91)
  1. package/README.md +32 -22
  2. package/apps/local-dashboard/dist/assets/index-DnhnXQm6.js +60 -0
  3. package/apps/local-dashboard/dist/assets/index-_EcLywDg.css +1 -0
  4. package/apps/local-dashboard/dist/assets/vendor-table-BIiI3YhS.js +1 -0
  5. package/apps/local-dashboard/dist/assets/vendor-ui-CGEmUayx.js +12 -0
  6. package/apps/local-dashboard/dist/index.html +5 -5
  7. package/cli/selftune/alpha-upload/build-payloads.ts +14 -1
  8. package/cli/selftune/alpha-upload/client.ts +51 -1
  9. package/cli/selftune/alpha-upload/flush.ts +46 -5
  10. package/cli/selftune/alpha-upload/stage-canonical.ts +32 -10
  11. package/cli/selftune/alpha-upload-contract.ts +9 -0
  12. package/cli/selftune/constants.ts +92 -5
  13. package/cli/selftune/contribute/contribute.ts +30 -2
  14. package/cli/selftune/contribute/sanitize.ts +52 -5
  15. package/cli/selftune/contribution-config.ts +249 -0
  16. package/cli/selftune/contribution-relay.ts +177 -0
  17. package/cli/selftune/contribution-signals.ts +219 -0
  18. package/cli/selftune/contribution-staging.ts +147 -0
  19. package/cli/selftune/contributions.ts +532 -0
  20. package/cli/selftune/creator-contributions.ts +333 -0
  21. package/cli/selftune/dashboard-contract.ts +305 -1
  22. package/cli/selftune/dashboard-server.ts +47 -13
  23. package/cli/selftune/eval/family-overlap.ts +395 -0
  24. package/cli/selftune/eval/hooks-to-evals.ts +182 -28
  25. package/cli/selftune/eval/synthetic-evals.ts +298 -11
  26. package/cli/selftune/evolution/description-quality.ts +12 -11
  27. package/cli/selftune/evolution/evolve.ts +214 -51
  28. package/cli/selftune/evolution/validate-proposal.ts +9 -6
  29. package/cli/selftune/export.ts +2 -2
  30. package/cli/selftune/grading/grade-session.ts +20 -0
  31. package/cli/selftune/hooks/commit-track.ts +188 -0
  32. package/cli/selftune/hooks/prompt-log.ts +10 -1
  33. package/cli/selftune/hooks/session-stop.ts +2 -2
  34. package/cli/selftune/hooks/skill-eval.ts +15 -1
  35. package/cli/selftune/hooks/stdin-preview.ts +32 -0
  36. package/cli/selftune/index.ts +41 -5
  37. package/cli/selftune/ingestors/codex-rollout.ts +31 -35
  38. package/cli/selftune/ingestors/codex-wrapper.ts +32 -24
  39. package/cli/selftune/localdb/db.ts +2 -2
  40. package/cli/selftune/localdb/direct-write.ts +69 -6
  41. package/cli/selftune/localdb/queries.ts +1253 -37
  42. package/cli/selftune/localdb/schema.ts +66 -0
  43. package/cli/selftune/orchestrate.ts +32 -4
  44. package/cli/selftune/recover.ts +153 -0
  45. package/cli/selftune/repair/skill-usage.ts +363 -4
  46. package/cli/selftune/routes/actions.ts +35 -1
  47. package/cli/selftune/routes/analytics.ts +14 -0
  48. package/cli/selftune/routes/index.ts +1 -0
  49. package/cli/selftune/routes/overview.ts +150 -4
  50. package/cli/selftune/routes/skill-report.ts +648 -18
  51. package/cli/selftune/status.ts +81 -2
  52. package/cli/selftune/sync.ts +56 -2
  53. package/cli/selftune/trust-model.ts +66 -0
  54. package/cli/selftune/types.ts +80 -0
  55. package/cli/selftune/utils/skill-detection.ts +43 -0
  56. package/cli/selftune/utils/transcript.ts +210 -1
  57. package/cli/selftune/watchlist.ts +65 -0
  58. package/node_modules/@selftune/telemetry-contract/src/types.ts +11 -0
  59. package/package.json +1 -1
  60. package/packages/telemetry-contract/src/types.ts +11 -0
  61. package/packages/ui/src/components/ActivityTimeline.tsx +165 -150
  62. package/packages/ui/src/components/EvidenceViewer.tsx +335 -144
  63. package/packages/ui/src/components/EvolutionTimeline.tsx +58 -28
  64. package/packages/ui/src/components/OrchestrateRunsPanel.tsx +33 -16
  65. package/packages/ui/src/components/RecentActivityFeed.tsx +72 -41
  66. package/packages/ui/src/components/section-cards.tsx +12 -9
  67. package/packages/ui/src/primitives/card.tsx +1 -1
  68. package/skill/SKILL.md +40 -2
  69. package/skill/Workflows/AlphaUpload.md +4 -0
  70. package/skill/Workflows/Composability.md +64 -0
  71. package/skill/Workflows/Contribute.md +6 -3
  72. package/skill/Workflows/Contributions.md +97 -0
  73. package/skill/Workflows/CreatorContributions.md +74 -0
  74. package/skill/Workflows/Dashboard.md +31 -0
  75. package/skill/Workflows/Evals.md +57 -8
  76. package/skill/Workflows/Evolve.md +31 -13
  77. package/skill/Workflows/ExportCanonical.md +121 -0
  78. package/skill/Workflows/Hook.md +131 -0
  79. package/skill/Workflows/Ingest.md +7 -0
  80. package/skill/Workflows/Initialize.md +29 -9
  81. package/skill/Workflows/Orchestrate.md +27 -5
  82. package/skill/Workflows/Quickstart.md +94 -0
  83. package/skill/Workflows/Recover.md +84 -0
  84. package/skill/Workflows/RepairSkillUsage.md +95 -0
  85. package/skill/Workflows/Sync.md +18 -12
  86. package/skill/Workflows/Uninstall.md +82 -0
  87. package/skill/settings_snippet.json +11 -0
  88. package/apps/local-dashboard/dist/assets/index-BMIS6uUh.css +0 -2
  89. package/apps/local-dashboard/dist/assets/index-DOu3iLD9.js +0 -16
  90. package/apps/local-dashboard/dist/assets/vendor-table-pHbDxq36.js +0 -8
  91. package/apps/local-dashboard/dist/assets/vendor-ui-DIwlrGlb.js +0 -12
package/README.md CHANGED
@@ -69,6 +69,8 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 
  **I manage an agent setup with many skills** — You have 15+ skills installed. Some work. Some don't. Some conflict. Tell your agent "how are my skills doing?" and selftune gives you a health dashboard and automatically improves the skills that aren't keeping up.
 
+ **I use skills for non-coding work** — Marketing workflows, research pipelines, compliance checks, slide decks. You say "make me a presentation" and nothing happens. selftune learns that "slides", "deck", and "presentation for Monday" all mean the same skill — and fixes the routing automatically.
+
  ## How It Works
 
  <p align="center">
@@ -77,29 +79,27 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 
  A continuous feedback loop that makes your skills learn and adapt. Automatically. Your agent runs everything — you just install the skill and talk naturally.
 
- **Observe** — Hooks capture every query and which skills fired. On Claude Code, hooks install automatically during `selftune init`. Backfill existing transcripts with `selftune ingest claude`.
+ **Observe** — Seven real-time hooks capture every query, every skill invocation, and every correction signal. Structured telemetry — not raw logs. On Claude Code, hooks install automatically during `selftune init`. Backfill existing transcripts with `selftune ingest claude`.
+
+ **Detect** — Finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. Clusters missed queries by invocation type. Detects correction signals ("why didn't you use X?") and triggers immediate improvement.
+
+ **Evolve** — Generates multiple proposals biased toward different invocation types, validates each against your real eval set with majority voting, runs constitutional checks, then gates with an expensive model before deploying. Not guesswork — evidence. Automatic backup on every deploy.
 
- **Detect** — Finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. Real-time correction signals ("why didn't you use X?") are detected and trigger immediate improvement.
+ **Watch** — After deploying changes, selftune monitors trigger rates, false negatives, and per-invocation-type scores. If anything regresses, it rolls back automatically. No manual monitoring needed.
 
- **Evolve** — Rewrites skill descriptions and full skill bodies — to match how you actually work. Cheap-loop mode uses haiku for the loop, sonnet for the gate (~80% cost reduction). Teacher-student body evolution with 3-gate validation. Automatic backup.
+ **Automate** — Run `selftune cron setup` to install OS-level scheduling. selftune syncs, grades, evolves, and watches on a schedule fully autonomous.
 
- **Watch** After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically.
+ ## How Is This Different from Agents That "Learn"?
 
- **Automate** Run `selftune cron setup` to install OS-level scheduling. selftune syncs, evaluates, evolves, and watches on a schedule no manual intervention needed.
+ Some agents claim self-improvement by saving notes about what worked. That's knowledge persistence — not a closed loop. There's no measurement, no validation, and no way to know if the saved notes are actually correct.
 
- ## What's New in v0.2.0
+ selftune is empirical. It observes real sessions, grades execution quality, detects missed triggers, proposes changes, validates them against eval sets, deploys with automatic backup, monitors for regressions, and rolls back on failure. Twelve interlocking mechanisms — not one background thread writing markdown.
 
- - **Full skill body evolution** Beyond descriptions: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates
- - **Synthetic eval generation** `selftune eval generate --synthetic` generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately.
- - **Cheap-loop evolution** `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. ~80% cost reduction.
- - **Batch trigger validation** Validation now batches 10 queries per LLM call instead of one-per-query. ~10x faster evolution loops.
- - **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage.
- - **Auto-activation system** — Hooks detect when selftune should run and suggest actions
- - **Enforcement guardrails** — Blocks SKILL.md edits on monitored skills unless `selftune watch` has been run
- - **Live dashboard server** — `selftune dashboard --serve` with SSE auto-refresh and action buttons
- - **Evolution memory** — Persists context, plans, and decisions across context resets
- - **4 specialized agents** — Diagnosis analyst, pattern analyst, evolution reviewer, integration guide
- - **Sandbox test harness** — Comprehensive automated test coverage, including devcontainer-based LLM testing
+ | Approach | Measures quality? | Validates changes? | Detects regressions? | Rolls back? |
+ | ------------------------- | ----------------- | --------------------------- | ---------------------- | ----------- |
+ | Agent saves its own notes | No | No | No | No |
+ | Manual skill rewrites | No | No | No | No |
+ | **selftune** | 3-tier grading | Eval sets + majority voting | Post-deploy monitoring | Automatic |
 
  ## Commands
 
@@ -107,13 +107,16 @@ Your agent runs these — you just say what you want ("improve my skills", "show
 
  | Group | Command | What it does |
  | ---------- | -------------------------------------------- | ------------------------------------------------------------------------------------------- |
- | | `selftune status` | See which skills are undertriggering and why |
- | | `selftune orchestrate` | Run the full autonomous loop (sync → evolve → watch) |
+ | | `selftune status` | Get a one-line health summary plus compact attention / improving highlights |
+ | | `selftune last` | Quick insight from the most recent session |
+ | | `selftune orchestrate` | Run the full autonomous loop (sync → grade → evolve → watch) |
+ | | `selftune sync` | Replay source-truth transcripts/rollouts into SQLite and refresh repair state |
  | | `selftune dashboard` | Open the visual skill health dashboard |
  | | `selftune doctor` | Health check: logs, hooks, config, permissions |
  | **ingest** | `selftune ingest claude` | Backfill from Claude Code transcripts |
  | | `selftune ingest codex` | Import Codex rollout logs (experimental) |
  | **grade** | `selftune grade --skill <name>` | Grade a skill session with evidence |
+ | | `selftune grade auto` | Auto-grade recent sessions for ungraded skills |
  | | `selftune grade baseline --skill <name>` | Measure skill value vs no-skill baseline |
  | **evolve** | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions |
  | | `selftune evolve body --skill <name>` | Evolve full skill body or routing table |
@@ -121,11 +124,18 @@ Your agent runs these — you just say what you want ("improve my skills", "show
  | **eval** | `selftune eval generate --skill <name>` | Generate eval sets (`--synthetic` for cold-start) |
  | | `selftune eval unit-test --skill <name>` | Run or generate skill-level unit tests |
  | | `selftune eval composability --skill <name>` | Detect conflicts between co-occurring skills |
+ | | `selftune eval family-overlap --prefix sc-` | Detect sibling overlap and suggest when a skill family should be consolidated |
  | | `selftune eval import` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
  | **auto** | `selftune cron setup` | Install OS-level scheduling (cron/launchd/systemd) |
  | | `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
- | **other** | `selftune telemetry` | Manage anonymous usage analytics (status, enable, disable) |
- | | `selftune alpha upload` | Run a manual alpha upload cycle and emit a JSON send summary |
+ | **other** | `selftune workflows` | Discover and manage multi-skill workflows |
+ | | `selftune contributions` | Manage creator-directed sharing preferences |
+ | | `selftune creator-contributions` | Create or remove bundled `selftune.contribute.json` configs for skill creators |
+ | | `selftune contribute` | Export an anonymized community contribution bundle |
+ | | `selftune recover` | Recover SQLite from legacy/exported JSONL during migration or disaster recovery |
+ | | `selftune badge --skill <name>` | Generate a health badge for your skill's README |
+ | | `selftune telemetry` | Manage anonymous usage analytics (status, enable, disable) |
+ | | `selftune alpha upload` | Run a manual SQLite-backed alpha upload cycle and emit a JSON send summary |
 
  Full command reference: `selftune --help`
 
@@ -157,7 +167,7 @@ selftune is complementary to these tools, not competitive. They trace what happe
 
  **Claude Code** (fully supported) — Hooks install automatically. `selftune ingest claude` backfills existing transcripts. This is the primary supported platform.
 
- **Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested.
+ **Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested. Skill attribution is conservative: selftune only records explicit Codex skill evidence, not incidental assistant/meta mentions.
 
  **OpenCode** (experimental) — `selftune ingest opencode`. Adapter exists but is not actively tested.
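
Reviewer note on the README change above: the new **Evolve** text says proposals are validated "against your real eval set with majority voting". The diff does not show how that is implemented, so the following is only a minimal sketch of the general idea, not selftune's code. Every name here (`EvalCase`, `Judge`, `majorityVote`, `passRate`, `acceptProposal`, `keywordJudge`) is hypothetical, and the keyword-matching judge is a deterministic stand-in for what would presumably be an LLM call.

```typescript
// Hypothetical types for illustration; none of these names come from selftune.
type EvalCase = { query: string; shouldTrigger: boolean };

// A judge answers: "would a skill with this description trigger on this query?"
type Judge = (description: string, query: string) => boolean;

// Ask the judge `runs` times and return the majority answer.
// Use an odd `runs` so a tie cannot occur.
function majorityVote(judge: Judge, description: string, query: string, runs = 3): boolean {
  let yes = 0;
  for (let i = 0; i < runs; i++) {
    if (judge(description, query)) yes++;
  }
  return yes * 2 > runs;
}

// Fraction of eval cases where the voted answer matches the expected label.
function passRate(judge: Judge, description: string, evals: EvalCase[], runs = 3): number {
  if (evals.length === 0) return 0;
  let passed = 0;
  for (const c of evals) {
    if (majorityVote(judge, description, c.query, runs) === c.shouldTrigger) passed++;
  }
  return passed / evals.length;
}

// Accept a proposal only if it strictly beats the current description
// on the same eval set.
function acceptProposal(
  judge: Judge,
  current: string,
  proposal: string,
  evals: EvalCase[],
  runs = 3,
): boolean {
  return passRate(judge, proposal, evals, runs) > passRate(judge, current, evals, runs);
}

// Stand-in judge: triggers when any comma-separated keyword from the
// description appears in the query (case-insensitive).
const keywordJudge: Judge = (description, query) => {
  const q = query.toLowerCase();
  return description
    .toLowerCase()
    .split(/,\s*/)
    .some((keyword) => keyword.length > 0 && q.includes(keyword));
};

const sampleEvals: EvalCase[] = [
  { query: "make me a slide deck", shouldTrigger: true },
  { query: "presentation for Monday", shouldTrigger: true },
  { query: "fix this failing unit test", shouldTrigger: false },
];

const currentDescription = "pptx";
const proposedDescription = "pptx, slide, deck, presentation";

// The current description passes 1 of 3 cases, the proposal passes 3 of 3,
// so the proposal is accepted.
console.log(acceptProposal(keywordJudge, currentDescription, proposedDescription, sampleEvals));
```

With a deterministic judge the repeated runs are redundant; voting only matters when the judge is noisy, which is the usual reason to repeat a model-based check. The strict `>` comparison is a design choice of this sketch: a proposal that merely ties the current description is rejected, which avoids churn when nothing measurably improves.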