selftune 0.2.0 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (122)
  1. package/.claude/agents/diagnosis-analyst.md +20 -10
  2. package/.claude/agents/evolution-reviewer.md +14 -1
  3. package/.claude/agents/integration-guide.md +18 -6
  4. package/.claude/agents/pattern-analyst.md +18 -5
  5. package/CHANGELOG.md +12 -4
  6. package/README.md +43 -35
  7. package/apps/local-dashboard/dist/assets/geist-cyrillic-wght-normal-CHSlOQsW.woff2 +0 -0
  8. package/apps/local-dashboard/dist/assets/geist-latin-ext-wght-normal-DMtmJ5ZE.woff2 +0 -0
  9. package/apps/local-dashboard/dist/assets/geist-latin-wght-normal-Dm3htQBi.woff2 +0 -0
  10. package/apps/local-dashboard/dist/assets/index-C4EOTFZ2.js +15 -0
  11. package/apps/local-dashboard/dist/assets/index-bl-Webyd.css +1 -0
  12. package/apps/local-dashboard/dist/assets/vendor-react-U7zYD9Rg.js +60 -0
  13. package/apps/local-dashboard/dist/assets/vendor-table-B7VF2Ipl.js +26 -0
  14. package/apps/local-dashboard/dist/assets/vendor-ui-D7_zX_qy.js +346 -0
  15. package/apps/local-dashboard/dist/favicon.png +0 -0
  16. package/apps/local-dashboard/dist/index.html +17 -0
  17. package/apps/local-dashboard/dist/logo.png +0 -0
  18. package/apps/local-dashboard/dist/logo.svg +9 -0
  19. package/cli/selftune/badge/badge-data.ts +1 -1
  20. package/cli/selftune/badge/badge.ts +4 -8
  21. package/cli/selftune/canonical-export.ts +183 -0
  22. package/cli/selftune/constants.ts +28 -0
  23. package/cli/selftune/contribute/contribute.ts +1 -1
  24. package/cli/selftune/cron/setup.ts +17 -17
  25. package/cli/selftune/dashboard-contract.ts +202 -0
  26. package/cli/selftune/dashboard-server.ts +653 -186
  27. package/cli/selftune/dashboard.ts +41 -176
  28. package/cli/selftune/eval/baseline.ts +5 -4
  29. package/cli/selftune/eval/composability-v2.ts +273 -0
  30. package/cli/selftune/eval/hooks-to-evals.ts +34 -15
  31. package/cli/selftune/eval/unit-test-cli.ts +1 -1
  32. package/cli/selftune/evolution/evidence.ts +26 -0
  33. package/cli/selftune/evolution/evolve-body.ts +105 -11
  34. package/cli/selftune/evolution/evolve.ts +371 -25
  35. package/cli/selftune/evolution/extract-patterns.ts +87 -29
  36. package/cli/selftune/evolution/rollback.ts +2 -2
  37. package/cli/selftune/grading/auto-grade.ts +200 -0
  38. package/cli/selftune/grading/grade-session.ts +448 -97
  39. package/cli/selftune/grading/results.ts +42 -0
  40. package/cli/selftune/hooks/prompt-log.ts +172 -2
  41. package/cli/selftune/hooks/session-stop.ts +123 -3
  42. package/cli/selftune/hooks/skill-eval.ts +119 -3
  43. package/cli/selftune/index.ts +395 -116
  44. package/cli/selftune/ingestors/claude-replay.ts +140 -114
  45. package/cli/selftune/ingestors/codex-rollout.ts +345 -46
  46. package/cli/selftune/ingestors/codex-wrapper.ts +207 -39
  47. package/cli/selftune/ingestors/openclaw-ingest.ts +141 -8
  48. package/cli/selftune/ingestors/opencode-ingest.ts +193 -17
  49. package/cli/selftune/init.ts +227 -14
  50. package/cli/selftune/last.ts +14 -5
  51. package/cli/selftune/localdb/db.ts +63 -0
  52. package/cli/selftune/localdb/materialize.ts +428 -0
  53. package/cli/selftune/localdb/queries.ts +376 -0
  54. package/cli/selftune/localdb/schema.ts +204 -0
  55. package/cli/selftune/monitoring/watch.ts +66 -15
  56. package/cli/selftune/normalization.ts +682 -0
  57. package/cli/selftune/observability.ts +19 -44
  58. package/cli/selftune/orchestrate.ts +1073 -0
  59. package/cli/selftune/quickstart.ts +203 -0
  60. package/cli/selftune/repair/skill-usage.ts +576 -0
  61. package/cli/selftune/schedule.ts +561 -0
  62. package/cli/selftune/status.ts +48 -26
  63. package/cli/selftune/sync.ts +627 -0
  64. package/cli/selftune/types.ts +148 -0
  65. package/cli/selftune/utils/canonical-log.ts +45 -0
  66. package/cli/selftune/utils/hooks.ts +41 -0
  67. package/cli/selftune/utils/html.ts +27 -0
  68. package/cli/selftune/utils/llm-call.ts +78 -20
  69. package/cli/selftune/utils/math.ts +10 -0
  70. package/cli/selftune/utils/query-filter.ts +139 -0
  71. package/cli/selftune/utils/skill-discovery.ts +340 -0
  72. package/cli/selftune/utils/skill-log.ts +68 -0
  73. package/cli/selftune/utils/skill-usage-confidence.ts +18 -0
  74. package/cli/selftune/utils/transcript.ts +272 -26
  75. package/cli/selftune/workflows/discover.ts +254 -0
  76. package/cli/selftune/workflows/skill-md-writer.ts +288 -0
  77. package/cli/selftune/workflows/workflows.ts +188 -0
  78. package/package.json +21 -8
  79. package/packages/telemetry-contract/README.md +11 -0
  80. package/packages/telemetry-contract/fixtures/golden.json +87 -0
  81. package/packages/telemetry-contract/fixtures/golden.test.ts +42 -0
  82. package/packages/telemetry-contract/index.ts +1 -0
  83. package/packages/telemetry-contract/package.json +19 -0
  84. package/packages/telemetry-contract/src/index.ts +2 -0
  85. package/packages/telemetry-contract/src/types.ts +163 -0
  86. package/packages/telemetry-contract/src/validators.ts +109 -0
  87. package/skill/SKILL.md +84 -53
  88. package/skill/Workflows/AutoActivation.md +17 -16
  89. package/skill/Workflows/Badge.md +6 -0
  90. package/skill/Workflows/Baseline.md +46 -23
  91. package/skill/Workflows/Composability.md +12 -5
  92. package/skill/Workflows/Contribute.md +17 -14
  93. package/skill/Workflows/Cron.md +56 -79
  94. package/skill/Workflows/Dashboard.md +45 -34
  95. package/skill/Workflows/Doctor.md +30 -17
  96. package/skill/Workflows/Evals.md +64 -40
  97. package/skill/Workflows/EvolutionMemory.md +2 -0
  98. package/skill/Workflows/Evolve.md +102 -47
  99. package/skill/Workflows/EvolveBody.md +6 -6
  100. package/skill/Workflows/Grade.md +36 -31
  101. package/skill/Workflows/ImportSkillsBench.md +11 -5
  102. package/skill/Workflows/Ingest.md +43 -36
  103. package/skill/Workflows/Initialize.md +44 -30
  104. package/skill/Workflows/Orchestrate.md +139 -0
  105. package/skill/Workflows/Replay.md +39 -18
  106. package/skill/Workflows/Rollback.md +3 -3
  107. package/skill/Workflows/Schedule.md +61 -0
  108. package/skill/Workflows/Sync.md +88 -0
  109. package/skill/Workflows/UnitTest.md +34 -22
  110. package/skill/Workflows/Watch.md +14 -4
  111. package/skill/Workflows/Workflows.md +129 -0
  112. package/skill/assets/activation-rules-default.json +26 -0
  113. package/skill/assets/multi-skill-settings.json +63 -0
  114. package/skill/assets/single-skill-settings.json +57 -0
  115. package/skill/references/invocation-taxonomy.md +2 -2
  116. package/skill/references/logs.md +164 -2
  117. package/skill/references/setup-patterns.md +65 -0
  118. package/skill/references/version-history.md +40 -0
  119. package/skill/settings_snippet.json +1 -1
  120. package/templates/multi-skill-settings.json +7 -7
  121. package/templates/single-skill-settings.json +6 -6
  122. package/dashboard/index.html +0 -1680
package/.claude/agents/diagnosis-analyst.md CHANGED
@@ -11,12 +11,22 @@ Investigate why a specific skill is underperforming. Analyze telemetry logs,
  grading results, and session transcripts to identify root causes and recommend
  targeted fixes.

- **Activate when the user says:**
- - "diagnose skill issues"
- - "why is skill X underperforming"
- - "what's wrong with this skill"
- - "skill failure analysis"
- - "debug skill performance"
+ **Activation policy:** This is a subagent-only role, spawned by the main agent.
+ If a user asks for diagnosis directly, the main agent should route to this subagent.
+
+ ## Connection to Workflows
+
+ This agent is spawned by the main agent as a subagent when deeper analysis is
+ needed — it is not called directly by the user.
+
+ **Connected workflows:**
+ - **Doctor** — when `selftune doctor` reveals persistent issues with a specific skill, spawn this agent for root cause analysis
+ - **Grade** — when grades are consistently low for a skill, spawn this agent to investigate why
+ - **Status** — when `selftune status` shows CRITICAL or WARNING flags on a skill, spawn this agent for a deep dive
+
+ The main agent decides when to escalate to this subagent based on severity
+ and persistence of the issue. One-off failures are handled inline; recurring
+ or unexplained failures warrant spawning this agent.

  ## Context

@@ -48,7 +58,7 @@ any warnings or regression flags.
  ### Step 3: Pull telemetry stats

  ```bash
- selftune evals --skill <name> --stats
+ selftune eval generate --skill <name> --stats
  ```

  Review aggregate metrics:
@@ -59,7 +69,7 @@ Review aggregate metrics:
  ### Step 4: Analyze trigger coverage

  ```bash
- selftune evals --skill <name> --max 50
+ selftune eval generate --skill <name> --max 50
  ```

  Review the generated eval set. Count entries by invocation type:
@@ -106,8 +116,8 @@ Compile findings into a structured report.
  |---------|---------|
  | `selftune status` | Overall health snapshot |
  | `selftune last` | Most recent session details |
- | `selftune evals --skill <name> --stats` | Aggregate telemetry |
- | `selftune evals --skill <name> --max 50` | Generate eval set for coverage analysis |
+ | `selftune eval generate --skill <name> --stats` | Aggregate telemetry |
+ | `selftune eval generate --skill <name> --max 50` | Generate eval set for coverage analysis |
  | `selftune doctor` | Check infrastructure health |

  ## Output
package/.claude/agents/evolution-reviewer.md CHANGED
@@ -18,6 +18,19 @@ vs. new descriptions, and provides an approve/reject verdict with reasoning.
  - "review pending changes"
  - "should I deploy this evolution"

+ ## Connection to Workflows
+
+ This agent is spawned by the main agent as a subagent to provide a safety
+ review before deploying an evolution.
+
+ **Connected workflows:**
+ - **Evolve** — in the review-before-deploy step, spawn this agent to evaluate the proposal for regressions, scope creep, and eval set quality
+ - **EvolveBody** — same role for full-body and routing-table evolutions
+
+ **Mode behavior:**
+ - **Interactive mode** — spawn this agent before deploying an evolution to get a human-readable safety review with an approve/reject verdict
+ - **Autonomous mode** — the orchestrator handles validation internally using regression thresholds and auto-rollback; this agent is for interactive safety reviews only
+
  ## Context

  You need access to:
@@ -114,7 +127,7 @@ Issue an approve or reject decision with full reasoning.
  | Command | Purpose |
  |---------|---------|
  | `selftune evolve --skill <name> --skill-path <path> --dry-run` | Generate proposal without deploying |
- | `selftune evals --skill <name>` | Check eval set used for validation |
+ | Read eval file from evolve output or audit log | Inspect the exact eval set used for validation |
  | `selftune watch --skill <name> --skill-path <path>` | Check current performance baseline |
  | `selftune status` | Overall skill health context |
package/.claude/agents/integration-guide.md CHANGED
@@ -19,6 +19,18 @@ verify the setup is working end-to-end.
  - "get selftune working"
  - "selftune setup guide"

+ ## Connection to Workflows
+
+ This agent is the deep-dive version of the Initialize workflow, spawned by
+ the main agent as a subagent when the project structure is complex.
+
+ **Connected workflows:**
+ - **Initialize** — for complex project structures (monorepos, multi-skill repos, mixed agent platforms), spawn this agent instead of running the basic init workflow
+
+ **When to spawn:** when the project has multiple SKILL.md files, multiple
+ packages or workspaces, mixed agent platforms (Claude + Codex), or any
+ structure where the standard `selftune init` needs project-specific guidance.
+
  ## Context

  You need access to:
@@ -90,8 +102,8 @@ Parse the output to confirm `~/.selftune/config.json` was created. Note the
  detected `agent_type` and `cli_path`.

  If the user is on a non-Claude agent platform:
- - **Codex** — inform about `wrap-codex` and `ingest-codex` options
- - **OpenCode** — inform about `ingest-opencode` option
+ - **Codex** — inform about `ingest wrap-codex` and `ingest codex` options
+ - **OpenCode** — inform about `ingest opencode` option

  ### Step 5: Install hooks

@@ -106,8 +118,8 @@ into `~/.claude/settings.json`. Three hooks are required:

  Derive script paths from `cli_path` in `~/.selftune/config.json`.

- For **Codex**: use `selftune wrap-codex` or `selftune ingest-codex`.
- For **OpenCode**: use `selftune ingest-opencode`.
+ For **Codex**: use `selftune ingest wrap-codex` or `selftune ingest codex`.
+ For **OpenCode**: use `selftune ingest opencode`.

  ### Step 6: Verify with doctor

@@ -159,7 +171,7 @@ from any package directory.
  Tell the user what to do next based on their goals:

  - **"I want to see how my skills are doing"** — run `selftune status`
- - **"I want to improve a skill"** — run `selftune evals --skill <name>` then `selftune evolve`
+ - **"I want to improve a skill"** — run `selftune eval generate --skill <name>` then `selftune evolve --skill <name>`
  - **"I want to grade a session"** — run `selftune grade --skill <name>`

  ## Commands
@@ -170,7 +182,7 @@ Tell the user what to do next based on their goals:
  | `selftune doctor` | Verify installation health |
  | `selftune status` | Post-setup health check |
  | `selftune last` | Verify telemetry capture |
- | `selftune evals --list-skills` | Confirm skills are being tracked |
+ | `selftune eval generate --list-skills` | Confirm skills are being tracked |

  ## Output

package/.claude/agents/pattern-analyst.md CHANGED
@@ -19,6 +19,19 @@ opportunities, and identify systemic issues affecting multiple skills.
  - "skill trigger conflicts"
  - "optimize my skills"

+ ## Connection to Workflows
+
+ This agent is spawned by the main agent as a subagent for deep cross-skill
+ analysis.
+
+ **Connected workflows:**
+ - **Composability** — when `selftune eval composability` identifies conflict candidates, spawn this agent for deeper investigation of trigger overlaps and resolution strategies
+ - **Evals** — when analyzing cross-skill patterns or systemwide undertriggering, spawn this agent to find optimization opportunities
+
+ **When to spawn:** when the user asks about conflicts between skills,
+ cross-skill optimization, or when composability scores indicate moderate-to-severe
+ conflicts (score > 0.3).
+
  ## Context

  You need access to:
@@ -33,7 +46,7 @@ You need access to:
  ### Step 1: Inventory all skills

  ```bash
- selftune evals --list-skills
+ selftune eval generate --list-skills
  ```

  Parse the JSON output to get a complete list of skills with their query
@@ -77,7 +90,7 @@ Read `skill_usage_log.jsonl` and group by query text. Look for:
  For each skill, pull stats:

  ```bash
- selftune evals --skill <name> --stats
+ selftune eval generate --skill <name> --stats
  ```

  Compare across skills:
@@ -100,10 +113,10 @@ Compile a cross-skill analysis report.

  | Command | Purpose |
  |---------|---------|
- | `selftune evals --list-skills` | Inventory all skills with query counts |
+ | `selftune eval generate --list-skills` | Inventory all skills with query counts |
  | `selftune status` | Health snapshot across all skills |
- | `selftune evals --skill <name> --stats` | Per-skill aggregate telemetry |
- | `selftune evals --skill <name> --max 50` | Generate eval set per skill |
+ | `selftune eval generate --skill <name> --stats` | Per-skill aggregate telemetry |
+ | `selftune eval generate --skill <name> --max 50` | Generate eval set per skill |

  ## Output

package/CHANGELOG.md CHANGED
@@ -7,15 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/).

  ## [Unreleased]

+ ### Added
+
+ - **Real-time improvement signal detection** — `prompt-log` hook detects user corrections ("why didn't you use X?") and explicit skill requests via pure regex patterns. Signals are logged to `~/.claude/improvement_signals.jsonl` with skill name extraction from installed skills.
+ - **Signal-reactive orchestration** — `session-stop` hook checks for pending improvement signals and spawns a focused `selftune orchestrate --max-skills 2` run in the background. Respects a 30-minute lockfile to prevent concurrent runs.
+ - **Signal-aware candidate selection** — Orchestrator reads pending signals and boosts priority for mentioned skills (+150 per signal, capped at +450). Signaled skills bypass the minimum evidence gate and the "UNGRADED with 0 missed queries" gate.
+ - **Orchestrate lockfile** — `acquireLock()`/`releaseLock()` with PID+timestamp in `~/.claude/.orchestrate.lock`. 30-minute stale threshold prevents deadlocks from crashed runs.
+ - **Signal consumption** — After an orchestrate run completes, consumed signals are marked with `consumed: true`, `consumed_at`, and `consumed_by_run` so they don't affect subsequent runs.
+
  ## [0.2.0] — 2026-03-08

  ### Added

  - **Full skill body evolution** — Teacher-student model for evolving routing tables and complete skill bodies with 3-gate validation (structural, trigger, quality)
- - **Synthetic eval generation** — `selftune evals --synthetic --skill <name> --skill-path <path>` generates eval sets from SKILL.md via LLM without needing real session logs. Solves cold-start for new skills.
+ - **Synthetic eval generation** — `selftune eval generate --synthetic --skill <name> --skill-path <path>` generates eval sets from SKILL.md via LLM without needing real session logs. Solves cold-start for new skills.
  - **Batch trigger validation** — `validateProposalBatched()` batches 10 queries per LLM call (configurable via `TRIGGER_CHECK_BATCH_SIZE`). ~10x faster evolution loops. Sequential `validateProposalSequential()` kept for backward compat.
  - **Cheap-loop evolution mode** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. New `--gate-model` and `--proposal-model` flags for manual per-stage control.
- - **Validation model selection** — `--validation-model` flag on `evolve` and `evolve-body` commands (default: `haiku`).
+ - **Validation model selection** — `--validation-model` flag on `evolve` and `evolve body` commands (default: `haiku`).
  - **Proposal model selection** — `--proposal-model` flag on `evolve`, passed through to `generateProposal()` and `generateMultipleProposals()`.
  - **Gate validation dependency injection** — `gateValidateProposal` added to `EvolveDeps` for testability.
  - **Auto-activation system** — `auto-activate.ts` UserPromptSubmit hook detects when selftune should run and outputs formatted suggestions; session state tracking prevents repeated nags; PAI coexistence support
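The Unreleased entry above describes correction detection via "pure regex patterns" with skill-name extraction from installed skills. A minimal sketch of that idea — the pattern shapes, type, and function names here are illustrative assumptions, not the actual `prompt-log.ts` implementation:

```typescript
// Illustrative sketch of improvement-signal detection via pure regex.
// Patterns and names are assumptions, not selftune's actual code.
type Signal = { kind: "correction" | "explicit_request"; skill: string };

const CORRECTION = /why (?:didn'?t|did not) you use (?:the )?([\w-]+)/i;
const EXPLICIT_REQUEST = /\buse (?:the )?([\w-]+) skill\b/i;

function detectSignal(prompt: string, installedSkills: string[]): Signal | null {
  // Only emit a signal when the mentioned name matches an installed skill,
  // mirroring the "skill name extraction from installed skills" behavior.
  const correction = prompt.match(CORRECTION);
  if (correction && installedSkills.includes(correction[1])) {
    return { kind: "correction", skill: correction[1] };
  }
  const request = prompt.match(EXPLICIT_REQUEST);
  if (request && installedSkills.includes(request[1])) {
    return { kind: "explicit_request", skill: request[1] };
  }
  return null;
}
```

Matched signals would then be appended as JSONL records; prompts mentioning no installed skill produce no signal.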
@@ -47,7 +55,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
  - `selftune status` — CLI skill health summary with pass rates, trends, and system health
  - `selftune last` — Quick insight from the most recent session
  - `selftune dashboard` — Skill-health-centric HTML dashboard with grid view and drill-down
- - `selftune replay` — Claude Code transcript replay for retroactive log backfill
+ - `selftune ingest claude` — Claude Code transcript replay for retroactive log backfill
  - `selftune contribute` — Opt-in anonymized data export for community contribution
  - CI/CD workflows: publish, auto-bump, CodeQL, scorecard
  - FOSS governance: LICENSE (MIT), CODE_OF_CONDUCT, CONTRIBUTING, SECURITY
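The signal-aware candidate selection entry in the Unreleased changelog section gives concrete numbers: +150 priority per pending signal, capped at +450. The arithmetic can be sketched as follows (function and constant names are hypothetical, not selftune's actual orchestrator code):

```typescript
// Sketch of the signal-aware priority boost: +150 per pending signal,
// capped at +450. Names are illustrative assumptions.
const BOOST_PER_SIGNAL = 150;
const BOOST_CAP = 450;

function signalBoost(pendingSignals: number): number {
  return Math.min(pendingSignals * BOOST_PER_SIGNAL, BOOST_CAP);
}

function candidatePriority(basePriority: number, pendingSignals: number): number {
  // Signaled skills get boosted priority; the cap keeps a flood of signals
  // from crowding out every other candidate.
  return basePriority + signalBoost(pendingSignals);
}
```

So a skill with three or more pending signals hits the cap, and further signals change nothing until some are consumed.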
@@ -57,7 +65,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).

  ### Added

- - CLI entry point with 10 commands: `init`, `evals`, `grade`, `evolve`, `rollback`, `watch`, `doctor`, `ingest-codex`, `ingest-opencode`, `wrap-codex`
+ - CLI entry point with 10 commands: `init`, `eval generate`, `grade`, `evolve`, `evolve rollback`, `watch`, `doctor`, `ingest codex`, `ingest opencode`, `ingest wrap-codex`
  - Agent auto-detection for Claude Code, Codex, and OpenCode
  - Telemetry hooks for Claude Code (`prompt-log`, `skill-eval`, `session-stop`)
  - Codex wrapper and batch ingestor for rollout logs
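The orchestrate lockfile entry in the Unreleased changelog section describes `acquireLock()`/`releaseLock()` with a PID+timestamp and a 30-minute stale threshold. A minimal sketch of that behavior — field names and logic here are assumptions based on the changelog wording, not the actual selftune implementation:

```typescript
// Minimal sketch of a PID+timestamp lockfile with a 30-minute stale
// threshold, as described in the changelog. Names are assumptions.
import * as fs from "node:fs";

const STALE_MS = 30 * 60 * 1000; // locks older than 30 minutes are stale

function acquireLock(lockPath: string, now: number = Date.now()): boolean {
  if (fs.existsSync(lockPath)) {
    const lock = JSON.parse(fs.readFileSync(lockPath, "utf8")) as {
      pid: number;
      acquired_at: number;
    };
    // A fresh lock blocks this run; a stale one is assumed to belong to a
    // crashed process and is overwritten, preventing deadlocks.
    if (now - lock.acquired_at < STALE_MS) return false;
  }
  fs.writeFileSync(
    lockPath,
    JSON.stringify({ pid: process.pid, acquired_at: now }),
  );
  return true;
}

function releaseLock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}
```

The stale-threshold check is what lets a new run recover after a crash without manual cleanup of `~/.claude/.orchestrate.lock`.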
package/README.md CHANGED
@@ -6,9 +6,9 @@

  **Self-improving skills for AI agents.**

- [![CI](https://github.com/WellDunDun/selftune/actions/workflows/ci.yml/badge.svg)](https://github.com/WellDunDun/selftune/actions/workflows/ci.yml)
- [![CodeQL](https://github.com/WellDunDun/selftune/actions/workflows/codeql.yml/badge.svg)](https://github.com/WellDunDun/selftune/actions/workflows/codeql.yml)
- [![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/WellDunDun/selftune/badge)](https://securityscorecards.dev/viewer/?uri=github.com/WellDunDun/selftune)
+ [![CI](https://github.com/selftune-dev/selftune/actions/workflows/ci.yml/badge.svg)](https://github.com/selftune-dev/selftune/actions/workflows/ci.yml)
+ [![CodeQL](https://github.com/selftune-dev/selftune/actions/workflows/codeql.yml/badge.svg)](https://github.com/selftune-dev/selftune/actions/workflows/codeql.yml)
+ [![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/selftune-dev/selftune/badge)](https://securityscorecards.dev/viewer/?uri=github.com/selftune-dev/selftune)
  [![npm version](https://img.shields.io/npm/v/selftune)](https://www.npmjs.com/package/selftune)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
  [![TypeScript](https://img.shields.io/badge/TypeScript-blue.svg)](https://www.typescriptlang.org/)
@@ -25,17 +25,17 @@ Your agent skills learn how you work. Detect what's broken. Fix it automatically.

  Your skills don't understand how you talk. You say "make me a slide deck" and nothing happens — no error, no log, no signal. selftune watches your real sessions, learns how you actually speak, and rewrites skill descriptions to match. Automatically.

- Works with **Claude Code**, **Codex**, **OpenCode**, and **OpenClaw**. Zero runtime dependencies.
+ Works with **Claude Code** (primary). Codex, OpenCode, and OpenClaw adapters are experimental. Zero runtime dependencies.

  ## Install

  ```bash
- npx skills add WellDunDun/selftune
+ npx skills add selftune-dev/selftune
  ```

  Then tell your agent: **"initialize selftune"**

- Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription. Within minutes you'll see which skills are undertriggering.
+ Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription. You'll see which skills are undertriggering.

  **CLI only** (no skill, just the CLI):

@@ -53,11 +53,11 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"

  ## Built for How You Actually Work

- **I write and use my own skills** — You built skills for your workflow but your descriptions don't match how you actually talk. selftune learns your language from real sessions and evolves descriptions to match no more manual tuning. `selftune status` · `selftune evolve` · `selftune baseline`
+ **I write and use my own skills** — Your skill descriptions don't match how you actually talk. Tell your agent "improve my skills" and selftune learns your language from real sessions, evolves descriptions to match, and validates before deploying. No manual tuning.

- **I publish skills others install** — Your skill works for you, but every user talks differently. selftune ships skills that get better for every user automatically — adapting descriptions to how each person actually works. `selftune status` · `selftune evals` · `selftune badge`
+ **I publish skills others install** — Your skill works for you, but every user talks differently. selftune ships skills that get better for every user automatically — adapting descriptions to how each person actually works.

- **I manage an agent setup with many skills** — You have 15+ skills installed. Some work. Some don't. Some conflict. selftune gives you a health dashboard and automatically improves the skills that aren't keeping up with how your team works. `selftune dashboard` · `selftune composability` · `selftune doctor`
+ **I manage an agent setup with many skills** — You have 15+ skills installed. Some work. Some don't. Some conflict. Tell your agent "how are my skills doing?" and selftune gives you a health dashboard and automatically improves the skills that aren't keeping up.

  ## How It Works

@@ -65,20 +65,22 @@
  <img src="./assets/FeedbackLoop.gif" alt="Observe → Detect → Evolve → Watch" width="800">
  </p>

- A continuous feedback loop that makes your skills learn and adapt. Automatically.
+ A continuous feedback loop that makes your skills learn and adapt. Automatically. Your agent runs everything — you just install the skill and talk naturally.

- **Observe** — Hooks capture every user query and which skills fired. On Claude Code, hooks install automatically. Use `selftune replay` to backfill existing transcripts. This is how your skills start learning.
+ **Observe** — Hooks capture every query and which skills fired. On Claude Code, hooks install automatically during `selftune init`. Backfill existing transcripts with `selftune ingest claude`.

- **Detect** — selftune finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch.
+ **Detect** — Finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. Real-time correction signals ("why didn't you use X?") are detected and trigger immediate improvement.

- **Evolve** — Rewrites skill descriptions — and full skill bodies — to match how you actually work. Batched validation with per-stage model control (`--cheap-loop` uses haiku for the loop, sonnet for the gate). Teacher-student body evolution with 3-gate validation. Baseline comparison gates on measurable lift. Automatic backup.
+ **Evolve** — Rewrites skill descriptions — and full skill bodies — to match how you actually work. Cheap-loop mode uses haiku for the loop, sonnet for the gate (~80% cost reduction). Teacher-student body evolution with 3-gate validation. Automatic backup.

- **Watch** — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically. Your skills keep improving without you touching them.
+ **Watch** — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically.
+
+ **Automate** — Run `selftune cron setup` to install OS-level scheduling. selftune syncs, evaluates, evolves, and watches on a schedule — no manual intervention needed.

  ## What's New in v0.2.0

  - **Full skill body evolution** — Beyond descriptions: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates
- - **Synthetic eval generation** — `selftune evals --synthetic` generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately.
+ - **Synthetic eval generation** — `selftune eval generate --synthetic` generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately.
  - **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. ~80% cost reduction.
  - **Batch trigger validation** — Validation now batches 10 queries per LLM call instead of one-per-query. ~10x faster evolution loops.
  - **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage.
@@ -91,21 +93,27 @@ A continuous feedback loop that makes your skills learn and adapt. Automatically

  ## Commands

- | Command | What it does |
- |---|---|
- | `selftune status` | See which skills are undertriggering and why |
- | `selftune evals --skill <name>` | Generate eval sets from real session data (`--synthetic` for cold-start) |
- | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions (`--cheap-loop`, `--with-baseline`) |
- | `selftune evolve-body --skill <name>` | Evolve full skill body or routing table (teacher-student, 3-gate validation) |
- | `selftune baseline --skill <name>` | Measure skill value vs no-skill baseline |
- | `selftune unit-test --skill <name>` | Run or generate skill-level unit tests |
- | `selftune composability --skill <name>` | Detect conflicts between co-occurring skills |
- | `selftune import-skillsbench` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
- | `selftune badge --skill <name>` | Generate skill health badge SVG |
- | `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
- | `selftune dashboard` | Open the visual skill health dashboard |
- | `selftune replay` | Backfill data from existing Claude Code transcripts |
- | `selftune doctor` | Health check: logs, hooks, config, permissions |
+ Your agent runs these; you just say what you want ("improve my skills", "show the dashboard").
+
+ | Group | Command | What it does |
+ |-------|---------|-------------|
+ | | `selftune status` | See which skills are undertriggering and why |
+ | | `selftune orchestrate` | Run the full autonomous loop (sync, evolve, watch) |
+ | | `selftune dashboard` | Open the visual skill health dashboard |
+ | | `selftune doctor` | Health check: logs, hooks, config, permissions |
+ | **ingest** | `selftune ingest claude` | Backfill from Claude Code transcripts |
+ | | `selftune ingest codex` | Import Codex rollout logs (experimental) |
+ | **grade** | `selftune grade --skill <name>` | Grade a skill session with evidence |
+ | | `selftune grade baseline --skill <name>` | Measure skill value vs no-skill baseline |
+ | **evolve** | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions |
+ | | `selftune evolve body --skill <name>` | Evolve full skill body or routing table |
+ | | `selftune evolve rollback --skill <name>` | Rollback a previous evolution |
+ | **eval** | `selftune eval generate --skill <name>` | Generate eval sets (`--synthetic` for cold-start) |
+ | | `selftune eval unit-test --skill <name>` | Run or generate skill-level unit tests |
+ | | `selftune eval composability --skill <name>` | Detect conflicts between co-occurring skills |
+ | | `selftune eval import` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
+ | **auto** | `selftune cron setup` | Install OS-level scheduling (cron/launchd/systemd) |
+ | | `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |

  Full command reference: `selftune --help`

@@ -135,13 +143,13 @@ selftune is complementary to these tools, not competitive. They trace what happe

  ## Platforms

- **Claude Code** — Hooks install automatically. `selftune replay` backfills existing transcripts.
+ **Claude Code** (fully supported) — Hooks install automatically. `selftune ingest claude` backfills existing transcripts. This is the primary supported platform.

- **Codex** — `selftune wrap-codex -- <args>` or `selftune ingest-codex`
+ **Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested.

- **OpenCode** — `selftune ingest-opencode`
+ **OpenCode** (experimental) — `selftune ingest opencode`. Adapter exists but is not actively tested.

- **OpenClaw** — `selftune ingest-openclaw` + `selftune cron setup` for autonomous evolution
+ **OpenClaw** (experimental) — `selftune ingest openclaw` + `selftune cron setup` for autonomous evolution. Adapter exists but is not actively tested.

  Requires [Bun](https://bun.sh) or Node.js 18+. No extra API keys.

@@ -151,6 +159,6 @@ Requires [Bun](https://bun.sh) or Node.js 18+. No extra API keys.

  [Architecture](ARCHITECTURE.md) · [Contributing](CONTRIBUTING.md) · [Security](SECURITY.md) · [Integration Guide](docs/integration-guide.md) · [Sponsor](https://github.com/sponsors/WellDunDun)

- MIT licensed. Free forever. Works with Claude Code, Codex, OpenCode, and OpenClaw.
+ MIT licensed. Free forever. Primary support for Claude Code; experimental adapters for Codex, OpenCode, and OpenClaw.

  </div>