selftune 0.2.0 → 0.2.1
- package/.claude/agents/diagnosis-analyst.md +20 -10
- package/.claude/agents/evolution-reviewer.md +14 -1
- package/.claude/agents/integration-guide.md +18 -6
- package/.claude/agents/pattern-analyst.md +18 -5
- package/CHANGELOG.md +12 -4
- package/README.md +43 -35
- package/apps/local-dashboard/dist/assets/geist-cyrillic-wght-normal-CHSlOQsW.woff2 +0 -0
- package/apps/local-dashboard/dist/assets/geist-latin-ext-wght-normal-DMtmJ5ZE.woff2 +0 -0
- package/apps/local-dashboard/dist/assets/geist-latin-wght-normal-Dm3htQBi.woff2 +0 -0
- package/apps/local-dashboard/dist/assets/index-C4EOTFZ2.js +15 -0
- package/apps/local-dashboard/dist/assets/index-bl-Webyd.css +1 -0
- package/apps/local-dashboard/dist/assets/vendor-react-U7zYD9Rg.js +60 -0
- package/apps/local-dashboard/dist/assets/vendor-table-B7VF2Ipl.js +26 -0
- package/apps/local-dashboard/dist/assets/vendor-ui-D7_zX_qy.js +346 -0
- package/apps/local-dashboard/dist/favicon.png +0 -0
- package/apps/local-dashboard/dist/index.html +17 -0
- package/apps/local-dashboard/dist/logo.png +0 -0
- package/apps/local-dashboard/dist/logo.svg +9 -0
- package/cli/selftune/badge/badge-data.ts +1 -1
- package/cli/selftune/badge/badge.ts +4 -8
- package/cli/selftune/canonical-export.ts +183 -0
- package/cli/selftune/constants.ts +28 -0
- package/cli/selftune/contribute/contribute.ts +1 -1
- package/cli/selftune/cron/setup.ts +17 -17
- package/cli/selftune/dashboard-contract.ts +202 -0
- package/cli/selftune/dashboard-server.ts +653 -186
- package/cli/selftune/dashboard.ts +41 -176
- package/cli/selftune/eval/baseline.ts +5 -4
- package/cli/selftune/eval/composability-v2.ts +273 -0
- package/cli/selftune/eval/hooks-to-evals.ts +34 -15
- package/cli/selftune/eval/unit-test-cli.ts +1 -1
- package/cli/selftune/evolution/evidence.ts +26 -0
- package/cli/selftune/evolution/evolve-body.ts +105 -11
- package/cli/selftune/evolution/evolve.ts +371 -25
- package/cli/selftune/evolution/extract-patterns.ts +87 -29
- package/cli/selftune/evolution/rollback.ts +2 -2
- package/cli/selftune/grading/auto-grade.ts +200 -0
- package/cli/selftune/grading/grade-session.ts +448 -97
- package/cli/selftune/grading/results.ts +42 -0
- package/cli/selftune/hooks/prompt-log.ts +172 -2
- package/cli/selftune/hooks/session-stop.ts +123 -3
- package/cli/selftune/hooks/skill-eval.ts +119 -3
- package/cli/selftune/index.ts +395 -116
- package/cli/selftune/ingestors/claude-replay.ts +140 -114
- package/cli/selftune/ingestors/codex-rollout.ts +345 -46
- package/cli/selftune/ingestors/codex-wrapper.ts +207 -39
- package/cli/selftune/ingestors/openclaw-ingest.ts +141 -8
- package/cli/selftune/ingestors/opencode-ingest.ts +193 -17
- package/cli/selftune/init.ts +227 -14
- package/cli/selftune/last.ts +14 -5
- package/cli/selftune/localdb/db.ts +63 -0
- package/cli/selftune/localdb/materialize.ts +428 -0
- package/cli/selftune/localdb/queries.ts +376 -0
- package/cli/selftune/localdb/schema.ts +204 -0
- package/cli/selftune/monitoring/watch.ts +66 -15
- package/cli/selftune/normalization.ts +682 -0
- package/cli/selftune/observability.ts +19 -44
- package/cli/selftune/orchestrate.ts +1073 -0
- package/cli/selftune/quickstart.ts +203 -0
- package/cli/selftune/repair/skill-usage.ts +576 -0
- package/cli/selftune/schedule.ts +561 -0
- package/cli/selftune/status.ts +48 -26
- package/cli/selftune/sync.ts +627 -0
- package/cli/selftune/types.ts +148 -0
- package/cli/selftune/utils/canonical-log.ts +45 -0
- package/cli/selftune/utils/hooks.ts +41 -0
- package/cli/selftune/utils/html.ts +27 -0
- package/cli/selftune/utils/llm-call.ts +78 -20
- package/cli/selftune/utils/math.ts +10 -0
- package/cli/selftune/utils/query-filter.ts +139 -0
- package/cli/selftune/utils/skill-discovery.ts +340 -0
- package/cli/selftune/utils/skill-log.ts +68 -0
- package/cli/selftune/utils/skill-usage-confidence.ts +18 -0
- package/cli/selftune/utils/transcript.ts +272 -26
- package/cli/selftune/workflows/discover.ts +254 -0
- package/cli/selftune/workflows/skill-md-writer.ts +288 -0
- package/cli/selftune/workflows/workflows.ts +188 -0
- package/package.json +21 -8
- package/packages/telemetry-contract/README.md +11 -0
- package/packages/telemetry-contract/fixtures/golden.json +87 -0
- package/packages/telemetry-contract/fixtures/golden.test.ts +42 -0
- package/packages/telemetry-contract/index.ts +1 -0
- package/packages/telemetry-contract/package.json +19 -0
- package/packages/telemetry-contract/src/index.ts +2 -0
- package/packages/telemetry-contract/src/types.ts +163 -0
- package/packages/telemetry-contract/src/validators.ts +109 -0
- package/skill/SKILL.md +84 -53
- package/skill/Workflows/AutoActivation.md +17 -16
- package/skill/Workflows/Badge.md +6 -0
- package/skill/Workflows/Baseline.md +46 -23
- package/skill/Workflows/Composability.md +12 -5
- package/skill/Workflows/Contribute.md +17 -14
- package/skill/Workflows/Cron.md +56 -79
- package/skill/Workflows/Dashboard.md +45 -34
- package/skill/Workflows/Doctor.md +30 -17
- package/skill/Workflows/Evals.md +64 -40
- package/skill/Workflows/EvolutionMemory.md +2 -0
- package/skill/Workflows/Evolve.md +102 -47
- package/skill/Workflows/EvolveBody.md +6 -6
- package/skill/Workflows/Grade.md +36 -31
- package/skill/Workflows/ImportSkillsBench.md +11 -5
- package/skill/Workflows/Ingest.md +43 -36
- package/skill/Workflows/Initialize.md +44 -30
- package/skill/Workflows/Orchestrate.md +139 -0
- package/skill/Workflows/Replay.md +39 -18
- package/skill/Workflows/Rollback.md +3 -3
- package/skill/Workflows/Schedule.md +61 -0
- package/skill/Workflows/Sync.md +88 -0
- package/skill/Workflows/UnitTest.md +34 -22
- package/skill/Workflows/Watch.md +14 -4
- package/skill/Workflows/Workflows.md +129 -0
- package/skill/assets/activation-rules-default.json +26 -0
- package/skill/assets/multi-skill-settings.json +63 -0
- package/skill/assets/single-skill-settings.json +57 -0
- package/skill/references/invocation-taxonomy.md +2 -2
- package/skill/references/logs.md +164 -2
- package/skill/references/setup-patterns.md +65 -0
- package/skill/references/version-history.md +40 -0
- package/skill/settings_snippet.json +1 -1
- package/templates/multi-skill-settings.json +7 -7
- package/templates/single-skill-settings.json +6 -6
- package/dashboard/index.html +0 -1680
package/.claude/agents/diagnosis-analyst.md
CHANGED
@@ -11,12 +11,22 @@ Investigate why a specific skill is underperforming. Analyze telemetry logs,
 grading results, and session transcripts to identify root causes and recommend
 targeted fixes.
 
-**
-
-
-
-
-
+**Activation policy:** This is a subagent-only role, spawned by the main agent.
+If a user asks for diagnosis directly, the main agent should route to this subagent.
+
+## Connection to Workflows
+
+This agent is spawned by the main agent as a subagent when deeper analysis is
+needed — it is not called directly by the user.
+
+**Connected workflows:**
+- **Doctor** — when `selftune doctor` reveals persistent issues with a specific skill, spawn this agent for root cause analysis
+- **Grade** — when grades are consistently low for a skill, spawn this agent to investigate why
+- **Status** — when `selftune status` shows CRITICAL or WARNING flags on a skill, spawn this agent for a deep dive
+
+The main agent decides when to escalate to this subagent based on severity
+and persistence of the issue. One-off failures are handled inline; recurring
+or unexplained failures warrant spawning this agent.
 
 ## Context
 
@@ -48,7 +58,7 @@ any warnings or regression flags.
 ### Step 3: Pull telemetry stats
 
 ```bash
-selftune
+selftune eval generate --skill <name> --stats
 ```
 
 Review aggregate metrics:
@@ -59,7 +69,7 @@ Review aggregate metrics:
 ### Step 4: Analyze trigger coverage
 
 ```bash
-selftune
+selftune eval generate --skill <name> --max 50
 ```
 
 Review the generated eval set. Count entries by invocation type:
@@ -106,8 +116,8 @@ Compile findings into a structured report.
 |---------|---------|
 | `selftune status` | Overall health snapshot |
 | `selftune last` | Most recent session details |
-| `selftune
-| `selftune
+| `selftune eval generate --skill <name> --stats` | Aggregate telemetry |
+| `selftune eval generate --skill <name> --max 50` | Generate eval set for coverage analysis |
 | `selftune doctor` | Check infrastructure health |
 
 ## Output
package/.claude/agents/evolution-reviewer.md
CHANGED
@@ -18,6 +18,19 @@ vs. new descriptions, and provides an approve/reject verdict with reasoning.
 - "review pending changes"
 - "should I deploy this evolution"
 
+## Connection to Workflows
+
+This agent is spawned by the main agent as a subagent to provide a safety
+review before deploying an evolution.
+
+**Connected workflows:**
+- **Evolve** — in the review-before-deploy step, spawn this agent to evaluate the proposal for regressions, scope creep, and eval set quality
+- **EvolveBody** — same role for full-body and routing-table evolutions
+
+**Mode behavior:**
+- **Interactive mode** — spawn this agent before deploying an evolution to get a human-readable safety review with an approve/reject verdict
+- **Autonomous mode** — the orchestrator handles validation internally using regression thresholds and auto-rollback; this agent is for interactive safety reviews only
+
 ## Context
 
 You need access to:
@@ -114,7 +127,7 @@ Issue an approve or reject decision with full reasoning.
 | Command | Purpose |
 |---------|---------|
 | `selftune evolve --skill <name> --skill-path <path> --dry-run` | Generate proposal without deploying |
-|
+| Read eval file from evolve output or audit log | Inspect the exact eval set used for validation |
 | `selftune watch --skill <name> --skill-path <path>` | Check current performance baseline |
 | `selftune status` | Overall skill health context |
 
package/.claude/agents/integration-guide.md
CHANGED
@@ -19,6 +19,18 @@ verify the setup is working end-to-end.
 - "get selftune working"
 - "selftune setup guide"
 
+## Connection to Workflows
+
+This agent is the deep-dive version of the Initialize workflow, spawned by
+the main agent as a subagent when the project structure is complex.
+
+**Connected workflows:**
+- **Initialize** — for complex project structures (monorepos, multi-skill repos, mixed agent platforms), spawn this agent instead of running the basic init workflow
+
+**When to spawn:** when the project has multiple SKILL.md files, multiple
+packages or workspaces, mixed agent platforms (Claude + Codex), or any
+structure where the standard `selftune init` needs project-specific guidance.
+
 ## Context
 
 You need access to:
@@ -90,8 +102,8 @@ Parse the output to confirm `~/.selftune/config.json` was created. Note the
 detected `agent_type` and `cli_path`.
 
 If the user is on a non-Claude agent platform:
-- **Codex** — inform about `wrap-codex` and `ingest
-- **OpenCode** — inform about `ingest
+- **Codex** — inform about `ingest wrap-codex` and `ingest codex` options
+- **OpenCode** — inform about `ingest opencode` option
 
 ### Step 5: Install hooks
 
@@ -106,8 +118,8 @@ into `~/.claude/settings.json`. Three hooks are required:
 
 Derive script paths from `cli_path` in `~/.selftune/config.json`.
 
-For **Codex**: use `selftune wrap-codex` or `selftune ingest
-For **OpenCode**: use `selftune ingest
+For **Codex**: use `selftune ingest wrap-codex` or `selftune ingest codex`.
+For **OpenCode**: use `selftune ingest opencode`.
 
 ### Step 6: Verify with doctor
 
@@ -159,7 +171,7 @@ from any package directory.
 Tell the user what to do next based on their goals:
 
 - **"I want to see how my skills are doing"** — run `selftune status`
-- **"I want to improve a skill"** — run `selftune
+- **"I want to improve a skill"** — run `selftune eval generate --skill <name>` then `selftune evolve --skill <name>`
 - **"I want to grade a session"** — run `selftune grade --skill <name>`
 
 ## Commands
@@ -170,7 +182,7 @@ Tell the user what to do next based on their goals:
 | `selftune doctor` | Verify installation health |
 | `selftune status` | Post-setup health check |
 | `selftune last` | Verify telemetry capture |
-| `selftune
+| `selftune eval generate --list-skills` | Confirm skills are being tracked |
 
 ## Output
 
package/.claude/agents/pattern-analyst.md
CHANGED
@@ -19,6 +19,19 @@ opportunities, and identify systemic issues affecting multiple skills.
 - "skill trigger conflicts"
 - "optimize my skills"
 
+## Connection to Workflows
+
+This agent is spawned by the main agent as a subagent for deep cross-skill
+analysis.
+
+**Connected workflows:**
+- **Composability** — when `selftune eval composability` identifies conflict candidates, spawn this agent for deeper investigation of trigger overlaps and resolution strategies
+- **Evals** — when analyzing cross-skill patterns or systemwide undertriggering, spawn this agent to find optimization opportunities
+
+**When to spawn:** when the user asks about conflicts between skills,
+cross-skill optimization, or when composability scores indicate moderate-to-severe
+conflicts (score > 0.3).
+
 ## Context
 
 You need access to:
@@ -33,7 +46,7 @@ You need access to:
 ### Step 1: Inventory all skills
 
 ```bash
-selftune
+selftune eval generate --list-skills
 ```
 
 Parse the JSON output to get a complete list of skills with their query
@@ -77,7 +90,7 @@ Read `skill_usage_log.jsonl` and group by query text. Look for:
 For each skill, pull stats:
 
 ```bash
-selftune
+selftune eval generate --skill <name> --stats
 ```
 
 Compare across skills:
@@ -100,10 +113,10 @@ Compile a cross-skill analysis report.
 
 | Command | Purpose |
 |---------|---------|
-| `selftune
+| `selftune eval generate --list-skills` | Inventory all skills with query counts |
 | `selftune status` | Health snapshot across all skills |
-| `selftune
-| `selftune
+| `selftune eval generate --skill <name> --stats` | Per-skill aggregate telemetry |
+| `selftune eval generate --skill <name> --max 50` | Generate eval set per skill |
 
 ## Output
 
package/CHANGELOG.md
CHANGED
@@ -7,15 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
 
 ## [Unreleased]
 
+### Added
+
+- **Real-time improvement signal detection** — `prompt-log` hook detects user corrections ("why didn't you use X?") and explicit skill requests via pure regex patterns. Signals are logged to `~/.claude/improvement_signals.jsonl` with skill name extraction from installed skills.
+- **Signal-reactive orchestration** — `session-stop` hook checks for pending improvement signals and spawns a focused `selftune orchestrate --max-skills 2` run in the background. Respects a 30-minute lockfile to prevent concurrent runs.
+- **Signal-aware candidate selection** — Orchestrator reads pending signals and boosts priority for mentioned skills (+150 per signal, capped at +450). Signaled skills bypass the minimum evidence gate and the "UNGRADED with 0 missed queries" gate.
+- **Orchestrate lockfile** — `acquireLock()`/`releaseLock()` with PID+timestamp in `~/.claude/.orchestrate.lock`. 30-minute stale threshold prevents deadlocks from crashed runs.
+- **Signal consumption** — After an orchestrate run completes, consumed signals are marked with `consumed: true`, `consumed_at`, and `consumed_by_run` so they don't affect subsequent runs.
+
 ## [0.2.0] — 2026-03-08
 
 ### Added
 
 - **Full skill body evolution** — Teacher-student model for evolving routing tables and complete skill bodies with 3-gate validation (structural, trigger, quality)
-- **Synthetic eval generation** — `selftune
+- **Synthetic eval generation** — `selftune eval generate --synthetic --skill <name> --skill-path <path>` generates eval sets from SKILL.md via LLM without needing real session logs. Solves cold-start for new skills.
 - **Batch trigger validation** — `validateProposalBatched()` batches 10 queries per LLM call (configurable via `TRIGGER_CHECK_BATCH_SIZE`). ~10x faster evolution loops. Sequential `validateProposalSequential()` kept for backward compat.
 - **Cheap-loop evolution mode** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. New `--gate-model` and `--proposal-model` flags for manual per-stage control.
-- **Validation model selection** — `--validation-model` flag on `evolve` and `evolve
+- **Validation model selection** — `--validation-model` flag on `evolve` and `evolve body` commands (default: `haiku`).
 - **Proposal model selection** — `--proposal-model` flag on `evolve`, passed through to `generateProposal()` and `generateMultipleProposals()`.
 - **Gate validation dependency injection** — `gateValidateProposal` added to `EvolveDeps` for testability.
 - **Auto-activation system** — `auto-activate.ts` UserPromptSubmit hook detects when selftune should run and outputs formatted suggestions; session state tracking prevents repeated nags; PAI coexistence support
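The signal-aware candidate selection entry in this hunk gives concrete numbers: +150 per pending signal, capped at +450. A minimal sketch of that arithmetic follows, assuming each skill starts from a numeric base priority. Only the two constants come from the changelog; `Candidate`, `signalBoost`, and `rankCandidates` are hypothetical names and not the orchestrator's actual API.

```typescript
// Sketch only: the two constants come from the changelog entry; every
// name below is hypothetical and not taken from selftune's source.
const BOOST_PER_SIGNAL = 150;
const BOOST_CAP = 450;

interface Candidate {
  skill: string;
  basePriority: number; // assumed: whatever score the orchestrator starts from
}

// Priority bonus for a skill mentioned in `pendingSignals` pending signals.
function signalBoost(pendingSignals: number): number {
  const count = Math.max(0, Math.floor(pendingSignals));
  return Math.min(count * BOOST_PER_SIGNAL, BOOST_CAP);
}

// Sort candidates by base priority plus signal boost, highest first.
// The input array is copied, not mutated.
function rankCandidates(
  candidates: Candidate[],
  signalCounts: Map<string, number>,
): Candidate[] {
  const score = (c: Candidate): number =>
    c.basePriority + signalBoost(signalCounts.get(c.skill) ?? 0);
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

With these numbers the cap is reached at three signals, so a fourth signal for the same skill adds nothing further.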
@@ -47,7 +55,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
 - `selftune status` — CLI skill health summary with pass rates, trends, and system health
 - `selftune last` — Quick insight from the most recent session
 - `selftune dashboard` — Skill-health-centric HTML dashboard with grid view and drill-down
-- `selftune
+- `selftune ingest claude` — Claude Code transcript replay for retroactive log backfill
 - `selftune contribute` — Opt-in anonymized data export for community contribution
 - CI/CD workflows: publish, auto-bump, CodeQL, scorecard
 - FOSS governance: LICENSE (MIT), CODE_OF_CONDUCT, CONTRIBUTING, SECURITY
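The orchestrate lockfile entry under [Unreleased] describes a PID plus timestamp record with a 30-minute stale threshold. The check below is a sketch of how such a staleness test could work. The `OrchestrateLock` shape, the ISO timestamp format, and both function names are assumptions; the real `acquireLock()`/`releaseLock()` may be implemented differently.

```typescript
// Sketch only: field names and timestamp format are assumed, not taken
// from selftune's source. Only the 30-minute threshold is from the changelog.
interface OrchestrateLock {
  pid: number;
  timestamp: string; // assumed ISO-8601 string written when the lock is taken
}

const STALE_LOCK_MS = 30 * 60 * 1000;

// A lock is stale when it is older than the threshold or unreadable.
function isLockStale(lock: OrchestrateLock, now: Date = new Date()): boolean {
  const acquiredAt = Date.parse(lock.timestamp);
  if (Number.isNaN(acquiredAt)) return true; // corrupt timestamp: do not block forever
  return now.getTime() - acquiredAt > STALE_LOCK_MS;
}

// A new run may proceed when no lock exists or the existing one is stale.
function mayStartRun(existing: OrchestrateLock | null, now: Date = new Date()): boolean {
  return existing === null || isLockStale(existing, now);
}
```

Treating an unreadable timestamp as stale is a design choice in this sketch: it favors recovering from a crashed run over strict mutual exclusion.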
@@ -57,7 +65,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
 
 ### Added
 
-- CLI entry point with 10 commands: `init`, `
+- CLI entry point with 10 commands: `init`, `eval generate`, `grade`, `evolve`, `evolve rollback`, `watch`, `doctor`, `ingest codex`, `ingest opencode`, `ingest wrap-codex`
 - Agent auto-detection for Claude Code, Codex, and OpenCode
 - Telemetry hooks for Claude Code (`prompt-log`, `skill-eval`, `session-stop`)
 - Codex wrapper and batch ingestor for rollout logs
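The signal consumption entry under [Unreleased] names three fields that are set once a run has used a signal: `consumed`, `consumed_at`, and `consumed_by_run`. The sketch below shows one way that marking could look. The `skill` field, the choice to consume only signals for skills handled in the run, and the function names are all assumptions made for illustration.

```typescript
// Sketch only: `consumed`, `consumed_at`, and `consumed_by_run` are named in
// the changelog; everything else here is assumed for illustration.
interface ImprovementSignal {
  skill: string; // assumed: the skill the signal mentions
  consumed?: boolean;
  consumed_at?: string; // assumed ISO-8601 timestamp
  consumed_by_run?: string; // assumed run identifier
}

// Signals that a later run should still take into account.
function pendingSignals(signals: ImprovementSignal[]): ImprovementSignal[] {
  return signals.filter((s) => s.consumed !== true);
}

// Return a copy in which pending signals for the given skills are marked
// as consumed by `runId`. Already-consumed signals are left untouched.
function markConsumed(
  signals: ImprovementSignal[],
  skillsInRun: ReadonlySet<string>,
  runId: string,
  now: Date = new Date(),
): ImprovementSignal[] {
  return signals.map((s) =>
    s.consumed !== true && skillsInRun.has(s.skill)
      ? { ...s, consumed: true, consumed_at: now.toISOString(), consumed_by_run: runId }
      : s,
  );
}
```

Because consumed entries are kept rather than deleted, the log stays append-friendly and still records which run acted on each signal.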
package/README.md
CHANGED
@@ -6,9 +6,9 @@
 
 **Self-improving skills for AI agents.**
 
-[](https://github.com/selftune-dev/selftune/actions/workflows/ci.yml)
+[](https://github.com/selftune-dev/selftune/actions/workflows/codeql.yml)
+[](https://securityscorecards.dev/viewer/?uri=github.com/selftune-dev/selftune)
 [](https://www.npmjs.com/package/selftune)
 [](LICENSE)
 [](https://www.typescriptlang.org/)
@@ -25,17 +25,17 @@ Your agent skills learn how you work. Detect what's broken. Fix it automatically
 
 Your skills don't understand how you talk. You say "make me a slide deck" and nothing happens — no error, no log, no signal. selftune watches your real sessions, learns how you actually speak, and rewrites skill descriptions to match. Automatically.
 
-Works with **Claude Code
+Works with **Claude Code** (primary). Codex, OpenCode, and OpenClaw adapters are experimental. Zero runtime dependencies.
 
 ## Install
 
 ```bash
-npx skills add
+npx skills add selftune-dev/selftune
 ```
 
 Then tell your agent: **"initialize selftune"**
 
-Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription.
+Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription. You'll see which skills are undertriggering.
 
 **CLI only** (no skill, just the CLI):
 
@@ -53,11 +53,11 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 
 ## Built for How You Actually Work
 
-**I write and use my own skills** —
+**I write and use my own skills** — Your skill descriptions don't match how you actually talk. Tell your agent "improve my skills" and selftune learns your language from real sessions, evolves descriptions to match, and validates before deploying. No manual tuning.
 
-**I publish skills others install** — Your skill works for you, but every user talks differently. selftune ships skills that get better for every user automatically — adapting descriptions to how each person actually works.
+**I publish skills others install** — Your skill works for you, but every user talks differently. selftune ships skills that get better for every user automatically — adapting descriptions to how each person actually works.
 
-**I manage an agent setup with many skills** — You have 15+ skills installed. Some work. Some don't. Some conflict. selftune gives you a health dashboard and automatically improves the skills that aren't keeping up
+**I manage an agent setup with many skills** — You have 15+ skills installed. Some work. Some don't. Some conflict. Tell your agent "how are my skills doing?" and selftune gives you a health dashboard and automatically improves the skills that aren't keeping up.
 
 ## How It Works
 
@@ -65,20 +65,22 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 <img src="./assets/FeedbackLoop.gif" alt="Observe → Detect → Evolve → Watch" width="800">
 </p>
 
-A continuous feedback loop that makes your skills learn and adapt. Automatically.
+A continuous feedback loop that makes your skills learn and adapt. Automatically. Your agent runs everything — you just install the skill and talk naturally.
 
-**Observe** — Hooks capture every
+**Observe** — Hooks capture every query and which skills fired. On Claude Code, hooks install automatically during `selftune init`. Backfill existing transcripts with `selftune ingest claude`.
 
-**Detect** —
+**Detect** — Finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. Real-time correction signals ("why didn't you use X?") are detected and trigger immediate improvement.
 
-**Evolve** — Rewrites skill descriptions — and full skill bodies — to match how you actually work.
+**Evolve** — Rewrites skill descriptions — and full skill bodies — to match how you actually work. Cheap-loop mode uses haiku for the loop, sonnet for the gate (~80% cost reduction). Teacher-student body evolution with 3-gate validation. Automatic backup.
 
-**Watch** — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically.
+**Watch** — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically.
+
+**Automate** — Run `selftune cron setup` to install OS-level scheduling. selftune syncs, evaluates, evolves, and watches on a schedule — no manual intervention needed.
 
 ## What's New in v0.2.0
 
 - **Full skill body evolution** — Beyond descriptions: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates
-- **Synthetic eval generation** — `selftune
+- **Synthetic eval generation** — `selftune eval generate --synthetic` generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately.
 - **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. ~80% cost reduction.
 - **Batch trigger validation** — Validation now batches 10 queries per LLM call instead of one-per-query. ~10x faster evolution loops.
 - **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage.
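The batch trigger validation bullet in this hunk says queries are grouped 10 per LLM call instead of one call per query. The grouping step can be sketched as a plain chunking helper. `chunk` and `BATCH_SIZE` are illustrative names (the CHANGELOG's own setting is `TRIGGER_CHECK_BATCH_SIZE`), and no LLM call is made here.

```typescript
// Sketch only: a generic chunking helper, not selftune's validateProposalBatched().
const BATCH_SIZE = 10; // mirrors the "10 queries per LLM call" figure

// Split `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number = BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 25 queries need 3 batched calls instead of 25 single calls.
const exampleQueries = Array.from({ length: 25 }, (_, i) => `query ${i + 1}`);
const exampleBatches = chunk(exampleQueries);
console.log(exampleBatches.map((b) => b.length)); // batch sizes: 10, 10, 5
```

The number of calls drops from N to ceil(N / 10), which is where the roughly tenfold speedup quoted in the bullet would come from when per-call latency dominates.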
@@ -91,21 +93,27 @@ A continuous feedback loop that makes your skills learn and adapt. Automatically
 
 ## Commands
 
-
-
-|
-
-| `selftune
-| `selftune
-| `selftune
-| `selftune
-| `selftune
-| `selftune
-| `selftune
-| `selftune
-| `selftune
-| `selftune
-| `selftune
+Your agent runs these — you just say what you want ("improve my skills", "show the dashboard").
+
+| Group | Command | What it does |
+|-------|---------|-------------|
+| | `selftune status` | See which skills are undertriggering and why |
+| | `selftune orchestrate` | Run the full autonomous loop (sync → evolve → watch) |
+| | `selftune dashboard` | Open the visual skill health dashboard |
+| | `selftune doctor` | Health check: logs, hooks, config, permissions |
+| **ingest** | `selftune ingest claude` | Backfill from Claude Code transcripts |
+| | `selftune ingest codex` | Import Codex rollout logs (experimental) |
+| **grade** | `selftune grade --skill <name>` | Grade a skill session with evidence |
+| | `selftune grade baseline --skill <name>` | Measure skill value vs no-skill baseline |
+| **evolve** | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions |
+| | `selftune evolve body --skill <name>` | Evolve full skill body or routing table |
+| | `selftune evolve rollback --skill <name>` | Rollback a previous evolution |
+| **eval** | `selftune eval generate --skill <name>` | Generate eval sets (`--synthetic` for cold-start) |
+| | `selftune eval unit-test --skill <name>` | Run or generate skill-level unit tests |
+| | `selftune eval composability --skill <name>` | Detect conflicts between co-occurring skills |
+| | `selftune eval import` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
+| **auto** | `selftune cron setup` | Install OS-level scheduling (cron/launchd/systemd) |
+| | `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
 
 Full command reference: `selftune --help`
 
@@ -135,13 +143,13 @@ selftune is complementary to these tools, not competitive. They trace what happe
 
 ## Platforms
 
-**Claude Code** — Hooks install automatically. `selftune
+**Claude Code** (fully supported) — Hooks install automatically. `selftune ingest claude` backfills existing transcripts. This is the primary supported platform.
 
-**Codex** — `selftune wrap-codex -- <args>` or `selftune ingest
+**Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested.
 
-**OpenCode** — `selftune ingest
+**OpenCode** (experimental) — `selftune ingest opencode`. Adapter exists but is not actively tested.
 
-**OpenClaw** — `selftune ingest
+**OpenClaw** (experimental) — `selftune ingest openclaw` + `selftune cron setup` for autonomous evolution. Adapter exists but is not actively tested.
 
 Requires [Bun](https://bun.sh) or Node.js 18+. No extra API keys.
 
@@ -151,6 +159,6 @@ Requires [Bun](https://bun.sh) or Node.js 18+. No extra API keys.
 
 [Architecture](ARCHITECTURE.md) · [Contributing](CONTRIBUTING.md) · [Security](SECURITY.md) · [Integration Guide](docs/integration-guide.md) · [Sponsor](https://github.com/sponsors/WellDunDun)
 
-MIT licensed. Free forever.
+MIT licensed. Free forever. Primary support for Claude Code; experimental adapters for Codex, OpenCode, and OpenClaw.
 
 </div>
Binary file
Binary file