selftune 0.2.16 → 0.2.19
- package/README.md +32 -22
- package/apps/local-dashboard/dist/assets/index-DnhnXQm6.js +60 -0
- package/apps/local-dashboard/dist/assets/index-_EcLywDg.css +1 -0
- package/apps/local-dashboard/dist/assets/vendor-table-BIiI3YhS.js +1 -0
- package/apps/local-dashboard/dist/assets/vendor-ui-CGEmUayx.js +12 -0
- package/apps/local-dashboard/dist/index.html +5 -5
- package/cli/selftune/alpha-upload/build-payloads.ts +14 -1
- package/cli/selftune/alpha-upload/client.ts +51 -1
- package/cli/selftune/alpha-upload/flush.ts +46 -5
- package/cli/selftune/alpha-upload/stage-canonical.ts +32 -10
- package/cli/selftune/alpha-upload-contract.ts +9 -0
- package/cli/selftune/constants.ts +92 -5
- package/cli/selftune/contribute/contribute.ts +30 -2
- package/cli/selftune/contribute/sanitize.ts +52 -5
- package/cli/selftune/contribution-config.ts +249 -0
- package/cli/selftune/contribution-relay.ts +177 -0
- package/cli/selftune/contribution-signals.ts +219 -0
- package/cli/selftune/contribution-staging.ts +147 -0
- package/cli/selftune/contributions.ts +532 -0
- package/cli/selftune/creator-contributions.ts +333 -0
- package/cli/selftune/dashboard-contract.ts +305 -1
- package/cli/selftune/dashboard-server.ts +47 -13
- package/cli/selftune/eval/family-overlap.ts +395 -0
- package/cli/selftune/eval/hooks-to-evals.ts +182 -28
- package/cli/selftune/eval/synthetic-evals.ts +298 -11
- package/cli/selftune/evolution/description-quality.ts +12 -11
- package/cli/selftune/evolution/evolve.ts +214 -51
- package/cli/selftune/evolution/validate-proposal.ts +9 -6
- package/cli/selftune/export.ts +2 -2
- package/cli/selftune/grading/grade-session.ts +20 -0
- package/cli/selftune/hooks/commit-track.ts +188 -0
- package/cli/selftune/hooks/prompt-log.ts +10 -1
- package/cli/selftune/hooks/session-stop.ts +2 -2
- package/cli/selftune/hooks/skill-eval.ts +15 -1
- package/cli/selftune/hooks/stdin-preview.ts +32 -0
- package/cli/selftune/index.ts +41 -5
- package/cli/selftune/ingestors/codex-rollout.ts +31 -35
- package/cli/selftune/ingestors/codex-wrapper.ts +32 -24
- package/cli/selftune/localdb/db.ts +2 -2
- package/cli/selftune/localdb/direct-write.ts +69 -6
- package/cli/selftune/localdb/queries.ts +1253 -37
- package/cli/selftune/localdb/schema.ts +66 -0
- package/cli/selftune/orchestrate.ts +32 -4
- package/cli/selftune/recover.ts +153 -0
- package/cli/selftune/repair/skill-usage.ts +363 -4
- package/cli/selftune/routes/actions.ts +35 -1
- package/cli/selftune/routes/analytics.ts +14 -0
- package/cli/selftune/routes/index.ts +1 -0
- package/cli/selftune/routes/overview.ts +150 -4
- package/cli/selftune/routes/skill-report.ts +648 -18
- package/cli/selftune/status.ts +81 -2
- package/cli/selftune/sync.ts +56 -2
- package/cli/selftune/trust-model.ts +66 -0
- package/cli/selftune/types.ts +80 -0
- package/cli/selftune/utils/skill-detection.ts +43 -0
- package/cli/selftune/utils/transcript.ts +210 -1
- package/cli/selftune/watchlist.ts +65 -0
- package/node_modules/@selftune/telemetry-contract/src/types.ts +11 -0
- package/package.json +1 -1
- package/packages/telemetry-contract/src/types.ts +11 -0
- package/packages/ui/src/components/ActivityTimeline.tsx +165 -150
- package/packages/ui/src/components/EvidenceViewer.tsx +335 -144
- package/packages/ui/src/components/EvolutionTimeline.tsx +58 -28
- package/packages/ui/src/components/OrchestrateRunsPanel.tsx +33 -16
- package/packages/ui/src/components/RecentActivityFeed.tsx +72 -41
- package/packages/ui/src/components/section-cards.tsx +12 -9
- package/packages/ui/src/primitives/card.tsx +1 -1
- package/skill/SKILL.md +40 -2
- package/skill/Workflows/AlphaUpload.md +4 -0
- package/skill/Workflows/Composability.md +64 -0
- package/skill/Workflows/Contribute.md +6 -3
- package/skill/Workflows/Contributions.md +97 -0
- package/skill/Workflows/CreatorContributions.md +74 -0
- package/skill/Workflows/Dashboard.md +31 -0
- package/skill/Workflows/Evals.md +57 -8
- package/skill/Workflows/Evolve.md +31 -13
- package/skill/Workflows/ExportCanonical.md +121 -0
- package/skill/Workflows/Hook.md +131 -0
- package/skill/Workflows/Ingest.md +7 -0
- package/skill/Workflows/Initialize.md +29 -9
- package/skill/Workflows/Orchestrate.md +27 -5
- package/skill/Workflows/Quickstart.md +94 -0
- package/skill/Workflows/Recover.md +84 -0
- package/skill/Workflows/RepairSkillUsage.md +95 -0
- package/skill/Workflows/Sync.md +18 -12
- package/skill/Workflows/Uninstall.md +82 -0
- package/skill/settings_snippet.json +11 -0
- package/apps/local-dashboard/dist/assets/index-BMIS6uUh.css +0 -2
- package/apps/local-dashboard/dist/assets/index-DOu3iLD9.js +0 -16
- package/apps/local-dashboard/dist/assets/vendor-table-pHbDxq36.js +0 -8
- package/apps/local-dashboard/dist/assets/vendor-ui-DIwlrGlb.js +0 -12
package/README.md
CHANGED
@@ -69,6 +69,8 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 
 **I manage an agent setup with many skills** — You have 15+ skills installed. Some work. Some don't. Some conflict. Tell your agent "how are my skills doing?" and selftune gives you a health dashboard and automatically improves the skills that aren't keeping up.
 
+**I use skills for non-coding work** — Marketing workflows, research pipelines, compliance checks, slide decks. You say "make me a presentation" and nothing happens. selftune learns that "slides", "deck", and "presentation for Monday" all mean the same skill — and fixes the routing automatically.
+
 ## How It Works
 
 <p align="center">
@@ -77,29 +79,27 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 
 A continuous feedback loop that makes your skills learn and adapt. Automatically. Your agent runs everything — you just install the skill and talk naturally.
 
-**Observe** —
+**Observe** — Seven real-time hooks capture every query, every skill invocation, and every correction signal. Structured telemetry — not raw logs. On Claude Code, hooks install automatically during `selftune init`. Backfill existing transcripts with `selftune ingest claude`.
+
+**Detect** — Finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. Clusters missed queries by invocation type. Detects correction signals ("why didn't you use X?") and triggers immediate improvement.
+
+**Evolve** — Generates multiple proposals biased toward different invocation types, validates each against your real eval set with majority voting, runs constitutional checks, then gates with an expensive model before deploying. Not guesswork — evidence. Automatic backup on every deploy.
 
-**
+**Watch** — After deploying changes, selftune monitors trigger rates, false negatives, and per-invocation-type scores. If anything regresses, it rolls back automatically. No manual monitoring needed.
 
-**
+**Automate** — Run `selftune cron setup` to install OS-level scheduling. selftune syncs, grades, evolves, and watches on a schedule — fully autonomous.
 
-
+## How Is This Different from Agents That "Learn"?
 
-
+Some agents claim self-improvement by saving notes about what worked. That's knowledge persistence — not a closed loop. There's no measurement, no validation, and no way to know if the saved notes are actually correct.
 
-
+selftune is empirical. It observes real sessions, grades execution quality, detects missed triggers, proposes changes, validates them against eval sets, deploys with automatic backup, monitors for regressions, and rolls back on failure. Twelve interlocking mechanisms — not one background thread writing markdown.
 
-
-
-
-
-
-- **Auto-activation system** — Hooks detect when selftune should run and suggest actions
-- **Enforcement guardrails** — Blocks SKILL.md edits on monitored skills unless `selftune watch` has been run
-- **Live dashboard server** — `selftune dashboard --serve` with SSE auto-refresh and action buttons
-- **Evolution memory** — Persists context, plans, and decisions across context resets
-- **4 specialized agents** — Diagnosis analyst, pattern analyst, evolution reviewer, integration guide
-- **Sandbox test harness** — Comprehensive automated test coverage, including devcontainer-based LLM testing
+| Approach | Measures quality? | Validates changes? | Detects regressions? | Rolls back? |
+| ------------------------- | ----------------- | --------------------------- | ---------------------- | ----------- |
+| Agent saves its own notes | No | No | No | No |
+| Manual skill rewrites | No | No | No | No |
+| **selftune** | 3-tier grading | Eval sets + majority voting | Post-deploy monitoring | Automatic |
 
 ## Commands
 
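The new **Evolve** text in the hunk above says that proposals are validated against an eval set with majority voting. The sketch below is only an illustration of that idea, not selftune's implementation: the types `EvalCase` and `Judge`, the functions `majorityVote`, `passRate`, and `selectProposal`, and the default of three votes are all hypothetical names and values chosen for this example.

```typescript
// Hypothetical sketch: none of these names come from the selftune source.

// One labelled eval case: a user query and whether the skill should trigger.
type EvalCase = { query: string; shouldTrigger: boolean };

// A judge decides whether a skill description would trigger for a query.
// In practice this could be a (possibly non-deterministic) model call.
type Judge = (description: string, query: string) => boolean;

// Ask the judge `votes` times and return the strict-majority verdict.
function majorityVote(judge: Judge, description: string, query: string, votes = 3): boolean {
  let yes = 0;
  for (let i = 0; i < votes; i++) {
    if (judge(description, query)) yes++;
  }
  return yes * 2 > votes;
}

// Fraction of eval cases where the majority verdict matches the label.
function passRate(judge: Judge, description: string, evals: EvalCase[], votes = 3): number {
  if (evals.length === 0) return 0;
  const passed = evals.filter(
    (c) => majorityVote(judge, description, c.query, votes) === c.shouldTrigger,
  ).length;
  return passed / evals.length;
}

// Keep the current description unless a proposal scores strictly better.
function selectProposal(
  judge: Judge,
  current: string,
  proposals: string[],
  evals: EvalCase[],
): string {
  let best = current;
  let bestRate = passRate(judge, current, evals);
  for (const proposal of proposals) {
    const rate = passRate(judge, proposal, evals);
    if (rate > bestRate) {
      best = proposal;
      bestRate = rate;
    }
  }
  return best;
}
```

Requiring a strict improvement over the current description means a tie never triggers a deploy, which is one conservative way to read "Not guesswork — evidence"; the real gating steps (constitutional checks, the expensive-model gate) are not modelled here.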
@@ -107,13 +107,16 @@ Your agent runs these — you just say what you want ("improve my skills", "show
 
 | Group | Command | What it does |
 | ---------- | -------------------------------------------- | ------------------------------------------------------------------------------------------- |
-| | `selftune status` |
-| | `selftune
+| | `selftune status` | Get a one-line health summary plus compact attention / improving highlights |
+| | `selftune last` | Quick insight from the most recent session |
+| | `selftune orchestrate` | Run the full autonomous loop (sync → grade → evolve → watch) |
+| | `selftune sync` | Replay source-truth transcripts/rollouts into SQLite and refresh repair state |
 | | `selftune dashboard` | Open the visual skill health dashboard |
 | | `selftune doctor` | Health check: logs, hooks, config, permissions |
 | **ingest** | `selftune ingest claude` | Backfill from Claude Code transcripts |
 | | `selftune ingest codex` | Import Codex rollout logs (experimental) |
 | **grade** | `selftune grade --skill <name>` | Grade a skill session with evidence |
+| | `selftune grade auto` | Auto-grade recent sessions for ungraded skills |
 | | `selftune grade baseline --skill <name>` | Measure skill value vs no-skill baseline |
 | **evolve** | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions |
 | | `selftune evolve body --skill <name>` | Evolve full skill body or routing table |
@@ -121,11 +124,18 @@ Your agent runs these — you just say what you want ("improve my skills", "show
 | **eval** | `selftune eval generate --skill <name>` | Generate eval sets (`--synthetic` for cold-start) |
 | | `selftune eval unit-test --skill <name>` | Run or generate skill-level unit tests |
 | | `selftune eval composability --skill <name>` | Detect conflicts between co-occurring skills |
+| | `selftune eval family-overlap --prefix sc-` | Detect sibling overlap and suggest when a skill family should be consolidated |
 | | `selftune eval import` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
 | **auto** | `selftune cron setup` | Install OS-level scheduling (cron/launchd/systemd) |
 | | `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
-| **other** | `selftune
-| | `selftune
+| **other** | `selftune workflows` | Discover and manage multi-skill workflows |
+| | `selftune contributions` | Manage creator-directed sharing preferences |
+| | `selftune creator-contributions` | Create or remove bundled `selftune.contribute.json` configs for skill creators |
+| | `selftune contribute` | Export an anonymized community contribution bundle |
+| | `selftune recover` | Recover SQLite from legacy/exported JSONL during migration or disaster recovery |
+| | `selftune badge --skill <name>` | Generate a health badge for your skill's README |
+| | `selftune telemetry` | Manage anonymous usage analytics (status, enable, disable) |
+| | `selftune alpha upload` | Run a manual SQLite-backed alpha upload cycle and emit a JSON send summary |
 
 Full command reference: `selftune --help`
 
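The `selftune watch` row above is described as "Monitor after deploy. Auto-rollback on regression." As a rough illustration of what such a check could look like, the sketch below compares a post-deploy snapshot with a baseline and falls back to the backup when it regresses. The `Snapshot` shape, the function names, and the 0.05 tolerance are assumptions made for this example and are not taken from the selftune source.

```typescript
// Hypothetical sketch: field names, function names, and the tolerance are assumptions.

// Metrics captured before and after a deploy.
type Snapshot = { triggerRate: number; falseNegatives: number };

// A deploy counts as regressed if the trigger rate drops by more than
// `tolerance`, or if the number of false negatives goes up at all.
function hasRegressed(baseline: Snapshot, current: Snapshot, tolerance = 0.05): boolean {
  const triggerDrop = baseline.triggerRate - current.triggerRate;
  return triggerDrop > tolerance || current.falseNegatives > baseline.falseNegatives;
}

// Return the description that should stay live: the backup on regression,
// otherwise the newly deployed one.
function resolveLiveDescription(
  baseline: Snapshot,
  current: Snapshot,
  deployed: string,
  backup: string,
): string {
  return hasRegressed(baseline, current) ? backup : deployed;
}
```

Keeping the decision in a pure function like `hasRegressed` makes the rollback rule easy to test in isolation; per-invocation-type scores, which the README also mentions, would need an extra field and are left out of this sketch.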
@@ -157,7 +167,7 @@ selftune is complementary to these tools, not competitive. They trace what happe
 
 **Claude Code** (fully supported) — Hooks install automatically. `selftune ingest claude` backfills existing transcripts. This is the primary supported platform.
 
-**Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested.
+**Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested. Skill attribution is conservative: selftune only records explicit Codex skill evidence, not incidental assistant/meta mentions.
 
 **OpenCode** (experimental) — `selftune ingest opencode`. Adapter exists but is not actively tested.
 