npm - selftune - Versions diffs - 0.2.16 → 0.2.19 - Mend

selftune 0.2.16 → 0.2.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (91)

package/README.md +32 -22
package/apps/local-dashboard/dist/assets/index-DnhnXQm6.js +60 -0
package/apps/local-dashboard/dist/assets/index-_EcLywDg.css +1 -0
package/apps/local-dashboard/dist/assets/vendor-table-BIiI3YhS.js +1 -0
package/apps/local-dashboard/dist/assets/vendor-ui-CGEmUayx.js +12 -0
package/apps/local-dashboard/dist/index.html +5 -5
package/cli/selftune/alpha-upload/build-payloads.ts +14 -1
package/cli/selftune/alpha-upload/client.ts +51 -1
package/cli/selftune/alpha-upload/flush.ts +46 -5
package/cli/selftune/alpha-upload/stage-canonical.ts +32 -10
package/cli/selftune/alpha-upload-contract.ts +9 -0
package/cli/selftune/constants.ts +92 -5
package/cli/selftune/contribute/contribute.ts +30 -2
package/cli/selftune/contribute/sanitize.ts +52 -5
package/cli/selftune/contribution-config.ts +249 -0
package/cli/selftune/contribution-relay.ts +177 -0
package/cli/selftune/contribution-signals.ts +219 -0
package/cli/selftune/contribution-staging.ts +147 -0
package/cli/selftune/contributions.ts +532 -0
package/cli/selftune/creator-contributions.ts +333 -0
package/cli/selftune/dashboard-contract.ts +305 -1
package/cli/selftune/dashboard-server.ts +47 -13
package/cli/selftune/eval/family-overlap.ts +395 -0
package/cli/selftune/eval/hooks-to-evals.ts +182 -28
package/cli/selftune/eval/synthetic-evals.ts +298 -11
package/cli/selftune/evolution/description-quality.ts +12 -11
package/cli/selftune/evolution/evolve.ts +214 -51
package/cli/selftune/evolution/validate-proposal.ts +9 -6
package/cli/selftune/export.ts +2 -2
package/cli/selftune/grading/grade-session.ts +20 -0
package/cli/selftune/hooks/commit-track.ts +188 -0
package/cli/selftune/hooks/prompt-log.ts +10 -1
package/cli/selftune/hooks/session-stop.ts +2 -2
package/cli/selftune/hooks/skill-eval.ts +15 -1
package/cli/selftune/hooks/stdin-preview.ts +32 -0
package/cli/selftune/index.ts +41 -5
package/cli/selftune/ingestors/codex-rollout.ts +31 -35
package/cli/selftune/ingestors/codex-wrapper.ts +32 -24
package/cli/selftune/localdb/db.ts +2 -2
package/cli/selftune/localdb/direct-write.ts +69 -6
package/cli/selftune/localdb/queries.ts +1253 -37
package/cli/selftune/localdb/schema.ts +66 -0
package/cli/selftune/orchestrate.ts +32 -4
package/cli/selftune/recover.ts +153 -0
package/cli/selftune/repair/skill-usage.ts +363 -4
package/cli/selftune/routes/actions.ts +35 -1
package/cli/selftune/routes/analytics.ts +14 -0
package/cli/selftune/routes/index.ts +1 -0
package/cli/selftune/routes/overview.ts +150 -4
package/cli/selftune/routes/skill-report.ts +648 -18
package/cli/selftune/status.ts +81 -2
package/cli/selftune/sync.ts +56 -2
package/cli/selftune/trust-model.ts +66 -0
package/cli/selftune/types.ts +80 -0
package/cli/selftune/utils/skill-detection.ts +43 -0
package/cli/selftune/utils/transcript.ts +210 -1
package/cli/selftune/watchlist.ts +65 -0
package/node_modules/@selftune/telemetry-contract/src/types.ts +11 -0
package/package.json +1 -1
package/packages/telemetry-contract/src/types.ts +11 -0
package/packages/ui/src/components/ActivityTimeline.tsx +165 -150
package/packages/ui/src/components/EvidenceViewer.tsx +335 -144
package/packages/ui/src/components/EvolutionTimeline.tsx +58 -28
package/packages/ui/src/components/OrchestrateRunsPanel.tsx +33 -16
package/packages/ui/src/components/RecentActivityFeed.tsx +72 -41
package/packages/ui/src/components/section-cards.tsx +12 -9
package/packages/ui/src/primitives/card.tsx +1 -1
package/skill/SKILL.md +40 -2
package/skill/Workflows/AlphaUpload.md +4 -0
package/skill/Workflows/Composability.md +64 -0
package/skill/Workflows/Contribute.md +6 -3
package/skill/Workflows/Contributions.md +97 -0
package/skill/Workflows/CreatorContributions.md +74 -0
package/skill/Workflows/Dashboard.md +31 -0
package/skill/Workflows/Evals.md +57 -8
package/skill/Workflows/Evolve.md +31 -13
package/skill/Workflows/ExportCanonical.md +121 -0
package/skill/Workflows/Hook.md +131 -0
package/skill/Workflows/Ingest.md +7 -0
package/skill/Workflows/Initialize.md +29 -9
package/skill/Workflows/Orchestrate.md +27 -5
package/skill/Workflows/Quickstart.md +94 -0
package/skill/Workflows/Recover.md +84 -0
package/skill/Workflows/RepairSkillUsage.md +95 -0
package/skill/Workflows/Sync.md +18 -12
package/skill/Workflows/Uninstall.md +82 -0
package/skill/settings_snippet.json +11 -0
package/apps/local-dashboard/dist/assets/index-BMIS6uUh.css +0 -2
package/apps/local-dashboard/dist/assets/index-DOu3iLD9.js +0 -16
package/apps/local-dashboard/dist/assets/vendor-table-pHbDxq36.js +0 -8
package/apps/local-dashboard/dist/assets/vendor-ui-DIwlrGlb.js +0 -12

package/README.md CHANGED

@@ -69,6 +69,8 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 **I manage an agent setup with many skills** — You have 15+ skills installed. Some work. Some don't. Some conflict. Tell your agent "how are my skills doing?" and selftune gives you a health dashboard and automatically improves the skills that aren't keeping up.
+**I use skills for non-coding work** — Marketing workflows, research pipelines, compliance checks, slide decks. You say "make me a presentation" and nothing happens. selftune learns that "slides", "deck", and "presentation for Monday" all mean the same skill — and fixes the routing automatically.
 ## How It Works
 <p align="center">
@@ -77,29 +79,27 @@ selftune learned that real users say "slides", "deck", "presentation for Monday"
 A continuous feedback loop that makes your skills learn and adapt. Automatically. Your agent runs everything — you just install the skill and talk naturally.
-**Observe** — Hooks capture every query and which skills fired. On Claude Code, hooks install automatically during `selftune init`. Backfill existing transcripts with `selftune ingest claude`.
+**Observe** — Seven real-time hooks capture every query, every skill invocation, and every correction signal. Structured telemetry — not raw logs. On Claude Code, hooks install automatically during `selftune init`. Backfill existing transcripts with `selftune ingest claude`.
+**Detect** — Finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. Clusters missed queries by invocation type. Detects correction signals ("why didn't you use X?") and triggers immediate improvement.
+**Evolve** — Generates multiple proposals biased toward different invocation types, validates each against your real eval set with majority voting, runs constitutional checks, then gates with an expensive model before deploying. Not guesswork — evidence. Automatic backup on every deploy.
-**Detect** — Finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. Real-time correction signals ("why didn't you use X?") are detected and trigger immediate improvement.
+**Watch** — After deploying changes, selftune monitors trigger rates, false negatives, and per-invocation-type scores. If anything regresses, it rolls back automatically. No manual monitoring needed.
-**Evolve** — Rewrites skill descriptions — and full skill bodies — to match how you actually work. Cheap-loop mode uses haiku for the loop, sonnet for the gate (~80% cost reduction). Teacher-student body evolution with 3-gate validation. Automatic backup.
+**Automate** — Run `selftune cron setup` to install OS-level scheduling. selftune syncs, grades, evolves, and watches on a schedule — fully autonomous.
-**Watch** — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically.
+## How Is This Different from Agents That "Learn"?
-**Automate** — Run `selftune cron setup` to install OS-level scheduling. selftune syncs, evaluates, evolves, and watches on a schedule — no manual intervention needed.
+Some agents claim self-improvement by saving notes about what worked. That's knowledge persistence — not a closed loop. There's no measurement, no validation, and no way to know if the saved notes are actually correct.
-## What's New in v0.2.0
+selftune is empirical. It observes real sessions, grades execution quality, detects missed triggers, proposes changes, validates them against eval sets, deploys with automatic backup, monitors for regressions, and rolls back on failure. Twelve interlocking mechanisms — not one background thread writing markdown.
-- **Full skill body evolution** — Beyond descriptions: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates
-- **Synthetic eval generation** — `selftune eval generate --synthetic` generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately.
-- **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. ~80% cost reduction.
-- **Batch trigger validation** — Validation now batches 10 queries per LLM call instead of one-per-query. ~10x faster evolution loops.
-- **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage.
-- **Auto-activation system** — Hooks detect when selftune should run and suggest actions
-- **Enforcement guardrails** — Blocks SKILL.md edits on monitored skills unless `selftune watch` has been run
-- **Live dashboard server** — `selftune dashboard --serve` with SSE auto-refresh and action buttons
-- **Evolution memory** — Persists context, plans, and decisions across context resets
-- **4 specialized agents** — Diagnosis analyst, pattern analyst, evolution reviewer, integration guide
-- **Sandbox test harness** — Comprehensive automated test coverage, including devcontainer-based LLM testing
+| Approach                  | Measures quality? | Validates changes?          | Detects regressions?   | Rolls back? |
+| ------------------------- | ----------------- | --------------------------- | ---------------------- | ----------- |
+| Agent saves its own notes | No                | No                          | No                     | No          |
+| Manual skill rewrites     | No                | No                          | No                     | No          |
+| **selftune**              | 3-tier grading    | Eval sets + majority voting | Post-deploy monitoring | Automatic   |
 ## Commands
@@ -107,13 +107,16 @@ Your agent runs these — you just say what you want ("improve my skills", "show
 | Group      | Command                                      | What it does                                                                                |
 | ---------- | -------------------------------------------- | ------------------------------------------------------------------------------------------- |
-|            | `selftune status`                            | See which skills are undertriggering and why                                                |
-|            | `selftune orchestrate`                       | Run the full autonomous loop (sync → evolve → watch)                                        |
+|            | `selftune status`                            | Get a one-line health summary plus compact attention / improving highlights                 |
+|            | `selftune last`                              | Quick insight from the most recent session                                                  |
+|            | `selftune orchestrate`                       | Run the full autonomous loop (sync → grade → evolve → watch)                                |
+|            | `selftune sync`                              | Replay source-truth transcripts/rollouts into SQLite and refresh repair state               |
 |            | `selftune dashboard`                         | Open the visual skill health dashboard                                                      |
 |            | `selftune doctor`                            | Health check: logs, hooks, config, permissions                                              |
 | **ingest** | `selftune ingest claude`                     | Backfill from Claude Code transcripts                                                       |
 |            | `selftune ingest codex`                      | Import Codex rollout logs (experimental)                                                    |
 | **grade**  | `selftune grade --skill <name>`              | Grade a skill session with evidence                                                         |
+|            | `selftune grade auto`                        | Auto-grade recent sessions for ungraded skills                                              |
 |            | `selftune grade baseline --skill <name>`     | Measure skill value vs no-skill baseline                                                    |
 | **evolve** | `selftune evolve --skill <name>`             | Propose, validate, and deploy improved descriptions                                         |
 |            | `selftune evolve body --skill <name>`        | Evolve full skill body or routing table                                                     |
@@ -121,11 +124,18 @@ Your agent runs these — you just say what you want ("improve my skills", "show
 | **eval**   | `selftune eval generate --skill <name>`      | Generate eval sets (`--synthetic` for cold-start)                                           |
 |            | `selftune eval unit-test --skill <name>`     | Run or generate skill-level unit tests                                                      |
 |            | `selftune eval composability --skill <name>` | Detect conflicts between co-occurring skills                                                |
+|            | `selftune eval family-overlap --prefix sc-`  | Detect sibling overlap and suggest when a skill family should be consolidated               |
 |            | `selftune eval import`                       | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
 | **auto**   | `selftune cron setup`                        | Install OS-level scheduling (cron/launchd/systemd)                                          |
 |            | `selftune watch --skill <name>`              | Monitor after deploy. Auto-rollback on regression.                                          |
-| **other**  | `selftune telemetry`                         | Manage anonymous usage analytics (status, enable, disable)                                  |
-|            | `selftune alpha upload`                      | Run a manual alpha upload cycle and emit a JSON send summary                                |
+| **other**  | `selftune workflows`                         | Discover and manage multi-skill workflows                                                   |
+|            | `selftune contributions`                    | Manage creator-directed sharing preferences                                                  |
+|            | `selftune creator-contributions`            | Create or remove bundled `selftune.contribute.json` configs for skill creators              |
+|            | `selftune contribute`                       | Export an anonymized community contribution bundle                                           |
+|            | `selftune recover`                           | Recover SQLite from legacy/exported JSONL during migration or disaster recovery             |
+|            | `selftune badge --skill <name>`              | Generate a health badge for your skill's README                                             |
+|            | `selftune telemetry`                         | Manage anonymous usage analytics (status, enable, disable)                                  |
+|            | `selftune alpha upload`                      | Run a manual SQLite-backed alpha upload cycle and emit a JSON send summary                  |
 Full command reference: `selftune --help`
@@ -157,7 +167,7 @@ selftune is complementary to these tools, not competitive. They trace what happe
 **Claude Code** (fully supported) — Hooks install automatically. `selftune ingest claude` backfills existing transcripts. This is the primary supported platform.
-**Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested.
+**Codex** (experimental) — `selftune ingest wrap-codex -- <args>` or `selftune ingest codex`. Adapter exists but is not actively tested. Skill attribution is conservative: selftune only records explicit Codex skill evidence, not incidental assistant/meta mentions.
 **OpenCode** (experimental) — `selftune ingest opencode`. Adapter exists but is not actively tested.
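
Note on the **Evolve** step in the README diff above: the new text says each proposal is validated "against your real eval set with majority voting". The sketch below illustrates one way such a check could work and is not taken from the selftune source. Every name in it (`EvalCase`, `Judge`, `majorityVote`, `passRate`, `pickBest`) is hypothetical, and the judges are deterministic keyword matchers standing in for whatever model calls the real tool makes.

```typescript
// Hypothetical sketch: none of these types or functions come from selftune.
type EvalCase = { query: string; shouldTrigger: boolean };
type Judge = (description: string, query: string) => boolean;

// Strict majority: a tie counts as "does not trigger".
function majorityVote(votes: boolean[]): boolean {
  const yes = votes.filter(Boolean).length;
  return yes * 2 > votes.length;
}

// Fraction of eval cases where the majority verdict matches the expected label.
function passRate(description: string, evalSet: EvalCase[], judges: Judge[]): number {
  if (evalSet.length === 0) return 0;
  let correct = 0;
  for (const c of evalSet) {
    const predicted = majorityVote(judges.map((judge) => judge(description, c.query)));
    if (predicted === c.shouldTrigger) correct++;
  }
  return correct / evalSet.length;
}

// Keep the current description unless a proposal scores strictly higher.
function pickBest(current: string, proposals: string[], evalSet: EvalCase[], judges: Judge[]): string {
  let best = current;
  let bestScore = passRate(current, evalSet, judges);
  for (const proposal of proposals) {
    const score = passRate(proposal, evalSet, judges);
    if (score > bestScore) {
      best = proposal;
      bestScore = score;
    }
  }
  return best;
}

// Stand-in judge: "triggers" when any description word of at least minLen
// letters appears in the query. A real judge would be a model call (assumption).
function keywordJudge(minLen: number): Judge {
  return (description, query) =>
    description
      .toLowerCase()
      .split(/[^a-z]+/)
      .filter((word) => word.length >= minLen)
      .some((word) => query.toLowerCase().includes(word));
}

const judges: Judge[] = [keywordJudge(3), keywordJudge(4), () => false];

// Queries borrowed from the README examples; the labels are illustrative.
const evalSet: EvalCase[] = [
  { query: "make me a slide deck", shouldTrigger: true },
  { query: "presentation for Monday", shouldTrigger: true },
  { query: "fix my unit tests", shouldTrigger: false },
];

const currentDescription = "Create pptx files";
const proposedDescription = "Create slides, a deck, or a presentation as pptx";

console.log(pickBest(currentDescription, [proposedDescription], evalSet, judges));
```

The third judge always votes "no", so a query counts as triggering only when both keyword judges agree. This is the property majority voting is meant to provide: one noisy verdict cannot flip the result on its own.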