npm - selftune - Versions diffs - 0.2.31 → 0.2.32 - Mend

selftune 0.2.31 → 0.2.32

This diff shows the changes between two publicly released versions of the package as they appear in their public registry. It is provided for informational purposes only.

Files changed (95)

package/README.md +83 -56
package/apps/local-dashboard/dist/assets/index-B-ut4w0B.js +15 -0
package/apps/local-dashboard/dist/assets/index-BFGfCVrL.css +1 -0
package/apps/local-dashboard/dist/assets/vendor-ui-DfowE3Hu.js +1 -0
package/apps/local-dashboard/dist/index.html +3 -3
package/cli/selftune/command-surface.ts +613 -2
package/cli/selftune/create/baseline.ts +429 -0
package/cli/selftune/create/check.ts +35 -0
package/cli/selftune/create/init.ts +115 -0
package/cli/selftune/create/package-candidate-state.ts +771 -0
package/cli/selftune/create/package-evaluator.ts +710 -0
package/cli/selftune/create/package-fingerprint.ts +142 -0
package/cli/selftune/create/package-search.ts +377 -0
package/cli/selftune/create/publish.ts +431 -0
package/cli/selftune/create/readiness.ts +495 -0
package/cli/selftune/create/replay.ts +330 -0
package/cli/selftune/create/report.ts +74 -0
package/cli/selftune/create/scaffold.ts +121 -0
package/cli/selftune/create/skills-ref-adapter.ts +177 -0
package/cli/selftune/create/status.ts +33 -0
package/cli/selftune/create/templates.ts +249 -0
package/cli/selftune/cron/setup.ts +1 -1
package/cli/selftune/dashboard-action-events.ts +4 -1
package/cli/selftune/dashboard-action-result.ts +789 -24
package/cli/selftune/dashboard-action-stream.ts +80 -0
package/cli/selftune/dashboard-contract.ts +146 -3
package/cli/selftune/dashboard-server.ts +5 -4
package/cli/selftune/eval/hooks-to-evals.ts +58 -35
package/cli/selftune/eval/synthetic-evals.ts +145 -17
package/cli/selftune/evolution/bounded-mutations.ts +1045 -0
package/cli/selftune/evolution/evolve-body.ts +9 -36
package/cli/selftune/evolution/evolve.ts +8 -72
package/cli/selftune/evolution/stopping-criteria.ts +5 -13
package/cli/selftune/evolution/unblock-suggestions.ts +0 -16
package/cli/selftune/evolution/validate-host-replay.ts +115 -15
package/cli/selftune/improve.ts +206 -0
package/cli/selftune/index.ts +123 -6
package/cli/selftune/init.ts +1 -1
package/cli/selftune/localdb/queries/dashboard.ts +30 -0
package/cli/selftune/localdb/schema.ts +52 -0
package/cli/selftune/monitoring/watch.ts +257 -23
package/cli/selftune/orchestrate/execute.ts +300 -1
package/cli/selftune/orchestrate/finalize.ts +14 -0
package/cli/selftune/orchestrate/plan.ts +22 -5
package/cli/selftune/orchestrate/prepare.ts +59 -4
package/cli/selftune/orchestrate/report.ts +1 -1
package/cli/selftune/orchestrate.ts +34 -1
package/cli/selftune/publish.ts +35 -0
package/cli/selftune/routes/actions.ts +81 -15
package/cli/selftune/routes/overview.ts +1 -1
package/cli/selftune/routes/skill-report.ts +147 -2
package/cli/selftune/run.ts +18 -0
package/cli/selftune/schedule.ts +3 -3
package/cli/selftune/search-run.ts +703 -0
package/cli/selftune/status.ts +35 -11
package/cli/selftune/testing-readiness.ts +431 -40
package/cli/selftune/types.ts +316 -0
package/cli/selftune/utils/eval-readiness.ts +1 -0
package/cli/selftune/utils/json-output.ts +11 -0
package/cli/selftune/utils/lifecycle-surface.ts +48 -0
package/cli/selftune/utils/query-filter.ts +82 -1
package/cli/selftune/utils/tui.ts +85 -2
package/cli/selftune/verify.ts +205 -0
package/cli/selftune/workflows/proposals.ts +1 -1
package/cli/selftune/workflows/skill-scaffold.ts +141 -63
package/cli/selftune/workflows/workflows.ts +4 -4
package/package.json +1 -1
package/skill/SKILL.md +148 -85
package/skill/references/cli-quick-reference.md +16 -1
package/skill/references/creator-playbook.md +31 -10
package/skill/workflows/Baseline.md +8 -9
package/skill/workflows/Contributions.md +4 -4
package/skill/workflows/Create.md +173 -0
package/skill/workflows/CreateTestDeploy.md +34 -30
package/skill/workflows/Cron.md +2 -2
package/skill/workflows/Dashboard.md +3 -3
package/skill/workflows/Evals.md +13 -7
package/skill/workflows/Evolve.md +75 -32
package/skill/workflows/EvolveBody.md +22 -15
package/skill/workflows/Hook.md +1 -1
package/skill/workflows/Improve.md +168 -0
package/skill/workflows/Initialize.md +3 -3
package/skill/workflows/Orchestrate.md +49 -12
package/skill/workflows/Publish.md +100 -0
package/skill/workflows/Run.md +72 -0
package/skill/workflows/Schedule.md +2 -2
package/skill/workflows/SearchRun.md +89 -0
package/skill/workflows/SignalsDashboard.md +2 -2
package/skill/workflows/UnitTest.md +13 -4
package/skill/workflows/Verify.md +136 -0
package/skill/workflows/Watch.md +114 -47
package/skill/workflows/Workflows.md +13 -8
package/apps/local-dashboard/dist/assets/index-B7v_o1WC.js +0 -15
package/apps/local-dashboard/dist/assets/index-CrO77SVi.css +0 -1
package/apps/local-dashboard/dist/assets/vendor-ui-B0H8s1mP.js +0 -1

package/skill/SKILL.md CHANGED

@@ -2,26 +2,28 @@
 name: selftune
 description: >
   Self-improving skills toolkit that watches real agent sessions, detects missed
-  triggers, grades execution quality, and evolves skill descriptions to match how
-  users actually talk. Use when grading sessions, generating evals, evolving skill
-  descriptions or routing tables, discovering reusable workflows, scaffolding new
-  workflow skills, checking skill health, viewing the dashboard, ingesting sessions
-  from other platforms, or running autonomous improvement loops.
+  triggers, grades execution quality, and evolves skills through a package
+  evaluation pipeline (replay, baseline, grading, unit tests, and post-deploy
+  watch). Use when verifying skill packages, publishing improvements, evolving
+  skill descriptions or routing tables, discovering reusable workflows, scaffolding
+  new workflow skills, checking skill health, viewing the dashboard, ingesting
+  sessions from other platforms, or running autonomous improvement loops.
   Make sure to use this skill whenever the user mentions skill improvement, skill
   performance, skill triggers, skill evolution, skill health, undertriggering,
   overtriggering, session grading, or wants to know how their skills are doing —
   even if they don't say "selftune" explicitly.
 metadata:
   author: selftune-dev
-  version: 0.2.31
+  version: 0.2.32
   category: developer-tools
 ---
 # selftune
 Observe real agent sessions, detect missed triggers, grade execution quality,
-evolve skill descriptions toward the language real users actually use, and
-scaffold workflow skills from repeated telemetry patterns.
+evolve skills through package evaluation (replay, baseline, grading, body,
+unit tests, and post-deploy watch), and scaffold workflow skills from
+repeated telemetry patterns.
 **You are the operator.** The user installed this skill so YOU can manage their
 skill health autonomously. They will say things like "set up selftune",
@@ -34,6 +36,43 @@ If `~/.selftune/config.json` does not exist, read `workflows/Initialize.md`
 first. The CLI must be installed (`selftune` on PATH) before other commands
 will work. Do not proceed with other commands until initialization is complete.
+## Primary Lifecycle
+Default to this lifecycle unless the user explicitly asks for a low-level
+workflow:
+1. `status`
+   - use `selftune status`
+   - for draft packages, use `selftune create status --skill-path <path>`
+2. `verify`
+   - use `selftune verify --skill-path <path>`
+   - if verify reports missing readiness or evidence, follow the returned next
+     low-level command instead of rerunning the full chain
+3. `publish`
+   - for draft packages, use `selftune publish --skill-path <path>`
+   - for already-live skills, `publish` usually means a validated `Improve`
+     action plus `Watch`
+4. `improve`
+   - use `selftune improve --skill <name> --skill-path <path>`
+   - let `--scope auto` choose bounded package search automatically when the
+     skill already has package evidence or a draft package manifest
+   - set `--scope description|routing|body|package` when the measured gap is
+     already clear and you want to force the mutation surface
+   - use `--scope package` when the problem spans routing and body together or
+     you want measured frontier comparison before deciding what to publish
+   - omit `--dry-run` when you want the winning package candidate promoted back
+     into the draft automatically
+5. `run`
+   - use `selftune run`
+Treat `eval generate`, `unit-test`, `replay`, `baseline`, `watch`, and
+body-specific evolution as advanced supporting workflows unless the user asks
+for them directly or the default lifecycle fails.
 ## Command Execution Policy
 ```bash
@@ -43,7 +82,8 @@ selftune <command> [options]
 Commands vary in output format:
 - **JSON by default:** `selftune doctor` and `selftune watch` emit structured JSON on stdout.
-- **Text by default:** `selftune status`, `selftune last`, `selftune orchestrate`, and `selftune evolve` print human-readable text.
+- **Text by default:** `selftune status`, `selftune last`, `selftune verify`, `selftune publish`, and `selftune improve` print human-readable text when stdout is a TTY.
+- **Mixed runtime output:** `selftune run` / `selftune orchestrate` emit JSON on stdout and a human report on stderr.
 - **JSON opt-in:** `selftune sync --json` enables structured JSON output.
 - **Server:** `selftune dashboard` starts a local SPA server — it does not emit data.
@@ -54,70 +94,78 @@ next step from prose.
 Run `selftune <command> --help` for exact flags. Read
 `references/cli-quick-reference.md` when you need the full flag reference.
-## Creator Trust Loop
+## Package Evaluation Pipeline (Creator Trust Loop)
-When the user wants to improve a skill, default to this creator loop before
-jumping straight to mutation:
+When the user wants to improve a skill, default to this package evaluation
+pipeline before jumping straight to mutation. Each step builds measured
+evidence that the package is ready to publish:
-1. `selftune eval generate --skill <name> --skill-path <path>`
-2. `selftune eval unit-test --skill <name> --generate --skill-path <path>`
-3. `selftune evolve --skill <name> --skill-path <path> --dry-run --validation-mode replay`
-4. `selftune grade baseline --skill <name> --skill-path <path>`
-5. `selftune evolve --skill <name> --skill-path <path> --with-baseline`
-6. then `selftune watch --skill <name>`
+- `draft` — the package exists but is still incomplete
+- `verify_blocked` — the draft is still in one of the concrete readiness states: `needs_spec_validation`, `needs_package_resources`, `needs_evals`, `needs_unit_tests`, `needs_routing_replay`, or `needs_baseline`
+- `verified` — the trust gates pass and the skill is ready to ship
+- `published` — the skill was shipped successfully
+- `watching` — post-deploy monitoring is active
+- `needs_improvement` — measured evidence shows trigger, routing, body, or value gaps
+- `unhealthy` — hooks, telemetry, config, or selftune itself is broken
 If the user asks "how do I know this skill works?" or "can I trust this skill
-yet?", start with this loop, then use `selftune status`, the dashboard, or the
-skill report to explain what is still missing, whether the skill is ready to
-deploy, or whether it is already being watched live.
+yet?", start with this pipeline, then use `selftune status`, the dashboard, or
+the skill report to explain what is still missing, whether the package is ready
+to publish, or whether it is already being watched live.
 ## Workflow Routing
-| Trigger keywords | Workflow | File |
-| --- | --- | --- |
-| create test deploy, creator loop, ship skill, ready to deploy, can I trust this skill, how do I know this skill works | CreateTestDeploy | workflows/CreateTestDeploy.md |
-| grade, score, evaluate, assess session, auto-grade | Grade | workflows/Grade.md |
-| evals, eval set, undertriggering, skill stats, eval generate | Evals | workflows/Evals.md |
-| evolve, improve, optimize skills, make skills better, triggers, catch more queries, apply proposal, apply contributor proposal | Evolve | workflows/Evolve.md |
-| evolve body, evolve routing, full body evolution, rewrite skill, teacher student | EvolveBody | workflows/EvolveBody.md |
-| evolve rollback, undo, restore, revert evolution, go back, undo last change | Rollback | workflows/Rollback.md |
-| watch, monitor, regression, post-deploy, keep an eye on | Watch | workflows/Watch.md |
-| doctor, health, hooks, broken, diagnose, not working, something wrong | Doctor | workflows/Doctor.md |
-| ingest, import, codex logs, opencode, openclaw, pi, wrap codex | Ingest | workflows/Ingest.md |
-| replay, backfill, claude transcripts, historical sessions | Replay | workflows/Replay.md |
-| contributions, sharing preferences, opt in/out creator sharing, approve/revoke contributions | Contributions | workflows/Contributions.md |
-| creator contributions, selftune.contribute.json, enable/disable creator contribution | CreatorContributions | workflows/CreatorContributions.md |
-| signals dashboard, contributor signals, signals page, community dashboard, community data, contributor stats, signal health, how are signals, how is community | SignalsDashboard | workflows/SignalsDashboard.md |
-| contribute, share, export bundle, export data, anonymized, give back | Contribute | workflows/Contribute.md |
-| init, setup, set up, bootstrap, first time, install, configure selftune, alpha, enroll | Initialize | workflows/Initialize.md |
-| cron, schedule, automate evolution, run automatically | Cron | workflows/Cron.md |
-| schedule, selftune schedule, launchd, systemd, crontab, automation setup | Schedule | workflows/Schedule.md |
-| auto-activate, suggestions, activation rules, nag, why suggest | AutoActivation | workflows/AutoActivation.md |
-| dashboard, visual, open dashboard, show dashboard, serve dashboard | Dashboard | workflows/Dashboard.md |
-| evolution memory, session continuity, what happened last | EvolutionMemory | workflows/EvolutionMemory.md |
-| grade baseline, baseline lift, adds value, skill value, no-skill comparison | Baseline | workflows/Baseline.md |
-| eval unit-test, skill test, test skill, generate tests, run tests | UnitTest | workflows/UnitTest.md |
-| eval composability, co-occurrence, skill conflicts, family overlap, sibling confusion | Composability | workflows/Composability.md |
-| eval import, skillsbench, external evals, benchmark tasks | ImportSkillsBench | workflows/ImportSkillsBench.md |
-| telemetry, analytics, disable analytics, opt out, tracking, privacy | Telemetry | workflows/Telemetry.md |
-| orchestrate, autonomous, full loop, improve all skills, run selftune loop | Orchestrate | workflows/Orchestrate.md |
-| sync, refresh, source truth, rescan sessions | Sync | workflows/Sync.md |
-| badge, readme badge, skill badge, health badge | Badge | workflows/Badge.md |
-| workflows, discover workflows, scaffold workflow skill, build skill from logs | Workflows | workflows/Workflows.md |
-| alpha upload, upload data, send alpha data, manual upload | AlphaUpload | workflows/AlphaUpload.md |
-| recover, rebuild sqlite, recover db, legacy backfill | Recover | workflows/Recover.md |
-| quickstart, getting started, onboard, first time setup, new user | Quickstart | workflows/Quickstart.md |
-| uninstall, remove selftune, clean up, teardown | Uninstall | workflows/Uninstall.md |
-| repair, rebuild usage, fix skill usage, trustworthy usage | RepairSkillUsage | workflows/RepairSkillUsage.md |
-| export canonical, canonical export, canonical telemetry, push payload | ExportCanonical | workflows/ExportCanonical.md |
-| hook, run hook, invoke hook, manual hook, debug hook | Hook | workflows/Hook.md |
-| codex/opencode/cline/pi hooks, platform hooks, non-claude hooks, multi-agent | PlatformHooks | workflows/PlatformHooks.md |
-| registry, distribute, push/install/sync/rollback skill, team skills | Registry | workflows/Registry.md |
-| export, dump, jsonl, export sqlite, debug export | Export | _(direct: `selftune export`)_ |
-| status, health summary, skill health, how are skills, run selftune | Status | _(direct: `selftune status`)_ |
-| last, last session, recent session, what happened | Last | _(direct: `selftune last`)_ |
-Workflows Grade, Evolve, Watch, and Ingest also run autonomously via `selftune orchestrate`.
+| Trigger keywords                                                                                                                                               | Workflow             | File                              |
+| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------- | --------------------------------- |
+| create skill, new skill package, author skill, bootstrap skill, scaffold package, benchmark report, package report, publish report                             | Create               | workflows/Create.md               |
+| verify skill, creator loop, can I trust this skill, how do I know this skill works, test this skill, ready to ship, ready to deploy                            | Verify               | workflows/Verify.md               |
+| publish skill, ship skill, deploy skill, go live, release skill                                                                                                | Publish              | workflows/Publish.md              |
+| search run, package frontier, candidate search, bounded package evolution, compare package candidates, optimize package, improve routing and body together, bounded evolution | SearchRun            | workflows/SearchRun.md            |
+| grade, score, evaluate, assess session, auto-grade                                                                                                             | Grade                | workflows/Grade.md                |
+| evals, eval set, undertriggering, skill stats, eval generate                                                                                                   | Evals                | workflows/Evals.md                |
+| improve, optimize skills, make skills better, triggers, catch more queries, apply proposal, apply contributor proposal                                         | Improve              | workflows/Improve.md              |
+| evolve description, description-only evolution, improve trigger wording                                                                                        | Evolve               | workflows/Evolve.md               |
+| evolve body, evolve routing, full body evolution, rewrite skill, teacher student                                                                               | EvolveBody           | workflows/EvolveBody.md           |
+| evolve rollback, undo, restore, revert evolution, go back, undo last change                                                                                    | Rollback             | workflows/Rollback.md             |
+| watch, monitor, regression, post-deploy, keep an eye on                                                                                                        | Watch                | workflows/Watch.md                |
+| doctor, health, hooks, broken, diagnose, not working, something wrong                                                                                          | Doctor               | workflows/Doctor.md               |
+| ingest, import, codex logs, opencode, openclaw, pi, wrap codex                                                                                                 | Ingest               | workflows/Ingest.md               |
+| replay, backfill, claude transcripts, historical sessions                                                                                                      | Replay               | workflows/Replay.md               |
+| contributions, sharing preferences, opt in/out creator sharing, approve/revoke contributions                                                                   | Contributions        | workflows/Contributions.md        |
+| creator contributions, selftune.contribute.json, enable/disable creator contribution                                                                           | CreatorContributions | workflows/CreatorContributions.md |
+| signals dashboard, contributor signals, signals page, community dashboard, community data, contributor stats, signal health, how are signals, how is community | SignalsDashboard     | workflows/SignalsDashboard.md     |
+| contribute, share, export bundle, export data, anonymized, give back                                                                                           | Contribute           | workflows/Contribute.md           |
+| init, setup, set up, bootstrap, first time, install, configure selftune, alpha, enroll                                                                         | Initialize           | workflows/Initialize.md           |
+| cron, schedule, automate evolution, run automatically                                                                                                          | Cron                 | workflows/Cron.md                 |
+| schedule, selftune schedule, launchd, systemd, crontab, automation setup                                                                                       | Schedule             | workflows/Schedule.md             |
+| auto-activate, suggestions, activation rules, nag, why suggest                                                                                                 | AutoActivation       | workflows/AutoActivation.md       |
+| dashboard, visual, open dashboard, show dashboard, serve dashboard                                                                                             | Dashboard            | workflows/Dashboard.md            |
+| evolution memory, session continuity, what happened last                                                                                                       | EvolutionMemory      | workflows/EvolutionMemory.md      |
+| grade baseline, baseline lift, adds value, skill value, no-skill comparison                                                                                    | Baseline             | workflows/Baseline.md             |
+| eval unit-test, skill test, test skill, generate tests, run tests                                                                                              | UnitTest             | workflows/UnitTest.md             |
+| eval composability, co-occurrence, skill conflicts, family overlap, sibling confusion                                                                          | Composability        | workflows/Composability.md        |
+| eval import, skillsbench, external evals, benchmark tasks                                                                                                      | ImportSkillsBench    | workflows/ImportSkillsBench.md    |
+| telemetry, analytics, disable analytics, opt out, tracking, privacy                                                                                            | Telemetry            | workflows/Telemetry.md            |
+| orchestrate, autonomous, full loop, improve all skills, run selftune, run selftune loop, run with package search, automatic package improvement                | Run                  | workflows/Run.md                  |
+| sync, refresh, source truth, rescan sessions                                                                                                                   | Sync                 | workflows/Sync.md                 |
+| badge, readme badge, skill badge, health badge                                                                                                                 | Badge                | workflows/Badge.md                |
+| workflows, discover workflows, scaffold workflow skill, build skill from logs                                                                                  | Workflows            | workflows/Workflows.md            |
+| alpha upload, upload data, send alpha data, manual upload                                                                                                      | AlphaUpload          | workflows/AlphaUpload.md          |
+| recover, rebuild sqlite, recover db, legacy backfill                                                                                                           | Recover              | workflows/Recover.md              |
+| quickstart, getting started, onboard, first time setup, new user                                                                                               | Quickstart           | workflows/Quickstart.md           |
+| uninstall, remove selftune, clean up, teardown                                                                                                                 | Uninstall            | workflows/Uninstall.md            |
+| repair, rebuild usage, fix skill usage, trustworthy usage                                                                                                      | RepairSkillUsage     | workflows/RepairSkillUsage.md     |
+| export canonical, canonical export, canonical telemetry, push payload                                                                                          | ExportCanonical      | workflows/ExportCanonical.md      |
+| hook, run hook, invoke hook, manual hook, debug hook                                                                                                           | Hook                 | workflows/Hook.md                 |
+| codex/opencode/cline/pi hooks, platform hooks, non-claude hooks, multi-agent                                                                                   | PlatformHooks        | workflows/PlatformHooks.md        |
+| registry, distribute, push/install/sync/rollback skill, team skills                                                                                            | Registry             | workflows/Registry.md             |
+| export, dump, jsonl, export sqlite, debug export                                                                                                               | Export               | _(direct: `selftune export`)_     |
+| status, health summary, skill health, how are skills, run selftune                                                                                             | Status               | _(direct: `selftune status`)_     |
+| last, last session, recent session, what happened                                                                                                              | Last                 | _(direct: `selftune last`)_       |
+Workflows Grade, Improve, Watch, and Ingest also run autonomously via `selftune orchestrate`.
+When package evaluation evidence exists, `selftune orchestrate` (aliased as `selftune run`)
+can automatically select package-level bounded search instead of description-level evolve.
 ## Interactive Configuration
@@ -130,12 +178,27 @@ tier reference, and quick-path rules.
 selftune bundles focused agents in `agents/`. Read the relevant agent file and
 follow its instructions — either inline or by spawning a subagent.
-| Trigger keywords | Agent file | When to use |
-| --- | --- | --- |
-| diagnose, root cause, why failing, debug performance | `agents/diagnosis-analyst.md` | Recurring low grades or unclear failures after doctor/status |
-| patterns, conflicts, cross-skill, overlap | `agents/pattern-analyst.md` | Skills overlap, misroute, or interfere |
-| review evolution, check proposal, safe to deploy | `agents/evolution-reviewer.md` | Before deploying high-stakes or marginal evolutions |
-| set up selftune, integrate, configure project | `agents/integration-guide.md` | Complex setup: monorepos, multi-skill, mixed-platform |
+| Trigger keywords                                     | Agent file                     | When to use                                                  |
+| ---------------------------------------------------- | ------------------------------ | ------------------------------------------------------------ |
+| diagnose, root cause, why failing, debug performance | `agents/diagnosis-analyst.md`  | Recurring low grades or unclear failures after doctor/status |
+| patterns, conflicts, cross-skill, overlap            | `agents/pattern-analyst.md`    | Skills overlap, misroute, or interfere                       |
+| review evolution, check proposal, safe to deploy     | `agents/evolution-reviewer.md` | Before deploying high-stakes or marginal evolutions          |
+| set up selftune, integrate, configure project        | `agents/integration-guide.md`  | Complex setup: monorepos, multi-skill, mixed-platform        |
+## Advanced Workflows
+Load these when the user explicitly asks for a low-level step, when the primary
+lifecycle fails, or when debugging needs deeper evidence:
+- `workflows/Evals.md`
+- `workflows/UnitTest.md`
+- `workflows/Baseline.md`
+- `workflows/Replay.md`
+- `workflows/Watch.md`
+- `workflows/Evolve.md`
+- `workflows/EvolveBody.md`
+- `workflows/Composability.md`
+- `workflows/ImportSkillsBench.md`
 ## Negative Examples
@@ -173,16 +236,16 @@ community contribution, signal sharing, opt in creator, creator UUID.
 Load these on demand — do not read unless needed for the current task:
-| Reference | When to read |
-| --- | --- |
-| `references/cli-quick-reference.md` | Need exact CLI flags beyond `--help` |
-| `references/troubleshooting.md` | Diagnosing common errors |
-| `references/examples.md` | Need step-by-step scenario walkthroughs |
-| `references/creator-playbook.md` | Publishing skills others install; before-ship vs after-ship creator loop |
-| `references/interactive-config.md` | Before mutating workflows |
-| `references/grading-methodology.md` | Grading sessions or interpreting grades |
-| `references/invocation-taxonomy.md` | Analyzing trigger coverage |
-| `references/logs.md` | Parsing or debugging log files |
-| `references/setup-patterns.md` | Complex platform-specific setup |
-| `references/version-history.md` | Checking what changed between versions |
-| `settings_snippet.json` | During initialization |
+| Reference                           | When to read                                                         |
+| ----------------------------------- | -------------------------------------------------------------------- |
+| `references/cli-quick-reference.md` | Need exact CLI flags beyond `--help`                                 |
+| `references/troubleshooting.md`     | Diagnosing common errors                                             |
+| `references/examples.md`            | Need step-by-step scenario walkthroughs                              |
+| `references/creator-playbook.md`    | Publishing skills others install; before-ship vs after-ship pipeline |
+| `references/interactive-config.md`  | Before mutating workflows                                            |
+| `references/grading-methodology.md` | Grading sessions or interpreting grades                              |
+| `references/invocation-taxonomy.md` | Analyzing trigger coverage                                           |
+| `references/logs.md`                | Parsing or debugging log files                                       |
+| `references/setup-patterns.md`      | Complex platform-specific setup                                      |
+| `references/version-history.md`     | Checking what changed between versions                               |
+| `settings_snippet.json`             | During initialization                                                |
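
The lifecycle states that the new "Package Evaluation Pipeline" section of SKILL.md introduces can be modeled as a small type, which may help when reading the readiness output. The following is only a sketch: the state names are taken from the diff above, but `isReadinessState` and `lifecycleFromReadiness` are hypothetical helpers, and the rule that an empty list of outstanding readiness states means `verified` is an assumption rather than documented selftune behavior.

```typescript
// Readiness states listed under `verify_blocked` in SKILL.md.
const READINESS_STATES = [
  "needs_spec_validation",
  "needs_package_resources",
  "needs_evals",
  "needs_unit_tests",
  "needs_routing_replay",
  "needs_baseline",
] as const;

type ReadinessState = (typeof READINESS_STATES)[number];

// Lifecycle states listed in SKILL.md.
type LifecycleState =
  | "draft"
  | "verify_blocked"
  | "verified"
  | "published"
  | "watching"
  | "needs_improvement"
  | "unhealthy";

// Hypothetical type guard: checks a string against the documented readiness states.
function isReadinessState(value: string): value is ReadinessState {
  return (READINESS_STATES as readonly string[]).includes(value);
}

// Hypothetical helper. Assumption: any outstanding readiness state blocks
// verification, and none outstanding means the trust gates pass.
function lifecycleFromReadiness(pending: readonly string[]): LifecycleState {
  const unknown = pending.filter((state) => !isReadinessState(state));
  if (unknown.length > 0) {
    throw new Error(`Unknown readiness state: ${unknown.join(", ")}`);
  }
  return pending.length > 0 ? "verify_blocked" : "verified";
}
```

For example, `lifecycleFromReadiness(["needs_evals", "needs_baseline"])` yields `"verify_blocked"` under these assumptions. In practice the next step should come from the command that `selftune verify` itself reports, as the Primary Lifecycle section says.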

package/skill/references/cli-quick-reference.md CHANGED

@@ -20,9 +20,23 @@ selftune grade baseline  --skill <name> --skill-path <path> [--eval-set <path>]
 selftune evolve          --skill <name> --skill-path <path> [--dry-run] [--validation-mode auto|replay|judge]
 selftune evolve body     --skill <name> --skill-path <path> --target <body|routing> [--dry-run]
 selftune evolve rollback --skill <name> --skill-path <path> [--proposal-id <id>]
+selftune improve --skill <name> --skill-path <path> [--scope auto|description|routing|body|package] [--dry-run] [--validation-mode auto|replay|judge]
+# Create group
+selftune verify --skill-path <path> [--agent AGENT] [--eval-set PATH] [--no-auto-fix] [--json]
+selftune publish --skill-path <path> [--no-watch] [--ignore-watch-alerts] [--json]
+selftune search-run --skill-path <path> [--skill NAME] [--surface routing|body|both] [--max-candidates N] [--agent AGENT] [--eval-set PATH] [--apply-winner] [--json]
+selftune create status --skill-path <path> [--json]
+selftune create init --name <name> --description <text> [--output-dir PATH] [--force] [--json]
+selftune create scaffold --from-workflow <id|index> [--output-dir PATH] [--skill-name NAME] [--description TEXT] [--write] [--force] [--json]
+selftune create check --skill-path <path> [--json]
+selftune create replay --skill-path <path> [--mode routing|package] [--agent AGENT] [--json]
+selftune create baseline --skill-path <path> [--mode routing|package] [--agent AGENT] [--json]
+selftune create report --skill-path <path> [--agent AGENT] [--eval-set PATH] [--json]
+selftune create publish --skill-path <path> [--watch] [--ignore-watch-alerts] [--json]
 # Eval group
-selftune eval generate      --skill <name> [--list-skills] [--stats] [--max N] [--seed N] [--output PATH] [--blend]
+selftune eval generate      --skill <name> [--list-skills] [--stats] [--max N] [--seed N] [--output PATH] [--agent AGENT] [--blend]
 selftune eval unit-test      --skill <name> --tests <path> [--run-agent] [--generate]
 selftune eval import         --dir <path> --skill <name> --output <path> [--match-strategy exact|fuzzy]
 selftune eval composability  --skill <name> [--window N] [--telemetry-log <path>]
@@ -45,6 +59,7 @@ selftune telemetry [status|enable|disable]
 selftune export    [TABLE...] [--output/-o DIR] [--since DATE]
 # Autonomous loop
+selftune run [--dry-run] [--review-required] [--auto-approve] [--skill NAME] [--max-skills N] [--recent-window HOURS] [--sync-force] [--max-auto-grade N] [--loop] [--loop-interval SECS]
 selftune orchestrate [--dry-run] [--review-required] [--auto-approve] [--skill NAME] [--max-skills N] [--recent-window HOURS] [--sync-force] [--max-auto-grade N] [--loop] [--loop-interval SECS]
 selftune sync        [--since DATE] [--dry-run] [--force] [--no-claude] [--no-codex] [--no-opencode] [--no-openclaw] [--no-pi] [--no-repair] [--json]

package/skill/references/creator-playbook.md CHANGED

@@ -3,8 +3,9 @@
 Use this when you are publishing a skill other people will install.
 If the user wants the operational step-by-step loop from cold start to deploy,
-route first to `workflows/CreateTestDeploy.md`. Use this reference for the
-packaging and after-ship interpretation layer around that loop.
+route first to `workflows/Verify.md` and `workflows/Publish.md`. Use this
+reference for the packaging and after-ship interpretation layer around that
+loop.
 The goal is simple:
@@ -39,20 +40,23 @@ Rule of thumb:
 ### Cold-start test and deploy the skill before publishing
-The default creator loop is now:
+The default package evaluation pipeline is:
 ```bash
+selftune verify --skill-path path/to/my-skill
 selftune eval generate --skill my-skill
+selftune verify --skill-path path/to/my-skill
 selftune eval unit-test --skill my-skill --generate --skill-path path/to/SKILL.md
-selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run --validation-mode replay
-selftune grade baseline --skill my-skill --skill-path path/to/SKILL.md
-selftune evolve --skill my-skill --skill-path path/to/SKILL.md --with-baseline
-selftune watch --skill my-skill
+selftune verify --skill-path path/to/my-skill
+selftune create replay --skill-path path/to/my-skill --mode package
+selftune create baseline --skill-path path/to/my-skill --mode package
+selftune verify --skill-path path/to/my-skill
+selftune publish --skill-path path/to/my-skill
 ```
-That same sequence is now packaged as the dedicated `CreateTestDeploy`
-workflow in the shipped selftune skill, while `Evals`, `UnitTest`, `Baseline`,
-`Evolve`, and `Watch` remain the atomic workflow docs for each individual step.
+`verify` is the front door in that sequence. Evals, unit tests, replay, and
+baseline remain the atomic supporting steps when the draft is still missing
+evidence.
 The dashboard overview, per-skill report, and `selftune status` all read from that loop and show
 the next missing step directly, then flip to deploy-ready and watching states once the skill is shipped.
@@ -106,11 +110,28 @@ Actionable threshold today:
 - at least `10` total signals
 - at least `3` distinct contributor cohorts
+### Package-level improvement
+When a skill has enough package evaluation evidence (accepted frontier
+candidates, canonical package evaluations), `selftune orchestrate` can
+automatically select package-level bounded search instead of description-only
+evolve. You can also trigger this manually:
+```bash
+selftune improve --skill my-skill --skill-path path/to/SKILL.md --scope package
+```
+Package search generates bounded mutations on routing and body surfaces,
+evaluates them against the accepted frontier parent through the package
+evaluator, and applies the winning candidate. Watch evidence feeds back into
+frontier selection, so post-deploy regressions inform future search runs.
 ### Interpret signal correctly
 - High missed counts with concentrated categories usually mean the **description/router** is wrong.
 - Low grades with decent trigger rate usually mean the **body/workflow/reference/tool split** is wrong.
 - Low-signal skills need more contributors before you trust a proposal.
+- When both routing and body surfaces show weakness, `selftune improve --scope package` or automatic orchestrate scope selection can address them together.
 ## Fast Checklist
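
Reviewer note on the creator-playbook change above: the new package evaluation pipeline can be sketched as a small wrapper script. This is only an illustration, not something shipped in the package. The command lines are copied from the added bash block in the playbook; the `run_step` helper and the `DRY_RUN`, `SKILL_NAME`, `SKILL_DIR`, and `SKILL_FILE` variables are hypothetical names introduced here. It also assumes, without confirmation from the diff, that each `selftune` command exits non-zero on failure, so that `set -e` stops the sequence at the first failing step.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the command order from the playbook's bash block.
# Assumption: selftune commands exit non-zero on failure (not confirmed by the diff).
set -euo pipefail

# Hypothetical script-level settings; these are NOT selftune flags.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_DIR="${SKILL_DIR:-path/to/my-skill}"
SKILL_FILE="${SKILL_FILE:-path/to/SKILL.md}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute them

# Print the command, then run it unless DRY_RUN is 1.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Same order as the playbook: verify is re-run after each evidence-producing step.
pipeline() {
  run_step selftune verify --skill-path "$SKILL_DIR"
  run_step selftune eval generate --skill "$SKILL_NAME"
  run_step selftune verify --skill-path "$SKILL_DIR"
  run_step selftune eval unit-test --skill "$SKILL_NAME" --generate --skill-path "$SKILL_FILE"
  run_step selftune verify --skill-path "$SKILL_DIR"
  run_step selftune create replay --skill-path "$SKILL_DIR" --mode package
  run_step selftune create baseline --skill-path "$SKILL_DIR" --mode package
  run_step selftune verify --skill-path "$SKILL_DIR"
  run_step selftune publish --skill-path "$SKILL_DIR"
}

pipeline
```

With the default `DRY_RUN=1` the script only echoes the nine commands, which makes it easy to check the order before running anything against a real skill directory.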

package/skill/workflows/Baseline.md CHANGED

@@ -138,20 +138,19 @@ Report the interpretation to the user based on the lift value.
 Add `--with-baseline` to evolve commands to prevent wasting evolution
 cycles on skills that don't add value.
-### 4. Canonical creator loop position
+### 4. Canonical pipeline position
-Baseline is the last pre-deploy check in the default creator loop:
+Baseline is the last pre-deploy check in the package evaluation pipeline:
 ```bash
-selftune eval generate --skill <name>
-selftune eval unit-test --skill <name> --generate --skill-path <path>
-selftune evolve --skill <name> --skill-path <path> --dry-run --validation-mode replay
-selftune grade baseline --skill <name> --skill-path <path>
-selftune evolve --skill <name> --skill-path <path> --with-baseline
-selftune watch --skill <name>
+selftune verify --skill-path <path>
+selftune create baseline --skill-path <path> --mode package
+selftune verify --skill-path <path>
+selftune publish --skill-path <path>
 ```
-After that, the skill is ready for live deploy and then watch with much clearer trust evidence.
+For already-published skills, `grade baseline` remains the explicit value gate
+behind `evolve --with-baseline`.
 ## Common Patterns

package/skill/workflows/Contributions.md CHANGED

@@ -56,16 +56,16 @@ selftune contributions upload [--dry-run] [--retry-failed] [--limit <n>]
 ## Automatic Flush via Orchestrate
-When `selftune orchestrate` runs, it automatically flushes any staged
+When `selftune run` runs, it automatically flushes any staged
 creator-directed relay signals as Step 10 (after alpha upload). This means
 users who have opted in don't need to run `selftune contributions upload`
-manually — orchestrate handles it. The flush is fail-open and never blocks
-the orchestrate loop. An API key is required (alpha enrolled).
+manually — the runtime handles it. The flush is fail-open and never blocks
+the autonomous loop. An API key is required (alpha enrolled).
 ## Notes
 - This workflow now shows which installed skills are requesting creator-directed sharing via `selftune.contribute.json`.
-- Once approved, creator-directed contribution signals are staged locally during `selftune sync` / `selftune orchestrate`.
+- Once approved, creator-directed contribution signals are staged locally during `selftune sync` / `selftune run`.
 - Use `selftune contributions upload` to flush staged rows to the creator-directed relay endpoint.
 - Relay upload is separate from `selftune alpha upload` and currently reuses the local cloud API key when available.
 - Use `selftune contribute` when the user explicitly wants to export/share an anonymized community bundle.