npm - selftune - Versions diffs - 0.2.31 → 0.2.32 - Mend

selftune 0.2.31 → 0.2.32


Files changed (95)

package/README.md +83 -56
package/apps/local-dashboard/dist/assets/index-B-ut4w0B.js +15 -0
package/apps/local-dashboard/dist/assets/index-BFGfCVrL.css +1 -0
package/apps/local-dashboard/dist/assets/vendor-ui-DfowE3Hu.js +1 -0
package/apps/local-dashboard/dist/index.html +3 -3
package/cli/selftune/command-surface.ts +613 -2
package/cli/selftune/create/baseline.ts +429 -0
package/cli/selftune/create/check.ts +35 -0
package/cli/selftune/create/init.ts +115 -0
package/cli/selftune/create/package-candidate-state.ts +771 -0
package/cli/selftune/create/package-evaluator.ts +710 -0
package/cli/selftune/create/package-fingerprint.ts +142 -0
package/cli/selftune/create/package-search.ts +377 -0
package/cli/selftune/create/publish.ts +431 -0
package/cli/selftune/create/readiness.ts +495 -0
package/cli/selftune/create/replay.ts +330 -0
package/cli/selftune/create/report.ts +74 -0
package/cli/selftune/create/scaffold.ts +121 -0
package/cli/selftune/create/skills-ref-adapter.ts +177 -0
package/cli/selftune/create/status.ts +33 -0
package/cli/selftune/create/templates.ts +249 -0
package/cli/selftune/cron/setup.ts +1 -1
package/cli/selftune/dashboard-action-events.ts +4 -1
package/cli/selftune/dashboard-action-result.ts +789 -24
package/cli/selftune/dashboard-action-stream.ts +80 -0
package/cli/selftune/dashboard-contract.ts +146 -3
package/cli/selftune/dashboard-server.ts +5 -4
package/cli/selftune/eval/hooks-to-evals.ts +58 -35
package/cli/selftune/eval/synthetic-evals.ts +145 -17
package/cli/selftune/evolution/bounded-mutations.ts +1045 -0
package/cli/selftune/evolution/evolve-body.ts +9 -36
package/cli/selftune/evolution/evolve.ts +8 -72
package/cli/selftune/evolution/stopping-criteria.ts +5 -13
package/cli/selftune/evolution/unblock-suggestions.ts +0 -16
package/cli/selftune/evolution/validate-host-replay.ts +115 -15
package/cli/selftune/improve.ts +206 -0
package/cli/selftune/index.ts +123 -6
package/cli/selftune/init.ts +1 -1
package/cli/selftune/localdb/queries/dashboard.ts +30 -0
package/cli/selftune/localdb/schema.ts +52 -0
package/cli/selftune/monitoring/watch.ts +257 -23
package/cli/selftune/orchestrate/execute.ts +300 -1
package/cli/selftune/orchestrate/finalize.ts +14 -0
package/cli/selftune/orchestrate/plan.ts +22 -5
package/cli/selftune/orchestrate/prepare.ts +59 -4
package/cli/selftune/orchestrate/report.ts +1 -1
package/cli/selftune/orchestrate.ts +34 -1
package/cli/selftune/publish.ts +35 -0
package/cli/selftune/routes/actions.ts +81 -15
package/cli/selftune/routes/overview.ts +1 -1
package/cli/selftune/routes/skill-report.ts +147 -2
package/cli/selftune/run.ts +18 -0
package/cli/selftune/schedule.ts +3 -3
package/cli/selftune/search-run.ts +703 -0
package/cli/selftune/status.ts +35 -11
package/cli/selftune/testing-readiness.ts +431 -40
package/cli/selftune/types.ts +316 -0
package/cli/selftune/utils/eval-readiness.ts +1 -0
package/cli/selftune/utils/json-output.ts +11 -0
package/cli/selftune/utils/lifecycle-surface.ts +48 -0
package/cli/selftune/utils/query-filter.ts +82 -1
package/cli/selftune/utils/tui.ts +85 -2
package/cli/selftune/verify.ts +205 -0
package/cli/selftune/workflows/proposals.ts +1 -1
package/cli/selftune/workflows/skill-scaffold.ts +141 -63
package/cli/selftune/workflows/workflows.ts +4 -4
package/package.json +1 -1
package/skill/SKILL.md +148 -85
package/skill/references/cli-quick-reference.md +16 -1
package/skill/references/creator-playbook.md +31 -10
package/skill/workflows/Baseline.md +8 -9
package/skill/workflows/Contributions.md +4 -4
package/skill/workflows/Create.md +173 -0
package/skill/workflows/CreateTestDeploy.md +34 -30
package/skill/workflows/Cron.md +2 -2
package/skill/workflows/Dashboard.md +3 -3
package/skill/workflows/Evals.md +13 -7
package/skill/workflows/Evolve.md +75 -32
package/skill/workflows/EvolveBody.md +22 -15
package/skill/workflows/Hook.md +1 -1
package/skill/workflows/Improve.md +168 -0
package/skill/workflows/Initialize.md +3 -3
package/skill/workflows/Orchestrate.md +49 -12
package/skill/workflows/Publish.md +100 -0
package/skill/workflows/Run.md +72 -0
package/skill/workflows/Schedule.md +2 -2
package/skill/workflows/SearchRun.md +89 -0
package/skill/workflows/SignalsDashboard.md +2 -2
package/skill/workflows/UnitTest.md +13 -4
package/skill/workflows/Verify.md +136 -0
package/skill/workflows/Watch.md +114 -47
package/skill/workflows/Workflows.md +13 -8
package/apps/local-dashboard/dist/assets/index-B7v_o1WC.js +0 -15
package/apps/local-dashboard/dist/assets/index-CrO77SVi.css +0 -1
package/apps/local-dashboard/dist/assets/vendor-ui-B0H8s1mP.js +0 -1

package/skill/workflows/SearchRun.md ADDED

@@ -0,0 +1,89 @@
+# selftune Search-Run Workflow
+Use this when the user wants bounded package evolution instead of a single
+description, routing, or body mutation. `search-run` explores a minibatch of
+package variants, evaluates them through the shared package evaluator, and
+records the winner plus provenance in the package frontier.
+## Command
+```bash
+selftune search-run --skill-path <path> [--skill NAME] [--surface routing|body|both] [--max-candidates N] [--agent AGENT] [--eval-set PATH] [--apply-winner] [--json]
+```
+## Options
+| Flag               | Description                                                |
+| ------------------ | ---------------------------------------------------------- |
+| `--skill-path`     | Path to a skill directory or `SKILL.md` file               |
+| `--skill`          | Override the inferred skill name for lineage and reporting |
+| `--surface`        | Mutation surface to explore: `routing`, `body`, or `both`  |
+| `--max-candidates` | Cap the number of variants evaluated in this run           |
+| `--agent`          | Runtime agent to use for shared package evaluation         |
+| `--eval-set`       | Override the eval set used during package evaluation       |
+| `--apply-winner`   | Promote the winning candidate back into the draft package  |
+| `--json`           | Emit the full search result as JSON                        |
+| `--help`           | Show command help                                          |
+## What It Does
+1. Resolves the draft package from `--skill-path`
+2. Generates reflective variants first when measured failures and an agent are
+   available, then eval-informed targeted variants, then deterministic fallback
+   variants to fill the minibatch
+   - when `--surface both`, selftune biases the routing/body minibatch toward
+     the weaker measured surface from the accepted frontier or canonical package
+     evaluation
+3. Evaluates each candidate through the shared package evaluator
+4. If routing and body both produce accepted improvements, evaluates a merged
+   candidate built from the complementary surfaces
+5. Compares accepted candidates against the measured frontier
+6. Persists the search run, selected parent, winner, and provenance
+## When To Use
+- The package is already verified and you want to explore multiple candidate edits
+- You want a measured parent-selection loop instead of a one-shot mutation
+- The skill report already shows frontier lineage and you want to expand it
+## Recommended Path
+Start after the draft passes package verification:
+```bash
+selftune verify --skill-path <path>
+selftune search-run --skill-path <path> --surface both
+```
+If you want the main lifecycle alias instead of the low-level command, use:
+```bash
+selftune improve --skill <name> --skill-path <path> --scope package
+```
+`selftune run` (via the `selftune orchestrate` runtime) auto-selects package
+search for eligible skills. During the plan phase, `shouldSelectPackageSearch()`
+routes a skill to the package-search action instead of description evolution
+when eligibility criteria are met:
+- The skill has an **accepted frontier candidate** from a prior search run, OR
+- The skill has a **canonical package evaluation artifact**, OR
+- The skill has a **draft package plus enough grading evidence** to be treated
+  as package-search eligible during orchestrate
+Manual invocation via `selftune improve --scope package` or
+`selftune search-run` remains available for on-demand use outside the
+orchestrate loop.
+Plain `selftune improve --skill <name> --skill-path <path>` also auto-selects
+this path when the skill already has package evidence or a draft package
+manifest.
+When `search-run` runs with `--apply-winner`, or when `improve --scope package`
+runs without `--dry-run`, the winning candidate is copied back into the draft
+package automatically and the canonical package-evaluation artifact is
+refreshed. The next step is usually:
+```bash
+selftune publish --skill-path <path>
+```
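Reviewer note on `SearchRun.md`: the three eligibility conditions listed for `shouldSelectPackageSearch()` read as a plain OR. The sketch below is one way to picture that check. It is not the package's implementation: the `PackageSearchEvidence` shape, the `isPackageSearchEligible` name, and the `MIN_GRADED_SESSIONS` cutoff are all assumptions made for illustration, since the doc only says "enough grading evidence".

```typescript
// Hypothetical evidence shape; field names are illustrative, not selftune's real types.
interface PackageSearchEvidence {
  hasAcceptedFrontierCandidate: boolean;
  hasCanonicalEvaluationArtifact: boolean;
  hasDraftPackage: boolean;
  gradedSessions: number;
}

// Assumed cutoff: the workflow doc does not state what counts as "enough".
const MIN_GRADED_SESSIONS = 5;

// Any one of the three documented conditions makes the skill eligible.
function isPackageSearchEligible(e: PackageSearchEvidence): boolean {
  if (e.hasAcceptedFrontierCandidate) return true;
  if (e.hasCanonicalEvaluationArtifact) return true;
  return e.hasDraftPackage && e.gradedSessions >= MIN_GRADED_SESSIONS;
}
```

Under these assumptions, a skill with only a draft package and no grading evidence would stay on description evolution until one of the other conditions holds.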

package/skill/workflows/SignalsDashboard.md CHANGED

@@ -51,9 +51,9 @@ These same thresholds gate proposal generation on the API side.
 4. If the user wants to help a skill reach threshold, route to the **Contribute** workflow
 5. If the user is the skill creator, use the Community page as the handoff into proposals and watch
-## Creator Loop
+## After-Ship Pipeline
-For a creator, the after-ship loop is:
+For a creator, the after-ship pipeline is:
 1. check whether the skill is low-signal or actionable
 2. inspect missed categories and grade distribution

package/skill/workflows/UnitTest.md CHANGED

@@ -146,16 +146,25 @@ selftune eval unit-test --skill Research
 Compare the new `pass_rate` against the previous run. Report whether
 the evolution improved trigger accuracy.
-### 5. Continue the creator loop
+### 5. Continue the pipeline
-After unit tests exist, the next creator step is usually:
+After unit tests exist, the next pipeline step is usually:
 ```bash
-selftune evolve --skill <name> --skill-path <path> --dry-run --validation-mode replay
+selftune verify --skill-path <path>
+```
+If `verify` still reports missing runtime proof, the next explicit supporting
+steps are usually:
+```bash
+selftune create replay --skill-path <path> --mode package
+selftune create baseline --skill-path <path> --mode package
+selftune verify --skill-path <path>
 ```
 That keeps the sequence aligned with the dashboard readiness surface:
-evals -> unit tests -> replay dry-run -> baseline -> deploy -> watch.
+evals -> unit tests -> replay/baseline proof -> publish -> watch.
 ## Common Patterns

package/skill/workflows/Verify.md ADDED

@@ -0,0 +1,136 @@
+# selftune Verify Workflow
+Use this when the user wants to know whether a skill is trustworthy enough to
+ship or when they ask for the full draft lifecycle without wanting every low-level
+command explained upfront.
+## What Verify Means
+`Verify` is the primary trust-building workflow. The lifecycle entrypoint is:
+```bash
+selftune verify --skill-path <path> [--agent AGENT] [--eval-set PATH] [--no-auto-fix] [--json]
+```
+## Options
+| Flag | Description |
+|------|-------------|
+| `--skill-path` | Path to a skill directory or `SKILL.md` file |
+| `--agent` | Runtime agent to use for package evaluation once readiness passes |
+| `--eval-set` | Override the canonical eval-set path for package evaluation |
+| `--no-auto-fix` | Skip automatic evidence generation when readiness checks fail |
+| `--json` | Emit readiness plus report data as JSON |
+| `--help` | Show command help |
+`verify` first runs the same readiness contract as `selftune create check`. If
+the draft is missing evidence (evals, unit tests, replay, or baseline), it
+automatically runs the corresponding sub-command to generate it, for up to 4
+iterations. Use `--no-auto-fix` to disable this and return the missing state
+immediately, as in the old behavior. If the draft is ready, it then runs
+the benchmark-style package report.
+## When to Use
+- The user asks "can I trust this skill?"
+- The user asks "is this ready to ship?"
+- The user asks for the full draft lifecycle or uses older "creator loop" wording
+- A draft package exists and you need trust evidence before publish
+## Default Path
+### 1. Read the current state
+For draft packages:
+```bash
+selftune create status --skill-path <path>
+```
+For a higher-level health summary:
+```bash
+selftune status
+```
+### 2. Run verify
+```bash
+selftune verify --skill-path <path>
+```
+This is the default trust command because it tells you whether the package
+itself is incomplete before you spend time generating more evidence.
+### 3. Fill missing trust evidence
+When `verify` reports missing readiness or evidence, run only the supporting
+step that is missing.
+#### Missing evals or tests
+```bash
+selftune eval generate --skill <name> --skill-path <path>
+selftune eval unit-test --skill <name> --generate --skill-path <path>
+```
+If the skill is cold-start and has no trusted triggers yet, prefer:
+```bash
+selftune eval generate --skill <name> --auto-synthetic --skill-path <path>
+```
+#### Missing runtime proof
+```bash
+selftune create replay --skill-path <path> --mode package
+```
+#### Missing no-skill value proof
+```bash
+selftune create baseline --skill-path <path> --mode package
+```
+### 4. Re-run verify
+```bash
+selftune verify --skill-path <path>
+```
+The skill is verified when it no longer presents missing package/evidence
+blockers and the next action becomes publish.
+## Outcome
+`Verify` should leave the skill in one of these states:
+- still `draft`
+- one of `needs_spec_validation`, `needs_package_resources`, `needs_evals`, `needs_unit_tests`, `needs_routing_replay`, or `needs_baseline`
+- `verified` (`ready_to_publish` in CLI output)
+If the result is `verified`, hand off to `Publish`.
+If the result is one of the concrete missing-evidence states, run only the missing step.
+If the package itself is broken, route back to `Create`.
+## Which workflows to load next
+- package authoring gaps -> `workflows/Create.md`
+- publishing -> `workflows/Publish.md`
+- specific low-level evidence work -> `Evals.md`, `UnitTest.md`, `Replay.md`, `Baseline.md`
+## Common Patterns
+**"How do I know this skill works?"**
+> Use `Verify`. Start with `create status` or `status`, then fill only the
+> missing evidence step that `verify` reports.
+**"Is this ready to ship?"**
+> Use `Verify`. If the package is already verified, move directly to `Publish`.
+**"Give me the creator loop."**
+> Use `Verify` as the primary trust workflow instead of dumping every low-level
+> command at once.
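Reviewer note on `Verify.md`: the doc lists the readiness states and the supporting commands in separate sections. The sketch below pairs them up as a lookup. The pairing is an assumption drawn from the section headings (for example, "Missing runtime proof" matched to `needs_routing_replay`), `nextVerifyStep` is a hypothetical helper, and the plain `draft` state is left out because the doc names no follow-up for it. The command strings themselves are copied from the workflow text.

```typescript
// Readiness states as listed under "Outcome"; `ready_to_publish` is the CLI spelling of verified.
type ReadinessState =
  | "needs_spec_validation"
  | "needs_package_resources"
  | "needs_evals"
  | "needs_unit_tests"
  | "needs_routing_replay"
  | "needs_baseline"
  | "ready_to_publish";

// Hypothetical helper: returns the command (or workflow file) to use next.
function nextVerifyStep(state: ReadinessState, skill: string, skillPath: string): string {
  switch (state) {
    case "needs_evals":
      return `selftune eval generate --skill ${skill} --skill-path ${skillPath}`;
    case "needs_unit_tests":
      return `selftune eval unit-test --skill ${skill} --generate --skill-path ${skillPath}`;
    case "needs_routing_replay":
      return `selftune create replay --skill-path ${skillPath} --mode package`;
    case "needs_baseline":
      return `selftune create baseline --skill-path ${skillPath} --mode package`;
    case "ready_to_publish":
      return `selftune publish --skill-path ${skillPath}`;
    case "needs_spec_validation":
    case "needs_package_resources":
      // Assumed to count as package authoring gaps, which the doc routes to Create.
      return "workflows/Create.md";
  }
}
```

After running the returned command, the documented loop is to re-run `selftune verify --skill-path <path>` and read the new state.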

package/skill/workflows/Watch.md CHANGED

@@ -1,7 +1,10 @@
 # selftune Watch Workflow
-Monitor post-deploy skill performance for regressions. Compares current
+Monitor post-deploy package performance for regressions. Compares current
 pass rates against a baseline within a sliding window of recent sessions.
+Watch is the final stage of the package evaluation pipeline: after a
+package is published, watch feeds measured evidence back into the accepted
+frontier to confirm the package holds under real traffic.
 ## Default Command
@@ -11,33 +14,43 @@ selftune watch --skill <name> --skill-path <path> [options]
 ## Options
-| Flag                  | Description                                      | Default  |
-| --------------------- | ------------------------------------------------ | -------- |
-| `--skill <name>`      | Skill name                                       | Required |
-| `--skill-path <path>` | Path to the skill's SKILL.md                     | Required |
-| `--window <n>`        | Sliding window size (number of sessions)         | 20       |
-| `--threshold <n>`     | Regression threshold (drop from baseline)        | 0.1      |
-| `--auto-rollback`     | Automatically rollback on detected regression    | Off      |
-| `--sync-first`        | Refresh source-truth telemetry before evaluating | Off      |
-| `--sync-force`        | Force a full source rescan during `--sync-first` | Off      |
-| `--grade-threshold <n>` | Grade regression threshold (drop from baseline)| 0.15     |
+| Flag                    | Description                                      | Default  |
+| ----------------------- | ------------------------------------------------ | -------- |
+| `--skill <name>`        | Skill name                                       | Required |
+| `--skill-path <path>`   | Path to the skill's SKILL.md                     | Required |
+| `--window <n>`          | Sliding window size (number of sessions)         | 20       |
+| `--threshold <n>`       | Regression threshold (drop from baseline)        | 0.1      |
+| `--auto-rollback`       | Automatically rollback on detected regression    | Off      |
+| `--sync-first`          | Refresh source-truth telemetry before evaluating | Off      |
+| `--sync-force`          | Force a full source rescan during `--sync-first` | Off      |
+| `--grade-threshold <n>` | Grade regression threshold (drop from baseline)  | 0.15     |
 | `--no-grade-watch`      | Disable grade-based regression monitoring        | Enabled  |
-| `--help`                | Show command help                               | Off      |
+| `--help`                | Show command help                                | Off      |
 ## Output Format
 ```json
 {
-  "skill_name": "pptx",
-  "window_size": 20,
-  "sessions_evaluated": 18,
-  "current_pass_rate": 0.89,
-  "baseline_pass_rate": 0.92,
-  "threshold": 0.1,
-  "regression_detected": false,
-  "delta": -0.03,
-  "status": "healthy",
-  "evaluated_at": "2026-02-28T14:00:00Z",
+  "snapshot": {
+    "timestamp": "2026-04-14T14:00:00Z",
+    "skill_name": "pptx",
+    "window_sessions": 20,
+    "skill_checks": 18,
+    "pass_rate": 0.89,
+    "false_negative_rate": 0.11,
+    "by_invocation_type": {
+      "explicit": { "passed": 4, "total": 4 },
+      "implicit": { "passed": 8, "total": 9 },
+      "contextual": { "passed": 4, "total": 4 },
+      "negative": { "passed": 0, "total": 1 }
+    },
+    "regression_detected": false,
+    "baseline_pass_rate": 0.92
+  },
+  "alert": null,
+  "rolledBack": false,
+  "recommendation": "Skill \"pptx\" is stable. Pass rate 0.89 is within acceptable range of baseline 0.92.",
+  "recommended_command": null,
   "gradeAlert": null,
   "gradeRegression": null
 }
@@ -51,19 +64,21 @@ When grade regression is detected, the additional fields are populated:
   "gradeRegression": {
     "before": 0.85,
     "after": 0.65,
-    "delta": 0.20
+    "delta": 0.2
   }
 }
 ```
 ### Status Values
-| Status              | Meaning                                           |
-| ------------------- | ------------------------------------------------- |
-| `healthy`           | Current pass rate is within threshold of baseline |
-| `warning`           | Pass rate dropped but within threshold            |
-| `regression`        | Pass rate dropped below baseline minus threshold  |
-| `insufficient_data` | Not enough sessions in the window to evaluate     |
+Watch does not emit a separate `status` enum anymore. Instead, read:
+| Field                          | Meaning                                                              |
+| ------------------------------ | -------------------------------------------------------------------- |
+| `snapshot.regression_detected` | Trigger pass rate dropped below baseline minus threshold             |
+| `alert`                        | One or more trigger/grade regressions were detected                  |
+| `rolledBack`                   | Watch auto-rollback restored the previous version                    |
+| `recommended_command`          | Machine-readable follow-up command when watch wants an explicit step |
 ## Grade Regression Monitoring
@@ -76,10 +91,10 @@ exceeds `gradeRegressionThreshold` (default 0.15), a `gradeAlert` is raised.
 This runs alongside trigger regression:
-| Check              | Source                      | Threshold | Field               |
-| ------------------ | --------------------------- | --------- | ------------------- |
-| Trigger regression | Eval set pass rates         | 0.10      | `regression_detected` |
-| Grade regression   | Grading baseline vs recent  | 0.15      | `gradeRegression`     |
+| Check              | Source                     | Threshold | Field                 |
+| ------------------ | -------------------------- | --------- | --------------------- |
+| Trigger regression | Eval set pass rates        | 0.10      | `regression_detected` |
+| Grade regression   | Grading baseline vs recent | 0.15      | `gradeRegression`     |
 Both checks contribute to the overall `alert` field. A grade regression alert
 is appended to the watch alert string alongside any trigger regression alert.
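Reviewer note on the hunk above: the two checks can be worked through with the numbers from the sample output. The sketch below assumes the trigger check is `pass_rate < baseline - threshold` and the grade check is `before - after > gradeThreshold`, which is how the doc's wording reads. `buildWatchAlert`, the `RegressionInput` shape, and the alert message text are hypothetical.

```typescript
// Hypothetical input shape combining both documented checks.
interface RegressionInput {
  passRate: number;
  baselinePassRate: number;
  threshold: number; // trigger regression threshold, documented default 0.1
  gradeBefore: number;
  gradeAfter: number;
  gradeThreshold: number; // grade regression threshold, documented default 0.15
}

// Returns a combined alert string, or null when neither check fires.
function buildWatchAlert(i: RegressionInput): string | null {
  const parts: string[] = [];
  if (i.passRate < i.baselinePassRate - i.threshold) {
    parts.push("Trigger regression: pass rate fell below baseline minus threshold.");
  }
  if (i.gradeBefore - i.gradeAfter > i.gradeThreshold) {
    // The grade alert is appended after any trigger alert.
    parts.push("Grade regression: grade drop exceeds the grade threshold.");
  }
  return parts.length > 0 ? parts.join(" ") : null;
}

// Sample values: 0.89 against a cutoff of 0.92 - 0.1 does not trip the trigger check,
// while a grade drop from 0.85 to 0.65 does exceed 0.15.
const sampleAlert = buildWatchAlert({
  passRate: 0.89,
  baselinePassRate: 0.92,
  threshold: 0.1,
  gradeBefore: 0.85,
  gradeAfter: 0.65,
  gradeThreshold: 0.15,
});
```

With `--no-grade-watch`, only the first check would apply in this sketch.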
@@ -92,16 +107,17 @@ if you only want trigger-based monitoring.
 ### Check Regression Status
 ```bash
-# Parse: .regression_detected (boolean)
-# Parse: .status (string)
-# Parse: .delta (float, negative = regression)
+# Parse: .snapshot.regression_detected (boolean)
+# Parse: .alert (string|null)
+# Parse: .rolledBack (boolean)
 ```
 ### Get Key Metrics
 ```bash
-# Parse: .current_pass_rate vs .baseline_pass_rate
-# Parse: .sessions_evaluated (should be close to .window_size)
+# Parse: .snapshot.pass_rate vs .snapshot.baseline_pass_rate
+# Parse: .snapshot.skill_checks (should be close to .snapshot.window_sessions)
+# Parse: .recommended_command for the next machine-readable follow-up
 ```
 ## Steps
@@ -132,12 +148,12 @@ selftune watch --skill pptx --skill-path /path/to/SKILL.md
 Parse the JSON output. Key decision points:
-| Status              | Action                                                    |
-| ------------------- | --------------------------------------------------------- |
-| `healthy`           | No action needed. Skill is performing well.               |
-| `warning`           | Monitor closely. Consider re-running after more sessions. |
-| `regression`        | Investigate. Consider rollback.                           |
-| `insufficient_data` | Wait for more sessions before evaluating.                 |
+| Signal                              | Action                                            |
+| ----------------------------------- | ------------------------------------------------- |
+| `alert == null`                     | No rollback signal. Continue monitoring.          |
+| `alert != null` and `rolledBack`    | Auto-rollback already happened. Confirm recovery. |
+| `alert != null` and not rolled back | Investigate and consider `recommended_command`.   |
+| low `snapshot.skill_checks`         | Wait for more sessions before calling it stable.  |
 ### 3. Decide Action
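Reviewer note on the new decision table: a consumer of the JSON output could encode the four rows roughly as follows. This is a sketch only. The `WatchResult` fields are taken from the sample output in this diff, but `decideWatchAction`, the action labels, and `MIN_CHECK_RATIO` are assumptions; the doc says "low `snapshot.skill_checks`" without giving a number.

```typescript
// Subset of the documented watch output that the decision table reads.
interface WatchResult {
  snapshot: {
    skill_checks: number;
    window_sessions: number;
    regression_detected: boolean;
  };
  alert: string | null;
  rolledBack: boolean;
  recommended_command: string | null;
}

type WatchAction = "continue_monitoring" | "confirm_recovery" | "investigate" | "wait_for_sessions";

// Assumed ratio below which the window counts as too thin to call stable.
const MIN_CHECK_RATIO = 0.5;

function decideWatchAction(r: WatchResult): WatchAction {
  if (r.alert !== null) {
    // Alert rows: either rollback already happened, or a human should look.
    return r.rolledBack ? "confirm_recovery" : "investigate";
  }
  if (r.snapshot.skill_checks < r.snapshot.window_sessions * MIN_CHECK_RATIO) {
    return "wait_for_sessions";
  }
  return "continue_monitoring";
}

// Parsing a trimmed copy of the sample output shown earlier in this file's diff.
const parsedWatch = JSON.parse(
  '{"snapshot":{"skill_checks":18,"window_sessions":20,"regression_detected":false},' +
    '"alert":null,"rolledBack":false,"recommended_command":null}',
) as WatchResult;
const sampleAction = decideWatchAction(parsedWatch);
```

When the result is `investigate`, `recommended_command` is the field the doc points to for the explicit follow-up.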
@@ -174,7 +190,8 @@ context window resets before the user acts on the results.
 **"Check for regressions"**
-> Same as above. Focus on the `regression_detected` and `delta` fields.
+> Same as above. Focus on `snapshot.regression_detected`, `alert`, and
+> `recommended_command`.
 **"How is the skill doing?"**
@@ -186,12 +203,62 @@ context window resets before the user acts on the results.
 > Use `--auto-rollback`. The command will restore the previous description
 > automatically if pass rate drops below baseline minus threshold.
+## Trust Scoring and Frontier Feedback
+Watch results now feed back into the package search pipeline and orchestrate's
+scope selection:
+- **Frontier demotion:** When watch detects a regression for a skill that has
+  an accepted package frontier candidate, the observed watch evidence is
+  written back into that candidate's artifact and SQLite row. This can demote
+  the candidate during future frontier parent selection, so the next search
+  run compares against a stronger baseline instead of repeating a regressed
+  one.
+- **Publish blocking:** Watch alerts can block publish when regressions are
+  detected. The `create publish --watch` flow attaches structured watch
+  evidence directly to the package-evaluation summary. Regressions surface as
+  explicit blockers rather than silent degradation.
+- **Scope influence:** Orchestrate uses watch evidence when deciding whether a
+  skill should go through description-level evolve or package-level search on
+  the next run. Skills with observed package regressions may be re-routed to
+  package search to address the underlying package-level issue rather than
+  only adjusting the description.
+### How Watch Evidence Feeds Back to the Frontier
+When `selftune publish` runs with watch enabled and the watch result contains
+alerts, the publish flow calls `refreshPackageCandidateEvaluationObservation()`
+(in `create/package-candidate-state.ts`). This function writes the structured
+watch evidence — alert text, regression deltas, and grade regression data —
+back into the candidate's SQLite row.
+The frontier ranking comparator in `package-candidate-state.ts` uses **watch
+rank as the highest-priority signal** in its 15-level sort order:
+| Watch rank | Meaning                                    | Effect on frontier position |
+| ---------- | ------------------------------------------ | --------------------------- |
+| 2          | No issues detected                         | Best — ranked first         |
+| 1          | Unknown or insufficient watch data         | Neutral — middle tier       |
+| 0          | Alert, rollback, or regression detected    | Worst — ranked last         |
+Demoted candidates appear with `watch_demoted: true` in the dashboard frontier
+state, making it visible to the operator which candidates were deprioritized by
+watch evidence.
+On subsequent search runs, the parent selection step picks the highest-ranked
+frontier member as the baseline for the next generation of mutations. A
+watch-demoted candidate (rank 0) will be deprioritized in favor of candidates
+without alerts, so the search continues from a stronger measured baseline
+rather than repeating a regressed one.
 ## Autonomous Mode
-When called by `selftune orchestrate`, watch runs automatically on recently
-evolved skills:
+When called by `selftune run` (backed by the `selftune orchestrate` runtime),
+watch runs automatically on recently evolved skills:
 - Checks all skills evolved in the last --recent-window hours (default 24)
 - Auto-rollback is enabled by default
 - Results are included in the orchestrate run report
 - No user notification — regressions are handled silently via rollback
+- Watch evidence feeds back into the accepted package frontier for skills
+  with package evaluation history, influencing future scope selection
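Reviewer note on `Watch.md`: the watch-rank table implies an ordering in which watch evidence outranks every other signal during parent selection. The sketch below illustrates that idea with a two-key comparator. It is not the comparator in `package-candidate-state.ts`: the doc describes a 15-level sort whose remaining keys are not shown here, so the single `score` tiebreaker, the `FrontierCandidate` shape, and the function names are all hypothetical.

```typescript
type WatchRank = 0 | 1 | 2;

// Hypothetical summary of watch evidence attached to a candidate.
interface WatchObservation {
  alert: string | null;
  rolledBack: boolean;
  regressionDetected: boolean;
}

interface FrontierCandidate {
  id: string;
  watch: WatchObservation | null; // null means no watch data yet
  score: number; // stand-in for the lower-priority sort keys
}

// Maps evidence to the documented ranks: 2 clean, 1 unknown, 0 alert/rollback/regression.
function watchRank(obs: WatchObservation | null): WatchRank {
  if (obs === null) return 1;
  if (obs.alert !== null || obs.rolledBack || obs.regressionDetected) return 0;
  return 2;
}

// Higher watch rank sorts first; score only breaks ties within a rank.
function compareFrontier(a: FrontierCandidate, b: FrontierCandidate): number {
  const byWatch = watchRank(b.watch) - watchRank(a.watch);
  if (byWatch !== 0) return byWatch;
  return b.score - a.score;
}

// Picks the top-ranked frontier member without mutating the input array.
function selectParent(frontier: FrontierCandidate[]): FrontierCandidate | undefined {
  return [...frontier].sort(compareFrontier)[0];
}
```

Under this ordering, a candidate with a strong score but a watch alert would still lose parent selection to a clean candidate with a weaker score, which matches the demotion behavior the doc describes.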

package/skill/workflows/Workflows.md CHANGED

@@ -63,14 +63,15 @@ discovered-source metadata with occurrence count and synergy score.
 `scaffold` turns an observed workflow into a draft local skill.
 - Default behavior is preview-first: the command prints the proposed skill name,
-  output path, provenance, and full `SKILL.md` content.
-- Add `--write` to create `<output-dir>/<skill-name>/SKILL.md`.
-- The generated skill is intentionally conservative: it includes provenance,
-  a description derived from the workflow trigger, an execution plan, and the
-  discovered workflow section. It does not silently publish or distribute the
-  new skill.
-When `selftune orchestrate` sees a strong workflow pattern, it now creates a
+  output path, provenance, and the package preview.
+- Add `--write` to create a full package under `<output-dir>/<skill-name>/`.
+- The generated skill is intentionally conservative: it includes a trigger
+  summary in `SKILL.md`, ordered execution steps in `workflows/default.md`,
+  provenance in `references/overview.md`, empty `scripts/`, empty `assets/`,
+  and a `selftune.create.json` manifest. It does not silently publish or
+  distribute the new skill.
+When `selftune run` sees a strong workflow pattern, it now creates a
 review-first `new_skill` proposal automatically. The manual `scaffold` command
 still exists for explicit previewing and local draft writes.
@@ -89,6 +90,10 @@ selftune workflows scaffold 1 --output-dir .agents/skills --write
 The number prefix (for example, `1.`) is the 1-based index you can pass to
 `selftune workflows save <index>`.
+When you preview a scaffold, selftune prints the package metadata followed by
+the generated file contents for `SKILL.md`, `workflows/default.md`, and
+`references/overview.md`.
 ```text
 Discovered Workflows (from 450 sessions):