npm - selftune - Versions diffs - 0.2.9 → 0.2.12 - Mend

selftune 0.2.9 → 0.2.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (140)

package/README.md +35 -35
package/apps/local-dashboard/dist/assets/index-4_dAY17K.js +16 -0
package/apps/local-dashboard/dist/assets/index-BxV5WZHc.css +2 -0
package/apps/local-dashboard/dist/assets/rolldown-runtime-Dw2cE7zH.js +1 -0
package/apps/local-dashboard/dist/assets/vendor-react-CKkiCskZ.js +11 -0
package/apps/local-dashboard/dist/assets/vendor-table-pHbDxq36.js +8 -0
package/apps/local-dashboard/dist/assets/vendor-ui-7xD7fNEU.js +12 -0
package/apps/local-dashboard/dist/index.html +16 -15
package/bin/selftune.cjs +1 -1
package/cli/selftune/activation-rules.ts +1 -0
package/cli/selftune/alpha-upload/build-payloads.ts +18 -2
package/cli/selftune/alpha-upload/stage-canonical.ts +94 -0
package/cli/selftune/auth/device-code.ts +32 -0
package/cli/selftune/auto-update.ts +12 -0
package/cli/selftune/badge/badge.ts +1 -0
package/cli/selftune/canonical-export.ts +5 -0
package/cli/selftune/claude-agents.ts +154 -0
package/cli/selftune/contribute/bundle.ts +1 -0
package/cli/selftune/contribute/contribute.ts +1 -0
package/cli/selftune/cron/setup.ts +2 -2
package/cli/selftune/dashboard-server.ts +1 -0
package/cli/selftune/eval/hooks-to-evals.ts +1 -0
package/cli/selftune/eval/import-skillsbench.ts +1 -0
package/cli/selftune/eval/synthetic-evals.ts +2 -3
package/cli/selftune/eval/unit-test.ts +1 -0
package/cli/selftune/evolution/deploy-proposal.ts +9 -238
package/cli/selftune/evolution/evolve-body.ts +93 -6
package/cli/selftune/evolution/evolve.ts +3 -7
package/cli/selftune/evolution/propose-body.ts +3 -2
package/cli/selftune/evolution/propose-routing.ts +3 -2
package/cli/selftune/evolution/refine-body.ts +3 -2
package/cli/selftune/evolution/rollback.ts +1 -1
package/cli/selftune/export.ts +1 -0
package/cli/selftune/grading/grade-session.ts +8 -0
package/cli/selftune/hooks/auto-activate.ts +1 -0
package/cli/selftune/hooks/evolution-guard.ts +1 -1
package/cli/selftune/hooks/prompt-log.ts +1 -0
package/cli/selftune/hooks/session-stop.ts +34 -40
package/cli/selftune/hooks/skill-change-guard.ts +1 -0
package/cli/selftune/hooks/skill-eval.ts +1 -1
package/cli/selftune/index.ts +23 -14
package/cli/selftune/ingestors/claude-replay.ts +1 -0
package/cli/selftune/ingestors/codex-rollout.ts +1 -0
package/cli/selftune/ingestors/codex-wrapper.ts +1 -0
package/cli/selftune/ingestors/openclaw-ingest.ts +1 -0
package/cli/selftune/ingestors/opencode-ingest.ts +1 -0
package/cli/selftune/init.ts +121 -29
package/cli/selftune/localdb/db.ts +1 -0
package/cli/selftune/localdb/direct-write.ts +39 -0
package/cli/selftune/localdb/materialize.ts +2 -0
package/cli/selftune/localdb/queries.ts +53 -0
package/cli/selftune/localdb/schema.ts +28 -0
package/cli/selftune/normalization.ts +1 -0
package/cli/selftune/observability.ts +1 -0
package/cli/selftune/repair/skill-usage.ts +1 -0
package/cli/selftune/routes/orchestrate-runs.ts +1 -0
package/cli/selftune/routes/overview.ts +1 -0
package/cli/selftune/routes/report.ts +1 -1
package/cli/selftune/routes/skill-report.ts +2 -1
package/cli/selftune/status.ts +1 -1
package/cli/selftune/sync.ts +30 -1
package/cli/selftune/uninstall.ts +412 -0
package/cli/selftune/utils/canonical-log.ts +2 -0
package/cli/selftune/utils/frontmatter.ts +50 -7
package/cli/selftune/utils/jsonl.ts +1 -0
package/cli/selftune/utils/llm-call.ts +131 -3
package/cli/selftune/utils/skill-log.ts +1 -0
package/cli/selftune/utils/transcript.ts +1 -0
package/cli/selftune/utils/trigger-check.ts +1 -1
package/cli/selftune/workflows/skill-md-writer.ts +5 -5
package/cli/selftune/workflows/workflows.ts +1 -0
package/package.json +37 -33
package/packages/telemetry-contract/fixtures/golden.test.ts +1 -0
package/packages/telemetry-contract/package.json +1 -1
package/packages/telemetry-contract/src/schemas.ts +1 -0
package/packages/telemetry-contract/tests/compatibility.test.ts +1 -0
package/packages/ui/README.md +35 -34
package/packages/ui/package.json +3 -3
package/packages/ui/src/components/ActivityTimeline.tsx +50 -43
package/packages/ui/src/components/EvidenceViewer.tsx +306 -182
package/packages/ui/src/components/EvolutionTimeline.tsx +83 -72
package/packages/ui/src/components/InfoTip.tsx +4 -3
package/packages/ui/src/components/OrchestrateRunsPanel.tsx +60 -53
package/packages/ui/src/components/section-cards.tsx +20 -25
package/packages/ui/src/components/skill-health-grid.tsx +213 -193
package/packages/ui/src/lib/constants.tsx +1 -0
package/packages/ui/src/primitives/badge.tsx +12 -15
package/packages/ui/src/primitives/button.tsx +7 -7
package/packages/ui/src/primitives/card.tsx +15 -26
package/packages/ui/src/primitives/checkbox.tsx +7 -8
package/packages/ui/src/primitives/collapsible.tsx +5 -5
package/packages/ui/src/primitives/dropdown-menu.tsx +45 -55
package/packages/ui/src/primitives/label.tsx +6 -6
package/packages/ui/src/primitives/select.tsx +28 -37
package/packages/ui/src/primitives/table.tsx +17 -44
package/packages/ui/src/primitives/tabs.tsx +14 -21
package/packages/ui/src/primitives/tooltip.tsx +10 -22
package/skill/SKILL.md +70 -57
package/skill/Workflows/AlphaUpload.md +4 -4
package/skill/Workflows/AutoActivation.md +11 -6
package/skill/Workflows/Badge.md +22 -16
package/skill/Workflows/Baseline.md +34 -36
package/skill/Workflows/Composability.md +16 -11
package/skill/Workflows/Contribute.md +26 -21
package/skill/Workflows/Cron.md +23 -22
package/skill/Workflows/Dashboard.md +32 -27
package/skill/Workflows/Doctor.md +33 -27
package/skill/Workflows/Evals.md +48 -47
package/skill/Workflows/EvolutionMemory.md +31 -21
package/skill/Workflows/Evolve.md +84 -82
package/skill/Workflows/EvolveBody.md +58 -47
package/skill/Workflows/Grade.md +16 -13
package/skill/Workflows/ImportSkillsBench.md +9 -6
package/skill/Workflows/Ingest.md +36 -21
package/skill/Workflows/Initialize.md +108 -40
package/skill/Workflows/Orchestrate.md +22 -16
package/skill/Workflows/Replay.md +12 -7
package/skill/Workflows/Rollback.md +13 -6
package/skill/Workflows/Schedule.md +6 -6
package/skill/Workflows/Sync.md +18 -11
package/skill/Workflows/UnitTest.md +28 -17
package/skill/Workflows/Watch.md +28 -21
package/skill/agents/diagnosis-analyst.md +11 -0
package/skill/agents/evolution-reviewer.md +15 -1
package/skill/agents/integration-guide.md +10 -0
package/skill/agents/pattern-analyst.md +12 -1
package/skill/references/grading-methodology.md +23 -24
package/skill/references/interactive-config.md +7 -7
package/skill/references/invocation-taxonomy.md +22 -20
package/skill/references/logs.md +14 -6
package/skill/references/setup-patterns.md +4 -2
package/.claude/agents/diagnosis-analyst.md +0 -156
package/.claude/agents/evolution-reviewer.md +0 -180
package/.claude/agents/integration-guide.md +0 -212
package/.claude/agents/pattern-analyst.md +0 -160
package/apps/local-dashboard/dist/assets/index-Bs3Y4ixf.css +0 -1
package/apps/local-dashboard/dist/assets/index-C4UYGWKr.js +0 -15
package/apps/local-dashboard/dist/assets/vendor-react-BQH_6WrG.js +0 -60
package/apps/local-dashboard/dist/assets/vendor-table-dK1QMLq9.js +0 -26
package/apps/local-dashboard/dist/assets/vendor-ui-CO2mrx6e.js +0 -341

package/skill/Workflows/Cron.md CHANGED

@@ -17,11 +17,11 @@ OpenClaw-specific cron integration.
 Auto-detect the current platform and install scheduled jobs.
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--platform <name>` | Force a specific platform (`openclaw`, `cron`, `launchd`, `systemd`) | Auto-detect |
-| `--dry-run` | Preview without installing | Off |
-| `--tz <timezone>` | IANA timezone for job schedules (OpenClaw only) | Flag > `TZ` env > system timezone |
+| Flag                | Description                                                          | Default                           |
+| ------------------- | -------------------------------------------------------------------- | --------------------------------- |
+| `--platform <name>` | Force a specific platform (`openclaw`, `cron`, `launchd`, `systemd`) | Auto-detect                       |
+| `--dry-run`         | Preview without installing                                           | Off                               |
+| `--tz <timezone>`   | IANA timezone for job schedules (OpenClaw only)                      | Flag > `TZ` env > system timezone |
 Platform auto-detection: macOS → launchd, Linux → systemd, other → cron.
@@ -43,9 +43,9 @@ No flags.
 Remove all selftune cron jobs from OpenClaw.
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--dry-run` | Preview which jobs would be removed without deleting | Off |
+| Flag        | Description                                          | Default |
+| ----------- | ---------------------------------------------------- | ------- |
+| `--dry-run` | Preview which jobs would be removed without deleting | Off     |
 ## Aliases
@@ -56,11 +56,11 @@ invocations with flags (e.g. `selftune schedule --platform launchd`) continue to
 Setup registers these jobs:
-| Name | Cron Expression | Schedule | Description |
-|------|----------------|----------|-------------|
-| `selftune-sync` | `*/30 * * * *` | Every 30 minutes | Sync source-truth telemetry |
-| `selftune-status` | `0 8 * * *` | Daily at 8am | Health check — report skills with pass rate below 80% |
-| `selftune-orchestrate` | `0 */6 * * *` | Every 6 hours | Full autonomous loop: sync → candidate selection → evolve → watch |
+| Name                   | Cron Expression | Schedule         | Description                                                       |
+| ---------------------- | --------------- | ---------------- | ----------------------------------------------------------------- |
+| `selftune-sync`        | `*/30 * * * *`  | Every 30 minutes | Sync source-truth telemetry                                       |
+| `selftune-status`      | `0 8 * * *`     | Daily at 8am     | Health check — report skills with pass rate below 80%             |
+| `selftune-orchestrate` | `0 */6 * * *`   | Every 6 hours    | Full autonomous loop: sync → candidate selection → evolve → watch |
 All jobs run in **isolated session** mode — each execution gets a clean
 session with no context accumulation from previous runs.
@@ -79,6 +79,7 @@ session with no context accumulation from previous runs.
 3. Verify with `selftune status` after the first scheduled run fires
 For OpenClaw specifically:
 1. Run `selftune cron setup --platform openclaw --dry-run` to preview
 2. Run `selftune cron setup --platform openclaw` to register jobs
 3. Run `selftune cron list` to verify jobs are registered
@@ -111,15 +112,15 @@ interactive mode is for user-directed improvements.
 ## Safety Controls
-| Control | How It Works |
-|---------|-------------|
-| Dry-run first | `selftune cron setup --dry-run` previews commands before installing |
-| Regression threshold | Evolution only deploys if improvement exceeds 5% on existing triggers |
-| Auto-rollback | `selftune watch` automatically rolls back if pass rate drops below baseline minus threshold |
-| Audit trail | Every evolution recorded in `evolution_audit_log.jsonl` with full history |
-| SKILL.md backup | `.bak` file created before every deploy — primary rollback path exists via .bak; fallback depends on audit metadata integrity |
-| Human override | `selftune evolve rollback --skill <name> --skill-path <path>` available anytime to manually revert |
-| Pin descriptions | Config flag to freeze specific skills and prevent evolution on sensitive skills |
+| Control              | How It Works                                                                                                                  |
+| -------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
+| Dry-run first        | `selftune cron setup --dry-run` previews commands before installing                                                           |
+| Regression threshold | Evolution only deploys if improvement exceeds 5% on existing triggers                                                         |
+| Auto-rollback        | `selftune watch` automatically rolls back if pass rate drops below baseline minus threshold                                   |
+| Audit trail          | Every evolution recorded in `evolution_audit_log.jsonl` with full history                                                     |
+| SKILL.md backup      | `.bak` file created before every deploy — primary rollback path exists via .bak; fallback depends on audit metadata integrity |
+| Human override       | `selftune evolve rollback --skill <name> --skill-path <path>` available anytime to manually revert                            |
+| Pin descriptions     | Config flag to freeze specific skills and prevent evolution on sensitive skills                                               |
 ## Common Patterns

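The Cron.md context lines describe how the scheduler is chosen: macOS → launchd, Linux → systemd, anything else → cron, with `--platform <name>` as an override. The sketch below restates that rule in TypeScript. It is a hypothetical illustration, not selftune's implementation: `Scheduler` and `resolveScheduler` are invented names, and the platform strings assume Node's `process.platform` naming (`darwin`, `linux`, ...).

```typescript
// Hypothetical sketch of the scheduler selection rule described in Cron.md;
// not taken from selftune's source.
type Scheduler = "openclaw" | "cron" | "launchd" | "systemd";

const SCHEDULERS: Scheduler[] = ["openclaw", "cron", "launchd", "systemd"];

// `platform` follows Node's process.platform naming ("darwin", "linux", "win32", ...).
// `forced` is the value passed to --platform, if any.
function resolveScheduler(platform: string, forced?: string): Scheduler {
  if (forced !== undefined) {
    // An explicit --platform value overrides auto-detection.
    const idx = (SCHEDULERS as string[]).indexOf(forced);
    if (idx === -1) {
      throw new Error(`Unknown platform "${forced}"; expected one of: ${SCHEDULERS.join(", ")}`);
    }
    return SCHEDULERS[idx];
  }
  // Auto-detection: macOS -> launchd, Linux -> systemd, anything else -> cron.
  // In this sketch OpenClaw is only chosen when requested explicitly,
  // matching the `--platform openclaw` steps in Cron.md.
  if (platform === "darwin") return "launchd";
  if (platform === "linux") return "systemd";
  return "cron";
}
```

In a Node or Bun process this would be called as `resolveScheduler(process.platform, flags.platform)`, where `flags` stands for whatever parsed-argument object the caller already has.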
package/skill/Workflows/Dashboard.md CHANGED

@@ -19,11 +19,11 @@ generate JSONL from SQLite for debugging or offline analysis.
 ## Options
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--port <port>` | Custom port for the server | 3141 |
-| `--no-open` | Start server without opening browser | Off |
-| `--serve` | *(Deprecated)* Alias for default behavior | — |
+| Flag            | Description                               | Default |
+| --------------- | ----------------------------------------- | ------- |
+| `--port <port>` | Custom port for the server                | 3141    |
+| `--no-open`     | Start server without opening browser      | Off     |
+| `--serve`       | _(Deprecated)_ Alias for default behavior | —       |
 Note: `--export` and `--out` were removed. The CLI will error if used,
 suggesting `selftune dashboard` instead.
@@ -37,18 +37,18 @@ override.
 ### Endpoints
-| Method | Path | Description |
-|--------|------|-------------|
-| `GET` | `/` | Serve dashboard SPA shell |
-| `GET` | `/api/v2/overview` | SQLite-backed overview payload |
-| `GET` | `/api/v2/skills/:name` | SQLite-backed per-skill report |
-| `GET` | `/api/v2/orchestrate-runs` | Recent orchestrate run reports |
-| `GET` | `/api/v2/doctor` | System health diagnostics (config, logs, hooks, evolution) |
-| `GET` | `/api/v2/events` | SSE stream for live dashboard updates |
-| `GET` | `/api/health` | Dashboard server health probe |
-| `POST` | `/api/actions/watch` | Trigger `selftune watch` for a skill |
-| `POST` | `/api/actions/evolve` | Trigger `selftune evolve` for a skill |
-| `POST` | `/api/actions/rollback` | Trigger `selftune evolve rollback` for a skill |
+| Method | Path                       | Description                                                |
+| ------ | -------------------------- | ---------------------------------------------------------- |
+| `GET`  | `/`                        | Serve dashboard SPA shell                                  |
+| `GET`  | `/api/v2/overview`         | SQLite-backed overview payload                             |
+| `GET`  | `/api/v2/skills/:name`     | SQLite-backed per-skill report                             |
+| `GET`  | `/api/v2/orchestrate-runs` | Recent orchestrate run reports                             |
+| `GET`  | `/api/v2/doctor`           | System health diagnostics (config, logs, hooks, evolution) |
+| `GET`  | `/api/v2/events`           | SSE stream for live dashboard updates                      |
+| `GET`  | `/api/health`              | Dashboard server health probe                              |
+| `POST` | `/api/actions/watch`       | Trigger `selftune watch` for a skill                       |
+| `POST` | `/api/actions/evolve`      | Trigger `selftune evolve` for a skill                      |
+| `POST` | `/api/actions/rollback`    | Trigger `selftune evolve rollback` for a skill             |
 ### Live Updates (SSE)
@@ -110,16 +110,16 @@ database and stops the server.
 The dashboard displays data from these sources:
-| Data | Source | Description |
-|------|--------|-------------|
-| Telemetry | SQLite (`~/.selftune/selftune.db`) | Session-level telemetry records |
-| Skills | SQLite (`~/.selftune/selftune.db`) | Skill activation and usage events |
-| Queries | SQLite (`~/.selftune/selftune.db`) | All user queries across sessions |
-| Evolution | SQLite (`~/.selftune/selftune.db`) | Evolution audit trail (create, deploy, rollback) |
-| Decisions | `~/.selftune/memory/` | Evolution decision records |
-| Snapshots | Computed | Per-skill monitoring snapshots (pass rate, regression status) |
-| Unmatched | Computed | Queries that did not trigger any skill |
-| Pending | Computed | Evolution proposals not yet deployed, rejected, or rolled back |
+| Data      | Source                             | Description                                                    |
+| --------- | ---------------------------------- | -------------------------------------------------------------- |
+| Telemetry | SQLite (`~/.selftune/selftune.db`) | Session-level telemetry records                                |
+| Skills    | SQLite (`~/.selftune/selftune.db`) | Skill activation and usage events                              |
+| Queries   | SQLite (`~/.selftune/selftune.db`) | All user queries across sessions                               |
+| Evolution | SQLite (`~/.selftune/selftune.db`) | Evolution audit trail (create, deploy, rollback)               |
+| Decisions | `~/.selftune/memory/`              | Evolution decision records                                     |
+| Snapshots | Computed                           | Per-skill monitoring snapshots (pass rate, regression status)  |
+| Unmatched | Computed                           | Queries that did not trigger any skill                         |
+| Pending   | Computed                           | Evolution proposals not yet deployed, rejected, or rolled back |
 If no log data is found, the server reports an error listing the
 checked file paths.
@@ -142,21 +142,26 @@ to trigger watch, evolve, or rollback directly from the dashboard.
 ## Common Patterns
 **User wants to see skill performance visually**
 > Run `selftune dashboard`. This opens a browser with a point-in-time snapshot.
 > Report to the user that the dashboard is open.
 **User wants live monitoring**
 > Run `selftune dashboard`. The server provides real-time updates via SSE
 > (~1 second latency).
 **Dashboard shows no data**
 > Run `selftune doctor` to verify hooks are installed. If hooks are missing,
 > route to the Initialize workflow. If hooks are present but no sessions
 > have run, inform the user that sessions must generate telemetry first.
 **User wants a different port**
 > Run `selftune dashboard --port <port>`. Port must be 1-65535.
 **User wants to trigger actions from the dashboard**
 > Run `selftune dashboard`. The dashboard provides action buttons for
 > watch, evolve, and rollback per skill via POST endpoints.

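Dashboard.md states that the server defaults to port 3141, that `--port` must be in 1-65535, and that per-skill data is served from `GET /api/v2/skills/:name`. A minimal sketch of those rules follows. `parsePort` and `skillReportUrl` are hypothetical helpers, and the `localhost` host is an assumption, since the diff does not say which interface the server binds to.

```typescript
// Hypothetical helpers illustrating the documented --port rules; not selftune's source.
const DEFAULT_DASHBOARD_PORT = 3141;

// Parse a --port value: absent -> default; otherwise an integer in 1-65535.
function parsePort(raw?: string): number {
  if (raw === undefined) return DEFAULT_DASHBOARD_PORT;
  // Reject inputs such as "80abc" or "3.5", which parseInt would silently truncate.
  if (!/^\d+$/.test(raw)) throw new Error(`Invalid port "${raw}"`);
  const port = Number(raw);
  if (port < 1 || port > 65535) throw new Error(`Port must be 1-65535, got ${port}`);
  return port;
}

// Build the per-skill report URL for GET /api/v2/skills/:name.
// Assumption: the dashboard is reachable on localhost.
function skillReportUrl(skill: string, port: number = DEFAULT_DASHBOARD_PORT): string {
  return `http://localhost:${port}/api/v2/skills/${encodeURIComponent(skill)}`;
}
```

Encoding the skill name keeps names with spaces or slashes from being split into extra path segments.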
package/skill/Workflows/Doctor.md CHANGED

@@ -96,47 +96,47 @@ or queue checks when alpha is configured:
 ### Config Check
-| Check name | What it validates |
-|------------|-------------------|
-| `config` | `~/.selftune/config.json` exists, is valid JSON, contains `agent_type` and `llm_mode` fields |
+| Check name | What it validates                                                                            |
+| ---------- | -------------------------------------------------------------------------------------------- |
+| `config`   | `~/.selftune/config.json` exists, is valid JSON, contains `agent_type` and `llm_mode` fields |
 ### Log Checks (4 checks)
-| Check name | What it validates |
-|------------|-------------------|
+| Check name              | What it validates                                     |
+| ----------------------- | ----------------------------------------------------- |
 | `log_session_telemetry` | `session_telemetry_log.jsonl` exists and is parseable |
-| `log_skill_usage` | `skill_usage_log.jsonl` exists and is parseable |
-| `log_all_queries` | `all_queries_log.jsonl` exists and is parseable |
-| `log_evolution_audit` | `evolution_audit_log.jsonl` exists and is parseable |
+| `log_skill_usage`       | `skill_usage_log.jsonl` exists and is parseable       |
+| `log_all_queries`       | `all_queries_log.jsonl` exists and is parseable       |
+| `log_evolution_audit`   | `evolution_audit_log.jsonl` exists and is parseable   |
 ### Hook Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name      | What it validates                                       |
+| --------------- | ------------------------------------------------------- |
 | `hook_settings` | `~/.claude/settings.json` has selftune hooks configured |
 ### Evolution Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name        | What it validates                                |
+| ----------------- | ------------------------------------------------ |
 | `evolution_audit` | Evolution audit log entries have valid structure |
 ### Integrity Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name                 | What it validates                                                                                             |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------- |
 | `dashboard_freshness_mode` | Warns when the dashboard still relies on legacy JSONL watcher invalidation instead of SQLite WAL live refresh |
 ### Skill Version Sync Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name           | What it validates                                         |
+| -------------------- | --------------------------------------------------------- |
 | `skill_version_sync` | SKILL.md frontmatter version matches package.json version |
 ### Version Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name           | What it validates                                |
+| -------------------- | ------------------------------------------------ |
 | `version_up_to_date` | Installed version matches latest on npm registry |
 ## Steps
@@ -155,15 +155,15 @@ Parse the JSON output. If `healthy: true`, selftune is fully operational.
 For each failed check, take the appropriate action:
-| Failed check | Fix |
-|-------------|-----|
-| `config` | Run `selftune init` (or `selftune init --force` to regenerate). |
-| `log_*` | Run a session to generate initial log entries. Check hook installation with `selftune init`. |
-| `hook_settings` | Run `selftune init` to install hooks into `~/.claude/settings.json`. |
-| `evolution_audit` | Remove corrupted entries. Future operations will append clean entries. |
+| Failed check               | Fix                                                                                                                                              |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `config`                   | Run `selftune init` (or `selftune init --force` to regenerate).                                                                                  |
+| `log_*`                    | Run a session to generate initial log entries. Check hook installation with `selftune init`.                                                     |
+| `hook_settings`            | Run `selftune init` to install hooks into `~/.claude/settings.json`.                                                                             |
+| `evolution_audit`          | Remove corrupted entries. Future operations will append clean entries.                                                                           |
 | `dashboard_freshness_mode` | This is an operator warning, not a broken install. Expect possible freshness gaps for SQLite-only writes and export before destructive recovery. |
-| `skill_version_sync` | Run `bun run sync-version` to stamp SKILL.md from package.json. |
-| `version_up_to_date` | Run `npm install -g selftune` to update. |
+| `skill_version_sync`       | Run `bun run sync-version` to stamp SKILL.md from package.json.                                                                                  |
+| `version_up_to_date`       | Run `npm install -g selftune` to update.                                                                                                         |
 ### 4. Re-run Doctor
@@ -181,6 +181,7 @@ for root cause analysis.
 **Symptoms:** `selftune status` shows alpha upload as "not enrolled" or "enrolled (missing credential)"
 **Diagnostic steps:**
 1. Check `selftune status` — look at "Alpha Upload" and "Cloud link" lines
 2. If `doctor` includes a `cloud_link` or alpha queue warning, prefer `.checks[].guidance.next_command`
 3. If "not enrolled" or "not linked": run `selftune init --alpha --alpha-email <email>` (opens browser for device-code auth)
@@ -192,23 +193,28 @@ for root cause analysis.
 ## Common Patterns
 **User reports something seems broken**
 > Run `selftune doctor`. Parse the JSON output for failed checks. Report
 > each failure's `name` and `message` to the user with the recommended fix.
 **User asks if hooks are working**
 > Run `selftune doctor`. Parse `.checks[]` for hook-related entries. If
 > hooks pass but no data appears, verify hook script paths in
 > `~/.claude/settings.json` point to actual files.
 **No telemetry data available**
 > Run `selftune doctor`. Route fixes by platform:
+>
 > - **Claude Code** — route to the Initialize workflow to install hooks
 > - **Codex** — run `selftune ingest codex` or `selftune ingest wrap-codex`
 > - **OpenCode** — run `selftune ingest opencode`
 > - **OpenClaw** — run `selftune ingest openclaw`
-> At least one session must complete after setup to generate telemetry.
+>   At least one session must complete after setup to generate telemetry.
 **User asks to check selftune health**
 > Run `selftune doctor`. Parse `.healthy` and `.summary`. If `healthy: true`,
 > report that selftune is fully operational. If false, report failed checks
 > and recommended fixes.

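Doctor.md tells the agent to parse the `selftune doctor` JSON, report each failed check's `name` and `message`, attach the fix from the "Failed check | Fix" table, and prefer `.checks[].guidance.next_command` when present. The sketch below does that. It assumes each check also carries a `status` field with `pass`/`warn`/`fail` values; that field is not documented in this diff, and `describeFailures` and `fixFor` are hypothetical names.

```typescript
// Hypothetical sketch for summarizing `selftune doctor` JSON output.
// Assumption: every check has a `status` field. Doctor.md only documents
// `healthy`, `summary`, and `checks[]` entries with `name`, `message`, and
// optionally `guidance.next_command`.
interface DoctorCheck {
  name: string;
  status: "pass" | "warn" | "fail"; // assumed field name and values
  message: string;
  guidance?: { next_command?: string };
}

interface DoctorReport {
  healthy: boolean;
  summary?: unknown; // shape not documented in this diff
  checks: DoctorCheck[];
}

// Fix text copied from the "Failed check | Fix" table in Doctor.md.
const FIXES: { [check: string]: string } = {
  config: "Run `selftune init` (or `selftune init --force` to regenerate).",
  hook_settings: "Run `selftune init` to install hooks into `~/.claude/settings.json`.",
  evolution_audit: "Remove corrupted entries. Future operations will append clean entries.",
  dashboard_freshness_mode:
    "This is an operator warning, not a broken install. Expect possible freshness gaps for SQLite-only writes and export before destructive recovery.",
  skill_version_sync: "Run `bun run sync-version` to stamp SKILL.md from package.json.",
  version_up_to_date: "Run `npm install -g selftune` to update.",
};

// All log_* checks share one fix in the table.
const LOG_FIX =
  "Run a session to generate initial log entries. Check hook installation with `selftune init`.";

function fixFor(check: DoctorCheck): string {
  // Prefer doctor's own next command when it supplies one (e.g. cloud_link warnings).
  const next = check.guidance?.next_command;
  if (next) return "Run `" + next + "`.";
  if (check.name.indexOf("log_") === 0) return LOG_FIX;
  return FIXES[check.name] ?? "No fix on file; see the Doctor workflow.";
}

// One line per non-passing check: "<name>: <message> -> <fix>".
function describeFailures(report: DoctorReport): string[] {
  return report.checks
    .filter((c) => c.status !== "pass")
    .map((c) => c.name + ": " + c.message + " -> " + fixFor(c));
}
```

Warnings are kept alongside failures so that operator notices such as `dashboard_freshness_mode` still reach the user.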
package/skill/Workflows/Evals.md CHANGED

@@ -7,6 +7,7 @@ its invocation type.
 ## When to Invoke
 Invoke this workflow when the user requests any of the following:
 - Generating eval sets or test data for a skill
 - Checking which skills are undertriggering
 - Viewing skill telemetry or usage stats
@@ -21,22 +22,22 @@ selftune eval generate --skill <name> [options]
 ## Options
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--skill <name>` | Skill to generate evals for | Required (unless `--list-skills`) |
-| `--list-skills` | List all logged skills with query counts | Off |
-| `--stats` | Show aggregate telemetry stats for the skill | Off |
-| `--max <n>` | Maximum eval entries per side | 50 |
-| `--seed <n>` | Seed for deterministic shuffling | 42 |
-| `--output <path>` / `--out <path>` | Output file path | `{skillName}_trigger_eval.json` |
-| `--no-negatives` | Exclude negative examples from output | Off |
-| `--no-taxonomy` | Skip invocation_type classification | Off |
-| `--skill-log <path>` | Path to skill_usage_log.jsonl | Default log path |
-| `--query-log <path>` | Path to all_queries_log.jsonl | Default log path |
-| `--telemetry-log <path>` | Path to session_telemetry_log.jsonl | Default log path |
-| `--synthetic` | Generate evals from SKILL.md via LLM (no logs needed) | Off |
-| `--skill-path <path>` | Path to SKILL.md (required with `--synthetic`) | — |
-| `--model <model>` | LLM model to use for synthetic generation | Agent default |
+| Flag                               | Description                                           | Default                           |
+| ---------------------------------- | ----------------------------------------------------- | --------------------------------- |
+| `--skill <name>`                   | Skill to generate evals for                           | Required (unless `--list-skills`) |
+| `--list-skills`                    | List all logged skills with query counts              | Off                               |
+| `--stats`                          | Show aggregate telemetry stats for the skill          | Off                               |
+| `--max <n>`                        | Maximum eval entries per side                         | 50                                |
+| `--seed <n>`                       | Seed for deterministic shuffling                      | 42                                |
+| `--output <path>` / `--out <path>` | Output file path                                      | `{skillName}_trigger_eval.json`   |
+| `--no-negatives`                   | Exclude negative examples from output                 | Off                               |
+| `--no-taxonomy`                    | Skip invocation_type classification                   | Off                               |
+| `--skill-log <path>`               | Path to skill_usage_log.jsonl                         | Default log path                  |
+| `--query-log <path>`               | Path to all_queries_log.jsonl                         | Default log path                  |
+| `--telemetry-log <path>`           | Path to session_telemetry_log.jsonl                   | Default log path                  |
+| `--synthetic`                      | Generate evals from SKILL.md via LLM (no logs needed) | Off                               |
+| `--skill-path <path>`              | Path to SKILL.md (required with `--synthetic`)        | —                                 |
+| `--model <model>`                  | LLM model to use for synthetic generation             | Agent default                     |
 ## Output Format
@@ -126,6 +127,7 @@ selftune eval generate --skill pptx --synthetic --skill-path /path/to/skills/ppt
 ```
 The command:
 1. Reads the SKILL.md file content
 2. Loads real user queries from the database (if available) as few-shot style examples so synthetic queries match real phrasing patterns
 3. Sends skill content and real examples to an LLM with a prompt requesting realistic test queries
@@ -155,6 +157,7 @@ selftune eval generate --skill pptx --max 50 --output evals-pptx.json
 ```
 The command:
 1. Reads positive triggers from `skill_usage_log.jsonl`
 2. Reads all queries from `all_queries_log.jsonl`
 3. Identifies queries that should have triggered but did not
@@ -181,40 +184,36 @@ If the user responds with "use defaults" or similar shorthand, skip to step 1 us
 For `--list-skills` or `--stats` requests, skip pre-flight entirely — these are read-only operations.
-Use `AskUserQuestion` with these questions:
-```json
-{
-  "questions": [
-    {
-      "question": "Generation Mode",
-      "options": ["Log-based — build from real usage logs (recommended if logs exist)", "Synthetic — generate from SKILL.md via LLM (for new skills)"]
-    },
-    {
-      "question": "Model (for synthetic mode)",
-      "options": ["Fast (haiku) — quick generation", "Balanced (sonnet) — better diversity (recommended)", "Best (opus) — highest quality"]
-    },
-    {
-      "question": "Max Entries",
-      "options": ["50 (default)", "25 (quick)", "100 (comprehensive)"]
-    }
-  ]
-}
-```
-If `AskUserQuestion` is not available, fall back to presenting these as inline numbered options.
+Ask one `AskUserQuestion` at a time in this order:
+1. `Generation Mode`
+   Options:
+   - `Log-based — build from real usage logs (recommended if logs exist)`
+   - `Synthetic — generate from SKILL.md via LLM (for new skills)`
+2. If the user chose synthetic, ask `Model (for synthetic mode)`
+   Options:
+   - `Fast (haiku) — quick generation`
+   - `Balanced (sonnet) — better diversity (recommended)`
+   - `Best (opus) — highest quality`
+3. Ask `Max Entries`
+   Options:
+   - `50 (default)`
+   - `25 (quick)`
+   - `100 (comprehensive)`
+If `AskUserQuestion` is not available or Claude does not invoke it, fall back to presenting the same choices as inline numbered options.
 After the user responds, parse their selections and map each choice to the corresponding CLI flags:
-| Selection | CLI Flag |
-|-----------|----------|
-| 1a (log-based) | _(no flag, default)_ |
-| 1b (synthetic) | `--synthetic --skill-path <path>` |
-| Custom max entries | `--max <value>` |
-| 4a (haiku) | `--model haiku` (resolved internally by selftune) |
-| 4b (sonnet) | `--model sonnet` |
-| 4c (opus) | `--model opus` |
-| Custom output path | `--out <path>` |
+| Selection          | CLI Flag                                          |
+| ------------------ | ------------------------------------------------- |
+| 1a (log-based)     | _(no flag, default)_                              |
+| 1b (synthetic)     | `--synthetic --skill-path <path>`                 |
+| Custom max entries | `--max <value>`                                   |
+| 2a (haiku)         | `--model haiku` (resolved internally by selftune) |
+| 2b (sonnet)        | `--model sonnet`                                  |
+| 2c (opus)          | `--model opus`                                    |
+| Custom output path | `--out <path>`                                    |
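The mapping table can be expressed as a small function that turns the answers into an argument list for `selftune`. The `PreflightAnswers` shape and the `toCliArgs` name are hypothetical; the flags are the ones listed in the table. The output path is left out because the table lists `--out` while the log-based example earlier uses `--output`.

```typescript
// Hypothetical shape of the user's answers to the pre-flight questions.
type Mode = "log-based" | "synthetic";
type ModelChoice = "haiku" | "sonnet" | "opus";

interface PreflightAnswers {
  mode: Mode;
  skillPath?: string;  // required when mode is "synthetic"
  model?: ModelChoice; // only asked in synthetic mode
  maxEntries?: number;
}

function toCliArgs(skill: string, a: PreflightAnswers): string[] {
  const args = ["eval", "generate", "--skill", skill];
  // Log-based is the default and needs no flag.
  if (a.mode === "synthetic") {
    if (!a.skillPath) throw new Error("--skill-path is required with --synthetic");
    args.push("--synthetic", "--skill-path", a.skillPath);
    if (a.model) args.push("--model", a.model);
  }
  if (a.maxEntries !== undefined) args.push("--max", String(a.maxEntries));
  return args;
}
```

For example, a synthetic run with the balanced model and 25 entries yields `eval generate --skill pptx --synthetic --skill-path <path> --model sonnet --max 25`.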
 Show a confirmation summary to the user:
@@ -238,6 +237,7 @@ eval generation is useful.
 ### 2. Generate the Eval Set
 Run with `--skill <name>`. Parse the JSON output and review for:
 - Balance between positive and negative entries
 - Coverage of all three positive invocation types (explicit, implicit, contextual)
 - Reasonable negative examples (keyword overlap but wrong intent)
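A rough balance check over the generated JSON, assuming each entry has a boolean `should_trigger` field. That field name and the `balance` helper are hypothetical; the actual eval schema is not shown here.

```typescript
// Hypothetical eval entry shape; field names are assumptions.
interface EvalEntry {
  query: string;
  should_trigger: boolean;
}

function balance(entries: EvalEntry[]) {
  const positives = entries.filter((e) => e.should_trigger).length;
  const negatives = entries.length - positives;
  // Share of positive entries; 0.5 is an even split, 0 for an empty set.
  const positiveShare = entries.length === 0 ? 0 : positives / entries.length;
  return { positives, negatives, positiveShare };
}
```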
@@ -245,6 +245,7 @@ Run with `--skill <name>`. Parse the JSON output and review for:
 ### 3. Review Invocation Type Distribution
 A healthy eval set has:
 - Some explicit queries (easy baseline)
 - Many implicit queries (natural usage)
 - Several contextual queries (real-world usage)
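One way to check this distribution is to count positives by invocation type and flag missing or undersized buckets. The `invocation_type` field name is an assumption, and reading "many implicit" as "implicit is the largest bucket" is an interpretation, not a rule stated by selftune.

```typescript
type InvocationType = "explicit" | "implicit" | "contextual";
// Hypothetical positive-entry shape.
interface PositiveEntry { query: string; invocation_type: InvocationType }

function invocationDistribution(entries: PositiveEntry[]): Record<InvocationType, number> {
  const counts: Record<InvocationType, number> = { explicit: 0, implicit: 0, contextual: 0 };
  for (const e of entries) counts[e.invocation_type] += 1;
  return counts;
}

// Warn when a type is absent or when implicit is not the largest bucket.
function distributionWarnings(c: Record<InvocationType, number>): string[] {
  const warnings: string[] = [];
  for (const t of ["explicit", "implicit", "contextual"] as const) {
    if (c[t] === 0) warnings.push(`no ${t} queries`);
  }
  if (c.implicit < c.explicit || c.implicit < c.contextual) {
    warnings.push("implicit is not the largest bucket");
  }
  return warnings;
}
```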

package/skill/Workflows/EvolutionMemory.md CHANGED

@@ -35,26 +35,29 @@ rolled back.
 # Selftune Context
 ## Active Evolutions
 - pptx: deployed -- Added implicit triggers for slide deck queries
 - csv-parser: regression -- pass_rate=0.65, baseline=0.88
 ## Known Issues
 - Regression detected for csv-parser: pass_rate=0.65 below baseline=0.88
 ## Last Updated
 2026-03-01T14:00:00.000Z
 ```
 **Status values:**
-| Status | Meaning |
-|--------|---------|
-| `deployed` | Evolution was deployed successfully |
-| `failed` | Evolution attempted but did not deploy |
-| `regression` | Watch detected a regression in pass rate |
-| `healthy` | Watch confirmed pass rate is within threshold |
-| `rolled-back` | Rollback completed successfully |
-| `rollback-failed` | Rollback was attempted but failed |
+| Status            | Meaning                                       |
+| ----------------- | --------------------------------------------- |
+| `deployed`        | Evolution was deployed successfully           |
+| `failed`          | Evolution attempted but did not deploy        |
+| `regression`      | Watch detected a regression in pass rate      |
+| `healthy`         | Watch confirmed pass rate is within threshold |
+| `rolled-back`     | Rollback completed successfully               |
+| `rollback-failed` | Rollback was attempted but failed             |
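If you need to read `context.md` programmatically, the Active Evolutions lines in the example follow a `- <skill>: <status> -- <detail>` shape. A minimal parser under that assumption, with the status values taken from the table above; `parseActiveEvolution` is a hypothetical name, not part of selftune.

```typescript
// Status values from the Status table.
const STATUSES = ["deployed", "failed", "regression", "healthy", "rolled-back", "rollback-failed"] as const;
type EvolutionStatus = (typeof STATUSES)[number];

interface ActiveEvolution { skill: string; status: EvolutionStatus; detail: string }

// Parses a line such as "- pptx: deployed -- Added implicit triggers...".
// Returns null for lines that don't match or carry an unknown status.
function parseActiveEvolution(line: string): ActiveEvolution | null {
  const m = /^- ([^:]+): (\S+) -- (.*)$/.exec(line.trim());
  if (!m) return null;
  const raw = m[2];
  if (!(STATUSES as readonly string[]).includes(raw)) return null;
  return { skill: m[1], status: raw as EvolutionStatus, detail: m[3] };
}
```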
 ### 2. plan.md -- Current Priorities
@@ -66,13 +69,16 @@ Records evolution priorities and strategy.
 # Evolution Plan
 ## Current Priorities
 1. Improve csv-parser implicit trigger coverage
 2. Re-evolve pptx after eval set expansion
 ## Strategy
 Focus on skills with highest session volume first.
 ## Last Updated
 2026-03-01T14:00:00.000Z
 ```
@@ -85,6 +91,7 @@ only appended.
 ```markdown
 ## 2026-03-01T14:00:00.000Z -- evolve
 - **Skill:** pptx
 - **Action:** evolved
 - **Rationale:** Missed implicit triggers for slide deck queries
@@ -95,14 +102,14 @@ only appended.
 Each entry contains:
-| Field | Description |
-|-------|-------------|
-| Timestamp | ISO 8601 timestamp in the `##` heading |
-| Action type | `evolve`, `rollback`, or `watch` in the heading |
-| Skill | The skill name |
-| Action | Past-tense result: `evolved`, `rolled-back`, or `watched` |
-| Rationale | Why the action was taken |
-| Result | What happened |
+| Field       | Description                                               |
+| ----------- | --------------------------------------------------------- |
+| Timestamp   | ISO 8601 timestamp in the `##` heading                    |
+| Action type | `evolve`, `rollback`, or `watch` in the heading           |
+| Skill       | The skill name                                            |
+| Action      | Past-tense result: `evolved`, `rolled-back`, or `watched` |
+| Rationale   | Why the action was taken                                  |
+| Result      | What happened                                             |
 Entries are separated by `---` markers.
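A sketch of rendering one entry in this format. The `formatDecision` helper is hypothetical and is not the memory writer's API. It assumes the Action field always follows from the action type, that Result uses the same `- **Result:**` bullet style as the other fields, and that each entry ends with a `---` separator line.

```typescript
type ActionType = "evolve" | "rollback" | "watch";

// Past-tense Action value for each action type, per the Field table.
const PAST_TENSE: Record<ActionType, string> = {
  evolve: "evolved",
  rollback: "rolled-back",
  watch: "watched",
};

interface Decision {
  timestamp: string; // ISO 8601, e.g. new Date().toISOString()
  actionType: ActionType;
  skill: string;
  rationale: string;
  result: string;
}

function formatDecision(d: Decision): string {
  return [
    `## ${d.timestamp} -- ${d.actionType}`,
    `- **Skill:** ${d.skill}`,
    `- **Action:** ${PAST_TENSE[d.actionType]}`,
    `- **Rationale:** ${d.rationale}`,
    `- **Result:** ${d.result}`,
    "",
    "---",
  ].join("\n");
}
```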
@@ -111,11 +118,11 @@ Entries are separated by `---` markers.
 Memory is updated automatically by the memory writer (`cli/selftune/memory/writer.ts`).
 No manual editing is required during normal operation.
-| Trigger | Function | Updates |
-|---------|----------|---------|
-| After evolve completes | `updateContextAfterEvolve` | context.md + decisions.md |
-| After rollback completes | `updateContextAfterRollback` | context.md + decisions.md |
-| After watch completes | `updateContextAfterWatch` | context.md + decisions.md, adds known issues on regression |
+| Trigger                  | Function                     | Updates                                                    |
+| ------------------------ | ---------------------------- | ---------------------------------------------------------- |
+| After evolve completes   | `updateContextAfterEvolve`   | context.md + decisions.md                                  |
+| After rollback completes | `updateContextAfterRollback` | context.md + decisions.md                                  |
+| After watch completes    | `updateContextAfterWatch`    | context.md + decisions.md, adds known issues on regression |
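In the regression case, the watch update yields both an Active Evolutions line and a Known Issues line like those in the `context.md` example. A sketch that reproduces those strings. The `watchContextLines` name, the `regressed` input, the two-decimal formatting, and the detail text reused for the healthy case are assumptions; the actual threshold decision belongs to the watch command and is taken here as an input.

```typescript
// Hypothetical input: the watch command decides `regressed`; this sketch only formats.
interface WatchResult {
  skill: string;
  passRate: number;
  baseline: number;
  regressed: boolean;
}

function watchContextLines(w: WatchResult): { activeLine: string; knownIssue: string | null } {
  // Two-decimal formatting matches the example values (0.65, 0.88); assumed.
  const p = w.passRate.toFixed(2);
  const b = w.baseline.toFixed(2);
  const status = w.regressed ? "regression" : "healthy";
  const activeLine = `- ${w.skill}: ${status} -- pass_rate=${p}, baseline=${b}`;
  // Known issues are only added on regression.
  const knownIssue = w.regressed
    ? `Regression detected for ${w.skill}: pass_rate=${p} below baseline=${b}`
    : null;
  return { activeLine, knownIssue };
}
```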
 ## Reading Memory
@@ -142,13 +149,16 @@ They will be recreated automatically on the next evolve, watch, or rollback run.
 ## Common Patterns
 **"What happened in the last evolution?"**
 > Read `~/.selftune/memory/decisions.md`. The most recent entry at the bottom
 > of the file contains the last action, skill, rationale, and result.
 **"What's the current state?"**
 > Read `~/.selftune/memory/context.md`. The Active Evolutions section lists
 > every tracked skill and its current status.
 **"Memory seems stale"**
 > Delete the files in `~/.selftune/memory/` and run `selftune evolve` or
 > `selftune watch` to recreate them with fresh data.
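The first pattern, finding the last decision, reduces to splitting `decisions.md` on its `---` separators and taking the final entry. A minimal sketch with a hypothetical helper name, assuming separators sit on their own lines and each entry starts with a `## ` heading:

```typescript
// Returns the most recent (bottom-most) entry in decisions.md, or null if none.
function lastDecision(decisionsMd: string): string | null {
  const entries = decisionsMd
    .split(/^---[ \t]*$/m) // separator lines consisting only of "---"
    .map((chunk) => {
      // Drop anything before the entry heading (e.g. a file title).
      const i = chunk.search(/^## /m);
      return i === -1 ? "" : chunk.slice(i).trim();
    })
    .filter((entry) => entry.length > 0);
  return entries.length > 0 ? entries[entries.length - 1] : null;
}
```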