npm - selftune - Versions diffs - 0.2.8 → 0.2.10 - Mend

selftune 0.2.8 → 0.2.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (140)

package/README.md +35 -35
package/apps/local-dashboard/dist/assets/index-BZVLv70T.js +16 -0
package/apps/local-dashboard/dist/assets/{index-CRtLkBTi.css → index-Bs3Y4ixf.css} +1 -1
package/apps/local-dashboard/dist/assets/{vendor-react-BQH_6WrG.js → vendor-react-BXP54cYo.js} +4 -4
package/apps/local-dashboard/dist/assets/{vendor-table-dK1QMLq9.js → vendor-table-DTF_SXoy.js} +1 -1
package/apps/local-dashboard/dist/assets/{vendor-ui-CO2mrx6e.js → vendor-ui-CWU0d1wd.js} +66 -66
package/apps/local-dashboard/dist/index.html +15 -15
package/bin/selftune.cjs +1 -1
package/cli/selftune/activation-rules.ts +37 -18
package/cli/selftune/agent-guidance.ts +16 -16
package/cli/selftune/alpha-identity.ts +1 -2
package/cli/selftune/alpha-upload/build-payloads.ts +18 -2
package/cli/selftune/alpha-upload/flush.ts +2 -2
package/cli/selftune/alpha-upload/stage-canonical.ts +106 -3
package/cli/selftune/auth/device-code.ts +32 -0
package/cli/selftune/auto-update.ts +12 -0
package/cli/selftune/badge/badge.ts +1 -0
package/cli/selftune/canonical-export.ts +5 -0
package/cli/selftune/claude-agents.ts +154 -0
package/cli/selftune/contribute/bundle.ts +2 -0
package/cli/selftune/contribute/contribute.ts +1 -0
package/cli/selftune/cron/setup.ts +2 -2
package/cli/selftune/dashboard-contract.ts +1 -1
package/cli/selftune/dashboard-server.ts +11 -52
package/cli/selftune/eval/hooks-to-evals.ts +13 -6
package/cli/selftune/eval/import-skillsbench.ts +1 -0
package/cli/selftune/eval/synthetic-evals.ts +2 -3
package/cli/selftune/eval/unit-test.ts +1 -0
package/cli/selftune/evolution/deploy-proposal.ts +1 -0
package/cli/selftune/evolution/evolve-body.ts +93 -6
package/cli/selftune/evolution/evolve.ts +0 -1
package/cli/selftune/evolution/propose-body.ts +3 -2
package/cli/selftune/evolution/propose-routing.ts +3 -2
package/cli/selftune/evolution/refine-body.ts +3 -2
package/cli/selftune/export.ts +1 -0
package/cli/selftune/grading/auto-grade.ts +1 -0
package/cli/selftune/grading/grade-session.ts +9 -0
package/cli/selftune/hooks/auto-activate.ts +6 -0
package/cli/selftune/hooks/evolution-guard.ts +12 -15
package/cli/selftune/hooks/prompt-log.ts +1 -0
package/cli/selftune/hooks/session-stop.ts +34 -40
package/cli/selftune/hooks/skill-change-guard.ts +1 -0
package/cli/selftune/hooks/skill-eval.ts +1 -1
package/cli/selftune/index.ts +23 -14
package/cli/selftune/ingestors/claude-replay.ts +1 -0
package/cli/selftune/ingestors/codex-rollout.ts +1 -0
package/cli/selftune/ingestors/codex-wrapper.ts +1 -0
package/cli/selftune/ingestors/openclaw-ingest.ts +1 -0
package/cli/selftune/ingestors/opencode-ingest.ts +1 -0
package/cli/selftune/init.ts +197 -96
package/cli/selftune/localdb/db.ts +1 -0
package/cli/selftune/localdb/direct-write.ts +93 -12
package/cli/selftune/localdb/materialize.ts +2 -0
package/cli/selftune/localdb/queries.ts +210 -0
package/cli/selftune/localdb/schema.ts +72 -1
package/cli/selftune/monitoring/watch.ts +1 -0
package/cli/selftune/normalization.ts +4 -0
package/cli/selftune/observability.ts +14 -7
package/cli/selftune/orchestrate.ts +15 -37
package/cli/selftune/repair/skill-usage.ts +7 -3
package/cli/selftune/routes/orchestrate-runs.ts +1 -0
package/cli/selftune/routes/overview.ts +1 -0
package/cli/selftune/routes/skill-report.ts +1 -0
package/cli/selftune/sync.ts +31 -1
package/cli/selftune/types.ts +2 -2
package/cli/selftune/uninstall.ts +412 -0
package/cli/selftune/utils/canonical-log.ts +2 -0
package/cli/selftune/utils/jsonl.ts +1 -0
package/cli/selftune/utils/llm-call.ts +131 -3
package/cli/selftune/utils/skill-log.ts +1 -0
package/cli/selftune/utils/transcript.ts +1 -0
package/cli/selftune/utils/trigger-check.ts +1 -1
package/cli/selftune/workflows/skill-md-writer.ts +5 -5
package/cli/selftune/workflows/workflows.ts +1 -0
package/package.json +38 -33
package/packages/telemetry-contract/fixtures/golden.test.ts +1 -0
package/packages/telemetry-contract/package.json +3 -3
package/packages/telemetry-contract/src/index.ts +0 -1
package/packages/telemetry-contract/src/schemas.ts +6 -24
package/packages/telemetry-contract/tests/compatibility.test.ts +1 -0
package/packages/ui/README.md +35 -34
package/packages/ui/package.json +3 -3
package/packages/ui/src/components/ActivityTimeline.tsx +49 -42
package/packages/ui/src/components/EvidenceViewer.tsx +306 -182
package/packages/ui/src/components/EvolutionTimeline.tsx +83 -72
package/packages/ui/src/components/InfoTip.tsx +4 -3
package/packages/ui/src/components/OrchestrateRunsPanel.tsx +60 -53
package/packages/ui/src/components/section-cards.tsx +19 -24
package/packages/ui/src/components/skill-health-grid.tsx +213 -193
package/packages/ui/src/lib/constants.tsx +1 -0
package/packages/ui/src/primitives/badge.tsx +12 -15
package/packages/ui/src/primitives/button.tsx +7 -7
package/packages/ui/src/primitives/card.tsx +15 -26
package/packages/ui/src/primitives/checkbox.tsx +7 -8
package/packages/ui/src/primitives/collapsible.tsx +5 -5
package/packages/ui/src/primitives/dropdown-menu.tsx +45 -55
package/packages/ui/src/primitives/label.tsx +6 -6
package/packages/ui/src/primitives/select.tsx +28 -37
package/packages/ui/src/primitives/table.tsx +17 -44
package/packages/ui/src/primitives/tabs.tsx +14 -21
package/packages/ui/src/primitives/tooltip.tsx +10 -22
package/skill/SKILL.md +72 -59
package/skill/Workflows/AlphaUpload.md +4 -4
package/skill/Workflows/AutoActivation.md +11 -6
package/skill/Workflows/Badge.md +22 -16
package/skill/Workflows/Baseline.md +34 -36
package/skill/Workflows/Composability.md +16 -11
package/skill/Workflows/Contribute.md +26 -21
package/skill/Workflows/Cron.md +23 -22
package/skill/Workflows/Dashboard.md +40 -40
package/skill/Workflows/Doctor.md +40 -34
package/skill/Workflows/Evals.md +48 -47
package/skill/Workflows/EvolutionMemory.md +31 -21
package/skill/Workflows/Evolve.md +84 -82
package/skill/Workflows/EvolveBody.md +58 -47
package/skill/Workflows/Grade.md +16 -13
package/skill/Workflows/ImportSkillsBench.md +9 -6
package/skill/Workflows/Ingest.md +36 -21
package/skill/Workflows/Initialize.md +138 -97
package/skill/Workflows/Orchestrate.md +22 -16
package/skill/Workflows/Replay.md +12 -7
package/skill/Workflows/Rollback.md +13 -6
package/skill/Workflows/Schedule.md +6 -6
package/skill/Workflows/Sync.md +18 -11
package/skill/Workflows/UnitTest.md +28 -17
package/skill/Workflows/Watch.md +28 -21
package/skill/agents/diagnosis-analyst.md +11 -0
package/skill/agents/evolution-reviewer.md +15 -1
package/skill/agents/integration-guide.md +10 -0
package/skill/agents/pattern-analyst.md +12 -1
package/skill/references/grading-methodology.md +23 -24
package/skill/references/interactive-config.md +7 -7
package/skill/references/invocation-taxonomy.md +22 -20
package/skill/references/logs.md +20 -6
package/skill/references/setup-patterns.md +4 -2
package/.claude/agents/diagnosis-analyst.md +0 -156
package/.claude/agents/evolution-reviewer.md +0 -180
package/.claude/agents/integration-guide.md +0 -212
package/.claude/agents/pattern-analyst.md +0 -160
package/apps/local-dashboard/dist/assets/index-Bk9vSHHd.js +0 -15

package/skill/Workflows/Contribute.md CHANGED

@@ -18,42 +18,43 @@ selftune contribute --skill selftune
 ## Options
-| Flag | Description |
-|------|-------------|
-| `--skill <name>` | Skill to contribute data for (default: "selftune") |
-| `--output <path>` | Output file path (default: auto-generated in ~/.selftune/contributions/) |
-| `--preview` | Show what would be shared without writing |
-| `--sanitize <level>` | `conservative` (default) or `aggressive` |
-| `--since <date>` | Only include data from this date onward |
-| `--submit` | Auto-create GitHub Issue via `gh` CLI |
+| Flag                 | Description                                                              |
+| -------------------- | ------------------------------------------------------------------------ |
+| `--skill <name>`     | Skill to contribute data for (default: "selftune")                       |
+| `--output <path>`    | Output file path (default: auto-generated in ~/.selftune/contributions/) |
+| `--preview`          | Show what would be shared without writing                                |
+| `--sanitize <level>` | `conservative` (default) or `aggressive`                                 |
+| `--since <date>`     | Only include data from this date onward                                  |
+| `--submit`           | Auto-create GitHub Issue via `gh` CLI                                    |
 ## Sanitization Levels
 ### Conservative (default)
-| Pattern | Replacement |
-|---------|-------------|
-| File paths | `[PATH]` |
-| Email addresses | `[EMAIL]` |
-| API keys, tokens, JWTs | `[SECRET]` |
-| IP addresses | `[IP]` |
-| Project name from cwd | `[PROJECT]` |
-| Session IDs | `[SESSION]` |
+| Pattern                | Replacement |
+| ---------------------- | ----------- |
+| File paths             | `[PATH]`    |
+| Email addresses        | `[EMAIL]`   |
+| API keys, tokens, JWTs | `[SECRET]`  |
+| IP addresses           | `[IP]`      |
+| Project name from cwd  | `[PROJECT]` |
+| Session IDs            | `[SESSION]` |
 ### Aggressive
 Extends conservative with:
-| Pattern | Replacement |
-|---------|-------------|
+| Pattern                                    | Replacement    |
+| ------------------------------------------ | -------------- |
 | camelCase/PascalCase identifiers > 8 chars | `[IDENTIFIER]` |
-| Quoted strings | `[STRING]` |
-| Import/require module paths | `[MODULE]` |
-| Queries > 200 chars | Truncated |
+| Quoted strings                             | `[STRING]`     |
+| Import/require module paths                | `[MODULE]`     |
+| Queries > 200 chars                        | Truncated      |
 ## Bundle Contents
 The contribution bundle includes:
 - **Positive queries** -- queries that triggered the skill (sanitized)
 - **Eval entries** -- trigger eval set for the skill
 - **Grading summary** -- aggregate pass rates (no raw transcripts)
@@ -79,16 +80,20 @@ No raw transcripts, file contents, or identifiable information is included.
 ## Common Patterns
 **User wants to see what would be shared**
 > Run `selftune contribute --preview`. Parse the output and report the
 > sanitized data summary to the user before proceeding.
 **User requests stronger anonymization**
 > Run `selftune contribute --sanitize aggressive`. This replaces identifiers,
 > quoted strings, and module paths in addition to standard PII scrubbing.
 **User wants to submit directly**
 > Run `selftune contribute --submit`. This creates a GitHub Issue via `gh`
 > CLI with the bundle inlined or uploaded as a gist.
 **User wants to limit to recent data**
 > Run `selftune contribute --since <date>` with the user's specified date.
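
Note on the Contribute.md diff above: the "Conservative" table maps patterns to placeholder tokens. A minimal sketch of that kind of pattern-to-placeholder pass follows. The regular expressions and the `sanitizeConservative` name are illustrative assumptions, not selftune's actual implementation; only the placeholder tokens (`[EMAIL]`, `[IP]`, `[PATH]`) come from the table.

```typescript
// Hypothetical rules: the regexes below are assumptions for illustration,
// not the patterns selftune ships. Only the tokens come from the docs table.
const RULES: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[EMAIL]"],
  [/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, "[IP]"],
  [/(?:~|\.{1,2})?\/(?:[\w.-]+\/)*[\w.-]+/g, "[PATH]"],
];

// Apply each rule in order, feeding the output of one into the next.
function sanitizeConservative(query: string): string {
  return RULES.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    query,
  );
}

console.log(sanitizeConservative("mail bob@example.com from 10.0.0.1"));
// prints "mail [EMAIL] from [IP]"
```

Rule order matters in a sketch like this: emails are replaced before paths so that a later, broader pattern cannot split an address.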

package/skill/Workflows/Cron.md CHANGED

@@ -17,11 +17,11 @@ OpenClaw-specific cron integration.
 Auto-detect the current platform and install scheduled jobs.
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--platform <name>` | Force a specific platform (`openclaw`, `cron`, `launchd`, `systemd`) | Auto-detect |
-| `--dry-run` | Preview without installing | Off |
-| `--tz <timezone>` | IANA timezone for job schedules (OpenClaw only) | Flag > `TZ` env > system timezone |
+| Flag                | Description                                                          | Default                           |
+| ------------------- | -------------------------------------------------------------------- | --------------------------------- |
+| `--platform <name>` | Force a specific platform (`openclaw`, `cron`, `launchd`, `systemd`) | Auto-detect                       |
+| `--dry-run`         | Preview without installing                                           | Off                               |
+| `--tz <timezone>`   | IANA timezone for job schedules (OpenClaw only)                      | Flag > `TZ` env > system timezone |
 Platform auto-detection: macOS → launchd, Linux → systemd, other → cron.
@@ -43,9 +43,9 @@ No flags.
 Remove all selftune cron jobs from OpenClaw.
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--dry-run` | Preview which jobs would be removed without deleting | Off |
+| Flag        | Description                                          | Default |
+| ----------- | ---------------------------------------------------- | ------- |
+| `--dry-run` | Preview which jobs would be removed without deleting | Off     |
 ## Aliases
@@ -56,11 +56,11 @@ invocations with flags (e.g. `selftune schedule --platform launchd`) continue to
 Setup registers these jobs:
-| Name | Cron Expression | Schedule | Description |
-|------|----------------|----------|-------------|
-| `selftune-sync` | `*/30 * * * *` | Every 30 minutes | Sync source-truth telemetry |
-| `selftune-status` | `0 8 * * *` | Daily at 8am | Health check — report skills with pass rate below 80% |
-| `selftune-orchestrate` | `0 */6 * * *` | Every 6 hours | Full autonomous loop: sync → candidate selection → evolve → watch |
+| Name                   | Cron Expression | Schedule         | Description                                                       |
+| ---------------------- | --------------- | ---------------- | ----------------------------------------------------------------- |
+| `selftune-sync`        | `*/30 * * * *`  | Every 30 minutes | Sync source-truth telemetry                                       |
+| `selftune-status`      | `0 8 * * *`     | Daily at 8am     | Health check — report skills with pass rate below 80%             |
+| `selftune-orchestrate` | `0 */6 * * *`   | Every 6 hours    | Full autonomous loop: sync → candidate selection → evolve → watch |
 All jobs run in **isolated session** mode — each execution gets a clean
 session with no context accumulation from previous runs.
@@ -79,6 +79,7 @@ session with no context accumulation from previous runs.
 3. Verify with `selftune status` after the first scheduled run fires
 For OpenClaw specifically:
 1. Run `selftune cron setup --platform openclaw --dry-run` to preview
 2. Run `selftune cron setup --platform openclaw` to register jobs
 3. Run `selftune cron list` to verify jobs are registered
@@ -111,15 +112,15 @@ interactive mode is for user-directed improvements.
 ## Safety Controls
-| Control | How It Works |
-|---------|-------------|
-| Dry-run first | `selftune cron setup --dry-run` previews commands before installing |
-| Regression threshold | Evolution only deploys if improvement exceeds 5% on existing triggers |
-| Auto-rollback | `selftune watch` automatically rolls back if pass rate drops below baseline minus threshold |
-| Audit trail | Every evolution recorded in `evolution_audit_log.jsonl` with full history |
-| SKILL.md backup | `.bak` file created before every deploy — primary rollback path exists via .bak; fallback depends on audit metadata integrity |
-| Human override | `selftune evolve rollback --skill <name> --skill-path <path>` available anytime to manually revert |
-| Pin descriptions | Config flag to freeze specific skills and prevent evolution on sensitive skills |
+| Control              | How It Works                                                                                                                  |
+| -------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
+| Dry-run first        | `selftune cron setup --dry-run` previews commands before installing                                                           |
+| Regression threshold | Evolution only deploys if improvement exceeds 5% on existing triggers                                                         |
+| Auto-rollback        | `selftune watch` automatically rolls back if pass rate drops below baseline minus threshold                                   |
+| Audit trail          | Every evolution recorded in `evolution_audit_log.jsonl` with full history                                                     |
+| SKILL.md backup      | `.bak` file created before every deploy — primary rollback path exists via .bak; fallback depends on audit metadata integrity |
+| Human override       | `selftune evolve rollback --skill <name> --skill-path <path>` available anytime to manually revert                            |
+| Pin descriptions     | Config flag to freeze specific skills and prevent evolution on sensitive skills                                               |
 ## Common Patterns
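
Note on the Cron.md diff above: the auto-detection rule (macOS → launchd, Linux → systemd, other → cron) and the registered job table can be sketched as plain data plus one function. The names `detectScheduler` and `JOBS` are hypothetical, and keying the rule on Node-style platform strings (`"darwin"`, `"linux"`) is an assumption; the cron expressions are copied from the table.

```typescript
type SchedulerPlatform = "launchd" | "systemd" | "cron";

// Assumption: the platform is identified by a Node-style string such as
// process.platform. The mapping itself follows the documented rule.
function detectScheduler(nodePlatform: string): SchedulerPlatform {
  switch (nodePlatform) {
    case "darwin":
      return "launchd";
    case "linux":
      return "systemd";
    default:
      return "cron";
  }
}

// Job names and cron expressions as listed in the docs table.
const JOBS: ReadonlyArray<{ name: string; cron: string }> = [
  { name: "selftune-sync", cron: "*/30 * * * *" },
  { name: "selftune-status", cron: "0 8 * * *" },
  { name: "selftune-orchestrate", cron: "0 */6 * * *" },
];

console.log(detectScheduler("darwin"), JOBS.map((job) => job.name));
```

The `--platform` flag documented in the same file overrides this detection, which is why `openclaw` does not appear in the sketch.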

package/skill/Workflows/Dashboard.md CHANGED

@@ -11,22 +11,19 @@ selftune dashboard
 ```
 Starts a Bun HTTP server with a React SPA dashboard and opens it in the
-default browser. The dashboard reads SQLite directly, but the current
-live-update invalidation path still watches JSONL logs and pushes
-updates via Server-Sent Events (SSE). That means the dashboard usually
-refreshes quickly, but SQLite-only writes can still lag until the WAL
-cutover lands. TanStack Query polling (60s) acts as a fallback. Action
-buttons trigger selftune commands directly from the dashboard. Use
-`selftune export` to generate JSONL from SQLite for debugging or
-offline analysis.
+default browser. The dashboard reads SQLite directly and uses WAL-based
+invalidation to push live updates via Server-Sent Events (SSE).
+TanStack Query polling (60s) acts as a fallback. Action buttons trigger
+selftune commands directly from the dashboard. Use `selftune export` to
+generate JSONL from SQLite for debugging or offline analysis.
 ## Options
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--port <port>` | Custom port for the server | 3141 |
-| `--no-open` | Start server without opening browser | Off |
-| `--serve` | *(Deprecated)* Alias for default behavior | — |
+| Flag            | Description                               | Default |
+| --------------- | ----------------------------------------- | ------- |
+| `--port <port>` | Custom port for the server                | 3141    |
+| `--no-open`     | Start server without opening browser      | Off     |
+| `--serve`       | _(Deprecated)_ Alias for default behavior | —       |
 Note: `--export` and `--out` were removed. The CLI will error if used,
 suggesting `selftune dashboard` instead.
@@ -40,27 +37,25 @@ override.
 ### Endpoints
-| Method | Path | Description |
-|--------|------|-------------|
-| `GET` | `/` | Serve dashboard SPA shell |
-| `GET` | `/api/v2/overview` | SQLite-backed overview payload |
-| `GET` | `/api/v2/skills/:name` | SQLite-backed per-skill report |
-| `GET` | `/api/v2/orchestrate-runs` | Recent orchestrate run reports |
-| `GET` | `/api/v2/doctor` | System health diagnostics (config, logs, hooks, evolution) |
-| `GET` | `/api/v2/events` | SSE stream for live dashboard updates |
-| `GET` | `/api/health` | Dashboard server health probe |
-| `POST` | `/api/actions/watch` | Trigger `selftune watch` for a skill |
-| `POST` | `/api/actions/evolve` | Trigger `selftune evolve` for a skill |
-| `POST` | `/api/actions/rollback` | Trigger `selftune evolve rollback` for a skill |
+| Method | Path                       | Description                                                |
+| ------ | -------------------------- | ---------------------------------------------------------- |
+| `GET`  | `/`                        | Serve dashboard SPA shell                                  |
+| `GET`  | `/api/v2/overview`         | SQLite-backed overview payload                             |
+| `GET`  | `/api/v2/skills/:name`     | SQLite-backed per-skill report                             |
+| `GET`  | `/api/v2/orchestrate-runs` | Recent orchestrate run reports                             |
+| `GET`  | `/api/v2/doctor`           | System health diagnostics (config, logs, hooks, evolution) |
+| `GET`  | `/api/v2/events`           | SSE stream for live dashboard updates                      |
+| `GET`  | `/api/health`              | Dashboard server health probe                              |
+| `POST` | `/api/actions/watch`       | Trigger `selftune watch` for a skill                       |
+| `POST` | `/api/actions/evolve`      | Trigger `selftune evolve` for a skill                      |
+| `POST` | `/api/actions/rollback`    | Trigger `selftune evolve rollback` for a skill             |
 ### Live Updates (SSE)
 The dashboard connects to `/api/v2/events` via Server-Sent Events.
-When watched JSONL log files change on disk, the server broadcasts an
-`update` event. The SPA invalidates all cached queries, triggering
-immediate refetches. New data usually appears quickly, but the runtime
-footer and Status page will warn when the server is still in this
-legacy JSONL watcher mode.
+The server watches the SQLite WAL file for changes and broadcasts an
+`update` event when new data is written. The SPA invalidates all cached
+queries, triggering immediate refetches (~1s latency).
 TanStack Query polling (60s) acts as a fallback safety net in case the
 SSE connection drops. Data also refreshes on window focus.
@@ -115,16 +110,16 @@ database and stops the server.
 The dashboard displays data from these sources:
-| Data | Source | Description |
-|------|--------|-------------|
-| Telemetry | SQLite (`~/.selftune/selftune.db`) | Session-level telemetry records |
-| Skills | SQLite (`~/.selftune/selftune.db`) | Skill activation and usage events |
-| Queries | SQLite (`~/.selftune/selftune.db`) | All user queries across sessions |
-| Evolution | SQLite (`~/.selftune/selftune.db`) | Evolution audit trail (create, deploy, rollback) |
-| Decisions | `~/.selftune/memory/` | Evolution decision records |
-| Snapshots | Computed | Per-skill monitoring snapshots (pass rate, regression status) |
-| Unmatched | Computed | Queries that did not trigger any skill |
-| Pending | Computed | Evolution proposals not yet deployed, rejected, or rolled back |
+| Data      | Source                             | Description                                                    |
+| --------- | ---------------------------------- | -------------------------------------------------------------- |
+| Telemetry | SQLite (`~/.selftune/selftune.db`) | Session-level telemetry records                                |
+| Skills    | SQLite (`~/.selftune/selftune.db`) | Skill activation and usage events                              |
+| Queries   | SQLite (`~/.selftune/selftune.db`) | All user queries across sessions                               |
+| Evolution | SQLite (`~/.selftune/selftune.db`) | Evolution audit trail (create, deploy, rollback)               |
+| Decisions | `~/.selftune/memory/`              | Evolution decision records                                     |
+| Snapshots | Computed                           | Per-skill monitoring snapshots (pass rate, regression status)  |
+| Unmatched | Computed                           | Queries that did not trigger any skill                         |
+| Pending   | Computed                           | Evolution proposals not yet deployed, rejected, or rolled back |
 If no log data is found, the server reports an error listing the
 checked file paths.
@@ -147,21 +142,26 @@ to trigger watch, evolve, or rollback directly from the dashboard.
 ## Common Patterns
 **User wants to see skill performance visually**
 > Run `selftune dashboard`. This opens a browser with a point-in-time snapshot.
 > Report to the user that the dashboard is open.
 **User wants live monitoring**
 > Run `selftune dashboard`. The server provides real-time updates via SSE
 > (~1 second latency).
 **Dashboard shows no data**
 > Run `selftune doctor` to verify hooks are installed. If hooks are missing,
 > route to the Initialize workflow. If hooks are present but no sessions
 > have run, inform the user that sessions must generate telemetry first.
 **User wants a different port**
 > Run `selftune dashboard --port <port>`. Port must be 1-65535.
 **User wants to trigger actions from the dashboard**
 > Run `selftune dashboard`. The dashboard provides action buttons for
 > watch, evolve, and rollback per skill via POST endpoints.
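
Note on the Dashboard.md diff above: the new text describes a server that broadcasts an `update` event on `/api/v2/events`, with the SPA refetching in response. A minimal client-side sketch is below. It assumes the event is delivered as a named SSE event called `update` and that the server listens on `localhost`; `dashboardUrl`, `onDashboardUpdate`, and `EventSourceLike` are hypothetical names. The default port (3141) and the 1-65535 range come from the docs.

```typescript
// Minimal structural type so the helper works with a browser EventSource
// or with a test double. Hypothetical, not part of selftune.
interface EventSourceLike {
  addEventListener(type: string, listener: (event: unknown) => void): void;
}

// Build a dashboard URL; the localhost host is an assumption.
function dashboardUrl(path: string, port: number = 3141): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`port must be 1-65535, got ${port}`);
  }
  return `http://localhost:${port}${path}`;
}

// Call `refetch` every time the server broadcasts an `update` event.
function onDashboardUpdate(source: EventSourceLike, refetch: () => void): void {
  source.addEventListener("update", () => refetch());
}

// Browser usage (not executed here):
//   onDashboardUpdate(new EventSource(dashboardUrl("/api/v2/events")), refetchAll);
```

Keeping the subscription behind a small interface makes the refetch wiring testable without a running server; the 60s polling fallback mentioned in the docs would sit alongside it.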

package/skill/Workflows/Doctor.md CHANGED

@@ -40,14 +40,14 @@ None. Doctor runs all checks unconditionally.
     },
     {
       "name": "dashboard_freshness_mode",
-      "status": "warn",
-      "message": "Dashboard still uses legacy JSONL watcher invalidation"
+      "status": "pass",
+      "message": "Dashboard reads SQLite and watches WAL for live updates"
     }
   ],
   "summary": {
-    "pass": 8,
+    "pass": 9,
     "fail": 1,
-    "warn": 1,
+    "warn": 0,
     "total": 10
   },
   "healthy": false
@@ -96,47 +96,47 @@ or queue checks when alpha is configured:
 ### Config Check
-| Check name | What it validates |
-|------------|-------------------|
-| `config` | `~/.selftune/config.json` exists, is valid JSON, contains `agent_type` and `llm_mode` fields |
+| Check name | What it validates                                                                            |
+| ---------- | -------------------------------------------------------------------------------------------- |
+| `config`   | `~/.selftune/config.json` exists, is valid JSON, contains `agent_type` and `llm_mode` fields |
 ### Log Checks (4 checks)
-| Check name | What it validates |
-|------------|-------------------|
+| Check name              | What it validates                                     |
+| ----------------------- | ----------------------------------------------------- |
 | `log_session_telemetry` | `session_telemetry_log.jsonl` exists and is parseable |
-| `log_skill_usage` | `skill_usage_log.jsonl` exists and is parseable |
-| `log_all_queries` | `all_queries_log.jsonl` exists and is parseable |
-| `log_evolution_audit` | `evolution_audit_log.jsonl` exists and is parseable |
+| `log_skill_usage`       | `skill_usage_log.jsonl` exists and is parseable       |
+| `log_all_queries`       | `all_queries_log.jsonl` exists and is parseable       |
+| `log_evolution_audit`   | `evolution_audit_log.jsonl` exists and is parseable   |
 ### Hook Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name      | What it validates                                       |
+| --------------- | ------------------------------------------------------- |
 | `hook_settings` | `~/.claude/settings.json` has selftune hooks configured |
 ### Evolution Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name        | What it validates                                |
+| ----------------- | ------------------------------------------------ |
 | `evolution_audit` | Evolution audit log entries have valid structure |
 ### Integrity Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name                 | What it validates                                                                                             |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------- |
 | `dashboard_freshness_mode` | Warns when the dashboard still relies on legacy JSONL watcher invalidation instead of SQLite WAL live refresh |
 ### Skill Version Sync Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name           | What it validates                                         |
+| -------------------- | --------------------------------------------------------- |
 | `skill_version_sync` | SKILL.md frontmatter version matches package.json version |
 ### Version Check
-| Check name | What it validates |
-|------------|-------------------|
+| Check name           | What it validates                                |
+| -------------------- | ------------------------------------------------ |
 | `version_up_to_date` | Installed version matches latest on npm registry |
 ## Steps
@@ -155,15 +155,15 @@ Parse the JSON output. If `healthy: true`, selftune is fully operational.
 For each failed check, take the appropriate action:
-| Failed check | Fix |
-|-------------|-----|
-| `config` | Run `selftune init` (or `selftune init --force` to regenerate). |
-| `log_*` | Run a session to generate initial log entries. Check hook installation with `selftune init`. |
-| `hook_settings` | Run `selftune init` to install hooks into `~/.claude/settings.json`. |
-| `evolution_audit` | Remove corrupted entries. Future operations will append clean entries. |
+| Failed check               | Fix                                                                                                                                              |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `config`                   | Run `selftune init` (or `selftune init --force` to regenerate).                                                                                  |
+| `log_*`                    | Run a session to generate initial log entries. Check hook installation with `selftune init`.                                                     |
+| `hook_settings`            | Run `selftune init` to install hooks into `~/.claude/settings.json`.                                                                             |
+| `evolution_audit`          | Remove corrupted entries. Future operations will append clean entries.                                                                           |
 | `dashboard_freshness_mode` | This is an operator warning, not a broken install. Expect possible freshness gaps for SQLite-only writes and export before destructive recovery. |
-| `skill_version_sync` | Run `bun run sync-version` to stamp SKILL.md from package.json. |
-| `version_up_to_date` | Run `npm install -g selftune` to update. |
+| `skill_version_sync`       | Run `bun run sync-version` to stamp SKILL.md from package.json.                                                                                  |
+| `version_up_to_date`       | Run `npm install -g selftune` to update.                                                                                                         |
 ### 4. Re-run Doctor
@@ -181,34 +181,40 @@ for root cause analysis.
 **Symptoms:** `selftune status` shows alpha upload as "not enrolled" or "enrolled (missing credential)"
 **Diagnostic steps:**
 1. Check `selftune status` — look at "Alpha Upload" and "Cloud link" lines
 2. If `doctor` includes a `cloud_link` or alpha queue warning, prefer `.checks[].guidance.next_command`
-3. If "not enrolled" or "not linked": run `selftune init --alpha --alpha-email <email> --alpha-key <key>`
-4. If "enrolled (missing credential)": re-run `selftune init --alpha --alpha-email <email> --alpha-key <credential> --force`
-5. If "api_key has invalid format": credential must start with `st_live_` or `st_test_`
+3. If "not enrolled" or "not linked": run `selftune init --alpha --alpha-email <email>` (opens browser for device-code auth)
+4. If "enrolled (missing credential)": re-run `selftune init --alpha --alpha-email <email> --force` (re-authenticates via browser)
+5. If "api_key has invalid format": re-run init with `--alpha --force` to re-authenticate
 **Resolution:** Follow the setup sequence in Initialize workflow → Alpha Enrollment section.
 ## Common Patterns
 **User reports something seems broken**
 > Run `selftune doctor`. Parse the JSON output for failed checks. Report
 > each failure's `name` and `message` to the user with the recommended fix.
 **User asks if hooks are working**
 > Run `selftune doctor`. Parse `.checks[]` for hook-related entries. If
 > hooks pass but no data appears, verify hook script paths in
 > `~/.claude/settings.json` point to actual files.
 **No telemetry data available**
 > Run `selftune doctor`. Route fixes by platform:
+>
 > - **Claude Code** — route to the Initialize workflow to install hooks
 > - **Codex** — run `selftune ingest codex` or `selftune ingest wrap-codex`
 > - **OpenCode** — run `selftune ingest opencode`
 > - **OpenClaw** — run `selftune ingest openclaw`
-> At least one session must complete after setup to generate telemetry.
+>   At least one session must complete after setup to generate telemetry.
 **User asks to check selftune health**
 > Run `selftune doctor`. Parse `.healthy` and `.summary`. If `healthy: true`,
 > report that selftune is fully operational. If false, report failed checks
 > and recommended fixes.
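The doctor patterns above can be sketched in TypeScript. This is a minimal sketch, not selftune's own code. The fields `healthy`, `summary`, `checks[]`, `name`, `message`, and `guidance.next_command` are named in this workflow; the `status` field and its `"fail"` value are assumptions about the JSON shape, and `failedChecks` is a hypothetical helper.

```typescript
// Assumed shape of the `selftune doctor` JSON output.
// `status` is a hypothetical field; the other names appear in the workflow text.
interface DoctorCheck {
  name: string;
  message: string;
  status: string; // assumption: something like "pass" | "warn" | "fail"
  guidance?: { next_command?: string };
}

interface DoctorReport {
  healthy: boolean;
  summary?: unknown;
  checks: DoctorCheck[];
}

// Return one readable line per failed check, preferring the
// machine-provided next_command when the check carries one.
function failedChecks(report: DoctorReport): string[] {
  if (report.healthy) return [];
  return report.checks
    .filter((check) => check.status === "fail")
    .map((check) => {
      const next = check.guidance?.next_command;
      const base = `${check.name}: ${check.message}`;
      return next ? `${base} (next: ${next})` : base;
    });
}
```

An empty result with `healthy: true` corresponds to the "fully operational" report; otherwise each returned line can be relayed to the user as-is.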

package/skill/Workflows/Evals.md

@@ -7,6 +7,7 @@ its invocation type.
 ## When to Invoke
 Invoke this workflow when the user requests any of the following:
 - Generating eval sets or test data for a skill
 - Checking which skills are undertriggering
 - Viewing skill telemetry or usage stats
@@ -21,22 +22,22 @@ selftune eval generate --skill <name> [options]
 ## Options
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--skill <name>` | Skill to generate evals for | Required (unless `--list-skills`) |
-| `--list-skills` | List all logged skills with query counts | Off |
-| `--stats` | Show aggregate telemetry stats for the skill | Off |
-| `--max <n>` | Maximum eval entries per side | 50 |
-| `--seed <n>` | Seed for deterministic shuffling | 42 |
-| `--output <path>` / `--out <path>` | Output file path | `{skillName}_trigger_eval.json` |
-| `--no-negatives` | Exclude negative examples from output | Off |
-| `--no-taxonomy` | Skip invocation_type classification | Off |
-| `--skill-log <path>` | Path to skill_usage_log.jsonl | Default log path |
-| `--query-log <path>` | Path to all_queries_log.jsonl | Default log path |
-| `--telemetry-log <path>` | Path to session_telemetry_log.jsonl | Default log path |
-| `--synthetic` | Generate evals from SKILL.md via LLM (no logs needed) | Off |
-| `--skill-path <path>` | Path to SKILL.md (required with `--synthetic`) | — |
-| `--model <model>` | LLM model to use for synthetic generation | Agent default |
+| Flag                               | Description                                           | Default                           |
+| ---------------------------------- | ----------------------------------------------------- | --------------------------------- |
+| `--skill <name>`                   | Skill to generate evals for                           | Required (unless `--list-skills`) |
+| `--list-skills`                    | List all logged skills with query counts              | Off                               |
+| `--stats`                          | Show aggregate telemetry stats for the skill          | Off                               |
+| `--max <n>`                        | Maximum eval entries per side                         | 50                                |
+| `--seed <n>`                       | Seed for deterministic shuffling                      | 42                                |
+| `--output <path>` / `--out <path>` | Output file path                                      | `{skillName}_trigger_eval.json`   |
+| `--no-negatives`                   | Exclude negative examples from output                 | Off                               |
+| `--no-taxonomy`                    | Skip invocation_type classification                   | Off                               |
+| `--skill-log <path>`               | Path to skill_usage_log.jsonl                         | Default log path                  |
+| `--query-log <path>`               | Path to all_queries_log.jsonl                         | Default log path                  |
+| `--telemetry-log <path>`           | Path to session_telemetry_log.jsonl                   | Default log path                  |
+| `--synthetic`                      | Generate evals from SKILL.md via LLM (no logs needed) | Off                               |
+| `--skill-path <path>`              | Path to SKILL.md (required with `--synthetic`)        | —                                 |
+| `--model <model>`                  | LLM model to use for synthetic generation             | Agent default                     |
 ## Output Format
@@ -126,6 +127,7 @@ selftune eval generate --skill pptx --synthetic --skill-path /path/to/skills/ppt
 ```
 The command:
 1. Reads the SKILL.md file content
 2. Loads real user queries from the database (if available) as few-shot style examples so synthetic queries match real phrasing patterns
 3. Sends skill content and real examples to an LLM with a prompt requesting realistic test queries
@@ -155,6 +157,7 @@ selftune eval generate --skill pptx --max 50 --output evals-pptx.json
 ```
 The command:
 1. Reads positive triggers from `skill_usage_log.jsonl`
 2. Reads all queries from `all_queries_log.jsonl`
 3. Identifies queries that should have triggered but did not
@@ -181,40 +184,36 @@ If the user responds with "use defaults" or similar shorthand, skip to step 1 us
 For `--list-skills` or `--stats` requests, skip pre-flight entirely — these are read-only operations.
-Use `AskUserQuestion` with these questions:
-```json
-{
-  "questions": [
-    {
-      "question": "Generation Mode",
-      "options": ["Log-based — build from real usage logs (recommended if logs exist)", "Synthetic — generate from SKILL.md via LLM (for new skills)"]
-    },
-    {
-      "question": "Model (for synthetic mode)",
-      "options": ["Fast (haiku) — quick generation", "Balanced (sonnet) — better diversity (recommended)", "Best (opus) — highest quality"]
-    },
-    {
-      "question": "Max Entries",
-      "options": ["50 (default)", "25 (quick)", "100 (comprehensive)"]
-    }
-  ]
-}
-```
-If `AskUserQuestion` is not available, fall back to presenting these as inline numbered options.
+Ask one `AskUserQuestion` at a time in this order:
+1. `Generation Mode`
+   Options:
+   - `Log-based — build from real usage logs (recommended if logs exist)`
+   - `Synthetic — generate from SKILL.md via LLM (for new skills)`
+2. If the user chose synthetic, ask `Model (for synthetic mode)`
+   Options:
+   - `Fast (haiku) — quick generation`
+   - `Balanced (sonnet) — better diversity (recommended)`
+   - `Best (opus) — highest quality`
+3. Ask `Max Entries`
+   Options:
+   - `50 (default)`
+   - `25 (quick)`
+   - `100 (comprehensive)`
+If `AskUserQuestion` is not available or Claude does not invoke it, fall back to presenting the same choices as inline numbered options.
 After the user responds, parse their selections and map each choice to the corresponding CLI flags:
-| Selection | CLI Flag |
-|-----------|----------|
-| 1a (log-based) | _(no flag, default)_ |
-| 1b (synthetic) | `--synthetic --skill-path <path>` |
-| Custom max entries | `--max <value>` |
-| 4a (haiku) | `--model haiku` (resolved internally by selftune) |
-| 4b (sonnet) | `--model sonnet` |
-| 4c (opus) | `--model opus` |
-| Custom output path | `--out <path>` |
+| Selection          | CLI Flag                                          |
+| ------------------ | ------------------------------------------------- |
+| 1a (log-based)     | _(no flag, default)_                              |
+| 1b (synthetic)     | `--synthetic --skill-path <path>`                 |
+| Custom max entries | `--max <value>`                                   |
+| 4a (haiku)         | `--model haiku` (resolved internally by selftune) |
+| 4b (sonnet)        | `--model sonnet`                                  |
+| 4c (opus)          | `--model opus`                                    |
+| Custom output path | `--out <path>`                                    |
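The selection-to-flag mapping in the table can be sketched as a small argument builder. This is an illustrative sketch under stated assumptions: `PreflightSelections` and `buildEvalArgs` are hypothetical names, and only the flags listed in the table and the Options section (`--skill`, `--synthetic`, `--skill-path`, `--model`, `--max`, `--out`) are used.

```typescript
type Mode = "log-based" | "synthetic";
type Model = "haiku" | "sonnet" | "opus";

// Hypothetical container for the user's pre-flight answers.
interface PreflightSelections {
  skill: string;
  mode: Mode;
  skillPath?: string; // required when mode is "synthetic"
  model?: Model; // only meaningful in synthetic mode
  max?: number;
  out?: string;
}

// Build the argument list for `selftune eval generate` from the selections.
function buildEvalArgs(s: PreflightSelections): string[] {
  const args = ["eval", "generate", "--skill", s.skill];

  if (s.mode === "synthetic") {
    if (!s.skillPath) {
      throw new Error("--skill-path is required with --synthetic");
    }
    args.push("--synthetic", "--skill-path", s.skillPath);
    if (s.model) args.push("--model", s.model);
  }
  // Log-based mode is the default and adds no flag.

  if (s.max !== undefined) args.push("--max", String(s.max));
  if (s.out) args.push("--out", s.out);
  return args;
}
```

Returning an array rather than a joined string avoids shell-quoting problems when a path contains spaces.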
 Show a confirmation summary to the user:
@@ -238,6 +237,7 @@ eval generation is useful.
 ### 2. Generate the Eval Set
 Run with `--skill <name>`. Parse the JSON output and review for:
 - Balance between positive and negative entries
 - Coverage of all three positive invocation types (explicit, implicit, contextual)
 - Reasonable negative examples (keyword overlap but wrong intent)
@@ -245,6 +245,7 @@ Run with `--skill <name>`. Parse the JSON output and review for:
 ### 3. Review Invocation Type Distribution
 A healthy eval set has:
 - Some explicit queries (easy baseline)
 - Many implicit queries (natural usage)
 - Several contextual queries (real-world usage)
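The distribution review can be done with a simple tally. The sketch below assumes each eval entry carries an `invocation_type` field. That field name, the `EvalEntry` shape, and the `"negative"` value are assumptions; the workflow text here only names the three positive types.

```typescript
// Hypothetical eval entry shape; only the three positive type names
// (explicit, implicit, contextual) come from the workflow text.
interface EvalEntry {
  query: string;
  invocation_type?: "explicit" | "implicit" | "contextual" | "negative";
}

// Count entries per invocation type; entries without a type
// (for example when generated with --no-taxonomy) are grouped as "unclassified".
function countByInvocationType(entries: EvalEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    const key = entry.invocation_type ?? "unclassified";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

A result where `implicit` is the largest bucket and `explicit` and `contextual` are both non-zero matches the healthy profile described above.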