npm - selftune - Versions diffs - 0.2.8 → 0.2.10 - Mend

selftune 0.2.8 → 0.2.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (140)

package/README.md +35 -35
package/apps/local-dashboard/dist/assets/index-BZVLv70T.js +16 -0
package/apps/local-dashboard/dist/assets/{index-CRtLkBTi.css → index-Bs3Y4ixf.css} +1 -1
package/apps/local-dashboard/dist/assets/{vendor-react-BQH_6WrG.js → vendor-react-BXP54cYo.js} +4 -4
package/apps/local-dashboard/dist/assets/{vendor-table-dK1QMLq9.js → vendor-table-DTF_SXoy.js} +1 -1
package/apps/local-dashboard/dist/assets/{vendor-ui-CO2mrx6e.js → vendor-ui-CWU0d1wd.js} +66 -66
package/apps/local-dashboard/dist/index.html +15 -15
package/bin/selftune.cjs +1 -1
package/cli/selftune/activation-rules.ts +37 -18
package/cli/selftune/agent-guidance.ts +16 -16
package/cli/selftune/alpha-identity.ts +1 -2
package/cli/selftune/alpha-upload/build-payloads.ts +18 -2
package/cli/selftune/alpha-upload/flush.ts +2 -2
package/cli/selftune/alpha-upload/stage-canonical.ts +106 -3
package/cli/selftune/auth/device-code.ts +32 -0
package/cli/selftune/auto-update.ts +12 -0
package/cli/selftune/badge/badge.ts +1 -0
package/cli/selftune/canonical-export.ts +5 -0
package/cli/selftune/claude-agents.ts +154 -0
package/cli/selftune/contribute/bundle.ts +2 -0
package/cli/selftune/contribute/contribute.ts +1 -0
package/cli/selftune/cron/setup.ts +2 -2
package/cli/selftune/dashboard-contract.ts +1 -1
package/cli/selftune/dashboard-server.ts +11 -52
package/cli/selftune/eval/hooks-to-evals.ts +13 -6
package/cli/selftune/eval/import-skillsbench.ts +1 -0
package/cli/selftune/eval/synthetic-evals.ts +2 -3
package/cli/selftune/eval/unit-test.ts +1 -0
package/cli/selftune/evolution/deploy-proposal.ts +1 -0
package/cli/selftune/evolution/evolve-body.ts +93 -6
package/cli/selftune/evolution/evolve.ts +0 -1
package/cli/selftune/evolution/propose-body.ts +3 -2
package/cli/selftune/evolution/propose-routing.ts +3 -2
package/cli/selftune/evolution/refine-body.ts +3 -2
package/cli/selftune/export.ts +1 -0
package/cli/selftune/grading/auto-grade.ts +1 -0
package/cli/selftune/grading/grade-session.ts +9 -0
package/cli/selftune/hooks/auto-activate.ts +6 -0
package/cli/selftune/hooks/evolution-guard.ts +12 -15
package/cli/selftune/hooks/prompt-log.ts +1 -0
package/cli/selftune/hooks/session-stop.ts +34 -40
package/cli/selftune/hooks/skill-change-guard.ts +1 -0
package/cli/selftune/hooks/skill-eval.ts +1 -1
package/cli/selftune/index.ts +23 -14
package/cli/selftune/ingestors/claude-replay.ts +1 -0
package/cli/selftune/ingestors/codex-rollout.ts +1 -0
package/cli/selftune/ingestors/codex-wrapper.ts +1 -0
package/cli/selftune/ingestors/openclaw-ingest.ts +1 -0
package/cli/selftune/ingestors/opencode-ingest.ts +1 -0
package/cli/selftune/init.ts +197 -96
package/cli/selftune/localdb/db.ts +1 -0
package/cli/selftune/localdb/direct-write.ts +93 -12
package/cli/selftune/localdb/materialize.ts +2 -0
package/cli/selftune/localdb/queries.ts +210 -0
package/cli/selftune/localdb/schema.ts +72 -1
package/cli/selftune/monitoring/watch.ts +1 -0
package/cli/selftune/normalization.ts +4 -0
package/cli/selftune/observability.ts +14 -7
package/cli/selftune/orchestrate.ts +15 -37
package/cli/selftune/repair/skill-usage.ts +7 -3
package/cli/selftune/routes/orchestrate-runs.ts +1 -0
package/cli/selftune/routes/overview.ts +1 -0
package/cli/selftune/routes/skill-report.ts +1 -0
package/cli/selftune/sync.ts +31 -1
package/cli/selftune/types.ts +2 -2
package/cli/selftune/uninstall.ts +412 -0
package/cli/selftune/utils/canonical-log.ts +2 -0
package/cli/selftune/utils/jsonl.ts +1 -0
package/cli/selftune/utils/llm-call.ts +131 -3
package/cli/selftune/utils/skill-log.ts +1 -0
package/cli/selftune/utils/transcript.ts +1 -0
package/cli/selftune/utils/trigger-check.ts +1 -1
package/cli/selftune/workflows/skill-md-writer.ts +5 -5
package/cli/selftune/workflows/workflows.ts +1 -0
package/package.json +38 -33
package/packages/telemetry-contract/fixtures/golden.test.ts +1 -0
package/packages/telemetry-contract/package.json +3 -3
package/packages/telemetry-contract/src/index.ts +0 -1
package/packages/telemetry-contract/src/schemas.ts +6 -24
package/packages/telemetry-contract/tests/compatibility.test.ts +1 -0
package/packages/ui/README.md +35 -34
package/packages/ui/package.json +3 -3
package/packages/ui/src/components/ActivityTimeline.tsx +49 -42
package/packages/ui/src/components/EvidenceViewer.tsx +306 -182
package/packages/ui/src/components/EvolutionTimeline.tsx +83 -72
package/packages/ui/src/components/InfoTip.tsx +4 -3
package/packages/ui/src/components/OrchestrateRunsPanel.tsx +60 -53
package/packages/ui/src/components/section-cards.tsx +19 -24
package/packages/ui/src/components/skill-health-grid.tsx +213 -193
package/packages/ui/src/lib/constants.tsx +1 -0
package/packages/ui/src/primitives/badge.tsx +12 -15
package/packages/ui/src/primitives/button.tsx +7 -7
package/packages/ui/src/primitives/card.tsx +15 -26
package/packages/ui/src/primitives/checkbox.tsx +7 -8
package/packages/ui/src/primitives/collapsible.tsx +5 -5
package/packages/ui/src/primitives/dropdown-menu.tsx +45 -55
package/packages/ui/src/primitives/label.tsx +6 -6
package/packages/ui/src/primitives/select.tsx +28 -37
package/packages/ui/src/primitives/table.tsx +17 -44
package/packages/ui/src/primitives/tabs.tsx +14 -21
package/packages/ui/src/primitives/tooltip.tsx +10 -22
package/skill/SKILL.md +72 -59
package/skill/Workflows/AlphaUpload.md +4 -4
package/skill/Workflows/AutoActivation.md +11 -6
package/skill/Workflows/Badge.md +22 -16
package/skill/Workflows/Baseline.md +34 -36
package/skill/Workflows/Composability.md +16 -11
package/skill/Workflows/Contribute.md +26 -21
package/skill/Workflows/Cron.md +23 -22
package/skill/Workflows/Dashboard.md +40 -40
package/skill/Workflows/Doctor.md +40 -34
package/skill/Workflows/Evals.md +48 -47
package/skill/Workflows/EvolutionMemory.md +31 -21
package/skill/Workflows/Evolve.md +84 -82
package/skill/Workflows/EvolveBody.md +58 -47
package/skill/Workflows/Grade.md +16 -13
package/skill/Workflows/ImportSkillsBench.md +9 -6
package/skill/Workflows/Ingest.md +36 -21
package/skill/Workflows/Initialize.md +138 -97
package/skill/Workflows/Orchestrate.md +22 -16
package/skill/Workflows/Replay.md +12 -7
package/skill/Workflows/Rollback.md +13 -6
package/skill/Workflows/Schedule.md +6 -6
package/skill/Workflows/Sync.md +18 -11
package/skill/Workflows/UnitTest.md +28 -17
package/skill/Workflows/Watch.md +28 -21
package/skill/agents/diagnosis-analyst.md +11 -0
package/skill/agents/evolution-reviewer.md +15 -1
package/skill/agents/integration-guide.md +10 -0
package/skill/agents/pattern-analyst.md +12 -1
package/skill/references/grading-methodology.md +23 -24
package/skill/references/interactive-config.md +7 -7
package/skill/references/invocation-taxonomy.md +22 -20
package/skill/references/logs.md +20 -6
package/skill/references/setup-patterns.md +4 -2
package/.claude/agents/diagnosis-analyst.md +0 -156
package/.claude/agents/evolution-reviewer.md +0 -180
package/.claude/agents/integration-guide.md +0 -212
package/.claude/agents/pattern-analyst.md +0 -160
package/apps/local-dashboard/dist/assets/index-Bk9vSHHd.js +0 -15

package/skill/Workflows/Replay.md CHANGED

@@ -27,13 +27,13 @@ selftune ingest claude
 ## Options
-| Flag | Description |
-|------|-------------|
-| `--since <date>` | Only include transcripts modified after this date |
-| `--dry-run` | Preview what would be ingested without writing |
-| `--force` | Re-ingest all transcripts (ignore marker file) |
-| `--verbose` | Show detailed progress per file |
-| `--projects-dir <path>` | Override default `~/.claude/projects/` path |
+| Flag                    | Description                                       |
+| ----------------------- | ------------------------------------------------- |
+| `--since <date>`        | Only include transcripts modified after this date |
+| `--dry-run`             | Preview what would be ingested without writing    |
+| `--force`               | Re-ingest all transcripts (ignore marker file)    |
+| `--verbose`             | Show detailed progress per file                   |
+| `--projects-dir <path>` | Override default `~/.claude/projects/` path       |
 ## Source
@@ -43,6 +43,7 @@ Each transcript is a JSONL file containing user and assistant messages.
 ## Output
 Writes to:
 - `~/.claude/all_queries_log.jsonl` -- one record per user query (all messages, not just last)
 - `~/.claude/session_telemetry_log.jsonl` -- per-session metrics with `source: "claude_code_replay"`
 - `~/.claude/skill_usage_log.jsonl` -- skill triggers detected in transcripts
@@ -76,16 +77,20 @@ Report the number of sessions ingested and any skills discovered to the user.
 ## Common Patterns
 **User wants to backfill logs from Claude Code history**
 > Run `selftune ingest claude`. No options needed for a full backfill.
 > Parse the output and report ingested session counts.
 **User wants to ingest only recent sessions**
 > Run `selftune ingest claude --since <date>` with the user's specified date.
 **User wants to re-ingest everything from scratch**
 > Run `selftune ingest claude --force`. This ignores the marker file and
 > rescans all transcripts.
 **Agent needs to verify ingestion succeeded**
 > Run `selftune doctor` after ingestion. Parse the JSON output to check
 > that log file entry counts increased.
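
The options table in this hunk can be turned into a small command builder. The following is a minimal sketch, assuming the flags behave as the table describes; the function name `build_ingest_claude_argv` and the example date string are hypothetical, and the accepted format for `<date>` is not stated in the diff.

```python
from typing import List, Optional


def build_ingest_claude_argv(
    since: Optional[str] = None,
    dry_run: bool = False,
    force: bool = False,
    verbose: bool = False,
    projects_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `selftune ingest claude`.

    Only the flags listed in the Replay.md options table are used.
    The helper itself is illustrative and not part of selftune.
    """
    argv = ["selftune", "ingest", "claude"]
    if since is not None:
        # The date format is an assumption; the docs only say `<date>`.
        argv += ["--since", since]
    if dry_run:
        argv.append("--dry-run")
    if force:
        argv.append("--force")
    if verbose:
        argv.append("--verbose")
    if projects_dir is not None:
        argv += ["--projects-dir", projects_dir]
    return argv


# Full backfill: no options needed.
full_backfill = build_ingest_claude_argv()

# Preview a re-ingest of everything without writing.
preview = build_ingest_claude_argv(force=True, dry_run=True)
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; building a list rather than a shell string avoids quoting problems in paths.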

package/skill/Workflows/Rollback.md CHANGED

@@ -11,11 +11,11 @@ selftune evolve rollback --skill <name> --skill-path <path> [options]
 ## Options
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--skill <name>` | Skill name | Required |
-| `--skill-path <path>` | Path to the skill's SKILL.md | Required |
-| `--proposal-id <id>` | Specific proposal to rollback | Latest evolution |
+| Flag                  | Description                   | Default          |
+| --------------------- | ----------------------------- | ---------------- |
+| `--skill <name>`      | Skill name                    | Required         |
+| `--skill-path <path>` | Path to the skill's SKILL.md  | Required         |
+| `--proposal-id <id>`  | Specific proposal to rollback | Latest evolution |
 ## Output Format
@@ -31,7 +31,7 @@ The command writes a `rolled_back` entry to `~/.claude/evolution_audit_log.jsonl
     "total": 50,
     "passed": 35,
     "failed": 15,
-    "pass_rate": 0.70
+    "pass_rate": 0.7
   }
 }
 ```
@@ -78,6 +78,7 @@ Manual restoration from version control is required.
 ### 0. Read Evolution Context
 Before starting, read `~/.selftune/memory/context.md` for session context:
 - Active evolutions and their current status
 - Previous rollback history
 - Last update timestamp
@@ -107,6 +108,7 @@ selftune evolve rollback --skill pptx --skill-path /path/to/SKILL.md --proposal-
 ### 3. Verify Restoration
 After rollback, verify the SKILL.md content is restored:
 - Read the file and confirm it matches the pre-evolution version
 - Check the audit log for the `rolled_back` entry
 - Optionally re-run evals to confirm the original pass rate
@@ -114,6 +116,7 @@ After rollback, verify the SKILL.md content is restored:
 ### 4. Update Memory
 After rollback completes, the memory writer updates:
 - `~/.selftune/memory/decisions.md` -- records the rollback decision and reason
 - `~/.selftune/memory/context.md` -- clears the active evolution state and notes the rollback
@@ -128,16 +131,20 @@ audit trail and can use it to avoid repeating failed evolution patterns.
 ## Common Patterns
 **"Rollback the last evolution"**
 > Run rollback with `--skill` and `--skill-path`. The command automatically
 > finds the latest `deployed` entry in the audit log.
 **"Undo the pptx skill change"**
 > Same as above, specifying `--skill pptx`.
 **"Restore the original description"**
 > If multiple evolutions have occurred, use `--proposal-id` to target a
 > specific one. Without it, only the most recent evolution is rolled back.
 **"The rollback says no backup found"**
 > Check version control (git) for the pre-evolution SKILL.md. The audit
 > trail may also contain the original description in a `created` entry.

package/skill/Workflows/Schedule.md CHANGED

@@ -36,12 +36,12 @@ Outputs examples for all three scheduling systems (cron, launchd, systemd).
 ## Flags
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--format <type>` | Output only one format: `cron`, `launchd`, or `systemd` | All formats |
-| `--install` | Write and activate scheduler artifacts for the selected/default platform | Off |
-| `--dry-run` | Preview installed files and activation commands without writing | Off |
-| `--help` | Show help message | — |
+| Flag              | Description                                                              | Default     |
+| ----------------- | ------------------------------------------------------------------------ | ----------- |
+| `--format <type>` | Output only one format: `cron`, `launchd`, or `systemd`                  | All formats |
+| `--install`       | Write and activate scheduler artifacts for the selected/default platform | Off         |
+| `--dry-run`       | Preview installed files and activation commands without writing          | Off         |
+| `--help`          | Show help message                                                        | —           |
 ## Steps

package/skill/Workflows/Sync.md CHANGED

@@ -19,21 +19,22 @@ selftune sync
 ## Options
-| Flag | Description |
-|------|-------------|
-| `--since <date>` | Only sync sessions modified on/after this date |
-| `--dry-run` | Show summary without writing files |
-| `--force` | Ignore per-source markers and rescan everything |
-| `--no-claude` | Skip Claude transcript replay |
-| `--no-codex` | Skip Codex rollout ingest |
-| `--no-opencode` | Skip OpenCode ingest |
-| `--no-openclaw` | Skip OpenClaw ingest |
-| `--no-repair` | Skip rebuilding `skill_usage_repaired.jsonl` |
-| `--json` | Output results as JSON |
+| Flag             | Description                                     |
+| ---------------- | ----------------------------------------------- |
+| `--since <date>` | Only sync sessions modified on/after this date  |
+| `--dry-run`      | Show summary without writing files              |
+| `--force`        | Ignore per-source markers and rescan everything |
+| `--no-claude`    | Skip Claude transcript replay                   |
+| `--no-codex`     | Skip Codex rollout ingest                       |
+| `--no-opencode`  | Skip OpenCode ingest                            |
+| `--no-openclaw`  | Skip OpenClaw ingest                            |
+| `--no-repair`    | Skip rebuilding `skill_usage_repaired.jsonl`    |
+| `--json`         | Output results as JSON                          |
 ## Output
 Writes/refreshed data:
 - `~/.claude/session_telemetry_log.jsonl`
 - `~/.claude/all_queries_log.jsonl`
 - `~/.claude/skill_usage_log.jsonl`
@@ -50,6 +51,7 @@ counts. Report the preview summary to the user.
 ### 2. Run Sync
 Run `selftune sync`. The output includes:
 - Per-source `scanned`, `synced`, and `skipped` counts
 - Repaired overlay totals
 - Any errors or warnings
@@ -92,20 +94,25 @@ Use `--json` when the agent needs to parse sync results programmatically
 ## Common Patterns
 **User wants to refresh telemetry data**
 > Run `selftune sync`. Report per-source `scanned`, `synced`, and `skipped` counts.
 **User wants to sync only recent sessions**
 > Run `selftune sync --since <date>` with the user's specified date.
 **User wants a full rescan from scratch**
 > Run `selftune sync --force`. This ignores per-source markers and rescans
 > all sessions.
 **Agent needs to verify sync worked**
 > Check per-source `scanned`, `synced`, and `skipped` counts. `synced=0`
 > is normal when data is already up-to-date. Verify `scanned > 0` for
 > expected sources to confirm sync ran successfully.
 **Agent is chaining into monitoring or evolution**
 > Use `selftune watch --sync-first` or `selftune evolve --sync-first` to
 > refresh source truth automatically before making decisions.

package/skill/Workflows/UnitTest.md CHANGED

@@ -11,15 +11,15 @@ selftune eval unit-test --skill <name> --tests <path> [options]
 ## Options
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--skill <name>` | Skill name | Required |
-| `--tests <path>` | Path to unit test JSON file | `~/.selftune/unit-tests/<skill>.json` |
-| `--run-agent` | Run agent-based assertions (not just trigger checks) | Off |
-| `--generate` | Generate tests from skill content instead of running | Off |
-| `--skill-path <path>` | Path to SKILL.md (required for `--generate`) | None |
-| `--eval-set <path>` | Eval set for failure context (used with `--generate`) | None |
-| `--model <flag>` | Model flag for LLM calls | Agent default |
+| Flag                  | Description                                           | Default                               |
+| --------------------- | ----------------------------------------------------- | ------------------------------------- |
+| `--skill <name>`      | Skill name                                            | Required                              |
+| `--tests <path>`      | Path to unit test JSON file                           | `~/.selftune/unit-tests/<skill>.json` |
+| `--run-agent`         | Run agent-based assertions (not just trigger checks)  | Off                                   |
+| `--generate`          | Generate tests from skill content instead of running  | Off                                   |
+| `--skill-path <path>` | Path to SKILL.md (required for `--generate`)          | None                                  |
+| `--eval-set <path>`   | Eval set for failure context (used with `--generate`) | None                                  |
+| `--model <flag>`      | Model flag for LLM calls                              | Agent default                         |
 ## Test Format
@@ -48,12 +48,12 @@ Tests are stored as JSON arrays in `~/.selftune/unit-tests/<skill>.json`:
 ## Assertion Types
-| Type | What it checks | Requires agent? |
-|------|---------------|-----------------|
-| `trigger_check` | Query triggers the skill description | No (LLM only) |
-| `output_contains` | Agent output contains expected text | Yes |
-| `output_matches_regex` | Agent output matches regex pattern | Yes |
-| `tool_called` | Agent used a specific tool | Yes |
+| Type                   | What it checks                       | Requires agent? |
+| ---------------------- | ------------------------------------ | --------------- |
+| `trigger_check`        | Query triggers the skill description | No (LLM only)   |
+| `output_contains`      | Agent output contains expected text  | Yes             |
+| `output_matches_regex` | Agent output matches regex pattern   | Yes             |
+| `tool_called`          | Agent used a specific tool           | Yes             |
 Trigger check assertions are cheap (single LLM call). Agent-based assertions
 require `--run-agent` and run the query through the full agent.
@@ -66,14 +66,19 @@ require `--run-agent` and run the query through the full agent.
   "total": 10,
   "passed": 8,
   "failed": 2,
-  "pass_rate": 0.80,
+  "pass_rate": 0.8,
   "results": [
     {
       "test_id": "research-trigger-1",
       "overall_passed": true,
       "trigger_passed": true,
       "assertion_results": [
-        { "type": "trigger_check", "value": "true", "passed": true, "evidence": "LLM responded YES" }
+        {
+          "type": "trigger_check",
+          "value": "true",
+          "passed": true,
+          "evidence": "LLM responded YES"
+        }
       ],
       "duration_ms": 450
     }
@@ -93,6 +98,7 @@ selftune eval unit-test --skill Research --generate --skill-path ~/.claude/skill
 ```
 Parse the output. The LLM creates test cases covering:
 - Explicit trigger queries
 - Implicit trigger queries
 - Contextual trigger queries
@@ -114,6 +120,7 @@ Add `--run-agent` for full agent-based assertions.
 ### 3. Parse Results
 Parse the JSON output. Check `pass_rate` and investigate failures:
 - Failed trigger checks -- description needs improvement (route to Evolve)
 - Failed output assertions -- skill workflow needs fixes
 - Failed tool assertions -- skill routing is broken
@@ -134,17 +141,21 @@ the evolution improved trigger accuracy.
 ## Common Patterns
 **User asks to generate tests for a skill**
 > Run `selftune eval unit-test --skill <name> --generate --skill-path <path>`.
 > Parse the output and report how many tests were generated.
 **User asks to run existing tests**
 > Run `selftune eval unit-test --skill <name>`. Parse the JSON output and
 > report pass rate and any failures.
 **User asks for full agent-based testing**
 > Run `selftune eval unit-test --skill <name> --run-agent`. This runs queries
 > through the full agent, so inform the user it will take longer.
 **After an evolution completes**
 > Run unit tests to verify the evolution improved trigger accuracy. Compare
 > the new pass rate against the pre-evolution baseline.
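
The "Parse Results" step above can be sketched in a few lines. This is an illustration only: it assumes the JSON fields shown in the output hunk (`pass_rate`, `results`, `test_id`, `overall_passed`, `assertion_results`, `type`, `passed`), and the helper name, the `NEXT_STEP` mapping, and the failing sample record are hypothetical.

```python
import json
from typing import Dict, List

# Routing hints paraphrased from the "Parse Results" step, keyed by the
# assertion types in the Assertion Types table.
NEXT_STEP = {
    "trigger_check": "description needs improvement (route to Evolve)",
    "output_contains": "skill workflow needs fixes",
    "output_matches_regex": "skill workflow needs fixes",
    "tool_called": "skill routing is broken",
}


def summarize_unit_test_output(raw: str) -> Dict[str, object]:
    """Return the pass rate plus one entry per failed assertion."""
    report = json.loads(raw)
    failures: List[Dict[str, str]] = []
    for result in report.get("results", []):
        if result.get("overall_passed"):
            continue
        for assertion in result.get("assertion_results", []):
            if not assertion.get("passed"):
                kind = assertion.get("type", "unknown")
                failures.append({
                    "test_id": result.get("test_id", "?"),
                    "type": kind,
                    "next_step": NEXT_STEP.get(kind, "inspect manually"),
                })
    return {"pass_rate": report.get("pass_rate"), "failures": failures}


# Made-up sample in the documented shape; the second record is invented
# purely to exercise the failure path.
SAMPLE_OUTPUT = json.dumps({
    "total": 2, "passed": 1, "failed": 1, "pass_rate": 0.5,
    "results": [
        {"test_id": "research-trigger-1", "overall_passed": True,
         "assertion_results": [{"type": "trigger_check", "passed": True}]},
        {"test_id": "example-tool-1", "overall_passed": False,
         "assertion_results": [{"type": "tool_called", "passed": False}]},
    ],
})

summary = summarize_unit_test_output(SAMPLE_OUTPUT)
```

Keeping the type-to-action mapping in one dictionary makes it easy to adjust if the assertion types change in a later release.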

package/skill/Workflows/Watch.md CHANGED

@@ -11,15 +11,15 @@ selftune watch --skill <name> --skill-path <path> [options]
 ## Options
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--skill <name>` | Skill name | Required |
-| `--skill-path <path>` | Path to the skill's SKILL.md | Required |
-| `--window <n>` | Sliding window size (number of sessions) | 20 |
-| `--threshold <n>` | Regression threshold (drop from baseline) | 0.1 |
-| `--auto-rollback` | Automatically rollback on detected regression | Off |
-| `--sync-first` | Refresh source-truth telemetry before evaluating | Off |
-| `--sync-force` | Force a full source rescan during `--sync-first` | Off |
+| Flag                  | Description                                      | Default  |
+| --------------------- | ------------------------------------------------ | -------- |
+| `--skill <name>`      | Skill name                                       | Required |
+| `--skill-path <path>` | Path to the skill's SKILL.md                     | Required |
+| `--window <n>`        | Sliding window size (number of sessions)         | 20       |
+| `--threshold <n>`     | Regression threshold (drop from baseline)        | 0.1      |
+| `--auto-rollback`     | Automatically rollback on detected regression    | Off      |
+| `--sync-first`        | Refresh source-truth telemetry before evaluating | Off      |
+| `--sync-force`        | Force a full source rescan during `--sync-first` | Off      |
 ## Output Format
@@ -40,12 +40,12 @@ selftune watch --skill <name> --skill-path <path> [options]
 ### Status Values
-| Status | Meaning |
-|--------|---------|
-| `healthy` | Current pass rate is within threshold of baseline |
-| `warning` | Pass rate dropped but within threshold |
-| `regression` | Pass rate dropped below baseline minus threshold |
-| `insufficient_data` | Not enough sessions in the window to evaluate |
+| Status              | Meaning                                           |
+| ------------------- | ------------------------------------------------- |
+| `healthy`           | Current pass rate is within threshold of baseline |
+| `warning`           | Pass rate dropped but within threshold            |
+| `regression`        | Pass rate dropped below baseline minus threshold  |
+| `insufficient_data` | Not enough sessions in the window to evaluate     |
 ## Parsing Instructions
@@ -69,6 +69,7 @@ selftune watch --skill <name> --skill-path <path> [options]
 ### 0. Read Evolution Context
 Read `~/.selftune/memory/context.md` for session context:
 - Active evolutions and their current status
 - Known issues and regression history
 - Last update timestamp
@@ -91,16 +92,17 @@ selftune watch --skill pptx --skill-path /path/to/SKILL.md
 Parse the JSON output. Key decision points:
-| Status | Action |
-|--------|--------|
-| `healthy` | No action needed. Skill is performing well. |
-| `warning` | Monitor closely. Consider re-running after more sessions. |
-| `regression` | Investigate. Consider rollback. |
-| `insufficient_data` | Wait for more sessions before evaluating. |
+| Status              | Action                                                    |
+| ------------------- | --------------------------------------------------------- |
+| `healthy`           | No action needed. Skill is performing well.               |
+| `warning`           | Monitor closely. Consider re-running after more sessions. |
+| `regression`        | Investigate. Consider rollback.                           |
+| `insufficient_data` | Wait for more sessions before evaluating.                 |
 ### 3. Decide Action
 If regression is detected:
 - Review recent session transcripts to understand what changed
 - Check if the eval set is still representative
 - Run `evolve rollback` if the regression is confirmed (see `Workflows/Rollback.md`)
@@ -111,6 +113,7 @@ previous description and logs a `rolled_back` entry.
 ### 4. Report
 Summarize the snapshot for the user:
 - Current pass rate vs baseline
 - Number of sessions evaluated
 - Whether regression was detected
@@ -126,16 +129,20 @@ context window resets before the user acts on the results.
 ## Common Patterns
 **"Is the skill performing well after the change?"**
 > Run watch with the skill name and path. Report the snapshot.
 **"Check for regressions"**
 > Same as above. Focus on the `regression_detected` and `delta` fields.
 **"How is the skill doing?"**
 > Run watch. If `insufficient_data`, tell the user to wait for more
 > sessions before drawing conclusions.
 **"Auto-rollback if it regresses"**
 > Use `--auto-rollback`. The command will restore the previous description
 > automatically if pass rate drops below baseline minus threshold.
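
The status values in this file can be read as a simple threshold rule. The sketch below is one plausible interpretation, not the command's actual logic, which this diff does not show. In particular, the table's descriptions of `healthy` and `warning` overlap, so treating any drop inside the threshold as `warning` is an assumption, as are the `min_sessions` parameter and the function name.

```python
def classify_snapshot(
    current: float,
    baseline: float,
    threshold: float = 0.1,   # documented default for --threshold
    sessions: int = 20,       # documented default for --window
    min_sessions: int = 1,    # hypothetical cut-off for "insufficient_data"
) -> str:
    """Map a pass-rate snapshot onto the documented status values."""
    if sessions < min_sessions:
        return "insufficient_data"
    if current < baseline - threshold:
        # Dropped below baseline minus threshold.
        return "regression"
    if current < baseline:
        # Dropped, but still within the threshold (assumed split).
        return "warning"
    return "healthy"


status_ok = classify_snapshot(current=0.8, baseline=0.8)
status_warn = classify_snapshot(current=0.75, baseline=0.8)
status_bad = classify_snapshot(current=0.65, baseline=0.8)
status_none = classify_snapshot(current=0.9, baseline=0.8, sessions=0)
```

Values that sit exactly on `baseline - threshold` are sensitive to floating-point rounding, so a real implementation may want a small tolerance or rounded inputs.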

package/skill/agents/diagnosis-analyst.md CHANGED

@@ -88,6 +88,7 @@ selftune eval generate --skill <name> --max 50
 Treat these outputs as exploratory summaries. Verify important claims against
 the underlying logs:
 - `~/.claude/skill_usage_log.jsonl`
 - `~/.claude/all_queries_log.jsonl`
 - `~/.claude/session_telemetry_log.jsonl`
@@ -96,6 +97,7 @@ the underlying logs:
 Read `~/.claude/evolution_audit_log.jsonl` for entries affecting the target
 skill. Look for:
 - recent deploys followed by regressions
 - repeated dry-runs or validated proposals with no deploy
 - rollbacks
@@ -107,6 +109,7 @@ Prefer the specific sessions passed by the parent. Otherwise, select recent
 sessions that show errors, unmatched queries, or clear misses.
 Look for:
 - the skill never being read or invoked
 - the wrong workflow being chosen
 - steps performed out of order
@@ -121,6 +124,7 @@ smallest credible next action.
 ## Stop Conditions
 Stop and return to the parent if:
 - the target skill is ambiguous
 - the required logs or transcripts are unavailable
 - the evidence is limited to one isolated session
@@ -134,30 +138,37 @@ Return a compact report with these sections:
 ## Diagnosis Report: <skill-name>
 ### Summary
 [2-4 sentence explanation of what is going wrong]
 ### Root Cause
 [TRIGGER / PROCESS / QUALITY / INFRASTRUCTURE]
 ### Findings
 - [Finding 1]
 - [Finding 2]
 - [Finding 3]
 ### Evidence
 - [path or command result]
 - [session ID / query / timestamp]
 - [audit or transcript evidence]
 ### Recommended Next Actions
 1. [Highest-leverage next step]
 2. [Second step]
 3. [Optional follow-up]
 ### Suggested Commands
 - `...`
 - `...`
 ### Confidence
 [high / medium / low]
 ```

package/skill/agents/evolution-reviewer.md CHANGED

@@ -3,7 +3,7 @@ name: evolution-reviewer
 description: Use when reviewing a dry-run or pending evolution proposal before deployment, especially for high-stakes skills, marginal improvements, or recent regressions. Compares old vs new content, checks evidence quality, and returns an approve or reject verdict with conditions.
 tools: Read, Grep, Glob, Bash
 disallowedTools: Write, Edit
-model: sonnet
+model: opus
 maxTurns: 8
 ---
@@ -69,12 +69,14 @@ selftune evolve --skill <name> --skill-path <path> --dry-run
 ### 2. Compare original vs proposed content
 For description proposals, compare:
 - preserved working anchors
 - added language for missed queries
 - scope creep or vague broadening
 - tone and style continuity
 For routing/body proposals, compare:
 - workflow routing ownership changes
 - added or removed operational steps
 - whether the body still matches current CLI behavior
@@ -83,6 +85,7 @@ For routing/body proposals, compare:
 ### 3. Assess eval and evidence quality
 Check:
 - eval size is meaningful for the change being proposed
 - negatives exist for overtriggering protection
 - explicit queries are protected
@@ -91,6 +94,7 @@ Check:
 ### 4. Check metrics and history
 Review proposal metrics and recent history:
 - pass-rate delta
 - regression count or obvious explicit regressions
 - confidence
@@ -99,6 +103,7 @@ Review proposal metrics and recent history:
 ### 5. Render a safety verdict
 Issue one of:
 - `APPROVE`
 - `APPROVE WITH CONDITIONS`
 - `REJECT`
@@ -106,6 +111,7 @@ Issue one of:
 ## Stop Conditions
 Stop and return to the parent if:
 - there is no concrete proposal or diff to review
 - the target skill or proposal is ambiguous
 - the eval source is missing
@@ -119,31 +125,39 @@ Return a compact verdict with these sections:
 ## Evolution Review: <skill-name>
 ### Proposal ID
 [proposal ID or "not provided"]
 ### Verdict
 [APPROVE / APPROVE WITH CONDITIONS / REJECT]
 ### Summary
 [2-4 sentence explanation]
 ### Findings
 - [Finding 1]
 - [Finding 2]
 - [Finding 3]
 ### Evidence
 - [audit entry / eval fact / diff observation]
 - [audit entry / eval fact / diff observation]
 ### Required Changes
 1. [Only if not approved]
 2. [Only if not approved]
 ### Post-Deploy Conditions
 - [watch requirement or monitoring threshold]
 - [follow-up check]
 ### Confidence
 [high / medium / low]
 ```

package/skill/agents/integration-guide.md CHANGED

@@ -46,6 +46,7 @@ parent. Do not ask the user directly unless the parent explicitly told you to.
 ### 1. Detect project structure
 Inspect the workspace and classify it as one of:
 - single-skill project
 - multi-skill repo
 - monorepo with shared tooling
@@ -64,6 +65,7 @@ selftune doctor
 ```
 Check:
 - whether the CLI exists
 - whether `config.json` exists and looks current (resolve via `SELFTUNE_CONFIG_DIR` or `SELFTUNE_HOME` env vars first, falling back to `~/.selftune/`; run `selftune doctor` to confirm the resolved path)
 - whether hooks or ingest paths are healthy
@@ -80,6 +82,7 @@ selftune init [--agent claude_code] [--cli-path <path>] [--force]
 For other platforms, route to the appropriate ingest workflow after init.
 If the repo layout is complex, decide whether the user needs:
 - one shared setup at the repo root
 - per-package setup guidance
 - absolute paths to avoid cwd-dependent failures
@@ -89,6 +92,7 @@ If the repo layout is complex, decide whether the user needs:
 If `requestedMode` is `plan-only`, stop at a verified setup plan.
 If `requestedMode` is `hands-on`, you may:
 - run `selftune init`
 - create or refresh local activation-rules files
 - repair obvious path or config issues
@@ -116,6 +120,7 @@ run evals, improve a skill, or set up autonomous orchestration.
 ## Stop Conditions
 Stop and return to the parent if:
 - the project root is ambiguous
 - the CLI is missing and installation is not allowed
 - the repo has no skills and the task is really skill creation, not setup
@@ -130,25 +135,30 @@ Return a setup report with these sections:
 ## selftune Setup Complete
 ### Environment
 - Agent platform: <claude_code / codex / opencode / openclaw / unknown>
 - Project type: <single-skill / multi-skill / monorepo / no-skills>
 - Skills detected: <list>
 ### Configuration
 - Config: [created / verified / missing]
 - Init path: [command used or recommended]
 - Hooks or ingest: [healthy / needs work / not applicable]
 - Doctor: [healthy / unhealthy with blockers]
 ### Verification
 - Telemetry capture: [working / not verified]
 - Skill tracking: [working / not verified]
 ### Next Steps
 1. [Primary recommended action]
 2. [Secondary action]
 3. [Optional action]
 ### Confidence
 [high / medium / low]
 ```
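
The config lookup order mentioned in the "Check" list of this file (environment variables first, then `~/.selftune/`) could look roughly like the sketch below. It rests on two assumptions not confirmed by the diff: that `SELFTUNE_CONFIG_DIR` takes precedence over `SELFTUNE_HOME`, and that each variable points directly at the directory holding `config.json`. The function name is hypothetical; `selftune doctor` remains the authoritative way to confirm the resolved path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the selftune config directory from env vars or the default."""
    env = os.environ if env is None else env
    # Assumed precedence: SELFTUNE_CONFIG_DIR, then SELFTUNE_HOME.
    for name in ("SELFTUNE_CONFIG_DIR", "SELFTUNE_HOME"):
        value = env.get(name)
        if value:
            return Path(value).expanduser()
    # Documented fallback location.
    return Path("~/.selftune").expanduser()


from_config_dir = resolve_config_dir(
    {"SELFTUNE_CONFIG_DIR": "/tmp/a", "SELFTUNE_HOME": "/tmp/b"}
)
from_home = resolve_config_dir({"SELFTUNE_HOME": "/tmp/b"})
default_dir = resolve_config_dir({})
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.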