npm - selftune - Versions diffs - 0.2.16 → 0.2.18 - Mend

selftune 0.2.16 → 0.2.18

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

package/README.md +24 -19
package/cli/selftune/alpha-upload/build-payloads.ts +14 -1
package/cli/selftune/alpha-upload/client.ts +51 -1
package/cli/selftune/alpha-upload/flush.ts +46 -5
package/cli/selftune/alpha-upload/stage-canonical.ts +25 -4
package/cli/selftune/alpha-upload-contract.ts +9 -0
package/cli/selftune/constants.ts +82 -5
package/cli/selftune/contribute/sanitize.ts +52 -5
package/cli/selftune/dashboard-contract.ts +100 -0
package/cli/selftune/dashboard-server.ts +2 -2
package/cli/selftune/evolution/description-quality.ts +12 -11
package/cli/selftune/evolution/evolve.ts +214 -51
package/cli/selftune/evolution/validate-proposal.ts +9 -6
package/cli/selftune/grading/grade-session.ts +20 -0
package/cli/selftune/hooks/commit-track.ts +188 -0
package/cli/selftune/hooks/prompt-log.ts +10 -1
package/cli/selftune/hooks/session-stop.ts +2 -2
package/cli/selftune/hooks/skill-eval.ts +15 -1
package/cli/selftune/hooks/stdin-preview.ts +32 -0
package/cli/selftune/localdb/direct-write.ts +69 -6
package/cli/selftune/localdb/queries.ts +552 -7
package/cli/selftune/localdb/schema.ts +46 -0
package/cli/selftune/orchestrate.ts +32 -4
package/cli/selftune/routes/overview.ts +41 -3
package/cli/selftune/routes/skill-report.ts +88 -17
package/cli/selftune/types.ts +31 -0
package/cli/selftune/utils/transcript.ts +210 -1
package/node_modules/@selftune/telemetry-contract/src/types.ts +11 -0
package/package.json +1 -1
package/packages/telemetry-contract/src/types.ts +11 -0
package/skill/SKILL.md +29 -1
package/skill/Workflows/Evolve.md +31 -13
package/skill/Workflows/ExportCanonical.md +121 -0
package/skill/Workflows/Hook.md +131 -0
package/skill/Workflows/Initialize.md +9 -8
package/skill/Workflows/Orchestrate.md +27 -5
package/skill/Workflows/Quickstart.md +94 -0
package/skill/Workflows/RepairSkillUsage.md +87 -0
package/skill/Workflows/Uninstall.md +82 -0
package/skill/settings_snippet.json +11 -0

package/skill/SKILL.md CHANGED

@@ -12,7 +12,7 @@ description: >
   even if they don't say "selftune" explicitly.
 metadata:
   author: selftune-dev
-  version: 0.2.16
+  version: 0.2.18
   category: developer-tools
 ---
@@ -104,9 +104,27 @@ selftune cron remove [--dry-run]
 selftune telemetry [status|enable|disable]
 selftune export    [TABLE...] [--output/-o DIR] [--since DATE]
+# Autonomous loop
+selftune orchestrate [--dry-run] [--review-required] [--auto-approve] [--skill NAME] [--max-skills N] [--recent-window HOURS] [--sync-force] [--max-auto-grade N] [--loop] [--loop-interval SECS]
+selftune sync        [--since DATE] [--dry-run] [--force] [--no-claude] [--no-codex] [--no-opencode] [--no-openclaw] [--no-repair] [--json]
+# Discovery + badges
+selftune workflows   [--skill NAME] [--skill-path PATH] [--min-occurrences N] [--window N] [--json] [save --skill NAME --skill-path PATH]
+selftune badge       --skill <name> [--format svg|markdown|url] [--output PATH]
+# Maintenance
+selftune quickstart
+selftune repair-skill-usage [--since DATE] [--dry-run]
+selftune export-canonical   [--out FILE] [--platform NAME] [--record-kind KIND] [--pretty] [--push-payload]
+selftune uninstall          [--dry-run] [--keep-logs] [--npm-uninstall]
+# Hook dispatch (for debugging/manual invocation)
+selftune hook <name>   # prompt-log | session-stop | skill-eval | auto-activate | skill-change-guard | evolution-guard
 # Alpha enrollment (device-code flow — browser opens automatically)
 selftune init --alpha --alpha-email <email>
 selftune alpha upload [--dry-run]
+selftune alpha relink
 selftune status                                                        # shows cloud link state + upload readiness
 ```
@@ -139,6 +157,11 @@ selftune status                                                        # shows c
 | badge, readme badge, skill badge, health badge                                                                                          | Badge             | Workflows/Badge.md                    |
 | workflows, discover workflows, list workflows, multi-skill workflows                                                                    | Workflows         | Workflows/Workflows.md                |
 | alpha upload, upload data, send alpha data, manual upload, dry run upload                                                               | AlphaUpload       | Workflows/AlphaUpload.md              |
+| quickstart, getting started, onboard, first time setup, new user                                                                        | Quickstart        | Workflows/Quickstart.md               |
+| uninstall, remove selftune, clean up, teardown                                                                                          | Uninstall         | Workflows/Uninstall.md                |
+| repair, rebuild usage, fix skill usage, trustworthy usage, repair-skill-usage                                                           | RepairSkillUsage  | Workflows/RepairSkillUsage.md         |
+| export canonical, canonical export, canonical telemetry, push payload                                                                   | ExportCanonical   | Workflows/ExportCanonical.md          |
+| hook, run hook, invoke hook, manual hook, debug hook                                                                                    | Hook              | Workflows/Hook.md                     |
 | export, dump, jsonl, export sqlite, debug export                                                                                        | Export            | _(direct command — no workflow file)_ |
 | status, health summary, skill health, how are skills, skills doing, run selftune                                                        | Status            | _(direct command — no workflow file)_ |
 | last, last session, recent session, what happened, what changed                                                                         | Last              | _(direct command — no workflow file)_ |
@@ -319,6 +342,11 @@ accomplish a task _using_ a skill, route to that skill instead.
 | `agents/pattern-analyst.md`         | Cross-skill conflict detection                      | Spawn when composability flags conflicts        |
 | `agents/evolution-reviewer.md`      | Safety gate for evolution proposals                 | Spawn before deploying high-stakes evolutions   |
 | `agents/integration-guide.md`       | Guided setup for complex projects                   | Spawn for monorepos, multi-skill setups         |
+| `Workflows/Quickstart.md`           | Guided onboarding: init, ingest, status             | First-time setup for new users                  |
+| `Workflows/Uninstall.md`            | Clean removal of selftune data and config           | When removing selftune completely               |
+| `Workflows/RepairSkillUsage.md`     | Rebuild skill usage from source transcripts         | When skill usage data seems inaccurate          |
+| `Workflows/ExportCanonical.md`      | Export canonical telemetry for downstream use       | When exporting data for external consumption    |
+| `Workflows/Hook.md`                 | Manual hook invocation for debugging                | When debugging or testing hooks manually        |
 | `references/logs.md`                | Log file formats (telemetry, usage, queries, audit) | When parsing or debugging log files             |
 | `references/grading-methodology.md` | 3-tier grading model, evidence standards            | When grading sessions or interpreting grades    |
 | `references/invocation-taxonomy.md` | 4 invocation types, coverage analysis               | When analyzing trigger coverage                 |

package/skill/Workflows/Evolve.md CHANGED

@@ -31,14 +31,16 @@ selftune evolve --skill <name> --skill-path <path> [options]
 | `--confidence <n>`           | Minimum confidence threshold (0-1)                                      | 0.6                            |
 | `--max-iterations <n>`       | Maximum retry iterations                                                | 3                              |
 | `--validation-model <model>` | Model for trigger-check validation LLM calls                            | `haiku`                        |
-| `--pareto`                   | Generate multiple candidates per iteration                              | Off                            |
-| `--candidates <n>`           | Number of candidates per iteration (with `--pareto`)                    | 3                              |
+| `--pareto`                   | Generate multiple candidates per iteration                              | On                             |
+| `--candidates <n>`           | Number of candidates per iteration when Pareto mode is enabled          | `3`                            |
 | `--token-efficiency`         | Optimize for token efficiency in proposals                              | Off                            |
 | `--with-baseline`            | Include a no-skill baseline comparison                                  | Off                            |
 | `--cheap-loop`               | Use cheap models for loop, expensive for final gate                     | On                             |
 | `--full-model`               | Use full-cost model throughout (disables cheap-loop)                    | Off                            |
 | `--verbose`                  | Print detailed progress during evolution                                | Off                            |
 | `--gate-model <model>`       | Model for final gate validation                                         | `sonnet` (when `--cheap-loop`) |
+| `--gate-effort <level>`      | Thinking effort for the final gate (`low|medium|high|max`)              | None                           |
+| `--adaptive-gate`            | Escalate risky gate checks to `opus` + `high` effort                    | Off                            |
 | `--proposal-model <model>`   | Model for proposal generation LLM calls                                 | None                           |
 | `--sync-first`               | Refresh source-truth telemetry before generating evals/failure patterns | Off                            |
 | `--sync-force`               | Force a full source rescan during `--sync-first`                        | Off                            |
@@ -115,7 +117,7 @@ Ask one `AskUserQuestion` at a time in this order:
    - `Single model — use one model throughout`
 4. `Advanced Options`
    Options:
-   - `Defaults (0.6 confidence, 3 iterations, single candidate) (recommended)`
+   - `Defaults (0.6 confidence, 3 iterations, 3 Pareto candidates) (recommended)`
    - `Stricter (0.7 confidence, 5 iterations)`
    - `Pareto mode (multiple candidates per iteration)`
@@ -146,7 +148,7 @@ Configuration Summary:
   Model:       haiku (cheap-loop: sonnet gate)
   Confidence:  0.6
   Iterations:  3
-  Pareto:      off
+  Pareto:      on (3 candidates)
 Proceeding...
 ```
@@ -284,15 +286,20 @@ Proposals are scored on heuristic quality criteria (no LLM required). The compos
 ### Stopping Criteria
-The evolution loop stops when any of these conditions is met (priority order):
-| #   | Condition          | Meaning                                             |
-| --- | ------------------ | --------------------------------------------------- |
-| 1   | **Converged**      | Pass rate >= 0.95                                   |
-| 2   | **Max iterations** | Reached `--max-iterations` limit                    |
-| 3   | **Low confidence** | Proposal confidence below `--confidence` threshold  |
-| 4   | **Plateau**        | Pass rate unchanged across 3 consecutive iterations |
-| 5   | **Continue**       | None of the above -- keep iterating                 |
+The evolution loop uses a modular stopping criteria evaluator
+(`evolution/stopping-criteria.ts`) that checks conditions in priority order
+after each validation pass. The evaluator receives the current pass rate,
+historical pass rates from previous iterations, and proposal confidence to
+make a unified stop/continue decision. The stopping reason is recorded in
+audit entries for traceability.
+| #   | Condition          | Meaning                                                        |
+| --- | ------------------ | -------------------------------------------------------------- |
+| 1   | **Converged**      | Pass rate >= 0.95                                              |
+| 2   | **Max iterations** | Reached `--max-iterations` limit                               |
+| 3   | **Low confidence** | Proposal confidence below `--confidence` threshold             |
+| 4   | **Plateau**        | < 1% pass rate variation across 3 consecutive iterations       |
+| 5   | **Continue**       | None of the above -- keep iterating                            |
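The priority order in the table above can be sketched as a single pure function. This is an illustrative sketch only, not the contents of `evolution/stopping-criteria.ts`: the names `StopInput`, `StopReason`, and `evaluateStop` are hypothetical, and the plateau rule is assumed to mean that the highest and lowest of the last three pass rates differ by less than 0.01.

```typescript
// Illustrative sketch: names, types, and the plateau formula are assumptions,
// not the actual stopping-criteria module.
type StopReason = "converged" | "max_iterations" | "low_confidence" | "plateau" | "continue";

interface StopInput {
  passRate: number;            // current validation pass rate, 0..1
  previousPassRates: number[]; // pass rates from earlier iterations, oldest first
  confidence: number;          // proposal confidence, 0..1
  iteration: number;           // 1-based index of the iteration just completed
  maxIterations: number;       // value of --max-iterations
  minConfidence: number;       // value of --confidence
}

function evaluateStop(input: StopInput): StopReason {
  // 1. Converged: pass rate has reached the documented 0.95 bar.
  if (input.passRate >= 0.95) return "converged";
  // 2. Max iterations: the iteration budget is used up.
  if (input.iteration >= input.maxIterations) return "max_iterations";
  // 3. Low confidence: the proposal is below the confidence threshold.
  if (input.confidence < input.minConfidence) return "low_confidence";
  // 4. Plateau (assumed formula): under 0.01 spread across the last three pass rates.
  const lastThree = [...input.previousPassRates, input.passRate].slice(-3);
  if (lastThree.length === 3 && Math.max(...lastThree) - Math.min(...lastThree) < 0.01) {
    return "plateau";
  }
  // 5. Continue: none of the above applied.
  return "continue";
}

const decision = evaluateStop({
  passRate: 0.702,
  previousPassRates: [0.7, 0.705],
  confidence: 0.8,
  iteration: 3,
  maxIterations: 5,
  minConfidence: 0.6,
});
console.log(decision);
```

Because the checks run in priority order, a pass rate of 0.95 or higher reports `converged` even when the iteration budget is also exhausted.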
 ## Cheap Loop Mode
@@ -310,6 +317,11 @@ The gate validation is a new step between validation and deploy. It re-runs
 `validateProposal` using the gate model. If the gate fails, the proposal is
 not deployed.
+When `--adaptive-gate` is enabled, selftune keeps the normal gate for low-risk
+proposals and escalates only risky ones to `opus` with `high` effort. Risk
+signals include small net lift, regressions, low proposal confidence, and
+large description broadening.
 ```bash
 # Cheap loop with default models
 selftune evolve --skill X --skill-path Y --cheap-loop
@@ -317,6 +329,12 @@ selftune evolve --skill X --skill-path Y --cheap-loop
 # Cheap loop with opus gate
 selftune evolve --skill X --skill-path Y --cheap-loop --gate-model opus
+# Cheap loop with adaptive escalation for risky proposals
+selftune evolve --skill X --skill-path Y --cheap-loop --adaptive-gate
+# Explicit high-effort opus gate
+selftune evolve --skill X --skill-path Y --cheap-loop --gate-model opus --gate-effort high
 # Manual model control without cheap-loop
 selftune evolve --skill X --skill-path Y --proposal-model haiku --validation-model sonnet
 ```
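The `--adaptive-gate` escalation described in this file can be pictured as a small decision function. This is a hypothetical sketch: the documentation names the four risk signals but gives no thresholds, so every field name and every number in `ASSUMED_LIMITS` is an illustrative assumption rather than selftune's actual policy.

```typescript
// Hypothetical sketch: field names and all thresholds are illustrative assumptions.
interface GateRiskSignals {
  netLift: number;         // pass-rate gain over the current description
  regressions: number;     // evals that passed before and fail now
  confidence: number;      // proposal confidence, 0..1
  broadeningRatio: number; // new description length divided by old length
}

interface GateConfig {
  model: string;
  effort?: "low" | "medium" | "high" | "max";
}

// Assumed cut-offs; the source does not state the real values.
const ASSUMED_LIMITS = { minNetLift: 0.05, minConfidence: 0.7, maxBroadening: 1.5 };

function isRiskyProposal(signals: GateRiskSignals): boolean {
  return (
    signals.netLift < ASSUMED_LIMITS.minNetLift ||         // small net lift
    signals.regressions > 0 ||                             // any regression
    signals.confidence < ASSUMED_LIMITS.minConfidence ||   // low proposal confidence
    signals.broadeningRatio > ASSUMED_LIMITS.maxBroadening // large description broadening
  );
}

function chooseGate(signals: GateRiskSignals, normalGate: GateConfig): GateConfig {
  // Risky proposals escalate to opus with high effort; others keep the normal gate.
  return isRiskyProposal(signals) ? { model: "opus", effort: "high" } : normalGate;
}

const escalated = chooseGate(
  { netLift: 0.2, regressions: 1, confidence: 0.9, broadeningRatio: 1.0 },
  { model: "sonnet" },
);
console.log(escalated.model, escalated.effort);
```

Keeping the risk test separate from the gate choice means the normal gate stays whatever `--gate-model` selected, and only the escalation target is fixed.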

package/skill/Workflows/ExportCanonical.md ADDED

@@ -0,0 +1,121 @@
+# selftune Export Canonical Workflow
+Export canonical telemetry records as JSONL or as a V2 push payload for cloud
+upload. Canonical records are the normalized, platform-agnostic representation
+of sessions, prompts, skill invocations, execution facts, and normalization runs.
+## When to Use
+- The user wants to export telemetry data for external analysis
+- The user says "export canonical", "canonical export", or "canonical telemetry"
+- The agent needs to produce a push payload for manual upload inspection
+- Debugging what data would be sent to the cloud API
+## Default Command
+```bash
+selftune export-canonical
+```
+## Options
+| Flag                    | Description                                                         |
+| ----------------------- | ------------------------------------------------------------------- |
+| `--out <path>`          | Write output to a file instead of stdout                            |
+| `--platform <name>`     | Filter by platform (`claude_code`, `codex`, `opencode`, `openclaw`) |
+| `--record-kind <kind>`  | Filter by record kind (`session`, `prompt`, `skill_invocation`, `execution_fact`, `normalization_run`) |
+| `--pretty`              | Pretty-print JSON output with 2-space indentation                   |
+| `--log <path>`          | Path to canonical log file (default: `~/.claude/canonical_log.jsonl`) |
+| `--projects-dir <path>` | Claude transcript directory for fallback synthesis (default: `~/.claude/projects`) |
+| `--push-payload`        | Output as a V2 push payload envelope instead of raw JSONL           |
+## Output Formats
+### Default (JSONL)
+One canonical record per line:
+```jsonl
+{"record_kind":"session","session_id":"abc123","platform":"claude_code",...}
+{"record_kind":"prompt","prompt_id":"p1","session_id":"abc123",...}
+{"record_kind":"skill_invocation","invocation_id":"inv1","skill_name":"selftune",...}
+```
+### Push Payload (`--push-payload`)
+A single JSON envelope matching the V2 cloud upload schema:
+```json
+{
+  "schema_version": "2.0",
+  "client_version": "0.1.0",
+  "push_id": "uuid",
+  "normalizer_version": "1.0.0",
+  "canonical": {
+    "sessions": [...],
+    "prompts": [...],
+    "skill_invocations": [...],
+    "execution_facts": [...],
+    "normalization_runs": [...],
+    "evolution_evidence": [...],
+    "orchestrate_runs": [],
+    "grading_results": [],
+    "improvement_signals": []
+  }
+}
+```
+### File output (`--out`)
+When `--out` is specified, the data is written to the file and a JSON summary
+is printed to stdout:
+```json
+{
+  "ok": true,
+  "out": "/path/to/output.jsonl",
+  "count": 42,
+  "format": "jsonl",
+  "pretty": false,
+  "platform": null,
+  "record_kind": null
+}
+```
+## Fallback Behavior
+If the canonical log file is empty or does not exist, the command falls back to
+synthesizing canonical records directly from Claude Code transcripts in
+`--projects-dir`. This supports existing installs that have rich transcript
+data but have not yet generated a canonical log.
+## Common Patterns
+**Export all canonical data**
+> Run `selftune export-canonical > export.jsonl` to dump everything.
+**Export only skill invocations**
+> Run `selftune export-canonical --record-kind skill_invocation` to filter.
+**Inspect push payload before upload**
+> Run `selftune export-canonical --push-payload --pretty` to see exactly what would be sent to the cloud API.
+**Export to file with summary**
+> Run `selftune export-canonical --out /tmp/export.jsonl --pretty` to write data and see a count summary.
+**Filter by platform**
+> Run `selftune export-canonical --platform claude_code` to export only Claude Code records.
+## Troubleshooting
+| Symptom | Cause | Fix |
+| --- | --- | --- |
+| Empty output | No canonical log and no transcripts | Run `selftune sync` or `selftune quickstart` to ingest data first |
+| "Unknown platform" error | Invalid `--platform` value | Use one of: `claude_code`, `codex`, `opencode`, `openclaw` |
+| "Unknown record kind" error | Invalid `--record-kind` value | Use one of: `session`, `prompt`, `skill_invocation`, `execution_fact`, `normalization_run` |
+| Push payload missing evolution evidence | No evolution runs recorded | Run `selftune evolve` to generate evidence, then re-export |
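For downstream consumers of the JSONL output, the relationship between `record_kind` values and the arrays in the push payload might look like the sketch below. It is a sketch under assumptions: `groupCanonical` is a hypothetical helper, and the kind-to-array mapping is inferred from the names shown in the documented payload, not taken from selftune's source.

```typescript
// Sketch under assumptions: the kind -> bucket mapping is inferred from the documented
// payload keys, and groupCanonical is a hypothetical helper, not part of selftune.
const RECORD_KINDS = ["session", "prompt", "skill_invocation", "execution_fact", "normalization_run"] as const;
type RecordKind = (typeof RECORD_KINDS)[number];

interface CanonicalRecord {
  record_kind: RecordKind;
  [field: string]: unknown;
}

const BUCKET_FOR_KIND: Record<RecordKind, string> = {
  session: "sessions",
  prompt: "prompts",
  skill_invocation: "skill_invocations",
  execution_fact: "execution_facts",
  normalization_run: "normalization_runs",
};

function groupCanonical(jsonl: string, onlyKind?: RecordKind): Record<string, CanonicalRecord[]> {
  const buckets: Record<string, CanonicalRecord[]> = {};
  for (const kind of RECORD_KINDS) buckets[BUCKET_FOR_KIND[kind]] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines and a trailing newline
    const record = JSON.parse(line) as CanonicalRecord;
    // Mirrors the effect of --record-kind: drop everything except the requested kind.
    if (onlyKind !== undefined && record.record_kind !== onlyKind) continue;
    const bucketName: string | undefined = BUCKET_FOR_KIND[record.record_kind];
    if (bucketName === undefined) continue; // skip kinds this sketch does not know
    buckets[bucketName]?.push(record);
  }
  return buckets;
}

const sampleJsonl = [
  '{"record_kind":"session","session_id":"abc123","platform":"claude_code"}',
  '{"record_kind":"prompt","prompt_id":"p1","session_id":"abc123"}',
  '{"record_kind":"skill_invocation","invocation_id":"inv1","skill_name":"selftune"}',
].join("\n");
const grouped = groupCanonical(sampleJsonl);
console.log(Object.keys(grouped));
```

The sample lines reuse the field names from the JSONL example in this file, with the elided fields simply left out.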

package/skill/Workflows/Hook.md ADDED

@@ -0,0 +1,131 @@
+# selftune Hook Workflow
+Manually invoke individual Claude Code hooks for debugging and testing.
+Each hook reads its payload from stdin and behaves exactly as it would when
+triggered by the Claude Code host agent.
+## When to Use
+- Debugging a specific hook's behavior with a known payload
+- The user says "hook", "run hook", "invoke hook", "manual hook", or "debug hook"
+- Testing hook installation by simulating a hook event
+- Verifying hook output before or after configuration changes
+## Default Command
+```bash
+echo '{"payload":"..."}' | selftune hook <name>
+```
+Where `<name>` is one of the 6 available hooks.
+## Available Hooks
+| Hook Name              | Claude Code Event      | Purpose                                                                 |
+| ---------------------- | ---------------------- | ----------------------------------------------------------------------- |
+| `prompt-log`           | UserPromptSubmit       | Logs every user query to SQLite for false-negative eval detection        |
+| `session-stop`         | Stop                   | Extracts session-level telemetry from transcript when a session ends     |
+| `skill-eval`           | PostToolUse            | Records skill usage when a SKILL.md is read or a Skill tool is invoked  |
+| `auto-activate`        | UserPromptSubmit       | Evaluates activation rules and suggests selftune actions via stderr      |
+| `skill-change-guard`   | PreToolUse             | Warns (advisory) when an agent is about to write to a SKILL.md file     |
+| `evolution-guard`      | PreToolUse             | Blocks writes to monitored SKILL.md files until `selftune watch` runs   |
+## Hook Details
+### prompt-log
+Fires on every user message before Claude processes it. Writes the query to
+SQLite so that `hooks-to-evals` can identify prompts that did NOT trigger a
+skill — the raw material for false-negative eval entries. Also writes a
+canonical prompt record.
+### session-stop
+Fires when a Claude Code session ends. Reads the session transcript JSONL and
+extracts process-level telemetry (tool calls, errors, skills triggered, token
+counts). Writes one record per session to SQLite with a JSONL backup. May
+trigger a reactive `selftune orchestrate` spawn if conditions are met.
+### skill-eval
+Fires after Read or Skill tool calls. If the target is a SKILL.md file or a
+Skill invocation, finds the triggering user query from the transcript and
+writes a usage record. Builds the real-usage eval dataset over time.
+### auto-activate
+Fires on every user message. Evaluates activation rules against the session
+context and outputs suggestions to stderr (shown to Claude as system messages).
+Suggestions are advisory only — exit code is always 0. Tracks session state to
+avoid repeated suggestions.
+### skill-change-guard
+Fires before Write/Edit tool calls. If the target is a SKILL.md file, outputs
+a suggestion to run `selftune watch --skill <name>` to monitor impact. Advisory
+only — exit code is always 0, never blocking. Uses session state to avoid
+repeating suggestions for the same skill.
+### evolution-guard
+Fires before Write/Edit tool calls. If the target is a SKILL.md file that has
+a deployed evolution under active monitoring, and no recent `selftune watch`
+snapshot exists, this hook BLOCKS the write with exit code 2. This prevents
+unmonitored changes to skills that are being tracked.
+Exit codes:
+- `0` — Allow (not a SKILL.md, not monitored, or watch is recent)
+- `2` — Block with message (Claude Code convention for PreToolUse hooks)
+Fail-open: any internal error results in exit 0 (never blocks accidentally).
+## Output Format
+Hook output varies by hook type:
+- **prompt-log, session-stop, skill-eval**: Write to SQLite and JSONL logs silently. Exit 0 on success.
+- **auto-activate**: Writes suggestions to stderr. Exit 0 always.
+- **skill-change-guard**: Writes advisory message to stderr. Exit 0 always.
+- **evolution-guard**: Writes block message to stderr on exit 2. Exit 0 when allowing.
+## Common Patterns
+**Debug a prompt-log hook**
+> Pipe a UserPromptSubmit payload to test prompt logging:
+>
+> ```bash
+> echo '{"session_id":"test","query":"improve my skills"}' | selftune hook prompt-log
+> ```
+**Test skill-eval with a PostToolUse payload**
+> ```bash
+> echo '{"tool_name":"Read","file_path":"/path/to/SKILL.md","session_id":"test"}' | selftune hook skill-eval
+> ```
+**Verify evolution-guard blocks correctly**
+> ```bash
+> echo '{"tool_name":"Write","file_path":"/path/to/monitored/SKILL.md"}' | selftune hook evolution-guard
+> echo $?  # Should be 2 if skill is monitored without recent watch
+> ```
+## Error Handling
+If no hook name is provided or the name is unrecognized, the command exits with
+a `UNKNOWN_COMMAND` error listing available hooks:
+```
+Unknown hook: (none). Available: prompt-log, session-stop, skill-eval, auto-activate, skill-change-guard, evolution-guard
+```
+## Troubleshooting
+| Symptom | Cause | Fix |
+| --- | --- | --- |
+| "Unknown hook" error | Typo in hook name | Use one of: `prompt-log`, `session-stop`, `skill-eval`, `auto-activate`, `skill-change-guard`, `evolution-guard` |
+| Hook exits 0 but no data written | Payload missing required fields | Check the hook's expected payload schema in `cli/selftune/types.ts` |
+| evolution-guard always exits 0 | No deployed evolution for the target skill | Run `selftune evolve` first to deploy an evolution, then test the guard |
+| auto-activate produces no suggestions | Activation rules not configured or already suggested in session | Check `~/.selftune/` for activation rules and session state files |
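The exit-code rules documented for `evolution-guard` (allow with 0, block with 2, fail open on errors) can be sketched as follows. The payload shape and the `GuardLookups` callbacks are hypothetical stand-ins for whatever the real hook reads from SQLite; only the three allow conditions and the fail-open rule come from the text above.

```typescript
// Hypothetical sketch: the payload shape and GuardLookups callbacks are assumptions.
interface GuardPayload {
  tool_name: string;
  file_path?: string;
}

interface GuardLookups {
  isMonitored(skillPath: string): boolean;    // deployed evolution under active monitoring?
  hasRecentWatch(skillPath: string): boolean; // recent `selftune watch` snapshot present?
}

function evolutionGuardExitCode(payload: GuardPayload, lookups: GuardLookups): 0 | 2 {
  try {
    const target = payload.file_path ?? "";
    if (!/SKILL\.md$/.test(target)) return 0;     // allow: not a SKILL.md
    if (!lookups.isMonitored(target)) return 0;   // allow: not monitored
    if (lookups.hasRecentWatch(target)) return 0; // allow: watch is recent
    return 2;                                     // block the write
  } catch {
    return 0; // fail-open: internal errors never block
  }
}

const blockedCode = evolutionGuardExitCode(
  { tool_name: "Write", file_path: "/path/to/monitored/SKILL.md" },
  { isMonitored: () => true, hasRecentWatch: () => false },
);
console.log(blockedCode);
```

Wrapping the whole decision in one `try` is what makes the guard fail open: a broken lookup degrades to "allow" instead of blocking edits.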

package/skill/Workflows/Initialize.md CHANGED

@@ -126,14 +126,15 @@ Code subagent calls stay up to date.
 **Hook reference** (for troubleshooting):
-| Hook                       | Script                        | Purpose                                         | Notes                                          |
-| -------------------------- | ----------------------------- | ----------------------------------------------- | ---------------------------------------------- |
-| `UserPromptSubmit`         | `hooks/prompt-log.ts`         | Log every user query                            | Accepts both `prompt` and legacy `user_prompt` |
-| `UserPromptSubmit`         | `hooks/auto-activate.ts`      | Suggest skills before prompt processing         | Uses `additionalContext` JSON for suggestions  |
-| `PreToolUse` (Write/Edit)  | `hooks/skill-change-guard.ts` | Detect uncontrolled skill edits                 | `if` filter: only fires on `*SKILL.md` paths   |
-| `PreToolUse` (Write/Edit)  | `hooks/evolution-guard.ts`    | Block SKILL.md edits on monitored skills        | `if` filter: only fires on `*SKILL.md` paths   |
-| `PostToolUse` (Read/Skill) | `hooks/skill-eval.ts`         | Track skill triggers and Skill tool invocations |                                                |
-| `Stop`                     | `hooks/session-stop.ts`       | Capture session telemetry                       | Runs async (non-blocking), 60s timeout         |
+| Hook                       | Script                        | Purpose                                         | Notes                                           |
+| -------------------------- | ----------------------------- | ----------------------------------------------- | ----------------------------------------------- |
+| `UserPromptSubmit`         | `hooks/prompt-log.ts`         | Log every user query                            | Accepts both `prompt` and legacy `user_prompt`  |
+| `UserPromptSubmit`         | `hooks/auto-activate.ts`      | Suggest skills before prompt processing         | Uses `additionalContext` JSON for suggestions   |
+| `PreToolUse` (Write/Edit)  | `hooks/skill-change-guard.ts` | Detect uncontrolled skill edits                 | `if` filter: only fires on `*SKILL.md` paths    |
+| `PreToolUse` (Write/Edit)  | `hooks/evolution-guard.ts`    | Block SKILL.md edits on monitored skills        | `if` filter: only fires on `*SKILL.md` paths    |
+| `PostToolUse` (Read/Skill) | `hooks/skill-eval.ts`         | Track skill triggers and Skill tool invocations | Fast-path: skips non-PostToolUse/non-Read/Skill |
+| `PostToolUse` (Bash)       | `hooks/commit-track.ts`       | Track git commits for session traceability      | Fast-path: skips non-git Bash commands          |
+| `Stop`                     | `hooks/session-stop.ts`       | Capture session telemetry                       | Runs async (non-blocking), 60s timeout          |
 **Codex agents:**

package/skill/Workflows/Orchestrate.md CHANGED

@@ -20,6 +20,22 @@ recent changes with auto-rollback enabled.
 selftune orchestrate
 ```
+Autonomous evolve settings used by orchestrate:
+```text
+confidenceThreshold = 0.6
+maxIterations = 3
+paretoEnabled = true
+candidateCount = 3
+tokenEfficiencyEnabled = false
+withBaseline = false
+validationModel = haiku
+cheapLoop = true
+gateModel = sonnet
+adaptiveGate = true
+proposalModel = haiku
+```
 ## Flags
 | Flag                        | Description                                                | Default    |
@@ -109,10 +125,11 @@ This is the recommended runtime for recurring autonomous scheduling.
 | **Automated (loop)** | `selftune orchestrate --loop`         | No agent session; LLM cost only if evolution triggers | Configurable interval |
 In automated mode, the OS calls the CLI binary directly. No agent session
-is created. LLM calls only happen during the evolution step (proposing and
-validating description changes), which uses the configured model tier.
-The orchestrate logic itself (sync, status, candidate selection) is pure
-data processing with zero token cost.
+is created. Outside of the regular sync/status/candidate-selection logic,
+LLM calls can come from auto-grading ungraded skills and from the evolution
+step itself. By default, orchestrate runs proposal generation and validation
+on `haiku`, then re-runs the final gate on `sonnet` before deploy. Risky
+candidates are escalated to `opus` with `high` effort for the gate only.
 **Cron mode:** Install OS-level scheduling with `selftune cron setup`.
 Runs as separate invocations on a schedule (default: every 6 hours).
@@ -144,10 +161,15 @@ In autonomous mode, orchestrate calls sub-workflows in this fixed order:
 1. **Sync** — refresh source-truth telemetry across all supported agents (`selftune sync`)
 2. **Status** — compute skill health using existing grade results (reads `grading.json` outputs from previous sessions)
 3. **Auto-grade** — grade up to `--max-auto-grade` (default 5) ungraded skills that have session data but no grades yet. Skipped during `--dry-run` (grading makes LLM calls). After grading, status is recomputed so candidate selection sees updated grades. Fail-open: individual grading errors are logged but never block the loop.
-4. **Evolve** — run evolution on selected candidates (pre-flight is skipped, cheap-loop mode enabled, defaults used)
+4. **Evolve** — run evolution on selected candidates (pre-flight is skipped; Pareto mode uses 3 candidates; cheap-loop uses `haiku` for proposal + validation and `sonnet` for the final gate; adaptive gate escalation promotes risky proposals to `opus` + `high` effort; baseline and token-efficiency stay off)
 5. **Watch** — monitor recently evolved skills (auto-rollback enabled by default, `--recent-window` hours lookback)
 6. **Alpha Upload** — if enrolled in the alpha program (`config.alpha.enrolled === true`) and an API key is configured, stage new canonical records (sessions, invocations, evolution evidence, orchestrate runs) into `canonical_upload_staging`, build V2 push payloads, and flush to the cloud API (`POST /api/v1/push`) with Bearer auth. Fail-open: upload errors never block the orchestrate loop. Respects `--dry-run`.
+When orchestrate invokes evolve for a selected candidate, it always passes
+`confidenceThreshold: 0.6` and `maxIterations: 3`, plus the autonomous evolve
+defaults listed above. Those defaults are the recurring-run policy for the
+autonomy-first loop; there are no orchestrate flags to override them per run.
 Between candidate selection and evolution, orchestrate checks for
 **cross-skill eval set overlap**. When two or more evolution candidates
 share >30% of their positive eval queries, a warning is logged to stderr.
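The cross-skill overlap check can be illustrated with a pairwise comparison. This is a sketch under assumptions: the text does not say what the 30% is measured against, so the ratio here is taken relative to the smaller of the two eval sets, and `EvolveCandidate`, `overlapRatio`, and `findOverlaps` are hypothetical names.

```typescript
// Sketch under assumptions: overlap is measured against the smaller eval set,
// and the candidate shape is hypothetical.
interface EvolveCandidate {
  skill: string;
  positiveQueries: string[];
}

function overlapRatio(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const smaller = Math.min(setA.size, setB.size);
  if (smaller === 0) return 0; // an empty eval set cannot overlap with anything
  let shared = 0;
  setA.forEach((query) => {
    if (setB.has(query)) shared += 1;
  });
  return shared / smaller;
}

function findOverlaps(candidates: EvolveCandidate[], threshold = 0.3): string[] {
  const warnings: string[] = [];
  // Compare each unordered pair of candidates exactly once.
  candidates.forEach((first, index) => {
    candidates.slice(index + 1).forEach((second) => {
      const ratio = overlapRatio(first.positiveQueries, second.positiveQueries);
      if (ratio > threshold) {
        const percent = Math.round(ratio * 100);
        warnings.push(`${first.skill} and ${second.skill} share ${percent}% of positive eval queries`);
      }
    });
  });
  return warnings;
}

const overlapWarnings = findOverlaps([
  { skill: "alpha", positiveQueries: ["q1", "q2", "q3", "q4"] },
  { skill: "beta", positiveQueries: ["q1", "q2", "x", "y"] },
  { skill: "gamma", positiveQueries: ["z"] },
]);
overlapWarnings.forEach((warning) => console.error(warning));
```

Measuring against the smaller set flags a small skill whose evals are largely contained in a bigger one; measuring against the union would be a stricter alternative.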

package/skill/Workflows/Quickstart.md ADDED

@@ -0,0 +1,94 @@
+# selftune Quickstart Workflow
+Guided onboarding that runs init, ingest, and status in a single command.
+Designed for first-time users who want to get selftune working immediately.
+## When to Use
+- The user is setting up selftune for the first time
+- The user says "getting started", "quickstart", "onboard", or "first time"
+- The agent needs to bootstrap selftune in one step without running init, ingest, and status separately
+## Default Command
+```bash
+selftune quickstart
+```
+Help:
+```bash
+selftune quickstart --help
+```
+## Options
+| Flag     | Description            |
+| -------- | ---------------------- |
+| `--help` | Show usage information |
+## Steps Performed
+Quickstart runs three steps automatically:
+1. **Init** — Creates `~/.selftune/config.json` if it does not exist. Skips if config is already present.
+2. **Ingest** — Runs Claude Code transcript replay if the ingest marker file does not exist. Discovers transcripts from `~/.claude/projects/` and writes session telemetry to SQLite.
+3. **Status** — Displays current skill health using `computeStatus`. Shows pass rates, trends, and health indicators for all detected skills; when you need per-skill check volume, look at `snapshot.skill_checks` rather than a "session count" field.
+After status, quickstart suggests the top 3 skills that would benefit from evolution, prioritized by:
+- **UNGRADED/UNKNOWN** skills (highest priority) — suggests running `selftune grade`
+- **CRITICAL** skills (pass rate below threshold) — suggests evolution
+- **WARNING** skills — suggests improvement
+## Output Format
+```text
+selftune quickstart
+====================
+[1/3] Config exists, skipping init.
+[2/3] Running ingest claude...
+      Ingested 12 sessions.
+[3/3] Current status:
+  Skill Health Summary
+  ...
+Suggested next steps:
+  - my-skill: pass rate 45% — needs evolution
+  - other-skill: needs grading — run `selftune grade --skill other-skill`
+```
+If all skills are healthy, the output ends with:
+```text
+All skills are healthy. No immediate actions needed.
+```
+## Common Patterns
+**First-time setup**
+> Run `selftune quickstart`. It handles init, ingest, and status automatically.
+**Already initialized**
+> Quickstart skips steps that are already complete (config exists, ingest marker exists). It is safe to run multiple times.
+**No transcripts found**
+> If no Claude Code transcripts exist in `~/.claude/projects/`, quickstart reports "No Claude Code transcripts found" and continues to the status step. The user should run some agent sessions first, then re-run quickstart.
+**Status or ingest fails**
+> Quickstart catches errors in each step and suggests the manual command for troubleshooting (e.g., `selftune init`, `selftune ingest claude`, or `selftune status`).
+## Troubleshooting
+| Symptom | Cause | Fix |
+| --- | --- | --- |
+| "Init failed" at step 1 | Config directory permissions or corrupted config | Run `selftune init --force` manually |
+| "Ingest failed" at step 2 | Transcript directory missing or unreadable | Verify `~/.claude/projects/` exists and contains session directories |
+| "No sessions found" after ingest | No actionable transcripts or no skill usage detected | Run agent sessions that use skills, then re-run quickstart |
+| "Status failed" at step 3 | SQLite database issue | Run `selftune doctor` to diagnose |
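The "top 3 suggestions" ordering described under Steps Performed can be sketched as a sort over health labels. The `SkillHealth` shape, the `HEALTHY` label, and the message wording are hypothetical and only mirror the prose in this file; they are not the actual `computeStatus` output.

```typescript
// Hypothetical sketch: SkillHealth and the HEALTHY label are assumptions that mirror
// the documented priorities, not the real computeStatus result.
type HealthLabel = "UNGRADED" | "UNKNOWN" | "CRITICAL" | "WARNING" | "HEALTHY";

interface SkillHealth {
  skill: string;
  health: HealthLabel;
}

// Lower number means higher priority, following the documented order.
const SUGGESTION_PRIORITY: Record<HealthLabel, number> = {
  UNGRADED: 0,
  UNKNOWN: 0,
  CRITICAL: 1,
  WARNING: 2,
  HEALTHY: 3,
};

function suggestNextSteps(skills: SkillHealth[], limit = 3): string[] {
  return skills
    .filter((entry) => entry.health !== "HEALTHY") // filter returns a copy, so sort is safe
    .sort((a, b) => SUGGESTION_PRIORITY[a.health] - SUGGESTION_PRIORITY[b.health])
    .slice(0, limit)
    .map((entry) => {
      if (entry.health === "UNGRADED" || entry.health === "UNKNOWN") {
        return `${entry.skill}: needs grading, run \`selftune grade --skill ${entry.skill}\``;
      }
      if (entry.health === "CRITICAL") return `${entry.skill}: needs evolution`;
      return `${entry.skill}: could use improvement`;
    });
}

const nextSteps = suggestNextSteps([
  { skill: "docs-skill", health: "WARNING" },
  { skill: "my-skill", health: "CRITICAL" },
  { skill: "fine-skill", health: "HEALTHY" },
  { skill: "other-skill", health: "UNGRADED" },
  { skill: "extra-skill", health: "WARNING" },
]);
nextSteps.forEach((step) => console.log(`  - ${step}`));
```

An empty result corresponds to the "All skills are healthy" message shown in the output format above.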