npm - selftune - Versions diffs - 0.2.23 → 0.2.24 - Mend

selftune 0.2.23 → 0.2.24

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (219)

package/skill/references/cli-quick-reference.md ADDED

@@ -0,0 +1,89 @@
+# CLI Quick Reference
+Full flag reference for all selftune commands. Run `selftune <command> --help`
+for the most up-to-date flags.
+```bash
+# Ingest group
+selftune ingest claude   [--since DATE] [--dry-run] [--force] [--verbose]
+selftune ingest codex                                                          # (experimental)
+selftune ingest opencode                                                       # (experimental)
+selftune ingest openclaw [--agents-dir PATH] [--since DATE] [--dry-run] [--force] [--verbose]  # (experimental)
+selftune ingest pi       [--sessions-dir PATH] [--since DATE] [--dry-run] [--force] [--verbose]  # (experimental)
+selftune ingest wrap-codex -- <codex args>                                     # (experimental)
+# Grade group
+selftune grade auto      --skill <name> [--expectations "..."] [--agent <name>]
+selftune grade baseline  --skill <name> --skill-path <path> [--eval-set <path>] [--agent <name>]
+# Evolve group
+selftune evolve          --skill <name> --skill-path <path> [--dry-run] [--validation-mode auto|replay|judge]
+selftune evolve body     --skill <name> --skill-path <path> --target <body|routing> [--dry-run]
+selftune evolve rollback --skill <name> --skill-path <path> [--proposal-id <id>]
+# Eval group
+selftune eval generate      --skill <name> [--list-skills] [--stats] [--max N] [--seed N] [--output PATH] [--blend]
+selftune eval unit-test      --skill <name> --tests <path> [--run-agent] [--generate]
+selftune eval import         --dir <path> --skill <name> --output <path> [--match-strategy exact|fuzzy]
+selftune eval composability  --skill <name> [--window N] [--telemetry-log <path>]
+selftune eval family-overlap --prefix <family-> | --skills <a,b,c> [--parent-skill <name>] [--min-overlap 0.3] [--min-shared 2]
+# Other commands
+selftune watch    --skill <name> --skill-path <path> [--auto-rollback] [--grade-threshold N] [--no-grade-watch]
+selftune status
+selftune last
+selftune doctor
+selftune dashboard [--port <port>] [--no-open]
+selftune contributions [status|preview <skill>|upload [--dry-run]|approve <skill>|revoke <skill>|default <ask|always|never>|reset]
+selftune creator-contributions [status|enable --skill <name>|enable --all [--prefix <value>]|disable --skill <name>]
+selftune contribute [--skill NAME] [--preview] [--sanitize LEVEL] [--submit]
+selftune cron setup [--dry-run]                         # auto-detect platform (cron/launchd/systemd)
+selftune cron setup --platform openclaw [--dry-run] [--tz <timezone>]  # OpenClaw-specific
+selftune cron list
+selftune cron remove [--dry-run]
+selftune telemetry [status|enable|disable]
+selftune export    [TABLE...] [--output/-o DIR] [--since DATE]
+# Autonomous loop
+selftune orchestrate [--dry-run] [--review-required] [--auto-approve] [--skill NAME] [--max-skills N] [--recent-window HOURS] [--sync-force] [--max-auto-grade N] [--loop] [--loop-interval SECS]
+selftune sync        [--since DATE] [--dry-run] [--force] [--no-claude] [--no-codex] [--no-opencode] [--no-openclaw] [--no-pi] [--no-repair] [--json]
+# Discovery + badges
+selftune workflows   [--skill NAME] [--skill-path PATH] [--min-occurrences N] [--window N] [--json] [save <name-or-index> --skill-path PATH] [scaffold <name-or-index> --output-dir PATH --skill-name NAME --description TEXT --write --force]
+selftune badge       --skill <name> [--format svg|markdown|url] [--output PATH]
+# Maintenance
+selftune quickstart
+selftune repair-skill-usage [--since DATE] [--dry-run]
+selftune recover            [--full] [--force] [--since DATE]
+selftune export-canonical   [--out FILE] [--platform NAME] [--record-kind KIND] [--pretty] [--push-payload]
+selftune uninstall          [--dry-run] [--keep-logs] [--npm-uninstall]
+# Hook dispatch (for debugging/manual invocation)
+selftune hook <name>   # prompt-log | session-stop | skill-eval | auto-activate | skill-change-guard | evolution-guard
+# Platform hooks (non-Claude-Code agents)
+selftune codex hook
+selftune codex install    [--dry-run] [--uninstall]
+selftune opencode hook
+selftune opencode install [--dry-run] [--uninstall]
+selftune cline hook
+selftune cline install    [--dry-run] [--uninstall]
+selftune pi hook
+selftune pi install       [--dry-run] [--uninstall]
+# Registry (team skill distribution)
+selftune registry push [name]      [--version=<semver>] [--summary=<text>]
+selftune registry install <name>   [--global]
+selftune registry sync
+selftune registry status
+selftune registry rollback <name>  [--to=<version>] [--reason=<text>]
+selftune registry history <name>
+selftune registry list
+# Alpha enrollment (device-code flow — browser opens automatically)
+selftune init --alpha --alpha-email <email>
+selftune alpha upload [--dry-run]
+selftune alpha relink
+selftune status                                                        # shows cloud link state + upload readiness
+```

package/skill/references/creator-playbook.md ADDED

@@ -0,0 +1,131 @@
+# Creator Playbook
+Use this when you are publishing a skill other people will install.
+If the user wants the operational step-by-step loop from cold start to deploy,
+route first to `workflows/CreateTestDeploy.md`. Use this reference for the
+packaging and after-ship interpretation layer around that loop.
+The goal is simple:
+1. ship a skill that routes cleanly on day one
+2. collect privacy-safe signal after launch
+3. turn that signal into a safe improvement loop
+## Before Ship
+### Decide what belongs where
+| Put it in... | When it belongs there |
+| --- | --- |
+| `description` / routing section | The user intent that should trigger the skill |
+| `workflows/` | Ordered procedures the agent should follow once routed |
+| `references/` | Background knowledge, checklists, examples, or taxonomy the agent may need during execution |
+| `scripts/` or tools | Deterministic mechanics the agent should not reinvent every run |
+Rule of thumb:
+- If the agent needs to **recognize** a request, fix the router.
+- If the agent needs to **follow steps**, add or split a workflow.
+- If the agent needs **context**, add a reference.
+- If the agent keeps redoing the same exact logic, make it code.
+### Keep the routing surface small
+- Start router-first. Add only the trigger phrases and negative examples needed to call the right skill.
+- Keep workflow detail out of the top-level description.
+- Split into separate workflows when the execution path meaningfully changes.
+- Add negative examples whenever a nearby intent should not trigger the skill.
+### Cold-start test and deploy the skill before publishing
+The default creator loop is now:
+```bash
+selftune eval generate --skill my-skill
+selftune eval unit-test --skill my-skill --generate --skill-path path/to/SKILL.md
+selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run --validation-mode replay
+selftune grade baseline --skill my-skill --skill-path path/to/SKILL.md
+selftune evolve --skill my-skill --skill-path path/to/SKILL.md --with-baseline
+selftune watch --skill my-skill
+```
+That same sequence is now packaged as the dedicated `CreateTestDeploy`
+workflow in the shipped selftune skill, while `Evals`, `UnitTest`, `Baseline`,
+`Evolve`, and `Watch` remain the atomic workflow docs for each individual step.
+The dashboard overview, per-skill report, and `selftune status` all read from that loop and show
+the next missing step directly, then flip to deploy-ready and watching states once the skill is shipped.
+Ship only after you can explain:
+- what should trigger the skill
+- what should not
+- where the body depends on references versus tools
+### Bundle creator-directed contribution config
+If you want post-ship creator signal:
+```bash
+selftune creator-contributions enable --skill my-skill --creator-id <cloud-user-uuid>
+```
+This writes `selftune.contribute.json` into the skill package so end users can opt in to privacy-safe creator-directed sharing.
+The `creator_id` must be your cloud user UUID. Supported signals today are:
+- `trigger`
+- `grade`
+- `miss_category`
+## After Ship
+### Tell users what to opt into
+There are two different community paths:
+- `selftune contributions approve <skill>`: creator-directed relay signals for your dashboard
+- `selftune contribute --skill <skill> --submit`: sanitized community bundle submission
+Relay is the lightweight always-on loop. Bundles are the deeper periodic export.
+### Watch the right surfaces
+After launch, the loop is:
+1. open the cloud Community page or the skill detail Community tab
+2. check whether the skill is still low-signal or has crossed the actionable threshold
+3. inspect missed categories and grade distribution
+4. create a contributor proposal only when the signal is coherent
+5. approve/apply the proposal through the normal proposals flow
+6. watch the skill after apply
+Actionable threshold today:
+- at least `10` total signals
+- at least `3` distinct contributor cohorts
+### Interpret signal correctly
+- High missed counts with concentrated categories usually mean the **description/router** is wrong.
+- Low grades with decent trigger rate usually mean the **body/workflow/reference/tool split** is wrong.
+- Low-signal skills need more contributors before you trust a proposal.
+## Fast Checklist
+Before ship:
+- router describes when to use the skill
+- workflows describe how to do the job
+- references carry durable context
+- tools/scripts carry deterministic mechanics
+- evals cover both your language and other likely phrasings
+- `selftune.contribute.json` is bundled if you want creator-directed signal
+After ship:
+- community overview shows your skill by name
+- per-skill community page shows missed categories and grades
+- contributor proposals are reviewed before apply
+- watch is run after meaningful changes
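
The "Actionable threshold today" rule in the playbook above (at least 10 total signals from at least 3 distinct contributor cohorts) can be sketched in a few lines of Python. This is only an illustration: the `Signal` record, its field names, and `is_actionable` are hypothetical, since the diff does not show how selftune stores or counts relay signals.

```python
from dataclasses import dataclass

# Thresholds quoted from the playbook's "Actionable threshold today" list.
MIN_TOTAL_SIGNALS = 10
MIN_DISTINCT_COHORTS = 3


@dataclass(frozen=True)
class Signal:
    """Hypothetical record shape; the real relay payload is not shown in the diff."""

    cohort: str  # opaque contributor-cohort identifier (assumed field)
    kind: str    # "trigger", "grade", or "miss_category"


def is_actionable(signals: list[Signal]) -> bool:
    """Return True once both playbook thresholds are met."""
    distinct_cohorts = {s.cohort for s in signals}
    return (
        len(signals) >= MIN_TOTAL_SIGNALS
        and len(distinct_cohorts) >= MIN_DISTINCT_COHORTS
    )


# Twelve signals from only two cohorts: enough volume, too little breadth.
narrow = [Signal(cohort=f"c{i % 2}", kind="trigger") for i in range(12)]
# Ten signals spread over three cohorts: meets both thresholds.
broad = [Signal(cohort=f"c{i % 3}", kind="grade") for i in range(10)]

print(is_actionable(narrow))
print(is_actionable(broad))
```

Keeping the two limits as named constants makes it easy to adjust the sketch if the documented threshold changes in a later release.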

package/skill/references/examples.md ADDED

@@ -0,0 +1,48 @@
+# Examples
+## Scenario 1: First-time setup
+User says: "Set up selftune" or "Install selftune"
+Actions:
+1. Read `workflows/Initialize.md`
+2. Run `selftune init` to bootstrap config (hooks are installed automatically)
+3. Run `selftune doctor` to verify
+Result: Config at `~/.selftune/config.json`, hooks active, ready for session capture.
+## Scenario 2: Improve a skill
+User says: "Make the pptx skill catch more queries" or "Evolve the Research skill"
+Actions:
+1. `selftune eval generate --skill pptx` to find missed triggers
+2. `selftune evolve --skill pptx --skill-path <path>` to propose changes
+3. `selftune watch --skill pptx --skill-path <path>` to monitor post-deploy
+Result: Skill description updated to match real user language, with rollback available.
+## Scenario 3: Check skill health
+User says: "How are my skills doing?" or "Run selftune"
+Actions:
+1. `selftune status` for overall health summary
+2. `selftune last` for most recent session insight
+3. `selftune doctor` if issues detected
+Result: Pass rates, trend data, and actionable recommendations.
+## Scenario 4: Autonomous operation
+User says: "Set up cron jobs" or "Run selftune automatically"
+Actions:
+1. `selftune cron setup` to install OS-level scheduling
+2. Orchestrate loop runs: ingest -> grade -> evolve -> watch
+Result: Skills improve continuously without manual intervention.

package/skill/references/troubleshooting.md ADDED

@@ -0,0 +1,47 @@
+# Troubleshooting
+## CLI not found
+Error: `command not found: selftune`
+Cause: CLI not installed or not on PATH.
+Solution:
+1. Reinstall or refresh with `npx skills add selftune-dev/selftune`
+2. If you manage the CLI directly, use `npm install -g selftune` or `bun add -g selftune`
+3. Check `bin/selftune.cjs` exists if running from a source checkout
+4. Verify with `which selftune`
+5. If using bun from a source checkout: `bun link` in the repo root
+## No sessions to grade
+Error: `selftune grade` returns empty results.
+Cause: Hooks not capturing sessions, or no sessions since last ingest.
+Solution:
+1. Run `selftune doctor` to verify hook installation
+2. Run `selftune ingest claude --force` to re-ingest
+3. Run `selftune doctor` again to check database health and telemetry record counts
+## Evolution proposes no changes
+Cause: Eval set too small or skill already well-tuned.
+Solution:
+1. Run `selftune eval generate --skill <name> --max 50` for a larger eval set
+2. Check `selftune status` — if pass rate is >90%, evolution may not be needed
+3. Try `selftune evolve body` for deeper structural changes
+## Dashboard won't serve
+Error: Port already in use or blank page.
+Solution:
+1. Try a different port: `selftune dashboard --port 3142`
+2. Check if another process holds the port: `lsof -i :3141`
+3. Use `--no-open` to start the server without opening a browser
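
For the "Dashboard won't serve" steps above, a port probe in Python is one possible alternative to `lsof` on systems where that tool is unavailable. This is a sketch using only the standard library; `port_is_free` and `first_free_port` are hypothetical helper names, and the ports 3141 and 3142 are simply the ones mentioned in the troubleshooting steps. The probe checks the IPv4 loopback address only, so a process listening solely on IPv6 would not be detected.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


def first_free_port(candidates):
    """Return the first bindable port from candidates, or None if all are taken."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


# Ports taken from the troubleshooting steps; the result depends on the machine.
print(first_free_port([3141, 3142]))
```

The returned value could then be passed to `selftune dashboard --port <port>`. A port can still be claimed by another process between the probe and the dashboard start, so treat the result as a hint rather than a guarantee.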

package/skill/references/version-history.md CHANGED

@@ -32,7 +32,7 @@ agent execution.
 - Added first-class routing and quick-reference coverage for
   `selftune workflows`
-- Added a dedicated `Workflows/Workflows.md` guide for workflow discovery and
+- Added a dedicated `workflows/Workflows.md` guide for workflow discovery and
   codification
 - Updated composability guidance to reflect synergy, conflicts, and workflow
   candidates

package/skill/selftune.contribute.json ADDED

@@ -0,0 +1,11 @@
+{
+  "version": 1,
+  "creator_id": "43c960c1-9b02-4020-96f0-6fdd8f030b5a",
+  "skill_name": "selftune",
+  "contribution": {
+    "enabled": true,
+    "signals": ["trigger", "grade", "miss_category"],
+    "message": "Help improve selftune by sharing anonymous usage signals with the selftune creator.",
+    "privacy_url": "https://docs.selftune.dev/privacy/contributor-signals"
+  }
+}
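
The creator playbook in this release states that `creator_id` must be a cloud user UUID and that the supported signals are `trigger`, `grade`, and `miss_category`. A minimal sketch of checking a `selftune.contribute.json` against those two statements is shown below. `validate_contribute_config` is a hypothetical helper, not part of the selftune CLI, and the real CLI may apply different or additional checks.

```python
import json
import uuid

# Signals the creator playbook lists as supported today.
SUPPORTED_SIGNALS = {"trigger", "grade", "miss_category"}


def validate_contribute_config(raw):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    config = json.loads(raw)

    # uuid.UUID raises ValueError for strings that are not UUID-shaped.
    # It is lenient about braces and missing hyphens, so this is a loose check.
    try:
        uuid.UUID(str(config.get("creator_id", "")))
    except ValueError:
        problems.append("creator_id is not a UUID")

    signals = config.get("contribution", {}).get("signals", [])
    unknown = sorted(set(signals) - SUPPORTED_SIGNALS)
    if unknown:
        problems.append("unsupported signals: " + ", ".join(unknown))

    return problems


# The file contents exactly as added in this diff.
SHIPPED_CONFIG = """
{
  "version": 1,
  "creator_id": "43c960c1-9b02-4020-96f0-6fdd8f030b5a",
  "skill_name": "selftune",
  "contribution": {
    "enabled": true,
    "signals": ["trigger", "grade", "miss_category"],
    "message": "Help improve selftune by sharing anonymous usage signals with the selftune creator.",
    "privacy_url": "https://docs.selftune.dev/privacy/contributor-signals"
  }
}
"""

print(validate_contribute_config(SHIPPED_CONFIG))
```

Returning a list of problems instead of raising on the first failure lets a creator see every issue in one pass before bundling the file.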

package/skill/{Workflows → workflows}/Baseline.md RENAMED

@@ -26,7 +26,7 @@ selftune grade baseline --skill <name> --skill-path <path> [options]
 | `--skill <name>`      | Skill name                   | Required                 |
 | `--skill-path <path>` | Path to the skill's SKILL.md | Required                 |
 | `--eval-set <path>`   | Pre-built eval set JSON      | Auto-generated from logs |
-| `--agent <name>`      | Agent CLI to use             | Auto-detected            |
+| `--agent <name>`      | Agent CLI to use (claude, codex, opencode, pi) | Auto-detected            |
 ## Output Format
@@ -42,6 +42,9 @@ selftune grade baseline --skill <name> --skill-path <path> [options]
 }
 ```
+Every baseline run is also written into SQLite (`grading_baselines`) so the dashboard and
+`selftune status` can show whether the skill has cleared the no-skill comparison step.
 ## How It Works
 1. Loads the eval set (from `--eval-set` or auto-generated from logs)
@@ -85,6 +88,7 @@ Ask one `AskUserQuestion` at a time in this order:
    - `claude`
    - `codex`
    - `opencode`
+   - `pi`
 If `AskUserQuestion` is not available or Claude does not invoke it, fall back to presenting the same choices as inline numbered options.
@@ -134,6 +138,21 @@ Report the interpretation to the user based on the lift value.
 Add `--with-baseline` to evolve commands to prevent wasting evolution
 cycles on skills that don't add value.
+### 4. Canonical creator loop position
+Baseline is the last pre-deploy check in the default creator loop:
+```bash
+selftune eval generate --skill <name>
+selftune eval unit-test --skill <name> --generate --skill-path <path>
+selftune evolve --skill <name> --skill-path <path> --dry-run --validation-mode replay
+selftune grade baseline --skill <name> --skill-path <path>
+selftune evolve --skill <name> --skill-path <path> --with-baseline
+selftune watch --skill <name>
+```
+After that, the skill is ready for live deploy and then watch with much clearer trust evidence.
 ## Common Patterns
 **User asks whether a skill adds value (e.g., "does the Research skill help?"):**

package/skill/{Workflows → workflows}/Contribute.md RENAMED

@@ -1,11 +1,13 @@
 # selftune Contribute Workflow
-Export anonymized skill observability data as a JSON bundle for **community**
-contribution. Helps improve selftune's skill routing without exposing private data.
+Export an anonymized **export bundle** of skill observability data for
+community contribution. Helps improve selftune's skill routing without
+exposing private data.
-This is **not** the same as `selftune contributions`, which manages per-skill
-creator-directed sharing preferences, or `selftune creator-contributions`,
-which manages the creator-side bundled config file.
+This is **not** the same as:
+- `selftune contributions` — managing your **sharing preferences** for creator-directed signals
+- `selftune creator-contributions` — managing the **creator sharing setup** file (`selftune.contribute.json`)
+- The signals dashboard — viewing aggregated **contributor signal data** from all contributors
 ## When to Use
@@ -28,7 +30,9 @@ selftune contribute --skill selftune
 | `--preview`          | Show what would be shared without writing                                |
 | `--sanitize <level>` | `conservative` (default) or `aggressive`                                 |
 | `--since <date>`     | Only include data from this date onward                                  |
-| `--submit`           | Auto-create GitHub Issue via `gh` CLI                                    |
+| `--submit`           | Submit bundle to the cloud endpoint (falls back to GitHub if it fails)   |
+| `--endpoint <url>`   | Override the default cloud API endpoint                                  |
+| `--github`           | Submit via GitHub Issue instead of the cloud endpoint                     |
 ## Sanitization Levels
@@ -68,8 +72,12 @@ No raw transcripts, file contents, or identifiable information is included.
 ## Submission
-- Default: writes JSON file to `~/.selftune/contributions/`
-- `--submit`: creates a GitHub Issue with the bundle
+- Default: writes the export bundle JSON file to `~/.selftune/contributions/`
+- `--submit`: submits the export bundle to the cloud endpoint (`POST /api/v1/community/bundles`)
+  - Requires a `selftune.contribute.json` with a valid `creator_id` in the skill directory
+  - Uses the local alpha API key for authentication when available
+  - Falls back to GitHub Issue submission if the cloud endpoint is unreachable
+- `--github`: explicitly submits via GitHub Issue instead of the cloud endpoint
   - Small bundles (< 50KB): inlined in issue body
   - Large bundles (>= 50KB): uploaded as a gist
@@ -94,8 +102,13 @@ No raw transcripts, file contents, or identifiable information is included.
 **User wants to submit directly**
-> Run `selftune contribute --submit`. This creates a GitHub Issue via `gh`
-> CLI with the bundle inlined or uploaded as a gist.
+> Run `selftune contribute --submit`. This submits the export bundle to the
+> cloud endpoint. If the cloud endpoint fails, it falls back to GitHub.
+**User wants to submit via GitHub explicitly**
+> Run `selftune contribute --submit --github`. This creates a GitHub Issue
+> via `gh` CLI with the bundle inlined or uploaded as a gist.
 **User wants to limit to recent data**
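
The GitHub submission path described in the Contribute diff above inlines small bundles (< 50KB) in the issue body and uploads larger ones (>= 50KB) as a gist. The size split can be sketched as follows. The helper name `github_delivery_mode` is hypothetical, and two details are assumptions: "50KB" is read as 50 × 1024 bytes, and the size is measured on the UTF-8 encoded JSON. The actual CLI may measure differently.

```python
import json

# Assumption: "50KB" in the workflow doc is read as 50 * 1024 bytes.
INLINE_LIMIT_BYTES = 50 * 1024


def github_delivery_mode(bundle):
    """Return "inline" for bundles under the limit, otherwise "gist"."""
    size = len(json.dumps(bundle).encode("utf-8"))
    return "inline" if size < INLINE_LIMIT_BYTES else "gist"


small_bundle = {"skill": "selftune", "queries": ["example query"]}
large_bundle = {"skill": "selftune", "queries": ["x" * 100] * 1000}

print(github_delivery_mode(small_bundle))
print(github_delivery_mode(large_bundle))
```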

package/skill/{Workflows → workflows}/Contributions.md RENAMED

@@ -1,11 +1,11 @@
 # selftune Contributions Workflow
-Manage local preferences for future creator-directed contribution flows.
+Manage local **sharing preferences** for creator-directed contribution flows.
-This is **not** the same as `selftune contribute`:
-- `selftune contributions` manages per-skill opt-in choices for creator-directed sharing
-- `selftune contribute` exports a community contribution bundle
-- `selftune creator-contributions` manages the creator-side `selftune.contribute.json` file
+This is **not** the same as:
+- `selftune contribute` — exporting an anonymized **export bundle** for the community
+- `selftune creator-contributions` — managing the **creator sharing setup** file (`selftune.contribute.json`)
+- The signals dashboard — viewing aggregated **contributor signal data** from all contributors
 ## When to Use
@@ -54,6 +54,14 @@ selftune contributions upload [--dry-run] [--retry-failed] [--limit <n>]
 | `--retry-failed` | Boolean | Requeue failed rows before attempting upload |
 | `--limit <n>` | Integer | Maximum number of staged rows to upload in one run |
+## Automatic Flush via Orchestrate
+When `selftune orchestrate` runs, it automatically flushes any staged
+creator-directed relay signals as Step 10 (after alpha upload). This means
+users who have opted in don't need to run `selftune contributions upload`
+manually — orchestrate handles it. The flush is fail-open and never blocks
+the orchestrate loop. An API key is required (alpha enrolled).
 ## Notes
 - This workflow now shows which installed skills are requesting creator-directed sharing via `selftune.contribute.json`.

package/skill/workflows/CreateTestDeploy.md ADDED

@@ -0,0 +1,170 @@
+# selftune Create, Test, and Deploy Workflow
+Use this when the user wants one guided path from a new or shaky skill to a
+safe shipped skill.
+This is a composed workflow. It does not replace the atomic `Evals`,
+`UnitTest`, `Baseline`, `Evolve`, or `Watch` workflows. It decides which one
+comes next and keeps the creator trust loop in order.
+## When to Use
+- The user says "create, test, and deploy"
+- The user wants the full creator loop end to end
+- The user asks "how do I know this skill works?" before shipping
+- The user asks whether a skill is ready to deploy
+- The user wants one recommended path from cold start to live watch
+## Default Path
+There is no single `selftune create-test-deploy` command yet. Run the loop
+step by step:
+```bash
+selftune eval generate --skill <name> --skill-path <path>
+selftune eval unit-test --skill <name> --generate --skill-path <path>
+selftune evolve --skill <name> --skill-path <path> --dry-run --validation-mode replay
+selftune grade baseline --skill <name> --skill-path <path>
+selftune evolve --skill <name> --skill-path <path> --with-baseline
+selftune watch --skill <name>
+```
+## How to Run It
+### 1. Resolve the current loop position
+Start with one of these surfaces:
+```bash
+selftune status
+```
+or
+```bash
+selftune dashboard
+```
+Use the readiness summary to find which step is missing:
+- missing evals
+- missing unit tests
+- missing replay validation
+- missing baseline
+- ready to deploy
+- already deployed and under watch
+### 2. Run only the next missing step
+Do not blindly rerun the whole loop if the dashboard or status already shows a
+later step is complete.
+#### Missing evals
+Run:
+```bash
+selftune eval generate --skill <name> --skill-path <path>
+```
+If the skill is cold-start and there are no trusted triggers yet, prefer:
+```bash
+selftune eval generate --skill <name> --auto-synthetic --skill-path <path>
+```
+Then continue to `UnitTest`.
+#### Missing unit tests
+Run:
+```bash
+selftune eval unit-test --skill <name> --generate --skill-path <path>
+```
+Then continue to replay dry-run validation.
+#### Missing replay validation
+Run:
+```bash
+selftune evolve --skill <name> --skill-path <path> --dry-run --validation-mode replay
+```
+This is the pre-deploy proof step. It validates against runtime-style routing
+without mutating the skill.
+Then continue to baseline.
+#### Missing baseline
+Run:
+```bash
+selftune grade baseline --skill <name> --skill-path <path>
+```
+Then continue to live deploy.
+#### Ready to deploy
+Run:
+```bash
+selftune evolve --skill <name> --skill-path <path> --with-baseline
+```
+This is the recommended creator ship command because it deploys only after the
+candidate clears the earlier trust gates.
+Then continue to watch.
+#### Already deployed and under watch
+Run:
+```bash
+selftune watch --skill <name>
+```
+Use this state to explain whether the skill is stable, regressing, or ready for
+another iteration.
+## Which workflow to read next
+Load the atomic workflow that matches the next missing step:
+- eval generation -> `workflows/Evals.md`
+- unit tests -> `workflows/UnitTest.md`
+- replay dry-run / deploy -> `workflows/Evolve.md`
+- baseline -> `workflows/Baseline.md`
+- live monitoring -> `workflows/Watch.md`
+Use `references/creator-playbook.md` when the user is publishing a skill other
+people will install and needs before-ship versus after-ship guidance.
+## Common Patterns
+**User asks for one end-to-end shipping path**
+> Use this workflow. Check the current readiness surface first, then run the
+> next missing creator-loop step instead of dumping every command at once.
+**User asks whether a skill is safe to ship**
+> Use `selftune status` or the dashboard to confirm evals, unit tests, replay
+> validation, and baseline exist. If all four are complete, run `selftune
+> evolve --with-baseline`. Otherwise run the missing step first.
+**User already shipped the skill**
+> Do not send them back to eval generation unless the evidence is stale or
+> missing. Route to `Watch` and explain whether the skill is stable.
+**User wants to understand why the loop is ordered this way**
+> Explain the progression:
+> router coverage -> workflow correctness -> runtime proof -> no-skill value ->
+> live deploy -> watch.
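
The "run only the next missing step" rule in the CreateTestDeploy workflow above can be sketched as a small lookup over the ordered loop. The command strings are copied from the workflow. The gate names (`evals`, `unit_tests`, and so on) and the `completed` set are hypothetical stand-ins for whatever `selftune status` or the dashboard actually reports, which this diff does not show.

```python
# Ordered creator-loop gates, each paired with the command the workflow lists for it.
# Gate names are illustrative labels, not identifiers from selftune itself.
CREATOR_LOOP = [
    ("evals", "selftune eval generate --skill {skill} --skill-path {path}"),
    ("unit_tests", "selftune eval unit-test --skill {skill} --generate --skill-path {path}"),
    ("replay_validation", "selftune evolve --skill {skill} --skill-path {path} --dry-run --validation-mode replay"),
    ("baseline", "selftune grade baseline --skill {skill} --skill-path {path}"),
    ("deployed", "selftune evolve --skill {skill} --skill-path {path} --with-baseline"),
]
WATCH_COMMAND = "selftune watch --skill {skill}"


def next_command(completed: set[str], skill: str, path: str) -> str:
    """Return the command for the first gate not yet completed, else the watch command."""
    for gate, template in CREATOR_LOOP:
        if gate not in completed:
            return template.format(skill=skill, path=path)
    return WATCH_COMMAND.format(skill=skill)


# A skill with evals and unit tests in place, but no replay validation yet.
print(next_command({"evals", "unit_tests"}, "my-skill", "path/to/SKILL.md"))
```

Walking the list in order mirrors the progression the workflow describes: router coverage, workflow correctness, runtime proof, no-skill value, live deploy, then watch.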