selftune 0.2.18 → 0.2.20
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +9 -4
- package/apps/local-dashboard/dist/assets/index-D8O-RG1I.js +60 -0
- package/apps/local-dashboard/dist/assets/index-_EcLywDg.css +1 -0
- package/apps/local-dashboard/dist/assets/vendor-table-BIiI3YhS.js +1 -0
- package/apps/local-dashboard/dist/assets/vendor-ui-CGEmUayx.js +12 -0
- package/apps/local-dashboard/dist/index.html +5 -5
- package/cli/selftune/alpha-upload/stage-canonical.ts +7 -6
- package/cli/selftune/constants.ts +10 -0
- package/cli/selftune/contribute/contribute.ts +30 -2
- package/cli/selftune/contribution-config.ts +249 -0
- package/cli/selftune/contribution-relay.ts +177 -0
- package/cli/selftune/contribution-signals.ts +219 -0
- package/cli/selftune/contribution-staging.ts +147 -0
- package/cli/selftune/contributions.ts +532 -0
- package/cli/selftune/creator-contributions.ts +333 -0
- package/cli/selftune/dashboard-contract.ts +209 -1
- package/cli/selftune/dashboard-server.ts +45 -11
- package/cli/selftune/eval/family-overlap.ts +714 -0
- package/cli/selftune/eval/hooks-to-evals.ts +182 -28
- package/cli/selftune/eval/synthetic-evals.ts +298 -11
- package/cli/selftune/evolution/evidence.ts +5 -0
- package/cli/selftune/evolution/evolve-body.ts +62 -2
- package/cli/selftune/evolution/evolve.ts +58 -1
- package/cli/selftune/evolution/validate-body.ts +10 -0
- package/cli/selftune/evolution/validate-host-replay.ts +236 -0
- package/cli/selftune/evolution/validate-proposal.ts +10 -0
- package/cli/selftune/evolution/validate-routing.ts +112 -5
- package/cli/selftune/export.ts +2 -2
- package/cli/selftune/index.ts +41 -5
- package/cli/selftune/ingestors/codex-rollout.ts +31 -35
- package/cli/selftune/ingestors/codex-wrapper.ts +32 -24
- package/cli/selftune/localdb/db.ts +2 -2
- package/cli/selftune/localdb/direct-write.ts +8 -3
- package/cli/selftune/localdb/materialize.ts +7 -2
- package/cli/selftune/localdb/queries.ts +712 -31
- package/cli/selftune/localdb/schema.ts +30 -1
- package/cli/selftune/recover.ts +153 -0
- package/cli/selftune/repair/skill-usage.ts +363 -4
- package/cli/selftune/routes/actions.ts +35 -1
- package/cli/selftune/routes/analytics.ts +14 -0
- package/cli/selftune/routes/index.ts +1 -0
- package/cli/selftune/routes/overview.ts +112 -4
- package/cli/selftune/routes/skill-report.ts +575 -11
- package/cli/selftune/status.ts +81 -2
- package/cli/selftune/sync.ts +56 -2
- package/cli/selftune/trust-model.ts +66 -0
- package/cli/selftune/types.ts +103 -0
- package/cli/selftune/utils/skill-detection.ts +43 -0
- package/cli/selftune/utils/text-similarity.ts +73 -0
- package/cli/selftune/watchlist.ts +65 -0
- package/package.json +1 -1
- package/packages/ui/src/components/ActivityTimeline.tsx +165 -150
- package/packages/ui/src/components/EvidenceViewer.tsx +419 -145
- package/packages/ui/src/components/EvolutionTimeline.tsx +81 -29
- package/packages/ui/src/components/OrchestrateRunsPanel.tsx +33 -16
- package/packages/ui/src/components/RecentActivityFeed.tsx +72 -41
- package/packages/ui/src/components/section-cards.tsx +12 -9
- package/packages/ui/src/primitives/card.tsx +1 -1
- package/packages/ui/src/types.ts +4 -0
- package/skill/SKILL.md +11 -1
- package/skill/Workflows/AlphaUpload.md +4 -0
- package/skill/Workflows/Composability.md +78 -0
- package/skill/Workflows/Contribute.md +6 -3
- package/skill/Workflows/Contributions.md +97 -0
- package/skill/Workflows/CreatorContributions.md +74 -0
- package/skill/Workflows/Dashboard.md +31 -0
- package/skill/Workflows/Evals.md +57 -8
- package/skill/Workflows/Evolve.md +23 -0
- package/skill/Workflows/Ingest.md +7 -0
- package/skill/Workflows/Initialize.md +20 -1
- package/skill/Workflows/Recover.md +84 -0
- package/skill/Workflows/RepairSkillUsage.md +12 -4
- package/skill/Workflows/Sync.md +18 -12
- package/apps/local-dashboard/dist/assets/index-BMIS6uUh.css +0 -2
- package/apps/local-dashboard/dist/assets/index-DOu3iLD9.js +0 -16
- package/apps/local-dashboard/dist/assets/vendor-table-pHbDxq36.js +0 -8
- package/apps/local-dashboard/dist/assets/vendor-ui-DIwlrGlb.js +0 -12

package/skill/Workflows/Composability.md
CHANGED

````diff
@@ -4,12 +4,27 @@ Analyze how skills interact when triggered together in the same session.
 Detects conflict candidates — skill pairs that produce more errors when
 co-occurring than when used alone.

+Use the same workflow when the user is asking whether a sibling skill family
+should stay split apart or be consolidated under one parent skill.
+
 ## Default Command

 ```bash
 selftune eval composability --skill <name> [options]
 ```

+## Family Overlap Command
+
+```bash
+selftune eval family-overlap --prefix <family-> [options]
+```
+
+Or analyze an explicit set of siblings:
+
+```bash
+selftune eval family-overlap --skills <skill-a,skill-b,skill-c> [options]
+```
+
 ## Options

 | Flag | Description | Default |
````
````diff
@@ -18,6 +33,16 @@ selftune eval composability --skill <name> [options]
 | `--window <n>` | Only analyze sessions from last N days | All sessions |
 | `--telemetry-log <path>` | Path to telemetry log | `~/.claude/session_telemetry_log.jsonl` |

+### Family Overlap Options
+
+| Flag | Description | Default |
+| ----------------------- | ------------------------------------------------------------------ | ------- |
+| `--prefix <family->` | Analyze all installed/observed sibling skills with this prefix | Required unless `--skills` |
+| `--skills <a,b,c>` | Analyze a specific skill family | Required unless `--prefix` |
+| `--parent-skill <name>` | Override the suggested consolidated parent skill name | Derived from prefix |
+| `--min-overlap <pct>` | Minimum positive-query overlap to flag consolidation pressure | `0.3` |
+| `--min-shared <n>` | Minimum shared positive queries to flag a sibling pair | `2` |
+
 ## Output Format

 ```json
````
````diff
@@ -60,6 +85,38 @@ The analyzer is a pure function that computes conflict scores from telemetry:
 3. Pairs with `conflict_score > 0.3` are flagged as conflict candidates
 4. Results sorted by co-occurrence count (most common first)

+## How Family Overlap Works
+
+The family-overlap analyzer answers a different question:
+
+1. Build a trusted positive query set for each sibling skill
+2. Compare every pair of siblings using exact-query overlap
+3. Flag pairs whose overlap crosses the configured threshold
+4. If overlap is persistent across the family, emit:
+   - consolidation recommendation
+   - draft parent skill name
+   - internal workflow mapping
+   - compatibility alias / migration notes
+
+This is for packaging questions like:
+
+- "Should `sc-search`, `sc-model`, and `sc-compare` really be one parent skill?"
+- "Are my sibling skills competing for the same user intent?"
+- "Should I stop evolving these independently and redesign the family?"
+
+When trusted telemetry is sparse, the same command also emits a
+`cold_start_suspicion` block. That is a weaker, earlier signal based on the
+installed skill surfaces:
+
+1. Frontmatter / top-level description similarity
+2. Overlap in `## When to Use` language
+3. Shared command surface (for example, siblings that both wrap `mentor search`)
+4. Synthetic sibling-confusion probes derived from those overlapping surfaces
+
+Treat `cold_start_suspicion.candidate` as architecture suspicion, not proof.
+It is meant to tell you "this family may want a parent skill" before enough
+real usage exists to confirm it through trusted positive-query overlap.
+
 ## Steps

 ### 1. Run Analysis
````
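The pairwise part of the family-overlap analyzer (build trusted query sets, compare every sibling pair by exact-query overlap, flag pairs past a threshold) can be sketched in TypeScript as below. This is a hypothetical illustration, not the code in `family-overlap.ts`: the ratio used here (shared queries divided by the smaller sibling's query count) and the names `SiblingQueries` and `flagOverlappingPairs` are assumptions. Only the two defaults mirror the documented `--min-overlap` (`0.3`) and `--min-shared` (`2`) flags.

```typescript
// Hypothetical sketch of pairwise sibling overlap; not selftune's implementation.
interface SiblingQueries {
  skill: string;
  queries: string[]; // trusted positive queries, assumed already normalized
}

interface OverlapPair {
  a: string;
  b: string;
  shared: number; // identical queries seen for both siblings
  overlap: number; // shared / size of the smaller set (assumed metric)
  flagged: boolean;
}

function flagOverlappingPairs(
  siblings: SiblingQueries[],
  minOverlap = 0.3, // mirrors the documented --min-overlap default
  minShared = 2, // mirrors the documented --min-shared default
): OverlapPair[] {
  const pairs: OverlapPair[] = [];
  for (let i = 0; i < siblings.length; i++) {
    for (let j = i + 1; j < siblings.length; j++) {
      const setA = new Set(siblings[i].queries);
      const setB = new Set(siblings[j].queries);
      // Exact-query overlap: count strings present verbatim in both sets.
      let shared = 0;
      setA.forEach((q) => {
        if (setB.has(q)) shared++;
      });
      const smaller = Math.min(setA.size, setB.size);
      const overlap = smaller === 0 ? 0 : shared / smaller;
      pairs.push({
        a: siblings[i].skill,
        b: siblings[j].skill,
        shared,
        overlap,
        flagged: shared >= minShared && overlap >= minOverlap,
      });
    }
  }
  return pairs;
}

// Example: sc-search and sc-model share two identical queries, so that pair is flagged.
const examplePairs = flagOverlappingPairs([
  { skill: "sc-search", queries: ["find a model", "compare models", "search docs"] },
  { skill: "sc-model", queries: ["find a model", "compare models"] },
  { skill: "sc-compare", queries: ["diff two files"] },
]);
console.log(examplePairs.filter((p) => p.flagged));
```

Requiring both a minimum count and a minimum ratio keeps a single coincidental shared query from flagging two small siblings.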
````diff
@@ -86,6 +143,19 @@ When conflict candidates are identified, present them to the user with recommend
 - Consider evolving descriptions to reduce false triggers
 - Use the `pattern-analyst` agent for deeper cross-skill analysis

+### 4. Investigate Family Consolidation
+
+```bash
+selftune eval family-overlap --prefix sc-
+```
+
+Interpretation:
+
+- `consolidation_candidate: false` means keep improving the sibling descriptions/workflows separately
+- `consolidation_candidate: true` means the problem is likely packaging, not just wording
+- `cold_start_suspicion.candidate: true` means installed skill surfaces already look suspicious even though trusted telemetry is still sparse
+- `refactor_proposal` is a draft for human review only; do not auto-deploy a family rewrite
+
 ## Subagent Escalation

 For deep cross-skill analysis beyond what the composability command provides,
````
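The interpretation rules in step 4 can be encoded as a small helper, for example when an agent summarizes `selftune eval family-overlap` output. This is a sketch: the field names come from the workflow text, but the `FamilyOverlapReport` shape (top-level `consolidation_candidate`, nested `cold_start_suspicion.candidate`, opaque `refactor_proposal`) and the helper name `interpretFamilyOverlap` are assumptions about the JSON layout.

```typescript
// Assumed report shape; only the field names come from the workflow docs.
interface FamilyOverlapReport {
  consolidation_candidate: boolean;
  cold_start_suspicion?: { candidate: boolean };
  refactor_proposal?: unknown;
}

function interpretFamilyOverlap(report: FamilyOverlapReport): string[] {
  const advice: string[] = [];
  if (report.consolidation_candidate) {
    advice.push("Likely a packaging problem, not just wording: review the family structure.");
  } else {
    advice.push("Keep improving the sibling descriptions/workflows separately.");
  }
  if (report.cold_start_suspicion?.candidate) {
    advice.push("Installed skill surfaces already look suspicious; trusted telemetry is still sparse.");
  }
  if (report.refactor_proposal != null) {
    // Surface the proposal, but never act on it automatically.
    advice.push("refactor_proposal is a draft for human review only; do not auto-deploy it.");
  }
  return advice;
}

console.log(interpretFamilyOverlap({ consolidation_candidate: false, cold_start_suspicion: { candidate: true } }));
```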
````diff
@@ -110,3 +180,11 @@ resolution plan with trigger ownership recommendations.
 **"Why are sessions with multiple skills failing?"**

 > Run composability for each skill involved, look for high conflict scores.
+
+**"Are my State Change skills too fragmented?"**
+
+> `selftune eval family-overlap --prefix sc-`
+
+**"Should I consolidate this sibling skill family?"**
+
+> Run `selftune eval family-overlap` and look for `consolidation_candidate` when you have live evidence, or `cold_start_suspicion` when you only have installed skill surfaces plus cold-start evals.
````

package/skill/Workflows/Contribute.md
CHANGED

````diff
@@ -1,8 +1,11 @@
 # selftune Contribute Workflow

-Export anonymized skill observability data as a JSON bundle for community
-contribution. Helps improve selftune's skill routing without exposing
-
+Export anonymized skill observability data as a JSON bundle for **community**
+contribution. Helps improve selftune's skill routing without exposing private data.
+
+This is **not** the same as `selftune contributions`, which manages per-skill
+creator-directed sharing preferences, or `selftune creator-contributions`,
+which manages the creator-side bundled config file.

 ## When to Use

````

package/skill/Workflows/Contributions.md

````diff
@@ -0,0 +1,97 @@
+# selftune Contributions Workflow
+
+Manage local preferences for future creator-directed contribution flows.
+
+This is **not** the same as `selftune contribute`:
+- `selftune contributions` manages per-skill opt-in choices for creator-directed sharing
+- `selftune contribute` exports a community contribution bundle
+- `selftune creator-contributions` manages the creator-side `selftune.contribute.json` file
+
+## When to Use
+
+- The user asks to approve or revoke sharing signals with a specific skill creator
+- The user wants to see which creator-directed contribution preferences are stored locally
+- The user wants to set a default behavior for future creator-directed contribution prompts
+
+## Default Commands
+
+```bash
+selftune contributions
+selftune contributions preview <skill>
+selftune contributions approve <skill>
+selftune contributions revoke <skill>
+selftune contributions default <ask|always|never>
+selftune contributions upload [--dry-run] [--retry-failed] [--limit <n>]
+```
+
+## What It Does Today
+
+- Discovers installed skills that ship a `selftune.contribute.json` config
+- Stores local opt-in / opt-out state in `~/.selftune/contribution-preferences.json`
+- Stages privacy-safe creator-directed relay signals locally during `selftune sync` once a skill is approved
+- Keeps creator-directed sharing preferences separate from:
+  - `selftune contribute` community export bundles
+  - `selftune alpha upload` personal cloud uploads
+
+## Commands
+
+| Command | Description |
+| --- | --- |
+| `selftune contributions` | Show current creator-directed contribution preferences |
+| `selftune contributions status` | Same as above |
+| `selftune contributions preview <skill>` | Show the privacy-safe relay payload shape for one skill |
+| `selftune contributions approve <skill>` | Approve creator-directed sharing for one skill |
+| `selftune contributions revoke <skill>` | Revoke creator-directed sharing for one skill |
+| `selftune contributions default <ask|always|never>` | Set the default behavior for future creator-directed prompts |
+| `selftune contributions upload [--dry-run] [--retry-failed] [--limit <n>]` | Flush locally staged creator-directed relay signals |
+| `selftune contributions reset` | Reset all creator-directed sharing preferences to defaults |
+
+## Upload Flags
+
+| Flag | Type | Description |
+| --- | --- | --- |
+| `--dry-run` | Boolean | Show pending staged rows without uploading |
+| `--retry-failed` | Boolean | Requeue failed rows before attempting upload |
+| `--limit <n>` | Integer | Maximum number of staged rows to upload in one run |
+
+## Notes
+
+- This workflow now shows which installed skills are requesting creator-directed sharing via `selftune.contribute.json`.
+- Once approved, creator-directed contribution signals are staged locally during `selftune sync` / `selftune orchestrate`.
+- Use `selftune contributions upload` to flush staged rows to the creator-directed relay endpoint.
+- Relay upload is separate from `selftune alpha upload` and currently reuses the local cloud API key when available.
+- Use `selftune contribute` when the user explicitly wants to export/share an anonymized community bundle.
+- Use `selftune alpha upload` when the user wants to push their own cloud telemetry.
+
+## Common Patterns
+
+**User asks what creator-directed sharing is configured**
+
+> Run `selftune contributions` and summarize the global default plus any per-skill choices.
+
+**User wants to allow contribution signals for one skill**
+
+> Run `selftune contributions approve <skill>`.
+
+**User wants to see what would actually be shared**
+
+> Run `selftune contributions preview <skill>` and summarize the requested signals plus the “never shared” guarantees.
+
+**User wants to turn off creator-directed sharing for one skill**
+
+> Run `selftune contributions revoke <skill>`.
+
+**User wants future creator-directed prompts to default one way**
+
+> Run `selftune contributions default <ask|always|never>` using the user's preference.
+
+**User wants to send staged creator-directed signals now**
+
+> Run `selftune contributions upload`.
+> Use `--dry-run` first if they want to confirm how many staged rows are pending.
+> Use `--retry-failed` if earlier relay attempts failed and need to be retried.
+> Use `--limit 25` when they want a smaller controlled batch.
+
+**User wants to clear all stored creator-directed contribution preferences**
+
+> Run `selftune contributions reset`.
````
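One way to read the Contributions preference model: an explicit per-skill `approve` or `revoke` decides whether signals are staged during `selftune sync`, and the global `default` only matters for skills with no stored choice. The TypeScript sketch below encodes that reading. The in-memory `ContributionPreferences` shape and the `resolveContribution` name are hypothetical (the actual contents of `~/.selftune/contribution-preferences.json` are not shown in this diff), and the precedence of per-skill choices over the default is an assumption.

```typescript
type DefaultPolicy = "ask" | "always" | "never";
type SkillChoice = "approved" | "revoked";
type Decision = "stage" | "skip" | "prompt";

// Hypothetical shape; not the documented on-disk format.
interface ContributionPreferences {
  default: DefaultPolicy;
  skills: Record<string, SkillChoice | undefined>;
}

function resolveContribution(prefs: ContributionPreferences, skill: string): Decision {
  const choice = prefs.skills[skill];
  // Assumed precedence: an explicit per-skill choice beats the global default.
  if (choice === "approved") return "stage";
  if (choice === "revoked") return "skip";
  switch (prefs.default) {
    case "always":
      return "stage";
    case "never":
      return "skip";
    default:
      return "prompt"; // "ask": prompt the user before staging anything
  }
}

const prefs: ContributionPreferences = { default: "ask", skills: { "sc-search": "approved" } };
console.log(resolveContribution(prefs, "sc-search")); // stage
console.log(resolveContribution(prefs, "sc-model")); // prompt
```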

package/skill/Workflows/CreatorContributions.md

````diff
@@ -0,0 +1,74 @@
+# selftune Creator-Contributions Workflow
+
+Manage the creator-side `selftune.contribute.json` file bundled with a skill.
+
+This is **not** the same as:
+- `selftune contributions` — end-user opt-in / opt-out preferences
+- `selftune contribute` — community export bundle
+
+## When to Use
+
+- The user is a skill creator and wants to enable creator-directed contribution for one skill
+- The user wants to inspect or remove a bundled `selftune.contribute.json`
+- The user wants to prepare a skill package for the future creator ← user relay pipeline
+
+## Default Commands
+
+```bash
+selftune creator-contributions
+selftune creator-contributions status --skill <name>
+selftune creator-contributions enable --skill <name> [--skill-path <path>] [--creator-id <id>]
+selftune creator-contributions enable --all [--prefix sc-] [--creator-id <id>]
+selftune creator-contributions disable --skill <name> [--skill-path <path>]
+```
+
+## Options
+
+| Flag | Description |
+| --- | --- |
+| `--skill <name>` | Skill name to inspect or configure |
+| `--skill-path <path>` | Explicit path to the skill's `SKILL.md` when auto-discovery is ambiguous |
+| `--creator-id <id>` | Explicit creator ID. If omitted, selftune uses `alpha.cloud_user_id` from local config when available |
+| `--signals <csv>` | Comma-separated signal list for the generated config |
+| `--message <text>` | Custom opt-in note stored in the config |
+| `--privacy-url <url>` | Optional creator privacy URL stored in the config |
+| `--all` | Enable configs for every installed skill selftune can resolve |
+| `--prefix <prefix>` | Limit `--all` to installed skills whose names start with this prefix |
+
+## What It Does Today
+
+- Discovers installed skills that already ship `selftune.contribute.json`
+- Creates or removes that config file locally for a creator-owned skill
+- Can bulk-enable configs for multiple installed skills (useful for a skill suite like `sc-*`)
+- Uses a static JSON config only — no executable creator code
+
+## Notes
+
+- This is local packaging/setup only. It does **not** upload creator-directed signals yet.
+- The creator ID is currently sourced from `--creator-id` or the local alpha identity's `cloud_user_id`.
+- Use this workflow when the user is preparing a skill package.
+
+## Common Patterns
+
+**User wants to see which of their skills already request creator contributions**
+
+> Run `selftune creator-contributions` and summarize the discovered configs.
+> Example: `selftune creator-contributions status --skill sc-search`
+
+**User wants to enable creator contributions for one skill**
+
+> Run `selftune creator-contributions enable --skill <name>`.
+> If auto-discovery fails, rerun with `--skill-path /path/to/SKILL.md`.
+> If no creator identity is available locally, rerun with `--creator-id <id>`.
+> Example: `selftune creator-contributions enable --skill sc-search --skill-path ./skills/sc-search/SKILL.md --creator-id cr_state_change --signals trigger,grade,miss_category --message "Share privacy-safe usage signals with the skill creator." --privacy-url https://statechange.ai/privacy`
+
+**User wants to enable creator contributions for a whole installed skill suite**
+
+> Run `selftune creator-contributions enable --all --prefix sc-`.
+> This is the fastest path when preparing a whole family of skills like State Change skills.
+> Example: `selftune creator-contributions enable --all --prefix sc- --creator-id cr_state_change`
+
+**User wants to stop bundling creator contribution config**
+
+> Run `selftune creator-contributions disable --skill <name>`.
+> Example: `selftune creator-contributions disable --skill sc-search --skill-path ./skills/sc-search/SKILL.md`
````
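The `creator-contributions enable` options map onto a generated config roughly as follows. This TypeScript sketch only illustrates the documented rules: an explicit `--creator-id` wins, otherwise `alpha.cloud_user_id` from local config is used; `--signals` is a comma-separated list; `--prefix` narrows `--all`. The output field names (`creator_id`, `signals`, `message`, `privacy_url`) and the helper names are hypothetical and are not the real `selftune.contribute.json` schema. The empty signal list when `--signals` is omitted is a placeholder, since the actual default is not described here.

```typescript
// Hypothetical: these output field names are illustrative, not the real
// selftune.contribute.json schema.
interface EnableOptions {
  creatorId?: string; // --creator-id
  signals?: string; // --signals (comma-separated)
  message?: string; // --message
  privacyUrl?: string; // --privacy-url
}

interface LocalConfig {
  alpha?: { cloud_user_id?: string };
}

interface CreatorConfigDraft {
  creator_id: string;
  signals: string[];
  message?: string;
  privacy_url?: string;
}

function buildCreatorConfig(opts: EnableOptions, local: LocalConfig): CreatorConfigDraft {
  // Documented fallback: --creator-id first, then alpha.cloud_user_id.
  const creatorId = opts.creatorId ?? local.alpha?.cloud_user_id;
  if (!creatorId) {
    throw new Error("No creator identity available locally; rerun with --creator-id <id>");
  }
  const signals = (opts.signals ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0); // empty list when --signals is omitted (placeholder)
  const draft: CreatorConfigDraft = { creator_id: creatorId, signals };
  if (opts.message) draft.message = opts.message;
  if (opts.privacyUrl) draft.privacy_url = opts.privacyUrl;
  return draft;
}

// --all --prefix <prefix>: keep only installed skills whose names start with the prefix.
function selectSkillsForBulkEnable(installed: string[], prefix?: string): string[] {
  return prefix ? installed.filter((name) => name.startsWith(prefix)) : installed.slice();
}

console.log(buildCreatorConfig({ creatorId: "cr_state_change", signals: "trigger,grade,miss_category" }, {}));
console.log(selectSkillsForBulkEnable(["sc-search", "pptx", "sc-model"], "sc-"));
```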

package/skill/Workflows/Dashboard.md
CHANGED

````diff
@@ -49,6 +49,7 @@ override.
 | `POST` | `/api/actions/watch` | Trigger `selftune watch` for a skill |
 | `POST` | `/api/actions/evolve` | Trigger `selftune evolve` for a skill |
 | `POST` | `/api/actions/rollback` | Trigger `selftune evolve rollback` for a skill |
+| `POST` | `/api/actions/watchlist` | Persist creator watchlist preferences |

 ### Live Updates (SSE)

````
````diff
@@ -98,6 +99,36 @@ All action endpoints return:

 On failure, `success` is `false` and `error` contains the error message.

+**Watchlist** request body:
+
+```json
+{
+  "skills": ["pptx", "sc-search"]
+}
+```
+
+`skills` must be an array of skill names. The action replaces the full persisted
+watchlist for the local dashboard.
+
+Watchlist success response:
+
+```json
+{
+  "success": true,
+  "watched_skills": ["pptx", "sc-search"],
+  "error": null
+}
+```
+
+Watchlist failure response:
+
+```json
+{
+  "success": false,
+  "error": "Missing required field: skills[]"
+}
+```
+
 ### Browser and Shutdown

 The live server auto-opens the dashboard URL in the default browser on
````
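To make the watchlist contract concrete, the TypeScript sketch below checks a request body and returns objects in the documented success and failure shapes. It is not the dashboard server's handler: the name `handleWatchlistBody` is hypothetical, and rejecting arrays that contain non-string entries with the same error message is an assumption. It does reflect the documented replace semantics: the returned list becomes the full watchlist rather than being merged with the previous one.

```typescript
interface WatchlistResponse {
  success: boolean;
  watched_skills?: string[];
  error: string | null;
}

// Hypothetical validator mirroring the documented request/response shapes.
function handleWatchlistBody(body: unknown): WatchlistResponse {
  const skills = (body as { skills?: unknown } | null | undefined)?.skills;
  if (!Array.isArray(skills) || !skills.every((s) => typeof s === "string")) {
    return { success: false, error: "Missing required field: skills[]" };
  }
  // Replace semantics: this list becomes the entire persisted watchlist.
  return { success: true, watched_skills: skills.slice(), error: null };
}

console.log(handleWatchlistBody({ skills: ["pptx", "sc-search"] }));
console.log(handleWatchlistBody({}));
```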
package/skill/Workflows/Evals.md
CHANGED

````diff
@@ -25,7 +25,7 @@ selftune eval generate --skill <name> [options]
 | Flag | Description | Default |
 | ---------------------------------- | ----------------------------------------------------- | --------------------------------- |
 | `--skill <name>` | Skill to generate evals for | Required (unless `--list-skills`) |
-| `--list-skills` | List
+| `--list-skills` | List skills with trusted-vs-raw readiness counts | Off |
 | `--stats` | Show aggregate telemetry stats for the skill | Off |
 | `--max <n>` | Maximum eval entries per side | 50 |
 | `--seed <n>` | Seed for deterministic shuffling | 42 |
````
````diff
@@ -36,6 +36,7 @@ selftune eval generate --skill <name> [options]
 | `--query-log <path>` | Path to all_queries_log.jsonl | Default log path |
 | `--telemetry-log <path>` | Path to session_telemetry_log.jsonl | Default log path |
 | `--synthetic` | Generate evals from SKILL.md via LLM (no logs needed) | Off |
+| `--auto-synthetic` | Fall back to SKILL.md-based cold-start evals when no trusted triggers exist | Off |
 | `--skill-path <path>` | Path to SKILL.md (required with `--synthetic`) | — |
 | `--model <model>` | LLM model to use for synthetic generation | Agent default |

````
````diff
@@ -65,8 +66,22 @@ and optional `invocation_type` (omitted when `--no-taxonomy` is set).
 ```json
 {
   "skills": [
-    {
-
+    {
+      "name": "pptx",
+      "trusted_trigger_count": 42,
+      "raw_trigger_count": 42,
+      "trusted_session_count": 15,
+      "raw_session_count": 15,
+      "readiness": "log-ready"
+    },
+    {
+      "name": "sc-search",
+      "trusted_trigger_count": 0,
+      "raw_trigger_count": 1,
+      "trusted_session_count": 0,
+      "raw_session_count": 1,
+      "readiness": "cold-start"
+    }
   ]
 }
 ```
````
````diff
@@ -115,7 +130,11 @@ Discover which skills have telemetry data and how many queries each has.
 selftune eval generate --list-skills
 ```

-Run this first to identify which skills have enough data for eval generation.
+Run this first to identify which skills have enough trusted data for eval generation.
+Installed skills with no trusted trigger history now appear as `cold-start`, which means the
+skill is installed locally and ready for `--auto-synthetic` / `--synthetic` eval generation.
+If raw trigger history exists but trusted positives do not, the list now shows both counts so the
+creator can see that telemetry exists without being misled into thinking the skill is fully ready.

 ### Generate Synthetic Evals (Cold Start)

````
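The `readiness` labels in the `--list-skills` output can be approximated as below. This is a sketch under stated assumptions: treating any trusted trigger as `log-ready`, and the `installed` flag standing in for "selftune resolved a local `SKILL.md`", are guesses, as is the name `classifyReadiness`. The point it illustrates comes from the docs: raw trigger history alone does not make a skill `log-ready`. Cases the workflow text does not cover return `null` rather than an invented label.

```typescript
type Readiness = "log-ready" | "cold-start";

interface SkillCounts {
  name: string;
  trusted_trigger_count: number;
  raw_trigger_count: number;
  installed: boolean; // hypothetical: a local SKILL.md was resolved
}

function classifyReadiness(s: SkillCounts): Readiness | null {
  if (s.trusted_trigger_count > 0) return "log-ready"; // assumed threshold
  // Raw triggers without trusted positives do not count as ready.
  if (s.installed) return "cold-start";
  return null; // not covered by the documented labels
}

// The two entries from the example JSON, assuming both skills are installed.
console.log(classifyReadiness({ name: "pptx", trusted_trigger_count: 42, raw_trigger_count: 42, installed: true })); // log-ready
console.log(classifyReadiness({ name: "sc-search", trusted_trigger_count: 0, raw_trigger_count: 1, installed: true })); // cold-start
```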
````diff
@@ -126,20 +145,36 @@ queries directly from the SKILL.md content via an LLM.
 selftune eval generate --skill pptx --synthetic --skill-path /path/to/skills/pptx/SKILL.md
 ```

+If the skill is installed locally but has no trusted trigger history yet, use the faster creator
+onboarding path:
+
+```bash
+selftune eval generate --skill pptx --auto-synthetic --skill-path /path/to/skills/pptx/SKILL.md
+```
+
+`--auto-synthetic` keeps the normal log-based path when real trigger data exists, but falls back
+to synthetic cold-start generation when it does not.
+
 The command:

 1. Reads the SKILL.md file content
 2. Loads real user queries from the database (if available) as few-shot style examples so synthetic queries match real phrasing patterns
-3.
-4.
-5.
-6.
+3. Detects nearby installed sibling skills to generate harder negative controls
+4. Over-generates a candidate pool with a balanced prompt family mix (explicit / implicit / contextual positives plus sibling-confusion / adjacent / unrelated negatives)
+5. Runs a second critique/prune pass to remove weak paraphrases, overlaps, and blurry boundary cases
+6. Parses the response into eval entries with invocation type annotations
+7. Classifies each positive query using the deterministic `classifyInvocation()` heuristic
+8. Writes the eval set to the output file

 **Note:** When real query data exists in the database, synthetic generation
 automatically includes high-confidence positive triggers and general queries as
 phrasing references. This produces more natural-sounding eval queries. If no
 database is available, generation proceeds without real examples (fail-open).

+The synthetic cold-start path is intentionally small and targeted. It is meant to bootstrap a
+creator skill into its first supervised evolution cycle, not serve as the long-term source of
+truth once real telemetry exists.
+
 Use `--model` to override the default LLM model:

 ```bash
````
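The `--auto-synthetic` fallback amounts to a small routing decision. The TypeScript sketch below is an illustration, not selftune's code: `resolvedSkillPath` stands for a `SKILL.md` path obtained from `--skill-path` or local discovery, `chooseEvalSource` is a hypothetical name, and raising an error when synthetic generation has no path to read is an assumption based on `--skill-path` being required with `--synthetic`.

```typescript
type EvalSource = "log" | "synthetic";

interface GenerateFlags {
  synthetic: boolean; // --synthetic
  autoSynthetic: boolean; // --auto-synthetic
  resolvedSkillPath?: string; // from --skill-path or local discovery (assumed)
}

function chooseEvalSource(flags: GenerateFlags, trustedTriggerCount: number): EvalSource {
  // --auto-synthetic keeps the log-based path whenever trusted triggers exist.
  const useSynthetic = flags.synthetic || (flags.autoSynthetic && trustedTriggerCount === 0);
  if (useSynthetic && !flags.resolvedSkillPath) {
    throw new Error("Synthetic generation needs a SKILL.md path (--skill-path)");
  }
  return useSynthetic ? "synthetic" : "log";
}

const pptxPath = "/path/to/skills/pptx/SKILL.md";
console.log(chooseEvalSource({ synthetic: false, autoSynthetic: true, resolvedSkillPath: pptxPath }, 0)); // synthetic
console.log(chooseEvalSource({ synthetic: false, autoSynthetic: true, resolvedSkillPath: pptxPath }, 42)); // log
```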
````diff
@@ -165,6 +200,20 @@ The command:
 5. Annotates each entry with invocation type
 6. Writes the eval set to the output file

+After generation, the current validation path is:
+
+```bash
+selftune evolve --skill <name> --skill-path /path/to/SKILL.md --eval-set <generated-file> --dry-run
+```
+
+That dry run validates a proposal against the generated eval set without deploying.
+
+If the selected skill has no trusted positives yet but selftune can resolve a local `SKILL.md`,
+the command now prints the exact `--auto-synthetic` rerun hint instead of leaving the creator to
+guess the cold-start path.
+
+After reviewing a dry-run proposal, deploy by rerunning without `--dry-run`.
+
 ### Show Stats

 View aggregate telemetry for a skill: average turns, tool call breakdown,
````
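The cold-start sequence documented in the Evals workflow (generate, dry-run validate, then deploy by rerunning without `--dry-run`) can be assembled programmatically, for instance in a setup script. In this TypeScript sketch `coldStartCommands` is a hypothetical helper, the eval-set filename is a placeholder parameter (use whatever file the generate step wrote), and arguments are interpolated without shell escaping, so quote paths that contain spaces.

```typescript
// Builds the documented cold-start command sequence as shell strings.
// No shell escaping is performed; quote paths containing spaces yourself.
function coldStartCommands(skill: string, skillPath: string, evalSetFile: string): string[] {
  const evolve = `selftune evolve --skill ${skill} --skill-path ${skillPath} --eval-set ${evalSetFile}`;
  return [
    `selftune eval generate --skill ${skill} --auto-synthetic --skill-path ${skillPath}`,
    `${evolve} --dry-run`, // validate the proposal without deploying
    evolve, // deploy only after reviewing the dry-run proposal
  ];
}

for (const cmd of coldStartCommands("pptx", "/path/to/skills/pptx/SKILL.md", "pptx-evals.json")) {
  console.log(cmd);
}
```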

package/skill/Workflows/Evolve.md
CHANGED

````diff
@@ -76,6 +76,29 @@ The evolution process writes multiple audit entries:
 | `validated` | Proposal tested against eval set | `eval_snapshot` with before/after pass rates |
 | `deployed` | Updated SKILL.md written to disk | `eval_snapshot` with final rates |

+Routing/body validation may also carry provenance fields such as:
+
+- `validation_mode` — `llm_judge`, `host_replay`, or `structural_guard`
+- `validation_agent` — which host/agent performed the validation
+- `validation_fixture_id` — fixture identifier when replay-backed validation is used
+- `before_pass_rate` / `after_pass_rate` — only present when trigger validation actually ran; structural-guard exits do not emit synthetic pass rates
+
+Most evolve runs today still validate through `llm_judge`. Routing evolution now
+auto-builds a replay fixture from the target skill plus installed sibling
+skills in the same registry, so replay-backed validation is preferred whenever
+that local fixture can be constructed because it captures host-style routing
+behavior instead of model judgment.
+
+The current replay path is fixture-backed: it evaluates the target routing table
+against the installed target/competing skill surfaces in a controlled replay
+fixture and records per-entry evidence. That is still a stronger signal than a
+free-form judge prompt, but you should describe it as replay-backed validation,
+not as live operator telemetry.
+
+Replay parsing is intentionally conservative: unreadable skill files degrade to
+empty surfaces instead of throwing, and malformed routing rows with empty
+trigger cells are ignored rather than treated as valid triggers.
+
 ## Parsing Instructions

 ### Track Evolution Progress
````
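When reporting evolve results, the Evolve provenance fields imply one rule worth enforcing in any consumer: do not print pass rates that were never measured, and call `host_replay` results replay-backed validation rather than live telemetry. The TypeScript sketch below models the documented fields; the helper name `describeValidation`, treating pass rates as fractions between 0 and 1, and the exact label wording are assumptions.

```typescript
type ValidationMode = "llm_judge" | "host_replay" | "structural_guard";

// Field names follow the documented provenance fields; types are assumptions.
interface ValidationProvenance {
  validation_mode: ValidationMode;
  validation_agent?: string;
  validation_fixture_id?: string;
  before_pass_rate?: number; // assumed to be a fraction in [0, 1]
  after_pass_rate?: number;
}

function describeValidation(p: ValidationProvenance): string {
  const label =
    p.validation_mode === "host_replay"
      ? "replay-backed validation"
      : p.validation_mode === "llm_judge"
        ? "LLM-judge validation"
        : "structural guard";
  // Structural-guard exits carry no pass rates; never substitute a default value.
  if (p.before_pass_rate === undefined || p.after_pass_rate === undefined) {
    return `${label} (no trigger pass rates recorded)`;
  }
  const pct = (r: number): string => `${(r * 100).toFixed(1)}%`;
  return `${label}: ${pct(p.before_pass_rate)} -> ${pct(p.after_pass_rate)}`;
}

console.log(describeValidation({ validation_mode: "host_replay", before_pass_rate: 0.5, after_pass_rate: 0.75 })); // replay-backed validation: 50.0% -> 75.0%
console.log(describeValidation({ validation_mode: "structural_guard" })); // structural guard (no trigger pass rates recorded)
```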

package/skill/Workflows/Ingest.md
CHANGED

````diff
@@ -93,6 +93,13 @@ Writes to:
 - `~/.claude/all_queries_log.jsonl` -- extracted user queries
 - `~/.claude/session_telemetry_log.jsonl` -- per-session metrics with `source: "codex_rollout"`

+### Notes
+
+- Conservative skill attribution: Codex rollout ingest only attributes a skill when it has
+  explicit evidence, such as a skill file/path read or an explicit user mention that invokes
+  the skill. Incidental mentions inside assistant reasoning, optimizer prompts, or eval text do
+  not count as triggers.
+
 ### Steps

 1. Verify `$CODEX_HOME/sessions/` directory exists and contains session files
````
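The conservative Codex attribution rule can be sketched as a filter over session events. This TypeScript sketch is hypothetical: the `RolloutEvent` union is a simplified stand-in for Codex rollout records (not their real format), a directory-segment match stands in for "skill file/path read", and a whole-token name match is a cruder stand-in for "explicit user mention that invokes the skill". What it does reflect from the note is that assistant reasoning never counts as evidence.

```typescript
// Simplified stand-in for Codex rollout records (hypothetical format).
type RolloutEvent =
  | { kind: "file_read"; path: string }
  | { kind: "user_message"; text: string }
  | { kind: "assistant_reasoning"; text: string };

function attributeSkills(events: RolloutEvent[], skillNames: string[]): Set<string> {
  const attributed = new Set<string>();
  for (const ev of events) {
    if (ev.kind === "file_read") {
      // Evidence 1: a file under the skill's directory was read, e.g. .../pptx/SKILL.md.
      const segments = ev.path.split(/[\\/]/);
      for (const name of skillNames) {
        if (segments.indexOf(name) !== -1) attributed.add(name);
      }
    } else if (ev.kind === "user_message") {
      // Evidence 2: the user names the skill as a whole token (hyphens stay inside tokens).
      const tokens = ev.text.toLowerCase().split(/[^a-z0-9_-]+/);
      for (const name of skillNames) {
        if (tokens.indexOf(name.toLowerCase()) !== -1) attributed.add(name);
      }
    }
    // assistant_reasoning is ignored: incidental mentions are not triggers.
  }
  return attributed;
}

const found = attributeSkills(
  [
    { kind: "file_read", path: "/home/me/.codex/skills/pptx/SKILL.md" },
    { kind: "user_message", text: "Please use sc-search for this." },
    { kind: "assistant_reasoning", text: "Maybe sc-model would also fit." },
  ],
  ["pptx", "sc-search", "sc-model"],
);
console.log(Array.from(found));
```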
@@ -192,6 +192,25 @@ and evolution pipeline have data to work with immediately.
The sync step is fail-open: if it encounters errors, init continues.
Skip with `--no-sync` if you only want hooks for forward-looking data.

If the user is migrating from a much older, pre-SQLite install and wants to
recover their legacy selftune JSONL history, run `selftune recover` as a
separate recovery step. Recovery is not part of normal first-time setup.

Recovery quick reference:

| Flag | Description |
| --- | --- |
| `--full` | Rebuild SQLite from the available JSONL/export sources |
| `--force` | Skip the SQLite-only preflight guard during a full rebuild |
| `--since <date>` | Recover only rows on or after the given date |
| `--json` | Output JSON summary instead of human-readable text |

Example:

```bash
selftune recover --full --force
```
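
Before suggesting `selftune recover`, it can help to confirm there is legacy JSONL left to recover. The helper below is a sketch: the `~/.claude` default location and the set of file names (borrowed from the log names that appear elsewhere in these docs) are assumptions, so adjust them to the install being migrated.

```bash
# Hypothetical helper: list legacy selftune JSONL files that still hold data.
# Assumes legacy logs live in one directory (default ~/.claude) under these names.
# Exit status 0 if at least one non-empty file was found, 1 otherwise.
legacy_jsonl_present() {
  dir=${1:-$HOME/.claude}
  found=1
  for name in session_telemetry_log.jsonl all_queries_log.jsonl \
              evolution_audit_log.jsonl evolution_evidence_log.jsonl \
              orchestrate_run_log.jsonl; do
    # -s: the file exists and is non-empty
    if [ -s "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      found=0
    fi
  done
  return "$found"
}
```

A successful exit only means legacy data is present; whether to run `selftune recover` is still the user's decision, since recovery is not part of normal setup.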

### 9. Autonomy Scheduling

Init automatically installs OS-level scheduling (launchd on macOS, cron/systemd
@@ -271,7 +290,7 @@ Enrollment uses a device-code flow — one command, one browser approval, fully

### Setup Sequence

- 1. **Check local config**: Run `selftune status` — look for the "Alpha Upload" section
+ 1. **Check local config**: Run `selftune status` — use the first summary line and compact `Highlights` section to explain current skill health, then look for the "Alpha Upload" section
2. **If not linked**: First use `AskUserQuestion` for the opt-in decision. Only if the user says yes, collect their email and run:

```bash
@@ -0,0 +1,84 @@
# selftune Recover Workflow

Recover or backfill the local SQLite database from legacy JSONL files or an
explicit `selftune export` snapshot.

This is a recovery-only workflow. Normal operation should use `selftune sync`,
which replays native source data into SQLite and also triggers alpha upload
when enrolled.

## When to Use

- The user is migrating from a pre-SQLite selftune install and still has
  legacy JSONL history that is not in SQLite yet
- The user exported SQLite to JSONL and now needs to rebuild a fresh DB from
  that snapshot
- The user explicitly asks to recover, rebuild, or backfill SQLite from JSONL

## Default Command

```bash
selftune recover
```

## Options

| Flag | Description |
| ------------------------------ | ------------------------------------------------------------- |
| `--full` | Rebuild SQLite tables from scratch |
| `--force` | Skip the preflight guard for SQLite-only rows during rebuild |
| `--since <date>` | Incrementally materialize records on/after this date |
| `--canonical-log <path>` | Canonical JSONL path override |
| `--telemetry-log <path>` | Session telemetry JSONL path override |
| `--evolution-audit-log <path>` | Evolution audit JSONL path override |
| `--evolution-evidence-log <path>` | Evolution evidence JSONL path override |
| `--orchestrate-run-log <path>` | Orchestrate runs JSONL path override |
| `--json` | Output a JSON summary |

## Output

The command prints a summary of what was materialized into SQLite:

- sessions
- prompts
- skill invocations
- execution facts
- session telemetry
- legacy skill usage
- evolution audit
- evolution evidence
- orchestrate runs

With `--json`, the result includes `mode`, `source`, `since`, `force`, and the
full count breakdown.

## Common Patterns

**Backfill legacy JSONL into an existing SQLite DB**

> Run `selftune recover`.

**Rebuild a deleted DB from an explicit export snapshot**

> Run `selftune export --output ./recovery-snapshot`, then recover from the exported JSONL files explicitly:
>
> `selftune recover --full --force --telemetry-log ./recovery-snapshot/session_telemetry_log.jsonl --evolution-audit-log ./recovery-snapshot/evolution_audit_log.jsonl --evolution-evidence-log ./recovery-snapshot/evolution_evidence_log.jsonl --orchestrate-run-log ./recovery-snapshot/orchestrate_run_log.jsonl`
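
The long command above can be generated from the snapshot directory instead of typed by hand. This `recover_from_snapshot` helper is hypothetical: it uses only the flags documented on this page, passes an override only for snapshot files that actually exist, always includes `--full --force` as in the pattern above, and prints the command instead of running it when `DRY_RUN=1`.

```bash
# Hypothetical helper: rebuild SQLite from a `selftune export` snapshot directory.
# Only passes an override flag for snapshot files that exist.
# Set DRY_RUN=1 to print the command instead of running it.
recover_from_snapshot() {
  snap=$1
  [ -d "$snap" ] || { echo "no snapshot directory: $snap" >&2; return 1; }
  # Reuse the positional parameters as an argument array.
  set -- selftune recover --full --force
  for pair in \
    "--telemetry-log session_telemetry_log.jsonl" \
    "--evolution-audit-log evolution_audit_log.jsonl" \
    "--evolution-evidence-log evolution_evidence_log.jsonl" \
    "--orchestrate-run-log orchestrate_run_log.jsonl"; do
    flag=${pair%% *}   # text before the first space
    file=${pair#* }    # text after the first space
    if [ -f "$snap/$file" ]; then
      set -- "$@" "$flag" "$snap/$file"
    fi
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 recover_from_snapshot ./recovery-snapshot` first shows exactly which overrides will be passed before anything touches SQLite.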

**Recover only recent JSONL rows**

> Run `selftune recover --since 2026-01-01`.

## Important Notes

- Do **not** use this as a normal freshness command. Use `selftune sync` for day-to-day operation.
- Alpha upload remains SQLite-first. Recovery only repopulates SQLite so the normal upload pipeline can stage and send data afterward.
- If you are recovering from post-cutover data, prefer a SQLite backup or `selftune export` snapshot. Passive legacy JSONL files do not contain all post-cutover records.

## Example Flags Used Above

| Flag | Description |
| --- | --- |
| `-o, --output <dir>` | Export SQLite into a portable snapshot directory |
| `--full` | Rebuild SQLite tables from scratch |
| `--force` | Skip the SQLite-only preflight guard during full rebuild |
| `--telemetry-log <path>` | Point recover at the exported telemetry JSONL file |