@neikyun/ciel 6.3.0 → 6.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (28) hide show
  1. package/assets/.claude/settings.json +1 -1
  2. package/assets/CLAUDE.md +5 -9
  3. package/assets/commands/ciel-audit.md +195 -59
  4. package/assets/commands/ciel-status.md +1 -1
  5. package/assets/commands/ciel-update.md +4 -0
  6. package/assets/dist/plugin/index.js +7 -9
  7. package/assets/platforms/opencode/.opencode/agents/ciel-critic.md +320 -483
  8. package/assets/platforms/opencode/.opencode/agents/ciel-explorer.md +113 -95
  9. package/assets/platforms/opencode/.opencode/agents/ciel-improver.md +204 -273
  10. package/assets/platforms/opencode/.opencode/agents/ciel-researcher.md +259 -270
  11. package/assets/platforms/opencode/.opencode/agents/ciel.md +1 -1
  12. package/assets/platforms/opencode/.opencode/commands/ciel-audit.md +300 -10
  13. package/assets/platforms/opencode/.opencode/commands/ciel-create-skill.md +75 -10
  14. package/assets/platforms/opencode/.opencode/commands/ciel-eval.md +71 -10
  15. package/assets/platforms/opencode/.opencode/commands/ciel-improve.md +7 -13
  16. package/assets/platforms/opencode/.opencode/commands/ciel-init.md +165 -11
  17. package/assets/platforms/opencode/.opencode/commands/ciel-migrate.md +5 -0
  18. package/assets/platforms/opencode/.opencode/commands/ciel-refresh.md +89 -13
  19. package/assets/platforms/opencode/.opencode/commands/ciel-status.md +6 -1
  20. package/assets/platforms/opencode/.opencode/commands/ciel-update.md +31 -18
  21. package/assets/platforms/opencode/.opencode/commands/ciel.md +1 -2
  22. package/assets/platforms/opencode/.opencode/plugins/ciel.ts +146 -0
  23. package/assets/platforms/opencode/AGENTS.md +3 -3
  24. package/assets/skills/ciel/SKILL.md +1 -1
  25. package/dist/plugin/index.d.ts.map +1 -1
  26. package/dist/plugin/index.js +7 -9
  27. package/dist/plugin/index.js.map +1 -1
  28. package/package.json +3 -2
@@ -33,7 +33,7 @@ Tu es l'orchestrateur Ciel v6. Analyse, planifie, implemente, verifie.
33
33
 
34
34
  ## Regles (immutables)
35
35
 
36
- 1. **Depth en 1ere ligne** — chaque reponse commence par la classification (Trivial/Standard/Critical/Spike)
36
+ 1. **Pipeline interne** — classifie la profondeur et suis les 16 étapes en interne. N'affiche pas la classification.
37
37
  2. **Pipeline** — suis les 16 etapes (tableau ci-dessous). Le plugin injecte un rappel avant chaque message.
38
38
  3. **TODO list** — utilise `todowrite` au debut de chaque tache pour tracker les etapes. Marque chaque etape completed/in_progress au fur et a mesure.
39
39
  4. **ASK** — utilise `question` tool SEULEMENT si ambigu. Si le contexte est suffisant, DECIDE et avance. Ne demande pas pour chaque etape.
@@ -1,19 +1,309 @@
1
1
  ---
2
- command: ciel-audit
3
- description: Session post-mortem — detect Ciel paradigm violations
2
+ description: ---
4
3
  subtask: false
5
4
  ---
6
5
 
7
- # /ciel-audit — Session audit
6
+ ---
7
+ description: Audits the current Claude Code session for Ciel paradigm violations (missed Task dispatches, inline gathering, hook inactivity, skill overlaps, intent routing misses). Produces a structured report with a Ciel Health Score (0-100). If score < 75, creates a GitHub Issue on the Ciel repository with the findings and session timeline. Hook-independent — works even when Ciel hooks are broken.
8
+ ---
8
9
 
9
- Audits the current session for Ciel paradigm violations (missed pipeline steps, missing depth classification, skipped FAIRE gates, hook inactivity). Produces a structured report.
10
+ # /ciel-audit Session post-mortem
11
+
12
+ *Generates a structured report of Ciel behavior violations observed in the current session. Calculates a Ciel Health Score (0-100). If the score is below 75, creates a GitHub Issue on the Ciel repository (github.com/KaosKyun/Ciel) with the full timeline and findings — otherwise produces the report only without creating an issue.*
10
13
 
11
14
  Usage: `/ciel-audit`
12
15
 
13
- ## Process
16
+ Runs inline in the main session. Does not dispatch agents. Does not depend on hooks being active.
17
+
18
+ ---
19
+
20
+ ## Instructions to the model
21
+
22
+ You are auditing the **current conversation session** — the one you are participating in right now. Jump directly to the analysis. No preamble. No meta-commentary. No "I will now audit…".
23
+
24
+ ### What to audit
25
+
26
+ Scan the session's tool-use history (your own prior turns). For each `/ciel <task>` invocation in this session, check the **eight dimensions** below. For each dimension, assign a severity and penalty score to calculate the final Ciel Health Score.
27
+
28
+ #### Dimension 1: Dispatch discipline (critical) — penalty up to -25
29
+
30
+ - Did the assistant emit a `Task(subagent_type="ciel-*")` within the **first 3 tool calls** after the `/ciel` prompt?
31
+ - If NO: count the inline `Bash` / `Read` / `Grep` / `Glob` / `WebSearch` / `WebFetch` calls emitted in the main session before any dispatch. This is the v2.1.5 anti-pattern documented in `skills/ciel/SKILL.md:146-156`.
32
+ - Exception: Trivial tasks (rename, typo, 1-line fix, docs-only) are allowed to run inline.
33
+ - Severity→score:
34
+ - Critical dispatch violation (Standard/Critical task with no Task()): **-25**
35
+ - High (delayed dispatch, >1 inline tool before dispatch): **-15**
36
+ - OK (Task() within first 3 tool calls): **0**
37
+
38
+ #### Dimension 2: Hook activity (critical) — penalty up to -25
39
+
40
+ Search the session transcript for strings that Ciel hooks would have injected:
41
+
42
+ - `"CIEL depth hint:"` — from `hooks/user-prompt-submit.sh:36`, injected via `additionalContext` on every UserPromptSubmit.
43
+ - `"CIEL "` prefix (e.g., `"CIEL [CRITIQUE]"`, `"CIEL src/...` ) — from `hooks/pre-tool-write.sh:34,36`, injected before every Write/Edit.
44
+ - Session banner from `hooks/session-start.sh` (SessionStart context).
45
+ - `"META-CRITIQUER"` or similar end-of-session signal from `hooks/stop.sh`.
46
+
47
+ - None found despite writes/edits: **-25** → hooks broken
48
+ - Partial (some hooks firing but not all): **-10**
49
+ - All present: **0**
50
+
51
+ Most likely root cause: relative paths in `Ciel/settings.json:8,19,31,42,54,65,76`. The `command` field is written as `bash .claude/plugins/ciel/hooks/<file>.sh`, which Claude Code resolves against the current working directory, not the plugin directory.
52
+
53
+ #### Dimension 3: Skill invocation coverage vs depth — penalty up to -15
54
+
55
+ Cross-reference `skills/ciel/SKILL.md:20-64` (Depth Gauge) with what actually happened:
56
+
57
+ - **Standard** task → `researcher` and `explorer` agents must be dispatched in parallel before FAIRE. Missing both: **-15**. Missing one: **-8**.
58
+ - **Critical** task → must additionally invoke `stride-analyzer` and `security-regression-check`. Missing: **-15**.
59
+ - Depth ambiguous and `depth-classifier` not invoked: **-5**.
60
+
61
+ #### Dimension 4: Skill overlap / redundancy — penalty up to -10
62
+
63
+ - Both `relire-critic` AND `critiquer-auditor` on the same diff: **-10**
64
+ - `meta-critiquer` did not fire at end-of-task: **-5**
65
+ - Same skill invoked 3+ times for same scope: **-5**
66
+
67
+ #### Dimension 5: Agent report quality — penalty up to -5
68
+
69
+ - Any `Task(subagent_type="ciel-*")` with returned message under 200 tokens: **-5** per occurrence (max -10)
70
+
71
+ #### Dimension 6: Intent routing hits / misses — penalty up to -10
72
+
73
+ Scan the user's `/ciel` prompt text against the intent signals in `SKILL.md:79-94`. For each matched intent, verify the corresponding skill was invoked. Each miss: **-5** (max -10).
74
+
75
+ #### Dimension 7: npm version staleness — penalty up to -10
76
+
77
+ Check if a newer version of Ciel is available on npm:
78
+
79
+ ```bash
80
+ NPM_VERSION=$(npm view @neikyun/ciel version 2>/dev/null || echo "unknown")
81
+ LOCAL_VERSION=$(cat /path/to/VERSION 2>/dev/null || cat package.json 2>/dev/null | grep '"version"' | head -1 | cut -d'"' -f4 || echo "unknown")
82
+ ```
83
+
84
+ - npm version > local version: **-10**
85
+ - Version check failed (npm not installed, no network): **0** (not a violation, just incomplete data)
86
+ - Versions match or local is newer: **0**
87
+
88
+ If npm version > local version, include an **Update notification** in the report:
89
+ > A newer version of Ciel is available on npm: **{npm_version}** (installed: **{local_version}**). Run `ciel-update` or `npm update -g @neikyun/ciel` to upgrade.
90
+
91
+ #### Dimension 8: Platform health — penalty up to -5
92
+
93
+ Check which Ciel platforms are present and whether they have valid configurations:
94
+
95
+ ```bash
96
+ ls /path/to/platforms/ 2>/dev/null
97
+ ```
98
+
99
+ Expected platforms: codex, cursor, kilocode, lmstudio, ollama, opencode, windsurf
100
+
101
+ - All platforms present: **0**
102
+ - Missing 1-2 platforms: **-3**
103
+ - Missing 3+ platforms: **-5**
104
+
105
+ ---
106
+
107
+ ### Scoring
108
+
109
+ **Ciel Health Score** = 100 - sum(penalties)
110
+
111
+ | Score range | Status | Issue created? |
112
+ |-------------|--------|----------------|
113
+ | 90-100 | Excellent | No |
114
+ | 75-89 | Good (minor issues) | No |
115
+ | 50-74 | Needs improvement | **Yes** — creates issue with timeline |
116
+ | 0-49 | Critical | **Yes** — creates issue with timeline |
117
+
118
+ The score is calculated automatically from the detected violations.
119
+
120
+ ---
121
+
122
+ ### Report format
123
+
124
+ Begin the output with the literal line `# Ciel Session Audit Report`. End with the literal line `**End of audit report.**` on its own line.
125
+
126
+ ```markdown
127
+ # Ciel Session Audit Report
128
+
129
+ **Date**: <today's date>
130
+ **Ciel Health Score**: <N>/100 — <Excellent|Good|Needs improvement|Critical>
131
+ **npm**: local v<X.Y.Z> | npm v<X.Y.Z> | <up-to-date|update available>
132
+ **Platforms**: codex ✓ cursor ✓ kilo ✓ lmstudio ✓ ollama ✓ opencode ✓ windsurf ✓ (or ✗ if missing)
133
+ **Session summary**: <N> /ciel invocation(s), <N> total tool calls, <N> Task() dispatches, <N> inline Bash/Read/Grep/WebSearch calls in main session.
134
+
135
+ **Verdict**: <PASS | VIOLATIONS FOUND>
136
+
137
+ ---
138
+
139
+ ## Violations detected
140
+
141
+ ### <N>. <Short violation name> — penalty: -<N>
142
+
143
+ **Severity**: <critical | high | medium | low>
144
+ **Evidence from session**
145
+ - User prompt (turn N): `<exact prompt text, truncated to 200 chars>`
146
+ - Assistant's first 3 tool calls after this prompt:
147
+ 1. `<tool_name>(<brief args>)`
148
+ 2. `<tool_name>(<brief args>)`
149
+ 3. `<tool_name>(<brief args>)`
150
+ - Expected: `Task(subagent_type="ciel-<role>", ...)` as first tool call
151
+ - Observed: `<actual behavior>`
152
+
153
+ **Root cause hypothesis**
154
+ <one or two sentences>
155
+
156
+ **Proposed fix**
157
+ - <concrete edit with file:line>
158
+
159
+ ...
160
+
161
+ ---
162
+
163
+ ## Update notification
164
+
165
+ <If npm version > local version, show update notification here. Otherwise omit this section.>
166
+
167
+ ---
168
+
169
+ ## Scoring breakdown
170
+
171
+ | Dimension | Penalty |
172
+ |-----------|---------|
173
+ | D1 — Dispatch discipline | -<N> |
174
+ | D2 — Hook activity | -<N> |
175
+ | D3 — Skill coverage vs depth | -<N> |
176
+ | D4 — Skill overlap | -<N> |
177
+ | D5 — Agent report quality | -<N> |
178
+ | D6 — Intent routing | -<N> |
179
+ | D7 — npm version | -<N> |
180
+ | D8 — Platform health | -<N> |
181
+ | **Total** | **-<N>** |
182
+ | **Health Score** | **<N>/100** |
183
+
184
+ ---
185
+
186
+ ## Summary of fixes to apply
187
+
188
+ Apply these in order. Each points to a file:line and a concrete change.
189
+
190
+ 1. **<Fix name>** — `<path>:<line>` — <one-line description>
191
+ 2. ...
192
+
193
+ After each fix: re-run the scenario that triggered the violation in a fresh Claude session to verify.
194
+
195
+ ---
196
+
197
+ **End of audit report.**
198
+ ```
199
+
200
+ ### If no violations are found (score = 100)
201
+
202
+ Output a single short section — **no issue is created** for PASS verdicts:
203
+
204
+ ```markdown
205
+ # Ciel Session Audit Report
206
+
207
+ **Date**: <today's date>
208
+ **Ciel Health Score**: 100/100 — Excellent
209
+ **npm**: local v<X.Y.Z> | npm v<X.Y.Z> | up-to-date
210
+ **Platforms**: all 7 platforms present ✓
211
+ **Session summary**: <N> /ciel invocation(s), <N> tool calls, <N> Task() dispatches.
212
+ **Verdict**: PASS
213
+
214
+ **No violations found.** No issue created.
215
+
216
+ **End of audit report.**
217
+ ```
218
+
219
+ ---
220
+
221
+ ### GitHub Issue creation (only if score < 75)
222
+
223
+ If the Ciel Health Score is **below 75**, create a GitHub Issue with the report AND the session timeline.
224
+
225
+ **Important**: Do NOT create an issue if score >= 75. Only create for scores < 75.
226
+
227
+ 1. **Check for duplicate issues first**:
228
+ ```bash
229
+ EXISTING=$(gh issue list --repo KaosKyun/Ciel --label audit --state open \
230
+ --json title --jq '.[].title' 2>/dev/null | \
231
+ grep -c "^\[CIEL-AUDIT\] $(date +%Y-%m-%d) -" || true)
232
+ if [ "$EXISTING" -gt 0 ]; then
233
+ echo "Warning: today's audit issue already exists. Skipping creation."
234
+ echo "View existing at: https://github.com/KaosKyun/Ciel/issues?q=is%3Aopen+label%3Aaudit"
235
+ exit 0
236
+ fi
237
+ ```
238
+
239
+ 2. **Save the report to a temp file**: Write the complete report text to a temp file:
240
+ ```bash
241
+ REPORT_FILE="/tmp/ciel-audit-report-$(date +%Y%m%d).md"
242
+ cat > "$REPORT_FILE" << 'REPORT_EOF'
243
+ # Ciel Session Audit Report
244
+ ...
245
+ **End of audit report.**
246
+ REPORT_EOF
247
+ ```
248
+
249
+ 3. **Determine the repository**: run `gh repo view --json owner,name` to confirm, or default to `KaosKyun/Ciel`.
250
+
251
+ 4. **Create the issue** with `gh issue create`, using Python to avoid shell quoting issues:
252
+ ```bash
253
+ python3 -c "
254
+ import subprocess, sys
255
+ ymd, date_str, score, verdict = sys.argv[1:5]
256
+ report_path = f'/tmp/ciel-audit-report-{ymd}.md'
257
+ with open(report_path) as f:
258
+ body = f.read()
259
+ subprocess.run(['gh', 'issue', 'create',
260
+ '--repo', 'KaosKyun/Ciel',
261
+ '--title', f'[CIEL-AUDIT] {date_str} - {verdict} (Score: {score}/100)',
262
+ '--label', 'audit,ciel',
263
+ '--body', body])
264
+ " "$(date +%Y%m%d)" "$(date +%Y-%m-%d)" "<score>" "<verdict>"
265
+ ```
266
+
267
+ 5. **Include the session timeline** in the issue body. Before the report, add a timeline section listing all `/ciel` invocations in chronological order:
268
+ ```markdown
269
+ ## Session Timeline
270
+
271
+ | Time | Event |
272
+ |------|-------|
273
+ | T+N | /ciel <prompt excerpt> |
274
+ | T+N | Write/Edit on <file> |
275
+ | T+N | Task() dispatch to <agent> |
276
+ ...
277
+
278
+ ---
279
+
280
+ <full audit report>
281
+ ```
282
+
283
+ 6. **Handle errors gracefully**:
284
+ - If `gh` is not available: print a message telling the user to install (`brew install gh`) and print the report to stdout instead.
285
+ - If the repository can't be reached: skip issue creation, print a warning, but still output the report.
286
+
287
+ 7. **After creating the issue**: include the issue URL at the end of your response so the user can open it directly.
288
+
289
+ **Title format**: `[CIEL-AUDIT] YYYY-MM-DD - VIOLATIONS FOUND (Score: <N>/100)`
290
+
291
+ **Labels**: `audit`, `ciel`
292
+
293
+ ---
294
+
295
+ ### Tone and length
296
+
297
+ - **Mechanically actionable, not narrative**. File paths, line numbers, before/after snippets.
298
+ - Target length: 400–800 lines of markdown for a session with 2–3 violations. Shorter if PASS.
299
+ - No rhetorical preamble. No meta-commentary about the audit process itself. No emoji. No closing remarks after the `**End of audit report.**` marker.
300
+
301
+ ### What NOT to do
14
302
 
15
- 1. Analyzes current session context
16
- 2. Checks for depth classification at each step
17
- 3. Detects FAIRE gate violations
18
- 4. Checks hook activity
19
- 5. Reports findings with severity levels
303
+ - Do NOT fix violations in the current session. Only produce the report and optionally create the issue.
304
+ - Do NOT invoke other Ciel skills. This command is fully self-contained.
305
+ - Do NOT dispatch `Task()` agents. Audit happens inline.
306
+ - Do NOT ask clarifying questions. Produce the report with the information you have.
307
+ - Do NOT create an issue if score >= 75. Only create for score < 75.
308
+ - Do NOT create duplicate issues — run the `gh issue list` check before creating.
309
+ - Do NOT restart, rerun, or attempt to fix the session in-flight. The audit report is the deliverable.
@@ -1,20 +1,85 @@
1
1
  ---
2
- command: ciel-create-skill
3
- description: Generate a new Ciel SKILL.md scaffold
2
+ description: ---
4
3
  agent: ciel-improver
5
4
  subtask: true
6
5
  ---
7
6
 
8
- # /ciel-create-skill — Create new skill
7
+ > **OpenCode note**: This command requires `claude --print` headless mode for full functionality (binary evals, skill scaffold generation). On OpenCode it runs in degraded mode the improver agent returns proposals only. For the full harness, use Claude Code.
9
8
 
10
- Generates a valid Ciel SKILL.md scaffold following Anthropic Skills-first rules (kebab-case ≤64, YAML description ≤1024, body ≤500 lines).
9
+ ---
10
+ description: Generates a valid Ciel SKILL.md scaffold following Anthropic Skills-first rules (kebab-case ≤64, YAML description ≤1024, body ≤500 lines).
11
+ ---
12
+
13
+ # /ciel-create-skill — Create a new Ciel skill
14
+
15
+ *Generates a valid SKILL.md scaffold following Anthropic Skills-first rules (kebab-case name ≤64 chars, YAML frontmatter ≤1024-char description, ≤500-line body, progressive disclosure to one reference.md).*
11
16
 
12
17
  Usage: `/ciel-create-skill <name> <purpose>`
13
18
 
14
- ## Process
19
+ - `name` — kebab-case, max 64 chars, unique across `skills/`
20
+ - `purpose` — one-line description of what the skill does
21
+
22
+ ---
23
+
24
+ ## What it does
25
+
26
+ 1. Dispatches the `improver` agent in MODE=CREATE-SKILL
27
+ 2. The agent invokes `skill-creator` skill
28
+ 3. Validates name, category (you pick), description length, uniqueness, paths glob
29
+ 4. Generates a scaffold SKILL.md + optional reference.md
30
+ 5. Returns the proposed files for user review
31
+ 6. On approval, writes files + registers in `skills/ciel/reference.md` catalog
32
+
33
+ ---
34
+
35
+ ## Example
36
+
37
+ ```
38
+ /ciel-create-skill kotlin-coroutines-mastery "Expert in Kotlin coroutines: structured concurrency, Flow operators, cancellation, testing. Use when working with suspend functions, CoroutineScope, Flow, or coroutine builders."
39
+ ```
40
+
41
+ Expected output:
42
+ 1. Validation: ✓ name valid, unique, no reserved words
43
+ 2. Category suggestion: `domain` (based on "Expert in X" pattern)
44
+ 3. Preview of `skills/domain/kotlin-coroutines-mastery/SKILL.md` (~150 lines)
45
+ 4. Preview of `skills/domain/kotlin-coroutines-mastery/reference.md` (~300 lines) with Flow operators cheatsheet
46
+ 5. Catalog entry: `| kotlin-coroutines-mastery | Kotlin files with suspend/Flow |`
47
+ 6. Approve? [y/n/edit]
48
+
49
+ ---
50
+
51
+ ## Naming rules
52
+
53
+ - kebab-case only (no underscores, no camelCase)
54
+ - Max 64 chars
55
+ - No reserved words: `anthropic`, `claude`, `mcp`
56
+ - Must be unique across `skills/**/SKILL.md`
57
+ - Warn if starts with category name (e.g. `workflow-foo` in `workflow/` is redundant)
58
+
59
+ ---
60
+
61
+ ## Category decision
62
+
63
+ - `workflow` — enforces a CRÉER/CRITIQUER/META-CRITIQUER step
64
+ - `research` — finds information outside the codebase
65
+ - `domain` — encodes expertise in a specific tech or pattern family
66
+ - `utility` — wraps a frequent mechanical operation
67
+ - `meta` — modifies Ciel itself
68
+
69
+ ---
70
+
71
+ ## Guardrails
72
+
73
+ - **Max 1 new skill per invocation** — prevents skill explosion
74
+ - **SKILL.md size**: ≤ 300 lines (hard cap)
75
+ - **reference.md size**: ≤ 500 lines (hard cap)
76
+ - **Duplication detection**: warns if description overlaps ≥ 70% with existing skill
77
+ - **Never creates**: files in `.claude-plugin/`, agents, hooks, commands — those use different patterns
78
+
79
+ ---
80
+
81
+ ## After creation
15
82
 
16
- 1. Validates skill name (kebab-case, ≤64 chars)
17
- 2. Researches common patterns for the domain
18
- 3. Generates SKILL.md with YAML frontmatter
19
- 4. Returns proposed skill for user approval
20
- 5. Never writes autonomously
83
+ 1. Test the skill: `/ciel-eval <name>` (runs baseline eval if dataset exists)
84
+ 2. If it's a `workflow` skill, update `skills/ciel/SKILL.md` pipeline sections to reference it
85
+ 3. Commit with message `feat(ciel): add <category>/<name> skill`
@@ -1,27 +1,88 @@
1
1
  ---
2
- command: ciel-eval
3
- description: Run binary eval harness for Ciel skills
2
+ description: ---
4
3
  agent: ciel-improver
5
4
  subtask: true
6
5
  ---
7
6
 
7
+ > **OpenCode note**: This command requires `claude --print` headless mode for full functionality (binary evals, skill scaffold generation). On OpenCode it runs in degraded mode — the improver agent returns proposals only. For the full harness, use Claude Code.
8
+
9
+ ---
10
+ description: Runs the binary eval dataset for one or all Ciel skills via claude --print headless, comparing variants.
11
+ ---
12
+
8
13
  # /ciel-eval — Run eval harness
9
14
 
10
- Runs the binary eval dataset for one skill (or all skills), comparing variants and persisting scoreboards.
15
+ *Runs the binary eval dataset for one skill (or all skills) via `claude --print` headless mode, comparing variants and persisting scoreboards.*
11
16
 
12
17
  Usage: `/ciel-eval [skill-name]`
13
18
 
14
19
  - No arg → runs baseline eval for ALL skills that have a dataset
15
20
  - `skill-name` → runs eval for that skill only (baseline + any variants in `variants/`)
16
21
 
17
- ## Process
22
+ ---
23
+
24
+ ## What it does
18
25
 
19
26
  1. Dispatches the `improver` agent in MODE=EVAL
20
27
  2. The agent invokes `skill-variant-evaluator` skill
21
28
  3. For each target skill:
22
- - Loads dataset from `evals/datasets/<skill>.jsonl`
23
- - Generates 2-3 variants of the skill
24
- - Executes all variants headlessly against the dataset
25
- - Compares aggregate scores
26
- - Proposes winner with tiebreak on token usage
27
- 4. Results appended to `evals/results/<skill>-scoreboard.md`
29
+ - Loads dataset from `evals/datasets/<skill-name>.jsonl`
30
+ - If variants exist (`skills/<category>/<name>/variants/*.md`), evaluates each
31
+ - Otherwise, baseline-only scoring
32
+ 4. Persists results to `evals/results/<skill-name>-<timestamp>.json`
33
+ 5. Returns a scoreboard
34
+
35
+ ---
36
+
37
+ ## Example output
38
+
39
+ ```
40
+ # Eval results — flux-narrator
41
+
42
+ Dataset: evals/datasets/flux-narration.jsonl (8 entries)
43
+ Variants evaluated: 3
44
+
45
+ | Variant | Score | Tokens | Duration |
46
+ |---------|-------|--------|----------|
47
+ | A (baseline) | 0.72 | 14.5k | 12.5s |
48
+ | B (tightened) | 0.89 | 15.1k | 13.2s ← WINNER |
49
+ | C (reduced) | 0.81 | 12.8k | 11.7s |
50
+
51
+ Winner: **Variant B** (+0.17 over baseline)
52
+
53
+ Recommendation: adopt Variant B via `/ciel-improve`
54
+
55
+ Result persisted: evals/results/flux-narrator-2026-04-17-142345.json
56
+ ```
57
+
58
+ ---
59
+
60
+ ## Guardrails
61
+
62
+ - **Max 20 eval entries per dataset** — prevents runaway costs
63
+ - **Max 3 variants per skill** — A/B/C, more is noise
64
+ - **Cost estimation** — displayed before starting; user can abort if > 500k tokens projected
65
+ - **Read-only evals** — `--disallowed-tools "Write Edit NotebookEdit"` to prevent side effects
66
+ - **Graceful fallback** — if `claude --print` is unavailable, outputs manual eval prompts and waits for user input
67
+
68
+ ---
69
+
70
+ ## When to run
71
+
72
+ - Before and after every `/ciel-improve` pass (baseline vs improved)
73
+ - On every CHANGELOG version bump — regression sanity check
74
+ - When creating a new skill — establish baseline score
75
+ - When adding new eval criteria — re-score existing skills
76
+
77
+ ---
78
+
79
+ ## Eval dataset format
80
+
81
+ One JSON object per line (JSONL). See `evals/datasets/depth-classification.jsonl` for examples. Required fields:
82
+
83
+ - `id` — unique in dataset
84
+ - `input` — prompt / context passed to skill
85
+ - `expected_behavior` — object mapping criterion name → boolean
86
+ - `skill` — which skill this tests
87
+
88
+ See `skills/meta/skill-variant-evaluator/reference.md` for the full schema and supported criterion evaluators.
@@ -1,19 +1,13 @@
1
1
  ---
2
- command: ciel-improve
3
- description: Self-improvement — analyze sessions and propose skill patches
4
- agent: ciel-improver
5
- subtask: true
2
+ description: Slash trigger for the ciel-improve skill on OpenCode.
6
3
  ---
7
4
 
8
- # /ciel-improve Self-improvement
5
+ Invoke the `ciel-improve` skill via the Skill tool with the user's arguments:
9
6
 
10
- Analyzes recent session transcripts to detect repeated failure modes, user corrections, and skill output truncation, then produces a patch-set proposing specific rewrites for Ciel skills.
7
+ ```
8
+ $ARGUMENTS
9
+ ```
11
10
 
12
- Usage: `/ciel-improve`
11
+ If `$ARGUMENTS` is empty, invoke the skill with no argument — it will classify the current context and prompt for a task if needed.
13
12
 
14
- ## Process
15
-
16
- 1. Scans `.ciel/learnings.md` for recent user corrections
17
- 2. Analyzes session transcripts for failure patterns
18
- 3. Produces a patch-set with specific skill rewrites
19
- 4. Never rewrites autonomously — returns proposals for review
13
+ The full logic (depth classifier, intent routing, pipeline selection, agent dispatch rules) lives in the `ciel-improve` skill itself. This command file is a thin trigger; modify the skill, not this file, to change behavior.