npm - @cleocode/skills - Versions diffs - 2026.4.0 → 2026.4.3 - Mend

@cleocode/skills 2026.4.0 → 2026.4.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

package/package.json +1 -1
package/skills/_shared/manifest-operations.md +1 -2
package/skills/_shared/skill-chaining-patterns.md +3 -7
package/skills/_shared/subagent-protocol-base.cant +1 -1
package/skills/ct-cleo/SKILL.md +56 -65
package/skills/ct-cleo/references/orchestrator-constraints.md +0 -13
package/skills/ct-cleo/references/session-protocol.md +3 -12
package/skills/ct-codebase-mapper/SKILL.md +7 -7
package/skills/ct-grade/SKILL.md +12 -46
package/skills/ct-grade/agents/scenario-runner.md +11 -21
package/skills/ct-grade/references/ab-test-methodology.md +14 -14
package/skills/ct-grade/references/domains.md +72 -74
package/skills/ct-grade/references/grade-spec.md +8 -11
package/skills/ct-grade/references/scenario-playbook.md +77 -106
package/skills/ct-grade-v2-1/SKILL.md +30 -32
package/skills/ct-grade-v2-1/agents/scenario-runner.md +14 -34
package/skills/ct-grade-v2-1/grade-viewer/eval-report.md +4 -1
package/skills/ct-grade-v2-1/references/ab-testing.md +28 -88
package/skills/ct-grade-v2-1/references/grade-spec-v2.md +5 -5
package/skills/ct-grade-v2-1/references/playbook-v2.md +115 -183
package/skills/ct-grade-v2-1/references/token-tracking.md +7 -9
package/skills/ct-memory/SKILL.md +16 -35
package/skills/ct-orchestrator/SKILL.md +58 -68
package/skills/ct-skill-validator/SKILL.md +1 -1
package/skills/ct-skill-validator/agents/ecosystem-checker.md +2 -2
package/skills/ct-skill-validator/references/cleo-ecosystem-rules.md +19 -20
package/skills/manifest.json +1 -1
package/skills/signaldock-connect/SKILL.md +132 -0
package/skills/signaldock-connect/assets/agent-card.json +48 -0
package/skills/signaldock-connect/references/api-endpoints.md +131 -0
package/skills.json +1 -1

package/skills/ct-grade/references/ab-test-methodology.md CHANGED

@@ -14,10 +14,11 @@ An "arm" is a specific test configuration. In CLEO A/B tests, the two most commo
 | Arm | Typical Config | Example |
 |-----|---------------|---------|
-| A | MCP gateway | Uses `query`/`mutate` for all operations |
-| B | CLI fallback | Uses `cleo-dev` CLI for equivalent operations |
+| A | Configuration A | Different CLI binary, flags, or prompt setup |
+| B | Configuration B | Alternate setup for comparison |
-Arms can also differ by:
+Arms can differ by:
+- CLI binary version (`cleo-dev` vs `cleo`)
 - Session scope (`global` vs `epic:T500`)
 - Tier escalation (with/without `admin.help`)
 - Agent persona (orchestrator vs task-executor)
@@ -71,10 +72,9 @@ save_json(arm_dir + "/timing.json", timing)
 ### Why This Matters
-Token cost is the primary economic metric for comparing interfaces:
-- MCP operations may use more tokens (richer responses, metadata)
-- CLI operations may use fewer tokens but score lower on S5
-- Score-per-token tells you which interface is more efficient for protocol work
+Token cost is the primary economic metric for comparing configurations:
+- Different configurations may produce different token costs
+- Score-per-token tells you which configuration is more efficient for protocol work
 ### Missing Token Data
@@ -98,16 +98,16 @@ If you forgot to capture tokens, you cannot recover them. Mark `total_tokens: nu
 | 0-5 pts | Noise level — likely equivalent |
 | 5-15 pts | Meaningful difference — investigate flags |
 | 15-25 pts | Significant — one interface clearly better |
-| 25+ pts | Extreme — likely S5 differential (MCP vs CLI) |
+| 25+ pts | Extreme — likely S5 differential or protocol gap |
-### Expected MCP vs CLI Delta
+### Expected Delta
 Based on the rubric implementation:
-- S5 Progressive Disclosure: always +20 for MCP (if admin.help called), +10 MCP no help, 0 CLI
+- S5 Progressive Disclosure: +20 if agent uses `admin.help` and follows read-before-write discipline
 - S1-S4: approximately equal if agent follows same protocol steps
-- Total expected delta: **+10 to +20 points** in favor of MCP for equivalent protocols
+- Configuration differences should primarily show up in S5 and token efficiency
-If delta exceeds 20 points, investigate whether the CLI agent is also skipping other protocol steps (session.list, descriptions, etc.) due to lack of guidance.
+If delta exceeds 20 points, investigate whether one arm is skipping protocol steps (session.list, descriptions, etc.).
 ---
@@ -119,8 +119,8 @@ The "git tree" metaphor: each A/B run produces a branch in the results tree. Mul
 ab_results/
   run-001/           ← first full A/B run
     s4/
-      run-01/arm-A/  ← first run, MCP arm
-      run-01/arm-B/  ← first run, CLI arm
+      run-01/arm-A/  ← first run, arm A
+      run-01/arm-B/  ← first run, arm B
       run-01/comparison.json
       run-02/arm-A/
       ...
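
The ab-test-methodology changes above make score-per-token the primary economic metric and bucket score deltas into interpretation bands. Below is a minimal TypeScript sketch of both calculations. It assumes a hypothetical `ArmResult` shape, which is not the actual `timing.json` or `comparison.json` layout. It also treats each band's lower bound as inclusive, because the table is ambiguous at 5, 15 and 25 points.

```typescript
// Hypothetical per-arm summary; field names are illustrative,
// not the layout of timing.json or comparison.json.
interface ArmResult {
  arm: "A" | "B";
  totalScore: number;         // grade total, 0-100
  totalTokens: number | null; // null when tokens were never captured
}

type DeltaBand = "noise" | "meaningful" | "significant" | "extreme";

// Score-per-token; null when token data is missing, since it cannot be recovered later.
function scorePerToken(r: ArmResult): number | null {
  if (r.totalTokens === null || r.totalTokens <= 0) return null;
  return r.totalScore / r.totalTokens;
}

// Map the absolute score delta onto the interpretation table.
// Treating 5, 15 and 25 as inclusive lower bounds is an assumption.
function classifyDelta(a: ArmResult, b: ArmResult): { delta: number; band: DeltaBand } {
  const delta = Math.abs(a.totalScore - b.totalScore);
  let band: DeltaBand;
  if (delta < 5) band = "noise";
  else if (delta < 15) band = "meaningful";
  else if (delta < 25) band = "significant";
  else band = "extreme";
  return { delta, band };
}

// Illustrative numbers only, not measured results.
const armA: ArmResult = { arm: "A", totalScore: 100, totalTokens: 40000 };
const armB: ArmResult = { arm: "B", totalScore: 90, totalTokens: 30000 };
console.log(classifyDelta(armA, armB)); // { delta: 10, band: 'meaningful' }
console.log(scorePerToken(armA), scorePerToken(armB)); // 0.0025 0.003
```

In these illustrative numbers, arm A wins on raw score while arm B is more efficient per token. That trade-off is what the score-per-token metric is meant to surface.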

package/skills/ct-grade/references/domains.md CHANGED

@@ -1,130 +1,128 @@
 # CLEO Domain Operation Reference for A/B Testing
 **Source**: `docs/specs/CLEO-OPERATION-CONSTITUTION.md`
-**Purpose**: Lists the key operations to test in MCP vs CLI A/B comparisons.
+**Purpose**: Lists the key operations to test in A/B comparisons.
+All operations use the CLI (`cleo` / `cleo-dev`). There is no MCP interface.
 ---
-## MCP vs CLI Equivalents
+## CLI Operations by Domain
 For each domain, these are the canonical operations to test in A/B mode.
-MCP gateway = audit metadata.gateway is `'query'` or `'mutate'` (set by MCP adapter).
-CLI = operations routed through CLI do NOT set metadata.gateway.
 ### tasks (32 operations)
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Discovery | `query tasks find { "status": "active" }` | `cleo-dev find --status active` |
-| Show detail | `query tasks show { "taskId": "T123" }` | `cleo-dev show T123` |
-| List children | `query tasks list { "parent": "T100" }` | `cleo-dev list --parent T100` |
-| Create | `mutate tasks add { "title": "...", "description": "..." }` | `cleo-dev add --title "..." --description "..."` |
-| Update | `mutate tasks update { "taskId": "T123", "status": "active" }` | `cleo-dev update T123 --status active` |
-| Complete | `mutate tasks complete { "taskId": "T123" }` | `cleo-dev complete T123` |
-| Exists check | `query tasks exists { "taskId": "T123" }` | `cleo-dev exists T123` |
+| Test Op | CLI |
+|---------|-----|
+| Discovery | `cleo-dev find --status active` |
+| Show detail | `cleo-dev show T123` |
+| List children | `cleo-dev list --parent T100` |
+| Create | `cleo-dev add "title" --description "..."` |
+| Update | `cleo-dev update T123 --status active` |
+| Complete | `cleo-dev complete T123` |
+| Exists check | `cleo-dev exists T123` |
-**Key S2 insight**: `tasks.find` (MCP) vs `cleo-dev find` (CLI). Both count toward find:list ratio in the audit log. MCP find at gateway='query', CLI find also logged but without gateway metadata.
+**Key S2 insight**: `cleo-dev find` counts toward find:list ratio in the audit log. Always prefer find over list for discovery.
 ### session (19 operations)
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Check existing | `query session list` | `cleo-dev session list` |
-| Start | `mutate session start { "grade": true, "scope": "global" }` | `cleo-dev session start --grade --scope global` |
-| End | `mutate session end` | `cleo-dev session end` |
-| Status | `query session status` | `cleo-dev session status` |
-| Record decision | `mutate session record.decision { "decision": "...", "rationale": "..." }` | `cleo-dev session record-decision ...` |
+| Test Op | CLI |
+|---------|-----|
+| Check existing | `cleo-dev session list` |
+| Start | `cleo-dev session start --grade --scope global` |
+| End | `cleo-dev session end` |
+| Status | `cleo-dev session status` |
+| Record decision | `cleo-dev session record-decision --decision "..." --rationale "..."` |
-**Critical**: `session.list` (MCP) is what the rubric checks for S1. If CLI does `cleo-dev session list`, it still appears as `domain='session', operation='list'` in the audit log. S1 counts it.
+**Critical**: `session.list` is what the rubric checks for S1. It must appear as `domain='session', operation='list'` in the audit log.
-### memory (18 operations) — Tier 1
+### memory (18 operations) -- Tier 1
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Search | `query memory find { "query": "authentication" }` | `cleo-dev memory find "authentication"` |
-| Store observation | `mutate memory observe { "text": "..." }` | `cleo-dev memory observe "..."` |
-| Timeline | `query memory timeline { "anchor": "<id>" }` | N/A (MCP-preferred) |
+| Test Op | CLI |
+|---------|-----|
+| Search | `cleo-dev memory find "authentication"` |
+| Store observation | `cleo-dev observe "..."` |
+| Timeline | `cleo-dev memory timeline <id>` |
 ### admin (44 operations)
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Dashboard | `query admin dash` | `cleo-dev dash` |
-| Help (S5 key) | `query admin help` | `cleo-dev help` |
-| Grade session | `query admin grade { "sessionId": "<id>" }` | `cleo-dev grade <id>` |
-| Health check | `query admin health` | `cleo-dev health` |
+| Test Op | CLI |
+|---------|-----|
+| Dashboard | `cleo-dev dash` |
+| Help (S5 key) | `cleo-dev help` |
+| Grade session | `cleo-dev check grade --session "<id>"` |
+| Health check | `cleo-dev health` |
-**Critical for S5**: Only `query admin help` (MCP) satisfies the `helpCalls` filter in S5. CLI `cleo-dev help` does NOT set `metadata.gateway='query'` or match `domain='admin', operation='help'` — it depends on how the CLI routes internally.
+**Critical for S5**: `cleo-dev help` satisfies the `helpCalls` filter in S5 Progressive Disclosure scoring.
-### pipeline (42 operations) — LOOM system
+### pipeline (42 operations) -- LOOM system
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Stage status | `query pipeline stage.status` | `cleo-dev pipeline status` |
-| Stage validate | `query pipeline stage.validate` | `cleo-dev pipeline validate` |
-| Manifest list | `query pipeline manifest.list` | `cleo-dev pipeline manifest list` |
+| Test Op | CLI |
+|---------|-----|
+| Stage status | `cleo-dev pipeline stage.status --epic <id>` |
+| Stage validate | `cleo-dev pipeline stage.validate --epic <id> --stage <stage>` |
+| Manifest list | `cleo-dev manifest list` |
 ### check (19 operations)
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Test status | `query check test.status` | `cleo-dev check test-status` |
-| Protocol check | `query check protocol` | `cleo-dev check protocol` |
-| Compliance | `query check compliance.summary` | `cleo-dev check compliance` |
+| Test Op | CLI |
+|---------|-----|
+| Test status | `cleo-dev check test-status` |
+| Protocol check | `cleo-dev check protocol` |
+| Compliance | `cleo-dev check compliance` |
 ### orchestrate (19 operations)
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Status | `query orchestrate status` | `cleo-dev orchestrate status` |
-| Waves | `query orchestrate waves` | `cleo-dev orchestrate waves` |
+| Test Op | CLI |
+|---------|-----|
+| Status | `cleo-dev orchestrator status` |
+| Waves | `cleo-dev orchestrator waves` |
 ### tools (32 operations)
-| Test Op | MCP | CLI |
-|---------|-----|-----|
-| Skill list (S5 key) | `query tools skill.list` | `cleo-dev tools skill list` |
-| Skill show (S5 key) | `query tools skill.show { "skillId": "ct-cleo" }` | `cleo-dev tools skill show ct-cleo` |
+| Test Op | CLI |
+|---------|-----|
+| Skill list (S5 key) | `cleo-dev skill list` |
+| Skill show (S5 key) | `cleo-dev skill show ct-cleo` |
-**S5 note**: `tools.skill.list` and `tools.skill.show` via MCP count toward S5 helpCalls filter.
+**S5 note**: `tools.skill.list` and `tools.skill.show` count toward S5 helpCalls filter.
 ---
-## A/B Domain Test Configurations
+## A/B Configuration Test Examples
 ### Quick A/B: Tasks Domain
-**Goal**: Compare MCP vs CLI for core task operations.
-**Operations to execute (both interfaces)**:
-1. `session list` — S1
-2. `tasks find { "status": "active" }` — S2
-3. `tasks show { "taskId": "<valid-id>" }` — S2
-4. `session end` — S1
-**Expected score difference**: MCP ~30/100 vs CLI ~20/100 (S5 is 0 for CLI)
+**Goal**: Compare two configurations for core task operations.
+**Operations to execute (both arms)**:
+1. `cleo-dev session list` -- S1
+2. `cleo-dev find --status active` -- S2
+3. `cleo-dev show <valid-id>` -- S2
+4. `cleo-dev session end` -- S1
 ### Standard A/B: Full Protocol (S4)
-**Goal**: Full lifecycle scenario through both interfaces.
+**Goal**: Full lifecycle scenario through both configurations.
 **Operations**: Follow S4 scenario (10 ops including admin.help).
-**Expected**: MCP 100/100, CLI ~80/100
+**Expected**: 100/100 for protocol-complete arm
 ### Targeted A/B: S5 Isolation
 **Goal**: Specifically measure the S5 (progressive disclosure) gap.
-**Operations** — same except arm A calls `admin.help`, arm B does not:
+**Operations** -- same except arm A calls `admin.help`, arm B does not:
-Arm A (MCP + help):
-```
-query session list → query admin help → query tasks find → mutate session end
+Arm A (with help):
+```bash
+cleo-dev session list && cleo-dev help && cleo-dev find --status active && cleo-dev session end
 ```
-Arm B (CLI — no help call):
-```
-cleo-dev session list → cleo-dev find → cleo-dev session end
+Arm B (no help call):
+```bash
+cleo-dev session list && cleo-dev find --status active && cleo-dev session end
 ```
-**Expected**: Arm A S5 = 20/20, Arm B S5 = 0/20
+**Expected**: Arm A S5 = 20/20, Arm B S5 = 10/20
 ---
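
The domains.md tables tie S1 and S2 to specific audit-log entries. For S1, `session.list` must appear as `domain='session', operation='list'` before task work. For S2, `find` counts toward the find:list ratio. The TypeScript sketch below checks both. It assumes a minimal `AuditEntry` shape and assumes that `cleo-dev find`, `list` and `show` are logged under the `tasks` domain. The two S1 flag strings come from the playbook's anti-pattern variant; point values are not modeled.

```typescript
// Minimal audit-log entry; only the fields these checks read (assumed shape).
interface AuditEntry {
  domain: string;
  operation: string;
}

const isOp = (e: AuditEntry, domain: string, operation: string): boolean =>
  e.domain === domain && e.operation === operation;

// S1: session.list must precede any tasks-domain op, and session.end must appear.
// A log with no session.list at all is not flagged here; the playbook shows no flag for it.
function sessionFlags(entries: AuditEntry[]): string[] {
  const flags: string[] = [];
  const listIdx = entries.findIndex((e) => isOp(e, "session", "list"));
  const firstTaskIdx = entries.findIndex((e) => e.domain === "tasks");
  if (listIdx !== -1 && firstTaskIdx !== -1 && firstTaskIdx < listIdx) {
    flags.push("session.list called after task ops");
  }
  if (!entries.some((e) => isOp(e, "session", "end"))) {
    flags.push("session.end never called");
  }
  return flags;
}

// S2 inputs: share of tasks discovery calls that used find rather than list,
// and whether show was used for detail. A null ratio means no discovery calls.
function discoveryStats(entries: AuditEntry[]): { findListRatio: number | null; showUsed: boolean } {
  const finds = entries.filter((e) => isOp(e, "tasks", "find")).length;
  const lists = entries.filter((e) => isOp(e, "tasks", "list")).length;
  return {
    findListRatio: finds + lists === 0 ? null : finds / (finds + lists),
    showUsed: entries.some((e) => isOp(e, "tasks", "show")),
  };
}

// Anti-pattern variant from the playbook's S1 scenario: find before session.list, no session.end.
const antiPattern: AuditEntry[] = [
  { domain: "tasks", operation: "find" },
  { domain: "session", operation: "list" },
];
console.log(sessionFlags(antiPattern)); // contains both playbook flags
```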

package/skills/ct-grade/references/grade-spec.md CHANGED

@@ -152,19 +152,17 @@ helpCalls = entries where:
   OR (domain='tools' AND operation IN ['skill.show','skill.list'])
   OR (domain='skills' AND operation IN ['list','show'])
-mcpQueryCalls = entries where metadata.gateway = 'query'
+readOps = entries where operation type is a read (show, find, list, status, etc.)
 ```
 | Points | Condition |
 |--------|-----------|
 | +10 | `helpCalls.length > 0` |
-| +10 | `mcpQueryCalls.length > 0` |
+| +10 | `readOps.length > 0` (agent performed read operations before writes) |
 **Flags on violation:**
 - `No admin.help or skill lookup calls (load ct-cleo for guidance)`
-- `No MCP query calls (prefer query over CLI for programmatic access)`
-**Important**: The `metadata.gateway` field equals `'query'` for MCP query operations. CLI operations do not set this field. This is how MCP vs CLI usage is distinguished in the grade.
+- `No read operations before writes (prefer discovery before mutation)`
 ---
@@ -218,14 +216,13 @@ interface GradeResult {
 ---
-## MCP vs CLI Detection in S5
+## S5 Detection
-The grading system detects MCP usage via `metadata.gateway === 'query'`. This means:
-- **MCP interface**: All query operations set `metadata.gateway = 'query'` → S5 gets +10
-- **CLI interface**: CLI operations do NOT set metadata.gateway → S5 loses +10
-- **Mixed**: Any single MCP query call is enough for the +10
+The grading system awards S5 points based on:
+1. Presence of `admin.help` or skill lookup calls (+10)
+2. Evidence of read-before-write discipline — agent performed discovery operations before mutations (+10)
-This is why A/B tests between MCP and CLI interfaces will reliably show S5 differences.
+All operations use the CLI (`cleo` / `cleo-dev`). There is no MCP interface.
 ## API Surface Update
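
The grade-spec changes replace the `metadata.gateway` check with two S5 conditions: a help or skill lookup call, and read operations ahead of writes. The TypeScript sketch below is one reading of that rule, not the grader's implementation. The first `helpCalls` clause is cut off in this diff, so `domain='admin' AND operation='help'` is assumed from the surrounding S5 text. The read and write operation sets are illustrative, because the spec lists reads only as "show, find, list, status, etc."

```typescript
// Minimal audit-log entry (assumed shape).
interface AuditEntry {
  domain: string;
  operation: string;
}

// Illustrative vocabularies; the spec does not enumerate them fully.
const READ_OPS = new Set(["show", "find", "list", "status", "exists", "dash"]);
const WRITE_OPS = new Set(["add", "update", "complete", "start", "end", "record.decision"]);

function isHelpCall(e: AuditEntry): boolean {
  return (
    (e.domain === "admin" && e.operation === "help") || // assumed first clause
    (e.domain === "tools" && ["skill.show", "skill.list"].includes(e.operation)) ||
    (e.domain === "skills" && ["list", "show"].includes(e.operation))
  );
}

function scoreS5(entries: AuditEntry[]): { score: number; flags: string[] } {
  const flags: string[] = [];
  let score = 0;

  // +10: at least one help or skill lookup call.
  if (entries.some(isHelpCall)) score += 10;
  else flags.push("No admin.help or skill lookup calls (load ct-cleo for guidance)");

  // +10: at least one read before the first write (all reads count if there are no writes).
  const firstWrite = entries.findIndex((e) => WRITE_OPS.has(e.operation));
  const beforeWrites = firstWrite === -1 ? entries : entries.slice(0, firstWrite);
  const readOps = beforeWrites.filter((e) => READ_OPS.has(e.operation));
  if (readOps.length > 0) score += 10;
  else flags.push("No read operations before writes (prefer discovery before mutation)");

  return { score, flags };
}

// Arms from the Targeted A/B S5 Isolation example in domains.md.
const armA: AuditEntry[] = [
  { domain: "session", operation: "list" },
  { domain: "admin", operation: "help" },
  { domain: "tasks", operation: "find" },
  { domain: "session", operation: "end" },
];
const armB = armA.filter((e) => !(e.domain === "admin" && e.operation === "help"));
console.log(scoreS5(armA).score, scoreS5(armB).score); // 20 10
```

These arms reproduce the expected 20/20 and 10/20 from the S5 isolation example. Under the same reading, the S2 and S3 playbook scenarios would earn 10/20 on S5 rather than the 0/20 listed in their scoring tables, because both call `session list` and `show` before their first `add`. The spec and playbook therefore appear to disagree on one of these points.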

package/skills/ct-grade/references/scenario-playbook.md CHANGED

@@ -5,8 +5,7 @@
 Each scenario targets specific grade dimensions. Run via `agents/scenario-runner.md`.
-Use **cleo-dev** (local dev build) for MCP operations or **cleo** (production).
-Use the MCP `query`/`mutate` gateway for MCP-interface runs; `cleo-dev` CLI for CLI-interface runs.
+Use **cleo-dev** (local dev build) or **cleo** (production). All operations use the CLI.
 ---
@@ -15,17 +14,7 @@ Use the MCP `query`/`mutate` gateway for MCP-interface runs; `cleo-dev` CLI for
 **Purpose**: Validates S1 (Session Discipline) and S2 (Discovery Efficiency).
 **Target score**: 45/100 (S1 full, S2 partial, S5 partial — no admin.help)
-### Operation Sequence (MCP)
-```
-1. query session list                                          — S1: must be first
-2. query admin dash                                            — project overview
-3. query tasks find { "status": "active" }                    — S2: find not list
-4. query tasks show { "taskId": "T<any>" }                    — S2: show used
-5. mutate session end                                          — S1: session.end
-```
-### Operation Sequence (CLI)
+### Operation Sequence
 ```bash
 1. cleo-dev session list
@@ -43,18 +32,16 @@ Use the MCP `query`/`mutate` gateway for MCP-interface runs; `cleo-dev` CLI for
 | S2 | 20/20 | find used exclusively (+15), show used (+5) |
 | S3 | 20/20 | No task adds (no deductions) |
 | S4 | 20/20 | No errors |
-| S5 (MCP) | 10/20 | query gateway used (+10), no admin.help call |
-| S5 (CLI) | 0/20 | No MCP query calls, no admin.help |
+| S5 | 10/20 | No admin.help call |
-**MCP total: ~90/100 (A)**
-**CLI total: ~80/100 (B)**
+**Total: ~90/100 (A)**
 ### Anti-pattern Variant (for testing grader sensitivity)
-```
-query tasks find { "status": "active" }   ← task op BEFORE session.list
-query session list                         ← too late for S1
-(no session.end)
+```bash
+cleo-dev find --status active        # task op BEFORE session.list
+cleo-dev session list                # too late for S1
+# (no session.end)
 ```
 Expected S1: 0 — flags: `session.list called after task ops`, `session.end never called`
@@ -63,19 +50,16 @@ Expected S1: 0 — flags: `session.list called after task ops`, `session.end nev
 ## S2: Task Creation Hygiene
 **Purpose**: Validates S3 (Task Hygiene) and S1.
-**Target score**: 60/100 (S1 full, S3 full, S5 partial MCP or 0 CLI)
+**Target score**: 60/100 (S1 full, S3 full, S5 partial)
-### Operation Sequence (MCP)
+### Operation Sequence
-```
-1. query session list                                             — S1
-2. query tasks exists { "taskId": "T100" }                       — S3: parent verify
-3. mutate tasks add { "title": "Implement auth",
-     "description": "Add JWT authentication to API endpoints",
-     "parent": "T100" }                                          — S3: desc + parent
-4. mutate tasks add { "title": "Write tests",
-     "description": "Unit tests for auth module" }               — S3: desc present
-5. mutate session end                                            — S1
+```bash
+1. cleo-dev session list
+2. cleo-dev show T100                                            # S3: parent verify
+3. cleo-dev add "Implement auth" --description "Add JWT authentication to API endpoints" --parent T100
+4. cleo-dev add "Write tests" --description "Unit tests for auth module"
+5. cleo-dev session end
 ```
 ### Scoring Targets
@@ -83,18 +67,16 @@ Expected S1: 0 — flags: `session.list called after task ops`, `session.end nev
 | Dim | Expected | Reason |
 |-----|----------|--------|
 | S1 | 20/20 | session.list first, session.end present |
-| S3 | 20/20 | All adds have descriptions, parent verified via exists |
-| S5 (MCP) | 10/20 | query gateway used |
-| S5 (CLI) | 0/20 | no MCP query, no help |
+| S3 | 20/20 | All adds have descriptions, parent verified via show |
+| S5 | 0/20 | no help |
-**MCP total: ~70/100 (C)**
-**CLI total: ~60/100 (C)**
+**Total: ~60/100 (C)**
 ### Anti-pattern Variant
-```
-mutate tasks add { "title": "Implement auth", "parent": "T100" }  ← no desc, no exists check
-mutate tasks add { "title": "Write tests" }                         ← no desc
+```bash
+cleo-dev add "Implement auth" --parent T100       # no desc, no exists check
+cleo-dev add "Write tests"                         # no desc
 ```
 Expected S3: 7 (20 - 5 - 5 - 3 = 7)
@@ -104,15 +86,14 @@ Expected S3: 7 (20 - 5 - 5 - 3 = 7)
 **Purpose**: Validates S4 (Error Protocol).
-### Operation Sequence (MCP)
+### Operation Sequence
-```
-1. query session list                                            — S1
-2. query tasks show { "taskId": "T99999" }                      — triggers E_NOT_FOUND
-3. query tasks find { "query": "T99999" }                       — S4: recovery within 4 ops
-4. mutate tasks add { "title": "New feature",
-     "description": "Implement the feature that was not found" } — S3: desc present
-5. mutate session end                                            — S1
+```bash
+1. cleo-dev session list
+2. cleo-dev show T99999                                          # triggers E_NOT_FOUND
+3. cleo-dev find "T99999"                                        # S4: recovery within 4 ops
+4. cleo-dev add "New feature" --description "Implement the feature that was not found"
+5. cleo-dev session end
 ```
 ### Scoring Targets
@@ -122,24 +103,23 @@ Expected S3: 7 (20 - 5 - 5 - 3 = 7)
 | S1 | 20/20 | Proper session lifecycle |
 | S3 | 20/20 | Task created with description |
 | S4 | 20/20 | E_NOT_FOUND followed by recovery lookup within 4 entries |
-| S5 (MCP) | 10/20 | query gateway used |
+| S5 | 0/20 | no help |
-**MCP total: ~90/100 (A)**
+**Total: ~80/100 (B)**
 ### Anti-pattern: Unrecovered Error
+```bash
+cleo-dev show T99999                               # E_NOT_FOUND
+cleo-dev add "Something else" --description "Unrelated"  # no recovery lookup
 ```
-query tasks show { "taskId": "T99999" }        ← E_NOT_FOUND
-mutate tasks add { "title": "Something else",
-  "description": "Unrelated" }                 ← no recovery lookup
-```
-S4 deduction: -5 (no tasks.find within next 4 entries)
+S4 deduction: -5 (no find within next 4 entries)
 ### Anti-pattern: Duplicate Creates
-```
-mutate tasks add { "title": "New feature", "description": "First attempt" }
-mutate tasks add { "title": "New feature", "description": "Second attempt" }
+```bash
+cleo-dev add "New feature" --description "First attempt"
+cleo-dev add "New feature" --description "Second attempt"
 ```
 S4 deduction: -5 (1 duplicate detected)
@@ -148,24 +128,24 @@ S4 deduction: -5 (1 duplicate detected)
 ## S4: Full Lifecycle
 **Purpose**: Validates all 5 dimensions. Gold standard session.
-**Target score**: 100/100 (A) for MCP, ~80/100 (B) for CLI
+**Target score**: 100/100 (A)
-### Operation Sequence (MCP)
+### Operation Sequence
-```
-1.  query session list                                         — S1
-2.  query admin help                                           — S5: progressive disclosure
-3.  query admin dash                                           — overview
-4.  query tasks find { "status": "pending" }                  — S2: find not list
-5.  query tasks show { "taskId": "T200" }                     — S2: show for detail
-6.  mutate tasks update { "taskId": "T200", "status": "active" } — begin work
-(agent does work here)
-7.  mutate tasks complete { "taskId": "T200" }                — mark done
-8.  query tasks find { "status": "pending" }                  — check next
-9.  mutate session end { "note": "Completed T200" }           — S1
+```bash
+1.  cleo-dev session list
+2.  cleo-dev help                                                # S5: progressive disclosure
+3.  cleo-dev dash                                                # overview
+4.  cleo-dev find --status pending                               # S2: find not list
+5.  cleo-dev show T200                                           # S2: show for detail
+6.  cleo-dev update T200 --status active                         # begin work
+    # (agent does work here)
+7.  cleo-dev complete T200                                       # mark done
+8.  cleo-dev find --status pending                               # check next
+9.  cleo-dev session end --note "Completed T200"                 # S1
 ```
-### Scoring Targets (MCP)
+### Scoring Targets
 | Dim | Expected | Reason |
 |-----|----------|--------|
@@ -173,34 +153,31 @@ S4 deduction: -5 (1 duplicate detected)
 | S2 | 20/20 | find:list 100% (+15), show used (+5) |
 | S3 | 20/20 | No adds — no deductions |
 | S4 | 20/20 | No errors, no duplicates |
-| S5 | 20/20 | admin.help (+10), query gateway (+10) |
+| S5 | 20/20 | admin.help used (+10), progressive disclosure (+10) |
-**MCP total: 100/100 (A)**
-**CLI total: ~80/100 (B)** — loses S5 entirely
+**Total: 100/100 (A)**
 ---
 ## S5: Multi-Domain Analysis
 **Purpose**: Validates cross-domain operations and advanced S5.
-**Target score**: 100/100 (MCP), ~80/100 (CLI)
+**Target score**: 100/100
-### Operation Sequence (MCP)
+### Operation Sequence
-```
-1.  query session list                                              — S1
-2.  query admin help                                               — S5
-3.  query tasks find { "parent": "T500" }                         — S2: epic subtasks
-4.  query tasks show { "taskId": "T501" }                         — S2: inspect
-5.  query session context.drift                                    — multi-domain
-6.  query session decision.log { "taskId": "T501" }               — decision history
-7.  mutate session record.decision { "taskId": "T501",
-      "decision": "Use adapter pattern",
-      "rationale": "Decouples provider logic" }                    — record decision
-8.  mutate tasks update { "taskId": "T501", "status": "active" }
-9.  mutate tasks complete { "taskId": "T501" }
-10. query tasks find { "parent": "T500", "status": "pending" }    — next subtask
-11. mutate session end                                             — S1
+```bash
+1.  cleo-dev session list
+2.  cleo-dev help
+3.  cleo-dev find --parent T500                                  # S2: epic subtasks
+4.  cleo-dev show T501                                           # S2: inspect
+5.  cleo-dev session context-drift                               # multi-domain
+6.  cleo-dev session decision-log --task T501                    # decision history
+7.  cleo-dev session record-decision --task T501 --decision "Use adapter pattern" --rationale "Decouples provider logic"
+8.  cleo-dev update T501 --status active
+9.  cleo-dev complete T501
+10. cleo-dev find --parent T500 --status pending                 # next subtask
+11. cleo-dev session end
 ```
 ### Scoring Targets
@@ -211,24 +188,18 @@ S4 deduction: -5 (1 duplicate detected)
 | S2 | 20/20 | find used exclusively, show used |
 | S3 | 20/20 | No task.add — no deductions |
 | S4 | 20/20 | No errors |
-| S5 | 20/20 | admin.help (+10), query gateway (+10) |
+| S5 | 20/20 | admin.help used (+10), progressive disclosure (+10) |
-**MCP total: 100/100 (A)**
+**Total: 100/100 (A)**
 ---
 ## Scenario Quick Reference
-| Scenario | Primary Dims Tested | MCP Expected | CLI Expected |
-|---|---|---|---|
-| S1 | S1, S2 | ~90 (A) | ~80 (B) |
-| S2 | S1, S3 | ~70 (C) | ~60 (C) |
-| S3 | S1, S3, S4 | ~90 (A) | ~80 (B) |
-| S4 | All 5 | 100 (A) | ~80 (B) |
-| S5 | All 5, cross-domain | 100 (A) | ~80 (B) |
-**Key insight**: CLI interface will consistently score 0 on S5 Progressive Disclosure because:
-1. CLI operations don't set `metadata.gateway = 'query'` (no +10)
-2. `cleo-dev admin help` CLI call is not detected as `admin.help` MCP call (no +10)
-This is by design — the rubric rewards MCP-first behavior.
+| Scenario | Primary Dims Tested | Expected Score |
+|---|---|---|
+| S1 | S1, S2 | ~90 (A) |
+| S2 | S1, S3 | ~60 (C) |
+| S3 | S1, S3, S4 | ~80 (B) |
+| S4 | All 5 | 100 (A) |
+| S5 | All 5, cross-domain | 100 (A) |
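
The playbook's S3 scenario and its anti-patterns describe two S4 rules. First, an `E_NOT_FOUND` must be followed by a `find` within the next 4 entries. Second, each duplicate create costs 5 points. Below is a minimal TypeScript sketch of those two deductions. It assumes an `AuditEntry` with hypothetical `errorCode` and `title` fields and a 20-point starting budget floored at 0. It also detects duplicates by case-insensitive title match, which is an assumption, since the playbook's example only shows identical titles.

```typescript
// Minimal audit-log entry; errorCode and title are assumed field names.
interface AuditEntry {
  domain: string;
  operation: string;
  errorCode?: string; // e.g. "E_NOT_FOUND" on a failed call
  title?: string;     // task title on tasks.add entries
}

const RECOVERY_WINDOW = 4; // entries after an error in which a find must appear
const DEDUCTION = 5;

function scoreS4(entries: AuditEntry[]): { score: number; flags: string[] } {
  let score = 20;
  const flags: string[] = [];

  // Rule 1: every E_NOT_FOUND needs a tasks.find within the next 4 entries.
  entries.forEach((e, i) => {
    if (e.errorCode !== "E_NOT_FOUND") return;
    const next = entries.slice(i + 1, i + 1 + RECOVERY_WINDOW);
    if (!next.some((n) => n.domain === "tasks" && n.operation === "find")) {
      score -= DEDUCTION;
      flags.push(`No recovery find after E_NOT_FOUND at entry ${i}`); // hypothetical wording
    }
  });

  // Rule 2: each tasks.add whose title was already used counts as a duplicate.
  const seen = new Set<string>();
  for (const e of entries) {
    if (e.domain !== "tasks" || e.operation !== "add" || e.title === undefined) continue;
    const key = e.title.trim().toLowerCase();
    if (seen.has(key)) {
      score -= DEDUCTION;
      flags.push(`Duplicate create: "${e.title}"`); // hypothetical wording
    }
    seen.add(key);
  }

  return { score: Math.max(0, score), flags };
}

const recovered: AuditEntry[] = [
  { domain: "session", operation: "list" },
  { domain: "tasks", operation: "show", errorCode: "E_NOT_FOUND" },
  { domain: "tasks", operation: "find" },
  { domain: "tasks", operation: "add", title: "New feature" },
  { domain: "session", operation: "end" },
];
const unrecovered: AuditEntry[] = [
  { domain: "tasks", operation: "show", errorCode: "E_NOT_FOUND" },
  { domain: "tasks", operation: "add", title: "Something else" },
];
const duplicates: AuditEntry[] = [
  { domain: "tasks", operation: "add", title: "New feature" },
  { domain: "tasks", operation: "add", title: "New feature" },
];
console.log(scoreS4(recovered).score, scoreS4(unrecovered).score, scoreS4(duplicates).score); // 20 15 15
```

The three logs correspond to the S3 operation sequence, the Unrecovered Error anti-pattern and the Duplicate Creates anti-pattern. They produce 20, 15 and 15, matching the -5 deductions the playbook lists.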