@cleocode/skills 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (171)
  1. package/dispatch-config.json +404 -0
  2. package/index.d.ts +178 -0
  3. package/index.js +405 -0
  4. package/package.json +14 -0
  5. package/profiles/core.json +7 -0
  6. package/profiles/full.json +10 -0
  7. package/profiles/minimal.json +7 -0
  8. package/profiles/recommended.json +7 -0
  9. package/provider-skills-map.json +97 -0
  10. package/skills/_shared/cleo-style-guide.md +84 -0
  11. package/skills/_shared/manifest-operations.md +810 -0
  12. package/skills/_shared/placeholders.json +433 -0
  13. package/skills/_shared/skill-chaining-patterns.md +237 -0
  14. package/skills/_shared/subagent-protocol-base.md +223 -0
  15. package/skills/_shared/task-system-integration.md +232 -0
  16. package/skills/_shared/testing-framework-config.md +110 -0
  17. package/skills/ct-cleo/SKILL.md +490 -0
  18. package/skills/ct-cleo/references/anti-patterns.md +19 -0
  19. package/skills/ct-cleo/references/loom-lifecycle.md +136 -0
  20. package/skills/ct-cleo/references/orchestrator-constraints.md +55 -0
  21. package/skills/ct-cleo/references/session-protocol.md +162 -0
  22. package/skills/ct-codebase-mapper/SKILL.md +82 -0
  23. package/skills/ct-contribution/SKILL.md +521 -0
  24. package/skills/ct-contribution/templates/contribution-init.json +21 -0
  25. package/skills/ct-dev-workflow/SKILL.md +423 -0
  26. package/skills/ct-docs-lookup/SKILL.md +66 -0
  27. package/skills/ct-docs-review/SKILL.md +175 -0
  28. package/skills/ct-docs-write/SKILL.md +108 -0
  29. package/skills/ct-documentor/SKILL.md +231 -0
  30. package/skills/ct-epic-architect/SKILL.md +305 -0
  31. package/skills/ct-epic-architect/references/bug-epic-example.md +172 -0
  32. package/skills/ct-epic-architect/references/commands.md +201 -0
  33. package/skills/ct-epic-architect/references/feature-epic-example.md +210 -0
  34. package/skills/ct-epic-architect/references/migration-epic-example.md +244 -0
  35. package/skills/ct-epic-architect/references/output-format.md +92 -0
  36. package/skills/ct-epic-architect/references/patterns.md +284 -0
  37. package/skills/ct-epic-architect/references/refactor-epic-example.md +412 -0
  38. package/skills/ct-epic-architect/references/research-epic-example.md +226 -0
  39. package/skills/ct-epic-architect/references/shell-escaping.md +86 -0
  40. package/skills/ct-epic-architect/references/skill-aware-execution.md +195 -0
  41. package/skills/ct-grade/SKILL.md +230 -0
  42. package/skills/ct-grade/agents/analysis-reporter.md +203 -0
  43. package/skills/ct-grade/agents/blind-comparator.md +157 -0
  44. package/skills/ct-grade/agents/scenario-runner.md +134 -0
  45. package/skills/ct-grade/eval-viewer/__pycache__/generate_grade_review.cpython-314.pyc +0 -0
  46. package/skills/ct-grade/eval-viewer/generate_grade_review.py +1138 -0
  47. package/skills/ct-grade/eval-viewer/generate_grade_viewer.py +544 -0
  48. package/skills/ct-grade/eval-viewer/generate_review.py +283 -0
  49. package/skills/ct-grade/eval-viewer/grade-review.html +1574 -0
  50. package/skills/ct-grade/eval-viewer/viewer.html +219 -0
  51. package/skills/ct-grade/evals/evals.json +94 -0
  52. package/skills/ct-grade/references/ab-test-methodology.md +150 -0
  53. package/skills/ct-grade/references/domains.md +137 -0
  54. package/skills/ct-grade/references/grade-spec.md +236 -0
  55. package/skills/ct-grade/references/scenario-playbook.md +234 -0
  56. package/skills/ct-grade/references/token-tracking.md +120 -0
  57. package/skills/ct-grade/scripts/__pycache__/audit_analyzer.cpython-314.pyc +0 -0
  58. package/skills/ct-grade/scripts/__pycache__/run_ab_test.cpython-314.pyc +0 -0
  59. package/skills/ct-grade/scripts/__pycache__/run_all.cpython-314.pyc +0 -0
  60. package/skills/ct-grade/scripts/__pycache__/token_tracker.cpython-314.pyc +0 -0
  61. package/skills/ct-grade/scripts/audit_analyzer.py +279 -0
  62. package/skills/ct-grade/scripts/generate_report.py +283 -0
  63. package/skills/ct-grade/scripts/run_ab_test.py +504 -0
  64. package/skills/ct-grade/scripts/run_all.py +287 -0
  65. package/skills/ct-grade/scripts/setup_run.py +183 -0
  66. package/skills/ct-grade/scripts/token_tracker.py +630 -0
  67. package/skills/ct-grade-v2-1/SKILL.md +237 -0
  68. package/skills/ct-grade-v2-1/agents/analysis-reporter.md +203 -0
  69. package/skills/ct-grade-v2-1/agents/blind-comparator.md +157 -0
  70. package/skills/ct-grade-v2-1/agents/scenario-runner.md +179 -0
  71. package/skills/ct-grade-v2-1/evals/evals.json +74 -0
  72. package/skills/ct-grade-v2-1/grade-viewer/__pycache__/build_op_stats.cpython-314.pyc +0 -0
  73. package/skills/ct-grade-v2-1/grade-viewer/__pycache__/generate_grade_review.cpython-314.pyc +0 -0
  74. package/skills/ct-grade-v2-1/grade-viewer/build_op_stats.py +174 -0
  75. package/skills/ct-grade-v2-1/grade-viewer/eval-analysis.json +41 -0
  76. package/skills/ct-grade-v2-1/grade-viewer/eval-report.md +34 -0
  77. package/skills/ct-grade-v2-1/grade-viewer/generate_grade_review.py +1023 -0
  78. package/skills/ct-grade-v2-1/grade-viewer/generate_grade_viewer.py +548 -0
  79. package/skills/ct-grade-v2-1/grade-viewer/grade-review-eval.html +613 -0
  80. package/skills/ct-grade-v2-1/grade-viewer/grade-review.html +1532 -0
  81. package/skills/ct-grade-v2-1/grade-viewer/viewer.html +620 -0
  82. package/skills/ct-grade-v2-1/manifest-entry.json +31 -0
  83. package/skills/ct-grade-v2-1/references/ab-testing.md +233 -0
  84. package/skills/ct-grade-v2-1/references/domains-ssot.md +156 -0
  85. package/skills/ct-grade-v2-1/references/grade-spec-v2.md +167 -0
  86. package/skills/ct-grade-v2-1/references/playbook-v2.md +393 -0
  87. package/skills/ct-grade-v2-1/references/token-tracking.md +202 -0
  88. package/skills/ct-grade-v2-1/scripts/generate_report.py +419 -0
  89. package/skills/ct-grade-v2-1/scripts/run_ab_test.py +493 -0
  90. package/skills/ct-grade-v2-1/scripts/run_scenario.py +396 -0
  91. package/skills/ct-grade-v2-1/scripts/setup_run.py +207 -0
  92. package/skills/ct-grade-v2-1/scripts/token_tracker.py +175 -0
  93. package/skills/ct-memory/SKILL.md +84 -0
  94. package/skills/ct-orchestrator/INSTALL.md +61 -0
  95. package/skills/ct-orchestrator/README.md +69 -0
  96. package/skills/ct-orchestrator/SKILL.md +380 -0
  97. package/skills/ct-orchestrator/manifest-entry.json +19 -0
  98. package/skills/ct-orchestrator/orchestrator-prompt.txt +17 -0
  99. package/skills/ct-orchestrator/references/SUBAGENT-PROTOCOL-BLOCK.md +66 -0
  100. package/skills/ct-orchestrator/references/autonomous-operation.md +167 -0
  101. package/skills/ct-orchestrator/references/lifecycle-gates.md +98 -0
  102. package/skills/ct-orchestrator/references/orchestrator-compliance.md +271 -0
  103. package/skills/ct-orchestrator/references/orchestrator-handoffs.md +85 -0
  104. package/skills/ct-orchestrator/references/orchestrator-patterns.md +164 -0
  105. package/skills/ct-orchestrator/references/orchestrator-recovery.md +113 -0
  106. package/skills/ct-orchestrator/references/orchestrator-spawning.md +271 -0
  107. package/skills/ct-orchestrator/references/orchestrator-tokens.md +180 -0
  108. package/skills/ct-research-agent/SKILL.md +226 -0
  109. package/skills/ct-skill-creator/.cleo/.context-state.json +13 -0
  110. package/skills/ct-skill-creator/.cleo/logs/cleo.2026-03-07.1.log +24 -0
  111. package/skills/ct-skill-creator/.cleo/tasks.db +0 -0
  112. package/skills/ct-skill-creator/SKILL.md +356 -0
  113. package/skills/ct-skill-creator/agents/analyzer.md +276 -0
  114. package/skills/ct-skill-creator/agents/comparator.md +204 -0
  115. package/skills/ct-skill-creator/agents/grader.md +225 -0
  116. package/skills/ct-skill-creator/assets/eval_review.html +146 -0
  117. package/skills/ct-skill-creator/eval-viewer/__pycache__/generate_review.cpython-314.pyc +0 -0
  118. package/skills/ct-skill-creator/eval-viewer/generate_review.py +471 -0
  119. package/skills/ct-skill-creator/eval-viewer/viewer.html +1325 -0
  120. package/skills/ct-skill-creator/manifest-entry.json +17 -0
  121. package/skills/ct-skill-creator/references/dynamic-context.md +228 -0
  122. package/skills/ct-skill-creator/references/frontmatter.md +83 -0
  123. package/skills/ct-skill-creator/references/invocation-control.md +165 -0
  124. package/skills/ct-skill-creator/references/output-patterns.md +86 -0
  125. package/skills/ct-skill-creator/references/provider-deployment.md +175 -0
  126. package/skills/ct-skill-creator/references/schemas.md +430 -0
  127. package/skills/ct-skill-creator/references/workflows.md +28 -0
  128. package/skills/ct-skill-creator/scripts/__init__.py +1 -0
  129. package/skills/ct-skill-creator/scripts/__pycache__/__init__.cpython-314.pyc +0 -0
  130. package/skills/ct-skill-creator/scripts/__pycache__/aggregate_benchmark.cpython-314.pyc +0 -0
  131. package/skills/ct-skill-creator/scripts/__pycache__/generate_report.cpython-314.pyc +0 -0
  132. package/skills/ct-skill-creator/scripts/__pycache__/improve_description.cpython-314.pyc +0 -0
  133. package/skills/ct-skill-creator/scripts/__pycache__/init_skill.cpython-314.pyc +0 -0
  134. package/skills/ct-skill-creator/scripts/__pycache__/quick_validate.cpython-314.pyc +0 -0
  135. package/skills/ct-skill-creator/scripts/__pycache__/run_eval.cpython-314.pyc +0 -0
  136. package/skills/ct-skill-creator/scripts/__pycache__/run_loop.cpython-314.pyc +0 -0
  137. package/skills/ct-skill-creator/scripts/__pycache__/utils.cpython-314.pyc +0 -0
  138. package/skills/ct-skill-creator/scripts/aggregate_benchmark.py +401 -0
  139. package/skills/ct-skill-creator/scripts/generate_report.py +326 -0
  140. package/skills/ct-skill-creator/scripts/improve_description.py +247 -0
  141. package/skills/ct-skill-creator/scripts/init_skill.py +306 -0
  142. package/skills/ct-skill-creator/scripts/package_skill.py +110 -0
  143. package/skills/ct-skill-creator/scripts/quick_validate.py +97 -0
  144. package/skills/ct-skill-creator/scripts/run_eval.py +310 -0
  145. package/skills/ct-skill-creator/scripts/run_loop.py +328 -0
  146. package/skills/ct-skill-creator/scripts/utils.py +47 -0
  147. package/skills/ct-skill-validator/SKILL.md +178 -0
  148. package/skills/ct-skill-validator/agents/ecosystem-checker.md +151 -0
  149. package/skills/ct-skill-validator/assets/valid-skill-example.md +13 -0
  150. package/skills/ct-skill-validator/evals/eval_set.json +14 -0
  151. package/skills/ct-skill-validator/evals/evals.json +52 -0
  152. package/skills/ct-skill-validator/manifest-entry.json +20 -0
  153. package/skills/ct-skill-validator/references/cleo-ecosystem-rules.md +163 -0
  154. package/skills/ct-skill-validator/references/validation-rules.md +168 -0
  155. package/skills/ct-skill-validator/scripts/__init__.py +0 -0
  156. package/skills/ct-skill-validator/scripts/__pycache__/audit_body.cpython-314.pyc +0 -0
  157. package/skills/ct-skill-validator/scripts/__pycache__/check_ecosystem.cpython-314.pyc +0 -0
  158. package/skills/ct-skill-validator/scripts/__pycache__/generate_validation_report.cpython-314.pyc +0 -0
  159. package/skills/ct-skill-validator/scripts/__pycache__/validate.cpython-314.pyc +0 -0
  160. package/skills/ct-skill-validator/scripts/audit_body.py +242 -0
  161. package/skills/ct-skill-validator/scripts/check_ecosystem.py +169 -0
  162. package/skills/ct-skill-validator/scripts/check_manifest.py +172 -0
  163. package/skills/ct-skill-validator/scripts/generate_validation_report.py +442 -0
  164. package/skills/ct-skill-validator/scripts/validate.py +422 -0
  165. package/skills/ct-spec-writer/SKILL.md +189 -0
  166. package/skills/ct-stickynote/README.md +14 -0
  167. package/skills/ct-stickynote/SKILL.md +46 -0
  168. package/skills/ct-task-executor/SKILL.md +296 -0
  169. package/skills/ct-validator/SKILL.md +216 -0
  170. package/skills/manifest.json +469 -0
  171. package/skills.json +281 -0
@@ -0,0 +1,86 @@ package/skills/ct-epic-architect/references/shell-escaping.md

# Shell Escaping Reference

When passing text to CLEO commands via `--notes`, `--description`, or other text fields, certain characters require escaping to prevent shell interpretation.

---

## Quick Reference

| Character | Escape As | Example |
|-----------|-----------|---------|
| `$` | `\$` | `\$100`, `\$HOME` |
| Backtick | ``\` `` | Code blocks |
| `"` | `\"` | Nested quotes |
| Exclamation | `\!` | History expansion (bash) |

---

## Common Patterns

### Dollar Signs in Notes

```bash
# CORRECT - escaped dollar sign
cleo add "Task" --notes "Cost estimate: \$500 per user"
cleo add "Task" --description "Process \$DATA variable"

# WRONG - $500 and $DATA interpreted as shell variables
cleo add "Task" --notes "Cost estimate: $500 per user"
cleo add "Task" --description "Process $DATA variable"
```

### Quotes in Text

```bash
# CORRECT - escaped inner quotes
cleo add "Task" --notes "User said \"hello world\""

# Alternative - use single quotes for outer
cleo add 'Task with "quotes" inside'
```

### Backticks for Code

```bash
# CORRECT - escaped backticks
cleo add "Task" --notes "Run \`npm install\` first"

# Alternative - single quotes keep backticks literal
cleo add "Task" --notes 'Run `npm install` first'
```

---

## Validation Errors

If you see validation exit code 6 (`E_VALIDATION_*`), check for unescaped special characters in your text fields. The shell may have interpolated variables before CLEO received the input.

### Debugging Tips

1. Echo your command first to see what the shell produces:

   ```bash
   echo "cleo add \"Task\" --notes \"Price: $500\""
   # Shows: cleo add "Task" --notes "Price: 00"
   # The shell expanded $5 (an unset positional parameter) to nothing, leaving 00
   ```

2. Use single quotes when escaping is complex:

   ```bash
   cleo add 'Task' --notes 'Price: $500 for "premium" tier'
   ```

---

## HEREDOC for Complex Text

For multi-line or heavily quoted content, use a HEREDOC:

```bash
cleo update T001 --notes "$(cat <<'EOF'
Complex notes with $variables and "quotes"
that don't need escaping inside HEREDOC.
EOF
)"
```

Note: Use `<<'EOF'` (quoted) to prevent variable expansion, or `<<EOF` (unquoted) if you want variables expanded.
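
The behavioral difference is easy to check in a plain shell; a minimal sketch (the `greeting` variable is illustrative):

```shell
greeting="hello"

# Quoted delimiter: the body is taken literally, $greeting is NOT expanded
quoted=$(cat <<'EOF'
value: $greeting
EOF
)

# Unquoted delimiter: the shell expands $greeting inside the body
unquoted=$(cat <<EOF
value: $greeting
EOF
)

echo "$quoted"    # value: $greeting
echo "$unquoted"  # value: hello
```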
@@ -0,0 +1,195 @@ package/skills/ct-epic-architect/references/skill-aware-execution.md

# Skill-Aware Epic Execution Patterns

This reference documents how to integrate the ct-epic-architect skill with orchestrators, subagents, and CLEO research commands.

---

## Orchestrator Workflow

### When to Invoke ct-epic-architect

Use the ct-epic-architect skill when the user's request involves:

| Trigger | Example Request | Action |
|---------|-----------------|--------|
| Epic creation | "Create an epic for user authentication" | Invoke `/ct-epic-architect` |
| Task decomposition | "Break down this project into tasks" | Invoke `/ct-epic-architect` |
| Dependency planning | "Plan the dependency order for this feature" | Invoke `/ct-epic-architect` |
| Wave analysis | "What can run in parallel?" | Invoke `/ct-epic-architect` |
| Sprint planning | "Plan the sprint backlog" | Invoke `/ct-epic-architect` |

### How to Invoke

```
# Via Skill tool
Skill(skill="ct-epic-architect")

# Via slash command
/ct-epic-architect
```

**What Loads:**
1. SKILL.md body (480 lines of core instructions)
2. Access to references/ files (loaded on-demand when Claude reads them)

### Decision Tree

```
User Request
      │
      ▼
┌───────────────────────────────┐
│ Is this about epic/task       │
│ planning and decomposition?   │
└───────────────────────────────┘
      │
      ├── YES ─► Invoke /ct-epic-architect
      │
      └── NO ─► Handle directly or use other skill
```

---

## Subagent Skill Specification

### Why Subagents Don't Inherit Skills

Per the skill system architecture:
- **Skills are session-scoped** - They load into the CURRENT context
- **Subagents are NEW contexts** - They don't automatically get parent skills
- **This is intentional** - Prevents context bloat and skill pollution

### When Subagents SHOULD Have ct-epic-architect

| Scenario | Should Have Skill? | Rationale |
|----------|-------------------|-----------|
| Coder implementing a task | No | Coders execute, not plan |
| Researcher gathering info | No | Researchers investigate, not plan |
| Nested orchestrator | Yes | Needs planning capability |
| Epic architect subagent | Yes | Primary function requires it |

### Declaring Skills for Subagents

When spawning a subagent that needs ct-epic-architect:

```markdown
# In subagent prompt frontmatter (if using template)
---
name: nested-orchestrator
skills:
  - ct-epic-architect
  - orchestrator
---
```

Or via Task tool:

```
Task(
  subagent_type="general-purpose",
  prompt="""
  You have access to the ct-epic-architect skill.
  Use /ct-epic-architect when you need to create epics.

  [Task description]
  """
)
```

---

## CLEO Research Integration

### Epic-Architect Uses Research Commands

Before creating epics, ct-epic-architect SHOULD check existing research:

```bash
# Check for related research before planning
{{TASK_RESEARCH_LIST_CMD}} --status complete --topic {{DOMAIN}}
{{TASK_RESEARCH_SHOW_CMD}} {{RESEARCH_ID}}

# Link epic to research after creation
{{TASK_LINK_CMD}} {{EPIC_ID}} {{RESEARCH_ID}}
```

### Research Output Protocol for Epic Creation

When ct-epic-architect creates an epic, it follows the subagent protocol:

1. **Write output file**: `{{OUTPUT_DIR}}/{{DATE}}_epic-{{FEATURE_SLUG}}.md`
2. **Append manifest entry**: Single line JSON to `{{MANIFEST_PATH}}`
3. **Return summary only**: "Epic created. See MANIFEST.jsonl for summary."

### Querying Prior Research

Epic-architect can leverage prior research for informed planning:

```bash
# Find research related to the epic domain
{{TASK_RESEARCH_LIST_CMD}} --topic "authentication"

# Get key findings from specific research
{{TASK_RESEARCH_SHOW_CMD}} research-auth-options-2026-01-15

# The key_findings array informs epic structure without loading full content
```

---

## Integration with Orchestrator Skill

### Orchestrator → Epic-Architect Flow

```
Orchestrator receives "plan authentication feature"
        │
        ▼
Orchestrator spawns ct-epic-architect subagent
        │
        ▼
Epic-architect creates epic and tasks in CLEO
        │
        ▼
Epic-architect writes to manifest
        │
        ▼
Orchestrator reads manifest key_findings
        │
        ▼
Orchestrator proceeds with task execution
```

### Context Protection

The orchestrator skill enforces context budget (ORC-005). When spawning ct-epic-architect:

1. **Pass minimal context** - Epic-architect reads full task details itself
2. **Receive minimal response** - Only manifest summary returned
3. **Query manifest for details** - Don't ask ct-epic-architect for full breakdown

```bash
# Orchestrator queries manifest for epic details
tail -1 {{MANIFEST_PATH}} | jq '.key_findings'
```
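
Since each manifest entry is a single JSON object per line, the same lookup can be done in Python; a sketch with illustrative field names (not the exact manifest schema):

```python
import json

# Hypothetical final line of MANIFEST.jsonl (field names are illustrative)
last_line = (
    '{"agent": "ct-epic-architect", "output": "epic-auth.md",'
    ' "key_findings": ["Epic created with 8 tasks", "3 parallel waves identified"]}'
)

# Equivalent of: tail -1 MANIFEST.jsonl | jq '.key_findings'
entry = json.loads(last_line)
print(entry["key_findings"])
```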

---

## Anti-Patterns

| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Giving all subagents ct-epic-architect skill | Context bloat | Only nested orchestrators need it |
| Returning full epic details | Bloats orchestrator context | Return "Epic created. See MANIFEST." |
| Skipping research check | Duplicate work | Always query research first |
| Loading all references | Wasted tokens | Load only needed references |
| Parallel epic creation | Task conflicts | Create epics sequentially |

---

## Cross-References

- **Orchestrator Skill**: skills/ct-orchestrator/SKILL.md
- **Subagent Protocol**: skills/_shared/subagent-protocol-base.md
- **Task System Integration**: skills/_shared/task-system-integration.md
- **Research Commands**: docs/commands/research.md
@@ -0,0 +1,230 @@ package/skills/ct-grade/SKILL.md
---
name: ct-grade
description: Session grading for agent behavioral analysis. Use when evaluating agent session quality, running grade scenarios, or interpreting grade results. Triggers on grading tasks, session quality checks, or behavioral analysis needs.
version: 1.0.0
tier: 2
core: false
category: quality
protocol: null
dependencies: []
sharedResources: []
compatibility:
  - claude-code
  - cursor
  - windsurf
  - gemini-cli
license: MIT
---

# Session Grading Guide

Session grading evaluates agent behavioral patterns against the CLEO protocol. It reads the audit log for a completed session and applies a 5-dimension rubric to produce a score (0-100), letter grade (A-F), and diagnostic flags.

## When to Use Grade Mode

Use grading when you need to:
- Evaluate how well an agent followed CLEO protocol during a session
- Identify behavioral anti-patterns (skipped discovery, missing session.end, etc.)
- Track improvement over time across multiple sessions
- Validate that orchestrated subagents followed protocol

Grading requires audit data. Sessions must be started with the `--grade` flag to enable audit log capture.

## Starting a Grade Session

### CLI

```bash
# Start a session with grading enabled
ct session start --scope epic:T001 --name "Feature work" --grade

# The --grade flag enables detailed audit logging
# All MCP and CLI operations are recorded for later analysis
```

### MCP

```
mutate({ domain: "session", operation: "start",
         params: { scope: "epic:T001", name: "Feature work", grade: true }})
```

## Running Scenarios

The grading rubric evaluates 5 behavioral scenarios that map to protocol compliance:

### 1. Fresh Discovery
Tests whether the agent checks existing sessions and tasks before starting work. Evaluates `session.list` and `tasks.find` calls at session start.

### 2. Task Hygiene
Tests whether task creation follows protocol: descriptions provided, parent existence verified before subtask creation, no duplicate tasks.

### 3. Error Recovery
Tests whether the agent handles errors correctly: follows up `E_NOT_FOUND` with recovery lookups (`tasks.find` or `tasks.exists`), avoids duplicate creates after failures.

### 4. Full Lifecycle
Tests session discipline end-to-end: session listed before task ops, session properly ended, MCP-first usage patterns.

### 5. Multi-Domain Analysis
Tests progressive disclosure: use of `admin.help` or skill lookups, preference for `query` (MCP) over CLI for programmatic access.

## Evaluating Results

### CLI

```bash
# Grade a specific session
ct grade <sessionId>

# List all past grade results
ct grade --list
```

### MCP

```
# Grade a session
# Canonical registry surface (preferred)
query({ domain: "check", operation: "grade",
        params: { sessionId: "abc-123" }})

# List past grades
query({ domain: "check", operation: "grade.list" })

# Compatibility aliases still work at runtime
query({ domain: "admin", operation: "grade",
        params: { sessionId: "abc-123" }})
```

## Understanding the 5 Dimensions

Each dimension scores 0-20 points, totaling 0-100.

### S1: Session Discipline (20 pts)

| Points | Criteria |
|--------|----------|
| 10 | `session.list` called before first task operation |
| 10 | `session.end` called when work is complete |

**What it measures**: Does the agent check existing sessions before starting, and properly close sessions when done?

### S2: Discovery Efficiency (20 pts)

| Points | Criteria |
|--------|----------|
| 0-15 | `find:list` ratio >= 80% earns full 15; scales linearly below |
| 5 | `tasks.show` used for detail retrieval |

**What it measures**: Does the agent prefer `tasks.find` (low context cost) over `tasks.list` (high context cost) for discovery?

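The linear scaling can be sketched as follows (an illustration of the rule above, not the grader's actual implementation; it assumes the ratio is `find` calls over total `find` + `list` calls):

```python
def s2_ratio_points(find_calls: int, list_calls: int) -> float:
    """0-15 points: full credit at a find ratio of 80%+, linear below."""
    total = find_calls + list_calls
    if total == 0:
        return 0.0
    ratio = find_calls / total
    return 15.0 * min(ratio / 0.8, 1.0)

print(s2_ratio_points(8, 2))  # 15.0  (ratio 0.8 -> full credit)
print(s2_ratio_points(4, 6))  # 7.5   (ratio 0.4 -> half credit)
```
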
### S3: Task Hygiene (20 pts)

Starts at 20 and deducts for violations:

| Deduction | Violation |
|-----------|-----------|
| -5 each | `tasks.add` without a description |
| -3 | Subtasks created without `tasks.exists` parent check |

**What it measures**: Does the agent create well-formed tasks with descriptions and verify parents before creating subtasks?

### S4: Error Protocol (20 pts)

Starts at 20 and deducts for violations:

| Deduction | Violation |
|-----------|-----------|
| -5 each | `E_NOT_FOUND` error not followed by recovery lookup within 5 ops |
| -5 | Duplicate task creates detected (same title in session) |

**What it measures**: Does the agent recover gracefully from errors and avoid creating duplicate tasks?

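Both S3 and S4 follow the same start-at-max-and-deduct shape, which can be sketched as (illustrative, not the grader's actual code):

```python
def deduction_score(start: int, deductions: list[int]) -> int:
    """Start at the dimension max and subtract each violation, floored at 0."""
    return max(0, start - sum(deductions))

# S3: two tasks.add calls without descriptions (-5 each), one unchecked parent (-3)
print(deduction_score(20, [5, 5, 3]))  # 7

# S4: one unrecovered E_NOT_FOUND (-5) and one duplicate create (-5)
print(deduction_score(20, [5, 5]))     # 10
```
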
### S5: Progressive Disclosure Use (20 pts)

| Points | Criteria |
|--------|----------|
| 10 | `admin.help` or skill lookup calls made |
| 10 | `query` (MCP gateway) used for programmatic access |

**What it measures**: Does the agent use progressive disclosure (help/skills) and prefer MCP over CLI?

## Interpreting Scores

### Letter Grades

| Grade | Score Range | Meaning |
|-------|-------------|---------|
| **A** | 90-100 | Excellent protocol adherence. Agent follows all best practices. |
| **B** | 75-89 | Good. Minor gaps in one or two dimensions. |
| **C** | 60-74 | Acceptable. Several protocol violations need attention. |
| **D** | 45-59 | Below expectations. Significant anti-patterns present. |
| **F** | 0-44 | Failing. Major protocol violations across multiple dimensions. |

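The score-to-letter mapping is a simple threshold lookup; a sketch of the bands:

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 score to the letter bands in the table above."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 45:
        return "D"
    return "F"

print(letter_grade(85))  # B
print(letter_grade(44))  # F
```
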
### Reading the Output

The grade result includes:
- **score/maxScore**: Raw numeric score (e.g., `85/100`)
- **percent**: Percentage score
- **grade**: Letter grade (A-F)
- **dimensions**: Per-dimension breakdown with score, max, and evidence
- **flags**: Specific violations or improvement suggestions
- **entryCount**: Number of audit entries analyzed

### Flags

Flags are actionable diagnostic messages. Each flag identifies a specific behavioral issue:

- `session.list never called` -- Check existing sessions before starting new ones
- `session.end never called` -- Always end sessions when done
- `tasks.list used Nx` -- Prefer `tasks.find` for discovery
- `tasks.add without description` -- Always provide task descriptions
- `Subtasks created without tasks.exists parent check` -- Verify parent exists first
- `E_NOT_FOUND not followed by recovery lookup` -- Follow errors with `tasks.find` or `tasks.exists`
- `No admin.help or skill lookup calls` -- Load `ct-cleo` for protocol guidance
- `No MCP query calls` -- Prefer `query` over CLI

## Common Anti-patterns

| Anti-pattern | Impact | Fix |
|-------------|--------|-----|
| Skipping `session.list` at start | -10 S1 | Always check existing sessions first |
| Forgetting `session.end` | -10 S1 | End sessions when work is complete |
| Using `tasks.list` instead of `tasks.find` | up to -15 S2 | Use `find` for discovery, `list` only for known parent children |
| Creating tasks without descriptions | -5 each S3 | Always provide a description with `tasks.add` |
| Ignoring `E_NOT_FOUND` errors | -5 each S4 | Follow up with `tasks.find` or `tasks.exists` |
| Creating duplicate tasks | -5 S4 | Check for existing tasks before creating new ones |
| Never using `admin.help` | -10 S5 | Use progressive disclosure for protocol guidance |
| CLI-only usage (no MCP) | -10 S5 | Prefer `query`/`mutate` for programmatic access |

## Grade Result Schema

Grade results are stored in `.cleo/metrics/GRADES.jsonl` as append-only JSONL. Each entry conforms to `schemas/grade.schema.json` with these fields:

- `sessionId` (string, required) -- Session that was graded
- `taskId` (string, optional) -- Associated task ID
- `totalScore` (number, 0-100) -- Aggregate score
- `maxScore` (number, default 100) -- Maximum possible score
- `dimensions` (object) -- Per-dimension `{ score, max, evidence[] }`
- `flags` (string[]) -- Specific violations or suggestions
- `timestamp` (ISO 8601) -- When the grade was computed
- `entryCount` (number) -- Audit entries analyzed
- `evaluator` (`auto` | `manual`) -- How the grade was computed

## MCP Operations

| Gateway | Domain | Operation | Description |
|---------|--------|-----------|-------------|
| `query` | `check` | `grade` | Canonical grade read (`params: { sessionId }`) |
| `query` | `check` | `grade.list` | Canonical grade history read |
| `query` | `admin` | `grade` | Compatibility alias for runtime handlers |
| `query` | `admin` | `grade.list` | Compatibility alias for runtime handlers |
| `query` | `admin` | `token` | Canonical token telemetry read (`action=summary\|list\|show`) |

## API Update Notes

- Prefer the canonical registry surface from `docs/specs/CLEO-API.md`: `check.grade`, `check.grade.list`, and `admin.token` with an `action` param.
- `admin.grade*` and split `admin.token.*` paths remain compatibility handlers and may still appear in existing automation.
- Browser clients should target `POST /api/query` and `POST /api/mutate`; LAFS metadata is carried in `X-Cleo-*` headers by default.
- Treat persisted token transport values `api` and `http` as equivalent during the compatibility window described in `docs/specs/CLEO-WEB-API.md`.