ctx-cc 3.5.0 → 4.1.0
- package/README.md +375 -676
- package/agents/ctx-arch-mapper.md +5 -3
- package/agents/ctx-auditor.md +5 -3
- package/agents/ctx-codex-reviewer.md +214 -0
- package/agents/ctx-concerns-mapper.md +5 -3
- package/agents/ctx-criteria-suggester.md +6 -4
- package/agents/ctx-debugger.md +5 -3
- package/agents/ctx-designer.md +488 -114
- package/agents/ctx-discusser.md +5 -3
- package/agents/ctx-executor.md +5 -3
- package/agents/ctx-handoff.md +6 -4
- package/agents/ctx-learner.md +5 -3
- package/agents/ctx-mapper.md +4 -3
- package/agents/ctx-ml-analyst.md +600 -0
- package/agents/ctx-ml-engineer.md +933 -0
- package/agents/ctx-ml-reviewer.md +485 -0
- package/agents/ctx-ml-scientist.md +626 -0
- package/agents/ctx-parallelizer.md +4 -3
- package/agents/ctx-planner.md +5 -3
- package/agents/ctx-predictor.md +4 -3
- package/agents/ctx-qa.md +5 -3
- package/agents/ctx-quality-mapper.md +5 -3
- package/agents/ctx-researcher.md +5 -3
- package/agents/ctx-reviewer.md +6 -4
- package/agents/ctx-team-coordinator.md +5 -3
- package/agents/ctx-tech-mapper.md +5 -3
- package/agents/ctx-verifier.md +5 -3
- package/bin/ctx.js +199 -27
- package/commands/brand.md +309 -0
- package/commands/ctx.md +10 -10
- package/commands/design.md +304 -0
- package/commands/experiment.md +251 -0
- package/commands/help.md +57 -7
- package/commands/init.md +25 -0
- package/commands/metrics.md +1 -1
- package/commands/milestone.md +1 -1
- package/commands/ml-status.md +197 -0
- package/commands/monitor.md +1 -1
- package/commands/train.md +266 -0
- package/commands/visual-qa.md +559 -0
- package/commands/voice.md +1 -1
- package/hooks/post-tool-use.js +39 -0
- package/hooks/pre-tool-use.js +94 -0
- package/hooks/subagent-stop.js +32 -0
- package/package.json +9 -3
- package/plugin.json +46 -0
- package/skills/ctx-design-system/SKILL.md +572 -0
- package/skills/ctx-ml-experiment/SKILL.md +334 -0
- package/skills/ctx-ml-pipeline/SKILL.md +437 -0
- package/skills/ctx-orchestrator/SKILL.md +91 -0
- package/skills/ctx-review-gate/SKILL.md +147 -0
- package/skills/ctx-state/SKILL.md +100 -0
- package/skills/ctx-visual-qa/SKILL.md +587 -0
- package/src/agents.js +109 -0
- package/src/auto.js +287 -0
- package/src/capabilities.js +226 -0
- package/src/commits.js +94 -0
- package/src/config.js +112 -0
- package/src/context.js +241 -0
- package/src/handoff.js +156 -0
- package/src/hooks.js +218 -0
- package/src/install.js +125 -50
- package/src/lifecycle.js +194 -0
- package/src/metrics.js +198 -0
- package/src/pipeline.js +269 -0
- package/src/review-gate.js +338 -0
- package/src/runner.js +120 -0
- package/src/skills.js +143 -0
- package/src/state.js +267 -0
- package/src/worktree.js +244 -0
- package/templates/PRD.json +1 -1
- package/templates/config.json +4 -237
- package/workflows/ctx-router.md +0 -485
- package/workflows/map-codebase.md +0 -329
package/commands/help.md
CHANGED
@@ -4,18 +4,18 @@ description: Show CTX commands and usage guide
 ---
 
 <objective>
-Display the CTX
+Display the CTX 4.0 command reference.
 
 Output ONLY the reference content below. Do NOT add project-specific analysis.
 </objective>
 
 <reference>
-# CTX
+# CTX 4.0 Command Reference
 
 **CTX** (Continuous Task eXecution) - Intelligent workflow orchestration for Claude Code.
 
 **Conversational-first.** Just describe what you want - no commands to memorize.
-
+26 agents. 7 skills. Deterministic hooks. Three-stage review gate with Codex cross-model review. Phase-based lifecycle.
 
 ## Quick Start
 
@@ -25,6 +25,7 @@ Output ONLY the reference content below. Do NOT add project-specific analysis.
 "I want to build a todo app" → CTX sets up your project
 "Fix the login bug" → CTX starts debugging
 "Is my app accessible?" → CTX runs QA
+"Design a landing page" → CTX launches design workflow
 "What's next?" → CTX shows status
 ```
 
@@ -36,7 +37,17 @@ Or use commands directly:
 4. /ctx status Check progress (read-only)
 ```
 
-## What's New in
+## What's New in 4.0
+
+- **Native Architecture** - Skills, hooks, agents via Claude Code's extension points
+- **Agency-Grade Design** - Figma MCP, DTCG tokens, pixel-perfect QA, approval gates
+- **Deterministic Hooks** - PreToolUse/PostToolUse/SubagentStop enforcement
+- **Agent Capabilities** - Tool allowlists per agent category (planning/execution/review)
+- **Pipeline Orchestration** - plan→execute→verify via Skills, not external JS
+- **Autonomous Mode** - Process stories with review gates and graceful stop
+- **Plugin Manifest** - Marketplace-ready distribution
+
+## What's in 3.5
 
 - **Learning System** - CTX learns your patterns, decisions, preferences
 - **Predictive Planning** - AI suggests what to build next with ROI scoring
@@ -115,10 +126,24 @@ Or use commands directly:
 | `/ctx debug --status` | Show current session status |
 | `/ctx debug --abort` | Abort current session |
 
+### Design (Agency-Grade Visual Workflow)
+| Command | Purpose |
+|---------|---------|
+| `/ctx design` | Launch design workflow (brand, component, or audit) |
+| `/ctx brand` | Establish brand identity — mood board → 3 options → BRAND_KIT.md |
+| `/ctx visual-qa` | Pixel-perfect visual QA — measurement-driven, not screenshot |
+
+### Machine Learning
+| Command | Purpose |
+|---------|---------|
+| `/ctx experiment` | ML experiment workflow — hypothesize, design, run, analyze, iterate |
+| `/ctx train` | Model training — features, HPO, evaluation, registry |
+| `/ctx ml-status` | ML dashboard — experiments, models, drift alerts |
+
 ### QA (Full System Testing)
 | Command | Purpose |
 |---------|---------|
-| `/ctx qa` | Full system QA - crawl all pages (WCAG 2.
+| `/ctx qa` | Full system QA - crawl all pages (WCAG 2.2 AA) |
 | `/ctx qa --pages` | List discovered pages first |
 | `/ctx qa --section "auth"` | QA specific section only |
 | `/ctx qa --a11y-only` | Accessibility audit only |
@@ -208,7 +233,7 @@ Or use commands directly:
 | balanced | Opus | Sonnet | Haiku | 1x |
 | budget | Sonnet | Sonnet | Haiku | 0.4x |
 
-##
+## 25 Specialized Agents
 
 | Agent | Purpose |
 |-------|---------|
@@ -232,6 +257,31 @@ Or use commands directly:
 | ctx-auditor | Compliance logging |
 | ctx-learner | Pattern learning |
 | ctx-predictor | Feature prediction |
+| ctx-qa | Full system QA (WCAG 2.2 + visual) |
+| ctx-ml-scientist | Autonomous ML experiments (hypothesis-driven) |
+| ctx-ml-engineer | MLOps pipelines, model registry, drift detection |
+| ctx-ml-analyst | Data analysis, EDA, statistical testing |
+| ctx-ml-reviewer | ML code review, leakage detection, reproducibility |
+
+## 7 Skills (Auto-Discovered)
+
+| Skill | Purpose |
+|-------|---------|
+| ctx-orchestrator | Pipeline execution, lifecycle, autonomous mode |
+| ctx-state | STATE.json management, phase transitions |
+| ctx-review-gate | Three-stage review (spec compliance + code quality + optional Codex cross-review) |
+| ctx-design-system | W3C DTCG tokens, Figma sync, theme management |
+| ctx-visual-qa | Pixel-perfect measurement, accessibility audit |
+| ctx-ml-experiment | ML experiment lifecycle, hypothesis tracking |
+| ctx-ml-pipeline | Training pipelines, inference, model registry |
+
+## 3 Hooks (Deterministic)
+
+| Hook | Event | Purpose |
+|------|-------|---------|
+| ctx-pre-tool-use.js | PreToolUse | TDD enforcement, capability restrictions |
+| ctx-post-tool-use.js | PostToolUse | Audit logging for file modifications |
+| ctx-subagent-stop.js | SubagentStop | Agent completion tracking in STATE.json |
 
 ## Key Features
 
@@ -304,5 +354,5 @@ npx ctx-cc --force
 ```
 
 ---
-*CTX
+*CTX 4.0 - Learning. Prediction. Self-healing. Cross-model review. Voice control.*
 </reference>
package/commands/init.md
CHANGED
@@ -20,6 +20,7 @@ Initialize a new CTX project through unified flow: questioning → research →
 - `.ctx/STATE.md` — Living project state
 - `.ctx/PRD.json` — Requirements contract
 - `.ctx/config.json` — Workflow preferences
+- `.ctx/capability-manifest.json` — Tool restrictions read by the PreToolUse hook
 - `.ctx/research/` — Domain research (ArguSeek)
 - `.ctx/ROADMAP.md` — Phase structure
 
@@ -136,6 +137,29 @@ cat > .ctx/.gitignore << 'EOF'
 *.secrets
 credentials.json
 EOF
+
+# Seed the capability manifest used by the PreToolUse hook.
+# The plugin install step generates this template; prefer the global install,
+# fall back to project-local, warn if neither exists (enforcement is optional).
+# If an older-version manifest already exists, back it up before copying.
+CTX_TEMPLATE_DIR="${HOME}/.claude/ctx/templates"
+if [ ! -f "${CTX_TEMPLATE_DIR}/capability-manifest.json" ]; then
+  CTX_TEMPLATE_DIR=".claude/ctx/templates"
+fi
+if [ -f "${CTX_TEMPLATE_DIR}/capability-manifest.json" ]; then
+  if [ -f .ctx/capability-manifest.json ]; then
+    OLD_VER=$(grep -oE '"_version"\s*:\s*[0-9]+' .ctx/capability-manifest.json | grep -oE '[0-9]+$' || echo "0")
+    NEW_VER=$(grep -oE '"_version"\s*:\s*[0-9]+' "${CTX_TEMPLATE_DIR}/capability-manifest.json" | grep -oE '[0-9]+$' || echo "0")
+    if [ "${OLD_VER}" != "${NEW_VER}" ]; then
+      mv .ctx/capability-manifest.json ".ctx/capability-manifest.v${OLD_VER}.backup.json"
+    fi
+  fi
+  cp "${CTX_TEMPLATE_DIR}/capability-manifest.json" .ctx/capability-manifest.json
+else
+  echo "WARN: capability-manifest.json template not found in ${HOME}/.claude/ctx/templates or .claude/ctx/templates"
+  echo " The PreToolUse hook will silently no-op until the manifest is seeded."
+  echo " Reinstall ctx-cc (npx ctx-cc --force) to regenerate it."
+fi
 ```
 
 ## Phase 5: Write STATE.md
@@ -181,6 +205,7 @@ EOF
 
 ```bash
 git add .ctx/STATE.md .ctx/.gitignore
+[ -f .ctx/capability-manifest.json ] && git add .ctx/capability-manifest.json
 git commit -m "docs: initialize CTX project - {{project_name}}"
 ```
 
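The manifest-seeding step added to init.md compares the `_version` of an existing manifest with the template's, backs up the old file when they differ, and then copies the template. The same decision logic can be sketched in Python. This is only an illustration: it assumes the manifest is valid JSON with an integer `_version` key (the shell version greps for the field instead of parsing it), and the names `read_version` and `seed_manifest` are hypothetical, not part of ctx-cc.

```python
import json
import shutil
from pathlib import Path


def read_version(path: Path) -> int:
    """Return the integer `_version` of a manifest, or 0 if absent or unreadable."""
    try:
        value = json.loads(path.read_text(encoding="utf-8")).get("_version", 0)
    except (OSError, ValueError, AttributeError):
        return 0
    return value if isinstance(value, int) else 0


def seed_manifest(template: Path, target: Path) -> str:
    """Copy template to target, backing up an existing target of a different version.

    Returns "missing-template", "seeded", or "seeded-with-backup" (labels are
    assumptions made for this sketch).
    """
    if not template.is_file():
        return "missing-template"  # the shell script only prints a warning here
    status = "seeded"
    if target.is_file():
        old, new = read_version(target), read_version(template)
        if old != new:
            backup = target.with_name(f"capability-manifest.v{old}.backup.json")
            target.replace(backup)  # mirrors the `mv` in the shell script
            status = "seeded-with-backup"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return status
```

As in the shell script, a manifest whose version matches the template is simply overwritten without a backup.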
package/commands/metrics.md
CHANGED
@@ -4,7 +4,7 @@ description: Metrics dashboard - understand AI productivity impact with stories
 ---
 
 <objective>
-CTX
+CTX 4.0 Metrics Dashboard - Comprehensive productivity analytics for understanding AI development impact.
 </objective>
 
 <usage>
package/commands/milestone.md
CHANGED
@@ -4,7 +4,7 @@ description: Milestone workflow - list, audit, complete, and start new milestone
 ---
 
 <objective>
-CTX
+CTX 4.0 Milestone Workflow - Full release management with audit, archive, and git tagging.
 </objective>
 
 <usage>
package/commands/ml-status.md
ADDED
@@ -0,0 +1,197 @@
+---
+name: ctx:ml-status
+description: Show ML project status — experiments, models, features, drift alerts. Read-only dashboard. No agents spawned.
+---
+
+<objective>
+Display a concise ML project dashboard. Read-only. Shows active experiment, recent experiment history, production model versions, feature count, and any drift alerts. No writes, no agents.
+</objective>
+
+<usage>
+```bash
+/ctx:ml-status          # Full dashboard
+/ctx:ml-status models   # Model registry only
+/ctx:ml-status exp      # Experiment log only
+/ctx:ml-status drift    # Drift alerts only
+```
+</usage>
+
+<process>
+
+## Step 1: Check ML Directory Exists
+
+If `.ctx/ml/` does not exist:
+```
+[ML Status] No ML project found.
+
+Run /ctx:experiment new "<hypothesis>" to start.
+```
+Exit.
+
+## Step 2: Parse Filter Argument
+
+| Arg | Show |
+|-----|------|
+| none | Full dashboard |
+| `models` | Model registry section only |
+| `exp` | Experiment log section only |
+| `drift` | Drift alerts section only |
+
+## Step 3: Read Files
+
+Read the following files (skip silently if missing):
+
+| File | Purpose |
+|------|---------|
+| `.ctx/ml/ML-STATUS.md` | Active experiment, current focus, blockers |
+| `.ctx/ml/EXPERIMENT-LOG.md` | Experiment history table |
+| `.ctx/ml/models/registry.yaml` | Model versions and metrics |
+| `.ctx/ml/features/feature-registry.yaml` | Feature count and inventory |
+| `.ctx/ml/experiments/*/artifacts/drift_alerts.json` | Any saved drift alerts |
+
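Steps 2 and 3 of ml-status.md amount to a small dispatch on the filter argument plus a tolerant file read. A minimal Python sketch of both follows. It is not the command's implementation, since the command is a prompt executed by the agent. The keys of `STATUS_FILES` are hypothetical labels, and raising on an unknown filter is an assumption because the spec does not say how unknown arguments are handled.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Filter arguments from the Step 2 table; no argument means the full dashboard.
FILTERS = ("models", "exp", "drift")

# Single-path inputs from the Step 3 table, relative to .ctx/ml/.
# The dictionary keys are hypothetical labels chosen for this sketch.
STATUS_FILES = {
    "status": "ML-STATUS.md",
    "log": "EXPERIMENT-LOG.md",
    "models": "models/registry.yaml",
    "features": "features/feature-registry.yaml",
}


def parse_filter(args: List[str]) -> Optional[str]:
    """Return the selected filter, or None for the full dashboard."""
    if not args:
        return None
    choice = args[0].strip().lower()
    if choice not in FILTERS:
        # Assumption: the command spec does not define behavior for unknown filters.
        raise ValueError(f"unknown filter: {choice!r}")
    return choice


def read_optional(ml_dir: Path) -> Dict[str, Optional[str]]:
    """Read each input as text; a missing file maps to None instead of raising."""
    contents: Dict[str, Optional[str]] = {}
    for key, relative in STATUS_FILES.items():
        path = ml_dir / relative
        contents[key] = path.read_text(encoding="utf-8") if path.is_file() else None
    return contents
```

Mapping missing files to `None` lets the rendering step decide per section whether to show data or an empty-state hint.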
+## Step 4: Render Dashboard
+
+### Full Dashboard Output
+
+```
+[ML Status] {project_dir}
+Updated: {ML-STATUS.md updated date}
+
+━━━ Active Experiment ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{experiment_id} — {hypothesis title}
+Status: {draft | running | concluded}
+Phase: {hypothesize | design | train | review | done}
+
+Current Focus:
+{from ML-STATUS.md Current Focus section}
+
+━━━ Recent Experiments ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{last 5 rows from EXPERIMENT-LOG.md, formatted as table}
+
+ID         Status    Primary Metric  Result
+EXP-{n}    running   AUC >= 0.90     —
+EXP-{n-1}  accepted  AUC >= 0.88     AUC 0.91
+EXP-{n-2}  rejected  AUC >= 0.88     AUC 0.86
+EXP-{n-3}  accepted  AUC >= 0.85     AUC 0.87
+
+━━━ Production Models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{for each model in registry.yaml}
+{model_name}: v{current} — {primary metric}: {value}
+Promoted: {date} Experiment: {experiment_id}
+
+━━━ Features ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{count} features registered
+{count} features validated
+{list feature names used by production models, comma-separated}
+
+━━━ Drift Alerts ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{if no drift_alerts.json found}
+No drift alerts.
+
+{if drift alerts found}
+{count} feature(s) drifted in latest check:
+{feature}: KS={ks_stat}, p={pvalue} [{severity}]
+
+━━━ Blockers ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{from ML-STATUS.md Blocking Issues section, or "none"}
+
+━━━ Next Steps ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{context-aware suggestions:}
+
+If active experiment is draft:
+Run /ctx:train to start training.
+
+If active experiment is running:
+Training in progress. Run /ctx:train again to check.
+
+If active experiment is concluded (accepted):
+Run /ctx:experiment new "<next hypothesis>" to iterate.
+
+If active experiment is concluded (rejected):
+Run /ctx:experiment new "<revised hypothesis>" based on learnings.
+
+If no active experiment:
+Run /ctx:experiment new "<hypothesis>" to start.
+
+If drift alerts present:
+Run /ctx:experiment new "retrain on updated data distribution" to address drift.
+```
+
+### Models-Only Output
+
+```
+[ML Status] Production Models
+
+{model_name}
+Current: v{n}
+{primary metric}: {value}
+Promoted: {date}
+Experiment: {experiment_id}
+Promotion criteria: {from registry.yaml}
+
+Version history:
+v{n}: {primary metric}: {value} — production
+v{n-1}: {primary metric}: {value} — retired
+v{n-2}: {primary metric}: {value} — retired
+```
+
+### Experiments-Only Output
+
+```
+[ML Status] Experiment Log
+
+ID       Hypothesis (truncated 60 chars)  Model    Metric    Result    Status
+─────────────────────────────────────────────────────────────────────────────────────────
+EXP-{n}  {hypothesis}                     {model}  {metric}  {result}  {status}
+...
+
+Total: {count} experiments ({accepted} accepted, {rejected} rejected, {running} running)
+```
+
+### Drift-Only Output
+
+```
+[ML Status] Drift Alerts
+
+Source: .ctx/ml/experiments/{latest_exp}/artifacts/drift_alerts.json
+Checked: {file modified date}
+
+{if no alerts}
+No drift detected.
+
+{if alerts}
+Feature    KS Stat  p-value   Severity
+──────────────────────────────────────────────
+{feature}  {stat}   {pvalue}  {high|medium}
+
+Recommendation: Run /ctx:experiment new "retrain on updated distribution"
+```
+
+## Step 5: Handle Missing Files Gracefully
+
+If a file is missing, show that section as empty with a hint:
+
+```
+━━━ Production Models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+No models registered yet.
+Models appear here after a /ctx:train run passes review.
+```
+
+Never error on missing files — display what is available.
+
+</process>
+
+<guardrails>
+- This command is read-only. It never writes files, spawns agents, or modifies state.
+- Do not parse YAML strictly — if registry.yaml is malformed, show a warning and continue.
+- Drift alert files are found by globbing .ctx/ml/experiments/*/artifacts/drift_alerts.json — show the most recently modified one.
+- Truncate long hypothesis strings to 60 characters in table views.
+- If ML-STATUS.md does not exist but EXPERIMENT-LOG.md does, derive status from the log.
+</guardrails>
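Two of the ml-status guardrails are mechanical enough to illustrate: picking the most recently modified `drift_alerts.json` from the glob, and truncating hypotheses to 60 characters. The sketch below shows one way to do both in Python. The function names are hypothetical, and ending a truncated string with an ellipsis character is an assumption, since the guardrail only states the 60-character limit.

```python
from pathlib import Path
from typing import Optional


def latest_drift_alerts(ml_dir: Path) -> Optional[Path]:
    """Return the most recently modified drift_alerts.json under ml_dir, if any."""
    candidates = list(ml_dir.glob("experiments/*/artifacts/drift_alerts.json"))
    if not candidates:
        return None
    # Modification time decides, not the experiment ID in the path.
    return max(candidates, key=lambda path: path.stat().st_mtime)


def truncate(text: str, limit: int = 60) -> str:
    """Cut text to at most `limit` characters.

    Assumption: a truncated string ends with an ellipsis that counts toward the limit.
    """
    if len(text) <= limit:
        return text
    return text[: limit - 1] + "…"
```

Selecting by modification time rather than by experiment number matches the guardrail wording, so a drift check re-run on an older experiment would take precedence.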
package/commands/monitor.md
CHANGED
@@ -4,7 +4,7 @@ description: Self-healing deployments - connect to error tracking (Sentry/LogRoc
 ---
 
 <objective>
-CTX
+CTX 4.0 Self-Healing Deployments - Monitor production errors and automatically create fix stories or even auto-fix with PR creation.
 </objective>
 
 <usage>
package/commands/train.md
ADDED
@@ -0,0 +1,266 @@
+---
+name: ctx:train
+description: ML model training workflow — feature engineering, training, HPO, evaluation, and registry promotion. Uses Digital Twin patterns from ctx-ml-pipeline skill.
+args: experiment_id (optional — defaults to active experiment from ML-STATUS.md)
+---
+
+<objective>
+Orchestrate a full ML training run for a designed experiment. Loads config, spawns engineer to build/run the pipeline, spawns reviewer to validate results, and promotes to model registry if promotion criteria are met.
+
+This command assumes the experiment already has HYPOTHESIS.md, DESIGN.md, and config.yaml. Run /ctx:experiment new first if not.
+</objective>
+
+<usage>
+```bash
+/ctx:train            # Train for active experiment (from ML-STATUS.md)
+/ctx:train EXP-003    # Train for specific experiment
+/ctx:train --dry-run  # Validate config without running training
+```
+</usage>
+
+<process>
+
+## Step 1: Parse Arguments
+
+- No args → read active experiment from `.ctx/ml/ML-STATUS.md`
+- `EXP-{n}` → use that experiment ID
+- `--dry-run` → validate config and design, report issues, do not train
+
+If no active experiment and no ID given:
+```
+[Train] No active experiment found.
+Run /ctx:experiment new "<hypothesis>" to create one.
+```
+
+## Step 2: Validate Prerequisites
+
+Check these files exist before proceeding:
+
+```bash
+.ctx/ml/experiments/{id}/HYPOTHESIS.md → must exist
+.ctx/ml/experiments/{id}/DESIGN.md → must exist
+.ctx/ml/experiments/{id}/config.yaml → must exist
+```
+
+If any are missing:
+```
+[Train] Cannot run — missing required files for {experiment_id}:
+- {missing file}
+
+Run /ctx:experiment new to create them.
+```
+
+Also check:
+- `config.yaml` has a `data.train` path and it exists on disk
+- `config.yaml` has a `model.type` set
+- `config.yaml` has a `seed` set (reject if missing — non-reproducible)
+
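The Step 2 prerequisite checks in train.md can be expressed as two small functions, sketched here in Python. The sketch assumes `config.yaml` has already been parsed into a dictionary (YAML loading is left out to keep it stdlib-only) and that a relative `data.train` path is resolved against a caller-supplied base directory. Both points, along with the function names, are assumptions and not something the command file specifies.

```python
from pathlib import Path
from typing import Any, Dict, List

REQUIRED_FILES = ("HYPOTHESIS.md", "DESIGN.md", "config.yaml")


def missing_files(experiment_dir: Path) -> List[str]:
    """List the required experiment files that are absent."""
    return [name for name in REQUIRED_FILES if not (experiment_dir / name).is_file()]


def config_issues(config: Dict[str, Any], base_dir: Path) -> List[str]:
    """Return human-readable problems with an already-parsed config.yaml."""
    issues: List[str] = []
    train_path = (config.get("data") or {}).get("train")
    if not train_path:
        issues.append("data.train is not set")
    elif not (base_dir / train_path).exists():
        issues.append(f"data.train does not exist on disk: {train_path}")
    if not (config.get("model") or {}).get("type"):
        issues.append("model.type is not set")
    # Compare against None so that a seed of 0 is still accepted.
    if config.get("seed") is None:
        issues.append("seed is not set (run would be non-reproducible)")
    return issues
```

Returning a list of issues instead of failing on the first one fits the dry-run report, which has an `Issues` field listing any problems found.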
+## Step 3: Dry Run (if --dry-run)
+
+Read DESIGN.md acceptance criteria and config.yaml. Report:
+
+```
+[Train] Dry run for {experiment_id}
+
+Config:
+Model: {model.type}
+Params: {key params}
+Data: {data paths}
+Seed: {seed}
+
+Design checks:
+Primary metric: {metric} — {target}
+Guard rails: {metrics}
+Acceptance criteria: {count} items
+
+Issues: {none | list any problems}
+
+Ready to train: {yes | no}
+```
+
+Exit after dry run. Do not spawn training agent.
+
+## Step 4: Update Experiment Status
+
+Update `.ctx/ml/EXPERIMENT-LOG.md` — change row status from `draft` to `running`.
+Update `.ctx/ml/ML-STATUS.md` — set active experiment and status to running.
+
+## Step 5: Spawn ctx-ml-engineer for Training
+
+```
+Agent({
+  subagent_type: "ctx-ml-engineer",
+  prompt: |
+    Run ML training pipeline for {experiment_id}.
+
+    Read these files:
+    - .ctx/ml/experiments/{experiment_id}/config.yaml
+    - .ctx/ml/experiments/{experiment_id}/DESIGN.md
+    - .ctx/ml/features/feature-registry.yaml
+
+    Execute the full pipeline:
+
+    1. DATA VALIDATION
+       - Load data from config.yaml data paths
+       - Apply Pandera schema validation (fail hard on violations)
+       - Log row counts and class distribution
+
+    2. FEATURE PIPELINE
+       - Build transform pipeline from feature-registry.yaml
+       - Fit on training data only
+       - Transform train, val, test
+       - Save fitted pipeline to artifacts/pipeline.pkl
+
+    3. TRAINING
+       - Train model from config.yaml model params
+       - Use early stopping (patience=20)
+       - Log training curve to artifacts/train.log
+
+    4. HPO (if config.yaml has hpo: true)
+       - Run Optuna with n_trials from config or default 100
+       - Save study to artifacts/hpo_study.pkl
+       - Retrain with best params
+
+    5. EVALUATION
+       - Compute primary metric and all guard rail metrics
+       - Compute calibration error
+       - Generate ROC curve, calibration curve, feature importance plots
+       - Save metrics to artifacts/metrics.json
+       - Save plots to artifacts/plots/
+
+    6. CONFORMAL WRAPPER
+       - Fit MAPIE on calibration split at alpha=0.1
+       - Save to artifacts/mapie.pkl
+
+    7. INFERENCE SMOKE TEST
+       - Load model + pipeline + mapie
+       - Run 5 predictions with full envelope
+       - Verify envelope structure is correct
+
+    Write artifacts/metrics.json with all metrics.
+    Write all artifacts per ctx-ml-pipeline skill reproducibility requirements.
+
+    Do NOT write RESULTS.md — the reviewer will do that.
+    Do NOT update the model registry — that happens after review passes.
+})
+```
+
+## Step 6: Validate Training Artifacts
+
+After engineer completes, verify these files exist:
+
+```
+.ctx/ml/experiments/{id}/artifacts/
+├── model.pkl → required
+├── pipeline.pkl → required
+├── mapie.pkl → required
+├── config.yaml → required (copy of run config)
+├── metrics.json → required
+├── train.log → required
+└── plots/ → required (at least roc_curve.png)
+```
+
+If any required artifact is missing, report which and do not proceed to review.
+
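The Step 6 artifact gate reduces to an existence check over a fixed list. A minimal Python sketch, using the file names from the tree in Step 6, is shown here. The function name `missing_artifacts` is hypothetical, and treating "at least roc_curve.png" as a check for that single file is this sketch's reading of the requirement.

```python
from pathlib import Path
from typing import List

# File names taken from the Step 6 artifact tree.
REQUIRED_ARTIFACTS = (
    "model.pkl",
    "pipeline.pkl",
    "mapie.pkl",
    "config.yaml",
    "metrics.json",
    "train.log",
)


def missing_artifacts(artifacts_dir: Path) -> List[str]:
    """Return the required artifacts that are absent; an empty list means review may start."""
    missing = [name for name in REQUIRED_ARTIFACTS if not (artifacts_dir / name).is_file()]
    # plots/ must contain at least roc_curve.png.
    if not (artifacts_dir / "plots" / "roc_curve.png").is_file():
        missing.append("plots/roc_curve.png")
    return missing
```

A caller would report the returned names and skip Step 7 whenever the list is non-empty.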
+## Step 7: Spawn ctx-ml-reviewer for Evaluation
+
+```
+Agent({
+  subagent_type: "ctx-ml-reviewer",
+  prompt: |
+    Review training results for {experiment_id}.
+
+    Read:
+    - .ctx/ml/experiments/{experiment_id}/HYPOTHESIS.md
+    - .ctx/ml/experiments/{experiment_id}/DESIGN.md
+    - .ctx/ml/experiments/{experiment_id}/artifacts/metrics.json
+    - .ctx/ml/experiments/{experiment_id}/artifacts/train.log
+    - .ctx/ml/models/registry.yaml (current production model metrics for comparison)
+
+    Review checklist:
+    - [ ] Primary metric meets DESIGN.md acceptance threshold
+    - [ ] Guard rail metrics not violated
+    - [ ] Training loss converged (check train.log — no NaN, no divergence)
+    - [ ] Calibration error < 0.05
+    - [ ] Primary metric improves on production model by promotion criteria
+
+    Write .ctx/ml/experiments/{experiment_id}/RESULTS.md with:
+    - Metrics table (baseline vs result vs delta)
+    - Verdict: accepted | rejected | inconclusive
+    - Key findings
+    - Next experiment recommendation
+
+    Return verdict as final line: VERDICT: accepted | rejected | inconclusive
+})
+```
+
+## Step 8: Handle Verdict
+
+Read reviewer verdict.
+
+### Verdict: accepted
+
+1. Determine next version: read current version from `models/registry.yaml`, increment.
+2. Update `models/registry.yaml`:
+   - Add new version entry with metrics, experiment ID, date, status: production
+   - Set previous production version status to: retired
+   - Set `current` to new version
+3. Update `EXPERIMENT-LOG.md` row status to: `accepted`
+4. Update `ML-STATUS.md` with outcome
+
+Output:
+```
+[Train] EXP-{n} accepted — model promoted to {name} v{version}
+
+Metrics:
+{primary}: {value} (was {baseline}, +{delta})
+{guard}: {value} (was {baseline})
+
+Artifacts: .ctx/ml/experiments/{experiment_id}/artifacts/
+Registry: .ctx/ml/models/registry.yaml
+
+Run /ctx:experiment new "<next hypothesis>" to continue.
+```
+
+### Verdict: rejected
+
+1. Update `EXPERIMENT-LOG.md` row status to: `rejected`
+2. Update `ML-STATUS.md` with outcome and reviewer's next recommendation
+
+Output:
+```
+[Train] EXP-{n} rejected — {reason from RESULTS.md}
+
+Primary metric: {value} (target was {target})
+
+Key findings:
+{findings from RESULTS.md}
+
+Next recommendation: {from reviewer}
+
+Run /ctx:experiment new "<next hypothesis>" to iterate.
+```
+
+### Verdict: inconclusive
+
+1. Update `EXPERIMENT-LOG.md` row status to: `inconclusive`
+2. Report what is blocking a clear verdict
+
+Output:
+```
+[Train] EXP-{n} inconclusive — {reason}
+
+Blocking issue: {issue}
+
+Recommended action: {action}
+```
+
+</process>
+
+<guardrails>
+- Never promote to registry without an accepted verdict from ctx-ml-reviewer.
+- Never proceed to review if required training artifacts are missing.
+- Seed is mandatory in config.yaml — non-reproducible runs are rejected.
+- Dry run never touches EXPERIMENT-LOG.md or ML-STATUS.md.
+- If training fails mid-run, update EXPERIMENT-LOG.md status to "failed" before exiting.
+</guardrails>
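Steps 7 and 8 of train.md hinge on reading the reviewer's final `VERDICT:` line and promoting only when it is `accepted`. The Python sketch below illustrates that flow on an in-memory registry. The registry layout (`{name: {"current": int, "versions": {...}}}`) and all function names are hypothetical, because the diff does not show the actual schema of `registry.yaml`.

```python
import copy
import re
from typing import Any, Dict, Optional

VERDICT_LINE = re.compile(r"^VERDICT:\s*(accepted|rejected|inconclusive)\s*$")


def parse_verdict(reviewer_output: str) -> Optional[str]:
    """Return the verdict from the last non-empty line, or None if it is malformed."""
    lines = [line.strip() for line in reviewer_output.splitlines() if line.strip()]
    if not lines:
        return None
    match = VERDICT_LINE.match(lines[-1])
    return match.group(1) if match else None


def promote(registry: Dict[str, Any], name: str, metrics: Dict[str, float],
            experiment_id: str, date: str) -> Dict[str, Any]:
    """Return a copy of the registry with a new production version of `name`.

    Hypothetical layout: {name: {"current": int, "versions": {int: {...}}}}.
    """
    updated = copy.deepcopy(registry)
    model = updated.setdefault(name, {"current": 0, "versions": {}})
    for info in model["versions"].values():
        if info.get("status") == "production":
            info["status"] = "retired"  # previous production version is retired
    version = model["current"] + 1
    model["versions"][version] = {
        "metrics": dict(metrics),
        "experiment": experiment_id,
        "date": date,
        "status": "production",
    }
    model["current"] = version
    return updated


def handle_verdict(registry: Dict[str, Any], reviewer_output: str,
                   **promotion: Any) -> Dict[str, Any]:
    """Promote only on an accepted verdict; otherwise return the registry unchanged."""
    if parse_verdict(reviewer_output) != "accepted":
        return registry
    return promote(registry, **promotion)
```

Because `handle_verdict` also treats a missing or malformed verdict line as "not accepted", the first guardrail (never promote without an accepted verdict) holds even when the reviewer's output is unexpected.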