ctx-cc 3.5.0 → 4.0.0

Files changed (71)
  1. package/README.md +34 -289
  2. package/agents/ctx-arch-mapper.md +5 -3
  3. package/agents/ctx-auditor.md +5 -3
  4. package/agents/ctx-concerns-mapper.md +5 -3
  5. package/agents/ctx-criteria-suggester.md +6 -4
  6. package/agents/ctx-debugger.md +5 -3
  7. package/agents/ctx-designer.md +488 -114
  8. package/agents/ctx-discusser.md +5 -3
  9. package/agents/ctx-executor.md +5 -3
  10. package/agents/ctx-handoff.md +6 -4
  11. package/agents/ctx-learner.md +5 -3
  12. package/agents/ctx-mapper.md +4 -3
  13. package/agents/ctx-ml-analyst.md +600 -0
  14. package/agents/ctx-ml-engineer.md +933 -0
  15. package/agents/ctx-ml-reviewer.md +485 -0
  16. package/agents/ctx-ml-scientist.md +626 -0
  17. package/agents/ctx-parallelizer.md +4 -3
  18. package/agents/ctx-planner.md +5 -3
  19. package/agents/ctx-predictor.md +4 -3
  20. package/agents/ctx-qa.md +5 -3
  21. package/agents/ctx-quality-mapper.md +5 -3
  22. package/agents/ctx-researcher.md +5 -3
  23. package/agents/ctx-reviewer.md +6 -4
  24. package/agents/ctx-team-coordinator.md +5 -3
  25. package/agents/ctx-tech-mapper.md +5 -3
  26. package/agents/ctx-verifier.md +5 -3
  27. package/bin/ctx.js +168 -27
  28. package/commands/brand.md +309 -0
  29. package/commands/design.md +304 -0
  30. package/commands/experiment.md +251 -0
  31. package/commands/help.md +57 -7
  32. package/commands/metrics.md +1 -1
  33. package/commands/milestone.md +1 -1
  34. package/commands/ml-status.md +197 -0
  35. package/commands/monitor.md +1 -1
  36. package/commands/train.md +266 -0
  37. package/commands/visual-qa.md +559 -0
  38. package/commands/voice.md +1 -1
  39. package/hooks/post-tool-use.js +39 -0
  40. package/hooks/pre-tool-use.js +93 -0
  41. package/hooks/subagent-stop.js +32 -0
  42. package/package.json +9 -3
  43. package/plugin.json +45 -0
  44. package/skills/ctx-design-system/SKILL.md +572 -0
  45. package/skills/ctx-ml-experiment/SKILL.md +334 -0
  46. package/skills/ctx-ml-pipeline/SKILL.md +437 -0
  47. package/skills/ctx-orchestrator/SKILL.md +91 -0
  48. package/skills/ctx-review-gate/SKILL.md +111 -0
  49. package/skills/ctx-state/SKILL.md +100 -0
  50. package/skills/ctx-visual-qa/SKILL.md +587 -0
  51. package/src/agents.js +109 -0
  52. package/src/auto.js +287 -0
  53. package/src/capabilities.js +171 -0
  54. package/src/commits.js +94 -0
  55. package/src/config.js +112 -0
  56. package/src/context.js +241 -0
  57. package/src/handoff.js +156 -0
  58. package/src/hooks.js +218 -0
  59. package/src/install.js +119 -51
  60. package/src/lifecycle.js +194 -0
  61. package/src/metrics.js +198 -0
  62. package/src/pipeline.js +269 -0
  63. package/src/review-gate.js +244 -0
  64. package/src/runner.js +120 -0
  65. package/src/skills.js +143 -0
  66. package/src/state.js +267 -0
  67. package/src/worktree.js +244 -0
  68. package/templates/PRD.json +1 -1
  69. package/templates/config.json +1 -237
  70. package/workflows/ctx-router.md +0 -485
  71. package/workflows/map-codebase.md +0 -329
package/commands/ml-status.md +197 -0
@@ -0,0 +1,197 @@
+ ---
+ name: ctx:ml-status
+ description: Show ML project status — experiments, models, features, drift alerts. Read-only dashboard. No agents spawned.
+ ---
+
+ <objective>
+ Display a concise ML project dashboard. Read-only. Shows active experiment, recent experiment history, production model versions, feature count, and any drift alerts. No writes, no agents.
+ </objective>
+
+ <usage>
+ ```bash
+ /ctx:ml-status # Full dashboard
+ /ctx:ml-status models # Model registry only
+ /ctx:ml-status exp # Experiment log only
+ /ctx:ml-status drift # Drift alerts only
+ ```
+ </usage>
+
+ <process>
+
+ ## Step 1: Check ML Directory Exists
+
+ If `.ctx/ml/` does not exist:
+ ```
+ [ML Status] No ML project found.
+
+ Run /ctx:experiment new "<hypothesis>" to start.
+ ```
+ Exit.
+
+ ## Step 2: Parse Filter Argument
+
+ | Arg | Show |
+ |-----|------|
+ | none | Full dashboard |
+ | `models` | Model registry section only |
+ | `exp` | Experiment log section only |
+ | `drift` | Drift alerts section only |
+
+ ## Step 3: Read Files
+
+ Read the following files (skip silently if missing):
+
+ | File | Purpose |
+ |------|---------|
+ | `.ctx/ml/ML-STATUS.md` | Active experiment, current focus, blockers |
+ | `.ctx/ml/EXPERIMENT-LOG.md` | Experiment history table |
+ | `.ctx/ml/models/registry.yaml` | Model versions and metrics |
+ | `.ctx/ml/features/feature-registry.yaml` | Feature count and inventory |
+ | `.ctx/ml/experiments/*/artifacts/drift_alerts.json` | Any saved drift alerts |
+
+ ## Step 4: Render Dashboard
+
+ ### Full Dashboard Output
+
+ ```
+ [ML Status] {project_dir}
+ Updated: {ML-STATUS.md updated date}
+
+ ━━━ Active Experiment ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ {experiment_id} — {hypothesis title}
+ Status: {draft | running | concluded}
+ Phase: {hypothesize | design | train | review | done}
+
+ Current Focus:
+ {from ML-STATUS.md Current Focus section}
+
+ ━━━ Recent Experiments ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ {last 5 rows from EXPERIMENT-LOG.md, formatted as table}
+
+ ID Status Primary Metric Result
+ EXP-{n} running AUC >= 0.90 —
+ EXP-{n-1} accepted AUC >= 0.88 AUC 0.91
+ EXP-{n-2} rejected AUC >= 0.88 AUC 0.86
+ EXP-{n-3} accepted AUC >= 0.85 AUC 0.87
+
+ ━━━ Production Models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ {for each model in registry.yaml}
+ {model_name}: v{current} — {primary metric}: {value}
+ Promoted: {date} Experiment: {experiment_id}
+
+ ━━━ Features ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ {count} features registered
+ {count} features validated
+ {list feature names used by production models, comma-separated}
+
+ ━━━ Drift Alerts ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ {if no drift_alerts.json found}
+ No drift alerts.
+
+ {if drift alerts found}
+ {count} feature(s) drifted in latest check:
+ {feature}: KS={ks_stat}, p={pvalue} [{severity}]
+
+ ━━━ Blockers ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ {from ML-STATUS.md Blocking Issues section, or "none"}
+
+ ━━━ Next Steps ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+ {context-aware suggestions:}
+
+ If active experiment is draft:
+ Run /ctx:train to start training.
+
+ If active experiment is running:
+ Training in progress. Run /ctx:train again to check.
+
+ If active experiment is concluded (accepted):
+ Run /ctx:experiment new "<next hypothesis>" to iterate.
+
+ If active experiment is concluded (rejected):
+ Run /ctx:experiment new "<revised hypothesis>" based on learnings.
+
+ If no active experiment:
+ Run /ctx:experiment new "<hypothesis>" to start.
+
+ If drift alerts present:
+ Run /ctx:experiment new "retrain on updated data distribution" to address drift.
+ ```
+
+ ### Models-Only Output
+
+ ```
+ [ML Status] Production Models
+
+ {model_name}
+ Current: v{n}
+ {primary metric}: {value}
+ Promoted: {date}
+ Experiment: {experiment_id}
+ Promotion criteria: {from registry.yaml}
+
+ Version history:
+ v{n}: {primary metric}: {value} — production
+ v{n-1}: {primary metric}: {value} — retired
+ v{n-2}: {primary metric}: {value} — retired
+ ```
+
+ ### Experiments-Only Output
+
+ ```
+ [ML Status] Experiment Log
+
+ ID Hypothesis (truncated 60 chars) Model Metric Result Status
+ ─────────────────────────────────────────────────────────────────────────────────────────
+ EXP-{n} {hypothesis} {model} {metric} {result} {status}
+ ...
+
+ Total: {count} experiments ({accepted} accepted, {rejected} rejected, {running} running)
+ ```
+
+ ### Drift-Only Output
+
+ ```
+ [ML Status] Drift Alerts
+
+ Source: .ctx/ml/experiments/{latest_exp}/artifacts/drift_alerts.json
+ Checked: {file modified date}
+
+ {if no alerts}
+ No drift detected.
+
+ {if alerts}
+ Feature KS Stat p-value Severity
+ ──────────────────────────────────────────────
+ {feature} {stat} {pvalue} {high|medium}
+
+ Recommendation: Run /ctx:experiment new "retrain on updated distribution"
+ ```
+
+ ## Step 5: Handle Missing Files Gracefully
+
+ If a file is missing, show that section as empty with a hint:
+
+ ```
+ ━━━ Production Models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+ No models registered yet.
+ Models appear here after a /ctx:train run passes review.
+ ```
+
+ Never error on missing files — display what is available.
+
+ </process>
+
+ <guardrails>
+ - This command is read-only. It never writes files, spawns agents, or modifies state.
+ - Do not parse YAML strictly — if registry.yaml is malformed, show a warning and continue.
+ - Drift alert files are found by globbing .ctx/ml/experiments/*/artifacts/drift_alerts.json — show the most recently modified one.
+ - Truncate long hypothesis strings to 60 characters in table views.
+ - If ML-STATUS.md does not exist but EXPERIMENT-LOG.md does, derive status from the log.
+ </guardrails>
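
The drift-alert guardrail in the `ctx:ml-status` command above (glob the per-experiment artifact directories, then show the most recently modified file) could be implemented roughly as follows. This is a minimal Python sketch and not code from the package. The helper name `latest_drift_alerts` and its `root` parameter are hypothetical, and "most recently modified" is assumed to mean filesystem mtime.

```python
from pathlib import Path
from typing import Optional

# Glob pattern copied from the command's guardrails section.
DRIFT_GLOB = ".ctx/ml/experiments/*/artifacts/drift_alerts.json"


def latest_drift_alerts(root: Path) -> Optional[Path]:
    """Return the most recently modified drift_alerts.json under root, or None.

    Hypothetical helper for illustration; the package does not ship it.
    """
    candidates = [p for p in root.glob(DRIFT_GLOB) if p.is_file()]
    if not candidates:
        # A missing file is not an error: the dashboard would report
        # "No drift alerts." instead of failing.
        return None
    # Assumption: "most recently modified" means the largest mtime.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Returning `None` rather than raising mirrors the command's rule that missing files are never an error.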
package/commands/monitor.md +1 -1
@@ -4,7 +4,7 @@ description: Self-healing deployments - connect to error tracking (Sentry/LogRoc
  ---
 
  <objective>
- CTX 3.3 Self-Healing Deployments - Monitor production errors and automatically create fix stories or even auto-fix with PR creation.
+ CTX 3.5 Self-Healing Deployments - Monitor production errors and automatically create fix stories or even auto-fix with PR creation.
  </objective>
 
  <usage>
package/commands/train.md +266 -0
@@ -0,0 +1,266 @@
+ ---
+ name: ctx:train
+ description: ML model training workflow — feature engineering, training, HPO, evaluation, and registry promotion. Uses Digital Twin patterns from ctx-ml-pipeline skill.
+ args: experiment_id (optional — defaults to active experiment from ML-STATUS.md)
+ ---
+
+ <objective>
+ Orchestrate a full ML training run for a designed experiment. Loads config, spawns engineer to build/run the pipeline, spawns reviewer to validate results, and promotes to model registry if promotion criteria are met.
+
+ This command assumes the experiment already has HYPOTHESIS.md, DESIGN.md, and config.yaml. Run /ctx:experiment new first if not.
+ </objective>
+
+ <usage>
+ ```bash
+ /ctx:train # Train for active experiment (from ML-STATUS.md)
+ /ctx:train EXP-003 # Train for specific experiment
+ /ctx:train --dry-run # Validate config without running training
+ ```
+ </usage>
+
+ <process>
+
+ ## Step 1: Parse Arguments
+
+ - No args → read active experiment from `.ctx/ml/ML-STATUS.md`
+ - `EXP-{n}` → use that experiment ID
+ - `--dry-run` → validate config and design, report issues, do not train
+
+ If no active experiment and no ID given:
+ ```
+ [Train] No active experiment found.
+ Run /ctx:experiment new "<hypothesis>" to create one.
+ ```
+
+ ## Step 2: Validate Prerequisites
+
+ Check these files exist before proceeding:
+
+ ```bash
+ .ctx/ml/experiments/{id}/HYPOTHESIS.md → must exist
+ .ctx/ml/experiments/{id}/DESIGN.md → must exist
+ .ctx/ml/experiments/{id}/config.yaml → must exist
+ ```
+
+ If any are missing:
+ ```
+ [Train] Cannot run — missing required files for {experiment_id}:
+ - {missing file}
+
+ Run /ctx:experiment new to create them.
+ ```
+
+ Also check:
+ - `config.yaml` has a `data.train` path and it exists on disk
+ - `config.yaml` has a `model.type` set
+ - `config.yaml` has a `seed` set (reject if missing — non-reproducible)
+
+ ## Step 3: Dry Run (if --dry-run)
+
+ Read DESIGN.md acceptance criteria and config.yaml. Report:
+
+ ```
+ [Train] Dry run for {experiment_id}
+
+ Config:
+ Model: {model.type}
+ Params: {key params}
+ Data: {data paths}
+ Seed: {seed}
+
+ Design checks:
+ Primary metric: {metric} — {target}
+ Guard rails: {metrics}
+ Acceptance criteria: {count} items
+
+ Issues: {none | list any problems}
+
+ Ready to train: {yes | no}
+ ```
+
+ Exit after dry run. Do not spawn training agent.
+
+ ## Step 4: Update Experiment Status
+
+ Update `.ctx/ml/EXPERIMENT-LOG.md` — change row status from `draft` to `running`.
+ Update `.ctx/ml/ML-STATUS.md` — set active experiment and status to running.
+
+ ## Step 5: Spawn ctx-ml-engineer for Training
+
+ ```
+ Agent({
+ subagent_type: "ctx-ml-engineer",
+ prompt: |
+ Run ML training pipeline for {experiment_id}.
+
+ Read these files:
+ - .ctx/ml/experiments/{experiment_id}/config.yaml
+ - .ctx/ml/experiments/{experiment_id}/DESIGN.md
+ - .ctx/ml/features/feature-registry.yaml
+
+ Execute the full pipeline:
+
+ 1. DATA VALIDATION
+ - Load data from config.yaml data paths
+ - Apply Pandera schema validation (fail hard on violations)
+ - Log row counts and class distribution
+
+ 2. FEATURE PIPELINE
+ - Build transform pipeline from feature-registry.yaml
+ - Fit on training data only
+ - Transform train, val, test
+ - Save fitted pipeline to artifacts/pipeline.pkl
+
+ 3. TRAINING
+ - Train model from config.yaml model params
+ - Use early stopping (patience=20)
+ - Log training curve to artifacts/train.log
+
+ 4. HPO (if config.yaml has hpo: true)
+ - Run Optuna with n_trials from config or default 100
+ - Save study to artifacts/hpo_study.pkl
+ - Retrain with best params
+
+ 5. EVALUATION
+ - Compute primary metric and all guard rail metrics
+ - Compute calibration error
+ - Generate ROC curve, calibration curve, feature importance plots
+ - Save metrics to artifacts/metrics.json
+ - Save plots to artifacts/plots/
+
+ 6. CONFORMAL WRAPPER
+ - Fit MAPIE on calibration split at alpha=0.1
+ - Save to artifacts/mapie.pkl
+
+ 7. INFERENCE SMOKE TEST
+ - Load model + pipeline + mapie
+ - Run 5 predictions with full envelope
+ - Verify envelope structure is correct
+
+ Write artifacts/metrics.json with all metrics.
+ Write all artifacts per ctx-ml-pipeline skill reproducibility requirements.
+
+ Do NOT write RESULTS.md — the reviewer will do that.
+ Do NOT update the model registry — that happens after review passes.
+ })
+ ```
+
+ ## Step 6: Validate Training Artifacts
+
+ After engineer completes, verify these files exist:
+
+ ```
+ .ctx/ml/experiments/{id}/artifacts/
+ ├── model.pkl → required
+ ├── pipeline.pkl → required
+ ├── mapie.pkl → required
+ ├── config.yaml → required (copy of run config)
+ ├── metrics.json → required
+ ├── train.log → required
+ └── plots/ → required (at least roc_curve.png)
+ ```
+
+ If any required artifact is missing, report which and do not proceed to review.
+
+ ## Step 7: Spawn ctx-ml-reviewer for Evaluation
+
+ ```
+ Agent({
+ subagent_type: "ctx-ml-reviewer",
+ prompt: |
+ Review training results for {experiment_id}.
+
+ Read:
+ - .ctx/ml/experiments/{experiment_id}/HYPOTHESIS.md
+ - .ctx/ml/experiments/{experiment_id}/DESIGN.md
+ - .ctx/ml/experiments/{experiment_id}/artifacts/metrics.json
+ - .ctx/ml/experiments/{experiment_id}/artifacts/train.log
+ - .ctx/ml/models/registry.yaml (current production model metrics for comparison)
+
+ Review checklist:
+ - [ ] Primary metric meets DESIGN.md acceptance threshold
+ - [ ] Guard rail metrics not violated
+ - [ ] Training loss converged (check train.log — no NaN, no divergence)
+ - [ ] Calibration error < 0.05
+ - [ ] Primary metric improves on production model by promotion criteria
+
+ Write .ctx/ml/experiments/{experiment_id}/RESULTS.md with:
+ - Metrics table (baseline vs result vs delta)
+ - Verdict: accepted | rejected | inconclusive
+ - Key findings
+ - Next experiment recommendation
+
+ Return verdict as final line: VERDICT: accepted | rejected | inconclusive
+ })
+ ```
+
+ ## Step 8: Handle Verdict
+
+ Read reviewer verdict.
+
+ ### Verdict: accepted
+
+ 1. Determine next version: read current version from `models/registry.yaml`, increment.
+ 2. Update `models/registry.yaml`:
+ - Add new version entry with metrics, experiment ID, date, status: production
+ - Set previous production version status to: retired
+ - Set `current` to new version
+ 3. Update `EXPERIMENT-LOG.md` row status to: `accepted`
+ 4. Update `ML-STATUS.md` with outcome
+
+ Output:
+ ```
+ [Train] EXP-{n} accepted — model promoted to {name} v{version}
+
+ Metrics:
+ {primary}: {value} (was {baseline}, +{delta})
+ {guard}: {value} (was {baseline})
+
+ Artifacts: .ctx/ml/experiments/{experiment_id}/artifacts/
+ Registry: .ctx/ml/models/registry.yaml
+
+ Run /ctx:experiment new "<next hypothesis>" to continue.
+ ```
+
+ ### Verdict: rejected
+
+ 1. Update `EXPERIMENT-LOG.md` row status to: `rejected`
+ 2. Update `ML-STATUS.md` with outcome and reviewer's next recommendation
+
+ Output:
+ ```
+ [Train] EXP-{n} rejected — {reason from RESULTS.md}
+
+ Primary metric: {value} (target was {target})
+
+ Key findings:
+ {findings from RESULTS.md}
+
+ Next recommendation: {from reviewer}
+
+ Run /ctx:experiment new "<next hypothesis>" to iterate.
+ ```
+
+ ### Verdict: inconclusive
+
+ 1. Update `EXPERIMENT-LOG.md` row status to: `inconclusive`
+ 2. Report what is blocking a clear verdict
+
+ Output:
+ ```
+ [Train] EXP-{n} inconclusive — {reason}
+
+ Blocking issue: {issue}
+
+ Recommended action: {action}
+ ```
+
+ </process>
+
+ <guardrails>
+ - Never promote to registry without an accepted verdict from ctx-ml-reviewer.
+ - Never proceed to review if required training artifacts are missing.
+ - Seed is mandatory in config.yaml — non-reproducible runs are rejected.
+ - Dry run never touches EXPERIMENT-LOG.md or ML-STATUS.md.
+ - If training fails mid-run, update EXPERIMENT-LOG.md status to "failed" before exiting.
+ </guardrails>
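
Step 2 of the `ctx:train` command above lists the checks made before training: three required files per experiment, plus `data.train`, `model.type`, and `seed` in `config.yaml`. The following is a minimal Python sketch of those checks, not code from the package. It assumes the config has already been parsed into a nested dict (YAML parsing is left out to keep the example dependency-free) and that a relative `data.train` path resolves against the current working directory. The function name `check_prerequisites` and the wording of the issue strings are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List

# File names taken from Step 2 of the train command.
REQUIRED_FILES = ("HYPOTHESIS.md", "DESIGN.md", "config.yaml")


def check_prerequisites(exp_dir: Path, config: Dict[str, Any]) -> List[str]:
    """Return a list of issues; an empty list means the run may proceed.

    Hypothetical helper for illustration; `config` is the parsed config.yaml.
    """
    issues: List[str] = []

    for name in REQUIRED_FILES:
        if not (exp_dir / name).is_file():
            issues.append(f"missing required file: {name}")

    data = config.get("data")
    train_path = data.get("train") if isinstance(data, dict) else None
    if not train_path:
        issues.append("config.yaml has no data.train path")
    elif not Path(train_path).exists():
        # Assumption: relative paths resolve against the working directory.
        issues.append(f"data.train path does not exist: {train_path}")

    model = config.get("model")
    if not (isinstance(model, dict) and model.get("type")):
        issues.append("config.yaml has no model.type")

    # 0 is a valid seed, so test for presence rather than truthiness.
    if config.get("seed") is None:
        issues.append("config.yaml has no seed (run would be non-reproducible)")

    return issues
```

An empty result would correspond to `Ready to train: yes` in the dry-run report, and a non-empty one could populate its `Issues:` line.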