ctx-cc 3.5.0 → 4.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (74)
  1. package/README.md +375 -676
  2. package/agents/ctx-arch-mapper.md +5 -3
  3. package/agents/ctx-auditor.md +5 -3
  4. package/agents/ctx-codex-reviewer.md +214 -0
  5. package/agents/ctx-concerns-mapper.md +5 -3
  6. package/agents/ctx-criteria-suggester.md +6 -4
  7. package/agents/ctx-debugger.md +5 -3
  8. package/agents/ctx-designer.md +488 -114
  9. package/agents/ctx-discusser.md +5 -3
  10. package/agents/ctx-executor.md +5 -3
  11. package/agents/ctx-handoff.md +6 -4
  12. package/agents/ctx-learner.md +5 -3
  13. package/agents/ctx-mapper.md +4 -3
  14. package/agents/ctx-ml-analyst.md +600 -0
  15. package/agents/ctx-ml-engineer.md +933 -0
  16. package/agents/ctx-ml-reviewer.md +485 -0
  17. package/agents/ctx-ml-scientist.md +626 -0
  18. package/agents/ctx-parallelizer.md +4 -3
  19. package/agents/ctx-planner.md +5 -3
  20. package/agents/ctx-predictor.md +4 -3
  21. package/agents/ctx-qa.md +5 -3
  22. package/agents/ctx-quality-mapper.md +5 -3
  23. package/agents/ctx-researcher.md +5 -3
  24. package/agents/ctx-reviewer.md +6 -4
  25. package/agents/ctx-team-coordinator.md +5 -3
  26. package/agents/ctx-tech-mapper.md +5 -3
  27. package/agents/ctx-verifier.md +5 -3
  28. package/bin/ctx.js +199 -27
  29. package/commands/brand.md +309 -0
  30. package/commands/ctx.md +10 -10
  31. package/commands/design.md +304 -0
  32. package/commands/experiment.md +251 -0
  33. package/commands/help.md +57 -7
  34. package/commands/init.md +25 -0
  35. package/commands/metrics.md +1 -1
  36. package/commands/milestone.md +1 -1
  37. package/commands/ml-status.md +197 -0
  38. package/commands/monitor.md +1 -1
  39. package/commands/train.md +266 -0
  40. package/commands/visual-qa.md +559 -0
  41. package/commands/voice.md +1 -1
  42. package/hooks/post-tool-use.js +39 -0
  43. package/hooks/pre-tool-use.js +94 -0
  44. package/hooks/subagent-stop.js +32 -0
  45. package/package.json +9 -3
  46. package/plugin.json +46 -0
  47. package/skills/ctx-design-system/SKILL.md +572 -0
  48. package/skills/ctx-ml-experiment/SKILL.md +334 -0
  49. package/skills/ctx-ml-pipeline/SKILL.md +437 -0
  50. package/skills/ctx-orchestrator/SKILL.md +91 -0
  51. package/skills/ctx-review-gate/SKILL.md +147 -0
  52. package/skills/ctx-state/SKILL.md +100 -0
  53. package/skills/ctx-visual-qa/SKILL.md +587 -0
  54. package/src/agents.js +109 -0
  55. package/src/auto.js +287 -0
  56. package/src/capabilities.js +226 -0
  57. package/src/commits.js +94 -0
  58. package/src/config.js +112 -0
  59. package/src/context.js +241 -0
  60. package/src/handoff.js +156 -0
  61. package/src/hooks.js +218 -0
  62. package/src/install.js +125 -50
  63. package/src/lifecycle.js +194 -0
  64. package/src/metrics.js +198 -0
  65. package/src/pipeline.js +269 -0
  66. package/src/review-gate.js +338 -0
  67. package/src/runner.js +120 -0
  68. package/src/skills.js +143 -0
  69. package/src/state.js +267 -0
  70. package/src/worktree.js +244 -0
  71. package/templates/PRD.json +1 -1
  72. package/templates/config.json +4 -237
  73. package/workflows/ctx-router.md +0 -485
  74. package/workflows/map-codebase.md +0 -329
@@ -0,0 +1,334 @@
+ ---
+ name: ctx-ml-experiment
+ description: |
+   WHEN: User wants to run ML experiments, track hypotheses, compare models, analyze results, tune hyperparameters, or manage the experiment loop (hypothesize → design → implement → run → analyze → iterate).
+   WHEN NOT: Software development, UI design, non-ML coding tasks, infrastructure work unrelated to model training or evaluation.
+ ---
+
+ # CTX ML Experiment — Hypothesis-Driven Experiment Lifecycle
+
+ You manage the full ML experiment lifecycle: from hypothesis formation through result analysis and iteration. Every experiment is persisted, comparable, and traceable.
+
+ ## Core Principle
+
+ No ad-hoc runs. Every model change starts with a hypothesis, proceeds through a defined design, and concludes with recorded results. The experiment log is the ground truth.
+
+ ## Experiment Lifecycle
+
+ ```
+ hypothesize → design → implement → run → analyze → iterate
+      ↑                                        │
+      └────────── learn from results ──────────┘
+ ```
+
+ Never skip phases. A run without a hypothesis is noise, not science.
+
+ ## Directory Structure
+
+ Bootstrap `.ctx/ml/` if it does not exist:
+
+ ```
+ .ctx/ml/
+ ├── experiments/
+ │   ├── EXP-001/
+ │   │   ├── HYPOTHESIS.md       # What we believe and why
+ │   │   ├── DESIGN.md           # How to test it
+ │   │   ├── config.yaml         # Reproducible run config
+ │   │   ├── RESULTS.md          # Actual outcomes vs expected
+ │   │   └── artifacts/          # Model checkpoints, plots, logs
+ │   └── EXPERIMENT-LOG.md       # Running table of all experiments
+ ├── analysis/
+ │   ├── EDA-<dataset>.md        # Exploratory data analysis
+ │   └── plots/
+ ├── features/
+ │   ├── feature-registry.yaml   # All features, transforms, lineage
+ │   └── transforms/             # Reusable transform definitions
+ ├── models/
+ │   ├── registry.yaml           # Model versions, metrics, lineage
+ │   └── configs/                # Canonical model configs
+ └── ML-STATUS.md                # Current ML project status
+ ```
+
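A minimal Python sketch of the bootstrap step, assuming the layout shown above. The function name `bootstrap_ml_dir` and the placeholder text written to the seed files are assumptions made for illustration; the skill does not prescribe either.

```python
from pathlib import Path

# Directories and seed files mirror the tree above. The seed contents are
# placeholders assumed for this sketch, not taken from the package.
ML_DIRS = ["experiments", "analysis/plots", "features/transforms", "models/configs"]
SEED_FILES = {
    "experiments/EXPERIMENT-LOG.md": "# ML Experiment Log\n",
    "features/feature-registry.yaml": "features: {}\n",
    "models/registry.yaml": "models: {}\n",
    "ML-STATUS.md": "# ML Project Status\n",
}


def bootstrap_ml_dir(project_root: Path) -> Path:
    """Create .ctx/ml/ and its skeleton if missing; never overwrite files."""
    ml_root = project_root / ".ctx" / "ml"
    for rel in ML_DIRS:
        (ml_root / rel).mkdir(parents=True, exist_ok=True)
    for rel, content in SEED_FILES.items():
        path = ml_root / rel
        if not path.exists():  # keep anything already recorded
            path.write_text(content, encoding="utf-8")
    return ml_root
```

Calling the function a second time is harmless: existing directories and files are left untouched.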
+ ## Experiment ID Convention
+
+ IDs are sequential: `EXP-001`, `EXP-002`, etc. Read the current max from `EXPERIMENT-LOG.md` and increment. Never reuse IDs.
+
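The increment rule could be implemented roughly as follows. This is a sketch that assumes the log text contains IDs in the `EXP-NNN` form; `next_experiment_id` is a hypothetical helper name.

```python
import re

# Matches concrete IDs such as EXP-001; template placeholders like EXP-{id}
# do not match because they contain no digits.
EXP_ID = re.compile(r"\bEXP-(\d{3,})\b")


def next_experiment_id(log_text: str) -> str:
    """Return the ID after the highest one found in the log text."""
    numbers = [int(m.group(1)) for m in EXP_ID.finditer(log_text)]
    next_number = max(numbers, default=0) + 1
    return f"EXP-{next_number:03d}"
```

Taking the maximum rather than counting rows keeps IDs unique even if the log has gaps.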
+ ## HYPOTHESIS.md Format
+
+ ```markdown
+ # EXP-{id}: {Short Hypothesis Title}
+
+ **Created**: {ISO date}
+ **Author**: {agent or human}
+ **Status**: draft | running | concluded
+
+ ## Hypothesis
+
+ {One sentence: "We believe that X will improve Y by Z because of W."}
+
+ ## Rationale
+
+ - {Observation or data point that motivates this}
+ - {Prior experiment result that informs this}
+ - {Domain knowledge or literature support}
+
+ ## Expected Outcome
+
+ - Primary metric: {metric} improves from {baseline} to {target}
+ - Guard rail: {metric} does not degrade below {threshold}
+
+ ## Null Hypothesis
+
+ No meaningful difference between {treatment} and {control} on {metric}.
+
+ ## Risk
+
+ {What would make this hypothesis wrong? What could go wrong?}
+ ```
+
+ ## DESIGN.md Format
+
+ ```markdown
+ # EXP-{id}: Experiment Design
+
+ ## Setup
+
+ | Property | Value |
+ |----------|-------|
+ | Baseline | {model/config used as control} |
+ | Treatment | {what changes} |
+ | Dataset | {train/val/test splits, sizes} |
+ | Random seed | {seed value} |
+
+ ## Changes from Baseline
+
+ - {File or config}: {change description}
+
+ ## Metrics
+
+ | Metric | Direction | Threshold | Notes |
+ |--------|-----------|-----------|-------|
+ | {primary} | maximize | {value} | promotion gate |
+ | {guard} | minimize | {value} | regression gate |
+
+ ## Evaluation Protocol
+
+ 1. {Step 1}
+ 2. {Step 2}
+ 3. {Step 3}
+
+ ## Acceptance Criteria
+
+ - [ ] Primary metric meets threshold
+ - [ ] Guard rail metrics not violated
+ - [ ] Training is stable (no loss explosion, no NaN)
+ - [ ] Inference latency within budget
+ ```
+
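The Direction and Threshold columns of the Metrics table imply a simple gate per row. A possible reading in Python, where `metric_passes` is a hypothetical helper and treating the threshold as inclusive is an assumption:

```python
def metric_passes(value: float, direction: str, threshold: float) -> bool:
    """Evaluate one Metrics-table row; the threshold is treated as inclusive."""
    if direction == "maximize":
        return value >= threshold
    if direction == "minimize":
        return value <= threshold
    raise ValueError(f"unknown direction: {direction!r}")
```

For example, a primary metric of 0.82 against a `maximize` threshold of 0.85 fails the promotion gate.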
+ ## config.yaml Format
+
+ ```yaml
+ experiment_id: EXP-001
+ hypothesis: "XGBoost with depth=6 outperforms depth=4 baseline"
+ created: "2026-03-25"
+
+ data:
+   train: data/train_v2.parquet
+   val: data/val_v2.parquet
+   test: data/test_v2.parquet
+   seed: 42
+
+ model:
+   type: xgboost
+   params:
+     max_depth: 6
+     n_estimators: 300
+     learning_rate: 0.05
+     subsample: 0.8
+     colsample_bytree: 0.8
+
+ evaluation:
+   primary_metric: auc
+   guard_metrics: [precision, recall]
+   cv_folds: 5
+
+ artifacts:
+   model_path: artifacts/model.pkl
+   plots: artifacts/plots/
+   logs: artifacts/train.log
+ ```
+
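Before a run, the config could be checked for the fields a reproducible run depends on. The sketch below assumes the YAML has already been parsed into a nested `dict` by whatever parser the project uses; the list of required keys is an assumption drawn from the example above, not a schema defined by the package.

```python
# Key paths assumed to be mandatory, based on the example config above.
REQUIRED_KEYS = [
    ("experiment_id",),
    ("hypothesis",),
    ("data", "train"),
    ("data", "val"),
    ("data", "test"),
    ("data", "seed"),
    ("model", "type"),
    ("evaluation", "primary_metric"),
]


def missing_keys(config: dict) -> list:
    """Return dotted paths of required keys absent from a parsed config."""
    missing = []
    for path in REQUIRED_KEYS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

An empty list means every assumed field is present; anything else names exactly what to add before committing the file.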
+ ## RESULTS.md Format
+
+ ```markdown
+ # EXP-{id}: Results
+
+ **Concluded**: {ISO date}
+ **Status**: accepted | rejected | inconclusive
+
+ ## Outcome
+
+ | Metric | Baseline | Result | Delta | Pass? |
+ |--------|----------|--------|-------|-------|
+ | {primary} | {value} | {value} | {+/-} | yes/no |
+ | {guard} | {value} | {value} | {+/-} | yes/no |
+
+ ## Verdict
+
+ {accepted | rejected | inconclusive} — {One sentence reason}
+
+ ## Key Findings
+
+ - {Finding 1}
+ - {Finding 2}
+
+ ## What This Tells Us
+
+ {2-3 sentences on what we learned, not just the metric deltas.}
+
+ ## Next Experiment
+
+ {What should EXP-{n+1} test, given these results?}
+ ```
+
+ ## EXPERIMENT-LOG.md Format
+
+ ```markdown
+ # ML Experiment Log
+
+ | ID | Hypothesis | Model | Primary Metric | Result | Status |
+ |----|-----------|-------|---------------|--------|--------|
+ | EXP-001 | XGBoost > baseline | XGBoost depth=4 | AUC 0.85 target | AUC 0.82 | rejected |
+ | EXP-002 | Feature X improves AUC | XGBoost+feat_x | AUC 0.87 target | AUC 0.87 | accepted |
+ | EXP-003 | HPO on accepted config | XGBoost+feat_x tuned | AUC 0.90 target | running | running |
+ ```
+
+ Always append — never delete rows. Mark rejected experiments as `rejected`, not removed.
+
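The append-only rule can be kept mechanical by only ever adding text to the end of the log. A sketch, assuming the six-column table shown above; `append_log_row` is a hypothetical helper that works on the log text and leaves file I/O to the caller.

```python
def append_log_row(log_text: str, exp_id: str, hypothesis: str, model: str,
                   primary_metric: str, result: str, status: str) -> str:
    """Return the log with one new table row appended; existing rows are untouched."""
    cells = [exp_id, hypothesis, model, primary_metric, result, status]
    for cell in cells:
        # A pipe or newline inside a cell would break the Markdown table.
        if "|" in cell or "\n" in cell:
            raise ValueError(f"cell contains a table delimiter: {cell!r}")
    row = "| " + " | ".join(cells) + " |"
    return log_text.rstrip("\n") + "\n" + row + "\n"
```

Because the function never rewrites earlier lines, a rejected experiment stays in the table with its `rejected` status.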
+ ## Model Registry Format (models/registry.yaml)
+
+ ```yaml
+ models:
+   risk-classifier:
+     current: v3
+     versions:
+       v1:
+         metrics: { auc: 0.82, precision: 0.79 }
+         experiment: EXP-001
+         date: "2026-01-15"
+         status: retired
+         artifacts: .ctx/ml/experiments/EXP-001/artifacts/
+       v2:
+         metrics: { auc: 0.87, precision: 0.84 }
+         experiment: EXP-002
+         date: "2026-02-01"
+         status: retired
+         artifacts: .ctx/ml/experiments/EXP-002/artifacts/
+       v3:
+         metrics: { auc: 0.91, precision: 0.88 }
+         experiment: EXP-005
+         date: "2026-03-10"
+         status: production
+         artifacts: .ctx/ml/experiments/EXP-005/artifacts/
+     promotion_criteria:
+       primary: "auc >= current + 0.02"
+       guard: "precision regression <= 0.01"
+       stability: "training loss converges within 50 epochs"
+ ```
+
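The `primary` and `guard` promotion criteria translate directly into two comparisons. The following sketch assumes metrics are passed as plain dicts shaped like the `metrics` entries above; `meets_promotion_criteria` is a hypothetical name, and the `stability` criterion is deliberately not checked because it needs the training history rather than final metrics.

```python
def meets_promotion_criteria(current: dict, candidate: dict,
                             min_auc_gain: float = 0.02,
                             max_precision_drop: float = 0.01) -> bool:
    """Check the primary and guard criteria from the example registry."""
    # primary: "auc >= current + 0.02"
    auc_ok = candidate["auc"] >= current["auc"] + min_auc_gain
    # guard: "precision regression <= 0.01"
    precision_ok = current["precision"] - candidate["precision"] <= max_precision_drop
    return auc_ok and precision_ok
```

With the example values, v3 (AUC 0.91, precision 0.88) clears both checks against v2 (AUC 0.87, precision 0.84). Values that sit exactly on a boundary are subject to floating-point rounding, so a small tolerance may be worth adding in practice.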
+ ## Feature Registry Format (features/feature-registry.yaml)
+
+ ```yaml
+ features:
+   age_z_score:
+     type: numeric
+     source: demographics
+     transform: "(age - mean) / std"
+     version: 1
+     created: "2026-01-15"
+     used_by: [risk-classifier, bio-age-model]
+     validated: true
+     notes: "mean=45.2, std=12.1 from train_v1"
+
+   cholesterol_ratio:
+     type: numeric
+     source: labs
+     transform: "hdl / ldl"
+     version: 2
+     created: "2026-02-10"
+     used_by: [risk-classifier]
+     validated: true
+     notes: "v2 clips outliers at 99th percentile"
+ ```
+
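The two `transform` expressions above could live as small functions under `features/transforms/`. This is a sketch: the function signatures are assumptions, the defaults for `mean` and `std` come from the `notes` field of `age_z_score`, and `clip_max` is a hypothetical parameter standing in for the 99th-percentile value, which the registry entry does not state.

```python
from typing import Optional


def age_z_score(age: float, mean: float = 45.2, std: float = 12.1) -> float:
    """transform: "(age - mean) / std", with statistics fixed from train_v1."""
    return (age - mean) / std


def cholesterol_ratio(hdl: float, ldl: float,
                      clip_max: Optional[float] = None) -> float:
    """transform: "hdl / ldl", optionally capped at a precomputed upper bound."""
    if ldl == 0:
        raise ValueError("ldl must be non-zero")
    ratio = hdl / ldl
    return ratio if clip_max is None else min(ratio, clip_max)
```

Freezing the statistics from the training split, rather than recomputing them per dataset, is what keeps the feature consistent between training and inference.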
+ ## ML-STATUS.md Format
+
+ ```markdown
+ # ML Project Status
+
+ **Updated**: {ISO date}
+ **Active Experiment**: EXP-{n} — {hypothesis title}
+
+ ## Current Focus
+
+ {1-2 sentences on what we are trying to solve right now}
+
+ ## Recent Results
+
+ | EXP | Outcome | Key Learning |
+ |-----|---------|--------------|
+ | EXP-{n-1} | accepted | {learning} |
+ | EXP-{n-2} | rejected | {learning} |
+
+ ## Blocking Issues
+
+ - {Issue if any, else "none"}
+
+ ## Next Experiments Queued
+
+ 1. {EXP-{n+1}} — {hypothesis}
+ 2. {EXP-{n+2}} — {hypothesis}
+ ```
+
+ ## Workflow Rules
+
+ 1. Every experiment gets its own numbered directory. No experiments in ad-hoc files.
+ 2. `config.yaml` must be committed before running. Results are not reproducible without it.
+ 3. Record results even for rejected experiments. Negative results have value.
+ 4. Update `EXPERIMENT-LOG.md` after every concluded experiment — keep it current.
+ 5. Update `ML-STATUS.md` whenever the active experiment changes.
+ 6. Never promote a model to registry without a concluded RESULTS.md with accepted status.
+ 7. Feature registry is append-only with versioning. Old feature versions stay in the file.
+
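Rule 6 can be enforced with a small check on the `**Status**` line of `RESULTS.md`. A sketch assuming the RESULTS.md format defined in this file; `results_status` and `can_promote` are hypothetical helper names.

```python
import re
from typing import Optional

# Matches a filled-in status line such as "**Status**: accepted". The unfilled
# template line ("accepted | rejected | inconclusive") does not match.
STATUS_LINE = re.compile(r"^\*\*Status\*\*:\s*(\w+)\s*$", re.MULTILINE)


def results_status(results_md: str) -> Optional[str]:
    """Return the recorded status in lowercase, or None if it is not filled in."""
    match = STATUS_LINE.search(results_md)
    return match.group(1).lower() if match else None


def can_promote(results_md: str) -> bool:
    """Allow promotion only when RESULTS.md records an accepted status."""
    return results_status(results_md) == "accepted"
```

A missing file or an untouched template both resolve to "not promotable", which is the conservative default the rule calls for.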
+ ## Agent Spawn Patterns
+
+ When orchestrating experiment work, spawn agents appropriate to the phase:
+
+ ```
+ Phase: hypothesize → spawn ctx-ml-scientist
+   prompt: "Form a hypothesis for EXP-{n}. Context: {prior results}. Write HYPOTHESIS.md."
+
+ Phase: design → spawn ctx-ml-scientist
+   prompt: "Design the experiment for EXP-{n}. Write DESIGN.md and config.yaml."
+
+ Phase: implement → spawn ctx-ml-engineer
+   prompt: "Implement training script for EXP-{n} using config.yaml."
+
+ Phase: analyze → spawn ctx-ml-analyst
+   prompt: "Analyze results from EXP-{n} run. Write RESULTS.md. Update EXPERIMENT-LOG.md."
+
+ Phase: review → spawn ctx-ml-reviewer
+   prompt: "Review EXP-{n} results. Recommend: accept, reject, or run follow-up."
+ ```
+
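The phase-to-agent mapping above is essentially a lookup table. The sketch below only builds the agent name and prompt; how an agent is actually spawned is outside this file, so `spawn_request` is a hypothetical helper that returns a plain dict and calls nothing. The placeholders `{exp_id}` and `{context}` stand in for `EXP-{n}` and `{prior results}`.

```python
# Agent names and prompt wording are copied from the patterns above.
PHASE_AGENTS = {
    "hypothesize": ("ctx-ml-scientist",
                    "Form a hypothesis for {exp_id}. Context: {context}. Write HYPOTHESIS.md."),
    "design": ("ctx-ml-scientist",
               "Design the experiment for {exp_id}. Write DESIGN.md and config.yaml."),
    "implement": ("ctx-ml-engineer",
                  "Implement training script for {exp_id} using config.yaml."),
    "analyze": ("ctx-ml-analyst",
                "Analyze results from {exp_id} run. Write RESULTS.md. Update EXPERIMENT-LOG.md."),
    "review": ("ctx-ml-reviewer",
               "Review {exp_id} results. Recommend: accept, reject, or run follow-up."),
}


def spawn_request(phase: str, exp_id: str, context: str = "") -> dict:
    """Build the agent name and prompt for a phase; raises KeyError if unknown."""
    agent, template = PHASE_AGENTS[phase]
    return {"agent": agent, "prompt": template.format(exp_id=exp_id, context=context)}
```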
+ ## Iteration Decision Tree
+
+ After `analyze`:
+
+ ```
+ Result meets primary metric AND guards hold?
+   YES → Accept. Update model registry. Queue follow-up if headroom remains.
+   NO (primary miss) → Reject. Document learnings. Form new hypothesis.
+   NO (guard violated) → Inconclusive. Address regression before retrying.
+   Unclear → Run validation on holdout set before deciding.
+ ```
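The decision tree can be written as a small function. In this sketch `None` represents the "Unclear" branch, and when both the primary metric misses and a guard is violated the primary miss takes precedence; that ordering follows the order of the branches above but is an assumption, since the tree does not cover the combined case. `iteration_decision` is a hypothetical name.

```python
from typing import Optional


def iteration_decision(primary_met: Optional[bool],
                       guards_hold: Optional[bool]) -> str:
    """Map analysis outcomes to the next step; None means the outcome is unclear."""
    if primary_met is None or guards_hold is None:
        return "validate on holdout"
    if primary_met and guards_hold:
        return "accepted"
    if not primary_met:
        return "rejected"  # primary miss wins over a guard violation (assumed)
    return "inconclusive"  # primary met, but a guard metric regressed
```

The returned strings match the status values used in `RESULTS.md`, apart from the holdout step, which defers the verdict.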