ctx-cc 3.5.0 → 4.0.0
- package/README.md +34 -289
- package/agents/ctx-arch-mapper.md +5 -3
- package/agents/ctx-auditor.md +5 -3
- package/agents/ctx-concerns-mapper.md +5 -3
- package/agents/ctx-criteria-suggester.md +6 -4
- package/agents/ctx-debugger.md +5 -3
- package/agents/ctx-designer.md +488 -114
- package/agents/ctx-discusser.md +5 -3
- package/agents/ctx-executor.md +5 -3
- package/agents/ctx-handoff.md +6 -4
- package/agents/ctx-learner.md +5 -3
- package/agents/ctx-mapper.md +4 -3
- package/agents/ctx-ml-analyst.md +600 -0
- package/agents/ctx-ml-engineer.md +933 -0
- package/agents/ctx-ml-reviewer.md +485 -0
- package/agents/ctx-ml-scientist.md +626 -0
- package/agents/ctx-parallelizer.md +4 -3
- package/agents/ctx-planner.md +5 -3
- package/agents/ctx-predictor.md +4 -3
- package/agents/ctx-qa.md +5 -3
- package/agents/ctx-quality-mapper.md +5 -3
- package/agents/ctx-researcher.md +5 -3
- package/agents/ctx-reviewer.md +6 -4
- package/agents/ctx-team-coordinator.md +5 -3
- package/agents/ctx-tech-mapper.md +5 -3
- package/agents/ctx-verifier.md +5 -3
- package/bin/ctx.js +168 -27
- package/commands/brand.md +309 -0
- package/commands/design.md +304 -0
- package/commands/experiment.md +251 -0
- package/commands/help.md +57 -7
- package/commands/metrics.md +1 -1
- package/commands/milestone.md +1 -1
- package/commands/ml-status.md +197 -0
- package/commands/monitor.md +1 -1
- package/commands/train.md +266 -0
- package/commands/visual-qa.md +559 -0
- package/commands/voice.md +1 -1
- package/hooks/post-tool-use.js +39 -0
- package/hooks/pre-tool-use.js +93 -0
- package/hooks/subagent-stop.js +32 -0
- package/package.json +9 -3
- package/plugin.json +45 -0
- package/skills/ctx-design-system/SKILL.md +572 -0
- package/skills/ctx-ml-experiment/SKILL.md +334 -0
- package/skills/ctx-ml-pipeline/SKILL.md +437 -0
- package/skills/ctx-orchestrator/SKILL.md +91 -0
- package/skills/ctx-review-gate/SKILL.md +111 -0
- package/skills/ctx-state/SKILL.md +100 -0
- package/skills/ctx-visual-qa/SKILL.md +587 -0
- package/src/agents.js +109 -0
- package/src/auto.js +287 -0
- package/src/capabilities.js +171 -0
- package/src/commits.js +94 -0
- package/src/config.js +112 -0
- package/src/context.js +241 -0
- package/src/handoff.js +156 -0
- package/src/hooks.js +218 -0
- package/src/install.js +119 -51
- package/src/lifecycle.js +194 -0
- package/src/metrics.js +198 -0
- package/src/pipeline.js +269 -0
- package/src/review-gate.js +244 -0
- package/src/runner.js +120 -0
- package/src/skills.js +143 -0
- package/src/state.js +267 -0
- package/src/worktree.js +244 -0
- package/templates/PRD.json +1 -1
- package/templates/config.json +1 -237
- package/workflows/ctx-router.md +0 -485
- package/workflows/map-codebase.md +0 -329
package/commands/ml-status.md
ADDED
|
@@ -0,0 +1,197 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ctx:ml-status
|
|
3
|
+
description: Show ML project status — experiments, models, features, drift alerts. Read-only dashboard. No agents spawned.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
<objective>
|
|
7
|
+
Display a concise ML project dashboard. Read-only. Shows active experiment, recent experiment history, production model versions, feature count, and any drift alerts. No writes, no agents.
|
|
8
|
+
</objective>
|
|
9
|
+
|
|
10
|
+
<usage>
|
|
11
|
+
```bash
|
|
12
|
+
/ctx:ml-status # Full dashboard
|
|
13
|
+
/ctx:ml-status models # Model registry only
|
|
14
|
+
/ctx:ml-status exp # Experiment log only
|
|
15
|
+
/ctx:ml-status drift # Drift alerts only
|
|
16
|
+
```
|
|
17
|
+
</usage>
|
|
18
|
+
|
|
19
|
+
<process>
|
|
20
|
+
|
|
21
|
+
## Step 1: Check ML Directory Exists
|
|
22
|
+
|
|
23
|
+
If `.ctx/ml/` does not exist:
|
|
24
|
+
```
|
|
25
|
+
[ML Status] No ML project found.
|
|
26
|
+
|
|
27
|
+
Run /ctx:experiment new "<hypothesis>" to start.
|
|
28
|
+
```
|
|
29
|
+
Exit.
|
|
30
|
+
|
|
31
|
+
## Step 2: Parse Filter Argument
|
|
32
|
+
|
|
33
|
+
| Arg | Show |
|
|
34
|
+
|-----|------|
|
|
35
|
+
| none | Full dashboard |
|
|
36
|
+
| `models` | Model registry section only |
|
|
37
|
+
| `exp` | Experiment log section only |
|
|
38
|
+
| `drift` | Drift alerts section only |
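The filter table above can be sketched as a small lookup. This is a minimal illustration, not the package's implementation: `parse_filter` and `SECTIONS` are hypothetical names, and rejecting unknown filters is an assumption, since the command file does not say how they are handled.

```python
from typing import Optional

# Filter names come from the Step 2 table; the labels are illustrative only.
SECTIONS = {
    "models": "Model registry",
    "exp": "Experiment log",
    "drift": "Drift alerts",
}


def parse_filter(arg: Optional[str]) -> str:
    """Return 'full' when no argument is given, else the matching section key."""
    if arg is None or not arg.strip():
        return "full"
    key = arg.strip().lower()
    if key not in SECTIONS:
        # Assumption: unknown filters are rejected. The source does not specify this.
        raise ValueError(f"unknown filter: {arg!r}")
    return key
```

For example, `parse_filter(None)` selects the full dashboard and `parse_filter("drift")` selects only the drift section.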
|
|
39
|
+
|
|
40
|
+
## Step 3: Read Files
|
|
41
|
+
|
|
42
|
+
Read the following files. A missing file is not an error; skip it and render its section as described in Step 5:
|
|
43
|
+
|
|
44
|
+
| File | Purpose |
|
|
45
|
+
|------|---------|
|
|
46
|
+
| `.ctx/ml/ML-STATUS.md` | Active experiment, current focus, blockers |
|
|
47
|
+
| `.ctx/ml/EXPERIMENT-LOG.md` | Experiment history table |
|
|
48
|
+
| `.ctx/ml/models/registry.yaml` | Model versions and metrics |
|
|
49
|
+
| `.ctx/ml/features/feature-registry.yaml` | Feature count and inventory |
|
|
50
|
+
| `.ctx/ml/experiments/*/artifacts/drift_alerts.json` | Any saved drift alerts |
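One way to read these files without failing on missing ones is sketched below. The helper names (`read_optional`, `load_ml_files`) and the dictionary keys are hypothetical; only the relative paths come from the table above. The drift alert glob is handled separately and is not covered here.

```python
from pathlib import Path
from typing import Dict, Optional

# Relative paths are taken from the Step 3 table; the keys are illustrative.
ML_FILES = {
    "status": "ML-STATUS.md",
    "log": "EXPERIMENT-LOG.md",
    "models": "models/registry.yaml",
    "features": "features/feature-registry.yaml",
}


def read_optional(path: Path) -> Optional[str]:
    """Return the file's text, or None when the file is absent."""
    try:
        return path.read_text(encoding="utf-8")
    except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
        return None


def load_ml_files(ml_dir: Path) -> Dict[str, Optional[str]]:
    """Load every known file under ml_dir; missing files map to None."""
    return {key: read_optional(ml_dir / rel) for key, rel in ML_FILES.items()}
```

A caller would pass `Path(".ctx/ml")` and render an empty section wherever the value is `None`.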
|
|
51
|
+
|
|
52
|
+
## Step 4: Render Dashboard
|
|
53
|
+
|
|
54
|
+
### Full Dashboard Output
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
[ML Status] {project_dir}
|
|
58
|
+
Updated: {ML-STATUS.md updated date}
|
|
59
|
+
|
|
60
|
+
━━━ Active Experiment ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
61
|
+
|
|
62
|
+
{experiment_id} — {hypothesis title}
|
|
63
|
+
Status: {draft | running | concluded}
|
|
64
|
+
Phase: {hypothesize | design | train | review | done}
|
|
65
|
+
|
|
66
|
+
Current Focus:
|
|
67
|
+
{from ML-STATUS.md Current Focus section}
|
|
68
|
+
|
|
69
|
+
━━━ Recent Experiments ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
70
|
+
|
|
71
|
+
{last 5 rows from EXPERIMENT-LOG.md, formatted as table}
|
|
72
|
+
|
|
73
|
+
ID Status Primary Metric Result
|
|
74
|
+
EXP-{n} running AUC >= 0.90 —
|
|
75
|
+
EXP-{n-1} accepted AUC >= 0.88 AUC 0.91
|
|
76
|
+
EXP-{n-2} rejected AUC >= 0.88 AUC 0.86
|
|
77
|
+
EXP-{n-3} accepted AUC >= 0.85 AUC 0.87
|
|
78
|
+
|
|
79
|
+
━━━ Production Models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
80
|
+
|
|
81
|
+
{for each model in registry.yaml}
|
|
82
|
+
{model_name}: v{current} — {primary metric}: {value}
|
|
83
|
+
Promoted: {date} Experiment: {experiment_id}
|
|
84
|
+
|
|
85
|
+
━━━ Features ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
86
|
+
|
|
87
|
+
{count} features registered
|
|
88
|
+
{count} features validated
|
|
89
|
+
{list feature names used by production models, comma-separated}
|
|
90
|
+
|
|
91
|
+
━━━ Drift Alerts ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
92
|
+
|
|
93
|
+
{if no drift_alerts.json found}
|
|
94
|
+
No drift alerts.
|
|
95
|
+
|
|
96
|
+
{if drift alerts found}
|
|
97
|
+
{count} feature(s) drifted in latest check:
|
|
98
|
+
{feature}: KS={ks_stat}, p={pvalue} [{severity}]
|
|
99
|
+
|
|
100
|
+
━━━ Blockers ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
101
|
+
|
|
102
|
+
{from ML-STATUS.md Blocking Issues section, or "none"}
|
|
103
|
+
|
|
104
|
+
━━━ Next Steps ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
105
|
+
|
|
106
|
+
{context-aware suggestions:}
|
|
107
|
+
|
|
108
|
+
If active experiment is draft:
|
|
109
|
+
Run /ctx:train to start training.
|
|
110
|
+
|
|
111
|
+
If active experiment is running:
|
|
112
|
+
Training in progress. Run /ctx:train again to check.
|
|
113
|
+
|
|
114
|
+
If active experiment is concluded (accepted):
|
|
115
|
+
Run /ctx:experiment new "<next hypothesis>" to iterate.
|
|
116
|
+
|
|
117
|
+
If active experiment is concluded (rejected):
|
|
118
|
+
Run /ctx:experiment new "<revised hypothesis>" based on learnings.
|
|
119
|
+
|
|
120
|
+
If no active experiment:
|
|
121
|
+
Run /ctx:experiment new "<hypothesis>" to start.
|
|
122
|
+
|
|
123
|
+
If drift alerts present:
|
|
124
|
+
Run /ctx:experiment new "retrain on updated data distribution" to address drift.
|
|
125
|
+
```
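The Next Steps rules in the template above amount to a small decision function. The sketch below is an assumption about how they could be encoded: `next_steps` and its parameters are hypothetical, while the suggestion strings are copied from the template.

```python
from typing import List, Optional


def next_steps(status: Optional[str], verdict: Optional[str] = None,
               has_drift: bool = False) -> List[str]:
    """Map the active experiment's state to the suggestions in the template."""
    steps: List[str] = []
    if status is None:
        steps.append('Run /ctx:experiment new "<hypothesis>" to start.')
    elif status == "draft":
        steps.append("Run /ctx:train to start training.")
    elif status == "running":
        steps.append("Training in progress. Run /ctx:train again to check.")
    elif status == "concluded" and verdict == "accepted":
        steps.append('Run /ctx:experiment new "<next hypothesis>" to iterate.')
    elif status == "concluded" and verdict == "rejected":
        steps.append('Run /ctx:experiment new "<revised hypothesis>" based on learnings.')
    if has_drift:
        # The drift suggestion is added on top of the status-based one.
        steps.append('Run /ctx:experiment new "retrain on updated data '
                     'distribution" to address drift.')
    return steps
```

A concluded experiment with any other verdict yields no status-based suggestion here, because the template does not define one.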
|
|
126
|
+
|
|
127
|
+
### Models-Only Output
|
|
128
|
+
|
|
129
|
+
```
|
|
130
|
+
[ML Status] Production Models
|
|
131
|
+
|
|
132
|
+
{model_name}
|
|
133
|
+
Current: v{n}
|
|
134
|
+
{primary metric}: {value}
|
|
135
|
+
Promoted: {date}
|
|
136
|
+
Experiment: {experiment_id}
|
|
137
|
+
Promotion criteria: {from registry.yaml}
|
|
138
|
+
|
|
139
|
+
Version history:
|
|
140
|
+
v{n}: {primary metric}: {value} — production
|
|
141
|
+
v{n-1}: {primary metric}: {value} — retired
|
|
142
|
+
v{n-2}: {primary metric}: {value} — retired
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
### Experiments-Only Output
|
|
146
|
+
|
|
147
|
+
```
|
|
148
|
+
[ML Status] Experiment Log
|
|
149
|
+
|
|
150
|
+
ID Hypothesis (truncated 60 chars) Model Metric Result Status
|
|
151
|
+
─────────────────────────────────────────────────────────────────────────────────────────
|
|
152
|
+
EXP-{n} {hypothesis} {model} {metric} {result} {status}
|
|
153
|
+
...
|
|
154
|
+
|
|
155
|
+
Total: {count} experiments ({accepted} accepted, {rejected} rejected, {running} running)
|
|
156
|
+
```
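The 60-character truncation and the totals line could be produced roughly as follows. This is a sketch under assumptions: `truncate` and `summarize` are hypothetical helpers, each row is assumed to be a dict with a `status` key, and marking the cut with `...` is a choice of this example, not something the command file specifies.

```python
from collections import Counter
from typing import Dict, List

MAX_HYPOTHESIS = 60  # limit stated in the guardrails


def truncate(text: str, limit: int = MAX_HYPOTHESIS) -> str:
    """Cut text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def summarize(rows: List[Dict[str, str]]) -> str:
    """Build the 'Total:' line from the status column of each row."""
    counts = Counter(row["status"] for row in rows)
    return (f"Total: {len(rows)} experiments "
            f"({counts['accepted']} accepted, {counts['rejected']} rejected, "
            f"{counts['running']} running)")
```

Statuses other than the three named in the totals line are counted in the total but not broken out.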
|
|
157
|
+
|
|
158
|
+
### Drift-Only Output
|
|
159
|
+
|
|
160
|
+
```
|
|
161
|
+
[ML Status] Drift Alerts
|
|
162
|
+
|
|
163
|
+
Source: .ctx/ml/experiments/{latest_exp}/artifacts/drift_alerts.json
|
|
164
|
+
Checked: {file modified date}
|
|
165
|
+
|
|
166
|
+
{if no alerts}
|
|
167
|
+
No drift detected.
|
|
168
|
+
|
|
169
|
+
{if alerts}
|
|
170
|
+
Feature           KS Stat   p-value   Severity
|
|
171
|
+
──────────────────────────────────────────────
|
|
172
|
+
{feature}         {stat}    {pvalue}  {high|medium}
|
|
173
|
+
|
|
174
|
+
Recommendation: Run /ctx:experiment new "retrain on updated distribution"
|
|
175
|
+
```
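Rendering the drift table from a saved alerts file might look like the sketch below. The JSON layout is an assumption: the command file never shows `drift_alerts.json`, so a list of objects with `feature`, `ks_stat`, `pvalue` and `severity` keys is inferred from the placeholders in the templates. `format_drift` is a hypothetical name.

```python
import json
from typing import List


def format_drift(alerts_json: str) -> List[str]:
    """Turn the (assumed) alert list into the lines of the drift table."""
    alerts = json.loads(alerts_json)
    if not alerts:
        return ["No drift detected."]
    lines = [
        f"{'Feature':<18}{'KS Stat':<10}{'p-value':<10}Severity",
        "─" * 46,
    ]
    for alert in alerts:
        lines.append(
            f"{alert['feature']:<18}"
            f"{alert['ks_stat']:<10.3f}"
            f"{alert['pvalue']:<10.4f}"
            f"{alert['severity']}"
        )
    return lines
```

An empty list is treated the same as a missing file: the section reports that no drift was detected.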
|
|
176
|
+
|
|
177
|
+
## Step 5: Handle Missing Files Gracefully
|
|
178
|
+
|
|
179
|
+
If a file is missing, show that section as empty with a hint:
|
|
180
|
+
|
|
181
|
+
```
|
|
182
|
+
━━━ Production Models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
183
|
+
No models registered yet.
|
|
184
|
+
Models appear here after a /ctx:train run passes review.
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
Never error on missing files — display what is available.
|
|
188
|
+
|
|
189
|
+
</process>
|
|
190
|
+
|
|
191
|
+
<guardrails>
|
|
192
|
+
- This command is read-only. It never writes files, spawns agents, or modifies state.
|
|
193
|
+
- Parse YAML leniently: if registry.yaml is malformed, show a warning and continue.
|
|
194
|
+
- Drift alert files are found by globbing .ctx/ml/experiments/*/artifacts/drift_alerts.json — show the most recently modified one.
|
|
195
|
+
- Truncate long hypothesis strings to 60 characters in table views.
|
|
196
|
+
- If ML-STATUS.md does not exist but EXPERIMENT-LOG.md does, derive status from the log.
|
|
197
|
+
</guardrails>
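The guardrail about drift alert files (glob the experiments directory, show the most recently modified match) can be sketched with `pathlib`. The function name `latest_drift_file` and the `ctx_root` parameter are hypothetical; the glob pattern is the one given in the guardrails, relative to `.ctx`.

```python
from pathlib import Path
from typing import Optional


def latest_drift_file(ctx_root: Path) -> Optional[Path]:
    """Return the most recently modified drift_alerts.json, or None if there is none."""
    candidates = list(ctx_root.glob("ml/experiments/*/artifacts/drift_alerts.json"))
    if not candidates:
        return None
    # Modification time decides which experiment's alerts are shown.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Called as `latest_drift_file(Path(".ctx"))`, a `None` result corresponds to the "No drift alerts." branch of the dashboard.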
|
package/commands/monitor.md
CHANGED
|
@@ -4,7 +4,7 @@ description: Self-healing deployments - connect to error tracking (Sentry/LogRoc
|
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
<objective>
|
|
7
|
-
CTX 3.
|
|
7
|
+
CTX 3.5 Self-Healing Deployments - Monitor production errors and automatically create fix stories or even auto-fix with PR creation.
|
|
8
8
|
</objective>
|
|
9
9
|
|
|
10
10
|
<usage>
|
package/commands/train.md
ADDED
|
@@ -0,0 +1,266 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ctx:train
|
|
3
|
+
description: ML model training workflow — feature engineering, training, HPO, evaluation, and registry promotion. Uses Digital Twin patterns from ctx-ml-pipeline skill.
|
|
4
|
+
args: experiment_id (optional — defaults to active experiment from ML-STATUS.md)
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
<objective>
|
|
8
|
+
Orchestrate a full ML training run for a designed experiment. Loads config, spawns engineer to build/run the pipeline, spawns reviewer to validate results, and promotes to model registry if promotion criteria are met.
|
|
9
|
+
|
|
10
|
+
This command assumes the experiment already has HYPOTHESIS.md, DESIGN.md, and config.yaml. Run /ctx:experiment new first if not.
|
|
11
|
+
</objective>
|
|
12
|
+
|
|
13
|
+
<usage>
|
|
14
|
+
```bash
|
|
15
|
+
/ctx:train # Train for active experiment (from ML-STATUS.md)
|
|
16
|
+
/ctx:train EXP-003 # Train for specific experiment
|
|
17
|
+
/ctx:train --dry-run # Validate config without running training
|
|
18
|
+
```
|
|
19
|
+
</usage>
|
|
20
|
+
|
|
21
|
+
<process>
|
|
22
|
+
|
|
23
|
+
## Step 1: Parse Arguments
|
|
24
|
+
|
|
25
|
+
- No args → read active experiment from `.ctx/ml/ML-STATUS.md`
|
|
26
|
+
- `EXP-{n}` → use that experiment ID
|
|
27
|
+
- `--dry-run` → validate config and design, report issues, do not train
|
|
28
|
+
|
|
29
|
+
If no active experiment and no ID given:
|
|
30
|
+
```
|
|
31
|
+
[Train] No active experiment found.
|
|
32
|
+
Run /ctx:experiment new "<hypothesis>" to create one.
|
|
33
|
+
```
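Argument handling for this step could be sketched as below. The names `TrainArgs` and `parse_train_args` are hypothetical, and the ID pattern (`EXP-` followed by digits) is an assumption based on the `EXP-003` example in the usage block. Rejecting anything else is also an assumption.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed ID shape, inferred from the EXP-003 usage example.
EXP_ID = re.compile(r"^EXP-\d+$")


class TrainArgs(NamedTuple):
    experiment_id: Optional[str]  # None means "use the active experiment"
    dry_run: bool


def parse_train_args(argv: List[str]) -> TrainArgs:
    """Split the command arguments into an optional experiment ID and a dry-run flag."""
    experiment_id: Optional[str] = None
    dry_run = False
    for arg in argv:
        if arg == "--dry-run":
            dry_run = True
        elif EXP_ID.match(arg):
            experiment_id = arg
        else:
            raise ValueError(f"unrecognized argument: {arg!r}")
    return TrainArgs(experiment_id, dry_run)
```

When `experiment_id` is `None`, the caller would fall back to the active experiment recorded in `.ctx/ml/ML-STATUS.md`.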
|
|
34
|
+
|
|
35
|
+
## Step 2: Validate Prerequisites
|
|
36
|
+
|
|
37
|
+
Check these files exist before proceeding:
|
|
38
|
+
|
|
39
|
+
```
|
|
40
|
+
.ctx/ml/experiments/{id}/HYPOTHESIS.md → must exist
|
|
41
|
+
.ctx/ml/experiments/{id}/DESIGN.md → must exist
|
|
42
|
+
.ctx/ml/experiments/{id}/config.yaml → must exist
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
If any are missing:
|
|
46
|
+
```
|
|
47
|
+
[Train] Cannot run — missing required files for {experiment_id}:
|
|
48
|
+
- {missing file}
|
|
49
|
+
|
|
50
|
+
Run /ctx:experiment new to create them.
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
Also check:
|
|
54
|
+
- `config.yaml` has a `data.train` path and it exists on disk
|
|
55
|
+
- `config.yaml` has a `model.type` set
|
|
56
|
+
- `config.yaml` has a `seed` set (reject if missing — non-reproducible)
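The three config checks above can be expressed as one function that returns a list of issues. This sketch assumes `config.yaml` has already been parsed into a plain dict and that a relative `data.train` path is resolved against a caller-supplied directory; `check_config` and `base_dir` are hypothetical names.

```python
from pathlib import Path
from typing import Any, Dict, List


def check_config(config: Dict[str, Any], base_dir: Path) -> List[str]:
    """Return human-readable issues; an empty list means the checks pass."""
    issues: List[str] = []
    data = config.get("data")
    model = config.get("model")
    train_path = data.get("train") if isinstance(data, dict) else None
    if not train_path:
        issues.append("data.train is not set")
    elif not (base_dir / train_path).exists():
        issues.append(f"data.train does not exist: {train_path}")
    if not (isinstance(model, dict) and model.get("type")):
        issues.append("model.type is not set")
    # Compare against None so that a seed of 0 is accepted.
    if config.get("seed") is None:
        issues.append("seed is not set (run would be non-reproducible)")
    return issues
```

The same list could feed the `Issues:` line of the dry-run report.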
|
|
57
|
+
|
|
58
|
+
## Step 3: Dry Run (if --dry-run)
|
|
59
|
+
|
|
60
|
+
Read DESIGN.md acceptance criteria and config.yaml. Report:
|
|
61
|
+
|
|
62
|
+
```
|
|
63
|
+
[Train] Dry run for {experiment_id}
|
|
64
|
+
|
|
65
|
+
Config:
|
|
66
|
+
Model: {model.type}
|
|
67
|
+
Params: {key params}
|
|
68
|
+
Data: {data paths}
|
|
69
|
+
Seed: {seed}
|
|
70
|
+
|
|
71
|
+
Design checks:
|
|
72
|
+
Primary metric: {metric} — {target}
|
|
73
|
+
Guard rails: {metrics}
|
|
74
|
+
Acceptance criteria: {count} items
|
|
75
|
+
|
|
76
|
+
Issues: {none | list any problems}
|
|
77
|
+
|
|
78
|
+
Ready to train: {yes | no}
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Exit after dry run. Do not spawn training agent.
|
|
82
|
+
|
|
83
|
+
## Step 4: Update Experiment Status
|
|
84
|
+
|
|
85
|
+
Update `.ctx/ml/EXPERIMENT-LOG.md` — change row status from `draft` to `running`.
|
|
86
|
+
Update `.ctx/ml/ML-STATUS.md` — set active experiment and status to running.
|
|
87
|
+
|
|
88
|
+
## Step 5: Spawn ctx-ml-engineer for Training
|
|
89
|
+
|
|
90
|
+
```
|
|
91
|
+
Agent({
|
|
92
|
+
subagent_type: "ctx-ml-engineer",
|
|
93
|
+
prompt: |
|
|
94
|
+
Run ML training pipeline for {experiment_id}.
|
|
95
|
+
|
|
96
|
+
Read these files:
|
|
97
|
+
- .ctx/ml/experiments/{experiment_id}/config.yaml
|
|
98
|
+
- .ctx/ml/experiments/{experiment_id}/DESIGN.md
|
|
99
|
+
- .ctx/ml/features/feature-registry.yaml
|
|
100
|
+
|
|
101
|
+
Execute the full pipeline:
|
|
102
|
+
|
|
103
|
+
1. DATA VALIDATION
|
|
104
|
+
- Load data from config.yaml data paths
|
|
105
|
+
- Apply Pandera schema validation (fail hard on violations)
|
|
106
|
+
- Log row counts and class distribution
|
|
107
|
+
|
|
108
|
+
2. FEATURE PIPELINE
|
|
109
|
+
- Build transform pipeline from feature-registry.yaml
|
|
110
|
+
- Fit on training data only
|
|
111
|
+
- Transform train, val, test
|
|
112
|
+
- Save fitted pipeline to artifacts/pipeline.pkl
|
|
113
|
+
|
|
114
|
+
3. TRAINING
|
|
115
|
+
- Train model from config.yaml model params
|
|
116
|
+
- Use early stopping (patience=20)
|
|
117
|
+
- Log training curve to artifacts/train.log
|
|
118
|
+
|
|
119
|
+
4. HPO (if config.yaml has hpo: true)
|
|
120
|
+
- Run Optuna with n_trials from config or default 100
|
|
121
|
+
- Save study to artifacts/hpo_study.pkl
|
|
122
|
+
- Retrain with best params
|
|
123
|
+
|
|
124
|
+
5. EVALUATION
|
|
125
|
+
- Compute primary metric and all guard rail metrics
|
|
126
|
+
- Compute calibration error
|
|
127
|
+
- Generate ROC curve, calibration curve, feature importance plots
|
|
128
|
+
- Save metrics to artifacts/metrics.json
|
|
129
|
+
- Save plots to artifacts/plots/
|
|
130
|
+
|
|
131
|
+
6. CONFORMAL WRAPPER
|
|
132
|
+
- Fit MAPIE on calibration split at alpha=0.1
|
|
133
|
+
- Save to artifacts/mapie.pkl
|
|
134
|
+
|
|
135
|
+
7. INFERENCE SMOKE TEST
|
|
136
|
+
- Load model + pipeline + mapie
|
|
137
|
+
- Run 5 predictions with full envelope
|
|
138
|
+
- Verify envelope structure is correct
|
|
139
|
+
|
|
140
|
+
Write artifacts/metrics.json with all metrics.
|
|
141
|
+
Write all artifacts per ctx-ml-pipeline skill reproducibility requirements.
|
|
142
|
+
|
|
143
|
+
Do NOT write RESULTS.md — the reviewer will do that.
|
|
144
|
+
Do NOT update the model registry — that happens after review passes.
|
|
145
|
+
})
|
|
146
|
+
```
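The early stopping rule named in the prompt (patience=20) is usually implemented by tracking the best validation score and stopping after a fixed number of rounds without improvement. The sketch below shows that generic logic on a sequence of validation losses; it is not taken from the package, and `early_stop_round` is a hypothetical name.

```python
from typing import Iterable, Tuple


def early_stop_round(val_losses: Iterable[float], patience: int = 20) -> Tuple[int, float]:
    """Return (best_round, best_loss), stopping after `patience` rounds without improvement."""
    best_round, best_loss = -1, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            # No improvement for `patience` consecutive rounds: stop here.
            break
    return best_round, best_loss
```

Gradient boosting libraries typically expose this as a built-in option, so a real pipeline would rely on the trainer's own setting rather than a hand-written loop.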
|
|
147
|
+
|
|
148
|
+
## Step 6: Validate Training Artifacts
|
|
149
|
+
|
|
150
|
+
After engineer completes, verify these files exist:
|
|
151
|
+
|
|
152
|
+
```
|
|
153
|
+
.ctx/ml/experiments/{id}/artifacts/
|
|
154
|
+
├── model.pkl → required
|
|
155
|
+
├── pipeline.pkl → required
|
|
156
|
+
├── mapie.pkl → required
|
|
157
|
+
├── config.yaml → required (copy of run config)
|
|
158
|
+
├── metrics.json → required
|
|
159
|
+
├── train.log → required
|
|
160
|
+
└── plots/ → required (at least roc_curve.png)
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
If any required artifact is missing, report which and do not proceed to review.
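The artifact check can be sketched as a function that lists whatever is missing. The file names come from the tree above; `missing_artifacts` is a hypothetical helper, and treating `plots/roc_curve.png` as the only required plot follows the "at least roc_curve.png" note.

```python
from pathlib import Path
from typing import List

# Required files, as listed in the Step 6 tree.
REQUIRED_FILES = [
    "model.pkl",
    "pipeline.pkl",
    "mapie.pkl",
    "config.yaml",
    "metrics.json",
    "train.log",
]


def missing_artifacts(artifacts_dir: Path) -> List[str]:
    """Return the names of required artifacts that are not present."""
    missing = [name for name in REQUIRED_FILES if not (artifacts_dir / name).is_file()]
    if not (artifacts_dir / "plots" / "roc_curve.png").is_file():
        missing.append("plots/roc_curve.png")
    return missing
```

A non-empty result means the run should be reported and the review step skipped.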
|
|
164
|
+
|
|
165
|
+
## Step 7: Spawn ctx-ml-reviewer for Evaluation
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
Agent({
|
|
169
|
+
subagent_type: "ctx-ml-reviewer",
|
|
170
|
+
prompt: |
|
|
171
|
+
Review training results for {experiment_id}.
|
|
172
|
+
|
|
173
|
+
Read:
|
|
174
|
+
- .ctx/ml/experiments/{experiment_id}/HYPOTHESIS.md
|
|
175
|
+
- .ctx/ml/experiments/{experiment_id}/DESIGN.md
|
|
176
|
+
- .ctx/ml/experiments/{experiment_id}/artifacts/metrics.json
|
|
177
|
+
- .ctx/ml/experiments/{experiment_id}/artifacts/train.log
|
|
178
|
+
- .ctx/ml/models/registry.yaml (current production model metrics for comparison)
|
|
179
|
+
|
|
180
|
+
Review checklist:
|
|
181
|
+
- [ ] Primary metric meets DESIGN.md acceptance threshold
|
|
182
|
+
- [ ] Guard rail metrics not violated
|
|
183
|
+
- [ ] Training loss converged (check train.log — no NaN, no divergence)
|
|
184
|
+
- [ ] Calibration error < 0.05
|
|
185
|
+
- [ ] Primary metric improves on production model by promotion criteria
|
|
186
|
+
|
|
187
|
+
Write .ctx/ml/experiments/{experiment_id}/RESULTS.md with:
|
|
188
|
+
- Metrics table (baseline vs result vs delta)
|
|
189
|
+
- Verdict: accepted | rejected | inconclusive
|
|
190
|
+
- Key findings
|
|
191
|
+
- Next experiment recommendation
|
|
192
|
+
|
|
193
|
+
Return verdict as final line: VERDICT: accepted | rejected | inconclusive
|
|
194
|
+
})
|
|
195
|
+
```
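The checklist's "Calibration error < 0.05" does not say which calibration measure is meant. One common choice is expected calibration error (ECE) over equal-width probability bins. The sketch below assumes that definition for a binary classifier; the function name and the default of 10 bins are illustrative.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted mean gap between predicted probability and observed positive rate per bin."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece
```

Under this definition the checklist item would pass when the returned value is below 0.05.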
|
|
196
|
+
|
|
197
|
+
## Step 8: Handle Verdict
|
|
198
|
+
|
|
199
|
+
Read reviewer verdict.
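Since the reviewer is asked to end its output with a `VERDICT:` line, reading the verdict can be sketched as a check on the last non-empty line. `parse_verdict` is a hypothetical helper; returning `None` for a missing or malformed line is an assumption, as the command file does not describe that case.

```python
import re
from typing import Optional

VERDICT_LINE = re.compile(r"^VERDICT:\s*(accepted|rejected|inconclusive)\s*$")


def parse_verdict(reviewer_output: str) -> Optional[str]:
    """Return the verdict from the final non-empty line, or None if it is absent."""
    lines = [line for line in reviewer_output.splitlines() if line.strip()]
    if not lines:
        return None
    match = VERDICT_LINE.match(lines[-1].strip())
    return match.group(1) if match else None
```

Only the final line is inspected, so a `VERDICT:` string quoted earlier in the review text does not trigger a promotion.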
|
|
200
|
+
|
|
201
|
+
### Verdict: accepted
|
|
202
|
+
|
|
203
|
+
1. Determine the next version: read the `current` version from `models/registry.yaml` and increment it.
|
|
204
|
+
2. Update `models/registry.yaml`:
|
|
205
|
+
- Add new version entry with metrics, experiment ID, date, status: production
|
|
206
|
+
- Set previous production version status to: retired
|
|
207
|
+
- Set `current` to new version
|
|
208
|
+
3. Update `EXPERIMENT-LOG.md` row status to: `accepted`
|
|
209
|
+
4. Update `ML-STATUS.md` with outcome
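The registry update in steps 1 and 2 can be sketched on an in-memory dict. The layout used here is an assumption, since the diff does not show `registry.yaml`: one model entry with an integer `current` and a `versions` list whose items carry `version`, `status`, `metrics`, `experiment` and `promoted`. `promote` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def promote(model: Dict[str, Any], metrics: Dict[str, float],
            experiment_id: str, date: str) -> Dict[str, Any]:
    """Return a copy of the model entry with a new production version added."""
    updated = copy.deepcopy(model)
    versions = updated.setdefault("versions", [])
    next_version = int(updated.get("current") or 0) + 1
    # Retire whichever version is currently in production.
    for entry in versions:
        if entry.get("status") == "production":
            entry["status"] = "retired"
    versions.append({
        "version": next_version,
        "status": "production",
        "metrics": dict(metrics),
        "experiment": experiment_id,
        "promoted": date,
    })
    updated["current"] = next_version
    return updated
```

Working on a copy keeps the original entry untouched until the caller writes the result back, which fits the guardrail that promotion happens only after an accepted verdict.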
|
|
210
|
+
|
|
211
|
+
Output:
|
|
212
|
+
```
|
|
213
|
+
[Train] EXP-{n} accepted — model promoted to {name} v{version}
|
|
214
|
+
|
|
215
|
+
Metrics:
|
|
216
|
+
{primary}: {value} (was {baseline}, +{delta})
|
|
217
|
+
{guard}: {value} (was {baseline})
|
|
218
|
+
|
|
219
|
+
Artifacts: .ctx/ml/experiments/{experiment_id}/artifacts/
|
|
220
|
+
Registry: .ctx/ml/models/registry.yaml
|
|
221
|
+
|
|
222
|
+
Run /ctx:experiment new "<next hypothesis>" to continue.
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
### Verdict: rejected
|
|
226
|
+
|
|
227
|
+
1. Update `EXPERIMENT-LOG.md` row status to: `rejected`
|
|
228
|
+
2. Update `ML-STATUS.md` with outcome and reviewer's next recommendation
|
|
229
|
+
|
|
230
|
+
Output:
|
|
231
|
+
```
|
|
232
|
+
[Train] EXP-{n} rejected — {reason from RESULTS.md}
|
|
233
|
+
|
|
234
|
+
Primary metric: {value} (target was {target})
|
|
235
|
+
|
|
236
|
+
Key findings:
|
|
237
|
+
{findings from RESULTS.md}
|
|
238
|
+
|
|
239
|
+
Next recommendation: {from reviewer}
|
|
240
|
+
|
|
241
|
+
Run /ctx:experiment new "<next hypothesis>" to iterate.
|
|
242
|
+
```
|
|
243
|
+
|
|
244
|
+
### Verdict: inconclusive
|
|
245
|
+
|
|
246
|
+
1. Update `EXPERIMENT-LOG.md` row status to: `inconclusive`
|
|
247
|
+
2. Report what is blocking a clear verdict
|
|
248
|
+
|
|
249
|
+
Output:
|
|
250
|
+
```
|
|
251
|
+
[Train] EXP-{n} inconclusive — {reason}
|
|
252
|
+
|
|
253
|
+
Blocking issue: {issue}
|
|
254
|
+
|
|
255
|
+
Recommended action: {action}
|
|
256
|
+
```
|
|
257
|
+
|
|
258
|
+
</process>
|
|
259
|
+
|
|
260
|
+
<guardrails>
|
|
261
|
+
- Never promote to registry without an accepted verdict from ctx-ml-reviewer.
|
|
262
|
+
- Never proceed to review if required training artifacts are missing.
|
|
263
|
+
- Seed is mandatory in config.yaml — non-reproducible runs are rejected.
|
|
264
|
+
- Dry run never touches EXPERIMENT-LOG.md or ML-STATUS.md.
|
|
265
|
+
- If training fails mid-run, update EXPERIMENT-LOG.md status to "failed" before exiting.
|
|
266
|
+
</guardrails>
|