ma-agents 3.3.0 → 3.4.1
- package/.opencode/skills/.ma-agents.json +99 -99
- package/.roo/skills/.ma-agents.json +99 -99
- package/README.md +56 -15
- package/bin/cli.js +63 -8
- package/lib/agents.js +23 -0
- package/lib/bmad-cache/cache-manifest.json +1 -1
- package/lib/bmad-customizations/bmm-demerzel.customize.yaml +36 -0
- package/lib/bmad-customizations/demerzel.md +32 -0
- package/lib/bmad-extension/module-help.csv +13 -0
- package/lib/bmad-extension/skills/bmad-ma-agent-ml/.gitkeep +0 -0
- package/lib/bmad-extension/skills/bmad-ma-agent-ml/SKILL.md +59 -0
- package/lib/bmad-extension/skills/bmad-ma-agent-ml/bmad-skill-manifest.yaml +11 -0
- package/lib/bmad-extension/skills/generate-backlog/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-advise/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-advise/SKILL.md +76 -0
- package/lib/bmad-extension/skills/ml-advise/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-advise/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-analysis/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-analysis/SKILL.md +60 -0
- package/lib/bmad-extension/skills/ml-analysis/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-analysis/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-architecture/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-architecture/SKILL.md +55 -0
- package/lib/bmad-extension/skills/ml-architecture/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-architecture/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-detailed-design/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-detailed-design/SKILL.md +67 -0
- package/lib/bmad-extension/skills/ml-detailed-design/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-detailed-design/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-eda/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-eda/SKILL.md +56 -0
- package/lib/bmad-extension/skills/ml-eda/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/baseline_classifier.py +522 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/class_weights_calculator.py +295 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/clustering_explorer.py +383 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/eda_analyzer.py +654 -0
- package/lib/bmad-extension/skills/ml-eda/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-experiment/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-experiment/SKILL.md +74 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/advanced_trainer_configs.py +430 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/quick_trainer_setup.py +233 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/template_datamodule.py +219 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/template_gnn_module.py +341 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/template_lightning_module.py +158 -0
- package/lib/bmad-extension/skills/ml-experiment/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-experiment/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-hparam/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-hparam/SKILL.md +81 -0
- package/lib/bmad-extension/skills/ml-hparam/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-hparam/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-ideation/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-ideation/SKILL.md +50 -0
- package/lib/bmad-extension/skills/ml-ideation/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-ideation/scripts/validate_ml_prd.py +287 -0
- package/lib/bmad-extension/skills/ml-ideation/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-infra/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-infra/SKILL.md +58 -0
- package/lib/bmad-extension/skills/ml-infra/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-infra/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-retrospective/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-retrospective/SKILL.md +63 -0
- package/lib/bmad-extension/skills/ml-retrospective/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-retrospective/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-revision/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-revision/SKILL.md +82 -0
- package/lib/bmad-extension/skills/ml-revision/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-revision/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-techspec/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-techspec/SKILL.md +80 -0
- package/lib/bmad-extension/skills/ml-techspec/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-techspec/skill.json +7 -0
- package/lib/bmad.js +85 -8
- package/lib/skill-authoring.js +1 -1
- package/package.json +2 -2
- package/test/agent-injection-strategy.test.js +4 -4
- package/test/bmad-version-bump.test.js +34 -34
- package/test/build-bmad-args.test.js +13 -6
- package/test/convert-agents-to-skills.test.js +11 -1
- package/test/extension-module-restructure.test.js +31 -7
- package/test/migration-validation.test.js +14 -11
`package/lib/bmad-extension/skills/ml-ideation/scripts/validate_ml_prd.py` (new file, +287 lines)

```python
#!/usr/bin/env python3
"""
validate_prd.py — BMAD DL Lifecycle

Validates docs/prd/01_PRD.md for completeness before architecture begins.

Usage:
    python3 scripts/validate_prd.py <prd_path>
    python3 scripts/validate_prd.py docs/prd/01_PRD.md

Exit codes:
    0 — PASS
    1 — validation errors found
    2 — file not found or unreadable
"""

from __future__ import annotations

import re
import sys
from dataclasses import dataclass, field
from pathlib import Path


# ── Configuration ─────────────────────────────────────────────────────────────

REQUIRED_SECTIONS = ["Project Overview", "Traceable Requirements", "Status"]
REQUIRED_REQ_CATEGORIES = {"System", "Data", "Performance"}
REQUIRED_REQ_PREFIXES = {"REQ-SYS", "REQ-DATA", "REQ-PERF"}
STATUS_APPROVAL_PATTERN = re.compile(r"\[x\]\s*Approved", re.IGNORECASE)

# Table column indices (0-based after splitting on |)
COL_REQ_ID = 1
COL_CATEGORY = 2
COL_DESCRIPTION = 3
COL_ACCEPTANCE = 4


# ── Data structures ────────────────────────────────────────────────────────────

@dataclass
class Requirement:
    req_id: str
    category: str
    description: str
    acceptance_criteria: str
    line_number: int


@dataclass
class ValidationResult:
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return len(self.errors) == 0

    def add_error(self, msg: str) -> None:
        self.errors.append(msg)

    def add_warning(self, msg: str) -> None:
        self.warnings.append(msg)


# ── Parsing helpers ────────────────────────────────────────────────────────────

def _clean_cell(cell: str) -> str:
    return cell.strip().strip("*`[]")


def _is_separator_row(row: str) -> bool:
    return bool(re.match(r"^\s*\|[\s\-:|]+\|\s*$", row))


def parse_requirements_table(lines: list[str]) -> list[Requirement]:
    """Extract requirement rows from the markdown table in section B."""
    requirements: list[Requirement] = []
    in_table = False

    for i, line in enumerate(lines, start=1):
        if re.match(r"\|\s*Requirement\s*ID", line, re.IGNORECASE):
            in_table = True
            continue
        if not in_table:
            continue
        if _is_separator_row(line):
            continue
        if not line.strip().startswith("|"):
            in_table = False
            continue

        cells = line.split("|")
        if len(cells) < 5:
            continue

        req_id = _clean_cell(cells[COL_REQ_ID])
        category = _clean_cell(cells[COL_CATEGORY])
        description = _clean_cell(cells[COL_DESCRIPTION])
        acceptance = _clean_cell(cells[COL_ACCEPTANCE])

        # Skip placeholder / header rows
        if not req_id or req_id.startswith(":") or "Requirement" in req_id:
            continue

        requirements.append(Requirement(
            req_id=req_id,
            category=category,
            description=description,
            acceptance_criteria=acceptance,
            line_number=i,
        ))

    return requirements


def find_sections(text: str) -> set[str]:
    """Return set of section headings found (### A., ### B., etc.)."""
    return set(re.findall(r"###\s+[A-Z]\.\s+(.+)", text))


# ── Validation checks ──────────────────────────────────────────────────────────

def check_required_sections(text: str, result: ValidationResult) -> None:
    sections = find_sections(text)
    for required in REQUIRED_SECTIONS:
        if not any(required.lower() in s.lower() for s in sections):
            result.add_error(f"Missing required section: '### X. {required}'")


def check_requirements_table(lines: list[str], result: ValidationResult) -> list[Requirement]:
    reqs = parse_requirements_table(lines)

    if not reqs:
        result.add_error(
            "No requirements found in the Traceable Requirements table. "
            "Ensure the table header contains 'Requirement ID'."
        )
        return []

    return reqs


def check_req_id_format(reqs: list[Requirement], result: ValidationResult) -> None:
    pattern = re.compile(r"^REQ-[A-Z]+-\d+$")
    for req in reqs:
        if not pattern.match(req.req_id):
            result.add_error(
                f"Line {req.line_number}: Invalid REQ-ID format '{req.req_id}'. "
                f"Expected pattern: REQ-<CATEGORY>-<NUMBER> (e.g. REQ-PERF-01)"
            )


def check_category_coverage(reqs: list[Requirement], result: ValidationResult) -> None:
    found_prefixes = {req.req_id.rsplit("-", 1)[0] for req in reqs}
    missing = REQUIRED_REQ_PREFIXES - found_prefixes
    for prefix in sorted(missing):
        result.add_error(
            f"No requirement with prefix '{prefix}-' found. "
            f"Every PRD must include at least one {prefix} requirement."
        )


def check_empty_fields(reqs: list[Requirement], result: ValidationResult) -> None:
    placeholder_patterns = [
        re.compile(r"^\[.+\]$"),  # [Placeholder text]
        re.compile(r"^\.{3}$"),   # ...
        re.compile(r"^-$"),       # -
        re.compile(r"^TBD$", re.IGNORECASE),
    ]

    def _is_placeholder(value: str) -> bool:
        return not value or any(p.match(value) for p in placeholder_patterns)

    for req in reqs:
        if _is_placeholder(req.description):
            result.add_error(
                f"Line {req.line_number}: REQ '{req.req_id}' has an empty or placeholder Description."
            )
        if _is_placeholder(req.acceptance_criteria):
            result.add_error(
                f"Line {req.line_number}: REQ '{req.req_id}' has an empty or placeholder Acceptance Criteria. "
                f"Every requirement must have measurable acceptance criteria."
            )
        if _is_placeholder(req.category):
            result.add_error(
                f"Line {req.line_number}: REQ '{req.req_id}' has an empty Category."
            )


def check_status_approval(text: str, result: ValidationResult) -> None:
    if not STATUS_APPROVAL_PATTERN.search(text):
        result.add_error(
            "Status section does not show approval. "
            "Expected: '* [x] Approved for Architecture Design'"
        )


def check_duplicate_req_ids(reqs: list[Requirement], result: ValidationResult) -> None:
    seen: dict[str, int] = {}
    for req in reqs:
        if req.req_id in seen:
            result.add_error(
                f"Duplicate REQ-ID '{req.req_id}' found at line {req.line_number} "
                f"(first seen at line {seen[req.req_id]})."
            )
        else:
            seen[req.req_id] = req.line_number


def check_minimum_requirements(reqs: list[Requirement], result: ValidationResult) -> None:
    if len(reqs) < 3:
        result.add_warning(
            f"Only {len(reqs)} requirement(s) found. A meaningful PRD typically has "
            f"at least one each of REQ-SYS, REQ-DATA, and REQ-PERF."
        )


# ── Main ───────────────────────────────────────────────────────────────────────

def validate(prd_path: Path) -> ValidationResult:
    result = ValidationResult()

    try:
        text = prd_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        result.add_error(f"File not found: {prd_path}")
        return result
    except OSError as e:
        result.add_error(f"Cannot read file: {e}")
        return result

    lines = text.splitlines()

    check_required_sections(text, result)
    reqs = check_requirements_table(lines, result)

    if reqs:
        check_req_id_format(reqs, result)
        check_category_coverage(reqs, result)
        check_empty_fields(reqs, result)
        check_duplicate_req_ids(reqs, result)
        check_minimum_requirements(reqs, result)

    check_status_approval(text, result)

    return result


def print_report(prd_path: Path, result: ValidationResult) -> None:
    print(f"\nValidating: {prd_path}")
    print("─" * 60)

    if result.passed and not result.warnings:
        print("✓ PRD validation PASSED — ready for architecture phase.")
        return

    if result.errors:
        print(f"✗ FAILED — {len(result.errors)} error(s) must be fixed:\n")
        for i, err in enumerate(result.errors, 1):
            print(f"  {i}. {err}")

    if result.warnings:
        print(f"\n⚠ {len(result.warnings)} warning(s):\n")
        for w in result.warnings:
            print(f"  • {w}")

    if result.passed:
        print("\n✓ PRD validation PASSED (with warnings).")


def main() -> int:
    if len(sys.argv) < 2:
        print("Usage: python3 validate_prd.py <prd_path>", file=sys.stderr)
        print("Example: python3 validate_prd.py docs/prd/01_PRD.md", file=sys.stderr)
        return 2

    prd_path = Path(sys.argv[1])
    result = validate(prd_path)
    print_report(prd_path, result)

    if not result.passed:
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```
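The validator's table-parsing convention can be exercised in isolation. A minimal sketch, assuming a hypothetical requirements row; `clean_cell` mirrors the script's `_clean_cell`, and the column indices match its `COL_*` constants:

```python
import re

# Clean a markdown table cell the way the validator does: strip whitespace,
# then strip markdown decoration characters (*, `, [, ]).
def clean_cell(cell: str) -> str:
    return cell.strip().strip("*`[]")

# Hypothetical requirement row; splitting on "|" yields an empty cell at
# each end, so cell 1 is the REQ-ID, cell 2 the category, and so on.
row = "| REQ-PERF-01 | Performance | p95 latency | < 100ms on CPU |"
cells = [clean_cell(c) for c in row.split("|")]

req_id, category = cells[1], cells[2]
assert re.match(r"^REQ-[A-Z]+-\d+$", req_id)  # same pattern as check_req_id_format
print(req_id, category)  # → REQ-PERF-01 Performance
```

Decorated cells such as `**REQ-SYS-01**` or `` `REQ-DATA-02` `` normalize to the bare ID, which is why the ID-format and duplicate checks can work on exact strings.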
`package/lib/bmad-extension/skills/ml-ideation/skill.json` (new file, +7 lines)

```json
{
  "name": "ML Ideation & Framing",
  "description": "Frames the Machine Learning problem, defines the Research Thesis, and produces the PRD requirements.",
  "version": "1.0.0",
  "author": "Demerzel (ML Scientist)",
  "tags": ["Machine Learning", "Planning", "Ideation"]
}
```
`package/lib/bmad-extension/skills/ml-infra/.gitkeep` (file without changes)
`package/lib/bmad-extension/skills/ml-infra/SKILL.md` (new file, +58 lines)

````markdown
---
name: ml-infra
description: ML Infra — Set up the Python environment, install dependencies, and run a smoke test before full training
---

# ML Stage 5 — Infrastructure & Smoke Test

Validate the full pipeline end-to-end on a tiny data slice before committing to a full training run.

## Instructions

### 1. Load Context
- Read `_bmad-output/planning-artifacts/ml-techspec.md` — confirm it is LOCKED
- If TechSpec is not locked, STOP: "TechSpec must be locked before infrastructure setup. Run /ml-techspec first."
- Read `configs/ml_config.yaml` for tracking_tool setting

### 2. Environment Setup
Guide the user through (or execute if tools are available):
```bash
uv venv
uv sync
```
If new dependencies were added in architecture stage:
```bash
uv add <package1> <package2>
```

### 3. Configure Experiment Tracking
Based on `tracking_tool` in config:
- **wandb**: Verify `import wandb; wandb.login()` works. Create project if needed.
- **mlflow**: Verify `import mlflow; mlflow.set_tracking_uri(...)` works.
- **clearml**: Verify `clearml-init` has been run.
- **local**: Confirm `logs/` directory exists for CSV/JSON run logs.

### 4. Write Smoke Test Script
If `scripts/smoke_test.py` does not exist, create it:
- Load first 100 rows of training data
- Run the full preprocessing pipeline
- Instantiate the model with fixed hyperparameters from TechSpec
- Train on 80 rows, predict on 20 rows
- Log one metric to the tracking tool
- Assert the pipeline completes without error

Run the smoke test: `uv run python scripts/smoke_test.py`

### 5. Verify Outputs
Confirm:
- Environment activates cleanly (`uv run python --version`)
- All imports resolve (no ModuleNotFoundError)
- Data loads from the path specified in TechSpec
- Preprocessing pipeline runs without error
- Smoke test metric appears in tracking tool dashboard (or local log)

### 6. Confirm & Advance
- Report smoke test result: PASS or FAIL with error details
- On PASS: "Infrastructure validated. Proceed to **Stage 6 — /ml-experiment** to run the full experiment."
- On FAIL: Fix the issue before advancing. Do not proceed to full training with a broken pipeline.
- STOP and WAIT for user confirmation
````
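The smoke-test recipe above (load a tiny slice, preprocess, train on 80 rows, predict on 20, log one metric) can be sketched as follows. Everything here is illustrative: synthetic data stands in for the real training slice, and a trivial nearest-centroid classifier stands in for whatever model the TechSpec fixes:

```python
import numpy as np

rng = np.random.default_rng(42)        # fixed seed, as the TechSpec requires
X = rng.normal(size=(100, 4))          # stand-in for the "first 100 rows"
y = (X[:, 0] > 0).astype(int)

# "Preprocessing pipeline" stand-in: standardize features
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Tiny stand-in model: per-class centroids fit on the first 80 rows
centroids = {c: X[:80][y[:80] == c].mean(axis=0) for c in (0, 1)}

def predict(rows):
    # Assign each row to the class whose centroid is nearest
    return np.array([min(centroids, key=lambda c: np.linalg.norm(r - centroids[c]))
                     for r in rows])

preds = predict(X[80:])                          # predict on the held-out 20
acc = float((preds == y[80:]).mean())            # the "one metric" to log
print(f"smoke_test accuracy={acc:.3f}")
assert 0.0 <= acc <= 1.0                         # pipeline completed sanely
```

A real `smoke_test.py` would replace the synthetic pieces with the actual data loader, preprocessing, and model, and would log `acc` to the configured tracking tool instead of printing it.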
`package/lib/bmad-extension/skills/ml-infra/skill.json` (new file, +7 lines)

```json
{
  "name": "ML Infrastructure Build",
  "description": "Builds all engineering infrastructure (data pipelines, training loops, tracking, eval) for experiments.",
  "version": "1.0.0",
  "author": "Demerzel (ML Scientist)",
  "tags": ["Machine Learning", "Infrastructure", "MLOps", "Data Pipeline", "Training"]
}
```
`package/lib/bmad-extension/skills/ml-retrospective/.gitkeep` (file without changes)
`package/lib/bmad-extension/skills/ml-retrospective/SKILL.md` (new file, +63 lines)

````markdown
---
name: ml-retrospective
description: ML Retrospective - Capture learnings, document what worked and what failed, and produce a reusable knowledge artifact
---

# ML Stage 8 - Retrospective

Close the experiment loop. Document learnings so the next experiment starts smarter.

## Instructions

### 1. Load Context
- Read all artifacts in `_bmad-output/planning-artifacts/`
- Read all artifacts in `_bmad-output/implementation-artifacts/`
- Reconstruct the full lifecycle: what was hypothesized, what was built, what happened

### 2. Write Retrospective
Write `_bmad-output/planning-artifacts/retrospective.md` with these sections:

#### Hypothesis Outcome
- **Original Hypothesis** (from research-thesis.md)
- **Outcome**: CONFIRMED / REFUTED / PARTIALLY CONFIRMED
- **Evidence**: Key metrics that support the conclusion

#### What Worked Well
- Data characteristics that made modelling tractable
- Preprocessing decisions that improved results
- Architecture choices that were validated
- Process decisions that saved time

#### What Did Not Work
- Approaches attempted and discarded (with evidence, not opinion)
- Data quality issues that were not caught in EDA
- Assumptions from the PRD that were wrong
- HPO strategies that did not yield improvement

#### Failure Cost Reflection
- Actual FP/FN distribution in best model
- Whether the failure cost tradeoff met business expectations
- Recommended threshold for production use (with cost justification)

#### Technical Debt
- Known shortcuts taken and their risk
- Features excluded that may still have signal
- Data collection improvements recommended for next iteration

#### Recommended Next Experiments
- Top 3 actionable hypotheses for the next iteration, ranked by expected improvement
- Each with: hypothesis, required data, estimated effort

#### Process Improvements
- Anything in the Demerzel protocol that should be adjusted for this domain
- Data pipeline improvements needed before next experiment

### 3. Archive Experiment
- Confirm all artifacts are saved under `_bmad-output/`
- Confirm model artifact is saved under `_bmad-output/implementation-artifacts/models/`
- Confirm experiment is logged and accessible in tracking tool

### 4. Close Session
- Present retrospective summary
- "Experiment cycle complete. All artifacts are saved in `_bmad-output/`. You can start a new experiment cycle with /ml-ideation or dismiss Demerzel with DA."
- STOP and WAIT for user input
````
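The "Failure Cost Reflection" section asks for an FP/FN count and a cost-justified threshold. A minimal sketch of that bookkeeping, with illustrative labels, scores, and cost weights (none of this comes from the package):

```python
# Hypothetical labels and model scores for 8 validation samples
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8]

def fp_fn(threshold):
    # Count false positives and false negatives at a decision threshold
    preds = [int(s >= threshold) for s in scores]
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, y_true))
    return fp, fn

# Illustrative domain costs: a miss (FN) hurts 10x more than a false alarm
FN_COST, FP_COST = 10.0, 1.0

# Pick the candidate threshold with the lowest total expected cost
best = min((FN_COST * fn + FP_COST * fp, th)
           for th in (0.3, 0.5, 0.7)
           for fp, fn in [fp_fn(th)])
print(best)  # (total cost, recommended threshold)
```

With asymmetric costs like these, the cost-weighted choice tends toward a lower threshold than accuracy alone would suggest, which is exactly the tradeoff the retrospective asks you to document.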
`package/lib/bmad-extension/skills/ml-retrospective/skill.json` (new file, +7 lines)

```json
{
  "name": "ML Session Retrospective",
  "description": "Extracts learnings, metrics, and failure modes from the current session and archives them to the knowledge base.",
  "version": "1.0.0",
  "author": "Demerzel (ML Scientist)",
  "tags": ["Machine Learning", "Retrospective", "Knowledge Capture", "Post-mortem", "Demerzel"]
}
```
`package/lib/bmad-extension/skills/ml-revision/.gitkeep` (file without changes)
`package/lib/bmad-extension/skills/ml-revision/SKILL.md` (new file, +82 lines)

````markdown
---
name: ml-revision
description: Acts as Demerzel (Machine Learning Scientist) to formulate the next hypothesis, explicitly amend all upstream documents (Thesis, PRD, Architecture, Design) that need updating, and generate the task set for the next experiment cycle.
---

# Machine Learning Workflow: Iterative Revision Cycle — Demerzel

## 1. Operating Instructions

You are **Demerzel**, an expert Machine Learning Scientist. A revision is not just a hypothesis update. It is a **document audit**: every upstream document that no longer accurately reflects what was learned must be explicitly amended.

1. **Summarize experiment history:**
   ```bash
   python3 scripts/summarize_experiment_history.py _bmad-output/implementation-artifacts/ --metric val/f1
   ```

2. **Read in order:**
   - `_bmad-output/planning-artifacts/research-thesis.md`
   - `_bmad-output/implementation-artifacts/ml-analysis-exp-[id].md`
   - `_bmad-output/planning-artifacts/techspecs/ml-techspec-exp-[id].md`
   - `_bmad-output/planning-artifacts/ml-prd.md`
   - `_bmad-output/planning-artifacts/ml-architecture.md`
   - `_bmad-output/planning-artifacts/ml-detailed-design.md`

3. **Conduct the document audit.** For each upstream document, state:
   - **No change needed** — with explicit reason.
   - **Amendment needed** — with the exact proposed change.

4. **Formulate the next hypothesis:**
   - Format: "Using [change] will improve [metric] from [baseline] to [target] because [reasoning from the latest analysis]."
   - The hypothesis must be falsifiable. State what result would disprove it.

5. **Generate new tasks** for the next cycle:
   - New infrastructure tasks → `INF-0XX` (if architecture changes).
   - New experiment tasks → `EXP-0XX` (increment the counter).

6. **CRITICAL:** Do not execute any changes yet. Present the full revision plan (History, Verdict, Audit, New Hypothesis, New Tasks) to the user. Halt and wait.

7. Upon approval, execute all changes:
   - Apply all document amendments.
   - Update `research-thesis.md` Section II (new) and Section V (archive old).
   - Append to `_bmad-output/implementation-artifacts/ml-revision-log.md`.

8. **Commit the revision artifact:**
   ```bash
   git add _bmad-output/planning-artifacts/ _bmad-output/implementation-artifacts/ml-revision-log.md
   git commit -m "docs(ml-revision): cycle [N] -- new hypothesis H-00N"
   ```

## 2. Expected Output Templates

### Template A: Update to `_bmad-output/planning-artifacts/research-thesis.md`
- Section II: Replace active hypothesis. Set status to "Untested".
- Section V: Append the previous hypothesis row.

### Template B: `_bmad-output/implementation-artifacts/ml-revision-log.md`

```markdown
### Revision Cycle [N]
* **Triggered By:** EXP-[ID]
* **Verdict:** [SUPPORTED / FALSIFIED]
* **New Hypothesis:** "[New domain-grounded, falsifiable statement]"
* **Rationale:** [Evidence from analysis]

### Document Amendment Log
| Document | Change / No Change | Detail |
| :--- | :--- | :--- |
| research-thesis.md | AMENDED | Section II and V updated. |
| ml-prd.md | [Change] | [Detail] |
| ml-architecture.md | [Change] | [Detail] |

### New Task Generation
| Task ID | Description | Linked Req |
| :--- | :--- | :--- |
| `EXP-0XX` | [Training run with new hypothesis] | [REQ-ID] |
| `INF-0XX` | [New infra if architecture changed] | [REQ-ID] |

* **Status:** [Approved — ready for /ml-techspec]
```
````
`package/lib/bmad-extension/skills/ml-revision/skill.json` (new file, +7 lines)

```json
{
  "name": "ML Iterative Revision",
  "description": "Formulates the next hypothesis, audits all upstream documents, and generates the task set for the next cycle.",
  "version": "1.0.0",
  "author": "Demerzel (ML Scientist)",
  "tags": ["Machine Learning", "Revision", "Iteration", "Hypothesis", "Audit", "Demerzel"]
}
```
`package/lib/bmad-extension/skills/ml-techspec/.gitkeep` (file without changes)
`package/lib/bmad-extension/skills/ml-techspec/SKILL.md` (new file, +80 lines)

````markdown
---
name: ml-techspec
description: ML TechSpec — Lock the experiment contract. Defines acceptance criteria that cannot be changed during training.
---

# ML Stage 4 — TechSpec (The Contract)

The TechSpec is a locked contract. Once approved, parameters and acceptance criteria MUST NOT be changed during experiment execution. This prevents moving the goalposts after seeing results.

## Instructions

### 1. Load Context
- Read `_bmad-output/planning-artifacts/ml-prd.md`
- Read `_bmad-output/planning-artifacts/eda-report.md`
- Read `_bmad-output/planning-artifacts/ml-architecture.md`

### 2. Draft the TechSpec
Write `_bmad-output/planning-artifacts/ml-techspec.md` with the following sections — every field is REQUIRED:

```
# ML TechSpec — [Project Name]
Status: DRAFT -> LOCKED (set to LOCKED on user approval)
Locked At: [timestamp on approval]

## Experiment Identity
- Project: [project_name from configs/ml_config.yaml]
- Tracking Tool: [wandb / mlflow / clearml / local]
- Run Name Convention: [e.g. xgb_v{version}_{date}]

## Data Contract
- Training Data: [path, row count, date range]
- Validation Strategy: [stratified k-fold N=X / time-series split / holdout ratio]
- Test Set: [path or split ratio — NEVER touched until final evaluation]
- Feature Set: [list of features, with preprocessing step for each]
- Excluded Features: [list with reason for exclusion]

## Model Contract
- Algorithm: [exact algorithm name and library]
- Baseline: [exact baseline — e.g. DummyClassifier(strategy='most_frequent')]
- Fixed Hyperparameters: [params NOT being tuned, with values]
- HPO Space: [params being tuned, with ranges and search strategy]
- HPO Budget: [max trials or max wall-clock time]

## Acceptance Criteria (PRIMARY — must ALL pass)
- Primary Metric: [e.g. Recall >= 0.85 on validation set]
- Secondary Metric: [e.g. AUC-ROC >= 0.80]
- Baseline Beat: [model must beat baseline on primary metric]

## Guardrail Criteria (MUST NOT violate)
- [e.g. Precision >= 0.50 — below this FP rate is unacceptable in domain]
- [e.g. Inference latency < 100ms per sample]

## Failure Cost Reminder
- False Negative cost: [from PRD]
- False Positive cost: [from PRD]
- Threshold selection strategy: [maximize recall / F-beta / cost-weighted]

## Reproducibility
- Random seed: [integer]
- uv lockfile: pyproject.toml + uv.lock
```

### 3. Surface Dilemmas & Commit Gate

Before presenting and **before any git commit**:

- Identify every contract decision where two or more reasonable options existed (metric threshold values, HPO budget, random seed choice, held-out split ratio, guardrail definitions, etc.)
- Format each as: **Dilemma [Letter] — Title** / **Context** / **Options (a/b)** / **Recommendation** / **Your decision:** [blank]
- If all choices were unambiguous, state explicitly: "No open dilemmas."
- **Do NOT commit the TechSpec until the user has resolved all dilemmas and given explicit lock approval.**

### 4. Lock the Contract
- Present the TechSpec draft and all surfaced dilemmas to the user
- State explicitly: "Once you approve this, the acceptance criteria and data contract are locked. You may tune hyperparameters but may NOT change the primary metric threshold, feature set, or validation strategy during training."
- Ask: "Do you approve and lock this TechSpec?"
- On approval: Set Status to LOCKED, add Locked At timestamp, then commit

### 5. Confirm & Advance
- "TechSpec locked. Proceed to **Stage 5 — /ml-infra** to set up the environment and run a smoke test."
- STOP and WAIT for user confirmation
````
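The DRAFT → LOCKED gate lends itself to a mechanical check. A minimal sketch, assuming the `Status:` and `Locked At:` lines follow the template above (the checker itself is illustrative, not part of the package):

```python
import re

def techspec_is_locked(text: str) -> bool:
    # After approval the status line should read exactly "Status: LOCKED"
    # (no trailing "DRAFT -> LOCKED") and "Locked At:" should carry a value.
    has_status = re.search(r"^Status:\s*LOCKED\s*$", text, re.MULTILINE)
    has_timestamp = re.search(r"^Locked At:\s*\S+", text, re.MULTILINE)
    return bool(has_status and has_timestamp)

draft = "# ML TechSpec\nStatus: DRAFT -> LOCKED\nLocked At: [timestamp on approval]\n"
locked = "# ML TechSpec\nStatus: LOCKED\nLocked At: 2024-05-01T12:00:00Z\n"

print(techspec_is_locked(draft), techspec_is_locked(locked))  # → False True
```

A downstream stage (such as the ml-infra smoke test, which must confirm the TechSpec is LOCKED before proceeding) could run a check like this instead of eyeballing the file.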
`package/lib/bmad-extension/skills/ml-techspec/skill.json` (new file, +7 lines)

```json
{
  "name": "ML Pre-Experiment TechSpec",
  "description": "Produces a pre-experiment contract that pins hypotheses, parameters, and success criteria.",
  "version": "1.0.0",
  "author": "Demerzel (ML Scientist)",
  "tags": ["Machine Learning", "TechSpec", "Experiment Design", "Contract"]
}
```