ma-agents 3.3.0 → 3.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. package/.opencode/skills/.ma-agents.json +99 -99
  2. package/.roo/skills/.ma-agents.json +99 -99
  3. package/README.md +56 -15
  4. package/bin/cli.js +63 -8
  5. package/lib/agents.js +23 -0
  6. package/lib/bmad-cache/cache-manifest.json +1 -1
  7. package/lib/bmad-customizations/bmm-demerzel.customize.yaml +36 -0
  8. package/lib/bmad-customizations/demerzel.md +32 -0
  9. package/lib/bmad-extension/module-help.csv +13 -0
  10. package/lib/bmad-extension/skills/bmad-ma-agent-ml/.gitkeep +0 -0
  11. package/lib/bmad-extension/skills/bmad-ma-agent-ml/SKILL.md +59 -0
  12. package/lib/bmad-extension/skills/bmad-ma-agent-ml/bmad-skill-manifest.yaml +11 -0
  13. package/lib/bmad-extension/skills/generate-backlog/.gitkeep +0 -0
  14. package/lib/bmad-extension/skills/ml-advise/.gitkeep +0 -0
  15. package/lib/bmad-extension/skills/ml-advise/SKILL.md +76 -0
  16. package/lib/bmad-extension/skills/ml-advise/bmad-skill-manifest.yaml +3 -0
  17. package/lib/bmad-extension/skills/ml-advise/skill.json +7 -0
  18. package/lib/bmad-extension/skills/ml-analysis/.gitkeep +0 -0
  19. package/lib/bmad-extension/skills/ml-analysis/SKILL.md +60 -0
  20. package/lib/bmad-extension/skills/ml-analysis/bmad-skill-manifest.yaml +3 -0
  21. package/lib/bmad-extension/skills/ml-analysis/skill.json +7 -0
  22. package/lib/bmad-extension/skills/ml-architecture/.gitkeep +0 -0
  23. package/lib/bmad-extension/skills/ml-architecture/SKILL.md +55 -0
  24. package/lib/bmad-extension/skills/ml-architecture/bmad-skill-manifest.yaml +3 -0
  25. package/lib/bmad-extension/skills/ml-architecture/skill.json +7 -0
  26. package/lib/bmad-extension/skills/ml-detailed-design/.gitkeep +0 -0
  27. package/lib/bmad-extension/skills/ml-detailed-design/SKILL.md +67 -0
  28. package/lib/bmad-extension/skills/ml-detailed-design/bmad-skill-manifest.yaml +3 -0
  29. package/lib/bmad-extension/skills/ml-detailed-design/skill.json +7 -0
  30. package/lib/bmad-extension/skills/ml-eda/.gitkeep +0 -0
  31. package/lib/bmad-extension/skills/ml-eda/SKILL.md +56 -0
  32. package/lib/bmad-extension/skills/ml-eda/bmad-skill-manifest.yaml +3 -0
  33. package/lib/bmad-extension/skills/ml-eda/scripts/baseline_classifier.py +522 -0
  34. package/lib/bmad-extension/skills/ml-eda/scripts/class_weights_calculator.py +295 -0
  35. package/lib/bmad-extension/skills/ml-eda/scripts/clustering_explorer.py +383 -0
  36. package/lib/bmad-extension/skills/ml-eda/scripts/eda_analyzer.py +654 -0
  37. package/lib/bmad-extension/skills/ml-eda/skill.json +7 -0
  38. package/lib/bmad-extension/skills/ml-experiment/.gitkeep +0 -0
  39. package/lib/bmad-extension/skills/ml-experiment/SKILL.md +74 -0
  40. package/lib/bmad-extension/skills/ml-experiment/assets/advanced_trainer_configs.py +430 -0
  41. package/lib/bmad-extension/skills/ml-experiment/assets/quick_trainer_setup.py +233 -0
  42. package/lib/bmad-extension/skills/ml-experiment/assets/template_datamodule.py +219 -0
  43. package/lib/bmad-extension/skills/ml-experiment/assets/template_gnn_module.py +341 -0
  44. package/lib/bmad-extension/skills/ml-experiment/assets/template_lightning_module.py +158 -0
  45. package/lib/bmad-extension/skills/ml-experiment/bmad-skill-manifest.yaml +3 -0
  46. package/lib/bmad-extension/skills/ml-experiment/skill.json +7 -0
  47. package/lib/bmad-extension/skills/ml-hparam/.gitkeep +0 -0
  48. package/lib/bmad-extension/skills/ml-hparam/SKILL.md +81 -0
  49. package/lib/bmad-extension/skills/ml-hparam/bmad-skill-manifest.yaml +3 -0
  50. package/lib/bmad-extension/skills/ml-hparam/skill.json +7 -0
  51. package/lib/bmad-extension/skills/ml-ideation/.gitkeep +0 -0
  52. package/lib/bmad-extension/skills/ml-ideation/SKILL.md +50 -0
  53. package/lib/bmad-extension/skills/ml-ideation/bmad-skill-manifest.yaml +3 -0
  54. package/lib/bmad-extension/skills/ml-ideation/scripts/validate_ml_prd.py +287 -0
  55. package/lib/bmad-extension/skills/ml-ideation/skill.json +7 -0
  56. package/lib/bmad-extension/skills/ml-infra/.gitkeep +0 -0
  57. package/lib/bmad-extension/skills/ml-infra/SKILL.md +58 -0
  58. package/lib/bmad-extension/skills/ml-infra/bmad-skill-manifest.yaml +3 -0
  59. package/lib/bmad-extension/skills/ml-infra/skill.json +7 -0
  60. package/lib/bmad-extension/skills/ml-retrospective/.gitkeep +0 -0
  61. package/lib/bmad-extension/skills/ml-retrospective/SKILL.md +63 -0
  62. package/lib/bmad-extension/skills/ml-retrospective/bmad-skill-manifest.yaml +3 -0
  63. package/lib/bmad-extension/skills/ml-retrospective/skill.json +7 -0
  64. package/lib/bmad-extension/skills/ml-revision/.gitkeep +0 -0
  65. package/lib/bmad-extension/skills/ml-revision/SKILL.md +82 -0
  66. package/lib/bmad-extension/skills/ml-revision/bmad-skill-manifest.yaml +3 -0
  67. package/lib/bmad-extension/skills/ml-revision/skill.json +7 -0
  68. package/lib/bmad-extension/skills/ml-techspec/.gitkeep +0 -0
  69. package/lib/bmad-extension/skills/ml-techspec/SKILL.md +80 -0
  70. package/lib/bmad-extension/skills/ml-techspec/bmad-skill-manifest.yaml +3 -0
  71. package/lib/bmad-extension/skills/ml-techspec/skill.json +7 -0
  72. package/lib/bmad.js +85 -8
  73. package/lib/skill-authoring.js +1 -1
  74. package/package.json +2 -2
  75. package/test/agent-injection-strategy.test.js +4 -4
  76. package/test/bmad-version-bump.test.js +34 -34
  77. package/test/build-bmad-args.test.js +13 -6
  78. package/test/convert-agents-to-skills.test.js +11 -1
  79. package/test/extension-module-restructure.test.js +31 -7
  80. package/test/migration-validation.test.js +14 -11
@@ -0,0 +1,287 @@
+ #!/usr/bin/env python3
+ """
+ validate_prd.py — BMAD DL Lifecycle
+ Validates docs/prd/01_PRD.md for completeness before architecture begins.
+
+ Usage:
+     python3 scripts/validate_prd.py <prd_path>
+     python3 scripts/validate_prd.py docs/prd/01_PRD.md
+
+ Exit codes:
+     0 — PASS
+     1 — validation errors found
+     2 — file not found or unreadable
+ """
+
+ from __future__ import annotations
+
+ import re
+ import sys
+ from dataclasses import dataclass, field
+ from pathlib import Path
+
+
+ # ── Configuration ─────────────────────────────────────────────────────────────
+
+ REQUIRED_SECTIONS = ["Project Overview", "Traceable Requirements", "Status"]
+ REQUIRED_REQ_CATEGORIES = {"System", "Data", "Performance"}
+ REQUIRED_REQ_PREFIXES = {"REQ-SYS", "REQ-DATA", "REQ-PERF"}
+ STATUS_APPROVAL_PATTERN = re.compile(r"\[x\]\s*Approved", re.IGNORECASE)
+
+ # Table column indices (0-based after splitting on |)
+ COL_REQ_ID = 1
+ COL_CATEGORY = 2
+ COL_DESCRIPTION = 3
+ COL_ACCEPTANCE = 4
+
+
+ # ── Data structures ────────────────────────────────────────────────────────────
+
+ @dataclass
+ class Requirement:
+     req_id: str
+     category: str
+     description: str
+     acceptance_criteria: str
+     line_number: int
+
+
+ @dataclass
+ class ValidationResult:
+     errors: list[str] = field(default_factory=list)
+     warnings: list[str] = field(default_factory=list)
+
+     @property
+     def passed(self) -> bool:
+         return len(self.errors) == 0
+
+     def add_error(self, msg: str) -> None:
+         self.errors.append(msg)
+
+     def add_warning(self, msg: str) -> None:
+         self.warnings.append(msg)
+
+
+ # ── Parsing helpers ────────────────────────────────────────────────────────────
+
+ def _clean_cell(cell: str) -> str:
+     return cell.strip().strip("*`[]")
+
+
+ def _is_separator_row(row: str) -> bool:
+     return bool(re.match(r"^\s*\|[\s\-:|]+\|\s*$", row))
+
+
+ def parse_requirements_table(lines: list[str]) -> list[Requirement]:
+     """Extract requirement rows from the markdown table in section B."""
+     requirements: list[Requirement] = []
+     in_table = False
+
+     for i, line in enumerate(lines, start=1):
+         if re.match(r"\|\s*Requirement\s*ID", line, re.IGNORECASE):
+             in_table = True
+             continue
+         if not in_table:
+             continue
+         if _is_separator_row(line):
+             continue
+         if not line.strip().startswith("|"):
+             in_table = False
+             continue
+
+         cells = line.split("|")
+         if len(cells) < 5:
+             continue
+
+         req_id = _clean_cell(cells[COL_REQ_ID])
+         category = _clean_cell(cells[COL_CATEGORY])
+         description = _clean_cell(cells[COL_DESCRIPTION])
+         acceptance = _clean_cell(cells[COL_ACCEPTANCE])
+
+         # Skip placeholder / header rows
+         if not req_id or req_id.startswith(":") or "Requirement" in req_id:
+             continue
+
+         requirements.append(Requirement(
+             req_id=req_id,
+             category=category,
+             description=description,
+             acceptance_criteria=acceptance,
+             line_number=i,
+         ))
+
+     return requirements
+
+
+ def find_sections(text: str) -> set[str]:
+     """Return set of section headings found (### A., ### B., etc.)."""
+     return set(re.findall(r"###\s+[A-Z]\.\s+(.+)", text))
+
+
+ # ── Validation checks ──────────────────────────────────────────────────────────
+
+ def check_required_sections(text: str, result: ValidationResult) -> None:
+     sections = find_sections(text)
+     for required in REQUIRED_SECTIONS:
+         if not any(required.lower() in s.lower() for s in sections):
+             result.add_error(f"Missing required section: '### X. {required}'")
+
+
+ def check_requirements_table(lines: list[str], result: ValidationResult) -> list[Requirement]:
+     reqs = parse_requirements_table(lines)
+
+     if not reqs:
+         result.add_error(
+             "No requirements found in the Traceable Requirements table. "
+             "Ensure the table header contains 'Requirement ID'."
+         )
+         return []
+
+     return reqs
+
+
+ def check_req_id_format(reqs: list[Requirement], result: ValidationResult) -> None:
+     pattern = re.compile(r"^REQ-[A-Z]+-\d+$")
+     for req in reqs:
+         if not pattern.match(req.req_id):
+             result.add_error(
+                 f"Line {req.line_number}: Invalid REQ-ID format '{req.req_id}'. "
+                 f"Expected pattern: REQ-<CATEGORY>-<NUMBER> (e.g. REQ-PERF-01)"
+             )
+
+
+ def check_category_coverage(reqs: list[Requirement], result: ValidationResult) -> None:
+     found_prefixes = {req.req_id.rsplit("-", 1)[0] for req in reqs}
+     missing = REQUIRED_REQ_PREFIXES - found_prefixes
+     for prefix in sorted(missing):
+         result.add_error(
+             f"No requirement with prefix '{prefix}-' found. "
+             f"Every PRD must include at least one {prefix} requirement."
+         )
+
+
+ def check_empty_fields(reqs: list[Requirement], result: ValidationResult) -> None:
+     placeholder_patterns = [
+         re.compile(r"^\[.+\]$"),  # [Placeholder text]
+         re.compile(r"^\.{3}$"),   # ...
+         re.compile(r"^-$"),       # -
+         re.compile(r"^TBD$", re.IGNORECASE),
+     ]
+
+     def _is_placeholder(value: str) -> bool:
+         return not value or any(p.match(value) for p in placeholder_patterns)
+
+     for req in reqs:
+         if _is_placeholder(req.description):
+             result.add_error(
+                 f"Line {req.line_number}: REQ '{req.req_id}' has an empty or placeholder Description."
+             )
+         if _is_placeholder(req.acceptance_criteria):
+             result.add_error(
+                 f"Line {req.line_number}: REQ '{req.req_id}' has an empty or placeholder Acceptance Criteria. "
+                 f"Every requirement must have measurable acceptance criteria."
+             )
+         if _is_placeholder(req.category):
+             result.add_error(
+                 f"Line {req.line_number}: REQ '{req.req_id}' has an empty Category."
+             )
+
+
+ def check_status_approval(text: str, result: ValidationResult) -> None:
+     if not STATUS_APPROVAL_PATTERN.search(text):
+         result.add_error(
+             "Status section does not show approval. "
+             "Expected: '* [x] Approved for Architecture Design'"
+         )
+
+
+ def check_duplicate_req_ids(reqs: list[Requirement], result: ValidationResult) -> None:
+     seen: dict[str, int] = {}
+     for req in reqs:
+         if req.req_id in seen:
+             result.add_error(
+                 f"Duplicate REQ-ID '{req.req_id}' found at line {req.line_number} "
+                 f"(first seen at line {seen[req.req_id]})."
+             )
+         else:
+             seen[req.req_id] = req.line_number
+
+
+ def check_minimum_requirements(reqs: list[Requirement], result: ValidationResult) -> None:
+     if len(reqs) < 3:
+         result.add_warning(
+             f"Only {len(reqs)} requirement(s) found. A meaningful PRD typically has "
+             f"at least one each of REQ-SYS, REQ-DATA, and REQ-PERF."
+         )
+
+
+ # ── Main ───────────────────────────────────────────────────────────────────────
+
+ def validate(prd_path: Path) -> ValidationResult:
+     result = ValidationResult()
+
+     try:
+         text = prd_path.read_text(encoding="utf-8")
+     except FileNotFoundError:
+         result.add_error(f"File not found: {prd_path}")
+         return result
+     except OSError as e:
+         result.add_error(f"Cannot read file: {e}")
+         return result
+
+     lines = text.splitlines()
+
+     check_required_sections(text, result)
+     reqs = check_requirements_table(lines, result)
+
+     if reqs:
+         check_req_id_format(reqs, result)
+         check_category_coverage(reqs, result)
+         check_empty_fields(reqs, result)
+         check_duplicate_req_ids(reqs, result)
+         check_minimum_requirements(reqs, result)
+
+     check_status_approval(text, result)
+
+     return result
+
+
+ def print_report(prd_path: Path, result: ValidationResult) -> None:
+     print(f"\nValidating: {prd_path}")
+     print("─" * 60)
+
+     if result.passed and not result.warnings:
+         print("✓ PRD validation PASSED — ready for architecture phase.")
+         return
+
+     if result.errors:
+         print(f"✗ FAILED — {len(result.errors)} error(s) must be fixed:\n")
+         for i, err in enumerate(result.errors, 1):
+             print(f" {i}. {err}")
+
+     if result.warnings:
+         print(f"\n⚠ {len(result.warnings)} warning(s):\n")
+         for w in result.warnings:
+             print(f" • {w}")
+
+     if result.passed:
+         print("\n✓ PRD validation PASSED (with warnings).")
+
+
+ def main() -> int:
+     if len(sys.argv) < 2:
+         print("Usage: python3 validate_prd.py <prd_path>", file=sys.stderr)
+         print("Example: python3 validate_prd.py docs/prd/01_PRD.md", file=sys.stderr)
+         return 2
+
+     prd_path = Path(sys.argv[1])
+     result = validate(prd_path)
+     print_report(prd_path, result)
+
+     if not result.passed:
+         return 1
+     return 0
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
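For reference, a minimal sketch of the table shape the validator above accepts: a header row containing "Requirement ID", a separator row, then one row per requirement with IDs matching `REQ-<CATEGORY>-<NUMBER>`. The row contents here are illustrative placeholders, not taken from the package:

```python
import re

# A minimal Traceable Requirements table in the shape the parser expects
table = """\
| Requirement ID | Category | Description | Acceptance Criteria |
| :--- | :--- | :--- | :--- |
| REQ-SYS-01 | System | Serve predictions via REST | p95 latency < 100ms |
| REQ-DATA-01 | Data | Ingest daily CSV drops | 0 schema violations |
| REQ-PERF-01 | Performance | Beat the majority baseline | Recall >= 0.85 |
"""

REQ_ID = re.compile(r"^REQ-[A-Z]+-\d+$")  # same pattern as check_req_id_format

# Split each data row on "|" and drop the empty edge cells
rows = [
    [c.strip() for c in line.split("|")[1:-1]]
    for line in table.splitlines()[2:]  # skip header + separator rows
]
ids = [row[0] for row in rows]
assert all(REQ_ID.match(i) for i in ids)

# Category coverage: the prefix set must include SYS, DATA, and PERF
prefixes = {i.rsplit("-", 1)[0] for i in ids}
assert prefixes == {"REQ-SYS", "REQ-DATA", "REQ-PERF"}
print(prefixes)
```

A table shaped like this passes the format, coverage, and non-empty-field checks; dropping any of the three rows would trip `check_category_coverage`.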
@@ -0,0 +1,7 @@
+ {
+   "name": "ML Ideation & Framing",
+   "description": "Frames the Machine Learning problem, defines the Research Thesis, and produces the PRD requirements.",
+   "version": "1.0.0",
+   "author": "Demerzel (ML Scientist)",
+   "tags": ["Machine Learning", "Planning", "Ideation"]
+ }
@@ -0,0 +1,58 @@
+ ---
+ name: ml-infra
+ description: ML Infra — Set up the Python environment, install dependencies, and run a smoke test before full training
+ ---
+
+ # ML Stage 5 — Infrastructure & Smoke Test
+
+ Validate the full pipeline end-to-end on a tiny data slice before committing to a full training run.
+
+ ## Instructions
+
+ ### 1. Load Context
+ - Read `_bmad-output/planning-artifacts/ml-techspec.md` — confirm it is LOCKED
+ - If TechSpec is not locked, STOP: "TechSpec must be locked before infrastructure setup. Run /ml-techspec first."
+ - Read `configs/ml_config.yaml` for the tracking_tool setting
+
+ ### 2. Environment Setup
+ Guide the user through (or execute if tools are available):
+ ```bash
+ uv venv
+ uv sync
+ ```
+ If new dependencies were added in the architecture stage:
+ ```bash
+ uv add <package1> <package2>
+ ```
+
+ ### 3. Configure Experiment Tracking
+ Based on `tracking_tool` in config:
+ - **wandb**: Verify `import wandb; wandb.login()` works. Create project if needed.
+ - **mlflow**: Verify `import mlflow; mlflow.set_tracking_uri(...)` works.
+ - **clearml**: Verify `clearml-init` has been run.
+ - **local**: Confirm `logs/` directory exists for CSV/JSON run logs.
+
+ ### 4. Write Smoke Test Script
+ If `scripts/smoke_test.py` does not exist, create it:
+ - Load the first 100 rows of training data
+ - Run the full preprocessing pipeline
+ - Instantiate the model with fixed hyperparameters from the TechSpec
+ - Train on 80 rows, predict on 20 rows
+ - Log one metric to the tracking tool
+ - Assert the pipeline completes without error
+
+ Run the smoke test: `uv run python scripts/smoke_test.py`
+
+ ### 5. Verify Outputs
+ Confirm:
+ - Environment activates cleanly (`uv run python --version`)
+ - All imports resolve (no ModuleNotFoundError)
+ - Data loads from the path specified in the TechSpec
+ - Preprocessing pipeline runs without error
+ - Smoke test metric appears in the tracking tool dashboard (or local log)
+
+ ### 6. Confirm & Advance
+ - Report smoke test result: PASS or FAIL with error details
+ - On PASS: "Infrastructure validated. Proceed to **Stage 6 — /ml-experiment** to run the full experiment."
+ - On FAIL: Fix the issue before advancing. Do not proceed to full training with a broken pipeline.
+ - STOP and WAIT for user confirmation
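The smoke-test steps in the skill above can be sketched as follows. This is a hedged, dependency-free illustration only: the real `scripts/smoke_test.py` would load the data path, preprocessing, and model named in the TechSpec, whereas here synthetic rows and a trivial majority-class predictor stand in for both:

```python
import json
import random

random.seed(42)  # fixed seed, as the Reproducibility contract requires

# Stand-in for "load the first 100 rows of training data"
rows = [{"x": random.random(), "y": random.randint(0, 1)} for _ in range(100)]

# Stand-in preprocessing step: extract feature and label columns
features = [r["x"] for r in rows]
labels = [r["y"] for r in rows]

# Train on 80 rows, predict on 20 (majority-class stand-in for the real model)
train_y, test_y = labels[:80], labels[80:]
majority = max(set(train_y), key=train_y.count)
preds = [majority] * len(test_y)

# Log one metric ("local" tracking mode: emit a JSON record)
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(json.dumps({"run": "smoke_test", "accuracy": accuracy}))

# Assert the pipeline completed end to end
assert len(preds) == 20 and 0.0 <= accuracy <= 1.0
```

The point is not the metric value; it is that every stage (load, preprocess, fit, predict, log) executes once before any expensive full run.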
@@ -0,0 +1,3 @@
+ type: skill
+ name: ml-infra
+ module: ma-skills
@@ -0,0 +1,7 @@
+ {
+   "name": "ML Infrastructure Build",
+   "description": "Builds all engineering infrastructure (data pipelines, training loops, tracking, eval) for experiments.",
+   "version": "1.0.0",
+   "author": "Demerzel (ML Scientist)",
+   "tags": ["Machine Learning", "Infrastructure", "MLOps", "Data Pipeline", "Training"]
+ }
@@ -0,0 +1,63 @@
+ ---
+ name: ml-retrospective
+ description: ML Retrospective - Capture learnings, document what worked and what failed, and produce a reusable knowledge artifact
+ ---
+
+ # ML Stage 8 - Retrospective
+
+ Close the experiment loop. Document learnings so the next experiment starts smarter.
+
+ ## Instructions
+
+ ### 1. Load Context
+ - Read all artifacts in `_bmad-output/planning-artifacts/`
+ - Read all artifacts in `_bmad-output/implementation-artifacts/`
+ - Reconstruct the full lifecycle: what was hypothesized, what was built, what happened
+
+ ### 2. Write Retrospective
+ Write `_bmad-output/planning-artifacts/retrospective.md` with these sections:
+
+ #### Hypothesis Outcome
+ - **Original Hypothesis** (from research-thesis.md)
+ - **Outcome**: CONFIRMED / REFUTED / PARTIALLY CONFIRMED
+ - **Evidence**: Key metrics that support the conclusion
+
+ #### What Worked Well
+ - Data characteristics that made modelling tractable
+ - Preprocessing decisions that improved results
+ - Architecture choices that were validated
+ - Process decisions that saved time
+
+ #### What Did Not Work
+ - Approaches attempted and discarded (with evidence, not opinion)
+ - Data quality issues that were not caught in EDA
+ - Assumptions from the PRD that were wrong
+ - HPO strategies that did not yield improvement
+
+ #### Failure Cost Reflection
+ - Actual FP/FN distribution in the best model
+ - Whether the failure cost tradeoff met business expectations
+ - Recommended threshold for production use (with cost justification)
+
+ #### Technical Debt
+ - Known shortcuts taken and their risk
+ - Features excluded that may still have signal
+ - Data collection improvements recommended for the next iteration
+
+ #### Recommended Next Experiments
+ - Top 3 actionable hypotheses for the next iteration, ranked by expected improvement
+ - Each with: hypothesis, required data, estimated effort
+
+ #### Process Improvements
+ - Anything in the Demerzel protocol that should be adjusted for this domain
+ - Data pipeline improvements needed before the next experiment
+
+ ### 3. Archive Experiment
+ - Confirm all artifacts are saved under `_bmad-output/`
+ - Confirm the model artifact is saved under `_bmad-output/implementation-artifacts/models/`
+ - Confirm the experiment is logged and accessible in the tracking tool
+
+ ### 4. Close Session
+ - Present retrospective summary
+ - "Experiment cycle complete. All artifacts are saved in `_bmad-output/`. You can start a new experiment cycle with /ml-ideation or dismiss Demerzel with DA."
+ - STOP and WAIT for user input
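For the Failure Cost Reflection section, the FP/FN distribution and its cost weighting can be tallied directly from the best model's validation predictions. A minimal sketch, where the prediction lists and the two cost constants are illustrative placeholders (in practice they come from the tracking tool and the PRD):

```python
# Illustrative validation labels and predictions (placeholders)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Tally false positives and false negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Weight by the failure costs stated in the PRD (placeholder values)
COST_FP, COST_FN = 1.0, 5.0
total_cost = fp * COST_FP + fn * COST_FN
print(f"FP={fp} FN={fn} weighted cost={total_cost}")
```

Reporting the weighted cost alongside raw FP/FN counts makes the "met business expectations" judgment auditable rather than impressionistic.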
@@ -0,0 +1,3 @@
+ type: skill
+ name: ml-retrospective
+ module: ma-skills
@@ -0,0 +1,7 @@
+ {
+   "name": "ML Session Retrospective",
+   "description": "Extracts learnings, metrics, and failure modes from the current session and archives them to the knowledge base.",
+   "version": "1.0.0",
+   "author": "Demerzel (ML Scientist)",
+   "tags": ["Machine Learning", "Retrospective", "Knowledge Capture", "Post-mortem", "Demerzel"]
+ }
@@ -0,0 +1,82 @@
+ ---
+
+ name: ml-revision
+
+ description: Acts as Demerzel (Machine Learning Scientist) to formulate the next hypothesis, explicitly amend all upstream documents (Thesis, PRD, Architecture, Design) that need updating, and generate the task set for the next experiment cycle.
+
+ ---
+
+ # Machine Learning Workflow: Iterative Revision Cycle — Demerzel
+
+ ## 1. Operating Instructions
+
+ You are **Demerzel**, an expert Machine Learning Scientist. A revision is not just a hypothesis update. It is a **document audit**: every upstream document that no longer accurately reflects what was learned must be explicitly amended.
+
+ 1. **Summarize experiment history:**
+    ```bash
+    python3 scripts/summarize_experiment_history.py _bmad-output/implementation-artifacts/ --metric val/f1
+    ```
+
+ 2. **Read in order:**
+    - `_bmad-output/planning-artifacts/research-thesis.md`
+    - `_bmad-output/implementation-artifacts/ml-analysis-exp-[id].md`
+    - `_bmad-output/planning-artifacts/techspecs/ml-techspec-exp-[id].md`
+    - `_bmad-output/planning-artifacts/ml-prd.md`
+    - `_bmad-output/planning-artifacts/ml-architecture.md`
+    - `_bmad-output/planning-artifacts/ml-detailed-design.md`
+
+ 3. **Conduct the document audit.** For each upstream document, state:
+    - **No change needed** — with explicit reason.
+    - **Amendment needed** — with the exact proposed change.
+
+ 4. **Formulate the next hypothesis:**
+    - Format: "Using [change] will improve [metric] from [baseline] to [target] because [reasoning from the latest analysis]."
+    - The hypothesis must be falsifiable. State what result would disprove it.
+
+ 5. **Generate new tasks** for the next cycle:
+    - New infrastructure tasks → `INF-0XX` (if architecture changes).
+    - New experiment tasks → `EXP-0XX` (increment the counter).
+
+ 6. **CRITICAL:** Do not execute any changes yet. Present the full revision plan (History, Verdict, Audit, New Hypothesis, New Tasks) to the user. Halt and wait.
+
+ 7. Upon approval, execute all changes:
+    - Apply all document amendments.
+    - Update `research-thesis.md` Section II (new) and Section V (archive old).
+    - Append to `_bmad-output/implementation-artifacts/ml-revision-log.md`.
+
+ 8. **Commit the revision artifact:**
+    ```bash
+    git add _bmad-output/planning-artifacts/ _bmad-output/implementation-artifacts/ml-revision-log.md
+    git commit -m "docs(ml-revision): cycle [N] -- new hypothesis H-00N"
+    ```
+
+ ## 2. Expected Output Templates
+
+ ### Template A: Update to `_bmad-output/planning-artifacts/research-thesis.md`
+ - Section II: Replace active hypothesis. Set status to "Untested".
+ - Section V: Append the previous hypothesis row.
+
+ ### Template B: `_bmad-output/implementation-artifacts/ml-revision-log.md`
+
+ ```markdown
+ ### Revision Cycle [N]
+ * **Triggered By:** EXP-[ID]
+ * **Verdict:** [SUPPORTED / FALSIFIED]
+ * **New Hypothesis:** "[New domain-grounded, falsifiable statement]"
+ * **Rationale:** [Evidence from analysis]
+
+ ### Document Amendment Log
+ | Document | Change / No Change | Detail |
+ | :--- | :--- | :--- |
+ | research-thesis.md | AMENDED | Section II and V updated. |
+ | ml-prd.md | [Change] | [Detail] |
+ | ml-architecture.md | [Change] | [Detail] |
+
+ ### New Task Generation
+ | Task ID | Description | Linked Req |
+ | :--- | :--- | :--- |
+ | `EXP-0XX` | [Training run with new hypothesis] | [REQ-ID] |
+ | `INF-0XX` | [New infra if architecture changed] | [REQ-ID] |
+
+ * **Status:** [Approved — ready for /ml-techspec]
+ ```
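The summarizer invoked in step 1 is not shown in this diff. A hypothetical sketch of what such a script might do, assuming each run directory holds a `metrics.json` keyed by metric name (the file layout, keys, and run names here are all assumptions, not the package's actual implementation):

```python
import json
import tempfile
from pathlib import Path

def summarize(artifacts_dir: Path, metric: str):
    """Collect {metric: value} from each run's metrics.json and rank runs, best first."""
    runs = []
    for f in sorted(artifacts_dir.glob("**/metrics.json")):
        data = json.loads(f.read_text())
        if metric in data:
            runs.append((f.parent.name, data[metric]))
    return sorted(runs, key=lambda r: r[1], reverse=True)

# Demo against a temporary artifacts tree with two fake runs
with tempfile.TemporaryDirectory() as d:
    for run, score in [("EXP-001", 0.71), ("EXP-002", 0.78)]:
        p = Path(d) / run
        p.mkdir()
        (p / "metrics.json").write_text(json.dumps({"val/f1": score}))
    ranked = summarize(Path(d), "val/f1")
    print(ranked)  # best run first
```

Whatever the real script's storage format, its role in the revision cycle is the same: surface the metric trajectory across cycles so the verdict and the next hypothesis rest on the full history, not the latest run alone.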
@@ -0,0 +1,3 @@
+ type: skill
+ name: ml-revision
+ module: ma-skills
@@ -0,0 +1,7 @@
+ {
+   "name": "ML Iterative Revision",
+   "description": "Formulates the next hypothesis, audits all upstream documents, and generates the task set for the next cycle.",
+   "version": "1.0.0",
+   "author": "Demerzel (ML Scientist)",
+   "tags": ["Machine Learning", "Revision", "Iteration", "Hypothesis", "Audit", "Demerzel"]
+ }
@@ -0,0 +1,80 @@
+ ---
+ name: ml-techspec
+ description: ML TechSpec — Lock the experiment contract. Defines acceptance criteria that cannot be changed during training.
+ ---
+
+ # ML Stage 4 — TechSpec (The Contract)
+
+ The TechSpec is a locked contract. Once approved, parameters and acceptance criteria MUST NOT be changed during experiment execution. This prevents moving the goalposts after seeing results.
+
+ ## Instructions
+
+ ### 1. Load Context
+ - Read `_bmad-output/planning-artifacts/ml-prd.md`
+ - Read `_bmad-output/planning-artifacts/eda-report.md`
+ - Read `_bmad-output/planning-artifacts/ml-architecture.md`
+
+ ### 2. Draft the TechSpec
+ Write `_bmad-output/planning-artifacts/ml-techspec.md` with the following sections — every field is REQUIRED:
+
+ ```
+ # ML TechSpec — [Project Name]
+ Status: DRAFT -> LOCKED (set to LOCKED on user approval)
+ Locked At: [timestamp on approval]
+
+ ## Experiment Identity
+ - Project: [project_name from configs/ml_config.yaml]
+ - Tracking Tool: [wandb / mlflow / clearml / local]
+ - Run Name Convention: [e.g. xgb_v{version}_{date}]
+
+ ## Data Contract
+ - Training Data: [path, row count, date range]
+ - Validation Strategy: [stratified k-fold N=X / time-series split / holdout ratio]
+ - Test Set: [path or split ratio — NEVER touched until final evaluation]
+ - Feature Set: [list of features, with preprocessing step for each]
+ - Excluded Features: [list with reason for exclusion]
+
+ ## Model Contract
+ - Algorithm: [exact algorithm name and library]
+ - Baseline: [exact baseline — e.g. DummyClassifier(strategy='most_frequent')]
+ - Fixed Hyperparameters: [params NOT being tuned, with values]
+ - HPO Space: [params being tuned, with ranges and search strategy]
+ - HPO Budget: [max trials or max wall-clock time]
+
+ ## Acceptance Criteria (PRIMARY — must ALL pass)
+ - Primary Metric: [e.g. Recall >= 0.85 on validation set]
+ - Secondary Metric: [e.g. AUC-ROC >= 0.80]
+ - Baseline Beat: [model must beat baseline on primary metric]
+
+ ## Guardrail Criteria (MUST NOT violate)
+ - [e.g. Precision >= 0.50 — below this FP rate is unacceptable in domain]
+ - [e.g. Inference latency < 100ms per sample]
+
+ ## Failure Cost Reminder
+ - False Negative cost: [from PRD]
+ - False Positive cost: [from PRD]
+ - Threshold selection strategy: [maximize recall / F-beta / cost-weighted]
+
+ ## Reproducibility
+ - Random seed: [integer]
+ - uv lockfile: pyproject.toml + uv.lock
+ ```
+
+ ### 3. Surface Dilemmas & Commit Gate
+
+ Before presenting and **before any git commit**:
+
+ - Identify every contract decision where two or more reasonable options existed (metric threshold values, HPO budget, random seed choice, held-out split ratio, guardrail definitions, etc.)
+ - Format each as: **Dilemma [Letter] — Title** / **Context** / **Options (a/b)** / **Recommendation** / **Your decision:** [blank]
+ - If all choices were unambiguous, state explicitly: "No open dilemmas."
+ - **Do NOT commit the TechSpec until the user has resolved all dilemmas and given explicit lock approval.**
+
+ ### 4. Lock the Contract
+ - Present the TechSpec draft and all surfaced dilemmas to the user
+ - State explicitly: "Once you approve this, the acceptance criteria and data contract are locked. You may tune hyperparameters but may NOT change the primary metric threshold, feature set, or validation strategy during training."
+ - Ask: "Do you approve and lock this TechSpec?"
+ - On approval: Set Status to LOCKED, add Locked At timestamp, then commit
+
+ ### 5. Confirm & Advance
+ - "TechSpec locked. Proceed to **Stage 5 — /ml-infra** to set up the environment and run a smoke test."
+ - STOP and WAIT for user confirmation
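The "cost-weighted" threshold selection strategy named in the Failure Cost Reminder can be illustrated with a small sketch: sweep candidate thresholds over validation scores and pick the one minimizing the PRD's weighted FP/FN cost. The scores, labels, and cost constants below are placeholders, not values from any real experiment:

```python
# Illustrative validation scores and labels (placeholders)
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.55]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
COST_FP, COST_FN = 1.0, 5.0  # placeholder failure costs from the PRD

def cost_at(threshold):
    """Weighted misclassification cost when predicting positive at score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * COST_FP + fn * COST_FN

# Sweep candidate thresholds (the observed scores) and pick the cheapest
candidates = sorted(set(scores))
best = min(candidates, key=cost_at)
print(best, cost_at(best))
```

Because the FN cost here is 5x the FP cost, the cheapest threshold sits low enough to avoid missing positives, which is exactly the tradeoff the contract is meant to pin down before training starts.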
@@ -0,0 +1,3 @@
+ type: skill
+ name: ml-techspec
+ module: ma-skills
@@ -0,0 +1,7 @@
+ {
+   "name": "ML Pre-Experiment TechSpec",
+   "description": "Produces a pre-experiment contract that pins hypotheses, parameters, and success criteria.",
+   "version": "1.0.0",
+   "author": "Demerzel (ML Scientist)",
+   "tags": ["Machine Learning", "TechSpec", "Experiment Design", "Contract"]
+ }