get-research-done 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (127)
  1. package/LICENSE +21 -0
  2. package/README.md +560 -0
  3. package/agents/grd-architect.md +789 -0
  4. package/agents/grd-codebase-mapper.md +738 -0
  5. package/agents/grd-critic.md +1065 -0
  6. package/agents/grd-debugger.md +1203 -0
  7. package/agents/grd-evaluator.md +948 -0
  8. package/agents/grd-executor.md +784 -0
  9. package/agents/grd-explorer.md +2063 -0
  10. package/agents/grd-graduator.md +484 -0
  11. package/agents/grd-integration-checker.md +423 -0
  12. package/agents/grd-phase-researcher.md +641 -0
  13. package/agents/grd-plan-checker.md +745 -0
  14. package/agents/grd-planner.md +1386 -0
  15. package/agents/grd-project-researcher.md +865 -0
  16. package/agents/grd-research-synthesizer.md +256 -0
  17. package/agents/grd-researcher.md +2361 -0
  18. package/agents/grd-roadmapper.md +605 -0
  19. package/agents/grd-verifier.md +778 -0
  20. package/bin/install.js +1294 -0
  21. package/commands/grd/add-phase.md +207 -0
  22. package/commands/grd/add-todo.md +193 -0
  23. package/commands/grd/architect.md +283 -0
  24. package/commands/grd/audit-milestone.md +277 -0
  25. package/commands/grd/check-todos.md +228 -0
  26. package/commands/grd/complete-milestone.md +136 -0
  27. package/commands/grd/debug.md +169 -0
  28. package/commands/grd/discuss-phase.md +86 -0
  29. package/commands/grd/evaluate.md +1095 -0
  30. package/commands/grd/execute-phase.md +339 -0
  31. package/commands/grd/explore.md +258 -0
  32. package/commands/grd/graduate.md +323 -0
  33. package/commands/grd/help.md +482 -0
  34. package/commands/grd/insert-phase.md +227 -0
  35. package/commands/grd/insights.md +231 -0
  36. package/commands/grd/join-discord.md +18 -0
  37. package/commands/grd/list-phase-assumptions.md +50 -0
  38. package/commands/grd/map-codebase.md +71 -0
  39. package/commands/grd/new-milestone.md +721 -0
  40. package/commands/grd/new-project.md +1008 -0
  41. package/commands/grd/pause-work.md +134 -0
  42. package/commands/grd/plan-milestone-gaps.md +295 -0
  43. package/commands/grd/plan-phase.md +525 -0
  44. package/commands/grd/progress.md +364 -0
  45. package/commands/grd/quick-explore.md +236 -0
  46. package/commands/grd/quick.md +309 -0
  47. package/commands/grd/remove-phase.md +349 -0
  48. package/commands/grd/research-phase.md +200 -0
  49. package/commands/grd/research.md +681 -0
  50. package/commands/grd/resume-work.md +40 -0
  51. package/commands/grd/set-profile.md +106 -0
  52. package/commands/grd/settings.md +136 -0
  53. package/commands/grd/update.md +172 -0
  54. package/commands/grd/verify-work.md +219 -0
  55. package/get-research-done/config/default.json +15 -0
  56. package/get-research-done/references/checkpoints.md +1078 -0
  57. package/get-research-done/references/continuation-format.md +249 -0
  58. package/get-research-done/references/git-integration.md +254 -0
  59. package/get-research-done/references/model-profiles.md +73 -0
  60. package/get-research-done/references/planning-config.md +94 -0
  61. package/get-research-done/references/questioning.md +141 -0
  62. package/get-research-done/references/tdd.md +263 -0
  63. package/get-research-done/references/ui-brand.md +160 -0
  64. package/get-research-done/references/verification-patterns.md +612 -0
  65. package/get-research-done/templates/DEBUG.md +159 -0
  66. package/get-research-done/templates/UAT.md +247 -0
  67. package/get-research-done/templates/archive-reason.md +195 -0
  68. package/get-research-done/templates/codebase/architecture.md +255 -0
  69. package/get-research-done/templates/codebase/concerns.md +310 -0
  70. package/get-research-done/templates/codebase/conventions.md +307 -0
  71. package/get-research-done/templates/codebase/integrations.md +280 -0
  72. package/get-research-done/templates/codebase/stack.md +186 -0
  73. package/get-research-done/templates/codebase/structure.md +285 -0
  74. package/get-research-done/templates/codebase/testing.md +480 -0
  75. package/get-research-done/templates/config.json +35 -0
  76. package/get-research-done/templates/context.md +283 -0
  77. package/get-research-done/templates/continue-here.md +78 -0
  78. package/get-research-done/templates/critic-log.md +288 -0
  79. package/get-research-done/templates/data-report.md +173 -0
  80. package/get-research-done/templates/debug-subagent-prompt.md +91 -0
  81. package/get-research-done/templates/decision-log.md +58 -0
  82. package/get-research-done/templates/decision.md +138 -0
  83. package/get-research-done/templates/discovery.md +146 -0
  84. package/get-research-done/templates/experiment-readme.md +104 -0
  85. package/get-research-done/templates/graduated-script.md +180 -0
  86. package/get-research-done/templates/iteration-summary.md +234 -0
  87. package/get-research-done/templates/milestone-archive.md +123 -0
  88. package/get-research-done/templates/milestone.md +115 -0
  89. package/get-research-done/templates/objective.md +271 -0
  90. package/get-research-done/templates/phase-prompt.md +567 -0
  91. package/get-research-done/templates/planner-subagent-prompt.md +117 -0
  92. package/get-research-done/templates/project.md +184 -0
  93. package/get-research-done/templates/requirements.md +231 -0
  94. package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
  95. package/get-research-done/templates/research-project/FEATURES.md +147 -0
  96. package/get-research-done/templates/research-project/PITFALLS.md +200 -0
  97. package/get-research-done/templates/research-project/STACK.md +120 -0
  98. package/get-research-done/templates/research-project/SUMMARY.md +170 -0
  99. package/get-research-done/templates/research.md +529 -0
  100. package/get-research-done/templates/roadmap.md +202 -0
  101. package/get-research-done/templates/scorecard.json +113 -0
  102. package/get-research-done/templates/state.md +287 -0
  103. package/get-research-done/templates/summary.md +246 -0
  104. package/get-research-done/templates/user-setup.md +311 -0
  105. package/get-research-done/templates/verification-report.md +322 -0
  106. package/get-research-done/workflows/complete-milestone.md +756 -0
  107. package/get-research-done/workflows/diagnose-issues.md +231 -0
  108. package/get-research-done/workflows/discovery-phase.md +289 -0
  109. package/get-research-done/workflows/discuss-phase.md +433 -0
  110. package/get-research-done/workflows/execute-phase.md +657 -0
  111. package/get-research-done/workflows/execute-plan.md +1844 -0
  112. package/get-research-done/workflows/list-phase-assumptions.md +178 -0
  113. package/get-research-done/workflows/map-codebase.md +322 -0
  114. package/get-research-done/workflows/resume-project.md +307 -0
  115. package/get-research-done/workflows/transition.md +556 -0
  116. package/get-research-done/workflows/verify-phase.md +628 -0
  117. package/get-research-done/workflows/verify-work.md +596 -0
  118. package/hooks/dist/grd-check-update.js +61 -0
  119. package/hooks/dist/grd-statusline.js +84 -0
  120. package/package.json +47 -0
  121. package/scripts/audit-help-commands.sh +115 -0
  122. package/scripts/build-hooks.js +42 -0
  123. package/scripts/verify-all-commands.sh +246 -0
  124. package/scripts/verify-architect-warning.sh +35 -0
  125. package/scripts/verify-insights-mode.sh +40 -0
  126. package/scripts/verify-quick-mode.sh +20 -0
  127. package/scripts/verify-revise-data-routing.sh +139 -0
@@ -0,0 +1,180 @@
1
+ # Graduated Script Template
2
+
3
+ This template is used when graduating an exploration notebook to a validated Python script via `/grd:graduate`.
4
+
5
+ ## Template Variables
6
+
7
+ | Variable | Source | Description |
8
+ |----------|--------|-------------|
9
+ | `{{experiment_name}}` | OBJECTIVE.md or notebook filename | Human-readable experiment name |
10
+ | `{{source_notebook}}` | User input | Path to source notebook (e.g., `notebooks/exploration/001_initial.ipynb`) |
11
+ | `{{source_run}}` | Run directory | Run that achieved PROCEED verdict (e.g., `experiments/run_003_baseline`) |
12
+ | `{{critic_verdict}}` | CRITIC.md | Critic verdict (always PROCEED for graduation) |
13
+ | `{{verdict_date}}` | CRITIC.md | ISO 8601 date of verdict |
14
+ | `{{graduation_timestamp}}` | System | ISO 8601 timestamp when graduated |
15
+
16
+ ## Python Script Template
17
+
18
+ ```python
19
+ """
20
+ Validated experiment: {{experiment_name}}
21
+
22
+ Source notebook: {{source_notebook}}
23
+ Source run: {{source_run}}
24
+ Critic verdict: {{critic_verdict}} ({{verdict_date}})
25
+ Graduated: {{graduation_timestamp}}
26
+
27
+ MANUAL REFACTORING REQUIRED:
28
+ - [ ] Remove/convert magic commands (grep "^%" and "get_ipython")
29
+ - [ ] Extract code into functions
30
+ - [ ] Replace parameter cell with argparse
31
+ - [ ] Add docstrings and type hints
32
+ - [ ] Set all random seeds explicitly
33
+ - [ ] Write tests for core functions
34
+
35
+ This script was auto-generated from a validated notebook.
36
+ Review and complete the refactoring checklist above before production use.
37
+ """
38
+
39
+ import argparse
40
+ import random
41
+ from typing import Any
42
+
43
+ import numpy as np
44
+
45
+ # Uncomment if using PyTorch:
46
+ # import torch
47
+
48
+ # Uncomment if using TensorFlow:
49
+ # import tensorflow as tf
50
+
51
+
52
+ def set_random_seeds(seed: int = 42) -> None:
53
+     """Set all random seeds for reproducibility.
54
+
55
+     Args:
56
+         seed: Random seed value (default: 42)
57
+     """
58
+     random.seed(seed)
59
+     np.random.seed(seed)
60
+
61
+     # Uncomment for PyTorch:
62
+     # torch.manual_seed(seed)
63
+     # torch.cuda.manual_seed_all(seed)
64
+     # torch.backends.cudnn.deterministic = True
65
+     # torch.backends.cudnn.benchmark = False
66
+
67
+     # Uncomment for TensorFlow:
68
+     # tf.random.set_seed(seed)
69
+
70
+
71
+ def main(args: argparse.Namespace) -> dict[str, Any]:
72
+     """Main experiment entry point.
73
+
74
+     Args:
75
+         args: Parsed command line arguments
76
+
77
+     Returns:
78
+         Dictionary containing experiment results/metrics
79
+     """
80
+     set_random_seeds(args.random_seed)
81
+
82
+     # ---------------------------------------------------------------------
83
+     # TODO: Implement experiment logic here
84
+     # Extracted from: {{source_notebook}}
85
+     #
86
+     # Refactoring guidance:
87
+     # 1. Convert notebook cells to functions
88
+     # 2. Add type hints to all function signatures
89
+     # 3. Replace hardcoded values with args.* parameters
90
+     # 4. Return metrics as a dictionary for logging
91
+     # ---------------------------------------------------------------------
92
+
93
+     results = {
94
+         "status": "not_implemented",
95
+         "message": "Replace this with actual experiment implementation"
96
+     }
97
+
98
+     return results
99
+
100
+
101
+ if __name__ == "__main__":
102
+     parser = argparse.ArgumentParser(
103
+         description="{{experiment_name}}",
104
+         formatter_class=argparse.ArgumentDefaultsHelpFormatter
105
+     )
106
+
107
+     # Standard arguments
108
+     parser.add_argument(
109
+         "--random-seed",
110
+         type=int,
111
+         default=42,
112
+         help="Random seed for reproducibility"
113
+     )
114
+
115
+     # TODO: Add experiment-specific arguments
116
+     # Example:
117
+     # parser.add_argument("--learning-rate", type=float, default=0.001)
118
+     # parser.add_argument("--epochs", type=int, default=100)
119
+     # parser.add_argument("--batch-size", type=int, default=32)
120
+     # parser.add_argument("--data-path", type=str, required=True)
121
+
122
+     args = parser.parse_args()
123
+
124
+     results = main(args)
125
+     print(f"Experiment complete: {results}")
126
+ ```
127
+
128
+ ## Usage
129
+
130
+ The graduation workflow:
131
+
132
+ 1. **Notebook achieves PROCEED verdict** via `/grd:research`
133
+ 2. **User initiates graduation** via `/grd:graduate`
134
+ 3. **System converts notebook to script** using nbconvert
135
+ 4. **System prepends this header template** with filled variables
136
+ 5. **Script lands in** `src/experiments/{{sanitized_experiment_name}}.py`
137
+ 6. **User completes refactoring checklist** before production use
138
+
139
+ ## Refactoring Checklist Details
140
+
141
+ ### Remove/convert magic commands
142
+ ```bash
143
+ # Find leftover magics (nbconvert usually rewrites them as get_ipython() calls)
144
+ grep -nE '^[[:space:]]*[%!]|get_ipython\(' src/experiments/your_script.py
145
+ ```
146
+ Common conversions:
147
+ - `%matplotlib inline` - Remove (not needed in scripts)
148
+ - `%load_ext autoreload` - Remove (not applicable)
149
+ - `%%time` - Replace with `time.perf_counter()` timing or a profiler
150
+ - `!pip install` - Move to requirements.txt
151
+
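As one way to replace a `%%time` cell, the sketch below wraps a block in a small timing context manager. The names `timed`, `timings`, and the `sum_of_squares` label are hypothetical and not part of the template; they only illustrate the conversion.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(label: str, sink: dict[str, float]) -> Iterator[None]:
    """Store the wall-clock seconds spent in the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed steps are still timed.
        sink[label] = time.perf_counter() - start


# Hypothetical stand-in for the body of a former `%%time` cell.
timings: dict[str, float] = {}
with timed("sum_of_squares", timings):
    total = sum(i * i for i in range(10_000))

print(f"sum_of_squares took {timings['sum_of_squares']:.6f}s")
```

Collecting timings in a dictionary makes it easy to merge them into the metrics dictionary that `main()` returns.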
152
+ ### Extract code into functions
153
+ - Each logical block should be a function
154
+ - Functions should have single responsibility
155
+ - Return values instead of relying on globals
156
+
157
+ ### Replace parameter cell with argparse
158
+ - Identify cells tagged `parameters`, or cells that only define variables
159
+ - Convert each parameter to `parser.add_argument()`
160
+ - Take each argument's `type` and `default` from the original value
161
+
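The conversion described above could look roughly like the following sketch. `NOTEBOOK_PARAMS` and `build_parser` are hypothetical names, and the parameter values are invented for illustration; the approach assumes every default is an `int`, `float`, or `str` (booleans need `action="store_true"` instead, because `bool("False")` is `True`).

```python
import argparse
from typing import Any

# Hypothetical values copied from a notebook's "parameters" cell.
NOTEBOOK_PARAMS: dict[str, Any] = {
    "learning_rate": 0.001,
    "epochs": 100,
    "data_path": "data/train.csv",
}


def build_parser(params: dict[str, Any]) -> argparse.ArgumentParser:
    """Create one --flag per notebook parameter, reusing its type and default."""
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    for name, default in params.items():
        flag = "--" + name.replace("_", "-")
        parser.add_argument(flag, type=type(default), default=default)
    return parser


# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser(NOTEBOOK_PARAMS).parse_args(["--epochs", "5"])
print(args.epochs, args.learning_rate, args.data_path)
```

argparse maps `--learning-rate` back to the attribute `args.learning_rate`, so code extracted from the notebook can keep its original variable names.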
162
+ ### Add docstrings and type hints
163
+ - Every function needs a docstring
164
+ - Use Google or NumPy docstring style consistently
165
+ - Add type hints to all arguments and return values
166
+
167
+ ### Set all random seeds explicitly
168
+ - Call `set_random_seeds()` at start of `main()`
169
+ - Pass seed through all library calls that accept it
170
+ - Document any randomness that cannot be seeded
171
+
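A minimal sketch of passing the seed through explicitly, using only the standard library: each function receives its own `random.Random` generator instead of relying on global state. The function names `sample_indices` and `run` are hypothetical.

```python
import random


def sample_indices(n_items: int, k: int, rng: random.Random) -> list[int]:
    """Draw k distinct indices using the caller-supplied generator."""
    return rng.sample(range(n_items), k)


def run(seed: int) -> list[int]:
    """Build a local generator from the seed and pass it down explicitly."""
    rng = random.Random(seed)  # independent of the global random state
    return sample_indices(100, 5, rng)


# The same seed reproduces the same draw on every call.
print(run(42) == run(42))
```

The same pattern applies to libraries that accept a seed or generator argument: create it once near `main()` and hand it to each call rather than reseeding globally.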
172
+ ### Write tests for core functions
173
+ - Create `tests/test_{{sanitized_experiment_name}}.py`
174
+ - Test pure functions with known inputs/outputs
175
+ - Test edge cases and error conditions
176
+
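For illustration, a test file following these points might resemble the sketch below. The function `f1_score` is a hypothetical pure function standing in for code extracted from the notebook; the tests use plain `assert` statements so they run under pytest or on their own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def test_known_value() -> None:
    # Equal precision and recall give the same value back.
    assert abs(f1_score(0.5, 0.5) - 0.5) < 1e-12


def test_zero_edge_case() -> None:
    # Guard against division by zero.
    assert f1_score(0.0, 0.0) == 0.0


def test_rejects_negative_input() -> None:
    try:
        f1_score(-0.1, 0.5)
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative precision")
```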
177
+ ---
178
+
179
+ *Template version: 1.0*
180
+ *Phase: 06-notebook-support*
@@ -0,0 +1,234 @@
1
+ # Iteration Summary Template
2
+
3
+ Template for `experiments/archive/YYYY-MM-DD_hypothesis_name/ITERATION_SUMMARY.md`.
4
+
5
+ Collapses all iteration attempts into a single summary document when archiving.
6
+
7
+ ---
8
+
9
+ ## File Template
10
+
11
+ ```markdown
12
+ # Iteration Summary: {{hypothesis_name}}
13
+
14
+ **Total Iterations:** {{N}}
15
+ **Date Range:** {{first_run_date}} to {{last_run_date}}
16
+ **Outcome:** Archived (hypothesis abandoned)
17
+
18
+ ## Iteration History
19
+
20
+ | # | Run | Date | Verdict | Confidence | Key Metric | Notes |
21
+ |---|-----|------|---------|------------|------------|-------|
22
+ | 1 | run_001_baseline | YYYY-MM-DD | REVISE_METHOD | MEDIUM | F1=0.72 | Initial attempt |
23
+ | 2 | run_002_tuned | YYYY-MM-DD | REVISE_METHOD | MEDIUM | F1=0.76 | Hyperparameter tuning |
24
+ | 3 | run_003_final | YYYY-MM-DD | ESCALATE | LOW | F1=0.78 | Limit reached |
25
+
26
+ ## Metric Trend
27
+
28
+ **Best achieved:** {{metric}}={{best_value}}
29
+ **Target:** {{threshold}}
30
+ **Gap:** {{best_value - threshold}}
31
+ **Trend:** {{improving|stagnant|degrading}}
32
+
33
+ ## Verdict Distribution
34
+
35
+ - PROCEED: {{count}}
36
+ - REVISE_METHOD: {{count}}
37
+ - REVISE_DATA: {{count}}
38
+ - ESCALATE: {{count}}
39
+
40
+ ## Key Observations
41
+
42
+ {{Summary of what was tried and why it didn't work}}
43
+
44
+ ## Preserved Artifacts
45
+
46
+ - Final run: {{run_NNN_description}}/
47
+ - All CRITIC_LOG.md files (merged or individual)
48
+ - Final SCORECARD.json
49
+
50
+ ---
51
+
52
+ *Summary generated on archive. See ARCHIVE_REASON.md for human rationale.*
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Usage Notes
58
+
59
+ **Field descriptions:**
60
+
61
+ - **hypothesis_name:** Human-readable name from OBJECTIVE.md "what" section
62
+ - **N:** Total iteration count (number of runs attempted)
63
+ - **first_run_date:** Timestamp from earliest run directory
64
+ - **last_run_date:** Timestamp from final run directory
65
+ - **Outcome:** Always "Archived (hypothesis abandoned)" for this template
66
+
67
+ **Iteration History table:**
68
+
69
+ Populate from all run directories in experiments/:
70
+ - **#:** Sequential iteration number (1, 2, 3...)
71
+ - **Run:** Run directory name (run_001_baseline, run_002_tuned, etc.)
72
+ - **Date:** Date extracted from CRITIC_LOG.md, or the run directory's timestamp
73
+ - **Verdict:** Critic verdict (PROCEED, REVISE_METHOD, REVISE_DATA, ESCALATE)
74
+ - **Confidence:** Critic confidence level (HIGH, MEDIUM, LOW)
75
+ - **Key Metric:** Primary metric value from SCORECARD.json (e.g., "F1=0.72")
76
+ - **Notes:** Brief description of what was tried (from run README.md or CRITIC_LOG.md)
77
+
78
+ **Metric Trend section:**
79
+
80
+ - **Best achieved:** Highest value attained across all iterations for primary metric
81
+ - **Target:** Threshold from OBJECTIVE.md for that metric
82
+ - **Gap:** Difference (best_value - threshold), show with sign
83
+ - **Trend:** Classify based on metric progression:
84
+   - "improving" if values consistently increase across iterations
85
+   - "stagnant" if values plateau or fluctuate without progress
86
+   - "degrading" if values decrease (rare but possible with overfitting)
87
+
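The gap and trend rules above can be sketched as follows. The `tolerance` parameter is an assumption (the template does not define how large a change counts as progress), and `signed_gap` and `classify_trend` are hypothetical helper names.

```python
def signed_gap(best: float, threshold: float) -> str:
    """Format best - threshold with an explicit sign, e.g. '-0.27'."""
    return f"{best - threshold:+.2f}"


def classify_trend(values: list[float], tolerance: float = 0.01) -> str:
    """Classify a metric series as improving, degrading, or stagnant.

    `tolerance` is an assumed minimum step size; smaller changes count as noise.
    """
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]
    if deltas and all(d > tolerance for d in deltas):
        return "improving"
    if deltas and all(d < -tolerance for d in deltas):
        return "degrading"
    return "stagnant"  # plateaus, mixed movement, or fewer than two runs


# F1 values taken from the populated example later in this template.
f1_by_run = [0.52, 0.56, 0.58, 0.57, 0.58]
print(signed_gap(max(f1_by_run), 0.85), classify_trend(f1_by_run))
```

This sketch assumes a metric where higher is better; for loss-like metrics the comparisons would need to be inverted.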
88
+ **Verdict Distribution:**
89
+
90
+ Count occurrences of each verdict type across all runs:
91
+ ```bash
92
+ # Example calculation
93
+ PROCEED=$(grep -h "^\*\*Verdict:\*\* PROCEED" experiments/run_*/CRITIC_LOG.md | wc -l)
94
+ REVISE_METHOD=$(grep -h "^\*\*Verdict:\*\* REVISE_METHOD" experiments/run_*/CRITIC_LOG.md | wc -l)
95
+ REVISE_DATA=$(grep -h "^\*\*Verdict:\*\* REVISE_DATA" experiments/run_*/CRITIC_LOG.md | wc -l)
96
+ ESCALATE=$(grep -h "^\*\*Verdict:\*\* ESCALATE" experiments/run_*/CRITIC_LOG.md | wc -l)
97
+ ```
98
+
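The same tally can be sketched in Python, which also guarantees that verdicts with zero occurrences are reported. `count_verdicts` is a hypothetical helper that takes the log contents as strings. Unlike the `grep` commands above, it counts only the first verdict line in each log.

```python
import re
from collections import Counter

VERDICTS = ("PROCEED", "REVISE_METHOD", "REVISE_DATA", "ESCALATE")
VERDICT_LINE = re.compile(r"^\*\*Verdict:\*\*\s+(\w+)", re.MULTILINE)


def count_verdicts(logs: list[str]) -> dict[str, int]:
    """Count the first verdict found in each CRITIC_LOG.md text."""
    counts: Counter = Counter()
    for text in logs:
        match = VERDICT_LINE.search(text)
        if match and match.group(1) in VERDICTS:
            counts[match.group(1)] += 1
    # Report every verdict type, including those that never occurred.
    return {verdict: counts[verdict] for verdict in VERDICTS}


# Hypothetical log contents standing in for experiments/run_*/CRITIC_LOG.md.
example_logs = [
    "# Critic Log\n**Verdict:** REVISE_METHOD\n**Confidence:** MEDIUM\n",
    "# Critic Log\n**Verdict:** REVISE_METHOD\n**Confidence:** LOW\n",
    "# Critic Log\n**Verdict:** ESCALATE\n**Confidence:** LOW\n",
]
print(count_verdicts(example_logs))
```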
99
+ **Key Observations section:**
100
+
101
+ Synthesize learnings from all iterations. Extract from:
102
+ - CRITIC_LOG.md strengths/weaknesses across runs
103
+ - Patterns in metric progression
104
+ - What approaches were attempted (architectures, hyperparameters, data transformations)
105
+ - Why none succeeded (common failure mode)
106
+
107
+ Examples:
108
+ - "All iterations struggled with class imbalance (99.2% negative). Resampling techniques (SMOTE, undersampling) did not improve recall."
109
+ - "Metric progression stagnated after run 2. Runs 3-5 showed no improvement despite hyperparameter changes, suggesting architectural limitation."
110
+ - "Data leakage warnings in DATA_REPORT.md were confirmed. Removing leaked features in run 4-5 caused significant performance drop, revealing hypothesis depended on invalid signal."
111
+
112
+ **Preserved Artifacts section:**
113
+
114
+ List what's kept in the archive:
115
+ - Final run directory (moved from experiments/)
116
+ - CRITIC_LOG.md files (individual or merged)
117
+ - Final SCORECARD.json
118
+ - ARCHIVE_REASON.md (user rationale)
119
+ - This ITERATION_SUMMARY.md
120
+
121
+ **Example populated template:**
122
+
123
+ ```markdown
124
+ # Iteration Summary: Ensemble Methods for Fraud Detection
125
+
126
+ **Total Iterations:** 5
127
+ **Date Range:** 2026-01-15 to 2026-01-30
128
+ **Outcome:** Archived (hypothesis abandoned)
129
+
130
+ ## Iteration History
131
+
132
+ | # | Run | Date | Verdict | Confidence | Key Metric | Notes |
133
+ |---|-----|------|---------|------------|------------|-------|
134
+ | 1 | run_001_baseline | 2026-01-15 | REVISE_METHOD | MEDIUM | F1=0.52 | Initial ensemble (RF+GBM) |
135
+ | 2 | run_002_smote | 2026-01-18 | REVISE_METHOD | MEDIUM | F1=0.56 | Added SMOTE resampling |
136
+ | 3 | run_003_weighted | 2026-01-22 | REVISE_METHOD | MEDIUM | F1=0.58 | Class weight optimization |
137
+ | 4 | run_004_reduced | 2026-01-25 | REVISE_METHOD | LOW | F1=0.57 | Feature reduction (237→15) |
138
+ | 5 | run_005_final | 2026-01-30 | ESCALATE | LOW | F1=0.58 | Limit reached |
139
+
140
+ ## Metric Trend
141
+
142
+ **Best achieved:** F1=0.58
143
+ **Target:** 0.85
144
+ **Gap:** -0.27
145
+ **Trend:** stagnant (plateaued at 0.56-0.58 after run 2)
146
+
147
+ ## Verdict Distribution
148
+
149
+ - PROCEED: 0
150
+ - REVISE_METHOD: 4
151
+ - REVISE_DATA: 0
152
+ - ESCALATE: 1
153
+
154
+ ## Key Observations
155
+
156
+ All five iterations struggled with severe class imbalance (99.2% negative class, N=120 positive examples). Despite trying multiple approaches (SMOTE resampling, class weighting, feature reduction), F1 score plateaued at 0.56-0.58, far below the target of 0.85.
157
+
158
+ **What was tried:**
159
+ - Run 1: Baseline ensemble (Random Forest + Gradient Boosting)
160
+ - Run 2: SMOTE oversampling to balance classes
161
+ - Run 3: Class weight optimization (weighted loss functions)
162
+ - Run 4: Feature reduction (237→15 features) to reduce overfitting
163
+ - Run 5: Combined approach with focal loss
164
+
165
+ **Common failure mode:**
166
+ All iterations achieved high precision (>0.90) but poor recall (<0.42), indicating the model learned to be conservative due to extreme class imbalance. Resampling techniques introduced artificial patterns that didn't generalize. The fundamental issue is insufficient positive examples (N=120) for the hypothesis to be testable with ensemble methods.
167
+
168
+ **Critic's repeated concerns:**
169
+ - "Limited positive examples prevent ensemble diversity"
170
+ - "High precision but poor recall suggests severe class imbalance"
171
+ - "Feature space too large relative to positive sample size"
172
+
173
+ ## Preserved Artifacts
174
+
175
+ - Final run: run_005_final/ (complete snapshot with DECISION.md)
176
+ - All CRITIC_LOG.md files (archived individually in final_run/)
177
+ - Final SCORECARD.json (F1=0.58, composite=0.64)
178
+ - ARCHIVE_REASON.md (user rationale for abandonment)
179
+ - This ITERATION_SUMMARY.md
180
+
181
+ ---
182
+
183
+ *Summary generated on archive. See ARCHIVE_REASON.md for human rationale.*
184
+ ```
185
+
186
+ ---
187
+
188
+ ## Integration
189
+
190
+ This template is used by the `/grd:evaluate` command in Phase 5 (Archive Handling) when the user selects the "Archive" decision.
191
+
192
+ **Inputs:**
193
+ - All run directories in experiments/ (run_001, run_002, ...)
194
+ - CRITIC_LOG.md from each run (verdict, confidence, recommendations)
195
+ - SCORECARD.json from each run (metrics, composite score)
196
+ - OBJECTIVE.md (hypothesis, target thresholds)
197
+
198
+ **Generation logic:**
199
+
200
+ ```bash
201
+ # Collect all runs
202
+ RUNS=$(ls -1d experiments/run_* | sort)
203
+ TOTAL=$(echo "$RUNS" | wc -l | tr -d ' ')
204
+
205
+ # Extract date range (BSD/macOS stat syntax; on GNU/Linux use: stat -c %y "$dir" | cut -d' ' -f1)
206
+ FIRST_DATE=$(stat -f "%Sm" -t "%Y-%m-%d" $(echo "$RUNS" | head -1))
207
+ LAST_DATE=$(stat -f "%Sm" -t "%Y-%m-%d" $(echo "$RUNS" | tail -1))
208
+
209
+ # Build iteration history table
210
+ i=0; for run in $RUNS; do
211
+   VERDICT=$(grep "^\*\*Verdict:\*\*" "$run/CRITIC_LOG.md" | head -1 | awk '{print $2}')
212
+   CONFIDENCE=$(grep "^\*\*Confidence:\*\*" "$run/CRITIC_LOG.md" | head -1 | awk '{print $2}')
213
+   METRIC=$(jq -r '.metrics[0] | "\(.name)=\(.value)"' "$run/metrics/SCORECARD.json")
214
+   NOTES=$(head -1 "$run/README.md" | sed 's/^# //')
215
+   echo "| $((i += 1)) | $(basename "$run") | $(stat -f "%Sm" -t "%Y-%m-%d" "$run") | $VERDICT | $CONFIDENCE | $METRIC | $NOTES |"
216
+ done
217
+
218
+ # Calculate metric trend
219
+ BEST_METRIC=$(jq -s 'map(.metrics[0].value) | max' experiments/run_*/metrics/SCORECARD.json)
220
+ TARGET=$(jq -r '.metrics[0].threshold' .planning/OBJECTIVE.md)  # jq reads JSON only; adjust if OBJECTIVE.md is Markdown
221
+ GAP=$(echo "$BEST_METRIC - $TARGET" | bc)
222
+
223
+ # Determine trend (simplified)
224
+ if awk -v gap="$GAP" 'BEGIN { exit !(gap < -0.10) }'; then
225
+   TREND="stagnant (far from target)"
226
+ else
227
+   TREND="improving (but insufficient)"
228
+ fi
229
+ ```
230
+
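As a rough illustration of the table-building step, the sketch below renders the Iteration History table from already-parsed run data. `RunRecord` and `history_table` are hypothetical names; reading the values out of CRITIC_LOG.md and SCORECARD.json is assumed to happen elsewhere.

```python
from dataclasses import dataclass

HEADER = (
    "| # | Run | Date | Verdict | Confidence | Key Metric | Notes |\n"
    "|---|-----|------|---------|------------|------------|-------|"
)


@dataclass
class RunRecord:
    """Values already extracted for one run directory (hypothetical shape)."""
    name: str
    date: str
    verdict: str
    confidence: str
    metric: str
    notes: str


def history_table(runs: list[RunRecord]) -> str:
    """Render the table with rows numbered in run-name order."""
    rows = [HEADER]
    for i, run in enumerate(sorted(runs, key=lambda r: r.name), start=1):
        rows.append(
            f"| {i} | {run.name} | {run.date} | {run.verdict} | "
            f"{run.confidence} | {run.metric} | {run.notes} |"
        )
    return "\n".join(rows)


# Sample rows reuse values from the populated example in this template.
runs = [
    RunRecord("run_002_smote", "2026-01-18", "REVISE_METHOD", "MEDIUM", "F1=0.56", "Added SMOTE resampling"),
    RunRecord("run_001_baseline", "2026-01-15", "REVISE_METHOD", "MEDIUM", "F1=0.52", "Initial ensemble (RF+GBM)"),
]
print(history_table(runs))
```

Sorting by run name relies on the zero-padded `run_NNN_` prefix, so lexical order matches iteration order.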
231
+ **Output:**
232
+ - experiments/archive/YYYY-MM-DD_hypothesis_slug/ITERATION_SUMMARY.md
233
+ - Referenced by ARCHIVE_REASON.md in same directory
234
+ - Provides historical context for negative result
@@ -0,0 +1,123 @@
1
+ # Milestone Archive Template
2
+
3
+ This template is used by the complete-milestone workflow to create archive files in `.planning/milestones/`.
4
+
5
+ ---
6
+
7
+ ## File Template
8
+
9
+ # Milestone v{{VERSION}}: {{MILESTONE_NAME}}
10
+
11
+ **Status:** ✅ SHIPPED {{DATE}}
12
+ **Phases:** {{PHASE_START}}-{{PHASE_END}}
13
+ **Total Plans:** {{TOTAL_PLANS}}
14
+
15
+ ## Overview
16
+
17
+ {{MILESTONE_DESCRIPTION}}
18
+
19
+ ## Phases
20
+
21
+ {{PHASES_SECTION}}
22
+
23
+ [For each phase in this milestone, include:]
24
+
25
+ ### Phase {{PHASE_NUM}}: {{PHASE_NAME}}
26
+
27
+ **Goal**: {{PHASE_GOAL}}
28
+ **Depends on**: {{DEPENDS_ON}}
29
+ **Plans**: {{PLAN_COUNT}} plans
30
+
31
+ Plans:
32
+
33
+ - [x] {{PHASE}}-01: {{PLAN_DESCRIPTION}}
34
+ - [x] {{PHASE}}-02: {{PLAN_DESCRIPTION}}
35
+ [... all plans ...]
36
+
37
+ **Details:**
38
+ {{PHASE_DETAILS_FROM_ROADMAP}}
39
+
40
+ **For decimal phases, include (INSERTED) marker:**
41
+
42
+ ### Phase 2.1: Critical Security Patch (INSERTED)
43
+
44
+ **Goal**: Fix authentication bypass vulnerability
45
+ **Depends on**: Phase 2
46
+ **Plans**: 1 plan
47
+
48
+ Plans:
49
+
50
+ - [x] 02.1-01: Patch auth vulnerability
51
+
52
+ **Details:**
53
+ {{PHASE_DETAILS_FROM_ROADMAP}}
54
+
55
+ ---
56
+
57
+ ## Milestone Summary
58
+
59
+ **Decimal Phases:**
60
+
61
+ - Phase 2.1: Critical Security Patch (inserted after Phase 2 for urgent fix)
62
+ - Phase 5.1: Performance Hotfix (inserted after Phase 5 for production issue)
63
+
64
+ **Key Decisions:**
65
+ {{DECISIONS_FROM_PROJECT_STATE}}
66
+ [Example:]
67
+
68
+ - Decision: Use ROADMAP.md split (Rationale: Constant context cost)
69
+ - Decision: Decimal phase numbering (Rationale: Clear insertion semantics)
70
+
71
+ **Issues Resolved:**
72
+ {{ISSUES_RESOLVED_DURING_MILESTONE}}
73
+ [Example:]
74
+
75
+ - Fixed context overflow at 100+ phases
76
+ - Resolved phase insertion confusion
77
+
78
+ **Issues Deferred:**
79
+ {{ISSUES_DEFERRED_TO_LATER}}
80
+ [Example:]
81
+
82
+ - PROJECT-STATE.md tiering (deferred until decisions > 300)
83
+
84
+ **Technical Debt Incurred:**
85
+ {{SHORTCUTS_NEEDING_FUTURE_WORK}}
86
+ [Example:]
87
+
88
+ - Some workflows still have hardcoded paths (fix in Phase 5)
89
+
90
+ ---
91
+
92
+ _For current project status, see .planning/ROADMAP.md_
93
+
94
+ ---
95
+
96
+ ## Usage Guidelines
97
+
98
+ <guidelines>
99
+ **When to create milestone archives:**
100
+ - After completing all phases in a milestone (v1.0, v1.1, v2.0, etc.)
101
+ - Triggered by complete-milestone workflow
102
+ - Before planning next milestone work
103
+
104
+ **How to fill template:**
105
+
106
+ - Replace {{PLACEHOLDERS}} with actual values
107
+ - Extract phase details from ROADMAP.md
108
+ - Document decimal phases with (INSERTED) marker
109
+ - Include key decisions from PROJECT-STATE.md or SUMMARY files
110
+ - List issues resolved vs deferred
111
+ - Capture technical debt for future reference
112
+
113
+ **Archive location:**
114
+
115
+ - Save to `.planning/milestones/v{VERSION}-{NAME}.md`
116
+ - Example: `.planning/milestones/v1.0-mvp.md`
117
+
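One possible way to derive the archive filename is sketched below. The slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption inferred from the `v1.0-mvp.md` example, and `archive_path` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath


def archive_path(version: str, name: str) -> PurePosixPath:
    """Build .planning/milestones/v{VERSION}-{NAME}.md with a slugified name."""
    # Assumed slug rule: lowercase, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return PurePosixPath(".planning/milestones") / f"v{version}-{slug}.md"


print(archive_path("1.0", "MVP"))
print(archive_path("1.1", "Security & Polish"))
```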
118
+ **After archiving:**
119
+
120
+ - Update ROADMAP.md to collapse completed milestone in `<details>` tag
121
+ - Update PROJECT.md to brownfield format with Current State section
122
+ - Continue phase numbering in next milestone (never restart at 01)
123
+ </guidelines>
@@ -0,0 +1,115 @@
1
+ # Milestone Entry Template
2
+
3
+ Add this entry to `.planning/MILESTONES.md` when completing a milestone:
4
+
5
+ ```markdown
6
+ ## v[X.Y] [Name] (Shipped: YYYY-MM-DD)
7
+
8
+ **Delivered:** [One sentence describing what shipped]
9
+
10
+ **Phases completed:** [X-Y] ([Z] plans total)
11
+
12
+ **Key accomplishments:**
13
+ - [Major achievement 1]
14
+ - [Major achievement 2]
15
+ - [Major achievement 3]
16
+ - [Major achievement 4]
17
+
18
+ **Stats:**
19
+ - [X] files created/modified
20
+ - [Y] lines of code (primary language)
21
+ - [Z] phases, [N] plans, [M] tasks
22
+ - [D] days from start to ship (or milestone to milestone)
23
+
24
+ **Git range:** `feat(XX-XX)` → `feat(YY-YY)`
25
+
26
+ **What's next:** [Brief description of next milestone goals, or "Project complete"]
27
+
28
+ ---
29
+ ```
30
+
31
+ <structure>
32
+ If MILESTONES.md doesn't exist, create it with header:
33
+
34
+ ```markdown
35
+ # Project Milestones: [Project Name]
36
+
37
+ [Entries in reverse chronological order - newest first]
38
+ ```
39
+ </structure>
40
+
41
+ <guidelines>
42
+ **When to create milestones:**
43
+ - Initial v1.0 MVP shipped
44
+ - Major version releases (v2.0, v3.0)
45
+ - Significant feature milestones (v1.1, v1.2)
46
+ - Before archiving planning (capture what was shipped)
47
+
48
+ **Don't create milestones for:**
49
+ - Individual phase completions (normal workflow)
50
+ - Work in progress (wait until shipped)
51
+ - Minor bug fixes that don't constitute a release
52
+
53
+ **Stats to include:**
54
+ - Count modified files: `git diff --stat <first-hash>..<last-hash> | tail -1` (hashes of the `feat(XX-XX)` and `feat(YY-YY)` commits)
55
+ - Count LOC: `find . -name "*.swift" -o -name "*.ts" | xargs wc -l` (or relevant extension)
56
+ - Phase/plan/task counts from ROADMAP
57
+ - Timeline from first phase commit to last phase commit
58
+
59
+ **Git range format:**
60
+ - First commit of milestone → last commit of milestone
61
+ - Example: `feat(01-01)` → `feat(04-01)` for phases 1-4
62
+ </guidelines>
63
+
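The timeline stat can be computed from the two commit dates, as in this small sketch. `days_to_ship` is a hypothetical helper, and the first-commit date below is invented for illustration; only the ship date appears in the example that follows.

```python
from datetime import date


def days_to_ship(first_commit: date, last_commit: date) -> int:
    """Whole days between the first and last phase commits of a milestone."""
    if last_commit < first_commit:
        raise ValueError("last commit precedes first commit")
    return (last_commit - first_commit).days


# Hypothetical first-commit date paired with a 2025-11-25 ship date.
print(days_to_ship(date(2025, 11, 13), date(2025, 11, 25)))
```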
64
+ <example>
65
+ ```markdown
66
+ # Project Milestones: WeatherBar
67
+
68
+ ## v1.1 Security & Polish (Shipped: 2025-12-10)
69
+
70
+ **Delivered:** Security hardening with Keychain integration and comprehensive error handling
71
+
72
+ **Phases completed:** 5-6 (3 plans total)
73
+
74
+ **Key accomplishments:**
75
+ - Migrated API key storage from plaintext to macOS Keychain
76
+ - Implemented comprehensive error handling for network failures
77
+ - Added Sentry crash reporting integration
78
+ - Fixed memory leak in auto-refresh timer
79
+
80
+ **Stats:**
81
+ - 23 files modified
82
+ - 650 lines of Swift added
83
+ - 2 phases, 3 plans, 12 tasks
84
+ - 8 days from v1.0 to v1.1
85
+
86
+ **Git range:** `feat(05-01)` → `feat(06-02)`
87
+
88
+ **What's next:** v2.0 SwiftUI redesign with widget support
89
+
90
+ ---
91
+
92
+ ## v1.0 MVP (Shipped: 2025-11-25)
93
+
94
+ **Delivered:** Menu bar weather app with current conditions and 3-day forecast
95
+
96
+ **Phases completed:** 1-4 (7 plans total)
97
+
98
+ **Key accomplishments:**
99
+ - Menu bar app with popover UI (AppKit)
100
+ - OpenWeather API integration with auto-refresh
101
+ - Current weather display with conditions icon
102
+ - 3-day forecast list with high/low temperatures
103
+ - Code signed and notarized for distribution
104
+
105
+ **Stats:**
106
+ - 47 files created
107
+ - 2,450 lines of Swift
108
+ - 4 phases, 7 plans, 28 tasks
109
+ - 12 days from start to ship
110
+
111
+ **Git range:** `feat(01-01)` → `feat(04-01)`
112
+
113
+ **What's next:** Security audit and hardening for v1.1
114
+ ```
115
+ </example>