get-research-done 1.1.0

Files changed (127)
  1. package/LICENSE +21 -0
  2. package/README.md +560 -0
  3. package/agents/grd-architect.md +789 -0
  4. package/agents/grd-codebase-mapper.md +738 -0
  5. package/agents/grd-critic.md +1065 -0
  6. package/agents/grd-debugger.md +1203 -0
  7. package/agents/grd-evaluator.md +948 -0
  8. package/agents/grd-executor.md +784 -0
  9. package/agents/grd-explorer.md +2063 -0
  10. package/agents/grd-graduator.md +484 -0
  11. package/agents/grd-integration-checker.md +423 -0
  12. package/agents/grd-phase-researcher.md +641 -0
  13. package/agents/grd-plan-checker.md +745 -0
  14. package/agents/grd-planner.md +1386 -0
  15. package/agents/grd-project-researcher.md +865 -0
  16. package/agents/grd-research-synthesizer.md +256 -0
  17. package/agents/grd-researcher.md +2361 -0
  18. package/agents/grd-roadmapper.md +605 -0
  19. package/agents/grd-verifier.md +778 -0
  20. package/bin/install.js +1294 -0
  21. package/commands/grd/add-phase.md +207 -0
  22. package/commands/grd/add-todo.md +193 -0
  23. package/commands/grd/architect.md +283 -0
  24. package/commands/grd/audit-milestone.md +277 -0
  25. package/commands/grd/check-todos.md +228 -0
  26. package/commands/grd/complete-milestone.md +136 -0
  27. package/commands/grd/debug.md +169 -0
  28. package/commands/grd/discuss-phase.md +86 -0
  29. package/commands/grd/evaluate.md +1095 -0
  30. package/commands/grd/execute-phase.md +339 -0
  31. package/commands/grd/explore.md +258 -0
  32. package/commands/grd/graduate.md +323 -0
  33. package/commands/grd/help.md +482 -0
  34. package/commands/grd/insert-phase.md +227 -0
  35. package/commands/grd/insights.md +231 -0
  36. package/commands/grd/join-discord.md +18 -0
  37. package/commands/grd/list-phase-assumptions.md +50 -0
  38. package/commands/grd/map-codebase.md +71 -0
  39. package/commands/grd/new-milestone.md +721 -0
  40. package/commands/grd/new-project.md +1008 -0
  41. package/commands/grd/pause-work.md +134 -0
  42. package/commands/grd/plan-milestone-gaps.md +295 -0
  43. package/commands/grd/plan-phase.md +525 -0
  44. package/commands/grd/progress.md +364 -0
  45. package/commands/grd/quick-explore.md +236 -0
  46. package/commands/grd/quick.md +309 -0
  47. package/commands/grd/remove-phase.md +349 -0
  48. package/commands/grd/research-phase.md +200 -0
  49. package/commands/grd/research.md +681 -0
  50. package/commands/grd/resume-work.md +40 -0
  51. package/commands/grd/set-profile.md +106 -0
  52. package/commands/grd/settings.md +136 -0
  53. package/commands/grd/update.md +172 -0
  54. package/commands/grd/verify-work.md +219 -0
  55. package/get-research-done/config/default.json +15 -0
  56. package/get-research-done/references/checkpoints.md +1078 -0
  57. package/get-research-done/references/continuation-format.md +249 -0
  58. package/get-research-done/references/git-integration.md +254 -0
  59. package/get-research-done/references/model-profiles.md +73 -0
  60. package/get-research-done/references/planning-config.md +94 -0
  61. package/get-research-done/references/questioning.md +141 -0
  62. package/get-research-done/references/tdd.md +263 -0
  63. package/get-research-done/references/ui-brand.md +160 -0
  64. package/get-research-done/references/verification-patterns.md +612 -0
  65. package/get-research-done/templates/DEBUG.md +159 -0
  66. package/get-research-done/templates/UAT.md +247 -0
  67. package/get-research-done/templates/archive-reason.md +195 -0
  68. package/get-research-done/templates/codebase/architecture.md +255 -0
  69. package/get-research-done/templates/codebase/concerns.md +310 -0
  70. package/get-research-done/templates/codebase/conventions.md +307 -0
  71. package/get-research-done/templates/codebase/integrations.md +280 -0
  72. package/get-research-done/templates/codebase/stack.md +186 -0
  73. package/get-research-done/templates/codebase/structure.md +285 -0
  74. package/get-research-done/templates/codebase/testing.md +480 -0
  75. package/get-research-done/templates/config.json +35 -0
  76. package/get-research-done/templates/context.md +283 -0
  77. package/get-research-done/templates/continue-here.md +78 -0
  78. package/get-research-done/templates/critic-log.md +288 -0
  79. package/get-research-done/templates/data-report.md +173 -0
  80. package/get-research-done/templates/debug-subagent-prompt.md +91 -0
  81. package/get-research-done/templates/decision-log.md +58 -0
  82. package/get-research-done/templates/decision.md +138 -0
  83. package/get-research-done/templates/discovery.md +146 -0
  84. package/get-research-done/templates/experiment-readme.md +104 -0
  85. package/get-research-done/templates/graduated-script.md +180 -0
  86. package/get-research-done/templates/iteration-summary.md +234 -0
  87. package/get-research-done/templates/milestone-archive.md +123 -0
  88. package/get-research-done/templates/milestone.md +115 -0
  89. package/get-research-done/templates/objective.md +271 -0
  90. package/get-research-done/templates/phase-prompt.md +567 -0
  91. package/get-research-done/templates/planner-subagent-prompt.md +117 -0
  92. package/get-research-done/templates/project.md +184 -0
  93. package/get-research-done/templates/requirements.md +231 -0
  94. package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
  95. package/get-research-done/templates/research-project/FEATURES.md +147 -0
  96. package/get-research-done/templates/research-project/PITFALLS.md +200 -0
  97. package/get-research-done/templates/research-project/STACK.md +120 -0
  98. package/get-research-done/templates/research-project/SUMMARY.md +170 -0
  99. package/get-research-done/templates/research.md +529 -0
  100. package/get-research-done/templates/roadmap.md +202 -0
  101. package/get-research-done/templates/scorecard.json +113 -0
  102. package/get-research-done/templates/state.md +287 -0
  103. package/get-research-done/templates/summary.md +246 -0
  104. package/get-research-done/templates/user-setup.md +311 -0
  105. package/get-research-done/templates/verification-report.md +322 -0
  106. package/get-research-done/workflows/complete-milestone.md +756 -0
  107. package/get-research-done/workflows/diagnose-issues.md +231 -0
  108. package/get-research-done/workflows/discovery-phase.md +289 -0
  109. package/get-research-done/workflows/discuss-phase.md +433 -0
  110. package/get-research-done/workflows/execute-phase.md +657 -0
  111. package/get-research-done/workflows/execute-plan.md +1844 -0
  112. package/get-research-done/workflows/list-phase-assumptions.md +178 -0
  113. package/get-research-done/workflows/map-codebase.md +322 -0
  114. package/get-research-done/workflows/resume-project.md +307 -0
  115. package/get-research-done/workflows/transition.md +556 -0
  116. package/get-research-done/workflows/verify-phase.md +628 -0
  117. package/get-research-done/workflows/verify-work.md +596 -0
  118. package/hooks/dist/grd-check-update.js +61 -0
  119. package/hooks/dist/grd-statusline.js +84 -0
  120. package/package.json +47 -0
  121. package/scripts/audit-help-commands.sh +115 -0
  122. package/scripts/build-hooks.js +42 -0
  123. package/scripts/verify-all-commands.sh +246 -0
  124. package/scripts/verify-architect-warning.sh +35 -0
  125. package/scripts/verify-insights-mode.sh +40 -0
  126. package/scripts/verify-quick-mode.sh +20 -0
  127. package/scripts/verify-revise-data-routing.sh +139 -0
package/get-research-done/templates/DEBUG.md
@@ -0,0 +1,159 @@
+ # Debug Template
+
+ Template for `.planning/debug/[slug].md`: active debug session tracking.
+
+ ---
+
+ ## File Template
+
+ ```markdown
+ ---
+ status: gathering | investigating | fixing | verifying | resolved
+ trigger: "[verbatim user input]"
+ created: [ISO timestamp]
+ updated: [ISO timestamp]
+ ---
+
+ ## Current Focus
+ <!-- OVERWRITE on each update - always reflects NOW -->
+
+ hypothesis: [current theory being tested]
+ test: [how it is being tested]
+ expecting: [what the result means if true/false]
+ next_action: [immediate next step]
+
+ ## Symptoms
+ <!-- Written during gathering, then immutable -->
+
+ expected: [what should happen]
+ actual: [what actually happens]
+ errors: [error messages, if any]
+ reproduction: [how to trigger]
+ started: [when it broke / always broken]
+
+ ## Eliminated
+ <!-- APPEND only - prevents re-investigating after /clear -->
+
+ - hypothesis: [theory that was wrong]
+   evidence: [what disproved it]
+   timestamp: [when eliminated]
+
+ ## Evidence
+ <!-- APPEND only - facts discovered during investigation -->
+
+ - timestamp: [when found]
+   checked: [what was examined]
+   found: [what was observed]
+   implication: [what this means]
+
+ ## Resolution
+ <!-- OVERWRITE as understanding evolves -->
+
+ root_cause: [empty until found]
+ fix: [empty until applied]
+ verification: [empty until verified]
+ files_changed: []
+ ```
+
+ ---
+
+ <section_rules>
+
+ **Frontmatter (status, trigger, timestamps):**
+ - `status`: OVERWRITE - reflects current phase
+ - `trigger`: IMMUTABLE - verbatim user input, never changes
+ - `created`: IMMUTABLE - set once
+ - `updated`: OVERWRITE - update on every change
+
+ **Current Focus:**
+ - OVERWRITE entirely on each update
+ - Always reflects what Claude is doing RIGHT NOW
+ - If Claude reads this after /clear, it knows exactly where to resume
+ - Fields: hypothesis, test, expecting, next_action
+
+ **Symptoms:**
+ - Written during initial gathering phase
+ - IMMUTABLE once gathering is complete
+ - Reference point for what we're trying to fix
+ - Fields: expected, actual, errors, reproduction, started
+
+ **Eliminated:**
+ - APPEND only - never remove entries
+ - Prevents re-investigating dead ends after context reset
+ - Each entry: hypothesis, evidence that disproved it, timestamp
+ - Critical for efficiency across /clear boundaries
+
+ **Evidence:**
+ - APPEND only - never remove entries
+ - Facts discovered during investigation
+ - Each entry: timestamp, what was checked, what was found, implication
+ - Builds the case for root cause
+
+ **Resolution:**
+ - OVERWRITE as understanding evolves
+ - May update multiple times as fixes are tried
+ - Final state shows confirmed root cause and verified fix
+ - Fields: root_cause, fix, verification, files_changed
+
+ </section_rules>
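The section rules reduce to three update modes (OVERWRITE, IMMUTABLE, APPEND). A minimal Python sketch, assuming the session is held in memory as a dict keyed by section name; `RULES` and `apply_update` are hypothetical names, not part of the package:

```python
from datetime import datetime, timezone

# Update mode per field or section, following the section rules.
RULES = {
    "status": "overwrite",
    "trigger": "immutable",
    "created": "immutable",
    "updated": "overwrite",
    "current_focus": "overwrite",
    "symptoms": "immutable_after_gathering",
    "eliminated": "append",
    "evidence": "append",
    "resolution": "overwrite",
}


def apply_update(session: dict, field: str, value) -> dict:
    """Apply one update to an in-memory debug session, enforcing the section rules."""
    mode = RULES[field]  # KeyError for fields the template does not define
    if mode == "immutable" and field in session:
        raise ValueError(f"{field} is immutable once set")
    if mode == "immutable_after_gathering" and session.get("status") != "gathering":
        raise ValueError(f"{field} is immutable after symptom gathering")
    if mode == "append":
        # Append-only sections only grow; existing entries are never removed.
        session.setdefault(field, []).append(value)
    else:
        session[field] = value
    # Every change refreshes the `updated` timestamp (UTC, ISO 8601).
    session["updated"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return session


# Usage: create a session, then log one finding.
session = {}
apply_update(session, "trigger", "login button does nothing")
apply_update(session, "status", "gathering")
apply_update(session, "evidence", {"checked": "console", "found": "TypeError on click"})
```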
+
+ <lifecycle>
+
+ **Creation:** Immediately when /grd:debug is called
+ - Create file with trigger from user input
+ - Set status to "gathering"
+ - Current Focus: next_action = "gather symptoms"
+ - Symptoms: empty, to be filled
+
+ **During symptom gathering:**
+ - Update Symptoms section as the user answers questions
+ - Update Current Focus with each question
+ - When complete: status → "investigating"
+
+ **During investigation:**
+ - OVERWRITE Current Focus with each hypothesis
+ - APPEND to Evidence with each finding
+ - APPEND to Eliminated when a hypothesis is disproved
+ - Update the `updated` timestamp in frontmatter
+
+ **During fixing:**
+ - status → "fixing"
+ - Update Resolution.root_cause when confirmed
+ - Update Resolution.fix when applied
+ - Update Resolution.files_changed
+
+ **During verification:**
+ - status → "verifying"
+ - Update Resolution.verification with results
+ - If verification fails: status → "investigating", try again
+
+ **On resolution:**
+ - status → "resolved"
+ - Move file to .planning/debug/resolved/
+
+ </lifecycle>
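The lifecycle implies a small state machine. This sketch encodes only the transitions named above, including the `verifying` → `investigating` loop on a failed verification; `TRANSITIONS`, `transition`, and `resolved_path` are hypothetical helpers, and whether the real command permits any other moves is not specified here:

```python
# Status transitions named in the lifecycle; anything else is rejected.
TRANSITIONS = {
    "gathering": {"investigating"},
    "investigating": {"fixing"},
    "fixing": {"verifying"},
    "verifying": {"resolved", "investigating"},  # failed verification loops back
    "resolved": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status change: {current!r} -> {new!r}")
    return new


def resolved_path(slug: str) -> str:
    """Where a resolved session file is moved to."""
    return f".planning/debug/resolved/{slug}.md"
```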
+
+ <resume_behavior>
+
+ When Claude reads this file after /clear:
+
+ 1. Parse frontmatter → know status
+ 2. Read Current Focus → know exactly what was happening
+ 3. Read Eliminated → know what NOT to retry
+ 4. Read Evidence → know what's been learned
+ 5. Continue from next_action
+
+ The file IS the debugging brain. Claude should be able to resume perfectly from any interruption point.
+
+ </resume_behavior>
+
+ <size_constraint>
+
+ Keep debug files focused:
+ - Evidence entries: 1-2 lines each, just the facts
+ - Eliminated: brief - hypothesis + why it failed
+ - No narrative prose - structured data only
+
+ If evidence grows very large (10+ entries), consider whether you're going in circles. Check Eliminated to ensure you're not re-treading.
+
+ </size_constraint>
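A rough check for the going-in-circles warning, assuming entries are top-level `- ` list items as in the template. `section_lines` and `circling_warnings` are hypothetical, and the exact-text comparison against Eliminated is deliberately naive (a reworded hypothesis would slip past it):

```python
def section_lines(text: str, section: str) -> list[str]:
    """Lines under a '## <section>' heading, up to the next '## ' heading."""
    out, inside = [], False
    for line in text.splitlines():
        if line.startswith("## "):
            inside = line.strip() == f"## {section}"
        elif inside:
            out.append(line)
    return out


def circling_warnings(text: str, current_hypothesis: str, threshold: int = 10) -> list[str]:
    """Flag the two going-in-circles signals from the size constraint."""
    warnings = []
    # Each Evidence entry is one top-level "- " item; continuation lines are indented.
    evidence = [line for line in section_lines(text, "Evidence") if line.startswith("- ")]
    if len(evidence) >= threshold:
        warnings.append(f"{len(evidence)} evidence entries: check whether you are going in circles")
    eliminated = {
        line.split(":", 1)[1].strip().lower()
        for line in section_lines(text, "Eliminated")
        if line.startswith("- hypothesis:")
    }
    if current_hypothesis.strip().lower() in eliminated:
        warnings.append("current hypothesis was already eliminated")
    return warnings
```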
package/get-research-done/templates/UAT.md
@@ -0,0 +1,247 @@
+ # UAT Template
+
+ Template for `.planning/phases/XX-name/{phase}-UAT.md`: persistent UAT session tracking.
+
+ ---
+
+ ## File Template
+
+ ```markdown
+ ---
+ status: testing | complete | diagnosed
+ phase: XX-name
+ source: [list of SUMMARY.md files tested]
+ started: [ISO timestamp]
+ updated: [ISO timestamp]
+ ---
+
+ ## Current Test
+ <!-- OVERWRITE each test - shows where we are -->
+
+ number: [N]
+ name: [test name]
+ expected: |
+   [what user should observe]
+ awaiting: user response
+
+ ## Tests
+
+ ### 1. [Test Name]
+ expected: [observable behavior - what user should see]
+ result: [pending]
+
+ ### 2. [Test Name]
+ expected: [observable behavior]
+ result: pass
+
+ ### 3. [Test Name]
+ expected: [observable behavior]
+ result: issue
+ reported: "[verbatim user response]"
+ severity: major
+
+ ### 4. [Test Name]
+ expected: [observable behavior]
+ result: skipped
+ reason: [why skipped]
+
+ ...
+
+ ## Summary
+
+ total: [N]
+ passed: [N]
+ issues: [N]
+ pending: [N]
+ skipped: [N]
+
+ ## Gaps
+
+ <!-- YAML format for plan-phase --gaps consumption -->
+ - truth: "[expected behavior from test]"
+   status: failed
+   reason: "User reported: [verbatim response]"
+   severity: blocker | major | minor | cosmetic
+   test: [N]
+   root_cause: "" # Filled by diagnosis
+   artifacts: [] # Filled by diagnosis
+   missing: [] # Filled by diagnosis
+   debug_session: "" # Filled by diagnosis
+ ```
+
+ ---
+
+ <section_rules>
+
+ **Frontmatter:**
+ - `status`: OVERWRITE - "testing", "complete", or "diagnosed" (after gap diagnosis)
+ - `phase`: IMMUTABLE - set on creation
+ - `source`: IMMUTABLE - SUMMARY files being tested
+ - `started`: IMMUTABLE - set on creation
+ - `updated`: OVERWRITE - update on every change
+
+ **Current Test:**
+ - OVERWRITE entirely on each test transition
+ - Shows which test is active and what's awaited
+ - On completion: "[testing complete]"
+
+ **Tests:**
+ - Each test: OVERWRITE the result field when the user responds
+ - `result` values: [pending], pass, issue, skipped
+ - If issue: add `reported` (verbatim) and `severity` (inferred)
+ - If skipped: add `reason` if provided
+
+ **Summary:**
+ - OVERWRITE counts after each response
+ - Tracks: total, passed, issues, pending, skipped
+
+ **Gaps:**
+ - APPEND an entry only when an issue is found (YAML format)
+ - After diagnosis: fill `root_cause`, `artifacts`, `missing`, `debug_session`
+ - This section feeds directly into /grd:plan-phase --gaps
+
+ </section_rules>
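The Summary block is a pure function of the `result` fields. A sketch that recomputes it from a list of results, accepting only the four result values listed above; `summarize` and `render_summary` are hypothetical names:

```python
from collections import Counter

RESULT_VALUES = ("[pending]", "pass", "issue", "skipped")


def summarize(results: list[str]) -> dict[str, int]:
    """Recompute the Summary counts from every test's `result` field."""
    unknown = set(results) - set(RESULT_VALUES)
    if unknown:
        raise ValueError(f"unexpected result values: {sorted(unknown)}")
    counts = Counter(results)
    return {
        "total": len(results),
        "passed": counts["pass"],
        "issues": counts["issue"],
        "pending": counts["[pending]"],
        "skipped": counts["skipped"],
    }


def render_summary(summary: dict[str, int]) -> str:
    """Format the counts as the `key: value` lines used in the Summary section."""
    return "\n".join(f"{key}: {value}" for key, value in summary.items())
```

Recomputing from scratch after every response, rather than incrementing counters, keeps the Summary correct even if a result is later changed.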
+
+ <diagnosis_lifecycle>
+
+ **After testing is complete (status: complete), if gaps exist:**
+
+ 1. User runs diagnosis (from the verify-work offer or manually)
+ 2. diagnose-issues workflow spawns parallel debug agents
+ 3. Each agent investigates one gap and returns its root cause
+ 4. UAT.md Gaps section is updated with the diagnosis:
+    - Each gap gets `root_cause`, `artifacts`, `missing`, `debug_session` filled
+ 5. status → "diagnosed"
+ 6. Ready for /grd:plan-phase --gaps with root causes
+
+ **After diagnosis:**
+ ```markdown
+ ## Gaps
+
+ - truth: "Comment appears immediately after submission"
+   status: failed
+   reason: "User reported: works but doesn't show until I refresh the page"
+   severity: major
+   test: 2
+   root_cause: "useEffect in CommentList.tsx missing commentCount dependency"
+   artifacts:
+     - path: "src/components/CommentList.tsx"
+       issue: "useEffect missing dependency"
+   missing:
+     - "Add commentCount to useEffect dependency array"
+   debug_session: ".planning/debug/comment-not-refreshing.md"
+ ```
+
+ </diagnosis_lifecycle>
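Recording a gap and filling in its diagnosis can be sketched over gaps held as dicts that mirror the YAML keys. `record_gap` and `apply_diagnosis` are hypothetical helpers, and treating the session as diagnosed only once every gap has a `root_cause` is an assumption about what step 5 requires:

```python
def record_gap(gaps: list[dict], test: int, truth: str, reported: str, severity: str) -> None:
    """Append a gap when a test result is `issue` (diagnosis fields start empty)."""
    gaps.append({
        "truth": truth,
        "status": "failed",
        "reason": f"User reported: {reported}",
        "severity": severity,
        "test": test,
        "root_cause": "",
        "artifacts": [],
        "missing": [],
        "debug_session": "",
    })


def apply_diagnosis(gaps: list[dict], test: int, root_cause: str, artifacts: list[dict],
                    missing: list[str], debug_session: str) -> str:
    """Fill one gap's diagnosis fields; return the resulting frontmatter status."""
    for gap in gaps:
        if gap["test"] == test:
            gap.update(root_cause=root_cause, artifacts=artifacts,
                       missing=missing, debug_session=debug_session)
            break
    else:
        raise KeyError(f"no gap recorded for test {test}")
    # Assumption: the session counts as diagnosed once every gap has a root cause.
    return "diagnosed" if all(g["root_cause"] for g in gaps) else "complete"
```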
+
+ <lifecycle>
+
+ **Creation:** When /grd:verify-work starts a new session
+ - Extract tests from SUMMARY.md files
+ - Set status to "testing"
+ - Current Test points to test 1
+ - All tests have result: [pending]
+
+ **During testing:**
+ - Present the test from the Current Test section
+ - User responds with a pass confirmation or an issue description
+ - Update test result (pass/issue/skipped)
+ - Update Summary counts
+ - If issue: append to Gaps section (YAML format), infer severity
+ - Move Current Test to the next pending test
+
+ **On completion:**
+ - status → "complete"
+ - Current Test → "[testing complete]"
+ - Commit file
+ - Present summary with next steps
+
+ **Resume after /clear:**
+ 1. Read frontmatter → know phase and status
+ 2. Read Current Test → know where we are
+ 3. Find first [pending] result → continue from there
+ 4. Summary shows progress so far
+
+ </lifecycle>
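Finding the next pending test (used both when advancing Current Test and in resume step 3) can be sketched over the markdown Tests section, assuming headings of the form `### N. Name` as in the template. `first_pending` and `current_test_text` are hypothetical, and the rendered Current Test omits the `expected` field for brevity:

```python
import re

TEST_HEADING = re.compile(r"^### (\d+)\. (.+)$")


def first_pending(text: str):
    """Return (number, name) of the first test whose result is [pending], else None."""
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = TEST_HEADING.match(line)
        if match:
            current = (int(match.group(1)), match.group(2))
        elif current is not None and line == "result: [pending]":
            return current
    return None


def current_test_text(text: str) -> str:
    """What the Current Test section should say after a response is recorded."""
    pending = first_pending(text)
    if pending is None:
        return "[testing complete]"
    number, name = pending
    return f"number: {number}\nname: {name}\nawaiting: user response"
```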
+
+ <severity_guide>
+
+ Severity is INFERRED from the user's natural language, never asked.
+
+ | User describes | Infer |
+ |----------------|-------|
+ | Crash, error, exception, fails completely, unusable | blocker |
+ | Doesn't work, nothing happens, wrong behavior, missing | major |
+ | Works but..., slow, weird, minor, small issue | minor |
+ | Color, font, spacing, alignment, visual, looks off | cosmetic |
+
+ Default: **major** (safe default, user can clarify if wrong)
+
+ </severity_guide>
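The guide relies on the agent's reading of the report, not on keyword matching. Purely as an illustration, here is a crude keyword approximation of the table, checked in table order with the **major** default. `SEVERITY_KEYWORDS` and `infer_severity` are hypothetical; note that this sketch labels the report in `<good_example>` ("works but doesn't show until I refresh the page") as minor, whereas the example records it as major, which is exactly the kind of judgment keywords miss:

```python
# Keyword lists paraphrase the severity table, checked in table order.
SEVERITY_KEYWORDS = [
    ("blocker", ["crash", "error", "exception", "fails completely", "unusable"]),
    ("major", ["doesn't work", "nothing happens", "wrong behavior", "missing"]),
    ("minor", ["works but", "slow", "weird", "minor", "small issue"]),
    ("cosmetic", ["color", "font", "spacing", "alignment", "visual", "looks off"]),
]


def infer_severity(report: str) -> str:
    """Crude keyword guess at severity; falls back to the guide's default."""
    text = report.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for severity, keywords in SEVERITY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return severity
    return "major"  # safe default per the guide
```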
+
+ <good_example>
+ ```markdown
+ ---
+ status: diagnosed
+ phase: 04-comments
+ source: 04-01-SUMMARY.md, 04-02-SUMMARY.md
+ started: 2025-01-15T10:30:00Z
+ updated: 2025-01-15T10:45:00Z
+ ---
+
+ ## Current Test
+
+ [testing complete]
+
+ ## Tests
+
+ ### 1. View Comments on Post
+ expected: Comments section expands, shows count and comment list
+ result: pass
+
+ ### 2. Create Top-Level Comment
+ expected: Submit comment via rich text editor, appears in list with author info
+ result: issue
+ reported: "works but doesn't show until I refresh the page"
+ severity: major
+
+ ### 3. Reply to a Comment
+ expected: Click Reply, inline composer appears, submit shows nested reply
+ result: pass
+
+ ### 4. Visual Nesting
+ expected: 3+ level thread shows indentation, left borders, caps at reasonable depth
+ result: pass
+
+ ### 5. Delete Own Comment
+ expected: Click delete on own comment, removed or shows [deleted] if has replies
+ result: pass
+
+ ### 6. Comment Count
+ expected: Post shows accurate count, increments when adding comment
+ result: pass
+
+ ## Summary
+
+ total: 6
+ passed: 5
+ issues: 1
+ pending: 0
+ skipped: 0
+
+ ## Gaps
+
+ - truth: "Comment appears immediately after submission in list"
+   status: failed
+   reason: "User reported: works but doesn't show until I refresh the page"
+   severity: major
+   test: 2
+   root_cause: "useEffect in CommentList.tsx missing commentCount dependency"
+   artifacts:
+     - path: "src/components/CommentList.tsx"
+       issue: "useEffect missing dependency"
+   missing:
+     - "Add commentCount to useEffect dependency array"
+   debug_session: ".planning/debug/comment-not-refreshing.md"
+ ```
+ </good_example>
package/get-research-done/templates/archive-reason.md
@@ -0,0 +1,195 @@
+ # Archive Reason Template
+
+ Template for documenting failed/abandoned hypotheses in `experiments/archive/YYYY-MM-DD_hypothesis_slug/ARCHIVE_REASON.md`.
+
+ ---
+
+ ## File Template
+
+ ```markdown
+ # Archive Reason: {{hypothesis_name}}
+
+ **Archived:** {{ISO_8601_timestamp}}
+ **Original Hypothesis:** {{hypothesis_statement_from_objective}}
+ **Final Iteration:** {{N}} of {{limit}}
+ **Final Verdict:** {{ESCALATE|REVISE_METHOD_limit|REVISE_DATA_unresolved}}
+
+ ## Why This Failed
+
+ {{user_rationale_required}}
+
+ ## What We Learned
+
+ {{Insights from failed attempts - to be filled by user}}
+
+ - Key finding 1
+ - Key finding 2
+ - Key finding 3
+
+ ## What Would Need to Change
+
+ {{Conditions under which this might work - to be filled by user}}
+
+ - Required change 1
+ - Required change 2
+ - Required change 3
+
+ ## Final Metrics
+
+ | Metric | Best Value | Target | Gap |
+ |--------|------------|--------|-----|
+ | {{metric}} | {{best_achieved}} | {{threshold}} | {{difference}} |
+
+ ## Iteration Timeline
+
+ See: `ITERATION_SUMMARY.md` for detailed history of all attempts.
+
+ **Summary:**
+ - Total runs: {{N}}
+ - Verdict distribution: {{PROCEED: X, REVISE_METHOD: Y, REVISE_DATA: Z, ESCALATE: W}}
+ - Best composite score: {{best_score}} (needed: {{threshold}})
+
+ ---
+
+ *This negative result is preserved to prevent future researchers from repeating this approach without the necessary conditions.*
+
+ ---
+
+ **Archive location:** experiments/archive/{{YYYY-MM-DD}}_{{hypothesis_slug}}/
+ **Decision recorded:** human_eval/decision_log.md
+ ```
+
+ ---
+
+ ## Usage Notes
+
+ **Field descriptions:**
+
+ - **hypothesis_name:** Human-readable name extracted from the OBJECTIVE.md "what" section
+ - **hypothesis_slug:** Filename-safe version (spaces→underscores, lowercase, alphanumeric only)
+ - **ISO_8601_timestamp:** Format YYYY-MM-DDTHH:MM:SSZ (UTC time)
+ - **hypothesis_statement_from_objective:** Full "what/why/expected" from OBJECTIVE.md
+ - **N:** Final iteration count from the run directory
+ - **limit:** Iteration limit (default 5, or custom from --limit flag)
+ - **Final Verdict:** Reason for archival (ESCALATE, REVISE_METHOD limit reached, REVISE_DATA unresolved)
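Generating the filename-safe pieces, as a sketch that applies the field descriptions literally (spaces → underscores, lowercase, keep only `[a-z0-9_]`) and formats UTC timestamps as `YYYY-MM-DDTHH:MM:SSZ`. The function names are hypothetical. Applied literally, the rule keeps stopwords, so it yields `ensemble_methods_for_fraud_detection` where the populated example below uses `ensemble_methods_fraud_detection`; the command may shorten slugs in ways not described here:

```python
import re
from datetime import datetime, timezone
from typing import Optional


def hypothesis_slug(name: str) -> str:
    """Spaces -> underscores, lowercase, then drop anything outside [a-z0-9_]."""
    slug = name.strip().lower().replace(" ", "_")
    return re.sub(r"[^a-z0-9_]", "", slug)


def iso_timestamp(now: Optional[datetime] = None) -> str:
    """UTC timestamp in YYYY-MM-DDTHH:MM:SSZ form."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def archive_dir(name: str, now: Optional[datetime] = None) -> str:
    """experiments/archive/YYYY-MM-DD_<slug>/ for a given hypothesis name."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"experiments/archive/{now:%Y-%m-%d}_{hypothesis_slug(name)}/"
```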
+
+ **Why This Failed (REQUIRED):**
+
+ This is the most critical section. The user must provide a substantive explanation:
+ - What was attempted
+ - Why it didn't work
+ - What blocked success
+ - Any insights about the approach
+
+ Examples:
+ - "Data quality issues prevented reliable model training. Missing values in key features caused high variance."
+ - "Hypothesis was too ambitious given available data. Sample size (N=500) insufficient for complex ensemble methods."
+ - "Leakage detection revealed fundamental data collection flaw that cannot be corrected without re-collection."
+
+ **What We Learned (user fills):**
+
+ Insights that emerged from failed attempts. Examples:
+ - "Feature X has stronger predictive power than initially assumed"
+ - "Class imbalance >90% requires specialized techniques beyond standard methods"
+ - "Temporal drift in data makes cross-validation unreliable"
+
+ **What Would Need to Change (user fills):**
+
+ Conditions for future success. Examples:
+ - "Collect 10x more data (N=5000+) to support ensemble complexity"
+ - "Fix data pipeline to prevent leakage at source"
+ - "Reformulate as binary classification instead of multi-class"
+
+ **Final Metrics table:**
+
+ - Show best values achieved across ALL iterations (not just the final one)
+ - Include gap calculation (best_achieved - target; negative means the target was missed)
+ - Order by importance (primary metric first)
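Building the Final Metrics table, sketched under two assumptions: each run's metrics are available as a plain dict (a hypothetical stand-in for SCORECARD.json, whose schema is not shown here) and every metric is higher-is-better. Gap is `best - target`, which reproduces the signs in the populated example below; `best_metrics` and `final_metrics_table` are hypothetical names:

```python
def best_metrics(runs: list[dict[str, float]]) -> dict[str, float]:
    """Best value of each metric across ALL runs (assumes higher is better)."""
    best: dict[str, float] = {}
    for run in runs:
        for metric, value in run.items():
            best[metric] = max(value, best.get(metric, value))
    return best


def final_metrics_table(runs: list[dict[str, float]], targets: dict[str, float]) -> str:
    """Markdown rows for Final Metrics; `targets` is ordered primary metric first."""
    best = best_metrics(runs)
    rows = ["| Metric | Best Value | Target | Gap |", "|--------|------------|--------|-----|"]
    for metric, target in targets.items():
        gap = best[metric] - target  # negative: target missed; positive: exceeded
        rows.append(f"| {metric} | {best[metric]:.2f} | {target:.2f} | {gap:+.2f} |")
    return "\n".join(rows)
```

A loss-type metric (lower is better) would need `min` and the sign of the gap flipped.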
+
+ **Example populated template:**
+
+ ```markdown
+ # Archive Reason: Ensemble Methods for Fraud Detection
+
+ **Archived:** 2026-01-30T15:45:00Z
+ **Original Hypothesis:** Ensemble methods will improve F1 score over single models by combining predictions from random forest, gradient boosting, and neural networks.
+ **Final Iteration:** 5 of 5
+ **Final Verdict:** REVISE_METHOD_limit
+
+ ## Why This Failed
+
+ After 5 iterations with different ensemble configurations, we could not achieve the target F1 score of 0.85. The fundamental issue is severe class imbalance (99.2% negative class) combined with limited positive examples (N=120). Ensemble methods require sufficient positive examples to learn diverse patterns, but our dataset is too imbalanced for this approach to work effectively.
+
+ All iterations showed high precision (>0.90) but poor recall (0.42 at best), resulting in F1 scores between 0.52 and 0.58. Attempts to address this through resampling (SMOTE, undersampling) introduced artificial patterns that didn't generalize to the test set.
+
+ ## What We Learned
+
+ - Class imbalance of 99%+ requires specialized loss functions (focal loss) rather than ensemble complexity
+ - Resampling techniques (SMOTE) work poorly with high-dimensional data (237 features)
+ - Single model (gradient boosting) with class weights performed nearly as well as ensembles (F1: 0.56 vs 0.58)
+ - Feature importance analysis revealed only 15 features have meaningful signal
+
+ ## What Would Need to Change
+
+ - Collect 10x more positive examples (N=1200+) to support ensemble diversity
+ - Reduce feature space to top 15-20 features to prevent overfitting on noise
+ - Use focal loss or cost-sensitive learning instead of standard ensemble methods
+ - Consider anomaly detection approaches instead of classification
+ - Re-evaluate hypothesis: perhaps single model is sufficient given data constraints
+
+ ## Final Metrics
+
+ | Metric | Best Value | Target | Gap |
+ |--------|------------|--------|-----|
+ | f1_score | 0.58 | 0.85 | -0.27 |
+ | precision | 0.92 | 0.80 | +0.12 |
+ | recall | 0.42 | 0.80 | -0.38 |
+
+ ## Iteration Timeline
+
+ See: `ITERATION_SUMMARY.md` for detailed history of all attempts.
+
+ **Summary:**
+ - Total runs: 5
+ - Verdict distribution: PROCEED: 0, REVISE_METHOD: 5, REVISE_DATA: 0, ESCALATE: 0
+ - Best composite score: 0.64 (needed: 0.80)
+
+ ---
+
+ *This negative result is preserved to prevent future researchers from repeating this approach without the necessary conditions.*
+
+ ---
+
+ **Archive location:** experiments/archive/2026-01-30_ensemble_methods_fraud_detection/
+ **Decision recorded:** human_eval/decision_log.md
+ ```
+
+ ---
+
+ ## Integration
+
+ This template is used by the `/grd:evaluate` command in Phase 5 (Archive Handling) when the user selects the "Archive" decision.
+
+ **Inputs:**
+ - OBJECTIVE.md (hypothesis statement, metrics, thresholds)
+ - All SCORECARD.json files across runs (for best metrics)
+ - All CRITIC_LOG.md files (for verdict history)
+ - User rationale (REQUIRED from confirmation prompt)
+ - Iteration metadata (count, limit, final verdict)
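The Iteration Timeline summary can be assembled from the per-run verdicts once they have been pulled out of the CRITIC_LOG.md files (that extraction step is not shown, and the log format is assumed). `verdict_distribution` and `timeline_summary` are hypothetical helpers:

```python
from collections import Counter

VERDICTS = ("PROCEED", "REVISE_METHOD", "REVISE_DATA", "ESCALATE")


def verdict_distribution(verdicts: list[str]) -> str:
    """'PROCEED: X, REVISE_METHOD: Y, REVISE_DATA: Z, ESCALATE: W' from per-run verdicts."""
    counts = Counter(verdicts)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    return ", ".join(f"{verdict}: {counts[verdict]}" for verdict in VERDICTS)


def timeline_summary(verdicts: list[str], best_score: float, threshold: float) -> str:
    """The three Summary bullets under Iteration Timeline."""
    return "\n".join([
        f"- Total runs: {len(verdicts)}",
        f"- Verdict distribution: {verdict_distribution(verdicts)}",
        f"- Best composite score: {best_score:.2f} (needed: {threshold:.2f})",
    ])
```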
+
+ **Outputs:**
+ - experiments/archive/YYYY-MM-DD_hypothesis_slug/ARCHIVE_REASON.md (this template)
+ - Referenced by ITERATION_SUMMARY.md in the same directory
+ - Logged in human_eval/decision_log.md
+
+ **Archive directory structure:**
+ ```
+ experiments/archive/YYYY-MM-DD_hypothesis_slug/
+ ├── ARCHIVE_REASON.md # This template (why it failed)
+ ├── ITERATION_SUMMARY.md # Collapsed run history
+ └── final_run/ # Final run directory moved from experiments/
+     ├── DECISION.md
+     ├── SCORECARD.json
+     ├── CRITIC_LOG.md
+     └── ...
+ ```
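A sketch of producing the archive directory layout with `pathlib` and `shutil.move`. `archive_run` is hypothetical: it takes already-rendered ARCHIVE_REASON.md and ITERATION_SUMMARY.md text, and it refuses to reuse an existing archive directory (an assumed safety choice, not something the template specifies):

```python
import shutil
from pathlib import Path


def archive_run(run_dir: Path, archive_root: Path, dated_slug: str,
                reason_md: str, summary_md: str) -> Path:
    """Create <archive_root>/<dated_slug>/ with the two reports and the moved run."""
    target = archive_root / dated_slug
    target.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier archive
    (target / "ARCHIVE_REASON.md").write_text(reason_md, encoding="utf-8")
    (target / "ITERATION_SUMMARY.md").write_text(summary_md, encoding="utf-8")
    # The final run directory is moved (not copied) out of experiments/.
    shutil.move(str(run_dir), str(target / "final_run"))
    return target
```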