get-research-done 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (127)
  1. package/LICENSE +21 -0
  2. package/README.md +560 -0
  3. package/agents/grd-architect.md +789 -0
  4. package/agents/grd-codebase-mapper.md +738 -0
  5. package/agents/grd-critic.md +1065 -0
  6. package/agents/grd-debugger.md +1203 -0
  7. package/agents/grd-evaluator.md +948 -0
  8. package/agents/grd-executor.md +784 -0
  9. package/agents/grd-explorer.md +2063 -0
  10. package/agents/grd-graduator.md +484 -0
  11. package/agents/grd-integration-checker.md +423 -0
  12. package/agents/grd-phase-researcher.md +641 -0
  13. package/agents/grd-plan-checker.md +745 -0
  14. package/agents/grd-planner.md +1386 -0
  15. package/agents/grd-project-researcher.md +865 -0
  16. package/agents/grd-research-synthesizer.md +256 -0
  17. package/agents/grd-researcher.md +2361 -0
  18. package/agents/grd-roadmapper.md +605 -0
  19. package/agents/grd-verifier.md +778 -0
  20. package/bin/install.js +1294 -0
  21. package/commands/grd/add-phase.md +207 -0
  22. package/commands/grd/add-todo.md +193 -0
  23. package/commands/grd/architect.md +283 -0
  24. package/commands/grd/audit-milestone.md +277 -0
  25. package/commands/grd/check-todos.md +228 -0
  26. package/commands/grd/complete-milestone.md +136 -0
  27. package/commands/grd/debug.md +169 -0
  28. package/commands/grd/discuss-phase.md +86 -0
  29. package/commands/grd/evaluate.md +1095 -0
  30. package/commands/grd/execute-phase.md +339 -0
  31. package/commands/grd/explore.md +258 -0
  32. package/commands/grd/graduate.md +323 -0
  33. package/commands/grd/help.md +482 -0
  34. package/commands/grd/insert-phase.md +227 -0
  35. package/commands/grd/insights.md +231 -0
  36. package/commands/grd/join-discord.md +18 -0
  37. package/commands/grd/list-phase-assumptions.md +50 -0
  38. package/commands/grd/map-codebase.md +71 -0
  39. package/commands/grd/new-milestone.md +721 -0
  40. package/commands/grd/new-project.md +1008 -0
  41. package/commands/grd/pause-work.md +134 -0
  42. package/commands/grd/plan-milestone-gaps.md +295 -0
  43. package/commands/grd/plan-phase.md +525 -0
  44. package/commands/grd/progress.md +364 -0
  45. package/commands/grd/quick-explore.md +236 -0
  46. package/commands/grd/quick.md +309 -0
  47. package/commands/grd/remove-phase.md +349 -0
  48. package/commands/grd/research-phase.md +200 -0
  49. package/commands/grd/research.md +681 -0
  50. package/commands/grd/resume-work.md +40 -0
  51. package/commands/grd/set-profile.md +106 -0
  52. package/commands/grd/settings.md +136 -0
  53. package/commands/grd/update.md +172 -0
  54. package/commands/grd/verify-work.md +219 -0
  55. package/get-research-done/config/default.json +15 -0
  56. package/get-research-done/references/checkpoints.md +1078 -0
  57. package/get-research-done/references/continuation-format.md +249 -0
  58. package/get-research-done/references/git-integration.md +254 -0
  59. package/get-research-done/references/model-profiles.md +73 -0
  60. package/get-research-done/references/planning-config.md +94 -0
  61. package/get-research-done/references/questioning.md +141 -0
  62. package/get-research-done/references/tdd.md +263 -0
  63. package/get-research-done/references/ui-brand.md +160 -0
  64. package/get-research-done/references/verification-patterns.md +612 -0
  65. package/get-research-done/templates/DEBUG.md +159 -0
  66. package/get-research-done/templates/UAT.md +247 -0
  67. package/get-research-done/templates/archive-reason.md +195 -0
  68. package/get-research-done/templates/codebase/architecture.md +255 -0
  69. package/get-research-done/templates/codebase/concerns.md +310 -0
  70. package/get-research-done/templates/codebase/conventions.md +307 -0
  71. package/get-research-done/templates/codebase/integrations.md +280 -0
  72. package/get-research-done/templates/codebase/stack.md +186 -0
  73. package/get-research-done/templates/codebase/structure.md +285 -0
  74. package/get-research-done/templates/codebase/testing.md +480 -0
  75. package/get-research-done/templates/config.json +35 -0
  76. package/get-research-done/templates/context.md +283 -0
  77. package/get-research-done/templates/continue-here.md +78 -0
  78. package/get-research-done/templates/critic-log.md +288 -0
  79. package/get-research-done/templates/data-report.md +173 -0
  80. package/get-research-done/templates/debug-subagent-prompt.md +91 -0
  81. package/get-research-done/templates/decision-log.md +58 -0
  82. package/get-research-done/templates/decision.md +138 -0
  83. package/get-research-done/templates/discovery.md +146 -0
  84. package/get-research-done/templates/experiment-readme.md +104 -0
  85. package/get-research-done/templates/graduated-script.md +180 -0
  86. package/get-research-done/templates/iteration-summary.md +234 -0
  87. package/get-research-done/templates/milestone-archive.md +123 -0
  88. package/get-research-done/templates/milestone.md +115 -0
  89. package/get-research-done/templates/objective.md +271 -0
  90. package/get-research-done/templates/phase-prompt.md +567 -0
  91. package/get-research-done/templates/planner-subagent-prompt.md +117 -0
  92. package/get-research-done/templates/project.md +184 -0
  93. package/get-research-done/templates/requirements.md +231 -0
  94. package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
  95. package/get-research-done/templates/research-project/FEATURES.md +147 -0
  96. package/get-research-done/templates/research-project/PITFALLS.md +200 -0
  97. package/get-research-done/templates/research-project/STACK.md +120 -0
  98. package/get-research-done/templates/research-project/SUMMARY.md +170 -0
  99. package/get-research-done/templates/research.md +529 -0
  100. package/get-research-done/templates/roadmap.md +202 -0
  101. package/get-research-done/templates/scorecard.json +113 -0
  102. package/get-research-done/templates/state.md +287 -0
  103. package/get-research-done/templates/summary.md +246 -0
  104. package/get-research-done/templates/user-setup.md +311 -0
  105. package/get-research-done/templates/verification-report.md +322 -0
  106. package/get-research-done/workflows/complete-milestone.md +756 -0
  107. package/get-research-done/workflows/diagnose-issues.md +231 -0
  108. package/get-research-done/workflows/discovery-phase.md +289 -0
  109. package/get-research-done/workflows/discuss-phase.md +433 -0
  110. package/get-research-done/workflows/execute-phase.md +657 -0
  111. package/get-research-done/workflows/execute-plan.md +1844 -0
  112. package/get-research-done/workflows/list-phase-assumptions.md +178 -0
  113. package/get-research-done/workflows/map-codebase.md +322 -0
  114. package/get-research-done/workflows/resume-project.md +307 -0
  115. package/get-research-done/workflows/transition.md +556 -0
  116. package/get-research-done/workflows/verify-phase.md +628 -0
  117. package/get-research-done/workflows/verify-work.md +596 -0
  118. package/hooks/dist/grd-check-update.js +61 -0
  119. package/hooks/dist/grd-statusline.js +84 -0
  120. package/package.json +47 -0
  121. package/scripts/audit-help-commands.sh +115 -0
  122. package/scripts/build-hooks.js +42 -0
  123. package/scripts/verify-all-commands.sh +246 -0
  124. package/scripts/verify-architect-warning.sh +35 -0
  125. package/scripts/verify-insights-mode.sh +40 -0
  126. package/scripts/verify-quick-mode.sh +20 -0
  127. package/scripts/verify-revise-data-routing.sh +139 -0
@@ -0,0 +1,283 @@
1
+ # Phase Context Template
2
+
3
+ Template for `.planning/phases/XX-name/{phase}-CONTEXT.md` - captures implementation decisions for a phase.
4
+
5
+ **Purpose:** Document decisions that downstream agents need. Researcher uses this to know WHAT to investigate. Planner uses this to know WHAT choices are locked vs flexible.
6
+
7
+ **Key principle:** Categories are NOT predefined. They emerge from what was actually discussed for THIS phase. A CLI phase has CLI-relevant sections, a UI phase has UI-relevant sections.
8
+
9
+ **Downstream consumers:**
10
+ - `grd-phase-researcher` — Reads decisions to focus research (e.g., "card layout" → research card component patterns)
11
+ - `grd-planner` — Reads decisions to create specific tasks (e.g., "infinite scroll" → task includes virtualization)
12
+
13
+ ---
14
+
15
+ ## File Template
16
+
17
+ ```markdown
18
+ # Phase [X]: [Name] - Context
19
+
20
+ **Gathered:** [date]
21
+ **Status:** Ready for planning
22
+
23
+ <domain>
24
+ ## Phase Boundary
25
+
26
+ [Clear statement of what this phase delivers — the scope anchor. This comes from ROADMAP.md and is fixed. Discussion clarifies implementation within this boundary.]
27
+
28
+ </domain>
29
+
30
+ <decisions>
31
+ ## Implementation Decisions
32
+
33
+ ### [Area 1 that was discussed]
34
+ - [Specific decision made]
35
+ - [Another decision if applicable]
36
+
37
+ ### [Area 2 that was discussed]
38
+ - [Specific decision made]
39
+
40
+ ### [Area 3 that was discussed]
41
+ - [Specific decision made]
42
+
43
+ ### Claude's Discretion
44
+ [Areas where user explicitly said "you decide" — Claude has flexibility here during planning/implementation]
45
+
46
+ </decisions>
47
+
48
+ <specifics>
49
+ ## Specific Ideas
50
+
51
+ [Any particular references, examples, or "I want it like X" moments from discussion. Product references, specific behaviors, interaction patterns.]
52
+
53
+ [If none: "No specific requirements — open to standard approaches"]
54
+
55
+ </specifics>
56
+
57
+ <deferred>
58
+ ## Deferred Ideas
59
+
60
+ [Ideas that came up during discussion but belong in other phases. Captured here so they're not lost, but explicitly out of scope for this phase.]
61
+
62
+ [If none: "None — discussion stayed within phase scope"]
63
+
64
+ </deferred>
65
+
66
+ ---
67
+
68
+ *Phase: XX-name*
69
+ *Context gathered: [date]*
70
+ ```
71
+
72
+ <good_examples>
73
+
74
+ **Example 1: Visual feature (Post Feed)**
75
+
76
+ ```markdown
77
+ # Phase 3: Post Feed - Context
78
+
79
+ **Gathered:** 2025-01-20
80
+ **Status:** Ready for planning
81
+
82
+ <domain>
83
+ ## Phase Boundary
84
+
85
+ Display posts from followed users in a scrollable feed. Users can view posts and see engagement counts. Creating posts and interactions are separate phases.
86
+
87
+ </domain>
88
+
89
+ <decisions>
90
+ ## Implementation Decisions
91
+
92
+ ### Layout style
93
+ - Card-based layout, not timeline or list
94
+ - Each card shows: author avatar, name, timestamp, full post content, reaction counts
95
+ - Cards have subtle shadows, rounded corners — modern feel
96
+
97
+ ### Loading behavior
98
+ - Infinite scroll, not pagination
99
+ - Pull-to-refresh on mobile
100
+ - New posts indicator at top ("3 new posts") rather than auto-inserting
101
+
102
+ ### Empty state
103
+ - Friendly illustration + "Follow people to see posts here"
104
+ - Suggest 3-5 accounts to follow based on interests
105
+
106
+ ### Claude's Discretion
107
+ - Loading skeleton design
108
+ - Exact spacing and typography
109
+ - Error state handling
110
+
111
+ </decisions>
112
+
113
+ <specifics>
114
+ ## Specific Ideas
115
+
116
+ - "I like how Twitter shows the new posts indicator without disrupting your scroll position"
117
+ - Cards should feel like Linear's issue cards — clean, not cluttered
118
+
119
+ </specifics>
120
+
121
+ <deferred>
122
+ ## Deferred Ideas
123
+
124
+ - Commenting on posts — Phase 5
125
+ - Bookmarking posts — add to backlog
126
+
127
+ </deferred>
128
+
129
+ ---
130
+
131
+ *Phase: 03-post-feed*
132
+ *Context gathered: 2025-01-20*
133
+ ```
134
+
135
+ **Example 2: CLI tool (Database backup)**
136
+
137
+ ```markdown
138
+ # Phase 2: Backup Command - Context
139
+
140
+ **Gathered:** 2025-01-20
141
+ **Status:** Ready for planning
142
+
143
+ <domain>
144
+ ## Phase Boundary
145
+
146
+ CLI command to back up a database to a local file or S3. Supports full and incremental backups. Restore command is a separate phase.
147
+
148
+ </domain>
149
+
150
+ <decisions>
151
+ ## Implementation Decisions
152
+
153
+ ### Output format
154
+ - JSON for programmatic use, table format for humans
155
+ - Default to table, --json flag for JSON
156
+ - Verbose mode (-v) shows progress, silent by default
157
+
158
+ ### Flag design
159
+ - Short flags for common options: -o (output), -v (verbose), -f (force)
160
+ - Long flags for clarity: --incremental, --compress, --encrypt
161
+ - Required: database connection string (positional or --db)
162
+
163
+ ### Error recovery
164
+ - Retry 3 times on network failure, then fail with clear message
165
+ - --no-retry flag to fail fast
166
+ - Partial backups are deleted on failure (no corrupt files)
167
+
168
+ ### Claude's Discretion
169
+ - Exact progress bar implementation
170
+ - Compression algorithm choice
171
+ - Temp file handling
172
+
173
+ </decisions>
174
+
175
+ <specifics>
176
+ ## Specific Ideas
177
+
178
+ - "I want it to feel like pg_dump — familiar to database people"
179
+ - Should work in CI pipelines (exit codes, no interactive prompts)
180
+
181
+ </specifics>
182
+
183
+ <deferred>
184
+ ## Deferred Ideas
185
+
186
+ - Scheduled backups — separate phase
187
+ - Backup rotation/retention — add to backlog
188
+
189
+ </deferred>
190
+
191
+ ---
192
+
193
+ *Phase: 02-backup-command*
194
+ *Context gathered: 2025-01-20*
195
+ ```
196
+
197
+ **Example 3: Organization task (Photo library)**
198
+
199
+ ```markdown
200
+ # Phase 1: Photo Organization - Context
201
+
202
+ **Gathered:** 2025-01-20
203
+ **Status:** Ready for planning
204
+
205
+ <domain>
206
+ ## Phase Boundary
207
+
208
+ Organize existing photo library into structured folders. Handle duplicates and apply consistent naming. Tagging and search are separate phases.
209
+
210
+ </domain>
211
+
212
+ <decisions>
213
+ ## Implementation Decisions
214
+
215
+ ### Grouping criteria
216
+ - Primary grouping by year, then by month
217
+ - Events detected by time clustering (photos within 2 hours = same event)
218
+ - Event folders named by date + location if available
219
+
220
+ ### Duplicate handling
221
+ - Keep highest resolution version
222
+ - Move duplicates to _duplicates folder (don't delete)
223
+ - Log all duplicate decisions for review
224
+
225
+ ### Naming convention
226
+ - Format: YYYY-MM-DD_HH-MM-SS_originalname.ext
227
+ - Preserve original filename as suffix for searchability
228
+ - Handle name collisions with incrementing suffix
229
+
230
+ ### Claude's Discretion
231
+ - Exact clustering algorithm
232
+ - How to handle photos with no EXIF data
233
+ - Folder emoji usage
234
+
235
+ </decisions>
236
+
237
+ <specifics>
238
+ ## Specific Ideas
239
+
240
+ - "I want to be able to find photos by roughly when they were taken"
241
+ - Don't delete anything — worst case, move to a review folder
242
+
243
+ </specifics>
244
+
245
+ <deferred>
246
+ ## Deferred Ideas
247
+
248
+ - Face detection grouping — future phase
249
+ - Cloud sync — out of scope for now
250
+
251
+ </deferred>
252
+
253
+ ---
254
+
255
+ *Phase: 01-photo-organization*
256
+ *Context gathered: 2025-01-20*
257
+ ```
258
+
259
+ </good_examples>
260
+
261
+ <guidelines>
262
+ **This template captures DECISIONS for downstream agents.**
263
+
264
+ The output should answer: "What does the researcher need to investigate? What choices are locked for the planner?"
265
+
266
+ **Good content (concrete decisions):**
267
+ - "Card-based layout, not timeline"
268
+ - "Retry 3 times on network failure, then fail"
269
+ - "Group by year, then by month"
270
+ - "JSON for programmatic use, table for humans"
271
+
272
+ **Bad content (too vague):**
273
+ - "Should feel modern and clean"
274
+ - "Good user experience"
275
+ - "Fast and responsive"
276
+ - "Easy to use"
277
+
278
+ **After creation:**
279
+ - File lives in phase directory: `.planning/phases/XX-name/{phase}-CONTEXT.md`
280
+ - `grd-phase-researcher` uses decisions to focus investigation
281
+ - `grd-planner` uses decisions + research to create executable tasks
282
+ - Downstream agents should NOT need to ask the user again about captured decisions
283
+ </guidelines>
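The split between locked decisions and areas left to "Claude's Discretion" can be sketched as a small parser. This is an illustration only, not code shipped in the package: the function names `extract_decisions` and `split_locked_and_flexible` are hypothetical, and the sketch assumes every area inside `<decisions>` uses a `###` heading followed by `- ` bullets, as the three examples above do.

```python
import re

# Heading the template uses for areas the user left open.
DISCRETION_HEADING = "Claude's Discretion"


def extract_decisions(context_md: str) -> dict[str, list[str]]:
    """Map each '### heading' inside <decisions> to its bullet items."""
    match = re.search(r"<decisions>(.*?)</decisions>", context_md, re.DOTALL)
    if match is None:
        return {}
    areas: dict[str, list[str]] = {}
    current = None
    for raw in match.group(1).splitlines():
        line = raw.strip()
        if line.startswith("### "):
            current = line[4:].strip()
            areas[current] = []
        elif line.startswith("- ") and current is not None:
            areas[current].append(line[2:].strip())
    return areas


def split_locked_and_flexible(areas: dict[str, list[str]]):
    """Separate locked decisions from the items left to discretion."""
    flexible = areas.get(DISCRETION_HEADING, [])
    locked = {k: v for k, v in areas.items() if k != DISCRETION_HEADING}
    return locked, flexible


# Abbreviated from Example 2 above.
sample = """\
<decisions>
## Implementation Decisions

### Output format
- JSON for programmatic use, table format for humans
- Default to table, --json flag for JSON

### Claude's Discretion
- Compression algorithm choice
- Temp file handling

</decisions>
"""

locked, flexible = split_locked_and_flexible(extract_decisions(sample))
print(locked)
print(flexible)
```

A planner-style consumer would treat everything in `locked` as fixed input and only make its own choices for the items in `flexible`.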
@@ -0,0 +1,78 @@
1
+ # Continue-Here Template
2
+
3
+ Copy and fill this structure for `.planning/phases/XX-name/.continue-here.md`:
4
+
5
+ ```yaml
6
+ ---
7
+ phase: XX-name
8
+ task: 3
9
+ total_tasks: 7
10
+ status: in_progress
11
+ last_updated: 2025-01-15T14:30:00Z
12
+ ---
13
+ ```
14
+
15
+ ```markdown
16
+ <current_state>
17
+ [Where exactly are we? What's the immediate context?]
18
+ </current_state>
19
+
20
+ <completed_work>
21
+ [What got done this session - be specific]
22
+
23
+ - Task 1: [name] - Done
24
+ - Task 2: [name] - Done
25
+ - Task 3: [name] - In progress, [what's done on it]
26
+ </completed_work>
27
+
28
+ <remaining_work>
29
+ [What's left in this phase]
30
+
31
+ - Task 3: [name] - [what's left to do]
32
+ - Task 4: [name] - Not started
33
+ - Task 5: [name] - Not started
34
+ </remaining_work>
35
+
36
+ <decisions_made>
37
+ [Key decisions and why - so next session doesn't re-debate]
38
+
39
+ - Decided to use [X] because [reason]
40
+ - Chose [approach] over [alternative] because [reason]
41
+ </decisions_made>
42
+
43
+ <blockers>
44
+ [Anything stuck or waiting on external factors]
45
+
46
+ - [Blocker 1]: [status/workaround]
47
+ </blockers>
48
+
49
+ <context>
50
+ [Mental state, "vibe", anything that helps resume smoothly]
51
+
52
+ [What were you thinking about? What was the plan?
53
+ This is the "pick up exactly where you left off" context.]
54
+ </context>
55
+
56
+ <next_action>
57
+ [The very first thing to do when resuming]
58
+
59
+ Start with: [specific action]
60
+ </next_action>
61
+ ```
62
+
63
+ <yaml_fields>
64
+ Required YAML frontmatter:
65
+
66
+ - `phase`: Directory name (e.g., `02-authentication`)
67
+ - `task`: Current task number
68
+ - `total_tasks`: Total number of tasks in the phase
69
+ - `status`: One of `in_progress`, `blocked`, or `almost_done`
70
+ - `last_updated`: ISO 8601 timestamp (e.g., `2025-01-15T14:30:00Z`)
71
+ </yaml_fields>
72
+
73
+ <guidelines>
74
+ - Be specific enough that a fresh Claude instance understands immediately
75
+ - Include WHY decisions were made, not just what
76
+ - The `<next_action>` should be actionable without reading anything else
77
+ - This file gets DELETED after resume - it's not permanent storage
78
+ </guidelines>
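The required frontmatter fields listed above can be checked before a session resumes. The following is a minimal sketch, not part of the package: `parse_frontmatter` and `validate` are hypothetical helpers, the parser only handles flat `key: value` lines (it is not a full YAML parser), and the rule that `task` must lie between 1 and `total_tasks` is an assumption the template does not state.

```python
from datetime import datetime

ALLOWED_STATUSES = {"in_progress", "blocked", "almost_done"}
REQUIRED_FIELDS = ("phase", "task", "total_tasks", "status", "last_updated")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat 'key: value' pairs between the first two '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---'")


def validate(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the header looks valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    if problems:
        return problems
    if fields["status"] not in ALLOWED_STATUSES:
        problems.append(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    try:
        task, total = int(fields["task"]), int(fields["total_tasks"])
        # Assumption: task numbering starts at 1 and never exceeds the total.
        if not 1 <= task <= total:
            problems.append("task must be between 1 and total_tasks")
    except ValueError:
        problems.append("task and total_tasks must be integers")
    try:
        # fromisoformat() in older Python versions rejects a trailing 'Z'.
        datetime.fromisoformat(fields["last_updated"].replace("Z", "+00:00"))
    except ValueError:
        problems.append("last_updated is not an ISO 8601 timestamp")
    return problems


header = """\
---
phase: 02-authentication
task: 3
total_tasks: 7
status: in_progress
last_updated: 2025-01-15T14:30:00Z
---
"""

print(validate(parse_frontmatter(header)))
```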
@@ -0,0 +1,288 @@
1
+ # Critic Evaluation: {{run_name}}
2
+
3
+ **Timestamp:** {{timestamp}}
4
+ **Iteration:** {{iteration_number}}
5
+ **Objective:** {{brief_hypothesis}}
6
+
7
+ ---
8
+
9
+ ## Verdict
10
+
11
+ **Decision:** {{PROCEED | REVISE_METHOD | REVISE_DATA | ESCALATE}}
12
+ **Confidence:** {{HIGH | MEDIUM | LOW}}
13
+
14
+ ## Reasoning
15
+
16
+ {{explanation_of_routing_decision}}
17
+
18
+ {{context_for_why_this_verdict_makes_sense}}
19
+
20
+ {{evidence_supporting_decision}}
21
+
22
+ ## Metrics Summary
23
+
24
+ | Metric | Value | Threshold | Comparison | Result |
25
+ |--------|-------|-----------|------------|--------|
26
+ | {{metric_name}} | {{value}} | {{threshold}} | {{>\|<\|=}} | {{PASS\|FAIL}} |
27
+
28
+ **Composite Score:** {{weighted_average}} (threshold: {{composite_threshold}})
29
+
30
+ **Baseline Comparison:** {{if_baseline_defined}}
31
+
32
+ | Metric | Baseline | Actual | Improvement | % Change |
33
+ |--------|----------|--------|-------------|----------|
34
+ | {{metric_name}} | {{baseline_value}} | {{actual_value}} | {{delta}} | {{percentage}} |
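The template names a weighted-average composite score and a baseline percentage change but does not define either. One plausible reading is sketched below; the `Metric` class, its `weight` and `higher_is_better` fields, and the normalization by total weight are all assumptions rather than the package's definition, and averaging raw values is only meaningful when the metrics share a scale (for example, all in the range 0 to 1).

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    value: float
    threshold: float
    weight: float = 1.0            # hypothetical: relative importance
    higher_is_better: bool = True  # hypothetical: direction of comparison

    def passed(self) -> bool:
        """PASS/FAIL for one row of the Metrics Summary table."""
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


def composite_score(metrics: list[Metric]) -> float:
    """Weighted average of metric values, normalized by total weight."""
    total_weight = sum(m.weight for m in metrics)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(m.value * m.weight for m in metrics) / total_weight


def percent_change(baseline: float, actual: float) -> float:
    """Relative improvement over the baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (actual - baseline) / abs(baseline) * 100.0


metrics = [
    Metric("f1", value=0.78, threshold=0.80, weight=2.0),
    Metric("accuracy", value=0.90, threshold=0.85, weight=1.0),
]

for m in metrics:
    print(m.name, "PASS" if m.passed() else "FAIL")
print(round(composite_score(metrics), 4))
```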
35
+
36
+ ## Strengths
37
+
38
+ {{list_of_what_experiment_does_well}}
39
+
40
+ Examples:
41
+ - Implementation correctly uses stratified k-fold as specified in OBJECTIVE.md
42
+ - Random seed set to 42 for reproducibility
43
+ - Clear documentation in README.md
44
+ - Hyperparameters well-documented in config.yaml
45
+ - Code quality is high with proper error handling
46
+ - Training/validation curves show healthy learning behavior
47
+
48
+ ## Weaknesses
49
+
50
+ {{list_of_issues_or_concerns}}
51
+
52
+ Examples:
53
+ - F1 score (0.78) below threshold (0.80)
54
+ - Train-test gap of 0.08 suggests mild overfitting
55
+ - Learning rate may be too high (training loss plateaus early)
56
+ - Missing validation curves in output
57
+ - Evaluation methodology doesn't match OBJECTIVE.md (used holdout instead of k-fold)
58
+ - Random seed not set (non-reproducible results)
59
+
60
+ ## Recommendations
61
+
62
+ {{list_of_specific_actionable_suggestions}}
63
+
64
+ **For REVISE_METHOD verdicts:**
65
+ - Reduce learning rate from 0.1 to 0.01
66
+ - Add dropout layer with rate 0.3 to reduce overfitting
67
+ - Increase training epochs from 50 to 100 (training curve not plateaued)
68
+ - Add early stopping with patience=10 to prevent overfitting
69
+ - Fix data split bug on line 45 in train.py
70
+ - Add missing metrics to output (currently missing F1 score)
71
+
72
+ **For REVISE_DATA verdicts:**
73
+ - Investigate feature 'transaction_id' for potential leakage (dominates feature importance)
74
+ - Re-analyze temporal features for leakage (results suggest future information used)
75
+ - Verify target column is correct (baseline outperforms model significantly)
76
+ - Check for train-test overlap (metrics suggest data contamination)
77
+ - Investigate data quality issues (high variance across folds)
78
+
79
+ **For PROCEED verdicts:**
80
+ - Document validation approach in final report
81
+ - Consider additional robustness checks before production
82
+ - Monitor for drift in production deployment
83
+
84
+ **For ESCALATE verdicts:**
85
+ - Human decision required (see evidence package below)
86
+ - Consider revising hypothesis or success criteria
87
+ - May need to collect additional data
88
+ - Strategic pivot may be necessary
89
+
90
+ ## Investigation Notes
91
+
92
+ {{notes_from_scientific_skepticism_checks}}
93
+
94
+ ### Suspicious Success Check
95
+
96
+ {{result_of_investigation_for_unusually_high_metrics}}
97
+
98
+ - Metrics: {{list_metrics_and_values}}
99
+ - Task complexity: {{assessment_of_difficulty}}
100
+ - Assessment: {{plausible | suspicious | highly_suspicious}}
101
+ - Reasoning: {{why}}
102
+
103
+ ### Train-Test Gap
104
+
105
+ - Train metric: {{value}}
106
+ - Validation metric: {{value}}
107
+ - Gap: {{delta}}
108
+ - Assessment: {{acceptable | moderate_concern | high_concern}}
109
+ - Reasoning: {{why}}
110
+
111
+ ### Reproducibility
112
+
113
+ - Random seed set: {{yes|no}}
114
+ - Dependencies documented: {{yes|no}}
115
+ - Data references recorded: {{yes|no}}
116
+ - Assessment: {{reproducible | partially_reproducible | non_reproducible}}
117
+
118
+ ### Data Integrity
119
+
120
+ {{if_DATA_REPORT_referenced}}
121
+
122
+ - Leakage features excluded: {{yes|no|N/A}}
123
+ - Class imbalance handled: {{yes|no|N/A}}
124
+ - Temporal splits used if needed: {{yes|no|N/A}}
125
+ - Assessment: {{concerns_none | concerns_minor | concerns_major}}
126
+
127
+ ### Code Quality
128
+
129
+ - Evaluation matches OBJECTIVE.md: {{yes|no}}
130
+ - Data split correct: {{yes|no}}
131
+ - Hyperparameters documented: {{yes|no}}
132
+ - Error handling present: {{yes|no}}
133
+ - Assessment: {{good | acceptable | needs_improvement}}
134
+
135
+ ## Trend Analysis
136
+
137
+ **Iteration Trend:** {{improving | stagnant | degrading | first_run}}
138
+
139
+ {{comparison_with_previous_iterations_if_available}}
140
+
141
+ **Historical Performance:**
142
+
143
+ | Iteration | Composite Score | Key Changes | Verdict |
144
+ |-----------|----------------|-------------|---------|
145
+ | 1 | {{value}} | {{change_description}} | {{verdict}} |
146
+ | 2 | {{value}} | {{change_description}} | {{verdict}} |
147
+ | 3 (current) | {{value}} | {{change_description}} | {{verdict}} |
148
+
149
+ **Trend Assessment:**
150
+
151
+ {{detailed_analysis_of_progress_across_iterations}}
152
+
153
+ Examples:
154
+ - "Metrics improving steadily (+0.02 per iteration). Current trajectory suggests threshold will be reached in 1-2 more iterations."
155
+ - "Metrics stagnant across 3 iterations despite different hyperparameters. May indicate fundamental limitation."
156
+ - "Metrics degrading. Recent changes counterproductive—consider reverting to iteration 1 approach."
157
+
158
+ **Cycle Detection:**
159
+
160
+ {{if_same_verdict_repeated}}
161
+
162
+ - Same verdict: {{verdict}} repeated {{N}} times
163
+ - Assessment: {{no_cycle | potential_cycle | cycle_detected}}
164
+ - Action: {{continue | escalate | try_different_approach}}
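The cycle-detection fields above can be filled by counting how many times the most recent verdict repeats at the end of the iteration history. This is a sketch under stated assumptions: the function names are hypothetical, and the cut-offs (2 repeats for `potential_cycle`, 3 for `cycle_detected`) are illustrative defaults, not values taken from the package.

```python
def trailing_repeats(verdicts: list[str]) -> int:
    """Count how many times the latest verdict repeats consecutively."""
    if not verdicts:
        return 0
    count = 0
    for verdict in reversed(verdicts):
        if verdict != verdicts[-1]:
            break
        count += 1
    return count


def assess_cycle(verdicts: list[str], potential_at: int = 2, cycle_at: int = 3) -> str:
    """Classify the history using the assessment labels from the template."""
    repeats = trailing_repeats(verdicts)
    if repeats >= cycle_at:
        return "cycle_detected"
    if repeats >= potential_at:
        return "potential_cycle"
    return "no_cycle"


history = ["REVISE_DATA", "REVISE_METHOD", "REVISE_METHOD", "REVISE_METHOD"]
print(trailing_repeats(history), assess_cycle(history))
```

Counting only the trailing run means an earlier, already-broken streak does not trigger an escalation.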
165
+
166
+ ## Next Steps
167
+
168
+ {{based_on_verdict}}
169
+
170
+ ### If PROCEED (HIGH confidence)
171
+ Ready for quantitative evaluation by Evaluator agent.
172
+
173
+ **Action:** Run `/grd:evaluate` to generate SCORECARD.json
174
+
175
+ **What happens next:**
176
+ - Evaluator will run comprehensive benchmark suite
177
+ - Results will be compared against OBJECTIVE.md criteria
178
+ - SCORECARD.json will be generated for human evaluation gate
179
+
180
+ ### If PROCEED (MEDIUM confidence)
181
+ Metrics meet criteria but minor concerns noted.
182
+
183
+ **Action:** Proceed to Evaluator with caveats
184
+
185
+ **Caveats:**
186
+ {{list_of_minor_concerns_to_monitor}}
187
+
188
+ ### If PROCEED (LOW confidence)
189
+ **HUMAN GATE REQUIRED**
190
+
191
+ Metrics pass thresholds but concerns exist:
192
+ {{list_of_concerns}}
193
+
194
+ **Question for human:**
195
+ Should we proceed to Evaluator despite concerns, or investigate further?
196
+
197
+ **Options:**
198
+ 1. Proceed to Evaluator (accept concerns)
199
+ 2. REVISE_METHOD (address concerns first)
200
+ 3. ESCALATE (need strategic decision)
201
+
202
+ ### If REVISE_METHOD
203
+ Address implementation issues and re-run experiment.
204
+
205
+ **Action:** Implement recommendations above, then run experiment again
206
+
207
+ **Specific fixes needed:**
208
+ {{prioritized_list_of_fixes}}
209
+
210
+ **Expected impact:**
211
+ {{what_should_improve_if_fixes_applied}}
212
+
213
+ **Estimated effort:** {{low|medium|high}}
214
+
215
+ ### If REVISE_DATA
216
+ Return to data exploration with specific concerns.
217
+
218
+ **Action:** Run `/grd:explore` with focus areas
219
+
220
+ **Concerns to investigate:**
221
+ {{list_of_specific_data_concerns}}
222
+
223
+ **What to look for:**
224
+ {{guidance_for_data_re_analysis}}
225
+
226
+ **Updates needed:**
227
+ - Append findings to DATA_REPORT.md
228
+ - Update OBJECTIVE.md if constraints change
229
+ - Re-run experiment with corrected data
230
+
231
+ ### If ESCALATE
232
+ Human decision required—cannot determine clear path forward.
233
+
234
+ **Reason for escalation:** {{cycle_detected | ambiguous_root_cause | iteration_limit | strategic_decision_needed}}
235
+
236
+ **Evidence Package:**
237
+
238
+ #### Iteration History
239
+ {{summary_of_all_attempts}}
240
+
241
+ #### Conflicting Signals
242
+ {{description_of_ambiguity_or_contradiction}}
243
+
244
+ #### Attempted Resolutions
245
+ {{what_was_tried_and_why_it_didnt_work}}
246
+
247
+ #### Recommendation
248
+ {{suggested_strategic_direction_or_questions_for_human}}
249
+
250
+ **Human Options:**
251
+ 1. Continue with more iterations (increase limit)
252
+ 2. Revise hypothesis or success criteria (update OBJECTIVE.md)
253
+ 3. Archive hypothesis as disproven (document learnings)
254
+ 4. Return to data collection (need more/better data)
255
+ 5. Strategic pivot (fundamentally different approach)
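The Next Steps section amounts to a lookup from verdict and confidence to an action. The sketch below restates that mapping with the action strings taken from the template; `next_step` is a hypothetical helper, and the assumption that confidence only changes the route for `PROCEED` verdicts is inferred from the section headings rather than stated outright.

```python
PROCEED_ACTIONS = {
    "HIGH": "Run /grd:evaluate to generate SCORECARD.json",
    "MEDIUM": "Proceed to Evaluator with caveats",
    "LOW": "Human gate required before proceeding",
}

OTHER_ACTIONS = {
    "REVISE_METHOD": "Implement recommendations, then run experiment again",
    "REVISE_DATA": "Run /grd:explore with focus areas",
    "ESCALATE": "Human decision required",
}


def next_step(verdict: str, confidence: str) -> str:
    """Return the action the template lists for a verdict and confidence."""
    if verdict == "PROCEED":
        if confidence not in PROCEED_ACTIONS:
            raise ValueError(f"unknown confidence: {confidence}")
        return PROCEED_ACTIONS[confidence]
    if verdict in OTHER_ACTIONS:
        # Assumption: confidence does not alter routing for these verdicts.
        return OTHER_ACTIONS[verdict]
    raise ValueError(f"unknown verdict: {verdict}")


print(next_step("PROCEED", "LOW"))
print(next_step("REVISE_DATA", "HIGH"))
```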
256
+
257
+ ## Appendix
258
+
259
+ ### Falsification Criteria Status
260
+
261
+ {{if_falsification_criteria_defined_in_OBJECTIVE}}
262
+
263
+ | Criterion | Status | Notes |
264
+ |-----------|--------|-------|
265
+ | {{criterion_name}} | {{not_met \| approaching \| met}} | {{details}} |
266
+
267
+ **Assessment:** {{hypothesis_still_viable | approaching_falsification | falsified}}
268
+
269
+ ### Experiment Metadata
270
+
271
+ - **Run directory:** {{path_to_run_NNN}}
272
+ - **Code files:** {{list_of_key_files}}
273
+ - **Configuration:** {{path_to_config_yaml_or_none}}
274
+ - **Documentation:** {{path_to_README_or_none}}
275
+ - **Training time:** {{duration_in_seconds_or_minutes}}
276
+ - **Compute resources:** {{cpu|gpu|tpu}} - {{details}}
277
+
278
+ ### References
279
+
280
+ - **OBJECTIVE.md:** `.planning/OBJECTIVE.md`
281
+ - **DATA_REPORT.md:** {{path_or_none}}
282
+ - **Previous iterations:** {{paths_to_previous_CRITIC_LOGs}}
283
+
284
+ ---
285
+
286
+ *Critique by grd-critic*
287
+ *Agent version: GRD Critic v1.0*
288
+ *Referenced: .planning/OBJECTIVE.md*