get-research-done 1.1.0
- package/LICENSE +21 -0
- package/README.md +560 -0
- package/agents/grd-architect.md +789 -0
- package/agents/grd-codebase-mapper.md +738 -0
- package/agents/grd-critic.md +1065 -0
- package/agents/grd-debugger.md +1203 -0
- package/agents/grd-evaluator.md +948 -0
- package/agents/grd-executor.md +784 -0
- package/agents/grd-explorer.md +2063 -0
- package/agents/grd-graduator.md +484 -0
- package/agents/grd-integration-checker.md +423 -0
- package/agents/grd-phase-researcher.md +641 -0
- package/agents/grd-plan-checker.md +745 -0
- package/agents/grd-planner.md +1386 -0
- package/agents/grd-project-researcher.md +865 -0
- package/agents/grd-research-synthesizer.md +256 -0
- package/agents/grd-researcher.md +2361 -0
- package/agents/grd-roadmapper.md +605 -0
- package/agents/grd-verifier.md +778 -0
- package/bin/install.js +1294 -0
- package/commands/grd/add-phase.md +207 -0
- package/commands/grd/add-todo.md +193 -0
- package/commands/grd/architect.md +283 -0
- package/commands/grd/audit-milestone.md +277 -0
- package/commands/grd/check-todos.md +228 -0
- package/commands/grd/complete-milestone.md +136 -0
- package/commands/grd/debug.md +169 -0
- package/commands/grd/discuss-phase.md +86 -0
- package/commands/grd/evaluate.md +1095 -0
- package/commands/grd/execute-phase.md +339 -0
- package/commands/grd/explore.md +258 -0
- package/commands/grd/graduate.md +323 -0
- package/commands/grd/help.md +482 -0
- package/commands/grd/insert-phase.md +227 -0
- package/commands/grd/insights.md +231 -0
- package/commands/grd/join-discord.md +18 -0
- package/commands/grd/list-phase-assumptions.md +50 -0
- package/commands/grd/map-codebase.md +71 -0
- package/commands/grd/new-milestone.md +721 -0
- package/commands/grd/new-project.md +1008 -0
- package/commands/grd/pause-work.md +134 -0
- package/commands/grd/plan-milestone-gaps.md +295 -0
- package/commands/grd/plan-phase.md +525 -0
- package/commands/grd/progress.md +364 -0
- package/commands/grd/quick-explore.md +236 -0
- package/commands/grd/quick.md +309 -0
- package/commands/grd/remove-phase.md +349 -0
- package/commands/grd/research-phase.md +200 -0
- package/commands/grd/research.md +681 -0
- package/commands/grd/resume-work.md +40 -0
- package/commands/grd/set-profile.md +106 -0
- package/commands/grd/settings.md +136 -0
- package/commands/grd/update.md +172 -0
- package/commands/grd/verify-work.md +219 -0
- package/get-research-done/config/default.json +15 -0
- package/get-research-done/references/checkpoints.md +1078 -0
- package/get-research-done/references/continuation-format.md +249 -0
- package/get-research-done/references/git-integration.md +254 -0
- package/get-research-done/references/model-profiles.md +73 -0
- package/get-research-done/references/planning-config.md +94 -0
- package/get-research-done/references/questioning.md +141 -0
- package/get-research-done/references/tdd.md +263 -0
- package/get-research-done/references/ui-brand.md +160 -0
- package/get-research-done/references/verification-patterns.md +612 -0
- package/get-research-done/templates/DEBUG.md +159 -0
- package/get-research-done/templates/UAT.md +247 -0
- package/get-research-done/templates/archive-reason.md +195 -0
- package/get-research-done/templates/codebase/architecture.md +255 -0
- package/get-research-done/templates/codebase/concerns.md +310 -0
- package/get-research-done/templates/codebase/conventions.md +307 -0
- package/get-research-done/templates/codebase/integrations.md +280 -0
- package/get-research-done/templates/codebase/stack.md +186 -0
- package/get-research-done/templates/codebase/structure.md +285 -0
- package/get-research-done/templates/codebase/testing.md +480 -0
- package/get-research-done/templates/config.json +35 -0
- package/get-research-done/templates/context.md +283 -0
- package/get-research-done/templates/continue-here.md +78 -0
- package/get-research-done/templates/critic-log.md +288 -0
- package/get-research-done/templates/data-report.md +173 -0
- package/get-research-done/templates/debug-subagent-prompt.md +91 -0
- package/get-research-done/templates/decision-log.md +58 -0
- package/get-research-done/templates/decision.md +138 -0
- package/get-research-done/templates/discovery.md +146 -0
- package/get-research-done/templates/experiment-readme.md +104 -0
- package/get-research-done/templates/graduated-script.md +180 -0
- package/get-research-done/templates/iteration-summary.md +234 -0
- package/get-research-done/templates/milestone-archive.md +123 -0
- package/get-research-done/templates/milestone.md +115 -0
- package/get-research-done/templates/objective.md +271 -0
- package/get-research-done/templates/phase-prompt.md +567 -0
- package/get-research-done/templates/planner-subagent-prompt.md +117 -0
- package/get-research-done/templates/project.md +184 -0
- package/get-research-done/templates/requirements.md +231 -0
- package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
- package/get-research-done/templates/research-project/FEATURES.md +147 -0
- package/get-research-done/templates/research-project/PITFALLS.md +200 -0
- package/get-research-done/templates/research-project/STACK.md +120 -0
- package/get-research-done/templates/research-project/SUMMARY.md +170 -0
- package/get-research-done/templates/research.md +529 -0
- package/get-research-done/templates/roadmap.md +202 -0
- package/get-research-done/templates/scorecard.json +113 -0
- package/get-research-done/templates/state.md +287 -0
- package/get-research-done/templates/summary.md +246 -0
- package/get-research-done/templates/user-setup.md +311 -0
- package/get-research-done/templates/verification-report.md +322 -0
- package/get-research-done/workflows/complete-milestone.md +756 -0
- package/get-research-done/workflows/diagnose-issues.md +231 -0
- package/get-research-done/workflows/discovery-phase.md +289 -0
- package/get-research-done/workflows/discuss-phase.md +433 -0
- package/get-research-done/workflows/execute-phase.md +657 -0
- package/get-research-done/workflows/execute-plan.md +1844 -0
- package/get-research-done/workflows/list-phase-assumptions.md +178 -0
- package/get-research-done/workflows/map-codebase.md +322 -0
- package/get-research-done/workflows/resume-project.md +307 -0
- package/get-research-done/workflows/transition.md +556 -0
- package/get-research-done/workflows/verify-phase.md +628 -0
- package/get-research-done/workflows/verify-work.md +596 -0
- package/hooks/dist/grd-check-update.js +61 -0
- package/hooks/dist/grd-statusline.js +84 -0
- package/package.json +47 -0
- package/scripts/audit-help-commands.sh +115 -0
- package/scripts/build-hooks.js +42 -0
- package/scripts/verify-all-commands.sh +246 -0
- package/scripts/verify-architect-warning.sh +35 -0
- package/scripts/verify-insights-mode.sh +40 -0
- package/scripts/verify-quick-mode.sh +20 -0
- package/scripts/verify-revise-data-routing.sh +139 -0
@@ -0,0 +1,283 @@
# Phase Context Template

Template for `.planning/phases/XX-name/{phase}-CONTEXT.md` - captures implementation decisions for a phase.

**Purpose:** Document decisions that downstream agents need. Researcher uses this to know WHAT to investigate. Planner uses this to know WHAT choices are locked vs flexible.

**Key principle:** Categories are NOT predefined. They emerge from what was actually discussed for THIS phase. A CLI phase has CLI-relevant sections, a UI phase has UI-relevant sections.

**Downstream consumers:**
- `grd-phase-researcher` — Reads decisions to focus research (e.g., "card layout" → research card component patterns)
- `grd-planner` — Reads decisions to create specific tasks (e.g., "infinite scroll" → task includes virtualization)

---

## File Template

```markdown
# Phase [X]: [Name] - Context

**Gathered:** [date]
**Status:** Ready for planning

<domain>
## Phase Boundary

[Clear statement of what this phase delivers — the scope anchor. This comes from ROADMAP.md and is fixed. Discussion clarifies implementation within this boundary.]

</domain>

<decisions>
## Implementation Decisions

### [Area 1 that was discussed]
- [Specific decision made]
- [Another decision if applicable]

### [Area 2 that was discussed]
- [Specific decision made]

### [Area 3 that was discussed]
- [Specific decision made]

### Claude's Discretion
[Areas where user explicitly said "you decide" — Claude has flexibility here during planning/implementation]

</decisions>

<specifics>
## Specific Ideas

[Any particular references, examples, or "I want it like X" moments from discussion. Product references, specific behaviors, interaction patterns.]

[If none: "No specific requirements — open to standard approaches"]

</specifics>

<deferred>
## Deferred Ideas

[Ideas that came up during discussion but belong in other phases. Captured here so they're not lost, but explicitly out of scope for this phase.]

[If none: "None — discussion stayed within phase scope"]

</deferred>

---

*Phase: XX-name*
*Context gathered: [date]*
```
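
The tagged sections in the file template lend themselves to mechanical extraction by a downstream consumer. The following is a minimal sketch of how that might look, not the package's actual parser: the function names `split_context_sections` and `decisions_by_area` are hypothetical, and it assumes each tag appears at most once and that decisions are written as `- ` bullets under `### ` headings, as in the template.

```python
import re

# Section tags as they appear in the file template.
SECTION_TAGS = ("domain", "decisions", "specifics", "deferred")


def split_context_sections(text: str) -> dict[str, str]:
    """Return the body of each tagged section, keyed by tag name (hypothetical helper)."""
    sections = {}
    for tag in SECTION_TAGS:
        match = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", text, re.DOTALL)
        if match:
            sections[tag] = match.group(1)
    return sections


def decisions_by_area(decisions_body: str) -> dict[str, list[str]]:
    """Group '- ' bullets under their '### <area>' headings (hypothetical helper)."""
    areas: dict[str, list[str]] = {}
    current = None
    for line in decisions_body.splitlines():
        line = line.strip()
        if line.startswith("### "):
            current = line[4:]
            areas[current] = []
        elif line.startswith("- ") and current is not None:
            areas[current].append(line[2:])
    return areas


sample = """<decisions>
## Implementation Decisions

### Loading behavior
- Infinite scroll, not pagination
- Pull-to-refresh on mobile

### Claude's Discretion
- Loading skeleton design
</decisions>"""

sections = split_context_sections(sample)
areas = decisions_by_area(sections["decisions"])
print(areas["Loading behavior"])
```

Because categories are not predefined, the sketch keys decisions by whatever area headings the file contains rather than by a fixed list.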

<good_examples>

**Example 1: Visual feature (Post Feed)**

```markdown
# Phase 3: Post Feed - Context

**Gathered:** 2025-01-20
**Status:** Ready for planning

<domain>
## Phase Boundary

Display posts from followed users in a scrollable feed. Users can view posts and see engagement counts. Creating posts and interactions are separate phases.

</domain>

<decisions>
## Implementation Decisions

### Layout style
- Card-based layout, not timeline or list
- Each card shows: author avatar, name, timestamp, full post content, reaction counts
- Cards have subtle shadows, rounded corners — modern feel

### Loading behavior
- Infinite scroll, not pagination
- Pull-to-refresh on mobile
- New posts indicator at top ("3 new posts") rather than auto-inserting

### Empty state
- Friendly illustration + "Follow people to see posts here"
- Suggest 3-5 accounts to follow based on interests

### Claude's Discretion
- Loading skeleton design
- Exact spacing and typography
- Error state handling

</decisions>

<specifics>
## Specific Ideas

- "I like how Twitter shows the new posts indicator without disrupting your scroll position"
- Cards should feel like Linear's issue cards — clean, not cluttered

</specifics>

<deferred>
## Deferred Ideas

- Commenting on posts — Phase 5
- Bookmarking posts — add to backlog

</deferred>

---

*Phase: 03-post-feed*
*Context gathered: 2025-01-20*
```

**Example 2: CLI tool (Database backup)**

```markdown
# Phase 2: Backup Command - Context

**Gathered:** 2025-01-20
**Status:** Ready for planning

<domain>
## Phase Boundary

CLI command to backup database to local file or S3. Supports full and incremental backups. Restore command is a separate phase.

</domain>

<decisions>
## Implementation Decisions

### Output format
- JSON for programmatic use, table format for humans
- Default to table, --json flag for JSON
- Verbose mode (-v) shows progress, silent by default

### Flag design
- Short flags for common options: -o (output), -v (verbose), -f (force)
- Long flags for clarity: --incremental, --compress, --encrypt
- Required: database connection string (positional or --db)

### Error recovery
- Retry 3 times on network failure, then fail with clear message
- --no-retry flag to fail fast
- Partial backups are deleted on failure (no corrupt files)

### Claude's Discretion
- Exact progress bar implementation
- Compression algorithm choice
- Temp file handling

</decisions>

<specifics>
## Specific Ideas

- "I want it to feel like pg_dump — familiar to database people"
- Should work in CI pipelines (exit codes, no interactive prompts)

</specifics>

<deferred>
## Deferred Ideas

- Scheduled backups — separate phase
- Backup rotation/retention — add to backlog

</deferred>

---

*Phase: 02-backup-command*
*Context gathered: 2025-01-20*
```
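
To illustrate how concrete the decisions in Example 2 are, here is a rough sketch of what a planner could derive from them using Python's standard `argparse`. It is an illustration under assumptions, not part of the package: the program name `backup`, the long spellings `--output`, `--verbose` and `--force`, and the helper names are hypothetical, and "retry 3 times" is read as one initial attempt plus up to three retries.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="backup")  # program name is hypothetical
    # Required connection string, given either positionally or via --db.
    parser.add_argument("db_positional", nargs="?", metavar="DB")
    parser.add_argument("--db")
    # Short flags for common options (the long spellings are assumptions).
    parser.add_argument("-o", "--output")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-f", "--force", action="store_true")
    # Long flags for clarity.
    parser.add_argument("--incremental", action="store_true")
    parser.add_argument("--compress", action="store_true")
    parser.add_argument("--encrypt", action="store_true")
    parser.add_argument("--json", action="store_true")  # default output is a table
    parser.add_argument("--no-retry", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    args.db = args.db or args.db_positional
    if not args.db:
        parser.error("a database connection string is required (positional or --db)")
    return args


def run_with_retry(do_backup, cleanup_partial, retries=3, no_retry=False):
    """Call do_backup; on a network failure delete partial output and retry."""
    attempts = 1 if no_retry else 1 + retries
    last_error = None
    for _ in range(attempts):
        try:
            return do_backup()
        except ConnectionError as exc:
            last_error = exc
            cleanup_partial()  # never leave a partial backup behind
    # Non-zero exit with a clear message, no interactive prompt (CI friendly).
    raise SystemExit(f"backup failed after {attempts} attempt(s): {last_error}")


args = parse_args(["postgres://localhost/app", "--json", "--no-retry"])
print(args.db, args.json, args.no_retry)
```

Items left to Claude's discretion in the example (progress bar, compression algorithm, temp file handling) are deliberately absent from the sketch.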

**Example 3: Organization task (Photo library)**

```markdown
# Phase 1: Photo Organization - Context

**Gathered:** 2025-01-20
**Status:** Ready for planning

<domain>
## Phase Boundary

Organize existing photo library into structured folders. Handle duplicates and apply consistent naming. Tagging and search are separate phases.

</domain>

<decisions>
## Implementation Decisions

### Grouping criteria
- Primary grouping by year, then by month
- Events detected by time clustering (photos within 2 hours = same event)
- Event folders named by date + location if available

### Duplicate handling
- Keep highest resolution version
- Move duplicates to _duplicates folder (don't delete)
- Log all duplicate decisions for review

### Naming convention
- Format: YYYY-MM-DD_HH-MM-SS_originalname.ext
- Preserve original filename as suffix for searchability
- Handle name collisions with incrementing suffix

### Claude's Discretion
- Exact clustering algorithm
- How to handle photos with no EXIF data
- Folder emoji usage

</decisions>

<specifics>
## Specific Ideas

- "I want to be able to find photos by roughly when they were taken"
- Don't delete anything — worst case, move to a review folder

</specifics>

<deferred>
## Deferred Ideas

- Face detection grouping — future phase
- Cloud sync — out of scope for now

</deferred>

---

*Phase: 01-photo-organization*
*Context gathered: 2025-01-20*
```
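
The naming convention and time-clustering rule in Example 3 are specific enough to sketch directly. The code below is one possible reading, offered as an assumption rather than the intended algorithm (the example leaves the exact clustering algorithm to Claude's discretion): "within 2 hours" is interpreted as a gap of at most two hours between consecutive photos, and the names `organized_name`, `cluster_events` and `EVENT_GAP` are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Assumed reading of "photos within 2 hours = same event".
EVENT_GAP = timedelta(hours=2)


def organized_name(taken_at: datetime, original: str) -> str:
    """Build YYYY-MM-DD_HH-MM-SS_originalname.ext from capture time and filename."""
    path = Path(original)
    return f"{taken_at:%Y-%m-%d_%H-%M-%S}_{path.stem}{path.suffix}"


def cluster_events(timestamps: list[datetime]) -> list[list[datetime]]:
    """Group timestamps in time order; a gap over EVENT_GAP starts a new event."""
    events: list[list[datetime]] = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] <= EVENT_GAP:
            events[-1].append(ts)
        else:
            events.append([ts])
    return events


shots = [
    datetime(2024, 7, 4, 9, 0, 0),
    datetime(2024, 7, 4, 10, 30, 0),
    datetime(2024, 7, 4, 15, 0, 0),
]
events = cluster_events(shots)
print(len(events))
print(organized_name(shots[0], "IMG_0042.JPG"))
```

Collision handling, duplicate detection and photos without EXIF data are left out; the decisions in the example cover them separately.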

</good_examples>

<guidelines>
**This template captures DECISIONS for downstream agents.**

The output should answer: "What does the researcher need to investigate? What choices are locked for the planner?"

**Good content (concrete decisions):**
- "Card-based layout, not timeline"
- "Retry 3 times on network failure, then fail"
- "Group by year, then by month"
- "JSON for programmatic use, table for humans"

**Bad content (too vague):**
- "Should feel modern and clean"
- "Good user experience"
- "Fast and responsive"
- "Easy to use"

**After creation:**
- File lives in phase directory: `.planning/phases/XX-name/{phase}-CONTEXT.md`
- `grd-phase-researcher` uses decisions to focus investigation
- `grd-planner` uses decisions + research to create executable tasks
- Downstream agents should NOT need to ask the user again about captured decisions
</guidelines>
@@ -0,0 +1,78 @@
# Continue-Here Template

Copy and fill this structure for `.planning/phases/XX-name/.continue-here.md`:

```yaml
---
phase: XX-name
task: 3
total_tasks: 7
status: in_progress
last_updated: 2025-01-15T14:30:00Z
---
```

```markdown
<current_state>
[Where exactly are we? What's the immediate context?]
</current_state>

<completed_work>
[What got done this session - be specific]

- Task 1: [name] - Done
- Task 2: [name] - Done
- Task 3: [name] - In progress, [what's done on it]
</completed_work>

<remaining_work>
[What's left in this phase]

- Task 3: [name] - [what's left to do]
- Task 4: [name] - Not started
- Task 5: [name] - Not started
</remaining_work>

<decisions_made>
[Key decisions and why - so next session doesn't re-debate]

- Decided to use [X] because [reason]
- Chose [approach] over [alternative] because [reason]
</decisions_made>

<blockers>
[Anything stuck or waiting on external factors]

- [Blocker 1]: [status/workaround]
</blockers>

<context>
[Mental state, "vibe", anything that helps resume smoothly]

[What were you thinking about? What was the plan?
This is the "pick up exactly where you left off" context.]
</context>

<next_action>
[The very first thing to do when resuming]

Start with: [specific action]
</next_action>
```

<yaml_fields>
Required YAML frontmatter:

- `phase`: Directory name (e.g., `02-authentication`)
- `task`: Current task number
- `total_tasks`: How many tasks in phase
- `status`: `in_progress`, `blocked`, `almost_done`
- `last_updated`: ISO timestamp
</yaml_fields>
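
The required fields above can be checked before a resume is attempted. What follows is a small sketch using only the standard library, not the package's own validation: `read_frontmatter` and `validate` are hypothetical names, the parser handles only flat `key: value` pairs rather than full YAML, and treating the three listed `status` values as the complete set is an assumption.

```python
from datetime import datetime

REQUIRED_FIELDS = ("phase", "task", "total_tasks", "status", "last_updated")
# Assumption: the listed status values are exhaustive.
ALLOWED_STATUS = {"in_progress", "blocked", "almost_done"}


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' pairs between the first two '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---'")


def validate(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the frontmatter looks valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    if problems:
        return problems
    if fields["status"] not in ALLOWED_STATUS:
        problems.append(f"unknown status: {fields['status']}")
    if not (fields["task"].isdigit() and fields["total_tasks"].isdigit()):
        problems.append("task and total_tasks must be integers")
    try:
        # Older Python versions do not accept a trailing 'Z', so normalise it.
        datetime.fromisoformat(fields["last_updated"].replace("Z", "+00:00"))
    except ValueError:
        problems.append("last_updated is not an ISO timestamp")
    return problems


sample = """---
phase: 02-authentication
task: 3
total_tasks: 7
status: in_progress
last_updated: 2025-01-15T14:30:00Z
---"""

print(validate(read_frontmatter(sample)))
```

A real implementation would more likely use a YAML library; the hand-rolled parser keeps the sketch dependency-free.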

<guidelines>
- Be specific enough that a fresh Claude instance understands immediately
- Include WHY decisions were made, not just what
- The `<next_action>` should be actionable without reading anything else
- This file gets DELETED after resume - it's not permanent storage
</guidelines>
@@ -0,0 +1,288 @@
# Critic Evaluation: {{run_name}}

**Timestamp:** {{timestamp}}
**Iteration:** {{iteration_number}}
**Objective:** {{brief_hypothesis}}

---

## Verdict

**Decision:** {{PROCEED | REVISE_METHOD | REVISE_DATA | ESCALATE}}
**Confidence:** {{HIGH | MEDIUM | LOW}}

## Reasoning

{{explanation_of_routing_decision}}

{{context_for_why_this_verdict_makes_sense}}

{{evidence_supporting_decision}}

## Metrics Summary

| Metric | Value | Threshold | Comparison | Result |
|--------|-------|-----------|------------|--------|
| {{metric_name}} | {{value}} | {{threshold}} | {{>|<|=}} | {{PASS|FAIL}} |

**Composite Score:** {{weighted_average}} (threshold: {{composite_threshold}})

**Baseline Comparison:** {{if_baseline_defined}}

| Metric | Baseline | Actual | Improvement | % Change |
|--------|----------|--------|-------------|----------|
| {{metric_name}} | {{baseline_value}} | {{actual_value}} | {{delta}} | {{percentage}} |
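
The columns of the two metrics tables can be computed rather than filled in by hand. The sketch below shows one way, under stated assumptions: the `Metric` class, the per-metric `weight` field and the helper names are hypothetical, the comparison is taken literally as strict `>`, `<` or `=`, and the composite score is assumed to be a weighted average of raw metric values, as the `{{weighted_average}}` placeholder suggests.

```python
import operator
from dataclasses import dataclass

# The three comparison symbols used in the Metrics Summary table.
COMPARATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass
class Metric:
    name: str
    value: float
    threshold: float
    comparison: str      # one of ">", "<", "="
    weight: float = 1.0  # hypothetical: weights are not defined in the template

    @property
    def result(self) -> str:
        passed = COMPARATORS[self.comparison](self.value, self.threshold)
        return "PASS" if passed else "FAIL"


def composite_score(metrics: list[Metric]) -> float:
    """Weighted average of metric values (assumed meaning of the composite score)."""
    total_weight = sum(m.weight for m in metrics)
    return sum(m.value * m.weight for m in metrics) / total_weight


def baseline_row(baseline: float, actual: float) -> tuple[float, float]:
    """Return (improvement, percent change) for the Baseline Comparison table."""
    delta = actual - baseline
    percent = delta / baseline * 100 if baseline else float("nan")
    return delta, percent


metrics = [
    Metric("f1", 0.78, 0.80, ">", weight=2.0),
    Metric("accuracy", 0.90, 0.85, ">", weight=1.0),
]
for m in metrics:
    print(m.name, m.result)
print(f"{composite_score(metrics):.2f}")
```

Whether a value exactly at the threshold should pass is a policy question the template does not settle; the sketch fails it under `>`.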

## Strengths

{{list_of_what_experiment_does_well}}

Examples:
- Implementation correctly uses stratified k-fold as specified in OBJECTIVE.md
- Random seed set to 42 for reproducibility
- Clear documentation in README.md
- Hyperparameters well-documented in config.yaml
- Code quality is high with proper error handling
- Training/validation curves show healthy learning behavior

## Weaknesses

{{list_of_issues_or_concerns}}

Examples:
- F1 score (0.78) below threshold (0.80)
- Train-test gap of 0.08 suggests mild overfitting
- Learning rate may be too high (training loss plateaus early)
- Missing validation curves in output
- Evaluation methodology doesn't match OBJECTIVE.md (used holdout instead of k-fold)
- Random seed not set (non-reproducible results)

## Recommendations

{{list_of_specific_actionable_suggestions}}

**For REVISE_METHOD verdicts:**
- Reduce learning rate from 0.1 to 0.01
- Add dropout layer with rate 0.3 to reduce overfitting
- Increase training epochs from 50 to 100 (training curve not plateaued)
- Add early stopping with patience=10 to prevent overfitting
- Fix data split bug on line 45 in train.py
- Add missing metrics to output (currently missing F1 score)

**For REVISE_DATA verdicts:**
- Investigate feature 'transaction_id' for potential leakage (dominates feature importance)
- Re-analyze temporal features for leakage (results suggest future information used)
- Verify target column is correct (baseline outperforms model significantly)
- Check for train-test overlap (metrics suggest data contamination)
- Investigate data quality issues (high variance across folds)

**For PROCEED verdicts:**
- Document validation approach in final report
- Consider additional robustness checks before production
- Monitor for drift in production deployment

**For ESCALATE verdicts:**
- Human decision required (see evidence package below)
- Consider revising hypothesis or success criteria
- May need to collect additional data
- Strategic pivot may be necessary

## Investigation Notes

{{notes_from_scientific_skepticism_checks}}

### Suspicious Success Check

{{result_of_investigation_for_unusually_high_metrics}}

- Metrics: {{list_metrics_and_values}}
- Task complexity: {{assessment_of_difficulty}}
- Assessment: {{plausible | suspicious | highly_suspicious}}
- Reasoning: {{why}}

### Train-Test Gap

- Train metric: {{value}}
- Validation metric: {{value}}
- Gap: {{delta}}
- Assessment: {{acceptable | moderate_concern | high_concern}}
- Reasoning: {{why}}

### Reproducibility

- Random seed set: {{yes|no}}
- Dependencies documented: {{yes|no}}
- Data references recorded: {{yes|no}}
- Assessment: {{reproducible | partially_reproducible | non_reproducible}}

### Data Integrity

{{if_DATA_REPORT_referenced}}

- Leakage features excluded: {{yes|no|N/A}}
- Class imbalance handled: {{yes|no|N/A}}
- Temporal splits used if needed: {{yes|no|N/A}}
- Assessment: {{concerns_none | concerns_minor | concerns_major}}

### Code Quality

- Evaluation matches OBJECTIVE.md: {{yes|no}}
- Data split correct: {{yes|no}}
- Hyperparameters documented: {{yes|no}}
- Error handling present: {{yes|no}}
- Assessment: {{good | acceptable | needs_improvement}}

## Trend Analysis

**Iteration Trend:** {{improving | stagnant | degrading | first_run}}

{{comparison_with_previous_iterations_if_available}}

**Historical Performance:**

| Iteration | Composite Score | Key Changes | Verdict |
|-----------|----------------|-------------|---------|
| 1 | {{value}} | {{change_description}} | {{verdict}} |
| 2 | {{value}} | {{change_description}} | {{verdict}} |
| 3 (current) | {{value}} | {{change_description}} | {{verdict}} |

**Trend Assessment:**

{{detailed_analysis_of_progress_across_iterations}}

Examples:
- "Metrics improving steadily (+0.02 per iteration). Current trajectory suggests threshold will be reached in 1-2 more iterations."
- "Metrics stagnant across 3 iterations despite different hyperparameters. May indicate fundamental limitation."
- "Metrics degrading. Recent changes counterproductive—consider reverting to iteration 1 approach."

**Cycle Detection:**

{{if_same_verdict_repeated}}

- Same verdict: {{verdict}} repeated {{N}} times
- Assessment: {{no_cycle | potential_cycle | cycle_detected}}
- Action: {{continue | escalate | try_different_approach}}
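
The cycle detection fields can be derived from the verdict column of the historical performance table. The sketch below counts how often the latest verdict repeats at the end of the history and maps that count to an assessment. The cut-offs of 2 and 3 repeats are purely illustrative assumptions, since the template does not define them, and the function names are hypothetical.

```python
def trailing_repeats(verdicts: list[str]) -> int:
    """Count how many times the most recent verdict repeats at the end of the history."""
    if not verdicts:
        return 0
    count = 0
    for verdict in reversed(verdicts):
        if verdict != verdicts[-1]:
            break
        count += 1
    return count


def assess_cycle(verdicts: list[str], potential_at: int = 2, detected_at: int = 3) -> str:
    """Map the repeat count to an assessment; both thresholds are assumed values."""
    repeats = trailing_repeats(verdicts)
    if repeats >= detected_at:
        return "cycle_detected"
    if repeats >= potential_at:
        return "potential_cycle"
    return "no_cycle"


history = ["REVISE_DATA", "REVISE_METHOD", "REVISE_METHOD"]
print(trailing_repeats(history), assess_cycle(history))
```

Only consecutive repeats at the end of the history are counted, so an alternating pattern such as REVISE_METHOD, REVISE_DATA, REVISE_METHOD is not flagged by this sketch.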

## Next Steps

{{based_on_verdict}}

### If PROCEED (HIGH confidence)
Ready for quantitative evaluation by Evaluator agent.

**Action:** Run `/grd:evaluate` to generate SCORECARD.json

**What happens next:**
- Evaluator will run comprehensive benchmark suite
- Results will be compared against OBJECTIVE.md criteria
- SCORECARD.json will be generated for human evaluation gate

### If PROCEED (MEDIUM confidence)
Metrics meet criteria but minor concerns noted.

**Action:** Proceed to Evaluator with caveats

**Caveats:**
{{list_of_minor_concerns_to_monitor}}

### If PROCEED (LOW confidence)
**HUMAN GATE REQUIRED**

Metrics pass thresholds but concerns exist:
{{list_of_concerns}}

**Question for human:**
Should we proceed to Evaluator despite concerns, or investigate further?

**Options:**
1. Proceed to Evaluator (accept concerns)
2. REVISE_METHOD (address concerns first)
3. ESCALATE (need strategic decision)

### If REVISE_METHOD
Address implementation issues and re-run experiment.

**Action:** Implement recommendations above, then run experiment again

**Specific fixes needed:**
{{prioritized_list_of_fixes}}

**Expected impact:**
{{what_should_improve_if_fixes_applied}}

**Estimated effort:** {{low|medium|high}}

### If REVISE_DATA
Return to data exploration with specific concerns.

**Action:** Run `/grd:explore` with focus areas

**Concerns to investigate:**
{{list_of_specific_data_concerns}}

**What to look for:**
{{guidance_for_data_re_analysis}}

**Updates needed:**
- Append findings to DATA_REPORT.md
- Update OBJECTIVE.md if constraints change
- Re-run experiment with corrected data

### If ESCALATE
Human decision required—cannot determine clear path forward.

**Reason for escalation:** {{cycle_detected | ambiguous_root_cause | iteration_limit | strategic_decision_needed}}

**Evidence Package:**

#### Iteration History
{{summary_of_all_attempts}}

#### Conflicting Signals
{{description_of_ambiguity_or_contradiction}}

#### Attempted Resolutions
{{what_was_tried_and_why_it_didnt_work}}

#### Recommendation
{{suggested_strategic_direction_or_questions_for_human}}

**Human Options:**
1. Continue with more iterations (increase limit)
2. Revise hypothesis or success criteria (update OBJECTIVE.md)
3. Archive hypothesis as disproven (document learnings)
4. Return to data collection (need more/better data)
5. Strategic pivot (fundamentally different approach)
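
The branches of the Next Steps section amount to a lookup from verdict and confidence to an action. A compact sketch of that routing is shown here. The action strings paraphrase the section, and the function name `next_step` and the two dictionaries are hypothetical; how the package itself routes verdicts is not shown in this template.

```python
VERDICTS = {"PROCEED", "REVISE_METHOD", "REVISE_DATA", "ESCALATE"}
CONFIDENCES = {"HIGH", "MEDIUM", "LOW"}

# Confidence only changes the route for PROCEED verdicts.
PROCEED_STEPS = {
    "HIGH": "Run /grd:evaluate to generate SCORECARD.json",
    "MEDIUM": "Proceed to Evaluator with caveats",
    "LOW": "Human gate: proceed, revise the method, or escalate",
}
OTHER_STEPS = {
    "REVISE_METHOD": "Implement the recommendations, then run the experiment again",
    "REVISE_DATA": "Run /grd:explore with focus areas",
    "ESCALATE": "Human decision required: hand over the evidence package",
}


def next_step(decision: str, confidence: str) -> str:
    """Return the action for a verdict, mirroring the Next Steps headings."""
    if decision not in VERDICTS or confidence not in CONFIDENCES:
        raise ValueError(f"unknown verdict or confidence: {decision}, {confidence}")
    if decision == "PROCEED":
        return PROCEED_STEPS[confidence]
    return OTHER_STEPS[decision]


print(next_step("PROCEED", "LOW"))
print(next_step("REVISE_DATA", "HIGH"))
```

Keeping the mapping as data makes it easy to see that every verdict in the Verdict section has exactly one route.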

## Appendix

### Falsification Criteria Status

{{if_falsification_criteria_defined_in_OBJECTIVE}}

| Criterion | Status | Notes |
|-----------|--------|-------|
| {{criterion_name}} | {{not_met | approaching | met}} | {{details}} |

**Assessment:** {{hypothesis_still_viable | approaching_falsification | falsified}}

### Experiment Metadata

- **Run directory:** {{path_to_run_NNN}}
- **Code files:** {{list_of_key_files}}
- **Configuration:** {{path_to_config_yaml_or_none}}
- **Documentation:** {{path_to_README_or_none}}
- **Training time:** {{duration_in_seconds_or_minutes}}
- **Compute resources:** {{cpu|gpu|tpu}} - {{details}}

### References

- **OBJECTIVE.md:** `.planning/OBJECTIVE.md`
- **DATA_REPORT.md:** {{path_or_none}}
- **Previous iterations:** {{paths_to_previous_CRITIC_LOGs}}

---

*Critique by grd-critic*
*Agent version: GRD Critic v1.0*
*Referenced: .planning/OBJECTIVE.md*