get-research-done 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +560 -0
- package/agents/grd-architect.md +789 -0
- package/agents/grd-codebase-mapper.md +738 -0
- package/agents/grd-critic.md +1065 -0
- package/agents/grd-debugger.md +1203 -0
- package/agents/grd-evaluator.md +948 -0
- package/agents/grd-executor.md +784 -0
- package/agents/grd-explorer.md +2063 -0
- package/agents/grd-graduator.md +484 -0
- package/agents/grd-integration-checker.md +423 -0
- package/agents/grd-phase-researcher.md +641 -0
- package/agents/grd-plan-checker.md +745 -0
- package/agents/grd-planner.md +1386 -0
- package/agents/grd-project-researcher.md +865 -0
- package/agents/grd-research-synthesizer.md +256 -0
- package/agents/grd-researcher.md +2361 -0
- package/agents/grd-roadmapper.md +605 -0
- package/agents/grd-verifier.md +778 -0
- package/bin/install.js +1294 -0
- package/commands/grd/add-phase.md +207 -0
- package/commands/grd/add-todo.md +193 -0
- package/commands/grd/architect.md +283 -0
- package/commands/grd/audit-milestone.md +277 -0
- package/commands/grd/check-todos.md +228 -0
- package/commands/grd/complete-milestone.md +136 -0
- package/commands/grd/debug.md +169 -0
- package/commands/grd/discuss-phase.md +86 -0
- package/commands/grd/evaluate.md +1095 -0
- package/commands/grd/execute-phase.md +339 -0
- package/commands/grd/explore.md +258 -0
- package/commands/grd/graduate.md +323 -0
- package/commands/grd/help.md +482 -0
- package/commands/grd/insert-phase.md +227 -0
- package/commands/grd/insights.md +231 -0
- package/commands/grd/join-discord.md +18 -0
- package/commands/grd/list-phase-assumptions.md +50 -0
- package/commands/grd/map-codebase.md +71 -0
- package/commands/grd/new-milestone.md +721 -0
- package/commands/grd/new-project.md +1008 -0
- package/commands/grd/pause-work.md +134 -0
- package/commands/grd/plan-milestone-gaps.md +295 -0
- package/commands/grd/plan-phase.md +525 -0
- package/commands/grd/progress.md +364 -0
- package/commands/grd/quick-explore.md +236 -0
- package/commands/grd/quick.md +309 -0
- package/commands/grd/remove-phase.md +349 -0
- package/commands/grd/research-phase.md +200 -0
- package/commands/grd/research.md +681 -0
- package/commands/grd/resume-work.md +40 -0
- package/commands/grd/set-profile.md +106 -0
- package/commands/grd/settings.md +136 -0
- package/commands/grd/update.md +172 -0
- package/commands/grd/verify-work.md +219 -0
- package/get-research-done/config/default.json +15 -0
- package/get-research-done/references/checkpoints.md +1078 -0
- package/get-research-done/references/continuation-format.md +249 -0
- package/get-research-done/references/git-integration.md +254 -0
- package/get-research-done/references/model-profiles.md +73 -0
- package/get-research-done/references/planning-config.md +94 -0
- package/get-research-done/references/questioning.md +141 -0
- package/get-research-done/references/tdd.md +263 -0
- package/get-research-done/references/ui-brand.md +160 -0
- package/get-research-done/references/verification-patterns.md +612 -0
- package/get-research-done/templates/DEBUG.md +159 -0
- package/get-research-done/templates/UAT.md +247 -0
- package/get-research-done/templates/archive-reason.md +195 -0
- package/get-research-done/templates/codebase/architecture.md +255 -0
- package/get-research-done/templates/codebase/concerns.md +310 -0
- package/get-research-done/templates/codebase/conventions.md +307 -0
- package/get-research-done/templates/codebase/integrations.md +280 -0
- package/get-research-done/templates/codebase/stack.md +186 -0
- package/get-research-done/templates/codebase/structure.md +285 -0
- package/get-research-done/templates/codebase/testing.md +480 -0
- package/get-research-done/templates/config.json +35 -0
- package/get-research-done/templates/context.md +283 -0
- package/get-research-done/templates/continue-here.md +78 -0
- package/get-research-done/templates/critic-log.md +288 -0
- package/get-research-done/templates/data-report.md +173 -0
- package/get-research-done/templates/debug-subagent-prompt.md +91 -0
- package/get-research-done/templates/decision-log.md +58 -0
- package/get-research-done/templates/decision.md +138 -0
- package/get-research-done/templates/discovery.md +146 -0
- package/get-research-done/templates/experiment-readme.md +104 -0
- package/get-research-done/templates/graduated-script.md +180 -0
- package/get-research-done/templates/iteration-summary.md +234 -0
- package/get-research-done/templates/milestone-archive.md +123 -0
- package/get-research-done/templates/milestone.md +115 -0
- package/get-research-done/templates/objective.md +271 -0
- package/get-research-done/templates/phase-prompt.md +567 -0
- package/get-research-done/templates/planner-subagent-prompt.md +117 -0
- package/get-research-done/templates/project.md +184 -0
- package/get-research-done/templates/requirements.md +231 -0
- package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
- package/get-research-done/templates/research-project/FEATURES.md +147 -0
- package/get-research-done/templates/research-project/PITFALLS.md +200 -0
- package/get-research-done/templates/research-project/STACK.md +120 -0
- package/get-research-done/templates/research-project/SUMMARY.md +170 -0
- package/get-research-done/templates/research.md +529 -0
- package/get-research-done/templates/roadmap.md +202 -0
- package/get-research-done/templates/scorecard.json +113 -0
- package/get-research-done/templates/state.md +287 -0
- package/get-research-done/templates/summary.md +246 -0
- package/get-research-done/templates/user-setup.md +311 -0
- package/get-research-done/templates/verification-report.md +322 -0
- package/get-research-done/workflows/complete-milestone.md +756 -0
- package/get-research-done/workflows/diagnose-issues.md +231 -0
- package/get-research-done/workflows/discovery-phase.md +289 -0
- package/get-research-done/workflows/discuss-phase.md +433 -0
- package/get-research-done/workflows/execute-phase.md +657 -0
- package/get-research-done/workflows/execute-plan.md +1844 -0
- package/get-research-done/workflows/list-phase-assumptions.md +178 -0
- package/get-research-done/workflows/map-codebase.md +322 -0
- package/get-research-done/workflows/resume-project.md +307 -0
- package/get-research-done/workflows/transition.md +556 -0
- package/get-research-done/workflows/verify-phase.md +628 -0
- package/get-research-done/workflows/verify-work.md +596 -0
- package/hooks/dist/grd-check-update.js +61 -0
- package/hooks/dist/grd-statusline.js +84 -0
- package/package.json +47 -0
- package/scripts/audit-help-commands.sh +115 -0
- package/scripts/build-hooks.js +42 -0
- package/scripts/verify-all-commands.sh +246 -0
- package/scripts/verify-architect-warning.sh +35 -0
- package/scripts/verify-insights-mode.sh +40 -0
- package/scripts/verify-quick-mode.sh +20 -0
- package/scripts/verify-revise-data-routing.sh +139 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Lex Christopherson
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,560 @@
|
|
|
1
|
+
<div align="center">
|
|
2
|
+
|
|
3
|
+
# GET RESEARCH DONE (GRD)
|
|
4
|
+
|
|
5
|
+
**A recursive, agentic framework for ML research with hypothesis-driven experimentation for Claude Code.**
|
|
6
|
+
|
|
7
|
+
**Structured ML experimentation with scientific rigor — from hypothesis to validated conclusion, with a Critic agent enforcing skepticism at every step.**
|
|
8
|
+
|
|
9
|
+
[](https://www.npmjs.com/package/get-research-done)
|
|
10
|
+
[](https://www.npmjs.com/package/get-research-done)
|
|
11
|
+
[](https://discord.gg/5JJgD5svVS)
|
|
12
|
+
[](https://github.com/glittercowboy/get-research-done)
|
|
13
|
+
[](LICENSE)
|
|
14
|
+
|
|
15
|
+
<br>
|
|
16
|
+
|
|
17
|
+
```bash
|
|
18
|
+
npx get-research-done
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
**Works on Mac, Windows, and Linux.**
|
|
22
|
+
|
|
23
|
+
<br>
|
|
24
|
+
|
|
25
|
+

|
|
26
|
+
|
|
27
|
+
<br>
|
|
28
|
+
|
|
29
|
+
*"If you know clearly what you want, this WILL build it for you. No bs."*
|
|
30
|
+
|
|
31
|
+
*"I've done SpecKit, OpenSpec and Taskmaster — this has produced the best results for me."*
|
|
32
|
+
|
|
33
|
+
*"By far the most powerful addition to my Claude Code. Nothing over-engineered. Literally just gets shit done."*
|
|
34
|
+
|
|
35
|
+
<br>
|
|
36
|
+
|
|
37
|
+
**Trusted by engineers at Amazon, Google, Shopify, and Webflow.**
|
|
38
|
+
|
|
39
|
+
[Why I Built This](#why-i-built-this) · [How It Works](#how-it-works) · [Commands](#commands) · [Why It Works](#why-it-works)
|
|
40
|
+
|
|
41
|
+
</div>
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Why I Built This
|
|
46
|
+
|
|
47
|
+
ML research has a reproducibility crisis. Experiments are ad-hoc, hypotheses are vague, validation is subjective, and insights get lost.
|
|
48
|
+
|
|
49
|
+
I've watched researchers spend weeks on experiments with fundamental flaws: data leakage baked into features, "95% accuracy" on shifted distributions, negative results deleted rather than preserved. The problem isn't capability — it's structure.
|
|
50
|
+
|
|
51
|
+
So I built GRD. It's the framework that makes ML research systematic:
|
|
52
|
+
|
|
53
|
+
- **Data-first philosophy** — Explore your data before forming hypotheses
|
|
54
|
+
- **Testable hypotheses** — Falsification criteria, success metrics, baseline requirements
|
|
55
|
+
- **Automated skepticism** — Critic agent catches leakage, overfitting, and logical errors
|
|
56
|
+
- **Recursive validation** — Results contradict the data? System routes back to exploration
|
|
57
|
+
- **Human-in-the-loop gates** — You make final calls on validation and archival
|
|
58
|
+
- **Negative result preservation** — Failed hypotheses are valuable knowledge
|
|
59
|
+
|
|
60
|
+
The complexity is in the system, not in your workflow. You run five commands: `/grd:explore`, `/grd:architect`, `/grd:research`, `/grd:evaluate`, `/grd:graduate`. The agents handle the rest.
|
|
61
|
+
|
|
62
|
+
— **Ulmentflam**
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## Who This Is For
|
|
67
|
+
|
|
68
|
+
ML researchers and practitioners who want structured experimentation with hypothesis-driven workflows — without building custom research infrastructure from scratch.
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Getting Started
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
npx get-research-done
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
The installer prompts you to choose:
|
|
79
|
+
1. **Runtime** — Claude Code, OpenCode, or both
|
|
80
|
+
2. **Location** — Global (all projects) or local (current project only)
|
|
81
|
+
|
|
82
|
+
Verify with `/grd:help` inside your Claude Code or OpenCode interface.
|
|
83
|
+
|
|
84
|
+
### Staying Updated
|
|
85
|
+
|
|
86
|
+
GRD evolves fast. Update periodically:
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
npx get-research-done@latest
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
<details>
|
|
93
|
+
<summary><strong>Non-interactive Install (Docker, CI, Scripts)</strong></summary>
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
# Claude Code
|
|
97
|
+
npx get-research-done --claude --global # Install to ~/.claude/
|
|
98
|
+
npx get-research-done --claude --local # Install to ./.claude/
|
|
99
|
+
|
|
100
|
+
# OpenCode (open source, free models)
|
|
101
|
+
npx get-research-done --opencode --global # Install to ~/.opencode/
|
|
102
|
+
|
|
103
|
+
# Both runtimes
|
|
104
|
+
npx get-research-done --both --global # Install to both directories
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
Use `--global` (`-g`) or `--local` (`-l`) to skip the location prompt.
|
|
108
|
+
Use `--claude`, `--opencode`, or `--both` to skip the runtime prompt.
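
The flag handling described above could be sketched roughly as follows. This is an illustration only and not the code in `bin/install.js`: the function name `parseInstallFlags`, the returned object shape, and the error raised for conflicting location flags are all assumptions.

```javascript
// Hypothetical sketch: not the actual logic in bin/install.js.
// Resolves the two installer choices from CLI flags. A null value means
// the flag was absent, so an installer would fall back to prompting.
function parseInstallFlags(argv) {
  const has = (...names) => names.some((name) => argv.includes(name));

  let runtime = null;
  if (has("--both")) runtime = "both";
  else if (has("--claude")) runtime = "claude";
  else if (has("--opencode")) runtime = "opencode";

  // Assumption: passing both location flags is treated as an error.
  if (has("--global", "-g") && has("--local", "-l")) {
    throw new Error("Choose either --global or --local, not both");
  }
  let location = null;
  if (has("--global", "-g")) location = "global";
  else if (has("--local", "-l")) location = "local";

  return { runtime, location, uninstall: has("--uninstall") };
}
```

Calling `parseInstallFlags(process.argv.slice(2))` would give the installer enough information to decide which prompts, if any, still need to be shown.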
|
|
109
|
+
|
|
110
|
+
</details>
|
|
111
|
+
|
|
112
|
+
<details>
|
|
113
|
+
<summary><strong>Development Installation</strong></summary>
|
|
114
|
+
|
|
115
|
+
Clone the repository and run the installer locally:
|
|
116
|
+
|
|
117
|
+
```bash
|
|
118
|
+
git clone https://github.com/glittercowboy/get-research-done.git
|
|
119
|
+
cd get-research-done
|
|
120
|
+
node bin/install.js --claude --local
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
Installs to `./.claude/` for testing modifications before contributing.
|
|
124
|
+
|
|
125
|
+
</details>
|
|
126
|
+
|
|
127
|
+
### Recommended: Skip Permissions Mode
|
|
128
|
+
|
|
129
|
+
GRD is designed for frictionless automation. Run Claude Code with:
|
|
130
|
+
|
|
131
|
+
```bash
|
|
132
|
+
claude --dangerously-skip-permissions
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
> [!TIP]
|
|
136
|
+
> This is how GRD is intended to be used — stopping to approve `date` and `git commit` 50 times defeats the purpose.
|
|
137
|
+
|
|
138
|
+
<details>
|
|
139
|
+
<summary><strong>Alternative: Granular Permissions</strong></summary>
|
|
140
|
+
|
|
141
|
+
If you prefer not to use that flag, add this to your project's `.claude/settings.json`:
|
|
142
|
+
|
|
143
|
+
```json
|
|
144
|
+
{
|
|
145
|
+
"permissions": {
|
|
146
|
+
"allow": [
|
|
147
|
+
"Bash(date:*)",
|
|
148
|
+
"Bash(echo:*)",
|
|
149
|
+
"Bash(cat:*)",
|
|
150
|
+
"Bash(ls:*)",
|
|
151
|
+
"Bash(mkdir:*)",
|
|
152
|
+
"Bash(wc:*)",
|
|
153
|
+
"Bash(head:*)",
|
|
154
|
+
"Bash(tail:*)",
|
|
155
|
+
"Bash(sort:*)",
|
|
156
|
+
"Bash(grep:*)",
|
|
157
|
+
"Bash(tr:*)",
|
|
158
|
+
"Bash(git add:*)",
|
|
159
|
+
"Bash(git commit:*)",
|
|
160
|
+
"Bash(git status:*)",
|
|
161
|
+
"Bash(git log:*)",
|
|
162
|
+
"Bash(git diff:*)",
|
|
163
|
+
"Bash(git tag:*)"
|
|
164
|
+
]
|
|
165
|
+
}
|
|
166
|
+
}
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
</details>
|
|
170
|
+
|
|
171
|
+
---
|
|
172
|
+
|
|
173
|
+
## How It Works
|
|
174
|
+
|
|
175
|
+
GRD follows a recursive validation loop: **Explore → Architect → Research → Evaluate → Graduate**. The Critic agent enforces skepticism at every step, routing experiments back to earlier phases when issues are detected.
|
|
176
|
+
|
|
177
|
+
> **Already have ML code?** Run `/grd:map-codebase` first. It spawns parallel agents to analyze your models, datasets, metrics, and experiment patterns, so that `/grd:new-project` starts with your research context.
|
|
178
|
+
|
|
179
|
+
### 1. Data Reconnaissance
|
|
180
|
+
|
|
181
|
+
```
|
|
182
|
+
/grd:explore ./data/train.csv
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
**Understand your data before forming hypotheses.**
|
|
186
|
+
|
|
187
|
+
The Explorer agent profiles your dataset:
|
|
188
|
+
|
|
189
|
+
- **Distributions** — Feature statistics, class balance, outliers
|
|
190
|
+
- **Missing data patterns** — MCAR/MAR/MNAR analysis
|
|
191
|
+
- **Leakage detection** — High-confidence warnings for temporal/feature leakage
|
|
192
|
+
- **Data quality** — Anomalies that could invalidate experiments
|
|
193
|
+
|
|
194
|
+
The output grounds all downstream work in data reality.
|
|
195
|
+
|
|
196
|
+
**Creates:** `.planning/DATA_REPORT.md`
|
|
197
|
+
|
|
198
|
+
---
|
|
199
|
+
|
|
200
|
+
### 2. Hypothesis Synthesis
|
|
201
|
+
|
|
202
|
+
```
|
|
203
|
+
/grd:architect
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
**Transform data insights into testable hypotheses.**
|
|
207
|
+
|
|
208
|
+
The Architect agent reads your DATA_REPORT.md and proposes hypotheses with:
|
|
209
|
+
|
|
210
|
+
- **Testable claims** — What you're trying to prove
|
|
211
|
+
- **Success metrics** — Weighted metrics with thresholds
|
|
212
|
+
- **Falsification criteria** — What would disprove the hypothesis
|
|
213
|
+
- **Baseline requirements** — What you're comparing against
|
|
214
|
+
|
|
215
|
+
The Architect collaborates iteratively — propose, explain reasoning, refine based on your feedback.
|
|
216
|
+
|
|
217
|
+
**Creates:** `.planning/OBJECTIVE.md`
|
|
218
|
+
|
|
219
|
+
---
|
|
220
|
+
|
|
221
|
+
### 3. Recursive Validation Loop
|
|
222
|
+
|
|
223
|
+
```
|
|
224
|
+
/grd:research baseline
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
**Implement experiments with automated skeptical review.**
|
|
228
|
+
|
|
229
|
+
The Researcher agent:
|
|
230
|
+
|
|
231
|
+
1. **Creates isolated run** — `experiments/run_001_baseline/` with complete snapshot
|
|
232
|
+
2. **Implements experiment** — Code, config, data references
|
|
233
|
+
3. **Spawns Critic** — Automated skeptic reviews for logical errors, leakage, overfitting
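
A minimal sketch of how an isolated run directory such as `experiments/run_001_baseline/` might be named. The scheme is inferred from that single example, so the zero-padded three-digit counter, the slug rules, and both function names are assumptions rather than GRD's documented behavior.

```javascript
// Assumed naming scheme, inferred from the `run_001_baseline` example:
// a zero-padded three-digit counter followed by a slug of the description.
function runDirName(index, description) {
  const counter = String(index).padStart(3, "0");
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse anything non-alphanumeric
    .replace(/^_+|_+$/g, ""); // trim leading/trailing underscores
  return slug ? `experiments/run_${counter}_${slug}` : `experiments/run_${counter}`;
}

// Picks the next counter from directory names that already exist.
// Names that do not start with `run_<digits>` are ignored.
function nextRunIndex(existingNames) {
  const used = existingNames
    .map((name) => /^run_(\d+)/.exec(name))
    .filter(Boolean)
    .map((match) => Number(match[1]));
  return used.length ? Math.max(...used) + 1 : 1;
}
```

Deriving the counter from existing directory names keeps each run in its own directory, which is what lets every iteration remain a separate snapshot.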
|
|
234
|
+
|
|
235
|
+
The Critic returns one of four verdicts:
|
|
236
|
+
|
|
237
|
+
| Verdict | Meaning | Routing |
|
|
238
|
+
|---------|---------|---------|
|
|
239
|
+
| `PROCEED` | Logic sound, results align with data | → Evaluator |
|
|
240
|
+
| `REVISE_METHOD` | Logical error, bad hyperparams | → Back to Researcher |
|
|
241
|
+
| `REVISE_DATA` | Anomalous results, potential leakage | → Back to Explorer |
|
|
242
|
+
| `ESCALATE` | Ambiguous failure | → Human decision |
|
|
243
|
+
|
|
244
|
+
If `REVISE_METHOD`, continue with:
|
|
245
|
+
```
|
|
246
|
+
/grd:research --continue
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
The loop iterates until the Critic returns `PROCEED` or the iteration limit is reached (default: 5 iterations).
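
The routing in the verdict table, combined with the iteration limit, can be sketched as a small loop. The verdict names and their destinations come from the table above; the destination labels, the `researchLoop` function, and the `runIteration` callback are hypothetical names used only for illustration.

```javascript
// Routing taken from the verdict table. The destination labels are
// illustrative strings, not identifiers used by GRD itself.
const ROUTES = new Map([
  ["PROCEED", "evaluator"],
  ["REVISE_METHOD", "researcher"],
  ["REVISE_DATA", "explorer"],
  ["ESCALATE", "human"],
]);

// `runIteration` is a hypothetical callback that performs one
// Researcher + Critic pass and returns the Critic's verdict string.
function researchLoop(runIteration, iterationLimit = 5) {
  for (let iteration = 1; iteration <= iterationLimit; iteration++) {
    const verdict = runIteration(iteration);
    const next = ROUTES.get(verdict);
    if (next === undefined) throw new Error(`Unknown verdict: ${verdict}`);
    // Only REVISE_METHOD stays inside the loop; every other verdict exits it.
    if (verdict !== "REVISE_METHOD") {
      return { verdict, next, iterations: iteration };
    }
  }
  // Limit reached without PROCEED: the decision goes to a human.
  return { verdict: "REVISE_METHOD", next: "human", iterations: iterationLimit };
}
```

In this sketch, `REVISE_DATA` and `ESCALATE` leave the loop immediately because neither can be resolved by another Researcher pass.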
|
|
250
|
+
|
|
251
|
+
**Creates:** `experiments/run_NNN/` with code, config, logs, `CRITIC_LOG.md`, `SCORECARD.json`
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
### 4. Human Evaluation Gate
|
|
256
|
+
|
|
257
|
+
```
|
|
258
|
+
/grd:evaluate
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
**Review evidence and make the final call.**
|
|
262
|
+
|
|
263
|
+
After the Critic approves a run and the Evaluator benchmarks it, you see the evidence package:
|
|
264
|
+
|
|
265
|
+
- **SCORECARD.json** — Quantitative metrics vs thresholds
|
|
266
|
+
- **CRITIC_LOG.md** — What passed validation and why
|
|
267
|
+
- **OBJECTIVE.md** — Original hypothesis for comparison
|
|
268
|
+
- **DATA_REPORT.md** — Data characteristics for context
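
To make "metrics vs thresholds" concrete, here is a sketch of how weighted metrics could be checked and combined. The input shape (`name`, `value`, `threshold`, `weight`), the higher-is-better comparison, and the weight-normalized mean used as the composite are all assumptions; the real `SCORECARD.json` schema and scoring rule may differ.

```javascript
// Hypothetical scorecard logic: the real SCORECARD.json schema may differ.
function scoreMetrics(metrics) {
  const totalWeight = metrics.reduce((sum, m) => sum + m.weight, 0);
  if (totalWeight <= 0) throw new Error("Weights must sum to a positive number");

  const results = metrics.map((m) => ({
    name: m.name,
    value: m.value,
    threshold: m.threshold,
    passed: m.value >= m.threshold, // assumes higher is better for every metric
  }));

  // Assumed composite: weight-normalized mean of the metric values.
  const composite =
    metrics.reduce((sum, m) => sum + m.weight * m.value, 0) / totalWeight;

  return { results, composite, allPassed: results.every((r) => r.passed) };
}
```

Reporting per-metric pass flags alongside the composite keeps a strong headline score from hiding a single metric that missed its threshold.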
|
|
269
|
+
|
|
270
|
+
Three decisions:
|
|
271
|
+
|
|
272
|
+
| Decision | Meaning | Next Step |
|
|
273
|
+
|----------|---------|-----------|
|
|
274
|
+
| **Seal** | Hypothesis validated | Ready for production/publication |
|
|
275
|
+
| **Iterate** | Continue experimenting | `/grd:research --continue` |
|
|
276
|
+
| **Archive** | Abandon hypothesis | Preserved as negative result |
|
|
277
|
+
|
|
278
|
+
Archived hypotheses are kept in `experiments/archive/` — negative results are valuable too.
|
|
279
|
+
|
|
280
|
+
**Creates:** `DECISION.md`, `human_eval/decision_log.md`
|
|
281
|
+
|
|
282
|
+
---
|
|
283
|
+
|
|
284
|
+
### 5. Notebook Graduation
|
|
285
|
+
|
|
286
|
+
```
|
|
287
|
+
/grd:graduate notebooks/exploration/baseline.ipynb
|
|
288
|
+
```
|
|
289
|
+
|
|
290
|
+
**Convert validated notebooks to production scripts.**
|
|
291
|
+
|
|
292
|
+
After a notebook passes Critic validation, the Graduator agent:
|
|
293
|
+
|
|
294
|
+
1. **Validates requirements** — Random seeds set, parameters cell tagged
|
|
295
|
+
2. **Converts to Python** — Via nbconvert with metadata header
|
|
296
|
+
3. **Places in `src/experiments/`** — Ready for production use
|
|
297
|
+
4. **Generates refactoring checklist** — Manual cleanup guide
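
The first step, validating requirements, might look something like the sketch below, which inspects the notebook's JSON. The tag name `parameters` (the papermill convention) and the seed-call patterns are guesses for illustration; GRD's actual checks are not shown in this README.

```javascript
// Sketch under assumptions: the "parameters" tag name and the seed
// patterns below are guesses, not GRD's actual validation rules.
const SEED_PATTERN =
  /\b(random\.seed|torch\.manual_seed|tf\.random\.set_seed)\s*\(/;

// In the Jupyter notebook JSON format, a cell's `source` is either a
// single string or an array of line strings.
function cellSource(cell) {
  return Array.isArray(cell.source) ? cell.source.join("") : cell.source ?? "";
}

function checkGraduationRequirements(notebook) {
  const codeCells = (notebook.cells ?? []).filter((c) => c.cell_type === "code");
  const hasParametersCell = codeCells.some((c) =>
    (c.metadata?.tags ?? []).includes("parameters")
  );
  const setsSeed = codeCells.some((c) => SEED_PATTERN.test(cellSource(c)));

  const problems = [];
  if (!hasParametersCell) problems.push("no code cell tagged 'parameters'");
  if (!setsSeed) problems.push("no random seed call found");
  return { ok: problems.length === 0, problems };
}
```

A caller would pass in the parsed `.ipynb` file, for example the result of `JSON.parse` on its contents, and refuse to convert the notebook while `problems` is non-empty.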
|
|
298
|
+
|
|
299
|
+
**Creates:** `src/experiments/{script_name}.py`
|
|
300
|
+
|
|
301
|
+
---
|
|
302
|
+
|
|
303
|
+
### The Recursive Loop in Action
|
|
304
|
+
|
|
305
|
+
```
|
|
306
|
+
/grd:explore ./data/ # Profile data
|
|
307
|
+
/grd:architect # Form hypothesis
|
|
308
|
+
/grd:research baseline # Implement + Critic review
|
|
309
|
+
→ REVISE_METHOD # Critic finds issue
|
|
310
|
+
/grd:research --continue # Fix and retry
|
|
311
|
+
→ PROCEED # Critic approves
|
|
312
|
+
/grd:evaluate # Human reviews evidence
|
|
313
|
+
→ Seal # Hypothesis validated
|
|
314
|
+
/grd:graduate notebook.ipynb # Graduate to script
|
|
315
|
+
```
|
|
316
|
+
|
|
317
|
+
The power is in the routing. If results contradict the data profile, `REVISE_DATA` sends you back to `/grd:explore`. The system is self-correcting.
|
|
318
|
+
|
|
319
|
+
---
|
|
320
|
+
|
|
321
|
+
## Why It Works
|
|
322
|
+
|
|
323
|
+
### Data-First Philosophy
|
|
324
|
+
|
|
325
|
+
ML research fails when hypotheses aren't grounded in data reality. GRD enforces **data reconnaissance before hypothesis formation**:
|
|
326
|
+
|
|
327
|
+
1. **Explorer** profiles your data — distributions, outliers, class balance, leakage risks
|
|
328
|
+
2. **Architect** reads DATA_REPORT.md before proposing hypotheses
|
|
329
|
+
3. **Critic** validates experiments against data characteristics
|
|
330
|
+
|
|
331
|
+
No more "95% accuracy" on shifted distributions. No more spending weeks on experiments with data leakage baked in.
|
|
332
|
+
|
|
333
|
+
### Recursive Validation Loop
|
|
334
|
+
|
|
335
|
+
Research is non-linear. Results often invalidate assumptions. GRD's Critic agent has **three automated exit paths** (a fourth verdict, `ESCALATE`, hands ambiguous failures to a human):
|
|
336
|
+
|
|
337
|
+
| Verdict | Meaning | Routing |
|
|
338
|
+
|-----------|---------|---------|
|
|
339
|
+
| `PROCEED` | Logic sound, aligns with data | → Evaluator → Human gate |
|
|
340
|
+
| `REVISE_METHOD` | Logical error, bad approach | → Back to Researcher |
|
|
341
|
+
| `REVISE_DATA` | Data quality concern | → Back to Explorer |
|
|
342
|
+
|
|
343
|
+
When results contradict the data profile, the system forces a return to the data layer. This is the core innovation over linear workflow tools.
|
|
344
|
+
|
|
345
|
+
### Context Engineering
|
|
346
|
+
|
|
347
|
+
GRD structures context so Claude can reason effectively:
|
|
348
|
+
|
|
349
|
+
| Artifact | Purpose |
|
|
350
|
+
|----------|---------|
|
|
351
|
+
| `DATA_REPORT.md` | Living data profile — distributions, leakage warnings, anomalies |
|
|
352
|
+
| `OBJECTIVE.md` | Testable hypothesis — what, why, metrics, falsification criteria |
|
|
353
|
+
| `CRITIC_LOG.md` | Validation history — verdicts, confidence, recommendations |
|
|
354
|
+
| `SCORECARD.json` | Quantitative results — metrics vs thresholds, composite score |
|
|
355
|
+
| `experiments/run_NNN/` | Isolated snapshots — code, config, logs, outputs per iteration |
|
|
356
|
+
|
|
357
|
+
Each run is a complete, reproducible snapshot. No context rot. No lost experiments.
|
|
358
|
+
|
|
359
|
+
### Agent Roles
|
|
360
|
+
|
|
361
|
+
| Agent | Responsibility | Output |
|
|
362
|
+
|-------|----------------|--------|
|
|
363
|
+
| **Explorer** | Data reconnaissance, leakage detection | `DATA_REPORT.md` |
|
|
364
|
+
| **Architect** | Hypothesis synthesis, success criteria | `OBJECTIVE.md` |
|
|
365
|
+
| **Researcher** | Implementation, experiment execution | `experiments/run_NNN/` |
|
|
366
|
+
| **Critic** | Skeptical validation, routing decisions | `CRITIC_LOG.md` |
|
|
367
|
+
| **Evaluator** | Quantitative benchmarking | `SCORECARD.json` |
|
|
368
|
+
| **Graduator** | Notebook-to-script conversion | `src/experiments/` |
|
|
369
|
+
|
|
370
|
+
The Researcher spawns Critic automatically. You don't orchestrate — you just run `/grd:research` and the loop handles itself.
|
|
371
|
+
|
|
372
|
+
### Human-in-the-Loop Gates
|
|
373
|
+
|
|
374
|
+
Automated skepticism catches obvious errors. But **humans make final calls**:
|
|
375
|
+
|
|
376
|
+
- **Low confidence PROCEED** — Critic shows concerns, you decide whether to continue
|
|
377
|
+
- **Iteration limit reached** — After 5 attempts, you review and choose direction
|
|
378
|
+
- **Evaluate gate** — You see full evidence package before Seal/Iterate/Archive
|
|
379
|
+
|
|
380
|
+
The system makes it **harder to deceive yourself**, not easier to ship models.
|
|
381
|
+
|
|
382
|
+
### Negative Result Preservation
|
|
383
|
+
|
|
384
|
+
Failed hypotheses are valuable. When you Archive:
|
|
385
|
+
|
|
386
|
+
- Final run preserved in `experiments/archive/`
|
|
387
|
+
- `ARCHIVE_REASON.md` captures why it failed
|
|
388
|
+
- `ITERATION_SUMMARY.md` shows what was tried
|
|
389
|
+
- Future researchers won't repeat the same mistakes
|
|
390
|
+
|
|
391
|
+
Insufficient skepticism causes most ML research failures. GRD makes skepticism structural.
|
|
392
|
+
|
|
393
|
+
---
|
|
394
|
+
|
|
395
|
+
## Commands
|
|
396
|
+
|
|
397
|
+
### Research Loop (Core Workflow)
|
|
398
|
+
|
|
399
|
+
| Command | What it does |
|
|
400
|
+
|---------|--------------|
|
|
401
|
+
| `/grd:explore [path]` | Data reconnaissance — profile distributions, detect leakage, identify anomalies |
|
|
402
|
+
| `/grd:architect [direction]` | Hypothesis synthesis — create testable OBJECTIVE.md with falsification criteria |
|
|
403
|
+
| `/grd:research [description]` | Recursive validation — implement experiment, Critic review, routing |
|
|
404
|
+
| `/grd:research --continue` | Continue after REVISE_METHOD verdict |
|
|
405
|
+
| `/grd:evaluate [run_name]` | Human decision gate — Seal / Iterate / Archive |
|
|
406
|
+
| `/grd:graduate <notebook>` | Graduate validated notebook to production script |
|
|
407
|
+
|
|
408
|
+
### Project Setup
|
|
409
|
+
|
|
410
|
+
| Command | What it does |
|
|
411
|
+
|---------|--------------|
|
|
412
|
+
| `/grd:new-project` | Initialize project with questioning → research → requirements |
|
|
413
|
+
| `/grd:map-codebase` | Analyze existing codebase before new-project |
|
|
414
|
+
|
|
415
|
+
### Navigation
|
|
416
|
+
|
|
417
|
+
| Command | What it does |
|
|
418
|
+
|---------|--------------|
|
|
419
|
+
| `/grd:progress` | Where am I? What's next? |
|
|
420
|
+
| `/grd:help` | Show all commands and usage guide |
|
|
421
|
+
| `/grd:update` | Update GRD with changelog preview |
|
|
422
|
+
| `/grd:join-discord` | Join the GRD Discord community |
|
|
423
|
+
|
|
424
|
+
### Session
|
|
425
|
+
|
|
426
|
+
| Command | What it does |
|
|
427
|
+
|---------|--------------|
|
|
428
|
+
| `/grd:pause-work` | Create handoff when stopping mid-experiment |
|
|
429
|
+
| `/grd:resume-work` | Restore from last session |
|
|
430
|
+
|
|
431
|
+
### Utilities
|
|
432
|
+
|
|
433
|
+
| Command | What it does |
|
|
434
|
+
|---------|--------------|
|
|
435
|
+
| `/grd:settings` | Configure model profile and workflow agents |
|
|
436
|
+
| `/grd:set-profile <profile>` | Switch model profile (quality/balanced/budget) |
|
|
437
|
+
| `/grd:add-todo [desc]` | Capture idea for later |
|
|
438
|
+
| `/grd:check-todos` | List pending todos |
|
|
439
|
+
| `/grd:debug [desc]` | Systematic debugging with persistent state |
|
|
440
|
+
| `/grd:quick` | Execute ad-hoc experiment with GRD guarantees |
|
|
441
|
+
|
|
442
|
+
---
|
|
443
|
+
|
|
444
|
+
## Configuration
|
|
445
|
+
|
|
446
|
+
GRD stores project settings in `.planning/config.json`. Configure during `/grd:new-project` or update later with `/grd:settings`.
|
|
447
|
+
|
|
448
|
+
### Core Settings
|
|
449
|
+
|
|
450
|
+
| Setting | Options | Default | What it controls |
|
|
451
|
+
|---------|---------|---------|------------------|
|
|
452
|
+
| `mode` | `yolo`, `interactive` | `interactive` | Auto-approve vs confirm at each step |
|
|
453
|
+
| `iteration_limit` | 1-10 | `5` | Max Researcher → Critic loops before human gate |
|
|
454
|
+
|
|
455
|
+
### Model Profiles
|
|
456
|
+
|
|
457
|
+
Control which Claude model each agent uses. Balance quality vs token spend.
|
|
458
|
+
|
|
459
|
+
| Profile | Explorer/Architect | Researcher | Critic/Evaluator |
|
|
460
|
+
|---------|-------------------|------------|------------------|
|
|
461
|
+
| `quality` | Opus | Opus | Sonnet |
|
|
462
|
+
| `balanced` (default) | Sonnet | Sonnet | Sonnet |
|
|
463
|
+
| `budget` | Sonnet | Haiku | Haiku |
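
The profile table can be read as a simple lookup from profile and agent to model tier. The values below are copied from the table; the object layout, the lowercase agent keys, and the `modelFor` helper are illustrative assumptions, not GRD's internal format.

```javascript
// Values copied from the profile table above. The object layout and the
// lowercase keys are illustrative, not GRD's internal representation.
const MODEL_PROFILES = {
  quality: {
    explorer: "opus", architect: "opus", researcher: "opus",
    critic: "sonnet", evaluator: "sonnet",
  },
  balanced: {
    explorer: "sonnet", architect: "sonnet", researcher: "sonnet",
    critic: "sonnet", evaluator: "sonnet",
  },
  budget: {
    explorer: "sonnet", architect: "sonnet", researcher: "haiku",
    critic: "haiku", evaluator: "haiku",
  },
};

// Hypothetical helper: resolves the model tier for one agent.
function modelFor(agent, profile = "balanced") {
  if (!Object.hasOwn(MODEL_PROFILES, profile)) {
    throw new Error(`Unknown profile: ${profile}`);
  }
  const row = MODEL_PROFILES[profile];
  if (!Object.hasOwn(row, agent)) throw new Error(`Unknown agent: ${agent}`);
  return row[agent];
}
```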
|
|
464
|
+
|
|
465
|
+
Switch profiles:
|
|
466
|
+
```
|
|
467
|
+
/grd:set-profile budget
|
|
468
|
+
```
|
|
469
|
+
|
|
470
|
+
Or configure via `/grd:settings`.
|
|
471
|
+
|
|
472
|
+
### Workflow Agents
|
|
473
|
+
|
|
474
|
+
| Setting | Default | What it does |
|
|
475
|
+
|---------|---------|--------------|
|
|
476
|
+
| `workflow.research` | `true` | Domain research before project setup |
|
|
477
|
+
| `workflow.plan_check` | `true` | Verifies experiment design before execution |
|
|
478
|
+
| `workflow.verifier` | `true` | Confirms hypothesis criteria after Critic approval |
|
|
479
|
+
|
|
480
|
+
### Execution
|
|
481
|
+
|
|
482
|
+
| Setting | Default | What it controls |
|
|
483
|
+
|---------|---------|------------------|
|
|
484
|
+
| `commit_docs` | `true` | Track `.planning/` in git |
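
Taken together, the settings tables imply a set of defaults and a few range checks. The sketch below applies them to a user-supplied object; the default values and allowed ranges come from the tables, while the nesting of keys inside `.planning/config.json` and the `resolveConfig` helper are assumptions.

```javascript
// Defaults taken from the settings tables. The nesting of keys inside
// .planning/config.json is an assumption made for this sketch.
const DEFAULTS = {
  mode: "interactive",
  iteration_limit: 5,
  workflow: { research: true, plan_check: true, verifier: true },
  commit_docs: true,
};

// Hypothetical helper: merges user settings over the defaults and
// rejects values outside the documented options.
function resolveConfig(userConfig = {}) {
  const config = {
    ...DEFAULTS,
    ...userConfig,
    workflow: { ...DEFAULTS.workflow, ...(userConfig.workflow ?? {}) },
  };
  if (!["yolo", "interactive"].includes(config.mode)) {
    throw new Error(`Invalid mode: ${config.mode}`);
  }
  const limit = config.iteration_limit;
  if (!Number.isInteger(limit) || limit < 1 || limit > 10) {
    throw new Error("iteration_limit must be an integer from 1 to 10");
  }
  return config;
}
```

Merging `workflow` separately means a user who overrides one workflow toggle still inherits the defaults for the others.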
|
|
485
|
+
|
|
486
|
+
---
|
|
487
|
+
|
|
488
|
+
## Troubleshooting
|
|
489
|
+
|
|
490
|
+
**Commands not found after install?**
|
|
491
|
+
- Restart Claude Code to reload slash commands
|
|
492
|
+
- Verify files exist in `~/.claude/commands/grd/` (global) or `./.claude/commands/grd/` (local)
|
|
493
|
+
|
|
494
|
+
**Commands not working as expected?**
|
|
495
|
+
- Run `/grd:help` to verify installation
|
|
496
|
+
- Re-run `npx get-research-done` to reinstall
|
|
497
|
+
|
|
498
|
+
**Updating to the latest version?**
|
|
499
|
+
```bash
|
|
500
|
+
npx get-research-done@latest
|
|
501
|
+
```
|
|
502
|
+
|
|
503
|
+
**Using Docker or containerized environments?**
|
|
504
|
+
|
|
505
|
+
If file reads fail with tilde paths (`~/.claude/...`), set `CLAUDE_CONFIG_DIR` before installing:
|
|
506
|
+
```bash
|
|
507
|
+
CLAUDE_CONFIG_DIR=/home/youruser/.claude npx get-research-done --global
|
|
508
|
+
```
|
|
509
|
+
This ensures absolute paths are used instead of `~` which may not expand correctly in containers.
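
The fallback order described here, an explicit `CLAUDE_CONFIG_DIR` first and the home directory otherwise, can be sketched as a pure function. `resolveClaudeConfigDir` is a hypothetical helper and not the installer's actual code; `env` and `homeDir` are passed in as parameters, and a real caller would supply `process.env` and `os.homedir()`.

```javascript
// Hypothetical helper: shows the fallback order only, not the
// installer's actual implementation.
function resolveClaudeConfigDir(env, homeDir) {
  const override = env.CLAUDE_CONFIG_DIR;
  // An explicit, non-blank override wins and is used as given.
  if (override && override.trim() !== "") return override.trim();
  // Otherwise build an absolute path from the home directory, so no
  // literal "~" ever ends up in the generated files.
  return `${homeDir.replace(/[\\/]+$/, "")}/.claude`;
}
```

Building the path from an absolute home directory avoids relying on a shell to expand `~`, which is the failure the container workaround addresses.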
|
|
510
|
+
|
|
511
|
+
### Uninstalling
|
|
512
|
+
|
|
513
|
+
To remove GRD completely:
|
|
514
|
+
|
|
515
|
+
```bash
|
|
516
|
+
# Global installs
|
|
517
|
+
npx get-research-done --claude --global --uninstall
|
|
518
|
+
npx get-research-done --opencode --global --uninstall
|
|
519
|
+
|
|
520
|
+
# Local installs (current project)
|
|
521
|
+
npx get-research-done --claude --local --uninstall
|
|
522
|
+
npx get-research-done --opencode --local --uninstall
|
|
523
|
+
```
|
|
524
|
+
|
|
525
|
+
This removes all GRD commands, agents, hooks, and settings while preserving your other configurations.
|
|
526
|
+
|
|
527
|
+
---
|
|
528
|
+
|
|
529
|
+
## Community Ports
|
|
530
|
+
|
|
531
|
+
| Project | Platform | Description |
|
|
532
|
+
|---------|----------|-------------|
|
|
533
|
+
| [grd-opencode](https://github.com/rokicool/grd-opencode) | OpenCode | GRD adapted for OpenCode CLI |
|
|
534
|
+
| [grd-gemini](https://github.com/uberfuzzy/grd-gemini) | Gemini CLI | GRD adapted for Google's Gemini CLI |
|
|
535
|
+
|
|
536
|
+
---
|
|
537
|
+
|
|
538
|
+
## Star History
|
|
539
|
+
|
|
540
|
+
<a href="https://star-history.com/#glittercowboy/get-research-done&Date">
|
|
541
|
+
<picture>
|
|
542
|
+
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=glittercowboy/get-research-done&type=Date&theme=dark" />
|
|
543
|
+
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=glittercowboy/get-research-done&type=Date" />
|
|
544
|
+
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=glittercowboy/get-research-done&type=Date" />
|
|
545
|
+
</picture>
|
|
546
|
+
</a>
|
|
547
|
+
|
|
548
|
+
---
|
|
549
|
+
|
|
550
|
+
## License
|
|
551
|
+
|
|
552
|
+
MIT License. See [LICENSE](LICENSE) for details.
|
|
553
|
+
|
|
554
|
+
---
|
|
555
|
+
|
|
556
|
+
<div align="center">
|
|
557
|
+
|
|
558
|
+
**Claude Code is powerful. GRD makes ML research systematic.**
|
|
559
|
+
|
|
560
|
+
</div>
|