get-research-done 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +560 -0
- package/agents/grd-architect.md +789 -0
- package/agents/grd-codebase-mapper.md +738 -0
- package/agents/grd-critic.md +1065 -0
- package/agents/grd-debugger.md +1203 -0
- package/agents/grd-evaluator.md +948 -0
- package/agents/grd-executor.md +784 -0
- package/agents/grd-explorer.md +2063 -0
- package/agents/grd-graduator.md +484 -0
- package/agents/grd-integration-checker.md +423 -0
- package/agents/grd-phase-researcher.md +641 -0
- package/agents/grd-plan-checker.md +745 -0
- package/agents/grd-planner.md +1386 -0
- package/agents/grd-project-researcher.md +865 -0
- package/agents/grd-research-synthesizer.md +256 -0
- package/agents/grd-researcher.md +2361 -0
- package/agents/grd-roadmapper.md +605 -0
- package/agents/grd-verifier.md +778 -0
- package/bin/install.js +1294 -0
- package/commands/grd/add-phase.md +207 -0
- package/commands/grd/add-todo.md +193 -0
- package/commands/grd/architect.md +283 -0
- package/commands/grd/audit-milestone.md +277 -0
- package/commands/grd/check-todos.md +228 -0
- package/commands/grd/complete-milestone.md +136 -0
- package/commands/grd/debug.md +169 -0
- package/commands/grd/discuss-phase.md +86 -0
- package/commands/grd/evaluate.md +1095 -0
- package/commands/grd/execute-phase.md +339 -0
- package/commands/grd/explore.md +258 -0
- package/commands/grd/graduate.md +323 -0
- package/commands/grd/help.md +482 -0
- package/commands/grd/insert-phase.md +227 -0
- package/commands/grd/insights.md +231 -0
- package/commands/grd/join-discord.md +18 -0
- package/commands/grd/list-phase-assumptions.md +50 -0
- package/commands/grd/map-codebase.md +71 -0
- package/commands/grd/new-milestone.md +721 -0
- package/commands/grd/new-project.md +1008 -0
- package/commands/grd/pause-work.md +134 -0
- package/commands/grd/plan-milestone-gaps.md +295 -0
- package/commands/grd/plan-phase.md +525 -0
- package/commands/grd/progress.md +364 -0
- package/commands/grd/quick-explore.md +236 -0
- package/commands/grd/quick.md +309 -0
- package/commands/grd/remove-phase.md +349 -0
- package/commands/grd/research-phase.md +200 -0
- package/commands/grd/research.md +681 -0
- package/commands/grd/resume-work.md +40 -0
- package/commands/grd/set-profile.md +106 -0
- package/commands/grd/settings.md +136 -0
- package/commands/grd/update.md +172 -0
- package/commands/grd/verify-work.md +219 -0
- package/get-research-done/config/default.json +15 -0
- package/get-research-done/references/checkpoints.md +1078 -0
- package/get-research-done/references/continuation-format.md +249 -0
- package/get-research-done/references/git-integration.md +254 -0
- package/get-research-done/references/model-profiles.md +73 -0
- package/get-research-done/references/planning-config.md +94 -0
- package/get-research-done/references/questioning.md +141 -0
- package/get-research-done/references/tdd.md +263 -0
- package/get-research-done/references/ui-brand.md +160 -0
- package/get-research-done/references/verification-patterns.md +612 -0
- package/get-research-done/templates/DEBUG.md +159 -0
- package/get-research-done/templates/UAT.md +247 -0
- package/get-research-done/templates/archive-reason.md +195 -0
- package/get-research-done/templates/codebase/architecture.md +255 -0
- package/get-research-done/templates/codebase/concerns.md +310 -0
- package/get-research-done/templates/codebase/conventions.md +307 -0
- package/get-research-done/templates/codebase/integrations.md +280 -0
- package/get-research-done/templates/codebase/stack.md +186 -0
- package/get-research-done/templates/codebase/structure.md +285 -0
- package/get-research-done/templates/codebase/testing.md +480 -0
- package/get-research-done/templates/config.json +35 -0
- package/get-research-done/templates/context.md +283 -0
- package/get-research-done/templates/continue-here.md +78 -0
- package/get-research-done/templates/critic-log.md +288 -0
- package/get-research-done/templates/data-report.md +173 -0
- package/get-research-done/templates/debug-subagent-prompt.md +91 -0
- package/get-research-done/templates/decision-log.md +58 -0
- package/get-research-done/templates/decision.md +138 -0
- package/get-research-done/templates/discovery.md +146 -0
- package/get-research-done/templates/experiment-readme.md +104 -0
- package/get-research-done/templates/graduated-script.md +180 -0
- package/get-research-done/templates/iteration-summary.md +234 -0
- package/get-research-done/templates/milestone-archive.md +123 -0
- package/get-research-done/templates/milestone.md +115 -0
- package/get-research-done/templates/objective.md +271 -0
- package/get-research-done/templates/phase-prompt.md +567 -0
- package/get-research-done/templates/planner-subagent-prompt.md +117 -0
- package/get-research-done/templates/project.md +184 -0
- package/get-research-done/templates/requirements.md +231 -0
- package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
- package/get-research-done/templates/research-project/FEATURES.md +147 -0
- package/get-research-done/templates/research-project/PITFALLS.md +200 -0
- package/get-research-done/templates/research-project/STACK.md +120 -0
- package/get-research-done/templates/research-project/SUMMARY.md +170 -0
- package/get-research-done/templates/research.md +529 -0
- package/get-research-done/templates/roadmap.md +202 -0
- package/get-research-done/templates/scorecard.json +113 -0
- package/get-research-done/templates/state.md +287 -0
- package/get-research-done/templates/summary.md +246 -0
- package/get-research-done/templates/user-setup.md +311 -0
- package/get-research-done/templates/verification-report.md +322 -0
- package/get-research-done/workflows/complete-milestone.md +756 -0
- package/get-research-done/workflows/diagnose-issues.md +231 -0
- package/get-research-done/workflows/discovery-phase.md +289 -0
- package/get-research-done/workflows/discuss-phase.md +433 -0
- package/get-research-done/workflows/execute-phase.md +657 -0
- package/get-research-done/workflows/execute-plan.md +1844 -0
- package/get-research-done/workflows/list-phase-assumptions.md +178 -0
- package/get-research-done/workflows/map-codebase.md +322 -0
- package/get-research-done/workflows/resume-project.md +307 -0
- package/get-research-done/workflows/transition.md +556 -0
- package/get-research-done/workflows/verify-phase.md +628 -0
- package/get-research-done/workflows/verify-work.md +596 -0
- package/hooks/dist/grd-check-update.js +61 -0
- package/hooks/dist/grd-statusline.js +84 -0
- package/package.json +47 -0
- package/scripts/audit-help-commands.sh +115 -0
- package/scripts/build-hooks.js +42 -0
- package/scripts/verify-all-commands.sh +246 -0
- package/scripts/verify-architect-warning.sh +35 -0
- package/scripts/verify-insights-mode.sh +40 -0
- package/scripts/verify-quick-mode.sh +20 -0
- package/scripts/verify-revise-data-routing.sh +139 -0
|
@@ -0,0 +1,202 @@
|
|
|
1
|
+
# Roadmap Template
|
|
2
|
+
|
|
3
|
+
Template for `.planning/ROADMAP.md`.
|
|
4
|
+
|
|
5
|
+
## Initial Roadmap (v1.0 Greenfield)
|
|
6
|
+
|
|
7
|
+
```markdown
|
|
8
|
+
# Roadmap: [Project Name]
|
|
9
|
+
|
|
10
|
+
## Overview
|
|
11
|
+
|
|
12
|
+
[One paragraph describing the journey from start to finish]
|
|
13
|
+
|
|
14
|
+
## Phases
|
|
15
|
+
|
|
16
|
+
**Phase Numbering:**
|
|
17
|
+
- Integer phases (1, 2, 3): Planned milestone work
|
|
18
|
+
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
|
|
19
|
+
|
|
20
|
+
Decimal phases appear between their surrounding integers in numeric order.
|
|
21
|
+
|
|
22
|
+
- [ ] **Phase 1: [Name]** - [One-line description]
|
|
23
|
+
- [ ] **Phase 2: [Name]** - [One-line description]
|
|
24
|
+
- [ ] **Phase 3: [Name]** - [One-line description]
|
|
25
|
+
- [ ] **Phase 4: [Name]** - [One-line description]
|
|
26
|
+
|
|
27
|
+
## Phase Details
|
|
28
|
+
|
|
29
|
+
### Phase 1: [Name]
|
|
30
|
+
**Goal**: [What this phase delivers]
|
|
31
|
+
**Depends on**: Nothing (first phase)
|
|
32
|
+
**Requirements**: [REQ-01, REQ-02, REQ-03]
|
|
33
|
+
**Success Criteria** (what must be TRUE):
|
|
34
|
+
1. [Observable behavior from user perspective]
|
|
35
|
+
2. [Observable behavior from user perspective]
|
|
36
|
+
3. [Observable behavior from user perspective]
|
|
37
|
+
**Plans**: [Number of plans, e.g., "3 plans" or "TBD"]
|
|
38
|
+
|
|
39
|
+
Plans:
|
|
40
|
+
- [ ] 01-01: [Brief description of first plan]
|
|
41
|
+
- [ ] 01-02: [Brief description of second plan]
|
|
42
|
+
- [ ] 01-03: [Brief description of third plan]
|
|
43
|
+
|
|
44
|
+
### Phase 2: [Name]
|
|
45
|
+
**Goal**: [What this phase delivers]
|
|
46
|
+
**Depends on**: Phase 1
|
|
47
|
+
**Requirements**: [REQ-04, REQ-05]
|
|
48
|
+
**Success Criteria** (what must be TRUE):
|
|
49
|
+
1. [Observable behavior from user perspective]
|
|
50
|
+
2. [Observable behavior from user perspective]
|
|
51
|
+
**Plans**: [Number of plans]
|
|
52
|
+
|
|
53
|
+
Plans:
|
|
54
|
+
- [ ] 02-01: [Brief description]
|
|
55
|
+
- [ ] 02-02: [Brief description]
|
|
56
|
+
|
|
57
|
+
### Phase 2.1: Critical Fix (INSERTED)
|
|
58
|
+
**Goal**: [Urgent work inserted between phases]
|
|
59
|
+
**Depends on**: Phase 2
|
|
60
|
+
**Success Criteria** (what must be TRUE):
|
|
61
|
+
1. [What the fix achieves]
|
|
62
|
+
**Plans**: 1 plan
|
|
63
|
+
|
|
64
|
+
Plans:
|
|
65
|
+
- [ ] 02.1-01: [Description]
|
|
66
|
+
|
|
67
|
+
### Phase 3: [Name]
|
|
68
|
+
**Goal**: [What this phase delivers]
|
|
69
|
+
**Depends on**: Phase 2
|
|
70
|
+
**Requirements**: [REQ-06, REQ-07, REQ-08]
|
|
71
|
+
**Success Criteria** (what must be TRUE):
|
|
72
|
+
1. [Observable behavior from user perspective]
|
|
73
|
+
2. [Observable behavior from user perspective]
|
|
74
|
+
3. [Observable behavior from user perspective]
|
|
75
|
+
**Plans**: [Number of plans]
|
|
76
|
+
|
|
77
|
+
Plans:
|
|
78
|
+
- [ ] 03-01: [Brief description]
|
|
79
|
+
- [ ] 03-02: [Brief description]
|
|
80
|
+
|
|
81
|
+
### Phase 4: [Name]
|
|
82
|
+
**Goal**: [What this phase delivers]
|
|
83
|
+
**Depends on**: Phase 3
|
|
84
|
+
**Requirements**: [REQ-09, REQ-10]
|
|
85
|
+
**Success Criteria** (what must be TRUE):
|
|
86
|
+
1. [Observable behavior from user perspective]
|
|
87
|
+
2. [Observable behavior from user perspective]
|
|
88
|
+
**Plans**: [Number of plans]
|
|
89
|
+
|
|
90
|
+
Plans:
|
|
91
|
+
- [ ] 04-01: [Brief description]
|
|
92
|
+
|
|
93
|
+
## Progress
|
|
94
|
+
|
|
95
|
+
**Execution Order:**
|
|
96
|
+
Phases execute in numeric order: 2 → 2.1 → 2.2 → 3 → 3.1 → 4
|
|
97
|
+
|
|
98
|
+
| Phase | Plans Complete | Status | Completed |
|
|
99
|
+
|-------|----------------|--------|-----------|
|
|
100
|
+
| 1. [Name] | 0/3 | Not started | - |
|
|
101
|
+
| 2. [Name] | 0/2 | Not started | - |
|
|
102
|
+
| 3. [Name] | 0/2 | Not started | - |
|
|
103
|
+
| 4. [Name] | 0/1 | Not started | - |
|
|
104
|
+
```
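The ordering rule in the template above (decimal phases slot between their surrounding integers) can be sketched in Python. This is a minimal illustration, not part of the package: `phase_key` and `sort_phases` are hypothetical helpers, and the sketch assumes phase labels are strings such as `"2"` or `"2.1"` with at most one decimal part.

```python
def phase_key(label: str) -> tuple[int, int]:
    """Split a phase label like "2.1" into (major, minor) integers."""
    major, _, minor = label.partition(".")
    # Integer phases have no minor part; treat it as 0 so "2" sorts before "2.1".
    return int(major), int(minor or 0)


def sort_phases(labels: list[str]) -> list[str]:
    """Return phase labels in execution order (hypothetical helper)."""
    return sorted(labels, key=phase_key)


print(" → ".join(sort_phases(["3", "2.1", "4", "2", "3.1", "2.2"])))
# prints: 2 → 2.1 → 2.2 → 3 → 3.1 → 4
```

Comparing `(major, minor)` tuples rather than floats keeps a label such as `2.10` after `2.9`, which a float comparison would get wrong.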
|
|
105
|
+
|
|
106
|
+
<guidelines>
|
|
107
|
+
**Initial planning (v1.0):**
|
|
108
|
+
- Phase count depends on depth setting (quick: 3-5, standard: 5-8, comprehensive: 8-12)
|
|
109
|
+
- Each phase delivers something coherent
|
|
110
|
+
- Phases can have 1+ plans (split if >3 tasks or multiple subsystems)
|
|
111
|
+
- Plans use naming: {phase}-{plan}-PLAN.md (e.g., 01-02-PLAN.md)
|
|
112
|
+
- No time estimates (this isn't enterprise PM)
|
|
113
|
+
- Progress table updated by execute workflow
|
|
114
|
+
- Plan count can be "TBD" initially, refined during planning
|
|
115
|
+
|
|
116
|
+
**Success criteria:**
|
|
117
|
+
- 2-5 observable behaviors per phase (from user's perspective)
|
|
118
|
+
- Cross-checked against requirements during roadmap creation
|
|
119
|
+
- Flow downstream to `must_haves` in plan-phase
|
|
120
|
+
- Verified by verify-phase after execution
|
|
121
|
+
- Format: "User can [action]" or "[Thing] works/exists"
|
|
122
|
+
|
|
123
|
+
**After milestones ship:**
|
|
124
|
+
- Collapse completed milestones in `<details>` tags
|
|
125
|
+
- Add new milestone sections for upcoming work
|
|
126
|
+
- Keep continuous phase numbering (never restart at 01)
|
|
127
|
+
</guidelines>
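The plan naming convention in the guidelines (`{phase}-{plan}-PLAN.md`) could be generated as in the sketch below. `plan_filename` is a hypothetical helper, not something the package ships. The zero-padded form for decimal phases (`02.1-01`) is taken from the plan IDs in the template; applying it to file names is an assumption.

```python
def plan_filename(phase: str, plan: int) -> str:
    """Build a plan file name such as 01-02-PLAN.md (hypothetical helper)."""
    major, dot, minor = phase.partition(".")
    # Zero-pad the integer part only; a decimal suffix like ".1" is kept as-is.
    phase_part = f"{int(major):02d}{dot}{minor}"
    return f"{phase_part}-{plan:02d}-PLAN.md"


print(plan_filename("1", 2))    # prints: 01-02-PLAN.md
print(plan_filename("2.1", 1))  # prints: 02.1-01-PLAN.md (assumed form for inserted phases)
```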
|
|
128
|
+
|
|
129
|
+
<status_values>
|
|
130
|
+
- `Not started` - Haven't begun
|
|
131
|
+
- `In progress` - Currently working
|
|
132
|
+
- `Complete` - Done (add completion date)
|
|
133
|
+
- `Deferred` - Pushed to later (with reason)
|
|
134
|
+
</status_values>
|
|
135
|
+
|
|
136
|
+
## Milestone-Grouped Roadmap (After v1.0 Ships)
|
|
137
|
+
|
|
138
|
+
After completing the first milestone, reorganize the roadmap into milestone groupings:
|
|
139
|
+
|
|
140
|
+
```markdown
|
|
141
|
+
# Roadmap: [Project Name]
|
|
142
|
+
|
|
143
|
+
## Milestones
|
|
144
|
+
|
|
145
|
+
- ✅ **v1.0 MVP** - Phases 1-4 (shipped YYYY-MM-DD)
|
|
146
|
+
- 🚧 **v1.1 [Name]** - Phases 5-6 (in progress)
|
|
147
|
+
- 📋 **v2.0 [Name]** - Phases 7-10 (planned)
|
|
148
|
+
|
|
149
|
+
## Phases
|
|
150
|
+
|
|
151
|
+
<details>
|
|
152
|
+
<summary>✅ v1.0 MVP (Phases 1-4) - SHIPPED YYYY-MM-DD</summary>
|
|
153
|
+
|
|
154
|
+
### Phase 1: [Name]
|
|
155
|
+
**Goal**: [What this phase delivers]
|
|
156
|
+
**Plans**: 3 plans
|
|
157
|
+
|
|
158
|
+
Plans:
|
|
159
|
+
- [x] 01-01: [Brief description]
|
|
160
|
+
- [x] 01-02: [Brief description]
|
|
161
|
+
- [x] 01-03: [Brief description]
|
|
162
|
+
|
|
163
|
+
[... remaining v1.0 phases ...]
|
|
164
|
+
|
|
165
|
+
</details>
|
|
166
|
+
|
|
167
|
+
### 🚧 v1.1 [Name] (In Progress)
|
|
168
|
+
|
|
169
|
+
**Milestone Goal:** [What v1.1 delivers]
|
|
170
|
+
|
|
171
|
+
#### Phase 5: [Name]
|
|
172
|
+
**Goal**: [What this phase delivers]
|
|
173
|
+
**Depends on**: Phase 4
|
|
174
|
+
**Plans**: 2 plans
|
|
175
|
+
|
|
176
|
+
Plans:
|
|
177
|
+
- [ ] 05-01: [Brief description]
|
|
178
|
+
- [ ] 05-02: [Brief description]
|
|
179
|
+
|
|
180
|
+
[... remaining v1.1 phases ...]
|
|
181
|
+
|
|
182
|
+
### 📋 v2.0 [Name] (Planned)
|
|
183
|
+
|
|
184
|
+
**Milestone Goal:** [What v2.0 delivers]
|
|
185
|
+
|
|
186
|
+
[... v2.0 phases ...]
|
|
187
|
+
|
|
188
|
+
## Progress
|
|
189
|
+
|
|
190
|
+
| Phase | Milestone | Plans Complete | Status | Completed |
|
|
191
|
+
|-------|-----------|----------------|--------|-----------|
|
|
192
|
+
| 1. Foundation | v1.0 | 3/3 | Complete | YYYY-MM-DD |
|
|
193
|
+
| 2. Features | v1.0 | 2/2 | Complete | YYYY-MM-DD |
|
|
194
|
+
| 5. Security | v1.1 | 0/2 | Not started | - |
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
**Notes:**
|
|
198
|
+
- Milestone emoji: ✅ shipped, 🚧 in progress, 📋 planned
|
|
199
|
+
- Completed milestones collapsed in `<details>` for readability
|
|
200
|
+
- Current/future milestones expanded
|
|
201
|
+
- Continuous phase numbering (01-99)
|
|
202
|
+
- Progress table includes milestone column
|
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
{
|
|
2
|
+
"$schema": "http://json-schema.org/draft-07/schema#",
|
|
3
|
+
"$id": "grd-scorecard-v1",
|
|
4
|
+
"title": "GRD Experiment Scorecard",
|
|
5
|
+
"description": "Quantitative evaluation results for a validated experiment run",
|
|
6
|
+
|
|
7
|
+
"run_id": "{{run_NNN_description}}",
|
|
8
|
+
"timestamp": "{{ISO8601_timestamp}}",
|
|
9
|
+
"objective_ref": ".planning/OBJECTIVE.md",
|
|
10
|
+
"hypothesis": "{{brief_hypothesis_statement}}",
|
|
11
|
+
"iteration": "{{iteration_number}}",
|
|
12
|
+
"data_version": "{{sha256_hash_of_data}}",
|
|
13
|
+
|
|
14
|
+
"evaluation": {
|
|
15
|
+
"strategy": "{{k-fold|stratified-k-fold|time-series-split|holdout}}",
|
|
16
|
+
"k": "{{number_of_folds_or_null}}",
|
|
17
|
+
"test_size": "{{proportion_or_null}}",
|
|
18
|
+
"random_state": 42,
|
|
19
|
+
"folds_completed": "{{number_of_folds_executed}}"
|
|
20
|
+
},
|
|
21
|
+
|
|
22
|
+
"metrics": {
|
|
23
|
+
"{{metric_name}}": {
|
|
24
|
+
"mean": "{{float_mean_across_folds}}",
|
|
25
|
+
"std": "{{float_std_across_folds}}",
|
|
26
|
+
"per_fold": ["{{fold_1_value}}", "{{fold_2_value}}", "..."],
|
|
27
|
+
"threshold": "{{threshold_from_objective}}",
|
|
28
|
+
"comparison": "{{>=|<=|==}}",
|
|
29
|
+
"weight": "{{0.0-1.0_from_objective}}",
|
|
30
|
+
"result": "{{PASS|FAIL}}"
|
|
31
|
+
},
|
|
32
|
+
"{{additional_metrics}}": {
|
|
33
|
+
"...": "..."
|
|
34
|
+
}
|
|
35
|
+
},
|
|
36
|
+
|
|
37
|
+
"composite_score": "{{weighted_average_of_all_metrics}}",
|
|
38
|
+
"composite_threshold": "{{from_objective_or_default_0.5}}",
|
|
39
|
+
"overall_result": "{{PASS|FAIL}}",
|
|
40
|
+
|
|
41
|
+
"baseline_comparison": {
|
|
42
|
+
"experiment_score": "{{weighted_composite_score}}",
|
|
43
|
+
"baselines": [
|
|
44
|
+
{
|
|
45
|
+
"name": "{{primary_baseline_name}}",
|
|
46
|
+
"type": "primary",
|
|
47
|
+
"source": "{{own_implementation|literature_citation}}",
|
|
48
|
+
"score": "{{baseline_composite_score}}",
|
|
49
|
+
"experiment_score": "{{experiment_composite_score}}",
|
|
50
|
+
"improvement": "{{float_experiment_minus_baseline}}",
|
|
51
|
+
"improvement_pct": "{{percentage_string}}",
|
|
52
|
+
"significant": "{{true|false|not_tested}}",
|
|
53
|
+
"run_path": "{{experiments/run_NNN_baseline/}}",
|
|
54
|
+
"note": "{{optional_note_for_literature_baselines}}"
|
|
55
|
+
},
|
|
56
|
+
{
|
|
57
|
+
"name": "{{secondary_baseline_name}}",
|
|
58
|
+
"type": "secondary",
|
|
59
|
+
"source": "{{own_implementation|literature_citation}}",
|
|
60
|
+
"score": "{{baseline_composite_score}}",
|
|
61
|
+
"improvement": "{{float_improvement}}",
|
|
62
|
+
"improvement_pct": "{{percentage_string}}",
|
|
63
|
+
"significant": "{{true|false|not_tested}}",
|
|
64
|
+
"run_path": "{{experiments/run_NNN_secondary/|null_if_literature}}"
|
|
65
|
+
}
|
|
66
|
+
],
|
|
67
|
+
"primary_baseline": "{{primary_baseline_name_or_null}}",
|
|
68
|
+
"secondary_baselines": ["{{secondary_name_1}}", "{{secondary_name_2}}"],
|
|
69
|
+
"warnings": ["{{warning_if_any_baseline_unavailable}}"]
|
|
70
|
+
},
|
|
71
|
+
|
|
72
|
+
"baseline_validation": {
|
|
73
|
+
"researcher_validated": "{{true|false}}",
|
|
74
|
+
"evaluator_validated": "{{true|false}}",
|
|
75
|
+
"validation_skipped": "{{true_if_skip_baseline_used|false}}",
|
|
76
|
+
"data_hash_match": "{{true_if_same_data|false_with_warning}}",
|
|
77
|
+
"notes": ["{{validation_notes_if_any}}"]
|
|
78
|
+
},
|
|
79
|
+
|
|
80
|
+
"confidence_interval": {
|
|
81
|
+
"composite_lower": "{{float_lower_bound}}",
|
|
82
|
+
"composite_upper": "{{float_upper_bound}}",
|
|
83
|
+
"confidence_level": 0.95,
|
|
84
|
+
"method": "{{bootstrap|t_distribution}}"
|
|
85
|
+
},
|
|
86
|
+
|
|
87
|
+
"provenance": {
|
|
88
|
+
"code_snapshot": "experiments/{{run_id}}/code/",
|
|
89
|
+
"config_file": "experiments/{{run_id}}/config.yaml",
|
|
90
|
+
"logs": "experiments/{{run_id}}/logs/",
|
|
91
|
+
"outputs": "experiments/{{run_id}}/outputs/"
|
|
92
|
+
},
|
|
93
|
+
|
|
94
|
+
"critic_summary": {
|
|
95
|
+
"verdict": "PROCEED",
|
|
96
|
+
"confidence": "{{HIGH|MEDIUM|LOW}}",
|
|
97
|
+
"log_path": "experiments/{{run_id}}/CRITIC_LOG.md"
|
|
98
|
+
},
|
|
99
|
+
|
|
100
|
+
"ready_for_human_review": true,
|
|
101
|
+
"next_phase": "Phase 5: Human Evaluation Gate",
|
|
102
|
+
|
|
103
|
+
"_notes": {
|
|
104
|
+
"description": "This template shows the structure and placeholder format for SCORECARD.json",
|
|
105
|
+
"usage": "grd-evaluator agent populates this template with actual evaluation results",
|
|
106
|
+
"weights_constraint": "All metric weights must sum to 1.0",
|
|
107
|
+
"baseline_structure": "baselines array supports multiple comparisons: first is primary (required), rest are secondary (optional)",
|
|
108
|
+
"baseline_types": "primary = required for experiment to proceed; secondary = optional additional comparisons",
|
|
109
|
+
"baseline_validation": "Tracks both Researcher (start) and Evaluator (end) validation states",
|
|
110
|
+
"mlflow_integration": "These fields are also logged to MLflow if available",
|
|
111
|
+
"phase_5_requirement": "ready_for_human_review: true signals Phase 5 can proceed"
|
|
112
|
+
}
|
|
113
|
+
}
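The scorecard template describes `composite_score` as a weighted average of the metrics, with weights that must sum to 1.0. The sketch below shows one way those fields could be filled in. It is an illustration under stated assumptions, not the `grd-evaluator` implementation: `score_metric` and `composite` are hypothetical names, the sample standard deviation is an assumed choice for `std`, `overall_result` is assumed to be `PASS` when the composite meets `composite_threshold`, and all metrics are assumed to share a comparable higher-is-better scale.

```python
import math
import operator
import statistics

# Map the template's comparison strings to functions; "==" uses a tolerance.
COMPARATORS = {">=": operator.ge, "<=": operator.le, "==": math.isclose}


def score_metric(per_fold, threshold, comparison, weight):
    """Summarize one metric across folds and compare its mean to the threshold."""
    mean = statistics.mean(per_fold)
    # Assumption: sample standard deviation; a single fold has no spread.
    std = statistics.stdev(per_fold) if len(per_fold) > 1 else 0.0
    passed = COMPARATORS[comparison](mean, threshold)
    return {
        "mean": mean,
        "std": std,
        "per_fold": list(per_fold),
        "threshold": threshold,
        "comparison": comparison,
        "weight": weight,
        "result": "PASS" if passed else "FAIL",
    }


def composite(metrics, composite_threshold=0.5):
    """Weighted average of metric means; weights must sum to 1.0."""
    total_weight = sum(m["weight"] for m in metrics.values())
    if not math.isclose(total_weight, 1.0):
        raise ValueError(f"metric weights sum to {total_weight}, expected 1.0")
    score = sum(m["mean"] * m["weight"] for m in metrics.values())
    # Assumption: the overall result depends only on the composite threshold.
    return score, "PASS" if score >= composite_threshold else "FAIL"


# Illustrative values only, not results from any real run.
metrics = {
    "accuracy": score_metric([0.84, 0.86, 0.85], 0.80, ">=", 0.6),
    "f1": score_metric([0.78, 0.80, 0.79], 0.75, ">=", 0.4),
}
score, overall = composite(metrics)
print(round(score, 3), overall)
```

Rejecting weight sets that do not sum to 1.0 up front keeps the composite directly comparable to the thresholds defined in the objective.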
|
|
@@ -0,0 +1,287 @@
|
|
|
1
|
+
# State Template
|
|
2
|
+
|
|
3
|
+
<!-- STATE.md template v2.0 - GRD research loop tracking -->
|
|
4
|
+
|
|
5
|
+
Template for `.planning/STATE.md` — the project's living memory.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## File Template
|
|
10
|
+
|
|
11
|
+
```markdown
|
|
12
|
+
# Project State
|
|
13
|
+
|
|
14
|
+
## Project Reference
|
|
15
|
+
|
|
16
|
+
See: .planning/PROJECT.md (updated [date])
|
|
17
|
+
|
|
18
|
+
**Core value:** [One-liner from PROJECT.md Core Value section]
|
|
19
|
+
**Current focus:** [Current phase name]
|
|
20
|
+
|
|
21
|
+
## Current Position
|
|
22
|
+
|
|
23
|
+
Phase: [X] of [Y] ([Phase name])
|
|
24
|
+
Plan: [A] of [B] in current phase
|
|
25
|
+
Status: [Ready to plan / Planning / Ready to execute / In progress / Phase complete]
|
|
26
|
+
Last activity: [YYYY-MM-DD] — [What happened]
|
|
27
|
+
|
|
28
|
+
Progress: [░░░░░░░░░░] 0%
|
|
29
|
+
|
|
30
|
+
## Research Loop State
|
|
31
|
+
|
|
32
|
+
**Active Hypothesis:** {{hypothesis_id_or_none}}
|
|
33
|
+
**Objective:** {{brief_hypothesis_statement}}
|
|
34
|
+
**Status:** {{not_started|in_progress|pending_review|archived}}
|
|
35
|
+
|
|
36
|
+
### Current Iteration
|
|
37
|
+
|
|
38
|
+
- **Iteration:** {{N}} of {{limit}} (default limit: 5)
|
|
39
|
+
- **Current Run:** experiments/{{run_NNN_description}}
|
|
40
|
+
- **Phase:** {{researcher|critic|evaluator|human_review}}
|
|
41
|
+
- **Data Revisions:** {{data_revision_count}} of {{data_revision_limit}} (default limit: 2)
|
|
42
|
+
|
|
43
|
+
### Loop History
|
|
44
|
+
|
|
45
|
+
| Iteration | Run | Verdict | Confidence | Metrics Summary |
|
|
46
|
+
|-----------|-----|---------|------------|-----------------|
|
|
47
|
+
| 1 | run_001_baseline | REVISE_METHOD | MEDIUM | acc=0.72 |
|
|
48
|
+
| 2 | run_002_tuned | PROCEED | HIGH | acc=0.85 |
|
|
49
|
+
|
|
50
|
+
### Verdict Trend
|
|
51
|
+
|
|
52
|
+
- **Pattern:** {{improving|stagnant|degrading|mixed}}
|
|
53
|
+
- **Consecutive same verdicts:** {{N}}
|
|
54
|
+
- **Last 3 verdicts:** {{verdict1, verdict2, verdict3}}
|
|
55
|
+
|
|
56
|
+
### Human Decisions
|
|
57
|
+
|
|
58
|
+
| Timestamp | Decision | Rationale |
|
|
59
|
+
|-----------|----------|-----------|
|
|
60
|
+
| {{timestamp}} | {{Continue|Archive|Reset|Escalate}} | {{user_rationale}} |
|
|
61
|
+
|
|
62
|
+
### Data Revisions
|
|
63
|
+
|
|
64
|
+
Track REVISE_DATA cycles within current hypothesis:
|
|
65
|
+
|
|
66
|
+
| Iteration | Concerns | Explorer Result | Action Taken |
|
|
67
|
+
|-----------|----------|-----------------|--------------|
|
|
68
|
+
| {{N}} | {{concern_list}} | {{result_summary}} | {{action}} |
|
|
69
|
+
|
|
70
|
+
**Data Revision Limits:**
|
|
71
|
+
- Current count: {{data_revision_count}} of {{data_revision_limit}}
|
|
72
|
+
- If limit reached: Escalate to human (data quality may be insufficient for hypothesis)
|
|
73
|
+
|
|
74
|
+
## Performance Metrics
|
|
75
|
+
|
|
76
|
+
**Velocity:**
|
|
77
|
+
- Total plans completed: [N]
|
|
78
|
+
- Average duration: [X] min
|
|
79
|
+
- Total execution time: [X.X] hours
|
|
80
|
+
|
|
81
|
+
**By Phase:**
|
|
82
|
+
|
|
83
|
+
| Phase | Plans | Total | Avg/Plan |
|
|
84
|
+
|-------|-------|-------|----------|
|
|
85
|
+
| - | - | - | - |
|
|
86
|
+
|
|
87
|
+
**Recent Trend:**
|
|
88
|
+
- Last 5 plans: [durations]
|
|
89
|
+
- Trend: [Improving / Stable / Degrading]
|
|
90
|
+
|
|
91
|
+
*Updated after each plan completion*
|
|
92
|
+
|
|
93
|
+
## Accumulated Context
|
|
94
|
+
|
|
95
|
+
### Decisions
|
|
96
|
+
|
|
97
|
+
Decisions are logged in PROJECT.md Key Decisions table.
|
|
98
|
+
Recent decisions affecting current work:
|
|
99
|
+
|
|
100
|
+
- [Phase X]: [Decision summary]
|
|
101
|
+
- [Phase Y]: [Decision summary]
|
|
102
|
+
|
|
103
|
+
### Research Decisions
|
|
104
|
+
|
|
105
|
+
| Decision | Iteration | Impact |
|
|
106
|
+
|----------|-----------|--------|
|
|
107
|
+
| {{decision_description}} | {{N}} | {{what_changed}} |
|
|
108
|
+
|
|
109
|
+
### Pending Todos
|
|
110
|
+
|
|
111
|
+
[From .planning/todos/pending/ — ideas captured during sessions]
|
|
112
|
+
|
|
113
|
+
None yet.
|
|
114
|
+
|
|
115
|
+
### Blockers/Concerns
|
|
116
|
+
|
|
117
|
+
[Issues that affect future work]
|
|
118
|
+
|
|
119
|
+
None yet.
|
|
120
|
+
|
|
121
|
+
### Research Blockers
|
|
122
|
+
|
|
123
|
+
- **Current:** {{blocker_or_none}}
|
|
124
|
+
- **Requires:** {{human_action|data_fix|method_change}}
|
|
125
|
+
|
|
131
|
+
## Session Continuity
|
|
132
|
+
|
|
133
|
+
Last session: [YYYY-MM-DD HH:MM]
|
|
134
|
+
Stopped at: [Description of last completed action]
|
|
135
|
+
Resume file: [Path to .continue-here*.md if exists, otherwise "None"]
|
|
136
|
+
|
|
137
|
+
## Research Loop History
|
|
138
|
+
|
|
139
|
+
**Active Loop:** [N/A - no active research loop]
|
|
140
|
+
**Loop Status:** [idle/exploring/synthesizing/validating/complete]
|
|
141
|
+
|
|
142
|
+
| Loop | Started | Focus Area | Status | Outcome |
|
|
143
|
+
|------|---------|------------|--------|---------|
|
|
144
|
+
| - | - | - | - | - |
|
|
145
|
+
|
|
146
|
+
**Current Loop Progress:**
|
|
147
|
+
- [ ] Data reconnaissance (Explorer)
|
|
148
|
+
- [ ] Hypothesis synthesis (Architect)
|
|
149
|
+
- [ ] Implementation (Researcher)
|
|
150
|
+
- [ ] Validation (Critic)
|
|
151
|
+
- [ ] Evaluation (Evaluator)
|
|
152
|
+
|
|
153
|
+
**Loop Notes:**
|
|
154
|
+
_Notes from current research iteration appear here_
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
<purpose>
|
|
158
|
+
|
|
159
|
+
STATE.md is the project's short-term memory spanning all phases and sessions.
|
|
160
|
+
|
|
161
|
+
**Problem it solves:** Information is captured in summaries, issues, and decisions but not systematically consumed. Sessions start without context.
|
|
162
|
+
|
|
163
|
+
**Solution:** A single, small file that's:
|
|
164
|
+
- Read first in every workflow
|
|
165
|
+
- Updated after every significant action
|
|
166
|
+
- Contains a digest of accumulated context
|
|
167
|
+
- Enables instant session restoration
|
|
168
|
+
|
|
169
|
+
</purpose>
|
|
170
|
+
|
|
171
|
+
<lifecycle>
|
|
172
|
+
|
|
173
|
+
**Creation:** After ROADMAP.md is created (during init)
|
|
174
|
+
- Reference PROJECT.md (read it for current context)
|
|
175
|
+
- Initialize empty accumulated context sections
|
|
176
|
+
- Set position to "Phase 1 ready to plan"
|
|
177
|
+
|
|
178
|
+
**Reading:** First step of every workflow
|
|
179
|
+
- progress: Present status to user
|
|
180
|
+
- plan: Inform planning decisions
|
|
181
|
+
- execute: Know current position
|
|
182
|
+
- transition: Know what's complete
|
|
183
|
+
|
|
184
|
+
**Writing:** After every significant action
|
|
185
|
+
- execute: After SUMMARY.md created
|
|
186
|
+
  - Update position (phase, plan, status)
|
|
187
|
+
  - Note new decisions (detail in PROJECT.md)
|
|
188
|
+
  - Add blockers/concerns
|
|
189
|
+
- transition: After phase marked complete
|
|
190
|
+
  - Update progress bar
|
|
191
|
+
  - Clear resolved blockers
|
|
192
|
+
  - Refresh Project Reference date
|
|
193
|
+
|
|
194
|
+
</lifecycle>
|
|
195
|
+
|
|
196
|
+
<sections>
|
|
197
|
+
|
|
198
|
+
### Project Reference
|
|
199
|
+
Points to PROJECT.md for full context. Includes:
|
|
200
|
+
- Core value (the ONE thing that matters)
|
|
201
|
+
- Current focus (which phase)
|
|
202
|
+
- Last update date (triggers re-read if stale)
|
|
203
|
+
|
|
204
|
+
Claude reads PROJECT.md directly for requirements, constraints, and decisions.
|
|
205
|
+
|
|
206
|
+
### Current Position
|
|
207
|
+
Where we are right now:
|
|
208
|
+
- Phase X of Y — which phase
|
|
209
|
+
- Plan A of B — which plan within phase
|
|
210
|
+
- Status — current state
|
|
211
|
+
- Last activity — what happened most recently
|
|
212
|
+
- Progress bar — visual indicator of overall completion
|
|
213
|
+
|
|
214
|
+
Progress calculation: (completed plans) / (total plans across all phases) × 100%
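The progress calculation above, together with the ten-character bar from the file template (`[░░░░░░░░░░] 0%`), might be rendered as follows. This is a sketch only: `progress_bar` is a hypothetical helper, and the `█` fill character and the rounding to a whole percent are assumptions, since the template only shows the empty bar.

```python
def progress_bar(completed: int, total: int, width: int = 10) -> str:
    """Render completed/total plans as a fixed-width bar with a percentage."""
    # Guard against a roadmap with no plans yet (plan counts may be "TBD").
    percent = round(completed / total * 100) if total else 0
    filled = percent * width // 100
    return f"[{'█' * filled}{'░' * (width - filled)}] {percent}%"


print(progress_bar(0, 8))  # prints: [░░░░░░░░░░] 0%
print(progress_bar(4, 8))  # prints: [█████░░░░░] 50%
```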
|
|
215
|
+
|
|
216
|
+
### Performance Metrics
|
|
217
|
+
Track velocity to understand execution patterns:
|
|
218
|
+
- Total plans completed
|
|
219
|
+
- Average duration per plan
|
|
220
|
+
- Per-phase breakdown
|
|
221
|
+
- Recent trend (improving/stable/degrading)
|
|
222
|
+
|
|
223
|
+
Updated after each plan completion.
|
|
224
|
+
|
|
225
|
+
### Accumulated Context
|
|
226
|
+
|
|
227
|
+
**Decisions:** Reference to PROJECT.md Key Decisions table, plus recent decisions summary for quick access. Full decision log lives in PROJECT.md.
|
|
228
|
+
|
|
229
|
+
**Pending Todos:** Ideas captured via /grd:add-todo
|
|
230
|
+
- Count of pending todos
|
|
231
|
+
- Reference to .planning/todos/pending/
|
|
232
|
+
- Brief list if few, count if many (e.g., "5 pending todos — see /grd:check-todos")
|
|
233
|
+
|
|
234
|
+
**Blockers/Concerns:** From "Next Phase Readiness" sections
|
|
235
|
+
- Issues that affect future work
|
|
236
|
+
- Prefix with originating phase
|
|
237
|
+
- Cleared when addressed
|
|
238
|
+
|
|
239
|
+
### Session Continuity
|
|
240
|
+
Enables instant resumption:
|
|
241
|
+
- When the last session took place
|
|
242
|
+
- What was completed last
|
|
243
|
+
- Whether a `.continue-here` file exists to resume from
|
|
244
|
+
|
|
245
|
+
### Research Loop History
|
|
246
|
+
Tracks recursive validation cycles (STATE-01 requirement):
|
|
247
|
+
- **Active Loop**: Which research loop is currently running (or N/A)
|
|
248
|
+
- **Loop Status**: Current stage (idle/exploring/synthesizing/validating/complete)
|
|
249
|
+
- **Loop Table**: History of completed and ongoing loops with outcomes
|
|
250
|
+
- **Current Loop Progress**: Checklist tracking which agents have contributed
|
|
251
|
+
- **Loop Notes**: Insights, decisions, and findings from the current iteration
|
|
252
|
+
|
|
253
|
+
When a research loop starts (future phases), this section tracks:
|
|
254
|
+
- Explorer's data reconnaissance
|
|
255
|
+
- Architect's hypothesis synthesis
|
|
256
|
+
- Researcher's implementation
|
|
257
|
+
- Critic's validation challenges
|
|
258
|
+
- Evaluator's metric assessments
|
|
259
|
+
|
|
260
|
+
This enables the recursive "hypothesis → experiment → validate → refine" cycle that distinguishes GRD from linear development workflows.
|
|
261
|
+
|
|
262
|
+
### Data Revisions Table
|
|
263
|
+
|
|
264
|
+
Tracks REVISE_DATA cycles within the current hypothesis:
|
|
265
|
+
- **Iteration**: Which experiment iteration triggered data revision
|
|
266
|
+
- **Concerns**: Summary of data concerns from Critic (truncated)
|
|
267
|
+
- **Explorer Result**: Outcome of re-analysis (addressed, critical issue, etc.)
|
|
268
|
+
- **Action Taken**: What happened next (loop continues, escalated, etc.)
|
|
269
|
+
|
|
270
|
+
Data revisions are tracked separately from method revisions because:
|
|
271
|
+
- Data issues are more fundamental than hyperparameter tuning
|
|
272
|
+
- Lower limit (default 2) prevents infinite data loops
|
|
273
|
+
- Multiple data revisions suggest hypothesis may not be viable with current data
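The separate limits for method and data revisions can be pictured as two counters checked against their own ceilings. The sketch below is an assumption-laden illustration, not the package's routing logic: `LoopState`, `next_action`, and the returned action names are hypothetical, and escalating to a human when the iteration limit is reached is assumed by analogy with the documented data revision behavior. Only the verdict names (`PROCEED`, `REVISE_METHOD`, `REVISE_DATA`) and the default limits (5 and 2) come from the template.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    """Counters mirroring the Current Iteration fields in STATE.md."""
    iteration: int = 0
    iteration_limit: int = 5       # default limit from the template
    data_revision_count: int = 0
    data_revision_limit: int = 2   # default limit from the template


def next_action(state: LoopState, verdict: str) -> str:
    """Pick the next step for a Critic verdict (action names are hypothetical)."""
    if verdict == "PROCEED":
        return "evaluate"
    if verdict == "REVISE_DATA":
        # Data problems get a lower ceiling than method problems.
        if state.data_revision_count >= state.data_revision_limit:
            return "escalate_to_human"
        state.data_revision_count += 1
        return "rerun_explorer"
    if verdict == "REVISE_METHOD":
        # Assumption: hitting the iteration limit also escalates.
        if state.iteration >= state.iteration_limit:
            return "escalate_to_human"
        state.iteration += 1
        return "rerun_researcher"
    raise ValueError(f"unknown verdict: {verdict}")


state = LoopState()
for verdict in ["REVISE_DATA", "REVISE_DATA", "REVISE_DATA"]:
    print(verdict, "->", next_action(state, verdict))
```

With the default limits, the third consecutive `REVISE_DATA` verdict escalates instead of starting another data pass.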
|
|
274
|
+
|
|
275
|
+
</sections>
|
|
276
|
+
|
|
277
|
+
<size_constraint>
|
|
278
|
+
|
|
279
|
+
Keep STATE.md under 100 lines.
|
|
280
|
+
|
|
281
|
+
It's a DIGEST, not an archive. If accumulated context grows too large:
|
|
282
|
+
- Keep only 3-5 recent decisions in summary (full log in PROJECT.md)
|
|
283
|
+
- Keep only active blockers, remove resolved ones
|
|
284
|
+
|
|
285
|
+
The goal is "read once, know where we are" — if it's too long, that fails.
|
|
286
|
+
|
|
287
|
+
</size_constraint>
|