get-research-done 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +560 -0
- package/agents/grd-architect.md +789 -0
- package/agents/grd-codebase-mapper.md +738 -0
- package/agents/grd-critic.md +1065 -0
- package/agents/grd-debugger.md +1203 -0
- package/agents/grd-evaluator.md +948 -0
- package/agents/grd-executor.md +784 -0
- package/agents/grd-explorer.md +2063 -0
- package/agents/grd-graduator.md +484 -0
- package/agents/grd-integration-checker.md +423 -0
- package/agents/grd-phase-researcher.md +641 -0
- package/agents/grd-plan-checker.md +745 -0
- package/agents/grd-planner.md +1386 -0
- package/agents/grd-project-researcher.md +865 -0
- package/agents/grd-research-synthesizer.md +256 -0
- package/agents/grd-researcher.md +2361 -0
- package/agents/grd-roadmapper.md +605 -0
- package/agents/grd-verifier.md +778 -0
- package/bin/install.js +1294 -0
- package/commands/grd/add-phase.md +207 -0
- package/commands/grd/add-todo.md +193 -0
- package/commands/grd/architect.md +283 -0
- package/commands/grd/audit-milestone.md +277 -0
- package/commands/grd/check-todos.md +228 -0
- package/commands/grd/complete-milestone.md +136 -0
- package/commands/grd/debug.md +169 -0
- package/commands/grd/discuss-phase.md +86 -0
- package/commands/grd/evaluate.md +1095 -0
- package/commands/grd/execute-phase.md +339 -0
- package/commands/grd/explore.md +258 -0
- package/commands/grd/graduate.md +323 -0
- package/commands/grd/help.md +482 -0
- package/commands/grd/insert-phase.md +227 -0
- package/commands/grd/insights.md +231 -0
- package/commands/grd/join-discord.md +18 -0
- package/commands/grd/list-phase-assumptions.md +50 -0
- package/commands/grd/map-codebase.md +71 -0
- package/commands/grd/new-milestone.md +721 -0
- package/commands/grd/new-project.md +1008 -0
- package/commands/grd/pause-work.md +134 -0
- package/commands/grd/plan-milestone-gaps.md +295 -0
- package/commands/grd/plan-phase.md +525 -0
- package/commands/grd/progress.md +364 -0
- package/commands/grd/quick-explore.md +236 -0
- package/commands/grd/quick.md +309 -0
- package/commands/grd/remove-phase.md +349 -0
- package/commands/grd/research-phase.md +200 -0
- package/commands/grd/research.md +681 -0
- package/commands/grd/resume-work.md +40 -0
- package/commands/grd/set-profile.md +106 -0
- package/commands/grd/settings.md +136 -0
- package/commands/grd/update.md +172 -0
- package/commands/grd/verify-work.md +219 -0
- package/get-research-done/config/default.json +15 -0
- package/get-research-done/references/checkpoints.md +1078 -0
- package/get-research-done/references/continuation-format.md +249 -0
- package/get-research-done/references/git-integration.md +254 -0
- package/get-research-done/references/model-profiles.md +73 -0
- package/get-research-done/references/planning-config.md +94 -0
- package/get-research-done/references/questioning.md +141 -0
- package/get-research-done/references/tdd.md +263 -0
- package/get-research-done/references/ui-brand.md +160 -0
- package/get-research-done/references/verification-patterns.md +612 -0
- package/get-research-done/templates/DEBUG.md +159 -0
- package/get-research-done/templates/UAT.md +247 -0
- package/get-research-done/templates/archive-reason.md +195 -0
- package/get-research-done/templates/codebase/architecture.md +255 -0
- package/get-research-done/templates/codebase/concerns.md +310 -0
- package/get-research-done/templates/codebase/conventions.md +307 -0
- package/get-research-done/templates/codebase/integrations.md +280 -0
- package/get-research-done/templates/codebase/stack.md +186 -0
- package/get-research-done/templates/codebase/structure.md +285 -0
- package/get-research-done/templates/codebase/testing.md +480 -0
- package/get-research-done/templates/config.json +35 -0
- package/get-research-done/templates/context.md +283 -0
- package/get-research-done/templates/continue-here.md +78 -0
- package/get-research-done/templates/critic-log.md +288 -0
- package/get-research-done/templates/data-report.md +173 -0
- package/get-research-done/templates/debug-subagent-prompt.md +91 -0
- package/get-research-done/templates/decision-log.md +58 -0
- package/get-research-done/templates/decision.md +138 -0
- package/get-research-done/templates/discovery.md +146 -0
- package/get-research-done/templates/experiment-readme.md +104 -0
- package/get-research-done/templates/graduated-script.md +180 -0
- package/get-research-done/templates/iteration-summary.md +234 -0
- package/get-research-done/templates/milestone-archive.md +123 -0
- package/get-research-done/templates/milestone.md +115 -0
- package/get-research-done/templates/objective.md +271 -0
- package/get-research-done/templates/phase-prompt.md +567 -0
- package/get-research-done/templates/planner-subagent-prompt.md +117 -0
- package/get-research-done/templates/project.md +184 -0
- package/get-research-done/templates/requirements.md +231 -0
- package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
- package/get-research-done/templates/research-project/FEATURES.md +147 -0
- package/get-research-done/templates/research-project/PITFALLS.md +200 -0
- package/get-research-done/templates/research-project/STACK.md +120 -0
- package/get-research-done/templates/research-project/SUMMARY.md +170 -0
- package/get-research-done/templates/research.md +529 -0
- package/get-research-done/templates/roadmap.md +202 -0
- package/get-research-done/templates/scorecard.json +113 -0
- package/get-research-done/templates/state.md +287 -0
- package/get-research-done/templates/summary.md +246 -0
- package/get-research-done/templates/user-setup.md +311 -0
- package/get-research-done/templates/verification-report.md +322 -0
- package/get-research-done/workflows/complete-milestone.md +756 -0
- package/get-research-done/workflows/diagnose-issues.md +231 -0
- package/get-research-done/workflows/discovery-phase.md +289 -0
- package/get-research-done/workflows/discuss-phase.md +433 -0
- package/get-research-done/workflows/execute-phase.md +657 -0
- package/get-research-done/workflows/execute-plan.md +1844 -0
- package/get-research-done/workflows/list-phase-assumptions.md +178 -0
- package/get-research-done/workflows/map-codebase.md +322 -0
- package/get-research-done/workflows/resume-project.md +307 -0
- package/get-research-done/workflows/transition.md +556 -0
- package/get-research-done/workflows/verify-phase.md +628 -0
- package/get-research-done/workflows/verify-work.md +596 -0
- package/hooks/dist/grd-check-update.js +61 -0
- package/hooks/dist/grd-statusline.js +84 -0
- package/package.json +47 -0
- package/scripts/audit-help-commands.sh +115 -0
- package/scripts/build-hooks.js +42 -0
- package/scripts/verify-all-commands.sh +246 -0
- package/scripts/verify-architect-warning.sh +35 -0
- package/scripts/verify-insights-mode.sh +40 -0
- package/scripts/verify-quick-mode.sh +20 -0
- package/scripts/verify-revise-data-routing.sh +139 -0
@@ -0,0 +1,173 @@
+# Data Report: {dataset_name}
+
+**Generated:** {timestamp}
+**Source:** {file_path_or_paths}
+**Sampling:** {sampling_note_if_applicable}
+
+## Data Overview
+
+| Metric | Value |
+|--------|-------|
+| Total Rows | {count} |
+| Total Columns | {count} |
+| Memory Usage | {size} |
+| File Format | {format} |
+
+### Column Summary
+
+| Column | Type | Non-Null | Unique | Sample Values |
+|--------|------|----------|--------|---------------|
+| {col} | {dtype} | {count} | {count} | {values} |
+
+## Distributions & Statistics
+
+### Numerical Columns
+
+| Column | Mean | Std | Min | 25% | 50% | 75% | Max |
+|--------|------|-----|-----|-----|-----|-----|-----|
+| {col} | {val} | {val} | {val} | {val} | {val} | {val} | {val} |
+
+### Categorical Columns
+
+| Column | Unique | Top Value | Frequency |
+|--------|--------|-----------|-----------|
+| {col} | {count} | {value} | {count} |
+
+## Missing Data Analysis
+
+| Column | Missing Count | Missing % | Pattern | Confidence |
+|--------|---------------|-----------|---------|------------|
+| {col} | {count} | {pct} | {MCAR/MAR/MNAR} | {HIGH/MEDIUM/LOW} |
+
+**Pattern definitions:**
+- **MCAR** (Missing Completely At Random): Missing values are randomly distributed, no correlation with other features
+- **MAR** (Missing At Random): Missingness depends on observed data (other features)
+- **MNAR** (Missing Not At Random): Missingness depends on the unobserved value itself
+
+## Outlier Detection
+
+### Statistical Outliers
+
+| Column | Method | Outlier Count | % of Total | Severity |
+|--------|--------|---------------|------------|----------|
+| {col} | Z-score (>3) | {count} | {pct} | {severity} |
+| {col} | IQR (1.5x) | {count} | {pct} | {severity} |
+
+**Severity levels:**
+- **LOW**: <1% outliers
+- **MEDIUM**: 1-5% outliers
+- **HIGH**: >5% outliers
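A minimal Python sketch of how the two outlier methods and the severity levels listed above could be computed for a single numeric column. The function names are hypothetical, the use of the population standard deviation is an assumption, and values of exactly 1% or 5% are assigned to MEDIUM; the template does not show how the GRD Explorer Agent actually computes these figures.

```python
import statistics


def outlier_severity(pct: float) -> str:
    """Map an outlier percentage to the template's LOW/MEDIUM/HIGH levels."""
    if pct < 1.0:
        return "LOW"
    if pct <= 5.0:
        return "MEDIUM"
    return "HIGH"


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold (assumed: population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant column: no value deviates from the mean
    return [v for v in values if abs((v - mean) / std) > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def summarize(values, outliers):
    """Build one row of the Statistical Outliers table (count, percentage, severity)."""
    pct = 100.0 * len(outliers) / len(values)
    return {"count": len(outliers), "pct": pct, "severity": outlier_severity(pct)}
```

The two methods can disagree on skewed columns, which is presumably why the template reports one row per method rather than a single merged count.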
+
+### Top Anomalous Values
+
+| Column | Value | Z-score | Reason |
+|--------|-------|---------|--------|
+| {col} | {val} | {z} | {explanation} |
+
+## Class Balance (if target specified)
+
+**Target Variable:** {target_column}
+
+| Class | Count | Percentage |
+|-------|-------|------------|
+| {class} | {count} | {pct} |
+
+**Imbalance Ratio:** {ratio}
+**Severity:** {LOW/MEDIUM/HIGH}
+**Recommendation:** {recommendation}
+
+**Severity thresholds:**
+- **LOW**: Ratio <2:1 (acceptable imbalance)
+- **MEDIUM**: Ratio 2-10:1 (consider resampling or class weighting)
+- **HIGH**: Ratio >10:1 (resampling or specialized techniques required)
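The thresholds above can be sketched in Python as follows. This assumes the imbalance ratio is the largest class count divided by the smallest class count, which the template does not state explicitly; `imbalance_report` is a hypothetical helper name.

```python
from collections import Counter


def imbalance_report(labels):
    """Compute per-class counts, the majority:minority ratio, and its severity."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    majority = max(counts.values())
    minority = min(counts.values())
    ratio = majority / minority  # assumed definition of the imbalance ratio
    if ratio < 2:
        severity = "LOW"
    elif ratio <= 10:
        severity = "MEDIUM"
    else:
        severity = "HIGH"
    return {"counts": dict(counts), "ratio": ratio, "severity": severity}
```

For multi-class targets this compares only the two extreme classes, so a dataset with one rare class is flagged even when the remaining classes are balanced.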
+
+## Data Leakage Analysis
+
+### Feature-Target Correlation
+
+| Feature | Correlation | Risk Level | Confidence | Notes |
+|---------|-------------|------------|------------|-------|
+| {feature} | {corr} | {HIGH/MEDIUM/LOW} | {confidence} | {explanation} |
+
+**Risk thresholds:**
+- **HIGH**: Correlation >0.9 (likely leakage)
+- **MEDIUM**: Correlation 0.7-0.9 (investigate further)
+- **LOW**: Correlation <0.7 (normal relationship)
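A short Python sketch of the risk thresholds above, under two assumptions not stated in the template: the correlation is Pearson's r, and the absolute value is compared so that a strongly negative correlation is flagged as well. Both function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def leakage_risk(corr):
    """Map a feature-target correlation to the template's risk levels."""
    magnitude = abs(corr)  # assumption: the sign is ignored
    if magnitude > 0.9:
        return "HIGH"
    if magnitude >= 0.7:
        return "MEDIUM"
    return "LOW"
```

A HIGH result is a prompt to inspect how the feature was derived, not proof of leakage; a legitimately strong predictor can cross the same threshold.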
+
+### High Feature-Feature Correlations
+
+| Feature 1 | Feature 2 | Correlation | Risk |
+|-----------|-----------|-------------|------|
+| {f1} | {f2} | {corr} | {risk_note} |
+
+**Note:** Correlations >0.95 may indicate redundancy, derived features, or leakage.
+
+### Train-Test Overlap (if multiple files)
+
+| Metric | Value |
+|--------|-------|
+| Overlapping Rows | {count} |
+| Overlap % (Train) | {pct} |
+| Overlap % (Test) | {pct} |
+| Severity | {severity} |
+
+**Severity thresholds:**
+- **LOW**: <1% overlap (minor contamination)
+- **MEDIUM**: 1-5% overlap (significant issue)
+- **HIGH**: >5% overlap (critical—invalidates evaluation)
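One way the overlap table above could be filled in, sketched in Python. Rows are assumed to be hashable tuples compared for exact equality, and severity is assumed to follow the larger of the two percentages; neither choice is specified by the template, and `overlap_report` is a hypothetical name.

```python
def overlap_report(train_rows, test_rows):
    """Count rows shared by train and test and grade the overlap."""
    if not train_rows or not test_rows:
        raise ValueError("both splits must contain at least one row")
    shared = set(train_rows) & set(test_rows)
    # Count every occurrence, so duplicated rows inside a split are included.
    train_hits = sum(1 for row in train_rows if row in shared)
    test_hits = sum(1 for row in test_rows if row in shared)
    pct_train = 100.0 * train_hits / len(train_rows)
    pct_test = 100.0 * test_hits / len(test_rows)
    worst = max(pct_train, pct_test)  # assumption: grade on the worse split
    if worst < 1.0:
        severity = "LOW"
    elif worst <= 5.0:
        severity = "MEDIUM"
    else:
        severity = "HIGH"
    return {
        "overlapping_rows": len(shared),
        "pct_train": pct_train,
        "pct_test": pct_test,
        "severity": severity,
    }
```

Grading on the worse split is deliberately conservative: a handful of shared rows can be a negligible share of a large training set and still a large share of a small test set.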
+
+### Temporal Leakage Indicators
+
+| Issue | Detected | Confidence | Details |
+|-------|----------|------------|---------|
+| Future timestamps in features | {yes/no} | {confidence} | {details} |
+| Train dates after test dates | {yes/no} | {confidence} | {details} |
+| Rolling features computed globally | {unknown} | {confidence} | {details} |
+
+**Common temporal leakage sources:**
+- Features computed using future data (e.g., global mean instead of train-only mean)
+- Target leakage through time-lagged features
+- Features derived from test set statistics
+
+## Recommendations
+
+### Must Address (Blocking)
+
+These issues will cause model failure or produce invalid results:
+
+- [ ] {critical_issue_1}
+- [ ] {critical_issue_2}
+
+**Examples of blocking issues:**
+- High-confidence data leakage (feature correlates >0.95 with target)
+- Train-test overlap >5%
+- Target variable has missing values
+- Features with 100% missing values
+
+### Should Address (Non-blocking)
+
+These issues will reduce model quality but won't break training:
+
+- [ ] {recommended_issue_1}
+- [ ] {recommended_issue_2}
+
+**Examples of non-blocking issues:**
+- Missing data <30% (imputation recommended)
+- High number of outliers (investigate or clip)
+- Class imbalance >5:1 (resampling recommended)
+- Low-variance features (consider removal)
+
+### Notes
+
+{additional_observations}
+
+**Common observations:**
+- High-cardinality categorical features (may need encoding strategy)
+- Skewed distributions (may benefit from transformation)
+- Correlated feature groups (consider dimensionality reduction)
+- Data quality issues (duplicate rows, inconsistent formatting)
+
+---
+
+*Report generated by GRD Explorer Agent*
+*Template: get-research-done/templates/data-report.md*
@@ -0,0 +1,91 @@
+# Debug Subagent Prompt Template
+
+Template for spawning grd-debugger agent. The agent contains all debugging expertise - this template provides problem context only.
+
+---
+
+## Template
+
+```markdown
+<objective>
+Investigate issue: {issue_id}
+
+**Summary:** {issue_summary}
+</objective>
+
+<symptoms>
+expected: {expected}
+actual: {actual}
+errors: {errors}
+reproduction: {reproduction}
+timeline: {timeline}
+</symptoms>
+
+<mode>
+symptoms_prefilled: {true_or_false}
+goal: {find_root_cause_only | find_and_fix}
+</mode>
+
+<debug_file>
+Create: .planning/debug/{slug}.md
+</debug_file>
+```
+
+---
+
+## Placeholders
+
+| Placeholder | Source | Example |
+|-------------|--------|---------|
+| `{issue_id}` | Orchestrator-assigned | `auth-screen-dark` |
+| `{issue_summary}` | User description | `Auth screen is too dark` |
+| `{expected}` | From symptoms | `See logo clearly` |
+| `{actual}` | From symptoms | `Screen is dark` |
+| `{errors}` | From symptoms | `None in console` |
+| `{reproduction}` | From symptoms | `Open /auth page` |
+| `{timeline}` | From symptoms | `After recent deploy` |
+| `{goal}` | Orchestrator sets | `find_and_fix` |
+| `{slug}` | Generated | `auth-screen-dark` |
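As an illustration of how the placeholders in the table above might be substituted before the prompt is handed to the agent, here is a small Python sketch. It uses an abbreviated copy of the template, and `fill_template` is a hypothetical helper; the package does not show how the orchestrator performs this step. Plain string replacement is used rather than `str.format` because the full template contains brace expressions such as `{find_root_cause_only | find_and_fix}` that are not valid format fields.

```python
import re

# Abbreviated copy of the prompt template, for illustration only.
TEMPLATE = """<objective>
Investigate issue: {issue_id}

**Summary:** {issue_summary}
</objective>

<mode>
goal: {goal}
</mode>

<debug_file>
Create: .planning/debug/{slug}.md
</debug_file>"""


def fill_template(template, values):
    """Replace each {name} placeholder and fail loudly if any remain unfilled."""
    filled = template
    for name, value in values.items():
        filled = filled.replace("{" + name + "}", value)
    leftover = re.findall(r"\{[a-z_]+\}", filled)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return filled


filled_template = fill_template(
    TEMPLATE,
    {
        "issue_id": "auth-screen-dark",
        "issue_summary": "Auth screen is too dark",
        "goal": "find_and_fix",
        "slug": "auth-screen-dark",
    },
)
```

Raising on leftover placeholders keeps a half-filled prompt from reaching the subagent, where a literal `{slug}` would otherwise end up in a file path.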
+
+---
+
+## Usage
+
+**From /grd:debug:**
+```python
+Task(
+prompt=filled_template,
+subagent_type="grd-debugger",
+description="Debug {slug}"
+)
+```
+
+**From diagnose-issues (UAT):**
+```python
+Task(prompt=template, subagent_type="grd-debugger", description="Debug UAT-001")
+```
+
+---
+
+## Continuation
+
+For checkpoints, spawn fresh agent with:
+
+```markdown
+<objective>
+Continue debugging {slug}. Evidence is in the debug file.
+</objective>
+
+<prior_state>
+Debug file: @.planning/debug/{slug}.md
+</prior_state>
+
+<checkpoint_response>
+**Type:** {checkpoint_type}
+**Response:** {user_response}
+</checkpoint_response>
+
+<mode>
+goal: {goal}
+</mode>
+```
@@ -0,0 +1,58 @@
+# Decision Log Template
+
+Template for `human_eval/decision_log.md` — central chronological record of all human evaluation decisions.
+
+---
+
+## File Template
+
+```markdown
+# Human Evaluation Decision Log
+
+This log tracks all human evaluation decisions for this research project.
+
+| Timestamp | Run | Decision | Key Metric | Reference |
+|-----------|-----|----------|------------|-----------|
+```
+
+---
+
+## Usage Notes
+
+- **Append-only:** New entries always added at the bottom (chronological order)
+- **Timestamp:** YYYY-MM-DD HH:MM format (local time)
+- **Run:** Run directory name (e.g., run_003_tuned)
+- **Decision:** One of Seal, Iterate, Archive
+- **Key Metric:** Primary metric with value (e.g., F1=0.85)
+- **Reference:** Relative path to run directory
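The append-only behaviour described in these notes could look roughly like the following Python sketch. `append_decision` and its parameters are hypothetical; the real `/grd:evaluate` command is driven by an agent prompt and may write the log differently.

```python
from datetime import datetime
from pathlib import Path

# Header and table structure copied from the file template.
HEADER = (
    "# Human Evaluation Decision Log\n\n"
    "This log tracks all human evaluation decisions for this research project.\n\n"
    "| Timestamp | Run | Decision | Key Metric | Reference |\n"
    "|-----------|-----|----------|------------|-----------|\n"
)
DECISIONS = {"Seal", "Iterate", "Archive"}


def append_decision(log_path, run, decision, key_metric, reference, when=None):
    """Create the log on first use, then append exactly one row at the bottom."""
    if decision not in DECISIONS:
        raise ValueError(f"decision must be one of {sorted(DECISIONS)}")
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(HEADER, encoding="utf-8")
    stamp = (when or datetime.now()).strftime("%Y-%m-%d %H:%M")  # local time
    row = f"| {stamp} | {run} | {decision} | {key_metric} | {reference} |\n"
    with path.open("a", encoding="utf-8") as handle:  # append mode: never rewrites rows
        handle.write(row)
    return row
```

Opening the file in append mode is what enforces the no-modification rule here: existing rows are never read back or rewritten.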
+
+## Example Entries
+
+| Timestamp | Run | Decision | Key Metric | Reference |
+|-----------|-----|----------|------------|-----------|
+| 2026-01-28 16:45 | run_001_initial | Archive | F1=0.65 | experiments/archive/2026-01-28_hypothesis_v1/ |
+| 2026-01-29 10:15 | run_002_baseline | Iterate | F1=0.76 | experiments/run_002_baseline/ |
+| 2026-01-30 14:23 | run_003_tuned | Seal | F1=0.85 | experiments/run_003_tuned/ |
+
+## Navigation
+
+- For full decision context, see DECISION.md in each run directory
+- For archived runs, see ARCHIVE_REASON.md in archive directory
+- Log created automatically on first /grd:evaluate decision
+
+---
+
+## Integration
+
+This template is used by `/grd:evaluate` command in Phase 4 (Decision Logging).
+
+**Creation:**
+- Created automatically when first decision is logged
+- Initialized with header and table structure
+
+**Updates:**
+- Each human decision appends one row
+- Order: chronological (newest at bottom)
+- No deletion or modification of existing entries
+
+**Log references run only, no bidirectional links** — per 05-CONTEXT.md decision
@@ -0,0 +1,138 @@
+# Human Decision Template
+
+Template for per-run decision records in `experiments/run_NNN/DECISION.md`.
+
+---
+
+## File Template
+
+```markdown
+# Human Decision: {{run_name}}
+
+**Timestamp:** {{ISO_8601_timestamp}}
+**Hypothesis:** {{brief_hypothesis_from_objective}}
+**Decision:** {{Seal|Iterate|Archive}}
+**Rationale:** {{user_reasoning_if_provided}}
+
+## Evidence Summary
+
+**Critic Verdict:** {{PROCEED}} (Confidence: {{HIGH|MEDIUM|LOW}})
+**Composite Score:** {{score}} (threshold: {{threshold}})
+**Key Metric:** {{metric_name}}={{value}} (target: {{comparison}}{{threshold}})
+
+## Metrics Detail
+
+| Metric | Value | Threshold | Status |
+|--------|-------|-----------|--------|
+| {{metric_1}} | {{value}} | {{threshold}} | {{PASS|FAIL}} |
+| {{metric_2}} | {{value}} | {{threshold}} | {{PASS|FAIL}} |
+
+## Decision Context
+
+### For Seal
+- Hypothesis validated
+- Ready for production/publication
+- All success criteria met
+
+### For Iterate
+- Continuing experimentation
+- Direction: {{REVISE_METHOD|REVISE_DATA}} (from Critic recommendation)
+- Next focus: {{specific_area}}
+
+### For Archive
+- Hypothesis abandoned
+- Reason: {{user_rationale_required}}
+- Preserved as negative result
+
+---
+
+*Decision recorded: {{ISO_8601_timestamp}}*
+*Run directory: experiments/{{run_name}}/*
+```
+
+---
+
+## Usage Notes
+
+**Field descriptions:**
+
+- **run_name:** Directory name (e.g., run_003_tuned) extracted from experiments/ path
+- **ISO_8601_timestamp:** Format YYYY-MM-DDTHH:MM:SSZ (UTC time)
+- **brief_hypothesis_from_objective:** Extract "what" statement from OBJECTIVE.md (1-2 sentences max)
+- **Decision:** One of: Seal, Iterate, Archive
+- **Rationale:** REQUIRED for Archive, optional for Seal/Iterate
+
+**Evidence Summary fields:**
+
+- **Critic Verdict:** Always "PROCEED" (required to reach human eval)
+- **Confidence:** Extract from CRITIC_LOG.md (HIGH/MEDIUM/LOW)
+- **Composite Score:** Weighted average from SCORECARD.json
+- **threshold:** Overall threshold from OBJECTIVE.md
+- **Key Metric:** Primary metric (highest weight or first in list)
+- **comparison:** Operator (>, <, >=, <=, ==)
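The Evidence Summary fields above can be made concrete with a small Python sketch. The metric dictionaries and their keys (`name`, `value`, `weight`) are hypothetical: the actual SCORECARD.json schema is not shown in this template, and the weights in the usage example are invented for illustration.

```python
import operator

# The five comparison operators listed for the `comparison` field.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def metric_status(value, comparison, threshold):
    """Return PASS or FAIL for one row of the Metrics Detail table."""
    return "PASS" if COMPARATORS[comparison](value, threshold) else "FAIL"


def composite_score(metrics):
    """Weighted average of metric values (assumed key names: value, weight)."""
    total_weight = sum(m["weight"] for m in metrics)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(m["value"] * m["weight"] for m in metrics) / total_weight


def key_metric(metrics):
    """Highest-weight metric; max() keeps the first entry on ties."""
    return max(metrics, key=lambda m: m["weight"])


# Hypothetical scorecard entries; the weights are illustrative only.
metrics = [
    {"name": "f1_score", "value": 0.91, "weight": 0.5},
    {"name": "precision", "value": 0.88, "weight": 0.25},
    {"name": "recall", "value": 0.94, "weight": 0.25},
]
score = composite_score(metrics)
primary = key_metric(metrics)
```

Because `max()` returns the first of several equal-weight entries, `key_metric` also covers the "first in list" fallback when every metric carries the same weight.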
+
+**Metrics Detail table:**
+
+- Pull all metrics from SCORECARD.json
+- Include: name, achieved value, threshold, PASS/FAIL status
+- Order by weight (descending) or as defined in OBJECTIVE.md
+
+**Decision Context sections:**
+
+- Only populate the section matching the decision type
+- For Iterate: include Critic's recommendation if available
+- For Archive: user_rationale is REQUIRED
+
+**Example populated template:**
+
+```markdown
+# Human Decision: run_003_tuned
+
+**Timestamp:** 2026-01-30T14:35:00Z
+**Hypothesis:** Ensemble methods will improve F1 score over single models
+**Decision:** Seal
+**Rationale:** Results demonstrate clear improvement with robust validation
+
+## Evidence Summary
+
+**Critic Verdict:** PROCEED (Confidence: HIGH)
+**Composite Score:** 0.89 (threshold: 0.80)
+**Key Metric:** f1_score=0.91 (target: >=0.85)
+
+## Metrics Detail
+
+| Metric | Value | Threshold | Status |
+|--------|-------|-----------|--------|
+| f1_score | 0.91 | 0.85 | PASS |
+| precision | 0.88 | 0.80 | PASS |
+| recall | 0.94 | 0.80 | PASS |
+
+## Decision Context
+
+### For Seal
+- Hypothesis validated
+- Ready for production/publication
+- All success criteria met
+
+---
+
+*Decision recorded: 2026-01-30T14:35:00Z*
+*Run directory: experiments/run_003_tuned/*
+```
+
+---
+
+## Integration
+
+This template is used by `/grd:evaluate` command in Phase 4 (Decision Logging).
+
+**Inputs:**
+- OBJECTIVE.md (hypothesis, metrics, thresholds)
+- SCORECARD.json (metrics values, composite score)
+- CRITIC_LOG.md (verdict, confidence)
+- User decision (Seal/Iterate/Archive)
+- User rationale (if Archive or provided)
+
+**Output:**
+- experiments/run_NNN/DECISION.md (this template populated)
+- Appended to human_eval/decision_log.md (summary entry)
@@ -0,0 +1,146 @@
+# Discovery Template
+
+Template for `.planning/phases/XX-name/DISCOVERY.md` - shallow research for library/option decisions.
+
+**Purpose:** Answer "which library/option should we use" questions during mandatory discovery in plan-phase.
+
+For deep ecosystem research ("how do experts build this"), use `/grd:research-phase` which produces RESEARCH.md.
+
+---
+
+## File Template
+
+```markdown
+---
+phase: XX-name
+type: discovery
+topic: [discovery-topic]
+---
+
+<session_initialization>
+Before beginning discovery, verify today's date:
+!`date +%Y-%m-%d`
+
+Use this date when searching for "current" or "latest" information.
+Example: If today is 2025-11-22, search for "2025" not "2024".
+</session_initialization>
+
+<discovery_objective>
+Discover [topic] to inform [phase name] implementation.
+
+Purpose: [What decision/implementation this enables]
+Scope: [Boundaries]
+Output: DISCOVERY.md with recommendation
+</discovery_objective>
+
+<discovery_scope>
+<include>
+- [Question to answer]
+- [Area to investigate]
+- [Specific comparison if needed]
+</include>
+
+<exclude>
+- [Out of scope for this discovery]
+- [Defer to implementation phase]
+</exclude>
+</discovery_scope>
+
+<discovery_protocol>
+
+**Source Priority:**
+1. **Context7 MCP** - For library/framework documentation (current, authoritative)
+2. **Official Docs** - For platform-specific or non-indexed libraries
+3. **WebSearch** - For comparisons, trends, community patterns (verify all findings)
+
+**Quality Checklist:**
+Before completing discovery, verify:
+- [ ] All claims have authoritative sources (Context7 or official docs)
+- [ ] Negative claims ("X is not possible") verified with official documentation
+- [ ] API syntax/configuration from Context7 or official docs (never WebSearch alone)
+- [ ] WebSearch findings cross-checked with authoritative sources
+- [ ] Recent updates/changelogs checked for breaking changes
+- [ ] Alternative approaches considered (not just first solution found)
+
+**Confidence Levels:**
+- HIGH: Context7 or official docs confirm
+- MEDIUM: WebSearch + Context7/official docs confirm
+- LOW: WebSearch only or training knowledge only (mark for validation)
+
+</discovery_protocol>
+
+
+<output_structure>
+Create `.planning/phases/XX-name/DISCOVERY.md`:
+
+```markdown
+# [Topic] Discovery
+
+## Summary
+[2-3 paragraph executive summary - what was researched, what was found, what's recommended]
+
+## Primary Recommendation
+[What to do and why - be specific and actionable]
+
+## Alternatives Considered
+[What else was evaluated and why not chosen]
+
+## Key Findings
+
+### [Category 1]
+- [Finding with source URL and relevance to our case]
+
+### [Category 2]
+- [Finding with source URL and relevance]
+
+## Code Examples
+[Relevant implementation patterns, if applicable]
+
+## Metadata
+
+<metadata>
+<confidence level="high|medium|low">
+[Why this confidence level - based on source quality and verification]
+</confidence>
+
+<sources>
+- [Primary authoritative sources used]
+</sources>
+
+<open_questions>
+[What couldn't be determined or needs validation during implementation]
+</open_questions>
+
+<validation_checkpoints>
+[If confidence is LOW or MEDIUM, list specific things to verify during implementation]
+</validation_checkpoints>
+</metadata>
+```
+</output_structure>
+
+<success_criteria>
+- All scope questions answered with authoritative sources
+- Quality checklist items completed
+- Clear primary recommendation
+- Low-confidence findings marked with validation checkpoints
+- Ready to inform PLAN.md creation
+</success_criteria>
+
+<guidelines>
+**When to use discovery:**
+- Technology choice unclear (library A vs B)
+- Best practices needed for unfamiliar integration
+- API/library investigation required
+- Single decision pending
+
+**When NOT to use:**
+- Established patterns (CRUD, auth with known library)
+- Implementation details (defer to execution)
+- Questions answerable from existing project context
+
+**When to use RESEARCH.md instead:**
+- Niche/complex domains (3D, games, audio, shaders)
+- Need ecosystem knowledge, not just library choice
+- "How do experts build this" questions
+- Use `/grd:research-phase` for these
+</guidelines>
@@ -0,0 +1,104 @@
+# {{run_name}}
+
+**Created:** {{timestamp}}
+**Iteration:** {{iteration_number}}
+**Status:** {{status}}
+**Experiment Type:** {{experiment_type}} (script | notebook)
+**Hypothesis:** {{brief_hypothesis_from_objective}}
+
+## Summary
+
+{{one_paragraph_explaining_what_why_how}}
+
+## Notebook Execution
+
+> Note: This section only appears for notebook experiments (experiment_type == 'notebook')
+
+- **Source Notebook:** {{source_notebook}}
+- **Input Notebook:** code/input.ipynb (copy of source)
+- **Executed Notebook:** output.ipynb (with outputs)
+- **Metrics Extracted:** metrics.json (via scrapbook)
+
+### Parameters Injected
+
+The following parameters were injected via papermill:
+
+| Parameter | Value |
+|-----------|-------|
+| random_seed | {{random_seed}} |
+| data_path | {{data_path}} |
+| {{parameter_name}} | {{parameter_value}} |
+
+### Execution Details
+
+- **Kernel:** {{kernel_name}} (auto-detected)
+- **Cell Timeout:** {{cell_timeout}} seconds
+- **Execution Time:** {{execution_time_seconds}} seconds
+- **Retry Attempted:** {{retry_attempted}} (true/false)
+
+## Reproduce
+
+### For Script Experiments
+
+```bash
+cd experiments/{{run_name}}
+python code/train.py --config config.yaml
+```
+
+### For Notebook Experiments
+
+```bash
+cd experiments/{{run_name}}
+# Re-execute notebook with papermill
+papermill code/input.ipynb output.ipynb -f config.yaml
+# Or view executed notebook
+jupyter notebook output.ipynb
+```
+
+## Configuration
+
+See: `config.yaml`
+
+Key parameters:
+{{key_hyperparameters_list}}
+
+## Files
+
+### For Notebook Experiments
+
+- `code/input.ipynb` - Original notebook (source copy)
+- `output.ipynb` - Executed notebook with outputs
+- `metrics.json` - Extracted metrics (via scrapbook)
+- `config.yaml` - Experiment configuration
+- `data/` - Data references (symlinks + hashes)
+- `logs/` - Execution logs
+- `outputs/` - Model artifacts
+- `metrics/SCORECARD.json` - Evaluation results (if generated)
+- `CRITIC_LOG.md` - Critic verdict
+
+### For Script Experiments
+
+- `code/train.py` - Training script
+- `config.yaml` - Experiment configuration
+- `data/` - Data references (symlinks + hashes)
+- `logs/` - Execution logs
+- `outputs/` - Model artifacts
+- `metrics/SCORECARD.json` - Evaluation results
+- `CRITIC_LOG.md` - Critic verdict
+
+## Data
+
+- **Source:** {{data_path}}
+- **Hash:** {{data_hash}}
+- **Version:** {{data_version_if_available}}
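The `{{data_hash}}` field above records a fingerprint of the input data so that a run can be tied to the exact file it used. A minimal Python sketch, assuming SHA-256 as the digest (the template does not name an algorithm) and a hypothetical helper called `file_sha256`:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large datasets are not loaded into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this value against the stored hash before re-running an experiment is a cheap way to detect that the file behind a symlink in `data/` has changed.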
+
+## Results
+
+{{metrics_summary_or_pending}}
+
+## Critic Verdict
+
+{{verdict_if_available_or_pending}}
+
+---
+*Generated by grd-researcher*