get-research-done 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (127)
  1. package/LICENSE +21 -0
  2. package/README.md +560 -0
  3. package/agents/grd-architect.md +789 -0
  4. package/agents/grd-codebase-mapper.md +738 -0
  5. package/agents/grd-critic.md +1065 -0
  6. package/agents/grd-debugger.md +1203 -0
  7. package/agents/grd-evaluator.md +948 -0
  8. package/agents/grd-executor.md +784 -0
  9. package/agents/grd-explorer.md +2063 -0
  10. package/agents/grd-graduator.md +484 -0
  11. package/agents/grd-integration-checker.md +423 -0
  12. package/agents/grd-phase-researcher.md +641 -0
  13. package/agents/grd-plan-checker.md +745 -0
  14. package/agents/grd-planner.md +1386 -0
  15. package/agents/grd-project-researcher.md +865 -0
  16. package/agents/grd-research-synthesizer.md +256 -0
  17. package/agents/grd-researcher.md +2361 -0
  18. package/agents/grd-roadmapper.md +605 -0
  19. package/agents/grd-verifier.md +778 -0
  20. package/bin/install.js +1294 -0
  21. package/commands/grd/add-phase.md +207 -0
  22. package/commands/grd/add-todo.md +193 -0
  23. package/commands/grd/architect.md +283 -0
  24. package/commands/grd/audit-milestone.md +277 -0
  25. package/commands/grd/check-todos.md +228 -0
  26. package/commands/grd/complete-milestone.md +136 -0
  27. package/commands/grd/debug.md +169 -0
  28. package/commands/grd/discuss-phase.md +86 -0
  29. package/commands/grd/evaluate.md +1095 -0
  30. package/commands/grd/execute-phase.md +339 -0
  31. package/commands/grd/explore.md +258 -0
  32. package/commands/grd/graduate.md +323 -0
  33. package/commands/grd/help.md +482 -0
  34. package/commands/grd/insert-phase.md +227 -0
  35. package/commands/grd/insights.md +231 -0
  36. package/commands/grd/join-discord.md +18 -0
  37. package/commands/grd/list-phase-assumptions.md +50 -0
  38. package/commands/grd/map-codebase.md +71 -0
  39. package/commands/grd/new-milestone.md +721 -0
  40. package/commands/grd/new-project.md +1008 -0
  41. package/commands/grd/pause-work.md +134 -0
  42. package/commands/grd/plan-milestone-gaps.md +295 -0
  43. package/commands/grd/plan-phase.md +525 -0
  44. package/commands/grd/progress.md +364 -0
  45. package/commands/grd/quick-explore.md +236 -0
  46. package/commands/grd/quick.md +309 -0
  47. package/commands/grd/remove-phase.md +349 -0
  48. package/commands/grd/research-phase.md +200 -0
  49. package/commands/grd/research.md +681 -0
  50. package/commands/grd/resume-work.md +40 -0
  51. package/commands/grd/set-profile.md +106 -0
  52. package/commands/grd/settings.md +136 -0
  53. package/commands/grd/update.md +172 -0
  54. package/commands/grd/verify-work.md +219 -0
  55. package/get-research-done/config/default.json +15 -0
  56. package/get-research-done/references/checkpoints.md +1078 -0
  57. package/get-research-done/references/continuation-format.md +249 -0
  58. package/get-research-done/references/git-integration.md +254 -0
  59. package/get-research-done/references/model-profiles.md +73 -0
  60. package/get-research-done/references/planning-config.md +94 -0
  61. package/get-research-done/references/questioning.md +141 -0
  62. package/get-research-done/references/tdd.md +263 -0
  63. package/get-research-done/references/ui-brand.md +160 -0
  64. package/get-research-done/references/verification-patterns.md +612 -0
  65. package/get-research-done/templates/DEBUG.md +159 -0
  66. package/get-research-done/templates/UAT.md +247 -0
  67. package/get-research-done/templates/archive-reason.md +195 -0
  68. package/get-research-done/templates/codebase/architecture.md +255 -0
  69. package/get-research-done/templates/codebase/concerns.md +310 -0
  70. package/get-research-done/templates/codebase/conventions.md +307 -0
  71. package/get-research-done/templates/codebase/integrations.md +280 -0
  72. package/get-research-done/templates/codebase/stack.md +186 -0
  73. package/get-research-done/templates/codebase/structure.md +285 -0
  74. package/get-research-done/templates/codebase/testing.md +480 -0
  75. package/get-research-done/templates/config.json +35 -0
  76. package/get-research-done/templates/context.md +283 -0
  77. package/get-research-done/templates/continue-here.md +78 -0
  78. package/get-research-done/templates/critic-log.md +288 -0
  79. package/get-research-done/templates/data-report.md +173 -0
  80. package/get-research-done/templates/debug-subagent-prompt.md +91 -0
  81. package/get-research-done/templates/decision-log.md +58 -0
  82. package/get-research-done/templates/decision.md +138 -0
  83. package/get-research-done/templates/discovery.md +146 -0
  84. package/get-research-done/templates/experiment-readme.md +104 -0
  85. package/get-research-done/templates/graduated-script.md +180 -0
  86. package/get-research-done/templates/iteration-summary.md +234 -0
  87. package/get-research-done/templates/milestone-archive.md +123 -0
  88. package/get-research-done/templates/milestone.md +115 -0
  89. package/get-research-done/templates/objective.md +271 -0
  90. package/get-research-done/templates/phase-prompt.md +567 -0
  91. package/get-research-done/templates/planner-subagent-prompt.md +117 -0
  92. package/get-research-done/templates/project.md +184 -0
  93. package/get-research-done/templates/requirements.md +231 -0
  94. package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
  95. package/get-research-done/templates/research-project/FEATURES.md +147 -0
  96. package/get-research-done/templates/research-project/PITFALLS.md +200 -0
  97. package/get-research-done/templates/research-project/STACK.md +120 -0
  98. package/get-research-done/templates/research-project/SUMMARY.md +170 -0
  99. package/get-research-done/templates/research.md +529 -0
  100. package/get-research-done/templates/roadmap.md +202 -0
  101. package/get-research-done/templates/scorecard.json +113 -0
  102. package/get-research-done/templates/state.md +287 -0
  103. package/get-research-done/templates/summary.md +246 -0
  104. package/get-research-done/templates/user-setup.md +311 -0
  105. package/get-research-done/templates/verification-report.md +322 -0
  106. package/get-research-done/workflows/complete-milestone.md +756 -0
  107. package/get-research-done/workflows/diagnose-issues.md +231 -0
  108. package/get-research-done/workflows/discovery-phase.md +289 -0
  109. package/get-research-done/workflows/discuss-phase.md +433 -0
  110. package/get-research-done/workflows/execute-phase.md +657 -0
  111. package/get-research-done/workflows/execute-plan.md +1844 -0
  112. package/get-research-done/workflows/list-phase-assumptions.md +178 -0
  113. package/get-research-done/workflows/map-codebase.md +322 -0
  114. package/get-research-done/workflows/resume-project.md +307 -0
  115. package/get-research-done/workflows/transition.md +556 -0
  116. package/get-research-done/workflows/verify-phase.md +628 -0
  117. package/get-research-done/workflows/verify-work.md +596 -0
  118. package/hooks/dist/grd-check-update.js +61 -0
  119. package/hooks/dist/grd-statusline.js +84 -0
  120. package/package.json +47 -0
  121. package/scripts/audit-help-commands.sh +115 -0
  122. package/scripts/build-hooks.js +42 -0
  123. package/scripts/verify-all-commands.sh +246 -0
  124. package/scripts/verify-architect-warning.sh +35 -0
  125. package/scripts/verify-insights-mode.sh +40 -0
  126. package/scripts/verify-quick-mode.sh +20 -0
  127. package/scripts/verify-revise-data-routing.sh +139 -0
package/get-research-done/templates/data-report.md
@@ -0,0 +1,173 @@
1
+ # Data Report: {dataset_name}
2
+
3
+ **Generated:** {timestamp}
4
+ **Source:** {file_path_or_paths}
5
+ **Sampling:** {sampling_note_if_applicable}
6
+
7
+ ## Data Overview
8
+
9
+ | Metric | Value |
10
+ |--------|-------|
11
+ | Total Rows | {count} |
12
+ | Total Columns | {count} |
13
+ | Memory Usage | {size} |
14
+ | File Format | {format} |
15
+
16
+ ### Column Summary
17
+
18
+ | Column | Type | Non-Null | Unique | Sample Values |
19
+ |--------|------|----------|--------|---------------|
20
+ | {col} | {dtype} | {count} | {count} | {values} |
21
+
22
+ ## Distributions & Statistics
23
+
24
+ ### Numerical Columns
25
+
26
+ | Column | Mean | Std | Min | 25% | 50% | 75% | Max |
27
+ |--------|------|-----|-----|-----|-----|-----|-----|
28
+ | {col} | {val} | {val} | {val} | {val} | {val} | {val} | {val} |
29
+
30
+ ### Categorical Columns
31
+
32
+ | Column | Unique | Top Value | Frequency |
33
+ |--------|--------|-----------|-----------|
34
+ | {col} | {count} | {value} | {count} |
35
+
36
+ ## Missing Data Analysis
37
+
38
+ | Column | Missing Count | Missing % | Pattern | Confidence |
39
+ |--------|---------------|-----------|---------|------------|
40
+ | {col} | {count} | {pct} | {MCAR/MAR/MNAR} | {HIGH/MEDIUM/LOW} |
41
+
42
+ **Pattern definitions:**
43
+ - **MCAR** (Missing Completely At Random): Missingness is unrelated to any observed or unobserved values
44
+ - **MAR** (Missing At Random): Missingness depends only on observed data (other features), not on the missing value itself
45
+ - **MNAR** (Missing Not At Random): Missingness depends on the unobserved value itself
46
+
47
+ ## Outlier Detection
48
+
49
+ ### Statistical Outliers
50
+
51
+ | Column | Method | Outlier Count | % of Total | Severity |
52
+ |--------|--------|---------------|------------|----------|
53
+ | {col} | Z-score (abs(z) > 3) | {count} | {pct} | {severity} |
54
+ | {col} | IQR (beyond 1.5x IQR fences) | {count} | {pct} | {severity} |
55
+
56
+ **Severity levels:**
57
+ - **LOW**: <1% outliers
58
+ - **MEDIUM**: 1-5% outliers
59
+ - **HIGH**: >5% outliers
60
+
61
+ ### Top Anomalous Values
62
+
63
+ | Column | Value | Z-score | Reason |
64
+ |--------|-------|---------|--------|
65
+ | {col} | {val} | {z} | {explanation} |
66
+
67
+ ## Class Balance (if target specified)
68
+
69
+ **Target Variable:** {target_column}
70
+
71
+ | Class | Count | Percentage |
72
+ |-------|-------|------------|
73
+ | {class} | {count} | {pct} |
74
+
75
+ **Imbalance Ratio:** {ratio}
76
+ **Severity:** {LOW/MEDIUM/HIGH}
77
+ **Recommendation:** {recommendation}
78
+
79
+ **Severity thresholds:**
80
+ - **LOW**: Ratio <2:1 (acceptable imbalance)
81
+ - **MEDIUM**: Ratio 2-10:1 (consider resampling or class weighting)
82
+ - **HIGH**: Ratio >10:1 (resampling or specialized techniques required)
83
+
84
+ ## Data Leakage Analysis
85
+
86
+ ### Feature-Target Correlation
87
+
88
+ | Feature | Correlation | Risk Level | Confidence | Notes |
89
+ |---------|-------------|------------|------------|-------|
90
+ | {feature} | {corr} | {HIGH/MEDIUM/LOW} | {confidence} | {explanation} |
91
+
92
+ **Risk thresholds:**
93
+ - **HIGH**: Absolute correlation >0.9 (likely leakage)
94
+ - **MEDIUM**: Absolute correlation 0.7-0.9 (investigate further)
95
+ - **LOW**: Absolute correlation <0.7 (normal relationship)
96
+
97
+ ### High Feature-Feature Correlations
98
+
99
+ | Feature 1 | Feature 2 | Correlation | Risk |
100
+ |-----------|-----------|-------------|------|
101
+ | {f1} | {f2} | {corr} | {risk_note} |
102
+
103
+ **Note:** Absolute correlations >0.95 may indicate redundancy, derived features, or leakage.
104
+
105
+ ### Train-Test Overlap (if multiple files)
106
+
107
+ | Metric | Value |
108
+ |--------|-------|
109
+ | Overlapping Rows | {count} |
110
+ | Overlap % (Train) | {pct} |
111
+ | Overlap % (Test) | {pct} |
112
+ | Severity | {severity} |
113
+
114
+ **Severity thresholds:**
115
+ - **LOW**: <1% overlap (minor contamination)
116
+ - **MEDIUM**: 1-5% overlap (significant issue)
117
+ - **HIGH**: >5% overlap (critical; invalidates evaluation)
118
+
119
+ ### Temporal Leakage Indicators
120
+
121
+ | Issue | Detected | Confidence | Details |
122
+ |-------|----------|------------|---------|
123
+ | Future timestamps in features | {yes/no} | {confidence} | {details} |
124
+ | Train dates after test dates | {yes/no} | {confidence} | {details} |
125
+ | Rolling features computed globally | {yes/no/unknown} | {confidence} | {details} |
126
+
127
+ **Common temporal leakage sources:**
128
+ - Features computed using future data (e.g., global mean instead of train-only mean)
129
+ - Target leakage through lagged features whose lag is shorter than the prediction horizon
130
+ - Features derived from test set statistics
131
+
132
+ ## Recommendations
133
+
134
+ ### Must Address (Blocking)
135
+
136
+ These issues will cause model failure or produce invalid results:
137
+
138
+ - [ ] {critical_issue_1}
139
+ - [ ] {critical_issue_2}
140
+
141
+ **Examples of blocking issues:**
142
+ - High-confidence data leakage (feature correlates >0.95 with target)
143
+ - Train-test overlap >5%
144
+ - Target variable has missing values
145
+ - Features with 100% missing values
146
+
147
+ ### Should Address (Non-blocking)
148
+
149
+ These issues may reduce model quality but won't break training:
150
+
151
+ - [ ] {recommended_issue_1}
152
+ - [ ] {recommended_issue_2}
153
+
154
+ **Examples of non-blocking issues:**
155
+ - Missing data <30% (imputation recommended)
156
+ - High number of outliers (investigate or clip)
157
+ - Class imbalance >5:1 (resampling recommended)
158
+ - Low-variance features (consider removal)
159
+
160
+ ### Notes
161
+
162
+ {additional_observations}
163
+
164
+ **Common observations:**
165
+ - High-cardinality categorical features (may need encoding strategy)
166
+ - Skewed distributions (may benefit from transformation)
167
+ - Correlated feature groups (consider dimensionality reduction)
168
+ - Data quality issues (duplicate rows, inconsistent formatting)
169
+
170
+ ---
171
+
172
+ *Report generated by GRD Explorer Agent*
173
+ *Template: get-research-done/templates/data-report.md*
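The outlier and class-balance thresholds in the data-report template can be computed roughly as follows. This is a minimal sketch using only the Python standard library, assuming each column arrives as a plain list of numbers and the target as a list of labels; the function names are hypothetical and not taken from the GRD Explorer agent.

```python
from collections import Counter
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold (abs(z) > 3 by default)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # constant column: no z-score outliers
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def outlier_severity(outlier_count, total):
    """Map outlier share to the template's LOW (<1%), MEDIUM (1-5%), HIGH (>5%) levels."""
    pct = 100.0 * outlier_count / total
    if pct < 1:
        return "LOW"
    if pct <= 5:
        return "MEDIUM"
    return "HIGH"


def imbalance_severity(labels):
    """Return (majority:minority ratio, severity) using the template's 2:1 and 10:1 cut-offs."""
    counts = Counter(labels).values()
    ratio = max(counts) / min(counts)
    if ratio < 2:
        return ratio, "LOW"
    if ratio <= 10:
        return ratio, "MEDIUM"
    return ratio, "HIGH"
```

A single extreme value inflates the standard deviation used for z-scores and can mask other outliers, which is one reason to report the IQR method alongside it.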
package/get-research-done/templates/debug-subagent-prompt.md
@@ -0,0 +1,91 @@
1
+ # Debug Subagent Prompt Template
2
+
3
+ Template for spawning the grd-debugger agent. The agent contains all debugging expertise; this template provides only the problem context.
4
+
5
+ ---
6
+
7
+ ## Template
8
+
9
+ ```markdown
10
+ <objective>
11
+ Investigate issue: {issue_id}
12
+
13
+ **Summary:** {issue_summary}
14
+ </objective>
15
+
16
+ <symptoms>
17
+ expected: {expected}
18
+ actual: {actual}
19
+ errors: {errors}
20
+ reproduction: {reproduction}
21
+ timeline: {timeline}
22
+ </symptoms>
23
+
24
+ <mode>
25
+ symptoms_prefilled: {true_or_false}
26
+ goal: {find_root_cause_only | find_and_fix}
27
+ </mode>
28
+
29
+ <debug_file>
30
+ Create: .planning/debug/{slug}.md
31
+ </debug_file>
32
+ ```
33
+
34
+ ---
35
+
36
+ ## Placeholders
37
+
38
+ | Placeholder | Source | Example |
39
+ |-------------|--------|---------|
40
+ | `{issue_id}` | Orchestrator-assigned | `auth-screen-dark` |
41
+ | `{issue_summary}` | User description | `Auth screen is too dark` |
42
+ | `{expected}` | From symptoms | `See logo clearly` |
43
+ | `{actual}` | From symptoms | `Screen is dark` |
44
+ | `{errors}` | From symptoms | `None in console` |
45
+ | `{reproduction}` | From symptoms | `Open /auth page` |
46
+ | `{timeline}` | From symptoms | `After recent deploy` |
47
+ | `{goal}` | Orchestrator sets | `find_and_fix` |
48
+ | `{slug}` | Generated | `auth-screen-dark` |
49
+
50
+ ---
51
+
52
+ ## Usage
53
+
54
+ **From /grd:debug:**
55
+ ```python
56
+ Task(
57
+ prompt=filled_template,
58
+ subagent_type="grd-debugger",
59
+ description=f"Debug {slug}"
60
+ )
61
+ ```
62
+
63
+ **From diagnose-issues (UAT):**
64
+ ```python
65
+ Task(prompt=filled_template, subagent_type="grd-debugger", description="Debug UAT-001")
66
+ ```
67
+
68
+ ---
69
+
70
+ ## Continuation
71
+
72
+ For checkpoints, spawn a fresh agent with:
73
+
74
+ ```markdown
75
+ <objective>
76
+ Continue debugging {slug}. Evidence is in the debug file.
77
+ </objective>
78
+
79
+ <prior_state>
80
+ Debug file: @.planning/debug/{slug}.md
81
+ </prior_state>
82
+
83
+ <checkpoint_response>
84
+ **Type:** {checkpoint_type}
85
+ **Response:** {user_response}
86
+ </checkpoint_response>
87
+
88
+ <mode>
89
+ goal: {goal}
90
+ </mode>
91
+ ```
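The orchestrator has to substitute the placeholders listed in the debug template before handing the prompt to `Task`. A minimal Python sketch, assuming the template body is available as a string; `fill_placeholders` is a hypothetical helper, not part of GRD. It replaces only simple `{name}` placeholders that have a value and leaves everything else untouched, since `str.format` would raise `KeyError` on an unfilled field such as `{find_root_cause_only | find_and_fix}`.

```python
import re

# Matches simple placeholders such as {issue_id} or {slug}; anything containing
# spaces or pipes (e.g. {find_root_cause_only | find_and_fix}) is ignored.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace known {name} placeholders; leave unknown ones as-is."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


# Excerpt of the debug template, for illustration only.
TEMPLATE = """<objective>
Investigate issue: {issue_id}

**Summary:** {issue_summary}
</objective>

<debug_file>
Create: .planning/debug/{slug}.md
</debug_file>"""

filled = fill_placeholders(
    TEMPLATE,
    {
        "issue_id": "auth-screen-dark",
        "issue_summary": "Auth screen is too dark",
        "slug": "auth-screen-dark",
    },
)
```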
package/get-research-done/templates/decision-log.md
@@ -0,0 +1,58 @@
1
+ # Decision Log Template
2
+
3
+ Template for `human_eval/decision_log.md`: the central chronological record of all human evaluation decisions.
4
+
5
+ ---
6
+
7
+ ## File Template
8
+
9
+ ```markdown
10
+ # Human Evaluation Decision Log
11
+
12
+ This log tracks all human evaluation decisions for this research project.
13
+
14
+ | Timestamp | Run | Decision | Key Metric | Reference |
15
+ |-----------|-----|----------|------------|-----------|
16
+ ```
17
+
18
+ ---
19
+
20
+ ## Usage Notes
21
+
22
+ - **Append-only:** New entries are always appended at the bottom (chronological order)
23
+ - **Timestamp:** YYYY-MM-DD HH:MM format (local time)
24
+ - **Run:** Run directory name (e.g., run_003_tuned)
25
+ - **Decision:** One of Seal, Iterate, Archive
26
+ - **Key Metric:** Primary metric with value (e.g., F1=0.85)
27
+ - **Reference:** Relative path to run directory
28
+
29
+ ## Example Entries
30
+
31
+ | Timestamp | Run | Decision | Key Metric | Reference |
32
+ |-----------|-----|----------|------------|-----------|
33
+ | 2026-01-28 16:45 | run_001_initial | Archive | F1=0.65 | experiments/archive/2026-01-28_hypothesis_v1/ |
34
+ | 2026-01-29 10:15 | run_002_baseline | Iterate | F1=0.76 | experiments/run_002_baseline/ |
35
+ | 2026-01-30 14:23 | run_003_tuned | Seal | F1=0.85 | experiments/run_003_tuned/ |
36
+
37
+ ## Navigation
38
+
39
+ - For full decision context, see DECISION.md in each run directory
40
+ - For archived runs, see ARCHIVE_REASON.md in the archive directory
41
+ - Log created automatically on first /grd:evaluate decision
42
+
43
+ ---
44
+
45
+ ## Integration
46
+
47
+ This template is used by the `/grd:evaluate` command in Phase 4 (Decision Logging).
48
+
49
+ **Creation:**
50
+ - Created automatically when the first decision is logged
51
+ - Initialized with header and table structure
52
+
53
+ **Updates:**
54
+ - Each human decision appends one row
55
+ - Order: chronological (newest at bottom)
56
+ - No deletion or modification of existing entries
57
+
58
+ **The log references runs only; there are no bidirectional links** (per the 05-CONTEXT.md decision)
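The append-only rules for the decision log can be sketched in Python as follows, assuming the log lives at `human_eval/decision_log.md` under the project root. `append_decision` is a hypothetical helper, not GRD's implementation: it writes the header on first use and otherwise only appends one row, using the `YYYY-MM-DD HH:MM` local-time format from the usage notes.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Header mirrors the decision-log file template.
HEADER = (
    "# Human Evaluation Decision Log\n\n"
    "This log tracks all human evaluation decisions for this research project.\n\n"
    "| Timestamp | Run | Decision | Key Metric | Reference |\n"
    "|-----------|-----|----------|------------|-----------|\n"
)
VALID_DECISIONS = {"Seal", "Iterate", "Archive"}


def append_decision(log_path: Path, run: str, decision: str, key_metric: str,
                    reference: str, when: Optional[datetime] = None) -> str:
    """Append one row to the decision log, creating the file on first use."""
    if decision not in VALID_DECISIONS:
        raise ValueError(f"decision must be one of {sorted(VALID_DECISIONS)}")
    # YYYY-MM-DD HH:MM in local time, as the usage notes specify.
    stamp = (when or datetime.now()).strftime("%Y-%m-%d %H:%M")
    row = f"| {stamp} | {run} | {decision} | {key_metric} | {reference} |\n"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    if not log_path.exists():
        log_path.write_text(HEADER, encoding="utf-8")
    # Append mode only adds to the end; existing rows are never rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(row)
    return row
```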
package/get-research-done/templates/decision.md
@@ -0,0 +1,138 @@
1
+ # Human Decision Template
2
+
3
+ Template for per-run decision records in `experiments/run_NNN/DECISION.md`.
4
+
5
+ ---
6
+
7
+ ## File Template
8
+
9
+ ```markdown
10
+ # Human Decision: {{run_name}}
11
+
12
+ **Timestamp:** {{ISO_8601_timestamp}}
13
+ **Hypothesis:** {{brief_hypothesis_from_objective}}
14
+ **Decision:** {{Seal|Iterate|Archive}}
15
+ **Rationale:** {{user_reasoning_if_provided}}
16
+
17
+ ## Evidence Summary
18
+
19
+ **Critic Verdict:** {{PROCEED}} (Confidence: {{HIGH|MEDIUM|LOW}})
20
+ **Composite Score:** {{score}} (threshold: {{threshold}})
21
+ **Key Metric:** {{metric_name}}={{value}} (target: {{comparison}}{{threshold}})
22
+
23
+ ## Metrics Detail
24
+
25
+ | Metric | Value | Threshold | Status |
26
+ |--------|-------|-----------|--------|
27
+ | {{metric_1}} | {{value}} | {{threshold}} | {{PASS|FAIL}} |
28
+ | {{metric_2}} | {{value}} | {{threshold}} | {{PASS|FAIL}} |
29
+
30
+ ## Decision Context
31
+
32
+ ### For Seal
33
+ - Hypothesis validated
34
+ - Ready for production/publication
35
+ - All success criteria met
36
+
37
+ ### For Iterate
38
+ - Continuing experimentation
39
+ - Direction: {{REVISE_METHOD|REVISE_DATA}} (from Critic recommendation)
40
+ - Next focus: {{specific_area}}
41
+
42
+ ### For Archive
43
+ - Hypothesis abandoned
44
+ - Reason: {{user_rationale_required}}
45
+ - Preserved as negative result
46
+
47
+ ---
48
+
49
+ *Decision recorded: {{ISO_8601_timestamp}}*
50
+ *Run directory: experiments/{{run_name}}/*
51
+ ```
52
+
53
+ ---
54
+
55
+ ## Usage Notes
56
+
57
+ **Field descriptions:**
58
+
59
+ - **run_name:** Directory name (e.g., run_003_tuned) extracted from experiments/ path
60
+ - **ISO_8601_timestamp:** Format YYYY-MM-DDTHH:MM:SSZ (UTC time)
61
+ - **brief_hypothesis_from_objective:** Extract "what" statement from OBJECTIVE.md (1-2 sentences max)
62
+ - **Decision:** One of: Seal, Iterate, Archive
63
+ - **Rationale:** REQUIRED for Archive, optional for Seal/Iterate
64
+
65
+ **Evidence Summary fields:**
66
+
67
+ - **Critic Verdict:** Always "PROCEED" (required to reach human eval)
68
+ - **Confidence:** Extract from CRITIC_LOG.md (HIGH/MEDIUM/LOW)
69
+ - **Composite Score:** Weighted average from SCORECARD.json
70
+ - **threshold:** Overall threshold from OBJECTIVE.md
71
+ - **Key Metric:** Primary metric (highest weight or first in list)
72
+ - **comparison:** Operator (>, <, >=, <=, ==)
73
+
74
+ **Metrics Detail table:**
75
+
76
+ - Pull all metrics from SCORECARD.json
77
+ - Include: name, achieved value, threshold, PASS/FAIL status
78
+ - Order by weight (descending) or as defined in OBJECTIVE.md
79
+
80
+ **Decision Context sections:**
81
+
82
+ - Only populate the section matching the decision type
83
+ - For Iterate: include the Critic's recommendation if available
84
+ - For Archive: user_rationale is REQUIRED
85
+
86
+ **Example populated template:**
87
+
88
+ ```markdown
89
+ # Human Decision: run_003_tuned
90
+
91
+ **Timestamp:** 2026-01-30T14:35:00Z
92
+ **Hypothesis:** Ensemble methods will improve F1 score over single models
93
+ **Decision:** Seal
94
+ **Rationale:** Results demonstrate clear improvement with robust validation
95
+
96
+ ## Evidence Summary
97
+
98
+ **Critic Verdict:** PROCEED (Confidence: HIGH)
99
+ **Composite Score:** 0.89 (threshold: 0.80)
100
+ **Key Metric:** f1_score=0.91 (target: >=0.85)
101
+
102
+ ## Metrics Detail
103
+
104
+ | Metric | Value | Threshold | Status |
105
+ |--------|-------|-----------|--------|
106
+ | f1_score | 0.91 | 0.85 | PASS |
107
+ | precision | 0.88 | 0.80 | PASS |
108
+ | recall | 0.94 | 0.80 | PASS |
109
+
110
+ ## Decision Context
111
+
112
+ ### For Seal
113
+ - Hypothesis validated
114
+ - Ready for production/publication
115
+ - All success criteria met
116
+
117
+ ---
118
+
119
+ *Decision recorded: 2026-01-30T14:35:00Z*
120
+ *Run directory: experiments/run_003_tuned/*
121
+ ```
122
+
123
+ ---
124
+
125
+ ## Integration
126
+
127
+ This template is used by the `/grd:evaluate` command in Phase 4 (Decision Logging).
128
+
129
+ **Inputs:**
130
+ - OBJECTIVE.md (hypothesis, metrics, thresholds)
131
+ - SCORECARD.json (metrics values, composite score)
132
+ - CRITIC_LOG.md (verdict, confidence)
133
+ - User decision (Seal/Iterate/Archive)
134
+ - User rationale (required for Archive, optional otherwise)
135
+
136
+ **Output:**
137
+ - experiments/run_NNN/DECISION.md (this template populated)
138
+ - Appended to human_eval/decision_log.md (summary entry)
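The Evidence Summary and Metrics Detail fields combine a weighted composite score, per-metric PASS/FAIL checks against the `comparison` operator, a UTC ISO 8601 timestamp, and the rule that Archive requires a rationale. A minimal Python sketch, assuming metrics arrive as dicts with hypothetical keys `name`, `value`, `threshold`, `comparison`, and `weight`; the actual SCORECARD.json schema may differ, and the template reads the composite score from SCORECARD.json rather than recomputing it.

```python
import operator
from datetime import datetime, timezone
from typing import List, Optional

# Operators allowed by the template's `comparison` field.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq}


def metric_status(value: float, comparison: str, threshold: float) -> str:
    """Return PASS if `value <comparison> threshold` holds, else FAIL."""
    return "PASS" if OPS[comparison](value, threshold) else "FAIL"


def composite_score(metrics: List[dict]) -> float:
    """Weighted average of metric values, normalised by the total weight."""
    total_weight = sum(m["weight"] for m in metrics)
    return sum(m["value"] * m["weight"] for m in metrics) / total_weight


def metrics_rows(metrics: List[dict]) -> List[str]:
    """Render Metrics Detail table rows, highest weight first."""
    ordered = sorted(metrics, key=lambda m: m["weight"], reverse=True)
    return [
        f"| {m['name']} | {m['value']:.2f} | {m['threshold']:.2f} | "
        f"{metric_status(m['value'], m['comparison'], m['threshold'])} |"
        for m in ordered
    ]


def utc_timestamp(now: Optional[datetime] = None) -> str:
    """Format as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    return (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")


def require_rationale(decision: str, rationale: str) -> None:
    """Archive decisions must carry a user rationale; Seal/Iterate may omit it."""
    if decision == "Archive" and not rationale.strip():
        raise ValueError("Archive decisions require a rationale")
```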
package/get-research-done/templates/discovery.md
@@ -0,0 +1,146 @@
1
+ # Discovery Template
2
+
3
+ Template for `.planning/phases/XX-name/DISCOVERY.md` - shallow research for library/option decisions.
4
+
5
+ **Purpose:** Answer "which library/option should we use" questions during mandatory discovery in plan-phase.
6
+
7
+ For deep ecosystem research ("how do experts build this"), use `/grd:research-phase` which produces RESEARCH.md.
8
+
9
+ ---
10
+
11
+ ## File Template
12
+
13
+ ```markdown
14
+ ---
15
+ phase: XX-name
16
+ type: discovery
17
+ topic: [discovery-topic]
18
+ ---
19
+
20
+ <session_initialization>
21
+ Before beginning discovery, verify today's date:
22
+ !`date +%Y-%m-%d`
23
+
24
+ Use this date when searching for "current" or "latest" information.
25
+ Example: If today is 2025-11-22, search for "2025" not "2024".
26
+ </session_initialization>
27
+
28
+ <discovery_objective>
29
+ Discover [topic] to inform [phase name] implementation.
30
+
31
+ Purpose: [What decision/implementation this enables]
32
+ Scope: [Boundaries]
33
+ Output: DISCOVERY.md with recommendation
34
+ </discovery_objective>
35
+
36
+ <discovery_scope>
37
+ <include>
38
+ - [Question to answer]
39
+ - [Area to investigate]
40
+ - [Specific comparison if needed]
41
+ </include>
42
+
43
+ <exclude>
44
+ - [Out of scope for this discovery]
45
+ - [Defer to implementation phase]
46
+ </exclude>
47
+ </discovery_scope>
48
+
49
+ <discovery_protocol>
50
+
51
+ **Source Priority:**
52
+ 1. **Context7 MCP** - For library/framework documentation (current, authoritative)
53
+ 2. **Official Docs** - For platform-specific or non-indexed libraries
54
+ 3. **WebSearch** - For comparisons, trends, community patterns (verify all findings)
55
+
56
+ **Quality Checklist:**
57
+ Before completing discovery, verify:
58
+ - [ ] All claims have authoritative sources (Context7 or official docs)
59
+ - [ ] Negative claims ("X is not possible") verified with official documentation
60
+ - [ ] API syntax/configuration from Context7 or official docs (never WebSearch alone)
61
+ - [ ] WebSearch findings cross-checked with authoritative sources
62
+ - [ ] Recent updates/changelogs checked for breaking changes
63
+ - [ ] Alternative approaches considered (not just first solution found)
64
+
65
+ **Confidence Levels:**
66
+ - HIGH: Context7 or official docs confirm
67
+ - MEDIUM: WebSearch + Context7/official docs confirm
68
+ - LOW: WebSearch only or training knowledge only (mark for validation)
69
+
70
+ </discovery_protocol>
71
+
72
+
73
+ <output_structure>
74
+ Create `.planning/phases/XX-name/DISCOVERY.md`:
75
+
76
+ ```markdown
77
+ # [Topic] Discovery
78
+
79
+ ## Summary
80
+ [2-3 paragraph executive summary - what was researched, what was found, what's recommended]
81
+
82
+ ## Primary Recommendation
83
+ [What to do and why - be specific and actionable]
84
+
85
+ ## Alternatives Considered
86
+ [What else was evaluated and why not chosen]
87
+
88
+ ## Key Findings
89
+
90
+ ### [Category 1]
91
+ - [Finding with source URL and relevance to our case]
92
+
93
+ ### [Category 2]
94
+ - [Finding with source URL and relevance]
95
+
96
+ ## Code Examples
97
+ [Relevant implementation patterns, if applicable]
98
+
99
+ ## Metadata
100
+
101
+ <metadata>
102
+ <confidence level="high|medium|low">
103
+ [Why this confidence level - based on source quality and verification]
104
+ </confidence>
105
+
106
+ <sources>
107
+ - [Primary authoritative sources used]
108
+ </sources>
109
+
110
+ <open_questions>
111
+ [What couldn't be determined or needs validation during implementation]
112
+ </open_questions>
113
+
114
+ <validation_checkpoints>
115
+ [If confidence is LOW or MEDIUM, list specific things to verify during implementation]
116
+ </validation_checkpoints>
117
+ </metadata>
118
+ ```
119
+ </output_structure>
120
+
121
+ <success_criteria>
122
+ - All scope questions answered with authoritative sources
123
+ - Quality checklist items completed
124
+ - Clear primary recommendation
125
+ - Low-confidence findings marked with validation checkpoints
126
+ - Ready to inform PLAN.md creation
127
+ </success_criteria>
128
+
129
+ <guidelines>
130
+ **When to use discovery:**
131
+ - Technology choice unclear (library A vs B)
132
+ - Best practices needed for unfamiliar integration
133
+ - API/library investigation required
134
+ - Single decision pending
135
+
136
+ **When NOT to use:**
137
+ - Established patterns (CRUD, auth with known library)
138
+ - Implementation details (defer to execution)
139
+ - Questions answerable from existing project context
140
+
141
+ **When to use RESEARCH.md instead:**
142
+ - Niche/complex domains (3D, games, audio, shaders)
143
+ - Need ecosystem knowledge, not just library choice
144
+ - "How do experts build this" questions
145
+ - Use `/grd:research-phase` for these
146
+ </guidelines>
package/get-research-done/templates/experiment-readme.md
@@ -0,0 +1,104 @@
1
+ # {{run_name}}
2
+
3
+ **Created:** {{timestamp}}
4
+ **Iteration:** {{iteration_number}}
5
+ **Status:** {{status}}
6
+ **Experiment Type:** {{experiment_type}} (script | notebook)
7
+ **Hypothesis:** {{brief_hypothesis_from_objective}}
8
+
9
+ ## Summary
10
+
11
+ {{one_paragraph_explaining_what_why_how}}
12
+
13
+ ## Notebook Execution
14
+
15
+ > Note: This section only appears for notebook experiments (experiment_type == 'notebook')
16
+
17
+ - **Source Notebook:** {{source_notebook}}
18
+ - **Input Notebook:** code/input.ipynb (copy of source)
19
+ - **Executed Notebook:** output.ipynb (with outputs)
20
+ - **Metrics Extracted:** metrics.json (via scrapbook)
21
+
22
+ ### Parameters Injected
23
+
24
+ The following parameters were injected via papermill:
25
+
26
+ | Parameter | Value |
27
+ |-----------|-------|
28
+ | random_seed | {{random_seed}} |
29
+ | data_path | {{data_path}} |
30
+ | {{parameter_name}} | {{parameter_value}} |
31
+
32
+ ### Execution Details
33
+
34
+ - **Kernel:** {{kernel_name}} (auto-detected)
35
+ - **Cell Timeout:** {{cell_timeout}} seconds
36
+ - **Execution Time:** {{execution_time_seconds}} seconds
37
+ - **Retry Attempted:** {{retry_attempted}} (true/false)
38
+
39
+ ## Reproduce
40
+
41
+ ### For Script Experiments
42
+
43
+ ```bash
44
+ cd experiments/{{run_name}}
45
+ python code/train.py --config config.yaml
46
+ ```
47
+
48
+ ### For Notebook Experiments
49
+
50
+ ```bash
51
+ cd experiments/{{run_name}}
52
+ # Re-execute notebook with papermill
53
+ papermill code/input.ipynb output.ipynb -f config.yaml
54
+ # Or view executed notebook
55
+ jupyter notebook output.ipynb
56
+ ```
57
+
58
+ ## Configuration
59
+
60
+ See: `config.yaml`
61
+
62
+ Key parameters:
63
+ {{key_hyperparameters_list}}
64
+
65
+ ## Files
66
+
67
+ ### For Notebook Experiments
68
+
69
+ - `code/input.ipynb` - Original notebook (source copy)
70
+ - `output.ipynb` - Executed notebook with outputs
71
+ - `metrics.json` - Extracted metrics (via scrapbook)
72
+ - `config.yaml` - Experiment configuration
73
+ - `data/` - Data references (symlinks + hashes)
74
+ - `logs/` - Execution logs
75
+ - `outputs/` - Model artifacts
76
+ - `metrics/SCORECARD.json` - Evaluation results (if generated)
77
+ - `CRITIC_LOG.md` - Critic verdict
78
+
79
+ ### For Script Experiments
80
+
81
+ - `code/train.py` - Training script
82
+ - `config.yaml` - Experiment configuration
83
+ - `data/` - Data references (symlinks + hashes)
84
+ - `logs/` - Execution logs
85
+ - `outputs/` - Model artifacts
86
+ - `metrics/SCORECARD.json` - Evaluation results
87
+ - `CRITIC_LOG.md` - Critic verdict
88
+
89
+ ## Data
90
+
91
+ - **Source:** {{data_path}}
92
+ - **Hash:** {{data_hash}}
93
+ - **Version:** {{data_version_if_available}}
94
+
95
+ ## Results
96
+
97
+ {{metrics_summary_or_pending}}
98
+
99
+ ## Critic Verdict
100
+
101
+ {{verdict_if_available_or_pending}}
102
+
103
+ ---
104
+ *Generated by grd-researcher*
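The `{{data_hash}}` field and the `data/` references (symlinks plus hashes) imply a content hash of the input data. The hash algorithm GRD uses is not stated in this template; the following is a minimal Python sketch assuming SHA-256 over the raw file bytes, with a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a data file in fixed-size chunks so large datasets never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest depends only on the bytes, re-running the hash on the same dataset should yield the same value regardless of chunk size. This makes it a cheap check that a reproduced run used identical data.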