npm - get-research-done - Versions diffs - 1.1.0 - Mend

get-research-done 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (127)

package/LICENSE +21 -0
package/README.md +560 -0
package/agents/grd-architect.md +789 -0
package/agents/grd-codebase-mapper.md +738 -0
package/agents/grd-critic.md +1065 -0
package/agents/grd-debugger.md +1203 -0
package/agents/grd-evaluator.md +948 -0
package/agents/grd-executor.md +784 -0
package/agents/grd-explorer.md +2063 -0
package/agents/grd-graduator.md +484 -0
package/agents/grd-integration-checker.md +423 -0
package/agents/grd-phase-researcher.md +641 -0
package/agents/grd-plan-checker.md +745 -0
package/agents/grd-planner.md +1386 -0
package/agents/grd-project-researcher.md +865 -0
package/agents/grd-research-synthesizer.md +256 -0
package/agents/grd-researcher.md +2361 -0
package/agents/grd-roadmapper.md +605 -0
package/agents/grd-verifier.md +778 -0
package/bin/install.js +1294 -0
package/commands/grd/add-phase.md +207 -0
package/commands/grd/add-todo.md +193 -0
package/commands/grd/architect.md +283 -0
package/commands/grd/audit-milestone.md +277 -0
package/commands/grd/check-todos.md +228 -0
package/commands/grd/complete-milestone.md +136 -0
package/commands/grd/debug.md +169 -0
package/commands/grd/discuss-phase.md +86 -0
package/commands/grd/evaluate.md +1095 -0
package/commands/grd/execute-phase.md +339 -0
package/commands/grd/explore.md +258 -0
package/commands/grd/graduate.md +323 -0
package/commands/grd/help.md +482 -0
package/commands/grd/insert-phase.md +227 -0
package/commands/grd/insights.md +231 -0
package/commands/grd/join-discord.md +18 -0
package/commands/grd/list-phase-assumptions.md +50 -0
package/commands/grd/map-codebase.md +71 -0
package/commands/grd/new-milestone.md +721 -0
package/commands/grd/new-project.md +1008 -0
package/commands/grd/pause-work.md +134 -0
package/commands/grd/plan-milestone-gaps.md +295 -0
package/commands/grd/plan-phase.md +525 -0
package/commands/grd/progress.md +364 -0
package/commands/grd/quick-explore.md +236 -0
package/commands/grd/quick.md +309 -0
package/commands/grd/remove-phase.md +349 -0
package/commands/grd/research-phase.md +200 -0
package/commands/grd/research.md +681 -0
package/commands/grd/resume-work.md +40 -0
package/commands/grd/set-profile.md +106 -0
package/commands/grd/settings.md +136 -0
package/commands/grd/update.md +172 -0
package/commands/grd/verify-work.md +219 -0
package/get-research-done/config/default.json +15 -0
package/get-research-done/references/checkpoints.md +1078 -0
package/get-research-done/references/continuation-format.md +249 -0
package/get-research-done/references/git-integration.md +254 -0
package/get-research-done/references/model-profiles.md +73 -0
package/get-research-done/references/planning-config.md +94 -0
package/get-research-done/references/questioning.md +141 -0
package/get-research-done/references/tdd.md +263 -0
package/get-research-done/references/ui-brand.md +160 -0
package/get-research-done/references/verification-patterns.md +612 -0
package/get-research-done/templates/DEBUG.md +159 -0
package/get-research-done/templates/UAT.md +247 -0
package/get-research-done/templates/archive-reason.md +195 -0
package/get-research-done/templates/codebase/architecture.md +255 -0
package/get-research-done/templates/codebase/concerns.md +310 -0
package/get-research-done/templates/codebase/conventions.md +307 -0
package/get-research-done/templates/codebase/integrations.md +280 -0
package/get-research-done/templates/codebase/stack.md +186 -0
package/get-research-done/templates/codebase/structure.md +285 -0
package/get-research-done/templates/codebase/testing.md +480 -0
package/get-research-done/templates/config.json +35 -0
package/get-research-done/templates/context.md +283 -0
package/get-research-done/templates/continue-here.md +78 -0
package/get-research-done/templates/critic-log.md +288 -0
package/get-research-done/templates/data-report.md +173 -0
package/get-research-done/templates/debug-subagent-prompt.md +91 -0
package/get-research-done/templates/decision-log.md +58 -0
package/get-research-done/templates/decision.md +138 -0
package/get-research-done/templates/discovery.md +146 -0
package/get-research-done/templates/experiment-readme.md +104 -0
package/get-research-done/templates/graduated-script.md +180 -0
package/get-research-done/templates/iteration-summary.md +234 -0
package/get-research-done/templates/milestone-archive.md +123 -0
package/get-research-done/templates/milestone.md +115 -0
package/get-research-done/templates/objective.md +271 -0
package/get-research-done/templates/phase-prompt.md +567 -0
package/get-research-done/templates/planner-subagent-prompt.md +117 -0
package/get-research-done/templates/project.md +184 -0
package/get-research-done/templates/requirements.md +231 -0
package/get-research-done/templates/research-project/ARCHITECTURE.md +204 -0
package/get-research-done/templates/research-project/FEATURES.md +147 -0
package/get-research-done/templates/research-project/PITFALLS.md +200 -0
package/get-research-done/templates/research-project/STACK.md +120 -0
package/get-research-done/templates/research-project/SUMMARY.md +170 -0
package/get-research-done/templates/research.md +529 -0
package/get-research-done/templates/roadmap.md +202 -0
package/get-research-done/templates/scorecard.json +113 -0
package/get-research-done/templates/state.md +287 -0
package/get-research-done/templates/summary.md +246 -0
package/get-research-done/templates/user-setup.md +311 -0
package/get-research-done/templates/verification-report.md +322 -0
package/get-research-done/workflows/complete-milestone.md +756 -0
package/get-research-done/workflows/diagnose-issues.md +231 -0
package/get-research-done/workflows/discovery-phase.md +289 -0
package/get-research-done/workflows/discuss-phase.md +433 -0
package/get-research-done/workflows/execute-phase.md +657 -0
package/get-research-done/workflows/execute-plan.md +1844 -0
package/get-research-done/workflows/list-phase-assumptions.md +178 -0
package/get-research-done/workflows/map-codebase.md +322 -0
package/get-research-done/workflows/resume-project.md +307 -0
package/get-research-done/workflows/transition.md +556 -0
package/get-research-done/workflows/verify-phase.md +628 -0
package/get-research-done/workflows/verify-work.md +596 -0
package/hooks/dist/grd-check-update.js +61 -0
package/hooks/dist/grd-statusline.js +84 -0
package/package.json +47 -0
package/scripts/audit-help-commands.sh +115 -0
package/scripts/build-hooks.js +42 -0
package/scripts/verify-all-commands.sh +246 -0
package/scripts/verify-architect-warning.sh +35 -0
package/scripts/verify-insights-mode.sh +40 -0
package/scripts/verify-quick-mode.sh +20 -0
package/scripts/verify-revise-data-routing.sh +139 -0

package/get-research-done/templates/DEBUG.md ADDED

@@ -0,0 +1,159 @@
+# Debug Template
+Template for `.planning/debug/[slug].md` — active debug session tracking.
+---
+## File Template
+```markdown
+---
+status: gathering | investigating | fixing | verifying | resolved
+trigger: "[verbatim user input]"
+created: [ISO timestamp]
+updated: [ISO timestamp]
+---
+## Current Focus
+<!-- OVERWRITE on each update - always reflects NOW -->
+hypothesis: [current theory being tested]
+test: [how testing it]
+expecting: [what result means if true/false]
+next_action: [immediate next step]
+## Symptoms
+<!-- Written during gathering, then immutable -->
+expected: [what should happen]
+actual: [what actually happens]
+errors: [error messages if any]
+reproduction: [how to trigger]
+started: [when it broke / always broken]
+## Eliminated
+<!-- APPEND only - prevents re-investigating after /clear -->
+- hypothesis: [theory that was wrong]
+  evidence: [what disproved it]
+  timestamp: [when eliminated]
+## Evidence
+<!-- APPEND only - facts discovered during investigation -->
+- timestamp: [when found]
+  checked: [what was examined]
+  found: [what was observed]
+  implication: [what this means]
+## Resolution
+<!-- OVERWRITE as understanding evolves -->
+root_cause: [empty until found]
+fix: [empty until applied]
+verification: [empty until verified]
+files_changed: []
+```
+---
+<section_rules>
+**Frontmatter (status, trigger, timestamps):**
+- `status`: OVERWRITE - reflects current phase
+- `trigger`: IMMUTABLE - verbatim user input, never changes
+- `created`: IMMUTABLE - set once
+- `updated`: OVERWRITE - update on every change
+**Current Focus:**
+- OVERWRITE entirely on each update
+- Always reflects what Claude is doing RIGHT NOW
+- If Claude reads this after /clear, it knows exactly where to resume
+- Fields: hypothesis, test, expecting, next_action
+**Symptoms:**
+- Written during initial gathering phase
+- IMMUTABLE after gathering complete
+- Reference point for what we're trying to fix
+- Fields: expected, actual, errors, reproduction, started
+**Eliminated:**
+- APPEND only - never remove entries
+- Prevents re-investigating dead ends after context reset
+- Each entry: hypothesis, evidence that disproved it, timestamp
+- Critical for efficiency across /clear boundaries
+**Evidence:**
+- APPEND only - never remove entries
+- Facts discovered during investigation
+- Each entry: timestamp, what checked, what found, implication
+- Builds the case for root cause
+**Resolution:**
+- OVERWRITE as understanding evolves
+- May update multiple times as fixes are tried
+- Final state shows confirmed root cause and verified fix
+- Fields: root_cause, fix, verification, files_changed
+</section_rules>
+<lifecycle>
+**Creation:** Immediately when /grd:debug is called
+- Create file with trigger from user input
+- Set status to "gathering"
+- Current Focus: next_action = "gather symptoms"
+- Symptoms: empty, to be filled
+**During symptom gathering:**
+- Update Symptoms section as user answers questions
+- Update Current Focus with each question
+- When complete: status → "investigating"
+**During investigation:**
+- OVERWRITE Current Focus with each hypothesis
+- APPEND to Evidence with each finding
+- APPEND to Eliminated when hypothesis disproved
+- Update the `updated` timestamp in frontmatter
+**During fixing:**
+- status → "fixing"
+- Update Resolution.root_cause when confirmed
+- Update Resolution.fix when applied
+- Update Resolution.files_changed
+**During verification:**
+- status → "verifying"
+- Update Resolution.verification with results
+- If verification fails: status → "investigating", try again
+**On resolution:**
+- status → "resolved"
+- Move file to .planning/debug/resolved/
+</lifecycle>
+<resume_behavior>
+When Claude reads this file after /clear:
+1. Parse frontmatter → know status
+2. Read Current Focus → know exactly what was happening
+3. Read Eliminated → know what NOT to retry
+4. Read Evidence → know what's been learned
+5. Continue from next_action
+The file IS the debugging brain. Claude should be able to resume perfectly from any interruption point.
+</resume_behavior>
+<size_constraint>
+Keep debug files focused:
+- Evidence entries: 1-2 lines each, just the facts
+- Eliminated: brief - hypothesis + why it failed
+- No narrative prose - structured data only
+If evidence grows very large (10+ entries), consider whether you're going in circles. Check Eliminated to ensure you're not re-treading.
+</size_constraint>
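The `<lifecycle>` section of the DEBUG.md template above describes status changes only in prose. A minimal sketch of those transitions as a lookup table, assuming the statuses from the frontmatter line and treating any move not mentioned in the lifecycle as illegal; `ALLOWED_TRANSITIONS` and `next_status` are hypothetical names, not part of the package:

```python
# Hypothetical helper: one reading of the <lifecycle> prose in DEBUG.md.
# Transitions the template does not mention are treated as illegal here.
ALLOWED_TRANSITIONS = {
    "gathering": {"investigating"},
    "investigating": {"fixing"},
    "fixing": {"verifying"},
    # A failed verification sends the session back to investigating.
    "verifying": {"resolved", "investigating"},
    "resolved": set(),
}


def next_status(current: str, target: str) -> str:
    """Return target if the lifecycle allows current -> target, else raise."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

A caller would check `next_status(old, new)` before overwriting the `status` field, so a session cannot jump from "gathering" straight to "resolved".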

package/get-research-done/templates/UAT.md ADDED

@@ -0,0 +1,247 @@
+# UAT Template
+Template for `.planning/phases/XX-name/{phase}-UAT.md` — persistent UAT session tracking.
+---
+## File Template
+```markdown
+---
+status: testing | complete | diagnosed
+phase: XX-name
+source: [list of SUMMARY.md files tested]
+started: [ISO timestamp]
+updated: [ISO timestamp]
+---
+## Current Test
+<!-- OVERWRITE each test - shows where we are -->
+number: [N]
+name: [test name]
+expected: |
+  [what user should observe]
+awaiting: user response
+## Tests
+### 1. [Test Name]
+expected: [observable behavior - what user should see]
+result: [pending]
+### 2. [Test Name]
+expected: [observable behavior]
+result: pass
+### 3. [Test Name]
+expected: [observable behavior]
+result: issue
+reported: "[verbatim user response]"
+severity: major
+### 4. [Test Name]
+expected: [observable behavior]
+result: skipped
+reason: [why skipped]
+...
+## Summary
+total: [N]
+passed: [N]
+issues: [N]
+pending: [N]
+skipped: [N]
+## Gaps
+<!-- YAML format for plan-phase --gaps consumption -->
+- truth: "[expected behavior from test]"
+  status: failed
+  reason: "User reported: [verbatim response]"
+  severity: blocker | major | minor | cosmetic
+  test: [N]
+  root_cause: ""     # Filled by diagnosis
+  artifacts: []      # Filled by diagnosis
+  missing: []        # Filled by diagnosis
+  debug_session: ""  # Filled by diagnosis
+```
+---
+<section_rules>
+**Frontmatter:**
+- `status`: OVERWRITE - "testing", "complete", or "diagnosed"
+- `phase`: IMMUTABLE - set on creation
+- `source`: IMMUTABLE - SUMMARY files being tested
+- `started`: IMMUTABLE - set on creation
+- `updated`: OVERWRITE - update on every change
+**Current Test:**
+- OVERWRITE entirely on each test transition
+- Shows which test is active and what's awaited
+- On completion: "[testing complete]"
+**Tests:**
+- Each test: OVERWRITE result field when user responds
+- `result` values: [pending], pass, issue, skipped
+- If issue: add `reported` (verbatim) and `severity` (inferred)
+- If skipped: add `reason` if provided
+**Summary:**
+- OVERWRITE counts after each response
+- Tracks: total, passed, issues, pending, skipped
+**Gaps:**
+- APPEND only when issue found (YAML format)
+- After diagnosis: fill `root_cause`, `artifacts`, `missing`, `debug_session`
+- This section feeds directly into /grd:plan-phase --gaps
+</section_rules>
+<diagnosis_lifecycle>
+**After testing complete (status: complete), if gaps exist:**
+1. User runs diagnosis (from verify-work offer or manually)
+2. diagnose-issues workflow spawns parallel debug agents
+3. Each agent investigates one gap, returns root cause
+4. UAT.md Gaps section updated with diagnosis:
+   - Each gap gets `root_cause`, `artifacts`, `missing`, `debug_session` filled
+5. status → "diagnosed"
+6. Ready for /grd:plan-phase --gaps with root causes
+**After diagnosis:**
+```yaml
+## Gaps
+- truth: "Comment appears immediately after submission"
+  status: failed
+  reason: "User reported: works but doesn't show until I refresh the page"
+  severity: major
+  test: 2
+  root_cause: "useEffect in CommentList.tsx missing commentCount dependency"
+  artifacts:
+    - path: "src/components/CommentList.tsx"
+      issue: "useEffect missing dependency"
+  missing:
+    - "Add commentCount to useEffect dependency array"
+  debug_session: ".planning/debug/comment-not-refreshing.md"
+```
+</diagnosis_lifecycle>
+<lifecycle>
+**Creation:** When /grd:verify-work starts new session
+- Extract tests from SUMMARY.md files
+- Set status to "testing"
+- Current Test points to test 1
+- All tests have result: [pending]
+**During testing:**
+- Present test from Current Test section
+- User responds with pass confirmation or issue description
+- Update test result (pass/issue/skipped)
+- Update Summary counts
+- If issue: append to Gaps section (YAML format), infer severity
+- Move Current Test to next pending test
+**On completion:**
+- status → "complete"
+- Current Test → "[testing complete]"
+- Commit file
+- Present summary with next steps
+**Resume after /clear:**
+1. Read frontmatter → know phase and status
+2. Read Current Test → know where we are
+3. Find first [pending] result → continue from there
+4. Summary shows progress so far
+</lifecycle>
+<severity_guide>
+Severity is INFERRED from user's natural language, never asked.
+| User describes | Infer |
+|----------------|-------|
+| Crash, error, exception, fails completely, unusable | blocker |
+| Doesn't work, nothing happens, wrong behavior, missing | major |
+| Works but..., slow, weird, minor, small issue | minor |
+| Color, font, spacing, alignment, visual, looks off | cosmetic |
+Default: **major** (safe default, user can clarify if wrong)
+</severity_guide>
+<good_example>
+```markdown
+---
+status: diagnosed
+phase: 04-comments
+source: 04-01-SUMMARY.md, 04-02-SUMMARY.md
+started: 2025-01-15T10:30:00Z
+updated: 2025-01-15T10:45:00Z
+---
+## Current Test
+[testing complete]
+## Tests
+### 1. View Comments on Post
+expected: Comments section expands, shows count and comment list
+result: pass
+### 2. Create Top-Level Comment
+expected: Submit comment via rich text editor, appears in list with author info
+result: issue
+reported: "works but doesn't show until I refresh the page"
+severity: major
+### 3. Reply to a Comment
+expected: Click Reply, inline composer appears, submit shows nested reply
+result: pass
+### 4. Visual Nesting
+expected: 3+ level thread shows indentation, left borders, caps at reasonable depth
+result: pass
+### 5. Delete Own Comment
+expected: Click delete on own comment, removed or shows [deleted] if has replies
+result: pass
+### 6. Comment Count
+expected: Post shows accurate count, increments when adding comment
+result: pass
+## Summary
+total: 6
+passed: 5
+issues: 1
+pending: 0
+skipped: 0
+## Gaps
+- truth: "Comment appears immediately after submission in list"
+  status: failed
+  reason: "User reported: works but doesn't show until I refresh the page"
+  severity: major
+  test: 2
+  root_cause: "useEffect in CommentList.tsx missing commentCount dependency"
+  artifacts:
+    - path: "src/components/CommentList.tsx"
+      issue: "useEffect missing dependency"
+  missing:
+    - "Add commentCount to useEffect dependency array"
+  debug_session: ".planning/debug/comment-not-refreshing.md"
+```
+</good_example>
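The UAT.md section rules above say the Summary counts are overwritten after each response. A minimal sketch of recomputing them from the `result:` lines of the Tests section, assuming the four `result` values the template lists; `summarize` and `SUMMARY_KEY` are hypothetical names, and the package may derive these counts differently:

```python
import re
from collections import Counter

# Matches one "result: <value>" line; the value must sit on the same line.
RESULT_LINE = re.compile(r"^result:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)

# Maps the template's `result` values to the Summary field names.
SUMMARY_KEY = {
    "pass": "passed",
    "issue": "issues",
    "[pending]": "pending",
    "skipped": "skipped",
}


def summarize(tests_section: str) -> dict:
    """Count results in a Tests section and return the Summary fields."""
    counts = Counter()
    for value in RESULT_LINE.findall(tests_section):
        if value not in SUMMARY_KEY:
            raise ValueError(f"unexpected result value: {value!r}")
        counts[SUMMARY_KEY[value]] += 1
    fields = {name: counts[name] for name in ("passed", "issues", "pending", "skipped")}
    return {"total": sum(fields.values()), **fields}
```

Recomputing from the Tests section, rather than incrementing counters, keeps the Summary consistent with the results even after a session is resumed.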

package/get-research-done/templates/archive-reason.md ADDED

@@ -0,0 +1,195 @@
+# Archive Reason Template
+Template for documenting failed/abandoned hypotheses in `experiments/archive/YYYY-MM-DD_hypothesis_name/ARCHIVE_REASON.md`.
+---
+## File Template
+```markdown
+# Archive Reason: {{hypothesis_name}}
+**Archived:** {{ISO_8601_timestamp}}
+**Original Hypothesis:** {{hypothesis_statement_from_objective}}
+**Final Iteration:** {{N}} of {{limit}}
+**Final Verdict:** {{ESCALATE|REVISE_METHOD_limit|REVISE_DATA_unresolved}}
+## Why This Failed
+{{user_rationale_required}}
+## What We Learned
+{{Insights from failed attempts - to be filled by user}}
+- Key finding 1
+- Key finding 2
+- Key finding 3
+## What Would Need to Change
+{{Conditions under which this might work - to be filled by user}}
+- Required change 1
+- Required change 2
+- Required change 3
+## Final Metrics
+| Metric | Best Value | Target | Gap |
+|--------|------------|--------|-----|
+| {{metric}} | {{best_achieved}} | {{threshold}} | {{difference}} |
+## Iteration Timeline
+See: `ITERATION_SUMMARY.md` for detailed history of all attempts.
+**Summary:**
+- Total runs: {{N}}
+- Verdict distribution: {{PROCEED: X, REVISE_METHOD: Y, REVISE_DATA: Z, ESCALATE: W}}
+- Best composite score: {{best_score}} (needed: {{threshold}})
+---
+*This negative result is preserved to prevent future researchers from repeating this approach without the necessary conditions.*
+---
+**Archive location:** experiments/archive/{{YYYY-MM-DD}}_{{hypothesis_slug}}/
+**Decision recorded:** human_eval/decision_log.md
+```
+---
+## Usage Notes
+**Field descriptions:**
+- **hypothesis_name:** Human-readable name extracted from OBJECTIVE.md "what" section
+- **hypothesis_slug:** Filename-safe version (lowercase, spaces→underscores, all other non-alphanumeric characters removed)
+- **ISO_8601_timestamp:** Format YYYY-MM-DDTHH:MM:SSZ (UTC time)
+- **hypothesis_statement_from_objective:** Full "what/why/expected" from OBJECTIVE.md
+- **N:** Final iteration count from run directory
+- **limit:** Iteration limit (default 5, or custom from --limit flag)
+- **Final Verdict:** Reason for archival (ESCALATE, REVISE_METHOD limit reached, REVISE_DATA unresolved)
+**Why This Failed (REQUIRED):**
+This is the most critical section. User must provide substantive explanation:
+- What was attempted
+- Why it didn't work
+- What blocked success
+- Any insights about the approach
+Examples:
+- "Data quality issues prevented reliable model training. Missing values in key features caused high variance."
+- "Hypothesis was too ambitious given available data. Sample size (N=500) insufficient for complex ensemble methods."
+- "Leakage detection revealed fundamental data collection flaw that cannot be corrected without re-collection."
+**What We Learned (user fills):**
+Insights that emerged from failed attempts. Examples:
+- "Feature X has stronger predictive power than initially assumed"
+- "Class imbalance >90% requires specialized techniques beyond standard methods"
+- "Temporal drift in data makes cross-validation unreliable"
+**What Would Need to Change (user fills):**
+Conditions for future success. Examples:
+- "Collect 10x more data (N=5000+) to support ensemble complexity"
+- "Fix data pipeline to prevent leakage at source"
+- "Reformulate as binary classification instead of multi-class"
+**Final Metrics table:**
+- Show best values achieved across ALL iterations (not just final)
+- Include gap calculation (best_achieved - target), so a negative gap means the target was missed
+- Order by importance (primary metric first)
+**Example populated template:**
+```markdown
+# Archive Reason: Ensemble Methods for Fraud Detection
+**Archived:** 2026-01-30T15:45:00Z
+**Original Hypothesis:** Ensemble methods will improve F1 score over single models by combining predictions from random forest, gradient boosting, and neural networks.
+**Final Iteration:** 5 of 5
+**Final Verdict:** REVISE_METHOD_limit
+## Why This Failed
+After 5 iterations with different ensemble configurations, we could not achieve the target F1 score of 0.85. The fundamental issue is severe class imbalance (99.2% negative class) combined with limited positive examples (N=120). Ensemble methods require sufficient positive examples to learn diverse patterns, but our dataset is too imbalanced for this approach to work effectively.
+All iterations showed high precision (>0.90) but poor recall (≤0.42), resulting in F1 scores between 0.52-0.58. Attempts to address this through resampling (SMOTE, undersampling) introduced artificial patterns that didn't generalize to the test set.
+## What We Learned
+- Class imbalance of 99%+ requires specialized loss functions (focal loss) rather than ensemble complexity
+- Resampling techniques (SMOTE) work poorly with high-dimensional data (237 features)
+- Single model (gradient boosting) with class weights performed nearly as well as ensembles (F1: 0.56 vs 0.58)
+- Feature importance analysis revealed only 15 features have meaningful signal
+## What Would Need to Change
+- Collect 10x more positive examples (N=1200+) to support ensemble diversity
+- Reduce feature space to top 15-20 features to prevent overfitting on noise
+- Use focal loss or cost-sensitive learning instead of standard ensemble methods
+- Consider anomaly detection approaches instead of classification
+- Re-evaluate hypothesis: perhaps single model is sufficient given data constraints
+## Final Metrics
+| Metric | Best Value | Target | Gap |
+|--------|------------|--------|-----|
+| f1_score | 0.58 | 0.85 | -0.27 |
+| precision | 0.92 | 0.80 | +0.12 |
+| recall | 0.42 | 0.80 | -0.38 |
+## Iteration Timeline
+See: `ITERATION_SUMMARY.md` for detailed history of all attempts.
+**Summary:**
+- Total runs: 5
+- Verdict distribution: PROCEED: 0, REVISE_METHOD: 5, REVISE_DATA: 0, ESCALATE: 0
+- Best composite score: 0.64 (needed: 0.80)
+---
+*This negative result is preserved to prevent future researchers from repeating this approach without the necessary conditions.*
+---
+**Archive location:** experiments/archive/2026-01-30_ensemble_methods_fraud_detection/
+**Decision recorded:** human_eval/decision_log.md
+```
+---
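The usage notes above describe the slug and the Gap column in words. A minimal sketch of both, assuming the slug rule means "lowercase, whitespace to underscores, everything else non-alphanumeric dropped" and that the gap follows the sign convention of the example table (best value minus target). `hypothesis_slug`, `archive_dir`, and `gap` are hypothetical helpers, not functions shipped in the package:

```python
import re


def hypothesis_slug(name: str) -> str:
    """Lowercase the name, turn whitespace runs into underscores, drop the rest."""
    underscored = re.sub(r"\s+", "_", name.strip().lower())
    return re.sub(r"[^a-z0-9_]", "", underscored)


def archive_dir(date_iso: str, name: str) -> str:
    """Build the archive path from a YYYY-MM-DD date and a hypothesis name."""
    return f"experiments/archive/{date_iso}_{hypothesis_slug(name)}/"


def gap(best: float, target: float) -> str:
    """Format best minus target with an explicit sign, as in the Gap column."""
    return f"{best - target:+.2f}"
```

With the example's f1_score row, `gap(0.58, 0.85)` gives `-0.27`. The example's archive path also omits the word "for" from the hypothesis name, so any shortening of that kind is presumably done by hand and is not covered by this sketch.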
+## Integration
+This template is used by the `/grd:evaluate` command in Phase 5 (Archive Handling) when the user selects the "Archive" decision.
+**Inputs:**
+- OBJECTIVE.md (hypothesis statement, metrics, thresholds)
+- All SCORECARD.json files across runs (for best metrics)
+- All CRITIC_LOG.md files (for verdict history)
+- User rationale (REQUIRED from confirmation prompt)
+- Iteration metadata (count, limit, final verdict)
+**Outputs:**
+- experiments/archive/YYYY-MM-DD_hypothesis_slug/ARCHIVE_REASON.md (this template)
+- Referenced by ITERATION_SUMMARY.md in same directory
+- Logged in human_eval/decision_log.md
+**Archive directory structure:**
+```
+experiments/archive/YYYY-MM-DD_hypothesis_name/
+├── ARCHIVE_REASON.md          # This template (why it failed)
+├── ITERATION_SUMMARY.md        # Collapsed run history
+└── final_run/                  # Final run directory moved from experiments/
+    ├── DECISION.md
+    ├── SCORECARD.json
+    ├── CRITIC_LOG.md
+    └── ...
+```