opencode-autoresearch 3.1.0-beta.2 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/.opencode-plugin/plugin.json +1 -1
  2. package/AGENTS.md +42 -0
  3. package/README.md +246 -30
  4. package/VERSION +1 -0
  5. package/dist/cli.js +508 -15
  6. package/dist/cli.js.map +1 -1
  7. package/dist/constants.d.ts +1 -5
  8. package/dist/constants.d.ts.map +1 -1
  9. package/dist/constants.js +1 -5
  10. package/dist/constants.js.map +1 -1
  11. package/dist/helpers.d.ts +1 -2
  12. package/dist/helpers.d.ts.map +1 -1
  13. package/dist/helpers.js +19 -10
  14. package/dist/helpers.js.map +1 -1
  15. package/dist/index.d.ts +1 -1
  16. package/dist/index.d.ts.map +1 -1
  17. package/dist/run-manager.d.ts +2 -2
  18. package/dist/run-manager.d.ts.map +1 -1
  19. package/dist/run-manager.js +18 -16
  20. package/dist/run-manager.js.map +1 -1
  21. package/dist/subagent-pool.d.ts +6 -0
  22. package/dist/subagent-pool.d.ts.map +1 -1
  23. package/dist/subagent-pool.js +12 -2
  24. package/dist/subagent-pool.js.map +1 -1
  25. package/dist/types.d.ts +15 -38
  26. package/dist/types.d.ts.map +1 -1
  27. package/dist/wizard.d.ts.map +1 -1
  28. package/dist/wizard.js +2 -1
  29. package/dist/wizard.js.map +1 -1
  30. package/docs/ARCHITECTURE.md +134 -28
  31. package/docs/RELEASE.md +54 -25
  32. package/hooks/init.sh +6 -2
  33. package/hooks/status.sh +4 -3
  34. package/hooks/stop.sh +10 -6
  35. package/hooks/verify-package.sh +78 -0
  36. package/package.json +34 -14
  37. package/skills/autoresearch/SKILL.md +29 -4
  38. package/skills/autoresearch/references/core-principles.md +3 -3
  39. package/skills/autoresearch/references/interaction-wizard.md +1 -1
  40. package/skills/autoresearch/references/loop-workflow.md +4 -4
  41. package/skills/autoresearch/references/plan-workflow.md +2 -2
  42. package/skills/autoresearch/references/results-logging.md +1 -1
  43. package/skills/autoresearch/references/self-improve-loop.md +255 -0
  44. package/skills/autoresearch/references/state-management.md +3 -3
  45. package/skills/autoresearch/references/subagent-orchestration.md +1 -1
package/skills/autoresearch/SKILL.md
@@ -1,8 +1,8 @@
  ---
  name: autoresearch
- description: "Run a subagent-first structured improve-verify loop in OpenCode. Activate with /autoresearch or specialized modes like /autoresearch:plan, /autoresearch:debug, /autoresearch:fix, /autoresearch:learn, /autoresearch:predict, /autoresearch:scenario, /autoresearch:security, /autoresearch:ship."
+ description: "Run a subagent-first structured improve-verify loop in OpenCode. Activate with /autoresearch or specialized modes like /autoresearch:plan, /autoresearch:debug, /autoresearch:fix, /autoresearch:learn, /autoresearch:predict, /autoresearch:scenario, /autoresearch:security, /autoresearch:ship. Supports recursive self-improvement loops."
  metadata:
- short-description: "Subagent-first autonomous iteration loop for OpenCode"
+ short-description: "Subagent-first autonomous iteration loop for OpenCode with recursive self-improvement"
  ---

  # Auto Research for OpenCode
@@ -18,7 +18,8 @@ When invoked:
  3. Read `references/subagent-orchestration.md`
  4. For new interactive runs, read `references/interaction-wizard.md`, `references/plan-workflow.md`, and `references/loop-workflow.md`
  5. For state and results semantics, read `references/state-management.md` and `references/results-logging.md`
- 6. For specialized modes, read the matching workflow reference:
+ 6. For self-improvement and recursive loops, read `references/self-improve-loop.md`
+ 7. For specialized modes, read the matching workflow reference:
  - `references/debug-workflow.md`
  - `references/fix-workflow.md`
  - `references/learn-workflow.md`
@@ -37,6 +38,27 @@ The main agent is the orchestrator. Subagents are the standing execution pool.
  - The main agent owns the final decision, the edit, and the run state.
  - Approval belongs before launch. After launch, continue by default unless the user stops the run.

+ ## Recursive Self-Improvement
+
+ Auto Research can run on itself:
+
+ ```mermaid
+ flowchart TD
+ A[Meta-Goal] --> B[Child Loop]
+ B --> C[Evaluate]
+ C --> D{Improve?}
+ D -->|yes| E[Learn + Memory]
+ D -->|no| F[Adapt Strategy]
+ E --> G[Next Child]
+ F --> G
+ G --> B
+ ```
+
+ - Use `references/self-improve-loop.md` for recursive run semantics.
+ - Meta-iterations spawn child loops that inherit the meta-goal.
+ - Patterns extracted from child results guide strategy adaptation.
+ - Memory persists across meta-iterations in `autoresearch-memory.md`.
+
  ## Required Internal Fields

  Infer or confirm before launching:
@@ -61,6 +83,7 @@ Strongly recommended:
  5. Record every iteration before the next one starts.
  6. Keep strict improvements, discard regressions.
  7. Continue until the stop condition is met.
+ 8. For self-improvement runs, archive state before each meta-iteration.

  ## Background Control

@@ -74,4 +97,6 @@ autoresearch complete

  ## Output

- Follow `references/structured-output-spec.md`. Print a setup summary before the first iteration, short progress updates during the loop, and a completion summary when done.
+ Follow `references/structured-output-spec.md`. Print a setup summary before the first iteration, short progress updates during the loop, and a completion summary when done.
+
+ For recursive runs, emit meta-iteration summaries in addition to standard progress.
package/skills/autoresearch/references/core-principles.md
@@ -13,8 +13,8 @@ The loop exists to make disciplined progress, not noisy activity.

  ## Artifact Discipline

- `autoresearch-state.json` is the current run snapshot.
- `research-results.tsv` is the append-only experiment log.
- `autoresearch-launch.json` is the last background launch request.
+ `.autoresearch/state.json` is the current run snapshot.
+ `autoresearch-results.tsv` is the append-only experiment log.
+ `.autoresearch/launch.json` is the last background launch request.

  Only helper scripts should mutate these files when possible.
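The archive discipline called out here (and required before each meta-iteration by the self-improvement reference) can be sketched as a small helper. This is an illustrative Python sketch, not part of the package, which ships TypeScript; the `.autoresearch/state.json` path is documented above, but the `archive/` subdirectory name is an assumption.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_state(run_dir: str) -> Optional[Path]:
    """Copy .autoresearch/state.json to a timestamped archive before mutating it.

    Returns the archive path, or None when no state file exists yet.
    The archive directory name (.autoresearch/archive/) is hypothetical;
    only the state.json location comes from the artifact list above.
    """
    state = Path(run_dir) / ".autoresearch" / "state.json"
    if not state.exists():
        return None
    archive_dir = state.parent / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = archive_dir / f"state-{stamp}.json"
    shutil.copy2(state, target)
    return target
```

A helper like this keeps the "only helper scripts mutate these files" rule intact: callers archive first, then hand the write to the runtime.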
package/skills/autoresearch/references/interaction-wizard.md
@@ -14,7 +14,7 @@ Confirm or infer:
  6. Should the run stay in `foreground` or move to `background`?
  7. Which standing subagent pool should stay active for the run?

- Use `python scripts/autoresearch_wizard.py` to build the first setup summary, then only ask about fields that are still missing or risky.
+ Use `autoresearch wizard` to build the first setup summary, then only ask about fields that are still missing or risky.

  ## Launch Rule

package/skills/autoresearch/references/loop-workflow.md
@@ -4,10 +4,10 @@

  1. Read the relevant code and repo configuration.
  2. Read `references/subagent-orchestration.md` so the standing subagent pool and task split are clear before the first iteration.
- 3. Generate the initial setup summary with `scripts/autoresearch_wizard.py` when the request is incomplete.
+ 3. Generate the initial setup summary with `autoresearch wizard` when the request is incomplete.
  4. Summarize the goal, scope, metric, direction, verify command, guard, and subagent plan.
  5. Ask one grounded clarification round if needed.
- 6. Initialize artifacts with `scripts/autoresearch_init_run.py`.
+ 6. Initialize artifacts with `autoresearch init`.

  ## Phase 2: Iterate

@@ -19,7 +19,7 @@ For each iteration:
  4. Run verify and guard commands.
  5. Feed subagent findings back into the next iteration plan.
  6. Keep or discard the experiment.
- 7. Record the outcome with `scripts/autoresearch_record_iteration.py`.
+ 7. Record the outcome with `autoresearch record`.

  ## Phase 3: Decide

@@ -32,4 +32,4 @@ Stop when:

  Once the user approves launch, continue by default until one of those stop conditions is true. Do not restart the approval cycle on every pass; re-anchor the same standing pool and keep iterating.

- Background supervisors should use `scripts/autoresearch_supervisor_status.py` to make the relaunch decision from the same artifacts.
+ Background supervisors should use `autoresearch status` to make the relaunch decision from the same artifacts.
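The relaunch decision described here can be derived entirely from the documented state fields. The sketch below is hypothetical Python, not the actual `autoresearch status` implementation (which ships as TypeScript in `dist/`); the fields `status`, `stop_requested`, and `background_active` come from the state-management reference in this same package.

```python
import json
from pathlib import Path


def should_relaunch(run_dir: str) -> bool:
    """Decide whether a background run should be relaunched from its artifacts.

    Illustrative only: reads the documented state fields and nothing else.
    Completed runs are not resumable, and a requested stop is honored.
    """
    state_path = Path(run_dir) / ".autoresearch" / "state.json"
    if not state_path.exists():
        return False  # nothing to supervise yet
    state = json.loads(state_path.read_text())
    if state.get("status") == "completed":
        return False  # completed runs are not resumable
    if state.get("stop_requested"):
        return False  # the user asked the run to stop
    return bool(state.get("background_active"))
```

The point of the sketch is that the supervisor is stateless: every relaunch decision is reconstructed from the same on-disk artifacts the loop itself writes.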
package/skills/autoresearch/references/plan-workflow.md
@@ -19,7 +19,7 @@ Turn a vague request into a launch-ready setup summary with:

  1. Read the repo before asking anything.
  2. Infer defaults where the repo makes them obvious.
- 3. Generate the setup summary with `python scripts/autoresearch_wizard.py`.
+ 3. Generate the setup summary with `autoresearch wizard`.
  4. Ask only the missing or risky questions.
  5. Let the user correct the setup once before launch.
  6. If the user approves, initialize artifacts and start the loop.
@@ -38,5 +38,5 @@ Ask these in order when missing:
  ## Defaults

  - If the repo has `pytest.ini` or a `tests/` directory, default `verify` to `pytest`.
- - If the repo contains `scripts/autoresearch_supervisor_status.py`, offer it as the default guard.
+ - If the repo contains `autoresearch status`, offer it as the default guard.
  - Default metric direction to `lower` unless the user clearly wants to maximize a score.
package/skills/autoresearch/references/results-logging.md
@@ -1,6 +1,6 @@
  # Results Logging

- `research-results.tsv` is the primary append-only results log per run.
+ `autoresearch-results.tsv` is the primary append-only results log per run.

  The runtime maintains `autoresearch-results.tsv` as the canonical iteration log.

package/skills/autoresearch/references/self-improve-loop.md
@@ -0,0 +1,255 @@
+ # Self-Improvement Loop
+
+ Use this reference when Auto Research should run on its own codebase or when setting up long-running recursive improvement cycles.
+
+ ## Overview
+
+ The self-improvement loop is a **meta-orchestration layer** that sits above the standard improve-verify loop. It enables Auto Research to iteratively improve itself, its documentation, its test coverage, or any other measurable property of the autoresearch repository.
+
+ ```mermaid
+ flowchart TD
+ subgraph Meta["Meta-Orchestrator"]
+ M1[Define Meta-Goal] --> M2[Spawn Child Loop]
+ M2 --> M3[Evaluate Child Result]
+ M3 --> M4{Child Succeeded?}
+ M4 -->|yes| M5[Extract Patterns]
+ M4 -->|no| M6[Adapt Strategy]
+ M5 --> M7[Update Memory]
+ M6 --> M2
+ M7 --> M8{Meta Stop?}
+ M8 -->|no| M2
+ M8 -->|yes| M9[Final Meta-Report]
+ end
+
+ subgraph Child["Child Loop (Standard AR)"]
+ C1[Baseline] --> C2[Iterate]
+ C2 --> C3[Verify]
+ C3 --> C4{Keep?}
+ C4 -->|yes| C5[Record]
+ C4 -->|no| C6[Discard]
+ C5 --> C7{Stop?}
+ C6 --> C7
+ C7 -->|no| C2
+ C7 -->|yes| C8[Child Report]
+ end
+
+ M2 -.->|launches| C1
+ C8 -.->|feeds into| M3
+ ```
+
+ ## Activation Contract
+
+ When invoked for self-improvement:
+
+ 1. Read `references/core-principles.md`
+ 2. Read `references/loop-workflow.md`
+ 3. Read `references/subagent-orchestration.md`
+ 4. Read this document (`references/self-improve-loop.md`)
+ 5. Read `references/state-management.md` for artifact semantics
+ 6. Read `references/results-logging.md` for record format
+
+ ## Meta-Goal Definition
+
+ The meta-goal must be measurable and bounded:
+
+ - **Target**: What property of autoresearch should improve?
+ - **Metric**: Numeric measurement (e.g., test coverage %, doc completeness score)
+ - **Direction**: `lower` or `higher`
+ - **Verify**: Mechanical command that measures the metric
+ - **Guard**: Command that catches regressions in core functionality
+ - **Scope**: Which files/subsystems are in scope
+
+ ### Example Meta-Goals
+
+ ```bash
+ # Improve documentation coverage
+ autoresearch init \
+ --goal "All public APIs have documentation" \
+ --metric "doc_coverage_pct" \
+ --direction "higher" \
+ --verify "node scripts/measure-doc-coverage.js" \
+ --guard "npm run typecheck && npm run build"
+
+ # Improve test coverage
+ autoresearch init \
+ --goal "Increase branch coverage" \
+ --metric "branch_coverage" \
+ --direction "higher" \
+ --verify "npm run test:coverage" \
+ --guard "npm test"
+
+ # Reduce complexity
+ autoresearch init \
+ --goal "Reduce cyclomatic complexity" \
+ --metric "avg_complexity" \
+ --direction "lower" \
+ --verify "npx complexity-report src/" \
+ --guard "npm test"
+ ```
+
+ ## Recursive Loop Phases
+
+ ### Phase 1: Meta-Setup
+
+ 1. Define meta-goal, metric, direction, verify, guard, and scope.
+ 2. Baseline the current state of the autoresearch repository.
+ 3. Initialize `autoresearch-memory.md` with known patterns and strategies.
+ 4. Set iteration cap and wall-clock duration for the meta-loop.
+ 5. Determine child loop parameters (iterations per child, stop conditions).
+
+ ### Phase 2: Child Loop Execution
+
+ Each child loop is a standard Auto Research run:
+
+ 1. Inherit meta-goal as child goal.
+ 2. Run the standard improve-verify loop for N iterations or until child stop condition.
+ 3. Produce child report: iterations, keeps, discards, best metric, patterns found.
+
+ ### Phase 3: Meta-Evaluation
+
+ After each child loop completes:
+
+ 1. Evaluate child success: Did metric improve? Were there regressions?
+ 2. Extract reusable patterns from child results.
+ 3. Update strategy based on pattern analysis.
+ 4. Decide: spawn another child, adapt approach, or meta-stop.
+
+ ### Phase 4: Memory Update
+
+ Persist learnings across meta-iterations:
+
+ 1. Append successful patterns to `autoresearch-memory.md`.
+ 2. Update `.autoresearch/state.json` with meta-run progress.
+ 3. Record meta-iteration in `autoresearch-results.tsv` with `meta:` prefix.
+
+ ## Memory Format
+
+ The memory file tracks patterns that persist across runs:
+
+ ```markdown
+ # Auto Research Memory
+
+ ## Successful Patterns
+
+ ### Pattern: Incremental doc improvements
+ - Context: Adding mermaid diagrams to README
+ - Approach: One diagram per iteration, verify render
+ - Result: 3/3 kept, no regressions
+ - Confidence: high
+
+ ### Pattern: Test-first for new features
+ - Context: Adding self-improvement loop
+ - Approach: Write test, implement, verify
+ - Result: 5/7 kept, 2 discards due to edge cases
+ - Confidence: medium
+
+ ## Failed Approaches
+
+ ### Approach: Large rewrite of state manager
+ - Context: Trying to simplify run-manager.ts
+ - Result: 0/3 kept, multiple guard failures
+ - Lesson: Prefer incremental changes over rewrites
+
+ ## Strategy Recommendations
+
+ - For docs: incremental, one section per iteration
+ - For tests: test-first, small units
+ - For refactoring: typecheck-first, then test
+ ```
+
+ ## Meta-Stop Conditions
+
+ Stop the recursive loop when:
+
+ 1. **Goal met**: Metric reaches target threshold.
+ 2. **Diminishing returns**: N consecutive child loops with no improvement.
+ 3. **Iteration cap**: Meta-iteration cap reached.
+ 4. **Duration elapsed**: Wall-clock cap exceeded.
+ 5. **User request**: Explicit stop requested.
+ 6. **Needs human**: Child loop surfaces blocker requiring human input.
+
+ ## Meta-Iteration Record Format
+
+ Meta-iterations are recorded with a `meta:` prefix in the results log:
+
+ ```tsv
+ timestamp	iteration	decision	metric_value	verify_status	guard_status	hypothesis	change_summary	labels	note
+ 2024-01-15T10:00:00Z	meta:001	keep	68.5	pass	pass	strategy:incremental_docs	Child loop 001 completed with 5/7 kept	doc,meta	Pattern: one diagram per iteration
+ ```
+
+ ## Background Self-Improvement
+
+ For overnight or long-running self-improvement:
+
+ ```bash
+ autoresearch init \
+ --goal "Improve AutoResearch documentation and test coverage" \
+ --metric "combined_score" \
+ --direction "higher" \
+ --verify "node scripts/combined-score.js" \
+ --guard "npm run typecheck && npm test" \
+ --mode "background" \
+ --iterations "50" \
+ --duration "8h" \
+ --scope "src/,docs/,wiki/,skills/"
+
+ autoresearch launch
+ ```
+
+ The background supervisor (`autoresearch status`) will:
+
+ 1. Check child loop status periodically.
+ 2. Spawn new child loops when previous ones complete.
+ 3. Stop if meta-stop conditions are met.
+ 4. Resume from `.autoresearch/state.json` on restart.
+
+ ## Safety
+
+ Self-improvement loops have additional guardrails:
+
+ 1. **Scope enforcement**: Only modify files within declared scope.
+ 2. **Guard command**: Must pass before any keep decision.
+ 3. **Backup state**: Archive `.autoresearch/state.json` before each meta-iteration.
+ 4. **Human checkpoint**: Optional `needs_human` flag after N meta-iterations.
+ 5. **Rollback strategy**: Documented in memory for each pattern.
+
+ ## Subagent Pool for Self-Improvement
+
+ The standing pool for self-improvement includes:
+
+ | Role | Purpose |
+ | --- | --- |
+ | `meta_orchestrator` | Owns meta-goal and child loop decisions |
+ | `child_orchestrator` | Runs standard loop within child context |
+ | `pattern_analyst` | Extracts patterns from child results |
+ | `strategy_advisor` | Recommends tactic changes |
+ | `regression_guard` | Extra verification for self-modification |
+ | `doc_reviewer` | Reviews documentation changes |
+ | `test_designer` | Designs tests for new functionality |
+
+ ## Example Full Recursive Session
+
+ ```text
+ $ autoresearch init --goal "Improve README and add mermaid diagrams" \
+ --metric "doc_completeness" --direction higher \
+ --verify "node scripts/score-docs.js" --mode background
+
+ [meta-001] Child loop launched: 10 iterations
+ [child-001] Baseline: doc_completeness = 42
+ [child-001] iter 001: keep (diagram added, score 48)
+ [child-001] iter 002: keep (diagram added, score 53)
+ [child-001] iter 003: discard (diagram broken)
+ [child-001] iter 004: keep (diagram fixed, score 55)
+ ...
+ [child-001] Complete: 7/10 kept, best score 61
+
+ [meta-001] Pattern: SVG diagrams > mermaid for banners
+ [meta-001] Pattern: One section per iteration is optimal
+ [meta-002] Strategy adapted: Focus on wiki next
+ [meta-002] Child loop launched: 10 iterations
+ ...
+
+ [meta-stop] Goal threshold reached (80/100)
+ [meta-complete] Report: autoresearch-report.md
+ [meta-complete] Memory: autoresearch-memory.md
+ ```
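The `meta:` prefix makes the combined log easy to split when analyzing a run afterwards. A minimal Python sketch (illustrative only; the package's own tooling is TypeScript), assuming the ten tab-separated columns shown in the record format above:

```python
import csv
from io import StringIO

# Column order as listed in the Meta-Iteration Record Format section.
COLUMNS = ["timestamp", "iteration", "decision", "metric_value",
           "verify_status", "guard_status", "hypothesis",
           "change_summary", "labels", "note"]


def split_results(tsv_text: str):
    """Split autoresearch-results.tsv rows into meta- and child-iterations.

    Rows whose `iteration` field carries the `meta:` prefix belong to the
    meta-loop; everything else is a child iteration.
    """
    reader = csv.DictReader(StringIO(tsv_text), delimiter="\t")
    meta, child = [], []
    for row in reader:
        (meta if row["iteration"].startswith("meta:") else child).append(row)
    return meta, child


def keep_rate(rows):
    """Fraction of recorded iterations whose decision was `keep`."""
    if not rows:
        return 0.0
    return sum(r["decision"] == "keep" for r in rows) / len(rows)
```

Because meta-records and child records share one append-only file, a single pass like this recovers both the meta-loop history and each child loop's keep rate.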
package/skills/autoresearch/references/state-management.md
@@ -1,6 +1,6 @@
  # State Management

- `autoresearch-state.json` is the run checkpoint.
+ `.autoresearch/state.json` is the run checkpoint.

  ## Core Fields

@@ -28,7 +28,7 @@

  ## Resume Semantics

- `python scripts/autoresearch_runtime_ctl.py resume` clears `stop_requested` and marks the background run active again.
+ `autoresearch resume` clears `stop_requested` and marks the background run active again.
  - Resume does not create a new run; it continues the existing state snapshot.
  - Resume should re-anchor the standing pool with the latest metric, last iteration, and active role guidance before the next handoff.
  - Completed runs are not resumable; return to the previous state by starting a new run.
@@ -36,4 +36,4 @@

  ## Completion Semantics

- `python scripts/autoresearch_runtime_ctl.py complete` moves a background run to `completed`, clears `background_active`, and ends the detached session lifecycle.
+ `autoresearch complete` moves a background run to `completed`, clears `background_active`, and ends the detached session lifecycle.
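Taken together, the resume and completion semantics amount to a small state machine. A hedged Python model of those rules (field names from this reference; the real logic lives in the package's TypeScript `run-manager`, and this class is only an illustration):

```python
class RunState:
    """Minimal model of the resume/complete semantics described above.

    Illustrative only; the package's actual state lives in
    .autoresearch/state.json and is managed by the TypeScript runtime.
    """

    def __init__(self):
        self.status = "active"
        self.stop_requested = False
        self.background_active = True

    def resume(self):
        # Resume clears stop_requested and marks the background run active
        # again; it never creates a new run.
        if self.status == "completed":
            raise ValueError("completed runs are not resumable")
        self.stop_requested = False
        self.background_active = True

    def complete(self):
        # Complete moves the run to `completed`, clears background_active,
        # and ends the detached session lifecycle.
        self.status = "completed"
        self.background_active = False
```

The key invariant is one-way completion: `resume` is a no-op on history (it only flips flags on the existing snapshot), while `complete` is terminal.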
package/skills/autoresearch/references/subagent-orchestration.md
@@ -26,7 +26,7 @@ Use this reference when a run should be subagent-first.

  ## State Ownership

- Only the orchestrator records iterations, mutates `autoresearch-state.json`, and decides whether the latest step is `keep`, `discard`, or `needs_human`.
+ Only the orchestrator records iterations, mutates `.autoresearch/state.json`, and decides whether the latest step is `keep`, `discard`, or `needs_human`.
  - Subagents may disagree, critique, or verify, but their output is supporting evidence.
  - If several subagents contribute to one change, roll that evidence into one orchestrator-owned iteration result.