opencode-autoresearch 3.1.0-beta.2 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/.opencode-plugin/plugin.json +1 -1
  2. package/AGENTS.md +42 -0
  3. package/README.md +246 -30
  4. package/VERSION +1 -0
  5. package/dist/cli.js +508 -15
  6. package/dist/cli.js.map +1 -1
  7. package/dist/constants.d.ts +1 -5
  8. package/dist/constants.d.ts.map +1 -1
  9. package/dist/constants.js +1 -5
  10. package/dist/constants.js.map +1 -1
  11. package/dist/helpers.d.ts +1 -2
  12. package/dist/helpers.d.ts.map +1 -1
  13. package/dist/helpers.js +19 -10
  14. package/dist/helpers.js.map +1 -1
  15. package/dist/index.d.ts +1 -1
  16. package/dist/index.d.ts.map +1 -1
  17. package/dist/run-manager.d.ts +2 -2
  18. package/dist/run-manager.d.ts.map +1 -1
  19. package/dist/run-manager.js +18 -16
  20. package/dist/run-manager.js.map +1 -1
  21. package/dist/subagent-pool.d.ts +6 -0
  22. package/dist/subagent-pool.d.ts.map +1 -1
  23. package/dist/subagent-pool.js +12 -2
  24. package/dist/subagent-pool.js.map +1 -1
  25. package/dist/types.d.ts +15 -38
  26. package/dist/types.d.ts.map +1 -1
  27. package/dist/wizard.d.ts.map +1 -1
  28. package/dist/wizard.js +2 -1
  29. package/dist/wizard.js.map +1 -1
  30. package/docs/ARCHITECTURE.md +134 -28
  31. package/docs/RELEASE.md +54 -25
  32. package/hooks/init.sh +6 -2
  33. package/hooks/status.sh +4 -3
  34. package/hooks/stop.sh +10 -6
  35. package/hooks/verify-package.sh +78 -0
  36. package/package.json +34 -14
  37. package/skills/autoresearch/SKILL.md +29 -4
  38. package/skills/autoresearch/references/core-principles.md +3 -3
  39. package/skills/autoresearch/references/interaction-wizard.md +1 -1
  40. package/skills/autoresearch/references/loop-workflow.md +4 -4
  41. package/skills/autoresearch/references/plan-workflow.md +2 -2
  42. package/skills/autoresearch/references/results-logging.md +1 -1
  43. package/skills/autoresearch/references/self-improve-loop.md +255 -0
  44. package/skills/autoresearch/references/state-management.md +3 -3
  45. package/skills/autoresearch/references/subagent-orchestration.md +1 -1
package/skills/autoresearch/SKILL.md
@@ -1,8 +1,8 @@
  ---
  name: autoresearch
- description: "Run a subagent-first structured improve-verify loop in OpenCode. Activate with /autoresearch or specialized modes like /autoresearch:plan, /autoresearch:debug, /autoresearch:fix, /autoresearch:learn, /autoresearch:predict, /autoresearch:scenario, /autoresearch:security, /autoresearch:ship."
+ description: "Run a subagent-first structured improve-verify loop in OpenCode. Activate with /autoresearch or specialized modes like /autoresearch:plan, /autoresearch:debug, /autoresearch:fix, /autoresearch:learn, /autoresearch:predict, /autoresearch:scenario, /autoresearch:security, /autoresearch:ship. Supports recursive self-improvement loops."
  metadata:
- short-description: "Subagent-first autonomous iteration loop for OpenCode"
+ short-description: "Subagent-first autonomous iteration loop for OpenCode with recursive self-improvement"
  ---

  # Auto Research for OpenCode
@@ -18,7 +18,8 @@ When invoked:
  3. Read `references/subagent-orchestration.md`
  4. For new interactive runs, read `references/interaction-wizard.md`, `references/plan-workflow.md`, and `references/loop-workflow.md`
  5. For state and results semantics, read `references/state-management.md` and `references/results-logging.md`
- 6. For specialized modes, read the matching workflow reference:
+ 6. For self-improvement and recursive loops, read `references/self-improve-loop.md`
+ 7. For specialized modes, read the matching workflow reference:
  - `references/debug-workflow.md`
  - `references/fix-workflow.md`
  - `references/learn-workflow.md`
@@ -37,6 +38,27 @@ The main agent is the orchestrator. Subagents are the standing execution pool.
  - The main agent owns the final decision, the edit, and the run state.
  - Approval belongs before launch. After launch, continue by default unless the user stops the run.

+ ## Recursive Self-Improvement
+
+ Auto Research can run on itself:
+
+ ```mermaid
+ flowchart TD
+ A[Meta-Goal] --> B[Child Loop]
+ B --> C[Evaluate]
+ C --> D{Improve?}
+ D -->|yes| E[Learn + Memory]
+ D -->|no| F[Adapt Strategy]
+ E --> G[Next Child]
+ F --> G
+ G --> B
+ ```
+
+ - Use `references/self-improve-loop.md` for recursive run semantics.
+ - Meta-iterations spawn child loops that inherit the meta-goal.
+ - Patterns extracted from child results guide strategy adaptation.
+ - Memory persists across meta-iterations in `autoresearch-memory.md`.
+
  ## Required Internal Fields

  Infer or confirm before launching:
@@ -61,6 +83,7 @@ Strongly recommended:
  5. Record every iteration before the next one starts.
  6. Keep strict improvements, discard regressions.
  7. Continue until the stop condition is met.
+ 8. For self-improvement runs, archive state before each meta-iteration.

  ## Background Control

@@ -74,4 +97,6 @@ autoresearch complete

  ## Output

- Follow `references/structured-output-spec.md`. Print a setup summary before the first iteration, short progress updates during the loop, and a completion summary when done.
+ Follow `references/structured-output-spec.md`. Print a setup summary before the first iteration, short progress updates during the loop, and a completion summary when done.
+
+ For recursive runs, emit meta-iteration summaries in addition to standard progress.
package/skills/autoresearch/references/core-principles.md
@@ -13,8 +13,8 @@ The loop exists to make disciplined progress, not noisy activity.

  ## Artifact Discipline

- `autoresearch-state.json` is the current run snapshot.
- `research-results.tsv` is the append-only experiment log.
- `autoresearch-launch.json` is the last background launch request.
+ `.autoresearch/state.json` is the current run snapshot.
+ `autoresearch-results.tsv` is the append-only experiment log.
+ `.autoresearch/launch.json` is the last background launch request.

  Only helper scripts should mutate these files when possible.
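The archive discipline called out here (and required before each meta-iteration by the self-improvement reference) can be sketched as a small helper. This is an illustrative Python sketch, not part of the package, which ships TypeScript; the `.autoresearch/state.json` path is documented above, but the `archive/` subdirectory name is an assumption.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_state(run_dir: str) -> Optional[Path]:
    """Copy .autoresearch/state.json to a timestamped archive before mutating it.

    Returns the archive path, or None when no state file exists yet.
    The archive directory name (.autoresearch/archive/) is hypothetical;
    only the state.json location comes from the artifact list above.
    """
    state = Path(run_dir) / ".autoresearch" / "state.json"
    if not state.exists():
        return None
    archive_dir = state.parent / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = archive_dir / f"state-{stamp}.json"
    shutil.copy2(state, target)
    return target
```

A helper like this keeps the "only helper scripts mutate these files" rule intact: callers archive first, then hand the write to the runtime.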
package/skills/autoresearch/references/interaction-wizard.md
@@ -14,7 +14,7 @@ Confirm or infer:
  6. Should the run stay in `foreground` or move to `background`?
  7. Which standing subagent pool should stay active for the run?

- Use `python scripts/autoresearch_wizard.py` to build the first setup summary, then only ask about fields that are still missing or risky.
+ Use `autoresearch wizard` to build the first setup summary, then only ask about fields that are still missing or risky.

  ## Launch Rule

package/skills/autoresearch/references/loop-workflow.md
@@ -4,10 +4,10 @@

  1. Read the relevant code and repo configuration.
  2. Read `references/subagent-orchestration.md` so the standing subagent pool and task split are clear before the first iteration.
- 3. Generate the initial setup summary with `scripts/autoresearch_wizard.py` when the request is incomplete.
+ 3. Generate the initial setup summary with `autoresearch wizard` when the request is incomplete.
  4. Summarize the goal, scope, metric, direction, verify command, guard, and subagent plan.
  5. Ask one grounded clarification round if needed.
- 6. Initialize artifacts with `scripts/autoresearch_init_run.py`.
+ 6. Initialize artifacts with `autoresearch init`.

  ## Phase 2: Iterate

@@ -19,7 +19,7 @@ For each iteration:
  4. Run verify and guard commands.
  5. Feed subagent findings back into the next iteration plan.
  6. Keep or discard the experiment.
- 7. Record the outcome with `scripts/autoresearch_record_iteration.py`.
+ 7. Record the outcome with `autoresearch record`.

  ## Phase 3: Decide

@@ -32,4 +32,4 @@ Stop when:

  Once the user approves launch, continue by default until one of those stop conditions is true. Do not restart the approval cycle on every pass; re-anchor the same standing pool and keep iterating.

- Background supervisors should use `scripts/autoresearch_supervisor_status.py` to make the relaunch decision from the same artifacts.
+ Background supervisors should use `autoresearch status` to make the relaunch decision from the same artifacts.
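The relaunch decision described here can be derived entirely from the documented state fields. The sketch below is hypothetical Python, not the actual `autoresearch status` implementation (which ships as TypeScript in `dist/`); the fields `status`, `stop_requested`, and `background_active` come from the state-management reference in this same package.

```python
import json
from pathlib import Path


def should_relaunch(run_dir: str) -> bool:
    """Decide whether a background run should be relaunched from its artifacts.

    Illustrative only: reads the documented state fields and nothing else.
    Completed runs are not resumable, and a requested stop is honored.
    """
    state_path = Path(run_dir) / ".autoresearch" / "state.json"
    if not state_path.exists():
        return False  # nothing to supervise yet
    state = json.loads(state_path.read_text())
    if state.get("status") == "completed":
        return False  # completed runs are not resumable
    if state.get("stop_requested"):
        return False  # the user asked the run to stop
    return bool(state.get("background_active"))
```

The point of the sketch is that the supervisor is stateless: every relaunch decision is reconstructed from the same on-disk artifacts the loop itself writes.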
package/skills/autoresearch/references/plan-workflow.md
@@ -19,7 +19,7 @@ Turn a vague request into a launch-ready setup summary with:

  1. Read the repo before asking anything.
  2. Infer defaults where the repo makes them obvious.
- 3. Generate the setup summary with `python scripts/autoresearch_wizard.py`.
+ 3. Generate the setup summary with `autoresearch wizard`.
  4. Ask only the missing or risky questions.
  5. Let the user correct the setup once before launch.
  6. If the user approves, initialize artifacts and start the loop.
@@ -38,5 +38,5 @@ Ask these in order when missing:
  ## Defaults

  - If the repo has `pytest.ini` or a `tests/` directory, default `verify` to `pytest`.
- - If the repo contains `scripts/autoresearch_supervisor_status.py`, offer it as the default guard.
+ - If the repo contains `autoresearch status`, offer it as the default guard.
  - Default metric direction to `lower` unless the user clearly wants to maximize a score.
package/skills/autoresearch/references/results-logging.md
@@ -1,6 +1,6 @@
  # Results Logging

- `research-results.tsv` is the primary append-only results log per run.
+ `autoresearch-results.tsv` is the primary append-only results log per run.

  The runtime maintains `autoresearch-results.tsv` as the canonical iteration log.

package/skills/autoresearch/references/self-improve-loop.md
@@ -0,0 +1,255 @@
+ # Self-Improvement Loop
+
+ Use this reference when Auto Research should run on its own codebase or when setting up long-running recursive improvement cycles.
+
+ ## Overview
+
+ The self-improvement loop is a **meta-orchestration layer** that sits above the standard improve-verify loop. It enables Auto Research to iteratively improve itself, its documentation, its test coverage, or any other measurable property of the autoresearch repository.
+
+ ```mermaid
+ flowchart TD
+ subgraph Meta["Meta-Orchestrator"]
+ M1[Define Meta-Goal] --> M2[Spawn Child Loop]
+ M2 --> M3[Evaluate Child Result]
+ M3 --> M4{Child Succeeded?}
+ M4 -->|yes| M5[Extract Patterns]
+ M4 -->|no| M6[Adapt Strategy]
+ M5 --> M7[Update Memory]
+ M6 --> M2
+ M7 --> M8{Meta Stop?}
+ M8 -->|no| M2
+ M8 -->|yes| M9[Final Meta-Report]
+ end
+
+ subgraph Child["Child Loop (Standard AR)"]
+ C1[Baseline] --> C2[Iterate]
+ C2 --> C3[Verify]
+ C3 --> C4{Keep?}
+ C4 -->|yes| C5[Record]
+ C4 -->|no| C6[Discard]
+ C5 --> C7{Stop?}
+ C6 --> C7
+ C7 -->|no| C2
+ C7 -->|yes| C8[Child Report]
+ end
+
+ M2 -.->|launches| C1
+ C8 -.->|feeds into| M3
+ ```
+
+ ## Activation Contract
+
+ When invoked for self-improvement:
+
+ 1. Read `references/core-principles.md`
+ 2. Read `references/loop-workflow.md`
+ 3. Read `references/subagent-orchestration.md`
+ 4. Read this document (`references/self-improve-loop.md`)
+ 5. Read `references/state-management.md` for artifact semantics
+ 6. Read `references/results-logging.md` for record format
+
+ ## Meta-Goal Definition
+
+ The meta-goal must be measurable and bounded:
+
+ - **Target**: What property of autoresearch should improve?
+ - **Metric**: Numeric measurement (e.g., test coverage %, doc completeness score)
+ - **Direction**: `lower` or `higher`
+ - **Verify**: Mechanical command that measures the metric
+ - **Guard**: Command that catches regressions in core functionality
+ - **Scope**: Which files/subsystems are in scope
+
+ ### Example Meta-Goals
+
+ ```bash
+ # Improve documentation coverage
+ autoresearch init \
+ --goal "All public APIs have documentation" \
+ --metric "doc_coverage_pct" \
+ --direction "higher" \
+ --verify "node scripts/measure-doc-coverage.js" \
+ --guard "npm run typecheck && npm run build"
+
+ # Improve test coverage
+ autoresearch init \
+ --goal "Increase branch coverage" \
+ --metric "branch_coverage" \
+ --direction "higher" \
+ --verify "npm run test:coverage" \
+ --guard "npm test"
+
+ # Reduce complexity
+ autoresearch init \
+ --goal "Reduce cyclomatic complexity" \
+ --metric "avg_complexity" \
+ --direction "lower" \
+ --verify "npx complexity-report src/" \
+ --guard "npm test"
+ ```
+
+ ## Recursive Loop Phases
+
+ ### Phase 1: Meta-Setup
+
+ 1. Define meta-goal, metric, direction, verify, guard, and scope.
+ 2. Baseline the current state of the autoresearch repository.
+ 3. Initialize `autoresearch-memory.md` with known patterns and strategies.
+ 4. Set iteration cap and wall-clock duration for the meta-loop.
+ 5. Determine child loop parameters (iterations per child, stop conditions).
+
+ ### Phase 2: Child Loop Execution
+
+ Each child loop is a standard Auto Research run:
+
+ 1. Inherit meta-goal as child goal.
+ 2. Run the standard improve-verify loop for N iterations or until child stop condition.
+ 3. Produce child report: iterations, keeps, discards, best metric, patterns found.
+
+ ### Phase 3: Meta-Evaluation
+
+ After each child loop completes:
+
+ 1. Evaluate child success: Did metric improve? Were there regressions?
+ 2. Extract reusable patterns from child results.
+ 3. Update strategy based on pattern analysis.
+ 4. Decide: spawn another child, adapt approach, or meta-stop.
+
+ ### Phase 4: Memory Update
+
+ Persist learnings across meta-iterations:
+
+ 1. Append successful patterns to `autoresearch-memory.md`.
+ 2. Update `.autoresearch/state.json` with meta-run progress.
+ 3. Record meta-iteration in `autoresearch-results.tsv` with `meta:` prefix.
+
+ ## Memory Format
+
+ The memory file tracks patterns that persist across runs:
+
+ ```markdown
+ # Auto Research Memory
+
+ ## Successful Patterns
+
+ ### Pattern: Incremental doc improvements
+ - Context: Adding mermaid diagrams to README
+ - Approach: One diagram per iteration, verify render
+ - Result: 3/3 kept, no regressions
+ - Confidence: high
+
+ ### Pattern: Test-first for new features
+ - Context: Adding self-improvement loop
+ - Approach: Write test, implement, verify
+ - Result: 5/7 kept, 2 discards due to edge cases
+ - Confidence: medium
+
+ ## Failed Approaches
+
+ ### Approach: Large rewrite of state manager
+ - Context: Trying to simplify run-manager.ts
+ - Result: 0/3 kept, multiple guard failures
+ - Lesson: Prefer incremental changes over rewrites
+
+ ## Strategy Recommendations
+
+ - For docs: incremental, one section per iteration
+ - For tests: test-first, small units
+ - For refactoring: typecheck-first, then test
+ ```
+
+ ## Meta-Stop Conditions
+
+ Stop the recursive loop when:
+
+ 1. **Goal met**: Metric reaches target threshold.
+ 2. **Diminishing returns**: N consecutive child loops with no improvement.
+ 3. **Iteration cap**: Meta-iteration cap reached.
+ 4. **Duration elapsed**: Wall-clock cap exceeded.
+ 5. **User request**: Explicit stop requested.
+ 6. **Needs human**: Child loop surfaces blocker requiring human input.
+
+ ## Meta-Iteration Record Format
+
+ Meta-iterations are recorded with a `meta:` prefix in the results log:
+
+ ```tsv
+ timestamp	iteration	decision	metric_value	verify_status	guard_status	hypothesis	change_summary	labels	note
+ 2024-01-15T10:00:00Z	meta:001	keep	68.5	pass	pass	strategy:incremental_docs	Child loop 001 completed with 5/7 kept	doc,meta	Pattern: one diagram per iteration
+ ```
+
+ ## Background Self-Improvement
+
+ For overnight or long-running self-improvement:
+
+ ```bash
+ autoresearch init \
+ --goal "Improve AutoResearch documentation and test coverage" \
+ --metric "combined_score" \
+ --direction "higher" \
+ --verify "node scripts/combined-score.js" \
+ --guard "npm run typecheck && npm test" \
+ --mode "background" \
+ --iterations "50" \
+ --duration "8h" \
+ --scope "src/,docs/,wiki/,skills/"
+
+ autoresearch launch
+ ```
+
+ The background supervisor (`autoresearch status`) will:
+
+ 1. Check child loop status periodically.
+ 2. Spawn new child loops when previous ones complete.
+ 3. Stop if meta-stop conditions are met.
+ 4. Resume from `.autoresearch/state.json` on restart.
+
+ ## Safety
+
+ Self-improvement loops have additional guardrails:
+
+ 1. **Scope enforcement**: Only modify files within declared scope.
+ 2. **Guard command**: Must pass before any keep decision.
+ 3. **Backup state**: Archive `.autoresearch/state.json` before each meta-iteration.
+ 4. **Human checkpoint**: Optional `needs_human` flag after N meta-iterations.
+ 5. **Rollback strategy**: Documented in memory for each pattern.
+
+ ## Subagent Pool for Self-Improvement
+
+ The standing pool for self-improvement includes:
+
+ | Role | Purpose |
+ | --- | --- |
+ | `meta_orchestrator` | Owns meta-goal and child loop decisions |
+ | `child_orchestrator` | Runs standard loop within child context |
+ | `pattern_analyst` | Extracts patterns from child results |
+ | `strategy_advisor` | Recommends tactic changes |
+ | `regression_guard` | Extra verification for self-modification |
+ | `doc_reviewer` | Reviews documentation changes |
+ | `test_designer` | Designs tests for new functionality |
+
+ ## Example Full Recursive Session
+
+ ```text
+ $ autoresearch init --goal "Improve README and add mermaid diagrams" \
+ --metric "doc_completeness" --direction higher \
+ --verify "node scripts/score-docs.js" --mode background
+
+ [meta-001] Child loop launched: 10 iterations
+ [child-001] Baseline: doc_completeness = 42
+ [child-001] iter 001: keep (diagram added, score 48)
+ [child-001] iter 002: keep (diagram added, score 53)
+ [child-001] iter 003: discard (diagram broken)
+ [child-001] iter 004: keep (diagram fixed, score 55)
+ ...
+ [child-001] Complete: 7/10 kept, best score 61
+
+ [meta-001] Pattern: SVG diagrams > mermaid for banners
+ [meta-001] Pattern: One section per iteration is optimal
+ [meta-002] Strategy adapted: Focus on wiki next
+ [meta-002] Child loop launched: 10 iterations
+ ...
+
+ [meta-stop] Goal threshold reached (80/100)
+ [meta-complete] Report: autoresearch-report.md
+ [meta-complete] Memory: autoresearch-memory.md
+ ```
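The `meta:` prefix makes the combined log easy to split when analyzing a run afterwards. A minimal Python sketch (illustrative only; the package's own tooling is TypeScript), assuming the ten tab-separated columns shown in the record format above:

```python
import csv
from io import StringIO

# Column order as listed in the Meta-Iteration Record Format section.
COLUMNS = ["timestamp", "iteration", "decision", "metric_value",
           "verify_status", "guard_status", "hypothesis",
           "change_summary", "labels", "note"]


def split_results(tsv_text: str):
    """Split autoresearch-results.tsv rows into meta- and child-iterations.

    Rows whose `iteration` field carries the `meta:` prefix belong to the
    meta-loop; everything else is a child iteration.
    """
    reader = csv.DictReader(StringIO(tsv_text), delimiter="\t")
    meta, child = [], []
    for row in reader:
        (meta if row["iteration"].startswith("meta:") else child).append(row)
    return meta, child


def keep_rate(rows):
    """Fraction of recorded iterations whose decision was `keep`."""
    if not rows:
        return 0.0
    return sum(r["decision"] == "keep" for r in rows) / len(rows)
```

Because meta-records and child records share one append-only file, a single pass like this recovers both the meta-loop history and each child loop's keep rate.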
package/skills/autoresearch/references/state-management.md
@@ -1,6 +1,6 @@
  # State Management

- `autoresearch-state.json` is the run checkpoint.
+ `.autoresearch/state.json` is the run checkpoint.

  ## Core Fields

@@ -28,7 +28,7 @@

  ## Resume Semantics

- `python scripts/autoresearch_runtime_ctl.py resume` clears `stop_requested` and marks the background run active again.
+ `autoresearch resume` clears `stop_requested` and marks the background run active again.
  - Resume does not create a new run; it continues the existing state snapshot.
  - Resume should re-anchor the standing pool with the latest metric, last iteration, and active role guidance before the next handoff.
  - Completed runs are not resumable; return to the previous state by starting a new run.
@@ -36,4 +36,4 @@

  ## Completion Semantics

- `python scripts/autoresearch_runtime_ctl.py complete` moves a background run to `completed`, clears `background_active`, and ends the detached session lifecycle.
+ `autoresearch complete` moves a background run to `completed`, clears `background_active`, and ends the detached session lifecycle.
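Taken together, the resume and completion semantics amount to a small state machine. A hedged Python model of those rules (field names from this reference; the real logic lives in the package's TypeScript `run-manager`, and this class is only an illustration):

```python
class RunState:
    """Minimal model of the resume/complete semantics described above.

    Illustrative only; the package's actual state lives in
    .autoresearch/state.json and is managed by the TypeScript runtime.
    """

    def __init__(self):
        self.status = "active"
        self.stop_requested = False
        self.background_active = True

    def resume(self):
        # Resume clears stop_requested and marks the background run active
        # again; it never creates a new run.
        if self.status == "completed":
            raise ValueError("completed runs are not resumable")
        self.stop_requested = False
        self.background_active = True

    def complete(self):
        # Complete moves the run to `completed`, clears background_active,
        # and ends the detached session lifecycle.
        self.status = "completed"
        self.background_active = False
```

The key invariant is one-way completion: `resume` is a no-op on history (it only flips flags on the existing snapshot), while `complete` is terminal.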
package/skills/autoresearch/references/subagent-orchestration.md
@@ -26,7 +26,7 @@ Use this reference when a run should be subagent-first.

  ## State Ownership

- Only the orchestrator records iterations, mutates `autoresearch-state.json`, and decides whether the latest step is `keep`, `discard`, or `needs_human`.
+ Only the orchestrator records iterations, mutates `.autoresearch/state.json`, and decides whether the latest step is `keep`, `discard`, or `needs_human`.
  - Subagents may disagree, critique, or verify, but their output is supporting evidence.
  - If several subagents contribute to one change, roll that evidence into one orchestrator-owned iteration result.