npm - opencode-swarm - Versions diffs - 7.87.3 → 7.88.1 - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

package/.opencode/skills/brainstorm/SKILL.md +2 -1
package/.opencode/skills/clarify/SKILL.md +7 -1
package/.opencode/skills/clarify-spec/SKILL.md +1 -1
package/.opencode/skills/issue-ingest/SKILL.md +3 -2
package/.opencode/skills/plan/SKILL.md +7 -1
package/.opencode/skills/specify/SKILL.md +3 -2
package/.opencode/skills/swarm-pr-review/SKILL.md +304 -9
package/README.md +1 -0
package/dist/background/candidate-parser.d.ts +189 -0
package/dist/background/candidate-sidecar-store.d.ts +56 -0
package/dist/cli/{config-doctor-6h64pn8n.js → config-doctor-jzbgpbdh.js} +2 -2
package/dist/cli/{guardrail-explain-2q9myk7c.js → guardrail-explain-995zavv8.js} +5 -5
package/dist/cli/{guardrail-log-eegabqcp.js → guardrail-log-c7egm5km.js} +3 -3
package/dist/cli/{index-q9h0wb04.js → index-0asbrmdx.js} +4 -0
package/dist/cli/{index-kz1bmebr.js → index-4td9ef53.js} +523 -229
package/dist/cli/{index-1cb4wxnm.js → index-819xp49y.js} +1 -1
package/dist/cli/{index-5hvbw5xh.js → index-g00qm2gf.js} +1 -1
package/dist/cli/{index-r3f47swm.js → index-sr7g2msm.js} +6 -6
package/dist/cli/{index-amwa268r.js → index-tt5aehrb.js} +2 -2
package/dist/cli/{index-5vpe6vq9.js → index-vjsr9bqt.js} +1 -1
package/dist/cli/index.js +4 -4
package/dist/cli/{schema-84146tvk.js → schema-vb6jkxgg.js} +1 -1
package/dist/index.js +2114 -991
package/dist/memory/config.d.ts +1 -0
package/dist/memory/gateway.d.ts +1 -0
package/dist/memory/provider-pool.d.ts +50 -0
package/dist/memory/sqlite-provider.d.ts +3 -0
package/dist/tools/index.d.ts +1 -0
package/dist/tools/manifest.d.ts +1 -0
package/dist/tools/parse-lane-candidates.d.ts +2 -0
package/dist/tools/tool-metadata.d.ts +4 -0
package/package.json +1 -1

package/.opencode/skills/brainstorm/SKILL.md CHANGED

@@ -52,7 +52,8 @@ If `council.general.enabled` is true in the resolved opencode-swarm config AND a
 - Exit with a design outline the user can skim in under two minutes.
 **Phase 5: SPEC WRITE + SELF-REVIEW (architect + reviewer).**
-    - Generate `.swarm/spec.md` following the same SPEC CONTENT RULES that MODE: SPECIFY uses: WHAT/WHY only, no tech stack, no implementation details, FR-### / SC-### numbering, Given/When/Then scenarios, `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers).
+    - Generate `.swarm/spec.md` following the same SPEC CONTENT RULES that MODE: SPECIFY uses: WHAT/WHY only, no tech stack, no implementation details, FR-### / SC-### numbering, Given/When/Then scenarios, `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers).
+    - **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
 - Cross-reference design sections by name where relevant context helps (but keep HOW out of the spec).
 - Delegate to `the active swarm's reviewer agent` for an independent review of the draft spec. Reviewer must flag: requirements that encode HOW, untestable requirements, missing edge cases, silent assumptions.
 - Apply reviewer feedback. If reviewer rejects, iterate once and re-review. After two rounds, surface remaining disagreements to the user.
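The `research_needed` timeout rule added in this hunk can be sketched in TypeScript as below. This is a hypothetical illustration, not code from the package. Only the config key `research_needed_timeout_ms` and its 300000 ms default come from the skill text; the `ClarificationItem` shape, the `researchStartedAt` field, and the helper names are assumptions.

```typescript
// Hypothetical types: the skill names the classifications but not a data shape.
type Classification =
  | "self_resolved"
  | "critic_resolved"
  | "research_needed"
  | "user_decision"
  | "deferred_nonblocking";

interface ClarificationItem {
  id: string;
  classification: Classification;
  researchStartedAt?: number; // assumed field: epoch ms when research began
  note?: string;
}

// Default documented in the skill: 300000 ms (5 minutes).
const DEFAULT_RESEARCH_TIMEOUT_MS = 300_000;

// Reads the timeout from a parsed `.swarm/config.json` object, falling back to the default.
function researchTimeoutMs(config: { research_needed_timeout_ms?: unknown }): number {
  const v = config.research_needed_timeout_ms;
  return typeof v === "number" && Number.isFinite(v) && v > 0 ? v : DEFAULT_RESEARCH_TIMEOUT_MS;
}

// Reclassifies research_needed items whose research has run for at least timeoutMs.
function reclassifyOnTimeout(
  items: ClarificationItem[],
  timeoutMs: number,
  now: number,
): ClarificationItem[] {
  return items.map((item): ClarificationItem => {
    if (item.classification !== "research_needed" || item.researchStartedAt === undefined) {
      return item; // not researching, or research not started yet
    }
    if (now - item.researchStartedAt < timeoutMs) {
      return item; // still within the research window
    }
    return {
      ...item,
      classification: "user_decision",
      note: `Research incomplete: timed out after ${timeoutMs} ms; surfacing to the user.`,
    };
  });
}
```

In a live funnel, `now` would typically be `Date.now()`; taking it as a parameter keeps the function deterministic and easy to test.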

package/.opencode/skills/clarify/SKILL.md CHANGED

@@ -38,7 +38,7 @@ There is NO hard cap on the internal inventory. Record every material uncertaint
 Classify each item as exactly one of:
 - `self_resolved`: answered from the user request, spec, plan, codebase reality check, `.swarm/context.md`, repo conventions, or an informed default. **If the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`.**
 - `critic_resolved`: sent to Critic Sounding Board and resolved by the critic.
-- `research_needed`: needs SME/explorer/domain lookup before user escalation.
+- `research_needed`: needs SME/explorer/domain lookup before user escalation. **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
 - `user_decision`: only the user can decide because it affects product scope, risk tolerance, policy, budget, UX, rollout, or destructive behavior.
 - `deferred_nonblocking`: useful follow-up detail that does not block a correct initial plan and can be explicitly recorded as an assumption or follow-up.
@@ -101,3 +101,9 @@ The critic may improve wording or confirm prior context, but these categories MU
 ### Assumptions Recording
 All items resolved in Stages 2-3 (self_resolved, critic_resolved, deferred_nonblocking) MUST be recorded as explicit assumptions in the spec, plan, or `.swarm/context.md`. Silently dropping resolved uncertainties is a protocol violation — every uncertainty that entered the funnel must have a recorded outcome.
+### Mechanical Enforcement of DROP Protection
+**Implementation Note:** The hard constraint against `DROP` on always-surface items (defined in Stage 3 of the clarification funnel) is currently enforced via skill instructions to the architect. A lightweight runtime enforcement mechanism is recommended: when processing the critic sounding board verdict response in `src/agents/critic.ts`, validate that any items tagged as "always-surface" do not receive `UNNECESSARY`/`DROP` verdicts. If a DROP verdict is encountered on an always-surface item, override it to `APPROVED`/`ASK_USER` at the code level rather than relying solely on prompt-based enforcement.
+This mechanical enforcement prevents the following failure mode: the architect prompt instructs the override, but due to parsing errors, context limits, or model behavior variance, the DROP verdict is mistakenly applied to an always-surface item and silently accepted. The validation should occur in the decision-packet assembly code (when building the final clarification packet to surface to the user) and should emit a warning log when an override is applied.
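A minimal sketch of the recommended code-level DROP guard follows, assuming the verdict names listed in the skill text. The `VerdictedItem` shape, the `alwaysSurface` tag, and `enforceDropProtection` are hypothetical. The real `SoundingBoardVerdict` in `src/agents/critic.ts` is not part of this diff and may be shaped differently.

```typescript
// Verdict names as listed in the skill text (assumed to be string literals).
type SoundingBoardVerdict = "UNNECESSARY" | "RESOLVE" | "REPHRASE" | "APPROVED";
type Action = "DROP" | "RESOLVE" | "REPHRASE" | "ASK_USER";

const VERDICT_TO_ACTION: Record<SoundingBoardVerdict, Action> = {
  UNNECESSARY: "DROP",
  RESOLVE: "RESOLVE",
  REPHRASE: "REPHRASE",
  APPROVED: "ASK_USER",
};

interface VerdictedItem {
  id: string;
  alwaysSurface: boolean; // hypothetical tag set when the item is inventoried
  verdict: SoundingBoardVerdict;
}

interface PacketEntry {
  id: string;
  verdict: SoundingBoardVerdict;
  action: Action;
  overridden: boolean;
}

// Runs during decision-packet assembly: always-surface items can never be dropped.
function enforceDropProtection(
  items: VerdictedItem[],
  warn: (msg: string) => void = console.warn,
): PacketEntry[] {
  return items.map((item): PacketEntry => {
    if (item.alwaysSurface && item.verdict === "UNNECESSARY") {
      warn(`DROP override: always-surface item ${item.id} forced to APPROVED/ASK_USER`);
      return { id: item.id, verdict: "APPROVED", action: "ASK_USER", overridden: true };
    }
    return {
      id: item.id,
      verdict: item.verdict,
      action: VERDICT_TO_ACTION[item.verdict],
      overridden: false,
    };
  });
}
```

Injecting `warn` keeps the override observable in tests while still defaulting to `console.warn` for the warning log the note asks for.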

package/.opencode/skills/clarify-spec/SKILL.md CHANGED

@@ -50,7 +50,7 @@ CLARIFY-SPEC handles **already-surfaced** `[NEEDS CLARIFICATION]` markers and sp
 However, before surfacing each marker question to the user, CLARIFY-SPEC MUST:
 1. **Consult `critic_sounding_board`** with the candidate marker question and surrounding spec context to check whether the question wording can be improved or the item can be resolved from existing context.
-2. **Apply the overconfidence guard:** If the critic supplies a `RESOLVE` verdict with a default answer, but that default is not directly supported by user request, spec, or recorded context, classify the item as `user_decision` rather than `self_resolved`.
+2. **Apply the Overconfidence guard:** If the critic supplies a `RESOLVE` verdict with a default answer, but that default is not directly supported by user request, spec, or recorded context, classify the item as `user_decision` rather than `self_resolved`.
 3. **Apply always-surface protection:** If the marker belongs to an always-surface category (scope boundaries, destructive behavior, security/privacy, backward compatibility, breaking API changes, new dependencies, deprecations, cross-platform impact, cost/performance tradeoffs, user-visible UX, rollout strategy, QA gates), the item MUST NOT receive `UNNECESSARY`/`DROP` from the critic — override to `APPROVED`/`ASK_USER`.
 Critic verdict mapping (see `src/agents/critic.ts` `SoundingBoardVerdict`): `UNNECESSARY`→DROP, `RESOLVE`→RESOLVE, `REPHRASE`→REPHRASE, `APPROVED`→ASK_USER.
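The overconfidence guard reduces to a provenance check on the proposed default. This sketch assumes a hypothetical `supportedBy` list recorded alongside each critic `RESOLVE` default. Only the three sources named in the skill (user request, spec, recorded context) count as direct support; an informed default on its own does not.

```typescript
type GuardOutcome = "self_resolved" | "user_decision";
type Source = "user_request" | "spec" | "recorded_context" | "informed_default";

interface ProposedDefault {
  value: string;
  supportedBy: Source[]; // hypothetical provenance recorded with the critic's RESOLVE
}

// Sources the skill accepts as direct support for a default.
const DIRECT_SOURCES: ReadonlySet<Source> = new Set<Source>([
  "user_request",
  "spec",
  "recorded_context",
]);

// No default, or a default without direct support, must go to the user.
function applyOverconfidenceGuard(proposed: ProposedDefault | null): GuardOutcome {
  if (proposed === null) return "user_decision";
  return proposed.supportedBy.some((s) => DIRECT_SOURCES.has(s))
    ? "self_resolved"
    : "user_decision";
}
```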

package/.opencode/skills/issue-ingest/SKILL.md CHANGED

@@ -45,8 +45,9 @@ Flags parsed from signal:
    - WHAT users need and WHY — never HOW to implement
    - FR-### / SC-### numbering, Given/When/Then scenarios
    - No technology stack, APIs, or code structure
-    - `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers)
-2. Cross-reference the spec against the issue's expected behavior to ensure alignment.
+     - `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers)
+     - **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
+ 2. Cross-reference the spec against the issue's expected behavior to ensure alignment.
 3. If the issue is a bug: spec must describe the correct behavior, not the broken behavior.
 4. If the issue is a feature: spec must describe the user-facing outcome, not the implementation.
 5. QA GATE SELECTION: Ask user which QA gates to enable (same dialogue as MODE: SPECIFY). Write to `.swarm/context.md` under `## Pending QA Gate Selection`.

package/.opencode/skills/plan/SKILL.md CHANGED

@@ -81,7 +81,7 @@ Classify each item as exactly one of:
 - `self_resolved`: answered from the user request, spec, plan, codebase reality check, `.swarm/context.md`, repo conventions, or an informed default. **If the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`.**
 - `critic_resolved`: sent to Critic Sounding Board and resolved by the critic.
-- `research_needed`: needs SME/explorer/domain lookup before user escalation.
+- `research_needed`: needs SME/explorer/domain lookup before user escalation. **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
 - `user_decision`: only the user can decide because it affects product scope, risk tolerance, policy, budget, UX, rollout, or destructive behavior.
 - `deferred_nonblocking`: useful follow-up detail that does not block a correct initial plan and can be explicitly recorded as an assumption or follow-up.
@@ -152,6 +152,12 @@ All items resolved in Stages 2-3 (self_resolved, critic_resolved, deferred_nonbl
 The plan generated by `save_plan` MUST include explicit assumptions and remaining unresolved decisions in the task descriptions or acceptance criteria — not silently omit them.
+#### Mechanical Enforcement of DROP Protection
+**Implementation Note:** The hard constraint against `DROP` on always-surface items (Stage 3 of the clarification funnel) is currently enforced via skill instructions to the architect. A lightweight runtime enforcement mechanism is recommended: when processing the critic sounding board verdict response in `src/agents/critic.ts`, validate that any items tagged as "always-surface" do not receive `UNNECESSARY`/`DROP` verdicts. If a DROP verdict is encountered on an always-surface item, override it to `APPROVED`/`ASK_USER` at the code level rather than relying solely on prompt-based enforcement.
+This mechanical enforcement prevents the following failure mode: the architect prompt instructs the override, but due to parsing errors, context limits, or model behavior variance, the DROP verdict is mistakenly applied to an always-surface item and silently accepted. The validation should occur in the decision-packet assembly code (when building the final clarification packet to surface to the user) and should emit a warning log when an override is applied.
 Use the `save_plan` tool to create the implementation plan. Required parameters:
 - `title`: The real project name from the spec (NOT a placeholder like [Project])
 - `swarm_id`: The swarm identifier (e.g. "mega", "local", "paid")

package/.opencode/skills/specify/SKILL.md CHANGED

@@ -28,8 +28,9 @@ Activates when: user asks to "specify", "define requirements", "write a spec", o
    - Success criteria numbered SC-001, SC-002… — measurable and technology-agnostic
    - Key entities if data is involved (no schema or field definitions — entity names only)
    - Edge cases and known failure modes
-    - `[NEEDS CLARIFICATION]` markers for items where uncertainty could change scope, security, or core behavior, BUT ONLY after running the clarification funnel: (1) inventory all material uncertainties without numeric cap, (2) classify each as self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking — **overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`, (3) consult critic_sounding_board with candidate items — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER, (4) record all resolved items as explicit assumptions in the spec, (5) use markers only for items that survive the funnel (ASK_USER or unresolved after critic consultation). Decision packet format: grouped by category, recommended defaults, blocking vs optional markers, impact of accepting default. Prefer informed defaults over asking
-5. Write the spec to `.swarm/spec.md`.
+    - `[NEEDS CLARIFICATION]` markers for items where uncertainty could change scope, security, or core behavior, BUT ONLY after running the clarification funnel: (1) inventory all material uncertainties without numeric cap, (2) classify each as self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`, (3) consult critic_sounding_board with candidate items — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER, (4) record all resolved items as explicit assumptions in the spec, (5) use markers only for items that survive the funnel (ASK_USER or unresolved after critic consultation). Decision packet format: grouped by category, recommended defaults, blocking vs optional markers, impact of accepting default. Prefer informed defaults over asking
+     - **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
+ 5. Write the spec to `.swarm/spec.md`.
 5b. **QA GATE SELECTION, PARALLEL CODERS, COMMIT FREQUENCY, AND AUTO_PROCEED (dialogue only).**
 Ask the user which QA gates to enable for this plan, how many parallel coders to use, the commit frequency, and auto_proceed -- do not select on their behalf. Present all four items together as one unified exchange.

package/.opencode/skills/swarm-pr-review/SKILL.md CHANGED

@@ -78,6 +78,8 @@ The orchestrator may:
 - determine scope,
 - build or request the context pack,
 - launch explorers and triggered micro-lanes,
+- extract candidates from lane artifacts via `parse_lane_candidates` or equivalent parser,
+- filter, group, and chunk candidates for reviewer dispatch,
 - route candidates to reviewers,
 - route reviewer-confirmed findings to critics,
 - group validated findings,
@@ -88,7 +90,8 @@ The orchestrator MUST NOT:
 - re-read a candidate's target code to decide if it is valid,
 - silently downgrade or discard an explorer candidate,
 - treat tool output as a confirmed finding,
-- report a finding that no reviewer validated.
+- report a finding that no reviewer validated,
+- classify or judge candidates based on preview text alone — always use the structured parser output.
 If the orchestrator catches itself validating code, it must stop and delegate validation to a reviewer subagent.
@@ -495,9 +498,43 @@ Launch all base lanes with `dispatch_lanes_async` when available. Pass the six l
 Before Phase 4 or synthesis, call `collect_lane_results` with `wait: true` for the base-lane batch and treat the collected `lane_results` as the join barrier. Missing, stale, cancelled, or failed base lanes are explicit review coverage gaps. If `dispatch_lanes_async` is unavailable, use blocking `dispatch_lanes`; if that is also unavailable, simulate isolated passes. Do not let one lane's conclusions bias another lane, and record unavailable deterministic dispatch in the validation gate.
-When any collected or blocking `lane_results[]` item has `output_ref`, treat `output` as a preview only. Call `retrieve_lane_output` and consume the full artifact before extracting candidates, deciding that a lane produced no candidates, or routing work to reviewers. If a lane has `output_truncated: true`, `output_degraded: true`, `transcript_incomplete: true`, or no usable `output_ref`, record an explicit coverage gap and re-dispatch a narrower lane or mark affected candidates/coverage UNVERIFIED; never infer candidate absence from a preview.
-**lane id uniqueness for parallel dispatches:** When re-dispatching failed or re-running explorer lanes, every `dispatch_lanes_async` or `dispatch_lanes` lane `id` MUST be unique within that dispatch batch and should include lane and attempt suffixes (e.g. `pr_review_explore_lane1_attempt2`). Never reuse an id in the same batch unless intentionally replacing that exact lane before dispatch.
+### Candidate extraction via parser
+After `collect_lane_results` returns for base lanes, process each lane result
+that carries an `output_ref`. The orchestrator MUST use the candidate parser
+rather than preview-text extraction:
+1. For each `output_ref` (or batched), call `parse_lane_candidates` (or the
+   internal `parseAndPersist` module function) with `output_ref` and `producer`
+   flags; the parser auto-detects the format family per row. The parser reads
+   the full artifact from disk (no preview truncation issue) and returns
+   structured `ParseResultWithSidecar` records.
+2. Filter the returned `candidates[]` array by `producer: "swarm-pr-review"` and
+   the relevant `row_format_family` (e.g., `base_explorer` for base lanes,
+   `micro_lane` for micro-lanes). Filtering happens on the parsed results, NOT
+   on the tool input.
+3. Group the filtered candidates into reviewer-sized chunks:
+   - by file area (group by the directory or module of the `file_line` field),
+   - by category (group by the `category` field),
+   - by count (target max 50 candidates per chunk; smaller chunks are fine).
+4. Dispatch reviewer lanes (one per chunk) with bounded in-context candidate
+   lists. Each reviewer lane receives only the candidates from its assigned
+   chunk.
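Steps 2 and 3 operate purely on parsed records, so they can be sketched as a small pure function. The `Candidate` interface keeps only fields visible in the dry-run example later in this file. `chunkCandidates`, `fileArea`, and the choice to group by directory and category together before splitting by count are assumptions: the skill lists the three grouping axes without fixing how they combine.

```typescript
// Subset of the parsed candidate record fields shown in the dry-run example.
interface Candidate {
  candidate_id: string;
  producer: string;
  row_format_family: string; // e.g. "base_explorer" or "micro_lane"
  category: string;
  file_line: string; // "path/to/file.ts:142"
}

// Directory part of a file_line value: "src/utils/cache.ts:142" -> "src/utils".
function fileArea(fileLine: string): string {
  const path = fileLine.replace(/:\d+$/, "");
  const slash = path.lastIndexOf("/");
  return slash === -1 ? "." : path.slice(0, slash);
}

function chunkCandidates(candidates: Candidate[], family: string, maxChunk: number = 50): Candidate[][] {
  // Step 2: filter the parsed results, not the tool input.
  const kept = candidates.filter(
    (c) => c.producer === "swarm-pr-review" && c.row_format_family === family,
  );
  // Step 3: group by file area and category (combined key, insertion order preserved).
  const groups = new Map<string, Candidate[]>();
  for (const c of kept) {
    const key = JSON.stringify([fileArea(c.file_line), c.category]);
    const group = groups.get(key);
    if (group) group.push(c);
    else groups.set(key, [c]);
  }
  // Step 3 (count): split each group into chunks of at most maxChunk candidates.
  const chunks: Candidate[][] = [];
  groups.forEach((group) => {
    for (let i = 0; i < group.length; i += maxChunk) {
      chunks.push(group.slice(i, i + maxChunk));
    }
  });
  return chunks;
}
```

Each returned chunk would back one reviewer lane. Grouping by file area alone would instead yield fewer, more mixed chunks.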
+If a lane has `output_degraded: true`, `transcript_incomplete: true`, or no usable `output_ref`, record an explicit
+coverage gap and re-dispatch a narrower lane or mark affected candidates
+UNVERIFIED. Never infer candidate absence from a preview.
+**Fallback convention:** If the parser is unavailable, the explorer MAY emit
+`[CANDIDATE]` rows in the lane output as a fallback convention (see the
+Explorer Prompt Template at the end of this skill), but the orchestrator
+SHOULD use the parser as the primary extraction mechanism.
+**lane id uniqueness for parallel dispatches:** When re-dispatching failed or
+re-running explorer lanes, every `dispatch_lanes_async` or `dispatch_lanes`
+lane `id` MUST be unique within that dispatch batch and should include lane and
+attempt suffixes (e.g., `pr_review_explore_lane1_attempt2`). Never reuse an id
+in the same batch unless intentionally replacing that exact lane before dispatch.
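One way to satisfy the uniqueness rule is to derive retry ids from the ids already used. The helpers below are hypothetical and only reproduce the `pr_review_explore_lane1_attempt2` naming pattern quoted in the skill; `dispatch_lanes_async` itself is not modeled.

```typescript
// Builds ids in the `pr_review_explore_lane1_attempt2` style quoted in the skill.
function laneId(lane: number, attempt: number, prefix: string = "pr_review_explore"): string {
  return `${prefix}_lane${lane}_attempt${attempt}`;
}

// Picks the next unused attempt number for a lane. Assumes `prefix` contains no
// regex metacharacters (true for the default).
function nextLaneId(lane: number, usedIds: string[], prefix: string = "pr_review_explore"): string {
  const pattern = new RegExp(`^${prefix}_lane${lane}_attempt(\\d+)$`);
  let maxAttempt = 0;
  for (const id of usedIds) {
    const m = pattern.exec(id);
    if (m) maxAttempt = Math.max(maxAttempt, Number(m[1]));
  }
  return laneId(lane, maxAttempt + 1, prefix);
}

// Returns every id that occurs more than once in a single dispatch batch.
function duplicateLaneIds(batchIds: string[]): string[] {
  const counts: Record<string, number> = {};
  for (const id of batchIds) counts[id] = (counts[id] ?? 0) + 1;
  return Object.keys(counts).filter((id) => counts[id] > 1);
}
```

Checking `duplicateLaneIds` on the assembled batch just before dispatch catches accidental reuse; an empty result means every id is unique.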
 Explorers optimize for recall. Over-reporting is expected. Explorers produce candidates only.
@@ -507,7 +544,7 @@ Explorers optimize for recall. Over-reporting is expected. Explorers produce can
 | Lane 2: Security and trust boundaries | Injection, authz/authn bypass, SSRF, path traversal, secret exposure, unsafe deserialization, prompt injection | untrusted input sources, sanitization, credential handling, permission boundary, private network access, output escaping |
 | Lane 3: Dependencies and deployment safety | Import changes, version bumps, lockfile drift, breaking APIs, package scripts, runtime assumptions | lockfile consistency, new transitive deps, Node/Bun/runtime compatibility, platform assumptions, license red flags |
 | Lane 4: Docs, intent, and drift | PR claims vs implementation, docs mismatch, migration/changelog gaps, stale examples | obligation mapping, changed behavior not documented, docs promising behavior not implemented |
-| Lane 5: Tests and falsifiability | Weak assertions, missing edge tests, flaky patterns, mock leakage, fixture drift | assertion strength, tautology patterns (`expect(true).toBe(true)`, `expect(res).toBeDefined()` without further checks, `assertDoesNotThrow` wrapping trivial code), negative paths, isolation, deterministic timing, cross-platform path coverage |
+| Lane 5: Tests and falsifiability | Weak assertions, missing edge tests, flaky patterns, mock leakage, fixture drift | assertion strength, tautology patterns (`expect(true).toBe(true)`, `expect(res).toBeDefined()` without further checks), `assertDoesNotThrow` wrapping trivial code), negative paths, isolation, deterministic timing, cross-platform path coverage |
 | Lane 6: Performance and architecture | Complexity regressions, memory leaks, over-coupling, inefficient graph scans, global mutable state | algorithmic deltas, caching, resource lifecycle, state ownership, architectural boundary violations |
 ### Explorer context contract
@@ -523,12 +560,19 @@ Every explorer must inspect or explicitly mark unavailable:
 7. relevant Swarm knowledge/evidence entries, if present.
 8. the commit range to analyze (`base_ref..head_ref`),
-Explorer output format:
+### Explorer output format
+Explorers emit structured candidate records. The parser reads the full lane
+artifact and extracts these records. The canonical record shape is:
 ```text
 [CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence: LOW/MEDIUM/HIGH
 ```
+The parser normalizes this into a structured `candidates[]` array. If the
+parser is unavailable, the explorer MAY emit the `[CANDIDATE]` row format
+directly in the lane output as a fallback convention.
 Explorers must not use `CONFIRMED`, `DISPROVED`, or `PRE_EXISTING`.
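When the parser is unavailable, fallback `[CANDIDATE]` rows have to be split by hand. This sketch assumes the ten pipe-separated fields appear exactly as in the format line above and that no field contains a literal `|`. The format as written defines no escaping rule, so a real parser would need one. `parseCandidateRow` is a hypothetical name, not the package's `parse_lane_candidates` tool.

```typescript
type Confidence = "LOW" | "MEDIUM" | "HIGH";

interface CandidateRow {
  candidate_id: string;
  lane: string;
  severity: string;
  category: string;
  file_line: string;
  claim: string;
  evidence_summary: string;
  impact_context: string;
  confidence: Confidence;
}

// Parses one fallback row; returns null for non-candidate or malformed rows.
function parseCandidateRow(line: string): CandidateRow | null {
  const parts = line.split("|").map((p) => p.trim());
  if (parts.length !== 10 || parts[0] !== "[CANDIDATE]") return null;
  const conf = /^confidence:\s*(LOW|MEDIUM|HIGH)$/i.exec(parts[9]);
  if (!conf) return null; // last field must be "confidence: LOW/MEDIUM/HIGH"
  const [, candidate_id, lane, severity, category, file_line, claim, evidence_summary, impact_context] = parts;
  return {
    candidate_id,
    lane,
    severity,
    category,
    file_line,
    claim,
    evidence_summary,
    impact_context,
    confidence: conf[1].toUpperCase() as Confidence,
  };
}
```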
 ---
@@ -537,7 +581,7 @@ Explorers must not use `CONFIRMED`, `DISPROVED`, or `PRE_EXISTING`.
 After `collect_lane_results` returns for base lanes, inspect the context pack risk triggers. Launch focused micro-lanes for triggered categories only, using `dispatch_lanes_async` again when more than one read-only micro-lane is needed. Collect every micro-lane batch with `wait: true` before reviewer classification. Do not launch irrelevant micro-lanes.
-Apply the same `output_ref` rule to micro-lanes: retrieve full output before candidate routing, and treat degraded or incomplete lane artifacts as UNVERIFIED coverage rather than as clean negative evidence.
+Apply the same parser-based extraction to micro-lanes: call `parse_lane_candidates` on each micro-lane `output_ref` (filter the returned `candidates[]` array by `row_format_family === "micro_lane"` after parsing), and treat degraded or incomplete lane artifacts as UNVERIFIED coverage rather than as clean negative evidence.
 Each micro-lane receives:
@@ -547,7 +591,8 @@ Each micro-lane receives:
 - relevant deterministic signals,
 - related historical knowledge with quarantine/staleness status,
 - expected invariants,
-- output format as `[CANDIDATE]` only.
+- structured candidate output (parser-extracted). If the parser is unavailable,
+  the micro-lane MAY emit `[CANDIDATE]` rows as a fallback convention.
 ### Swarm plugin risk trigger map
@@ -596,7 +641,12 @@ Verifier output is advisory until incorporated by the independent reviewer or cr
 ## Phase 6: Independent Reviewer Confirmation
-Route candidates to reviewer subagents. The reviewer must re-read the candidate's file:line evidence and relevant context pack entries directly.
+Route candidates to reviewer subagents. The orchestrator routes candidates
+in bounded chunks produced by the parser-based extraction in Phase 3-4. Each
+reviewer lane receives a bounded list of candidates from a single chunk — by
+file area, category, or count — not the full candidate set. The reviewer must
+re-read the candidate's file:line evidence and relevant context pack entries
+directly.
 ### Noise budget and universal validation
@@ -813,6 +863,245 @@ Update the verdict only after re-verifying all previously blocking findings.
 ---
+## Dry-Run: Parser-Based Candidate Extraction
+This section demonstrates the new parser-based extraction path end-to-end
+using synthetic data. It is concrete enough to implement the same pattern in
+another skill.
+### Scenario
+A PR review has dispatched six base explorer lanes via `dispatch_lanes_async`.
+The batch completed and `collect_lane_results` returned:
+```json
+{
+  "batch_id": "batch-a1b2c3",
+  "lane_results": [
+    {
+      "lane_id": "pr_review_lane1_correctness",
+      "status": "completed",
+      "output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+      "output_degraded": false
+    },
+    {
+      "lane_id": "pr_review_lane2_security",
+      "status": "completed",
+      "output_ref": ".swarm/lane-results/batch-a1b2c3/lane-2/out-def456.json",
+      "output_degraded": false
+    }
+  ]
+}
+```
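Before calling the parser, the orchestrator has to decide which lane results are parseable and which are coverage gaps. The sketch below applies the candidate-extraction rules from earlier in this skill to records shaped like the example above. Two choices are interpretations rather than documented behavior: any status other than `completed` counts as a gap, and degraded or incomplete artifacts are still parsed, with their candidates flagged UNVERIFIED.

```typescript
// Fields from the collect_lane_results example plus the flags named in the gap rule.
interface LaneResult {
  lane_id: string;
  status: string;
  output_ref?: string;
  output_degraded?: boolean;
  transcript_incomplete?: boolean;
}

interface Triage {
  toParse: { lane_id: string; output_ref: string; unverified: boolean }[];
  coverageGaps: { lane_id: string; reason: string }[];
}

function triageLaneResults(results: LaneResult[]): Triage {
  const triage: Triage = { toParse: [], coverageGaps: [] };
  for (const r of results) {
    if (r.status !== "completed") {
      triage.coverageGaps.push({ lane_id: r.lane_id, reason: `status: ${r.status}` });
      continue;
    }
    if (!r.output_ref) {
      triage.coverageGaps.push({ lane_id: r.lane_id, reason: "no usable output_ref" });
      continue;
    }
    // Partial artifacts are a gap, but their candidates can still be parsed as UNVERIFIED.
    const partial = r.output_degraded === true || r.transcript_incomplete === true;
    if (partial) {
      triage.coverageGaps.push({ lane_id: r.lane_id, reason: "degraded or incomplete artifact" });
    }
    triage.toParse.push({ lane_id: r.lane_id, output_ref: r.output_ref, unverified: partial });
  }
  return triage;
}
```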
+### Step 1 — Call the parser
+The orchestrator calls `parse_lane_candidates` for each `output_ref`:
+```json
+{
+  "tool": "parse_lane_candidates",
+  "arguments": {
+    "output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+    "producer": "swarm-pr-review"
+  }
+}
+```
+### Step 2 — Structured response
+The parser returns a `ParseResultWithSidecar`. On success, `error` and `error_code` are absent:
+```json
+{
+  "candidates": [
+    {
+      "record_type": "candidate",
+      "row_format_family": "base_explorer",
+      "row_format_version": 1,
+      "record_version": { "major": 1, "minor": 0 },
+      "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+      "source_batch_id": "B-2025-06-22-001",
+      "source_lane_id": "explorer-1",
+      "source_agent": "paid_explorer",
+      "source_digest": "sha256:abc123def456...",
+      "extracted_from_partial_source": false,
+      "sessionId": "ses_01HXYZ...",
+      "parentSessionId": "ses_01HABC...",
+      "producer": "swarm-pr-review",
+      "candidate_id": "C-001",
+      "lane": "Lane 1: Correctness and edge cases",
+      "micro_lane": null,
+      "severity": "HIGH",
+      "category": "null-safety",
+      "file_line": "src/utils/cache.ts:142",
+      "claim": "Uncached getter may return undefined on cold start",
+      "evidence_summary": "The `getCached` function returns `cache[key]` without a fallback when the cache is empty.",
+      "impact_context": "Downstream callers in `src/handlers/*.ts` expect a defined value and call `.toString()` directly.",
+      "invariant_violated": null,
+      "confidence": "HIGH"
+    },
+    {
+      "record_type": "candidate",
+      "row_format_family": "base_explorer",
+      "row_format_version": 1,
+      "record_version": { "major": 1, "minor": 0 },
+      "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+      "source_batch_id": "B-2025-06-22-001",
+      "source_lane_id": "explorer-1",
+      "source_agent": "paid_explorer",
+      "source_digest": "sha256:abc123def456...",
+      "extracted_from_partial_source": false,
+      "sessionId": "ses_01HXYZ...",
+      "parentSessionId": "ses_01HABC...",
+      "producer": "swarm-pr-review",
+      "candidate_id": "C-002",
+      "lane": "Lane 1: Correctness and edge cases",
+      "micro_lane": null,
+      "severity": "MEDIUM",
+      "category": "async-ordering",
+      "file_line": "src/services/queue.ts:88",
+      "claim": "Race between `drain` and `processNext` may drop items",
+      "evidence_summary": "`drain` sets `active = false` before awaiting `processNext`, which also checks `active`.",
+      "impact_context": "Items submitted during the drain window are silently dropped.",
+      "invariant_violated": null,
+      "confidence": "MEDIUM"
+    }
+  ],
+  "invocation_envelope": {
+    "record_type": "invocation",
+    "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+    "source_batch_id": "B-2025-06-22-001",
+    "source_lane_id": "explorer-1",
+    "source_agent": "paid_explorer",
+    "source_digest": "sha256:abc123def456...",
+    "row_format_version": 1,
+    "record_version": { "major": 1, "minor": 0 },
+    "sessionId": "ses_01HXYZ...",
+    "parentSessionId": "ses_01HABC...",
+    "producer": "swarm-pr-review",
+    "produced_at": "2025-06-22T14:30:00.000Z",
+    "format_families_detected": ["base_explorer"],
+    "candidate_count": 2,
+    "parse_errors": 2,
+    "malformed_rows": 0
+  },
+  "diagnostics": {
+    "candidate_count": 2,
+    "parse_errors": 2,
+    "parse_error_details": [
+      {
+        "row_index": 0,
+        "field": "row",
+        "message": "Both format-family discriminators present; defaulting to base_explorer"
+      },
+      {
+        "row_index": 1,
+        "field": "row",
+        "message": "Both format-family discriminators present; defaulting to base_explorer"
+      }
+    ],
+    "malformed_rows": 0,
+    "duplicate_id_count": 0,
+    "duplicate_id_warnings": [],
+    "degraded_source_count": 0,
+    "incomplete_source_count": 0,
+    "format_families_detected": ["base_explorer"]
+  }
+}
+```
+> **Note**: `parse_errors: 2` reflects FR-017/SC-017 position-based detection. When a `[CANDIDATE]` row has both `evidence_summary` and `impact_context` populated, the parser adds one `parse_error_details` entry for that row with `field: "row"` and `message: "Both format-family discriminators present; defaulting to base_explorer"`. This is documented behavior, not a parser bug. To keep the row format and still get `parse_errors: 0`, leave one of the two fields empty. Alternatively, emit structured JSON candidate records, which avoid the warning entirely.
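The warning condition described in the note can be sketched as follows. This is an illustrative TypeScript sketch, not the parser's code. `discriminatorWarning` is a hypothetical name, and treating whitespace-only fields as empty is an assumption. The sketch covers only the documented case, where both fields are populated. It makes no claim about how a single populated field maps to a format family.

```typescript
// Entry shape taken from the parse_error_details example above.
interface ParseErrorDetail {
  row_index: number;
  field: string;
  message: string;
}

// The two discriminator fields of a [CANDIDATE] row.
interface RowFields {
  evidence_summary: string;
  impact_context: string;
}

const BOTH_PRESENT =
  "Both format-family discriminators present; defaulting to base_explorer";

// Returns a warning entry when both fields are populated, otherwise null.
// Assumption: whitespace-only values count as empty.
function discriminatorWarning(rowIndex: number, row: RowFields): ParseErrorDetail | null {
  const hasEvidence = row.evidence_summary.trim() !== "";
  const hasImpact = row.impact_context.trim() !== "";
  return hasEvidence && hasImpact
    ? { row_index: rowIndex, field: "row", message: BOTH_PRESENT }
    : null;
}
```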
+On refusal (e.g. `output_ref` does not exist), `error` and `error_code` are present and `candidates` is `[]`. `invocation_envelope` and `diagnostics` are still returned, with empty source fields and zero counts, for traceability:
+```json
+{
+  "error": "Artifact reference not found in store",
+  "error_code": "ref-not-found",
+  "candidates": [],
+  "invocation_envelope": {
+    "record_type": "invocation",
+    "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/missing.json",
+    "source_batch_id": "",
+    "source_lane_id": "",
+    "source_agent": "",
+    "source_digest": "",
+    "row_format_version": 1,
+    "record_version": { "major": 1, "minor": 0 },
+    "produced_at": "2025-06-22T14:30:00.000Z",
+    "format_families_detected": [],
+    "candidate_count": 0,
+    "parse_errors": 0,
+    "malformed_rows": 0
+  },
+  "diagnostics": {
+    "candidate_count": 0,
+    "parse_errors": 0,
+    "parse_error_details": [],
+    "malformed_rows": 0,
+    "duplicate_id_count": 0,
+    "duplicate_id_warnings": [],
+    "degraded_source_count": 0,
+    "incomplete_source_count": 0,
+    "format_families_detected": []
+  }
+}
+```
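An orchestrator can branch on the two response shapes and sanity-check the counters before it uses the candidates. The sketch below is hypothetical. `ParserResponse` is a reduced stand-in for `ParseResultWithSidecar`. The rule that the envelope, the diagnostics, and the `candidates` array agree on counts is an assumption drawn from the two examples, not a documented guarantee.

```typescript
// Reduced, hypothetical stand-in for ParseResultWithSidecar (field names from the examples).
interface ParseErrorDetail {
  row_index: number;
  field: string;
  message: string;
}

interface Counts {
  candidate_count: number;
  parse_errors: number;
  malformed_rows: number;
}

interface ParserResponse {
  error?: string;
  error_code?: string;
  candidates: unknown[];
  invocation_envelope: Counts;
  diagnostics: Counts & { parse_error_details: ParseErrorDetail[] };
}

// A response is a refusal when both `error` and `error_code` are present.
function isRefusal(r: ParserResponse): boolean {
  return typeof r.error === "string" && typeof r.error_code === "string";
}

// Assumed invariant (it holds in both examples): all counters agree.
function countMismatches(r: ParserResponse): string[] {
  const out: string[] = [];
  const n = r.candidates.length;
  const env = r.invocation_envelope;
  const diag = r.diagnostics;
  if (env.candidate_count !== n) out.push("invocation_envelope.candidate_count");
  if (diag.candidate_count !== n) out.push("diagnostics.candidate_count");
  if (diag.parse_errors !== diag.parse_error_details.length) out.push("diagnostics.parse_errors");
  if (env.parse_errors !== diag.parse_errors) out.push("invocation_envelope.parse_errors");
  if (env.malformed_rows !== diag.malformed_rows) out.push("invocation_envelope.malformed_rows");
  return out;
}

// Hypothetical helper producing one log line per parser call.
function summarize(r: ParserResponse): string {
  if (isRefusal(r)) return `refused (${r.error_code}): ${r.error}`;
  const problems = countMismatches(r);
  return problems.length === 0
    ? `ok: ${r.candidates.length} candidate(s), ${r.diagnostics.parse_errors} parse error(s)`
    : `inconsistent: ${problems.join(", ")}`;
}
```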
+### Step 3 — Filter and group
+The orchestrator filters the returned `candidates[]` array by `producer: "swarm-pr-review"` and `row_format_family` (e.g. `base_explorer` or `micro_lane`), then groups
+the candidates. In this synthetic example, the two candidates above are grouped
+by file area:
+- **Chunk A — `src/utils/`** (1 candidate): C-001
+- **Chunk B — `src/services/`** (1 candidate): C-002
+If there were more candidates, the orchestrator would also group by category
+(e.g., `null-safety`, `async-ordering`) and cap each chunk at 50 candidates.
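Here is a sketch of Step 3 in TypeScript. It assumes grouping by the directory of `file_line` and applies the 50-candidate cap described above. `Candidate`, `fileArea`, and `buildChunks` are hypothetical names. The secondary grouping by category is left out to keep the example short.

```typescript
// Minimal candidate shape: only the fields this sketch reads.
interface Candidate {
  candidate_id: string;
  producer: string;
  row_format_family: string;
  file_line: string;
}

interface Chunk {
  area: string;
  candidates: Candidate[];
}

// Cap taken from the text above; treat it as configurable.
const MAX_CHUNK_SIZE = 50;

// "src/utils/cache.ts:142" -> "src/utils/"; files at the repo root map to "./".
function fileArea(fileLine: string): string {
  const path = fileLine.replace(/:\d+$/, "");
  const slash = path.lastIndexOf("/");
  return slash >= 0 ? path.slice(0, slash + 1) : "./";
}

function buildChunks(all: Candidate[], producer: string, families: string[]): Chunk[] {
  const byArea = new Map<string, Candidate[]>();
  for (const c of all) {
    // 1. Filter by producer and row format family.
    if (c.producer !== producer || !families.includes(c.row_format_family)) continue;
    // 2. Group by the directory part of file_line.
    const area = fileArea(c.file_line);
    const bucket = byArea.get(area) ?? [];
    bucket.push(c);
    byArea.set(area, bucket);
  }
  // 3. Split any group larger than the cap into consecutive chunks.
  const chunks: Chunk[] = [];
  byArea.forEach((bucket, area) => {
    for (let i = 0; i < bucket.length; i += MAX_CHUNK_SIZE) {
      chunks.push({ area, candidates: bucket.slice(i, i + MAX_CHUNK_SIZE) });
    }
  });
  return chunks;
}
```

Groups are split into consecutive slices. So a directory with 120 candidates would produce chunks of 50, 50, and 20, each dispatched to its own reviewer lane.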
+### Step 4 — Dispatch reviewer lanes
+The orchestrator dispatches one reviewer lane per chunk:
+```text
+You are the independent reviewer. Validate only the candidates assigned below.
+Do not search for new issues except where needed to validate reachability or
+mitigation. Do not trust explorer severity.
+Context pack summary:
+- scope: ...
+- obligations: ...
+- impact cone: ...
+- deterministic signals: ...
+- relevant Swarm artifacts / knowledge: ...
+- base_ref: <commit SHA of base branch>
+- head_ref: <commit SHA of PR head branch>
+Candidates (Chunk A — src/utils/):
+- C-001 | HIGH | null-safety | src/utils/cache.ts:142 | Uncached getter may return undefined on cold start
+For each candidate, return:
+[REVIEWED] | candidate_id | CONFIRMED/DISPROVED/UNVERIFIED/PRE_EXISTING | evidence_type | final_severity | introduced_by_pr | file:line | rationale | falsification_probe | reviewer_id
+You must check caller context, reachability, schema/middleware/framework mitigations, state-machine constraints, test coverage, PR-introducedness, and severity.
+IMPORTANT: If a finding claims behavior is "new" or "introduced by the PR", you MUST read the equivalent code on the base branch (git show <base_ref>:<file>) to verify it was not present before. A reviewer claim of "this is new" is invalid without base-branch evidence. Do not compare the new code to an idealized baseline — compare it to what actually existed on the base branch at the time of the PR.
+```
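Reviewer output in the `[REVIEWED]` format can be parsed with a small pipe-splitting function. This is a sketch under two assumptions. First, no cell contains a literal `|`. Second, `evidence_type` and `introduced_by_pr` stay plain strings, because their allowed values are not specified here. `parseReviewedRow` is a hypothetical name, and the field values in the usage check are invented.

```typescript
type Verdict = "CONFIRMED" | "DISPROVED" | "UNVERIFIED" | "PRE_EXISTING";
const VERDICTS: readonly Verdict[] = ["CONFIRMED", "DISPROVED", "UNVERIFIED", "PRE_EXISTING"];

function isVerdict(s: string): s is Verdict {
  return (VERDICTS as readonly string[]).includes(s);
}

interface ReviewedRow {
  candidate_id: string;
  verdict: Verdict;
  evidence_type: string;
  final_severity: string;
  introduced_by_pr: string;
  file_line: string;
  rationale: string;
  falsification_probe: string;
  reviewer_id: string;
}

// Returns null for anything that is not a well-formed [REVIEWED] row
// (tag plus nine fields, with a known verdict).
function parseReviewedRow(line: string): ReviewedRow | null {
  const cells = line.split("|").map((c) => c.trim());
  if (cells.length !== 10 || cells[0] !== "[REVIEWED]") return null;
  const [, candidate_id, verdict, evidence_type, final_severity, introduced_by_pr,
    file_line, rationale, falsification_probe, reviewer_id] = cells;
  if (!isVerdict(verdict)) return null;
  return {
    candidate_id, verdict, evidence_type, final_severity, introduced_by_pr,
    file_line, rationale, falsification_probe, reviewer_id,
  };
}
```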
+### Key invariants
+- The parser reads the **full artifact**, not a preview. Truncation in the
+  `dispatch_lanes` preview does not affect candidate extraction.
+- The orchestrator never classifies candidates — it only filters, groups, and
+  routes them.
+- Each reviewer receives a bounded chunk. A chunk with more than 50 candidates
+  is split before dispatch.
+- The `invocation_envelope` in the parser response provides audit provenance
+  for every extracted candidate.
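One way to use the envelope for auditing is to check that every candidate carries the same provenance as the invocation that produced it. This sketch assumes that all candidates from one `parse_lane_candidates` call share the envelope's `source_*` values, as they do in the Step 2 example. `provenanceMismatches` is hypothetical.

```typescript
// Provenance fields that appear on both the envelope and each candidate in Step 2.
const PROVENANCE_KEYS = [
  "source_output_ref",
  "source_batch_id",
  "source_lane_id",
  "source_agent",
  "source_digest",
] as const;

type ProvenanceKey = (typeof PROVENANCE_KEYS)[number];
type Provenance = Record<ProvenanceKey, string>;

// Lists "candidate_id.field" for every provenance field that differs from the envelope.
function provenanceMismatches(
  envelope: Provenance,
  candidates: Array<Provenance & { candidate_id: string }>,
): string[] {
  const mismatches: string[] = [];
  for (const c of candidates) {
    for (const key of PROVENANCE_KEYS) {
      if (c[key] !== envelope[key]) mismatches.push(`${c.candidate_id}.${key}`);
    }
  }
  return mismatches;
}
```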
+---
 # Council Mode Workflow
 Council mode is opt-in only and adversarial.
@@ -1116,4 +1405,10 @@ Return:
 [CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence
 ```
+The orchestrator extracts candidates from the full lane artifact via
+`parse_lane_candidates` as the primary mechanism. The `[CANDIDATE]` row
+format above is a fallback convention for environments where the parser is
+unavailable. Explorers should still emit structured records regardless of
+whether the parser is present.
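If the parser is unavailable, the `[CANDIDATE]` rows can be pulled out of the full artifact text directly. Below is a minimal TypeScript sketch. It assumes one row per line and no literal `|` inside any cell. `parseCandidateRows` is a hypothetical helper. It returns raw strings without the provenance fields that `parse_lane_candidates` adds.

```typescript
// The nine fields of the [CANDIDATE] row format shown above.
interface CandidateRow {
  candidate_id: string;
  lane: string;
  severity: string;
  category: string;
  file_line: string;
  claim: string;
  evidence_summary: string;
  impact_context: string;
  confidence: string;
}

// Scans the whole artifact text (not a preview) and keeps only well-formed rows.
function parseCandidateRows(text: string): CandidateRow[] {
  const rows: CandidateRow[] = [];
  for (const raw of text.split("\n")) {
    const cells = raw.split("|").map((c) => c.trim());
    // Tag plus nine fields; anything else (prose, headers, malformed rows) is skipped.
    if (cells.length !== 10 || cells[0] !== "[CANDIDATE]") continue;
    const [, candidate_id, lane, severity, category, file_line, claim,
      evidence_summary, impact_context, confidence] = cells;
    rows.push({
      candidate_id, lane, severity, category, file_line, claim,
      evidence_summary, impact_context, confidence,
    });
  }
  return rows;
}
```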
 Do not let speed degrade validation quality.

package/README.md CHANGED

@@ -800,6 +800,7 @@ Every candidate passes a 3-gate pipeline before entering quarantine:
 | mutation_test | Applies LLM-generated mutation patches to source files and runs tests to measure kill rate; verdict is pass/warn/fail based on configurable thresholds; used by the mutation_test gate (opt-in, off by default) |
 | generate_mutants | Architect-only: generates LLM-based mutation patches (5–10 per function across 6 types: off-by-one, null substitution, operator swap, guard removal, branch swap, side-effect deletion) for direct consumption by the mutation_test tool; returns SKIP verdict on LLM failure rather than throwing |
 | write_mutation_evidence | Architect-only: writes mutation gate results atomically to `.swarm/evidence/{phase}/mutation-gate.json`; accepts verdict (PASS/WARN/FAIL/SKIP), kill rate metrics, and optional survived mutant details; normalizes uppercase-to-lowercase before persisting |
+| parse_lane_candidates | Architect-only: parses `[CANDIDATE]` rows from a `dispatch_lanes` or `collect_lane_results` artifact by `output_ref`; produces structured records with provenance and optional sidecar JSONL persistence; returns `ParseResultWithSidecar` on success or `{ error, error_code, candidates: [] }` on refusal |
 | git_blame | Per-line git blame metadata (sha, author, date, summary) via `git blame --porcelain`; supports optional line range filtering |
 | diff | Structured git diff with contract change detection; supports `summaryOnly` mode returning file list with additions/deletions counts |
 | suggest_patch | Reviewer-safe structured patch suggestion; supports `format` parameter ('json' or 'unified') where unified outputs valid unified diff with `diff --git` headers, hunks, and context |