opencode-swarm 7.87.3 → 7.88.1

Files changed (32)
  1. package/.opencode/skills/brainstorm/SKILL.md +2 -1
  2. package/.opencode/skills/clarify/SKILL.md +7 -1
  3. package/.opencode/skills/clarify-spec/SKILL.md +1 -1
  4. package/.opencode/skills/issue-ingest/SKILL.md +3 -2
  5. package/.opencode/skills/plan/SKILL.md +7 -1
  6. package/.opencode/skills/specify/SKILL.md +3 -2
  7. package/.opencode/skills/swarm-pr-review/SKILL.md +304 -9
  8. package/README.md +1 -0
  9. package/dist/background/candidate-parser.d.ts +189 -0
  10. package/dist/background/candidate-sidecar-store.d.ts +56 -0
  11. package/dist/cli/{config-doctor-6h64pn8n.js → config-doctor-jzbgpbdh.js} +2 -2
  12. package/dist/cli/{guardrail-explain-2q9myk7c.js → guardrail-explain-995zavv8.js} +5 -5
  13. package/dist/cli/{guardrail-log-eegabqcp.js → guardrail-log-c7egm5km.js} +3 -3
  14. package/dist/cli/{index-q9h0wb04.js → index-0asbrmdx.js} +4 -0
  15. package/dist/cli/{index-kz1bmebr.js → index-4td9ef53.js} +523 -229
  16. package/dist/cli/{index-1cb4wxnm.js → index-819xp49y.js} +1 -1
  17. package/dist/cli/{index-5hvbw5xh.js → index-g00qm2gf.js} +1 -1
  18. package/dist/cli/{index-r3f47swm.js → index-sr7g2msm.js} +6 -6
  19. package/dist/cli/{index-amwa268r.js → index-tt5aehrb.js} +2 -2
  20. package/dist/cli/{index-5vpe6vq9.js → index-vjsr9bqt.js} +1 -1
  21. package/dist/cli/index.js +4 -4
  22. package/dist/cli/{schema-84146tvk.js → schema-vb6jkxgg.js} +1 -1
  23. package/dist/index.js +2114 -991
  24. package/dist/memory/config.d.ts +1 -0
  25. package/dist/memory/gateway.d.ts +1 -0
  26. package/dist/memory/provider-pool.d.ts +50 -0
  27. package/dist/memory/sqlite-provider.d.ts +3 -0
  28. package/dist/tools/index.d.ts +1 -0
  29. package/dist/tools/manifest.d.ts +1 -0
  30. package/dist/tools/parse-lane-candidates.d.ts +2 -0
  31. package/dist/tools/tool-metadata.d.ts +4 -0
  32. package/package.json +1 -1
@@ -52,7 +52,8 @@ If `council.general.enabled` is true in the resolved opencode-swarm config AND a
52
52
  - Exit with a design outline the user can skim in under two minutes.
53
53
 
54
54
  **Phase 5: SPEC WRITE + SELF-REVIEW (architect + reviewer).**
55
- - Generate `.swarm/spec.md` following the same SPEC CONTENT RULES that MODE: SPECIFY uses: WHAT/WHY only, no tech stack, no implementation details, FR-### / SC-### numbering, Given/When/Then scenarios, `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers).
55
+ - Generate `.swarm/spec.md` following the same SPEC CONTENT RULES that MODE: SPECIFY uses: WHAT/WHY only, no tech stack, no implementation details, FR-### / SC-### numbering, Given/When/Then scenarios, `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers).
56
+ - **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
56
57
  - Cross-reference design sections by name where relevant context helps (but keep HOW out of the spec).
57
58
  - Delegate to `the active swarm's reviewer agent` for an independent review of the draft spec. Reviewer must flag: requirements that encode HOW, untestable requirements, missing edge cases, silent assumptions.
58
59
  - Apply reviewer feedback. If reviewer rejects, iterate once and re-review. After two rounds, surface remaining disagreements to the user.
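
The timeout rule added in this hunk can be sketched as below. This is a minimal illustration and not code from the package: the `FunnelItem` shape, the `startedAtMs` field, and the `reclassifyStaleResearch` helper are hypothetical. Only the classification names, the `research_needed_timeout_ms` key, and its 300000 ms default come from the skill text.

```typescript
// Hypothetical sketch: FunnelItem, startedAtMs, and reclassifyStaleResearch
// are illustrative names, not part of opencode-swarm.
type Classification =
  | "self_resolved"
  | "critic_resolved"
  | "research_needed"
  | "user_decision"
  | "deferred_nonblocking";

interface FunnelItem {
  id: string;
  classification: Classification;
  startedAtMs?: number; // assumed: when research on this item began
  note?: string;
}

// Default documented in the skill text for research_needed_timeout_ms.
const DEFAULT_RESEARCH_TIMEOUT_MS = 300000;

// Returns a new list in which research_needed items that have been pending
// for at least timeoutMs are reclassified to user_decision with a note.
function reclassifyStaleResearch(
  items: FunnelItem[],
  nowMs: number,
  timeoutMs: number = DEFAULT_RESEARCH_TIMEOUT_MS
): FunnelItem[] {
  return items.map((item): FunnelItem => {
    const timedOut =
      item.classification === "research_needed" &&
      item.startedAtMs !== undefined &&
      nowMs - item.startedAtMs >= timeoutMs;
    if (!timedOut) return item;
    return {
      ...item,
      classification: "user_decision",
      note: "research incomplete (timed out)",
    };
  });
}
```

Items without a recorded start time are left untouched in this sketch; a real implementation would need to decide how to treat them.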
@@ -38,7 +38,7 @@ There is NO hard cap on the internal inventory. Record every material uncertaint
38
38
  Classify each item as exactly one of:
39
39
  - `self_resolved`: answered from the user request, spec, plan, codebase reality check, `.swarm/context.md`, repo conventions, or an informed default. **If the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`.**
40
40
  - `critic_resolved`: sent to Critic Sounding Board and resolved by the critic.
41
- - `research_needed`: needs SME/explorer/domain lookup before user escalation.
41
+ - `research_needed`: needs SME/explorer/domain lookup before user escalation. **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
42
42
  - `user_decision`: only the user can decide because it affects product scope, risk tolerance, policy, budget, UX, rollout, or destructive behavior.
43
43
  - `deferred_nonblocking`: useful follow-up detail that does not block a correct initial plan and can be explicitly recorded as an assumption or follow-up.
44
44
 
@@ -101,3 +101,9 @@ The critic may improve wording or confirm prior context, but these categories MU
101
101
  ### Assumptions Recording
102
102
 
103
103
  All items resolved in Stages 2-3 (self_resolved, critic_resolved, deferred_nonblocking) MUST be recorded as explicit assumptions in the spec, plan, or `.swarm/context.md`. Silently dropping resolved uncertainties is a protocol violation — every uncertainty that entered the funnel must have a recorded outcome.
104
+
105
+ ### Mechanical Enforcement of DROP Protection
106
+
107
+ **Implementation Note:** The hard constraint against `DROP` on always-surface items (defined in Stage 3 of the clarification funnel) is currently enforced via skill instructions to the architect. A lightweight runtime enforcement mechanism is recommended: when processing the critic sounding board verdict response in `src/agents/critic.ts`, validate that any items tagged as "always-surface" do not receive `UNNECESSARY`/`DROP` verdicts. If a DROP verdict is encountered on an always-surface item, override it to `APPROVED`/`ASK_USER` at the code level rather than relying solely on prompt-based enforcement.
108
+
109
+ This mechanical enforcement prevents the following failure mode: the architect prompt instructs the override, but due to parsing errors, context limits, or model behavior variance, the DROP verdict is mistakenly applied to an always-surface item and silently accepted. The validation should occur in the decision-packet assembly code (when building the final clarification packet to surface to the user) and should emit a warning log when an override is applied.
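
The runtime check recommended in this hunk might look roughly like the following. It is a sketch under stated assumptions: the actual `SoundingBoardVerdict` type lives in `src/agents/critic.ts` and its shape is not shown in this diff, so the union below is inferred from the verdict mapping in the skill text. `CriticResult`, `EnforcedResult`, and `enforceDropProtection` are hypothetical names.

```typescript
// Hypothetical sketch: the verdict union is inferred from the documented
// mapping (UNNECESSARY->DROP, RESOLVE->RESOLVE, REPHRASE->REPHRASE,
// APPROVED->ASK_USER); it is not copied from src/agents/critic.ts.
type SoundingBoardVerdict = "UNNECESSARY" | "RESOLVE" | "REPHRASE" | "APPROVED";
type FunnelAction = "DROP" | "RESOLVE" | "REPHRASE" | "ASK_USER";

const VERDICT_TO_ACTION: Record<SoundingBoardVerdict, FunnelAction> = {
  UNNECESSARY: "DROP",
  RESOLVE: "RESOLVE",
  REPHRASE: "REPHRASE",
  APPROVED: "ASK_USER",
};

interface CriticResult {
  itemId: string;
  alwaysSurface: boolean; // assumed tag set earlier in the funnel
  verdict: SoundingBoardVerdict;
}

interface EnforcedResult {
  itemId: string;
  verdict: SoundingBoardVerdict;
  action: FunnelAction;
  overridden: boolean;
}

// Overrides UNNECESSARY/DROP to APPROVED/ASK_USER for always-surface items
// and emits a warning whenever the override is applied.
function enforceDropProtection(
  result: CriticResult,
  warn: (message: string) => void = (m) => console.warn(m)
): EnforcedResult {
  if (result.alwaysSurface && result.verdict === "UNNECESSARY") {
    warn(
      `always-surface item ${result.itemId}: UNNECESSARY/DROP overridden to APPROVED/ASK_USER`
    );
    return { itemId: result.itemId, verdict: "APPROVED", action: "ASK_USER", overridden: true };
  }
  return {
    itemId: result.itemId,
    verdict: result.verdict,
    action: VERDICT_TO_ACTION[result.verdict],
    overridden: false,
  };
}
```

Passing the logger as a parameter keeps the check easy to test and lets the caller route the warning to whatever log sink the decision-packet assembly code already uses.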
@@ -50,7 +50,7 @@ CLARIFY-SPEC handles **already-surfaced** `[NEEDS CLARIFICATION]` markers and sp
50
50
  However, before surfacing each marker question to the user, CLARIFY-SPEC MUST:
51
51
 
52
52
  1. **Consult `critic_sounding_board`** with the candidate marker question and surrounding spec context to check whether the question wording can be improved or the item can be resolved from existing context.
53
- 2. **Apply the overconfidence guard:** If the critic supplies a `RESOLVE` verdict with a default answer, but that default is not directly supported by user request, spec, or recorded context, classify the item as `user_decision` rather than `self_resolved`.
53
+ 2. **Apply the Overconfidence guard:** If the critic supplies a `RESOLVE` verdict with a default answer, but that default is not directly supported by user request, spec, or recorded context, classify the item as `user_decision` rather than `self_resolved`.
54
54
  3. **Apply always-surface protection:** If the marker belongs to an always-surface category (scope boundaries, destructive behavior, security/privacy, backward compatibility, breaking API changes, new dependencies, deprecations, cross-platform impact, cost/performance tradeoffs, user-visible UX, rollout strategy, QA gates), the item MUST NOT receive `UNNECESSARY`/`DROP` from the critic — override to `APPROVED`/`ASK_USER`.
55
55
 
56
56
  Critic verdict mapping (see `src/agents/critic.ts` `SoundingBoardVerdict`): `UNNECESSARY`→DROP, `RESOLVE`→RESOLVE, `REPHRASE`→REPHRASE, `APPROVED`→ASK_USER.
@@ -45,8 +45,9 @@ Flags parsed from signal:
45
45
  - WHAT users need and WHY — never HOW to implement
46
46
  - FR-### / SC-### numbering, Given/When/Then scenarios
47
47
  - No technology stack, APIs, or code structure
48
- - `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers)
49
- 2. Cross-reference the spec against the issue's expected behavior to ensure alignment.
48
+ - `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers)
49
+ - **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
50
+ 2. Cross-reference the spec against the issue's expected behavior to ensure alignment.
50
51
  3. If the issue is a bug: spec must describe the correct behavior, not the broken behavior.
51
52
  4. If the issue is a feature: spec must describe the user-facing outcome, not the implementation.
52
53
  5. QA GATE SELECTION: Ask user which QA gates to enable (same dialogue as MODE: SPECIFY). Write to `.swarm/context.md` under `## Pending QA Gate Selection`.
@@ -81,7 +81,7 @@ Classify each item as exactly one of:
81
81
 
82
82
  - `self_resolved`: answered from the user request, spec, plan, codebase reality check, `.swarm/context.md`, repo conventions, or an informed default. **If the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`.**
83
83
  - `critic_resolved`: sent to Critic Sounding Board and resolved by the critic.
84
- - `research_needed`: needs SME/explorer/domain lookup before user escalation.
84
+ - `research_needed`: needs SME/explorer/domain lookup before user escalation. **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
85
85
  - `user_decision`: only the user can decide because it affects product scope, risk tolerance, policy, budget, UX, rollout, or destructive behavior.
86
86
  - `deferred_nonblocking`: useful follow-up detail that does not block a correct initial plan and can be explicitly recorded as an assumption or follow-up.
87
87
 
@@ -152,6 +152,12 @@ All items resolved in Stages 2-3 (self_resolved, critic_resolved, deferred_nonbl
152
152
 
153
153
  The plan generated by `save_plan` MUST include explicit assumptions and remaining unresolved decisions in the task descriptions or acceptance criteria — not silently omit them.
154
154
 
155
+ #### Mechanical Enforcement of DROP Protection
156
+
157
+ **Implementation Note:** The hard constraint against `DROP` on always-surface items (Stage 3 of the clarification funnel) is currently enforced via skill instructions to the architect. A lightweight runtime enforcement mechanism is recommended: when processing the critic sounding board verdict response in `src/agents/critic.ts`, validate that any items tagged as "always-surface" do not receive `UNNECESSARY`/`DROP` verdicts. If a DROP verdict is encountered on an always-surface item, override it to `APPROVED`/`ASK_USER` at the code level rather than relying solely on prompt-based enforcement.
158
+
159
+ This mechanical enforcement prevents the following failure mode: the architect prompt instructs the override, but due to parsing errors, context limits, or model behavior variance, the DROP verdict is mistakenly applied to an always-surface item and silently accepted. The validation should occur in the decision-packet assembly code (when building the final clarification packet to surface to the user) and should emit a warning log when an override is applied.
160
+
155
161
  Use the `save_plan` tool to create the implementation plan. Required parameters:
156
162
  - `title`: The real project name from the spec (NOT a placeholder like [Project])
157
163
  - `swarm_id`: The swarm identifier (e.g. "mega", "local", "paid")
@@ -28,8 +28,9 @@ Activates when: user asks to "specify", "define requirements", "write a spec", o
28
28
  - Success criteria numbered SC-001, SC-002… — measurable and technology-agnostic
29
29
  - Key entities if data is involved (no schema or field definitions — entity names only)
30
30
  - Edge cases and known failure modes
31
- - `[NEEDS CLARIFICATION]` markers for items where uncertainty could change scope, security, or core behavior, BUT ONLY after running the clarification funnel: (1) inventory all material uncertainties without numeric cap, (2) classify each as self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking — **overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`, (3) consult critic_sounding_board with candidate items — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER, (4) record all resolved items as explicit assumptions in the spec, (5) use markers only for items that survive the funnel (ASK_USER or unresolved after critic consultation). Decision packet format: grouped by category, recommended defaults, blocking vs optional markers, impact of accepting default. Prefer informed defaults over asking
32
- 5. Write the spec to `.swarm/spec.md`.
31
+ - `[NEEDS CLARIFICATION]` markers for items where uncertainty could change scope, security, or core behavior, BUT ONLY after running the clarification funnel: (1) inventory all material uncertainties without numeric cap, (2) classify each as self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`, (3) consult critic_sounding_board with candidate items — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER, (4) record all resolved items as explicit assumptions in the spec, (5) use markers only for items that survive the funnel (ASK_USER or unresolved after critic consultation). Decision packet format: grouped by category, recommended defaults, blocking vs optional markers, impact of accepting default. Prefer informed defaults over asking
32
+ - **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
33
+ 5. Write the spec to `.swarm/spec.md`.
33
34
  5b. **QA GATE SELECTION, PARALLEL CODERS, COMMIT FREQUENCY, AND AUTO_PROCEED (dialogue only).**
34
35
  Ask the user which QA gates to enable for this plan, how many parallel coders to use, the commit frequency, and auto_proceed -- do not select on their behalf. Present all four items together as one unified exchange.
35
36
 
@@ -78,6 +78,8 @@ The orchestrator may:
78
78
  - determine scope,
79
79
  - build or request the context pack,
80
80
  - launch explorers and triggered micro-lanes,
81
+ - extract candidates from lane artifacts via `parse_lane_candidates` or equivalent parser,
82
+ - filter, group, and chunk candidates for reviewer dispatch,
81
83
  - route candidates to reviewers,
82
84
  - route reviewer-confirmed findings to critics,
83
85
  - group validated findings,
@@ -88,7 +90,8 @@ The orchestrator MUST NOT:
88
90
  - re-read a candidate's target code to decide if it is valid,
89
91
  - silently downgrade or discard an explorer candidate,
90
92
  - treat tool output as a confirmed finding,
91
- - report a finding that no reviewer validated.
93
+ - report a finding that no reviewer validated,
94
+ - classify or judge candidates based on preview text alone — always use the structured parser output.
92
95
 
93
96
  If the orchestrator catches itself validating code, it must stop and delegate validation to a reviewer subagent.
94
97
 
@@ -495,9 +498,43 @@ Launch all base lanes with `dispatch_lanes_async` when available. Pass the six l
495
498
 
496
499
  Before Phase 4 or synthesis, call `collect_lane_results` with `wait: true` for the base-lane batch and treat the collected `lane_results` as the join barrier. Missing, stale, cancelled, or failed base lanes are explicit review coverage gaps. If `dispatch_lanes_async` is unavailable, use blocking `dispatch_lanes`; if that is also unavailable, simulate isolated passes. Do not let one lane's conclusions bias another lane, and record unavailable deterministic dispatch in the validation gate.
497
500
 
498
- When any collected or blocking `lane_results[]` item has `output_ref`, treat `output` as a preview only. Call `retrieve_lane_output` and consume the full artifact before extracting candidates, deciding that a lane produced no candidates, or routing work to reviewers. If a lane has `output_truncated: true`, `output_degraded: true`, `transcript_incomplete: true`, or no usable `output_ref`, record an explicit coverage gap and re-dispatch a narrower lane or mark affected candidates/coverage UNVERIFIED; never infer candidate absence from a preview.
499
-
500
- **lane id uniqueness for parallel dispatches:** When re-dispatching failed or re-running explorer lanes, every `dispatch_lanes_async` or `dispatch_lanes` lane `id` MUST be unique within that dispatch batch and should include lane and attempt suffixes (e.g. `pr_review_explore_lane1_attempt2`). Never reuse an id in the same batch unless intentionally replacing that exact lane before dispatch.
501
+ ### Candidate extraction via parser
502
+
503
+ After `collect_lane_results` returns for base lanes, process each lane result
504
+ that carries an `output_ref`. The orchestrator MUST use the candidate parser
505
+ rather than preview-text extraction:
506
+
507
+ 1. For each `output_ref` (or batched), call `parse_lane_candidates` (or the
508
+ internal `parseAndPersist` module function) with `output_ref` and `producer`
509
+ flags; the parser auto-detects the format family per row. The parser reads
510
+ the full artifact from disk (no preview truncation issue) and returns
511
+ structured `ParseResultWithSidecar` records.
512
+ 2. Filter the returned `candidates[]` array by `producer: "swarm-pr-review"` and
513
+ the relevant `row_format_family` (e.g., `base_explorer` for base lanes,
514
+ `micro_lane` for micro-lanes). Filtering happens on the parsed results, NOT
515
+ on the tool input.
516
+ 3. Group the filtered candidates into reviewer-sized chunks:
517
+ - by file area (group by the directory or module of the `file_line` field),
518
+ - by category (group by the `category` field),
519
+ - by count (target max 50 candidates per chunk; smaller chunks are fine).
520
+ 4. Dispatch reviewer lanes (one per chunk) with bounded in-context candidate
521
+ lists. Each reviewer lane receives only the candidates from its assigned
522
+ chunk.
523
+
524
+ If a lane has `output_degraded: true`, `transcript_incomplete: true`, or no usable `output_ref`, record an explicit
525
+ coverage gap and re-dispatch a narrower lane or mark affected candidates
526
+ UNVERIFIED. Never infer candidate absence from a preview.
527
+
528
+ **Fallback convention:** If the parser is unavailable, the explorer MAY emit
529
+ `[CANDIDATE]` rows in the lane output as a fallback convention (see the
530
+ Explorer Prompt Template at the end of this skill), but the orchestrator
531
+ SHOULD use the parser as the primary extraction mechanism.
532
+
533
+ **lane id uniqueness for parallel dispatches:** When re-dispatching failed or
534
+ re-running explorer lanes, every `dispatch_lanes_async` or `dispatch_lanes`
535
+ lane `id` MUST be unique within that dispatch batch and should include lane and
536
+ attempt suffixes (e.g., `pr_review_explore_lane1_attempt2`). Never reuse an id
537
+ in the same batch unless intentionally replacing that exact lane before dispatch.
501
538
 
502
539
  Explorers optimize for recall. Over-reporting is expected. Explorers produce candidates only.
503
540
 
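
The filter, group, and chunk steps described in this hunk can be sketched as follows. This is one possible reading, not the package's implementation: it combines file area and category into a single grouping key and then splits by count, whereas the skill text lists the three criteria without fixing how they combine. `ParsedCandidate` keeps only the fields used here, and `fileArea` and `chunkForReviewers` are hypothetical helpers.

```typescript
// Hypothetical sketch: a reduced view of the parser's candidate records,
// keeping only the fields this example needs.
interface ParsedCandidate {
  candidate_id: string;
  producer: string;
  row_format_family: string;
  category: string;
  file_line: string; // e.g. "src/utils/cache.ts:142"
}

// Derives a coarse "file area" (the containing directory) from file_line.
function fileArea(fileLine: string): string {
  const path = fileLine.replace(/:\d+$/, "");
  const slash = path.lastIndexOf("/");
  return slash === -1 ? "." : path.slice(0, slash);
}

// Filters parsed candidates by producer and format family, groups them by
// file area and category, then splits each group into bounded chunks.
function chunkForReviewers(
  candidates: ParsedCandidate[],
  family: string,
  maxPerChunk: number = 50
): ParsedCandidate[][] {
  const groups: Record<string, ParsedCandidate[]> = {};
  for (const c of candidates) {
    if (c.producer !== "swarm-pr-review" || c.row_format_family !== family) continue;
    const key = fileArea(c.file_line) + "|" + c.category;
    (groups[key] = groups[key] || []).push(c);
  }
  const chunks: ParsedCandidate[][] = [];
  for (const key of Object.keys(groups)) {
    const bucket = groups[key];
    for (let i = 0; i < bucket.length; i += maxPerChunk) {
      chunks.push(bucket.slice(i, i + maxPerChunk));
    }
  }
  return chunks;
}
```

Each returned chunk would then back one reviewer lane, so no reviewer receives more than `maxPerChunk` candidates.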
@@ -507,7 +544,7 @@ Explorers optimize for recall. Over-reporting is expected. Explorers produce can
507
544
  | Lane 2: Security and trust boundaries | Injection, authz/authn bypass, SSRF, path traversal, secret exposure, unsafe deserialization, prompt injection | untrusted input sources, sanitization, credential handling, permission boundary, private network access, output escaping |
508
545
  | Lane 3: Dependencies and deployment safety | Import changes, version bumps, lockfile drift, breaking APIs, package scripts, runtime assumptions | lockfile consistency, new transitive deps, Node/Bun/runtime compatibility, platform assumptions, license red flags |
509
546
  | Lane 4: Docs, intent, and drift | PR claims vs implementation, docs mismatch, migration/changelog gaps, stale examples | obligation mapping, changed behavior not documented, docs promising behavior not implemented |
510
- | Lane 5: Tests and falsifiability | Weak assertions, missing edge tests, flaky patterns, mock leakage, fixture drift | assertion strength, tautology patterns (`expect(true).toBe(true)`, `expect(res).toBeDefined()` without further checks, `assertDoesNotThrow` wrapping trivial code), negative paths, isolation, deterministic timing, cross-platform path coverage |
547
+ | Lane 5: Tests and falsifiability | Weak assertions, missing edge tests, flaky patterns, mock leakage, fixture drift | assertion strength, tautology patterns (`expect(true).toBe(true)`, `expect(res).toBeDefined()` without further checks), `assertDoesNotThrow` wrapping trivial code), negative paths, isolation, deterministic timing, cross-platform path coverage |
511
548
  | Lane 6: Performance and architecture | Complexity regressions, memory leaks, over-coupling, inefficient graph scans, global mutable state | algorithmic deltas, caching, resource lifecycle, state ownership, architectural boundary violations |
512
549
 
513
550
  ### Explorer context contract
@@ -523,12 +560,19 @@ Every explorer must inspect or explicitly mark unavailable:
523
560
  7. relevant Swarm knowledge/evidence entries, if present.
524
561
  8. the commit range to analyze (`base_ref..head_ref`),
525
562
 
526
- Explorer output format:
563
+ ### Explorer output format
564
+
565
+ Explorers emit structured candidate records. The parser reads the full lane
566
+ artifact and extracts these records. The canonical record shape is:
527
567
 
528
568
  ```text
529
569
  [CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence: LOW/MEDIUM/HIGH
530
570
  ```
531
571
 
572
+ The parser normalizes this into a structured `candidates[]` array. If the
573
+ parser is unavailable, the explorer MAY emit the `[CANDIDATE]` row format
574
+ directly in the lane output as a fallback convention.
575
+
532
576
  Explorers must not use `CONFIRMED`, `DISPROVED`, or `PRE_EXISTING`.
533
577
 
534
578
  ---
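
For the fallback convention, a reader of `[CANDIDATE]` rows could be sketched as below. This is not the package's parser (the diff does not show the internals of `parse_lane_candidates`). `CandidateRow` and `parseCandidateRow` are hypothetical, and the sketch assumes that no field contains a literal `|` character.

```typescript
// Hypothetical sketch of a fallback reader for the documented row shape:
// [CANDIDATE] | candidate_id | lane | severity | category | file:line |
// claim | evidence_summary | impact_context | confidence: LOW/MEDIUM/HIGH
interface CandidateRow {
  candidate_id: string;
  lane: string;
  severity: string;
  category: string;
  file_line: string;
  claim: string;
  evidence_summary: string;
  impact_context: string;
  confidence: "LOW" | "MEDIUM" | "HIGH";
}

// Returns the parsed row, or null when the line is not a well-formed
// [CANDIDATE] row. Assumes fields never contain a literal "|".
function parseCandidateRow(line: string): CandidateRow | null {
  const parts = line.split("|").map((p) => p.trim());
  if (parts.length !== 10 || parts[0] !== "[CANDIDATE]") return null;
  const match = /^confidence:\s*(LOW|MEDIUM|HIGH)$/.exec(parts[9]);
  if (!match) return null;
  return {
    candidate_id: parts[1],
    lane: parts[2],
    severity: parts[3],
    category: parts[4],
    file_line: parts[5],
    claim: parts[6],
    evidence_summary: parts[7],
    impact_context: parts[8],
    confidence: match[1] as "LOW" | "MEDIUM" | "HIGH",
  };
}
```

Rows that fail to parse should be recorded as coverage gaps rather than dropped, in line with the rule that candidate absence is never inferred from incomplete output.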
@@ -537,7 +581,7 @@ Explorers must not use `CONFIRMED`, `DISPROVED`, or `PRE_EXISTING`.
537
581
 
538
582
  After `collect_lane_results` returns for base lanes, inspect the context pack risk triggers. Launch focused micro-lanes for triggered categories only, using `dispatch_lanes_async` again when more than one read-only micro-lane is needed. Collect every micro-lane batch with `wait: true` before reviewer classification. Do not launch irrelevant micro-lanes.
539
583
 
540
- Apply the same `output_ref` rule to micro-lanes: retrieve full output before candidate routing, and treat degraded or incomplete lane artifacts as UNVERIFIED coverage rather than as clean negative evidence.
584
+ Apply the same parser-based extraction to micro-lanes: call `parse_lane_candidates` on each micro-lane `output_ref` (filter the returned `candidates[]` array by `row_format_family === "micro_lane"` after parsing), and treat degraded or incomplete lane artifacts as UNVERIFIED coverage rather than as clean negative evidence.
541
585
 
542
586
  Each micro-lane receives:
543
587
 
@@ -547,7 +591,8 @@ Each micro-lane receives:
547
591
  - relevant deterministic signals,
548
592
  - related historical knowledge with quarantine/staleness status,
549
593
  - expected invariants,
550
- - output format as `[CANDIDATE]` only.
594
+ - structured candidate output (parser-extracted). If the parser is unavailable,
595
+ the micro-lane MAY emit `[CANDIDATE]` rows as a fallback convention.
551
596
 
552
597
  ### Swarm plugin risk trigger map
553
598
 
@@ -596,7 +641,12 @@ Verifier output is advisory until incorporated by the independent reviewer or cr
596
641
 
597
642
  ## Phase 6: Independent Reviewer Confirmation
598
643
 
599
- Route candidates to reviewer subagents. The reviewer must re-read the candidate's file:line evidence and relevant context pack entries directly.
644
+ Route candidates to reviewer subagents. The orchestrator routes candidates
645
+ in bounded chunks produced by the parser-based extraction in Phase 3-4. Each
646
+ reviewer lane receives a bounded list of candidates from a single chunk — by
647
+ file area, category, or count — not the full candidate set. The reviewer must
648
+ re-read the candidate's file:line evidence and relevant context pack entries
649
+ directly.
600
650
 
601
651
  ### Noise budget and universal validation
602
652
 
@@ -813,6 +863,245 @@ Update the verdict only after re-verifying all previously blocking findings.
813
863
 
814
864
  ---
815
865
 
866
+ ## Dry-Run: Parser-Based Candidate Extraction
867
+
868
+ This section demonstrates the new parser-based extraction path end-to-end
869
+ using synthetic data. It is concrete enough to implement the same pattern in
870
+ another skill.
871
+
872
+ ### Scenario
873
+
874
+ A PR review has dispatched six base explorer lanes via `dispatch_lanes_async`.
875
+ The batch completed and `collect_lane_results` returned:
876
+
877
+ ```json
878
+ {
879
+ "batch_id": "batch-a1b2c3",
880
+ "lane_results": [
881
+ {
882
+ "lane_id": "pr_review_lane1_correctness",
883
+ "status": "completed",
884
+ "output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
885
+ "output_degraded": false
886
+ },
887
+ {
888
+ "lane_id": "pr_review_lane2_security",
889
+ "status": "completed",
890
+ "output_ref": ".swarm/lane-results/batch-a1b2c3/lane-2/out-def456.json",
891
+ "output_degraded": false
892
+ }
893
+ ]
894
+ }
895
+ ```
896
+
897
+ ### Step 1 — Call the parser
898
+
899
+ The orchestrator calls `parse_lane_candidates` for each `output_ref`:
900
+
901
+ ```json
902
+ {
903
+ "tool": "parse_lane_candidates",
904
+ "arguments": {
905
+ "output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
906
+ "producer": "swarm-pr-review"
907
+ }
908
+ }
909
+ ```
910
+
+ ### Step 2 — Structured response
+
+ The parser returns a `ParseResultWithSidecar`. On success, `error` and `error_code` are absent:
+
+ ```json
+ {
+   "candidates": [
+     {
+       "record_type": "candidate",
+       "row_format_family": "base_explorer",
+       "row_format_version": 1,
+       "record_version": { "major": 1, "minor": 0 },
+       "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+       "source_batch_id": "B-2025-06-22-001",
+       "source_lane_id": "explorer-1",
+       "source_agent": "paid_explorer",
+       "source_digest": "sha256:abc123def456...",
+       "extracted_from_partial_source": false,
+       "sessionId": "ses_01HXYZ...",
+       "parentSessionId": "ses_01HABC...",
+       "producer": "swarm-pr-review",
+       "candidate_id": "C-001",
+       "lane": "Lane 1: Correctness and edge cases",
+       "micro_lane": null,
+       "severity": "HIGH",
+       "category": "null-safety",
+       "file_line": "src/utils/cache.ts:142",
+       "claim": "Uncached getter may return undefined on cold start",
+       "evidence_summary": "The `getCached` function returns `cache[key]` without a fallback when the cache is empty.",
+       "impact_context": "Downstream callers in `src/handlers/*.ts` expect a defined value and call `.toString()` directly.",
+       "invariant_violated": null,
+       "confidence": "HIGH"
+     },
+     {
+       "record_type": "candidate",
+       "row_format_family": "base_explorer",
+       "row_format_version": 1,
+       "record_version": { "major": 1, "minor": 0 },
+       "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+       "source_batch_id": "B-2025-06-22-001",
+       "source_lane_id": "explorer-1",
+       "source_agent": "paid_explorer",
+       "source_digest": "sha256:abc123def456...",
+       "extracted_from_partial_source": false,
+       "sessionId": "ses_01HXYZ...",
+       "parentSessionId": "ses_01HABC...",
+       "producer": "swarm-pr-review",
+       "candidate_id": "C-002",
+       "lane": "Lane 1: Correctness and edge cases",
+       "micro_lane": null,
+       "severity": "MEDIUM",
+       "category": "async-ordering",
+       "file_line": "src/services/queue.ts:88",
+       "claim": "Race between `drain` and `processNext` may drop items",
+       "evidence_summary": "`drain` sets `active = false` before awaiting `processNext`, which also checks `active`.",
+       "impact_context": "Items submitted during the drain window are silently dropped.",
+       "invariant_violated": null,
+       "confidence": "MEDIUM"
+     }
+   ],
+   "invocation_envelope": {
+     "record_type": "invocation",
+     "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
+     "source_batch_id": "B-2025-06-22-001",
+     "source_lane_id": "explorer-1",
+     "source_agent": "paid_explorer",
+     "source_digest": "sha256:abc123def456...",
+     "row_format_version": 1,
+     "record_version": { "major": 1, "minor": 0 },
+     "sessionId": "ses_01HXYZ...",
+     "parentSessionId": "ses_01HABC...",
+     "producer": "swarm-pr-review",
+     "produced_at": "2025-06-22T14:30:00.000Z",
+     "format_families_detected": ["base_explorer"],
+     "candidate_count": 2,
+     "parse_errors": 2,
+     "malformed_rows": 0
+   },
+   "diagnostics": {
+     "candidate_count": 2,
+     "parse_errors": 2,
+     "parse_error_details": [
+       {
+         "row_index": 0,
+         "field": "row",
+         "message": "Both format-family discriminators present; defaulting to base_explorer"
+       },
+       {
+         "row_index": 1,
+         "field": "row",
+         "message": "Both format-family discriminators present; defaulting to base_explorer"
+       }
+     ],
+     "malformed_rows": 0,
+     "duplicate_id_count": 0,
+     "duplicate_id_warnings": [],
+     "degraded_source_count": 0,
+     "incomplete_source_count": 0,
+     "format_families_detected": ["base_explorer"]
+   }
+ }
+ ```
+ > **Note**: `parse_errors: 2` reflects FR-017/SC-017 position-based detection: when a `[CANDIDATE]` row has both `evidence_summary` and `impact_context` populated, the parser emits a `parse_error_details` entry per row with `field: "row"` and `message: "Both format-family discriminators present; defaulting to base_explorer"`. This is documented behavior, not a parser bug. To get `parse_errors: 0` with the row format, leave one of the two fields empty; to silence the warning entirely, emit structured JSON candidate records.
+
+ On refusal (e.g. `output_ref` does not exist), `error` and `error_code` are present; `candidates` is `[]`; `invocation_envelope` and `diagnostics` are populated with empty fields for traceability:
+
+ ```json
+ {
+   "error": "Artifact reference not found in store",
+   "error_code": "ref-not-found",
+   "candidates": [],
+   "invocation_envelope": {
+     "record_type": "invocation",
+     "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/missing.json",
+     "source_batch_id": "",
+     "source_lane_id": "",
+     "source_agent": "",
+     "source_digest": "",
+     "row_format_version": 1,
+     "record_version": { "major": 1, "minor": 0 },
+     "produced_at": "2025-06-22T14:30:00.000Z",
+     "format_families_detected": [],
+     "candidate_count": 0,
+     "parse_errors": 0,
+     "malformed_rows": 0
+   },
+   "diagnostics": {
+     "candidate_count": 0,
+     "parse_errors": 0,
+     "parse_error_details": [],
+     "malformed_rows": 0,
+     "duplicate_id_count": 0,
+     "duplicate_id_warnings": [],
+     "degraded_source_count": 0,
+     "incomplete_source_count": 0,
+     "format_families_detected": []
+   }
+ }
+ ```
+
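A caller has to tell the two Step 2 response shapes apart before using `candidates`. The sketch below shows one way to do that in TypeScript by checking for `error_code`. The `ParseResponse` interface is a trimmed, hypothetical view of the response (the real `ParseResultWithSidecar` type is not reproduced here), and `toOutcome` is an illustrative helper name. Following the note above, a non-zero `parse_errors` count is surfaced as a warning count rather than treated as a failure.

```typescript
// Hypothetical, trimmed-down view of the parser response; the real
// ParseResultWithSidecar type carries many more fields.
interface CandidateSummary {
  candidate_id: string;
  severity: string;
  file_line: string;
  claim: string;
}

interface ParseResponse {
  error?: string;
  error_code?: string;
  candidates: CandidateSummary[];
  diagnostics: { parse_errors: number; malformed_rows: number };
}

type Outcome =
  | { kind: "refused"; code: string; message: string }
  | { kind: "parsed"; candidates: CandidateSummary[]; warnings: number };

// A refusal is identified by the presence of error_code; everything else
// is treated as a successful parse, even when parse_errors is non-zero.
function toOutcome(res: ParseResponse): Outcome {
  if (res.error_code !== undefined) {
    const message = res.error !== undefined ? res.error : "";
    return { kind: "refused", code: res.error_code, message };
  }
  return {
    kind: "parsed",
    candidates: res.candidates,
    warnings: res.diagnostics.parse_errors,
  };
}

const refusal: ParseResponse = {
  error: "Artifact reference not found in store",
  error_code: "ref-not-found",
  candidates: [],
  diagnostics: { parse_errors: 0, malformed_rows: 0 },
};

const success: ParseResponse = {
  candidates: [
    {
      candidate_id: "C-001",
      severity: "HIGH",
      file_line: "src/utils/cache.ts:142",
      claim: "Uncached getter may return undefined on cold start",
    },
  ],
  diagnostics: { parse_errors: 2, malformed_rows: 0 },
};

const refusedOutcome = toOutcome(refusal);
const parsedOutcome = toOutcome(success);
console.log(refusedOutcome.kind, parsedOutcome.kind);
```
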
+ ### Step 3 — Filter and group
+
+ The orchestrator filters the returned `candidates[]` array by `producer: "swarm-pr-review"` and `row_format_family` (e.g. `base_explorer` or `micro_lane`), then groups
+ the candidates. In this synthetic example, the two candidates above are grouped
+ by file area:
+
+ - **Chunk A — `src/utils/`** (1 candidate): C-001
+ - **Chunk B — `src/services/`** (1 candidate): C-002
+
+ If there were more candidates, the orchestrator would also group by category
+ (e.g., `null-safety`, `async-ordering`) and cap each chunk at 50 candidates.
+
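The filter, group, and cap steps in Step 3 can be sketched as below. This is a minimal illustration under stated assumptions: `CandidateRef`, `Chunk`, `fileArea`, and `chunkCandidates` are hypothetical names, "file area" is taken to mean the directory part of `file_line`, and the secondary grouping by category is left out for brevity.

```typescript
// Hypothetical subset of the candidate record fields shown in Step 2.
interface CandidateRef {
  candidate_id: string;
  producer: string;
  row_format_family: string;
  file_line: string; // e.g. "src/utils/cache.ts:142"
}

interface Chunk {
  area: string;
  candidates: CandidateRef[];
}

// Directory portion of a "path:line" reference, e.g. "src/utils/".
function fileArea(fileLine: string): string {
  const path = fileLine.replace(/:\d+$/, "");
  const slash = path.lastIndexOf("/");
  return slash === -1 ? "./" : path.slice(0, slash + 1);
}

function chunkCandidates(
  all: CandidateRef[],
  producer: string,
  family: string,
  maxPerChunk: number = 50
): Chunk[] {
  // Filter first, so foreign producers and other row families never reach a reviewer.
  const kept = all.filter(
    (c) => c.producer === producer && c.row_format_family === family
  );

  // Group by file area, remembering first-seen order of the areas.
  const byArea: Record<string, CandidateRef[]> = {};
  const order: string[] = [];
  for (const c of kept) {
    const area = fileArea(c.file_line);
    let bucket = byArea[area];
    if (bucket === undefined) {
      bucket = [];
      byArea[area] = bucket;
      order.push(area);
    }
    bucket.push(c);
  }

  // Split any oversized group so no chunk exceeds the cap.
  const chunks: Chunk[] = [];
  for (const area of order) {
    const bucket = byArea[area];
    if (bucket === undefined) continue;
    for (let i = 0; i < bucket.length; i += maxPerChunk) {
      chunks.push({ area, candidates: bucket.slice(i, i + maxPerChunk) });
    }
  }
  return chunks;
}

const demoChunks = chunkCandidates(
  [
    { candidate_id: "C-001", producer: "swarm-pr-review", row_format_family: "base_explorer", file_line: "src/utils/cache.ts:142" },
    { candidate_id: "C-002", producer: "swarm-pr-review", row_format_family: "base_explorer", file_line: "src/services/queue.ts:88" },
  ],
  "swarm-pr-review",
  "base_explorer"
);
console.log(demoChunks.map((c) => c.area + " -> " + c.candidates.length));
```

With the two synthetic candidates this yields one chunk per directory, matching Chunk A and Chunk B above.
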
+ ### Step 4 — Dispatch reviewer lanes
+
+ The orchestrator dispatches one reviewer lane per chunk:
+
+ ```text
+ You are the independent reviewer. Validate only the candidates assigned below.
+ Do not search for new issues except where needed to validate reachability or
+ mitigation. Do not trust explorer severity.
+
+ Context pack summary:
+ - scope: ...
+ - obligations: ...
+ - impact cone: ...
+ - deterministic signals: ...
+ - relevant Swarm artifacts / knowledge: ...
+ - base_ref: <commit SHA of base branch>
+ - head_ref: <commit SHA of PR head branch>
+
+ Candidates (Chunk A — src/utils/):
+ - C-001 | HIGH | null-safety | src/utils/cache.ts:142 | Uncached getter may return undefined on cold start
+
+ For each candidate, return:
+ [REVIEWED] | candidate_id | CONFIRMED/DISPROVED/UNVERIFIED/PRE_EXISTING | evidence_type | final_severity | introduced_by_pr | file:line | rationale | falsification_probe | reviewer_id
+
+ You must check caller context, reachability, schema/middleware/framework mitigations, state-machine constraints, test coverage, PR-introducedness, and severity.
+
+ IMPORTANT: If a finding claims behavior is "new" or "introduced by the PR", you MUST read the equivalent code on the base branch (git show <base_ref>:<file>) to verify it was not present before. A reviewer claim of "this is new" is invalid without base-branch evidence. Do not compare the new code to an idealized baseline — compare it to what actually existed on the base branch at the time of the PR.
+ ```
+
+ ### Key invariants
+
+ - The parser reads the **full artifact**, not a preview. Truncation in the
+ `dispatch_lanes` preview does not affect candidate extraction.
+ - The orchestrator never classifies candidates — it only filters, groups, and
+ routes them.
+ - Each reviewer receives a bounded chunk. A chunk with more than 50 candidates
+ is split before dispatch.
+ - The `invocation_envelope` in the parser response provides audit provenance
+ for every extracted candidate.
+
+ ---
+
  # Council Mode Workflow
 
  Council mode is opt-in only and adversarial.
@@ -1116,4 +1405,10 @@ Return:
  [CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence
  ```
 
+ The orchestrator extracts candidates from the full lane artifact via
+ `parse_lane_candidates` as the primary mechanism. The `[CANDIDATE]` row
+ format above is a fallback convention for environments where the parser is
+ unavailable. Explorers should still emit structured records regardless of
+ whether the parser is present.
+
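For environments that rely on the fallback row convention, a naive TypeScript sketch of reading one `[CANDIDATE]` row might look like the following. It is not the behavior of `parse_lane_candidates` (it does no format-family detection and emits no diagnostics); `parseCandidateRow` and `CANDIDATE_FIELDS` are hypothetical names, the `file:line` column is mapped to `file_line` as in the structured records, and a field containing a literal `|` would make the row fail the column-count check.

```typescript
// Column order taken from the [CANDIDATE] row template, after the tag itself.
const CANDIDATE_FIELDS = [
  "candidate_id",
  "lane",
  "severity",
  "category",
  "file_line",
  "claim",
  "evidence_summary",
  "impact_context",
  "confidence",
] as const;

type CandidateField = (typeof CANDIDATE_FIELDS)[number];
type CandidateRow = Record<CandidateField, string>;

// Returns null for anything that is not a well-formed [CANDIDATE] row.
function parseCandidateRow(line: string): CandidateRow | null {
  const parts = line.split("|").map((p) => p.trim());
  if (parts[0] !== "[CANDIDATE]") return null;
  if (parts.length !== CANDIDATE_FIELDS.length + 1) return null;

  const row = {} as CandidateRow;
  CANDIDATE_FIELDS.forEach((field, i) => {
    const value = parts[i + 1];
    row[field] = value === undefined ? "" : value;
  });
  return row;
}

const sampleRow =
  "[CANDIDATE] | C-001 | Lane 1: Correctness and edge cases | HIGH | null-safety | " +
  "src/utils/cache.ts:142 | Uncached getter may return undefined on cold start | " +
  "The getter returns the cache entry without a fallback. | " +
  "Callers call toString on the result directly. | HIGH";

const parsedRow = parseCandidateRow(sampleRow);
console.log(parsedRow);
```
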
  Do not let speed degrade validation quality.
package/README.md CHANGED
@@ -800,6 +800,7 @@ Every candidate passes a 3-gate pipeline before entering quarantine:
  | mutation_test | Applies LLM-generated mutation patches to source files and runs tests to measure kill rate; verdict is pass/warn/fail based on configurable thresholds; used by the mutation_test gate (opt-in, off by default) |
  | generate_mutants | Architect-only: generates LLM-based mutation patches (5–10 per function across 6 types: off-by-one, null substitution, operator swap, guard removal, branch swap, side-effect deletion) for direct consumption by the mutation_test tool; returns SKIP verdict on LLM failure rather than throwing |
  | write_mutation_evidence | Architect-only: writes mutation gate results atomically to `.swarm/evidence/{phase}/mutation-gate.json`; accepts verdict (PASS/WARN/FAIL/SKIP), kill rate metrics, and optional survived mutant details; normalizes uppercase-to-lowercase before persisting |
+ | parse_lane_candidates | Architect-only: parses `[CANDIDATE]` rows from a `dispatch_lanes` or `collect_lane_results` artifact by `output_ref`; produces structured records with provenance and optional sidecar JSONL persistence; returns `ParseResultWithSidecar` on success or `{ error, error_code, candidates: [] }` on refusal |
  | git_blame | Per-line git blame metadata (sha, author, date, summary) via `git blame --porcelain`; supports optional line range filtering |
  | diff | Structured git diff with contract change detection; supports `summaryOnly` mode returning file list with additions/deletions counts |
  | suggest_patch | Reviewer-safe structured patch suggestion; supports `format` parameter ('json' or 'unified') where unified outputs valid unified diff with `diff --git` headers, hunks, and context |