opencode-swarm 7.87.3 → 7.88.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.opencode/skills/brainstorm/SKILL.md +2 -1
- package/.opencode/skills/clarify/SKILL.md +7 -1
- package/.opencode/skills/clarify-spec/SKILL.md +1 -1
- package/.opencode/skills/issue-ingest/SKILL.md +3 -2
- package/.opencode/skills/plan/SKILL.md +7 -1
- package/.opencode/skills/specify/SKILL.md +3 -2
- package/.opencode/skills/swarm-pr-review/SKILL.md +304 -9
- package/README.md +1 -0
- package/dist/background/candidate-parser.d.ts +189 -0
- package/dist/background/candidate-sidecar-store.d.ts +56 -0
- package/dist/cli/{config-doctor-6h64pn8n.js → config-doctor-jzbgpbdh.js} +2 -2
- package/dist/cli/{guardrail-explain-2q9myk7c.js → guardrail-explain-995zavv8.js} +5 -5
- package/dist/cli/{guardrail-log-eegabqcp.js → guardrail-log-c7egm5km.js} +3 -3
- package/dist/cli/{index-q9h0wb04.js → index-0asbrmdx.js} +4 -0
- package/dist/cli/{index-kz1bmebr.js → index-4td9ef53.js} +523 -229
- package/dist/cli/{index-1cb4wxnm.js → index-819xp49y.js} +1 -1
- package/dist/cli/{index-5hvbw5xh.js → index-g00qm2gf.js} +1 -1
- package/dist/cli/{index-r3f47swm.js → index-sr7g2msm.js} +6 -6
- package/dist/cli/{index-amwa268r.js → index-tt5aehrb.js} +2 -2
- package/dist/cli/{index-5vpe6vq9.js → index-vjsr9bqt.js} +1 -1
- package/dist/cli/index.js +4 -4
- package/dist/cli/{schema-84146tvk.js → schema-vb6jkxgg.js} +1 -1
- package/dist/index.js +2114 -991
- package/dist/memory/config.d.ts +1 -0
- package/dist/memory/gateway.d.ts +1 -0
- package/dist/memory/provider-pool.d.ts +50 -0
- package/dist/memory/sqlite-provider.d.ts +3 -0
- package/dist/tools/index.d.ts +1 -0
- package/dist/tools/manifest.d.ts +1 -0
- package/dist/tools/parse-lane-candidates.d.ts +2 -0
- package/dist/tools/tool-metadata.d.ts +4 -0
- package/package.json +1 -1
````diff
@@ -52,7 +52,8 @@ If `council.general.enabled` is true in the resolved opencode-swarm config AND a
 - Exit with a design outline the user can skim in under two minutes.
 
 **Phase 5: SPEC WRITE + SELF-REVIEW (architect + reviewer).**
-- Generate `.swarm/spec.md` following the same SPEC CONTENT RULES that MODE: SPECIFY uses: WHAT/WHY only, no tech stack, no implementation details, FR-### / SC-### numbering, Given/When/Then scenarios, `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **
+- Generate `.swarm/spec.md` following the same SPEC CONTENT RULES that MODE: SPECIFY uses: WHAT/WHY only, no tech stack, no implementation details, FR-### / SC-### numbering, Given/When/Then scenarios, `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers).
+- **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
 - Cross-reference design sections by name where relevant context helps (but keep HOW out of the spec).
 - Delegate to `the active swarm's reviewer agent` for an independent review of the draft spec. Reviewer must flag: requirements that encode HOW, untestable requirements, missing edge cases, silent assumptions.
 - Apply reviewer feedback. If reviewer rejects, iterate once and re-review. After two rounds, surface remaining disagreements to the user.
````

````diff
@@ -38,7 +38,7 @@ There is NO hard cap on the internal inventory. Record every material uncertaint
 Classify each item as exactly one of:
 - `self_resolved`: answered from the user request, spec, plan, codebase reality check, `.swarm/context.md`, repo conventions, or an informed default. **If the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`.**
 - `critic_resolved`: sent to Critic Sounding Board and resolved by the critic.
-- `research_needed`: needs SME/explorer/domain lookup before user escalation.
+- `research_needed`: needs SME/explorer/domain lookup before user escalation. **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
 - `user_decision`: only the user can decide because it affects product scope, risk tolerance, policy, budget, UX, rollout, or destructive behavior.
 - `deferred_nonblocking`: useful follow-up detail that does not block a correct initial plan and can be explicitly recorded as an assumption or follow-up.
 
````

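The `research_needed` timeout rule in the hunk above can be modelled as a small state transition. The sketch below is illustrative only. The following come from the skill text:

* the classification names
* the `research_needed_timeout_ms` key
* the 300000 ms default
* reclassification to `user_decision` with a note

The `UncertaintyItem` shape, `applyResearchTimeout`, and `resolveResearchTimeout` are hypothetical. The sketch also assumes the key sits at the top level of the parsed `.swarm/config.json`.

```typescript
// Hypothetical types; only the classification names, the config key, and the
// 300000 ms default come from the skill text.
type Classification =
  | "self_resolved"
  | "critic_resolved"
  | "research_needed"
  | "user_decision"
  | "deferred_nonblocking";

interface UncertaintyItem {
  id: string;
  classification: Classification;
  researchStartedAt?: number; // epoch ms when research was dispatched
  researchComplete?: boolean;
  notes: string[];
}

const DEFAULT_RESEARCH_TIMEOUT_MS = 300_000; // 5 minutes

// Read the timeout, falling back to the default for missing or invalid values.
// Assumes a top-level key in the parsed .swarm/config.json object.
function resolveResearchTimeout(config: { research_needed_timeout_ms?: unknown }): number {
  const value = config.research_needed_timeout_ms;
  return typeof value === "number" && Number.isFinite(value) && value > 0
    ? value
    : DEFAULT_RESEARCH_TIMEOUT_MS;
}

// Reclassify a stalled research item to user_decision once the timeout elapses.
function applyResearchTimeout(
  item: UncertaintyItem,
  nowMs: number,
  timeoutMs: number = DEFAULT_RESEARCH_TIMEOUT_MS,
): UncertaintyItem {
  if (item.classification !== "research_needed" || item.researchComplete) return item;
  if (item.researchStartedAt === undefined) return item; // research not started yet
  if (nowMs - item.researchStartedAt < timeoutMs) return item; // still within budget
  return {
    ...item,
    classification: "user_decision",
    notes: [...item.notes, `research incomplete after ${timeoutMs} ms; surfaced to user`],
  };
}
```

The function returns a new object instead of mutating the item, so the funnel can keep the original `research_needed` record for the assumptions log.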
````diff
@@ -101,3 +101,9 @@ The critic may improve wording or confirm prior context, but these categories MU
 ### Assumptions Recording
 
 All items resolved in Stages 2-3 (self_resolved, critic_resolved, deferred_nonblocking) MUST be recorded as explicit assumptions in the spec, plan, or `.swarm/context.md`. Silently dropping resolved uncertainties is a protocol violation — every uncertainty that entered the funnel must have a recorded outcome.
+
+### Mechanical Enforcement of DROP Protection
+
+**Implementation Note:** The hard constraint against `DROP` on always-surface items (defined in Stage 3 of the clarification funnel) is currently enforced via skill instructions to the architect. A lightweight runtime enforcement mechanism is recommended: when processing the critic sounding board verdict response in `src/agents/critic.ts`, validate that any items tagged as "always-surface" do not receive `UNNECESSARY`/`DROP` verdicts. If a DROP verdict is encountered on an always-surface item, override it to `APPROVED`/`ASK_USER` at the code level rather than relying solely on prompt-based enforcement.
+
+This mechanical enforcement prevents the following failure mode: the architect prompt instructs the override, but due to parsing errors, context limits, or model behavior variance, the DROP verdict is mistakenly applied to an always-surface item and silently accepted. The validation should occur in the decision-packet assembly code (when building the final clarification packet to surface to the user) and should emit a warning log when an override is applied.
````

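The Implementation Note in the hunk above recommends a code-level guard rather than relying only on the prompt. A minimal sketch of that validation pass, run while assembling the decision packet, could look like the following.

`VerdictedItem`, `EnforcementResult`, `enforceDropProtection`, and the `alwaysSurface` flag are hypothetical. The verdict names come from the `SoundingBoardVerdict` mapping quoted in the skill text, and the real type in `src/agents/critic.ts` may differ.

```typescript
// Verdict names as listed in the skill text; the real SoundingBoardVerdict
// type in src/agents/critic.ts may differ.
type SoundingBoardVerdict = "UNNECESSARY" | "RESOLVE" | "REPHRASE" | "APPROVED";

// Hypothetical record for one item in the decision packet under assembly.
interface VerdictedItem {
  id: string;
  alwaysSurface: boolean;
  verdict: SoundingBoardVerdict;
}

interface EnforcementResult {
  items: VerdictedItem[];
  overrides: string[]; // ids whose verdict was rewritten by code
}

// Rewrite UNNECESSARY (DROP) to APPROVED (ASK_USER) on always-surface items
// and emit one warning per override.
function enforceDropProtection(
  items: VerdictedItem[],
  warn: (message: string) => void = console.warn,
): EnforcementResult {
  const overrides: string[] = [];
  const checked = items.map((item): VerdictedItem => {
    if (!item.alwaysSurface || item.verdict !== "UNNECESSARY") return item;
    overrides.push(item.id);
    warn(`always-surface item ${item.id} received UNNECESSARY/DROP; overriding to APPROVED/ASK_USER`);
    return { ...item, verdict: "APPROVED" };
  });
  return { items: checked, overrides };
}
```

Returning `overrides` alongside the items lets the caller record which decisions were changed by code rather than by the critic. The injectable `warn` callback keeps the sketch testable without touching the console.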
````diff
@@ -50,7 +50,7 @@ CLARIFY-SPEC handles **already-surfaced** `[NEEDS CLARIFICATION]` markers and sp
 However, before surfacing each marker question to the user, CLARIFY-SPEC MUST:
 
 1. **Consult `critic_sounding_board`** with the candidate marker question and surrounding spec context to check whether the question wording can be improved or the item can be resolved from existing context.
-2. **Apply the
+2. **Apply the Overconfidence guard:** If the critic supplies a `RESOLVE` verdict with a default answer, but that default is not directly supported by user request, spec, or recorded context, classify the item as `user_decision` rather than `self_resolved`.
 3. **Apply always-surface protection:** If the marker belongs to an always-surface category (scope boundaries, destructive behavior, security/privacy, backward compatibility, breaking API changes, new dependencies, deprecations, cross-platform impact, cost/performance tradeoffs, user-visible UX, rollout strategy, QA gates), the item MUST NOT receive `UNNECESSARY`/`DROP` from the critic — override to `APPROVED`/`ASK_USER`.
 
 Critic verdict mapping (see `src/agents/critic.ts` `SoundingBoardVerdict`): `UNNECESSARY`→DROP, `RESOLVE`→RESOLVE, `REPHRASE`→REPHRASE, `APPROVED`→ASK_USER.
````

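The verdict mapping and the two guards in the hunk above can be folded into a single per-marker decision. This is a sketch under stated assumptions:

* `decideMarker`, `DefaultSupport`, and `MarkerOutcome` are invented for illustration.
* Only the verdict-to-action table is taken verbatim from the text.
* Turning an ungrounded `RESOLVE` into `ASK_USER` is one reading of the Overconfidence guard. The reasoning is that a `user_decision` item is one only the user can settle.

```typescript
type SoundingBoardVerdict = "UNNECESSARY" | "RESOLVE" | "REPHRASE" | "APPROVED";
type FunnelAction = "DROP" | "RESOLVE" | "REPHRASE" | "ASK_USER";

// Mapping as stated in the CLARIFY-SPEC text.
const VERDICT_TO_ACTION: Record<SoundingBoardVerdict, FunnelAction> = {
  UNNECESSARY: "DROP",
  RESOLVE: "RESOLVE",
  REPHRASE: "REPHRASE",
  APPROVED: "ASK_USER",
};

// Hypothetical: where the critic's proposed default is grounded, if anywhere.
type DefaultSupport = "user_request" | "spec" | "recorded_context" | "unsupported";

interface MarkerOutcome {
  action: FunnelAction;
  classification?: "self_resolved" | "user_decision";
}

function decideMarker(
  verdict: SoundingBoardVerdict,
  alwaysSurface: boolean,
  defaultSupport: DefaultSupport = "unsupported",
): MarkerOutcome {
  let action = VERDICT_TO_ACTION[verdict];
  // Always-surface protection: these categories are never dropped.
  if (alwaysSurface && action === "DROP") action = "ASK_USER";
  if (action !== "RESOLVE") return { action };
  // Overconfidence guard: a default with no grounding goes to the user.
  return defaultSupport === "unsupported"
    ? { action: "ASK_USER", classification: "user_decision" }
    : { action: "RESOLVE", classification: "self_resolved" };
}
```

Defaulting `defaultSupport` to `"unsupported"` makes the conservative path the one taken when the caller forgets to say where a default came from.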
````diff
@@ -45,8 +45,9 @@ Flags parsed from signal:
 - WHAT users need and WHY — never HOW to implement
 - FR-### / SC-### numbering, Given/When/Then scenarios
 - No technology stack, APIs, or code structure
-
-
+- `[NEEDS CLARIFICATION]` markers only for items that survive the clarification funnel: inventory all material uncertainties without numeric cap → classify each (self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking) — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved` → consult critic_sounding_board — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER → record resolved items as assumptions → surface only survivors as markers with decision packet format (grouped by category, recommended defaults, blocking vs optional markers)
+- **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
+2. Cross-reference the spec against the issue's expected behavior to ensure alignment.
 3. If the issue is a bug: spec must describe the correct behavior, not the broken behavior.
 4. If the issue is a feature: spec must describe the user-facing outcome, not the implementation.
 5. QA GATE SELECTION: Ask user which QA gates to enable (same dialogue as MODE: SPECIFY). Write to `.swarm/context.md` under `## Pending QA Gate Selection`.
````

````diff
@@ -81,7 +81,7 @@ Classify each item as exactly one of:
 
 - `self_resolved`: answered from the user request, spec, plan, codebase reality check, `.swarm/context.md`, repo conventions, or an informed default. **If the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`.**
 - `critic_resolved`: sent to Critic Sounding Board and resolved by the critic.
-- `research_needed`: needs SME/explorer/domain lookup before user escalation.
+- `research_needed`: needs SME/explorer/domain lookup before user escalation. **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
 - `user_decision`: only the user can decide because it affects product scope, risk tolerance, policy, budget, UX, rollout, or destructive behavior.
 - `deferred_nonblocking`: useful follow-up detail that does not block a correct initial plan and can be explicitly recorded as an assumption or follow-up.
 
````

````diff
@@ -152,6 +152,12 @@ All items resolved in Stages 2-3 (self_resolved, critic_resolved, deferred_nonbl
 
 The plan generated by `save_plan` MUST include explicit assumptions and remaining unresolved decisions in the task descriptions or acceptance criteria — not silently omit them.
 
+#### Mechanical Enforcement of DROP Protection
+
+**Implementation Note:** The hard constraint against `DROP` on always-surface items (Stage 3 of the clarification funnel) is currently enforced via skill instructions to the architect. A lightweight runtime enforcement mechanism is recommended: when processing the critic sounding board verdict response in `src/agents/critic.ts`, validate that any items tagged as "always-surface" do not receive `UNNECESSARY`/`DROP` verdicts. If a DROP verdict is encountered on an always-surface item, override it to `APPROVED`/`ASK_USER` at the code level rather than relying solely on prompt-based enforcement.
+
+This mechanical enforcement prevents the following failure mode: the architect prompt instructs the override, but due to parsing errors, context limits, or model behavior variance, the DROP verdict is mistakenly applied to an always-surface item and silently accepted. The validation should occur in the decision-packet assembly code (when building the final clarification packet to surface to the user) and should emit a warning log when an override is applied.
+
 Use the `save_plan` tool to create the implementation plan. Required parameters:
 - `title`: The real project name from the spec (NOT a placeholder like [Project])
 - `swarm_id`: The swarm identifier (e.g. "mega", "local", "paid")
````

````diff
@@ -28,8 +28,9 @@ Activates when: user asks to "specify", "define requirements", "write a spec", o
 - Success criteria numbered SC-001, SC-002… — measurable and technology-agnostic
 - Key entities if data is involved (no schema or field definitions — entity names only)
 - Edge cases and known failure modes
-- `[NEEDS CLARIFICATION]` markers for items where uncertainty could change scope, security, or core behavior, BUT ONLY after running the clarification funnel: (1) inventory all material uncertainties without numeric cap, (2) classify each as self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking — **
-
+- `[NEEDS CLARIFICATION]` markers for items where uncertainty could change scope, security, or core behavior, BUT ONLY after running the clarification funnel: (1) inventory all material uncertainties without numeric cap, (2) classify each as self_resolved/critic_resolved/research_needed/user_decision/deferred_nonblocking — **Overconfidence guard:** if the default is not directly supported by user request, spec, or recorded context, classify as `user_decision` rather than `self_resolved`, (3) consult critic_sounding_board with candidate items — critic responds per SoundingBoardVerdict: UNNECESSARY→DROP, RESOLVE→RESOLVE, REPHRASE→REPHRASE, APPROVED→ASK_USER — **always-surface protection:** always-surface categories must not receive UNNECESSARY/DROP; override to APPROVED/ASK_USER, (4) record all resolved items as explicit assumptions in the spec, (5) use markers only for items that survive the funnel (ASK_USER or unresolved after critic consultation). Decision packet format: grouped by category, recommended defaults, blocking vs optional markers, impact of accepting default. Prefer informed defaults over asking
+- **Important:** If research is ongoing, monitor the timeout configured in `.swarm/config.json` under `research_needed_timeout_ms` (default: 300000ms / 5 minutes). If research does not complete before the timeout expires, automatically reclassify the item to `user_decision` with a note that research was incomplete, then surface it to the user. This prevents the clarification funnel from stalling while waiting for external research.
+5. Write the spec to `.swarm/spec.md`.
 5b. **QA GATE SELECTION, PARALLEL CODERS, COMMIT FREQUENCY, AND AUTO_PROCEED (dialogue only).**
 Ask the user which QA gates to enable for this plan, how many parallel coders to use, the commit frequency, and auto_proceed -- do not select on their behalf. Present all four items together as one unified exchange.
 
````

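The decision packet format listed in the hunk above has four elements:

* grouping by category
* recommended defaults
* blocking vs optional markers
* the impact of accepting the default

It could be rendered roughly as follows. `SurvivingItem` and `buildDecisionPacket` are hypothetical. The exact Markdown layout is an assumption, since the skill text names the packet's elements but not its layout.

```typescript
// Hypothetical shape for an item that survived the clarification funnel.
interface SurvivingItem {
  id: string;
  category: string;
  question: string;
  recommendedDefault: string;
  impactOfDefault: string;
  blocking: boolean;
}

// Render surviving items grouped by category, blocking markers first.
function buildDecisionPacket(items: SurvivingItem[]): string {
  const byCategory = new Map<string, SurvivingItem[]>();
  for (const item of items) {
    const group = byCategory.get(item.category) ?? [];
    group.push(item);
    byCategory.set(item.category, group);
  }
  const lines: string[] = [];
  for (const category of Array.from(byCategory.keys()).sort()) {
    lines.push(`### ${category}`);
    // Array.prototype.sort is stable, so input order is kept within each tier.
    const group = (byCategory.get(category) ?? [])
      .slice()
      .sort((a, b) => Number(b.blocking) - Number(a.blocking));
    for (const item of group) {
      const tag = item.blocking ? "BLOCKING" : "OPTIONAL";
      lines.push(`- [NEEDS CLARIFICATION] (${tag}) ${item.id}: ${item.question}`);
      lines.push(`  - Recommended default: ${item.recommendedDefault}`);
      lines.push(`  - Impact of accepting default: ${item.impactOfDefault}`);
    }
  }
  return lines.join("\n");
}
```

Listing blocking markers first within each category lets the user see at a glance which answers gate the plan and which can be accepted as defaults.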
````diff
@@ -78,6 +78,8 @@ The orchestrator may:
 - determine scope,
 - build or request the context pack,
 - launch explorers and triggered micro-lanes,
+- extract candidates from lane artifacts via `parse_lane_candidates` or equivalent parser,
+- filter, group, and chunk candidates for reviewer dispatch,
 - route candidates to reviewers,
 - route reviewer-confirmed findings to critics,
 - group validated findings,
````

````diff
@@ -88,7 +90,8 @@ The orchestrator MUST NOT:
 - re-read a candidate's target code to decide if it is valid,
 - silently downgrade or discard an explorer candidate,
 - treat tool output as a confirmed finding,
-- report a finding that no reviewer validated
+- report a finding that no reviewer validated,
+- classify or judge candidates based on preview text alone — always use the structured parser output.
 
 If the orchestrator catches itself validating code, it must stop and delegate validation to a reviewer subagent.
 
````

````diff
@@ -495,9 +498,43 @@ Launch all base lanes with `dispatch_lanes_async` when available. Pass the six l
 
 Before Phase 4 or synthesis, call `collect_lane_results` with `wait: true` for the base-lane batch and treat the collected `lane_results` as the join barrier. Missing, stale, cancelled, or failed base lanes are explicit review coverage gaps. If `dispatch_lanes_async` is unavailable, use blocking `dispatch_lanes`; if that is also unavailable, simulate isolated passes. Do not let one lane's conclusions bias another lane, and record unavailable deterministic dispatch in the validation gate.
 
-
-
-
+### Candidate extraction via parser
+
+After `collect_lane_results` returns for base lanes, process each lane result
+that carries an `output_ref`. The orchestrator MUST use the candidate parser
+rather than preview-text extraction:
+
+1. For each `output_ref` (or batched), call `parse_lane_candidates` (or the
+internal `parseAndPersist` module function) with `output_ref` and `producer`
+flags; the parser auto-detects the format family per row. The parser reads
+the full artifact from disk (no preview truncation issue) and returns
+structured `ParseResultWithSidecar` records.
+2. Filter the returned `candidates[]` array by `producer: "swarm-pr-review"` and
+the relevant `row_format_family` (e.g., `base_explorer` for base lanes,
+`micro_lane` for micro-lanes). Filtering happens on the parsed results, NOT
+on the tool input.
+3. Group the filtered candidates into reviewer-sized chunks:
+- by file area (group by the directory or module of the `file_line` field),
+- by category (group by the `category` field),
+- by count (target max 50 candidates per chunk; smaller chunks are fine).
+4. Dispatch reviewer lanes (one per chunk) with bounded in-context candidate
+lists. Each reviewer lane receives only the candidates from its assigned
+chunk.
+
+If a lane has `output_degraded: true`, `transcript_incomplete: true`, or no usable `output_ref`, record an explicit
+coverage gap and re-dispatch a narrower lane or mark affected candidates
+UNVERIFIED. Never infer candidate absence from a preview.
+
+**Fallback convention:** If the parser is unavailable, the explorer MAY emit
+`[CANDIDATE]` rows in the lane output as a fallback convention (see the
+Explorer Prompt Template at the end of this skill), but the orchestrator
+SHOULD use the parser as the primary extraction mechanism.
+
+**lane id uniqueness for parallel dispatches:** When re-dispatching failed or
+re-running explorer lanes, every `dispatch_lanes_async` or `dispatch_lanes`
+lane `id` MUST be unique within that dispatch batch and should include lane and
+attempt suffixes (e.g., `pr_review_explore_lane1_attempt2`). Never reuse an id
+in the same batch unless intentionally replacing that exact lane before dispatch.
 
 Explorers optimize for recall. Over-reporting is expected. Explorers produce candidates only.
 
````

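Steps 2 to 4 of the candidate extraction procedure, together with the lane id uniqueness rule, can be sketched as a pure function over already-parsed candidates. This sketch rests on several assumptions:

* The `Candidate` interface is a subset of the fields shown in the dry-run records.
* Grouping uses a combined directory-plus-category key. The text allows file area, category, or count.
* The `pr_review_reviewer_chunkN_attemptM` id scheme is invented. The text only requires ids to be unique within a batch.

`fileArea` and `chunkForReviewers` are hypothetical helpers, not package APIs.

```typescript
// Subset of the candidate fields shown in the dry-run records.
interface Candidate {
  candidate_id: string;
  producer: string;
  row_format_family: "base_explorer" | "micro_lane";
  category: string;
  file_line: string; // e.g. "src/utils/cache.ts:142"
}

interface ReviewerChunk {
  laneId: string;
  candidates: Candidate[];
}

const MAX_PER_CHUNK = 50; // "target max 50 candidates per chunk"

// Directory of "path/to/file.ts:123"; "." for files at the repository root.
function fileArea(fileLine: string): string {
  const path = fileLine.replace(/:\d+$/, "");
  const slash = path.lastIndexOf("/");
  return slash === -1 ? "." : path.slice(0, slash);
}

function chunkForReviewers(
  candidates: Candidate[],
  family: Candidate["row_format_family"],
  attempt = 1,
): ReviewerChunk[] {
  // Filter on the parsed results, then group by directory plus category.
  const groups = new Map<string, Candidate[]>();
  for (const c of candidates) {
    if (c.producer !== "swarm-pr-review" || c.row_format_family !== family) continue;
    const key = `${fileArea(c.file_line)}|${c.category}`;
    const group = groups.get(key) ?? [];
    group.push(c);
    groups.set(key, group);
  }
  // Split each group into chunks of at most MAX_PER_CHUNK; one reviewer lane per chunk.
  const chunks: ReviewerChunk[] = [];
  groups.forEach((group) => {
    for (let i = 0; i < group.length; i += MAX_PER_CHUNK) {
      // Hypothetical id scheme: the running counter keeps ids unique in the batch.
      const laneId = `pr_review_reviewer_chunk${chunks.length + 1}_attempt${attempt}`;
      chunks.push({ laneId, candidates: group.slice(i, i + MAX_PER_CHUNK) });
    }
  });
  return chunks;
}
```

For example, 120 `null-safety` candidates under `src/a/` plus one under `src/b/` produce four chunks of 50, 50, 20 and 1. Each chunk gets its own lane id.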
````diff
@@ -507,7 +544,7 @@ Explorers optimize for recall. Over-reporting is expected. Explorers produce can
 | Lane 2: Security and trust boundaries | Injection, authz/authn bypass, SSRF, path traversal, secret exposure, unsafe deserialization, prompt injection | untrusted input sources, sanitization, credential handling, permission boundary, private network access, output escaping |
 | Lane 3: Dependencies and deployment safety | Import changes, version bumps, lockfile drift, breaking APIs, package scripts, runtime assumptions | lockfile consistency, new transitive deps, Node/Bun/runtime compatibility, platform assumptions, license red flags |
 | Lane 4: Docs, intent, and drift | PR claims vs implementation, docs mismatch, migration/changelog gaps, stale examples | obligation mapping, changed behavior not documented, docs promising behavior not implemented |
-| Lane 5: Tests and falsifiability | Weak assertions, missing edge tests, flaky patterns, mock leakage, fixture drift | assertion strength, tautology patterns (`expect(true).toBe(true)`, `expect(res).toBeDefined()` without further checks, `assertDoesNotThrow` wrapping trivial code), negative paths, isolation, deterministic timing, cross-platform path coverage |
+| Lane 5: Tests and falsifiability | Weak assertions, missing edge tests, flaky patterns, mock leakage, fixture drift | assertion strength, tautology patterns (`expect(true).toBe(true)`, `expect(res).toBeDefined()` without further checks), `assertDoesNotThrow` wrapping trivial code), negative paths, isolation, deterministic timing, cross-platform path coverage |
 | Lane 6: Performance and architecture | Complexity regressions, memory leaks, over-coupling, inefficient graph scans, global mutable state | algorithmic deltas, caching, resource lifecycle, state ownership, architectural boundary violations |
 
 ### Explorer context contract
````

````diff
@@ -523,12 +560,19 @@ Every explorer must inspect or explicitly mark unavailable:
 7. relevant Swarm knowledge/evidence entries, if present.
 8. the commit range to analyze (`base_ref..head_ref`),
 
-Explorer output format
+### Explorer output format
+
+Explorers emit structured candidate records. The parser reads the full lane
+artifact and extracts these records. The canonical record shape is:
 
 ```text
 [CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence: LOW/MEDIUM/HIGH
 ```
 
+The parser normalizes this into a structured `candidates[]` array. If the
+parser is unavailable, the explorer MAY emit the `[CANDIDATE]` row format
+directly in the lane output as a fallback convention.
+
 Explorers must not use `CONFIRMED`, `DISPROVED`, or `PRE_EXISTING`.
 
 ---
````

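When the parser is unavailable, the fallback `[CANDIDATE]` row in the hunk above has a fixed, pipe-delimited shape of ten fields. A minimal reader for that shape is sketched below, with these limits:

* It is not the package's `parse_lane_candidates` implementation.
* It assumes no field contains a literal `|`.
* It does not reproduce the parser's format-family detection.

`parseCandidateRow` and `parseCandidateRows` are hypothetical names.

```typescript
// A minimal reader for the fallback [CANDIDATE] row format; not the package parser.
type Confidence = "LOW" | "MEDIUM" | "HIGH";

interface CandidateRow {
  candidate_id: string;
  lane: string;
  severity: string;
  category: string;
  file_line: string;
  claim: string;
  evidence_summary: string;
  impact_context: string;
  confidence: Confidence;
}

// Parse one row; returns null when the row does not have the expected shape.
function parseCandidateRow(line: string): CandidateRow | null {
  const parts = line.split("|").map((p) => p.trim());
  if (parts.length !== 10 || parts[0] !== "[CANDIDATE]") return null;
  const match = /^confidence:\s*(LOW|MEDIUM|HIGH)$/.exec(parts[9]);
  if (!match) return null;
  const [, candidate_id, lane, severity, category, file_line, claim, evidence_summary, impact_context] = parts;
  return {
    candidate_id, lane, severity, category, file_line,
    claim, evidence_summary, impact_context,
    confidence: match[1] as Confidence,
  };
}

// Scan lane output: ignore non-candidate lines, count malformed candidate rows.
function parseCandidateRows(text: string): { candidates: CandidateRow[]; malformed: number } {
  const candidates: CandidateRow[] = [];
  let malformed = 0;
  for (const line of text.split("\n")) {
    if (!line.trimStart().startsWith("[CANDIDATE]")) continue;
    const row = parseCandidateRow(line);
    if (row) candidates.push(row);
    else malformed++;
  }
  return { candidates, malformed };
}
```

Counting malformed rows instead of silently skipping them mirrors the skill's rule that absence of candidates must never be inferred from incomplete output.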
````diff
@@ -537,7 +581,7 @@ Explorers must not use `CONFIRMED`, `DISPROVED`, or `PRE_EXISTING`.
 
 After `collect_lane_results` returns for base lanes, inspect the context pack risk triggers. Launch focused micro-lanes for triggered categories only, using `dispatch_lanes_async` again when more than one read-only micro-lane is needed. Collect every micro-lane batch with `wait: true` before reviewer classification. Do not launch irrelevant micro-lanes.
 
-Apply the same
+Apply the same parser-based extraction to micro-lanes: call `parse_lane_candidates` on each micro-lane `output_ref` (filter the returned `candidates[]` array by `row_format_family === "micro_lane"` after parsing), and treat degraded or incomplete lane artifacts as UNVERIFIED coverage rather than as clean negative evidence.
 
 Each micro-lane receives:
 
````

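The rule that degraded or incomplete lane artifacts count as UNVERIFIED coverage, never as clean negative evidence, can be applied before any parsing. The sketch makes these assumptions:

* Lane results carry the `status`, `output_ref`, and `output_degraded` fields shown in the dry-run JSON later in this diff.
* `transcript_incomplete` is named in the skill text, but its placement on the lane result object is assumed.
* Any status other than `"completed"` is treated as a gap.

`triageLaneResults`, `gapReason`, and the return shape are hypothetical.

```typescript
// Fields mirror the lane_results entries in the dry-run JSON; the placement of
// transcript_incomplete on this object is an assumption.
interface LaneResult {
  lane_id: string;
  status: string;
  output_ref?: string;
  output_degraded?: boolean;
  transcript_incomplete?: boolean;
}

interface CoverageGap {
  lane_id: string;
  reason: string;
}

interface LaneTriage {
  parseable: { lane_id: string; output_ref: string }[];
  coverageGaps: CoverageGap[]; // reported as UNVERIFIED, never as "no findings"
}

// Return why a lane cannot be trusted as evidence, or null if it is parseable.
function gapReason(r: LaneResult): string | null {
  if (r.status !== "completed") return `status ${r.status}`;
  if (!r.output_ref) return "no usable output_ref";
  if (r.output_degraded) return "output_degraded";
  if (r.transcript_incomplete) return "transcript_incomplete";
  return null;
}

function triageLaneResults(results: LaneResult[]): LaneTriage {
  const triage: LaneTriage = { parseable: [], coverageGaps: [] };
  for (const r of results) {
    const reason = gapReason(r);
    if (reason !== null) triage.coverageGaps.push({ lane_id: r.lane_id, reason });
    else triage.parseable.push({ lane_id: r.lane_id, output_ref: r.output_ref as string });
  }
  return triage;
}
```

Only the `parseable` entries would then be handed to the candidate parser. Every `coverageGaps` entry becomes either a narrower re-dispatch or an UNVERIFIED note in the report.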
````diff
@@ -547,7 +591,8 @@ Each micro-lane receives:
 - relevant deterministic signals,
 - related historical knowledge with quarantine/staleness status,
 - expected invariants,
-- output
+- structured candidate output (parser-extracted). If the parser is unavailable,
+the micro-lane MAY emit `[CANDIDATE]` rows as a fallback convention.
 
 ### Swarm plugin risk trigger map
 
````

````diff
@@ -596,7 +641,12 @@ Verifier output is advisory until incorporated by the independent reviewer or cr
 
 ## Phase 6: Independent Reviewer Confirmation
 
-Route candidates to reviewer subagents. The
+Route candidates to reviewer subagents. The orchestrator routes candidates
+in bounded chunks produced by the parser-based extraction in Phase 3-4. Each
+reviewer lane receives a bounded list of candidates from a single chunk — by
+file area, category, or count — not the full candidate set. The reviewer must
+re-read the candidate's file:line evidence and relevant context pack entries
+directly.
 
 ### Noise budget and universal validation
 
````

@@ -813,6 +863,245 @@ Update the verdict only after re-verifying all previously blocking findings.
|
|
|
813
863
|
|
|
814
864
|
---
|
|
815
865
|
|
|
866
|
+
## Dry-Run: Parser-Based Candidate Extraction
|
|
867
|
+
|
|
868
|
+
This section demonstrates the new parser-based extraction path end-to-end
|
|
869
|
+
using synthetic data. It is concrete enough to implement the same pattern in
|
|
870
|
+
another skill.
|
|
871
|
+
|
|
872
|
+
### Scenario
|
|
873
|
+
|
|
874
|
+
A PR review has dispatched six base explorer lanes via `dispatch_lanes_async`.
|
|
875
|
+
The batch completed and `collect_lane_results` returned:
|
|
876
|
+
|
|
877
|
+
```json
|
|
878
|
+
{
|
|
879
|
+
"batch_id": "batch-a1b2c3",
|
|
880
|
+
"lane_results": [
|
|
881
|
+
{
|
|
882
|
+
"lane_id": "pr_review_lane1_correctness",
|
|
883
|
+
"status": "completed",
|
|
884
|
+
"output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
|
|
885
|
+
"output_degraded": false
|
|
886
|
+
},
|
|
887
|
+
{
|
|
888
|
+
"lane_id": "pr_review_lane2_security",
|
|
889
|
+
"status": "completed",
|
|
890
|
+
"output_ref": ".swarm/lane-results/batch-a1b2c3/lane-2/out-def456.json",
|
|
891
|
+
"output_degraded": false
|
|
892
|
+
}
|
|
893
|
+
]
|
|
894
|
+
}
|
|
895
|
+
```
|
|
896
|
+
|
|
897
|
+
### Step 1 — Call the parser
|
|
898
|
+
|
|
899
|
+
The orchestrator calls `parse_lane_candidates` for each `output_ref`:
|
|
900
|
+
|
|
901
|
+
```json
|
|
902
|
+
{
|
|
903
|
+
"tool": "parse_lane_candidates",
|
|
904
|
+
"arguments": {
|
|
905
|
+
"output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
|
|
906
|
+
"producer": "swarm-pr-review"
|
|
907
|
+
}
|
|
908
|
+
}
|
|
909
|
+
```
|
|
910
|
+
|
|
911
|
+
### Step 2 — Structured response
|
|
912
|
+
|
|
913
|
+
The parser returns a `ParseResultWithSidecar`. On success, `error` and `error_code` are absent:
|
|
914
|
+
|
|
915
|
+
```json
|
|
916
|
+
{
|
|
917
|
+
"candidates": [
|
|
918
|
+
{
|
|
919
|
+
"record_type": "candidate",
|
|
920
|
+
"row_format_family": "base_explorer",
|
|
921
|
+
"row_format_version": 1,
|
|
922
|
+
"record_version": { "major": 1, "minor": 0 },
|
|
923
|
+
"source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
|
|
924
|
+
"source_batch_id": "B-2025-06-22-001",
|
|
925
|
+
"source_lane_id": "explorer-1",
|
|
926
|
+
"source_agent": "paid_explorer",
|
|
927
|
+
"source_digest": "sha256:abc123def456...",
|
|
928
|
+
"extracted_from_partial_source": false,
|
|
929
|
+
"sessionId": "ses_01HXYZ...",
|
|
930
|
+
"parentSessionId": "ses_01HABC...",
|
|
931
|
+
"producer": "swarm-pr-review",
|
|
932
|
+
"candidate_id": "C-001",
|
|
933
|
+
"lane": "Lane 1: Correctness and edge cases",
|
|
934
|
+
"micro_lane": null,
|
|
935
|
+
"severity": "HIGH",
|
|
936
|
+
"category": "null-safety",
|
|
937
|
+
"file_line": "src/utils/cache.ts:142",
|
|
938
|
+
      "claim": "Uncached getter may return undefined on cold start",
      "evidence_summary": "The `getCached` function returns `cache[key]` without a fallback when the cache is empty.",
      "impact_context": "Downstream callers in `src/handlers/*.ts` expect a defined value and call `.toString()` directly.",
      "invariant_violated": null,
      "confidence": "HIGH"
    },
    {
      "record_type": "candidate",
      "row_format_family": "base_explorer",
      "row_format_version": 1,
      "record_version": { "major": 1, "minor": 0 },
      "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
      "source_batch_id": "B-2025-06-22-001",
      "source_lane_id": "explorer-1",
      "source_agent": "paid_explorer",
      "source_digest": "sha256:abc123def456...",
      "extracted_from_partial_source": false,
      "sessionId": "ses_01HXYZ...",
      "parentSessionId": "ses_01HABC...",
      "producer": "swarm-pr-review",
      "candidate_id": "C-002",
      "lane": "Lane 1: Correctness and edge cases",
      "micro_lane": null,
      "severity": "MEDIUM",
      "category": "async-ordering",
      "file_line": "src/services/queue.ts:88",
      "claim": "Race between `drain` and `processNext` may drop items",
      "evidence_summary": "`drain` sets `active = false` before awaiting `processNext`, which also checks `active`.",
      "impact_context": "Items submitted during the drain window are silently dropped.",
      "invariant_violated": null,
      "confidence": "MEDIUM"
    }
  ],
  "invocation_envelope": {
    "record_type": "invocation",
    "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/out-abc123.json",
    "source_batch_id": "B-2025-06-22-001",
    "source_lane_id": "explorer-1",
    "source_agent": "paid_explorer",
    "source_digest": "sha256:abc123def456...",
    "row_format_version": 1,
    "record_version": { "major": 1, "minor": 0 },
    "sessionId": "ses_01HXYZ...",
    "parentSessionId": "ses_01HABC...",
    "producer": "swarm-pr-review",
    "produced_at": "2025-06-22T14:30:00.000Z",
    "format_families_detected": ["base_explorer"],
    "candidate_count": 2,
    "parse_errors": 2,
    "malformed_rows": 0
  },
  "diagnostics": {
    "candidate_count": 2,
    "parse_errors": 2,
    "parse_error_details": [
      {
        "row_index": 0,
        "field": "row",
        "message": "Both format-family discriminators present; defaulting to base_explorer"
      },
      {
        "row_index": 1,
        "field": "row",
        "message": "Both format-family discriminators present; defaulting to base_explorer"
      }
    ],
    "malformed_rows": 0,
    "duplicate_id_count": 0,
    "duplicate_id_warnings": [],
    "degraded_source_count": 0,
    "incomplete_source_count": 0,
    "format_families_detected": ["base_explorer"]
  }
}
```

> **Note**: `parse_errors: 2` reflects FR-017/SC-017 position-based detection. When a `[CANDIDATE]` row has both `evidence_summary` and `impact_context` populated, the parser adds one `parse_error_details` entry for that row, with `field: "row"` and `message: "Both format-family discriminators present; defaulting to base_explorer"`. The row is still extracted as a candidate, which is why this example reports `candidate_count: 2` and `malformed_rows: 0` alongside the two errors. This is documented behavior, not a parser bug. To get `parse_errors: 0` with the row format, leave one of the two fields empty; to avoid the warning entirely, emit structured JSON candidate records instead.

On refusal (e.g. `output_ref` does not exist), `error` and `error_code` are present and `candidates` is `[]`. For traceability, `invocation_envelope` and `diagnostics` are still returned: the requested `source_output_ref` and a timestamp are recorded, while the other provenance fields are empty strings and all counts are zero:

```json
{
  "error": "Artifact reference not found in store",
  "error_code": "ref-not-found",
  "candidates": [],
  "invocation_envelope": {
    "record_type": "invocation",
    "source_output_ref": ".swarm/lane-results/batch-a1b2c3/lane-1/missing.json",
    "source_batch_id": "",
    "source_lane_id": "",
    "source_agent": "",
    "source_digest": "",
    "row_format_version": 1,
    "record_version": { "major": 1, "minor": 0 },
    "produced_at": "2025-06-22T14:30:00.000Z",
    "format_families_detected": [],
    "candidate_count": 0,
    "parse_errors": 0,
    "malformed_rows": 0
  },
  "diagnostics": {
    "candidate_count": 0,
    "parse_errors": 0,
    "parse_error_details": [],
    "malformed_rows": 0,
    "duplicate_id_count": 0,
    "duplicate_id_warnings": [],
    "degraded_source_count": 0,
    "incomplete_source_count": 0,
    "format_families_detected": []
  }
}
```
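Because a refusal returns the same envelope and diagnostics shape as a success, with `candidates: []`, a caller can most reliably tell the two apart by the presence of `error_code`; an empty `candidates` array on its own could presumably also come from a successful parse of an artifact with no candidate rows. The TypeScript sketch below assumes response types transcribed from the two examples above. `ParseResponse`, `Outcome`, and `classify` are hypothetical names, not the package's exported API.

```typescript
// Hypothetical response shape transcribed from the JSON examples above.
// Only the fields this sketch reads are typed; the real tool may return more.
interface ParseErrorDetail {
  row_index: number;
  field: string;
  message: string;
}

interface ParseResponse {
  candidates: unknown[];
  invocation_envelope: { record_type: "invocation"; source_output_ref: string };
  diagnostics: {
    candidate_count: number;
    parse_errors: number;
    parse_error_details: ParseErrorDetail[];
    malformed_rows: number;
  };
  // Present only when the tool refuses, e.g. error_code "ref-not-found".
  error?: string;
  error_code?: string;
}

type Outcome =
  | { kind: "refused"; code: string; message: string }
  | { kind: "ok"; count: number; warnings: string[] };

// Branch on error_code: an empty candidates array alone does not
// distinguish a refusal from a successful parse that found nothing.
function classify(r: ParseResponse): Outcome {
  if (r.error_code !== undefined) {
    return { kind: "refused", code: r.error_code, message: r.error ?? "" };
  }
  return {
    kind: "ok",
    count: r.candidates.length,
    // Surface non-fatal parse warnings, such as the discriminator notice.
    warnings: r.diagnostics.parse_error_details.map(
      (d) => `row ${d.row_index}: ${d.message}`,
    ),
  };
}
```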

### Step 3 — Filter and group

The orchestrator filters the returned `candidates[]` array by `producer: "swarm-pr-review"` and `row_format_family` (e.g. `base_explorer` or `micro_lane`), then groups
the candidates. In this synthetic example, the two candidates above are grouped
by file area:

- **Chunk A — `src/utils/`** (1 candidate): C-001
- **Chunk B — `src/services/`** (1 candidate): C-002

If there were more candidates, the orchestrator would also group by category
(e.g., `null-safety`, `async-ordering`) and cap each chunk at 50 candidates.
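One way to implement the filter, group, and cap steps is sketched below in TypeScript. It assumes the candidate field names from the JSON example, treats the directory portion of `file_line` as the "file area", groups on area and category together (for the two-candidate example this yields the same two chunks as grouping by area alone), and splits any oversized group into slices of at most 50. `buildChunks` and `fileArea` are hypothetical helpers, not functions exported by opencode-swarm.

```typescript
// Hypothetical candidate shape: only the fields this sketch reads.
interface Candidate {
  candidate_id: string;
  producer: string;
  row_format_family: string;
  category: string;
  file_line: string; // e.g. "src/utils/cache.ts:142"
}

interface Chunk {
  area: string;
  category: string;
  candidates: Candidate[];
}

const MAX_CHUNK_SIZE = 50; // cap stated in the text above

// "src/utils/cache.ts:142" -> "src/utils/" (strip the line suffix, keep the directory).
function fileArea(fileLine: string): string {
  const path = fileLine.replace(/:\d+$/, "");
  const slash = path.lastIndexOf("/");
  return slash >= 0 ? path.slice(0, slash + 1) : "./";
}

function buildChunks(all: Candidate[], families: string[]): Chunk[] {
  const groups = new Map<string, Chunk>();
  for (const c of all) {
    // Filter: keep only this skill's records in the requested format families.
    if (c.producer !== "swarm-pr-review" || families.indexOf(c.row_format_family) < 0) {
      continue;
    }
    const area = fileArea(c.file_line);
    const key = `${area}|${c.category}`;
    let group = groups.get(key);
    if (group === undefined) {
      group = { area, category: c.category, candidates: [] };
      groups.set(key, group);
    }
    group.candidates.push(c);
  }
  // Split any group larger than the cap into consecutive slices.
  const chunks: Chunk[] = [];
  groups.forEach((g) => {
    for (let i = 0; i < g.candidates.length; i += MAX_CHUNK_SIZE) {
      chunks.push({ ...g, candidates: g.candidates.slice(i, i + MAX_CHUNK_SIZE) });
    }
  });
  return chunks;
}
```

A `Map` keeps insertion order, so chunks come out in the order their first candidate was seen, which keeps dispatch order stable across runs on the same input.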

### Step 4 — Dispatch reviewer lanes

The orchestrator dispatches one reviewer lane per chunk:

```text
You are the independent reviewer. Validate only the candidates assigned below.
Do not search for new issues except where needed to validate reachability or
mitigation. Do not trust explorer severity.

Context pack summary:
- scope: ...
- obligations: ...
- impact cone: ...
- deterministic signals: ...
- relevant Swarm artifacts / knowledge: ...
- base_ref: <commit SHA of base branch>
- head_ref: <commit SHA of PR head branch>

Candidates (Chunk A — src/utils/):
- C-001 | HIGH | null-safety | src/utils/cache.ts:142 | Uncached getter may return undefined on cold start

For each candidate, return:
[REVIEWED] | candidate_id | CONFIRMED/DISPROVED/UNVERIFIED/PRE_EXISTING | evidence_type | final_severity | introduced_by_pr | file:line | rationale | falsification_probe | reviewer_id

You must check caller context, reachability, schema/middleware/framework mitigations, state-machine constraints, test coverage, PR-introducedness, and severity.

IMPORTANT: If a finding claims behavior is "new" or "introduced by the PR", you MUST read the equivalent code on the base branch (git show <base_ref>:<file>) to verify it was not present before. A reviewer claim of "this is new" is invalid without base-branch evidence. Do not compare the new code to an idealized baseline — compare it to what actually existed on the base branch at the time of the PR.
```
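Reviewer output comes back as pipe-delimited `[REVIEWED]` rows. A minimal TypeScript sketch of how an orchestrator might validate them, assuming exactly ten pipe-separated fields and no literal `|` inside any field (the prompt does not define an escaping rule). `parseReviewedRow` and `ReviewedRow` are hypothetical names:

```typescript
const VERDICTS = ["CONFIRMED", "DISPROVED", "UNVERIFIED", "PRE_EXISTING"] as const;
type Verdict = (typeof VERDICTS)[number];

interface ReviewedRow {
  candidate_id: string;
  verdict: Verdict;
  evidence_type: string;
  final_severity: string;
  introduced_by_pr: string;
  file_line: string;
  rationale: string;
  falsification_probe: string;
  reviewer_id: string;
}

function isVerdict(s: string): s is Verdict {
  return (VERDICTS as readonly string[]).indexOf(s) >= 0;
}

// Parse one reviewer row; returns null for anything that is not a
// well-formed [REVIEWED] row with one of the four allowed verdicts.
// Assumes no field contains a literal "|".
function parseReviewedRow(line: string): ReviewedRow | null {
  const parts = line.split("|").map((p) => p.trim());
  if (parts.length !== 10 || parts[0] !== "[REVIEWED]") return null;
  const [, candidate_id, verdict, evidence_type, final_severity, introduced_by_pr,
    file_line, rationale, falsification_probe, reviewer_id] = parts;
  if (!isVerdict(verdict)) return null;
  return {
    candidate_id, verdict, evidence_type, final_severity, introduced_by_pr,
    file_line, rationale, falsification_probe, reviewer_id,
  };
}
```

Returning `null` rather than throwing lets the orchestrator count rejected rows and re-prompt the reviewer instead of aborting the whole chunk.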

### Key invariants

- The parser reads the **full artifact**, not a preview. Truncation in the
`dispatch_lanes` preview does not affect candidate extraction.
- The orchestrator never classifies candidates — it only filters, groups, and
routes them.
- Each reviewer receives a bounded chunk. A chunk with more than 50 candidates
is split before dispatch.
- The `invocation_envelope` in the parser response provides audit provenance
for every extracted candidate.
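The provenance invariant can be checked mechanically. Assuming that one `parse_lane_candidates` call reads a single artifact, every candidate should carry the same `source_*` values as the `invocation_envelope`, as they do in the example output. The hypothetical helper below, `provenanceMismatches`, returns the IDs of candidates that break that assumption:

```typescript
// The five source_* fields that appear on both candidate records and the
// invocation envelope in the example output above.
const PROVENANCE_KEYS = [
  "source_output_ref",
  "source_batch_id",
  "source_lane_id",
  "source_agent",
  "source_digest",
] as const;

type Provenance = Record<(typeof PROVENANCE_KEYS)[number], string>;

// Returns the candidate_ids whose provenance differs from the envelope's.
// Assumes one parse call covers exactly one artifact.
function provenanceMismatches(
  envelope: Provenance,
  candidates: Array<Provenance & { candidate_id: string }>,
): string[] {
  return candidates
    .filter((c) => PROVENANCE_KEYS.some((k) => c[k] !== envelope[k]))
    .map((c) => c.candidate_id);
}
```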

---

# Council Mode Workflow

Council mode is opt-in only and adversarial.
@@ -1116,4 +1405,10 @@ Return:
[CANDIDATE] | candidate_id | lane | severity | category | file:line | claim | evidence_summary | impact_context | confidence
```

The orchestrator extracts candidates from the full lane artifact via
`parse_lane_candidates` as the primary mechanism. The `[CANDIDATE]` row
format above is a fallback convention for environments where the parser is
unavailable. Explorers should still emit structured records regardless of
whether the parser is present.
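Where the parser is unavailable, the base `[CANDIDATE]` row format can still be read with a simple split. The TypeScript sketch below handles only the ten-column row shown above (micro-lane rows may differ), counts rows with the wrong column count as malformed, and, mirroring the behavior described in the note on `parse_errors` earlier in this section, records a warning when both `evidence_summary` and `impact_context` are non-empty. `parseCandidateRows` is a hypothetical name, and the real parser's row indexing and output fields may differ.

```typescript
interface RowCandidate {
  candidate_id: string;
  lane: string;
  severity: string;
  category: string;
  file_line: string;
  claim: string;
  evidence_summary: string;
  impact_context: string;
  confidence: string;
}

interface RowParseResult {
  candidates: RowCandidate[];
  warnings: { row_index: number; message: string }[];
  malformed_rows: number;
}

const BOTH_DISCRIMINATORS =
  "Both format-family discriminators present; defaulting to base_explorer";

// Fallback reader for the ten-column [CANDIDATE] row format.
// Assumes fields contain no literal "|"; non-candidate lines are skipped.
function parseCandidateRows(text: string): RowParseResult {
  const result: RowParseResult = { candidates: [], warnings: [], malformed_rows: 0 };
  let rowIndex = 0;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (!line.startsWith("[CANDIDATE]")) continue;
    const index = rowIndex++;
    const parts = line.split("|").map((p) => p.trim());
    if (parts.length !== 10) {
      result.malformed_rows++;
      continue;
    }
    const [, candidate_id, lane, severity, category, file_line, claim,
      evidence_summary, impact_context, confidence] = parts;
    // Mirrors the documented warning: both discriminator fields are set.
    if (evidence_summary !== "" && impact_context !== "") {
      result.warnings.push({ row_index: index, message: BOTH_DISCRIMINATORS });
    }
    result.candidates.push({
      candidate_id, lane, severity, category, file_line, claim,
      evidence_summary, impact_context, confidence,
    });
  }
  return result;
}
```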

Do not let speed degrade validation quality.

package/README.md CHANGED
@@ -800,6 +800,7 @@ Every candidate passes a 3-gate pipeline before entering quarantine:
| mutation_test | Applies LLM-generated mutation patches to source files and runs tests to measure kill rate; verdict is pass/warn/fail based on configurable thresholds; used by the mutation_test gate (opt-in, off by default) |
| generate_mutants | Architect-only: generates LLM-based mutation patches (5–10 per function across 6 types: off-by-one, null substitution, operator swap, guard removal, branch swap, side-effect deletion) for direct consumption by the mutation_test tool; returns SKIP verdict on LLM failure rather than throwing |
| write_mutation_evidence | Architect-only: writes mutation gate results atomically to `.swarm/evidence/{phase}/mutation-gate.json`; accepts verdict (PASS/WARN/FAIL/SKIP), kill rate metrics, and optional survived mutant details; normalizes uppercase-to-lowercase before persisting |
| parse_lane_candidates | Architect-only: parses `[CANDIDATE]` rows from a `dispatch_lanes` or `collect_lane_results` artifact by `output_ref`; produces structured records with provenance and optional sidecar JSONL persistence; returns `ParseResultWithSidecar` on success or `{ error, error_code, candidates: [] }` on refusal |
| git_blame | Per-line git blame metadata (sha, author, date, summary) via `git blame --porcelain`; supports optional line range filtering |
| diff | Structured git diff with contract change detection; supports `summaryOnly` mode returning file list with additions/deletions counts |
| suggest_patch | Reviewer-safe structured patch suggestion; supports `format` parameter ('json' or 'unified') where unified outputs valid unified diff with `diff --git` headers, hunks, and context |