@bastani/atomic 0.8.29-alpha.2 → 0.8.29-alpha.3

package/CHANGELOG.md CHANGED
@@ -11,6 +11,9 @@
11
11
 
12
12
  ### Changed
13
13
 
14
+ - Changed the bundled builtin `ralph` workflow to run `/skill:prompt-engineer` prompt-engineering and `/skill:research-codebase` research before orchestration instead of starting with an RFC/planner stage, pass the research artifact as primary implementation context, reuse prior research session data on follow-up loops, and feed unresolved reviewer findings into later research passes ([#1371](https://github.com/bastani-inc/atomic/issues/1371)).
15
+ - Changed bundled `goal`, `ralph`, and `open-claude-design` decision gates to use schema-backed workflow `structured_output` stages instead of registering bespoke terminating custom tools.
16
+ - Changed bundled `goal` worker/reviewer prompts and `ralph` orchestrator/reviewer prompts to request end-to-end verification when practical, using browser-skilled subagents for web/frontend flows that may depend on backend/API behavior and tmux-skilled subagents for TUI or terminal-app scenarios.
14
17
  - Bumped the bundled upstream pi runtime libraries `@earendil-works/pi-agent-core`, `@earendil-works/pi-ai`, and `@earendil-works/pi-tui` from `^0.79.1` to `^0.79.3`, bringing the latest upstream provider, model, agent-core, and TUI compatibility fixes into `@bastani/atomic`.
15
18
  - Updated the structured-output extension example and SDK/workflow/extension docs to use the canonical factory instead of hand-rolled `terminate: true` wrappers, and documented that schema-specific calls pass fields directly rather than through `{ value: ... }` ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
16
19
  - Enforced top-level object schemas for `structured_output` factory registration, with guidance to wrap array or primitive final values in object fields, made custom-named factory tools advertise the configured name in prompt metadata, and documented that text print mode recognizes factory-created custom structured-output tools without treating every terminating tool as printable ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
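The top-level object requirement in the entry above can be illustrated with a small sketch. It uses plain JSON Schema objects rather than the package's factory, and `isTopLevelObjectSchema` and `wrapInObjectField` are hypothetical helper names chosen for this example, not part of the `@bastani/atomic` API.

```typescript
// Hypothetical helpers for illustration only; not the @bastani/atomic API.
type JsonSchema = { type?: string | string[]; [key: string]: unknown };

// A schema counts as "top-level object" here when its `type` is exactly "object".
function isTopLevelObjectSchema(schema: JsonSchema): boolean {
  return schema.type === "object";
}

// Wrap an array or primitive schema in a single required object field,
// leaving schemas that are already objects untouched.
function wrapInObjectField(schema: JsonSchema, field: string): JsonSchema {
  if (isTopLevelObjectSchema(schema)) return schema;
  return {
    type: "object",
    additionalProperties: false,
    required: [field],
    properties: { [field]: schema },
  };
}

// An array result cannot be the tool-argument schema directly...
const itemsSchema: JsonSchema = { type: "array", items: { type: "string" } };
// ...so it is wrapped as { items: [...] } before registration.
const wrapped = wrapInObjectField(itemsSchema, "items");
console.log(JSON.stringify(wrapped, null, 2));
```

The wrapped form matches the `{ items: [...] }` shape the workflow docs suggest for array results; the field name is a free choice.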
@@ -27,6 +30,10 @@
27
30
  - Fixed prerelease publishing for native Atomic artifacts by allowing the `@bastani/atomic-natives` package metadata in release-preparation verification, running native artifact builds on architecture-matched Blacksmith and macOS runners, and documenting the two-package publish flow while keeping npm provenance publishing on GitHub-hosted Ubuntu.
28
31
  - Fixed the bundled experimental Cursor provider to honor per-request stream deadlines across open/read/resume writes, reset timed-out or aborted streams, clean up replaced paused turns safely, catch cleanup cancellation failures, tolerate non-MCP Cursor exec protocol messages without ending assistant turns, and align Run requests with Cursor's private CLI protocol by using blob/KV conversation state plus request-context tool-definition responses without the unsupported custom system-prompt field ([#1286](https://github.com/bastani-inc/atomic/issues/1286)).
29
32
  - Fixed release archive startup for the bundled experimental Cursor provider by declaring `@bufbuild/protobuf` as an `@bastani/atomic` runtime dependency, covering Cursor in the bundled-package dependency metadata guard, and smoke-checking Cursor/protobuf assets in native archives ([#1286](https://github.com/bastani-inc/atomic/issues/1286)).
33
+ - Fixed bundled `ralph` skill-prompt stages to invoke bundled skills through `/skill:<name>` expansion so prompt engineering and research stages receive the intended skill instructions.
34
+ - Fixed concurrent bundled workflow stage resource reloads to serialize temporary subagent child environment isolation so parallel stage startup cannot leave parent process child flags accidentally cleared.
35
+ - Fixed bundled workflow stage sessions to keep workflow package skills (`create-spec`, `impeccable`, `prompt-engineer`, `research-codebase`, and `skill-creator`) available while disabling only the recursive workflows extension in child sessions.
36
+ - Fixed bundled workflow stage resource discovery so bundled subagent definitions stay available, `subagent` is active by default with the same two-hop nesting budget as main chat, and explicitly allowlisted bundled extension tools such as `subagent`, `web_search`, `fetch_content`, and `intercom` remain visible even when a workflow is launched from a subagent child process.
30
37
 
31
38
  ### Security
32
39
 
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@bastani/cursor",
3
- "version": "0.8.29-alpha.2",
3
+ "version": "0.8.29-alpha.3",
4
4
  "private": true,
5
5
  "description": "Experimental first-party Atomic extension for Cursor OAuth, model discovery, and streaming provider registration.",
6
6
  "contributors": [
@@ -40,7 +40,7 @@
40
40
  }
41
41
  },
42
42
  "dependencies": {
43
- "@bastani/atomic-natives": "0.8.29-alpha.2",
43
+ "@bastani/atomic-natives": "0.8.29-alpha.3",
44
44
  "@bufbuild/protobuf": "^2.0.0"
45
45
  }
46
46
  }
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@bastani/intercom",
3
- "version": "0.8.29-alpha.2",
3
+ "version": "0.8.29-alpha.3",
4
4
  "private": true,
5
5
  "description": "Atomic extension providing a private coordination channel between parent and child agent sessions. Fork of: https://github.com/nicobailon/pi-intercom",
6
6
  "contributors": [
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@bastani/mcp",
3
- "version": "0.8.29-alpha.2",
3
+ "version": "0.8.29-alpha.3",
4
4
  "private": true,
5
5
  "description": "Atomic extension that adapts MCP (Model Context Protocol) servers into the coding agent. Fork of: https://github.com/nicobailon/pi-mcp-adapter",
6
6
  "contributors": [
@@ -13,6 +13,7 @@
13
13
 
14
14
  - Fixed subagent `outputSchema` readback to accept cross-process captures only when flat `output.json` params validate against the schema and `output.meta.json` sidecar metadata matches the final successful terminating `structured_output` transcript action, rejecting missing metadata, stale captures with later assistant/tool-result messages, sibling tool calls, duplicate structured-output calls, mismatched call IDs/names, and error tool results while ignoring benign `custom` host annotations around the final tool result ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
15
15
  - Fixed explicit empty child `tools: []` allowlists with `outputSchema` to pass only `--tools structured_output`, keeping the restricted child from regaining default tools while still enabling the required final-answer channel ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
16
+ - Fixed workflow-stage subagent depth handling so bundled workflow stages inherit the main-chat two-hop subagent nesting budget while preserving stricter configured limits, and updated the nested-depth rejection message to describe the maximum-depth condition ([#1372](https://github.com/bastani-inc/atomic/pull/1372)).
16
17
 
17
18
  ## [0.8.28] - 2026-06-11
18
19
 
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@bastani/subagents",
3
- "version": "0.8.29-alpha.2",
3
+ "version": "0.8.29-alpha.3",
4
4
  "private": true,
5
5
  "description": "Atomic extension for delegating tasks to subagents with chains, parallel execution, and TUI clarification. Fork of: https://github.com/nicobailon/pi-subagents",
6
6
  "contributors": [
@@ -1008,8 +1008,9 @@ export function resolveWorkflowStageMaxSubagentDepth(
  ): number {
    const maxDepth = resolveCurrentMaxSubagentDepth(configMaxDepth);
    return isWorkflowStageOrchestrationContext(ctx)
-     // Workflow stages reserve one child-subagent hop; a 0-depth constraint would
-     // prevent the stage from delegating to its configured subagent at all.
+     // Workflow stages use the same two-hop default as main chat. A 0-depth
+     // constraint still preserves one child-subagent hop so configured workflow
+     // stages can delegate at least once.
      ? Math.min(maxDepth, Math.max(1, ctx.orchestrationContext?.constraints.maxSubagentDepth ?? 1))
      : maxDepth;
  }
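The clamp in the hunk above can be restated as a standalone sketch to show how the floor and the cap interact. `clampStageDepth` and its plain-number parameters are illustrative names, not the package's signature; only the `Math.min`/`Math.max` expression is taken from the diff.

```typescript
// Standalone restatement of the clamp expression from the hunk above.
// The function name and parameters are hypothetical, for illustration only.
function clampStageDepth(maxDepth: number, stageConstraint?: number): number {
  // Math.max(1, ...) keeps at least one child-subagent hop for the stage;
  // Math.min(maxDepth, ...) preserves a stricter configured limit.
  return Math.min(maxDepth, Math.max(1, stageConstraint ?? 1));
}

console.log(clampStageDepth(2, 0)); // 1: a 0-depth constraint is raised to one hop
console.log(clampStageDepth(2, 2)); // 2: the two-hop budget passes through
console.log(clampStageDepth(2, 5)); // 2: capped by the configured maximum
console.log(clampStageDepth(1, 2)); // 1: a stricter configured limit wins
console.log(clampStageDepth(2));    // 1: a missing constraint falls back to one hop
```

Since a missing constraint resolves to one hop in this expression, the two-hop default mentioned in the new comment presumably depends on the stage constraint being populated elsewhere; the diff does not show that code.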
@@ -1030,7 +1031,7 @@ export function resolveSubagentDepthPolicy(
  }

  function workflowStageSubagentDepthMessage(depth: number, maxDepth: number, action: "call" | "resume" = "call"): string {
-   return `Nested subagent ${action} blocked (depth=${depth}, max=${maxDepth}). Sub-agents inside workflow stages cannot spawn nested sub-agents.`;
+   return `Nested subagent ${action} blocked (depth=${depth}, max=${maxDepth}). Sub-agents inside workflow stages are running at the maximum nesting depth.`;
  }

  export function subagentDepthBlockedMessage(
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@bastani/web-access",
3
- "version": "0.8.29-alpha.2",
3
+ "version": "0.8.29-alpha.3",
4
4
  "private": true,
5
5
  "description": "Atomic extension for web search, URL fetching, GitHub repo cloning, PDF/video extraction. Fork of: https://github.com/nicobailon/pi-web-access",
6
6
  "contributors": [
@@ -12,6 +12,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
12
12
 
13
13
  ### Changed
14
14
 
15
+ - Changed the builtin `ralph` workflow to start each iteration with `/skill:prompt-engineer` prompt-engineering and `/skill:research-codebase` research instead of an RFC/planner stage, pass the resulting research artifact to the orchestrator as primary implementation context, fork follow-up research from prior research session data, and feed unresolved reviewer findings into subsequent research passes ([#1371](https://github.com/bastani-inc/atomic/issues/1371)).
16
+ - Changed builtin `goal`, `ralph`, and `open-claude-design` decision gates to use schema-backed workflow `structured_output` stages with TypeBox-native schema builders instead of registering bespoke terminating custom tools or wrapping plain JSON schemas with `Type.Unsafe`.
17
+ - Changed the builtin `ralph` prompt-engineering stage to disable all tools while relying on the `/skill:prompt-engineer` skill prompt, keeping that first-pass rewrite focused and tool-free.
18
+ - Changed builtin `goal` worker/reviewer prompts and `ralph` orchestrator/reviewer prompts to request end-to-end verification when practical, using browser-skilled subagents for web/frontend flows that may depend on backend/API behavior and tmux-skilled subagents for TUI or terminal-app scenarios.
15
19
  - Aligned the workflows extension with upstream pi TUI `^0.79.3` so workflow graph, custom UI, and prompt-broker integrations inherit the latest shared TUI compatibility fixes.
16
20
  - Documented the opt-in `structured_output` workflow path and clarified that ordinary workflow stages do not receive `structured_output` from the default tool registry; schema-enabled items auto-add the runtime tool to explicit `tools` allowlists ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
17
21
  - Clarified that workflow `structured_output` gate schemas must be top-level object tool-argument schemas, with arrays and primitives wrapped in object fields before being returned through the terminating tool, and documented the one-`prompt()` limit for schema-backed `StageContext` result contracts ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
@@ -23,6 +27,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
23
27
  - Fixed schema-backed workflow stages to fail with a clear stage-level error when `prompt()` is called more than once on the same `StageContext`, rather than surfacing the lower-level structured-output single-use guard ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
24
28
  - Fixed schema-backed workflow model fallback so an attempt that already captured a valid terminating `structured_output` result is treated as successful instead of retrying against fallback models and tripping the single-use result guard ([#1350](https://github.com/bastani-inc/atomic/issues/1350)).
25
29
  - Fixed the workflow graph overlay remaining interactive when the parent/main-chat agent opens `ask_user_question`: the graph keeps focus, the parent question stays pending behind it with a clear “Main chat needs input — exit graph to answer.” status hint, hiding/exiting the graph focuses the pending question, and host custom-UI state changes no longer hide, restore, remount, or repaint the overlay ([#1353](https://github.com/bastani-inc/atomic/issues/1353)).
30
+ - Fixed builtin `ralph` skill-prompt stages to invoke bundled skills through `/skill:<name>` expansion so prompt engineering and research stages receive the intended skill instructions.
31
+ - Fixed concurrent workflow stage resource reloads to serialize temporary subagent child environment isolation so parallel stage startup cannot leave parent process child flags accidentally cleared.
32
+ - Fixed workflow stage sessions to keep bundled workflow package skills (`create-spec`, `impeccable`, `prompt-engineer`, `research-codebase`, and `skill-creator`) available while still disabling only the recursive workflows extension inside child sessions.
33
+ - Fixed workflow stage resource discovery so bundled subagent definitions stay available, `subagent` is active by default with the same two-hop nesting budget as main chat, and explicitly allowlisted bundled extension tools such as `subagent`, `web_search`, `fetch_content`, and `intercom` remain visible even when a workflow is launched from a subagent child process.
26
34
 
27
35
  ## [0.8.28] - 2026-06-11
28
36
 
@@ -282,6 +282,8 @@ const decision = await ctx.stage("review-gate", { schema: Decision }).prompt(
282
282
 
283
283
  Atomic registers the canonical `structured_output` tool only for schema-enabled items, automatically adds it to explicit `tools` allowlists, and fails the item if the model completes without the final tool call. The schema is used directly as the tool argument contract, so wrap arrays or primitives in an object field such as `{ items: [...] }` or `{ value: ... }`. A schema-backed `StageContext` supports one `prompt()` call because the final-answer tool is an exact-once result contract; create another `ctx.stage(..., { schema })` for another structured prompt. `ctx.task`/`ctx.chain`/`ctx.parallel` results expose the parsed value as `result.structured` and keep `result.text` as formatted JSON for handoffs.
284
284
 
285
+ `subagent` is available as a default workflow-stage tool with the same default two-hop nesting budget as main chat: a stage can launch a subagent, and that child can launch one nested subagent before the guard blocks further delegation. `tools` allowlists apply to bundled extension tools as well as built-ins; if a stage sets `tools`, list every tool it should see. Workflow stages can explicitly list `subagent`, `web_search`, `fetch_content`, `intercom`, and other loaded extension tools, while `excludedTools` and `noTools: "all"` still win. Bundled `@bastani/subagents` agent definitions are available to the `subagent` tool in workflow stages, including workflows launched from a subagent child process.
286
+
285
287
  ### Model fallbacks
286
288
 
287
289
  Stages and high-level task helpers can retry transient provider/model failures with an ordered `fallbackModels` list. The primary `model` is tried first, then each fallback, and finally the current Atomic-selected model when available. Fallbacks are only used for retryable model/provider failures such as rate limits, quota/auth/provider outages, unavailable models, network timeouts, and 5xx errors — ordinary tool, shell, validation, cancellation, and workflow-code failures are not retried.
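The fallback order described above (primary `model`, then each `fallbackModels` entry, then the current Atomic-selected model when available) can be sketched as a small synchronous loop. `runWithFallbacks`, `retryableError`, and the `retryable` flag are hypothetical names for this illustration; how Atomic actually classifies retryable failures is not shown in this diff.

```typescript
// Hypothetical sketch of the documented fallback order; not Atomic's implementation.
type RetryableError = Error & { retryable: true };

// Build an error tagged as a retryable provider/model failure (assumed convention).
function retryableError(message: string): RetryableError {
  const error = new Error(message) as RetryableError;
  error.retryable = true;
  return error;
}

function isRetryable(error: unknown): boolean {
  return typeof error === "object" && error !== null
    && (error as { retryable?: boolean }).retryable === true;
}

function runWithFallbacks<T>(
  primary: string,
  fallbackModels: readonly string[],
  currentModel: string | undefined,
  attempt: (model: string) => T,
): T {
  // Primary first, then each fallback, then the current model when available.
  const order = [primary, ...fallbackModels];
  if (currentModel !== undefined) order.push(currentModel);

  let lastError: unknown;
  for (const model of order) {
    try {
      return attempt(model);
    } catch (error) {
      // Ordinary tool, validation, or workflow-code failures are not retried.
      if (!isRetryable(error)) throw error;
      lastError = error;
    }
  }
  throw lastError;
}

// Usage: the primary is rate limited, so the first fallback answers.
const tried: string[] = [];
const result = runWithFallbacks("model-a", ["model-b"], "model-c", (model) => {
  tried.push(model);
  if (model !== "model-b") throw retryableError("rate limited");
  return `ok:${model}`;
});
console.log(result, tried);
```

The loop stops at the first successful attempt, which is consistent with the changelog fix that treats an attempt with a captured `structured_output` result as successful rather than retrying it.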
@@ -589,7 +591,7 @@ Child workflow outputs: `result`, `findings`, `research_doc_path`, `artifact_dir
589
591
 
590
592
  ### `goal`
591
593
 
592
- Goal Runner workflow: initialize a persisted goal ledger with a per-run goal id and lifecycle events, render goal-continuation context, run bounded worker LM turns, append receipts, run three independent reviewers, and let a TypeScript reducer decide `complete`, `continue`, `blocked`, or `needs_human`. Token budget behavior is intentionally excluded.
594
+ Goal Runner workflow: initialize a persisted goal ledger with a per-run goal id and lifecycle events, render goal-continuation context, run bounded worker LM turns, append receipts, run three independent reviewers, and let a TypeScript reducer decide `complete`, `continue`, `blocked`, or `needs_human`. Workers and reviewers are prompted to verify user-visible behavior end-to-end when practical with browser-skilled subagents for web/frontend flows that may depend on backend/API behavior and tmux-skilled subagents for TUI or terminal-app scenarios. Token budget behavior is intentionally excluded.
593
595
 
594
596
  ```text
595
597
  /workflow goal objective="Migrate the database layer to Drizzle ORM" base_branch=develop
@@ -607,22 +609,22 @@ Child workflow outputs: `result`, `status`, `approved`, `goal_id`, `objective`,
607
609
 
608
610
  ### `ralph`
609
611
 
610
- Plan → orchestrate → simplify → review workflow with optional final-stage PR handoff: write an RFC-style technical design document under `specs/`, delegate implementation through sub-agents, simplify recent changes, run parallel reviewers, and iterate until approval or the loop limit. Ralph skips PR creation by default; prompt text alone does not opt in. Pass `create_pr=true` to authorize only the final `pull-request` stage to inspect provider credentials and attempt provider-appropriate PR/MR/review creation (for example GitHub `gh`, Azure Repos `az repos pr create`, or Sapling/Phabricator tooling). Ralph's own PR-creation instructions live in that final stage. Reviewers inspect repository infrastructure directly as needed; Ralph no longer runs separate `infra-*` discovery stages.
612
+ Prompt-engineering → research → orchestrate → review workflow with optional final-stage PR handoff: transform the user prompt into a codebase and online research question with `/skill:prompt-engineer`, run `/skill:research-codebase` against it, write findings under `research/`, delegate implementation through sub-agents from that research, run parallel reviewers, and iterate until approval or the loop limit. Ralph's orchestrator and reviewers are prompted to verify user-visible behavior end-to-end when practical with browser-skilled subagents for web/frontend flows that may depend on backend/API behavior and tmux-skilled subagents for TUI or terminal-app scenarios. Follow-up iterations pass unresolved review artifacts into prompt-engineering/research and fork research from prior research session data when available. Ralph skips PR creation by default; prompt text alone does not opt in. Pass `create_pr=true` to authorize only the final `pull-request` stage to inspect provider credentials and attempt provider-appropriate PR/MR/review creation (for example GitHub `gh`, Azure Repos `az repos pr create`, or Sapling/Phabricator tooling). Ralph's own PR-creation instructions live in that final stage. Reviewers inspect repository infrastructure directly as needed; Ralph no longer runs separate `infra-*` discovery stages.
611
613
 
612
614
  ```text
613
- /workflow ralph prompt="Plan and migrate the database layer to Drizzle ORM" max_loops=3 base_branch=develop
614
- /workflow ralph prompt="Plan and migrate the database layer to Drizzle ORM" max_loops=3 base_branch=develop create_pr=true
615
+ /workflow ralph prompt="Migrate the database layer to Drizzle ORM" max_loops=3 base_branch=develop
616
+ /workflow ralph prompt="Migrate the database layer to Drizzle ORM" max_loops=3 base_branch=develop create_pr=true
615
617
  ```
616
618
 
617
619
  | Input | Type | Required | Default | Description |
618
620
  | ------------------ | --------- | -------- | ------------- | ------------------------------------------------------------- |
619
- | `prompt` | `text` | ✓ | — | Task, feature request, issue summary, or spec path to plan, execute, refine, and review. |
620
- | `max_loops` | `number` | — | `10` | Maximum plan/orchestrate/review iterations before completion or optional final handoff. |
621
+ | `prompt` | `text` | ✓ | — | Task, feature request, issue summary, or spec path to research, execute, refine, and review. |
622
+ | `max_loops` | `number` | — | `10` | Maximum research/orchestrate/review iterations before completion or optional final handoff. |
621
623
  | `base_branch` | `string` | — | `origin/main` | Branch reviewers and the optional final stage compare the current delta with; also used to create a missing worktree. |
622
624
  | `git_worktree_dir` | `string` | — | `""` | Optional reusable Git worktree root. Empty runs in the invoking checkout; non-empty values run Ralph stages in the created/reused worktree. |
623
625
  | `create_pr` | `boolean` | — | `false` | Safe-by-default PR creation flag. Omitted or `false` skips the final `pull-request` stage and omits `pr_report`; prompt text alone does not opt in, and only strict `true` authorizes the final `pull-request` stage to attempt provider-appropriate PR/MR/review creation. |
624
626
 
625
- Child workflow outputs: `result`, `plan`, `plan_path`, `implementation_notes_path`, `approved`, `iterations_completed`, `review_report`, and `review_report_path`. `pr_report` is included only when `create_pr=true` and the final `pull-request` stage runs.
627
+ Child workflow outputs: `result`, `plan` (latest transformed research question), `plan_path` (compatibility alias for `research_path`), `research`, `research_path`, `implementation_notes_path`, `approved`, `iterations_completed`, `review_report`, and `review_report_path`. `pr_report` is included only when `create_pr=true` and the final `pull-request` stage runs.
626
628
 
627
629
  ### `open-claude-design`
628
630
 
@@ -44,27 +44,6 @@ interface DeepResearchCodebaseResult {
44
44
 
45
45
  const FILE_ONLY_OUTPUT = "file-only" satisfies WorkflowOutputMode;
46
46
 
47
- const CODEBASE_SKILLS = {
48
- locator:
49
- "codebase-locator — use this skill's search-first discipline when mapping where files, symbols, docs, tests, and configuration live.",
50
- analyzer:
51
- "codebase-analyzer — use this skill's evidence-driven deep-read style when explaining behavior, architecture, control flow, data flow, and edge cases.",
52
- patternFinder:
53
- "codebase-pattern-finder — use this skill's example-mining approach when separating reusable conventions from one-off details.",
54
- researchLocator:
55
- "codebase-research-locator — use this skill's historical-discovery approach when finding prior research, specs, ADRs, issues, and TODOs.",
56
- researchAnalyzer:
57
- "codebase-research-analyzer — use this skill's synthesis approach when extracting decisions, constraints, stale assumptions, and open questions from prior research.",
58
- onlineResearcher:
59
- "codebase-online-researcher — use this skill's source-citing approach when external documentation or ecosystem behavior materially affects the answer.",
60
- } as const;
61
-
62
- function codebaseSkillGuidance(
63
- ...skills: readonly (keyof typeof CODEBASE_SKILLS)[]
64
- ): string {
65
- return skills.map((skill) => CODEBASE_SKILLS[skill]).join("\n");
66
- }
67
-
68
47
  function taggedPrompt(sections: readonly PromptSection[]): string {
69
48
  return sections
70
49
  .map(([tag, content]) => {
@@ -446,11 +425,7 @@ export default defineWorkflow("deep-research-codebase")
446
425
  "role",
447
426
  "You are a senior codebase research scout preparing work for specialist agents.",
448
427
  ],
449
- ["objective", `Map the repository. Research question: ${prompt}`],
450
- [
451
- "codebase_skills",
452
- codebaseSkillGuidance("locator", "analyzer", "patternFinder"),
453
- ],
428
+ ["objective", `Map the repository using parallel codebase-locator, codebase-analyzer, and codebase-pattern-finder subagents. Research question: ${prompt}`],
454
429
  [
455
430
  "instructions",
456
431
  [
@@ -480,10 +455,9 @@ export default defineWorkflow("deep-research-codebase")
480
455
  ["role", "You locate prior project research and decision history."],
481
456
  [
482
457
  "objective",
483
- "Find existing docs, specs, ADRs, issues/PR notes, TODOs, and research artifacts relevant to the task.",
458
+ "Find existing docs, specs, ADRs, issues/PR notes, TODOs, and research artifacts relevant to the task using parallel codebase-research-locator subagents.",
484
459
  ],
485
460
  ["task", "{task}"],
486
- ["codebase_skills", codebaseSkillGuidance("researchLocator")],
487
461
  [
488
462
  "instructions",
489
463
  [
@@ -520,10 +494,9 @@ export default defineWorkflow("deep-research-codebase")
520
494
  ],
521
495
  [
522
496
  "objective",
523
- `Extract reusable historical context. Research question: ${prompt}`,
497
+ `Extract reusable historical context using parallel codebase-research-analyzer subagents. Research question: ${prompt}`,
524
498
  ],
525
499
  ["prior_research_locator_output", "{previous}"],
526
- ["codebase_skills", codebaseSkillGuidance("researchAnalyzer")],
527
500
  [
528
501
  "instructions",
529
502
  [
@@ -558,13 +531,9 @@ export default defineWorkflow("deep-research-codebase")
558
531
  ["role", "You turn scout research into clean work partitions."],
559
532
  [
560
533
  "objective",
561
- `Return at most ${partitionCap} independent partitions for this research question: ${prompt}`,
534
+ `Return at most ${partitionCap} independent partitions for this research question: ${prompt}. Use parallel codebase-locator, codebase-analyzer, and codebase-pattern-finder subagents.`,
562
535
  ],
563
536
  ["scout_output", "{previous}"],
564
- [
565
- "codebase_skills",
566
- codebaseSkillGuidance("locator", "analyzer", "patternFinder"),
567
- ],
568
537
  [
569
538
  "instructions",
570
539
  [
@@ -607,11 +576,11 @@ export default defineWorkflow("deep-research-codebase")
607
576
  "scout_context",
608
577
  `Read the scout artifact before making evidence claims: ${displayWorkflowPath(scoutPath)}\nCompact saved-output reference: {previous}`,
609
578
  ],
610
- ["codebase_skills", codebaseSkillGuidance("locator")],
611
579
  [
612
580
  "instructions",
613
581
  [
614
582
  "Find the highest-signal files, tests, docs, commands, configs, and symbols for this partition.",
583
+ "Use parallel codebase-locator subagents to explore different areas of the partition.",
615
584
  "Explain why each path matters for the research question.",
616
585
  "Prioritize exact paths and symbol names over broad descriptions.",
617
586
  "Flag areas that look relevant but could not be verified.",
@@ -643,11 +612,10 @@ export default defineWorkflow("deep-research-codebase")
643
612
  "scout_context",
644
613
  `Read the scout artifact before making evidence claims: ${displayWorkflowPath(scoutPath)}\nCompact saved-output reference: {previous}`,
645
614
  ],
646
- ["codebase_skills", codebaseSkillGuidance("patternFinder")],
647
615
  [
648
616
  "instructions",
649
617
  [
650
- "Identify recurring implementation patterns, abstractions, naming conventions, and anti-patterns in this partition.",
618
+ "Identify recurring implementation patterns, abstractions, naming conventions, and anti-patterns in this partition using parallel codebase-pattern-finder subagents.",
651
619
  "Use concrete examples with paths, symbols, or test names.",
652
620
  "Distinguish established conventions from one-off implementation details.",
653
621
  "Avoid generic advice that is not grounded in the repository.",
@@ -711,11 +679,10 @@ export default defineWorkflow("deep-research-codebase")
711
679
  "context",
712
680
  `Read these artifacts before analyzing: ${displayWorkflowPaths(analyzerReads)}\nCompact saved-output reference: {previous}`,
713
681
  ],
714
- ["codebase_skills", codebaseSkillGuidance("analyzer")],
715
682
  [
716
683
  "instructions",
717
684
  [
718
- "Analyze behavior, control flow, data flow, lifecycle, error handling, and test coverage for this partition.",
685
+ "Analyze behavior, control flow, data flow, lifecycle, error handling, and test coverage for this partition using parallel codebase-analyzer subagents.",
719
686
  "Build on the locator output; do not repeat file discovery except where needed as evidence.",
720
687
  "Call out edge cases, invariants, and coupling to other partitions.",
721
688
  "If evidence is incomplete, explain what remains unknown and how to verify it.",
@@ -747,11 +714,11 @@ export default defineWorkflow("deep-research-codebase")
747
714
  ["assignment", `Partition ${i}/${partitions.length}: ${partition}`],
748
715
  ["research_question", prompt],
749
716
  ["local_context", onlineResearcherLocalContext],
750
- ["codebase_skills", codebaseSkillGuidance("onlineResearcher")],
751
717
  [
752
718
  "instructions",
753
719
  [
754
720
  "Identify external library/framework behavior, standards, or docs that materially affect the local interpretation.",
721
+ "Use parallel codebase-online-researcher subagents to explore different angles of external research.",
755
722
  "Cite sources, package names, API names, versions, or documentation titles when available.",
756
723
  "Explain how each external fact applies to this repository.",
757
724
  "If external research is unnecessary or unavailable, say so and focus on local implications.",
@@ -829,14 +796,6 @@ export default defineWorkflow("deep-research-codebase")
829
796
  "specialist_reports",
830
797
  `Read the complete explorer handoff artifact(s) at ${displayWorkflowPaths(explorerPaths)}. They preserve every partition's Locator, Pattern Finder, Analyzer, and Online Researcher output from the original inline specialist handoff while keeping this prompt bounded.`,
831
798
  ],
832
- [
833
- "codebase_skills",
834
- codebaseSkillGuidance(
835
- "analyzer",
836
- "researchAnalyzer",
837
- "onlineResearcher",
838
- ),
839
- ],
840
799
  [
841
800
  "instructions",
842
801
  [
@@ -845,6 +804,7 @@ export default defineWorkflow("deep-research-codebase")
845
804
  "Prioritize claims supported by concrete paths, symbols, tests, docs, or cited external references.",
846
805
  "Resolve contradictions explicitly and preserve important uncertainty.",
847
806
  "Avoid inventing facts not supported by the supplied reports; state unknowns instead.",
807
+ "Use parallel codebase-analyzer, codebase-research-analyzer, and codebase-online-researcher subagents as needed to verify claims or fill critical gaps in the supplied reports.",
848
808
  "End with actionable next steps for a developer who will use this research.",
849
809
  ].join("\n"),
850
810
  ],
@@ -13,7 +13,7 @@ import { join } from "node:path";
13
13
  import { defineWorkflow } from "../src/workflows/define-workflow.js";
14
14
  import { Type } from "typebox";
15
15
  import type { WorkflowTaskResult } from "../src/shared/types.js";
16
- import { WORKER_PREFLIGHT_CONTRACT } from "./shared-prompts.js";
16
+ import { E2E_VERIFICATION_GUIDANCE, WORKER_PREFLIGHT_CONTRACT } from "./shared-prompts.js";
17
17
 
18
18
  const DEFAULT_MAX_TURNS = 10;
19
19
  // Goal Runner runs three independent reviewer personas; two approvals form a majority.
@@ -135,108 +135,64 @@ function positiveInteger(value: number | undefined, fallback: number): number {
135
135
  return floored >= 1 ? floored : fallback;
136
136
  }
137
137
 
138
- const reviewDecisionSchema = {
139
- type: "object",
140
- additionalProperties: false,
141
- required: [
142
- "findings",
143
- "overall_correctness",
144
- "overall_explanation",
145
- "overall_confidence_score",
146
- "goal_oracle_satisfied",
147
- "receipt_assessment",
148
- "verification_remaining",
149
- "stop_review_loop",
150
- ],
151
- properties: {
152
- findings: {
153
- type: "array",
154
- items: {
155
- type: "object",
156
- additionalProperties: false,
157
- required: ["title", "body", "confidence_score", "code_location"],
158
- properties: {
159
- title: { type: "string" },
160
- body: { type: "string" },
161
- confidence_score: { type: "number", minimum: 0, maximum: 1 },
162
- priority: { type: ["integer", "null"], minimum: 0, maximum: 3 },
163
- code_location: {
164
- type: "object",
165
- additionalProperties: false,
166
- required: ["absolute_file_path", "line_range"],
167
- properties: {
168
- absolute_file_path: { type: "string" },
169
- line_range: {
170
- type: "object",
171
- additionalProperties: false,
172
- required: ["start", "end"],
173
- properties: {
174
- start: { type: "integer", minimum: 1 },
175
- end: { type: "integer", minimum: 1 },
176
- },
177
- },
178
- },
138
+ const reviewFindingSchema = Type.Object(
139
+ {
140
+ title: Type.String(),
141
+ body: Type.String(),
142
+ confidence_score: Type.Number({ minimum: 0, maximum: 1 }),
143
+ priority: Type.Optional(
144
+ Type.Union([Type.Integer({ minimum: 0, maximum: 3 }), Type.Null()]),
145
+ ),
146
+ code_location: Type.Object(
147
+ {
148
+ absolute_file_path: Type.String(),
149
+ line_range: Type.Object(
150
+ {
151
+ start: Type.Integer({ minimum: 1 }),
152
+ end: Type.Integer({ minimum: 1 }),
179
153
  },
180
- },
154
+ { additionalProperties: false },
155
+ ),
181
156
  },
182
- },
183
- overall_correctness: {
184
- type: "string",
185
- enum: ["patch is correct", "patch is incorrect"],
186
- },
187
- overall_explanation: { type: "string" },
188
- overall_confidence_score: { type: "number", minimum: 0, maximum: 1 },
189
- goal_oracle_satisfied: { type: "boolean" },
190
- receipt_assessment: { type: "string" },
191
- verification_remaining: { type: "string" },
192
- stop_review_loop: { type: "boolean" },
193
- reviewer_error: {
194
- anyOf: [
195
- { type: "null" },
196
- {
197
- type: "object",
198
- additionalProperties: false,
199
- required: ["kind", "message", "attempted_recovery"],
200
- properties: {
201
- kind: {
202
- type: "string",
203
- enum: [
204
- "validation_unavailable",
205
- "dependency_unavailable",
206
- "tool_failure",
207
- "reviewer_failure",
208
- ],
209
- },
210
- message: { type: "string" },
211
- attempted_recovery: { type: "string" },
212
- },
213
- },
214
- ],
215
- },
157
+ { additionalProperties: false },
158
+ ),
216
159
  },
217
- } as const;
218
-
219
- const reviewDecisionTool = {
220
- name: "review_decision",
221
- label: "Review Decision",
222
- description:
223
- "Emit the final structured review verdict after inspecting the patch.",
224
- promptSnippet: "Emit the final review verdict as structured data",
225
- promptGuidelines: [
226
- "Call review_decision after completing review investigation and validation.",
227
- "This is a terminating structured-output tool; do not emit another assistant response after calling it.",
228
- ],
229
- parameters: reviewDecisionSchema,
230
- async execute(_toolCallId: string, params: ReviewDecision) {
231
- return {
232
- content: [
233
- { type: "text" as const, text: JSON.stringify(params, null, 2) },
234
- ],
235
- details: params,
236
- terminate: true,
237
- };
160
+ { additionalProperties: false },
161
+ );
162
+
163
+ const reviewerErrorSchema = Type.Object(
164
+ {
165
+ kind: Type.Union([
166
+ Type.Literal("validation_unavailable"),
167
+ Type.Literal("dependency_unavailable"),
168
+ Type.Literal("tool_failure"),
169
+ Type.Literal("reviewer_failure"),
170
+ ]),
171
+ message: Type.String(),
172
+ attempted_recovery: Type.String(),
238
173
  },
239
- };
174
+ { additionalProperties: false },
175
+ );
176
+
177
+ const reviewDecisionSchema = Type.Object(
178
+ {
179
+ findings: Type.Array(reviewFindingSchema),
180
+ overall_correctness: Type.Union([
181
+ Type.Literal("patch is correct"),
182
+ Type.Literal("patch is incorrect"),
183
+ ]),
184
+ overall_explanation: Type.String(),
185
+ overall_confidence_score: Type.Number({ minimum: 0, maximum: 1 }),
186
+ goal_oracle_satisfied: Type.Boolean(),
187
+ receipt_assessment: Type.String(),
188
+ verification_remaining: Type.String(),
189
+ stop_review_loop: Type.Boolean(),
190
+ reviewer_error: Type.Optional(
191
+ Type.Union([Type.Null(), reviewerErrorSchema]),
192
+ ),
193
+ },
194
+ { additionalProperties: false },
195
+ );
240
196
 
241
197
  const GOAL_CONTINUATION_REFERENCE = [
242
198
  "Continuation behavior:",
@@ -589,6 +545,7 @@ function renderGoalContinuationPrompt(
589
545
  ].join("\n"),
590
546
  ],
591
547
  ["goal_guidelines", GOAL_CONTINUATION_REFERENCE],
548
+ ["e2e_verification", E2E_VERIFICATION_GUIDANCE],
592
549
  ]);
593
550
  }
594
551
 
@@ -619,6 +576,7 @@ function renderForkedGoalWorkerPrompt(
619
576
  renderLatestReviewArtifacts(latestReviewArtifactPaths),
620
577
  ].join("\n"),
621
578
  ],
579
+ ["e2e_verification", E2E_VERIFICATION_GUIDANCE],
622
580
  ]);
623
581
  }
624
582
 
@@ -795,6 +753,7 @@ function renderReviewerPrompt(args: {
795
753
  ["goal_framework", GOAL_METHOD_REFERENCE],
796
754
  ["goal_guidelines", GOAL_CONTINUATION_REFERENCE],
797
755
  ["auditability", RECEIPT_EXPECTATIONS],
756
+ ["e2e_verification", E2E_VERIFICATION_GUIDANCE],
798
757
  [
799
758
  "goal_context",
800
759
  [
@@ -829,8 +788,6 @@ function renderReviewerPrompt(args: {
829
788
  [
830
789
  "Inspect the actual diff/repository state rather than trusting stage summaries.",
831
790
  "Identify the smallest relevant validation set from repository evidence: targeted tests, lint, typecheck, build, generated-artifact checks, CI-equivalent scripts, or user-flow proof.",
832
- "When practical, include an end-to-end QA check that exercises the app the way a user would: use the tmux skill for terminal app environments and browser for web app environments.",
833
- "For web app environments, capture a screenshot as a certificate of correct completion when the UI state proves the objective; for terminal app environments, capture the terminal window/output that shows proof of correctness.",
834
791
  "Run or delegate focused validation when it is necessary to distinguish a real bug from a hunch.",
835
792
  "If tests or typechecks fail because dependencies are missing, install/download the missing dependencies with the repo's documented package manager instead of bypassing the check.",
836
793
  "If validation cannot be completed after reasonable recovery, record the limitation in overall_explanation and reviewer_error; do not use missing dependencies as a reason to approve.",
@@ -915,14 +872,14 @@ function renderReviewerPrompt(args: {
915
872
  [
916
873
  "output_format",
917
874
  [
918
- "You have a structured-output tool named review_decision. Use it after your investigation and validation attempts.",
875
+ "Use the schema-backed structured_output tool after your investigation and validation attempts.",
919
876
  "The tool terminates the turn and provides the structured data; do not emit a separate final assistant response after calling it.",
920
- "The review gate decides completion only by parsing the JSON object returned by this tool; invalid JSON, missing fields, reviewer_error, or stop_review_loop=false are treated as not approved for safety.",
877
+ "The review gate decides completion only from the JSON object captured by structured_output; invalid JSON, missing fields, reviewer_error, or stop_review_loop=false are treated as not approved for safety.",
921
878
  "Set stop_review_loop=true only when there are no P0/P1/P2 findings, overall_correctness is patch is correct, goal_oracle_satisfied is true, no objective-relevant verification remains, and reviewer_error is null/omitted.",
922
879
  "P3 nice-to-have findings are non-blocking when the rest of the approval contract is satisfied; do not use P3 for work required by the objective or verification oracle.",
923
880
  "If you hit a reviewer/tool/validation error, still return the object with stop_review_loop=false and reviewer_error populated instead of pretending the patch is approved.",
924
881
  [
925
- "The review_decision tool schema is authoritative; do not copy a hand-written JSON blob into the final response. Here is an example output:",
882
+ "The structured_output schema is authoritative; do not copy a hand-written JSON blob into the final response. Here is an example output:",
926
883
  "{",
927
884
  ' "findings": [',
928
885
  " {",
@@ -1080,8 +1037,8 @@ export default defineWorkflow("goal")
1080
1037
  "github-copilot/claude-opus-4.8:xhigh",
1081
1038
  "anthropic/claude-opus-4-8:xhigh"
1082
1039
  ],
1083
- tools: [...goalRunnerTools, reviewDecisionTool.name],
1084
- customTools: [reviewDecisionTool],
1040
+ tools: goalRunnerTools,
1041
+ schema: reviewDecisionSchema,
1085
1042
  };
1086
1043
 
1087
1044
  let latestReviews: ReviewRecord[] = [];
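Reviewer note: the `output_format` prompt in this file states the approval contract in prose (no P0/P1/P2 findings, `overall_correctness` is `patch is correct`, `goal_oracle_satisfied` is true, no verification remains, and `reviewer_error` is null or omitted). The sketch below restates that contract as a predicate. It is only an illustration: the gate code itself is not shown in this diff, the `isApproved` name and the trimmed interfaces are hypothetical, and two behaviors are assumptions (a finding with no priority counts as blocking, and an empty `verification_remaining` string means nothing remains).

```typescript
// Trimmed, illustrative types that reuse field names from reviewDecisionSchema.
interface ReviewFinding {
  title: string;
  priority?: number | null; // 0..3, read here as P0..P3
}

interface ReviewDecision {
  findings: ReviewFinding[];
  overall_correctness: "patch is correct" | "patch is incorrect";
  goal_oracle_satisfied: boolean;
  verification_remaining: string;
  stop_review_loop: boolean;
  reviewer_error?: { kind: string; message: string } | null;
}

// Hypothetical gate predicate; not the package's actual implementation.
function isApproved(d: ReviewDecision): boolean {
  // Assumption: a finding without a priority is treated as blocking for safety.
  const hasBlockingFinding = d.findings.some(
    (f) => f.priority === undefined || f.priority === null || f.priority <= 2,
  );
  // Assumption: an empty (or whitespace-only) string means no verification remains.
  const verificationDone = d.verification_remaining.trim() === "";
  const noReviewerError =
    d.reviewer_error === undefined || d.reviewer_error === null;
  return (
    d.stop_review_loop &&
    !hasBlockingFinding &&
    d.overall_correctness === "patch is correct" &&
    d.goal_oracle_satisfied &&
    verificationDone &&
    noReviewerError
  );
}
```

Under these assumptions a decision carrying only P3 findings can still be approved, while any P0 to P2 finding or a populated `reviewer_error` keeps the loop running.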
@@ -83,6 +83,8 @@ export type RalphWorkflowOutputs = WorkflowOutputValues & {
83
83
  readonly result?: string;
84
84
  readonly plan?: string;
85
85
  readonly plan_path?: string;
86
+ readonly research?: string;
87
+ readonly research_path?: string;
86
88
  readonly implementation_notes_path?: string;
87
89
  readonly pr_report?: string;
88
90
  readonly approved?: boolean;