@jterrats/open-orchestra 1.0.8 → 1.0.9

Files changed (109)
  1. package/AGENTS.md +12 -2
  2. package/CLAUDE.md +13 -2
  3. package/dist/acceptance-criteria-quality.d.ts +12 -0
  4. package/dist/acceptance-criteria-quality.js +137 -0
  5. package/dist/acceptance-criteria-quality.js.map +1 -0
  6. package/dist/architecture-debt-inventory.d.ts +31 -0
  7. package/dist/architecture-debt-inventory.js +200 -0
  8. package/dist/architecture-debt-inventory.js.map +1 -0
  9. package/dist/architecture-debt-report.d.ts +2 -0
  10. package/dist/architecture-debt-report.js +28 -0
  11. package/dist/architecture-debt-report.js.map +1 -0
  12. package/dist/autonomous-phase-lifecycle.d.ts +5 -1
  13. package/dist/autonomous-phase-lifecycle.js +87 -17
  14. package/dist/autonomous-phase-lifecycle.js.map +1 -1
  15. package/dist/cli-payloads.d.ts +4 -0
  16. package/dist/cli-payloads.js +24 -0
  17. package/dist/cli-payloads.js.map +1 -0
  18. package/dist/command-manifest.js +3 -1
  19. package/dist/command-manifest.js.map +1 -1
  20. package/dist/command-routes.js +3 -1
  21. package/dist/command-routes.js.map +1 -1
  22. package/dist/commands.d.ts +1 -1
  23. package/dist/commands.js +16 -3
  24. package/dist/commands.js.map +1 -1
  25. package/dist/github.js +22 -7
  26. package/dist/github.js.map +1 -1
  27. package/dist/metrics-commands.js +69 -17
  28. package/dist/metrics-commands.js.map +1 -1
  29. package/dist/phase-executor.js +5 -169
  30. package/dist/phase-executor.js.map +1 -1
  31. package/dist/phase-playbooks.js +17 -0
  32. package/dist/phase-playbooks.js.map +1 -1
  33. package/dist/qa-e2e-artifacts.d.ts +7 -0
  34. package/dist/qa-e2e-artifacts.js +225 -0
  35. package/dist/qa-e2e-artifacts.js.map +1 -0
  36. package/dist/quality-contracts.d.ts +83 -0
  37. package/dist/quality-contracts.js +463 -0
  38. package/dist/quality-contracts.js.map +1 -0
  39. package/dist/refresh-generated.js +81 -28
  40. package/dist/refresh-generated.js.map +1 -1
  41. package/dist/runtime-bootstrap.js +3 -0
  42. package/dist/runtime-bootstrap.js.map +1 -1
  43. package/dist/runtime-commands.d.ts +2 -0
  44. package/dist/runtime-commands.js +186 -1
  45. package/dist/runtime-commands.js.map +1 -1
  46. package/dist/runtime-context-manifest.d.ts +27 -0
  47. package/dist/runtime-context-manifest.js +151 -0
  48. package/dist/runtime-context-manifest.js.map +1 -0
  49. package/dist/runtime-execution-renderer.d.ts +3 -1
  50. package/dist/runtime-execution-renderer.js +7 -1
  51. package/dist/runtime-execution-renderer.js.map +1 -1
  52. package/dist/runtime-execution.d.ts +2 -1
  53. package/dist/runtime-execution.js +162 -2
  54. package/dist/runtime-execution.js.map +1 -1
  55. package/dist/runtime-guardrails.js +5 -1
  56. package/dist/runtime-guardrails.js.map +1 -1
  57. package/dist/runtime-lifecycle-watch.d.ts +93 -0
  58. package/dist/runtime-lifecycle-watch.js +391 -0
  59. package/dist/runtime-lifecycle-watch.js.map +1 -0
  60. package/dist/runtime-parent-actions.d.ts +7 -2
  61. package/dist/runtime-parent-actions.js +132 -1
  62. package/dist/runtime-parent-actions.js.map +1 -1
  63. package/dist/runtime-spawn-bridge.js +21 -1
  64. package/dist/runtime-spawn-bridge.js.map +1 -1
  65. package/dist/sonar-insights.d.ts +1 -0
  66. package/dist/sonar-insights.js +6 -2
  67. package/dist/sonar-insights.js.map +1 -1
  68. package/dist/types/model-config.d.ts +6 -0
  69. package/dist/types/runtime.d.ts +17 -2
  70. package/dist/types/tasks.d.ts +12 -0
  71. package/dist/types.d.ts +1 -1
  72. package/dist/types.js.map +1 -1
  73. package/dist/web-api.js +8 -0
  74. package/dist/web-api.js.map +1 -1
  75. package/dist/web-console/assets/index-DXbrxR_d.js +11 -0
  76. package/dist/web-console/index.html +1 -1
  77. package/dist/workflow-handoff-assessment.d.ts +3 -0
  78. package/dist/workflow-handoff-assessment.js +246 -0
  79. package/dist/workflow-handoff-assessment.js.map +1 -0
  80. package/dist/workflow-handoff-contract.d.ts +32 -0
  81. package/dist/workflow-handoff-contract.js +123 -0
  82. package/dist/workflow-handoff-contract.js.map +1 -0
  83. package/dist/workflow-phase-transition.d.ts +16 -0
  84. package/dist/workflow-phase-transition.js +76 -0
  85. package/dist/workflow-phase-transition.js.map +1 -0
  86. package/dist/workflow-run-commands.js +47 -12
  87. package/dist/workflow-run-commands.js.map +1 -1
  88. package/dist/workflow-services.js +57 -27
  89. package/dist/workflow-services.js.map +1 -1
  90. package/dist/workspace-init-artifacts.d.ts +9 -0
  91. package/dist/workspace-init-artifacts.js +28 -0
  92. package/dist/workspace-init-artifacts.js.map +1 -1
  93. package/dist/workspace-runtime-bootstrap.d.ts +3 -1
  94. package/dist/workspace-runtime-bootstrap.js +8 -3
  95. package/dist/workspace-runtime-bootstrap.js.map +1 -1
  96. package/dist/workspace.d.ts +5 -2
  97. package/dist/workspace.js +44 -15
  98. package/dist/workspace.js.map +1 -1
  99. package/docs/architecture-debt-inventory.md +25 -0
  100. package/docs/e2e-test-batteries.md +34 -23
  101. package/docs/orchestra-mvp.md +8 -0
  102. package/docs/runtime-adapters.md +68 -8
  103. package/docs/sonar-quality-gates.md +133 -11
  104. package/package.json +4 -1
  105. package/rules/delivery-quality-gates.mdc +6 -0
  106. package/rules/devops-tooling.mdc +1 -0
  107. package/rules/security-guardrails.mdc +3 -0
  108. package/rules/testing-discipline.mdc +9 -0
  109. package/dist/web-console/assets/index-CgSKcay8.js +0 -11
@@ -0,0 +1,25 @@
1
+ # Architecture Debt Inventory
2
+
3
+ Open Orchestra includes a report-only architecture debt inventory for spotting files that may need future refactoring.
4
+
5
+ Run it with:
6
+
7
+ ```sh
8
+ npm run architecture:inventory
9
+ ```
10
+
11
+ For machine-readable output:
12
+
13
+ ```sh
14
+ npm run build
15
+ node scripts/architecture-debt-inventory.js --json
16
+ ```
17
+
18
+ The inventory reports:
19
+
20
+ - Large files over the configured line threshold.
21
+ - Long functions over the configured function threshold.
22
+ - Command-facing modules that may contain orchestration logic.
23
+ - Module-boundary candidates that mix CLI, filesystem, workflow, and domain concerns.
24
+
25
+ This slice is intentionally warn-only. It does not fail CI yet because the current repository still needs threshold tuning and incremental refactor stories. Future enforcement slices can promote selected categories to CI failures once the baseline is reviewed.
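The warn-only behaviour described in this new doc can be pictured with a small sketch. It flags files over a line threshold and returns findings instead of failing. The function name `findLargeFiles`, the input shape, and the 400-line threshold are hypothetical; the package's real thresholds and report format are not visible in this diff.

```javascript
// Hypothetical sketch: report-only check for oversized files.
// `files` maps a path to its source text; `maxLines` is an assumed threshold.
function findLargeFiles(files, maxLines) {
  const findings = [];
  for (const [path, source] of Object.entries(files)) {
    const lineCount = source.split("\n").length;
    if (lineCount > maxLines) {
      // Severity stays "warn": nothing here throws or sets an exit code.
      findings.push({ path, lineCount, severity: "warn" });
    }
  }
  return findings;
}

const report = findLargeFiles(
  { "src/small.js": "a\nb", "src/big.js": "x\n".repeat(500) },
  400,
);
console.log(JSON.stringify(report, null, 2));
```

Promoting a category to a CI failure later would then be a matter of mapping selected findings to a non-zero exit code, which this sketch deliberately leaves out.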
@@ -21,35 +21,36 @@ entry points a user or CI runner actually executes.
21
21
 
22
22
  ## P0 Release-Blocking Batteries
23
23
 
24
- | Battery | Scope | Command | Minimum Assertions | Evidence |
25
- | --- | --- | --- | --- | --- |
26
- | Source quality | Static checks, build, unit tests, workflow validation, secret scan, security audit | `npm run precommit` | exit code 0, no leaks, no audit blockers, workflow valid | command log |
27
- | Local CLI onboarding | Current source CLI in `/tmp` workspaces | `ORCHESTRA_NODE_SCRIPT=$PWD/bin/orchestra.js npm run test:e2e:init` | `--version`, `init`, `status`, `validate`, first-use task, handoff, evidence, release readiness | stdout/stderr, JSON output, filesystem assertions |
28
- | Installed CLI onboarding | Installed or packaged CLI in `/tmp` workspaces | `npm run test:e2e:init` after installing the candidate package | same assertions as local CLI onboarding, proving the packaged binary matches source behavior | stdout/stderr, JSON output, filesystem assertions, package version |
29
- | Browser console | Web console task, cost, provider, delegation, recovery, evidence, workflow, accessibility, artifacts | `npm run test:e2e` | visible state, API persistence, evidence attachment, lifecycle transitions, responsive/keyboard behavior | Playwright report, screenshots/traces on failure |
30
- | Public site | Documentation/site navigation, docs catalog, architecture viewer, mobile fit | `npm run test:e2e` | navigation order, local docs catalog search, no raw GitHub redirect for docs, mobile content fit | Playwright report |
31
- | Runtime manual queue | Manual runtime delegation in a `/tmp` workspace | `npm run test:e2e:runtime` | two active sessions, third manual `spawn-request` materializes `queued`, artifact includes lifecycle commands, `runtime sessions` lists queued session | stdout/stderr, JSON output, artifact content |
32
- | Init refresh environments | Simulated Codex, Claude, Cursor, generic workspaces | `node --test e2e/init-refresh-environments.test.js` | missing runtime guidance files regenerate on `init --force`, user content is preserved, managed blocks are updated only inside managed ranges | filesystem diff assertions |
33
- | Workflow lifecycle CLI | CLI workflow run, gate, resume, QA failback, release readiness | `node --test e2e/workflow-lifecycle-cli.test.js` | task phases create handoffs, blocked QA routes back, routine gate resumes immediately, release readiness maps acceptance to evidence | JSON output, events, handoffs |
24
+ | Battery | Scope | Command | Minimum Assertions | Evidence |
25
+ | ------------------------- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
26
+ | Source quality | Static checks, build, unit tests, workflow validation, secret scan, security audit | `npm run precommit` | exit code 0, no leaks, no audit blockers, workflow valid | command log |
27
+ | Local CLI onboarding | Current source CLI in `/tmp` workspaces | `ORCHESTRA_NODE_SCRIPT=$PWD/bin/orchestra.js npm run test:e2e:init` | `--version`, `init`, `status`, `validate`, first-use task, handoff, evidence, release readiness | stdout/stderr, JSON output, filesystem assertions |
28
+ | Installed CLI onboarding | Installed or packaged CLI in `/tmp` workspaces | `npm run test:e2e:init` after installing the candidate package | same assertions as local CLI onboarding, proving the packaged binary matches source behavior | stdout/stderr, JSON output, filesystem assertions, package version |
29
+ | Browser console | Web console task, cost, provider, delegation, recovery, evidence, workflow, accessibility, artifacts | `npm run test:e2e` | visible state, API persistence, evidence attachment, lifecycle transitions, responsive/keyboard behavior | Playwright report, screenshots/traces on failure |
30
+ | Public site | Documentation/site navigation, docs catalog, architecture viewer, mobile fit | `npm run test:e2e` | navigation order, local docs catalog search, no raw GitHub redirect for docs, mobile content fit | Playwright report |
31
+ | Runtime manual queue | Manual runtime delegation in a `/tmp` workspace | `npm run test:e2e:runtime` | two active sessions, third manual `spawn-request` materializes `queued`, artifact includes lifecycle commands, `runtime sessions` lists queued session | stdout/stderr, JSON output, artifact content |
32
+ | Init refresh environments | Simulated Codex, Claude, Cursor, generic workspaces | `node --test e2e/init-refresh-environments.test.js` | missing runtime guidance files regenerate on `init --force`, user content is preserved, managed blocks are updated only inside managed ranges | filesystem diff assertions |
33
+ | Workflow lifecycle CLI | CLI workflow run, gate, resume, QA failback, release readiness | `node --test e2e/workflow-lifecycle-cli.test.js` | task phases create handoffs, blocked QA routes back, routine gate resumes immediately, release readiness maps acceptance to evidence | JSON output, events, handoffs |
34
34
 
35
35
  ## P1 High-Risk Regression Batteries
36
36
 
37
- | Battery | Scope | Command | Minimum Assertions | Evidence |
38
- | --- | --- | --- | --- | --- |
39
- | Multi-squad runtime | Parallel squad delegation with queue and threshold policy | `node --test e2e/runtime-multi-squad.test.js` | independent sessions, non-blocking parent, queued sessions do not fall back to parent, completion order reconciles | JSON output, lifecycle events |
40
- | Acceptance evidence | CLI, API, browser, and deferred integration evidence | `node --test e2e/acceptance-evidence.test.js` | evidence maps to named acceptance criteria, deferred external validation requires owner and rationale | evidence artifacts |
41
- | Recovery and repair | Interrupted runs, stale locks, failed provider phases | `node --test e2e/recovery-cli.test.js` plus browser recovery coverage | recovery detects issue, repair requires confirmation, repaired state is observable | JSON output, before/after state |
42
- | Docs/site content source | Site content generated from docs and manifest | `npm run site:build && npm run test:e2e -- --grep docs` | docs render as human-friendly catalog, no markdown-only dead ends, search works | Playwright report |
43
- | Security-sensitive operations | File paths, shell execution, web writes, secrets, telemetry redaction | `node --test e2e/security-boundaries.test.js` | path traversal blocked, unsafe writes rejected, secret-like data redacted, no raw stack traces | command/API evidence |
37
+ | Battery | Scope | Command | Minimum Assertions | Evidence |
38
+ | ------------------------------ | --------------------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
39
+ | Multi-squad runtime | Parallel squad delegation with queue and threshold policy | `node --test e2e/runtime-multi-squad.test.js` | independent sessions, non-blocking parent, queued sessions do not fall back to parent, completion order reconciles | JSON output, lifecycle events |
40
+ | Acceptance evidence | CLI, API, browser, and deferred integration evidence | `node --test e2e/acceptance-evidence.test.js` | evidence maps to named acceptance criteria, deferred external validation requires owner and rationale | evidence artifacts |
41
+ | Recovery and repair | Interrupted runs, stale locks, failed provider phases | `node --test e2e/recovery-cli.test.js` plus browser recovery coverage | recovery detects issue, repair requires confirmation, repaired state is observable | JSON output, before/after state |
42
+ | Docs/site content source | Site content generated from docs and manifest | `npm run site:build && npm run test:e2e -- --grep docs` | docs render as human-friendly catalog, no markdown-only dead ends, search works | Playwright report |
43
+ | Security-sensitive operations | File paths, shell execution, web writes, secrets, telemetry redaction | `node --test e2e/security-boundaries.test.js` | path traversal blocked, unsafe writes rejected, secret-like data redacted, no raw stack traces | command/API evidence |
44
+ | Ollama provider-backed runtime | Local OpenAI-compatible Ollama provider route in a `/tmp` workspace | `npm run test:e2e:runtime:ollama` | `model connect --provider ollama`, provider-backed developer phase, OpenAI-compatible request shape, provider provenance, no runtime subagent credentials in artifacts | stdout/stderr, JSON output, mock provider request, event log |
44
45
 
45
46
  ## P2 Extended Confidence Batteries
46
47
 
47
- | Battery | Scope | Command | Minimum Assertions | Evidence |
48
- | --- | --- | --- | --- | --- |
49
- | Tracker and GitHub sync | Issue import/export and close readiness | opt-in CI job with network credentials | labels, comments, close gate, release readiness, no secret exposure | sanitized logs |
50
- | Sonar quality loop | Local or remote Sonar import and release gate mapping | configured Sonar workflow or local compose job | insights imported, release readiness reflects quality gate, unavailable token is explicit | artifact import report |
51
- | Provider-backed delegation | OpenAI, Gemini, Ollama, Claude/Cursor runtime bridges | opt-in provider E2E | budget checks, rate-limit/backpressure, lifecycle events, no direct API when disallowed | redacted provider provenance |
52
- | Package release dry run | npm package contents and release check | `npm pack --dry-run --json && orchestra release check --json` | generated/private state excluded, version/tag policy valid, release readiness complete | package list, release report |
48
+ | Battery | Scope | Command | Minimum Assertions | Evidence |
49
+ | -------------------------- | ----------------------------------------------------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | ---------------------------- |
50
+ | Tracker and GitHub sync | Issue import/export and close readiness | opt-in CI job with network credentials | labels, comments, close gate, release readiness, no secret exposure | sanitized logs |
51
+ | Sonar quality loop | Local or remote Sonar import and release gate mapping | configured Sonar workflow or local compose job | insights imported, release readiness reflects quality gate, unavailable token is explicit | artifact import report |
52
+ | Provider-backed delegation | OpenAI, Gemini, Ollama, Claude/Cursor runtime bridges | opt-in provider E2E | budget checks, rate-limit/backpressure, lifecycle events, no direct API when disallowed | redacted provider provenance |
53
+ | Package release dry run | npm package contents and release check | `npm pack --dry-run --json && orchestra release check --json` | generated/private state excluded, version/tag policy valid, release readiness complete | package list, release report |
53
54
 
54
55
  ## Required `/tmp` Fixture Patterns
55
56
 
@@ -83,6 +84,16 @@ the packaging/install path is wrong.
83
84
  5. Add focused security and acceptance-evidence E2E only where unit tests cannot
84
85
  prove the user-visible contract.
85
86
 
87
+ ## Opt-In Provider Runtime Batteries
88
+
89
+ Provider-backed runtime batteries are not part of default CI because they may
90
+ need local services or paid credentials. They must still be deterministic enough
91
+ to run on a developer machine. `npm run test:e2e:runtime:ollama` uses a local
92
+ OpenAI-compatible mock endpoint by default to prove the Ollama adapter contract,
93
+ workflow provenance, and no-secret behavior without requiring a real Ollama
94
+ daemon. A separate real-model smoke can be run with `ORCHESTRA_OLLAMA_SMOKE=1`
95
+ when validating a local model installation.
96
+
86
97
  ## Definition Of Done
87
98
 
88
99
  An E2E battery is complete only when it has:
@@ -109,6 +109,14 @@ when release readiness passes. If release readiness is blocked, closure
109
109
  requires `--accepted-risk <text>`. `--dry-run --json` prints planned commands
110
110
  without writing local tasks, comments, or issue state.
111
111
 
112
+ GitHub comment payloads are file-backed. Generated commands use
113
+ `gh issue comment --body-file <payload-file>` for new comments and
114
+ `gh api --input <payload-json-file>` for comment updates instead of embedding
115
+ multiline markdown after `--body` or `-f body=...`. Agents should keep this
116
+ pattern for any copied or derived command: write markdown or JSON payload bytes
117
+ to a temporary file, pass the file path as an argv value, and avoid logging the
118
+ payload contents when command execution fails.
119
+
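A minimal sketch of the file-backed pattern described above, assuming the payload has already been written to a temporary file. Only the `--body-file` and `--input` flags come from the doc; the helper names, the `PATCH` method, and the caller-supplied endpoint are assumptions for illustration.

```javascript
// Hypothetical helpers: build argv arrays that reference a payload file
// instead of embedding multiline markdown in the command line.
function newCommentArgv(issueNumber, payloadFile) {
  return ["issue", "comment", String(issueNumber), "--body-file", payloadFile];
}

function updateCommentArgv(endpoint, payloadJsonFile) {
  // `endpoint` is supplied by the caller; using PATCH here is an assumption.
  return ["api", "--method", "PATCH", endpoint, "--input", payloadJsonFile];
}

// Failure message that names the command and file path only.
// The payload bytes never enter argv, so they cannot leak into this log line.
function describeFailure(argv, exitCode) {
  return `gh ${argv.join(" ")} exited with code ${exitCode}`;
}

const argv = newCommentArgv(42, "/tmp/comment-body.md");
console.log(describeFailure(argv, 1));
```

Because each argv entry is a separate array element, the file path is passed as a single argument without shell quoting concerns.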
112
120
  The transport boundary is intentionally tracker-agnostic. The local CLI can
113
121
  execute `gh` because it is a child process with stable arguments. MCP-backed
114
122
  trackers such as GitHub MCP, Jira, Bitbucket, GitLab, or a custom work tracker
@@ -161,6 +161,8 @@ orchestra runtime session --session STORY-001:claude-cli --action suspend --json
161
161
  orchestra runtime session --session STORY-001:claude-cli --action resume --json
162
162
  orchestra runtime session --session STORY-001:claude-cli --action cancel --json
163
163
  orchestra runtime spawn-request --task STORY-001 --role developer --runtime codex-cli --json
164
+ orchestra runtime parent-actions --task STORY-001 --json
165
+ orchestra runtime parent-actions --task STORY-001 --dispatch --until-idle --runtime codex-cli --timeout 5m --idle-timeout 10s --json
164
166
  orchestra runtime spawn-lifecycle --session STORY-001:manual:developer:codex-cli --status spawned --agent-id <runtime-agent-id> --json
165
167
  ```
166
168
 
@@ -172,13 +174,36 @@ failed, or timed-out events so the parent runtime can reconcile claimed work,
172
174
  spawned agent ids, stale sessions, and handoff state without inventing a second
173
175
  source of truth.
174
176
 
175
- Spawn request JSON includes `parentRuntimeAction`, a structured instruction for
176
- the active parent runtime. Codex receives `kind=codex-spawn-agent` with
177
- `tool=spawn_agent`; Claude receives `kind=claude-agent-request` with
178
- `tool=claude-code-agent`; Cursor receives `kind=cursor-background-agent` with
179
- `tool=cursor-background-agent`. The action points to the prompt artifact,
180
- expected result artifact, ownership paths, allowed commands, and lifecycle
181
- commands. It does not include secrets or direct provider credentials.
177
+ Spawn request JSON and `runtime parent-actions` include `parentRuntimeAction`, a
178
+ structured instruction for the active parent runtime. Codex receives
179
+ `kind=codex-spawn-agent` with `tool=spawn_agent`; Claude receives
180
+ `kind=claude-agent-request` with `tool=claude-code-agent`; Cursor receives
181
+ `kind=cursor-background-agent` with `tool=cursor-background-agent`. The action
182
+ points to the prompt artifact, expected result artifact, ownership paths,
183
+ allowed commands, and lifecycle commands. It does not include secrets or direct
184
+ provider credentials.
185
+
186
+ When `workflow run` pauses with a pending parent runtime action, parent agents
187
+ have two supported paths:
188
+
189
+ - Manual inspection: run `runtime parent-actions --task <id> --json`, inspect
190
+ each requested action, call the active runtime's native tool, then record
191
+ `runtime spawn-lifecycle` with the returned child id.
192
+ - Auto-dispatch: run
193
+ `runtime parent-actions --task <id> --dispatch --until-idle --runtime <runtime-id>`.
194
+ The dispatcher repeatedly inspects pending parent actions, dispatches only
195
+ safe actions for the active runtime, records spawned lifecycle events with
196
+ dispatcher session ids, applies `runtime watch` completions when expected
197
+ handoff artifacts appear, resumes paused workflow runs, and continues across
198
+ later phases until idle or timeout.
199
+
200
+ The auto-dispatch loop is bounded by `--timeout`, `--idle-timeout`, and
201
+ `--interval`, so it never polls forever. It skips queued actions, suspended
202
+ sessions, runtime mismatches, unavailable runtimes, manual/unsupported action
203
+ kinds, and tool mismatches. This keeps the boundary explicit: Orchestra emits
204
+ auditable actions and lifecycle commands; the active parent runtime executes
205
+ native tools such as Codex `spawn_agent`, and the dispatcher only consumes
206
+ actions that are safe for the runtime declared on the command line.
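The bounded loop and skip rules above can be sketched roughly as follows. The kind/tool pairs are the ones listed in this section; the action shape (`status`, `runtime`, `kind`, `tool`), the function names, and the injected `clock`/`sleep` are hypothetical and only exist to keep the sketch deterministic. It does not cover every skip case the doc lists (for example unavailable runtimes).

```javascript
// Kind/tool pairs taken from the doc text; everything else is an assumed shape.
const EXPECTED_TOOLS = new Map([
  ["codex-spawn-agent", "spawn_agent"],
  ["claude-agent-request", "claude-code-agent"],
  ["cursor-background-agent", "cursor-background-agent"],
]);

// Returns a reason string when the action must be skipped, or null when safe.
function skipReason(action, activeRuntime) {
  if (action.status === "queued") return "queued";
  if (action.status === "suspended") return "suspended";
  if (action.runtime !== activeRuntime) return "runtime-mismatch";
  if (!EXPECTED_TOOLS.has(action.kind)) return "unsupported-kind";
  if (EXPECTED_TOOLS.get(action.kind) !== action.tool) return "tool-mismatch";
  return null;
}

// Stops on the overall timeout, or once idleTimeoutMs passes without any dispatch.
function dispatchUntilIdle(opts) {
  const { poll, dispatch, activeRuntime, timeoutMs, idleTimeoutMs, intervalMs, clock, sleep } = opts;
  const start = clock();
  let lastWork = start;
  while (true) {
    const now = clock();
    if (now - start >= timeoutMs) return "timeout";
    if (now - lastWork >= idleTimeoutMs) return "idle";
    const safe = poll().filter((a) => skipReason(a, activeRuntime) === null);
    if (safe.length > 0) {
      safe.forEach((a) => dispatch(a));
      lastWork = clock();
    }
    sleep(intervalMs);
  }
}

// Usage with a fake clock: one safe action, then nothing pending.
let t = 0;
const pending = [[{ status: "pending", runtime: "codex-cli", kind: "codex-spawn-agent", tool: "spawn_agent" }]];
const dispatched = [];
const result = dispatchUntilIdle({
  poll: () => pending.shift() || [],
  dispatch: (a) => dispatched.push(a.kind),
  activeRuntime: "codex-cli",
  timeoutMs: 300000, // mirrors --timeout 5m
  idleTimeoutMs: 10000, // mirrors --idle-timeout 10s
  intervalMs: 1000, // stands in for --interval
  clock: () => t,
  sleep: (ms) => { t += ms; },
});
console.log(result, dispatched);
```

Both exit conditions are checked before each poll, so the loop cannot run past either bound by more than one interval.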
182
207
 
183
208
  ## Native Background Agent Notes
184
209
 
@@ -192,7 +217,10 @@ They need a precise packet and lifecycle hooks:
192
217
  `runtime spawn-lifecycle`.
193
218
  - Codex: render `runtime spawn-request`, read `parentRuntimeAction`, and call
194
219
  the parent `spawn_agent` tool with the prompt artifact as the role-scoped
195
- assignment. Keep the child detached unless the parent is blocked.
220
+ assignment. In workflow auto-consumer mode, use
221
+ `runtime parent-actions --dispatch --until-idle --runtime codex-cli` to
222
+ discover and consume safe actions after the run pauses. Keep the child
223
+ detached unless the parent is blocked.
196
224
  - Cursor: render `runtime spawn-request`, then launch it as a Cursor Background
197
225
  Agent. Background work should stay detached from the current chat and report
198
226
  lifecycle state back to Orchestra before the workflow is resumed.
@@ -245,6 +273,22 @@ parent-agent fallback reason. `subagents` requires runtime-native support and
245
273
  fails fast if the runtime cannot satisfy it. `single-agent` forces the parent
246
274
  agent path and records that choice in phase provenance.
247
275
 
276
+ When no task or role executor is configured and the default executor is
277
+ `generic-runtime`, `auto` and strict `subagents` mode infer the active runtime
278
+ from `OPEN_ORCHESTRA_ACTIVE_RUNTIME`, known parent-runtime environment markers,
279
+ or managed runtime bootstrap files. Codex maps to `codex-cli`, Claude maps to
280
+ `claude-cli`, Cursor maps to `cursor-cli`, Windsurf maps to `windsurf-agent`,
281
+ and VS Code maps to `vscode-agent`.
282
+
283
+ Explicit selections always take precedence in this order: `--runtime`, task
284
+ override, role override, then `runtimePolicy.defaults.executor`. Automatic
285
+ inference never rewrites `.agent-workflow/config.json`; it only affects the
286
+ current planning decision. Set `workflow.phaseExecutionMode` to `single-agent`
287
+ or configure `runtimePolicy.defaults.executor` to override inference for
288
+ deterministic local or CI runs. If `OPEN_ORCHESTRA_ACTIVE_RUNTIME` names an
289
+ unknown runtime, workflow planning fails with supported values and the same
290
+ override options instead of requiring hidden config edits.
291
+
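The precedence order above reads naturally as a short resolution function. This is a sketch under stated assumptions: the runtime ids are the mappings listed in the doc, while the option names, the short marker keys (`codex`, `claude`, and so on), and the accepted forms of the active-runtime value are hypothetical.

```javascript
// Marker-to-runtime mappings as listed in the doc; the short keys are assumed.
const INFERRED_RUNTIMES = new Map([
  ["codex", "codex-cli"],
  ["claude", "claude-cli"],
  ["cursor", "cursor-cli"],
  ["windsurf", "windsurf-agent"],
  ["vscode", "vscode-agent"],
]);

function resolveExecutor({ cliRuntime, taskOverride, roleOverride, defaultExecutor, activeRuntime }) {
  // Explicit selections win in the documented order.
  const explicit = cliRuntime || taskOverride || roleOverride;
  if (explicit) return explicit;
  if (defaultExecutor && defaultExecutor !== "generic-runtime") return defaultExecutor;

  // Inference applies only when the default executor is generic-runtime.
  if (!activeRuntime) return "generic-runtime";
  const supported = [...INFERRED_RUNTIMES.values()];
  if (INFERRED_RUNTIMES.has(activeRuntime)) return INFERRED_RUNTIMES.get(activeRuntime);
  if (supported.includes(activeRuntime)) return activeRuntime;

  // Unknown value: fail with the supported list rather than guessing.
  throw new Error(`Unknown active runtime "${activeRuntime}". Supported: ${supported.join(", ")}`);
}

console.log(resolveExecutor({ defaultExecutor: "generic-runtime", activeRuntime: "codex" }));
console.log(resolveExecutor({ cliRuntime: "claude-cli", activeRuntime: "codex" }));
```

The function returns a value and never writes configuration, which matches the statement that inference only affects the current planning decision.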
248
292
  Subagent spawning is fully asynchronous by default. A spawn request returns the
249
293
  `sessionId`, request artifact, prompt artifact, expected result artifact, status,
250
294
  next lifecycle commands, and quality warnings, then the parent agent should
@@ -331,3 +375,19 @@ orchestra runtime sessions --task <id> --json
331
375
  orchestra runtime spawn-lifecycle --session <id> --status completed --agent-id <id> --json
332
376
  orchestra model providers --json
333
377
  ```
378
+
379
+ ## Ollama E2E
380
+
381
+ The Ollama adapter has an opt-in E2E battery that runs in a temporary workspace
382
+ and uses a local OpenAI-compatible endpoint controlled by the test:
383
+
384
+ ```bash
385
+ npm run test:e2e:runtime:ollama
386
+ ```
387
+
388
+ The test configures `model connect --provider ollama`, runs a developer phase
389
+ through provider-backed execution, validates the request body sent to
390
+ `/v1/chat/completions`, and checks model provenance events. It intentionally
391
+ does not require a real Ollama daemon, so default CI and local development do
392
+ not degrade when Ollama is unavailable. Use `ORCHESTRA_OLLAMA_SMOKE=1` for a
393
+ separate real-model smoke check.
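As an illustration of what "validates the request body" might involve, here is a minimal shape check for an OpenAI-compatible chat request plus a no-secret assertion. The `model` and `messages` fields follow the OpenAI chat completions request format; the helper names, the model name `llama3`, and the restriction to string `content` are assumptions, not the package's actual test code.

```javascript
// Hypothetical validator for the body a mock /v1/chat/completions endpoint receives.
function validateChatRequest(body) {
  const problems = [];
  if (typeof body.model !== "string" || body.model.length === 0) {
    problems.push("model must be a non-empty string");
  }
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    problems.push("messages must be a non-empty array");
  } else {
    body.messages.forEach((message, index) => {
      if (typeof message.role !== "string") problems.push(`messages[${index}].role must be a string`);
      if (typeof message.content !== "string") problems.push(`messages[${index}].content must be a string`);
    });
  }
  return problems;
}

// True when any known secret value appears anywhere in the serialized body.
function containsSecret(body, secrets) {
  const serialized = JSON.stringify(body);
  return secrets.some((secret) => serialized.includes(secret));
}

const captured = {
  model: "llama3", // assumed model name
  messages: [{ role: "user", content: "Implement STORY-001" }],
};
console.log(validateChatRequest(captured), containsSecret(captured, ["example-secret-value"]));
```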
@@ -22,10 +22,13 @@ Required GitHub secret when the GitHub Actions workflow is enabled:
22
22
 
23
23
  - `SONAR_TOKEN`: token for SonarQube Cloud or SonarQube Server.
24
24
 
25
- Optional GitHub secret:
25
+ Optional GitHub secrets:
26
26
 
27
27
  - `SONAR_HOST_URL`: required for self-hosted SonarQube Server. Leave unset for
28
28
  SonarQube Cloud, or set `http://localhost:9000` only for local commands.
29
+ - `CF_ACCESS_CLIENT_ID` and `CF_ACCESS_CLIENT_SECRET`: Cloudflare Access service
30
+ token credentials for GitHub-hosted runners that must reach a private
31
+ self-hosted SonarQube URL protected by Zero Trust.
29
32
 
30
33
  Optional GitHub variables:
31
34
 
@@ -37,6 +40,12 @@ Optional GitHub variables:
37
40
  `workflow_dispatch`.
38
41
  - `SONAR_QUALITY_GATE_WAIT`: set to `true` to fail the workflow when the remote
39
42
  quality gate fails.
43
+ - `SONAR_RUNNER`: set to `self-hosted` to run the Sonar workflow on a local
44
+ runner that can reach the shared SonarQube runtime directly. When this is set,
45
+ the workflow uses `http://localhost:9001` by default and skips Cloudflare
46
+ Access service-token checks.
47
+ - `SONAR_LOCAL_HOST_URL`: optional override for self-hosted runner mode when the
48
+ runner reaches SonarQube through a different local-only URL.
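The interaction between `SONAR_RUNNER`, `SONAR_LOCAL_HOST_URL`, and `SONAR_HOST_URL` can be summarized in a small sketch. The variable names and the `http://localhost:9001` default come from the doc; the function name and the returned shape are hypothetical, and the real workflow logic lives in the GitHub Actions file, which this diff does not show.

```javascript
// Hypothetical resolver mirroring the documented variable semantics.
function resolveSonarTarget(vars) {
  if (vars.SONAR_RUNNER === "self-hosted") {
    return {
      hostUrl: vars.SONAR_LOCAL_HOST_URL || "http://localhost:9001",
      checkAccessServiceToken: false, // self-hosted mode skips Cloudflare Access checks
    };
  }
  return {
    hostUrl: vars.SONAR_HOST_URL, // may be undefined for SonarQube Cloud
    checkAccessServiceToken: true,
  };
}

console.log(resolveSonarTarget({ SONAR_RUNNER: "self-hosted" }));
```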
40
49
 
41
50
  The workflow skips analysis when `SONAR_TOKEN` is not configured. This keeps
42
51
  forks and offline development usable. For private repositories, keep
@@ -55,29 +64,142 @@ gate status. If the scanner can upload analysis but the wait step fails with
55
64
 
56
65
  ## Local SonarQube
57
66
 
58
- Open Orchestra includes `docker-compose.sonar.yml` for local SonarQube
59
- dogfooding:
67
+ Open Orchestra does not own the long-lived local SonarQube containers. The
68
+ shared laptop/VPS runtime lives in `~/dev/sonarqube_jeterrats_dev` so multiple
69
+ projects can use the same SonarQube server without tying its lifecycle to this
70
+ repository.
60
71
 
61
72
  ```bash
62
- docker compose -f docker-compose.sonar.yml up -d
73
+ cd ~/dev/sonarqube_jeterrats_dev
74
+ docker compose up -d
63
75
  ```
64
76
 
65
- Open `http://localhost:9000`, complete the SonarQube first-run setup, create a
66
- project key, and generate a project token. Then run scanner/import commands
67
- against the local host. Example import after analysis is available:
77
+ The shared runtime binds SonarQube to `127.0.0.1:${SONAR_PORT:-9001}` by
78
+ default, persists data in Docker volumes, and routes
79
+ `sonarqube.jterrats.dev` through the Cloudflare Tunnel named
80
+ `open-orchestra-sonar-local`.
81
+ The local database password is a rotated strong value stored only in the shared
82
+ infra `.env` file with owner-only file permissions; do not reset it to the
83
+ default `sonar` password.
84
+
85
+ ```bash
86
+ cd ~/dev/sonarqube_jeterrats_dev
87
+ docker compose ps
88
+ docker compose logs -f sonarqube
89
+ docker compose logs -f cloudflared
90
+ ```
91
+
92
+ This repository keeps only project-specific assets: `sonar-project.properties`,
93
+ scanner scripts, import commands, and release evidence. That separation avoids
94
+ one project accidentally stopping, deleting, changing ports, or rotating
95
+ credentials for every other project using the same SonarQube server.
96
+
97
+ Open `http://localhost:9001`, complete the SonarQube first-run setup if needed,
98
+ create the `jterrats_open-orchestra` project key, and generate a project token.
99
+ The scanner and `orchestra sonar import` both authenticate with the token as
100
+ SonarQube Basic auth (`<token>:`), so the token must be valid for analysis and
101
+ API reads on the target project. Then run scanner/import commands against the
102
+ local or tunnel host. Example local scan:
103
+
104
+ ```bash
105
+ SONAR_HOST_URL=http://localhost:9001 SONAR_TOKEN=<local-token> npm run sonar:scan:local
106
+ ```
107
+
108
+ Example import after analysis is available:
68
109
 
69
110
  ```bash
70
111
  SONAR_TOKEN=<local-token> node bin/orchestra.js sonar import \
71
112
  --provider sonarqube-local \
72
- --host-url http://localhost:9000 \
73
- --project-key open-orchestra \
113
+ --host-url http://localhost:9001 \
114
+ --project-key jterrats_open-orchestra \
74
115
  --branch main \
75
116
  --task GH-368-LOCAL-SONARQUBE-PROVIDER \
76
117
  --json
77
118
  ```
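The token-as-Basic-auth convention mentioned above (`<token>:`) means the token is the username and the password is empty. A minimal sketch of building that header in Node.js, with a hypothetical helper name and a placeholder token:

```javascript
// Hypothetical helper: encode "<token>:" as an HTTP Basic Authorization value.
function sonarBasicAuthHeader(token) {
  const encoded = Buffer.from(`${token}:`, "utf8").toString("base64");
  return `Basic ${encoded}`;
}

// Placeholder token for illustration only; never log a real token.
const header = sonarBasicAuthHeader("example-token");
console.log(header.startsWith("Basic "));
```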
78
119
 
79
- HTTP is accepted only for `sonarqube-local` on localhost. Self-hosted and cloud
80
- hosts must use HTTPS.
120
+ HTTP is accepted only for `sonarqube-local` on localhost. Shared tunnel and
121
+ cloud hosts must use HTTPS.
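The transport rule above can be expressed as a small predicate. This is a sketch, assuming that "localhost" means the hostnames `localhost` and `127.0.0.1`; the function name and the non-local provider name used in the example are hypothetical.

```javascript
// Hypothetical check: HTTPS is always fine, HTTP only for sonarqube-local on localhost.
function isHostUrlAllowed(provider, hostUrl) {
  const url = new URL(hostUrl);
  if (url.protocol === "https:") return true;
  if (url.protocol !== "http:") return false;
  // Assumption: these two hostnames are what counts as "localhost".
  const isLocalHost = url.hostname === "localhost" || url.hostname === "127.0.0.1";
  return provider === "sonarqube-local" && isLocalHost;
}

console.log(isHostUrlAllowed("sonarqube-local", "http://localhost:9001"));
console.log(isHostUrlAllowed("sonarqube-local", "http://sonarqube.jterrats.dev"));
```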
+
+ ### Private Cloudflare Access
+
+ Do not expose SonarQube as a public DNS-only origin. If remote access is needed,
+ use Cloudflare Tunnel with Cloudflare Access so `sonarqube.jterrats.dev` is an
+ authenticated private entry point, not an open public service.
+
+ Minimum Cloudflare setup:
+
+ - Create a tunnel for this laptop or temporary VPS. The current tunnel name is
+ `open-orchestra-sonar-local`.
+ - Route `sonarqube.jterrats.dev` to the Sonar service behind the tunnel. The DNS
+ record is a proxied CNAME to
+ `6fb60222-1427-4ca1-bf11-9e19375d39ff.cfargotunnel.com`.
+ - Protect the hostname with a Cloudflare Access self-hosted application.
+ - Restrict Access to named users, groups, or a short-lived maintainer policy.
+ - Require MFA at the identity provider when possible.
+ - Keep SonarQube itself authenticated; Cloudflare Access is an outer gate, not a
+ replacement for Sonar users and tokens.
+
+ When the tunnel hostname is active, CI can use
+ `SONAR_HOST_URL=https://sonarqube.jterrats.dev` only if the runner has an
+ approved Access path. For GitHub-hosted runners, create a Cloudflare Access
+ service token, add a Service Auth policy scoped to the SonarQube application,
+ and configure `CF_ACCESS_CLIENT_ID` plus `CF_ACCESS_CLIENT_SECRET` as GitHub
+ secrets. The workflow starts an ephemeral localhost proxy that injects those
+ headers for SonarScanner and Orchestra import calls; browser login remains
+ required for human access. The proxy readiness check validates SonarQube through
+ the configured `SONAR_TOKEN` so private SonarQube instances that require
+ authentication do not fail on anonymous health endpoints.
+
+ If the Access service token secrets are not configured, the workflow keeps the
+ normal direct Sonar URL behavior. Use local analysis evidence or a self-hosted
+ runner when GitHub-hosted runners cannot access the private endpoint.
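The header injection described above might look roughly like the following. This is only a sketch: the function name `withAccessHeaders` is hypothetical, and the workflow's actual proxy is not shown in this diff. The header names `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the ones Cloudflare Access documents for service tokens.

```javascript
// Hypothetical helper: returns a copy of the outgoing request headers with the
// Cloudflare Access service token headers added when both secrets are set.
// When either secret is missing, the headers pass through unchanged, which
// corresponds to the "normal direct Sonar URL behavior" described above.
function withAccessHeaders(headers, env) {
  const clientId = env.CF_ACCESS_CLIENT_ID;
  const clientSecret = env.CF_ACCESS_CLIENT_SECRET;
  if (!clientId || !clientSecret) {
    return { ...headers };
  }
  return {
    ...headers,
    "CF-Access-Client-Id": clientId,
    "CF-Access-Client-Secret": clientSecret,
  };
}
```

A localhost proxy would apply such a function to each forwarded request, so the scanner and import commands never need to know about the Access credentials.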
+
+ ### Self-Hosted Runner Mode
+
+ For private local SonarQube on a laptop or low-cost VPS, prefer a self-hosted
+ GitHub Actions runner over exposing the analyzer path through Cloudflare Access.
+ Configure the repository or organization variable:
+
+ ```bash
+ gh variable set SONAR_RUNNER --repo jterratsdev/open-orchestra --body self-hosted
+ ```
+
+ Register the runner with dedicated labels so only the Sonar job can claim it.
+ Do not include OS-specific labels in the workflow unless the Sonar runtime truly
+ depends on that operating system; this keeps the same CI definition usable from
+ a macOS laptop today and a Linux host later.
+
+ ```text
+ self-hosted
+ sonar
+ local-sonar
+ ```
+
+ For the current laptop setup, use the macOS ARM64 runner package and configure
+ the runner with `--labels sonar,local-sonar`. GitHub automatically adds the
+ platform labels such as `self-hosted`, `macOS`, and `ARM64`.
+
+ Keep the shared SonarQube stack running locally:
+
+ ```bash
+ cd ~/dev/sonarqube_jterrats_dev
+ docker compose up -d
+ ```
+
+ When `SONAR_RUNNER=self-hosted`, the workflow resolves SonarQube to
+ `http://localhost:9001` unless `SONAR_LOCAL_HOST_URL` is set. This intentionally
+ ignores `SONAR_HOST_URL`, so organization-level Cloudflare tunnel secrets do not
+ pull local machine analysis back through Zero Trust. Cloudflare Access remains
+ available for human remote browser usage and for GitHub-hosted runner access to
+ private SonarQube only. The CI scan uses
+ `continue-on-error` on the scanner step so Orchestra can still import and upload
+ Sonar evidence when the quality gate fails; a final workflow step re-fails the
+ job after evidence is captured.
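The host resolution rule above can be sketched as a small function. The name `resolveSonarHostUrl` is hypothetical, and returning `SONAR_HOST_URL` unchanged for non-self-hosted runners is an assumption; the workflow's real logic lives in CI configuration that this diff does not show.

```javascript
const DEFAULT_LOCAL_SONAR_URL = "http://localhost:9001";

// Hypothetical resolver: in self-hosted mode SONAR_HOST_URL is ignored on
// purpose, so an organization-level tunnel URL cannot override local analysis.
function resolveSonarHostUrl(env) {
  if (env.SONAR_RUNNER === "self-hosted") {
    return env.SONAR_LOCAL_HOST_URL || DEFAULT_LOCAL_SONAR_URL;
  }
  // Assumption: other runner modes use the configured host URL as-is.
  return env.SONAR_HOST_URL;
}
```

For a containerized runner, setting `SONAR_LOCAL_HOST_URL` to an address that reaches the SonarQube container would take precedence over the localhost default in this sketch.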
+
+ If the runner itself runs inside a container, `localhost` points at the runner
+ container. In that case either run the runner process on the host, attach the
+ runner container to the SonarQube Docker network, or set `SONAR_LOCAL_HOST_URL`
+ to the host/network address that reaches SonarQube without Cloudflare.
 
  Sonar reads TypeScript through `tsconfig.sonar.json`, a standalone analyzer
  config that mirrors the build compiler options but lowers only the analyzer
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@jterrats/open-orchestra",
- "version": "1.0.8",
+ "version": "1.0.9",
  "type": "module",
  "workspaces": [
  "extensions/vscode-open-orchestra",
@@ -18,16 +18,19 @@
  "test:e2e": "npm run build && npm run site:build && playwright test",
  "test:e2e:init": "node --test e2e/init-onboarding.test.js",
  "test:e2e:runtime": "node --test e2e/runtime-manual-queue.test.js",
+ "test:e2e:runtime:ollama": "npm run build && node --test e2e/runtime-ollama-provider.test.js",
  "lint": "eslint . && prettier --check \"{bin,e2e,scripts,test,src}/**/*.js\" \"{site,web-console}/src/**/*.{css,js,jsx}\" \"{site,web-console}/*.{html,js,json}\" \"extensions/**/*.{cjs,json,md}\" \"src/**/*.ts\" \"*.{js,json}\"",
  "format": "prettier --write \"{bin,e2e,scripts,test,src}/**/*.js\" \"{site,web-console}/src/**/*.{css,js,jsx}\" \"{site,web-console}/*.{html,js,json}\" \"extensions/**/*.{cjs,json,md}\" \"src/**/*.ts\" \"*.{js,json}\"",
  "secret-scan": "node scripts/secret-scan.js",
  "security:audit": "node scripts/security-audit.js",
+ "architecture:inventory": "npm run build && node scripts/architecture-debt-inventory.js",
  "duplicates": "jscpd --config .jscpd.json",
  "validate:workflow": "node scripts/validate-workflow.js",
  "release:matrix": "node scripts/release-test-matrix.js",
  "performance:bench": "npm run build && node scripts/performance-benchmark.js",
  "precommit": "npm run lint && npm run typecheck && npm run secret-scan && npm run security:audit && npm test && npm run validate:workflow",
  "prepack": "npm run build",
+ "sonar:scan:local": "sonar-scanner -Dsonar.host.url=${SONAR_HOST_URL:-http://localhost:9001}",
  "hooks:install": "git config core.hooksPath .githooks",
  "build:web": "npm run build:web:legacy && npm run build:web:react",
  "build:web:legacy": "esbuild src/web-console-client.js --bundle --format=esm --platform=browser --target=es2022 --outfile=dist/assets/web-console.js",
@@ -19,6 +19,10 @@ Development work is not complete when code compiles. Every implementation must m
 
  - QA receives the Developer handoff before release approval.
  - QA must produce a test plan covering acceptance criteria, regression areas, edge cases, data setup, and environment assumptions.
+ - QA must block test planning when acceptance criteria are fragmented, non-verifiable, or only role/phase headings. Return those findings to PO/BA before release evidence is generated.
+ - QA plans must include an AC-to-evidence matrix with expected observable result, actual result, artifact/command, and pass/fail/deferred status for each acceptance criterion.
+ - QA must validate that the planned tests exercise the actual risk, not a weaker surrogate. For scope/split, handoff, workflow, runtime, queueing, failback, or release-gate bugs, the test data must create the condition that should trigger the guardrail.
+ - QA must block when the plan substitutes a weaker surface for the requested behavior, such as browser smoke for workflow/CLI behavior, command execution without stdout/files/events checks, or API response checks without receiver-side effects for integrations.
  - QA must execute or explicitly defer each test case with a reason.
  - QA findings must include severity, reproduction steps, expected result, actual result, and evidence.
  - QA execution must be reviewable through a sprint-review-style evidence demo before release approval. Analyst/BA compares the executed evidence against the GitHub issue, user story, acceptance criteria, and Orchestra task; Architect reviews whether the tests cover architecture contracts, boundaries, integrations, data flow, and risk areas.
@@ -29,6 +33,8 @@ Development work is not complete when code compiles. Every implementation must m
 
  - QA and Developer must identify which manual checks should become automated tests.
  - Prefer Playwright for browser-based E2E, smoke, and regression flows.
+ - Automation for every product surface must use isolated deterministic fixtures: web, mobile, desktop, CLI, API, integrations, workflow/runtime, installer, data, and generated-artifact flows. Use the configured E2E fixture path, sandbox org, emulator, container, device farm, or test environment when the user/project provides one; otherwise default local disposable fixtures to `/tmp`.
+ - Automation must assert state transitions and final artifacts. For workflow-style systems, assert the automaton path: valid transition, blocked transition, loop/return transition, resume, and final release transition when applicable. For UI/mobile/API/integration systems, assert the user-visible state, device/responsive behavior, API contract, receiver-side side effect, persisted data, async job/event, or external sandbox state that proves the acceptance criterion.
  - Use Page Object pattern for Playwright suites. Selectors belong in page objects or stable test helpers, not scattered through test bodies.
  - Automated tests must be deterministic and avoid real network, clock, or randomness unless controlled by fixtures, mocks, or seeded data.
 
@@ -30,6 +30,7 @@ DevOps decisions must cover deployability, scalability, downtime strategy, obser
  - Prefer managed services when they reduce operational risk without creating unacceptable lock-in or cost exposure.
  - Record tool choices and major operational trade-offs in an ADR when they affect long-term operations.
  - CI/CD, IaC, runbooks, and operational scripts that repeat command matrices, provider lists, environment maps, or resource collections must load the `collection-standards` skill.
+ - Local Docker stacks must publish ports on `127.0.0.1` unless the task explicitly requires LAN/public access and Security has accepted the risk. Databases, caches, queues, admin UIs, metrics backends, and Docker socket access are private by default.
 
  ## Scalability
  - Define expected traffic, data volume, concurrency, growth assumptions, and bottlenecks.
@@ -35,3 +35,6 @@ These are non-negotiable. Violations must be fixed before code review.
 
  - For databases, encryption, IaC, environment segregation, secrets management, scalability, and vulnerability management, see **infra-data-encryption.mdc**.
  - Production networked services must consider TLS 1.2+, certificate management, HSTS where applicable, secure cookies, least privilege, and secret rotation.
+ - Local Docker Compose, dev servers, databases, caches, observability tools, and admin UIs must bind published ports to `127.0.0.1` by default.
+ - Binding to `0.0.0.0`, `[::]`, or a LAN interface is a security exception. It needs a linked task, explicit rationale, no default credentials, and a time-bounded review.
+ - Never expose Redis, Postgres, admin consoles, Docker socket, or internal APIs on public/LAN interfaces unless Security approves the exact use case and compensating controls.
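The loopback binding rule in the bullets above could be enforced with a check like this one. It is a sketch under assumptions: `bindsToLoopback` is a hypothetical helper, it only handles Docker Compose short-syntax port strings such as `127.0.0.1:9001:9000`, and it treats `[::1]` as an acceptable loopback address even though the rule names only `127.0.0.1`.

```javascript
// Hypothetical check for Compose short-syntax port strings.
// "HOST_IP:HOST_PORT:CONTAINER_PORT" names the host IP explicitly; shorter
// forms omit it, and Docker then publishes on all interfaces by default.
function bindsToLoopback(portMapping) {
  const parts = String(portMapping).split(":");
  if (parts.length < 3) return false;
  // Everything before the last two segments is the host IP (IPv6 contains colons).
  const hostIp = parts.slice(0, parts.length - 2).join(":");
  return hostIp === "127.0.0.1" || hostIp === "[::1]";
}
```

A review script could run this over every published port in a Compose file and flag mappings such as `9001:9000` or `0.0.0.0:9001:9000` as needing a security exception.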
@@ -40,6 +40,11 @@ alwaysApply: true
  - Use the Page Object pattern for UI tests. Selectors live in page objects, not test bodies.
  - Tag tests by speed/scope (`@smoke`, `@regression`) so CI can run fast feedback loops.
  - Capture evidence for E2E failures with traces, screenshots, or videos when supported by the framework.
+ - E2E tests for any product surface must run against isolated disposable fixtures: web apps, mobile apps, desktop apps, CLI, APIs, integrations, workflows, runtimes, installers, file-system flows, data pipelines, and generated artifacts. Use the user/project-configured E2E fixture path, device farm, sandbox org, emulator, container, or test environment when one is explicitly provided; otherwise default local disposable fixtures to `/tmp`. Each test creates its own users/data/project/tasks/roles/acceptance criteria/expected artifacts as applicable; never rely on the developer's current repo state as the tested product state.
+ - QA must choose fixture data that reproduces the risk being validated. If the acceptance criterion is about oversized stories, split decisions, phase returns, queueing, or specialist roles, the E2E must create an oversized/cross-cutting task with enough paths, roles, acceptance criteria, and risk signals to force that behavior.
+ - Every E2E must assert the resulting state, not only that the action executed. Validate UI-visible state, navigation, accessibility, device/responsive behavior, API contracts, receiver-side integration state, database/mock records, files, events, handoff contents, stdout/stderr, exit code, push notifications, background jobs, generated artifacts, or release-readiness output as applicable.
+ - For workflow automata, E2E must validate state transitions and loops: allowed transition, blocked transition, return-to-dev/architect/BA when findings fail, and resume behavior after the corrective state is satisfied.
+ - Include negative and edge scenarios when they are the reason for the change. A happy-path smoke is insufficient for bugs involving guardrails, split detection, security boundaries, QA failback, or release blocking.
  - QA, SDET, Developer, BA, Architect, and Release work that produces or reviews evidence must load the `qa-evidence-pack` skill when it involves acceptance criteria coverage, Playwright/browser artifacts, CLI stdout/stderr, API contracts, integration side effects, screenshots, visual diffs, or annotated defect evidence.
  - Keep large screenshots, videos, traces, logs, API payloads, and visual diffs as files. Summarize them in a compact evidence report so agents do not consume context with raw artifacts.
 
@@ -47,6 +52,10 @@ alwaysApply: true
 
  - Developer must provide QA with test commands run, pass/fail results, covered scenarios, and known gaps.
  - QA must produce a test plan before release approval and map every acceptance criterion to automated, manual, contract/mock, or deferred evidence.
+ - QA must reject fragmented acceptance criteria before planning tests. Role names, phase names, headings, and partial clauses are not executable criteria and must return to PO/BA for rewrite.
+ - QA plans must include an AC-to-evidence matrix with: acceptance criterion, test type, fixture/setup, command or artifact, expected observable result, actual result, and pass/fail/deferred status.
+ - QA must challenge weak tests before approving: if the test fixture is too small to exercise the bug, uses only the happy path, does not create the expected failure/return condition, or does not inspect the artifact/state that proves the acceptance criterion, QA must block or request changes.
+ - QA must reject weaker surrogate tests. Validate the affected product surface directly: workflow/CLI/API/integration/generated artifact/mobile/desktop/data behavior cannot be approved only by a generic browser smoke or command-executed check.
  - QA evidence must validate observable outcomes, not only execution. CLI checks assert exit code, stdout/stderr, files, events, or final state; browser checks assert visible user-facing state; API checks assert response contract and side effects; integration checks assert sandbox/mock/contract/webhook/event/log outcomes or defer with owner and rationale.
  - Evidence summaries or metadata must name the covered acceptance criterion or explicitly state that all acceptance criteria are covered. Smoke and regression checks are useful but do not count as acceptance coverage unless they map to an acceptance criterion.
  - Visual/UI/diagram defect evidence must include source or expected image when available, actual screenshot/render, diff image when practical, and an annotated screenshot for ambiguous failures. Use red boxes for broken bounds/overlap, orange arrows for wrong connectors or flow, yellow translucent areas for excess spacing, blue guide lines for alignment, and short defect labels.