npm - @jterrats/open-orchestra - Versions diffs - 1.0.8 → 1.0.9 - Mend

@jterrats/open-orchestra 1.0.8 → 1.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (109)

package/AGENTS.md +12 -2
package/CLAUDE.md +13 -2
package/dist/acceptance-criteria-quality.d.ts +12 -0
package/dist/acceptance-criteria-quality.js +137 -0
package/dist/acceptance-criteria-quality.js.map +1 -0
package/dist/architecture-debt-inventory.d.ts +31 -0
package/dist/architecture-debt-inventory.js +200 -0
package/dist/architecture-debt-inventory.js.map +1 -0
package/dist/architecture-debt-report.d.ts +2 -0
package/dist/architecture-debt-report.js +28 -0
package/dist/architecture-debt-report.js.map +1 -0
package/dist/autonomous-phase-lifecycle.d.ts +5 -1
package/dist/autonomous-phase-lifecycle.js +87 -17
package/dist/autonomous-phase-lifecycle.js.map +1 -1
package/dist/cli-payloads.d.ts +4 -0
package/dist/cli-payloads.js +24 -0
package/dist/cli-payloads.js.map +1 -0
package/dist/command-manifest.js +3 -1
package/dist/command-manifest.js.map +1 -1
package/dist/command-routes.js +3 -1
package/dist/command-routes.js.map +1 -1
package/dist/commands.d.ts +1 -1
package/dist/commands.js +16 -3
package/dist/commands.js.map +1 -1
package/dist/github.js +22 -7
package/dist/github.js.map +1 -1
package/dist/metrics-commands.js +69 -17
package/dist/metrics-commands.js.map +1 -1
package/dist/phase-executor.js +5 -169
package/dist/phase-executor.js.map +1 -1
package/dist/phase-playbooks.js +17 -0
package/dist/phase-playbooks.js.map +1 -1
package/dist/qa-e2e-artifacts.d.ts +7 -0
package/dist/qa-e2e-artifacts.js +225 -0
package/dist/qa-e2e-artifacts.js.map +1 -0
package/dist/quality-contracts.d.ts +83 -0
package/dist/quality-contracts.js +463 -0
package/dist/quality-contracts.js.map +1 -0
package/dist/refresh-generated.js +81 -28
package/dist/refresh-generated.js.map +1 -1
package/dist/runtime-bootstrap.js +3 -0
package/dist/runtime-bootstrap.js.map +1 -1
package/dist/runtime-commands.d.ts +2 -0
package/dist/runtime-commands.js +186 -1
package/dist/runtime-commands.js.map +1 -1
package/dist/runtime-context-manifest.d.ts +27 -0
package/dist/runtime-context-manifest.js +151 -0
package/dist/runtime-context-manifest.js.map +1 -0
package/dist/runtime-execution-renderer.d.ts +3 -1
package/dist/runtime-execution-renderer.js +7 -1
package/dist/runtime-execution-renderer.js.map +1 -1
package/dist/runtime-execution.d.ts +2 -1
package/dist/runtime-execution.js +162 -2
package/dist/runtime-execution.js.map +1 -1
package/dist/runtime-guardrails.js +5 -1
package/dist/runtime-guardrails.js.map +1 -1
package/dist/runtime-lifecycle-watch.d.ts +93 -0
package/dist/runtime-lifecycle-watch.js +391 -0
package/dist/runtime-lifecycle-watch.js.map +1 -0
package/dist/runtime-parent-actions.d.ts +7 -2
package/dist/runtime-parent-actions.js +132 -1
package/dist/runtime-parent-actions.js.map +1 -1
package/dist/runtime-spawn-bridge.js +21 -1
package/dist/runtime-spawn-bridge.js.map +1 -1
package/dist/sonar-insights.d.ts +1 -0
package/dist/sonar-insights.js +6 -2
package/dist/sonar-insights.js.map +1 -1
package/dist/types/model-config.d.ts +6 -0
package/dist/types/runtime.d.ts +17 -2
package/dist/types/tasks.d.ts +12 -0
package/dist/types.d.ts +1 -1
package/dist/types.js.map +1 -1
package/dist/web-api.js +8 -0
package/dist/web-api.js.map +1 -1
package/dist/web-console/assets/index-DXbrxR_d.js +11 -0
package/dist/web-console/index.html +1 -1
package/dist/workflow-handoff-assessment.d.ts +3 -0
package/dist/workflow-handoff-assessment.js +246 -0
package/dist/workflow-handoff-assessment.js.map +1 -0
package/dist/workflow-handoff-contract.d.ts +32 -0
package/dist/workflow-handoff-contract.js +123 -0
package/dist/workflow-handoff-contract.js.map +1 -0
package/dist/workflow-phase-transition.d.ts +16 -0
package/dist/workflow-phase-transition.js +76 -0
package/dist/workflow-phase-transition.js.map +1 -0
package/dist/workflow-run-commands.js +47 -12
package/dist/workflow-run-commands.js.map +1 -1
package/dist/workflow-services.js +57 -27
package/dist/workflow-services.js.map +1 -1
package/dist/workspace-init-artifacts.d.ts +9 -0
package/dist/workspace-init-artifacts.js +28 -0
package/dist/workspace-init-artifacts.js.map +1 -1
package/dist/workspace-runtime-bootstrap.d.ts +3 -1
package/dist/workspace-runtime-bootstrap.js +8 -3
package/dist/workspace-runtime-bootstrap.js.map +1 -1
package/dist/workspace.d.ts +5 -2
package/dist/workspace.js +44 -15
package/dist/workspace.js.map +1 -1
package/docs/architecture-debt-inventory.md +25 -0
package/docs/e2e-test-batteries.md +34 -23
package/docs/orchestra-mvp.md +8 -0
package/docs/runtime-adapters.md +68 -8
package/docs/sonar-quality-gates.md +133 -11
package/package.json +4 -1
package/rules/delivery-quality-gates.mdc +6 -0
package/rules/devops-tooling.mdc +1 -0
package/rules/security-guardrails.mdc +3 -0
package/rules/testing-discipline.mdc +9 -0
package/dist/web-console/assets/index-CgSKcay8.js +0 -11

package/docs/architecture-debt-inventory.md ADDED

@@ -0,0 +1,25 @@
+# Architecture Debt Inventory
+Open Orchestra includes a report-only architecture debt inventory for spotting files that may need future refactoring.
+Run it with:
+```sh
+npm run architecture:inventory
+```
+For machine-readable output:
+```sh
+npm run build
+node scripts/architecture-debt-inventory.js --json
+```
+The inventory reports:
+- Large files over the configured line threshold.
+- Long functions over the configured function threshold.
+- Command-facing modules that may contain orchestration logic.
+- Module-boundary candidates that mix CLI, filesystem, workflow, and domain concerns.
+This slice is intentionally warn-only. It does not fail CI yet because the current repository still needs threshold tuning and incremental refactor stories. Future enforcement slices can promote selected categories to CI failures once the baseline is reviewed.
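The large-file category described in this hunk can be illustrated with a small sketch. This is not the package's implementation: the function name `findLargeFiles`, the finding shape, and the threshold of 400 lines are all hypothetical, chosen only to show what a report-only check looks like (it collects findings and never throws or exits non-zero).

```javascript
// Hypothetical threshold; the real inventory reads its own configuration.
const DEFAULT_MAX_FILE_LINES = 400;

/**
 * Report files whose line count exceeds a threshold.
 * `files` maps a path to its source text. Nothing here throws or sets an
 * exit code, which mirrors the warn-only behavior described in the doc.
 */
function findLargeFiles(files, maxLines = DEFAULT_MAX_FILE_LINES) {
  const findings = [];
  for (const [path, source] of Object.entries(files)) {
    const lineCount = source.split("\n").length;
    if (lineCount > maxLines) {
      findings.push({ category: "large-file", path, lineCount, maxLines });
    }
  }
  // Largest offenders first so a reviewer sees the worst cases at the top.
  return findings.sort((a, b) => b.lineCount - a.lineCount);
}

// Illustrative in-memory input with a deliberately tiny threshold.
const report = findLargeFiles(
  { "src/small.js": "a\nb", "src/big.js": "x\n".repeat(10) },
  5
);
console.log(JSON.stringify(report, null, 2));
```

Promoting a category to a CI failure later would only require the caller to act on a non-empty result, so the detection logic itself can stay unchanged.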

package/docs/e2e-test-batteries.md CHANGED

@@ -21,35 +21,36 @@ entry points a user or CI runner actually executes.
 ## P0 Release-Blocking Batteries
-| Battery | Scope | Command | Minimum Assertions | Evidence |
-| --- | --- | --- | --- | --- |
-| Source quality | Static checks, build, unit tests, workflow validation, secret scan, security audit | `npm run precommit` | exit code 0, no leaks, no audit blockers, workflow valid | command log |
-| Local CLI onboarding | Current source CLI in `/tmp` workspaces | `ORCHESTRA_NODE_SCRIPT=$PWD/bin/orchestra.js npm run test:e2e:init` | `--version`, `init`, `status`, `validate`, first-use task, handoff, evidence, release readiness | stdout/stderr, JSON output, filesystem assertions |
-| Installed CLI onboarding | Installed or packaged CLI in `/tmp` workspaces | `npm run test:e2e:init` after installing the candidate package | same assertions as local CLI onboarding, proving the packaged binary matches source behavior | stdout/stderr, JSON output, filesystem assertions, package version |
-| Browser console | Web console task, cost, provider, delegation, recovery, evidence, workflow, accessibility, artifacts | `npm run test:e2e` | visible state, API persistence, evidence attachment, lifecycle transitions, responsive/keyboard behavior | Playwright report, screenshots/traces on failure |
-| Public site | Documentation/site navigation, docs catalog, architecture viewer, mobile fit | `npm run test:e2e` | navigation order, local docs catalog search, no raw GitHub redirect for docs, mobile content fit | Playwright report |
-| Runtime manual queue | Manual runtime delegation in a `/tmp` workspace | `npm run test:e2e:runtime` | two active sessions, third manual `spawn-request` materializes `queued`, artifact includes lifecycle commands, `runtime sessions` lists queued session | stdout/stderr, JSON output, artifact content |
-| Init refresh environments | Simulated Codex, Claude, Cursor, generic workspaces | `node --test e2e/init-refresh-environments.test.js` | missing runtime guidance files regenerate on `init --force`, user content is preserved, managed blocks are updated only inside managed ranges | filesystem diff assertions |
-| Workflow lifecycle CLI | CLI workflow run, gate, resume, QA failback, release readiness | `node --test e2e/workflow-lifecycle-cli.test.js` | task phases create handoffs, blocked QA routes back, routine gate resumes immediately, release readiness maps acceptance to evidence | JSON output, events, handoffs |
+| Battery                   | Scope                                                                                                | Command                                                             | Minimum Assertions                                                                                                                                     | Evidence                                                           |
+| ------------------------- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
+| Source quality            | Static checks, build, unit tests, workflow validation, secret scan, security audit                   | `npm run precommit`                                                 | exit code 0, no leaks, no audit blockers, workflow valid                                                                                               | command log                                                        |
+| Local CLI onboarding      | Current source CLI in `/tmp` workspaces                                                              | `ORCHESTRA_NODE_SCRIPT=$PWD/bin/orchestra.js npm run test:e2e:init` | `--version`, `init`, `status`, `validate`, first-use task, handoff, evidence, release readiness                                                        | stdout/stderr, JSON output, filesystem assertions                  |
+| Installed CLI onboarding  | Installed or packaged CLI in `/tmp` workspaces                                                       | `npm run test:e2e:init` after installing the candidate package      | same assertions as local CLI onboarding, proving the packaged binary matches source behavior                                                           | stdout/stderr, JSON output, filesystem assertions, package version |
+| Browser console           | Web console task, cost, provider, delegation, recovery, evidence, workflow, accessibility, artifacts | `npm run test:e2e`                                                  | visible state, API persistence, evidence attachment, lifecycle transitions, responsive/keyboard behavior                                               | Playwright report, screenshots/traces on failure                   |
+| Public site               | Documentation/site navigation, docs catalog, architecture viewer, mobile fit                         | `npm run test:e2e`                                                  | navigation order, local docs catalog search, no raw GitHub redirect for docs, mobile content fit                                                       | Playwright report                                                  |
+| Runtime manual queue      | Manual runtime delegation in a `/tmp` workspace                                                      | `npm run test:e2e:runtime`                                          | two active sessions, third manual `spawn-request` materializes `queued`, artifact includes lifecycle commands, `runtime sessions` lists queued session | stdout/stderr, JSON output, artifact content                       |
+| Init refresh environments | Simulated Codex, Claude, Cursor, generic workspaces                                                  | `node --test e2e/init-refresh-environments.test.js`                 | missing runtime guidance files regenerate on `init --force`, user content is preserved, managed blocks are updated only inside managed ranges          | filesystem diff assertions                                         |
+| Workflow lifecycle CLI    | CLI workflow run, gate, resume, QA failback, release readiness                                       | `node --test e2e/workflow-lifecycle-cli.test.js`                    | task phases create handoffs, blocked QA routes back, routine gate resumes immediately, release readiness maps acceptance to evidence                   | JSON output, events, handoffs                                      |
 ## P1 High-Risk Regression Batteries
-| Battery | Scope | Command | Minimum Assertions | Evidence |
-| --- | --- | --- | --- | --- |
-| Multi-squad runtime | Parallel squad delegation with queue and threshold policy | `node --test e2e/runtime-multi-squad.test.js` | independent sessions, non-blocking parent, queued sessions do not fall back to parent, completion order reconciles | JSON output, lifecycle events |
-| Acceptance evidence | CLI, API, browser, and deferred integration evidence | `node --test e2e/acceptance-evidence.test.js` | evidence maps to named acceptance criteria, deferred external validation requires owner and rationale | evidence artifacts |
-| Recovery and repair | Interrupted runs, stale locks, failed provider phases | `node --test e2e/recovery-cli.test.js` plus browser recovery coverage | recovery detects issue, repair requires confirmation, repaired state is observable | JSON output, before/after state |
-| Docs/site content source | Site content generated from docs and manifest | `npm run site:build && npm run test:e2e -- --grep docs` | docs render as human-friendly catalog, no markdown-only dead ends, search works | Playwright report |
-| Security-sensitive operations | File paths, shell execution, web writes, secrets, telemetry redaction | `node --test e2e/security-boundaries.test.js` | path traversal blocked, unsafe writes rejected, secret-like data redacted, no raw stack traces | command/API evidence |
+| Battery                        | Scope                                                                 | Command                                                               | Minimum Assertions                                                                                                                                                     | Evidence                                                     |
+| ------------------------------ | --------------------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
+| Multi-squad runtime            | Parallel squad delegation with queue and threshold policy             | `node --test e2e/runtime-multi-squad.test.js`                         | independent sessions, non-blocking parent, queued sessions do not fall back to parent, completion order reconciles                                                     | JSON output, lifecycle events                                |
+| Acceptance evidence            | CLI, API, browser, and deferred integration evidence                  | `node --test e2e/acceptance-evidence.test.js`                         | evidence maps to named acceptance criteria, deferred external validation requires owner and rationale                                                                  | evidence artifacts                                           |
+| Recovery and repair            | Interrupted runs, stale locks, failed provider phases                 | `node --test e2e/recovery-cli.test.js` plus browser recovery coverage | recovery detects issue, repair requires confirmation, repaired state is observable                                                                                     | JSON output, before/after state                              |
+| Docs/site content source       | Site content generated from docs and manifest                         | `npm run site:build && npm run test:e2e -- --grep docs`               | docs render as human-friendly catalog, no markdown-only dead ends, search works                                                                                        | Playwright report                                            |
+| Security-sensitive operations  | File paths, shell execution, web writes, secrets, telemetry redaction | `node --test e2e/security-boundaries.test.js`                         | path traversal blocked, unsafe writes rejected, secret-like data redacted, no raw stack traces                                                                         | command/API evidence                                         |
+| Ollama provider-backed runtime | Local OpenAI-compatible Ollama provider route in a `/tmp` workspace   | `npm run test:e2e:runtime:ollama`                                     | `model connect --provider ollama`, provider-backed developer phase, OpenAI-compatible request shape, provider provenance, no runtime subagent credentials in artifacts | stdout/stderr, JSON output, mock provider request, event log |
 ## P2 Extended Confidence Batteries
-| Battery | Scope | Command | Minimum Assertions | Evidence |
-| --- | --- | --- | --- | --- |
-| Tracker and GitHub sync | Issue import/export and close readiness | opt-in CI job with network credentials | labels, comments, close gate, release readiness, no secret exposure | sanitized logs |
-| Sonar quality loop | Local or remote Sonar import and release gate mapping | configured Sonar workflow or local compose job | insights imported, release readiness reflects quality gate, unavailable token is explicit | artifact import report |
-| Provider-backed delegation | OpenAI, Gemini, Ollama, Claude/Cursor runtime bridges | opt-in provider E2E | budget checks, rate-limit/backpressure, lifecycle events, no direct API when disallowed | redacted provider provenance |
-| Package release dry run | npm package contents and release check | `npm pack --dry-run --json && orchestra release check --json` | generated/private state excluded, version/tag policy valid, release readiness complete | package list, release report |
+| Battery                    | Scope                                                 | Command                                                       | Minimum Assertions                                                                        | Evidence                     |
+| -------------------------- | ----------------------------------------------------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | ---------------------------- |
+| Tracker and GitHub sync    | Issue import/export and close readiness               | opt-in CI job with network credentials                        | labels, comments, close gate, release readiness, no secret exposure                       | sanitized logs               |
+| Sonar quality loop         | Local or remote Sonar import and release gate mapping | configured Sonar workflow or local compose job                | insights imported, release readiness reflects quality gate, unavailable token is explicit | artifact import report       |
+| Provider-backed delegation | OpenAI, Gemini, Ollama, Claude/Cursor runtime bridges | opt-in provider E2E                                           | budget checks, rate-limit/backpressure, lifecycle events, no direct API when disallowed   | redacted provider provenance |
+| Package release dry run    | npm package contents and release check                | `npm pack --dry-run --json && orchestra release check --json` | generated/private state excluded, version/tag policy valid, release readiness complete    | package list, release report |
 ## Required `/tmp` Fixture Patterns
@@ -83,6 +84,16 @@ the packaging/install path is wrong.
 5. Add focused security and acceptance-evidence E2E only where unit tests cannot
    prove the user-visible contract.
+## Opt-In Provider Runtime Batteries
+Provider-backed runtime batteries are not part of default CI because they may
+need local services or paid credentials. They must still be deterministic enough
+to run on a developer machine. `npm run test:e2e:runtime:ollama` uses a local
+OpenAI-compatible mock endpoint by default to prove the Ollama adapter contract,
+workflow provenance, and no-secret behavior without requiring a real Ollama
+daemon. A separate real-model smoke can be run with `ORCHESTRA_OLLAMA_SMOKE=1`
+when validating a local model installation.
 ## Definition Of Done
 An E2E battery is complete only when it has:

package/docs/orchestra-mvp.md CHANGED

@@ -109,6 +109,14 @@ when release readiness passes. If release readiness is blocked, closure
 requires `--accepted-risk <text>`. `--dry-run --json` prints planned commands
 without writing local tasks, comments, or issue state.
+GitHub comment payloads are file-backed. Generated commands use
+`gh issue comment --body-file <payload-file>` for new comments and
+`gh api --input <payload-json-file>` for comment updates instead of embedding
+multiline markdown after `--body` or `-f body=...`. Agents should keep this
+pattern for any copied or derived command: write markdown or JSON payload bytes
+to a temporary file, pass the file path as an argv value, and avoid logging the
+payload contents when command execution fails.
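A minimal sketch of the file-backed payload pattern from the paragraph above, assuming Node.js and the `gh issue comment --body-file` flag that the doc names. The helper names `buildIssueCommentCommand` and `describeFailure`, the temp directory prefix, and the issue number are hypothetical. The sketch only builds the argv and does not invoke `gh`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

/**
 * Write a markdown comment body to a private temp file and return the argv
 * for `gh issue comment`. The body never appears in the argument list.
 */
function buildIssueCommentCommand(issueNumber, markdownBody) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "orchestra-comment-"));
  const payloadFile = path.join(dir, "body.md");
  // 0o600 keeps the payload readable by the current user only (POSIX).
  fs.writeFileSync(payloadFile, markdownBody, { encoding: "utf8", mode: 0o600 });
  return {
    payloadFile,
    argv: ["gh", "issue", "comment", String(issueNumber), "--body-file", payloadFile],
  };
}

/** Describe a failure by command name and file path, never by payload contents. */
function describeFailure(command, exitCode) {
  const name = command.argv.slice(0, 3).join(" ");
  return `${name} failed with exit code ${exitCode} (payload file: ${command.payloadFile})`;
}

// Hypothetical issue number and body, for illustration only.
const command = buildIssueCommentCommand(42, "## Status\n\n- multi-line\n- markdown `body`\n");
console.log(command.argv.join(" "));
console.log(describeFailure(command, 1));

// Clean up the temp directory once the command has been executed or abandoned.
fs.rmSync(path.dirname(command.payloadFile), { recursive: true, force: true });
```

Passing the path as a single argv element avoids shell quoting problems with backticks, newlines, and quotes in the markdown, which is the failure mode that `--body` and `-f body=...` invite.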
 The transport boundary is intentionally tracker-agnostic. The local CLI can
 execute `gh` because it is a child process with stable arguments. MCP-backed
 trackers such as GitHub MCP, Jira, Bitbucket, GitLab, or a custom work tracker

package/docs/runtime-adapters.md CHANGED

@@ -161,6 +161,8 @@ orchestra runtime session --session STORY-001:claude-cli --action suspend --json
 orchestra runtime session --session STORY-001:claude-cli --action resume --json
 orchestra runtime session --session STORY-001:claude-cli --action cancel --json
 orchestra runtime spawn-request --task STORY-001 --role developer --runtime codex-cli --json
+orchestra runtime parent-actions --task STORY-001 --json
+orchestra runtime parent-actions --task STORY-001 --dispatch --until-idle --runtime codex-cli --timeout 5m --idle-timeout 10s --json
 orchestra runtime spawn-lifecycle --session STORY-001:manual:developer:codex-cli --status spawned --agent-id <runtime-agent-id> --json
 ```
@@ -172,13 +174,36 @@ failed, or timed-out events so the parent runtime can reconcile claimed work,
 spawned agent ids, stale sessions, and handoff state without inventing a second
 source of truth.
-Spawn request JSON includes `parentRuntimeAction`, a structured instruction for
-the active parent runtime. Codex receives `kind=codex-spawn-agent` with
-`tool=spawn_agent`; Claude receives `kind=claude-agent-request` with
-`tool=claude-code-agent`; Cursor receives `kind=cursor-background-agent` with
-`tool=cursor-background-agent`. The action points to the prompt artifact,
-expected result artifact, ownership paths, allowed commands, and lifecycle
-commands. It does not include secrets or direct provider credentials.
+Spawn request JSON and `runtime parent-actions` include `parentRuntimeAction`, a
+structured instruction for the active parent runtime. Codex receives
+`kind=codex-spawn-agent` with `tool=spawn_agent`; Claude receives
+`kind=claude-agent-request` with `tool=claude-code-agent`; Cursor receives
+`kind=cursor-background-agent` with `tool=cursor-background-agent`. The action
+points to the prompt artifact, expected result artifact, ownership paths,
+allowed commands, and lifecycle commands. It does not include secrets or direct
+provider credentials.
+When `workflow run` pauses with a pending parent runtime action, parent agents
+have two supported paths:
+- Manual inspection: run `runtime parent-actions --task <id> --json`, inspect
+  each requested action, call the active runtime's native tool, then record
+  `runtime spawn-lifecycle` with the returned child id.
+- Auto-dispatch: run
+  `runtime parent-actions --task <id> --dispatch --until-idle --runtime <runtime-id>`.
+  The dispatcher repeatedly inspects pending parent actions, dispatches only
+  safe actions for the active runtime, records spawned lifecycle events with
+  dispatcher session ids, applies `runtime watch` completions when expected
+  handoff artifacts appear, resumes paused workflow runs, and continues across
+  later phases until idle or timeout.
+The auto-dispatch loop is bounded by `--timeout`, `--idle-timeout`, and
+`--interval`, so it never polls forever. It skips queued actions, suspended
+sessions, runtime mismatches, unavailable runtimes, manual/unsupported action
+kinds, and tool mismatches. This keeps the boundary explicit: Orchestra emits
+auditable actions and lifecycle commands; the active parent runtime executes
+native tools such as Codex `spawn_agent`, and the dispatcher only consumes
+actions that are safe for the runtime declared on the command line.
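How `--timeout`, `--idle-timeout`, and `--interval` can bound a polling loop is sketched below. This is an illustration under stated assumptions, not the dispatcher's code: `loopDecision`, `simulate`, and the state fields are hypothetical names, all durations are assumed to be already parsed into milliseconds, and sleeping is simulated by advancing a counter.

```javascript
/** Decide whether a polling loop should keep going. All times are in ms. */
function loopDecision(state, nowMs, limits) {
  if (nowMs - state.startedAt >= limits.timeoutMs) return "timeout";
  if (nowMs - state.lastProgressAt >= limits.idleTimeoutMs) return "idle";
  return "continue";
}

/**
 * Simulated run: each entry in `pollResults` is the number of actions that
 * one poll handled. The clock advances by `intervalMs` between polls.
 */
function simulate(pollResults, limits) {
  let clock = 0;
  const state = { startedAt: 0, lastProgressAt: 0, handled: 0 };
  for (const count of pollResults) {
    if (count > 0) {
      state.handled += count;
      state.lastProgressAt = clock; // progress resets the idle window
    }
    const decision = loopDecision(state, clock, limits);
    if (decision !== "continue") {
      return { reason: decision, handled: state.handled, elapsedMs: clock };
    }
    clock += limits.intervalMs; // stands in for sleeping between polls
  }
  return { reason: "exhausted", handled: state.handled, elapsedMs: clock };
}

// Mirrors `--timeout 5m --idle-timeout 10s` with an assumed 5s interval.
const limits = { timeoutMs: 300000, idleTimeoutMs: 10000, intervalMs: 5000 };
const outcome = simulate([2, 1, 0, 0, 0], limits);
console.log(outcome);
```

Because both exit conditions are checked on every iteration, the loop ends either when no action has been handled for the idle window or when the overall budget is spent, whichever comes first.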
 ## Native Background Agent Notes
@@ -192,7 +217,10 @@ They need a precise packet and lifecycle hooks:
   `runtime spawn-lifecycle`.
 - Codex: render `runtime spawn-request`, read `parentRuntimeAction`, and call
   the parent `spawn_agent` tool with the prompt artifact as the role-scoped
-  assignment. Keep the child detached unless the parent is blocked.
+  assignment. In workflow auto-consumer mode, use
+  `runtime parent-actions --dispatch --until-idle --runtime codex-cli` to
+  discover and consume safe actions after the run pauses. Keep the child
+  detached unless the parent is blocked.
 - Cursor: render `runtime spawn-request`, then launch it as a Cursor Background
   Agent. Background work should stay detached from the current chat and report
   lifecycle state back to Orchestra before the workflow is resumed.
@@ -245,6 +273,22 @@ parent-agent fallback reason. `subagents` requires runtime-native support and
 fails fast if the runtime cannot satisfy it. `single-agent` forces the parent
 agent path and records that choice in phase provenance.
+When no task or role executor is configured and the default executor is
+`generic-runtime`, `auto` and strict `subagents` mode infer the active runtime
+from `OPEN_ORCHESTRA_ACTIVE_RUNTIME`, known parent-runtime environment markers,
+or managed runtime bootstrap files. Codex maps to `codex-cli`, Claude maps to
+`claude-cli`, Cursor maps to `cursor-cli`, Windsurf maps to `windsurf-agent`,
+and VS Code maps to `vscode-agent`.
+Explicit selections always take precedence in this order: `--runtime`, task
+override, role override, then `runtimePolicy.defaults.executor`. Automatic
+inference never rewrites `.agent-workflow/config.json`; it only affects the
+current planning decision. Set `workflow.phaseExecutionMode` to `single-agent`
+or configure `runtimePolicy.defaults.executor` to override inference for
+deterministic local or CI runs. If `OPEN_ORCHESTRA_ACTIVE_RUNTIME` names an
+unknown runtime, workflow planning fails with supported values and the same
+override options instead of requiring hidden config edits.
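The precedence order in the two paragraphs above can be sketched as a single resolver. Several things here are assumptions rather than documented behavior: the function name `resolveExecutor`, the shape of its argument and return value, the idea that `OPEN_ORCHESTRA_ACTIVE_RUNTIME` holds a runtime id such as `codex-cli`, and the fallback to `generic-runtime` when nothing is declared. Inference from parent-runtime environment markers and bootstrap files is omitted.

```javascript
// Runtime ids named in the doc; treated here as the full supported set (assumption).
const SUPPORTED_RUNTIMES = ["codex-cli", "claude-cli", "cursor-cli", "windsurf-agent", "vscode-agent"];

/**
 * Pick an executor. Explicit selections win in the documented order;
 * inference applies only when the default executor is `generic-runtime`.
 */
function resolveExecutor({ cliRuntime, taskOverride, roleOverride, defaultExecutor, env = {} }) {
  const explicit = [
    ["--runtime", cliRuntime],
    ["task override", taskOverride],
    ["role override", roleOverride],
  ];
  for (const [source, value] of explicit) {
    if (value) return { executor: value, source };
  }
  if (defaultExecutor && defaultExecutor !== "generic-runtime") {
    return { executor: defaultExecutor, source: "runtimePolicy.defaults.executor" };
  }
  const declared = env.OPEN_ORCHESTRA_ACTIVE_RUNTIME;
  if (!declared) return { executor: "generic-runtime", source: "default" };
  if (!SUPPORTED_RUNTIMES.includes(declared)) {
    // Fail with the supported values instead of guessing.
    throw new Error(`Unknown runtime "${declared}". Supported values: ${SUPPORTED_RUNTIMES.join(", ")}`);
  }
  // The result affects only this planning decision; no config file is written.
  return { executor: declared, source: "inferred" };
}

const inferred = resolveExecutor({
  defaultExecutor: "generic-runtime",
  env: { OPEN_ORCHESTRA_ACTIVE_RUNTIME: "codex-cli" },
});
console.log(inferred);
```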
 Subagent spawning is fully asynchronous by default. A spawn request returns the
 `sessionId`, request artifact, prompt artifact, expected result artifact, status,
 next lifecycle commands, and quality warnings, then the parent agent should
@@ -331,3 +375,19 @@ orchestra runtime sessions --task <id> --json
 orchestra runtime spawn-lifecycle --session <id> --status completed --agent-id <id> --json
 orchestra model providers --json
 ```
+## Ollama E2E
+The Ollama adapter has an opt-in E2E battery that runs in a temporary workspace
+and uses a local OpenAI-compatible endpoint controlled by the test:
+```bash
+npm run test:e2e:runtime:ollama
+```
+The test configures `model connect --provider ollama`, runs a developer phase
+through provider-backed execution, validates the request body sent to
+`/v1/chat/completions`, and checks model provenance events. It intentionally
+does not require a real Ollama daemon, so default CI and local development do
+not degrade when Ollama is unavailable. Use `ORCHESTRA_OLLAMA_SMOKE=1` for a
+separate real-model smoke check.
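The request-shape validation this section describes could look roughly like the following. It is a sketch, not the test shipped in the package: `validateChatCompletionRequest`, the captured-request shape, the model name, and the `forbiddenValues` parameter are hypothetical. It relies only on the standard OpenAI-compatible contract of a `POST` to `/v1/chat/completions` carrying `model` and `messages`.

```javascript
/**
 * Return a list of problems with a captured chat-completions request.
 * An empty list means the request has the expected shape.
 */
function validateChatCompletionRequest(request, forbiddenValues = []) {
  const problems = [];
  if (request.method !== "POST") problems.push("method must be POST");
  if (request.path !== "/v1/chat/completions") problems.push("path must be /v1/chat/completions");

  const body = request.body || {};
  if (typeof body.model !== "string" || body.model === "") {
    problems.push("model must be a non-empty string");
  }
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    problems.push("messages must be a non-empty array");
  } else {
    body.messages.forEach((message, index) => {
      if (typeof message.role !== "string") problems.push(`messages[${index}].role must be a string`);
      if (typeof message.content !== "string") problems.push(`messages[${index}].content must be a string`);
    });
  }

  // No-secret check: none of the given values may appear in the serialized body.
  const serialized = JSON.stringify(body);
  for (const value of forbiddenValues) {
    if (value && serialized.includes(value)) problems.push("request body contains a forbidden value");
  }
  return problems;
}

// Illustrative captured request, as a mock endpoint might record it.
const captured = {
  method: "POST",
  path: "/v1/chat/completions",
  body: { model: "example-model", messages: [{ role: "user", content: "Implement STORY-001" }] },
};
console.log(validateChatCompletionRequest(captured, ["example-secret-value"]));
```

Returning a list of problems instead of throwing on the first one lets a test print every mismatch in a single run.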

package/docs/sonar-quality-gates.md CHANGED

@@ -22,10 +22,13 @@ Required GitHub secret when the GitHub Actions workflow is enabled:
 - `SONAR_TOKEN`: token for SonarQube Cloud or SonarQube Server.
-Optional GitHub secret:
+Optional GitHub secrets:
 - `SONAR_HOST_URL`: required for self-hosted SonarQube Server. Leave unset for
   SonarQube Cloud, or set `http://localhost:9000` only for local commands.
+- `CF_ACCESS_CLIENT_ID` and `CF_ACCESS_CLIENT_SECRET`: Cloudflare Access service
+  token credentials for GitHub-hosted runners that must reach a private
+  self-hosted SonarQube URL protected by Zero Trust.
 Optional GitHub variables:
@@ -37,6 +40,12 @@ Optional GitHub variables:
   `workflow_dispatch`.
 - `SONAR_QUALITY_GATE_WAIT`: set to `true` to fail the workflow when the remote
   quality gate fails.
+- `SONAR_RUNNER`: set to `self-hosted` to run the Sonar workflow on a local
+  runner that can reach the shared SonarQube runtime directly. When this is set,
+  the workflow uses `http://localhost:9001` by default and skips Cloudflare
+  Access service-token checks.
+- `SONAR_LOCAL_HOST_URL`: optional override for self-hosted runner mode when the
+  runner reaches SonarQube through a different local-only URL.
 The workflow skips analysis when `SONAR_TOKEN` is not configured. This keeps
 forks and offline development usable. For private repositories, keep
@@ -55,29 +64,142 @@ gate status. If the scanner can upload analysis but the wait step fails with
 ## Local SonarQube
-Open Orchestra includes `docker-compose.sonar.yml` for local SonarQube
-dogfooding:
+Open Orchestra does not own the long-lived local SonarQube containers. The
+shared laptop/VPS runtime lives in `~/dev/sonarqube_jeterrats_dev` so multiple
+projects can use the same SonarQube server without tying its lifecycle to this
+repository.
 ```bash
-docker compose -f docker-compose.sonar.yml up -d
+cd ~/dev/sonarqube_jeterrats_dev
+docker compose up -d
 ```
-Open `http://localhost:9000`, complete the SonarQube first-run setup, create a
-project key, and generate a project token. Then run scanner/import commands
-against the local host. Example import after analysis is available:
+The shared runtime binds SonarQube to `127.0.0.1:${SONAR_PORT:-9001}` by
+default, persists data in Docker volumes, and routes
+`sonarqube.jterrats.dev` through the Cloudflare Tunnel named
+`open-orchestra-sonar-local`.
+The local database password is a rotated strong value stored only in the shared
+infra `.env` file with owner-only file permissions; do not reset it to the
+default `sonar` password.
+```bash
+cd ~/dev/sonarqube_jeterrats_dev
+docker compose ps
+docker compose logs -f sonarqube
+docker compose logs -f cloudflared
+```
+This repository keeps only project-specific assets: `sonar-project.properties`,
+scanner scripts, import commands, and release evidence. That separation avoids
+one project accidentally stopping, deleting, changing ports, or rotating
+credentials for every other project using the same SonarQube server.
+Open `http://localhost:9001`, complete the SonarQube first-run setup if needed,
+create the `jterrats_open-orchestra` project key, and generate a project token.
+The scanner and `orchestra sonar import` both authenticate with the token as
+SonarQube Basic auth (`<token>:`), so the token must be valid for analysis and
+API reads on the target project. Then run scanner/import commands against the
+local or tunnel host. Example local scan:
+```bash
+SONAR_HOST_URL=http://localhost:9001 SONAR_TOKEN=<local-token> npm run sonar:scan:local
+```
+Example import after analysis is available:
 ```bash
 SONAR_TOKEN=<local-token> node bin/orchestra.js sonar import \
   --provider sonarqube-local \
-  --host-url http://localhost:9000 \
-  --project-key open-orchestra \
+  --host-url http://localhost:9001 \
+  --project-key jterrats_open-orchestra \
   --branch main \
   --task GH-368-LOCAL-SONARQUBE-PROVIDER \
   --json
 ```
-HTTP is accepted only for `sonarqube-local` on localhost. Self-hosted and cloud
-hosts must use HTTPS.
+HTTP is accepted only for `sonarqube-local` on localhost. Shared tunnel and
+cloud hosts must use HTTPS.
+### Private Cloudflare Access
+Do not expose SonarQube as a public DNS-only origin. If remote access is needed,
+use Cloudflare Tunnel with Cloudflare Access so `sonarqube.jterrats.dev` is an
+authenticated private entry point, not an open public service.
+Minimum Cloudflare setup:
+- Create a tunnel for this laptop or temporary VPS. The current tunnel name is
+  `open-orchestra-sonar-local`.
+- Route `sonarqube.jterrats.dev` to the Sonar service behind the tunnel. The DNS
+  record is a proxied CNAME to
+  `6fb60222-1427-4ca1-bf11-9e19375d39ff.cfargotunnel.com`.
+- Protect the hostname with a Cloudflare Access self-hosted application.
+- Restrict Access to named users, groups, or a short-lived maintainer policy.
+- Require MFA at the identity provider when possible.
+- Keep SonarQube itself authenticated; Cloudflare Access is an outer gate, not a
+  replacement for Sonar users and tokens.
+When the tunnel hostname is active, CI can use
+`SONAR_HOST_URL=https://sonarqube.jterrats.dev` only if the runner has an
+approved Access path. For GitHub-hosted runners, create a Cloudflare Access
+service token, add a Service Auth policy scoped to the SonarQube application,
+and configure `CF_ACCESS_CLIENT_ID` plus `CF_ACCESS_CLIENT_SECRET` as GitHub
+secrets. The workflow starts an ephemeral localhost proxy that injects those
+headers for SonarScanner and Orchestra import calls; browser login remains
+required for human access. The proxy readiness check validates SonarQube through
+the configured `SONAR_TOKEN` so private SonarQube instances that require
+authentication do not fail on anonymous health endpoints.
+If the Access service token secrets are not configured, the workflow keeps the
+normal direct Sonar URL behavior. Use local analysis evidence or a self-hosted
+runner when GitHub-hosted runners cannot access the private endpoint.
+### Self-Hosted Runner Mode
+For private local SonarQube on a laptop or low-cost VPS, prefer a self-hosted
+GitHub Actions runner over exposing the analyzer path through Cloudflare Access.
+Configure the repository or organization variable:
+```bash
+gh variable set SONAR_RUNNER --repo jterratsdev/open-orchestra --body self-hosted
+```
+Register the runner with dedicated labels so only the Sonar job can claim it.
+Do not include OS-specific labels in the workflow unless the Sonar runtime truly
+depends on that operating system; this keeps the same CI definition usable from
+a macOS laptop today and a Linux host later.
+```text
+self-hosted
+sonar
+local-sonar
+```
+For the current laptop setup, use the macOS ARM64 runner package and configure
+the runner with `--labels sonar,local-sonar`. GitHub automatically adds the
+platform labels such as `self-hosted`, `macOS`, and `ARM64`.
+Keep the shared SonarQube stack running locally:
+```bash
+cd ~/dev/sonarqube_jterrats_dev
+docker compose up -d
+```
+When `SONAR_RUNNER=self-hosted`, the workflow resolves SonarQube to
+`http://localhost:9001` unless `SONAR_LOCAL_HOST_URL` is set. This intentionally
+ignores `SONAR_HOST_URL`, so organization-level Cloudflare tunnel secrets do not
+pull local machine analysis back through Zero Trust. Cloudflare Access remains
+available for human remote browser usage and for GitHub-hosted runner access to
+private SonarQube only. The CI scan uses
+`continue-on-error` on the scanner step so Orchestra can still import and upload
+Sonar evidence when the quality gate fails; a final workflow step re-fails the
+job after evidence is captured.
+If the runner itself runs inside a container, `localhost` points at the runner
+container. In that case either run the runner process on the host, attach the
+runner container to the SonarQube Docker network, or set `SONAR_LOCAL_HOST_URL`
+to the host/network address that reaches SonarQube without Cloudflare.
 Sonar reads TypeScript through `tsconfig.sonar.json`, a standalone analyzer
 config that mirrors the build compiler options but lowers only the analyzer
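
The token, transport, and host-selection rules described in this hunk can be sketched in a few lines of bash. This is a minimal illustration, not code from Open Orchestra: the function names `sonar_basic_auth_header`, `is_allowed_sonar_host`, and `resolve_sonar_host` are hypothetical, and the empty fallback for GitHub-hosted runners is an assumption.

```bash
# Hypothetical helpers; names are illustrative and not part of Open Orchestra.

# Build the Basic auth header: the token is the username, the password is empty.
sonar_basic_auth_header() {
  printf 'Authorization: Basic %s' "$(printf '%s:' "$1" | base64 | tr -d '\n')"
}

# Allow plain HTTP only for the sonarqube-local provider on localhost.
# This is pattern matching only; a real check should parse the URL.
is_allowed_sonar_host() {
  local provider="$1" url="$2"
  case "$url" in
    https://*) return 0 ;;
    http://localhost | http://localhost:* | http://localhost/*)
      [ "$provider" = "sonarqube-local" ] ;;
    *) return 1 ;;
  esac
}

# Self-hosted runners ignore SONAR_HOST_URL and default to localhost:9001
# unless SONAR_LOCAL_HOST_URL is set. Other runners use SONAR_HOST_URL as-is
# (assumption: empty when unset).
resolve_sonar_host() {
  if [ "${SONAR_RUNNER:-}" = "self-hosted" ]; then
    printf '%s\n' "${SONAR_LOCAL_HOST_URL:-http://localhost:9001}"
  else
    printf '%s\n' "${SONAR_HOST_URL:-}"
  fi
}
```

The `${VAR:-default}` expansion falls back when the variable is unset or empty, which is the same mechanism the `sonar:scan:local` script relies on for its `http://localhost:9001` default.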

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@jterrats/open-orchestra",
-  "version": "1.0.8",
+  "version": "1.0.9",
   "type": "module",
   "workspaces": [
     "extensions/vscode-open-orchestra",
@@ -18,16 +18,19 @@
     "test:e2e": "npm run build && npm run site:build && playwright test",
     "test:e2e:init": "node --test e2e/init-onboarding.test.js",
     "test:e2e:runtime": "node --test e2e/runtime-manual-queue.test.js",
+    "test:e2e:runtime:ollama": "npm run build && node --test e2e/runtime-ollama-provider.test.js",
     "lint": "eslint . && prettier --check \"{bin,e2e,scripts,test,src}/**/*.js\" \"{site,web-console}/src/**/*.{css,js,jsx}\" \"{site,web-console}/*.{html,js,json}\" \"extensions/**/*.{cjs,json,md}\" \"src/**/*.ts\" \"*.{js,json}\"",
     "format": "prettier --write \"{bin,e2e,scripts,test,src}/**/*.js\" \"{site,web-console}/src/**/*.{css,js,jsx}\" \"{site,web-console}/*.{html,js,json}\" \"extensions/**/*.{cjs,json,md}\" \"src/**/*.ts\" \"*.{js,json}\"",
     "secret-scan": "node scripts/secret-scan.js",
     "security:audit": "node scripts/security-audit.js",
+    "architecture:inventory": "npm run build && node scripts/architecture-debt-inventory.js",
     "duplicates": "jscpd --config .jscpd.json",
     "validate:workflow": "node scripts/validate-workflow.js",
     "release:matrix": "node scripts/release-test-matrix.js",
     "performance:bench": "npm run build && node scripts/performance-benchmark.js",
     "precommit": "npm run lint && npm run typecheck && npm run secret-scan && npm run security:audit && npm test && npm run validate:workflow",
     "prepack": "npm run build",
+    "sonar:scan:local": "sonar-scanner -Dsonar.host.url=${SONAR_HOST_URL:-http://localhost:9001}",
     "hooks:install": "git config core.hooksPath .githooks",
     "build:web": "npm run build:web:legacy && npm run build:web:react",
     "build:web:legacy": "esbuild src/web-console-client.js --bundle --format=esm --platform=browser --target=es2022 --outfile=dist/assets/web-console.js",

package/rules/delivery-quality-gates.mdc CHANGED

@@ -19,6 +19,10 @@ Development work is not complete when code compiles. Every implementation must m
 - QA receives the Developer handoff before release approval.
 - QA must produce a test plan covering acceptance criteria, regression areas, edge cases, data setup, and environment assumptions.
+- QA must block test planning when acceptance criteria are fragmented, non-verifiable, or only role/phase headings. Return those findings to PO/BA before release evidence is generated.
+- QA plans must include an AC-to-evidence matrix with expected observable result, actual result, artifact/command, and pass/fail/deferred status for each acceptance criterion.
+- QA must validate that the planned tests exercise the actual risk, not a weaker surrogate. For scope/split, handoff, workflow, runtime, queueing, failback, or release-gate bugs, the test data must create the condition that should trigger the guardrail.
+- QA must block when the plan substitutes a weaker surface for the requested behavior, such as browser smoke for workflow/CLI behavior, command execution without stdout/files/events checks, or API response checks without receiver-side effects for integrations.
 - QA must execute or explicitly defer each test case with a reason.
 - QA findings must include severity, reproduction steps, expected result, actual result, and evidence.
 - QA execution must be reviewable through a sprint-review-style evidence demo before release approval. Analyst/BA compares the executed evidence against the GitHub issue, user story, acceptance criteria, and Orchestra task; Architect reviews whether the tests cover architecture contracts, boundaries, integrations, data flow, and risk areas.
@@ -29,6 +33,8 @@ Development work is not complete when code compiles. Every implementation must m
 - QA and Developer must identify which manual checks should become automated tests.
 - Prefer Playwright for browser-based E2E, smoke, and regression flows.
+- Automation for every product surface must use isolated deterministic fixtures: web, mobile, desktop, CLI, API, integrations, workflow/runtime, installer, data, and generated-artifact flows. Use the configured E2E fixture path, sandbox org, emulator, container, device farm, or test environment when the user/project provides one; otherwise default local disposable fixtures to `/tmp`.
+- Automation must assert state transitions and final artifacts. For workflow-style systems, assert the automaton path: valid transition, blocked transition, loop/return transition, resume, and final release transition when applicable. For UI/mobile/API/integration systems, assert the user-visible state, device/responsive behavior, API contract, receiver-side side effect, persisted data, async job/event, or external sandbox state that proves the acceptance criterion.
 - Use Page Object pattern for Playwright suites. Selectors belong in page objects or stable test helpers, not scattered through test bodies.
 - Automated tests must be deterministic and avoid real network, clock, or randomness unless controlled by fixtures, mocks, or seeded data.
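
The fixture and assertion rules added in this hunk could look roughly like the following for a CLI surface. It is a sketch under stated assumptions: `run_under_test` is a hypothetical stand-in for the real command, and `e2e_check` is an illustrative name. The check creates a disposable fixture under `/tmp`, then asserts exit code, stdout, and the final artifact rather than only that the command ran.

```bash
# Hypothetical stand-in for the command under test: writes a report file
# into the fixture directory and prints one status line.
run_under_test() {
  printf 'status=done\n' > "$1/report.txt"
  printf 'wrote report.txt\n'
}

# Run the command in a disposable /tmp fixture and assert observable outcomes.
e2e_check() {
  local fixture stdout status result=0
  fixture="$(mktemp -d /tmp/e2e-fixture.XXXXXX)" || return 1

  stdout="$(run_under_test "$fixture")"
  status=$?

  [ "$status" -eq 0 ] || result=1                              # exit code
  [ "$stdout" = "wrote report.txt" ] || result=1               # stdout
  grep -q '^status=done$' "$fixture/report.txt" || result=1    # final artifact

  rm -rf "$fixture"   # the fixture never outlives the check
  return "$result"
}
```

Calling `e2e_check` returns non-zero if any of the three assertions fails, so a runner can treat it like any other test command.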

package/rules/devops-tooling.mdc CHANGED

@@ -30,6 +30,7 @@ DevOps decisions must cover deployability, scalability, downtime strategy, obser
 - Prefer managed services when they reduce operational risk without creating unacceptable lock-in or cost exposure.
 - Record tool choices and major operational trade-offs in an ADR when they affect long-term operations.
 - CI/CD, IaC, runbooks, and operational scripts that repeat command matrices, provider lists, environment maps, or resource collections must load the `collection-standards` skill.
+- Local Docker stacks must publish ports on `127.0.0.1` unless the task explicitly requires LAN/public access and Security has accepted the risk. Databases, caches, queues, admin UIs, metrics backends, and Docker socket access are private by default.
 ## Scalability
 - Define expected traffic, data volume, concurrency, growth assumptions, and bottlenecks.

package/rules/security-guardrails.mdc CHANGED

@@ -35,3 +35,6 @@ These are non-negotiable. Violations must be fixed before code review.
 - For databases, encryption, IaC, environment segregation, secrets management, scalability, and vulnerability management, see **infra-data-encryption.mdc**.
 - Production networked services must consider TLS 1.2+, certificate management, HSTS where applicable, secure cookies, least privilege, and secret rotation.
+- Local Docker Compose, dev servers, databases, caches, observability tools, and admin UIs must bind published ports to `127.0.0.1` by default.
+- Binding to `0.0.0.0`, `[::]`, or a LAN interface is a security exception. It needs a linked task, explicit rationale, no default credentials, and a time-bounded review.
+- Never expose Redis, Postgres, admin consoles, Docker socket, or internal APIs on public/LAN interfaces unless Security approves the exact use case and compensating controls.
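
As a rough illustration of the loopback rule above, the following bash sketch flags Compose short-syntax port mappings that do not pin the host side to `127.0.0.1`. The helper name `is_loopback_publish` and the sample mappings are hypothetical, and the check covers only the `IP:HOST:CONTAINER` short form.

```bash
# Hypothetical check: succeed only when a Compose short-syntax port mapping
# pins the host side to 127.0.0.1 (form IP:HOST_PORT:CONTAINER_PORT).
is_loopback_publish() {
  case "$1" in
    127.0.0.1:*:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Illustrative mappings; none are taken from the package.
for mapping in "127.0.0.1:9001:9000" "9001:9000" "0.0.0.0:9001:9000"; do
  if is_loopback_publish "$mapping"; then
    echo "ok      $mapping"
  else
    echo "review  $mapping"
  fi
done
```

A mapping without a host IP, such as `9001:9000`, is published on all interfaces by Docker's default, so it is reported for review together with explicit `0.0.0.0` bindings.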

package/rules/testing-discipline.mdc CHANGED

@@ -40,6 +40,11 @@ alwaysApply: true
 - Use the Page Object pattern for UI tests. Selectors live in page objects, not test bodies.
 - Tag tests by speed/scope (`@smoke`, `@regression`) so CI can run fast feedback loops.
 - Capture evidence for E2E failures with traces, screenshots, or videos when supported by the framework.
+- E2E tests for any product surface must run against isolated disposable fixtures: web apps, mobile apps, desktop apps, CLI, APIs, integrations, workflows, runtimes, installers, file-system flows, data pipelines, and generated artifacts. Use the user/project-configured E2E fixture path, device farm, sandbox org, emulator, container, or test environment when one is explicitly provided; otherwise default local disposable fixtures to `/tmp`. Each test creates its own users/data/project/tasks/roles/acceptance criteria/expected artifacts as applicable; never rely on the developer's current repo state as the tested product state.
+- QA must choose fixture data that reproduces the risk being validated. If the acceptance criterion is about oversized stories, split decisions, phase returns, queueing, or specialist roles, the E2E must create an oversized/cross-cutting task with enough paths, roles, acceptance criteria, and risk signals to force that behavior.
+- Every E2E must assert the resulting state, not only that the action executed. Validate UI-visible state, navigation, accessibility, device/responsive behavior, API contracts, receiver-side integration state, database/mock records, files, events, handoff contents, stdout/stderr, exit code, push notifications, background jobs, generated artifacts, or release-readiness output as applicable.
+- For workflow automata, E2E must validate state transitions and loops: allowed transition, blocked transition, return-to-dev/architect/BA when findings fail, and resume behavior after the corrective state is satisfied.
+- Include negative and edge scenarios when they are the reason for the change. A happy-path smoke is insufficient for bugs involving guardrails, split detection, security boundaries, QA failback, or release blocking.
 - QA, SDET, Developer, BA, Architect, and Release work that produces or reviews evidence must load the `qa-evidence-pack` skill when it involves acceptance criteria coverage, Playwright/browser artifacts, CLI stdout/stderr, API contracts, integration side effects, screenshots, visual diffs, or annotated defect evidence.
 - Keep large screenshots, videos, traces, logs, API payloads, and visual diffs as files. Summarize them in a compact evidence report so agents do not consume context with raw artifacts.
@@ -47,6 +52,10 @@ alwaysApply: true
 - Developer must provide QA with test commands run, pass/fail results, covered scenarios, and known gaps.
 - QA must produce a test plan before release approval and map every acceptance criterion to automated, manual, contract/mock, or deferred evidence.
+- QA must reject fragmented acceptance criteria before planning tests. Role names, phase names, headings, and partial clauses are not executable criteria and must return to PO/BA for rewrite.
+- QA plans must include an AC-to-evidence matrix with: acceptance criterion, test type, fixture/setup, command or artifact, expected observable result, actual result, and pass/fail/deferred status.
+- QA must challenge weak tests before approving: if the test fixture is too small to exercise the bug, uses only the happy path, does not create the expected failure/return condition, or does not inspect the artifact/state that proves the acceptance criterion, QA must block or request changes.
+- QA must reject weaker surrogate tests. Validate the affected product surface directly: workflow/CLI/API/integration/generated artifact/mobile/desktop/data behavior cannot be approved only by a generic browser smoke or command-executed check.
 - QA evidence must validate observable outcomes, not only execution. CLI checks assert exit code, stdout/stderr, files, events, or final state; browser checks assert visible user-facing state; API checks assert response contract and side effects; integration checks assert sandbox/mock/contract/webhook/event/log outcomes or defer with owner and rationale.
 - Evidence summaries or metadata must name the covered acceptance criterion or explicitly state that all acceptance criteria are covered. Smoke and regression checks are useful but do not count as acceptance coverage unless they map to an acceptance criterion.
 - Visual/UI/diagram defect evidence must include source or expected image when available, actual screenshot/render, diff image when practical, and an annotated screenshot for ambiguous failures. Use red boxes for broken bounds/overlap, orange arrows for wrong connectors or flow, yellow translucent areas for excess spacing, blue guide lines for alignment, and short defect labels.