npm - @jterrats/open-orchestra - Versions diffs - 1.0.2 → 1.0.4 - Mend

@jterrats/open-orchestra 1.0.2 → 1.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (151)

package/AGENTS.md +7 -2
package/CLAUDE.md +2 -2
package/README.md +3 -0
package/dist/args.js +12 -2
package/dist/args.js.map +1 -1
package/dist/assets/web-console.js +44 -0
package/dist/autonomous-phase-lifecycle.js +23 -3
package/dist/autonomous-phase-lifecycle.js.map +1 -1
package/dist/autonomous-run-state.js +2 -0
package/dist/autonomous-run-state.js.map +1 -1
package/dist/benchmark.js +6 -0
package/dist/benchmark.js.map +1 -1
package/dist/cli.js +4 -1
package/dist/cli.js.map +1 -1
package/dist/command-manifest.js +4 -3
package/dist/command-manifest.js.map +1 -1
package/dist/command-utils.js +4 -5
package/dist/command-utils.js.map +1 -1
package/dist/commands.d.ts +1 -1
package/dist/commands.js +1 -1
package/dist/commands.js.map +1 -1
package/dist/metrics-commands.js +8 -0
package/dist/metrics-commands.js.map +1 -1
package/dist/phase-playbooks.js +27 -1
package/dist/phase-playbooks.js.map +1 -1
package/dist/roles/core-roles.js +10 -5
package/dist/roles/core-roles.js.map +1 -1
package/dist/skills-catalog.js +136 -0
package/dist/skills-catalog.js.map +1 -1
package/dist/skills-commands.d.ts +1 -0
package/dist/skills-commands.js +37 -1
package/dist/skills-commands.js.map +1 -1
package/dist/skills-planning.d.ts +2 -1
package/dist/skills-planning.js +79 -11
package/dist/skills-planning.js.map +1 -1
package/dist/skills.d.ts +1 -1
package/dist/skills.js +1 -1
package/dist/skills.js.map +1 -1
package/dist/task-graph-commands.js +36 -8
package/dist/task-graph-commands.js.map +1 -1
package/dist/types/metrics.d.ts +2 -0
package/dist/types/skills.d.ts +9 -0
package/dist/types/tasks.d.ts +8 -1
package/dist/types.d.ts +2 -2
package/dist/types.js.map +1 -1
package/dist/web-api.js +80 -7
package/dist/web-api.js.map +1 -1
package/dist/workflow-approval-service.js +13 -0
package/dist/workflow-approval-service.js.map +1 -1
package/dist/workflow-evidence-service.js +37 -2
package/dist/workflow-evidence-service.js.map +1 -1
package/dist/workflow-gates.js +56 -1
package/dist/workflow-gates.js.map +1 -1
package/dist/workflow-phase-planner.js +86 -13
package/dist/workflow-phase-planner.js.map +1 -1
package/dist/workflow-run-commands.d.ts +1 -0
package/dist/workflow-run-commands.js +11 -6
package/dist/workflow-run-commands.js.map +1 -1
package/dist/workflow-services.js +24 -0
package/dist/workflow-services.js.map +1 -1
package/dist/workflow-task-service.js +27 -2
package/dist/workflow-task-service.js.map +1 -1
package/docs/adoption-guide.md +22 -1
package/docs/advisory-supervisor-architecture.md +206 -0
package/docs/architecture.md +47 -41
package/docs/autonomous-workflow.md +2 -2
package/docs/backlog/ac-evidence-bugfix-stories-20260517.md +76 -0
package/docs/backlog/chaos-testing-stack-strategy.md +146 -0
package/docs/backlog/dev-best-practices-hardening-story.md +69 -0
package/docs/backlog/docs-public-internal-package-hygiene-story.md +62 -0
package/docs/backlog/project-persona-registry-epic.md +350 -0
package/docs/backlog/prompt-bank-registry-epic.md +159 -0
package/docs/backlog/site-docs-manifest-story.md +56 -0
package/docs/dev-team-specialist-role-profiles.md +1 -1
package/docs/diagrams/diagram-master-prompt.md +207 -0
package/docs/diagrams/enterprise-set/README.md +22 -0
package/docs/diagrams/enterprise-set/lead-to-account-swimlanes.svg +38 -0
package/docs/diagrams/enterprise-set/product-implementation-timeline.svg +45 -0
package/docs/diagrams/enterprise-set/salesforce-enterprise-architecture.svg +54 -0
package/docs/diagrams/experiments/pixel-v2-review.md +124 -0
package/docs/diagrams/experiments/roadmap/diagram.mmd +14 -0
package/docs/diagrams/experiments/roadmap/diagram.svg +48 -0
package/docs/diagrams/experiments/roadmap/experiment.md +44 -0
package/docs/diagrams/experiments/sfdc-implementation/diagram.mmd +54 -0
package/docs/diagrams/experiments/sfdc-implementation/diagram.svg +72 -0
package/docs/diagrams/experiments/sfdc-implementation/experiment.md +41 -0
package/docs/diagrams/experiments/swimlane/diagram.mmd +40 -0
package/docs/diagrams/experiments/swimlane/diagram.svg +70 -0
package/docs/diagrams/experiments/swimlane/experiment.md +50 -0
package/docs/diagrams/experiments/timeline/diagram.mmd +9 -0
package/docs/diagrams/experiments/timeline/diagram.svg +29 -0
package/docs/diagrams/experiments/timeline/experiment.md +34 -0
package/docs/diagrams/final-artifact-hygiene.md +40 -0
package/docs/diagrams/mermaid-target-strategy.md +106 -0
package/docs/diagrams/payment-gateway/architecture.md +57 -0
package/docs/diagrams/payment-gateway/architecture.mmd +39 -0
package/docs/diagrams/payment-gateway/architecture.svg +171 -0
package/docs/diagrams/prompt-bank.md +48 -0
package/docs/diagrams/salesforce-integration/architecture.md +56 -0
package/docs/diagrams/salesforce-integration/architecture.mmd +26 -0
package/docs/diagrams/salesforce-integration/architecture.svg +123 -0
package/docs/diagrams/source-fidelity-review.md +116 -0
package/docs/diagrams/state-uml-recreated.drawio +336 -0
package/docs/diagrams/state-uml-recreated.prompt.md +114 -0
package/docs/diagrams/state-uml-recreated.prompt.v10.md +52 -0
package/docs/diagrams/state-uml-recreated.prompt.v11.md +52 -0
package/docs/diagrams/state-uml-recreated.prompt.v12.md +50 -0
package/docs/diagrams/state-uml-recreated.prompt.v14.md +91 -0
package/docs/diagrams/state-uml-recreated.prompt.v2.md +31 -0
package/docs/diagrams/state-uml-recreated.prompt.v3.md +36 -0
package/docs/diagrams/state-uml-recreated.prompt.v4.md +35 -0
package/docs/diagrams/state-uml-recreated.prompt.v5.md +35 -0
package/docs/diagrams/state-uml-recreated.prompt.v6.md +39 -0
package/docs/diagrams/state-uml-recreated.prompt.v7.md +37 -0
package/docs/diagrams/state-uml-recreated.prompt.v8.md +41 -0
package/docs/diagrams/state-uml-recreated.prompt.v9.md +32 -0
package/docs/diagrams/state-uml-recreated.svg +159 -0
package/docs/diagrams/v14-stress-test/README.md +33 -0
package/docs/diagrams/v14-stress-test/stress-test.svg +114 -0
package/docs/external-artifact-import-bridge.md +56 -0
package/docs/{setup-agents-applicability-review.md → external-baseline-applicability-review.md} +37 -40
package/docs/{setup-agents-dogfooding-findings.md → external-baseline-dogfooding-findings.md} +10 -9
package/docs/multi-agent-orchestrator-backlog.md +1 -1
package/docs/orchestra-mvp.md +19 -0
package/docs/persona-workflows.md +42 -0
package/docs/release-test-matrix.md +21 -9
package/docs/reports/ac-evidence-backfill-20260517.md +256 -0
package/docs/reports/ac-evolution-reconciliation-20260517.md +366 -0
package/docs/reports/ac-failure-evidence-20260517.md +115 -0
package/docs/reports/ac-history-dry-run-20260517.md +434 -0
package/docs/runtime-llm-flow.md +8 -0
package/docs/site-content-workflow.md +96 -0
package/docs/site-manifest.json +143 -0
package/docs/skill-loading-strategy.md +18 -7
package/docs/story-mapping-adoption-review.md +99 -0
package/docs/workspace-repo-strategy.md +63 -0
package/package.json +3 -1
package/rules/agent-collaboration.mdc +2 -0
package/rules/code-review-engineering.mdc +2 -0
package/rules/delivery-quality-gates.mdc +12 -0
package/rules/development-engineering.mdc +3 -0
package/rules/diagram-quality.mdc +35 -0
package/rules/module-boundaries.mdc +71 -0
package/rules/testing-discipline.mdc +13 -0
package/skills/chaos-resilience-testing/SKILL.md +127 -0
package/skills/chaos-resilience-testing/manifest.json +61 -0
package/skills/collection-standards/SKILL.md +2 -0
package/skills/diagram-export/SKILL.md +30 -0
package/skills/qa-evidence-pack/SKILL.md +110 -0
package/skills/qa-evidence-pack/manifest.json +60 -0
package/docs/setup-agents-bridge.md +0 -61

package/rules/module-boundaries.mdc ADDED

@@ -0,0 +1,71 @@
+---
+description: Module boundaries, god-file prevention, and thin adapter standards
+alwaysApply: true
+---
+# Module Boundaries
+Every code change must preserve clear ownership boundaries. Before adding code
+to an existing file, check whether the file is already large, multi-purpose, or
+adapter-shaped. If the change would make the file harder to review, create or
+reuse the correct domain, model, service, repository, or adapter module instead.
+## Pre-Write Check
+- Inspect the target file's current responsibility, exported surface, and size
+  before editing.
+- Treat files over 300 lines, functions over 30 lines, and command/controller
+  files with business logic as god-file risk.
+- A large existing file is not a reason to keep adding to it. If the new change
+  is separable, extract the new behavior into a focused module and wire it from
+  the existing entry point.
+- If extraction is unsafe in the current task, record a follow-up debt task with
+  the reason, affected file, and proposed boundary.
+## Expected Layers
+- `model` or `types`: narrow public data contracts, discriminated unions,
+  schemas, and DTOs.
+- `domain`: pure invariants, policy decisions, validation rules, state
+  transitions, and calculations.
+- `service` or `use-case`: orchestration of domain logic, repositories, clients,
+  and side effects for one workflow.
+- `repository`, `store`, or `gateway`: persistence, file I/O, network I/O, and
+  external system adapters.
+- `commands`: CLI adapter only. Parse arguments, call services, format output,
+  and convert errors to user-safe messages.
+- `web` or `api`: HTTP/UI adapter only. Parse requests, call services, serialize
+  responses, and map errors.
+## Logicless Commands
+Command modules must remain nearly logicless. They may:
+- parse flags and positional arguments;
+- choose output format;
+- call one service/use-case function;
+- map expected errors to CLI messages and exit codes.
+Command modules must not:
+- own business rules or workflow policy;
+- perform direct persistence when a repository/service should own it;
+- contain repeated hardcoded registries, option lists, status sets, or provider
+  matrices;
+- implement complex loops, joins, retries, or batching;
+- become the primary test target for domain behavior.
+## Hardcoded Collections
+Repeated hardcoded values must move to a typed source of truth. This applies to
+roles, statuses, providers, commands, option lists, validators, selectors,
+fixtures, CI matrices, and any key/value collection reused by more than one
+consumer. Load `collection-standards` when this risk appears.
+## Review Checklist
+- Did the author check target file size and responsibility before writing?
+- Did new logic land in the correct layer?
+- Is the command/controller/route still a thin adapter?
+- Are repeated hardcoded collections extracted to a typed source of truth?
+- Is there a follow-up debt task when extraction was intentionally deferred?
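
Reviewer note: the layering rules in this file can be illustrated with a small sketch. This is a minimal example under stated assumptions, not code from the package: `TASK_STATUSES`, `TaskRepository`, `setTaskStatus`, and `runSetStatusCommand` are hypothetical names, and the exit codes are arbitrary choices. It shows a typed source of truth for a repeated collection and a command adapter that only parses arguments, calls one service function, and maps errors.

```typescript
// Hypothetical names throughout; none of these are package APIs.

// model/types layer: one typed source of truth for a repeated collection.
const TASK_STATUSES = ["todo", "in_progress", "done"] as const;
type TaskStatus = (typeof TASK_STATUSES)[number];

function isTaskStatus(value: string): value is TaskStatus {
  return TASK_STATUSES.some((known) => known === value);
}

// repository layer: persistence sits behind an interface so tests can inject a fake.
interface TaskRepository {
  save(taskId: string, status: TaskStatus): void;
}

// service layer: owns the workflow rule (only known statuses are persisted).
function setTaskStatus(repo: TaskRepository, taskId: string, status: string): TaskStatus {
  if (!isTaskStatus(status)) {
    throw new Error(`Unknown status "${status}". Expected one of: ${TASK_STATUSES.join(", ")}`);
  }
  repo.save(taskId, status);
  return status;
}

// commands layer: parse arguments, call one service, map errors to exit codes.
interface CommandResult {
  exitCode: number;
  output: string;
}

function runSetStatusCommand(argv: string[], repo: TaskRepository): CommandResult {
  const [taskId, status] = argv;
  if (!taskId || !status) {
    return { exitCode: 2, output: "usage: set-status <task-id> <status>" };
  }
  try {
    const saved = setTaskStatus(repo, taskId, status);
    return { exitCode: 0, output: `${taskId} -> ${saved}` };
  } catch (error) {
    return { exitCode: 1, output: error instanceof Error ? error.message : String(error) };
  }
}
```

With this split, domain behavior is tested against `setTaskStatus` and a fake repository, so the command module does not become the primary test target.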

package/rules/testing-discipline.mdc CHANGED

@@ -6,40 +6,53 @@ alwaysApply: true
 # Testing Discipline
 ## Test-Driven Development (TDD)
 - Write the test **before** or **alongside** the implementation. At minimum, tests must exist before the PR.
 - Red → Green → Refactor. Start with a failing test, make it pass with minimal code, then clean up.
 - Every development task must include unit tests for new or changed business logic before it is handed to QA.
 ## Behavior-Driven Development (BDD)
 - Test **behavior**, not implementation. Test what the function does, not how it does it.
 - Name tests as specifications: `it('rejects orders with zero quantity')`, not `it('test1')`.
 - One assertion per test method. If you need multiple, it's multiple behaviors — split them.
 ## Test Structure
 - **Arrange → Act → Assert.** Separate setup, execution, and verification with blank lines.
 - Use factory functions or builders for test data — never copy-paste fixtures across test files.
 - QA automation, E2E suites, contract tests, and test scripts that repeat fixture collections, selectors, expected outputs, or command matrices must load the `collection-standards` skill.
 - Tests must be deterministic. No reliance on system clock, network, or random values without seeding.
 ## Sync Tests
 - If data is duplicated across packages (e.g., type definitions, config arrays), a test must assert both copies are identical.
 - Schema changes in a source of truth must break a test somewhere — if they don't, add one.
 ## Coverage
 - Target **90%+ line coverage** for business logic. Infrastructure/glue code can be lower.
 - Coverage is a floor, not a goal. 100% coverage with bad assertions is worse than 80% with good ones.
 ## E2E / Integration
 - Prefer Playwright for browser-based E2E, smoke, and regression automation.
 - Use the Page Object pattern for UI tests. Selectors live in page objects, not test bodies.
 - Tag tests by speed/scope (`@smoke`, `@regression`) so CI can run fast feedback loops.
 - Capture evidence for E2E failures with traces, screenshots, or videos when supported by the framework.
+- QA, SDET, Developer, BA, Architect, and Release work that produces or reviews evidence must load the `qa-evidence-pack` skill when it involves acceptance criteria coverage, Playwright/browser artifacts, CLI stdout/stderr, API contracts, integration side effects, screenshots, visual diffs, or annotated defect evidence.
+- Keep large screenshots, videos, traces, logs, API payloads, and visual diffs as files. Summarize them in a compact evidence report so agents do not consume context with raw artifacts.
 ## QA Handoff
 - Developer must provide QA with test commands run, pass/fail results, covered scenarios, and known gaps.
 - QA must produce a test plan before release approval and map every acceptance criterion to automated, manual, contract/mock, or deferred evidence.
 - QA evidence must validate observable outcomes, not only execution. CLI checks assert exit code, stdout/stderr, files, events, or final state; browser checks assert visible user-facing state; API checks assert response contract and side effects; integration checks assert sandbox/mock/contract/webhook/event/log outcomes or defer with owner and rationale.
 - Evidence summaries or metadata must name the covered acceptance criterion or explicitly state that all acceptance criteria are covered. Smoke and regression checks are useful but do not count as acceptance coverage unless they map to an acceptance criterion.
+- Visual/UI/diagram defect evidence must include source or expected image when available, actual screenshot/render, diff image when practical, and an annotated screenshot for ambiguous failures. Use red boxes for broken bounds/overlap, orange arrows for wrong connectors or flow, yellow translucent areas for excess spacing, blue guide lines for alignment, and short defect labels.
+- Executed QA evidence must receive a sprint-review-style cross-review before release: Analyst/Business Analyst compares the evidence against the GitHub issue, user story, acceptance criteria, and Orchestra task, while Architect validates technical contract, integration, boundary, data-flow, and risk coverage.
+- Analyst/Business Analyst must comment on the GitHub issue/user story and Orchestra task when the evidence does not prove the requested behavior, misses acceptance criteria, or exposes workflow gaps. These findings block release until fixed or explicitly risk-accepted by the Product Owner.
+- If Analyst/BA or Architect review is not applicable, QA must record the rationale and Product Owner acceptance before release.
 - QA and Developer must decide which manual checks should be automated, preferring Playwright for browser flows.
 - User-facing QA plans must include responsive, accessibility, copy, tooltip, loading, empty, error, success, and recovery-state checks.
 - API, data, async, performance, and config changes must include targeted regression checks for contract, migration, idempotency, latency, and environment behavior when applicable.
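
Reviewer note: the QA Handoff rule that CLI checks assert observable outcomes (exit code, stdout/stderr) and name the covered acceptance criterion could be sketched as below. This is an illustrative sketch only: `CliRun`, `CliExpectation`, and `checkCliEvidence` are hypothetical names, and the run is modeled as plain data so the check stays deterministic.

```typescript
// Hypothetical shapes; not part of the package.
interface CliRun {
  command: string;
  exitCode: number;
  stdout: string;
  stderr: string;
}

interface CliExpectation {
  acceptanceCriterion: string; // the AC this check is evidence for
  exitCode: number;
  stdoutIncludes: string;
  stderrEmpty: boolean;
}

// Returns a list of mismatches; an empty list means the observable outcome matched.
function checkCliEvidence(run: CliRun, expected: CliExpectation): string[] {
  const failures: string[] = [];
  const prefix = `${expected.acceptanceCriterion} (${run.command})`;

  if (run.exitCode !== expected.exitCode) {
    failures.push(`${prefix}: exit code ${run.exitCode}, expected ${expected.exitCode}`);
  }
  if (run.stdout.indexOf(expected.stdoutIncludes) === -1) {
    failures.push(`${prefix}: stdout does not contain "${expected.stdoutIncludes}"`);
  }
  if (expected.stderrEmpty && run.stderr.trim() !== "") {
    failures.push(`${prefix}: stderr was expected to be empty`);
  }
  return failures;
}
```

A run that exits with code 0 but prints the wrong output still produces a failure here, which is the distinction the rule draws between execution and observable outcome.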

package/skills/chaos-resilience-testing/SKILL.md ADDED

@@ -0,0 +1,127 @@
+# Chaos Resilience Testing
+Design deterministic failure scenarios that prove workflows, APIs, providers,
+gates, budgets, and regulated flows degrade safely.
+## When To Load
+- Trigger: `chaos`
+- Trigger: `resilience`
+- Trigger: `fault injection`
+- Trigger: `failure mode`
+- Trigger: `provider timeout`
+- Trigger: `provider unavailable`
+- Trigger: `offline mode`
+- Trigger: `circuit breaker`
+- Trigger: `rate limit`
+- Trigger: `budget exhaustion`
+- Trigger: `approval race`
+- Trigger: `policy failure`
+- Trigger: `audit failure`
+- Trigger: `stale data`
+- Trigger: `corrupted state`
+- Trigger: `tenant isolation`
+- Trigger: `regulated flow`
+## Procedure
+1. Identify the task, acceptance criteria, impacted runtime surfaces, and the
+   user-visible or release-critical outcome that must survive failure.
+2. Classify each failure as one of:
+   - fail closed: security, approvals, regulated authority, secrets, PII/PHI,
+     payment, policy, tenant isolation, or destructive actions;
+   - degrade with recovery: optional enrichment, UI panels, advisory features,
+     non-critical telemetry, or external references;
+   - retry with bounds: transient provider/API, storage, webhook, or scheduler
+     failures with explicit timeout, backoff, and retry limits.
+3. Select deterministic scenarios before implementation. Prefer controlled
+   stubs, fake providers, injected stores, fixture corruption, and bounded
+   timeout simulation over random production-style fault injection.
+4. For each scenario, define:
+   - fault injected;
+   - expected behavior;
+   - expected user/operator message;
+   - expected audit/event/evidence output;
+   - recovery path;
+   - acceptance criteria covered.
+5. Validate at least the relevant categories:
+   - provider/model timeout or unavailable provider;
+   - external API/network unavailable;
+   - corrupted or partially written local state;
+   - stale reads or cache mismatch;
+   - concurrent update/approval race;
+   - budget/rate-limit exhaustion;
+   - policy engine denial or failure;
+   - audit/event write failure;
+   - offline mode with optional sources unavailable;
+   - tenant/regulatory boundary enforcement.
+6. Capture observable evidence. A passing command alone is not enough; prove the
+   final state, emitted event, user message, skipped activation, blocked gate, or
+   recovery artifact.
+7. Record unresolved resilience gaps with owner, severity, release impact, and
+   whether Product/Security/Compliance accepted the risk.
+## Stack Guidance
+- Start with local deterministic faults: Node tests, fake providers, fake
+  storage/repositories, controlled timers, `AbortController`, injected clocks,
+  and fixture corruption.
+- Use Playwright route stubs for web/API degraded states such as timeout, stale
+  data, malformed payload, empty response, or server error.
+- Use Docker Compose, Toxiproxy, WireMock/MSW/Pact, k6, and OpenTelemetry only
+  when integration or SaaS boundaries require network/service-level evidence.
+- Use Chaos Mesh or LitmusChaos only for future Kubernetes-managed services;
+  these are not npm package MVP dependencies.
+- Keep stack details in backlog or architecture docs and load only the relevant
+  scenario guidance into task context.
+## Evidence Report Template
+```md
+# Chaos / Resilience Evidence
+Task:
+Issue/User Story:
+Environment:
+Date:
+## Scenario Matrix
+| Scenario | Fault | Expected behavior | Actual behavior | Evidence | Result |
+| -------- | ----- | ----------------- | --------------- | -------- | ------ |
+## Acceptance Criteria Coverage
+| AC | Scenario | Result | Notes |
+| -- | -------- | ------ | ----- |
+## Recovery And Audit
+| Scenario | Recovery path | Audit/event evidence | User/operator message |
+| -------- | ------------- | -------------------- | --------------------- |
+## Gaps
+| Gap | Severity | Owner | Release decision |
+| --- | -------- | ----- | ---------------- |
+```
+## Acceptance Rules
+- Security, compliance, tenant isolation, approval, regulated authority, secrets,
+  and payment-related failures must fail closed unless an explicit accepted risk
+  says otherwise.
+- Optional enrichment and advisory features may degrade, but must expose clear
+  rationale and recovery guidance.
+- Retries must be bounded by timeout, retry count, backoff, and budget policy.
+- Chaos evidence must map back to acceptance criteria and release gates.
+- A generated or automated reviewer cannot self-approve resilience gaps in
+  regulated or high-risk flows.
+## Evidence
+- `command`
+- `file`
+- `log`
+- `report`
+- `trace`
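
Reviewer note: the "retry with bounds" class and the preference for fake providers and injected clocks can be illustrated with a short sketch. Everything here is an assumption made for illustration: `RetryPolicy`, `retryWithBounds`, and `makeFlakyProvider` are hypothetical names, the backoff is a plain exponential schedule, and the operation is synchronous for brevity where a real provider call would be asynchronous.

```typescript
// Hypothetical names; not part of the package.
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
  maxTotalDelayMs: number; // budget for cumulative backoff
}

interface RetryOutcome<T> {
  ok: boolean;
  value?: T;
  attempts: number;
  delaysMs: number[];
  lastError?: string;
}

function retryWithBounds<T>(
  operation: () => T,
  policy: RetryPolicy,
  wait: (ms: number) => void // injected so tests never sleep for real
): RetryOutcome<T> {
  const delaysMs: number[] = [];
  let totalDelay = 0;
  let lastError = "";
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return { ok: true, value: operation(), attempts: attempt, delaysMs };
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
    }
    // Exponential backoff: base, 2x base, 4x base, ...
    const delay = policy.baseDelayMs * Math.pow(2, attempt - 1);
    const isLastAttempt = attempt === policy.maxAttempts;
    if (isLastAttempt || totalDelay + delay > policy.maxTotalDelayMs) {
      return { ok: false, attempts: attempt, delaysMs, lastError };
    }
    wait(delay);
    delaysMs.push(delay);
    totalDelay += delay;
  }
  return { ok: false, attempts: 0, delaysMs, lastError: "maxAttempts must be at least 1" };
}

// Deterministic fake provider: fails a fixed number of times, then succeeds.
function makeFlakyProvider(failuresBeforeSuccess: number): () => string {
  let calls = 0;
  return () => {
    calls += 1;
    if (calls <= failuresBeforeSuccess) {
      throw new Error(`provider timeout (call ${calls})`);
    }
    return "ok";
  };
}
```

The returned `attempts`, `delaysMs`, and `lastError` fields are the kind of observable evidence a scenario matrix row could reference, rather than relying on a passing command alone.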

package/skills/chaos-resilience-testing/manifest.json ADDED

@@ -0,0 +1,61 @@
+{
+  "id": "chaos-resilience-testing",
+  "name": "Chaos Resilience Testing",
+  "summary": "Design deterministic failure scenarios that prove workflows, APIs, providers, gates, budgets, and regulated flows degrade safely.",
+  "triggers": [
+    "chaos",
+    "resilience",
+    "fault injection",
+    "failure mode",
+    "provider timeout",
+    "provider unavailable",
+    "offline mode",
+    "circuit breaker",
+    "rate limit",
+    "budget exhaustion",
+    "approval race",
+    "policy failure",
+    "audit failure",
+    "stale data",
+    "corrupted state",
+    "tenant isolation",
+    "regulated flow"
+  ],
+  "roles": [
+    "qa",
+    "sdet",
+    "sre",
+    "security",
+    "architect",
+    "developer",
+    "devops",
+    "platform_engineer",
+    "release_manager"
+  ],
+  "capabilities": [
+    "resilience-testing",
+    "chaos-testing",
+    "failure-mode-analysis",
+    "operational-readiness"
+  ],
+  "riskAreas": [
+    "security",
+    "release",
+    "integration",
+    "governance",
+    "sre",
+    "devops",
+    "compliance",
+    "performance"
+  ],
+  "sourceGroups": [
+    "quality-security",
+    "devops-runtime",
+    "architecture",
+    "product-backlog",
+    "agent-memory"
+  ],
+  "evidence": ["command", "file", "log", "report", "trace"],
+  "loadBudget": "normal",
+  "entry": "skills/chaos-resilience-testing/SKILL.md"
+}
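
Reviewer note: this diff does not show how the package evaluates `triggers` and `roles`, so the following is only a guess at how a consumer might use the manifest. The `SkillManifest` type covers a subset of the fields above, and both the role gate and the case-insensitive substring match are assumptions, not the package's documented behavior.

```typescript
// Field names mirror the manifest in this diff. The type, the role gate, and
// the matching rule are assumptions made for illustration.
interface SkillManifest {
  id: string;
  triggers: string[];
  roles: string[];
  entry: string;
}

const chaosSkill: SkillManifest = {
  id: "chaos-resilience-testing",
  triggers: ["chaos", "fault injection", "provider timeout", "rate limit"], // subset of the manifest
  roles: ["qa", "sdet", "sre"], // subset of the manifest
  entry: "skills/chaos-resilience-testing/SKILL.md",
};

// Returns the triggers found in the task text, or an empty list when the role
// is not listed in the manifest.
function matchedTriggers(manifest: SkillManifest, taskText: string, role: string): string[] {
  if (manifest.roles.indexOf(role) === -1) {
    return [];
  }
  const haystack = taskText.toLowerCase();
  return manifest.triggers.filter((trigger) => haystack.indexOf(trigger.toLowerCase()) !== -1);
}
```

A caller could treat a non-empty result as the signal to read the file named by `entry`, keeping the skill out of context for unrelated tasks.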

package/skills/collection-standards/SKILL.md CHANGED

@@ -9,6 +9,8 @@ operational tooling, and generated code.
 - Developer, QA/SDET, DevOps, Platform, SRE, or Performance work writes code,
   scripts, tests, generated options, or automation helpers.
+- A module-boundary or god-file review finds repeated hardcoded values in
+  commands, controllers, services, tests, or generated option builders.
 - The task mentions hardcoded values, arrays, maps, key/value pairs, options,
   fixtures, selectors, command cases, provider lists, CI matrices, roles,
   statuses, validators, bulk/batch processing, O(n), N+1, nested loops, or

package/skills/diagram-export/SKILL.md CHANGED

@@ -14,12 +14,41 @@ Create, validate, and export architecture, workflow, and sequence diagrams.
 ## Procedure
+- Load `docs/diagrams/diagram-master-prompt.md` as the canonical source-free
+  diagram prompt when detailed generation or validation guidance is needed.
 - Identify the diagram purpose and authoritative architecture sources before drawing.
+- Classify the task before drawing: `semantic`, `inspired-by-reference`, or `recreation`.
+- For `recreation`, acceptance is pixel-perfect source fidelity unless the user explicitly accepts an approximation. Structural similarity is not enough.
+- For `recreation`, inventory every visible source element before drawing: containers, labels, icons, connectors, arrowheads, line styles, colors, borders, spacing, rotations, z-order, and page/canvas bounds.
 - Choose the diagram style from the decision matrix before drafting.
+- When there is no source reference, create a diagram contract before drawing: purpose, audience, node inventory, groups, relationships, labels, annotations, expected reading flow, and planned connector endpoints/anchors.
 - Prefer text-native diagrams such as Mermaid unless the project requires another format.
+- For recreated or high-fidelity diagrams, always perform a post-render visual QA pass against the source reference. Re-evaluate container sizing, text fit, spacing, connector bend points, and line/container overlaps after the diagram is rendered.
+- After populating real text, subcards, chips, icons, and internal connectors, run a global layout reflow: grow parent containers when children need padding, then re-evaluate neighbors, connector routes, label lanes, and canvas bounds.
+- Do not solve container overflow primarily by shrinking text. Prefer growing the parent container, repositioning children, or rerouting connectors unless the source reference requires tighter text.
+- For `recreation`, record source-vs-output gaps by element ID or visual region after each iteration. If the chosen target cannot reach pixel-perfect fidelity, reclassify as approximation and document the reason.
+- Avoid running connector lines over containers or important labels whenever practical. Add bend points, adjust spacing, or resize containers before treating the diagram as ready.
+- Validate connector endpoint distance during the visual QA pass: every arrow must visibly leave the intended source edge and land on the intended target edge.
+- Validate connector-label separation during the visual QA pass: labels must be placed in reserved whitespace or on readable label backgrounds, and must not touch connector strokes, arrowheads, or container borders.
+- Validate element ordering during the visual QA pass: connectors and arrowheads must remain visible above the states or containers they connect, while accepted diagrams should remain visually stable across regenerations.
+- Validate connector anchor aesthetics during the visual QA pass: choose source and target edge points that minimize bend count and unnecessary line travel without changing the intended relationship.
+- Validate diagonal and crossing aesthetics during the visual QA pass: prefer orthogonal connectors and add line jumps or bridge arcs where unavoidable crossings remain.
+- Validate layout simplification during the visual QA pass: before accepting a bent connector, check whether moving either connected element slightly creates a straighter route without breaking nearby spacing or semantics.
+- Validate editable/rendered equivalence during the visual QA pass: draw.io XML and rendered SVG must describe the same moved elements, connectors, labels, and annotations.
+- Validate annotation target clarity during the visual QA pass: every annotation arrow must visibly land on the exact element or line it describes, without obscuring target text or labels.
+- For source-free diagrams, validate the rendered output against the diagram contract before handoff; correct and re-render when endpoints, labels, anchors, bend counts, or reading flow drift from the contract.
+- Source-free diagrams still require a pixel-perfect quality pass against their own contract before delivery: text must fit, containers must be correctly sized, connectors must visibly attach to the intended source/target edges, arrowheads must remain visible, labels must not collide with lines or borders, and whitespace must be intentional.
+- Never deliver the first render of a source-free diagram without re-evaluating sizes, line routing, connector anchors, text containment, z-order, and visual balance.
+- After every correction, review the whole canvas again. Local fixes are incomplete until container containment, neighboring positions, connector routes, label lanes, z-order, and whitespace still pass globally.
+- Before final handoff, perform diagram artifact hygiene:
+  - keep the accepted editable source, accepted render, prompt master or final prompt, and minimum QA evidence;
+  - archive or exclude intermediate previews, failed renders, temporary prompts, and one-off correction notes;
+  - do not publish source-specific prompt fragments into the prompt bank unless they have been rewritten as reusable rules;
+  - record where archived iterations or evidence can be found when traceability is required.
 - Run `orchestra diagrams lint --file <diagram.mmd>` for lint-only validation before sharing Mermaid diagrams.
 - Attach evidence with `orchestra diagrams lint --file <diagram.mmd> --task <task-id>` when the diagram supports workflow delivery.
 - If `mmdc` is missing, report the install guidance instead of pretending validation passed.
+- Mermaid outputs can be accepted as semantic diagrams, but not as pixel-perfect recreations when exact layout, connectors, icons, rotations, or reference styling are acceptance criteria. Escalate those cases to draw.io XML or Lucid.
 ## Decision Matrix
@@ -33,3 +62,4 @@ Create, validate, and export architecture, workflow, and sequence diagrams.
 - `file`
 - `report`
+- `screenshot`
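
Reviewer note: the layout reflow step added above (grow the parent when children need padding, then re-evaluate neighbors) can be pictured with a small geometry sketch. This is not how the package or any diagram tool implements it; `Rect`, `growParentToFit`, and `rectsOverlap` are hypothetical helpers using axis-aligned rectangles.

```typescript
// Hypothetical helpers; coordinates are in arbitrary canvas units.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Expands the parent so every child fits inside it with the given padding.
// The parent never shrinks; text size is left untouched.
function growParentToFit(parent: Rect, children: Rect[], padding: number): Rect {
  let left = parent.x;
  let top = parent.y;
  let right = parent.x + parent.width;
  let bottom = parent.y + parent.height;
  for (const child of children) {
    left = Math.min(left, child.x - padding);
    top = Math.min(top, child.y - padding);
    right = Math.max(right, child.x + child.width + padding);
    bottom = Math.max(bottom, child.y + child.height + padding);
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// True when two rectangles share interior area (touching edges do not count).
function rectsOverlap(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}
```

Running `rectsOverlap` against neighboring containers after each `growParentToFit` call reflects the rule that a local fix is incomplete until the whole canvas passes again.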

package/skills/qa-evidence-pack/SKILL.md ADDED

@@ -0,0 +1,110 @@
+# QA Evidence Pack
+Build reviewable QA evidence packs that prove observable outcomes against
+acceptance criteria without loading a large QA playbook into every task.
+## When To Load
+- Trigger: `qa evidence`
+- Trigger: `test evidence`
+- Trigger: `acceptance criteria coverage`
+- Trigger: `playwright`
+- Trigger: `e2e`
+- Trigger: `screenshot`
+- Trigger: `trace`
+- Trigger: `video`
+- Trigger: `visual diff`
+- Trigger: `annotated screenshot`
+- Trigger: `cli output`
+- Trigger: `stdout`
+- Trigger: `stderr`
+- Trigger: `api contract`
+- Trigger: `integration`
+- Trigger: `webhook`
+## Procedure
+1. Identify the GitHub issue, user story, Orchestra task, and acceptance criteria.
+2. Create or update a compact evidence report instead of pasting raw logs into
+   the agent context.
+3. Map every acceptance criterion to one of: automated, manual, contract/mock,
+   external verification, deferred with owner and rationale.
+4. Capture observable outcomes, not only command execution:
+   - Web: visible state, key screenshots, Playwright trace, video for failures
+     or critical flows, viewport/device.
+   - CLI: command, exit code, stdout/stderr expectations, created/changed files,
+     emitted events, final state.
+   - API: request shape, response contract, error contract, idempotency when
+     relevant, side effects.
+   - Integration: sandbox/mock receiver result, webhook/event/log, correlation
+     ID, database/query evidence, or explicit deferral.
+   - Visual/UI/diagram: source or expected image, actual image, diff image when
+     practical, annotated image for defects.
+5. For visual bugs, create an annotated screenshot using concise overlays:
+   - red rectangle for clipped, overlapping, or incorrect element bounds;
+   - orange arrow for wrong connector, anchor, or flow direction;
+   - yellow translucent area for excess whitespace or spacing defect;
+   - blue guide line for expected alignment;
+   - short label naming the defect.
+6. Store large artifacts as files and reference paths from the report. Summarize
+   only the relevant finding in the handoff.
+7. Ask BA/Product to compare evidence against story and acceptance criteria, and
+   Architect to review technical coverage before release.
+## Evidence Report Template
+```md
+# QA Evidence Report
+Task:
+Issue/User Story:
+Commit:
+Environment:
+Date:
+## Acceptance Criteria Coverage
+| AC  | Test | Result | Evidence | Notes |
+| --- | ---- | ------ | -------- | ----- |
+## Commands
+| Command | Result | Output artifact |
+| ------- | ------ | --------------- |
+## Visual Evidence
+| Viewport/Source | Actual | Expected/Source | Diff | Annotated | Result |
+| --------------- | ------ | --------------- | ---- | --------- | ------ |
+## External Verification
+| System | Correlation ID | Evidence | Result |
+| ------ | -------------- | -------- | ------ |
+## Risks / Gaps
+| Gap | Owner | PO accepted? | Rationale |
+| --- | ----- | ------------ | --------- |
+```
+## Acceptance Rules
+- A passing test without observable-result validation is not sufficient QA
+  evidence.
+- A report without acceptance-criteria mapping is incomplete for release.
+- Visual defects need source/expected, actual, and annotated evidence unless the
+  defect is already self-evident in a single screenshot.
+- External integrations need receiver-side evidence or explicit deferral.
+- Deferred evidence needs owner, rationale, follow-up, and Product Owner
+  acceptance before release.
+## Evidence
+- `command`
+- `file`
+- `screenshot`
+- `trace`
+- `video`
+- `log`
+- `report`
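
The acceptance rules added in this file can be read as mechanical checks on a report. The sketch below is illustrative only and is not code from the package: the `EvidenceReport` shape, its field names, and `checkReport` are hypothetical, chosen to mirror the template tables. It covers the rules on observable evidence, acceptance-criteria mapping, external integrations, and deferrals; the visual-defect rule is left out because it needs human judgment.

```typescript
// Hypothetical report shape mirroring the template tables (assumed names).
interface CoverageRow {
  ac: string;
  test: string;
  result: "pass" | "fail";
  evidence: string[]; // artifact paths: screenshots, traces, logs, ...
}

interface ExternalRow {
  system: string;
  evidence?: string; // receiver-side evidence reference
  deferred?: boolean; // explicit deferral instead of evidence
}

interface GapRow {
  gap: string;
  owner?: string;
  rationale?: string;
  followUp?: string;
  poAccepted: boolean;
}

interface EvidenceReport {
  acceptanceCriteria: string[];
  coverage: CoverageRow[];
  external: ExternalRow[];
  gaps: GapRow[];
}

// Returns a list of human-readable problems; an empty list means no rule failed.
function checkReport(report: EvidenceReport): string[] {
  const problems: string[] = [];

  for (const ac of report.acceptanceCriteria) {
    const rows = report.coverage.filter((r) => r.ac === ac);
    // A report without acceptance-criteria mapping is incomplete.
    if (rows.length === 0) problems.push(`AC ${ac}: no mapped test`);
    for (const row of rows) {
      // A passing test without observable evidence is not sufficient.
      if (row.result === "pass" && row.evidence.length === 0) {
        problems.push(`AC ${ac}: passing test "${row.test}" has no evidence`);
      }
    }
  }

  for (const ext of report.external) {
    // External integrations need receiver-side evidence or explicit deferral.
    if (!ext.evidence && !ext.deferred) {
      problems.push(`External ${ext.system}: no evidence and no deferral`);
    }
  }

  for (const gap of report.gaps) {
    // Deferred evidence needs owner, rationale, follow-up, and PO acceptance.
    const missing: string[] = [];
    if (!gap.owner) missing.push("owner");
    if (!gap.rationale) missing.push("rationale");
    if (!gap.followUp) missing.push("follow-up");
    if (!gap.poAccepted) missing.push("Product Owner acceptance");
    if (missing.length > 0) {
      problems.push(`Gap "${gap.gap}": missing ${missing.join(", ")}`);
    }
  }

  return problems;
}
```

A release gate could refuse to proceed while `checkReport` returns any entries, leaving the wording of each problem to the report author.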

package/skills/qa-evidence-pack/manifest.json ADDED

@@ -0,0 +1,60 @@
+{
+  "id": "qa-evidence-pack",
+  "name": "QA Evidence Pack",
+  "summary": "Create acceptance-criteria-mapped QA evidence packs with observable outcomes, artifacts, and annotated visual defect evidence.",
+  "triggers": [
+    "qa evidence",
+    "test evidence",
+    "acceptance criteria coverage",
+    "playwright",
+    "e2e",
+    "screenshot",
+    "trace",
+    "video",
+    "visual diff",
+    "annotated screenshot",
+    "cli output",
+    "stdout",
+    "stderr",
+    "api contract",
+    "integration",
+    "webhook"
+  ],
+  "roles": [
+    "qa",
+    "sdet",
+    "developer",
+    "frontend_specialist",
+    "backend_specialist",
+    "devops",
+    "sre",
+    "business_analyst",
+    "product_owner",
+    "architect",
+    "release_manager"
+  ],
+  "capabilities": [
+    "qa-evidence",
+    "acceptance-coverage",
+    "visual-annotation",
+    "external-verification"
+  ],
+  "riskAreas": ["quality", "release", "ux", "integration", "governance"],
+  "sourceGroups": [
+    "quality-security",
+    "product-backlog",
+    "codebase",
+    "agent-memory"
+  ],
+  "evidence": [
+    "command",
+    "file",
+    "screenshot",
+    "trace",
+    "video",
+    "log",
+    "report"
+  ],
+  "loadBudget": "normal",
+  "entry": "skills/qa-evidence-pack/SKILL.md"
+}
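
For readers who want to sanity-check a manifest like the one added above, a minimal structural check might look like the following. This is a sketch under assumptions: the `SkillManifest` interface is inferred only from the fields visible in this diff, not from the package's own `dist/types/skills.d.ts`, and `validateManifest` plus the rule that `entry` lives under `skills/<id>/` are hypothetical.

```typescript
// Shape inferred from the fields visible in the added manifest (assumption).
interface SkillManifest {
  id: string;
  name: string;
  summary: string;
  triggers: string[];
  roles: string[];
  capabilities: string[];
  riskAreas: string[];
  sourceGroups: string[];
  evidence: string[];
  loadBudget: string;
  entry: string;
}

const STRING_FIELDS = ["id", "name", "summary", "loadBudget", "entry"] as const;
const LIST_FIELDS = [
  "triggers",
  "roles",
  "capabilities",
  "riskAreas",
  "sourceGroups",
  "evidence",
] as const;

// Returns a list of structural errors; an empty list means the shape matches.
function validateManifest(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return ["manifest must be a JSON object"];
  }
  const m = raw as Record<string, unknown>;
  const errors: string[] = [];

  for (const field of STRING_FIELDS) {
    const value = m[field];
    if (typeof value !== "string" || value === "") {
      errors.push(`${field} must be a non-empty string`);
    }
  }

  for (const field of LIST_FIELDS) {
    const value = m[field];
    if (!Array.isArray(value) || !value.every((x) => typeof x === "string")) {
      errors.push(`${field} must be an array of strings`);
    }
  }

  // Hypothetical convention: the entry file sits under skills/<id>/.
  const id = m["id"];
  const entry = m["entry"];
  if (
    typeof id === "string" &&
    typeof entry === "string" &&
    !entry.startsWith(`skills/${id}/`)
  ) {
    errors.push("entry should live under skills/<id>/");
  }

  return errors;
}
```

Passing the result of `JSON.parse` on the manifest file to `validateManifest` would surface missing or mistyped fields before the skill is loaded.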

package/docs/setup-agents-bridge.md DELETED

@@ -1,61 +0,0 @@
-# setup-agents Bridge
-Open Orchestra can import optional `setup-agents` artifacts without making the
-runtime depend on Salesforce-specific setup flows.
-## Command
-```bash
-orchestra setup-agents import --source .setup-agents
-```
-Use `--json` to get a machine-readable report with imported, skipped, and
-conflicted profiles, tasks, evidence references, and handoff references.
-## Supported Inputs
-The importer reads:
-- `.setup-agents/open-orchestra/profiles.json`
-- `.setup-agents/tasks.json`
-- `.setup-agents/tasks.jsonl`
-- `.setup-agents/state/**/*.json`
-- `.setup-agents/state/**/*.jsonl`
-Task records may use either sparse legacy fields or enriched delivery fields.
-Supported task fields include:
-- `id`, `title`, `summary`
-- `owner`, `ownerRole`, `role`, `profile`, `profileId`
-- `backlogItem`, `backlog`
-- `userStory`, `goal`, `scope`
-- `acceptanceCriteria`
-- `definitionOfReady`, `definitionOfDone`
-- `dependencies`, `dependsOn`
-- `risks`, `assumptions`, `paths`, `files`
-- `testStrategy`
-- `contractVersion`
-- `acceptanceStatus`, `acceptedBy`
-- `evidenceIds`, `evidence`
-- `handoffIds`, `handoffs`
-## Mapping Rules
-Profile role mappings are read from `profiles.json` when present. Setup role
-IDs such as `setup-agents:qa` are normalized to Orchestra role IDs such as
-`qa`. Unknown owner roles are treated as conflicts during profile import and
-fall back to `developer` for sparse task records.
-Imported tasks preserve setup metadata in `task.externalRefs.setupAgents`.
-Evidence and handoff IDs are stored as references there; the importer does not
-copy or mutate the original setup artifacts.
-## Idempotency And Conflicts
-Re-running the import does not duplicate tasks. If an existing task has the
-same ID, title, and owner role, the task is reported as skipped. If the ID
-matches but title or owner differs, the importer reports a conflict and leaves
-the existing Orchestra task unchanged.
-Each import records a `SETUP_AGENTS_IMPORTED` event with summary counts so the
-story-to-evidence trail remains auditable.
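
The idempotency rules described in the removed document can be sketched as a small classifier. This is not the importer's actual code: `TaskRecord`, `normalizeRole`, and `classifyImport` are hypothetical names, and stripping a `setup-agents:` prefix is only one assumed way to realize the role normalization the document mentions (the real mapping was read from `profiles.json`).

```typescript
// Minimal task shape for illustration; the real records carry more fields.
interface TaskRecord {
  id: string;
  title: string;
  ownerRole: string;
}

type ImportOutcome = "imported" | "skipped" | "conflict";

// Assumed normalization: "setup-agents:qa" becomes "qa"; other IDs pass through.
function normalizeRole(role: string): string {
  const prefix = "setup-agents:";
  return role.startsWith(prefix) ? role.slice(prefix.length) : role;
}

// Decide what a re-run of the import does with one incoming task.
function classifyImport(
  existing: Map<string, TaskRecord>,
  incoming: TaskRecord,
): ImportOutcome {
  const current = existing.get(incoming.id);
  if (current === undefined) return "imported"; // new ID: create the task

  const sameTitle = current.title === incoming.title;
  const sameOwner = current.ownerRole === normalizeRole(incoming.ownerRole);

  // Same ID, title, and owner role: nothing to do.
  // Same ID but a differing title or owner: report it, leave the task unchanged.
  return sameTitle && sameOwner ? "skipped" : "conflict";
}
```

Because the classifier never mutates `existing`, running it repeatedly over the same input yields the same outcomes, which is the property the document calls idempotency.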