npm - @openrig/cli - Versions diffs - 0.1.3 → 0.1.4 - Mend

@openrig/cli 0.1.3 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (89)

package/daemon/specs/agents/shared/skills/containerized-e2e/SKILL.md ADDED

@@ -0,0 +1,256 @@
+---
+name: containerized-e2e
+description: Run end-to-end dogfood tests inside Docker containers to simulate real user experiences. Use when you need to verify install paths, control plane functionality, UI rendering, or packaging correctness in a clean environment. Triggers include "containerized test", "Docker dogfood", "clean install test", "e2e in container", or testing that requires a fresh environment without dev-mode shortcuts.
+allowed-tools: Bash(docker:*), Bash(agent-browser:*), Bash(npx agent-browser:*)
+---
+# Containerized E2E Testing
+Run OpenRig (or any npm-installable CLI + web UI project) through end-to-end testing inside Docker containers, simulating real user install and usage scenarios.
+## When to Use
+- Verifying the npm install path works from a packed tarball
+- Testing control plane functionality without live agent runtimes
+- Checking UI rendering via agent-browser in a clean environment
+- Regression testing after packaging changes
+- Phase boundary acceptance gates
+## When NOT to Use
+- Testing live agent behavior (send/capture/broadcast, whoami from inside an agent, transcript capture) — these need real claude-code/codex runtimes on the host
+- Quick feedback during active TDD cycles — too slow for the edit/test loop
+## Prerequisites
+- Docker installed and running
+- The repo builds successfully (`npm run build` for all workspaces)
+- `agent-browser` skill loaded (for UI verification commands)
+## Testing Personas
+### Fresh User
+A brand new user who has never installed OpenRig. Exercises the first-run experience.
+- Empty `~/.openrig/` directory
+- No existing rigs, snapshots, or specs
+- Tests: install, daemon start, preflight, doctor, first rig up, UI renders
+### Mature User
+A user with existing OpenRig state — rigs, snapshots, library additions, transcripts.
+- Pre-populated `~/.openrig/` via Docker volume persistence
+- Build this state organically: run the fresh-user tests first, and the volume accumulates real state
+- Tests: restore from existing snapshots, expand existing rigs, library with user-added specs, upgrade-path behaviors
+To set up a mature user volume:
+```bash
+# Create a named volume
+docker volume create openrig-mature-user
+# Run fresh-user tests with the volume mounted
+docker run -it --rm --shm-size=1g \
+  -v openrig-mature-user:/root/.openrig \
+  -v /tmp/openrig-e2e-artifacts:/artifacts \
+  openrig-e2e
+# The volume now has real state from actual rig commands
+# Subsequent runs with the same volume simulate a mature user
+```
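The volume setup above can be wrapped in a small helper that reports whether a run starts from existing state. This is a minimal sketch, not part of the package: `ensure_volume` is a hypothetical name, and it relies only on the standard `docker volume inspect` and `docker volume create` subcommands.

```shell
# Hypothetical helper: create the named volume only if it does not exist yet,
# and say which case applied so the report can record the persona correctly.
ensure_volume() {
  name="$1"
  if docker volume inspect "$name" >/dev/null 2>&1; then
    echo "volume $name already exists; reusing its state"
  else
    docker volume create "$name" >/dev/null
    echo "volume $name created (empty, fresh-user state)"
  fi
}

# Usage (needs a running Docker daemon):
#   ensure_volume openrig-mature-user
```

A freshly created volume behaves like the fresh-user persona until the first test run has written state into it.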
+## Workflow
+### 1. Build the E2E Image
+Use the provided build script or do it manually:
+```bash
+# Using the build script (recommended)
+bash {SKILL_DIR}/scripts/build-e2e-image.sh /path/to/repo
+# Or manually:
+cd /path/to/repo
+npm run build --workspace @openrig/daemon
+npm run build --workspace @openrig/ui
+npm run build --workspace @openrig/cli
+bash scripts/build-package.sh
+mkdir -p /tmp/e2e-build
+cd packages/cli && npm pack --pack-destination /tmp/e2e-build
+cp {SKILL_DIR}/scripts/Dockerfile /tmp/e2e-build/
+mv /tmp/e2e-build/openrig-cli-*.tgz /tmp/e2e-build/openrig-cli.tgz
+cd /tmp/e2e-build && docker build -t openrig-e2e:latest .
+```
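The rename step in the manual path assumes exactly one `openrig-cli-*.tgz` is present in the build directory. One way to make that assumption explicit is sketched below. `stable_tarball` is a hypothetical helper, not something the package ships.

```shell
# Hypothetical helper: rename the single packed tarball in a build directory
# to the stable name the Dockerfile copies (openrig-cli.tgz).
# Fails instead of guessing when zero or several tarballs are present.
stable_tarball() {
  dir="$1"
  count=0
  found=""
  for f in "$dir"/openrig-cli-*.tgz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    count=$((count + 1))
    found="$f"
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 tarball in $dir, found $count" >&2
    return 1
  fi
  mv "$found" "$dir/openrig-cli.tgz"
}

# Usage:
#   stable_tarball /tmp/e2e-build
```

Leftover tarballs from an earlier, interrupted build then surface as a clear error rather than a confusing `mv` failure.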
+### 2. Start the Container
+```bash
+# Fresh user (ephemeral state)
+docker run -d --rm --name openrig-e2e \
+  --shm-size=1g \
+  -v /tmp/openrig-e2e-artifacts:/artifacts \
+  openrig-e2e sleep infinity
+# Mature user (persistent volume)
+docker run -d --rm --name openrig-e2e \
+  --shm-size=1g \
+  -v openrig-mature-user:/root/.openrig \
+  -v /tmp/openrig-e2e-artifacts:/artifacts \
+  openrig-e2e sleep infinity
+```
+**Important:** Always use `--shm-size=1g` for Chromium stability during browser tests.
+### 3. Run Tests Inside the Container
+Execute commands via `docker exec`:
+```bash
+# Start the daemon
+docker exec openrig-e2e rig daemon start
+# Run preflight and doctor
+docker exec openrig-e2e rig preflight --json
+docker exec openrig-e2e rig doctor --json
+# Copy test specs into the container
+docker cp {SKILL_DIR}/templates/control-plane-test.yaml openrig-e2e:/workspace/
+docker cp {SKILL_DIR}/templates/expansion-pod-fragment.yaml openrig-e2e:/workspace/
+docker cp {SKILL_DIR}/templates/expansion-collision-fragment.yaml openrig-e2e:/workspace/
+# Launch a rig
+docker exec openrig-e2e rig up /workspace/control-plane-test.yaml --json
+# Check topology
+docker exec openrig-e2e rig ps --json
+docker exec openrig-e2e rig ps --nodes --json
+```
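Commands issued immediately after `rig daemon start` may race the daemon's startup. The sketch below polls from inside the container until the UI port answers. It assumes the daemon serves HTTP on `http://127.0.0.1:7433` (the URL used for browser testing) and uses the `curl` binary the image installs. `wait_for_daemon` is a hypothetical name.

```shell
# Hypothetical helper: poll the daemon's HTTP port from inside the container.
# Arguments: container name, max attempts (default 30), seconds between
# attempts (default 1).
wait_for_daemon() {
  container="$1"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if docker exec "$container" curl -fs -o /dev/null http://127.0.0.1:7433; then
      echo "daemon reachable after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "daemon not reachable after $attempts attempts" >&2
  return 1
}

# Usage:
#   docker exec openrig-e2e rig daemon start
#   wait_for_daemon openrig-e2e
```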
+### 4. Browser Testing Inside the Container
+agent-browser runs inside the container via `docker exec`:
+```bash
+# Open the daemon UI
+docker exec openrig-e2e agent-browser open http://127.0.0.1:7433
+docker exec openrig-e2e agent-browser wait --load networkidle
+# Inspect interactive elements
+docker exec openrig-e2e agent-browser snapshot -i
+# Capture screenshots
+docker exec openrig-e2e agent-browser screenshot /artifacts/screenshots/dashboard.png
+docker exec openrig-e2e agent-browser screenshot --annotate /artifacts/screenshots/dashboard-annotated.png
+# Navigate and verify specific surfaces
+docker exec openrig-e2e agent-browser click @e4  # Open specs drawer (ref from snapshot)
+docker exec openrig-e2e agent-browser wait 1000
+docker exec openrig-e2e agent-browser screenshot /artifacts/screenshots/specs-drawer.png
+```
+**ARM64 note:** The Dockerfile uses Debian's system chromium instead of Chrome for Testing, which is unavailable on Linux ARM64. The environment variables `AGENT_BROWSER_EXECUTABLE_PATH` and `AGENT_BROWSER_ARGS` are set in the image.
+### 5. Test Scenarios
+#### Control Plane Lifecycle
+```bash
+# Launch the multi-pod test spec
+docker exec openrig-e2e rig up /workspace/control-plane-test.yaml --json
+RIG_ID=$(docker exec openrig-e2e rig ps --json | jq -r '.[0].rigId')
+# Verify topology
+docker exec openrig-e2e rig ps --nodes --json
+# Expand with a new pod
+docker exec openrig-e2e rig expand "$RIG_ID" /workspace/expansion-pod-fragment.yaml --json
+# Verify expansion
+docker exec openrig-e2e rig ps --nodes --json
+# Test validation rejection (colliding namespace)
+docker exec openrig-e2e rig expand "$RIG_ID" /workspace/expansion-collision-fragment.yaml --json
+# Should fail with namespace collision error, rig unchanged
+# Snapshot
+docker exec openrig-e2e rig down "$RIG_ID" --snapshot --json
+# Restore and verify
+SNAPSHOT_ID=$(docker exec openrig-e2e rig snapshot list "$RIG_ID" | awk 'NR==2 {print $1}')
+docker exec openrig-e2e rig restore "$SNAPSHOT_ID" --rig "$RIG_ID"
+# Export
+docker exec openrig-e2e rig export "$RIG_ID" -o /artifacts/captures/exported-rig.yaml
+```
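The collision step above is expected to fail, so a script running under `set -e` needs to treat that failure as a pass. A minimal sketch follows. It assumes `rig expand` exits non-zero when validation rejects the fragment, which the source does not state. `expect_failure` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only when the wrapped command fails.
expect_failure() {
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: command unexpectedly succeeded: $*" >&2
    return 1
  fi
  echo "PASS: command was rejected as expected"
}

# Sketch of the collision check (commented out: it needs the running container).
# Comparing the raw JSON assumes the output has no fields that change between calls.
#   before=$(docker exec openrig-e2e rig ps --nodes --json)
#   expect_failure docker exec openrig-e2e rig expand "$RIG_ID" \
#     /workspace/expansion-collision-fragment.yaml --json
#   after=$(docker exec openrig-e2e rig ps --nodes --json)
#   [ "$before" = "$after" ] && echo "PASS: rig unchanged"
```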
+#### UI Verification
+```bash
+# After launching a rig, verify the graph renders
+docker exec openrig-e2e agent-browser open http://127.0.0.1:7433
+docker exec openrig-e2e agent-browser wait --load networkidle
+docker exec openrig-e2e agent-browser snapshot -i
+docker exec openrig-e2e agent-browser screenshot --annotate /artifacts/screenshots/graph-with-rig.png
+# Open drawers and verify content
+docker exec openrig-e2e agent-browser snapshot -i  # Get fresh refs
+# Click through specs drawer, discovery drawer, rig detail, etc.
+# Take screenshots at each step
+```
+### 6. Cleanup
+```bash
+docker exec openrig-e2e rig daemon stop
+docker exec openrig-e2e agent-browser close
+docker stop openrig-e2e
+```
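When the steps run from a script, a failed test can skip the cleanup commands and leave the container running. One possible arrangement registers the cleanup with `trap`. `cleanup_e2e` is a hypothetical function that wraps the three commands shown above.

```shell
# Hypothetical cleanup function. Each step tolerates failure so the later
# steps still run (for example, when the daemon never started).
cleanup_e2e() {
  container="${1:-openrig-e2e}"
  docker exec "$container" rig daemon stop >/dev/null 2>&1 || true
  docker exec "$container" agent-browser close >/dev/null 2>&1 || true
  docker stop "$container" >/dev/null 2>&1 || true
}

# In a test script, register it once after starting the container:
#   trap cleanup_e2e EXIT
```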
+### 7. Write Report
+Copy the report template and fill it in:
+```bash
+cp {SKILL_DIR}/templates/e2e-report-template.md /tmp/openrig-e2e-artifacts/report.md
+```
+Fill in results as tests complete — do not batch findings for the end.
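The copy step can also pre-fill the placeholders that are known before any test runs. The sketch below handles only `{{DATE}}` and `{{PERSONA}}` and leaves every other `{{...}}` field untouched. `start_report` is a hypothetical helper, and it assumes the persona string contains no `|` or `&` characters, which `sed` would interpret.

```shell
# Hypothetical helper: copy the template and fill in the DATE and PERSONA
# placeholders; the remaining {{...}} fields stay as-is for later.
start_report() {
  template="$1"
  out="$2"
  persona="$3"
  today="$(date +%Y-%m-%d)"
  sed -e "s|{{DATE}}|$today|" -e "s|{{PERSONA}}|$persona|" "$template" > "$out"
}

# Usage ({SKILL_DIR} stands for the skill directory, as in the commands above):
#   start_report {SKILL_DIR}/templates/e2e-report-template.md \
#     /tmp/openrig-e2e-artifacts/report.md "Fresh User"
```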
+## Test Spec Templates
+| Template | Purpose |
+|----------|---------|
+| `templates/control-plane-test.yaml` | Multi-pod terminal-only rig spec (backend + frontend, 3 nodes, cross-pod edges) |
+| `templates/expansion-pod-fragment.yaml` | Pod fragment for expansion happy path (ops pod with cross-pod edge) |
+| `templates/expansion-collision-fragment.yaml` | Pod fragment that intentionally collides — for validation rejection testing |
+| `templates/e2e-report-template.md` | Structured test report template |
+## Scripts
+| Script | Purpose |
+|--------|---------|
+| `scripts/build-e2e-image.sh` | Build the Docker image from the repo (builds packages, packs tarball, builds image) |
+| `scripts/Dockerfile` | The proven Dockerfile — Node 22, tmux, system Chromium, agent-browser, OpenRig CLI |
+## Limitations
+- **No live agent runtimes.** The container does not include claude-code or codex. Use `runtime: terminal` specs for control plane testing. Test live agent behavior on the host.
+- **ARM64 browser workaround.** Chrome for Testing is unavailable on Linux ARM64. The Dockerfile uses Debian chromium. This is transparent to agent-browser commands.
+- **No GPU/display.** All browser testing is headless. Screenshots and videos capture what a user would see, but there is no visible browser window.
+## Combining with Host-Based Dogfood
+For complete coverage, use both approaches:
+| What to test | Where | Tool |
+|-------------|-------|------|
+| Install path, packaging | Container | This skill |
+| CLI commands, lifecycle | Container | This skill |
+| UI rendering, drawers | Container | This skill + agent-browser |
+| Validation/error paths | Container | This skill |
+| Live agent startup | Host | QA with /dogfood skill |
+| Communication (send/capture) | Host | QA with live agents |
+| Whoami from inside agent | Host | QA with live agents |
+| Transcript capture | Host | QA with live agents |
+| Chatroom with real participants | Host | QA with live agents |

package/daemon/specs/agents/shared/skills/containerized-e2e/scripts/Dockerfile ADDED

@@ -0,0 +1,39 @@
+# OpenRig Containerized E2E Testing Image
+#
+# Provides: Node 22, tmux, agent-browser (with system Chromium), OpenRig CLI
+# Usage:
+#   1. Build the CLI tarball: cd <repo>/packages/cli && npm pack
+#   2. Copy tarball to build context as openrig-cli.tgz
+#   3. docker build -t openrig-e2e .
+#   4. docker run -it --rm --shm-size=1g -v /tmp/openrig-e2e-artifacts:/artifacts openrig-e2e
+#
+# ARM64 note: Chrome for Testing is unavailable on Linux ARM64.
+# This image uses Debian's system chromium instead, pointed via
+# AGENT_BROWSER_EXECUTABLE_PATH.
+FROM node:22-bookworm
+ENV DEBIAN_FRONTEND=noninteractive
+ENV AGENT_BROWSER_ARGS=--no-sandbox
+ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
+ENV OPENRIG_HOME=/root/.openrig
+RUN apt-get update \
+ && apt-get install -y --no-install-recommends \
+    ca-certificates \
+    chromium \
+    curl \
+    git \
+    procps \
+    tmux \
+ && rm -rf /var/lib/apt/lists/*
+RUN npm install -g agent-browser
+COPY openrig-cli.tgz /tmp/openrig-cli.tgz
+RUN npm install -g /tmp/openrig-cli.tgz \
+ && rm /tmp/openrig-cli.tgz
+WORKDIR /workspace
+CMD ["/bin/bash"]

package/daemon/specs/agents/shared/skills/containerized-e2e/scripts/build-e2e-image.sh ADDED

@@ -0,0 +1,37 @@
+#!/bin/bash
+# Build the OpenRig containerized E2E testing image.
+#
+# Usage: ./build-e2e-image.sh [repo-root]
+#   repo-root defaults to the current directory.
+#
+# Produces: Docker image tagged openrig-e2e:latest
+set -euo pipefail
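+# Resolve to an absolute path so the later cd into packages/cli still works
+# when a relative repo-root is passed.
+REPO_ROOT="$(cd "${1:-.}" && pwd)"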
+SKILL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
+BUILD_CTX="/tmp/openrig-e2e-build"
+echo "=== Building OpenRig packages ==="
+cd "$REPO_ROOT"
+npm run build --workspace @openrig/daemon
+npm run build --workspace @openrig/ui
+npm run build --workspace @openrig/cli
+bash scripts/build-package.sh
+echo "=== Packing CLI tarball ==="
+mkdir -p "$BUILD_CTX"
+cd "$REPO_ROOT/packages/cli"
+npm pack --pack-destination "$BUILD_CTX"
+echo "=== Preparing Docker build context ==="
+cp "$SKILL_DIR/scripts/Dockerfile" "$BUILD_CTX/Dockerfile"
+# Rename tarball to a stable name
+mv "$BUILD_CTX"/openrig-cli-*.tgz "$BUILD_CTX/openrig-cli.tgz"
+echo "=== Building Docker image ==="
+cd "$BUILD_CTX"
+docker build -t openrig-e2e:latest .
+echo "=== Done ==="
+echo "Run: docker run -it --rm --shm-size=1g -v /tmp/openrig-e2e-artifacts:/artifacts openrig-e2e"

package/daemon/specs/agents/shared/skills/containerized-e2e/templates/control-plane-test.yaml ADDED

@@ -0,0 +1,40 @@
+version: "0.2"
+name: control-plane-test
+summary: >
+  Multi-pod terminal-only topology for containerized control-plane testing.
+  Exercises pods, cross-pod edges, expansion targets, and snapshot/restore
+  without requiring live coding runtimes.
+pods:
+  - id: backend
+    label: Backend Team
+    members:
+      - id: api
+        runtime: terminal
+        agent_ref: "builtin:terminal"
+        profile: none
+        cwd: /tmp
+      - id: db
+        runtime: terminal
+        agent_ref: "builtin:terminal"
+        profile: none
+        cwd: /tmp
+    edges:
+      - kind: delegates_to
+        from: api
+        to: db
+  - id: frontend
+    label: Frontend Team
+    members:
+      - id: ui
+        runtime: terminal
+        agent_ref: "builtin:terminal"
+        profile: none
+        cwd: /tmp
+    edges: []
+edges:
+  - kind: delegates_to
+    from: frontend.ui
+    to: backend.api

package/daemon/specs/agents/shared/skills/containerized-e2e/templates/e2e-report-template.md ADDED

@@ -0,0 +1,94 @@
+# Containerized E2E Test Report
+Date: {{DATE}}
+Persona: {{PERSONA}}
+Image: openrig-e2e:latest
+## Summary
+- Tests run: {{TOTAL}}
+- Passed: {{PASSED}}
+- Failed: {{FAILED}}
+- Skipped: {{SKIPPED}}
+## Environment
+- Node: {{NODE_VERSION}}
+- tmux: {{TMUX_VERSION}}
+- Chromium: {{CHROMIUM_VERSION}}
+- OpenRig CLI: {{RIG_VERSION}}
+- Platform: {{PLATFORM}}
+## Test Results
+### Install & Boot
+| Test | Result | Notes |
+|------|--------|-------|
+| npm install -g | | |
+| rig daemon start | | |
+| rig preflight | | |
+| rig doctor | | |
+| UI loads in browser | | |
+### Rig Lifecycle
+| Test | Result | Notes |
+|------|--------|-------|
+| rig up (terminal-only spec) | | |
+| rig ps / rig ps --nodes | | |
+| Graph renders in browser | | |
+| rig down --snapshot | | |
+| rig restore | | |
+| Restored nodes match | | |
+### Expansion
+| Test | Result | Notes |
+|------|--------|-------|
+| rig expand (happy path) | | |
+| Graph updates after expand | | |
+| ps --nodes shows new nodes | | |
+| Expand with collision (rejected) | | |
+| Rig unchanged after rejection | | |
+### Snapshot/Restore with Expansion
+| Test | Result | Notes |
+|------|--------|-------|
+| Snapshot captures expanded pods | | |
+| Restore brings back expanded pods | | |
+| Cross-pod edges survive restore | | |
+| Export includes expanded topology | | |
+### CLI Surface
+| Test | Result | Notes |
+|------|--------|-------|
+| rig specs ls | | |
+| rig specs show | | |
+| rig config | | |
+| rig whoami (daemon down) | | |
+| rig export | | |
+### UI Surface (agent-browser)
+| Test | Result | Notes |
+|------|--------|-------|
+| Dashboard renders | | |
+| Explorer sidebar | | |
+| Specs drawer opens | | |
+| Discovery drawer opens | | |
+| System drawer opens | | |
+| Rig detail drawer | | |
+| Node detail in graph | | |
+## Bugs Found
+(Append each bug as discovered — do not batch)
+## Artifacts
+- Screenshots: /artifacts/screenshots/
+- Videos: /artifacts/videos/
+- CLI transcript: /artifacts/cli-transcript.txt

package/daemon/specs/agents/shared/skills/containerized-e2e/templates/expansion-collision-fragment.yaml ADDED

@@ -0,0 +1,13 @@
+# This fragment intentionally collides with the existing 'backend' pod.
+# Use it to verify that expansion validation rejects namespace collisions
+# and leaves the rig unchanged.
+pod:
+  id: backend
+  label: Colliding Pod
+  members:
+    - id: worker
+      runtime: terminal
+      agent_ref: "builtin:terminal"
+      profile: none
+      cwd: /tmp
+  edges: []

package/daemon/specs/agents/shared/skills/containerized-e2e/templates/expansion-pod-fragment.yaml ADDED

@@ -0,0 +1,14 @@
+pod:
+  id: ops
+  label: Operations
+  members:
+    - id: monitor
+      runtime: terminal
+      agent_ref: "builtin:terminal"
+      profile: none
+      cwd: /tmp
+  edges: []
+crossPodEdges:
+  - kind: delegates_to
+    from: ops.monitor
+    to: backend.api

package/daemon/specs/agents/shared/skills/development-team/SKILL.md ADDED

@@ -0,0 +1,149 @@
+---
+name: development-team
+description: How the development pod coordinates implementation, QA, and design without skipping gates.
+---
+# Development Team
+You are part of the development pod. Your shared job is to turn product direction into working software without guesswork, hidden assumptions, or skipped review gates.
+## Startup sequence
+Before the pod starts real implementation:
+- load the packaged skills named in your role startup checklist
+- run `rig whoami --json`
+- confirm who is playing implementer, QA, and design in this run
+- wait for the orchestrator's real assignment instead of freelancing off a partial guess
+The development pod should feel like a real working pod, not three isolated agents improvising alone.
+## Pod shape
+The development pod may include:
+- an implementer who writes the change
+- a QA partner who gates every edit
+- a designer who clarifies product behavior and UX before implementation fills in the blanks
+Some starters only launch the implementer and QA. Others also launch a designer. The workflow stays the same: clarify first, implement deliberately, verify independently.
+## Shared loop
+This is the default loop for product work:
+```
+1. Clarify the work and the acceptance criteria
+2. Implementer sends a pre-edit proposal to QA
+3. QA approves or rejects with specifics
+4. Implementer changes code with TDD
+5. Implementer sends the diff and verification output back to QA
+6. QA approves or rejects with specifics
+7. If commit authority is enabled, the implementer may commit
+8. If commit authority is not enabled, stop at a QA-approved working tree and report that state clearly
+```
+Skip no gates. If the task is ambiguous, resolve the ambiguity before editing.
+## What the implementer must hand QA
+Pre-edit proposal should include:
+- the files expected to change
+- the behavior or acceptance criteria being targeted
+- the first failing test or verification step
+- any likely edge cases or invariants
+Post-edit review bundle should include:
+- what changed
+- the actual verification commands run
+- the result of those commands
+- any remaining uncertainty or follow-up risk
+QA should not have to reverse-engineer what the implementer thought they were doing.
+## Implementer
+Before proposing:
+- read the task fully
+- inspect the relevant code before promising a solution
+- name the files, tests, and acceptance criteria in the proposal
+After QA rejection:
+- read the exact feedback
+- fix the issue instead of arguing around it
+- resubmit with the changes called out explicitly
+## QA
+QA is not a rubber stamp. QA is a product voice — not just a test gate.
+When reviewing a proposal:
+- reject if the scope is wrong
+- check whether the planned tests actually prove the contract
+- flag hidden risks and missing failure cases
+When reviewing a diff:
+- read the actual code, not just the summary
+- verify independently when possible
+- if you cannot verify independently, require real output in the review bundle and inspect it critically
+If the implementer stalls on a permission or approval prompt, call that out immediately. Do not treat a blocked pane as finished implementation.
+### QA dogfood mode
+When QA is dogfooding (testing existing features rather than gating new code), QA works solo with full autonomy:
+- find issues AND fix them in a loop
+- test the fix, then move to the next issue
+- only escalate architecture-level concerns to the orchestrator
+- do not wait for approval to fix obvious bugs during dogfood
+- report findings to the chatroom so the rig has visibility
+### QA as a product voice
+QA sees the product from the user's perspective. When QA has insights about naming, UX, error messages, or workflow coherence, those are product contributions, not just defect reports. The orchestrator should give QA a voice in architecture decisions rather than limiting QA to test gating.
+## Designer
+When present, the designer should work ahead of implementation:
+- turn vague goals into concrete flows, states, copy, and interaction choices
+- surface edge cases before engineering has to guess
+- review built results for coherence, not just visual polish
+The designer is part of the development pod, not a decorative sidecar.
+## Browser testing and dogfood tools
+The development pod has access to browser automation and structured dogfood testing tools:
+- **`agent-browser`** — browser automation CLI. Navigate to the daemon UI, snapshot interactive elements, take annotated screenshots, record repro videos. Use `agent-browser open <url>`, `agent-browser snapshot -i`, `agent-browser screenshot --annotate`.
+- **`dogfood`** — structured exploratory testing workflow. Produces a report with screenshots, repro videos, and step-by-step evidence for every finding.
+- **`containerized-e2e`** — Docker-based clean-install testing. Simulates a fresh user environment.
+QA typically drives browser and dogfood testing, but both the implementer and QA should know these tools exist and may use them. When dogfooding UI:
+1. Load `/agent-browser` and `/dogfood`
+2. Open the daemon UI: `agent-browser open http://127.0.0.1:7433`
+3. Systematically explore surfaces, take screenshots as proof
+4. Report findings using the PASS/FAIL/GAP format to the chatroom
+## When the pod is blocked
+If the blocker is:
+- ambiguity: pull in design or ask the orchestrator for clarification
+- failing tests / unexpected behavior: use `systematic-debugging`
+- code changes: use `test-driven-development`
+- completion claims: use `verification-before-completion`
+Do not hand-wave around blockers. Name them and route them.
+## Communication
+- Pre-edit proposal: `rig send <qa-session> "PRE-EDIT: ..." --verify`
+- Review bundle: `rig send <qa-session> "REVIEW BUNDLE: ..." --verify`
+- Design clarification: `rig send <design-session> "Need product/design input on ..." --verify`
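The message prefixes above can be kept consistent with thin wrappers around `rig send`. This is only a sketch: `send_pre_edit` and `send_review_bundle` are hypothetical names, and the wrappers use nothing beyond the `rig send <session> "<message>" --verify` form shown in the list.

```shell
# Hypothetical wrappers around the `rig send ... --verify` form shown above.
# First argument: target session; remaining arguments: the message text.
send_pre_edit() {
  session="$1"; shift
  rig send "$session" "PRE-EDIT: $*" --verify
}

send_review_bundle() {
  session="$1"; shift
  rig send "$session" "REVIEW BUNDLE: $*" --verify
}

# Usage (the session name is a placeholder):
#   send_pre_edit <qa-session> "files, acceptance criteria, first failing test"
```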
+## When blocked
+If permissions block tests, file access, or commits:
+1. identify the exact blocked command
+2. tell the human what that prevents
+3. continue with the work you can still do
+Do not silently stall. Do not pretend blocked verification is complete.