npm - @yemi33/minions - Versions diffs - 0.1.2070 → 0.1.2072 - Mend

@yemi33/minions 0.1.2070 → 0.1.2072

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/dashboard/js/qa.js +358 -0
package/dashboard/js/state.js +2 -1
package/dashboard/pages/qa.html +72 -0
package/dashboard/styles.css +102 -0
package/dashboard.js +410 -6
package/docs/qa-runbook-lifecycle.md +232 -0
package/engine/cleanup.js +4 -1
package/engine/comment-classifier.js +8 -1
package/engine/cooldown.js +6 -2
package/engine/gh-comment.js +74 -3
package/engine/gh-token.js +7 -9
package/engine/lifecycle.js +100 -0
package/engine/pipeline.js +9 -1
package/engine/playbook.js +39 -0
package/engine/qa-runners/maestro.js +152 -0
package/engine/qa-runners/playwright.js +149 -0
package/engine/qa-runners.js +323 -0
package/engine/qa-sessions.js +1008 -0
package/engine/shared.js +71 -12
package/engine.js +140 -0
package/package.json +1 -1
package/playbooks/qa-session-draft.md +158 -0
package/playbooks/qa-session-execute.md +165 -0
package/playbooks/qa-session-setup.md +154 -0
package/prompts/cc-system.md +43 -0
package/routing.md +3 -0

package/playbooks/qa-session-setup.md ADDED

@@ -0,0 +1,154 @@
+---
+requiresProjectContext: true
+---
+# Playbook: QA Session SETUP
+You are {{agent_name}}, the {{agent_role}} on the {{project_name}} project.
+TEAM ROOT: {{team_root}}
+## Your Task
+QA Session **SETUP** phase for session **{{session_id}}** (work item {{item_id}}).
+A user asked Minions to QA the following target and flows:
+- **Session id:** `{{session_id}}`
+- **Target kind:** `{{target_kind}}`
+- **Target PR id:** `{{target_pr_id}}`
+- **Target branch:** `{{target_branch}}`
+- **Target commit SHA:** `{{target_sha}}`
+- **Target worktree (kind=current):** `{{target_worktree}}`
+- **Raw target JSON:** `{{target_json}}`
+- **Flows (natural language):** {{flows_raw}}
+- **Runner hint (optional explicit runner):** `{{runner_hint}}`
+- **Capture:** `{{capture}}`
+- **Mode:** `{{session_mode}}`
+{{additional_context}}
+## What "qa-session-setup" means
+A `qa-session-setup` task is the **first** of three chained work items the
+engine dispatches for each QA Session (SETUP → DRAFT → EXECUTE). Your job is
+to make the target runnable so the DRAFT and EXECUTE agents can drive a real
+live instance:
+1. **Resolve the target** (`{{target_kind}}`) into a checked-out worktree.
+   - `pr`     → check out the PR's head branch (`{{target_pr_id}}`).
+   - `branch` → check out branch `{{target_branch}}`.
+   - `commit` → detach at `{{target_sha}}`.
+   - `current` → reuse the existing worktree at `{{target_worktree}}` (or
+     `MINIONS_AGENT_CWD` if `{{target_worktree}}` is empty).
+2. **Inspect the codebase** to find a single "dev-up" command. Look in this
+   order: `package.json` `scripts.dev|start|serve`, top-level `Procfile`,
+   project README "Run locally / Getting Started" section, a `Makefile` `dev`
+   target, or a docker-compose service that exposes an HTTP port. Pick the
+   smallest command that brings the app up and binds to a TCP port.
+3. **Write the managed-spawn sidecar** to
+   `agents/{{agent_id}}/managed-spawn.json` (relative to the Minions root)
+   with **exactly one** spec named **`{{managed_spawn_name}}`**. Use the JSON
+   shape the `managed_spawn` section below documents. The engine ingests this
+   sidecar on your exit and gates the next phase on the first healthcheck.
+## Hard requirements on the sidecar
+The engine validates the sidecar through `evaluateManagedSpawnAcceptance`.
+Anything that fails validation flips your dispatch to FAILED with
+`failure_class: 'invalid-managed-spawn'` and the QA Session transitions to
+`failed` automatically. Specifically:
+- `specs[0].name` MUST equal `{{managed_spawn_name}}` (exact match). The
+  engine joins the spawn back to its owning session by this convention.
+- `specs[0].healthcheck` MUST be present and verifiable. Prefer HTTP
+  (`type: 'http'`) with a real URL and `expect_status` set; fall back to
+  `type: 'command'` only when the app has no HTTP surface (e.g. a CLI worker).
+- `specs[0].cmd` MUST be on the engine's allowlist (`node`, `bun`, `npm`,
+  `npx`, `pnpm`, `yarn`, `python`, `docker`, `adb`, `gradle`, `gradlew`,
+  `mvn`, `pwsh`, `powershell`, `bash`, `sh`, `curl`, `git`, …). If the
+  project requires a non-allowlisted binary, **stop and report a setup
+  failure** (see below) — do NOT try to work around it.
+- `specs[0].cwd` MUST be an absolute path inside the resolved worktree.
+- Pick a free port and put it in both `ports[]` and the healthcheck URL.
+The `managed_spawn` block injected later in this prompt has the full schema
+and the executable allowlist enumerated. Read it before writing the sidecar.
+## No PR, no commit, no test code yet
+SETUP only resolves the target and writes the managed-spawn spec. **Do not**:
+- write any test code — that belongs to the DRAFT phase
+- commit, push, or open a PR — sessions are tracked via the session record,
+  not a merged PR
+- modify project source — the dev-up command should run the project as-is
+- start the app yourself (`bun run dev` in a detached process) — the engine
+  spawns the spec for you after you exit
+If the project genuinely will not run without a code change (missing
+dependency wiring, hard-coded prod URL, etc.), stop and report a setup
+failure so the human can decide whether to patch it themselves.
+## Failure path (REQUIRED)
+If you cannot resolve the target, cannot find a dev-up command, hit a
+non-allowlisted binary, or otherwise produce a sidecar that
+`evaluateManagedSpawnAcceptance` would reject, **do not write a malformed
+sidecar**. Instead, write your completion report with:
+```json
+{
+  "status": "failed",
+  "summary": "<one-line human-readable explanation of what blocked SETUP>",
+  "failure_class": "qa-session-setup-failed",
+  "retryable": false,
+  "needs_rerun": false,
+  "nonce": "<value of MINIONS_COMPLETION_NONCE env var>",
+  "artifacts": []
+}
+```
+The `engine/qa-sessions.js#handleSetupComplete` hook reads `failure_class`
+and the summary, transitions the session to `failed`, and surfaces the
+explanation in the dashboard session card so the human knows exactly why
+SETUP gave up.
+Examples of legitimate failure summaries:
+- `"Project has no detectable dev-up command — no package.json scripts.dev, Procfile, or Makefile dev target."`
+- `"Required binary 'cargo' is not on the engine's managed-spawn allowlist."`
+- `"Target PR #1234 head branch could not be checked out: fatal: reference is not a tree."`
+- `"Detected dev-up command but it requires a database connection string we have no way to provide in CI."`
+## Working directory
+```bash
+# PowerShell
+echo $env:MINIONS_AGENT_CWD
+pwd
+# bash/zsh
+echo "$MINIONS_AGENT_CWD"
+pwd
+```
+`MINIONS_AGENT_CWD` is the engine-resolved worktree root. Prefer it over
+`pwd` for any cwd-sensitive command.
+## Findings
+Write findings to `{{team_root}}/notes/inbox/{{agent_id}}-{{item_id}}-{{date}}.md`
+only after successful completion. Include:
+- Session id + target summary
+- Dev-up command chosen and where it was discovered (file:line)
+- Managed-spawn name, healthcheck shape, port
+- Notes for future setup runs on the same target (flaky startup, env-vars
+  needed, port collisions)
+## Constraints
+- Do not modify production code unless explicitly asked.
+- Do not remove worktrees; the engine handles cleanup automatically.
+- The sidecar is the deliverable — without it, the session is stuck in
+  `spawning` until the SETUP WI times out.
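
The "Hard requirements on the sidecar" list in this playbook can be sketched as a pre-flight check an agent might run before exiting. This is a minimal sketch under stated assumptions, not the engine's `evaluateManagedSpawnAcceptance`: the function name `checkSidecar`, the `url` and `args` field names, and numeric `ports[]` entries are all hypothetical, since the full `managed_spawn` schema is not part of this diff. The allowlist below contains only the binaries quoted in the playbook, whose list is truncated with "…".

```javascript
const path = require("node:path");

// Only the binaries quoted in the playbook; the source list ends in "…",
// so the engine's real allowlist is longer than this set.
const QUOTED_ALLOWLIST = new Set([
  "node", "bun", "npm", "npx", "pnpm", "yarn", "python", "docker", "adb",
  "gradle", "gradlew", "mvn", "pwsh", "powershell", "bash", "sh", "curl", "git",
]);

// Hypothetical pre-flight check (NOT the engine's validator).
// Returns a list of human-readable problems; an empty list means "looks OK".
function checkSidecar(sidecar, expectedName, worktree) {
  const problems = [];
  const specs = Array.isArray(sidecar.specs) ? sidecar.specs : [];
  if (specs.length !== 1) problems.push("expected exactly one spec");
  const spec = specs[0] || {};

  if (spec.name !== expectedName) problems.push("specs[0].name mismatch");
  if (!QUOTED_ALLOWLIST.has(spec.cmd)) problems.push("cmd not in quoted allowlist");

  // cwd must be absolute and equal to, or nested under, the worktree.
  const cwd = spec.cwd || "";
  const rel = path.relative(worktree, cwd);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (!path.isAbsolute(cwd) || escapes) {
    problems.push("cwd is not an absolute path inside the worktree");
  }

  const hc = spec.healthcheck;
  if (!hc || (hc.type !== "http" && hc.type !== "command")) {
    problems.push("healthcheck missing or of unknown type");
  } else if (hc.type === "http") {
    if (!Number.isInteger(hc.expect_status)) problems.push("expect_status not set");
    // Assumption: the healthcheck URL lives in a field named `url`.
    let port = null;
    try {
      port = Number(new URL(hc.url).port);
    } catch {
      problems.push("healthcheck url is not a valid URL");
    }
    if (port !== null && !(spec.ports || []).includes(port)) {
      problems.push("healthcheck port not listed in ports[]");
    }
  }
  return problems;
}

// Illustrative sidecar; every value here is made up for the example.
const example = {
  specs: [{
    name: "qa-session-demo",
    cmd: "npm",
    args: ["run", "dev"], // `args` is an assumed field name
    cwd: "/work/wt/app",
    ports: [4173],
    healthcheck: { type: "http", url: "http://127.0.0.1:4173/", expect_status: 200 },
  }],
};
console.log(checkSidecar(example, "qa-session-demo", "/work/wt")); // prints []
```

A check like this only mirrors the rules spelled out in the playbook text; the engine's own validator remains the authority, and a non-empty result would correspond to the playbook's failure path rather than to writing a malformed sidecar.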

package/prompts/cc-system.md CHANGED

@@ -152,6 +152,49 @@ curl -s http://localhost:{{dashboard_port}}/api/work-items
 curl -s http://localhost:{{dashboard_port}}/api/status
 ```
+## QA Sessions — natural-language → POST /api/qa/session
+When the user describes a UI/E2E flow they want validated against a *live, running* app instance — "QA the login flow on PR #1234", "smoke test the homepage", "test the checkout flow on the `develop` branch", "validate the signup journey on my current worktree", etc. — dispatch a QA Session via `POST /api/qa/session` instead of opening an `implement` or `test` work item. The session pipeline boots the dev-up command as a managed-spawn, drafts a runner-native test (Playwright or Maestro) from the natural-language flows, then executes it against the live target and captures the requested artifacts.
+**When to use this endpoint:**
+- The user wants a behavioural / end-to-end check ("does the login form actually work", "make sure the cart adds items"). Use `/api/qa/session`.
+- The user wants a code change, unit/integration tests in the repo, or an investigation. Keep using `/api/work-items` with `type: "fix"`, `"implement"`, `"test"`, `"explore"`.
+- The user explicitly says "QA …", "smoke test …", "test the … flow", "validate the … journey", or names a concrete UI walkthrough against a live app. Use `/api/qa/session`.
+**Body shape (mirrors `engine/qa-sessions.js#validateSpec`):**
+- `target` — REQUIRED object describing what to QA against. `target.kind` is one of:
+  - `pr` — needs `target.prId` (e.g. `"github:yemi33/minions#2911"` or `"ado:office/iss/constellation#5215493"`).
+  - `branch` — needs `target.branch` (e.g. `"develop"`).
+  - `commit` — needs `target.sha` (full 40-char SHA).
+  - `current` — uses the user's current worktree. `target.worktree` is optional; the SETUP agent falls back to `MINIONS_AGENT_CWD` when empty.
+- `flowsRaw` — REQUIRED natural-language description of the steps to test (≤4000 chars). Paste the user's own description verbatim where possible; the runner adapter translates it into the runner-native script.
+- `mode` — `"confirm"` (default — pauses at `awaiting-approval` so the user can review the drafted test before EXECUTE fires) or `"auto"` (chains straight from DRAFT to EXECUTE). Pick `"confirm"` unless the user said "just run it" / "no review needed" / "auto".
+- `capture` — optional `{ video?: bool, screenshots?: bool, logs?: bool }`. Default is everything false. Set what the user asked for.
+- `runner` — optional kebab-case name to force a specific runner (`"playwright"`, `"maestro"`, or a plugin). Omit to let the engine auto-detect (Maestro wins when the project has `.maestro/`; Playwright is the safe default).
+- `project` — REQUIRED when multiple projects are configured (mirrors `/api/work-items`). Omit for the central path.
+**Worked example — PR target, confirm mode (default):**
+```
+curl -s -X POST http://localhost:{{dashboard_port}}/api/qa/session \
+  -H 'Content-Type: application/json' \
+  -H 'X-CC-Turn-Id: {{cc_turn_id}}' \
+  -d '{"target":{"kind":"pr","prId":"github:yemi33/MyApp#1234"},"flowsRaw":"Open the homepage, click Login, enter test@example.com / hunter2, and verify the dashboard renders with the user'\''s name in the header.","mode":"confirm","capture":{"screenshots":true,"logs":true},"project":"MyApp"}'
+```
+**Worked example — current worktree, auto mode, video capture:**
+```
+curl -s -X POST http://localhost:{{dashboard_port}}/api/qa/session \
+  -H 'Content-Type: application/json' \
+  -H 'X-CC-Turn-Id: {{cc_turn_id}}' \
+  -d '{"target":{"kind":"current"},"flowsRaw":"Add three items to the cart, go to checkout, complete the payment form with the Stripe test card, and verify the success page.","mode":"auto","capture":{"video":true,"screenshots":true},"project":"MyApp"}'
+```
+**Response:** `{ sessionId, state: "spawning", setupWorkItemId, managedSpawnName }`. Tell the user the session id so they can watch it at `/qa` and steer it via the `/approve` (run the drafted test), `/edit` (re-draft with feedback), `/dismiss` (accept the draft without running), `/cancel` (give up), or `/kill` (cancel + tear down the managed-spawn) endpoints listed in `GET /api/routes`.
+**Do not also dispatch a `/api/work-items` `implement` or `test` for the same QA request.** The QA Session pipeline owns its own SETUP → DRAFT → EXECUTE work items end-to-end; firing a parallel work-item is the same double-dispatch class that the "Never both" rule above forbids. If the user asks for both a QA pass AND a code change, do them as two separate, sequential calls — QA Session for the behavioural check, work-item for the fix.
 **Required fields per endpoint** — the server returns `{ error: "..." }` if missing. Common cases:
 - `POST /api/work-items`: `title` REQUIRED. `description` recommended. `project` REQUIRED when multiple projects are configured (server returns the list of known names if you guess wrong). `type` defaults to `implement`; valid values: `fix`, `implement`, `implement:large`, `setup`, `explore`, `ask`, `review`, `test`, `verify`. Agent hint via `agent` (string) or `agents` (array).
   - Exempt from the `project` requirement (these run rootless or via central paths): `ask`, `explore`, `plan`, `plan-to-prd`, `meeting`. (`docs` is intentionally NOT exempt — it's write-capable and lands in `WORKTREE_REQUIRING_TYPES`, so it needs a real project worktree. For minions-repo docs work, pass `project: "minions"` explicitly.) `setup` is also in the project-required set — it operates inside a real project worktree but produces no PR. Every other type needs a project worktree, so the server rejects project-less creates with `400 { error, knownProjects }` when ≠1 project is configured.
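
The body shape described in the added "QA Sessions" section can be sketched as a small client-side builder that applies the documented defaults before the JSON is sent to `POST /api/qa/session`. This is a hypothetical helper, not `engine/qa-sessions.js#validateSpec`: the name `buildQaSessionBody` is invented for illustration, and the hexadecimal check on `target.sha` is an assumption (the prompt only says "full 40-char SHA").

```javascript
// Required field per target kind, as listed in the prompt text.
const REQUIRED_TARGET_FIELD = new Map([
  ["pr", "prId"],
  ["branch", "branch"],
  ["commit", "sha"],
  ["current", null], // target.worktree is optional for kind "current"
]);

// Hypothetical helper (NOT the engine's validateSpec): checks the documented
// rules and fills in the documented defaults.
function buildQaSessionBody({ target, flowsRaw, mode = "confirm", capture = {}, runner, project }) {
  if (!target || !REQUIRED_TARGET_FIELD.has(target.kind)) {
    throw new Error("target.kind must be pr, branch, commit, or current");
  }
  const needed = REQUIRED_TARGET_FIELD.get(target.kind);
  if (needed && !target[needed]) {
    throw new Error(`target.${needed} is required for kind "${target.kind}"`);
  }
  // Assumption: a "full 40-char SHA" means 40 hexadecimal characters.
  if (target.kind === "commit" && !/^[0-9a-f]{40}$/i.test(target.sha)) {
    throw new Error("target.sha must be a full 40-char SHA");
  }
  if (typeof flowsRaw !== "string" || flowsRaw.trim() === "" || flowsRaw.length > 4000) {
    throw new Error("flowsRaw is required and limited to 4000 characters");
  }
  if (mode !== "confirm" && mode !== "auto") {
    throw new Error('mode must be "confirm" or "auto"');
  }
  const body = {
    target,
    flowsRaw,
    mode,
    // Documented default: every capture flag is false unless requested.
    capture: {
      video: Boolean(capture.video),
      screenshots: Boolean(capture.screenshots),
      logs: Boolean(capture.logs),
    },
  };
  if (runner) body.runner = runner;    // omit to let the engine auto-detect
  if (project) body.project = project; // needed when multiple projects exist
  return body;
}

const body = buildQaSessionBody({
  target: { kind: "branch", branch: "develop" },
  flowsRaw: "Open the homepage and verify the header renders.",
  capture: { screenshots: true },
});
console.log(JSON.stringify(body));
```

Building the body in code and serializing it with `JSON.stringify` also sidesteps the shell-quoting gymnastics (`'\''`) that the inline `curl -d` examples need for apostrophes in `flowsRaw`.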

package/routing.md CHANGED

@@ -22,6 +22,9 @@ How the engine decides who handles what. Parsed by engine.js — keep the table
 | docs | lambert | _any_ |
 | setup | dallas | _any_ |
 | qa-validate | dallas | ralph |
+| qa-session-setup | dallas | ralph |
+| qa-session-draft | dallas | ralph |
+| qa-session-execute | dallas | ralph |
 Notes:
 - `_author_` means route to the PR author
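
Because `routing.md` is parsed by `engine.js`, the three added rows must keep the same three-cell pipe layout as the existing ones. The sketch below shows one way such rows could be read; it is not the engine's parser. The function name `parseRoutingRows` and the property names `first` and `second` are hypothetical, and the meaning of the second and third columns is an assumption, since the table header is outside this hunk.

```javascript
// Hypothetical reader for three-cell Markdown table rows (NOT engine.js's parser).
function parseRoutingRows(markdown) {
  const routes = new Map();
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("|")) continue; // not a table row
    // Drop the empty strings before the first and after the last pipe.
    const cells = trimmed.split("|").slice(1, -1).map((c) => c.trim());
    if (cells.length !== 3) continue;
    if (cells.every((c) => /^:?-+:?$/.test(c))) continue; // separator row
    const [type, first, second] = cells;
    // Assumption: column 2 is the first-choice agent, column 3 the alternative.
    // A header row, if passed in, would show up as an ordinary entry.
    routes.set(type, { first, second });
  }
  return routes;
}

// Rows copied from the hunk above (one existing row plus the three additions).
const sample = [
  "| qa-validate | dallas | ralph |",
  "| qa-session-setup | dallas | ralph |",
  "| qa-session-draft | dallas | ralph |",
  "| qa-session-execute | dallas | ralph |",
].join("\n");

const routes = parseRoutingRows(sample);
console.log(routes.get("qa-session-draft"));
```

Under this reading, all three QA Session phases resolve to the same pair of agents as `qa-validate`, so a single session's SETUP, DRAFT, and EXECUTE work items would be routed consistently.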