@pdlc-os/pdlc 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (41)

package/.claude/commands/brainstorm.md +360 -0
package/.claude/commands/build.md +383 -0
package/.claude/commands/init.md +371 -0
package/.claude/commands/ship.md +349 -0
package/.claude/settings.json +40 -0
package/CLAUDE.md +179 -0
package/README.md +452 -0
package/agents/bolt.md +84 -0
package/agents/echo.md +87 -0
package/agents/friday.md +83 -0
package/agents/jarvis.md +87 -0
package/agents/muse.md +87 -0
package/agents/neo.md +78 -0
package/agents/oracle.md +81 -0
package/agents/phantom.md +85 -0
package/agents/pulse.md +95 -0
package/bin/pdlc.js +221 -0
package/hooks/pdlc-context-monitor.js +129 -0
package/hooks/pdlc-guardrails.js +307 -0
package/hooks/pdlc-session-start.sh +73 -0
package/hooks/pdlc-statusline.js +183 -0
package/package.json +48 -0
package/scripts/frame-template.html +332 -0
package/scripts/helper.js +88 -0
package/scripts/server.cjs +357 -0
package/scripts/start-server.sh +173 -0
package/scripts/stop-server.sh +54 -0
package/skills/reflect.md +189 -0
package/skills/repo-scan.md +266 -0
package/skills/review.md +156 -0
package/skills/safety-guardrails.md +168 -0
package/skills/ship.md +148 -0
package/skills/tdd.md +88 -0
package/skills/test.md +153 -0
package/templates/CONSTITUTION.md +254 -0
package/templates/INTENT.md +120 -0
package/templates/OVERVIEW.md +93 -0
package/templates/PRD.md +212 -0
package/templates/STATE.md +113 -0
package/templates/episode.md +182 -0
package/templates/review.md +215 -0

package/skills/safety-guardrails.md ADDED

@@ -0,0 +1,168 @@
+# Safety Guardrail Reference
+## When this skill activates
+This skill is the authoritative reference for all three guardrail tiers. It is consulted by the guardrails hook (`hooks/pdlc-guardrails.js`) on every tool call and destructive operation. It is also referenced directly by all other skills when a potentially dangerous action is about to be taken.
+When in doubt about whether an action requires a guardrail check: check this document first.
+---
+## Tier 1 — Hard Blocks
+Tier 1 actions are **blocked by default**. They require the full double-RED confirmation protocol before PDLC will allow them to proceed. Even after confirmation, they are logged as Tier 1 events in `docs/pdlc/memory/STATE.md`.
+### Tier 1 actions:
+**1. Force-push to main or master**
+- Trigger: any `git push --force`, `git push -f`, or `git push --force-with-lease` targeting `main` or `master`.
+- Why it's Tier 1: irreversibly rewrites shared history; other team members' local branches may become inconsistent; no recovery without a backup.
+**2. DROP TABLE without a prior migration file**
+- Trigger: any SQL `DROP TABLE` statement executed directly, or any ORM migration that calls `drop_table`, `dropTable`, or equivalent, without a corresponding migration file already committed to the repository.
+- Why it's Tier 1: data destruction is irreversible without a backup. The migration file requirement ensures the action is intentional and tracked.
+**3. rm -rf on files outside the current feature branch**
+- Trigger: any `rm -rf` (or `rimraf`, `del -rf`, or equivalent) targeting paths that contain files not created or modified by the current feature branch.
+- Determination: compare the target path against `git diff --name-only main...HEAD`. If the target contains files not in the diff, this is Tier 1.
+- Why it's Tier 1: deletes work that exists outside the current change set, potentially destroying unrelated code, config, or data.
+**4. Deploy with failing test gates**
+- Trigger: any attempt to execute Step 5 (Trigger CI/CD) in the Ship protocol when the Constitution test gates have not passed.
+- Why it's Tier 1: shipping broken code to production can cause outages, data corruption, and customer impact.
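Editor's sketch (not part of the package): the determination rule in item 3 reads as a set-membership test. The example below assumes the caller has already expanded the `rm -rf` target into repository-relative file paths and captured the output of `git diff --name-only main...HEAD`. The function name `classifyRecursiveDelete` and both parameters are hypothetical, and the actual hook may work differently.

```javascript
// Hypothetical helper, not taken from hooks/pdlc-guardrails.js.
// targetFiles: files the delete would remove (repo-relative paths).
// branchFiles: lines printed by `git diff --name-only main...HEAD`.
function classifyRecursiveDelete(targetFiles, branchFiles) {
  const touched = new Set(branchFiles);
  // Any file the branch did not create or modify makes this a Tier 1 action.
  const outside = targetFiles.filter((file) => !touched.has(file));
  // Every other recursive delete still falls under the Tier 2 "any rm -rf" rule.
  return { tier: outside.length > 0 ? 1 : 2, outside };
}

const result = classifyRecursiveDelete(
  ["src/auth/login.js", "src/billing/invoice.js"],
  ["src/auth/login.js", "src/auth/login.test.js"]
);
console.log(result.tier, result.outside);
```

Returning the list of outside files alongside the tier makes it possible to name them in the Tier 1 warning.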
+---
+## Tier 2 — Pause and Confirm
+Tier 2 actions **pause execution** and require explicit human confirmation before proceeding. The human must type a clear "yes, proceed" (or equivalent affirmative) — PDLC does not accept ambiguous responses.
+Tier 2 actions can be **downgraded to Tier 3** (proceed with logged warning only) by adding an explicit entry to `docs/pdlc/memory/CONSTITUTION.md` under a "Guardrail overrides" section. Example: `tier2_downgrade: rm_rf` downgrades all `rm -rf` actions to Tier 3 for this project.
+### Tier 2 actions:
+**1. Any rm -rf**
+- Trigger: `rm -rf` or an equivalent bulk-delete command targeting any path. If the target contains files outside the current feature branch's change set, the Tier 1 rule applies instead.
+- Pause message: "About to delete [target path] recursively. This will permanently remove all files and directories at that path. Confirm? (yes/no)"
+**2. git reset --hard**
+- Trigger: `git reset --hard [any ref]`.
+- Pause message: "About to run `git reset --hard [ref]`. This will discard all uncommitted changes and move HEAD to [ref]. This cannot be undone. Confirm? (yes/no)"
+**3. Production database commands**
+- Trigger: `psql`, `mysql`, `sqlite3`, or any ORM migration runner invoked with a production connection string (identified by: `DATABASE_URL` or `DB_URL` containing `prod`, `production`, or a non-localhost, non-test host; or connection strings explicitly labeled `prod` in env files).
+- Pause message: "About to run a database command against what appears to be a production database ([connection hint]). Confirm? (yes/no)"
+**4. External API write calls**
+- Trigger: any HTTP `POST`, `PUT`, `PATCH`, or `DELETE` to an external URL (i.e. not `localhost` or `127.0.0.1`). Includes Slack webhooks, email APIs, payment processors, third-party services, GitHub API write calls.
+- Pause message: "About to make a [METHOD] request to [URL]. This is an external write operation. Confirm? (yes/no)"
+**5. Modifying CONSTITUTION.md**
+- Trigger: any write, edit, or overwrite of `docs/pdlc/memory/CONSTITUTION.md`.
+- Pause message: "About to modify CONSTITUTION.md. This changes the rules governing this project. Confirm? (yes/no)"
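Editor's sketch (not part of the package): the "external API write" trigger in item 4 can be expressed as a check on the HTTP method and the URL's host. The name `isExternalWrite` is hypothetical, and treating only `localhost` and `127.0.0.1` as local follows the wording of the trigger rather than any confirmed implementation.

```javascript
// Hypothetical helper: does this request match the Tier 2 external-write trigger?
const WRITE_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1"]);

function isExternalWrite(method, url) {
  if (!WRITE_METHODS.has(method.toUpperCase())) return false;
  // URL is a global in Node.js and browsers; hostname excludes the port.
  const host = new URL(url).hostname;
  return !LOCAL_HOSTS.has(host);
}

console.log(isExternalWrite("POST", "https://api.example.com/v1/messages"));
console.log(isExternalWrite("POST", "http://localhost:3000/api/users"));
console.log(isExternalWrite("GET", "https://api.example.com/v1/messages"));
```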
+---
+## Tier 3 — Logged Warnings
+Tier 3 actions **proceed without interruption**. PDLC logs the event in `docs/pdlc/memory/STATE.md` under a "Guardrail log" section and continues.
+Log format:
+```
+[YYYY-MM-DD HH:MM] Tier 3 event: [action description] — [context: task ID, phase, reason]
+```
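Editor's sketch (not part of the package): a formatter that produces a line in the log format above. `formatTier3Event` is a hypothetical name, and using the local time zone for the timestamp is an assumption, since the source does not say which zone applies.

```javascript
// Hypothetical formatter for the Tier 3 log line.
const pad = (n) => String(n).padStart(2, "0");

function formatTier3Event(date, action, context) {
  // Assumption: timestamps use local time.
  const stamp =
    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())} ` +
    `${pad(date.getHours())}:${pad(date.getMinutes())}`;
  // "\u2014" is the dash separator used in the documented format.
  return `[${stamp}] Tier 3 event: ${action} \u2014 ${context}`;
}

const line = formatTier3Event(
  new Date(2025, 0, 5, 9, 7),
  "Skipped Layer 4",
  "task bd-12, Test phase, human instruction"
);
console.log(line);
```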
+### Tier 3 actions:
+**1. Skipping a test layer**
+- Trigger: a test layer from the Test Execution Protocol is skipped for any reason.
+- Log entry must include: which layer was skipped, who authorized the skip (human instruction or Constitution config), and the current task ID.
+**2. Overriding a Constitution rule**
+- Trigger: any action that explicitly deviates from a rule defined in `docs/pdlc/memory/CONSTITUTION.md`.
+- Log entry must include: which rule was overridden, why, and who authorized the override.
+**3. Accepting a Phantom security warning without fixing**
+- Trigger: human marks a Phantom finding in a review file as "Accept and move on" without requesting a fix.
+- Log entry must include: the finding title, the affected file, and the human's stated reason (if provided).
+**4. Accepting an Echo test coverage gap**
+- Trigger: human marks an Echo finding (test coverage gap or missing edge case) in a review file as "Accept and move on" without requesting a fix.
+- Log entry must include: the finding title, the affected coverage area, and the human's stated reason (if provided).
+---
+## Override Protocol for Tier 1
+When a Tier 1 action is detected, execute the following sequence exactly. Do not deviate.
+**Step 1 — First RED warning**
+Display the following (in bold/highlighted/red formatting):
+```
+[TIER 1 — HARD BLOCK]
+You are about to perform a Tier 1 action:
+  Action: [specific action description]
+  Target: [specific target — file path, branch name, table name, etc.]
+  Risk: [one sentence describing the irreversible consequence]
+This action is classified as a hard block because it can cause irreversible damage.
+To proceed, you must confirm twice.
+Confirmation 1 of 2: Type exactly: "yes, I understand this is a Tier 1 action"
+```
+Wait for the human's response. Do not proceed until you receive the exact phrase.
+**Step 2 — Validate first confirmation**
+If the human typed the exact phrase "yes, I understand this is a Tier 1 action" (compared case-insensitively): proceed to Step 3.
+If the human typed anything else: treat as a cancellation. State: "Tier 1 action cancelled. No changes made." Stop.
+**Step 3 — Second RED warning**
+Display again (in bold/highlighted/red formatting):
+```
+[TIER 1 — HARD BLOCK — SECOND CONFIRMATION REQUIRED]
+You are still about to perform:
+  Action: [same action description as Step 1]
+  Target: [same target as Step 1]
+  Risk: [same risk as Step 1]
+This is your second and final confirmation. Once you confirm, this action will execute immediately.
+Confirmation 2 of 2: Type exactly: "yes, proceed"
+```
+Wait for the human's response.
+**Step 4 — Validate second confirmation**
+If the human typed the exact phrase "yes, proceed" (compared case-insensitively): proceed to Step 5.
+If the human typed anything else: treat as a cancellation. State: "Tier 1 action cancelled after first confirmation. No changes made." Stop.
+**Step 5 — Execute and log**
+Execute the action.
+Immediately after execution, log to `docs/pdlc/memory/STATE.md`:
+```
+[YYYY-MM-DD HH:MM] Tier 1 event EXECUTED: [action description] — [target] — double-RED confirmation completed by human.
+```
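Editor's sketch (not part of the package): Steps 2 and 4 reduce to two phrase comparisons with distinct cancellation messages. `evaluateTier1Confirmations` is a hypothetical name, and trimming surrounding whitespace before comparing is an assumption; the source only specifies case-insensitivity.

```javascript
// Hypothetical validator for the double-RED confirmation sequence.
const PHRASE_1 = "yes, i understand this is a tier 1 action";
const PHRASE_2 = "yes, proceed";

// Assumption: surrounding whitespace is ignored.
const normalize = (reply) => String(reply || "").trim().toLowerCase();

function evaluateTier1Confirmations(firstReply, secondReply) {
  if (normalize(firstReply) !== PHRASE_1) {
    return { execute: false, message: "Tier 1 action cancelled. No changes made." };
  }
  if (normalize(secondReply) !== PHRASE_2) {
    return {
      execute: false,
      message: "Tier 1 action cancelled after first confirmation. No changes made.",
    };
  }
  return { execute: true, message: null };
}

console.log(evaluateTier1Confirmations("Yes, I understand this is a Tier 1 action", "yes"));
```

In an interactive flow the second prompt would only be shown after the first reply validates; the two-argument form here is only for illustration.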
+---
+## Rules
+- Tier classification is determined by the action type, not by context or intent. A force-push to main is always Tier 1, even if the human says "it's fine."
+- Tier 1 double-RED confirmation cannot be scripted, pre-approved, or batch-confirmed. Each instance requires its own two-step confirmation.
+- Tier 2 actions downgraded to Tier 3 via CONSTITUTION.md still get logged. Downgrading removes the pause, not the log entry.
+- All guardrail events are logged in STATE.md: Tier 1 and Tier 2 events whether executed or cancelled, and every Tier 3 event.
+- If PDLC cannot determine whether an action is in scope for a tier (e.g. ambiguous target path), default to the higher tier and ask the human to clarify before proceeding.
+- The guardrail system does not override human authority — it enforces deliberate decision-making. The human can always authorize a Tier 1 action; the protocol simply requires them to do so explicitly and twice.

package/skills/ship.md ADDED

@@ -0,0 +1,148 @@
+# Ship Protocol
+## When this skill activates
+Activate at the start of the **Ship sub-phase** of Operation, triggered by `/pdlc ship`. This skill governs the complete sequence from merge through deployment, versioning, and tagging. Do not begin the Ship protocol unless Construction (Build + Review + Test) is fully complete and the Constitution test gates have passed.
+---
+## Protocol
+Execute the following steps in strict sequence. Do not skip or reorder steps.
+### Step 1 — Verify Constitution test gates
+1. Read the active episode file at `docs/pdlc/memory/episodes/[episode-id].md`. Find the Test Summary section.
+2. Read `docs/pdlc/memory/CONSTITUTION.md`. Find the "Test gates" section. Identify which layers are required to pass.
+3. For each required layer: confirm its status in the Test Summary is "Pass" or "Accepted" (human-approved).
+4. If any required layer shows "Fail" and was not explicitly accepted by the human: stop. Do not proceed. State: "Constitution test gate not satisfied: [layer name] failed and was not accepted. Resolve before shipping."
+5. If all required gates are satisfied: proceed to Step 2.
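Editor's sketch (not part of the package): the gate check in Step 1 compares each required layer against the statuses in the Test Summary. The function names and the shape of `summary` (a plain object keyed by layer name) are hypothetical. Treating a layer with no recorded status as unsatisfied is also an assumption.

```javascript
// Hypothetical gate check: returns the first required layer that is not
// "Pass" or "Accepted", or null when every required gate is satisfied.
function findUnsatisfiedGate(requiredLayers, summary) {
  for (const layer of requiredLayers) {
    const status = summary[layer];
    if (status !== "Pass" && status !== "Accepted") return layer;
  }
  return null;
}

function gateMessage(layer) {
  return `Constitution test gate not satisfied: ${layer} failed and was not accepted. Resolve before shipping.`;
}

const summary = { Unit: "Pass", Integration: "Fail", E2E: "Accepted" };
const blocked = findUnsatisfiedGate(["Unit", "Integration", "E2E"], summary);
console.log(blocked === null ? "Proceed to Step 2" : gateMessage(blocked));
```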
+### Step 2 — Verify human PR approval
+1. Read `docs/pdlc/memory/STATE.md`. Check for a "PR approved" or equivalent entry.
+2. If no PR approval is recorded: ask the human directly: "Has the PR for [feature-name] been approved? Please confirm before I proceed with the merge."
+3. Wait for explicit confirmation. Do not proceed without it.
+4. Once confirmed: proceed to Step 3.
+### Step 3 — Merge commit
+Use a merge commit. Never squash. Never rebase. Preserving the full branch history is a non-negotiable requirement of the PDLC merge strategy.
+```bash
+git checkout main
+git pull origin main
+git merge --no-ff feature/[feature-name]
+```
+- If merge conflicts arise: stop. Surface the conflicting files to the human. Ask: "Merge conflicts detected in [files]. Please resolve them and let me know when ready."
+- Do not auto-resolve merge conflicts. Wait for human resolution.
+- Once the merge is clean: proceed to Step 4.
+### Step 4 — Push to main
+```bash
+git push origin main
+```
+- If the push is rejected (e.g. remote has diverged): stop. Do not force-push. Report to human: "Push rejected — remote main has diverged. Please advise on resolution before I continue."
+- Force-pushing to main is a Tier 1 guardrail. It must not happen without double-RED confirmation (see `skills/safety-guardrails.md`).
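Editor's sketch (not part of the package): one way to recognise the force-push trigger from a command string. `isForcePushToProtected` is a hypothetical name. The check only covers commands of the form `git push [flags] <remote> <refspec>`. It does not resolve the current branch when no refspec is given, and it does not handle options placed before `push`, so a real hook would need more than this.

```javascript
// Hypothetical detector for "force-push to main or master".
const FORCE_FLAGS = ["--force", "-f", "--force-with-lease"];
const PROTECTED_BRANCHES = ["main", "master"];

function isForcePushToProtected(command) {
  const tokens = command.trim().split(/\s+/);
  if (tokens[0] !== "git" || tokens[1] !== "push") return false;
  const args = tokens.slice(2);
  const forced = args.some(
    (arg) => FORCE_FLAGS.includes(arg) || arg.startsWith("--force-with-lease=")
  );
  const targetsProtected = args.some((arg) => {
    if (arg.startsWith("-")) return false;
    // For a refspec such as HEAD:main the destination is the part after ":".
    const destination = arg.split(":").pop().replace(/^refs\/heads\//, "");
    return PROTECTED_BRANCHES.includes(destination);
  });
  return forced && targetsProtected;
}

console.log(isForcePushToProtected("git push --force origin main"));
console.log(isForcePushToProtected("git push origin main"));
```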
+### Step 5 — Trigger CI/CD
+Pulse coordinates CI/CD. Determine the deployment trigger:
+1. Check for a `.github/workflows/` directory. Look for workflow files with `on: push` to `main` or a `deploy` trigger. If found: CI/CD has been triggered by the push. Note the workflow file name.
+2. Check for a `Makefile` with a `deploy` or `release` target. If found: run `make deploy` (or `make release`) after confirming with the human that this is the correct command.
+3. Check the `scripts` section of `package.json` for `deploy`, `release`, or `publish` entries. If found: run `npm run deploy` (or equivalent) after human confirmation.
+4. If no deployment trigger is found: ask the human: "I couldn't find a configured deployment trigger. How should I trigger the CI/CD pipeline?"
+5. Record in `docs/pdlc/memory/STATE.md`: CI/CD triggered via [method], at [timestamp].
+### Step 6 — Generate CHANGELOG entry and release notes
+Jarvis generates the CHANGELOG entry. Provide Jarvis with:
+- The feature name and episode ID
+- The complete list of Beads tasks completed in this cycle (read from the episode file)
+- The test summary from the episode file
+- Any tech debt or known tradeoffs recorded in the episode file
+Jarvis produces:
+- A CHANGELOG entry in the format defined in `docs/pdlc/memory/CHANGELOG.md` (check existing entries for the format convention)
+- A release notes summary (2–5 sentences, human-readable, suitable for a GitHub release or announcement)
+Append the CHANGELOG entry to `docs/pdlc/memory/CHANGELOG.md`. Do not overwrite existing entries.
+### Step 7 — Determine semantic version bump
+Analyze what was shipped in this feature cycle:
+- **Patch** — bug fixes, minor tweaks, documentation-only changes, internal refactors with no API or behavioral change.
+- **Minor** — new features, new API endpoints, new UI components, backwards-compatible additions to existing interfaces.
+- **Major** — breaking changes (removed APIs, changed API contracts, removed UI features, significant architectural shifts that require consumer updates).
+When determining the bump:
+1. Read the list of completed Beads tasks and their labels.
+2. Read the PRD to understand the scope of what shipped.
+3. Read `docs/pdlc/design/[feature-name]/api-contracts.md` for any contract changes.
+4. If the classification is not obvious, or if any task could be argued as Minor vs Major: ask the human. State your reasoning and proposed bump, then ask for confirmation.
+Once determined: proceed to Step 8.
+### Step 8 — Tag the commit
+Read the current version from `docs/pdlc/memory/CHANGELOG.md`, `package.json`, or the latest git tag (`git describe --tags --abbrev=0`). Apply the version bump from Step 7.
+```bash
+git tag v[X.Y.Z] -m "[feature name]"
+```
+Example:
+```bash
+git tag v1.3.0 -m "User authentication feature"
+```
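Editor's sketch (not part of the package): applying the bump from Step 7 to the current version. `bumpVersion` is a hypothetical helper that handles only plain `X.Y.Z` versions with an optional leading `v`; pre-release and build suffixes are out of scope here.

```javascript
// Hypothetical helper: apply a patch/minor/major bump to an X.Y.Z version.
function bumpVersion(current, bump) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (!match) throw new Error(`Not a plain X.Y.Z version: ${current}`);
  let [major, minor, patch] = match.slice(1).map(Number);
  if (bump === "major") {
    major += 1; minor = 0; patch = 0; // breaking change resets minor and patch
  } else if (bump === "minor") {
    minor += 1; patch = 0;            // new feature resets patch
  } else if (bump === "patch") {
    patch += 1;
  } else {
    throw new Error(`Unknown bump type: ${bump}`);
  }
  return `v${major}.${minor}.${patch}`;
}

console.log(bumpVersion("v1.2.4", "minor")); // v1.3.0
```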
+### Step 9 — Push the tag
+```bash
+git push origin v[X.Y.Z]
+```
+Confirm the tag is visible on the remote. If the project uses GitHub Releases: create a GitHub Release using the release notes from Step 6. Use the tag just pushed as the release target.
+### Step 10 — Update STATE.md
+Update `docs/pdlc/memory/STATE.md`:
+- Phase: Operation/Verify
+- Last shipped: [feature-name] v[X.Y.Z] at [timestamp]
+- CI/CD: triggered via [method]
+- Tag: v[X.Y.Z] pushed
+- Next step: Verify sub-phase
+Report to human: "Ship complete. [feature-name] merged to main, tagged as v[X.Y.Z], CI/CD triggered. Proceeding to Verify."
+---
+## Rules
+- Merge strategy is always merge commit (`--no-ff`). Squash and rebase are forbidden.
+- Never force-push to main. This is a Tier 1 guardrail (double-RED confirmation required if human explicitly requests it).
+- Do not begin the Ship sequence unless Constitution test gates are verified as passed (or explicitly accepted by human).
+- Do not push the merge without explicit human PR approval, either recorded in STATE.md or confirmed directly by the human in this session.
+- Do not auto-resolve merge conflicts. Surface them to the human.
+- Semver bump determination must be explicit. When ambiguous, ask the human — do not guess.
+- The CHANGELOG must be updated before the tag is pushed. Do not tag first and update docs later.
+- All steps must be recorded in STATE.md as they are completed. If the session ends mid-Ship, STATE.md must reflect exactly where the sequence was paused.
+---
+## Output
+- Feature branch merged to main via `git merge --no-ff`.
+- Main pushed to origin.
+- CI/CD pipeline triggered.
+- CHANGELOG entry appended to `docs/pdlc/memory/CHANGELOG.md`.
+- Release notes generated.
+- Commit tagged as `v[X.Y.Z]` and tag pushed to origin.
+- `docs/pdlc/memory/STATE.md` updated: phase set to Operation/Verify.
+- Human notified of successful ship and next step (Verify sub-phase).

package/skills/tdd.md ADDED

@@ -0,0 +1,88 @@
+# Test-Driven Development
+## When this skill activates
+Activate during the **Build sub-phase** of Construction whenever a Beads task is claimed and implementation work is about to begin. This skill governs every line of implementation code written during the Build sub-phase. It does not apply to scaffolding (e.g. `npm init`, directory creation) or config-only tasks, but applies to all logic, handlers, routes, components, services, hooks, and utilities.
+If the task is infrastructure-only (e.g. setting up a CI pipeline, configuring environment variables, provisioning infra), pause before writing any code and ask the human for an explicit TDD override before proceeding.
+---
+## Protocol
+### Before the first test
+1. Read the active Beads task in full: title, description, acceptance criteria, epic and story labels.
+2. Locate the PRD for this feature at `docs/pdlc/prds/PRD_[feature-name]_[YYYY-MM-DD].md`.
+3. Find the BDD user story that maps to this task (match via the `story:[id]` label on the task). Read the full Given/When/Then block for that story.
+4. Extract the exact Given, When, and Then language. These phrases become the basis for your test names and assertions. Do not paraphrase — use the exact nouns and verbs from the user story.
+5. Identify which acceptance criteria from the task map to which Then clauses. Each Then clause should correspond to at least one test case.
+### The TDD cycle (Red → Green → Refactor)
+Repeat this cycle for each acceptance criterion:
+**Step 1 — Red: Write a failing test.**
+- Name the test using the Given/When/Then language from the PRD user story.
+  - Format: `given [context], when [action], then [expected outcome]`
+  - Example: `given unauthenticated user, when POST /login with valid credentials, then returns 200 with JWT`
+- The test must be specific: it must call the exact function, module, component, or endpoint that will implement this criterion. Do not write a placeholder test against a stub you intend to replace later.
+- Run the test. Confirm it fails with a meaningful failure (not a syntax error or missing import). If it fails for the wrong reason, fix the test setup before proceeding.
+- Do not write implementation code at this step. If the module does not exist yet, create an empty export or stub just enough to make the test fail for the right reason (i.e. the logic is not implemented, not that the file is missing).
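Editor's sketch (not part of the package): building a test name from the story's Given/When/Then phrases in the format given in Step 1. `testName` is a hypothetical helper; rejecting empty clauses is an assumption added so that generic names cannot slip through.

```javascript
// Hypothetical helper: compose a test name from the exact story language.
function testName({ given, when, then }) {
  for (const [label, value] of Object.entries({ given, when, then })) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`Missing "${label}" clause`);
    }
  }
  return `given ${given}, when ${when}, then ${then}`;
}

const name = testName({
  given: "unauthenticated user",
  when: "POST /login with valid credentials",
  then: "returns 200 with JWT",
});
console.log(name);
```

The resulting string can be passed unchanged as the title argument of the project's test runner.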
+**Step 2 — Green: Write the minimal implementation to pass the test.**
+- Write only the code required to make this specific test pass. Do not implement features not yet covered by a test.
+- Run the test. It must pass.
+- Run the full test suite. If any previously passing tests now fail, stop and fix the regression before continuing.
+**Step 3 — Refactor: Clean up without breaking.**
+- Improve naming, extract duplication, simplify logic — but make no functional changes.
+- Run the full test suite again after refactoring. All tests must continue to pass.
+**Repeat** for the next acceptance criterion.
+### Auto-fix loop rule
+If a test fails after implementation:
+- **Attempt 1**: Diagnose and fix. Re-run the test.
+- **Attempt 2**: If still failing, re-read the acceptance criterion and user story. Revise the implementation. Re-run the test.
+- **Attempt 3**: If still failing, revise the approach more broadly — check assumptions, re-read the design docs at `docs/pdlc/design/[feature-name]/`. Re-run the test.
+- **After 3 failed attempts**: Stop. Do not attempt a 4th fix automatically. Present the human with the following diagnostic info:
+  - The test name and full test code
+  - The current implementation being tested
+  - The exact error output from all 3 attempts
+  - Your hypothesis for why the test is failing
+  - Two proposed approaches to resolve it
+  - Ask: "(A) Continue automatically with approach 1, (B) Continue automatically with approach 2, or (C) Take the wheel — I'll guide you."
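Editor's sketch (not part of the package): the three-attempt cap expressed as a bounded loop that collects error output for the escalation report. `autoFixLoop`, `applyFix`, and `runTest` are hypothetical names; the callbacks stand in for whatever actually edits the code and runs the test.

```javascript
// Hypothetical bounded retry loop for the auto-fix rule.
const MAX_ATTEMPTS = 3;

// applyFix(attempt): applies fix number `attempt` (1-based).
// runTest(): returns { passed: boolean, output: string }.
function autoFixLoop(applyFix, runTest) {
  const errors = [];
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    applyFix(attempt);
    const result = runTest();
    if (result.passed) return { status: "passed", attempts: attempt, errors };
    errors.push(result.output); // keep the exact output of each failed attempt
  }
  // Never start a 4th attempt automatically: hand the diagnostics to the human.
  return { status: "escalate", attempts: MAX_ATTEMPTS, errors };
}

let calls = 0;
const outcome = autoFixLoop(
  () => { calls += 1; },
  () => ({ passed: false, output: `failure after fix ${calls}` })
);
console.log(outcome.status, outcome.errors.length);
```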
+### After all acceptance criteria have tests passing
+1. Run the full test suite one final time.
+2. Confirm zero regressions.
+3. Record in `docs/pdlc/memory/STATE.md`: task ID, tests written (count), tests passing (count).
+4. Proceed to the Review sub-phase.
+---
+## Rules
+- **No implementation code without a failing test first.** This is non-negotiable. If you find yourself writing logic before a test, stop and write the test.
+- Infrastructure-only tasks require explicit human override before skipping TDD. State clearly: "This task appears to be infrastructure-only and may not be testable via unit/integration tests. Requesting TDD override to proceed."
+- Test names must use the exact Given/When/Then language from the PRD user story. No generic names like `test1`, `should work`, or `handles error`.
+- Each test must target a specific acceptance criterion. One criterion may have multiple tests; one test must not cover multiple criteria.
+- The auto-fix loop cap is 3 attempts. Never attempt a 4th fix without human input.
+- Refactoring is only permitted after the Green step. Never refactor during Red or during a failing auto-fix loop.
+- Do not skip running the full suite after each Green step. Regressions found late cost more than regressions caught immediately.
+---
+## Output
+- A set of test files co-located with (or in the standard test directory adjacent to) the implementation, covering all acceptance criteria for the active task.
+- All tests passing when the suite is run.
+- `docs/pdlc/memory/STATE.md` updated with task ID, test count, and pass status.
+- A clean working tree on the feature branch, ready for the Review sub-phase.

package/skills/test.md ADDED

@@ -0,0 +1,153 @@
+# Test Execution Protocol
+## When this skill activates
+Activate at the start of the **Test sub-phase** of Construction, after the Review sub-phase is complete and the human has approved the review file. This skill governs the full test execution run for a completed Beads task.
+Before starting, read `docs/pdlc/memory/CONSTITUTION.md` — specifically the "Test gates" section. This defines which layers are required to pass before Operation can begin, and which layers (if any) are pre-configured as skipped.
+---
+## Protocol
+Execute the six test layers below in order. Do not skip layers unless the Constitution explicitly marks them as skipped or the human issues an explicit skip instruction mid-run. If a layer is skipped, log it as a Tier 3 guardrail event in `docs/pdlc/memory/STATE.md`.
+---
+### Layer 1 — Unit Tests
+**Purpose**: Confirm all TDD-written tests still pass after any refactoring or review-driven changes.
+1. Run the full unit test suite using the project's configured test runner (check the `scripts` section of `package.json`, the `Makefile`, or `pyproject.toml` for the test command).
+2. Capture: total tests, passed, failed, skipped, coverage percentage per module.
+3. If any unit test fails: this is a regression introduced after the TDD cycle. Stop. Fix the regression before proceeding to Layer 2.
+4. Record results.
+---
+### Layer 2 — Integration Tests
+**Purpose**: Verify that service boundaries, database interactions, and external dependency contracts behave correctly when connected.
+1. Identify integration test files (typically in a `tests/integration/`, `__tests__/integration/`, or `spec/integration/` directory).
+2. Run integration tests. Ensure test databases or test-mode external services are active — do not run against production.
+3. Capture: total tests, passed, failed, error messages for any failures.
+4. If any integration test fails: diagnose. Check whether the failure is in the new code or in an existing contract that was inadvertently broken. Fix and re-run. Apply the 3-attempt auto-fix rule (same as TDD skill): after 3 failed attempts, surface to human with full diagnostics.
+5. Record results.
+---
+### Layer 3 — End-to-End (E2E) Tests
+**Purpose**: Test the complete user journey described in the PRD using a real browser instance, not a simulator or jsdom.
+1. Read the PRD at `docs/pdlc/prds/PRD_[feature-name]_[YYYY-MM-DD].md`. Identify the user journeys covered by this task.
+2. Run E2E tests using a real Chromium instance:
+   - Preferred command: `npx playwright test` (or `yarn playwright test`)
+   - Alternative: `npx cypress run` if the project uses Cypress
+   - Check the `scripts` section of `package.json` for `e2e`, `test:e2e`, or `playwright` entries
+3. E2E tests must exercise the actual UI or API surface against a running local server. Verify the dev server is running before executing.
+4. Capture: test names, pass/fail status, screenshots on failure, video if configured.
+5. If E2E tests fail: diagnose using screenshots and error output. Apply the 3-attempt auto-fix rule. After 3 attempts, surface to human.
+6. Record results.
+---
+### Layer 4 — Performance Tests
+**Purpose**: Ensure the new code does not degrade system performance beyond the budget defined in the Constitution.
+1. Check `docs/pdlc/memory/CONSTITUTION.md` for a "Performance budget" or "Perf budget" section. Note any defined thresholds (e.g. p95 response time < 200ms, throughput > 500 req/s).
+2. Run the project's load/performance benchmark suite. Check for:
+   - `k6` scripts in a `perf/` or `load/` directory
+   - `artillery` config files
+   - `ab` (Apache Bench) scripts in the Makefile
+   - `autocannon` or `wrk` commands in the `scripts` section of `package.json`
+3. If no performance test suite exists and the Constitution defines a perf budget: note this as an Advisory finding and log in the episode test summary.
+4. Compare results against the Constitution's perf budget. Flag any threshold violations as Important findings.
+5. Record results.
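Editor's sketch (not part of the package): comparing a measured p95 against the budget from step 4. The helper names are hypothetical, and the nearest-rank percentile method is an assumption; load tools such as `k6` report their own percentiles, which would normally be used instead.

```javascript
// Hypothetical budget check using the nearest-rank 95th percentile.
function p95(samplesMs) {
  if (samplesMs.length === 0) throw new Error("No samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic avoids floating-point error in the rank.
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

function checkPerfBudget(samplesMs, budgetMs) {
  const measured = p95(samplesMs);
  // The budget reads "p95 < budget", so reaching the budget is a violation.
  return { measured, budgetMs, finding: measured < budgetMs ? null : "Important" };
}

const samples = Array.from({ length: 20 }, (_, i) => (i + 1) * 10); // 10..200 ms
console.log(checkPerfBudget(samples, 200));
```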
+---
+### Layer 5 — Accessibility Tests
+**Purpose**: Ensure the UI changes in this task meet WCAG standards.
+1. Only run this layer if the task includes UI changes (check affected files for components, pages, or views).
+2. Run axe-core or equivalent:
+   - If Playwright is active: use `@axe-core/playwright` within E2E tests, or run `npx @axe-core/cli [url]`
+   - If Cypress is active: use `cypress-axe`
+   - Standalone: `npx @axe-core/cli [url]` against the running dev server
+3. Capture: WCAG violation count, violation severity (critical / serious / moderate / minor), affected elements.
+4. Critical and Serious violations are Important findings. Moderate/Minor are Advisory.
+5. Record results.
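Editor's sketch (not part of the package): mapping axe-core violations to the finding levels in step 4. axe-core reports each violation with an `id` and an `impact` of `minor`, `moderate`, `serious`, or `critical`. `classifyViolations` is a hypothetical name, and treating a missing impact as Advisory is an assumption.

```javascript
// Hypothetical mapping from axe-core impact to PDLC finding level.
function classifyViolations(violations) {
  const findings = { important: [], advisory: [] };
  for (const violation of violations) {
    if (violation.impact === "critical" || violation.impact === "serious") {
      findings.important.push(violation.id);
    } else {
      // moderate, minor, or (by assumption) no impact reported
      findings.advisory.push(violation.id);
    }
  }
  return findings;
}

const findings = classifyViolations([
  { id: "color-contrast", impact: "serious" },
  { id: "image-alt", impact: "critical" },
  { id: "region", impact: "moderate" },
]);
console.log(findings);
```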
+---
+### Layer 6 — Visual Regression Tests
+**Purpose**: Detect unintended visual changes to UI components or pages.
+1. Only run this layer if the task includes UI changes.
+2. Run screenshot diff against the established baseline:
+   - If Playwright is active: use `expect(page).toHaveScreenshot()` with an existing baseline
+   - If Chromatic/Percy is configured: trigger the visual diff run
+   - If no visual regression tooling is configured: note this as an Advisory finding
+3. Capture: number of screenshots compared, number of diffs detected, percentage change per screenshot.
+4. Any diff above the project's configured threshold (or 0.1% if not configured) is flagged as an Important finding and shown to the human with the diff image.
+5. Record results.
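Editor's sketch (not part of the package): flagging screenshot diffs against the threshold in step 4, with the 0.1% fallback when the project configures none. `flagVisualDiffs` and the shape of the `diffs` entries are hypothetical.

```javascript
// Hypothetical threshold check for visual regression results.
const DEFAULT_THRESHOLD_PERCENT = 0.1;

// diffs: [{ name, percentChanged }]; configuredThreshold may be undefined.
function flagVisualDiffs(diffs, configuredThreshold) {
  const threshold =
    typeof configuredThreshold === "number" ? configuredThreshold : DEFAULT_THRESHOLD_PERCENT;
  // Only diffs strictly above the threshold become Important findings.
  return diffs
    .filter((diff) => diff.percentChanged > threshold)
    .map((diff) => ({ ...diff, finding: "Important" }));
}

const flagged = flagVisualDiffs([
  { name: "login-page", percentChanged: 0.05 },
  { name: "dashboard", percentChanged: 1.2 },
]);
console.log(flagged);
```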
+---
+### After all layers complete
+1. Compile the full test summary:
+```
+## Test Summary — [task-id] — [YYYY-MM-DD]
+| Layer               | Status  | Passed | Failed | Skipped | Notes                    |
+|---------------------|---------|--------|--------|---------|--------------------------|
+| Unit                | [Pass]  | X      | 0      | Y       |                          |
+| Integration         | [Pass]  | X      | 0      | Y       |                          |
+| E2E                 | [Pass]  | X      | 0      | Y       |                          |
+| Performance         | [Pass]  | —      | —      | —       | p95: Xms (budget: Yms)   |
+| Accessibility       | [Pass]  | —      | —      | —       | 0 critical violations    |
+| Visual Regression   | [Pass]  | X      | 0      | —       | 0 diffs above threshold  |
+Skipped layers: [list any, with reason]
+Tier 3 guardrail events logged: [list any skips or accepts]
+```
+2. Write this summary into the active episode file at `docs/pdlc/memory/episodes/[episode-id].md` under the "Test Summary" section.
+3. Check `docs/pdlc/memory/CONSTITUTION.md` test gates:
+   - If all required layers pass: update `docs/pdlc/memory/STATE.md` — test gate status: passed.
+   - If a required layer has failures: surface a soft warning to the human. Present the full failure output. Ask: "(A) Fix the failures now, (B) Accept and continue (logged as Tier 3 guardrail event), or (C) Defer — add to tech debt log."
+4. Mark the Beads task as done: `bd done [task-id] --message "All test layers complete"`
+5. Update `docs/pdlc/memory/STATE.md`: task complete, test gate status, any open items.
+---
+## Rules
+- Execute layers in order (1 → 6). Do not run Layer 3 before Layer 2, etc.
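+- A layer may only be skipped if: (a) `CONSTITUTION.md` explicitly marks it as skipped for this project, or (b) the human issues an explicit skip instruction during the run. Any such skip is a Tier 3 guardrail event and must be logged. Layers 5 and 6 are not run when the task has no UI changes, as stated in those layers; record that reason in the episode file.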
+- E2E tests must use a real Chromium instance. jsdom, happy-dom, or similar virtual DOM environments do not satisfy the E2E requirement.
+- Do not run any test layer against a production database or production environment. Use test/staging environments only.
+- The 3-attempt auto-fix rule applies per failing test per layer. After 3 attempts, escalate to human.
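+- Constitution test gates are soft during the Test sub-phase: failures surface to the human as warnings, not hard blocks, and the human decides to fix, accept, or defer. A required layer that failed and was not accepted still blocks the Ship protocol (see Step 1 of `skills/ship.md`).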
+- All layer results must be written into the episode file — even for skipped layers. Record the reason for skipping.
+---
+## Output
+- All six test layers executed (or explicitly skipped with logged rationale).
+- Full test summary written into the episode file at `docs/pdlc/memory/episodes/[episode-id].md`.
+- `docs/pdlc/memory/STATE.md` updated with test gate status (passed / failed / accepted with conditions).
+- Beads task marked done via `bd done`.
+- Any Tier 3 guardrail events (skipped layers, accepted failures) logged in `docs/pdlc/memory/STATE.md`.
+- Task ready for: either the next task in `bd ready` queue, or (if queue is empty) episode file drafting and Construction completion.