npm - @fredcallagan/arn-spark - Versions diffs - 5.1.0 - Mend

@fredcallagan/arn-spark 5.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (130)

package/plugins/arn-spark/skills/arn-spark-visual-readiness/SKILL.md ADDED

@@ -0,0 +1,293 @@
+---
+name: arn-spark-visual-readiness
+description: >-
+  This skill should be used when the user says "visual readiness",
+  "check visual layers", "activate visual layer", "visual checkpoint",
+  "promote visual testing", "enable layer 2", "visual test health",
+  "check deferred layers", "activate deferred layers", "layer promotion",
+  or wants to validate and activate deferred visual testing layers
+  after project milestones.
+version: 1.0.0
+---
+# Arness Visual Readiness
+Validate and activate deferred visual testing layers after project milestones. When `arn-spark-visual-strategy` sets up a multi-layer testing strategy, Layer 1 (typically browser-based capture) is validated and activated immediately. Additional layers (e.g., native window capture) are marked as **deferred** because the project may not yet have the build pipeline, platform access, or tooling required to validate them. This skill is the checkpoint that evaluates whether deferred layers are now ready, validates them with a spike, and promotes them to active.
+This is a conversational skill that runs in normal conversation (NOT plan mode). It uses the `arn-spark-visual-test-engineer` agent for layer validation spikes.
+The primary artifacts are:
+- **Updated arness.md** -- deferred layers promoted to active with validation evidence
+- **Updated strategy document** -- validation results appended per layer
+- **Readiness report** -- layer status table with evidence and recommendations
+**The core problem this solves:** deferred visual testing layers sit dormant after `arn-spark-visual-strategy` because nothing re-evaluates whether the project has reached the point where those layers can be activated. This skill closes that gap by checking activation criteria, running validation spikes, and promoting layers that pass.
+## Prerequisites
+Read the project's `arness.md` for a `## Arness` section. If no `## Arness` section exists or Arness Spark fields are missing, inform the user: "Arness Spark is not configured for this project yet. Run `/arn-brainstorming` to get started — it will set everything up automatically." Do not proceed without it.
+Extract:
+- **Plans directory**
+- **Vision directory** (default: `.arness/vision`)
+- **Spikes directory** (default: `.arness/spikes`) -- for validation spike workspaces
+- **Git** / **Platform**
+Check for `### Visual Testing` subsection:
+1. If found: parse all fields (see Step 1 for details)
+2. If NOT found: "No visual testing configuration found in arness.md. Run `/arn-spark-visual-strategy` first to set up your visual testing strategy." Exit.
+## Workflow
+### Step 1: Load Visual Testing Config
+Read arness.md `### Visual Testing` section.
+**Parse top-level fields as Layer 1 config (always active):**
+- **Strategy doc:** path to the visual strategy document
+- **Baseline directory:** path to baseline images
+- **Capture script:** path to the capture script
+- **Compare script:** path to the comparison script
+- **Layers:** comma-separated list of layer names
+- **Diff threshold:** pixel difference tolerance percentage
+- **Integration:** manual / npm-script / ci / arness-pipeline
+**Scan for `#### Layer N:` subsections.** For each subsection, extract per-layer fields:
+- **Status:** active / deferred
+- **Capture script:** path to the layer's capture script
+- **Compare script:** path to the layer's comparison script
+- **Baseline directory:** path to the layer's baselines
+- **Diff threshold:** layer-specific threshold (or inherit top-level)
+- **Requires dev server:** yes / no
+- **Activation criteria:** free-text description of what must be true to activate
+- **Environment:** target platform/OS for this layer
+- **Spike result:** previous spike outcome (Validated / Partially validated / Failed / Deferred)
+Build a layer list with all extracted data. Layer 1 is always the top-level config (implicit, always active). Additional layers come from `#### Layer N:` subsections.
+**If no `### Visual Testing` found:** suggest `/arn-spark-visual-strategy` and exit.
+**If no deferred layers:** "All visual testing layers are active. No deferred layers to promote." Present a summary table of active layers and suggest `/arn-code-review-implementation` for a full multi-layer quality check. Exit.
+### Step 2: Validate Active Layers
+For each active layer, verify the existing pipeline still works:
+1. Run the layer's capture script against the development build (or against the prototype if the development build is unavailable)
+2. Run the layer's compare script against the baselines
+3. Check baseline counts against the screen manifest (if `baseline-manifest.json` exists)
+4. Report newly capturable screens that lack baselines (screens added since last baseline update)
+Present results per active layer:
+"**Layer [N] ([Name]) -- Active:**
+- Capture: [PASS / FAIL] -- [N] screens captured
+- Compare: [PASS / FAIL] -- [N] screens compared, [M] within threshold
+- Baselines: [N] baselines / [M] screens in manifest ([coverage]%)
+- New screens without baselines: [list or 'none']"
+If an active layer's pipeline fails: report it as a **WARNING** but continue. Do not block deferred layer evaluation because an active layer has a transient issue.
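
The baseline-coverage figures in the Step 2 report can be computed with a short script. This is a sketch under two hypothetical assumptions: `baseline-manifest.json` has the shape `{"screens": ["login", ...]}`, and each baseline is stored as `<screen>.png` in the layer's baseline directory. Adjust both to the real manifest format.

```python
import json
from pathlib import Path


def baseline_coverage(manifest_path: Path, baseline_dir: Path) -> dict:
    """Compare manifest screens with PNG baselines on disk.

    ASSUMED (hypothetical) layout: manifest is {"screens": [...]} and
    each baseline file is named <screen>.png.
    """
    screens = json.loads(manifest_path.read_text())["screens"]
    on_disk = {p.stem for p in baseline_dir.glob("*.png")}
    missing = [s for s in screens if s not in on_disk]  # new screens without baselines
    baselined = len(screens) - len(missing)
    pct = round(100 * baselined / len(screens), 1) if screens else 0.0
    return {"baselined": baselined, "screens": len(screens),
            "coverage_pct": pct, "missing": missing}
```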
+### Step 3: Check Activation Criteria
+Read the readiness checklist:
+> Read `${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-readiness/references/readiness-checklist.md`
+For each deferred layer:
+1. Read the layer's `**Activation criteria:**` field from arness.md
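+2. Match the criteria text against the common patterns in the readiness checklist (Build Success, Platform Access, Tool Availability, CI Configuration, Dev Server Availability, UIA Availability, Journey Runner, Accessibility Permissions); evaluate criteria that match none of them with the checklist's Custom Criteria procedure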
+3. Execute the concrete checks for each matching pattern:
+   - Run commands to verify tool availability (`which [tool]`, `[tool] --version`)
+   - Check for build artifacts at expected paths
+   - Test file transfer mechanisms for cross-environment layers
+   - Verify CI workflow configuration if relevant
+4. Collect evidence for each check
+5. **Journey upgrade check** — If the layer has `**Interaction:** static` or no `**Interaction:**` field:
+   - Run the UIA Availability readiness pattern (Pattern 6 from readiness-checklist.md)
+   - Run the Journey Runner readiness pattern (Pattern 7 from readiness-checklist.md) — skip if no runner exists yet
+   - On macOS: run the Accessibility Permissions pattern (Pattern 8 from readiness-checklist.md)
+   - If UIA Availability passes (automation framework available, accessibility tree inspectable):
+     Inform the user: "Layer [N] ([Name]) is currently configured for static screenshot capture. The platform's UI automation framework is available, which means journey-based interaction testing is possible. Journey mode walks through the app like a user (clicking buttons, filling forms, navigating screens) and captures screenshots at each step."
+     Ask the user:
+     **"Would you like to upgrade to journey interaction mode?"**
+     Options:
+     1. **Yes** — Upgrade to journey-based interaction testing
+     2. **No** — Keep static screenshot capture
+   - Record the user's choice. If yes, mark the layer for journey upgrade in Step 5.
+   - If UIA Availability fails, do not suggest the upgrade — static mode remains appropriate.
+Present status per layer:
+"**Layer [N] ([Name]):** Criteria '[activation criteria text]'
+- [Check 1]: [PASS / FAIL] -- [evidence]
+- [Check 2]: [PASS / FAIL] -- [evidence]
+- **Overall: [MET / NOT MET]**"
+If ambiguous (e.g., a tool is installed but version is uncertain, or platform access is partial): ask the user for explicit confirmation rather than assuming.
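
The per-layer status block can be produced mechanically once each check has a pass/fail flag and its evidence. A minimal sketch; the `Check` dataclass and `render_status` are hypothetical names, and treating an empty check list as NOT MET is an assumption (nothing was verified).

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    passed: bool
    evidence: str  # concrete, reproducible output, per readiness-checklist.md


def render_status(num: int, name: str, criteria: str, checks: list[Check]) -> str:
    """Render the Step 3 status block; Overall is MET only if every check passed."""
    lines = [f"**Layer {num} ({name}):** Criteria '{criteria}'"]
    for c in checks:
        lines.append(f"- {c.name}: {'PASS' if c.passed else 'FAIL'} -- {c.evidence}")
    overall = "MET" if checks and all(c.passed for c in checks) else "NOT MET"
    lines.append(f"- **Overall: {overall}**")
    return "\n".join(lines)
```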
+### Step 4: Validate Deferred Layers
+For each deferred layer whose activation criteria are met:
+**IMPORTANT: Run validation spikes sequentially, one at a time.** Do NOT launch multiple `arn-spark-visual-test-engineer` agents in parallel or in the background. The agent needs Bash and Write tool access, which requires user permission approval. Parallel or background agents cannot surface permission prompts to the user, causing all tool calls to be denied.
+For each qualifying layer:
+1. Read the spike checklist:
+   > Read `${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-strategy/references/spike-checklist.md`
+2. Determine the spike workspace: `[spikes-dir]/visual-readiness-spike-layer-[N]/`
+3. Invoke the `arn-spark-visual-test-engineer` agent via the Task tool (foreground, not background), passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
+   - Layer specification (name, capture approach, environment, scripts)
+   - Stack details from the strategy document
+   - Environment constraints
+   - Dev server URL or build path
+   - Spike workspace path
+   - Baseline screenshots for comparison (if available)
+   - Spike checklist criteria for the specific layer
+   - Existing scripts from the deferred layer config (if any were pre-created during `arn-spark-visual-strategy`)
+4. Wait for the agent to complete fully before proceeding to the next layer.
+5. Present results using the same classification as `arn-spark-visual-strategy`:
+   - **Validated:** "Layer [N] works. [Evidence: captured screenshots match baselines within threshold.]"
+   - **Partially validated:** "Layer [N] works with caveats. [Evidence + caveats, e.g., anti-aliasing noise above expected threshold.]"
+   - **Failed:** "Layer [N] does not work in this environment. [Evidence + reason.] Should I investigate an alternative approach?"
+   - **Deferred:** "Layer [N] still cannot be tested here. [Required environment + instructions.] Leaving as deferred."
+6. Proceed to the next layer only after presenting results.
+### Step 5: Promote Layers
+For each deferred layer that was partially validated, ask the user before promoting:
+> **Layer [N] ([Name]) validated with caveats: [caveats]. Promote to active?**
+> 1. **Yes** — Promote to active with caveats noted
+> 2. **No** — Leave as deferred
+For each deferred layer that was validated (Validated or user-approved Partially validated):
+1. Update arness.md `#### Layer N:` subsection:
+   - Change `**Status:** deferred` to `**Status:** active`
+   - Add `**Validated:** [YYYY-MM-DD]` with today's date
+   - Update `**Spike result:**` with the validation evidence summary
+2. For layers that failed validation: leave as `**Status:** deferred`, report the reason.
+3. **Journey upgrade** — If the user accepted the journey upgrade suggestion in Step 3:
+   a. Update the layer's `**Interaction:**` field from `static` to `journey` in arness.md
+   b. Add `**Journey manifest:**` field with path `<baselines-dir>/layer-2/journey-manifest.json`
+   c. Add `**Journey runner:**` field with path `scripts/journey-runner.<ext>` (`.ps1` for Windows, `.swift` or `.applescript` for macOS)
+   d. Invoke the `arn-spark-visual-test-engineer` agent via the Task tool, passing the model from `.arness/agent-models/spark.md` as the `model` parameter (see `plugins/arn-spark/skills/arn-spark-ensure-config/references/ensure-config.md` "Dispatch convention" for fallback). Context:
+      - Journey schema reference: `${CLAUDE_PLUGIN_ROOT}/skills/arn-spark-visual-strategy/references/journey-schema.md`
+      - Journey manifest output path: the path from step (b)
+      - Target platform: detected from the layer's `**Environment:**` field
+      - Accessibility tree hints: any automation IDs discovered during the UIA Availability check
+   e. Wait for the agent to generate the journey manifest and runner script
+   f. Run the Journey Runner readiness pattern to validate the generated artifacts
+   g. Update `**Spike result:**` to include journey validation evidence
+4. For layers that remained deferred (criteria not met): leave unchanged, report which criteria were not met.
+Present the changes:
+"**arness.md updated:**
+- Layer [N] ([Name]): deferred -> **active** (validated [date])
+- Layer [M] ([Name]): remains **deferred** (reason: [criteria not met / validation failed])"
+### Step 6: Update Strategy Document
+Read the strategy document path from the `**Strategy doc:**` field in `### Visual Testing`.
+1. Read the strategy document
+2. For each layer that was validated in this session:
+   - Find the `### Layer N:` section in the strategy document
+   - Update the spike result with new validation evidence
+   - Add a `#### Readiness Check ([date])` subsection documenting:
+     - Activation criteria evaluation results
+     - Spike validation results
+     - Status change (deferred -> active, or still deferred with reason)
+3. Write the updated strategy document
+If the strategy document is not found at the configured path: warn and skip this step. Proceed with remaining steps.
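
Appending the `#### Readiness Check ([date])` subsection amounts to finding where the `### Layer N:` section ends (the next `#`, `##`, or `###` heading, or end of file) and inserting there. A sketch; `add_readiness_check` is a hypothetical helper that returns the document unchanged when the layer section is absent, so the caller can warn and skip.

```python
import re


def add_readiness_check(strategy_md: str, layer_num: int, check_date: str, body: str) -> str:
    """Insert a Readiness Check subsection at the end of `### Layer N:`."""
    lines = strategy_md.splitlines()
    start = next((i for i, l in enumerate(lines)
                  if re.match(rf"^###\s+Layer\s+{layer_num}:", l)), None)
    if start is None:
        return strategy_md  # section not found: leave the document untouched
    # The section ends at the next heading of level 1-3 (#### does not match).
    end = next((i for i in range(start + 1, len(lines))
                if re.match(r"^#{1,3}\s", lines[i])), len(lines))
    block = ["", f"#### Readiness Check ({check_date})", body.rstrip()]
    return "\n".join(lines[:end] + block + lines[end:]) + "\n"
```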
+### Step 7: Update .gitignore
+Check if Git is configured (from `## Arness` config or by checking for `.git/`). If not configured, skip this step silently.
+If Git is configured and newly activated layers produce output directories:
+1. Inventory directories referenced by the newly activated layers (captures, diffs, spike workspaces)
+2. Classify each path as **ephemeral** (regenerated on every run, machine-specific) or **shared** (baselines, scripts, manifests)
+3. Read the project's `.gitignore` and check which paths are already covered
+4. Present the classification to the user:
+"The newly activated layer(s) reference these paths:
+| Path | Type | Recommendation | Currently in .gitignore |
+|------|------|----------------|------------------------|
+| [path] | [ephemeral / shared] | [ignore / track] | [yes / no] |
+| ... | ... | ... | ... |"
+Ask the user:
+> **Proceed with these .gitignore recommendations?**
+> 1. **Yes** -- Apply the recommendations
+> 2. **Adjust** -- Let me specify which paths to change
+5. Wait for user confirmation or adjustments. If **Adjust**, collect changes as free-form text.
+6. Add confirmed paths to `.gitignore` under a `# Visual testing -- Layer [N]` comment block
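
Item 6 can be made idempotent so re-running the skill does not duplicate entries. A sketch with a simplifying assumption: a path counts as "already covered" only if the identical line is present; glob patterns that would also match it are not evaluated. Pass only the paths the user confirmed as **ignore**; the paths in the usage test are hypothetical.

```python
from pathlib import Path


def update_gitignore(gitignore: Path, layer_num: int, paths: list[str]) -> list[str]:
    """Append not-yet-listed paths under `# Visual testing -- Layer N`.

    Only exact line matches are treated as already covered.
    Returns the paths actually added (empty list if none).
    """
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    present = {line.strip() for line in existing}
    new = [p for p in paths if p not in present]
    if not new:
        return []
    block = ([""] if existing else []) + [f"# Visual testing -- Layer {layer_num}", *new]
    gitignore.write_text("\n".join(existing + block) + "\n")
    return new
```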
+### Step 8: Summary
+Present the readiness report:
+"**Visual Readiness Report**
+**Layer Status:**
+| Layer | Name | Previous Status | Current Status | Evidence |
+|-------|------|----------------|----------------|----------|
+| 1 | [Name] | active | active | [capture/compare health] |
+| 2 | [Name] | deferred | [active/deferred] | [validation result or criteria status] |
+| ... | ... | ... | ... | ... |
+**Screen Coverage:**
+- Screen manifest: [N] screens
+- Layer 1: [N] capturable / [M] baselined
+- Layer 2: [N] capturable / [M] baselined (if newly active)
+**Recommendations:**
+- [Action items based on results, e.g., 'Update Layer 1 baselines for 3 new screens', 'Re-evaluate Layer 2 after Windows CI runner is configured']
+Run `/arn-code-review-implementation` to execute a full multi-layer quality check with all active layers."
+## Agent Invocation Guide
+| Situation | Action |
+|-----------|--------|
+| Validate a deferred layer (Step 4) | Invoke `arn-spark-visual-test-engineer` sequentially (foreground, not background) with layer spec, environment, workspace, spike checklist. Wait for completion before the next layer. |
+| Agent permission denied | Re-run `arn-spark-visual-test-engineer` in foreground. If still denied, execute validation directly in conversation (write POC files and run capture commands yourself). |
+| User asks about initial visual setup | Defer: "Initial visual testing setup is handled by `/arn-spark-visual-strategy`." |
+| User asks about quality gate | Defer: "The multi-layer quality gate runs during `/arn-code-review-implementation`." |
+| User asks about specific layer tooling | Discuss and invoke `arn-spark-tech-evaluator` if a deep comparison is needed. |
+| Cross-environment validation deferred | Record the deferral with instructions. Leave layer as deferred with updated evidence. |
+## Error Handling
+- **No `### Visual Testing` in arness.md** -- suggest running `/arn-spark-visual-strategy` first to set up visual testing. Exit without further action.
+- **No deferred layers** -- report all layers active, present a summary table, suggest `/arn-code-review-implementation` for a full multi-layer quality check. Exit.
+- **Spike validation fails** -- leave the layer as deferred, record the failure reason and evidence. Suggest manual investigation or alternative approaches.
+- **Agent permission denied** -- re-run `arn-spark-visual-test-engineer` in foreground. If still denied, execute validation directly in conversation (write POC files and run capture commands).
+- **Criteria ambiguous** -- ask the user for explicit confirmation rather than assuming. Present the evidence collected and let the user decide.
+- **Strategy document not found at configured path** -- warn and skip Step 6. Proceed with remaining steps (arness.md update, gitignore, summary).
+- **Build command fails during validation** -- record the failure evidence (exit code, error output). Leave the layer as deferred. Report the build error and suggest investigating the build pipeline.

package/plugins/arn-spark/skills/arn-spark-visual-readiness/references/readiness-checklist.md ADDED

@@ -0,0 +1,196 @@
+# Visual Readiness Checklist
+Evaluation guide used by `arn-spark-visual-readiness` during Step 3 (Check Activation Criteria). The skill reads this checklist and matches each deferred layer's activation criteria text against the common patterns below, then executes the concrete checks to collect evidence for promotion decisions.
+## Common Activation Criteria Patterns
+### Build Success
+Use when the activation criteria mention building, compiling, packaging, or producing an executable.
+- [ ] Build command completes without errors (exit code 0)
+- [ ] Build output exists at the expected path
+- [ ] Build output is executable or launchable
+- [ ] Build output version matches the current project version (if applicable)
+**How to check:**
+1. Run the project's build command from the project root (e.g., `npm run build`, `cargo build --release`, `npm run tauri build`)
+2. Verify exit code is 0
+3. Check for build output at the expected path (e.g., `ls -la dist/`, `ls -la target/release/`)
+4. Attempt to launch the built artifact briefly to confirm it starts (if safe to do so)
+**Evidence to collect:** Build command output (last 20 lines), build output file path and size, launch confirmation (if tested).
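
A sketch of collecting Build Success evidence from Python. The build command and artifact path are supplied by the caller (e.g. `["npm", "run", "build"]` and `"dist"`); `check_build` is a hypothetical helper. Note that stdout and stderr are concatenated rather than interleaved, so the "last 20 lines" may not reflect exact chronological order.

```python
import subprocess
from pathlib import Path


def check_build(cmd: list[str], output_path: str, cwd: str = ".") -> dict:
    """Run the build; pass only if it exits 0 and the artifact exists."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    tail = (proc.stdout + proc.stderr).splitlines()[-20:]  # last 20 lines of output
    artifact = Path(cwd) / output_path
    return {
        "passed": proc.returncode == 0 and artifact.exists(),
        "exit_code": proc.returncode,
        "output_tail": tail,
        "artifact": str(artifact),
        # Size is reported for files only; directories (e.g. dist/) give None.
        "artifact_size": artifact.stat().st_size if artifact.is_file() else None,
    }
```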
+### Platform Access
+Use when the activation criteria mention a specific OS, cross-platform file access, WSL2-to-Windows, or running on a target machine.
+- [ ] Target OS is accessible from the development environment
+- [ ] File transfer mechanism works (e.g., WSL2 <-> Windows via `/mnt/c/`, SSH/SCP to remote machine)
+- [ ] Required tools are installed on the target platform
+- [ ] Network connectivity exists between environments (if needed for dev server access)
+**How to check:**
+1. Test file write to the cross-platform path (e.g., `touch /mnt/c/temp/visual-test-probe && rm /mnt/c/temp/visual-test-probe`)
+2. Verify target platform tool availability (e.g., run `powershell.exe -Command "Get-Command nircmd"` from WSL2)
+3. If remote: test SSH connection and file transfer round-trip
+4. If network: test connectivity to dev server from the target environment
+**Evidence to collect:** File transfer test output, tool version output from target platform, connectivity test results.
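
The file-transfer probe in step 1 can also be run as a write/read-back/delete round trip, which proves the data actually survives the cross-environment path (e.g. `/mnt/c/temp` from WSL2). A sketch; the probe file name and payload are arbitrary choices, and a random suffix avoids clashing with a leftover probe.

```python
import uuid
from pathlib import Path


def probe_write_roundtrip(target_dir: str) -> tuple[bool, str]:
    """Write, read back, and delete a probe file; return (passed, evidence)."""
    probe = Path(target_dir) / f"visual-test-probe-{uuid.uuid4().hex}"
    payload = b"arness-visual-readiness-probe"
    try:
        probe.write_bytes(payload)
        ok = probe.read_bytes() == payload
        return ok, f"wrote and read back {len(payload)} bytes at {probe}"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        try:
            probe.unlink(missing_ok=True)  # always clean up the probe
        except OSError:
            pass  # evidence is already captured; cleanup failure is not fatal
```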
+### Tool Availability
+Use when the activation criteria mention specific screenshot tools, image comparison libraries, browsers, or runtimes.
+- [ ] Screenshot capture tool is installed and accessible
+- [ ] Image comparison library is installed
+- [ ] Required browser or runtime is available
+- [ ] Tool version meets minimum requirements (if specified)
+**How to check:**
+1. Run `which [tool]` or `[tool] --version` for each required tool
+2. For Node.js tools: check `npx [tool] --version` or verify in `node_modules/`
+3. For system tools: check the system package manager or standard install paths
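+4. For browsers: verify Playwright browsers are installed (`npx playwright install --dry-run` prints each browser's install location without downloading anything; confirm those directories exist)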
+**Evidence to collect:** Version output from each tool, install path, any version warnings.
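
The `which [tool]` / `[tool] --version` pair from step 1 maps directly onto Python's standard library. A sketch; `check_tool` is a hypothetical name, and it assumes the tool accepts `--version` (pass other arguments for tools that do not). It keeps only the first line of output as the version string.

```python
import shutil
import subprocess


def check_tool(tool: str, version_args: tuple[str, ...] = ("--version",)) -> dict:
    """Return install path and first line of version output as evidence."""
    path = shutil.which(tool)  # equivalent of `which [tool]`
    if path is None:
        return {"tool": tool, "passed": False, "evidence": f"{tool} not found on PATH"}
    proc = subprocess.run([path, *version_args], capture_output=True, text=True, timeout=30)
    first = (proc.stdout or proc.stderr).strip().splitlines()[:1]
    version = first[0] if first else "(no version output)"
    return {"tool": tool, "passed": proc.returncode == 0, "path": path, "evidence": version}
```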
+### CI Configuration
+Use when the activation criteria mention CI pipelines, GitHub Actions runners, or automated visual test execution.
+- [ ] CI workflow file exists at the expected path (e.g., `.github/workflows/visual-tests.yml`)
+- [ ] OS matrix includes the required platform for this layer
+- [ ] Visual test step is defined in the workflow
+- [ ] Required secrets or environment variables are configured (if applicable)
+**How to check:**
+1. Read the CI workflow file and verify it contains the visual test job
+2. Check the `runs-on` field for the required OS (e.g., `windows-latest` for native Windows capture)
+3. Verify the visual test step references the correct scripts
+4. Check for required environment variables in the workflow
+**Evidence to collect:** Workflow file path, relevant job/step configuration, OS matrix values.
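
A rough, dependency-free version of these checks scans the workflow text instead of parsing YAML. This is a deliberate simplification: it collects `runs-on:` values and inline `os: [..]` matrix lists only, so block-style YAML lists or reusable workflows will be missed. The script path in the usage test is hypothetical.

```python
import re
from pathlib import Path


def check_ci_workflow(workflow: Path, required_os: str, script_ref: str) -> dict:
    """Text-scan check: required OS appears and some step references the script."""
    if not workflow.is_file():
        return {"passed": False, "evidence": f"{workflow} not found"}
    text = workflow.read_text()
    runners = re.findall(r"^\s*runs-on:\s*(.+)$", text, flags=re.M)
    matrices = re.findall(r"^\s*os:\s*\[(.+)\]", text, flags=re.M)  # inline lists only
    os_values = [v.strip().strip("'\"") for v in runners]
    os_values += [v.strip().strip("'\"") for m in matrices for v in m.split(",")]
    references_script = script_ref in text
    return {"passed": required_os in os_values and references_script,
            "os_values": os_values, "references_script": references_script}
```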
+### Dev Server Availability
+Use when the activation criteria mention a running development server, hot-reload, or local URL access from the capture environment.
+- [ ] Dev server starts successfully
+- [ ] Dev server is accessible from the capture environment
+- [ ] Dev server serves the expected content (not an error page)
+- [ ] Dev server port is not conflicting with other services
+**How to check:**
+1. Check for port conflicts before starting the server
+2. Start the dev server (e.g., `npm run dev`)
+3. Verify it responds at the expected URL (e.g., `curl -s -o /dev/null -w "%{http_code}" http://localhost:5173`)
+4. If cross-environment: verify the URL is accessible from the target platform
+**Evidence to collect:** Dev server start output, HTTP response code, content verification.
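
The port-conflict check and the HTTP probe can both be done with the standard library. A sketch; unlike the `curl` command above, `urllib` follows redirects, so a 3xx response is reported as the final status. Helper names are hypothetical.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """HTTP status code of url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx: server is up but served an error
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout
```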
+### UIA Availability
+Use when the activation criteria mention UI automation, accessibility APIs, UIA, NSAccessibility, interaction testing, or native element inspection.
+- [ ] Platform automation framework loads (Windows: `Add-Type -AssemblyName UIAutomationClient` succeeds in PowerShell; macOS: `osascript -e 'tell application "System Events" to get name of first process'` succeeds)
+- [ ] Accessibility tree is inspectable -- can enumerate top-level UI elements of a running application
+- [ ] Target application exposes automation IDs on key interactive elements (buttons, inputs, menus, navigation)
+**How to check:**
+1. Windows: run `powershell.exe -Command "Add-Type -AssemblyName UIAutomationClient; [System.Windows.Automation.AutomationElement]::RootElement.GetRuntimeId()"` and verify it succeeds
+2. macOS: run `osascript -e 'tell application "System Events" to get name of first process'` and verify it returns a process name
+3. Launch the target application and inspect its accessibility tree (Windows: `Inspect.exe` or `FlaUI`; macOS: Accessibility Inspector in Xcode)
+4. Verify that key interactive elements have non-empty automation IDs or accessibility identifiers
+**Evidence to collect:** Automation framework load output, list of discovered automation IDs for the target application, accessibility tree depth (number of levels inspected).
+### Journey Runner
+Use when the activation criteria mention journey execution, journey runner, journey manifest, interaction test runner, or step-based capture.
+- [ ] Runner script exists at the path specified in the `**Journey runner:**` arness.md field
+- [ ] Runner script is executable (`test -x <path>` or file extension matches expected type: `.ps1` for Windows, `.swift` or `.applescript` for macOS)
+- [ ] `journey-manifest.json` exists at the path specified in the `**Journey manifest:**` arness.md field
+- [ ] Manifest is valid JSON (`python3 -m json.tool <path>` or equivalent)
+- [ ] Manifest contains at least one journey in the `journeys` array
+- [ ] Runner can parse the manifest in dry-run mode (runner loads manifest, resolves selectors, does not execute actions)
+**How to check:**
+1. Check runner script existence: `test -f <journey-runner-path> && echo "EXISTS" || echo "MISSING"`
+2. Check runner script is executable: `test -x <journey-runner-path> && echo "EXECUTABLE" || echo "NOT EXECUTABLE"`
+3. Validate manifest JSON: `python3 -m json.tool <journey-manifest-path> > /dev/null 2>&1 && echo "VALID" || echo "INVALID"`
+4. Count journeys in manifest: `python3 -c "import json; m=json.load(open('<path>')); print(len(m.get('journeys', [])))"`
+5. Run the runner in dry-run mode: `<runner-command> --dry-run <manifest-path>` and check for errors
+6. Verify that the dry-run output reports resolved selectors for all journey steps
+**Evidence to collect:** File existence checks, JSON validation output, journey count, dry-run output showing resolved vs unresolved selectors.
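
The manifest checks (exists, valid JSON, at least one journey) can be folded into a single function that returns evidence in the MISSING / INVALID / VALID vocabulary used above. A sketch; it only relies on the top-level `journeys` array named in this checklist and does not validate individual journey steps against the journey schema.

```python
import json
from pathlib import Path


def check_journey_manifest(path: Path) -> dict:
    """Pass if the manifest exists, parses as JSON, and has >= 1 journey."""
    if not path.is_file():
        return {"passed": False, "evidence": f"MISSING: {path}"}
    try:
        manifest = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return {"passed": False, "evidence": f"INVALID JSON: {exc}"}
    journeys = manifest.get("journeys", []) if isinstance(manifest, dict) else []
    count = len(journeys) if isinstance(journeys, list) else 0
    return {"passed": count > 0, "evidence": f"VALID JSON, {count} journey(s)"}
```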
+### Accessibility Permissions (macOS)
+Use when the activation criteria mention macOS Accessibility permissions, System Events authorization, or terminal accessibility access. This pattern applies only on macOS.
+- [ ] Terminal/IDE has been granted Accessibility permission in System Settings (System Preferences on macOS 12 and earlier)
+- [ ] `osascript` accessibility queries succeed without authorization errors
+**How to check:**
+1. Run: `osascript -e 'tell application "System Events" to get properties of first UI element of first process'`
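+2. If the command fails with an authorization error, permissions are not granted. An Accessibility denial typically reads "osascript is not allowed assistive access" (error -1719); "Not authorized to send Apple events to System Events" (error -1743) instead means the separate Automation permission is missing for the terminal app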
+3. If the command succeeds and returns UI element properties, permissions are granted
+**Evidence to collect:** osascript output (success) or error message (failure).
+**User instructions for granting permission:** Open System Settings > Privacy & Security > Accessibility (on macOS 12 and earlier: System Preferences > Security & Privacy > Privacy > Accessibility), add your terminal application (Terminal, iTerm2, VS Code, etc.), and enable its checkbox. You may need to restart the terminal after granting permission.
+## Evidence Collection Guidelines
+Evidence must be concrete and verifiable. Each check should produce one or more of:
+- **Command output:** The stdout/stderr from running a verification command (truncate to relevant lines)
+- **File existence:** Path and size of an expected artifact (`ls -la [path]`)
+- **Version string:** Output from `[tool] --version` confirming availability
+- **Test result:** Pass/fail from a probe command (e.g., file write round-trip, HTTP response code)
+Avoid subjective evidence like "it seems to work" or "probably available." Each piece of evidence should be reproducible by running the same command again.
+## Promotion Decision Tree
+Follow this decision tree for each deferred layer:
+```
+1. Are ALL activation criteria checklist items passing?
+   |
+   +-- NO --> Leave layer as DEFERRED
+   |          Report which criteria failed with evidence
+   |          Suggest remediation steps
+   |
+   +-- YES --> Proceed to spike validation (Step 4)
+               |
+               2. Does the spike validation pass?
+                  |
+                  +-- VALIDATED --> Promote to ACTIVE
+                  |                 Update arness.md Status: active
+                  |                 Record validation date and evidence
+                  |
+                  +-- PARTIALLY VALIDATED --> Present caveats to user
+                  |                           Ask: "Promote with caveats, or leave deferred?"
+                  |                           If user approves: Promote to ACTIVE with caveats noted
+                  |                           If user declines: Leave as DEFERRED
+                  |
+                  +-- FAILED --> Leave as DEFERRED
+                  |              Record failure evidence
+                  |              Suggest investigation or alternative approach
+                  |
+                  +-- DEFERRED --> Leave as DEFERRED
+                                   Cannot validate in current environment
+                                   Record what is needed to validate
+```
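
The decision tree reduces to a small pure function, which makes the promotion rule easy to test. A sketch; the enum values mirror the four spike classifications, and `decide` is a hypothetical name. A missing spike result (criteria met but no spike run yet) is treated as deferred.

```python
from enum import Enum
from typing import Optional


class SpikeResult(Enum):
    VALIDATED = "Validated"
    PARTIALLY_VALIDATED = "Partially validated"
    FAILED = "Failed"
    DEFERRED = "Deferred"


def decide(criteria_met: bool, spike: Optional[SpikeResult] = None,
           user_approves_caveats: bool = False) -> str:
    """Return the layer's new status, "active" or "deferred"."""
    if not criteria_met:
        return "deferred"  # no spike is run; report the failing criteria
    if spike is SpikeResult.VALIDATED:
        return "active"
    if spike is SpikeResult.PARTIALLY_VALIDATED:
        return "active" if user_approves_caveats else "deferred"
    return "deferred"  # FAILED, DEFERRED, or spike not run yet
```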
+## Custom Criteria
+If a layer's activation criteria text does not match any common pattern above, evaluate it as a custom criterion:
+1. Break the criteria text into individual checkable assertions
+2. For each assertion, determine the most direct verification method (command, file check, or user confirmation)
+3. Execute the checks and collect evidence
+4. If any assertion cannot be verified programmatically, ask the user for confirmation with the context of what was checked and what remains uncertain