agileflow 2.80.0 → 2.82.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,300 @@
+ ---
+ name: configuration-visual-e2e
+ description: Configure Visual E2E testing infrastructure with Playwright and screenshot verification
+ tools: Read, Write, Edit, Bash, Glob, Grep
+ model: haiku
+ compact_context:
+   priority: high
+   preserve_rules:
+     - "Install Playwright with npx playwright install --with-deps chromium"
+     - "Create playwright.config.ts with webServer config for auto-starting dev server"
+     - "Create tests/e2e/ directory with example test that takes screenshots"
+     - "Create screenshots/ directory for visual verification workflow"
+     - "Add test:e2e script to package.json"
+     - "All screenshots must be visually reviewed and renamed with 'verified-' prefix"
+     - "Use TodoWrite to track all 8 setup steps"
+     - "Run example test after setup to verify it works"
+   state_fields:
+     - playwright_installed
+     - config_created
+     - example_test_created
+     - screenshots_dir_created
+ ---
+
+ # Configuration: Visual E2E Testing
+
+ Set up Visual E2E testing infrastructure with Playwright and a screenshot verification workflow for reliable UI development.
+
+ ---
+
+ ## What This Does
+
+ Visual E2E testing catches issues that functional tests miss:
+
+ 1. **Playwright Setup** - Install the test runner and Chromium browser
+ 2. **Screenshot Capture** - E2E tests capture screenshots during test runs
+ 3. **Visual Verification** - Claude reviews screenshots before marking UI work complete
+ 4. **Auto-Start Dev Server** - webServer config starts the dev server automatically for tests
+
+ ---
+
+ ## Configuration Steps
+
+ ### Step 1: Check Prerequisites
+
+ ```bash
+ # Verify package.json exists
+ ls package.json
+ ```
+
+ If there is no package.json, exit with: "This project needs a package.json. Run `npm init` first."
+
+ ### Step 2: Ask User to Proceed
+
+ ```xml
+ <invoke name="AskUserQuestion">
+ <parameter name="questions">[{
+   "question": "Set up Visual E2E testing with Playwright?",
+   "header": "Visual E2E",
+   "multiSelect": false,
+   "options": [
+     {"label": "Yes, install Playwright (Recommended)", "description": "~300MB for chromium browser, creates tests/e2e/ and screenshots/"},
+     {"label": "Skip", "description": "No Visual E2E setup"}
+   ]
+ }]</parameter>
+ </invoke>
+ ```
+
+ If the user selects "Skip", exit with: "Visual E2E setup skipped. Run /agileflow:configure to set up later."
+
+ ### Step 3: Ask Dev Server Configuration
+
+ ```xml
+ <invoke name="AskUserQuestion">
+ <parameter name="questions">[{
+   "question": "What command starts your dev server?",
+   "header": "Dev Server",
+   "multiSelect": false,
+   "options": [
+     {"label": "npm run dev", "description": "Default Next.js/Vite command"},
+     {"label": "npm start", "description": "Create React App default"},
+     {"label": "yarn dev", "description": "Yarn package manager"}
+   ]
+ }]</parameter>
+ </invoke>
+ ```
+
+ ### Step 4: Install Playwright
+
+ ```bash
+ # Install the Playwright test runner
+ npm install --save-dev @playwright/test
+
+ # Install the Chromium browser (smallest option, ~300MB)
+ npx playwright install --with-deps chromium
+ ```
+
+ ### Step 5: Create playwright.config.ts
+
+ Create `playwright.config.ts` in the project root:
+
+ ```typescript
+ import { defineConfig, devices } from '@playwright/test';
+
+ export default defineConfig({
+   testDir: './tests/e2e',
+
+   // Run tests in parallel
+   fullyParallel: true,
+
+   // Fail the build on CI if you accidentally left test.only
+   forbidOnly: !!process.env.CI,
+
+   // Retry on CI only
+   retries: process.env.CI ? 2 : 0,
+
+   // Opt out of parallel tests on CI
+   workers: process.env.CI ? 1 : undefined,
+
+   // Reporter
+   reporter: 'html',
+
+   use: {
+     // Base URL for navigation
+     baseURL: 'http://localhost:3000',
+
+     // Capture a screenshot on every test
+     screenshot: 'on',
+
+     // Collect a trace on the first retry after a failure
+     trace: 'on-first-retry',
+   },
+
+   // Configure webServer to auto-start the dev server
+   webServer: {
+     command: 'npm run dev', // Replace with the user's choice from Step 3
+     url: 'http://localhost:3000',
+     reuseExistingServer: !process.env.CI,
+     timeout: 120000,
+   },
+
+   projects: [
+     {
+       name: 'chromium',
+       use: { ...devices['Desktop Chrome'] },
+     },
+   ],
+ });
+ ```
+
+ ### Step 6: Create Directory Structure
+
+ ```bash
+ # Create the tests/e2e directory
+ mkdir -p tests/e2e
+
+ # Create the screenshots directory
+ mkdir -p screenshots
+ ```
+
+ ### Step 7: Create Example Test
+
+ Create `tests/e2e/visual-example.spec.ts`:
+
+ ```typescript
+ import { test, expect } from '@playwright/test';
+
+ test.describe('Visual Verification Examples', () => {
+   test('homepage loads correctly', async ({ page }) => {
+     await page.goto('/');
+
+     // Capture a full-page screenshot for visual verification
+     await page.screenshot({
+       path: 'screenshots/homepage-full.png',
+       fullPage: true,
+     });
+
+     // Basic assertion: the page has a non-empty title
+     await expect(page).toHaveTitle(/./);
+   });
+
+   test('component renders correctly', async ({ page }) => {
+     await page.goto('/');
+
+     // Verify the element is visible before capturing it
+     const header = page.locator('header').first();
+     await expect(header).toBeVisible();
+
+     // Capture an element-level screenshot
+     await header.screenshot({
+       path: 'screenshots/header-component.png',
+     });
+   });
+ });
+ ```
+
+ ### Step 8: Add npm Scripts
+
+ Add to package.json scripts:
+
+ ```json
+ {
+   "scripts": {
+     "test:e2e": "playwright test",
+     "test:e2e:ui": "playwright test --ui",
+     "test:e2e:headed": "playwright test --headed"
+   }
+ }
+ ```
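If the project already defines some of these scripts, merging is safer than overwriting. A hypothetical sketch of that merge as plain object manipulation (the helper name is illustrative; the agent may equally apply the change with Edit):

```javascript
// Hypothetical helper: merge the new test:e2e scripts into a package.json
// object without clobbering scripts the project already defines.
function mergeScripts(pkg, added) {
  // Existing entries win; only missing scripts are filled in.
  return { ...pkg, scripts: { ...added, ...(pkg.scripts || {}) } };
}

const added = {
  'test:e2e': 'playwright test',
  'test:e2e:ui': 'playwright test --ui',
  'test:e2e:headed': 'playwright test --headed',
};

// Example: a project that already defines its own "dev" script.
const pkg = mergeScripts({ name: 'demo', scripts: { dev: 'vite' } }, added);
```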
+
+ ### Step 9: Run Verification Test
+
+ ```bash
+ npm run test:e2e
+ ```
+
+ ### Step 10: Show Completion Summary
+
+ ```
+ Visual E2E Setup Complete
+
+ Installed:
+ - @playwright/test
+ - chromium browser
+
+ Created:
+ - playwright.config.ts (with webServer auto-start)
+ - tests/e2e/visual-example.spec.ts (example test)
+ - screenshots/ (for visual verification)
+
+ Added scripts to package.json:
+ - npm run test:e2e          Run all e2e tests
+ - npm run test:e2e:ui       Run with Playwright UI
+ - npm run test:e2e:headed   Run with visible browser
+
+ Visual Verification Workflow:
+ 1. Run tests: npm run test:e2e
+ 2. Review screenshots in screenshots/
+ 3. Rename verified: mv file.png verified-file.png
+ 4. Verify all: node scripts/screenshot-verifier.js
+
+ Why Visual Mode?
+ Tests passing doesn't mean the UI looks correct. A button can "work"
+ but be the wrong color, in the wrong position, or missing entirely.
+ Visual verification catches these issues.
+ ```
+
+ ---
+
+ ## Visual Verification Workflow
+
+ After running tests:
+
+ 1. **Review screenshots**: Read each screenshot in screenshots/
+ 2. **Verify visually**: Check that the UI looks correct
+ 3. **Rename verified**: `mv screenshots/homepage.png screenshots/verified-homepage.png`
+ 4. **Run verifier**: `node scripts/screenshot-verifier.js --path ./screenshots`
+
+ This ensures Claude actually looked at each screenshot before declaring completion.
+
+ ---
+
+ ## Integration with Ralph Loop
+
+ When using Visual Mode in Ralph Loop:
+
+ ```bash
+ # Initialize the loop with Visual Mode
+ node scripts/ralph-loop.js --init --epic=EP-XXXX --visual
+
+ # Loop checks:
+ # 1. npm test passes
+ # 2. All screenshots have the verified- prefix
+ # 3. Minimum 2 iterations completed
+ ```
+
+ Visual Mode prevents premature completion claims for UI work.
+
+ ---
+
+ ## Troubleshooting
+
+ **Tests fail with "No server running":**
+ - Ensure the webServer command matches your dev server command
+ - Check that the port number in baseURL matches your app
+
+ **Screenshots directory empty:**
+ - Tests must include `await page.screenshot({path: 'screenshots/...'})` calls
+ - Check test output for errors
+
+ **Browser not installed:**
+ - Run `npx playwright install --with-deps chromium`
+
+ ---
+
+ ## Related
+
+ - Playwright docs: https://playwright.dev/docs/intro
+ - webServer config: https://playwright.dev/docs/test-webserver
@@ -48,6 +48,22 @@ RULE #3: DEPENDENCY DETECTION
 | Same domain, different experts | PARALLEL | Security + Performance analyzing same code |
 | Best-of-N comparison | PARALLEL | Expert1 vs Expert2 vs Expert3 approaches |

+ RULE #3b: JOIN STRATEGIES (for parallel deployment)
+ | Strategy | When | Behavior |
+ |----------|------|----------|
+ | `all` | Full implementation | Wait for all, fail if any fails |
+ | `first` | Racing approaches | Take first completion |
+ | `any` | Fallback patterns | Take first success |
+ | `any-N` | Multiple perspectives | Take first N successes |
+ | `majority` | High-stakes decisions | Take consensus (2+ agree) |
+
+ RULE #3c: FAILURE POLICIES
+ | Policy | When | Behavior |
+ |--------|------|----------|
+ | `fail-fast` | Critical work (default) | Stop on first failure |
+ | `continue` | Analysis/review | Run all, report failures |
+ | `ignore` | Optional enrichments | Skip failures silently |
+
 RULE #4: SYNTHESIS REQUIREMENTS
 - NEVER give final answer without all expert results
 - Flag conflicts explicitly: "Expert A recommends X (rationale: ...), Expert B recommends Y (rationale: ...)"
@@ -90,6 +106,7 @@ RULE #4: SYNTHESIS REQUIREMENTS
 3. Collect ALL results before synthesizing
 4. Always flag conflicts in final answer
 5. Provide recommendation with rationale
+ 6. 🧪 EXPERIMENTAL: For quality gates (coverage ≥ X%, tests pass), use nested loops - see the "NESTED LOOP MODE" section

 <!-- COMPACT_SUMMARY_END -->

@@ -237,11 +254,114 @@ TaskOutput(task_id: "<ui_expert_id>", block: true)

 ---

+ ## JOIN STRATEGIES
+
+ When spawning parallel experts, specify how to handle results:
+
+ | Strategy | Behavior | Use Case |
+ |----------|----------|----------|
+ | `all` | Wait for all, fail if any fails | Full feature implementation |
+ | `first` | Take first result, cancel others | Racing alternative approaches |
+ | `any` | Take first success, ignore failures | Fallback patterns |
+ | `any-N` | Take first N successes | Get multiple perspectives |
+ | `majority` | Take consensus result | High-stakes decisions |
+
+ ### Failure Policies
+
+ Combine these with strategies to handle errors gracefully:
+
+ | Policy | Behavior | Use Case |
+ |--------|----------|----------|
+ | `fail-fast` | Stop all on first failure (default) | Critical operations |
+ | `continue` | Run all to completion, report failures | Comprehensive analysis |
+ | `ignore` | Skip failed branches silently | Optional enrichments |
+
+ **Usage:**
+ ```
+ Deploy parallel (strategy: all, on-fail: continue):
+ - agileflow-security (may fail if no vulnerabilities)
+ - agileflow-performance (may fail if no issues)
+ - agileflow-testing
+
+ Run all to completion. Report any failures at the end.
+ ```
+
+ **When to use each policy:**
+
+ | Scenario | Recommended Policy |
+ |----------|-------------------|
+ | Implementation work | `fail-fast` (need all parts) |
+ | Code review/analysis | `continue` (want all perspectives) |
+ | Optional enrichments | `ignore` (nice-to-have) |
+
+ ### Strategy: all (Default)
+
+ Wait for all experts to complete. Report all results in the synthesis.
+
+ ```
+ Deploy parallel (strategy: all):
+ - agileflow-api (endpoint)
+ - agileflow-ui (component)
+
+ Collect ALL results before synthesizing.
+ If ANY expert fails → report failure with details.
+ ```
+
+ ### Strategy: first
+
+ Take the first expert that completes. Useful for racing approaches.
+
+ ```
+ Deploy parallel (strategy: first):
+ - Expert A (approach: caching)
+ - Expert B (approach: pagination)
+ - Expert C (approach: batching)
+
+ First to complete wins → use that approach.
+ Cancel/ignore other results.
+
+ Use case: Finding ANY working solution when multiple approaches are valid.
+ ```
+
+ ### Strategy: any
+
+ Take the first successful result. Ignore failures. Useful for fallbacks.
+
+ ```
+ Deploy parallel (strategy: any):
+ - Expert A (primary approach)
+ - Expert B (fallback approach)
+
+ First SUCCESS wins → use that result.
+ If A fails but B succeeds → use B.
+ If all fail → report all failures.
+
+ Use case: Resilient operations where any working solution is acceptable.
+ ```
+
+ ### Strategy: majority
+
+ Multiple experts analyze the same thing. Take the consensus.
+
+ ```
+ Deploy parallel (strategy: majority):
+ - Security Expert 1
+ - Security Expert 2
+ - Security Expert 3
+
+ If 2+ agree → use the consensus recommendation.
+ If there is no consensus → report the conflict, request a decision.
+
+ Use case: High-stakes security reviews, architecture decisions.
+ ```
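In promise terms, `all` corresponds to `Promise.all`, `any` to `Promise.any`, and `first` to `Promise.race`; `any-N` has no built-in equivalent. A hypothetical sketch of an `any-N` join (this helper is illustrative, not part of agileflow):

```javascript
// Hypothetical any-N join: resolve with the first n fulfilled results,
// reject once n successes can no longer be reached.
function joinAnyN(promises, n) {
  return new Promise((resolve, reject) => {
    if (n > promises.length) {
      reject(new Error('n exceeds branch count'));
      return;
    }
    const successes = [];
    const failures = [];
    for (const p of promises) {
      Promise.resolve(p).then(
        (value) => {
          if (successes.length < n) successes.push(value);
          if (successes.length === n) resolve(successes);
        },
        (err) => {
          failures.push(err);
          // Too many branches failed: n successes are now impossible.
          if (promises.length - failures.length < n) reject(failures);
        }
      );
    }
  });
}
```

With `n = 1` this behaves like the `any` strategy; `majority` additionally requires comparing the collected results for agreement, not just counting successes.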
+
+ ---
+
 ## PARALLEL PATTERNS

 ### Full-Stack Feature
 ```
- Parallel:
+ Parallel (strategy: all):
 - agileflow-api (endpoint)
 - agileflow-ui (component)
 Then:
@@ -250,22 +370,32 @@ Then:

 ### Code Review/Analysis
 ```
- Parallel (analyze same code):
+ Parallel (strategy: all):
 - agileflow-security
 - agileflow-performance
 - agileflow-testing
 Then:
- - Synthesize findings
+ - Synthesize all findings
 ```

- ### Best-of-N
+ ### Best-of-N (Racing)
 ```
- Parallel (same task, different approaches):
+ Parallel (strategy: first):
 - Expert A (approach 1)
 - Expert B (approach 2)
 - Expert C (approach 3)
 Then:
- - Compare and select best
+ - Use first completion
+ ```
+
+ ### Consensus Decision
+ ```
+ Parallel (strategy: majority):
+ - Security Expert 1
+ - Security Expert 2
+ - Security Expert 3
+ Then:
+ - Take consensus recommendation
 ```

 ---
@@ -326,3 +456,168 @@ These are independent — deploying in parallel.

 Proceed with integration?
 ```
+
+ ---
+
+ ## NESTED LOOP MODE (Experimental)
+
+ When agents need to iterate until quality gates pass, use **nested loops**. Each agent runs its own isolated loop with quality verification.
+
+ ### When to Use
+
+ | Scenario | Use Nested Loops? |
+ |----------|-------------------|
+ | Simple implementation | No - single expert spawn |
+ | Need coverage threshold | Yes - agent loops until coverage met |
+ | Need visual verification | Yes - agent loops until screenshots verified |
+ | Complex multi-gate feature | Yes - each domain gets its own loop |
+
+ ### How It Works
+
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │                        ORCHESTRATOR                         │
+ │                                                             │
+ │   ┌──────────────────┐      ┌──────────────────┐            │
+ │   │ API Agent        │      │ UI Agent         │ (parallel) │
+ │   │ Loop: coverage   │      │ Loop: visual     │            │
+ │   │ Max: 5 iter      │      │ Max: 5 iter      │ ← ISOLATED │
+ │   └──────────────────┘      └──────────────────┘            │
+ │            ↓                         ↓                      │
+ │        TaskOutput                TaskOutput                 │
+ │            ↓                         ↓                      │
+ │   ┌─────────────────────────────────────────────────────┐   │
+ │   │              SYNTHESIS + VERIFICATION               │   │
+ │   └─────────────────────────────────────────────────────┘   │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Spawning with Agent Loops
+
+ **Step 1: Generate a loop ID and include it in the prompt**
+
+ ```
+ Task(
+   description: "API with coverage loop",
+   prompt: `Implement /api/profile endpoint.
+
+ ## AGENT LOOP ACTIVE
+
+ You have a quality gate to satisfy:
+ - Gate: coverage >= 80%
+ - Max iterations: 5
+ - Loop ID: abc12345
+
+ ## Workflow
+
+ 1. Implement the feature
+ 2. Run the gate check:
+    node .agileflow/scripts/agent-loop.js --check --loop-id=abc12345
+ 3. If the check returns exit code 2 (running), iterate and improve
+ 4. If the check returns exit code 0 (passed), you're done
+ 5. If the check returns exit code 1 (failed), report the failure
+
+ Continue iterating until the gate passes or max iterations are reached.`,
+   subagent_type: "agileflow-api",
+   run_in_background: true
+ )
+ ```
+
+ **Step 2: Initialize the loop before spawning**
+
+ Before spawning the agent, the orchestrator should note that loops are in use. The agent initializes its own loop with:
+
+ ```bash
+ node .agileflow/scripts/agent-loop.js --init --gate=coverage --threshold=80 --max=5 --agent=agileflow-api --loop-id=abc12345
+ ```
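The exit-code protocol (0 = passed, 2 = keep iterating, 1 = failed) amounts to a bounded retry loop. A hypothetical sketch with the check injected as a function, so it is not tied to the real agent-loop.js:

```javascript
// Hypothetical driver for the gate-check protocol:
//   0 = gate passed, 2 = still running (iterate again), 1 = gate failed.
// `improve` performs one iteration of work; `check` returns the exit code.
function runAgentLoop({ check, improve, maxIterations = 5 }) {
  for (let iter = 1; iter <= maxIterations; iter++) {
    improve(iter);
    const code = check();
    if (code === 0) return { status: 'passed', iterations: iter };
    if (code === 1) return { status: 'failed', iterations: iter };
    // code === 2: gate not yet satisfied, keep iterating.
  }
  return { status: 'failed', reason: 'max_iterations' };
}
```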
+
+ ### Available Quality Gates
+
+ | Gate | Flag | Description |
+ |------|------|-------------|
+ | `tests` | `--gate=tests` | Run the test command, pass on exit 0 |
+ | `coverage` | `--gate=coverage --threshold=80` | Run coverage, pass when >= threshold |
+ | `visual` | `--gate=visual` | Check that screenshots have the verified- prefix |
+ | `lint` | `--gate=lint` | Run the lint command, pass on exit 0 |
+ | `types` | `--gate=types` | Run tsc --noEmit, pass on exit 0 |
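As an illustration of the `coverage` gate, assuming it reads an istanbul/nyc-style coverage-summary.json (which exposes `total.lines.pct`); the actual check inside agent-loop.js is not shown in this diff:

```javascript
// Hypothetical coverage gate: map a coverage summary object to the
// loop's exit-code convention (0 = passed, 2 = keep iterating).
function coverageGate(summary, threshold) {
  const pct = summary?.total?.lines?.pct ?? 0;
  return pct >= threshold ? 0 : 2;
}
```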
+
+ ### Monitoring Progress
+
+ Read the event bus for loop status:
+
+ ```bash
+ # Events emitted to: docs/09-agents/bus/log.jsonl
+
+ {"type":"agent_loop","event":"init","loop_id":"abc12345","agent":"agileflow-api","gate":"coverage","threshold":80}
+ {"type":"agent_loop","event":"iteration","loop_id":"abc12345","iter":1,"value":65,"passed":false}
+ {"type":"agent_loop","event":"iteration","loop_id":"abc12345","iter":2,"value":72,"passed":false}
+ {"type":"agent_loop","event":"passed","loop_id":"abc12345","final_value":82,"iterations":3}
+ ```
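A JSONL stream like the one above can be folded into a per-loop status view; a hypothetical reader (the event shapes are copied from the sample, the helper itself is an assumption):

```javascript
// Hypothetical event-bus reader: fold agent_loop events from log.jsonl
// down to the latest state of each loop_id.
function loopStatuses(jsonl) {
  const loops = {};
  for (const line of jsonl.split('\n')) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line);
    if (ev.type !== 'agent_loop') continue;
    const loop = loops[ev.loop_id] || (loops[ev.loop_id] = { status: 'running' });
    if (ev.event === 'iteration') {
      loop.iter = ev.iter;
      loop.value = ev.value;
    } else if (ev.event === 'passed' || ev.event === 'failed') {
      loop.status = ev.event;
    }
  }
  return loops;
}
```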
+
+ ### Safety Limits
+
+ | Limit | Value | Enforced By |
+ |-------|-------|-------------|
+ | Max iterations per agent | 5 | agent-loop.js |
+ | Max concurrent loops | 3 | agent-loop.js |
+ | Timeout per loop | 10 min | agent-loop.js |
+ | Regression abort | 2 consecutive | agent-loop.js |
+ | Stall abort | 5 min no progress | agent-loop.js |
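The regression-abort rule (stop after two consecutive drops in the gate value) can be expressed as a small check over the iteration history; a hypothetical sketch of that rule, not the enforcement code in agent-loop.js:

```javascript
// Hypothetical abort guard: abort after `window` consecutive regressions
// in the gate value, mirroring the "Regression abort" limit above.
function shouldAbortOnRegression(values, window = 2) {
  let consecutive = 0;
  for (let i = 1; i < values.length; i++) {
    // A drop extends the streak; any non-drop resets it.
    consecutive = values[i] < values[i - 1] ? consecutive + 1 : 0;
    if (consecutive >= window) return true;
  }
  return false;
}
```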
+
+ ### Example: Full Feature with Quality Gates
+
+ ```
+ Request: "Implement user profile with API at 80% coverage and UI with visual verification"
+
+ Parallel spawn:
+ - agileflow-api with coverage loop (threshold: 80%)
+ - agileflow-ui with visual loop
+
+ ## Agent Loop Status
+
+ ### API Expert (agileflow-api)
+ - Gate: coverage >= 80%
+ - Iterations: 3
+ - Progress: 65% → 72% → 82% ✓
+ - Status: PASSED
+
+ ### UI Expert (agileflow-ui)
+ - Gate: visual (screenshots verified)
+ - Iterations: 2
+ - Progress: 0/3 → 3/3 verified ✓
+ - Status: PASSED
+
+ ## Synthesis
+
+ Both quality gates satisfied. Feature implementation complete.
+
+ Files created:
+ - src/routes/profile.ts (API)
+ - src/components/ProfilePage.tsx (UI)
+ - tests/profile.test.ts (coverage)
+ - screenshots/verified-profile-*.png (visual)
+ ```
+
+ ### Abort Handling
+
+ If an agent loop fails:
+
+ 1. **Max iterations reached**: Report which gate wasn't satisfied
+ 2. **Regression detected**: Note that quality went down twice in a row
+ 3. **Stalled**: Note no progress for 5+ minutes
+ 4. **Timeout**: Note that the 10-minute limit was exceeded
+
+ ```markdown
+ ## Agent Loop FAILED
+
+ ### API Expert (agileflow-api)
+ - Gate: coverage >= 80%
+ - Final: 72%
+ - Status: FAILED (max_iterations)
+ - Reason: Couldn't reach 80% coverage in 5 iterations
+
+ ### Recommendation
+ - Review uncovered code paths
+ - Consider whether 80% is achievable
+ - May need to reduce the threshold or add more test cases
+ ```