agileflow 2.80.0 → 2.82.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,300 @@
+ ---
+ name: configuration-visual-e2e
+ description: Configure Visual E2E testing infrastructure with Playwright and screenshot verification
+ tools: Read, Write, Edit, Bash, Glob, Grep
+ model: haiku
+ compact_context:
+   priority: high
+   preserve_rules:
+     - "Install Playwright with npx playwright install --with-deps chromium"
+     - "Create playwright.config.ts with webServer config for auto-starting dev server"
+     - "Create tests/e2e/ directory with example test that takes screenshots"
+     - "Create screenshots/ directory for visual verification workflow"
+     - "Add test:e2e script to package.json"
+     - "All screenshots must be visually reviewed and renamed with 'verified-' prefix"
+     - "Use TodoWrite to track all 8 setup steps"
+     - "Run example test after setup to verify it works"
+   state_fields:
+     - playwright_installed
+     - config_created
+     - example_test_created
+     - screenshots_dir_created
+ ---
+
+ # Configuration: Visual E2E Testing
+
+ Set up Visual E2E testing infrastructure with Playwright and a screenshot verification workflow for reliable UI development.
+
+ ---
+
+ ## What This Does
+
+ Visual E2E testing catches issues that functional tests miss:
+
+ 1. **Playwright Setup** - Install the test runner and Chromium browser
+ 2. **Screenshot Capture** - E2E tests capture screenshots during test runs
+ 3. **Visual Verification** - Claude reviews screenshots before marking UI work complete
+ 4. **Auto-Start Dev Server** - webServer config starts the dev server automatically for tests
+
+ ---
+
+ ## Configuration Steps
+
+ ### Step 1: Check Prerequisites
+
+ ```bash
+ # Verify package.json exists
+ ls package.json
+ ```
+
+ If there is no package.json, exit with: "This project needs a package.json. Run `npm init` first."
+
+ ### Step 2: Ask User to Proceed
+
+ ```xml
+ <invoke name="AskUserQuestion">
+ <parameter name="questions">[{
+   "question": "Set up Visual E2E testing with Playwright?",
+   "header": "Visual E2E",
+   "multiSelect": false,
+   "options": [
+     {"label": "Yes, install Playwright (Recommended)", "description": "~300MB for chromium browser, creates tests/e2e/ and screenshots/"},
+     {"label": "Skip", "description": "No Visual E2E setup"}
+   ]
+ }]</parameter>
+ </invoke>
+ ```
+
+ If the user selects "Skip", exit with: "Visual E2E setup skipped. Run /agileflow:configure to set up later."
+
+ ### Step 3: Ask Dev Server Configuration
+
+ ```xml
+ <invoke name="AskUserQuestion">
+ <parameter name="questions">[{
+   "question": "What command starts your dev server?",
+   "header": "Dev Server",
+   "multiSelect": false,
+   "options": [
+     {"label": "npm run dev", "description": "Default Next.js/Vite command"},
+     {"label": "npm start", "description": "Create React App default"},
+     {"label": "yarn dev", "description": "Yarn package manager"}
+   ]
+ }]</parameter>
+ </invoke>
+ ```
+
+ ### Step 4: Install Playwright
+
+ ```bash
+ # Install the Playwright test runner
+ npm install --save-dev @playwright/test
+
+ # Install the Chromium browser (smallest option, ~300MB)
+ npx playwright install --with-deps chromium
+ ```
+
+ ### Step 5: Create playwright.config.ts
+
+ Create `playwright.config.ts` in the project root:
+
+ ```typescript
+ import { defineConfig, devices } from '@playwright/test';
+
+ export default defineConfig({
+   testDir: './tests/e2e',
+
+   // Run tests in parallel
+   fullyParallel: true,
+
+   // Fail the build on CI if you accidentally left test.only
+   forbidOnly: !!process.env.CI,
+
+   // Retry on CI only
+   retries: process.env.CI ? 2 : 0,
+
+   // Opt out of parallel tests on CI
+   workers: process.env.CI ? 1 : undefined,
+
+   // Reporter
+   reporter: 'html',
+
+   use: {
+     // Base URL for navigation
+     baseURL: 'http://localhost:3000',
+
+     // Capture a screenshot on every test
+     screenshot: 'on',
+
+     // Collect a trace on the first retry after a failure
+     trace: 'on-first-retry',
+   },
+
+   // Configure webServer to auto-start the dev server
+   webServer: {
+     command: 'npm run dev', // Replace with the user's choice from Step 3
+     url: 'http://localhost:3000',
+     reuseExistingServer: !process.env.CI,
+     timeout: 120000,
+   },
+
+   projects: [
+     {
+       name: 'chromium',
+       use: { ...devices['Desktop Chrome'] },
+     },
+   ],
+ });
+ ```
+
+ ### Step 6: Create Directory Structure
+
+ ```bash
+ # Create the tests/e2e directory
+ mkdir -p tests/e2e
+
+ # Create the screenshots directory
+ mkdir -p screenshots
+ ```
+
+ ### Step 7: Create Example Test
+
+ Create `tests/e2e/visual-example.spec.ts`:
+
+ ```typescript
+ import { test, expect } from '@playwright/test';
+
+ test.describe('Visual Verification Examples', () => {
+   test('homepage loads correctly', async ({ page }) => {
+     await page.goto('/');
+
+     // Capture a full-page screenshot for visual verification
+     await page.screenshot({
+       path: 'screenshots/homepage-full.png',
+       fullPage: true,
+     });
+
+     // Basic assertion: the page has a non-empty title
+     await expect(page).toHaveTitle(/./);
+   });
+
+   test('component renders correctly', async ({ page }) => {
+     await page.goto('/');
+
+     // Verify the element is visible before capturing it
+     const header = page.locator('header').first();
+     await expect(header).toBeVisible();
+
+     // Capture an element-level screenshot
+     await header.screenshot({
+       path: 'screenshots/header-component.png',
+     });
+   });
+ });
+ ```
+
+ ### Step 8: Add npm Scripts
+
+ Add to package.json scripts:
+
+ ```json
+ {
+   "scripts": {
+     "test:e2e": "playwright test",
+     "test:e2e:ui": "playwright test --ui",
+     "test:e2e:headed": "playwright test --headed"
+   }
+ }
+ ```
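If the project already defines some of these scripts, merging is safer than overwriting. A hypothetical sketch of that merge as plain object manipulation (the helper name is illustrative; the agent may equally apply the change with Edit):

```javascript
// Hypothetical helper: merge the new test:e2e scripts into a package.json
// object without clobbering scripts the project already defines.
function mergeScripts(pkg, added) {
  // Existing entries win; only missing scripts are filled in.
  return { ...pkg, scripts: { ...added, ...(pkg.scripts || {}) } };
}

const added = {
  'test:e2e': 'playwright test',
  'test:e2e:ui': 'playwright test --ui',
  'test:e2e:headed': 'playwright test --headed',
};

// Example: a project that already defines its own "dev" script.
const pkg = mergeScripts({ name: 'demo', scripts: { dev: 'vite' } }, added);
```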
+
+ ### Step 9: Run Verification Test
+
+ ```bash
+ npm run test:e2e
+ ```
+
+ ### Step 10: Show Completion Summary
+
+ ```
+ Visual E2E Setup Complete
+
+ Installed:
+ - @playwright/test
+ - chromium browser
+
+ Created:
+ - playwright.config.ts (with webServer auto-start)
+ - tests/e2e/visual-example.spec.ts (example test)
+ - screenshots/ (for visual verification)
+
+ Added scripts to package.json:
+ - npm run test:e2e          Run all e2e tests
+ - npm run test:e2e:ui       Run with Playwright UI
+ - npm run test:e2e:headed   Run with visible browser
+
+ Visual Verification Workflow:
+ 1. Run tests: npm run test:e2e
+ 2. Review screenshots in screenshots/
+ 3. Rename verified: mv file.png verified-file.png
+ 4. Verify all: node scripts/screenshot-verifier.js
+
+ Why Visual Mode?
+ Tests passing doesn't mean the UI looks correct. A button can "work"
+ but be the wrong color, in the wrong position, or missing entirely.
+ Visual verification catches these issues.
+ ```
+
+ ---
+
+ ## Visual Verification Workflow
+
+ After running tests:
+
+ 1. **Review screenshots**: Read each screenshot in screenshots/
+ 2. **Verify visually**: Check that the UI looks correct
+ 3. **Rename verified**: `mv screenshots/homepage.png screenshots/verified-homepage.png`
+ 4. **Run verifier**: `node scripts/screenshot-verifier.js --path ./screenshots`
+
+ This ensures Claude actually looked at each screenshot before declaring completion.
+
+ ---
+
+ ## Integration with Ralph Loop
+
+ When using Visual Mode in Ralph Loop:
+
+ ```bash
+ # Initialize the loop with Visual Mode
+ node scripts/ralph-loop.js --init --epic=EP-XXXX --visual
+
+ # Loop checks:
+ # 1. npm test passes
+ # 2. All screenshots have the verified- prefix
+ # 3. Minimum 2 iterations completed
+ ```
+
+ Visual Mode prevents premature completion claims for UI work.
+
+ ---
+
+ ## Troubleshooting
+
+ **Tests fail with "No server running":**
+ - Ensure the webServer command matches your dev server command
+ - Check that the port number in baseURL matches your app
+
+ **Screenshots directory empty:**
+ - Tests must include `await page.screenshot({path: 'screenshots/...'})` calls
+ - Check test output for errors
+
+ **Browser not installed:**
+ - Run `npx playwright install --with-deps chromium`
+
+ ---
+
+ ## Related
+
+ - Playwright docs: https://playwright.dev/docs/intro
+ - webServer config: https://playwright.dev/docs/test-webserver
@@ -48,6 +48,22 @@ RULE #3: DEPENDENCY DETECTION
 | Same domain, different experts | PARALLEL | Security + Performance analyzing same code |
 | Best-of-N comparison | PARALLEL | Expert1 vs Expert2 vs Expert3 approaches |

+ RULE #3b: JOIN STRATEGIES (for parallel deployment)
+ | Strategy | When | Behavior |
+ |----------|------|----------|
+ | `all` | Full implementation | Wait for all, fail if any fails |
+ | `first` | Racing approaches | Take first completion |
+ | `any` | Fallback patterns | Take first success |
+ | `any-N` | Multiple perspectives | Take first N successes |
+ | `majority` | High-stakes decisions | Take consensus (2+ agree) |
+
+ RULE #3c: FAILURE POLICIES
+ | Policy | When | Behavior |
+ |--------|------|----------|
+ | `fail-fast` | Critical work (default) | Stop on first failure |
+ | `continue` | Analysis/review | Run all, report failures |
+ | `ignore` | Optional enrichments | Skip failures silently |
+
 RULE #4: SYNTHESIS REQUIREMENTS
 - NEVER give final answer without all expert results
 - Flag conflicts explicitly: "Expert A recommends X (rationale: ...), Expert B recommends Y (rationale: ...)"
@@ -90,6 +106,7 @@ RULE #4: SYNTHESIS REQUIREMENTS
 3. Collect ALL results before synthesizing
 4. Always flag conflicts in final answer
 5. Provide recommendation with rationale
+ 6. 🧪 EXPERIMENTAL: For quality gates (coverage ≥ X%, tests pass), use nested loops - see the "NESTED LOOP MODE" section

 <!-- COMPACT_SUMMARY_END -->

@@ -237,11 +254,114 @@ TaskOutput(task_id: "<ui_expert_id>", block: true)

 ---

+ ## JOIN STRATEGIES
+
+ When spawning parallel experts, specify how to handle results:
+
+ | Strategy | Behavior | Use Case |
+ |----------|----------|----------|
+ | `all` | Wait for all, fail if any fails | Full feature implementation |
+ | `first` | Take first result, cancel others | Racing alternative approaches |
+ | `any` | Take first success, ignore failures | Fallback patterns |
+ | `any-N` | Take first N successes | Get multiple perspectives |
+ | `majority` | Take consensus result | High-stakes decisions |
+
+ ### Failure Policies
+
+ Combine these with strategies to handle errors gracefully:
+
+ | Policy | Behavior | Use Case |
+ |--------|----------|----------|
+ | `fail-fast` | Stop all on first failure (default) | Critical operations |
+ | `continue` | Run all to completion, report failures | Comprehensive analysis |
+ | `ignore` | Skip failed branches silently | Optional enrichments |
+
+ **Usage:**
+ ```
+ Deploy parallel (strategy: all, on-fail: continue):
+ - agileflow-security (may fail if no vulnerabilities)
+ - agileflow-performance (may fail if no issues)
+ - agileflow-testing
+
+ Run all to completion. Report any failures at the end.
+ ```
+
+ **When to use each policy:**
+
+ | Scenario | Recommended Policy |
+ |----------|-------------------|
+ | Implementation work | `fail-fast` (need all parts) |
+ | Code review/analysis | `continue` (want all perspectives) |
+ | Optional enrichments | `ignore` (nice-to-have) |
+
+ ### Strategy: all (Default)
+
+ Wait for all experts to complete. Report all results in the synthesis.
+
+ ```
+ Deploy parallel (strategy: all):
+ - agileflow-api (endpoint)
+ - agileflow-ui (component)
+
+ Collect ALL results before synthesizing.
+ If ANY expert fails → report failure with details.
+ ```
+
+ ### Strategy: first
+
+ Take the first expert that completes. Useful for racing approaches.
+
+ ```
+ Deploy parallel (strategy: first):
+ - Expert A (approach: caching)
+ - Expert B (approach: pagination)
+ - Expert C (approach: batching)
+
+ First to complete wins → use that approach.
+ Cancel/ignore other results.
+
+ Use case: Finding ANY working solution when multiple approaches are valid.
+ ```
+
+ ### Strategy: any
+
+ Take the first successful result. Ignore failures. Useful for fallbacks.
+
+ ```
+ Deploy parallel (strategy: any):
+ - Expert A (primary approach)
+ - Expert B (fallback approach)
+
+ First SUCCESS wins → use that result.
+ If A fails but B succeeds → use B.
+ If all fail → report all failures.
+
+ Use case: Resilient operations where any working solution is acceptable.
+ ```
+
+ ### Strategy: majority
+
+ Multiple experts analyze the same thing. Take the consensus.
+
+ ```
+ Deploy parallel (strategy: majority):
+ - Security Expert 1
+ - Security Expert 2
+ - Security Expert 3
+
+ If 2+ agree → use the consensus recommendation.
+ If there is no consensus → report the conflict, request a decision.
+
+ Use case: High-stakes security reviews, architecture decisions.
+ ```
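In promise terms, `all` corresponds to `Promise.all`, `any` to `Promise.any`, and `first` to `Promise.race`; `any-N` has no built-in equivalent. A hypothetical sketch of an `any-N` join (this helper is illustrative, not part of agileflow):

```javascript
// Hypothetical any-N join: resolve with the first n fulfilled results,
// reject once n successes can no longer be reached.
function joinAnyN(promises, n) {
  return new Promise((resolve, reject) => {
    if (n > promises.length) {
      reject(new Error('n exceeds branch count'));
      return;
    }
    const successes = [];
    const failures = [];
    for (const p of promises) {
      Promise.resolve(p).then(
        (value) => {
          if (successes.length < n) successes.push(value);
          if (successes.length === n) resolve(successes);
        },
        (err) => {
          failures.push(err);
          // Too many branches failed: n successes are now impossible.
          if (promises.length - failures.length < n) reject(failures);
        }
      );
    }
  });
}
```

With `n = 1` this behaves like the `any` strategy; `majority` additionally requires comparing the collected results for agreement, not just counting successes.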
+
+ ---
+
 ## PARALLEL PATTERNS

 ### Full-Stack Feature
 ```
- Parallel:
+ Parallel (strategy: all):
 - agileflow-api (endpoint)
 - agileflow-ui (component)
 Then:
@@ -250,22 +370,32 @@ Then:

 ### Code Review/Analysis
 ```
- Parallel (analyze same code):
+ Parallel (strategy: all):
 - agileflow-security
 - agileflow-performance
 - agileflow-testing
 Then:
- - Synthesize findings
+ - Synthesize all findings
 ```

- ### Best-of-N
+ ### Best-of-N (Racing)
 ```
- Parallel (same task, different approaches):
+ Parallel (strategy: first):
 - Expert A (approach 1)
 - Expert B (approach 2)
 - Expert C (approach 3)
 Then:
- - Compare and select best
+ - Use first completion
+ ```
+
+ ### Consensus Decision
+ ```
+ Parallel (strategy: majority):
+ - Security Expert 1
+ - Security Expert 2
+ - Security Expert 3
+ Then:
+ - Take consensus recommendation
 ```

 ---
@@ -326,3 +456,168 @@ These are independent — deploying in parallel.

 Proceed with integration?
 ```
+
+ ---
+
+ ## NESTED LOOP MODE (Experimental)
+
+ When agents need to iterate until quality gates pass, use **nested loops**. Each agent runs its own isolated loop with quality verification.
+
+ ### When to Use
+
+ | Scenario | Use Nested Loops? |
+ |----------|-------------------|
+ | Simple implementation | No - single expert spawn |
+ | Need coverage threshold | Yes - agent loops until coverage met |
+ | Need visual verification | Yes - agent loops until screenshots verified |
+ | Complex multi-gate feature | Yes - each domain gets its own loop |
+
+ ### How It Works
+
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │                        ORCHESTRATOR                         │
+ │                                                             │
+ │   ┌──────────────────┐      ┌──────────────────┐            │
+ │   │ API Agent        │      │ UI Agent         │ (parallel) │
+ │   │ Loop: coverage   │      │ Loop: visual     │            │
+ │   │ Max: 5 iter      │      │ Max: 5 iter      │ ← ISOLATED │
+ │   └──────────────────┘      └──────────────────┘            │
+ │            ↓                         ↓                      │
+ │        TaskOutput                TaskOutput                 │
+ │            ↓                         ↓                      │
+ │   ┌─────────────────────────────────────────────────────┐   │
+ │   │              SYNTHESIS + VERIFICATION               │   │
+ │   └─────────────────────────────────────────────────────┘   │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Spawning with Agent Loops
+
+ **Step 1: Generate a loop ID and include it in the prompt**
+
+ ```
+ Task(
+   description: "API with coverage loop",
+   prompt: `Implement /api/profile endpoint.
+
+ ## AGENT LOOP ACTIVE
+
+ You have a quality gate to satisfy:
+ - Gate: coverage >= 80%
+ - Max iterations: 5
+ - Loop ID: abc12345
+
+ ## Workflow
+
+ 1. Implement the feature
+ 2. Run the gate check:
+    node .agileflow/scripts/agent-loop.js --check --loop-id=abc12345
+ 3. If the check returns exit code 2 (running), iterate and improve
+ 4. If the check returns exit code 0 (passed), you're done
+ 5. If the check returns exit code 1 (failed), report the failure
+
+ Continue iterating until the gate passes or max iterations are reached.`,
+   subagent_type: "agileflow-api",
+   run_in_background: true
+ )
+ ```
+
+ **Step 2: Initialize the loop before spawning**
+
+ Before spawning the agent, the orchestrator should note that loops are in use. The agent initializes its own loop with:
+
+ ```bash
+ node .agileflow/scripts/agent-loop.js --init --gate=coverage --threshold=80 --max=5 --agent=agileflow-api --loop-id=abc12345
+ ```
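The exit-code protocol (0 = passed, 2 = keep iterating, 1 = failed) amounts to a bounded retry loop. A hypothetical sketch with the check injected as a function, so it is not tied to the real agent-loop.js:

```javascript
// Hypothetical driver for the gate-check protocol:
//   0 = gate passed, 2 = still running (iterate again), 1 = gate failed.
// `improve` performs one iteration of work; `check` returns the exit code.
function runAgentLoop({ check, improve, maxIterations = 5 }) {
  for (let iter = 1; iter <= maxIterations; iter++) {
    improve(iter);
    const code = check();
    if (code === 0) return { status: 'passed', iterations: iter };
    if (code === 1) return { status: 'failed', iterations: iter };
    // code === 2: gate not yet satisfied, keep iterating.
  }
  return { status: 'failed', reason: 'max_iterations' };
}
```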
+
+ ### Available Quality Gates
+
+ | Gate | Flag | Description |
+ |------|------|-------------|
+ | `tests` | `--gate=tests` | Run the test command, pass on exit 0 |
+ | `coverage` | `--gate=coverage --threshold=80` | Run coverage, pass when >= threshold |
+ | `visual` | `--gate=visual` | Check that screenshots have the verified- prefix |
+ | `lint` | `--gate=lint` | Run the lint command, pass on exit 0 |
+ | `types` | `--gate=types` | Run tsc --noEmit, pass on exit 0 |
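As an illustration of the `coverage` gate, assuming it reads an istanbul/nyc-style coverage-summary.json (which exposes `total.lines.pct`); the actual check inside agent-loop.js is not shown in this diff:

```javascript
// Hypothetical coverage gate: map a coverage summary object to the
// loop's exit-code convention (0 = passed, 2 = keep iterating).
function coverageGate(summary, threshold) {
  const pct = summary?.total?.lines?.pct ?? 0;
  return pct >= threshold ? 0 : 2;
}
```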
+
+ ### Monitoring Progress
+
+ Read the event bus for loop status:
+
+ ```bash
+ # Events emitted to: docs/09-agents/bus/log.jsonl
+
+ {"type":"agent_loop","event":"init","loop_id":"abc12345","agent":"agileflow-api","gate":"coverage","threshold":80}
+ {"type":"agent_loop","event":"iteration","loop_id":"abc12345","iter":1,"value":65,"passed":false}
+ {"type":"agent_loop","event":"iteration","loop_id":"abc12345","iter":2,"value":72,"passed":false}
+ {"type":"agent_loop","event":"passed","loop_id":"abc12345","final_value":82,"iterations":3}
+ ```
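A JSONL stream like the one above can be folded into a per-loop status view; a hypothetical reader (the event shapes are copied from the sample, the helper itself is an assumption):

```javascript
// Hypothetical event-bus reader: fold agent_loop events from log.jsonl
// down to the latest state of each loop_id.
function loopStatuses(jsonl) {
  const loops = {};
  for (const line of jsonl.split('\n')) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line);
    if (ev.type !== 'agent_loop') continue;
    const loop = loops[ev.loop_id] || (loops[ev.loop_id] = { status: 'running' });
    if (ev.event === 'iteration') {
      loop.iter = ev.iter;
      loop.value = ev.value;
    } else if (ev.event === 'passed' || ev.event === 'failed') {
      loop.status = ev.event;
    }
  }
  return loops;
}
```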
+
+ ### Safety Limits
+
+ | Limit | Value | Enforced By |
+ |-------|-------|-------------|
+ | Max iterations per agent | 5 | agent-loop.js |
+ | Max concurrent loops | 3 | agent-loop.js |
+ | Timeout per loop | 10 min | agent-loop.js |
+ | Regression abort | 2 consecutive | agent-loop.js |
+ | Stall abort | 5 min no progress | agent-loop.js |
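The regression-abort rule (stop after two consecutive drops in the gate value) can be expressed as a small check over the iteration history; a hypothetical sketch of that rule, not the enforcement code in agent-loop.js:

```javascript
// Hypothetical abort guard: abort after `window` consecutive regressions
// in the gate value, mirroring the "Regression abort" limit above.
function shouldAbortOnRegression(values, window = 2) {
  let consecutive = 0;
  for (let i = 1; i < values.length; i++) {
    // A drop extends the streak; any non-drop resets it.
    consecutive = values[i] < values[i - 1] ? consecutive + 1 : 0;
    if (consecutive >= window) return true;
  }
  return false;
}
```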
+
+ ### Example: Full Feature with Quality Gates
+
+ ```
+ Request: "Implement user profile with API at 80% coverage and UI with visual verification"
+
+ Parallel spawn:
+ - agileflow-api with coverage loop (threshold: 80%)
+ - agileflow-ui with visual loop
+
+ ## Agent Loop Status
+
+ ### API Expert (agileflow-api)
+ - Gate: coverage >= 80%
+ - Iterations: 3
+ - Progress: 65% → 72% → 82% ✓
+ - Status: PASSED
+
+ ### UI Expert (agileflow-ui)
+ - Gate: visual (screenshots verified)
+ - Iterations: 2
+ - Progress: 0/3 → 3/3 verified ✓
+ - Status: PASSED
+
+ ## Synthesis
+
+ Both quality gates satisfied. Feature implementation complete.
+
+ Files created:
+ - src/routes/profile.ts (API)
+ - src/components/ProfilePage.tsx (UI)
+ - tests/profile.test.ts (coverage)
+ - screenshots/verified-profile-*.png (visual)
+ ```
+
+ ### Abort Handling
+
+ If an agent loop fails:
+
+ 1. **Max iterations reached**: Report which gate wasn't satisfied
+ 2. **Regression detected**: Note that quality went down twice in a row
+ 3. **Stalled**: Note no progress for 5+ minutes
+ 4. **Timeout**: Note that the 10-minute limit was exceeded
+
+ ```markdown
+ ## Agent Loop FAILED
+
+ ### API Expert (agileflow-api)
+ - Gate: coverage >= 80%
+ - Final: 72%
+ - Status: FAILED (max_iterations)
+ - Reason: Couldn't reach 80% coverage in 5 iterations
+
+ ### Recommendation
+ - Review uncovered code paths
+ - Consider whether 80% is achievable
+ - May need to reduce the threshold or add more test cases
+ ```