npm - valent-pipeline - Versions diffs - 0.2.19 → 0.2.21 - Mend

valent-pipeline 0.2.19 → 0.2.21

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (115)

package/README.md +438 -0
package/package.json +1 -1
package/pipeline/agents-manifest.yaml +61 -1
package/pipeline/docs/agent-reference.md +82 -23
package/pipeline/docs/design/refactor-checklist.md +111 -0
package/pipeline/docs/index.md +60 -0
package/pipeline/docs/lead-lifecycle.md +1 -1
package/pipeline/docs/pipeline-overview.md +4 -0
package/pipeline/prompts/bend.md +5 -11
package/pipeline/prompts/critic.md +9 -0
package/pipeline/prompts/data.md +59 -0
package/pipeline/prompts/docgen.md +61 -0
package/pipeline/prompts/fend.md +3 -10
package/pipeline/prompts/iac.md +70 -0
package/pipeline/prompts/knowledge.md +2 -0
package/pipeline/prompts/lead.md +97 -6
package/pipeline/prompts/libdev.md +61 -0
package/pipeline/prompts/mcp-dev.md +59 -0
package/pipeline/prompts/mobile.md +92 -0
package/pipeline/prompts/qa-a.md +1 -1
package/pipeline/prompts/qa-b.md +1 -1
package/pipeline/prompts/reqs.md +5 -1
package/pipeline/scripts/db-bootstrap.ts +1 -1
package/pipeline/scripts/embed-sqlite.ts +5 -0
package/pipeline/steps/common/quality-standards.md +19 -0
package/pipeline/steps/critic/data-pipeline.md +28 -0
package/pipeline/steps/critic/document-generation.md +21 -0
package/pipeline/steps/critic/iac.md +29 -0
package/pipeline/steps/critic/library.md +24 -0
package/pipeline/steps/critic/mcp-server.md +24 -0
package/pipeline/steps/critic/mobile-app.md +29 -0
package/pipeline/steps/data/estimate.md +51 -0
package/pipeline/steps/data/handoff.md +9 -0
package/pipeline/steps/data/implement.md +16 -0
package/pipeline/steps/data/read-inputs.md +13 -0
package/pipeline/steps/data/write-tests.md +13 -0
package/pipeline/steps/docgen/estimate.md +49 -0
package/pipeline/steps/docgen/handoff.md +9 -0
package/pipeline/steps/docgen/implement.md +19 -0
package/pipeline/steps/docgen/read-inputs.md +13 -0
package/pipeline/steps/docgen/write-tests.md +15 -0
package/pipeline/steps/iac/estimate.md +50 -0
package/pipeline/steps/iac/handoff.md +9 -0
package/pipeline/steps/iac/implement.md +19 -0
package/pipeline/steps/iac/read-inputs.md +13 -0
package/pipeline/steps/iac/write-tests.md +20 -0
package/pipeline/steps/judge/ship-decision.md +14 -1
package/pipeline/steps/libdev/estimate.md +49 -0
package/pipeline/steps/libdev/handoff.md +9 -0
package/pipeline/steps/libdev/implement.md +19 -0
package/pipeline/steps/libdev/read-inputs.md +13 -0
package/pipeline/steps/libdev/write-tests.md +16 -0
package/pipeline/steps/mcp-dev/estimate.md +49 -0
package/pipeline/steps/mcp-dev/handoff.md +9 -0
package/pipeline/steps/mcp-dev/implement.md +29 -0
package/pipeline/steps/mcp-dev/read-inputs.md +13 -0
package/pipeline/steps/mcp-dev/write-tests.md +19 -0
package/pipeline/steps/mobile/emulator-lifecycle.md +67 -0
package/pipeline/steps/mobile/estimate.md +51 -0
package/pipeline/steps/mobile/flutter.md +30 -0
package/pipeline/steps/mobile/handoff.md +18 -0
package/pipeline/steps/mobile/implement.md +20 -0
package/pipeline/steps/mobile/react-native.md +32 -0
package/pipeline/steps/mobile/read-inputs.md +10 -0
package/pipeline/steps/mobile/write-tests.md +59 -0
package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
package/pipeline/steps/orchestration/sprint-execute.md +3 -2
package/pipeline/steps/orchestration/sprint-groom.md +4 -0
package/pipeline/steps/orchestration/sprint-size.md +26 -16
package/pipeline/steps/orchestration/validate-story-inputs.md +9 -0
package/pipeline/steps/qa-a/data-pipeline.md +32 -0
package/pipeline/steps/qa-a/document-generation.md +52 -0
package/pipeline/steps/qa-a/iac.md +30 -0
package/pipeline/steps/qa-a/library.md +42 -0
package/pipeline/steps/qa-a/mcp-server.md +31 -0
package/pipeline/steps/qa-a/mobile-app.md +59 -0
package/pipeline/steps/qa-b/data-pipeline.md +48 -0
package/pipeline/steps/qa-b/document-generation.md +47 -0
package/pipeline/steps/qa-b/iac.md +44 -0
package/pipeline/steps/qa-b/library.md +61 -0
package/pipeline/steps/qa-b/mcp-server.md +40 -0
package/pipeline/steps/qa-b/mobile-app.md +71 -0
package/pipeline/steps/readiness/standalone-review.md +7 -2
package/pipeline/steps/reqs/data-pipeline.md +56 -0
package/pipeline/steps/reqs/document-generation.md +55 -0
package/pipeline/steps/reqs/draft-brief.md +10 -0
package/pipeline/steps/reqs/iac.md +63 -0
package/pipeline/steps/reqs/library.md +56 -0
package/pipeline/steps/reqs/mcp-server.md +48 -0
package/pipeline/steps/reqs/mobile-app.md +54 -0
package/pipeline/steps/reqs/self-review.md +5 -3
package/pipeline/task-graphs/backend-api.yaml +19 -2
package/pipeline/task-graphs/data-pipeline.yaml +29 -12
package/pipeline/task-graphs/document-generation.yaml +29 -12
package/pipeline/task-graphs/frontend-only.yaml +19 -2
package/pipeline/task-graphs/fullstack-web.yaml +19 -2
package/pipeline/task-graphs/library.yaml +29 -12
package/pipeline/task-graphs/mcp-server.yaml +29 -12
package/pipeline/task-graphs/mobile-app.yaml +171 -0
package/pipeline/templates/bugs.template.md +1 -1
package/pipeline/templates/critic-review.template.md +1 -1
package/pipeline/templates/data-handoff.template.md +96 -0
package/pipeline/templates/docgen-handoff.template.md +83 -0
package/pipeline/templates/iac-handoff.template.md +83 -0
package/pipeline/templates/judge-decision.template.md +11 -1
package/pipeline/templates/libdev-handoff.template.md +82 -0
package/pipeline/templates/mcp-dev-handoff.template.md +87 -0
package/pipeline/templates/mobile-handoff.template.md +122 -0
package/pipeline/templates/reqs-brief.template.md +60 -4
package/skills/valent-run-deferred-tests/SKILL.md +109 -0
package/skills/valent-run-epic/SKILL.md +1 -1
package/skills/valent-run-project/SKILL.md +1 -1
package/src/commands/db-rebuild.js +5 -0
package/src/lib/config-schema.js +1 -1
package/src/lib/db.js +1 -1

package/pipeline/steps/mcp-dev/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# MCP-DEV Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, tool definitions (names, descriptions, inputSchema), transport requirements (stdio/SSE/HTTP), capability declarations, error handling expectations, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, protocol compliance verification requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting MCP-DEV. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am MCP-DEV implementing {story_id} using {tech_stack.mcp_sdk} with {tech_stack.transport_type} transport.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.

package/pipeline/steps/mcp-dev/write-tests.md ADDED

@@ -0,0 +1,19 @@
+# MCP-DEV Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `mcp-dev-handoff.md#test-files-written`.
+**Critical requirement: real transport, no mocked transport.** Tests must spawn a real MCP server instance and communicate over the actual transport (stdio pipe, SSE connection, or HTTP). Do not mock the transport layer. The test client sends real JSON-RPC messages and asserts on real responses.
+## Step 11: Test protocol compliance
+Tests must cover the full protocol handshake and lifecycle:
+1. `initialize` request returns correct server info and capabilities
+2. `tools/list` returns all registered tools with correct inputSchema
+3. `tools/call` for each tool with valid params returns expected result shape
+4. `tools/call` with invalid params returns JSON-RPC `-32602`
+5. `tools/call` triggering tool failure returns result with `isError: true`
+6. Unknown method returns JSON-RPC `-32601`
+7. Malformed JSON returns JSON-RPC `-32700`
+## Step 12: Run tests, verify all pass
+Run the full test suite. All tests must pass. Record results in `mcp-dev-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
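
The seven protocol-compliance cases in Step 11 can each be reduced to one expected outcome label. The following is a minimal sketch, assuming a simplified response shape; `JsonRpcResponse`, `RpcOutcome`, and `classifyResponse` are hypothetical names for illustration and are not part of the package or of any MCP SDK. Only the three error codes and the `isError` flag come from the step file.

```typescript
// JSON-RPC 2.0 error codes named in Step 11.
const PARSE_ERROR = -32700;
const METHOD_NOT_FOUND = -32601;
const INVALID_PARAMS = -32602;

// Simplified response shape (assumption); a real SDK exposes richer types.
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: { isError?: boolean; [key: string]: unknown };
  error?: { code: number; message: string };
}

type RpcOutcome =
  | "success"
  | "tool-failure"
  | "invalid-params"
  | "method-not-found"
  | "parse-error"
  | "other-error";

// Hypothetical helper: map a response received over the real transport
// to a single label that a test can assert on.
function classifyResponse(res: JsonRpcResponse): RpcOutcome {
  if (res.error) {
    switch (res.error.code) {
      case INVALID_PARAMS:
        return "invalid-params";
      case METHOD_NOT_FOUND:
        return "method-not-found";
      case PARSE_ERROR:
        return "parse-error";
      default:
        return "other-error";
    }
  }
  // Tool failures are reported inside a normal result, not as a protocol error.
  return res.result?.isError === true ? "tool-failure" : "success";
}
```

A test for case 5 would then send a real `tools/call` that makes the tool fail and assert that the classified outcome is `"tool-failure"` rather than a protocol-level error.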

package/pipeline/steps/mobile/emulator-lifecycle.md ADDED

@@ -0,0 +1,67 @@
+# MOBILE Step: Emulator Lifecycle Management
+## Step 7b: Boot Emulator
+### Android Emulator
+1. List available AVDs: `emulator -list-avds`
+2. Boot emulator: `emulator -avd {avd_name} -no-snapshot-load -no-audio -no-window &`
+3. Wait for boot: `adb wait-for-device` then poll `adb shell getprop sys.boot_completed` until it returns `1` (max 120s, 10 retries at 12s intervals)
+4. If boot fails after 120s: kill process (`adb emu kill`), retry once with fresh boot. If second attempt fails, file `[BLOCKER]` to Lead with emulator logs.
+Record emulator config in `mobile-handoff.md#emulator-configuration`.
+### iOS Simulator (Mac Only)
+1. Verify Mac host: `uname -s` must return `Darwin`. If not Mac, skip iOS entirely.
+2. List available simulators: `xcrun simctl list devices available`
+3. Boot simulator: `xcrun simctl boot {device_udid}`
+4. Wait for boot: poll `xcrun simctl list devices | grep Booted` (max 60s)
+5. If boot fails: `xcrun simctl shutdown all`, retry once. If second attempt fails, file `[BLOCKER]` to Lead.
+Record simulator config in `mobile-handoff.md#emulator-configuration`.
+## Step 7c: Build and Install App
+### React Native
+1. Start Metro bundler: `npx react-native start --reset-cache &`
+2. Wait for Metro ready: poll for `http://localhost:8081/status` returning `packager-status:running` (max 60s). Handle port conflicts by checking if port 8081 is in use.
+3. Android build + install: `npx react-native run-android`
+4. iOS build + install (Mac only): `npx react-native run-ios --simulator="{simulator_name}"`
+5. Verify main activity/screen renders within 10s of launch. If not, capture `adb logcat` output and file P1 bug.
+### Flutter
+1. Resolve dependencies: `flutter pub get`
+2. Android build + install: `flutter build apk --debug && flutter install --device-id {emulator_id}`
+3. iOS build + install (Mac only): `flutter build ios --debug --simulator && flutter install --device-id {simulator_id}`
+4. Verify app launches and main screen renders within 10s.
+### Native Module Recovery (React Native)
+If native module errors occur during build:
+- iOS: run `cd ios && pod install && cd ..` and retry build
+- Android: run `cd android && ./gradlew clean && cd ..` and retry build
+- If native module build fails after retry, file P1 bug with full build output.
+## Step 7d: State Isolation Between Maestro Flows
+Before each Maestro flow execution:
+- **Android:** `adb shell pm clear {app_package_name}`
+- **iOS:** `xcrun simctl terminate {device_udid} {bundle_id}` followed by `xcrun simctl privacy {device_udid} reset all {bundle_id}`
+This ensures no state leakage between test flows. Every flow starts from a clean app state.
+## Step 7e: Pre-Grant Permissions
+Before test execution, pre-grant required permissions to avoid UI dialog interference:
+- **Android:** `adb shell pm grant {package} android.permission.{PERMISSION}` for each required permission
+- **iOS:** `xcrun simctl privacy {device_udid} grant {permission-type} {bundle_id}`
+Never depend on UI dialogs for permission grants during E2E tests.
+## Step 7f: Crash Recovery
+If emulator/simulator crashes or becomes unresponsive during test execution:
+1. Detect via `adb devices` showing offline or Maestro flow timeout
+2. Capture crash logs: `adb logcat -d > crash-{timestamp}.log`
+3. Kill stale processes: `adb emu kill` / `xcrun simctl shutdown all`
+4. Re-boot with clean state (Step 7b)
+5. Resume from the last incomplete Maestro flow (do not re-run passed flows)
+6. Max 2 crash recovery attempts per platform. After 2 crashes, file P1 bug with crash logs and stop testing on that platform.
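
The boot wait in Step 7b is a bounded polling loop. The following is a minimal sketch, assuming a synchronous script; `pollUntilReady` and `isBootCompleted` are hypothetical helpers, and the probe and sleep callbacks are injected so the loop can be exercised without a device.

```typescript
// Result of a bounded polling loop.
interface PollResult {
  ready: boolean;
  attempts: number;
}

// Calls `probe` up to `maxAttempts` times and sleeps between failed attempts.
// Both callbacks are injected (assumption) so no emulator is needed to test it.
function pollUntilReady(
  probe: () => boolean,
  maxAttempts: number,
  sleep: () => void,
): PollResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (probe()) return { ready: true, attempts: attempt };
    // No sleep after the final failed attempt.
    if (attempt < maxAttempts) sleep();
  }
  return { ready: false, attempts: maxAttempts };
}

// Hypothetical parser for the output of `adb shell getprop sys.boot_completed`.
function isBootCompleted(getpropOutput: string): boolean {
  return getpropOutput.trim() === "1";
}
```

In a real script the probe would wrap the `adb shell getprop sys.boot_completed` call and the sleep would wait 12 seconds, with `maxAttempts` set to 10 as in Step 7b. A result with `ready: false` corresponds to the kill-and-retry branch.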

package/pipeline/steps/mobile/estimate.md ADDED

@@ -0,0 +1,51 @@
+# Mobile Estimation
+**Purpose:** Assign a Fibonacci story point estimate for mobile implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` — REQUIRED
+- `{story_output_dir}/uxa-spec.md` — REQUIRED (if UI profile active)
+- `{story_output_dir}/qa-test-spec.md` — REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **Screen count** | How many new or modified screens? Simple displays vs complex interactive screens? | High |
+| **Navigation complexity** | Deep linking, nested stacks/tabs/drawers, modal flows, conditional navigation? | High |
+| **Platform-specific requirements** | Android-only vs cross-platform? Platform-divergent behavior? | Medium |
+| **Native module integration** | Camera, GPS, push notifications, biometrics, file system? | Medium |
+| **State management complexity** | Local state vs global state? Offline persistence? Optimistic updates? | Medium |
+| **API integration surface** | Number of endpoints consumed, real-time updates, file uploads? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical Mobile Scope |
+|--------|---------------------|
+| 1 | Text change, style tweak, single prop addition |
+| 2 | Simple display screen, minor layout change |
+| 3 | Interactive screen with local state, form with validation |
+| 5 | Multi-screen feature, navigation setup, API integration |
+| 8 | Complex interactive feature, cross-platform divergence, native modules |
+| 13 | Large feature with offline support, complex navigation, extensive platform handling |
+| 21 | Epic-scale: new navigation paradigm or major platform integration (consider splitting) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints.
+## Step 4: Write Estimate
+Write to `{story_output_dir}/mobile-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] MOBILE estimates {story_id} at {points} points. See mobile-estimation.md.`
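
The estimate written in Step 4 has to be one of the seven values on the Fibonacci scale. The following is a small sketch of a guard that rejects off-scale values before the `[ESTIMATION]` message is built; `isFibonacciPoints` and `formatEstimationMessage` are hypothetical helper names, and only the scale and the message wording come from the step file.

```typescript
// The scale listed in the step file.
const FIBONACCI_SCALE: readonly number[] = [1, 2, 3, 5, 8, 13, 21];

// True when `value` is one of the allowed story point values.
function isFibonacciPoints(value: number): boolean {
  return FIBONACCI_SCALE.indexOf(value) !== -1;
}

// Builds the inbox message from Step 4; throws on an off-scale estimate.
function formatEstimationMessage(storyId: string, points: number): string {
  if (!isFibonacciPoints(points)) {
    throw new Error(
      `${points} is not on the Fibonacci scale (${FIBONACCI_SCALE.join(", ")})`,
    );
  }
  return `[ESTIMATION] MOBILE estimates ${storyId} at ${points} points. See mobile-estimation.md.`;
}
```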

package/pipeline/steps/mobile/flutter.md ADDED

@@ -0,0 +1,30 @@
+# MOBILE Step: Flutter Specifics
+This step is loaded conditionally when `{tech_stack.mobile_framework}` is `flutter`. Read before implementing.
+## Flutter Build Configuration
+- Debug builds for testing: `flutter build apk --debug` / `flutter build ios --debug --simulator`
+- Resolve dependencies before build: `flutter pub get`
+- Verify Flutter SDK version matches project constraints in `pubspec.yaml`
+- Hot reload: disable during E2E (use cold start for each Maestro flow via `clearState`)
+## Flutter Testing Patterns
+- Widget tests: use `flutter_test` with `WidgetTester` for component isolation
+- Integration tests: use Maestro YAML flows (NOT `flutter_driver` or `integration_test` for pipeline E2E)
+- State management: clear providers/blocs/cubits between test suites
+- Platform channels: test with real native code, not mock method channel handlers
+## Flutter-Specific Emulator Setup
+- Android: standard AVD boot, then `flutter install --device-id {emulator_id}`
+- iOS: `open -a Simulator` if not already running, then `flutter install --device-id {simulator_id}`
+- Verify device connection: `flutter devices` must list the target device
+## Area Labels for Testing
+- Use `Key` with `ValueKey('testID')` for Flutter widgets
+- Maestro `tapOn` with `id:` selector reads the `ValueKey` on both platforms
+- Follow the area label convention from uxa-spec.md: `{screen}-{section}-{element}`
+## Offline Testing
+- Use emulator console commands for network simulation: `adb emu network delay gprs` / `adb emu network speed gsm`
+- Do NOT use `adb shell svc wifi disable` (unreliable on emulators)
+- Test offline-capable features per reqs-brief offline requirements

package/pipeline/steps/mobile/handoff.md ADDED

@@ -0,0 +1,18 @@
+# MOBILE Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 11: Write mobile-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/mobile-handoff.template.md`. Set `status: completed` in frontmatter.
+If iOS tests were deferred (host is not Mac):
+- Set `ios_deferred: true` in frontmatter
+- Complete the `Deferred iOS Tests` section listing all unexecuted iOS flows
+- Include in the inbox message: `[IOS-DEFERRED] {count} iOS Maestro flows deferred. Run /run-deferred-tests on Mac to complete.`
+Notify lead via inbox: `[DONE] Mobile implementation complete. See mobile-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+All Android tests must pass before marking complete. If on Mac, all iOS tests must also pass. Do not mark complete with failing Android tests. Do not rely on BEND or CRITIC to catch your failures.
+**Smoke test gate:** The app-level smoke test (Step 9b) must pass before sending `[DONE]`. If it fails, the app's entry point is not wired to your deliverable -- fix the wiring before marking complete.

package/pipeline/steps/mobile/implement.md ADDED

@@ -0,0 +1,20 @@
+# MOBILE Step: Implement
+## Step 3: Detect host platform
+Run platform detection to determine available targets:
+- `uname -s` returns `Darwin` → Mac: both Android and iOS targets available
+- `uname -s` returns `Linux` or `MINGW*`/`MSYS*` → Windows/Linux: Android only, iOS deferred
+Record platform capabilities in `mobile-handoff.md#platform-coverage`. If iOS is unavailable, set `ios_deferred: true` in handoff frontmatter.
+## Step 4: Plan screen architecture
+From uxa-spec.md screen specifications (if present) or reqs-brief.md: identify screens, navigation structure (stack, tab, drawer), shared components, deep link URI patterns. Map to framework conventions for `{tech_stack.mobile_framework}`.
+## Step 5: Implement screens and navigation
+Per spec: create screen components, navigation setup (React Navigation / Flutter Navigator), deep linking configuration. Apply `testID` attributes matching the area label system from uxa-spec.md. Record in `mobile-handoff.md#screens-implemented`.
+## Step 6: Implement components
+Per spec: forms, lists, modals, gesture handlers, platform-specific components. Wire to backend API endpoints per `bend-handoff.md#api-endpoints-implemented` (if BEND is active). Record in `mobile-handoff.md#components-created`.
+## Step 7: Implement platform-specific behavior
+Handle platform divergences: permissions (camera, location, notifications), native modules, platform-specific UI (Android back button, iOS swipe-to-go-back, safe areas, notch handling). Use `Platform.OS` / `Platform.select` for divergent behavior. Record decisions in `mobile-handoff.md#implementation-decisions`.
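
The host detection in Step 3 amounts to a mapping from the `uname -s` output to the available targets. The following is a rough sketch of that mapping; `PlatformCoverage` and `detectTargets` are hypothetical names. It assumes that any value other than `Darwin` means Android only, which is broader than the `Linux` and `MINGW*`/`MSYS*` cases the step file lists.

```typescript
// Targets available on the current host (field names are assumptions).
interface PlatformCoverage {
  android: boolean;
  ios: boolean;
  iosDeferred: boolean;
}

// Maps the raw output of `uname -s` to platform coverage.
// Assumption: every non-Darwin host is treated as Android only.
function detectTargets(unameOutput: string): PlatformCoverage {
  const isMac = unameOutput.trim() === "Darwin";
  return { android: true, ios: isMac, iosDeferred: !isMac };
}
```

The `iosDeferred` field corresponds to the `ios_deferred: true` frontmatter value that the step asks for when iOS is unavailable.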

package/pipeline/steps/mobile/react-native.md ADDED

@@ -0,0 +1,32 @@
+# MOBILE Step: React Native Specifics
+This step is loaded conditionally when `{tech_stack.mobile_framework}` is `react-native`. Read before implementing.
+## Metro Bundler Management
+- Start Metro before any test execution: `npx react-native start --reset-cache &`
+- Monitor Metro for JS bundle errors. If bundling fails, fix the code and retry.
+- Handle port conflicts: check if port 8081 is in use before starting (`lsof -i :8081` on Mac/Linux, `netstat -ano | findstr 8081` on Windows). Kill stale Metro processes if needed.
+- Kill Metro after all tests complete.
+## React Native Build Configuration
+- Use `react-native.config.js` for native module auto-linking
+- Hermes engine: verify Hermes is enabled for Android (check `android/app/build.gradle` for `enableHermes: true`)
+- Flipper: disable in release builds, optional in debug
+- Fast Refresh: disable during E2E to avoid test flakiness (`--no-interactive` flag)
+## React Native Testing Patterns
+- Component tests: use React Native Testing Library (`@testing-library/react-native`)
+- Navigation tests: verify deep link resolution via React Navigation linking config, verify screen transitions
+- Platform-specific code: test `.android.tsx` and `.ios.tsx` variants separately when they exist
+- AsyncStorage / MMKV: clear between test suites to prevent state leakage
+## Native Module Considerations
+- If the story uses native modules, verify auto-linking succeeded on both platforms
+- Bridge calls must be tested with real native code, not mocks
+- Pod install (iOS): `cd ios && pod install` after adding native dependencies
+- Gradle sync (Android): `cd android && ./gradlew --refresh-dependencies` if native module issues
+## Area Labels for Testing
+- Use `testID` prop for React Native components (maps to `accessibilityIdentifier` on iOS, `resource-id` on Android)
+- Maestro `tapOn` uses `id:` selector which reads `testID` on both platforms
+- Follow the area label convention from uxa-spec.md: `{screen}-{section}-{element}`
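
The `{screen}-{section}-{element}` convention lends itself to a small builder that produces `testID` values consistently. The following is a sketch only: the step file does not define the allowed characters, so the pattern below (three lowercase alphanumeric segments) is an assumption, and `buildAreaLabel` is a hypothetical helper.

```typescript
// Assumed format: exactly three lowercase alphanumeric segments joined by hyphens.
const AREA_LABEL_PATTERN = /^[a-z0-9]+-[a-z0-9]+-[a-z0-9]+$/;

// Builds a testID from its three parts and rejects values that break the pattern.
function buildAreaLabel(screen: string, section: string, element: string): string {
  const label = [screen, section, element].join("-");
  if (!AREA_LABEL_PATTERN.test(label)) {
    throw new Error(`Invalid area label: ${label}`);
  }
  return label;
}
```

A component would then pass something like `buildAreaLabel("login", "form", "submit")` as its `testID`, and the Maestro flow would target the same string through the `id:` selector.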

package/pipeline/steps/mobile/read-inputs.md ADDED

@@ -0,0 +1,10 @@
+# MOBILE Step: Read Inputs
+## Step 1: Read inputs
+Read `reqs-brief.md`, `uxa-spec.md` (if UI profile active), and `qa-test-spec.md`. Understand: acceptance criteria, screen specifications, navigation flows, component hierarchy, platform-specific behaviors, Maestro flow specifications, test specifications.
+## Step 2: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting MOBILE. Note any conflicts with default behavior and follow the directive.
+## Step 2b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What mobile patterns, navigation conventions, and platform-specific constraints should I know? Context: I am MOBILE implementing {story_id} using {tech_stack.mobile_framework}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.

package/pipeline/steps/mobile/write-tests.md ADDED

@@ -0,0 +1,59 @@
+# MOBILE Step: Write Tests
+## Step 8: Write Maestro YAML test flows
+For each AC in qa-test-spec.md, write a Maestro YAML flow file. Place flows in `e2e/maestro/` directory.
+Each flow file structure:
+```yaml
+appId: {app_package_name}
+name: {descriptive flow name}
+---
+- clearState
+- launchApp
+# ... test steps: tapOn, assertVisible, inputText, scroll, back, swipe, etc.
+```
+Rules:
+- Every flow must start with `clearState` and `launchApp`
+- Use `assertVisible` and `assertNotVisible` for assertions, not fixed-time waits
+- Use `waitForAnimationToEnd` instead of hardcoded `extendedWaitUntil` timeouts
+- Deep link tests: use `openLink` command with the URI pattern from reqs-brief
+- Screenshot capture: use `takeScreenshot` at assertion points for evidence
+Record in `mobile-handoff.md#maestro-flow-files`.
+## Step 8b: Write unit tests
+Write unit tests per qa-test-spec.md using `{tech_stack.test_framework_unit}`. Unit tests MAY mock API clients for isolated component logic. Every mocked unit test for an API-calling AC must be paired with a real-API Maestro flow for the same AC. Record in `mobile-handoff.md#test-files-written`.
+## Step 9: Run unit tests, verify all pass
+Run the unit test suite. All tests must pass. Record results in `mobile-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
+## Step 9b: App-Level Smoke Test
+Write one test that bootstraps the application from its entry point and asserts the story's deliverable is present and reachable. This catches "unwired entry point" bugs where a screen exists but is never registered in the navigation. Mandatory for the first mobile story in a project, recommended for all subsequent stories.
+Record in `mobile-handoff.md#test-files-written`.
+## Step 10: Run Maestro flows
+### Android (always)
+For each flow file:
+1. State isolation (Step 7d)
+2. Execute: `maestro test {flow_file}`
+3. Record per-flow result (pass/fail with output)
+### iOS (Mac only)
+For each flow file tagged as `both` or `ios`:
+1. State isolation (Step 7d, iOS variant)
+2. Execute: `maestro test {flow_file} --device {ios_simulator_name}`
+3. Record per-flow result
+### iOS Deferred (Windows/Linux)
+If not on Mac, record all iOS-targeted flows in `mobile-handoff.md#deferred-ios-tests` with reason "Host OS lacks iOS simulator". This is expected, not a bug.
+E2E tests run serially against the single emulator -- the emulator is shared mutable state. The 1.5-minute timeout per story applies to test execution time excluding emulator boot time.
+## Step 10b: Signal integration readiness
+When mobile code is complete and all available-platform tests pass, send to BEND via inbox:
+`[INTEGRATION-READY] Mobile code complete. Run integration tests against my app.`
+Wait for BEND's `[INTEGRATION-READY]` message before running integration verification. Once both sides are ready, verify that API calls from the app resolve correctly against BEND's running server.
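
The first rule in Step 8 (every flow starts with `clearState` and `launchApp`) can be checked mechanically before any flow is run. The following is a naive, line-based sketch rather than a real YAML parse; `listFlowCommands` and `startsFromCleanState` are hypothetical helpers, and the check assumes top-level commands are unindented `- ` entries after the `---` separator, as in the flow structure shown in Step 8.

```typescript
// Returns the top-level command names of a Maestro flow, in order.
// Assumption: commands are unindented "- ..." lines after the "---" separator.
function listFlowCommands(flowYaml: string): string[] {
  const lines = flowYaml.split(/\r?\n/);
  const separator = lines.findIndex((line) => line.trim() === "---");
  return lines
    .slice(separator + 1)
    .filter((line) => line.startsWith("- "))
    .map((line) => (line.slice(2).split(":")[0] ?? "").trim());
}

// True when the flow begins with clearState followed by launchApp.
function startsFromCleanState(flowYaml: string): boolean {
  const commands = listFlowCommands(flowYaml);
  return commands[0] === "clearState" && commands[1] === "launchApp";
}
```

A pre-run check could apply `startsFromCleanState` to each file under `e2e/maestro/` and fail fast on any flow that skips the reset.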

package/pipeline/steps/orchestration/adopt-lead-and-create-team.md CHANGED

@@ -99,7 +99,7 @@ Otherwise, substitute variables in the knowledge spawn template (`{{story_id}}`,
 | Wave | Spawn Trigger | Agents |
 |---|---|---|
-| 2 | QA-A sends `[HANDOFF]` (completes) | BEND, FEND, CRITIC |
+| 2 | QA-A sends `[HANDOFF]` (completes) | BEND, FEND, DATA, MCP-DEV, LIBDEV, DOCGEN, IAC, CRITIC (each only if not skipped by testing_profiles) |
 | 3 | CRITIC task becomes `in_progress` | QA-B, PMCP (if ui profile) |
 | 4 | JUDGE bug-review task becomes `in_progress` | (reserved) |

package/pipeline/steps/orchestration/sprint-execute.md CHANGED

@@ -2,7 +2,7 @@
 **Condition:** Only execute in sprint mode (`{is_sprint_mode}` is true).
-Execute sprint stories sequentially. Phase 2 agents (BEND, FEND, CRITIC, QA-B, JUDGE) are spawned fresh per story and killed after each.
+Execute sprint stories sequentially. BEND/FEND persist from the sizing phase into story 1. CRITIC, QA-B, and JUDGE are spawned fresh for story 1. For story 2+, all Phase 2 agents are spawned fresh and killed after each.
 ## For Each Sprint Story (Sequential)
@@ -22,7 +22,8 @@ For each story in sprint order:
 1. Update story status to `development` in both `{backlog_path}` and `sprint-{n}-status.yaml`
 2. Execute the standard story flow:
    - Create story branch
-   - Spawn Phase 2 agents per the task graph (BEND, FEND, CRITIC, QA-B, JUDGE)
+   - **Story 1:** BEND/FEND already alive from sizing — spawn only CRITIC, QA-B, JUDGE fresh. Send implementation context to existing BEND/FEND.
+   - **Story 2+:** Spawn all Phase 2 agents fresh per the task graph (BEND, FEND, CRITIC, QA-B, JUDGE)
    - Agents query Knowledge/SQLite for grooming context (NOT in-context from Phase 1)
    - Monitor execution, handle rejections, gate verdicts
    - On JUDGE SHIP: merge branch, record actuals

package/pipeline/steps/orchestration/sprint-groom.md CHANGED

@@ -14,6 +14,10 @@ For each pending story in the grooming batch:
    - `api` — story has API endpoints, backend logic, or database changes
    - `ui` — story has UI components, pages, or visual elements
    - `data-pipeline` — story has ETL, data transformation, or batch processing
+   - `mcp-server` — story has MCP server tools, handlers, or protocol work
+   - `library` — story is shared library/package (exports, packaging, versioning)
+   - `document-generation` — story has document/report template or generation pipeline work
+   - `iac` — story has infrastructure work (Terraform, CloudFormation, Kubernetes, CI/CD)
 3. Write `testing_profiles: [api, ui]` (or whichever apply) to the story's backlog entry
 This must complete before Step 1. Downstream agents rely on `testing_profiles` to determine conditional steps.

package/pipeline/steps/orchestration/sprint-size.md CHANGED

@@ -2,33 +2,43 @@
 **Condition:** Only execute in sprint mode (`{is_sprint_mode}` is true).
-## Step 1: Spawn Estimation Agents
+## Step 1: Spawn Developer Agents
-Spawn BEND with `.valent-pipeline/steps/bend/estimate.md` step file only (no implementation tools).
+Scan groomed stories' `testing_profiles` in `{backlog_path}`:
-If any groomed stories have `fullstack-web` or `frontend-only` surface, also spawn FEND with `.valent-pipeline/steps/fend/estimate.md`.
+- Spawn BEND if any groomed story has `api` in `testing_profiles`
+- Spawn FEND if any groomed story has `ui` in `testing_profiles`
+- Spawn DATA if any groomed story has `data-pipeline` in `testing_profiles`
+- Spawn MCP-DEV if any groomed story has `mcp-server` in `testing_profiles`
+- Spawn LIBDEV if any groomed story has `library` in `testing_profiles`
+- Spawn DOCGEN if any groomed story has `document-generation` in `testing_profiles`
+- Spawn IAC if any groomed story has `iac` in `testing_profiles`
-Pass `{estimation_model}` and `{correction_directives}` (calibration directives) in the spawn context.
+Spawn with their normal prompt template and pass `.valent-pipeline/steps/{agent}/estimate.md` as the first step. Pass `{estimation_model}` and `{correction_directives}` (calibration directives) in the spawn context.
+These agents persist into execution — they are NOT killed after sizing.
 ## Step 2: Size Each Groomed Story
 For each story with status `groomed`:
 1. Update status to `sizing` in `{backlog_path}`
-2. Send story context (reqs-brief, uxa-spec, qa-test-spec) to BEND
-3. BEND writes `bend-estimation.md` with Fibonacci points
-4. If full-stack: FEND writes `fend-estimation.md` with Fibonacci points
-5. **Record points:**
-   - Backend-only: `story_points = BEND estimate`
-   - Full-stack: `story_points = BEND estimate + FEND estimate`
-   - Data-pipeline: `story_points = BEND estimate`
+2. Read story's `testing_profiles` from `{backlog_path}`
+3. Dispatch based on profiles — send story context to **every agent whose profile is present**:
+   - `api` in profiles → send to BEND
+   - `ui` in profiles → send to FEND
+   - `data-pipeline` in profiles → send to DATA
+   - `mcp-server` in profiles → send to MCP-DEV
+   - `library` in profiles → send to LIBDEV
+   - `document-generation` in profiles → send to DOCGEN
+   - `iac` in profiles → send to IAC
+   Multiple profiles can be active (e.g., `[api, data-pipeline]` sends to both BEND and DATA).
+4. Agents write estimation files (`{agent}-estimation.md`)
+5. **Record points:** sum all agent estimates for the story.
+   `story_points = sum of all agent estimates received`
 6. Update story's `story_points` field in `{backlog_path}`
-## Step 3: Kill Estimation Agents
-In epic/project mode: kill BEND and FEND after sizing all stories. They will be respawned fresh per story during execution (each story needs clean code context).
-## Step 4: Update Sprint State
+## Step 3: Update Sprint State
 Update `pipeline-state.json`: `current_sprint.phase = "planning"`.
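
The revised sizing logic in this hunk replaces the per-surface rules with a profile-to-agent dispatch followed by a plain sum. The following is a minimal sketch of that logic, using the mapping listed in Step 2; `PROFILE_TO_AGENT`, `agentsForProfiles`, `AgentEstimate`, and `sumStoryPoints` are hypothetical names, and silently skipping unknown profiles is an assumption.

```typescript
// Profile-to-agent mapping as listed in the revised Step 2.
const PROFILE_TO_AGENT: Record<string, string> = {
  "api": "BEND",
  "ui": "FEND",
  "data-pipeline": "DATA",
  "mcp-server": "MCP-DEV",
  "library": "LIBDEV",
  "document-generation": "DOCGEN",
  "iac": "IAC",
};

// Agents that should receive the story context for sizing.
// Assumption: profiles without a mapping are skipped rather than rejected.
function agentsForProfiles(profiles: string[]): string[] {
  return profiles
    .map((profile) => PROFILE_TO_AGENT[profile])
    .filter((agent): agent is string => agent !== undefined);
}

// One estimate per agent, as read from `{agent}-estimation.md`.
interface AgentEstimate {
  agent: string;
  points: number;
}

// story_points = sum of all agent estimates received.
function sumStoryPoints(estimates: AgentEstimate[]): number {
  return estimates.reduce((total, estimate) => total + estimate.points, 0);
}
```

For a story with `testing_profiles: [api, data-pipeline]`, the dispatch yields BEND and DATA, and estimates of 5 and 3 points give `story_points = 8`.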

package/pipeline/steps/orchestration/validate-story-inputs.md CHANGED

@@ -33,11 +33,20 @@ Based on the story scope and project type, determine which testing profiles are
 | Story has API endpoints (backend routes, REST/GraphQL) | `api` |
 | Story has UI components (pages, components, visual changes) | `ui` |
 | Story has data pipeline work (ETL, transformations, migrations) | `data-pipeline` |
+| Story has MCP server tools, handlers, or protocol work | `mcp-server` |
+| Story is a shared library/package (exports, packaging, versioning) | `library` |
+| Story has document/report template or generation pipeline work | `document-generation` |
+| Story has infrastructure work (Terraform, CloudFormation, Kubernetes, CI/CD) | `iac` |
 Multiple profiles can be active. Examples:
 - Backend-only story: `[api]`
 - Frontend-only story: `[ui]`
 - Fullstack story with both API and UI work: `[api, ui]`
 - Data pipeline story: `[data-pipeline]`
+- MCP server story: `[mcp-server]`
+- Library/package story: `[library]`
+- Document generation story: `[document-generation]`
+- Infrastructure story: `[iac]`
+- Fullstack story with infrastructure: `[api, ui, iac]`
 Set `{testing_profiles}` for use in shared context.
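
A check that a story's `testing_profiles` contains only the profile names listed in the table above might look like the sketch below. The function name `checkTestingProfiles` and the rule that an empty list is rejected are assumptions made for illustration; the step text does not state either.

```typescript
// Profile names from the table above.
const KNOWN_PROFILES: string[] = [
  "api", "ui", "data-pipeline", "mcp-server",
  "library", "document-generation", "iac",
];

// Hypothetical validator: reports any profile name not in the known set.
// Assumption: a story must carry at least one profile.
function checkTestingProfiles(profiles: string[]): { ok: boolean; unknown: string[] } {
  const unknown = profiles.filter((p) => KNOWN_PROFILES.indexOf(p) === -1);
  return { ok: profiles.length > 0 && unknown.length === 0, unknown };
}

console.log(checkTestingProfiles(["api", "ui", "iac"]));
console.log(checkTestingProfiles(["api", "web"]));
```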

package/pipeline/steps/qa-a/data-pipeline.md ADDED

@@ -0,0 +1,32 @@
+# QA-A Step: Data Pipeline Testing
+## Pipeline Smoke Test Specification
+For every pipeline stage in this story, write a **Pipeline Smoke Test** table:
+```
+## Pipeline Smoke Tests
+| ID | Input Dataset | Transform Step | Expected Output | Row Count Delta | Idempotency Check |
+```
+Rules:
+- One row per transform stage (ingest, each transform, output)
+- Input dataset: exact description of seed data (file path, format, row count, key characteristics)
+- Transform step: the specific stage being tested
+- Expected output: key fields, values, and format QA-B must verify
+- Row count delta: expected rows in vs rows out with reason for any difference
+- Idempotency check: "Run twice, assert identical output" for every write stage
+- Minimum per pipeline: one happy path per stage, one null/malformed input, one empty input
+- Every filter/join stage MUST have a row asserting dropped rows are logged with reason
+- Checkpoint/resume: at least one test row that simulates mid-pipeline failure and verifies resume produces correct final output
+## Quality Gate Additions
+- [ ] Smoke test table covers every pipeline stage (ingest, transform, output)
+- [ ] Every filter/join has a row count delta assertion with drop reason verification
+- [ ] Idempotency test specified for every write stage
+- [ ] Null and malformed input test cases included
+- [ ] Empty input test case included
+- [ ] Checkpoint/resume test case included (if pipeline supports checkpointing)
+- [ ] Row counts verified at each stage boundary
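
The row count delta and idempotency rules above can be illustrated with a small in-memory sketch. Everything here is hypothetical: `filterValidEmails` stands in for a filter stage, `writeRows` for a write stage that upserts by `id`, and the `Row` shape is invented for the example. A real pipeline would run these checks against its own stages and storage.

```typescript
type Row = { id: number; email: string | null };
type Drop = { row: Row; reason: string };
type StageResult = { rows: Row[]; dropped: Drop[] };

// Hypothetical filter stage: drops rows with a null email and records a reason per drop.
function filterValidEmails(input: Row[]): StageResult {
  const rows: Row[] = [];
  const dropped: Drop[] = [];
  for (const row of input) {
    if (row.email === null) dropped.push({ row, reason: "null email" });
    else rows.push(row);
  }
  return { rows, dropped };
}

// Row count delta: rows in minus rows out must equal the number of logged drops.
function rowCountDeltaExplained(rowsIn: number, result: StageResult): boolean {
  return rowsIn - result.rows.length === result.dropped.length;
}

// Hypothetical write stage: upsert by id into an in-memory target.
function writeRows(target: Record<number, Row>, rows: Row[]): void {
  for (const row of rows) target[row.id] = row;
}

// Idempotency check: run the write twice, assert the target is identical afterwards.
function writeIsIdempotent(rows: Row[]): boolean {
  const target: Record<number, Row> = {};
  writeRows(target, rows);
  const afterFirstRun = JSON.stringify(target);
  writeRows(target, rows);
  return JSON.stringify(target) === afterFirstRun;
}

const seed: Row[] = [
  { id: 1, email: "a@example.com" },
  { id: 2, email: null },
  { id: 3, email: "c@example.com" },
];
const filtered = filterValidEmails(seed);
console.log(filtered.rows.length, filtered.dropped.map((d) => d.reason));
console.log(rowCountDeltaExplained(seed.length, filtered), writeIsIdempotent(filtered.rows));
```

The same helpers cover the empty-input case: an empty array yields no rows, no drops, and a delta of zero.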

package/pipeline/steps/qa-a/document-generation.md ADDED

@@ -0,0 +1,52 @@
+# QA-A Step: Document Generation Testing
+## Render Smoke Test Specification
+For every document template in this story, write a **Render Smoke Test** table:
+```
+## Render Smoke Tests
+| ID | Template | Input Data | Expected Output Format | Validation Check |
+```
+Rules:
+- One row per template + scenario (happy path + key edge cases)
+- Input Data: exact JSON payload or reference to fixture file
+- Expected Output Format: PDF, HTML, Markdown, etc. with expected MIME type
+- Validation Check: what QA-B must verify in the generated output
+- Minimum per template: one happy path with all variables populated, one with null/missing optional variables, one with edge-case data (unicode, long strings, special characters)
+### Variable Substitution Tests
+- **Normal substitution:** all required variables present and correctly typed -- verify they appear in output at expected positions
+- **Null variables:** optional variables set to null -- verify graceful handling (omitted or default value), no literal `null` in output
+- **Missing variables:** required variables omitted -- verify clear error, no unsubstituted markers (`{{varName}}`, `${varName}`, etc.) in output
+### Conditional Section Tests
+- Templates with conditional sections must have test rows for each branch (true and false conditions)
+- Templates with loops must have test rows for empty collection, single item, and multiple items
+### Output Format Validation
+- Every declared output format must have at least one test row
+- Validation must confirm correct MIME type and parseable structure (valid HTML, valid PDF, valid Markdown)
+### Encoding and Unicode Tests
+- At least one test row with CJK characters, emoji, or RTL text in variable data
+- Verify output preserves unicode correctly (no mojibake, no encoding errors)
+### No Unsubstituted Markers
+- Every test must verify that no raw template markers appear in the final output
+## Quality Gate Additions
+- [ ] Render smoke test table covers every template (happy path + null/missing + edge-case data)
+- [ ] Variable substitution tested for normal, null, and missing cases
+- [ ] Conditional sections tested for all branches
+- [ ] Every output format has at least one validation test
+- [ ] Encoding/unicode test included
+- [ ] No unsubstituted markers assertion included in every test
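
The "no unsubstituted markers" assertion could be implemented roughly as below. This is a sketch under stated assumptions: it only recognizes the two marker styles named above (`{{varName}}` and `${varName}`), the function names are hypothetical, and the literal-`null` check is a naive word match that would also flag legitimate prose containing the word "null".

```typescript
// Matches {{varName}} and ${varName} style markers only.
// Other template syntaxes would need their own patterns.
const MARKER_PATTERN = /\{\{\s*[\w.]+\s*\}\}|\$\{\s*[\w.]+\s*\}/g;

// Return every raw template marker still present in the rendered output.
function findUnsubstitutedMarkers(output: string): string[] {
  return output.match(MARKER_PATTERN) || [];
}

// Naive check for a literal `null` leaking into the output as a standalone word.
function containsLiteralNull(output: string): boolean {
  return /\bnull\b/.test(output);
}

console.log(findUnsubstitutedMarkers("Hello {{name}}, total ${amount}"));
console.log(findUnsubstitutedMarkers("Hello Ada, total 42"));
console.log(containsLiteralNull("Middle name: null"));
```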

package/pipeline/steps/qa-a/iac.md ADDED

@@ -0,0 +1,30 @@
+# QA-A Step: Infrastructure Testing
+## Infrastructure Smoke Test Specification
+For every infrastructure resource in this story, write an **Infrastructure Smoke Test** table:
+```
+## Infrastructure Smoke Tests
+| ID | Resource | Operation | Expected State | Validation Method |
+```
+Rules:
+- One row per resource provisioned or modified
+- Resource: resource type and logical name (e.g., `aws_s3_bucket.data_lake`)
+- Operation: plan, apply, destroy, or drift-check
+- Expected state: the desired state after the operation (e.g., "exists with tags", "no diff on re-apply")
+- Validation method: how QA-B verifies (e.g., "terraform plan output", "aws cli describe", "policy check output")
+- Minimum per resource: one plan validation, one tagging check
+- Every story must include: plan output validation (no errors), drift check (plan after apply = no changes), tagging check (all resources tagged), security policy check (no overly permissive IAM)
+- Idempotency row required: "apply twice, second plan shows no changes"
+## Quality Gate Additions
+- [ ] Smoke test table covers every infrastructure resource (plan + tagging + security)
+- [ ] Plan output validation row present (terraform plan succeeds without errors)
+- [ ] Drift check row present (plan after apply = no changes)
+- [ ] Tagging check row present (all resources have standard tags)
+- [ ] Security policy check row present (no wildcard IAM, no hardcoded secrets)
+- [ ] Idempotency row present (apply twice = no changes)
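
The tagging and security policy checks can be sketched against a simplified resource description. The `PlannedResource` shape, the `REQUIRED_TAGS` list, and the rule that any action containing `*` counts as a wildcard are all assumptions made for illustration. This is not the format of any real tool's plan output, which would need to be mapped into this shape first.

```typescript
// Hypothetical, simplified view of one resource in a plan.
type PlannedResource = {
  address: string;               // e.g. "aws_s3_bucket.data_lake"
  tags: Record<string, string>;
  iamActions?: string[];         // actions granted by an attached policy, if any
};

// Hypothetical standard tags; a real project would define its own list.
const REQUIRED_TAGS: string[] = ["owner", "environment"];

// Tagging check: list the required tags a resource is missing.
function missingTags(resource: PlannedResource): string[] {
  return REQUIRED_TAGS.filter((tag) => resource.tags[tag] === undefined);
}

// Security policy check: assumption that any "*" in an action is overly permissive.
function hasWildcardIam(resource: PlannedResource): boolean {
  return (resource.iamActions || []).some((action) => action.indexOf("*") !== -1);
}

const bucket: PlannedResource = {
  address: "aws_s3_bucket.data_lake",
  tags: { owner: "data-team" },
  iamActions: ["s3:GetObject"],
};
console.log(bucket.address, missingTags(bucket), hasWildcardIam(bucket));
```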

package/pipeline/steps/qa-a/library.md ADDED

@@ -0,0 +1,42 @@
+# QA-A Step: Library Testing
+## Export Smoke Test Specification
+For every public export in this story, write an **Export Smoke Test** table:
+```
+## Export Smoke Tests
+| ID | Import Method | Module Path | Expected Export | Verification |
+```
+Rules:
+- One row per export + import method (CJS `require()` and ESM `import` for each export)
+- Module path: exact path from the exports map (e.g., `"./utils"`, `"."`)
+- Expected export: the named or default export and its expected type/signature
+- Verification: what to assert (typeof, instanceof, return value shape, callable, etc.)
+- Minimum per export: one CJS row, one ESM row
+- Type declaration exports must have a verification row confirming `.d.ts` resolution
+- Backwards compatibility: if this is an update to an existing library, include rows verifying that previously documented imports still resolve
+## Tree-Shaking Specification
+If the library declares `sideEffects: false`, write a **Tree-Shaking Test** table:
+```
+## Tree-Shaking Tests
+| ID | Import Statement | Expected Included | Expected Excluded | Verification |
+```
+Rules:
+- Selective import must not pull in unrelated modules
+- Bundle output must not contain code from unused exports
+- Side-effect-free imports must produce no console output or global mutations
+## Quality Gate Additions
+- [ ] Export smoke test table covers every public export (CJS + ESM rows)
+- [ ] Type declaration verification rows present for all typed exports
+- [ ] Backwards compatibility rows present for updated libraries
+- [ ] Tree-shaking tests present if `sideEffects: false` is declared
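
The "one CJS row, one ESM row per export" rule can be sketched as a generator that expands a list of public exports into Export Smoke Test rows. The `PublicExport` and `SmokeRow` shapes, the `EXP-n` ID scheme, and the function name are hypothetical; only the column names come from the table above.

```typescript
// Hypothetical description of one public export from the exports map.
type PublicExport = { modulePath: string; exportName: string; verification: string };

// One row of the Export Smoke Test table (columns from the spec above).
type SmokeRow = {
  id: string;
  importMethod: "CJS" | "ESM";
  modulePath: string;
  expectedExport: string;
  verification: string;
};

// Expand every export into two rows: one for CJS `require()`, one for ESM `import`.
function exportSmokeRows(publicExports: PublicExport[]): SmokeRow[] {
  const methods: ("CJS" | "ESM")[] = ["CJS", "ESM"];
  const rows: SmokeRow[] = [];
  for (const exp of publicExports) {
    for (const method of methods) {
      rows.push({
        id: "EXP-" + (rows.length + 1),
        importMethod: method,
        modulePath: exp.modulePath,
        expectedExport: exp.exportName,
        verification: exp.verification,
      });
    }
  }
  return rows;
}

const rows = exportSmokeRows([
  { modulePath: ".", exportName: "default", verification: "typeof is function" },
  { modulePath: "./utils", exportName: "slugify", verification: "callable, returns string" },
]);
for (const row of rows) {
  console.log(`| ${row.id} | ${row.importMethod} | ${row.modulePath} | ${row.expectedExport} | ${row.verification} |`);
}
```

Generating the rows mechanically makes it harder to forget one import method for a newly added export; the verification text still has to be written per export.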