valent-pipeline 0.2.20 → 0.2.21

Files changed (110)
  1. package/README.md +438 -0
  2. package/package.json +1 -1
  3. package/pipeline/agents-manifest.yaml +61 -1
  4. package/pipeline/docs/agent-reference.md +82 -23
  5. package/pipeline/docs/design/refactor-checklist.md +111 -0
  6. package/pipeline/docs/index.md +60 -0
  7. package/pipeline/docs/pipeline-overview.md +4 -0
  8. package/pipeline/prompts/bend.md +5 -11
  9. package/pipeline/prompts/critic.md +9 -0
  10. package/pipeline/prompts/data.md +59 -0
  11. package/pipeline/prompts/docgen.md +61 -0
  12. package/pipeline/prompts/fend.md +3 -10
  13. package/pipeline/prompts/iac.md +70 -0
  14. package/pipeline/prompts/lead.md +81 -3
  15. package/pipeline/prompts/libdev.md +61 -0
  16. package/pipeline/prompts/mcp-dev.md +59 -0
  17. package/pipeline/prompts/mobile.md +92 -0
  18. package/pipeline/prompts/qa-a.md +1 -1
  19. package/pipeline/prompts/qa-b.md +1 -1
  20. package/pipeline/prompts/reqs.md +5 -1
  21. package/pipeline/scripts/db-bootstrap.ts +1 -1
  22. package/pipeline/scripts/embed-sqlite.ts +5 -0
  23. package/pipeline/steps/common/quality-standards.md +19 -0
  24. package/pipeline/steps/critic/data-pipeline.md +28 -0
  25. package/pipeline/steps/critic/document-generation.md +21 -0
  26. package/pipeline/steps/critic/iac.md +29 -0
  27. package/pipeline/steps/critic/library.md +24 -0
  28. package/pipeline/steps/critic/mcp-server.md +24 -0
  29. package/pipeline/steps/critic/mobile-app.md +29 -0
  30. package/pipeline/steps/data/estimate.md +51 -0
  31. package/pipeline/steps/data/handoff.md +9 -0
  32. package/pipeline/steps/data/implement.md +16 -0
  33. package/pipeline/steps/data/read-inputs.md +13 -0
  34. package/pipeline/steps/data/write-tests.md +13 -0
  35. package/pipeline/steps/docgen/estimate.md +49 -0
  36. package/pipeline/steps/docgen/handoff.md +9 -0
  37. package/pipeline/steps/docgen/implement.md +19 -0
  38. package/pipeline/steps/docgen/read-inputs.md +13 -0
  39. package/pipeline/steps/docgen/write-tests.md +15 -0
  40. package/pipeline/steps/iac/estimate.md +50 -0
  41. package/pipeline/steps/iac/handoff.md +9 -0
  42. package/pipeline/steps/iac/implement.md +19 -0
  43. package/pipeline/steps/iac/read-inputs.md +13 -0
  44. package/pipeline/steps/iac/write-tests.md +20 -0
  45. package/pipeline/steps/judge/ship-decision.md +14 -1
  46. package/pipeline/steps/libdev/estimate.md +49 -0
  47. package/pipeline/steps/libdev/handoff.md +9 -0
  48. package/pipeline/steps/libdev/implement.md +19 -0
  49. package/pipeline/steps/libdev/read-inputs.md +13 -0
  50. package/pipeline/steps/libdev/write-tests.md +16 -0
  51. package/pipeline/steps/mcp-dev/estimate.md +49 -0
  52. package/pipeline/steps/mcp-dev/handoff.md +9 -0
  53. package/pipeline/steps/mcp-dev/implement.md +29 -0
  54. package/pipeline/steps/mcp-dev/read-inputs.md +13 -0
  55. package/pipeline/steps/mcp-dev/write-tests.md +19 -0
  56. package/pipeline/steps/mobile/emulator-lifecycle.md +67 -0
  57. package/pipeline/steps/mobile/estimate.md +51 -0
  58. package/pipeline/steps/mobile/flutter.md +30 -0
  59. package/pipeline/steps/mobile/handoff.md +18 -0
  60. package/pipeline/steps/mobile/implement.md +20 -0
  61. package/pipeline/steps/mobile/react-native.md +32 -0
  62. package/pipeline/steps/mobile/read-inputs.md +10 -0
  63. package/pipeline/steps/mobile/write-tests.md +59 -0
  64. package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
  65. package/pipeline/steps/orchestration/sprint-groom.md +4 -0
  66. package/pipeline/steps/orchestration/sprint-size.md +19 -12
  67. package/pipeline/steps/orchestration/validate-story-inputs.md +9 -0
  68. package/pipeline/steps/qa-a/data-pipeline.md +32 -0
  69. package/pipeline/steps/qa-a/document-generation.md +52 -0
  70. package/pipeline/steps/qa-a/iac.md +30 -0
  71. package/pipeline/steps/qa-a/library.md +42 -0
  72. package/pipeline/steps/qa-a/mcp-server.md +31 -0
  73. package/pipeline/steps/qa-a/mobile-app.md +59 -0
  74. package/pipeline/steps/qa-b/data-pipeline.md +48 -0
  75. package/pipeline/steps/qa-b/document-generation.md +47 -0
  76. package/pipeline/steps/qa-b/iac.md +44 -0
  77. package/pipeline/steps/qa-b/library.md +61 -0
  78. package/pipeline/steps/qa-b/mcp-server.md +40 -0
  79. package/pipeline/steps/qa-b/mobile-app.md +71 -0
  80. package/pipeline/steps/readiness/standalone-review.md +7 -2
  81. package/pipeline/steps/reqs/data-pipeline.md +56 -0
  82. package/pipeline/steps/reqs/document-generation.md +55 -0
  83. package/pipeline/steps/reqs/draft-brief.md +10 -0
  84. package/pipeline/steps/reqs/iac.md +63 -0
  85. package/pipeline/steps/reqs/library.md +56 -0
  86. package/pipeline/steps/reqs/mcp-server.md +48 -0
  87. package/pipeline/steps/reqs/mobile-app.md +54 -0
  88. package/pipeline/steps/reqs/self-review.md +5 -3
  89. package/pipeline/task-graphs/backend-api.yaml +19 -2
  90. package/pipeline/task-graphs/data-pipeline.yaml +29 -12
  91. package/pipeline/task-graphs/document-generation.yaml +29 -12
  92. package/pipeline/task-graphs/frontend-only.yaml +19 -2
  93. package/pipeline/task-graphs/fullstack-web.yaml +19 -2
  94. package/pipeline/task-graphs/library.yaml +29 -12
  95. package/pipeline/task-graphs/mcp-server.yaml +29 -12
  96. package/pipeline/task-graphs/mobile-app.yaml +171 -0
  97. package/pipeline/templates/bugs.template.md +1 -1
  98. package/pipeline/templates/critic-review.template.md +1 -1
  99. package/pipeline/templates/data-handoff.template.md +96 -0
  100. package/pipeline/templates/docgen-handoff.template.md +83 -0
  101. package/pipeline/templates/iac-handoff.template.md +83 -0
  102. package/pipeline/templates/judge-decision.template.md +11 -1
  103. package/pipeline/templates/libdev-handoff.template.md +82 -0
  104. package/pipeline/templates/mcp-dev-handoff.template.md +87 -0
  105. package/pipeline/templates/mobile-handoff.template.md +122 -0
  106. package/pipeline/templates/reqs-brief.template.md +60 -4
  107. package/skills/valent-run-deferred-tests/SKILL.md +109 -0
  108. package/src/commands/db-rebuild.js +5 -0
  109. package/src/lib/config-schema.js +1 -1
  110. package/src/lib/db.js +1 -1
@@ -0,0 +1,32 @@
1
+ # MOBILE Step: React Native Specifics
2
+
3
+ This step is loaded conditionally when `{tech_stack.mobile_framework}` is `react-native`. Read before implementing.
4
+
5
+ ## Metro Bundler Management
6
+ - Start Metro before any test execution: `npx react-native start --reset-cache &`
7
+ - Monitor Metro for JS bundle errors. If bundling fails, fix the code and retry.
8
+ - Handle port conflicts: check if port 8081 is in use before starting (`lsof -i :8081` on Mac/Linux, `netstat -ano | findstr 8081` on Windows). Kill stale Metro processes if needed.
9
+ - Kill Metro after all tests complete.
10
+
11
+ ## React Native Build Configuration
12
+ - Use `react-native.config.js` for native module auto-linking
13
+ - Hermes engine: verify Hermes is enabled for Android (check `android/app/build.gradle` for `enableHermes: true`; newer React Native templates set `hermesEnabled=true` in `android/gradle.properties` instead)
14
+ - Flipper: disable in release builds, optional in debug
15
+ - Fast Refresh: disable during E2E to avoid test flakiness (`--no-interactive` flag)
16
+
17
+ ## React Native Testing Patterns
18
+ - Component tests: use React Native Testing Library (`@testing-library/react-native`)
19
+ - Navigation tests: verify deep link resolution via React Navigation linking config, verify screen transitions
20
+ - Platform-specific code: test `.android.tsx` and `.ios.tsx` variants separately when they exist
21
+ - AsyncStorage / MMKV: clear between test suites to prevent state leakage
22
+
23
+ ## Native Module Considerations
24
+ - If the story uses native modules, verify auto-linking succeeded on both platforms
25
+ - Bridge calls must be tested with real native code, not mocks
26
+ - Pod install (iOS): `cd ios && pod install` after adding native dependencies
27
+ - Gradle sync (Android): `cd android && ./gradlew --refresh-dependencies` if native module issues
28
+
29
+ ## Area Labels for Testing
30
+ - Use `testID` prop for React Native components (maps to `accessibilityIdentifier` on iOS, `resource-id` on Android)
31
+ - Maestro `tapOn` uses `id:` selector which reads `testID` on both platforms
32
+ - Follow the area label convention from uxa-spec.md: `{screen}-{section}-{element}`
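The `{screen}-{section}-{element}` convention above can be sketched as a small helper that builds `testID` values. This is only an illustration: `areaLabel` and `toKebab` are hypothetical names, and the kebab-case normalisation is an assumption rather than something the step file mandates.

```typescript
// Hypothetical helper: normalises one part of an area label to kebab-case.
// The normalisation rules are an assumption made for this sketch.
function toKebab(part: string): string {
  return part
    .trim()
    .replace(/[^A-Za-z0-9]+/g, "-") // collapse spaces, underscores, etc.
    .replace(/^-+|-+$/g, "")        // drop leading/trailing dashes
    .toLowerCase();
}

// Builds a testID following `{screen}-{section}-{element}`.
function areaLabel(screen: string, section: string, element: string): string {
  const parts = [screen, section, element].map(toKebab);
  for (let i = 0; i < parts.length; i++) {
    if (parts[i] === "") {
      throw new Error("area label parts must be non-empty");
    }
  }
  return parts.join("-");
}
```

A component would then pass `areaLabel("Login", "Form", "Submit")` as its `testID`, and a Maestro flow could target the same string through the `id:` selector.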
@@ -0,0 +1,10 @@
1
+ # MOBILE Step: Read Inputs
2
+
3
+ ## Step 1: Read inputs
4
+ Read `reqs-brief.md`, `uxa-spec.md` (if UI profile active), and `qa-test-spec.md`. Understand: acceptance criteria, screen specifications, navigation flows, component hierarchy, platform-specific behaviors, Maestro flow specifications, test specifications.
5
+
6
+ ## Step 2: Read correction directives
7
+ Read `{correction_directives}`. Apply all directives targeting MOBILE. Note any conflicts with default behavior and follow the directive.
8
+
9
+ ## Step 2b: Query Knowledge Agent (Conditional)
10
+ If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What mobile patterns, navigation conventions, and platform-specific constraints should I know? Context: I am MOBILE implementing {story_id} using {tech_stack.mobile_framework}.` If no response arrives within a reasonable time, or no Knowledge Agent is spawned, proceed without one.
@@ -0,0 +1,59 @@
1
+ # MOBILE Step: Write Tests
2
+
3
+ ## Step 8: Write Maestro YAML test flows
4
+ For each AC in qa-test-spec.md, write a Maestro YAML flow file. Place flows in `e2e/maestro/` directory.
5
+
6
+ Each flow file structure:
7
+ ```yaml
8
+ appId: {app_package_name}
9
+ name: {descriptive flow name}
10
+ ---
11
+ - clearState
12
+ - launchApp
13
+ # ... test steps: tapOn, assertVisible, inputText, scroll, back, swipe, etc.
14
+ ```
15
+
16
+ Rules:
17
+ - Every flow must start with `clearState` and `launchApp`
18
+ - Use `assertVisible` and `assertNotVisible` for assertions, not fixed-time waits
19
+ - Use `waitForAnimationToEnd` instead of hardcoded `extendedWaitUntil` timeouts
20
+ - Deep link tests: use `openLink` command with the URI pattern from reqs-brief
21
+ - Screenshot capture: use `takeScreenshot` at assertion points for evidence
22
+
23
+ Record in `mobile-handoff.md#maestro-flow-files`.
24
+
25
+ ## Step 8b: Write unit tests
26
+ Write unit tests per qa-test-spec.md using `{tech_stack.test_framework_unit}`. Unit tests MAY mock API clients for isolated component logic. Every mocked unit test for an API-calling AC must be paired with a real-API Maestro flow for the same AC. Record in `mobile-handoff.md#test-files-written`.
27
+
28
+ ## Step 9: Run unit tests, verify all pass
29
+ Run the unit test suite. All tests must pass. Record results in `mobile-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
30
+
31
+ ## Step 9b: App-Level Smoke Test
32
+
33
+ Write one test that bootstraps the application from its entry point and asserts the story's deliverable is present and reachable. This catches "unwired entry point" bugs where a screen exists but is never registered in the navigation. Mandatory for the first mobile story in a project, recommended for all subsequent stories.
34
+
35
+ Record in `mobile-handoff.md#test-files-written`.
36
+
37
+ ## Step 10: Run Maestro flows
38
+
39
+ ### Android (always)
40
+ For each flow file:
41
+ 1. State isolation (Step 7d)
42
+ 2. Execute: `maestro test {flow_file}`
43
+ 3. Record per-flow result (pass/fail with output)
44
+
45
+ ### iOS (Mac only)
46
+ For each flow file tagged as `both` or `ios`:
47
+ 1. State isolation (Step 7d, iOS variant)
48
+ 2. Execute: `maestro test {flow_file} --device {ios_simulator_name}`
49
+ 3. Record per-flow result
50
+
51
+ ### iOS Deferred (Windows/Linux)
52
+ If not on Mac, record all iOS-targeted flows in `mobile-handoff.md#deferred-ios-tests` with reason "Host OS lacks iOS simulator". This is expected, not a bug.
53
+
54
+ E2E tests run serially against the single emulator -- the emulator is shared mutable state. The 1.5-minute timeout per story applies to test execution time excluding emulator boot time.
55
+
56
+ ## Step 10b: Signal integration readiness
57
+ When mobile code is complete and all available-platform tests pass, send to BEND via inbox:
58
+ `[INTEGRATION-READY] Mobile code complete. Run integration tests against my app.`
59
+ Wait for BEND's `[INTEGRATION-READY]` message before running integration verification. Once both sides are ready, verify that API calls from the app resolve correctly against BEND's running server.
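The flow rules in Step 8 lend themselves to a simple pre-run lint. The sketch below is hypothetical (`lintFlowCommands` is not part of the package) and assumes the caller has already extracted the ordered command names from a flow file; parsing the Maestro YAML itself is out of scope.

```typescript
// Hypothetical lint for the Step 8 flow rules. Input is the ordered list of
// Maestro command names in one flow, e.g. ["clearState", "launchApp", "tapOn"].
function lintFlowCommands(commands: string[]): string[] {
  const problems: string[] = [];
  if (commands[0] !== "clearState") {
    problems.push("flow must start with clearState");
  }
  if (commands[1] !== "launchApp") {
    problems.push("launchApp must directly follow clearState");
  }
  if (commands.indexOf("extendedWaitUntil") !== -1) {
    problems.push("prefer waitForAnimationToEnd over extendedWaitUntil timeouts");
  }
  return problems; // empty array means no rule was violated
}
```

Running the lint before `maestro test` would surface rule violations without spending emulator time on a flow that is going to be rejected in review anyway.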
@@ -99,7 +99,7 @@ Otherwise, substitute variables in the knowledge spawn template (`{{story_id}}`,
99
99
 
100
100
  | Wave | Spawn Trigger | Agents |
101
101
  |---|---|---|
102
- | 2 | QA-A sends `[HANDOFF]` (completes) | BEND, FEND, CRITIC |
102
+ | 2 | QA-A sends `[HANDOFF]` (completes) | BEND, FEND, DATA, MCP-DEV, LIBDEV, DOCGEN, IAC, CRITIC (each only if not skipped by testing_profiles) |
103
103
  | 3 | CRITIC task becomes `in_progress` | QA-B, PMCP (if ui profile) |
104
104
  | 4 | JUDGE bug-review task becomes `in_progress` | (reserved) |
105
105
 
@@ -14,6 +14,10 @@ For each pending story in the grooming batch:
14
14
  - `api` — story has API endpoints, backend logic, or database changes
15
15
  - `ui` — story has UI components, pages, or visual elements
16
16
  - `data-pipeline` — story has ETL, data transformation, or batch processing
17
+ - `mcp-server` — story has MCP server tools, handlers, or protocol work
18
+ - `library` — story is shared library/package (exports, packaging, versioning)
19
+ - `document-generation` — story has document/report template or generation pipeline work
20
+ - `iac` — story has infrastructure work (Terraform, CloudFormation, Kubernetes, CI/CD)
17
21
  3. Write `testing_profiles: [api, ui]` (or whichever apply) to the story's backlog entry
18
22
 
19
23
  This must complete before Step 1. Downstream agents rely on `testing_profiles` to determine conditional steps.
@@ -2,12 +2,17 @@
2
2
 
3
3
  **Condition:** Only execute in sprint mode (`{is_sprint_mode}` is true).
4
4
 
5
- ## Step 1: Spawn BEND/FEND
5
+ ## Step 1: Spawn Developer Agents
6
6
 
7
7
  Scan groomed stories' `testing_profiles` in `{backlog_path}`:
8
8
 
9
- - Spawn BEND if any groomed story has `api` or `data-pipeline` in `testing_profiles`
9
+ - Spawn BEND if any groomed story has `api` in `testing_profiles`
10
10
  - Spawn FEND if any groomed story has `ui` in `testing_profiles`
11
+ - Spawn DATA if any groomed story has `data-pipeline` in `testing_profiles`
12
+ - Spawn MCP-DEV if any groomed story has `mcp-server` in `testing_profiles`
13
+ - Spawn LIBDEV if any groomed story has `library` in `testing_profiles`
14
+ - Spawn DOCGEN if any groomed story has `document-generation` in `testing_profiles`
15
+ - Spawn IAC if any groomed story has `iac` in `testing_profiles`
11
16
 
12
17
  Spawn with their normal prompt template and pass `.valent-pipeline/steps/{agent}/estimate.md` as the first step. Pass `{estimation_model}` and `{correction_directives}` (calibration directives) in the spawn context.
13
18
 
@@ -19,16 +24,18 @@ For each story with status `groomed`:
19
24
 
20
25
  1. Update status to `sizing` in `{backlog_path}`
21
26
  2. Read story's `testing_profiles` from `{backlog_path}`
22
- 3. Dispatch based on profiles:
23
- - `[api]` or `[data-pipeline]`: send story context to BEND only
24
- - `[ui]` only: send story context to FEND only
25
- - `[api, ui]` (fullstack): send story context to both BEND and FEND
26
- 4. Agents write estimation files (`bend-estimation.md` and/or `fend-estimation.md`)
27
- 5. **Record points:**
28
- - Backend-only (`[api]`): `story_points = BEND estimate`
29
- - Frontend-only (`[ui]`): `story_points = FEND estimate`
30
- - Fullstack (`[api, ui]`): `story_points = BEND estimate + FEND estimate`
31
- - Data-pipeline: `story_points = BEND estimate`
27
+ 3. Dispatch based on profiles — send story context to **every agent whose profile is present**:
28
+ - `api` in profiles → send to BEND
29
+ - `ui` in profiles → send to FEND
30
+ - `data-pipeline` in profiles → send to DATA
31
+ - `mcp-server` in profiles → send to MCP-DEV
32
+ - `library` in profiles → send to LIBDEV
33
+ - `document-generation` in profiles → send to DOCGEN
34
+ - `iac` in profiles → send to IAC
35
+ Multiple profiles can be active (e.g., `[api, data-pipeline]` sends to both BEND and DATA).
36
+ 4. Agents write estimation files (`{agent}-estimation.md`)
37
+ 5. **Record points:** sum all agent estimates for the story.
38
+ `story_points = sum of all agent estimates received`
32
39
  6. Update story's `story_points` field in `{backlog_path}`
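The revised dispatch and point-recording rules can be sketched as follows. The profile-to-agent mapping is taken from the list in this step; the function names, and the choice to reject unknown profiles, are assumptions made for illustration.

```typescript
// Profile-to-agent mapping as listed in the dispatch step.
const PROFILE_AGENT: { [profile: string]: string } = {
  "api": "BEND",
  "ui": "FEND",
  "data-pipeline": "DATA",
  "mcp-server": "MCP-DEV",
  "library": "LIBDEV",
  "document-generation": "DOCGEN",
  "iac": "IAC",
};

// Returns every agent whose profile is present on the story.
// Rejecting unknown profiles is an assumption of this sketch.
function agentsForProfiles(profiles: string[]): string[] {
  return profiles.map((p) => {
    const agent = PROFILE_AGENT[p];
    if (agent === undefined) {
      throw new Error("unknown testing profile: " + p);
    }
    return agent;
  });
}

// story_points = sum of all agent estimates received.
function storyPoints(estimates: { [agent: string]: number }): number {
  return Object.keys(estimates).reduce((sum, agent) => sum + estimates[agent], 0);
}
```

For a story with `[api, data-pipeline]`, this yields BEND and DATA, and the recorded points are the sum of their two estimates.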
33
40
 
34
41
  ## Step 3: Update Sprint State
@@ -33,11 +33,20 @@ Based on the story scope and project type, determine which testing profiles are
33
33
  | Story has API endpoints (backend routes, REST/GraphQL) | `api` |
34
34
  | Story has UI components (pages, components, visual changes) | `ui` |
35
35
  | Story has data pipeline work (ETL, transformations, migrations) | `data-pipeline` |
36
+ | Story has MCP server tools, handlers, or protocol work | `mcp-server` |
37
+ | Story is a shared library/package (exports, packaging, versioning) | `library` |
38
+ | Story has document/report template or generation pipeline work | `document-generation` |
39
+ | Story has infrastructure work (Terraform, CloudFormation, Kubernetes, CI/CD) | `iac` |
36
40
 
37
41
  Multiple profiles can be active. Examples:
38
42
  - Backend-only story: `[api]`
39
43
  - Frontend-only story: `[ui]`
40
44
  - Fullstack story with both API and UI work: `[api, ui]`
41
45
  - Data pipeline story: `[data-pipeline]`
46
+ - MCP server story: `[mcp-server]`
47
+ - Library/package story: `[library]`
48
+ - Document generation story: `[document-generation]`
49
+ - Infrastructure story: `[iac]`
50
+ - Fullstack story with infrastructure: `[api, ui, iac]`
42
51
 
43
52
  Set `{testing_profiles}` for use in shared context.
@@ -0,0 +1,32 @@
1
+ # QA-A Step: Data Pipeline Testing
2
+
3
+ ## Pipeline Smoke Test Specification
4
+
5
+ For every pipeline stage in this story, write a **Pipeline Smoke Test** table:
6
+
7
+ ```
8
+ ## Pipeline Smoke Tests
9
+
10
+ | ID | Input Dataset | Transform Step | Expected Output | Row Count Delta | Idempotency Check |
11
+ ```
12
+
13
+ Rules:
14
+ - One row per transform stage (ingest, each transform, output)
15
+ - Input dataset: exact description of seed data (file path, format, row count, key characteristics)
16
+ - Transform step: the specific stage being tested
17
+ - Expected output: key fields, values, and format QA-B must verify
18
+ - Row count delta: expected rows in vs rows out with reason for any difference
19
+ - Idempotency check: "Run twice, assert identical output" for every write stage
20
+ - Minimum per pipeline: one happy path per stage, one null/malformed input, one empty input
21
+ - Every filter/join stage MUST have a row asserting dropped rows are logged with reason
22
+ - Checkpoint/resume: at least one test row that simulates mid-pipeline failure and verifies resume produces correct final output
23
+
24
+ ## Quality Gate Additions
25
+
26
+ - [ ] Smoke test table covers every pipeline stage (ingest, transform, output)
27
+ - [ ] Every filter/join has a row count delta assertion with drop reason verification
28
+ - [ ] Idempotency test specified for every write stage
29
+ - [ ] Null and malformed input test cases included
30
+ - [ ] Empty input test case included
31
+ - [ ] Checkpoint/resume test case included (if pipeline supports checkpointing)
32
+ - [ ] Row counts verified at each stage boundary
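To make the "row count delta" and "dropped rows are logged with reason" rules concrete, here is a minimal sketch of a filter stage that records why each row was dropped. `filterStage` and its result shape are hypothetical, not taken from the package.

```typescript
interface DroppedRow<T> { row: T; reason: string }
interface StageResult<T> { kept: T[]; dropped: DroppedRow<T>[]; delta: number }

// Hypothetical filter stage: `check` returns null to keep a row,
// or a human-readable reason string to drop it.
function filterStage<T>(rows: T[], check: (row: T) => string | null): StageResult<T> {
  const kept: T[] = [];
  const dropped: DroppedRow<T>[] = [];
  for (let i = 0; i < rows.length; i++) {
    const reason = check(rows[i]);
    if (reason === null) {
      kept.push(rows[i]);
    } else {
      dropped.push({ row: rows[i], reason: reason });
    }
  }
  // Row count delta = rows out minus rows in (negative when rows are dropped).
  return { kept: kept, dropped: dropped, delta: kept.length - rows.length };
}
```

A smoke-test row for such a stage can then assert both the delta and that every dropped row carries a reason, which is what the quality gate asks for.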
@@ -0,0 +1,52 @@
1
+ # QA-A Step: Document Generation Testing
2
+
3
+ ## Render Smoke Test Specification
4
+
5
+ For every document template in this story, write a **Render Smoke Test** table:
6
+
7
+ ```
8
+ ## Render Smoke Tests
9
+
10
+ | ID | Template | Input Data | Expected Output Format | Validation Check |
11
+ ```
12
+
13
+ Rules:
14
+ - One row per template + scenario (happy path + key edge cases)
15
+ - Input Data: exact JSON payload or reference to fixture file
16
+ - Expected Output Format: PDF, HTML, Markdown, etc. with expected MIME type
17
+ - Validation Check: what QA-B must verify in the generated output
18
+ - Minimum per template: one happy path with all variables populated, one with null/missing optional variables, one with edge-case data (unicode, long strings, special characters)
19
+
20
+ ### Variable Substitution Tests
21
+
22
+ - **Normal substitution:** all required variables present and correctly typed -- verify they appear in output at expected positions
23
+ - **Null variables:** optional variables set to null -- verify graceful handling (omitted or default value), no literal `null` in output
24
+ - **Missing variables:** required variables omitted -- verify clear error, no unsubstituted markers (`{{varName}}`, `${varName}`, etc.) in output
25
+
26
+ ### Conditional Section Tests
27
+
28
+ - Templates with conditional sections must have test rows for each branch (true and false conditions)
29
+ - Templates with loops must have test rows for empty collection, single item, and multiple items
30
+
31
+ ### Output Format Validation
32
+
33
+ - Every declared output format must have at least one test row
34
+ - Validation must confirm correct MIME type and parseable structure (valid HTML, valid PDF, valid Markdown)
35
+
36
+ ### Encoding and Unicode Tests
37
+
38
+ - At least one test row with CJK characters, emoji, or RTL text in variable data
39
+ - Verify output preserves unicode correctly (no mojibake, no encoding errors)
40
+
41
+ ### No Unsubstituted Markers
42
+
43
+ - Every test must verify that no raw template markers appear in the final output
44
+
45
+ ## Quality Gate Additions
46
+
47
+ - [ ] Render smoke test table covers every template (happy path + null/missing + edge-case data)
48
+ - [ ] Variable substitution tested for normal, null, and missing cases
49
+ - [ ] Conditional sections tested for all branches
50
+ - [ ] Every output format has at least one validation test
51
+ - [ ] Encoding/unicode test included
52
+ - [ ] No unsubstituted markers assertion included in every test
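The "no unsubstituted markers" and "no literal `null`" checks can be expressed as two small assertions over the rendered text. This is a sketch under assumptions: the function names are hypothetical, and the patterns cover only the `{{varName}}` and `${varName}` styles mentioned above, so other template engines would need their own patterns.

```typescript
// Finds leftover template markers in rendered output.
// Covers `{{varName}}` and `${varName}` only (assumption of this sketch).
function findUnsubstitutedMarkers(output: string): string[] {
  const pattern = /\{\{\s*[\w.]+\s*\}\}|\$\{\s*[\w.]+\s*\}/g;
  const found = output.match(pattern);
  return found === null ? [] : found;
}

// Flags a literal "null" token that leaked into the rendered text.
// Word boundaries keep words such as "annulled" from matching.
function hasLiteralNull(output: string): boolean {
  return /\bnull\b/.test(output);
}
```

For binary formats such as PDF, the same checks would have to run on extracted text; how that text is extracted is outside this sketch.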
@@ -0,0 +1,30 @@
1
+ # QA-A Step: Infrastructure Testing
2
+
3
+ ## Infrastructure Smoke Test Specification
4
+
5
+ For every infrastructure resource in this story, write an **Infrastructure Smoke Test** table:
6
+
7
+ ```
8
+ ## Infrastructure Smoke Tests
9
+
10
+ | ID | Resource | Operation | Expected State | Validation Method |
11
+ ```
12
+
13
+ Rules:
14
+ - One row per resource provisioned or modified
15
+ - Resource: resource type and logical name (e.g., `aws_s3_bucket.data_lake`)
16
+ - Operation: plan, apply, destroy, or drift-check
17
+ - Expected state: the desired state after the operation (e.g., "exists with tags", "no diff on re-apply")
18
+ - Validation method: how QA-B verifies (e.g., "terraform plan output", "aws cli describe", "policy check output")
19
+ - Minimum per resource: one plan validation, one tagging check
20
+ - Every story must include: plan output validation (no errors), drift check (plan after apply = no changes), tagging check (all resources tagged), security policy check (no overly permissive IAM)
21
+ - Idempotency row required: "apply twice, second plan shows no changes"
22
+
23
+ ## Quality Gate Additions
24
+
25
+ - [ ] Smoke test table covers every infrastructure resource (plan + tagging + security)
26
+ - [ ] Plan output validation row present (terraform plan succeeds without errors)
27
+ - [ ] Drift check row present (plan after apply = no changes)
28
+ - [ ] Tagging check row present (all resources have standard tags)
29
+ - [ ] Security policy check row present (no wildcard IAM, no hardcoded secrets)
30
+ - [ ] Idempotency row present (apply twice = no changes)
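One way to approach the "no wildcard IAM" gate is a static scan of policy statements. The sketch below assumes AWS-style policy statements with `Effect`, `Action`, and `Resource` fields; `wildcardStatements` is a hypothetical helper and takes a deliberately narrow view of "overly permissive".

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}

// IAM allows a single string or a list for Action and Resource; normalise to a list.
function asList(value: string | string[]): string[] {
  return typeof value === "string" ? [value] : value;
}

// Returns indexes of Allow statements whose Action or Resource is a bare "*".
// Prefix wildcards such as "s3:*" are NOT flagged; that is a simplification.
function wildcardStatements(statements: PolicyStatement[]): number[] {
  const flagged: number[] = [];
  statements.forEach((s, i) => {
    if (s.Effect !== "Allow") return;
    const wide =
      asList(s.Action).indexOf("*") !== -1 ||
      asList(s.Resource).indexOf("*") !== -1;
    if (wide) flagged.push(i);
  });
  return flagged;
}
```

A real security policy check would more likely rely on a dedicated policy tool; this only shows the shape of the assertion a smoke-test row might encode.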
@@ -0,0 +1,42 @@
1
+ # QA-A Step: Library Testing
2
+
3
+ ## Export Smoke Test Specification
4
+
5
+ For every public export in this story, write an **Export Smoke Test** table:
6
+
7
+ ```
8
+ ## Export Smoke Tests
9
+
10
+ | ID | Import Method | Module Path | Expected Export | Verification |
11
+ ```
12
+
13
+ Rules:
14
+ - One row per export + import method (CJS `require()` and ESM `import` for each export)
15
+ - Module path: exact path from the exports map (e.g., `"./utils"`, `"."`)
16
+ - Expected export: the named or default export and its expected type/signature
17
+ - Verification: what to assert (typeof, instanceof, return value shape, callable, etc.)
18
+ - Minimum per export: one CJS row, one ESM row
19
+ - Type declaration exports must have a verification row confirming .d.ts resolution
20
+ - Backwards compatibility: if this is an update to an existing library, include rows verifying that previously documented imports still resolve
21
+
22
+ ## Tree-Shaking Specification
23
+
24
+ If the library declares `sideEffects: false`, write a **Tree-Shaking Test** table:
25
+
26
+ ```
27
+ ## Tree-Shaking Tests
28
+
29
+ | ID | Import Statement | Expected Included | Expected Excluded | Verification |
30
+ ```
31
+
32
+ Rules:
33
+ - Selective import must not pull in unrelated modules
34
+ - Bundle output must not contain code from unused exports
35
+ - Side-effect-free imports must produce no console output or global mutations
36
+
37
+ ## Quality Gate Additions
38
+
39
+ - [ ] Export smoke test table covers every public export (CJS + ESM rows)
40
+ - [ ] Type declaration verification rows present for all typed exports
41
+ - [ ] Backwards compatibility rows present for updated libraries
42
+ - [ ] Tree-shaking tests present if sideEffects: false is declared
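The "one CJS row, one ESM row per export" rule can be sketched as a generator that expands an exports listing into smoke-test rows. The `ES-001` style IDs and the row shape are assumptions; the step file does not prescribe an ID format.

```typescript
interface ExportSmokeRow {
  id: string;
  importMethod: "CJS require()" | "ESM import";
  modulePath: string;
  expectedExport: string;
}

// Expands each (module path, export name) pair into one CJS row and one ESM row.
// The input shape and the ID format are hypothetical.
function exportSmokeRows(exportsByPath: { [modulePath: string]: string[] }): ExportSmokeRow[] {
  const rows: ExportSmokeRow[] = [];
  const methods: ExportSmokeRow["importMethod"][] = ["CJS require()", "ESM import"];
  Object.keys(exportsByPath).forEach((modulePath) => {
    exportsByPath[modulePath].forEach((name) => {
      methods.forEach((method) => {
        const id = "ES-" + ("000" + (rows.length + 1)).slice(-3);
        rows.push({ id: id, importMethod: method, modulePath: modulePath, expectedExport: name });
      });
    });
  });
  return rows;
}
```

The Verification column is left to the spec author, since what to assert (typeof, return shape, `.d.ts` resolution) depends on the export.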
@@ -0,0 +1,31 @@
1
+ # QA-A Step: MCP Server Testing
2
+
3
+ ## Protocol Smoke Test Specification
4
+
5
+ For every MCP tool in this story, write a **Protocol Smoke Test** table:
6
+
7
+ ```
8
+ ## Protocol Smoke Tests
9
+
10
+ | ID | JSON-RPC Method | Params | Expected Result Shape | Expected Error Code |
11
+ ```
12
+
13
+ Rules:
14
+ - Initialize handshake first: `initialize` request must be the first row, verifying server info and capabilities
15
+ - `tools/list` must follow, verifying all tools are registered with correct inputSchema
16
+ - One `tools/call` row per tool with valid params (happy path)
17
+ - One `tools/call` row per tool with invalid params (schema violation, expecting `-32602`)
18
+ - Two-tier error model coverage: at least one row triggering `isError: true` (tool failure) and at least one row triggering a JSON-RPC error code (protocol failure)
19
+ - One row for unknown tool name (expecting `-32601` Method not found or `-32602` Invalid params)
20
+ - One row for malformed JSON-RPC (expecting `-32700` or `-32600`)
21
+ - Expected result shape: key fields and content types QA-B must verify
22
+ - Params: exact JSON payload for the JSON-RPC params field
23
+
24
+ ## Quality Gate Additions
25
+
26
+ - [ ] Protocol smoke test table covers initialize handshake
27
+ - [ ] Protocol smoke test table covers tools/list
28
+ - [ ] Protocol smoke test table covers tools/call for every tool (happy path + error paths)
29
+ - [ ] Two-tier error model tested: at least one JSON-RPC error code and one isError:true
30
+ - [ ] Invalid args test included for every tool
31
+ - [ ] Unknown tool / unknown method test included
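The two-tier error model above distinguishes protocol failures (a JSON-RPC `error` object) from tool failures (a successful response whose result carries `isError: true`). A minimal sketch of that classification follows; the type and function names are hypothetical, and the response shape is reduced to the fields the rules mention.

```typescript
type Tier = "protocol-error" | "tool-error" | "success";

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  result?: { isError?: boolean; content?: unknown[] };
  error?: { code: number; message: string };
}

// Standard JSON-RPC 2.0 error codes referenced by the smoke-test rules.
const JSON_RPC_ERRORS: { [code: string]: string } = {
  "-32700": "Parse error",
  "-32600": "Invalid Request",
  "-32601": "Method not found",
  "-32602": "Invalid params",
};

// Protocol failures carry `error`; tool failures carry `result.isError === true`.
function classifyResponse(res: JsonRpcResponse): Tier {
  if (res.error !== undefined) return "protocol-error";
  if (res.result !== undefined && res.result.isError === true) return "tool-error";
  return "success";
}

// Builds a `tools/call` request payload for a smoke-test row.
function toolsCallRequest(id: number, name: string, args: { [key: string]: unknown }) {
  return {
    jsonrpc: "2.0" as const,
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}
```

Each smoke-test row can then record the expected tier alongside the expected error code, which keeps the two kinds of failure from being conflated in QA-B's report.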
@@ -0,0 +1,59 @@
1
+ # QA-A Step: Mobile App Testing
2
+
3
+ ## Maestro Flow Specification
4
+
5
+ For every mobile screen/flow in this story, write a **Maestro Flow Specification** table:
6
+
7
+ ### Maestro Flow Specifications
8
+
9
+ | ID | Flow Name | App State Setup | Steps | Expected Result | Platform |
10
+ |----|-----------|-----------------|-------|-----------------|----------|
11
+
12
+ Column rules:
13
+ - **ID:** Sequential flow identifier (MF-001, MF-002, ...)
14
+ - **Flow Name:** Descriptive name matching the AC being tested
15
+ - **App State Setup:** How to reach the required starting state. Every flow starts with `clearState` + `launchApp`. If seed data is needed, specify the API call or fixture.
16
+ - **Steps:** Sequence of Maestro actions (`launchApp`, `tapOn`, `assertVisible`, `inputText`, `scroll`, `back`, `swipe`, `openLink`, `takeScreenshot`)
17
+ - **Expected Result:** What the user should see after the flow completes (assert conditions)
18
+ - **Platform:** `both` (default) | `android-only` | `ios-only`
19
+
20
+ ### Flow Writing Rules
21
+
22
+ 1. Every flow MUST start with `clearState` or `launchApp` with `clearState: true` — no state carryover from previous flows
23
+ 2. Every flow MUST be independent — no ordering dependency between flows
24
+ 3. Use `assertVisible` and `assertNotVisible` for assertions, not fixed-time waits
25
+ 4. Use `waitForAnimationToEnd` for animation settling, not `extendedWaitUntil` with hardcoded durations
26
+ 5. For deep link tests, use `openLink` command with the URI pattern from reqs-brief
27
+ 6. Include `takeScreenshot` at key assertion points for evidence capture
28
+
29
+ ## Platform-Conditional Test Requirements
30
+
31
+ For each AC, specify:
32
+ - **Both platforms** (default): tests that verify identical behavior on Android and iOS
33
+ - **Android-specific:** tests for Android-only behavior (hardware back button, specific permissions, Android notification channels)
34
+ - **iOS-specific:** tests for iOS-only behavior (swipe-to-go-back, Face ID/Touch ID, iOS notification categories)
35
+
36
+ Mark iOS-specific tests with `[DEFER-IOS]` if the pipeline host may be Windows/Linux. The MOBILE agent will move these to the deferred queue.
37
+
38
+ ## State Isolation Requirements
39
+
40
+ - Every Maestro flow must be independent (no ordering dependency)
41
+ - App state cleared between flows via `adb shell pm clear {package}` / simulator equivalent
42
+ - If a flow requires seed data, specify the exact API call or fixture to set it up before the flow
43
+ - If a flow requires specific permissions, specify which permissions must be pre-granted
44
+
45
+ ## API Integration for Mobile
46
+
47
+ When a story's mobile app calls backend APIs, the spec MUST include:
48
+ - At least one Maestro flow per API-calling AC that exercises the real API round-trip (app → API → database → response → UI update)
49
+ - No mocked API responses in Maestro flows (Maestro does not support API interception — this is enforced by design)
50
+ - Infrastructure prerequisite: "API server must be running before Maestro flow execution"
51
+
52
+ ## Quality Gate Additions
53
+
54
+ - [ ] Every AC has at least one Maestro flow specification
55
+ - [ ] State isolation documented for every flow (clearState + seed data if needed)
56
+ - [ ] Platform-specific tests explicitly tagged (both / android-only / ios-only)
57
+ - [ ] No flow depends on another flow's output state
58
+ - [ ] Deep link tests included for screens with URI patterns
59
+ - [ ] Infrastructure prerequisites documented (API server, emulator, permissions)
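The platform tagging and iOS deferral rules can be sketched as a partitioning step over the flow specifications. `planFlows` and `RunPlan` are hypothetical names, and how the host OS is detected is deliberately left to the caller.

```typescript
type Platform = "both" | "android-only" | "ios-only";

interface FlowSpec { id: string; platform: Platform }
interface RunPlan { android: string[]; ios: string[]; deferredIos: string[] }

// Splits flows into what runs on Android, what runs on iOS now,
// and what is deferred because the host has no iOS simulator.
// `hostIsMac` is supplied by the caller (assumption of this sketch).
function planFlows(flows: FlowSpec[], hostIsMac: boolean): RunPlan {
  const plan: RunPlan = { android: [], ios: [], deferredIos: [] };
  flows.forEach((f) => {
    if (f.platform !== "ios-only") {
      plan.android.push(f.id);
    }
    if (f.platform !== "android-only") {
      (hostIsMac ? plan.ios : plan.deferredIos).push(f.id);
    }
  });
  return plan;
}
```

Note that a `both` flow appears in the Android list and in either the iOS or the deferred list, since its iOS run is still owed when the host is Windows or Linux.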
@@ -0,0 +1,48 @@
1
+ # QA-B Step: Data Pipeline Testing
2
+
3
+ ## Pipeline Execution Tests
4
+
5
+ Mandatory for all stories with data pipeline stages. Run the real pipeline against real data and real data stores.
6
+
7
+ **Procedure:**
8
+
9
+ 1. **Seed sample data.** Per smoke test table in qa-test-spec.md. Use fixture files, seed scripts, or direct data store insertion. Include happy-path data, null/malformed records, and edge cases.
10
+ 2. **Run pipeline.** Execute the full pipeline from ingest to output. Capture all logs.
11
+ 3. **Validate row counts.** At each stage boundary, verify row counts match expected values from the smoke test table. Record actual vs expected.
12
+ 4. **Spot-check values.** For each transform stage, verify a sample of output records against expected values. Check data types, formats, null handling, and edge cases.
13
+ 5. **Re-run pipeline (idempotency).** Run the same pipeline again with the same input. Assert that the output is identical -- no duplicate rows, no changed values, no side effects from the second run.
14
+ 6. **Kill and restart (checkpoint/resume).** If the pipeline supports checkpointing: run the pipeline, kill it mid-execution (after at least one checkpoint), restart, and verify that the final output matches a clean full run.
15
+ 7. **Record results** in `## Pipeline Execution Results` of execution-report.md.
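The idempotency check in step 5 can be sketched roughly as below. This is a minimal illustration, not the package's implementation: `run_pipeline`, `snapshot`, `check_idempotency`, the `output` table, and its schema are all hypothetical stand-ins, and an in-memory SQLite database stands in for the real data store.

```python
import sqlite3

def run_pipeline(conn, records):
    """Hypothetical load stage: write by primary key so a re-run cannot duplicate rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS output (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT OR REPLACE INTO output (id, value) VALUES (?, ?)", records)
    conn.commit()

def snapshot(conn):
    """Return the whole output table in a deterministic order so two runs can be compared."""
    return conn.execute("SELECT id, value FROM output ORDER BY id").fetchall()

def check_idempotency(records):
    """Run the pipeline twice on the same input and compare the two output snapshots."""
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn, records)
    first = snapshot(conn)
    run_pipeline(conn, records)
    second = snapshot(conn)
    conn.close()
    return first == second, first, second

# Sample input includes a null value, as step 1 asks for.
ok, first_run, second_run = check_idempotency([(1, "a"), (2, "b"), (3, None)])
```

Returning both snapshots alongside the verdict keeps the evidence needed when an idempotency failure has to be filed with both run outputs.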
16
+
17
+ **Pipeline Execution Results table:**
18
+
19
+ ```
20
+ | ID | Stage | Input Rows | Expected Output Rows | Actual Output Rows | Spot-Check Values | Idempotency | Result |
21
+ ```
22
+
23
+ **Row Count Reconciliation:**
24
+
25
+ ```
26
+ | Stage | Input | Output | Dropped | Drop Reason | Expected Delta | Actual Delta | Match |
27
+ ```
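One way to fill the reconciliation table from measured counts is sketched below. The helper names (`reconcile`, `to_markdown_row`) and the sign convention (delta is output minus input) are assumptions made for this example; the source only fixes the column headers.

```python
def reconcile(stage, rows_in, rows_out, expected_delta, drop_reason=""):
    """Build one reconciliation row. Delta is assumed to mean output minus input."""
    actual_delta = rows_out - rows_in
    return {
        "Stage": stage,
        "Input": rows_in,
        "Output": rows_out,
        "Dropped": max(rows_in - rows_out, 0),
        "Drop Reason": drop_reason,
        "Expected Delta": expected_delta,
        "Actual Delta": actual_delta,
        "Match": "yes" if actual_delta == expected_delta else "no",
    }

def to_markdown_row(row):
    """Render a row in the same column order as the table header."""
    return "| " + " | ".join(str(value) for value in row.values()) + " |"

row = reconcile("dedupe", 100, 97, -3, "duplicate ids")
line = to_markdown_row(row)
```

A stage whose `Match` is "no" corresponds to the wrong-row-count case in the failure handling list.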
28
+
29
+ Include full pipeline logs and commands for reproducibility.
30
+
31
+ **Failure handling:**
32
+ - Pipeline fails to start: file P1 bug, record error, continue.
33
+ - Stage produces wrong row count: file bug at appropriate priority, continue remaining stages.
34
+ - Idempotency fails (duplicates on re-run): file P1 bug, record both run outputs.
35
+ - Checkpoint/resume produces different output than clean run: file P1 bug, record both outputs.
36
+ - Pipeline crashes mid-run: file P1 bug, record crash output, attempt restart.
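The checkpoint/resume comparison (a killed-and-restarted run must match a clean full run) might look like the sketch below. Everything here is hypothetical: the JSON checkpoint file, the `transform` stage, and the `crash_after` knob that simulates a kill by raising an exception. A real test would kill the actual pipeline process instead.

```python
import json
import os
import tempfile

def transform(record):
    # Stand-in transform stage (hypothetical): uppercase the value.
    return record.upper()

def run(records, checkpoint_path, crash_after=None):
    """Process records in order, writing a checkpoint after each one.

    If crash_after is set, raise once that many records have been processed
    in this invocation, simulating a kill mid-execution.
    """
    state = {"next": 0, "output": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last checkpoint
    processed = 0
    for i in range(state["next"], len(records)):
        if crash_after is not None and processed >= crash_after:
            raise RuntimeError("simulated kill")
        state["output"].append(transform(records[i]))
        state["next"] = i + 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)  # not atomic; good enough for a sketch
        processed += 1
    return state["output"]

def resume_matches_clean_run(records):
    """Compare a clean full run against a run that is killed and then resumed."""
    with tempfile.TemporaryDirectory() as tmp:
        clean = run(records, os.path.join(tmp, "clean.json"))
        interrupted = os.path.join(tmp, "interrupted.json")
        try:
            run(records, interrupted, crash_after=2)
        except RuntimeError:
            pass  # expected: the first attempt is killed after two records
        resumed = run(records, interrupted)
    return clean == resumed, clean, resumed
```

Both outputs are returned so that a mismatch can be recorded in full, as the failure handling above requires.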
37
+
38
+ **This step cannot be skipped.** If qa-test-spec.md lacks a Pipeline Smoke Tests section, construct the table from pipeline stages in reqs-brief.md and execute.
39
+
40
+ ## Execution Report Additions
41
+
42
+ The execution report MUST include:
43
+ - `## Pipeline Execution Results` table with actual vs expected for every stage
44
+ - `## Row Count Reconciliation` table showing data flow through the pipeline
45
+ - Full pipeline execution logs
46
+ - Pipeline start command and configuration
47
+ - Idempotency verification results (both runs compared)
48
+ - Checkpoint/resume verification results (if applicable)
@@ -0,0 +1,47 @@
1
+ # QA-B Step: Document Generation Testing
2
+
3
+ ## Render Validation Tests
4
+
5
+ Mandatory for all stories with document generation. Invoke the real render pipeline and validate actual output.
6
+
7
+ **Procedure:**
8
+
9
+ 1. **Seed template and input data.** Per render smoke test table in qa-test-spec.md. Load templates and prepare input data fixtures (JSON, database records, or programmatic setup).
10
+ 2. **Invoke document generation.** Call the render pipeline with the seeded template and input data. Capture the generated output (file, buffer, or stream).
11
+ 3. **Verify output format and MIME type.** Confirm the output matches the expected format (PDF, HTML, Markdown) and has the correct MIME type.
12
+ 4. **Parse output structure.** For HTML: parse the DOM. For PDF: extract text and metadata. For Markdown: parse structure. Verify the output is well-formed and parseable.
13
+ 5. **Check variable substitution.** Verify all expected variable values appear in the output at correct positions. Verify no unsubstituted template markers (`{{varName}}`, `${varName}`, `{% raw %}`, etc.) remain.
14
+ 6. **Verify encoding.** Confirm UTF-8 encoding. Test with unicode data (CJK, emoji, RTL) and verify output preserves characters correctly -- no mojibake, no replacement characters.
15
+ 7. **Execute edge-case data tests.** Per render smoke test table: null variables, missing optional fields, empty collections, extremely long strings, special characters. Verify graceful handling.
16
+ 8. **Record results** in `## Render Validation Results` of execution-report.md.
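The unsubstituted-marker scan in step 5 can be approximated with a few regular expressions. This is a sketch under assumptions: the patterns cover only the marker syntaxes named in step 5, `find_unsubstituted_markers` is a hypothetical helper, and a real check should use the delimiters of whichever template engine the story uses.

```python
import re

# Marker syntaxes named in step 5; extend for the template engine actually in use.
MARKER_PATTERNS = [
    re.compile(r"\{\{.*?\}\}"),  # {{varName}}
    re.compile(r"\$\{.*?\}"),    # ${varName}
    re.compile(r"\{%.*?%\}"),    # {% ... %} block tags
]

def find_unsubstituted_markers(rendered):
    """Return every leftover template marker found in the rendered text."""
    found = []
    for pattern in MARKER_PATTERNS:
        found.extend(pattern.findall(rendered))
    return found

leftovers = find_unsubstituted_markers("Hello {{name}}, total: ${total}")
```

Documents that legitimately contain these character sequences (for example, a rendered code sample) would produce false positives, so any hit still needs a manual look before a bug is filed.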
17
+
18
+ ```
19
+ ## Render Validation Results
20
+
21
+ | ID | Template | Input Data | Expected Format | Actual Format | Substitution Check | Encoding Check | Result |
22
+ ```
23
+
24
+ Include raw generation commands and output excerpts for reproducibility.
25
+
26
+ ### Variable Substitution Audit
27
+
28
+ After all render tests execute, build a substitution audit:
29
+
30
+ | Template | Total Variables | Substituted Correctly | Null Handled | Missing Handled | Unsubstituted Markers Found |
31
+ |----------|----------------|----------------------|--------------|-----------------|---------------------------|
32
+
33
+ **Failure handling:**
34
+ - Render pipeline fails to start: file P1 bug, record error, continue.
35
+ - Render test fails: file bug at appropriate priority, continue remaining tests.
36
+ - Output contains unsubstituted markers: file P1 bug -- this is a data leak / presentation defect.
37
+ - Encoding errors (mojibake, replacement characters): file P2 bug.
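The encoding failure conditions above (mojibake, replacement characters) can be detected along these lines. The function name and the shape of the returned report are assumptions for illustration; the idea is to decode strictly as UTF-8, count U+FFFD, and confirm that each unicode sample fed into the template survives verbatim.

```python
REPLACEMENT_CHAR = "\ufffd"

def check_encoding(output_bytes, expected_samples):
    """Decode output as strict UTF-8 and confirm each expected sample survived intact."""
    try:
        text = output_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        return {
            "valid_utf8": False,
            "replacement_chars": 0,
            "missing": list(expected_samples),
            "error": str(exc),
        }
    return {
        "valid_utf8": True,
        "replacement_chars": text.count(REPLACEMENT_CHAR),
        # A sample that no longer appears verbatim was mangled somewhere (mojibake).
        "missing": [s for s in expected_samples if s not in text],
        "error": "",
    }

samples = ["日本語", "🙂", "שלום"]  # CJK, emoji, RTL
report = check_encoding("name: 日本語 🙂 שלום".encode("utf-8"), samples)
```

Mojibake usually still decodes as valid UTF-8, which is why the sample comparison is needed in addition to the strict decode.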
38
+
39
+ **This step cannot be skipped.** If qa-test-spec.md lacks a Render Smoke Tests section, construct the table from template definitions in reqs-brief.md and execute.
40
+
41
+ ## Execution Report Additions
42
+
43
+ The execution report MUST include:
44
+ - `## Render Validation Results` table with actual vs expected for every row
45
+ - Variable substitution audit table
46
+ - Raw generation commands and output excerpts
47
+ - Encoding verification details (input data with unicode, output confirmation)
@@ -0,0 +1,44 @@
1
+ # QA-B Step: Infrastructure Testing
2
+
3
+ ## Infrastructure Validation Tests
4
+
5
+ Mandatory for all stories with infrastructure resources. Validate infrastructure definitions against real plan output and live state.
6
+
7
+ **Procedure:**
8
+
9
+ 1. **Initialize.** Run `terraform init` (or equivalent) to initialize providers and modules. Verify initialization succeeds without errors.
10
+ 2. **Plan validation.** Run `terraform plan` (or equivalent). Verify no errors. Capture plan output for review.
11
+ 3. **Apply (if test environment available).** Run `terraform apply` against the test environment. Capture apply output.
12
+ 4. **Verify resources exist.** For each resource in the smoke test table, verify it exists in the target environment using CLI or API queries.
13
+ 5. **Verify tags.** For each resource, verify all standard tags are present (environment, project, owner, managed-by).
14
+ 6. **Verify IAM policies.** For each IAM role/policy, verify least-privilege: no wildcard actions, no overly broad resource scopes.
15
+ 7. **Idempotency check.** Run `terraform plan` again after apply. Expect no changes (zero diff). Any unexpected diff is a bug.
16
+ 8. **Record results** in `## Infrastructure Validation Results` of execution-report.md.
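For the idempotency check in step 7, one option is to inspect the machine-readable plan rather than reading the text output. The sketch below assumes the second plan was saved with `terraform plan -out=tfplan` and converted with `terraform show -json tfplan`; `pending_changes` is a hypothetical helper that reads the `resource_changes` entries of that JSON.

```python
import json

def pending_changes(plan_json_text):
    """Return (address, actions) for every resource whose planned actions are not a no-op."""
    plan = json.loads(plan_json_text)
    changed = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions != ["no-op"]:
            changed.append((rc.get("address"), actions))
    return changed

# Hand-written sample in the shape of `terraform show -json` output (not real plan data).
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
        {"address": "aws_iam_role.app", "change": {"actions": ["update"]}},
    ]
})
diff = pending_changes(sample)
```

An empty result corresponds to the expected zero diff. This sketch treats every non-no-op action as a change, which may need tuning if data source reads show up in the plan. A coarser alternative is `terraform plan -detailed-exitcode`, which exits 0 when there are no changes, 2 when changes are present, and 1 on error.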
17
+
18
+ **Infrastructure Validation Results table:**
19
+
20
+ ```
21
+ | ID | Resource | Operation | Expected State | Actual State | Tags Valid | IAM Valid | Result |
22
+ ```
23
+
24
+ Include full command output for reproducibility.
25
+
26
+ **Failure handling:**
27
+ - Init fails: file P1 bug, record error, continue.
28
+ - Plan fails: file P1 bug, record plan output, continue.
29
+ - Apply fails: file P1 bug, record apply output, continue.
30
+ - Missing resource after apply: file P1 bug, continue remaining checks.
31
+ - Missing tags: file P2 bug, continue.
32
+ - Overly permissive IAM: file P1 bug, continue.
33
+ - Idempotency fails (diff after apply): file P1 bug, record both plan outputs.
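The tag and IAM checks behind the "Missing tags" and "Overly permissive IAM" cases could be automated roughly as follows. This is a sketch under assumptions: tag keys are taken literally from step 5 (real environments may use different casing), the policy is assumed to be an AWS-style JSON policy document, and the helper names are hypothetical.

```python
REQUIRED_TAGS = {"environment", "project", "owner", "managed-by"}

def missing_tags(tags):
    """Return required tag keys that are absent or empty, sorted for stable reporting."""
    return sorted(key for key in REQUIRED_TAGS if not tags.get(key))

def _as_list(value):
    # Policy documents allow a single value or a list in these positions.
    return value if isinstance(value, list) else [value]

def wildcard_findings(policy):
    """Flag Allow statements whose Action contains a wildcard or whose Resource is '*'."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if "*" in action:
                findings.append((index, "Action", action))
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append((index, "Resource", resource))
    return findings

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::bucket/*"},
    ],
}
findings = wildcard_findings(policy)
```

The check is deliberately strict: any `*` in an action is flagged, and only a bare `*` resource counts as overly broad. Whether a scoped pattern such as a bucket prefix is acceptable remains a judgment call for the reviewer.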
34
+
35
+ **This step cannot be skipped.** If qa-test-spec.md lacks an Infrastructure Smoke Tests section, construct the table from infrastructure resources in reqs-brief.md and execute.
36
+
37
+ ## Execution Report Additions
38
+
39
+ The execution report MUST include:
40
+ - `## Infrastructure Validation Results` table with actual vs expected for every resource
41
+ - Full terraform init/plan/apply output
42
+ - Tag verification results per resource
43
+ - IAM policy verification results
44
+ - Idempotency verification (second plan output showing no changes)