valent-pipeline 0.2.20 → 0.2.21

Files changed (110)
  1. package/README.md +438 -0
  2. package/package.json +1 -1
  3. package/pipeline/agents-manifest.yaml +61 -1
  4. package/pipeline/docs/agent-reference.md +82 -23
  5. package/pipeline/docs/design/refactor-checklist.md +111 -0
  6. package/pipeline/docs/index.md +60 -0
  7. package/pipeline/docs/pipeline-overview.md +4 -0
  8. package/pipeline/prompts/bend.md +5 -11
  9. package/pipeline/prompts/critic.md +9 -0
  10. package/pipeline/prompts/data.md +59 -0
  11. package/pipeline/prompts/docgen.md +61 -0
  12. package/pipeline/prompts/fend.md +3 -10
  13. package/pipeline/prompts/iac.md +70 -0
  14. package/pipeline/prompts/lead.md +81 -3
  15. package/pipeline/prompts/libdev.md +61 -0
  16. package/pipeline/prompts/mcp-dev.md +59 -0
  17. package/pipeline/prompts/mobile.md +92 -0
  18. package/pipeline/prompts/qa-a.md +1 -1
  19. package/pipeline/prompts/qa-b.md +1 -1
  20. package/pipeline/prompts/reqs.md +5 -1
  21. package/pipeline/scripts/db-bootstrap.ts +1 -1
  22. package/pipeline/scripts/embed-sqlite.ts +5 -0
  23. package/pipeline/steps/common/quality-standards.md +19 -0
  24. package/pipeline/steps/critic/data-pipeline.md +28 -0
  25. package/pipeline/steps/critic/document-generation.md +21 -0
  26. package/pipeline/steps/critic/iac.md +29 -0
  27. package/pipeline/steps/critic/library.md +24 -0
  28. package/pipeline/steps/critic/mcp-server.md +24 -0
  29. package/pipeline/steps/critic/mobile-app.md +29 -0
  30. package/pipeline/steps/data/estimate.md +51 -0
  31. package/pipeline/steps/data/handoff.md +9 -0
  32. package/pipeline/steps/data/implement.md +16 -0
  33. package/pipeline/steps/data/read-inputs.md +13 -0
  34. package/pipeline/steps/data/write-tests.md +13 -0
  35. package/pipeline/steps/docgen/estimate.md +49 -0
  36. package/pipeline/steps/docgen/handoff.md +9 -0
  37. package/pipeline/steps/docgen/implement.md +19 -0
  38. package/pipeline/steps/docgen/read-inputs.md +13 -0
  39. package/pipeline/steps/docgen/write-tests.md +15 -0
  40. package/pipeline/steps/iac/estimate.md +50 -0
  41. package/pipeline/steps/iac/handoff.md +9 -0
  42. package/pipeline/steps/iac/implement.md +19 -0
  43. package/pipeline/steps/iac/read-inputs.md +13 -0
  44. package/pipeline/steps/iac/write-tests.md +20 -0
  45. package/pipeline/steps/judge/ship-decision.md +14 -1
  46. package/pipeline/steps/libdev/estimate.md +49 -0
  47. package/pipeline/steps/libdev/handoff.md +9 -0
  48. package/pipeline/steps/libdev/implement.md +19 -0
  49. package/pipeline/steps/libdev/read-inputs.md +13 -0
  50. package/pipeline/steps/libdev/write-tests.md +16 -0
  51. package/pipeline/steps/mcp-dev/estimate.md +49 -0
  52. package/pipeline/steps/mcp-dev/handoff.md +9 -0
  53. package/pipeline/steps/mcp-dev/implement.md +29 -0
  54. package/pipeline/steps/mcp-dev/read-inputs.md +13 -0
  55. package/pipeline/steps/mcp-dev/write-tests.md +19 -0
  56. package/pipeline/steps/mobile/emulator-lifecycle.md +67 -0
  57. package/pipeline/steps/mobile/estimate.md +51 -0
  58. package/pipeline/steps/mobile/flutter.md +30 -0
  59. package/pipeline/steps/mobile/handoff.md +18 -0
  60. package/pipeline/steps/mobile/implement.md +20 -0
  61. package/pipeline/steps/mobile/react-native.md +32 -0
  62. package/pipeline/steps/mobile/read-inputs.md +10 -0
  63. package/pipeline/steps/mobile/write-tests.md +59 -0
  64. package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
  65. package/pipeline/steps/orchestration/sprint-groom.md +4 -0
  66. package/pipeline/steps/orchestration/sprint-size.md +19 -12
  67. package/pipeline/steps/orchestration/validate-story-inputs.md +9 -0
  68. package/pipeline/steps/qa-a/data-pipeline.md +32 -0
  69. package/pipeline/steps/qa-a/document-generation.md +52 -0
  70. package/pipeline/steps/qa-a/iac.md +30 -0
  71. package/pipeline/steps/qa-a/library.md +42 -0
  72. package/pipeline/steps/qa-a/mcp-server.md +31 -0
  73. package/pipeline/steps/qa-a/mobile-app.md +59 -0
  74. package/pipeline/steps/qa-b/data-pipeline.md +48 -0
  75. package/pipeline/steps/qa-b/document-generation.md +47 -0
  76. package/pipeline/steps/qa-b/iac.md +44 -0
  77. package/pipeline/steps/qa-b/library.md +61 -0
  78. package/pipeline/steps/qa-b/mcp-server.md +40 -0
  79. package/pipeline/steps/qa-b/mobile-app.md +71 -0
  80. package/pipeline/steps/readiness/standalone-review.md +7 -2
  81. package/pipeline/steps/reqs/data-pipeline.md +56 -0
  82. package/pipeline/steps/reqs/document-generation.md +55 -0
  83. package/pipeline/steps/reqs/draft-brief.md +10 -0
  84. package/pipeline/steps/reqs/iac.md +63 -0
  85. package/pipeline/steps/reqs/library.md +56 -0
  86. package/pipeline/steps/reqs/mcp-server.md +48 -0
  87. package/pipeline/steps/reqs/mobile-app.md +54 -0
  88. package/pipeline/steps/reqs/self-review.md +5 -3
  89. package/pipeline/task-graphs/backend-api.yaml +19 -2
  90. package/pipeline/task-graphs/data-pipeline.yaml +29 -12
  91. package/pipeline/task-graphs/document-generation.yaml +29 -12
  92. package/pipeline/task-graphs/frontend-only.yaml +19 -2
  93. package/pipeline/task-graphs/fullstack-web.yaml +19 -2
  94. package/pipeline/task-graphs/library.yaml +29 -12
  95. package/pipeline/task-graphs/mcp-server.yaml +29 -12
  96. package/pipeline/task-graphs/mobile-app.yaml +171 -0
  97. package/pipeline/templates/bugs.template.md +1 -1
  98. package/pipeline/templates/critic-review.template.md +1 -1
  99. package/pipeline/templates/data-handoff.template.md +96 -0
  100. package/pipeline/templates/docgen-handoff.template.md +83 -0
  101. package/pipeline/templates/iac-handoff.template.md +83 -0
  102. package/pipeline/templates/judge-decision.template.md +11 -1
  103. package/pipeline/templates/libdev-handoff.template.md +82 -0
  104. package/pipeline/templates/mcp-dev-handoff.template.md +87 -0
  105. package/pipeline/templates/mobile-handoff.template.md +122 -0
  106. package/pipeline/templates/reqs-brief.template.md +60 -4
  107. package/skills/valent-run-deferred-tests/SKILL.md +109 -0
  108. package/src/commands/db-rebuild.js +5 -0
  109. package/src/lib/config-schema.js +1 -1
  110. package/src/lib/db.js +1 -1
@@ -0,0 +1,13 @@
+ # DOCGEN Step: Read Inputs
+
+ ## Step 1: Read reqs-brief.md
+ Understand: acceptance criteria, business rules, template definitions, variable schemas, output format requirements, asset dependencies (fonts, images, stylesheets), encoding requirements, cross-cutting concerns.
+
+ ## Step 2: Read qa-test-spec.md
+ Understand: what tests to write for each AC, expected assertions, output validation requirements, test case names and structure.
+
+ ## Step 3: Read correction directives
+ Read `{correction_directives}`. Apply all directives targeting DOCGEN. Note any conflicts with default behavior and follow the directive.
+
+ ## Step 3b: Query Knowledge Agent (Conditional)
+ If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am DOCGEN implementing {story_id} using {tech_stack.template_engine} with output formats {tech_stack.output_formats}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.
@@ -0,0 +1,15 @@
+ # DOCGEN Step: Write Tests
+
+ ## Step 10: Write test code
+ Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `docgen-handoff.md#test-files-written`.
+
+ Tests must invoke real document generation -- no mocked renderers. The actual template engine must process templates and produce real output. Parse and validate the generated output programmatically:
+ - For HTML: parse the DOM and assert on structure and content.
+ - For PDF: extract text/metadata and assert on content.
+ - For Markdown: parse and assert on structure and content.
+
+ ## Step 11: Test with edge-case data
+ Per qa-test-spec and quality standards: tests must exercise null values, missing optional fields, unicode characters (CJK, emoji, RTL text), extremely long strings, empty collections, and special characters that could break template syntax. Verify that auto-escaping prevents injection in all cases.
+
+ ## Step 12: Run tests, verify all pass
+ Run the full test suite. All tests must pass. Record results in `docgen-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
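Editorial sketch (not part of the package diff): the edge-case pass described in Step 11 could look roughly like the following. Everything here is hypothetical: `escapeHtml` and `renderGreeting` stand in for the project's real template engine, which an actual test would invoke directly, and a regular expression stands in for a proper DOM parser to keep the example dependency-free.

```typescript
// Hypothetical stand-in for a template engine's auto-escaping.
// A real test must call the project's actual renderer instead.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical renderer: null and undefined render as an empty string.
function renderGreeting(name: string | null | undefined): string {
  return `<p class="greeting">Hello, ${escapeHtml(name ?? "")}!</p>`;
}

// Edge-case inputs of the kinds listed in Step 11.
const edgeCases: Array<string | null | undefined> = [
  null,
  undefined,
  "",
  "日本語のテキスト",
  "emoji 🎉",
  "مرحبا",
  "x".repeat(10000),
  "<script>alert(1)</script>",
  "{{ injected }} & \"quoted\"",
];

// Pull out the interpolated text; throws if the surrounding structure changed.
function interpolatedText(html: string): string {
  const match = /^<p class="greeting">Hello, ([\s\S]*)!<\/p>$/.exec(html);
  if (match === null) {
    throw new Error("unexpected output structure");
  }
  return match[1] ?? "";
}

// An input fails if any raw markup character survives in the output.
const failures = edgeCases.filter((input) =>
  /[<>"']/.test(interpolatedText(renderGreeting(input)))
);
```

An empty `failures` array means every input was rendered with the expected structure and without raw markup characters leaking through.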
@@ -0,0 +1,50 @@
+ # Infrastructure Estimation
+
+ **Purpose:** Assign a Fibonacci story point estimate for infrastructure implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+
+ **Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+
+ ## Step 1: Read Groomed Specs
+
+ Read and assess:
+ - `{story_output_dir}/reqs-brief.md` — REQUIRED
+ - `{story_output_dir}/qa-test-spec.md` — REQUIRED
+
+ ## Step 2: Assess Complexity Factors
+
+ Evaluate each factor and record your assessment:
+
+ | Factor | Assessment | Weight |
+ |--------|-----------|--------|
+ | **Resource count and complexity** | How many resources? Simple (S3 bucket, DNS record) or complex (VPC, EKS cluster, RDS)? | High |
+ | **New infrastructure vs incremental** | Greenfield (new VPC, new cluster) vs incremental (add IAM role, add env var)? | High |
+ | **IAM/Security surface** | New roles, policies, service accounts? Cross-account access? | Medium |
+ | **State management complexity** | New state backend, state migration, workspace setup? | Medium |
+ | **Integration surface** | Outputs consumed by other services, shared config, cross-stack references? | Medium |
+ | **Test complexity** | How hard will infrastructure validation be? Multi-resource dependencies, timing issues? | Medium |
+
+ ## Step 3: Select Fibonacci Value
+
+ Map your assessment to the Fibonacci scale:
+
+ | Points | Typical Infrastructure Scope |
+ |--------|------------------------------|
+ | 1 | Tag update, single env var, minor config change |
+ | 2 | Simple resource addition (S3 bucket, DNS record), trivial IAM tweak |
+ | 3 | Standard resource with IAM, moderate config (Lambda + role, security group rules) |
+ | 5 | Multi-resource feature (RDS + security group + IAM + secrets), provider setup |
+ | 8 | Complex infrastructure (VPC + subnets + NAT + routing, ECS service + ALB + auto-scaling) |
+ | 13 | Large infrastructure spanning multiple services, complex IAM, multi-environment setup |
+ | 21 | Epic-scale: new cluster, major networking overhaul, multi-region setup (consider splitting) |
+
+ **Calibration context (if `{estimation_model}` is `calibrated`):**
+ If calibration directives are provided in `{correction_directives}`, factor them into your estimate.
+
+ ## Step 4: Write Estimate
+
+ Write to `{story_output_dir}/iac-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+ - Fibonacci value with brief rationale (2-3 sentences)
+ - Factor assessments from Step 2
+ - Calibration adjustments applied (if any)
+
+ Send: `[ESTIMATION] IAC estimates {story_id} at {points} points. See iac-estimation.md.`
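Editorial sketch (not part of the package diff): a lead or test harness consuming these messages might want to confirm that an estimate sits on the stated scale before accepting it. The helper names `isOnScale` and `estimationMessage` are hypothetical, and deriving the file name from the lowercased agent name is an assumption that happens to match the `iac-estimation.md`, `libdev-estimation.md` and `mcp-dev-estimation.md` paths in this diff.

```typescript
// The scale stated in the estimation steps.
const FIBONACCI_SCALE: readonly number[] = [1, 2, 3, 5, 8, 13, 21];

// True only for values that appear on the scale.
function isOnScale(value: number): boolean {
  return FIBONACCI_SCALE.indexOf(value) !== -1;
}

// Build the [ESTIMATION] inbox message, rejecting off-scale values.
function estimationMessage(agent: string, storyId: string, points: number): string {
  if (!isOnScale(points)) {
    throw new RangeError(`${points} is not on the Fibonacci scale`);
  }
  // Assumption: the estimation file is named after the lowercased agent.
  const file = `${agent.toLowerCase()}-estimation.md`;
  return `[ESTIMATION] ${agent} estimates ${storyId} at ${points} points. See ${file}.`;
}
```

The story ID passed in is whatever the pipeline substitutes for `{story_id}`; the sketch does not validate its format.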
@@ -0,0 +1,9 @@
+ # IAC Step: Handoff
+
+ Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+
+ ## Step 13: Write iac-handoff.md
+ Complete all sections of the handoff document using the template at `.valent-pipeline/templates/iac-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Infrastructure implementation complete. See iac-handoff.md#orchestrator-summary.`
+
+ ## Independent Verification Requirement
+ You must independently verify: all tests pass against the combined, integrated codebase before marking your task complete. Do not rely on other agents or CRITIC to catch your failures.
@@ -0,0 +1,19 @@
+ # IAC Step: Implement
+
+ ## Step 4: Plan implementation approach
+ Order: (1) state backend setup, (2) provider configuration, (3) resource definitions with tagging, (4) IAM roles/policies (least privilege), (5) outputs/exports for consuming services, (6) environment configuration. Identify shared config that needs coordination with BEND/FEND/DATA.
+
+ ## Step 5: Configure state backend
+ Per reqs-brief: set up remote state backend with locking. Record in `iac-handoff.md#state-management`.
+
+ ## Step 6: Configure providers
+ Per reqs-brief: provider blocks with pinned versions, authentication configuration, default tags. Record in `iac-handoff.md#files-created-modified`.
+
+ ## Step 7: Define resources with tagging
+ Per reqs-brief: resource definitions with standard tags (environment, project, owner, managed-by). Every resource must be tagged. Apply destroy protection on stateful resources (databases, storage). Record in `iac-handoff.md#resources-provisioned`.
+
+ ## Step 8: Create IAM roles and policies
+ Per reqs-brief: roles, policies, service accounts with least-privilege access. No wildcard actions or resources unless explicitly justified. Record in `iac-handoff.md#iam-security`.
+
+ ## Step 9: Define outputs and environment configuration
+ Per reqs-brief: outputs/exports for consuming services, environment variables, secret references, connection strings. Coordinate with other dev agents via inbox for shared config. Record in `iac-handoff.md#environment-configuration`.
@@ -0,0 +1,13 @@
+ # IAC Step: Read Inputs
+
+ ## Step 1: Read reqs-brief.md
+ Understand: acceptance criteria, business rules, infrastructure requirements, deployment configurations, resource provisioning needs, IAM/security requirements, environment configuration, cross-cutting concerns.
+
+ ## Step 2: Read qa-test-spec.md
+ Understand: what tests to write for each AC, expected assertions, infrastructure state verification requirements, test case names and structure.
+
+ ## Step 3: Read correction directives
+ Read `{correction_directives}`. Apply all directives targeting IAC. Note any conflicts with default behavior and follow the directive.
+
+ ## Step 3b: Query Knowledge Agent (Conditional)
+ If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, infrastructure patterns, and known pitfalls should I know? Context: I am IAC implementing {story_id} using {tech_stack.iac_framework} on {tech_stack.cloud_provider}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.
@@ -0,0 +1,20 @@
+ # IAC Step: Write Tests
+
+ ## Step 10: Write test code
+ Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `iac-handoff.md#test-files-written`.
+
+ Infrastructure test categories:
+ - **Plan validation** -- terraform plan (or equivalent) succeeds without errors
+ - **Policy checks** -- tflint, checkov, OPA, or equivalent policy-as-code validation passes
+ - **Idempotency** -- apply twice, second apply shows no changes (zero diff)
+ - **Resource tagging verification** -- all resources have required standard tags
+ - **Security policy checks** -- no overly permissive IAM, no hardcoded secrets
+
+ Do NOT use mocked providers for happy-path tests. Tests must validate against real plan output or real infrastructure state.
+
+ ## Step 11: Run tests, verify all pass
+ Run the full infrastructure test suite. All tests must pass. Record results in `iac-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
+
+ ## Step 12: Signal integration readiness
+ When your code is complete and all tests pass, send to other active dev agents via inbox:
+ `[INTEGRATION-READY] Infrastructure code complete. Environment configuration available at iac-handoff.md#environment-configuration.`
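Editorial sketch (not part of the package diff): the resource tagging verification category could be checked against real plan output along these lines, assuming Terraform is the IaC framework and the plan has been exported with `terraform show -json`. The interfaces model only the small subset of that JSON the check needs, `findMissingTags` is a hypothetical helper, and the `tags` attribute is an assumption that holds for most AWS resources but not for every provider or resource type. The inline `samplePlan` is illustrative; a real test would load the exported plan file instead.

```typescript
// Subset of the JSON produced by `terraform show -json <planfile>`.
interface ResourceChange {
  address: string;
  change: {
    actions: string[];
    after: { tags?: Record<string, string> | null } | null;
  };
}

interface PlanJson {
  resource_changes?: ResourceChange[];
}

// Standard tags named in the IAC implement step.
const REQUIRED_TAGS = ["environment", "project", "owner", "managed-by"];

// Map each resource address to the required tags it lacks.
function findMissingTags(plan: PlanJson): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const rc of plan.resource_changes ?? []) {
    const after = rc.change.after;
    if (after === null) continue; // resource is being destroyed
    const tags = after.tags ?? {};
    const absent = REQUIRED_TAGS.filter(
      (tag) => !Object.prototype.hasOwnProperty.call(tags, tag)
    );
    if (absent.length > 0) missing[rc.address] = absent;
  }
  return missing;
}

// Illustrative data only, shaped like real plan output.
const samplePlan: PlanJson = {
  resource_changes: [
    {
      address: "aws_s3_bucket.assets",
      change: {
        actions: ["create"],
        after: {
          tags: { environment: "dev", project: "demo", owner: "team", "managed-by": "terraform" },
        },
      },
    },
    {
      address: "aws_db_instance.main",
      change: {
        actions: ["create"],
        after: { tags: { environment: "dev", project: "demo" } },
      },
    },
  ],
};

const tagReport = findMissingTags(samplePlan);
```

A test would then assert that the report is empty. Tags whose values are only known after apply appear under a separate key in the plan JSON and would need additional handling.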
@@ -8,8 +8,20 @@ Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing
  - ALL evidence checklist items PASS (or N/A)
  - Socratic validation reveals no concerns undermining evidence integrity
 
+ **SHIP-PARTIAL** if:
+ - ALL available-platform evidence checklist items PASS (or N/A)
+ - Platform coverage evidence item is PARTIAL (not FAIL)
+ - iOS tests are deferred (not failed) due to host platform limitation
+ - `mobile-handoff.md` frontmatter has `ios_deferred: true`
+ - Socratic validation reveals no concerns for available-platform evidence
+
+ SHIP-PARTIAL is ONLY valid when:
+ 1. The project type is `mobile-app`
+ 2. iOS tests were deferred due to host OS limitation (not because they failed)
+ 3. All Android tests passed
+
  **REJECT** if:
- - ANY evidence checklist item FAIL, OR
+ - ANY evidence checklist item FAIL (excluding PARTIAL platform coverage when iOS is deferred, not failed), OR
  - Socratic validation undermines evidence integrity
 
  For REJECT: identify root cause, responsible phase/agent, required action. JUDGE rejections are non-routine; lead (Opus) takes ownership.
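Editorial sketch (not part of the package diff): the three verdicts in the hunk above can be read as a small decision function. The `JudgeInput` shape and the `decide` name are hypothetical, since the real judge derives these facts from handoff files and the evidence checklist. Treating any PARTIAL item other than deferred iOS platform coverage as REJECT is an assumption, because the source does not say how that case is handled.

```typescript
type ItemStatus = "PASS" | "N/A" | "PARTIAL" | "FAIL";
type Verdict = "SHIP" | "SHIP-PARTIAL" | "REJECT";

// Hypothetical input shape for illustration only.
interface JudgeInput {
  projectType: string;
  checklist: Record<string, ItemStatus>;
  platformCoverageKey: string; // assumed key of the platform coverage item
  iosDeferred: boolean; // mirrors `ios_deferred: true` in mobile-handoff.md
  allAndroidTestsPassed: boolean;
  socraticConcerns: boolean;
}

function decide(input: JudgeInput): Verdict {
  const keys = Object.keys(input.checklist);
  const failed = keys.filter((k) => input.checklist[k] === "FAIL");
  const partial = keys.filter((k) => input.checklist[k] === "PARTIAL");

  // Any FAIL item or an undermining Socratic concern rejects outright.
  if (input.socraticConcerns || failed.length > 0) return "REJECT";
  // Every item is PASS or N/A.
  if (partial.length === 0) return "SHIP";

  const onlyPlatformPartial =
    partial.length === 1 && partial[0] === input.platformCoverageKey;
  const partialAllowed =
    input.projectType === "mobile-app" &&
    input.iosDeferred &&
    input.allAndroidTestsPassed;

  // Assumption: any other PARTIAL outcome is treated as REJECT.
  return onlyPlatformPartial && partialAllowed ? "SHIP-PARTIAL" : "REJECT";
}
```

Keeping the SHIP-PARTIAL preconditions in one place makes it harder for a deferred platform to mask a genuine failure.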
@@ -19,6 +31,7 @@ For REJECT: identify root cause, responsible phase/agent, required action. JUDGE
  - Write to `{story_output_dir}/judge-decision.md`
  - Update frontmatter: status completed, all steps in stepsCompleted
  - SHIP: `[JUDGE-SHIP] Story {story_id} approved for shipping. See judge-decision.md#verdict.`
+ - SHIP-PARTIAL: `[JUDGE-SHIP-PARTIAL] Story {story_id} approved for shipping (Android verified). iOS deferred — run /run-deferred-tests on Mac. See judge-decision.md#ship-partial-detail.`
  - REJECT: `[JUDGE-REJECT] Story {story_id} rejected. See judge-decision.md#rejection-detail.`
 
  ## Step 14b: Write Story Report (SHIP only)
@@ -0,0 +1,49 @@
+ # Library Estimation
+
+ **Purpose:** Assign a Fibonacci story point estimate for library implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+
+ **Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+
+ ## Step 1: Read Groomed Specs
+
+ Read and assess:
+ - `{story_output_dir}/reqs-brief.md` — REQUIRED
+ - `{story_output_dir}/qa-test-spec.md` — REQUIRED
+
+ ## Step 2: Assess Complexity Factors
+
+ Evaluate each factor and record your assessment:
+
+ | Factor | Assessment | Weight |
+ |--------|-----------|--------|
+ | **AC count and complexity** | How many ACs? Are they simple (add export) or complex (breaking API redesign, dual-module compat)? | High |
+ | **New patterns vs established** | Greenfield (new library, new module system) vs incremental (add export, update types)? | High |
+ | **Public API surface** | How many new exports? New type declarations? Breaking changes requiring migration? | Medium |
+ | **Module system complexity** | Single target (ESM-only) vs dual CJS+ESM? Tree-shaking requirements? | Medium |
+ | **Test complexity** | How hard will consumer-simulation tests be? Cross-module-system testing? Bundle verification? | Medium |
+
+ ## Step 3: Select Fibonacci Value
+
+ Map your assessment to the Fibonacci scale:
+
+ | Points | Typical Library Scope |
+ |--------|-----------------------|
+ | 1 | Single export addition, type fix, no packaging change |
+ | 2 | Simple new export with types, trivial packaging update |
+ | 3 | Multiple exports, type declarations, moderate test coverage |
+ | 5 | New module with CJS+ESM entry points, type overhaul, consumer-sim tests |
+ | 8 | Major API surface change, breaking changes with migration, dual-module verification |
+ | 13 | Large library restructure, new module system support, extensive type surface |
+ | 21 | Epic-scale: new library or complete packaging overhaul (consider splitting the story) |
+
+ **Calibration context (if `{estimation_model}` is `calibrated`):**
+ If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints — e.g., "dual CJS+ESM stories consistently under-pointed by 1 tier" or "stories with breaking changes average 8 points."
+
+ ## Step 4: Write Estimate
+
+ Write to `{story_output_dir}/libdev-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+ - Fibonacci value with brief rationale (2-3 sentences)
+ - Factor assessments from Step 2
+ - Calibration adjustments applied (if any)
+
+ Send: `[ESTIMATION] LIBDEV estimates {story_id} at {points} points. See libdev-estimation.md.`
@@ -0,0 +1,9 @@
+ # LIBDEV Step: Handoff
+
+ Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+
+ ## Step 13: Write libdev-handoff.md
+ Complete all sections of the handoff document using the template at `.valent-pipeline/templates/libdev-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Library implementation complete. See libdev-handoff.md#orchestrator-summary.`
+
+ ## Independent Verification Requirement
+ You must independently verify: all tests pass, all exports resolve correctly, CJS and ESM entry points work, and type declarations are complete before marking your task complete. Do not rely on CRITIC or QA-B to catch your failures.
@@ -0,0 +1,19 @@
+ # LIBDEV Step: Implement
+
+ ## Step 4: Plan implementation approach
+ Order: public API surface design (exports map) -> core module implementation -> type declarations (.d.ts/hints) -> CJS+ESM entry points -> side effects verification. Identify peer dependencies that must be declared.
+
+ ## Step 5: Design public API surface
+ Per reqs-brief: define the exports map, entry points, and module structure. Every public export must be intentional -- no internal modules leaking into the public API. Record in `libdev-handoff.md#public-api-surface`.
+
+ ## Step 6: Implement core modules
+ Per reqs-brief: implement the library's core logic, utilities, and domain types. Each module must be independently importable via the exports map. Record in `libdev-handoff.md#files-created-modified`.
+
+ ## Step 7: Implement type declarations
+ Per reqs-brief: create or generate type declarations for every public export. TypeScript: .d.ts files or inline types. Python: type hints and py.typed marker. Rust: public type signatures. Types must match the actual implementation signatures exactly.
+
+ ## Step 8: Configure entry points and packaging
+ Per reqs-brief: configure main (CJS), module (ESM), and types entry points. Set sideEffects field. Declare peer dependencies with correct version ranges. Record in `libdev-handoff.md#package-configuration`.
+
+ ## Step 9: Verify no accidental side effects
+ Import the library's main entry point and verify no observable side effects execute (no console output, no network calls, no global mutations, no file I/O). If side effects are intentional, document them and set sideEffects field accordingly. Record decisions in `libdev-handoff.md#implementation-decisions`.
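Editorial sketch (not part of the package diff): for a JavaScript or TypeScript library, the packaging fields named in Step 8 could be checked mechanically before handoff. `checkManifest` and the `PackageManifest` shape are hypothetical, the sketch covers only the fields the step mentions, and it does not attempt to validate an `exports` map or version range syntax.

```typescript
// Only the package.json fields that Step 8 names.
interface PackageManifest {
  main?: string; // CJS entry point
  module?: string; // ESM entry point
  types?: string; // type declarations entry point
  sideEffects?: boolean | string[];
  peerDependencies?: Record<string, string>;
}

// Return a list of human-readable problems; empty means the fields are set.
function checkManifest(pkg: PackageManifest): string[] {
  const problems: string[] = [];
  if (!pkg.main) problems.push("missing main (CJS entry point)");
  if (!pkg.module) problems.push("missing module (ESM entry point)");
  if (!pkg.types) problems.push("missing types entry point");
  if (pkg.sideEffects === undefined) problems.push("sideEffects field not set");

  const peers = pkg.peerDependencies ?? {};
  for (const name of Object.keys(peers)) {
    const range = peers[name];
    // A wildcard or empty range is flagged as too loose for a peer dependency.
    if (range === undefined || range === "" || range === "*") {
      problems.push(`peer dependency ${name} has no meaningful version range`);
    }
  }
  return problems;
}
```

Flagging `*` as too loose is a design choice of this sketch, not a rule stated in the source.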
@@ -0,0 +1,13 @@
+ # LIBDEV Step: Read Inputs
+
+ ## Step 1: Read reqs-brief.md
+ Understand: acceptance criteria, business rules, public API surface requirements, export contracts, type declarations, packaging constraints, peer dependency requirements, cross-cutting concerns.
+
+ ## Step 2: Read qa-test-spec.md
+ Understand: what tests to write for each AC, expected assertions, consumer-simulation requirements, test case names and structure.
+
+ ## Step 3: Read correction directives
+ Read `{correction_directives}`. Apply all directives targeting LIBDEV. Note any conflicts with default behavior and follow the directive.
+
+ ## Step 3b: Query Knowledge Agent (Conditional)
+ If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am LIBDEV implementing {story_id} using {tech_stack.language} with {tech_stack.module_system} module system and {tech_stack.type_system}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.
@@ -0,0 +1,16 @@
+ # LIBDEV Step: Write Tests
+
+ ## Step 10: Write test code
+ Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `libdev-handoff.md#test-files-written`.
+
+ **Critical rule:** Tests must import the actual library using its public API -- the same way a consumer would. No importing from internal/private module paths. No mocked imports. If a test cannot exercise the library through its public exports, the API surface is wrong, not the test.
+
+ ## Step 11: Write consumer-simulation tests
+ Beyond unit tests, write at least one consumer-simulation test per major export that:
+ 1. Imports the library the way a real consumer would (CJS `require()` and ESM `import`)
+ 2. Exercises the documented usage example from the public API surface
+ 3. Verifies the return types match declared type signatures
+ 4. Confirms no side effects on import (no console output, no global mutations)
+
+ ## Step 12: Run tests, verify all pass
+ Run the full library test suite. All tests must pass. Record results in `libdev-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
@@ -0,0 +1,49 @@
+ # MCP Server Estimation
+
+ **Purpose:** Assign a Fibonacci story point estimate for MCP server implementation complexity. This is a lightweight estimation step -- no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+
+ **Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+
+ ## Step 1: Read Groomed Specs
+
+ Read and assess:
+ - `{story_output_dir}/reqs-brief.md` -- REQUIRED
+ - `{story_output_dir}/qa-test-spec.md` -- REQUIRED
+
+ ## Step 2: Assess Complexity Factors
+
+ Evaluate each factor and record your assessment:
+
+ | Factor | Assessment | Weight |
+ |--------|-----------|--------|
+ | **Tool count and handler complexity** | How many tools? Are handlers simple (stateless transform) or complex (stateful, multi-step, external calls)? | High |
+ | **New patterns vs established** | Greenfield server vs adding tools to existing server? New transport vs reusing existing? | High |
+ | **Input schema complexity** | Simple flat params vs deeply nested schemas, conditional validation, cross-field dependencies? | Medium |
+ | **Error surface** | How many distinct failure modes per tool? Complex error mapping? External service failures to handle? | Medium |
+ | **Test complexity** | How hard will QA-B's protocol tests be to pass? Complex fixtures, multi-step tool sequences, stateful interactions? | Medium |
+
+ ## Step 3: Select Fibonacci Value
+
+ Map your assessment to the Fibonacci scale:
+
+ | Points | Typical MCP Server Scope |
+ |--------|------------------------|
+ | 1 | Config change, single tool param update, no schema change |
+ | 2 | Simple single-tool server, trivial inputSchema, basic transport |
+ | 3 | Standard server with 2-3 tools, input validation, proper error model |
+ | 5 | Multi-tool server with complex schemas, stateful handlers, external integrations |
+ | 8 | Complex server with many tools, multiple content types, advanced error handling, transport edge cases |
+ | 13 | Large server spanning multiple domains, custom transport, extensive protocol compliance surface |
+ | 21 | Epic-scale: new transport implementation or major protocol extension (consider splitting the story) |
+
+ **Calibration context (if `{estimation_model}` is `calibrated`):**
+ If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints -- e.g., "input validation consistently under-pointed by 1 tier" or "stories with 5+ tools average 8 points."
+
+ ## Step 4: Write Estimate
+
+ Write to `{story_output_dir}/mcp-dev-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+ - Fibonacci value with brief rationale (2-3 sentences)
+ - Factor assessments from Step 2
+ - Calibration adjustments applied (if any)
+
+ Send: `[ESTIMATION] MCP-DEV estimates {story_id} at {points} points. See mcp-dev-estimation.md.`
@@ -0,0 +1,9 @@
+ # MCP-DEV Step: Handoff
+
+ Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+
+ ## Step 13: Write mcp-dev-handoff.md
+ Complete all sections of the handoff document using the template at `.valent-pipeline/templates/mcp-dev-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] MCP server implementation complete. See mcp-dev-handoff.md#orchestrator-summary.`
+
+ ## Independent Verification Requirement
+ You must independently verify: all tests pass against the complete server implementation before marking your task complete. Do not rely on CRITIC or QA-B to catch your failures.
@@ -0,0 +1,29 @@
1
+ # MCP-DEV Step: Implement
2
+
3
+ ## Step 4: Plan implementation approach
4
+ Order: server scaffolding + transport setup -> initialize handler + capability declarations -> tool registration + inputSchema definitions -> tool handlers (try-catch with isError:true) -> input validation against declared schemas. Identify any shared files or dependencies.
5
+
6
+ ## Step 5: Scaffold server and transport
7
+ Set up MCP server instance and configure transport layer (stdio, SSE, or HTTP per `{tech_stack.transport_type}`). Ensure the transport can accept connections and route JSON-RPC messages to handlers. Record in `mcp-dev-handoff.md#files-created-modified`.
8
+
9
+ ## Step 6: Implement initialize handler and capabilities
10
+ Implement the `initialize` JSON-RPC method. Return server info and capability declarations that exactly match the tools and features being implemented. No phantom capabilities. Record in `mcp-dev-handoff.md#capabilities-declared`.
11
+
12
+ ## Step 7: Register tools and define inputSchema
13
+ Register each tool with its name, description, and JSON Schema for inputs. The declared inputSchema is the contract -- it must be validated at runtime in Step 8. Record in `mcp-dev-handoff.md#tools-implemented`.
14
+
15
+ ## Step 8: Implement tool handlers
16
+ For each registered tool, implement the handler logic. Every handler must:
17
+ 1. Wrap execution in try-catch
18
+ 2. Validate inputs against the declared inputSchema; reject with JSON-RPC `-32602` on schema violation
19
+ 3. On tool-level failure (business logic errors), return result with `isError: true` and a descriptive error message
20
+ 4. On unexpected exceptions, catch and return JSON-RPC `-32603` (Internal error) -- never let unhandled exceptions kill the transport
21
+ Record decisions in `mcp-dev-handoff.md#implementation-decisions`. Record error model in `mcp-dev-handoff.md#error-model`.
22
+
23
+ ## Step 9: Implement protocol error handling
24
+ Ensure the server correctly returns JSON-RPC error codes for protocol-level failures:
25
+ - `-32700` for malformed JSON (parse errors)
26
+ - `-32600` for invalid JSON-RPC requests (missing or non-string `method`, missing `jsonrpc: "2.0"`); a message without an `id` is a notification, not an invalid request
27
+ - `-32601` for unknown methods
28
+ - `-32602` for invalid params (schema validation failures)
29
+ - `-32603` for internal server errors (unhandled exceptions)
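The protocol-level codes above can be illustrated with a small classifier that runs before dispatch. This is a sketch under assumptions: `classifyMessage` and the `knownMethods` registry are hypothetical, and JSON-RPC batch arrays are treated as invalid purely to keep the example short.

```typescript
// JSON-RPC 2.0 error codes for failures detected before a handler runs.
const PARSE_ERROR = -32700;
const INVALID_REQUEST = -32600;
const METHOD_NOT_FOUND = -32601;

// Returns the protocol-level error code for a raw message,
// or null when the message is well-formed and can be dispatched.
function classifyMessage(raw: string, knownMethods: Set<string>): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return PARSE_ERROR; // malformed JSON
  }
  // Batches are out of scope for this sketch, so arrays are rejected too.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return INVALID_REQUEST;
  }
  const message = parsed as Record<string, unknown>;
  const method = message["method"];
  if (message["jsonrpc"] !== "2.0" || typeof method !== "string") {
    return INVALID_REQUEST;
  }
  if (!knownMethods.has(method)) {
    return METHOD_NOT_FOUND;
  }
  return null;
}
```

The remaining two codes (`-32602` and `-32603`) depend on the tool being called, so they are decided inside the handler path rather than here.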
@@ -0,0 +1,13 @@
1
+ # MCP-DEV Step: Read Inputs
2
+
3
+ ## Step 1: Read reqs-brief.md
4
+ Understand: acceptance criteria, business rules, tool definitions (names, descriptions, inputSchema), transport requirements (stdio/SSE/HTTP), capability declarations, error handling expectations, cross-cutting concerns.
5
+
6
+ ## Step 2: Read qa-test-spec.md
7
+ Understand: what tests to write for each AC, expected assertions, protocol compliance verification requirements, test case names and structure.
8
+
9
+ ## Step 3: Read correction directives
10
+ Read `{correction_directives}`. Apply all directives targeting MCP-DEV. Where a directive conflicts with default behavior, note the conflict and follow the directive.
11
+
12
+ ## Step 3b: Query Knowledge Agent (Conditional)
13
+ If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am MCP-DEV implementing {story_id} using {tech_stack.mcp_sdk} with {tech_stack.transport_type} transport.` If no Knowledge Agent is spawned, or none responds within a reasonable time, proceed without it.
@@ -0,0 +1,19 @@
1
+ # MCP-DEV Step: Write Tests
2
+
3
+ ## Step 10: Write test code
4
+ Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `mcp-dev-handoff.md#test-files-written`.
5
+
6
+ **Critical requirement: real transport, no mocked transport.** Tests must spawn a real MCP server instance and communicate over the actual transport (stdio pipe, SSE connection, or HTTP). Do not mock the transport layer. The test client sends real JSON-RPC messages and asserts on real responses.
7
+
8
+ ## Step 11: Test protocol compliance
9
+ Tests must cover the full protocol handshake and lifecycle:
10
+ 1. `initialize` request returns correct server info and capabilities
11
+ 2. `tools/list` returns all registered tools with correct inputSchema
12
+ 3. `tools/call` for each tool with valid params returns expected result shape
13
+ 4. `tools/call` with invalid params returns JSON-RPC `-32602`
14
+ 5. `tools/call` triggering tool failure returns result with `isError: true`
15
+ 6. Unknown method returns JSON-RPC `-32601`
16
+ 7. Malformed JSON returns JSON-RPC `-32700`
17
+
18
+ ## Step 12: Run tests, verify all pass
19
+ Run the full test suite. All tests must pass. Record results in `mcp-dev-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
@@ -0,0 +1,67 @@
1
+ # MOBILE Step: Emulator Lifecycle Management
2
+
3
+ ## Step 7b: Boot Emulator
4
+
5
+ ### Android Emulator
6
+ 1. List available AVDs: `emulator -list-avds`
7
+ 2. Boot emulator: `emulator -avd {avd_name} -no-snapshot-load -no-audio -no-window &`
8
+ 3. Wait for boot: `adb wait-for-device` then poll `adb shell getprop sys.boot_completed` until it returns `1` (max 120s, 10 retries at 12s intervals)
9
+ 4. If boot fails after 120s: kill process (`adb emu kill`), retry once with fresh boot. If second attempt fails, file `[BLOCKER]` to Lead with emulator logs.
10
+
11
+ Record emulator config in `mobile-handoff.md#emulator-configuration`.
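The Android boot wait (10 probes, 12s apart, 120s budget) might look like the loop below. The `probe` and `sleep` callbacks are hypothetical injection points: in a real script the probe would run `adb shell getprop sys.boot_completed` and the sleep would block for the given interval. Injecting them keeps the retry logic testable without an emulator.

```typescript
// Polls until the probe reports "1", the value sys.boot_completed takes on a booted device.
// Returns false once the retry budget is exhausted so the caller can kill and retry.
function waitForBoot(
  probe: () => string,
  sleep: (ms: number) => void,
  retries: number = 10,
  intervalMs: number = 12000,
): boolean {
  for (let attempt = 1; attempt <= retries; attempt++) {
    if (probe().trim() === "1") {
      return true;
    }
    sleep(intervalMs); // wait one interval after every failed probe
  }
  return false;
}
```

With the defaults, a device that never boots costs 10 x 12s = 120s of waiting before the function gives up, matching the budget in step 3.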
12
+
13
+ ### iOS Simulator (Mac Only)
14
+ 1. Verify Mac host: `uname -s` must return `Darwin`. If not Mac, skip iOS entirely.
15
+ 2. List available simulators: `xcrun simctl list devices available`
16
+ 3. Boot simulator: `xcrun simctl boot {device_udid}`
17
+ 4. Wait for boot: poll `xcrun simctl list devices | grep {device_udid} | grep Booted` (max 60s), so that a different, already-booted simulator is not mistaken for the target
18
+ 5. If boot fails: `xcrun simctl shutdown all`, retry once. If second attempt fails, file `[BLOCKER]` to Lead.
19
+
20
+ Record simulator config in `mobile-handoff.md#emulator-configuration`.
21
+
22
+ ## Step 7c: Build and Install App
23
+
24
+ ### React Native
25
+ 1. Start Metro bundler: `npx react-native start --reset-cache &`
26
+ 2. Wait for Metro ready: poll `http://localhost:8081/status` until it returns `packager-status:running` (max 60s). If Metro does not become ready, check whether port 8081 is already in use by another process.
27
+ 3. Android build + install: `npx react-native run-android`
28
+ 4. iOS build + install (Mac only): `npx react-native run-ios --simulator="{simulator_name}"`
29
+ 5. Verify main activity/screen renders within 10s of launch. If not, capture `adb logcat` output and file P1 bug.
30
+
31
+ ### Flutter
32
+ 1. Resolve dependencies: `flutter pub get`
33
+ 2. Android build + install: `flutter build apk --debug && flutter install --device-id {emulator_id}`
34
+ 3. iOS build + install (Mac only): `flutter build ios --debug --simulator && flutter install --device-id {simulator_id}`
35
+ 4. Verify app launches and main screen renders within 10s.
36
+
37
+ ### Native Module Recovery (React Native)
38
+ If native module errors occur during build:
39
+ - iOS: run `cd ios && pod install && cd ..` and retry build
40
+ - Android: run `cd android && ./gradlew clean && cd ..` and retry build
41
+ - If native module build fails after retry, file P1 bug with full build output.
42
+
43
+ ## Step 7d: State Isolation Between Maestro Flows
44
+
45
+ Before each Maestro flow execution:
46
+ - **Android:** `adb shell pm clear {app_package_name}`
47
+ - **iOS:** `xcrun simctl terminate {device_udid} {bundle_id}` followed by `xcrun simctl privacy {device_udid} reset all {bundle_id}`
48
+
49
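+ This prevents state leakage between test flows: every flow starts from a clean app state. On iOS, the commands above stop the app and reset its permission grants but do not delete files in the app container; if a flow depends on persisted data being absent, also clear state in the flow itself (for example with Maestro's `clearState`).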
50
+
51
+ ## Step 7e: Pre-Grant Permissions
52
+
53
+ Before test execution, pre-grant required permissions to avoid UI dialog interference:
54
+ - **Android:** `adb shell pm grant {package} android.permission.{PERMISSION}` for each required permission
55
+ - **iOS:** `xcrun simctl privacy {device_udid} grant {permission-type} {bundle_id}`
56
+
57
+ Never depend on UI dialogs for permission grants during E2E tests.
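As a rough sketch, the grant commands can be generated from a list of required permissions so that none is forgotten. The function names and the example identifiers in the usage note are hypothetical; the command templates are the ones given above.

```typescript
// Builds one `adb shell pm grant` command per Android permission name
// (the part after `android.permission.`).
function androidGrantCommands(packageName: string, permissions: string[]): string[] {
  return permissions.map(
    (permission) => `adb shell pm grant ${packageName} android.permission.${permission}`,
  );
}

// Builds one `xcrun simctl privacy ... grant` command per iOS permission type.
function iosGrantCommands(deviceUdid: string, bundleId: string, permissionTypes: string[]): string[] {
  return permissionTypes.map(
    (permissionType) => `xcrun simctl privacy ${deviceUdid} grant ${permissionType} ${bundleId}`,
  );
}
```

For example, `androidGrantCommands("com.example.app", ["CAMERA"])` yields a single command granting `android.permission.CAMERA`; the resulting strings would then be passed to whatever shell runner the pipeline uses.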
58
+
59
+ ## Step 7f: Crash Recovery
60
+
61
+ If emulator/simulator crashes or becomes unresponsive during test execution:
62
+ 1. Detect via `adb devices` showing offline or Maestro flow timeout
63
+ 2. Capture crash logs: `adb logcat -d > crash-{timestamp}.log`
64
+ 3. Kill stale processes: `adb emu kill` / `xcrun simctl shutdown all`
65
+ 4. Re-boot with clean state (Step 7b)
66
+ 5. Resume from the last incomplete Maestro flow (do not re-run passed flows)
67
+ 6. Max 2 crash recovery attempts per platform. After 2 crashes, file P1 bug with crash logs and stop testing on that platform.
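The resume-or-stop decision in steps 5 and 6 can be expressed as a small pure function. This sketch reads "max 2 crash recovery attempts" as: recover while fewer than two recoveries have been used on the platform, otherwise stop. `planRecovery` and `RecoveryPlan` are hypothetical names.

```typescript
type RecoveryPlan =
  | { action: "resume"; remainingFlows: string[] }
  | { action: "stop"; reason: string };

const MAX_CRASH_RECOVERIES = 2;

// Decides what to do after a crash, given how many recoveries were already used.
function planRecovery(
  allFlows: string[],
  passedFlows: string[],
  recoveriesUsed: number,
): RecoveryPlan {
  if (recoveriesUsed >= MAX_CRASH_RECOVERIES) {
    return { action: "stop", reason: "crash recovery budget exhausted; file P1 bug with crash logs" };
  }
  // Keep the original order and skip flows that already passed.
  const remainingFlows = allFlows.filter((flow) => passedFlows.indexOf(flow) === -1);
  return { action: "resume", remainingFlows };
}
```

The first entry of `remainingFlows` is the flow that was interrupted, so execution resumes there after the emulator is re-booted.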
@@ -0,0 +1,51 @@
1
+ # Mobile Estimation
2
+
3
+ **Purpose:** Assign a Fibonacci story point estimate for mobile implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
4
+
5
+ **Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
6
+
7
+ ## Step 1: Read Groomed Specs
8
+
9
+ Read and assess:
10
+ - `{story_output_dir}/reqs-brief.md` — REQUIRED
11
+ - `{story_output_dir}/uxa-spec.md` — REQUIRED (if UI profile active)
12
+ - `{story_output_dir}/qa-test-spec.md` — REQUIRED
13
+
14
+ ## Step 2: Assess Complexity Factors
15
+
16
+ Evaluate each factor and record your assessment:
17
+
18
+ | Factor | Assessment | Weight |
19
+ |--------|-----------|--------|
20
+ | **Screen count** | How many new or modified screens? Simple displays vs complex interactive screens? | High |
21
+ | **Navigation complexity** | Deep linking, nested stacks/tabs/drawers, modal flows, conditional navigation? | High |
22
+ | **Platform-specific requirements** | Android-only vs cross-platform? Platform-divergent behavior? | Medium |
23
+ | **Native module integration** | Camera, GPS, push notifications, biometrics, file system? | Medium |
24
+ | **State management complexity** | Local state vs global state? Offline persistence? Optimistic updates? | Medium |
25
+ | **API integration surface** | Number of endpoints consumed, real-time updates, file uploads? | Medium |
26
+
27
+ ## Step 3: Select Fibonacci Value
28
+
29
+ Map your assessment to the Fibonacci scale:
30
+
31
+ | Points | Typical Mobile Scope |
32
+ |--------|---------------------|
33
+ | 1 | Text change, style tweak, single prop addition |
34
+ | 2 | Simple display screen, minor layout change |
35
+ | 3 | Interactive screen with local state, form with validation |
36
+ | 5 | Multi-screen feature, navigation setup, API integration |
37
+ | 8 | Complex interactive feature, cross-platform divergence, native modules |
38
+ | 13 | Large feature with offline support, complex navigation, extensive platform handling |
39
+ | 21 | Epic-scale: new navigation paradigm or major platform integration (consider splitting) |
40
+
41
+ **Calibration context (if `{estimation_model}` is `calibrated`):**
42
+ If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints.
43
+
44
+ ## Step 4: Write Estimate
45
+
46
+ Write to `{story_output_dir}/mobile-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
47
+ - Fibonacci value with brief rationale (2-3 sentences)
48
+ - Factor assessments from Step 2
49
+ - Calibration adjustments applied (if any)
50
+
51
+ Send: `[ESTIMATION] MOBILE estimates {story_id} at {points} points. See mobile-estimation.md.`
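One way to guard the final message is to reject values that are off the Fibonacci scale before formatting it. The helper name and the story identifier in the usage note are hypothetical; the message template is the one above.

```typescript
const FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21];

// Formats the estimation message, refusing point values that are not on the scale.
function estimationMessage(storyId: string, points: number): string {
  if (FIBONACCI_SCALE.indexOf(points) === -1) {
    throw new Error(`points must be one of ${FIBONACCI_SCALE.join(", ")}; got ${points}`);
  }
  return `[ESTIMATION] MOBILE estimates ${storyId} at ${points} points. See mobile-estimation.md.`;
}
```

Calling `estimationMessage("STORY-42", 5)` produces the message for a 5-point story, while a value such as 4 raises an error instead of silently emitting an invalid estimate.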
@@ -0,0 +1,30 @@
1
+ # MOBILE Step: Flutter Specifics
2
+
3
+ This step is loaded conditionally when `{tech_stack.mobile_framework}` is `flutter`. Read before implementing.
4
+
5
+ ## Flutter Build Configuration
6
+ - Debug builds for testing: `flutter build apk --debug` / `flutter build ios --debug --simulator`
7
+ - Resolve dependencies before build: `flutter pub get`
8
+ - Verify Flutter SDK version matches project constraints in `pubspec.yaml`
9
+ - Hot reload: disable during E2E (use cold start for each Maestro flow via `clearState`)
10
+
11
+ ## Flutter Testing Patterns
12
+ - Widget tests: use `flutter_test` with `WidgetTester` for component isolation
13
+ - Integration tests: use Maestro YAML flows (NOT `flutter_driver` or `integration_test` for pipeline E2E)
14
+ - State management: clear providers/blocs/cubits between test suites
15
+ - Platform channels: test with real native code, not mock method channel handlers
16
+
17
+ ## Flutter-Specific Emulator Setup
18
+ - Android: standard AVD boot, then `flutter install --device-id {emulator_id}`
19
+ - iOS: `open -a Simulator` if not already running, then `flutter install --device-id {simulator_id}`
20
+ - Verify device connection: `flutter devices` must list the target device
21
+
22
+ ## Area Labels for Testing
23
+ - Use `Key` with `ValueKey('testID')` for Flutter widgets
24
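+ - Maestro's `tapOn` with the `id:` selector matches accessibility identifiers rather than widget keys, so also expose the same value through `Semantics(identifier: ...)` for it to resolve on both platforms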
25
+ - Follow the area label convention from uxa-spec.md: `{screen}-{section}-{element}`
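A small helper can keep labels consistent with the `{screen}-{section}-{element}` convention. This is a sketch under one assumption not stated in the spec: individual segments contain no hyphens, so a label always splits back into exactly three parts. `areaLabel` is a hypothetical name.

```typescript
// Joins three segments into an area label, rejecting empty or hyphenated segments
// so the result can always be split back into screen, section, and element.
function areaLabel(screen: string, section: string, element: string): string {
  const parts = [screen, section, element];
  for (const part of parts) {
    if (part.length === 0 || part.indexOf("-") !== -1) {
      throw new Error(`invalid area label segment: "${part}"`);
    }
  }
  return parts.join("-");
}
```

For instance, `areaLabel("login", "form", "submit")` gives `login-form-submit`, which is the value to use for both the widget identifier and the Maestro selector.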
26
+
27
+ ## Offline Testing
28
+ - Use emulator console commands for network simulation: `adb emu network delay gprs` / `adb emu network speed gsm` (these throttle latency and bandwidth rather than cutting connectivity)
29
+ - Do NOT use `adb shell svc wifi disable` (unreliable on emulators)
30
+ - Test offline-capable features per reqs-brief offline requirements
@@ -0,0 +1,18 @@
1
+ # MOBILE Step: Handoff
2
+
3
+ Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
4
+
5
+ ## Step 11: Write mobile-handoff.md
6
+ Complete all sections of the handoff document using the template at `.valent-pipeline/templates/mobile-handoff.template.md`. Set `status: completed` in frontmatter.
7
+
8
+ If iOS tests were deferred (host is not Mac):
9
+ - Set `ios_deferred: true` in frontmatter
10
+ - Complete the `Deferred iOS Tests` section listing all unexecuted iOS flows
11
+ - Include in the inbox message: `[IOS-DEFERRED] {count} iOS Maestro flows deferred. Run /run-deferred-tests on Mac to complete.`
12
+
13
+ Notify lead via inbox: `[DONE] Mobile implementation complete. See mobile-handoff.md#orchestrator-summary.`
14
+
15
+ ## Independent Verification Requirement
16
+ All Android tests must pass before marking complete. If on Mac, all iOS tests must also pass. Do not mark complete with failing Android tests. Do not rely on BEND or CRITIC to catch your failures.
17
+
18
+ **Smoke test gate:** The app-level smoke test (Step 9b) must pass before sending `[DONE]`. If it fails, the app's entry point is not wired to your deliverable -- fix the wiring before marking complete.
@@ -0,0 +1,20 @@
1
+ # MOBILE Step: Implement
2
+
3
+ ## Step 3: Detect host platform
4
+ Run platform detection to determine available targets:
5
+ - `uname -s` returns `Darwin` → Mac: both Android and iOS targets available
6
+ - `uname -s` returns `Linux`, or `MINGW*`/`MSYS*` (Git Bash or MSYS2 on Windows) → Linux/Windows: Android only, iOS deferred
7
+
8
+ Record platform capabilities in `mobile-handoff.md#platform-coverage`. If iOS is unavailable, set `ios_deferred: true` in handoff frontmatter.
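The detection rule in Step 3 maps directly onto the output of `uname -s`. In the sketch below, `detectTargets` and its return shape are hypothetical, and treating an unrecognized host as Android-only with iOS deferred is an assumption rather than something the step specifies.

```typescript
type Host = "mac" | "linux" | "windows" | "unknown";
type PlatformTargets = { host: Host; android: boolean; ios: boolean; iosDeferred: boolean };

// Maps the output of `uname -s` to the targets that can be built and tested on this host.
function detectTargets(unameOutput: string): PlatformTargets {
  const kernel = unameOutput.trim();
  let host: Host = "unknown";
  if (kernel === "Darwin") {
    host = "mac";
  } else if (kernel === "Linux") {
    host = "linux";
  } else if (/^(MINGW|MSYS)/.test(kernel)) {
    host = "windows"; // Git Bash and MSYS2 report names such as MINGW64_NT-...
  }
  const ios = host === "mac";
  // Assumption: any non-Mac host, including unknown ones, still targets Android.
  return { host, android: true, ios, iosDeferred: !ios };
}
```

The `iosDeferred` field corresponds to the `ios_deferred: true` frontmatter flag that the handoff expects when iOS is unavailable.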
9
+
10
+ ## Step 4: Plan screen architecture
11
+ From uxa-spec.md screen specifications (if present) or reqs-brief.md: identify screens, navigation structure (stack, tab, drawer), shared components, deep link URI patterns. Map to framework conventions for `{tech_stack.mobile_framework}`.
12
+
13
+ ## Step 5: Implement screens and navigation
14
+ Per spec: create screen components, navigation setup (React Navigation / Flutter Navigator), deep linking configuration. Apply `testID` attributes matching the area label system from uxa-spec.md. Record in `mobile-handoff.md#screens-implemented`.
15
+
16
+ ## Step 6: Implement components
17
+ Per spec: forms, lists, modals, gesture handlers, platform-specific components. Wire to backend API endpoints per `bend-handoff.md#api-endpoints-implemented` (if BEND is active). Record in `mobile-handoff.md#components-created`.
18
+
19
+ ## Step 7: Implement platform-specific behavior
20
+ Handle platform divergences: permissions (camera, location, notifications), native modules, platform-specific UI (Android back button, iOS swipe-to-go-back, safe areas, notch handling). Use `Platform.OS` / `Platform.select` (React Native) or `Platform.isAndroid` / `Platform.isIOS` (Flutter) for divergent behavior. Record decisions in `mobile-handoff.md#implementation-decisions`.