npm - valent-pipeline - Versions diffs - 0.2.20 → 0.2.21 - Mend

valent-pipeline 0.2.20 → 0.2.21

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (110)

package/README.md +438 -0
package/package.json +1 -1
package/pipeline/agents-manifest.yaml +61 -1
package/pipeline/docs/agent-reference.md +82 -23
package/pipeline/docs/design/refactor-checklist.md +111 -0
package/pipeline/docs/index.md +60 -0
package/pipeline/docs/pipeline-overview.md +4 -0
package/pipeline/prompts/bend.md +5 -11
package/pipeline/prompts/critic.md +9 -0
package/pipeline/prompts/data.md +59 -0
package/pipeline/prompts/docgen.md +61 -0
package/pipeline/prompts/fend.md +3 -10
package/pipeline/prompts/iac.md +70 -0
package/pipeline/prompts/lead.md +81 -3
package/pipeline/prompts/libdev.md +61 -0
package/pipeline/prompts/mcp-dev.md +59 -0
package/pipeline/prompts/mobile.md +92 -0
package/pipeline/prompts/qa-a.md +1 -1
package/pipeline/prompts/qa-b.md +1 -1
package/pipeline/prompts/reqs.md +5 -1
package/pipeline/scripts/db-bootstrap.ts +1 -1
package/pipeline/scripts/embed-sqlite.ts +5 -0
package/pipeline/steps/common/quality-standards.md +19 -0
package/pipeline/steps/critic/data-pipeline.md +28 -0
package/pipeline/steps/critic/document-generation.md +21 -0
package/pipeline/steps/critic/iac.md +29 -0
package/pipeline/steps/critic/library.md +24 -0
package/pipeline/steps/critic/mcp-server.md +24 -0
package/pipeline/steps/critic/mobile-app.md +29 -0
package/pipeline/steps/data/estimate.md +51 -0
package/pipeline/steps/data/handoff.md +9 -0
package/pipeline/steps/data/implement.md +16 -0
package/pipeline/steps/data/read-inputs.md +13 -0
package/pipeline/steps/data/write-tests.md +13 -0
package/pipeline/steps/docgen/estimate.md +49 -0
package/pipeline/steps/docgen/handoff.md +9 -0
package/pipeline/steps/docgen/implement.md +19 -0
package/pipeline/steps/docgen/read-inputs.md +13 -0
package/pipeline/steps/docgen/write-tests.md +15 -0
package/pipeline/steps/iac/estimate.md +50 -0
package/pipeline/steps/iac/handoff.md +9 -0
package/pipeline/steps/iac/implement.md +19 -0
package/pipeline/steps/iac/read-inputs.md +13 -0
package/pipeline/steps/iac/write-tests.md +20 -0
package/pipeline/steps/judge/ship-decision.md +14 -1
package/pipeline/steps/libdev/estimate.md +49 -0
package/pipeline/steps/libdev/handoff.md +9 -0
package/pipeline/steps/libdev/implement.md +19 -0
package/pipeline/steps/libdev/read-inputs.md +13 -0
package/pipeline/steps/libdev/write-tests.md +16 -0
package/pipeline/steps/mcp-dev/estimate.md +49 -0
package/pipeline/steps/mcp-dev/handoff.md +9 -0
package/pipeline/steps/mcp-dev/implement.md +29 -0
package/pipeline/steps/mcp-dev/read-inputs.md +13 -0
package/pipeline/steps/mcp-dev/write-tests.md +19 -0
package/pipeline/steps/mobile/emulator-lifecycle.md +67 -0
package/pipeline/steps/mobile/estimate.md +51 -0
package/pipeline/steps/mobile/flutter.md +30 -0
package/pipeline/steps/mobile/handoff.md +18 -0
package/pipeline/steps/mobile/implement.md +20 -0
package/pipeline/steps/mobile/react-native.md +32 -0
package/pipeline/steps/mobile/read-inputs.md +10 -0
package/pipeline/steps/mobile/write-tests.md +59 -0
package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
package/pipeline/steps/orchestration/sprint-groom.md +4 -0
package/pipeline/steps/orchestration/sprint-size.md +19 -12
package/pipeline/steps/orchestration/validate-story-inputs.md +9 -0
package/pipeline/steps/qa-a/data-pipeline.md +32 -0
package/pipeline/steps/qa-a/document-generation.md +52 -0
package/pipeline/steps/qa-a/iac.md +30 -0
package/pipeline/steps/qa-a/library.md +42 -0
package/pipeline/steps/qa-a/mcp-server.md +31 -0
package/pipeline/steps/qa-a/mobile-app.md +59 -0
package/pipeline/steps/qa-b/data-pipeline.md +48 -0
package/pipeline/steps/qa-b/document-generation.md +47 -0
package/pipeline/steps/qa-b/iac.md +44 -0
package/pipeline/steps/qa-b/library.md +61 -0
package/pipeline/steps/qa-b/mcp-server.md +40 -0
package/pipeline/steps/qa-b/mobile-app.md +71 -0
package/pipeline/steps/readiness/standalone-review.md +7 -2
package/pipeline/steps/reqs/data-pipeline.md +56 -0
package/pipeline/steps/reqs/document-generation.md +55 -0
package/pipeline/steps/reqs/draft-brief.md +10 -0
package/pipeline/steps/reqs/iac.md +63 -0
package/pipeline/steps/reqs/library.md +56 -0
package/pipeline/steps/reqs/mcp-server.md +48 -0
package/pipeline/steps/reqs/mobile-app.md +54 -0
package/pipeline/steps/reqs/self-review.md +5 -3
package/pipeline/task-graphs/backend-api.yaml +19 -2
package/pipeline/task-graphs/data-pipeline.yaml +29 -12
package/pipeline/task-graphs/document-generation.yaml +29 -12
package/pipeline/task-graphs/frontend-only.yaml +19 -2
package/pipeline/task-graphs/fullstack-web.yaml +19 -2
package/pipeline/task-graphs/library.yaml +29 -12
package/pipeline/task-graphs/mcp-server.yaml +29 -12
package/pipeline/task-graphs/mobile-app.yaml +171 -0
package/pipeline/templates/bugs.template.md +1 -1
package/pipeline/templates/critic-review.template.md +1 -1
package/pipeline/templates/data-handoff.template.md +96 -0
package/pipeline/templates/docgen-handoff.template.md +83 -0
package/pipeline/templates/iac-handoff.template.md +83 -0
package/pipeline/templates/judge-decision.template.md +11 -1
package/pipeline/templates/libdev-handoff.template.md +82 -0
package/pipeline/templates/mcp-dev-handoff.template.md +87 -0
package/pipeline/templates/mobile-handoff.template.md +122 -0
package/pipeline/templates/reqs-brief.template.md +60 -4
package/skills/valent-run-deferred-tests/SKILL.md +109 -0
package/src/commands/db-rebuild.js +5 -0
package/src/lib/config-schema.js +1 -1
package/src/lib/db.js +1 -1

package/pipeline/steps/docgen/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# DOCGEN Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, template definitions, variable schemas, output format requirements, asset dependencies (fonts, images, stylesheets), encoding requirements, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, output validation requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting DOCGEN. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am DOCGEN implementing {story_id} using {tech_stack.template_engine} with output formats {tech_stack.output_formats}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.

package/pipeline/steps/docgen/write-tests.md ADDED

@@ -0,0 +1,15 @@
+# DOCGEN Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `docgen-handoff.md#test-files-written`.
+Tests must invoke real document generation -- no mocked renderers. The actual template engine must process templates and produce real output. Parse and validate the generated output programmatically:
+- For HTML: parse the DOM and assert on structure and content.
+- For PDF: extract text/metadata and assert on content.
+- For Markdown: parse and assert on structure and content.
+## Step 11: Test with edge-case data
+Per qa-test-spec and quality standards: tests must exercise null values, missing optional fields, unicode characters (CJK, emoji, RTL text), extremely long strings, empty collections, and special characters that could break template syntax. Verify that auto-escaping prevents injection in all cases.
+## Step 12: Run tests, verify all pass
+Run the full test suite. All tests must pass. Record results in `docgen-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
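
The escaping and edge-case requirements in Steps 10 and 11 can be illustrated with a small sketch. `renderTemplate` and `escapeHtml` below are hypothetical stand-ins for whatever real template engine a story uses, and the assertions work on strings rather than a parsed DOM to keep the example dependency-free. It is not the package's implementation.

```typescript
// Hypothetical stand-in for a real template engine: fills {{name}} placeholders
// and HTML-escapes every substituted value.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value: string): string {
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

type TemplateData = Record<string, string | null | undefined>;

function renderTemplate(template: string, data: TemplateData): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => {
    // Only own properties count, so a placeholder such as {{constructor}} stays empty.
    const value = Object.prototype.hasOwnProperty.call(data, key) ? data[key] : undefined;
    // Null values and missing optional fields render as an empty string.
    return value == null ? "" : escapeHtml(value);
  });
}

// Edge-case inputs of the kind Step 11 lists.
const edgeCases: TemplateData[] = [
  { name: null },                          // null value
  {},                                      // missing optional field
  { name: "日本語 😀 مرحبا" },              // CJK, emoji, RTL text
  { name: "<script>alert(1)</script>" },   // injection attempt
  { name: "{{name}}" },                    // value that looks like template syntax
  { name: new Array(10001).join("x") },    // extremely long string (10,000 chars)
];

for (const data of edgeCases) {
  const html = renderTemplate("<p>{{name}}</p>", data);
  // A raw <script> tag must never survive rendering.
  console.log(html.indexOf("<script>") === -1, html.length);
}
```

A real test would parse the output with an HTML parser and assert on the resulting structure, as Step 10 requires.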

package/pipeline/steps/iac/estimate.md ADDED

@@ -0,0 +1,50 @@
+# Infrastructure Estimation
+**Purpose:** Assign a Fibonacci story point estimate for infrastructure implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` — REQUIRED
+- `{story_output_dir}/qa-test-spec.md` — REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **Resource count and complexity** | How many resources? Simple (S3 bucket, DNS record) or complex (VPC, EKS cluster, RDS)? | High |
+| **New infrastructure vs incremental** | Greenfield (new VPC, new cluster) vs incremental (add IAM role, add env var)? | High |
+| **IAM/Security surface** | New roles, policies, service accounts? Cross-account access? | Medium |
+| **State management complexity** | New state backend, state migration, workspace setup? | Medium |
+| **Integration surface** | Outputs consumed by other services, shared config, cross-stack references? | Medium |
+| **Test complexity** | How hard will infrastructure validation be? Multi-resource dependencies, timing issues? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical Infrastructure Scope |
+|--------|------------------------------|
+| 1 | Tag update, single env var, minor config change |
+| 2 | Simple resource addition (S3 bucket, DNS record), trivial IAM tweak |
+| 3 | Standard resource with IAM, moderate config (Lambda + role, security group rules) |
+| 5 | Multi-resource feature (RDS + security group + IAM + secrets), provider setup |
+| 8 | Complex infrastructure (VPC + subnets + NAT + routing, ECS service + ALB + auto-scaling) |
+| 13 | Large infrastructure spanning multiple services, complex IAM, multi-environment setup |
+| 21 | Epic-scale: new cluster, major networking overhaul, multi-region setup (consider splitting) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate.
+## Step 4: Write Estimate
+Write to `{story_output_dir}/iac-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] IAC estimates {story_id} at {points} points. See iac-estimation.md.`
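
As a rough illustration of Steps 3 and 4, the sketch below snaps a raw effort figure to the Fibonacci scale and formats the `[ESTIMATION]` message. The idea of a numeric "raw" score and the tie-breaking rule (round up) are assumptions made for this example; the step file itself only asks the agent to pick a value by judgment.

```typescript
const FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21] as const;
type Points = (typeof FIBONACCI_SCALE)[number];

// Snap an arbitrary raw score to the nearest value on the scale.
// Assumption: ties round up, so a raw score of 4 becomes 5.
function snapToScale(raw: number): Points {
  let best: Points = FIBONACCI_SCALE[0];
  for (const candidate of FIBONACCI_SCALE) {
    // "<=" lets the later (larger) candidate win when distances are equal.
    if (Math.abs(candidate - raw) <= Math.abs(best - raw)) best = candidate;
  }
  return best;
}

// Build the message in the format the step file shows.
function formatEstimationMessage(agent: string, storyId: string, points: Points): string {
  return `[ESTIMATION] ${agent} estimates ${storyId} at ${points} points. See ${agent.toLowerCase()}-estimation.md.`;
}

// "STORY-1" is a made-up story id.
console.log(formatEstimationMessage("IAC", "STORY-1", snapToScale(4)));
// prints "[ESTIMATION] IAC estimates STORY-1 at 5 points. See iac-estimation.md."
```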

package/pipeline/steps/iac/handoff.md ADDED

@@ -0,0 +1,9 @@
+# IAC Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 13: Write iac-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/iac-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Infrastructure implementation complete. See iac-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass against the combined, integrated codebase before marking your task complete. Do not rely on other agents or CRITIC to catch your failures.

package/pipeline/steps/iac/implement.md ADDED

@@ -0,0 +1,19 @@
+# IAC Step: Implement
+## Step 4: Plan implementation approach
+Order: (1) state backend setup, (2) provider configuration, (3) resource definitions with tagging, (4) IAM roles/policies (least privilege), (5) outputs/exports for consuming services, (6) environment configuration. Identify shared config that needs coordination with BEND/FEND/DATA.
+## Step 5: Configure state backend
+Per reqs-brief: set up remote state backend with locking. Record in `iac-handoff.md#state-management`.
+## Step 6: Configure providers
+Per reqs-brief: provider blocks with pinned versions, authentication configuration, default tags. Record in `iac-handoff.md#files-created-modified`.
+## Step 7: Define resources with tagging
+Per reqs-brief: resource definitions with standard tags (environment, project, owner, managed-by). Every resource must be tagged. Apply destroy protection on stateful resources (databases, storage). Record in `iac-handoff.md#resources-provisioned`.
+## Step 8: Create IAM roles and policies
+Per reqs-brief: roles, policies, service accounts with least-privilege access. No wildcard actions or resources unless explicitly justified. Record in `iac-handoff.md#iam-security`.
+## Step 9: Define outputs and environment configuration
+Per reqs-brief: outputs/exports for consuming services, environment variables, secret references, connection strings. Coordinate with other dev agents via inbox for shared config. Record in `iac-handoff.md#environment-configuration`.
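
The "no wildcard actions or resources" rule in Step 8 lends itself to an automated check. The sketch below assumes AWS-style IAM policy documents and a particular reading of "wildcard" (an action of `*` or `service:*`, or a resource of exactly `*`); both are assumptions for illustration, and the function name is made up.

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: string;
  Statement: PolicyStatement[];
}

const toArray = (value: string | string[]): string[] =>
  Array.isArray(value) ? value : [value];

// Return the indexes of Allow statements that grant wildcard access.
function findWildcardStatements(policy: PolicyDocument): number[] {
  const flagged: number[] = [];
  policy.Statement.forEach((statement, index) => {
    if (statement.Effect !== "Allow") return; // Deny statements cannot over-grant
    const wildcardAction = toArray(statement.Action).some(
      (action) => action === "*" || action.slice(-2) === ":*",
    );
    const wildcardResource = toArray(statement.Resource).some((resource) => resource === "*");
    if (wildcardAction || wildcardResource) flagged.push(index);
  });
  return flagged;
}

// Example policy; the bucket name is hypothetical.
const examplePolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    { Effect: "Allow", Action: "s3:GetObject", Resource: "arn:aws:s3:::example-bucket/*" },
    { Effect: "Allow", Action: "*", Resource: "*" },
  ],
};

console.log(findWildcardStatements(examplePolicy)); // prints [ 1 ]
```

Statements flagged this way would need the explicit justification the step asks for, or a narrower grant.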

package/pipeline/steps/iac/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# IAC Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, infrastructure requirements, deployment configurations, resource provisioning needs, IAM/security requirements, environment configuration, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, infrastructure state verification requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting IAC. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, infrastructure patterns, and known pitfalls should I know? Context: I am IAC implementing {story_id} using {tech_stack.iac_framework} on {tech_stack.cloud_provider}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.

package/pipeline/steps/iac/write-tests.md ADDED

@@ -0,0 +1,20 @@
+# IAC Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `iac-handoff.md#test-files-written`.
+Infrastructure test categories:
+- **Plan validation** -- terraform plan (or equivalent) succeeds without errors
+- **Policy checks** -- tflint, checkov, OPA, or equivalent policy-as-code validation passes
+- **Idempotency** -- apply twice, second apply shows no changes (zero diff)
+- **Resource tagging verification** -- all resources have required standard tags
+- **Security policy checks** -- no overly permissive IAM, no hardcoded secrets
+Do NOT use mocked providers for happy-path tests. Tests must validate against real plan output or real infrastructure state.
+## Step 11: Run tests, verify all pass
+Run the full infrastructure test suite. All tests must pass. Record results in `iac-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
+## Step 12: Signal integration readiness
+When your code is complete and all tests pass, send to other active dev agents via inbox:
+`[INTEGRATION-READY] Infrastructure code complete. Environment configuration available at iac-handoff.md#environment-configuration.`
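
Two of the test categories above, idempotency and tagging verification, can be checked against machine-readable plan output. The sketch below assumes Terraform and the JSON produced by `terraform show -json`, using only the `resource_changes[].change.actions` and `change.after` fields. The required tag names come from the implement step; treating a missing `tags` attribute as a failure is an assumption, since not every resource type supports tags.

```typescript
interface ResourceChange {
  address: string;
  change: {
    actions: string[];
    after: { tags?: Record<string, string> | null } | null;
  };
}

interface PlanJson {
  resource_changes?: ResourceChange[];
}

const REQUIRED_TAGS = ["environment", "project", "owner", "managed-by"];

// A second plan is idempotent when every resource change is a no-op.
function isIdempotent(plan: PlanJson): boolean {
  return (plan.resource_changes ?? []).every((rc) =>
    rc.change.actions.every((action) => action === "no-op"),
  );
}

// Map each resource address to the required tags it lacks.
function findMissingTags(plan: PlanJson): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const rc of plan.resource_changes ?? []) {
    if (rc.change.after === null) continue; // resource is being destroyed
    const tags = rc.change.after.tags ?? {};
    const absent = REQUIRED_TAGS.filter(
      (tag) => !Object.prototype.hasOwnProperty.call(tags, tag),
    );
    if (absent.length > 0) missing[rc.address] = absent;
  }
  return missing;
}

// Hand-written plan fragment for illustration; real input would be parsed from a plan file.
const firstPlan: PlanJson = {
  resource_changes: [
    {
      address: "aws_s3_bucket.logs",
      change: {
        actions: ["create"],
        after: { tags: { environment: "dev", project: "demo", "managed-by": "terraform" } },
      },
    },
  ],
};

console.log(isIdempotent(firstPlan));    // prints false
console.log(findMissingTags(firstPlan)); // prints { 'aws_s3_bucket.logs': [ 'owner' ] }
```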

package/pipeline/steps/judge/ship-decision.md CHANGED

@@ -8,8 +8,20 @@ Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing
 - ALL evidence checklist items PASS (or N/A)
 - Socratic validation reveals no concerns undermining evidence integrity
+**SHIP-PARTIAL** if:
+- ALL available-platform evidence checklist items PASS (or N/A)
+- Platform coverage evidence item is PARTIAL (not FAIL)
+- iOS tests are deferred (not failed) due to host platform limitation
+- `mobile-handoff.md` frontmatter has `ios_deferred: true`
+- Socratic validation reveals no concerns for available-platform evidence
+SHIP-PARTIAL is ONLY valid when:
+1. The project type is `mobile-app`
+2. iOS tests were deferred due to host OS limitation (not because they failed)
+3. All Android tests passed
 **REJECT** if:
-- ANY evidence checklist item FAIL, OR
+- ANY evidence checklist item FAIL (excluding PARTIAL platform coverage when iOS is deferred, not failed), OR
 - Socratic validation undermines evidence integrity
 For REJECT: identify root cause, responsible phase/agent, required action. JUDGE rejections are non-routine; lead (Opus) takes ownership.
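
The three verdicts in this hunk form a small decision procedure. The sketch below is one possible encoding, not the package's code: the `JudgeInput` shape and its field names are invented for illustration, with `iosDeferred` standing for the `ios_deferred: true` frontmatter flag.

```typescript
type ItemStatus = "PASS" | "FAIL" | "PARTIAL" | "N/A";
type Verdict = "SHIP" | "SHIP-PARTIAL" | "REJECT";

// Hypothetical input shape summarizing the evidence JUDGE has collected.
interface JudgeInput {
  projectType: string;
  evidence: ItemStatus[];        // every checklist item except platform coverage
  platformCoverage: ItemStatus;  // the platform coverage evidence item
  iosDeferred: boolean;          // mobile-handoff.md frontmatter: ios_deferred
  androidTestsPassed: boolean;
  socraticConcerns: boolean;     // validation found concerns undermining the evidence
}

function decideVerdict(input: JudgeInput): Verdict {
  const evidenceOk = input.evidence.every((s) => s === "PASS" || s === "N/A");
  if (input.socraticConcerns || !evidenceOk) return "REJECT";

  if (input.platformCoverage === "PASS" || input.platformCoverage === "N/A") {
    return "SHIP";
  }

  // PARTIAL coverage is only acceptable for mobile apps whose iOS tests were
  // deferred (not failed) and whose Android tests all passed.
  const partialAllowed =
    input.platformCoverage === "PARTIAL" &&
    input.projectType === "mobile-app" &&
    input.iosDeferred &&
    input.androidTestsPassed;

  return partialAllowed ? "SHIP-PARTIAL" : "REJECT";
}

console.log(
  decideVerdict({
    projectType: "mobile-app",
    evidence: ["PASS", "N/A", "PASS"],
    platformCoverage: "PARTIAL",
    iosDeferred: true,
    androidTestsPassed: true,
    socraticConcerns: false,
  }),
); // prints "SHIP-PARTIAL"
```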
@@ -19,6 +31,7 @@ For REJECT: identify root cause, responsible phase/agent, required action. JUDGE
 - Write to `{story_output_dir}/judge-decision.md`
 - Update frontmatter: status completed, all steps in stepsCompleted
 - SHIP: `[JUDGE-SHIP] Story {story_id} approved for shipping. See judge-decision.md#verdict.`
+- SHIP-PARTIAL: `[JUDGE-SHIP-PARTIAL] Story {story_id} approved for shipping (Android verified). iOS deferred — run /run-deferred-tests on Mac. See judge-decision.md#ship-partial-detail.`
 - REJECT: `[JUDGE-REJECT] Story {story_id} rejected. See judge-decision.md#rejection-detail.`
 ## Step 14b: Write Story Report (SHIP only)

package/pipeline/steps/libdev/estimate.md ADDED

@@ -0,0 +1,49 @@
+# Library Estimation
+**Purpose:** Assign a Fibonacci story point estimate for library implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` — REQUIRED
+- `{story_output_dir}/qa-test-spec.md` — REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **AC count and complexity** | How many ACs? Are they simple (add export) or complex (breaking API redesign, dual-module compat)? | High |
+| **New patterns vs established** | Greenfield (new library, new module system) vs incremental (add export, update types)? | High |
+| **Public API surface** | How many new exports? New type declarations? Breaking changes requiring migration? | Medium |
+| **Module system complexity** | Single target (ESM-only) vs dual CJS+ESM? Tree-shaking requirements? | Medium |
+| **Test complexity** | How hard will consumer-simulation tests be? Cross-module-system testing? Bundle verification? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical Library Scope |
+|--------|-----------------------|
+| 1 | Single export addition, type fix, no packaging change |
+| 2 | Simple new export with types, trivial packaging update |
+| 3 | Multiple exports, type declarations, moderate test coverage |
+| 5 | New module with CJS+ESM entry points, type overhaul, consumer-sim tests |
+| 8 | Major API surface change, breaking changes with migration, dual-module verification |
+| 13 | Large library restructure, new module system support, extensive type surface |
+| 21 | Epic-scale: new library or complete packaging overhaul (consider splitting the story) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints — e.g., "dual CJS+ESM stories consistently under-pointed by 1 tier" or "stories with breaking changes average 8 points."
+## Step 4: Write Estimate
+Write to `{story_output_dir}/libdev-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] LIBDEV estimates {story_id} at {points} points. See libdev-estimation.md.`
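
A calibration directive such as "under-pointed by 1 tier" can be read as a shift along the Fibonacci scale. The sketch below shows that reading; how the pipeline actually applies calibration directives is not shown in this diff, so `shiftTier` and its clamping behavior are assumptions.

```typescript
const SCALE = [1, 2, 3, 5, 8, 13, 21];

// Move an estimate up (positive) or down (negative) by whole tiers,
// clamping at both ends of the scale.
function shiftTier(points: number, tiers: number): number {
  const index = SCALE.indexOf(points);
  if (index === -1) {
    throw new Error(`${points} is not on the Fibonacci scale`);
  }
  const shifted = Math.min(SCALE.length - 1, Math.max(0, index + tiers));
  return SCALE[shifted];
}

// A dual CJS+ESM story first estimated at 5, with a "+1 tier" calibration applied.
console.log(shiftTier(5, 1)); // prints 8
```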

package/pipeline/steps/libdev/handoff.md ADDED

@@ -0,0 +1,9 @@
+# LIBDEV Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 13: Write libdev-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/libdev-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Library implementation complete. See libdev-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass, all exports resolve correctly, CJS and ESM entry points work, and type declarations are complete before marking your task complete. Do not rely on CRITIC or QA-B to catch your failures.

package/pipeline/steps/libdev/implement.md ADDED

@@ -0,0 +1,19 @@
+# LIBDEV Step: Implement
+## Step 4: Plan implementation approach
+Order: public API surface design (exports map) -> core module implementation -> type declarations (.d.ts/hints) -> CJS+ESM entry points -> side effects verification. Identify peer dependencies that must be declared.
+## Step 5: Design public API surface
+Per reqs-brief: define the exports map, entry points, and module structure. Every public export must be intentional -- no internal modules leaking into the public API. Record in `libdev-handoff.md#public-api-surface`.
+## Step 6: Implement core modules
+Per reqs-brief: implement the library's core logic, utilities, and domain types. Each module must be independently importable via the exports map. Record in `libdev-handoff.md#files-created-modified`.
+## Step 7: Implement type declarations
+Per reqs-brief: create or generate type declarations for every public export. TypeScript: .d.ts files or inline types. Python: type hints and py.typed marker. Rust: public type signatures. Types must match the actual implementation signatures exactly.
+## Step 8: Configure entry points and packaging
+Per reqs-brief: configure main (CJS), module (ESM), and types entry points. Set sideEffects field. Declare peer dependencies with correct version ranges. Record in `libdev-handoff.md#package-configuration`.
+## Step 9: Verify no accidental side effects
+Import the library's main entry point and verify no observable side effects execute (no console output, no network calls, no global mutations, no file I/O). If side effects are intentional, document them and set sideEffects field accordingly. Record decisions in `libdev-handoff.md#implementation-decisions`.
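
For a JavaScript or TypeScript library, the packaging requirements in Step 8 map onto fields of `package.json`. The sketch below checks a manifest object for those fields. The `Manifest` type is a simplified assumption covering only a flat conditional `exports` map, and `checkManifest` is a hypothetical helper, not part of the pipeline.

```typescript
interface ExportConditions {
  import?: string;
  require?: string;
  types?: string;
}

// Simplified view of package.json, limited to the fields Step 8 mentions.
interface Manifest {
  main?: string;
  module?: string;
  types?: string;
  sideEffects?: boolean | string[];
  exports?: Record<string, ExportConditions>;
  peerDependencies?: Record<string, string>;
}

function checkManifest(pkg: Manifest): string[] {
  const problems: string[] = [];
  if (!pkg.main) problems.push("missing main (CJS entry point)");
  if (!pkg.module) problems.push("missing module (ESM entry point)");
  if (!pkg.types) problems.push("missing types entry point");
  if (pkg.sideEffects === undefined) problems.push("sideEffects field not set");

  const root = pkg.exports ? pkg.exports["."] : undefined;
  if (!root) {
    problems.push('exports map has no "." entry');
  } else {
    if (!root.import) problems.push('exports["."] has no import condition');
    if (!root.require) problems.push('exports["."] has no require condition');
  }
  return problems;
}

// File paths and the peer dependency are made up for the example.
const manifest: Manifest = {
  main: "./dist/index.cjs",
  module: "./dist/index.mjs",
  types: "./dist/index.d.ts",
  sideEffects: false,
  exports: {
    ".": {
      types: "./dist/index.d.ts",
      import: "./dist/index.mjs",
      require: "./dist/index.cjs",
    },
  },
  peerDependencies: { react: ">=18" },
};

console.log(checkManifest(manifest)); // prints []
```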

package/pipeline/steps/libdev/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# LIBDEV Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, public API surface requirements, export contracts, type declarations, packaging constraints, peer dependency requirements, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, consumer-simulation requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting LIBDEV. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am LIBDEV implementing {story_id} using {tech_stack.language} with {tech_stack.module_system} module system and {tech_stack.type_system}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.

package/pipeline/steps/libdev/write-tests.md ADDED

@@ -0,0 +1,16 @@
+# LIBDEV Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `libdev-handoff.md#test-files-written`.
+**Critical rule:** Tests must import the actual library using its public API -- the same way a consumer would. No importing from internal/private module paths. No mocked imports. If a test cannot exercise the library through its public exports, the API surface is wrong, not the test.
+## Step 11: Write consumer-simulation tests
+Beyond unit tests, write at least one consumer-simulation test per major export that:
+1. Imports the library the way a real consumer would (CJS `require()` and ESM `import`)
+2. Exercises the documented usage example from the public API surface
+3. Verifies the return types match declared type signatures
+4. Confirms no side effects on import (no console output, no global mutations)
+## Step 12: Run tests, verify all pass
+Run the full library test suite. All tests must pass. Record results in `libdev-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
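
Item 4 of the consumer-simulation list (no side effects on import) can be approximated by observing the environment while a loader runs. The sketch below is a partial check under stated assumptions: it watches only `console.log` and new enumerable globals, and it takes the import as a callback (`load`) so the example stays self-contained. With a real library the callback would perform the actual `require()` or `import` on a fresh module cache.

```typescript
interface SideEffectReport {
  consoleCalls: number;
  newGlobals: string[];
}

// Run `load` while counting console.log calls and recording globals it adds.
function detectSideEffects(load: () => unknown): SideEffectReport {
  const originalLog = console.log;
  const globalsBefore = Object.keys(globalThis);
  let consoleCalls = 0;

  console.log = () => {
    consoleCalls += 1;
  };
  try {
    load();
  } finally {
    console.log = originalLog; // always restore, even if the loader throws
  }

  const newGlobals = Object.keys(globalThis).filter(
    (key) => globalsBefore.indexOf(key) === -1,
  );
  return { consoleCalls, newGlobals };
}

// Simulated "import" that misbehaves by logging at load time.
const report = detectSideEffects(() => {
  console.log("library loaded");
  return { add: (a: number, b: number) => a + b };
});

console.log(report.consoleCalls); // prints 1
```

Network calls and file I/O would need their own interception, which this sketch leaves out.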

package/pipeline/steps/mcp-dev/estimate.md ADDED

@@ -0,0 +1,49 @@
+# MCP Server Estimation
+**Purpose:** Assign a Fibonacci story point estimate for MCP server implementation complexity. This is a lightweight estimation step -- no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` -- REQUIRED
+- `{story_output_dir}/qa-test-spec.md` -- REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **Tool count and handler complexity** | How many tools? Are handlers simple (stateless transform) or complex (stateful, multi-step, external calls)? | High |
+| **New patterns vs established** | Greenfield server vs adding tools to existing server? New transport vs reusing existing? | High |
+| **Input schema complexity** | Simple flat params vs deeply nested schemas, conditional validation, cross-field dependencies? | Medium |
+| **Error surface** | How many distinct failure modes per tool? Complex error mapping? External service failures to handle? | Medium |
+| **Test complexity** | How hard will QA-B's protocol tests be to pass? Complex fixtures, multi-step tool sequences, stateful interactions? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical MCP Server Scope |
+|--------|------------------------|
+| 1 | Config change, single tool param update, no schema change |
+| 2 | Simple single-tool server, trivial inputSchema, basic transport |
+| 3 | Standard server with 2-3 tools, input validation, proper error model |
+| 5 | Multi-tool server with complex schemas, stateful handlers, external integrations |
+| 8 | Complex server with many tools, multiple content types, advanced error handling, transport edge cases |
+| 13 | Large server spanning multiple domains, custom transport, extensive protocol compliance surface |
+| 21 | Epic-scale: new transport implementation or major protocol extension (consider splitting the story) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints -- e.g., "input validation consistently under-pointed by 1 tier" or "stories with 5+ tools average 8 points."
+## Step 4: Write Estimate
+Write to `{story_output_dir}/mcp-dev-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] MCP-DEV estimates {story_id} at {points} points. See mcp-dev-estimation.md.`

package/pipeline/steps/mcp-dev/handoff.md ADDED

@@ -0,0 +1,9 @@
+# MCP-DEV Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 13: Write mcp-dev-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/mcp-dev-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] MCP server implementation complete. See mcp-dev-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass against the complete server implementation before marking your task complete. Do not rely on CRITIC or QA-B to catch your failures.

package/pipeline/steps/mcp-dev/implement.md ADDED

@@ -0,0 +1,29 @@
+# MCP-DEV Step: Implement
+## Step 4: Plan implementation approach
+Order: server scaffolding + transport setup -> initialize handler + capability declarations -> tool registration + inputSchema definitions -> tool handlers (try-catch with isError:true) -> input validation against declared schemas. Identify any shared files or dependencies.
+## Step 5: Scaffold server and transport
+Set up MCP server instance and configure transport layer (stdio, SSE, or HTTP per `{tech_stack.transport_type}`). Ensure the transport can accept connections and route JSON-RPC messages to handlers. Record in `mcp-dev-handoff.md#files-created-modified`.
+## Step 6: Implement initialize handler and capabilities
+Implement the `initialize` JSON-RPC method. Return server info and capability declarations that exactly match the tools and features being implemented. No phantom capabilities. Record in `mcp-dev-handoff.md#capabilities-declared`.
+## Step 7: Register tools and define inputSchema
+Register each tool with its name, description, and JSON Schema for inputs. The declared inputSchema is the contract -- it must be validated at runtime in Step 8. Record in `mcp-dev-handoff.md#tools-implemented`.
+## Step 8: Implement tool handlers
+For each registered tool, implement the handler logic. Every handler must:
+1. Wrap execution in try-catch
+2. Validate inputs against the declared inputSchema; reject with JSON-RPC `-32602` on schema violation
+3. On tool-level failure (business logic errors), return result with `isError: true` and a descriptive error message
+4. On unexpected exceptions, catch and return JSON-RPC `-32603` (Internal error) -- never let unhandled exceptions kill the transport
+Record decisions in `mcp-dev-handoff.md#implementation-decisions`. Record error model in `mcp-dev-handoff.md#error-model`.
+## Step 9: Implement protocol error handling
+Ensure the server correctly returns JSON-RPC error codes for protocol-level failures:
+- `-32700` for malformed JSON (parse errors)
+- `-32600` for invalid JSON-RPC requests (missing method, missing id)
+- `-32601` for unknown methods
+- `-32602` for invalid params (schema validation failures)
+- `-32603` for internal server errors (unhandled exceptions)
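
The error model in Steps 8 and 9 can be sketched as a single message handler. This is a transport-free illustration, not an MCP SDK example: `handleMessage`, the `Tool` shape, and the `divide` tool are all hypothetical, and `validate` stands in for real JSON Schema validation. One deliberate difference from the list above: JSON-RPC 2.0 treats a request without an `id` as a notification, so this sketch does not reject it as invalid.

```typescript
type JsonRpcId = string | number | null;

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: JsonRpcId;
  result?: ToolResult;
  error?: { code: number; message: string };
}

type ToolOutcome = { kind: "ok"; text: string } | { kind: "failed"; message: string };

interface Tool {
  validate: (args: unknown) => string | null; // returns a problem description, or null
  run: (args: unknown) => ToolOutcome;        // may throw on unexpected errors
}

const tools: Record<string, Tool> = {
  divide: {
    validate: (args) => {
      const a = args as { a?: unknown; b?: unknown } | null | undefined;
      return a && typeof a.a === "number" && typeof a.b === "number"
        ? null
        : "a and b must be numbers";
    },
    run: (args) => {
      const { a, b } = args as { a: number; b: number };
      if (b === 0) return { kind: "failed", message: "division by zero" };
      return { kind: "ok", text: String(a / b) };
    },
  },
};

function rpcError(id: JsonRpcId, code: number, message: string): JsonRpcResponse {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

function handleMessage(raw: string): JsonRpcResponse {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return rpcError(null, -32700, "Parse error");
  }
  if (typeof parsed !== "object" || parsed === null) {
    return rpcError(null, -32600, "Invalid Request");
  }
  const req = parsed as { jsonrpc?: unknown; id?: unknown; method?: unknown; params?: unknown };
  const id: JsonRpcId =
    typeof req.id === "string" || typeof req.id === "number" ? req.id : null;
  if (req.jsonrpc !== "2.0" || typeof req.method !== "string") {
    return rpcError(id, -32600, "Invalid Request");
  }
  if (req.method !== "tools/call") return rpcError(id, -32601, "Method not found");

  const params = (req.params ?? {}) as { name?: unknown; arguments?: unknown };
  const name = typeof params.name === "string" ? params.name : "";
  if (!Object.prototype.hasOwnProperty.call(tools, name)) {
    return rpcError(id, -32602, "Unknown tool");
  }
  try {
    const problem = tools[name].validate(params.arguments);
    if (problem !== null) return rpcError(id, -32602, `Invalid params: ${problem}`);
    const outcome = tools[name].run(params.arguments);
    // Business-logic failures are results with isError, not protocol errors.
    const result: ToolResult =
      outcome.kind === "ok"
        ? { content: [{ type: "text", text: outcome.text }] }
        : { content: [{ type: "text", text: outcome.message }], isError: true };
    return { jsonrpc: "2.0", id, result };
  } catch {
    // Unexpected exceptions never escape to the transport.
    return rpcError(id, -32603, "Internal error");
  }
}

console.log(JSON.stringify(handleMessage("{not json")));
```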

package/pipeline/steps/mcp-dev/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# MCP-DEV Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, tool definitions (names, descriptions, inputSchema), transport requirements (stdio/SSE/HTTP), capability declarations, error handling expectations, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, protocol compliance verification requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting MCP-DEV. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am MCP-DEV implementing {story_id} using {tech_stack.mcp_sdk} with {tech_stack.transport_type} transport.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.

package/pipeline/steps/mcp-dev/write-tests.md ADDED

@@ -0,0 +1,19 @@
+# MCP-DEV Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `mcp-dev-handoff.md#test-files-written`.
+**Critical requirement: real transport, no mocked transport.** Tests must spawn a real MCP server instance and communicate over the actual transport (stdio pipe, SSE connection, or HTTP). Do not mock the transport layer. The test client sends real JSON-RPC messages and asserts on real responses.
+## Step 11: Test protocol compliance
+Tests must cover the full protocol handshake and lifecycle:
+1. `initialize` request returns correct server info and capabilities
+2. `tools/list` returns all registered tools with correct inputSchema
+3. `tools/call` for each tool with valid params returns expected result shape
+4. `tools/call` with invalid params returns JSON-RPC `-32602`
+5. `tools/call` triggering tool failure returns result with `isError: true`
+6. Unknown method returns JSON-RPC `-32601`
+7. Malformed JSON returns JSON-RPC `-32700`
+## Step 12: Run tests, verify all pass
+Run the full test suite. All tests must pass. Record results in `mcp-dev-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
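
The seven compliance checks in Step 11 can be written down as data plus one matcher, so each test only has to send a message and compare the response. The sketch below covers only the assertion side. It does not replace the real-transport requirement from Step 10: a test would still spawn the server and read the response over stdio, SSE, or HTTP. `Expected`, `complianceCases`, and `matchesExpectation` are hypothetical names.

```typescript
// What a compliance test expects back from the server.
type Expected =
  | { kind: "result" } // successful result, not flagged as a tool error
  | { kind: "toolError" } // result with isError: true
  | { kind: "rpcError"; code: number }; // JSON-RPC error object

// The seven checks from Step 11, expressed as data.
const complianceCases: ReadonlyArray<{ step: number; send: string; expected: Expected }> = [
  { step: 1, send: "initialize", expected: { kind: "result" } },
  { step: 2, send: "tools/list", expected: { kind: "result" } },
  { step: 3, send: "tools/call with valid params", expected: { kind: "result" } },
  { step: 4, send: "tools/call with invalid params", expected: { kind: "rpcError", code: -32602 } },
  { step: 5, send: "tools/call triggering a tool failure", expected: { kind: "toolError" } },
  { step: 6, send: "unknown method", expected: { kind: "rpcError", code: -32601 } },
  { step: 7, send: "malformed JSON", expected: { kind: "rpcError", code: -32700 } },
];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Checks a decoded response (as read from the real transport) against an expectation.
function matchesExpectation(response: unknown, expected: Expected): boolean {
  if (!isRecord(response) || response["jsonrpc"] !== "2.0") return false;
  const error = response["error"];
  const result = response["result"];
  switch (expected.kind) {
    case "rpcError":
      return isRecord(error) && error["code"] === expected.code;
    case "toolError":
      return isRecord(result) && result["isError"] === true;
    case "result":
      return error === undefined && isRecord(result) && result["isError"] !== true;
  }
}
```

Per-tool assertions on the result shape (check 3) would sit on top of this, since the matcher only distinguishes the three response classes.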

package/pipeline/steps/mobile/emulator-lifecycle.md ADDED

@@ -0,0 +1,67 @@
+# MOBILE Step: Emulator Lifecycle Management
+## Step 7b: Boot Emulator
+### Android Emulator
+1. List available AVDs: `emulator -list-avds`
+2. Boot emulator: `emulator -avd {avd_name} -no-snapshot-load -no-audio -no-window &`
+3. Wait for boot: `adb wait-for-device` then poll `adb shell getprop sys.boot_completed` until it returns `1` (max 120s, 10 retries at 12s intervals)
+4. If boot fails after 120s: kill process (`adb emu kill`), retry once with fresh boot. If second attempt fails, file `[BLOCKER]` to Lead with emulator logs.
+Record emulator config in `mobile-handoff.md#emulator-configuration`.
+### iOS Simulator (Mac Only)
+1. Verify Mac host: `uname -s` must return `Darwin`. If not Mac, skip iOS entirely.
+2. List available simulators: `xcrun simctl list devices available`
+3. Boot simulator: `xcrun simctl boot {device_udid}`
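+4. Wait for boot: poll `xcrun simctl list devices | grep {device_udid} | grep Booted` until it matches (max 60s), so that another already-booted simulator is not mistaken for the target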
+5. If boot fails: `xcrun simctl shutdown all`, retry once. If second attempt fails, file `[BLOCKER]` to Lead.
+Record simulator config in `mobile-handoff.md#emulator-configuration`.
+## Step 7c: Build and Install App
+### React Native
+1. Start Metro bundler: `npx react-native start --reset-cache &`
+2. Wait for Metro ready: poll `http://localhost:8081/status` until it returns `packager-status:running` (max 60s). Detect port conflicts by checking whether port 8081 is already in use.
+3. Android build + install: `npx react-native run-android`
+4. iOS build + install (Mac only): `npx react-native run-ios --simulator="{simulator_name}"`
+5. Verify main activity/screen renders within 10s of launch. If not, capture `adb logcat` output and file P1 bug.
+### Flutter
+1. Resolve dependencies: `flutter pub get`
+2. Android build + install: `flutter build apk --debug && flutter install --device-id {emulator_id}`
+3. iOS build + install (Mac only): `flutter build ios --debug --simulator && flutter install --device-id {simulator_id}`
+4. Verify app launches and main screen renders within 10s.
+### Native Module Recovery (React Native)
+If native module errors occur during build:
+- iOS: run `cd ios && pod install && cd ..` and retry build
+- Android: run `cd android && ./gradlew clean && cd ..` and retry build
+- If native module build fails after retry, file P1 bug with full build output.
+## Step 7d: State Isolation Between Maestro Flows
+Before each Maestro flow execution:
+- **Android:** `adb shell pm clear {app_package_name}`
+- **iOS:** `xcrun simctl terminate {device_udid} {bundle_id}` followed by `xcrun simctl privacy {device_udid} reset all {bundle_id}`
+This ensures no state leakage between test flows. Every flow starts from a clean app state.
+## Step 7e: Pre-Grant Permissions
+Before test execution, pre-grant required permissions to avoid UI dialog interference:
+- **Android:** `adb shell pm grant {package} android.permission.{PERMISSION}` for each required permission
+- **iOS:** `xcrun simctl privacy {device_udid} grant {permission-type} {bundle_id}`
+Never depend on UI dialogs for permission grants during E2E tests.
+## Step 7f: Crash Recovery
+If emulator/simulator crashes or becomes unresponsive during test execution:
+1. Detect via `adb devices` showing offline or Maestro flow timeout
+2. Capture crash logs: `adb logcat -d > crash-{timestamp}.log`
+3. Kill stale processes: `adb emu kill` / `xcrun simctl shutdown all`
+4. Re-boot with clean state (Step 7b)
+5. Resume from the last incomplete Maestro flow (do not re-run passed flows)
+6. Max 2 crash recovery attempts per platform. After 2 crashes, file P1 bug with crash logs and stop testing on that platform.
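
The Android boot procedure in Step 7b (poll `sys.boot_completed` 10 times at 12s intervals, retry the boot once, then raise a blocker) can be sketched as a small control loop. This is an illustration under stated assumptions: `EmulatorOps` and its callbacks are hypothetical wrappers around the commands listed in the step, injected so the logic can be exercised without an emulator, and `sleep` is assumed to block.

```typescript
// Injected side effects. In real use these would wrap the Step 7b commands;
// all four wrappers are hypothetical.
interface EmulatorOps {
  boot: () => void; // start the emulator, e.g. `emulator -avd ...`
  bootCompleted: () => string; // output of `adb shell getprop sys.boot_completed`
  kill: () => void; // e.g. `adb emu kill`
  sleep: (ms: number) => void; // blocking wait between probes
}

const BOOT_RETRIES = 10;
const BOOT_INTERVAL_MS = 12000; // 10 probes x 12s = 120s budget

// Polls until the boot property reads "1" or the 120s budget is spent.
function waitForBoot(ops: EmulatorOps): boolean {
  for (let attempt = 0; attempt < BOOT_RETRIES; attempt++) {
    if (ops.bootCompleted().trim() === "1") return true;
    ops.sleep(BOOT_INTERVAL_MS);
  }
  return false;
}

// Boots the emulator, retrying once. "blocker" means both attempts failed and
// a [BLOCKER] should be filed to Lead with the emulator logs.
function bootEmulator(ops: EmulatorOps): "booted" | "blocker" {
  for (let attempt = 1; attempt <= 2; attempt++) {
    ops.boot();
    if (waitForBoot(ops)) return "booted";
    ops.kill();
  }
  return "blocker";
}
```

The same shape would fit the iOS simulator path by swapping the callbacks and the budget (60s in the step above).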

package/pipeline/steps/mobile/estimate.md ADDED

@@ -0,0 +1,51 @@
+# Mobile Estimation
+**Purpose:** Assign a Fibonacci story point estimate for mobile implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` — REQUIRED
+- `{story_output_dir}/uxa-spec.md` — REQUIRED (if UI profile active)
+- `{story_output_dir}/qa-test-spec.md` — REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **Screen count** | How many new or modified screens? Simple displays vs complex interactive screens? | High |
+| **Navigation complexity** | Deep linking, nested stacks/tabs/drawers, modal flows, conditional navigation? | High |
+| **Platform-specific requirements** | Android-only vs cross-platform? Platform-divergent behavior? | Medium |
+| **Native module integration** | Camera, GPS, push notifications, biometrics, file system? | Medium |
+| **State management complexity** | Local state vs global state? Offline persistence? Optimistic updates? | Medium |
+| **API integration surface** | Number of endpoints consumed, real-time updates, file uploads? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical Mobile Scope |
+|--------|---------------------|
+| 1 | Text change, style tweak, single prop addition |
+| 2 | Simple display screen, minor layout change |
+| 3 | Interactive screen with local state, form with validation |
+| 5 | Multi-screen feature, navigation setup, API integration |
+| 8 | Complex interactive feature, cross-platform divergence, native modules |
+| 13 | Large feature with offline support, complex navigation, extensive platform handling |
+| 21 | Epic-scale: new navigation paradigm or major platform integration (consider splitting) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints.
+## Step 4: Write Estimate
+Write to `{story_output_dir}/mobile-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] MOBILE estimates {story_id} at {points} points. See mobile-estimation.md.`
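
As a small illustration of Step 4, the `[ESTIMATION]` message can be built by a helper that refuses values outside the Fibonacci scale. This is a sketch only: `estimationMessage` and `isStoryPoints` are hypothetical names, and the story ID in the usage note is made up.

```typescript
// The scale given at the top of the estimation step.
const FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21] as const;
type StoryPoints = (typeof FIBONACCI_SCALE)[number];

function isStoryPoints(value: number): value is StoryPoints {
  return (FIBONACCI_SCALE as readonly number[]).indexOf(value) !== -1;
}

// Builds the inbox message from Step 4; throws if the value is off the scale.
function estimationMessage(storyId: string, points: number): string {
  if (!isStoryPoints(points)) {
    throw new Error(`${points} is not on the Fibonacci scale (${FIBONACCI_SCALE.join(", ")})`);
  }
  return `[ESTIMATION] MOBILE estimates ${storyId} at ${points} points. See mobile-estimation.md.`;
}
```

For example, `estimationMessage("STORY-42", 5)` yields `[ESTIMATION] MOBILE estimates STORY-42 at 5 points. See mobile-estimation.md.`, while a value such as `4` is rejected.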

package/pipeline/steps/mobile/flutter.md ADDED

@@ -0,0 +1,30 @@
+# MOBILE Step: Flutter Specifics
+This step is loaded conditionally when `{tech_stack.mobile_framework}` is `flutter`. Read before implementing.
+## Flutter Build Configuration
+- Debug builds for testing: `flutter build apk --debug` / `flutter build ios --debug --simulator`
+- Resolve dependencies before build: `flutter pub get`
+- Verify Flutter SDK version matches project constraints in `pubspec.yaml`
+- Hot reload: disable during E2E (use cold start for each Maestro flow via `clearState`)
+## Flutter Testing Patterns
+- Widget tests: use `flutter_test` with `WidgetTester` for component isolation
+- Integration tests: use Maestro YAML flows (NOT `flutter_driver` or `integration_test` for pipeline E2E)
+- State management: clear providers/blocs/cubits between test suites
+- Platform channels: test with real native code, not mock method channel handlers
+## Flutter-Specific Emulator Setup
+- Android: standard AVD boot, then `flutter install --device-id {emulator_id}`
+- iOS: `open -a Simulator` if not already running, then `flutter install --device-id {simulator_id}`
+- Verify device connection: `flutter devices` must list the target device
+## Area Labels for Testing
+- Use `Key` with `ValueKey('testID')` for Flutter widgets
+- Maestro `tapOn` with `id:` selector reads the `ValueKey` on both platforms
+- Follow the area label convention from uxa-spec.md: `{screen}-{section}-{element}`
+## Offline Testing
+- Use emulator console commands for network simulation: `adb emu network delay gprs` / `adb emu network speed gsm`
+- Do NOT use `adb shell svc wifi disable` (unreliable on emulators)
+- Test offline-capable features per reqs-brief offline requirements

package/pipeline/steps/mobile/handoff.md ADDED

@@ -0,0 +1,18 @@
+# MOBILE Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 11: Write mobile-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/mobile-handoff.template.md`. Set `status: completed` in frontmatter.
+If iOS tests were deferred (host is not Mac):
+- Set `ios_deferred: true` in frontmatter
+- Complete the `Deferred iOS Tests` section listing all unexecuted iOS flows
+- Include in the inbox message: `[IOS-DEFERRED] {count} iOS Maestro flows deferred. Run /run-deferred-tests on Mac to complete.`
+Notify lead via inbox: `[DONE] Mobile implementation complete. See mobile-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+All Android tests must pass before marking complete. If on Mac, all iOS tests must also pass. Do not mark complete with failing Android tests. Do not rely on BEND or CRITIC to catch your failures.
+**Smoke test gate:** The app-level smoke test (Step 9b) must pass before sending `[DONE]`. If it fails, the app's entry point is not wired to your deliverable -- fix the wiring before marking complete.
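
The completion rules in this step (Android must pass, iOS must pass when the host is a Mac, the smoke test must pass, and deferred iOS flows are announced) can be sketched as a gate function. `MobileTestStatus`, `readyToMarkDone`, and `inboxMessages` are hypothetical names, and sending the `[IOS-DEFERRED]` notice ahead of `[DONE]` is an assumption about ordering.

```typescript
interface MobileTestStatus {
  hostIsMac: boolean;
  androidPassed: boolean;
  iosPassed: boolean; // only consulted when hostIsMac is true
  smokeTestPassed: boolean; // app-level smoke test (Step 9b)
  deferredIosFlows: number; // iOS Maestro flows not executed on a non-Mac host
}

// True only when every gate from the handoff step is satisfied.
function readyToMarkDone(s: MobileTestStatus): boolean {
  const iosOk = s.hostIsMac ? s.iosPassed : true;
  return s.androidPassed && iosOk && s.smokeTestPassed;
}

// Messages to send to Lead; empty while any gate is still failing.
function inboxMessages(s: MobileTestStatus): string[] {
  if (!readyToMarkDone(s)) return [];
  const messages: string[] = [];
  if (!s.hostIsMac) {
    messages.push(
      `[IOS-DEFERRED] ${s.deferredIosFlows} iOS Maestro flows deferred. Run /run-deferred-tests on Mac to complete.`,
    );
  }
  messages.push("[DONE] Mobile implementation complete. See mobile-handoff.md#orchestrator-summary.");
  return messages;
}
```

Returning an empty list while a gate fails makes it hard to send `[DONE]` by accident with failing Android tests or an unwired entry point.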

package/pipeline/steps/mobile/implement.md ADDED

@@ -0,0 +1,20 @@
+# MOBILE Step: Implement
+## Step 3: Detect host platform
+Run platform detection to determine available targets:
+- `uname -s` returns `Darwin` → Mac: both Android and iOS targets available
+- `uname -s` returns `Linux` or `MINGW*`/`MSYS*` → Windows/Linux: Android only, iOS deferred
+Record platform capabilities in `mobile-handoff.md#platform-coverage`. If iOS is unavailable, set `ios_deferred: true` in handoff frontmatter.
+## Step 4: Plan screen architecture
+From uxa-spec.md screen specifications (if present) or reqs-brief.md: identify screens, navigation structure (stack, tab, drawer), shared components, deep link URI patterns. Map to framework conventions for `{tech_stack.mobile_framework}`.
+## Step 5: Implement screens and navigation
+Per spec: create screen components, navigation setup (React Navigation / Flutter Navigator), deep linking configuration. Apply `testID` attributes matching the area label system from uxa-spec.md. Record in `mobile-handoff.md#screens-implemented`.
+## Step 6: Implement components
+Per spec: forms, lists, modals, gesture handlers, platform-specific components. Wire to backend API endpoints per `bend-handoff.md#api-endpoints-implemented` (if BEND is active). Record in `mobile-handoff.md#components-created`.
+## Step 7: Implement platform-specific behavior
+Handle platform divergences: permissions (camera, location, notifications), native modules, platform-specific UI (Android back button, iOS swipe-to-go-back, safe areas, notch handling). Use `Platform.OS` / `Platform.select` for divergent behavior. Record decisions in `mobile-handoff.md#implementation-decisions`.
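
The host detection in Step 3 maps the output of `uname -s` to the available targets. A minimal sketch of that mapping follows. `detectPlatform` and `PlatformCoverage` are hypothetical names, and treating an unrecognized kernel name as Android-only is an assumption the step does not spell out.

```typescript
type Host = "mac" | "linux" | "windows" | "unknown";

interface PlatformCoverage {
  host: Host;
  android: boolean;
  ios: boolean;
  iosDeferred: boolean; // mirrors `ios_deferred` in the handoff frontmatter
}

// Maps the output of `uname -s` to the targets available on this host.
function detectPlatform(unameOutput: string): PlatformCoverage {
  const kernel = unameOutput.trim();
  let host: Host = "unknown";
  if (kernel === "Darwin") host = "mac";
  else if (kernel === "Linux") host = "linux";
  else if (/^(MINGW|MSYS)/.test(kernel)) host = "windows";
  // Only a Mac host can run the iOS simulator; everything else defers iOS.
  const ios = host === "mac";
  // Assumption: unrecognized hosts are treated like Linux/Windows (Android only).
  return { host, android: true, ios, iosDeferred: !ios };
}
```

The caller would obtain `unameOutput` by running `uname -s` and then record the returned coverage in `mobile-handoff.md#platform-coverage`.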