npm - valent-pipeline - Versions diffs - 0.2.19 → 0.2.21 - Mend

valent-pipeline 0.2.19 → 0.2.21

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (115)

package/README.md +438 -0
package/package.json +1 -1
package/pipeline/agents-manifest.yaml +61 -1
package/pipeline/docs/agent-reference.md +82 -23
package/pipeline/docs/design/refactor-checklist.md +111 -0
package/pipeline/docs/index.md +60 -0
package/pipeline/docs/lead-lifecycle.md +1 -1
package/pipeline/docs/pipeline-overview.md +4 -0
package/pipeline/prompts/bend.md +5 -11
package/pipeline/prompts/critic.md +9 -0
package/pipeline/prompts/data.md +59 -0
package/pipeline/prompts/docgen.md +61 -0
package/pipeline/prompts/fend.md +3 -10
package/pipeline/prompts/iac.md +70 -0
package/pipeline/prompts/knowledge.md +2 -0
package/pipeline/prompts/lead.md +97 -6
package/pipeline/prompts/libdev.md +61 -0
package/pipeline/prompts/mcp-dev.md +59 -0
package/pipeline/prompts/mobile.md +92 -0
package/pipeline/prompts/qa-a.md +1 -1
package/pipeline/prompts/qa-b.md +1 -1
package/pipeline/prompts/reqs.md +5 -1
package/pipeline/scripts/db-bootstrap.ts +1 -1
package/pipeline/scripts/embed-sqlite.ts +5 -0
package/pipeline/steps/common/quality-standards.md +19 -0
package/pipeline/steps/critic/data-pipeline.md +28 -0
package/pipeline/steps/critic/document-generation.md +21 -0
package/pipeline/steps/critic/iac.md +29 -0
package/pipeline/steps/critic/library.md +24 -0
package/pipeline/steps/critic/mcp-server.md +24 -0
package/pipeline/steps/critic/mobile-app.md +29 -0
package/pipeline/steps/data/estimate.md +51 -0
package/pipeline/steps/data/handoff.md +9 -0
package/pipeline/steps/data/implement.md +16 -0
package/pipeline/steps/data/read-inputs.md +13 -0
package/pipeline/steps/data/write-tests.md +13 -0
package/pipeline/steps/docgen/estimate.md +49 -0
package/pipeline/steps/docgen/handoff.md +9 -0
package/pipeline/steps/docgen/implement.md +19 -0
package/pipeline/steps/docgen/read-inputs.md +13 -0
package/pipeline/steps/docgen/write-tests.md +15 -0
package/pipeline/steps/iac/estimate.md +50 -0
package/pipeline/steps/iac/handoff.md +9 -0
package/pipeline/steps/iac/implement.md +19 -0
package/pipeline/steps/iac/read-inputs.md +13 -0
package/pipeline/steps/iac/write-tests.md +20 -0
package/pipeline/steps/judge/ship-decision.md +14 -1
package/pipeline/steps/libdev/estimate.md +49 -0
package/pipeline/steps/libdev/handoff.md +9 -0
package/pipeline/steps/libdev/implement.md +19 -0
package/pipeline/steps/libdev/read-inputs.md +13 -0
package/pipeline/steps/libdev/write-tests.md +16 -0
package/pipeline/steps/mcp-dev/estimate.md +49 -0
package/pipeline/steps/mcp-dev/handoff.md +9 -0
package/pipeline/steps/mcp-dev/implement.md +29 -0
package/pipeline/steps/mcp-dev/read-inputs.md +13 -0
package/pipeline/steps/mcp-dev/write-tests.md +19 -0
package/pipeline/steps/mobile/emulator-lifecycle.md +67 -0
package/pipeline/steps/mobile/estimate.md +51 -0
package/pipeline/steps/mobile/flutter.md +30 -0
package/pipeline/steps/mobile/handoff.md +18 -0
package/pipeline/steps/mobile/implement.md +20 -0
package/pipeline/steps/mobile/react-native.md +32 -0
package/pipeline/steps/mobile/read-inputs.md +10 -0
package/pipeline/steps/mobile/write-tests.md +59 -0
package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
package/pipeline/steps/orchestration/sprint-execute.md +3 -2
package/pipeline/steps/orchestration/sprint-groom.md +4 -0
package/pipeline/steps/orchestration/sprint-size.md +26 -16
package/pipeline/steps/orchestration/validate-story-inputs.md +9 -0
package/pipeline/steps/qa-a/data-pipeline.md +32 -0
package/pipeline/steps/qa-a/document-generation.md +52 -0
package/pipeline/steps/qa-a/iac.md +30 -0
package/pipeline/steps/qa-a/library.md +42 -0
package/pipeline/steps/qa-a/mcp-server.md +31 -0
package/pipeline/steps/qa-a/mobile-app.md +59 -0
package/pipeline/steps/qa-b/data-pipeline.md +48 -0
package/pipeline/steps/qa-b/document-generation.md +47 -0
package/pipeline/steps/qa-b/iac.md +44 -0
package/pipeline/steps/qa-b/library.md +61 -0
package/pipeline/steps/qa-b/mcp-server.md +40 -0
package/pipeline/steps/qa-b/mobile-app.md +71 -0
package/pipeline/steps/readiness/standalone-review.md +7 -2
package/pipeline/steps/reqs/data-pipeline.md +56 -0
package/pipeline/steps/reqs/document-generation.md +55 -0
package/pipeline/steps/reqs/draft-brief.md +10 -0
package/pipeline/steps/reqs/iac.md +63 -0
package/pipeline/steps/reqs/library.md +56 -0
package/pipeline/steps/reqs/mcp-server.md +48 -0
package/pipeline/steps/reqs/mobile-app.md +54 -0
package/pipeline/steps/reqs/self-review.md +5 -3
package/pipeline/task-graphs/backend-api.yaml +19 -2
package/pipeline/task-graphs/data-pipeline.yaml +29 -12
package/pipeline/task-graphs/document-generation.yaml +29 -12
package/pipeline/task-graphs/frontend-only.yaml +19 -2
package/pipeline/task-graphs/fullstack-web.yaml +19 -2
package/pipeline/task-graphs/library.yaml +29 -12
package/pipeline/task-graphs/mcp-server.yaml +29 -12
package/pipeline/task-graphs/mobile-app.yaml +171 -0
package/pipeline/templates/bugs.template.md +1 -1
package/pipeline/templates/critic-review.template.md +1 -1
package/pipeline/templates/data-handoff.template.md +96 -0
package/pipeline/templates/docgen-handoff.template.md +83 -0
package/pipeline/templates/iac-handoff.template.md +83 -0
package/pipeline/templates/judge-decision.template.md +11 -1
package/pipeline/templates/libdev-handoff.template.md +82 -0
package/pipeline/templates/mcp-dev-handoff.template.md +87 -0
package/pipeline/templates/mobile-handoff.template.md +122 -0
package/pipeline/templates/reqs-brief.template.md +60 -4
package/skills/valent-run-deferred-tests/SKILL.md +109 -0
package/skills/valent-run-epic/SKILL.md +1 -1
package/skills/valent-run-project/SKILL.md +1 -1
package/src/commands/db-rebuild.js +5 -0
package/src/lib/config-schema.js +1 -1
package/src/lib/db.js +1 -1

package/pipeline/steps/critic/mobile-app.md ADDED

@@ -0,0 +1,29 @@
+# CRITIC Domain: Mobile App Review
+**Applies to:** Stories where MOBILE is the implementing agent.
+## Edge Case Focus Areas
+In addition to the standard edge case hunt (Pass 2), scrutinize these mobile-specific risks:
+- **Hardcoded platform assumptions** — code that assumes Android-only or iOS-only without platform checks. `Platform.OS` / `Platform.select` (RN) or `dart:io Platform.isAndroid` (Flutter) must be used for divergent behavior. Any hardcoded platform assumption is a **Med** finding.
+- **Missing state isolation in Maestro flows** — flows that don't start with `clearState` or explicit app data clear. Any flow without state isolation is a **High** finding.
+- **Emulator-only code paths** — code that works on emulator but will fail on real devices. Common: `localhost` URLs instead of `10.0.2.2` for Android emulator API access, `127.0.0.1` instead of the machine's IP. Any hardcoded `localhost` or `127.0.0.1` in API base URLs is a **High** finding.
+- **Missing permission handling** — native features (camera, location, notifications, biometrics) used without permission request flows. The app must handle both grant and deny outcomes. Missing permission handling is a **High** finding.
+- **Navigation stack leaks** — screens pushed but never popped, leading to memory growth. Deep link handlers that don't reset the navigation stack are a **Med** finding. Verify that deep links use `reset` or `popToTop` + `navigate` pattern.
+- **Gesture conflict** — overlapping gesture handlers (e.g., swipe-to-delete conflicting with navigation swipe, scroll inside scroll). Any unhandled gesture conflict is a **Med** finding.
+- **Metro bundler dependency in production** — code that assumes Metro is running or references `__DEV__` without guards. Any dev-only code path reachable in production bundle is a **High** finding.
+- **Missing keyboard avoidance** — input screens that don't handle keyboard appearance (content hidden behind keyboard). Missing `KeyboardAvoidingView` (RN) or `resizeToAvoidBottomInset` (Flutter) for input screens is a **Med** finding.
+- **Unsafe area rendering** — content rendered under status bar, notch, or home indicator. Missing `SafeAreaView` (RN) or `SafeArea` (Flutter) on screen containers is a **Med** finding.
+- **Background/foreground state bugs** — app state corruption when returning from background. If the story involves data fetching, verify that stale data is refreshed on foreground. Missing foreground refresh for data screens is a **Low** finding.
+## Test Code Review Additions
+In addition to the standard test code review checklist:
+- **Maestro flow completeness** — every AC must have a corresponding Maestro flow. Missing flow for an AC is a **High** finding.
+- **State isolation verified** — every Maestro flow must start with `clearState` or explicit app data clear. Missing isolation is a **High** finding.
+- **Platform coverage** — if story specifies both-platform behavior, flows must cover both (or explicitly mark iOS as deferred with reason). Missing platform coverage without documented reason is a **Med** finding.
+- **No hardcoded waits in flows** — Maestro flows should use `assertVisible` / `waitForAnimationToEnd`, not `extendedWaitUntil` with fixed timeouts. Fixed timeouts in flows are a **Med** finding.
+- **Deep link flow coverage** — if reqs-brief specifies deep link URIs, at least one Maestro flow must test each URI via `openLink`. Missing deep link test is a **Med** finding.
+- **Unit test isolation** — unit tests must not depend on emulator state or Maestro. They run independently with mocked native modules where needed. Unit tests importing emulator-specific code is a **Med** finding.
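
Two of the checks above (state isolation in Maestro flows, and hardcoded loopback URLs) lend themselves to a simple text scan. The following is a minimal sketch, assuming flows and source files are available as strings; the helper names are hypothetical and not part of valent-pipeline. It is a plain text match rather than a YAML parse, so a commented-out `clearState` would still count as present.

```python
from __future__ import annotations

import re

# Hypothetical helpers for illustration; not part of valent-pipeline.
LOOPBACK_URL = re.compile(r"https?://(?:localhost|127\.0\.0\.1)\b")


def flow_has_state_isolation(flow_yaml: str) -> bool:
    """Return True if the flow text mentions clearState anywhere.

    Accepts both the standalone `- clearState` command and
    `clearState: true` under `launchApp`.
    """
    return re.search(r"\bclearState\b", flow_yaml) is not None


def find_loopback_urls(source: str) -> list[str]:
    """Return every hardcoded loopback URL prefix found in the source text."""
    return LOOPBACK_URL.findall(source)


flow_ok = 'appId: com.example.app\n---\n- launchApp:\n    clearState: true\n- assertVisible: "Home"\n'
flow_bad = 'appId: com.example.app\n---\n- launchApp\n- assertVisible: "Home"\n'
```

A reviewer could run both helpers over the story's flow files and API client modules and raise a **High** finding for each hit, matching the severities listed above.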

package/pipeline/steps/data/estimate.md ADDED

@@ -0,0 +1,51 @@
+# Data Pipeline Estimation
+**Purpose:** Assign a Fibonacci story point estimate for data pipeline implementation complexity. This is a lightweight estimation step -- no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` -- REQUIRED
+- `{story_output_dir}/qa-test-spec.md` -- REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **AC count and complexity** | How many ACs? Are conditions simple (read-and-write) or complex (multi-stage transforms, conditional logic, aggregations)? | High |
+| **New patterns vs established** | Greenfield pipeline vs incremental (add stage, add source)? | High |
+| **Data source complexity** | Number of sources, format diversity (CSV, JSON, Parquet, API), encoding issues, schema volatility? | Medium |
+| **Transform complexity** | Joins, aggregations, dedup logic, data type coercions, timezone handling, stateful transforms? | High |
+| **Data quality surface** | How many quality rules? Null handling, dedup, row count assertions, constraint enforcement? | Medium |
+| **Checkpoint/resume needs** | Does the pipeline need checkpoint/resume? How many stages? | Medium |
+| **Test complexity** | Idempotency tests, checkpoint/resume tests, large dataset edge cases, malformed input handling? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical Data Pipeline Scope |
+|--------|---------------------------|
+| 1 | Config change, add column to existing stage, no new source |
+| 2 | Simple single-source ingest, trivial transform, basic output |
+| 3 | Standard ETL with one source, moderate transforms, quality rules |
+| 5 | Multi-source pipeline, joins, dedup, checkpoint, moderate test surface |
+| 8 | Complex transforms with aggregations, multiple quality rules, checkpoint/resume, cross-source joins |
+| 13 | Large pipeline spanning multiple domains, complex data model, extensive quality rules, full checkpoint/resume |
+| 21 | Epic-scale: new pipeline framework or major architectural change (consider splitting the story) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints -- e.g., "multi-source joins consistently under-pointed by 1 tier" or "stories with 4+ transform stages average 8 points."
+## Step 4: Write Estimate
+Write to `{story_output_dir}/data-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] DATA estimates {story_id} at {points} points. See data-estimation.md.`
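
The calibration example "under-pointed by 1 tier" can be read as moving an estimate one position along the Fibonacci scale. The sketch below encodes that reading; treating a "tier" as one step on the scale is an assumption, and `apply_tier_adjustment` is a hypothetical helper, not something the package defines.

```python
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]


def apply_tier_adjustment(points: int, tiers: int) -> int:
    """Shift an estimate up (positive) or down (negative) the scale.

    The result is clamped to the ends of the scale. Off-scale inputs
    raise ValueError via list.index.
    """
    index = FIBONACCI_SCALE.index(points)
    shifted = max(0, min(len(FIBONACCI_SCALE) - 1, index + tiers))
    return FIBONACCI_SCALE[shifted]
```

For instance, a multi-source join story first assessed at 5 points would become 8 under a "+1 tier" directive, while a 21-point estimate stays at 21.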

package/pipeline/steps/data/handoff.md ADDED

@@ -0,0 +1,9 @@
+# DATA Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 12: Write data-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/data-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Data pipeline implementation complete. See data-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass against the complete pipeline before marking your task complete. Do not rely on CRITIC or QA-B to catch your failures.

package/pipeline/steps/data/implement.md ADDED

@@ -0,0 +1,16 @@
+# DATA Step: Implement
+## Step 4: Data model / pipeline state schema
+Per reqs-brief: define or update the data model for pipeline inputs, outputs, and intermediate state. Define checkpoint state schema if the pipeline requires resume capability. Record in `data-handoff.md#pipeline-stages-implemented`.
+## Step 5: Ingestion layer (readers, parsers, encoding handling)
+Per reqs-brief: implement readers for each data source. Every file read must specify encoding explicitly (UTF-8 unless source requires otherwise). Parsers must handle malformed input gracefully -- log and skip bad records, never silently drop or crash. Record in `data-handoff.md#pipeline-stages-implemented`.
+## Step 6: Transform stages
+Per reqs-brief: implement each transform as a discrete, testable stage. Every filter or join that reduces row count MUST log: rows in, rows out, rows dropped, and drop reason. Silent data loss is a Critical defect. Record data quality rules in `data-handoff.md#data-quality-rules`. Record decisions in `data-handoff.md#implementation-decisions`.
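
The row-count logging rule in Step 6 can be illustrated with a small filter stage. This is a sketch only: the function name, logger name, and log field names are assumptions chosen for the example.

```python
import logging

logger = logging.getLogger("pipeline.transform")


def filter_stage(rows, predicate, stage_name, drop_reason):
    """Keep rows matching predicate and log rows in, out, and dropped."""
    rows = list(rows)
    kept = [row for row in rows if predicate(row)]
    dropped = len(rows) - len(kept)
    # One log line per stage so a row-count reduction is never silent.
    logger.info(
        "stage=%s rows_in=%d rows_out=%d rows_dropped=%d drop_reason=%s",
        stage_name, len(rows), len(kept), dropped, drop_reason,
    )
    return kept


sample = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
kept = filter_stage(
    sample, lambda r: r["email"] is not None, "drop_null_email", "email is null"
)
```

Keeping the drop reason as an argument forces each stage author to state why rows may disappear, which is what makes the counts auditable later.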
+## Step 7: Output / sink layer (idempotent writes with natural keys)
+Per reqs-brief: implement writers for each data destination. All writes must be idempotent -- use natural keys or deterministic IDs so that re-running the pipeline with the same input produces identical results, not duplicates. Record in `data-handoff.md#pipeline-stages-implemented`.
+## Step 8: Checkpoint mechanism for resume after failure
+Per reqs-brief: implement checkpoint markers after each major stage. On failure, the pipeline must be able to resume from the last successful checkpoint rather than restarting from scratch. Record in `data-handoff.md#checkpointresume-design`.
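
A minimal sketch of the checkpoint-and-resume behavior from Step 8, assuming stages are plain callables and the checkpoint is a JSON file listing completed stage names. The file layout and `run_pipeline` are hypothetical. A real pipeline would also need to persist each stage's intermediate data, which this sketch leaves out.

```python
import json
import os
import tempfile


def run_pipeline(stages, checkpoint_path):
    """Run (name, func) stages in order, skipping stages already checkpointed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            done = json.load(fh)["completed"]
    for name, func in stages:
        if name in done:
            continue  # finished in an earlier run
        func()
        done.append(name)
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written checkpoint marker behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump({"completed": done}, fh)
        os.replace(tmp_path, checkpoint_path)
    return done


# Usage: the second stage fails once, then the rerun resumes after "ingest".
calls = []
fail_once = {"pending": True}


def ingest():
    calls.append("ingest")


def transform():
    if fail_once["pending"]:
        fail_once["pending"] = False
        raise RuntimeError("simulated mid-run failure")
    calls.append("transform")


stages = [("ingest", ingest), ("transform", transform)]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_pipeline(stages, checkpoint)
except RuntimeError:
    pass  # first run stops after the "ingest" checkpoint
completed = run_pipeline(stages, checkpoint)
```

The second call does not re-run `ingest`, which is the property the checkpoint/resume test in the write-tests step is meant to assert.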

package/pipeline/steps/data/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# DATA Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, data sources, data destinations, transform logic, data quality rules, encoding requirements, cross-cutting concerns (logging, retry, scheduling, etc.).
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, row count verification requirements, idempotency expectations, checkpoint/resume test cases, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting DATA. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am DATA implementing {story_id} using {tech_stack.pipeline_framework} + {tech_stack.data_store}.` If no response arrives within a reasonable time, or no Knowledge Agent is spawned, proceed without one.

package/pipeline/steps/data/write-tests.md ADDED

@@ -0,0 +1,13 @@
+# DATA Step: Write Tests
+## Step 8: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `data-handoff.md#test-files-written`.
+## Step 9: Verify idempotency
+Every pipeline write path must have an idempotency test: run the same input through the pipeline twice and assert that the output is identical -- no duplicate rows, no changed values, no side effects from the second run.
+## Step 10: Verify checkpoint/resume
+If the pipeline has checkpoint capability, test it: run the pipeline, simulate a failure mid-run (after at least one checkpoint), resume, and assert that the final output matches a clean full run. No data loss, no duplicates from partial + resumed execution.
+## Step 11: Run tests, verify all pass
+Run the full pipeline test suite. All tests must pass. Record results in `data-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
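
The idempotency check in Step 9 can be sketched against an in-memory SQLite table. The `orders` table, its `order_id` natural key, and the helper names are assumptions made for this example; the actual data store comes from the story's tech stack.

```python
import sqlite3


def write_rows(conn, rows):
    """Write rows keyed on the natural key order_id.

    INSERT OR REPLACE overwrites an existing key, so a re-run with the
    same input does not create duplicates.
    """
    conn.executemany(
        "INSERT OR REPLACE INTO orders (order_id, amount) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def snapshot(conn):
    """Return the full table contents in a deterministic order."""
    return conn.execute(
        "SELECT order_id, amount FROM orders ORDER BY order_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER)")
rows = [("A-1", 100), ("A-2", 250)]

write_rows(conn, rows)
first = snapshot(conn)
write_rows(conn, rows)  # same input, second run
second = snapshot(conn)
```

Comparing full snapshots, instead of only row counts, also catches the "changed values" case that the step calls out.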

package/pipeline/steps/docgen/estimate.md ADDED

@@ -0,0 +1,49 @@
+# Document Generation Estimation
+**Purpose:** Assign a Fibonacci story point estimate for document generation implementation complexity. This is a lightweight estimation step -- no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` -- REQUIRED
+- `{story_output_dir}/qa-test-spec.md` -- REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **Template count and complexity** | How many templates? Are they simple (static text + variable substitution) or complex (conditionals, loops, nested partials, dynamic sections)? | High |
+| **Variable schema complexity** | Simple flat variables vs deeply nested objects, arrays of objects, computed/derived values? | High |
+| **Output format count** | Single format vs multi-format (PDF + HTML + Markdown)? Each format adds rendering and validation complexity. | Medium |
+| **Asset dependencies** | No assets vs fonts/images/stylesheets that must be embedded or resolved? | Medium |
+| **Edge-case surface** | How hard will QA-B's test suite be to pass? Unicode, injection, null handling, large documents? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical DOCGEN Scope |
+|--------|---------------------|
+| 1 | Single template, flat variables, one output format, no assets |
+| 2 | Simple template with conditionals, basic variable schema |
+| 3 | Standard template with loops/conditionals, multi-variable schema, single output format |
+| 5 | Multiple templates, nested variable schemas, 2+ output formats |
+| 8 | Complex templates with partials/inheritance, asset pipeline, multi-format output |
+| 13 | Large template system with dynamic sections, complex asset resolution, extensive edge-case surface |
+| 21 | Epic-scale: new template engine integration or major rendering pipeline change (consider splitting the story) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints -- e.g., "multi-format stories consistently under-pointed by 1 tier" or "stories with asset dependencies average 8 points."
+## Step 4: Write Estimate
+Write to `{story_output_dir}/docgen-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] DOCGEN estimates {story_id} at {points} points. See docgen-estimation.md.`

package/pipeline/steps/docgen/handoff.md ADDED

@@ -0,0 +1,9 @@
+# DOCGEN Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 13: Write docgen-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/docgen-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Document generation implementation complete. See docgen-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass and all templates produce valid output in every declared format before marking your task complete. Do not rely on CRITIC to catch your failures.

package/pipeline/steps/docgen/implement.md ADDED

@@ -0,0 +1,19 @@
+# DOCGEN Step: Implement
+## Step 4: Plan implementation approach
+Order: template engine setup (auto-escape on) -> template definitions with variable schema -> render pipeline (validate -> substitute -> output) -> encoding (UTF-8) -> asset embedding/resolution. Identify cross-cutting concerns that span multiple templates.
+## Step 5: Set up template engine with auto-escaping
+Configure the template engine with auto-escaping enabled by default. Any raw/unescaped output must require explicit opt-in with a justifying comment. Record engine configuration in `docgen-handoff.md#implementation-decisions`.
+## Step 6: Implement template definitions with variable schema
+Per reqs-brief: define templates, declare all variables with types and required/optional status, implement conditional sections, loops, and partials. Record in `docgen-handoff.md#templates-implemented` and `docgen-handoff.md#variable-schema`.
+## Step 7: Implement render pipeline
+Build the render pipeline: validate all required variables are present and correctly typed before substitution -> substitute variables into templates -> generate output in the target format. Missing or null required variables must produce a clear error, never unsubstituted markers in output. Record in `docgen-handoff.md#implementation-decisions`.
+## Step 8: Implement encoding and output formatting
+All template reads and output writes must specify UTF-8 encoding explicitly. For large documents, use streaming render to avoid unbounded memory consumption. Record supported formats in `docgen-handoff.md#output-formats`.
+## Step 9: Implement asset embedding and resolution
+Per reqs-brief: resolve and embed fonts, images, stylesheets referenced by templates. Validate that all referenced assets exist at render time -- broken asset paths must produce a clear error. Record in `docgen-handoff.md#asset-dependencies`.
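
The validate, substitute, output order from Step 7, combined with escaping by default from Step 5, might look like the sketch below. The source does not name a template engine, so Python's `string.Template` and `html.escape` stand in for one here; the schema shape (`name -> (type, required)`) is likewise an assumption.

```python
import html
from string import Template


def render(template_text, schema, variables):
    """Validate variables against schema, HTML-escape them, then substitute."""
    for name, (expected_type, required) in schema.items():
        value = variables.get(name)
        if value is None:
            if required:
                raise ValueError(f"missing required variable: {name}")
            continue
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")

    escaped = {}
    for name in schema:
        value = variables.get(name)
        # Optional variables that are absent render as an empty string.
        escaped[name] = "" if value is None else html.escape(str(value))

    # Template.substitute raises KeyError for any placeholder that is not
    # in the schema, so no unsubstituted marker can reach the output.
    return Template(template_text).substitute(escaped)


schema = {"customer": (str, True), "note": (str, False)}
page = render(
    "<p>Hello $customer</p><p>$note</p>", schema, {"customer": "<b>Ann</b>"}
)
```

Validating before substitution means a missing required variable fails with a named error instead of producing a document that looks complete but is not.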

package/pipeline/steps/docgen/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# DOCGEN Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, template definitions, variable schemas, output format requirements, asset dependencies (fonts, images, stylesheets), encoding requirements, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, output validation requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting DOCGEN. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am DOCGEN implementing {story_id} using {tech_stack.template_engine} with output formats {tech_stack.output_formats}.` If no response arrives within a reasonable time, or no Knowledge Agent is spawned, proceed without one.

package/pipeline/steps/docgen/write-tests.md ADDED

@@ -0,0 +1,15 @@
+# DOCGEN Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `docgen-handoff.md#test-files-written`.
+Tests must invoke real document generation -- no mocked renderers. The actual template engine must process templates and produce real output. Parse and validate the generated output programmatically:
+- For HTML: parse the DOM and assert on structure and content.
+- For PDF: extract text/metadata and assert on content.
+- For Markdown: parse and assert on structure and content.
+## Step 11: Test with edge-case data
+Per qa-test-spec and quality standards: tests must exercise null values, missing optional fields, unicode characters (CJK, emoji, RTL text), extremely long strings, empty collections, and special characters that could break template syntax. Verify that auto-escaping prevents injection in all cases.
+## Step 12: Run tests, verify all pass
+Run the full test suite. All tests must pass. Record results in `docgen-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
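
For the HTML case in Step 10, "parse the DOM and assert" can be done with the standard library alone. The sketch below is illustrative: the `generated` string is a hardcoded stand-in for output that a real test would obtain from the actual template engine, and `TextByTag` is a hypothetical helper.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Record every start tag and the text found directly inside each tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.tags = []
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.tags.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            self.text.setdefault(self.stack[-1], []).append(data)


def parse_document(document):
    parser = TextByTag()
    parser.feed(document)
    parser.close()
    return parser


# Stand-in for real renderer output containing escaped user input.
generated = (
    "<table><tr><td>&lt;script&gt;alert(1)&lt;/script&gt;</td>"
    "<td>日本語 😀</td></tr></table>"
)
parsed = parse_document(generated)
cell_text = "".join(parsed.text["td"])
```

If auto-escaping works, the injected markup shows up only as cell text and never as a `script` element in the parsed tag list.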

package/pipeline/steps/iac/estimate.md ADDED

@@ -0,0 +1,50 @@
+# Infrastructure Estimation
+**Purpose:** Assign a Fibonacci story point estimate for infrastructure implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` — REQUIRED
+- `{story_output_dir}/qa-test-spec.md` — REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **Resource count and complexity** | How many resources? Simple (S3 bucket, DNS record) or complex (VPC, EKS cluster, RDS)? | High |
+| **New infrastructure vs incremental** | Greenfield (new VPC, new cluster) vs incremental (add IAM role, add env var)? | High |
+| **IAM/Security surface** | New roles, policies, service accounts? Cross-account access? | Medium |
+| **State management complexity** | New state backend, state migration, workspace setup? | Medium |
+| **Integration surface** | Outputs consumed by other services, shared config, cross-stack references? | Medium |
+| **Test complexity** | How hard will infrastructure validation be? Multi-resource dependencies, timing issues? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical Infrastructure Scope |
+|--------|------------------------------|
+| 1 | Tag update, single env var, minor config change |
+| 2 | Simple resource addition (S3 bucket, DNS record), trivial IAM tweak |
+| 3 | Standard resource with IAM, moderate config (Lambda + role, security group rules) |
+| 5 | Multi-resource feature (RDS + security group + IAM + secrets), provider setup |
+| 8 | Complex infrastructure (VPC + subnets + NAT + routing, ECS service + ALB + auto-scaling) |
+| 13 | Large infrastructure spanning multiple services, complex IAM, multi-environment setup |
+| 21 | Epic-scale: new cluster, major networking overhaul, multi-region setup (consider splitting) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate.
+## Step 4: Write Estimate
+Write to `{story_output_dir}/iac-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] IAC estimates {story_id} at {points} points. See iac-estimation.md.`

package/pipeline/steps/iac/handoff.md ADDED

@@ -0,0 +1,9 @@
+# IAC Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 13: Write iac-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/iac-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Infrastructure implementation complete. See iac-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass against the combined, integrated codebase before marking your task complete. Do not rely on other agents or CRITIC to catch your failures.

package/pipeline/steps/iac/implement.md ADDED

@@ -0,0 +1,19 @@
+# IAC Step: Implement
+## Step 4: Plan implementation approach
+Order: (1) state backend setup, (2) provider configuration, (3) resource definitions with tagging, (4) IAM roles/policies (least privilege), (5) outputs/exports for consuming services, (6) environment configuration. Identify shared config that needs coordination with BEND/FEND/DATA.
+## Step 5: Configure state backend
+Per reqs-brief: set up remote state backend with locking. Record in `iac-handoff.md#state-management`.
+## Step 6: Configure providers
+Per reqs-brief: provider blocks with pinned versions, authentication configuration, default tags. Record in `iac-handoff.md#files-created-modified`.
+## Step 7: Define resources with tagging
+Per reqs-brief: resource definitions with standard tags (environment, project, owner, managed-by). Every resource must be tagged. Apply destroy protection on stateful resources (databases, storage). Record in `iac-handoff.md#resources-provisioned`.
+## Step 8: Create IAM roles and policies
+Per reqs-brief: roles, policies, service accounts with least-privilege access. No wildcard actions or resources unless explicitly justified. Record in `iac-handoff.md#iam-security`.
+## Step 9: Define outputs and environment configuration
+Per reqs-brief: outputs/exports for consuming services, environment variables, secret references, connection strings. Coordinate with other dev agents via inbox for shared config. Record in `iac-handoff.md#environment-configuration`.
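
The "no wildcard actions or resources" rule in Step 8 can be checked mechanically. This sketch assumes an AWS-style IAM policy document already loaded as a dict; the cloud provider is not fixed by the source, and `find_wildcards` is a hypothetical helper. It flags only a bare `"*"`, not service-level patterns such as `s3:*`.

```python
def find_wildcards(policy):
    """Return (statement index, field) pairs where an Allow uses a bare '*'."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = statement.get(field, [])
            if isinstance(values, str):  # string and list forms are both valid
                values = [values]
            if "*" in values:
                findings.append((index, field))
    return findings


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}
findings = find_wildcards(policy)
```

Any finding would then need either a fix or the explicit justification that the step requires.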

package/pipeline/steps/iac/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# IAC Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, infrastructure requirements, deployment configurations, resource provisioning needs, IAM/security requirements, environment configuration, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, infrastructure state verification requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting IAC. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, infrastructure patterns, and known pitfalls should I know? Context: I am IAC implementing {story_id} using {tech_stack.iac_framework} on {tech_stack.cloud_provider}.` If no response arrives within a reasonable time, or no Knowledge Agent is spawned, proceed without one.

package/pipeline/steps/iac/write-tests.md ADDED

@@ -0,0 +1,20 @@
+# IAC Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `iac-handoff.md#test-files-written`.
+Infrastructure test categories:
+- **Plan validation** -- terraform plan (or equivalent) succeeds without errors
+- **Policy checks** -- tflint, checkov, OPA, or equivalent policy-as-code validation passes
+- **Idempotency** -- apply twice, second apply shows no changes (zero diff)
+- **Resource tagging verification** -- all resources have required standard tags
+- **Security policy checks** -- no overly permissive IAM, no hardcoded secrets
+Do NOT use mocked providers for happy-path tests. Tests must validate against real plan output or real infrastructure state.
+## Step 11: Run tests, verify all pass
+Run the full infrastructure test suite. All tests must pass. Record results in `iac-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
+## Step 12: Signal integration readiness
+When your code is complete and all tests pass, send to other active dev agents via inbox:
+`[INTEGRATION-READY] Infrastructure code complete. Environment configuration available at iac-handoff.md#environment-configuration.`
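Two of the test categories above, resource tagging verification and idempotency, can be checked against real plan output. The sketch below assumes Terraform and the JSON produced by `terraform show -json <planfile>`, modeling only the `resource_changes` fields it reads. It also assumes every resource in the plan supports a `tags` attribute, which is not true of all resource types. `findMissingTags` and `hasZeroDiff` are hypothetical names; the required tag list is taken from the implementation step.

```typescript
// Partial, assumed shape of `terraform show -json` output (only the fields used here).
interface ResourceChange {
  address: string;
  change: {
    actions: string[];
    after: { tags?: { [key: string]: string } | null } | null;
  };
}
interface PlanJson {
  resource_changes?: ResourceChange[];
}

// Standard tags named in the implementation step.
const REQUIRED_TAGS = ["environment", "project", "owner", "managed-by"];

// Report every planned resource that lacks one or more required tags.
function findMissingTags(plan: PlanJson): string[] {
  const problems: string[] = [];
  for (const rc of plan.resource_changes || []) {
    const after = rc.change.after;
    if (after === null) continue; // resource is being destroyed; nothing to tag
    const tags = after.tags || {};
    const missing = REQUIRED_TAGS.filter((t) => !(t in tags));
    if (missing.length > 0) {
      problems.push(`${rc.address}: missing ${missing.join(", ")}`);
    }
  }
  return problems;
}

// Idempotency: a plan generated after the first apply should contain only no-op actions.
function hasZeroDiff(plan: PlanJson): boolean {
  return (plan.resource_changes || []).every((rc) =>
    rc.change.actions.every((action) => action === "no-op")
  );
}
```

In a test, `findMissingTags` would run on the plan produced before the first apply, and `hasZeroDiff` on the plan produced after it.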

package/pipeline/steps/judge/ship-decision.md CHANGED

@@ -8,8 +8,20 @@ Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing
 - ALL evidence checklist items PASS (or N/A)
 - Socratic validation reveals no concerns undermining evidence integrity
+**SHIP-PARTIAL** if:
+- ALL available-platform evidence checklist items PASS (or N/A)
+- Platform coverage evidence item is PARTIAL (not FAIL)
+- iOS tests are deferred (not failed) due to host platform limitation
+- `mobile-handoff.md` frontmatter has `ios_deferred: true`
+- Socratic validation reveals no concerns for available-platform evidence
+SHIP-PARTIAL is ONLY valid when:
+1. The project type is `mobile-app`
+2. iOS tests were deferred due to host OS limitation (not because they failed)
+3. All Android tests passed
 **REJECT** if:
-- ANY evidence checklist item FAIL, OR
+- ANY evidence checklist item FAIL (excluding PARTIAL platform coverage when iOS is deferred, not failed), OR
 - Socratic validation undermines evidence integrity
 For REJECT: identify root cause, responsible phase/agent, required action. JUDGE rejections are non-routine; lead (Opus) takes ownership.
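The verdict rules in this hunk can be read as a small decision function. The sketch below is one possible interpretation, not the pipeline's implementation: the `JudgeEvidence` shape and the `decide` name are hypothetical, and treating any PARTIAL item other than platform coverage as a REJECT is an assumption, since the hunk does not cover that case.

```typescript
type ItemStatus = "PASS" | "FAIL" | "N/A" | "PARTIAL";
type Verdict = "SHIP" | "SHIP-PARTIAL" | "REJECT";

// Hypothetical summary of the evidence the judge has gathered.
interface JudgeEvidence {
  projectType: string;
  checklist: { [item: string]: ItemStatus };
  platformCoverageItem: string; // key of the platform coverage checklist item
  iosDeferred: boolean;         // `ios_deferred: true` in mobile-handoff.md frontmatter
  androidTestsPassed: boolean;
  socraticConcerns: boolean;    // Socratic validation undermines evidence integrity
}

function decide(e: JudgeEvidence): Verdict {
  const items = Object.keys(e.checklist);
  if (e.socraticConcerns) return "REJECT";
  if (items.some((k) => e.checklist[k] === "FAIL")) return "REJECT";

  const partial = items.filter((k) => e.checklist[k] === "PARTIAL");
  if (partial.length === 0) return "SHIP"; // everything is PASS or N/A

  // SHIP-PARTIAL: only platform coverage is PARTIAL, on a mobile-app project,
  // with iOS deferred (not failed) and all Android tests passing.
  const onlyPlatformPartial =
    partial.length === 1 && partial[0] === e.platformCoverageItem;
  const partialAllowed =
    e.projectType === "mobile-app" && e.iosDeferred && e.androidTestsPassed;
  return onlyPlatformPartial && partialAllowed ? "SHIP-PARTIAL" : "REJECT";
}
```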
@@ -19,6 +31,7 @@ For REJECT: identify root cause, responsible phase/agent, required action. JUDGE
 - Write to `{story_output_dir}/judge-decision.md`
 - Update frontmatter: status completed, all steps in stepsCompleted
 - SHIP: `[JUDGE-SHIP] Story {story_id} approved for shipping. See judge-decision.md#verdict.`
+- SHIP-PARTIAL: `[JUDGE-SHIP-PARTIAL] Story {story_id} approved for shipping (Android verified). iOS deferred — run /run-deferred-tests on Mac. See judge-decision.md#ship-partial-detail.`
 - REJECT: `[JUDGE-REJECT] Story {story_id} rejected. See judge-decision.md#rejection-detail.`
 ## Step 14b: Write Story Report (SHIP only)

package/pipeline/steps/libdev/estimate.md ADDED

@@ -0,0 +1,49 @@
+# Library Estimation
+**Purpose:** Assign a Fibonacci story point estimate for library implementation complexity. This is a lightweight estimation step — no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` — REQUIRED
+- `{story_output_dir}/qa-test-spec.md` — REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **AC count and complexity** | How many ACs? Are they simple (add export) or complex (breaking API redesign, dual-module compat)? | High |
+| **New patterns vs established** | Greenfield (new library, new module system) vs incremental (add export, update types)? | High |
+| **Public API surface** | How many new exports? New type declarations? Breaking changes requiring migration? | Medium |
+| **Module system complexity** | Single target (ESM-only) vs dual CJS+ESM? Tree-shaking requirements? | Medium |
+| **Test complexity** | How hard will consumer-simulation tests be? Cross-module-system testing? Bundle verification? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical Library Scope |
+|--------|-----------------------|
+| 1 | Single export addition, type fix, no packaging change |
+| 2 | Simple new export with types, trivial packaging update |
+| 3 | Multiple exports, type declarations, moderate test coverage |
+| 5 | New module with CJS+ESM entry points, type overhaul, consumer-sim tests |
+| 8 | Major API surface change, breaking changes with migration, dual-module verification |
+| 13 | Large library restructure, new module system support, extensive type surface |
+| 21 | Epic-scale: new library or complete packaging overhaul (consider splitting the story) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints — e.g., "dual CJS+ESM stories consistently under-pointed by 1 tier" or "stories with breaking changes average 8 points."
+## Step 4: Write Estimate
+Write to `{story_output_dir}/libdev-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] LIBDEV estimates {story_id} at {points} points. See libdev-estimation.md.`

package/pipeline/steps/libdev/handoff.md ADDED

@@ -0,0 +1,9 @@
+# LIBDEV Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 13: Write libdev-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/libdev-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] Library implementation complete. See libdev-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass, all exports resolve correctly, CJS and ESM entry points work, and type declarations are complete before marking your task complete. Do not rely on CRITIC or QA-B to catch your failures.

package/pipeline/steps/libdev/implement.md ADDED

@@ -0,0 +1,19 @@
+# LIBDEV Step: Implement
+## Step 4: Plan implementation approach
+Order: public API surface design (exports map) -> core module implementation -> type declarations (.d.ts/hints) -> CJS+ESM entry points -> side effects verification. Identify peer dependencies that must be declared.
+## Step 5: Design public API surface
+Per reqs-brief: define the exports map, entry points, and module structure. Every public export must be intentional -- no internal modules leaking into the public API. Record in `libdev-handoff.md#public-api-surface`.
+## Step 6: Implement core modules
+Per reqs-brief: implement the library's core logic, utilities, and domain types. Each module must be independently importable via the exports map. Record in `libdev-handoff.md#files-created-modified`.
+## Step 7: Implement type declarations
+Per reqs-brief: create or generate type declarations for every public export. TypeScript: .d.ts files or inline types. Python: type hints and py.typed marker. Rust: public type signatures. Types must match the actual implementation signatures exactly.
+## Step 8: Configure entry points and packaging
+Per reqs-brief: configure main (CJS), module (ESM), and types entry points. Set sideEffects field. Declare peer dependencies with correct version ranges. Record in `libdev-handoff.md#package-configuration`.
+## Step 9: Verify no accidental side effects
+Import the library's main entry point and verify no observable side effects execute (no console output, no network calls, no global mutations, no file I/O). If side effects are intentional, document them and set sideEffects field accordingly. Record decisions in `libdev-handoff.md#implementation-decisions`.

package/pipeline/steps/libdev/read-inputs.md ADDED

@@ -0,0 +1,13 @@
+# LIBDEV Step: Read Inputs
+## Step 1: Read reqs-brief.md
+Understand: acceptance criteria, business rules, public API surface requirements, export contracts, type declarations, packaging constraints, peer dependency requirements, cross-cutting concerns.
+## Step 2: Read qa-test-spec.md
+Understand: what tests to write for each AC, expected assertions, consumer-simulation requirements, test case names and structure.
+## Step 3: Read correction directives
+Read `{correction_directives}`. Apply all directives targeting LIBDEV. Note any conflicts with default behavior and follow the directive.
+## Step 3b: Query Knowledge Agent (Conditional)
+If a Knowledge Agent is available in the team config, send: `[KNOWLEDGE-QUERY] What codebase conventions, implementation patterns, and known pitfalls should I know? Context: I am LIBDEV implementing {story_id} using {tech_stack.language} with {tech_stack.module_system} module system and {tech_stack.type_system}.` If no response within a reasonable time or no Knowledge Agent is spawned, proceed without.

package/pipeline/steps/libdev/write-tests.md ADDED

@@ -0,0 +1,16 @@
+# LIBDEV Step: Write Tests
+## Step 10: Write test code
+Satisfy qa-test-spec for each AC. Every test case named in qa-test-spec must have a corresponding test. Follow quality standards from the core prompt. Record in `libdev-handoff.md#test-files-written`.
+**Critical rule:** Tests must import the actual library using its public API -- the same way a consumer would. No importing from internal/private module paths. No mocked imports. If a test cannot exercise the library through its public exports, the API surface is wrong, not the test.
+## Step 11: Write consumer-simulation tests
+Beyond unit tests, write at least one consumer-simulation test per major export that:
+1. Imports the library the way a real consumer would (CJS `require()` and ESM `import`)
+2. Exercises the documented usage example from the public API surface
+3. Verifies the return types match declared type signatures
+4. Confirms no side effects on import (no console output, no global mutations)
+## Step 12: Run tests, verify all pass
+Run the full library test suite. All tests must pass. Record results in `libdev-handoff.md#test-results-summary`. If tests fail, fix the code -- do not skip or weaken tests.
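One piece of a consumer-simulation test is comparing what the CJS and ESM entry points actually expose against the documented public API. The following is a minimal sketch under stated assumptions: `compareExportSurfaces` is a hypothetical helper, and the two namespace objects are assumed to have been loaded by the test itself, through `require()` and a dynamic `import()` of the package name from outside the library's source tree.

```typescript
type ModuleNamespace = { [name: string]: unknown };

interface SurfaceReport {
  missingFromCjs: string[]; // documented exports the CJS entry point lacks
  missingFromEsm: string[]; // documented exports the ESM entry point lacks
  unexpected: string[];     // exports visible to consumers but not documented
}

// Named exports a consumer can see, ignoring interop keys.
function visibleNames(ns: ModuleNamespace): string[] {
  return Object.keys(ns).filter((k) => k !== "default" && k !== "__esModule");
}

// Hypothetical helper: compare both module namespaces against the documented API.
function compareExportSurfaces(
  cjs: ModuleNamespace,
  esm: ModuleNamespace,
  documented: string[]
): SurfaceReport {
  const cjsNames = visibleNames(cjs);
  const esmNames = visibleNames(esm);
  const unexpected: string[] = [];
  for (const name of cjsNames.concat(esmNames)) {
    if (documented.indexOf(name) === -1 && unexpected.indexOf(name) === -1) {
      unexpected.push(name);
    }
  }
  return {
    missingFromCjs: documented.filter((n) => cjsNames.indexOf(n) === -1),
    missingFromEsm: documented.filter((n) => esmNames.indexOf(n) === -1),
    unexpected: unexpected,
  };
}
```

A report with three empty arrays suggests both entry points match the intended surface; an entry in `unexpected` points at an internal module leaking into the public API.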

package/pipeline/steps/mcp-dev/estimate.md ADDED

@@ -0,0 +1,49 @@
+# MCP Server Estimation
+**Purpose:** Assign a Fibonacci story point estimate for MCP server implementation complexity. This is a lightweight estimation step -- no code tools, no implementation. Read specs, assess complexity, output a number with rationale.
+**Fibonacci scale:** 1, 2, 3, 5, 8, 13, 21
+## Step 1: Read Groomed Specs
+Read and assess:
+- `{story_output_dir}/reqs-brief.md` -- REQUIRED
+- `{story_output_dir}/qa-test-spec.md` -- REQUIRED
+## Step 2: Assess Complexity Factors
+Evaluate each factor and record your assessment:
+| Factor | Assessment | Weight |
+|--------|-----------|--------|
+| **Tool count and handler complexity** | How many tools? Are handlers simple (stateless transform) or complex (stateful, multi-step, external calls)? | High |
+| **New patterns vs established** | Greenfield server vs adding tools to existing server? New transport vs reusing existing? | High |
+| **Input schema complexity** | Simple flat params vs deeply nested schemas, conditional validation, cross-field dependencies? | Medium |
+| **Error surface** | How many distinct failure modes per tool? Complex error mapping? External service failures to handle? | Medium |
+| **Test complexity** | How hard will QA-B's protocol tests be to pass? Complex fixtures, multi-step tool sequences, stateful interactions? | Medium |
+## Step 3: Select Fibonacci Value
+Map your assessment to the Fibonacci scale:
+| Points | Typical MCP Server Scope |
+|--------|------------------------|
+| 1 | Config change, single tool param update, no schema change |
+| 2 | Simple single-tool server, trivial inputSchema, basic transport |
+| 3 | Standard server with 2-3 tools, input validation, proper error model |
+| 5 | Multi-tool server with complex schemas, stateful handlers, external integrations |
+| 8 | Complex server with many tools, multiple content types, advanced error handling, transport edge cases |
+| 13 | Large server spanning multiple domains, custom transport, extensive protocol compliance surface |
+| 21 | Epic-scale: new transport implementation or major protocol extension (consider splitting the story) |
+**Calibration context (if `{estimation_model}` is `calibrated`):**
+If calibration directives are provided in `{correction_directives}`, factor them into your estimate. These are learned patterns from prior sprints -- e.g., "input validation consistently under-pointed by 1 tier" or "stories with 5+ tools average 8 points."
+## Step 4: Write Estimate
+Write to `{story_output_dir}/mcp-dev-estimation.md` using `.valent-pipeline/templates/estimation.template.md`:
+- Fibonacci value with brief rationale (2-3 sentences)
+- Factor assessments from Step 2
+- Calibration adjustments applied (if any)
+Send: `[ESTIMATION] MCP-DEV estimates {story_id} at {points} points. See mcp-dev-estimation.md.`

package/pipeline/steps/mcp-dev/handoff.md ADDED

@@ -0,0 +1,9 @@
+# MCP-DEV Step: Handoff
+Read `.valent-pipeline/steps/common/distilled-handoff-format.md` before writing output.
+## Step 13: Write mcp-dev-handoff.md
+Complete all sections of the handoff document using the template at `.valent-pipeline/templates/mcp-dev-handoff.template.md`. Set `status: completed` in frontmatter. Notify lead via inbox: `[DONE] MCP server implementation complete. See mcp-dev-handoff.md#orchestrator-summary.`
+## Independent Verification Requirement
+You must independently verify: all tests pass against the complete server implementation before marking your task complete. Do not rely on CRITIC or QA-B to catch your failures.

package/pipeline/steps/mcp-dev/implement.md ADDED

@@ -0,0 +1,29 @@
+# MCP-DEV Step: Implement
+## Step 4: Plan implementation approach
+Order: server scaffolding + transport setup -> initialize handler + capability declarations -> tool registration + inputSchema definitions -> tool handlers (try-catch with isError:true) -> input validation against declared schemas. Identify any shared files or dependencies.
+## Step 5: Scaffold server and transport
+Set up MCP server instance and configure transport layer (stdio, SSE, or HTTP per `{tech_stack.transport_type}`). Ensure the transport can accept connections and route JSON-RPC messages to handlers. Record in `mcp-dev-handoff.md#files-created-modified`.
+## Step 6: Implement initialize handler and capabilities
+Implement the `initialize` JSON-RPC method. Return server info and capability declarations that exactly match the tools and features being implemented. No phantom capabilities. Record in `mcp-dev-handoff.md#capabilities-declared`.
+## Step 7: Register tools and define inputSchema
+Register each tool with its name, description, and JSON Schema for inputs. The declared inputSchema is the contract -- it must be validated at runtime in Step 8. Record in `mcp-dev-handoff.md#tools-implemented`.
+## Step 8: Implement tool handlers
+For each registered tool, implement the handler logic. Every handler must:
+1. Wrap execution in try-catch
+2. Validate inputs against the declared inputSchema; reject with JSON-RPC `-32602` on schema violation
+3. On tool-level failure (business logic errors), return result with `isError: true` and a descriptive error message
+4. On unexpected exceptions, catch and return JSON-RPC `-32603` (Internal error) -- never let unhandled exceptions kill the transport
+Record decisions in `mcp-dev-handoff.md#implementation-decisions`. Record error model in `mcp-dev-handoff.md#error-model`.
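The four handler rules in Step 8 can be sketched as a single wrapper. This is an illustration only and does not use any MCP SDK: `runTool`, `invalidParams`, `toolFailure`, and the `RpcOutcome` type are hypothetical names, the handler is synchronous for brevity (a real handler would usually be async), and schema validation is delegated to a caller-supplied `validate` function.

```typescript
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};
type RpcOutcome =
  | { result: ToolResult }
  | { error: { code: number; message: string } };

// Hypothetical marker errors, distinguished by `name` rather than by class.
function namedError(name: string, message: string): Error {
  const err = new Error(message);
  err.name = name;
  return err;
}
const invalidParams = (message: string): Error => namedError("InvalidParamsError", message);
const toolFailure = (message: string): Error => namedError("ToolFailure", message);

// Wrap one tool call: validate, execute, and map every failure to a response.
function runTool<A>(
  validate: (raw: unknown) => A, // throws invalidParams(...) on schema violation
  handler: (args: A) => ToolResult, // throws toolFailure(...) on business errors
  rawArgs: unknown
): RpcOutcome {
  try {
    const args = validate(rawArgs);
    return { result: handler(args) };
  } catch (err) {
    const name = err instanceof Error ? err.name : "";
    const message = err instanceof Error ? err.message : String(err);
    if (name === "InvalidParamsError") {
      return { error: { code: -32602, message: message } };
    }
    if (name === "ToolFailure") {
      const failed: ToolResult = {
        content: [{ type: "text", text: message }],
        isError: true,
      };
      return { result: failed };
    }
    // Anything else is unexpected: report an internal error, keep the transport alive.
    return { error: { code: -32603, message: "Internal error" } };
  }
}
```

Keeping the mapping in one place means individual handlers only need to throw the right marker error, and an unexpected exception never escapes to the transport.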
+## Step 9: Implement protocol error handling
+Ensure the server correctly returns JSON-RPC error codes for protocol-level failures:
+- `-32700` for malformed JSON (parse errors)
+- `-32600` for invalid JSON-RPC requests (missing method, missing id)
+- `-32601` for unknown methods
+- `-32602` for invalid params (schema validation failures)
+- `-32603` for internal server errors (unhandled exceptions)
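The protocol-level codes listed above can be assigned by a small classifier that runs before any handler is invoked. The sketch below is an assumption-laden illustration: `classifyMessage` is a hypothetical name, it handles single messages only (no batches), and it deliberately leaves `id` handling out, because JSON-RPC 2.0 notifications legitimately omit `id` and the project's own rule for that case is not modeled here. Codes `-32602` and `-32603` arise later, during parameter validation and handler execution.

```typescript
type Classification =
  | { ok: true; method: string }
  | { ok: false; code: number; message: string };

// Hypothetical pre-dispatch check for one raw JSON-RPC message.
function classifyMessage(raw: string, knownMethods: string[]): Classification {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, code: -32700, message: "Parse error" };
  }

  // A single request must be a JSON object (batches are out of scope here).
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, code: -32600, message: "Invalid Request" };
  }
  const msg = parsed as { [key: string]: unknown };
  const method = msg["method"];
  if (msg["jsonrpc"] !== "2.0" || typeof method !== "string") {
    return { ok: false, code: -32600, message: "Invalid Request" };
  }

  if (knownMethods.indexOf(method) === -1) {
    return { ok: false, code: -32601, message: "Method not found" };
  }
  return { ok: true, method: method };
}
```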