valent-pipeline 0.2.19 → 0.2.21

Files changed (115)
  1. package/README.md +438 -0
  2. package/package.json +1 -1
  3. package/pipeline/agents-manifest.yaml +61 -1
  4. package/pipeline/docs/agent-reference.md +82 -23
  5. package/pipeline/docs/design/refactor-checklist.md +111 -0
  6. package/pipeline/docs/index.md +60 -0
  7. package/pipeline/docs/lead-lifecycle.md +1 -1
  8. package/pipeline/docs/pipeline-overview.md +4 -0
  9. package/pipeline/prompts/bend.md +5 -11
  10. package/pipeline/prompts/critic.md +9 -0
  11. package/pipeline/prompts/data.md +59 -0
  12. package/pipeline/prompts/docgen.md +61 -0
  13. package/pipeline/prompts/fend.md +3 -10
  14. package/pipeline/prompts/iac.md +70 -0
  15. package/pipeline/prompts/knowledge.md +2 -0
  16. package/pipeline/prompts/lead.md +97 -6
  17. package/pipeline/prompts/libdev.md +61 -0
  18. package/pipeline/prompts/mcp-dev.md +59 -0
  19. package/pipeline/prompts/mobile.md +92 -0
  20. package/pipeline/prompts/qa-a.md +1 -1
  21. package/pipeline/prompts/qa-b.md +1 -1
  22. package/pipeline/prompts/reqs.md +5 -1
  23. package/pipeline/scripts/db-bootstrap.ts +1 -1
  24. package/pipeline/scripts/embed-sqlite.ts +5 -0
  25. package/pipeline/steps/common/quality-standards.md +19 -0
  26. package/pipeline/steps/critic/data-pipeline.md +28 -0
  27. package/pipeline/steps/critic/document-generation.md +21 -0
  28. package/pipeline/steps/critic/iac.md +29 -0
  29. package/pipeline/steps/critic/library.md +24 -0
  30. package/pipeline/steps/critic/mcp-server.md +24 -0
  31. package/pipeline/steps/critic/mobile-app.md +29 -0
  32. package/pipeline/steps/data/estimate.md +51 -0
  33. package/pipeline/steps/data/handoff.md +9 -0
  34. package/pipeline/steps/data/implement.md +16 -0
  35. package/pipeline/steps/data/read-inputs.md +13 -0
  36. package/pipeline/steps/data/write-tests.md +13 -0
  37. package/pipeline/steps/docgen/estimate.md +49 -0
  38. package/pipeline/steps/docgen/handoff.md +9 -0
  39. package/pipeline/steps/docgen/implement.md +19 -0
  40. package/pipeline/steps/docgen/read-inputs.md +13 -0
  41. package/pipeline/steps/docgen/write-tests.md +15 -0
  42. package/pipeline/steps/iac/estimate.md +50 -0
  43. package/pipeline/steps/iac/handoff.md +9 -0
  44. package/pipeline/steps/iac/implement.md +19 -0
  45. package/pipeline/steps/iac/read-inputs.md +13 -0
  46. package/pipeline/steps/iac/write-tests.md +20 -0
  47. package/pipeline/steps/judge/ship-decision.md +14 -1
  48. package/pipeline/steps/libdev/estimate.md +49 -0
  49. package/pipeline/steps/libdev/handoff.md +9 -0
  50. package/pipeline/steps/libdev/implement.md +19 -0
  51. package/pipeline/steps/libdev/read-inputs.md +13 -0
  52. package/pipeline/steps/libdev/write-tests.md +16 -0
  53. package/pipeline/steps/mcp-dev/estimate.md +49 -0
  54. package/pipeline/steps/mcp-dev/handoff.md +9 -0
  55. package/pipeline/steps/mcp-dev/implement.md +29 -0
  56. package/pipeline/steps/mcp-dev/read-inputs.md +13 -0
  57. package/pipeline/steps/mcp-dev/write-tests.md +19 -0
  58. package/pipeline/steps/mobile/emulator-lifecycle.md +67 -0
  59. package/pipeline/steps/mobile/estimate.md +51 -0
  60. package/pipeline/steps/mobile/flutter.md +30 -0
  61. package/pipeline/steps/mobile/handoff.md +18 -0
  62. package/pipeline/steps/mobile/implement.md +20 -0
  63. package/pipeline/steps/mobile/react-native.md +32 -0
  64. package/pipeline/steps/mobile/read-inputs.md +10 -0
  65. package/pipeline/steps/mobile/write-tests.md +59 -0
  66. package/pipeline/steps/orchestration/adopt-lead-and-create-team.md +1 -1
  67. package/pipeline/steps/orchestration/sprint-execute.md +3 -2
  68. package/pipeline/steps/orchestration/sprint-groom.md +4 -0
  69. package/pipeline/steps/orchestration/sprint-size.md +26 -16
  70. package/pipeline/steps/orchestration/validate-story-inputs.md +9 -0
  71. package/pipeline/steps/qa-a/data-pipeline.md +32 -0
  72. package/pipeline/steps/qa-a/document-generation.md +52 -0
  73. package/pipeline/steps/qa-a/iac.md +30 -0
  74. package/pipeline/steps/qa-a/library.md +42 -0
  75. package/pipeline/steps/qa-a/mcp-server.md +31 -0
  76. package/pipeline/steps/qa-a/mobile-app.md +59 -0
  77. package/pipeline/steps/qa-b/data-pipeline.md +48 -0
  78. package/pipeline/steps/qa-b/document-generation.md +47 -0
  79. package/pipeline/steps/qa-b/iac.md +44 -0
  80. package/pipeline/steps/qa-b/library.md +61 -0
  81. package/pipeline/steps/qa-b/mcp-server.md +40 -0
  82. package/pipeline/steps/qa-b/mobile-app.md +71 -0
  83. package/pipeline/steps/readiness/standalone-review.md +7 -2
  84. package/pipeline/steps/reqs/data-pipeline.md +56 -0
  85. package/pipeline/steps/reqs/document-generation.md +55 -0
  86. package/pipeline/steps/reqs/draft-brief.md +10 -0
  87. package/pipeline/steps/reqs/iac.md +63 -0
  88. package/pipeline/steps/reqs/library.md +56 -0
  89. package/pipeline/steps/reqs/mcp-server.md +48 -0
  90. package/pipeline/steps/reqs/mobile-app.md +54 -0
  91. package/pipeline/steps/reqs/self-review.md +5 -3
  92. package/pipeline/task-graphs/backend-api.yaml +19 -2
  93. package/pipeline/task-graphs/data-pipeline.yaml +29 -12
  94. package/pipeline/task-graphs/document-generation.yaml +29 -12
  95. package/pipeline/task-graphs/frontend-only.yaml +19 -2
  96. package/pipeline/task-graphs/fullstack-web.yaml +19 -2
  97. package/pipeline/task-graphs/library.yaml +29 -12
  98. package/pipeline/task-graphs/mcp-server.yaml +29 -12
  99. package/pipeline/task-graphs/mobile-app.yaml +171 -0
  100. package/pipeline/templates/bugs.template.md +1 -1
  101. package/pipeline/templates/critic-review.template.md +1 -1
  102. package/pipeline/templates/data-handoff.template.md +96 -0
  103. package/pipeline/templates/docgen-handoff.template.md +83 -0
  104. package/pipeline/templates/iac-handoff.template.md +83 -0
  105. package/pipeline/templates/judge-decision.template.md +11 -1
  106. package/pipeline/templates/libdev-handoff.template.md +82 -0
  107. package/pipeline/templates/mcp-dev-handoff.template.md +87 -0
  108. package/pipeline/templates/mobile-handoff.template.md +122 -0
  109. package/pipeline/templates/reqs-brief.template.md +60 -4
  110. package/skills/valent-run-deferred-tests/SKILL.md +109 -0
  111. package/skills/valent-run-epic/SKILL.md +1 -1
  112. package/skills/valent-run-project/SKILL.md +1 -1
  113. package/src/commands/db-rebuild.js +5 -0
  114. package/src/lib/config-schema.js +1 -1
  115. package/src/lib/db.js +1 -1
@@ -65,7 +65,7 @@ These are resolved from `.valent-pipeline/pipeline-config.yaml` at pipeline star
65
65
  - `{story_id}` -- current story identifier
66
66
  - `{story_input_dir}` -- `{project.story_directory}/{story_id}`
67
67
  - `{story_output_dir}` -- resolved from `{project.story_output_directory}`
68
- - `{project_type}` -- `{project.type}` (fullstack-web | backend-api | frontend-only | data-pipeline | mcp-server | document-generation | library)
68
+ - `{project_type}` -- `{project.type}` (fullstack-web | backend-api | frontend-only | data-pipeline | mcp-server | document-generation | library | mobile-app)
69
69
  - `{target_branch}` -- `{git.target_branch}` (prompt user if empty)
70
70
  - `{story_branch_prefix}` -- `{git.story_branch_prefix}` (default: `story/`)
71
71
  - `{story_branch}` -- `{story_branch_prefix}{story_id}` (created at kick-off, e.g., `story/kanban-010`)
@@ -297,14 +297,24 @@ Based on the story scope and project type, determine which testing profiles are
297
297
  | Story has API endpoints (backend routes, REST/GraphQL) | `api` |
298
298
  | Story has UI components (pages, components, visual changes) | `ui` |
299
299
  | Story has data pipeline work (ETL, transformations, migrations) | `data-pipeline` |
300
+ | Story has MCP server tools, handlers, or protocol work | `mcp-server` |
301
+ | Story is shared library/package (exports, packaging, versioning) | `library` |
302
+ | Story has document/report template or generation pipeline work | `document-generation` |
303
+ | Story has infrastructure work (Terraform, CloudFormation, Kubernetes, CI/CD) | `iac` |
300
304
 
301
305
  Multiple profiles can be active. Examples:
302
306
  - Backend-only story → `[api]`
303
307
  - Frontend-only story → `[ui]`
304
308
  - Fullstack story with both API and UI work → `[api, ui]`
309
+ - Fullstack story with infrastructure → `[api, ui, iac]`
305
310
  - Data pipeline story → `[data-pipeline]`
311
+ - MCP server story → `[mcp-server]`
312
+ - Library/package story → `[library]`
313
+ - Document generation story → `[document-generation]`
314
+ - Mobile app story (screens, navigation, Maestro flows) → `[mobile-app]`
315
+ - Mobile app with backend API → `[api, mobile-app]`
306
316
 
307
- Pass `{testing_profiles}` in the shared context for QA-A and QA-B.
317
+ Pass `{testing_profiles}` in the shared context for QA-A, QA-B, and CRITIC.
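The profile selection above can be sketched as a small rule table. This is only an illustration: the trait names (`has_api_endpoints` and so on) and the function name are hypothetical, and the `mobile-app` rule is inferred from the examples, since the table itself has no row for it.

```python
# Hypothetical trait names mapped to the profile strings used in the table above.
PROFILE_RULES = [
    ("has_api_endpoints", "api"),
    ("has_ui_components", "ui"),
    ("has_data_pipeline_work", "data-pipeline"),
    ("has_mcp_server_work", "mcp-server"),
    ("is_shared_library", "library"),
    ("has_document_generation_work", "document-generation"),
    ("has_infrastructure_work", "iac"),
    ("has_mobile_app_work", "mobile-app"),  # assumption: inferred from the examples
]


def resolve_testing_profiles(traits: set) -> list:
    """Return every profile whose trait is present, in rule-table order."""
    return [profile for trait, profile in PROFILE_RULES if trait in traits]


if __name__ == "__main__":
    story = {"has_api_endpoints", "has_ui_components", "has_infrastructure_work"}
    print(resolve_testing_profiles(story))
```

Keeping the rules in one ordered list means multiple profiles can be active at once and always come out in a stable order.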
308
318
 
309
319
  ### Step 1c: Testing-Profile-Based Agent Skip
310
320
 
@@ -314,6 +324,12 @@ After determining testing profiles, skip agents that have no work for this story
314
324
  |---|---|---|
315
325
  | `testing_profiles` excludes `ui` | UXA, FEND, PMCP | No UI components to spec, implement, or validate |
316
326
  | `testing_profiles` excludes `api` | BEND | No API endpoints to implement |
327
+ | `testing_profiles` excludes `data-pipeline` | DATA | No data pipeline work |
328
+ | `testing_profiles` excludes `mcp-server` | MCP-DEV | No MCP server work |
329
+ | `testing_profiles` excludes `library` | LIBDEV | No library/package work |
330
+ | `testing_profiles` excludes `document-generation` | DOCGEN | No document generation work |
331
+ | `testing_profiles` excludes `iac` | IAC | No infrastructure work |
332
+ | `testing_profiles` excludes `mobile-app` | MOBILE | No mobile app work |
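As a rough sketch, the skip table above reduces to a lookup keyed by the profile whose absence triggers the skip. The dictionary and function names are hypothetical. The sketch covers only this table and does not model the separate skip rules by project type.

```python
# Agents to skip when a given profile is NOT in testing_profiles (from the table above).
SKIP_WHEN_PROFILE_ABSENT = {
    "ui": ["UXA", "FEND", "PMCP"],
    "api": ["BEND"],
    "data-pipeline": ["DATA"],
    "mcp-server": ["MCP-DEV"],
    "library": ["LIBDEV"],
    "document-generation": ["DOCGEN"],
    "iac": ["IAC"],
    "mobile-app": ["MOBILE"],
}


def agents_to_skip(testing_profiles) -> list:
    """List the agents that have no work for the given active profiles."""
    skipped = []
    for profile, agents in SKIP_WHEN_PROFILE_ABSENT.items():
        if profile not in testing_profiles:
            skipped.extend(agents)
    return skipped


if __name__ == "__main__":
    print(agents_to_skip(["api", "ui"]))
```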
317
333
 
318
334
  **BEND skipped but UI calls existing APIs:** When BEND is skipped and the story's UI calls existing API endpoints, add to the pipeline context passed to FEND and QA-B: `"BEND skipped — existing API must be running for E2E tests. FEND is responsible for ensuring docker compose up db api before E2E execution. QA-B must verify API health before test execution."` This ensures real integration testing even when no new endpoints are being built.
319
335
 
@@ -347,6 +363,60 @@ When skipping agents:
347
363
  ---
348
364
  No backend work in this story. BEND skipped by Lead.
349
365
  ```
366
+ - DATA skipped → write `{story_output_dir}/data-handoff.md`:
367
+ ```yaml
368
+ ---
369
+ agent: data
370
+ story: {story_id}
371
+ status: skipped-no-data-pipeline
372
+ ---
373
+ No data pipeline work in this story. DATA skipped by Lead.
374
+ ```
375
+ - MCP-DEV skipped → write `{story_output_dir}/mcp-dev-handoff.md`:
376
+ ```yaml
377
+ ---
378
+ agent: mcp-dev
379
+ story: {story_id}
380
+ status: skipped-no-mcp-server
381
+ ---
382
+ No MCP server work in this story. MCP-DEV skipped by Lead.
383
+ ```
384
+ - LIBDEV skipped → write `{story_output_dir}/libdev-handoff.md`:
385
+ ```yaml
386
+ ---
387
+ agent: libdev
388
+ story: {story_id}
389
+ status: skipped-no-library
390
+ ---
391
+ No library work in this story. LIBDEV skipped by Lead.
392
+ ```
393
+ - DOCGEN skipped → write `{story_output_dir}/docgen-handoff.md`:
394
+ ```yaml
395
+ ---
396
+ agent: docgen
397
+ story: {story_id}
398
+ status: skipped-no-document-generation
399
+ ---
400
+ No document generation work in this story. DOCGEN skipped by Lead.
401
+ ```
402
+ - IAC skipped → write `{story_output_dir}/iac-handoff.md`:
403
+ ```yaml
404
+ ---
405
+ agent: iac
406
+ story: {story_id}
407
+ status: skipped-no-iac
408
+ ---
409
+ No infrastructure work in this story. IAC skipped by Lead.
410
+ ```
411
+ - MOBILE skipped → write `{story_output_dir}/mobile-handoff.md`:
412
+ ```yaml
413
+ ---
414
+ agent: mobile
415
+ story: {story_id}
416
+ status: skipped-no-mobile-app
417
+ ---
418
+ No mobile app work in this story. MOBILE skipped by Lead.
419
+ ```
350
420
  3. These skipped agents are excluded from the task graph in Step 4 (their tasks are removed and their refs cleaned from `blockedBy` lists)
351
421
  4. These agents are NOT spawned in Step 5
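The stub handoffs listed above all follow one pattern, so they could be rendered from a small table. The following is a minimal sketch under that assumption. `SKIP_REASONS` and `render_skip_handoff` are hypothetical names, while the status strings and wording are copied from the stubs above.

```python
# agent -> (status value, phrase used in the body sentence), copied from the stubs above.
SKIP_REASONS = {
    "data": ("skipped-no-data-pipeline", "data pipeline"),
    "mcp-dev": ("skipped-no-mcp-server", "MCP server"),
    "libdev": ("skipped-no-library", "library"),
    "docgen": ("skipped-no-document-generation", "document generation"),
    "iac": ("skipped-no-iac", "infrastructure"),
    "mobile": ("skipped-no-mobile-app", "mobile app"),
}


def render_skip_handoff(agent: str, story_id: str) -> str:
    """Build the contents of `{agent}-handoff.md` for a skipped agent."""
    status, work = SKIP_REASONS[agent]
    return (
        "---\n"
        f"agent: {agent}\n"
        f"story: {story_id}\n"
        f"status: {status}\n"
        "---\n"
        f"No {work} work in this story. {agent.upper()} skipped by Lead.\n"
    )


if __name__ == "__main__":
    print(render_skip_handoff("mcp-dev", "kanban-010"))
```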
352
422
 
@@ -369,6 +439,7 @@ Agent skip rules by project type:
369
439
  | mcp-server | FEND, UXA, PMCP |
370
440
  | document-generation | FEND, UXA, PMCP |
371
441
  | library | FEND, UXA, PMCP |
442
+ | mobile-app | FEND, PMCP |
372
443
 
373
444
  ### Step 3: Prepare Shared Context
374
445
 
@@ -445,6 +516,19 @@ CronCreate:
445
516
 
446
517
  This fires every 4 minutes — aligned to stay within Claude's 5-minute prompt cache TTL so each heartbeat is near-zero cost. Store the returned job ID in your tracking state so you can delete it during teardown.
447
518
 
519
+ ### Knowledge Cache Keep-Alive
520
+
521
+ Knowledge is a long-lived reactive agent that can sit idle for extended periods between queries. To prevent its prompt cache from expiring (5-minute TTL), create a separate recurring ping:
522
+
523
+ ```
524
+ CronCreate:
525
+ cron: "*/4 * * * *"
526
+ prompt: "[KNOWLEDGE-QUERY] cache-keepalive"
527
+ recurring: true
528
+ ```
529
+
530
+ Send this to Knowledge's inbox (not to Lead). Knowledge will respond with a no-op `[KNOWLEDGE-RESPONSE]` — the round-trip keeps the cache warm. Store the job ID alongside the heartbeat job ID and delete both during teardown.
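To see why a `*/4` minute field fits under the 5-minute TTL, the arithmetic can be checked directly. This sketch models only the minute field of a standard cron expression. The helper names are hypothetical, and real cache behavior also depends on message latency, which is not modeled here.

```python
CACHE_TTL_MINUTES = 5  # TTL stated in the text above


def fire_minutes(step: int) -> list:
    """Minutes of the hour at which a `*/step` minute field fires."""
    return list(range(0, 60, step))


def max_gap_minutes(step: int) -> int:
    """Longest wait between consecutive firings, including the hour wrap-around."""
    minutes = fire_minutes(step)
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    gaps.append(60 - minutes[-1])  # last firing of the hour to minute 0 of the next
    return max(gaps)


if __name__ == "__main__":
    print(max_gap_minutes(4), max_gap_minutes(4) < CACHE_TTL_MINUTES)
```

Because 4 divides 60 evenly, the gap never stretches at the top of the hour. A step such as 7 would leave gaps longer than the TTL.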
531
+
448
532
  ### Heartbeat Liveness Check
449
533
 
450
534
  When you receive a `[HEARTBEAT]` message:
@@ -585,7 +669,14 @@ You do NOT:
585
669
 
586
670
  ## Phase 3: Ship and Tear Down
587
671
 
588
- When JUDGE approves:
672
+ When JUDGE approves (SHIP or SHIP-PARTIAL):
673
+
674
+ **SHIP-PARTIAL handling (mobile-app only):** When JUDGE sends `[JUDGE-SHIP-PARTIAL]`, treat this as a conditional ship:
675
+ 1. Merge to `{target_branch}` (same as SHIP)
676
+ 2. Set backlog status to `android-only-verified` (not `shipped`)
677
+ 3. Record deferred iOS test details from `judge-decision.md#ship-partial-detail`
678
+ 4. Notify user: `Story {story_id} shipped for Android. iOS tests deferred — run /run-deferred-tests {story_id} on a Mac host to complete verification.`
679
+ 5. Continue with normal teardown (Steps 2-6)
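The difference between the two approving verdicts can be summarized in a short sketch. The function name and the dictionary keys are hypothetical. The status values `android-only-verified` and `shipped` and the mobile-app restriction come from the steps above.

```python
def plan_ship(verdict: str, project_type: str) -> dict:
    """Describe what Lead does for an approving JUDGE verdict."""
    if verdict == "SHIP":
        return {"merge": True, "backlog_status": "shipped", "record_deferred_ios": False}
    if verdict == "SHIP-PARTIAL":
        # SHIP-PARTIAL is described above as mobile-app only.
        if project_type != "mobile-app":
            raise ValueError("SHIP-PARTIAL is only valid for mobile-app stories")
        return {
            "merge": True,  # same merge as SHIP
            "backlog_status": "android-only-verified",
            "record_deferred_ios": True,
        }
    raise ValueError(f"not an approving verdict: {verdict!r}")


if __name__ == "__main__":
    print(plan_ship("SHIP-PARTIAL", "mobile-app"))
```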
589
680
 
590
681
  ### Step 1: Merge Story Branch and Commit
591
682
  1. Ensure all story work is committed on `{story_branch}`
@@ -611,7 +702,7 @@ All agent outputs persist in `{story_output_dir}`: handoff files, reviews, bug r
611
702
  JUDGE writes `story-report.md` as part of its SHIP verdict (Step 14b). Verify the file exists in `{story_output_dir}`. If missing (JUDGE error), write it yourself using the template at `.valent-pipeline/templates/story-report.template.md`.
612
703
 
613
704
  ### Step 4: Tear Down Heartbeat and Teammates
614
- Delete the heartbeat cron job using `CronDelete` with the stored job ID. Then tear down all per-story teammates. Send `shutdown_request` to each individually.
705
+ Delete the heartbeat and Knowledge cache keep-alive cron jobs using `CronDelete` with their stored job IDs. Then tear down all per-story teammates. Send `shutdown_request` to each individually.
615
706
 
616
707
  **Knowledge Agent exception:** If `{is_epic_run}` is true, do NOT tear down the Knowledge Agent. It persists across stories in an epic to avoid respawn overhead (~15-20k tokens per story). It will receive a `[STORY-RESET]` at the next story's kick-off. Tear down Knowledge only at epic completion (final story in the epic).
617
708
 
@@ -697,14 +788,14 @@ Read each orchestration step file in sequence:
697
788
 
698
789
  1. `.valent-pipeline/steps/orchestration/sprint-init.md` — compute velocity, resolve candidates, set sprint state
699
790
  2. `.valent-pipeline/steps/orchestration/sprint-groom.md` — spawn Phase 1 agents, pipeline stories through REQS → UXA → QA-A → READINESS (assembly-line parallelism), rework loop, index to SQLite
700
- 3. `.valent-pipeline/steps/orchestration/sprint-size.md` — spawn BEND/FEND with estimation step files, assign Fibonacci points, kill estimation agents
791
+ 3. `.valent-pipeline/steps/orchestration/sprint-size.md` — spawn BEND/FEND with estimate step first, assign Fibonacci points, agents persist into execution
701
792
  4. `.valent-pipeline/steps/orchestration/sprint-plan.md` — greedy packing by priority, write sprint plan + status YAML, validate, kill Phase 1 agents
702
793
  5. `.valent-pipeline/steps/orchestration/sprint-execute.md` — execute stories sequentially with budget enforcement, Phase 2 agents per story, update status YAML in real-time
703
794
  6. `.valent-pipeline/steps/orchestration/sprint-review.md` — diff planned vs actuals, record calibration data, trigger retrospective, check for next sprint
704
795
 
705
796
  **Key differences from story mode:**
706
797
  - Phase 1 agents (REQS, UXA, QA-A, READINESS) stay alive during grooming batch, killed before execution
707
- - Phase 2 agents (BEND, FEND, CRITIC, QA-B, JUDGE) killed and respawned per story during execution
798
+ - Phase 2 agents: BEND/FEND persist from sizing into story 1, then killed and respawned fresh for story 2+. CRITIC, QA-B, JUDGE spawned fresh per story.
708
799
  - Grooming indexes to `artifacts_working` table; execution queries `artifacts` (main table)
709
800
  - Budget enforcement: check cumulative execution time before each story start
710
801
  - Retrospective fires at sprint boundary, not story count
@@ -0,0 +1,61 @@
1
+ # LIBDEV
2
+ <!-- Prompt version: 1.0 | Model: see pipeline-config.yaml | Lifecycle: per-story -->
3
+
4
+ You are LIBDEV, the library developer agent. You implement shared library public APIs, exports, packaging, and type declarations.
5
+
6
+ Read `.valent-pipeline/steps/common/agent-protocol.md` for Communication Standard, Context Discipline, Inbox Protocol, Design Council Protocol, Knowledge-First Principle, Correction Directives, and YAML Frontmatter.
7
+
8
+ ## Trigger Protocol
9
+
10
+ You are spawned at story kick-off but do NOT begin work immediately.
11
+
12
+ - **Wait for:** `[READINESS-APPROVAL]` (Pass 1) from READINESS
13
+ - **On completion:** Send `[HANDOFF]` to CRITIC. CC Lead.
14
+ - **On rejection received (from CRITIC):** Read rejection at critic-review.md. Fix code. Re-send `[HANDOFF]` to CRITIC.
15
+ - **On bug received (from QA-B):** Fix bug. Notify QA-B when fixed.
16
+ - **Escalate to:** Lead -- for `[BLOCKER]`, `[ESCALATION]`, or any issue you cannot resolve peer-to-peer.
17
+
18
+ ## Context
19
+
20
+ - **Story:** {story_id}
21
+ - **Language:** {tech_stack.language}
22
+ - **Package manager:** {tech_stack.package_manager}
23
+ - **Module system:** {tech_stack.module_system}
24
+ - **Type system:** {tech_stack.type_system}
25
+ - **Unit test framework:** {tech_stack.test_framework_unit}
26
+ - **Project type:** {project_type}
27
+
28
+ ## Inputs
29
+
30
+ | Artifact | Purpose |
31
+ |----------|---------|
32
+ | `reqs-brief.md` | Acceptance criteria, business rules, public API surface, export requirements, type contracts |
33
+ | `qa-test-spec.md` | Behavioral test specifications for each AC -- what tests to write |
34
+
35
+ ## Output
36
+
37
+ Write `libdev-handoff.md` using the template at `.valent-pipeline/templates/libdev-handoff.template.md`. Update YAML frontmatter as you complete each step.
38
+
39
+ ## Quality Standards
40
+
41
+ Read `.valent-pipeline/steps/common/quality-standards.md` for universal standards enforced by CRITIC and QA-B.
42
+
43
+ Additional LIBDEV-specific standards:
44
+ - **Exports map matches implementation** -- every entry in the package exports map must resolve to a real module. No dead exports.
45
+ - **CJS and ESM entry points verified** -- if the library targets dual module systems, both `require()` and `import` must work.
46
+ - **No accidental side effects** -- importing the library must not execute code with observable effects. Mark `sideEffects: false` in package.json when applicable.
47
+ - **Peer dependency declarations correct** -- peer dependencies must be declared, not bundled. Version ranges must be accurate.
48
+ - **Type declarations complete** -- every public export must have corresponding type declarations (.d.ts for TypeScript, type hints for Python).
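The "no dead exports" standard above could be checked with something like the following sketch. It assumes the exports map is a nested dictionary of string targets and ignores wildcard patterns, arrays, and `null` targets. The function names and the injected `exists` predicate are hypothetical.

```python
def exports_targets(exports):
    """Yield every string target in a (possibly nested) exports map."""
    if isinstance(exports, str):
        yield exports
    elif isinstance(exports, dict):
        for value in exports.values():
            yield from exports_targets(value)


def dead_exports(exports, exists) -> list:
    """Return the targets for which `exists(target)` is false, sorted."""
    return sorted(t for t in set(exports_targets(exports)) if not exists(t))


if __name__ == "__main__":
    example = {".": {"import": "./dist/index.mjs", "require": "./dist/index.cjs"}}
    # Stand-in predicate; a real check would test the file system.
    print(dead_exports(example, lambda target: target.endswith(".mjs")))
```

In practice `exists` might be `lambda t: (pathlib.Path(package_dir) / t).is_file()`. Injecting it keeps the walk over the map separate from file-system access.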
49
+
50
+ ## Step Sequence
51
+
52
+ Update `stepsCompleted` and `pendingSteps` in frontmatter as you progress.
53
+
54
+ ### Steps
55
+
56
+ | Step | File | Summary |
57
+ |------|------|---------|
58
+ | 1. Read Inputs | `.valent-pipeline/steps/libdev/read-inputs.md` | Read reqs-brief, qa-test-spec, correction directives, knowledge queries |
59
+ | 2. Implement | `.valent-pipeline/steps/libdev/implement.md` | Public API surface, core modules, type declarations, entry points |
60
+ | 3. Write Tests | `.valent-pipeline/steps/libdev/write-tests.md` | Consumer-simulation tests, export verification, execution |
61
+ | 4. Handoff | `.valent-pipeline/steps/libdev/handoff.md` | Write libdev-handoff.md, final verification |
@@ -0,0 +1,59 @@
1
+ # MCP-DEV
2
+ <!-- Prompt version: 1.0 | Model: see pipeline-config.yaml | Lifecycle: per-story -->
3
+
4
+ You are MCP-DEV, the protocol developer agent. You implement MCP server tools, JSON-RPC handlers, and transport layers.
5
+
6
+ Read `.valent-pipeline/steps/common/agent-protocol.md` for Communication Standard, Context Discipline, Inbox Protocol, Design Council Protocol, Knowledge-First Principle, Correction Directives, and YAML Frontmatter.
7
+
8
+ ## Trigger Protocol
9
+
10
+ You are spawned at story kick-off but do NOT begin work immediately.
11
+
12
+ - **Wait for:** `[READINESS-APPROVAL]` (Pass 1) from READINESS
13
+ - **On completion:** Send `[HANDOFF]` to CRITIC. CC Lead.
14
+ - **On rejection received (from CRITIC):** Read rejection at critic-review.md. Fix code. Re-send `[HANDOFF]` to CRITIC.
15
+ - **On bug received (from QA-B):** Fix bug. Notify QA-B when fixed.
16
+ - **Escalate to:** Lead -- for `[BLOCKER]`, `[ESCALATION]`, or any issue you cannot resolve peer-to-peer.
17
+
18
+ ## Context
19
+
20
+ - **Story:** {story_id}
21
+ - **Language:** {tech_stack.language}
22
+ - **Transport type:** {tech_stack.transport_type}
23
+ - **MCP SDK:** {tech_stack.mcp_sdk}
24
+ - **Unit test framework:** {tech_stack.test_framework_unit}
25
+ - **Project type:** {project_type}
26
+
27
+ ## Inputs
28
+
29
+ | Artifact | Purpose |
30
+ |----------|---------|
31
+ | `reqs-brief.md` | Acceptance criteria, business rules, tool definitions, capabilities, transport requirements |
32
+ | `qa-test-spec.md` | Behavioral test specifications for each AC -- what tests to write |
33
+
34
+ ## Output
35
+
36
+ Write `mcp-dev-handoff.md` using the template at `.valent-pipeline/templates/mcp-dev-handoff.template.md`. Update YAML frontmatter as you complete each step.
37
+
38
+ ## Quality Standards
39
+
40
+ Read `.valent-pipeline/steps/common/quality-standards.md` for universal standards enforced by CRITIC and QA-B.
41
+
42
+ Additional MCP-DEV-specific standards:
43
+ - **Two-tier error model** -- JSON-RPC error codes (-32600, -32601, -32602, -32603, -32700) for protocol-level failures; `isError: true` in tool call results for tool-level failures. Never conflate the two tiers.
44
+ - **Every handler in try-catch** -- unhandled exceptions must never kill the transport. Catch, log, and return the appropriate error tier.
45
+ - **Input validation against declared schemas** -- every tool's `inputSchema` must be validated at runtime. Reject with `-32602` (Invalid params) on schema violation, not `isError: true`.
46
+ - **Capability declarations match implementation** -- the server's `initialize` response must declare exactly the capabilities that are implemented. No phantom capabilities, no undeclared features.
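A minimal sketch of the two-tier error model described above, not a real MCP SDK: the tool dictionary shape (`required`, `run`) and the helper names are hypothetical, and the hand-rolled required-argument check stands in for full `inputSchema` validation.

```python
INVALID_PARAMS = -32602  # JSON-RPC "Invalid params"


def rpc_error(req_id, code, message):
    """Protocol-level failure: a JSON-RPC error object."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def tool_result(req_id, text, is_error=False):
    """Tool-level outcome: a normal result, flagged with isError on failure."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {"content": [{"type": "text", "text": text}], "isError": is_error},
    }


def call_tool(req_id, tool, arguments):
    """Validate arguments, then run the tool inside try/except."""
    missing = [name for name in tool["required"] if name not in arguments]
    if missing:
        # Schema violation: protocol tier, per the standard above.
        return rpc_error(req_id, INVALID_PARAMS, f"missing arguments: {missing}")
    try:
        return tool_result(req_id, tool["run"](arguments))
    except Exception as exc:  # never let a handler exception kill the transport
        return tool_result(req_id, f"tool failed: {exc}", is_error=True)


if __name__ == "__main__":
    echo = {"required": ["text"], "run": lambda args: args["text"]}
    print(call_tool(1, echo, {}))
    print(call_tool(2, echo, {"text": "hello"}))
```

The two helper functions make the tiers hard to conflate: a response carries either an `error` member or a `result` with `isError`, never both.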
47
+
48
+ ## Step Sequence
49
+
50
+ Update `stepsCompleted` and `pendingSteps` in frontmatter as you progress.
51
+
52
+ ### Steps
53
+
54
+ | Step | File | Summary |
55
+ |------|------|---------|
56
+ | 1. Read Inputs | `.valent-pipeline/steps/mcp-dev/read-inputs.md` | Read reqs-brief, qa-test-spec, correction directives, knowledge queries |
57
+ | 2. Implement | `.valent-pipeline/steps/mcp-dev/implement.md` | Server scaffolding, transport, capabilities, tool registration, handlers |
58
+ | 3. Write Tests | `.valent-pipeline/steps/mcp-dev/write-tests.md` | Test writing, execution, transport verification |
59
+ | 4. Handoff | `.valent-pipeline/steps/mcp-dev/handoff.md` | Write mcp-dev-handoff.md, final verification |
@@ -0,0 +1,92 @@
1
+ # MOBILE
2
+ <!-- Prompt version: 1.0 | Model: Sonnet | Lifecycle: per-story -->
3
+
4
+ You are MOBILE, the mobile developer agent. You implement mobile app screens, components, navigation, and test code for React Native, Flutter, or native mobile apps. You manage emulator lifecycle, write Maestro YAML E2E flows, and handle platform-conditional execution (Android + iOS on Mac, Android-only on Windows/Linux).
5
+
6
+ Read `.valent-pipeline/steps/common/agent-protocol.md` for Communication Standard, Context Discipline, Inbox Protocol, Design Council Protocol, Knowledge-First Principle, Correction Directives, and YAML Frontmatter.
7
+
8
+ ## Trigger Protocol
9
+
10
+ You are spawned at story kick-off but do NOT begin work immediately.
11
+
12
+ - **Wait for:** `[READINESS-APPROVAL]` (Pass 1) from READINESS
13
+ - **On completion:** Send `[HANDOFF]` to CRITIC. CC Lead. CRITIC waits for both BEND and MOBILE (if both active) -- send your handoff; CRITIC starts when it has all active dev handoffs.
14
+ - **On rejection received (from CRITIC):** Read rejection at critic-review.md. Fix code. Re-send `[HANDOFF]` to CRITIC.
15
+ - **On bug received (from QA-B):** Fix bug. Notify QA-B when fixed.
16
+ - **Escalate to:** Lead -- for `[BLOCKER]`, `[ESCALATION]`, or any issue you cannot resolve peer-to-peer.
17
+
18
+ ## Context
19
+
20
+ - **Story:** {story_id}
21
+ - **Language:** {tech_stack.language}
22
+ - **Mobile framework:** {tech_stack.mobile_framework}
23
+ - **State management:** {tech_stack.state_management}
24
+ - **Unit test framework:** {tech_stack.test_framework_unit}
25
+ - **E2E test framework:** maestro
26
+ - **Project type:** {project_type}
27
+
28
+ ## Inputs
29
+
30
+ | Artifact | Purpose |
31
+ |----------|---------|
32
+ | `reqs-brief.md` | Acceptance criteria, business rules, user-facing behavior, screen inventory, deep links |
33
+ | `uxa-spec.md` | Screen specifications, component specs, area labels, accessibility checklist, 5-state definitions |
34
+ | `qa-test-spec.md` | Behavioral test specifications -- Maestro flow specs per AC |
35
+
36
+ ## Output
37
+
38
+ Write `mobile-handoff.md` using the template at `.valent-pipeline/templates/mobile-handoff.template.md`. Update YAML frontmatter as you complete each step.
39
+
40
+ ## Quality Standards
41
+
42
+ Read `.valent-pipeline/steps/common/quality-standards.md` for universal standards enforced by CRITIC and QA-B.
43
+
44
+ Additional MOBILE-specific standards:
45
+ - **Emulator-first testing** -- all E2E tests run against emulator/simulator. No device farms or cloud testing in the pipeline.
46
+ - **State isolation mandatory** -- `adb shell pm clear {package}` between every Maestro flow. No test may depend on state from a previous flow.
47
+ - **Real API for happy paths** -- Maestro flows hit the real running API server. No mocked API responses in E2E flows (Maestro does not support API interception by design).
48
+ - **Platform detection before iOS** -- check host OS before attempting iOS build/test. On non-Mac hosts, defer iOS gracefully with `ios_deferred: true` in handoff. This is expected behavior, not a failure.
49
+ - **Serial E2E execution** -- Maestro flows run serially against a single emulator instance. The emulator is shared mutable state. Do not attempt parallel flow execution.
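The platform check described above might look like the following sketch. The function name and the returned keys other than `ios_deferred` are hypothetical. `platform.system()` returns `"Darwin"` on macOS.

```python
import platform


def plan_mobile_targets(host_os=None) -> dict:
    """Decide which platforms to build and test on this host."""
    host = host_os or platform.system()
    is_mac = host == "Darwin"
    return {
        "android": True,          # Android runs on every supported host
        "ios": is_mac,            # iOS build/test only on a Mac host
        "ios_deferred": not is_mac,  # expected behavior on non-Mac hosts, not a failure
    }


if __name__ == "__main__":
    print(plan_mobile_targets())
```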
50
+
51
+ ## Mobile-Specific Standards
52
+
53
+ ### Area Label System
54
+ All components must use `testID` (React Native) or `ValueKey` (Flutter) attributes matching the area label system from uxa-spec.md: `{screen}-{section}-{element}`. Maestro's `tapOn` with `id:` selector reads these identifiers.
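For illustration, the `{screen}-{section}-{element}` pattern can be built and validated with a small helper. The restriction to lowercase alphanumeric segments is an assumption made here so that the hyphen separators stay unambiguous. It is not stated in the source, and `area_label` is a hypothetical name.

```python
import re

# Assumption: each segment is lowercase alphanumeric, with no hyphens of its own.
_SEGMENT = re.compile(r"[a-z0-9]+")


def area_label(screen: str, section: str, element: str) -> str:
    """Join three validated segments into a `{screen}-{section}-{element}` label."""
    parts = (screen, section, element)
    for part in parts:
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid label segment: {part!r}")
    return "-".join(parts)


if __name__ == "__main__":
    print(area_label("login", "form", "submit"))
```

The resulting string is what would be assigned to `testID` or `ValueKey` and referenced from a Maestro `id:` selector.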
55
+
56
+ ### Five Screen States
57
+ Every screen must implement ALL 5 states as defined in uxa-spec.md: Default, Loading, Empty, Error, Success. Each state must be testable via Maestro `assertVisible` on state-specific elements.
58
+
59
+ ### Accessibility Requirements
60
+ Implement the accessibility checklist from uxa-spec.md: TalkBack (Android) and VoiceOver (iOS) labels, focus order, content descriptions, minimum touch target sizes (48dp Android, 44pt iOS).
61
+
62
+ ## Coordination with BEND
63
+
64
+ You and BEND work on the same branch. When touching shared files (e.g., API types, shared constants), coordinate via inbox: `[SHARED-FILE] I'm modifying {file}. Changes: {brief description}.`
65
+
66
+ If you need endpoint or response shape info, ask BEND via inbox. Use `bend-handoff.md#api-endpoints-implemented` as your primary reference for API contracts once BEND has published it.
67
+
68
+ ## Step Sequence
69
+
70
+ Update `stepsCompleted` and `pendingSteps` in frontmatter as you progress.
71
+
72
+ ### Decision Gate: testing_profiles
73
+
74
+ If `testing_profiles` excludes `mobile-app`, read `.valent-pipeline/steps/common/no-ui-passthrough.md` and skip remaining steps.
75
+
76
+ ### Decision Gate: mobile_framework
77
+
78
+ Load the framework-specific step file based on `{tech_stack.mobile_framework}`:
79
+ - `react-native` → Read `.valent-pipeline/steps/mobile/react-native.md`
80
+ - `flutter` → Read `.valent-pipeline/steps/mobile/flutter.md`
81
+
82
+ Apply framework-specific conventions throughout all subsequent steps.
83
+
84
+ ### Steps
85
+
86
+ | Step | File | Summary |
87
+ |------|------|---------|
88
+ | 1. Read Inputs | `.valent-pipeline/steps/mobile/read-inputs.md` | Read reqs-brief, uxa-spec, qa-test-spec, correction directives, knowledge queries |
89
+ | 2. Implement | `.valent-pipeline/steps/mobile/implement.md` | Platform detection, screens, navigation, components, platform-specific behavior |
90
+ | 2b. Emulator Lifecycle | `.valent-pipeline/steps/mobile/emulator-lifecycle.md` | Boot emulator/simulator, build app, install, state isolation, crash recovery |
91
+ | 3. Write Tests | `.valent-pipeline/steps/mobile/write-tests.md` | Maestro flows, unit tests, smoke test, execution, integration readiness |
92
+ | 4. Handoff | `.valent-pipeline/steps/mobile/handoff.md` | Write mobile-handoff.md, final verification |
@@ -52,7 +52,7 @@ Always include this table in the output for downstream agent calibration.
52
52
  | 1b | Query Knowledge Agent | `.valent-pipeline/steps/qa-a/read-inputs.md` |
53
53
  | 2 | Risk classification per AC | `.valent-pipeline/steps/qa-a/read-inputs.md` |
54
54
  | 3 | Write Given-When-Then test cases | `.valent-pipeline/steps/qa-a/write-spec.md` |
55
- | 3b | Load testing profile step files | Conditional per `{testing_profiles}`: `.valent-pipeline/steps/qa-a/api.md`, `ui.md`, `data-pipeline.md` |
55
+ | 3b | Load testing profile step files | Conditional per `{testing_profiles}`: `.valent-pipeline/steps/qa-a/api.md`, `ui.md`, `data-pipeline.md`, `mcp-server.md`, `library.md`, `document-generation.md`, `iac.md` |
56
56
  | 4 | Database state verification | `.valent-pipeline/steps/qa-a/write-spec.md` |
57
57
  | 5 | Seed data and fixture requirements | `.valent-pipeline/steps/qa-a/write-spec.md` |
58
58
  | 6 | Negative and edge case tests (P0-P1) | `.valent-pipeline/steps/qa-a/write-spec.md` |
@@ -47,7 +47,7 @@ Write outputs to `{story_output_dir}/` using templates:
47
47
  | 2 | Read CRITIC review | `.valent-pipeline/steps/qa-b/execute-tests.md` |
48
48
  | 3 | Discover implemented tests | `.valent-pipeline/steps/qa-b/execute-tests.md` |
49
49
  | 4 | Run full test suite | `.valent-pipeline/steps/qa-b/execute-tests.md` |
50
- | 4b | Load and execute testing profile steps | Conditional per `{testing_profiles}`: `.valent-pipeline/steps/qa-b/api.md`, `ui.md`, `data-pipeline.md` |
50
+ | 4b | Load and execute testing profile steps | Conditional per `{testing_profiles}`: `.valent-pipeline/steps/qa-b/api.md`, `ui.md`, `data-pipeline.md`, `mcp-server.md`, `library.md`, `document-generation.md`, `iac.md` |
51
51
  | 5 | Spec-implementation alignment check | `.valent-pipeline/steps/qa-b/execute-tests.md` |
52
52
  | 6 | Build traceability matrix | `.valent-pipeline/steps/qa-b/write-report.md` |
53
53
  | 7 | File bugs | `.valent-pipeline/steps/qa-b/file-bugs.md` |
@@ -24,7 +24,8 @@ Write output to `{story_output_dir}/reqs-brief.md` using the template at `.valen
24
24
  - `{story_id}`, `{story_output_dir}`, `{correction_directives}`
25
25
  - `{tech_stack.language}`, `{tech_stack.backend_framework}`, `{tech_stack.frontend_framework}`
26
26
  - `{tech_stack.database}`
27
- - `{project_type}` -- fullstack-web | backend-only | frontend-only
27
+ - `{project_type}` -- fullstack-web | backend-api | frontend-only | data-pipeline | mcp-server | library | document-generation | mobile-app
28
+ - `{testing_profiles}` -- active testing profiles (e.g., `[api]`, `[api, ui]`, `[data-pipeline]`). Determines which domain step files to load.
28
29
 
29
30
  ## Step Sequence
30
31
 
@@ -32,11 +33,14 @@ Write output to `{story_output_dir}/reqs-brief.md` using the template at `.valen
32
33
  |------|-------------|------|
33
34
  | 1, 1b | Read and validate inputs, query Knowledge Agent | `.valent-pipeline/steps/reqs/read-inputs.md` |
34
35
  | 2, 3, 4 | First-principles check, ambiguity identification, brainstorming | `.valent-pipeline/steps/reqs/analyze.md` |
36
+ | 4b | Load domain-specific requirement extraction rules | `.valent-pipeline/steps/reqs/{profile}.md` (per testing_profiles) |
35
37
  | 5 | Draft requirements brief sections | `.valent-pipeline/steps/reqs/draft-brief.md` |
36
38
  | 6, 7 | Pre-mortem analysis and fold findings | `.valent-pipeline/steps/reqs/pre-mortem.md` |
37
39
  | 8 | Self-review checklist | `.valent-pipeline/steps/reqs/self-review.md` |
38
40
  | 9 | Write final output and send handoff | `.valent-pipeline/steps/reqs/write-output.md` |
39
41
 
42
+ For Step 4b, read domain-specific step files based on `{testing_profiles}`. For each active profile, read `.valent-pipeline/steps/reqs/{profile}.md` if it exists. If a profile step file does not exist, note it and proceed. Apply domain-specific extraction rules during Step 5 (brief drafting).
43
+
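The Step 4b rule above (read each active profile's step file if it exists, otherwise note it and proceed) can be sketched as a small planning function. This is an illustrative sketch only: `planProfileSteps`, `ProfileStepPlan`, and the injected `exists` predicate are hypothetical names, not part of the package. Only the path pattern `.valent-pipeline/steps/reqs/{profile}.md` comes from the text.

```typescript
// Hypothetical result shape: which step files to read, and which were absent.
type ProfileStepPlan = { toRead: string[]; missing: string[] };

// Build the Step 4b read plan. `exists` is injected (an assumption made for
// this sketch) so the logic can be exercised without touching the filesystem.
function planProfileSteps(
  profiles: string[],
  exists: (path: string) => boolean
): ProfileStepPlan {
  const plan: ProfileStepPlan = { toRead: [], missing: [] };
  for (const profile of profiles) {
    const path = ".valent-pipeline/steps/reqs/" + profile + ".md";
    if (exists(path)) {
      plan.toRead.push(path);
    } else {
      // A missing profile step file is noted, not treated as a blocker.
      plan.missing.push(path);
    }
  }
  return plan;
}
```

Keeping the missing files in the result, rather than dropping them, mirrors the "note it and proceed" wording.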
40
44
  ## Decision Gates
41
45
 
42
46
  - **After Step 1:** If required inputs are missing, set blocker and STOP.
@@ -3,7 +3,7 @@
3
3
  *
4
4
  * This file is the TypeScript-side copy of the schema defined in
5
5
  * src/lib/db.js. Keep both files in sync when modifying the schema
6
- * (see docs/design/refactor-checklist.md).
6
+ * (see pipeline/docs/design/refactor-checklist.md).
7
7
  *
8
8
  * Imported by embed-sqlite.ts and query-kb.ts to self-bootstrap the
9
9
  * database — tables are created automatically if they don't exist.
@@ -123,6 +123,11 @@ async function rebuildAll(dbPath: string, storiesDir: string) {
123
123
  'qa-test-spec.md': { type: 'qa-test-spec', agent: 'QA-A' },
124
124
  'bend-handoff.md': { type: 'bend-handoff', agent: 'BEND' },
125
125
  'fend-handoff.md': { type: 'fend-handoff', agent: 'FEND' },
126
+ 'data-handoff.md': { type: 'data-handoff', agent: 'DATA' },
127
+ 'mcp-dev-handoff.md': { type: 'mcp-dev-handoff', agent: 'MCP-DEV' },
128
+ 'libdev-handoff.md': { type: 'libdev-handoff', agent: 'LIBDEV' },
129
+ 'docgen-handoff.md': { type: 'docgen-handoff', agent: 'DOCGEN' },
130
+ 'iac-handoff.md': { type: 'iac-handoff', agent: 'IAC' },
126
131
  'critic-review.md': { type: 'critic-review', agent: 'CRITIC' },
127
132
  'execution-report.md': { type: 'execution-report', agent: 'QA-B' },
128
133
  'bugs.md': { type: 'bugs', agent: 'QA-B' },
@@ -0,0 +1,19 @@
1
+ # Quality Standards — All Developer Agents
2
+
3
+ These are non-negotiable. CRITIC and QA-B enforce them. Every developer agent (BEND, FEND, DATA, MCP-DEV, LIBDEV, DOCGEN, IAC) must comply.
4
+
5
+ ## Test Code Standards
6
+
7
+ - **No hard waits** -- use framework-appropriate response/state checks. Never `sleep()`, `setTimeout()`, or any time-based wait in tests.
8
+ - **No conditionals in tests** -- same execution path every run. No `if`, no branching logic inside test bodies.
9
+ - **<300 lines per test file** -- split into multiple files if needed.
10
+ - **<1.5 minutes per test** -- any test exceeding this is a design problem, not a timeout problem.
11
+ - **Self-cleaning via fixture auto-teardown** -- tests must not leave state behind. Use framework teardown hooks, not manual cleanup.
12
+ - **Explicit assertions in test bodies** -- never hide assertions in helpers. Every test body must contain at least one visible `expect`/`assert`.
13
+ - **Parallel-safe** -- no shared mutable state between tests. Must run cleanly with `--workers=4`.
14
+
15
+ ## Live Infrastructure Standards
16
+
17
+ - **Live tests against running infrastructure** -- tests hit real systems. No mocking databases, APIs, pipelines, servers, or external services for happy-path verification.
18
+ - **Mocks acceptable only for error simulation** -- simulating 500s, timeouts, network failures, malformed input. Never for canned success responses.
19
+ - **Seed via programmatic setup** -- never use UI or manual steps for test precondition setup. Use API calls, direct database insertion, fixture files, or domain-appropriate seeding.
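As a rough illustration of the self-cleaning and programmatic-seeding standards above, the sketch below seeds state in code and guarantees teardown even when the test body throws. It assumes a hypothetical in-memory `Table` purely to stay self-contained; under these standards a real suite would seed a live data store and rely on the test framework's own fixture hooks. `withSeededRows` is an invented helper name.

```typescript
// Hypothetical in-memory table standing in for a real resource. Under the
// standards above, a real suite would seed a live database or service instead.
type Table = { [id: string]: string };

// Seed rows programmatically, run the test body, and always remove the seeded
// rows afterwards, even when the body throws.
function withSeededRows<T>(
  table: Table,
  seed: { [id: string]: string },
  body: (table: Table) => T
): T {
  const seededIds = Object.keys(seed);
  for (const id of seededIds) {
    table[id] = seed[id];
  }
  try {
    return body(table);
  } finally {
    // Teardown runs on both the success path and the failure path.
    for (const id of seededIds) {
      delete table[id];
    }
  }
}
```

The assertion itself would still live in the test body that calls the helper, in line with the explicit-assertions rule.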
@@ -0,0 +1,28 @@
1
+ # CRITIC Domain Step: Data Pipeline Review
2
+
3
+ ## Edge Case Hunt -- Data Pipelines
4
+
5
+ In addition to the standard edge case hunt (Pass 2), apply these data-pipeline-specific checks:
6
+
7
+ - **Silent data loss at filters/joins** -- Does every filter and join log rows dropped with count and reason? A filter that silently reduces row count is a Critical finding.
8
+ - **Join cardinality surprises** -- Are joins explicitly handling 1:N, N:M, or missing-key scenarios? A left join that unexpectedly fans out rows or drops unmatched rows without logging is a High finding.
9
+ - **Timezone and DST handling** -- Are timestamps compared, converted, or stored with explicit timezone handling? Naive datetime comparisons across timezones are a High finding. DST transitions that cause duplicate or missing hourly records are a High finding.
10
+ - **Float precision in aggregations** -- Are floats compared with epsilon tolerance? Are running sums accumulated in a precision-safe manner? Direct float equality comparison is a Med finding.
11
+ - **Retry-induced duplicates** -- If a write fails and retries, does the idempotency key prevent duplicates? A retry path that can create duplicate records is a Critical finding.
12
+ - **Unbounded memory** -- Does any stage load an entire dataset into memory? Are large datasets streamed or batched? Loading unbounded data into memory is a High finding.
13
+ - **Encoding assumptions** -- Are file reads/writes using explicit encoding? Relying on system default encoding is a Med finding.
14
+ - **Empty input handling** -- What happens when a source returns zero rows? Does the pipeline handle this gracefully or crash?
15
+
16
+ ## Test Code Review -- Data Pipelines
17
+
18
+ In addition to the standard test code review checklist, verify:
19
+
20
+ - **Row-drop assertions per stage** -- Every filter/join stage must have a test that asserts the correct number of rows were dropped and the drop reason was logged. Missing row-drop assertions are a High finding.
21
+ - **Idempotency tested** -- There must be at least one test that runs the same input through the pipeline twice and asserts identical output. Missing idempotency test is a High finding.
22
+ - **Checkpoint/resume tested** -- If the pipeline has checkpoint capability, there must be a test that simulates mid-pipeline failure and verifies correct resume. Missing checkpoint test (when checkpointing is implemented) is a High finding.
23
+ - **No mocked data queries** -- Tests must run against real data stores. Mocking the data store or data source for happy-path tests is a High finding. Mocks are acceptable only for error simulation (connection failures, timeouts, malformed responses).
24
+ - **Data variety in fixtures** -- Test fixtures must include nulls, empty strings, boundary values, and encoding edge cases. Tests that use only clean, happy-path data are a Med finding.
25
+
26
+ ## Output
27
+
28
+ Record data-pipeline-specific findings in the domain review table alongside standard Pass 1 and Pass 2 findings.
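Two of the checks above (row drops logged with count and reason, and floats compared with tolerance) can be illustrated with a minimal sketch. The names `filterWithDropLog`, `DropRecord`, and `nearlyEqual`, the sample `orders` data, and the default tolerance are all assumptions made for illustration. A real pipeline would use its own logging facility and may need a relative rather than absolute tolerance.

```typescript
// Hypothetical record of what a stage dropped and why.
type DropRecord = { stage: string; dropped: number; reason: string };

// Apply a filter and record how many rows it removed, so that a shrinking
// row count is never silent.
function filterWithDropLog<Row>(
  stage: string,
  rows: Row[],
  keep: (row: Row) => boolean,
  reason: string,
  log: DropRecord[]
): Row[] {
  const kept = rows.filter(keep);
  log.push({ stage: stage, dropped: rows.length - kept.length, reason: reason });
  return kept;
}

// Compare floats with an absolute tolerance instead of direct equality.
// The default epsilon is an arbitrary choice for this sketch.
function nearlyEqual(a: number, b: number, epsilon: number = 1e-9): boolean {
  return Math.abs(a - b) <= epsilon;
}

// Example usage with made-up data.
const dropLog: DropRecord[] = [];
const orders = [
  { id: 1, amount: 10 },
  { id: 2, amount: -5 },
  { id: 3, amount: 7 },
];
const validOrders = filterWithDropLog(
  "validate-amount",
  orders,
  (order) => order.amount > 0,
  "non-positive amount",
  dropLog
);
```

A row-drop test for this stage would then assert on both `validOrders.length` and the logged `dropped` count and `reason`.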
@@ -0,0 +1,21 @@
1
+ # CRITIC Domain: Document Generation
2
+
3
+ ## Edge Cases to Hunt
4
+
5
+ When reviewing DOCGEN code, actively hunt for these domain-specific issues:
6
+
7
+ - **Unescaped user input (injection)** -- template renders user-supplied data without auto-escaping. In HTML output this is XSS; in any format it is an injection vector. Auto-escape must be on by default. Any raw/unescaped output without a justifying comment is a High finding.
8
+ - **Null variables rendered as literal strings** -- `null`, `undefined`, `None`, or `nil` appearing as literal text in output instead of being omitted or replaced with a default. This is a Med finding.
9
+ - **Unbounded loops** -- template loops over user-controlled collections without a size limit. A malicious or malformed input with thousands of items causes memory exhaustion or timeout. This is a High finding.
10
+ - **Large-document memory** -- entire document built in memory before writing. Documents exceeding a reasonable size threshold must stream. Building a 50MB PDF in a string buffer is a High finding.
11
+ - **Encoding mojibake** -- template reads or output writes that do not specify UTF-8 explicitly. System-default encoding on Windows (CP-1252) or other locales silently corrupts unicode. Missing explicit encoding is a Med finding.
12
+ - **Broken asset paths** -- templates reference fonts, images, or stylesheets by path but the paths are not validated at render time. A missing asset produces a broken document silently. This is a Med finding.
13
+
14
+ ## Test Review
15
+
16
+ CRITIC reviews DOCGEN test code with equal hostility to production code. In addition to the standard test review checklist:
17
+
18
+ - **Output parsed, not just "exists"** -- tests that assert `output !== null` or `output.length > 0` without parsing the output structure are a High finding. Tests must parse HTML (DOM), extract PDF text, or parse Markdown and assert on content.
19
+ - **Injection escaping tested** -- at least one test must supply input containing characters that would be dangerous if unescaped (`<script>`, `{{`, `${`, etc.) and verify the output has them escaped. Missing injection tests are a Med finding.
20
+ - **Edge-case data tested** -- tests must include null values, empty collections, unicode characters, and extremely long strings. If all tests use only happy-path data, that is a Med finding.
21
+ - **No mocked renderers** -- tests that mock the template engine or render pipeline instead of invoking real generation are a High finding. The actual engine must process templates and produce real output.
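The first two edge cases above (unescaped input and null values rendered as literal text) can be sketched for HTML output as follows. This is a hand-rolled illustration, not a substitute for a template engine's built-in auto-escaping. `escapeHtml` and `renderValue` are hypothetical helper names, and the empty-string default is an assumption.

```typescript
// Escape the five characters that are significant in HTML text and attributes.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render a possibly missing value: null and undefined become the fallback
// instead of the literal strings "null" or "undefined".
function renderValue(value: string | null | undefined, fallback: string = ""): string {
  if (value === null || value === undefined) {
    return fallback;
  }
  return escapeHtml(value);
}
```

An injection test in the sense of the checklist would pass a payload such as `<script>` through the real renderer and assert that the parsed output contains the escaped form.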
@@ -0,0 +1,29 @@
1
+ # CRITIC Domain Step: Infrastructure Review
2
+
3
+ ## Edge Case Hunt -- Infrastructure
4
+
5
+ In addition to the standard edge case hunt (Pass 2), apply these infrastructure-specific checks:
6
+
7
+ - **Hardcoded secrets** -- Are any credentials, API keys, tokens, or passwords hardcoded in resource definitions, variable defaults, or outputs? Any hardcoded secret is a Critical finding.
8
+ - **Overly permissive IAM** -- Do any IAM policies use wildcard (`*`) actions or resources without explicit justification? Wildcard IAM is a High finding.
9
+ - **Missing resource tags** -- Are any resources missing standard tags (environment, project, owner, managed-by)? Missing tags are a Med finding.
10
+ - **No remote state** -- Is state stored locally instead of a remote backend? Local state file is a High finding.
11
+ - **Missing state locking** -- Is state locking configured (DynamoDB, blob lease, etc.)? Missing locking is a High finding.
12
+ - **Provider version unpinned** -- Are provider versions floating (no version constraint)? Unpinned providers are a Med finding.
13
+ - **Resource dependencies not explicit** -- Are implicit dependencies relied upon where explicit `depends_on` is needed? Missing explicit dependency is a Med finding.
14
+ - **Missing outputs for consuming services** -- Do other services need values (connection strings, ARNs, endpoints) that are not exported as outputs? Missing outputs are a Med finding.
15
+ - **No destroy protection on stateful resources** -- Are databases, storage buckets, or other stateful resources missing lifecycle `prevent_destroy` or deletion protection? Missing destroy protection is a High finding.
16
+
17
+ ## Test Code Review -- Infrastructure
18
+
19
+ In addition to the standard test code review checklist, verify:
20
+
21
+ - **Plan validation exists** -- There must be at least one test that runs `terraform plan` (or equivalent) and asserts success. Missing plan validation is a High finding.
22
+ - **Idempotency tested** -- There must be at least one test that applies infrastructure and then runs plan again, asserting zero changes. Missing idempotency test is a High finding.
23
+ - **Security policies checked** -- There must be tests validating that IAM policies are least-privilege and that no hardcoded secrets exist (tflint, checkov, OPA, or equivalent). Missing security policy checks are a High finding.
24
+ - **No mocked providers** -- Tests must validate against real plan output or real infrastructure state. Mocking providers for happy-path tests is a High finding.
25
+ - **Tag verification** -- Tests must verify all resources have required standard tags. Missing tag verification is a Med finding.
26
+
27
+ ## Output
28
+
29
+ Record infrastructure-specific findings in the domain review table alongside standard Pass 1 and Pass 2 findings.
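To make the wildcard-IAM check above concrete, the sketch below scans an AWS-style IAM policy document for `Allow` statements with wildcard actions or resources. It is a simplified illustration, not a replacement for the policy scanners named in the checklist. `findWildcardStatements` and `toList` are hypothetical names, and treating only `*` and `service:*` as wildcards is an assumption of this sketch.

```typescript
// Minimal subset of the AWS IAM policy document shape used by this sketch.
type Statement = {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
};
type PolicyDocument = { Version: string; Statement: Statement[] };

// Action and Resource may each be a single string or a list of strings.
function toList(value: string | string[]): string[] {
  return typeof value === "string" ? [value] : value;
}

// Return the indexes of Allow statements that grant a wildcard action
// ("*" or "service:*") or apply to every resource ("*").
function findWildcardStatements(policy: PolicyDocument): number[] {
  const flagged: number[] = [];
  policy.Statement.forEach((statement, index) => {
    if (statement.Effect !== "Allow") {
      return; // Deny statements do not widen permissions.
    }
    const wildcardAction = toList(statement.Action).some(
      (action) => action === "*" || action.slice(-2) === ":*"
    );
    const wildcardResource = toList(statement.Resource).some(
      (resource) => resource === "*"
    );
    if (wildcardAction || wildcardResource) {
      flagged.push(index);
    }
  });
  return flagged;
}
```

A reviewer could run a check like this over rendered plan output and require a written justification for every flagged index.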
@@ -0,0 +1,24 @@
1
+ # CRITIC Domain: Library Review
2
+
3
+ **Applies to:** Stories where LIBDEV is the implementing agent.
4
+
5
+ ## Edge Case Focus Areas
6
+
7
+ In addition to the standard edge case hunt (Pass 2), scrutinize these library-specific risks:
8
+
9
+ - **Accidental breaking changes** -- renamed or removed exports that downstream consumers depend on. Compare the exports map against any prior version. Any export removal or rename without a semver major bump is a High finding.
10
+ - **Missing exports map entries** -- code exists in the package but is not reachable through the declared exports map. Dead code that consumers cannot import is wasted; importable internals that leak through missing exports boundaries are a security/stability risk.
11
+ - **Circular dependencies** -- module A imports module B which imports module A. These cause undefined behavior in CJS (partial objects) and initialization order bugs in ESM. Any circular dependency in the public API surface is a High finding.
12
+ - **CJS/ESM dual-instance corruption** -- when a library is loaded via both `require()` and `import` in the same process, two separate module instances can exist. Shared state (singletons, caches, registries) will diverge silently. If the library holds any mutable state, verify the dual-instance scenario is handled or documented.
13
+ - **Tree-shaking broken by side effects** -- top-level code that executes on import (console.log, global registration, polyfills) prevents bundlers from eliminating unused exports. If `sideEffects: false` is declared but side effects exist, that is a High finding (bundlers will drop code that was meant to run).
14
+ - **Peer dependency version drift** -- declared peer dependency ranges that are too wide (accepting incompatible majors) or too narrow (excluding compatible versions).
15
+ - **Type declaration mismatch** -- .d.ts signatures that do not match the runtime implementation. An overloaded type that accepts `string` when the implementation throws on non-number input is a High finding.
16
+
17
+ ## Test Code Review Additions
18
+
19
+ In addition to the standard test code review checklist:
20
+
21
+ - **Both import paths tested** -- if the library targets CJS+ESM, tests must exercise both `require()` and `import`. If only one path is tested, that is a Med finding.
22
+ - **Consumer-simulation test exists** -- at least one test must import the library the way a real consumer would (from the package entry point, not from internal source paths). Missing consumer-sim is a High finding.
23
+ - **Exports match declared map** -- the test suite must verify that every entry in the exports map resolves to a real module with the expected exports. If this verification is missing, that is a Med finding.
24
+ - **No internal path imports in tests** -- tests that import from `./src/internal/module` instead of the public API are testing implementation, not the contract. This is a Med finding unless the test explicitly targets internals as a regression guard.
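The circular-dependency check above can be approximated with a depth-first search over a module import graph. The sketch below assumes the graph has already been extracted into a plain object mapping each module to the modules it imports. `ImportGraph` and `findImportCycle` are hypothetical names, and how the graph is obtained is outside the scope of this example.

```typescript
// Hypothetical adjacency map: module name -> modules it imports.
type ImportGraph = { [module: string]: string[] };

// Return one import cycle as a path that starts and ends on the same module,
// or null when the graph is acyclic.
function findImportCycle(graph: ImportGraph): string[] | null {
  const state: { [module: string]: "visiting" | "done" } = {};
  const path: string[] = [];

  function visit(node: string): string[] | null {
    if (state[node] === "done") {
      return null; // Already fully explored, no cycle through this node.
    }
    if (state[node] === "visiting") {
      // Back edge: the cycle is the current path from the first visit of node.
      return path.slice(path.indexOf(node)).concat(node);
    }
    state[node] = "visiting";
    path.push(node);
    for (const dependency of graph[node] || []) {
      const cycle = visit(dependency);
      if (cycle) {
        return cycle;
      }
    }
    path.pop();
    state[node] = "done";
    return null;
  }

  for (const start of Object.keys(graph)) {
    const cycle = visit(start);
    if (cycle) {
      return cycle;
    }
  }
  return null;
}
```

Returning the full cycle path, rather than a boolean, gives the reviewer something concrete to cite in the finding.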
@@ -0,0 +1,24 @@
1
+ # CRITIC Domain: MCP Server
2
+
3
+ ## Edge Cases
4
+
5
+ MCP server implementations have protocol-specific failure modes. Hunt for these in addition to the general edge case checklist:
6
+
7
+ - **Crash on malformed JSON** -- does the server survive receiving `{broken` or empty input on the transport? Or does it crash and kill the process?
8
+ - **Mismatched response IDs** -- does the response `id` field always match the request `id`? Are notification messages (no `id`) handled correctly without sending a response?
9
+ - **Missing isError on tool failure** -- when a tool handler throws or fails, does the result include `isError: true`? Or does it silently return a success-shaped response with error text in content?
10
+ - **Schema declared but not validated** -- does the server declare an `inputSchema` for a tool but skip runtime validation? Send params that violate the schema and verify `-32602` is returned.
11
+ - **Pre-initialize requests** -- what happens if a client sends `tools/list` or `tools/call` before `initialize`? The server should reject or handle gracefully, not crash or return stale data.
12
+ - **Unhandled exceptions killing stdio** -- an unhandled throw in a handler can crash the process and sever the stdio pipe. Every handler must be in try-catch. Check for any handler that lacks error wrapping.
13
+ - **Capability mismatch** -- capabilities declared in `initialize` response that have no corresponding implementation, or implemented features not declared in capabilities.
14
+ - **Content type mismatch** -- tool declares it returns `text` content but actually returns a different type, or returns multiple content items when one is expected.
15
+
16
+ ## Test Review
17
+
18
+ CRITIC reviews MCP-DEV test code with the same rigor as production code. In addition to the standard test review checklist:
19
+
20
+ - **Real transport tested** -- tests must spawn a real server and communicate over the actual transport (stdio pipe, SSE, HTTP). Any test that mocks the transport layer is a High finding.
21
+ - **Both error tiers tested** -- tests must cover JSON-RPC error codes (protocol tier) AND `isError: true` (tool tier). Missing either tier is a High finding.
22
+ - **Every tool has a call test** -- every tool registered by the server must have at least one `tools/call` test with valid params. A missing tool test is a High finding.
23
+ - **Initialize-first ordering** -- tests must send `initialize` before other requests. Tests that skip the handshake are testing undefined behavior.
24
+ - **Schema violation tests** -- for every tool with an `inputSchema`, there must be a test sending invalid params and asserting `-32602`. Missing schema validation tests is a Med finding.