npm - codex-workflows - Versions diffs - 0.6.9 → 0.7.1 - Mend

codex-workflows 0.6.9 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/.agents/skills/coding-rules/SKILL.md CHANGED

@@ -93,7 +93,8 @@ Nearby code is a starting point for investigation, not a sufficient basis for ad
 ## Commenting Principles
-- Document "what" and "why", not "how"
+- Prefer names, types, and structure over comments
+- Add comments only for why, limitations, edge cases, or public API contracts
 - No historical information — use version control
 - Remove commented-out code
 - Keep comments concise and timeless
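
A minimal TypeScript sketch of what the revised commenting principles in this hunk might look like in practice. The names `RetryPolicy` and `nextDelayMs` are hypothetical and do not come from the package; the point is only that names and types carry the "what" while the single comment carries a "why".

```typescript
// Hypothetical example, not taken from the package.
type Milliseconds = number;

interface RetryPolicy {
  baseDelayMs: Milliseconds;
  maxDelayMs: Milliseconds;
}

function nextDelayMs(policy: RetryPolicy, attempt: number): Milliseconds {
  // Cap the delay so a long outage cannot push retries past the caller's timeout budget.
  return Math.min(policy.baseDelayMs * 2 ** attempt, policy.maxDelayMs);
}
```

The function name, parameter names, and return type already say what is computed, so a comment restating that would add nothing under the new rule.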

package/.agents/skills/coding-rules/references/typescript.md CHANGED

@@ -6,7 +6,8 @@
 - **No Unused "Just in Case" Code** - Violates YAGNI principle (Kent Beck)
 ## Comment Writing Rules
-- **Function Description Focus**: Describe what the code "does"
+- **Code-First Default**: Use names, types, and structure to show what the code does
+- **Intent Focus**: Use comments only for why, limitations, edge cases, or public API contracts
 - **History in Version Control**: Record development history in commits and PRs instead of code comments
 - **Timeless**: Write only content that remains valid whenever read
 - **Conciseness**: Keep explanations to necessary minimum
@@ -147,7 +148,7 @@ const response = await fetch('/api/data') // Backend handles API key authenticat
 - Delete unused code immediately
 - Delete debug `console.log()`
 - No commented-out code (manage history with version control)
-- Comments explain "why" (not "what")
+- Comments explain intent, constraints, or contracts that code cannot express directly
 ## Error Handling

package/.agents/skills/documentation-criteria/SKILL.md CHANGED

@@ -21,7 +21,7 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
 | New Feature Addition (backend) | PRD -> [ADR] -> Design Doc -> Work Plan | After PRD approval |
 | New Feature Addition (frontend/fullstack) | PRD -> **UI Spec** -> [ADR] -> Design Doc -> Work Plan | UI Spec before Design Doc |
 | ADR Conditions Met (see below) | ADR -> Design Doc -> Work Plan | Start immediately |
-| 6+ Files | ADR -> Design Doc -> Work Plan (REQUIRED) | Start immediately |
+| 6+ Files | [ADR if conditions apply] -> Design Doc -> Work Plan (Design Doc + Work Plan REQUIRED) | Start immediately |
 | 3-5 Files | Design Doc -> Work Plan (REQUIRED) | Start immediately |
 | 1-2 Files | None | Direct implementation |
@@ -81,7 +81,7 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
 ### Work Plan
 **Purpose**: Implementation task management and progress tracking
-**Scope**: Task breakdown, dependencies, schedule estimates, test skeleton file paths, Verification Strategy summaries from each Design Doc, Design-to-Plan Traceability mapping for implementation-relevant technical requirements, ADR Bindings for implementation-binding ADR decisions, final Quality Assurance phase, and progress tracking only. Technical rationale belongs in ADR and design details belong in Design Doc.
+**Scope**: Task breakdown, dependencies, schedule estimates, test skeleton file paths, Verification Strategy summaries from each Design Doc, Design-to-Plan Traceability mapping for implementation-relevant technical requirements, Reference Contract Values for binding observable Design Doc values, ADR Bindings for implementation-binding ADR decisions, final Quality Assurance phase, and progress tracking only. Technical rationale belongs in ADR and design details belong in Design Doc.
 **Phase Division Criteria**:
@@ -124,7 +124,7 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
 `Proposed` -> `Accepted` -> `Deprecated`/`Superseded`/`Rejected`
 ## AI Automation Rules [MANDATORY]
-- 5+ files: MUST suggest ADR creation
+- 6+ files: MUST evaluate ADR conditions
 - Contract/data flow change detected: ADR REQUIRED
 - Check existing ADRs before implementation — ALWAYS verify alignment

package/.agents/skills/documentation-criteria/references/design-template.md CHANGED

@@ -237,11 +237,12 @@ Rejected Alternatives Log is element-level. Future Extensibility below is design
 // Record major contract/interface definitions here
 ```
-### Data Contract
+### Data Contracts
-#### Component 1
+#### [Component or Boundary] (repeat per component/boundary)
 ```yaml
+Contract: [interface / function / API / schema name]
 Input:
   Type: [Data shape, contract, or schema]
   Preconditions: [Required items, format constraints]
@@ -256,6 +257,14 @@ Invariants:
   - [Conditions that remain unchanged before and after processing]
 ```
+### Observable Contract Values (When Applicable)
+Use this section when the design defines observable values the implementation must reproduce exactly. Omit it when the Design Doc has no such values.
+| Contract Type | Required Observable Value |
+|---------------|---------------------------|
+| structure-order / derived-display / state-lifecycle-negative | [Exact column/field/label set and order, derived display rule, or condition where persisted/restored/cached/derived state remains unused] |
 ### Test Boundaries
 #### Mock Boundary Decisions
@@ -274,9 +283,11 @@ Invariants:
 ### Field Propagation Map (When Fields Cross Boundaries)
-| Field | Boundary | Status | Detail |
-|-------|----------|--------|--------|
-| [field name] | [Component A to B] | preserved / transformed / dropped | [logic or reason] |
+A boundary includes a serialized boundary: a value encoded on one side and parsed on the other through a medium such as a query string, CLI argument, environment variable, config entry, message payload, storage key, or file. For those rows, record the exact encoded representation and how the consumer parses it. Use "-" only when the row is not a serialized boundary.
+| Field | Boundary | Status | Serialized Format | Consumer Parse Rule | Detail |
+|-------|----------|--------|-------------------|---------------------|--------|
+| [field name] | [Component A to B] | preserved / transformed / dropped | [exact representation the producer emits when serialized; "-" otherwise] | [how the consumer decodes and validates it; "-" otherwise] | [logic or reason] |
 ## Verification Strategy
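
The serialized-boundary rows added to the Field Propagation Map in this file pair a Serialized Format with a Consumer Parse Rule. A rough TypeScript sketch of one such pair follows, assuming a hypothetical `status` filter that crosses a query-string boundary. The field, the allowed values, and both function names are illustrative assumptions, not part of the template.

```typescript
// Hypothetical field: a list of status filters crossing a query-string boundary.
const STATUSES = ["open", "closed", "draft"] as const;
type Status = (typeof STATUSES)[number];

// Producer side. Assumed Serialized Format: a single `status` key whose value is the
// comma-joined list; URLSearchParams percent-encodes the comma when serializing.
function encodeStatusFilter(statuses: Status[]): string {
  return new URLSearchParams({ status: statuses.join(",") }).toString();
}

// Consumer side. Assumed Consumer Parse Rule: split on commas, reject any value
// outside the known set, and treat a missing or empty key as "no filter".
function parseStatusFilter(query: string): Status[] {
  const raw = new URLSearchParams(query).get("status");
  if (raw === null || raw === "") return [];
  return raw.split(",").map((value) => {
    if (!(STATUSES as readonly string[]).includes(value)) {
      throw new Error(`Unknown status: ${value}`);
    }
    return value as Status;
  });
}
```

Writing both sides against the same recorded row is what lets a reviewer check that the producer's output and the consumer's parser actually agree.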

package/.agents/skills/documentation-criteria/references/plan-template.md CHANGED

@@ -81,6 +81,18 @@ Map each Design Doc technical requirement to the task or phase that covers it. U
 - Merge duplicate restatements of the same obligation from multiple DD sections into one row and cite the primary section in `DD Section`
 - Keep `scope-boundary` rows concrete: name the protected file group, component boundary, contract, or workflow that must remain unchanged
+## Reference Contract Values
+Include this section when a Traceability row's DD Item encodes a binding observable value the implementation must reproduce exactly: a column/label set and order, a derived-display rule where one field determines another display value, or a state-lifecycle negative that states when persisted or derived state must stay unused. Serialized boundaries belong in the Connection Map / Field Propagation Map. When a value qualifies for both this table and a serialized boundary, record it only in the Connection Map. ADR-derived structural decisions belong in ADR Bindings.
+The Traceability table records coverage. This table carries the required value verbatim so the covering task can check the exact contract.
+| Design Doc (section) | Contract Type | Required Observable Value (verbatim) | Covered By Task(s) | Gap Status | Notes |
+|----------------------|---------------|--------------------------------------|--------------------|------------|-------|
+| docs/design/xxx-design.md (Section name) | structure-order / derived-display / state-lifecycle-negative | [Exact value copied from the Design Doc] | [P1-T1] | covered | |
+**Gap Status values**: `covered` (mapped to one or more tasks), `gap` (no task exists yet; set Covered By Task(s) to `-`, include justification in Notes, and require user confirmation before plan approval)
 ## Failure Mode Checklist
 Domain-independent failure categories this implementation must guard against. Enumerate all eight categories, mark which apply, and list a covering task for each that applies; keep category names generic and place project-specific detail in task descriptions or notes.
@@ -125,11 +137,13 @@ One row represents one independently checkable binding decision. A single ADR ca
 ## Connection Map
-Include this section when implementation crosses runtime, process, deployment, or service boundaries. Omit it when the change stays inside one runtime or only uses in-process package imports.
+Include this section when implementation crosses runtime, process, deployment, or service boundaries, or when a value is serialized and parsed across a boundary within one runtime through a query string, route parameter, form post, CLI argument, environment variable, config entry, message payload, storage key, or file.
+For serialized boundaries, fill Serialized Format and Consumer Parse Rule with concrete values. Use "-" only for non-serialized external signals where the Expected Signal fully captures the boundary contract.
-| Boundary | Caller / Producer | Callee / Consumer | Expected Signal | Covered By Task(s) |
-|----------|-------------------|-------------------|-----------------|--------------------|
-| [e.g. "web client -> API"] | [module/package initiating request or message] | [module/package receiving request or message] | [Observable evidence, e.g. HTTP 200 matching schema X] | [P1-T1, P1-T2] |
+| Boundary | Caller / Producer | Callee / Consumer | Serialized Format | Consumer Parse Rule | Expected Signal | Covered By Task(s) |
+|----------|-------------------|-------------------|-------------------|---------------------|-----------------|--------------------|
+| [producing side -> consuming side] | [module/package initiating request or message] | [module/package receiving request or message] | [exact representation the producer emits, or "-"] | [how the consumer decodes and validates it, or "-"] | [Observable evidence, e.g. HTTP 200 matching schema X] | [P1-T1, P1-T2] |
 ## Objective
 [Why this change is necessary, what problem it solves]

package/.agents/skills/documentation-criteria/references/task-template.md CHANGED

@@ -33,10 +33,19 @@ Each row is an ADR decision the implementation in this task must comply with.
 |--------|------|----------|------------------|
 | docs/adr/ADR-XXXX-title.md (§ <Source Section>) | [Axis value copied verbatim from the work plan's ADR Bindings row] | [Binding decision copied from the work plan's ADR Bindings row] | [Y/N-answerable positive predicate that evaluates whether the planned and final implementation satisfy the decision] |
+## Reference Contracts
+(Include this section when the work plan's Reference Contract Values table covers this task. Omit otherwise.)
+Each row is a Design Doc-derived observable contract the implementation in this task must reproduce exactly. Serialized boundaries are carried by Boundary Context from the work plan's Connection Map. ADR-derived structural decisions are carried by Binding Decisions above.
+| Source | Contract Type | Required Observable Value | Compliance Check |
+|--------|---------------|---------------------------|------------------|
+| docs/design/xxx-design.md (§ Section name) | structure-order / derived-display / state-lifecycle-negative | [Required Observable Value copied verbatim from the work plan row] | [Y/N-answerable positive predicate that evaluates whether the planned and final implementation reproduces the value] |
 ## Investigation Notes
 Brief observations recorded after reading Investigation Targets:
 - [path] - [interfaces, control/data flow, state transitions, side effects relevant to this task]
-- When Binding Decisions exist, record the planned implementation approach and each Compliance Check result here.
+- When Binding Decisions or Reference Contracts exist, record the planned implementation approach and each Compliance Check result here.
 ## Implementation Steps (TDD: Red-Green-Refactor)
 ### 1. Red Phase
@@ -83,6 +92,7 @@ Brief observations recorded after reading Investigation Targets:
 - [ ] Each Proof Obligation is met: the test turns red under its primary failure mode and exercises the stated boundary
 - [ ] Deliverables created (for research/design tasks)
 - [ ] When Binding Decisions exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
+- [ ] When Reference Contracts exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
 ## Notes
 - Impact scope: [Areas where changes may propagate]

package/.agents/skills/recipe-build/SKILL.md CHANGED

@@ -73,7 +73,7 @@ When task files don't exist, the plan references a Design Doc, and the WorkPlan
 ### 1. Work Plan Review
-Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
 Branch on `verdict.decision`:
 - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation

package/.agents/skills/recipe-front-build/SKILL.md CHANGED

@@ -73,7 +73,7 @@ When task files don't exist, the plan references a Design Doc, and the WorkPlan
 ### 1. Work Plan Review
-Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
 Branch on `verdict.decision`:
 - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation

package/.agents/skills/recipe-front-plan/SKILL.md CHANGED

@@ -53,7 +53,7 @@ Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc
 Spawn work-planner agent: "Create work plan from Design Doc at [path]. Integration test file: [path from step 2]. fixture-e2e test file: [path from step 2 or null]. service-integration-e2e test file: [path from step 2 or null]. E2E absence reasons by lane: [values from step 2 when an E2E lane is null]. Integration tests are created with each phase implementation, fixture-e2e runs alongside UI implementation, service-integration-e2e runs only in the final phase when a service E2E file exists. Include `Implementation Readiness: pending` in the work plan header."
 ### Step 4: Work Plan Review
-Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
 Branch on `verdict.decision`:
 - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5

package/.agents/skills/recipe-front-review/SKILL.md CHANGED

@@ -51,8 +51,9 @@ Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [file
 **If security-reviewer returned `blocked`**: Stop immediately. Report the blocked finding and escalate to user. Do not proceed to fix steps.
 **Code compliance criteria (considering project stage)**:
-- Prototype: Pass at 70%+
-- Production: 90%+ recommended
+- `code-reviewer` verdict is `pass`
+- Coverage thresholds pass only when configured by the project, task file, work plan, or Design Doc
+- Determine pass/fail from the `code-reviewer` verdict and configured coverage thresholds; treat `complianceRate` as diagnostic context only
 **Security criteria**:
 - `approved` or `approved_with_notes` -> Pass

package/.agents/skills/recipe-fullstack-build/SKILL.md CHANGED

@@ -83,7 +83,7 @@ When task files don't exist, the plan references a Design Doc, and the WorkPlan
 ### 1. Work Plan Review
-Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, Reference Contract Values fidelity, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
 Branch on `verdict.decision`:
 - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation

package/.agents/skills/recipe-plan/SKILL.md CHANGED

@@ -56,7 +56,7 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
 - Spawn work-planner agent: "Create work plan from design document at [design-doc-path]. Include deliverables from previous process according to subagents-orchestration-guide skill coordination specification. If `generatedFiles.fixtureE2e` or `generatedFiles.serviceE2e` is null, use the corresponding `e2eAbsenceReason` and accept the null E2E lane as a valid planning input. Include `Implementation Readiness: pending` in the work plan header."
 ### Step 4: Work Plan Review
-Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
 Branch on `verdict.decision`:
 - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5

package/.agents/skills/recipe-prepare-implementation/SKILL.md CHANGED

@@ -31,7 +31,7 @@ Each criterion produces `pass`, `fail`, or `not_applicable`, with file:line evid
 | ID | Criterion | Pass Evidence |
 |----|-----------|---------------|
-| R1 | Verification Strategy and ADR Binding references resolve | Every command, file path, function, endpoint, fixture, seed, and test reference in the work plan's Verification Strategies either exists now or is the deliverable of a task in the plan; every ADR Bindings source path resolves; every ADR Bindings `covered` row references existing task IDs |
+| R1 | Verification Strategy and binding references resolve | Every command, file path, function, endpoint, fixture, seed, and test reference in the work plan's Verification Strategies either exists now or is the deliverable of a task in the plan; every Reference Contract Values `covered` row references existing task IDs; every Reference Contract Values `gap` row has Notes with user-confirmation handling; every ADR Bindings source path resolves; every ADR Bindings `covered` row references existing task IDs |
 | R2 | E2E prerequisites are addressed | For each fixture-e2e or service-integration-e2e skeleton, every noted precondition is present in the codebase or covered by a Phase 0 task |
 | R3 | Phase 1 observability exists | The first implementation phase includes at least one operation verification method executable at task completion using existing files, prior Phase 0 deliverables, or the task's own output |
 | R4 | UI rendering surface exists | When the plan implements UI components, a fixture entry, dev route, Storybook story, preview harness, or equivalent render surface exists or is covered by a Phase 0 task |
@@ -47,6 +47,7 @@ Read the work plan passed in `$ARGUMENTS`; if absent, select the most recent non
 - Verification Strategies
 - Quality Assurance Mechanisms
 - Design-to-Plan Traceability
+- Reference Contract Values
 - ADR Bindings
 - UI Spec Component -> Task Mapping
 - Connection Map
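
The Reference Contract Values portion of the revised R1 criterion can be sketched as a small check. This is only an illustration under assumptions: the row shape, the `referenceContractFailures` name, and the idea of representing the `-` cell as an empty array are all hypothetical. The two rules themselves (every `covered` row references existing task IDs, every `gap` row has Notes) come from the criterion text above.

```typescript
// Hypothetical in-memory shape for one Reference Contract Values row.
interface ReferenceContractRow {
  source: string;
  gapStatus: "covered" | "gap";
  coveredBy: string[]; // task IDs; an empty array stands in for "-"
  notes: string;
}

// Returns one message per R1 violation; an empty result means this part of R1 passes.
function referenceContractFailures(
  rows: ReferenceContractRow[],
  existingTaskIds: Set<string>,
): string[] {
  const failures: string[] = [];
  for (const row of rows) {
    if (row.gapStatus === "covered") {
      if (row.coveredBy.length === 0) {
        failures.push(`${row.source}: covered row lists no task`);
      }
      for (const id of row.coveredBy) {
        if (!existingTaskIds.has(id)) {
          failures.push(`${row.source}: unknown task ${id}`);
        }
      }
    } else if (row.notes.trim() === "") {
      failures.push(`${row.source}: gap row has empty Notes`);
    }
  }
  return failures;
}
```

The sketch does not attempt to judge whether a gap row's Notes actually describe user-confirmation handling; that part of the criterion still needs a reader.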

package/.agents/skills/recipe-review/SKILL.md CHANGED

@@ -53,8 +53,9 @@ Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [file
 **If security-reviewer returned `blocked`**: Stop immediately. Report the blocked finding and escalate to user. Do not proceed to fix steps.
 **Code compliance criteria (considering project stage)**:
-- Prototype: Pass at 70%+
-- Production: 90%+ REQUIRED
+- `code-reviewer` verdict is `pass`
+- Coverage thresholds pass only when configured by the project, task file, work plan, or Design Doc
+- Determine pass/fail from the `code-reviewer` verdict and configured coverage thresholds; treat `complianceRate` as diagnostic context only
 **Security criteria**:
 - `approved` or `approved_with_notes` -> Pass
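
One possible reading of the new code compliance criteria, sketched in TypeScript. The `ReviewResult` shape, the `coverage` field, and the function name are hypothetical. The sketch also assumes that a coverage threshold is enforced only when one has been configured, which is an interpretation of the second bullet rather than something the diff states outright.

```typescript
// Hypothetical result shape; only the `pass` verdict value and `complianceRate` are named in the diff.
interface ReviewResult {
  verdict: string;
  complianceRate: number; // diagnostic context only, never used for the decision
  coverage?: number; // measured coverage, if the project reports one (assumed field)
}

function codeCompliancePasses(
  result: ReviewResult,
  configuredCoverageThreshold?: number,
): boolean {
  if (result.verdict !== "pass") return false;
  // Assumed reading: with no configured threshold there is nothing further to enforce.
  if (configuredCoverageThreshold === undefined) return true;
  return result.coverage !== undefined && result.coverage >= configuredCoverageThreshold;
}
```

Note that `complianceRate` never enters the decision, which is the practical difference from the removed 70%/90% rule.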

package/.agents/skills/subagents-orchestration-guide/SKILL.md CHANGED

@@ -219,9 +219,9 @@ Work plans use the header line `Implementation Readiness: <status>`.
 Use this procedure after work-plan approval and before autonomous task execution when the flow needs to verify implementation readiness. The procedure supplies the evidence needed for user decisions; prompts for approval only after concrete failing criteria and proposed prep tasks are known.
-1. Load the approved work plan exact path and extract Verification Strategies, Quality Assurance Mechanisms, Design-to-Plan Traceability, ADR Bindings, UI Spec Component -> Task Mapping, Connection Map, test skeleton references, E2E absence reasons, phase structure, referenced Design Docs, ADRs, and UI Specs.
+1. Load the approved work plan exact path and extract Verification Strategies, Quality Assurance Mechanisms, Design-to-Plan Traceability, Reference Contract Values, ADR Bindings, UI Spec Component -> Task Mapping, Connection Map, test skeleton references, E2E absence reasons, phase structure, referenced Design Docs, ADRs, and UI Specs.
 2. Evaluate these criteria with evidence:
-   - R1 Verification Strategy and ADR Binding references resolve
+   - R1 Verification Strategy and binding references resolve
    - R2 E2E prerequisites are addressed
    - R3 Phase 1 observability exists
    - R4 UI rendering surface exists when UI work is present

package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md CHANGED

@@ -105,7 +105,7 @@ work-planner's existing Integration Complete criteria naturally covers cross-lay
 After work-planner creates or updates the plan, spawn document-reviewer:
-> "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+> "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, Reference Contract Values fidelity, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
 On `needs_revision` or `approved_with_conditions`, return to work-planner in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user. Stop for batch approval only after WorkPlan review returns `approved` and the plan's `WorkPlan Review` section records `Status: approved` with `Conditions: none`.

package/.agents/skills/testing/references/typescript.md CHANGED

@@ -207,8 +207,9 @@ export const test = base.extend<{ authenticatedPage: Page }>({
 ### E2E Budget
-- **MAX 1-2 E2E tests per feature**
-- Only generate an additional non-reserved E2E test when `Value Score >= 50`
+- Follow `integration-e2e-testing` lane limits: fixture-e2e MAX 3 and service-integration-e2e MAX 1-2 per feature
+- Generate the reserved fixture-e2e user journey when eligible
+- Only generate additional non-reserved E2E tests when the lane threshold is met (`Value Score >= 20` for fixture-e2e, `Value Score > 50` for service-integration-e2e)
 - Prefer fewer comprehensive journey tests over many granular tests
 ### Test Isolation
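
The per-lane thresholds in the revised E2E Budget bullets differ in both value and comparison operator (`>=` for fixture-e2e, strict `>` for service-integration-e2e). A small TypeScript sketch makes the boundary cases explicit. The type and function names are hypothetical; only the lane names and threshold values come from the 0.7.1 text.

```typescript
type Lane = "fixture-e2e" | "service-integration-e2e";

// Thresholds copied from the 0.7.1 text; the function itself is illustrative only.
function meetsLaneThreshold(lane: Lane, valueScore: number): boolean {
  return lane === "fixture-e2e" ? valueScore >= 20 : valueScore > 50;
}
```

A score of exactly 20 qualifies in the fixture-e2e lane, while a score of exactly 50 does not qualify in the service-integration-e2e lane. The sketch leaves out the per-lane MAX limits and the reserved fixture-e2e journey, which the bullets treat separately.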

package/.codex/agents/code-reviewer.toml CHANGED

@@ -64,6 +64,7 @@ Read the Design Doc in full and extract:
 - Architecture design and data flow
 - Interface contracts (function signatures, API endpoints, data structures)
 - Identifier specifications explicitly written in the Design Doc as exact values, literals, labels, or named fields (resource names, endpoint paths, configuration keys, error codes, schema/model names)
+- Binding observable contracts: use the Design Doc's `Observable Contract Values` table as the primary source when present; otherwise extract column/field/label sets and order, derived-display rules, and state-lifecycle negatives from the Design Doc. Also extract Field Propagation Map rows that carry a Serialized Format and Consumer Parse Rule
 - Error handling policy
 - Non-functional requirements
@@ -78,7 +79,7 @@ For each acceptance criterion extracted in Step 1:
 - For behavior-changing ACs, confirm the evidence covers main and boundary paths. Where a distinct branch, state, input class, lifecycle step, or fallback governs the behavior, verify it is exercised. Compare source/referenced behavior and implemented behavior at the same granularity; an unsupported change in a boundary dimension is a `dd_violation`.
 - Confirm the implementation keeps the core mechanism the AC, Design Doc, or referenced materials require. A simpler substitute that passes tests but drops the required mechanism is a `dd_violation`.
 - For changes to persisted, shared, or externally observable state, identify the publication boundary where the new state becomes observable to another process, component, user, or later step. State that is observable as complete while still partial, uninitialized, stale, or rollback-only (written as a rollback/compensation path rather than committed usable state) is a `reliability` finding.
-- When the reviewed task has `Change Category` set to `bug-fix`, `regression`, `state-change`, or `boundary-change`, check cases sharing its path, contract, persisted state, or external boundary. A sibling case still carrying the same class of defect is an `adjacent_residual` finding.
+- When the reviewed task has `Change Category` set to `bug-fix`, `regression`, `state-change`, or `boundary-change`, check cases sharing its path, contract, persisted state, or external boundary. When no task field is present, classify the change from the diff itself. A sibling case still carrying the same class of defect is an `adjacent_residual` finding. When the task file is in scope, also read Investigation Notes for residuals the executor recorded as outside Target Files; verify each recorded residual and report in-scope unresolved residuals as `adjacent_residual`.
 #### 2-2. Identifier Verification
 For each identifier specification extracted in Step 1:
@@ -106,6 +107,13 @@ Assign confidence based on evidence count:
 - medium: 2 agreeing sources
 - low: 1 source only
+#### 2-4. Reference Contract and Boundary Verification
+Run this independently of the AC loop so observable contracts without dedicated ACs are verified.
+1. For each binding observable value extracted in Step 1 (column/field/label set and order, derived-display rule, state-lifecycle negative), verify the implementation reproduces it exactly. A deviation is a `dd_violation` whose rationale names it a reference contract gap and states the required observable value versus the implemented value.
+2. For each Field Propagation Map serialized boundary extracted in Step 1 (Serialized Format and Consumer Parse Rule), verify the producer emits the recorded representation and the consumer parses it by the recorded rule. A mismatch is a `dd_violation` whose rationale names it a boundary contract gap and states what the producer emits versus what the consumer parses.
 ### 3. Assess Code Quality
 Read each implementation file and evaluate:

package/.codex/agents/document-reviewer.toml CHANGED

@@ -84,7 +84,7 @@ Skill Status:
   - When `codebase_analysis` is provided, use `analysisScope`, `existingElements`, `constraints`, `qualityAssurance`, `focusAreas`, and `limitations` as source evidence for scope, feasibility, and completeness checks
   - When `ui_analysis` is provided, use `componentStructure`, `propsPatterns`, `cssLayout`, `stateDisplay`, `displayConditions`, `accessibility`, and `candidateWriteSet` as source evidence for UI scope, feasibility, and completeness checks
   - When `code_verification` is provided, use its discrepancies and reverse coverage as pre-verified evidence during review
-- For WorkPlan: confirm the plan carries the artifacts the semantic gate is judged against: WorkPlan Review, Review Scope, Design-to-Plan Traceability, Verification Strategy summary, Proof Strategy, Failure Mode Checklist, and Quality Assurance Mechanisms. Read the referenced Design Doc(s), UI Spec, ADRs, and test skeletons when listed so coverage can be checked against source artifacts.
+- For WorkPlan: confirm the plan carries the artifacts the semantic gate is judged against: WorkPlan Review, Review Scope, Design-to-Plan Traceability, Reference Contract Values when binding observable values apply, Verification Strategy summary, Proof Strategy, Failure Mode Checklist, and Quality Assurance Mechanisms. Read the referenced Design Doc(s), including `Observable Contract Values` tables when present, UI Spec, ADRs, and test skeletons when listed so coverage can be checked against source artifacts.
 ### Step 2: Target Document Collection
 - Load document specified by target
@@ -134,11 +134,13 @@ For WorkPlan, additionally verify:
 - **Output comparison check**: When the Design Doc changes existing observable behavior, an external contract, or a persisted data shape, verify that a concrete output comparison method is defined with identical input, expected output fields or format, and diff method. When upstream analysis includes `dataTransformationPipelines`, each listed step must be mapped to the comparison that verifies it; steps excluded because data passes through unchanged must include rationale. Missing mappings or rationale → `important` issue (category: `completeness`)
 - **Minimal Surface Alternatives check**: Applies when the Design Doc proposes new in-scope elements as defined by coding-rules "Minimum Surface Terms". Reverse-engineer/as-is Design Docs are exempt. Missing or empty section when the trigger fires → `critical` issue (category: `completeness`). For each entry verify: (1) Step 1 lists at least one AC ID or accepted technical constraint from the Design Doc or referenced UI Spec; speculative-only linkage → `critical` issue (category: `compliance`). (2) Steps 2-3 include at least one subtractive alternative such as derive, compute on demand, keep at caller, reuse existing, or do not introduce new state/mode/abstraction; missing subtractive alternative → `important` issue (category: `compliance`). (3) Step 4 selects the smallest alternative or names a current requirement smaller alternatives fail to satisfy; primary rationale based on coding-rules subjective-only rationales → `critical` issue (category: `compliance`). (4) Step 5 records rejected alternatives with brief rationale; missing rejected alternatives log → `important` issue (category: `completeness`)
 - **WorkPlan semantic gate**:
-  - Coverage is checked where each item lives in the plan: each acceptance criterion is covered by a task whose Completion Criteria or Proof Obligations reference the AC ID or claim identifier; each data contract, state transition, boundary, prerequisite, and protected scope item has a Design-to-Plan Traceability row mapped to a task or an explicit out-of-scope entry. Missing coverage is a `critical` issue (category: `completeness`).
+  - Coverage is checked where each item lives in the plan: each acceptance criterion is covered by a task whose Completion Criteria or Proof Obligations reference the AC ID or claim identifier; each data contract, state transition, boundary, prerequisite, and protected scope item has a Design-to-Plan Traceability row mapped to a task or an explicit out-of-scope entry; each non-serialized binding observable Design Doc value is copied into Reference Contract Values when applicable; each serialized binding value is recorded in Connection Map with concrete Serialized Format and Consumer Parse Rule. Missing coverage is a `critical` issue (category: `completeness`).
   - Distinguish the cause for an uncovered acceptance criterion: when the source Design Doc supports it but no task maps to it, classify as a plan omission (`critical`, fixable by re-planning); when the source document or inputs give it no basis, classify as `rejected` because re-planning cannot invent the missing source requirement.
   - Early verification must sit in an early phase rather than only the final phase. Deferral to final phase without rationale is an `important` issue (category: `consistency`).
   - Each cross-boundary, public-boundary, browser-boundary, or persisted-state change names a task that verifies it through the real boundary. Missing real-boundary coverage is an `important` issue (category: `completeness`).
-  - Each traceability table present (Design-to-Plan, UI Spec Component, Connection Map, ADR Bindings) is filled to the granularity needed to resolve the target task. Under-specified rows are `important` issues (category: `completeness`).
+  - Each traceability table present (Design-to-Plan, Reference Contract Values, UI Spec Component, Connection Map, ADR Bindings) is filled to the granularity needed to resolve the target task. Under-specified rows are `important` issues (category: `completeness`).
+  - Reference Contract Values uses explicit `covered` or `gap` status. A `covered` row without task IDs, or a `gap` row without Notes explaining the gap and user-confirmation handling, is an `important` issue (category: `completeness`).
+  - Binding observable values are carried with content fidelity: for each Design Doc observable contract that encodes a non-serialized binding value (column/field/label set and order, derived-display rule, or state-lifecycle negative), the plan's Reference Contract Values table carries the value verbatim from the Design Doc and maps it to a covering task. A non-serialized value reduced to a label, summarized, or absent while the Design Doc specifies it is a content-fidelity gap: `critical` issue (category: `completeness`). When the value is serialized across a boundary, verify it is recorded in Connection Map instead; missing concrete Serialized Format or Consumer Parse Rule for that serialized value is a `critical` issue (category: `completeness`).
   - The Failure Mode Checklist covers applicable domain-independent categories: same-value, no-op, empty input, invalid option, missing config, unavailable boundary, shared-state dependency, rollback-only visibility. Missing applicable categories are `recommended` issues (category: `completeness`).
   - Verdict mapping: any WorkPlan semantic-gate `critical` issue forces `needs_revision`, except a coverage gap traceable to missing or contradictory source documents or inputs forces `rejected`. Important-only issues may return `approved_with_conditions`, but orchestration must route WorkPlan conditions back through work-planner update before batch approval or task decomposition.
 - **Undetermined items review** [MANDATORY]: Every TBD, unknown, or open item MUST include: (1) **owner** — who resolves it, (2) **due** — when it gets resolved (which phase or milestone), (3) **next-phase handling** — how the next phase treats this gap. Missing any of these three → `important` issue
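The `covered`/`gap` status rule added to the WorkPlan semantic gate can be read as a small row validator. The sketch below is one possible interpretation, not code shipped in the package: `ContractRow` and `review_row` are hypothetical names, and treating an unrecognized status as the same severity is an assumption. Only the `covered`/`gap` statuses and the `important` / `completeness` labels come from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class ContractRow:
    # Hypothetical shape for one Reference Contract Values row.
    value: str
    status: str
    task_ids: list[str] = field(default_factory=list)
    notes: str = ""


def review_row(row: ContractRow) -> list[str]:
    """Return issue strings for one row; an empty list means no finding."""
    issues = []
    if row.status not in ("covered", "gap"):
        # Assumption: the diff only requires an explicit status and does not
        # name a severity for an unrecognized one.
        issues.append("important/completeness: status must be `covered` or `gap`")
    elif row.status == "covered" and not row.task_ids:
        issues.append("important/completeness: `covered` row has no task IDs")
    elif row.status == "gap" and not row.notes.strip():
        issues.append("important/completeness: `gap` row has no Notes")
    return issues
```

A row such as `ContractRow("Name, Status, Updated", "covered", ["T1"])` yields no issues, while the same row without task IDs yields one.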

package/.codex/agents/task-decomposer.toml CHANGED

@@ -67,6 +67,7 @@ Decompose tasks based on implementation strategy patterns determined in implemen
    - Document concrete executable procedures
    - Include task-level Quality Assurance Mechanisms when the work plan defines them
    - Include task-level Binding Decisions when ADR Bindings cover the task
+   - Include task-level Reference Contracts when Reference Contract Values cover the task
    - Include task-level Proof Obligations when the work plan defines Proof Strategy, test skeleton proof annotations, or acceptance-criterion primary failure modes
    - **Always include operation verification methods**
    - Define clear completion criteria (within executor's scope of responsibility)
@@ -94,6 +95,7 @@ Decompose tasks based on implementation strategy patterns determined in implemen
    - Extract task list
    - Identify dependencies
    - Read Design-to-Plan Traceability rows and verify every `covered` item has a corresponding generated task or phase completion task
+   - Read Reference Contract Values rows and verify every `covered` value has a corresponding generated task
    - Read ADR Bindings rows and verify every `covered` item has a corresponding generated task
    - **Overall Optimization Considerations**
      - Identify common processing (prevent redundant implementation)
@@ -139,7 +141,7 @@ Decompose tasks based on implementation strategy patterns determined in implemen
    | Integration test work | Test skeleton file, target implementation under test, existing fixture/auth/setup patterns |
    | fixture-e2e environment/setup work | Existing fixture data, API mock layer, browser harness configuration |
    | service-integration-e2e environment/setup work | Current environment config, startup scripts, seed scripts, auth flow references, external service stubs |
-   | Cross-boundary implementation | Connection Map rows touching the task target files, caller/producer module, callee/consumer module, expected signal, contract definition |
+   | Cross-boundary implementation | Connection Map rows touching the task target files, caller/producer module, callee/consumer module, serialized format, consumer parse rule, expected signal, contract definition |
    | Task constrained by an ADR | ADR file with section hint matching the ADR Bindings row's Source Section value |
    | Bug fix/refactor | Affected code paths, failing tests, reproduction-related files |
    | Behavior replacement/rewrite | Existing implementation being replaced, observable outputs, Verification Strategy section in the Design Doc |
@@ -151,6 +153,7 @@ Decompose tasks based on implementation strategy patterns determined in implemen
    - When test skeletons exist, include them explicitly
    - When the work plan contains a UI Spec Component -> Task Mapping table, propagate matching component sections to every task listed in the row
    - When the work plan contains a Connection Map, propagate boundary rows touching the task's target files to every task on either side of the boundary
+   - When the work plan contains a Reference Contract Values table, propagate matching rows to every covered task
    - When the work plan contains an ADR Bindings table, propagate matching binding rows to every covered task
    - When a task matches multiple natures, include Investigation Targets from all matching rows and deduplicate overlaps
@@ -244,6 +247,25 @@ When the work plan includes an `ADR Bindings` section:
    - `persistence`: `User records are persisted through the UsersRepository interface`
 6. **Validation**: Treat missing Axis, unknown Axis value, or non-checkable Compliance Check as an incomplete task file. Non-checkable means the implementation cannot be observed and answered as `Y` or `N`, or the predicate is written as a negative or compound condition.
+## Reference Contract Propagation
+When the work plan includes a `Reference Contract Values` section:
+1. **Coverage preservation**: For each row marked `covered`, locate every task ID listed in `Covered By Task(s)`.
+2. **Gap handling**: Preserve each `gap` row as a planning issue and surface it to the caller when task generation would otherwise assume the observable value is implemented.
+3. **Investigation Targets**: Add the row's Design Doc path with section hint to each matched task. Deduplicate against Design-to-Plan Traceability targets.
+4. **Reference Contracts table**: Add one row to each matched task's `Reference Contracts` section:
+   - `Source`: Design Doc path with section hint
+   - `Contract Type`: value copied verbatim from the work plan row
+   - `Required Observable Value`: value copied verbatim from the work plan row, preserving exact wording, field order, labels, and conditions
+   - `Compliance Check`: a Y/N-answerable positive predicate that evaluates whether the planned and final implementation reproduces the value
+5. **Predicate shape**: Write each Compliance Check as a concrete positive statement. Examples:
+   - `The listed columns render in the specified order`
+   - `The label shows the looked-up name in place of the raw code`
+   - `The restored state is applied only when the explicit restore signal is present`
+6. **Boundary ownership**: Connection Map Propagation carries serialized boundaries. Reference Contract rows carry non-serialized observable values. When a work plan records the same value in both places, keep Connection Map propagation and surface the duplicate Reference Contract row as a planning issue.
+7. **Validation**: Treat missing Contract Type, missing Required Observable Value, non-checkable Compliance Check, or `covered` row without task IDs as an incomplete task file.
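The propagation steps above can be sketched as a grouping pass over work plan rows. This is an illustrative sketch under stated assumptions: the dictionary keys (`source`, `contract_type`, `value`, `status`, `tasks`) and the `propagate_reference_contracts` helper are hypothetical, and the Compliance Check predicates are supplied by the caller rather than generated.

```python
def propagate_reference_contracts(plan_rows, compliance_checks):
    """Group work plan rows into per-task Reference Contracts rows.

    plan_rows: dicts with hypothetical keys `source`, `contract_type`,
        `value`, `status`, and `tasks` (list of task IDs).
    compliance_checks: maps a row's `value` to a hand-written positive
        Y/N-answerable predicate.
    Returns (rows grouped by task ID, gap rows to surface as planning issues).
    """
    per_task = {}
    planning_issues = []
    for row in plan_rows:
        if row["status"] == "gap":
            # Gap rows are preserved and surfaced, never turned into task rows.
            planning_issues.append(row)
            continue
        for task_id in row["tasks"]:
            per_task.setdefault(task_id, []).append({
                "Source": row["source"],
                "Contract Type": row["contract_type"],
                # Copied verbatim: no summarizing, reordering, or relabeling.
                "Required Observable Value": row["value"],
                "Compliance Check": compliance_checks[row["value"]],
            })
    return per_task, planning_issues
```

Keeping the gap rows in a separate return value mirrors the "surface it to the caller" wording in step 2.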
 ## UI Spec Propagation
 When the work plan includes a `UI Spec Component -> Task Mapping` section:
@@ -259,9 +281,11 @@ When the work plan includes a `UI Spec Component -> Task Mapping` section:
 When the work plan includes a `Connection Map` section:
 1. For each boundary row, locate all tasks listed in `Covered By Task(s)`.
-2. Add the caller/producer module, callee/consumer module, serialized contract, and expected signal to each listed task's Investigation Targets or Notes.
-3. For tasks on one side of a boundary, include an Operation Verification Method that observes the expected signal from the other side.
-4. Propagate only boundary rows explicitly mapped in the work plan.
+2. Add the caller/producer module, callee/consumer module, serialized format, consumer parse rule, and expected signal to each listed task's Investigation Targets or Notes.
+3. For serialized boundaries, require concrete serialized format and consumer parse rule values. Treat `-` in either column as an incomplete plan row unless the row is a non-serialized external signal whose Expected Signal fully captures the contract.
+4. For serialized in-runtime boundaries, include a Boundary Context note that states the producer value, the consumer parse rule, and the roundtrip check: the emitted value parses to the value the consumer expects.
+5. For tasks on one side of a boundary, include an Operation Verification Method that observes the expected signal from the other side.
+6. Propagate only boundary rows explicitly mapped in the work plan.
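The roundtrip check named in step 4 (the emitted value parses to the value the consumer expects) can be illustrated with a query-string boundary. The sketch below is a hypothetical example: the `status` parameter, the comma-joined format, and the `produce`, `consume`, and `roundtrip_ok` names are assumptions chosen for illustration and do not appear in the package.

```python
from urllib.parse import parse_qs, urlencode


def produce(filters: list[str]) -> str:
    # Hypothetical Serialized Format: one comma-joined `status` query parameter.
    return urlencode({"status": ",".join(filters)})


def consume(query: str) -> list[str]:
    # Hypothetical Consumer Parse Rule: read `status` and split on commas.
    values = parse_qs(query).get("status", [""])
    return [item for item in values[0].split(",") if item]


def roundtrip_ok(filters: list[str]) -> bool:
    # Roundtrip check: the emitted value parses back to what was produced.
    return consume(produce(filters)) == filters
```

With this format, `roundtrip_ok(["open", "closed"])` holds, while a filter value that itself contains a comma fails the roundtrip. That is the kind of producer/consumer mismatch a concrete Serialized Format and Consumer Parse Rule are meant to expose.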
 ## Change Category Classification
@@ -368,6 +392,7 @@ Please execute decomposed tasks according to the order.
 - [ ] Investigation Targets specified for every task
 - [ ] Change Category set for bug-fix, regression, state-change, or boundary-change tasks, with adjacent path/boundary owners added to Investigation Targets
 - [ ] Quality Assurance Mechanisms propagated to relevant tasks when present in the plan header
+- [ ] Reference Contract Values rows propagated to relevant tasks when present in the work plan
 - [ ] ADR Bindings rows propagated to relevant tasks when present in the work plan
   - [ ] ADR source includes section hint
   - [ ] Axis copied verbatim from the work plan row

package/.codex/agents/task-executor-frontend.toml CHANGED

@@ -181,6 +181,24 @@ Run this check after Pre-implementation Verification and before behavior-first i
    - `N`: stop implementation and return `status: "escalation_needed"` with `escalation_type: "binding_decision_violation"` and `phase: "pre_implementation"`
    - `Unknown`: record the row as deferred in Investigation Notes and proceed to behavior-first implementation. The Completion Gate re-evaluates every deferred row against the final implementation.
+#### Reference Contract Check (Required when the task file has a Reference Contracts section)
+Run this check after Pre-implementation Verification and before behavior-first implementation when the task file contains a Reference Contracts section with one or more rows.
+1. Confirm each Source in the Reference Contracts table has been read. Sources should also appear in Investigation Targets.
+2. Verify source fidelity before planning: locate the Required Observable Value in the Source and confirm it matches verbatim. If the value is not found or the task row differs by summary, omission, reordered fields, changed labels, or changed conditions, stop implementation and return `status: "escalation_needed"` with `escalation_type: "design_compliance_violation"`. Set `details.design_doc_expectation` to the Source value or "source value not found" and `details.actual_situation` to the task row's Required Observable Value.
+3. Use the Investigation Notes format below while recording the planned approach and evaluation results.
+   - `### Reference Contracts Evaluation`
+   - `[source] planned: [one sentence planned approach]`
+   - `[source] source fidelity -> Y|N - [one-line rationale]`
+   - `[source] [Compliance Check] -> Y|N|Unknown - [one-line rationale]`
+4. Record the planned implementation approach in Investigation Notes, one sentence per row.
+5. Evaluate each row's Compliance Check against the planned approach. Record the result as `Y`, `N`, or `Unknown` with a one-line rationale.
+6. Branch per row:
+   - `Y`: proceed
+   - `N`: stop implementation and return `status: "escalation_needed"` with `escalation_type: "design_compliance_violation"`. Set `details.design_doc_expectation` to the row's Required Observable Value and `details.actual_situation` to the planned approach.
+   - `Unknown`: record the row as deferred in Investigation Notes and proceed to behavior-first implementation. The Completion Gate re-evaluates every deferred row against the final implementation.
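The source-fidelity step and the per-row branch above can be sketched as one function. This is a simplified illustration, not the agent's implementation: `reference_contract_check` and the `planned` key are hypothetical, verbatim matching is approximated by a substring test, and the `proceed` / `deferred` return values are internal markers invented for the sketch. The `escalation_needed` payload fields are the ones named in the diff.

```python
def _escalate(expectation: str, actual: str) -> dict:
    return {
        "status": "escalation_needed",
        "escalation_type": "design_compliance_violation",
        "details": {
            "design_doc_expectation": expectation,
            "actual_situation": actual,
        },
    }


def reference_contract_check(row: dict, source_text: str, planned_result: str) -> dict:
    """Evaluate one Reference Contracts row before implementation starts.

    row: hypothetical dict with `Required Observable Value` and `planned`.
    planned_result: 'Y', 'N', or 'Unknown' for the row's Compliance Check.
    """
    required = row["Required Observable Value"]
    # Source fidelity: the task row must match the Source verbatim.
    # Assumption: a substring test stands in for locating the value.
    if required not in source_text:
        return _escalate("source value not found", required)
    if planned_result == "N":
        return _escalate(required, row["planned"])
    if planned_result == "Unknown":
        # Deferred rows are re-evaluated at the Completion Gate.
        return {"status": "deferred"}
    return {"status": "proceed"}
```

The fidelity test runs before the Compliance Check so that a summarized or reordered task row is caught even when the planned approach would otherwise pass.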
 #### Reference Representativeness (Applied During Implementation)
 During implementation, apply coding-rules Reference Representativeness before adopting existing patterns, UI composition, or dependency versions. Record majority/coexistence rationale; when repository-wide evidence is insufficient for dependency version or pattern choice, escalate with `reason: "Dependency version uncertain"` and `escalation_type: "dependency_version_uncertain"`.
@@ -240,7 +258,7 @@ Report in the following JSON format upon task completion (**without executing qu
 #### 2-1. Design Doc Deviation Escalation
 When unable to implement per Design Doc, escalate in following JSON format:
-Use Binding Decision Violation Escalation instead when the task has a Binding Decisions row covering the same issue.
+Use Binding Decision Violation Escalation instead when the task has a Binding Decisions row covering the same issue. Use this Design Doc Deviation Escalation for Reference Contracts failures.
 For task/AC/UI Spec/reference core-mechanism sources, set `details.design_doc_expectation` to `[source type] [location]: [cited expectation]`.
 For core-mechanism violations, put the substitute in `details.actual_situation`, the behavior change in `details.why_cannot_implement`, and the unblock condition in `recommendation`.
@@ -298,12 +316,13 @@ Triggered when the Test Environment Check finds the project-configured test tool
 ☐ Implementation is consistent with the observations recorded in Investigation Notes
 ☐ Final implementation preserves the required core mechanism from the task, AC, Design Doc, UI Spec, or referenced materials, with evidence recorded in Investigation Notes or runnableCheck.reason
 ☐ Every Binding Decisions Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Binding Decisions section)
+☐ Every Reference Contracts row has source fidelity `Y` and Compliance Check `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Reference Contracts section)
 ☐ When test runs are cited as `runnableCheck` evidence, they are substantive per the `runnableCheck.result` field spec; non-test verification is evaluated by command success
 ☐ Output format validated (JSON response with all required fields)
 ☐ Quality standards satisfied (tests pass, progress updated)
 ☐ Final response is a single JSON with status `completed` or `escalation_needed`
-**ENFORCEMENT**: HALT if any gate unchecked. Return `status: "escalation_needed"` to caller. Use `escalation_type: "binding_decision_violation"` with `phase: "completion_gate"` when the unchecked item is a Binding Decisions Compliance Check. Use `escalation_type: "design_compliance_violation"` for core mechanism preservation or other completion gate failures.
+**ENFORCEMENT**: HALT if any gate unchecked. Return `status: "escalation_needed"` to caller. Use `escalation_type: "binding_decision_violation"` with `phase: "completion_gate"` when the unchecked item is a Binding Decisions Compliance Check. Use `escalation_type: "design_compliance_violation"` for Reference Contracts, core mechanism preservation, or other completion gate failures.
 """

package/.codex/agents/task-executor.toml CHANGED

@@ -181,6 +181,24 @@ Run this check after Pre-implementation Verification and before the TDD cycle wh
    - `N`: stop implementation and return `status: "escalation_needed"` with `escalation_type: "binding_decision_violation"` and `phase: "pre_implementation"`
    - `Unknown`: record the row as deferred in Investigation Notes and proceed to the TDD cycle. The Completion Gate re-evaluates every deferred row against the final implementation.
+#### Reference Contract Check (Required when the task file has a Reference Contracts section)
+Run this check after Pre-implementation Verification and before the TDD cycle when the task file contains a Reference Contracts section with one or more rows.
+1. Confirm each Source in the Reference Contracts table has been read. Sources should also appear in Investigation Targets.
+2. Verify source fidelity before planning: locate the Required Observable Value in the Source and confirm it matches verbatim. If the value is not found or the task row differs by summary, omission, reordered fields, changed labels, or changed conditions, stop implementation and return `status: "escalation_needed"` with `escalation_type: "design_compliance_violation"`. Set `details.design_doc_expectation` to the Source value or "source value not found" and `details.actual_situation` to the task row's Required Observable Value.
+3. Use the Investigation Notes format below while recording the planned approach and evaluation results.
+   - `### Reference Contracts Evaluation`
+   - `[source] planned: [one sentence planned approach]`
+   - `[source] source fidelity -> Y|N - [one-line rationale]`
+   - `[source] [Compliance Check] -> Y|N|Unknown - [one-line rationale]`
+4. Record the planned implementation approach in Investigation Notes, one sentence per row.
+5. Evaluate each row's Compliance Check against the planned approach. Record the result as `Y`, `N`, or `Unknown` with a one-line rationale.
+6. Branch per row:
+   - `Y`: proceed
+   - `N`: stop implementation and return `status: "escalation_needed"` with `escalation_type: "design_compliance_violation"`. Set `details.design_doc_expectation` to the row's Required Observable Value and `details.actual_situation` to the planned approach.
+   - `Unknown`: record the row as deferred in Investigation Notes and proceed to the TDD cycle. The Completion Gate re-evaluates every deferred row against the final implementation.
 #### Reference Representativeness (Applied During Implementation)
 During implementation, apply coding-rules Reference Representativeness before adopting existing patterns, API usage, or dependency versions. Record majority/coexistence rationale; when repository-wide evidence is insufficient for dependency version or pattern choice, escalate with `reason: "Dependency version uncertain"` and `escalation_type: "dependency_version_uncertain"`.
@@ -239,7 +257,7 @@ Report in the following JSON format upon task completion (**without executing qu
 #### 2-1. Design Doc Deviation Escalation
 When unable to implement per Design Doc, escalate in following JSON format:
-Use Binding Decision Violation Escalation instead when the task has a Binding Decisions row covering the same issue.
+Use Binding Decision Violation Escalation instead when the task has a Binding Decisions row covering the same issue. Use this Design Doc Deviation Escalation for Reference Contracts failures.
 For task/AC/reference core-mechanism sources, set `details.design_doc_expectation` to `[source type] [location]: [cited expectation]`.
 For core-mechanism violations, put the substitute in `details.actual_situation`, the behavior change in `details.why_cannot_implement`, and the unblock condition in `recommendation`.
@@ -297,12 +315,13 @@ Triggered when the Test Environment Check finds the project-configured test tool
 ☐ Implementation is consistent with the observations recorded in Investigation Notes
 ☐ Final implementation preserves the required core mechanism from the task, AC, Design Doc, or referenced materials, with evidence recorded in Investigation Notes or runnableCheck.reason
 ☐ Every Binding Decisions Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Binding Decisions section)
+☐ Every Reference Contracts row has source fidelity `Y` and Compliance Check `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Reference Contracts section)
 ☐ When test runs are cited as `runnableCheck` evidence, they are substantive per the `runnableCheck.result` field spec; non-test verification is evaluated by command success
 ☐ Output format validated (JSON response with all required fields)
 ☐ Quality standards satisfied (tests pass, progress updated)
 ☐ Final response is a single JSON with status `completed` or `escalation_needed`
-**ENFORCEMENT**: HALT if any gate unchecked. Return `status: "escalation_needed"` to caller. Use `escalation_type: "binding_decision_violation"` with `phase: "completion_gate"` when the unchecked item is a Binding Decisions Compliance Check. Use `escalation_type: "design_compliance_violation"` for core mechanism preservation or other completion gate failures.
+**ENFORCEMENT**: HALT if any gate unchecked. Return `status: "escalation_needed"` to caller. Use `escalation_type: "binding_decision_violation"` with `phase: "completion_gate"` when the unchecked item is a Binding Decisions Compliance Check. Use `escalation_type: "design_compliance_violation"` for Reference Contracts, core mechanism preservation, or other completion gate failures.
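The updated ENFORCEMENT rule maps an unchecked gate item to an escalation type. A minimal sketch of that mapping follows, assuming a hypothetical `completion_gate_escalation` helper and hypothetical item keys such as `binding_decisions` and `reference_contracts`; only the `status`, `escalation_type`, and `phase` values are taken from the diff.

```python
def completion_gate_escalation(unchecked_item: str) -> dict:
    """Build the escalation response for one unchecked Completion Gate item."""
    response = {"status": "escalation_needed"}
    if unchecked_item == "binding_decisions":
        # Binding Decisions failures keep their dedicated type and phase.
        response["escalation_type"] = "binding_decision_violation"
        response["phase"] = "completion_gate"
    else:
        # Reference Contracts, core mechanism preservation, and every other
        # gate failure share the design compliance type.
        response["escalation_type"] = "design_compliance_violation"
    return response
```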
 """

package/.codex/agents/technical-designer-frontend.toml CHANGED

@@ -85,6 +85,14 @@ For each integration boundary, define:
 - Input props or consumed context
 - Output events or effects
 - On Error behavior
+- Serialized Format and Consumer Parse Rule when a value crosses through a query string, route parameter, form post, storage key, config entry, message payload, file, or similar encoded medium
+When the design contains observable values the implementation must reproduce exactly, record them explicitly in the Design Doc using the `Observable Contract Values` table:
+- `structure-order`: column sets, field sets, label sets, or display order
+- `derived-display`: display value derived from another field, lookup, state, or configuration
+- `state-lifecycle-negative`: condition where persisted, restored, cached, or derived state must stay unused
+Write each Required Observable Value as a copyable exact value, not a summary. If a value is serialized and parsed across a boundary, record it in Field Propagation Map / Connection Map instead of this table.
 ### Minimal Surface Alternatives【Required when introducing maintenance-surface elements】

package/.codex/agents/technical-designer.toml CHANGED

@@ -160,9 +160,18 @@ Record direct impact, indirect impact, and explicitly unaffected components in t
 ### Field Propagation Map【Required】
 When new or changed fields cross component boundaries:
-Document each field's status (preserved / transformed / dropped) at each boundary with rationale.
+Document each field's status (preserved / transformed / dropped) at each boundary with rationale. When a value is serialized and parsed through a query string, route parameter, form post, CLI argument, environment variable, config entry, message payload, storage key, or file, record the exact Serialized Format and Consumer Parse Rule.
 Skip if no fields cross component boundaries.
+### Observable Contract Values【Required when applicable】
+When the design contains observable values the implementation must reproduce exactly, record them explicitly in the Design Doc using the `Observable Contract Values` table:
+- `structure-order`: column sets, field sets, label sets, or display order
+- `derived-display`: display value derived from another field, lookup, state, or configuration
+- `state-lifecycle-negative`: condition where persisted, restored, cached, or derived state must stay unused
+Write each Required Observable Value as a copyable exact value, not a summary. If a value is serialized and parsed across a boundary, record it in Field Propagation Map / Connection Map instead of this table.
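The split between the `Observable Contract Values` table and the Field Propagation Map / Connection Map can be sketched as a routing decision. The three contract types below are copied from the diff; the `record_location` helper and its boolean parameter are hypothetical names used only for illustration.

```python
CONTRACT_TYPES = {"structure-order", "derived-display", "state-lifecycle-negative"}


def record_location(contract_type: str, serialized_across_boundary: bool) -> str:
    """Return the Design Doc table that should carry a binding observable value."""
    if serialized_across_boundary:
        # Serialized and parsed values belong to the boundary tables instead.
        return "Field Propagation Map / Connection Map"
    if contract_type not in CONTRACT_TYPES:
        raise ValueError(f"unknown contract type: {contract_type}")
    return "Observable Contract Values"
```

For example, a column order that only affects rendering goes to `Observable Contract Values`, while the same order encoded into a query string would be recorded with the boundary instead.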
 ### Interface Change Impact Analysis【Required】
 Record existing operation, new operation, conversion need, adapter/wrapper need, and compatibility method. When conversion is required, specify adapter implementation or migration path.

package/.codex/agents/work-planner.toml CHANGED

@@ -141,20 +141,38 @@ Rules:
 - Record the mapping in the `UI Spec Component -> Task Mapping` table from the plan template.
 - Mark components with no covering task as `gap` with justification and user confirmation before approval.
-### 5b. Map Runtime Boundaries to Tasks
+### 5b. Map Reference Contract Values to Tasks
-When implementation crosses runtime, process, deployment, or service boundaries, create a `Connection Map`.
+After Design-to-Plan Traceability is complete, create a `Reference Contract Values` table when any traced Design Doc item contains a binding observable value the implementation must reproduce exactly. When the Design Doc has an `Observable Contract Values` table, use that table as the primary source and copy each applicable row into the work plan.
+Qualifying observable values:
+- `structure-order`: column sets, field sets, label sets, or display order
+- `derived-display`: display value derived from another field, lookup, state, or configuration
+- `state-lifecycle-negative`: condition where persisted, restored, cached, or derived state must stay unused
+For each qualifying value:
+1. Record the Design Doc path and section.
+2. Classify the Contract Type using exactly one value above.
+3. Copy the Required Observable Value verbatim from the Design Doc. Preserve field names, labels, order, and conditions.
+4. Map it to the task IDs that implement or verify the value.
+5. Mark `covered` when concrete task IDs cover the value. Mark `gap` only when no concrete task covers it, set Covered By Task(s) to `-`, add justification in Notes, and flag it for user confirmation before plan approval.
+Serialized boundaries belong in the Connection Map. ADR-derived structural decisions belong in ADR Bindings. When a value qualifies both as an observable value and as a serialized boundary, record it in the Connection Map and omit a duplicate Reference Contract Values row.
+### 5c. Map Runtime Boundaries to Tasks
+When implementation crosses runtime, process, deployment, or service boundaries, or when a value is serialized and parsed within one runtime, create a `Connection Map`.
 A boundary qualifies only when all of the following hold:
-- The two sides run in separate processes, services, runtimes, or deployed artifacts.
-- A serialized contract crosses the boundary, such as HTTP, RPC, event payload, queue message, or webhook payload.
-- A failure on one side creates an observable signal on the other side, such as a status code, timeout, missing field, dropped message, or persisted row.
+- Two sides exchange a value through a boundary. This includes separate processes, services, runtimes, or deployed artifacts, and same-runtime serialized media such as query strings, route parameters, form posts, CLI arguments, environment variables, config entries, message payloads, storage keys, or files.
+- Producer and consumer depend on a shared representation or parse rule.
+- A mismatch creates an observable signal, such as a status code, timeout, missing field, dropped message, invalid route state, parse failure, or persisted row difference.
-Map only boundaries satisfying all three qualifications above: separate runtime, serialized contract, and observable cross-side signal.
+Map only boundaries satisfying all three qualifications above.
-For each boundary, record the caller/producer, callee/consumer, expected signal, and covering tasks in the `Connection Map` table.
+For each boundary, record the caller/producer, callee/consumer, serialized format, consumer parse rule, expected signal, and covering tasks in the `Connection Map` table. Serialized boundaries must have concrete Serialized Format and Consumer Parse Rule values. Use `-` only for non-serialized external signals where the Expected Signal fully captures the contract.
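The rule that `-` is acceptable only for non-serialized external signals can be sketched as a check on one `Connection Map` row. This is a hedged illustration: the row keys (`serialized`, `serialized_format`, `consumer_parse_rule`, `expected_signal`) and the `connection_row_issues` helper are hypothetical, and flagging a missing expected signal is an assumption drawn from the requirement that every boundary records one.

```python
def connection_row_issues(row: dict) -> list[str]:
    """Return problems found in one Connection Map row (hypothetical keys)."""
    issues = []
    for column in ("serialized_format", "consumer_parse_rule"):
        value = row.get(column, "-").strip()
        # `-` is allowed only when the boundary is a non-serialized external signal.
        if row["serialized"] and value in ("", "-"):
            issues.append(f"{column} must be concrete for a serialized boundary")
    if not row.get("expected_signal", "").strip():
        # Assumption: every row needs an Expected Signal, since it is the only
        # contract left when the other two columns hold `-`.
        issues.append("expected_signal is missing")
    return issues
```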
-### 5c. Map ADR Decisions to Tasks
+### 5d. Map ADR Decisions to Tasks
 When ADRs are provided as input or listed in a Design Doc's "Prerequisite ADRs" section, create an `ADR Bindings` table before finalizing tasks.
@@ -177,7 +195,7 @@ Mapping rules:
 - Acceptance criteria and required user-visible behaviors belong in `Design-to-Plan Traceability`; `ADR Bindings` covers structural implementation constraints.
 - If an ADR decision constrains the design but no task covers it, add a justified gap and flag it for user confirmation before plan approval.
-### 5d. Build Failure Mode Checklist
+### 5e. Build Failure Mode Checklist
 Populate the plan template's `Failure Mode Checklist` before finalizing tasks. Enumerate all eight domain-independent categories, mark whether each applies, and list the task IDs that cover applicable categories. Keep category names generic and put project-specific details in task descriptions or notes.
@@ -390,6 +408,9 @@ When creating work plans, **Phase Structure Diagrams** and **Task Dependency Dia
   - [ ] Scope-boundary items mapped explicitly when the Design Doc defines protected no-change areas
   - [ ] Covered By Task(s) uses only normalized task IDs
   - [ ] No unjustified `gap` entries remain
+- [ ] Reference Contract Values table completed when binding observable values appear in traced Design Doc items
+  - [ ] Each row mapped to covering task(s) or justified `gap`
+  - [ ] No unjustified `gap` entries remain
 - [ ] ADR Bindings table completed when ADRs are provided or listed in Design Doc prerequisites
   - [ ] ADR references resolved with exact path or single `docs/adr/ADR-XXXX-*.md` match
   - [ ] Each binding row has one valid Axis value
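
Note on the hunk above: the `Connection Map` rule added in section 5c (serialized boundaries need concrete Serialized Format and Consumer Parse Rule values, and `-` is allowed only for non-serialized external signals) could be checked mechanically. The sketch below is not part of the package. The `ConnectionMapRow` type, its field names, the `serialized` flag, and `validateRow` are all hypothetical and chosen only to mirror the column names in the diff.

```typescript
// Hypothetical row shape: fields mirror the Connection Map columns named in
// the diff. The `serialized` flag is an assumption made for this sketch.
interface ConnectionMapRow {
  producer: string;
  consumer: string;
  serializedFormat: string;
  consumerParseRule: string;
  expectedSignal: string;
  coveringTasks: string[];
  serialized: boolean;
}

const PLACEHOLDER = "-";

// A value is "concrete" when it is neither empty nor the `-` placeholder.
function isConcrete(value: string): boolean {
  const trimmed = value.trim();
  return trimmed !== "" && trimmed !== PLACEHOLDER;
}

// Returns a list of rule violations; an empty list means the row passes.
function validateRow(row: ConnectionMapRow): string[] {
  const problems: string[] = [];
  if (row.serialized) {
    // Serialized boundaries must not use the placeholder in these columns.
    if (!isConcrete(row.serializedFormat)) {
      problems.push("Serialized Format must be concrete");
    }
    if (!isConcrete(row.consumerParseRule)) {
      problems.push("Consumer Parse Rule must be concrete");
    }
  } else if (!isConcrete(row.expectedSignal)) {
    // Assumption: when `-` stands in for format and parse rule, the
    // Expected Signal has to carry the whole contract, so it cannot be blank.
    problems.push("Expected Signal must capture the contract");
  }
  return problems;
}

// Illustrative rows only; the names and task IDs are made up.
const queryStringRow: ConnectionMapRow = {
  producer: "list page link builder",
  consumer: "list page route loader",
  serializedFormat: "query string `page=<integer>`",
  consumerParseRule: "parse as base-10 integer, default 1",
  expectedSignal: "invalid route state on non-numeric value",
  coveringTasks: ["T-03"],
  serialized: true,
};

const incompleteRow: ConnectionMapRow = {
  ...queryStringRow,
  serializedFormat: "-",
  consumerParseRule: "",
};

console.log(validateRow(queryStringRow));
console.log(validateRow(incompleteRow));
```

A check like this would only cover the placeholder rule. Whether a boundary qualifies in the first place (the three conditions in 5c) still needs human judgment.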

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "codex-workflows",
-  "version": "0.6.9",
+  "version": "0.7.1",
   "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
   "license": "MIT",
   "author": "Shinsuke Kagawa",