npm - codex-workflows - Versions diffs - 0.4.1 → 0.4.3 - Mend

codex-workflows 0.4.1 → 0.4.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/.agents/skills/coding-rules/references/security-checks.md CHANGED

@@ -49,12 +49,14 @@ Sources: OWASP Top 10:2025, DryRun Agentic Coding Security Report (2026-03)
 - Endpoints or route handlers defined without authentication middleware
 - Resource access operations (read, update, delete) without authorization verification
 - Administrative or destructive operations accessible without elevated permissions
-- Recent research indicates this pattern appears at elevated rates in AI-generated code — treat as high-priority review target
+- AI-generated code frequently omits authentication middleware and authorization checks; treat every route handler and resource access operation as an explicit verification target during review
+- Detection approach: search for route or endpoint handlers without authentication middleware, and resource operations (read, update, delete) without authorization checks in the call chain
 ### Mishandling of Exceptional Conditions (OWASP A10:2025)
 - Error handlers that expose internal system details (stack traces, database errors, file paths) in responses
-- Error handlers that fail open (grant access or skip validation on error)
+- Error handlers that grant access, skip authentication, or bypass authorization when an exception occurs
 - Missing error handling on security-critical operations (authentication, authorization, cryptographic operations)
+- Detection approach: search for catch or error-handler blocks that return stack traces, database errors, or file paths in responses, and for handlers that continue with success-path behavior without re-validating security state
 ### Software Supply Chain Patterns (OWASP A03:2025)
 - Dependencies imported without version pinning
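As a rough illustration of the two exceptional-condition checks in this hunk (no internal details in responses, no granting access when an exception occurs), the sketch below wraps an authorization check so that it fails closed. `AuthDecision`, `authorizeFailClosed`, `checkAccess`, and `log` are names invented for this example and are not part of codex-workflows.

```typescript
// Hypothetical types and names, for illustration only.
type AuthDecision =
  | { allowed: true }
  | { allowed: false; status: 403 | 500; message: string };

// Runs an authorization check and denies access if the check itself throws.
function authorizeFailClosed(
  checkAccess: () => boolean,
  log: (detail: string) => void = () => {},
): AuthDecision {
  try {
    return checkAccess()
      ? { allowed: true }
      : { allowed: false, status: 403, message: "Forbidden" };
  } catch (err: unknown) {
    // Keep internal details (stack trace, database error text) in the server log only.
    log(err instanceof Error ? (err.stack ?? err.message) : String(err));
    // Fail closed: an exception never results in access being granted.
    return { allowed: false, status: 500, message: "Internal error" };
  }
}
```

A reviewer following the detection approach would look for the opposite shape: a `catch` block that returns the error text to the caller or falls through to the success path.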

package/.agents/skills/coding-rules/references/typescript.md CHANGED

@@ -7,13 +7,13 @@
 ## Comment Writing Rules
 - **Function Description Focus**: Describe what the code "does"
-- **No Historical Information**: Do not record development history
+- **History in Version Control**: Record development history in commits and PRs instead of code comments
 - **Timeless**: Write only content that remains valid whenever read
 - **Conciseness**: Keep explanations to necessary minimum
 ## Type Safety
-**Absolute Rule**: any type is completely prohibited. It disables type checking and becomes a source of runtime errors.
+**Absolute Rule**: Use `unknown`, generics, unions, intersections, or validated assertions instead of `any`. `any` disables type checking and becomes a source of runtime errors.
 **any Type Alternatives (Priority Order)**
 1. **unknown Type + Type Guards**: Use for validating external input (API responses, localStorage, URL parameters)
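A minimal sketch of the first alternative, `unknown` plus a type guard, might look like the following. The `User` shape and the `isUser` and `parseUser` helpers are assumptions made for illustration; the package does not define them.

```typescript
// Hypothetical shape of an external payload; not taken from the package.
interface User {
  id: number;
  name: string;
}

// Type guard: narrows `unknown` to `User` only after checking every field.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["id"] === "number" && typeof record["name"] === "string";
}

// Parses untrusted JSON text and fails fast when the shape is wrong.
function parseUser(json: string): User {
  const data: unknown = JSON.parse(json);
  if (!isUser(data)) {
    throw new Error("Unexpected user payload shape");
  }
  return data;
}
```

Annotating the `JSON.parse` result as `unknown` forces the check to happen before any property is read, which is the behavior `any` would silently skip.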
@@ -91,7 +91,7 @@ setUsers(users)
 **Props Design (Props-driven Approach)**
 - Props are the interface: Define all necessary information as props
-- Avoid implicit dependencies: Do not depend on global state or context without necessity
+- Declare dependencies explicitly through props, hooks, or injected modules instead of relying on ambient global state
 - Type-safe: Always define Props type explicitly
 **Environment Variables**
@@ -146,7 +146,7 @@ const response = await fetch('/api/data') // Backend handles API key authenticat
 ## Error Handling
-**Absolute Rule**: Error suppression prohibited. All errors must have log output and appropriate handling.
+**Absolute Rule**: Handle every error explicitly with log output, recovery logic, or escalation appropriate to the failure mode.
 **Fail-Fast Principle**: Fail quickly on errors to prevent continued processing in invalid states
 ```typescript

package/.agents/skills/documentation-criteria/SKILL.md CHANGED

@@ -53,23 +53,19 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
 ### PRD (Product Requirements Document)
 **Purpose**: Define business requirements and user value
-**Includes**: Business requirements, success metrics, user stories, MoSCoW prioritization, MVP/Future phase separation, user journey diagram, scope boundary diagram
-**Excludes**: Technical implementation details, technical selection rationale, implementation phases, task breakdown
+**Scope**: Business requirements, user value, success metrics, user stories, MoSCoW prioritization, MVP/Future phase separation, user journey diagram, and scope boundary diagram only. Technical implementation details belong in Design Doc, technical decision rationale in ADR, and implementation phases or task breakdown belong in Work Plan.
 ### ADR (Architecture Decision Record)
 **Purpose**: Record technical decision rationale and background
-**Includes**: Decision, rationale, option comparison (minimum 3 options), architecture impact, principled implementation guidelines
-**Excludes**: Implementation schedule, detailed procedures, specific code examples, resource assignments
+**Scope**: Decision, rationale, option comparison (minimum 3 options), architecture impact, and principled implementation guidance only. Implementation procedures and code examples belong in Design Doc, while schedule and resource assignments belong in Work Plan.
 ### UI Specification
 **Purpose**: Define UI structure, screen transitions, component decomposition, and interaction design
-**Includes**: Screen list and transitions, component state x display matrix, interaction definitions, AC traceability, existing component reuse map, accessibility requirements
-**Excludes**: Technical implementation details, API contracts, test implementation (generated by acceptance-test-generator), implementation schedule
+**Scope**: Screen list and transitions, component state x display matrix, component decomposition, interaction definitions, AC traceability, existing component reuse map, visual acceptance criteria, and accessibility requirements only. Technical implementation and API contracts belong in Design Doc, test implementation belongs in generated test skeletons, and schedule belongs in Work Plan.
 ### Design Document
 **Purpose**: Define technical implementation methods in detail
-**Includes**: Existing codebase analysis, technical approach, dependencies and constraints, interface/contract definitions, data flow, acceptance criteria, change impact map, code inspection evidence
-**Excludes**: Why that technology was chosen (reference ADR), when/who to implement (reference Work Plan), detailed test strategy and test case selection (generated by acceptance-test-generator from acceptance criteria)
+**Scope**: Existing codebase analysis, technical approach, dependencies and constraints, interface and contract definitions, data flow, acceptance criteria, change impact map, code inspection evidence, and verification strategy only. Technology selection rationale belongs in ADR, schedule and assignments belong in Work Plan, and detailed test strategy or case selection belongs in generated test skeletons.
 **Required Structural Elements**:
 - Existing codebase analysis and code inspection evidence
@@ -85,8 +81,7 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
 ### Work Plan
 **Purpose**: Implementation task management and progress tracking
-**Includes**: Task breakdown, schedule estimates, test skeleton file paths, Verification Strategy summaries from each Design Doc, final Quality Assurance phase (required), progress records
-**Excludes**: Technical rationale, design details
+**Scope**: Task breakdown, dependencies, schedule estimates, test skeleton file paths, Verification Strategy summaries from each Design Doc, final Quality Assurance phase, and progress tracking only. Technical rationale belongs in ADR and design details belong in Design Doc.
 **Phase Division Criteria**:
@@ -103,7 +98,7 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
 **When Hybrid is selected**:
 - Combine vertical and horizontal phase structures as defined in the Design Doc
-- Final phase is always Quality Assurance
+- Final phase is always Quality Assurance with acceptance criteria verification, all tests passing, and quality checks complete
 ## Creation Process [MANDATORY]

package/.agents/skills/documentation-criteria/references/adr-template.md CHANGED

@@ -2,7 +2,7 @@
 ## Status
-[Proposed | Accepted | Deprecated | Superseded]
+[Proposed | Accepted | Deprecated | Superseded | Rejected]
 ## Context
@@ -54,6 +54,10 @@
 - [List changes that are neither good nor bad]
+## Architecture Impact
+[Describe how this decision affects existing architecture: components changed, dependencies introduced or removed, and new architectural constraints]
 ## Implementation Guidance
 [Principled direction only. Implementation procedures go to Design Doc]

package/.agents/skills/documentation-criteria/references/design-template.md CHANGED

@@ -277,11 +277,16 @@ System Invariants:
 ### Error Handling
-[Types of errors and how to handle them]
+| Error Category | Example | Detection | Recovery Strategy | User Impact |
+|---------------|---------|-----------|-------------------|-------------|
+| [Validation / External / Infrastructure / Business logic] | [Specific error] | [How detected] | [Retry / Fallback / Propagate / Log-and-continue] | [User-facing message or silent handling] |
 ### Logging and Monitoring
-[What to record in logs and how to monitor]
+- **Log events**: [Key events to log: state transitions, external calls, error occurrences, performance thresholds]
+- **Log levels**: [Which events use DEBUG / INFO / WARN / ERROR]
+- **Sensitive data**: [Fields to mask or exclude; align with Security Considerations]
+- **Monitoring**: [Metrics to track, alert thresholds, dashboard requirements]
 ## Implementation Plan
@@ -301,12 +306,6 @@ System Invariants:
    - Technical Reason: [Technical necessity to implement after A]
    - Prerequisites: [Required pre-implementations]
-### Integration Points
-**Integration Point 1: [Name]**
-- Components: [Component A] to [Component B]
-- Contract: [Interface/API contract between components]
 ### Migration Strategy
 [Technical migration approach, ensuring backward compatibility]
@@ -323,7 +322,9 @@ Mark items as N/A with brief rationale when the feature has no relevant trust bo
 ## Future Extensibility
-[Considerations for future feature additions or changes]
+- **Extension points**: [Interfaces, hooks, or plugin mechanisms designed for future use]
+- **Known future requirements**: [Planned features that influenced current design decisions]
+- **Intentional limitations**: [What was deliberately kept simple and why]
 ## Alternative Solutions
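The Error Handling table added to this template in the first hunk of this file could be mirrored in code as a typed row, which may help keep a Design Doc and its implementation aligned. The category and strategy values are copied from the template placeholders; `ErrorPolicyRow` and `toTableRow` are hypothetical names, and the sample values in the usage note are made up.

```typescript
// Category and strategy names mirror the template placeholders; the type names are hypothetical.
type ErrorCategory = "Validation" | "External" | "Infrastructure" | "Business logic";
type RecoveryStrategy = "Retry" | "Fallback" | "Propagate" | "Log-and-continue";

interface ErrorPolicyRow {
  category: ErrorCategory;
  example: string;
  detection: string;
  recovery: RecoveryStrategy;
  userImpact: string;
}

// Renders one policy row in the same column order as the template table.
function toTableRow(row: ErrorPolicyRow): string {
  const cells = [row.category, row.example, row.detection, row.recovery, row.userImpact];
  return `| ${cells.join(" | ")} |`;
}
```

For example, a row with category `External`, example `Payment API timeout`, detection `HTTP 504`, recovery `Retry`, and user impact `Retry notice shown` renders as a single Markdown table line in the template's column order.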

package/.agents/skills/recipe-build/SKILL.md CHANGED

@@ -98,14 +98,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous execution mode.
-## Security Review (After All Tasks Complete)
-After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then invoke security-reviewer before the completion report:
-1. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
-2. Check response:
-   - `approved` or `approved_with_notes` -> Proceed to completion report (include notes if present)
-   - `needs_revision` -> Spawn task-executor with `requiredFixes`, then quality-fixer, then re-invoke security-reviewer
-   - `blocked` -> Escalate to user
+## Post-Implementation Verification (After All Tasks Complete)
+After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+   - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
+   - code-verifier fails when `summary.status` is `needs_review` or `inconsistent`
+   - security-reviewer passes when `status` is `approved` or `approved_with_notes`
+   - security-reviewer fails when `status` is `needs_revision`
+   - security-reviewer `blocked` -> Escalate to user
+4. If either verifier fails:
+   - Create a single fix task covering verifier discrepancies and security requiredFixes
+   - Spawn task-executor with that consolidated task
+   - Spawn quality-fixer
+   - Re-run only the verifier(s) that failed
+   - Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If both verifiers pass -> Proceed to completion report
 **[STOP — BLOCKING]** Upon detecting ANY requirement changes, halt execution immediately.
 **CANNOT proceed until user explicitly confirms the change scope.**
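The Post-Implementation Verification steps added in this hunk can be read as a small decision function. The sketch below is one possible reading, not the package's implementation: the status strings are copied from the diff, while every type and function name (`Outcome`, `NextAction`, `classifyCodeVerifier`, `classifySecurityReviewer`, `nextAction`) is hypothetical. It assumes a verifier that already passed keeps its earlier status when only the failed one is re-run.

```typescript
// Status values are copied from the diff; every function and type name is hypothetical.
type CodeVerifierStatus = "consistent" | "mostly_consistent" | "needs_review" | "inconsistent";
type SecurityStatus = "approved" | "approved_with_notes" | "needs_revision" | "blocked";
type Outcome = "pass" | "fail" | "blocked";
type NextAction = "completion_report" | "run_fix_cycle" | "escalate_to_user";

function classifyCodeVerifier(status: CodeVerifierStatus): Outcome {
  return status === "consistent" || status === "mostly_consistent" ? "pass" : "fail";
}

function classifySecurityReviewer(status: SecurityStatus): Outcome {
  if (status === "blocked") return "blocked";
  return status === "needs_revision" ? "fail" : "pass";
}

// Decides the next step; `fixCyclesUsed` counts completed verification fix cycles (maximum 1).
function nextAction(
  code: CodeVerifierStatus,
  security: SecurityStatus,
  fixCyclesUsed: number,
): NextAction {
  const outcomes: Outcome[] = [classifyCodeVerifier(code), classifySecurityReviewer(security)];
  if (outcomes.some((o) => o === "blocked")) return "escalate_to_user";
  if (outcomes.every((o) => o === "pass")) return "completion_report";
  // At least one verifier failed: allow a single fix cycle, then escalate.
  return fixCyclesUsed >= 1 ? "escalate_to_user" : "run_fix_cycle";
}
```

Under this reading, a `run_fix_cycle` result corresponds to step 4 (consolidated fix task, task-executor, quality-fixer, re-run of the failed verifiers), after which `nextAction` is evaluated again with `fixCyclesUsed` set to 1.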

package/.agents/skills/recipe-front-build/SKILL.md CHANGED

@@ -106,14 +106,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous execution mode.
-## Security Review (After All Tasks Complete)
-After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then invoke security-reviewer before the completion report:
-1. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
-2. Check response:
-   - `approved` or `approved_with_notes` -> Proceed to completion report (include notes if present)
-   - `needs_revision` -> Spawn task-executor-frontend with `requiredFixes`, then quality-fixer-frontend, then re-invoke security-reviewer
-   - `blocked` -> Escalate to user
+## Post-Implementation Verification (After All Tasks Complete)
+After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+   - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
+   - code-verifier fails when `summary.status` is `needs_review` or `inconsistent`
+   - security-reviewer passes when `status` is `approved` or `approved_with_notes`
+   - security-reviewer fails when `status` is `needs_revision`
+   - security-reviewer `blocked` -> Escalate to user
+4. If either verifier fails:
+   - Create a single fix task covering verifier discrepancies and security requiredFixes
+   - Spawn task-executor-frontend with that consolidated task
+   - Spawn quality-fixer-frontend
+   - Re-run only the verifier(s) that failed
+   - Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If both verifiers pass -> Proceed to completion report
 **[STOP -- BLOCKING]** Upon detecting ANY requirement changes, halt execution immediately.
 **CANNOT proceed until user explicitly confirms the change scope.**

package/.agents/skills/recipe-front-review/SKILL.md CHANGED

@@ -33,7 +33,7 @@ Identify the Design Doc in docs/design/ and check implementation files changed f
 **CANNOT proceed without both a Design Doc and implementation files.**
 ### 2. Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report with complianceRate, verdict, acceptanceCriteria, and qualityIssues."
+Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 **Store output as**: `$STEP_2_OUTPUT`
@@ -59,10 +59,16 @@ Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [file
 ```
 Code Compliance: [complianceRate from code-reviewer]
   Verdict: [verdict from code-reviewer]
+  Identifier Match Rate: [identifierMatchRate from code-reviewer]
   Acceptance Criteria:
-  - [fulfilled] [item]
+  - [fulfilled] [item] (confidence: [high/medium/low])
   - [partially_fulfilled] [item]: [gap] — [suggestion]
   - [unfulfilled] [item]: [gap] — [suggestion]
+  Identifier Mismatches (show only mismatches; write `None` if all identifiers match):
+  - None
+  - [identifier]: DD=[designDocValue] Code=[codeValue] at [location] (confidence: [high/medium/low])
+  Quality Findings:
+  - [category] [location]: [description] — [rationale]
 Security Review: [status from security-reviewer]
   Findings by category:

package/.agents/skills/recipe-fullstack-build/SKILL.md CHANGED

@@ -116,14 +116,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous execution mode.
-## Security Review (After All Tasks Complete)
-After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then invoke security-reviewer before the completion report:
-1. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
-2. Check response:
-   - `approved` or `approved_with_notes` -> Proceed to completion report (include notes if present)
-   - `needs_revision` -> Spawn layer-appropriate task-executor with `requiredFixes`, then quality-fixer, then re-invoke security-reviewer
-   - `blocked` -> Escalate to user
+## Post-Implementation Verification (After All Tasks Complete)
+After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+   - each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`
+   - a code-verifier run fails when `summary.status` is `needs_review` or `inconsistent`
+   - security-reviewer passes when `status` is `approved` or `approved_with_notes`
+   - security-reviewer fails when `status` is `needs_revision`
+   - security-reviewer `blocked` -> Escalate to user
+4. If any verifier fails:
+   - Create a single fix task covering verifier discrepancies and security requiredFixes
+   - Spawn the layer-appropriate task-executor
+   - Spawn the layer-appropriate quality-fixer
+   - Re-run only the verifier(s) that failed
+   - Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If all verifiers pass -> Proceed to completion report
 **[STOP -- BLOCKING]** Upon detecting ANY requirement changes, halt execution immediately.
 **CANNOT proceed until user explicitly confirms the change scope.**
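The fullstack variant in this hunk differs in that code-verifier runs once per Design Doc, so "re-run only the verifier(s) that failed" means selecting a subset of runs. A possible sketch of that selection follows; `VerifierRun` and `failedDesignDocs` are hypothetical names, and the document paths in the usage note are invented examples.

```typescript
// Hypothetical record of one code-verifier run; only the status values come from the diff.
interface VerifierRun {
  documentPath: string;
  status: "consistent" | "mostly_consistent" | "needs_review" | "inconsistent";
}

// Returns the Design Doc paths whose code-verifier run failed and should be re-run after the fix cycle.
function failedDesignDocs(runs: VerifierRun[]): string[] {
  return runs
    .filter((run) => run.status === "needs_review" || run.status === "inconsistent")
    .map((run) => run.documentPath);
}
```

Given one run for `docs/design/backend.md` with status `consistent` and one for `docs/design/frontend.md` with status `inconsistent`, only the frontend path would be returned for the re-run.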

package/.agents/skills/recipe-fullstack-implement/SKILL.md CHANGED

@@ -127,14 +127,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 3. Quality-fixer MUST run after each executor (no skipping)
 4. Commit MUST execute when quality-fixer returns `status: "approved"` (do not defer to end)
-### Security Review (After All Tasks Complete)
-After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then invoke security-reviewer before the completion report:
-1. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
-2. Check response:
-   - `approved` or `approved_with_notes` -> Proceed to completion report (include notes if present)
-   - `needs_revision` -> Spawn layer-appropriate task-executor with `requiredFixes`, then quality-fixer, then re-invoke security-reviewer
-   - `blocked` -> Escalate to user
+### Post-Implementation Verification (After All Tasks Complete)
+After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+   - each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`
+   - a code-verifier run fails when `summary.status` is `needs_review` or `inconsistent`
+   - security-reviewer passes when `status` is `approved` or `approved_with_notes`
+   - security-reviewer fails when `status` is `needs_revision`
+   - security-reviewer `blocked` -> Escalate to user
+4. If any verifier fails:
+   - Create a single fix task covering verifier discrepancies and security requiredFixes
+   - Spawn the layer-appropriate task-executor
+   - Spawn the layer-appropriate quality-fixer
+   - Re-run only the verifier(s) that failed
+   - Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If all verifiers pass -> Proceed to completion report
 ### Test Information Communication
 After acceptance-test-generator execution, when calling work-planner, communicate:

package/.agents/skills/recipe-implement/SKILL.md CHANGED

@@ -108,14 +108,24 @@ After user grants "batch approval for entire implementation phase", enter autono
 3. Spawn quality-fixer (or quality-fixer-frontend) agent: "Quality check and fixes"
 4. git commit -> Execute on `status: "approved"`
-### Security Review (After All Tasks Complete)
-After all task cycles finish, collect all `filesModified` from every executor response (task-executor and task-executor-frontend, deduplicated), then invoke security-reviewer before the completion report:
-1. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
-2. Check response:
-   - `approved` or `approved_with_notes` -> Proceed to completion report (include notes if present)
-   - `needs_revision` -> Spawn layer-appropriate executor (task-executor or task-executor-frontend per task filename routing) with `requiredFixes`, then layer-appropriate quality-fixer, then re-invoke security-reviewer
-   - `blocked` -> Escalate to user
+### Post-Implementation Verification (After All Tasks Complete)
+After all task cycles finish, collect all `filesModified` from every executor response (task-executor and task-executor-frontend, deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+   - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
+   - code-verifier fails when `summary.status` is `needs_review` or `inconsistent`
+   - security-reviewer passes when `status` is `approved` or `approved_with_notes`
+   - security-reviewer fails when `status` is `needs_revision`
+   - security-reviewer `blocked` -> Escalate to user
+4. If either verifier fails:
+   - Create a single fix task covering verifier discrepancies and security requiredFixes
+   - Spawn the layer-appropriate executor
+   - Spawn the layer-appropriate quality-fixer
+   - Re-run only the verifier(s) that failed
+   - Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If both verifiers pass -> Proceed to completion report
 ### Test Information Communication
 After acceptance-test-generator execution, when spawning work-planner, communicate:

package/.agents/skills/recipe-review/SKILL.md CHANGED

@@ -35,7 +35,7 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 Identify Design Doc in docs/design/ and check implementation files via git diff.
 ### Step 2: Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report with complianceRate, verdict, acceptanceCriteria, and qualityIssues."
+Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 **Store output as**: `$STEP_2_OUTPUT`
@@ -61,10 +61,16 @@ Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [file
 ```
 Code Compliance: [complianceRate from code-reviewer]
   Verdict: [verdict from code-reviewer]
+  Identifier Match Rate: [identifierMatchRate from code-reviewer]
   Acceptance Criteria:
-  - [fulfilled] [item]
+  - [fulfilled] [item] (confidence: [high/medium/low])
   - [partially_fulfilled] [item]: [gap] — [suggestion]
   - [unfulfilled] [item]: [gap] — [suggestion]
+  Identifier Mismatches (show only mismatches; write `None` if all identifiers match):
+  - None
+  - [identifier]: DD=[designDocValue] Code=[codeValue] at [location] (confidence: [high/medium/low])
+  Quality Findings:
+  - [category] [location]: [description] — [rationale]
 Security Review: [status from security-reviewer]
   Findings by category:

package/.agents/skills/subagents-orchestration-guide/SKILL.md CHANGED

@@ -69,7 +69,7 @@ The following subagents are available:
 10. **technical-designer**: ADR/Design Doc creation
 11. **work-planner**: Work plan creation from Design Doc and test skeletons
 12. **document-reviewer**: Single document quality and rule compliance check
-13. **code-verifier**: Document-code consistency verification for review inputs
+13. **code-verifier**: Document-code consistency verification for review inputs and post-implementation verification
 14. **design-sync**: Design Doc consistency verification across multiple documents
 15. **acceptance-test-generator**: Generate integration and E2E test skeletons from Design Doc ACs
@@ -182,7 +182,7 @@ Subagents respond in JSON format. The final response from each JSON-returning su
 - **task-executor**: status (escalation_needed/completed), escalation_type (design_compliance_violation/similar_function_found/similar_component_found/investigation_target_not_found/out_of_scope_file/test_environment_not_ready), testsAdded, requiresTestReview
 - **quality-fixer**: status (approved/blocked). For blocked responses, discriminate by `reason`: specification conflicts use `blockingIssues[]`; execution prerequisites use `missingPrerequisites[]`, and each item provides its own `resolutionSteps`
 - **document-reviewer**: verdict.decision (approved/approved_with_conditions/needs_revision/rejected)
-- **code-verifier**: summary, discrepancies, reverseCoverage
+- **code-verifier**: summary.status, summary.consistencyScore, discrepancies, reverseCoverage
 - **design-sync**: sync_status (CONFLICTS_FOUND/NO_CONFLICTS) — text format with [SUMMARY] block
 - **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
 - **security-reviewer**: status (approved/approved_with_notes/needs_revision/blocked), findings, notes, requiredFixes
@@ -300,9 +300,9 @@ Batch approval -> Start autonomous execution mode
       -> Orchestrator: Execute git commit
       -> Check remaining tasks:
           - Yes -> next task
-          - No -> security-reviewer: Security review
-              - approved/approved_with_notes -> Completion report
-              - needs_revision -> layer-appropriate task-executor: Security fixes -> quality-fixer -> security-reviewer
+          - No -> code-verifier + security-reviewer: Post-implementation verification
+              - all pass -> Completion report
+              - any fail -> layer-appropriate task-executor: Verification fixes -> quality-fixer -> re-run failed verifiers
               - blocked -> Escalate to user
 ```
@@ -321,6 +321,16 @@ Use the task loop defined in the autonomous execution diagram above. The canonic
 3. quality-fixer quality gate
 4. git commit on approval
+### Post-Implementation Verification Pass/Fail Criteria
+| Verifier | Pass | Fail | Blocked |
+|----------|------|------|---------|
+| code-verifier | `summary.status` is `consistent` or `mostly_consistent` | `summary.status` is `needs_review` or `inconsistent` | — |
+| security-reviewer | `status` is `approved` or `approved_with_notes` | `status` is `needs_revision` | `status` is `blocked` |
+Re-run only verifiers that failed on the previous verification cycle.
+Maximum retry count is 1 verification fix cycle. If any failed verifier still fails after the re-run, escalate to the user.
 ## Main Orchestrator Roles
 1. **State Management**: Track current phase, each subagent's state, and next action

package/.agents/skills/testing/SKILL.md CHANGED

@@ -64,7 +64,7 @@ All tests MUST be:
 - **Independent**: No dependencies between tests
 - **Reproducible**: Same input always produces same output
-- **Fast**: Complete test suite runs in reasonable time
+- **Fast**: Complete the full test suite within the project's accepted feedback window and flag suites that materially slow local iteration or CI
 - **Self-checking**: Clear pass/fail without manual verification
 - **Timely**: Written close to the code they test
@@ -162,7 +162,7 @@ Test names should clearly describe:
 - Use setup hooks to prepare test environment
 - Use teardown hooks to clean up resources
-- Keep setup minimal and focused
+- Keep setup scoped to the data, dependencies, and fixtures required for the behavior under test
 - Ensure teardown runs even if test fails
 ## Mocking and Test Doubles
@@ -177,7 +177,7 @@ Test names should clearly describe:
 ### Mocking Principles [MANDATORY]
 - Mock at boundaries, not internally — use real implementations for internal utilities
-- Keep mocks simple and focused
+- Keep each mock limited to the behavior the test needs to control or observe
 - Verify mock expectations when relevant
 - Use adapters for external libraries/frameworks you do not control
@@ -345,7 +345,7 @@ Eliminate tests that fail intermittently:
 - Add test for every bug fix
 - Maintain comprehensive test suite
 - Run full suite regularly
-- Don't delete tests without good reason
+- Delete a test only when the covered behavior no longer exists or the same behavior is covered by a stronger test at the correct level
 ### Legacy Code

package/.codex/agents/acceptance-test-generator.toml CHANGED

@@ -242,7 +242,8 @@ These annotations are used when planning and prioritizing test implementation.
 ## Constraints and Quality Standards
 **Mandatory Compliance**:
-- Output only test skeletons (prohibit implementation code, assertions, mocks)
+- Output test skeletons only: verification points, expected results, and pass criteria
+- Downstream consumers treat these skeletons as design artifacts rather than runnable tests
 - Clearly state verification points, expected results, and pass criteria for each test
 - Preserve original AC statements in comments (ensure traceability)
 - Stay within test budget; report if budget insufficient for critical tests
@@ -273,7 +274,7 @@ These annotations are used when planning and prioritizing test implementation.
 - Framework/Language: Auto-detect from existing test files
 - Placement: Identify test directory with project-specific patterns
 - Naming: Follow existing file naming conventions
-- Output: Test skeleton only (exclude implementation code)
+- Output: Test skeletons only (follow Constraints and Quality Standards for the boundary)
 **File Operations**:
 - Existing files: Append to end, prevent duplication (check existing tests)

package/.codex/agents/code-reviewer.toml CHANGED

@@ -59,27 +59,79 @@ Skill Status:
 ## Workflow
 ### 1. Load Baseline
-Read the Design Doc and extract:
+Read the Design Doc in full and extract:
 - Functional requirements and acceptance criteria (list each AC individually)
 - Architecture design and data flow
+- Interface contracts (function signatures, API endpoints, data structures)
+- Identifier specifications explicitly written in the Design Doc as exact values, literals, labels, or named fields (resource names, endpoint paths, configuration keys, error codes, schema/model names)
 - Error handling policy
 - Non-functional requirements
-### 2. Map Implementation to Acceptance Criteria
+### 2. Map Implementation to Design Doc
+#### 2-1. Acceptance Criteria Verification
 For each acceptance criterion extracted in Step 1:
 - Search implementation files for the corresponding code
 - Determine status: fulfilled / partially fulfilled / unfulfilled
 - Record the file path and relevant code location
 - Note any deviations from the Design Doc specification
+#### 2-2. Identifier Verification
+For each identifier specification extracted in Step 1:
+1. Search implementation files for the exact string
+2. Compare code values against the Design Doc specification
+3. Flag discrepancies such as missing references, misspellings, or inconsistent naming
+4. Evaluate every identifier and update overall totals for matched and mismatched results
+5. Emit only mismatches in `identifierVerification`, with `{ identifier, designDocValue, codeValue, location, match, confidence, evidence }`
+Identifier extraction constraints:
+- Only verify identifiers that are explicitly written in the Design Doc as exact values, literals, labeled fields, or code-facing names
+- Do not infer identifiers from descriptive prose, conceptual summaries, or implied naming conventions
+- If the Design Doc names a concept without an exact code-facing value, treat it as a normal Design Doc claim, not an identifier check
+#### 2-3. Evidence Collection
+For each acceptance criterion and identifier check:
+1. Primary evidence: direct implementation in source files
+2. Secondary evidence: corresponding tests
+3. Tertiary evidence: config, schemas, or type definitions
+`agreeing sources` means multiple sources independently support the same determination about the same acceptance criterion or identifier. Naming overlap alone is NOT agreement; the evidence must support the same behavior, contract, or exact value match.
+Assign confidence based on evidence count:
+- high: 3+ agreeing sources
+- medium: 2 agreeing sources
+- low: 1 source only
 ### 3. Assess Code Quality
-Read each implementation file and check:
-- Function length (ideal: <50 lines, max: 200 lines)
-- Nesting depth (ideal: <=3 levels, max: 4 levels)
-- Single responsibility adherence
-- Error handling implementation
-- Appropriate logging
-- Test coverage for acceptance criteria
+Read each implementation file and evaluate:
+#### 3-1. Structural Quality
+For each implementation file, read the concrete functions, handlers, or components in scope and evaluate them against the active coding-rules skill:
+- Function organization: flag `maintainability` when a single function mixes multiple distinct concerns such as validation, orchestration, persistence, and presentation formatting
+- Control-flow clarity: flag `maintainability` when branches, nested conditions, or early-exit patterns make the execution path materially difficult to follow
+- Single responsibility adherence: flag `maintainability` when a function or file has more than one primary responsibility
+- Naming clarity: flag `maintainability` when ambiguous names materially obscure intent, domain meaning, or responsibility
+#### 3-2. Error Handling and Reliability
+Read error paths and boundary handling directly in the code:
+- Error handling implementation: verify failures are either propagated explicitly or handled with context
+- Explicit failure paths over silent suppression: flag `reliability` when errors are swallowed, converted to defaults without justification, or otherwise hidden from callers and operators
+- Boundary validation: flag `reliability` when external input, deserialized data, or cross-system responses enter important logic without the validation implied by the Design Doc, type contracts, or code boundary shape
+#### 3-3. Test Coverage for Acceptance Criteria
+- For each fulfilled AC, check whether tests exercise the expected behavior
+Classify each quality finding into one of:
+- `dd_violation`: implementation deviates from the Design Doc
+- `maintainability`: code structure impedes change or comprehension
+- `reliability`: missing safeguards could cause runtime failure
+- `coverage_gap`: acceptance criteria lack meaningful test verification
+Each finding MUST include a rationale:
+- `dd_violation`: what the Design Doc says vs what code does
+- `maintainability`: the concrete maintenance or comprehension risk
+- `reliability`: the failure scenario and triggering conditions
+- `coverage_gap`: the untested AC and why coverage matters
 ### 4. Check Architecture Compliance
 Verify against the Design Doc architecture:
@@ -89,9 +141,10 @@ Verify against the Design Doc architecture:
 - No unnecessary duplicate implementations (Pattern 5 from ai-development-guide skill)
 - Existing codebase analysis section includes similar functionality investigation results
-### 5. Calculate Compliance
+### 5. Calculate Compliance and Consolidate
 - Compliance rate = (fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100
-- Compile all AC statuses, quality issues with specific locations
+- Identifier match rate = matched identifiers / total identifiers x 100
+- Compile all AC statuses, identifier results, and quality findings with specific locations
 - Determine verdict based on compliance rate
 ### 6. Return JSON Result
@@ -102,50 +155,100 @@ Return the JSON result as the final response. See Output Format for the schema.
 ```json
 {
   "complianceRate": "[X]%",
+  "identifierMatchRate": "[X]%",
   "verdict": "[pass/needs-improvement/needs-redesign]",
   "acceptanceCriteria": [
     {
       "item": "[acceptance criteria name]",
       "status": "fulfilled|partially_fulfilled|unfulfilled",
+      "confidence": "high|medium|low",
       "location": "[file:line, if implemented]",
+      "evidence": ["[source1: file:line]", "[source2: test file:line]"],
       "gap": "[what is missing or deviating, if not fully fulfilled]",
       "suggestion": "[specific fix, if not fully fulfilled]"
     }
   ],
-  "qualityIssues": [
+  "identifierVerification": [
+    {
+      "identifier": "[identifier name]",
+      "designDocValue": "[value specified in Design Doc]",
+      "codeValue": "[value found in code, or 'not found']",
+      "location": "[file:line]",
+      "confidence": "high|medium|low",
+      "evidence": ["[source1: file:line]", "[source2: config file:line]"],
+      "match": false
+    }
+  ],
+  "qualityFindings": [
     {
-      "type": "[long-function/deep-nesting/multiple-responsibilities]",
-      "location": "[filename:function]",
+      "category": "dd_violation|maintainability|reliability|coverage_gap",
+      "location": "[filename:function or file:line]",
+      "description": "[specific issue]",
+      "rationale": "[why this matters]",
       "suggestion": "[specific improvement]"
     }
   ],
+  "summary": {
+    "acsTotal": 0,
+    "acsFulfilled": 0,
+    "acsPartial": 0,
+    "acsUnfulfilled": 0,
+    "identifiersTotal": 0,
+    "identifiersMatched": 0,
+    "lowConfidenceItems": 0,
+    "findingsByCategory": {
+      "dd_violation": 0,
+      "maintainability": 0,
+      "reliability": 0,
+      "coverage_gap": 0
+    }
+  },
   "nextAction": "[highest priority action needed]"
 }
 ```
+`identifierVerification` MUST include mismatches only. Use `summary.identifiersTotal` and `summary.identifiersMatched` for overall counts.
 ## Verdict Criteria
 - **90%+**: pass — Minor adjustments only
 - **70-89%**: needs-improvement — Critical gaps exist
 - **<70%**: needs-redesign — Major revision required
+Lower the verdict by one level only when at least one identifier mismatch has confidence `medium` or `high`.
 ## Important Notes
 ### Review Principles
 - Use Design Doc as single source of truth; evaluate independent of implementation context
+- Every finding must include file:line evidence
+- Low-confidence determinations must be explicit
+- Convert abstract skill rules into concrete, code-backed review findings rather than restating the rule alone
 - Provide solutions, not just problems; quantify wherever possible
-- Acknowledge good implementations; present improvements as actionable items
 ## Completion Criteria
-- [ ] All acceptance criteria individually evaluated
-- [ ] Compliance rate calculated
+- [ ] All acceptance criteria individually evaluated with confidence
+- [ ] Identifier specifications verified against implementation
+- [ ] Compliance rate and identifier match rate calculated
+- [ ] Quality findings classified with rationale
 - [ ] Verdict determined
 - [ ] Final response is the JSON output
+## Output Self-Check
+- [ ] Every AC determination cites evidence
+- [ ] Identifier comparisons use exact strings from the Design Doc and code
+- [ ] Low-confidence items are explicit
+- [ ] Every quality finding includes category, rationale, and file:line
+- [ ] Every maintainability or reliability finding is backed by code that was actually read, not inferred from naming alone
+- [ ] identifierVerification contains mismatches only, and each mismatch includes confidence and evidence
 ### Escalation Criteria
 Recommend higher-level review when: Design Doc itself has deficiencies, security concerns discovered, or critical performance issues found.
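
The scoring rules introduced in this file (compliance rate, identifier match rate, evidence-based confidence, the one-level verdict downgrade, and the mismatches-only output) can be sketched as below. This is a hedged illustration under stated assumptions, not code from the package: all function names are hypothetical, rates between 89% and 90% are treated as `needs-improvement`, zero agreeing sources are reported as `low`, and a verdict already at `needs-redesign` is left there because the source defines no lower level.

```python
from typing import List

# Verdict levels ordered from best to worst, as listed under Verdict Criteria.
VERDICTS = ["pass", "needs-improvement", "needs-redesign"]


def compliance_rate(fulfilled: int, partial: int, total: int) -> float:
    """(fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100."""
    if total == 0:
        raise ValueError("no acceptance criteria to evaluate")
    return (fulfilled + 0.5 * partial) * 100 / total


def identifier_match_rate(matched: int, total: int) -> float:
    """matched identifiers / total identifiers x 100."""
    if total == 0:
        raise ValueError("no identifiers to evaluate")
    return matched * 100 / total


def confidence(agreeing_sources: int) -> str:
    """high: 3+ agreeing sources, medium: 2, low: 1 (0 is an assumed 'low')."""
    if agreeing_sources >= 3:
        return "high"
    if agreeing_sources == 2:
        return "medium"
    return "low"


def verdict(rate: float, mismatch_confidences: List[str]) -> str:
    """Map the compliance rate to a verdict, then apply the identifier downgrade."""
    if rate >= 90:
        level = 0
    elif rate >= 70:
        level = 1
    else:
        level = 2
    # Lower by one level only when a mismatch has confidence medium or high.
    if any(c in ("medium", "high") for c in mismatch_confidences):
        level = min(level + 1, len(VERDICTS) - 1)
    return VERDICTS[level]


def identifier_report(results: List[dict]) -> dict:
    """Emit mismatches only; overall counts go in the summary."""
    mismatches = [r for r in results if not r["match"]]
    return {
        "identifierVerification": mismatches,
        "summary": {
            "identifiersTotal": len(results),
            "identifiersMatched": len(results) - len(mismatches),
        },
    }
```

For example, 8 fulfilled and 2 partially fulfilled criteria out of 10 give a 90% compliance rate, which is `pass` on its own but drops to `needs-improvement` if any identifier mismatch carries `medium` or `high` confidence.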

package/.codex/agents/requirement-analyzer.toml CHANGED

@@ -55,7 +55,7 @@ Scale determination and required document details follow the principles in docum
 - **Medium**: 3-5 files, spanning multiple components
 - **Large**: 6+ files, architecture-level changes
-※ADR conditions (contract system changes, data flow changes, architecture changes, external dependency changes) require ADR regardless of scale
+Note: ADR conditions (contract system changes, data flow changes, architecture changes, external dependency changes) require ADR regardless of scale
 ### Important: Clear Determination Expressions
 MUST use the following expressions to show clear determinations:
@@ -162,7 +162,7 @@ Return the JSON result as the final response. See Output Format for the schema.
 - [ ] Do I understand the user's true purpose?
 - [ ] Have I properly estimated the impact scope?
 - [ ] Have I correctly determined ADR necessity?
-- [ ] Have I not overlooked technical risks?
+- [ ] Have I identified all technical risks and dependencies?
 - [ ] Have I listed scopeDependencies for uncertain scale?
 - [ ] Final response is the JSON output
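
The scale thresholds and the ADR note in this file can be sketched as a small classifier. This is an illustrative sketch only: `determine_scale` and `adr_required` are hypothetical names, the condition keys are made-up labels for the four ADR conditions named above, and the 1-2 file range for Small is taken from the README table later in this diff rather than from this hunk.

```python
# Hypothetical labels for the four ADR conditions listed in the note above.
ADR_CONDITIONS = {
    "contract_system_change",
    "data_flow_change",
    "architecture_change",
    "external_dependency_change",
}


def determine_scale(file_count: int) -> str:
    """Small: 1-2 files, Medium: 3-5 files, Large: 6+ files."""
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    if file_count <= 2:
        return "small"
    if file_count <= 5:
        return "medium"
    return "large"


def adr_required(triggered_conditions: set) -> bool:
    """Any ADR condition requires an ADR regardless of scale."""
    return bool(ADR_CONDITIONS & set(triggered_conditions))
```

Keeping the two checks separate mirrors the note: a one-file change is still `small`, yet it needs an ADR if it alters, say, an external dependency.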

package/.codex/agents/task-executor.toml CHANGED

@@ -162,7 +162,7 @@ Select and execute files with pattern `docs/plans/tasks/*-task-*.md` that have u
 **Unavailable**: Escalate with `status: "escalation_needed"`, `reason: "test_environment_not_ready"`
 #### Pre-implementation Verification (Pattern 5 Compliant)
-1. **Read relevant Design Doc sections** and understand accurately
+1. **Read relevant Design Doc sections** and extract interface contracts, data structures, dependency constraints, and verification expectations
 2. **Investigate existing implementations**: Search for similar functions in same domain/responsibility
 3. **Cross-check against Investigation Notes**: Ensure planned implementation is consistent with the observations recorded in the task file
 4. **Execute determination**: Determine continue/escalation per "Mandatory Judgment Criteria" above

package/.codex/agents/technical-designer-frontend.toml CHANGED

@@ -80,7 +80,7 @@ Must be performed before Design Doc creation:
    - Search existing code for keywords related to planned component
    - Look for components with same domain, responsibilities, or UI patterns
    - Decision and action:
-     - Similar component found → Use that component (do not create new component)
+     - Similar component found → Reuse, compose, or extend that component path and document the reuse decision
      - Similar component is technical debt → Create ADR improvement proposal before implementation
      - No similar component → Proceed with new implementation

package/.codex/agents/technical-designer.toml CHANGED

@@ -97,7 +97,7 @@ Must be performed before Design Doc creation:
    - Search existing code for keywords related to planned functionality
    - Look for implementations with same domain, responsibilities, or configuration patterns
    - Decision and action:
-     - Similar functionality found → Use that implementation (do not create new implementation)
+     - Similar functionality found → Reuse or extend that implementation path and document the reuse decision
      - Similar functionality is technical debt → Create ADR improvement proposal before implementation
      - No similar functionality → Proceed with new implementation

package/.codex/agents/work-planner.toml CHANGED

@@ -111,7 +111,7 @@ Execute file output immediately. Final approval is managed by the orchestrator r
 1. **Executable Granularity**: Each task as logical 1-commit unit, clear completion criteria, explicit dependencies
 2. **Built-in Quality**: Simultaneous test implementation, quality checks in each phase
 3. **Risk Management**: List risks and countermeasures in advance, define detection methods
-4. **Ensure Flexibility**: Prioritize essential purpose, avoid excessive detail
+4. **Ensure Flexibility**: Prioritize essential purpose and include only details required for task execution, verification, and dependency management
 5. **Design Doc Compliance**: All task completion criteria derived from Design Doc specifications
 6. **Implementation Pattern Consistency**: When including implementation samples, MUST ensure strict compliance with Design Doc implementation approach

package/README.md CHANGED

@@ -4,9 +4,9 @@
 [![Agent Skills](https://img.shields.io/badge/Agent%20Skills-Spec%20Compliant-blue)](https://developers.openai.com/codex/skills/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-**End-to-end AI coding workflows for [Codex CLI](https://developers.openai.com/codex/cli)** — specialized subagents handle requirements, design, implementation, and quality checks so you get code with explicit design docs, test coverage, and commit-level traceability — not just raw generations.
+**Structured agentic coding workflows for [OpenAI Codex CLI](https://developers.openai.com/codex/cli)** — specialized AI coding agents plan, implement, test, and review changes with traceable docs, task-level commits, and quality gates.
-Built on the [Agent Skills specification](https://developers.openai.com/codex/skills/) and [Codex subagents](https://developers.openai.com/codex/subagents). Works with the latest GPT models.
+Built on the [Agent Skills specification](https://developers.openai.com/codex/skills/) and [Codex subagents](https://developers.openai.com/codex/subagents). Designed for long-running tasks, large refactors, and reviewable changes.
 ---
@@ -25,36 +25,45 @@ $recipe-implement Add user authentication with JWT
 `$` is Codex CLI's syntax for invoking a skill explicitly. Type `$recipe-` to see all available recipes via tab completion.
-The framework runs a structured workflow — requirements → design → task decomposition → TDD implementation → quality gates — all through specialized subagents.
+Small changes stay lightweight. Larger tasks get structure: requirements → design → task decomposition → TDD implementation → quality gates.
+codex-workflows is the Codex-native counterpart of [Claude Code Workflows](https://github.com/shinpr/claude-code-workflows): same document-driven development style, adapted for Codex CLI, subagents, and GPT models.
 ---
 ## Why codex-workflows?
-**Without codex-workflows:**
-- Code generation is inconsistent across large tasks
-- Requirements and design decisions are implicit — lost after the session
-- Refactoring and debugging become harder as context grows
+Codex is already strong at one-shot implementation. The problem starts when a change spans multiple files, needs design decisions to stay visible, or has to survive review, testing, and follow-up edits.
+For larger tasks, explicit planning changes the job from raw generation into verification against a design, a task breakdown, and acceptance criteria. That matters because review loops are more reliable than first-shot generation once scope and ambiguity grow.
+codex-workflows adds the missing structure around those jobs:
+- Traceable artifacts: PRD → Design Doc → Task → Commit
+- Built-in TDD and quality gates before code is ready to commit
+- Agent context separation for large refactors, migrations, and PR-sized changes
+- Diagnosis and reverse-engineering flows for bugs and legacy code
+## Not Designed For
-**With codex-workflows:**
-- Every change is traceable: PRD → Design Doc → Task → Commit
-- Built-in TDD and quality gates catch regressions before commit
-- Large tasks stay structured and reviewable through agent context separation
+- One-shot toy scripts or vibe-coding sessions where speed matters more than traceability
+- Repositories that do not use tests, lint, builds, or reviewable commits
+- Teams that do not want design docs, task breakdowns, or explicit quality gates
 ---
 ## What It Does
-A single request becomes a structured development process:
+A single request becomes a structured development process. The framework chooses the level of ceremony based on scope:
+| Scale | File Count | What Happens |
+|-------|------------|-------------|
+| Small | 1-2 | Simplified plan → direct implementation |
+| Medium | 3-5 | Design Doc → work plan → task execution |
+| Large | 6+ | PRD → ADR → Design Doc → test skeletons → work plan → autonomous execution |
-1. **Understand** the problem (scale, constraints, affected files)
-2. **Analyze the existing codebase** (dependencies, data layer, risk areas)
-3. **Design** the solution (ADR, Design Doc with acceptance criteria)
-4. **Break it into tasks** (atomic, 1 commit each)
-5. **Implement with tests** (TDD per task)
-6. **Run quality checks** (lint, test, build — no failing checks)
+For larger work, the path usually looks like this: understand the problem, analyze the codebase, design the change, break it into atomic tasks, implement with tests, and run quality checks before commit.
-Each step is handled by a specialized subagent in its own context, preventing context pollution and reducing error accumulation in long-running tasks:
+Each step is handled by a specialized subagent in its own context, using context engineering to prevent context pollution and reduce error accumulation in long-running tasks:
 ```
 User Request
@@ -96,6 +105,8 @@ Problem → investigator → verifier (ACH + Devil's Advocate) → solver → Ac
 Existing code → scope-discoverer (discoveredUnits + prdUnits) → prd-creator → code-verifier → document-reviewer → Design Docs
 ```
+This works best when repository knowledge is explicit and local. Short `AGENTS.md` files can act as entry points, while design docs, plans, and task files hold the deeper instructions that agents need to execute reliably.
 ---
 ## Installation
@@ -103,7 +114,7 @@ Existing code → scope-discoverer (discoveredUnits + prdUnits) → prd-creator
 ### Requirements
 - [Codex CLI](https://developers.openai.com/codex/cli) (latest)
-- Node.js >= 20
+- Node.js >= 22
 ### Install
@@ -266,16 +277,6 @@ Codex spawns these as needed during recipe execution. Each agent runs in its own
 ## How It Works
-### Scale-Based Workflow Selection
-The framework automatically determines the right level of ceremony:
-| Scale | File Count | What Happens |
-|-------|------------|-------------|
-| Small | 1-2 | Simplified plan → direct implementation |
-| Medium | 3-5 | Design Doc → work plan → task execution |
-| Large | 6+ | PRD → ADR → Design Doc → test skeletons → work plan → autonomous execution |
 ### Autonomous Execution Mode
 After work plan approval, the framework enters guided autonomous execution with escalation points:
@@ -287,7 +288,8 @@ After work plan approval, the framework enters guided autonomous execution with
 ### Context Separation
-Each subagent runs in a fresh context. This matters because:
+Each subagent runs in a fresh context. This context-engineering pattern keeps long-running agentic coding tasks legible and reviewable:
+- generation and verification happen in separate contexts, reducing author bias and carry-over assumptions
 - **document-reviewer** reviews without the author's bias
 - **investigator** collects evidence without confirmation bias
 - **code-reviewer** validates compliance without implementation context
@@ -349,10 +351,6 @@ A: Yes. Edit the TOML files in `.codex/agents/` — change model, sandbox_mode,
 A: `$recipe-implement` is the universal entry point. It runs requirement-analyzer first, detects affected layers from the codebase, and automatically routes to backend, frontend, or fullstack flow. `$recipe-fullstack-implement` skips the detection and goes straight into the fullstack flow (separate Design Docs per layer, design-sync, layer-aware task execution). Use `$recipe-implement` when you're not sure; use `$recipe-fullstack-implement` when you know upfront that the feature spans both layers.
-**Q: How does this relate to Claude Code Workflows?**
-A: codex-workflows is the Codex-native counterpart of [Claude Code Workflows](https://github.com/shinpr/claude-code-workflows). Same development philosophy, adapted for Codex CLI's subagent architecture and GPT model family.
 **Q: Does this work with MCP servers?**
 A: Yes. Codex skills and subagents work alongside [MCP](https://developers.openai.com/codex/mcp) — skills operate at the instruction layer while MCP operates at the tool transport layer. You can add MCP servers to any agent's TOML configuration.
@@ -363,6 +361,19 @@ A: Subagents escalate to the user when they encounter design deviations, ambiguo
 ---
+## Design Rationale
+<details>
+<summary>Background reading behind the workflow design</summary>
+- [Planning Is the Real Superpower of Agentic Coding](https://www.norsica.jp/blog/planning-superpower-agentic-coding) — why explicit planning turns large-task execution from raw generation into verification against a design and task breakdown
+- [Why LLMs Are Bad at 'First Try' and Great at Verification](https://www.norsica.jp/blog/llm-verification-over-generation) — why review loops and session separation are more reliable than first-shot generation on complex work
+- [Stop Putting Everything in AGENTS.md](https://www.norsica.jp/blog/stop-putting-everything-in-agents-md) — why `AGENTS.md` should stay lean while rules, docs, and task instructions live near the point of use
+</details>
+---
 ## License
 MIT License — free to use, modify, and distribute.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "codex-workflows",
-  "version": "0.4.1",
+  "version": "0.4.3",
   "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
   "license": "MIT",
   "author": "Shinsuke Kagawa",
@@ -22,9 +22,12 @@
     "agent-skills",
     "agentic-coding",
     "ai-coding",
+    "ai-coding-agent",
     "subagents",
     "multi-agent",
+    "harness-engineering",
     "context-engineering",
+    "ai-development-workflow",
     "tdd",
     "code-generation"
   ],