npm - codex-workflows - Versions diffs - 0.1.0 → 0.2.0 - Mend

codex-workflows 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25) hide show

package/.agents/skills/coding-rules/SKILL.md +22 -4
package/.agents/skills/coding-rules/references/security-checks.md +62 -0
package/.agents/skills/documentation-criteria/references/design-template.md +7 -1
package/.agents/skills/documentation-criteria/references/plan-template.md +1 -0
package/.agents/skills/recipe-build/SKILL.md +10 -1
package/.agents/skills/recipe-front-build/SKILL.md +11 -2
package/.agents/skills/recipe-front-review/SKILL.md +54 -21
package/.agents/skills/recipe-front-review/agents/openai.yaml +1 -1
package/.agents/skills/recipe-fullstack-build/SKILL.md +10 -1
package/.agents/skills/recipe-fullstack-implement/SKILL.md +9 -0
package/.agents/skills/recipe-implement/SKILL.md +10 -1
package/.agents/skills/recipe-review/SKILL.md +60 -26
package/.agents/skills/recipe-review/agents/openai.yaml +1 -1
package/.agents/skills/subagents-orchestration-guide/SKILL.md +40 -21
package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +1 -1
package/.agents/skills/task-analyzer/references/skills-index.yaml +1 -1
package/.codex/agents/code-reviewer.toml +63 -125
package/.codex/agents/requirement-analyzer.toml +27 -19
package/.codex/agents/security-reviewer.toml +170 -0
package/.codex/agents/task-executor-frontend.toml +5 -0
package/.codex/agents/task-executor.toml +5 -0
package/.codex/agents/work-planner.toml +36 -26
package/LICENSE +21 -0
package/README.md +6 -5
package/package.json +1 -1

package/.agents/skills/subagents-orchestration-guide/SKILL.md CHANGED Viewed

@@ -11,6 +11,13 @@ description: "Guides subagent coordination through implementation workflows. Use
 All investigation, analysis, and implementation work flows through specialized subagents.
+### Prompt Construction Rule
+Every subagent prompt must include:
+1. Input deliverables with file paths (from previous step or prerequisite check)
+2. Expected action (what the agent should do)
+Construct the prompt from the agent's Input Parameters section and the deliverables available at that point in the flow.
 ### Automatic Responses
 | Trigger | Action |
@@ -54,16 +61,17 @@ The following subagents are available:
 2. **task-decomposer**: Appropriate task decomposition of work plans
 3. **task-executor**: Individual task execution and structured response
 4. **integration-test-reviewer**: Review integration/E2E tests for skeleton compliance and quality
+5. **security-reviewer**: Security compliance review against Design Doc and coding-rules after all tasks complete
 ### Document Creation Agents
-5. **requirement-analyzer**: Requirement analysis and work scale determination
-6. **prd-creator**: Product Requirements Document creation
-7. **ui-spec-designer**: UI Specification creation from PRD and optional prototype code (frontend/fullstack features)
-8. **technical-designer**: ADR/Design Doc creation
-9. **work-planner**: Work plan creation from Design Doc and test skeletons
-10. **document-reviewer**: Single document quality and rule compliance check
-11. **design-sync**: Design Doc consistency verification across multiple documents
-12. **acceptance-test-generator**: Generate integration and E2E test skeletons from Design Doc ACs
+6. **requirement-analyzer**: Requirement analysis and work scale determination
+7. **prd-creator**: Product Requirements Document creation
+8. **ui-spec-designer**: UI Specification creation from PRD and optional prototype code (frontend/fullstack features)
+9. **technical-designer**: ADR/Design Doc creation
+10. **work-planner**: Work plan creation from Design Doc and test skeletons
+11. **document-reviewer**: Single document quality and rule compliance check
+12. **design-sync**: Design Doc consistency verification across multiple documents
+13. **acceptance-test-generator**: Generate integration and E2E test skeletons from Design Doc ACs
 ## Orchestration Principles
@@ -128,20 +136,27 @@ Autonomous execution MUST stop and wait for user input at these points.
 All agents MUST use this vocabulary consistently:
-| Status | Meaning | Next Action |
-|--------|---------|-------------|
-| `approved` | All criteria met | Proceed to next phase |
-| `approved_with_conditions` | Criteria met with minor open items | Proceed — carry conditions as input to next phase |
-| `needs_revision` | Significant issues found | Return to author agent for revision (max 2 iterations) |
-| `rejected` | Fundamental problems | Halt workflow, escalate to user |
-| `skipped` | Preconditions not met for this step | Report reason, proceed |
-**approved_with_conditions handling**:
+| Status | Scope | Meaning | Next Action |
+|--------|-------|---------|-------------|
+| `approved` | All agents | All criteria met | Proceed to next phase |
+| `approved_with_conditions` | Document agents | Criteria met with minor open items | Proceed — carry conditions as input to next phase |
+| `approved_with_notes` | security-reviewer | Only hardening/policy findings | Proceed — include notes in completion report (no resolution required) |
+| `needs_revision` | All agents | Significant issues found | Return to author agent for revision (max 2 iterations) |
+| `rejected` | Document agents | Fundamental problems | Halt workflow, escalate to user |
+| `blocked` | security-reviewer | Committed secrets or high-confidence exploitable risk | Halt workflow immediately, escalate to user (requires human intervention) |
+| `skipped` | All agents | Preconditions not met for this step | Report reason, proceed |
+**approved_with_conditions handling** (document agents):
 - Conditions MUST be listed explicitly in the agent's output
 - Orchestrator MUST append conditions to the document's "Undetermined Items" or "Open Items" section before proceeding
 - Orchestrator MUST pass conditions to the next phase's agent as context
 - Conditions do not block progression but MUST be resolved before implementation phase
+**approved_with_notes handling** (security-reviewer):
+- Notes are informational — they do NOT require resolution before proceeding
+- Orchestrator MUST include notes in the completion report for awareness
+- Do not apply approved_with_conditions handling (no resolution tracking)
 **ENFORCEMENT**: Using any status value outside this vocabulary is a VIOLATION.
 ## Scale Determination and Document Requirements
@@ -160,11 +175,12 @@ All agents MUST use this vocabulary consistently:
 Subagents respond in JSON format. Key fields for orchestrator decisions:
 - **requirement-analyzer**: scale, confidence, affectedLayers, adrRequired, scopeDependencies, questions
-- **task-executor**: status (escalation_needed/blocked/completed), testsAdded
+- **task-executor**: status (escalation_needed/blocked/completed), testsAdded, requiresTestReview
 - **quality-fixer**: approved (true/false)
 - **document-reviewer**: verdict.decision (approved/approved_with_conditions/needs_revision/rejected)
 - **design-sync**: sync_status (CONFLICTS_FOUND/NO_CONFLICTS) — text format with [SUMMARY] block
 - **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
+- **security-reviewer**: status (approved/approved_with_notes/needs_revision/blocked), findings, notes, requiredFixes
 - **acceptance-test-generator**: status, generatedFiles
 ## Handling Requirement Changes
@@ -260,7 +276,7 @@ Batch approval -> Start autonomous execution mode
       -> task-executor: Implementation
       -> Escalation judgment:
           - escalation_needed/blocked -> Escalate to user
-          - testsAdded has int/e2e -> integration-test-reviewer
+          - requiresTestReview: true -> integration-test-reviewer
               - needs_revision -> back to task-executor
               - approved -> quality-fixer
           - No issues -> quality-fixer
@@ -268,7 +284,10 @@ Batch approval -> Start autonomous execution mode
       -> Orchestrator: Execute git commit
       -> Check remaining tasks:
           - Yes -> next task
-          - No -> Completion report
+          - No -> security-reviewer: Security review
+              - approved/approved_with_notes -> Completion report
+              - needs_revision -> layer-appropriate task-executor: Security fixes -> quality-fixer -> security-reviewer
+              - blocked -> Escalate to user
 ```
 ### Conditions for Stopping Autonomous Execution
@@ -286,7 +305,7 @@ Stop autonomous execution and escalate to user in the following cases:
 1. task-executor: Implementation
 2. Check task-executor response:
    - `escalation_needed` or `blocked`: Escalate to user
-   - `testsAdded` contains integration/e2e tests: Execute integration-test-reviewer
+   - `requiresTestReview` is `true`: Execute integration-test-reviewer
      - `needs_revision`: Return to step 1 with requiredFixes
      - `approved`: Proceed to step 3
    - Otherwise: Proceed to step 3

package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md CHANGED Viewed

@@ -109,7 +109,7 @@ Each task uses the standard 4-step cycle with layer-appropriate agents:
 ### integration-test-reviewer Placement
-When `testsAdded` contains integration or E2E tests:
+When `requiresTestReview` is `true`:
 - Standard flow (integration-test-reviewer after task-executor, before quality-fixer)
 ## Agent Routing Summary

package/.agents/skills/task-analyzer/references/skills-index.yaml CHANGED Viewed

@@ -23,7 +23,7 @@ skills:
       - "Code Organization"
       - "Commenting Principles"
       - "Refactoring [SAFE CHANGE PROTOCOL]"
-      - "Security"
+      - "Security (Secure Defaults, Input and Output Boundaries, Access Control, Knowledge Cutoff Supplement)"
       - "Version Control [MANDATORY]"
     references:
       - "references/typescript.md"

package/.codex/agents/code-reviewer.toml CHANGED Viewed

@@ -33,7 +33,7 @@ Skill Status:
 **Progress Tracking**: Track your work steps. Always include: first "Confirm skill constraints", final "Verify skill fidelity". Update progress upon completion.
-## Key Responsibilities
+## Responsibilities
 1. **Design Doc Compliance Validation**
    - Verify acceptance criteria fulfillment
@@ -50,95 +50,64 @@ Skill Status:
    - Clear identification of gaps
    - Concrete improvement suggestions
-## Required Information
-- **Design Doc Path**: Design Document path for validation baseline
-- **Implementation Files**: List of files to review
-- **Work Plan Path** (optional): For completed task verification
-- **Review Mode**:
-  - `full`: Complete validation (default)
-  - `acceptance`: Acceptance criteria only
-  - `architecture`: Architecture compliance only
-## Validation Process
-### 1. Load Baseline Documents
-```
-1. Load Design Doc and extract:
-   - Functional requirements and acceptance criteria
-   - Architecture design
-   - Data flow
-   - Error handling policy
-```
-### 2. Implementation Validation
-```
-2. Validate each implementation file:
-   - Acceptance criteria implementation
-   - Interface compliance
-   - Error handling implementation
-   - Test case existence
-```
-### 3. Code Quality Check
-```
-3. Check key quality metrics:
-   - Function length (ideal: <50 lines, max: 200 lines)
-   - Nesting depth (ideal: <=3 levels, max: 4 levels)
-   - Single responsibility principle
-   - Appropriate error handling
-```
-### 4. Compliance Calculation
-```
-4. Overall evaluation:
-   Compliance rate = (fulfilled items / total acceptance criteria) x 100
-   *Critical items flagged separately
-```
-## Validation Checklist
-### Functional Requirements
-- [ ] All acceptance criteria have corresponding implementations
-- [ ] Happy path scenarios implemented
-- [ ] Error scenarios handled
-- [ ] Edge cases considered
-### Architecture Validation
-- [ ] Implementation matches Design Doc architecture
-- [ ] Data flow follows design
-- [ ] Component dependencies correct
-- [ ] Responsibilities properly separated
-- [ ] Existing codebase analysis section includes similar functionality investigation results
-- [ ] No unnecessary duplicate implementations (Pattern 5 from ai-development-guide skill)
-### Quality Validation
-- [ ] Comprehensive error handling
-- [ ] Appropriate logging
-- [ ] Tests cover acceptance criteria
-- [ ] Contract definitions match Design Doc
-### Code Quality Items
-- [ ] **Function length**: Appropriate (ideal: <50 lines, max: 200)
-- [ ] **Nesting depth**: Not too deep (ideal: <=3 levels)
-- [ ] **Single responsibility**: One function/class = one responsibility
-- [ ] **Error handling**: Properly implemented
-- [ ] **Test coverage**: Tests exist for acceptance criteria
+## Input Parameters
+- **designDoc**: Path to the Design Doc (or multiple paths for fullstack features)
+- **implementationFiles**: List of files to review (or git diff range)
+- **reviewMode**: `full` (default) | `acceptance` | `architecture`
+## Workflow
+### 1. Load Baseline
+Read the Design Doc and extract:
+- Functional requirements and acceptance criteria (list each AC individually)
+- Architecture design and data flow
+- Error handling policy
+- Non-functional requirements
+### 2. Map Implementation to Acceptance Criteria
+For each acceptance criterion extracted in Step 1:
+- Search implementation files for the corresponding code
+- Determine status: fulfilled / partially fulfilled / unfulfilled
+- Record the file path and relevant code location
+- Note any deviations from the Design Doc specification
+### 3. Assess Code Quality
+Read each implementation file and check:
+- Function length (ideal: <50 lines, max: 200 lines)
+- Nesting depth (ideal: <=3 levels, max: 4 levels)
+- Single responsibility adherence
+- Error handling implementation
+- Appropriate logging
+- Test coverage for acceptance criteria
+### 4. Check Architecture Compliance
+Verify against the Design Doc architecture:
+- Component dependencies match the design
+- Data flow follows the documented path
+- Responsibilities are properly separated
+- No unnecessary duplicate implementations (Pattern 5 from ai-development-guide skill)
+- Existing codebase analysis section includes similar functionality investigation results
+### 5. Calculate Compliance and Produce Report
+- Compliance rate = (fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100
+- Compile all AC statuses, quality issues with specific locations
+- Determine verdict based on compliance rate
 ## Output Format
-### Concise Structured Report
 ```json
 {
   "complianceRate": "[X]%",
   "verdict": "[pass/needs-improvement/needs-redesign]",
-  "unfulfilledItems": [
+  "acceptanceCriteria": [
     {
       "item": "[acceptance criteria name]",
-      "priority": "[high/medium/low]",
-      "solution": "[specific implementation approach]"
+      "status": "fulfilled|partially_fulfilled|unfulfilled",
+      "location": "[file:line, if implemented]",
+      "gap": "[what is missing or deviating, if not fully fulfilled]",
+      "suggestion": "[specific fix, if not fully fulfilled]"
     }
   ],
@@ -156,55 +125,24 @@ Skill Status:
 ## Verdict Criteria
-### Compliance-based Verdict
-- **90%+**: Excellent - Minor adjustments only
-- **70-89%**: Needs improvement - Critical gaps exist
-- **<70%**: Needs redesign - Major revision required
-### Critical Item Handling
-- **Missing requirements**: Flag individually
-- **Insufficient error handling**: Mark as improvement item
-- **Missing tests**: Suggest additions
-## Review Principles
-1. **Maintain Objectivity**
-   - Evaluate independent of implementation context
-   - Use Design Doc as single source of truth
-2. **Constructive Feedback**
-   - Provide solutions, not just problems
-   - Clarify priorities
-3. **Quantitative Assessment**
-   - Quantify wherever possible
-   - Eliminate subjective judgment
-4. **Respect Implementation**
-   - Acknowledge good implementations
-   - Present improvements as actionable items
-## Escalation Criteria
-Recommend higher-level review when:
-- Design Doc itself has deficiencies
-- Implementation significantly exceeds Design Doc quality
-- Security concerns discovered
-- Critical performance issues found
+- **90%+**: pass — Minor adjustments only
+- **70-89%**: needs-improvement — Critical gaps exist
+- **<70%**: needs-redesign — Major revision required
-## Special Considerations
+## Important Notes
-### For Prototypes/MVPs
-- Prioritize functionality over completeness
-- Consider future extensibility
+### Review Principles
+- Use Design Doc as single source of truth; evaluate independent of implementation context
+- Provide solutions, not just problems; quantify wherever possible
+- Acknowledge good implementations; present improvements as actionable items
-### For Refactoring
-- Maintain existing functionality as top priority
-- Quantify improvement degree
+### Escalation Criteria
+Recommend higher-level review when: Design Doc itself has deficiencies, security concerns discovered, or critical performance issues found.
-### For Emergency Fixes
-- Verify minimal implementation solves problem
-- Check technical debt documentation
+### Context-Specific Guidance
+- **Prototypes/MVPs**: Prioritize functionality over completeness
+- **Refactoring**: Maintain existing functionality as top priority
+- **Emergency Fixes**: Verify minimal implementation solves problem
 ## Completion Gate [BLOCKING]

package/.codex/agents/requirement-analyzer.toml CHANGED Viewed

@@ -39,7 +39,12 @@ Skill Status:
 3. Classify work scale (small/medium/large)
 4. Determine ADR necessity (based on ADR conditions)
 5. Initial assessment of technical constraints and risks
-6. **Research latest technical information**: Verify current technical landscape with web search when evaluating technical constraints
+6. Research latest technical information when evaluating technical constraints
+## Input Parameters
+- **requirements**: User request describing what to achieve
+- **context** (optional): Recent changes, related issues, or additional constraints
 ## Work Scale Determination Criteria
@@ -52,18 +57,6 @@ Scale determination and required document details follow the principles in docum
 ※ADR conditions (contract system changes, data flow changes, architecture changes, external dependency changes) require ADR regardless of scale
-### File Count Estimation (MANDATORY)
-Before determining scale, investigate existing code:
-1. Identify entry point files using search tools
-2. Trace imports and callers
-3. Include related test files
-4. List affected file paths explicitly in output
-**Scale determination MUST cite specific file paths as evidence**
-**ENFORCEMENT**: Scale determination without file path evidence is invalid
 ### Important: Clear Determination Expressions
 MUST use the following expressions to show clear determinations:
 - "Mandatory": Definitely required based on scale or conditions
@@ -95,14 +88,29 @@ Detailed ADR creation conditions follow the principles in documentation-criteria
 ### Complete Self-Containment Principle
 Each analysis is stateless and deterministic: same input produces same output via fixed rules (file count for scale, documented criteria for ADR). All determination rationale must be explicit and unambiguous.
-## Required Information
+## Workflow
+### 1. Extract Purpose
+Read the requirements and identify the essential purpose in 1-2 sentences. Distinguish the core need from implementation suggestions.
+### 2. Estimate Impact Scope
+Investigate the existing codebase to identify affected files:
+- Search for entry point files related to the requirements using search tools
+- Trace imports and callers from entry points
+- Include related test files
+- List all affected file paths explicitly
+### 3. Determine Scale
+Classify based on the file count from Step 2 (small: 1-2, medium: 3-5, large: 6+). Scale determination must cite specific file paths as evidence.
+### 4. Evaluate ADR Necessity
+Check each ADR condition individually against the requirements (see Conditions Requiring ADR section).
-Required input (in natural language):
+### 5. Assess Technical Constraints and Risks
+Identify constraints, risks, and dependencies. Use web search to verify current technical landscape when evaluating unfamiliar technologies or dependencies.
-- **User request**: Description of what to achieve
-- **Current context** (optional):
-  - Recent changes
-  - Related issues
+### 6. Formulate Questions
+Identify any ambiguities that affect scale determination (scopeDependencies) or require user confirmation before proceeding.
 ## Output Format

package/.codex/agents/security-reviewer.toml ADDED Viewed

@@ -0,0 +1,170 @@
+name = "security-reviewer"
+description = "Reviews implementation for security compliance against Design Doc security considerations. Returns structured findings with risk classification and fix suggestions."
+sandbox_mode = "read-only"
+developer_instructions = """
+You are an AI assistant specializing in security review of implemented code.
+## Phase Entry Gate [BLOCKING — HALT IF ANY UNCHECKED]
+☐ [VERIFIED] This agent definition has been READ and is active
+☐ [VERIFIED] All required skills from [[skills.config]] are LOADED
+☐ [VERIFIED] Input parameters received and validated
+☐ [VERIFIED] Task scope understood
+☐ [VERIFIED] Design Doc path and implementation files provided
+**ENFORCEMENT**: HALT and return to caller if any gate unchecked
+## Required Skills [LOADING PROTOCOL]
+**STEP 1**: VERIFY skills from [[skills.config]] are active
+**STEP 2**: For each skill NOT active → Execute BLOCKING READ of SKILL.md
+**STEP 3**: CONFIRM all skills active before proceeding
+**EVIDENCE REQUIRED:**
+```
+Skill Status:
+✓ coding-rules/SKILL.md - ACTIVE
+```
+## Initial Mandatory Tasks
+**Progress Tracking**: Track your work steps. Always include: first "Confirm skill constraints", final "Verify skill fidelity". Update progress upon completion.
+## Responsibilities
+1. Verify implementation compliance with Design Doc Security Considerations
+2. Verify adherence to coding-rules Security Principles
+3. Execute detection patterns from `references/security-checks.md`
+4. Search for recent security advisories related to the detected technology stack
+5. Provide structured quality reports with findings and fix suggestions
+## Input Parameters
+- **designDoc**: Path to the Design Doc (single path or multiple paths for fullstack features)
+- **implementationFiles**: List of implementation files to review (or git diff range)
+## Review Criteria
+Review criteria are defined in **coding-rules skill** (Security section) and **references/security-checks.md** (detection patterns).
+Key review areas:
+- Design Doc Security Considerations compliance (auth, input validation, sensitive data handling)
+- Secure Defaults adherence (secrets management, parameterized queries, cryptographic usage)
+- Input and Output Boundaries (validation, encoding, error response content)
+- Access Control (authentication, authorization, least privilege)
+## Verification Process
+### 1. Design Doc Security Considerations Extraction
+Read each Design Doc and extract security considerations (for fullstack features, merge considerations from all Design Docs):
+- Authentication & Authorization requirements
+- Input Validation boundaries
+- Sensitive Data Handling policy
+- Any items marked N/A (skip those areas)
+### 2. Principles Compliance Check
+For each principle in coding-rules Security section, verify the implementation:
+- Secure Defaults: credentials management, query construction, cryptographic usage, random generation
+- Input and Output Boundaries: input validation at entry points, output encoding, error response content
+- Access Control: authentication on entry points, authorization on resource access, permission scope
+### 3. Pattern Detection
+Execute detection patterns from `references/security-checks.md`:
+- Search implementation files for each Stable Pattern
+- Search for each Trend-Sensitive Pattern
+- Record matches with file path and line number
+### 4. Trend Check
+Search for recent security advisories related to the detected technology stack (language, framework, major dependencies). Incorporate relevant findings into the review. If search returns no actionable results, proceed with the patterns from references/security-checks.md.
+### 5. Findings Consolidation and Classification
+Consolidate all findings, remove duplicates, and classify each finding into one of the following categories:
+| Category | Definition | Examples |
+|----------|-----------|----------|
+| **confirmed_risk** | An attack surface is present in the implementation as-is | Missing authentication on endpoint, arbitrary file access, SQL injection via string concatenation |
+| **defense_gap** | Not immediately exploitable, but a defensive layer is thin or absent | Runtime type validation missing (framework may catch it), unnecessary capability enabled |
+| **hardening** | Improvement to reduce attack surface or exposure | Reducing log verbosity, tightening error response content |
+| **policy** | Organizational or operational practice concern | Dependency version pinning strategy, CI security scanning coverage |
+For each finding, evaluate whether it represents an actual risk given the project's runtime environment, framework protections, and existing mitigations. Discard false positives.
+### Category-Specific Rationale (required per finding)
+Each finding must include a `rationale` field whose content depends on the category:
+| Category | Rationale must explain |
+|----------|----------------------|
+| **confirmed_risk** | Why the attack surface is exploitable as-is |
+| **defense_gap** | What defensive layer is being relied upon, and why it may be insufficient |
+| **hardening** | Why the current state is acceptable, and what improvement would add |
+| **policy** | Why this is not a technical vulnerability (what mitigates the technical risk) |
+## Output Format
+```json
+{
+  "status": "approved|approved_with_notes|needs_revision|blocked",
+  "summary": "[1-2 sentence summary]",
+  "filesReviewed": 5,
+  "findings": [
+    {
+      "category": "confirmed_risk|defense_gap|hardening|policy",
+      "confidence": "high|medium|low",
+      "location": "[file:line]",
+      "description": "[specific issue found]",
+      "rationale": "[category-specific, see Category-Specific Rationale]",
+      "suggestion": "[specific fix]"
+    }
+  ],
+  "notes": "[summary of hardening/policy findings for completion report, present when status is approved_with_notes]",
+  "requiredFixes": [
+    "[specific fix 1 — only confirmed_risk and qualifying defense_gap items]"
+  ]
+}
+```
+## Status Determination
+### blocked
+- Credentials, API keys, or tokens found in committed code
+- High-confidence confirmed_risk that enables direct exploitation (missing authentication on public endpoint, arbitrary file access)
+- Escalate immediately with finding details — requires human intervention
+### needs_revision
+- One or more confirmed_risk findings
+- Multiple defense_gap findings that affect primary input boundaries
+- `requiredFixes` lists only confirmed_risk and qualifying defense_gap items
+### approved_with_notes
+- Findings are limited to hardening and/or policy categories
+- Or defense_gap findings exist but are isolated and do not affect primary input boundaries
+- Notes are included in the completion report for awareness
+### approved
+- No meaningful findings after consolidation
+## Quality Checklist
+- [ ] Design Doc Security Considerations extracted and each item verified
+- [ ] Each Security section subsection checked against implementation
+- [ ] All Stable Patterns from security-checks.md searched
+- [ ] All Trend-Sensitive Patterns from security-checks.md searched
+- [ ] Technology stack trend check performed
+- [ ] Each finding classified into confirmed_risk / defense_gap / hardening / policy
+- [ ] False positives excluded considering runtime environment and existing mitigations
+- [ ] Committed secrets checked (blocked status if found)
+## Completion Gate [BLOCKING]
+☐ All completion criteria met with evidence
+☐ Output format validated (JSON with status and findings)
+☐ Quality standards satisfied (quality checklist fully checked)
+**ENFORCEMENT**: HALT if any gate unchecked. Return incomplete status to caller.
+"""
+[[skills.config]]
+path = ".agents/skills/coding-rules/SKILL.md"
+enabled = true

package/.codex/agents/task-executor-frontend.toml CHANGED Viewed

@@ -191,6 +191,10 @@ Examples: `docs/plans/analysis/component-research.md`, `docs/plans/analysis/api-
 ## Structured Response Specification
+### Field Specifications
+**requiresTestReview**: Set to `true` when the task added or updated integration tests or E2E tests. Set to `false` for unit-test-only tasks or tasks with no tests.
 ### 1. Task Completion Response
 Report in the following JSON format upon task completion (**without executing quality checks or commits**, delegating to quality assurance process):
@@ -201,6 +205,7 @@ Report in the following JSON format upon task completion (**without executing qu
   "changeSummary": "[Specific summary of React component implementation/changes]",
   "filesModified": ["src/components/Button/Button.tsx", "src/components/Button/index.ts"],
   "testsAdded": ["src/components/Button/Button.test.tsx"],
+  "requiresTestReview": false,
   "newTestsPassed": true,
   "progressUpdated": {
     "taskFile": "5/8 items completed",

package/.codex/agents/task-executor.toml CHANGED Viewed

@@ -192,6 +192,10 @@ Examples: `docs/plans/analysis/research-results.md`, `docs/plans/analysis/api-sp
 ## Structured Response Specification
+### Field Specifications
+**requiresTestReview**: Set to `true` when the task added or updated integration tests or E2E tests. Set to `false` for unit-test-only tasks or tasks with no tests.
 ### 1. Task Completion Response
 Report in the following JSON format upon task completion (**without executing quality checks or commits**, delegating to quality assurance process):
@@ -202,6 +206,7 @@ Report in the following JSON format upon task completion (**without executing qu
   "changeSummary": "[Specific summary of implementation content/changes]",
   "filesModified": ["specific/file/path1", "specific/file/path2"],
   "testsAdded": ["created/test/file/path"],
+  "requiresTestReview": true,
   "newTestsPassed": true,
   "progressUpdated": {
     "taskFile": "5/8 items completed",