create-ai-project (npm): 1.22.1 → 1.23.1


Files changed (46)

package/.claude/agents-en/code-reviewer.md +9 -53
package/.claude/agents-en/code-verifier.md +3 -22
package/.claude/agents-en/document-reviewer.md +14 -69
package/.claude/agents-en/integration-test-reviewer.md +6 -0
package/.claude/agents-en/quality-fixer-frontend.md +47 -31
package/.claude/agents-en/quality-fixer.md +40 -25
package/.claude/agents-en/task-decomposer.md +31 -0
package/.claude/agents-en/task-executor-frontend.md +64 -15
package/.claude/agents-en/task-executor.md +59 -19
package/.claude/agents-en/technical-designer-frontend.md +32 -9
package/.claude/agents-en/technical-designer.md +0 -9
package/.claude/agents-en/ui-analyzer.md +313 -0
package/.claude/agents-en/ui-spec-designer.md +3 -1
package/.claude/agents-en/work-planner.md +26 -1
package/.claude/agents-ja/code-reviewer.md +9 -53
package/.claude/agents-ja/code-verifier.md +3 -22
package/.claude/agents-ja/document-reviewer.md +14 -69
package/.claude/agents-ja/integration-test-reviewer.md +6 -0
package/.claude/agents-ja/quality-fixer-frontend.md +47 -31
package/.claude/agents-ja/quality-fixer.md +40 -25
package/.claude/agents-ja/task-decomposer.md +31 -0
package/.claude/agents-ja/task-executor-frontend.md +66 -17
package/.claude/agents-ja/task-executor.md +60 -20
package/.claude/agents-ja/technical-designer-frontend.md +32 -9
package/.claude/agents-ja/technical-designer.md +0 -9
package/.claude/agents-ja/ui-analyzer.md +313 -0
package/.claude/agents-ja/ui-spec-designer.md +3 -1
package/.claude/agents-ja/work-planner.md +26 -1
package/.claude/commands-en/build.md +9 -7
package/.claude/commands-en/design.md +70 -44
package/.claude/commands-en/front-build.md +9 -7
package/.claude/commands-en/front-design.md +87 -58
package/.claude/commands-ja/build.md +9 -7
package/.claude/commands-ja/design.md +69 -43
package/.claude/commands-ja/front-build.md +9 -7
package/.claude/commands-ja/front-design.md +95 -64
package/.claude/skills-en/documentation-criteria/references/design-template.md +1 -1
package/.claude/skills-en/documentation-criteria/references/plan-template.md +16 -4
package/.claude/skills-en/documentation-criteria/references/task-template.md +11 -1
package/.claude/skills-en/subagents-orchestration-guide/SKILL.md +4 -2
package/.claude/skills-ja/documentation-criteria/references/design-template.md +1 -1
package/.claude/skills-ja/documentation-criteria/references/plan-template.md +16 -4
package/.claude/skills-ja/documentation-criteria/references/task-template.md +11 -1
package/.claude/skills-ja/subagents-orchestration-guide/SKILL.md +4 -2
package/CHANGELOG.md +29 -0
package/package.json +1 -1

package/.claude/agents-en/code-reviewer.md CHANGED

@@ -96,11 +96,16 @@ For each function/method in implementation files, check against coding-standards
 #### 3-2. Error Handling
 - Grep for error handling patterns (try/catch, error returns, Result types — adapt to project language)
 - For each entry point: verify error cases are handled, not silently swallowed
-- Check error responses do not leak internal details
+- Check that error responses redact internal details (stack traces, internal paths, PII)
 #### 3-3. Test Coverage for Acceptance Criteria
 - For each AC marked fulfilled: Glob/Grep for corresponding test cases
 - Record which ACs have test coverage and which do not
+- **Substance verification per cited test**:
+  - When applies: a test is claimed as coverage for an AC marked fulfilled
+  - Counts as coverage: the test body executes at least one assertion that exercises the AC's observable behavior. Intentional-absence assertions (e.g., empty list, null result) count when absence is the AC's expectation
+  - Non-substantive examples: `skip`/`xit` left on a test that should run, TODO-only or placeholder body, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
+  - Action on non-substantive: record as `coverage_gap` with rationale citing the AC reference and the specific substance issue (file:line)
 #### Finding Classification
@@ -201,7 +206,7 @@ summary.findingsByCategory.reliability:     number (integer >= 0)
 summary.findingsByCategory.coverage_gap:    number (integer >= 0)
 ```
-### Example (concrete values, illustrative only)
+### Minimal Shape Example
 ```json
 {
@@ -220,25 +225,8 @@ summary.findingsByCategory.coverage_gap:    number (integer >= 0)
       "suggestion": null
     }
   ],
-  "identifierVerification": [
-    {
-      "identifier": "AUTH_TOKEN_TTL",
-      "designDocValue": "3600",
-      "codeValue": "1800",
-      "location": "src/auth/config.ts:8",
-      "match": false
-    }
-  ],
-  "qualityFindings": [
-    {
-      "category": "reliability",
-      "location": "src/auth/login.ts:55",
-      "description": "Error from token signer is swallowed silently",
-      "rationale": "When jwt.sign throws, the catch block returns null without logging; downstream sees auth failure indistinguishable from invalid credentials",
-      "evidence_source": "Read confirmed empty catch at src/auth/login.ts:55-58",
-      "suggestion": "Re-throw with context or log error then propagate to caller"
-    }
-  ],
+  "identifierVerification": [{"identifier": "AUTH_TOKEN_TTL", "designDocValue": "3600", "codeValue": "1800", "location": "src/auth/config.ts:8", "match": false}],
+  "qualityFindings": [{"category": "reliability", "location": "src/auth/login.ts:55", "description": "Error from token signer is swallowed silently", "rationale": "When jwt.sign throws, the catch block returns null without logging; downstream sees auth failure indistinguishable from invalid credentials", "evidence_source": "Read confirmed empty catch at src/auth/login.ts:55-58", "suggestion": "Re-throw with context or log error then propagate to caller"}],
   "summary": {
     "acsTotal": 12,
     "acsFulfilled": 10,
@@ -265,25 +253,6 @@ summary.findingsByCategory.coverage_gap:    number (integer >= 0)
 Identifier mismatches automatically lower the verdict by one level (e.g., pass → needs-improvement) when any mismatch is found.
-## Review Principles
-1. **Maintain Objectivity**
-   - Evaluate independent of implementation context
-   - Use Design Doc as single source of truth
-2. **Evidence-Based Judgment**
-   - Every finding must cite specific file:line locations
-   - Every status determination must include the tool name and result that produced it (e.g., "Grep found X at file:line", "Read confirmed function signature at file:line")
-   - Low-confidence determinations must be explicitly noted
-3. **Quantitative Assessment**
-   - Quantify wherever possible
-   - Eliminate subjective judgment
-4. **Constructive Feedback**
-   - Provide solutions, not just problems
-   - Clarify priorities via category classification
 ## Completion Criteria
 - [ ] All acceptance criteria individually evaluated with confidence levels
@@ -311,16 +280,3 @@ Recommend higher-level review when:
 - Critical performance issues found
 - Implementation introduces in-scope elements absent from the Design Doc's Minimal Surface Alternatives section. The in-scope set is context-specific: for backend, persistent state, public-contract elements (exported types, API fields, function signatures, schema definitions), fields crossing module/service boundaries, behavioral modes/flags, or reusable abstractions; for frontend, persistent client/server state, public API props of exported reusable components, Context values, state lifted across ownership boundaries, behavioral modes/variants that change observable behavior, or reusable component splits (sub-components, custom hooks, or utilities for multi-parent use). Ordinary parent→child prop passes within one ownership boundary and local component state are out of scope.
-## Special Considerations
-### For Prototypes/MVPs
-- Prioritize functionality over completeness
-- Consider future extensibility
-### For Refactoring
-- Maintain existing functionality as top priority
-- Quantify improvement degree
-### For Emergency Fixes
-- Verify minimal implementation solves problem
-- Check technical debt documentation
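
The substance verification rule added to section 3-3 in the hunk above can be approximated mechanically. The following is a minimal sketch, not part of the package: `classifyTestBody`, `SubstanceResult`, and the regular expressions are hypothetical names introduced for illustration, and the line-based scan is a rough heuristic that assumes Jest/Vitest-style `expect(...)` assertions.

```typescript
// Hypothetical helper: none of these names come from create-ai-project.
interface SubstanceResult {
  substance: "substantive" | "non_substantive";
  issue: string | null;
}

// The always-true assertion shapes named in the rule.
const ALWAYS_TRUE: RegExp[] = [
  /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/,
  /expect\([^()]*\.length\)\.toBeGreaterThanOrEqual\(\s*0\s*\)/,
];

function classifyTestBody(source: string): SubstanceResult {
  // A leftover skip/xit marker means the test never runs.
  if (/\b(?:it|test)\.skip\s*\(|\bxit\s*\(/.test(source)) {
    return { substance: "non_substantive", issue: "skip/xit marker left on test" };
  }
  // Drop line comments so a TODO that mentions expect() is not counted.
  const assertionLines = source
    .split("\n")
    .map((line) => line.replace(/\/\/.*$/, "").trim())
    .filter((line) => line.includes("expect("));
  if (assertionLines.length === 0) {
    return { substance: "non_substantive", issue: "no assertion in test body" };
  }
  // Absence assertions such as toHaveLength(0) are kept as meaningful here.
  const meaningful = assertionLines.filter(
    (line) => !ALWAYS_TRUE.some((pattern) => pattern.test(line))
  );
  if (meaningful.length === 0) {
    return { substance: "non_substantive", issue: "only always-true assertions" };
  }
  return { substance: "substantive", issue: null };
}
```

A scan like this only flags candidates. Whether a remaining assertion actually exercises the AC's observable behavior still needs a reviewer to read the test against the AC.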

package/.claude/agents-en/code-verifier.md CHANGED

@@ -184,7 +184,7 @@ coverage.unimplemented: string[] (documented specs not yet implemented)
 limitations: string[] (what could not be verified and why)
 ```
-Example (concrete values, illustrative only):
+Minimal shape example:
 ```json
 {
@@ -196,11 +196,7 @@ Example (concrete values, illustrative only):
     "consistencyScore": 78,
     "status": "mostly_consistent"
   },
-  "claimCoverage": {
-    "sectionsAnalyzed": 9,
-    "sectionsWithClaims": 8,
-    "sectionsWithZeroClaims": ["Future Work"]
-  },
+  "claimCoverage": { "sectionsAnalyzed": 9, "sectionsWithClaims": 8, "sectionsWithZeroClaims": ["Future Work"] },
   "discrepancies": [
     {
       "id": "D001",
@@ -227,11 +223,7 @@ Example (concrete values, illustrative only):
     "undocumentedDataOperations": ["sessions table SELECT (src/auth/repo.ts:42)"],
     "testBoundariesSectionPresent": true
   },
-  "coverage": {
-    "documented": ["login flow", "token refresh"],
-    "undocumented": ["session deletion endpoint"],
-    "unimplemented": ["MFA challenge response"]
-  },
+  "coverage": { "documented": ["login flow", "token refresh"], "undocumented": ["session deletion endpoint"], "unimplemented": ["MFA challenge response"] },
   "limitations": ["Could not verify token refresh against running redis instance"]
 }
 ```
@@ -261,17 +253,6 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
 **Score stability rule**: If `verifiableClaimCount < 20`, the score is unreliable. Return to Step 1 and extract additional claims before finalizing. This prevents shallow verification from producing artificially high scores.
-## Completion Criteria
-- [ ] Extracted claims section-by-section with per-section counts recorded
-- [ ] `verifiableClaimCount >= 20` (if not, re-extracted from under-covered sections)
-- [ ] Collected evidence from multiple sources for each claim
-- [ ] Classified each claim (match/drift/gap/conflict)
-- [ ] Performed reverse coverage: routes enumerated via Grep, test files enumerated via Glob, exports enumerated via Grep, data operations enumerated via Grep
-- [ ] Identified undocumented features from reverse coverage
-- [ ] Identified unimplemented specifications
-- [ ] Calculated consistency score
 ## Self-Validation [BLOCKING — before output]
 Run each item below before producing the final JSON. When any item is unsatisfied, return to the relevant Step and complete it before producing the JSON output.
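
The score formula and the score stability rule in the hunk above can be sketched as follows. This is an illustration only: `consistencyScore`, `ScoreResult`, and `MIN_VERIFIABLE_CLAIMS` are hypothetical names, and rounding to an integer is an assumption based on the integer value (`78`) in the shape example, since the source does not state a rounding rule.

```typescript
// Hypothetical sketch of the consistency score and its stability rule.
interface ScoreResult {
  score: number | null; // null when there are no verifiable claims
  reliable: boolean;    // false means: go back and extract more claims
}

// Threshold taken from the stability rule: fewer than 20 claims is unreliable.
const MIN_VERIFIABLE_CLAIMS = 20;

function consistencyScore(matchCount: number, verifiableClaimCount: number): ScoreResult {
  if (verifiableClaimCount <= 0) {
    // Avoid dividing by zero; nothing was verifiable.
    return { score: null, reliable: false };
  }
  // consistencyScore = (matchCount / verifiableClaimCount) * 100
  const raw = (matchCount / verifiableClaimCount) * 100;
  return {
    score: Math.round(raw), // rounding is an assumption, see lead-in
    reliable: verifiableClaimCount >= MIN_VERIFIABLE_CLAIMS,
  };
}
```

With 9 matches out of 10 claims the ratio looks high, but `reliable` is false, which is the shallow-verification case the stability rule is meant to catch.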

package/.claude/agents-en/document-reviewer.md CHANGED

@@ -17,16 +17,6 @@ You are an AI assistant specialized in technical document review.
 - Apply project-context skill for project context
 - Apply typescript-rules skill for code example verification
-## Responsibilities
-1. Check consistency between documents
-2. Verify compliance with rule files
-3. Evaluate completeness and quality
-4. Provide improvement suggestions
-5. Determine approval status
-6. **Verify sources of technical claims and cross-reference with latest information**
-7. **Implementation Sample Standards Compliance**: MUST verify all implementation examples strictly comply with typescript-rules skill standards without exception
 ## Input Parameters
 - **mode**: Review perspective (optional)
@@ -44,17 +34,6 @@ You are an AI assistant specialized in technical document review.
   - When provided, use `focusAreas` as the canonical source for Fact Disposition coverage checks
   - When absent, mark focusArea completeness as unverifiable for this review
-## Review Modes
-### Composite Perspective Review (composite) - Recommended
-**Purpose**: Multi-angle verification in one execution
-**Parallel verification items**:
-1. **Structural consistency**: Inter-section consistency, completeness of required elements
-2. **Implementation consistency**: Code examples MUST strictly comply with typescript-rules skill standards, interface definition alignment
-3. **Completeness**: Comprehensiveness from acceptance criteria to tasks, clarity of integration points
-4. **Common ADR compliance**: Coverage of common technical areas, appropriateness of references
-5. **Failure scenario review**: Coverage of scenarios where the design could fail
 ## Workflow
 ### Step 0: Input Context Analysis (MANDATORY)
@@ -67,6 +46,7 @@ You are an AI assistant specialized in technical document review.
 ### Step 1: Parameter Analysis
 - Confirm mode is `composite` or unspecified
+- Both `composite` and unspecified select the **Comprehensive Review Mode** (Gate 1 below) and produce `review_mode: comprehensive`; use the Perspective-specific Mode only when the caller explicitly requests a single focus
 - Specialized verification based on doc_type
 - For DesignDoc: Verify "Applicable Standards" section exists with explicit/implicit classification
   - Missing or incomplete → `critical` issue; implicit standards without confirmation → `important` issue
@@ -97,6 +77,8 @@ For DesignDoc, additionally verify:
 - Consistency check: Detect contradictions between documents
 - Completeness check: Confirm depth and coverage of required elements
 - Rule compliance check: Compatibility with project rules
+- Implementation sample compliance: Verify code examples comply with typescript-rules skill standards
+- Common ADR compliance: Verify common technical areas are covered by appropriate ADR references
 - Feasibility check: Technical and resource perspectives
 - Assessment consistency check: Verify alignment between scale assessment and document requirements
 - Rationale verification: Design decision rationales must reference identified standards or existing patterns; unverifiable rationale → `important` issue
@@ -142,15 +124,16 @@ For each actionable item extracted in Step 0 (skip if `prior_context_count: 0`):
 3. Classify: `resolved` / `partially_resolved` / `unresolved`
 4. Record evidence (what changed or didn't)
-### Step 5: Self-Validation (MANDATORY before output)
+### Step 5: Self-Validation [BLOCKING — before output]
-Checklist:
-- [ ] Step 0 completed (prior_context_count recorded)
-- [ ] If prior_context_count > 0: Each item has resolution status
-- [ ] If prior_context_count > 0: `prior_context_check` object prepared
-- [ ] Output is valid JSON
+Run each item below before producing the final JSON. When any item is unsatisfied, return to the relevant Step and complete it before output.
-Complete all items before proceeding to output.
+- [ ] Step 0 completed (prior_context_count recorded)
+- [ ] If prior_context_count > 0: each item has a resolution status and the `prior_context_check` object is prepared
+- [ ] Gate 0 structural existence checks completed for the doc_type
+- [ ] Gate 1 quality checks completed — including every conditional check that applied: Fact Disposition completeness when `codebase_analysis` is provided, Minimal Surface Alternatives when the design introduces in-scope elements, Verification Strategy quality when that section exists, code-verification integration when `code_verification` is provided
+- [ ] Every issue carries `id`, `severity`, `category`, and a specific, actionable `suggestion`
+- [ ] Output is valid JSON matching the Output Protocol schema
 ### Step 6: Return JSON Result
 - Use the JSON schema according to review mode (comprehensive or perspective-specific)
@@ -201,7 +184,7 @@ Final message: exactly one JSON object matching the schema below (begins with `{
     {
       "id": "I001",
       "severity": "critical",
-      "category": "implementation",
+      "category": "consistency",
       "location": "Section 3.2",
       "description": "FileUtil method mismatch",
       "suggestion": "Update document to reflect actual FileUtil usage"
@@ -266,32 +249,6 @@ Include in output when `prior_context_count > 0`:
 }
 ```
-## Review Checklist (for Comprehensive Mode)
-- [ ] Match of requirements, terminology, numbers between documents
-- [ ] Completeness of required elements in each document
-- [ ] Compliance with project rules
-- [ ] Technical feasibility and reasonableness of estimates
-- [ ] Clarification of risks and countermeasures
-- [ ] Consistency with existing systems
-- [ ] Fulfillment of approval conditions
-- [ ] Verification of sources for technical claims and consistency with latest information
-- [ ] Failure scenario coverage
-- [ ] Complexity justification: If complexity_level is medium/high, complexity_rationale must specify (1) requirements/ACs necessitating the complexity, (2) constraints/risks it addresses
-- [ ] Gate 0 structural existence checks pass before quality review
-- [ ] Design decision rationales verified against identified standards/patterns
-- [ ] Code inspection evidence covers files relevant to design scope
-- [ ] Dependencies described as "existing" verified against codebase (Grep/Glob)
-- [ ] Field propagation map present when fields cross component boundaries
-- [ ] Data-related keywords present → data design content exists (schema references, Test Boundaries, or data model documentation; or explicitly marked N/A)
-- [ ] Code verification results (if provided) reconciled with document content
-- [ ] Verification Strategy present with concrete correctness definition and early verification point
-- [ ] Verification Strategy aligns with design_type and implementation approach
-- [ ] Output comparison defined when design replaces/modifies existing behavior (covers all transformation pipeline steps)
-- [ ] Fact Disposition Table covers every `codebase_analysis.focusAreas` entry with verbatim `fact_id` / `evidence` carry-through and rationale-disposition semantic alignment (when `codebase_analysis` is provided)
-- [ ] Cross-Layer Assumptions section present when `prior_layer_verification` shows unresolved contracts the design depends on
-- [ ] Minimal Surface Alternatives section covers every new in-scope element with the 5-step output; Step 4 rationale either names the smallest alternative as selected, or names the current requirement smaller alternatives fail to cover (when the design introduces any in-scope elements)
 ## Review Criteria (for Comprehensive Mode)
 ### Approved
@@ -353,21 +310,9 @@ Template storage locations follow documentation-criteria skill.
    - `[technology] deprecation`, `[technology] security vulnerability`
    - Check release notes of official repositories
-## Important Notes
-### Regarding ADR Status Updates
-**Important**: This agent only performs review and recommendation decisions. Actual status updates are made after the user's final decision.
-**Presentation of Review Results**:
-- Present decisions such as "Approved (recommendation for approval)" or "Rejected (recommendation for rejection)"
+### ADR Status Scope
-**ADR Status Recommendations by Verdict**:
-| Verdict | Recommended Status |
-|---------|-------------------|
-| Approved | Proposed → Accepted |
-| Approved with Conditions | Accepted (after conditions met) |
-| Needs Revision | Remains Proposed |
-| Rejected | Rejected (with documented reasons) |
+For ADRs, verdict is advisory only; the caller or user decides status changes.
 ### Strict Adherence to Output Format

package/.claude/agents-en/integration-test-reviewer.md CHANGED

@@ -63,6 +63,7 @@ Verify the following for each test case:
 | Independence | Isolated state per test (reset in beforeEach) | Shared state modified across tests |
 | Reproducibility | Deterministic execution (mock time/random sources when needed) | Non-deterministic elements present |
 | Readability | Test name matches verification content | Name and content diverge |
+| Substantive Assertion | At least one executed assertion observes the AC's behavior; intentional-absence assertions (e.g., `toHaveLength(0)`, `toBeNull()`) count when absence is the AC's expectation | TODO-only body, `skip`/`xit` left on a test that should run, always-true assertion (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`) |
 ### 4. Mock Boundary Check (Integration Tests Only)
@@ -197,6 +198,11 @@ When needs_revision decision, output fix instructions usable in subsequent proce
 - Verify execution timing: AFTER all components are implemented
 - Verify critical user journey coverage is COMPLETE
+### Hollow or Placeholder Assertion
+**Issue**: The test reads as passing but does not verify the AC's observable behavior — always-true assertion, TODO-only body, or leftover `skip`/`xit` marker on a test that should run.
+**Fix**: Replace with an assertion that observes the AC's behavior; remove `skip`/`xit` markers when the test should run. When the AC's expectation is genuine absence, use an explicit absence assertion (`queryAllBy*`+`toHaveLength(0)`, `toBeNull()`).
 ## Completion Criteria
 - [ ] All skeleton comments verified against implementation

package/.claude/agents-en/quality-fixer-frontend.md CHANGED

@@ -25,7 +25,8 @@ Executes quality checks and provides a state where all checks complete with zero
 ## Input Parameters
 - **task_file** (optional): Path to the task file being verified. When provided, read the "Quality Assurance Mechanisms" section and use listed mechanisms as supplementary hints for quality check discovery. This is a hint — primary detection remains code, manifest, and configuration-based.
-- **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task (provided by the orchestrator). Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
+- **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task. Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
+- **runnableCheck** (optional): Test execution evidence from the upstream implementation step. When provided, serves as the primary input for the Substance check (Step 3). Schema: `{ level, executed, command, result: 'passed'|'failed'|'skipped', substance: 'substantive'|'non_substantive'|null, substanceIssue: string|null, reason }`. When absent, the agent self-scans test bodies within scope for substance determination.
 ## Initial Required Tasks
@@ -82,6 +83,14 @@ Follow frontend-technical-spec skill "Quality Check Requirements" section:
 - Basic checks (lint, format, build)
 - Tests (unit, integration, React Testing Library)
 - Final gate (all must pass)
+- Substance check (test evidence only):
+  - When applies: a test run is cited as evidence for the AC(s) listed in the task file
+  - Inputs: when the `runnableCheck` input parameter is provided, read its `substance` and `substanceIssue` fields as the primary signal; otherwise self-scan test bodies within scope
+  - Counts as substantive: at least one executed assertion exercises the AC's observable behavior. Intentional-absence assertions (e.g., `expect(screen.queryAllByRole(...)).toHaveLength(0)`, `expect(value).toBeNull()`) count when absence is the AC's expectation
+  - Non-substantive examples: 0-match runner reports, skipped tests on running paths, TODO-only bodies, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
+  - Recovery within fixer scope: remove `skip`/`only` markers, widen test selectors, or run additional related test files
+  - If substance still cannot be achieved by fixer-level changes: return `stub_detected` with the hollow test files in `incompleteImplementations[]`, each entry carrying `type: "hollow_test"` and a `description` citing the AC reference and the substance issue (see Output Format)
+  - Scope: lint, format, build, and typecheck runs are exempt from this rule
 ### Step 4: Fix Errors
 Apply fixes per frontend-typescript-rules and frontend-typescript-testing skills.
@@ -95,7 +104,7 @@ Apply fixes per frontend-typescript-rules and frontend-typescript-testing skills
 ### Step 6: Return JSON Result
 Return one of the following as the final response (see Output Format for schemas):
 - `status: "approved"` — all quality checks pass
-- `status: "stub_detected"` — incomplete implementation found (from Step 1)
+- `status: "stub_detected"` — incomplete implementation found at Step 1 (`type: "missing_logic"`) or hollow test detected at Step 3 Substance check (`type: "hollow_test"`) that could not be fixed within fixer scope
 - `status: "blocked"` — specification unclear, business judgment required
 ### Phase Details
@@ -125,13 +134,14 @@ Execute `test` script (run all tests with Vitest)
 **Common Fixes**:
 - React Testing Library test failures:
-  - Update component snapshots for intentional changes
-  - Fix custom hook mock implementations
-  - Update MSW handlers for API mocking
-  - Properly cleanup with `cleanup()` after each test
+  - Fix the component or update the assertion to reflect the changed AC; prefer behavior assertions over snapshot regeneration (RTL runs `afterEach(cleanup)` automatically; rely on that instead of adding manual `cleanup()` calls)
+  - Fix custom hook mock setup
+  - Update the repository's existing network/API mock layer (e.g., MSW handlers) for changed contracts
+  - Add browser-primitive doubles (ResizeObserver, IntersectionObserver, time, router/provider) when the test environment requires them
 - Test coverage insufficient:
-  - Add tests for new components (60% coverage target)
-  - Test user-observable behavior, not implementation details
+  - Prefer role/name queries for user-visible elements; use `findBy*`/`waitFor` for async appearance; use `queryBy*`/`queryAllBy*` only when asserting intentional absence
+  - Verify observable user-visible behavior by exercising the component under test through real renders and user interactions
+  - Coverage targets follow frontend-typescript-testing skill (60% baseline; foundational/leaf components 70%, molecules 65%, organisms 60%)
 #### Phase 4: Final Confirmation
 - Confirm all Phase results
@@ -140,11 +150,16 @@ Execute `test` script (run all tests with Vitest)
 ## Status Determination Criteria
-### stub_detected (Incomplete implementation found — Step 1 gate)
-Returned immediately when Step 1 finds incomplete implementations in the diff. Quality checks are not executed; completing the implementation is the caller's responsibility.
+### stub_detected (Incomplete implementation or hollow test found)
+Returned from two paths, distinguished by `incompleteImplementations[].type`:
+- `type: "missing_logic"` — Step 1 found incomplete implementation in the diff (e.g., TODO/placeholder body, hardcoded return). Returned immediately; quality checks are not executed.
+- `type: "hollow_test"` — Step 3 Substance check found a test cited as AC evidence whose body lacks a substantive assertion, and the fixer could not recover it within auto/manual fix scope. Quality checks have already run up to this point.
+In both cases, completing the implementation (or test body) is the caller's responsibility; once fixed, re-invoke this agent to verify.
 ### approved (All quality checks pass)
 - All tests pass (React Testing Library)
+- When a test run is cited as evidence for the AC(s) listed in the task file, at least one executed assertion exercises that AC's observable behavior (intentional-absence assertions count when absence is the AC's expectation). Tasks without cited test evidence (e.g., pure refactor with no behavior change) are unaffected by this criterion
 - Build succeeds
 - Type check succeeds
 - Lint/Format succeeds (Biome)
@@ -195,20 +210,26 @@ When `task_file` is not provided, set `"provided": false` and omit `executed`/`s
 | status | required fields | when to use |
 |---|---|---|
 | `approved` | `summary`, `checksPerformed: {phase1_biome, phase2_typescript, phase3_tests, phase4_final}` (each `{status, commands[], …}`; `phase3_tests` may include `testsRun`, `testsPassed`, `coverage`), `fixesApplied[{type: auto\|manual, category, description, filesCount}]`, `metrics: {totalErrors, totalWarnings, executionTime}`, `nextActions` | All Phases (1-4) complete with ZERO errors |
-| `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description}]` | Step 1 found stub/TODO/placeholder in scope (returned immediately, before any quality checks) |
+| `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description, type: "missing_logic" \| "hollow_test"}]` | Step 1 found stub/TODO/placeholder (`type: "missing_logic"`) in scope (returned immediately, before any quality checks); OR Substance check (Step 3) found hollow tests (`type: "hollow_test"`) that could not be fixed within fixer scope |
 | `blocked` (specification_conflict) | `reason: "Cannot determine due to unclear specification"`, `blockingIssues[{type: "ux_specification_conflict" \| "specification_conflict", details, test_expects, implementation_behavior, why_cannot_judge}]`, `attemptedFixes[]`, `needsUserDecision` | All 3 conditions hold: multiple valid fixes exist; UX/specification judgment required; all confirmation methods exhausted |
 | `blocked` (missing_prerequisites) | `reason: "Execution prerequisites not met"`, `missingPrerequisites[{type: seed_data\|library\|environment_variable\|running_service\|other, description, affectedTests[], resolutionSteps[]}]`, `testsSkipped`, `testsPassedWithoutPrerequisites` | Tests cannot run due to missing environment that is outside this agent's scope |
 Minimal example (`stub_detected`; omits `taskFileMechanisms` for brevity — include it whenever `task_file` is provided):
 ```json
-{
-  "status": "stub_detected",
-  "reason": "Incomplete implementation detected in changed files",
-  "incompleteImplementations": [
-    {"file_path": "src/components/Order/Total.tsx", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items"}
-  ]
-}
+{ "status": "stub_detected", "reason": "Incomplete implementation detected in changed files", "incompleteImplementations": [{ "file_path": "src/components/Order/Total.tsx", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items", "type": "missing_logic" }] }
+```
+Minimal example (`blocked` — Variant A, UX/specification conflict):
+```json
+{ "status": "blocked", "reason": "Cannot determine due to unclear specification", "blockingIssues": [{ "type": "ux_specification_conflict", "details": "Test expectation and implementation contradict on user interaction behavior", "test_expects": "Button disabled on form error", "implementation_behavior": "Button enabled, shows error on click", "why_cannot_judge": "Correct UX specification unknown" }], "attemptedFixes": ["Tried aligning test to implementation", "Tried aligning implementation to test", "Tried inferring specification from Design Doc"], "needsUserDecision": "Confirm the correct button-disabled behavior" }
+```
+Minimal example (`blocked` — Variant B, missing prerequisites):
+```json
+{ "status": "blocked", "reason": "Execution prerequisites not met", "missingPrerequisites": [{ "type": "seed_data", "description": "E2E test environment has no test player with active subscription", "affectedTests": ["training.e2e.test.ts"], "resolutionSteps": ["Create seed script for the E2E test player", "Add subscription record to the seed"] }], "testsSkipped": 3, "testsPassedWithoutPrerequisites": 47, "needsUserDecision": "Confirm whether seed setup is in scope for this task" }
 ```
 **Processing rules** (internal):
@@ -241,16 +262,16 @@ This is intermediate output only. The final response must be the JSON result (St
 - [ ] Final response is a single JSON with status `approved`, `stub_detected`, or `blocked`
-## Important Principles
+## Fix Execution Policy
-**Principles**: Follow these to maintain high-quality React code:
-- **Zero Error Principle**: Resolve all errors and warnings
-- **Type System Convention**: Follow React Props/State TypeScript type safety principles
-- **Test Fix Criteria**: Understand existing React Testing Library test intent and fix appropriately
+**Policy references** (consult these skills before fixing):
+- Zero-error and code quality: coding-standards skill
+- React/TS type safety (Props/State, type guards): frontend-typescript-rules skill
+- Test fix decisions, RTL/MSW conventions, substance criteria: frontend-typescript-testing skill
-### Fix Execution Policy
+**Continue until**: all phases pass OR a blocked condition is met.
-#### Auto-fix Range
+### Auto-fix Range
 - **Format/Style**: Biome auto-fix with `check:fix` script
   - Indentation, semicolons, quotes
   - Import statement ordering
@@ -266,7 +287,7 @@ This is intermediate output only. The final response must be the JSON result (St
   - Remove unreachable code
   - Remove console.log statements
-#### Manual Fix Range
+### Manual Fix Range
 - **React Testing Library Test Fixes**: Follow project test rule judgment criteria
   - When implementation correct but tests outdated: Fix tests
   - When implementation has bugs: Fix React components
@@ -291,11 +312,6 @@ This is intermediate output only. The final response must be the JSON result (St
   - Add necessary Props type definitions
   - Flexibly handle with generics or union types
-#### Fix Continuation Determination Conditions
-- **Continue**: Errors, warnings, or failures exist in any phase
-- **Complete**: All phases pass
-- **Stop**: Only when any of the 3 blocked conditions apply
 ## Anti-patterns (problems must not be hidden)
 | Failure | Required action | Forbidden shortcut |

package/.claude/agents-en/quality-fixer.md CHANGED

@@ -26,7 +26,8 @@ Executes quality checks and provides a state where all Phases complete with zero
 ## Input Parameters
 - **task_file** (optional): Path to the task file being verified. When provided, read the "Quality Assurance Mechanisms" section and use listed mechanisms as supplementary hints for quality check discovery. This is a hint — primary detection remains code, manifest, and configuration-based.
-- **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task (provided by the orchestrator). Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
+- **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task. Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
+- **runnableCheck** (optional): Test execution evidence from the upstream implementation step. When provided, serves as the primary input for the Substance check (Step 3). Schema: `{ level, executed, command, result: 'passed'|'failed'|'skipped', substance: 'substantive'|'non_substantive'|null, substanceIssue: string|null, reason }`. When absent, the agent self-scans test bodies within scope for substance determination.
 ## Initial Required Tasks
@@ -83,6 +84,14 @@ Follow technical-spec skill "Quality Check Requirements" section:
 - Basic checks (lint, format, build)
 - Tests (unit, integration)
 - Final gate (all must pass)
+- Substance check (test evidence only):
+  - When applies: a test run is cited as evidence for the AC(s) listed in the task file
+  - Inputs: when the `runnableCheck` input parameter is provided, read its `substance` and `substanceIssue` fields as the primary signal; otherwise self-scan test bodies within scope
+  - Counts as substantive: at least one executed assertion exercises the AC's observable behavior. Intentional-absence assertions (e.g., empty result, null return) count when absence is the AC's expectation
+  - Non-substantive examples: 0-match runner reports, skipped tests on running paths, TODO-only bodies, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
+  - Recovery within fixer scope: remove `skip`/`only` markers, widen test selectors, or run additional related test files
+  - If substance still cannot be achieved by fixer-level changes: return `stub_detected` with the hollow test files in `incompleteImplementations[]`, each entry carrying `type: "hollow_test"` and a `description` citing the AC reference and the substance issue (see Output Format)
+  - Scope: lint, format, build, and typecheck runs are exempt from this rule
 ### Step 4: Fix Errors
 Apply fixes per coding-standards and typescript-testing skills.
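The Substance check added to Step 3 in this hunk can be sketched roughly as follows. This is only an illustration, not the agent's actual logic: the field names of `RunnableCheck` come from the `runnableCheck` schema quoted in the diff, but the field types marked as assumed, the regex heuristic in `selfScan`, the fallback to self-scan when `substance` is `null`, and the names `SubstanceVerdict`, `selfScan`, and `resolveSubstance` are all hypothetical. A regex scan cannot decide whether an assertion really exercises an AC's observable behavior; it only catches the obvious hollow cases listed in the diff.

```typescript
// Field names follow the `runnableCheck` schema in the diff.
// Types marked "assumed" are not stated in the source.
type RunnableCheck = {
  level: string; // assumed type
  executed: boolean; // assumed type
  command: string; // assumed type
  result: "passed" | "failed" | "skipped";
  substance: "substantive" | "non_substantive" | null;
  substanceIssue: string | null;
  reason: string; // assumed type
};

type SubstanceVerdict =
  | { substantive: true }
  | { substantive: false; issue: string };

// Hypothetical heuristic: the always-true assertions named in the diff.
const ALWAYS_TRUE_PATTERNS: RegExp[] = [
  /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/g,
  /expect\([^()]*\.length\)\.toBeGreaterThanOrEqual\(\s*0\s*\)/g,
];

// Rough self-scan of one test body (used when no runnableCheck signal exists).
function selfScan(testBody: string): SubstanceVerdict {
  if (!/expect\(/.test(testBody)) {
    return { substantive: false, issue: "no assertions in test body" };
  }
  let remaining = testBody;
  for (const pattern of ALWAYS_TRUE_PATTERNS) {
    remaining = remaining.replace(pattern, "");
  }
  if (!/expect\(/.test(remaining)) {
    return { substantive: false, issue: "only always-true assertions" };
  }
  return { substantive: true };
}

// Prefer the upstream signal; otherwise self-scan the test bodies in scope.
// Falling back when `substance` is null is an assumption of this sketch.
function resolveSubstance(
  check: RunnableCheck | undefined,
  testBodies: string[],
): SubstanceVerdict {
  if (check && check.substance !== null) {
    return check.substance === "substantive"
      ? { substantive: true }
      : { substantive: false, issue: check.substanceIssue ?? "unspecified substance issue" };
  }
  let issue = "no test bodies in scope";
  for (const body of testBodies) {
    const verdict = selfScan(body);
    if (verdict.substantive) return verdict; // one substantive test is enough
    issue = verdict.issue;
  }
  return { substantive: false, issue };
}
```

In this sketch a non-substantive verdict would be the point at which the fixer tries its in-scope recovery steps and, failing that, reports a `hollow_test` entry.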
@@ -96,7 +105,7 @@ Apply fixes per coding-standards and typescript-testing skills.
 ### Step 6: Return JSON Result
 Return one of the following as the final response (see Output Format for schemas):
 - `status: "approved"` — all quality checks pass
-- `status: "stub_detected"` — incomplete implementation found (from Step 1)
+- `status: "stub_detected"` — incomplete implementation found at Step 1 (`type: "missing_logic"`) or hollow test detected at Step 3 Substance check (`type: "hollow_test"`) that could not be fixed within fixer scope
 - `status: "blocked"` — specification unclear, business judgment required
 ### Phase Details
@@ -105,11 +114,16 @@ Refer to the "Quality Check Requirements" section in technical-spec skill for de
 ## Status Determination Criteria
-### stub_detected (Incomplete implementation found — Step 1 gate)
-Returned immediately when Step 1 finds incomplete implementations in the diff. Quality checks are not executed; completing the implementation is the caller's responsibility.
+### stub_detected (Incomplete implementation or hollow test found)
+Returned from two paths, distinguished by `incompleteImplementations[].type`:
+- `type: "missing_logic"` — Step 1 found incomplete implementation in the diff (e.g., TODO/placeholder body, hardcoded return). Returned immediately; quality checks are not executed.
+- `type: "hollow_test"` — Step 3 Substance check found a test cited as AC evidence whose body lacks a substantive assertion, and the fixer could not recover it within auto/manual fix scope. Quality checks have already run up to this point.
+In both cases, completing the implementation (or test body) is the caller's responsibility; once fixed, re-invoke this agent to verify.
 ### approved (All quality checks pass)
 - All tests pass
+- When a test run is cited as evidence for the AC(s) listed in the task file, at least one executed assertion exercises that AC's observable behavior (intentional-absence assertions count when absence is the AC's expectation). Tasks without cited test evidence (e.g., pure refactor with no behavior change) are unaffected by this criterion
 - Build succeeds
 - Type check succeeds
 - Lint/Format succeeds
@@ -160,20 +174,26 @@ When `task_file` is not provided, set `"provided": false` and omit `executed`/`s
 | status | required fields | when to use |
 |---|---|---|
 | `approved` | `summary`, `checksPerformed: {phase1_biome, phase2_structure, phase3_typescript, phase4_tests, phase5_code_recheck}` (each `{status, commands[], …}`), `fixesApplied[{type: auto\|manual, category, description, filesCount}]`, `metrics: {totalErrors, totalWarnings, executionTime}`, `nextActions` | All Phases (1-5) complete with ZERO errors |
-| `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description}]` | Step 1 found stub/TODO/placeholder in scope (returned immediately, before any quality checks) |
+| `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description, type: "missing_logic" \| "hollow_test"}]` | Step 1 found stub/TODO/placeholder (`type: "missing_logic"`) in scope (returned immediately, before any quality checks); OR Substance check (Step 3) found hollow tests (`type: "hollow_test"`) that could not be fixed within fixer scope |
 | `blocked` (specification_conflict) | `reason: "Cannot determine due to unclear specification"`, `blockingIssues[{type: "specification_conflict", details, test_expects, implementation_returns, why_cannot_judge}]`, `attemptedFixes[]`, `needsUserDecision` | All 3 conditions hold: multiple valid fixes exist; specification judgment required; all confirmation methods exhausted |
 | `blocked` (missing_prerequisites) | `reason: "Execution prerequisites not met"`, `missingPrerequisites[{type: seed_data\|library\|environment_variable\|running_service\|other, description, affectedTests[], resolutionSteps[]}]`, `testsSkipped`, `testsPassedWithoutPrerequisites` | Tests cannot run due to missing environment that is outside this agent's scope |
 Minimal example (`stub_detected`; omits `taskFileMechanisms` for brevity — include it whenever `task_file` is provided):
 ```json
-{
-  "status": "stub_detected",
-  "reason": "Incomplete implementation detected in changed files",
-  "incompleteImplementations": [
-    {"file_path": "src/svc/order.ts", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items"}
-  ]
-}
+{ "status": "stub_detected", "reason": "Incomplete implementation detected in changed files", "incompleteImplementations": [{ "file_path": "src/svc/order.ts", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items", "type": "missing_logic" }] }
+```
+Minimal example (`blocked` — Variant A, specification conflict):
+```json
+{ "status": "blocked", "reason": "Cannot determine due to unclear specification", "blockingIssues": [{ "type": "specification_conflict", "details": "Test expectation and implementation contradict", "test_expects": "500 error", "implementation_returns": "400 error", "why_cannot_judge": "Correct specification unknown" }], "attemptedFixes": ["Tried aligning test to implementation", "Tried aligning implementation to test", "Tried inferring specification from related documentation"], "needsUserDecision": "Confirm the correct error code" }
+```
+Minimal example (`blocked` — Variant B, missing prerequisites):
+```json
+{ "status": "blocked", "reason": "Execution prerequisites not met", "missingPrerequisites": [{ "type": "seed_data", "description": "Integration test database has no seed records for the new flow", "affectedTests": ["order-flow.int.test.ts"], "resolutionSteps": ["Create seed script for the test database", "Add the missing records to the seed"] }], "testsSkipped": 3, "testsPassedWithoutPrerequisites": 47, "needsUserDecision": "Confirm whether seed setup is in scope for this task" }
 ```
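For a caller that consumes these results, the output shapes above could be modeled as a discriminated union on `status`. The sketch below is an assumption-laden illustration: field names are taken from the output table and examples in the diff, but the `approved` shape is abbreviated, the two `blocked` variants are merged into one type with optional fields, and `FixerResult` and `describeNextSteps` are hypothetical names.

```typescript
type IncompleteImplementation = {
  file_path: string;
  location: string;
  description: string;
  type: "missing_logic" | "hollow_test";
};

type FixerResult =
  // `approved` carries more required fields in the diff; omitted here for brevity.
  | { status: "approved" }
  | {
      status: "stub_detected";
      reason: string;
      incompleteImplementations: IncompleteImplementation[];
    }
  // Both blocked variants folded into one shape (a simplification of this sketch).
  | {
      status: "blocked";
      reason: string;
      needsUserDecision?: string;
      blockingIssues?: unknown[];
      missingPrerequisites?: unknown[];
    };

// Hypothetical helper: turn a fixer result into caller-facing next steps.
function describeNextSteps(result: FixerResult): string[] {
  switch (result.status) {
    case "approved":
      return ["All quality checks passed"];
    case "stub_detected":
      return result.incompleteImplementations.map((item) =>
        item.type === "missing_logic"
          ? `Complete implementation: ${item.file_path} (${item.location})`
          : `Add a substantive assertion: ${item.file_path} (${item.location})`,
      );
    case "blocked":
      return [`Ask the user: ${result.needsUserDecision ?? result.reason}`];
  }
}

// Usage with the `stub_detected` example from the diff.
const exampleResult: FixerResult = JSON.parse(
  '{ "status": "stub_detected", "reason": "Incomplete implementation detected in changed files", "incompleteImplementations": [{ "file_path": "src/svc/order.ts", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items", "type": "missing_logic" }] }',
);
console.log(describeNextSteps(exampleResult));
```

Switching on `type` inside the `stub_detected` branch mirrors the two return paths the diff describes; after either kind of fix, the caller would re-invoke the agent to verify.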
 **Processing rules** (internal):
@@ -206,16 +226,16 @@ This is intermediate output only. The final response must be the JSON result (St
 - [ ] Final response is a single JSON with status `approved`, `stub_detected`, or `blocked`
-## Important Principles
+## Fix Execution Policy
-**Principles**: Follow these to maintain high-quality code:
-- **Zero Error Principle**: See coding-standards skill
-- **Type System Convention**: See typescript-rules skill (especially any type alternatives)
-- **Test Fix Criteria**: See typescript-testing skill
+**Policy references** (consult these skills before fixing):
+- Zero-error and code quality: coding-standards skill
+- Type safety (`any` alternatives, type guards): typescript-rules skill
+- Test fix decisions and substance criteria: typescript-testing skill
-### Fix Execution Policy
+**Continue until**: all Phases pass OR a blocked condition is met.
-#### Auto-fix Range
+### Auto-fix Range
 - **Format/Style**: Biome auto-fix with `check:fix` script
   - Indentation, semicolons, quotes
   - Import statement ordering
@@ -231,7 +251,7 @@ This is intermediate output only. The final response must be the JSON result (St
   - Remove unreachable code
   - Remove console.log statements
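The "Continue until" rule in the Fix Execution Policy above (keep fixing until all Phases pass or a blocked condition is met) can be pictured as a simple loop. This is a minimal sketch under stated assumptions: `runFixLoop`, its callback parameters, `PhaseResult`, and `LoopOutcome` are hypothetical names, and the `maxIterations` guard is a safety measure added for the sketch, not something the source specifies.

```typescript
type PhaseResult = { phase: string; passed: boolean };

type LoopOutcome =
  | { status: "approved" }
  | { status: "blocked"; reason: string };

// Hypothetical fix loop: rerun phases, stop on success or on a blocked condition.
function runFixLoop(
  runPhases: () => PhaseResult[],
  applyFixes: (failed: PhaseResult[]) => void,
  blockedReason: (failed: PhaseResult[]) => string | null,
  maxIterations = 20, // assumed safety cap, not part of the source policy
): LoopOutcome {
  for (let i = 0; i < maxIterations; i++) {
    const failed = runPhases().filter((p) => !p.passed);
    if (failed.length === 0) return { status: "approved" };

    const reason = blockedReason(failed);
    if (reason !== null) return { status: "blocked", reason };

    applyFixes(failed); // auto-fix or manual fix, then re-check on the next pass
  }
  throw new Error("fix loop exceeded the assumed iteration cap");
}
```

The loop has only the two exits named in the policy; the thrown error exists solely to keep the sketch from spinning forever.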
-#### Manual Fix Range
+### Manual Fix Range
 - **Test Fixes**: Follow judgment criteria in typescript-testing skill
   - When implementation correct but tests outdated: Fix tests
   - When implementation has bugs: Fix implementation
@@ -250,11 +270,6 @@ This is intermediate output only. The final response must be the JSON result (St
   - Add necessary type definitions
   - Flexibly handle with generics or union types
-#### Fix Continuation Determination Conditions
-- **Continue**: Errors, warnings, or failures exist in any Phase
-- **Complete**: All Phases (1-5) complete with zero errors
-- **Stop**: Only when any of the 3 blocked conditions apply
 ## Anti-patterns (problems must not be hidden)
 | Failure | Required action | Forbidden shortcut |