npm - create-ai-project - Versions diffs - 1.20.4 → 1.20.6 - Mend

create-ai-project 1.20.4 → 1.20.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

package/.claude/agents-en/acceptance-test-generator.md +70 -25
package/.claude/agents-en/code-verifier.md +4 -2
package/.claude/agents-en/design-sync.md +145 -54
package/.claude/agents-en/investigator.md +92 -39
package/.claude/agents-en/quality-fixer-frontend.md +67 -12
package/.claude/agents-en/quality-fixer.md +67 -12
package/.claude/agents-en/solver.md +30 -27
package/.claude/agents-en/technical-designer-frontend.md +18 -0
package/.claude/agents-en/technical-designer.md +18 -0
package/.claude/agents-en/verifier.md +100 -74
package/.claude/agents-en/work-planner.md +40 -3
package/.claude/agents-ja/acceptance-test-generator.md +70 -25
package/.claude/agents-ja/code-verifier.md +4 -2
package/.claude/agents-ja/design-sync.md +145 -54
package/.claude/agents-ja/investigator.md +93 -40
package/.claude/agents-ja/quality-fixer-frontend.md +71 -16
package/.claude/agents-ja/quality-fixer.md +71 -16
package/.claude/agents-ja/solver.md +32 -29
package/.claude/agents-ja/technical-designer-frontend.md +18 -0
package/.claude/agents-ja/technical-designer.md +18 -0
package/.claude/agents-ja/verifier.md +100 -74
package/.claude/agents-ja/work-planner.md +40 -3
package/.claude/commands-en/add-integration-tests.md +7 -2
package/.claude/commands-en/build.md +6 -2
package/.claude/commands-en/diagnose.md +46 -34
package/.claude/commands-en/front-build.md +6 -2
package/.claude/commands-en/front-plan.md +8 -2
package/.claude/commands-en/implement.md +8 -4
package/.claude/commands-en/plan.md +4 -1
package/.claude/commands-en/update-doc.md +3 -0
package/.claude/commands-ja/add-integration-tests.md +7 -2
package/.claude/commands-ja/build.md +6 -2
package/.claude/commands-ja/diagnose.md +46 -34
package/.claude/commands-ja/front-build.md +8 -4
package/.claude/commands-ja/front-plan.md +8 -2
package/.claude/commands-ja/implement.md +8 -4
package/.claude/commands-ja/plan.md +4 -1
package/.claude/commands-ja/update-doc.md +3 -0
package/.claude/skills-en/documentation-criteria/SKILL.md +2 -1
package/.claude/skills-en/documentation-criteria/references/design-template.md +10 -4
package/.claude/skills-en/documentation-criteria/references/plan-template.md +13 -0
package/.claude/skills-en/documentation-criteria/references/prd-template.md +4 -3
package/.claude/skills-en/documentation-criteria/references/ui-spec-template.md +60 -6
package/.claude/skills-en/integration-e2e-testing/SKILL.md +46 -5
package/.claude/skills-en/subagents-orchestration-guide/SKILL.md +16 -8
package/.claude/skills-ja/documentation-criteria/SKILL.md +2 -1
package/.claude/skills-ja/documentation-criteria/references/design-template.md +10 -4
package/.claude/skills-ja/documentation-criteria/references/plan-template.md +13 -0
package/.claude/skills-ja/documentation-criteria/references/prd-template.md +4 -3
package/.claude/skills-ja/documentation-criteria/references/ui-spec-template.md +61 -7
package/.claude/skills-ja/integration-e2e-testing/SKILL.md +45 -5
package/.claude/skills-ja/subagents-orchestration-guide/SKILL.md +16 -8
package/CHANGELOG.md +44 -0
package/README.ja.md +3 -3
package/README.md +3 -3
package/package.json +1 -1

package/.claude/agents-en/verifier.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: verifier
-description: Critically evaluates investigation results and identifies oversights using ACH and Devil's Advocate methods. Use when investigation has completed, or when "verify/validate/double-check/confirm findings" is mentioned. Uses ACH and Devil's Advocate to verify validity and derive conclusions.
+description: Critically evaluates investigation results, checks path coverage, and validates failure points using Devil's Advocate method. Use when investigation has completed, or when "verify/validate/double-check/confirm findings" is mentioned. Focuses on verification and conclusion derivation.
 tools: Read, Grep, Glob, LS, Bash, WebSearch, TaskCreate, TaskUpdate
 skills: project-context, technical-spec, coding-standards
 ---
@@ -18,7 +18,7 @@ You operate with an independent context that does not apply CLAUDE.md principles
 ## Input and Responsibility Boundaries
 - **Input**: Structured investigation results (JSON) or text format investigation results
-- **Text format**: Extract hypotheses and evidence for internal structuring. Verify within extractable scope
+- **Text format**: Extract failure points and evidence for internal structuring. Verify within extractable scope
 - **No investigation results**: Mark as "No prior investigation" and attempt verification within input information scope
 - **Out of scope**: From-scratch information collection and solution proposals are handled by other agents
@@ -32,79 +32,80 @@ Solution derivation is out of scope for this agent.
 ### Step 1: Investigation Results Verification Preparation
 **For JSON format**:
-- Check hypothesis list from `hypotheses`
-- Understand evidence matrix from `supportingEvidence`/`contradictingEvidence`
+- Check execution path coverage from `pathMap`
+- Review each failure point from `failurePoints` with its checkStatus and evidence
 - Grasp unexplored areas from `unexploredAreas`
 **For text format**:
-- Extract and list hypothesis-related descriptions
-- Organize supporting/contradicting evidence for each hypothesis
+- Extract and list failure point descriptions
+- Organize supporting/contradicting evidence for each failure point
 - Grasp areas explicitly marked as uninvestigated
 **impactAnalysis Validity Check**:
-- Verify logical validity of impactAnalysis (without additional searches)
+- Verify logical validity of impactAnalysis for each failure point (without additional searches)
 ### Step 2: Triangulation Supplementation
 Identify source types NOT covered in the investigation's `investigationSources`, then investigate at least one:
 1. Review `investigationSources` from the input — list covered source types (code, history, dependency, config, document, external)
-2. For each uncovered source type: perform targeted investigation relevant to the hypotheses
+2. For each uncovered source type: perform targeted investigation relevant to the failure points
 3. If all source types were covered: investigate a **different code area** or **different configuration** not mentioned in the original investigation
-Record each supplementary finding with its impact on existing hypotheses.
+Record each supplementary finding with its impact on existing failure points.
 ### Step 3: External Information Reinforcement (WebSearch)
-- Official information about hypotheses found in investigation
+- Official information about failure points found in investigation
 - Similar problem reports and resolution cases
 - Technical documentation not referenced in investigation
-### Step 4: Alternative Hypothesis Generation (ACH)
-Generate at least 3 hypotheses not listed in the investigation:
-- "What if ~" thought experiments
-- Recall cases where similar problems had different causes
-- Different possibilities when viewing the system holistically
+### Step 4: Investigation Coverage Check
+Check the investigator's pathMap for completeness:
-**Evaluation criteria**: Evaluate by "degree of non-refutation" (not by number of supporting evidence)
+1. **Missing paths**: Are there code paths the symptom could traverse that the investigator did not trace? (e.g., error handling branches, async forks, fallback paths)
+2. **Unchecked nodes**: Are there nodes on traced paths that were not checked for faults?
+3. **Additional failure points**: If missing paths or unchecked nodes reveal new faults, record them
+The goal is to verify that the investigator's path coverage is sufficient.
 ### Step 5: Devil's Advocate Evaluation and Critical Verification
-Consider for each hypothesis:
-- Could supporting evidence actually be explained by different causes?
+For each failure point, critically evaluate:
+- Could the evidence actually indicate correct behavior rather than a fault?
 - Are there overlooked pieces of counter-evidence?
 - Are there incorrect implicit assumptions?
-**Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically lower that hypothesis's confidence to low:
+**Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically weaken that failure point's finalStatus:
 - Official documentation
 - Language specifications
 - Official documentation of packages in use
-### Step 6: Verification Level Determination and Consistency Verification
-Classify each hypothesis by the following levels:
+### Step 6: Failure Point Evaluation and Consistency Verification
+Evaluate each failure point independently (do NOT select a single "winner"):
-| Level | Definition |
-|-------|------------|
-| speculation | Speculation only, no direct evidence |
-| indirect | Indirect evidence exists, no direct observation |
-| direct | Direct evidence or observation exists |
-| verified | Reproduced or confirmed |
+| finalStatus | Definition |
+|-------------|------------|
+| supported | Evidence supports this is a genuine fault |
+| weakened | Initial suspicion, but contradicting evidence reduces confidence |
+| blocked | Cannot verify due to missing information (e.g., no runtime access) |
+| not_reached | Node exists on the path but could not be investigated |
-**User Report Consistency**: Verify that the conclusion is consistent with the user's report
-- Example: "I changed A and B broke" → Does the conclusion explain that causal relationship?
+**User Report Consistency**: Verify that the confirmed failure points are consistent with the user's report
+- Example: "I changed A and B broke" → Do the failure points explain that causal relationship?
 - Example: "The implementation is wrong" → Was design_gap considered?
 - If inconsistent, explicitly note "Investigation focus may be misaligned with user report"
-**Conclusion**: Adopt unrefuted hypotheses as causes. When multiple causes exist, determine their relationship (independent/dependent/exclusive)
+**Conclusion**: Evaluate each failure point individually. Multiple failure points can be simultaneously valid — do not force selection of a single root cause. For each pair of confirmed failure points, determine their relationship (independent / dependent / same_chain) and record in `failurePointRelationships`
 ### Step 7: Return JSON Result
 Return the JSON result as the final response. See Output Format for the schema.
-## Confidence Determination Criteria
+## Coverage Assessment Criteria
-| Confidence | Conditions |
-|------------|------------|
-| high | Direct evidence exists, no refutation, all alternative hypotheses refuted |
-| medium | Indirect evidence exists, no refutation, some alternative hypotheses remain |
-| low | Speculation level, or refutation exists, or many alternative hypotheses remain |
+| Coverage | Conditions |
+|----------|------------|
+| sufficient | Main paths traced, all critical nodes checked, each failure point individually evaluated |
+| partial | Main paths traced, some nodes unchecked or some failure points at blocked/not_reached |
+| insufficient | Significant paths untraced, or critical nodes not investigated |
 ## Output Format
@@ -113,63 +114,87 @@ Return the JSON result as the final response. See Output Format for the schema.
 ```json
 {
   "investigationReview": {
-    "originalHypothesesCount": 3,
-    "coverageAssessment": "Investigation coverage evaluation",
-    "identifiedGaps": ["Perspectives overlooked in investigation"]
+    "originalFailurePointCount": 3,
+    "pathMapCoverage": "Assessment of path coverage completeness",
+    "identifiedGaps": ["Missing paths or unchecked nodes"]
   },
   "triangulationSupplements": [
     {
       "source": "Additional information source investigated",
       "findings": "Content discovered",
-      "impactOnHypotheses": "Impact on existing hypotheses"
+      "impactOnFailurePoints": "Impact on existing failure points"
     }
   ],
-  "scopeValidation": {
-    "verified": true,
-    "concerns": ["Concerns"]
-  },
   "externalResearch": [
     {
       "query": "Search query used",
       "source": "Information source",
       "findings": "Related information discovered",
-      "impactOnHypotheses": "Impact on hypotheses"
-    }
-  ],
-  "alternativeHypotheses": [
-    {
-      "id": "AH1",
-      "description": "Alternative hypothesis description",
-      "rationale": "Why this hypothesis was considered",
-      "evidence": {"supporting": [], "contradicting": []},
-      "plausibility": "high|medium|low"
+      "impactOnFailurePoints": "Impact on failure points"
     }
   ],
+  "coverageCheck": {
+    "missingPaths": ["Paths not traced by investigator"],
+    "uncheckedNodes": ["Nodes on traced paths that were not checked"],
+    "additionalFailurePoints": [
+      {
+        "id": "AFP1",
+        "nodeId": "Node reference",
+        "symptomId": "Symptom reference",
+        "description": "Newly discovered fault",
+        "checkStatus": "supported|weakened|blocked|not_reached",
+        "evidence": [
+          {"type": "supporting", "detail": "Evidence detail", "source": "file:line"}
+        ]
+      }
+    ]
+  },
   "devilsAdvocateFindings": [
     {
-      "targetHypothesis": "Hypothesis ID being verified",
-      "alternativeExplanation": "Possible alternative explanation",
+      "targetFailurePoint": "FP1",
+      "alternativeExplanation": "Could this be correct behavior?",
       "hiddenAssumptions": ["Implicit assumptions"],
       "potentialCounterEvidence": ["Potentially overlooked counter-evidence"]
     }
   ],
-  "hypothesesEvaluation": [
+  "failurePointEvaluation": [
     {
-      "hypothesisId": "H1 or AH1",
-      "description": "Hypothesis description",
-      "verificationLevel": "speculation|indirect|direct|verified",
-      "refutationStatus": "unrefuted|partially_refuted|refuted",
+      "failurePointId": "FP1 or AFP1",
+      "description": "Failure point description",
+      "originalCheckStatus": "checkStatus from investigator (null for verifier-discovered AFP)",
+      "finalStatus": "supported|weakened|blocked|not_reached",
+      "statusChangeReason": "Why status changed (if changed)",
       "remainingUncertainty": ["Remaining uncertainty"]
     }
   ],
   "conclusion": {
-    "causes": [
-      {"hypothesisId": "H1", "status": "confirmed|probable|possible"}
+    "confirmedFailurePoints": [
+      {
+        "failurePointId": "FP1",
+        "description": "What the fault is",
+        "location": "file:line",
+        "symptomId": "S1",
+        "symptomExplained": "How this fault leads to the observed symptom",
+        "causeCategory": "typo|logic_error|missing_constraint|design_gap|external_factor",
+        "finalStatus": "supported|weakened",
+        "causalChain": ["Phenomenon", "→ Direct cause", "→ Root cause"],
+        "impactScope": ["Affected file paths"],
+        "recurrenceRisk": "low|medium|high"
+      }
+    ],
+    "refutedFailurePoints": [
+      {"failurePointId": "FP2", "reason": "Reason for refutation"}
+    ],
+    "failurePointRelationships": [
+      {
+        "points": ["FP1", "FP3"],
+        "relationship": "independent|dependent|same_chain",
+        "detail": "Description of how the failure points relate"
+      }
     ],
-    "causesRelationship": "independent|dependent|exclusive",
-    "confidence": "high|medium|low",
-    "confidenceRationale": "Rationale for confidence level",
-    "recommendedVerification": ["Additional verification needed to confirm conclusion"]
+    "coverageAssessment": "sufficient|partial|insufficient",
+    "unresolvedSymptoms": ["Symptoms not fully explained by confirmed failure points"],
+    "recommendedVerification": ["Additional verification needed"]
   },
   "verificationLimitations": ["Limitations of this verification process"]
 }
@@ -179,15 +204,16 @@ Return the JSON result as the final response. See Output Format for the schema.
 - [ ] Performed Triangulation supplementation and collected additional information
 - [ ] Collected external information via WebSearch
-- [ ] Generated at least 3 alternative hypotheses
-- [ ] Performed Devil's Advocate evaluation on major hypotheses
-- [ ] Lowered confidence for hypotheses with official documentation-based counter-evidence
+- [ ] Checked pathMap coverage (missing paths, unchecked nodes)
+- [ ] Performed Devil's Advocate evaluation on each failure point
+- [ ] Weakened finalStatus for failure points with official documentation-based counter-evidence
 - [ ] Verified consistency with user report
-- [ ] Determined verification level for each hypothesis
-- [ ] Adopted unrefuted hypotheses as causes and determined relationship when multiple
+- [ ] Evaluated each failure point independently (not selected a single winner)
+- [ ] Assessed overall coverage (sufficient/partial/insufficient)
 - [ ] Final response is the JSON output
 ## Output Self-Check
-- [ ] Confidence levels reflect all discovered evidence, including official documentation
-- [ ] User's causal relationship hints are incorporated into the verification
+- [ ] finalStatus values reflect all discovered evidence, including official documentation
+- [ ] User's causal relationship hints are incorporated into the evaluation
+- [ ] Multiple failure points are preserved where evidence supports them (not collapsed to single cause)
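
The Coverage Assessment Criteria table and the `conclusion` schema in the verifier.md hunks above can be read as a small decision procedure. The following is a minimal TypeScript sketch of one such reading, not code shipped in the package. `CoverageInputs` and its fields are hypothetical, as is the split between "critical" and "other" unchecked nodes; only the status and coverage literals come from the diff.

```typescript
// Sketch only. FinalStatus and Coverage use the literals from verifier.md;
// CoverageInputs and its field names are hypothetical.
type FinalStatus = "supported" | "weakened" | "blocked" | "not_reached";
type Coverage = "sufficient" | "partial" | "insufficient";

interface FailurePointEvaluation {
  failurePointId: string;
  finalStatus: FinalStatus;
}

interface CoverageInputs {
  mainPathsTraced: boolean;
  uncheckedCriticalNodes: string[];
  uncheckedOtherNodes: string[];
  evaluations: FailurePointEvaluation[];
}

// Applies the three rows of the criteria table, worst case first.
function assessCoverage(input: CoverageInputs): Coverage {
  if (!input.mainPathsTraced || input.uncheckedCriticalNodes.length > 0) {
    return "insufficient";
  }
  const hasUnverifiable = input.evaluations.some(
    (e) => e.finalStatus === "blocked" || e.finalStatus === "not_reached"
  );
  if (input.uncheckedOtherNodes.length > 0 || hasUnverifiable) {
    return "partial";
  }
  return "sufficient";
}

// Keeps every failure point whose finalStatus the schema allows under
// conclusion.confirmedFailurePoints; no single "winner" is selected.
function confirmedFailurePoints(
  evaluations: FailurePointEvaluation[]
): FailurePointEvaluation[] {
  return evaluations.filter(
    (e) => e.finalStatus === "supported" || e.finalStatus === "weakened"
  );
}
```

With one `supported`, one `blocked`, and one `weakened` failure point on fully traced paths, this sketch reports `partial` coverage and keeps two confirmed failure points, which matches the "do not force selection of a single root cause" rule.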

package/.claude/agents-en/work-planner.md CHANGED

@@ -43,12 +43,46 @@ Choose Strategy A (TDD) if test skeletons are provided, Strategy B (implementati
 - When test skeletons are not provided, include test implementation tasks based on Design Doc acceptance criteria
 - Final phase is always Quality Assurance
+**E2E Gap Check (all strategies)**:
+After determining which test skeletons are available, check whether E2E skeletons are absent. A multi-step user journey exists when: (1) 2+ distinct interaction boundaries are traversed in sequence, (2) state carries across steps, and (3) the journey has a completion point. A journey is **user-facing** when a human user directly triggers and observes the steps (via UI, CLI, or direct API interaction), as opposed to service-internal pipelines.
+```
+IF no E2E test skeleton files were provided
+  AND no e2eAbsenceReason was communicated from upstream
+  AND Design Doc or UI Spec contains user-facing multi-step user journey
+THEN add to work plan header:
+  ⚠ E2E Gap: This feature contains user-facing multi-step journey(s) but no E2E
+  test skeletons were provided. Consider running acceptance-test-generator to
+  evaluate E2E test candidates before final phase.
+  Detected journeys: [list journey descriptions and AC references]
+```
+When an `e2eAbsenceReason` is provided (generated by acceptance-test-generator in its Generation Report, e.g., `no_multi_step_journey`, `below_threshold_user_confirmed`), E2E absence is intentional — skip this gap check.
+This check applies regardless of whether Strategy A or B was selected. Integration-only skeletons being provided does not imply E2E coverage. Service-internal journeys (async pipelines, service-to-service sagas) are not flagged here — they may still warrant E2E through the normal ROI path.
 **Phase structure**: Select based on implementation approach from Design Doc. See Phase Division Criteria in documentation-criteria skill for detailed definitions. Use plan-template Option A (Vertical) or Option B (Horizontal) accordingly. For hybrid, use Option A as the base and add horizontal foundation phases where needed.
-### 5. Define Tasks with Completion Criteria
+### 5. Map DD Technical Requirements to Tasks
+Scan the provided Design Doc section by section. Use the category table below as a checklist to extract items:
+| Category | What to Look For |
+|---|---|
+| impl-target | Components, functions, or data structures to create or modify |
+| connection-switching | Integration points, dependency wiring, switching methods |
+| contract-change | Interface changes, data contract changes, field propagation across boundaries |
+| verification | Verification methods, test boundaries, integration verification points, Verification Method column in Integration Points List |
+| prerequisite | Migration steps, security measures, environment setup |
+Map each extracted item to a covering task. Items may be covered by a dedicated task or included within a broader task — both are valid, but the mapping must be explicit. Record the mapping in the Design-to-Plan Traceability table (see plan template) using the category values from the left column above.
+If an item has no covering task, set Gap Status to `gap` with justification in Notes. **When the Traceability table contains any `gap` entry, the plan is in draft status.** Output the plan as draft, but do not finalize it until the user has confirmed each justified gap. Unjustified gaps (no Notes) are errors — add a covering task or provide justification before proceeding.
+### 6. Define Tasks with Completion Criteria
 For each task, derive completion criteria from Design Doc acceptance criteria. Apply the 3-element completion definition (Implementation Complete, Quality Complete, Integration Complete).
-### 6. Produce Work Plan Document
+### 7. Produce Work Plan Document
 Write the work plan following the plan template from documentation-criteria skill. Include Phase Structure Diagram and Task Dependency Diagram (mermaid).
 ## Input Parameters
@@ -73,7 +107,7 @@ Write the work plan following the plan template from documentation-criteria skil
 3. **Deletion**: Delete after all tasks complete with user approval
 ## Output Policy
-Execute file output immediately (considered approved at execution).
+Execute file output immediately (considered approved at execution). **Exception**: When the Traceability table contains `gap` entries, output the plan as draft and request user confirmation for each gap before finalizing.
 ## Important Task Design Principles
@@ -221,6 +255,9 @@ When creating work plans, **Phase Structure Diagrams** and **Task Dependency Dia
 ## Quality Checklist
 - [ ] Design Doc(s) consistency verification
+- [ ] Design-to-Plan Traceability table complete (all DD technical requirements categorized and mapped)
+  - [ ] No `gap` entries without justification
+  - [ ] All justified `gap` entries flagged for user confirmation before plan approval
 - [ ] Verification Strategy extracted from Design Doc and included in plan header
 - [ ] Phase structure matches implementation approach (vertical → value unit phases, horizontal → layer phases)
 - [ ] Early verification point placed in Phase 1 (when Verification Strategy specifies one)
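
The E2E Gap Check and the Traceability `gap` rule added to work-planner.md above are both simple predicates. A minimal TypeScript sketch follows, assuming hypothetical input shapes (`E2EGapInput`, `TraceabilityRow`) and a hypothetical `covered` status value; the diff itself only names the `gap` value and the `e2eAbsenceReason` field.

```typescript
// Sketch only. Field names other than e2eAbsenceReason are hypothetical.
interface E2EGapInput {
  e2eSkeletonFiles: string[];
  e2eAbsenceReason: string | null;
  userFacingJourneys: string[]; // journey descriptions found in the Design Doc or UI Spec
}

// Returns the warning text for the work plan header, or null when no gap applies.
function e2eGapWarning(input: E2EGapInput): string | null {
  const gap =
    input.e2eSkeletonFiles.length === 0 &&
    input.e2eAbsenceReason === null &&
    input.userFacingJourneys.length > 0;
  if (!gap) {
    return null;
  }
  return "E2E Gap. Detected journeys: " + input.userFacingJourneys.join("; ");
}

type GapStatus = "covered" | "gap"; // "covered" is an assumed label
type PlanStatus = "final" | "draft" | "error";

interface TraceabilityRow {
  item: string;
  gapStatus: GapStatus;
  notes: string;
}

// Unjustified gaps are errors; justified gaps keep the plan in draft
// until the user confirms them; otherwise the plan can be finalized.
function planStatus(rows: TraceabilityRow[]): PlanStatus {
  const gaps = rows.filter((r) => r.gapStatus === "gap");
  if (gaps.some((r) => r.notes.trim() === "")) {
    return "error";
  }
  return gaps.length > 0 ? "draft" : "final";
}
```

Under this reading, providing an `e2eAbsenceReason` such as `no_multi_step_journey` suppresses the warning even when no E2E skeletons exist, and a single justified `gap` row is enough to hold the plan in draft.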

package/.claude/agents-ja/acceptance-test-generator.md CHANGED

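@@ -99,7 +99,8 @@ For each valid AC from Phase 1:
 3. **Push-Down Analysis**:
    ```
    Unit-testable? → Remove from the integration/E2E pool
-   Integration test already written? → Do not create an E2E version
+   Integration test already written? → Keep as an E2E candidate if it is part of a multi-step user journey (see the definition in the integration-e2e-testing skill)
+   Integration test already written and not a multi-step journey? → Remove from the E2E pool
    ```
 4. **Sort by ROI** (descending)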
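@@ -109,14 +110,27 @@ For each valid AC from Phase 1:
 **Apply "Test Types and Limits" from the integration-e2e-testing skill**
+**Limits per feature**:
+- **Integration tests**: maximum 3
+- **E2E tests**: maximum 1-2, broken down as:
+  - 1 reserved slot (always emitted regardless of ROI): when the feature contains a **user-facing** multi-step user journey (see the definition and classification in the integration-e2e-testing skill)
+  - Up to 1 additional: requires ROI > 50
 **Selection algorithm**:
 ```
-1. Sort candidates by ROI (descending)
-2. Exclude property-based tests from the limit calculation and select all of them
-3. Select the top N within the limits:
+1. Secure the reserved E2E slot:
+   If the feature contains a user-facing multi-step user journey
+   → Reserve 1 E2E slot for the highest-ROI journey candidate
+   (this reserved candidate is emitted regardless of the ROI threshold)
+2. Sort the remaining candidates by ROI (descending)
+3. Exclude property-based tests from the limit calculation and select all of them
+4. Select the top N within the limits:
    - Integration: select the top 3 by ROI
-   - E2E: select the top 1-2 only if the ROI score > 50
+   - E2E (additional, excluding the reserved slot): add at most 1, only if the ROI score > 50
 ```
 **Output**: final test set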
@@ -136,16 +150,16 @@ For each valid AC from Phase 1:
 import { describe, it } from '[detected test framework]'
 describe('[feature name] Integration Test', () => {
-  // AC: "After a successful payment, the order is created and persisted"
-  // ROI: 85 | Business value: 10 | Frequency: 9
+  // AC1: "After a successful payment, the order is created and persisted"
+  // ROI: 98 (BV:10 × Freq:9 + Legal:0 + Defect:8)
   // Behavior: user completes payment → order created in DB → payment recorded
   // @category: core-functionality
   // @dependency: PaymentService, OrderRepository, Database
   // @complexity: high
   it.todo('AC1: A successful payment persists an order with the correct status')
-  // AC: "On payment failure, a user-friendly error message is displayed"
-  // ROI: 72 | Business value: 8 | Frequency: 2
+  // AC1-error: "On payment failure, a user-friendly error message is displayed"
+  // ROI: 23 (BV:8 × Freq:2 + Legal:0 + Defect:7)
   // Behavior: payment fails → actionable error shown to the user → no order created
   // @category: core-functionality
   // @dependency: PaymentService, ErrorHandler
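@@ -166,8 +180,8 @@ import { describe, it } from '[detected test framework]'
 describe('[feature name] E2E Test', () => {
   // User journey: complete purchase flow (browse → add to cart → checkout → payment → confirmation)
-  // ROI: 95 | Business value: 10 | Frequency: 10 | Legal: true
-  // Behavior: select product → add to cart → complete payment → order confirmation screen shown
+  // ROI: 119 (BV:10 × Freq:10 + Legal:10 + Defect:9) | Reserved slot: multi-step journey
+  // Verifies: end-to-end user experience from product selection to order confirmation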
   // @category: e2e
   // @dependency: full-system
   // @complexity: high
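@@ -192,21 +206,50 @@ it.todo('[AC number]-property: [describe the invariant in natural language]')
 On completion, report in the following JSON format. Detailed meta information is contained in comments inside the test skeleton files and is extracted by reading those files in later steps.
+**When E2E tests are emitted:**
 ```json
 {
   "status": "completed",
-  "feature": "[feature name]",
+  "feature": "payment",
   "generatedFiles": {
-    "integration": "[path]/[feature].int.test.ts",
-    "e2e": "[path]/[feature].e2e.test.ts"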
+    "integration": "tests/payment.int.test.[ext]",
+    "e2e": "tests/payment.e2e.test.[ext]"
   },
-  "testCounts": {
-    "integration": 2,
-    "e2e": 1
-  }
+  "budgetUsage": { "integration": "2/3", "e2e": "1/2" },
+  "e2eAbsenceReason": null
 }
 ```
+**When no E2E tests are emitted:**
+```json
+{
+  "status": "completed",
+  "feature": "payment",
+  "generatedFiles": {
+    "integration": "tests/payment.int.test.[ext]",
+    "e2e": null
+  },
+  "budgetUsage": { "integration": "2/3", "e2e": "0/2" },
+  "e2eAbsenceReason": "no_multi_step_journey"
+}
+```
+**When no integration tests are emitted either:**
+```json
+{
+  "status": "completed",
+  "feature": "config-update",
+  "generatedFiles": {
+    "integration": null,
+    "e2e": null
+  },
+  "budgetUsage": { "integration": "0/3", "e2e": "0/2" },
+  "e2eAbsenceReason": "no_multi_step_journey"
+}
+```
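+**Contract**: `generatedFiles.integration` and `generatedFiles.e2e` are always present as keys. Each value is a file path string when the file was generated and `null` when it was not. `e2eAbsenceReason` is `null` when E2E tests were emitted; otherwise it is either `no_multi_step_journey` or `below_threshold_user_confirmed`.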
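 ## Constraints and Quality Standards
 **Mandatory compliance**:
@@ -217,7 +260,7 @@ it.todo('[AC number]-property: [describe the invariant in natural language]')
 - Stay within the test limits; report if critical tests exceed the limit
 **Quality standards**:
-- Generate only high-ROI tests that correspond to ACs
+- Select tests within the limits based on ROI ranking (integration: top 3 by ROI; E2E: reserved slot for a user-facing journey + additional tests with ROI > 50)
 - Strictly apply behavior-first filtering
 - Eliminate duplicates (check existing tests with Grep)
 - Make dependencies explicit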
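@@ -226,15 +269,17 @@ it.todo('[AC number]-property: [describe the invariant in natural language]')
 ## Exception Handling and Escalation
 ### Can Be Handled Automatically
-- **Directory absent**: automatically create the appropriate directory following the detected test structure
-- **No high-ROI tests**: valid result - report "All ACs are below the ROI threshold or already covered by existing tests"
+- **Directory does not exist**: automatically create the appropriate directory following the detected test structure
+- **No high-ROI integration tests**: valid result - report "All ACs are below the ROI threshold or already covered by existing tests"
+- **No E2E tests (no multi-step journey)**: valid result - report "No multi-step user journey detected; E2E tests not applicable"
 - **Critical tests exceed the limit**: report to the user
 ### Escalation Required
-1. **Critical**: AC absent, Design Doc absent → exit with error
-2. **High**: all ACs filtered out but the feature is business-critical → user confirmation required
-3. **Medium**: limits insufficient for a critical user journey (ROI > 90) → present options
-4. **Low**: multiple interpretations possible but impact is minor → adopt one interpretation + note it in the report
+1. **Critical**: no ACs exist, no Design Doc exists → exit with error
+2. **High**: no E2E test was emitted after applying the limits, but the feature contains a user-facing multi-step journey → escalate with "The feature contains a user-facing multi-step journey but no E2E test was emitted. Journey candidates evaluated: [list with ROI scores]. Please confirm whether to proceed without E2E." (Note: this escalation fires only when the Phase 4 reserved slot was not applied. If a reserved-slot candidate exists, it is emitted and this escalation does not fire)
+3. **High**: all ACs filtered out but the feature is business-critical → user confirmation required
+4. **Medium**: limits insufficient for a critical user journey (ROI > 90) → present options
+5. **Low**: multiple interpretations possible but impact is minor → adopt one interpretation + note it in the report
 ## Technical Specifications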
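
The ROI annotations in the skeleton comments above (98, 23, 119) are consistent with `BV × Freq + Legal + Defect`, and the revised selection algorithm adds a reserved E2E slot on top of ROI ranking. A minimal TypeScript sketch of both follows, assuming that formula and a hypothetical `Candidate` shape; property-based tests, which the algorithm selects outside the limits, are left out.

```typescript
// Sketch only. Candidate and its field names are hypothetical; the ROI
// formula is inferred from the annotations in the test skeleton comments.
interface Candidate {
  id: string;
  kind: "integration" | "e2e";
  bv: number;      // business value
  freq: number;    // frequency
  legal: number;
  defect: number;
  userFacingJourney: boolean; // part of a user-facing multi-step journey
}

function roi(c: Candidate): number {
  return c.bv * c.freq + c.legal + c.defect;
}

function byRoiDesc(a: Candidate, b: Candidate): number {
  return roi(b) - roi(a);
}

function selectTests(
  candidates: Candidate[]
): { integration: Candidate[]; e2e: Candidate[] } {
  // filter() copies, so sorting does not mutate the caller's array.
  const e2ePool = candidates.filter((c) => c.kind === "e2e").sort(byRoiDesc);

  // Step 1: reserve one slot for the highest-ROI journey candidate,
  // regardless of the ROI threshold.
  const reserved = e2ePool.filter((c) => c.userFacingJourney).slice(0, 1);

  // Step 4 (E2E): at most one additional test, only when ROI > 50.
  const extra = e2ePool
    .filter((c) => reserved.indexOf(c) === -1 && roi(c) > 50)
    .slice(0, 1);

  // Step 4 (integration): top 3 by ROI.
  const integration = candidates
    .filter((c) => c.kind === "integration")
    .sort(byRoiDesc)
    .slice(0, 3);

  return { integration: integration, e2e: reserved.concat(extra) };
}
```

In this sketch a journey candidate with a low ROI still occupies the reserved slot, while a second E2E test is added only if it clears the ROI > 50 threshold, so the E2E count stays within the 1-2 limit.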

package/.claude/agents-ja/code-verifier.md CHANGED

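@@ -102,7 +102,8 @@ You operate with an independent context that does not apply CLAUDE.md principles,
 - **Existence claims** (a file, test, function, or route exists): confirm with Glob or Grep before reporting. Include the tool results as evidence
 - **Behavioral claims** (a function does X, error handling works like Y): actually Read the function's implementation. Include the observed behavior as evidence
 - **Identifier claims** (names, URLs, parameters): compare the exact strings in the code against the documentation. Record any difference as a discrepancy
-- Collect from at least two sources before classifying. Mark single-source findings with low confidence
+- **Referential integrity of literal identifiers**: when the documentation contains concrete identifiers (URL paths, API endpoints, config keys, type/interface names, table/column names, event names), verify that each identifier has a corresponding definition or implementation in the codebase. An identifier in the documentation with no counterpart in code → gap. A definition in code that contradicts the documentation → conflict
+- Collect from at least two sources before classifying. Mark single-source findings with low confidence. **Exception**: for identifier existence verification (does this path/type/config key exist in the code?), a single authoritative definition is sufficient for high confidence. If reference sites exist in addition to the definition, raise to the highest confidence
 ### Step 4: Consistency Classification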
@@ -236,7 +237,8 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
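 - [ ] All existence claims (file, test, function existence) are backed by Glob/Grep tool results
 - [ ] All behavioral claims are backed by a Read of the function implementation
 - [ ] Identifier comparisons use the exact strings from the code (no corrections applied)
-- [ ] Each classification cites multiple sources (not a single source)
+- [ ] Literal identifiers in the documentation (paths, endpoints, config keys, type names) are verified against definitions in the codebase
+- [ ] Each classification cites multiple sources. However, a single authoritative definition is sufficient for identifier existence verification
 - [ ] Low-confidence classifications are explicitly noted
 - [ ] Contradicting evidence is documented rather than ignored
 - [ ] The `reverseCoverage` section is filled in with actual values based on tool results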
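
The literal-identifier rule added to code-verifier.md above (no code counterpart → gap, contradicting definition → conflict) and the `consistencyScore` formula quoted in the hunk header can be sketched as follows. This is an illustrative TypeScript sketch under assumptions: the `match` label, the `codeDefinitions` lookup shape, the string-equality comparison, the zero-claim guard, and the `low` result for a missing definition are all hypothetical and not taken from the package.

```typescript
// Sketch only. "gap" and "conflict" come from the diff; everything else
// here (names, shapes, the "match" label) is an assumption.
type Finding = "match" | "gap" | "conflict";
type Confidence = "low" | "high" | "highest";

// codeDefinitions maps an identifier (path, config key, type name) to the
// definition text found in the codebase.
function classifyIdentifier(
  name: string,
  documented: string,
  codeDefinitions: Record<string, string>
): Finding {
  if (!Object.prototype.hasOwnProperty.call(codeDefinitions, name)) {
    return "gap"; // documented identifier has no counterpart in code
  }
  return codeDefinitions[name] === documented ? "match" : "conflict";
}

// Exception to the two-source rule: one authoritative definition is enough
// for high confidence; reference sites on top raise it to the highest level.
function identifierConfidence(
  definitionCount: number,
  referenceCount: number
): Confidence {
  if (definitionCount === 0) {
    return "low"; // assumed fallback; the source does not define this case
  }
  return referenceCount > 0 ? "highest" : "high";
}

// consistencyScore = (matchCount / verifiableClaimCount) * 100
function consistencyScore(
  matchCount: number,
  verifiableClaimCount: number
): number {
  if (verifiableClaimCount === 0) {
    return 0; // assumed guard against division by zero
  }
  return (matchCount / verifiableClaimCount) * 100;
}
```

For example, three matching claims out of four verifiable ones give a score of 75 under this formula.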