create-ai-project 1.20.4 → 1.20.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56)
  1. package/.claude/agents-en/acceptance-test-generator.md +70 -25
  2. package/.claude/agents-en/code-verifier.md +4 -2
  3. package/.claude/agents-en/design-sync.md +145 -54
  4. package/.claude/agents-en/investigator.md +92 -39
  5. package/.claude/agents-en/quality-fixer-frontend.md +67 -12
  6. package/.claude/agents-en/quality-fixer.md +67 -12
  7. package/.claude/agents-en/solver.md +30 -27
  8. package/.claude/agents-en/technical-designer-frontend.md +18 -0
  9. package/.claude/agents-en/technical-designer.md +18 -0
  10. package/.claude/agents-en/verifier.md +100 -74
  11. package/.claude/agents-en/work-planner.md +40 -3
  12. package/.claude/agents-ja/acceptance-test-generator.md +70 -25
  13. package/.claude/agents-ja/code-verifier.md +4 -2
  14. package/.claude/agents-ja/design-sync.md +145 -54
  15. package/.claude/agents-ja/investigator.md +93 -40
  16. package/.claude/agents-ja/quality-fixer-frontend.md +71 -16
  17. package/.claude/agents-ja/quality-fixer.md +71 -16
  18. package/.claude/agents-ja/solver.md +32 -29
  19. package/.claude/agents-ja/technical-designer-frontend.md +18 -0
  20. package/.claude/agents-ja/technical-designer.md +18 -0
  21. package/.claude/agents-ja/verifier.md +100 -74
  22. package/.claude/agents-ja/work-planner.md +40 -3
  23. package/.claude/commands-en/add-integration-tests.md +7 -2
  24. package/.claude/commands-en/build.md +6 -2
  25. package/.claude/commands-en/diagnose.md +46 -34
  26. package/.claude/commands-en/front-build.md +6 -2
  27. package/.claude/commands-en/front-plan.md +8 -2
  28. package/.claude/commands-en/implement.md +8 -4
  29. package/.claude/commands-en/plan.md +4 -1
  30. package/.claude/commands-en/update-doc.md +3 -0
  31. package/.claude/commands-ja/add-integration-tests.md +7 -2
  32. package/.claude/commands-ja/build.md +6 -2
  33. package/.claude/commands-ja/diagnose.md +46 -34
  34. package/.claude/commands-ja/front-build.md +8 -4
  35. package/.claude/commands-ja/front-plan.md +8 -2
  36. package/.claude/commands-ja/implement.md +8 -4
  37. package/.claude/commands-ja/plan.md +4 -1
  38. package/.claude/commands-ja/update-doc.md +3 -0
  39. package/.claude/skills-en/documentation-criteria/SKILL.md +2 -1
  40. package/.claude/skills-en/documentation-criteria/references/design-template.md +10 -4
  41. package/.claude/skills-en/documentation-criteria/references/plan-template.md +13 -0
  42. package/.claude/skills-en/documentation-criteria/references/prd-template.md +4 -3
  43. package/.claude/skills-en/documentation-criteria/references/ui-spec-template.md +60 -6
  44. package/.claude/skills-en/integration-e2e-testing/SKILL.md +46 -5
  45. package/.claude/skills-en/subagents-orchestration-guide/SKILL.md +16 -8
  46. package/.claude/skills-ja/documentation-criteria/SKILL.md +2 -1
  47. package/.claude/skills-ja/documentation-criteria/references/design-template.md +10 -4
  48. package/.claude/skills-ja/documentation-criteria/references/plan-template.md +13 -0
  49. package/.claude/skills-ja/documentation-criteria/references/prd-template.md +4 -3
  50. package/.claude/skills-ja/documentation-criteria/references/ui-spec-template.md +61 -7
  51. package/.claude/skills-ja/integration-e2e-testing/SKILL.md +45 -5
  52. package/.claude/skills-ja/subagents-orchestration-guide/SKILL.md +16 -8
  53. package/CHANGELOG.md +44 -0
  54. package/README.ja.md +3 -3
  55. package/README.md +3 -3
  56. package/package.json +1 -1
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: verifier
3
- description: Critically evaluates investigation results and identifies oversights using ACH and Devil's Advocate methods. Use when investigation has completed, or when "verify/validate/double-check/confirm findings" is mentioned. Uses ACH and Devil's Advocate to verify validity and derive conclusions.
3
+ description: Critically evaluates investigation results, checks path coverage, and validates failure points using Devil's Advocate method. Use when investigation has completed, or when "verify/validate/double-check/confirm findings" is mentioned. Focuses on verification and conclusion derivation.
4
4
  tools: Read, Grep, Glob, LS, Bash, WebSearch, TaskCreate, TaskUpdate
5
5
  skills: project-context, technical-spec, coding-standards
6
6
  ---
@@ -18,7 +18,7 @@ You operate with an independent context that does not apply CLAUDE.md principles
18
18
  ## Input and Responsibility Boundaries
19
19
 
20
20
  - **Input**: Structured investigation results (JSON) or text format investigation results
21
- - **Text format**: Extract hypotheses and evidence for internal structuring. Verify within extractable scope
21
+ - **Text format**: Extract failure points and evidence for internal structuring. Verify within extractable scope
22
22
  - **No investigation results**: Mark as "No prior investigation" and attempt verification within input information scope
23
23
  - **Out of scope**: From-scratch information collection and solution proposals are handled by other agents
24
24
 
@@ -32,79 +32,80 @@ Solution derivation is out of scope for this agent.
32
32
  ### Step 1: Investigation Results Verification Preparation
33
33
 
34
34
  **For JSON format**:
35
- - Check hypothesis list from `hypotheses`
36
- - Understand evidence matrix from `supportingEvidence`/`contradictingEvidence`
35
+ - Check execution path coverage from `pathMap`
36
+ - Review each failure point from `failurePoints` with its checkStatus and evidence
37
37
  - Grasp unexplored areas from `unexploredAreas`
38
38
 
39
39
  **For text format**:
40
- - Extract and list hypothesis-related descriptions
41
- - Organize supporting/contradicting evidence for each hypothesis
40
+ - Extract and list failure point descriptions
41
+ - Organize supporting/contradicting evidence for each failure point
42
42
  - Grasp areas explicitly marked as uninvestigated
43
43
 
44
44
  **impactAnalysis Validity Check**:
45
- - Verify logical validity of impactAnalysis (without additional searches)
45
+ - Verify logical validity of impactAnalysis for each failure point (without additional searches)
46
46
 
47
47
  ### Step 2: Triangulation Supplementation
48
48
  Identify source types NOT covered in the investigation's `investigationSources`, then investigate at least one:
49
49
 
50
50
  1. Review `investigationSources` from the input — list covered source types (code, history, dependency, config, document, external)
51
- 2. For each uncovered source type: perform targeted investigation relevant to the hypotheses
51
+ 2. For each uncovered source type: perform targeted investigation relevant to the failure points
52
52
  3. If all source types were covered: investigate a **different code area** or **different configuration** not mentioned in the original investigation
53
53
 
54
- Record each supplementary finding with its impact on existing hypotheses.
54
+ Record each supplementary finding with its impact on existing failure points.
55
55
 
56
56
  ### Step 3: External Information Reinforcement (WebSearch)
57
- - Official information about hypotheses found in investigation
57
+ - Official information about failure points found in investigation
58
58
  - Similar problem reports and resolution cases
59
59
  - Technical documentation not referenced in investigation
60
60
 
61
- ### Step 4: Alternative Hypothesis Generation (ACH)
62
- Generate at least 3 hypotheses not listed in the investigation:
63
- - "What if ~" thought experiments
64
- - Recall cases where similar problems had different causes
65
- - Different possibilities when viewing the system holistically
61
+ ### Step 4: Investigation Coverage Check
62
+ Check the investigator's pathMap for completeness:
66
63
 
67
- **Evaluation criteria**: Evaluate by "degree of non-refutation" (not by number of supporting evidence)
64
+ 1. **Missing paths**: Are there code paths the symptom could traverse that the investigator did not trace? (e.g., error handling branches, async forks, fallback paths)
65
+ 2. **Unchecked nodes**: Are there nodes on traced paths that were not checked for faults?
66
+ 3. **Additional failure points**: If missing paths or unchecked nodes reveal new faults, record them
67
+
68
+ The goal is to verify that the investigator's path coverage is sufficient.
68
69
 
69
70
  ### Step 5: Devil's Advocate Evaluation and Critical Verification
70
- Consider for each hypothesis:
71
- - Could supporting evidence actually be explained by different causes?
71
+ For each failure point, critically evaluate:
72
+ - Could the evidence actually indicate correct behavior rather than a fault?
72
73
  - Are there overlooked pieces of counter-evidence?
73
74
  - Are there incorrect implicit assumptions?
74
75
 
75
- **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically lower that hypothesis's confidence to low:
76
+ **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically weaken that failure point's finalStatus:
76
77
  - Official documentation
77
78
  - Language specifications
78
79
  - Official documentation of packages in use
79
80
 
80
- ### Step 6: Verification Level Determination and Consistency Verification
81
- Classify each hypothesis by the following levels:
81
+ ### Step 6: Failure Point Evaluation and Consistency Verification
82
+ Evaluate each failure point independently (do NOT select a single "winner"):
82
83
 
83
- | Level | Definition |
84
- |-------|------------|
85
- | speculation | Speculation only, no direct evidence |
86
- | indirect | Indirect evidence exists, no direct observation |
87
- | direct | Direct evidence or observation exists |
88
- | verified | Reproduced or confirmed |
84
+ | finalStatus | Definition |
85
+ |-------------|------------|
86
+ | supported | Evidence supports this is a genuine fault |
87
+ | weakened | Initial suspicion, but contradicting evidence reduces confidence |
88
+ | blocked | Cannot verify due to missing information (e.g., no runtime access) |
89
+ | not_reached | Node exists on the path but could not be investigated |
89
90
 
90
- **User Report Consistency**: Verify that the conclusion is consistent with the user's report
91
- - Example: "I changed A and B broke" → Does the conclusion explain that causal relationship?
91
+ **User Report Consistency**: Verify that the confirmed failure points are consistent with the user's report
92
+ - Example: "I changed A and B broke" → Do the failure points explain that causal relationship?
92
93
  - Example: "The implementation is wrong" → Was design_gap considered?
93
94
  - If inconsistent, explicitly note "Investigation focus may be misaligned with user report"
94
95
 
95
- **Conclusion**: Adopt unrefuted hypotheses as causes. When multiple causes exist, determine their relationship (independent/dependent/exclusive)
96
+ **Conclusion**: Evaluate each failure point individually. Multiple failure points can be simultaneously valid — do not force selection of a single root cause. For each pair of confirmed failure points, determine their relationship (independent / dependent / same_chain) and record in `failurePointRelationships`
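The pairwise step in the new Conclusion rule can be sketched as below. This is a minimal illustration, not part of the package: `build_relationships` and the `classify` callback are hypothetical names, and `classify` stands in for the verifier's own judgment about how two failure points relate.

```python
from itertools import combinations
from typing import Callable

# Hypothetical helper: one failurePointRelationships entry per unordered pair
# of confirmed failure points. The relationship values mirror the schema:
# "independent" | "dependent" | "same_chain".
def build_relationships(
    confirmed_ids: list[str],
    classify: Callable[[str, str], tuple[str, str]],
) -> list[dict]:
    entries = []
    for first, second in combinations(confirmed_ids, 2):
        relationship, detail = classify(first, second)
        entries.append(
            {"points": [first, second], "relationship": relationship, "detail": detail}
        )
    return entries
```

With three confirmed failure points this yields three entries; with a single one it yields none, so no relationship has to be invented.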
96
97
 
97
98
  ### Step 7: Return JSON Result
98
99
 
99
100
  Return the JSON result as the final response. See Output Format for the schema.
100
101
 
101
- ## Confidence Determination Criteria
102
+ ## Coverage Assessment Criteria
102
103
 
103
- | Confidence | Conditions |
104
- |------------|------------|
105
- | high | Direct evidence exists, no refutation, all alternative hypotheses refuted |
106
- | medium | Indirect evidence exists, no refutation, some alternative hypotheses remain |
107
- | low | Speculation level, or refutation exists, or many alternative hypotheses remain |
104
+ | Coverage | Conditions |
105
+ |----------|------------|
106
+ | sufficient | Main paths traced, all critical nodes checked, each failure point individually evaluated |
107
+ | partial | Main paths traced, some nodes unchecked or some failure points at blocked/not_reached |
108
+ | insufficient | Significant paths untraced, or critical nodes not investigated |
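One possible reading of the coverage table, as a small Python sketch. The function name and its four inputs are assumptions made for illustration; the table itself is prose and the agent may weigh these conditions differently.

```python
# Hypothetical mapping from review facts to the coverageAssessment value.
def assess_coverage(
    main_paths_traced: bool,
    critical_nodes_unchecked: int,
    other_nodes_unchecked: int,
    final_statuses: list[str],
) -> str:
    # Untraced significant paths or uninvestigated critical nodes dominate.
    if not main_paths_traced or critical_nodes_unchecked > 0:
        return "insufficient"
    # Any blocked / not_reached failure point keeps coverage at "partial".
    unresolved = any(s in ("blocked", "not_reached") for s in final_statuses)
    if other_nodes_unchecked > 0 or unresolved:
        return "partial"
    return "sufficient"
```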
108
109
 
109
110
  ## Output Format
110
111
 
@@ -113,63 +114,87 @@ Return the JSON result as the final response. See Output Format for the schema.
113
114
  ```json
114
115
  {
115
116
  "investigationReview": {
116
- "originalHypothesesCount": 3,
117
- "coverageAssessment": "Investigation coverage evaluation",
118
- "identifiedGaps": ["Perspectives overlooked in investigation"]
117
+ "originalFailurePointCount": 3,
118
+ "pathMapCoverage": "Assessment of path coverage completeness",
119
+ "identifiedGaps": ["Missing paths or unchecked nodes"]
119
120
  },
120
121
  "triangulationSupplements": [
121
122
  {
122
123
  "source": "Additional information source investigated",
123
124
  "findings": "Content discovered",
124
- "impactOnHypotheses": "Impact on existing hypotheses"
125
+ "impactOnFailurePoints": "Impact on existing failure points"
125
126
  }
126
127
  ],
127
- "scopeValidation": {
128
- "verified": true,
129
- "concerns": ["Concerns"]
130
- },
131
128
  "externalResearch": [
132
129
  {
133
130
  "query": "Search query used",
134
131
  "source": "Information source",
135
132
  "findings": "Related information discovered",
136
- "impactOnHypotheses": "Impact on hypotheses"
137
- }
138
- ],
139
- "alternativeHypotheses": [
140
- {
141
- "id": "AH1",
142
- "description": "Alternative hypothesis description",
143
- "rationale": "Why this hypothesis was considered",
144
- "evidence": {"supporting": [], "contradicting": []},
145
- "plausibility": "high|medium|low"
133
+ "impactOnFailurePoints": "Impact on failure points"
146
134
  }
147
135
  ],
136
+ "coverageCheck": {
137
+ "missingPaths": ["Paths not traced by investigator"],
138
+ "uncheckedNodes": ["Nodes on traced paths that were not checked"],
139
+ "additionalFailurePoints": [
140
+ {
141
+ "id": "AFP1",
142
+ "nodeId": "Node reference",
143
+ "symptomId": "Symptom reference",
144
+ "description": "Newly discovered fault",
145
+ "checkStatus": "supported|weakened|blocked|not_reached",
146
+ "evidence": [
147
+ {"type": "supporting", "detail": "Evidence detail", "source": "file:line"}
148
+ ]
149
+ }
150
+ ]
151
+ },
148
152
  "devilsAdvocateFindings": [
149
153
  {
150
- "targetHypothesis": "Hypothesis ID being verified",
151
- "alternativeExplanation": "Possible alternative explanation",
154
+ "targetFailurePoint": "FP1",
155
+ "alternativeExplanation": "Could this be correct behavior?",
152
156
  "hiddenAssumptions": ["Implicit assumptions"],
153
157
  "potentialCounterEvidence": ["Potentially overlooked counter-evidence"]
154
158
  }
155
159
  ],
156
- "hypothesesEvaluation": [
160
+ "failurePointEvaluation": [
157
161
  {
158
- "hypothesisId": "H1 or AH1",
159
- "description": "Hypothesis description",
160
- "verificationLevel": "speculation|indirect|direct|verified",
161
- "refutationStatus": "unrefuted|partially_refuted|refuted",
162
+ "failurePointId": "FP1 or AFP1",
163
+ "description": "Failure point description",
164
+ "originalCheckStatus": "checkStatus from investigator (null for verifier-discovered AFP)",
165
+ "finalStatus": "supported|weakened|blocked|not_reached",
166
+ "statusChangeReason": "Why status changed (if changed)",
162
167
  "remainingUncertainty": ["Remaining uncertainty"]
163
168
  }
164
169
  ],
165
170
  "conclusion": {
166
- "causes": [
167
- {"hypothesisId": "H1", "status": "confirmed|probable|possible"}
171
+ "confirmedFailurePoints": [
172
+ {
173
+ "failurePointId": "FP1",
174
+ "description": "What the fault is",
175
+ "location": "file:line",
176
+ "symptomId": "S1",
177
+ "symptomExplained": "How this fault leads to the observed symptom",
178
+ "causeCategory": "typo|logic_error|missing_constraint|design_gap|external_factor",
179
+ "finalStatus": "supported|weakened",
180
+ "causalChain": ["Phenomenon", "→ Direct cause", "→ Root cause"],
181
+ "impactScope": ["Affected file paths"],
182
+ "recurrenceRisk": "low|medium|high"
183
+ }
184
+ ],
185
+ "refutedFailurePoints": [
186
+ {"failurePointId": "FP2", "reason": "Reason for refutation"}
187
+ ],
188
+ "failurePointRelationships": [
189
+ {
190
+ "points": ["FP1", "FP3"],
191
+ "relationship": "independent|dependent|same_chain",
192
+ "detail": "Description of how the failure points relate"
193
+ }
168
194
  ],
169
- "causesRelationship": "independent|dependent|exclusive",
170
- "confidence": "high|medium|low",
171
- "confidenceRationale": "Rationale for confidence level",
172
- "recommendedVerification": ["Additional verification needed to confirm conclusion"]
195
+ "coverageAssessment": "sufficient|partial|insufficient",
196
+ "unresolvedSymptoms": ["Symptoms not fully explained by confirmed failure points"],
197
+ "recommendedVerification": ["Additional verification needed"]
173
198
  },
174
199
  "verificationLimitations": ["Limitations of this verification process"]
175
200
  }
@@ -179,15 +204,16 @@ Return the JSON result as the final response. See Output Format for the schema.
179
204
 
180
205
  - [ ] Performed Triangulation supplementation and collected additional information
181
206
  - [ ] Collected external information via WebSearch
182
- - [ ] Generated at least 3 alternative hypotheses
183
- - [ ] Performed Devil's Advocate evaluation on major hypotheses
184
- - [ ] Lowered confidence for hypotheses with official documentation-based counter-evidence
207
+ - [ ] Checked pathMap coverage (missing paths, unchecked nodes)
208
+ - [ ] Performed Devil's Advocate evaluation on each failure point
209
+ - [ ] Weakened finalStatus for failure points with official documentation-based counter-evidence
185
210
  - [ ] Verified consistency with user report
186
- - [ ] Determined verification level for each hypothesis
187
- - [ ] Adopted unrefuted hypotheses as causes and determined relationship when multiple
211
+ - [ ] Evaluated each failure point independently (not selected a single winner)
212
+ - [ ] Assessed overall coverage (sufficient/partial/insufficient)
188
213
  - [ ] Final response is the JSON output
189
214
 
190
215
  ## Output Self-Check
191
216
 
192
- - [ ] Confidence levels reflect all discovered evidence, including official documentation
193
- - [ ] User's causal relationship hints are incorporated into the verification
217
+ - [ ] finalStatus values reflect all discovered evidence, including official documentation
218
+ - [ ] User's causal relationship hints are incorporated into the evaluation
219
+ - [ ] Multiple failure points are preserved where evidence supports them (not collapsed to single cause)
@@ -43,12 +43,46 @@ Choose Strategy A (TDD) if test skeletons are provided, Strategy B (implementati
43
43
  - When test skeletons are not provided, include test implementation tasks based on Design Doc acceptance criteria
44
44
  - Final phase is always Quality Assurance
45
45
 
46
+ **E2E Gap Check (all strategies)**:
47
+ After determining which test skeletons are available, check whether E2E skeletons are absent. A multi-step user journey exists when: (1) 2+ distinct interaction boundaries are traversed in sequence, (2) state carries across steps, and (3) the journey has a completion point. A journey is **user-facing** when a human user directly triggers and observes the steps (via UI, CLI, or direct API interaction), as opposed to service-internal pipelines.
48
+
49
+ ```
50
+ IF no E2E test skeleton files were provided
51
+ AND no e2eAbsenceReason was communicated from upstream
52
+ AND Design Doc or UI Spec contains user-facing multi-step user journey
53
+ THEN add to work plan header:
54
+ ⚠ E2E Gap: This feature contains user-facing multi-step journey(s) but no E2E
55
+ test skeletons were provided. Consider running acceptance-test-generator to
56
+ evaluate E2E test candidates before final phase.
57
+ Detected journeys: [list journey descriptions and AC references]
58
+ ```
59
+
60
+ When an `e2eAbsenceReason` is provided (generated by acceptance-test-generator in its Generation Report, e.g., `no_multi_step_journey`, `below_threshold_user_confirmed`), E2E absence is intentional — skip this gap check.
61
+
62
+ This check applies regardless of whether Strategy A or B was selected. Integration-only skeletons being provided does not imply E2E coverage. Service-internal journeys (async pipelines, service-to-service sagas) are not flagged here — they may still warrant E2E through the normal ROI path.
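The E2E Gap Check condition above can be sketched in Python as follows. The function and parameter names are hypothetical; detecting user-facing journeys in the Design Doc or UI Spec is assumed to have happened upstream and is passed in as a list of descriptions.

```python
from typing import Optional

# Hypothetical sketch of the gap check: returns the header warning text,
# or None when no warning should be added to the work plan.
def e2e_gap_warning(
    e2e_skeleton_files: list[str],
    e2e_absence_reason: Optional[str],
    user_facing_journeys: list[str],
) -> Optional[str]:
    if e2e_skeleton_files:
        return None  # E2E skeletons were provided
    if e2e_absence_reason is not None:
        return None  # absence is intentional, e.g. "no_multi_step_journey"
    if not user_facing_journeys:
        return None  # nothing user-facing to flag
    return (
        "E2E Gap: This feature contains user-facing multi-step journey(s) "
        "but no E2E test skeletons were provided. "
        "Detected journeys: " + "; ".join(user_facing_journeys)
    )
```

Integration skeletons are deliberately not an input: their presence does not change the outcome.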
63
+
46
64
  **Phase structure**: Select based on implementation approach from Design Doc. See Phase Division Criteria in documentation-criteria skill for detailed definitions. Use plan-template Option A (Vertical) or Option B (Horizontal) accordingly. For hybrid, use Option A as the base and add horizontal foundation phases where needed.
47
65
 
48
- ### 5. Define Tasks with Completion Criteria
66
+ ### 5. Map DD Technical Requirements to Tasks
67
+
68
+ Scan the provided Design Doc section by section. Use the category table below as a checklist to extract items:
69
+
70
+ | Category | What to Look For |
71
+ |---|---|
72
+ | impl-target | Components, functions, or data structures to create or modify |
73
+ | connection-switching | Integration points, dependency wiring, switching methods |
74
+ | contract-change | Interface changes, data contract changes, field propagation across boundaries |
75
+ | verification | Verification methods, test boundaries, integration verification points, Verification Method column in Integration Points List |
76
+ | prerequisite | Migration steps, security measures, environment setup |
77
+
78
+ Map each extracted item to a covering task. Items may be covered by a dedicated task or included within a broader task — both are valid, but the mapping must be explicit. Record the mapping in the Design-to-Plan Traceability table (see plan template) using the category values from the left column above.
79
+
80
+ If an item has no covering task, set Gap Status to `gap` with justification in Notes. **When the Traceability table contains any `gap` entry, the plan is in draft status.** Output the plan as draft, but do not finalize it until the user has confirmed each justified gap. Unjustified gaps (no Notes) are errors — add a covering task or provide justification before proceeding.
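The gap rule can be expressed as a small check over the Traceability table. This is a sketch under assumptions: rows are modeled as dictionaries with hypothetical keys `item`, `gap_status`, and `notes`, and the status labels `error`, `draft`, and `ready` are illustrative rather than defined by the plan template.

```python
# Hypothetical check: decide whether the plan may be finalized.
def plan_status(rows: list[dict]) -> tuple[str, list[str]]:
    gaps = [row for row in rows if row.get("gap_status") == "gap"]
    # A gap without Notes is an error: add a task or justify it first.
    unjustified = [row["item"] for row in gaps if not row.get("notes", "").strip()]
    if unjustified:
        return "error", unjustified
    # Justified gaps keep the plan in draft until the user confirms each one.
    if gaps:
        return "draft", [row["item"] for row in gaps]
    return "ready", []
```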
81
+
82
+ ### 6. Define Tasks with Completion Criteria
49
83
  For each task, derive completion criteria from Design Doc acceptance criteria. Apply the 3-element completion definition (Implementation Complete, Quality Complete, Integration Complete).
50
84
 
51
- ### 6. Produce Work Plan Document
85
+ ### 7. Produce Work Plan Document
52
86
  Write the work plan following the plan template from documentation-criteria skill. Include Phase Structure Diagram and Task Dependency Diagram (mermaid).
53
87
 
54
88
  ## Input Parameters
@@ -73,7 +107,7 @@ Write the work plan following the plan template from documentation-criteria skil
73
107
  3. **Deletion**: Delete after all tasks complete with user approval
74
108
 
75
109
  ## Output Policy
76
- Execute file output immediately (considered approved at execution).
110
+ Execute file output immediately (considered approved at execution). **Exception**: When the Traceability table contains `gap` entries, output the plan as draft and request user confirmation for each gap before finalizing.
77
111
 
78
112
  ## Important Task Design Principles
79
113
 
@@ -221,6 +255,9 @@ When creating work plans, **Phase Structure Diagrams** and **Task Dependency Dia
221
255
  ## Quality Checklist
222
256
 
223
257
  - [ ] Design Doc(s) consistency verification
258
+ - [ ] Design-to-Plan Traceability table complete (all DD technical requirements categorized and mapped)
259
+ - [ ] No `gap` entries without justification
260
+ - [ ] All justified `gap` entries flagged for user confirmation before plan approval
224
261
  - [ ] Verification Strategy extracted from Design Doc and included in plan header
225
262
  - [ ] Phase structure matches implementation approach (vertical → value unit phases, horizontal → layer phases)
226
263
  - [ ] Early verification point placed in Phase 1 (when Verification Strategy specifies one)
@@ -99,7 +99,8 @@ For each valid AC from Phase 1:
99
99
  3. **Push-Down Analysis**:
100
100
  ```
101
101
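  Unit-testable? → Remove from integration/E2E pool
102
- Already covered by an integration test? → Do not create an E2E version
102
+ Already covered by an integration test? → Keep as an E2E candidate if it is part of a multi-step user journey (see the definition in the integration-e2e-testing skill)
103
+ Already covered by an integration test and not a multi-step journey? → Remove from E2E pool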
103
104
  ```
104
105
  4. **Sort by ROI** (descending)
105
106
 
@@ -109,14 +110,27 @@ For each valid AC from Phase 1:
109
110
 
110
111
  **Apply "Test Types and Limits" from the integration-e2e-testing skill**
111
112
 
113
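+ **Limits per feature**:
114
+ - **Integration tests**: up to 3
115
+ - **E2E tests**: up to 1-2, broken down as:
116
+ - 1 reserved slot (always emitted regardless of ROI): when the feature contains a **user-facing** multi-step user journey (see the definitions and classification in the integration-e2e-testing skill)
117
+ - Up to 1 additional: requires ROI > 50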
118
+
112
119
  **Selection Algorithm**:
113
120
 
114
121
  ```
115
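- 1. Sort candidates by ROI (descending)
116
- 2. Exclude property-based tests from the limit calculation and select all of them
117
- 3. Select the top N within the limits:
122
+ 1. Reserve the E2E slot:
123
+ If the feature contains a user-facing multi-step user journey,
124
+ reserve 1 E2E slot for the highest-ROI journey candidate
125
+ (this reserved candidate is emitted regardless of the ROI threshold)
126
+
127
+ 2. Sort the remaining candidates by ROI (descending)
128
+
129
+ 3. Exclude property-based tests from the limit calculation and select all of them
130
+
131
+ 4. Select the top N within the limits:
118
132
  - Integration: select the top 3 by ROI
119
- - E2E: select the top 1-2 only if ROI score > 50
133
+ - E2E (additional, beyond the reserved slot): add at most 1, only if ROI score > 50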
120
134
  ```
121
135
 
122
136
  **Output**: Final test set
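The revised selection algorithm (reserved E2E slot first, then ROI ranking within the limits) could look roughly like this in Python. It is a sketch only: the candidate dictionaries and their keys (`name`, `kind`, `roi`, `user_facing_journey`) are hypothetical, and the limits are hard-coded from the text above.

```python
# Hypothetical sketch of the four selection steps.
def select_tests(candidates: list[dict]) -> list[dict]:
    # 1. Reserve one E2E slot for the highest-ROI user-facing journey, if any.
    journeys = [
        c for c in candidates
        if c["kind"] == "e2e" and c.get("user_facing_journey")
    ]
    reserved = max(journeys, key=lambda c: c["roi"]) if journeys else None

    # 2. Sort the remaining candidates by ROI, descending.
    remaining = sorted(
        (c for c in candidates if c is not reserved),
        key=lambda c: c["roi"],
        reverse=True,
    )

    # 3. Property-based tests are outside the limits: select all of them.
    properties = [c for c in remaining if c["kind"] == "property"]

    # 4. Top N within the limits: 3 integration, at most 1 extra E2E with ROI > 50.
    integration = [c for c in remaining if c["kind"] == "integration"][:3]
    extra_e2e = [c for c in remaining if c["kind"] == "e2e" and c["roi"] > 50][:1]

    return ([reserved] if reserved else []) + extra_e2e + integration + properties
```

Note that the reserved candidate is returned even when its ROI is below 50, which is the behavioral change in this hunk.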
@@ -136,16 +150,16 @@ For each valid AC from Phase 1:
136
150
  import { describe, it } from '[detected test framework]'
137
151
 
138
152
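  describe('[Feature name] Integration Test', () => {
139
- // AC: "After successful payment, an order is created and persisted"
140
- // ROI: 85 | Business value: 10 | Frequency: 9
153
+ // AC1: "After successful payment, an order is created and persisted"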
154
+ // ROI: 98 (BV:10 × Freq:9 + Legal:0 + Defect:8)
141
155
  // Behavior: User completes payment → Order created in DB → Payment recorded
142
156
  // @category: core-functionality
143
157
  // @dependency: PaymentService, OrderRepository, Database
144
158
  // @complexity: high
145
159
  it.todo('AC1: Successful payment persists an order with the correct status')
146
160
 
147
- // AC: "On payment failure, a user-friendly error message is displayed"
148
- // ROI: 72 | Business value: 8 | Frequency: 2
161
+ // AC1-error: "On payment failure, a user-friendly error message is displayed"
162
+ // ROI: 23 (BV:8 × Freq:2 + Legal:0 + Defect:7)
149
163
  // Behavior: Payment fails → Actionable error shown to user → No order created
150
164
  // @category: core-functionality
151
165
  // @dependency: PaymentService, ErrorHandler
@@ -166,8 +180,8 @@ import { describe, it } from '[detected test framework]'
166
180
 
167
181
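  describe('[Feature name] E2E Test', () => {
168
182
  // User journey: Complete purchase flow (browse → add to cart → checkout → payment → confirmation)
169
- // ROI: 95 | Business value: 10 | Frequency: 10 | Legal: true
170
- // Behavior: Select product → Add to cart → Complete payment → Order confirmation screen shown
183
+ // ROI: 119 (BV:10 × Freq:10 + Legal:10 + Defect:9) | Reserved slot: multi-step journey
184
+ // Verifies: End-to-end user experience from product selection to order confirmation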
171
185
  // @category: e2e
172
186
  // @dependency: full-system
173
187
  // @complexity: high
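The new ROI annotations in the skeleton comments spell out their arithmetic as BV × Freq + Legal + Defect. A minimal Python sketch of that expression follows; the formula is inferred from the annotations shown in this diff, and the authoritative definition (including value ranges) is assumed to live in the integration-e2e-testing skill.

```python
# ROI as written in the skeleton comments: BV x Freq + Legal + Defect.
def roi(business_value: int, frequency: int, legal: int, defect: int) -> int:
    return business_value * frequency + legal + defect

# The three annotated examples from the updated skeletons.
examples = {
    "AC1": roi(10, 9, 0, 8),            # annotated as ROI: 98
    "AC1-error": roi(8, 2, 0, 7),       # annotated as ROI: 23
    "purchase-flow": roi(10, 10, 10, 9),  # annotated as ROI: 119
}
```

This also shows why the old comments changed: the previous values (85, 72, 95) did not follow from the listed business value and frequency.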
@@ -192,21 +206,50 @@ it.todo('[AC number]-property: [invariant described in natural language]')
192
206
 
193
207
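  On completion, report in the following JSON format. Detailed meta information is contained in comments within the test skeleton files and is extracted by downstream steps that read those files.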
194
208
 
209
+ **When E2E tests are emitted:**
195
210
  ```json
196
211
  {
197
212
  "status": "completed",
198
- "feature": "[feature name]",
213
+ "feature": "payment",
199
214
  "generatedFiles": {
200
- "integration": "[path]/[feature].int.test.ts",
201
- "e2e": "[path]/[feature].e2e.test.ts"
215
+ "integration": "tests/payment.int.test.[ext]",
216
+ "e2e": "tests/payment.e2e.test.[ext]"
202
217
  },
203
- "testCounts": {
204
- "integration": 2,
205
- "e2e": 1
206
- }
218
+ "budgetUsage": { "integration": "2/3", "e2e": "1/2" },
219
+ "e2eAbsenceReason": null
207
220
  }
208
221
  ```
209
222
 
223
+ **When no E2E tests are emitted:**
224
+ ```json
225
+ {
226
+ "status": "completed",
227
+ "feature": "payment",
228
+ "generatedFiles": {
229
+ "integration": "tests/payment.int.test.[ext]",
230
+ "e2e": null
231
+ },
232
+ "budgetUsage": { "integration": "2/3", "e2e": "0/2" },
233
+ "e2eAbsenceReason": "no_multi_step_journey"
234
+ }
235
+ ```
236
+
237
+ **When no integration tests are emitted either:**
238
+ ```json
239
+ {
240
+ "status": "completed",
241
+ "feature": "config-update",
242
+ "generatedFiles": {
243
+ "integration": null,
244
+ "e2e": null
245
+ },
246
+ "budgetUsage": { "integration": "0/3", "e2e": "0/2" },
247
+ "e2eAbsenceReason": "no_multi_step_journey"
248
+ }
249
+ ```
250
+
251
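+ **Contract**: `generatedFiles.integration` and `generatedFiles.e2e` are always present as keys. Each value is a file path string when the file was generated, or `null` when it was not. `e2eAbsenceReason` is `null` when E2E tests were emitted; otherwise it is either `no_multi_step_journey` or `below_threshold_user_confirmed`.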
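A downstream consumer could check the report contract with something like the following Python sketch. The function name `contract_violations` is hypothetical; only the keys and the two reason values come from the contract text.

```python
ABSENCE_REASONS = {"no_multi_step_journey", "below_threshold_user_confirmed"}

# Hypothetical validator: returns a list of contract violations (empty if valid).
def contract_violations(report: dict) -> list[str]:
    problems = []
    files = report.get("generatedFiles", {})
    # Both keys must always be present; values are a path string or None (null).
    for key in ("integration", "e2e"):
        if key not in files:
            problems.append(f"generatedFiles.{key} is missing")
        elif files[key] is not None and not isinstance(files[key], str):
            problems.append(f"generatedFiles.{key} must be a string or null")
    reason = report.get("e2eAbsenceReason")
    if files.get("e2e") is not None:
        if reason is not None:
            problems.append("e2eAbsenceReason must be null when E2E was emitted")
    elif reason not in ABSENCE_REASONS:
        problems.append("e2eAbsenceReason must name why E2E is absent")
    return problems
```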
252
+
210
253
  ## Constraints and Quality Standards
211
254
 
212
255
  **Mandatory Compliance**:
@@ -217,7 +260,7 @@ it.todo('[AC number]-property: [invariant described in natural language]')
217
260
  - Stay within the test limits; report if critical tests would exceed them
218
261
 
219
262
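  **Quality Standards**:
220
- - Generate only ROI-worthy tests that correspond to ACs
263
+ - Select tests within the limits based on ROI ranking (integration: top 3 by ROI; E2E: reserved slot for a user-facing journey + additional tests with ROI > 50)
221
264
  - Strictly apply behavior-first filtering
222
265
  - Eliminate duplicates (check existing tests with Grep)
223
266
  - State dependencies explicitly
@@ -226,15 +269,17 @@ it.todo('[AC number]-property: [invariant described in natural language]')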
226
269
  ## Exception Handling and Escalation
227
270
 
228
271
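  ### Can Be Handled Automatically
229
- - **Directory absent**: Automatically create an appropriate directory following the detected test structure
230
- - **No high-ROI tests**: Valid result - report "All ACs are below the ROI threshold or already covered by existing tests"
272
+ - **Directory does not exist**: Automatically create an appropriate directory following the detected test structure
273
+ - **No high-ROI integration tests**: Valid result - report "All ACs are below the ROI threshold or already covered by existing tests"
274
+ - **No E2E tests (no multi-step journey)**: Valid result - report "No multi-step user journey detected; E2E tests not applicable"
231
275
  - **Critical tests exceed the limits**: Report to user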
232
276
 
233
277
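  ### Escalation Required
234
- 1. **Critical**: ACs absent, Design Doc absent → Terminate with error
235
- 2. **High**: All ACs filtered out but the feature is business-critical; user confirmation required
236
- 3. **Medium**: Limits insufficient for a critical user journey (ROI > 90); present options
237
- 4. **Low**: Multiple interpretations possible but impact is minor; adopt an interpretation + note it in the report
278
+ 1. **Critical**: ACs do not exist, Design Doc does not exist → Terminate with error
279
+ 2. **High**: No E2E tests were emitted after applying the limits, but the feature contains a user-facing multi-step journey; escalate with "The feature contains a user-facing multi-step journey, but no E2E tests were emitted. Journey candidates evaluated: [list with ROI scores]. Please confirm whether it is acceptable to proceed without E2E." (Note: this escalation fires only when the Phase 4 reserved slot was not applied. If a reserved-slot candidate exists, it is emitted and this escalation does not fire.)
280
+ 3. **High**: All ACs filtered out but the feature is business-critical; user confirmation required
281
+ 4. **Medium**: Limits insufficient for a critical user journey (ROI > 90); present options
282
+ 5. **Low**: Multiple interpretations possible but impact is minor → Adopt an interpretation + note it in the report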
238
283
 
239
284
  ## Technical Specifications
240
285
 
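@@ -102,7 +102,8 @@ You operate with an independent context that does not apply CLAUDE.md principles,
102
102
  - **Existence claims** (a file exists, a test exists, a function exists, a route exists): Confirm with Glob or Grep before reporting. Include the tool results as evidence
103
103
  - **Behavioral claims** (a function does X, error handling works like Y): Actually Read the function implementation. Include the observed behavior as evidence
104
104
  - **Identifier claims** (names, URLs, parameters): Compare the exact strings in the code against the documentation. Record any difference as an inconsistency
105
- - Collect from at least two sources before classifying. Mark single-source findings as low confidence
105
+ - **Referential integrity of literal identifiers**: When the documentation contains concrete identifiers (URL paths, API endpoints, config keys, type/interface names, table/column names, event names), verify that each identifier has a corresponding definition or implementation in the codebase. An identifier in the documentation with no counterpart in code → gap. A definition in code that contradicts the documentation → conflict
106
+ - Collect from at least two sources before classifying. Mark single-source findings as low confidence. **Exception**: For identifier existence verification (does this path/type/config key exist in the code?), a single authoritative definition is sufficient for high confidence. If reference sites exist in addition to the definition, raise to the highest confidence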
106
107
 
107
108
  ### Step 4: Consistency Classification
108
109
 
@@ -236,7 +237,8 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
236
237
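  - [ ] All existence claims (file, test, function existence) are backed by Glob/Grep tool results
237
238
  - [ ] All behavioral claims are backed by a Read of the function implementation
238
239
  - [ ] Identifier comparisons use the exact strings from the code (no corrections applied)
239
- - [ ] Each classification cites multiple sources (not a single source)
240
+ - [ ] Literal identifiers in the documentation (paths, endpoints, config keys, type names) are verified against definitions in the codebase
241
+ - [ ] Each classification cites multiple sources. However, identifier existence verification needs only a single authoritative definition
240
242
  - [ ] Low-confidence classifications are explicitly noted
241
243
  - [ ] Contradicting evidence is documented, not ignored
242
244
  - [ ] The `reverseCoverage` section is populated with actual values based on tool results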