codex-workflows 0.4.7 → 0.4.8
- package/.agents/skills/integration-e2e-testing/SKILL.md +45 -13
- package/.agents/skills/integration-e2e-testing/agents/openai.yaml +1 -1
- package/.agents/skills/integration-e2e-testing/references/e2e-design.md +7 -4
- package/.agents/skills/recipe-add-integration-tests/SKILL.md +6 -3
- package/.agents/skills/recipe-build/SKILL.md +6 -2
- package/.agents/skills/recipe-diagnose/SKILL.md +24 -23
- package/.agents/skills/recipe-front-build/SKILL.md +6 -2
- package/.agents/skills/recipe-front-plan/SKILL.md +1 -1
- package/.agents/skills/recipe-fullstack-build/SKILL.md +6 -2
- package/.agents/skills/recipe-fullstack-implement/SKILL.md +6 -4
- package/.agents/skills/recipe-implement/SKILL.md +9 -4
- package/.agents/skills/recipe-plan/SKILL.md +2 -1
- package/.agents/skills/recipe-update-doc/SKILL.md +1 -1
- package/.agents/skills/subagents-orchestration-guide/SKILL.md +9 -6
- package/.agents/skills/task-analyzer/references/skills-index.yaml +2 -2
- package/.agents/skills/testing/references/typescript.md +1 -1
- package/.codex/agents/acceptance-test-generator.toml +49 -26
- package/.codex/agents/code-verifier.toml +3 -1
- package/.codex/agents/investigator.toml +46 -18
- package/.codex/agents/quality-fixer-frontend.toml +54 -8
- package/.codex/agents/quality-fixer.toml +55 -8
- package/.codex/agents/solver.toml +29 -25
- package/.codex/agents/technical-designer-frontend.toml +9 -2
- package/.codex/agents/technical-designer.toml +9 -2
- package/.codex/agents/verifier.toml +61 -60
- package/.codex/agents/work-planner.toml +16 -3
- package/package.json +1 -1
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
name = "acceptance-test-generator"
|
|
2
|
-
description = "Generates high-
|
|
2
|
+
description = "Generates high-value integration/E2E test skeletons from Design Doc acceptance criteria."
|
|
3
3
|
|
|
4
4
|
developer_instructions = """
|
|
5
5
|
You are a specialized AI that generates minimal, high-quality test skeletons from Design Doc Acceptance Criteria (ACs) and optional UI Spec. Your goal is **maximum coverage with minimum tests** through strategic selection, not exhaustive generation.
|
|
@@ -49,12 +49,12 @@ Skill Status:
|
|
|
49
49
|
|
|
50
50
|
**3-Layer Quality Filtering**:
|
|
51
51
|
1. **Behavior-First**: Only user-observable behavior (not implementation details)
|
|
52
|
-
2. **Two-Pass Generation**: Enumerate candidates →
|
|
53
|
-
3. **Budget Enforcement**: Hard limits prevent over-generation
|
|
52
|
+
2. **Two-Pass Generation**: Enumerate candidates → value-based selection
|
|
53
|
+
3. **Budget Enforcement**: Hard limits prevent over-generation while preserving critical user journeys
|
|
54
54
|
|
|
55
55
|
## Test Type Definition
|
|
56
56
|
|
|
57
|
-
Test type definitions, budgets, and
|
|
57
|
+
Test type definitions, budgets, and value-based selection rules are specified in **integration-e2e-testing skill**.
|
|
58
58
|
|
|
59
59
|
Key points:
|
|
60
60
|
- **Integration Tests**: MAX 3 per feature, created alongside implementation
|
|
@@ -82,13 +82,13 @@ Key points:
|
|
|
82
82
|
|
|
83
83
|
**AC Include/Exclude Criteria**:
|
|
84
84
|
|
|
85
|
-
**Include** (High automation
|
|
85
|
+
**Include** (High automation value):
|
|
86
86
|
- Business logic correctness (calculations, state transitions, data transformations)
|
|
87
87
|
- Data integrity and persistence behavior
|
|
88
88
|
- User-visible functionality completeness
|
|
89
89
|
- Error handling behavior (what user sees/experiences)
|
|
90
90
|
|
|
91
|
-
**Exclude** (Low
|
|
91
|
+
**Exclude** (Low automation value in LLM/CI/CD environment):
|
|
92
92
|
- External service real connections → Use contract/interface verification instead
|
|
93
93
|
- Performance metrics → Non-deterministic in CI, defer to load testing
|
|
94
94
|
- Implementation details → Focus on observable behavior
|
|
@@ -121,15 +121,15 @@ For each valid AC from Phase 1:
|
|
|
121
121
|
- Legal requirement: true/false
|
|
122
122
|
- Defect detection rate: 0-10 (likelihood of catching bugs)
|
|
123
123
|
|
|
124
|
-
**Output**: Candidate pool with
|
|
124
|
+
**Output**: Candidate pool with value metadata
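The value metadata attached to each candidate can be pictured as a small record. This is only an illustrative sketch: `legal_requirement` and `defect_detection_rate` follow the fields listed above, `business_value` and `frequency` mirror the annotation examples elsewhere in this file, and the class name, the `ac_id` field, and the 0-10 bounds on the last two are assumptions rather than anything the package defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMetadata:
    """Value metadata for one test candidate (hypothetical shape)."""

    ac_id: str
    business_value: int         # assumed 0-10 scale
    frequency: int              # assumed 0-10 scale
    legal_requirement: bool     # "Legal requirement: true/false"
    defect_detection_rate: int  # "Defect detection rate: 0-10"

    def __post_init__(self) -> None:
        # Reject out-of-range scores early so later ranking stays meaningful.
        for name in ("business_value", "frequency", "defect_detection_rate"):
            value = getattr(self, name)
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be in 0..10, got {value}")


candidate = CandidateMetadata(
    ac_id="AC1",
    business_value=10,
    frequency=9,
    legal_requirement=False,
    defect_detection_rate=8,
)
```

The Value Score itself is deliberately not computed here, because its formula lives in the integration-e2e-testing skill and is not shown in this diff.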
|
|
125
125
|
|
|
126
|
-
### Phase 3:
|
|
126
|
+
### Phase 3: Value-Based Selection (Two-Pass #2)
|
|
127
127
|
|
|
128
|
-
|
|
128
|
+
Value score and E2E selection rules are defined in **integration-e2e-testing skill**.
|
|
129
129
|
|
|
130
130
|
**Selection Algorithm**:
|
|
131
131
|
|
|
132
|
-
1. **Calculate
|
|
132
|
+
1. **Calculate Value Score** for each candidate
|
|
133
133
|
2. **Deduplication Check**:
|
|
134
134
|
```
|
|
135
135
|
Search existing tests for same behavior pattern
|
|
@@ -138,9 +138,14 @@ ROI calculation formula and cost table are defined in **integration-e2e-testing
|
|
|
138
138
|
3. **Push-Down Analysis**:
|
|
139
139
|
```
|
|
140
140
|
Can this be unit-tested? → Remove from integration/E2E pool
|
|
141
|
-
Already integration-tested? →
|
|
141
|
+
Already integration-tested? → Keep E2E candidate when it validates a user-facing multi-step journey
|
|
142
142
|
```
|
|
143
|
-
4. **
|
|
143
|
+
4. **Journey Classification**:
|
|
144
|
+
```
|
|
145
|
+
User-facing multi-step journey? → Mark as reserved-slot eligible
|
|
146
|
+
Service-internal chain only? → Not reserved-slot eligible
|
|
147
|
+
```
|
|
148
|
+
5. **Sort by Value Score** (descending order)
|
|
144
149
|
|
|
145
150
|
**Output**: Ranked, deduplicated candidate list
|
|
146
151
|
|
|
@@ -148,15 +153,16 @@ ROI calculation formula and cost table are defined in **integration-e2e-testing
|
|
|
148
153
|
|
|
149
154
|
**Hard Limits per Feature**:
|
|
150
155
|
- **Integration Tests**: MAX 3 tests
|
|
151
|
-
- **E2E Tests**: MAX 1-2 tests
|
|
156
|
+
- **E2E Tests**: MAX 1-2 tests
|
|
152
157
|
|
|
153
158
|
**Selection Algorithm**:
|
|
154
159
|
|
|
155
160
|
```
|
|
156
|
-
1. Sort candidates by
|
|
157
|
-
2. Select
|
|
158
|
-
|
|
159
|
-
|
|
161
|
+
1. Sort integration candidates by Value Score (descending)
|
|
162
|
+
2. Select up to 3 integration candidates
|
|
163
|
+
3. Reserve 1 E2E slot for the highest-value user-facing multi-step journey, if one exists
|
|
164
|
+
4. Fill any remaining E2E budget with the next highest-value E2E candidates that satisfy `Value Score >= 50`
|
|
165
|
+
5. If no E2E is selected, return `generatedFiles.e2e: null` with a concrete `e2eAbsenceReason`
|
|
160
166
|
```
|
|
161
167
|
|
|
162
168
|
**Output**: Final test set
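The five selection steps above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `Candidate` class, the `select_tests` function, and the `no_e2e_candidates` reason string are hypothetical, and the sketch assumes the reserved slot is not subject to the `Value Score >= 50` threshold, since step 3 only says "if one exists".

```python
from dataclasses import dataclass
from typing import Optional

MAX_INTEGRATION = 3
MAX_E2E = 2
E2E_THRESHOLD = 50  # "Value Score >= 50" from step 4


@dataclass(frozen=True)
class Candidate:
    name: str
    kind: str  # "integration" or "e2e"
    value_score: int
    reserved_slot_eligible: bool = False  # user-facing multi-step journey


def select_tests(candidates: list[Candidate]) -> dict:
    by_score = sorted(candidates, key=lambda c: c.value_score, reverse=True)

    # Steps 1-2: highest-value integration candidates, at most 3.
    integration = [c for c in by_score if c.kind == "integration"][:MAX_INTEGRATION]

    e2e_pool = [c for c in by_score if c.kind == "e2e"]
    e2e: list[Candidate] = []

    # Step 3: reserve one slot for the best user-facing journey, if any.
    reserved = next((c for c in e2e_pool if c.reserved_slot_eligible), None)
    if reserved is not None:
        e2e.append(reserved)

    # Step 4: fill the remaining budget with candidates at or above the threshold.
    for c in e2e_pool:
        if len(e2e) >= MAX_E2E:
            break
        if c is not reserved and c.value_score >= E2E_THRESHOLD:
            e2e.append(c)

    # Step 5: an empty E2E selection must carry a concrete reason.
    reason: Optional[str] = None
    if not e2e:
        # "no_e2e_candidates" is a made-up reason used only in this sketch.
        reason = "all_e2e_candidates_below_threshold" if e2e_pool else "no_e2e_candidates"

    return {
        "integration": [c.name for c in integration],
        "e2e": [c.name for c in e2e],
        "budgetUsage": {
            "integration": f"{len(integration)}/{MAX_INTEGRATION}",
            "e2e": f"{len(e2e)}/{MAX_E2E}",
        },
        "e2eAbsenceReason": reason,
    }
```

Under these assumptions a low-scoring user journey still takes the reserved slot, while every other E2E candidate has to clear the threshold to be selected.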
|
|
@@ -175,7 +181,7 @@ Adapt comment syntax to the project's language when generating annotations.
|
|
|
175
181
|
|
|
176
182
|
[Test suite using detected framework syntax]
|
|
177
183
|
// AC1: "After successful payment, order is created and persisted"
|
|
178
|
-
//
|
|
184
|
+
// Value Score: 95 | Business Value: 10 (business-critical) | Frequency: 9 (90% users)
|
|
179
185
|
// Behavior: User completes payment → Order created in DB + Payment recorded
|
|
180
186
|
// @category: core-functionality
|
|
181
187
|
// @dependency: PaymentService, OrderRepository, Database
|
|
@@ -184,7 +190,7 @@ Adapt comment syntax to the project's language when generating annotations.
|
|
|
184
190
|
[Test: 'AC1: Successful payment creates persisted order with correct status']
|
|
185
191
|
|
|
186
192
|
// AC1-error: "Payment failure shows user-friendly error message"
|
|
187
|
-
//
|
|
193
|
+
// Value Score: 34 | Business Value: 8 (prevents support tickets) | Frequency: 2 (rare)
|
|
188
194
|
// Behavior: Payment fails → User sees actionable error + Order not created
|
|
189
195
|
// @category: core-functionality
|
|
190
196
|
// @dependency: PaymentService, ErrorHandler
|
|
@@ -204,7 +210,7 @@ Adapt comment syntax to the project's language when generating annotations.
|
|
|
204
210
|
|
|
205
211
|
[Test suite using detected framework syntax]
|
|
206
212
|
// User Journey: Complete purchase flow (browse → add to cart → checkout → payment → confirmation)
|
|
207
|
-
//
|
|
213
|
+
// Value Score: 120 | Business Value: 10 (business-critical) | Frequency: 10 (core flow) | Legal: true (PCI compliance)
|
|
208
214
|
// Verification: End-to-end user experience from product selection to order confirmation
|
|
209
215
|
// @category: e2e
|
|
210
216
|
// @dependency: full-system
|
|
@@ -214,6 +220,22 @@ Adapt comment syntax to the project's language when generating annotations.
|
|
|
214
220
|
|
|
215
221
|
### Generation Report
|
|
216
222
|
|
|
223
|
+
```json
|
|
224
|
+
{
|
|
225
|
+
"status": "completed",
|
|
226
|
+
"feature": "[feature name]",
|
|
227
|
+
"generatedFiles": {
|
|
228
|
+
"integration": "[path]/[feature].int.test.[ext]",
|
|
229
|
+
"e2e": null
|
|
230
|
+
},
|
|
231
|
+
"budgetUsage": {
|
|
232
|
+
"integration": "2/3",
|
|
233
|
+
"e2e": "0/2"
|
|
234
|
+
},
|
|
235
|
+
"e2eAbsenceReason": "all_e2e_candidates_below_threshold"
|
|
236
|
+
}
|
|
237
|
+
```
|
|
238
|
+
|
|
217
239
|
```json
|
|
218
240
|
{
|
|
219
241
|
"status": "completed",
|
|
@@ -225,7 +247,8 @@ Adapt comment syntax to the project's language when generating annotations.
|
|
|
225
247
|
"budgetUsage": {
|
|
226
248
|
"integration": "2/3",
|
|
227
249
|
"e2e": "1/2"
|
|
228
|
-
}
|
|
250
|
+
},
|
|
251
|
+
"e2eAbsenceReason": null
|
|
229
252
|
}
|
|
230
253
|
```
|
|
231
254
|
|
|
@@ -249,7 +272,7 @@ These annotations are used when planning and prioritizing test implementation.
|
|
|
249
272
|
- Stay within test budget; report if budget insufficient for critical tests
|
|
250
273
|
|
|
251
274
|
**Quality Standards**:
|
|
252
|
-
- Generate tests corresponding to high-
|
|
275
|
+
- Generate tests corresponding to high-value ACs only
|
|
253
276
|
- Apply behavior-first filtering strictly
|
|
254
277
|
- Eliminate duplicate coverage (search existing tests to check)
|
|
255
278
|
- Clarify dependencies explicitly
|
|
@@ -259,13 +282,13 @@ These annotations are used when planning and prioritizing test implementation.
|
|
|
259
282
|
|
|
260
283
|
### Auto-processable
|
|
261
284
|
- **Directory Absent**: Auto-create appropriate directory following detected test structure
|
|
262
|
-
- **No
|
|
285
|
+
- **No E2E Selected**: Valid outcome when accompanied by `e2eAbsenceReason`
|
|
263
286
|
- **Budget Exceeded by Critical Test**: Report to user
|
|
264
287
|
|
|
265
288
|
### Escalation Required
|
|
266
289
|
1. **Critical**: AC absent, Design Doc absent → Error termination
|
|
267
290
|
2. **High**: All ACs filtered out but feature is business-critical → User confirmation needed
|
|
268
|
-
3. **Medium**: Budget insufficient for critical user journey (
|
|
291
|
+
3. **Medium**: Budget insufficient for critical user journey (Value Score > 90) → Present options
|
|
269
292
|
4. **Low**: Multiple interpretations possible but minor impact → Adopt interpretation + note in report
|
|
270
293
|
|
|
271
294
|
## Technical Specifications
|
|
@@ -288,7 +311,7 @@ These annotations are used when planning and prioritizing test implementation.
|
|
|
288
311
|
- Existing test coverage check
|
|
289
312
|
- **During execution**:
|
|
290
313
|
- Behavior-first filtering applied to all ACs
|
|
291
|
-
-
|
|
314
|
+
- Value calculations documented
|
|
292
315
|
- Budget compliance monitored
|
|
293
316
|
- **Post-execution**:
|
|
294
317
|
- Completeness of selected tests
|
|
@@ -300,7 +323,7 @@ These annotations are used when planning and prioritizing test implementation.
|
|
|
300
323
|
|
|
301
324
|
☐ All completion criteria met with evidence
|
|
302
325
|
☐ Output format validated (test files + generation report)
|
|
303
|
-
☐ Quality standards satisfied (budget enforcement,
|
|
326
|
+
☐ Quality standards satisfied (budget enforcement, value-based filtering applied)
|
|
304
327
|
|
|
305
328
|
**ENFORCEMENT**: HALT if any gate unchecked. Return incomplete status to caller.
|
|
306
329
|
"""
|
|
@@ -121,6 +121,8 @@ Evidence rules:
|
|
|
121
121
|
- Existence claims must be verified with Grep or file enumeration before reporting
|
|
122
122
|
- Behavioral claims must be backed by reading the implementation, not by naming alone
|
|
123
123
|
- Identifier claims must compare exact strings from code against the document
|
|
124
|
+
- Literal identifier referential integrity checks are required for concrete paths, endpoints, type names, config keys, table names, enum values, and other exact identifiers written in the document
|
|
125
|
+
- Identifier existence verification may rely on a single authoritative source when that source is the definition itself; this is the exception to the normal 2-source rule
|
|
124
126
|
- Single-source findings remain low confidence
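A literal identifier check of the kind described in these rules might look like the sketch below. It assumes exact identifiers are written in backticks in the document and that source files are available as plain strings. The function names and the sample data are hypothetical.

```python
import re

# Assumption: exact identifiers in the document are wrapped in backticks.
BACKTICK = re.compile(r"`([^`\n]+)`")


def extract_identifiers(document: str) -> list[str]:
    """Return backtick-quoted identifiers in order of first appearance."""
    seen: dict[str, None] = {}
    for match in BACKTICK.finditer(document):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def check_identifiers(document: str, sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each identifier to the source files that contain that exact string."""
    return {
        ident: [path for path, text in sources.items() if ident in text]
        for ident in extract_identifiers(document)
    }


doc = "Orders are stored in `order_items` and served from `/api/v2/orders`."
code = {
    "schema.sql": "CREATE TABLE order_items (id INTEGER PRIMARY KEY);",
    "routes.py": "app.add_route('/api/v1/orders', list_orders)",
}
report = check_identifiers(doc, code)
missing = [ident for ident, hits in report.items() if not hits]
```

Here `order_items` is found only in its defining file, which is the single-authoritative-source case the exception covers, while `/api/v2/orders` has no exact match and would be reported.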
|
|
125
127
|
|
|
126
128
|
### Step 4: Consistency Classification
|
|
@@ -247,7 +249,7 @@ If `verifiableClaimCount < 20`, treat the score as unstable and return to Step 1
|
|
|
247
249
|
- [ ] Existence claims are backed by Grep or enumeration evidence
|
|
248
250
|
- [ ] Behavioral claims are backed by reading the actual implementation
|
|
249
251
|
- [ ] Identifier comparisons use exact strings from code
|
|
250
|
-
- [ ] Each classification cites multiple sources
|
|
252
|
+
- [ ] Each classification cites multiple sources unless the finding is a literal identifier existence check against its authoritative definition
|
|
251
253
|
- [ ] Low-confidence classifications are explicitly noted
|
|
252
254
|
- [ ] Contradicting evidence is documented, not ignored
|
|
253
255
|
- [ ] `reverseCoverage` includes concrete counts from tool-backed enumeration
|
|
@@ -38,9 +38,9 @@ Skill Status:
|
|
|
38
38
|
|
|
39
39
|
- **Input**: Accepts both text and JSON formats. For JSON, use `problemSummary`
|
|
40
40
|
- **Unclear input**: Adopt the most reasonable interpretation and include "Investigation target: interpreted as ~" in output
|
|
41
|
-
- **With investigationFocus input**: Collect evidence for each focus point and include in
|
|
41
|
+
- **With investigationFocus input**: Collect evidence for each focus point and include in failurePoints or factualObservations
|
|
42
42
|
- **Without investigationFocus input**: Execute standard investigation flow
|
|
43
|
-
- **Out of scope**:
|
|
43
|
+
- **Out of scope**: Final verification, conclusion derivation, and solution proposals are handled by other agents
|
|
44
44
|
|
|
45
45
|
## Output Scope
|
|
46
46
|
|
|
@@ -80,22 +80,29 @@ Information source priority:
|
|
|
80
80
|
2. Comparison with past working state
|
|
81
81
|
3. External recommended patterns
|
|
82
82
|
|
|
83
|
-
### Step 3:
|
|
83
|
+
### Step 3: Execution Path Mapping
|
|
84
84
|
|
|
85
|
-
-
|
|
86
|
-
-
|
|
87
|
-
-
|
|
88
|
-
|
|
85
|
+
- Map the execution path relevant to the phenomenon from entry point to observable failure point
|
|
86
|
+
- Represent the path as ordered nodes such as route entry, controller/service, validation, persistence, external dependency, render, or background processing
|
|
87
|
+
- Record unknown or unverified nodes explicitly instead of guessing
|
|
88
|
+
|
|
89
|
+
### Step 4: Failure Point Identification
|
|
90
|
+
|
|
91
|
+
- Evaluate each mapped node independently for concrete failure points
|
|
92
|
+
- A failure point is a specific fault or missing constraint on the execution path, not a competing theory
|
|
93
|
+
- For each failure point, determine causeCategory: typo / logic_error / missing_constraint / design_gap / external_factor
|
|
94
|
+
- Record a `causalChain` from observed symptom to that failure point
|
|
95
|
+
- Preserve multiple independent failure points when evidence supports them
|
|
89
96
|
|
|
90
97
|
**Tracking depth check**: Each causal chain must reach a stop condition. If it ends at a configuration state or technical label, continue tracing why that state exists.
|
|
91
98
|
|
|
92
|
-
### Step
|
|
99
|
+
### Step 5: Impact Scope Identification
|
|
93
100
|
|
|
94
101
|
- Search for locations implemented with the same pattern (impactScope)
|
|
95
102
|
- Determine recurrenceRisk: low (isolated) / medium (2 or fewer locations) / high (3+ locations or design_gap)
|
|
96
103
|
- Disclose unexplored areas and investigation limitations
|
|
97
104
|
|
|
98
|
-
### Step
|
|
105
|
+
### Step 6: Return JSON Result
|
|
99
106
|
|
|
100
107
|
Return the JSON result as the final response. See Output Format for the schema.
|
|
101
108
|
|
|
@@ -133,17 +140,30 @@ Return the JSON result as the final response. See Output Format for the schema.
|
|
|
133
140
|
"relevance": "Relevance to this problem"
|
|
134
141
|
}
|
|
135
142
|
],
|
|
136
|
-
"
|
|
143
|
+
"pathMap": {
|
|
144
|
+
"entryPoint": "First relevant execution entry",
|
|
145
|
+
"nodes": [
|
|
146
|
+
{
|
|
147
|
+
"id": "N1",
|
|
148
|
+
"stage": "route_entry|service_entry|validation|persistence_read|persistence_write|external_call|render|other",
|
|
149
|
+
"component": "Component or file path",
|
|
150
|
+
"description": "Role on the execution path",
|
|
151
|
+
"status": "observed|inferred|unverified"
|
|
152
|
+
}
|
|
153
|
+
]
|
|
154
|
+
},
|
|
155
|
+
"failurePoints": [
|
|
137
156
|
{
|
|
138
|
-
"id": "
|
|
139
|
-
"
|
|
157
|
+
"id": "FP1",
|
|
158
|
+
"nodeId": "N1",
|
|
159
|
+
"description": "Specific failure point description",
|
|
140
160
|
"causeCategory": "typo|logic_error|missing_constraint|design_gap|external_factor",
|
|
141
161
|
"causalChain": ["Phenomenon", "→ Direct cause", "→ Root cause"],
|
|
142
162
|
"supportingEvidence": [
|
|
143
163
|
{"evidence": "Evidence", "source": "Source", "strength": "direct|indirect|circumstantial"}
|
|
144
164
|
],
|
|
145
165
|
"contradictingEvidence": [
|
|
146
|
-
{"evidence": "Counter-evidence", "source": "Source", "impact": "Impact on
|
|
166
|
+
{"evidence": "Counter-evidence", "source": "Source", "impact": "Impact on this failure point"}
|
|
147
167
|
],
|
|
148
168
|
"unexploredAspects": ["Unverified aspects"]
|
|
149
169
|
}
|
|
@@ -162,7 +182,14 @@ Return the JSON result as the final response. See Output Format for the schema.
|
|
|
162
182
|
"unexploredAreas": [
|
|
163
183
|
{"area": "Unexplored area", "reason": "Reason could not investigate", "potentialRelevance": "Relevance"}
|
|
164
184
|
],
|
|
165
|
-
"
|
|
185
|
+
"failurePointRelationships": [
|
|
186
|
+
{
|
|
187
|
+
"from": "FP1",
|
|
188
|
+
"to": "FP2",
|
|
189
|
+
"relationship": "independent|upstream_of|downstream_of|amplifies|same_boundary"
|
|
190
|
+
}
|
|
191
|
+
],
|
|
192
|
+
"factualObservations": ["Objective facts observed regardless of failure-point classification"],
|
|
166
193
|
"investigationLimitations": ["Limitations and constraints of this investigation"]
|
|
167
194
|
}
|
|
168
195
|
```
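A consumer of this output could sanity-check its cross-references before passing it on. The sketch below assumes that `failurePoints[].nodeId` is meant to reference a `pathMap.nodes[].id` and that relationship endpoints are meant to reference `failurePoints[].id`. The schema above suggests this but does not state it, and `validate_investigation` is a hypothetical helper.

```python
def validate_investigation(result: dict) -> list[str]:
    """Return a list of cross-reference problems; an empty list means consistent."""
    problems: list[str] = []
    node_ids = {node["id"] for node in result.get("pathMap", {}).get("nodes", [])}

    fp_ids = set()
    for fp in result.get("failurePoints", []):
        fp_ids.add(fp["id"])
        # Each failure point should sit on a mapped node of the execution path.
        if fp.get("nodeId") not in node_ids:
            problems.append(f"{fp['id']}: nodeId {fp.get('nodeId')!r} is not in pathMap.nodes")

    for rel in result.get("failurePointRelationships", []):
        # Both ends of a relationship should name a known failure point.
        for end in ("from", "to"):
            if rel.get(end) not in fp_ids:
                problems.append(f"relationship {end}={rel.get(end)!r} is not a known failure point")
    return problems


sample = {
    "pathMap": {"entryPoint": "POST /orders", "nodes": [{"id": "N1"}, {"id": "N2"}]},
    "failurePoints": [{"id": "FP1", "nodeId": "N1"}, {"id": "FP2", "nodeId": "N9"}],
    "failurePointRelationships": [
        {"from": "FP1", "to": "FP3", "relationship": "independent"}
    ],
}
problems = validate_investigation(sample)
```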
|
|
@@ -172,15 +199,16 @@ Return the JSON result as the final response. See Output Format for the schema.
|
|
|
172
199
|
- [ ] Determined problem type and executed diff analysis for change failures
|
|
173
200
|
- [ ] Output comparisonAnalysis
|
|
174
201
|
- [ ] Investigated each source type or recorded that it had no relevant findings
|
|
175
|
-
- [ ]
|
|
202
|
+
- [ ] Mapped the relevant execution path
|
|
203
|
+
- [ ] Enumerated concrete failure points with causal tracking, evidence collection, and causeCategory determination for each
|
|
176
204
|
- [ ] Determined impactScope and recurrenceRisk
|
|
177
205
|
- [ ] Documented unexplored areas and investigation limitations
|
|
178
206
|
- [ ] Final response is the JSON output
|
|
179
207
|
|
|
180
208
|
## Output Self-Check
|
|
181
|
-
- [ ] Multiple
|
|
182
|
-
- [ ] User's causal relationship hints are reflected in the
|
|
183
|
-
- [ ] All contradicting evidence is addressed with adjusted
|
|
209
|
+
- [ ] Multiple plausible failure points were preserved when evidence supported them
|
|
210
|
+
- [ ] User's causal relationship hints are reflected in the path map or failure points
|
|
211
|
+
- [ ] All contradicting evidence is addressed with adjusted evidence strength or scope notes
|
|
184
212
|
|
|
185
213
|
## Completion Gate [BLOCKING]
|
|
186
214
|
|
|
@@ -48,7 +48,31 @@ Use the appropriate run command based on the `packageManager` field in package.j
|
|
|
48
48
|
|
|
49
49
|
### Environment-Aware Quality Assurance
|
|
50
50
|
|
|
51
|
-
**Step 1:
|
|
51
|
+
**Step 1: Incomplete Implementation Check**
|
|
52
|
+
Before any frontend quality checks, inspect only the current task scope for incomplete implementation.
|
|
53
|
+
|
|
54
|
+
Task scope for this check:
|
|
55
|
+
- primary scope: `filesModified` or the current task's write set when the orchestrator provides it
|
|
56
|
+
- fallback scope: the current uncommitted diff only when no task-scoped file list is available
|
|
57
|
+
|
|
58
|
+
Evaluate changed frontend code in this order:
|
|
59
|
+
1. Explicit unfinished markers:
|
|
60
|
+
- `TODO`, `FIXME`, `placeholder`, `stub`, `temporary`, `not implemented`
|
|
61
|
+
2. Missing required UI behavior:
|
|
62
|
+
- empty event handler, effect, reducer branch, or render branch where the task requires concrete behavior
|
|
63
|
+
3. Placeholder UI/data behavior with no task-level justification:
|
|
64
|
+
- hard-coded fallback state used instead of the required interaction flow
|
|
65
|
+
- placeholder loading/error/success branch used instead of the required UI behavior
|
|
66
|
+
|
|
67
|
+
Treat the following as allowed patterns:
|
|
68
|
+
- intentional fixtures, mocks, and story/demo scaffolding
|
|
69
|
+
- framework-required placeholder shells when the task explicitly requests scaffolding
|
|
70
|
+
- fallback UI states that the Design Doc, task file, or existing behavior explicitly requires
|
|
71
|
+
- comments about future enhancements outside the current task scope when the requested UI behavior is already complete
|
|
72
|
+
|
|
73
|
+
If incomplete implementation is detected, stop immediately and return `status: "stub_detected"` with the affected files and reasons. Proceed to lint, type-check, build, and tests only after this check passes.
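The first category of this check, explicit unfinished markers within the resolved task scope, can be approximated mechanically. The sketch below is only an illustration: `resolve_scope` and `scan_markers` are hypothetical names, file contents are passed in as strings rather than read from disk, and case-insensitive whole-word matching is an assumption.

```python
import re

MARKERS = ("TODO", "FIXME", "placeholder", "stub", "temporary", "not implemented")
# Case-insensitive, whole-word matching is an assumption of this sketch.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in MARKERS) + r")\b", re.IGNORECASE
)


def resolve_scope(files_modified, uncommitted_diff_files):
    """Prefer the task-scoped file list; fall back to the uncommitted diff."""
    return list(files_modified) if files_modified else list(uncommitted_diff_files)


def scan_markers(files: dict[str, str], scope: list[str]) -> list[dict]:
    """Report the first marker on each line of every in-scope file."""
    findings = []
    for path in scope:
        for line_no, line in enumerate(files.get(path, "").splitlines(), start=1):
            match = PATTERN.search(line)
            if match:
                findings.append(
                    {
                        "file": path,
                        "indicator": f"{match.group(1)} marker",
                        "details": f"line {line_no}: {line.strip()}",
                    }
                )
    return findings
```

A textual scan like this over-reports, for example on an HTML `placeholder` attribute, so its hits are candidates that still need to be weighed against the allowed patterns listed above.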
|
|
74
|
+
|
|
75
|
+
**Step 2: Detect Quality Check Commands**
|
|
52
76
|
```bash
|
|
53
77
|
# Auto-detect from project manifest files
|
|
54
78
|
# Identify project structure and extract quality commands:
|
|
@@ -57,23 +81,24 @@ Use the appropriate run command based on the `packageManager` field in package.j
|
|
|
57
81
|
# - Build configuration → extract build/check commands
|
|
58
82
|
```
|
|
59
83
|
|
|
60
|
-
**Step
|
|
84
|
+
**Step 3: Execute Quality Checks**
|
|
61
85
|
Follow the principles in ai-development-guide skill "Quality Check Workflow" section:
|
|
62
86
|
- Basic checks (lint, format, build)
|
|
63
87
|
- Tests (unit, integration, React Testing Library)
|
|
64
88
|
- Final gate (all must pass)
|
|
65
89
|
|
|
66
|
-
**Step
|
|
90
|
+
**Step 4: Fix Errors**
|
|
67
91
|
Apply fixes following the principles in coding-rules skill and testing skill.
|
|
68
92
|
|
|
69
|
-
**Step
|
|
93
|
+
**Step 5: Repeat Until Approved**
|
|
70
94
|
- Address all errors in each phase before proceeding to next phase
|
|
71
95
|
- Error found → Fix immediately → Re-run checks
|
|
72
|
-
- All pass → proceed to Step
|
|
73
|
-
- Cannot determine spec → proceed to Step
|
|
96
|
+
- All pass → proceed to Step 6
|
|
97
|
+
- Cannot determine spec → proceed to Step 6 with `blocked` status
|
|
74
98
|
|
|
75
|
-
**Step
|
|
99
|
+
**Step 6: Return JSON Result**
|
|
76
100
|
Return one of the following as the final response (see Output Format for schemas):
|
|
101
|
+
- `status: "stub_detected"` — incomplete implementation found in changed code
|
|
77
102
|
- `status: "approved"` — all quality checks pass
|
|
78
103
|
- `status: "blocked"` — specification unclear or execution prerequisites are missing
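The ordering of the three statuses can be summarised in a few lines. This is a sketch under assumptions: `determine_status` is a hypothetical helper, it is called only once the fix loop has finished, and raising an error for still-failing checks is a choice made for this illustration rather than behaviour the agent definition specifies.

```python
def determine_status(stub_findings: list, spec_clear: bool, checks_passed: bool) -> str:
    """Pick the final status in the order the steps above apply their gates."""
    # Step 1 runs before any quality check, so stub findings win outright.
    if stub_findings:
        return "stub_detected"
    # An unclear specification ends the loop with a blocked result.
    if not spec_clear:
        return "blocked"
    if checks_passed:
        return "approved"
    # Failing checks are not a final status: the fix loop should continue.
    raise RuntimeError("checks still failing: keep fixing and re-running")
```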
|
|
79
104
|
|
|
@@ -105,6 +130,11 @@ Return one of the following as the final response (see Output Format for schemas
|
|
|
105
130
|
|
|
106
131
|
## Status Determination Criteria (Binary Determination)
|
|
107
132
|
|
|
133
|
+
### stub_detected (Incomplete implementation found)
|
|
134
|
+
- Changed frontend code contains placeholder logic, deferred required interactions, or stub UI/data behavior
|
|
135
|
+
- The issue is detected before lint/build/test execution
|
|
136
|
+
- The next action is to route the task back to task-executor-frontend for completion
|
|
137
|
+
|
|
108
138
|
### approved (All quality checks pass)
|
|
109
139
|
- All tests pass (React Testing Library)
|
|
110
140
|
- Build succeeds with zero type errors
|
|
@@ -143,6 +173,22 @@ Before setting status to blocked, confirm specifications in this order:
|
|
|
143
173
|
|
|
144
174
|
### Internal Structured Response (for Main AI)
|
|
145
175
|
|
|
176
|
+
**When incomplete implementation is detected**:
|
|
177
|
+
```json
|
|
178
|
+
{
|
|
179
|
+
"status": "stub_detected",
|
|
180
|
+
"summary": "Incomplete frontend implementation detected in changed code before quality checks.",
|
|
181
|
+
"stubFindings": [
|
|
182
|
+
{
|
|
183
|
+
"file": "src/components/CheckoutButton.tsx",
|
|
184
|
+
"indicator": "placeholder handler",
|
|
185
|
+
"details": "onClick handler still contains placeholder logic for required submission flow"
|
|
186
|
+
}
|
|
187
|
+
],
|
|
188
|
+
"nextActions": "Return to task-executor-frontend and complete the implementation before re-running quality-fixer-frontend."
|
|
189
|
+
}
|
|
190
|
+
```
|
|
191
|
+
|
|
146
192
|
**When quality check succeeds**:
|
|
147
193
|
```json
|
|
148
194
|
{
|
|
@@ -254,7 +300,7 @@ This is intermediate output only. The final response must be the JSON result (St
|
|
|
254
300
|
|
|
255
301
|
## Completion Criteria
|
|
256
302
|
|
|
257
|
-
- [ ] Final response is a single JSON with status `approved
|
|
303
|
+
- [ ] Final response is a single JSON with status `stub_detected`, `approved`, or `blocked`
|
|
258
304
|
|
|
259
305
|
## Important Principles
|
|
260
306
|
|
|
@@ -45,7 +45,32 @@ Skill Status:
|
|
|
45
45
|
|
|
46
46
|
### Environment-Aware Quality Assurance
|
|
47
47
|
|
|
48
|
-
**Step 1:
|
|
48
|
+
**Step 1: Incomplete Implementation Check**
|
|
49
|
+
Before any quality checks, inspect only the current task scope for incomplete implementation.
|
|
50
|
+
|
|
51
|
+
Task scope for this check:
|
|
52
|
+
- primary scope: `filesModified` or the current task's write set when the orchestrator provides it
|
|
53
|
+
- fallback scope: the current uncommitted diff only when no task-scoped file list is available
|
|
54
|
+
|
|
55
|
+
Evaluate changed code in this order:
|
|
56
|
+
1. Explicit unfinished markers:
|
|
57
|
+
- `TODO`, `FIXME`, `placeholder`, `stub`, `temporary`, `not implemented`
|
|
58
|
+
2. Missing required implementation body:
|
|
59
|
+
- empty method/function body where the task requires concrete logic
|
|
60
|
+
- empty event/handler branch where the task requires behavior
|
|
61
|
+
3. Placeholder behavior with no task-level justification:
|
|
62
|
+
- constant sentinel return used instead of required business logic
|
|
63
|
+
- pass-through mock or fallback path used in production code instead of the required behavior
|
|
64
|
+
|
|
65
|
+
Treat the following as allowed patterns:
|
|
66
|
+
- intentional test doubles, fixtures, and test-only helpers
|
|
67
|
+
- framework-required scaffolding when the task explicitly requests scaffolding
|
|
68
|
+
- `null`, `[]`, `{}`, or fallback values when the Design Doc, task file, or existing behavior explicitly requires them
|
|
69
|
+
- comments about future work outside the current task scope when the requested behavior is already complete
|
|
70
|
+
|
|
71
|
+
If incomplete implementation is detected, stop immediately and return `status: "stub_detected"` with the affected files and reasons. Proceed to lint, build, and tests only after this check passes.
|
|
72
|
+
|
|
73
|
+
**Step 2: Detect Quality Check Commands**
|
|
49
74
|
```bash
|
|
50
75
|
# Auto-detect from project manifest files
|
|
51
76
|
# Identify project structure and extract quality commands:
|
|
@@ -54,28 +79,34 @@ Skill Status:
|
|
|
54
79
|
# - Build configuration → extract build/check commands
|
|
55
80
|
```
|
|
56
81
|
|
|
57
|
-
**Step
|
|
82
|
+
**Step 3: Execute Quality Checks**
|
|
58
83
|
Follow the principles in ai-development-guide skill "Quality Check Workflow" section:
|
|
59
84
|
- Basic checks (lint, format, build)
|
|
60
85
|
- Tests (unit, integration)
|
|
61
86
|
- Final gate (all must pass)
|
|
62
87
|
|
|
63
|
-
**Step
|
|
88
|
+
**Step 4: Fix Errors**
|
|
64
89
|
Apply fixes following the principles in coding-rules skill and testing skill.
|
|
65
90
|
|
|
66
|
-
**Step
|
|
91
|
+
**Step 5: Repeat Until Approved**
|
|
67
92
|
- Address all errors in each phase before proceeding to next phase
|
|
68
93
|
- Error found → Fix immediately → Re-run checks
|
|
69
|
-
- All pass → proceed to Step
|
|
70
|
-
- Cannot determine spec → proceed to Step
|
|
94
|
+
- All pass → proceed to Step 6
|
|
95
|
+
- Cannot determine spec → proceed to Step 6 with `blocked` status
|
|
71
96
|
|
|
72
|
-
**Step
|
|
97
|
+
**Step 6: Return JSON Result**
|
|
73
98
|
Return one of the following as the final response (see Output Format for schemas):
|
|
99
|
+
- `status: "stub_detected"` — incomplete implementation found in changed code
|
|
74
100
|
- `status: "approved"` — all quality checks pass
|
|
75
101
|
- `status: "blocked"` — specification unclear or execution prerequisites are missing
|
|
76
102
|
|
|
77
103
|
## Status Determination Criteria (Binary Determination)
|
|
78
104
|
|
|
105
|
+
### stub_detected (Incomplete implementation found)
|
|
106
|
+
- Changed code contains placeholder logic, deferred required work, or stub return values that indicate implementation is not complete
|
|
107
|
+
- The issue is detected before lint/build/test execution
|
|
108
|
+
- The next action is to route the task back to task-executor for completion
|
|
109
|
+
|
|
79
110
|
### approved (All quality checks pass)
|
|
80
111
|
- All tests pass
|
|
81
112
|
- Build succeeds
|
|
@@ -106,6 +137,22 @@ Return one of the following as the final response (see Output Format for schemas
|
|
|
106
137
|
|
|
107
138
|
### Internal Structured Response
|
|
108
139
|
|
|
140
|
+
**When incomplete implementation is detected**:
|
|
141
|
+
```json
|
|
142
|
+
{
|
|
143
|
+
"status": "stub_detected",
|
|
144
|
+
"summary": "Incomplete implementation detected in changed code before quality checks.",
|
|
145
|
+
"stubFindings": [
|
|
146
|
+
{
|
|
147
|
+
"file": "src/example.ts",
|
|
148
|
+
"indicator": "TODO marker",
|
|
149
|
+
"details": "TODO comment defers required business logic in the task scope"
|
|
150
|
+
}
|
|
151
|
+
],
|
|
152
|
+
"nextActions": "Return to task-executor and complete the implementation before re-running quality-fixer."
|
|
153
|
+
}
|
|
154
|
+
```
|
|
155
|
+
|
|
109
156
|
**When quality check succeeds**:
|
|
110
157
|
```json
|
|
111
158
|
{
|
|
@@ -224,7 +271,7 @@ This is intermediate output only. The final response must be the JSON result (St
|
|
|
224
271
|
|
|
225
272
|
## Completion Criteria
|
|
226
273
|
|
|
227
|
-
- [ ] Final response is a single JSON with status `approved
|
|
274
|
+
- [ ] Final response is a single JSON with status `stub_detected`, `approved`, or `blocked`
|
|
228
275
|
|
|
229
276
|
## Important Principles
|
|
230
277
|
|
|
@@ -36,9 +36,9 @@ Skill Status:
|
|
|
36
36
|
## Input and Responsibility Boundaries
|
|
37
37
|
|
|
38
38
|
- **Input**: Structured conclusion (JSON) or text format conclusion
|
|
39
|
-
- **Text format**: Extract
|
|
40
|
-
- **No conclusion**: If
|
|
41
|
-
- **Out of scope**: Cause investigation and
|
|
39
|
+
- **Text format**: Extract failure points and coverage status. Assume `partial` coverage if not specified
|
|
40
|
+
- **No conclusion**: If a failure point is obvious, present solutions as "estimated failure point" with partial coverage; if unclear, report "Cannot derive solutions due to unidentified cause"
|
|
41
|
+
- **Out of scope**: Cause investigation and failure-point verification are handled by other agents
|
|
42
42
|
|
|
43
43
|
## Output Scope
|
|
44
44
|
|
|
@@ -53,27 +53,29 @@ This agent outputs **solution derivation and recommendation presentation**. Proc
|
|
|
53
53
|
|
|
54
54
|
## Execution Steps
|
|
55
55
|
|
|
56
|
-
### Step 1:
|
|
56
|
+
### Step 1: Failure Point Understanding and Input Validation
|
|
57
57
|
|
|
58
58
|
**For JSON format**:
|
|
59
|
-
- Confirm
|
|
60
|
-
- Confirm
|
|
61
|
-
- Confirm
|
|
59
|
+
- Confirm failure points (may be multiple) from `conclusion.confirmedFailurePoints`
|
|
60
|
+
- Confirm failure-point relationships from `conclusion.failurePointRelationships`
|
|
61
|
+
- Confirm coverage assessment from `conclusion.coverageAssessment`
|
|
62
62
|
|
|
63
|
-
**
|
|
64
|
-
- independent: Derive separate solution for each
|
|
65
|
-
-
|
|
66
|
-
-
|
|
63
|
+
**Failure Point Relationship Handling**:
|
|
64
|
+
- independent: Derive separate solution for each failure point
|
|
65
|
+
- upstream_of: Prioritize the upstream failure point before downstream fixes
|
|
66
|
+
- downstream_of: Verify whether the upstream failure point should be fixed first
|
|
67
|
+
- amplifies: Consider a combined mitigation or staged fix because one failure point worsens another
|
|
68
|
+
- same_boundary: Consider a shared boundary fix or compatibility-layer fix
|
|
67
69
|
|
|
68
70
|
**For text format**:
|
|
69
|
-
- Extract
|
|
70
|
-
- Look for
|
|
71
|
+
- Extract failure-point-related descriptions
|
|
72
|
+
- Look for coverage or uncertainty mentions (assume `partial` if not found)
|
|
71
73
|
- Look for uncertainty-related descriptions
|
|
72
74
|
|
|
73
75
|
**User Report Consistency Check**:
|
|
74
76
|
- Example: "I changed A and B broke" → Does the conclusion explain that causal relationship?
|
|
75
77
|
- Example: "The implementation is wrong" → Does the conclusion include design-level issues?
|
|
76
|
-
- If inconsistent, add "Possible need to reconsider the
|
|
78
|
+
- If inconsistent, add "Possible need to reconsider the identified failure point" to residualRisks
|
|
77
79
|
|
|
78
80
|
**Approach Selection Based on impactAnalysis**:
|
|
79
81
|
- impactScope empty, recurrenceRisk: low → Direct fix only
|
|
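Reviewer note: the new "Failure Point Relationship Handling" list assigns one handling rule to each relationship value. The sketch below expresses that as a lookup table, plus a hypothetical helper, `upstreamCandidates`, that collects the failure points to consider fixing first. It assumes an entry `{from, to, relationship}` reads as "`from` is `relationship` `to`". The diff does not state the direction, so treat that reading as an assumption.

```typescript
type Relationship =
  | "independent"
  | "upstream_of"
  | "downstream_of"
  | "amplifies"
  | "same_boundary";

interface FailurePointRelationship {
  from: string;
  to: string;
  relationship: Relationship;
}

// Guidance text taken from the relationship-handling list in the diff.
const HANDLING: Record<Relationship, string> = {
  independent: "Derive separate solution for each failure point",
  upstream_of: "Prioritize the upstream failure point before downstream fixes",
  downstream_of: "Verify whether the upstream failure point should be fixed first",
  amplifies: "Consider a combined mitigation or staged fix",
  same_boundary: "Consider a shared boundary fix or compatibility-layer fix",
};

// Hypothetical helper: IDs of failure points that sit upstream of another one.
// Assumes "from upstream_of to" means `from` is the upstream point, and
// "from downstream_of to" means `to` is the upstream point.
function upstreamCandidates(rels: FailurePointRelationship[]): string[] {
  const ids: string[] = [];
  for (const r of rels) {
    let candidate: string | null = null;
    if (r.relationship === "upstream_of") {
      candidate = r.from;
    } else if (r.relationship === "downstream_of") {
      candidate = r.to;
    }
    if (candidate !== null && ids.indexOf(candidate) === -1) {
      ids.push(candidate);
    }
  }
  return ids;
}
```

For `downstream_of`, the source only says to verify whether the upstream point should go first, so the helper returns candidates for review and not a fixed execution order.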
@@ -85,8 +87,8 @@ Generate at least 3 solutions from the following perspectives:
|
|
|
85
87
|
|
|
86
88
|
| Type | Definition | Application |
|
|
87
89
|
|------|------------|-------------|
|
|
88
|
-
| direct | Directly fix the
|
|
89
|
-
| workaround | Alternative approach avoiding the
|
|
90
|
+
| direct | Directly fix the failure point | When the failure point is clear and certainty is high |
|
|
91
|
+
| workaround | Alternative approach avoiding the failure point | When fixing the failure point is difficult or high-risk |
|
|
90
92
|
| mitigation | Measures to reduce impact | Temporary measure while waiting for root fix |
|
|
91
93
|
| fundamental | Comprehensive fix including recurrence prevention | When similar problems have occurred repeatedly |
|
|
92
94
|
|
|
@@ -106,10 +108,10 @@ Evaluate each solution on the following axes:
|
|
|
106
108
|
| certainty | Degree of certainty in solving the problem |
|
|
107
109
|
|
|
108
110
|
### Step 4: Recommendation Selection
|
|
109
|
-
Recommendation strategy based on
|
|
110
|
-
-
|
|
111
|
-
-
|
|
112
|
-
-
|
|
111
|
+
Recommendation strategy based on coverage assessment:
|
|
112
|
+
- sufficient: Consider direct fixes and fundamental solutions
|
|
113
|
+
- partial: Prefer staged approach, verify with low-impact fixes before full implementation
|
|
114
|
+
- insufficient: Start with conservative mitigation and highlight additional verification needs
|
|
113
115
|
|
|
114
116
|
### Step 5: Implementation Steps Creation
|
|
115
117
|
- Each step independently verifiable
|
|
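Reviewer note: Step 4 now keys the recommendation strategy on the coverage assessment, and the input rules say to assume `partial` when coverage is not specified. A minimal sketch of both rules follows. The function names `normalizeCoverage` and `recommendationStrategy` are hypothetical, and the strategy strings are copied from the diff.

```typescript
type Coverage = "sufficient" | "partial" | "insufficient";

// Fall back to "partial" when coverage is missing or unrecognised,
// following the text-format input rule in the diff.
function normalizeCoverage(value?: string): Coverage {
  if (value === "sufficient" || value === "partial" || value === "insufficient") {
    return value;
  }
  return "partial";
}

// Map a coverage assessment to the Step 4 recommendation strategy.
function recommendationStrategy(coverage: Coverage): string {
  switch (coverage) {
    case "sufficient":
      return "Consider direct fixes and fundamental solutions";
    case "partial":
      return "Prefer staged approach, verify with low-impact fixes before full implementation";
    case "insufficient":
      return "Start with conservative mitigation and highlight additional verification needs";
  }
}
```

Keeping the fallback in one function makes the "assume `partial`" default explicit, so JSON and text inputs end up on the same strategy path.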
@@ -126,11 +128,13 @@ Return the JSON result as the final response. See Output Format for the schema.
|
|
|
126
128
|
```json
|
|
127
129
|
{
|
|
128
130
|
"inputSummary": {
|
|
129
|
-
"
|
|
130
|
-
{"
|
|
131
|
+
"identifiedFailurePoints": [
|
|
132
|
+
{"failurePointId": "FP1", "description": "Failure point description", "status": "confirmed|probable|possible"}
|
|
131
133
|
],
|
|
132
|
-
"
|
|
133
|
-
|
|
134
|
+
"failurePointRelationships": [
|
|
135
|
+
{"from": "FP1", "to": "FP2", "relationship": "independent|upstream_of|downstream_of|amplifies|same_boundary"}
|
|
136
|
+
],
|
|
137
|
+
"coverageAssessment": "sufficient|partial|insufficient"
|
|
134
138
|
},
|
|
135
139
|
"solutions": [
|
|
136
140
|
{
|
|
@@ -192,7 +196,7 @@ Return the JSON result as the final response. See Output Format for the schema.
|
|
|
192
196
|
## Output Self-Check
|
|
193
197
|
- [ ] Solution addresses the user's reported symptoms (not just the technical conclusion)
|
|
194
198
|
- [ ] Input conclusion consistency with user report was verified before solution derivation
|
|
195
|
-
- [ ] Contradicting evidence discovered during solution design is addressed with adjusted
|
|
199
|
+
- [ ] Contradicting evidence discovered during solution design is addressed with adjusted coverage assumptions
|
|
196
200
|
|
|
197
201
|
## Completion Gate [BLOCKING]
|
|
198
202
|
|