codexspec 0.5.14__tar.gz → 0.5.16__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (64)
  1. {codexspec-0.5.14 → codexspec-0.5.16}/PKG-INFO +1 -1
  2. {codexspec-0.5.14 → codexspec-0.5.16}/pyproject.toml +1 -1
  3. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/plan-to-tasks.md +42 -0
  4. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-plan.md +130 -9
  5. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-python-code.md +116 -8
  6. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-react-code.md +133 -9
  7. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-spec.md +126 -9
  8. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-tasks.md +128 -9
  9. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/spec-to-plan.md +42 -0
  10. {codexspec-0.5.14 → codexspec-0.5.16}/.gitignore +0 -0
  11. {codexspec-0.5.14 → codexspec-0.5.16}/LICENSE +0 -0
  12. {codexspec-0.5.14 → codexspec-0.5.16}/README.md +0 -0
  13. {codexspec-0.5.14 → codexspec-0.5.16}/codexspec-icon.svg +0 -0
  14. {codexspec-0.5.14 → codexspec-0.5.16}/codexspec-logo-dark.svg +0 -0
  15. {codexspec-0.5.14 → codexspec-0.5.16}/codexspec-logo-light.svg +0 -0
  16. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/check-i18n-completeness.sh +0 -0
  17. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/check-i18n-structure.sh +0 -0
  18. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/check-prerequisites.sh +0 -0
  19. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/common.sh +0 -0
  20. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/create-new-feature.sh +0 -0
  21. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/powershell/check-prerequisites.ps1 +0 -0
  22. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/powershell/common.ps1 +0 -0
  23. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/powershell/create-new-feature.ps1 +0 -0
  24. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/.claude/settings.local.json +0 -0
  25. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/README.md +0 -0
  26. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/claude_ctl.py +0 -0
  27. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/claude_monitor.py +0 -0
  28. {codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/notify_telegram.py +0 -0
  29. {codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/__init__.py +0 -0
  30. {codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/commands/__init__.py +0 -0
  31. {codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/commands/installer.py +0 -0
  32. {codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/i18n.py +0 -0
  33. {codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/idea.md +0 -0
  34. {codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/translator.py +0 -0
  35. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/analyze.md +0 -0
  36. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/check-i18n-semantics.md +0 -0
  37. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/checklist.md +0 -0
  38. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/clarify.md +0 -0
  39. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/commit-staged.md +0 -0
  40. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/config.md +0 -0
  41. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/constitution.md +0 -0
  42. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/generate-spec.md +0 -0
  43. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/implement-tasks.md +0 -0
  44. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/pr.md +0 -0
  45. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/quick.md +0 -0
  46. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/specify.md +0 -0
  47. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/tasks-to-issues.md +0 -0
  48. {codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/translate-docs.md +0 -0
  49. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/checklist-template.md +0 -0
  50. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/constitution-template.md +0 -0
  51. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/plan-template-detailed.md +0 -0
  52. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/plan-template-simple.md +0 -0
  53. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/spec-template-detailed.md +0 -0
  54. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/spec-template-simple.md +0 -0
  55. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/tasks-template-detailed.md +0 -0
  56. {codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/tasks-template-simple.md +0 -0
  57. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/de.json +0 -0
  58. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/en.json +0 -0
  59. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/es.json +0 -0
  60. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/fr.json +0 -0
  61. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/ja.json +0 -0
  62. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/ko.json +0 -0
  63. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/pt-BR.json +0 -0
  64. {codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/zh-CN.json +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: codexspec
- Version: 0.5.14
+ Version: 0.5.16
  Summary: CodexSpec - A Spec-Driven Development (SDD) toolkit for Claude Code
  Project-URL: Homepage, https://github.com/Zts0hg/codexspec
  Project-URL: Repository, https://github.com/Zts0hg/codexspec
@@ -1,6 +1,6 @@
  [project]
  name = "codexspec"
- version = "0.5.14"
+ version = "0.5.16"
  description = "CodexSpec - A Spec-Driven Development (SDD) toolkit for Claude Code"
  readme = "README.md"
  requires-python = ">=3.11"
@@ -29,6 +29,48 @@ You are acting as a **Technical Lead**. Your responsibility is to transform tech
 
  Analyze the provided spec and plan documents, then break down the technical implementation plan into specific, actionable tasks.
 
+ ### Quality Targets
+
+ Before generating the task breakdown, internalize these quality targets. They are aligned with the `review-tasks` scoring rubrics to ensure first-pass quality.
+
+ #### Plan Coverage (Target: ≥ 90)
+
+ - [ ] Every plan phase has corresponding tasks
+ - [ ] Every module/component has creation and implementation tasks
+ - [ ] Every API endpoint has an implementation task (if applicable)
+ - [ ] Every data model has an implementation task (if applicable)
+ - [ ] Testing tasks are included per constitution TDD requirements
+
+ #### TDD Compliance (Target: ≥ 90)
+
+ - [ ] Every code component has a test task that precedes its implementation task
+ - [ ] Test tasks are never marked as optional
+ - [ ] Integration tests are included where appropriate
+ - [ ] Test file paths follow project testing conventions
+
+ #### Dependency & Ordering (Target: ≥ 90)
+
+ - [ ] All dependencies between tasks are explicitly declared
+ - [ ] No circular dependencies exist
+ - [ ] Foundation/setup tasks are placed first
+ - [ ] Dependencies execute before dependents in the execution order
+
+ #### Task Granularity (Target: ≥ 90)
+
+ - [ ] Each task involves only ONE primary file
+ - [ ] Each task has a clear, single deliverable
+ - [ ] Tasks are neither too broad (should be split) nor too narrow (should be combined)
+ - [ ] Complexity estimates are reasonable
+
+ #### Parallelization & Files (Target: ≥ 90)
+
+ - [ ] Truly independent tasks are marked with `[P]`
+ - [ ] Dependent tasks are NOT marked `[P]`
+ - [ ] All tasks have file paths specified
+ - [ ] File paths follow project naming conventions
+
+ > **Self-Check**: After generating the tasks, verify each target above is met before saving. This reduces review iterations.
+
  ### Critical Requirements
 
  1. **Task Granularity**: Each task should involve modifying or creating **only one primary file**. Avoid broad tasks like "implement all features".
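The Dependency & Ordering and Parallelization & Files targets in the plan-to-tasks hunk above lend themselves to a mechanical check. The following is a minimal sketch, not part of codexspec: it assumes a hypothetical in-memory `Task` record (an ID, dependency IDs, and a `parallel` flag standing in for the `[P]` marker), and it reads "Dependent tasks are NOT marked `[P]`" as "a `[P]` task must not depend directly on another `[P]` task". Cycle detection uses the standard library's `graphlib` (available since Python 3.9, so compatible with the package's `requires-python = ">=3.11"`).

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative, not codexspec's format."""
    task_id: str
    depends_on: list[str] = field(default_factory=list)
    parallel: bool = False  # stands in for the `[P]` marker


def check_tasks(tasks: list[Task]) -> list[str]:
    """Return problems found in an ordered task list (empty list if none)."""
    problems: list[str] = []
    by_id = {t.task_id: t for t in tasks}
    position = {t.task_id: i for i, t in enumerate(tasks)}

    for task in tasks:
        for dep in task.depends_on:
            if dep not in by_id:
                problems.append(f"{task.task_id}: unknown dependency {dep}")
                continue
            # Dependencies must come before dependents in the execution order.
            if position[dep] > position[task.task_id]:
                problems.append(f"{task.task_id}: listed before its dependency {dep}")
            # Two [P] tasks linked by a dependency cannot actually run in parallel.
            if task.parallel and by_id[dep].parallel:
                problems.append(f"{task.task_id}: marked [P] but depends on [P] task {dep}")

    # No circular dependencies: static_order() raises CycleError when the graph has one.
    graph = {t.task_id: [d for d in t.depends_on if d in by_id] for t in tasks}
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the detected cycle as a list of nodes.
        problems.append(f"circular dependency: {' -> '.join(err.args[1])}")

    return problems
```

A setup task `T001` followed by two `[P]` tasks that both depend only on `T001` passes cleanly; swapping the order of two dependent tasks, or making them depend on each other, produces the corresponding messages.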
@@ -88,6 +88,98 @@ Review the technical implementation plan for quality and readiness. This command
  - [ ] Naming conventions are followed (if specified)
  - [ ] Testing requirements are addressed
 
+ ### Scoring Rubrics
+
+ Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+
+ #### Spec Alignment (30%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All functional requirements, user stories, and NFRs have clear implementation coverage; edge cases addressed |
+ | 70-89 | Most requirements covered; 1-2 minor gaps in NFR or edge case coverage |
+ | 50-69 | Several requirements only partially covered; missing implementation for key user stories |
+ | Below 50 | Major requirements missing from plan; significant gaps between spec and plan |
+
+ **Typical Deductions**:
+
+ - Functional requirement with no implementation: -15 each
+ - User story without technical coverage: -10 each
+ - NFR not addressed in architecture: -8 each
+ - Edge case from spec not handled: -5 each
+
+ #### Tech Stack (15%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All technologies clearly defined with versions; choices well-justified and appropriate for requirements |
+ | 70-89 | Tech stack defined; minor version gaps; mostly appropriate choices |
+ | 50-69 | Incomplete stack definition; some questionable technology choices |
+ | Below 50 | Vague or missing tech stack; inappropriate choices for requirements |
+
+ **Typical Deductions**:
+
+ - Technology without version constraint: -5 each
+ - Unjustified technology choice: -10 each
+ - Missing critical category (e.g., no testing framework): -10
+
+ #### Architecture Quality (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Clear diagrams; well-defined module responsibilities; proper separation of concerns; dependency graph complete |
+ | 70-89 | Good architecture; minor gaps in documentation; mostly clear module boundaries |
+ | 50-69 | Architecture outlined but vague; unclear module responsibilities; missing dependency graph |
+ | Below 50 | No clear architecture; modules poorly defined; significant design issues |
+
+ **Typical Deductions**:
+
+ - Missing architecture diagram: -15
+ - Module without clear responsibility: -8 each
+ - Missing dependency graph: -10
+ - Tight coupling between modules: -8 each
+ - Missing separation of concerns: -10
+
+ #### Phase Planning (15%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Phases logically ordered; clear deliverables per phase; realistic scope; minimal inter-phase dependencies |
+ | 70-89 | Good phasing; 1-2 phases with unclear deliverables or slightly large scope |
+ | 50-69 | Phase ordering has issues; several phases lack clear deliverables |
+ | Below 50 | No meaningful phase breakdown; deliverables unclear; unrealistic scope |
+
+ **Typical Deductions**:
+
+ - Phase without clear deliverables: -10 each
+ - Illogical phase ordering: -10
+ - Overly large phase scope: -5 each
+ - Missing phase dependencies: -5
+
+ #### Constitution Alignment (15%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Fully aligned with all constitution principles; architecture principles followed; testing requirements addressed |
+ | 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+ | 50-69 | Partial alignment; several principles not addressed |
+ | Below 50 | Significant violations or disregard of constitution |
+
+ > **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+
+ **Typical Deductions**:
+
+ - Constitution principle not addressed: -10 per principle
+ - Direct violation of a constitution principle: -20 per violation
+
+ #### Suggestion Score Cap Rule
+
+ **IMPORTANT**: Suggestions (Nice to Have) items may deduct a **maximum of 5 points** from the total score. After resolving all Critical Issues and Warnings, the score should be **≥ 95**.
+
+ - Critical Issues: -10 to -20 points each
+ - Warnings: -5 to -10 points each
+ - Suggestions: -1 to -2 points each, **capped at 5 points total**
+
  ### Report Template
 
  ```markdown
@@ -207,14 +299,16 @@ Review the technical implementation plan for quality and readiness. This command
 
  ## Scoring Breakdown
 
- | Category | Weight | Score | Weighted |
- |----------|--------|-------|----------|
- | Spec Alignment | 30% | X/100 | X |
- | Tech Stack | 15% | X/100 | X |
- | Architecture Quality | 25% | X/100 | X |
- | Phase Planning | 15% | X/100 | X |
- | Constitution Alignment | 15% | X/100 | X |
- | **Total** | **100%** | | **X/100** |
+ | Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+ |----------|--------|-------|-------------|-------------------|----------|
+ | Spec Alignment | 30% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "REQ-003 not addressed: -15"] | X |
+ | Tech Stack | 15% | X/100 | [Which rubric range applies] | [e.g., "No version for DB: -5"] | X |
+ | Architecture Quality | 25% | X/100 | [Which rubric range applies] | [e.g., "Missing dependency graph: -10"] | X |
+ | Phase Planning | 15% | X/100 | [Which rubric range applies] | [e.g., "Phase 2 scope too large: -5"] | X |
+ | Constitution Alignment | 15% | X/100 | [Which rubric range applies] | [e.g., "All principles addressed"] | X |
+ | **Total** | **100%** | | | | **X/100** |
+
+ > **Suggestion Cap**: Suggestions deducted X/5 points (cap: 5 points max)
 
  ## Recommendations
 
@@ -245,6 +339,33 @@ Based on the review result, the user may consider:
  - **Fail**: `/codexspec:spec-to-plan` - to regenerate the technical plan
  ```
 
+ ### Score Validation Checklist
+
+ Before finalizing scores, the reviewer MUST verify:
+
+ - [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+ - [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+ - [ ] Weighted total = sum of (category score × weight) for all categories
+ - [ ] Suggestion deductions do not exceed 5-point cap
+ - [ ] No "phantom deductions" (deductions without matching issues)
+ - [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+
+ ### Score Challenge Response Protocol
+
+ When a user questions or challenges the score, follow this three-step process:
+
+ 1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+
+ 2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+
+ 3. **Targeted Re-evaluation**: For each challenged item:
+ - Re-read the relevant section of the plan
+ - Re-apply the rubric criteria objectively
+ - If the original score was correct: explain the reasoning and maintain the score
+ - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+
+ > **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
+
  ### Quality Criteria
 
  Before completing the review, verify:
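The review-plan checklist defines the total as the sum of category score × weight, and the Constitution Alignment note says that category's weight is redistributed proportionally when no constitution exists. Below is a minimal sketch of that arithmetic using the review-plan weights from the rubric; `weighted_total` and `PLAN_WEIGHTS` are hypothetical names. It assumes one reading of the note: the constitution category is dropped from the sum and each remaining weight is divided by the remaining weight total, so the displayed default of 100 carries no weight.

```python
# Category weights from the review-plan rubric (fractions of 1).
PLAN_WEIGHTS: dict[str, float] = {
    "Spec Alignment": 0.30,
    "Tech Stack": 0.15,
    "Architecture Quality": 0.25,
    "Phase Planning": 0.15,
    "Constitution Alignment": 0.15,
}


def weighted_total(
    scores: dict[str, float],
    weights: dict[str, float],
    has_constitution: bool = True,
) -> float:
    """Sum of score x weight; without a constitution, rescale the other weights."""
    active = dict(weights)
    if not has_constitution:
        active.pop("Constitution Alignment", None)
    weight_sum = sum(active.values())
    # Proportional redistribution: every remaining weight is scaled by 1 / weight_sum.
    total = sum(scores[name] * w / weight_sum for name, w in active.items())
    return round(total, 1)


scores = {
    "Spec Alignment": 80,
    "Tech Stack": 90,
    "Architecture Quality": 70,
    "Phase Planning": 100,
    "Constitution Alignment": 90,
}
print(weighted_total(scores, PLAN_WEIGHTS))  # 83.5
print(weighted_total(scores, PLAN_WEIGHTS, has_constitution=False))  # 82.4
```

Without a constitution the four remaining weights sum to 0.85, so the same category scores give 70 / 0.85 ≈ 82.4 instead of 83.5.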
@@ -254,7 +375,7 @@ Before completing the review, verify:
  - [ ] Tech stack choices are evaluated
  - [ ] Constitution alignment is checked
  - [ ] Issues have clear, actionable suggestions
- - [ ] Score reflects actual quality accurately
+ - [ ] Score reflects actual quality accurately (validated via Score Validation Checklist)
  - [ ] Next steps are clear and appropriate
 
  ### Output
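The "Rubric Basis" column asks which score range applies, and the last checklist item ties the total to the Overall Status thresholds (Pass ≥ 80, Needs Work 50-79, Fail < 50). Here is a minimal sketch with hypothetical function names. The templates do not say how fractional totals such as 79.5 are treated, so this version assumes plain `>=` comparisons (79.5 counts as Needs Work).

```python
def rubric_band(score: float) -> str:
    """Map a 0-100 category score to the rubric's Score Range label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "90-100"
    if score >= 70:
        return "70-89"
    if score >= 50:
        return "50-69"
    return "Below 50"


def overall_status(total: float) -> str:
    """Overall Status implied by the Score Validation Checklist thresholds."""
    if total >= 80:
        return "Pass"
    if total >= 50:
        return "Needs Work"
    return "Fail"
```

A reviewer's report is then internally consistent when each row's Rubric Basis equals `rubric_band(score)` and the stated Overall Status equals `overall_status(total)`.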
@@ -93,6 +93,85 @@ Perform a comprehensive code review of Python files at the specified path. This
  - [ ] Include specific code locations and refactoring suggestions
  - [ ] Calculate quality scores per dimension
 
+ ### Scoring Rubrics
+
+ Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+
+ #### Pythonic & KISS (30%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Code follows Python idioms; uses built-in/stdlib effectively; no over-engineering; functions are focused |
+ | 70-89 | Mostly Pythonic; minor instances of unnecessary complexity or missed stdlib usage |
+ | 50-69 | Several non-idiomatic patterns; unnecessary classes or abstractions; missed standard library opportunities |
+ | Below 50 | Pervasive over-engineering; code fights against Python idioms; significant complexity issues |
+
+ **Typical Deductions**:
+
+ - Unnecessary class when function suffices: -8 each
+ - Missed standard library opportunity (e.g., manual iteration vs. itertools): -5 each
+ - Function exceeding single responsibility: -5 each
+ - Overly complex logic when simpler alternative exists: -5 each
+
+ #### Type Safety & Explicitness (30%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Complete type annotations; specific exception handling; exception context preserved; good DI patterns |
+ | 70-89 | Most functions annotated; minor type safety gaps; 1-2 broad exception catches |
+ | 50-69 | Incomplete type annotations; several broad exception handlers; missing `raise from` |
+ | Below 50 | No type annotations; pervasive `except Exception:`; no exception context preservation |
+
+ **Typical Deductions**:
+
+ - Public function missing type annotations: -5 each
+ - Bare `except:` or `except Exception:` without re-raise: -8 each
+ - Missing `raise ... from err` context: -3 each
+ - mypy error: -5 each
+
+ #### Engineering Robustness (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Proper resource management (context managers); correct async patterns; proper logging; no print statements |
+ | 70-89 | Mostly robust; minor resource management gaps; 1-2 logging issues |
+ | 50-69 | Several resource leaks; print statements instead of logging; async pattern issues |
+ | Below 50 | No context managers for resources; pervasive print debugging; blocking async operations |
+
+ **Typical Deductions**:
+
+ - File/connection without context manager: -8 each
+ - `print()` instead of `logging`: -3 each
+ - Blocking call in async context: -10 each
+ - Incorrect log level usage: -3 each
+ - ruff violation: -3 each
+
+ #### Constitution Alignment (15%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Fully aligned with all constitution MUST principles; project conventions followed |
+ | 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+ | 50-69 | Partial alignment; several principles not addressed |
+ | Below 50 | Significant violations or disregard of constitution |
+
+ > **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+
+ **Typical Deductions**:
+
+ - Constitution MUST violation: -15 each
+ - Constitution SHOULD violation: -8 each
+ - Naming convention violation: -3 each
+
+ #### Suggestion Score Cap Rule
+
+ **IMPORTANT**: Suggestions (LOW) items may deduct a **maximum of 5 points** from the total score. After resolving all CRITICAL and HIGH issues, the score should be **≥ 95**.
+
+ - CRITICAL Issues: -10 to -20 points each
+ - HIGH Issues: -5 to -10 points each
+ - MEDIUM Issues: -3 to -5 points each
+ - LOW Suggestions: -1 to -2 points each, **capped at 5 points total**
+
  ### Report Template
 
  ````markdown
@@ -184,13 +263,15 @@ Perform a comprehensive code review of Python files at the specified path. This
 
  ## Scoring Breakdown
 
- | Category | Weight | Score | Weighted |
- |----------|--------|-------|----------|
- | Pythonic & KISS | 30% | X/100 | X |
- | Type Safety | 30% | X/100 | X |
- | Engineering Robustness | 25% | X/100 | X |
- | Constitution Alignment | 15% | X/100 | X |
- | **Total** | **100%** | | **X/100** |
+ | Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+ |----------|--------|-------|-------------|-------------------|----------|
+ | Pythonic & KISS | 30% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "Unnecessary class in utils.py: -8"] | X |
+ | Type Safety | 30% | X/100 | [Which rubric range applies] | [e.g., "2 functions missing annotations: -10"] | X |
+ | Engineering Robustness | 25% | X/100 | [Which rubric range applies] | [e.g., "File opened without context manager: -8"] | X |
+ | Constitution Alignment | 15% | X/100 | [Which rubric range applies] | [e.g., "All principles followed"] | X |
+ | **Total** | **100%** | | | | **X/100** |
+
+ > **Suggestion Cap**: LOW suggestions deducted X/5 points (cap: 5 points max)
 
  ## Available Follow-up Commands
 
@@ -207,6 +288,33 @@ Based on the review result, consider:
  - **Fail**: Significant rework required - consider `/codexspec:clarify` for design discussion
  ````
 
+ ### Score Validation Checklist
+
+ Before finalizing scores, the reviewer MUST verify:
+
+ - [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+ - [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+ - [ ] Weighted total = sum of (category score × weight) for all categories
+ - [ ] LOW suggestion deductions do not exceed 5-point cap
+ - [ ] No "phantom deductions" (deductions without matching issues)
+ - [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+
+ ### Score Challenge Response Protocol
+
+ When a user questions or challenges the score, follow this three-step process:
+
+ 1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+
+ 2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+
+ 3. **Targeted Re-evaluation**: For each challenged item:
+ - Re-read the relevant code section
+ - Re-apply the rubric criteria objectively
+ - If the original score was correct: explain the reasoning and maintain the score
+ - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+
+ > **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
+
  ### Quality Criteria
 
  Before completing the review, verify:
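The Suggestion Score Cap Rule in the Python review hunk combines per-severity point ranges with a 5-point cap on LOW items. Here is a minimal sketch under stated assumptions: the `Issue` record and `total_deduction` function are hypothetical, deductions are recorded as positive point values, and the cap is applied to the overall total (the template does not say how it interacts with per-category scores).

```python
from dataclasses import dataclass

# Per-issue deduction ranges (inclusive) from the Python review rubric.
SEVERITY_RANGES: dict[str, tuple[int, int]] = {
    "CRITICAL": (10, 20),
    "HIGH": (5, 10),
    "MEDIUM": (3, 5),
    "LOW": (1, 2),
}
LOW_CAP = 5  # LOW suggestions may deduct at most 5 points in total


@dataclass(frozen=True)
class Issue:
    issue_id: str
    severity: str
    points: int  # positive number of points deducted


def total_deduction(issues: list[Issue]) -> int:
    """Sum deductions, checking each against its severity range and capping LOW items."""
    regular = 0
    low = 0
    for issue in issues:
        lo, hi = SEVERITY_RANGES[issue.severity]
        if not lo <= issue.points <= hi:
            raise ValueError(
                f"{issue.issue_id}: {issue.points} outside {lo}-{hi} for {issue.severity}"
            )
        if issue.severity == "LOW":
            low += issue.points
        else:
            regular += issue.points
    return regular + min(low, LOW_CAP)


issues = [Issue("H-1", "HIGH", 8)] + [Issue(f"L-{i}", "LOW", 2) for i in range(4)]
print(100 - total_deduction(issues))  # 87: 8 for HIGH, LOW capped at 5 instead of 8
```

With only LOW items left, the deduction cannot exceed 5, consistent with the rule's "≥ 95" expectation; any remaining MEDIUM issues can still pull the score below that.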
@@ -216,7 +324,7 @@ Before completing the review, verify:
  - [ ] Constitution alignment has been checked (if constitution exists)
  - [ ] Issues are categorized by severity (CRITICAL/HIGH/MEDIUM/LOW)
  - [ ] Each CRITICAL/HIGH issue has specific code refactoring suggestions
- - [ ] Score reflects actual code quality accurately
+ - [ ] Score reflects actual code quality accurately (validated via Score Validation Checklist)
  - [ ] Strengths section highlights positive aspects
  - [ ] Recommendations are prioritized and actionable
 
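To make several Type Safety & Explicitness and Engineering Robustness deductions concrete, here is a small hypothetical function written to avoid them: a fully annotated public function, a context manager for the file handle, specific exception types with `raise ... from err`, and `logging` instead of `print()`. `load_settings` and `SettingsError` are illustrative names, not part of codexspec.

```python
import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


class SettingsError(Exception):
    """Raised when a settings file cannot be loaded (illustrative)."""


def load_settings(path: Path) -> dict[str, Any]:
    """Load a JSON object from `path`, preserving the original error as context."""
    try:
        # The context manager closes the file even if json.load raises.
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError as err:
        # Specific exception plus `raise ... from err` keeps the traceback chain.
        raise SettingsError(f"settings file not found: {path}") from err
    except json.JSONDecodeError as err:
        raise SettingsError(f"invalid JSON in {path}: {err.msg}") from err

    if not isinstance(data, dict):
        raise SettingsError(f"expected a JSON object in {path}, got {type(data).__name__}")
    logger.debug("loaded %d settings from %s", len(data), path)
    return data
```

Each handled failure surfaces as `SettingsError` with the original exception available as `__cause__`, which is what the "Missing `raise ... from err` context" deduction is checking for.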
@@ -99,6 +99,101 @@ Perform a comprehensive code review of React/TypeScript files at the specified p
  - [ ] Include specific code locations and refactoring suggestions
  - [ ] Calculate quality scores per dimension
 
+ ### Scoring Rubrics
+
+ Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+
+ #### Component Atomicity & SRP (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Each file has one primary component; all under 200 lines; business logic extracted to custom Hooks; clear UI/logic separation |
+ | 70-89 | Most components atomic; 1-2 slightly large components; minor mixing of concerns |
+ | 50-69 | Several components exceed 200 lines; business logic mixed into UI components |
+ | Below 50 | Components are monolithic; no separation of concerns; pervasive SRP violations |
+
+ **Typical Deductions**:
+
+ - Component exceeding 200 lines: -5 each
+ - Business logic not extracted to custom Hook: -8 each
+ - Multiple primary components in one file: -8 each
+ - No separation between UI and logic: -10 each
+
+ #### Hooks Compliance (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All useEffect have complete dependency arrays; no derived-state-as-state; no unnecessary useEffect; no stale closure risks |
+ | 70-89 | Minor Hooks issues; 1-2 incomplete dependency arrays or unnecessary useEffect |
+ | 50-69 | Several Hooks violations; derived state stored in state; missing dependencies |
+ | Below 50 | Pervasive Hooks rule violations; stale closures; incorrect dependency management |
+
+ **Typical Deductions**:
+
+ - useEffect with incomplete dependency array: -8 each
+ - Derived state stored as separate state (should be computed): -8 each
+ - Unnecessary useEffect (useMemo or direct computation suffices): -5 each
+ - Stale closure risk in async/event handler: -8 each
+
+ #### State Management (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | State as local as possible; no excessive prop drilling; proper async handling with loading/error states |
+ | 70-89 | Mostly good state management; minor prop drilling; 1-2 missing loading states |
+ | 50-69 | Unnecessary global state; significant prop drilling; missing error handling |
+ | Below 50 | Poor state architecture; pervasive prop drilling; no async error handling |
+
+ **Typical Deductions**:
+
+ - Unnecessary global/lifted state: -8 each
+ - Prop drilling more than 3 levels: -5 each
+ - Missing loading state for async operation: -5 each
+ - Missing error handling for async operation: -8 each
+ - Race condition in async operation: -10 each
+
+ #### Performance & Robustness (20%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | No unnecessary re-renders; proper memoization; null/undefined safety; appropriate React.memo usage |
+ | 70-89 | Minor performance issues; 1-2 missing memoizations; mostly null-safe |
+ | 50-69 | Several re-render issues; missing memoization for expensive computations; null safety gaps |
+ | Below 50 | Pervasive performance issues; no memoization; frequent null/undefined crashes |
+
+ **Typical Deductions**:
+
+ - Unmemoized expensive computation in render: -8 each
+ - Object/function created in render without useCallback/useMemo: -5 each
+ - Missing optional chaining for nullable access: -3 each
+ - Missing React.memo for frequently re-rendered component: -5 each
+
+ #### Constitution Alignment (5%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Fully aligned with all constitution MUST principles; project conventions followed |
+ | 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+ | 50-69 | Partial alignment; several principles not addressed |
+ | Below 50 | Significant violations or disregard of constitution |
+
+ > **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+
+ **Typical Deductions**:
+
+ - Constitution MUST violation: -15 each
+ - Constitution SHOULD violation: -8 each
+ - Naming convention violation: -3 each
+
+ #### Suggestion Score Cap Rule
+
+ **IMPORTANT**: Suggestions (LOW) items may deduct a **maximum of 5 points** from the total score. After resolving all CRITICAL and HIGH issues, the score should be **≥ 95**.
+
+ - CRITICAL Issues: -10 to -20 points each
+ - HIGH Issues: -5 to -10 points each
+ - MEDIUM Issues: -3 to -5 points each
+ - LOW Suggestions: -1 to -2 points each, **capped at 5 points total**
+
  ### Report Template
 
  ````markdown
@@ -191,14 +286,16 @@ Perform a comprehensive code review of React/TypeScript files at the specified p
 
  ## Scoring Breakdown
 
- | Category | Weight | Score | Weighted |
- |----------|--------|-------|----------|
- | Component Atomicity & SRP | 25% | X/100 | X |
- | Hooks Compliance | 25% | X/100 | X |
- | State Management | 25% | X/100 | X |
- | Performance & Robustness | 20% | X/100 | X |
- | Constitution Alignment | 5% | X/100 | X |
- | **Total** | **100%** | | **X/100** |
+ | Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+ |----------|--------|-------|-------------|-------------------|----------|
+ | Component Atomicity & SRP | 25% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "UserPanel.tsx 350 lines: -5"] | X |
+ | Hooks Compliance | 25% | X/100 | [Which rubric range applies] | [e.g., "useEffect missing dep in Form.tsx: -8"] | X |
+ | State Management | 25% | X/100 | [Which rubric range applies] | [e.g., "Missing error state in useFetch: -8"] | X |
+ | Performance & Robustness | 20% | X/100 | [Which rubric range applies] | [e.g., "Unmemoized filter in List.tsx: -8"] | X |
+ | Constitution Alignment | 5% | X/100 | [Which rubric range applies] | [e.g., "All principles followed"] | X |
+ | **Total** | **100%** | | | | **X/100** |
+
+ > **Suggestion Cap**: LOW suggestions deducted X/5 points (cap: 5 points max)
 
  ## Available Follow-up Commands
 
@@ -215,6 +312,33 @@ Based on the review result, consider:
  - **Fail**: Significant rework required - consider `/codexspec:clarify` for design discussion
  ````
 
+ ### Score Validation Checklist
+
+ Before finalizing scores, the reviewer MUST verify:
+
+ - [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+ - [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+ - [ ] Weighted total = sum of (category score × weight) for all categories
+ - [ ] LOW suggestion deductions do not exceed 5-point cap
+ - [ ] No "phantom deductions" (deductions without matching issues)
+ - [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+
+ ### Score Challenge Response Protocol
+
+ When a user questions or challenges the score, follow this three-step process:
+
+ 1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+
+ 2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+
+ 3. **Targeted Re-evaluation**: For each challenged item:
+ - Re-read the relevant code section
+ - Re-apply the rubric criteria objectively
+ - If the original score was correct: explain the reasoning and maintain the score
+ - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+
+ > **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
+
  ### Quality Criteria
 
  Before completing the review, verify:
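Most of the Score Validation Checklist is mechanical, so it can be automated. The sketch below is a hypothetical checker and is not part of codexspec. Several details are assumptions:

- the `Deduction` record and its issue IDs;
- reading the cap as "the raw LOW deduction points add up to at most 5";
- treating any total of 80 or more as Pass and 50 or more as Needs Work.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deduction:
    category: str  # must match a key in the weights table
    issue_id: str  # hypothetical ID pointing into "Detailed Findings"
    points: int    # points removed from the category score (positive)
    severity: str  # "CRITICAL", "HIGH", "MEDIUM" or "LOW"

def status_for(total: float) -> str:
    """Overall Status thresholds from the checklist (Pass >= 80, Needs Work 50-79, Fail < 50)."""
    if total >= 80:
        return "Pass"
    if total >= 50:
        return "Needs Work"
    return "Fail"

def validate_report(deductions, finding_ids, scores, total, status, weights, low_cap=5):
    """Return the checklist items a scored report violates; an empty list means it passes."""
    problems = []
    # No phantom deductions: every deduction must cite an issue in Detailed Findings.
    for d in deductions:
        if d.issue_id not in finding_ids:
            problems.append(f"phantom deduction {d.issue_id} in {d.category}")
    # LOW suggestion deductions must stay within the cap.
    low_points = sum(d.points for d in deductions if d.severity == "LOW")
    if low_points > low_cap:
        problems.append(f"LOW deductions total {low_points}, cap is {low_cap}")
    # Each category score must equal 100 minus the sum of its deductions.
    for category in weights:
        expected = 100 - sum(d.points for d in deductions if d.category == category)
        if scores[category] != expected:
            problems.append(f"{category}: reported {scores[category]}, expected {expected}")
    # Weighted total must equal sum(score x weight); weights are integer percentages.
    expected_total = sum(scores[c] * w for c, w in weights.items()) / 100
    if abs(expected_total - total) > 0.01:
        problems.append(f"total: reported {total}, expected {expected_total}")
    # Overall Status must agree with the numeric total.
    if status_for(total) != status:
        problems.append(f"status {status!r} does not match total {total}")
    return problems

weights = {"Component Atomicity & SRP": 25, "Hooks Compliance": 25,
           "State Management": 25, "Performance & Robustness": 20,
           "Constitution Alignment": 5}
deductions = [Deduction("Hooks Compliance", "H-1", 8, "HIGH"),
              Deduction("State Management", "S-9", 8, "MEDIUM")]
scores = {c: 100 for c in weights} | {"Hooks Compliance": 92, "State Management": 92}
problems = validate_report(deductions, {"H-1"}, scores, 96.0, "Pass", weights)
# problems == ["phantom deduction S-9 in State Management"]
```

In this usage example, S-9 has no matching finding, so the checker reports it as a phantom deduction. This is exactly the case the checklist's "Every deduction ... has a corresponding issue" item guards against.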
@@ -224,7 +348,7 @@ Before completing the review, verify:
  - [ ] Constitution alignment has been checked (if constitution exists)
  - [ ] Issues are categorized by severity (CRITICAL/HIGH/MEDIUM/LOW)
  - [ ] Each CRITICAL/HIGH issue has specific code refactoring suggestions
- - [ ] Score reflects actual code quality accurately
+ - [ ] Score reflects actual code quality accurately (validated via Score Validation Checklist)
  - [ ] Strengths section highlights positive aspects
  - [ ] Recommendations are prioritized and actionable
 
@@ -76,6 +76,94 @@ Review the feature specification for quality and readiness. This command ensures
  - [ ] Naming conventions are followed (if specified)
  - [ ] Workflow guidelines are considered
 
+ ### Scoring Rubrics
+
+ Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+
+ #### Completeness (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All 8 required sections present with substantive content; each section has concrete, specific details |
+ | 70-89 | 6-7 sections present and substantive; 1-2 sections thin but present |
+ | 50-69 | 4-5 sections present; several sections missing or placeholder-only |
+ | Below 50 | Fewer than 4 sections; major gaps in coverage |
+
+ **Typical Deductions**:
+
+ - Missing required section entirely: -15 per section
+ - Section present but placeholder/stub only: -8 per section
+ - Section present but lacks specificity: -5 per section
+
+ #### Clarity (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | No vague language; all requirements have single clear interpretation; technical terms defined |
+ | 70-89 | Minor ambiguities (1-2 vague terms); mostly precise language |
+ | 50-69 | Multiple ambiguities; several terms undefined; some requirements open to interpretation |
+ | Below 50 | Pervasive vagueness; most requirements unclear or multi-interpretable |
+
+ **Typical Deductions**:
+
+ - Vague term without metrics (e.g., "fast", "user-friendly"): -5 each
+ - Requirement with multiple interpretations: -8 each
+ - Undefined technical term or acronym: -3 each
+
+ #### Consistency (20%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | No internal contradictions; all sections align perfectly; scope boundaries match goals |
+ | 70-89 | Minor inconsistencies (1-2); easily resolved without major impact |
+ | 50-69 | Several inconsistencies between sections; conflicting requirements present |
+ | Below 50 | Major contradictions; requirements fundamentally conflict with goals or each other |
+
+ **Typical Deductions**:
+
+ - Direct contradiction between requirements: -15 each
+ - Scope boundary inconsistent with goals: -10
+ - Minor misalignment between sections: -5 each
+
+ #### Testability (20%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All requirements testable; acceptance criteria concrete and executable; edge cases have expected behaviors |
+ | 70-89 | Most requirements testable; 1-2 criteria need more specificity |
+ | 50-69 | Several requirements lack testable criteria; edge cases missing expected behaviors |
+ | Below 50 | Most requirements not verifiable; no concrete acceptance criteria |
+
+ **Typical Deductions**:
+
+ - Requirement without testable acceptance criteria: -8 each
+ - Edge case without expected behavior: -5 each
+ - Non-measurable NFR (e.g., "should be scalable" without metrics): -8 each
+
+ #### Constitution Alignment (10%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Fully aligned with all constitution principles; quality standards addressed |
+ | 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+ | 50-69 | Partial alignment; several principles not addressed |
+ | Below 50 | Significant violations or disregard of constitution |
+
+ > **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+
+ **Typical Deductions**:
+
+ - Constitution principle not addressed: -10 per principle
+ - Direct violation of a constitution principle: -20 per violation
+
+ #### Suggestion Score Cap Rule
+
+ **IMPORTANT**: Suggestions (Nice to Have) items may deduct a **maximum of 5 points** from the total score. After resolving all Critical Issues and Warnings, the score should be **≥ 95**.
+
+ - Critical Issues: -10 to -20 points each
+ - Warnings: -5 to -10 points each
+ - Suggestions: -1 to -2 points each, **capped at 5 points total**
+
  ### Report Template
 
  ```markdown
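The sketch below shows how the review-spec rubrics combine with the Typical Deductions and the Suggestion Score Cap Rule. It scores the Completeness category from a list of findings, looks up the rubric band for the "Rubric Basis" column, and applies the 5-point suggestion cap. The finding labels are hypothetical dictionary keys; the point values come from the Completeness rubric.

```python
def rubric_band(score: int) -> str:
    """Map a 0-100 category score to the rubric row used in the "Rubric Basis" column."""
    if score >= 90:
        return "90-100"
    if score >= 70:
        return "70-89"
    if score >= 50:
        return "50-69"
    return "Below 50"

# Point values from the Completeness "Typical Deductions" list (keys are illustrative).
COMPLETENESS_DEDUCTIONS = {
    "missing section": 15,
    "placeholder section": 8,
    "unspecific section": 5,
}

def score_completeness(findings: list[str]) -> tuple[int, str]:
    """Subtract the typical deduction for each finding, then look up the rubric band."""
    score = max(0, 100 - sum(COMPLETENESS_DEDUCTIONS[f] for f in findings))
    return score, rubric_band(score)

def capped_suggestion_deduction(points: list[int], cap: int = 5) -> int:
    """Suggestions cost 1-2 points each, but at most `cap` points in total."""
    return min(sum(points), cap)

print(score_completeness(["missing section"]))          # (85, '70-89')
print(100 - capped_suggestion_deduction([2, 2, 2, 2]))  # 95
```

Suppose no Critical Issues or Warnings remain. The cap alone then keeps the score at 95 or higher, which is where the rule's "≥ 95" expectation comes from.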
@@ -148,14 +236,16 @@ Review the feature specification for quality and readiness. This command ensures
 
  ## Scoring Breakdown
 
- | Category | Weight | Score | Weighted |
- |----------|--------|-------|----------|
- | Completeness | 25% | X/100 | X |
- | Clarity | 25% | X/100 | X |
- | Consistency | 20% | X/100 | X |
- | Testability | 20% | X/100 | X |
- | Constitution Alignment | 10% | X/100 | X |
- | **Total** | **100%** | | **X/100** |
+ | Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+ |----------|--------|-------|-------------|-------------------|----------|
+ | Completeness | 25% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "Missing Edge Cases section: -15"] | X |
+ | Clarity | 25% | X/100 | [Which rubric range applies] | [e.g., "2 vague terms: -10"] | X |
+ | Consistency | 20% | X/100 | [Which rubric range applies] | [e.g., "No contradictions found"] | X |
+ | Testability | 20% | X/100 | [Which rubric range applies] | [e.g., "REQ-003 not testable: -8"] | X |
+ | Constitution Alignment | 10% | X/100 | [Which rubric range applies] | [e.g., "All principles addressed"] | X |
+ | **Total** | **100%** | | | | **X/100** |
+
+ > **Suggestion Cap**: Suggestions deducted X/5 points (cap: 5 points max)
 
  ## Recommendations
 
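The review-spec Constitution Alignment note says that, without a constitution, the category "defaults to 100" and its weight "is redistributed proportionally". This sketch assumes one reading of that rule: the 100 is only what the table displays, and the total is divided by the remaining 90% of weight. Under that reading, the default score does not affect the total. The category scores in the usage line are made-up inputs.

```python
SPEC_WEIGHTS = {
    "Completeness": 25,
    "Clarity": 25,
    "Consistency": 20,
    "Testability": 20,
    "Constitution Alignment": 10,
}

def spec_total(scores: dict[str, int], has_constitution: bool = True) -> float:
    """Weighted review-spec total, rounded to two decimals."""
    weights = dict(SPEC_WEIGHTS)
    if not has_constitution:
        # Assumed reading: the category is shown as 100 in the table, but its 10%
        # weight is dropped and the rest are rescaled (divide by 90 instead of 100).
        del weights["Constitution Alignment"]
    total = sum(scores[c] * w for c, w in weights.items()) / sum(weights.values())
    return round(total, 2)

scores = {"Completeness": 85, "Clarity": 90, "Consistency": 100, "Testability": 92}
print(spec_total(scores, has_constitution=False))  # 91.28
```

For comparison, scoring the category as 100 at its full 10% weight would give 92.15 for the same inputs. The two readings are therefore not interchangeable.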
@@ -185,6 +275,33 @@ Based on the review result, the user may consider:
  - **Fail**: `/codexspec:clarify` - to systematically identify and fix specification issues
  ```
 
+ ### Score Validation Checklist
+
+ Before finalizing scores, the reviewer MUST verify:
+
+ - [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+ - [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+ - [ ] Weighted total = sum of (category score × weight) for all categories
+ - [ ] Suggestion deductions do not exceed 5-point cap
+ - [ ] No "phantom deductions" (deductions without matching issues)
+ - [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+
+ ### Score Challenge Response Protocol
+
+ When a user questions or challenges the score, follow this three-step process:
+
+ 1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+
+ 2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+
+ 3. **Targeted Re-evaluation**: For each challenged item:
+ - Re-read the relevant section of the specification
+ - Re-apply the rubric criteria objectively
+ - If the original score was correct: explain the reasoning and maintain the score
+ - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+
+ > **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
+
  ### Quality Criteria
 
  Before completing the review, verify:
@@ -192,7 +309,7 @@ Before completing the review, verify:
  - [ ] All sections of the spec have been examined
  - [ ] Issues are categorized by severity (Critical/Warning/Suggestion)
  - [ ] Each issue has a clear, actionable suggestion
- - [ ] Score reflects actual quality accurately
+ - [ ] Score reflects actual quality accurately (validated via Score Validation Checklist)
  - [ ] Recommendations are prioritized
  - [ ] Next steps are clear and appropriate
 
@@ -85,6 +85,96 @@ Review the task breakdown for quality and implementation readiness. This command
  - [ ] File paths are consistent with plan
  - [ ] File naming conventions are followed (per constitution)
 
+ ### Scoring Rubrics
+
+ Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+
+ #### Plan Coverage (30%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All plan phases, modules, APIs, and data models have corresponding tasks; no gaps |
+ | 70-89 | Most plan items covered; 1-2 minor components missing task coverage |
+ | 50-69 | Several plan items lack task coverage; missing tasks for key modules |
+ | Below 50 | Major plan phases or components have no corresponding tasks |
+
+ **Typical Deductions**:
+
+ - Plan phase with no tasks: -15 each
+ - Module/component without implementation task: -10 each
+ - API endpoint without task: -8 each
+ - Missing testing tasks for plan items: -5 each
+
+ #### TDD Compliance (25%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All code components have test tasks before implementation tasks; test tasks are not optional |
+ | 70-89 | Most components follow TDD; 1-2 minor ordering issues |
+ | 50-69 | Several components lack test-first ordering; some test tasks missing |
+ | Below 50 | No TDD enforcement; tests are absent or consistently after implementation |
+
+ **Typical Deductions**:
+
+ - Component without test task: -12 each
+ - Test task ordered after implementation task: -8 each
+ - Test task marked as optional: -5 each
+
+ #### Dependency & Ordering (20%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | All dependencies correctly identified; no circular dependencies; foundation tasks first; logical ordering |
+ | 70-89 | Dependencies mostly correct; 1-2 minor ordering issues |
+ | 50-69 | Several missing or incorrect dependencies; some ordering problems |
+ | Below 50 | Circular dependencies present; major ordering errors; dependencies largely incorrect |
+
+ **Typical Deductions**:
+
+ - Circular dependency: -15 each
+ - Missing dependency declaration: -5 each
+ - Incorrect task ordering: -8 each
+ - Foundation task not placed first: -10
+
+ #### Task Granularity (15%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Each task involves one primary file; clear single deliverable; appropriate scope |
+ | 70-89 | Most tasks are atomic; 1-2 tasks slightly broad but manageable |
+ | 50-69 | Several tasks involve multiple files or unclear scope |
+ | Below 50 | Tasks are overly broad or too narrow; no atomic focus |
+
+ **Typical Deductions**:
+
+ - Task involving multiple primary files: -8 each
+ - Task scope too broad (should be split): -5 each
+ - Task scope too narrow (should be combined): -3 each
+
+ #### Parallelization & Files (10%)
+
+ | Score Range | Criteria |
+ |-------------|----------|
+ | 90-100 | Independent tasks correctly marked [P]; file paths specified and follow conventions; no false parallel markers |
+ | 70-89 | Mostly correct parallel markers; minor file path issues |
+ | 50-69 | Several incorrect parallel markers; missing file paths |
+ | Below 50 | Parallel markers largely incorrect; file paths missing or wrong |
+
+ **Typical Deductions**:
+
+ - Dependent task incorrectly marked [P]: -8 each
+ - Independent task missing [P] marker: -3 each
+ - Task without file path specification: -5 each
+ - File path not following project convention: -3 each
+
+ #### Suggestion Score Cap Rule
+
+ **IMPORTANT**: Suggestions (Nice to Have) items may deduct a **maximum of 5 points** from the total score. After resolving all Critical Issues and Warnings, the score should be **≥ 95**.
+
+ - Critical Issues: -10 to -20 points each
+ - Warnings: -5 to -10 points each
+ - Suggestions: -1 to -2 points each, **capped at 5 points total**
+
  ### Report Template
 
  ```markdown
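Two of the Dependency & Ordering checks can be sketched with Python's standard `graphlib` module (Python 3.9+): circular dependencies (-15 each) and incorrect ordering (-8 each). The representation is an assumption for illustration: a dict mapping each task ID to the IDs it depends on, plus a list giving the order in which tasks appear in the document.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional

def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle (first node repeated at the end), or None."""
    try:
        # static_order() raises CycleError if the dependency graph contains a cycle.
        list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        return err.args[1]
    return None

def ordering_violations(order: list[str], deps: dict[str, list[str]]) -> list[str]:
    """Tasks listed in the document before one of the tasks they depend on."""
    position = {task: i for i, task in enumerate(order)}
    return [
        f"{task} listed before its dependency {dep}"
        for task, task_deps in deps.items()
        for dep in task_deps
        if position[dep] > position[task]
    ]

deps = {"1.1": [], "1.2": ["1.1"], "2.1": ["1.2"], "2.3": ["2.1"]}
print(find_cycle(deps))                                  # None
print(ordering_violations(["1.1", "2.1", "1.2", "2.3"], deps))
# ['2.1 listed before its dependency 1.2']
print(find_cycle({**deps, "1.1": ["2.3"]}) is not None)  # True
```

`TopologicalSorter` is used because it already performs cycle detection, and the cycle it reports can be quoted directly in the "Deduction Details" column.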
@@ -236,14 +326,16 @@ Valid Dependency Chain:
 
  ## Scoring Breakdown
 
- | Category | Weight | Score | Weighted |
- |----------|--------|-------|----------|
- | Plan Coverage | 30% | X/100 | X |
- | TDD Compliance | 25% | X/100 | X |
- | Dependency & Ordering | 20% | X/100 | X |
- | Task Granularity | 15% | X/100 | X |
- | Parallelization & Files | 10% | X/100 | X |
- | **Total** | **100%** | | **X/100** |
+ | Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+ |----------|--------|-------|-------------|-------------------|----------|
+ | Plan Coverage | 30% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "Module C missing task: -10"] | X |
+ | TDD Compliance | 25% | X/100 | [Which rubric range applies] | [e.g., "Service X test after impl: -8"] | X |
+ | Dependency & Ordering | 20% | X/100 | [Which rubric range applies] | [e.g., "Missing dependency Task 2.3→1.2: -5"] | X |
+ | Task Granularity | 15% | X/100 | [Which rubric range applies] | [e.g., "Task 2.5 involves 3 files: -8"] | X |
+ | Parallelization & Files | 10% | X/100 | [Which rubric range applies] | [e.g., "Task 2.1 false [P] marker: -8"] | X |
+ | **Total** | **100%** | | | | **X/100** |
+
+ > **Suggestion Cap**: Suggestions deducted X/5 points (cap: 5 points max)
 
  ## Execution Timeline Estimate
 
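Some example deductions in the review-tasks breakdown can be detected mechanically: "Service X test after impl: -8" and "false [P] marker: -8". In this sketch the `Task` fields (`kind`, `component`, `parallel`, `optional`) are hypothetical. The template does not define "dependent task" precisely, so a [P] marker is assumed to be false when a [P] task depends on another [P] task.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: str
    kind: str        # "test" or "impl" (hypothetical field)
    component: str   # component the task belongs to
    depends_on: list[str] = field(default_factory=list)
    parallel: bool = False  # True when the task carries the [P] marker
    optional: bool = False

def tdd_deductions(tasks: list[Task]) -> list[tuple[str, int]]:
    """TDD Compliance deductions: -12 no test task, -8 test after impl, -5 optional test."""
    position = {t.task_id: i for i, t in enumerate(tasks)}
    findings = []
    for comp in dict.fromkeys(t.component for t in tasks):  # unique, in document order
        tests = [t for t in tasks if t.component == comp and t.kind == "test"]
        impls = [t for t in tasks if t.component == comp and t.kind == "impl"]
        if impls and not tests:
            findings.append((f"{comp} has no test task", 12))
            continue
        first_impl = min((position[t.task_id] for t in impls), default=None)
        for t in tests:
            if first_impl is not None and position[t.task_id] > first_impl:
                findings.append((f"{comp} test {t.task_id} after impl", 8))
            if t.optional:
                findings.append((f"{comp} test {t.task_id} marked optional", 5))
    return findings

def false_parallel_markers(tasks: list[Task]) -> list[str]:
    """[P] tasks that depend on another [P] task (assumed meaning of a false marker)."""
    parallel_ids = {t.task_id for t in tasks if t.parallel}
    return [t.task_id for t in tasks if t.parallel and parallel_ids.intersection(t.depends_on)]

tasks = [
    Task("2.1", "impl", "Service X", parallel=True),
    Task("2.2", "test", "Service X"),
    Task("2.3", "test", "Repo Y", parallel=True),
    Task("2.4", "impl", "Repo Y", depends_on=["2.3"], parallel=True),
]
print(tdd_deductions(tasks))          # [('Service X test 2.2 after impl', 8)]
print(false_parallel_markers(tasks))  # ['2.4']
```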
@@ -298,6 +390,33 @@ Based on the review result, the user may consider:
  - **Fail**: `/codexspec:plan-to-tasks` - to regenerate the task breakdown
  ```
 
+ ### Score Validation Checklist
+
+ Before finalizing scores, the reviewer MUST verify:
+
+ - [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+ - [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+ - [ ] Weighted total = sum of (category score × weight) for all categories
+ - [ ] Suggestion deductions do not exceed 5-point cap
+ - [ ] No "phantom deductions" (deductions without matching issues)
+ - [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+
+ ### Score Challenge Response Protocol
+
+ When a user questions or challenges the score, follow this three-step process:
+
+ 1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+
+ 2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+
+ 3. **Targeted Re-evaluation**: For each challenged item:
+ - Re-read the relevant section of the tasks document
+ - Re-apply the rubric criteria objectively
+ - If the original score was correct: explain the reasoning and maintain the score
+ - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+
+ > **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
+
  ### Quality Criteria
 
  Before completing the review, verify:
@@ -308,7 +427,7 @@ Before completing the review, verify:
  - [ ] Task granularity is appropriate (Task Granularity)
  - [ ] Parallelization markers and file paths are correct (Parallelization & Files)
  - [ ] Issues have clear, actionable suggestions
- - [ ] Score reflects actual quality accurately
+ - [ ] Score reflects actual quality accurately (validated via Score Validation Checklist)
 
  ### Output
 
@@ -31,6 +31,48 @@ $ARGUMENTS
 
  Transform the feature specification into a detailed technical implementation plan. This is where you define **how** the feature will be built.
 
+ ### Quality Targets
+
+ Before generating the plan, internalize these quality targets. They are aligned with the `review-plan` scoring rubrics to ensure first-pass quality.
+
+ #### Spec Alignment (Target: ≥ 90)
+
+ - [ ] Every functional requirement (REQ-XXX) has a corresponding implementation component
+ - [ ] Every user story has technical coverage in the architecture
+ - [ ] All non-functional requirements are addressed in architecture decisions
+ - [ ] Edge cases from the spec are handled in implementation phases
+
+ #### Tech Stack (Target: ≥ 90)
+
+ - [ ] All technologies are clearly listed with version constraints
+ - [ ] Each technology choice is justified for the requirements
+ - [ ] Tech stack aligns with project constitution (if exists)
+ - [ ] No critical category missing (language, framework, testing, etc.)
+
+ #### Architecture Quality (Target: ≥ 90)
+
+ - [ ] High-level architecture diagram included (ASCII or Mermaid)
+ - [ ] Each module has explicit responsibility, dependencies, and interfaces
+ - [ ] Module dependency graph is complete
+ - [ ] Separation of concerns is maintained
+ - [ ] Design patterns are appropriate and documented
+
+ #### Phase Planning (Target: ≥ 90)
+
+ - [ ] Phases are logically ordered (foundation → core → integration → testing)
+ - [ ] Each phase has specific, measurable deliverables
+ - [ ] Phase scope is realistic and manageable
+ - [ ] Inter-phase dependencies are minimal and documented
+
+ #### Constitution Alignment (Target: ≥ 90)
+
+ - [ ] Each constitution principle explicitly reviewed
+ - [ ] Architecture decisions reference relevant principles
+ - [ ] Testing requirements from constitution are incorporated
+ - [ ] Naming conventions and workflow guidelines followed
+
+ > **Self-Check**: After generating the plan, verify each target above is met before saving. This reduces review iterations.
+
  ### Execution Steps
 
  1. **Load Context**
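The first Spec Alignment target in spec-to-plan requires that every REQ-XXX has a corresponding implementation component. A simple traceability pass can self-check this. The sketch makes two assumptions that the template does not guarantee: requirement IDs follow a `REQ-` plus three digits pattern, and the plan cites the IDs it covers.

```python
import re

# Assumed ID format: "REQ-" followed by exactly three digits (e.g. REQ-001).
REQ_ID = re.compile(r"\bREQ-\d{3}\b")

def uncovered_requirements(spec_text: str, plan_text: str) -> list[str]:
    """Requirement IDs mentioned in the spec but never cited in the plan."""
    spec_ids = set(REQ_ID.findall(spec_text))
    plan_ids = set(REQ_ID.findall(plan_text))
    return sorted(spec_ids - plan_ids)

spec = "REQ-001 Users can log in.\nREQ-002 Sessions expire.\nREQ-003 Export to CSV."
plan = "AuthService: handles login and session expiry (REQ-001, REQ-002)"
print(uncovered_requirements(spec, plan))  # ['REQ-003']
```

A plain ID match only proves that a requirement is cited, not that the component actually covers it. Treat a non-empty result as a definite gap and an empty one as a starting point for manual review.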