PyPI - codexspec - version diff: 0.5.14.tar.gz → 0.5.16.tar.gz

Files changed (64)

{codexspec-0.5.14 → codexspec-0.5.16}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codexspec
-Version: 0.5.14
+Version: 0.5.16
 Summary: CodexSpec - A Spec-Driven Development (SDD) toolkit for Claude Code
 Project-URL: Homepage, https://github.com/Zts0hg/codexspec
 Project-URL: Repository, https://github.com/Zts0hg/codexspec

{codexspec-0.5.14 → codexspec-0.5.16}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "codexspec"
-version = "0.5.14"
+version = "0.5.16"
 description = "CodexSpec - A Spec-Driven Development (SDD) toolkit for Claude Code"
 readme = "README.md"
 requires-python = ">=3.11"

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/plan-to-tasks.md RENAMED

@@ -29,6 +29,48 @@ You are acting as a **Technical Lead**. Your responsibility is to transform tech
 Analyze the provided spec and plan documents, then break down the technical implementation plan into specific, actionable tasks.
+### Quality Targets
+Before generating the task breakdown, internalize these quality targets. They are aligned with the `review-tasks` scoring rubrics to ensure first-pass quality.
+#### Plan Coverage (Target: ≥ 90)
+- [ ] Every plan phase has corresponding tasks
+- [ ] Every module/component has creation and implementation tasks
+- [ ] Every API endpoint has an implementation task (if applicable)
+- [ ] Every data model has an implementation task (if applicable)
+- [ ] Testing tasks are included per constitution TDD requirements
+#### TDD Compliance (Target: ≥ 90)
+- [ ] Every code component has a test task that precedes its implementation task
+- [ ] Test tasks are never marked as optional
+- [ ] Integration tests are included where appropriate
+- [ ] Test file paths follow project testing conventions
+#### Dependency & Ordering (Target: ≥ 90)
+- [ ] All dependencies between tasks are explicitly declared
+- [ ] No circular dependencies exist
+- [ ] Foundation/setup tasks are placed first
+- [ ] Dependencies execute before dependents in the execution order
+#### Task Granularity (Target: ≥ 90)
+- [ ] Each task involves only ONE primary file
+- [ ] Each task has a clear, single deliverable
+- [ ] Tasks are neither too broad (should be split) nor too narrow (should be combined)
+- [ ] Complexity estimates are reasonable
+#### Parallelization & Files (Target: ≥ 90)
+- [ ] Truly independent tasks are marked with `[P]`
+- [ ] Dependent tasks are NOT marked `[P]`
+- [ ] All tasks have file paths specified
+- [ ] File paths follow project naming conventions
+> **Self-Check**: After generating the tasks, verify each target above is met before saving. This reduces review iterations.
 ### Critical Requirements
 1. **Task Granularity**: Each task should involve modifying or creating **only one primary file**. Avoid broad tasks like "implement all features".
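
The "Dependency & Ordering" targets added in this hunk (no circular dependencies, dependencies before dependents) can be checked mechanically. The following is a minimal sketch, not part of the package: the task IDs and the `execution_order` helper are hypothetical, and it assumes dependencies are available as a mapping from each task to the tasks it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task IDs mapped to the IDs they depend on.
# Test tasks precede their implementation tasks, as the TDD target requires.
tasks = {
    "T1-setup": [],
    "T2-test-model": ["T1-setup"],
    "T3-impl-model": ["T2-test-model"],
    "T4-test-api": ["T3-impl-model"],
    "T5-impl-api": ["T4-test-api"],
}


def execution_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Return an order in which every dependency precedes its dependents.

    Raises ValueError if the declared dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of CycleError.args holds the detected cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


order = execution_order(tasks)
print(order)
```

A task list that passes this check satisfies the ordering targets only; coverage, granularity, and `[P]` marking still need a manual pass.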

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-plan.md RENAMED

@@ -88,6 +88,98 @@ Review the technical implementation plan for quality and readiness. This command
    - [ ] Naming conventions are followed (if specified)
    - [ ] Testing requirements are addressed
+### Scoring Rubrics
+Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+#### Spec Alignment (30%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All functional requirements, user stories, and NFRs have clear implementation coverage; edge cases addressed |
+| 70-89 | Most requirements covered; 1-2 minor gaps in NFR or edge case coverage |
+| 50-69 | Several requirements only partially covered; missing implementation for key user stories |
+| Below 50 | Major requirements missing from plan; significant gaps between spec and plan |
+**Typical Deductions**:
+- Functional requirement with no implementation: -15 each
+- User story without technical coverage: -10 each
+- NFR not addressed in architecture: -8 each
+- Edge case from spec not handled: -5 each
+#### Tech Stack (15%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All technologies clearly defined with versions; choices well-justified and appropriate for requirements |
+| 70-89 | Tech stack defined; minor version gaps; mostly appropriate choices |
+| 50-69 | Incomplete stack definition; some questionable technology choices |
+| Below 50 | Vague or missing tech stack; inappropriate choices for requirements |
+**Typical Deductions**:
+- Technology without version constraint: -5 each
+- Unjustified technology choice: -10 each
+- Missing critical category (e.g., no testing framework): -10
+#### Architecture Quality (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Clear diagrams; well-defined module responsibilities; proper separation of concerns; dependency graph complete |
+| 70-89 | Good architecture; minor gaps in documentation; mostly clear module boundaries |
+| 50-69 | Architecture outlined but vague; unclear module responsibilities; missing dependency graph |
+| Below 50 | No clear architecture; modules poorly defined; significant design issues |
+**Typical Deductions**:
+- Missing architecture diagram: -15
+- Module without clear responsibility: -8 each
+- Missing dependency graph: -10
+- Tight coupling between modules: -8 each
+- Missing separation of concerns: -10
+#### Phase Planning (15%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Phases logically ordered; clear deliverables per phase; realistic scope; minimal inter-phase dependencies |
+| 70-89 | Good phasing; 1-2 phases with unclear deliverables or slightly large scope |
+| 50-69 | Phase ordering has issues; several phases lack clear deliverables |
+| Below 50 | No meaningful phase breakdown; deliverables unclear; unrealistic scope |
+**Typical Deductions**:
+- Phase without clear deliverables: -10 each
+- Illogical phase ordering: -10
+- Overly large phase scope: -5 each
+- Missing phase dependencies: -5
+#### Constitution Alignment (15%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Fully aligned with all constitution principles; architecture principles followed; testing requirements addressed |
+| 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+| 50-69 | Partial alignment; several principles not addressed |
+| Below 50 | Significant violations or disregard of constitution |
+> **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+**Typical Deductions**:
+- Constitution principle not addressed: -10 per principle
+- Direct violation of a constitution principle: -20 per violation
+#### Suggestion Score Cap Rule
+**IMPORTANT**: Suggestions (Nice to Have) items may deduct a **maximum of 5 points** from the total score. After resolving all Critical Issues and Warnings, the score should be **≥ 95**.
+- Critical Issues: -10 to -20 points each
+- Warnings: -5 to -10 points each
+- Suggestions: -1 to -2 points each, **capped at 5 points total**
 ### Report Template
 ```markdown
@@ -207,14 +299,16 @@ Review the technical implementation plan for quality and readiness. This command
 ## Scoring Breakdown
-| Category | Weight | Score | Weighted |
-|----------|--------|-------|----------|
-| Spec Alignment | 30% | X/100 | X |
-| Tech Stack | 15% | X/100 | X |
-| Architecture Quality | 25% | X/100 | X |
-| Phase Planning | 15% | X/100 | X |
-| Constitution Alignment | 15% | X/100 | X |
-| **Total** | **100%** | | **X/100** |
+| Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+|----------|--------|-------|-------------|-------------------|----------|
+| Spec Alignment | 30% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "REQ-003 not addressed: -15"] | X |
+| Tech Stack | 15% | X/100 | [Which rubric range applies] | [e.g., "No version for DB: -5"] | X |
+| Architecture Quality | 25% | X/100 | [Which rubric range applies] | [e.g., "Missing dependency graph: -10"] | X |
+| Phase Planning | 15% | X/100 | [Which rubric range applies] | [e.g., "Phase 2 scope too large: -5"] | X |
+| Constitution Alignment | 15% | X/100 | [Which rubric range applies] | [e.g., "All principles addressed"] | X |
+| **Total** | **100%** | | | | **X/100** |
+> **Suggestion Cap**: Suggestions deducted X/5 points (cap: 5 points max)
 ## Recommendations
@@ -245,6 +339,33 @@ Based on the review result, the user may consider:
 - **Fail**: `/codexspec:spec-to-plan` - to regenerate the technical plan
 ```
+### Score Validation Checklist
+Before finalizing scores, the reviewer MUST verify:
+- [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+- [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+- [ ] Weighted total = sum of (category score × weight) for all categories
+- [ ] Suggestion deductions do not exceed 5-point cap
+- [ ] No "phantom deductions" (deductions without matching issues)
+- [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+### Score Challenge Response Protocol
+When a user questions or challenges the score, follow this three-step process:
+1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+3. **Targeted Re-evaluation**: For each challenged item:
+   - Re-read the relevant section of the plan
+   - Re-apply the rubric criteria objectively
+   - If the original score was correct: explain the reasoning and maintain the score
+   - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+> **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
 ### Quality Criteria
 Before completing the review, verify:
@@ -254,7 +375,7 @@ Before completing the review, verify:
 - [ ] Tech stack choices are evaluated
 - [ ] Constitution alignment is checked
 - [ ] Issues have clear, actionable suggestions
-- [ ] Score reflects actual quality accurately
+- [ ] Score reflects actual quality accurately (validated via Score Validation Checklist)
 - [ ] Next steps are clear and appropriate
 ### Output
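
The arithmetic that the new Score Validation Checklist asks the reviewer to verify can be sketched as follows. This is an illustration under stated assumptions, not code shipped in the package: the weights are copied from the review-plan rubric, the deduction values are made-up examples, flooring a category at 0 is an assumption, and treating totals between 79 and 80 as "Needs Work" is one possible reading of the thresholds.

```python
# Category weights copied from the review-plan rubric in this diff.
WEIGHTS = {
    "Spec Alignment": 0.30,
    "Tech Stack": 0.15,
    "Architecture Quality": 0.25,
    "Phase Planning": 0.15,
    "Constitution Alignment": 0.15,
}


def category_score(deductions: list[int]) -> int:
    """Category score = 100 minus the sum of deductions (floor at 0 is an assumption)."""
    return max(0, 100 - sum(deductions))


def weighted_total(scores: dict[str, int]) -> float:
    """Weighted total = sum of (category score x weight) over all categories."""
    return round(sum(scores[name] * weight for name, weight in WEIGHTS.items()), 2)


def overall_status(total: float) -> str:
    """Map a total to the status bands: Pass >= 80, Needs Work 50-79, Fail < 50."""
    if total >= 80:
        return "Pass"
    if total >= 50:
        return "Needs Work"
    return "Fail"


# Made-up deductions, expressed as positive point values per issue.
deductions = {
    "Spec Alignment": [15],        # e.g. one requirement with no implementation
    "Tech Stack": [5],             # e.g. one technology without a version
    "Architecture Quality": [10],  # e.g. missing dependency graph
    "Phase Planning": [5],         # e.g. one overly large phase
    "Constitution Alignment": [],
}
scores = {name: category_score(points) for name, points in deductions.items()}
total = weighted_total(scores)
print(scores, total, overall_status(total))
```

With these example deductions the category scores are 85, 95, 90, 95, and 100, which weight to a total of 91.5.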

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-python-code.md RENAMED

@@ -93,6 +93,85 @@ Perform a comprehensive code review of Python files at the specified path. This
    - [ ] Include specific code locations and refactoring suggestions
    - [ ] Calculate quality scores per dimension
+### Scoring Rubrics
+Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+#### Pythonic & KISS (30%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Code follows Python idioms; uses built-in/stdlib effectively; no over-engineering; functions are focused |
+| 70-89 | Mostly Pythonic; minor instances of unnecessary complexity or missed stdlib usage |
+| 50-69 | Several non-idiomatic patterns; unnecessary classes or abstractions; missed standard library opportunities |
+| Below 50 | Pervasive over-engineering; code fights against Python idioms; significant complexity issues |
+**Typical Deductions**:
+- Unnecessary class when function suffices: -8 each
+- Missed standard library opportunity (e.g., manual iteration vs. itertools): -5 each
+- Function exceeding single responsibility: -5 each
+- Overly complex logic when simpler alternative exists: -5 each
+#### Type Safety & Explicitness (30%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Complete type annotations; specific exception handling; exception context preserved; good DI patterns |
+| 70-89 | Most functions annotated; minor type safety gaps; 1-2 broad exception catches |
+| 50-69 | Incomplete type annotations; several broad exception handlers; missing `raise from` |
+| Below 50 | No type annotations; pervasive `except Exception:`; no exception context preservation |
+**Typical Deductions**:
+- Public function missing type annotations: -5 each
+- Bare `except:` or `except Exception:` without re-raise: -8 each
+- Missing `raise ... from err` context: -3 each
+- mypy error: -5 each
+#### Engineering Robustness (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Proper resource management (context managers); correct async patterns; proper logging; no print statements |
+| 70-89 | Mostly robust; minor resource management gaps; 1-2 logging issues |
+| 50-69 | Several resource leaks; print statements instead of logging; async pattern issues |
+| Below 50 | No context managers for resources; pervasive print debugging; blocking async operations |
+**Typical Deductions**:
+- File/connection without context manager: -8 each
+- `print()` instead of `logging`: -3 each
+- Blocking call in async context: -10 each
+- Incorrect log level usage: -3 each
+- ruff violation: -3 each
+#### Constitution Alignment (15%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Fully aligned with all constitution MUST principles; project conventions followed |
+| 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+| 50-69 | Partial alignment; several principles not addressed |
+| Below 50 | Significant violations or disregard of constitution |
+> **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+**Typical Deductions**:
+- Constitution MUST violation: -15 each
+- Constitution SHOULD violation: -8 each
+- Naming convention violation: -3 each
+#### Suggestion Score Cap Rule
+**IMPORTANT**: Suggestions (LOW) items may deduct a **maximum of 5 points** from the total score. After resolving all CRITICAL and HIGH issues, the score should be **≥ 95**.
+- CRITICAL Issues: -10 to -20 points each
+- HIGH Issues: -5 to -10 points each
+- MEDIUM Issues: -3 to -5 points each
+- LOW Suggestions: -1 to -2 points each, **capped at 5 points total**
 ### Report Template
 ````markdown
@@ -184,13 +263,15 @@ Perform a comprehensive code review of Python files at the specified path. This
 ## Scoring Breakdown
-| Category | Weight | Score | Weighted |
-|----------|--------|-------|----------|
-| Pythonic & KISS | 30% | X/100 | X |
-| Type Safety | 30% | X/100 | X |
-| Engineering Robustness | 25% | X/100 | X |
-| Constitution Alignment | 15% | X/100 | X |
-| **Total** | **100%** | | **X/100** |
+| Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+|----------|--------|-------|-------------|-------------------|----------|
+| Pythonic & KISS | 30% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "Unnecessary class in utils.py: -8"] | X |
+| Type Safety | 30% | X/100 | [Which rubric range applies] | [e.g., "2 functions missing annotations: -10"] | X |
+| Engineering Robustness | 25% | X/100 | [Which rubric range applies] | [e.g., "File opened without context manager: -8"] | X |
+| Constitution Alignment | 15% | X/100 | [Which rubric range applies] | [e.g., "All principles followed"] | X |
+| **Total** | **100%** | | | | **X/100** |
+> **Suggestion Cap**: LOW suggestions deducted X/5 points (cap: 5 points max)
 ## Available Follow-up Commands
@@ -207,6 +288,33 @@ Based on the review result, consider:
 - **Fail**: Significant rework required - consider `/codexspec:clarify` for design discussion
 ````
+### Score Validation Checklist
+Before finalizing scores, the reviewer MUST verify:
+- [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+- [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+- [ ] Weighted total = sum of (category score × weight) for all categories
+- [ ] LOW suggestion deductions do not exceed 5-point cap
+- [ ] No "phantom deductions" (deductions without matching issues)
+- [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+### Score Challenge Response Protocol
+When a user questions or challenges the score, follow this three-step process:
+1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+3. **Targeted Re-evaluation**: For each challenged item:
+   - Re-read the relevant code section
+   - Re-apply the rubric criteria objectively
+   - If the original score was correct: explain the reasoning and maintain the score
+   - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+> **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
 ### Quality Criteria
 Before completing the review, verify:
@@ -216,7 +324,7 @@ Before completing the review, verify:
 - [ ] Constitution alignment has been checked (if constitution exists)
 - [ ] Issues are categorized by severity (CRITICAL/HIGH/MEDIUM/LOW)
 - [ ] Each CRITICAL/HIGH issue has specific code refactoring suggestions
-- [ ] Score reflects actual code quality accurately
+- [ ] Score reflects actual code quality accurately (validated via Score Validation Checklist)
 - [ ] Strengths section highlights positive aspects
 - [ ] Recommendations are prioritized and actionable
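
To make the "Type Safety & Explicitness" and "Engineering Robustness" rubrics concrete, here is a small sketch of code that would avoid their typical deductions: full type annotations, specific exception handlers, `raise ... from err`, a context manager for the file, and `logging` rather than `print()`. The `ConfigError` class and `load_config` function are hypothetical names chosen for illustration only.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain error used only for this illustration."""


def load_config(path: Path) -> dict[str, object]:
    """Read a JSON object from a file, preserving the original exception as context."""
    try:
        # The context manager closes the file even if parsing fails.
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError as err:
        # Specific handler plus `raise ... from err` keeps the exception chain.
        raise ConfigError(f"config file not found: {path}") from err
    except json.JSONDecodeError as err:
        raise ConfigError(f"invalid JSON in {path}") from err
    if not isinstance(data, dict):
        raise ConfigError(f"expected a JSON object in {path}")
    logger.info("loaded %d settings from %s", len(data), path)
    return data
```

A caller that catches `ConfigError` can still inspect `err.__cause__` to see the underlying `FileNotFoundError` or `JSONDecodeError`.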

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-react-code.md RENAMED

@@ -99,6 +99,101 @@ Perform a comprehensive code review of React/TypeScript files at the specified p
     - [ ] Include specific code locations and refactoring suggestions
     - [ ] Calculate quality scores per dimension
+### Scoring Rubrics
+Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+#### Component Atomicity & SRP (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Each file has one primary component; all under 200 lines; business logic extracted to custom Hooks; clear UI/logic separation |
+| 70-89 | Most components atomic; 1-2 slightly large components; minor mixing of concerns |
+| 50-69 | Several components exceed 200 lines; business logic mixed into UI components |
+| Below 50 | Components are monolithic; no separation of concerns; pervasive SRP violations |
+**Typical Deductions**:
+- Component exceeding 200 lines: -5 each
+- Business logic not extracted to custom Hook: -8 each
+- Multiple primary components in one file: -8 each
+- No separation between UI and logic: -10 each
+#### Hooks Compliance (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All useEffect have complete dependency arrays; no derived-state-as-state; no unnecessary useEffect; no stale closure risks |
+| 70-89 | Minor Hooks issues; 1-2 incomplete dependency arrays or unnecessary useEffect |
+| 50-69 | Several Hooks violations; derived state stored in state; missing dependencies |
+| Below 50 | Pervasive Hooks rule violations; stale closures; incorrect dependency management |
+**Typical Deductions**:
+- useEffect with incomplete dependency array: -8 each
+- Derived state stored as separate state (should be computed): -8 each
+- Unnecessary useEffect (useMemo or direct computation suffices): -5 each
+- Stale closure risk in async/event handler: -8 each
+#### State Management (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | State as local as possible; no excessive prop drilling; proper async handling with loading/error states |
+| 70-89 | Mostly good state management; minor prop drilling; 1-2 missing loading states |
+| 50-69 | Unnecessary global state; significant prop drilling; missing error handling |
+| Below 50 | Poor state architecture; pervasive prop drilling; no async error handling |
+**Typical Deductions**:
+- Unnecessary global/lifted state: -8 each
+- Prop drilling more than 3 levels: -5 each
+- Missing loading state for async operation: -5 each
+- Missing error handling for async operation: -8 each
+- Race condition in async operation: -10 each
+#### Performance & Robustness (20%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | No unnecessary re-renders; proper memoization; null/undefined safety; appropriate React.memo usage |
+| 70-89 | Minor performance issues; 1-2 missing memoizations; mostly null-safe |
+| 50-69 | Several re-render issues; missing memoization for expensive computations; null safety gaps |
+| Below 50 | Pervasive performance issues; no memoization; frequent null/undefined crashes |
+**Typical Deductions**:
+- Unmemoized expensive computation in render: -8 each
+- Object/function created in render without useCallback/useMemo: -5 each
+- Missing optional chaining for nullable access: -3 each
+- Missing React.memo for frequently re-rendered component: -5 each
+#### Constitution Alignment (5%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Fully aligned with all constitution MUST principles; project conventions followed |
+| 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+| 50-69 | Partial alignment; several principles not addressed |
+| Below 50 | Significant violations or disregard of constitution |
+> **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+**Typical Deductions**:
+- Constitution MUST violation: -15 each
+- Constitution SHOULD violation: -8 each
+- Naming convention violation: -3 each
+#### Suggestion Score Cap Rule
+**IMPORTANT**: Suggestions (LOW) items may deduct a **maximum of 5 points** from the total score. After resolving all CRITICAL and HIGH issues, the score should be **≥ 95**.
+- CRITICAL Issues: -10 to -20 points each
+- HIGH Issues: -5 to -10 points each
+- MEDIUM Issues: -3 to -5 points each
+- LOW Suggestions: -1 to -2 points each, **capped at 5 points total**
 ### Report Template
 ````markdown
@@ -191,14 +286,16 @@ Perform a comprehensive code review of React/TypeScript files at the specified p
 ## Scoring Breakdown
-| Category | Weight | Score | Weighted |
-|----------|--------|-------|----------|
-| Component Atomicity & SRP | 25% | X/100 | X |
-| Hooks Compliance | 25% | X/100 | X |
-| State Management | 25% | X/100 | X |
-| Performance & Robustness | 20% | X/100 | X |
-| Constitution Alignment | 5% | X/100 | X |
-| **Total** | **100%** | | **X/100** |
+| Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+|----------|--------|-------|-------------|-------------------|----------|
+| Component Atomicity & SRP | 25% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "UserPanel.tsx 350 lines: -5"] | X |
+| Hooks Compliance | 25% | X/100 | [Which rubric range applies] | [e.g., "useEffect missing dep in Form.tsx: -8"] | X |
+| State Management | 25% | X/100 | [Which rubric range applies] | [e.g., "Missing error state in useFetch: -8"] | X |
+| Performance & Robustness | 20% | X/100 | [Which rubric range applies] | [e.g., "Unmemoized filter in List.tsx: -8"] | X |
+| Constitution Alignment | 5% | X/100 | [Which rubric range applies] | [e.g., "All principles followed"] | X |
+| **Total** | **100%** | | | | **X/100** |
+> **Suggestion Cap**: LOW suggestions deducted X/5 points (cap: 5 points max)
 ## Available Follow-up Commands
@@ -215,6 +312,33 @@ Based on the review result, consider:
 - **Fail**: Significant rework required - consider `/codexspec:clarify` for design discussion
 ````
+### Score Validation Checklist
+Before finalizing scores, the reviewer MUST verify:
+- [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+- [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+- [ ] Weighted total = sum of (category score × weight) for all categories
+- [ ] LOW suggestion deductions do not exceed 5-point cap
+- [ ] No "phantom deductions" (deductions without matching issues)
+- [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+### Score Challenge Response Protocol
+When a user questions or challenges the score, follow this three-step process:
+1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+3. **Targeted Re-evaluation**: For each challenged item:
+   - Re-read the relevant code section
+   - Re-apply the rubric criteria objectively
+   - If the original score was correct: explain the reasoning and maintain the score
+   - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+> **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
 ### Quality Criteria
 Before completing the review, verify:
@@ -224,7 +348,7 @@ Before completing the review, verify:
 - [ ] Constitution alignment has been checked (if constitution exists)
 - [ ] Issues are categorized by severity (CRITICAL/HIGH/MEDIUM/LOW)
 - [ ] Each CRITICAL/HIGH issue has specific code refactoring suggestions
-- [ ] Score reflects actual code quality accurately
+- [ ] Score reflects actual code quality accurately (validated via Score Validation Checklist)
 - [ ] Strengths section highlights positive aspects
 - [ ] Recommendations are prioritized and actionable
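
The Suggestion Score Cap Rule added in this file (LOW items deduct 1 to 2 points each, capped at 5 points total) can be sketched as below. The template does not spell out exactly how the suggestion deduction combines with the weighted category total, so subtracting it from a pre-suggestion score is an assumption here, and both function names are hypothetical.

```python
SUGGESTION_CAP = 5  # maximum total points LOW suggestions may deduct


def suggestion_deduction(low_item_points: list[int]) -> int:
    """Sum the per-item LOW deductions, then apply the 5-point cap."""
    return min(sum(low_item_points), SUGGESTION_CAP)


def final_score(score_before_suggestions: float, low_item_points: list[int]) -> float:
    """Subtract the capped suggestion deduction (assumed combination rule)."""
    return score_before_suggestions - suggestion_deduction(low_item_points)


# Four LOW suggestions at 2 points each would be 8 points uncapped.
low_items = [2, 2, 2, 2]
print(suggestion_deduction(low_items))
print(final_score(100, low_items))
```

Under this reading, a review with no remaining CRITICAL, HIGH, or MEDIUM issues cannot fall below 95 on suggestions alone, which is consistent with the "≥ 95" statement in the rule.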

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-spec.md RENAMED

@@ -76,6 +76,94 @@ Review the feature specification for quality and readiness. This command ensures
    - [ ] Naming conventions are followed (if specified)
    - [ ] Workflow guidelines are considered
+### Scoring Rubrics
+Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+#### Completeness (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All 8 required sections present with substantive content; each section has concrete, specific details |
+| 70-89 | 6-7 sections present and substantive; 1-2 sections thin but present |
+| 50-69 | 4-5 sections present; several sections missing or placeholder-only |
+| Below 50 | Fewer than 4 sections; major gaps in coverage |
+**Typical Deductions**:
+- Missing required section entirely: -15 per section
+- Section present but placeholder/stub only: -8 per section
+- Section present but lacks specificity: -5 per section
+#### Clarity (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | No vague language; all requirements have single clear interpretation; technical terms defined |
+| 70-89 | Minor ambiguities (1-2 vague terms); mostly precise language |
+| 50-69 | Multiple ambiguities; several terms undefined; some requirements open to interpretation |
+| Below 50 | Pervasive vagueness; most requirements unclear or multi-interpretable |
+**Typical Deductions**:
+- Vague term without metrics (e.g., "fast", "user-friendly"): -5 each
+- Requirement with multiple interpretations: -8 each
+- Undefined technical term or acronym: -3 each
+#### Consistency (20%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | No internal contradictions; all sections align perfectly; scope boundaries match goals |
+| 70-89 | Minor inconsistencies (1-2); easily resolved without major impact |
+| 50-69 | Several inconsistencies between sections; conflicting requirements present |
+| Below 50 | Major contradictions; requirements fundamentally conflict with goals or each other |
+**Typical Deductions**:
+- Direct contradiction between requirements: -15 each
+- Scope boundary inconsistent with goals: -10
+- Minor misalignment between sections: -5 each
+#### Testability (20%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All requirements testable; acceptance criteria concrete and executable; edge cases have expected behaviors |
+| 70-89 | Most requirements testable; 1-2 criteria need more specificity |
+| 50-69 | Several requirements lack testable criteria; edge cases missing expected behaviors |
+| Below 50 | Most requirements not verifiable; no concrete acceptance criteria |
+**Typical Deductions**:
+- Requirement without testable acceptance criteria: -8 each
+- Edge case without expected behavior: -5 each
+- Non-measurable NFR (e.g., "should be scalable" without metrics): -8 each
+#### Constitution Alignment (10%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Fully aligned with all constitution principles; quality standards addressed |
+| 70-89 | Mostly aligned; minor gaps in addressing specific principles |
+| 50-69 | Partial alignment; several principles not addressed |
+| Below 50 | Significant violations or disregard of constitution |
+> **Note**: If no constitution exists, this category defaults to 100 (full marks) and its weight is redistributed proportionally to other categories.
+**Typical Deductions**:
+- Constitution principle not addressed: -10 per principle
+- Direct violation of a constitution principle: -20 per violation
+#### Suggestion Score Cap Rule
+**IMPORTANT**: Suggestions (Nice to Have) items may deduct a **maximum of 5 points** from the total score. After resolving all Critical Issues and Warnings, the score should be **≥ 95**.
+- Critical Issues: -10 to -20 points each
+- Warnings: -5 to -10 points each
+- Suggestions: -1 to -2 points each, **capped at 5 points total**
 ### Report Template
 ```markdown
@@ -148,14 +236,16 @@ Review the feature specification for quality and readiness. This command ensures
 ## Scoring Breakdown
-| Category | Weight | Score | Weighted |
-|----------|--------|-------|----------|
-| Completeness | 25% | X/100 | X |
-| Clarity | 25% | X/100 | X |
-| Consistency | 20% | X/100 | X |
-| Testability | 20% | X/100 | X |
-| Constitution Alignment | 10% | X/100 | X |
-| **Total** | **100%** | | **X/100** |
+| Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+|----------|--------|-------|-------------|-------------------|----------|
+| Completeness | 25% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "Missing Edge Cases section: -15"] | X |
+| Clarity | 25% | X/100 | [Which rubric range applies] | [e.g., "2 vague terms: -10"] | X |
+| Consistency | 20% | X/100 | [Which rubric range applies] | [e.g., "No contradictions found"] | X |
+| Testability | 20% | X/100 | [Which rubric range applies] | [e.g., "REQ-003 not testable: -8"] | X |
+| Constitution Alignment | 10% | X/100 | [Which rubric range applies] | [e.g., "All principles addressed"] | X |
+| **Total** | **100%** | | | | **X/100** |
+> **Suggestion Cap**: Suggestions deducted X/5 points (cap: 5 points max)
 ## Recommendations
@@ -185,6 +275,33 @@ Based on the review result, the user may consider:
 - **Fail**: `/codexspec:clarify` - to systematically identify and fix specification issues
 ```
+### Score Validation Checklist
+Before finalizing scores, the reviewer MUST verify:
+- [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+- [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+- [ ] Weighted total = sum of (category score × weight) for all categories
+- [ ] Suggestion deductions do not exceed 5-point cap
+- [ ] No "phantom deductions" (deductions without matching issues)
+- [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+### Score Challenge Response Protocol
+When a user questions or challenges the score, follow this three-step process:
+1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+3. **Targeted Re-evaluation**: For each challenged item:
+   - Re-read the relevant section of the specification
+   - Re-apply the rubric criteria objectively
+   - If the original score was correct: explain the reasoning and maintain the score
+   - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+> **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
 ### Quality Criteria
 Before completing the review, verify:
@@ -192,7 +309,7 @@ Before completing the review, verify:
 - [ ] All sections of the spec have been examined
 - [ ] Issues are categorized by severity (Critical/Warning/Suggestion)
 - [ ] Each issue has a clear, actionable suggestion
-- [ ] Score reflects actual quality accurately
+- [ ] Score reflects actual quality accurately (validated via Score Validation Checklist)
 - [ ] Recommendations are prioritized
 - [ ] Next steps are clear and appropriate
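
Note on the review-spec.md hunks above: the Score Validation Checklist states that each category score is 100 minus the sum of its deductions, and that the weighted total is the sum of category score × weight. The sketch below works through that arithmetic with the weights from the Scoring Breakdown table. The function names are illustrative and are not part of codexspec. Flooring a category at 0, treating any total below 80 (including fractional values such as 79.5) as "Needs Work", and applying the suggestion cap to the summed suggestion points are assumptions, since the template does not spell them out.

```python
# Category weights copied from the review-spec.md Scoring Breakdown table.
WEIGHTS = {
    "Completeness": 0.25,
    "Clarity": 0.25,
    "Consistency": 0.20,
    "Testability": 0.20,
    "Constitution Alignment": 0.10,
}

SUGGESTION_CAP = 5  # "capped at 5 points total"


def category_score(deductions):
    """Return 100 minus the sum of deductions (floor at 0 is an assumption)."""
    return max(0, 100 - sum(deductions))


def weighted_total(scores):
    """Return the sum of (category score x weight) over all categories."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


def overall_status(total):
    """Map a total to Pass (>= 80), Needs Work (50 to below 80), or Fail (< 50)."""
    if total >= 80:
        return "Pass"
    if total >= 50:
        return "Needs Work"
    return "Fail"


def capped_suggestion_deduction(points):
    """Sum suggestion deductions, never exceeding the 5-point cap."""
    return min(sum(points), SUGGESTION_CAP)


# Deductions taken from the examples in the report template.
example_deductions = {
    "Completeness": [15],          # "Missing Edge Cases section: -15"
    "Clarity": [10],               # "2 vague terms: -10"
    "Consistency": [],             # "No contradictions found"
    "Testability": [8],            # "REQ-003 not testable: -8"
    "Constitution Alignment": [],  # "All principles addressed"
}
example_scores = {
    name: category_score(d) for name, d in example_deductions.items()
}
example_total = weighted_total(example_scores)
print(example_scores, round(example_total, 2), overall_status(example_total))
```

With these inputs the category scores are 85, 90, 100, 92 and 100, which gives a weighted total of about 92.15 and a "Pass" status.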

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/review-tasks.md RENAMED Viewed

@@ -85,6 +85,96 @@ Review the task breakdown for quality and implementation readiness. This command
    - [ ] File paths are consistent with plan
    - [ ] File naming conventions are followed (per constitution)
+### Scoring Rubrics
+Before scoring, apply these rubrics to ensure consistent, transparent evaluation.
+#### Plan Coverage (30%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All plan phases, modules, APIs, and data models have corresponding tasks; no gaps |
+| 70-89 | Most plan items covered; 1-2 minor components missing task coverage |
+| 50-69 | Several plan items lack task coverage; missing tasks for key modules |
+| Below 50 | Major plan phases or components have no corresponding tasks |
+**Typical Deductions**:
+- Plan phase with no tasks: -15 each
+- Module/component without implementation task: -10 each
+- API endpoint without task: -8 each
+- Missing testing tasks for plan items: -5 each
+#### TDD Compliance (25%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All code components have test tasks before implementation tasks; test tasks are not optional |
+| 70-89 | Most components follow TDD; 1-2 minor ordering issues |
+| 50-69 | Several components lack test-first ordering; some test tasks missing |
+| Below 50 | No TDD enforcement; tests are absent or consistently after implementation |
+**Typical Deductions**:
+- Component without test task: -12 each
+- Test task ordered after implementation task: -8 each
+- Test task marked as optional: -5 each
+#### Dependency & Ordering (20%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | All dependencies correctly identified; no circular dependencies; foundation tasks first; logical ordering |
+| 70-89 | Dependencies mostly correct; 1-2 minor ordering issues |
+| 50-69 | Several missing or incorrect dependencies; some ordering problems |
+| Below 50 | Circular dependencies present; major ordering errors; dependencies largely incorrect |
+**Typical Deductions**:
+- Circular dependency: -15 each
+- Missing dependency declaration: -5 each
+- Incorrect task ordering: -8 each
+- Foundation task not placed first: -10
+#### Task Granularity (15%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Each task involves one primary file; clear single deliverable; appropriate scope |
+| 70-89 | Most tasks are atomic; 1-2 tasks slightly broad but manageable |
+| 50-69 | Several tasks involve multiple files or unclear scope |
+| Below 50 | Tasks are overly broad or too narrow; no atomic focus |
+**Typical Deductions**:
+- Task involving multiple primary files: -8 each
+- Task scope too broad (should be split): -5 each
+- Task scope too narrow (should be combined): -3 each
+#### Parallelization & Files (10%)
+| Score Range | Criteria |
+|-------------|----------|
+| 90-100 | Independent tasks correctly marked [P]; file paths specified and follow conventions; no false parallel markers |
+| 70-89 | Mostly correct parallel markers; minor file path issues |
+| 50-69 | Several incorrect parallel markers; missing file paths |
+| Below 50 | Parallel markers largely incorrect; file paths missing or wrong |
+**Typical Deductions**:
+- Dependent task incorrectly marked [P]: -8 each
+- Independent task missing [P] marker: -3 each
+- Task without file path specification: -5 each
+- File path not following project convention: -3 each
+#### Suggestion Score Cap Rule
+**IMPORTANT**: Suggestions (Nice to Have) items may deduct a **maximum of 5 points** from the total score. After resolving all Critical Issues and Warnings, the score should be **≥ 95**.
+- Critical Issues: -10 to -20 points each
+- Warnings: -5 to -10 points each
+- Suggestions: -1 to -2 points each, **capped at 5 points total**
 ### Report Template
 ```markdown
@@ -236,14 +326,16 @@ Valid Dependency Chain:
 ## Scoring Breakdown
-| Category | Weight | Score | Weighted |
-|----------|--------|-------|----------|
-| Plan Coverage | 30% | X/100 | X |
-| TDD Compliance | 25% | X/100 | X |
-| Dependency & Ordering | 20% | X/100 | X |
-| Task Granularity | 15% | X/100 | X |
-| Parallelization & Files | 10% | X/100 | X |
-| **Total** | **100%** | | **X/100** |
+| Category | Weight | Score | Rubric Basis | Deduction Details | Weighted |
+|----------|--------|-------|-------------|-------------------|----------|
+| Plan Coverage | 30% | X/100 | [Which rubric range applies] | [List specific deductions, e.g., "Module C missing task: -10"] | X |
+| TDD Compliance | 25% | X/100 | [Which rubric range applies] | [e.g., "Service X test after impl: -8"] | X |
+| Dependency & Ordering | 20% | X/100 | [Which rubric range applies] | [e.g., "Missing dependency Task 2.3→1.2: -5"] | X |
+| Task Granularity | 15% | X/100 | [Which rubric range applies] | [e.g., "Task 2.5 involves 3 files: -8"] | X |
+| Parallelization & Files | 10% | X/100 | [Which rubric range applies] | [e.g., "Task 2.1 false [P] marker: -8"] | X |
+| **Total** | **100%** | | | | **X/100** |
+> **Suggestion Cap**: Suggestions deducted X/5 points (cap: 5 points max)
 ## Execution Timeline Estimate
@@ -298,6 +390,33 @@ Based on the review result, the user may consider:
 - **Fail**: `/codexspec:plan-to-tasks` - to regenerate the task breakdown
 ```
+### Score Validation Checklist
+Before finalizing scores, the reviewer MUST verify:
+- [ ] Every deduction in "Deduction Details" column has a corresponding issue in "Detailed Findings"
+- [ ] The arithmetic is correct: each category score = 100 minus sum of deductions
+- [ ] Weighted total = sum of (category score × weight) for all categories
+- [ ] Suggestion deductions do not exceed 5-point cap
+- [ ] No "phantom deductions" (deductions without matching issues)
+- [ ] Score is consistent with Overall Status (Pass ≥ 80, Needs Work 50-79, Fail < 50)
+### Score Challenge Response Protocol
+When a user questions or challenges the score, follow this three-step process:
+1. **Provide Evidence**: Present the complete scoring breakdown with all deduction details. Reference the specific rubric criteria and issue IDs that justify each deduction.
+2. **Ask for Specifics**: Ask the user which specific scoring item(s) they believe are incorrect. Do NOT preemptively adjust any scores.
+3. **Targeted Re-evaluation**: For each challenged item:
+   - Re-read the relevant section of the tasks document
+   - Re-apply the rubric criteria objectively
+   - If the original score was correct: explain the reasoning and maintain the score
+   - If the original score was indeed incorrect: adjust with clear explanation of what changed and why
+> **CRITICAL**: Never adjust scores simply because the user expresses dissatisfaction. Only adjust when re-evaluation reveals a genuine scoring error.
 ### Quality Criteria
 Before completing the review, verify:
@@ -308,7 +427,7 @@ Before completing the review, verify:
 - [ ] Task granularity is appropriate (Task Granularity)
 - [ ] Parallelization markers and file paths are correct (Parallelization & Files)
 - [ ] Issues have clear, actionable suggestions
-- [ ] Score reflects actual quality accurately
+- [ ] Score reflects actual quality accurately (validated via Score Validation Checklist)
 ### Output
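
Note on the review-tasks.md hunks above: the Score Validation Checklist asks the reviewer to confirm that every deduction maps to a finding, that each category score equals 100 minus its deductions, and that the weighted total matches. A minimal sketch of such a check follows, using the weights from the review-tasks Scoring Breakdown table. The `Deduction` and `CategoryRow` types, the `validate_report` function, the issue IDs such as `T-001`, and the rounding tolerance are all hypothetical; the template does not define a machine-readable report format.

```python
from dataclasses import dataclass, field

# Category weights copied from the review-tasks.md Scoring Breakdown table.
TASK_WEIGHTS = {
    "Plan Coverage": 0.30,
    "TDD Compliance": 0.25,
    "Dependency & Ordering": 0.20,
    "Task Granularity": 0.15,
    "Parallelization & Files": 0.10,
}


@dataclass
class Deduction:
    issue_id: str  # hypothetical ID of an entry in "Detailed Findings"
    points: int    # positive number of points deducted


@dataclass
class CategoryRow:
    reported_score: int
    deductions: list[Deduction] = field(default_factory=list)


def validate_report(rows, finding_ids, reported_total, tolerance=0.01):
    """Return a list of checklist violations; an empty list means the report is consistent."""
    problems = []
    for name, row in rows.items():
        # "No phantom deductions": every deduction needs a matching finding.
        for d in row.deductions:
            if d.issue_id not in finding_ids:
                problems.append(f"{name}: phantom deduction {d.issue_id}")
        # "Each category score = 100 minus sum of deductions".
        expected = 100 - sum(d.points for d in row.deductions)
        if row.reported_score != expected:
            problems.append(
                f"{name}: reported {row.reported_score}, expected {expected}"
            )
    # "Weighted total = sum of (category score x weight)".
    total = sum(rows[name].reported_score * w for name, w in TASK_WEIGHTS.items())
    if abs(total - reported_total) > tolerance:
        problems.append(f"Total: reported {reported_total}, expected {total:.2f}")
    return problems


# Example rows built from the deduction examples in the report template.
rows = {
    "Plan Coverage": CategoryRow(90, [Deduction("T-001", 10)]),
    "TDD Compliance": CategoryRow(92, [Deduction("T-002", 8)]),
    "Dependency & Ordering": CategoryRow(95, [Deduction("T-003", 5)]),
    "Task Granularity": CategoryRow(92, [Deduction("T-004", 8)]),
    "Parallelization & Files": CategoryRow(92, [Deduction("T-005", 8)]),
}
findings = {"T-001", "T-002", "T-003", "T-004", "T-005"}
print(validate_report(rows, findings, reported_total=92.0))
```

Dropping an ID from `findings` or changing a reported score makes the function return a message for each mismatch, which mirrors the checklist items one by one.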

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/spec-to-plan.md RENAMED Viewed

@@ -31,6 +31,48 @@ $ARGUMENTS
 Transform the feature specification into a detailed technical implementation plan. This is where you define **how** the feature will be built.
+### Quality Targets
+Before generating the plan, internalize these quality targets. They are aligned with the `review-plan` scoring rubrics to ensure first-pass quality.
+#### Spec Alignment (Target: ≥ 90)
+- [ ] Every functional requirement (REQ-XXX) has a corresponding implementation component
+- [ ] Every user story has technical coverage in the architecture
+- [ ] All non-functional requirements are addressed in architecture decisions
+- [ ] Edge cases from the spec are handled in implementation phases
+#### Tech Stack (Target: ≥ 90)
+- [ ] All technologies are clearly listed with version constraints
+- [ ] Each technology choice is justified for the requirements
+- [ ] Tech stack aligns with project constitution (if exists)
+- [ ] No critical category missing (language, framework, testing, etc.)
+#### Architecture Quality (Target: ≥ 90)
+- [ ] High-level architecture diagram included (ASCII or Mermaid)
+- [ ] Each module has explicit responsibility, dependencies, and interfaces
+- [ ] Module dependency graph is complete
+- [ ] Separation of concerns is maintained
+- [ ] Design patterns are appropriate and documented
+#### Phase Planning (Target: ≥ 90)
+- [ ] Phases are logically ordered (foundation → core → integration → testing)
+- [ ] Each phase has specific, measurable deliverables
+- [ ] Phase scope is realistic and manageable
+- [ ] Inter-phase dependencies are minimal and documented
+#### Constitution Alignment (Target: ≥ 90)
+- [ ] Each constitution principle explicitly reviewed
+- [ ] Architecture decisions reference relevant principles
+- [ ] Testing requirements from constitution are incorporated
+- [ ] Naming conventions and workflow guidelines followed
+> **Self-Check**: After generating the plan, verify each target above is met before saving. This reduces review iterations.
 ### Execution Steps
 1. **Load Context**

{codexspec-0.5.14 → codexspec-0.5.16}/.gitignore RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/LICENSE RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/README.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/codexspec-icon.svg RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/codexspec-logo-dark.svg RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/codexspec-logo-light.svg RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/check-i18n-completeness.sh RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/check-i18n-structure.sh RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/check-prerequisites.sh RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/common.sh RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/bash/create-new-feature.sh RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/powershell/check-prerequisites.ps1 RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/powershell/common.ps1 RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/powershell/create-new-feature.ps1 RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/.claude/settings.local.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/README.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/claude_ctl.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/claude_monitor.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/scripts/python/notify_telegram.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/__init__.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/commands/__init__.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/commands/installer.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/i18n.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/idea.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/src/codexspec/translator.py RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/analyze.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/check-i18n-semantics.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/checklist.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/clarify.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/commit-staged.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/config.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/constitution.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/generate-spec.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/implement-tasks.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/pr.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/quick.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/specify.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/tasks-to-issues.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/commands/translate-docs.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/checklist-template.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/constitution-template.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/plan-template-detailed.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/plan-template-simple.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/spec-template-detailed.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/spec-template-simple.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/tasks-template-detailed.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/docs/tasks-template-simple.md RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/de.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/en.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/es.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/fr.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/ja.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/ko.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/pt-BR.json RENAMED Viewed

File without changes

{codexspec-0.5.14 → codexspec-0.5.16}/templates/translations/zh-CN.json RENAMED Viewed

File without changes
