npm - jettypod - Versions diffs - 3.0.1 - Mend

jettypod 3.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (122)

package/.claude/PROTECT_SKILLS.md +28 -0
package/.claude/settings.json +24 -0
package/.claude/settings.local.json +16 -0
package/.claude/skills/epic-discover/SKILL.md +262 -0
package/.claude/skills/feature-discover/SKILL.md +393 -0
package/.claude/skills/speed-mode/SKILL.md +364 -0
package/.claude/skills/stable-mode/SKILL.md +591 -0
package/.github/workflows/test-safety.yml +85 -0
package/README.md +25 -0
package/SPEED-STABLE-AUDIT.md +853 -0
package/SYSTEM-BEHAVIOR.md +1241 -0
package/TEST_SAFETY_AUDIT.md +314 -0
package/TEST_SAFETY_IMPLEMENTATION.md +97 -0
package/cucumber.js +8 -0
package/docs/COMMAND_REFERENCE.md +903 -0
package/docs/DECISIONS.md +68 -0
package/docs/README.md +48 -0
package/docs/STANDARDS-SYSTEM-DOCUMENTATION.md +374 -0
package/docs/TEST-REWRITE-PLAN.md +261 -0
package/docs/ai-test-writing-requirements.md +219 -0
package/docs/claude-code-skills.md +607 -0
package/docs/core-jettypod-methodology/comprehensive-jettypod-methodology.md +582 -0
package/docs/core-jettypod-methodology/deprecated/jettypod-comprehensive-standards.md +1222 -0
package/docs/core-jettypod-methodology/deprecated/jettypod-operating-guide.md +3399 -0
package/docs/core-jettypod-methodology/deprecated/jettypod-technical-checklist.md +1325 -0
package/docs/core-jettypod-methodology/deprecated/jettypod-vibe-coding-framework.md +1544 -0
package/docs/core-jettypod-methodology/deprecated/prompt-engineering-guide.md +320 -0
package/docs/core-jettypod-methodology/deprecated/vibe-coding-cheatsheet (1).md +516 -0
package/docs/core-jettypod-methodology/deprecated/vibe-coding-framework.md +1544 -0
package/docs/features/jettypod-standards-explained.md +543 -0
package/docs/features/standards-inventory.md +257 -0
package/docs/gap-analysis-current-vs-comprehensive-methodology.md +939 -0
package/docs/jettypod-system-overview.md +409 -0
package/features/auto-generate-production-chores.feature +14 -0
package/features/claude-md-protection/steps.js +487 -0
package/features/decisions/index.js +490 -0
package/features/decisions/index.test.js +208 -0
package/features/git-hooks/git-hooks.feature +30 -0
package/features/git-hooks/index.js +93 -0
package/features/git-hooks/index.test.js +137 -0
package/features/git-hooks/post-commit +56 -0
package/features/git-hooks/post-merge +47 -0
package/features/git-hooks/pre-commit +28 -0
package/features/git-hooks/simple-steps.js +53 -0
package/features/git-hooks/simple-test.feature +10 -0
package/features/git-hooks/steps.js +196 -0
package/features/jettypod-update-command.feature +46 -0
package/features/mode-prompts/index.js +95 -0
package/features/mode-prompts/simple-steps.js +44 -0
package/features/mode-prompts/simple-test.feature +9 -0
package/features/mode-prompts/validation.test.js +120 -0
package/features/refactor-mode/steps.js +217 -0
package/features/refactor-mode.feature +49 -0
package/features/skills-update/index.test.js +216 -0
package/features/step_definitions/auto-generate-production-chores.steps.js +162 -0
package/features/step_definitions/terminal-logo.steps.js +145 -0
package/features/step_definitions/update-command.steps.js +183 -0
package/features/terminal-logo/index.js +39 -0
package/features/terminal-logo/terminal-logo.feature +30 -0
package/features/update-command/index.js +181 -0
package/features/update-command/index.test.js +225 -0
package/features/work-commands/bug-workflow-display.feature +22 -0
package/features/work-commands/index.js +311 -0
package/features/work-commands/simple-steps.js +69 -0
package/features/work-commands/stable-tests.feature +57 -0
package/features/work-commands/steps.js +1120 -0
package/features/work-commands/validation.test.js +88 -0
package/features/work-commands/work-commands.feature +13 -0
package/features/work-tracking/discovery-validation.test.js +228 -0
package/features/work-tracking/index.js +1511 -0
package/features/work-tracking/mode-required.feature +112 -0
package/features/work-tracking/phase-tracking.test.js +482 -0
package/features/work-tracking/prototype-tracking.test.js +485 -0
package/features/work-tracking/tree-view.test.js +310 -0
package/features/work-tracking/work-set-mode.feature +71 -0
package/features/work-tracking/work-start-mode.feature +88 -0
package/full-test.txt +0 -0
package/install.sh +89 -0
package/jettypod.js +1640 -0
package/lib/bug-workflow.js +94 -0
package/lib/bug-workflow.test.js +177 -0
package/lib/claudemd.js +130 -0
package/lib/claudemd.test.js +195 -0
package/lib/comprehensive-standards-full.json +1778 -0
package/lib/config.js +181 -0
package/lib/config.test.js +511 -0
package/lib/constants.js +107 -0
package/lib/constants.test.js +164 -0
package/lib/current-work.js +130 -0
package/lib/current-work.test.js +146 -0
package/lib/database-project-config.test.js +107 -0
package/lib/database.js +256 -0
package/lib/database.test.js +106 -0
package/lib/decisions-generator.js +102 -0
package/lib/decisions-generator.test.js +457 -0
package/lib/decisions-helpers.js +119 -0
package/lib/decisions-helpers.test.js +310 -0
package/lib/discovery-checkpoint.js +83 -0
package/lib/docs-generator.js +280 -0
package/lib/external-checklist.js +177 -0
package/lib/git.js +142 -0
package/lib/git.test.js +145 -0
package/lib/logo.js +3 -0
package/lib/migrations/001-epic-to-parent.js +24 -0
package/lib/migrations/002-default-work-item-modes.js +37 -0
package/lib/migrations/002-default-work-item-modes.test.js +351 -0
package/lib/migrations/003-epic-discovery-fields.js +52 -0
package/lib/migrations/004-discovery-decisions-table.js +32 -0
package/lib/migrations/005-migrate-decision-data.js +62 -0
package/lib/migrations/006-feature-phase-field.js +61 -0
package/lib/migrations/007-prototype-tracking.js +38 -0
package/lib/migrations/008-scenario-file-field.js +24 -0
package/lib/migrations/index.js +74 -0
package/lib/production-helpers.js +69 -0
package/lib/project-state.test.js +92 -0
package/lib/test-helpers.js +184 -0
package/lib/test-helpers.test.js +255 -0
package/package.json +36 -0
package/prototypes/test/index.html +1 -0
package/setup-dist-repo.sh +68 -0
package/test-safety-check.sh +80 -0
package/work-item-tracking-plan.md +199 -0

package/SPEED-STABLE-AUDIT.md ADDED

@@ -0,0 +1,853 @@
+# Speed/Stable/Production Mode Skills Audit
+## Executive Summary
+This document analyzes the three-phase development methodology implemented through the mode skills. The approach separates feature implementation into:
+- **Speed mode**: Implement ALL scoped functionality assuming happy path
+- **Stable mode**: Add error handling and validation for internal use
+- **Production mode**: Add security, compliance, scalability, and performance for external/production use
+This audit focuses on the speed and stable mode skills, as production mode is not yet implemented.
+---
+## 1. Opportunities to Refactor
+### 1.1 Code Duplication
+**Issue**: Scenario parsing and file discovery logic is duplicated across both skills.
+**Example**: Both skills independently:
+- Parse Gherkin scenarios
+- Query the database for work items
+- Extract scenario requirements
+- Run BDD tests
+**Recommendation**: Extract shared functionality into common utilities:
+```javascript
+// lib/scenario-helpers.js
+function parseScenarios(scenarioContent) { /* ... */ }
+function matchScenarioToChore(scenarios, choreDesc) { /* ... */ }
+// lib/test-helpers.js
+async function runBddTests(featureFile, options) { /* ... */ }
+function parseTestResults(stdout) { /* ... */ }
+```
+### 1.2 Inconsistent Error Handling Patterns
+**Issue**: Stable mode demonstrates comprehensive error handling in its *instructions* but speed mode doesn't model good error handling practices in its own implementation code.
+**Example**: Speed mode's scenario loading code (lines 44-64) has no error handling, while stable mode's equivalent (lines 80-158) is comprehensive.
+**Recommendation**:
+- Both skills should practice what they preach
+- Speed mode instruction code should still have basic error handling (even if the code it generates doesn't)
+- Create a clear separation between "skill execution code" and "code being generated"
+### 1.3 File Discovery Brittleness
+**Issue**: Both skills rely on fragile approaches for finding relevant files:
+- Speed mode: Manual glob/grep patterns based on scenario keywords
+- Stable mode: Git history parsing with manual fallback
+**Example**: Stable mode (lines 260-321) has complex git log parsing that can fail in multiple ways:
+```javascript
+const { stdout: gitLog } = await execPromise(
+  `git log --oneline --all --grep="${featureName}" -10`
+);
+```
+**Problems**:
+- Assumes feature name in commit messages
+- Limited to 10 commits
+- Breaks if commits aren't descriptive
+- No handling for squashed/rebased history
+**Recommendation**:
+- Store file associations in the work item database
+- Speed mode records files it creates/modifies
+- Stable mode reads from database first, git history second
+- Add a `work_item_files` table linking work items to affected files
+### 1.4 Test Execution Parsing
+**Issue**: Both skills parse test output with fragile string matching.
+**Example** (stable mode, line 472):
+```javascript
+const targetScenarioPassed = stdout.includes('[scenario-title]') && stdout.includes('✓');
+```
+**Problems**:
+- Breaks if test output format changes
+- Hardcoded symbols ('✓', '✗')
+- No structured output parsing
+- Difficult to extract specific failure details
+**Recommendation**:
+- Use structured test output (JSON reporters)
+- Parse Cucumber/Jest JSON output for deterministic results
+- Provide detailed failure context (line numbers, assertions, stack traces)
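The structured-output recommendation in 1.4 could look roughly like the sketch below. It assumes the report has the shape produced by Cucumber's JSON formatter (an array of features, each with `elements` whose `steps` carry `result.status`). The helper name `findScenarioResult` and the sample data are hypothetical and not part of jettypod.

```javascript
// Sketch: decide whether a scenario passed by reading a parsed Cucumber JSON
// report instead of matching '✓' in stdout. Assumed report shape:
// [{ elements: [{ name, steps: [{ keyword, name, result: { status } }] }] }]
function findScenarioResult(report, scenarioName) {
  for (const feature of report) {
    for (const scenario of feature.elements || []) {
      if (scenario.name !== scenarioName) continue;
      const steps = scenario.steps || [];
      // Any step that is missing a result or did not pass counts as a failure.
      const failed = steps.filter((s) => !s.result || s.result.status !== 'passed');
      return {
        found: true,
        passed: steps.length > 0 && failed.length === 0,
        failures: failed.map((s) => ({
          step: `${s.keyword || ''}${s.name || ''}`.trim(),
          status: s.result ? s.result.status : 'unknown',
          message: (s.result && s.result.error_message) || null,
        })),
      };
    }
  }
  return { found: false, passed: false, failures: [] };
}

// Illustrative report data (made up for this example).
const sampleReport = [
  {
    name: 'Upload',
    elements: [
      {
        name: 'User uploads a file',
        steps: [
          { keyword: 'Given ', name: 'I am on the upload page', result: { status: 'passed' } },
          {
            keyword: 'Then ',
            name: 'the file is uploaded',
            result: { status: 'failed', error_message: 'expected file to exist' },
          },
        ],
      },
    ],
  },
];

const uploadResult = findScenarioResult(sampleReport, 'User uploads a file');
```

Returning the failing step and its message gives the iteration loop something specific to act on, which string matching on the console output cannot provide.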
+### 1.5 Iteration Logic Complexity
+**Issue**: Stable mode's test-fix iteration loop (lines 434-548) is monolithic and hard to reason about.
+**Problems**:
+- Mixes concerns: file editing, test running, result parsing, error handling
+- Difficult to test in isolation
+- Hard to add new iteration strategies
+- No clear separation of retry logic from business logic
+**Recommendation**:
+```javascript
+// lib/iteration-engine.js
+class TestIterationEngine {
+  constructor(maxIterations = 10, timeout = 60000) { /* ... */ }
+  async iterate(implementFn, testFn, analyzeFn) {
+    // Generic iteration logic
+  }
+  handleTestTimeout(error) { /* ... */ }
+  handleTestFailure(results) { /* ... */ }
+  handleMaxIterations() { /* ... */ }
+}
+// In stable-mode skill
+const engine = new TestIterationEngine();
+await engine.iterate(
+  () => this.addErrorHandling(),
+  () => this.runTests(),
+  (results) => this.analyzeFailures(results)
+);
+```
+### 1.6 User Interaction Ambiguity
+**Issue**: Skills have unclear boundaries between "autonomous execution" and "requires confirmation".
+**Example**: Speed mode says "autonomous" but has multiple confirmation points:
+- Step 3 Phase 1: Wait for implementation approach confirmation
+- Step 4 Phase 2: Wait for stable chore creation confirmation
+**Recommendation**:
+- Define clear "decision gates" vs "execution phases"
+- Use a consistent pattern: `proposeAndConfirm(proposal)` → `execute(plan)`
+- Document *why* each gate exists (e.g., "high-level direction setting" vs "tactical execution")
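One possible shape for the `proposeAndConfirm(proposal)` then `execute(plan)` pattern from 1.6 is sketched below. The `askUser` callback, the `gateReason` field and the `runGatedPhase` wrapper are assumptions made for illustration; the audit only names the pattern.

```javascript
// Sketch: a decision gate that must be approved before an execution phase runs.
// `askUser` is a hypothetical async callback that returns the user's answer.
async function proposeAndConfirm(proposal, askUser) {
  const answer = await askUser(`${proposal.summary}\nProceed? (yes/no)`);
  return /^y(es)?$/i.test(String(answer).trim());
}

async function runGatedPhase({ proposal, askUser, execute }) {
  const approved = await proposeAndConfirm(proposal, askUser);
  if (!approved) {
    // Report why the gate exists so a rejection is easy to interpret.
    return { status: 'rejected', reason: proposal.gateReason };
  }
  // After approval the plan runs without further prompts.
  const result = await execute(proposal.plan);
  return { status: 'executed', result };
}
```

Keeping the gate and the execution in separate functions makes it explicit which steps pause for the user and which run autonomously.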
+### 1.7 Missing Rollback/Undo Mechanisms
+**Issue**: Neither skill provides rollback if implementation fails or user rejects.
+**Problems**:
+- No way to undo file changes if tests fail after max iterations
+- No git commit checkpointing
+- Hard to recover from partial failures
+**Recommendation**:
+```javascript
+// lib/transaction-helpers.js
+class CodeTransaction {
+  begin() {
+    this.checkpoint = getCurrentGitCommit();
+    this.changedFiles = [];
+  }
+  rollback() {
+    // Restore files from checkpoint
+    // Or create a revert commit
+  }
+  commit(message) {
+    // Create git commit with changed files
+  }
+}
+```
+### 1.8 Scenario Matching Heuristics
+**Issue**: Stable mode's scenario matching (lines 184-221) uses crude keyword matching.
+**Example**:
+```javascript
+const keywords = choreDesc.split(/\s+/).filter(w => w.length > 3);
+const matches = keywords.filter(k => scenarioLower.includes(k));
+```
+**Problems**:
+- False positives (common words like "user", "data")
+- Doesn't handle synonyms
+- No semantic understanding
+- No confidence scoring
+**Recommendation**:
+- Use explicit scenario references in chore descriptions: `[Scenario 2]`
+- Store scenario index in work item database
+- Provide fuzzy matching with confidence scores
+- Allow user to select from ranked list if ambiguous
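The recommendations in 1.8 might combine as in the sketch below: an explicit `[Scenario N]` reference wins outright, and otherwise scenarios are ranked by a simple keyword-overlap score. The function name, the stop-word list and the scoring formula are illustrative assumptions, not the skill's actual logic.

```javascript
// Hypothetical stop words: common terms that would cause false positives.
const STOP_WORDS = new Set(['user', 'data', 'with', 'that', 'this', 'from', 'when', 'then']);

// Returns candidate scenarios sorted by confidence (highest first).
function matchScenario(choreDesc, scenarioTitles) {
  // 1. An explicit 1-based reference such as "[Scenario 2]" is unambiguous.
  const explicit = /\[Scenario (\d+)\]/i.exec(choreDesc);
  if (explicit) {
    const index = Number(explicit[1]) - 1;
    if (index >= 0 && index < scenarioTitles.length) {
      return [{ index, title: scenarioTitles[index], confidence: 1 }];
    }
  }
  // 2. Otherwise score each title by the share of chore keywords it contains.
  const keywords = choreDesc
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 3 && !STOP_WORDS.has(w));
  if (keywords.length === 0) return [];
  return scenarioTitles
    .map((title, index) => {
      const lower = title.toLowerCase();
      const hits = keywords.filter((k) => lower.includes(k)).length;
      return { index, title, confidence: hits / keywords.length };
    })
    .filter((m) => m.confidence > 0)
    .sort((a, b) => b.confidence - a.confidence);
}

const titles = [
  'User uploads a file',
  'Upload fails when file is too large',
  'Upload times out',
];
const explicitMatch = matchScenario('Handle oversized files [Scenario 2]', titles);
const rankedMatch = matchScenario('Reject files that are too large', titles);
```

When the ranked list has several entries with similar confidence, the caller can present it to the user instead of picking one silently.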
+---
+## 2. The Three-Phase Development Model
+### 2.1 Overview
+The methodology separates feature development into three distinct phases, each with clear responsibilities:
+**Phase 1: Speed Mode**
+- **Goal**: Implement ALL scoped functionality
+- **Scope**: Every feature/function defined in discovery
+- **Philosophy**: Assume happy path - valid inputs, successful operations, correct types
+- **Excludes**: Error handling, validation, edge cases, performance optimization, security hardening
+- **Output**: Working feature that passes happy path BDD scenario
+- **Target**: Internal proof-of-concept, rapid prototyping
+**Phase 2: Stable Mode**
+- **Goal**: Add robustness for internal use
+- **Scope**: Error handling, input validation, edge case coverage
+- **Philosophy**: Build on speed implementation without re-implementing core features
+- **Includes**: Try/catch blocks, validation checks, clear error messages, graceful failures
+- **Excludes**: Security hardening, compliance, scalability, performance optimization
+- **Output**: Reliable feature for internal/team use
+- **Target**: Internal tools, staging environments, team testing
+**Phase 3: Production Mode** (not yet implemented in skills)
+- **Goal**: Make feature production-ready for external users
+- **Scope**: Security, compliance, scalability, performance
+- **Philosophy**: Harden stable implementation for real-world use
+- **Includes**: Authentication/authorization, rate limiting, audit logging, performance optimization, load testing, compliance requirements (GDPR, HIPAA, etc.)
+- **Output**: Production-grade feature
+- **Target**: External users, public-facing systems, regulated environments
+### 2.2 Key Insight: Progressive Hardening
+The three-phase model recognizes that **internal use** and **external/production use** have fundamentally different requirements:
+- **Internal**: Developers know how to use the system, can handle rough edges, can debug issues
+- **Production**: Users expect polish, security, performance, and reliability
+Stable mode optimizes for developer productivity (get a working feature fast for internal use), while production mode optimizes for user experience and enterprise requirements.
+---
+## 3. Strengths of This Approach
+### 3.1 Separation of Concerns
+**Strength**: Clean three-phase separation mirrors real-world development practices and aligns with internal vs external deployment targets.
+**Why it works**:
+- Reduces cognitive load (focus on one thing at a time)
+- Prevents premature optimization and over-engineering
+- Allows rapid prototyping without getting bogged down in edge cases
+- Natural checkpoints between major phases (speed → stable → production)
+- Aligns with deployment targets (internal → staging → production)
+- Defers expensive work (security audits, load testing) until actually needed
+**Evidence**: Speed mode explicitly states "NO error handling" (line 21), stable mode adds robustness (line 17), and production mode will add security/scalability.
+### 3.2 BDD-Driven Development
+**Strength**: Using Gherkin scenarios as source of truth creates clear, testable requirements.
+**Benefits**:
+- Machine-readable specifications
+- Clear acceptance criteria
+- Automatic verification through step definitions
+- Living documentation that stays synchronized with code
+**Example**: Both skills rely on scenario files to drive what gets implemented, ensuring implementation matches requirements.
+### 3.3 Autonomous Execution with Strategic Confirmation
+**Strength**: Skills ask for confirmation on *approach* but execute *implementation* autonomously.
+**Why it works**:
+- User validates high-level decisions (architecture, approach)
+- Claude handles tactical execution (code writing, iteration)
+- Balances control with efficiency
+- Appropriate for "users who may not know how to code" (line 26)
+**Example**: Speed mode Step 3 proposes implementation approach, waits for confirmation, then executes autonomously.
+### 3.4 Progressive Complexity
+**Strength**: Intentional progression from simple to complex reduces failure points.
+**Benefits**:
+- Happy path implementation (speed) establishes foundation
+- Edge cases and errors (stable) build incrementally
+- Each phase has clear success criteria
+- Easier to debug (isolate whether issue is in core logic vs error handling)
+### 3.5 Comprehensive Error Modeling
+**Strength**: Stable mode's error handling examples (lines 80-548) demonstrate thorough error scenarios.
+**Coverage includes**:
+- Database errors
+- File system errors
+- Missing data
+- Invalid input
+- Timeout errors
+- Git operation failures
+- Test execution failures
+**Why it works**: Demonstrates real-world failure modes, not just theoretical errors.
+### 3.6 Iteration Limits and Failure Recovery
+**Strength**: Stable mode includes max iteration limits and graceful failure handling.
+**Example** (lines 434, 526-547):
+```javascript
+const MAX_ITERATIONS = 10;
+// ...
+if (!scenarioPasses && iteration >= MAX_ITERATIONS) {
+  console.error('Maximum iterations reached');
+  console.log('Possible reasons:');
+  // Provides actionable suggestions
+}
+```
+**Why it works**:
+- Prevents infinite loops
+- Forces explicit failure handling
+- Provides debugging guidance
+- Asks user for direction when stuck
+### 3.7 Work Decomposition Strategy
+**Strength**: Speed mode automatically generates stable mode chores based on implementation gaps.
+**Benefits**:
+- Ensures stable mode work isn't forgotten
+- Creates explicit work items for non-happy-path scenarios
+- Provides traceability (each chore maps to specific scenario)
+- Prevents "works on my machine" releases
+**Example** (speed mode lines 259-361): Analyzes implementation, proposes chores, creates them programmatically.
+### 3.8 Clear Mental Models
+**Strength**: Both skills use consistent structure and terminology.
+**Patterns**:
+- Four-step progression: Analyze → Review → Implement → Verify
+- Autonomous vs confirmation phases clearly marked
+- Consistent emoji/formatting for status messages
+- Clear error vs success states
+**Why it works**: Reduces cognitive load, makes workflows predictable and learnable.
+---
+## 4. Criticisms
+### 4.1 Architectural: Tight Coupling to BDD
+**Problem**: The entire workflow *requires* BDD scenarios to exist and be correct.
+**Failure modes**:
+- What if scenarios don't cover actual requirements?
+- What if step definitions are incorrect?
+- What if business needs change mid-implementation?
+- How to handle exploratory development?
+**Example**: Speed mode line 217 is the *only* verification: "Check if BDD scenario passes". If scenarios are wrong, you'll implement the wrong thing correctly.
+**Impact**:
+- Assumes scenarios are perfect specifications
+- No validation that scenarios match actual user needs
+- Difficult to adapt to changing requirements
+- Requires upfront BDD expertise
+### 4.2 User Profile Assumption
+**Problem**: "User may not know how to code" (speed line 26, stable line 30) is baked into the design.
+**Implications**:
+- Limited ability to customize implementation
+- No escape hatch for code-savvy users
+- Difficult to intervene during execution
+- Black box execution model
+**Example**: User can't say "implement this specific way" during autonomous execution phases.
+**Better approach**:
+- Provide "expert mode" toggle
+- Allow code injection points
+- Support mid-execution adjustments
+- Provide "review-before-execute" option
+### 4.3 Speed Mode Philosophy: Simplified vs Wrong Implementation
+**Problem**: "Assume everything works correctly" (speed line 20) risks creating implementations that need rewriting rather than enhancing.
+**Example**: Speed mode line 240 says "localStorage for data" as a pragmatic choice - but if stable mode needs a real database for internal use, you're rewriting, not enhancing.
+**Key distinction**:
+- **Simplified** (good): Real database with basic queries, no error handling → Stable adds try/catch and validation
+- **Wrong** (bad): localStorage → Stable mode must rewrite to use real database
+**Risk**: Speed mode might choose implementation approaches that can't be enhanced in stable mode, forcing rewrites.
+**Updated with three-phase context**:
+- Speed: Simplified real implementations (real DB, real APIs, real file upload)
+- Stable: Add error handling and validation for internal use
+- Production: Add security, performance, and scalability
+This keeps the "build on" philosophy intact across all phases. Stable doesn't re-implement core logic, and production doesn't re-implement stable's error handling - each phase truly enhances the previous.
+### 4.4 Iteration Strategy: Blind Trial and Error
+**Problem**: Both skills iterate without learning from previous attempts.
+**Example** (stable mode lines 434-548): Iteration loop has no memory:
+- No tracking of what was already tried
+- No analysis of *why* previous iteration failed
+- No strategic adjustment of approach
+**Risk**: Could try the same fix 10 times, hitting max iterations without progress.
+**Better approach**:
+```javascript
+class IterationEngine {
+  constructor() {
+    this.attemptHistory = [];
+  }
+  async iterate() {
+    const failureAnalysis = this.analyzePatterns(this.attemptHistory);
+    const nextStrategy = this.selectStrategy(failureAnalysis);
+    // Execute with adjusted approach
+  }
+}
+```
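To make the "memory" idea in 4.4 concrete, the sketch below shows one way an attempt history could be tracked. The class name, the strategy labels and the "same failure message N times" heuristic are assumptions for illustration only.

```javascript
// Sketch: remember what was tried so the loop does not repeat itself.
class AttemptHistory {
  constructor() {
    this.attempts = [];
  }

  // Record the strategy used and the failure it produced.
  record(strategy, failureMessage) {
    this.attempts.push({ strategy, failureMessage });
  }

  // Strategies that have not been tried yet, in their original order.
  untried(strategies) {
    const tried = new Set(this.attempts.map((a) => a.strategy));
    return strategies.filter((s) => !tried.has(s));
  }

  // True when the last `n` attempts all failed with the same message,
  // which suggests the loop is no longer making progress.
  isStuck(n = 3) {
    if (this.attempts.length < n) return false;
    const recent = this.attempts.slice(-n).map((a) => a.failureMessage);
    return recent.every((m) => m === recent[0]);
  }
}

const history = new AttemptHistory();
history.record('wrap-in-try-catch', 'Expected error message to be shown');
history.record('validate-input', 'Expected error message to be shown');
const remaining = history.untried(['wrap-in-try-catch', 'validate-input', 'check-step-definition']);
```

An iteration loop could consult `isStuck()` before each retry and hand control back to the user early instead of spending the remaining iterations on the same fix.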
+### 4.5 Testing: Lack of Unit Test Strategy
+**Problem**: Both skills only verify via BDD integration tests.
+**Missing**:
+- Unit tests for individual functions
+- Mock/stub strategies
+- Fast feedback loops (BDD tests are slow)
+- Granular failure isolation
+**Impact**: When BDD test fails, hard to know *which* function is wrong.
+**Example**: Speed mode creates entire implementation then runs BDD test. If it fails, unclear which piece is broken.
+**Recommendation**: Generate unit tests alongside implementation:
+- Speed mode: Create basic unit tests for happy path
+- Stable mode: Add edge case unit tests
+- Run unit tests before BDD tests (faster feedback)
+### 4.6 Error Messages: Inconsistent Guidance
+**Problem**: Error messages sometimes provide actionable guidance, sometimes don't.
+**Good example** (stable mode line 120):
+```javascript
+console.error('❌ Feature has no scenario_file.');
+console.log('Suggestion: Create a scenario file and update the feature.');
+```
+**Bad example** (stable mode line 503):
+```javascript
+console.error('❌ Test execution error:', testErr.message);
+console.log('Retrying...');
+// No explanation of what to check or how to fix
+```
+**Recommendation**: All error messages should follow pattern:
+1. What failed
+2. Why it might have failed
+3. How to fix it
+4. What the skill will do next
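A small helper can enforce the four-part pattern above. This is a minimal sketch; the function name `formatSkillError` and the line labels are hypothetical.

```javascript
// Sketch: build an error message with the four parts recommended above.
function formatSkillError({ what, why, fix, next }) {
  return [
    `❌ ${what}`,               // 1. What failed
    `Possible cause: ${why}`,   // 2. Why it might have failed
    `Suggestion: ${fix}`,       // 3. How to fix it
    `Next: ${next}`,            // 4. What the skill will do next
  ].join('\n');
}

const retryMessage = formatSkillError({
  what: 'Test execution error',
  why: 'The test runner exited before reporting results',
  fix: 'Check that the feature file path exists and dependencies are installed',
  next: 'Retrying the test run',
});
console.error(retryMessage);
```

Requiring all four fields at the call site makes it harder to emit a bare "Retrying..." with no guidance.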
+### 4.7 Architectural Decisions: Underutilized
+**Problem**: Speed mode checks for architectural decisions (line 108) but doesn't explain how they're enforced.
+**Questions**:
+- What if implementation violates decisions?
+- Are decisions validated automatically?
+- Can decisions be overridden?
+- How are conflicts resolved?
+**Missing**: No validation that generated code adheres to architectural constraints.
+### 4.8 Stable Mode Chore Creation: Manual Decomposition
+**Problem**: Speed mode generates chore proposals through automated analysis, then asks the user to confirm them (line 318) without showing how complete or reliable that analysis is.
+**Risk**: User might not understand what chores are needed, leading to:
+- Missing edge cases
+- Duplicate chores
+- Insufficient coverage
+- Wrong granularity
+**Better approach**: Provide confidence scores and coverage analysis:
+```
+Chore 1: Handle file upload errors [Confidence: High, Coverage: Error scenario 2]
+Chore 2: Validate file size limits [Confidence: Medium, Coverage: Edge scenario 3]
+Chore 3: Handle concurrent uploads [Confidence: Low, Coverage: Not in scenarios - inferred]
+```
+### 4.9 Performance Deferred to Production Mode
+**Clarification**: This is by design, not an oversight. Performance optimization is production mode's responsibility.
+**Three-phase division**:
+- **Speed**: Functional correctness on happy path
+- **Stable**: Reliability for internal use (error handling, validation)
+- **Production**: Performance, scalability, security for external use
+**Potential issue**: If speed mode makes poor algorithmic choices (O(n²) when O(n log n) is trivial), stable mode won't catch it because tests pass with small datasets.
+**Recommendation**: Add basic performance awareness to speed mode:
+- Don't choose obviously inefficient algorithms
+- Use reasonable data structures (Map vs Array for lookups)
+- Production mode adds: benchmarking, load testing, scalability analysis, resource optimization
+### 4.10 Unclear Handoff Between Modes
+**Problem**: No explicit verification that speed mode completion criteria are met before stable mode starts.
+**Questions**:
+- What if speed mode didn't finish?
+- What if happy path test is broken?
+- What if some features weren't implemented?
+**Missing**: Stable mode should verify speed mode completion:
+```javascript
+// Verify happy path passes before adding error handling
+const happyPathResults = await runHappyPathScenario();
+if (!happyPathResults.passed) {
+  console.error('Cannot start stable mode - happy path broken');
+  console.log('Fix speed mode implementation first');
+  return;
+}
+```
+---
+## 5. Failure Modes
+### 5.1 Scenario Specification Failures
+**Failure Mode**: BDD scenarios are incomplete, ambiguous, or incorrect.
+**Example**:
+```gherkin
+Scenario: User uploads a file
+  Given I am on the upload page
+  When I click upload
+  Then the file is uploaded
+```
+**Problem**: Scenario doesn't specify:
+- What file? (size, type, name)
+- Where is it uploaded? (URL, storage location)
+- What confirmation? (UI feedback, status)
+- What happens next? (navigation, state change)
+**Impact**: Claude implements based on assumptions, which may be wrong.
+**Consequence**: Tests pass but feature doesn't meet actual requirements.
+**Mitigation**: Add scenario validation step before implementation.
+### 5.2 Step Definition Failures
+**Failure Mode**: Step definitions don't correctly verify requirements.
+**Example**:
+```javascript
+Then('the file is uploaded', function() {
+  // Just checks that function was called
+  expect(uploadFile).toHaveBeenCalled();
+});
+```
+**Problem**: Doesn't verify file actually exists in storage, correct size, permissions, etc.
+**Impact**: Tests pass with broken implementation.
+**Consequence**: Bugs discovered in production, not testing.
+**Mitigation**: Generate step definitions with comprehensive assertions.
+### 5.3 Codebase Pattern Misunderstanding
+**Failure Mode**: Skill misinterprets codebase patterns and implements inconsistently.
+**Example**:
+- Codebase uses Promises but skill generates callbacks
+- Codebase uses TypeScript but skill generates JavaScript
+- Codebase uses class components but skill generates hooks
+**Impact**: Code works but doesn't fit architectural patterns.
+**Consequence**: Code review rejection, refactoring required, technical debt.
+**Mitigation**: Add explicit pattern validation step.
+### 5.4 Git History Pollution
+**Failure Mode**: Multiple iterations create messy commit history.
+**Example**: Stable mode iterates 8 times, creating 8 commits, all for same chore.
+**Impact**: Unclear git history, difficult to review, hard to revert.
+**Consequence**: Poor code review experience, difficult debugging.
+**Mitigation**:
+- Use transaction pattern (single commit per chore)
+- Squash iteration commits automatically
+- Provide clean commit message generation
+### 5.5 Test Timeout Cascades
+**Failure Mode**: One slow test causes timeout, leading to false failures.
+**Example** (stable mode lines 500-509): Test times out, skill assumes implementation problem.
+**Actual cause**: Network latency, slow CI, resource contention.
+**Impact**: Skill iterates unnecessarily, wasting time.
+**Consequence**: Max iterations reached, chore marked as failed.
+**Mitigation**:
+- Distinguish timeout from failure
+- Retry timeouts with backoff
+- Provide manual override option
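The first two mitigations above could be sketched as follows: timeouts get their own error type and are retried with exponential backoff, while any other failure is surfaced immediately. `TestTimeoutError`, `withTimeout`, `runWithTimeoutRetry` and the default values are hypothetical names and numbers chosen for illustration.

```javascript
// Sketch: treat a timeout differently from a genuine test failure.
class TestTimeoutError extends Error {}

// Reject with TestTimeoutError if `promise` does not settle within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new TestTimeoutError(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Retry only timeouts, waiting backoffMs, 2*backoffMs, 4*backoffMs, ...
// Any other error is rethrown at once so real failures are not masked.
async function runWithTimeoutRetry(
  runTests,
  { timeoutMs = 60000, retries = 2, backoffMs = 1000 } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await withTimeout(runTests(), timeoutMs);
    } catch (err) {
      const isTimeout = err instanceof TestTimeoutError;
      if (!isTimeout || attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffMs * 2 ** attempt));
    }
  }
}
```

With this split, a slow CI run consumes retry budget rather than one of the skill's fix iterations, and only real assertion failures trigger another implementation attempt.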
+### 5.6 Circular Dependencies
+**Failure Mode**: Speed implementation creates circular dependencies that pass tests but break at runtime.
+**Example**:
+```javascript
+// fileA.js
+const { processB } = require('./fileB');
+// fileB.js
+const { processA } = require('./fileA');
+```
+**Impact**: BDD tests mock dependencies, so circular dep not detected.
+**Consequence**: Runtime crash in production.
+**Mitigation**: Add static analysis step to detect circular deps.
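A static check for circular dependencies can be as simple as a depth-first search over the module graph. The sketch below assumes the graph has already been extracted (for example by scanning `require` calls) into a plain object mapping each module to its dependencies. The function name `findCycle` and the graph format are illustrative assumptions.

```javascript
// Sketch: return the first dependency cycle found, or null if there is none.
// `graph` maps a module name to the list of modules it requires.
function findCycle(graph) {
  const state = new Map(); // module -> 'visiting' | 'done'
  const stack = [];        // current DFS path

  function visit(node) {
    if (state.get(node) === 'done') return null;
    if (state.get(node) === 'visiting') {
      // Found a back edge: the cycle is the path from `node` back to itself.
      return stack.slice(stack.indexOf(node)).concat(node);
    }
    state.set(node, 'visiting');
    stack.push(node);
    for (const dep of graph[node] || []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, 'done');
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

const cyclic = findCycle({ 'fileA.js': ['fileB.js'], 'fileB.js': ['fileA.js'] });
const acyclic = findCycle({ 'a.js': ['b.js'], 'b.js': ['c.js'], 'c.js': [] });
```

For the two-file example above, the check reports the path `fileA.js`, `fileB.js`, `fileA.js`, which is enough to point a reviewer at the offending imports before any test runs.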
+### 5.7 State Leakage Between Tests
+**Failure Mode**: Tests share state (localStorage, database, globals), causing flakiness.
+**Example**:
+- Test 1 sets `localStorage.user = "admin"`
+- Test 2 assumes anonymous user
+- Test 2 fails intermittently depending on test order
+**Impact**: Tests pass in isolation but fail in suite.
+**Consequence**: Skill assumes code is broken, iterates unnecessarily.
+**Mitigation**: Add test isolation verification, setup/teardown helpers.
+### 5.8 External Dependency Failures
+**Failure Mode**: Tests depend on external services (APIs, databases) that are unavailable.
+**Example**: BDD scenario tests real API integration, API is down during test run.
+**Impact**: Tests fail, skill assumes implementation broken.
+**Consequence**: Skill iterates without making progress.
+**Mitigation**:
+- Distinguish external failures from implementation failures
+- Provide retry logic for external deps
+- Support mock/stub modes
+### 5.9 Scenario Evolution During Development
+**Failure Mode**: Business requirements change mid-development, scenarios become outdated.
+**Example**:
+- Speed mode implements scenario v1
+- Stakeholder changes requirements
+- Stable mode implements scenario v2
+- Speed and stable implementations diverge
+**Impact**: Implementation inconsistency.
+**Consequence**: Neither version fully works.
+**Mitigation**: Version control scenarios, validate consistency across modes.
+### 5.10 Memory Limitations with Large Codebases
+**Failure Mode**: Codebase analysis loads too many files, exceeding context limits.
+**Example**: Speed mode uses Grep to find patterns, matches 500 files, tries to read all.
+**Impact**: Context overflow, incomplete analysis.
+**Consequence**: Wrong files modified, incorrect integration points.
+**Mitigation**:
+- Implement smart filtering (relevance scoring)
+- Limit analysis scope (recent files, same directory)
+- Use incremental analysis
+### 5.11 Ambiguous Error Recovery
+**Failure Mode**: Skill encounters error it wasn't designed to handle.
+**Example**: Database schema migration needed, but skill doesn't recognize it.
+**Error message**: "Cannot find column 'scenario_file'"
+**Skill behavior**: Retries same query multiple times, hits max iterations.
+**Consequence**: Fails without useful guidance.
+**Mitigation**: Add fallback to user consultation when stuck.
+### 5.12 Overconfidence in Test Passing
+**Failure Mode**: Tests pass but code doesn't actually work as intended.
+**Example**:
+- BDD test mocks HTTP responses
+- Implementation has off-by-one error in real API call
+- Test passes because mock returns expected data
+- Real usage fails
+**Impact**: False confidence in implementation quality.
+**Consequence**: Bugs discovered in production.
+**Mitigation**: Require integration tests with real dependencies in a separate validation step.
+### 4.13 Skill Version Drift
+**Failure Mode**: Speed mode skill updated but stable mode skill not updated accordingly.
+**Example**:
+- Speed mode changes chore creation format
+- Stable mode expects old format
+- Handoff breaks
+**Impact**: Stable mode cannot find or parse speed mode output.
+**Consequence**: Manual intervention required.
+**Mitigation**: Version skills, validate compatibility at handoff points.
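A compatibility check at the handoff could look roughly like this. The envelope shape (`formatVersion` plus `chores`), the `major.minor` scheme, and the name `checkHandoff` are all assumptions made for illustration. The audit does not specify how speed mode output is structured.

```javascript
// Major format version that this hypothetical stable-mode reader understands.
const SUPPORTED_MAJOR = 2;

// Validate the version stamp on speed-mode output before trying to parse it.
function checkHandoff(payload) {
  const version = payload && payload.formatVersion;
  if (typeof version !== 'string' || !/^\d+\.\d+$/.test(version)) {
    return { ok: false, reason: 'missing or malformed formatVersion' };
  }
  const major = Number(version.split('.')[0]);
  if (major !== SUPPORTED_MAJOR) {
    return { ok: false, reason: `unsupported format ${version}; expected ${SUPPORTED_MAJOR}.x` };
  }
  return { ok: true, reason: null };
}
```

Failing fast with a reason string turns a silent parse failure into an actionable message at the point where the two skills meet.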
+### 4.14 Human Misunderstanding of Autonomous Boundaries
+**Failure Mode**: User expects skill to stop for confirmation but it executes autonomously.
+**Example**: User expects to review each file change, but skill creates 10 files without pausing.
+**Impact**: User surprise, possible unwanted changes.
+**Consequence**: Manual rollback, lost trust.
+**Mitigation**: Make autonomous boundaries more explicit in the UI and provide a "confirm each step" mode.
+### 4.15 Insufficient Context for Complex Scenarios
+**Failure Mode**: Scenario requires domain knowledge not captured in codebase or scenario file.
+**Example**:
+```gherkin
+Scenario: Process payment with merchant account
+  Given a valid credit card
+  When I submit payment
+  Then funds are transferred to merchant account
+```
+**Missing context**:
+- Which payment gateway?
+- What merchant ID?
+- What API credentials?
+- Staging vs production?
+**Impact**: Skill implements with assumptions.
+**Consequence**: Implementation doesn't work with real merchant account.
+**Mitigation**: Require configuration validation before implementation.
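As an illustration of validating configuration before implementation starts, the sketch below checks for the kinds of values the payment scenario leaves unstated. The key names (`gateway`, `merchantId`, `apiKey`, `environment`) are hypothetical stand-ins for the missing context listed above, not settings from any real gateway.

```javascript
// Hypothetical keys the payment scenario would need before work begins.
const REQUIRED_PAYMENT_CONFIG = ['gateway', 'merchantId', 'apiKey', 'environment'];
const ALLOWED_ENVIRONMENTS = ['staging', 'production'];

// Returns a list of problems; an empty list means the configuration is usable.
function validateConfig(config, requiredKeys = REQUIRED_PAYMENT_CONFIG) {
  const problems = [];
  for (const key of requiredKeys) {
    const value = config[key];
    if (typeof value !== 'string' || value.trim() === '') {
      problems.push(`missing: ${key}`);
    }
  }
  if (config.environment && !ALLOWED_ENVIRONMENTS.includes(config.environment)) {
    problems.push(`invalid environment: ${config.environment}`);
  }
  return problems;
}
```

If the returned list is non-empty, the skill can ask the user for the missing values instead of implementing against guesses.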
+---
+## 6. Recommendations Summary
+### High Priority
+1. **Extract shared utilities** to reduce duplication
+2. **Implement file tracking database** to replace fragile git parsing
+3. **Add rollback/transaction support** for failed implementations
+4. **Validate scenario quality** before implementation
+5. **Distinguish external failures** from implementation failures
+### Medium Priority
+6. **Add unit testing generation** alongside BDD tests
+7. **Improve iteration intelligence** with attempt history analysis
+8. **Standardize error message format** across all failures
+9. **Version control skill dependencies** to prevent drift
+10. **Add performance scenarios** to BDD requirements
+### Low Priority
+11. **Provide expert mode** for code-savvy users
+12. **Add static analysis** for circular dependencies and code smells
+13. **Implement confidence scoring** for scenario matching
+14. **Add coverage analysis** for chore proposals
+15. **Create debugging guides** for common failure modes
+---
+## Conclusion
+The three-phase development model (speed → stable → production) is fundamentally sound and addresses real challenges in software development. The progressive hardening approach echoes the widely used "make it work, make it right, make it fast" progression:
+- **Speed**: Make it work (internal proof-of-concept)
+- **Stable**: Make it reliable (internal use with error handling)
+- **Production**: Make it production-ready (external use with security, performance, compliance)
+This separation aligns with deployment targets and defers expensive work (security audits, load testing) until actually needed for production.
+**Key strengths**:
+- Clean separation of concerns reduces cognitive load
+- BDD-driven approach ensures testable requirements
+- Progressive complexity prevents premature optimization
+- Autonomous execution appropriate for non-technical users
+**Significant brittleness**:
+- BDD scenario quality assumptions (garbage in, garbage out)
+- File discovery mechanisms (fragile git parsing)
+- Iteration strategies (blind trial and error)
+- Error recovery (no rollback, limited learning)
+**Recommendations**:
+- More defensive programming in skill execution code
+- Better failure mode handling and recovery
+- Clearer boundaries between autonomous and confirmed actions
+- Smarter iteration with learning from previous attempts
+- Shared utilities to reduce duplication
+- File tracking database to replace git parsing
+**Three-phase insight**:
+The division between stable (internal use) and production (external use) is particularly valuable. It acknowledges that internal tools can have rough edges because developers can work around them, while production must be hardened for real users. This prevents over-engineering internal tools while ensuring production systems are properly secured and optimized.
+Despite these criticisms, the approach is innovative and addresses a real need for structured, test-driven development that works for non-technical users. With refinements around robustness and failure recovery, this methodology could be highly effective for teams building both internal tools and production systems.