jettypod 4.1.2 → 4.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (179)
  1. package/.nvmrc +1 -0
  2. package/docs/COMPLETE-TESTING-STRATEGY.md +970 -0
  3. package/docs/DECISIONS.md +10 -12
  4. package/docs/NODE_VERSION.md +83 -0
  5. package/docs/TDD-INFRASTRUCTURE-STRATEGY.md +1374 -0
  6. package/docs/TESTING-FOR-NON-ENGINEERS.md +1588 -0
  7. package/docs/TESTING-STRATEGY-AUDIT.md +698 -0
  8. package/hooks/post-checkout +17 -0
  9. package/hooks/post-merge +17 -0
  10. package/hooks/pre-commit +30 -0
  11. package/jettypod.js +259 -120
  12. package/lib/coverage-tracker.js +218 -0
  13. package/lib/database.js +2 -0
  14. package/lib/db-export.js +192 -0
  15. package/lib/db-import.js +193 -0
  16. package/lib/external-transition-handler.js +32 -0
  17. package/lib/git-hook-helpers.js +174 -0
  18. package/lib/git-root.js +90 -0
  19. package/lib/infrastructure-chore-generator.js +45 -0
  20. package/lib/install-hooks.js +52 -0
  21. package/lib/jettypod-backup.js +238 -0
  22. package/lib/merge-lock.js +193 -0
  23. package/lib/migrations/012-add-worktree-path.js +38 -0
  24. package/lib/migrations/013-worktrees-table.js +86 -0
  25. package/lib/migrations/014-migrate-worktree-data.js +161 -0
  26. package/lib/migrations/015-merge-locks-table.js +67 -0
  27. package/lib/pattern-finder.js +152 -0
  28. package/lib/process-manager.js +140 -0
  29. package/lib/production-standards-reader.js +13 -2
  30. package/lib/production-standards-writer.js +85 -0
  31. package/lib/skills/feature-planning/dry-run-validator.js +135 -0
  32. package/lib/skills/feature-planning/validation-formatter.js +160 -0
  33. package/lib/smart-conflict-detection.js +168 -0
  34. package/lib/smart-fetch-rebase.js +614 -0
  35. package/lib/step-definition-parser.js +76 -0
  36. package/lib/unit-test-generator.js +232 -0
  37. package/lib/verification-command-generator.js +66 -0
  38. package/lib/worktree-diagnostics.js +413 -0
  39. package/lib/worktree-facade.js +174 -0
  40. package/lib/worktree-manager.js +636 -0
  41. package/lib/worktree-reconciler.js +429 -0
  42. package/package.json +30 -3
  43. package/skills-templates/external-transition/SKILL.md +34 -3
  44. package/skills-templates/feature-planning/SKILL.md +190 -24
  45. package/skills-templates/production-mode/SKILL.md +127 -9
  46. package/skills-templates/speed-mode/SKILL.md +454 -51
  47. package/skills-templates/stable-mode/SKILL.md +285 -76
  48. package/.claude/PROTECT_SKILLS.md +0 -28
  49. package/.claude/settings.json +0 -24
  50. package/.claude/settings.local.json +0 -16
  51. package/.claude/skills/epic-planning/SKILL.md +0 -297
  52. package/.claude/skills/external-transition/SKILL.md +0 -384
  53. package/.claude/skills/feature-planning/SKILL.md +0 -464
  54. package/.claude/skills/production-mode/SKILL.md +0 -369
  55. package/.claude/skills/speed-mode/SKILL.md +0 -481
  56. package/.claude/skills/stable-mode/SKILL.md +0 -713
  57. package/.claude/skills.backup-2025-11-10T23-33-09-368Z/epic-planning/SKILL.md +0 -297
  58. package/.claude/skills.backup-2025-11-10T23-33-09-368Z/feature-planning/SKILL.md +0 -464
  59. package/.claude/skills.backup-2025-11-10T23-33-09-368Z/speed-mode/SKILL.md +0 -467
  60. package/.claude/skills.backup-2025-11-10T23-33-09-368Z/stable-mode/SKILL.md +0 -673
  61. package/.claude/skills.backup-2025-11-11T16-15-10-070Z/epic-discover/SKILL.md +0 -297
  62. package/.claude/skills.backup-2025-11-11T16-42-43-212Z/epic-planning/SKILL.md +0 -297
  63. package/.claude/skills.backup-2025-11-11T16-42-43-212Z/feature-planning/SKILL.md +0 -464
  64. package/.claude/skills.backup-2025-11-11T16-42-43-212Z/speed-mode/SKILL.md +0 -467
  65. package/.claude/skills.backup-2025-11-11T16-42-43-212Z/stable-mode/SKILL.md +0 -673
  66. package/.claude/skills.backup-2025-11-11T17-06-09-783Z/epic-planning/SKILL.md +0 -297
  67. package/.claude/skills.backup-2025-11-11T17-06-09-783Z/feature-planning/SKILL.md +0 -464
  68. package/.claude/skills.backup-2025-11-11T17-06-09-783Z/speed-mode/SKILL.md +0 -467
  69. package/.claude/skills.backup-2025-11-11T17-06-09-783Z/stable-mode/SKILL.md +0 -673
  70. package/.devpod/current-work.json +0 -10
  71. package/.devpod/work.db +0 -0
  72. package/.github/workflows/test-safety.yml +0 -85
  73. package/.jettypod/config.json +0 -5
  74. package/.jettypod/current-work.json +0 -10
  75. package/.jettypod/hooks/README.md +0 -77
  76. package/.jettypod/hooks/protect-claude-md.js +0 -338
  77. package/.jettypod/test-work.db +0 -0
  78. package/.jettypod/work.db +0 -0
  79. package/CLAUDE.md +0 -49
  80. package/SPEED-STABLE-AUDIT.md +0 -853
  81. package/SYSTEM-BEHAVIOR.md +0 -2199
  82. package/TEST_SAFETY_AUDIT.md +0 -314
  83. package/TEST_SAFETY_IMPLEMENTATION.md +0 -97
  84. package/cucumber-report.html +0 -45
  85. package/dist/devpod-linux +0 -0
  86. package/dist/devpod-macos +0 -0
  87. package/dist/devpod-win.exe +0 -0
  88. package/docs/features/jettypod-standards-explained.md +0 -543
  89. package/docs/features/standards-inventory.md +0 -257
  90. package/features/auto-generate-production-chores.feature +0 -13
  91. package/features/backlog-command.feature +0 -26
  92. package/features/backlog-filtering-production.feature +0 -10
  93. package/features/claude-md-protection/steps.js +0 -498
  94. package/features/decisions/index.js +0 -490
  95. package/features/decisions/index.test.js +0 -208
  96. package/features/fix-text-wrapping.feature +0 -42
  97. package/features/git-hooks/git-hooks.feature +0 -30
  98. package/features/git-hooks/index.js +0 -93
  99. package/features/git-hooks/index.test.js +0 -137
  100. package/features/git-hooks/post-commit +0 -56
  101. package/features/git-hooks/post-merge +0 -47
  102. package/features/git-hooks/pre-commit +0 -28
  103. package/features/git-hooks/simple-steps.js +0 -53
  104. package/features/git-hooks/simple-test.feature +0 -10
  105. package/features/git-hooks/steps.js +0 -196
  106. package/features/jettypod-update-command.feature +0 -46
  107. package/features/mode-prompts/index.js +0 -95
  108. package/features/mode-prompts/simple-steps.js +0 -44
  109. package/features/mode-prompts/simple-test.feature +0 -9
  110. package/features/mode-prompts/validation.test.js +0 -120
  111. package/features/multiple-claude-instances.feature +0 -121
  112. package/features/production-mode-skill.feature +0 -121
  113. package/features/refactor-mode/steps.js +0 -217
  114. package/features/refactor-mode.feature +0 -49
  115. package/features/simplify-external-transition.feature +0 -166
  116. package/features/skills-update/index.test.js +0 -216
  117. package/features/step_definitions/backlog-command.steps.js +0 -37
  118. package/features/step_definitions/fix-text-wrapping.steps.js +0 -271
  119. package/features/step_definitions/multiple-claude-instances.steps.js +0 -621
  120. package/features/step_definitions/production-mode-skill.steps.js +0 -862
  121. package/features/step_definitions/simplify-external-transition.steps.js +0 -370
  122. package/features/step_definitions/terminal-logo.steps.js +0 -145
  123. package/features/step_definitions/update-command.steps.js +0 -183
  124. package/features/support/hooks.js +0 -9
  125. package/features/terminal-logo/index.js +0 -39
  126. package/features/terminal-logo/terminal-logo.feature +0 -30
  127. package/features/update-command/index.js +0 -181
  128. package/features/update-command/index.test.js +0 -225
  129. package/features/work-commands/bug-workflow-display.feature +0 -22
  130. package/features/work-commands/index.js +0 -498
  131. package/features/work-commands/simple-steps.js +0 -69
  132. package/features/work-commands/stable-tests.feature +0 -57
  133. package/features/work-commands/steps.js +0 -1174
  134. package/features/work-commands/validation.test.js +0 -88
  135. package/features/work-commands/work-commands.feature +0 -13
  136. package/features/work-tracking/discovery-validation.test.js +0 -228
  137. package/features/work-tracking/index.js +0 -1921
  138. package/features/work-tracking/mode-required.feature +0 -112
  139. package/features/work-tracking/phase-tracking.test.js +0 -482
  140. package/features/work-tracking/prototype-tracking.test.js +0 -485
  141. package/features/work-tracking/tree-view.test.js +0 -310
  142. package/features/work-tracking/work-set-mode.feature +0 -71
  143. package/features/work-tracking/work-start-mode.feature +0 -88
  144. package/full-test.txt +0 -0
  145. package/lib/bug-workflow.test.js +0 -177
  146. package/lib/claudemd.test.js +0 -195
  147. package/lib/config.test.js +0 -511
  148. package/lib/constants.test.js +0 -164
  149. package/lib/current-work.test.js +0 -146
  150. package/lib/database-project-config.test.js +0 -111
  151. package/lib/database.test.js +0 -106
  152. package/lib/decisions-generator.test.js +0 -457
  153. package/lib/decisions-helpers.test.js +0 -310
  154. package/lib/git-coordinator.js +0 -167
  155. package/lib/git.test.js +0 -145
  156. package/lib/migrations/002-default-work-item-modes.test.js +0 -351
  157. package/lib/production-chore-generator.test.js +0 -432
  158. package/lib/production-context-detector.test.js +0 -277
  159. package/lib/production-scenario-appender.test.js +0 -235
  160. package/lib/production-scenario-validator.test.js +0 -246
  161. package/lib/production-standards-reader.test.js +0 -270
  162. package/lib/project-state.test.js +0 -92
  163. package/lib/push-queue.js +0 -417
  164. package/lib/queue-processor.js +0 -74
  165. package/lib/test-helpers.js +0 -202
  166. package/lib/test-helpers.test.js +0 -255
  167. package/prototypes/2025-01-11-production-mode-autonomous.js +0 -119
  168. package/prototypes/2025-01-11-production-mode-collaborative.js +0 -166
  169. package/prototypes/2025-01-11-production-mode-guided.js +0 -217
  170. package/prototypes/2025-01-11-production-mode-smart-context.js +0 -347
  171. package/prototypes/2025-01-11-production-standards-example.md +0 -204
  172. package/prototypes/2025-11-10-backlog-filtering-tree-aware.js +0 -242
  173. package/prototypes/test/index.html +0 -1
  174. package/setup-dist-repo.sh +0 -68
  175. package/test-production-standards-engine.js +0 -130
  176. package/test-results.json +0 -2195
  177. package/test-safety-check.sh +0 -80
  178. package/work-item-tracking-plan.md +0 -199
  179. /package/{.jettypod/devpod.db → jettypod.db} +0 -0
@@ -1,853 +0,0 @@
1
- # Speed/Stable/Production Mode Skills Audit
2
-
3
- ## Executive Summary
4
-
5
- This document analyzes the three-phase development methodology implemented through the mode skills. The approach separates feature implementation into:
6
- - **Speed mode**: Implement ALL scoped functionality assuming happy path
7
- - **Stable mode**: Add error handling and validation for internal use
8
- - **Production mode**: Add security, compliance, scalability, and performance for external/production use
9
-
10
- This audit focuses on the speed and stable mode skills, as production mode is not yet implemented.
11
-
12
- ---
13
-
14
- ## 1. Opportunities to Refactor
15
-
16
- ### 1.1 Code Duplication
17
-
18
- **Issue**: Scenario parsing and file discovery logic is duplicated across both skills.
19
-
20
- **Example**: Both skills independently:
21
- - Parse Gherkin scenarios
22
- - Query the database for work items
23
- - Extract scenario requirements
24
- - Run BDD tests
25
-
26
- **Recommendation**: Extract shared functionality into common utilities:
27
- ```javascript
28
- // lib/scenario-helpers.js
29
- function parseScenarios(scenarioContent) { /* ... */ }
30
- function matchScenarioToChore(scenarios, choreDesc) { /* ... */ }
31
-
32
- // lib/test-helpers.js
33
- async function runBddTests(featureFile, options) { /* ... */ }
34
- function parseTestResults(stdout) { /* ... */ }
35
- ```
36
-
37
- ### 1.2 Inconsistent Error Handling Patterns
38
-
39
- **Issue**: Stable mode demonstrates comprehensive error handling in its *instructions* but speed mode doesn't model good error handling practices in its own implementation code.
40
-
41
- **Example**: Speed mode's scenario loading code (lines 44-64) has no error handling, while stable mode's equivalent (lines 80-158) is comprehensive.
42
-
43
- **Recommendation**:
44
- - Both skills should practice what they preach
45
- - Speed mode instruction code should still have basic error handling (even if the code it generates doesn't)
46
- - Create a clear separation between "skill execution code" and "code being generated"
47
-
48
- ### 1.3 File Discovery Brittleness
49
-
50
- **Issue**: Both skills rely on fragile approaches for finding relevant files:
51
- - Speed mode: Manual glob/grep patterns based on scenario keywords
52
- - Stable mode: Git history parsing with manual fallback
53
-
54
- **Example**: Stable mode (lines 260-321) has complex git log parsing that can fail in multiple ways:
55
- ```javascript
56
- const { stdout: gitLog } = await execPromise(
57
- `git log --oneline --all --grep="${featureName}" -10`
58
- );
59
- ```
60
-
61
- **Problems**:
62
- - Assumes feature name in commit messages
63
- - Limited to 10 commits
64
- - Breaks if commits aren't descriptive
65
- - No handling for squashed/rebased history
66
-
67
- **Recommendation**:
68
- - Store file associations in the work item database
69
- - Speed mode records files it creates/modifies
70
- - Stable mode reads from database first, git history second
71
- - Add a `work_item_files` table linking work items to affected files
72
-
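The "database first, git history second" lookup recommended above could be sketched as follows. This is a minimal illustration, not jettypod code: `findWorkItemFiles`, `dbLookup`, and `gitLookup` are hypothetical names, and an in-memory `Map` stands in for the proposed `work_item_files` table.

```javascript
// Sketch: look up a work item's files in the database first and use git
// history only as a fallback. `dbLookup` and `gitLookup` are hypothetical
// injected functions; neither is part of jettypod's API.
function findWorkItemFiles(workItemId, { dbLookup, gitLookup }) {
  const recorded = dbLookup(workItemId);
  if (recorded.length > 0) {
    return { source: 'database', files: recorded };
  }
  const inferred = gitLookup(workItemId);
  return { source: inferred.length > 0 ? 'git' : 'none', files: inferred };
}

// In-memory stand-in for a `work_item_files` table (assumed shape).
const workItemFiles = new Map([[42, ['lib/upload.js', 'features/upload.feature']]]);
const lookups = {
  dbLookup: (id) => workItemFiles.get(id) || [],
  gitLookup: () => ['lib/legacy.js'],
};

console.log(findWorkItemFiles(42, lookups).source);
console.log(findWorkItemFiles(7, lookups).source);
```

Returning the `source` alongside the file list lets the caller warn the user when the result was inferred from commit messages rather than recorded by speed mode.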
73
- ### 1.4 Test Execution Parsing
74
-
75
- **Issue**: Both skills parse test output with fragile string matching.
76
-
77
- **Example** (stable mode, line 472):
78
- ```javascript
79
- const targetScenarioPassed = stdout.includes('[scenario-title]') && stdout.includes('✓');
80
- ```
81
-
82
- **Problems**:
83
- - Breaks if test output format changes
84
- - Hardcoded symbols ('✓', '✗')
85
- - No structured output parsing
86
- - Difficult to extract specific failure details
87
-
88
- **Recommendation**:
89
- - Use structured test output (JSON reporters)
90
- - Parse Cucumber/Jest JSON output for deterministic results
91
- - Provide detailed failure context (line numbers, assertions, stack traces)
92
-
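As a rough sketch of the structured-output recommendation above, the check could read a Cucumber JSON report instead of searching stdout for a check mark. It assumes the report comes from Cucumber's JSON formatter (an array of features, each with `elements` holding scenarios and their `steps`). `findScenarioResult` and the sample report are hypothetical.

```javascript
// Sketch: decide whether a scenario passed from a Cucumber JSON report.
// Assumes the JSON formatter's layout: features -> elements -> steps,
// where each step carries a `result.status`.
function findScenarioResult(report, scenarioTitle) {
  for (const feature of report) {
    for (const scenario of feature.elements || []) {
      if (scenario.name !== scenarioTitle) continue;
      const steps = scenario.steps || [];
      // The first step that did not pass explains why the scenario failed.
      const failedStep = steps.find(
        (step) => !step.result || step.result.status !== 'passed'
      );
      return {
        found: true,
        passed: steps.length > 0 && !failedStep,
        failedStep: failedStep
          ? {
              name: `${failedStep.keyword}${failedStep.name}`,
              status: failedStep.result ? failedStep.result.status : 'unknown',
              message: failedStep.result ? failedStep.result.error_message : undefined,
            }
          : null,
      };
    }
  }
  return { found: false, passed: false, failedStep: null };
}

// Hypothetical sample report, trimmed to the fields used above.
const sampleReport = [
  {
    name: 'File upload',
    elements: [
      {
        name: 'User uploads a file',
        steps: [
          { keyword: 'Given ', name: 'I am on the upload page', result: { status: 'passed' } },
          { keyword: 'When ', name: 'I click upload', result: { status: 'failed', error_message: 'AssertionError: expected 200' } },
          { keyword: 'Then ', name: 'the file is uploaded', result: { status: 'skipped' } },
        ],
      },
    ],
  },
];

const uploadResult = findScenarioResult(sampleReport, 'User uploads a file');
console.log(uploadResult.passed, uploadResult.failedStep.name);
```

Because the result names the failing step and its error message, the iteration loop gets failure context without depending on the symbols a reporter prints.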
93
- ### 1.5 Iteration Logic Complexity
94
-
95
- **Issue**: Stable mode's test-fix iteration loop (lines 434-548) is monolithic and hard to reason about.
96
-
97
- **Problems**:
98
- - Mixes concerns: file editing, test running, result parsing, error handling
99
- - Difficult to test in isolation
100
- - Hard to add new iteration strategies
101
- - No clear separation of retry logic from business logic
102
-
103
- **Recommendation**:
104
- ```javascript
105
- // lib/iteration-engine.js
106
- class TestIterationEngine {
107
- constructor(maxIterations = 10, timeout = 60000) { /* ... */ }
108
-
109
- async iterate(implementFn, testFn, analyzeFn) {
110
- // Generic iteration logic
111
- }
112
-
113
- handleTestTimeout(error) { /* ... */ }
114
- handleTestFailure(results) { /* ... */ }
115
- handleMaxIterations() { /* ... */ }
116
- }
117
-
118
- // In stable-mode skill
119
- const engine = new TestIterationEngine();
120
- await engine.iterate(
121
- () => this.addErrorHandling(),
122
- () => this.runTests(),
123
- (results) => this.analyzeFailures(results)
124
- );
125
- ```
126
-
127
- ### 1.6 User Interaction Ambiguity
128
-
129
- **Issue**: Skills have unclear boundaries between "autonomous execution" and "requires confirmation".
130
-
131
- **Example**: Speed mode says "autonomous" but has multiple confirmation points:
132
- - Step 3 Phase 1: Wait for implementation approach confirmation
133
- - Step 4 Phase 2: Wait for stable chore creation confirmation
134
-
135
- **Recommendation**:
136
- - Define clear "decision gates" vs "execution phases"
137
- - Use a consistent pattern: `proposeAndConfirm(proposal)` → `execute(plan)`
138
- - Document *why* each gate exists (e.g., "high-level direction setting" vs "tactical execution")
139
-
140
- ### 1.7 Missing Rollback/Undo Mechanisms
141
-
142
- **Issue**: Neither skill provides rollback if implementation fails or user rejects.
143
-
144
- **Problems**:
145
- - No way to undo file changes if tests fail after max iterations
146
- - No git commit checkpointing
147
- - Hard to recover from partial failures
148
-
149
- **Recommendation**:
150
- ```javascript
151
- // lib/transaction-helpers.js
152
- class CodeTransaction {
153
- begin() {
154
- this.checkpoint = getCurrentGitCommit();
155
- this.changedFiles = [];
156
- }
157
-
158
- rollback() {
159
- // Restore files from checkpoint
160
- // Or create a revert commit
161
- }
162
-
163
- commit(message) {
164
- // Create git commit with changed files
165
- }
166
- }
167
- ```
168
-
169
- ### 1.8 Scenario Matching Heuristics
170
-
171
- **Issue**: Stable mode's scenario matching (lines 184-221) uses crude keyword matching.
172
-
173
- **Example**:
174
- ```javascript
175
- const keywords = choreDesc.split(/\s+/).filter(w => w.length > 3);
176
- const matches = keywords.filter(k => scenarioLower.includes(k));
177
- ```
178
-
179
- **Problems**:
180
- - False positives (common words like "user", "data")
181
- - Doesn't handle synonyms
182
- - No semantic understanding
183
- - No confidence scoring
184
-
185
- **Recommendation**:
186
- - Use explicit scenario references in chore descriptions: `[Scenario 2]`
187
- - Store scenario index in work item database
188
- - Provide fuzzy matching with confidence scores
189
- - Allow user to select from ranked list if ambiguous
190
-
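One possible shape for the matching recommended above is sketched below. It prefers an explicit `[Scenario N]` reference and otherwise ranks scenarios by keyword overlap. The `STOP_WORDS` list, the whole-word comparison, and the confidence formula (matched keywords divided by total keywords) are illustrative assumptions, not the skill's actual logic.

```javascript
// Sketch: resolve a chore description to a scenario.
// STOP_WORDS and the confidence formula are illustrative assumptions.
const STOP_WORDS = new Set(['user', 'data', 'with', 'when', 'then', 'that', 'from', 'handle']);

function rankScenarios(choreDesc, scenarios) {
  // 1. An explicit reference such as "[Scenario 2]" wins outright.
  const explicit = choreDesc.match(/\[Scenario (\d+)\]/i);
  if (explicit) {
    const index = Number(explicit[1]) - 1; // references are 1-based
    if (index >= 0 && index < scenarios.length) {
      return [{ index, title: scenarios[index], confidence: 1 }];
    }
  }

  // 2. Otherwise rank by whole-word keyword overlap, ignoring common words.
  const keywords = choreDesc
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 3 && !STOP_WORDS.has(word));
  if (keywords.length === 0) return [];

  return scenarios
    .map((title, index) => {
      const words = new Set(title.toLowerCase().split(/\W+/));
      const hits = keywords.filter((word) => words.has(word)).length;
      return { index, title, confidence: hits / keywords.length };
    })
    .filter((candidate) => candidate.confidence > 0)
    .sort((a, b) => b.confidence - a.confidence);
}

const scenarios = [
  'User uploads a file',
  'Upload fails when file exceeds size limit',
  'Upload fails for unsupported file type',
];

console.log(rankScenarios('Add validation [Scenario 2]', scenarios));
console.log(rankScenarios('Validate file size limit', scenarios));
```

When the top two candidates have similar confidence, the ranked list can be shown to the user instead of picking one silently.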
191
- ---
192
-
193
- ## 2. The Three-Phase Development Model
194
-
195
- ### 2.1 Overview
196
-
197
- The methodology separates feature development into three distinct phases, each with clear responsibilities:
198
-
199
- **Phase 1: Speed Mode**
200
- - **Goal**: Implement ALL scoped functionality
201
- - **Scope**: Every feature/function defined in discovery
202
- - **Philosophy**: Assume happy path - valid inputs, successful operations, correct types
203
- - **Excludes**: Error handling, validation, edge cases, performance optimization, security hardening
204
- - **Output**: Working feature that passes happy path BDD scenario
205
- - **Target**: Internal proof-of-concept, rapid prototyping
206
-
207
- **Phase 2: Stable Mode**
208
- - **Goal**: Add robustness for internal use
209
- - **Scope**: Error handling, input validation, edge case coverage
210
- - **Philosophy**: Build on speed implementation without re-implementing core features
211
- - **Includes**: Try/catch blocks, validation checks, clear error messages, graceful failures
212
- - **Excludes**: Security hardening, compliance, scalability, performance optimization
213
- - **Output**: Reliable feature for internal/team use
214
- - **Target**: Internal tools, staging environments, team testing
215
-
216
- **Phase 3: Production Mode** (not yet implemented in skills)
217
- - **Goal**: Make feature production-ready for external users
218
- - **Scope**: Security, compliance, scalability, performance
219
- - **Philosophy**: Harden stable implementation for real-world use
220
- - **Includes**: Authentication/authorization, rate limiting, audit logging, performance optimization, load testing, compliance requirements (GDPR, HIPAA, etc.)
221
- - **Output**: Production-grade feature
222
- - **Target**: External users, public-facing systems, regulated environments
223
-
224
- ### 2.2 Key Insight: Progressive Hardening
225
-
226
- The three-phase model recognizes that **internal use** and **external/production use** have fundamentally different requirements:
227
-
228
- - **Internal**: Developers know how to use the system, can handle rough edges, can debug issues
229
- - **Production**: Users expect polish, security, performance, and reliability
230
-
231
- Stable mode optimizes for developer productivity (get a working feature fast for internal use), while production mode optimizes for user experience and enterprise requirements.
232
-
233
- ---
234
-
235
- ## 3. Strengths of This Approach
236
-
237
- ### 3.1 Separation of Concerns
238
-
239
- **Strength**: Clean three-phase separation mirrors real-world development practices and aligns with internal vs external deployment targets.
240
-
241
- **Why it works**:
242
- - Reduces cognitive load (focus on one thing at a time)
243
- - Prevents premature optimization and over-engineering
244
- - Allows rapid prototyping without getting bogged down in edge cases
245
- - Natural checkpoints between major phases (speed → stable → production)
246
- - Aligns with deployment targets (internal → staging → production)
247
- - Defers expensive work (security audits, load testing) until actually needed
248
-
249
- **Evidence**: Speed mode explicitly states "NO error handling" (line 21), stable mode adds robustness (line 17), and production mode will add security/scalability.
250
-
251
- ### 3.2 BDD-Driven Development
252
-
253
- **Strength**: Using Gherkin scenarios as source of truth creates clear, testable requirements.
254
-
255
- **Benefits**:
256
- - Machine-readable specifications
257
- - Clear acceptance criteria
258
- - Automatic verification through step definitions
259
- - Living documentation that stays synchronized with code
260
-
261
- **Example**: Both skills rely on scenario files to drive what gets implemented, ensuring implementation matches requirements.
262
-
263
- ### 3.3 Autonomous Execution with Strategic Confirmation
264
-
265
- **Strength**: Skills ask for confirmation on *approach* but execute *implementation* autonomously.
266
-
267
- **Why it works**:
268
- - User validates high-level decisions (architecture, approach)
269
- - Claude handles tactical execution (code writing, iteration)
270
- - Balances control with efficiency
271
- - Appropriate for "users who may not know how to code" (line 26)
272
-
273
- **Example**: Speed mode Step 3 proposes implementation approach, waits for confirmation, then executes autonomously.
274
-
275
- ### 3.4 Progressive Complexity
276
-
277
- **Strength**: Intentional progression from simple to complex reduces failure points.
278
-
279
- **Benefits**:
280
- - Happy path implementation (speed) establishes foundation
281
- - Edge cases and errors (stable) build incrementally
282
- - Each phase has clear success criteria
283
- - Easier to debug (isolate whether issue is in core logic vs error handling)
284
-
285
- ### 3.5 Comprehensive Error Modeling
286
-
287
- **Strength**: Stable mode's error handling examples (lines 80-548) demonstrate thorough error scenarios.
288
-
289
- **Coverage includes**:
290
- - Database errors
291
- - File system errors
292
- - Missing data
293
- - Invalid input
294
- - Timeout errors
295
- - Git operation failures
296
- - Test execution failures
297
-
298
- **Why it works**: Demonstrates real-world failure modes, not just theoretical errors.
299
-
300
- ### 3.6 Iteration Limits and Failure Recovery
301
-
302
- **Strength**: Stable mode includes max iteration limits and graceful failure handling.
303
-
304
- **Example** (lines 434, 526-547):
305
- ```javascript
306
- const MAX_ITERATIONS = 10;
307
- // ...
308
- if (!scenarioPasses && iteration >= MAX_ITERATIONS) {
309
- console.error('Maximum iterations reached');
310
- console.log('Possible reasons:');
311
- // Provides actionable suggestions
312
- }
313
- ```
314
-
315
- **Why it works**:
316
- - Prevents infinite loops
317
- - Forces explicit failure handling
318
- - Provides debugging guidance
319
- - Asks user for direction when stuck
320
-
321
- ### 3.7 Work Decomposition Strategy
322
-
323
- **Strength**: Speed mode automatically generates stable mode chores based on implementation gaps.
324
-
325
- **Benefits**:
326
- - Ensures stable mode work isn't forgotten
327
- - Creates explicit work items for non-happy-path scenarios
328
- - Provides traceability (each chore maps to specific scenario)
329
- - Prevents "works on my machine" releases
330
-
331
- **Example** (speed mode lines 259-361): Analyzes implementation, proposes chores, creates them programmatically.
332
-
333
- ### 3.8 Clear Mental Models
334
-
335
- **Strength**: Both skills use consistent structure and terminology.
336
-
337
- **Patterns**:
338
- - Four-step progression: Analyze → Review → Implement → Verify
339
- - Autonomous vs confirmation phases clearly marked
340
- - Consistent emoji/formatting for status messages
341
- - Clear error vs success states
342
-
343
- **Why it works**: Reduces cognitive load, makes workflows predictable and learnable.
344
-
345
- ---
346
-
347
- ## 4. Criticisms
348
-
349
- ### 4.1 Architectural: Tight Coupling to BDD
350
-
351
- **Problem**: The entire workflow *requires* BDD scenarios to exist and be correct.
352
-
353
- **Failure modes**:
354
- - What if scenarios don't cover actual requirements?
355
- - What if step definitions are incorrect?
356
- - What if business needs change mid-implementation?
357
- - How to handle exploratory development?
358
-
359
- **Example**: Speed mode line 217 is the *only* verification: "Check if BDD scenario passes". If scenarios are wrong, you'll implement the wrong thing correctly.
360
-
361
- **Impact**:
362
- - Assumes scenarios are perfect specifications
363
- - No validation that scenarios match actual user needs
364
- - Difficult to adapt to changing requirements
365
- - Requires upfront BDD expertise
366
-
367
- ### 4.2 User Profile Assumption
368
-
369
- **Problem**: "User may not know how to code" (speed line 26, stable line 30) is baked into the design.
370
-
371
- **Implications**:
372
- - Limited ability to customize implementation
373
- - No escape hatch for code-savvy users
374
- - Difficult to intervene during execution
375
- - Black box execution model
376
-
377
- **Example**: User can't say "implement this specific way" during autonomous execution phases.
378
-
379
- **Better approach**:
380
- - Provide "expert mode" toggle
381
- - Allow code injection points
382
- - Support mid-execution adjustments
383
- - Provide "review-before-execute" option
384
-
385
- ### 4.3 Speed Mode Philosophy: Simplified vs Wrong Implementation
386
-
387
- **Problem**: "Assume everything works correctly" (speed line 20) risks creating implementations that need rewriting rather than enhancing.
388
-
389
- **Example**: Speed mode line 240 says "localStorage for data" as a pragmatic choice - but if stable mode needs a real database for internal use, you're rewriting, not enhancing.
390
-
391
- **Key distinction**:
392
- - **Simplified** (good): Real database with basic queries, no error handling → Stable adds try/catch and validation
393
- - **Wrong** (bad): localStorage → Stable mode must rewrite to use real database
394
-
395
- **Risk**: Speed mode might choose implementation approaches that can't be enhanced in stable mode, forcing rewrites.
396
-
397
- **Updated with three-phase context**:
398
- - Speed: Simplified real implementations (real DB, real APIs, real file upload)
399
- - Stable: Add error handling and validation for internal use
400
- - Production: Add security, performance, and scalability
401
-
402
- This keeps the "build on" philosophy intact across all phases. Stable doesn't re-implement core logic, and production doesn't re-implement stable's error handling - each phase truly enhances the previous.
403
-
404
- ### 4.4 Iteration Strategy: Blind Trial and Error
405
-
406
- **Problem**: Both skills iterate without learning from previous attempts.
407
-
408
- **Example** (stable mode lines 434-548): Iteration loop has no memory:
409
- - No tracking of what was already tried
410
- - No analysis of *why* previous iteration failed
411
- - No strategic adjustment of approach
412
-
413
- **Risk**: Could try the same fix 10 times, hitting max iterations without progress.
414
-
415
- **Better approach**:
416
- ```javascript
417
- class IterationEngine {
418
- constructor() {
419
- this.attemptHistory = [];
420
- }
421
-
422
- async iterate() {
423
- const failureAnalysis = this.analyzePatterns(this.attemptHistory);
424
- const nextStrategy = this.selectStrategy(failureAnalysis);
425
- // Execute with adjusted approach
426
- }
427
- }
428
- ```
429
-
430
- ### 4.5 Testing: Lack of Unit Test Strategy
431
-
432
- **Problem**: Both skills only verify via BDD integration tests.
433
-
434
- **Missing**:
435
- - Unit tests for individual functions
436
- - Mock/stub strategies
437
- - Fast feedback loops (BDD tests are slow)
438
- - Granular failure isolation
439
-
440
- **Impact**: When BDD test fails, hard to know *which* function is wrong.
441
-
442
- **Example**: Speed mode creates entire implementation then runs BDD test. If it fails, unclear which piece is broken.
443
-
444
- **Recommendation**: Generate unit tests alongside implementation:
445
- - Speed mode: Create basic unit tests for happy path
446
- - Stable mode: Add edge case unit tests
447
- - Run unit tests before BDD tests (faster feedback)
448
-
449
- ### 4.6 Error Messages: Inconsistent Guidance
450
-
451
- **Problem**: Error messages sometimes provide actionable guidance, sometimes don't.
452
-
453
- **Good example** (stable mode line 120):
454
- ```javascript
455
- console.error('❌ Feature has no scenario_file.');
456
- console.log('Suggestion: Create a scenario file and update the feature.');
457
- ```
458
-
459
- **Bad example** (stable mode line 503):
460
- ```javascript
461
- console.error('❌ Test execution error:', testErr.message);
462
- console.log('Retrying...');
463
- // No explanation of what to check or how to fix
464
- ```
465
-
466
- **Recommendation**: All error messages should follow pattern:
467
- 1. What failed
468
- 2. Why it might have failed
469
- 3. How to fix it
470
- 4. What the skill will do next
471
-
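The four-part pattern above could be enforced with a small helper, sketched here under assumptions: `formatSkillError` and its field names (`what`, `why`, `fix`, `next`) are hypothetical and do not exist in the skills.

```javascript
// Sketch: a helper that refuses to build an error report unless all four
// parts are present. `formatSkillError` and its fields are hypothetical.
function formatSkillError({ what, why, fix, next }) {
  const missing = Object.entries({ what, why, fix, next })
    .filter(([, value]) => typeof value !== 'string' || value.trim() === '')
    .map(([key]) => key);
  if (missing.length > 0) {
    throw new TypeError(`formatSkillError: missing field(s): ${missing.join(', ')}`);
  }
  return [
    `❌ ${what}`,               // 1. What failed
    `Possible cause: ${why}`,   // 2. Why it might have failed
    `Suggestion: ${fix}`,       // 3. How to fix it
    `Next: ${next}`,            // 4. What the skill will do next
  ].join('\n');
}

const retryMessage = formatSkillError({
  what: 'Test execution error: spawn ENOENT',
  why: 'The test runner binary may not be installed or not on PATH.',
  fix: 'Install project dependencies, then re-run the scenario.',
  next: 'Retrying once before asking for direction.',
});
console.error(retryMessage);
```

Throwing on a missing field turns an incomplete message such as a bare "Retrying..." into a failure at authoring time.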
472
- ### 4.7 Architectural Decisions: Underutilized
473
-
474
- **Problem**: Speed mode checks for architectural decisions (line 108) but doesn't explain how they're enforced.
475
-
476
- **Questions**:
477
- - What if implementation violates decisions?
478
- - Are decisions validated automatically?
479
- - Can decisions be overridden?
480
- - How are conflicts resolved?
481
-
482
- **Missing**: No validation that generated code adheres to architectural constraints.
483
-
484
- ### 4.8 Stable Mode Chore Creation: Manual Decomposition
485
-
486
- **Problem**: Speed mode asks the user to confirm chore proposals (line 318), but the proposals come from automated analysis that the user has no way to evaluate.
487
-
488
- **Risk**: User might not understand what chores are needed, leading to:
489
- - Missing edge cases
490
- - Duplicate chores
491
- - Insufficient coverage
492
- - Wrong granularity
493
-
494
- **Better approach**: Provide confidence scores and coverage analysis:
495
- ```
496
- Chore 1: Handle file upload errors [Confidence: High, Coverage: Error scenario 2]
497
- Chore 2: Validate file size limits [Confidence: Medium, Coverage: Edge scenario 3]
498
- Chore 3: Handle concurrent uploads [Confidence: Low, Coverage: Not in scenarios - inferred]
499
- ```
500
-
501
- ### 4.9 Performance Deferred to Production Mode
502
-
503
- **Clarification**: This is by design, not an oversight. Performance optimization is production mode's responsibility.
504
-
505
- **Three-phase division**:
506
- - **Speed**: Functional correctness on happy path
507
- - **Stable**: Reliability for internal use (error handling, validation)
508
- - **Production**: Performance, scalability, security for external use
509
-
510
- **Potential issue**: If speed mode makes poor algorithmic choices (O(n²) when O(n log n) is trivial), stable mode won't catch it because tests pass with small datasets.
511
-
512
- **Recommendation**: Add basic performance awareness to speed mode:
513
- - Don't choose obviously inefficient algorithms
514
- - Use reasonable data structures (Map vs Array for lookups)
515
- - Production mode adds: benchmarking, load testing, scalability analysis, resource optimization
516
-
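A minimal sketch of the "Map vs Array for lookups" point above. The data shapes and function names are hypothetical and only illustrate the difference; they are not taken from jettypod.

```javascript
// Hypothetical data: field names are illustrative only.
const users = [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Grace' },
  { id: 3, name: 'Linus' },
];
const orders = [
  { userId: 2, total: 30 },
  { userId: 1, total: 12 },
  { userId: 2, total: 5 },
];

// Quadratic: Array.prototype.find rescans `users` for every order.
function attachNamesSlow(orders, users) {
  return orders.map((o) => ({
    ...o,
    name: users.find((u) => u.id === o.userId)?.name,
  }));
}

// Linear: build a Map index once, then each lookup is a single get().
function attachNamesFast(orders, users) {
  const byId = new Map(users.map((u) => [u.id, u]));
  return orders.map((o) => ({ ...o, name: byId.get(o.userId)?.name }));
}
```

Both functions return the same result, so tests on small datasets cannot tell them apart; only the second stays cheap as the inputs grow.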
517
- ### 4.10 Unclear Handoff Between Modes
518
-
519
- **Problem**: No explicit verification that speed mode completion criteria are met before stable mode starts.
520
-
521
- **Questions**:
522
- - What if speed mode didn't finish?
523
- - What if happy path test is broken?
524
- - What if some features weren't implemented?
525
-
526
- **Missing**: Stable mode should verify speed mode completion:
527
- ```javascript
528
- // Verify happy path passes before adding error handling
529
- const happyPathResults = await runHappyPathScenario();
530
- if (!happyPathResults.passed) {
531
-   console.error('Cannot start stable mode - happy path broken');
532
-   console.log('Fix speed mode implementation first');
533
-   return;
534
- }
535
- ```
536
-
537
- ---
538
-
539
- ## 5. Failure Modes
540
-
541
- ### 5.1 Scenario Specification Failures
542
-
543
- **Failure Mode**: BDD scenarios are incomplete, ambiguous, or incorrect.
544
-
545
- **Example**:
546
- ```gherkin
547
- Scenario: User uploads a file
548
-   Given I am on the upload page
549
-   When I click upload
550
-   Then the file is uploaded
551
- ```
552
-
553
- **Problem**: Scenario doesn't specify:
554
- - What file? (size, type, name)
555
- - Where is it uploaded? (URL, storage location)
556
- - What confirmation? (UI feedback, status)
557
- - What happens next? (navigation, state change)
558
-
559
- **Impact**: Claude implements based on assumptions, which may be wrong.
560
-
561
- **Consequence**: Tests pass but feature doesn't meet actual requirements.
562
-
563
- **Mitigation**: Add scenario validation step before implementation.
564
-
565
- ### 5.2 Step Definition Failures
566
-
567
- **Failure Mode**: Step definitions don't correctly verify requirements.
568
-
569
- **Example**:
570
- ```javascript
571
- Then('the file is uploaded', function() {
572
-   // Just checks that function was called
573
-   expect(uploadFile).toHaveBeenCalled();
574
- });
575
- ```
576
-
577
- **Problem**: Doesn't verify file actually exists in storage, correct size, permissions, etc.
578
-
579
- **Impact**: Tests pass with broken implementation.
580
-
581
- **Consequence**: Bugs discovered in production, not testing.
582
-
583
- **Mitigation**: Generate step definitions with comprehensive assertions.
584
-
585
- ### 5.3 Codebase Pattern Misunderstanding
586
-
587
- **Failure Mode**: Skill misinterprets codebase patterns and implements inconsistently.
588
-
589
- **Example**:
590
- - Codebase uses Promises but skill generates callbacks
591
- - Codebase uses TypeScript but skill generates JavaScript
592
- - Codebase uses class components but skill generates hooks
593
-
594
- **Impact**: Code works but doesn't fit architectural patterns.
595
-
596
- **Consequence**: Code review rejection, refactoring required, technical debt.
597
-
598
- **Mitigation**: Add explicit pattern validation step.
599
-
600
- ### 5.4 Git History Pollution
601
-
602
- **Failure Mode**: Multiple iterations create messy commit history.
603
-
604
- **Example**: Stable mode iterates 8 times, creating 8 commits, all for the same chore.
605
-
606
- **Impact**: Unclear git history, difficult to review, hard to revert.
607
-
608
- **Consequence**: Poor code review experience, difficult debugging.
609
-
610
- **Mitigation**:
611
- - Use transaction pattern (single commit per chore)
612
- - Squash iteration commits automatically
613
- - Provide clean commit message generation
614
-
615
- ### 5.5 Test Timeout Cascades
616
-
617
- **Failure Mode**: One slow test causes timeout, leading to false failures.
618
-
619
- **Example** (stable mode line 500-509): Test times out, skill assumes implementation problem.
620
-
621
- **Actual cause**: Network latency, slow CI, resource contention.
622
-
623
- **Impact**: Skill iterates unnecessarily, wasting time.
624
-
625
- **Consequence**: Max iterations reached, chore marked as failed.
626
-
627
- **Mitigation**:
628
- - Distinguish timeout from failure
629
- - Retry timeouts with backoff
630
- - Provide manual override option
631
-
632
- ### 5.6 Circular Dependencies
633
-
634
- **Failure Mode**: Speed implementation creates circular dependencies that pass tests but break at runtime.
635
-
636
- **Example**:
637
- ```javascript
638
- // fileA.js
639
- const { processB } = require('./fileB');
640
-
641
- // fileB.js
642
- const { processA } = require('./fileA');
643
- ```
644
-
645
- **Impact**: BDD tests mock dependencies, so the circular dependency is not detected.
646
-
647
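- **Consequence**: Runtime failure in production. With CommonJS, whichever module loads second typically receives a partially initialized `exports` object, so the destructured function is `undefined` when called.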
648
-
649
- **Mitigation**: Add static analysis step to detect circular deps.
650
-
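One way the static analysis step could look, as a rough sketch: a depth-first search over a module dependency graph that reports the first cycle it finds. The graph is assumed to have been extracted already (for example by scanning `require` calls); `findCycle` and the graph shape are hypothetical names, not part of jettypod.

```javascript
// graph: { [moduleName]: string[] } mapping each module to the modules it requires.
// Returns the first cycle found as a path (e.g. ['a', 'b', 'a']), or null if none.
function findCycle(graph) {
  const state = new Map(); // 1 = on the current DFS path, 2 = fully explored
  const stack = [];

  function visit(node) {
    state.set(node, 1);
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      if (state.get(dep) === 1) {
        // dep is already on the current path: we have closed a loop.
        return [...stack.slice(stack.indexOf(dep)), dep];
      }
      if (!state.has(dep)) {
        const cycle = visit(dep);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, 2);
    return null;
  }

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}

// The two-file example above, expressed as a graph.
const example = {
  'fileA.js': ['fileB.js'],
  'fileB.js': ['fileA.js'],
};
```

Calling `findCycle(example)` returns the path `fileA.js`, `fileB.js`, `fileA.js`, which could be surfaced to the user before any test run.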
651
- ### 5.7 State Leakage Between Tests
652
-
653
- **Failure Mode**: Tests share state (localStorage, database, globals), causing flakiness.
654
-
655
- **Example**:
656
- - Test 1 sets `localStorage.user = "admin"`
657
- - Test 2 assumes anonymous user
658
- - Test 2 fails intermittently depending on test order
659
-
660
- **Impact**: Tests pass in isolation but fail in suite.
661
-
662
- **Consequence**: Skill assumes code is broken, iterates unnecessarily.
663
-
664
- **Mitigation**: Add test isolation verification, setup/teardown helpers.
665
-
666
- ### 5.8 External Dependency Failures
667
-
668
- **Failure Mode**: Tests depend on external services (APIs, databases) that are unavailable.
669
-
670
- **Example**: BDD scenario tests real API integration, API is down during test run.
671
-
672
- **Impact**: Tests fail, skill assumes implementation broken.
673
-
674
- **Consequence**: Skill iterates without making progress.
675
-
676
- **Mitigation**:
677
- - Distinguish external failures from implementation failures
678
- - Provide retry logic for external deps
679
- - Support mock/stub modes
680
-
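The triage suggested here, and in the test timeout section earlier, could be sketched roughly as follows. The function names, retry limits, and delays are assumptions chosen for illustration; the `code` values checked are standard Node.js system error codes.

```javascript
// Classify a test error so the skill does not treat every failure as a code bug.
function classifyFailure(err) {
  if (err.code === 'ETIMEDOUT' || /timed? ?out/i.test(err.message)) {
    return 'timeout';
  }
  if (['ECONNREFUSED', 'ENOTFOUND', 'ECONNRESET'].includes(err.code)) {
    return 'external';
  }
  return 'implementation';
}

// Decide what to do next. `attempt` is zero-based; limits are hypothetical defaults.
function nextAction(err, attempt, maxRetries = 3, baseDelayMs = 500) {
  const kind = classifyFailure(err);
  if (kind === 'implementation') {
    return { kind, action: 'iterate' }; // a real code problem: change the code
  }
  if (attempt >= maxRetries) {
    return { kind, action: 'ask-user' }; // manual override instead of burning iterations
  }
  // Exponential backoff: 500ms, 1000ms, 2000ms, ...
  return { kind, action: 'retry', delayMs: baseDelayMs * 2 ** attempt };
}
```

Only the `iterate` outcome should count against the skill's iteration budget; timeouts and external failures are retried or handed to the user without touching the implementation.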
681
- ### 5.9 Scenario Evolution During Development
682
-
683
- **Failure Mode**: Business requirements change mid-development, scenarios become outdated.
684
-
685
- **Example**:
686
- - Speed mode implements scenario v1
687
- - Stakeholder changes requirements
688
- - Stable mode implements scenario v2
689
- - Speed and stable implementations diverge
690
-
691
- **Impact**: Implementation inconsistency.
692
-
693
- **Consequence**: Neither version fully works.
694
-
695
- **Mitigation**: Version control scenarios, validate consistency across modes.
696
-
697
- ### 5.10 Memory Limitations with Large Codebases
698
-
699
- **Failure Mode**: Codebase analysis loads too many files, exceeding context limits.
700
-
701
- **Example**: Speed mode uses Grep to find patterns, matches 500 files, tries to read all.
702
-
703
- **Impact**: Context overflow, incomplete analysis.
704
-
705
- **Consequence**: Wrong files modified, incorrect integration points.
706
-
707
- **Mitigation**:
708
- - Implement smart filtering (relevance scoring)
709
- - Limit analysis scope (recent files, same directory)
710
- - Use incremental analysis
711
-
712
- ### 5.11 Ambiguous Error Recovery
713
-
714
- **Failure Mode**: Skill encounters error it wasn't designed to handle.
715
-
716
- **Example**: Database schema migration needed, but skill doesn't recognize it.
717
-
718
- **Error message**: "Cannot find column 'scenario_file'"
719
-
720
- **Skill behavior**: Retries same query multiple times, hits max iterations.
721
-
722
- **Consequence**: Fails without useful guidance.
723
-
724
- **Mitigation**: Add fallback to user consultation when stuck.
725
-
726
- ### 5.12 Overconfidence in Test Passing
727
-
728
- **Failure Mode**: Tests pass but code doesn't actually work as intended.
729
-
730
- **Example**:
731
- - BDD test mocks HTTP responses
732
- - Implementation has off-by-one error in real API call
733
- - Test passes because mock returns expected data
734
- - Real usage fails
735
-
736
- **Impact**: False confidence in implementation quality.
737
-
738
- **Consequence**: Bugs discovered in production.
739
-
740
- **Mitigation**: Require integration tests with real dependencies in separate validation step.
741
-
742
- ### 5.13 Skill Version Drift
743
-
744
- **Failure Mode**: Speed mode skill updated but stable mode skill not updated accordingly.
745
-
746
- **Example**:
747
- - Speed mode changes chore creation format
748
- - Stable mode expects old format
749
- - Handoff breaks
750
-
751
- **Impact**: Stable mode can't find/parse speed mode output.
752
-
753
- **Consequence**: Manual intervention required.
754
-
755
- **Mitigation**: Version skills, validate compatibility at handoff points.
756
-
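A compatibility check at the handoff point might look like the sketch below. The `formatVersion` and `chores` fields and the supported version numbers are hypothetical; the document does not describe the actual handoff format.

```javascript
// Hypothetical handoff contract: neither the field names nor the version
// numbers come from jettypod; they only illustrate the check.
const SUPPORTED_CHORE_FORMATS = new Set([2, 3]);

// Returns { ok, problems } so the caller can report every issue at once.
function validateHandoff(speedOutput) {
  const problems = [];
  if (!Number.isInteger(speedOutput?.formatVersion)) {
    problems.push('missing formatVersion');
  } else if (!SUPPORTED_CHORE_FORMATS.has(speedOutput.formatVersion)) {
    problems.push(`unsupported formatVersion ${speedOutput.formatVersion}`);
  }
  if (!Array.isArray(speedOutput?.chores)) {
    problems.push('chores must be an array');
  }
  return { ok: problems.length === 0, problems };
}
```

Running a check like this before stable mode starts turns a silent parse failure into an explicit message about which side is out of date.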
757
- ### 5.14 Human Misunderstanding of Autonomous Boundaries
758
-
759
- **Failure Mode**: User expects skill to stop for confirmation but it executes autonomously.
760
-
761
- **Example**: User expects to review each file change, but skill creates 10 files without pausing.
762
-
763
- **Impact**: User surprise, possible unwanted changes.
764
-
765
- **Consequence**: Manual rollback, lost trust.
766
-
767
- **Mitigation**: Make autonomous boundaries more explicit in UI, provide "confirm each step" mode.
768
-
769
- ### 5.15 Insufficient Context for Complex Scenarios
770
-
771
- **Failure Mode**: Scenario requires domain knowledge not captured in codebase or scenario file.
772
-
773
- **Example**:
774
- ```gherkin
775
- Scenario: Process payment with merchant account
776
-   Given a valid credit card
777
-   When I submit payment
778
-   Then funds are transferred to merchant account
779
- ```
780
-
781
- **Missing context**:
782
- - Which payment gateway?
783
- - What merchant ID?
784
- - What API credentials?
785
- - Staging vs production?
786
-
787
- **Impact**: Skill implements with assumptions.
788
-
789
- **Consequence**: Implementation doesn't work with real merchant account.
790
-
791
- **Mitigation**: Require configuration validation before implementation.
792
-
793
- ---
794
-
795
- ## 6. Recommendations Summary
796
-
797
- ### High Priority
798
- 1. **Extract shared utilities** to reduce duplication
799
- 2. **Implement file tracking database** to replace fragile git parsing
800
- 3. **Add rollback/transaction support** for failed implementations
801
- 4. **Validate scenario quality** before implementation
802
- 5. **Distinguish external failures** from implementation failures
803
-
804
- ### Medium Priority
805
- 6. **Add unit testing generation** alongside BDD tests
806
- 7. **Improve iteration intelligence** with attempt history analysis
807
- 8. **Standardize error message format** across all failures
808
- 9. **Version control skill dependencies** to prevent drift
809
- 10. **Add performance scenarios** to BDD requirements
810
-
811
- ### Low Priority
812
- 11. **Provide expert mode** for code-savvy users
813
- 12. **Add static analysis** for circular deps, code smells
814
- 13. **Implement confidence scoring** for scenario matching
815
- 14. **Add coverage analysis** for chore proposals
816
- 15. **Create debugging guides** for common failure modes
817
-
818
- ---
819
-
820
- ## Conclusion
821
-
822
- The three-phase development model (speed → stable → production) is fundamentally sound and addresses real challenges in software development. The progressive hardening approach mirrors industry best practices:
823
-
824
- - **Speed**: Make it work (internal proof-of-concept)
825
- - **Stable**: Make it reliable (internal use with error handling)
826
- - **Production**: Make it production-ready (external use with security, performance, compliance)
827
-
828
- This separation aligns with deployment targets and defers expensive work (security audits, load testing) until actually needed for production.
829
-
830
- **Key strengths**:
831
- - Clean separation of concerns reduces cognitive load
832
- - BDD-driven approach ensures testable requirements
833
- - Progressive complexity prevents premature optimization
834
- - Autonomous execution appropriate for non-technical users
835
-
836
- **Significant brittleness**:
837
- - BDD scenario quality assumptions (garbage in, garbage out)
838
- - File discovery mechanisms (fragile git parsing)
839
- - Iteration strategies (blind trial and error)
840
- - Error recovery (no rollback, limited learning)
841
-
842
- **Recommendations**:
843
- - More defensive programming in skill execution code
844
- - Better failure mode handling and recovery
845
- - Clearer boundaries between autonomous and confirmed actions
846
- - Smarter iteration with learning from previous attempts
847
- - Shared utilities to reduce duplication
848
- - File tracking database to replace git parsing
849
-
850
- **Three-phase insight**:
851
- The division between stable (internal use) and production (external use) is particularly valuable. It acknowledges that internal tools can have rough edges - developers can handle them. Production must be hardened for real users. This prevents over-engineering internal tools while ensuring production systems are properly secured and optimized.
852
-
853
- Despite criticisms, the approach is innovative and addresses a real need for structured, test-driven development that works for non-technical users. With refinements around robustness and failure recovery, this methodology could be highly effective for teams building both internal tools and production systems.