prizmkit 1.1.1 → 1.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (99)
  1. package/bundled/VERSION.json +3 -3
  2. package/bundled/adapters/claude/agent-adapter.js +18 -0
  3. package/bundled/adapters/claude/command-adapter.js +1 -27
  4. package/bundled/agents/prizm-dev-team-critic.md +2 -0
  5. package/bundled/agents/prizm-dev-team-dev.md +2 -0
  6. package/bundled/agents/prizm-dev-team-reviewer.md +2 -0
  7. package/bundled/dev-pipeline/README.md +63 -63
  8. package/bundled/dev-pipeline/assets/feature-list-example.json +1 -1
  9. package/bundled/dev-pipeline/assets/prizm-dev-team-integration.md +1 -1
  10. package/bundled/dev-pipeline/{launch-daemon.sh → launch-feature-daemon.sh} +33 -33
  11. package/bundled/dev-pipeline/launch-refactor-daemon.sh +454 -0
  12. package/bundled/dev-pipeline/lib/branch.sh +1 -1
  13. package/bundled/dev-pipeline/reset-feature.sh +3 -3
  14. package/bundled/dev-pipeline/reset-refactor.sh +312 -0
  15. package/bundled/dev-pipeline/{retry-bug.sh → retry-bugfix.sh} +47 -59
  16. package/bundled/dev-pipeline/retry-feature.sh +41 -54
  17. package/bundled/dev-pipeline/retry-refactor.sh +358 -0
  18. package/bundled/dev-pipeline/run-bugfix.sh +6 -0
  19. package/bundled/dev-pipeline/{run.sh → run-feature.sh} +31 -31
  20. package/bundled/dev-pipeline/run-refactor.sh +787 -0
  21. package/bundled/dev-pipeline/scripts/generate-bootstrap-prompt.py +177 -10
  22. package/bundled/dev-pipeline/scripts/generate-refactor-prompt.py +419 -0
  23. package/bundled/dev-pipeline/scripts/init-refactor-pipeline.py +393 -0
  24. package/bundled/dev-pipeline/scripts/update-refactor-status.py +726 -0
  25. package/bundled/dev-pipeline/templates/agent-prompts/critic-code-challenge.md +13 -0
  26. package/bundled/dev-pipeline/templates/agent-prompts/critic-plan-challenge.md +7 -0
  27. package/bundled/dev-pipeline/templates/agent-prompts/dev-fix.md +7 -0
  28. package/bundled/dev-pipeline/templates/agent-prompts/dev-implement.md +26 -0
  29. package/bundled/dev-pipeline/templates/agent-prompts/dev-resume.md +5 -0
  30. package/bundled/dev-pipeline/templates/agent-prompts/reviewer-analyze.md +5 -0
  31. package/bundled/dev-pipeline/templates/agent-prompts/reviewer-review.md +12 -0
  32. package/bundled/dev-pipeline/templates/bootstrap-tier1.md +29 -2
  33. package/bundled/dev-pipeline/templates/bootstrap-tier2.md +8 -7
  34. package/bundled/dev-pipeline/templates/bootstrap-tier3.md +11 -10
  35. package/bundled/dev-pipeline/templates/bugfix-bootstrap-prompt.md +2 -3
  36. package/bundled/dev-pipeline/templates/feature-list-schema.json +1 -1
  37. package/bundled/dev-pipeline/templates/refactor-list-schema.json +159 -0
  38. package/bundled/dev-pipeline/templates/sections/ac-verification-checklist.md +13 -0
  39. package/bundled/dev-pipeline/templates/sections/feature-context.md +1 -1
  40. package/bundled/dev-pipeline/templates/sections/phase-analyze-agent.md +9 -8
  41. package/bundled/dev-pipeline/templates/sections/phase-analyze-full.md +9 -8
  42. package/bundled/dev-pipeline/templates/sections/phase-browser-verification.md +2 -1
  43. package/bundled/dev-pipeline/templates/sections/phase-critic-code.md +8 -10
  44. package/bundled/dev-pipeline/templates/sections/phase-critic-plan-full.md +9 -10
  45. package/bundled/dev-pipeline/templates/sections/phase-critic-plan.md +8 -9
  46. package/bundled/dev-pipeline/templates/sections/phase-implement-agent.md +7 -10
  47. package/bundled/dev-pipeline/templates/sections/phase-implement-full.md +8 -15
  48. package/bundled/dev-pipeline/templates/sections/phase-review-agent.md +7 -12
  49. package/bundled/dev-pipeline/templates/sections/phase-review-full.md +8 -19
  50. package/bundled/dev-pipeline/templates/sections/test-failure-recovery.md +75 -0
  51. package/bundled/skills/_metadata.json +33 -6
  52. package/bundled/skills/app-planner/SKILL.md +105 -320
  53. package/bundled/skills/app-planner/assets/app-design-guide.md +101 -0
  54. package/bundled/skills/app-planner/references/frontend-design-guide.md +1 -1
  55. package/bundled/skills/app-planner/references/project-brief-guide.md +49 -80
  56. package/bundled/skills/bug-fix-workflow/SKILL.md +2 -2
  57. package/bundled/skills/bug-planner/SKILL.md +68 -5
  58. package/bundled/skills/bug-planner/scripts/validate-bug-list.py +3 -2
  59. package/bundled/skills/bugfix-pipeline-launcher/SKILL.md +19 -5
  60. package/bundled/skills/{dev-pipeline-launcher → feature-pipeline-launcher}/SKILL.md +32 -32
  61. package/bundled/skills/feature-planner/SKILL.md +337 -0
  62. package/bundled/skills/{app-planner → feature-planner}/assets/evaluation-guide.md +4 -4
  63. package/bundled/skills/{app-planner → feature-planner}/assets/planning-guide.md +3 -171
  64. package/bundled/skills/{app-planner → feature-planner}/references/browser-interaction.md +6 -5
  65. package/bundled/skills/feature-planner/references/decomposition-patterns.md +75 -0
  66. package/bundled/skills/{app-planner → feature-planner}/references/error-recovery.md +8 -8
  67. package/bundled/skills/{app-planner → feature-planner}/references/incremental-feature-planning.md +1 -1
  68. package/bundled/skills/{app-planner/references/new-app-planning.md → feature-planner/references/new-project-planning.md} +1 -1
  69. package/bundled/skills/{app-planner → feature-planner}/scripts/validate-and-generate.py +4 -4
  70. package/bundled/skills/feature-workflow/SKILL.md +23 -23
  71. package/bundled/skills/prizm-kit/SKILL.md +1 -3
  72. package/bundled/skills/prizmkit-analyze/SKILL.md +2 -5
  73. package/bundled/skills/prizmkit-code-review/SKILL.md +2 -2
  74. package/bundled/skills/prizmkit-committer/SKILL.md +4 -8
  75. package/bundled/skills/prizmkit-deploy/SKILL.md +1 -5
  76. package/bundled/skills/prizmkit-implement/SKILL.md +3 -50
  77. package/bundled/skills/prizmkit-init/SKILL.md +5 -77
  78. package/bundled/skills/prizmkit-plan/SKILL.md +1 -12
  79. package/bundled/skills/prizmkit-prizm-docs/SKILL.md +6 -24
  80. package/bundled/skills/prizmkit-prizm-docs/assets/PRIZM-SPEC.md +21 -0
  81. package/bundled/skills/prizmkit-retrospective/SKILL.md +12 -117
  82. package/bundled/skills/recovery-workflow/SKILL.md +166 -316
  83. package/bundled/skills/recovery-workflow/evals/evals.json +29 -13
  84. package/bundled/skills/recovery-workflow/scripts/detect-recovery-state.py +232 -274
  85. package/bundled/skills/refactor-pipeline-launcher/SKILL.md +352 -0
  86. package/bundled/skills/refactor-planner/SKILL.md +436 -0
  87. package/bundled/skills/refactor-planner/assets/planning-guide.md +292 -0
  88. package/bundled/skills/refactor-planner/references/behavior-preservation.md +301 -0
  89. package/bundled/skills/refactor-planner/references/refactor-scoping-guide.md +221 -0
  90. package/bundled/skills/refactor-planner/scripts/validate-and-generate-refactor.py +786 -0
  91. package/bundled/skills/refactor-workflow/SKILL.md +299 -319
  92. package/package.json +1 -1
  93. package/src/clean.js +3 -3
  94. package/src/scaffold.js +6 -6
  95. package/bundled/skills/prizmkit-plan/assets/spec-template.md +0 -56
  96. package/bundled/skills/prizmkit-plan/references/clarify-guide.md +0 -67
  97. package/src/config.js +0 -504
  98. package/src/prompts.js +0 -210
  99. /package/bundled/skills/{dev-pipeline-launcher → feature-pipeline-launcher}/scripts/preflight-check.py +0 -0
@@ -0,0 +1,292 @@
# Refactor Planning Reference Guide

This guide provides structured patterns, decision matrices, and templates for decomposing refactoring goals into well-scoped, executable items. It is intended as a practical reference for the AI during interactive refactor planning sessions.

---

## 1. Identifying Refactoring Boundaries

Refactoring boundaries define where one refactor item ends and another begins. Good boundaries produce items that are independently executable and independently verifiable.

### Boundary Heuristics

| Signal | Boundary Type | Example |
|--------|--------------|---------|
| Different files/modules | Module boundary | "Extract auth logic" vs "Extract validation logic" |
| Different refactoring operations | Operation boundary | "Rename function" vs "Extract class" |
| Different risk levels | Risk boundary | "Safe rename" vs "Restructure module internals" |
| Different test suites affected | Test boundary | "Changes unit tests only" vs "Changes integration tests" |
| Sequential dependency | Dependency boundary | "Rename X" must complete before "Move X to new module" |

### Rules for Setting Boundaries

1. **One operation type per item.** Don't mix a rename with a structural extraction in the same item.
2. **One module scope per item** (unless the refactoring specifically targets cross-module concerns like decoupling).
3. **Each item should be independently testable.** After completing item R-001, all tests should pass before starting R-002.
4. **If an item requires more than 3 files to change simultaneously**, consider splitting it.
5. **If behavior preservation requires different strategies for different parts**, split into separate items with appropriate strategies.

---

## 2. Description Writing Guide

Refactor item descriptions are the primary input for autonomous pipeline sessions. A thin description forces the AI to guess about scope and safety constraints.

### Minimum Word Counts

| Complexity | Minimum Words | Warning Threshold |
|------------|---------------|-------------------|
| low | 15 | 30 |
| medium | 15 | 50 |
| high | 15 | 80 |

A description with fewer than 15 words is a validation error. A description that meets the minimum but falls below the warning threshold for its complexity triggers a warning.
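
This is a minimal sketch of how these thresholds could be checked. It assumes a word is any whitespace-separated token and that an item is a plain object with hypothetical `description` and `complexity` fields. The bundled validator may count words or structure items differently.

```javascript
// Thresholds from the Minimum Word Counts table.
const MIN_WORDS = 15;
const WARN_THRESHOLDS = { low: 30, medium: 50, high: 80 };

// Assumption: a "word" is any whitespace-separated token.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns { level, words }, where level is "error", "warning", or "ok".
function checkDescription(item) {
  const words = countWords(item.description);
  if (words < MIN_WORDS) return { level: "error", words };

  const threshold = WARN_THRESHOLDS[item.complexity];
  if (threshold === undefined) {
    throw new Error(`Unknown complexity: ${item.complexity}`);
  }
  return { level: words < threshold ? "warning" : "ok", words };
}

console.log(
  checkDescription({
    complexity: "low",
    description: "Extract the validation logic from the handler into a separate module.",
  }),
);
// → { level: 'error', words: 11 }
```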

### What to Include

Every refactor description should cover:

1. **What to change** — specific files, functions, classes, or patterns being refactored
2. **How to change it** — the refactoring operation (extract, rename, move, inline, simplify)
3. **Why** — the motivation (reduce complexity, improve testability, remove duplication)
4. **Constraints** — what must NOT change (public API, behavior contracts, external interfaces)
5. **Verification** — how to confirm the refactoring succeeded without breaking behavior

### Good vs Bad Examples

**Bad** (11 words, too thin):
```
"Extract the validation logic from the handler into a separate module."
```

**Good** (55 words, implementation-ready):
```
"Extract all input validation functions from src/api/handler.js (validateEmail, validatePassword, validateUsername) into a new src/utils/validators.js module. Update all imports in handler.js and any other files that import these functions directly. Preserve the exact function signatures and return types. The handler.js file should import from the new location. All existing tests must continue to pass without modification."
```

**Bad** (10 words, too thin):
```
"Convert the user service from callbacks to async/await pattern throughout."
```

**Good** (67 words, implementation-ready):
```
"Convert src/services/user-service.js from callback-based functions to async/await. Target functions: createUser, findUserById, updateUser, deleteUser (4 functions total). Each function currently accepts a callback as the last parameter and calls it with (err, result). Convert to return Promises and use async/await internally. Update all callers in src/routes/user-routes.js to use await instead of passing callbacks. Preserve all error handling behavior — errors that were passed to callbacks should now be thrown."
```

---

## 3. Common Refactoring Patterns

Use these patterns as starting points when decomposing refactoring goals.

### Pattern A: Extract Method/Function

**When**: A function is too long, has multiple responsibilities, or contains duplicated logic.

```
R-001: Extract [specific logic] from [source function] into [new function name]
  Type: extract
  Scope: [source file]
  Complexity: low
  Preservation: test-gate
```

### Pattern B: Extract Class/Module

**When**: A file/class is too large, a group of functions share a common concern, or a module has multiple responsibilities.

```
R-001: Create new module [name] with extracted [concern] logic
  Type: extract
  Scope: [source file, new file]
  Complexity: medium
  Preservation: test-gate

R-002: Update imports to use new [name] module (deps: R-001)
  Type: restructure
  Scope: [all importing files]
  Complexity: low
  Preservation: test-gate
```

### Pattern C: Move Module/File

**When**: A file is in the wrong directory, module organization needs restructuring.

```
R-001: Move [file] from [old path] to [new path]
  Type: restructure
  Scope: [file, all importers]
  Complexity: low-medium (depends on import count)
  Preservation: test-gate
```

### Pattern D: Inline (Reverse of Extract)

**When**: An abstraction is unnecessary, a wrapper adds no value, or indirection hurts readability.

```
R-001: Inline [function/module] into [target]
  Type: simplify
  Scope: [source file, target file]
  Complexity: low
  Preservation: test-gate
```

### Pattern E: Rename (Variable, Function, Class, File)

**When**: Names are misleading, inconsistent, or don't follow conventions.

```
R-001: Rename [old name] to [new name] across codebase
  Type: rename
  Scope: [all files containing the name]
  Complexity: low
  Preservation: test-gate
```

### Pattern F: Decouple Dependencies

**When**: Circular dependencies, tight coupling between modules, or difficulty testing in isolation.

```
R-001: Define interface/contract for [dependency] (deps: none)
  Type: decouple
  Scope: [new interface file]
  Complexity: medium
  Preservation: test-gate

R-002: Implement [dependency] behind new interface (deps: R-001)
  Type: decouple
  Scope: [implementation file]
  Complexity: medium
  Preservation: test-gate

R-003: Update consumers to use interface instead of concrete (deps: R-002)
  Type: decouple
  Scope: [all consumer files]
  Complexity: medium
  Preservation: test-gate
```

### Pattern G: Architecture Migration

**When**: Converting between paradigms (callbacks to promises, classes to functions, monolith to modules).

```
R-001: Add new [pattern] alongside old [pattern] (deps: none)
  Type: migrate
  Scope: [target files]
  Complexity: medium-high
  Preservation: test-gate or snapshot

R-002: Migrate [specific area] to new pattern (deps: R-001)
  Type: migrate
  Scope: [area files]
  Complexity: medium
  Preservation: test-gate

R-003: Remove old [pattern] code (deps: R-002)
  Type: simplify
  Scope: [cleaned files]
  Complexity: low
  Preservation: test-gate
```
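
In a Node.js codebase, the first Pattern G step (add the new pattern alongside the old) can be as small as wrapping a callback API with the built-in `util.promisify`. Existing callers keep working while new code uses `await`. This is a minimal sketch; `findUserById` and its data are hypothetical.

```javascript
const util = require("util");

// Hypothetical legacy callback API: calls back with (err, result).
function findUserById(id, callback) {
  setImmediate(() => {
    if (id === 1) callback(null, { id: 1, name: "Ada" });
    else callback(new Error(`User ${id} not found`));
  });
}

// R-001: add a promise-based variant next to the old one.
// Callers that still pass callbacks are unaffected.
const findUserByIdAsync = util.promisify(findUserById);

// R-002 would migrate callers to the new pattern, like this one.
async function printUser(id) {
  try {
    const user = await findUserByIdAsync(id);
    console.log(user.name);
  } catch (err) {
    // Errors previously passed as `err` now arrive as rejections.
    console.log(`error: ${err.message}`);
  }
}

printUser(1); // prints "Ada"
printUser(2); // prints "error: User 2 not found"
```

Once every caller uses the promise form, the R-003 step deletes the callback version.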

---

## 4. Dependency Ordering Rules

Correct ordering minimizes risk and ensures each step is independently verifiable.

### Ordering Priority (execute in this order)

1. **Safe renames** — Lowest risk. Pure name changes with no structural impact. Can be reverted trivially.
2. **Extract/inline** — Moderate risk. Changes module boundaries but doesn't reorganize architecture.
3. **Structural changes** — Higher risk. Reorganizes file layout, module hierarchy, or dependency graph.
4. **Migrations** — Highest risk. Changes programming patterns or paradigms.

### Dependency Rules

1. **No circular dependencies.** Dependencies MUST form a directed acyclic graph (DAG).
2. **Minimal dependency sets.** Each item should depend only on items it directly needs.
3. **Rename before restructure.** If you're renaming something AND moving it, rename first (easier to track).
4. **Create before consume.** If item A creates a new module and item B uses it, B depends on A.
5. **Interface before implementation.** If decoupling, define the interface before implementing behind it.
6. **Preserve before remove.** If migrating, ensure new code works before removing old code.

### Validation Checklist

- [ ] No item depends on itself
- [ ] No circular dependency chains exist
- [ ] Every item ID referenced in a dependency list is defined in the plan
- [ ] The graph can be topologically sorted
- [ ] Renames appear before structural changes that reference the renamed entities
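
The first four checks can be automated with Kahn's algorithm: repeatedly schedule items whose dependencies are all scheduled. Any item that is never freed sits on a cycle or depends on one. This is a minimal sketch that assumes each item is a hypothetical `{ id, dependencies }` object. The bundled validator may perform these checks differently.

```javascript
// Checks a plan's dependency graph and returns one valid execution order.
// Assumed (hypothetical) item shape: { id: "R-002", dependencies: ["R-001"] }.
function planOrder(items) {
  const ids = new Set(items.map((item) => item.id));
  const unmet = new Map(); // id -> count of dependencies not yet scheduled
  const dependents = new Map(); // id -> ids that wait on it

  for (const item of items) {
    for (const dep of item.dependencies) {
      if (dep === item.id) throw new Error(`${item.id} depends on itself`);
      if (!ids.has(dep)) throw new Error(`${item.id} depends on undefined item ${dep}`);
      if (!dependents.has(dep)) dependents.set(dep, []);
      dependents.get(dep).push(item.id);
    }
    unmet.set(item.id, item.dependencies.length);
  }

  // Kahn's algorithm: start with items that have no dependencies.
  const ready = items.filter((item) => item.dependencies.length === 0).map((item) => item.id);
  const order = [];
  while (ready.length > 0) {
    const id = ready.shift();
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      unmet.set(next, unmet.get(next) - 1);
      if (unmet.get(next) === 0) ready.push(next);
    }
  }

  // Anything left unscheduled is on a cycle or depends on one.
  if (order.length !== items.length) {
    const blocked = items.map((item) => item.id).filter((id) => !order.includes(id));
    throw new Error(`Circular dependency involving: ${blocked.join(", ")}`);
  }
  return order;
}

console.log(
  planOrder([
    { id: "R-003", dependencies: ["R-002"] },
    { id: "R-001", dependencies: [] },
    { id: "R-002", dependencies: ["R-001"] },
  ]),
);
// → [ 'R-001', 'R-002', 'R-003' ]
```

Every order this function returns satisfies "create before consume". It does not apply the ordering priority (renames first, migrations last) as a tie-breaker, and it does not check the last checklist item.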

---

## 5. Acceptance Criteria for Refactoring

Refactoring acceptance criteria focus on structural improvement AND behavior preservation. Feature acceptance criteria describe new behavior. Refactoring criteria instead confirm that the structure improved while existing behavior stayed the same.

### Standard Criteria Templates

**For extract operations:**
- [ ] New module/function exists at [target path]
- [ ] Original location imports from new location (no duplication)
- [ ] All existing tests pass without modification
- [ ] No new circular dependencies introduced

**For rename operations:**
- [ ] Old name does not appear anywhere in codebase (except git history)
- [ ] All references updated to new name
- [ ] All existing tests pass without modification

**For restructure operations:**
- [ ] Files are in their new locations
- [ ] All import paths updated
- [ ] Module boundary is clean (no reaching into internal paths)
- [ ] All existing tests pass without modification

**For decouple operations:**
- [ ] Interface/contract defined and documented
- [ ] Implementation satisfies interface
- [ ] Consumers depend on interface, not implementation
- [ ] No circular dependencies remain
- [ ] All existing tests pass without modification

**For migrate operations:**
- [ ] New pattern is used in target area
- [ ] Old pattern code is removed (no dead code)
- [ ] All existing tests pass (may need test updates to use new pattern)
- [ ] Behavior is identical (verified via test-gate or snapshot)

### Writing Principles

1. **Always include "all existing tests pass"** — this is the fundamental refactoring invariant.
2. **Be specific about structural outcomes** — "files are organized by feature" is vague; "auth files are in src/features/auth/" is concrete.
3. **Include negative criteria** — "no circular dependencies", "no dead code", "no duplicated logic".
4. **Keep count manageable** — 3-5 criteria per item. More than 6 suggests the item should be split.

---

## 6. Complexity Estimation for Refactoring

| Factor | Low | Medium | High |
|--------|-----|--------|------|
| File count | 1-2 files | 3-5 files | 6+ files |
| Cross-module scope | Same module | 2 modules | 3+ modules |
| Test coverage | High (>80%) | Moderate (40-80%) | Low (<40%) |
| Pattern familiarity | Well-known (rename, extract) | Common (restructure) | Novel (custom migration) |
| Dependency changes | None | Minor (1-2 imports) | Significant (module graph changes) |

**Rule**: Take the highest individual factor as the overall complexity. When in doubt, estimate higher.
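
The rule amounts to taking a maximum over ordered levels. This sketch assumes each factor in the table has already been rated `low`, `medium`, or `high`. The factor keys are hypothetical names for the table rows, and the "when in doubt, estimate higher" judgment stays with the planner.

```javascript
const LEVELS = ["low", "medium", "high"];

// Overall complexity is the highest rating among the individual factors.
function overallComplexity(factors) {
  let highest = 0;
  for (const [factor, level] of Object.entries(factors)) {
    const index = LEVELS.indexOf(level);
    if (index === -1) throw new Error(`Invalid rating for ${factor}: ${level}`);
    highest = Math.max(highest, index);
  }
  return LEVELS[highest];
}

console.log(
  overallComplexity({
    fileCount: "low", // 2 files
    crossModuleScope: "low", // same module
    testCoverage: "medium", // moderate coverage (40-80%)
    patternFamiliarity: "low", // plain extract
    dependencyChanges: "low", // no import changes
  }),
);
// → medium
```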

### Complexity Red Flags (Consider Splitting)

- Item touches more than 5 files
- Item requires changes to both test files and source files in non-trivial ways
- Item involves both structural change AND pattern migration
- Item has more than 6 acceptance criteria
- Item's description exceeds 100 words (suggests multiple operations combined)
@@ -0,0 +1,301 @@
# Behavior Preservation Guide

This guide covers strategies for ensuring that refactoring changes structure without changing behavior. Every refactor item must declare a behavior preservation strategy.

---

## 1. Preservation Strategies

### Strategy: test-gate

**Definition**: Run the full test suite after each refactoring change. All previously-passing tests must continue to pass.

**How it works**:
1. Run the full test suite before starting the refactor item (establish baseline)
2. Implement the refactoring change
3. Run the full test suite again
4. Compare: all tests that passed before must still pass
5. If any previously-passing test now fails, revert the change, investigate, and retry
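
Steps 4 and 5 reduce to a set comparison between two runs. This sketch assumes the results have already been parsed from your test runner into hypothetical `Map`s of test name to `"pass"` or `"fail"`. Tests that were already failing in the baseline do not count against the change. A previously-passing test that is missing afterwards does count, since it may have been deleted or skipped.

```javascript
// Returns tests that passed in the baseline but do not pass after the change.
// A test missing from the "after" run counts too: it may have been deleted or skipped.
function findRegressions(before, after) {
  const regressions = [];
  for (const [name, status] of before) {
    if (status === "pass" && after.get(name) !== "pass") regressions.push(name);
  }
  return regressions;
}

const baseline = new Map([
  ["validateEmail accepts a valid address", "pass"],
  ["validatePassword rejects short input", "pass"],
  ["legacy export format", "fail"], // already failing: not blamed on the change
]);
const afterChange = new Map([
  ["validateEmail accepts a valid address", "pass"],
  ["validatePassword rejects short input", "fail"],
  ["legacy export format", "fail"],
]);

const regressions = findRegressions(baseline, afterChange);
if (regressions.length > 0) {
  // Step 5: revert, investigate, and retry.
  console.log(`Revert: ${regressions.join(", ")}`);
}
// → Revert: validatePassword rejects short input
```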

**When to use**:
- Target area has good test coverage (>60%)
- Tests are reliable (no flaky tests in the target area)
- Test suite runs in reasonable time (<5 minutes for the relevant subset)
- Tests cover the behavior contracts you need to preserve

**Strengths**:
- Most reliable automated strategy
- Catches regressions immediately
- Well-understood and widely practiced
- Works with any test framework

**Limitations**:
- Only as good as test coverage — untested behavior can still break
- Slow test suites may bottleneck iteration speed
- Flaky tests cause spurious failures (false alarms) that can block correct changes

**Configuration in refactor-list.json**:
```json
{
  "behavior_preservation": "test-gate",
  "test_command": "npm test"
}
```

---

### Strategy: snapshot

**Definition**: Capture the observable output/state of the target code before and after refactoring, then compare.

**How it works**:
1. Identify observable outputs of the target code (API responses, rendered UI, log output, file output)
2. Capture a "before" snapshot by exercising the code with representative inputs
3. Implement the refactoring change
4. Capture an "after" snapshot with the same inputs
5. Compare: snapshots must match (or differ only in acceptable ways like formatting)

**When to use**:
- Test coverage is insufficient but behavior is observable
- The code produces deterministic output for given inputs
- You can identify representative inputs that exercise the key behavior paths
- Typical targets: API endpoints, CLI tools, data processing pipelines

**Strengths**:
- Works even when formal tests are missing
- Captures real behavior rather than test assertions
- Can detect subtle regressions that tests might miss

**Limitations**:
- Requires deterministic behavior (non-deterministic outputs need normalization)
- May miss edge cases if representative inputs are incomplete
- Snapshot comparison tools may need configuration for acceptable differences
- More manual setup than test-gate
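
This minimal sketch shows the capture-and-compare step, including the normalization that non-deterministic output needs. The volatile field names (`timestamp`, `requestId`) and the snapshot objects are hypothetical; substitute whatever actually varies between runs of your target code. `util.isDeepStrictEqual` is built into Node.js and ignores object key order.

```javascript
const util = require("util");

// Assumed volatile fields: replaced with a placeholder before comparing.
const VOLATILE_FIELDS = new Set(["timestamp", "requestId"]);

// Recursively copies a value, masking volatile fields at any depth.
function normalize(value) {
  if (Array.isArray(value)) return value.map(normalize);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = VOLATILE_FIELDS.has(key) ? "<normalized>" : normalize(v);
    }
    return out;
  }
  return value;
}

// True when the two captures match after normalization.
function snapshotsMatch(before, after) {
  return util.isDeepStrictEqual(normalize(before), normalize(after));
}

// Hypothetical "before" and "after" captures of the same request.
const beforeSnap = { user: { id: 1, name: "Ada" }, timestamp: "2024-01-01T00:00:00Z" };
const afterSnap = { timestamp: "2024-01-02T09:30:00Z", user: { name: "Ada", id: 1 } };
console.log(snapshotsMatch(beforeSnap, afterSnap)); // true

// A real behavior change is still caught.
console.log(snapshotsMatch(beforeSnap, { ...afterSnap, user: { id: 1, name: "ada" } })); // false
```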

**Configuration in refactor-list.json**:
```json
{
  "behavior_preservation": "snapshot",
  "snapshot_targets": ["API responses for /api/users/*", "CLI output for --help flag"]
}
```

---

### Strategy: manual

**Definition**: Human verification is required to confirm behavior is preserved. Used as a last resort.

**When to use**:
- No test coverage AND no easily observable deterministic output
- UI-heavy changes where visual regression is the primary concern
- Legacy code with unknown behavior contracts
- Code that interacts with external services in non-reproducible ways

**How it works**:
1. Document the current behavior (screenshots, recordings, written descriptions)
2. Implement the refactoring change
3. Human manually verifies the behavior matches the documentation
4. Human signs off on the change

**Strengths**:
- Works for any situation
- Humans can assess subjective quality (UI layout, user experience)
- Can catch issues that automated tools miss

**Limitations**:
- Slowest strategy — blocks on human availability
- Error-prone — humans miss regressions, especially subtle ones
- Not scalable — each item needs separate human attention
- Not repeatable — different humans may verify differently

**Configuration in refactor-list.json**:
```json
{
  "behavior_preservation": "manual",
  "verification_notes": "Manually verify login flow works: email login, social login, password reset"
}
```

---

## 2. Choosing the Right Strategy

Use this decision tree to select the appropriate strategy for each refactor item:

```
Does the target area have test coverage >60%?
├── YES: Are the tests reliable (no flaky tests)?
│   ├── YES → test-gate
│   └── NO: Fix flaky tests first, then → test-gate
│           (or if fixing is out of scope → snapshot)
└── NO: Does the code produce deterministic, observable output?
    ├── YES → snapshot
    └── NO → manual (flag as high-risk)
```
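
The decision tree translates directly into a function. In this sketch, `coverage` (a measured percentage) and the three yes/no inputs are assumptions about what the planner knows. The sketch follows only the tree, not the Strategy Selection Table, which for example recommends test-gate plus snapshot for 30-60% coverage with observable output.

```javascript
// Encodes the decision tree: returns a strategy and, where the tree adds one, a note.
function chooseStrategy({ coverage, testsReliable, canFixFlakyTests, deterministicOutput }) {
  if (coverage > 60) {
    if (testsReliable) return { strategy: "test-gate" };
    // Flaky tests: fix them first if that is in scope, otherwise fall back to snapshot.
    return canFixFlakyTests
      ? { strategy: "test-gate", note: "fix flaky tests first" }
      : { strategy: "snapshot" };
  }
  if (deterministicOutput) return { strategy: "snapshot" };
  return { strategy: "manual", note: "flag as high-risk" };
}

console.log(chooseStrategy({ coverage: 75, testsReliable: false, canFixFlakyTests: true }));
// → { strategy: 'test-gate', note: 'fix flaky tests first' }
console.log(chooseStrategy({ coverage: 10, deterministicOutput: false }));
// → { strategy: 'manual', note: 'flag as high-risk' }
```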

### Strategy Selection Table

| Test Coverage | Output Observable | Recommended Strategy | Risk Level |
|--------------|-------------------|---------------------|------------|
| High (>60%) | Yes | test-gate | Low |
| High (>60%) | No | test-gate | Low |
| Medium (30-60%) | Yes | test-gate + snapshot | Medium |
| Medium (30-60%) | No | test-gate (acknowledge gaps) | Medium |
| Low (<30%) | Yes | snapshot | Medium-High |
| Low (<30%) | No | manual | High |
| None (0%) | Yes | snapshot | High |
| None (0%) | No | manual | Very High |

### Mixed Strategies

For complex items, you can combine strategies:
- **test-gate + snapshot**: Run tests AND compare output snapshots. Provides defense in depth.
- **test-gate + manual**: Run tests AND have a human verify UI/UX aspects.
- Use the primary strategy in the `behavior_preservation` field and note the secondary in `verification_notes`.

---

## 3. Common Behavior-Breaking Pitfalls

These are patterns that frequently cause unintended behavior changes during refactoring. Check for each one when planning.

### 3.1 Side Effect Ordering

**Pitfall**: Reordering function calls or module initialization can change side effects.

**Example**: Moving `initLogger()` before `loadConfig()` when the logger depends on config.

**Prevention**: Map side effects and their dependencies before restructuring. Document execution order constraints.
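
Here is a minimal, hypothetical version of that example. Nothing throws when the order flips, which is what makes this pitfall easy to miss: the logger quietly falls back to a default instead of the configured value.

```javascript
// Hypothetical startup code: the logger reads its level from config,
// so loadConfig() must run before initLogger().
function createApp() {
  const config = {};
  let logger = null;
  return {
    loadConfig() {
      config.logLevel = "debug";
    },
    initLogger() {
      // Falls back to "info" when config has not been loaded yet.
      logger = { level: config.logLevel ?? "info" };
    },
    logLevel() {
      return logger.level;
    },
  };
}

// Original order: config first, then logger.
const original = createApp();
original.loadConfig();
original.initLogger();
console.log(original.logLevel()); // debug

// Reordered during a refactor: the logger is built before config exists.
const reordered = createApp();
reordered.initLogger();
reordered.loadConfig();
console.log(reordered.logLevel()); // info (no error, just the wrong level)
```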

### 3.2 Error Handling Changes

**Pitfall**: Extracting code into a new function can change which errors are caught and where.

**Example**: A try/catch block that previously caught errors thrown by inline code no longer sees them once that code is extracted into a function that catches (and handles or swallows) those errors itself.

**Prevention**: Trace error propagation paths before and after. Ensure the same errors reach the same handlers.
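
A hypothetical settings loader shows the effect. The refactored version reports success for input the original rejected, even though each function looks reasonable on its own.

```javascript
// Original: parsing happens inline, so the caller's catch sees bad input.
function loadSettingsOriginal(raw) {
  try {
    const settings = JSON.parse(raw);
    return { ok: true, settings };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Refactored: parsing moved into a helper with its own error handling.
// The helper swallows the error, so the outer catch never runs.
function parseSettings(raw) {
  try {
    return JSON.parse(raw);
  } catch {
    return {}; // silently falls back to empty settings
  }
}

function loadSettingsRefactored(raw) {
  try {
    const settings = parseSettings(raw);
    return { ok: true, settings };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

console.log(loadSettingsOriginal("{bad json").ok); // false
console.log(loadSettingsRefactored("{bad json").ok); // true: behavior changed
```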

### 3.3 Closure and Scope Changes

**Pitfall**: Moving code changes what variables are in scope, especially with closures.

**Example**: Extracting an arrow-function callback that uses `this` into a standalone function loses the `this` binding it captured from the enclosing method.

**Prevention**: Identify all captured variables. Ensure they are passed as parameters or the binding is preserved.
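
A hypothetical cart shows the binding loss. `addPrice` declares strict mode so that the failure is a deterministic `TypeError`, as it would be in an ES module or class body. In a sloppy-mode script, `this` would instead be the global object, and the bug would silently write `NaN` to a global.

```javascript
// Original: an arrow-function closure captures `this` from the method.
class Cart {
  constructor() {
    this.total = 0;
  }
  addAll(prices) {
    prices.forEach((price) => {
      this.total += price;
    });
    return this.total;
  }
}

// Refactored: the closure body was extracted into a standalone function.
function addPrice(price) {
  "use strict"; // as in ES modules: an unbound `this` is undefined
  this.total += price;
}

class RefactoredCart {
  constructor() {
    this.total = 0;
  }
  addAllBroken(prices) {
    prices.forEach(addPrice); // `this` inside addPrice is undefined -> TypeError
    return this.total;
  }
  addAllFixed(prices) {
    prices.forEach(addPrice, this); // forEach's thisArg restores the binding
    return this.total;
  }
}

console.log(new Cart().addAll([5, 10])); // 15
try {
  new RefactoredCart().addAllBroken([5, 10]);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
console.log(new RefactoredCart().addAllFixed([5, 10])); // 15
```

Passing the receiver explicitly is one fix. The other option from the prevention note, passing the state as a parameter, avoids `this` altogether.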

### 3.4 Import Order Side Effects

**Pitfall**: In some languages/frameworks, import order matters (module initialization, polyfills, monkey-patching).

**Example**: Moving an import of a polyfill to a different position causes it to load after the code that needs it.

**Prevention**: Identify imports with side effects. Document order constraints. Test module initialization explicitly.

### 3.5 Default Parameter Changes

**Pitfall**: Replacing hand-written defaulting logic with default parameters (for example, while extracting a function) changes behavior for callers that pass falsy values the old logic replaced.

**Example**: Original: `function process(data, format) { format = format || 'json'; ... }`. Refactored: `function process(data, format = 'json') { ... }`. These behave differently for `process(data, '')` (empty string) and `process(data, null)`, because a default parameter applies only when the argument is `undefined`.

**Prevention**: Audit all default value logic. Use identical defaulting behavior in the refactored version.
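
This small sketch places the three common defaulting idioms side by side. Each variant returns the effective format so the difference is visible.

```javascript
// Original: any falsy value ("", null, 0, false, undefined) falls back to "json".
function processOriginal(data, format) {
  format = format || "json";
  return format;
}

// Default parameter: only `undefined` triggers the fallback.
function processDefaultParam(data, format = "json") {
  return format;
}

// Nullish coalescing: `undefined` and `null` fall back, "" does not.
function processNullish(data, format) {
  return format ?? "json";
}

for (const input of [undefined, null, ""]) {
  console.log({
    input,
    original: processOriginal({}, input),
    defaultParam: processDefaultParam({}, input),
    nullish: processNullish({}, input),
  });
}
```

Only `undefined` behaves the same in all three variants. For `null`, the default parameter diverges from the original. For `''`, both the default parameter and `??` diverge. Swapping `||` for either idiom is therefore a behavior change unless callers never pass those values.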

### 3.6 Async/Await Conversion Gotchas

**Pitfall**: Converting callbacks to async/await can change error propagation, timing, and concurrency.

**Example**: Callback errors that were silently swallowed now surface as unhandled promise rejections.

**Prevention**: Map all error paths in the callback version. Ensure async version handles every path. Test with error scenarios.
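
A hypothetical audit-log call shows the shift. In the callback version, the caller simply ignored `err`. The converted version turns the same failure into a rejected promise that must now be handled on purpose.

```javascript
// Original: the caller ignores the callback's error, so failures are swallowed.
function saveAuditLog(entry, callback) {
  setImmediate(() => callback(new Error("disk full")));
}
saveAuditLog({ action: "login" }, () => {
  // error argument ignored; the login flow continues
});

// Converted: the same failure now rejects the returned promise.
async function saveAuditLogAsync(entry) {
  throw new Error("disk full");
}

// A fire-and-forget call such as `saveAuditLogAsync(entry);` would now raise an
// unhandled rejection, which terminates the process by default in Node.js 15+.
// To preserve the old behavior, swallow the error explicitly and deliberately.
async function handleLogin() {
  await saveAuditLogAsync({ action: "login" }).catch(() => {
    // intentionally ignored, matching the callback version
  });
  return "logged in";
}

handleLogin().then((result) => console.log(result)); // logged in
```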

### 3.7 Type Coercion Changes

**Pitfall**: Moving code between contexts can change implicit type coercion behavior.

**Example**: `==` comparisons that relied on type coercion break when types change due to new module boundaries.

**Prevention**: Prefer strict equality. Audit type assumptions at module boundaries.
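
In this hypothetical illustration, a loose-equality check keeps working only as long as callers pass the same types as before. A new parsing module changes blank input from `""` to `null`, and the result flips without any error.

```javascript
// Original check relies on loose equality: "" == 0 is true after coercion.
function isEmptyQuantity(quantity) {
  return quantity == 0;
}

// Before the refactor, the form layer passed raw strings.
console.log(isEmptyQuantity("")); // true

// After the refactor, a new parsing module returns null for blank input.
// null == 0 is false (null is only loosely equal to undefined), so the result flips.
console.log(isEmptyQuantity(null)); // false

// Prevention: normalize explicitly at the module boundary, then compare strictly.
function toQuantity(raw) {
  if (raw === null || raw === undefined || raw === "") return 0;
  return Number(raw);
}

function isEmptyQuantityStrict(raw) {
  return toQuantity(raw) === 0;
}

console.log(isEmptyQuantityStrict(""), isEmptyQuantityStrict(null)); // true true
```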

### 3.8 Timing and Race Conditions

**Pitfall**: Restructuring async code can change execution timing, revealing or creating race conditions.

**Example**: Splitting a synchronous operation into two async steps creates a window where state is inconsistent.

**Prevention**: Identify shared mutable state. Ensure atomicity is preserved. Test concurrent scenarios.
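
A hypothetical ledger makes the inconsistency window visible. Both transfer functions move the same amount. However, once the operation spans an `await`, code that runs between the two steps can observe a total that never existed in the original. `await Promise.resolve()` stands in for an awaited I/O call.

```javascript
// Hypothetical two-account ledger; the invariant is that the total stays 100.
function createLedger() {
  const accounts = { a: 100, b: 0 };
  return {
    total: () => accounts.a + accounts.b,
    // Original: debit and credit in one synchronous step (atomic for JS callers).
    transferSync(amount) {
      accounts.a -= amount;
      accounts.b += amount;
    },
    // Refactored: the same steps split across an await.
    async transferAsync(amount) {
      accounts.a -= amount;
      await Promise.resolve(); // other code can run here
      accounts.b += amount;
    },
  };
}

async function demo() {
  const ledger = createLedger();
  ledger.transferSync(10);
  console.log(ledger.total()); // 100: the intermediate state is never observable

  const pending = ledger.transferAsync(10);
  console.log(ledger.total()); // 90: code running between the steps sees money "in flight"
  await pending;
  console.log(ledger.total()); // 100
}

demo();
```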

---

## 4. Test Coverage Assessment

Before refactoring, assess the test coverage of the target area to select the appropriate preservation strategy.

### Quick Coverage Assessment

If a formal coverage tool is available:
```bash
# JavaScript (Istanbul/nyc); --grep is a Mocha option, adjust for your test runner
npx nyc --reporter=text -- npm test -- --grep "target-module"

# Python (pytest-cov)
pytest --cov=target_module --cov-report=term-missing

# Go: write a coverage profile, then print per-function coverage
go test -coverprofile=coverage.out ./target-package/...
go tool cover -func=coverage.out
```

### Manual Coverage Assessment

When coverage tools aren't available, assess manually:

1. **List all public functions/methods** in the target area
2. **Search for test files** that import or reference the target
3. **Check test assertions** — do they test behavior or just structure?
4. **Identify untested paths** — error handling, edge cases, default behavior

### Coverage-Based Planning Decisions

| Coverage Level | Planning Decision |
|---------------|-------------------|
| >80% | Proceed with test-gate. High confidence in behavior preservation. |
| 60-80% | Proceed with test-gate. Note coverage gaps in the refactor item descriptions so the pipeline treats those areas with extra caution. |
| 30-60% | Consider writing additional tests before refactoring (as a prerequisite R-000 item), or use the snapshot strategy for the low-coverage areas. |
| <30% | Strongly recommend writing tests first. If the user declines, use the snapshot or manual strategy and flag the item as high risk. |
| 0% | Warn the user explicitly. Recommend writing tests as a prerequisite. If the user insists on proceeding, use the manual strategy and document all known behaviors. |

### Adding Tests as a Prerequisite Item

When test coverage is insufficient, add a prerequisite refactor item:

```
Refactor Item R-000:
  Title: Add test coverage for [target area] before refactoring
  Type: restructure
  Scope: [test files]
  Priority: critical
  Complexity: medium
  Behavior Preservation: manual (no existing tests to gate against)
  Acceptance Criteria:
    - Test coverage for [target area] reaches >60%
    - Tests cover: [list key behavior contracts]
    - All new tests pass
  Dependencies: none
```

This item runs first, establishing the test baseline that all subsequent items use for their test-gate strategy.

---

## 5. Behavior Verification Checklist

Before marking a refactor item as complete, verify:

- [ ] All previously-passing tests still pass
- [ ] No new warnings or deprecation notices in test output
- [ ] No new lint errors introduced
- [ ] Public API surface is unchanged (same exports, same function signatures)
- [ ] Error messages and error codes are unchanged (consumers may depend on these)
- [ ] Logging output is unchanged (monitoring/alerting may depend on log patterns)
- [ ] Configuration interface is unchanged (env vars, config files, CLI flags)
- [ ] Performance characteristics are within acceptable bounds (no regression >10%)
- [ ] No dead code left behind (unused imports, unreachable functions)