loki-mode 6.66.0 → 6.67.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,352 @@
1
+ # Legacy Healing Patterns Reference
2
+
3
+ ## Sources (Validated)
4
+
5
+ Every pattern in this document is traced to a specific source. If a pattern is our synthesis (not directly from a paper), it is marked as such.
6
+
7
+ | # | Source | Year | Key Contribution |
8
+ |---|--------|------|-----------------|
9
+ | 1 | Michael Feathers, *Working Effectively with Legacy Code* | 2004 | Characterization tests, seams, dependency-breaking |
10
+ | 2 | Martin Fowler, Strangler Fig Application | 2004 | Incremental replacement via facade |
11
+ | 3 | Eric Evans, *Domain-Driven Design* Ch.14 | 2003 | Anti-Corruption Layer |
12
+ | 4 | Amazon AGI Lab, "How Agentic AI Helps Heal Systems" | 2026 | Friction-as-semantics, RL gyms, agents as universal API |
13
+ | 5 | arXiv:2602.22518, RepoMod-Bench | 2026 | System-boundary testing for behavioral equivalence |
14
+ | 6 | arXiv:2602.04341, Model-Driven Modernization | 2026 | Observability + contract tests for conformance |
15
+ | 7 | arXiv:2506.02290, HEC | 2025 | Equivalence verification via equality saturation |
16
+ | 8 | arXiv:2502.12466, EquiBench | 2025 | LLM code equivalence reasoning benchmarks |
17
+ | 9 | arXiv:2510.18509, VAPU | 2025 | Multi-agent pipeline for autonomous legacy updates |
18
+ | 10 | arXiv:2504.11335, Code Reborn | 2025 | AI-driven COBOL-to-Java migration, 93% accuracy |
19
+ | 11 | arXiv:2501.19204, Multi-Agent Web App Upgrades | 2025 | Autonomous legacy web application upgrades |
20
+ | 12 | AWS Transform | 2025-2026 | Decomposition agents, semantic seeding, 1.1B LOC analyzed |
21
+ | 13 | GitHub Copilot Legacy Systems | 2025 | 3-agent pattern: extract, test, rewrite |
22
+ | 14 | Mark Seemann, Empirical Characterization Testing | 2025 | Falsifiable experiment pattern for characterization |
23
+ | 15 | arXiv:2511.04427v2 | 2025 | Velocity-quality tradeoff (807 repos studied) |
24
+ | 16 | ThoughtWorks, Strangler Fig Guide | 2025 | Practical implementation steps |
25
+
26
+ ---
27
+
28
+ ## 1. Characterization Testing (Feathers, 2004)
29
+
30
+ ### The Core Technique
31
+
32
+ Michael Feathers defines a characterization test as "a test you write to understand the behavior of the system." It captures WHAT the code does, not what it SHOULD do.
33
+
34
+ ### Feathers' Recipe
35
+
36
+ ```
37
+ 1. Use a piece of code in a test harness
38
+ 2. Write an assertion that you KNOW will fail
39
+ 3. Let the failure tell you what the actual behavior is
40
+ 4. Change the test so it expects the behavior the code produces
41
+ 5. The test now documents actual behavior
42
+ ```
43
+
44
+ ### Why "Write a failing assertion" Matters (Seemann, 2025)
45
+
46
+ Mark Seemann's empirical characterization testing (2025) explains: writing a failing test is a falsifiable experiment. If you write an assertion you EXPECT to pass, you might write a tautology. The failing assertion forces you to discover what the code actually does.
47
+
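+ A minimal sketch of the recipe in pytest-style Python. The legacy function and its pinned return value are illustrative, not from the sources; the point is the failing-assertion workflow.
+
+ ```python
+ # Characterization test sketch. calculate_late_fee is a hypothetical
+ # legacy function; 12.50 is the value its failure output revealed.
+ from legacy_billing import calculate_late_fee  # hypothetical legacy module
+
+
+ def test_characterize_late_fee_30_days_overdue():
+     # Step 2: start from an assertion known to fail, e.g. `== -1`.
+     # Step 3: the failure message reveals the actual value (say, 12.50).
+     # Step 4: pin that observed value. The test now documents what the
+     # code DOES, whether or not 12.50 is what it "should" return.
+     assert calculate_late_fee(days_overdue=30, balance=100.00) == 12.50
+ ```
+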
48
+ ### Characterization vs Unit Tests
49
+
50
+ | Aspect | Characterization Test | Unit Test |
51
+ |--------|---------------------|-----------|
52
+ | **Verifies** | What code DOES | What code SHOULD do |
53
+ | **Written** | After code exists | Before code exists (TDD) |
54
+ | **When they differ** | Characterization wins during healing | Unit test wins in new development |
55
+ | **Purpose** | Change detector | Correctness proof |
56
+
57
+ ### Dependency-Breaking Techniques (Feathers)
58
+
59
+ Feathers catalogs 24 dependency-breaking techniques. Most relevant for healing:
60
+
61
+ - **Sprout Method/Class**: Add new behavior in a new method/class, called from the legacy code
62
+ - **Wrap Method/Class**: Wrap legacy behavior, adding new behavior before/after
63
+ - **Extract and Override**: Extract dependency to a method, override in test subclass
64
+ - **Introduce Seam**: Find a place where behavior can be altered without modifying the call site
65
+ - **Pinch Point**: A narrow place in the dependency graph where you can intercept behavior
66
+
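+ A minimal Extract and Override sketch in Python (all names illustrative): the hard-wired database call is extracted into a method, which a test subclass overrides to create a seam for characterization tests.
+
+ ```python
+ class InvoiceProcessor:
+     """Hypothetical legacy class whose calculation we want under test."""
+
+     def total_due(self, customer_id: str) -> float:
+         rows = self._fetch_invoices(customer_id)  # extracted seam
+         return sum(r["amount"] for r in rows if not r["paid"])
+
+     def _fetch_invoices(self, customer_id: str) -> list:
+         # The original hard-wired dependency (legacy DB call) lives here.
+         raise NotImplementedError("hits the legacy database in production")
+
+
+ class TestableInvoiceProcessor(InvoiceProcessor):
+     """Test subclass: overrides the seam with canned data."""
+
+     def _fetch_invoices(self, customer_id: str) -> list:
+         return [{"amount": 40.0, "paid": False}, {"amount": 10.0, "paid": True}]
+
+
+ def test_characterize_total_due():
+     assert TestableInvoiceProcessor().total_due("c-1") == 40.0
+ ```
+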
67
+ ---
68
+
69
+ ## 2. Strangler Fig Pattern (Fowler, 2004)
70
+
71
+ ### Definition
72
+
73
+ Named after strangler figs that gradually grow around a host tree until they replace it. In software: gradually replace legacy components while both old and new run simultaneously.
74
+
75
+ ### Implementation (ThoughtWorks, 2025)
76
+
77
+ ```
78
+ 1. Identify system boundaries
79
+ - NOT arbitrary code boundaries
80
+ - Natural business domain boundaries
81
+ - Where the system interfaces with users or other systems
82
+
83
+ 2. Define thin slices
84
+ - Small enough to replace safely
85
+ - Big enough to deliver business value
86
+ - Independent and self-contained where possible
87
+
88
+ 3. Introduce indirection layer (facade/proxy)
89
+ - Routes requests to old or new based on readiness
90
+ - Must NOT become a bottleneck
91
+ - Must NOT be a single point of failure
92
+
93
+ 4. Develop new component
94
+ - Behind the indirection layer
95
+ - With characterization tests from the old component
96
+
97
+ 5. Route traffic
98
+ - Canary: send 5% to new, 95% to old
99
+ - Compare outputs
100
+ - Gradually increase
101
+
102
+ 6. Retire old component
103
+ - Only after new component is verified at 100% traffic
104
+
105
+ 7. Iterate for next slice
106
+ ```
107
+
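+ A minimal sketch of steps 3 and 5 in Python. The handlers are hypothetical callables with identical signatures; the canary percentage controls routing, and the shadow comparison is only safe for side-effect-free requests.
+
+ ```python
+ import logging
+ import random
+
+ log = logging.getLogger("strangler.facade")
+
+
+ def route(request, legacy_handler, modern_handler, canary_percent=5):
+     """Indirection layer: send canary_percent of traffic to the new component."""
+     if random.uniform(0, 100) < canary_percent:
+         result = modern_handler(request)
+         # Shadow-compare against the old component (read-only requests only).
+         expected = legacy_handler(request)
+         if result != expected:
+             log.warning("divergence for %r: new=%r legacy=%r", request, result, expected)
+         return result
+     return legacy_handler(request)
+ ```
+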
108
+ ### Best Practices (ThoughtWorks + AWS)
109
+
110
+ | Practice | Source | Rationale |
111
+ |----------|--------|-----------|
112
+ | Start with low-risk components | ThoughtWorks | Build confidence before tackling critical paths |
113
+ | Handle shared databases via views/APIs | ThoughtWorks | Direct DB access creates hidden coupling |
114
+ | Use feature flags and canary releases | ThoughtWorks | Reversible deployment |
115
+ | Maintain a living migration roadmap | ThoughtWorks | Track what's strangled, in-progress, and next |
116
+ | Use semantic seeding for decomposition | AWS Transform | Groups code into natural domains automatically |
117
+
118
+ ---
119
+
120
+ ## 3. Anti-Corruption Layer (Evans, DDD 2003)
121
+
122
+ ### Definition
123
+
124
+ A layer that isolates a new system from a legacy system by translating between their models. Prevents the legacy model from "corrupting" the clean design of the new system.
125
+
126
+ ### Components
127
+
128
+ ```
129
+ +-------------------+
130
+ |   Modern System   |   Clean domain model
131
+ +--------+----------+
132
+          |
133
+ +--------v----------+
134
+ |  Anti-Corruption  |
135
+ |       Layer       |
136
+ |  +-- Facade       |   Simplified interface to legacy
137
+ |  +-- Adapter      |   Converts data formats
138
+ |  +-- Translator   |   Maps domain concepts
139
+ +--------+----------+
140
+          |
141
+ +--------v----------+
142
+ |   Legacy System   |   Untouched
143
+ +-------------------+
144
+ ```
145
+
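+ A minimal ACL sketch in Python. The legacy field names, status code, and client method are illustrative assumptions; the structure (facade in front, translator mapping legacy records into a clean domain model) follows the diagram above.
+
+ ```python
+ from dataclasses import dataclass
+
+
+ @dataclass
+ class Customer:
+     """Clean domain model on the modern side."""
+     customer_id: str
+     active: bool
+
+
+ class LegacyCustomerTranslator:
+     """Translator: maps legacy field names and codes to domain concepts."""
+
+     def to_domain(self, legacy_record: dict) -> Customer:
+         return Customer(
+             customer_id=legacy_record["CUST-NO"],     # hypothetical legacy field
+             active=legacy_record["STAT-CD"] == "01",  # assume '01' means active
+         )
+
+
+ class CustomerFacade:
+     """Facade: the only place the modern system touches the legacy API."""
+
+     def __init__(self, legacy_client, translator=None):
+         self._client = legacy_client                  # hypothetical legacy client
+         self._translator = translator or LegacyCustomerTranslator()
+
+     def get_customer(self, customer_id: str) -> Customer:
+         raw = self._client.lookup_cust(customer_id)   # returns a legacy record
+         return self._translator.to_domain(raw)
+ ```
+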
146
+ ### Connection to Amazon's "Universal API"
147
+
148
+ Amazon AGI Lab describes agents that "effectively become a universal API" by managing legacy idiosyncrasies behind the scenes. This IS an anti-corruption layer, but implemented by an AI agent that learns the legacy system's behavior through its UI rather than through hand-coded middleware.
149
+
150
+ **Key difference:** Traditional ACL is hand-coded and static. Amazon's approach uses AI agents trained on system behavior to create dynamic, adaptive ACLs.
151
+
152
+ ---
153
+
154
+ ## 4. Behavioral Equivalence Verification
155
+
156
+ ### System-Boundary Testing (RepoMod-Bench, arXiv:2602.22518)
157
+
158
+ **Key insight:** "Software behavior is best verified at the system boundary rather than the unit level."
159
+
160
+ RepoMod-Bench showed that unit-level testing allows agents to "overfit" to tests. System-boundary testing with implementation-agnostic test suites is the correct approach.
161
+
162
+ ```yaml
163
+ system_boundary_testing:
164
+ natural_boundaries:
165
+ - CLI output for a given set of inputs
166
+ - REST API responses for known requests
167
+ - Database state after operations
168
+ - File outputs for batch jobs
169
+ - Message queue payloads
170
+
171
+ approach:
172
+ 1: "Capture outputs at ALL system boundaries before modernization"
173
+ 2: "Store as golden master (behavioral baseline)"
174
+ 3: "Run the SAME tests against modernized code"
175
+ 4: "Any difference = behavioral change requiring documentation"
176
+
177
+ findings:
178
+ small_repos: "91.3% pass rate on projects under 10K LOC"
179
+ large_repos: "15.3% pass rate on projects over 50K LOC"
180
+ implication: "Break large codebases into smaller components before modernizing"
181
+ ```
182
+
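+ A minimal golden-master sketch in Python at the CLI boundary. The binaries, flags, and cases are illustrative; the same inputs are run once against the legacy build (baseline) and then against the modernized build.
+
+ ```python
+ import json
+ import subprocess
+ from pathlib import Path
+
+ GOLDEN = Path("golden_master.json")
+ CASES = [["--report", "monthly"], ["--report", "daily", "--region", "eu"]]
+
+
+ def run(binary, args):
+     """Treat the program as a black box: inputs in, stdout out."""
+     return subprocess.run([binary, *args], capture_output=True,
+                           text=True, check=False).stdout
+
+
+ def capture_baseline(legacy_binary="./legacy_app"):
+     GOLDEN.write_text(json.dumps({" ".join(a): run(legacy_binary, a) for a in CASES}))
+
+
+ def test_modernized_matches_golden_master(modern_binary="./modern_app"):
+     baseline = json.loads(GOLDEN.read_text())
+     for args in CASES:
+         # Any difference is a behavioral change that must be documented.
+         assert run(modern_binary, args) == baseline[" ".join(args)]
+ ```
+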
183
+ ### Observability-Based Verification (arXiv:2602.04341)
184
+
185
+ The Model-Driven Modernization paper uses observability (logs, metrics, traces) and contract tests to verify behavioral and non-functional conformance.
186
+
187
+ ```yaml
188
+ observability_verification:
189
+ capture_during_stabilize:
190
+ - "Add structured logging to all critical paths"
191
+ - "Add metrics (response time, error rate, throughput)"
192
+ - "Add distributed tracing"
193
+
194
+ verify_after_modernize:
195
+ - "Compare log patterns (same operations should produce same log sequences)"
196
+ - "Compare metrics (response time within 2x, error rate equal or lower)"
197
+ - "Compare traces (same service interactions)"
198
+
199
+ contract_tests:
200
+ - "Define contracts at adapter boundaries"
201
+ - "Verify contracts after each modernization step"
202
+ - "Contracts are NOT unit tests -- they verify interface compliance"
203
+ ```
204
+
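+ A minimal contract-test sketch in Python. The adapter classes, fields, and statuses are illustrative; the test checks interface compliance (shape and invariants of the response) against both implementations, never internals.
+
+ ```python
+ from adapters import LegacyOrderAdapter, ModernOrderAdapter  # hypothetical adapters
+
+ REQUIRED_FIELDS = {"order_id", "status", "total"}
+ VALID_STATUSES = {"open", "shipped", "cancelled"}
+
+
+ def check_order_contract(response: dict) -> None:
+     assert REQUIRED_FIELDS <= response.keys()    # required shape
+     assert response["status"] in VALID_STATUSES  # allowed values
+     assert response["total"] >= 0                # invariant
+
+
+ def test_both_implementations_honor_the_contract():
+     for adapter in (LegacyOrderAdapter(), ModernOrderAdapter()):
+         check_order_contract(adapter.get_order("ord-123"))
+ ```
+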
205
+ ### Formal Equivalence (Research Stage)
206
+
207
+ HEC (arXiv:2506.02290) uses e-graphs and equality saturation for formal equivalence checking. EquiBench (arXiv:2502.12466) benchmarks LLM ability to reason about code equivalence.
208
+
209
+ **Honest assessment:** Formal equivalence verification is research-stage. Loki uses system-boundary testing and observability, not formal verification.
210
+
211
+ ---
212
+
213
+ ## 5. Multi-Agent Modernization Patterns
214
+
215
+ ### AWS Transform Decomposition (2025-2026)
216
+
217
+ AWS Transform uses specialized AI agents organized by domain:
218
+
219
+ | Agent Category | Purpose | Loki Equivalent |
220
+ |---------------|---------|-----------------|
221
+ | **Code Agent** | Analyze types, LOC, complexity, dependencies | Archaeology phase, sonnet |
222
+ | **Data Source Agent** | Identify databases, files, configs and their usage | Archaeology phase, sonnet |
223
+ | **Decomposition Agent** | Group code into logical domains via semantic seeding | Isolate phase, opus |
224
+ | **Refactor Agent** | Transform legacy code to modern language | Modernize phase, sonnet |
225
+ | **Reforge Agent** | Optimize refactored code for maintainability | Modernize phase, sonnet |
226
+ | **Testing Agent** | Generate test plans from application dependencies | All phases, sonnet |
227
+
228
+ **Semantic Seeding:** AWS Transform identifies natural domain boundaries by analyzing code semantics, not just syntactic structure. This determines WHERE to place strangler fig boundaries.
229
+
230
+ ### GitHub Copilot 3-Agent Pattern (2025)
231
+
232
+ GitHub uses three sequential agents for COBOL modernization:
233
+
234
+ ```
235
+ Agent 1: Extract business logic from legacy code
236
+ |
237
+ v
238
+ Agent 2: Generate characterization tests that validate that logic
239
+ |
240
+ v
241
+ Agent 3: Generate modern code that passes those tests
242
+ ```
243
+
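+ A minimal orchestration sketch of the hand-off in Python. The agent and test-runner callables are placeholders for model calls and tooling, not an actual API; the point is that Agent 2's tests gate Agent 3's output.
+
+ ```python
+ def modernize(legacy_source, extract_agent, test_agent, rewrite_agent, run_tests):
+     business_logic = extract_agent(legacy_source)        # Agent 1: extract
+     tests = test_agent(business_logic)                   # Agent 2: characterize
+
+     # The characterization tests must pass against the LEGACY code first,
+     # otherwise they do not actually capture its behavior (Section 1).
+     if not run_tests(legacy_source, tests).passed:
+         raise RuntimeError("characterization tests do not match legacy behavior")
+
+     modern_source = rewrite_agent(business_logic, tests)  # Agent 3: rewrite
+     if not run_tests(modern_source, tests).passed:
+         raise RuntimeError("modernized code fails its characterization tests")
+     return modern_source
+ ```
+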
244
+ **Key insight from Julia Kordick (Microsoft):** She never learned COBOL. She brought AI expertise and worked with domain experts who had decades of knowledge. The agent bridges the knowledge gap.
245
+
246
+ ### VAPU Pipeline (arXiv:2510.18509)
247
+
248
+ VAPU uses a multi-agent pipeline with verification:
249
+
250
+ ```
251
+ Requirements -> Developer Agent -> Verification Agent -> Finalizer Agent
252
+                        |                   |                    |
253
+                        v                   v                    v
254
+                   Modify code         Check success       Complete phase
255
+                                  Give feedback or revert
256
+ ```
257
+
258
+ **Key result:** Medium-parameter models (Nova Pro 1.0, DeepSeek-V3) achieved the lowest error rates, while larger models (Claude 3.5 Sonnet, GPT-4o) performed better on harder tasks.
259
+
260
+ ---
261
+
262
+ ## 6. Institutional Knowledge Extraction
263
+
264
+ ### Sources of Knowledge (Ordered by Value)
265
+
266
+ | Source | Value | Method | Validation |
267
+ |--------|-------|--------|------------|
268
+ | Code comments (hack, workaround, don't touch) | High | Regex scan | Cross-reference with git blame |
269
+ | Git blame history | High | `git log --follow --diff-filter=M` | Date + author + commit message |
270
+ | Error messages | Medium | Grep user-facing strings | Often encode business rules |
271
+ | Test fixtures | Medium | Analyze expected values | Encode business expectations |
272
+ | Configuration (magic numbers, thresholds) | Medium | Find hardcoded values | Trace usage to business logic |
273
+ | Dead code | Low-Medium | Static analysis for unreachable code | May be called dynamically |
274
+ | Documentation | Variable | Read docs, verify against code | Often outdated |
275
+
276
+ ### Comment Archaeology Patterns
277
+
278
+ ```bash
279
+ # High-value patterns (likely encode business rules)
280
+ grep -rn "hack\|workaround\|kludge\|temporary" --include="*.py" ./src/
281
+ grep -rn "don't touch\|do not modify\|fragile\|careful" --include="*.py" ./src/
282
+ grep -rn "per .* requirement\|compliance\|regulation" --include="*.py" ./src/
283
+ grep -rn "see ticket\|see bug\|see issue\|JIRA" --include="*.py" ./src/
284
+
285
+ # Code age analysis: timestamp of the most recent modification (git log lists newest first)
286
+ git log --format='%at %H' --diff-filter=M -- <file> | head -1
287
+ ```
288
+
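+ A minimal sketch in Python of the validation step from the table above: cross-reference each comment hit with `git blame` so the extracted note carries its author. The path and pattern are illustrative.
+
+ ```python
+ import re
+ import subprocess
+ from pathlib import Path
+
+ PATTERN = re.compile(r"hack|workaround|don't touch|compliance", re.IGNORECASE)
+
+
+ def annotated_comments(path):
+     """Yield high-value comment lines together with their git blame author."""
+     for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
+         if PATTERN.search(line):
+             blame = subprocess.run(
+                 ["git", "blame", "-L", f"{lineno},{lineno}", "--porcelain", str(path)],
+                 capture_output=True, text=True).stdout
+             author = next((l[len("author "):] for l in blame.splitlines()
+                            if l.startswith("author ")), "unknown")
+             yield {"file": str(path), "line": lineno,
+                    "text": line.strip(), "author": author}
+
+
+ for src in Path("./src").rglob("*.py"):
+     for note in annotated_comments(src):
+         print(note)  # candidate entry for institutional-knowledge.md
+ ```
+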
289
+ ---
290
+
291
+ ## 7. Healing Anti-Patterns (Validated)
292
+
293
+ | Anti-Pattern | Why It Fails | What to Do | Source |
294
+ |-------------|-------------|-----------|--------|
295
+ | **Big Bang Rewrite** | Destroys institutional knowledge, 15.3% success rate at 50K+ LOC | Strangler Fig | Fowler 2004, arXiv:2602.22518 |
296
+ | **Fixing Quirks Without Classification** | "This sleep(2) is unnecessary" -- may prevent race condition | Classify friction first | Amazon AGI Lab 2026 |
297
+ | **Unit-Level Equivalence Testing** | Allows test overfitting, misses system-level behavioral changes | System-boundary testing | arXiv:2602.22518 |
298
+ | **Comment Deletion as "Cleanup"** | Removes institutional knowledge permanently | Extract to institutional-knowledge.md first | Synthesis |
299
+ | **Test Deletion** | "These tests are weird/slow" -- they capture critical behaviors | Keep until characterization complete | Feathers 2004 |
300
+ | **Over-Abstracting** | Adding clean-architecture layers everywhere increases complexity | Anti-corruption layer at boundaries only | Evans 2003 |
301
+ | **Skipping Archaeology** | "I can see what this does" -- you see structure, not semantics | Always characterize before modifying | Feathers 2004, Amazon 2026 |
302
+ | **Ignoring Dead Code** | May be called via reflection, dynamic dispatch, or external systems | Runtime analysis before removal | Feathers 2004 |
303
+
304
+ ---
305
+
306
+ ## 8. Scale Constraints (Honest Assessment)
307
+
308
+ Based on arXiv:2602.22518 (RepoMod-Bench) and arXiv:2504.11335 (Code Reborn):
309
+
310
+ | Codebase Size | Expected Outcome | Strategy |
311
+ |---------------|-----------------|----------|
312
+ | <10K LOC | 91.3% automated pass rate | Full automated healing |
313
+ | 10K-50K LOC | ~50% automated pass rate | Automated archaeology + guided modernization |
314
+ | 50K-200K LOC | ~15% automated pass rate | Break into components first, then heal each |
315
+ | >200K LOC | Not practical end-to-end | AWS Transform or similar enterprise tooling |
316
+
317
+ **Code Reborn (arXiv:2504.11335) findings for COBOL-to-Java:**
318
+ - 93% accuracy (vs 75% manual, 82% rule-based tools)
319
+ - 35% complexity reduction
320
+ - 33% coupling reduction
321
+ - Tested on 50,000 COBOL files
322
+
323
+ ---
324
+
325
+ ## 9. Language-Specific Patterns (Condensed)
326
+
327
+ ### COBOL / Mainframe
328
+ - COPYBOOK files = shared data structures (must map ALL usage)
329
+ - PARAGRAPH names = business process steps
330
+ - 88-level conditions = enum-like business rules
331
+ - PERFORM THRU = transaction boundaries
332
+ - **200 billion lines still running** in banking, insurance, and government systems
333
+
334
+ ### Legacy Java (Pre-8)
335
+ - XML configuration (Spring XML, Hibernate HBM) = wiring
336
+ - EJB session beans = transaction semantics
337
+ - Servlet filter chain ordering matters
338
+ - JNDI lookups = deployment dependencies
339
+ - ThreadLocal = hidden state
340
+
341
+ ### Legacy PHP (Pre-7)
342
+ - register_globals behavior (security risk)
343
+ - mysql_* functions (SQL injection risk)
344
+ - include/require with variable paths (dynamic loading)
345
+ - Session handling with custom save handlers
346
+
347
+ ### Legacy Python (2.x)
348
+ - print statement vs function
349
+ - unicode/str confusion
350
+ - Integer division behavior (// vs /)
351
+ - Old-style classes
352
+ - Run `2to3` without `-w` first (report only); add `-w` to rewrite files once the changes are reviewed
@@ -29,6 +29,7 @@
29
29
  | Multi-provider (Codex, Gemini) | `providers.md` |
30
30
  | OpenSpec delta context, brownfield modifications | `openspec-integration.md` |
31
31
  | MiroFish market validation, `--mirofish` flag | `mirofish-integration.md` |
32
+ | Legacy healing, modernization, archaeology | `healing.md` |
32
33
  | Plan deepening, knowledge extraction | `compound-learning.md` |
33
34
 
34
35
  ## Module Descriptions
@@ -43,7 +44,7 @@
43
44
 
44
45
  ### quality-gates.md
45
46
  **When:** Code review, pre-commit checks, quality assurance
46
- - 9-gate quality system
47
+ - 10-gate quality system (Gate 10: backward compatibility for healing)
47
48
  - Blind review + anti-sycophancy
48
49
  - Velocity-quality feedback loop (arXiv research)
49
50
  - Mandatory quality checks per task
@@ -133,6 +134,17 @@
133
134
  - Solution retrieval: Load relevant cross-project solutions during REASON phase
134
135
  - Composable phases: plan, deepen, work, review, compound
135
136
 
137
+ ### healing.md (v6.67.0)
138
+ **When:** Legacy codebase modernization, `loki heal`, brownfield projects, code archaeology
139
+ - 5 healing principles (friction-as-semantics, failure-first, adapters, incremental, knowledge preservation)
140
+ - Healing RARV cycle (characterize before modifying)
141
+ - Codebase archaeology protocol
142
+ - Friction map management
143
+ - Healing phase gates (archaeology > stabilize > isolate > modernize > validate)
144
+ - Legacy-healing-auditor code review specialist
145
+ - Language-specific guides (COBOL, legacy Java, PHP, Python 2)
146
+ - Full reference: `references/legacy-healing-patterns.md`
147
+
136
148
  ### providers.md (v5.0.0)
137
149
  **When:** Using non-Claude providers (Codex, Gemini), understanding degraded mode
138
150
  - Provider comparison matrix