loki-mode 6.65.0 → 6.67.0
- package/SKILL.md +3 -2
- package/VERSION +1 -1
- package/autonomy/hooks/migration-hooks.sh +170 -0
- package/autonomy/loki +1018 -1
- package/autonomy/notification-checker.py +4 -3
- package/autonomy/run.sh +22 -10
- package/autonomy/tui.sh +471 -0
- package/completions/_loki +57 -0
- package/completions/loki.bash +39 -1
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +20 -6
- package/docs/INSTALLATION.md +1 -1
- package/events/bus.py +9 -1
- package/events/emit.sh +2 -2
- package/mcp/__init__.py +1 -1
- package/memory/namespace.py +14 -11
- package/memory/schemas.py +155 -0
- package/memory/storage.py +29 -4
- package/package.json +1 -1
- package/references/legacy-healing-patterns.md +352 -0
- package/skills/00-index.md +13 -1
- package/skills/healing.md +491 -0
- package/skills/quality-gates.md +31 -2
- package/skills/troubleshooting.md +33 -0
@@ -0,0 +1,491 @@
+# Legacy System Healing
+
+## Research Foundation
+
+This module synthesizes ideas from multiple validated sources. Each technique is cited.
+
+| Source | Key Contribution | Citation |
+|--------|-----------------|----------|
+| Amazon AGI Lab (2026) | Friction-as-semantics, agents as universal API over legacy UIs | [amazon.science blog](https://www.amazon.science/blog/how-agentic-ai-helps-heal-the-systems-we-cant-replace) |
+| Michael Feathers (2004) | Characterization testing, dependency-breaking techniques, seams | *Working Effectively with Legacy Code*, Prentice Hall |
+| Martin Fowler (2004) | Strangler Fig pattern for incremental replacement | [martinfowler.com/bliki/StranglerFigApplication](https://martinfowler.com/bliki/StranglerFigApplication.html) |
+| Eric Evans (2003) | Anti-Corruption Layer to isolate legacy from modern code | *Domain-Driven Design*, Addison-Wesley |
+| RepoMod-Bench (2026) | System-boundary testing for behavioral equivalence | arXiv:2602.22518 |
+| Model-Driven Modernization (2026) | Observability + contract tests for conformance | arXiv:2602.04341 |
+| HEC (2025) | Equivalence verification via equality saturation | arXiv:2506.02290 |
+| VAPU (2025) | Multi-agent pipeline for autonomous legacy updates | arXiv:2510.18509 |
+| Code Reborn (2025) | AI-driven COBOL-to-Java, 93% accuracy | arXiv:2504.11335 |
+| AWS Transform (2025-2026) | Decomposition agents, semantic seeding, domain grouping | [AWS blog](https://aws.amazon.com/blogs/migration-and-modernization/accelerate-your-mainframe-modernization-journey-using-ai-agents-with-aws-transform/) |
+| GitHub Copilot (2025) | 3-agent pattern: extract logic, generate tests, generate modern code | [github.blog](https://github.blog/ai-and-ml/github-copilot/how-github-copilot-and-ai-agents-are-saving-legacy-systems/) |
+
+---
+
+## When to Load This Module
+
+- `loki heal` command invoked
+- Working with legacy codebases (COBOL, FORTRAN, old Java, PHP 5, Python 2, jQuery-era JS)
+- Brownfield modernization projects
+- `--target` flag used with `loki migrate`
+- Codebase archaeology / knowledge extraction tasks
+
+---
+
+## Core Principles
+
+### 1. Friction is Semantics (Amazon AGI Lab)
+
+**Source:** Amazon AGI Lab -- "The logic behind legacy systems reveals itself most clearly through friction."
+
+System quirks are not bugs. They are the real behavior. The modal that appears late encodes a sequencing rule. The field that refuses input until another value is saved. The form that resets because a backend job restarted midflow. These behaviors ARE the semantics.
+
+```yaml
+friction_detection:
+  rule: "Before 'fixing' any quirk, verify it is not an undocumented business rule"
+  action: "Document in .loki/healing/friction-map.json"
+  classification:
+    business_rule: "Keep and document. Gate 10 blocks removal."
+    true_bug: "Fix with characterization test proving the fix."
+    unknown: "Keep until classified. NEVER remove unknown friction."
+```
+
+**Friction Map Schema:**
+```json
+{
+  "frictions": [
+    {
+      "id": "friction-001",
+      "location": "src/billing/invoice.py:234",
+      "behavior": "Sleep 2s before committing transaction",
+      "classification": "business_rule|true_bug|unknown",
+      "evidence": "Prevents race condition with external payment gateway callback",
+      "discovered_by": "archaeology_scan",
+      "timestamp": "2026-01-25T10:00:00Z",
+      "safe_to_remove": false
+    }
+  ]
+}
+```
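
Editorial sketch, not part of the package: one way the classification rules and the friction map schema above could be applied before a quirk is removed. It assumes removal is allowed only when an entry is classified `true_bug` or has `safe_to_remove` set to true. The names `may_remove` and `removable_ids` are hypothetical and do not come from Loki.

```python
import json

# Hypothetical helpers for illustration; Loki's real gate logic may differ.
def may_remove(friction: dict) -> bool:
    """Allow removal only for classified bugs or entries explicitly marked safe."""
    if friction.get("classification") == "true_bug":
        return True
    # Anything else (business_rule, unknown, missing) needs an explicit opt-in.
    return friction.get("safe_to_remove") is True


def removable_ids(friction_map_text: str) -> list[str]:
    """Return the ids of friction points that may be removed."""
    data = json.loads(friction_map_text)
    return [f["id"] for f in data.get("frictions", []) if may_remove(f)]


# Example map using the schema fields shown above (values are made up).
EXAMPLE = json.dumps({"frictions": [
    {"id": "friction-001", "classification": "business_rule", "safe_to_remove": False},
    {"id": "friction-002", "classification": "unknown", "safe_to_remove": False},
    {"id": "friction-003", "classification": "true_bug", "safe_to_remove": False},
]})
```

Defaulting to "keep" for missing or unrecognized classifications mirrors the rule that unknown friction is never removed.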
+
+### 2. Characterize Before Modifying (Feathers)
+
+**Source:** Michael Feathers, *Working Effectively with Legacy Code* (2004)
+
+A characterization test describes the ACTUAL behavior of existing code, not the INTENDED behavior. It is a change detector, not a correctness proof. Mark Seemann (2025) emphasizes: "Write an assertion that you know will fail" -- this prevents tautological tests.
+
+```yaml
+characterization_testing:
+  feathers_recipe:
+    1: "Use a piece of code in a test harness"
+    2: "Write an assertion that you KNOW will fail"
+    3: "Let the failure tell you what the actual behavior is"
+    4: "Change the test so that it expects the behavior the code produces"
+    5: "Repeat -- the test now documents the actual behavior"
+
+  key_distinction: |
+    Characterization tests capture WHAT THE CODE DOES.
+    Unit tests verify WHAT THE CODE SHOULD DO.
+    When these differ, the characterization test wins during healing --
+    because users depend on actual behavior, not intended behavior.
+
+  seemann_2025: |
+    "A characterization test is a falsifiable experiment. The implied
+    hypothesis is that the test will fail. If it does not fail, you've
+    falsified the prediction." (Mark Seemann, Nov 2025)
+```
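
Editorial sketch, not part of the package: what the end state of the Feathers recipe might look like in Python's `unittest`. The function `legacy_round_price` is a made-up stand-in for legacy code, and its truncating behavior is an assumption chosen only to show a result that differs from what a reader might intend.

```python
import unittest


def legacy_round_price(cents: int) -> int:
    """Stand-in for legacy code: truncates down to a multiple of 5."""
    return cents - (cents % 5)


class CharacterizeRoundPrice(unittest.TestCase):
    def test_observed_behavior_for_19_cents(self):
        # Steps 2-3 of the recipe: a deliberately wrong assertion such as
        # assertEqual(legacy_round_price(19), -1) failed and reported 15.
        # Step 4: pin the test to the value the code actually produced,
        # even though "rounding" 19 might be expected to give 20.
        self.assertEqual(legacy_round_price(19), 15)
```

The test records what the code does today, so a later change that "fixes" the truncation will be flagged rather than slip through silently.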
+
+### 3. Strangler Fig Pattern (Fowler)
+
+**Source:** Martin Fowler, 2004. Named after strangler figs that gradually grow around a host tree.
+
+Do NOT rewrite. Gradually replace components while both old and new run simultaneously. A facade/proxy routes traffic to old or new based on readiness.
+
+```yaml
+strangler_fig:
+  steps:
+    1: "Identify system boundaries (not arbitrary code boundaries)"
+    2: "Define thin slices -- small enough to replace, big enough to deliver value"
+    3: "Introduce indirection layer (the 'fig' that grows around the tree)"
+    4: "Develop new component behind the indirection"
+    5: "Route traffic to new component"
+    6: "Retire old component when new one is verified"
+    7: "Iterate for next slice"
+
+  best_practices:
+    - "Start with low-risk components that have good test coverage"
+    - "The facade must not become a bottleneck or single point of failure"
+    - "Handle shared databases via views or APIs, not direct access"
+    - "Use CI/CD, canary releases, and feature flags"
+```
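
Editorial sketch, not part of the package: a minimal in-process version of the indirection layer from steps 3 to 6 above. The class `StranglerFacade` and its method names are hypothetical; a real facade would more likely be a proxy or router in front of services.

```python
from typing import Callable, Dict, Set

Handler = Callable[[dict], dict]


class StranglerFacade:
    """Routes each named slice to its legacy or its modern handler."""

    def __init__(self, legacy_handlers: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy_handlers)
        self._modern: Dict[str, Handler] = {}
        self._cut_over: Set[str] = set()

    def register_modern(self, slice_name: str, handler: Handler) -> None:
        # Step 4: the new component exists behind the facade but gets no traffic yet.
        self._modern[slice_name] = handler

    def cut_over(self, slice_name: str) -> None:
        # Step 5: route traffic for this slice to the new component.
        if slice_name not in self._modern:
            raise KeyError(f"no modern handler registered for {slice_name!r}")
        self._cut_over.add(slice_name)

    def roll_back(self, slice_name: str) -> None:
        # The legacy handler stays in place, so reverting is a routing change only.
        self._cut_over.discard(slice_name)

    def handle(self, slice_name: str, request: dict) -> dict:
        if slice_name in self._cut_over:
            return self._modern[slice_name](request)
        return self._legacy[slice_name](request)
```

Keeping the legacy handler registered until the new one is verified is what makes step 6 (retirement) a separate, deliberate action.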
+
+### 4. Anti-Corruption Layer (Evans, DDD)
+
+**Source:** Eric Evans, *Domain-Driven Design* (2003), Chapter 14
+
+When integrating with a legacy system, create a layer that translates between your modern domain model and the legacy model. This prevents the legacy model from "corrupting" your clean design.
+
+```yaml
+anti_corruption_layer:
+  purpose: "Translate between modern and legacy models without contamination"
+  components:
+    facade: "Simplified interface to the legacy subsystem"
+    adapter: "Converts legacy data formats to modern types"
+    translator: "Maps between domain concepts across systems"
+
+  # This is what Amazon AGI Lab calls making the agent a "universal API"
+  # The agent manages legacy idiosyncrasies behind the scenes.
+  amazon_connection: |
+    Amazon's insight: when agents learn the UI layer deeply enough,
+    they function as a synthetic API -- a stable programmatic surface
+    over infrastructure that can't be changed. This IS an anti-corruption
+    layer, implemented by an AI agent instead of hand-coded middleware.
+```
+
+### 5. System-Boundary Verification (RepoMod-Bench)
+
+**Source:** arXiv:2602.22518 (Feb 2026) -- "Software behavior is best verified at the system boundary rather than the unit level."
+
+Do NOT compare outputs byte-for-byte at the unit level. Verify functional equivalence at natural system interfaces: CLIs, REST APIs, message queues, file outputs.
+
+```yaml
+system_boundary_verification:
+  key_insight: |
+    RepoMod-Bench showed that unit-level testing allows agents to
+    "overfit" to tests. System-boundary testing with implementation-agnostic
+    test suites is the correct approach for verifying modernization.
+
+  approach:
+    - "Identify natural system boundaries (CLI, API, file I/O, DB queries)"
+    - "Write tests at those boundaries that are language-agnostic"
+    - "Run the SAME tests against old and new implementations"
+    - "Differences = behavioral changes that need explicit documentation"
+
+  repomod_finding: |
+    Pass rates drop from 91.3% on projects under 10K LOC to 15.3% on
+    projects over 50K LOC. Autonomous modernization at scale remains
+    a significant open challenge (arXiv:2602.22518).
+
+  model_driven_approach: |
+    arXiv:2602.04341 validates using observability (logs, metrics, traces)
+    and contract tests to verify behavioral and non-functional conformance
+    during modernization. Telemetry feeds back into rule guards.
+```
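
Editorial sketch, not part of the package: running the same inputs against an old and a new implementation at a CLI boundary and collecting the differences. It assumes both implementations read stdin and write stdout. `run_cli` and `boundary_diffs` are hypothetical names, and the two inline scripts are stand-ins for real programs.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_cli(command: Sequence[str], stdin_text: str) -> Tuple[int, str]:
    """Run one implementation and capture what is observable at the boundary."""
    proc = subprocess.run(
        list(command), input=stdin_text, capture_output=True, text=True, timeout=30
    )
    return proc.returncode, proc.stdout


def boundary_diffs(
    old_cmd: Sequence[str], new_cmd: Sequence[str], inputs: Sequence[str]
) -> List[dict]:
    """Feed identical inputs to both implementations; report every mismatch."""
    diffs = []
    for text in inputs:
        old, new = run_cli(old_cmd, text), run_cli(new_cmd, text)
        if old != new:
            # Each entry is a behavioral change that needs explicit documentation.
            diffs.append({"input": text, "old": old, "new": new})
    return diffs


# Stand-in implementations: internals differ, boundary behavior is the same.
OLD = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip().upper())"]
NEW = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper().strip())"]
```

Because the harness only sees exit codes and output, it stays agnostic to the language and internal structure of either implementation.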
+
+---
+
+## Healing Pipeline
+
+### Phase Gates
+
+Each phase has deterministic gates (shell-level enforcement via `autonomy/hooks/migration-hooks.sh`).
+
+```yaml
+phases:
+  1_archaeology:
+    name: "Codebase Archaeology"
+    actions:
+      - "Map dependency graph (Feathers: find 'seams' and 'pinch points')"
+      - "Catalog all friction points (Amazon: friction-as-semantics)"
+      - "Write characterization tests for critical paths (Feathers recipe)"
+      - "Extract institutional knowledge from comments and git history"
+      - "Classify code by age and change frequency (fossilized vs radioactive)"
+    gate: "friction-map.json has >0 entries AND characterization tests pass at 100%"
+
+  2_stabilize:
+    name: "Stabilize (Add Observability)"
+    actions:
+      - "Add logging/tracing without changing behavior (arXiv:2602.04341)"
+      - "Extract configuration from hardcoded values"
+      - "Add type annotations/hints where possible"
+      - "Set up contract tests at system boundaries (RepoMod-Bench approach)"
+    gate: "All characterization tests still pass AND no new static warnings"
+
+  3_isolate:
+    name: "Strangler Fig Setup"
+    actions:
+      - "Identify component boundaries using AWS Transform's decomposition approach"
+      - "Create anti-corruption layer at boundaries (Evans, DDD)"
+      - "Add integration tests at adapter boundaries"
+      - "Set up routing/facade for traffic splitting (Fowler)"
+    gate: "Components can be tested independently through their adapters"
+
+  4_modernize:
+    name: "Incremental Replacement"
+    actions:
+      - "Replace ONE component at a time behind its anti-corruption layer"
+      - "Run system-boundary tests after each replacement (arXiv:2602.22518)"
+      - "Verify friction behaviors are preserved (or explicitly documented as removed)"
+      - "Use GitHub's 3-agent pattern where applicable:"
+      - " Agent 1: Extract business logic from legacy component"
+      - " Agent 2: Generate characterization tests for that logic"
+      - " Agent 3: Generate modern implementation that passes those tests"
+    gate: "System-boundary tests pass + new component tests pass"
+
+  5_validate:
+    name: "Behavioral Conformance Verification"
+    actions:
+      - "Run full system-boundary test suite against both old and new"
+      - "Compare with observability data from stabilize phase (arXiv:2602.04341)"
+      - "Verify no institutional logic was lost"
+      - "Generate healing report with explicit list of behavioral changes"
+    gate: "Functional equivalence at system boundaries OR documented intentional changes"
+```
+
+---
+
+## Healing RARV Cycle
+
+The standard RARV cycle adapted for healing work:
+
+```
+REASON: What is the riskiest undocumented behavior?
+    |
+    v
+ACT: Write a characterization test that captures it (Feathers recipe).
+    |
+    v
+REFLECT: Does the test capture ACTUAL behavior, not INTENDED behavior?
+    (If you wrote what you THINK the code does, you wrote a unit test,
+    not a characterization test.)
+    |
+    v
+VERIFY: Run the test.
+    |
+    +--[PASS]--> Behavior documented. Store friction point. Move to next.
+    |
+    +--[FAIL]--> You misunderstood the system. This IS the learning.
+                 Update your model. Store in episodic memory.
+                 (Amazon: "the hardest part is teaching why workflows fail")
+```
+
+---
+
+## Structured Fault Injection (Honest Alternative to RL Gyms)
+
+**Honesty note:** Amazon's RL gyms train neural models through reinforcement learning in thousands of synthetic environments. Loki cannot do RL training. What Loki CAN do is structured fault injection to discover failure modes systematically.
+
+```yaml
+structured_fault_injection:
+  what_this_is: "Systematic probing of error paths to build understanding"
+  what_this_is_not: "Reinforcement learning in synthetic environments"
+
+  protocol:
+    1_happy_path:
+      - "Run all existing tests -- record pass/fail baseline"
+      - "Execute documented workflows end-to-end"
+
+    2_boundary_probing:
+      - "Send null/empty/max-length inputs to all entry points"
+      - "Test with invalid dates, negative numbers, special characters"
+      - "Test concurrent access if applicable"
+
+    3_dependency_failure:
+      - "What happens when the database is slow?"
+      - "What happens when an external API returns 500?"
+      - "What happens when a config file is missing?"
+
+    4_document_everything:
+      - "Every failure mode goes into .loki/healing/failure-modes.json"
+      - "After 3+ similar failures, consolidate into semantic memory"
+```
+
+---
+
+## Multi-Agent Decomposition (AWS Transform Pattern)
+
+**Source:** AWS Transform uses specialized agents for different aspects of legacy analysis.
+
+```yaml
+decomposition_agents:
+  # Adapted from AWS Transform's agent categories
+  code_agent:
+    purpose: "Analyze source code artifacts -- types, LOC, complexity, dependencies, missing elements"
+    loki_mapping: "Run during archaeology phase with sonnet"
+
+  data_source_agent:
+    purpose: "Identify data sources (databases, files, configs) and their metadata"
+    loki_mapping: "Run during archaeology phase with sonnet"
+
+  decomposition_agent:
+    purpose: "Group code into logical domains using semantic seeding"
+    loki_mapping: "Run during isolate phase with opus"
+    aws_detail: |
+      AWS Transform uses semantic seeding to organize applications into
+      logical domains while dependency detection ensures proper separation.
+      This identifies the natural lines of demarcation for strangler fig slices.
+
+  # GitHub Copilot's 3-agent pattern for actual modernization
+  github_3_agent:
+    agent_1: "Extract business logic from legacy component"
+    agent_2: "Generate characterization tests that validate that logic"
+    agent_3: "Generate modern code that passes those tests"
+    source: "github.blog/ai-and-ml/github-copilot/how-github-copilot-and-ai-agents-are-saving-legacy-systems/"
+```
+
+---
+
+## Health Score Formula
+
+The health score is NOT a magic number. It is a weighted composite of measurable metrics.
+
+```yaml
+health_score:
+  formula: |
+    health = (0.30 * characterization_coverage)
+           + (0.25 * friction_resolution_rate)
+           + (0.20 * test_pass_rate)
+           + (0.15 * knowledge_extraction_completeness)
+           + (0.10 * static_analysis_improvement)
+
+  components:
+    characterization_coverage:
+      definition: "Critical paths with characterization tests / total critical paths"
+      range: "0.0 to 1.0"
+      weight: 0.30
+      rationale: "Feathers: untested code is the primary risk in legacy systems"
+
+    friction_resolution_rate:
+      definition: "Classified friction points / total friction points"
+      range: "0.0 to 1.0"
+      weight: 0.25
+      rationale: "Unknown friction is unmanaged risk (Amazon: friction-as-semantics)"
+
+    test_pass_rate:
+      definition: "Passing characterization tests / total characterization tests"
+      range: "0.0 to 1.0"
+      weight: 0.20
+      rationale: "Failing characterization tests mean behavior has changed"
+
+    knowledge_extraction_completeness:
+      definition: "Components with documented institutional knowledge / total components"
+      range: "0.0 to 1.0"
+      weight: 0.15
+      rationale: "Undocumented knowledge is permanent risk (Amazon: retiring devs)"
+
+    static_analysis_improvement:
+      definition: "1 - (current_warnings / baseline_warnings), clamped to [0, 1]"
+      range: "0.0 to 1.0"
+      weight: 0.10
+      rationale: "arXiv:2511.04427v2: +30% warnings negates velocity gains"
+```
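
Editorial sketch, not part of the package: the weighted formula above written as a small Python function, using the weights and the clamped static-analysis definition exactly as listed. The function names and the handling of a zero warning baseline are assumptions, since the formula does not say what happens when `baseline_warnings` is 0.

```python
WEIGHTS = {
    "characterization_coverage": 0.30,
    "friction_resolution_rate": 0.25,
    "test_pass_rate": 0.20,
    "knowledge_extraction_completeness": 0.15,
    "static_analysis_improvement": 0.10,
}


def clamp01(value: float) -> float:
    """Keep a component inside the documented 0.0 to 1.0 range."""
    return max(0.0, min(1.0, value))


def static_analysis_improvement(current_warnings: int, baseline_warnings: int) -> float:
    """1 - (current / baseline), clamped to [0, 1]."""
    if baseline_warnings <= 0:
        # Assumption: with no baseline warnings, any new warning counts as no improvement.
        return 1.0 if current_warnings <= 0 else 0.0
    return clamp01(1 - current_warnings / baseline_warnings)


def health_score(metrics: dict) -> float:
    """Weighted sum of the five components; each must be a ratio in [0, 1]."""
    return sum(weight * clamp01(metrics[name]) for name, weight in WEIGHTS.items())
```

Each ratio has to be computed from its own numerator and denominator before calling `health_score`; a component for which no denominator is tracked has to be supplied by the caller rather than inferred.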
+
+---
+
+## Healing-Specific Code Review
+
+When `loki heal` is active, the code review specialist pool includes:
+
+| Specialist | Focus | Trigger Keywords |
+|-----------|-------|-----------------|
+| **legacy-healing-auditor** | Behavioral preservation, friction safety, institutional knowledge | legacy, heal, migrate, cobol, fortran, refactor, modernize, deprecat |
+
+**Legacy Healing Auditor checks:**
+- Behavioral change without characterization test update (Feathers)
+- Removal of friction classified as `business_rule` or `unknown` (Amazon)
+- Missing anti-corruption layer for replaced components (Evans)
+- Institutional knowledge loss (deleted comments, removed error messages)
+- Breaking changes to undocumented APIs consumed by other systems
+
+---
+
+## Healing Metrics
+
+```
+.loki/healing/
+  friction-map.json            # All identified friction points
+  failure-modes.json           # Cataloged failure modes
+  institutional-knowledge.md   # Extracted tribal knowledge
+  healing-progress.json        # Component-by-component healing status
+  behavioral-baseline/         # Pre-healing system-boundary outputs
+  characterization-tests/      # Tests that capture current behavior
+```
+
+**Progress Tracking:**
+```json
+{
+  "codebase": "./src",
+  "started": "2026-01-25T10:00:00Z",
+  "components": [
+    {
+      "name": "billing/invoice",
+      "phase": "stabilize",
+      "critical_paths_total": 15,
+      "critical_paths_characterized": 12,
+      "friction_points": 12,
+      "friction_classified": 8,
+      "characterization_tests": 47,
+      "characterization_passing": 47,
+      "institutional_rules_extracted": 8,
+      "baseline_warnings": 23,
+      "current_warnings": 18,
+      "health_score": 0.74
+    }
+  ],
+  "overall_health": 0.74
+}
+```
+
+---
+
+## Healing Signals
+
+| Signal | Purpose | Emitted When |
+|--------|---------|-------------|
+| `FRICTION_DETECTED` | New friction point found | Archaeology scan finds quirky behavior |
+| `BEHAVIOR_CHANGE_RISK` | Proposed change may alter legacy behavior | Code review detects behavioral modification |
+| `INSTITUTIONAL_KNOWLEDGE_FOUND` | Tribal knowledge extracted from code | Comment/history analysis reveals business rule |
+| `HEALING_PHASE_COMPLETE` | Component completed a healing phase | Phase gate passed |
+| `LEGACY_COMPATIBILITY_RISK` | Breaking change to legacy API detected | System-boundary test fails |
+
+---
+
+## Environment Variables
+
+| Variable | Default | Purpose |
+|----------|---------|---------|
+| `LOKI_HEAL_MODE` | `false` | Enable healing mode |
+| `LOKI_HEAL_PHASE` | `archaeology` | Current healing phase |
+| `LOKI_HEAL_PRESERVE_FRICTION` | `true` | Warn before removing friction points |
+| `LOKI_HEAL_BASELINE_DIR` | `.loki/healing/behavioral-baseline/` | Pre-healing snapshots |
+| `LOKI_HEAL_STRICT` | `false` | Block ALL behavioral changes without approval |
+
+---
+
+## Known Limitations
+
+Honest assessment of what this module can and cannot do:
+
+| Capability | Status | Notes |
+|-----------|--------|-------|
+| Characterization test generation | Agent-guided | Agent writes tests following Feathers recipe, not fully automated |
+| Friction detection | Heuristic | Pattern matching for sleeps, retries, magic values. Not exhaustive. |
+| Equivalence verification | System-boundary | Not formal verification (HEC/arXiv:2506.02290 is research-stage) |
+| Multi-agent decomposition | Sequential | True parallel decomposition like AWS Transform requires cloud infra |
+| Institutional knowledge extraction | Best-effort | Comment/blame analysis. Cannot extract unwritten tribal knowledge. |
+| Scale | <50K LOC practical | RepoMod-Bench: pass rates drop to 15.3% above 50K LOC |
+
+---
+
+## Quick Reference
+
+```bash
+# Start healing a legacy codebase
+loki heal ./legacy-app
+
+# Archaeology only (extract knowledge, don't modify)
+loki heal ./legacy-app --archaeology-only
+
+# Resume healing from last checkpoint
+loki heal ./legacy-app --resume
+
+# View healing progress
+loki heal --status
+
+# View friction map
+loki heal --friction-map ./legacy-app
+
+# Generate healing report
+loki heal --report
+
+# Strict mode: block any behavioral change without approval
+LOKI_HEAL_STRICT=true loki heal ./legacy-app
+```
package/skills/quality-gates.md
CHANGED
@@ -2,7 +2,7 @@
 
 **Never ship code without passing all quality gates.**
 
-## The
+## The 10 Quality Gates
 
 1. **Input Guardrails** - Validate scope, detect injection, check constraints (OpenAI SDK)
 2. **Static Analysis** - CodeQL, ESLint/Pylint, type checking
@@ -13,6 +13,34 @@
 7. **Test Coverage Gates** - Unit: 100% pass, >80% coverage; Integration: 100% pass
 8. **Mock Detector** - Classifies internal vs external mocks; flags tests that never import source code, tautological assertions, and high internal mock ratios
 9. **Test Mutation Detector** - Detects assertion value changes alongside implementation changes (test fitting), low assertion density, and missing pass/fail tracking
+10. **Backward Compatibility** - Behavioral preservation, friction safety, institutional knowledge retention (healing mode)
+
+## Gate 10: Backward Compatibility & Behavioral Preservation (v6.67.0)
+
+**Triggered when:** `LOKI_HEAL_MODE=true` or `loki heal` is active, or diff touches files flagged in `.loki/healing/friction-map.json`.
+
+**Purpose:** Prevent accidental removal of institutional logic or behavioral changes to legacy code without explicit documentation.
+
+**Checks:**
+1. **Friction Safety** - If modified code matches a friction-map entry, verify `safe_to_remove` is true or `classification` is `true_bug`
+2. **Characterization Test Coverage** - Modified legacy components must have characterization tests in `.loki/healing/characterization-tests/`
+3. **Comment Preservation** - Deleted comments containing business rule keywords (hack, workaround, compliance, per requirement) must be extracted to `institutional-knowledge.md` first
+4. **Adapter Verification** - Replaced components must have an adapter layer that preserves the original interface
+5. **Behavioral Baseline** - If a baseline exists in `.loki/healing/behavioral-baseline/`, outputs must match or differences must be documented as intentional
+
+**Severity:**
+- Removing friction point classified as `business_rule` or `unknown` without approval = **Critical** (BLOCK)
+- Missing characterization test for modified legacy component = **High** (BLOCK)
+- Deleted business rule comment without knowledge extraction = **Medium** (BLOCK)
+- Missing adapter for replaced component = **High** (BLOCK)
+- Behavioral baseline mismatch without documentation = **Medium** (BLOCK)
+
+**Disabling (not recommended for healing mode):**
+```bash
+LOKI_GATE_BACKWARD_COMPAT=false # Disable gate 10
+```
+
+---
 
 ## Gate 8 and 9: Automated Test Integrity
 
@@ -285,7 +313,7 @@ velocity_quality_balance:
 
 ## Specialist Review Pool (v5.30.0)
 
-
+6 named expert reviewers. Select 3 per review based on change type.
 
 **Inspired by:** Compound Engineering Plugin's 14 named review agents -- specialized expertise catches more issues than generic reviewers.
 
@@ -296,6 +324,7 @@ velocity_quality_balance:
 | **architecture-strategist** | SOLID, coupling, cohesion, patterns, abstraction, dependency direction | *(always included -- design quality affects everything)* |
 | **test-coverage-auditor** | Missing tests, edge cases, error paths, boundary conditions | test, spec, coverage, assert, mock, fixture, expect, describe |
 | **dependency-analyst** | Outdated packages, CVEs, bloat, unused deps, license issues | package, import, require, dependency, npm, pip, yarn, lock |
+| **legacy-healing-auditor** | Behavioral preservation, friction safety, institutional knowledge | legacy, heal, migrate, cobol, fortran, refactor, modernize, deprecat |
 
 ### Selection Rules
 
package/skills/troubleshooting.md
CHANGED
@@ -708,6 +708,39 @@ EOF
 
 ---
 
+### Healing Signals (v6.67.0)
+
+These signals support legacy system healing workflows (see `skills/healing.md`):
+
+| Signal | Purpose | Creates | Consumes |
+|--------|---------|---------|----------|
+| `FRICTION_DETECTED` | New friction point found during archaeology | Healing agent | Orchestrator |
+| `BEHAVIOR_CHANGE_RISK` | Code change may alter legacy behavior | Code review | Healing agent |
+| `INSTITUTIONAL_KNOWLEDGE_FOUND` | Tribal knowledge extracted from code | Archaeology scan | Knowledge registry |
+| `HEALING_PHASE_COMPLETE` | Component completed a healing phase | Phase gate | Orchestrator |
+| `LEGACY_COMPATIBILITY_RISK` | Breaking change to legacy API detected | Adapter verification | Healing agent |
+
+**FRICTION_DETECTED Schema:**
+```json
+{
+  "timestamp": "2026-01-25T10:30:00Z",
+  "friction_id": "friction-042",
+  "location": "src/billing/invoice.py:234",
+  "behavior": "Sleep 2s before committing transaction",
+  "classification": "unknown",
+  "agent": "eng-001-healer"
+}
+```
+
+**Processing Rules:**
+- `FRICTION_DETECTED`: Add to friction-map.json, do NOT remove the friction until classified
+- `BEHAVIOR_CHANGE_RISK`: Pause modification, verify characterization tests exist
+- `INSTITUTIONAL_KNOWLEDGE_FOUND`: Append to institutional-knowledge.md
+- `HEALING_PHASE_COMPLETE`: Run phase gate hook before advancing
+- `LEGACY_COMPATIBILITY_RISK`: Block if in strict mode, warn otherwise
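
Editorial sketch, not part of the package: how two of the processing rules above might be handled in Python. It assumes the `FRICTION_DETECTED` payload shown in the schema and a friction map with a top-level `frictions` list. The function names, the mapping of `agent` to `discovered_by`, and the de-duplication by id are assumptions, not Loki behavior.

```python
import json
from pathlib import Path


def handle_friction_detected(signal: dict, healing_dir: Path) -> dict:
    """Record a friction point in friction-map.json; never removes anything."""
    path = healing_dir / "friction-map.json"
    data = json.loads(path.read_text()) if path.exists() else {"frictions": []}
    entry = {
        "id": signal["friction_id"],
        "location": signal["location"],
        "behavior": signal["behavior"],
        "classification": signal.get("classification", "unknown"),
        "discovered_by": signal.get("agent", "unknown"),  # assumed field mapping
        "timestamp": signal["timestamp"],
        "safe_to_remove": False,  # stays False until someone classifies the entry
    }
    # Assumption: a repeated signal for the same id is not stored twice.
    if all(existing["id"] != entry["id"] for existing in data["frictions"]):
        data["frictions"].append(entry)
    healing_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))
    return entry


def compatibility_risk_action(strict: bool) -> str:
    """LEGACY_COMPATIBILITY_RISK: block in strict mode, warn otherwise."""
    return "block" if strict else "warn"
```

A caller could derive `strict` from `LOKI_HEAL_STRICT`; how Loki actually reads that variable is not shown in this diff.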
+
+---
+
 ### Other Workflow Signals
 
 These signals coordinate parallel worktrees (see `skills/parallel-workflows.md`):