loki-mode 6.71.1 → 6.72.0

Files changed (91)
  1. package/README.md +9 -1
  2. package/SKILL.md +2 -2
  3. package/VERSION +1 -1
  4. package/autonomy/hooks/migration-hooks.sh +26 -0
  5. package/autonomy/loki +429 -92
  6. package/autonomy/run.sh +219 -38
  7. package/dashboard/__init__.py +1 -1
  8. package/dashboard/server.py +101 -19
  9. package/docs/INSTALLATION.md +20 -11
  10. package/docs/bug-fixes/agent-01-cli-fixes.md +101 -0
  11. package/docs/bug-fixes/agent-02-purplelab-fixes.md +88 -0
  12. package/docs/bug-fixes/agent-03-dashboard-fixes.md +119 -0
  13. package/docs/bug-fixes/agent-04-memory-fixes.md +105 -0
  14. package/docs/bug-fixes/agent-05-provider-fixes.md +86 -0
  15. package/docs/bug-fixes/agent-06-integration-fixes.md +101 -0
  16. package/docs/bug-fixes/agent-07-dash-run-fixes.md +101 -0
  17. package/docs/bug-fixes/agent-08-docker-fixes.md +164 -0
  18. package/docs/bug-fixes/agent-09-e2e-build-fixes.md +69 -0
  19. package/docs/bug-fixes/agent-10-e2e-fullstack-fixes.md +102 -0
  20. package/docs/bug-fixes/agent-11-e2e-session-fixes.md +70 -0
  21. package/docs/bug-fixes/agent-12-scenario-fixes.md +120 -0
  22. package/docs/bug-fixes/agent-13-enterprise-fixes.md +143 -0
  23. package/docs/bug-fixes/agent-14-uat-newuser-fixes.md +88 -0
  24. package/docs/bug-fixes/agent-15-uat-poweruser-fixes.md +132 -0
  25. package/docs/bug-fixes/agent-19-code-review.md +316 -0
  26. package/docs/bug-fixes/agent-20-architecture-review.md +331 -0
  27. package/docs/competitive/bolt-new-analysis.md +579 -0
  28. package/docs/competitive/emergence-others-analysis.md +605 -0
  29. package/docs/competitive/replit-lovable-analysis.md +622 -0
  30. package/docs/test-scenarios/edge-cases.md +813 -0
  31. package/docs/test-scenarios/enterprise-scenarios.md +732 -0
  32. package/mcp/__init__.py +1 -1
  33. package/mcp/server.py +49 -5
  34. package/memory/consolidation.py +33 -0
  35. package/memory/embeddings.py +10 -1
  36. package/memory/engine.py +83 -38
  37. package/memory/retrieval.py +36 -0
  38. package/memory/storage.py +56 -4
  39. package/memory/token_economics.py +14 -2
  40. package/memory/vector_index.py +36 -7
  41. package/package.json +1 -1
  42. package/providers/gemini.sh +89 -2
  43. package/templates/README.md +1 -1
  44. package/templates/cli-tool.md +30 -0
  45. package/templates/dashboard.md +4 -0
  46. package/templates/data-pipeline.md +4 -0
  47. package/templates/discord-bot.md +47 -0
  48. package/templates/game.md +4 -0
  49. package/templates/microservice.md +4 -0
  50. package/templates/npm-library.md +4 -0
  51. package/templates/rest-api-auth.md +50 -20
  52. package/templates/rest-api.md +15 -0
  53. package/templates/saas-starter.md +1 -1
  54. package/templates/slack-bot.md +36 -0
  55. package/templates/static-landing-page.md +9 -1
  56. package/templates/web-scraper.md +4 -0
  57. package/web-app/dist/assets/Badge-CeBkFjo6.js +1 -0
  58. package/web-app/dist/assets/Button-yuhqo8Fq.js +1 -0
  59. package/web-app/dist/assets/{Card-B1bV4syB.js → Card-BG17vsX0.js} +1 -1
  60. package/web-app/dist/assets/{HomePage-CZTV6Nea.js → HomePage-BMSQ7Apj.js} +3 -3
  61. package/web-app/dist/assets/{LoginPage-D4UdURJc.js → LoginPage-aH_6iolg.js} +1 -1
  62. package/web-app/dist/assets/{NotFoundPage-CCLSeL6j.js → NotFoundPage-Di8cNtB1.js} +1 -1
  63. package/web-app/dist/assets/ProjectPage-BtRssmw9.js +285 -0
  64. package/web-app/dist/assets/ProjectsPage-B-FTFagc.js +6 -0
  65. package/web-app/dist/assets/{SettingsPage-Xuv8EfAg.js → SettingsPage-DIJPBla4.js} +1 -1
  66. package/web-app/dist/assets/TeamsPage--19fNX7w.js +36 -0
  67. package/web-app/dist/assets/TemplatesPage-ChUQNOOv.js +11 -0
  68. package/web-app/dist/assets/TerminalOutput-Dwrzecyl.js +31 -0
  69. package/web-app/dist/assets/activity-BNRWeu9N.js +6 -0
  70. package/web-app/dist/assets/{arrow-left-CaGtolHc.js → arrow-left-Ce6g1_YE.js} +1 -1
  71. package/web-app/dist/assets/circle-alert-LIndawHL.js +11 -0
  72. package/web-app/dist/assets/clock-Bpj4VPlP.js +6 -0
  73. package/web-app/dist/assets/{external-link-CazyUyav.js → external-link-BhhdF0iQ.js} +1 -1
  74. package/web-app/dist/assets/folder-open-CM2LgfxI.js +11 -0
  75. package/web-app/dist/assets/index-8-KpWWq7.css +1 -0
  76. package/web-app/dist/assets/index-kPDW4e_b.js +236 -0
  77. package/web-app/dist/assets/lock-sAk3Xe54.js +16 -0
  78. package/web-app/dist/assets/search-CR-2i9by.js +6 -0
  79. package/web-app/dist/assets/server-DuFh4ymA.js +26 -0
  80. package/web-app/dist/assets/trash-2-BmkkT8V_.js +11 -0
  81. package/web-app/dist/index.html +2 -2
  82. package/web-app/server.py +1321 -53
  83. package/web-app/dist/assets/Badge-CBUx2PjL.js +0 -6
  84. package/web-app/dist/assets/Button-DsRiznlh.js +0 -21
  85. package/web-app/dist/assets/ProjectPage-D0w_X9tG.js +0 -237
  86. package/web-app/dist/assets/ProjectsPage-ByYxDlKC.js +0 -16
  87. package/web-app/dist/assets/TemplatesPage-BKWN07mc.js +0 -1
  88. package/web-app/dist/assets/TerminalOutput-Dj98V8Z-.js +0 -51
  89. package/web-app/dist/assets/clock-C_CDmobx.js +0 -11
  90. package/web-app/dist/assets/index-D452pFGl.css +0 -1
  91. package/web-app/dist/assets/index-Df4_kgLY.js +0 -196
@@ -0,0 +1,105 @@
1
+ # Agent 04: Memory System Functional Testing -- Bug Fix Report
2
+
3
+ ## Scope
4
+ Comprehensive review and fix of all 16 Python modules in `memory/` (~10K lines).
5
+ Addressed bugs from `docs/BUG-AUDIT-v6.61.0.md` plus newly discovered issues.
6
+
7
+ ## Bugs Fixed
8
+
9
+ ### BUG-MEM-001 (Audit) -- Episode ID date parsing produces garbage paths
10
+ - **File**: `memory/engine.py` method `get_episode()`
11
+ - **Root cause**: Fixed-offset parsing with `parts[1]`/`parts[2]`/`parts[3]` assumed a two-character prefix like `ep-`. Variable-length prefixes (e.g., `episode-YYYY-MM-DD-xxx`) shifted the offsets, producing wrong date directories.
12
+ - **Fix**: Replaced fixed-offset parsing with regex `re.search(r'(\d{4})-(\d{2})-(\d{2})', episode_id)` that extracts the date from anywhere in the ID string. Falls back to full directory scan if no date pattern is found.
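A minimal sketch of the regex-based lookup described above. The helper name `extract_episode_date` is hypothetical and not taken from `memory/engine.py`; only the pattern itself comes from the report.

```python
import re
from typing import Optional, Tuple

# Pattern quoted in the report: a YYYY-MM-DD run anywhere in the ID.
_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_episode_date(episode_id: str) -> Optional[Tuple[str, str, str]]:
    """Return (year, month, day) from the first date-like run in the ID, or None."""
    match = _DATE_RE.search(episode_id)
    if match is None:
        # No date found: a caller would fall back to a full directory scan.
        return None
    return match.groups()
```

Because the search is position-independent, `ep-2025-01-15-abc` and `episode-2025-01-15-abc` resolve to the same date directory.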
13
+
14
+ ### BUG-MEM-002 (Mission) -- Semantic search returns stale results after consolidation
15
+ - **File**: `memory/retrieval.py` methods `retrieve_by_similarity()`, `build_indices()`
16
+ - **Root cause**: After consolidation modifies `patterns.json`, the in-memory vector index still holds old embeddings. Searches return outdated results.
17
+ - **Fix**: Added `_indices_built_at` timestamp tracking. `build_indices()` records the build time. `retrieve_by_similarity()` compares patterns.json mtime against the build timestamp and falls back to keyword search when stale. Added `mark_indices_stale()` method for explicit invalidation.
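The staleness check can be sketched roughly as follows. The function names are hypothetical, and the handling of a missing patterns file is an assumption; the report only states that the file's mtime is compared against the recorded build time.

```python
import os
from typing import Optional


def is_index_stale(built_at: Optional[float], patterns_mtime: float) -> bool:
    """True when the index was never built or the patterns file changed after the build."""
    if built_at is None:
        return True
    return patterns_mtime > built_at


def patterns_are_stale(patterns_path: str, built_at: Optional[float]) -> bool:
    """Compare the patterns file's mtime with the recorded index build time."""
    if built_at is None:
        return True  # explicit invalidation or never built
    try:
        mtime = os.path.getmtime(patterns_path)
    except OSError:
        # Assumption: a missing or unreadable file cannot be newer than the index.
        return False
    return is_index_stale(built_at, mtime)
```

In this sketch, explicit invalidation corresponds to resetting the stored build time to `None`, which forces the next similarity query onto the keyword fallback until the indices are rebuilt.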
18
+
19
+ ### BUG-MEM-003 (Mission) -- Consolidation pipeline has no locking
20
+ - **File**: `memory/consolidation.py` method `consolidate()`
21
+ - **Root cause**: The consolidation pipeline performs multiple read-modify-write operations on patterns.json without any exclusive locking. Concurrent consolidation runs (e.g., from parallel agents) corrupt data.
22
+ - **Fix**: Added file-based exclusive lock (`fcntl.flock`) via `.consolidation.lock`. The `consolidate()` method acquires the lock before delegating to the new `_consolidate_locked()` method. Lock is always released in a finally block, and the lock file is cleaned up.
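A hedged sketch of the locking pattern, assuming a Unix host (`fcntl` is not available on Windows). `exclusive_lock`, `consolidate`, and `DEFAULT_LOCK_PATH` are illustrative names; the real `consolidate()` delegates to `_consolidate_locked()`, which is represented here by a generic `work` callable.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical default location; the report only names the file `.consolidation.lock`.
DEFAULT_LOCK_PATH = os.path.join(tempfile.gettempdir(), ".consolidation.lock")


@contextmanager
def exclusive_lock(lock_path: str = DEFAULT_LOCK_PATH):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield fd
    finally:
        # Closing the descriptor also drops the lock; unlock explicitly for clarity.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def consolidate(work, lock_path: str = DEFAULT_LOCK_PATH):
    """Run work() (a stand-in for the locked pipeline) while holding the lock."""
    with exclusive_lock(lock_path):
        return work()
```

Unlike the fix described above, this sketch leaves the lock file on disk after release.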
23
+
24
+ ### BUG-MEM-004 (Mission) -- Memory engine doesn't validate schema versions
25
+ - **File**: `memory/engine.py` class `MemoryEngine`
26
+ - **Root cause**: No version validation when loading memory data files. Incompatible schema versions could silently produce wrong results or corrupt data.
27
+ - **Fix**: Added `SUPPORTED_SCHEMA_VERSIONS` set and `CURRENT_SCHEMA_VERSION` constant to `MemoryEngine`. Added `_validate_schema_version()` method that checks version fields in loaded data; it is called during `initialize()` for index.json and timeline.json, logs warnings for unsupported versions, and auto-assigns a version to legacy data that lacks one. Changed hardcoded `"1.0"` to `self.CURRENT_SCHEMA_VERSION` in new file creation.
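The validation step might look roughly like this. The constant values and the `version` field name are assumptions, since the report names the constants without listing their contents, and `validate_schema_version` is a free-standing stand-in for the engine method.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Assumed values: the report names these constants but does not list their contents.
CURRENT_SCHEMA_VERSION = "1.0"
SUPPORTED_SCHEMA_VERSIONS = {"1.0"}


def validate_schema_version(data: Dict[str, Any], source: str) -> bool:
    """Return True if data carries a supported version; stamp legacy data in place."""
    version = data.get("version")  # field name is an assumption
    if version is None:
        # Legacy data without a version is assigned the current one.
        data["version"] = CURRENT_SCHEMA_VERSION
        return True
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        logger.warning("%s has unsupported schema version %r", source, version)
        return False
    return True
```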
28
+
29
+ ### BUG-MEM-005 (Mission) -- Token counter overflows for large sessions
30
+ - **File**: `memory/token_economics.py` methods `record_discovery()`, `record_read()`
31
+ - **Root cause**: Token counters grew unbounded in very long sessions. While Python ints don't overflow, downstream JSON serializers and dashboard charts can choke on extremely large numbers.
32
+ - **Fix**: Added `_MAX_TOKEN_COUNTER = 10_000_000_000` class constant. Both `record_discovery()` and `record_read()` now cap their accumulated values at this limit using `min()`.
33
+
34
+ ### BUG-MEM-006 (Mission) -- Embedding model fallback dimension mismatch warning
35
+ - **File**: `memory/embeddings.py` method `embed()`
36
+ - **Root cause**: When the primary embedding provider fails at runtime and falls back to a provider with a different dimension (e.g., OpenAI 1536 -> local 384), callers holding references to VectorIndex objects created with the original dimension get dimension mismatch errors. No warning was issued.
37
+ - **Fix**: Added dimension change detection after runtime fallback. Logs an explicit warning when the dimension changes, informing callers that existing vector indices may need to be rebuilt.
38
+
39
+ ### BUG-MEM-007 (Mission) -- Vector index not rebuilt after consolidation
40
+ - **File**: `memory/consolidation.py` class `ConsolidationResult`
41
+ - **Root cause**: After consolidation creates or merges patterns, vector indices are not notified and continue serving stale data.
42
+ - **Fix**: Added `vector_index_stale` boolean flag to `ConsolidationResult`. The flag is set to `True` when patterns are created or merged, or when anti-patterns are created. Callers can check this flag and rebuild indices accordingly.
43
+
44
+ ### BUG-MEM-013 (Audit) -- Missing encoding on vector index JSON sidecar write
45
+ - **File**: `memory/vector_index.py` method `save()`
46
+ - **Root cause**: JSON sidecar files were written without specifying encoding. On systems with non-UTF-8 default locale, non-ASCII metadata caused encoding errors.
47
+ - **Fix**: Replaced direct file write with atomic write pattern (tempfile + `os.replace`). Added `encoding="utf-8"` and `ensure_ascii=False` to the JSON dump.
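The atomic write pattern named above (temp file plus `os.replace`, with explicit UTF-8) can be sketched like this. `atomic_write_json` is an illustrative helper, not the actual body of `save()`, and the `fsync` call is an extra precaution the report does not mention.

```python
import json
import os
import tempfile
from typing import Any


def atomic_write_json(path: str, data: Any) -> None:
    """Write data as UTF-8 JSON so readers see either the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False)
            handle.flush()
            os.fsync(handle.fileno())  # assumption: not stated in the report
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```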
48
+
49
+ ### NEW BUG -- Non-atomic npz file write in vector index
50
+ - **File**: `memory/vector_index.py` method `save()`
51
+ - **Root cause**: `np.savez()` writes directly to the target path. A crash during write could leave a corrupt npz file, breaking index loading.
52
+ - **Fix**: Write to a temp file first, then atomically rename using `os.replace()`.
53
+
54
+ ### NEW BUG -- TOCTOU race in increment_pattern_usage
55
+ - **File**: `memory/engine.py` method `increment_pattern_usage()`
56
+ - **Root cause**: Used `read_json()` + `write_json()` as separate operations to update a pattern's usage count. Another concurrent write could overwrite the changes between the read and write.
57
+ - **Fix**: Replaced with `load_pattern()` + `_dict_to_pattern()` + `save_pattern()` which performs the full upsert under an exclusive file lock via the storage layer.
58
+
59
+ ### NEW BUG -- Timeline TOCTOU race in engine
60
+ - **File**: `memory/engine.py` method `_update_timeline_with_episode()`
61
+ - **Root cause**: Used `read_json()` + `write_json()` (separate lock acquisitions) to update timeline.json. Concurrent episode storage could lose timeline entries.
62
+ - **Fix**: Delegated to `self.storage.update_timeline(action_entry)` which performs the full read-modify-write under a single exclusive lock.
63
+
64
+ ## Bugs Verified as Already Fixed
65
+
66
+ The following bugs from the audit were already fixed in the current codebase:
67
+
68
+ | Bug ID | Description | How verified |
69
+ |--------|-------------|-------------|
70
+ | BUG-MEM-004 (Audit) | `cluster_by_similarity` uses `list.index()` on duplicates | Code at line 300 uses `member_indices` tracking instead |
71
+ | BUG-MEM-005 (Audit) | Anti-pattern dedup misses current-run duplicates | Code at lines 228-230 adds to `existing_patterns` within loop |
72
+ | BUG-MEM-006 (Audit) | Non-atomic `index.json` write in layers | `memory/layers/` directory does not exist; storage.py uses `_atomic_write` |
73
+ | BUG-MEM-007 (Audit) | Non-atomic `timeline.json` write in layers | Same as above |
74
+ | BUG-MEM-009 (Audit) | `apply_decay` float comparison causes unnecessary rewrites | Code at line 1245 uses `abs(...) > 0.001` tolerance |
75
+ | BUG-MEM-011 (Audit) | `_to_utc_isoformat` edge case with custom tzinfo | Code uses `dt.utcoffset()` comparison, not deprecated `utctimetuple()` |
76
+ | BUG-MEM-012 (Audit) | Redundant filesystem scan in token economics | `_full_load_baseline` caching works correctly |
77
+ | BUG-MEM-014 (Audit) | `AttributeError` on dict-typed actions in `_episode_to_text` | Code handles both dict and object types with `isinstance` checks |
78
+
79
+ ## Validation
80
+
81
+ All 15 Python files in `memory/` pass `ast.parse()` syntax validation:
82
+ - `__init__.py`, `consolidation.py`, `cross_project.py`, `embeddings.py`, `engine.py`
83
+ - `knowledge_graph.py`, `namespace.py`, `rag_injector.py`, `retrieval.py`, `schemas.py`
84
+ - `storage.py`, `test_importance.py`, `token_economics.py`, `unified_access.py`, `vector_index.py`
85
+
86
+ ## Edge Cases Analyzed
87
+
88
+ 1. **Empty data**: All methods handle empty lists/dicts gracefully with early returns.
89
+ 2. **Unicode**: JSON sidecar writes now use `encoding="utf-8"` and `ensure_ascii=False`.
90
+ 3. **Very large episodes**: Token counters capped at 10 billion to prevent JSON serialization issues.
91
+ 4. **Concurrent access**: Consolidation pipeline now has exclusive lock; pattern updates use storage-level locking; timeline updates use storage-level locking.
92
+ 5. **Schema version drift**: Engine now validates schema versions on load and warns about incompatible versions.
93
+ 6. **ID format variations**: Episode lookup now uses regex to extract dates from any position in the ID string.
94
+ 7. **File corruption during crash**: Vector index npz and JSON sidecar files now use atomic write (temp file + rename).
95
+
96
+ ## Files Modified
97
+
98
+ | File | Changes |
99
+ |------|---------|
100
+ | `memory/engine.py` | Fixed episode ID parsing, added schema version validation, fixed TOCTOU races in pattern usage and timeline updates |
101
+ | `memory/retrieval.py` | Added index staleness detection, build timestamp tracking, `mark_indices_stale()` |
102
+ | `memory/consolidation.py` | Added exclusive file lock for consolidation pipeline, `vector_index_stale` flag |
103
+ | `memory/token_economics.py` | Added token counter overflow cap |
104
+ | `memory/embeddings.py` | Added dimension change warning on runtime fallback |
105
+ | `memory/vector_index.py` | Atomic writes for both npz and JSON sidecar files, encoding fix |
@@ -0,0 +1,86 @@
1
+ # Agent 05: Provider System Functional Testing - Bug Fixes
2
+
3
+ ## Summary
4
+
5
+ Tested all 5 provider invocation paths (Claude, Codex, Gemini, Cline, Aider) and fixed 5 bugs across `autonomy/run.sh` and `providers/gemini.sh`. Also identified and fixed 1 new undocumented bug.
6
+
7
+ ## Bugs Fixed
8
+
9
+ ### BUG-PROV-001: Gemini ignores tier_param for model selection (FIXED)
10
+
11
+ **Root cause:** The Gemini invocation in `run.sh` (main iteration loop) used `PROVIDER_MODEL` (frozen at source-time) instead of `tier_param` (dynamically resolved per iteration via `resolve_model_for_tier()`). Regardless of RARV tier, Gemini always used the same model.
12
+
13
+ **Fix locations:**
14
+ - `autonomy/run.sh` (line ~9685): Changed `local model="${PROVIDER_MODEL:-...}"` to `local model="$tier_param"` in the Gemini case block
15
+ - `autonomy/run.sh` `invoke_gemini()` (line ~3029): Changed from `PROVIDER_MODEL` to `provider_get_current_model()` with fallback
16
+ - `autonomy/run.sh` `invoke_gemini_capture()` (line ~3068): Same fix as above
17
+
18
+ ### BUG-PROV-003: Claude health check breaks OAuth users; Gemini lacks key rotation (FIXED)
19
+
20
+ **Root cause (Claude):** `check_provider_health()` required `ANTHROPIC_API_KEY` env var. Users authenticating via OAuth (no API key) were marked unhealthy, triggering unnecessary failover to degraded providers.
21
+
22
+ **Root cause (Gemini):** No support for API key rotation when keys expire or hit quota. No support for `GEMINI_API_KEY` env var alias or gcloud ADC.
23
+
24
+ **Fix locations:**
25
+ - `autonomy/run.sh` `check_provider_health()`: Claude now checks for OAuth session files (`~/.claude/.credentials.json`) and `claude auth status` as fallback. Gemini now checks `GEMINI_API_KEY` and gcloud ADC.
26
+ - `providers/gemini.sh`: Added `_gemini_resolve_api_key()` for key resolution from multiple sources (`GOOGLE_API_KEY`, `GEMINI_API_KEY`, gcloud ADC).
27
+ - `providers/gemini.sh`: Added `_gemini_rotate_api_key()` for rotating through `LOKI_GEMINI_API_KEYS` (comma-separated list) on auth errors (401/403).
28
+ - `providers/gemini.sh` `provider_invoke()` and `provider_invoke_with_tier()`: Added auth error detection and key rotation before rate-limit fallback.
29
+ - `autonomy/run.sh` Gemini invocation block: Added auth error detection and key rotation.
30
+
31
+ ### BUG-PROV-008: Failover updates PROVIDER_NAME but not LOKI_PROVIDER (FIXED)
32
+
33
+ **Root cause:** After failover, `PROVIDER_NAME` was updated but `LOKI_PROVIDER` env var (read by subprocesses and MCP server) retained the old provider name. Child processes and the MCP server reported the wrong provider.
34
+
35
+ **Fix locations:**
36
+ - `autonomy/run.sh` `attempt_provider_failover()`: Added `LOKI_PROVIDER="$provider"; export LOKI_PROVIDER` after updating `PROVIDER_NAME`
37
+ - `autonomy/run.sh` `check_primary_recovery()`: Same fix when switching back to primary provider
38
+
39
+ ### NEW BUG: LOKI_CURRENT_TIER never exported (FOUND AND FIXED)
40
+
41
+ **Root cause:** `providers/gemini.sh:provider_get_current_model()` reads `LOKI_CURRENT_TIER` to resolve the model dynamically. However, `run.sh` only sets `CURRENT_TIER` (without the `LOKI_` prefix) and never exports it. As a result, `provider_get_current_model()` always defaults to "planning" tier, negating the dynamic tier resolution for all Gemini helper functions (`invoke_gemini`, `invoke_gemini_capture`).
42
+
43
+ **Fix locations:**
44
+ - `autonomy/run.sh` (line ~1366): Set and export `LOKI_CURRENT_TIER` at initialization
45
+ - `autonomy/run.sh` (line ~9424): Update and export `LOKI_CURRENT_TIER` when `CURRENT_TIER` changes each iteration
46
+
47
+ ## Bugs Already Fixed (Verified)
48
+
49
+ These bugs were listed in the assignment but had already been resolved in the current codebase:
50
+
51
+ | Bug ID | Description | Status |
52
+ |--------|-------------|--------|
53
+ | BUG-PROV-002 | Generic LOKI_MODEL_* injects invalid Codex models | Fixed: `_codex_validate_model()` in `codex.sh` filters non-Codex model names |
54
+ | BUG-PROV-005 | Provider loader doesn't validate provider exists before sourcing | Fixed: `load_provider()` validates name AND checks file existence |
55
+ | BUG-PROV-007 | auto_detect_provider skips Cline and Aider | Fixed: All 5 providers in priority order |
56
+ | BUG-PROV-009 | Cline model flag word-splitting | Fixed: Array-based `model_args` in `cline.sh` |
57
+ | BUG-PROV-010 | Gemini buffers all output, loses streaming | Fixed: Uses `tee` for streaming |
58
+ | BUG-PROV-012 | Codex resolve_model_for_tier returns effort levels | Fixed: Documented as intentional, callers use correctly |
59
+ | BUG-RUN-010 | Retry counter increments on success | Fixed: `retry=0` reset on success at lines 9851/9897 |
60
+ | BUG-PROV-011 | Parallel dispatch includes Cline despite PARALLEL=false | Fixed: Guard at line 2235 checks `PROVIDER_HAS_PARALLEL` |
61
+
62
+ ## Validation
63
+
64
+ ### Bash syntax validation (all pass)
65
+ - `bash -n providers/claude.sh` -- OK
66
+ - `bash -n providers/codex.sh` -- OK
67
+ - `bash -n providers/gemini.sh` -- OK
68
+ - `bash -n providers/cline.sh` -- OK
69
+ - `bash -n providers/aider.sh` -- OK
70
+ - `bash -n providers/loader.sh` -- OK
71
+ - `bash -n autonomy/run.sh` -- OK
72
+
73
+ ### Edge cases verified
74
+ 1. **API key missing**: `check_provider_health()` handles all 5 providers; Claude supports OAuth fallback
75
+ 2. **CLI not installed**: All provider detect functions use `command -v` with proper error handling
76
+ 3. **Version mismatch**: Provider version functions safely call `--version` with stderr suppression
77
+ 4. **Failover chain**: Wraps around correctly using double-iteration with break-on-wrap guard
78
+ 5. **Key rotation**: `_gemini_rotate_api_key()` handles single key, wraps around, and returns failure when exhausted
79
+ 6. **Frozen model variable**: All Gemini invocation paths now use dynamic resolution
80
+
81
+ ## Files Modified
82
+
83
+ | File | Changes |
84
+ |------|---------|
85
+ | `autonomy/run.sh` | BUG-PROV-001 (Gemini model selection), BUG-PROV-003 (health check + auth), BUG-PROV-008 (LOKI_PROVIDER export), LOKI_CURRENT_TIER export |
86
+ | `providers/gemini.sh` | BUG-PROV-003 (API key resolution + rotation functions, auth error handling in invoke functions) |
@@ -0,0 +1,101 @@
1
+ # Agent 06: Purple Lab + CLI Integration Fixes
2
+
3
+ ## Summary
4
+
5
+ Investigated and fixed 5 integration bugs between Purple Lab (web-app/server.py) and the loki CLI (autonomy/loki, autonomy/run.sh). All bugs were at the boundary where the web server dispatches to the CLI or reads CLI-produced state files.
6
+
7
+ ## Bugs Fixed
8
+
9
+ ### BUG-INT-001: Quick-start API doesn't pass provider selection to CLI
10
+
11
+ **File:** `web-app/server.py` (start_session endpoint, line ~2537)
12
+
13
+ **Root cause:** When `req.mode == "quick"`, the command built was `loki quick <description>` without passing the provider. The `--provider` flag was only included in the `else` branch (full `loki start` mode). Since `loki quick` does not accept a `--provider` flag, the fix passes the provider via the `LOKI_PROVIDER` environment variable, which `run.sh` reads at line 665.
14
+
15
+ **Fix:** After constructing `build_env`, inject `LOKI_PROVIDER` from `req.provider` for all modes (both quick and start). This ensures the correct AI provider is used regardless of invocation mode.
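A rough sketch of the environment-variable passthrough. The function names are hypothetical; the `"claude"` default mirrors the fallback mentioned in this report's verification notes, and the child process here is a plain Python interpreter standing in for the `loki` CLI.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def build_env_with_provider(provider: Optional[str],
                            base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and set LOKI_PROVIDER for every invocation mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["LOKI_PROVIDER"] = provider or "claude"
    return env


def provider_seen_by_child(env: Dict[str, str]) -> str:
    """Spawn a child interpreter and report the LOKI_PROVIDER value it observes."""
    code = "import os; print(os.environ.get('LOKI_PROVIDER', ''))"
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Setting the variable on a per-subprocess copy, rather than mutating the server's own `os.environ`, keeps concurrent sessions from overwriting each other's provider.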
16
+
17
+ ---
18
+
19
+ ### BUG-INT-002: Session state file format mismatch between web and CLI
20
+
21
+ **Files:** `web-app/server.py` (3 locations), `autonomy/run.sh`
22
+
23
+ **Root cause:** The web server read session state from `.loki/state/session.json`, but the CLI never writes that file. The CLI writes:
24
+ - `.loki/dashboard-state.json` (via `write_dashboard_state()` in run.sh) -- contains phase, iteration, complexity, tasks, tokens, agents
25
+ - `.loki/state/orchestrator.json` -- contains currentPhase
26
+ - `.loki/autonomy-state.json` -- contains retryCount, iterationCount, status
27
+
28
+ The web server was reading from a nonexistent file, so status fields (phase, iteration, complexity, cost, pending tasks) were always default values.
29
+
30
+ **Fix:** Changed 3 locations in server.py to read from `dashboard-state.json` (primary) with `state/orchestrator.json` fallback:
31
+ 1. `get_status()` endpoint (GET /api/session/status)
32
+ 2. `_push_state_to_client()` WebSocket push loop
33
+ 3. `_infer_session_status()` for session history
34
+
35
+ Field mapping updated to match `dashboard-state.json` structure:
36
+ - `tasks.pending` instead of `pending_tasks` (nested object)
37
+ - `tasks.inProgress` for current task detection
38
+ - `tokens.cost_usd` for cost (same structure, just different file)
39
+
40
+ ---
41
+
42
+ ### BUG-INT-003: WebSocket connection drops during long builds (no reconnect)
43
+
44
+ **Files:** `web-app/src/api/client.ts`, `web-app/server.py`
45
+
46
+ **Root cause:** The server sends keepalive pings every 60 seconds of client silence (line 5400). If 2 consecutive pings receive no pong response, the server disconnects the WebSocket (lines 5396-5398). The client's `PurpleLabWebSocket` class parsed incoming messages and emitted events but never handled the `ping` message type; it just passed the message through as an event that no listener was registered for. During long builds, the client sends no messages, so the server disconnects after ~120 seconds.
47
+
48
+ The client already had reconnect logic (3-second delay after disconnect), but reconnection during a build causes loss of the log backfill window and a brief UI disruption.
49
+
50
+ **Fix:** Added ping/pong handling in the client's `onmessage` handler. When the client receives a `{type: "ping"}` message, it immediately responds with `{type: "pong"}` via `this.send()`, preventing the server from closing the connection.
51
+
52
+ ---
53
+
54
+ ### BUG-INT-004: File watcher ignores changes based on absolute path
55
+
56
+ **File:** `web-app/server.py` (FileChangeHandler._should_ignore)
57
+
58
+ **Root cause:** The `_should_ignore` method decomposed the FULL absolute path into parts and checked each part against `_WATCH_IGNORE_DIRS` (which includes "build", "dist", "cache", ".git", etc.). If the project was stored at a path containing any of these directory names (e.g., `/home/user/build/my-project/src/app.js`), ALL file events would be silently ignored.
59
+
60
+ The check should only examine path components RELATIVE to the project directory, since only directories within the project should be filtered.
61
+
62
+ **Fix:** Changed `_should_ignore` to compute `os.path.relpath(path, self.project_dir)` before decomposing into parts. This ensures only project-internal directory names are checked against the ignore list.
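A minimal sketch of the relative-path check, assuming POSIX-style paths. The ignore set lists only the names quoted in the report, and returning `False` on `ValueError` is an assumption about how the cross-drive case is handled.

```python
import os

# Subset of the ignore list quoted in the report; the real set contains more entries.
_WATCH_IGNORE_DIRS = {"build", "dist", "cache", ".git"}


def should_ignore(path: str, project_dir: str) -> bool:
    """Ignore a path only if an ignored name appears inside the project directory."""
    try:
        relative = os.path.relpath(path, project_dir)
    except ValueError:
        # Raised on Windows when the two paths are on different drives (assumed: do not ignore).
        return False
    return any(part in _WATCH_IGNORE_DIRS for part in relative.split(os.sep))
```

With this check, `/home/user/build/my-project/src/app.js` is watched even though `build` appears in the absolute path, while `/home/user/build/my-project/dist/app.js` is still filtered.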
63
+
64
+ ---
65
+
66
+ ### BUG-INT-005 (NEW): Chat endpoint hardcodes provider as "claude"
67
+
68
+ **File:** `web-app/server.py` (chat_session endpoint, line ~3957)
69
+
70
+ **Root cause:** In the chat endpoint's "max" mode, the command was hardcoded as `[loki, "start", "--provider", "claude", str(prd_path)]`. Users who selected a different provider (codex, gemini) would have their chat commands always routed to Claude. Similarly, "quick" and "standard" modes did not pass any provider information.
71
+
72
+ Additionally, 3 other `loki quick` invocations (monitor auto-fix, Docker service fix, fix endpoint) also had no provider passthrough.
73
+
74
+ **Fix:**
75
+ 1. Chat endpoint now reads the provider from `session.provider` and `.loki/state/provider` file
76
+ 2. Max mode passes the detected provider to `--provider` flag
77
+ 3. Quick/standard modes pass provider via `LOKI_PROVIDER` env var
78
+ 4. Fix endpoint (`/api/sessions/{id}/fix`) passes provider via env
79
+ 5. Monitor auto-fix (`_auto_fix` method) reads provider from session state
80
+ 6. Docker service auto-fix reads provider from session state
81
+
82
+ ## Files Modified
83
+
84
+ | File | Changes |
85
+ |------|---------|
86
+ | `web-app/server.py` | BUG-INT-001 through BUG-INT-005: provider passthrough, state file path correction, file watcher relative path |
87
+ | `web-app/src/api/client.ts` | BUG-INT-003: WebSocket ping/pong handler |
88
+
89
+ ## Verification
90
+
91
+ - Python syntax validated: `python3 -c "import ast; ast.parse(open('web-app/server.py').read())"`
92
+ - TypeScript changes verified (no new errors beyond pre-existing Vite import.meta issues)
93
+ - All fixes are backward-compatible (fallback to "claude" provider, fallback to orchestrator.json)
94
+
95
+ ## Edge Cases Considered
96
+
97
+ 1. **Concurrent sessions**: Each chat task creates its own subprocess with its own env, so provider isolation is maintained
98
+ 2. **Missing provider file**: Falls back to `session.provider` then to `"claude"` default
99
+ 3. **Project directory in ignored path**: Fixed by relative path computation; `os.path.relpath` handles cross-drive paths on Windows via ValueError catch
100
+ 4. **WebSocket reconnection during build**: Client now responds to pings, preventing premature disconnection; if disconnection still occurs, the 3-second reconnect timer handles recovery
101
+ 5. **State file corruption**: All JSON reads wrapped in try/except with fallback defaults
@@ -0,0 +1,101 @@
1
+ # Agent 07: Dashboard + run.sh Integration Bug Fixes
2
+
3
+ ## Area: Dashboard API (server.py) <-> Orchestrator (run.sh) Integration
4
+
5
+ ## Known Bugs -- Verification Status
6
+
7
+ All 7 known bugs (BUG-RUN-001 through BUG-RUN-010) were already patched in the codebase
8
+ with fix comments. Verified each is correctly addressed:
9
+
10
+ | Bug ID | Description | Status |
11
+ |--------|-------------|--------|
12
+ | BUG-RUN-001 | Completion promise checks stale daily log | FIXED (line 9873: uses `$iter_output`) |
13
+ | BUG-RUN-002 | Rate limit detection greps stale daily log | FIXED (line 9910: uses `$iter_output`) |
14
+ | BUG-RUN-003 | ITERATION_COUNT never persisted across restarts | FIXED (line 7964: restored from state) |
15
+ | BUG-RUN-004 | Inconsistent JSON formats in state files | FIXED (queue normalization via jq at line 3308) |
16
+ | BUG-RUN-005 | OpenSpec queue has no deduplication | FIXED (line 8665: `existing_ids` check) |
17
+ | BUG-RUN-009 | Gate escalation PAUSE writes to wrong path | FIXED (line 9804: `touch .loki/PAUSE`) |
18
+ | BUG-RUN-010 | Retry counter increments on success | FIXED (lines 9852, 9898: `retry=0`) |
19
+
20
+ ## New Bugs Found and Fixed
21
+
22
+ ### BUG-NEW-001: WebSocket push inflates running_agents count
23
+ - **File:** `dashboard/server.py` line 366
24
+ - **Root cause:** `_push_loki_state_loop` counted `len(agents_list)` from the JSON
25
+ without validating PIDs. Dead agents still appeared as running. The REST endpoint
26
+ `get_status` correctly validated each PID with `os.kill(pid, 0)`.
27
+ - **Impact:** Dashboard WebSocket clients show ghost agents that are actually dead.
28
+ - **Fix:** Added PID validation loop matching `get_status` behavior.
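The PID validation can be sketched as below, assuming a POSIX host and agent entries that carry a `pid` key (the entry layout is an assumption; the report only says the list comes from JSON). Signal `0` performs the existence check without delivering a signal.

```python
import os
from typing import Any, Dict, List


def pid_is_alive(pid: Any) -> bool:
    """Return True if a process with this PID currently exists."""
    if not isinstance(pid, int) or pid <= 0:
        return False  # also guards against signalling a process group
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def count_running_agents(agents_list: List[Dict[str, Any]]) -> int:
    """Count only the agents whose recorded PID is still alive."""
    return sum(1 for agent in agents_list if pid_is_alive(agent.get("pid")))
```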
29
+
30
+ ### BUG-NEW-002: Dashboard drops tasks in object-format queue files
31
+ - **File:** `dashboard/server.py` line 1081
32
+ - **Root cause:** `list_tasks` only handled plain array `[...]` queue files. If a queue
33
+ file was written in `{"tasks": [...]}` format (which `load_queue_tasks` in run.sh
34
+ explicitly supports), all tasks were silently dropped.
35
+ - **Impact:** Tasks written by external tools using object format are invisible in dashboard.
36
+ - **Fix:** Added dict-unwrapping: `raw_items.get("tasks", [])` before array check.
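The dict-unwrapping step, sketched with a hypothetical helper `unwrap_queue`; only the `raw_items.get("tasks", [])` call comes from the report.

```python
from typing import Any, List


def unwrap_queue(raw_items: Any) -> List[Any]:
    """Accept both `[...]` and `{"tasks": [...]}` queue layouts."""
    if isinstance(raw_items, dict):
        raw_items = raw_items.get("tasks", [])
    # Anything that is still not a list is treated as an empty queue.
    return raw_items if isinstance(raw_items, list) else []
```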
37
+
38
+ ### BUG-NEW-003: Per-iteration temp files leak on success paths
39
+ - **File:** `autonomy/run.sh` lines 9853 and 9899
40
+ - **Root cause:** The success `continue` paths (perpetual mode + normal success) skip
41
+ `rm -f "$iter_output"`. Only the terminal completion paths (council/promise fulfilled)
42
+ and the failure path clean up. Over hundreds of iterations, `.loki/logs/iter-output-*`
43
+ files accumulate.
44
+ - **Impact:** Disk space leak proportional to iteration count. Each file contains full
45
+ iteration output (can be MBs).
46
+ - **Fix:** Added `rm -f "$iter_output"` before both success `continue` statements.
47
+
48
+ ### BUG-NEW-004: Event JSON emits floats as quoted strings
49
+ - **File:** `autonomy/run.sh` line 951
50
+ - **Root cause:** `emit_event_json` regex `^[0-9]+$` only matches integers. A value
51
+ like `cost=3.14` is treated as a string and quoted (`"cost":"3.14"`), producing
52
+ incorrectly typed JSON for consumers expecting numbers.
53
+ - **Impact:** Dashboard/OTEL consumers that parse event JSON get string types for
54
+ float metrics (cost, duration, etc.).
55
+ - **Fix:** Changed regex to `^[0-9]+\.?[0-9]*$` to match both integers and floats.
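To illustrate the effect of the regex change, here is a Python sketch of the quoting decision. The real logic lives in bash inside `emit_event_json`; `encode_value` and the `STRICT_NUMBER` pattern are hypothetical additions for comparison and do not appear in the report.

```python
import json
import re

INTEGER_ONLY = re.compile(r"^[0-9]+$")            # pattern before the fix
INT_OR_FLOAT = re.compile(r"^[0-9]+\.?[0-9]*$")   # pattern after the fix
# Hypothetical stricter variant (not in the report): requires digits after the dot.
STRICT_NUMBER = re.compile(r"^[0-9]+(\.[0-9]+)?$")


def encode_value(value: str, numeric: re.Pattern) -> str:
    """Emit the value bare if it looks numeric, otherwise as a quoted JSON string."""
    return value if numeric.match(value) else json.dumps(value)
```

One thing the sketch makes visible: the relaxed pattern also accepts a trailing dot such as `3.`, which is not a valid JSON number, so a stricter pattern may be worth considering if such values can occur.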
56
+
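The sketch below compares the old and new patterns, using Python's `re` in place of the bash match in `run.sh`. Note that the new pattern also accepts a trailing dot such as `3.`, which is not a valid JSON number; the `STRICT` pattern is a suggested alternative, not something the report says was adopted. The helper `json_value` is hypothetical.

```python
import json
import re

INT_ONLY = re.compile(r"^[0-9]+$")                # pattern before the fix
INT_OR_FLOAT = re.compile(r"^[0-9]+\.?[0-9]*$")   # pattern after the fix
STRICT = re.compile(r"^[0-9]+(\.[0-9]+)?$")       # assumption: stricter variant


def json_value(value: str, pattern) -> str:
    """Emit the value bare if it looks numeric, otherwise as a quoted JSON string."""
    return value if pattern.fullmatch(value) else json.dumps(value)
```

With `INT_OR_FLOAT`, an input of `3.` would be emitted unquoted and break strict JSON parsers, so the stricter form may be worth considering if such values can occur.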
57
+ ### BUG-NEW-005: Dashboard stop leaves orphaned iter_output files
58
+ - **File:** `dashboard/server.py` line 2907
59
+ - **Root cause:** `stop_session` sends SIGTERM and marks session as stopped but does
60
+ not clean up `.loki/logs/iter-output-*` temp files from the killed process.
61
+ - **Impact:** Orphaned temp files persist after dashboard-initiated stops.
62
+ - **Fix:** Added glob cleanup of `iter-output-*` files after SIGTERM.
63
+
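A minimal sketch of the glob cleanup, assuming the files live directly under the logs directory. The function name and return value are hypothetical; the `iter-output-*` pattern is from the report.

```python
from pathlib import Path


def cleanup_iter_outputs(logs_dir) -> int:
    """Remove leftover iter-output-* files; return how many were deleted."""
    directory = Path(logs_dir)
    if not directory.is_dir():
        return 0
    removed = 0
    for path in directory.glob("iter-output-*"):
        try:
            path.unlink()
            removed += 1
        except OSError:
            # The file may already be gone or be unremovable; cleanup is best effort.
            pass
    return removed
```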
64
+ ### BUG-NEW-006: WebSocket broadcasts stale "running" status after crash
65
+ - **File:** `dashboard/server.py` line 382
66
+ - **Root cause:** `_push_loki_state_loop` determined status purely from
67
+ `dashboard-state.json`'s `mode` field. If the process crashed (SIGKILL, OOM, etc.),
68
+ the state file still said `"mode": "autonomous"`, so WebSocket clients saw "running"
69
+ indefinitely. The REST `get_status` endpoint correctly cross-checked the PID.
70
+ - **Impact:** Dashboard UI shows the session as running after a crash until the next full poll.
71
+ - **Fix:** Added PID liveness check before status determination. If PID is dead,
72
+ status is forced to "stopped" regardless of state file contents.
73
+
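The liveness-first status logic can be sketched like this, assuming a POSIX system. `resolve_status` and the mapping of non-autonomous modes to "stopped" are simplifying assumptions; the report only states that a dead PID forces "stopped" and that `"mode": "autonomous"` was shown as "running".

```python
import os


def pid_alive(pid: int) -> bool:
    """Signal 0 checks for existence without delivering a signal."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def resolve_status(state: dict, pid) -> str:
    """Check the PID first; trust the state file only if the process is alive."""
    if pid is None or not pid_alive(pid):
        return "stopped"  # a crash leaves a stale state file behind
    # Assumption: any mode other than "autonomous" is reported as stopped.
    return "running" if state.get("mode") == "autonomous" else "stopped"
```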
74
+ ## Integration Points Verified (No Bugs Found)
75
+
76
+ 1. **Pricing tables match:** `_DEFAULT_PRICING` in server.py and `pricing` dict in
77
+ run.sh `check_budget_limit()` have identical rates for all 6 models.
78
+
79
+ 2. **Atomic state writes:** `save_state()` uses temp file + `mv` (atomic rename).
80
+ `write_dashboard_state()` also uses temp + mv. Dashboard uses `_safe_json_read`
81
+ with retry for race protection.
82
+
83
+ 3. **Midnight-crossing:** `parse_claude_reset_time()` handles an already-passed time by
84
+ adding 86400 seconds (one day). No midnight bug.
85
+
86
+ 4. **Session lifecycle:** `stop_session` creates STOP file + SIGTERM, `pause_session`
87
+ creates PAUSE file, `resume_session` removes both. All match run.sh's
88
+ `check_human_intervention()` expectations.
89
+
90
+ 5. **Budget enforcement:** Both dashboard `/api/cost` and run.sh `check_budget_limit()`
91
+ read from `.loki/metrics/efficiency/*.json` with matching cost calculation logic.
92
+
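The midnight-crossing rule in item 3 can be illustrated with a short sketch. The `HH:MM` input format and the function name are assumptions; the actual input parsed by `parse_claude_reset_time()` is not shown in this report. Only the "add 86400 seconds when the time is in the past" rule is from the source.

```python
from datetime import datetime


def seconds_until(reset_hhmm: str, now: datetime) -> int:
    """Seconds from `now` until the next occurrence of a 24-hour HH:MM time."""
    hour, minute = map(int, reset_hhmm.split(":"))
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    delta = int((target - now).total_seconds())
    if delta < 0:
        # The time has already passed today, so it refers to tomorrow.
        delta += 86400
    return delta
```

For example, at 23:30 a reset time of `00:15` is 45 minutes away, not 23 hours and 15 minutes in the past. The sketch uses naive datetimes and ignores daylight-saving changes.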
93
+ ## Files Modified
94
+
95
+ - `autonomy/run.sh` -- 3 fixes (BUG-NEW-003 x2, BUG-NEW-004)
96
+ - `dashboard/server.py` -- 4 fixes (BUG-NEW-001, BUG-NEW-002, BUG-NEW-005, BUG-NEW-006)
97
+
98
+ ## Validation
99
+
100
+ - `bash -n autonomy/run.sh` -- PASS
101
+ - `python3 -c "import ast; ast.parse(open('dashboard/server.py').read())"` -- PASS
@@ -0,0 +1,164 @@
1
+ # Agent 08: Docker + Self-Healing Integration Testing
2
+
3
+ ## Summary
4
+
5
+ Audited Dockerfile, Dockerfile.sandbox, docker-compose.yml, healing system (`cmd_heal()`),
6
+ migration hooks (`migration-hooks.sh`), and state management in `run.sh`. Fixed 6 bugs
7
+ (2 known, 4 newly discovered).
8
+
9
+ ---
10
+
11
+ ## Bugs Fixed
12
+
13
+ ### BUG-DK-002: docker-compose loki service missing health check (FIXED)
14
+
15
+ **File:** `docker-compose.yml`
16
+
17
+ **Problem:** The `loki` service had no health check defined. The bug list described the
18
+ issue as a health check that "hits wrong endpoint", but the `loki` service had no health
19
+ check configuration at all. Only the ChromaDB service had one.
20
+
21
+ **Fix:** Added a health check to the loki service that first tries the dashboard `/health`
22
+ endpoint (for when the dashboard is running), with a fallback to `loki version` (for when
23
+ only the CLI is active). Also updated the version comment from v6.38.0 to v6.71.1.
24
+
25
+ ```yaml
26
+ healthcheck:
27
+   test: ["CMD-SHELL", "curl -sf http://localhost:57374/health >/dev/null 2>&1 || loki version >/dev/null 2>&1"]
28
+   interval: 30s
29
+   timeout: 10s
30
+   start_period: 10s
31
+   retries: 3
32
+ ```
33
+
34
+ ---
35
+
36
+ ### BUG-HEAL-002: Healing phase gate doesn't validate phase transitions (FIXED)
37
+
38
+ **File:** `autonomy/hooks/migration-hooks.sh`
39
+
40
+ **Problem:** `hook_healing_phase_gate()` used a `case` statement with only valid transitions
41
+ listed. Any invalid transition (backwards, skipping phases, unknown phases) fell through
42
+ the case and returned 0 (success), silently allowing dangerous operations like jumping
43
+ from `archaeology` directly to `modernize`.
44
+
45
+ **Fix:** Added phase ordering validation before the case statement. The function now:
46
+ 1. Validates both `from_phase` and `to_phase` are known phases
47
+ 2. Rejects backward transitions (e.g., `modernize` -> `archaeology`)
48
+ 3. Rejects phase skipping (e.g., `archaeology` -> `modernize` skipping `stabilize`/`isolate`)
49
+ 4. Only allows forward transitions to the immediately next phase
50
+
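The four rules above reduce to one check on phase positions. This is a sketch in Python rather than the bash of `migration-hooks.sh`. The phase order is inferred from the examples in this report, and only four of the five phases mentioned in the validation notes are named here, so the list is incomplete by assumption.

```python
# Assumption: order inferred from the report; the fifth phase is not named in it.
PHASES = ["archaeology", "stabilize", "isolate", "modernize"]


def phase_gate(from_phase: str, to_phase: str) -> int:
    """Return 0 if the transition is allowed, 1 otherwise (shell-style exit codes)."""
    if from_phase not in PHASES or to_phase not in PHASES:
        return 1  # unknown phase
    step = PHASES.index(to_phase) - PHASES.index(from_phase)
    # step < 0 is a backward move and step > 1 is a skip; only step == 1 passes.
    return 0 if step == 1 else 1
```

The key design point is to deny by default: the original `case` statement listed the allowed transitions but let everything else fall through with a success code.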
51
+ ---
52
+
53
+ ### BUG-HEAL-003: cmd_heal() provider case missing default clause (NEW - FIXED)
54
+
55
+ **File:** `autonomy/loki`
56
+
57
+ **Problem:** The `case "$provider"` statement in `cmd_heal()` (around line 9298) had no
58
+ default `*)` clause. If an unknown provider was specified (e.g., `loki heal ./app --provider foo`),
59
+ the case silently fell through, `heal_exit` stayed 0, and the user received a false
60
+ "Healing phase complete" success message.
61
+
62
+ **Fix:** Added a `*)` default clause that prints an error with supported providers and
63
+ returns 1.
64
+
65
+ ---
66
+
67
+ ### BUG-HEAL-004: Migration hooks never sourced in healing flow (NEW - FIXED)
68
+
69
+ **File:** `autonomy/loki`
70
+
71
+ **Problem:** `autonomy/hooks/migration-hooks.sh` was never sourced by either `autonomy/loki`
72
+ or `autonomy/run.sh`. This meant all healing hooks (`hook_pre_healing_modify()`,
73
+ `hook_post_healing_modify()`, `hook_healing_phase_gate()`) were dead code -- they existed
74
+ but were never called during actual healing operations. The only consumer was the test file
75
+ `tests/test-migration-v2.sh`.
76
+
77
+ **Fix:** `cmd_heal()` now sources `migration-hooks.sh`. The added steps:
78
+ 1. Source the hooks file using `BASH_SOURCE[0]` relative path resolution
79
+ 2. Call `load_migration_hook_config()` to load project-specific hook configuration
80
+ 3. Export healing environment variables (`LOKI_HEAL_MODE`, `LOKI_HEAL_PHASE`, etc.)
81
+ 4. Invoke `hook_healing_phase_gate()` when `--resume` is used with a different phase
82
+
83
+ ---
84
+
85
+ ### BUG-ST-013: save_state() doesn't ensure .loki directory exists (NEW - FIXED)
86
+
87
+ **File:** `autonomy/run.sh`
88
+
89
+ **Problem:** `save_state()` writes to `.loki/autonomy-state.json` but doesn't ensure the
90
+ `.loki` directory exists. While normally created by `initialize_workspace()`, signal handlers
91
+ could call `save_state()` before initialization completes, causing a silent failure.
92
+
93
+ **Fix:** Added defensive `mkdir -p .loki 2>/dev/null || true` at the start of `save_state()`.
94
+
95
+ ---
96
+
97
+ ### BUG-ST-014: Non-atomic current-task.json writes (NEW - FIXED)
98
+
99
+ **File:** `autonomy/run.sh`
100
+
101
+ **Problem:** `current-task.json` was written with direct `echo ... > file` (lines 3631, 3815),
102
+ outside the flock-protected section. This could cause partial reads if the dashboard or
103
+ another process reads the file mid-write. Other state files (e.g., `autonomy-state.json`,
104
+ `session.json`) already used atomic temp-file + mv patterns.
105
+
106
+ **Fix:** Both writes now use `echo ... > tmpfile && mv -f tmpfile target` atomic pattern,
107
+ consistent with BUG-XC-004 and BUG-ST-008 patterns elsewhere in the codebase.
108
+
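For readers on the dashboard side, the same temp-file-plus-rename pattern looks like this in Python. This is a sketch, not code from the package: `atomic_write_json` is a hypothetical helper, and it also folds in the defensive directory creation from BUG-ST-013.

```python
import json
import os
import tempfile


def atomic_write_json(target: str, payload) -> None:
    """Write JSON so that readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(target))
    os.makedirs(directory, exist_ok=True)  # same idea as `mkdir -p .loki`
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(payload, tmp)
        os.replace(tmp_path, target)  # atomic rename, like `mv -f tmpfile target`
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```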
109
+ ---
110
+
111
+ ## Bugs Verified as Already Fixed
112
+
113
+ ### BUG-DK-001: Dockerfile COPY dashboard/ missing pip install
114
+
115
+ Both `Dockerfile` (lines 89-90) and `Dockerfile.sandbox` (lines 180-181) already include
116
+ `pip3 install --no-cache-dir --break-system-packages -r dashboard/requirements.txt`.
117
+ No fix needed.
118
+
119
+ ### BUG-DK-003: Sandbox Dockerfile doesn't install bash 5
120
+
121
+ Verified: Debian bookworm-slim (used by Dockerfile.sandbox) ships bash 5.2.15.
122
+ Ubuntu 24.04 (used by Dockerfile) ships bash 5.2.21. Both support associative arrays
123
+ and parallel mode. No fix needed.
124
+
125
+ ### BUG-HEAL-001: cmd_heal() doesn't create .loki/healing/ directory before writing
126
+
127
+ Verified: `cmd_heal()` creates the directory at line 9201 with
128
+ `mkdir -p "$heal_dir"/{behavioral-baseline,characterization-tests}` before any writes.
129
+ The `--status`, `--report`, and `--friction-map` subcommands only read (never write)
130
+ and properly check for directory/file existence. No fix needed.
131
+
132
+ ---
133
+
134
+ ## Additional Findings (Not Fixed -- Low Priority)
135
+
136
+ ### Non-atomic writes in initialization
137
+
138
+ Several state files during `initialize_workspace()` use direct `cat > file` patterns
139
+ (e.g., `orchestrator.json` at line 2955, `budget.json` at line 2980). These are safe because
140
+ initialization runs once before any concurrent access, but could be hardened for robustness.
141
+
142
+ ### Phase skip via --phase flag without --resume
143
+
144
+ Users can run `loki heal ./app --phase modernize` and skip prior phases. This is by design
145
+ (expert override), but could be surprising. A warning message when starting at a non-archaeology
146
+ phase without prior healing data could improve UX.
147
+
148
+ ---
149
+
150
+ ## Files Modified
151
+
152
+ | File | Changes |
153
+ |------|---------|
154
+ | `docker-compose.yml` | Added loki service health check, updated version comment |
155
+ | `autonomy/hooks/migration-hooks.sh` | Added phase transition ordering validation |
156
+ | `autonomy/loki` | Added default provider clause, sourced hooks, added phase gate check on resume |
157
+ | `autonomy/run.sh` | Defensive mkdir in save_state(), atomic current-task.json writes |
158
+
159
+ ## Validation
160
+
161
+ - All 3 modified shell scripts pass `bash -n` syntax validation
162
+ - `docker-compose.yml` passes YAML validation with correct structure
163
+ - Health check uses fallback pattern (curl || loki version) for resilience
164
+ - Phase gate validation tested against all 5 phases with forward, backward, and skip scenarios
@@ -0,0 +1,69 @@
1
+ # Agent 09: Full Build E2E Testing - Bug Fixes
2
+
3
+ ## Pipeline Traced
4
+
5
+ Complete flow from prompt submission to preview:
6
+
7
+ 1. `POST /api/session/quick-start` (web-app/server.py) -> validates, generates PRD
8
+ 2. `start_session()` -> spawns `loki start` via Popen with merged stdout/stderr
9
+ 3. `_read_process_output()` -> reads lines, broadcasts via WebSocket
10
+ 4. `loki start` -> `cmd_start()` (autonomy/loki) -> `run_autonomous()` (autonomy/run.sh)
11
+ 5. RARV loop: `build_prompt()` -> provider invocation -> quality gates -> iterate
12
+ 6. File watcher detects changes -> broadcasts `file_changed` -> frontend refreshes
13
+ 7. Chat iteration: `POST /api/sessions/{id}/chat` -> `loki quick` in project dir
14
+
15
+ ## Known Bugs Fixed
16
+
17
+ ### BUG-E2E-001: Quick-start empty/short prompt validation
18
+ - **File**: `web-app/server.py` (line ~2600)
19
+ - **Problem**: Quick-start accepted any non-empty prompt (even 1 char), leading to degenerate builds
20
+ - **Fix**: Added minimum 3-character validation after trim. Empty strings were already caught, but trivial strings like "a" could still trigger a full build pipeline.
21
+
22
+ ### BUG-E2E-002: Build output loses ordering
23
+ - **File**: `web-app/server.py` (`_read_process_output`, WebSocket backfill)
24
+ - **Problem**: Log lines broadcast via WebSocket had no sequence number, making it impossible for the frontend to detect gaps or reorder after reconnection.
25
+ - **Root cause**: stdout/stderr were already merged at OS level via `stderr=subprocess.STDOUT` (so pipe ordering is correct), but WebSocket reconnection could cause the frontend to miss lines with no way to detect the gap.
26
+ - **Fix**: Added `seq` field (using `session.log_lines_total`) to every log broadcast and backfill message. Frontend can now detect missed lines and request backfill.
27
+
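Gap detection on the client side can be sketched as follows. The actual frontend is TypeScript; this Python version only illustrates the logic, and it assumes `seq` increases by exactly 1 per log line. The function name is hypothetical.

```python
def missing_ranges(last_seen: int, seqs: list) -> list:
    """Return inclusive (start, end) ranges of sequence numbers that never arrived."""
    gaps = []
    expected = last_seen + 1
    for seq in sorted(set(seqs)):
        if seq < expected:
            continue  # duplicate or already-seen line (e.g. replayed by backfill)
        if seq > expected:
            gaps.append((expected, seq - 1))
        expected = seq + 1
    return gaps
```

A client that last saw line 3 and then receives 4, 5, 8, 9 would get `[(6, 7)]` and could request a backfill for that range.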
28
+ ### BUG-E2E-003: Preview iframe doesn't reload when files change
29
+ - **File**: `web-app/src/components/ProjectWorkspace.tsx` (line ~573)
30
+ - **Problem**: File change events only triggered iframe reload when no dev server was running (`!devServer?.running`). When a dev server was running, even non-HMR servers (Express, Flask, static servers) never got a reload.
31
+ - **Fix**: Now reloads the iframe for all non-HMR frameworks. HMR-capable frameworks (react, vite, next, nuxt, svelte, remix) are excluded since they handle live reload natively.
32
+
33
+ ### BUG-E2E-004: Chat iteration doesn't pass previous context to AI
34
+ - **Files**: `web-app/server.py` (ChatRequest model, chat handler), `web-app/src/api/client.ts`, `web-app/src/components/AIChatPanel.tsx`
35
+ - **Problem**: Each chat message was sent to the AI in isolation. The `loki quick` command had no awareness of what was previously discussed, making iterative development frustrating (user had to repeat context).
36
+ - **Fix**:
37
+ 1. Added `history` field to ChatRequest model (optional list of {role, content})
38
+ 2. Frontend now sends last 10 messages as conversation history
39
+ 3. Server injects history as "PREVIOUS CONVERSATION CONTEXT" prefix to the prompt
40
+ 4. Long assistant responses truncated to 500 chars to avoid token bloat
41
+
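The four steps above can be sketched as a single prompt-building function. The "PREVIOUS CONVERSATION CONTEXT" label, the 10-message window, and the 500-character truncation come from the report; the function name, the "CURRENT REQUEST" label, the `...` truncation marker, and the exact line layout are assumptions.

```python
MAX_HISTORY = 10            # last N messages kept
MAX_ASSISTANT_CHARS = 500   # long assistant replies are cut to this length


def build_prompt(message: str, history: list) -> str:
    """Prefix the new message with a trimmed copy of the recent conversation."""
    recent = history[-MAX_HISTORY:]
    if not recent:
        return message
    lines = ["PREVIOUS CONVERSATION CONTEXT:"]
    for turn in recent:
        content = turn["content"]
        if turn["role"] == "assistant" and len(content) > MAX_ASSISTANT_CHARS:
            content = content[:MAX_ASSISTANT_CHARS] + "..."
        lines.append(f"{turn['role']}: {content}")
    # Assumption: a separator label so the model can tell history from the request.
    lines += ["", "CURRENT REQUEST:", message]
    return "\n".join(lines)
```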
42
+ ### BUG-RUN-001/002: Midnight crossing bugs (already fixed, verified)
43
+ - **File**: `autonomy/run.sh` (line ~9366)
44
+ - **Status**: Already fixed in a previous commit. Uses the per-iteration `iter_output` temp file instead of the daily `log_file` for completion promise checks and rate limit detection.
45
+
46
+ ## New Bugs Discovered and Fixed
47
+
48
+ ### BUG-E2E-005: iter_output temp file leak on success path
49
+ - **File**: `autonomy/run.sh` (lines ~9852, ~9899)
50
+ - **Problem**: The per-iteration output file (`iter_output`) was cleaned up only on the error/retry path (line 9952) and completion paths (lines 9867, 9882). The normal success path (line 9899) did `continue` without cleanup, leaking a temp file per successful iteration.
51
+ - **Impact**: Long-running sessions would accumulate `iter-output-XXXXXX` files in `.loki/logs/`, one per iteration. A 100-iteration session would leak ~100 temp files.
52
+ - **Fix**: Added `rm -f "$iter_output"` before `continue` on both success paths (perpetual mode at line 9852 and normal success at line 9899).
53
+
54
+ ### BUG-E2E-006: Provider validation missing on request models
55
+ - **File**: `web-app/server.py` (StartRequest, QuickStartRequest models)
56
+ - **Problem**: The `provider` field on StartRequest and QuickStartRequest accepted any string. An unknown provider like `"evil"` would be passed to `loki start --provider evil`, which would fail inside run.sh but waste resources spawning a process.
57
+ - **Fix**: Added `@field_validator("provider")` that validates against the known set: claude, codex, gemini, cline, aider.
58
+
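The report says the fix uses a `@field_validator("provider")` on the request models. To stay runnable without third-party packages, the sketch below shows the same allow-list check as a plain function; the function name and error text are hypothetical, while the provider set is from the report.

```python
KNOWN_PROVIDERS = frozenset({"claude", "codex", "gemini", "cline", "aider"})


def validate_provider(provider: str) -> str:
    """Reject unknown providers before any process is spawned."""
    if provider not in KNOWN_PROVIDERS:
        allowed = ", ".join(sorted(KNOWN_PROVIDERS))
        raise ValueError(f"unknown provider {provider!r}; expected one of: {allowed}")
    return provider
```

Inside a validator, the body would be the same check, with the framework turning the `ValueError` into a request validation error.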
59
+ ### BUG-E2E-007: ChatRequest message not validated
60
+ - **File**: `web-app/server.py` (ChatRequest model)
61
+ - **Problem**: The `message` field had no validation. An empty string or a 10MB message could be sent, either causing a useless `loki quick ""` invocation or excessive memory usage.
62
+ - **Fix**: Added `@field_validator("message")` that rejects empty messages and enforces a 100KB limit.
63
+
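A sketch of the message check as a plain function, under two assumptions the report does not settle: that whitespace-only messages count as empty, and that the 100KB limit means 100 * 1024 bytes of UTF-8. The function name is hypothetical.

```python
MAX_MESSAGE_BYTES = 100 * 1024  # assumption: "100KB" measured in UTF-8 bytes


def validate_message(message: str) -> str:
    """Reject empty and oversized chat messages."""
    if not message.strip():
        # Assumption: whitespace-only input is treated the same as an empty string.
        raise ValueError("message must not be empty")
    if len(message.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds the 100KB limit")
    return message
```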
64
+ ## Verification
65
+
66
+ - Python syntax: `ast.parse()` passes for `web-app/server.py`
67
+ - Bash syntax: `bash -n autonomy/run.sh` passes
68
+ - TypeScript: No new errors introduced (pre-existing errors are all from missing node_modules)
69
+ - All fixes are backward compatible (new fields are optional, new validations reject previously-invalid input)