loki-mode 6.71.1 → 6.72.0
- package/README.md +9 -1
- package/SKILL.md +2 -2
- package/VERSION +1 -1
- package/autonomy/hooks/migration-hooks.sh +26 -0
- package/autonomy/loki +429 -92
- package/autonomy/run.sh +219 -38
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +101 -19
- package/docs/INSTALLATION.md +20 -11
- package/docs/bug-fixes/agent-01-cli-fixes.md +101 -0
- package/docs/bug-fixes/agent-02-purplelab-fixes.md +88 -0
- package/docs/bug-fixes/agent-03-dashboard-fixes.md +119 -0
- package/docs/bug-fixes/agent-04-memory-fixes.md +105 -0
- package/docs/bug-fixes/agent-05-provider-fixes.md +86 -0
- package/docs/bug-fixes/agent-06-integration-fixes.md +101 -0
- package/docs/bug-fixes/agent-07-dash-run-fixes.md +101 -0
- package/docs/bug-fixes/agent-08-docker-fixes.md +164 -0
- package/docs/bug-fixes/agent-09-e2e-build-fixes.md +69 -0
- package/docs/bug-fixes/agent-10-e2e-fullstack-fixes.md +102 -0
- package/docs/bug-fixes/agent-11-e2e-session-fixes.md +70 -0
- package/docs/bug-fixes/agent-12-scenario-fixes.md +120 -0
- package/docs/bug-fixes/agent-13-enterprise-fixes.md +143 -0
- package/docs/bug-fixes/agent-14-uat-newuser-fixes.md +88 -0
- package/docs/bug-fixes/agent-15-uat-poweruser-fixes.md +132 -0
- package/docs/bug-fixes/agent-19-code-review.md +316 -0
- package/docs/bug-fixes/agent-20-architecture-review.md +331 -0
- package/docs/competitive/bolt-new-analysis.md +579 -0
- package/docs/competitive/emergence-others-analysis.md +605 -0
- package/docs/competitive/replit-lovable-analysis.md +622 -0
- package/docs/test-scenarios/edge-cases.md +813 -0
- package/docs/test-scenarios/enterprise-scenarios.md +732 -0
- package/mcp/__init__.py +1 -1
- package/mcp/server.py +49 -5
- package/memory/consolidation.py +33 -0
- package/memory/embeddings.py +10 -1
- package/memory/engine.py +83 -38
- package/memory/retrieval.py +36 -0
- package/memory/storage.py +56 -4
- package/memory/token_economics.py +14 -2
- package/memory/vector_index.py +36 -7
- package/package.json +1 -1
- package/providers/gemini.sh +89 -2
- package/templates/README.md +1 -1
- package/templates/cli-tool.md +30 -0
- package/templates/dashboard.md +4 -0
- package/templates/data-pipeline.md +4 -0
- package/templates/discord-bot.md +47 -0
- package/templates/game.md +4 -0
- package/templates/microservice.md +4 -0
- package/templates/npm-library.md +4 -0
- package/templates/rest-api-auth.md +50 -20
- package/templates/rest-api.md +15 -0
- package/templates/saas-starter.md +1 -1
- package/templates/slack-bot.md +36 -0
- package/templates/static-landing-page.md +9 -1
- package/templates/web-scraper.md +4 -0
- package/web-app/dist/assets/Badge-CeBkFjo6.js +1 -0
- package/web-app/dist/assets/Button-yuhqo8Fq.js +1 -0
- package/web-app/dist/assets/{Card-B1bV4syB.js → Card-BG17vsX0.js} +1 -1
- package/web-app/dist/assets/{HomePage-CZTV6Nea.js → HomePage-BMSQ7Apj.js} +3 -3
- package/web-app/dist/assets/{LoginPage-D4UdURJc.js → LoginPage-aH_6iolg.js} +1 -1
- package/web-app/dist/assets/{NotFoundPage-CCLSeL6j.js → NotFoundPage-Di8cNtB1.js} +1 -1
- package/web-app/dist/assets/ProjectPage-BtRssmw9.js +285 -0
- package/web-app/dist/assets/ProjectsPage-B-FTFagc.js +6 -0
- package/web-app/dist/assets/{SettingsPage-Xuv8EfAg.js → SettingsPage-DIJPBla4.js} +1 -1
- package/web-app/dist/assets/TeamsPage--19fNX7w.js +36 -0
- package/web-app/dist/assets/TemplatesPage-ChUQNOOv.js +11 -0
- package/web-app/dist/assets/TerminalOutput-Dwrzecyl.js +31 -0
- package/web-app/dist/assets/activity-BNRWeu9N.js +6 -0
- package/web-app/dist/assets/{arrow-left-CaGtolHc.js → arrow-left-Ce6g1_YE.js} +1 -1
- package/web-app/dist/assets/circle-alert-LIndawHL.js +11 -0
- package/web-app/dist/assets/clock-Bpj4VPlP.js +6 -0
- package/web-app/dist/assets/{external-link-CazyUyav.js → external-link-BhhdF0iQ.js} +1 -1
- package/web-app/dist/assets/folder-open-CM2LgfxI.js +11 -0
- package/web-app/dist/assets/index-8-KpWWq7.css +1 -0
- package/web-app/dist/assets/index-kPDW4e_b.js +236 -0
- package/web-app/dist/assets/lock-sAk3Xe54.js +16 -0
- package/web-app/dist/assets/search-CR-2i9by.js +6 -0
- package/web-app/dist/assets/server-DuFh4ymA.js +26 -0
- package/web-app/dist/assets/trash-2-BmkkT8V_.js +11 -0
- package/web-app/dist/index.html +2 -2
- package/web-app/server.py +1321 -53
- package/web-app/dist/assets/Badge-CBUx2PjL.js +0 -6
- package/web-app/dist/assets/Button-DsRiznlh.js +0 -21
- package/web-app/dist/assets/ProjectPage-D0w_X9tG.js +0 -237
- package/web-app/dist/assets/ProjectsPage-ByYxDlKC.js +0 -16
- package/web-app/dist/assets/TemplatesPage-BKWN07mc.js +0 -1
- package/web-app/dist/assets/TerminalOutput-Dj98V8Z-.js +0 -51
- package/web-app/dist/assets/clock-C_CDmobx.js +0 -11
- package/web-app/dist/assets/index-D452pFGl.css +0 -1
- package/web-app/dist/assets/index-Df4_kgLY.js +0 -196
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
# Agent 04: Memory System Functional Testing -- Bug Fix Report
|
|
2
|
+
|
|
3
|
+
## Scope
|
|
4
|
+
Comprehensive review and fix of all 16 Python modules in `memory/` (~10K lines).
|
|
5
|
+
Addressed bugs from `docs/BUG-AUDIT-v6.61.0.md` plus newly discovered issues.
|
|
6
|
+
|
|
7
|
+
## Bugs Fixed
|
|
8
|
+
|
|
9
|
+
### BUG-MEM-001 (Audit) -- Episode ID date parsing produces garbage paths
|
|
10
|
+
- **File**: `memory/engine.py` method `get_episode()`
|
|
11
|
+
- **Root cause**: Fixed-offset parsing with `parts[1]`/`parts[2]`/`parts[3]` assumed a two-character prefix like `ep-`. Variable-length prefixes (e.g., `episode-YYYY-MM-DD-xxx`) shifted the offsets, producing wrong date directories.
|
|
12
|
+
- **Fix**: Replaced fixed-offset parsing with regex `re.search(r'(\d{4})-(\d{2})-(\d{2})', episode_id)` that extracts the date from anywhere in the ID string. Falls back to full directory scan if no date pattern is found.
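A minimal sketch of the regex-based date extraction described in the fix. The helper name `extract_episode_date` and the `None` return value (used here to signal that the caller should fall back to a directory scan) are assumptions for illustration, not the actual `get_episode()` code.

```python
import re
from typing import Optional, Tuple

# Same pattern the fix quotes: a YYYY-MM-DD date anywhere in the ID.
_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_episode_date(episode_id: str) -> Optional[Tuple[str, ...]]:
    """Return (year, month, day) found in the ID, or None if no date is present.

    Hypothetical helper: a None result marks the point where a caller
    would fall back to scanning every date directory.
    """
    match = _DATE_RE.search(episode_id)
    return match.groups() if match else None
```

Because the pattern is searched rather than anchored, the prefix length no longer matters: `ep-...` and `episode-...` IDs resolve to the same date tuple.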
|
|
13
|
+
|
|
14
|
+
### BUG-MEM-002 (Mission) -- Semantic search returns stale results after consolidation
|
|
15
|
+
- **File**: `memory/retrieval.py` methods `retrieve_by_similarity()`, `build_indices()`
|
|
16
|
+
- **Root cause**: After consolidation modifies `patterns.json`, the in-memory vector index still holds old embeddings. Searches return outdated results.
|
|
17
|
+
- **Fix**: Added `_indices_built_at` timestamp tracking. `build_indices()` records the build time. `retrieve_by_similarity()` compares patterns.json mtime against the build timestamp and falls back to keyword search when stale. Added `mark_indices_stale()` method for explicit invalidation.
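The staleness check can be sketched roughly as follows. Only `_indices_built_at`, `build_indices()` and `mark_indices_stale()` are names from the report; the `IndexFreshness` class, the `indices_are_stale()` method and the handling of a missing file are assumptions made for this illustration.

```python
import os
import time
from typing import Optional


class IndexFreshness:
    """Hypothetical stand-in for the staleness tracking described above."""

    def __init__(self, patterns_path: str) -> None:
        self.patterns_path = patterns_path
        self._indices_built_at: Optional[float] = None  # None means "never built"

    def build_indices(self) -> None:
        # A real implementation would embed patterns here; this sketch
        # only records when the build happened.
        self._indices_built_at = time.time()

    def mark_indices_stale(self) -> None:
        # Explicit invalidation, e.g. right after a consolidation run.
        self._indices_built_at = None

    def indices_are_stale(self) -> bool:
        if self._indices_built_at is None:
            return True
        try:
            # Stale when patterns.json was modified after the last build.
            return os.path.getmtime(self.patterns_path) > self._indices_built_at
        except OSError:
            # Assumed behavior: a missing file cannot be newer than the index.
            return False
```

A caller would check `indices_are_stale()` before a similarity search and switch to keyword search when it returns `True`.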
|
|
18
|
+
|
|
19
|
+
### BUG-MEM-003 (Mission) -- Consolidation pipeline has no locking
|
|
20
|
+
- **File**: `memory/consolidation.py` method `consolidate()`
|
|
21
|
+
- **Root cause**: The consolidation pipeline performs multiple read-modify-write operations on patterns.json without any exclusive locking. Concurrent consolidation runs (e.g., from parallel agents) corrupt data.
|
|
22
|
+
- **Fix**: Added file-based exclusive lock (`fcntl.flock`) via `.consolidation.lock`. The `consolidate()` method acquires the lock before delegating to the new `_consolidate_locked()` method. Lock is always released in a finally block, and the lock file is cleaned up.
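As an illustration of the locking pattern, here is a sketch built on `fcntl.flock` (Unix only). The context-manager shape and the name `consolidation_lock` are assumptions; the report only states that `consolidate()` takes the lock and delegates to `_consolidate_locked()`. Unlike the fix described above, this sketch leaves the lock file on disk after release.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def consolidation_lock(memory_dir: str) -> Iterator[None]:
    """Hold an exclusive advisory lock for the duration of the with-block."""
    lock_path = os.path.join(memory_dir, ".consolidation.lock")
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        # Runs on success and on error, so the lock cannot be leaked.
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()
```

Usage would look like `with consolidation_lock(memory_dir): run_pipeline()`, where `run_pipeline` stands for whatever performs the read-modify-write steps.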
|
|
23
|
+
|
|
24
|
+
### BUG-MEM-004 (Mission) -- Memory engine doesn't validate schema versions
|
|
25
|
+
- **File**: `memory/engine.py` class `MemoryEngine`
|
|
26
|
+
- **Root cause**: No version validation when loading memory data files. Incompatible schema versions could silently produce wrong results or corrupt data.
|
|
27
|
+
- **Fix**: Added `SUPPORTED_SCHEMA_VERSIONS` set and `CURRENT_SCHEMA_VERSION` constant to `MemoryEngine`. Added `_validate_schema_version()` method that checks version fields in loaded data. It is called during `initialize()` for index.json and timeline.json, logs a warning for unsupported versions, and auto-assigns a version to legacy data that has none. Changed hardcoded `"1.0"` to `self.CURRENT_SCHEMA_VERSION` in new file creation.
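A sketch of what such a check might look like, written as a free function for brevity. The `version` field name, the contents of the supported set, the boolean return value and the choice to stamp legacy data with the current version are all assumptions; only the two constant names and the `"1.0"` value come from the report.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

CURRENT_SCHEMA_VERSION = "1.0"       # value quoted in the report
SUPPORTED_SCHEMA_VERSIONS = {"1.0"}  # assumed contents of the set


def validate_schema_version(data: Dict[str, Any], source: str) -> bool:
    """Return True when the data is usable, stamping a version if it has none."""
    version = data.get("version")  # field name is an assumption
    if version is None:
        # Legacy file without a version field: assign one instead of failing.
        data["version"] = CURRENT_SCHEMA_VERSION
        return True
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        logger.warning("%s has unsupported schema version %r", source, version)
        return False
    return True
```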
|
|
28
|
+
|
|
29
|
+
### BUG-MEM-005 (Mission) -- Token counter overflows for large sessions
|
|
30
|
+
- **File**: `memory/token_economics.py` methods `record_discovery()`, `record_read()`
|
|
31
|
+
- **Root cause**: Token counters grew unbounded in very long sessions. While Python ints don't overflow, downstream JSON serializers and dashboard charts can choke on extremely large numbers.
|
|
32
|
+
- **Fix**: Added `_MAX_TOKEN_COUNTER = 10_000_000_000` class constant. Both `record_discovery()` and `record_read()` now cap their accumulated values at this limit using `min()`.
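The capping step amounts to a saturating add. In this sketch the `TokenCounter` class and its single `discovery_tokens` field are hypothetical simplifications; the constant name and value are taken from the report.

```python
_MAX_TOKEN_COUNTER = 10_000_000_000  # cap quoted in the report


class TokenCounter:
    """Hypothetical reduction of the token accounting to one counter."""

    def __init__(self) -> None:
        self.discovery_tokens = 0

    def record_discovery(self, tokens: int) -> None:
        # Saturate at the cap instead of growing without bound.
        self.discovery_tokens = min(self.discovery_tokens + tokens, _MAX_TOKEN_COUNTER)
```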
|
|
33
|
+
|
|
34
|
+
### BUG-MEM-006 (Mission) -- Embedding model fallback dimension mismatch warning
|
|
35
|
+
- **File**: `memory/embeddings.py` method `embed()`
|
|
36
|
+
- **Root cause**: When the primary embedding provider fails at runtime and falls back to a provider with a different dimension (e.g., OpenAI 1536 -> local 384), callers holding references to VectorIndex objects created with the original dimension get dimension mismatch errors. No warning was issued.
|
|
37
|
+
- **Fix**: Added dimension change detection after runtime fallback. Logs an explicit warning when the dimension changes, informing callers that existing vector indices may need to be rebuilt.
|
|
38
|
+
|
|
39
|
+
### BUG-MEM-007 (Mission) -- Vector index not rebuilt after consolidation
|
|
40
|
+
- **File**: `memory/consolidation.py` class `ConsolidationResult`
|
|
41
|
+
- **Root cause**: After consolidation creates or merges patterns, vector indices are not notified and continue serving stale data.
|
|
42
|
+
- **Fix**: Added `vector_index_stale` boolean flag to `ConsolidationResult`. The flag is set to `True` when patterns are created, merged, or anti-patterns are created. Callers can check this flag and rebuild indices accordingly.
|
|
43
|
+
|
|
44
|
+
### BUG-MEM-013 (Audit) -- Missing encoding on vector index JSON sidecar write
|
|
45
|
+
- **File**: `memory/vector_index.py` method `save()`
|
|
46
|
+
- **Root cause**: JSON sidecar files were written without specifying encoding. On systems with non-UTF-8 default locale, non-ASCII metadata caused encoding errors.
|
|
47
|
+
- **Fix**: Replaced direct file write with atomic write pattern (tempfile + `os.replace`). Added `encoding="utf-8"` and `ensure_ascii=False` to the JSON dump.
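The atomic write pattern named in the fix can be sketched as below. The function name `atomic_write_json` is hypothetical, and the real `save()` method may structure the steps differently; the sketch only shows the temp file, the explicit UTF-8 encoding, `ensure_ascii=False` and the final `os.replace`.

```python
import json
import os
import tempfile
from typing import Any


def atomic_write_json(path: str, payload: Any) -> None:
    """Write JSON to a temp file in the target directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False)
        # Atomic when source and destination are on the same filesystem,
        # which is why the temp file is created next to the target.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers either see the old file or the complete new one, never a partially written sidecar.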
|
|
48
|
+
|
|
49
|
+
### NEW BUG -- Non-atomic npz file write in vector index
|
|
50
|
+
- **File**: `memory/vector_index.py` method `save()`
|
|
51
|
+
- **Root cause**: `np.savez()` writes directly to the target path. A crash during write could leave a corrupt npz file, breaking index loading.
|
|
52
|
+
- **Fix**: Write to a temp file first, then atomically rename using `os.replace()`.
|
|
53
|
+
|
|
54
|
+
### NEW BUG -- TOCTOU race in increment_pattern_usage
|
|
55
|
+
- **File**: `memory/engine.py` method `increment_pattern_usage()`
|
|
56
|
+
- **Root cause**: Used `read_json()` + `write_json()` as separate operations to update a pattern's usage count. Another concurrent write could overwrite the changes between the read and write.
|
|
57
|
+
- **Fix**: Replaced with `load_pattern()` + `_dict_to_pattern()` + `save_pattern()` which performs the full upsert under an exclusive file lock via the storage layer.
|
|
58
|
+
|
|
59
|
+
### NEW BUG -- Timeline TOCTOU race in engine
|
|
60
|
+
- **File**: `memory/engine.py` method `_update_timeline_with_episode()`
|
|
61
|
+
- **Root cause**: Used `read_json()` + `write_json()` (separate lock acquisitions) to update timeline.json. Concurrent episode storage could lose timeline entries.
|
|
62
|
+
- **Fix**: Delegated to `self.storage.update_timeline(action_entry)` which performs the full read-modify-write under a single exclusive lock.
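To show why a single lock closes the race, here is a sketch of a read-modify-write that holds one exclusive `fcntl.flock` across both the read and the write (Unix only). The function body and the `recent_actions` key are assumptions for illustration and are not the actual `storage.update_timeline()` implementation. The in-place rewrite used here is also not crash-atomic.

```python
import fcntl
import json
import os
from typing import Any, Dict


def update_timeline(path: str, entry: Dict[str, Any]) -> None:
    """Append an entry while holding one exclusive lock for the whole update."""
    # O_CREAT without O_TRUNC: create the file if needed, keep existing content.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            raw = handle.read()
            # "recent_actions" is a hypothetical key used only in this sketch.
            timeline = json.loads(raw) if raw.strip() else {"recent_actions": []}
            timeline.setdefault("recent_actions", []).append(entry)
            handle.seek(0)
            handle.truncate()
            json.dump(timeline, handle)
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

With separate lock acquisitions for the read and the write, a second writer could slip in between them and have its entry overwritten; here the second writer blocks until the first has finished.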
|
|
63
|
+
|
|
64
|
+
## Bugs Verified as Already Fixed
|
|
65
|
+
|
|
66
|
+
The following bugs from the audit were already fixed in the current codebase:
|
|
67
|
+
|
|
68
|
+
| Bug ID | Description | How verified |
|
|
69
|
+
|--------|-------------|-------------|
|
|
70
|
+
| BUG-MEM-004 (Audit) | `cluster_by_similarity` uses `list.index()` on duplicates | Code at line 300 uses `member_indices` tracking instead |
|
|
71
|
+
| BUG-MEM-005 (Audit) | Anti-pattern dedup misses current-run duplicates | Code at lines 228-230 adds to `existing_patterns` within loop |
|
|
72
|
+
| BUG-MEM-006 (Audit) | Non-atomic `index.json` write in layers | `memory/layers/` directory does not exist; storage.py uses `_atomic_write` |
|
|
73
|
+
| BUG-MEM-007 (Audit) | Non-atomic `timeline.json` write in layers | Same as above |
|
|
74
|
+
| BUG-MEM-009 (Audit) | `apply_decay` float comparison causes unnecessary rewrites | Code at line 1245 uses `abs(...) > 0.001` tolerance |
|
|
75
|
+
| BUG-MEM-011 (Audit) | `_to_utc_isoformat` edge case with custom tzinfo | Code uses `dt.utcoffset()` comparison, not deprecated `utctimetuple()` |
|
|
76
|
+
| BUG-MEM-012 (Audit) | Redundant filesystem scan in token economics | `_full_load_baseline` caching works correctly |
|
|
77
|
+
| BUG-MEM-014 (Audit) | `AttributeError` on dict-typed actions in `_episode_to_text` | Code handles both dict and object types with `isinstance` checks |
|
|
78
|
+
|
|
79
|
+
## Validation
|
|
80
|
+
|
|
81
|
+
All 15 Python files in `memory/` pass `ast.parse()` syntax validation:
|
|
82
|
+
- `__init__.py`, `consolidation.py`, `cross_project.py`, `embeddings.py`, `engine.py`
|
|
83
|
+
- `knowledge_graph.py`, `namespace.py`, `rag_injector.py`, `retrieval.py`, `schemas.py`
|
|
84
|
+
- `storage.py`, `test_importance.py`, `token_economics.py`, `unified_access.py`, `vector_index.py`
|
|
85
|
+
|
|
86
|
+
## Edge Cases Analyzed
|
|
87
|
+
|
|
88
|
+
1. **Empty data**: All methods handle empty lists/dicts gracefully with early returns.
|
|
89
|
+
2. **Unicode**: JSON sidecar writes now use `encoding="utf-8"` and `ensure_ascii=False`.
|
|
90
|
+
3. **Very large episodes**: Token counters capped at 10 billion to prevent JSON serialization issues.
|
|
91
|
+
4. **Concurrent access**: Consolidation pipeline now has exclusive lock; pattern updates use storage-level locking; timeline updates use storage-level locking.
|
|
92
|
+
5. **Schema version drift**: Engine now validates schema versions on load and warns about incompatible versions.
|
|
93
|
+
6. **ID format variations**: Episode lookup now uses regex to extract dates from any position in the ID string.
|
|
94
|
+
7. **File corruption during crash**: Vector index npz and JSON sidecar files now use atomic write (temp file + rename).
|
|
95
|
+
|
|
96
|
+
## Files Modified
|
|
97
|
+
|
|
98
|
+
| File | Changes |
|
|
99
|
+
|------|---------|
|
|
100
|
+
| `memory/engine.py` | Fixed episode ID parsing, added schema version validation, fixed TOCTOU races in pattern usage and timeline updates |
|
|
101
|
+
| `memory/retrieval.py` | Added index staleness detection, build timestamp tracking, `mark_indices_stale()` |
|
|
102
|
+
| `memory/consolidation.py` | Added exclusive file lock for consolidation pipeline, `vector_index_stale` flag |
|
|
103
|
+
| `memory/token_economics.py` | Added token counter overflow cap |
|
|
104
|
+
| `memory/embeddings.py` | Added dimension change warning on runtime fallback |
|
|
105
|
+
| `memory/vector_index.py` | Atomic writes for both npz and JSON sidecar files, encoding fix |
|
|
@@ -0,0 +1,86 @@
|
|
|
1
|
+
# Agent 05: Provider System Functional Testing - Bug Fixes
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Tested all 5 provider invocation paths (Claude, Codex, Gemini, Cline, Aider) and fixed 5 bugs across `autonomy/run.sh` and `providers/gemini.sh`. Also identified and fixed 1 new undocumented bug.
|
|
6
|
+
|
|
7
|
+
## Bugs Fixed
|
|
8
|
+
|
|
9
|
+
### BUG-PROV-001: Gemini ignores tier_param for model selection (FIXED)
|
|
10
|
+
|
|
11
|
+
**Root cause:** The Gemini invocation in `run.sh` (main iteration loop) used `PROVIDER_MODEL` (frozen at source-time) instead of `tier_param` (dynamically resolved per iteration via `resolve_model_for_tier()`). Regardless of RARV tier, Gemini always used the same model.
|
|
12
|
+
|
|
13
|
+
**Fix locations:**
|
|
14
|
+
- `autonomy/run.sh` (line ~9685): Changed `local model="${PROVIDER_MODEL:-...}"` to `local model="$tier_param"` in the Gemini case block
|
|
15
|
+
- `autonomy/run.sh` `invoke_gemini()` (line ~3029): Changed from `PROVIDER_MODEL` to `provider_get_current_model()` with fallback
|
|
16
|
+
- `autonomy/run.sh` `invoke_gemini_capture()` (line ~3068): Same fix as above
|
|
17
|
+
|
|
18
|
+
### BUG-PROV-003: Claude health check breaks OAuth users; Gemini lacks key rotation (FIXED)
|
|
19
|
+
|
|
20
|
+
**Root cause (Claude):** `check_provider_health()` required `ANTHROPIC_API_KEY` env var. Users authenticating via OAuth (no API key) were marked unhealthy, triggering unnecessary failover to degraded providers.
|
|
21
|
+
|
|
22
|
+
**Root cause (Gemini):** No support for API key rotation when keys expire or hit quota. No support for `GEMINI_API_KEY` env var alias or gcloud ADC.
|
|
23
|
+
|
|
24
|
+
**Fix locations:**
|
|
25
|
+
- `autonomy/run.sh` `check_provider_health()`: Claude now checks for OAuth session files (`~/.claude/.credentials.json`) and `claude auth status` as fallback. Gemini now checks `GEMINI_API_KEY` and gcloud ADC.
|
|
26
|
+
- `providers/gemini.sh`: Added `_gemini_resolve_api_key()` for key resolution from multiple sources (`GOOGLE_API_KEY`, `GEMINI_API_KEY`, gcloud ADC).
|
|
27
|
+
- `providers/gemini.sh`: Added `_gemini_rotate_api_key()` for rotating through `LOKI_GEMINI_API_KEYS` (comma-separated list) on auth errors (401/403).
|
|
28
|
+
- `providers/gemini.sh` `provider_invoke()` and `provider_invoke_with_tier()`: Added auth error detection and key rotation before rate-limit fallback.
|
|
29
|
+
- `autonomy/run.sh` Gemini invocation block: Added auth error detection and key rotation.
|
|
30
|
+
|
|
31
|
+
### BUG-PROV-008: Failover updates PROVIDER_NAME but not LOKI_PROVIDER (FIXED)
|
|
32
|
+
|
|
33
|
+
**Root cause:** After failover, `PROVIDER_NAME` was updated but `LOKI_PROVIDER` env var (read by subprocesses and MCP server) retained the old provider name. Child processes and the MCP server reported the wrong provider.
|
|
34
|
+
|
|
35
|
+
**Fix locations:**
|
|
36
|
+
- `autonomy/run.sh` `attempt_provider_failover()`: Added `LOKI_PROVIDER="$provider"; export LOKI_PROVIDER` after updating `PROVIDER_NAME`
|
|
37
|
+
- `autonomy/run.sh` `check_primary_recovery()`: Same fix when switching back to primary provider
|
|
38
|
+
|
|
39
|
+
### NEW BUG: LOKI_CURRENT_TIER never exported (FOUND AND FIXED)
|
|
40
|
+
|
|
41
|
+
**Root cause:** `providers/gemini.sh:provider_get_current_model()` reads `LOKI_CURRENT_TIER` to resolve the model dynamically. However, `run.sh` only sets `CURRENT_TIER` (without the `LOKI_` prefix) and never exports it. As a result, `provider_get_current_model()` always defaults to "planning" tier, negating the dynamic tier resolution for all Gemini helper functions (`invoke_gemini`, `invoke_gemini_capture`).
|
|
42
|
+
|
|
43
|
+
**Fix locations:**
|
|
44
|
+
- `autonomy/run.sh` (line ~1366): Set and export `LOKI_CURRENT_TIER` at initialization
|
|
45
|
+
- `autonomy/run.sh` (line ~9424): Update and export `LOKI_CURRENT_TIER` when `CURRENT_TIER` changes each iteration
|
|
46
|
+
|
|
47
|
+
## Bugs Already Fixed (Verified)
|
|
48
|
+
|
|
49
|
+
These bugs were listed in the assignment but had already been resolved in the current codebase:
|
|
50
|
+
|
|
51
|
+
| Bug ID | Description | Status |
|
|
52
|
+
|--------|-------------|--------|
|
|
53
|
+
| BUG-PROV-002 | Generic LOKI_MODEL_* injects invalid Codex models | Fixed: `_codex_validate_model()` in `codex.sh` filters non-Codex model names |
|
|
54
|
+
| BUG-PROV-005 | Provider loader doesn't validate provider exists before sourcing | Fixed: `load_provider()` validates name AND checks file existence |
|
|
55
|
+
| BUG-PROV-007 | auto_detect_provider skips Cline and Aider | Fixed: All 5 providers in priority order |
|
|
56
|
+
| BUG-PROV-009 | Cline model flag word-splitting | Fixed: Array-based `model_args` in `cline.sh` |
|
|
57
|
+
| BUG-PROV-010 | Gemini buffers all output, loses streaming | Fixed: Uses `tee` for streaming |
|
|
58
|
+
| BUG-PROV-012 | Codex resolve_model_for_tier returns effort levels | Fixed: Documented as intentional, callers use correctly |
|
|
59
|
+
| BUG-RUN-010 | Retry counter increments on success | Fixed: `retry=0` reset on success at lines 9851/9897 |
|
|
60
|
+
| BUG-PROV-011 | Parallel dispatch includes Cline despite PARALLEL=false | Fixed: Guard at line 2235 checks `PROVIDER_HAS_PARALLEL` |
|
|
61
|
+
|
|
62
|
+
## Validation
|
|
63
|
+
|
|
64
|
+
### Bash syntax validation (all pass)
|
|
65
|
+
- `bash -n providers/claude.sh` -- OK
|
|
66
|
+
- `bash -n providers/codex.sh` -- OK
|
|
67
|
+
- `bash -n providers/gemini.sh` -- OK
|
|
68
|
+
- `bash -n providers/cline.sh` -- OK
|
|
69
|
+
- `bash -n providers/aider.sh` -- OK
|
|
70
|
+
- `bash -n providers/loader.sh` -- OK
|
|
71
|
+
- `bash -n autonomy/run.sh` -- OK
|
|
72
|
+
|
|
73
|
+
### Edge cases verified
|
|
74
|
+
1. **API key missing**: `check_provider_health()` handles all 5 providers; Claude supports OAuth fallback
|
|
75
|
+
2. **CLI not installed**: All provider detect functions use `command -v` with proper error handling
|
|
76
|
+
3. **Version mismatch**: Provider version functions safely call `--version` with stderr suppression
|
|
77
|
+
4. **Failover chain**: Wraps around correctly using double-iteration with break-on-wrap guard
|
|
78
|
+
5. **Key rotation**: `_gemini_rotate_api_key()` handles single key, wraps around, and returns failure when exhausted
|
|
79
|
+
6. **Frozen model variable**: All Gemini invocation paths now use dynamic resolution
|
|
80
|
+
|
|
81
|
+
## Files Modified
|
|
82
|
+
|
|
83
|
+
| File | Changes |
|
|
84
|
+
|------|---------|
|
|
85
|
+
| `autonomy/run.sh` | BUG-PROV-001 (Gemini model selection), BUG-PROV-003 (health check + auth), BUG-PROV-008 (LOKI_PROVIDER export), LOKI_CURRENT_TIER export |
|
|
86
|
+
| `providers/gemini.sh` | BUG-PROV-003 (API key resolution + rotation functions, auth error handling in invoke functions) |
|
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
# Agent 06: Purple Lab + CLI Integration Fixes
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Investigated and fixed 5 integration bugs between Purple Lab (web-app/server.py) and the loki CLI (autonomy/loki, autonomy/run.sh). All bugs were at the boundary where the web server dispatches to the CLI or reads CLI-produced state files.
|
|
6
|
+
|
|
7
|
+
## Bugs Fixed
|
|
8
|
+
|
|
9
|
+
### BUG-INT-001: Quick-start API doesn't pass provider selection to CLI
|
|
10
|
+
|
|
11
|
+
**File:** `web-app/server.py` (start_session endpoint, line ~2537)
|
|
12
|
+
|
|
13
|
+
**Root cause:** When `req.mode == "quick"`, the command built was `loki quick <description>` without passing the provider. The `--provider` flag was only included in the `else` branch (full `loki start` mode). Since `loki quick` does not accept a `--provider` flag, the fix passes the provider via the `LOKI_PROVIDER` environment variable, which `run.sh` reads at line 665.
|
|
14
|
+
|
|
15
|
+
**Fix:** After constructing `build_env`, inject `LOKI_PROVIDER` from `req.provider` for all modes (both quick and start). This ensures the correct AI provider is used regardless of invocation mode.
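A minimal sketch of the environment injection, assuming the server builds a dict that is later passed to the subprocess as its `env`. The helper name `build_cli_env` and its `base` parameter are hypothetical; `LOKI_PROVIDER` is the variable named in the report.

```python
import os
from typing import Dict, Optional


def build_cli_env(provider: Optional[str], base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and set LOKI_PROVIDER when a provider was chosen."""
    env = dict(os.environ if base is None else base)
    if provider:
        # Applies to every mode, so `loki quick` sees the selection too.
        env["LOKI_PROVIDER"] = provider
    return env
```

The returned dict would be handed to the process launcher, for example as the `env=` argument of `subprocess.Popen`.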
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
### BUG-INT-002: Session state file format mismatch between web and CLI
|
|
20
|
+
|
|
21
|
+
**Files:** `web-app/server.py` (3 locations), `autonomy/run.sh`
|
|
22
|
+
|
|
23
|
+
**Root cause:** The web server read session state from `.loki/state/session.json`, but the CLI never writes that file. The CLI writes:
|
|
24
|
+
- `.loki/dashboard-state.json` (via `write_dashboard_state()` in run.sh) -- contains phase, iteration, complexity, tasks, tokens, agents
|
|
25
|
+
- `.loki/state/orchestrator.json` -- contains currentPhase
|
|
26
|
+
- `.loki/autonomy-state.json` -- contains retryCount, iterationCount, status
|
|
27
|
+
|
|
28
|
+
The web server was reading from a nonexistent file, so status fields (phase, iteration, complexity, cost, pending tasks) were always default values.
|
|
29
|
+
|
|
30
|
+
**Fix:** Changed 3 locations in server.py to read from `dashboard-state.json` (primary) with `state/orchestrator.json` fallback:
|
|
31
|
+
1. `get_status()` endpoint (GET /api/session/status)
|
|
32
|
+
2. `_push_state_to_client()` WebSocket push loop
|
|
33
|
+
3. `_infer_session_status()` for session history
|
|
34
|
+
|
|
35
|
+
Field mapping updated to match `dashboard-state.json` structure:
|
|
36
|
+
- `tasks.pending` instead of `pending_tasks` (nested object)
|
|
37
|
+
- `tasks.inProgress` for current task detection
|
|
38
|
+
- `tokens.cost_usd` for cost (same structure, just different file)
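The primary-plus-fallback read could be sketched as follows. The file paths and the `tasks.pending`, `tokens.cost_usd` and `currentPhase` fields come from the report; the function names, the output keys, the `"idle"` default and the assumption that `tasks.pending` is either a list or a count are illustrative guesses.

```python
import json
from pathlib import Path
from typing import Any, Dict


def _read_json(path: Path) -> Dict[str, Any]:
    """Return parsed JSON, or an empty dict if the file is missing or corrupt."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def read_session_status(project_dir: str) -> Dict[str, Any]:
    loki = Path(project_dir) / ".loki"
    state = _read_json(loki / "dashboard-state.json")
    if not state:
        # Fallback: orchestrator.json only carries the current phase.
        fallback = _read_json(loki / "state" / "orchestrator.json")
        return {"phase": fallback.get("currentPhase", "idle"), "pending_tasks": 0, "cost": 0.0}
    tasks = state.get("tasks") or {}
    tokens = state.get("tokens") or {}
    pending = tasks.get("pending", 0)
    # Assumption: `pending` may be a list of tasks or an already computed count.
    pending_count = len(pending) if isinstance(pending, list) else int(pending)
    return {
        "phase": state.get("phase", "idle"),
        "pending_tasks": pending_count,
        "cost": tokens.get("cost_usd", 0.0),
    }
```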
|
|
39
|
+
|
|
40
|
+
---
|
|
41
|
+
|
|
42
|
+
### BUG-INT-003: WebSocket connection drops during long builds (no reconnect)
|
|
43
|
+
|
|
44
|
+
**Files:** `web-app/src/api/client.ts`, `web-app/server.py`
|
|
45
|
+
|
|
46
|
+
**Root cause:** The server sends a keepalive ping after every 60 seconds of client silence (line 5400). If 2 consecutive pings receive no pong response, the server disconnects the WebSocket (lines 5396-5398). The client's `PurpleLabWebSocket` class parsed incoming messages and emitted events but never handled the `ping` message type; it passed the message through to listeners, none of which handled it. During long builds, the client sends no messages, so the server disconnects after ~120 seconds.
|
|
47
|
+
|
|
48
|
+
The client already had reconnect logic (3-second delay after disconnect), but reconnection during a build causes loss of the log backfill window and a brief UI disruption.
|
|
49
|
+
|
|
50
|
+
**Fix:** Added ping/pong handling in the client's `onmessage` handler. When the client receives a `{type: "ping"}` message, it immediately responds with `{type: "pong"}` via `this.send()`, preventing the server from closing the connection.
|
|
51
|
+
|
|
52
|
+
---
|
|
53
|
+
|
|
54
|
+
### BUG-INT-004: File watcher ignores changes based on absolute path
|
|
55
|
+
|
|
56
|
+
**File:** `web-app/server.py` (FileChangeHandler._should_ignore)
|
|
57
|
+
|
|
58
|
+
**Root cause:** The `_should_ignore` method decomposed the FULL absolute path into parts and checked each part against `_WATCH_IGNORE_DIRS` (which includes "build", "dist", "cache", ".git", etc.). If the project was stored at a path containing any of these directory names (e.g., `/home/user/build/my-project/src/app.js`), ALL file events would be silently ignored.
|
|
59
|
+
|
|
60
|
+
The check should only examine path components RELATIVE to the project directory, since only directories within the project should be filtered.
|
|
61
|
+
|
|
62
|
+
**Fix:** Changed `_should_ignore` to compute `os.path.relpath(path, self.project_dir)` before decomposing into parts. This ensures only project-internal directory names are checked against the ignore list.
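The relative-path check can be illustrated like this. The ignore set below is only the subset of names the report mentions, the function is written as a free function rather than a method, and treating a cross-drive `ValueError` as "ignore" is an assumption about behavior the report does not spell out.

```python
import os

# Subset of the names the report lists for `_WATCH_IGNORE_DIRS`.
WATCH_IGNORE_DIRS = {"build", "dist", "cache", ".git"}


def should_ignore(path: str, project_dir: str) -> bool:
    """Ignore a path only if an ignored name appears inside the project."""
    try:
        rel = os.path.relpath(path, project_dir)
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        # Assumption: such a path is outside the project, so skip it.
        return True
    # Only components below project_dir are compared against the ignore list.
    return any(part in WATCH_IGNORE_DIRS for part in rel.split(os.sep))
```

With the project at `/home/user/build/my-project`, a change to `src/app.js` is now reported, while a change under the project's own `dist/` is still filtered.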
|
|
63
|
+
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
### BUG-INT-005 (NEW): Chat endpoint hardcodes provider as "claude"
|
|
67
|
+
|
|
68
|
+
**File:** `web-app/server.py` (chat_session endpoint, line ~3957)
|
|
69
|
+
|
|
70
|
+
**Root cause:** In the chat endpoint's "max" mode, the command was hardcoded as `[loki, "start", "--provider", "claude", str(prd_path)]`. Users who selected a different provider (codex, gemini) would have their chat commands always routed to Claude. Similarly, "quick" and "standard" modes did not pass any provider information.
|
|
71
|
+
|
|
72
|
+
Additionally, 3 other `loki quick` invocations (monitor auto-fix, Docker service fix, fix endpoint) also had no provider passthrough.
|
|
73
|
+
|
|
74
|
+
**Fix:**
|
|
75
|
+
1. Chat endpoint now reads the provider from `session.provider` and `.loki/state/provider` file
|
|
76
|
+
2. Max mode passes the detected provider to `--provider` flag
|
|
77
|
+
3. Quick/standard modes pass provider via `LOKI_PROVIDER` env var
|
|
78
|
+
4. Fix endpoint (`/api/sessions/{id}/fix`) passes provider via env
|
|
79
|
+
5. Monitor auto-fix (`_auto_fix` method) reads provider from session state
|
|
80
|
+
6. Docker service auto-fix reads provider from session state
|
|
81
|
+
|
|
82
|
+
## Files Modified
|
|
83
|
+
|
|
84
|
+
| File | Changes |
|
|
85
|
+
|------|---------|
|
|
86
|
+
| `web-app/server.py` | BUG-INT-001 through BUG-INT-005: provider passthrough, state file path correction, file watcher relative path |
|
|
87
|
+
| `web-app/src/api/client.ts` | BUG-INT-003: WebSocket ping/pong handler |
|
|
88
|
+
|
|
89
|
+
## Verification
|
|
90
|
+
|
|
91
|
+
- Python syntax validated: `python3 -c "import ast; ast.parse(open('web-app/server.py').read())"`
|
|
92
|
+
- TypeScript changes verified (no new errors beyond pre-existing Vite import.meta issues)
|
|
93
|
+
- All fixes are backward-compatible (fallback to "claude" provider, fallback to orchestrator.json)
|
|
94
|
+
|
|
95
|
+
## Edge Cases Considered
|
|
96
|
+
|
|
97
|
+
1. **Concurrent sessions**: Each chat task creates its own subprocess with its own env, so provider isolation is maintained
|
|
98
|
+
2. **Missing provider file**: Falls back to `session.provider` then to `"claude"` default
|
|
99
|
+
3. **Project directory in ignored path**: Fixed by relative path computation; the `ValueError` that `os.path.relpath` raises for cross-drive paths on Windows is caught
|
|
100
|
+
4. **WebSocket reconnection during build**: Client now responds to pings, preventing premature disconnection; if disconnection still occurs, the 3-second reconnect timer handles recovery
|
|
101
|
+
5. **State file corruption**: All JSON reads wrapped in try/except with fallback defaults
|
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
# Agent 07: Dashboard + run.sh Integration Bug Fixes
|
|
2
|
+
|
|
3
|
+
## Area: Dashboard API (server.py) <-> Orchestrator (run.sh) Integration
|
|
4
|
+
|
|
5
|
+
## Known Bugs -- Verification Status
|
|
6
|
+
|
|
7
|
+
All 7 known bugs (BUG-RUN-001 through BUG-RUN-010) were already patched in the codebase
|
|
8
|
+
with fix comments. Verified each is correctly addressed:
|
|
9
|
+
|
|
10
|
+
| Bug ID | Description | Status |
|
|
11
|
+
|--------|-------------|--------|
|
|
12
|
+
| BUG-RUN-001 | Completion promise checks stale daily log | FIXED (line 9873: uses `$iter_output`) |
|
|
13
|
+
| BUG-RUN-002 | Rate limit detection greps stale daily log | FIXED (line 9910: uses `$iter_output`) |
|
|
14
|
+
| BUG-RUN-003 | ITERATION_COUNT never persisted across restarts | FIXED (line 7964: restored from state) |
|
|
15
|
+
| BUG-RUN-004 | Inconsistent JSON formats in state files | FIXED (queue normalization via jq at line 3308) |
|
|
16
|
+
| BUG-RUN-005 | OpenSpec queue has no deduplication | FIXED (line 8665: `existing_ids` check) |
|
|
17
|
+
| BUG-RUN-009 | Gate escalation PAUSE writes to wrong path | FIXED (line 9804: `touch .loki/PAUSE`) |
|
|
18
|
+
| BUG-RUN-010 | Retry counter increments on success | FIXED (lines 9852, 9898: `retry=0`) |
|
|
19
|
+
|
|
20
|
+
## New Bugs Found and Fixed
|
|
21
|
+
|
|
22
|
+
### BUG-NEW-001: WebSocket push inflates running_agents count
|
|
23
|
+
- **File:** `dashboard/server.py` line 366
|
|
24
|
+
- **Root cause:** `_push_loki_state_loop` counted `len(agents_list)` from the JSON
|
|
25
|
+
without validating PIDs. Dead agents still appeared as running. The REST endpoint
|
|
26
|
+
`get_status` correctly validated each PID with `os.kill(pid, 0)`.
|
|
27
|
+
- **Impact:** Dashboard WebSocket clients show ghost agents that are actually dead.
|
|
28
|
+
- **Fix:** Added PID validation loop matching `get_status` behavior.
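A sketch of the PID validation, using the `os.kill(pid, 0)` probe the report mentions. The helper names, the `pid` key in each agent record and the decision to count a `PermissionError` as "alive" are assumptions for illustration.

```python
import os
from typing import Any, Dict, List


def pid_is_alive(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without signalling."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def count_running_agents(agents: List[Dict[str, Any]]) -> int:
    """Count only agents whose recorded PID still refers to a live process."""
    return sum(
        1 for agent in agents
        if isinstance(agent.get("pid"), int) and pid_is_alive(agent["pid"])
    )
```

Sharing one helper between the REST endpoint and the WebSocket push loop would keep the two counts from drifting apart again.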

### BUG-NEW-002: Dashboard drops tasks in object-format queue files
- **File:** `dashboard/server.py` line 1081
- **Root cause:** `list_tasks` only handled plain array `[...]` queue files. If a queue
  file was written in `{"tasks": [...]}` format (which `load_queue_tasks` in run.sh
  explicitly supports), all tasks were silently dropped.
- **Impact:** Tasks written by external tools using object format are invisible in dashboard.
- **Fix:** Added dict-unwrapping: `raw_items.get("tasks", [])` before array check.
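
The unwrapping step can be sketched as a small normalizer. The function name `extract_tasks` is hypothetical; only the `raw_items.get("tasks", [])` call comes from the fix description.

```python
def extract_tasks(raw_items):
    """Return the task list from either supported queue-file shape."""
    # Object format: {"tasks": [...]} -> unwrap to the inner value.
    if isinstance(raw_items, dict):
        raw_items = raw_items.get("tasks", [])
    # Array format (or unwrapped object): anything that is not a list is
    # treated as "no tasks" instead of raising.
    if not isinstance(raw_items, list):
        return []
    return raw_items
```

With this shape, `extract_tasks([{"id": 1}])` and `extract_tasks({"tasks": [{"id": 1}]})` return the same list, so the rest of the handler needs only one code path.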

### BUG-NEW-003: Per-iteration temp files leak on success paths
- **File:** `autonomy/run.sh` lines 9853 and 9899
- **Root cause:** The success `continue` paths (perpetual mode + normal success) skip
  `rm -f "$iter_output"`. Only the terminal completion paths (council/promise fulfilled)
  and the failure path clean up. Over hundreds of iterations, `.loki/logs/iter-output-*`
  files accumulate.
- **Impact:** Disk space leak proportional to iteration count. Each file contains full
  iteration output (can be MBs).
- **Fix:** Added `rm -f "$iter_output"` before both success `continue` statements.

### BUG-NEW-004: Event JSON emits floats as quoted strings
- **File:** `autonomy/run.sh` line 951
- **Root cause:** `emit_event_json` regex `^[0-9]+$` only matches integers. A value
  like `cost=3.14` is treated as a string and quoted (`"cost":"3.14"`), producing
  JSON that is syntactically valid but carries the wrong type for consumers
  expecting numbers.
- **Impact:** Dashboard/OTEL consumers that parse event JSON get string types for
  float metrics (cost, duration, etc.).
- **Fix:** Changed regex to `^[0-9]+\.?[0-9]*$` to match both integers and floats.
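
The behavior of the new pattern can be checked with a short sketch. It uses Python's `re` module rather than the bash matching in `run.sh` (the two agree for these simple patterns), and `STRICT_PATTERN` is a hypothetical alternative, not part of the fix. One case that may be worth a follow-up: the pattern from the fix also accepts a trailing dot such as `3.`, which is not a valid JSON number.

```python
import json
import re

FIXED_PATTERN = re.compile(r"^[0-9]+\.?[0-9]*$")      # pattern quoted in the fix
STRICT_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)?$")   # hypothetical stricter variant


def render_value(value, pattern):
    """Emit value bare if the pattern calls it numeric, else as a JSON string."""
    return value if pattern.fullmatch(value) else json.dumps(value)


def is_valid_json(fragment):
    """Return True if fragment parses as JSON."""
    try:
        json.loads(fragment)
        return True
    except json.JSONDecodeError:
        return False


for sample in ("42", "3.14", "3.", "abc"):
    fixed = '{"v":' + render_value(sample, FIXED_PATTERN) + "}"
    strict = '{"v":' + render_value(sample, STRICT_PATTERN) + "}"
    print(sample, fixed, is_valid_json(fixed), strict, is_valid_json(strict))
```

Neither pattern rejects leading zeros such as `007`, which JSON also disallows, so this sketch only illustrates the trailing-dot case.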

### BUG-NEW-005: Dashboard stop leaves orphaned iter_output files
- **File:** `dashboard/server.py` line 2907
- **Root cause:** `stop_session` sends SIGTERM and marks session as stopped but does
  not clean up `.loki/logs/iter-output-*` temp files from the killed process.
- **Impact:** Orphaned temp files persist after dashboard-initiated stops.
- **Fix:** Added glob cleanup of `iter-output-*` files after SIGTERM.
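
A glob-based cleanup of this kind might be written as follows. This is a sketch under assumptions: `cleanup_iter_outputs` is a hypothetical helper name, and the real `stop_session` may structure the cleanup differently.

```python
from pathlib import Path


def cleanup_iter_outputs(log_dir):
    """Delete iter-output-* files under log_dir; return how many were removed."""
    removed = 0
    for path in Path(log_dir).glob("iter-output-*"):
        try:
            path.unlink()
            removed += 1
        except OSError:
            # File already gone or not removable; cleanup is best effort.
            pass
    return removed
```

Swallowing `OSError` keeps a failed cleanup from turning a successful stop into an error response.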

### BUG-NEW-006: WebSocket broadcasts stale "running" status after crash
- **File:** `dashboard/server.py` line 382
- **Root cause:** `_push_loki_state_loop` determined status purely from
  `dashboard-state.json`'s `mode` field. If the process crashed (SIGKILL, OOM, etc.),
  the state file still said `"mode": "autonomous"`, so WebSocket clients saw "running"
  indefinitely. The REST `get_status` endpoint correctly cross-checked the PID.
- **Impact:** Dashboard UI shows session as running after crash until next full poll.
- **Fix:** Added PID liveness check before status determination. If PID is dead,
  status is forced to "stopped" regardless of state file contents.

## Integration Points Verified (No Bugs Found)

1. **Pricing tables match:** `_DEFAULT_PRICING` in server.py and `pricing` dict in
   run.sh `check_budget_limit()` have identical rates for all 6 models.

2. **Atomic state writes:** `save_state()` uses temp file + `mv` (atomic rename).
   `write_dashboard_state()` also uses temp + mv. Dashboard uses `_safe_json_read`
   with retry for race protection.

3. **Midnight-crossing:** `parse_claude_reset_time()` handles past-time correctly by
   adding 86400 seconds. No midnight bug.

4. **Session lifecycle:** `stop_session` creates STOP file + SIGTERM, `pause_session`
   creates PAUSE file, `resume_session` removes both. All match run.sh's
   `check_human_intervention()` expectations.

5. **Budget enforcement:** Both dashboard `/api/cost` and run.sh `check_budget_limit()`
   read from `.loki/metrics/efficiency/*.json` with matching cost calculation logic.
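
The midnight-crossing rule in item 3 (add 86400 seconds when the reset time has already passed today) can be illustrated with a small worked example. The function below is a sketch, not `parse_claude_reset_time()` itself: the name and signature are assumptions, and it uses naive datetimes, so DST changes are ignored.

```python
from datetime import datetime


def seconds_until_reset(now, reset_hour, reset_minute):
    """Seconds from now until the next occurrence of reset_hour:reset_minute."""
    reset_today = now.replace(hour=reset_hour, minute=reset_minute,
                              second=0, microsecond=0)
    delta = int((reset_today - now).total_seconds())
    if delta < 0:
        # The reset time has already passed today, so it refers to tomorrow.
        delta += 86400
    return delta
```

At 23:30 with a reset at 00:15, the raw difference is -83700 seconds; adding 86400 gives 2700 seconds, i.e. the expected 45-minute wait.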

## Files Modified

- `autonomy/run.sh` -- 3 fixes (BUG-NEW-003 x2, BUG-NEW-004)
- `dashboard/server.py` -- 4 fixes (BUG-NEW-001, BUG-NEW-002, BUG-NEW-005, BUG-NEW-006)

## Validation

- `bash -n autonomy/run.sh` -- PASS
- `python3 -c "import ast; ast.parse(open('dashboard/server.py').read())"` -- PASS
@@ -0,0 +1,164 @@
# Agent 08: Docker + Self-Healing Integration Testing

## Summary

Audited Dockerfile, Dockerfile.sandbox, docker-compose.yml, healing system (`cmd_heal()`),
migration hooks (`migration-hooks.sh`), and state management in `run.sh`. Fixed 6 bugs
(2 known, 4 newly discovered).

---

## Bugs Fixed

### BUG-DK-002: docker-compose loki service missing health check (FIXED)

**File:** `docker-compose.yml`

**Problem:** The `loki` service had no health check defined. The bug list described the
issue as "hits wrong endpoint", but the actual issue was that the loki service had no
health check configuration at all. Only the ChromaDB service had one.

**Fix:** Added a health check to the loki service that first tries the dashboard `/health`
endpoint (for when the dashboard is running), with a fallback to `loki version` (for when
only the CLI is active). Also updated the version comment from v6.38.0 to v6.71.1.

```yaml
healthcheck:
  test: ["CMD-SHELL", "curl -sf http://localhost:57374/health >/dev/null 2>&1 || loki version >/dev/null 2>&1"]
  interval: 30s
  timeout: 10s
  start_period: 10s
  retries: 3
```

---

### BUG-HEAL-002: Healing phase gate doesn't validate phase transitions (FIXED)

**File:** `autonomy/hooks/migration-hooks.sh`

**Problem:** `hook_healing_phase_gate()` used a `case` statement with only valid transitions
listed. Any invalid transition (backwards, skipping phases, unknown phases) fell through
the case and returned 0 (success), silently allowing dangerous operations like jumping
from `archaeology` directly to `modernize`.

**Fix:** Added phase ordering validation before the case statement. The function now:
1. Validates both `from_phase` and `to_phase` are known phases
2. Rejects backward transitions (e.g., `modernize` -> `archaeology`)
3. Rejects phase skipping (e.g., `archaeology` -> `modernize` skipping `stabilize`/`isolate`)
4. Only allows forward transitions to the immediately next phase
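
The four rules above amount to an index comparison over an ordered phase list. The sketch below shows that logic in Python for readability; the real hook is a bash function. It lists only the four phases named in this report, in the order the skip example implies, whereas the actual hook covers five; the function name and return shape are assumptions.

```python
# Only the four phases named in this report; the real hook knows five.
PHASES = ["archaeology", "stabilize", "isolate", "modernize"]


def check_phase_transition(from_phase, to_phase):
    """Return (allowed, reason) for a requested phase transition."""
    if from_phase not in PHASES or to_phase not in PHASES:
        return False, "unknown phase"
    step = PHASES.index(to_phase) - PHASES.index(from_phase)
    if step < 0:
        return False, "backward transition"
    if step > 1:
        return False, "skips intermediate phases"
    if step == 0:
        # Assumption: staying in the same phase is not a valid transition.
        return False, "no phase change"
    return True, "ok"
```

Validating by position rather than enumerating allowed pairs means an unlisted combination is rejected by default instead of falling through as success.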

---

### BUG-HEAL-003: cmd_heal() provider case missing default clause (NEW - FIXED)

**File:** `autonomy/loki`

**Problem:** The `case "$provider"` statement in `cmd_heal()` (around line 9298) had no
default `*)` clause. If an unknown provider was specified (e.g., `loki heal ./app --provider foo`),
the case silently fell through, `heal_exit` stayed 0, and the user received a false
"Healing phase complete" success message.

**Fix:** Added a `*)` default clause that prints an error with supported providers and
returns 1.

---

### BUG-HEAL-004: Migration hooks never sourced in healing flow (NEW - FIXED)

**File:** `autonomy/loki`

**Problem:** `autonomy/hooks/migration-hooks.sh` was never sourced by either `autonomy/loki`
or `autonomy/run.sh`. This meant all healing hooks (`hook_pre_healing_modify()`,
`hook_post_healing_modify()`, `hook_healing_phase_gate()`) were dead code -- they existed
but were never called during actual healing operations. The only consumer was the test file
`tests/test-migration-v2.sh`.

**Fix:** Added sourcing of `migration-hooks.sh` in `cmd_heal()` with:
1. Source the hooks file using `BASH_SOURCE[0]` relative path resolution
2. Call `load_migration_hook_config()` to load project-specific hook configuration
3. Export healing environment variables (`LOKI_HEAL_MODE`, `LOKI_HEAL_PHASE`, etc.)
4. Invoke `hook_healing_phase_gate()` when `--resume` is used with a different phase

---

### BUG-ST-013: save_state() doesn't ensure .loki directory exists (NEW - FIXED)

**File:** `autonomy/run.sh`

**Problem:** `save_state()` writes to `.loki/autonomy-state.json` but doesn't ensure the
`.loki` directory exists. While normally created by `initialize_workspace()`, signal handlers
could call `save_state()` before initialization completes, causing a silent failure.

**Fix:** Added defensive `mkdir -p .loki 2>/dev/null || true` at the start of `save_state()`.

---

### BUG-ST-014: Non-atomic current-task.json writes (NEW - FIXED)

**File:** `autonomy/run.sh`

**Problem:** `current-task.json` was written with direct `echo ... > file` (lines 3631, 3815),
outside the flock-protected section. This could cause partial reads if the dashboard or
another process reads the file mid-write. Other state files (e.g., `autonomy-state.json`,
`session.json`) already used atomic temp-file + mv patterns.

**Fix:** Both writes now use `echo ... > tmpfile && mv -f tmpfile target` atomic pattern,
consistent with BUG-XC-004 and BUG-ST-008 patterns elsewhere in the codebase.
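
The temp-file + rename pattern is the same in any language. The sketch below expresses it in Python for a reader on the dashboard side; `write_json_atomic` is a hypothetical helper, not a function from the codebase. The key assumption is that the temp file lives in the same directory as the target, so the rename never crosses filesystems.

```python
import json
import os
import tempfile


def write_json_atomic(target_path, payload):
    """Write payload as JSON so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so the final rename stays on
    # one filesystem, where it is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
        os.replace(tmp_path, target_path)  # same effect as `mv -f tmp target`
    except BaseException:
        # Do not leave the temp file behind if anything failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A concurrent reader sees either the old file or the new one in full, which is the property the direct `echo ... > file` writes lacked.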

---

## Bugs Verified as Already Fixed

### BUG-DK-001: Dockerfile COPY dashboard/ missing pip install

Both `Dockerfile` (lines 89-90) and `Dockerfile.sandbox` (lines 180-181) already include
`pip3 install --no-cache-dir --break-system-packages -r dashboard/requirements.txt`.
No fix needed.

### BUG-DK-003: Sandbox Dockerfile doesn't install bash 5

Verified: Debian bookworm-slim (used by Dockerfile.sandbox) ships bash 5.2.15.
Ubuntu 24.04 (used by Dockerfile) ships bash 5.2.21. Both support associative arrays
and parallel mode. No fix needed.

### BUG-HEAL-001: cmd_heal() doesn't create .loki/healing/ directory before writing

Verified: `cmd_heal()` creates the directory at line 9201 with
`mkdir -p "$heal_dir"/{behavioral-baseline,characterization-tests}` before any writes.
The `--status`, `--report`, and `--friction-map` subcommands only read (never write)
and properly check for directory/file existence. No fix needed.

---

## Additional Findings (Not Fixed -- Low Priority)

### Non-atomic writes in initialization

Several state files written during `initialize_workspace()` use direct `cat > file` patterns
(e.g., `orchestrator.json` at line 2955, `budget.json` at line 2980). These are safe because
initialization runs once before any concurrent access, but could be hardened for robustness.

### Phase skip via --phase flag without --resume

Users can run `loki heal ./app --phase modernize` and skip prior phases. This is by design
(expert override), but could be surprising. A warning message when starting at a non-archaeology
phase without prior healing data could improve UX.

---

## Files Modified

| File | Changes |
|------|---------|
| `docker-compose.yml` | Added loki service health check, updated version comment |
| `autonomy/hooks/migration-hooks.sh` | Added phase transition ordering validation |
| `autonomy/loki` | Added default provider clause, sourced hooks, added phase gate check on resume |
| `autonomy/run.sh` | Defensive mkdir in save_state(), atomic current-task.json writes |

## Validation

- All 3 modified shell scripts pass `bash -n` syntax validation
- `docker-compose.yml` passes YAML validation with correct structure
- Health check uses fallback pattern (curl || loki version) for resilience
- Phase gate validation tested against all 5 phases with forward, backward, and skip scenarios
@@ -0,0 +1,69 @@
# Agent 09: Full Build E2E Testing - Bug Fixes

## Pipeline Traced

Complete flow from prompt submission to preview:

1. `POST /api/session/quick-start` (web-app/server.py) -> validates, generates PRD
2. `start_session()` -> spawns `loki start` via Popen with merged stdout/stderr
3. `_read_process_output()` -> reads lines, broadcasts via WebSocket
4. `loki start` -> `cmd_start()` (autonomy/loki) -> `run_autonomous()` (autonomy/run.sh)
5. RARV loop: `build_prompt()` -> provider invocation -> quality gates -> iterate
6. File watcher detects changes -> broadcasts `file_changed` -> frontend refreshes
7. Chat iteration: `POST /api/sessions/{id}/chat` -> `loki quick` in project dir

## Known Bugs Fixed

### BUG-E2E-001: Quick-start empty/short prompt validation
- **File**: `web-app/server.py` (line ~2600)
- **Problem**: Quick-start accepted any non-empty prompt (even 1 character), leading to degenerate builds
- **Fix**: Added minimum 3-character validation after trim. Empty strings were already caught, but trivial strings like "a" could still trigger a full build pipeline.

### BUG-E2E-002: Build output loses ordering
- **File**: `web-app/server.py` (`_read_process_output`, WebSocket backfill)
- **Problem**: Log lines broadcast via WebSocket had no sequence number, making it impossible for the frontend to detect gaps or reorder after reconnection.
- **Root cause**: stdout/stderr were already merged at OS level via `stderr=subprocess.STDOUT` (so pipe ordering is correct), but WebSocket reconnection could cause the frontend to miss lines with no way to detect the gap.
- **Fix**: Added `seq` field (using `session.log_lines_total`) to every log broadcast and backfill message. Frontend can now detect missed lines and request backfill.
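
Gap detection on the receiving side reduces to comparing the incoming `seq` with the last one seen. The sketch below shows the idea in Python, although the actual consumer is the TypeScript frontend. It assumes `seq` increases by exactly 1 per log line, and `find_missing_seqs` is a hypothetical name.

```python
def find_missing_seqs(last_seen, incoming_seq):
    """Return the sequence numbers skipped between two received messages."""
    if incoming_seq <= last_seen:
        return []  # duplicate or replayed line; nothing is missing
    return list(range(last_seen + 1, incoming_seq))
```

For example, receiving `seq` 8 after 4 yields `[5, 6, 7]`, which is the range a client would request as backfill after a reconnect.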

### BUG-E2E-003: Preview iframe doesn't reload when files change
- **File**: `web-app/src/components/ProjectWorkspace.tsx` (line ~573)
- **Problem**: File change events only triggered an iframe reload when no dev server was running (`!devServer?.running`). When a dev server was running, the preview never reloaded, even for non-HMR servers (Express, Flask, static servers).
- **Fix**: Now reloads the iframe for all non-HMR frameworks. HMR-capable frameworks (react, vite, next, nuxt, svelte, remix) are excluded since they handle live reload natively.

### BUG-E2E-004: Chat iteration doesn't pass previous context to AI
- **Files**: `web-app/server.py` (ChatRequest model, chat handler), `web-app/src/api/client.ts`, `web-app/src/components/AIChatPanel.tsx`
- **Problem**: Each chat message was sent to the AI in isolation. The `loki quick` command had no awareness of what was previously discussed, making iterative development frustrating (user had to repeat context).
- **Fix**:
  1. Added `history` field to ChatRequest model (optional list of {role, content})
  2. Frontend now sends last 10 messages as conversation history
  3. Server injects history as "PREVIOUS CONVERSATION CONTEXT" prefix to the prompt
  4. Long assistant responses truncated to 500 chars to avoid token bloat
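
The history injection in steps 3 and 4 might be assembled along these lines. This is a sketch, not the handler in `web-app/server.py`: the function name, the line format, the `...` truncation marker, and the `CURRENT REQUEST:` separator are all assumptions; only the prefix label, the 10-message window, and the 500-character cap come from the fix description.

```python
MAX_HISTORY_MESSAGES = 10   # from the fix description
MAX_ASSISTANT_CHARS = 500   # from the fix description


def build_prompt_with_history(message, history):
    """Prefix message with a trimmed transcript of the recent conversation."""
    if not history:
        return message
    lines = ["PREVIOUS CONVERSATION CONTEXT:"]
    for entry in history[-MAX_HISTORY_MESSAGES:]:
        role = entry.get("role", "user")
        content = entry.get("content", "")
        # Only assistant replies are truncated; user messages are kept whole.
        if role == "assistant" and len(content) > MAX_ASSISTANT_CHARS:
            content = content[:MAX_ASSISTANT_CHARS] + "..."
        lines.append(f"{role}: {content}")
    lines.append("")
    lines.append("CURRENT REQUEST:")  # hypothetical separator
    lines.append(message)
    return "\n".join(lines)
```

Re-applying the window on the server guards against clients that send more than the expected 10 messages.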

### BUG-RUN-001/002: Midnight crossing bugs (already fixed, verified)
- **File**: `autonomy/run.sh` (line ~9366)
- **Status**: Already fixed in previous commit. Uses per-iteration `iter_output` temp file instead of daily `log_file` for completion promise checks and rate limit detection.

## New Bugs Discovered and Fixed

### BUG-E2E-005: iter_output temp file leak on success path
- **File**: `autonomy/run.sh` (lines ~9852, ~9899)
- **Problem**: The per-iteration output file (`iter_output`) was cleaned up only on the error/retry path (line 9952) and completion paths (lines 9867, 9882). The normal success path (line 9899) did `continue` without cleanup, leaking a temp file per successful iteration.
- **Impact**: Long-running sessions would accumulate `iter-output-XXXXXX` files in `.loki/logs/`, one per iteration. A 100-iteration session would leak ~100 temp files.
- **Fix**: Added `rm -f "$iter_output"` before `continue` on both success paths (perpetual mode at line 9852 and normal success at line 9899).

### BUG-E2E-006: Provider validation missing on request models
- **File**: `web-app/server.py` (StartRequest, QuickStartRequest models)
- **Problem**: The `provider` field on StartRequest and QuickStartRequest accepted any string. An unknown provider like `"evil"` would be passed to `loki start --provider evil`, which would fail inside run.sh, but only after wasting resources on spawning a process.
- **Fix**: Added `@field_validator("provider")` that validates against the known set: claude, codex, gemini, cline, aider.

### BUG-E2E-007: ChatRequest message not validated
- **File**: `web-app/server.py` (ChatRequest model)
- **Problem**: The `message` field had no validation. An empty string or a 10MB message could be sent, causing either a useless `loki quick ""` invocation or excessive memory usage.
- **Fix**: Added `@field_validator("message")` that rejects empty messages and enforces a 100KB limit.
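
The checks behind the validators in BUG-E2E-006 and BUG-E2E-007 can be sketched as plain functions, independent of the request models. Several details are assumptions: the function and constant names, that the 100KB limit is counted in UTF-8 bytes, and that whitespace-only messages count as empty. In the real code these checks live inside `@field_validator` methods, where raising `ValueError` is the usual way to signal a validation failure.

```python
KNOWN_PROVIDERS = {"claude", "codex", "gemini", "cline", "aider"}  # from BUG-E2E-006
MAX_MESSAGE_BYTES = 100 * 1024  # assumes the 100KB limit is counted in UTF-8 bytes


def validate_provider(value):
    """Reject providers outside the known set before a process is spawned."""
    if value not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {value!r}")
    return value


def validate_message(value):
    """Reject empty (or whitespace-only) and oversized chat messages."""
    if not value.strip():
        raise ValueError("message must not be empty")
    if len(value.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds 100KB")
    return value
```

Rejecting bad input at the request boundary avoids spawning `loki` only to have it fail later.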

## Verification

- Python syntax: `ast.parse()` passes for `web-app/server.py`
- Bash syntax: `bash -n autonomy/run.sh` passes
- TypeScript: No new errors introduced (pre-existing errors are all from missing node_modules)
- All fixes are backward compatible (new fields are optional, new validations reject previously invalid input)