loki-mode 6.71.0 → 6.72.0
- package/README.md +9 -1
- package/SKILL.md +2 -2
- package/VERSION +1 -1
- package/autonomy/hooks/migration-hooks.sh +26 -0
- package/autonomy/loki +429 -92
- package/autonomy/run.sh +219 -38
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +101 -19
- package/docs/INSTALLATION.md +20 -11
- package/docs/bug-fixes/agent-01-cli-fixes.md +101 -0
- package/docs/bug-fixes/agent-02-purplelab-fixes.md +88 -0
- package/docs/bug-fixes/agent-03-dashboard-fixes.md +119 -0
- package/docs/bug-fixes/agent-04-memory-fixes.md +105 -0
- package/docs/bug-fixes/agent-05-provider-fixes.md +86 -0
- package/docs/bug-fixes/agent-06-integration-fixes.md +101 -0
- package/docs/bug-fixes/agent-07-dash-run-fixes.md +101 -0
- package/docs/bug-fixes/agent-08-docker-fixes.md +164 -0
- package/docs/bug-fixes/agent-09-e2e-build-fixes.md +69 -0
- package/docs/bug-fixes/agent-10-e2e-fullstack-fixes.md +102 -0
- package/docs/bug-fixes/agent-11-e2e-session-fixes.md +70 -0
- package/docs/bug-fixes/agent-12-scenario-fixes.md +120 -0
- package/docs/bug-fixes/agent-13-enterprise-fixes.md +143 -0
- package/docs/bug-fixes/agent-14-uat-newuser-fixes.md +88 -0
- package/docs/bug-fixes/agent-15-uat-poweruser-fixes.md +132 -0
- package/docs/bug-fixes/agent-19-code-review.md +316 -0
- package/docs/bug-fixes/agent-20-architecture-review.md +331 -0
- package/docs/competitive/bolt-new-analysis.md +579 -0
- package/docs/competitive/emergence-others-analysis.md +605 -0
- package/docs/competitive/replit-lovable-analysis.md +622 -0
- package/docs/test-scenarios/edge-cases.md +813 -0
- package/docs/test-scenarios/enterprise-scenarios.md +732 -0
- package/mcp/__init__.py +1 -1
- package/mcp/server.py +49 -5
- package/memory/consolidation.py +33 -0
- package/memory/embeddings.py +10 -1
- package/memory/engine.py +83 -38
- package/memory/retrieval.py +36 -0
- package/memory/storage.py +56 -4
- package/memory/token_economics.py +14 -2
- package/memory/vector_index.py +36 -7
- package/package.json +1 -1
- package/providers/gemini.sh +89 -2
- package/templates/README.md +1 -1
- package/templates/cli-tool.md +30 -0
- package/templates/dashboard.md +4 -0
- package/templates/data-pipeline.md +4 -0
- package/templates/discord-bot.md +47 -0
- package/templates/game.md +4 -0
- package/templates/microservice.md +4 -0
- package/templates/npm-library.md +4 -0
- package/templates/rest-api-auth.md +50 -20
- package/templates/rest-api.md +15 -0
- package/templates/saas-starter.md +1 -1
- package/templates/slack-bot.md +36 -0
- package/templates/static-landing-page.md +9 -1
- package/templates/web-scraper.md +4 -0
- package/web-app/dist/assets/Badge-CeBkFjo6.js +1 -0
- package/web-app/dist/assets/Button-yuhqo8Fq.js +1 -0
- package/web-app/dist/assets/{Card-BMw7NSaV.js → Card-BG17vsX0.js} +1 -1
- package/web-app/dist/assets/{HomePage-QyvNpyFv.js → HomePage-BMSQ7Apj.js} +3 -3
- package/web-app/dist/assets/{LoginPage-CG_DkANw.js → LoginPage-aH_6iolg.js} +1 -1
- package/web-app/dist/assets/{NotFoundPage-CHBJTLTi.js → NotFoundPage-Di8cNtB1.js} +1 -1
- package/web-app/dist/assets/ProjectPage-BtRssmw9.js +285 -0
- package/web-app/dist/assets/ProjectsPage-B-FTFagc.js +6 -0
- package/web-app/dist/assets/{SettingsPage-Dq-c6kXj.js → SettingsPage-DIJPBla4.js} +1 -1
- package/web-app/dist/assets/TeamsPage--19fNX7w.js +36 -0
- package/web-app/dist/assets/TemplatesPage-ChUQNOOv.js +11 -0
- package/web-app/dist/assets/TerminalOutput-Dwrzecyl.js +31 -0
- package/web-app/dist/assets/activity-BNRWeu9N.js +6 -0
- package/web-app/dist/assets/{arrow-left-Dw9yRwL8.js → arrow-left-Ce6g1_YE.js} +1 -1
- package/web-app/dist/assets/circle-alert-LIndawHL.js +11 -0
- package/web-app/dist/assets/clock-Bpj4VPlP.js +6 -0
- package/web-app/dist/assets/{external-link-DGtaQZrg.js → external-link-BhhdF0iQ.js} +1 -1
- package/web-app/dist/assets/folder-open-CM2LgfxI.js +11 -0
- package/web-app/dist/assets/index-8-KpWWq7.css +1 -0
- package/web-app/dist/assets/index-kPDW4e_b.js +236 -0
- package/web-app/dist/assets/lock-sAk3Xe54.js +16 -0
- package/web-app/dist/assets/search-CR-2i9by.js +6 -0
- package/web-app/dist/assets/server-DuFh4ymA.js +26 -0
- package/web-app/dist/assets/trash-2-BmkkT8V_.js +11 -0
- package/web-app/dist/index.html +2 -2
- package/web-app/server.py +1345 -55
- package/web-app/dist/assets/Badge-BFLpnFZM.js +0 -6
- package/web-app/dist/assets/Button-BYY9clv_.js +0 -16
- package/web-app/dist/assets/ProjectPage-q65bhy76.js +0 -217
- package/web-app/dist/assets/ProjectsPage-d4mY9ewI.js +0 -21
- package/web-app/dist/assets/TemplatesPage-BEpY-p-Q.js +0 -1
- package/web-app/dist/assets/TerminalOutput-CFy7MnPO.js +0 -51
- package/web-app/dist/assets/clock-D4pcK_Eq.js +0 -11
- package/web-app/dist/assets/index-BnNomb7B.js +0 -196
- package/web-app/dist/assets/index-D452pFGl.css +0 -1
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
# Agent 15: Power User Acceptance Testing - Bug Fixes
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Tested advanced CLI features, MCP server, provider switching, parallel workflows,
|
|
6
|
+
memory system, telemetry, and agent dispatch. Found and fixed 13 bugs across 3 files.
|
|
7
|
+
|
|
8
|
+
## Files Modified
|
|
9
|
+
|
|
10
|
+
- `autonomy/loki` - CLI wrapper (11 fixes)
|
|
11
|
+
- `autonomy/run.sh` - Orchestration engine (1 fix)
|
|
12
|
+
- `mcp/server.py` - MCP server (2 fixes)
|
|
13
|
+
|
|
14
|
+
## Bugs Fixed
|
|
15
|
+
|
|
16
|
+
### BUG-PU-001: Worktree creation doesn't clean up on failure
|
|
17
|
+
**File:** `autonomy/run.sh` (create_worktree function)
|
|
18
|
+
**Problem:** When `git worktree add` fails, partial worktree directories and orphaned
|
|
19
|
+
branches are left behind. The `loki worktree clean` command also didn't delete
|
|
20
|
+
associated parallel branches, causing branch pollution over time.
|
|
21
|
+
**Fix:** Added cleanup of partial worktree directory and orphaned branch on creation
|
|
22
|
+
failure. Enhanced `loki worktree clean` to track and delete associated parallel
|
|
23
|
+
branches, and added `git worktree prune` call.
|
|
24
|
+
|
|
25
|
+
### BUG-PU-002: MCP server loses connection after idle timeout
|
|
26
|
+
**File:** `mcp/server.py`
|
|
27
|
+
**Problem:** The StateManager singleton was never refreshed when the working directory
|
|
28
|
+
changed (e.g., user switches projects). ChromaDB reconnection lacked a heartbeat
|
|
29
|
+
verification after reconnect.
|
|
30
|
+
**Fix:** `_get_mcp_state_manager()` now detects when the project root directory has
|
|
31
|
+
changed and recreates the StateManager. ChromaDB `_get_chroma_collection()` now
|
|
32
|
+
verifies heartbeat after reconnect and properly nulls both client and collection
|
|
33
|
+
on failure.
|
|
34
|
+
|
|
35
|
+
### BUG-PU-003: `loki agent run` doesn't properly pass agent definitions
|
|
36
|
+
**File:** `autonomy/loki` (cmd_agent run/start)
|
|
37
|
+
**Problem:** The persona was concatenated with the user prompt using a single space,
|
|
38
|
+
making it impossible for the AI to distinguish the role instruction from the task.
|
|
39
|
+
Additionally, `loki agent start` created temp PRDs in /tmp that were never cleaned
|
|
40
|
+
up because `cmd_start` uses `exec` (replaces the process).
|
|
41
|
+
**Fix:** Restructured the prompt with clear section headers ("You are acting as the
|
|
42
|
+
following specialist agent:" / "USER TASK:") separated by delimiters. Changed temp
|
|
43
|
+
PRD location from /tmp to .loki/ directory so it persists alongside the project.
|
|
44
|
+
|
|
45
|
+
### BUG-PU-004: `loki telemetry status` fails silently when Jaeger is down
|
|
46
|
+
**File:** `autonomy/loki` (cmd_telemetry status)
|
|
47
|
+
**Problem:** The status command showed the configured endpoint but never tested
|
|
48
|
+
whether the collector was actually reachable. Users had no way to diagnose
|
|
49
|
+
connectivity issues without manual curl commands.
|
|
50
|
+
**Fix:** Added a connectivity check that sends a curl request to the `/v1/traces`
|
|
51
|
+
endpoint with a 3-second timeout. Displays "YES", "NO (connection failed)", or
|
|
52
|
+
HTTP status code feedback.
|
|
53
|
+
|
|
54
|
+
### BUG-PU-005: Multiple simultaneous `loki quick` commands corrupt shared state
|
|
55
|
+
**File:** `autonomy/loki` (cmd_quick)
|
|
56
|
+
**Problem:** All `loki quick` invocations wrote to the same `$LOKI_DIR/quick-prd.md`
|
|
57
|
+
file. Running two concurrent `loki quick` commands in the same project would cause
|
|
58
|
+
one to overwrite the other's PRD before `exec` was called.
|
|
59
|
+
**Fix:** Changed the quick PRD filename to include `$$` (PID), making it
|
|
60
|
+
`$LOKI_DIR/quick-prd-$$.md` so each invocation uses a unique file.
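The same idea can be sketched in Python, with `os.getpid()` standing in for the shell's `$$`. The helper name `quick_prd_path` and the example PIDs are hypothetical and are not part of the package.

```python
import os
from pathlib import Path
from typing import Optional


def quick_prd_path(loki_dir: Path, pid: Optional[int] = None) -> Path:
    """Build a per-process PRD path, mirroring `$LOKI_DIR/quick-prd-$$.md`."""
    # os.getpid() plays the role of `$$` in the shell version.
    process_id = os.getpid() if pid is None else pid
    return loki_dir / f"quick-prd-{process_id}.md"


# Two processes that are alive at the same time never share a PID,
# so their PRD paths differ.
first = quick_prd_path(Path(".loki"), pid=4101)
second = quick_prd_path(Path(".loki"), pid=4102)
```

PIDs can be reused after a process exits, so this prevents collisions between concurrent invocations, not between a new run and a stale file from an old one.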
|
|
61
|
+
|
|
62
|
+
### BUG-PU-006: `loki worktree merge` uses `exit 1` instead of `return 1`
|
|
63
|
+
**File:** `autonomy/loki` (cmd_worktree merge)
|
|
64
|
+
**Problem:** On invalid merge signal file, the command called `exit 1` which
|
|
65
|
+
terminated the entire shell process instead of just the subcommand.
|
|
66
|
+
**Fix:** Changed `exit 1` to `return 1`.
|
|
67
|
+
|
|
68
|
+
### BUG-PU-007: `loki audit log/count` use `exit 0` instead of `return 0`
|
|
69
|
+
**File:** `autonomy/loki` (cmd_audit log, count, scan)
|
|
70
|
+
**Problem:** When the audit log file doesn't exist, `exit 0` was used instead of
|
|
71
|
+
`return 0`, killing the entire CLI process. Same issue in `cmd_audit scan` with
|
|
72
|
+
multiple `exit` calls on error paths.
|
|
73
|
+
**Fix:** Replaced all `exit 0` and `exit 1` with `return 0` and `return 1`
|
|
74
|
+
respectively in `cmd_audit` subcommands.
|
|
75
|
+
|
|
76
|
+
### BUG-PU-008: `loki worktree merge` doesn't validate branch existence
|
|
77
|
+
**File:** `autonomy/loki` (cmd_worktree merge)
|
|
78
|
+
**Problem:** If the branch extracted from the merge signal file was empty or didn't
|
|
79
|
+
exist (e.g., already cleaned up), `git merge --no-ff ""` would fail with a confusing
|
|
80
|
+
error message.
|
|
81
|
+
**Fix:** Added validation that the branch name is non-empty and that the branch
|
|
82
|
+
exists via `git rev-parse --verify` before attempting the merge.
|
|
83
|
+
|
|
84
|
+
### BUG-PU-010: `loki memory search` has shell/Python injection via heredoc
|
|
85
|
+
**File:** `autonomy/loki` (cmd_memory search)
|
|
86
|
+
**Problem:** The search query was embedded directly in a Python heredoc using
|
|
87
|
+
`query = """$query"""`. A query containing triple-quotes or other Python syntax
|
|
88
|
+
could break out of the string and execute arbitrary code.
|
|
89
|
+
**Fix:** Changed to pass the query via `LOKI_MEM_QUERY` environment variable with
|
|
90
|
+
a quoted heredoc delimiter (`'PYEOF'`) to prevent all shell expansion.
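A minimal sketch of why the environment-variable approach is safer, written in Python for illustration. Only `LOKI_MEM_QUERY` comes from the report; `run_search`, `CHILD_SOURCE`, and the hostile query are assumptions made for this example.

```python
import os
import subprocess
import sys

# Hypothetical hostile query: the triple quotes end a `"""..."""` literal early.
hostile_query = 'x""" + str(1 + 1) + """'

# Unsafe pattern: the query text becomes part of the Python source.
namespace = {}
exec(f'query = """{hostile_query}"""', namespace)
injected_result = namespace["query"]  # the embedded expression was evaluated

# Safe pattern: the child source is a fixed string and never contains the query.
CHILD_SOURCE = 'import os; print(os.environ["LOKI_MEM_QUERY"])'


def run_search(query: str) -> str:
    """Run the fixed child script, passing the query through the environment."""
    env = dict(os.environ, LOKI_MEM_QUERY=query)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")


safe_result = run_search(hostile_query)
```

In the unsafe path the query is parsed as code, so `str(1 + 1)` runs. In the safe path the query is only ever data read from `os.environ`.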
|
|
91
|
+
|
|
92
|
+
### BUG-PU-011: `loki telemetry enable` silently overwrites config on python3 failure
|
|
93
|
+
**File:** `autonomy/loki` (cmd_telemetry enable)
|
|
94
|
+
**Problem:** The `if/else/fi` structure around the heredoc meant that if python3
|
|
95
|
+
failed to parse the existing config (e.g., corrupted JSON), the `else` branch would
|
|
96
|
+
execute and overwrite the entire config file with only the endpoint setting,
|
|
97
|
+
destroying all other configuration.
|
|
98
|
+
**Fix:** Restructured to use `if ! python3 ... then ... fi` pattern, so python3
|
|
99
|
+
failure produces a warning and explicitly recreates the config, rather than silently
|
|
100
|
+
falling through.
|
|
101
|
+
|
|
102
|
+
### BUG-PU-012: `loki memory show` and `clear` use `exit 1` on unknown type
|
|
103
|
+
**File:** `autonomy/loki` (cmd_memory show, clear)
|
|
104
|
+
**Problem:** `exit 1` on invalid memory type kills the entire process.
|
|
105
|
+
**Fix:** Changed to `return 1`.
|
|
106
|
+
|
|
107
|
+
### BUG-MCP-006: `mem_search` ignores `collection` parameter
|
|
108
|
+
**File:** `mcp/server.py` (mem_search tool)
|
|
109
|
+
**Problem:** The `collection` parameter (episodes, patterns, skills, all) was
|
|
110
|
+
accepted in the function signature but never used to filter results.
|
|
111
|
+
`retrieve_task_aware` always returned results from all collections regardless
|
|
112
|
+
of what the user requested.
|
|
113
|
+
**Fix:** Added a type-mapping filter that maps collection names to internal types
|
|
114
|
+
(episodes -> episode, patterns -> pattern, skills -> skill) and filters results
|
|
115
|
+
after retrieval.
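A post-retrieval filter of this kind might look like the sketch below. The mapping follows the description above; the function name `filter_by_collection` and the assumption that each result is a dict with a `type` key are hypothetical and may not match `mcp/server.py`.

```python
from typing import Any

# Collection name -> internal type, as described in the fix.
COLLECTION_TO_TYPE = {"episodes": "episode", "patterns": "pattern", "skills": "skill"}


def filter_by_collection(results: list[dict[str, Any]], collection: str) -> list[dict[str, Any]]:
    """Keep only results whose type matches the requested collection."""
    if collection == "all":
        return list(results)
    wanted = COLLECTION_TO_TYPE.get(collection)
    if wanted is None:
        raise ValueError(f"unknown collection: {collection!r}")
    # Assumption: each result carries its internal type under the "type" key.
    return [r for r in results if r.get("type") == wanted]


hits = [
    {"id": "e1", "type": "episode"},
    {"id": "p1", "type": "pattern"},
    {"id": "s1", "type": "skill"},
]
patterns_only = filter_by_collection(hits, "patterns")
```

Filtering after retrieval is simple, but it can return fewer results than the requested limit, because non-matching hits are discarded after ranking.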
|
|
116
|
+
|
|
117
|
+
## Verification
|
|
118
|
+
|
|
119
|
+
All three modified files pass syntax validation:
|
|
120
|
+
- `bash -n autonomy/loki` -- PASS
|
|
121
|
+
- `bash -n autonomy/run.sh` -- PASS
|
|
122
|
+
- `python3 -c "import ast; ast.parse(open('mcp/server.py').read())"` -- PASS
|
|
123
|
+
|
|
124
|
+
## Feature Interaction Analysis
|
|
125
|
+
|
|
126
|
+
Tested the following cross-feature interactions:
|
|
127
|
+
|
|
128
|
+
1. **MCP + Memory**: mem_search now correctly filters by collection type
|
|
129
|
+
2. **Parallel + Provider Switch**: Worktree cleanup now handles branches properly
|
|
130
|
+
3. **Agent + Quick**: Both use unique PID-based temp files, no collision possible
|
|
131
|
+
4. **Telemetry + Config**: Config file updates are now resilient to corruption
|
|
132
|
+
5. **Provider + Agent run**: Agent prompt formatting works across all 5 providers
|
|
@@ -0,0 +1,316 @@
|
|
|
1
|
+
# Agent 19 Code Review Report
|
|
2
|
+
|
|
3
|
+
Date: 2026-03-24
|
|
4
|
+
Scope: Full codebase code quality review -- security, correctness, anti-patterns
|
|
5
|
+
Files Reviewed: ~50 key files across shell, Python, TypeScript
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## CRITICAL Findings (Security / Data Corruption)
|
|
10
|
+
|
|
11
|
+
### C-01: `eval` on Python output in `run.sh` -- shell injection via malicious JSON config
|
|
12
|
+
|
|
13
|
+
**File:** `autonomy/run.sh` lines 485-525 (`_load_json_settings`)
|
|
14
|
+
**File:** `autonomy/run.sh` line 6785 (`read_failover_config`)
|
|
15
|
+
|
|
16
|
+
Both functions use `eval "$(python3 ...)"` to set shell variables from Python output. While the `_load_json_settings` function uses `shlex.quote()` for value escaping, and `read_failover_config` uses a single-quoted heredoc (`<< 'PYEOF'`), the fundamental pattern is fragile:
|
|
17
|
+
|
|
18
|
+
- **`_load_json_settings`**: If the Python script fails in a way that produces partial output, the `eval` can execute truncated shell commands, and the `2>/dev/null || true` at line 525 suppresses any diagnostic. Additionally, `shlex.quote()` protects values but not key names, so a `settings.json` key that somehow reached the mapping dictionary keys could still inject shell syntax (unlikely, but architecturally fragile).
|
|
19
|
+
|
|
20
|
+
- **`read_failover_config`** (line 6785): Reads JSON and prints shell variable assignments. The single-quoted heredoc prevents expansion during heredoc creation, and the Python constructs values directly from JSON. However, a malicious `failover.json` with crafted string values for `chain`, `currentProvider`, or `primaryProvider` fields that contain shell metacharacters could escape the quoting. The Python uses f-strings with `str()` and direct dict lookups -- the `chain` field is joined with commas from a list, but `currentProvider`/`primaryProvider` are raw string values printed inside double quotes. A value like `"; rm -rf /; echo "` would be eval'd.
|
|
21
|
+
|
|
22
|
+
**Severity:** CRITICAL
|
|
23
|
+
**Risk:** Arbitrary command execution if `.loki/state/failover.json` is tampered with or written by an untrusted process.
|
|
24
|
+
**Fix:** Replace `eval` with `declare` assignments or read values into variables using `read` from a pipe. Alternatively, validate all Python-produced assignments match `^[A-Z_]+=.*$` before eval.
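On the Python side, one way to harden the emitted assignments is to validate the variable name and quote the value with `shlex.quote()`. This is a sketch only: `emit_assignment` and the variable name `FAILOVER_CURRENT` are hypothetical, not taken from `run.sh`. A name-only pattern check does not make the value safe by itself, so the value is quoted as well.

```python
import re
import shlex

# Hypothetical policy: only upper-case shell identifiers may be assigned.
_NAME_RE = re.compile(r"[A-Z_][A-Z0-9_]*")


def emit_assignment(name: str, value: object) -> str:
    """Return a `NAME=value` line that is safe to hand to the shell's eval."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"refusing to emit unsafe variable name: {name!r}")
    # shlex.quote() wraps the value so the shell reads it as one literal word.
    return f"{name}={shlex.quote(str(value))}"


hostile = '"; rm -rf /; echo "'
line = emit_assignment("FAILOVER_CURRENT", hostile)

# A POSIX-style parse sees one word; the metacharacters stay inside the value.
parsed = shlex.split(line)
```

With quoting in place, the hostile provider string from the finding above ends up as the literal value of the variable instead of being executed.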
|
|
25
|
+
|
|
26
|
+
### C-02: `eval "$LOKI_MONOREPO_TEST_CMD"` -- arbitrary command execution from env var
|
|
27
|
+
|
|
28
|
+
**File:** `autonomy/run.sh` line 5563
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
output=$(cd "${TARGET_DIR:-.}" && eval "$LOKI_MONOREPO_TEST_CMD" 2>&1) || test_passed=false
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
The `LOKI_MONOREPO_TEST_CMD` environment variable is eval'd directly. While this is documented as a user-configurable override, it executes with the full privileges of the running shell. If an attacker can set environment variables (e.g., via `.env` injection, CI variable pollution, or config file poisoning), this becomes an arbitrary code execution vector.
|
|
35
|
+
|
|
36
|
+
**Severity:** CRITICAL (in multi-tenant / CI environments)
|
|
37
|
+
**Risk:** Arbitrary command execution
|
|
38
|
+
**Mitigation:** This is by design for power users, but should be guarded with a warning log and possibly a `LOKI_ALLOW_EVAL=true` gate. Document the security implications in CLAUDE.md.
|
|
39
|
+
|
|
40
|
+
### C-03: Non-atomic `write_text()` in dashboard server for signal files
|
|
41
|
+
|
|
42
|
+
**File:** `dashboard/server.py` lines 2781, 2877, 3289, 3410-3411
|
|
43
|
+
|
|
44
|
+
Multiple control endpoints use `Path.write_text()` directly without atomic write patterns:
|
|
45
|
+
|
|
46
|
+
```python
|
|
47
|
+
pause_file.write_text(datetime.now(timezone.utc).isoformat()) # line 2781
|
|
48
|
+
stop_file.write_text(datetime.now(timezone.utc).isoformat()) # line 2877
|
|
49
|
+
(signal_dir / "COUNCIL_REVIEW_REQUESTED").write_text(...) # line 3289
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
While line 3410-3411 does use temp+rename for `triggers.json`, the signal files above are written directly. If the process crashes or receives a signal mid-write, these files can be left in a partially-written state. For simple timestamp strings this is low risk, but `triggers.json` could be corrupted.
|
|
53
|
+
|
|
54
|
+
**Severity:** MEDIUM
|
|
55
|
+
**Risk:** Partial file writes under system pressure
|
|
56
|
+
**Fix:** Use `atomic_write_json` (already available via `from .control import atomic_write_json`) or at minimum write to temp + rename for JSON files.
|
|
57
|
+
|
|
58
|
+
### C-04: `_save_registry` writes JSON without atomic rename
|
|
59
|
+
|
|
60
|
+
**File:** `dashboard/registry.py` line 40-44
|
|
61
|
+
|
|
62
|
+
```python
|
|
63
|
+
def _save_registry(registry: dict) -> None:
|
|
64
|
+
with open(REGISTRY_FILE, "w") as f:
|
|
65
|
+
json.dump(registry, f, indent=2, default=str)
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
This writes directly to the registry file. A crash during write corrupts `~/.loki/dashboard/projects.json`. Unlike `memory/storage.py` and `dashboard/control.py` which use temp+rename, the registry uses a simple overwrite.
|
|
69
|
+
|
|
70
|
+
**Severity:** MEDIUM
|
|
71
|
+
**Risk:** Registry corruption on crash
|
|
72
|
+
**Fix:** Use temp file + `os.rename()` pattern consistent with rest of codebase.
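A minimal sketch of the temp-file-plus-rename pattern, assuming the registry is a plain JSON-serializable dict. The function name `save_registry_atomic` is hypothetical, and `os.replace()` is used here in place of `os.rename()` because it also overwrites an existing target on Windows.

```python
import json
import os
import tempfile
from pathlib import Path


def save_registry_atomic(registry: dict, path: Path) -> None:
    """Write JSON to a temp file in the target directory, then swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(registry, f, indent=2, default=str)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Leave the old file untouched and remove the partial temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    registry_file = Path(tmp_dir) / "projects.json"
    save_registry_atomic({"projects": {"demo": {"path": "/tmp/demo"}}}, registry_file)
    loaded = json.loads(registry_file.read_text())
    leftovers = [p.name for p in Path(tmp_dir).iterdir() if p.name != "projects.json"]
```

Creating the temp file in the same directory as the target keeps both on one filesystem, which is what makes the final swap atomic.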
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
76
|
+
## HIGH Findings (Correctness / Reliability)
|
|
77
|
+
|
|
78
|
+
### H-01: `_sanitize_text_field` strips tab and newline from text fields
|
|
79
|
+
|
|
80
|
+
**File:** `dashboard/server.py` lines 148-158
|
|
81
|
+
|
|
82
|
+
```python
|
|
83
|
+
cleaned = "".join(
|
|
84
|
+
ch for ch in value if unicodedata.category(ch)[0] != "C" or ch in (" ",)
|
|
85
|
+
)
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
This strips all control characters except space, including tab (`\t`) and newline (`\n`). For short fields this is fine, and the function is currently called only on project names and task titles. If descriptions ever use this sanitizer, legitimate multi-line descriptions would be silently flattened.
|
|
89
|
+
|
|
90
|
+
**Severity:** LOW
|
|
91
|
+
**Status:** Acceptable for current usage (name/title fields only). Worth noting if sanitization scope expands.
|
|
92
|
+
|
|
93
|
+
### H-02: Duplicate `TASK_STRATEGIES` definition
|
|
94
|
+
|
|
95
|
+
**File:** `memory/engine.py` lines 34-65
|
|
96
|
+
**File:** `memory/retrieval.py` lines 121-150
|
|
97
|
+
|
|
98
|
+
The `TASK_STRATEGIES` dictionary is defined identically in both files. If one is updated without the other, retrieval behavior diverges silently. The engine.py copy appears unused -- `retrieval.py` is the authoritative consumer.
|
|
99
|
+
|
|
100
|
+
**Severity:** MEDIUM
|
|
101
|
+
**Risk:** Behavioral divergence if one copy is modified
|
|
102
|
+
**Fix:** Remove the `TASK_STRATEGIES` from `engine.py` and import from `retrieval.py` if needed, or create a shared `constants.py`.
|
|
103
|
+
|
|
104
|
+
### H-03: `_file_lock` reentrant path yields `None` without lock
|
|
105
|
+
|
|
106
|
+
**File:** `memory/storage.py` lines 198-243
|
|
107
|
+
|
|
108
|
+
When reentrant lock detection triggers (thread already holds lock on same path), the context manager yields without holding the lock file. The caller proceeds to read/write files assuming the lock is held. This is safe for the single-thread case but could allow interleaving if another process acquires the lock between the check and the yield.
|
|
109
|
+
|
|
110
|
+
```python
|
|
111
|
+
if lock_key in self._held_locks.paths:
|
|
112
|
+
yield # No lock held -- other processes can interleave
|
|
113
|
+
return
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
**Severity:** LOW (single-process design mitigates this)
|
|
117
|
+
**Status:** Acceptable given that MemoryStorage is designed for single-process use with reentrant calls. Worth documenting the limitation.
|
|
118
|
+
|
|
119
|
+
### H-04: Broad `except Exception: pass` suppresses errors silently (40+ locations)
|
|
120
|
+
|
|
121
|
+
**Files:** `state/manager.py`, `memory/storage.py`, `events/bus.py`, `dashboard/server.py`, `learning/emitter.py`, `dashboard/telemetry.py`
|
|
122
|
+
|
|
123
|
+
Found 40+ instances of `except Exception: pass` or `except Exception:` with minimal handling. Most are in cleanup/notification paths where swallowing errors is intentional (don't let logging failures break core logic). However, several are in data paths:
|
|
124
|
+
|
|
125
|
+
- `state/manager.py:586` -- subscriber notification errors silently swallowed
|
|
126
|
+
- `events/bus.py:408` -- event persistence errors silently swallowed
|
|
127
|
+
- `dashboard/server.py:3862` -- log parsing errors silently swallowed
|
|
128
|
+
- `memory/storage.py:270,550,663` -- various I/O operations
|
|
129
|
+
|
|
130
|
+
**Severity:** MEDIUM (cumulative debugging difficulty)
|
|
131
|
+
**Risk:** Silent data loss, difficult-to-diagnose issues
|
|
132
|
+
**Fix:** Add at minimum `logger.debug()` calls for the data-path exceptions. The notification/cleanup paths are acceptable as-is.
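As an illustration of the suggested change, a data-path handler can keep its fallback behavior while still leaving a trace in the logs. The function `load_events`, the logger name, and the file name are hypothetical and are not taken from the codebase.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("loki.example")


def load_events(path: Path) -> list:
    """Return parsed events, or an empty list if the file cannot be read."""
    try:
        return json.loads(path.read_text())
    except Exception:
        # Same fallback as a silent `pass`, but the failure is now visible
        # (with traceback) when debug logging is enabled.
        logger.debug("Failed to load events from %s", path, exc_info=True)
        return []


events = load_events(Path("loki-example-missing-events.json"))
```

The caller still gets the empty-list fallback, so behavior is unchanged, but enabling debug logging now shows why the read failed.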
|
|
133
|
+
|
|
134
|
+
### H-05: WebSocket `receive_text()` timeout logic may close valid connections
|
|
135
|
+
|
|
136
|
+
**File:** `dashboard/server.py` lines 1430-1462
|
|
137
|
+
|
|
138
|
+
The WebSocket handler pings after 30s of silence and closes after 2 consecutive missed pongs:
|
|
139
|
+
|
|
140
|
+
```python
|
|
141
|
+
data = await asyncio.wait_for(websocket.receive_text(), timeout=30.0)
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
A client that sends only binary frames or is temporarily network-delayed for >60s gets disconnected. The 60s total timeout (2 x 30s) is reasonable but aggressive for mobile clients on poor networks. Consider making the timeout configurable via environment variable.
|
|
145
|
+
|
|
146
|
+
**Severity:** LOW
|
|
147
|
+
|
|
148
|
+
### H-06: `_safe_json_read` blocks event loop with `time.sleep(0.1)`
|
|
149
|
+
|
|
150
|
+
**File:** `dashboard/server.py` line 87
|
|
151
|
+
|
|
152
|
+
```python
|
|
153
|
+
time.sleep(0.1) # sync sleep in async context
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
This is called from `_safe_json_read` which is a synchronous function used by the background WebSocket push loop. Since the push loop is an async coroutine, any synchronous call that blocks will stall the event loop. The 0.1s block is short but could compound under load.
|
|
157
|
+
|
|
158
|
+
**Severity:** LOW
|
|
159
|
+
**Fix:** Use `asyncio.sleep(0.1)` in the async context or move file reads to a thread pool via `asyncio.to_thread()`.
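The thread-pool option can be sketched as follows. `safe_json_read` is a simplified stand-in for `_safe_json_read` (the real function may differ), and `read_json_async` is a hypothetical wrapper. `asyncio.to_thread()` requires Python 3.9 or later.

```python
import asyncio
import json
import tempfile
import time
from pathlib import Path


def safe_json_read(path: Path, retries: int = 3, delay: float = 0.1):
    """Blocking JSON read with a short retry loop; returns None on failure."""
    for attempt in range(retries):
        try:
            return json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            if attempt < retries - 1:
                time.sleep(delay)  # blocking, but only inside the worker thread
    return None


async def read_json_async(path: Path):
    """Run the blocking read in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(safe_json_read, path)


with tempfile.TemporaryDirectory() as tmp_dir:
    state_file = Path(tmp_dir) / "state.json"
    state_file.write_text('{"phase": "build"}')
    state = asyncio.run(read_json_async(state_file))
```

This keeps the synchronous helper unchanged for existing callers, while the async push loop awaits the wrapper instead of calling the helper directly.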
|
|
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
## MODERATE Findings (Code Quality / Anti-Patterns)
|
|
164
|
+
|
|
165
|
+
### M-01: `_version` fallback to hardcoded "5.58.1"
|
|
166
|
+
|
|
167
|
+
**File:** `dashboard/server.py` lines 58-61
|
|
168
|
+
|
|
169
|
+
```python
|
|
170
|
+
try:
|
|
171
|
+
from . import __version__ as _version
|
|
172
|
+
except ImportError:
|
|
173
|
+
_version = "5.58.1"
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
The fallback version is stale (current is 6.71.1). This means if the `__init__.py` import fails, the API reports a very old version, misleading monitoring and debugging.
|
|
177
|
+
|
|
178
|
+
**Severity:** LOW
|
|
179
|
+
**Fix:** Update fallback to "0.0.0-unknown" or read from VERSION file as fallback.
|
|
180
|
+
|
|
181
|
+
### M-02: `task_count` increment is lost inside subshell
|
|
182
|
+
|
|
183
|
+
**File:** `autonomy/run.sh` line 1715
|
|
184
|
+
|
|
185
|
+
```bash
|
|
186
|
+
task_count=$((task_count + 1))
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
This is inside a subshell `( ... ) 200>"$lockfile"`, so the increment to `task_count` is lost when the subshell exits. The variable in the parent shell retains its original value. The `log_info "Imported $task_count issues"` at line 1723 will always report 0.
|
|
190
|
+
|
|
191
|
+
**Severity:** MEDIUM
|
|
192
|
+
**Risk:** Misleading log output -- always reports 0 imported issues regardless of actual count
|
|
193
|
+
**Fix:** Track count outside the subshell using a temp file counter, or restructure to avoid the subshell.
|
|
194
|
+
|
|
195
|
+
### M-03: Schema import fallback assigns `Any` to class variables
|
|
196
|
+
|
|
197
|
+
**File:** `memory/storage.py` lines 23-29
|
|
198
|
+
|
|
199
|
+
```python
|
|
200
|
+
try:
|
|
201
|
+
from .schemas import EpisodeTrace, SemanticPattern, ProceduralSkill
|
|
202
|
+
except ImportError:
|
|
203
|
+
EpisodeTrace = Any
|
|
204
|
+
SemanticPattern = Any
|
|
205
|
+
ProceduralSkill = Any
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
When the schemas import fails, `Any` (from `typing`) is assigned to these names. `isinstance()` checks against them then fail: `isinstance(x, Any)` raises `TypeError`, so any code path that type-checks against these fallbacks will crash.
|
|
209
|
+
|
|
210
|
+
**Severity:** LOW (import failures are rare in practice)
|
|
211
|
+
**Fix:** Use `object` instead of `Any` as fallback for class assignments, or remove the try/except entirely since schemas should always be available.
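The difference between the two fallbacks can be checked directly. In this sketch, `FallbackType` is a hypothetical name standing in for the reassigned schema classes.

```python
from typing import Any

# With `Any` as the fallback, an isinstance() check raises instead of answering.
try:
    isinstance("x", Any)
    any_check_raised = False
except TypeError:
    any_check_raised = True

# With `object` as the fallback, the check is valid and always true.
FallbackType = object
object_check = isinstance("x", FallbackType)
```

Using `object` makes the check permissive, not strict: every value passes. That may be acceptable for a degraded mode, but it is a behavior change worth noting.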
|
|
212
|
+
|
|
213
|
+
### M-04: `_RateLimiter` key eviction races with concurrent access
|
|
214
|
+
|
|
215
|
+
**File:** `dashboard/server.py` lines 104-137
|
|
216
|
+
|
|
217
|
+
The rate limiter is a plain dict with no thread/coroutine safety. In an async FastAPI server, concurrent coroutines could interleave during the eviction/pruning operations:
|
|
218
|
+
|
|
219
|
+
```python
|
|
220
|
+
empty_keys = [k for k, v in self._calls.items() if not v]
|
|
221
|
+
for k in empty_keys:
|
|
222
|
+
del self._calls[k]
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
With asyncio, this is actually safe since coroutines don't preempt each other within a single event loop turn. But if the server ever adds threading (e.g., for background tasks), this becomes a race condition.
|
|
226
|
+
|
|
227
|
+
**Severity:** LOW (safe under current asyncio model)
|
|
228
|
+
**Status:** Acceptable for now. Document the single-threaded assumption.
|
|
229
|
+
|
|
230
|
+
### M-05: `check_budget_limit` bare `except: pass` in inline Python
|
|
231
|
+
|
|
232
|
+
**File:** `autonomy/run.sh` line 7241
|
|
233
|
+
|
|
234
|
+
```python
|
|
235
|
+
except: pass
|
|
236
|
+
```
|
|
237
|
+
|
|
238
|
+
This is inside an inline Python script. Bare `except:` (without `Exception`) catches `SystemExit`, `KeyboardInterrupt`, and `GeneratorExit`, suppressing them all. This can mask fundamental errors during cost calculation.
|
|
239
|
+
|
|
240
|
+
**Severity:** LOW
|
|
241
|
+
**Fix:** Change to `except Exception: pass`.
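A small demonstration of the difference, with hypothetical function names. `sys.exit()` raises `SystemExit`, which derives from `BaseException` and not from `Exception`.

```python
import sys


def swallow_with_bare_except() -> str:
    try:
        sys.exit(3)
    except:  # noqa: E722 - bare except also catches SystemExit
        return "swallowed"


def propagate_with_except_exception() -> str:
    try:
        sys.exit(3)
    except Exception:
        # Not reached: SystemExit is not a subclass of Exception.
        return "swallowed"
    return "unreachable"


def exit_code_of(func):
    """Return the SystemExit code raised by func, or None if nothing escaped."""
    try:
        func()
    except SystemExit as exc:
        return exc.code
    return None


bare_code = exit_code_of(swallow_with_bare_except)
narrow_code = exit_code_of(propagate_with_except_exception)
```

With the bare clause the exit request disappears inside the handler. With `except Exception:` it propagates to the caller as intended.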
|
|
242
|
+
|
|
243
|
+
### M-06: Double trap registration in `run.sh`
|
|
244
|
+
|
|
245
|
+
**File:** `autonomy/run.sh` lines 186 and 199
|
|
246
|
+
|
|
247
|
+
```bash
|
|
248
|
+
trap 'rm -f "$TEMP_SCRIPT"' EXIT # line 186, before exec
|
|
249
|
+
trap 'rm -f "${BASH_SOURCE[0]}" 2>/dev/null' EXIT # line 199, after exec
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
The first trap is set before `exec`, meaning it runs in the pre-exec shell (which never reaches exit because of `exec`). The second trap correctly runs in the exec'd copy. The first trap is a no-op (the comment says "Set trap BEFORE exec" but exec replaces the process). This is benign but misleading.
|
|
253
|
+
|
|
254
|
+
**Severity:** INFORMATIONAL
|
|
255
|
+
**Status:** The code works correctly. The comment at line 185 is inaccurate -- the trap doesn't survive `exec`. The actual cleanup happens at line 199.
|
|
256
|
+
|
|
257
|
+
---
|
|
258
|
+
|
|
259
|
+
## Cross-Cutting Observations
|
|
260
|
+
|
|
261
|
+
### Security Posture: GOOD
|
|
262
|
+
|
|
263
|
+
- Path traversal protection in MCP server is thorough (symlink chain walking, allowed-directory enforcement)
|
|
264
|
+
- Memory storage validates namespace with regex to prevent path traversal
|
|
265
|
+
- OIDC implementation has clear security warnings about signature verification
|
|
266
|
+
- Token hashing uses per-token salts with SHA-256
|
|
267
|
+
- Token files enforce 0600 permissions
|
|
268
|
+
- CORS defaults to localhost-only
|
|
269
|
+
- Control endpoints have rate limiting
|
|
270
|
+
- WebSocket auth requires tokens when enterprise mode is enabled
|
|
271
|
+
- Dashboard checklist waiver endpoint validates `item_id` against path traversal characters
|
|
272
|
+
- SQLAlchemy ORM usage prevents SQL injection
|
|
273
|
+
|
|
274
|
+
### Error Handling: MODERATE
|
|
275
|
+
|
|
276
|
+
- Core data paths (atomic writes, file locking) have proper error handling
|
|
277
|
+
- Many log/metrics/events paths silently swallow exceptions (acceptable for non-critical observability)
|
|
278
|
+
- Inline Python in shell scripts uses bare `except: pass` instead of `except Exception:`
|
|
279
|
+
- Dashboard server has comprehensive `try/except` around file reads with fallback values
|
|
280
|
+
|
|
281
|
+
### Concurrency: GOOD
|
|
282
|
+
|
|
283
|
+
- File locking via `fcntl.flock()` is used consistently in memory system and state manager
|
|
284
|
+
- Atomic writes via temp+rename used in critical data paths
|
|
285
|
+
- Lock file cleanup for stale locks from crashed processes
|
|
286
|
+
- `_held_locks` thread-local prevents deadlocks from reentrant lock acquisition
|
|
287
|
+
|
|
288
|
+
### React/TypeScript: GOOD
|
|
289
|
+
|
|
290
|
+
- No `dangerouslySetInnerHTML` usage in source code (only in compiled bundle)
|
|
291
|
+
- Error boundaries in place for major components
|
|
292
|
+
- `useEffect` cleanup functions properly implemented (e.g., `cancelled` flag in auth hook)
|
|
293
|
+
- WebSocket subscription cleanup via returned unsubscribe functions
|
|
294
|
+
- API client properly handles errors and provides typed responses
|
|
295
|
+
- No direct DOM manipulation or innerHTML usage
|
|
296
|
+
- Auth tokens stored in localStorage with proper Bearer header usage
|
|
297
|
+
|
|
298
|
+
---
|
|
299
|
+
|
|
300
|
+
## Priority Fix Recommendations
|
|
301
|
+
|
|
302
|
+
1. **C-01 (CRITICAL):** Validate eval'd Python output matches expected format before eval in `read_failover_config`. Add `shlex.quote()` for the provider string values printed by the Python inline.
|
|
303
|
+
2. **M-02 (MEDIUM):** Fix `task_count` subshell variable loss in `import_github_issues`. This causes incorrect log output.
|
|
304
|
+
3. **H-02 (MEDIUM):** Deduplicate `TASK_STRATEGIES` -- single source of truth.
|
|
305
|
+
4. **C-03 (MEDIUM):** Use atomic writes for trigger/signal JSON files.
|
|
306
|
+
5. **C-04 (MEDIUM):** Add atomic write to `_save_registry`.
|
|
307
|
+
6. **M-01 (LOW):** Update stale fallback version string.
|
|
308
|
+
7. **M-05 (LOW):** Change bare `except:` to `except Exception:` in inline Python.
|
|
309
|
+
|
|
310
|
+
---
|
|
311
|
+
|
|
312
|
+
## Feedback Loops Completed
|
|
313
|
+
|
|
314
|
+
- Loop 1: Re-read each finding to verify it represents a real issue, not a false positive from partial context. Confirmed C-01 eval pattern is real (not safely guarded for all value types). Confirmed M-02 subshell variable loss is a genuine bash behavior issue.
|
|
315
|
+
- Loop 2: Validated syntax of all referenced code patterns. No misquotations in report.
|
|
316
|
+
- Loop 3: Prioritized by actual exploitability and impact. C-02 is marked critical but noted as by-design. C-01 is the highest-priority actionable fix.
|