loki-mode 7.24.0 → 7.26.0

package/README.md CHANGED
@@ -27,11 +27,13 @@
  - **Spec-driven, autonomous, with a built-in trust layer** -- Hand Loki a spec, walk away, come back to working code with tests. The full RARV-C closure loop (Reason - Act - Reflect - Verify - Close) runs until the work is actually done, not just attempted. The verified-completion evidence gate (`skills/quality-gates.md`) refuses any "done" claim on an empty git diff against the run-start commit, and blocks completion when tests run red, so "complete" means proven, not promised.
  - **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
  - **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
+ - **Compose-first fullstack** -- When a spec needs more than one service (web + database + cache) Loki generates a 12-factor `docker-compose.yml` with healthchecks, `depends_on` wiring, env-var config, and a `.env.example`. The Live App Preview surfaces the web service URL (not a database port), and health reflects the web service's Docker healthcheck so a crashed app shows as crashed even when the database stays up. Single-service apps stay on a plain run command. All local-first, no hosted service (v7.26.0).
+ - **Intelligent `loki start`** -- For interactive foreground runs the dashboard auto-opens in the browser (cross-platform; skipped in CI, SSH-without-TTY, and piped runs; opt out with `LOKI_NO_AUTO_OPEN=1`). The completion summary shows "Your app is live at <url>" so you know exactly where to try what Loki just built. The autonomous loop passes Claude Code's `--effort`, `--max-budget-usd`, and `--fallback-model` on every iteration (each gated on CLI support and individual opt-out env vars) for better long-run unattended execution (v7.25.0).
  - **Cross-project memory** -- Episodic/semantic/procedural memory with vector search; knowledge learned on one project surfaces on the next (v5.15.0+, see `memory/engine.py`)
  - **Self-hosted and private** -- Your keys, your infrastructure, no data leaves your network
  - **Legacy system healing** -- `loki heal` archaeology/stabilize/isolate/modernize/validate phases (v6.67.0, see `skills/healing.md`)
  - **MCP server** -- 34 tools (including ChromaDB code search) plus 3 resources and 2 prompts (`mcp/server.py`, with managed-memory and magic tools registered from `mcp/managed_tools.py` and `mcp/magic_tools.py`)
- - **Full-stack output** -- Source code, tests, Docker configs, CI/CD pipelines, audit logs
+ - **Full-stack output** -- Source code, tests, Docker Compose stacks (multi-service with healthchecks), CI/CD pipelines, audit logs
  - **Provider-agnostic** -- runs on Claude, Codex, Cline, or Aider with automatic failover (`loki-ts/src/runner/providers.ts`); no vendor lock-in. Gemini CLI deprecated v7.5.18; Antigravity CLI coming soon.
  - **Open source** -- Free for personal, internal, and academic use.
  
@@ -347,7 +349,7 @@ Claude gets full features (subagents, parallelization, MCP, Task tool). Other ac
  
  | Command | Description |
  |---------|-------------|
- | `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`) |
+ | `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`). Auto-opens the dashboard in the browser for interactive runs and passes native `--effort`/`--max-budget-usd`/`--fallback-model` for resilience (v7.25.0) |
  | `loki stop` | Stop execution |
  | `loki heal <path>` | Legacy system healing (archaeology, stabilize, isolate, modernize, validate -- v6.67.0) |
  | `loki pause` / `resume` | Pause/resume after current session |
package/SKILL.md CHANGED
@@ -3,7 +3,7 @@ name: loki-mode
  description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
  ---
  
- # Loki Mode v7.24.0
+ # Loki Mode v7.26.0
  
  **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
  
@@ -383,4 +383,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r
  
  ---
  
- **v7.24.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+ **v7.26.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
package/VERSION CHANGED
@@ -1 +1 @@
- 7.24.0
+ 7.26.0
@@ -43,6 +43,10 @@ _APP_RUNNER_PID=""
  _APP_RUNNER_URL=""
  _APP_RUNNER_IS_DOCKER=false
  _APP_RUNNER_DOCKER_CONTAINER=""
+ # v7.26.0 (Phase 4): the identified primary web service of a compose project,
+ # used for service-aware health checks and the preview URL. Empty for
+ # non-compose runs or when identification falls back to legacy port parsing.
+ _APP_RUNNER_WEB_SERVICE=""
  _APP_RUNNER_HAS_SETSID=false
  _APP_RUNNER_CRASH_COUNT=0
  _APP_RUNNER_RESTART_COUNT=0
@@ -146,6 +150,77 @@ _rotate_app_log() {
      fi
  }
  
+ # Identify the primary web service of a docker compose project and its
+ # published host port. Uses `docker compose config --format json` (fully
+ # resolved: env-interpolated, overrides merged) parsed with python3, so we do
+ # NOT hand-parse YAML. Precedence MATCHES the contract in COMPOSE_INSTRUCTION
+ # (run.sh build_prompt): (1) label loki.primary=true, (2) service named
+ # web/app, (3) service publishing a common web port, (4) first service with any
+ # published port. Echoes "service_name|published_port" on success, nothing on
+ # failure (caller falls back to legacy behavior). Never hard-fails.
+ # v7.26.0 (Phase 4): fixes the multi-service URL/health gaps (GAP #1-4).
+ _identify_compose_web_service() {
+     local base="${1:-${TARGET_DIR:-.}}"
+     local compose_dir
+     compose_dir=$(_app_runner_compose_dir "$base")
+     command -v docker >/dev/null 2>&1 || return 0
+     command -v python3 >/dev/null 2>&1 || return 0
+     local cfg
+     cfg=$(cd "$compose_dir" && docker compose config --format json 2>/dev/null) || return 0
+     [ -n "$cfg" ] || return 0
+     printf '%s' "$cfg" | python3 -c '
+ import json, sys
+ COMMON = ["3000", "8000", "8080", "5000", "4200", "5173", "80"]
+ try:
+     d = json.load(sys.stdin)
+ except Exception:
+     sys.exit(0)
+ services = d.get("services", {})
+ if not isinstance(services, dict) or not services:
+     sys.exit(0)
+
+ def published_ports(svc):
+     out = []
+     for p in (svc.get("ports") or []):
+         if isinstance(p, dict):
+             pub = p.get("published")
+         else:
+             pub = None
+         if pub is not None and str(pub).strip():
+             out.append(str(pub).strip())
+     return out
+
+ # (1) label loki.primary=true
+ for name, svc in services.items():
+     labels = svc.get("labels") or {}
+     if isinstance(labels, list):
+         labels = dict(x.split("=", 1) for x in labels if "=" in x)
+     if str(labels.get("loki.primary", "")).lower() == "true":
+         pp = published_ports(svc)
+         if pp:
+             print(name + "|" + pp[0]); sys.exit(0)
+ # (2) service named web/app
+ for cand in ("web", "app"):
+     svc = services.get(cand)
+     if svc:
+         pp = published_ports(svc)
+         if pp:
+             print(cand + "|" + pp[0]); sys.exit(0)
+ # (3) service publishing a common web port
+ for name, svc in services.items():
+     pp = published_ports(svc)
+     for cp in COMMON:
+         if cp in pp:
+             print(name + "|" + cp); sys.exit(0)
+ # (4) first service with any published port
+ for name, svc in services.items():
+     pp = published_ports(svc)
+     if pp:
+         print(name + "|" + pp[0]); sys.exit(0)
+ sys.exit(0)
+ ' 2>/dev/null || return 0
+ }
+
  # Detect port from project files
  _detect_port() {
      local method="$1"
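The four-step precedence used by `_identify_compose_web_service` can be tried without Docker. The sketch below restates it as a plain Python function over an already-parsed config dictionary. The name `pick_web_service`, the helper `normalize_labels`, and both sample stacks are illustrative assumptions and do not appear in the package.

```python
# Sketch only: pick_web_service, normalize_labels and the sample stacks are
# hypothetical. The input mirrors the shape read in the hunk above: a
# "services" mapping whose entries may carry "ports" (dicts with a
# "published" key) and "labels" (a mapping or a list of KEY=VALUE strings).

COMMON_WEB_PORTS = ["3000", "8000", "8080", "5000", "4200", "5173", "80"]


def published_ports(svc):
    """Return the published host ports of one service as strings."""
    ports = []
    for p in svc.get("ports") or []:
        pub = p.get("published") if isinstance(p, dict) else None
        if pub is not None and str(pub).strip():
            ports.append(str(pub).strip())
    return ports


def normalize_labels(labels):
    """Accept labels as a mapping or as a list of KEY=VALUE strings."""
    if isinstance(labels, list):
        return dict(x.split("=", 1) for x in labels if "=" in x)
    return labels or {}


def pick_web_service(config):
    """Return (service_name, host_port), or None when nothing qualifies."""
    services = config.get("services") or {}
    if not isinstance(services, dict):
        return None
    # (1) explicit label loki.primary=true
    for name, svc in services.items():
        labels = normalize_labels(svc.get("labels"))
        if str(labels.get("loki.primary", "")).lower() == "true":
            pp = published_ports(svc)
            if pp:
                return name, pp[0]
    # (2) conventional service names
    for cand in ("web", "app"):
        pp = published_ports(services.get(cand) or {})
        if pp:
            return cand, pp[0]
    # (3) a service publishing a common web port
    for name, svc in services.items():
        pp = published_ports(svc)
        for port in COMMON_WEB_PORTS:
            if port in pp:
                return name, port
    # (4) first service with any published port
    for name, svc in services.items():
        pp = published_ports(svc)
        if pp:
            return name, pp[0]
    return None


stack = {
    "services": {
        "db": {"ports": [{"published": "5432", "target": 5432}]},
        "api": {"ports": [{"published": "8080", "target": 8080}]},
    }
}
labelled = {
    "services": {
        "db": {"ports": [{"published": "5432"}]},
        "site": {"ports": [{"published": "9000"}], "labels": ["loki.primary=true"]},
    }
}
print(pick_web_service(stack))     # ('api', '8080')
print(pick_web_service(labelled))  # ('site', '9000')
```

In the first stack the database is listed first, yet rule (3) selects `api` because 8080 is in the common-port list. That is the case a first-port grep gets wrong.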
@@ -158,18 +233,34 @@ _detect_port() {
  
      case "$method" in
          *docker\ compose*)
-             # Parse port from compose file
-             local compose_file
-             if [ -f "${TARGET_DIR:-.}/docker-compose.yml" ]; then
-                 compose_file="${TARGET_DIR:-.}/docker-compose.yml"
+             # v7.26.0: identify the PRIMARY WEB service and ITS published port
+             # via docker compose config (resolved JSON), so the preview URL and
+             # health check target the web service, not whichever port (e.g. a
+             # db/cache) appears first in the file. Falls back to the legacy
+             # first-port grep when docker/python is unavailable or no web
+             # service is found.
+             local web_info web_port
+             web_info=$(_identify_compose_web_service "${TARGET_DIR:-.}")
+             if [ -n "$web_info" ]; then
+                 _APP_RUNNER_WEB_SERVICE="${web_info%%|*}"
+                 web_port="${web_info##*|}"
+             fi
+             if [ -n "${web_port:-}" ] && [[ "$web_port" =~ ^[0-9]+$ ]]; then
+                 _APP_RUNNER_PORT="$web_port"
              else
-                 compose_file="${TARGET_DIR:-.}/compose.yml"
+                 # Legacy fallback: first published port from the compose file.
+                 local compose_file
+                 if [ -f "${TARGET_DIR:-.}/docker-compose.yml" ]; then
+                     compose_file="${TARGET_DIR:-.}/docker-compose.yml"
+                 else
+                     compose_file="${TARGET_DIR:-.}/compose.yml"
+                 fi
+                 local port
+                 # Handle both simple (HOST:CONTAINER) and IP-bound (IP:HOST:CONTAINER) port formats
+                 # Also handle port ranges like "8080-8090:8080-8090" by taking the first port
+                 port=$(grep -E '^\s*-\s*"?[0-9]' "$compose_file" 2>/dev/null | head -1 | sed 's/.*- *"*//;s/".*//;' | awk -F: '{print $(NF-1)}' | awk -F- '{print $1}')
+                 _APP_RUNNER_PORT="${port:-8080}"
              fi
-             local port
-             # Handle both simple (HOST:CONTAINER) and IP-bound (IP:HOST:CONTAINER) port formats
-             # Also handle port ranges like "8080-8090:8080-8090" by taking the first port
-             port=$(grep -E '^\s*-\s*"?[0-9]' "$compose_file" 2>/dev/null | head -1 | sed 's/.*- *"*//;s/".*//;' | awk -F: '{print $(NF-1)}' | awk -F- '{print $1}')
-             _APP_RUNNER_PORT="${port:-8080}"
              ;;
          *docker\ build*)
              local port
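The two string-handling steps in this hunk are splitting the `service|port` result and extracting a host port from a short-form port mapping in the legacy fallback. A rough Python equivalent is sketched below for illustration. The function names are assumptions, and the sketch covers only the field-splitting logic, not the `grep`/`sed` line selection.

```python
import re

# Sketch only: these helper names are hypothetical and exist just to show
# the string handling; the package does this work in bash and awk.


def split_web_info(web_info):
    """Split 'service|port': text before the first '|' and after the last '|'."""
    service = web_info.split("|", 1)[0]
    port = web_info.rsplit("|", 1)[-1]
    return service, port


def is_numeric_port(value):
    """Same acceptance rule as the ^[0-9]+$ check in the hunk."""
    return re.fullmatch(r"[0-9]+", value) is not None


def host_port_from_mapping(mapping):
    """Take the second-to-last ':' field, then the first value of a range."""
    fields = mapping.strip().strip('"').split(":")
    # A bare "5173" has a single field, so fall back to that field.
    host_side = fields[-2] if len(fields) >= 2 else fields[0]
    return host_side.split("-")[0]


print(split_web_info("web|3000"))                     # ('web', '3000')
print(is_numeric_port("3000"), is_numeric_port(""))   # True False
print(host_port_from_mapping("3000:3000"))            # 3000
print(host_port_from_mapping("127.0.0.1:8080:80"))    # 8080
print(host_port_from_mapping("8080-8090:8080-8090"))  # 8080
print(host_port_from_mapping("5173"))                 # 5173
```

Taking the second-to-last field is what lets one expression serve both `HOST:CONTAINER` and `IP:HOST:CONTAINER` mappings.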
@@ -694,14 +785,64 @@ app_runner_health_check() {
          # retries are handled in app_runner_start.
          local running_containers
          running_containers=$(LOKI_COMPOSE_HEALTH_TIMEOUT=1 _app_runner_compose_running_count "${TARGET_DIR:-.}")
-         if [ "${running_containers:-0}" -gt 0 ]; then
+         if [ "${running_containers:-0}" -le 0 ]; then
+             # Nothing running at all.
+             _write_health "false"
+             _write_app_state "crashed"
+             return 1
+         fi
+         # v7.26.0 (Phase 4) GAP #4 fix: "some container is up" is NOT health for
+         # a multi-service stack. If we identified a primary web service, health
+         # keys on THAT service, not on whether any container (e.g. a db/cache) is
+         # up. Two signals, in order: (1) the web service's docker HEALTHCHECK
+         # result when one is declared (COMPOSE_INSTRUCTION mandates an HTTP
+         # healthcheck on the web service) -- "healthy" means actually serving,
+         # "unhealthy"/"starting" do not; (2) when no healthcheck is declared,
+         # fall back to the container lifecycle State (running), matching the
+         # codebase convention. Reads both fields from one `docker compose ps`.
+         if [ -n "${_APP_RUNNER_WEB_SERVICE:-}" ]; then
+             local _web_line _web_state _web_health
+             _web_line=$(cd "$(_app_runner_compose_dir "${TARGET_DIR:-.}")" \
+                 && docker compose ps --format '{{.Service}}|{{.State}}|{{.Health}}' 2>/dev/null \
+                 | tr -d '\r' | awk -F'|' -v s="$_APP_RUNNER_WEB_SERVICE" '$1==s {print; exit}')
+             _web_state=$(printf '%s' "$_web_line" | awk -F'|' '{print $2}')
+             _web_health=$(printf '%s' "$_web_line" | awk -F'|' '{print $3}')
+             if [ "$_web_state" != "running" ]; then
+                 # Container not running -> definitively down.
+                 _write_health "false"
+                 _write_app_state "crashed"
+                 return 1
+             fi
+             if [ -n "$_web_health" ]; then
+                 # A healthcheck is declared: it is authoritative.
+                 if [ "$_web_health" = "healthy" ]; then
+                     _write_health "true"
+                     _write_app_state "running"
+                     return 0
+                 fi
+                 if [ "$_web_health" = "unhealthy" ]; then
+                     # Container up but failing its own healthcheck (not serving).
+                     _write_health "false"
+                     _write_app_state "crashed"
+                     return 1
+                 fi
+                 # "starting" (within start_period): up, not yet healthy. Report
+                 # running so the watchdog gives it time instead of restarting,
+                 # but do not yet claim a passing health.
+                 _write_health "false"
+                 _write_app_state "running"
+                 return 0
+             fi
+             # No healthcheck declared: container running is the signal.
              _write_health "true"
              _write_app_state "running"
              return 0
-         else
-             _write_health "false"
-             return 1
          fi
+         # No web service identified (legacy/degraded): fall back to the
+         # original "any container running" signal.
+         _write_health "true"
+         _write_app_state "running"
+         return 0
      fi
  
      # Check PID is alive (non-docker-compose methods)
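The service-aware branch above reduces to a small decision table over two fields, the container `State` and the optional `Health`. The sketch below encodes that table as a pure function so each case can be read at a glance. The function name `compose_web_health` and its tuple return shape are illustrative assumptions; in the package, the bash code writes `health.json` and `state.json` instead.

```python
# Sketch only: compose_web_health is a hypothetical name. It returns
# (health_ok, app_state, exit_code), standing in for the hunk's
# _write_health / _write_app_state / return sequence.


def compose_web_health(state, health):
    """Map the web service's State and Health fields to a verdict."""
    if state != "running":
        # Container not running: definitively down.
        return False, "crashed", 1
    if health:
        # A healthcheck is declared, so its result is authoritative.
        if health == "healthy":
            return True, "running", 0
        if health == "unhealthy":
            return False, "crashed", 1
        # Any other value (for example "starting"): up but not yet healthy.
        return False, "running", 0
    # No healthcheck declared: a running container is the only signal.
    return True, "running", 0


for state, health in [
    ("running", "healthy"),
    ("running", "unhealthy"),
    ("running", "starting"),
    ("running", ""),
    ("exited", ""),
]:
    print(state, repr(health), "->", compose_web_health(state, health))
```

The `starting` row is the only one where the health flag is false and the exit code is still 0. That combination is what keeps the watchdog from restarting a stack that is still inside its start period.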
@@ -769,6 +910,36 @@ app_runner_should_restart() {
  app_runner_watchdog() {
      _app_runner_dir
  
+     # v7.26.0 (Phase 4): docker compose runs detached (`up -d` exits immediately),
+     # so the captured PID is a short-lived subshell and `kill -0` is the wrong
+     # liveness signal for a compose stack. For compose, delegate to
+     # app_runner_health_check, whose compose branch keys on the primary web
+     # SERVICE container running (GAP #4) and writes health.json + state.json.
+     # This is what makes the service-aware health logic actually fire in the
+     # live monitoring loop (not just in isolation). On an unhealthy web service
+     # it restarts the stack under the same crash-count circuit breaker.
+     if [ "$_APP_RUNNER_IS_DOCKER" = true ] && echo "$_APP_RUNNER_METHOD" | grep -q "docker compose"; then
+         if app_runner_health_check; then
+             return 0
+         fi
+         _APP_RUNNER_CRASH_COUNT=$(( _APP_RUNNER_CRASH_COUNT + 1 ))
+         log_warn "App Runner: compose web service unhealthy (crash #$_APP_RUNNER_CRASH_COUNT)"
+         if [ "$_APP_RUNNER_CRASH_COUNT" -ge 5 ]; then
+             log_error "App Runner: crash limit reached (5), marking as crashed"
+             tail -20 "$_APP_RUNNER_DIR/app.log" 2>/dev/null | while IFS= read -r line; do
+                 log_error " $line"
+             done
+             _write_app_state "crashed"
+             return 1
+         fi
+         local _c_backoff=$(( 1 << _APP_RUNNER_CRASH_COUNT ))
+         [ "$_c_backoff" -gt 30 ] && _c_backoff=30
+         log_info "App Runner: restarting compose stack in ${_c_backoff}s..."
+         sleep "$_c_backoff"
+         app_runner_start || log_warn "App Runner: compose auto-restart failed"
+         return 0
+     fi
+
      if [ -z "$_APP_RUNNER_PID" ] && [ -f "$_APP_RUNNER_DIR/app.pid" ]; then
          _APP_RUNNER_PID=$(cat "$_APP_RUNNER_DIR/app.pid" 2>/dev/null)
      fi
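The compose branch of the watchdog combines a crash limit with a capped exponential backoff. The arithmetic is sketched below as a standalone function; `next_watchdog_action` and the constant names are hypothetical, while the limit (5), the shift (`1 << count`) and the cap (30 seconds) are taken from the hunk.

```python
# Sketch only: function and constant names are hypothetical; the numeric
# values come from the hunk above.

CRASH_LIMIT = 5
MAX_BACKOFF_SECONDS = 30


def next_watchdog_action(crash_count):
    """Return ('give_up', None) or ('restart', delay_seconds).

    crash_count is the value AFTER the increment for the current failure.
    """
    if crash_count >= CRASH_LIMIT:
        return "give_up", None
    return "restart", min(1 << crash_count, MAX_BACKOFF_SECONDS)


for count in range(1, 6):
    print(count, next_watchdog_action(count))
# 1 ('restart', 2)
# 2 ('restart', 4)
# 3 ('restart', 8)
# 4 ('restart', 16)
# 5 ('give_up', None)
```

With the limit at 5, the longest delay this branch actually reaches is 16 seconds, so the 30-second cap only takes effect if the limit is raised later.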
package/autonomy/run.sh CHANGED
@@ -2423,6 +2423,20 @@ build_completion_summary() {
          *) outcome_label="$outcome"; notify_title="Run finished" ;;
      esac
  
+     # Live app URL (best-effort): if the app runner has a running app, surface
+     # where the user can try it. Reads .loki/app-runner/state.json written by
+     # app-runner.sh. Empty when no app is running.
+     local live_app_url=""
+     local _app_state_file="$loki_dir/app-runner/state.json"
+     if [ -f "$_app_state_file" ]; then
+         live_app_url="$(python3 -c "import json,sys
+ try:
+     d=json.load(open(sys.argv[1]))
+     print(d.get('url','') if d.get('status')=='running' else '')
+ except Exception:
+     print('')" "$_app_state_file" 2>/dev/null)"
+     fi
+
      # Branch + diff stats vs the run-start SHA (best-effort; non-git or empty
      # baseline yields empty values, which we render as "unknown"/"0").
      local start_sha="${_LOKI_RUN_START_SHA:-}"
@@ -2483,6 +2497,15 @@ build_completion_summary() {
          echo "Pull request: not opened (set LOKI_DELEGATE_PR=1 to open one)"
      fi
      echo ""
+     if [ -n "$live_app_url" ]; then
+         # Compute the dashboard scheme the same way start_dashboard does
+         # (url_scheme is local to that function, not visible here).
+         local _dash_scheme="http"
+         [ -n "${LOKI_TLS_CERT:-}" ] && [ -n "${LOKI_TLS_KEY:-}" ] && _dash_scheme="https"
+         echo "Your app is live at: $live_app_url (served locally on this machine)"
+         echo " Dashboard: ${_dash_scheme}://127.0.0.1:${DASHBOARD_PORT:-57374}/ (App Runner -> Live App)"
+         echo ""
+     fi
      echo "Tasks: pending=$pending in_progress=$in_progress completed=$completed failed=$failed"
      echo ""
      echo "Review the work:"
@@ -8216,9 +8239,22 @@ start_dashboard() {
          log_info "Dashboard started (PID: $DASHBOARD_PID)"
          log_info "Dashboard: ${CYAN}${url_scheme}://127.0.0.1:$DASHBOARD_PORT/${NC}"
  
-         # Open in browser (macOS)
-         if [[ "$OSTYPE" == "darwin"* ]]; then
-             open "${url_scheme}://127.0.0.1:$DASHBOARD_PORT/" 2>/dev/null || true
+         # Auto-open the dashboard in the browser, but ONLY for an interactive
+         # foreground session. Gated on: a TTY on stdout ([ -t 1 ]), not
+         # background/detached mode, and not explicitly opted out via
+         # LOKI_NO_AUTO_OPEN=1. This keeps CI, --detach, SSH-no-TTY, and piped
+         # runs from spawning a browser. Cross-platform: open / xdg-open / start.
+         if [ -t 1 ] && [ "${BACKGROUND_MODE:-false}" != "true" ] && [ "${LOKI_NO_AUTO_OPEN:-0}" != "1" ]; then
+             local _dash_url="${url_scheme}://127.0.0.1:$DASHBOARD_PORT/"
+             if command -v open >/dev/null 2>&1; then
+                 open "$_dash_url" 2>/dev/null || true
+             elif command -v xdg-open >/dev/null 2>&1; then
+                 xdg-open "$_dash_url" 2>/dev/null || true
+             elif command -v cmd.exe >/dev/null 2>&1; then
+                 # Windows (Git Bash/WSL): `start` is a cmd builtin, not on PATH,
+                 # so invoke it via cmd.exe. The empty "" is start's title arg.
+                 cmd.exe /c start "" "$_dash_url" 2>/dev/null || true
+             fi
          fi
          return 0
      else
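The auto-open gate has two parts: deciding whether a browser should be opened at all, and picking the first opener command available on the machine. The Python sketch below mirrors both. The function names are assumptions, the `which` parameter exists only so the lookup can be faked in a test, and no browser is launched.

```python
import shutil

# Sketch only: should_auto_open and opener_command are hypothetical names
# that restate the bash condition and the open / xdg-open / cmd.exe chain.


def should_auto_open(stdout_is_tty, background_mode, env):
    """True only for an interactive foreground run that has not opted out."""
    return (
        stdout_is_tty
        and background_mode != "true"
        and (env.get("LOKI_NO_AUTO_OPEN") or "0") != "1"
    )


def opener_command(url, which=shutil.which):
    """Return the argv for the first available opener, or None."""
    if which("open"):
        return ["open", url]
    if which("xdg-open"):
        return ["xdg-open", url]
    if which("cmd.exe"):
        # `start` is a cmd builtin; the empty string is its title argument.
        return ["cmd.exe", "/c", "start", "", url]
    return None


url = "http://127.0.0.1:57374/"
print(should_auto_open(True, "false", {}))                          # True
print(should_auto_open(True, "false", {"LOKI_NO_AUTO_OPEN": "1"}))  # False
print(should_auto_open(False, "false", {}))                         # False
print(opener_command(url, which=lambda name: name == "xdg-open"))
# ['xdg-open', 'http://127.0.0.1:57374/']
```

On a real terminal the first argument would come from `sys.stdout.isatty()`, which plays the role of `[ -t 1 ]` in the shell version.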
@@ -10429,6 +10465,14 @@ build_prompt() {
      # same precedent as AUTONOMY_OVERRIDE_TEXT in providers/claude_flags.ts).
      local agents_md_instruction="Project conventions: read AGENTS.md in the repository root for build, test, and style conventions. If AGENTS.md is absent, read CLAUDE.md instead. The nearest such file to the code you are editing takes precedence."
  
+     # Compose-first instruction (v7.26.0): unconditional string with conditional
+     # phrasing (YOU decide whether the app warrants compose, not a static grep).
+     # When an app needs more than one running service (web + database and/or
+     # cache), a docker-compose.yml is the run contract Loki's app runner uses to
+     # bring the whole stack up locally. This string MUST stay byte-identical to
+     # COMPOSE_INSTRUCTION in loki-ts/src/runner/build_prompt.ts (parity-locked).
+     local compose_instruction="RUN_CONTRACT: Decide whether this app needs more than one running service. If it needs a database (PostgreSQL/MySQL/MongoDB) and/or a cache (Redis) and/or separate backend and frontend services, generate a 12-factor docker-compose.yml at the project root so the whole stack runs with one command. Requirements when you generate it: (1) name the primary web/app service so it is obvious (service key 'web' or 'app', or add the label 'loki.primary=true' on it) and publish its HTTP port (host:container, e.g. '3000:3000'); (2) give every service a healthcheck (the web service must have an HTTP healthcheck so 'up' means actually serving, not just started); (3) wire dependencies with depends_on and config via environment variables; (4) write a .env.example listing every required variable with safe placeholder values; (5) keep secrets out of the compose file and out of git. If the app is a single service with no datastore, do NOT add compose; a plain run command is correct. If a working docker-compose.yml already exists and matches the app, leave it; otherwise create or update it. Verify the stack comes up (docker compose up) before claiming completion."
+
      # Load existing context if resuming
      local context_injection=""
      if [ $retry -gt 0 ]; then
@@ -10758,15 +10802,15 @@ except Exception:
      else
          if [ $retry -eq 0 ]; then
              if [ -n "$prd" ]; then
-                 echo "Loki Mode with PRD at $prd. $update_instruction $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                 echo "Loki Mode with PRD at $prd. $update_instruction $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
              else
-                 echo "Loki Mode. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $analysis_instruction $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                 echo "Loki Mode. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $analysis_instruction $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
              fi
          else
              if [ -n "$prd" ]; then
-                 echo "Loki Mode - Resume iteration #$iteration (retry #$retry). PRD: $prd. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                 echo "Loki Mode - Resume iteration #$iteration (retry #$retry). PRD: $prd. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
              else
-                 echo "Loki Mode - Resume iteration #$iteration (retry #$retry). $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section Use .loki/generated-prd.md if exists. $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                 echo "Loki Mode - Resume iteration #$iteration (retry #$retry). $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section Use .loki/generated-prd.md if exists. $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
              fi
          fi
      fi
@@ -10808,6 +10852,7 @@ except Exception:
              printf 'You are a coding assistant. Analyze this codebase and suggest improvements. Write working code and commit changes.\n'
          fi
          printf '%s\n' "$usage_doc_instruction"
+         printf '%s\n' "$compose_instruction"
          printf '%s\n' "$lsp_grounding_instruction"
          printf '%s\n' "$agents_md_instruction"
          printf '</loki_system>\n'
@@ -10841,6 +10886,7 @@ except Exception:
          printf '%s\n' "$autonomous_suffix"
          printf '%s\n' "$memory_instruction"
          printf '%s\n' "$usage_doc_instruction"
+         printf '%s\n' "$compose_instruction"
          printf '%s\n' "$lsp_grounding_instruction"
          printf '%s\n' "$agents_md_instruction"
          # For codebase-analysis mode (no PRD), analysis_instruction is part of the
@@ -12141,6 +12187,59 @@ except Exception as exc:
          && loki_claude_flag_supported "--include-partial-messages"; then
          _loki_claude_argv+=("--include-partial-messages")
      fi
+     # ---- Bash<->Bun invocation-flag convergence ledger (v7.25.0) ----------
+     # The fixture corpus covers build_prompt/stats output, NOT this claude
+     # argv, so drift here is invisible to parity tests. Keep this ledger
+     # current. Live route today is BASH (bin/loki routes `start` -> bash).
+     # The claude provider in loki-ts/src/runner/providers.ts is implemented
+     # but is NOT reached for `start` (start is not ported to the Bun router;
+     # the shim falls through to bash), so its flag set has zero live impact
+     # today.
+     # Bash argv (canonical, live): --dangerously-skip-permissions --model M
+     # [--append-system-prompt] [--setting-sources] [--include-partial-messages]
+     # [--effort] [--max-budget-usd] [--fallback-model] -p PROMPT
+     # --output-format stream-json --verbose
+     # Bun buildAutoFlags also emits: --exclude-dynamic-system-prompt-sections
+     # (cost-only), --mcp-config (bash gets MCP via --setting-sources +
+     # .mcp.json discovery; a how-difference, likely behavior-equivalent),
+     # --include-hook-events (bash handles hook events in its embedded
+     # stream parser; likely moot). These three are Bun-only and MUST be
+     # reconciled to a deliberately chosen canonical set BEFORE `start`
+     # flips to the Bun runner. They have zero live impact today.
+     # v7.25.0: long-run resilience + cost flags, appended individually here
+     # (NOT via _loki_build_claude_auto_flags, which would double the three
+     # flags above). Each is gated on CLI support + an opt-out env var, same
+     # pattern as above. These improve unattended/long-run execution:
+     #   --effort          adaptive reasoning depth per RARV tier
+     #   --max-budget-usd  per-call hard backstop (complements the
+     #                     cumulative check_budget_limit PAUSE gate)
+     #   --fallback-model  resilience to model overload/unavailability
+     # The trust/verification gates stay deterministic; these only tune how
+     # the provider is invoked, never whether work is judged complete.
+     if [ "${LOKI_AUTO_EFFORT:-on}" != "off" ] \
+         && type loki_effort_for_tier >/dev/null 2>&1 \
+         && type loki_claude_flag_supported >/dev/null 2>&1 \
+         && loki_claude_flag_supported "--effort"; then
+         local _loki_effort
+         _loki_effort="$(loki_effort_for_tier "$CURRENT_TIER" "${DETECTED_COMPLEXITY:-${LOKI_COMPLEXITY:-standard}}")"
+         [ -n "$_loki_effort" ] && _loki_claude_argv+=("--effort" "$_loki_effort")
+     fi
+     if [ "${LOKI_AUTO_BUDGET:-on}" != "off" ] \
+         && type loki_remaining_budget >/dev/null 2>&1 \
+         && type loki_claude_flag_supported >/dev/null 2>&1 \
+         && loki_claude_flag_supported "--max-budget-usd"; then
+         local _loki_rem_budget
+         _loki_rem_budget="$(loki_remaining_budget)"
+         [ -n "$_loki_rem_budget" ] && _loki_claude_argv+=("--max-budget-usd" "$_loki_rem_budget")
+     fi
+     if [ "${LOKI_AUTO_FALLBACK:-on}" != "off" ] \
+         && type loki_fallback_for_primary >/dev/null 2>&1 \
+         && type loki_claude_flag_supported >/dev/null 2>&1 \
+         && loki_claude_flag_supported "--fallback-model"; then
+         local _loki_fallback
+         _loki_fallback="$(loki_fallback_for_primary "$tier_param")"
+         [ -n "$_loki_fallback" ] && _loki_claude_argv+=("--fallback-model" "$_loki_fallback")
+     fi
      case "${PROVIDER_NAME:-claude}" in
          claude)
              # Claude: Full features with stream-json output and agent tracking
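Each of the three optional flags above is appended only when three things hold: its opt-out variable is not `off`, the CLI reports support for the flag, and the helper produced a non-empty value. The sketch below restates that pattern in Python. The function name, the `supported` set, and every value passed in are placeholders chosen for illustration; they are not real effort levels, budgets, or model names.

```python
# Sketch only: optional_claude_flags and all sample values are hypothetical.
# `supported` stands in for loki_claude_flag_supported, and the three value
# arguments stand in for the helper outputs in the hunk above.


def optional_claude_flags(supported, env, effort, remaining_budget, fallback_model):
    """Build the optional part of the argv, one gated flag at a time."""
    candidates = [
        ("LOKI_AUTO_EFFORT", "--effort", effort),
        ("LOKI_AUTO_BUDGET", "--max-budget-usd", remaining_budget),
        ("LOKI_AUTO_FALLBACK", "--fallback-model", fallback_model),
    ]
    argv = []
    for opt_out_var, flag, value in candidates:
        if (env.get(opt_out_var) or "on") == "off":
            continue  # explicitly disabled by the user
        if flag not in supported:
            continue  # installed CLI does not know this flag
        if not value:
            continue  # helper returned nothing, so skip the flag
        argv += [flag, str(value)]
    return argv


all_flags = {"--effort", "--max-budget-usd", "--fallback-model"}
print(optional_claude_flags(all_flags, {}, "example-effort", "1.25", "example-model"))
# ['--effort', 'example-effort', '--max-budget-usd', '1.25', '--fallback-model', 'example-model']
print(optional_claude_flags(all_flags, {"LOKI_AUTO_BUDGET": "off"}, "example-effort", "1.25", ""))
# ['--effort', 'example-effort']
```

Because each flag is checked independently, an older CLI that lacks one flag still receives the others.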
@@ -7,7 +7,7 @@ Modules:
  control: Session control API (start/stop/pause/resume)
  """
  
- __version__ = "7.24.0"
+ __version__ = "7.26.0"
  
  # Expose the control app for easy import
  try:
@@ -2,7 +2,7 @@
  
  The flagship product of [Autonomi](https://www.autonomi.dev/). Loki Mode is a spec-driven autonomous builder with a built-in trust layer that takes any spec to a deployed product and verifies completion with evidence (quality gates plus a completion council), not just a "done" claim. Complete installation instructions for all platforms and use cases.
  
- **Version:** v7.24.0
+ **Version:** v7.26.0
  
  ---
  
@@ -1,5 +1,5 @@
  // @bun
- var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.24.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new 
a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
+ var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.26.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new 
a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
  `),process.stdout.write(`Install with:
  `),process.stdout.write(` brew install jq (macOS)
  `),process.stdout.write(` apt install jq (Debian/Ubuntu)
@@ -787,4 +787,4 @@ Set LOKI_LEGACY_BASH=1 to force the bash CLI for every command.
  `),2}default:return process.stderr.write(`Unknown command: ${Q}
  `),process.stderr.write(v8),2}}g$();process.on("SIGINT",()=>process.exit(130));process.on("SIGTERM",()=>process.exit(143));var l3=await p3(Bun.argv.slice(2));process.exit(l3);
 
- //# debugId=5BE33C3C3E53FD8864756E2164756E21
+ //# debugId=7BD97DA7996A924D64756E2164756E21
package/mcp/__init__.py CHANGED
@@ -57,4 +57,4 @@ try:
  except ImportError:
  __all__ = ['mcp']
 
- __version__ = '7.24.0'
+ __version__ = '7.26.0'
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "loki-mode",
- "version": "7.24.0",
+ "version": "7.26.0",
  "description": "Loki Mode by Autonomi. Autonomous spec-to-product system: takes a PRD, GitHub issue, OpenAPI/JSON/YAML, or one-line brief to a deployed app via the RARV-C closure loop with 11 quality gates. Provider-agnostic (Claude Code, OpenAI Codex, Cline, Aider).",
  "keywords": [
  "agent",
@@ -1,114 +0,0 @@
- # Loki Mode vs Replit Agent / Lovable / Bolt.new -- Instant-Preview & Self-Healing Gap Analysis
-
- Date: 2026-06-09
- Author: autonomous analysis (verified against Loki source + competitor web research at knowledge cutoff Jan 2026 plus June 2026 web search)
- Status: analysis + prioritized TODO. The TODO is the planning checkpoint, not an auto-launched implementation.
-
- ## 1. The user's framing
-
- > "When users prompt a spec or idea, [Replit/Lovable/Bolt] just builds and spins up UI so users can try it. They are clever at suggesting what to add/update/improve/remove, finding bugs while the app runs and fixing them autonomously before the user realizes. Why are we not able to do that? We are lacking something that is causing friction for users (devs, enterprises, non-technical consumers)."
-
- This is correct as a *felt experience* gap. It is NOT correct that Loki lacks the underlying capability. The gap is mostly **surfacing and loop-tightness**, plus a real **category difference**.
-
- ## 2. What the competitors actually do (verified June 2026)
-
- ### Replit Agent 3
- - Runs autonomously up to ~200 minutes ("10x more autonomous than Agent 2").
- - "Self-healing" loop: spins up a real browser, simulates user behavior (click/type/login), captures logs, and fixes bugs it hits during testing. Calls out "Potemkin interfaces" (looks-done-but-broken).
- - REPL-based verification at scale; provisions database; verifies every button/API call.
- - Code-to-Device: builds + previews native mobile via Expo QR instantly.
- - "Stacks": agents building agents. RulesSync for replit.md across projects.
- - Hosted: your project runs on Replit's cloud, instant live preview built into the IDE.
-
- ### Lovable
- - Three modes: Visual Edits (click an element), Plan Mode (conversational), Agent Mode (autonomous codebase exploration + proactive debugging + real-time web search).
- - Live preview updates in real time; checkpoint/version system to revert a bad edit.
- - Pricing: Free (5 daily credits), Pro $25/mo, Business $50/mo; plus usage-based Cloud/AI billing on shipped apps.
- - Hosted; greenfield-biased (build a new app from a prompt).
-
- ### Bolt.new (StackBlitz)
- - WebContainer: Node runs natively *in the browser*, no server. Live preview updates as code generates; you click/fill/interact immediately.
- - Agent has full control of filesystem, node server, package manager, terminal, browser console; human-in-the-loop chat + hand-edit.
- - Bolt V2: Bolt Cloud (DB, auth, storage, edge functions, analytics, hosting), one-click Netlify deploy. Opus 4.6 with adjustable reasoning depth (Jan 2026).
- - Hosted/in-browser; greenfield-biased.
-
- Common thread: **hosted text-to-app**. Instant live preview is free because the runtime IS their cloud/browser sandbox. The inner loop (build -> see -> fix) is conversational, visible, real-time.
-
- ## 3. What Loki actually has (verified in source)
-
- | Capability | Competitors | Loki today | Source |
- |---|---|---|---|
- | Starts the built app on a detected port | yes | YES (`app_runner_start`, `_detect_port`) | `autonomy/app-runner.sh:498,150` |
- | Health check + crash watchdog + auto-restart | yes | YES (`app_runner_watchdog`, `app_runner_should_restart`, `app_runner_health_check`) | `autonomy/app-runner.sh:769,735,678` |
- | Browser smoke test of the running app | yes (their core loop) | YES, batch (`playwright-verify.sh`) | `autonomy/playwright-verify.sh` |
- | Crash/playwright signal fed back into next iteration to self-correct | yes, tight realtime | YES, batch (`app_runner_info`/`playwright_info` injected into `build_prompt`) | `autonomy/run.sh:10544,10561,10761` |
- | Verified completion gate (no fabricated "done") | partial | YES (`council_evidence_gate`) | `completion-council.sh` |
- | Failure-memory: past failures injected to avoid repeats | partial | YES (`retrieve_anti_patterns`) | `memory/retrieval.py` |
- | **Clickable live preview URL / embedded app the user can try** | YES, central | **NO -- app runs but URL is not surfaced in dashboard** | gap: `dashboard/server.py` has no preview route |
- | **Real-time visible inner loop (watch it build/fix as it happens)** | YES | Partial -- dashboard shows iterations/logs, but app preview not embedded; loop is a longer autonomous batch | gap |
- | **Proactive "suggest add/improve/remove" surfaced to user** | YES | Partial -- analysis pass exists internally; not surfaced as user-facing suggestions | gap |
- | Works on existing/brownfield repos | weak (greenfield-biased) | STRONG | `loki heal`, codebase analysis |
- | No hosted-runtime lock-in; your machine, your code | NO (vendor cloud) | YES | local-first CLI |
- | Multi-provider (Claude/Codex/Cline/Aider) | NO | YES | `providers/*.sh` |
-
- ## 4. Where Loki is BETTER (keep + lead with these)
-
- 1. **Local-first, no vendor lock-in.** Your code never leaves your machine; no hosted runtime you can be evicted from or surprised-billed on (Lovable's dual-layer billing is a known pain). Enterprises care about this.
- 2. **Brownfield/existing repos.** Replit/Lovable/Bolt are greenfield-biased (prompt -> new app). Loki runs on real existing codebases, including `loki heal` for legacy systems.
- 3. **Verified completion + failure-memory.** Evidence gate blocks fabricated "done"; anti-pattern memory prevents repeating mistakes across runs. The hosted tools mostly re-discover bugs each session.
- 4. **Multi-provider + your own keys/budget.** Not locked to one model vendor or a credit economy.
- 5. **Depth of autonomous SDLC** (RARV, council review, quality gates) vs a single conversational agent.
-
- ## 5. Where Loki is WORSE (the real friction)
-
- 1. **No instant "try it" moment.** The app DOES start (app-runner), but the dashboard never hands the user a clickable URL or embedded preview. This is the single biggest felt gap and it is a *surfacing* fix, not a build.
- 2. **Setup friction.** Competitors are zero-install (open a browser). Loki needs the `claude` CLI, a terminal, `loki start`. Non-technical consumers stall here.
- 3. **The inner loop is a long batch, not a watched conversation.** Users can't see "it's testing the login button now and fixing it." Same capability, far less visible.
- 4. **Suggestions aren't surfaced.** Loki's analysis pass reasons about what to add/fix internally, but doesn't present a user-facing "here's what I'd improve next" list.
- 5. **First-run time-to-wow.** No 30-second "look, it works" the way a hosted preview gives.
-
- ## 6. Honest category line (for positioning, not inferiority)
-
- Hosted text-to-app SaaS (Replit/Lovable/Bolt): instant live preview, tight visible loop, friendly to non-technical users -- but your code on their cloud, vendor + credit lock-in, greenfield-biased.
-
- Loki Mode: local-first CLI driving Claude on your own machine -- brownfield-capable, no hosted-runtime lock-in, multi-provider, verified completion + memory -- but no instant hosted preview and higher setup friction.
-
- The wins below close the *experience* gap without giving up the local-first advantages.
-
- ## 7. Prioritized TODO (by blast radius / friction reduction)
-
- ### P0 -- Live Preview surfacing (the headline win; cheapest path to "try it")
- The app already starts with crash watchdog. Surface it.
- - **Dashboard:** add a "Live App" panel that reads `.loki/app-runner/state.json` (status, port, url, crash_count), shows a clickable `http://localhost:<port>` link + an embedded iframe + health/crash badge + "Restart app" button (wire to existing `app_runner_restart`).
- - **CLI:** `loki preview` (alias `loki open`) -- prints the running app URL and opens the browser; honest message if no app is running yet.
- - **API:** `GET /api/app-runner` (state passthrough), `POST /api/app-runner/restart`.
- - Pure surfacing of existing state; no new runtime behavior. Lowest risk, highest felt impact.
-
- ### P1 -- Tighter, visible self-healing loop
- - Stream app-runner crash events + playwright pass/fail to the dashboard timeline in near-real-time (event bus already exists, `events/bus.py`).
- - Dashboard "what just happened" feed: "app crashed -> reading log -> fixing -> restarted -> smoke test passed."
- - Honest framing: this exposes the EXISTING batch loop more visibly; it does not claim Replit's per-click realtime browser sim.
-
- ### P2 -- Proactive suggestions surfaced to the user
- - Add a structured "Suggestions" output from the analysis pass (add/improve/remove/risk), persisted to `.loki/suggestions.json`.
- - Dashboard "Suggestions" panel + `loki suggest` CLI to print them.
- - These are advisory; the user opts in to queue any as tasks.
-
- ### P3 -- First-run time-to-wow / setup friction
- - `loki try <one-line-idea>`: scaffold a tiny app, build it, auto-start app-runner, open preview -- a guided 60-second "it works" path (honest: real build, not simulated).
- - Doctor-style preflight that detects missing `claude` CLI and guides install.
-
- ### P4 -- Non-technical on-ramp (longer term, optional)
- - Evaluate an optional hosted/containerized preview for users who can't run locally (collides with zero-egress posture; opt-in only, deferred).
-
- ## 8. What this session will actually implement
-
- Per user direction ("update dashboard and backend cli or api accordingly", "plan it perfectly", "complete autonomously"): implement **P0 (Live Preview surfacing)** end-to-end (dashboard + CLI + API), both runtime routes where applicable, council-reviewed, local-ci 42/42, channels validated. P1-P4 are scoped follow-ups.
-
- ## 9. SWE-bench note (unrelated but pending, must be stated)
- Primary-source data in `benchmarks/results/` shows ONLY patch generation (299/300 generated, `fixed_by_rarv:0`, status PATCHES_GENERATED, the official evaluator was never run, no resolve/pass-rate figure exists; some patches are prose not diffs). There is no "release 660." Publishing a SWE-bench resolve score would be fabrication. The only real measured number is HumanEval **98.78%** (162/164). Recommendation: lead with HumanEval; keep SWE-bench as "harness exists, resolve-rate not yet measured" + the repro command; offer to run the official evaluator as an opt-in upgrade.
-
- ## Sources (competitor research, June 2026)
- - Replit Agent 3: https://blog.replit.com/introducing-agent-3-our-most-autonomous-agent-yet , https://blog.replit.com/automated-self-testing , https://docs.replit.com/core-concepts/agent
- - Lovable: https://lovable.dev/ , https://lovable.dev/pricing , https://www.nocode.mba/articles/lovable-ai-app-builder
- - Bolt.new: https://github.com/stackblitz/bolt.new , https://capacity.so/blog/what-is-bolt-new , https://www.banani.co/blog/bolt-new-ai-review-and-alternatives