npm - loki-mode - Versions diffs - 7.24.0 → 7.26.0 - Mend

loki-mode 7.24.0 → 7.26.0


Files changed (11)

package/README.md +4 -2
package/SKILL.md +2 -2
package/VERSION +1 -1
package/autonomy/app-runner.sh +185 -14
package/autonomy/run.sh +106 -7
package/dashboard/__init__.py +1 -1
package/docs/INSTALLATION.md +1 -1
package/loki-ts/dist/loki.js +2 -2
package/mcp/__init__.py +1 -1
package/package.json +1 -1
package/docs/COMPETITIVE-ANALYSIS-INSTANT-PREVIEW-2026-06.md +0 -114

package/README.md CHANGED

@@ -27,11 +27,13 @@
 - **Spec-driven, autonomous, with a built-in trust layer** -- Hand Loki a spec, walk away, come back to working code with tests. The full RARV-C closure loop (Reason - Act - Reflect - Verify - Close) runs until the work is actually done, not just attempted. The verified-completion evidence gate (`skills/quality-gates.md`) refuses any "done" claim on an empty git diff against the run-start commit, and blocks completion when tests run red, so "complete" means proven, not promised.
 - **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
 - **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
+- **Compose-first fullstack** -- When a spec needs more than one service (web + database + cache) Loki generates a 12-factor `docker-compose.yml` with healthchecks, `depends_on` wiring, env-var config, and a `.env.example`. The Live App Preview surfaces the web service URL (not a database port), and health reflects the web service's Docker healthcheck so a crashed app shows as crashed even when the database stays up. Single-service apps stay on a plain run command. All local-first, no hosted service (v7.26.0).
+- **Intelligent `loki start`** -- For interactive foreground runs the dashboard auto-opens in the browser (cross-platform; skipped in CI, SSH-without-TTY, and piped runs; opt out with `LOKI_NO_AUTO_OPEN=1`). The completion summary shows "Your app is live at <url>" so you know exactly where to try what Loki just built. The autonomous loop passes Claude Code's `--effort`, `--max-budget-usd`, and `--fallback-model` on every iteration (each gated on CLI support and individual opt-out env vars) for better long-run unattended execution (v7.25.0).
 - **Cross-project memory** -- Episodic/semantic/procedural memory with vector search; knowledge learned on one project surfaces on the next (v5.15.0+, see `memory/engine.py`)
 - **Self-hosted and private** -- Your keys, your infrastructure, no data leaves your network
 - **Legacy system healing** -- `loki heal` archaeology/stabilize/isolate/modernize/validate phases (v6.67.0, see `skills/healing.md`)
 - **MCP server** -- 34 tools (including ChromaDB code search) plus 3 resources and 2 prompts (`mcp/server.py`, with managed-memory and magic tools registered from `mcp/managed_tools.py` and `mcp/magic_tools.py`)
-- **Full-stack output** -- Source code, tests, Docker configs, CI/CD pipelines, audit logs
+- **Full-stack output** -- Source code, tests, Docker Compose stacks (multi-service with healthchecks), CI/CD pipelines, audit logs
 - **Provider-agnostic** -- runs on Claude, Codex, Cline, or Aider with automatic failover (`loki-ts/src/runner/providers.ts`); no vendor lock-in. Gemini CLI deprecated v7.5.18; Antigravity CLI coming soon.
 - **Open source** -- Free for personal, internal, and academic use.
@@ -347,7 +349,7 @@ Claude gets full features (subagents, parallelization, MCP, Task tool). Other ac
 | Command | Description |
 |---------|-------------|
-| `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`) |
+| `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`). Auto-opens the dashboard in the browser for interactive runs and passes native `--effort`/`--max-budget-usd`/`--fallback-model` for resilience (v7.25.0) |
 | `loki stop` | Stop execution |
 | `loki heal <path>` | Legacy system healing (archaeology, stabilize, isolate, modernize, validate -- v6.67.0) |
 | `loki pause` / `resume` | Pause/resume after current session |

package/SKILL.md CHANGED

@@ -3,7 +3,7 @@ name: loki-mode
 description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
 ---
-# Loki Mode v7.24.0
+# Loki Mode v7.26.0
 **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
@@ -383,4 +383,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r
 ---
-**v7.24.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+**v7.26.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**

package/VERSION CHANGED

@@ -1 +1 @@
-7.24.0
+7.26.0

package/autonomy/app-runner.sh CHANGED

@@ -43,6 +43,10 @@ _APP_RUNNER_PID=""
 _APP_RUNNER_URL=""
 _APP_RUNNER_IS_DOCKER=false
 _APP_RUNNER_DOCKER_CONTAINER=""
+# v7.26.0 (Phase 4): the identified primary web service of a compose project,
+# used for service-aware health checks and the preview URL. Empty for
+# non-compose runs or when identification falls back to legacy port parsing.
+_APP_RUNNER_WEB_SERVICE=""
 _APP_RUNNER_HAS_SETSID=false
 _APP_RUNNER_CRASH_COUNT=0
 _APP_RUNNER_RESTART_COUNT=0
@@ -146,6 +150,77 @@ _rotate_app_log() {
     fi
 }
+# Identify the primary web service of a docker compose project and its
+# published host port. Uses `docker compose config --format json` (fully
+# resolved: env-interpolated, overrides merged) parsed with python3, so we do
+# NOT hand-parse YAML. Precedence MATCHES the contract in COMPOSE_INSTRUCTION
+# (run.sh build_prompt): (1) label loki.primary=true, (2) service named
+# web/app, (3) service publishing a common web port, (4) first service with any
+# published port. Echoes "service_name|published_port" on success, nothing on
+# failure (caller falls back to legacy behavior). Never hard-fails.
+# v7.26.0 (Phase 4): fixes the multi-service URL/health gaps (GAP #1-4).
+_identify_compose_web_service() {
+    local base="${1:-${TARGET_DIR:-.}}"
+    local compose_dir
+    compose_dir=$(_app_runner_compose_dir "$base")
+    command -v docker >/dev/null 2>&1 || return 0
+    command -v python3 >/dev/null 2>&1 || return 0
+    local cfg
+    cfg=$(cd "$compose_dir" && docker compose config --format json 2>/dev/null) || return 0
+    [ -n "$cfg" ] || return 0
+    printf '%s' "$cfg" | python3 -c '
+import json, sys
+COMMON = ["3000", "8000", "8080", "5000", "4200", "5173", "80"]
+try:
+    d = json.load(sys.stdin)
+except Exception:
+    sys.exit(0)
+services = d.get("services", {})
+if not isinstance(services, dict) or not services:
+    sys.exit(0)
+def published_ports(svc):
+    out = []
+    for p in (svc.get("ports") or []):
+        if isinstance(p, dict):
+            pub = p.get("published")
+        else:
+            pub = None
+        if pub is not None and str(pub).strip():
+            out.append(str(pub).strip())
+    return out
+# (1) label loki.primary=true
+for name, svc in services.items():
+    labels = svc.get("labels") or {}
+    if isinstance(labels, list):
+        labels = dict(x.split("=", 1) for x in labels if "=" in x)
+    if str(labels.get("loki.primary", "")).lower() == "true":
+        pp = published_ports(svc)
+        if pp:
+            print(name + "|" + pp[0]); sys.exit(0)
+# (2) service named web/app
+for cand in ("web", "app"):
+    svc = services.get(cand)
+    if svc:
+        pp = published_ports(svc)
+        if pp:
+            print(cand + "|" + pp[0]); sys.exit(0)
+# (3) service publishing a common web port
+for name, svc in services.items():
+    pp = published_ports(svc)
+    for cp in COMMON:
+        if cp in pp:
+            print(name + "|" + cp); sys.exit(0)
+# (4) first service with any published port
+for name, svc in services.items():
+    pp = published_ports(svc)
+    if pp:
+        print(name + "|" + pp[0]); sys.exit(0)
+sys.exit(0)
+' 2>/dev/null || return 0
+}
 # Detect port from project files
 _detect_port() {
     local method="$1"
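
Reviewer note on the hunk above: the four-step precedence in `_identify_compose_web_service` can be exercised on a hand-written config. This is a minimal sketch, not the package's code. It assumes the resolved compose JSON uses long-form port entries with a `published` key, as the embedded script expects. The function name `pick_web_service` and the sample service names are hypothetical, and list-form labels are left out for brevity.

```python
COMMON_WEB_PORTS = ["3000", "8000", "8080", "5000", "4200", "5173", "80"]


def published_ports(service):
    """Return published host ports (as strings) from long-form port entries."""
    ports = []
    for entry in service.get("ports") or []:
        published = entry.get("published") if isinstance(entry, dict) else None
        if published is not None and str(published).strip():
            ports.append(str(published).strip())
    return ports


def pick_web_service(config):
    """Hypothetical helper mirroring the precedence described in the diff."""
    services = config.get("services") or {}
    # (1) explicit label loki.primary=true (dict-form labels only in this sketch)
    for name, svc in services.items():
        labels = svc.get("labels") or {}
        if isinstance(labels, dict) and str(labels.get("loki.primary", "")).lower() == "true":
            ports = published_ports(svc)
            if ports:
                return name, ports[0]
    # (2) a service named web or app
    for name in ("web", "app"):
        ports = published_ports(services.get(name) or {})
        if ports:
            return name, ports[0]
    # (3) the first service that publishes a common web port
    for name, svc in services.items():
        ports = published_ports(svc)
        for common in COMMON_WEB_PORTS:
            if common in ports:
                return name, common
    # (4) the first service with any published port
    for name, svc in services.items():
        ports = published_ports(svc)
        if ports:
            return name, ports[0]
    return None


# Sample input: the database is listed first, yet the web service should win.
sample = {
    "services": {
        "db": {"ports": [{"published": "5432", "target": 5432}]},
        "cache": {},
        "web": {"ports": [{"published": "3000", "target": 3000}]},
    }
}
```

With this sample, `pick_web_service(sample)` returns `("web", "3000")` even though `db` appears first. That is the case the legacy first-port grep handled incorrectly.
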
@@ -158,18 +233,34 @@ _detect_port() {
     case "$method" in
         *docker\ compose*)
-            # Parse port from compose file
-            local compose_file
-            if [ -f "${TARGET_DIR:-.}/docker-compose.yml" ]; then
-                compose_file="${TARGET_DIR:-.}/docker-compose.yml"
+            # v7.26.0: identify the PRIMARY WEB service and ITS published port
+            # via docker compose config (resolved JSON), so the preview URL and
+            # health check target the web service, not whichever port (e.g. a
+            # db/cache) appears first in the file. Falls back to the legacy
+            # first-port grep when docker/python is unavailable or no web
+            # service is found.
+            local web_info web_port
+            web_info=$(_identify_compose_web_service "${TARGET_DIR:-.}")
+            if [ -n "$web_info" ]; then
+                _APP_RUNNER_WEB_SERVICE="${web_info%%|*}"
+                web_port="${web_info##*|}"
+            fi
+            if [ -n "${web_port:-}" ] && [[ "$web_port" =~ ^[0-9]+$ ]]; then
+                _APP_RUNNER_PORT="$web_port"
             else
-                compose_file="${TARGET_DIR:-.}/compose.yml"
+                # Legacy fallback: first published port from the compose file.
+                local compose_file
+                if [ -f "${TARGET_DIR:-.}/docker-compose.yml" ]; then
+                    compose_file="${TARGET_DIR:-.}/docker-compose.yml"
+                else
+                    compose_file="${TARGET_DIR:-.}/compose.yml"
+                fi
+                local port
+                # Handle both simple (HOST:CONTAINER) and IP-bound (IP:HOST:CONTAINER) port formats
+                # Also handle port ranges like "8080-8090:8080-8090" by taking the first port
+                port=$(grep -E '^\s*-\s*"?[0-9]' "$compose_file" 2>/dev/null | head -1 | sed 's/.*- *"*//;s/".*//;' | awk -F: '{print $(NF-1)}' | awk -F- '{print $1}')
+                _APP_RUNNER_PORT="${port:-8080}"
             fi
-            local port
-            # Handle both simple (HOST:CONTAINER) and IP-bound (IP:HOST:CONTAINER) port formats
-            # Also handle port ranges like "8080-8090:8080-8090" by taking the first port
-            port=$(grep -E '^\s*-\s*"?[0-9]' "$compose_file" 2>/dev/null | head -1 | sed 's/.*- *"*//;s/".*//;' | awk -F: '{print $(NF-1)}' | awk -F- '{print $1}')
-            _APP_RUNNER_PORT="${port:-8080}"
             ;;
         *docker\ build*)
             local port
@@ -694,14 +785,64 @@ app_runner_health_check() {
         # retries are handled in app_runner_start.
         local running_containers
         running_containers=$(LOKI_COMPOSE_HEALTH_TIMEOUT=1 _app_runner_compose_running_count "${TARGET_DIR:-.}")
-        if [ "${running_containers:-0}" -gt 0 ]; then
+        if [ "${running_containers:-0}" -le 0 ]; then
+            # Nothing running at all.
+            _write_health "false"
+            _write_app_state "crashed"
+            return 1
+        fi
+        # v7.26.0 (Phase 4) GAP #4 fix: "some container is up" is NOT health for
+        # a multi-service stack. If we identified a primary web service, health
+        # keys on THAT service, not on whether any container (e.g. a db/cache) is
+        # up. Two signals, in order: (1) the web service's docker HEALTHCHECK
+        # result when one is declared (COMPOSE_INSTRUCTION mandates an HTTP
+        # healthcheck on the web service) -- "healthy" means actually serving,
+        # "unhealthy"/"starting" do not; (2) when no healthcheck is declared,
+        # fall back to the container lifecycle State (running), matching the
+        # codebase convention. Reads both fields from one `docker compose ps`.
+        if [ -n "${_APP_RUNNER_WEB_SERVICE:-}" ]; then
+            local _web_line _web_state _web_health
+            _web_line=$(cd "$(_app_runner_compose_dir "${TARGET_DIR:-.}")" \
+                && docker compose ps --format '{{.Service}}|{{.State}}|{{.Health}}' 2>/dev/null \
+                | tr -d '\r' | awk -F'|' -v s="$_APP_RUNNER_WEB_SERVICE" '$1==s {print; exit}')
+            _web_state=$(printf '%s' "$_web_line" | awk -F'|' '{print $2}')
+            _web_health=$(printf '%s' "$_web_line" | awk -F'|' '{print $3}')
+            if [ "$_web_state" != "running" ]; then
+                # Container not running -> definitively down.
+                _write_health "false"
+                _write_app_state "crashed"
+                return 1
+            fi
+            if [ -n "$_web_health" ]; then
+                # A healthcheck is declared: it is authoritative.
+                if [ "$_web_health" = "healthy" ]; then
+                    _write_health "true"
+                    _write_app_state "running"
+                    return 0
+                fi
+                if [ "$_web_health" = "unhealthy" ]; then
+                    # Container up but failing its own healthcheck (not serving).
+                    _write_health "false"
+                    _write_app_state "crashed"
+                    return 1
+                fi
+                # "starting" (within start_period): up, not yet healthy. Report
+                # running so the watchdog gives it time instead of restarting,
+                # but do not yet claim a passing health.
+                _write_health "false"
+                _write_app_state "running"
+                return 0
+            fi
+            # No healthcheck declared: container running is the signal.
             _write_health "true"
             _write_app_state "running"
             return 0
-        else
-            _write_health "false"
-            return 1
         fi
+        # No web service identified (legacy/degraded): fall back to the
+        # original "any container running" signal.
+        _write_health "true"
+        _write_app_state "running"
+        return 0
     fi
     # Check PID is alive (non-docker-compose methods)
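
Reviewer note on the hunk above: the service-aware branch of `app_runner_health_check` reduces to a small decision table over the web service's `State` and `Health` fields. The sketch below restates that table in Python as a reading aid. The function name `compose_health` and the tuple layout are assumptions made for illustration; the shell code writes health.json and state.json rather than returning a tuple.

```python
def compose_health(state, health):
    """Map the web service's (State, Health) to (healthy, app_state, exit_code).

    Hypothetical restatement of the shell branch in the diff.
    """
    if state != "running":
        # Container not running: definitively down.
        return (False, "crashed", 1)
    if health:
        # A healthcheck is declared, so its result is authoritative.
        if health == "healthy":
            return (True, "running", 0)
        if health == "unhealthy":
            return (False, "crashed", 1)
        # "starting" (or any other non-empty value): up, not yet healthy.
        return (False, "running", 0)
    # No healthcheck declared: a running container is the only signal.
    return (True, "running", 0)
```

The `starting` row is the one to check in review. It returns exit code 0 while reporting health as false, so the watchdog does not restart a stack that is still inside its start period.
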
@@ -769,6 +910,36 @@ app_runner_should_restart() {
 app_runner_watchdog() {
     _app_runner_dir
+    # v7.26.0 (Phase 4): docker compose runs detached (`up -d` exits immediately),
+    # so the captured PID is a short-lived subshell and `kill -0` is the wrong
+    # liveness signal for a compose stack. For compose, delegate to
+    # app_runner_health_check, whose compose branch keys on the primary web
+    # SERVICE container running (GAP #4) and writes health.json + state.json.
+    # This is what makes the service-aware health logic actually fire in the
+    # live monitoring loop (not just in isolation). On an unhealthy web service
+    # it restarts the stack under the same crash-count circuit breaker.
+    if [ "$_APP_RUNNER_IS_DOCKER" = true ] && echo "$_APP_RUNNER_METHOD" | grep -q "docker compose"; then
+        if app_runner_health_check; then
+            return 0
+        fi
+        _APP_RUNNER_CRASH_COUNT=$(( _APP_RUNNER_CRASH_COUNT + 1 ))
+        log_warn "App Runner: compose web service unhealthy (crash #$_APP_RUNNER_CRASH_COUNT)"
+        if [ "$_APP_RUNNER_CRASH_COUNT" -ge 5 ]; then
+            log_error "App Runner: crash limit reached (5), marking as crashed"
+            tail -20 "$_APP_RUNNER_DIR/app.log" 2>/dev/null | while IFS= read -r line; do
+                log_error "  $line"
+            done
+            _write_app_state "crashed"
+            return 1
+        fi
+        local _c_backoff=$(( 1 << _APP_RUNNER_CRASH_COUNT ))
+        [ "$_c_backoff" -gt 30 ] && _c_backoff=30
+        log_info "App Runner: restarting compose stack in ${_c_backoff}s..."
+        sleep "$_c_backoff"
+        app_runner_start || log_warn "App Runner: compose auto-restart failed"
+        return 0
+    fi
     if [ -z "$_APP_RUNNER_PID" ] && [ -f "$_APP_RUNNER_DIR/app.pid" ]; then
         _APP_RUNNER_PID=$(cat "$_APP_RUNNER_DIR/app.pid" 2>/dev/null)
     fi
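
Reviewer note on the hunk above: the compose branch of `app_runner_watchdog` increments the crash counter before checking the limit, then sleeps for `1 << count` seconds capped at 30. The sketch below works through that arithmetic. The names `CRASH_LIMIT`, `MAX_BACKOFF_SECONDS`, and `compose_restart_delay` are hypothetical labels for the literals 5 and 30 in the shell code.

```python
CRASH_LIMIT = 5           # literal 5 in the shell branch
MAX_BACKOFF_SECONDS = 30  # literal 30 in the shell branch


def compose_restart_delay(crash_count):
    """Return the sleep before a restart, or None once the crash limit is hit.

    crash_count is the counter value AFTER it has been incremented.
    """
    if crash_count >= CRASH_LIMIT:
        return None
    return min(1 << crash_count, MAX_BACKOFF_SECONDS)


# Delays for the first four failures; the fifth trips the circuit breaker.
schedule = [compose_restart_delay(n) for n in range(1, 6)]
```

Here `schedule` evaluates to `[2, 4, 8, 16, None]`. With a limit of 5 the 30-second cap is never reached in this branch, so it only takes effect if the limit is raised.
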

package/autonomy/run.sh CHANGED

@@ -2423,6 +2423,20 @@ build_completion_summary() {
         *)              outcome_label="$outcome";          notify_title="Run finished" ;;
     esac
+    # Live app URL (best-effort): if the app runner has a running app, surface
+    # where the user can try it. Reads .loki/app-runner/state.json written by
+    # app-runner.sh. Empty when no app is running.
+    local live_app_url=""
+    local _app_state_file="$loki_dir/app-runner/state.json"
+    if [ -f "$_app_state_file" ]; then
+        live_app_url="$(python3 -c "import json,sys
+try:
+    d=json.load(open(sys.argv[1]))
+    print(d.get('url','') if d.get('status')=='running' else '')
+except Exception:
+    print('')" "$_app_state_file" 2>/dev/null)"
+    fi
     # Branch + diff stats vs the run-start SHA (best-effort; non-git or empty
     # baseline yields empty values, which we render as "unknown"/"0").
     local start_sha="${_LOKI_RUN_START_SHA:-}"
@@ -2483,6 +2497,15 @@ build_completion_summary() {
             echo "Pull request: not opened (set LOKI_DELEGATE_PR=1 to open one)"
         fi
         echo ""
+        if [ -n "$live_app_url" ]; then
+            # Compute the dashboard scheme the same way start_dashboard does
+            # (url_scheme is local to that function, not visible here).
+            local _dash_scheme="http"
+            [ -n "${LOKI_TLS_CERT:-}" ] && [ -n "${LOKI_TLS_KEY:-}" ] && _dash_scheme="https"
+            echo "Your app is live at: $live_app_url  (served locally on this machine)"
+            echo "  Dashboard: ${_dash_scheme}://127.0.0.1:${DASHBOARD_PORT:-57374}/  (App Runner -> Live App)"
+            echo ""
+        fi
         echo "Tasks: pending=$pending in_progress=$in_progress completed=$completed failed=$failed"
         echo ""
         echo "Review the work:"
@@ -8216,9 +8239,22 @@ start_dashboard() {
         log_info "Dashboard started (PID: $DASHBOARD_PID)"
         log_info "Dashboard: ${CYAN}${url_scheme}://127.0.0.1:$DASHBOARD_PORT/${NC}"
-        # Open in browser (macOS)
-        if [[ "$OSTYPE" == "darwin"* ]]; then
-            open "${url_scheme}://127.0.0.1:$DASHBOARD_PORT/" 2>/dev/null || true
+        # Auto-open the dashboard in the browser, but ONLY for an interactive
+        # foreground session. Gated on: a TTY on stdout ([ -t 1 ]), not
+        # background/detached mode, and not explicitly opted out via
+        # LOKI_NO_AUTO_OPEN=1. This keeps CI, --detach, SSH-no-TTY, and piped
+        # runs from spawning a browser. Cross-platform: open / xdg-open / start.
+        if [ -t 1 ] && [ "${BACKGROUND_MODE:-false}" != "true" ] && [ "${LOKI_NO_AUTO_OPEN:-0}" != "1" ]; then
+            local _dash_url="${url_scheme}://127.0.0.1:$DASHBOARD_PORT/"
+            if command -v open >/dev/null 2>&1; then
+                open "$_dash_url" 2>/dev/null || true
+            elif command -v xdg-open >/dev/null 2>&1; then
+                xdg-open "$_dash_url" 2>/dev/null || true
+            elif command -v cmd.exe >/dev/null 2>&1; then
+                # Windows (Git Bash/WSL): `start` is a cmd builtin, not on PATH,
+                # so invoke it via cmd.exe. The empty "" is start's title arg.
+                cmd.exe /c start "" "$_dash_url" 2>/dev/null || true
+            fi
         fi
         return 0
     else
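
Reviewer note on the hunk above: the auto-open gate in `start_dashboard` combines three conditions and then tries three openers in order. The sketch below models that logic so the skip cases (CI, piped output, opt-out) are easy to see. The function names are hypothetical, and the `which` parameter exists only so the lookup can be faked; the shell code uses `[ -t 1 ]` and `command -v` directly.

```python
import shutil


def should_auto_open(stdout_is_tty, background_mode="false", no_auto_open="0"):
    """True only for an interactive foreground run that has not opted out."""
    return bool(stdout_is_tty) and background_mode != "true" and no_auto_open != "1"


def pick_opener(which=shutil.which):
    """Return the first available opener, in the order the diff tries them."""
    for candidate in ("open", "xdg-open", "cmd.exe"):
        if which(candidate):
            return candidate
    return None
```

For example, `should_auto_open(False)` models a piped or CI run and returns `False`, while `should_auto_open(True, no_auto_open="1")` models `LOKI_NO_AUTO_OPEN=1` and also returns `False`.
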
@@ -10429,6 +10465,14 @@ build_prompt() {
     # same precedent as AUTONOMY_OVERRIDE_TEXT in providers/claude_flags.ts).
     local agents_md_instruction="Project conventions: read AGENTS.md in the repository root for build, test, and style conventions. If AGENTS.md is absent, read CLAUDE.md instead. The nearest such file to the code you are editing takes precedence."
+    # Compose-first instruction (v7.26.0): unconditional string with conditional
+    # phrasing (YOU decide whether the app warrants compose, not a static grep).
+    # When an app needs more than one running service (web + database and/or
+    # cache), a docker-compose.yml is the run contract Loki's app runner uses to
+    # bring the whole stack up locally. This string MUST stay byte-identical to
+    # COMPOSE_INSTRUCTION in loki-ts/src/runner/build_prompt.ts (parity-locked).
+    local compose_instruction="RUN_CONTRACT: Decide whether this app needs more than one running service. If it needs a database (PostgreSQL/MySQL/MongoDB) and/or a cache (Redis) and/or separate backend and frontend services, generate a 12-factor docker-compose.yml at the project root so the whole stack runs with one command. Requirements when you generate it: (1) name the primary web/app service so it is obvious (service key 'web' or 'app', or add the label 'loki.primary=true' on it) and publish its HTTP port (host:container, e.g. '3000:3000'); (2) give every service a healthcheck (the web service must have an HTTP healthcheck so 'up' means actually serving, not just started); (3) wire dependencies with depends_on and config via environment variables; (4) write a .env.example listing every required variable with safe placeholder values; (5) keep secrets out of the compose file and out of git. If the app is a single service with no datastore, do NOT add compose; a plain run command is correct. If a working docker-compose.yml already exists and matches the app, leave it; otherwise create or update it. Verify the stack comes up (docker compose up) before claiming completion."
     # Load existing context if resuming
     local context_injection=""
     if [ $retry -gt 0 ]; then
@@ -10758,15 +10802,15 @@ except Exception:
         else
             if [ $retry -eq 0 ]; then
                 if [ -n "$prd" ]; then
-                    echo "Loki Mode with PRD at $prd. $update_instruction $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                    echo "Loki Mode with PRD at $prd. $update_instruction $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
                 else
-                    echo "Loki Mode. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $analysis_instruction $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                    echo "Loki Mode. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $analysis_instruction $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
                 fi
             else
                 if [ -n "$prd" ]; then
-                    echo "Loki Mode - Resume iteration #$iteration (retry #$retry). PRD: $prd. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                    echo "Loki Mode - Resume iteration #$iteration (retry #$retry). PRD: $prd. $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
                 else
-                    echo "Loki Mode - Resume iteration #$iteration (retry #$retry). $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section Use .loki/generated-prd.md if exists. $rarv_instruction $memory_instruction $usage_doc_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
+                    echo "Loki Mode - Resume iteration #$iteration (retry #$retry). $human_directive $gate_failure_context $queue_tasks $bmad_context $openspec_context $mirofish_context $magic_context $checklist_status $app_runner_info $playwright_info $memory_context_section Use .loki/generated-prd.md if exists. $rarv_instruction $memory_instruction $usage_doc_instruction $compose_instruction $lsp_grounding_instruction $agents_md_instruction $completion_instruction $sdlc_instruction $autonomous_suffix"
                 fi
             fi
         fi
@@ -10808,6 +10852,7 @@ except Exception:
             printf 'You are a coding assistant. Analyze this codebase and suggest improvements. Write working code and commit changes.\n'
         fi
         printf '%s\n' "$usage_doc_instruction"
+        printf '%s\n' "$compose_instruction"
         printf '%s\n' "$lsp_grounding_instruction"
         printf '%s\n' "$agents_md_instruction"
         printf '</loki_system>\n'
@@ -10841,6 +10886,7 @@ except Exception:
     printf '%s\n' "$autonomous_suffix"
     printf '%s\n' "$memory_instruction"
     printf '%s\n' "$usage_doc_instruction"
+    printf '%s\n' "$compose_instruction"
     printf '%s\n' "$lsp_grounding_instruction"
     printf '%s\n' "$agents_md_instruction"
     # For codebase-analysis mode (no PRD), analysis_instruction is part of the
@@ -12141,6 +12187,59 @@ except Exception as exc:
            && loki_claude_flag_supported "--include-partial-messages"; then
             _loki_claude_argv+=("--include-partial-messages")
         fi
+        # ---- Bash<->Bun invocation-flag convergence ledger (v7.25.0) ----------
+        # The fixture corpus covers build_prompt/stats output, NOT this claude
+        # argv, so drift here is invisible to parity tests. Keep this ledger
+        # current. Live route today is BASH (bin/loki routes `start` -> bash).
+        # The claude provider in loki-ts/src/runner/providers.ts is implemented
+        # but is NOT reached for `start` (start is not ported to the Bun router;
+        # the shim falls through to bash), so its flag set has zero live impact
+        # today.
+        # Bash argv (canonical, live): --dangerously-skip-permissions --model M
+        #   [--append-system-prompt] [--setting-sources] [--include-partial-messages]
+        #   [--effort] [--max-budget-usd] [--fallback-model] -p PROMPT
+        #   --output-format stream-json --verbose
+        # Bun buildAutoFlags also emits: --exclude-dynamic-system-prompt-sections
+        #   (cost-only), --mcp-config (bash gets MCP via --setting-sources +
+        #   .mcp.json discovery; a how-difference, likely behavior-equivalent),
+        #   --include-hook-events (bash handles hook events in its embedded
+        #   stream parser; likely moot). These three are Bun-only and MUST be
+        #   reconciled to a deliberately chosen canonical set BEFORE `start`
+        #   flips to the Bun runner. They have zero live impact today.
+        # v7.25.0: long-run resilience + cost flags, appended individually here
+        # (NOT via _loki_build_claude_auto_flags, which would double the three
+        # flags above). Each is gated on CLI support + an opt-out env var, same
+        # pattern as above. These improve unattended/long-run execution:
+        #   --effort           adaptive reasoning depth per RARV tier
+        #   --max-budget-usd   per-call hard backstop (complements the
+        #                      cumulative check_budget_limit PAUSE gate)
+        #   --fallback-model   resilience to model overload/unavailability
+        # The trust/verification gates stay deterministic; these only tune how
+        # the provider is invoked, never whether work is judged complete.
+        if [ "${LOKI_AUTO_EFFORT:-on}" != "off" ] \
+           && type loki_effort_for_tier >/dev/null 2>&1 \
+           && type loki_claude_flag_supported >/dev/null 2>&1 \
+           && loki_claude_flag_supported "--effort"; then
+            local _loki_effort
+            _loki_effort="$(loki_effort_for_tier "$CURRENT_TIER" "${DETECTED_COMPLEXITY:-${LOKI_COMPLEXITY:-standard}}")"
+            [ -n "$_loki_effort" ] && _loki_claude_argv+=("--effort" "$_loki_effort")
+        fi
+        if [ "${LOKI_AUTO_BUDGET:-on}" != "off" ] \
+           && type loki_remaining_budget >/dev/null 2>&1 \
+           && type loki_claude_flag_supported >/dev/null 2>&1 \
+           && loki_claude_flag_supported "--max-budget-usd"; then
+            local _loki_rem_budget
+            _loki_rem_budget="$(loki_remaining_budget)"
+            [ -n "$_loki_rem_budget" ] && _loki_claude_argv+=("--max-budget-usd" "$_loki_rem_budget")
+        fi
+        if [ "${LOKI_AUTO_FALLBACK:-on}" != "off" ] \
+           && type loki_fallback_for_primary >/dev/null 2>&1 \
+           && type loki_claude_flag_supported >/dev/null 2>&1 \
+           && loki_claude_flag_supported "--fallback-model"; then
+            local _loki_fallback
+            _loki_fallback="$(loki_fallback_for_primary "$tier_param")"
+            [ -n "$_loki_fallback" ] && _loki_claude_argv+=("--fallback-model" "$_loki_fallback")
+        fi
         case "${PROVIDER_NAME:-claude}" in
             claude)
                 # Claude: Full features with stream-json output and agent tracking

package/dashboard/__init__.py CHANGED

@@ -7,7 +7,7 @@ Modules:
     control: Session control API (start/stop/pause/resume)
 """
-__version__ = "7.24.0"
+__version__ = "7.26.0"
 # Expose the control app for easy import
 try:

package/docs/INSTALLATION.md CHANGED

@@ -2,7 +2,7 @@
 The flagship product of [Autonomi](https://www.autonomi.dev/). Loki Mode is a spec-driven autonomous builder with a built-in trust layer that takes any spec to a deployed product and verifies completion with evidence (quality gates plus a completion council), not just a "done" claim. Complete installation instructions for all platforms and use cases.
-**Version:** v7.24.0
+**Version:** v7.26.0
 ---

package/loki-ts/dist/loki.js CHANGED

@@ -1,5 +1,5 @@
 // @bun
-var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.24.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new 
a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
+var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.26.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new 
a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
 `),process.stdout.write(`Install with:
 `),process.stdout.write(`  brew install jq    (macOS)
 `),process.stdout.write(`  apt install jq     (Debian/Ubuntu)
@@ -787,4 +787,4 @@ Set LOKI_LEGACY_BASH=1 to force the bash CLI for every command.
 `),2}default:return process.stderr.write(`Unknown command: ${Q}
 `),process.stderr.write(v8),2}}g$();process.on("SIGINT",()=>process.exit(130));process.on("SIGTERM",()=>process.exit(143));var l3=await p3(Bun.argv.slice(2));process.exit(l3);
-//# debugId=5BE33C3C3E53FD8864756E2164756E21
+//# debugId=7BD97DA7996A924D64756E2164756E21

package/mcp/__init__.py CHANGED

@@ -57,4 +57,4 @@ try:
 except ImportError:
     __all__ = ['mcp']
-__version__ = '7.24.0'
+__version__ = '7.26.0'

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "loki-mode",
-  "version": "7.24.0",
+  "version": "7.26.0",
   "description": "Loki Mode by Autonomi. Autonomous spec-to-product system: takes a PRD, GitHub issue, OpenAPI/JSON/YAML, or one-line brief to a deployed app via the RARV-C closure loop with 11 quality gates. Provider-agnostic (Claude Code, OpenAI Codex, Cline, Aider).",
   "keywords": [
     "agent",

package/docs/COMPETITIVE-ANALYSIS-INSTANT-PREVIEW-2026-06.md DELETED

@@ -1,114 +0,0 @@
-# Loki Mode vs Replit Agent / Lovable / Bolt.new -- Instant-Preview & Self-Healing Gap Analysis
-Date: 2026-06-09
-Author: autonomous analysis (verified against Loki source + competitor web research at knowledge cutoff Jan 2026 plus June 2026 web search)
-Status: analysis + prioritized TODO. The TODO is the planning checkpoint, not an auto-launched implementation.
-## 1. The user's framing
-> "When users prompt a spec or idea, [Replit/Lovable/Bolt] just builds and spins up UI so users can try it. They are clever at suggesting what to add/update/improve/remove, finding bugs while the app runs and fixing them autonomously before the user realizes. Why are we not able to do that? We are lacking something that is causing friction for users (devs, enterprises, non-technical consumers)."
-This is correct as a *felt experience* gap. It is NOT correct that Loki lacks the underlying capability. The gap is mostly **surfacing and loop-tightness**, plus a real **category difference**.
-## 2. What the competitors actually do (verified June 2026)
-### Replit Agent 3
-- Runs autonomously up to ~200 minutes ("10x more autonomous than Agent 2").
-- "Self-healing" loop: spins up a real browser, simulates user behavior (click/type/login), captures logs, and fixes bugs it hits during testing. Calls out "Potemkin interfaces" (looks-done-but-broken).
-- REPL-based verification at scale; provisions database; verifies every button/API call.
-- Code-to-Device: builds + previews native mobile via Expo QR instantly.
-- "Stacks": agents building agents. RulesSync for replit.md across projects.
-- Hosted: your project runs on Replit's cloud, instant live preview built into the IDE.
-### Lovable
-- Three modes: Visual Edits (click an element), Plan Mode (conversational), Agent Mode (autonomous codebase exploration + proactive debugging + real-time web search).
-- Live preview updates in real time; checkpoint/version system to revert a bad edit.
-- Pricing: Free (5 daily credits), Pro $25/mo, Business $50/mo; plus usage-based Cloud/AI billing on shipped apps.
-- Hosted; greenfield-biased (build a new app from a prompt).
-### Bolt.new (StackBlitz)
-- WebContainer: Node runs natively *in the browser*, no server. Live preview updates as code generates; you click/fill/interact immediately.
-- Agent has full control of filesystem, node server, package manager, terminal, browser console; human-in-the-loop chat + hand-edit.
-- Bolt V2: Bolt Cloud (DB, auth, storage, edge functions, analytics, hosting), one-click Netlify deploy. Opus 4.6 with adjustable reasoning depth (Jan 2026).
-- Hosted/in-browser; greenfield-biased.
-Common thread: **hosted text-to-app**. Instant live preview is free because the runtime IS their cloud/browser sandbox. The inner loop (build -> see -> fix) is conversational, visible, real-time.
-## 3. What Loki actually has (verified in source)
-| Capability | Competitors | Loki today | Source |
-|---|---|---|---|
-| Starts the built app on a detected port | yes | YES (`app_runner_start`, `_detect_port`) | `autonomy/app-runner.sh:498,150` |
-| Health check + crash watchdog + auto-restart | yes | YES (`app_runner_watchdog`, `app_runner_should_restart`, `app_runner_health_check`) | `autonomy/app-runner.sh:769,735,678` |
-| Browser smoke test of the running app | yes (their core loop) | YES, batch (`playwright-verify.sh`) | `autonomy/playwright-verify.sh` |
-| Crash/playwright signal fed back into next iteration to self-correct | yes, tight realtime | YES, batch (`app_runner_info`/`playwright_info` injected into `build_prompt`) | `autonomy/run.sh:10544,10561,10761` |
-| Verified completion gate (no fabricated "done") | partial | YES (`council_evidence_gate`) | `completion-council.sh` |
-| Failure-memory: past failures injected to avoid repeats | partial | YES (`retrieve_anti_patterns`) | `memory/retrieval.py` |
-| **Clickable live preview URL / embedded app the user can try** | YES, central | **NO -- app runs but URL is not surfaced in dashboard** | gap: `dashboard/server.py` has no preview route |
-| **Real-time visible inner loop (watch it build/fix as it happens)** | YES | Partial -- dashboard shows iterations/logs, but app preview not embedded; loop is a longer autonomous batch | gap |
-| **Proactive "suggest add/improve/remove" surfaced to user** | YES | Partial -- analysis pass exists internally; not surfaced as user-facing suggestions | gap |
-| Works on existing/brownfield repos | weak (greenfield-biased) | STRONG | `loki heal`, codebase analysis |
-| No hosted-runtime lock-in; your machine, your code | NO (vendor cloud) | YES | local-first CLI |
-| Multi-provider (Claude/Codex/Cline/Aider) | NO | YES | `providers/*.sh` |
-## 4. Where Loki is BETTER (keep + lead with these)
-1. **Local-first, no vendor lock-in.** Your code never leaves your machine; no hosted runtime you can be evicted from or surprised-billed on (Lovable's dual-layer billing is a known pain). Enterprises care about this.
-2. **Brownfield/existing repos.** Replit/Lovable/Bolt are greenfield-biased (prompt -> new app). Loki runs on real existing codebases, including `loki heal` for legacy systems.
-3. **Verified completion + failure-memory.** Evidence gate blocks fabricated "done"; anti-pattern memory prevents repeating mistakes across runs. The hosted tools mostly re-discover bugs each session.
-4. **Multi-provider + your own keys/budget.** Not locked to one model vendor or a credit economy.
-5. **Depth of autonomous SDLC** (RARV, council review, quality gates) vs a single conversational agent.
-## 5. Where Loki is WORSE (the real friction)
-1. **No instant "try it" moment.** The app DOES start (app-runner), but the dashboard never hands the user a clickable URL or embedded preview. This is the single biggest felt gap and it is a *surfacing* fix, not a build.
-2. **Setup friction.** Competitors are zero-install (open a browser). Loki needs the `claude` CLI, a terminal, `loki start`. Non-technical consumers stall here.
-3. **The inner loop is a long batch, not a watched conversation.** Users can't see "it's testing the login button now and fixing it." Same capability, far less visible.
-4. **Suggestions aren't surfaced.** Loki's analysis pass reasons about what to add/fix internally, but doesn't present a user-facing "here's what I'd improve next" list.
-5. **First-run time-to-wow.** No 30-second "look, it works" the way a hosted preview gives.
-## 6. Honest category line (for positioning, not inferiority)
-Hosted text-to-app SaaS (Replit/Lovable/Bolt): instant live preview, tight visible loop, friendly to non-technical users -- but your code on their cloud, vendor + credit lock-in, greenfield-biased.
-Loki Mode: local-first CLI driving Claude on your own machine -- brownfield-capable, no hosted-runtime lock-in, multi-provider, verified completion + memory -- but no instant hosted preview and higher setup friction.
-The wins below close the *experience* gap without giving up the local-first advantages.
-## 7. Prioritized TODO (by blast radius / friction reduction)
-### P0 -- Live Preview surfacing (the headline win; cheapest path to "try it")
-The app already starts with crash watchdog. Surface it.
-- **Dashboard:** add a "Live App" panel that reads `.loki/app-runner/state.json` (status, port, url, crash_count), shows a clickable `http://localhost:<port>` link + an embedded iframe + health/crash badge + "Restart app" button (wire to existing `app_runner_restart`).
-- **CLI:** `loki preview` (alias `loki open`) -- prints the running app URL and opens the browser; honest message if no app is running yet.
-- **API:** `GET /api/app-runner` (state passthrough), `POST /api/app-runner/restart`.
-- Pure surfacing of existing state; no new runtime behavior. Lowest risk, highest felt impact.
-### P1 -- Tighter, visible self-healing loop
-- Stream app-runner crash events + playwright pass/fail to the dashboard timeline in near-real-time (event bus already exists, `events/bus.py`).
-- Dashboard "what just happened" feed: "app crashed -> reading log -> fixing -> restarted -> smoke test passed."
-- Honest framing: this exposes the EXISTING batch loop more visibly; it does not claim Replit's per-click realtime browser sim.
-### P2 -- Proactive suggestions surfaced to the user
-- Add a structured "Suggestions" output from the analysis pass (add/improve/remove/risk), persisted to `.loki/suggestions.json`.
-- Dashboard "Suggestions" panel + `loki suggest` CLI to print them.
-- These are advisory; the user opts in to queue any as tasks.
-### P3 -- First-run time-to-wow / setup friction
-- `loki try <one-line-idea>`: scaffold a tiny app, build it, auto-start app-runner, open preview -- a guided 60-second "it works" path (honest: real build, not simulated).
-- Doctor-style preflight that detects missing `claude` CLI and guides install.
-### P4 -- Non-technical on-ramp (longer term, optional)
-- Evaluate an optional hosted/containerized preview for users who can't run locally (collides with zero-egress posture; opt-in only, deferred).
-## 8. What this session will actually implement
-Per user direction ("update dashboard and backend cli or api accordingly", "plan it perfectly", "complete autonomously"): implement **P0 (Live Preview surfacing)** end-to-end (dashboard + CLI + API), both runtime routes where applicable, council-reviewed, local-ci 42/42, channels validated. P1-P4 are scoped follow-ups.
-## 9. SWE-bench note (unrelated but pending, must be stated)
-Primary-source data in `benchmarks/results/` shows ONLY patch generation (299/300 generated, `fixed_by_rarv:0`, status PATCHES_GENERATED, the official evaluator was never run, no resolve/pass-rate figure exists; some patches are prose not diffs). There is no "release 660." Publishing a SWE-bench resolve score would be fabrication. The only real measured number is HumanEval **98.78%** (162/164). Recommendation: lead with HumanEval; keep SWE-bench as "harness exists, resolve-rate not yet measured" + the repro command; offer to run the official evaluator as an opt-in upgrade.
-## Sources (competitor research, June 2026)
-- Replit Agent 3: https://blog.replit.com/introducing-agent-3-our-most-autonomous-agent-yet , https://blog.replit.com/automated-self-testing , https://docs.replit.com/core-concepts/agent
-- Lovable: https://lovable.dev/ , https://lovable.dev/pricing , https://www.nocode.mba/articles/lovable-ai-app-builder
-- Bolt.new: https://github.com/stackblitz/bolt.new , https://capacity.so/blog/what-is-bolt-new , https://www.banani.co/blog/bolt-new-ai-review-and-alternatives
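
Note on the P0 item in the deleted analysis: it describes a "Live App" panel and a `loki preview` command that read `.loki/app-runner/state.json` (status, port, url, crash_count) and surface a clickable `http://localhost:<port>` link. A minimal Python sketch of that surfacing step might look like the following. It is illustrative only and is not taken from the package: the function names `load_state` and `preview_url` are hypothetical, the `"running"` status value is an assumption, and the real state file may carry other fields or values.

```python
import json
from pathlib import Path
from typing import Optional


def load_state(loki_dir: Path) -> Optional[dict]:
    """Return the parsed app-runner state, or None if it is missing or unreadable."""
    state_file = loki_dir / "app-runner" / "state.json"
    try:
        data = json.loads(state_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


def preview_url(state: Optional[dict]) -> Optional[str]:
    """Pick the URL to show the user, or None when no app appears to be running."""
    # Assumption: a status of "running" marks a live app; the real values may differ.
    if not state or state.get("status") != "running":
        return None
    # Prefer an explicit url field when the state file provides one.
    if state.get("url"):
        return str(state["url"])
    port = state.get("port")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(port, int) and not isinstance(port, bool) and 0 < port < 65536:
        return f"http://localhost:{port}"
    return None


if __name__ == "__main__":
    url = preview_url(load_state(Path.cwd() / ".loki"))
    print(url or "No app is running yet.")
```

Returning `None` rather than guessing a port keeps the "honest message if no app is running yet" behaviour the TODO asks for: the caller decides what to print, and a missing or malformed state file is treated the same as a stopped app.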