npm - loki-mode - Versions diffs - 7.24.0 → 7.25.0 - Mend

loki-mode 7.24.0 → 7.25.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10) hide show

package/README.md +2 -1
package/SKILL.md +2 -2
package/VERSION +1 -1
package/autonomy/run.sh +92 -3
package/dashboard/__init__.py +1 -1
package/docs/INSTALLATION.md +1 -1
package/loki-ts/dist/loki.js +2 -2
package/mcp/__init__.py +1 -1
package/package.json +1 -1
package/docs/COMPETITIVE-ANALYSIS-INSTANT-PREVIEW-2026-06.md +0 -114

package/README.md CHANGED Viewed

@@ -27,6 +27,7 @@
 - **Spec-driven, autonomous, with a built-in trust layer** -- Hand Loki a spec, walk away, come back to working code with tests. The full RARV-C closure loop (Reason - Act - Reflect - Verify - Close) runs until the work is actually done, not just attempted. The verified-completion evidence gate (`skills/quality-gates.md`) refuses any "done" claim on an empty git diff against the run-start commit, and blocks completion when tests run red, so "complete" means proven, not promised.
 - **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
 - **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
+- **Intelligent `loki start`** -- For interactive foreground runs the dashboard auto-opens in the browser (cross-platform; skipped in CI, SSH-without-TTY, and piped runs; opt out with `LOKI_NO_AUTO_OPEN=1`). The completion summary shows "Your app is live at <url>" so you know exactly where to try what Loki just built. The autonomous loop passes Claude Code's `--effort`, `--max-budget-usd`, and `--fallback-model` on every iteration (each gated on CLI support and individual opt-out env vars) for better long-run unattended execution (v7.25.0).
 - **Cross-project memory** -- Episodic/semantic/procedural memory with vector search; knowledge learned on one project surfaces on the next (v5.15.0+, see `memory/engine.py`)
 - **Self-hosted and private** -- Your keys, your infrastructure, no data leaves your network
 - **Legacy system healing** -- `loki heal` archaeology/stabilize/isolate/modernize/validate phases (v6.67.0, see `skills/healing.md`)
@@ -347,7 +348,7 @@ Claude gets full features (subagents, parallelization, MCP, Task tool). Other ac
 | Command | Description |
 |---------|-------------|
-| `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`) |
+| `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`). Auto-opens the dashboard in the browser for interactive runs and passes native `--effort`/`--max-budget-usd`/`--fallback-model` for resilience (v7.25.0) |
 | `loki stop` | Stop execution |
 | `loki heal <path>` | Legacy system healing (archaeology, stabilize, isolate, modernize, validate -- v6.67.0) |
 | `loki pause` / `resume` | Pause/resume after current session |

package/SKILL.md CHANGED Viewed

@@ -3,7 +3,7 @@ name: loki-mode
 description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
 ---
-# Loki Mode v7.24.0
+# Loki Mode v7.25.0
 **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
@@ -383,4 +383,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r
 ---
-**v7.24.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+**v7.25.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**

package/VERSION CHANGED Viewed

	@@ -1 +1 @@
1	- 7.24.0
1	+ 7.25.0

package/autonomy/run.sh CHANGED Viewed

@@ -2423,6 +2423,20 @@ build_completion_summary() {
         *)              outcome_label="$outcome";          notify_title="Run finished" ;;
     esac
+    # Live app URL (best-effort): if the app runner has a running app, surface
+    # where the user can try it. Reads .loki/app-runner/state.json written by
+    # app-runner.sh. Empty when no app is running.
+    local live_app_url=""
+    local _app_state_file="$loki_dir/app-runner/state.json"
+    if [ -f "$_app_state_file" ]; then
+        live_app_url="$(python3 -c "import json,sys
+try:
+    d=json.load(open(sys.argv[1]))
+    print(d.get('url','') if d.get('status')=='running' else '')
+except Exception:
+    print('')" "$_app_state_file" 2>/dev/null)"
+    fi
     # Branch + diff stats vs the run-start SHA (best-effort; non-git or empty
     # baseline yields empty values, which we render as "unknown"/"0").
     local start_sha="${_LOKI_RUN_START_SHA:-}"
@@ -2483,6 +2497,15 @@ build_completion_summary() {
             echo "Pull request: not opened (set LOKI_DELEGATE_PR=1 to open one)"
         fi
         echo ""
+        if [ -n "$live_app_url" ]; then
+            # Compute the dashboard scheme the same way start_dashboard does
+            # (url_scheme is local to that function, not visible here).
+            local _dash_scheme="http"
+            [ -n "${LOKI_TLS_CERT:-}" ] && [ -n "${LOKI_TLS_KEY:-}" ] && _dash_scheme="https"
+            echo "Your app is live at: $live_app_url  (served locally on this machine)"
+            echo "  Dashboard: ${_dash_scheme}://127.0.0.1:${DASHBOARD_PORT:-57374}/  (App Runner -> Live App)"
+            echo ""
+        fi
         echo "Tasks: pending=$pending in_progress=$in_progress completed=$completed failed=$failed"
         echo ""
         echo "Review the work:"
@@ -8216,9 +8239,22 @@ start_dashboard() {
         log_info "Dashboard started (PID: $DASHBOARD_PID)"
         log_info "Dashboard: ${CYAN}${url_scheme}://127.0.0.1:$DASHBOARD_PORT/${NC}"
-        # Open in browser (macOS)
-        if [[ "$OSTYPE" == "darwin"* ]]; then
-            open "${url_scheme}://127.0.0.1:$DASHBOARD_PORT/" 2>/dev/null || true
+        # Auto-open the dashboard in the browser, but ONLY for an interactive
+        # foreground session. Gated on: a TTY on stdout ([ -t 1 ]), not
+        # background/detached mode, and not explicitly opted out via
+        # LOKI_NO_AUTO_OPEN=1. This keeps CI, --detach, SSH-no-TTY, and piped
+        # runs from spawning a browser. Cross-platform: open / xdg-open / start.
+        if [ -t 1 ] && [ "${BACKGROUND_MODE:-false}" != "true" ] && [ "${LOKI_NO_AUTO_OPEN:-0}" != "1" ]; then
+            local _dash_url="${url_scheme}://127.0.0.1:$DASHBOARD_PORT/"
+            if command -v open >/dev/null 2>&1; then
+                open "$_dash_url" 2>/dev/null || true
+            elif command -v xdg-open >/dev/null 2>&1; then
+                xdg-open "$_dash_url" 2>/dev/null || true
+            elif command -v cmd.exe >/dev/null 2>&1; then
+                # Windows (Git Bash/WSL): `start` is a cmd builtin, not on PATH,
+                # so invoke it via cmd.exe. The empty "" is start's title arg.
+                cmd.exe /c start "" "$_dash_url" 2>/dev/null || true
+            fi
         fi
         return 0
     else
@@ -12141,6 +12177,59 @@ except Exception as exc:
            && loki_claude_flag_supported "--include-partial-messages"; then
             _loki_claude_argv+=("--include-partial-messages")
         fi
+        # ---- Bash<->Bun invocation-flag convergence ledger (v7.25.0) ----------
+        # The fixture corpus covers build_prompt/stats output, NOT this claude
+        # argv, so drift here is invisible to parity tests. Keep this ledger
+        # current. Live route today is BASH (bin/loki routes `start` -> bash).
+        # The claude provider in loki-ts/src/runner/providers.ts is implemented
+        # but is NOT reached for `start` (start is not ported to the Bun router;
+        # the shim falls through to bash), so its flag set has zero live impact
+        # today.
+        # Bash argv (canonical, live): --dangerously-skip-permissions --model M
+        #   [--append-system-prompt] [--setting-sources] [--include-partial-messages]
+        #   [--effort] [--max-budget-usd] [--fallback-model] -p PROMPT
+        #   --output-format stream-json --verbose
+        # Bun buildAutoFlags also emits: --exclude-dynamic-system-prompt-sections
+        #   (cost-only), --mcp-config (bash gets MCP via --setting-sources +
+        #   .mcp.json discovery; a how-difference, likely behavior-equivalent),
+        #   --include-hook-events (bash handles hook events in its embedded
+        #   stream parser; likely moot). These three are Bun-only and MUST be
+        #   reconciled to a deliberately chosen canonical set BEFORE `start`
+        #   flips to the Bun runner. They have zero live impact today.
+        # v7.25.0: long-run resilience + cost flags, appended individually here
+        # (NOT via _loki_build_claude_auto_flags, which would double the three
+        # flags above). Each is gated on CLI support + an opt-out env var, same
+        # pattern as above. These improve unattended/long-run execution:
+        #   --effort           adaptive reasoning depth per RARV tier
+        #   --max-budget-usd   per-call hard backstop (complements the
+        #                      cumulative check_budget_limit PAUSE gate)
+        #   --fallback-model   resilience to model overload/unavailability
+        # The trust/verification gates stay deterministic; these only tune how
+        # the provider is invoked, never whether work is judged complete.
+        if [ "${LOKI_AUTO_EFFORT:-on}" != "off" ] \
+           && type loki_effort_for_tier >/dev/null 2>&1 \
+           && type loki_claude_flag_supported >/dev/null 2>&1 \
+           && loki_claude_flag_supported "--effort"; then
+            local _loki_effort
+            _loki_effort="$(loki_effort_for_tier "$CURRENT_TIER" "${DETECTED_COMPLEXITY:-${LOKI_COMPLEXITY:-standard}}")"
+            [ -n "$_loki_effort" ] && _loki_claude_argv+=("--effort" "$_loki_effort")
+        fi
+        if [ "${LOKI_AUTO_BUDGET:-on}" != "off" ] \
+           && type loki_remaining_budget >/dev/null 2>&1 \
+           && type loki_claude_flag_supported >/dev/null 2>&1 \
+           && loki_claude_flag_supported "--max-budget-usd"; then
+            local _loki_rem_budget
+            _loki_rem_budget="$(loki_remaining_budget)"
+            [ -n "$_loki_rem_budget" ] && _loki_claude_argv+=("--max-budget-usd" "$_loki_rem_budget")
+        fi
+        if [ "${LOKI_AUTO_FALLBACK:-on}" != "off" ] \
+           && type loki_fallback_for_primary >/dev/null 2>&1 \
+           && type loki_claude_flag_supported >/dev/null 2>&1 \
+           && loki_claude_flag_supported "--fallback-model"; then
+            local _loki_fallback
+            _loki_fallback="$(loki_fallback_for_primary "$tier_param")"
+            [ -n "$_loki_fallback" ] && _loki_claude_argv+=("--fallback-model" "$_loki_fallback")
+        fi
         case "${PROVIDER_NAME:-claude}" in
             claude)
                 # Claude: Full features with stream-json output and agent tracking

package/dashboard/__init__.py CHANGED Viewed

@@ -7,7 +7,7 @@ Modules:
     control: Session control API (start/stop/pause/resume)
 """
-__version__ = "7.24.0"
+__version__ = "7.25.0"
 # Expose the control app for easy import
 try:

package/docs/INSTALLATION.md CHANGED Viewed

@@ -2,7 +2,7 @@
 The flagship product of [Autonomi](https://www.autonomi.dev/). Loki Mode is a spec-driven autonomous builder with a built-in trust layer that takes any spec to a deployed product and verifies completion with evidence (quality gates plus a completion council), not just a "done" claim. Complete installation instructions for all platforms and use cases.
-**Version:** v7.24.0
+**Version:** v7.25.0
 ---

package/loki-ts/dist/loki.js CHANGED Viewed

@@ -1,5 +1,5 @@
 // @bun
-var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.24.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
+var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.25.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
 `),process.stdout.write(`Install with:
 `),process.stdout.write(`  brew install jq    (macOS)
 `),process.stdout.write(`  apt install jq     (Debian/Ubuntu)
@@ -787,4 +787,4 @@ Set LOKI_LEGACY_BASH=1 to force the bash CLI for every command.
 `),2}default:return process.stderr.write(`Unknown command: ${Q}
 `),process.stderr.write(v8),2}}g$();process.on("SIGINT",()=>process.exit(130));process.on("SIGTERM",()=>process.exit(143));var l3=await p3(Bun.argv.slice(2));process.exit(l3);
-//# debugId=5BE33C3C3E53FD8864756E2164756E21
+//# debugId=C1988314C1A4579264756E2164756E21

package/mcp/__init__.py CHANGED Viewed

@@ -57,4 +57,4 @@ try:
 except ImportError:
     __all__ = ['mcp']
-__version__ = '7.24.0'
+__version__ = '7.25.0'

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "loki-mode",
-  "version": "7.24.0",
+  "version": "7.25.0",
   "description": "Loki Mode by Autonomi. Autonomous spec-to-product system: takes a PRD, GitHub issue, OpenAPI/JSON/YAML, or one-line brief to a deployed app via the RARV-C closure loop with 11 quality gates. Provider-agnostic (Claude Code, OpenAI Codex, Cline, Aider).",
   "keywords": [
     "agent",

package/docs/COMPETITIVE-ANALYSIS-INSTANT-PREVIEW-2026-06.md DELETED Viewed

@@ -1,114 +0,0 @@
-# Loki Mode vs Replit Agent / Lovable / Bolt.new -- Instant-Preview & Self-Healing Gap Analysis
-Date: 2026-06-09
-Author: autonomous analysis (verified against Loki source + competitor web research at knowledge cutoff Jan 2026 plus June 2026 web search)
-Status: analysis + prioritized TODO. The TODO is the planning checkpoint, not an auto-launched implementation.
-## 1. The user's framing
-> "When users prompt a spec or idea, [Replit/Lovable/Bolt] just builds and spins up UI so users can try it. They are clever at suggesting what to add/update/improve/remove, finding bugs while the app runs and fixing them autonomously before the user realizes. Why are we not able to do that? We are lacking something that is causing friction for users (devs, enterprises, non-technical consumers)."
-This is correct as a *felt experience* gap. It is NOT correct that Loki lacks the underlying capability. The gap is mostly **surfacing and loop-tightness**, plus a real **category difference**.
-## 2. What the competitors actually do (verified June 2026)
-### Replit Agent 3
-- Runs autonomously up to ~200 minutes ("10x more autonomous than Agent 2").
-- "Self-healing" loop: spins up a real browser, simulates user behavior (click/type/login), captures logs, and fixes bugs it hits during testing. Calls out "Potemkin interfaces" (looks-done-but-broken).
-- REPL-based verification at scale; provisions database; verifies every button/API call.
-- Code-to-Device: builds + previews native mobile via Expo QR instantly.
-- "Stacks": agents building agents. RulesSync for replit.md across projects.
-- Hosted: your project runs on Replit's cloud, instant live preview built into the IDE.
-### Lovable
-- Three modes: Visual Edits (click an element), Plan Mode (conversational), Agent Mode (autonomous codebase exploration + proactive debugging + real-time web search).
-- Live preview updates in real time; checkpoint/version system to revert a bad edit.
-- Pricing: Free (5 daily credits), Pro $25/mo, Business $50/mo; plus usage-based Cloud/AI billing on shipped apps.
-- Hosted; greenfield-biased (build a new app from a prompt).
-### Bolt.new (StackBlitz)
-- WebContainer: Node runs natively *in the browser*, no server. Live preview updates as code generates; you click/fill/interact immediately.
-- Agent has full control of filesystem, node server, package manager, terminal, browser console; human-in-the-loop chat + hand-edit.
-- Bolt V2: Bolt Cloud (DB, auth, storage, edge functions, analytics, hosting), one-click Netlify deploy. Opus 4.6 with adjustable reasoning depth (Jan 2026).
-- Hosted/in-browser; greenfield-biased.
-Common thread: **hosted text-to-app**. Instant live preview is free because the runtime IS their cloud/browser sandbox. The inner loop (build -> see -> fix) is conversational, visible, real-time.
-## 3. What Loki actually has (verified in source)
-| Capability | Competitors | Loki today | Source |
-|---|---|---|---|
-| Starts the built app on a detected port | yes | YES (`app_runner_start`, `_detect_port`) | `autonomy/app-runner.sh:498,150` |
-| Health check + crash watchdog + auto-restart | yes | YES (`app_runner_watchdog`, `app_runner_should_restart`, `app_runner_health_check`) | `autonomy/app-runner.sh:769,735,678` |
-| Browser smoke test of the running app | yes (their core loop) | YES, batch (`playwright-verify.sh`) | `autonomy/playwright-verify.sh` |
-| Crash/playwright signal fed back into next iteration to self-correct | yes, tight realtime | YES, batch (`app_runner_info`/`playwright_info` injected into `build_prompt`) | `autonomy/run.sh:10544,10561,10761` |
-| Verified completion gate (no fabricated "done") | partial | YES (`council_evidence_gate`) | `completion-council.sh` |
-| Failure-memory: past failures injected to avoid repeats | partial | YES (`retrieve_anti_patterns`) | `memory/retrieval.py` |
-| **Clickable live preview URL / embedded app the user can try** | YES, central | **NO -- app runs but URL is not surfaced in dashboard** | gap: `dashboard/server.py` has no preview route |
-| **Real-time visible inner loop (watch it build/fix as it happens)** | YES | Partial -- dashboard shows iterations/logs, but app preview not embedded; loop is a longer autonomous batch | gap |
-| **Proactive "suggest add/improve/remove" surfaced to user** | YES | Partial -- analysis pass exists internally; not surfaced as user-facing suggestions | gap |
-| Works on existing/brownfield repos | weak (greenfield-biased) | STRONG | `loki heal`, codebase analysis |
-| No hosted-runtime lock-in; your machine, your code | NO (vendor cloud) | YES | local-first CLI |
-| Multi-provider (Claude/Codex/Cline/Aider) | NO | YES | `providers/*.sh` |
-## 4. Where Loki is BETTER (keep + lead with these)
-1. **Local-first, no vendor lock-in.** Your code never leaves your machine; no hosted runtime you can be evicted from or surprised-billed on (Lovable's dual-layer billing is a known pain). Enterprises care about this.
-2. **Brownfield/existing repos.** Replit/Lovable/Bolt are greenfield-biased (prompt -> new app). Loki runs on real existing codebases, including `loki heal` for legacy systems.
-3. **Verified completion + failure-memory.** Evidence gate blocks fabricated "done"; anti-pattern memory prevents repeating mistakes across runs. The hosted tools mostly re-discover bugs each session.
-4. **Multi-provider + your own keys/budget.** Not locked to one model vendor or a credit economy.
-5. **Depth of autonomous SDLC** (RARV, council review, quality gates) vs a single conversational agent.
-## 5. Where Loki is WORSE (the real friction)
-1. **No instant "try it" moment.** The app DOES start (app-runner), but the dashboard never hands the user a clickable URL or embedded preview. This is the single biggest felt gap and it is a *surfacing* fix, not a build.
-2. **Setup friction.** Competitors are zero-install (open a browser). Loki needs the `claude` CLI, a terminal, `loki start`. Non-technical consumers stall here.
-3. **The inner loop is a long batch, not a watched conversation.** Users can't see "it's testing the login button now and fixing it." Same capability, far less visible.
-4. **Suggestions aren't surfaced.** Loki's analysis pass reasons about what to add/fix internally, but doesn't present a user-facing "here's what I'd improve next" list.
-5. **First-run time-to-wow.** No 30-second "look, it works" the way a hosted preview gives.
-## 6. Honest category line (for positioning, not inferiority)
-Hosted text-to-app SaaS (Replit/Lovable/Bolt): instant live preview, tight visible loop, friendly to non-technical users -- but your code on their cloud, vendor + credit lock-in, greenfield-biased.
-Loki Mode: local-first CLI driving Claude on your own machine -- brownfield-capable, no hosted-runtime lock-in, multi-provider, verified completion + memory -- but no instant hosted preview and higher setup friction.
-The wins below close the *experience* gap without giving up the local-first advantages.
-## 7. Prioritized TODO (by blast radius / friction reduction)
-### P0 -- Live Preview surfacing (the headline win; cheapest path to "try it")
-The app already starts with crash watchdog. Surface it.
-- **Dashboard:** add a "Live App" panel that reads `.loki/app-runner/state.json` (status, port, url, crash_count), shows a clickable `http://localhost:<port>` link + an embedded iframe + health/crash badge + "Restart app" button (wire to existing `app_runner_restart`).
-- **CLI:** `loki preview` (alias `loki open`) -- prints the running app URL and opens the browser; honest message if no app is running yet.
-- **API:** `GET /api/app-runner` (state passthrough), `POST /api/app-runner/restart`.
-- Pure surfacing of existing state; no new runtime behavior. Lowest risk, highest felt impact.
-### P1 -- Tighter, visible self-healing loop
-- Stream app-runner crash events + playwright pass/fail to the dashboard timeline in near-real-time (event bus already exists, `events/bus.py`).
-- Dashboard "what just happened" feed: "app crashed -> reading log -> fixing -> restarted -> smoke test passed."
-- Honest framing: this exposes the EXISTING batch loop more visibly; it does not claim Replit's per-click realtime browser sim.
-### P2 -- Proactive suggestions surfaced to the user
-- Add a structured "Suggestions" output from the analysis pass (add/improve/remove/risk), persisted to `.loki/suggestions.json`.
-- Dashboard "Suggestions" panel + `loki suggest` CLI to print them.
-- These are advisory; the user opts in to queue any as tasks.
-### P3 -- First-run time-to-wow / setup friction
-- `loki try <one-line-idea>`: scaffold a tiny app, build it, auto-start app-runner, open preview -- a guided 60-second "it works" path (honest: real build, not simulated).
-- Doctor-style preflight that detects missing `claude` CLI and guides install.
-### P4 -- Non-technical on-ramp (longer term, optional)
-- Evaluate an optional hosted/containerized preview for users who can't run locally (collides with zero-egress posture; opt-in only, deferred).
-## 8. What this session will actually implement
-Per user direction ("update dashboard and backend cli or api accordingly", "plan it perfectly", "complete autonomously"): implement **P0 (Live Preview surfacing)** end-to-end (dashboard + CLI + API), both runtime routes where applicable, council-reviewed, local-ci 42/42, channels validated. P1-P4 are scoped follow-ups.
-## 9. SWE-bench note (unrelated but pending, must be stated)
-Primary-source data in `benchmarks/results/` shows ONLY patch generation (299/300 generated, `fixed_by_rarv:0`, status PATCHES_GENERATED, the official evaluator was never run, no resolve/pass-rate figure exists; some patches are prose not diffs). There is no "release 660." Publishing a SWE-bench resolve score would be fabrication. The only real measured number is HumanEval **98.78%** (162/164). Recommendation: lead with HumanEval; keep SWE-bench as "harness exists, resolve-rate not yet measured" + the repro command; offer to run the official evaluator as an opt-in upgrade.
-## Sources (competitor research, June 2026)
-- Replit Agent 3: https://blog.replit.com/introducing-agent-3-our-most-autonomous-agent-yet , https://blog.replit.com/automated-self-testing , https://docs.replit.com/core-concepts/agent
-- Lovable: https://lovable.dev/ , https://lovable.dev/pricing , https://www.nocode.mba/articles/lovable-ai-app-builder
-- Bolt.new: https://github.com/stackblitz/bolt.new , https://capacity.so/blog/what-is-bolt-new , https://www.banani.co/blog/bolt-new-ai-review-and-alternatives