loki-mode 7.24.0 → 7.25.0

package/README.md CHANGED
@@ -27,6 +27,7 @@
  - **Spec-driven, autonomous, with a built-in trust layer** -- Hand Loki a spec, walk away, come back to working code with tests. The full RARV-C closure loop (Reason - Act - Reflect - Verify - Close) runs until the work is actually done, not just attempted. The verified-completion evidence gate (`skills/quality-gates.md`) refuses any "done" claim on an empty git diff against the run-start commit, and blocks completion when tests run red, so "complete" means proven, not promised.
  - **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
  - **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
+ - **Intelligent `loki start`** -- For interactive foreground runs the dashboard auto-opens in the browser (cross-platform; skipped in CI, SSH-without-TTY, and piped runs; opt out with `LOKI_NO_AUTO_OPEN=1`). The completion summary shows "Your app is live at <url>" so you know exactly where to try what Loki just built. The autonomous loop passes Claude Code's `--effort`, `--max-budget-usd`, and `--fallback-model` on every iteration (each gated on CLI support and individual opt-out env vars) for better long-run unattended execution (v7.25.0).
  - **Cross-project memory** -- Episodic/semantic/procedural memory with vector search; knowledge learned on one project surfaces on the next (v5.15.0+, see `memory/engine.py`)
  - **Self-hosted and private** -- Your keys, your infrastructure, no data leaves your network
  - **Legacy system healing** -- `loki heal` archaeology/stabilize/isolate/modernize/validate phases (v6.67.0, see `skills/healing.md`)
@@ -347,7 +348,7 @@ Claude gets full features (subagents, parallelization, MCP, Task tool). Other ac
 
  | Command | Description |
  |---------|-------------|
- | `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`) |
+ | `loki start [PRD]` | Start with optional PRD file (also accepts an issue ref; replaces deprecated `loki run`). Auto-opens the dashboard in the browser for interactive runs and passes native `--effort`/`--max-budget-usd`/`--fallback-model` for resilience (v7.25.0) |
  | `loki stop` | Stop execution |
  | `loki heal <path>` | Legacy system healing (archaeology, stabilize, isolate, modernize, validate -- v6.67.0) |
  | `loki pause` / `resume` | Pause/resume after current session |
package/SKILL.md CHANGED
@@ -3,7 +3,7 @@ name: loki-mode
  description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
  ---

- # Loki Mode v7.24.0
+ # Loki Mode v7.25.0

  **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
 
@@ -383,4 +383,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r
 
  ---

- **v7.24.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+ **v7.25.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
package/VERSION CHANGED
@@ -1 +1 @@
- 7.24.0
+ 7.25.0
package/autonomy/run.sh CHANGED
@@ -2423,6 +2423,20 @@ build_completion_summary() {
          *) outcome_label="$outcome"; notify_title="Run finished" ;;
      esac

+     # Live app URL (best-effort): if the app runner has a running app, surface
+     # where the user can try it. Reads .loki/app-runner/state.json written by
+     # app-runner.sh. Empty when no app is running.
+     local live_app_url=""
+     local _app_state_file="$loki_dir/app-runner/state.json"
+     if [ -f "$_app_state_file" ]; then
+         live_app_url="$(python3 -c "import json,sys
+ try:
+     d=json.load(open(sys.argv[1]))
+     print(d.get('url','') if d.get('status')=='running' else '')
+ except Exception:
+     print('')" "$_app_state_file" 2>/dev/null)"
+     fi
+
      # Branch + diff stats vs the run-start SHA (best-effort; non-git or empty
      # baseline yields empty values, which we render as "unknown"/"0").
      local start_sha="${_LOKI_RUN_START_SHA:-}"
@@ -2483,6 +2497,15 @@ build_completion_summary() {
          echo "Pull request: not opened (set LOKI_DELEGATE_PR=1 to open one)"
      fi
      echo ""
+     if [ -n "$live_app_url" ]; then
+         # Compute the dashboard scheme the same way start_dashboard does
+         # (url_scheme is local to that function, not visible here).
+         local _dash_scheme="http"
+         [ -n "${LOKI_TLS_CERT:-}" ] && [ -n "${LOKI_TLS_KEY:-}" ] && _dash_scheme="https"
+         echo "Your app is live at: $live_app_url (served locally on this machine)"
+         echo " Dashboard: ${_dash_scheme}://127.0.0.1:${DASHBOARD_PORT:-57374}/ (App Runner -> Live App)"
+         echo ""
+     fi
      echo "Tasks: pending=$pending in_progress=$in_progress completed=$completed failed=$failed"
      echo ""
      echo "Review the work:"
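The scheme selection in the hunk above can be exercised on its own. The sketch below wraps it in a hypothetical helper, `dash_url`, which is not part of `run.sh`. It assumes, as the diff does, that HTTPS is chosen only when both `LOKI_TLS_CERT` and `LOKI_TLS_KEY` are set and that the port falls back to 57374.

```bash
# Hypothetical helper, not in run.sh: prints the dashboard URL using the
# same rule as the hunk above. The body runs in a subshell, so the
# temporary variable does not leak into the caller.
dash_url() (
    scheme="http"
    # HTTPS only when BOTH the certificate and the key are configured.
    [ -n "${LOKI_TLS_CERT:-}" ] && [ -n "${LOKI_TLS_KEY:-}" ] && scheme="https"
    echo "${scheme}://127.0.0.1:${DASHBOARD_PORT:-57374}/"
)
```

With a certificate but no key the URL stays on `http`, which matches the `&&` chain in the diff.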
@@ -8216,9 +8239,22 @@ start_dashboard() {
          log_info "Dashboard started (PID: $DASHBOARD_PID)"
          log_info "Dashboard: ${CYAN}${url_scheme}://127.0.0.1:$DASHBOARD_PORT/${NC}"

-         # Open in browser (macOS)
-         if [[ "$OSTYPE" == "darwin"* ]]; then
-             open "${url_scheme}://127.0.0.1:$DASHBOARD_PORT/" 2>/dev/null || true
+         # Auto-open the dashboard in the browser, but ONLY for an interactive
+         # foreground session. Gated on: a TTY on stdout ([ -t 1 ]), not
+         # background/detached mode, and not explicitly opted out via
+         # LOKI_NO_AUTO_OPEN=1. This keeps CI, --detach, SSH-no-TTY, and piped
+         # runs from spawning a browser. Cross-platform: open / xdg-open / start.
+         if [ -t 1 ] && [ "${BACKGROUND_MODE:-false}" != "true" ] && [ "${LOKI_NO_AUTO_OPEN:-0}" != "1" ]; then
+             local _dash_url="${url_scheme}://127.0.0.1:$DASHBOARD_PORT/"
+             if command -v open >/dev/null 2>&1; then
+                 open "$_dash_url" 2>/dev/null || true
+             elif command -v xdg-open >/dev/null 2>&1; then
+                 xdg-open "$_dash_url" 2>/dev/null || true
+             elif command -v cmd.exe >/dev/null 2>&1; then
+                 # Windows (Git Bash/WSL): `start` is a cmd builtin, not on PATH,
+                 # so invoke it via cmd.exe. The empty "" is start's title arg.
+                 cmd.exe /c start "" "$_dash_url" 2>/dev/null || true
+             fi
          fi
          return 0
      else
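The three-part gate in the hunk above is easier to reason about as a standalone predicate. The sketch below is a hypothetical helper, `should_auto_open`, not a function in `run.sh`. It takes the result of the `[ -t 1 ]` TTY test as an argument (an assumption made so the rule can be checked without a terminal) and reads the same two environment variables as the diff.

```bash
# Hypothetical helper, not in run.sh. $1 stands in for the `[ -t 1 ]` test:
# "1" means stdout is a terminal, anything else means it is not.
should_auto_open() {
    # No TTY: CI, piped output, SSH without a TTY.
    [ "$1" = "1" ] || return 1
    # Background or detached run.
    [ "${BACKGROUND_MODE:-false}" != "true" ] || return 1
    # Explicit opt-out.
    [ "${LOKI_NO_AUTO_OPEN:-0}" != "1" ] || return 1
    return 0
}
```

A caller could write `if [ -t 1 ]; then tty=1; else tty=0; fi; should_auto_open "$tty" && open "$url"`. The diff inlines the same three tests in a single `if`.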
@@ -12141,6 +12177,59 @@ except Exception as exc:
          && loki_claude_flag_supported "--include-partial-messages"; then
          _loki_claude_argv+=("--include-partial-messages")
      fi
+     # ---- Bash<->Bun invocation-flag convergence ledger (v7.25.0) ----------
+     # The fixture corpus covers build_prompt/stats output, NOT this claude
+     # argv, so drift here is invisible to parity tests. Keep this ledger
+     # current. Live route today is BASH (bin/loki routes `start` -> bash).
+     # The claude provider in loki-ts/src/runner/providers.ts is implemented
+     # but is NOT reached for `start` (start is not ported to the Bun router;
+     # the shim falls through to bash), so its flag set has zero live impact
+     # today.
+     #   Bash argv (canonical, live): --dangerously-skip-permissions --model M
+     #     [--append-system-prompt] [--setting-sources] [--include-partial-messages]
+     #     [--effort] [--max-budget-usd] [--fallback-model] -p PROMPT
+     #     --output-format stream-json --verbose
+     #   Bun buildAutoFlags also emits: --exclude-dynamic-system-prompt-sections
+     #     (cost-only), --mcp-config (bash gets MCP via --setting-sources +
+     #     .mcp.json discovery; a how-difference, likely behavior-equivalent),
+     #     --include-hook-events (bash handles hook events in its embedded
+     #     stream parser; likely moot). These three are Bun-only and MUST be
+     #     reconciled to a deliberately chosen canonical set BEFORE `start`
+     #     flips to the Bun runner. They have zero live impact today.
+     # v7.25.0: long-run resilience + cost flags, appended individually here
+     # (NOT via _loki_build_claude_auto_flags, which would double the three
+     # flags above). Each is gated on CLI support + an opt-out env var, same
+     # pattern as above. These improve unattended/long-run execution:
+     #   --effort          adaptive reasoning depth per RARV tier
+     #   --max-budget-usd  per-call hard backstop (complements the
+     #                     cumulative check_budget_limit PAUSE gate)
+     #   --fallback-model  resilience to model overload/unavailability
+     # The trust/verification gates stay deterministic; these only tune how
+     # the provider is invoked, never whether work is judged complete.
+     if [ "${LOKI_AUTO_EFFORT:-on}" != "off" ] \
+         && type loki_effort_for_tier >/dev/null 2>&1 \
+         && type loki_claude_flag_supported >/dev/null 2>&1 \
+         && loki_claude_flag_supported "--effort"; then
+         local _loki_effort
+         _loki_effort="$(loki_effort_for_tier "$CURRENT_TIER" "${DETECTED_COMPLEXITY:-${LOKI_COMPLEXITY:-standard}}")"
+         [ -n "$_loki_effort" ] && _loki_claude_argv+=("--effort" "$_loki_effort")
+     fi
+     if [ "${LOKI_AUTO_BUDGET:-on}" != "off" ] \
+         && type loki_remaining_budget >/dev/null 2>&1 \
+         && type loki_claude_flag_supported >/dev/null 2>&1 \
+         && loki_claude_flag_supported "--max-budget-usd"; then
+         local _loki_rem_budget
+         _loki_rem_budget="$(loki_remaining_budget)"
+         [ -n "$_loki_rem_budget" ] && _loki_claude_argv+=("--max-budget-usd" "$_loki_rem_budget")
+     fi
+     if [ "${LOKI_AUTO_FALLBACK:-on}" != "off" ] \
+         && type loki_fallback_for_primary >/dev/null 2>&1 \
+         && type loki_claude_flag_supported >/dev/null 2>&1 \
+         && loki_claude_flag_supported "--fallback-model"; then
+         local _loki_fallback
+         _loki_fallback="$(loki_fallback_for_primary "$tier_param")"
+         [ -n "$_loki_fallback" ] && _loki_claude_argv+=("--fallback-model" "$_loki_fallback")
+     fi
      case "${PROVIDER_NAME:-claude}" in
          claude)
              # Claude: Full features with stream-json output and agent tracking
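All three new flags follow one pattern: an opt-out env var, a check that the helper exists, a CLI-support probe, and a non-empty value. The sketch below isolates that pattern for `--effort` only. The bodies of `loki_claude_flag_supported` and `loki_effort_for_tier` are stand-ins invented for illustration, since the diff calls these helpers but does not show them. The wrapper `optional_effort_flag` and the value `high` are likewise hypothetical.

```bash
# Stand-ins for helpers this diff calls but does not show. Both bodies are
# ASSUMPTIONS for the sketch, not the real implementations.
loki_claude_flag_supported() { [ "$1" != "--unsupported-flag" ]; }
loki_effort_for_tier() { echo "high"; }

# Hypothetical wrapper: prints the flag and its value, one per line, or
# prints nothing when the flag is opted out, unsupported, or has no value.
# The body runs in a subshell, so `exit` only leaves the function.
optional_effort_flag() (
    [ "${LOKI_AUTO_EFFORT:-on}" != "off" ] || exit 0     # opt-out env var
    type loki_effort_for_tier >/dev/null 2>&1 || exit 0  # helper is defined
    loki_claude_flag_supported "--effort" || exit 0      # CLI accepts the flag
    effort="$(loki_effort_for_tier "$1")"
    [ -n "$effort" ] && printf '%s\n' "--effort" "$effort"
    exit 0
)
```

In `run.sh` the flag and value are appended to the `_loki_claude_argv` array instead of being printed. Printing is used here only to keep the sketch portable to shells without arrays.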
@@ -7,7 +7,7 @@ Modules:
  control: Session control API (start/stop/pause/resume)
  """

- __version__ = "7.24.0"
+ __version__ = "7.25.0"

  # Expose the control app for easy import
  try:
@@ -2,7 +2,7 @@
 
  The flagship product of [Autonomi](https://www.autonomi.dev/). Loki Mode is a spec-driven autonomous builder with a built-in trust layer that takes any spec to a deployed product and verifies completion with evidence (quality gates plus a completion council), not just a "done" claim. Complete installation instructions for all platforms and use cases.

- **Version:** v7.24.0
+ **Version:** v7.25.0

  ---
 
@@ -1,5 +1,5 @@
  // @bun
- var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.24.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new 
a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
+ var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.25.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new 
a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
  `),process.stdout.write(`Install with:
  `),process.stdout.write(` brew install jq (macOS)
  `),process.stdout.write(` apt install jq (Debian/Ubuntu)
@@ -787,4 +787,4 @@ Set LOKI_LEGACY_BASH=1 to force the bash CLI for every command.
  `),2}default:return process.stderr.write(`Unknown command: ${Q}
  `),process.stderr.write(v8),2}}g$();process.on("SIGINT",()=>process.exit(130));process.on("SIGTERM",()=>process.exit(143));var l3=await p3(Bun.argv.slice(2));process.exit(l3);

- //# debugId=5BE33C3C3E53FD8864756E2164756E21
+ //# debugId=C1988314C1A4579264756E2164756E21
package/mcp/__init__.py CHANGED
@@ -57,4 +57,4 @@ try:
  except ImportError:
      __all__ = ['mcp']

- __version__ = '7.24.0'
+ __version__ = '7.25.0'
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "loki-mode",
-   "version": "7.24.0",
+   "version": "7.25.0",
    "description": "Loki Mode by Autonomi. Autonomous spec-to-product system: takes a PRD, GitHub issue, OpenAPI/JSON/YAML, or one-line brief to a deployed app via the RARV-C closure loop with 11 quality gates. Provider-agnostic (Claude Code, OpenAI Codex, Cline, Aider).",
    "keywords": [
      "agent",
@@ -1,114 +0,0 @@
- # Loki Mode vs Replit Agent / Lovable / Bolt.new -- Instant-Preview & Self-Healing Gap Analysis
-
- Date: 2026-06-09
- Author: autonomous analysis (verified against Loki source + competitor web research at knowledge cutoff Jan 2026 plus June 2026 web search)
- Status: analysis + prioritized TODO. The TODO is the planning checkpoint, not an auto-launched implementation.
-
- ## 1. The user's framing
-
- > "When users prompt a spec or idea, [Replit/Lovable/Bolt] just builds and spins up UI so users can try it. They are clever at suggesting what to add/update/improve/remove, finding bugs while the app runs and fixing them autonomously before the user realizes. Why are we not able to do that? We are lacking something that is causing friction for users (devs, enterprises, non-technical consumers)."
-
- This is correct as a *felt experience* gap. It is NOT correct that Loki lacks the underlying capability. The gap is mostly **surfacing and loop-tightness**, plus a real **category difference**.
-
- ## 2. What the competitors actually do (verified June 2026)
-
- ### Replit Agent 3
- - Runs autonomously up to ~200 minutes ("10x more autonomous than Agent 2").
- - "Self-healing" loop: spins up a real browser, simulates user behavior (click/type/login), captures logs, and fixes bugs it hits during testing. Calls out "Potemkin interfaces" (looks-done-but-broken).
- - REPL-based verification at scale; provisions database; verifies every button/API call.
- - Code-to-Device: builds + previews native mobile via Expo QR instantly.
- - "Stacks": agents building agents. RulesSync for replit.md across projects.
- - Hosted: your project runs on Replit's cloud, instant live preview built into the IDE.
-
- ### Lovable
- - Three modes: Visual Edits (click an element), Plan Mode (conversational), Agent Mode (autonomous codebase exploration + proactive debugging + real-time web search).
- - Live preview updates in real time; checkpoint/version system to revert a bad edit.
- - Pricing: Free (5 daily credits), Pro $25/mo, Business $50/mo; plus usage-based Cloud/AI billing on shipped apps.
- - Hosted; greenfield-biased (build a new app from a prompt).
-
- ### Bolt.new (StackBlitz)
- - WebContainer: Node runs natively *in the browser*, no server. Live preview updates as code generates; you click/fill/interact immediately.
- - Agent has full control of filesystem, node server, package manager, terminal, browser console; human-in-the-loop chat + hand-edit.
- - Bolt V2: Bolt Cloud (DB, auth, storage, edge functions, analytics, hosting), one-click Netlify deploy. Opus 4.6 with adjustable reasoning depth (Jan 2026).
- - Hosted/in-browser; greenfield-biased.
-
- Common thread: **hosted text-to-app**. Instant live preview is free because the runtime IS their cloud/browser sandbox. The inner loop (build -> see -> fix) is conversational, visible, real-time.
-
- ## 3. What Loki actually has (verified in source)
-
- | Capability | Competitors | Loki today | Source |
- |---|---|---|---|
- | Starts the built app on a detected port | yes | YES (`app_runner_start`, `_detect_port`) | `autonomy/app-runner.sh:498,150` |
- | Health check + crash watchdog + auto-restart | yes | YES (`app_runner_watchdog`, `app_runner_should_restart`, `app_runner_health_check`) | `autonomy/app-runner.sh:769,735,678` |
- | Browser smoke test of the running app | yes (their core loop) | YES, batch (`playwright-verify.sh`) | `autonomy/playwright-verify.sh` |
- | Crash/playwright signal fed back into next iteration to self-correct | yes, tight realtime | YES, batch (`app_runner_info`/`playwright_info` injected into `build_prompt`) | `autonomy/run.sh:10544,10561,10761` |
- | Verified completion gate (no fabricated "done") | partial | YES (`council_evidence_gate`) | `completion-council.sh` |
- | Failure-memory: past failures injected to avoid repeats | partial | YES (`retrieve_anti_patterns`) | `memory/retrieval.py` |
- | **Clickable live preview URL / embedded app the user can try** | YES, central | **NO -- app runs but URL is not surfaced in dashboard** | gap: `dashboard/server.py` has no preview route |
- | **Real-time visible inner loop (watch it build/fix as it happens)** | YES | Partial -- dashboard shows iterations/logs, but app preview not embedded; loop is a longer autonomous batch | gap |
- | **Proactive "suggest add/improve/remove" surfaced to user** | YES | Partial -- analysis pass exists internally; not surfaced as user-facing suggestions | gap |
- | Works on existing/brownfield repos | weak (greenfield-biased) | STRONG | `loki heal`, codebase analysis |
- | No hosted-runtime lock-in; your machine, your code | NO (vendor cloud) | YES | local-first CLI |
- | Multi-provider (Claude/Codex/Cline/Aider) | NO | YES | `providers/*.sh` |
-
- ## 4. Where Loki is BETTER (keep + lead with these)
-
- 1. **Local-first, no vendor lock-in.** Your code never leaves your machine; no hosted runtime you can be evicted from or surprised-billed on (Lovable's dual-layer billing is a known pain). Enterprises care about this.
- 2. **Brownfield/existing repos.** Replit/Lovable/Bolt are greenfield-biased (prompt -> new app). Loki runs on real existing codebases, including `loki heal` for legacy systems.
- 3. **Verified completion + failure-memory.** Evidence gate blocks fabricated "done"; anti-pattern memory prevents repeating mistakes across runs. The hosted tools mostly re-discover bugs each session.
- 4. **Multi-provider + your own keys/budget.** Not locked to one model vendor or a credit economy.
- 5. **Depth of autonomous SDLC** (RARV, council review, quality gates) vs a single conversational agent.
-
- ## 5. Where Loki is WORSE (the real friction)
-
- 1. **No instant "try it" moment.** The app DOES start (app-runner), but the dashboard never hands the user a clickable URL or embedded preview. This is the single biggest felt gap and it is a *surfacing* fix, not a build.
- 2. **Setup friction.** Competitors are zero-install (open a browser). Loki needs the `claude` CLI, a terminal, `loki start`. Non-technical consumers stall here.
- 3. **The inner loop is a long batch, not a watched conversation.** Users can't see "it's testing the login button now and fixing it." Same capability, far less visible.
- 4. **Suggestions aren't surfaced.** Loki's analysis pass reasons about what to add/fix internally, but doesn't present a user-facing "here's what I'd improve next" list.
- 5. **First-run time-to-wow.** No 30-second "look, it works" the way a hosted preview gives.
-
- ## 6. Honest category line (for positioning, not inferiority)
-
- Hosted text-to-app SaaS (Replit/Lovable/Bolt): instant live preview, tight visible loop, friendly to non-technical users -- but your code on their cloud, vendor + credit lock-in, greenfield-biased.
-
- Loki Mode: local-first CLI driving Claude on your own machine -- brownfield-capable, no hosted-runtime lock-in, multi-provider, verified completion + memory -- but no instant hosted preview and higher setup friction.
-
- The wins below close the *experience* gap without giving up the local-first advantages.
-
- ## 7. Prioritized TODO (by blast radius / friction reduction)
-
- ### P0 -- Live Preview surfacing (the headline win; cheapest path to "try it")
- The app already starts with crash watchdog. Surface it.
- - **Dashboard:** add a "Live App" panel that reads `.loki/app-runner/state.json` (status, port, url, crash_count), shows a clickable `http://localhost:<port>` link + an embedded iframe + health/crash badge + "Restart app" button (wire to existing `app_runner_restart`).
- - **CLI:** `loki preview` (alias `loki open`) -- prints the running app URL and opens the browser; honest message if no app is running yet.
- - **API:** `GET /api/app-runner` (state passthrough), `POST /api/app-runner/restart`.
- - Pure surfacing of existing state; no new runtime behavior. Lowest risk, highest felt impact.
-
- ### P1 -- Tighter, visible self-healing loop
- - Stream app-runner crash events + playwright pass/fail to the dashboard timeline in near-real-time (event bus already exists, `events/bus.py`).
- - Dashboard "what just happened" feed: "app crashed -> reading log -> fixing -> restarted -> smoke test passed."
- - Honest framing: this exposes the EXISTING batch loop more visibly; it does not claim Replit's per-click realtime browser sim.
-
- ### P2 -- Proactive suggestions surfaced to the user
- - Add a structured "Suggestions" output from the analysis pass (add/improve/remove/risk), persisted to `.loki/suggestions.json`.
- - Dashboard "Suggestions" panel + `loki suggest` CLI to print them.
- - These are advisory; the user opts in to queue any as tasks.
-
- ### P3 -- First-run time-to-wow / setup friction
- - `loki try <one-line-idea>`: scaffold a tiny app, build it, auto-start app-runner, open preview -- a guided 60-second "it works" path (honest: real build, not simulated).
- - Doctor-style preflight that detects missing `claude` CLI and guides install.
-
- ### P4 -- Non-technical on-ramp (longer term, optional)
- - Evaluate an optional hosted/containerized preview for users who can't run locally (collides with zero-egress posture; opt-in only, deferred).
-
- ## 8. What this session will actually implement
-
- Per user direction ("update dashboard and backend cli or api accordingly", "plan it perfectly", "complete autonomously"): implement **P0 (Live Preview surfacing)** end-to-end (dashboard + CLI + API), both runtime routes where applicable, council-reviewed, local-ci 42/42, channels validated. P1-P4 are scoped follow-ups.
-
- ## 9. SWE-bench note (unrelated but pending, must be stated)
- Primary-source data in `benchmarks/results/` shows ONLY patch generation (299/300 generated, `fixed_by_rarv:0`, status PATCHES_GENERATED, the official evaluator was never run, no resolve/pass-rate figure exists; some patches are prose not diffs). There is no "release 660." Publishing a SWE-bench resolve score would be fabrication. The only real measured number is HumanEval **98.78%** (162/164). Recommendation: lead with HumanEval; keep SWE-bench as "harness exists, resolve-rate not yet measured" + the repro command; offer to run the official evaluator as an opt-in upgrade.
-
- ## Sources (competitor research, June 2026)
- - Replit Agent 3: https://blog.replit.com/introducing-agent-3-our-most-autonomous-agent-yet , https://blog.replit.com/automated-self-testing , https://docs.replit.com/core-concepts/agent
- - Lovable: https://lovable.dev/ , https://lovable.dev/pricing , https://www.nocode.mba/articles/lovable-ai-app-builder
- - Bolt.new: https://github.com/stackblitz/bolt.new , https://capacity.so/blog/what-is-bolt-new , https://www.banani.co/blog/bolt-new-ai-review-and-alternatives