npm - loki-mode - Versions diffs - 7.23.0 → 7.24.0 - Mend

loki-mode 7.23.0 → 7.24.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12) hide show

package/README.md +4 -2
package/SKILL.md +2 -2
package/VERSION +1 -1
package/autonomy/loki +85 -0
package/dashboard/__init__.py +1 -1
package/dashboard/server.py +79 -2
package/dashboard/static/index.html +224 -121
package/docs/COMPETITIVE-ANALYSIS-INSTANT-PREVIEW-2026-06.md +114 -0
package/docs/INSTALLATION.md +1 -1
package/loki-ts/dist/loki.js +2 -2
package/mcp/__init__.py +1 -1
package/package.json +1 -1

package/docs/COMPETITIVE-ANALYSIS-INSTANT-PREVIEW-2026-06.md ADDED Viewed

@@ -0,0 +1,114 @@
+# Loki Mode vs Replit Agent / Lovable / Bolt.new -- Instant-Preview & Self-Healing Gap Analysis
+Date: 2026-06-09
+Author: autonomous analysis (verified against Loki source + competitor web research at knowledge cutoff Jan 2026 plus June 2026 web search)
+Status: analysis + prioritized TODO. The TODO is the planning checkpoint, not an auto-launched implementation.
+## 1. The user's framing
+> "When users prompt a spec or idea, [Replit/Lovable/Bolt] just builds and spins up UI so users can try it. They are clever at suggesting what to add/update/improve/remove, finding bugs while the app runs and fixing them autonomously before the user realizes. Why are we not able to do that? We are lacking something that is causing friction for users (devs, enterprises, non-technical consumers)."
+This is correct as a *felt experience* gap. It is NOT correct that Loki lacks the underlying capability. The gap is mostly **surfacing and loop-tightness**, plus a real **category difference**.
+## 2. What the competitors actually do (verified June 2026)
+### Replit Agent 3
+- Runs autonomously up to ~200 minutes ("10x more autonomous than Agent 2").
+- "Self-healing" loop: spins up a real browser, simulates user behavior (click/type/login), captures logs, and fixes bugs it hits during testing. Calls out "Potemkin interfaces" (looks-done-but-broken).
+- REPL-based verification at scale; provisions database; verifies every button/API call.
+- Code-to-Device: builds + previews native mobile via Expo QR instantly.
+- "Stacks": agents building agents. RulesSync for replit.md across projects.
+- Hosted: your project runs on Replit's cloud, instant live preview built into the IDE.
+### Lovable
+- Three modes: Visual Edits (click an element), Plan Mode (conversational), Agent Mode (autonomous codebase exploration + proactive debugging + real-time web search).
+- Live preview updates in real time; checkpoint/version system to revert a bad edit.
+- Pricing: Free (5 daily credits), Pro $25/mo, Business $50/mo; plus usage-based Cloud/AI billing on shipped apps.
+- Hosted; greenfield-biased (build a new app from a prompt).
+### Bolt.new (StackBlitz)
+- WebContainer: Node runs natively *in the browser*, no server. Live preview updates as code generates; you click/fill/interact immediately.
+- Agent has full control of filesystem, node server, package manager, terminal, browser console; human-in-the-loop chat + hand-edit.
+- Bolt V2: Bolt Cloud (DB, auth, storage, edge functions, analytics, hosting), one-click Netlify deploy. Opus 4.6 with adjustable reasoning depth (Jan 2026).
+- Hosted/in-browser; greenfield-biased.
+Common thread: **hosted text-to-app**. Instant live preview is free because the runtime IS their cloud/browser sandbox. The inner loop (build -> see -> fix) is conversational, visible, real-time.
+## 3. What Loki actually has (verified in source)
+| Capability | Competitors | Loki today | Source |
+|---|---|---|---|
+| Starts the built app on a detected port | yes | YES (`app_runner_start`, `_detect_port`) | `autonomy/app-runner.sh:498,150` |
+| Health check + crash watchdog + auto-restart | yes | YES (`app_runner_watchdog`, `app_runner_should_restart`, `app_runner_health_check`) | `autonomy/app-runner.sh:769,735,678` |
+| Browser smoke test of the running app | yes (their core loop) | YES, batch (`playwright-verify.sh`) | `autonomy/playwright-verify.sh` |
+| Crash/playwright signal fed back into next iteration to self-correct | yes, tight realtime | YES, batch (`app_runner_info`/`playwright_info` injected into `build_prompt`) | `autonomy/run.sh:10544,10561,10761` |
+| Verified completion gate (no fabricated "done") | partial | YES (`council_evidence_gate`) | `completion-council.sh` |
+| Failure-memory: past failures injected to avoid repeats | partial | YES (`retrieve_anti_patterns`) | `memory/retrieval.py` |
+| **Clickable live preview URL / embedded app the user can try** | YES, central | **NO -- app runs but URL is not surfaced in dashboard** | gap: `dashboard/server.py` has no preview route |
+| **Real-time visible inner loop (watch it build/fix as it happens)** | YES | Partial -- dashboard shows iterations/logs, but app preview not embedded; loop is a longer autonomous batch | gap |
+| **Proactive "suggest add/improve/remove" surfaced to user** | YES | Partial -- analysis pass exists internally; not surfaced as user-facing suggestions | gap |
+| Works on existing/brownfield repos | weak (greenfield-biased) | STRONG | `loki heal`, codebase analysis |
+| No hosted-runtime lock-in; your machine, your code | NO (vendor cloud) | YES | local-first CLI |
+| Multi-provider (Claude/Codex/Cline/Aider) | NO | YES | `providers/*.sh` |
+## 4. Where Loki is BETTER (keep + lead with these)
+1. **Local-first, no vendor lock-in.** Your code never leaves your machine; no hosted runtime you can be evicted from or surprised-billed on (Lovable's dual-layer billing is a known pain). Enterprises care about this.
+2. **Brownfield/existing repos.** Replit/Lovable/Bolt are greenfield-biased (prompt -> new app). Loki runs on real existing codebases, including `loki heal` for legacy systems.
+3. **Verified completion + failure-memory.** Evidence gate blocks fabricated "done"; anti-pattern memory prevents repeating mistakes across runs. The hosted tools mostly re-discover bugs each session.
+4. **Multi-provider + your own keys/budget.** Not locked to one model vendor or a credit economy.
+5. **Depth of autonomous SDLC** (RARV, council review, quality gates) vs a single conversational agent.
+## 5. Where Loki is WORSE (the real friction)
+1. **No instant "try it" moment.** The app DOES start (app-runner), but the dashboard never hands the user a clickable URL or embedded preview. This is the single biggest felt gap and it is a *surfacing* fix, not a build.
+2. **Setup friction.** Competitors are zero-install (open a browser). Loki needs the `claude` CLI, a terminal, `loki start`. Non-technical consumers stall here.
+3. **The inner loop is a long batch, not a watched conversation.** Users can't see "it's testing the login button now and fixing it." Same capability, far less visible.
+4. **Suggestions aren't surfaced.** Loki's analysis pass reasons about what to add/fix internally, but doesn't present a user-facing "here's what I'd improve next" list.
+5. **First-run time-to-wow.** No 30-second "look, it works" the way a hosted preview gives.
+## 6. Honest category line (for positioning, not inferiority)
+Hosted text-to-app SaaS (Replit/Lovable/Bolt): instant live preview, tight visible loop, friendly to non-technical users -- but your code on their cloud, vendor + credit lock-in, greenfield-biased.
+Loki Mode: local-first CLI driving Claude on your own machine -- brownfield-capable, no hosted-runtime lock-in, multi-provider, verified completion + memory -- but no instant hosted preview and higher setup friction.
+The wins below close the *experience* gap without giving up the local-first advantages.
+## 7. Prioritized TODO (by blast radius / friction reduction)
+### P0 -- Live Preview surfacing (the headline win; cheapest path to "try it")
+The app already starts with crash watchdog. Surface it.
+- **Dashboard:** add a "Live App" panel that reads `.loki/app-runner/state.json` (status, port, url, crash_count), shows a clickable `http://localhost:<port>` link + an embedded iframe + health/crash badge + "Restart app" button (wire to existing `app_runner_restart`).
+- **CLI:** `loki preview` (alias `loki open`) -- prints the running app URL and opens the browser; honest message if no app is running yet.
+- **API:** `GET /api/app-runner` (state passthrough), `POST /api/app-runner/restart`.
+- Pure surfacing of existing state; no new runtime behavior. Lowest risk, highest felt impact.
+### P1 -- Tighter, visible self-healing loop
+- Stream app-runner crash events + playwright pass/fail to the dashboard timeline in near-real-time (event bus already exists, `events/bus.py`).
+- Dashboard "what just happened" feed: "app crashed -> reading log -> fixing -> restarted -> smoke test passed."
+- Honest framing: this exposes the EXISTING batch loop more visibly; it does not claim Replit's per-click realtime browser sim.
+### P2 -- Proactive suggestions surfaced to the user
+- Add a structured "Suggestions" output from the analysis pass (add/improve/remove/risk), persisted to `.loki/suggestions.json`.
+- Dashboard "Suggestions" panel + `loki suggest` CLI to print them.
+- These are advisory; the user opts in to queue any as tasks.
+### P3 -- First-run time-to-wow / setup friction
+- `loki try <one-line-idea>`: scaffold a tiny app, build it, auto-start app-runner, open preview -- a guided 60-second "it works" path (honest: real build, not simulated).
+- Doctor-style preflight that detects missing `claude` CLI and guides install.
+### P4 -- Non-technical on-ramp (longer term, optional)
+- Evaluate an optional hosted/containerized preview for users who can't run locally (collides with zero-egress posture; opt-in only, deferred).
+## 8. What this session will actually implement
+Per user direction ("update dashboard and backend cli or api accordingly", "plan it perfectly", "complete autonomously"): implement **P0 (Live Preview surfacing)** end-to-end (dashboard + CLI + API), both runtime routes where applicable, council-reviewed, local-ci 42/42, channels validated. P1-P4 are scoped follow-ups.
+## 9. SWE-bench note (unrelated but pending, must be stated)
+Primary-source data in `benchmarks/results/` shows ONLY patch generation (299/300 generated, `fixed_by_rarv:0`, status PATCHES_GENERATED, the official evaluator was never run, no resolve/pass-rate figure exists; some patches are prose not diffs). There is no "release 660." Publishing a SWE-bench resolve score would be fabrication. The only real measured number is HumanEval **98.78%** (162/164). Recommendation: lead with HumanEval; keep SWE-bench as "harness exists, resolve-rate not yet measured" + the repro command; offer to run the official evaluator as an opt-in upgrade.
+## Sources (competitor research, June 2026)
+- Replit Agent 3: https://blog.replit.com/introducing-agent-3-our-most-autonomous-agent-yet , https://blog.replit.com/automated-self-testing , https://docs.replit.com/core-concepts/agent
+- Lovable: https://lovable.dev/ , https://lovable.dev/pricing , https://www.nocode.mba/articles/lovable-ai-app-builder
+- Bolt.new: https://github.com/stackblitz/bolt.new , https://capacity.so/blog/what-is-bolt-new , https://www.banani.co/blog/bolt-new-ai-review-and-alternatives

package/docs/INSTALLATION.md CHANGED Viewed

@@ -2,7 +2,7 @@
 The flagship product of [Autonomi](https://www.autonomi.dev/). Loki Mode is a spec-driven autonomous builder with a built-in trust layer that takes any spec to a deployed product and verifies completion with evidence (quality gates plus a completion council), not just a "done" claim. Complete installation instructions for all platforms and use cases.
-**Version:** v7.23.0
+**Version:** v7.24.0
 ---

package/loki-ts/dist/loki.js CHANGED Viewed

@@ -1,5 +1,5 @@
 // @bun
-var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.23.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
+var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var Z in Q)f8($,Z,{get:Q[Z],enumerable:!0,configurable:!0,set:c8.bind(Q,Z)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let Z=l1($);if(Z===$)break;$=Z}return n(j$,"..","..","..")}function d1($){let Q=$;for(let Z=0;Z<6;Z++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let z=l1(Q);if(z===Q)break;Q=z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.24.0";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),Z=d1(Q);$1=o8(n8(Z,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let Z=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),z,K;if(Q.timeoutMs&&Q.timeoutMs>0)z=setTimeout(()=>{try{Z.kill("SIGTERM")}catch{}K=setTimeout(()=>{try{Z.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(Z.stdout).text(),new Response(Z.stderr).text(),Z.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(z)clearTimeout(z);if(K)clearTimeout(K)}}async function t8($,Q={}){let Z=await j($,Q);if(Z.exitCode!==0)throw new a1(`command failed (${Z.exitCode}): ${$.join(" ")}`,Z.exitCode,Z.stdout,Z.stderr);return Z}async function v($){let Q=r8($),Z=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(Z.exitCode===0)return Z.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let z=await j([$,Q],{timeoutMs:5000});if(z.exitCode!==0)return null;return((z.stdout||z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,Z,z){super($);this.message=$;this.exitCode=Q;this.stdout=Z;this.stderr=z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,_,KZ,A,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),_=a("\x1B[1;33m"),KZ=a("\x1B[0;34m"),A=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let Z=await v("python3");return B1=Z,Z}async function Z1($,Q={}){let Z=await Q1();if(!Z)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([Z,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
 `),process.stdout.write(`Install with:
 `),process.stdout.write(`  brew install jq    (macOS)
 `),process.stdout.write(`  apt install jq     (Debian/Ubuntu)
@@ -787,4 +787,4 @@ Set LOKI_LEGACY_BASH=1 to force the bash CLI for every command.
 `),2}default:return process.stderr.write(`Unknown command: ${Q}
 `),process.stderr.write(v8),2}}g$();process.on("SIGINT",()=>process.exit(130));process.on("SIGTERM",()=>process.exit(143));var l3=await p3(Bun.argv.slice(2));process.exit(l3);
-//# debugId=54BA5C5C5E9F714F64756E2164756E21
+//# debugId=5BE33C3C3E53FD8864756E2164756E21

package/mcp/__init__.py CHANGED Viewed

@@ -57,4 +57,4 @@ try:
 except ImportError:
     __all__ = ['mcp']
-__version__ = '7.23.0'
+__version__ = '7.24.0'

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "loki-mode",
-  "version": "7.23.0",
+  "version": "7.24.0",
   "description": "Loki Mode by Autonomi. Autonomous spec-to-product system: takes a PRD, GitHub issue, OpenAPI/JSON/YAML, or one-line brief to a deployed app via the RARV-C closure loop with 11 quality gates. Provider-agnostic (Claude Code, OpenAI Codex, Cline, Aider).",
   "keywords": [
     "agent",