npm - loki-mode - Versions diffs - 5.48.2 → 5.49.1 - Mend

loki-mode 5.48.2 → 5.49.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/README.md +25 -40
package/SKILL.md +3 -3
package/VERSION +1 -1
package/autonomy/CONSTITUTION.md +2 -2
package/autonomy/app-runner.sh +9 -0
package/autonomy/completion-council.sh +58 -4
package/autonomy/hooks/validate-bash.sh +11 -1
package/autonomy/loki +107 -0
package/autonomy/run.sh +299 -4
package/autonomy/sandbox.sh +9 -0
package/dashboard/__init__.py +1 -1
package/dashboard/server.py +39 -1
package/docs/COMPARISON.md +10 -10
package/docs/COMPETITIVE-ANALYSIS.md +3 -3
package/docs/INSTALLATION.md +20 -12
package/docs/auto-claude-comparison.md +1 -1
package/docs/cursor-comparison.md +3 -3
package/docs/thick2thin.md +2 -2
package/mcp/__init__.py +1 -1
package/package.json +1 -1
package/references/agent-types.md +2 -2
package/references/agents.md +1 -1
package/references/competitive-analysis.md +1 -1
package/skills/agents.md +3 -3
package/skills/parallel-workflows.md +1 -1
package/skills/quality-gates.md +1 -1

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # Loki Mode
-**The Flagship Product of [Autonomi](https://www.autonomi.dev/) -- The First Truly Autonomous Multi-Agent Startup System**
+**The Flagship Product of [Autonomi](https://www.autonomi.dev/) -- An Autonomous Multi-Agent Development System**
 [![npm version](https://img.shields.io/npm/v/loki-mode)](https://www.npmjs.com/package/loki-mode)
 [![npm downloads](https://img.shields.io/npm/dw/loki-mode)](https://www.npmjs.com/package/loki-mode)
@@ -9,17 +9,15 @@
 [![GitHub Marketplace](https://img.shields.io/badge/Marketplace-Loki%20Mode-purple?logo=github)](https://github.com/marketplace/actions/loki-mode-code-review)
 [![Autonomi](https://img.shields.io/badge/Autonomi-autonomi.dev-5B4EEA)](https://www.autonomi.dev/)
 [![Agent Types](https://img.shields.io/badge/Agent%20Types-41-blue)]()
-[![Loki Mode](https://img.shields.io/badge/Loki%20Mode-98.78%25%20Pass%401-blueviolet)](benchmarks/results/)
-[![HumanEval](https://img.shields.io/badge/HumanEval-98.17%25%20Pass%401-brightgreen)](benchmarks/results/)
-[![SWE-bench](https://img.shields.io/badge/SWE--bench-99.67%25%20Patch%20Gen-brightgreen)](benchmarks/results/)
+[![Benchmarks](https://img.shields.io/badge/Benchmarks-Infrastructure%20Ready-blue)](benchmarks/)
-**Current Version: v5.47.0**
+**Current Version: v5.49.0**
 **[Autonomi](https://www.autonomi.dev/)** | **[Documentation](https://www.autonomi.dev/docs)** | **[GitHub](https://github.com/asklokesh/loki-mode)**
-> **PRD → Deployed Product in Zero Human Intervention**
+> **PRD to Deployed Product with Minimal Human Intervention**
 >
-> Loki Mode transforms a Product Requirements Document into a fully built, tested, deployed, and revenue-generating product while you sleep. No manual steps. No intervention. Just results.
+> Loki Mode transforms a Product Requirements Document into a fully built, tested, and deployed product with autonomous multi-agent execution. Human oversight for deployment credentials, domain setup, and critical decisions.
 ---
@@ -79,7 +77,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
-      - uses: asklokesh/loki-mode@v5.38
+      - uses: asklokesh/loki-mode@v5
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
           mode: review          # review, fix, or test
@@ -163,40 +161,27 @@ See [skills/providers.md](skills/providers.md) for full provider documentation.
 ---
-## Benchmark Results
+## Benchmarks
-### Three-Way Comparison (HumanEval)
+Benchmark infrastructure is included for HumanEval and SWE-bench evaluation. Results are self-reported from the included test harness and have not been independently verified.
-| System | Pass@1 | Details |
-|--------|--------|---------|
-| **Loki Mode (Multi-Agent)** | **98.78%** | 162/164 problems, RARV cycle recovered 2 |
-| Direct Claude | 98.17% | 161/164 problems (baseline) |
-| MetaGPT | 85.9-87.7% | Published benchmark |
+| Benchmark | Result | Notes |
+|-----------|--------|-------|
+| HumanEval | 162/164 (98.78%) | Self-reported, max 3 retries per problem |
+| SWE-bench | 299/300 patches generated | Patch generation only -- SWE-bench evaluator not yet run to verify correctness |
-**Loki Mode beats MetaGPT by +11-13%** thanks to the RARV (Reason-Act-Reflect-Verify) cycle.
+**Note:** SWE-bench "patch generation" means the system produced a patch file, not that the patch correctly resolves the issue. The SWE-bench evaluator should be run to determine actual resolution rates.
-### Full Results
-| Benchmark | Score | Details |
-|-----------|-------|---------|
-| **Loki Mode HumanEval** | **98.78% Pass@1** | 162/164 (multi-agent with RARV) |
-| **Direct Claude HumanEval** | **98.17% Pass@1** | 161/164 (single agent baseline) |
-| **Direct Claude SWE-bench** | **99.67% patch gen** | 299/300 problems |
-| **Loki Mode SWE-bench** | **99.67% patch gen** | 299/300 problems |
-| Model | Claude Opus 4.5 | |
-**Key Finding:** Multi-agent RARV matches single-agent performance on both benchmarks after timeout optimization. The 4-agent pipeline (Architect->Engineer->QA->Reviewer) achieves the same 99.67% patch generation as direct Claude.
-See [benchmarks/results/](benchmarks/results/) for full methodology and solutions.
+See [benchmarks/](benchmarks/) for the test harness and raw results.
 ---
 ## What is Loki Mode?
-Loki Mode is a multi-provider AI skill that orchestrates **41 specialized AI agent types** across **7 swarms** to autonomously build, test, deploy, and scale complete startups. Works with **Claude Code**, **OpenAI Codex CLI**, and **Google Gemini CLI**. It dynamically spawns only the agents you need—**5-10 for simple projects, 100+ for complex startups**—working in parallel with continuous self-verification.
+Loki Mode is a multi-provider AI skill that orchestrates **41 specialized AI agent types** across **8 swarms** to autonomously build, test, and deploy software projects. Works with **Claude Code**, **OpenAI Codex CLI**, and **Google Gemini CLI**. It dynamically spawns agents as needed -- typically **5-10 for simple projects, more for complex ones** -- working in parallel with continuous self-verification.
 ```
-PRD → Research → Architecture → Development → Testing → Deployment → Marketing → Revenue
+PRD → Research → Architecture → Development → Testing → Deployment → Marketing
 ```
 **Just say "Loki Mode" and point to a PRD. Walk away. Come back to a deployed product.**
@@ -205,11 +190,11 @@ PRD → Research → Architecture → Development → Testing → Deployment →
 ## Why Loki Mode?
-### **Better Than Anything Out There**
+### **How It Works**
 | What Others Do | What Loki Mode Does |
 |----------------|---------------------|
-| **Single agent** writes code linearly | **100+ agents** work in parallel across engineering, ops, business, data, product, and growth |
+| **Single agent** writes code linearly | **Multiple agents** work in parallel across engineering, ops, business, data, product, and growth |
 | **Manual deployment** required | **Autonomous deployment** to AWS, GCP, Azure, Vercel, Railway with blue-green and canary strategies |
 | **No testing** or basic unit tests | **7 automated quality gates**: input/output guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage |
 | **Code only** - you handle the rest | **Full business operations**: marketing, sales, legal, HR, finance, investor relations |
@@ -221,8 +206,8 @@ PRD → Research → Architecture → Development → Testing → Deployment →
 ### **Core Advantages**
-1. **Truly Autonomous**: RARV (Reason-Act-Reflect-Verify) cycle with self-verification achieves 2-3x quality improvement
-2. **Massively Parallel**: 100+ agents working simultaneously, not sequential single-agent bottlenecks
+1. **Self-Verifying**: RARV (Reason-Act-Reflect-Verify) cycle with continuous self-verification catches errors early
+2. **Parallel Execution**: Multiple agents working simultaneously, not sequential single-agent bottlenecks
 3. **Production-Ready**: Not just code—handles deployment, monitoring, incident response, and business operations
 4. **Self-Improving**: Learns from mistakes, updates continuity logs, prevents repeated errors
 5. **Zero Babysitting**: Auto-resumes on rate limits, recovers from failures, runs until completion
@@ -255,7 +240,7 @@ PRD → Research → Architecture → Development → Testing → Deployment →
 | **GitHub Integration** | Issue import, PR creation, status sync | [GitHub Integration](skills/github-integration.md) |
 | **Distribution** | npm, Homebrew, Docker installation | [Installation Guide](docs/INSTALLATION.md) |
 | **Research Foundation** | OpenAI, DeepMind, Anthropic patterns | [Acknowledgements](docs/ACKNOWLEDGEMENTS.md) |
-| **Benchmarks** | HumanEval 98.78%, SWE-bench 99.67% | [Benchmark Results](benchmarks/results/) |
+| **Benchmarks** | HumanEval and SWE-bench infrastructure included | [Benchmark Harness](benchmarks/) |
 | **Comparisons** | vs Auto-Claude, Cursor | [Auto-Claude](docs/auto-claude-comparison.md), [Cursor](docs/cursor-comparison.md) |
 ---
@@ -424,7 +409,7 @@ Loki Mode doesn't just write code—it **thinks, acts, learns, and verifies**:
    └─ Apply learning and RETRY from REASON
 ```
-**Result:** 2-3x quality improvement through continuous self-verification.
+**Result:** Improved quality through continuous self-verification and multi-reviewer code review.
 ### **Perpetual Improvement Mode**
@@ -561,7 +546,7 @@ graph TB
 **Key components:**
 - **RARV+C Cycle** -- Reason, Act, Reflect, Verify, Compound. Every iteration follows this loop. Failed verification triggers retry from Reason.
 - **Provider Layer** -- Claude Code (full parallel agents, Task tool, MCP), Codex CLI and Gemini CLI (sequential, degraded mode).
-- **Agent Swarms** -- 41 specialized agent types across 7 swarms, spawned on demand based on project complexity.
+- **Agent Swarms** -- 41 specialized agent types across 8 swarms, spawned on demand based on project complexity.
 - **Completion Council** -- 3 members vote on whether the project is done. Anti-sycophancy devil's advocate on unanimous votes.
 - **Memory System** -- Episodic traces, semantic patterns, procedural skills. Progressive disclosure reduces context usage by 60-80%.
 - **Dashboard** -- FastAPI server reading `.loki/` flat files, with real-time web UI for task queue, agents, logs, and council state. Now with TLS/HTTPS, OIDC/SSO, and RBAC (v5.36.0-v5.37.0).
@@ -609,7 +594,7 @@ Config search order: `.loki/config.yaml` (project) -> `~/.config/loki-mode/confi
 ## Agent Swarms (41 Types)
-Loki Mode has **41 predefined agent types** organized into **7 specialized swarms**. The orchestrator spawns only what you need—simple projects use 5-10 agents, complex startups spawn 100+.
+Loki Mode has **41 predefined agent types** organized into **8 specialized swarms**. The orchestrator spawns only what you need -- simple projects typically use 5-10 agents, complex ones may use more.
 <img width="5309" height="979" alt="Agent Swarms Visualization" src="https://github.com/user-attachments/assets/7d18635d-a606-401f-8d9f-430e6e4ee689" />
@@ -981,7 +966,7 @@ Built for the [Claude Code](https://claude.ai) ecosystem, powered by Anthropic's
 Loki Mode is the flagship product of **[Autonomi](https://www.autonomi.dev/)** -- a platform for autonomous AI systems. Like Alphabet is to Google, Autonomi is the parent brand under which Loki Mode and future products operate.
-**Why Autonomi?** Loki Mode proved that multi-agent autonomous systems can build real software from a PRD with zero human intervention. Autonomi is the expansion of that vision into a broader platform of autonomous services and products.
+**Why Autonomi?** Loki Mode proved that multi-agent autonomous systems can build real software from a PRD with minimal human intervention. Autonomi is the expansion of that vision into a broader platform of autonomous services and products.
 - **[autonomi.dev](https://www.autonomi.dev/)** -- Main website
 - **[Documentation](https://www.autonomi.dev/docs)** -- Full documentation

package/SKILL.md CHANGED

@@ -1,9 +1,9 @@
 ---
 name: loki-mode
-description: Multi-agent autonomous startup system. Triggers on "Loki Mode". Takes PRD to deployed product with zero human intervention. Requires --dangerously-skip-permissions flag.
+description: Multi-agent autonomous startup system. Triggers on "Loki Mode". Takes PRD to deployed product with minimal human intervention. Requires --dangerously-skip-permissions flag.
 ---
-# Loki Mode v5.48.2
+# Loki Mode v5.49.1
 **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
@@ -263,4 +263,4 @@ The following features are documented in skill modules but not yet fully automat
 | Quality gates 3-reviewer system | Implemented (v5.35.0) | 5 specialist reviewers in `skills/quality-gates.md`; execution in run.sh |
 | Benchmarks (HumanEval, SWE-bench) | Infrastructure only | Runner scripts and datasets exist in `benchmarks/`; no published results |
-**v5.48.2 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+**v5.49.1 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**

package/VERSION CHANGED

@@ -1 +1 @@
-5.48.2
+5.49.1

package/autonomy/CONSTITUTION.md CHANGED

@@ -142,7 +142,7 @@ GROWTH ──[continuous improvement loop]──> GROWTH
 - `Bash` - Command execution
 - `platform-orchestrator` - Deployment and service management
-**The 37 agent types are ROLES defined through prompts, not subagent_types.**
+**The 41 agent types are ROLES defined through prompts, not subagent_types.**
 ---
@@ -158,7 +158,7 @@ skills/
   quality-gates.md             # 7-gate system, anti-sycophancy
   testing.md                   # Playwright, E2E, property-based
   production.md                # CI/CD, batch processing
-  agents.md                    # 37 agent types, A2A patterns
+  agents.md                    # 41 agent types, A2A patterns
   parallel-workflows.md        # Git worktrees, parallel streams
   troubleshooting.md           # Error recovery, fallbacks
   artifacts.md                 # Code generation patterns

package/autonomy/app-runner.sh CHANGED

@@ -432,6 +432,10 @@ app_runner_start() {
         (cd "$dir" && bash -c "$_APP_RUNNER_METHOD" >> "$_APP_RUNNER_DIR/app.log" 2>&1) &
     fi
     _APP_RUNNER_PID=$!
+    # Register with central PID registry if available
+    if type register_pid &>/dev/null; then
+        register_pid "$_APP_RUNNER_PID" "app-runner" "method=$_APP_RUNNER_METHOD"
+    fi
     # Write PID file
     echo "$_APP_RUNNER_PID" > "$_APP_RUNNER_DIR/app.pid"
@@ -497,6 +501,11 @@ app_runner_stop() {
         kill -KILL "-$_APP_RUNNER_PID" 2>/dev/null || kill -KILL "$_APP_RUNNER_PID" 2>/dev/null || true
     fi
+    # Unregister from central PID registry
+    if type unregister_pid &>/dev/null && [ -n "$_APP_RUNNER_PID" ]; then
+        unregister_pid "$_APP_RUNNER_PID"
+    fi
     rm -f "$_APP_RUNNER_DIR/app.pid"
     _write_app_state "stopped"
     log_info "App Runner: application stopped"
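
The two hunks above call `register_pid` and `unregister_pid` only when those functions are defined; their bodies are not part of this file's diff. The following is a minimal sketch of what such a registry could look like, assuming one `<pid>.json` file per process with `label` and `ppid` fields (the layout that `cmd_cleanup` in `package/autonomy/loki` reads). The `PID_REGISTRY_DIR` variable, the `extra` field, and both function bodies are hypothetical and are not taken from the package.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a file-based PID registry; NOT the package's implementation.
# Assumption: one JSON file per process, named <pid>.json.
PID_REGISTRY_DIR="${PID_REGISTRY_DIR:-.loki/pids}"   # assumed location

register_pid() {
    # $1 = pid, $2 = label, $3 = optional free-form details
    # Values are written as-is; this sketch does not JSON-escape them.
    mkdir -p "$PID_REGISTRY_DIR"
    printf '{"pid":%s,"label":"%s","ppid":%s,"extra":"%s"}\n' \
        "$1" "$2" "$$" "${3:-}" > "$PID_REGISTRY_DIR/$1.json"
}

unregister_pid() {
    # $1 = pid; removing a missing entry is not an error
    rm -f "$PID_REGISTRY_DIR/$1.json"
}
```

Recording the registering shell's PID as `ppid` is what would let a later cleanup pass tell an orphan (parent gone) from a process that still belongs to a live session.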

package/autonomy/completion-council.sh CHANGED

@@ -45,6 +45,14 @@ COUNCIL_MIN_ITERATIONS=${LOKI_COUNCIL_MIN_ITERATIONS:-3}
 COUNCIL_CONVERGENCE_WINDOW=${LOKI_COUNCIL_CONVERGENCE_WINDOW:-3}
 COUNCIL_STAGNATION_LIMIT=${LOKI_COUNCIL_STAGNATION_LIMIT:-5}
+# Error budget: severity-aware completion (v5.49.0)
+# SEVERITY_THRESHOLD: minimum severity that blocks completion (critical, high, medium, low)
+#   "critical" = only critical issues block (most permissive)
+#   "low" = all issues block (strictest, default for backwards compat)
+# ERROR_BUDGET: fraction of non-blocking issues allowed (0.0 = none, 0.1 = 10% tolerance)
+COUNCIL_SEVERITY_THRESHOLD=${LOKI_COUNCIL_SEVERITY_THRESHOLD:-low}
+COUNCIL_ERROR_BUDGET=${LOKI_COUNCIL_ERROR_BUDGET:-0.0}
 # Internal state
 COUNCIL_STATE_DIR=""
 COUNCIL_PRD_PATH=""
@@ -235,6 +243,38 @@ council_vote() {
         local vote_result
         vote_result=$(echo "$verdict" | grep -oE "VOTE:\s*(APPROVE|REJECT)" | grep -oE "APPROVE|REJECT" | head -1)
+        # Extract severity-categorized issues (v5.49.0 error budget)
+        local member_issues=""
+        member_issues=$(echo "$verdict" | grep -oE "ISSUES:\s*(CRITICAL|HIGH|MEDIUM|LOW):.*" || true)
+        # If error budget is active and member rejected, check if rejection
+        # is based only on issues below the severity threshold
+        if [ "$vote_result" = "REJECT" ] && [ "$COUNCIL_SEVERITY_THRESHOLD" != "low" ] && [ -n "$member_issues" ]; then
+            local has_blocking_issue=false
+            local severity_order="critical high medium low"
+            local threshold_reached=false
+            while IFS= read -r issue_line; do
+                local issue_severity
+                issue_severity=$(echo "$issue_line" | grep -oE "(CRITICAL|HIGH|MEDIUM|LOW)" | head -1 | tr '[:upper:]' '[:lower:]')
+                # Check if this severity meets or exceeds the threshold
+                for sev in $severity_order; do
+                    if [ "$sev" = "$COUNCIL_SEVERITY_THRESHOLD" ]; then
+                        threshold_reached=true
+                    fi
+                    if [ "$sev" = "$issue_severity" ] && [ "$threshold_reached" = "false" ]; then
+                        has_blocking_issue=true
+                        break
+                    fi
+                done
+            done <<< "$member_issues"
+            if [ "$has_blocking_issue" = "false" ]; then
+                log_info "  Member $member ($role): REJECT overridden to APPROVE (issues below ${COUNCIL_SEVERITY_THRESHOLD} threshold)"
+                vote_result="APPROVE"
+            fi
+        fi
         if [ "$vote_result" = "APPROVE" ]; then
             ((approve_count++))
             log_info "  Member $member ($role): APPROVE"
@@ -618,23 +658,37 @@ council_member_review() {
             ;;
     esac
+    local severity_instruction=""
+    if [ "$COUNCIL_SEVERITY_THRESHOLD" != "low" ]; then
+        severity_instruction="
+ERROR BUDGET: This council uses severity-aware evaluation.
+- Categorize each issue as CRITICAL, HIGH, MEDIUM, or LOW severity
+- Blocking threshold: ${COUNCIL_SEVERITY_THRESHOLD} and above
+- Only issues at ${COUNCIL_SEVERITY_THRESHOLD} severity or above should cause REJECT
+- Issues below threshold are acceptable (error budget: ${COUNCIL_ERROR_BUDGET})
+- List issues as ISSUES: SEVERITY:description (one per line)"
+    fi
     local prompt="You are a council member reviewing project completion.
 ${role_instruction}
 EVIDENCE:
 ${evidence}
+${severity_instruction}
 INSTRUCTIONS:
 1. Review the evidence carefully
 2. Determine if the project meets completion criteria
 3. Output EXACTLY one line starting with VOTE:APPROVE or VOTE:REJECT
 4. Output EXACTLY one line starting with REASON: explaining your decision
-5. Be honest - do not approve incomplete work
+5. If issues found, output lines starting with ISSUES: SEVERITY:description
+6. Be honest - do not approve incomplete work
-Output format (exactly two lines):
+Output format:
 VOTE:APPROVE or VOTE:REJECT
-REASON: your reasoning here"
+REASON: your reasoning here
+ISSUES: CRITICAL:description (optional, one per line per issue)"
     local verdict_file="$vote_dir/member-${member_id}.txt"
@@ -1300,5 +1354,5 @@ council_get_dashboard_state() {
         state_json=$(cat "$COUNCIL_STATE_DIR/state.json" 2>/dev/null || echo "{}")
     fi
-    echo "\"council\": {\"enabled\": true, \"size\": $COUNCIL_SIZE, \"threshold\": $COUNCIL_THRESHOLD, \"check_interval\": $COUNCIL_CHECK_INTERVAL, \"consecutive_no_change\": $COUNCIL_CONSECUTIVE_NO_CHANGE, \"done_signals\": $COUNCIL_DONE_SIGNALS, \"iteration\": $ITERATION_COUNT, \"state\": $state_json}"
+    echo "\"council\": {\"enabled\": true, \"size\": $COUNCIL_SIZE, \"threshold\": $COUNCIL_THRESHOLD, \"check_interval\": $COUNCIL_CHECK_INTERVAL, \"consecutive_no_change\": $COUNCIL_CONSECUTIVE_NO_CHANGE, \"done_signals\": $COUNCIL_DONE_SIGNALS, \"iteration\": $ITERATION_COUNT, \"severity_threshold\": \"$COUNCIL_SEVERITY_THRESHOLD\", \"error_budget\": $COUNCIL_ERROR_BUDGET, \"state\": $state_json}"
 }
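
The prompt text added above says that issues at `${COUNCIL_SEVERITY_THRESHOLD}` severity or above should cause a REJECT. One way to express that comparison is to map each severity name to a numeric rank and compare ranks. The sketch below does this for `ISSUES: SEVERITY:description` lines; the function names `severity_rank`, `issue_blocks`, and `verdict_has_blocking_issue` are hypothetical and do not appear in `completion-council.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: rank-based severity comparison. Function names are hypothetical.

severity_rank() {
    # Higher number = more severe; unrecognized names rank 0.
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        critical) echo 4 ;;
        high)     echo 3 ;;
        medium)   echo 2 ;;
        low)      echo 1 ;;
        *)        echo 0 ;;
    esac
}

issue_blocks() {
    # $1 = issue severity, $2 = threshold.
    # Succeeds when the issue is at or above the threshold.
    [ "$(severity_rank "$1")" -ge "$(severity_rank "$2")" ]
}

verdict_has_blocking_issue() {
    # $1 = full verdict text, $2 = threshold.
    # Scans every "ISSUES: SEVERITY:..." line; succeeds if any one blocks.
    local threshold="$2"
    printf '%s\n' "$1" |
        grep -oE 'ISSUES:[[:space:]]*(CRITICAL|HIGH|MEDIUM|LOW):' |
        while IFS= read -r line; do
            sev=$(printf '%s' "$line" | grep -oE 'CRITICAL|HIGH|MEDIUM|LOW' | head -1)
            if issue_blocks "$sev" "$threshold"; then
                echo yes
            fi
        done | grep -q yes
}
```

Because the rank is computed independently for each issue, no state carries over from one issue line to the next, and an issue exactly at the threshold counts as blocking.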

package/autonomy/hooks/validate-bash.sh CHANGED

@@ -30,11 +30,21 @@ BLOCKED_PATTERNS=(
     "wget.*\|.*sh"
     "curl.*\|.*bash"
     "wget.*\|.*bash"
+    # Config self-protection: prevent agents from corrupting internal state
+    "rm -rf \.loki"
+    "rm -rf \./\.loki"
+    "rm .*\.loki/council/"
+    "rm .*\.loki/config\.yaml"
+    "rm .*\.loki/logs/bash-audit"
+    "rm .*\.loki/session\.lock"
+    "> \.loki/council/"
+    "> \.loki/config\.yaml"
 )
-# Safe path patterns that override rm -rf / matches
+# Safe path patterns that override blocked pattern matches
 SAFE_PATTERNS=(
     "rm -rf /tmp/"
+    "rm -rf \.loki/queue/dead-letter"
 )
 # Check for blocked patterns
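
The matching code that consumes these two lists is outside the hunk. As an illustration of how a safe list can override a blocked list, the sketch below checks the safe patterns first. It assumes the patterns are extended regular expressions matched with `grep -E`, which the diff does not confirm. It uses newline-separated strings instead of the hook's arrays and abbreviates the pattern lists. `matches_any` and `is_command_allowed` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: safe patterns short-circuit blocked matches.
# Assumption: patterns are EREs; the real hook may match differently.

BLOCKED_PATTERNS='rm -rf \.loki
rm .*\.loki/config\.yaml
> \.loki/config\.yaml'

SAFE_PATTERNS='rm -rf /tmp/
rm -rf \.loki/queue/dead-letter'

matches_any() {
    # $1 = command string, $2 = newline-separated ERE patterns.
    # grep treats a newline-separated pattern list as alternatives.
    printf '%s\n' "$1" | grep -qE -e "$2"
}

is_command_allowed() {
    # Returns 0 if the command may run, 1 if it is blocked.
    if matches_any "$1" "$SAFE_PATTERNS"; then
        return 0
    fi
    if matches_any "$1" "$BLOCKED_PATTERNS"; then
        return 1
    fi
    return 0
}
```

In this sketch the override applies to the whole command string, so a chained command containing one safe fragment would pass; a stricter check would need to evaluate each sub-command separately.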

package/autonomy/loki CHANGED

@@ -9,6 +9,7 @@
 # Usage:
 #   loki start [PRD]      - Start Loki Mode (optionally with PRD)
 #   loki stop             - Stop execution immediately
+#   loki cleanup          - Kill orphaned processes from crashed sessions
 #   loki pause            - Pause after current session
 #   loki resume           - Resume paused execution
 #   loki status           - Show current status
@@ -312,6 +313,7 @@ show_help() {
     echo "  init             Build a PRD interactively or from templates"
     echo "  issue <url|num>  Generate PRD from GitHub issue and optionally start"
     echo "  stop             Stop execution immediately"
+    echo "  cleanup          Kill orphaned processes from crashed sessions"
     echo "  pause            Pause after current session"
     echo "  resume           Resume paused execution"
     echo "  status [--json]  Show current status (--json for machine-readable)"
@@ -704,6 +706,28 @@ except: pass
             rm -f "$LOKI_DIR/dashboard/dashboard.pid"
         fi
+        # Kill any remaining registered processes (2s graceful window matches run.sh)
+        if [ -d "$LOKI_DIR/pids" ]; then
+            for entry_file in "$LOKI_DIR/pids"/*.json; do
+                [ -f "$entry_file" ] || continue
+                local reg_pid
+                reg_pid=$(basename "$entry_file" .json)
+                case "$reg_pid" in ''|*[!0-9]*) continue ;; esac
+                if kill -0 "$reg_pid" 2>/dev/null; then
+                    kill "$reg_pid" 2>/dev/null || true
+                    local w=0
+                    while [ $w -lt 4 ] && kill -0 "$reg_pid" 2>/dev/null; do
+                        sleep 0.5
+                        w=$((w + 1))
+                    done
+                    if kill -0 "$reg_pid" 2>/dev/null; then
+                        kill -9 "$reg_pid" 2>/dev/null || true
+                    fi
+                fi
+                rm -f "$entry_file"
+            done
+        fi
         # Emit session stop event
         emit_event session cli stop "reason=user_requested"
         # Emit success pattern for clean stop (SYN-018)
@@ -730,6 +754,86 @@ except: pass
     fi
 }
+# Kill orphaned processes from crashed sessions
+cmd_cleanup() {
+    local pids_dir="$LOKI_DIR/pids"
+    local killed=0
+    local stale=0
+    if [ ! -d "$pids_dir" ]; then
+        echo "No PID registry found. Nothing to clean up."
+        exit 0
+    fi
+    echo -e "${BOLD}Scanning for orphaned processes...${NC}"
+    for entry_file in "$pids_dir"/*.json; do
+        [ -f "$entry_file" ] || continue
+        local pid
+        pid=$(basename "$entry_file" .json)
+        case "$pid" in
+            ''|*[!0-9]*) continue ;;
+        esac
+        local label=""
+        local ppid_val=""
+        # Parse JSON fields (python3 with shell fallback)
+        if command -v python3 >/dev/null 2>&1; then
+            label=$(python3 -c "import json,sys; print(json.load(open(sys.argv[1])).get('label','unknown'))" "$entry_file" 2>/dev/null) || label="unknown"
+            ppid_val=$(python3 -c "import json,sys; print(json.load(open(sys.argv[1])).get('ppid',''))" "$entry_file" 2>/dev/null) || true
+        else
+            label=$(sed 's/.*"label":"//' "$entry_file" 2>/dev/null | sed 's/".*//' | head -1) || label="unknown"
+            ppid_val=$(sed 's/.*"ppid"://' "$entry_file" 2>/dev/null | sed 's/[,}].*//' | head -1) || true
+        fi
+        if kill -0 "$pid" 2>/dev/null; then
+            # Process is alive - check if parent is dead (orphan)
+            local is_orphan=false
+            # Validate ppid_val is numeric before using with kill
+            case "$ppid_val" in ''|*[!0-9]*) ppid_val="" ;; esac
+            if [ -n "$ppid_val" ] && ! kill -0 "$ppid_val" 2>/dev/null; then
+                is_orphan=true
+            fi
+            if [ "$is_orphan" = true ] || [ "${1:-}" = "--force" ]; then
+                echo -e "  ${RED}Killing${NC} PID=$pid label=$label (parent $ppid_val dead)"
+                kill "$pid" 2>/dev/null || true
+                sleep 0.5
+                if kill -0 "$pid" 2>/dev/null; then
+                    kill -9 "$pid" 2>/dev/null || true
+                fi
+                rm -f "$entry_file"
+                killed=$((killed + 1))
+            else
+                echo -e "  ${GREEN}Alive${NC}  PID=$pid label=$label (parent $ppid_val alive)"
+            fi
+        else
+            # Process is dead - clean up stale entry
+            rm -f "$entry_file"
+            stale=$((stale + 1))
+        fi
+    done
+    echo ""
+    echo "Results: $killed orphan(s) killed, $stale stale entries cleaned"
+    # Also kill orphaned loki-run temp scripts
+    local temp_killed=0
+    if pgrep -f "loki-run-" >/dev/null 2>&1; then
+        if ! is_session_running; then
+            echo "Killing orphaned loki-run temp scripts..."
+            pkill -f "loki-run-" 2>/dev/null || true
+            sleep 0.5
+            pkill -9 -f "loki-run-" 2>/dev/null || true
+            temp_killed=1
+        fi
+    fi
+    if [ $killed -eq 0 ] && [ $stale -eq 0 ] && [ $temp_killed -eq 0 ]; then
+        echo -e "${GREEN}System is clean. No orphans found.${NC}"
+    fi
+}
 # Pause after current session
 cmd_pause() {
     if [ ! -d "$LOKI_DIR" ]; then
@@ -4497,6 +4601,9 @@ main() {
         stop)
             cmd_stop
             ;;
+        cleanup)
+            cmd_cleanup "$@"
+            ;;
         pause)
             cmd_pause
             ;;
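
Both the `stop` path and `cmd_cleanup` above follow the same pattern: send SIGTERM, wait briefly, then send SIGKILL if the process is still alive. That pattern can be factored into a single helper, sketched below. `terminate_gracefully` is a hypothetical name and is not defined in the package; the default of four polls at 0.5 s each mirrors the 2 s window used in the `stop` hunk.

```bash
#!/usr/bin/env bash
# Sketch only: TERM, poll, then KILL. Helper name is hypothetical.

terminate_gracefully() {
    # $1 = pid, $2 = maximum number of 0.5s polls (default 4, about 2s)
    local pid="$1" max_polls="${2:-4}" polls=0

    # Reject empty or non-numeric input before passing it to kill.
    case "$pid" in ''|*[!0-9]*) return 2 ;; esac

    # Nothing to do if the process is already gone.
    kill -0 "$pid" 2>/dev/null || return 0

    kill "$pid" 2>/dev/null || true            # SIGTERM
    while [ "$polls" -lt "$max_polls" ] && kill -0 "$pid" 2>/dev/null; do
        sleep 0.5
        polls=$((polls + 1))
    done

    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null || true     # SIGKILL as a last resort
    fi
    return 0
}
```

A caller iterating over registry entries would invoke it as `terminate_gracefully "$reg_pid"` and then remove the entry file.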