npm - loki-mode - Versions diffs - 5.49.1 → 5.49.3 - Mend

loki-mode 5.49.1 → 5.49.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/README.md +57 -83
package/SKILL.md +2 -2
package/VERSION +1 -1
package/autonomy/CONSTITUTION.md +2 -2
package/dashboard/__init__.py +1 -1
package/dashboard/server.py +140 -26
package/dashboard/static/index.html +1 -1
package/docs/COMPARISON.md +5 -5
package/docs/COMPETITIVE-ANALYSIS.md +2 -2
package/docs/INSTALLATION.md +23 -140
package/docs/alternative-installations.md +145 -0
package/docs/cursor-comparison.md +4 -4
package/mcp/__init__.py +1 -1
package/package.json +1 -1
package/references/core-workflow.md +1 -1
package/references/quality-control.md +10 -0
package/skills/00-index.md +1 -1
package/skills/artifacts.md +1 -1
package/skills/quality-gates.md +21 -1
package/skills/testing.md +15 -0

package/README.md CHANGED

@@ -11,7 +11,7 @@
 [![Agent Types](https://img.shields.io/badge/Agent%20Types-41-blue)]()
 [![Benchmarks](https://img.shields.io/badge/Benchmarks-Infrastructure%20Ready-blue)](benchmarks/)
-**Current Version: v5.49.0**
+**Current Version: v5.49.3**
 **[Autonomi](https://www.autonomi.dev/)** | **[Documentation](https://www.autonomi.dev/docs)** | **[GitHub](https://github.com/asklokesh/loki-mode)**
@@ -25,7 +25,7 @@
 [![asciicast](https://asciinema.org/a/AjjnjzOeKLYItp6s.svg)](https://asciinema.org/a/AjjnjzOeKLYItp6s)
-*Click to watch Loki Mode v5.42 -- CLI commands, dashboard, 8 parallel agents, 7-gate quality, Completion Council, memory system*
+*Click to watch Loki Mode v5.42 -- CLI commands, dashboard, 8 parallel agents, 9-gate quality, Completion Council, memory system*
 ---
@@ -39,98 +39,38 @@
 ---
-## Usage
-### Option 1: npm (Recommended)
+## Installation
 ```bash
-npm install -g loki-mode
-loki start ./my-prd.md
+git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
 ```
-### Option 2: Claude Code Skill
+That's it. Claude Code auto-discovers skills in `~/.claude/skills/`.
+### Use It
 ```bash
-git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
 claude --dangerously-skip-permissions
-# Then say: Loki Mode with PRD at ./my-prd.md
+# Then say: "Loki Mode with PRD at ./my-prd.md"
 ```
-### Option 3: GitHub Action
+### Update
-Add automated AI code review to your pull requests:
-```yaml
-# .github/workflows/loki-review.yml
-name: Loki Code Review
-on:
-  pull_request:
-    types: [opened, synchronize]
-permissions:
-  contents: read
-  pull-requests: write
-jobs:
-  review:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      - uses: asklokesh/loki-mode@v5
-        with:
-          github_token: ${{ secrets.GITHUB_TOKEN }}
-          mode: review          # review, fix, or test
-          provider: claude      # claude, codex, or gemini
-          max_iterations: 3     # sets LOKI_MAX_ITERATIONS env var
-          budget_limit: '5.00'  # max cost in USD (maps to --budget flag)
-        env:
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-```
-**Prerequisites:**
-- An API key for your chosen provider (set as a repository secret):
-  - Claude: `ANTHROPIC_API_KEY`
-  - Codex: `OPENAI_API_KEY`
-  - Gemini: `GOOGLE_API_KEY`
-- The action automatically installs `loki-mode` and `@anthropic-ai/claude-code` (for the Claude provider)
-**Action Inputs:**
-| Input | Default | Description |
-|-------|---------|-------------|
-| `mode` | `review` | `review`, `fix`, or `test` |
-| `provider` | `claude` | `claude`, `codex`, or `gemini` |
-| `budget_limit` | `5.00` | Max cost in USD (maps to `--budget` CLI flag) |
-| `budget` | | Alias for `budget_limit` |
-| `max_iterations` | `3` | Sets `LOKI_MAX_ITERATIONS` env var |
-| `github_token` | (required) | GitHub token for PR comments |
-| `prd_file` | | Path to PRD file relative to repo root |
-| `auto_confirm` | `true` | Skip confirmation prompts (always true in CI) |
-| `install_claude` | `true` | Auto-install Claude Code CLI if not present |
-| `node_version` | `20` | Node.js version |
-**Using with a PRD file (fix/test modes):**
-```yaml
-- uses: asklokesh/loki-mode@v5
-  with:
-    mode: fix
-    prd_file: 'docs/my-prd.md'
-    github_token: ${{ secrets.GITHUB_TOKEN }}
-  env:
-    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```bash
+cd ~/.claude/skills/loki-mode && git pull
 ```
-**Modes:**
+### Troubleshooting
-| Mode | Description |
-|------|-------------|
-| `review` | Analyze PR diff, post structured review as PR comment |
-| `fix` | Automatically fix issues found in the codebase |
-| `test` | Run autonomous test generation and validation |
+| Problem | Fix |
+|---------|-----|
+| `SKILL.md` not found | Verify: `ls ~/.claude/skills/loki-mode/SKILL.md` |
+| Claude doesn't recognize "Loki Mode" | Restart Claude Code after cloning |
+| Permission denied on clone | Check SSH keys or use HTTPS URL above |
-Also available via **Homebrew**, **Docker**, **VS Code Extension**, and **direct shell script**. See the [Installation Guide](docs/INSTALLATION.md) for all 7 installation methods and detailed instructions.
+### Other Installation Methods
+Also available via **npm**, **Homebrew**, **Docker**, **GitHub Action**, and **VS Code Extension**. See [docs/alternative-installations.md](docs/alternative-installations.md) for details and limitations of each method.
 ### Multi-Provider Support (v5.0.0)
@@ -188,6 +128,40 @@ PRD → Research → Architecture → Development → Testing → Deployment →
 ---
+## Current Limitations
+Loki Mode is powerful but not magic. Be aware of these honest limitations:
+| Area | What Works | What Doesn't (Yet) |
+|------|-----------|---------------------|
+| **Code Generation** | Generates full-stack applications from PRDs | Complex domain logic may need human review and correction |
+| **Deployment** | Generates deployment configs and scripts | Does not have cloud credentials -- human must provide and authorize |
+| **Testing** | 9 automated quality gates, blind review | Test quality depends on AI-generated assertions; mutation testing is heuristic |
+| **Business Ops** | Generates marketing copy, legal templates | Does not actually send emails, file legal documents, or process payments |
+| **Multi-Provider** | Claude (full), Codex (degraded), Gemini (degraded) | Codex and Gemini lack parallel agents and Task tool -- sequential only |
+| **Memory System** | Episodic, semantic, procedural memory tiers | Vector search requires optional `sentence-transformers` dependency |
+| **Enterprise Security** | TLS, OIDC, RBAC, audit trail, SIEM configs | Self-signed certs only; production deployments need real certificates |
+| **Dashboard** | Real-time status, task queue, agent monitoring | Single-machine only; no multi-node dashboard clustering |
+| **Benchmarks** | HumanEval 98.78%, SWE-bench 299/300 patches | Self-reported; SWE-bench counts patch generation, not verified resolution |
+**What "autonomous" means in practice:**
+- Loki Mode runs without prompting between RARV cycles
+- It does NOT have access to your cloud accounts, payment systems, or external services unless you provide credentials
+- Human oversight is expected for: deployment credentials, domain setup, API keys, and critical business decisions
+- The system is as good as the underlying AI model -- it can make mistakes, especially on novel or complex problems
+## What To Expect
+| Project Type | Examples | Autonomy Level | Typical Experience |
+|---|---|---|---|
+| Simple | Landing page, todo app, static site, single API | High | Completes with minimal retries. Human reviews output. |
+| Standard | CRUD app with auth, REST API + React frontend | Medium | Completes most features. Complex components may need guidance. |
+| Complex | Microservices, real-time systems, ML pipelines | Guided | Use as accelerator. Human reviews between phases. |
+"Autonomous" means the system runs RARV cycles without prompting. It does NOT mean zero oversight.
+---
 ## Why Loki Mode?
 ### **How It Works**
@@ -196,7 +170,7 @@ PRD → Research → Architecture → Development → Testing → Deployment →
 |----------------|---------------------|
 | **Single agent** writes code linearly | **Multiple agents** work in parallel across engineering, ops, business, data, product, and growth |
 | **Manual deployment** required | **Autonomous deployment** to AWS, GCP, Azure, Vercel, Railway with blue-green and canary strategies |
-| **No testing** or basic unit tests | **7 automated quality gates**: input/output guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage |
+| **No testing** or basic unit tests | **9 automated quality gates**: input/output guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage, mock detection, mutation detection |
 | **Code only** - you handle the rest | **Full business operations**: marketing, sales, legal, HR, finance, investor relations |
 | **Stops on errors** | **Self-healing**: circuit breakers, dead letter queues, exponential backoff, automatic recovery |
 | **No visibility** into progress | **Real-time dashboard** with agent monitoring, task queues, and live status updates |
@@ -234,7 +208,7 @@ PRD → Research → Architecture → Development → Testing → Deployment →
 | **OpenClaw Bridge (v5.38.0)** | Multi-agent coordination protocol | [OpenClaw Integration](docs/openclaw-integration.md) |
 | **41 Agent Types** | Engineering, Ops, Business, Data, Product, Growth, Orchestration | [Agent Definitions](references/agent-types.md) |
 | **RARV Cycle** | Reason-Act-Reflect-Verify workflow | [Core Workflow](references/core-workflow.md) |
-| **Quality Gates** | 7-gate system: guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage | [Quality Control](references/quality-control.md) |
+| **Quality Gates** | 9-gate system: guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage, mock detection, mutation detection | [Quality Control](references/quality-control.md) |
 | **Memory System (v5.15.0)** | Complete 3-tier memory with progressive disclosure | [Memory Architecture](references/memory-system.md) |
 | **Parallel Workflows** | Git worktree-based parallelism | [Parallel Workflows](skills/parallel-workflows.md) |
 | **GitHub Integration** | Issue import, PR creation, status sync | [GitHub Integration](skills/github-integration.md) |
@@ -661,7 +635,7 @@ references/                    # Deep documentation (23KB+ files)
 | **2. Architecture** | Tech stack selection with self-reflection |
 | **3. Infrastructure** | Provision cloud, CI/CD, monitoring |
 | **4. Development** | Implement with TDD, parallel code review |
-| **5. QA** | 7 quality gates, security audit, load testing |
+| **5. QA** | 9 quality gates, security audit, load testing |
 | **6. Deployment** | Blue-green deploy, auto-rollback on errors |
 | **7. Business** | Marketing, sales, legal, support setup |
 | **8. Growth** | Continuous optimization, A/B testing, feedback loops |

package/SKILL.md CHANGED

@@ -3,7 +3,7 @@ name: loki-mode
 description: Multi-agent autonomous startup system. Triggers on "Loki Mode". Takes PRD to deployed product with minimal human intervention. Requires --dangerously-skip-permissions flag.
 ---
-# Loki Mode v5.49.1
+# Loki Mode v5.49.3
 **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
@@ -263,4 +263,4 @@ The following features are documented in skill modules but not yet fully automat
 | Quality gates 3-reviewer system | Implemented (v5.35.0) | 5 specialist reviewers in `skills/quality-gates.md`; execution in run.sh |
 | Benchmarks (HumanEval, SWE-bench) | Infrastructure only | Runner scripts and datasets exist in `benchmarks/`; no published results |
-**v5.49.1 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+**v5.49.3 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**

package/VERSION CHANGED

@@ -1 +1 @@
-5.49.1
+5.49.3

package/autonomy/CONSTITUTION.md CHANGED

@@ -155,7 +155,7 @@ SKILL.md (~190 lines)         # Always loaded: RARV cycle, autonomy rules
 skills/
   00-index.md                  # Module routing table
   model-selection.md           # Task tool, parallelization
-  quality-gates.md             # 7-gate system, anti-sycophancy
+  quality-gates.md             # 9-gate system, anti-sycophancy
   testing.md                   # Playwright, E2E, property-based
   production.md                # CI/CD, batch processing
   agents.md                    # 41 agent types, A2A patterns
@@ -196,7 +196,7 @@ Main Worktree (orchestrator)
 ---
-## Quality Gates (7-Gate System)
+## Quality Gates (9-Gate System)
 ### Gate 1: Static Analysis
 ```yaml

package/dashboard/__init__.py CHANGED

@@ -7,7 +7,7 @@ Modules:
     control: Session control API (start/stop/pause/resume)
 """
-__version__ = "5.49.1"
+__version__ = "5.49.3"
 # Expose the control app for easy import
 try:

package/dashboard/server.py CHANGED

@@ -3151,23 +3151,112 @@ async def get_github_sync_log(
 # =============================================================================
+def _resolve_process_state(pid: Optional[int], last_status: str = "",
+                           started: str = "", heartbeat: str = "",
+                           stale_threshold: int = 30) -> dict[str, Any]:
+    """Resolve process state with honest labels.
+    States:
+      RUNNING   - PID alive AND heartbeat < stale_threshold seconds
+      STALE     - PID alive BUT no heartbeat update in > stale_threshold seconds
+      COMPLETED - last_status marked done/completed and PID exited
+      FAILED    - last_status marked failed OR PID exited non-zero
+      CRASHED   - PID dead BUT last_status was 'running'
+      UNKNOWN   - No PID, no status, or conflicting data
+    Returns dict with: state, pid_alive, started, last_heartbeat, duration_seconds
+    """
+    now = datetime.now(timezone.utc)
+    pid_alive = False
+    if pid is not None:
+        try:
+            os.kill(pid, 0)
+            pid_alive = True
+        except (OSError, ValueError, TypeError):
+            pass
+    # Parse timestamps
+    started_dt = None
+    heartbeat_dt = None
+    if started:
+        try:
+            started_dt = datetime.fromisoformat(started.replace("Z", "+00:00"))
+            if started_dt.tzinfo is None:
+                started_dt = started_dt.replace(tzinfo=timezone.utc)
+        except (ValueError, AttributeError):
+            pass
+    if heartbeat:
+        try:
+            heartbeat_dt = datetime.fromisoformat(heartbeat.replace("Z", "+00:00"))
+            if heartbeat_dt.tzinfo is None:
+                heartbeat_dt = heartbeat_dt.replace(tzinfo=timezone.utc)
+        except (ValueError, AttributeError):
+            pass
+    # Calculate duration
+    duration_seconds = None
+    if started_dt:
+        duration_seconds = round((now - started_dt).total_seconds())
+    # Calculate heartbeat age
+    heartbeat_age = None
+    if heartbeat_dt:
+        heartbeat_age = round((now - heartbeat_dt).total_seconds())
+    # Resolve state
+    normalized = last_status.lower().strip() if last_status else ""
+    if pid_alive:
+        if heartbeat_age is not None and heartbeat_age > stale_threshold:
+            state = "STALE"
+        else:
+            state = "RUNNING"
+    else:
+        if normalized in ("done", "completed", "complete", "success"):
+            state = "COMPLETED"
+        elif normalized in ("failed", "error", "errored"):
+            state = "FAILED"
+        elif normalized in ("running", "active", "in_progress", "starting"):
+            state = "CRASHED"
+        elif pid is None:
+            state = "UNKNOWN"
+        else:
+            # PID dead, unknown last status
+            state = "CRASHED" if normalized == "" else "UNKNOWN"
+    result: dict[str, Any] = {
+        "state": state,
+        "pid_alive": pid_alive,
+    }
+    if started:
+        result["started"] = started
+    if heartbeat:
+        result["last_heartbeat"] = heartbeat
+    if heartbeat_age is not None:
+        result["heartbeat_age_seconds"] = heartbeat_age
+    if duration_seconds is not None:
+        result["duration_seconds"] = duration_seconds
+    return result
 @app.get("/api/health/processes")
 async def get_process_health(token: Optional[dict] = Depends(auth.get_current_token)):
-    """Get health status of all loki processes (dashboard, session, agents)."""
+    """Get health status of all loki processes (dashboard, session, agents).
+    Returns honest state labels: RUNNING, STALE, COMPLETED, FAILED, CRASHED, UNKNOWN.
+    Every entry includes timestamps (started, last_heartbeat, duration_seconds).
+    """
     result: dict[str, Any] = {"dashboard": None, "session": None, "agents": []}
     loki_dir = _get_loki_dir()
+    now_iso = datetime.now(timezone.utc).isoformat()
     # Dashboard PID
     dpid_file = loki_dir / "dashboard" / "dashboard.pid"
     if dpid_file.exists():
         try:
             dpid = int(dpid_file.read_text().strip())
-            try:
-                os.kill(dpid, 0)
-                result["dashboard"] = {"pid": dpid, "status": "alive"}
-            except OSError:
-                result["dashboard"] = {"pid": dpid, "status": "dead"}
+            state_info = _resolve_process_state(dpid, last_status="running")
+            result["dashboard"] = {"pid": dpid, **state_info}
         except (ValueError, OSError):
             pass
@@ -3176,14 +3265,23 @@ async def get_process_health(token: Optional[dict] = Depends(auth.get_current_to
     if spid_file.exists():
         try:
             spid = int(spid_file.read_text().strip())
-            try:
-                os.kill(spid, 0)
-                result["session"] = {"pid": spid, "status": "alive"}
-            except OSError:
-                result["session"] = {"pid": spid, "status": "dead"}
+            state_info = _resolve_process_state(spid, last_status="running")
+            result["session"] = {"pid": spid, **state_info}
         except (ValueError, OSError):
             pass
+    # Read dashboard-state.json for heartbeat timestamp
+    state_file = loki_dir / "dashboard-state.json"
+    state_heartbeat = ""
+    if state_file.exists():
+        try:
+            st = os.stat(state_file)
+            state_heartbeat = datetime.fromtimestamp(
+                st.st_mtime, tz=timezone.utc
+            ).isoformat()
+        except OSError:
+            pass
     # Agent PIDs
     agents_file = loki_dir / "state" / "agents.json"
     if agents_file.exists():
@@ -3191,18 +3289,21 @@ async def get_process_health(token: Optional[dict] = Depends(auth.get_current_to
             agents = json.loads(agents_file.read_text())
             for agent in agents:
                 pid = agent.get("pid")
-                status = "unknown"
-                if pid:
-                    try:
-                        os.kill(int(pid), 0)
-                        status = "alive"
-                    except (OSError, ValueError):
-                        status = "dead"
+                pid_int = int(pid) if pid else None
+                agent_status = agent.get("status", "")
+                agent_started = agent.get("started", "")
+                agent_heartbeat = agent.get("heartbeat", state_heartbeat)
+                state_info = _resolve_process_state(
+                    pid_int,
+                    last_status=agent_status,
+                    started=agent_started,
+                    heartbeat=agent_heartbeat,
+                )
                 result["agents"].append({
                     "id": agent.get("id", ""),
                     "name": agent.get("name", ""),
                     "pid": pid,
-                    "status": status,
+                    **state_info,
                 })
         except Exception:
             pass
@@ -3216,17 +3317,29 @@ async def get_process_health(token: Optional[dict] = Depends(auth.get_current_to
                 pid_str = entry_file.stem
                 pid = int(pid_str)
                 entry = json.loads(entry_file.read_text())
-                try:
-                    os.kill(pid, 0)
-                    status = "alive"
-                except OSError:
-                    status = "dead"
+                entry_started = entry.get("started", "")
+                entry_heartbeat = entry.get("heartbeat", "")
+                # Use file mtime as heartbeat fallback
+                if not entry_heartbeat:
+                    try:
+                        st = os.stat(entry_file)
+                        entry_heartbeat = datetime.fromtimestamp(
+                            st.st_mtime, tz=timezone.utc
+                        ).isoformat()
+                    except OSError:
+                        pass
+                entry_status = entry.get("status", "running")
+                state_info = _resolve_process_state(
+                    pid,
+                    last_status=entry_status,
+                    started=entry_started,
+                    heartbeat=entry_heartbeat,
+                )
                 registered.append({
                     "pid": pid,
                     "label": entry.get("label", "unknown"),
-                    "started": entry.get("started", ""),
                     "ppid": entry.get("ppid"),
-                    "status": status,
+                    **state_info,
                 })
             except (ValueError, json.JSONDecodeError, OSError):
                 continue
@@ -3234,6 +3347,7 @@ async def get_process_health(token: Optional[dict] = Depends(auth.get_current_to
     watchdog_enabled = os.environ.get("LOKI_WATCHDOG", "false").lower() == "true"
     result["watchdog_enabled"] = watchdog_enabled
+    result["checked_at"] = now_iso
     return result
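
The state labels introduced in this hunk follow a small decision table. The sketch below restates that table in isolation so it can be exercised without real processes. It is a condensed illustration, not the packaged code: `resolve_state` and `heartbeat_age_seconds` are hypothetical names, and PID liveness is passed in as a boolean (an assumption made for testability) instead of being probed with `os.kill(pid, 0)`.

```python
from datetime import datetime, timezone
from typing import Optional

# Status vocabularies copied from the hunk above.
DONE = {"done", "completed", "complete", "success"}
FAILED = {"failed", "error", "errored"}
ACTIVE = {"running", "active", "in_progress", "starting"}


def heartbeat_age_seconds(heartbeat: str, now: datetime) -> Optional[int]:
    """Return the age of an ISO-8601 heartbeat in whole seconds, or None."""
    if not heartbeat:
        return None
    try:
        parsed = datetime.fromisoformat(heartbeat.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Naive timestamps are treated as UTC, as in the diff.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return round((now - parsed).total_seconds())


def resolve_state(pid_known: bool, pid_alive: bool, last_status: str = "",
                  heartbeat_age: Optional[int] = None,
                  stale_threshold: int = 30) -> str:
    """Map liveness, heartbeat age, and last status to a state label."""
    normalized = last_status.lower().strip() if last_status else ""
    if pid_alive:
        # A live process is STALE only when a heartbeat exists and is too old.
        if heartbeat_age is not None and heartbeat_age > stale_threshold:
            return "STALE"
        return "RUNNING"
    if normalized in DONE:
        return "COMPLETED"
    if normalized in FAILED:
        return "FAILED"
    if normalized in ACTIVE:
        return "CRASHED"
    if not pid_known:
        return "UNKNOWN"
    # Dead PID with an empty status counts as a crash; anything else is unknown.
    return "CRASHED" if normalized == "" else "UNKNOWN"


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 0, 1, 0, tzinfo=timezone.utc)
    age = heartbeat_age_seconds("2025-01-01T00:00:00Z", now)
    print(age, resolve_state(True, True, "running", age))
```

Read against the hunk, this also shows why the dashboard and session entries can only report RUNNING or CRASHED: both are resolved with `last_status="running"` and no heartbeat, so the STALE, COMPLETED, and FAILED branches are reachable only for agents and registered processes.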

package/dashboard/static/index.html CHANGED

@@ -4774,7 +4774,7 @@ var LokiDashboard=(()=>{var X=Object.defineProperty;var gt=Object.getOwnProperty
         <p>Checklist not initialized</p>
         <p class="hint">The PRD checklist will be created during the first iteration when a PRD is provided.</p>
       </div>
-    `}_attachEventListeners(){let t=this.shadowRoot;t&&(t.querySelectorAll(".category-header[data-category]").forEach(e=>{e.addEventListener("click",()=>this._toggleCategory(e.dataset.category))}),t.querySelectorAll("button[data-waive-id]").forEach(e=>{e.addEventListener("click",a=>{a.stopPropagation(),this._waiveItem(e.dataset.waiveId)})}),t.querySelectorAll("button[data-unwaive-id]").forEach(e=>{e.addEventListener("click",a=>{a.stopPropagation(),this._unwaiveItem(e.dataset.unwaiveId)})}))}_escapeHtml(t){return t?String(t).replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;").replace(/"/g,"&quot;"):""}};customElements.define("loki-checklist-viewer",G);var ht={not_initialized:{color:"var(--loki-text-muted, #71717a)",label:"Not Started",pulse:!1},starting:{color:"var(--loki-yellow, #ca8a04)",label:"Starting...",pulse:!0},running:{color:"var(--loki-green, #16a34a)",label:"Running",pulse:!0},crashed:{color:"var(--loki-red, #dc2626)",label:"Crashed",pulse:!1},stopped:{color:"var(--loki-text-muted, #a1a1aa)",label:"Stopped",pulse:!1}},J=class extends c{static get observedAttributes(){return["api-url","theme"]}constructor(){super(),this._loading=!1,this._error=null,this._api=null,this._pollInterval=null,this._status=null,this._logs=[],this._lastDataHash=null,this._lastLogsHash=null}connectedCallback(){super.connectedCallback(),this._setupApi(),this._loadData(),this._startPolling()}disconnectedCallback(){super.disconnectedCallback(),this._stopPolling()}attributeChangedCallback(t,e,a){e!==a&&(t==="api-url"&&this._api&&(this._api.baseUrl=a,this._loadData()),t==="theme"&&this._applyTheme())}_setupApi(){let 
t=this.getAttribute("api-url")||window.location.origin;this._api=u({baseUrl:t})}_startPolling(){this._pollInterval=setInterval(()=>this._loadData(),3e3),this._visibilityHandler=()=>{document.hidden?this._pollInterval&&(clearInterval(this._pollInterval),this._pollInterval=null):this._pollInterval||(this._loadData(),this._pollInterval=setInterval(()=>this._loadData(),3e3))},document.addEventListener("visibilitychange",this._visibilityHandler)}_stopPolling(){this._pollInterval&&(clearInterval(this._pollInterval),this._pollInterval=null),this._visibilityHandler&&(document.removeEventListener("visibilitychange",this._visibilityHandler),this._visibilityHandler=null)}async _loadData(){try{let[t,e]=await Promise.all([this._api.getAppRunnerStatus(),this._api.getAppRunnerLogs()]),a=JSON.stringify({status:t?.status,port:t?.port,restarts:t?.restart_count,url:t?.url}),i=JSON.stringify(e?.lines?.slice(-5)||[]),s=i!==this._lastLogsHash;if(a===this._lastDataHash&&!s)return;this._lastDataHash=a,this._lastLogsHash=i,this._status=t,this._logs=e?.lines||[],this._error=null,this.render(),this._scrollLogsToBottom()}catch(t){this._error||(this._error=`Failed to load app status: ${t.message}`,this.render())}}_scrollLogsToBottom(){let t=this.shadowRoot;if(!t)return;let e=t.querySelector(".log-area");e&&(e.scrollTop=e.scrollHeight)}async _handleRestart(){try{await this._api.restartApp(),this._loadData()}catch(t){this._error=`Restart failed: ${t.message}`,this.render()}}async _handleStop(){try{await this._api.stopApp(),this._loadData()}catch(t){this._error=`Stop failed: ${t.message}`,this.render()}}_formatUptime(t){if(!t)return"--";let e=new Date(t),i=Math.floor((new Date-e)/1e3);if(i<60)return`${i}s`;if(i<3600)return`${Math.floor(i/60)}m ${i%60}s`;let s=Math.floor(i/3600),r=Math.floor(i%3600/60);return`${s}h ${r}m`}_isValidUrl(t){if(!t)return!1;try{let e=new URL(t);return e.protocol==="http:"||e.protocol==="https:"}catch{return!1}}_getStyles(){return`
+    `}_attachEventListeners(){let t=this.shadowRoot;t&&(t.querySelectorAll(".category-header[data-category]").forEach(e=>{e.addEventListener("click",()=>this._toggleCategory(e.dataset.category))}),t.querySelectorAll("button[data-waive-id]").forEach(e=>{e.addEventListener("click",a=>{a.stopPropagation(),this._waiveItem(e.dataset.waiveId)})}),t.querySelectorAll("button[data-unwaive-id]").forEach(e=>{e.addEventListener("click",a=>{a.stopPropagation(),this._unwaiveItem(e.dataset.unwaiveId)})}))}_escapeHtml(t){return t?String(t).replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;").replace(/"/g,"&quot;"):""}};customElements.define("loki-checklist-viewer",G);var ht={not_initialized:{color:"var(--loki-text-muted, #71717a)",label:"Not Started",pulse:!1},starting:{color:"var(--loki-yellow, #ca8a04)",label:"Starting...",pulse:!0},running:{color:"var(--loki-green, #16a34a)",label:"Running",pulse:!0},stale:{color:"var(--loki-yellow, #ca8a04)",label:"Stale",pulse:!1},completed:{color:"var(--loki-text-muted, #a1a1aa)",label:"Completed",pulse:!1},failed:{color:"var(--loki-red, #dc2626)",label:"Failed",pulse:!1},crashed:{color:"var(--loki-red, #dc2626)",label:"Crashed",pulse:!1},stopped:{color:"var(--loki-text-muted, #a1a1aa)",label:"Stopped",pulse:!1},unknown:{color:"var(--loki-text-muted, #71717a)",label:"Unknown",pulse:!1}},J=class extends c{static get observedAttributes(){return["api-url","theme"]}constructor(){super(),this._loading=!1,this._error=null,this._api=null,this._pollInterval=null,this._status=null,this._logs=[],this._lastDataHash=null,this._lastLogsHash=null}connectedCallback(){super.connectedCallback(),this._setupApi(),this._loadData(),this._startPolling()}disconnectedCallback(){super.disconnectedCallback(),this._stopPolling()}attributeChangedCallback(t,e,a){e!==a&&(t==="api-url"&&this._api&&(this._api.baseUrl=a,this._loadData()),t==="theme"&&this._applyTheme())}_setupApi(){let 
t=this.getAttribute("api-url")||window.location.origin;this._api=u({baseUrl:t})}_startPolling(){this._pollInterval=setInterval(()=>this._loadData(),3e3),this._visibilityHandler=()=>{document.hidden?this._pollInterval&&(clearInterval(this._pollInterval),this._pollInterval=null):this._pollInterval||(this._loadData(),this._pollInterval=setInterval(()=>this._loadData(),3e3))},document.addEventListener("visibilitychange",this._visibilityHandler)}_stopPolling(){this._pollInterval&&(clearInterval(this._pollInterval),this._pollInterval=null),this._visibilityHandler&&(document.removeEventListener("visibilitychange",this._visibilityHandler),this._visibilityHandler=null)}async _loadData(){try{let[t,e]=await Promise.all([this._api.getAppRunnerStatus(),this._api.getAppRunnerLogs()]),a=JSON.stringify({status:t?.status,port:t?.port,restarts:t?.restart_count,url:t?.url}),i=JSON.stringify(e?.lines?.slice(-5)||[]),s=i!==this._lastLogsHash;if(a===this._lastDataHash&&!s)return;this._lastDataHash=a,this._lastLogsHash=i,this._status=t,this._logs=e?.lines||[],this._error=null,this.render(),this._scrollLogsToBottom()}catch(t){this._error||(this._error=`Failed to load app status: ${t.message}`,this.render())}}_scrollLogsToBottom(){let t=this.shadowRoot;if(!t)return;let e=t.querySelector(".log-area");e&&(e.scrollTop=e.scrollHeight)}async _handleRestart(){try{await this._api.restartApp(),this._loadData()}catch(t){this._error=`Restart failed: ${t.message}`,this.render()}}async _handleStop(){try{await this._api.stopApp(),this._loadData()}catch(t){this._error=`Stop failed: ${t.message}`,this.render()}}_formatUptime(t){if(!t)return"--";let e=new Date(t),i=Math.floor((new Date-e)/1e3);if(i<60)return`${i}s`;if(i<3600)return`${Math.floor(i/60)}m ${i%60}s`;let s=Math.floor(i/3600),r=Math.floor(i%3600/60);return`${s}h ${r}m`}_isValidUrl(t){if(!t)return!1;try{let e=new URL(t);return e.protocol==="http:"||e.protocol==="https:"}catch{return!1}}_getStyles(){return`
       .app-status {
         padding: 16px;
         font-family: var(--loki-font-family, system-ui, -apple-system, sans-serif);

package/docs/COMPARISON.md CHANGED

@@ -37,7 +37,7 @@
 |---------|--------------|-----------|-----------|------------|----------|-----------------|--------------|--------------|
 | **Code Review** | 3 blind reviewers + devil's advocate | Basic | Basic | BugBot PR | Property-based | Artifacts | Doc/Review | Basic |
 | **Anti-Sycophancy** | Yes (CONSENSAGENT) | No | No | No | No | No | No | No |
-| **Quality Gates** | 7 gates + PBT | Basic | Sandbox | Tests | Spec validation | Artifact checks | Tests | Permissions |
+| **Quality Gates** | 9 gates + PBT | Basic | Sandbox | Tests | Spec validation | Artifact checks | Tests | Permissions |
 | **Constitutional AI** | Yes (principles) | No | Refusal training | No | No | No | No | No |
 ---
@@ -146,7 +146,7 @@
 | Feature | **Zencoder** | **Loki Mode** | **Assessment** |
 |---------|-------------|---------------|----------------|
-| **Four Pillars** | Structured Workflows, SDD, Multi-Agent Verification, Parallel Execution | SDLC + RARV + 7 Gates + Worktrees | TIE |
+| **Four Pillars** | Structured Workflows, SDD, Multi-Agent Verification, Parallel Execution | SDLC + RARV + 9 Gates + Worktrees | TIE |
 | **Spec-Driven Dev** | Specs as first-class objects | OpenAPI-first | TIE |
 | **Multi-Agent Verification** | Model diversity (Claude vs OpenAI, 54% improvement) | 3 blind reviewers + devil's advocate | Different approach (N/A for Claude Code - only Claude models) |
 | **Quality Gates** | Built-in verification loops | 7 explicit gates + anti-sycophancy | **Loki Mode** |
@@ -207,7 +207,7 @@
 | **Skills** | Progressive disclosure | 6 slash commands | N/A | 129 skills | N/A | 35 skills | Memory focus |
 | **Multi-Provider** | Yes (Claude/Codex/Gemini) | 3 CLIs (separate) | No | No | No | No | No |
 | **Memory System** | 3-tier (episodic/semantic/procedural) | None | N/A | N/A | Hybrid | N/A | SQLite+FTS5 |
-| **Quality Gates** | 7 gates + Completion Council | User verify only | Two-Stage Review | N/A | Consensus | Tiered | N/A |
+| **Quality Gates** | 9 gates + Completion Council | User verify only | Two-Stage Review | N/A | Consensus | Tiered | N/A |
 | **Context Mgmt** | Standard | Fresh per task (core innovation) | Fresh per task | N/A | N/A | N/A | Progressive |
 | **Autonomy** | High (minimal human) | Semi (checkpoints) | Human-guided | Human-guided | Orchestrated | Human-guided | N/A |
@@ -232,7 +232,7 @@ These are patterns from competing projects that are **practically and scientific
 |----------|---------|-------------------------|
 | **Multi-Provider Support** | Only skill supporting Claude, Codex, and Gemini with graceful degradation | All 8 competitors are Claude-only |
 | **RARV Cycle** | Reason-Act-Reflect-Verify is more rigorous than Plan-Execute | Most use simple Plan-Execute |
-| **7-Gate Quality System** | Static analysis + 3 reviewers + devil's advocate + anti-sycophancy + severity blocking + coverage + debate | Superpowers has 2-stage, others have less |
+| **9-Gate Quality System** | Static analysis + 3 reviewers + devil's advocate + anti-sycophancy + severity blocking + coverage + debate | Superpowers has 2-stage, others have less |
 | **Constitutional AI Integration** | Principles-based self-critique from Anthropic research | None have this |
 | **Anti-Sycophancy (CONSENSAGENT)** | Blind review + devil's advocate prevents groupthink | None have this |
 | **Provider Abstraction Layer** | Clean degradation from full-featured to sequential-only | Claude-only projects can't degrade |
@@ -359,7 +359,7 @@ Tiered agent architecture with explicit escalation:
 |-----------|-------------------|
 | **Autonomy** | Designed for high autonomy with minimal human intervention |
 | **Multi-Agent** | 41 specialized agents in 8 swarms vs 1-8 in competitors |
-| **Quality** | 7 gates + blind review + devil's advocate + property-based testing |
+| **Quality** | 9 gates + blind review + devil's advocate + property-based testing |
 | **Research** | 10+ academic papers integrated vs proprietary/undisclosed |
 | **Anti-Sycophancy** | Only agent with CONSENSAGENT-based blind review |
 | **Memory** | 3-tier memory (episodic/semantic/procedural) + review learning + cross-project |

package/docs/COMPETITIVE-ANALYSIS.md CHANGED

@@ -20,7 +20,7 @@ GSD is the closest competitor -- a context engineering system that spawns fresh
 | Adoption | 594 stars, 6K/wk npm | 11,903 stars, 21K/wk npm | GSD (20x) |
 | Simplicity | Complex (5.4K-line run.sh, 12 Python modules) | Simple (markdown agents + slash commands) | GSD |
 | Full autonomy | Walk away, come back to deployed product | Human checkpoints at discuss/verify/milestone | Loki |
-| Quality gates | 7-gate + Completion Council + anti-sycophancy | User verification only | Loki |
+| Quality gates | 9-gate + Completion Council + anti-sycophancy | User verification only | Loki |
 | Memory system | Episodic/semantic/procedural + vector search | None | Loki |
 | Context management | Standard | Fresh subagent contexts per task (core innovation) | GSD |
 | Time to value | Learn architecture, understand CLI flags | `npx get-shit-done-cc` and go | GSD |
@@ -85,7 +85,7 @@ GSD is the closest competitor -- a context engineering system that spawns fresh
 **Strengths:**
 - 85.9-87.7% Pass@1 on HumanEval
-- 100% task completion rate in evaluations
+- High task completion rate in evaluations (100% reported by MetaGPT authors; not independently verified)
 - Standard Operating Procedures (SOPs) reduce hallucinations
 - Assembly line paradigm with role specialization
 - Low cost: ~$1.09 per project completion

package/docs/INSTALLATION.md CHANGED

@@ -2,7 +2,7 @@
 The flagship product of [Autonomi](https://www.autonomi.dev/). Complete installation instructions for all platforms and use cases.
-**Version:** v5.49.1
+**Version:** v5.49.3
 ---
@@ -36,9 +36,7 @@ The flagship product of [Autonomi](https://www.autonomi.dev/). Complete installa
 - [Quick Install (Recommended)](#quick-install-recommended)
 - [VS Code Extension](#vs-code-extension)
-- [npm (Node.js)](#npm-nodejs)
-- [Homebrew (macOS/Linux)](#homebrew-macoslinux)
-- [Docker](#docker)
+- [Alternative Methods](#alternative-methods)
 - [Sandbox Mode](#sandbox-mode)
 - [Multi-Provider Support](#multi-provider-support)
 - [Claude Code (CLI)](#claude-code-cli)
@@ -53,23 +51,19 @@ The flagship product of [Autonomi](https://www.autonomi.dev/). Complete installa
 ## Quick Install (Recommended)
-Choose your preferred method:
 ```bash
-# Option A: npm (easiest)
-npm install -g loki-mode
+git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
+```
-# Option B: Homebrew (macOS/Linux)
-brew tap asklokesh/tap && brew install loki-mode
+That's it. Claude Code auto-discovers skills in `~/.claude/skills/`.
-# Option C: Docker
-docker pull asklokesh/loki-mode:latest
+**Update:** `cd ~/.claude/skills/loki-mode && git pull`
-# Option D: Git clone
-git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
-```
+Skip to [Verify Installation](#verify-installation) to confirm it's working.
-**Done!** Skip to [Verify Installation](#verify-installation).
+### Alternative Installation Methods
+Also available via npm, Homebrew, Docker, VS Code Extension, and GitHub Action. Each has trade-offs -- see [docs/alternative-installations.md](alternative-installations.md) for details, limitations, and current status of each method.
 ---
@@ -145,153 +139,42 @@ The extension will automatically connect when it detects the server is running a
 ---
-## npm (Node.js)
+## Alternative Methods
-Install via npm for the easiest setup with automatic PATH configuration.
+The following installation methods are available but each has limitations. Git clone (above) is the recommended primary method.
-### Prerequisites
+For full details, troubleshooting, and current status of each method, see [alternative-installations.md](alternative-installations.md).
-- Node.js 16.0.0 or later
+### npm
-### Installation
+**Status:** Published to npm registry. Verify current version: `npm view loki-mode version`
 ```bash
-# Global installation
 npm install -g loki-mode
-# The skill is automatically installed to ~/.claude/skills/loki-mode
-# Opt out of anonymous install telemetry:
-# LOKI_TELEMETRY_DISABLED=true npm install -g loki-mode
-# Or set DO_NOT_TRACK=1
-```
-### Usage
-```bash
-# Use the CLI
-loki start ./my-prd.md
-loki status
-loki dashboard
-# Or invoke in Claude Code
-claude --dangerously-skip-permissions
-> Loki Mode with PRD at ./my-prd.md
-```
-### Updating
-```bash
-npm update -g loki-mode
-```
-### Uninstalling
-```bash
-npm uninstall -g loki-mode
-rm -rf ~/.claude/skills/loki-mode
 ```
----
-## Homebrew (macOS/Linux)
+Requires Node.js 16+. Provides the `loki` CLI and auto-installs the skill to `~/.claude/skills/loki-mode`.
-Install via Homebrew with automatic dependency management.
+### Homebrew
-### Prerequisites
-- Homebrew (https://brew.sh)
-### Installation
+**Status:** Available via tap. Verify formula: `brew info asklokesh/tap/loki-mode`
 ```bash
-# Add the tap
-brew tap asklokesh/tap
-# Install Loki Mode
-brew install loki-mode
-# Set up Claude Code skill integration (manual symlink required)
+brew tap asklokesh/tap && brew install loki-mode
+# Manual symlink required for Claude Code:
 ln -sf "$(brew --prefix)/opt/loki-mode/libexec" ~/.claude/skills/loki-mode
 ```
-### Dependencies
-Homebrew automatically installs:
-- bash 4.0+ (for associative arrays)
-- jq (JSON processing)
-- gh (GitHub CLI for integration)
-### Usage
-```bash
-# Use the CLI
-loki start ./my-prd.md
-loki status
-loki --help
-```
-### Updating
-```bash
-brew upgrade loki-mode
-```
-### Uninstalling
+### Docker
-```bash
-brew uninstall loki-mode
-rm -rf ~/.claude/skills/loki-mode
-```
----
-## Docker
-Run Loki Mode in a container for isolated execution.
-### Prerequisites
-- Docker installed and running
-### Installation
+**Status:** Published to Docker Hub.
 ```bash
-# Pull the image
 docker pull asklokesh/loki-mode:latest
-# Or use docker-compose
-curl -o docker-compose.yml https://raw.githubusercontent.com/asklokesh/loki-mode/main/docker-compose.yml
-```
-### Usage
-```bash
-# Run with a PRD file
 docker run -v $(pwd):/workspace -w /workspace asklokesh/loki-mode:latest start ./my-prd.md
-# Interactive mode
-docker run -it -v $(pwd):/workspace -w /workspace asklokesh/loki-mode:latest
-# Using docker-compose
-docker-compose run loki start ./my-prd.md
 ```
-### Environment Variables
-Pass your configuration via environment variables:
-```bash
-docker run -e LOKI_MAX_RETRIES=100 -e LOKI_BASE_WAIT=120 \
-  -v $(pwd):/workspace -w /workspace \
-  asklokesh/loki-mode:latest start ./my-prd.md
-```
-### Updating
-```bash
-docker pull asklokesh/loki-mode:latest
-```
+**Limitation:** Docker cannot run Claude Code interactively (Claude Code is a terminal-based CLI requiring TTY access). Docker is suitable for CI/CD pipelines, API-only modes, and sandbox execution -- not for the primary interactive workflow.
 ---
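Each installation method above is meant to leave the skill reachable at `~/.claude/skills/loki-mode`, either as a clone or as a symlink. A minimal sketch of a post-install check follows. The helper name `check_skill_install` is hypothetical, and the assumption that a working install contains a top-level `SKILL.md` is inferred from the package file list, not from the project's own verification steps.

```python
from pathlib import Path
from typing import List, Optional


def check_skill_install(skill_dir: Optional[Path] = None) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks right.

    Assumption: a usable install has a SKILL.md at the top of the skill directory.
    """
    if skill_dir is None:
        skill_dir = Path.home() / ".claude" / "skills" / "loki-mode"
    skill_dir = Path(skill_dir)

    problems: List[str] = []
    # is_dir() follows symlinks, so a Homebrew or npm symlink also passes.
    if not skill_dir.is_dir():
        problems.append(f"missing directory: {skill_dir}")
    elif not (skill_dir / "SKILL.md").is_file():
        problems.append(f"no SKILL.md in {skill_dir}")
    return problems
```

Calling `check_skill_install()` with no argument inspects the default location. This only checks the file layout; it does not confirm that Claude Code has discovered the skill.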

package/docs/alternative-installations.md ADDED

@@ -0,0 +1,145 @@
+# Alternative Installation Methods
+The primary installation method is git clone (see [README](../README.md#installation)). These alternatives serve specific use cases.
+---
+## npm (Secondary)
+**Status**: Working. Version tracks releases automatically.
+```bash
+npm install -g loki-mode
+```
+**Limitation**: Installs to `node_modules`, not `~/.claude/skills/`. To use as a Claude Code skill, you must symlink:
+```bash
+npm install -g loki-mode
+ln -sf "$(npm root -g)/loki-mode" ~/.claude/skills/loki-mode
+```
+**Best for**: CI/CD pipelines, programmatic access via `loki` CLI.
+---
+## Homebrew (Secondary)
+**Status**: Working. Tap and formula exist, version current.
+```bash
+brew tap asklokesh/tap
+brew install loki-mode
+```
+**Limitation**: Installs the `loki` CLI binary only. Does NOT install the Claude Code skill. To use with Claude Code, also run:
+```bash
+git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
+```
+**Best for**: Users who want the `loki` CLI wrapper for autonomous mode (`loki start`, `loki stop`, `loki cleanup`).
+---
+## Docker (Secondary)
+**Status**: Image exists on Docker Hub. Tags: `latest`, version-specific (e.g., `5.49.1`).
+```bash
+docker pull asklokesh/loki-mode:latest
+```
+**Limitation**: Claude Code is an interactive CLI that requires API keys and terminal access. Running it inside a Docker container is not the standard workflow. Docker is useful for:
+- CI/CD sandbox execution (running `loki` in isolated environments)
+- Testing Loki Mode without modifying your local system
+- Air-gapped environments with pre-built images
+**Not recommended for**: Interactive Claude Code sessions. Use the git clone method instead.
+See [DOCKER_README.md](../DOCKER_README.md) for Docker-specific usage instructions.
+---
+## GitHub Action (Secondary)
+**Status**: Working. Adds automated AI code review to pull requests.
+```yaml
+# .github/workflows/loki-review.yml
+name: Loki Code Review
+on:
+  pull_request:
+    types: [opened, synchronize]
+permissions:
+  contents: read
+  pull-requests: write
+jobs:
+  review:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: asklokesh/loki-mode@v5
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          mode: review
+          provider: claude
+          max_iterations: 3
+          budget_limit: '5.00'
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+**Prerequisites:**
+- API key for your provider (set as repository secret): `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GOOGLE_API_KEY`
+- The action auto-installs `loki-mode` and `@anthropic-ai/claude-code`
+**Action Inputs:**
+| Input | Default | Description |
+|-------|---------|-------------|
+| `mode` | `review` | `review`, `fix`, or `test` |
+| `provider` | `claude` | `claude`, `codex`, or `gemini` |
+| `budget_limit` | `5.00` | Max cost in USD |
+| `max_iterations` | `3` | Max RARV cycles |
+| `github_token` | (required) | GitHub token for PR comments |
+| `prd_file` | | Path to PRD file (for fix/test modes) |
+**Modes:**
+| Mode | Description |
+|------|-------------|
+| `review` | Analyze PR diff, post structured review as PR comment |
+| `fix` | Automatically fix issues found in the codebase |
+| `test` | Run autonomous test generation and validation |
+**Best for**: Automated PR review and CI/CD integration.
+---
+## GitHub Release Download (Secondary)
+**Status**: Working. Release assets available for each version.
+```bash
+# Download and extract to skills directory
+curl -sL https://github.com/asklokesh/loki-mode/archive/refs/tags/v5.49.1.tar.gz | tar xz
+mv loki-mode-5.49.1 ~/.claude/skills/loki-mode
+```
+**Best for**: Offline or air-gapped environments, pinned version deployments.
+---
+## VS Code Extension (Secondary)
+**Status**: Available on VS Code Marketplace.
+Search for "Loki Mode" in VS Code Extensions, or:
+```bash
+code --install-extension asklokesh.loki-mode
+```
+**Best for**: VS Code users who want dashboard integration within their editor.

package/docs/cursor-comparison.md CHANGED

@@ -11,7 +11,7 @@
 |-----------|--------|-----------|--------|
 | **Proven Scale** | 1M+ LoC, large agent count | Benchmarks only | Cursor |
 | **Research Foundation** | Empirical iteration | 25+ academic citations | Loki Mode |
-| **Quality Assurance** | Workers self-manage | 7-gate system + anti-sycophancy | Loki Mode |
+| **Quality Assurance** | Workers self-manage | 9-gate system + anti-sycophancy | Loki Mode |
 | **Anti-Sycophancy** | Not mentioned | CONSENSAGENT blind review | Loki Mode |
 | **Velocity-Quality Balance** | Not mentioned | arXiv-backed metrics | Loki Mode |
 | **Full SDLC Coverage** | Code generation focus | PRD to production + growth | Loki Mode |
@@ -66,7 +66,7 @@ velocity_quality_balance:
 ---
-### 3. 7-Gate Quality System
+### 3. 9-Gate Quality System
 **Loki Mode's Gates:**
 1. Input Guardrails - Validate scope, detect injection (OpenAI SDK pattern)
@@ -174,7 +174,7 @@ Cursor learned through failure:
 ### 3. Simplicity Principle
 > "A surprising amount of the system's behavior comes down to how we prompt the agents. The harness and models matter, but the prompts matter more."
-**Loki Mode:** More complex infrastructure (7 gates, 41 agent types, memory systems). May be over-engineered for some use cases.
+**Loki Mode:** More complex infrastructure (9 gates, 41 agent types, memory systems). May be over-engineered for some use cases.
 ---
@@ -192,7 +192,7 @@ We incorporated Cursor's proven patterns:
 ## Conclusion
 **Loki Mode is scientifically better in:**
-- Quality assurance (research-backed 7-gate system)
+- Quality assurance (research-backed 9-gate system)
 - Anti-sycophancy (CONSENSAGENT blind review)
 - Velocity-quality balance (arXiv metrics)
 - Full SDLC coverage (PRD to growth)

package/mcp/__init__.py CHANGED

@@ -21,4 +21,4 @@ try:
 except ImportError:
     __all__ = ['mcp']
-__version__ = '5.49.1'
+__version__ = '5.49.3'

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "loki-mode",
-  "version": "5.49.1",
+  "version": "5.49.3",
   "description": "Loki Mode by Autonomi - Multi-agent autonomous startup system for Claude Code, Codex CLI, and Gemini CLI",
   "keywords": [
     "autonomi",
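This release bumps the same version string in several places (`package.json`, `VERSION`, and the `__version__` assignments in the Python packages). A small sketch of how such a bump could be checked for consistency is shown here. The function names are hypothetical and the parsing rules are assumptions based only on the file shapes visible in this diff.

```python
import json
import re
from typing import Dict, Optional

# Matches lines such as: __version__ = '5.49.3'
VERSION_ASSIGNMENT = re.compile(r"""__version__\s*=\s*['"]([^'"]+)['"]""")


def extract_version(filename: str, text: str) -> Optional[str]:
    """Pull a version string out of one of the file kinds touched by the bump."""
    if filename.endswith("package.json"):
        return json.loads(text).get("version")
    if filename.endswith(".py"):
        match = VERSION_ASSIGNMENT.search(text)
        return match.group(1) if match else None
    # Assumption: a plain VERSION file holds nothing but the version.
    return text.strip() or None


def find_version_mismatches(files: Dict[str, str], expected: str) -> Dict[str, Optional[str]]:
    """Return {filename: found_version} for every file that disagrees with `expected`."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, text in files.items():
        found = extract_version(name, text)
        if found != expected:
            mismatches[name] = found
    return mismatches
```

Passing the file contents in as strings keeps the check independent of where the package is unpacked.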

package/references/core-workflow.md CHANGED

@@ -6,7 +6,7 @@ Full RARV cycle, CONTINUITY.md template, and autonomy rules.
 ## Autonomy Rules
-**This system runs with ZERO human intervention.**
+**This system runs with minimal human intervention.** Human oversight is expected for deployment credentials, domain setup, API keys, and critical business decisions.
 ### Core Rules
 1. **NEVER ask questions** - Do not say "Would you like me to...", "Should I...", or "What would you prefer?"

package/references/quality-control.md CHANGED

@@ -165,6 +165,16 @@ IMPLEMENT -> BLIND REVIEW (parallel) -> DEBATE (if disagreement) -> AGGREGATE ->
 - NEVER dispatch reviewers sequentially (always parallel - 3x faster)
 - NEVER aggregate before all 3 reviewers complete
+### Test Quality Review (Apply to Every Review)
+Before approving, verify:
+- Are tests using real implementations or excessive mocks of internal code?
+- Were any assertion expected values changed in the same commit as implementation? (This is the top sign an agent cheated.)
+- Do tests verify meaningful behavior or just "runs without throwing"?
+- Could all tests pass while the feature is completely broken?
+Assertion manipulation in the same commit as implementation = CRITICAL finding = automatic REJECT.
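The "assertion changed in the same commit as implementation" check above can be approximated with a simple heuristic over a commit's diff lines. The sketch below is not the project's detector (`tests/detect-test-mutations.sh` is not shown in this diff). The input shape, the test-file naming rules, and all function names are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

ASSERTION = re.compile(r"\b(assert\w*|expect)\b")
# String and numeric literals, blanked out so only the expected values differ.
LITERAL = re.compile(r"""("[^"]*"|'[^']*'|\b\d+(?:\.\d+)?\b)""")


def is_test_file(path: str) -> bool:
    """Assumed naming conventions: test_*.py, *.test.*, *.spec.*, or a tests/ directory."""
    name = path.rsplit("/", 1)[-1]
    return (name.startswith("test_") or ".test." in name
            or ".spec." in name or "/tests/" in f"/{path}")


def _shape(line: str) -> str:
    return LITERAL.sub("<lit>", line.strip())


def changed_assertions(diff_lines: List[str]) -> List[Tuple[str, str]]:
    """Pair removed and added assertions that differ only in their literals."""
    removed = {}
    for line in diff_lines:
        if line.startswith("-") and not line.startswith("---") and ASSERTION.search(line):
            body = line[1:].strip()
            removed[_shape(body)] = body
    pairs = []
    for line in diff_lines:
        if line.startswith("+") and not line.startswith("+++") and ASSERTION.search(line):
            body = line[1:].strip()
            old = removed.get(_shape(body))
            if old is not None and old != body:
                pairs.append((old, body))
    return pairs


def review_commit(commit: Dict[str, List[str]]) -> List[dict]:
    """commit maps file path -> unified-diff lines; returns CRITICAL findings."""
    if all(is_test_file(path) for path in commit):
        return []  # no implementation change in this commit
    findings = []
    for path, lines in commit.items():
        if is_test_file(path):
            for old, new in changed_assertions(lines):
                findings.append({"severity": "CRITICAL", "file": path,
                                 "old": old, "new": new})
    return findings
```

Because it compares assertion "shapes", this heuristic also flags an assertion whose input literals changed, so a human reviewer still needs to confirm each finding.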
 ### Anti-Sycophancy Protocol (CONSENSAGENT Research)
 **Problem:** Reviewers may reinforce each other's findings instead of critically engaging.

package/skills/00-index.md CHANGED

@@ -41,7 +41,7 @@
 ### quality-gates.md
 **When:** Code review, pre-commit checks, quality assurance
-- 7-gate quality system
+- 9-gate quality system
 - Blind review + anti-sycophancy
 - Velocity-quality feedback loop (arXiv research)
 - Mandatory quality checks per task

package/skills/artifacts.md CHANGED

@@ -36,7 +36,7 @@ format: "markdown"
 contents:
   - Phase name and duration
   - Tasks completed (from queue)
-  - Quality gate results (7 gates)
+  - Quality gate results (9 gates)
   - Coverage metrics
   - Known issues / TODOs
 ```

package/skills/quality-gates.md CHANGED

@@ -2,7 +2,7 @@
 **Never ship code without passing all quality gates.**
-## The 7 Quality Gates
+## The 9 Quality Gates
 1. **Input Guardrails** - Validate scope, detect injection, check constraints (OpenAI SDK)
 2. **Static Analysis** - CodeQL, ESLint/Pylint, type checking
@@ -11,6 +11,26 @@
 5. **Output Guardrails** - Validate code quality, spec compliance, no secrets (tripwire on fail)
 6. **Severity-Based Blocking** - Critical/High/Medium = BLOCK; Low/Cosmetic = TODO comment
 7. **Test Coverage Gates** - Unit: 100% pass, >80% coverage; Integration: 100% pass
+8. **Mock Detector** - Classifies internal vs external mocks; flags tests that never import source code, tautological assertions, and high internal mock ratios
+9. **Test Mutation Detector** - Detects assertion value changes alongside implementation changes (test fitting), low assertion density, and missing pass/fail tracking
+## Gate 8 and 9: Automated Test Integrity
+Gates 8 (Mock Detector) and 9 (Test Mutation Detector) run during the VERIFY phase and are enabled by default.
+**How they run:**
+- Gate 8 runs `tests/detect-mock-problems.sh` against all test files in the project
+- Gate 9 runs `tests/detect-test-mutations.sh` against recent commits (default: last 5, or use `--commit HASH` for targeted checks)
+- Both produce findings at HIGH/MEDIUM/LOW severity levels
+- HIGH findings = automatic FAIL (same as other blocking gates)
+**Disabling (not recommended):**
+```bash
+LOKI_GATE_MOCK_DETECTOR=false    # Disable gate 8
+LOKI_GATE_MUTATION_DETECTOR=false # Disable gate 9
+```
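The rules above (gates on by default, `false` disables a gate, any HIGH finding fails it) can be sketched as a small decision function. This is an illustration only: the `SKIPPED` verdict, the finding dictionaries, and the treatment of values other than `false` as "enabled" are assumptions, since the actual runner is not part of this diff.

```python
import os
from typing import Iterable, Mapping, Optional

# Environment variable names as documented for gates 8 and 9.
GATE_ENV_VARS = {
    8: "LOKI_GATE_MOCK_DETECTOR",
    9: "LOKI_GATE_MUTATION_DETECTOR",
}


def gate_enabled(gate: int, env: Optional[Mapping[str, str]] = None) -> bool:
    """Gates are enabled unless their variable is explicitly set to 'false'."""
    env = os.environ if env is None else env
    return env.get(GATE_ENV_VARS[gate], "true").strip().lower() != "false"


def gate_verdict(gate: int, findings: Iterable[dict],
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'SKIPPED', 'FAIL', or 'PASS' for one gate's findings."""
    if not gate_enabled(gate, env):
        return "SKIPPED"  # hypothetical label for a disabled gate
    if any(f["severity"] == "HIGH" for f in findings):
        return "FAIL"  # HIGH findings block, like the other blocking gates
    return "PASS"
```

Accepting `env` as a parameter makes the function easy to exercise without touching the real process environment.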
+---
 ## Guardrails Execution Modes

package/skills/testing.md CHANGED

@@ -1,5 +1,20 @@
 # Testing
+## Mandatory Testing Rules
+1. Write tests FIRST. Commit the test before writing implementation.
+2. Tests must call REAL functions with REAL inputs and assert REAL outputs.
+3. Mock ONLY external dependencies: HTTP APIs, databases, file system, third-party services.
+4. NEVER mock internal modules, utility functions, or any code that is part of this project.
+5. NEVER change a test's expected value to make it pass. If a test fails, the implementation is wrong. Fix the code, not the test.
+6. If you believe a test expectation is incorrect, document WHY and flag for council review. Do not silently change it.
+7. Every test file must have at least one assertion per tested function.
+Gate 8 (mock detector) and Gate 9 (mutation detector) enforce rules 3-5 automatically.
+Violations result in automatic FAIL during VERIFY phase.
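Rules 3 and 4 separate mocks of external dependencies from mocks of the project's own code. One rough way to measure that split is sketched below, assuming mock targets are written as dotted paths and that the project's top-level package names are known. The function names and the classification rule are hypothetical and are not taken from `tests/detect-mock-problems.sh`.

```python
from typing import Iterable, Set


def classify_mock(target: str, project_packages: Set[str]) -> str:
    """Label a dotted mock target 'internal' if its top-level package is ours."""
    root = target.split(".", 1)[0]
    return "internal" if root in project_packages else "external"


def internal_mock_ratio(targets: Iterable[str], project_packages: Set[str]) -> float:
    """Fraction of mock targets that point at project code (0.0 when there are none)."""
    targets = list(targets)
    if not targets:
        return 0.0
    internal = sum(
        1 for t in targets if classify_mock(t, project_packages) == "internal"
    )
    return internal / len(targets)
```

A caveat: Python tests often patch an external library where it is used (for example a target like `myapp.client.requests.get`), which this sketch would count as internal, so a real detector would need a finer rule.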
+---
 ## E2E Testing with Playwright MCP
 **Use Playwright MCP for browser-based testing.**