loki-mode 5.52.1 → 5.52.3

package/README.md CHANGED
@@ -1,920 +1,213 @@
1
1
  # Loki Mode
2
2
 
3
- **The Flagship Product of [Autonomi](https://www.autonomi.dev/) -- An Autonomous Multi-Agent Development System**
3
+ **Autonomous multi-agent development with self-verification. PRD in, tested code out.**
4
4
 
5
5
  [![npm version](https://img.shields.io/npm/v/loki-mode)](https://www.npmjs.com/package/loki-mode)
6
6
  [![npm downloads](https://img.shields.io/npm/dw/loki-mode)](https://www.npmjs.com/package/loki-mode)
7
7
  [![GitHub stars](https://img.shields.io/github/stars/asklokesh/loki-mode)](https://github.com/asklokesh/loki-mode)
8
8
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
9
- [![GitHub Marketplace](https://img.shields.io/badge/Marketplace-Loki%20Mode-purple?logo=github)](https://github.com/marketplace/actions/loki-mode-code-review)
10
- [![Autonomi](https://img.shields.io/badge/Autonomi-autonomi.dev-5B4EEA)](https://www.autonomi.dev/)
11
9
  [![Agent Types](https://img.shields.io/badge/Agent%20Types-41-blue)]()
12
- [![Benchmarks](https://img.shields.io/badge/Benchmarks-Infrastructure%20Ready-blue)](benchmarks/)
13
-
14
- **Current Version: v5.52.0**
15
-
16
- **[Autonomi](https://www.autonomi.dev/)** | **[Documentation](https://www.autonomi.dev/docs)** | **[GitHub](https://github.com/asklokesh/loki-mode)**
17
-
18
- > **PRD to Deployed Product with Minimal Human Intervention**
19
- >
20
- > Loki Mode transforms a Product Requirements Document into a fully built, tested, and deployed product with autonomous multi-agent execution. Human oversight for deployment credentials, domain setup, and critical decisions.
21
-
22
- ---
23
-
24
- ## Demo
25
-
26
- [![asciicast](https://asciinema.org/a/AjjnjzOeKLYItp6s.svg)](https://asciinema.org/a/AjjnjzOeKLYItp6s)
27
-
28
- *Click to watch Loki Mode v5.42 -- CLI commands, dashboard, 8 parallel agents, 9-gate quality, Completion Council, memory system*
29
-
30
- ---
31
-
32
- ## Presentation
33
-
34
- ![Loki Mode Presentation](docs/loki-mode-presentation.gif)
35
-
36
- *9 slides: Problem, Solution, 41 Agents, RARV Cycle, Benchmarks, Multi-Provider, Full Lifecycle*
37
-
38
- **[Download PPTX](docs/loki-mode-presentation.pptx)** for offline viewing
39
-
40
- ---
41
-
42
- ## Installation
43
-
44
- ### npm (Recommended)
45
-
46
- ```bash
47
- npm install -g loki-mode
48
- ```
49
-
50
- Installs the `loki` CLI and automatically sets up the skill for Claude Code, Codex CLI, and Gemini CLI.
51
-
52
- ### Homebrew
53
-
54
- ```bash
55
- brew tap asklokesh/tap && brew install loki-mode
56
- ```
57
-
58
- Installs the `loki` CLI. To also install the skill for interactive use:
59
-
60
- ```bash
61
- loki setup-skill
62
- ```
63
-
64
- ### Quick Start
65
-
66
- ```bash
67
- # CLI mode (works with any provider)
68
- loki start ./prd.md
69
- loki start ./prd.md --provider codex
70
- loki start ./prd.md --provider gemini
71
-
72
- # Interactive mode (inside your coding agent)
73
- claude --dangerously-skip-permissions
74
- # Then say: "Loki Mode with PRD at ./my-prd.md"
75
-
76
- # Or in Codex CLI:
77
- codex
78
- # Then say: "Use Loki Mode with PRD at ./my-prd.md"
79
-
80
- # Or in Gemini CLI:
81
- gemini
82
- # Then say: "Use Loki Mode with PRD at ./my-prd.md"
83
- ```
84
-
85
- ### Verify Installation
86
-
87
- ```bash
88
- loki --version # Should print 5.52.0
89
- loki doctor # Check skill symlinks and provider availability
90
- ```
91
-
92
- ### Other Methods
93
-
94
- Git clone, Docker, GitHub Action, and VS Code Extension are also available. See [docs/alternative-installations.md](docs/alternative-installations.md).
95
-
96
- ### Update
97
-
98
- ```bash
99
- npm update -g loki-mode # npm
100
- brew upgrade loki-mode # Homebrew
101
- ```
102
-
103
- ### Multi-Provider Support (v5.0.0)
104
-
105
- | Provider | Features | Parallel Agents | Task Tool |
106
- |----------|----------|-----------------|-----------|
107
- | Claude | Full | Yes (10+) | Yes |
108
- | Codex | Degraded | No | No |
109
- | Gemini | Degraded | No | No |
110
-
111
- See [skills/providers.md](skills/providers.md) for full provider documentation.
112
-
113
- ---
114
-
115
- ## Benchmarks
116
-
117
- Benchmark infrastructure is included for HumanEval and SWE-bench evaluation. Results are self-reported from the included test harness and have not been independently verified.
118
-
119
- | Benchmark | Result | Notes |
120
- |-----------|--------|-------|
121
- | HumanEval | 162/164 (98.78%) | Self-reported, max 3 retries per problem |
122
- | SWE-bench | 299/300 patches generated | Patch generation only -- SWE-bench evaluator not yet run to verify correctness |
123
-
124
- **Note:** SWE-bench "patch generation" means the system produced a patch file, not that the patch correctly resolves the issue. The SWE-bench evaluator should be run to determine actual resolution rates.
10
+ [![Autonomi](https://img.shields.io/badge/Autonomi-autonomi.dev-5B4EEA)](https://www.autonomi.dev/)
125
11
 
126
- See [benchmarks/](benchmarks/) for the test harness and raw results.
12
+ **Current Version: v5.52.3**
127
13
 
128
14
  ---
129
15
 
130
- ## What is Loki Mode?
16
+ ## What Is Loki Mode?
131
17
 
132
- Loki Mode is a multi-provider AI skill that orchestrates **41 specialized AI agent types** across **8 swarms** to autonomously build, test, and deploy software projects. Works with **Claude Code**, **OpenAI Codex CLI**, and **Google Gemini CLI**. It dynamically spawns agents as needed -- typically **5-10 for simple projects, more for complex ones** -- working in parallel with continuous self-verification.
18
+ Loki Mode is a multi-agent system that transforms a Product Requirements Document into a built and tested product. It orchestrates 41 specialized agent types across 8 swarms -- engineering, operations, business, data, product, growth, review, and orchestration -- working in parallel with continuous self-verification.
133
19
 
134
- ```
135
- PRD → Research → Architecture → Development → Testing → Deployment → Marketing
136
- ```
20
+ Every iteration follows the **RARV cycle**: Reason (read state, identify next task) -> Act (execute, commit) -> Reflect (update continuity, learn) -> Verify (run tests, check spec). If verification fails, the system captures the error as a learning and retries from Reason. This is the core differentiator: code is not "done" until it passes automated verification. See [Core Workflow](references/core-workflow.md).
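The RARV paragraph added in this hunk can be read as a retry loop. The sketch below is only an illustration of that description, not Loki Mode's implementation: `State`, `run_rarv`, `act`, `verify`, and `max_attempts` are all hypothetical names, and the in-memory lists stand in for whatever state the real system persists.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class State:
    """Hypothetical stand-in for project state; not Loki Mode's real format."""
    pending: List[str]
    done: List[str] = field(default_factory=list)
    continuity: List[str] = field(default_factory=list)  # progress notes
    learnings: List[str] = field(default_factory=list)   # captured failures


def run_rarv(
    state: State,
    act: Callable[[str, List[str]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> State:
    """Run Reason -> Act -> Reflect -> Verify until no tasks remain."""
    while state.pending:
        # Reason: read state and identify the next task.
        task = state.pending[0]
        for attempt in range(1, max_attempts + 1):
            # Act: execute the task, with earlier learnings available.
            result = act(task, state.learnings)
            # Reflect: update the continuity notes.
            state.continuity.append(f"{task}: attempt {attempt}")
            # Verify: None means the checks passed, otherwise an error message.
            error = verify(result)
            if error is None:
                state.done.append(state.pending.pop(0))
                break
            # Failed verification: capture the error as a learning and retry.
            state.learnings.append(f"{task}: {error}")
        else:
            # Assumed cap on retries; the source does not state one here.
            raise RuntimeError(f"{task!r} still failing after {max_attempts} attempts")
    return state
```

In this sketch a task moves to `done` only after `verify` returns `None`, which mirrors the statement that code is not "done" until it passes automated verification. The retry cap is an assumption added so the loop terminates.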
137
21
 
138
- **Just say "Loki Mode" and point to a PRD. Walk away. Come back to a deployed product.**
22
+ **What "autonomous" actually means:** The system runs RARV cycles without prompting. It does NOT have access to your cloud accounts, payment systems, or external services unless you provide credentials. Human oversight is expected for deployment credentials, domain setup, API keys, and critical decisions. The system can make mistakes, especially on novel or complex problems.
139
23
 
140
- ---
24
+ ### What To Expect
141
25
 
142
- ## Current Limitations
26
+ | Project Type | Examples | Typical Duration | Experience |
27
+ |---|---|---|---|
28
+ | Simple | Landing page, todo app, single API | 5-30 min | Completes independently. Human reviews output. |
29
+ | Standard | CRUD app with auth, REST API + React frontend | 30-90 min | Completes most features. May need guidance on complex parts. |
30
+ | Complex | Microservices, real-time systems, ML pipelines | 2+ hours | Use as accelerator. Human reviews between phases. |
143
31
 
144
- Loki Mode is powerful but not magic. Be aware of these honest limitations:
32
+ ### Limitations
145
33
 
146
34
  | Area | What Works | What Doesn't (Yet) |
147
35
  |------|-----------|---------------------|
148
- | **Code Generation** | Generates full-stack applications from PRDs | Complex domain logic may need human review and correction |
149
- | **Deployment** | Generates deployment configs and scripts | Does not have cloud credentials -- human must provide and authorize |
150
- | **Testing** | 9 automated quality gates, blind review | Test quality depends on AI-generated assertions; mutation testing is heuristic |
151
- | **Business Ops** | Generates marketing copy, legal templates | Does not actually send emails, file legal documents, or process payments |
152
- | **Multi-Provider** | Claude (full), Codex (degraded), Gemini (degraded) | Codex and Gemini lack parallel agents and Task tool -- sequential only |
153
- | **Memory System** | Episodic, semantic, procedural memory tiers | Vector search requires optional `sentence-transformers` dependency |
154
- | **Enterprise Security** | TLS, OIDC, RBAC, audit trail, SIEM configs | Self-signed certs only; production deployments need real certificates |
155
- | **Dashboard** | Real-time status, task queue, agent monitoring | Single-machine only; no multi-node dashboard clustering |
156
- | **Benchmarks** | HumanEval 98.78%, SWE-bench 299/300 patches | Self-reported; SWE-bench counts patch generation, not verified resolution |
157
-
158
- **What "autonomous" means in practice:**
159
- - Loki Mode runs without prompting between RARV cycles
160
- - It does NOT have access to your cloud accounts, payment systems, or external services unless you provide credentials
161
- - Human oversight is expected for: deployment credentials, domain setup, API keys, and critical business decisions
162
- - The system is as good as the underlying AI model -- it can make mistakes, especially on novel or complex problems
163
-
164
- ## What To Expect
165
-
166
- | Project Type | Examples | Autonomy Level | Typical Experience |
167
- |---|---|---|---|
168
- | Simple | Landing page, todo app, static site, single API | High | Completes with minimal retries. Human reviews output. |
169
- | Standard | CRUD app with auth, REST API + React frontend | Medium | Completes most features. Complex components may need guidance. |
170
- | Complex | Microservices, real-time systems, ML pipelines | Guided | Use as accelerator. Human reviews between phases. |
171
-
172
- "Autonomous" means the system runs RARV cycles without prompting. It does NOT mean zero oversight.
36
+ | **Code Generation** | Full-stack apps from PRDs | Complex domain logic may need human review |
37
+ | **Deployment** | Generates configs, Dockerfiles, CI/CD workflows | Does not deploy -- human provides cloud credentials and runs deploy |
38
+ | **Testing** | 9 automated quality gates, blind review | Test quality depends on AI-generated assertions |
39
+ | **Multi-Provider** | Claude (full), Codex/Gemini (sequential only) | Codex and Gemini lack parallel agents and Task tool |
40
+ | **Enterprise** | TLS, OIDC, RBAC, audit trail | Self-signed certs only; some features require env var activation |
41
+ | **Dashboard** | Real-time status, task queue, agents | Single-machine only; no multi-node clustering |
173
42
 
174
43
  ---
175
44
 
176
- ## Why Loki Mode?
177
-
178
- ### **How It Works**
179
-
180
- | What Others Do | What Loki Mode Does |
181
- |----------------|---------------------|
182
- | **Single agent** writes code linearly | **Multiple agents** work in parallel across engineering, ops, business, data, product, and growth |
183
- | **Manual deployment** required | **Autonomous deployment** to AWS, GCP, Azure, Vercel, Railway with blue-green and canary strategies |
184
- | **No testing** or basic unit tests | **9 automated quality gates**: input/output guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage, mock detection, mutation detection |
185
- | **Code only** - you handle the rest | **Full business operations**: marketing, sales, legal, HR, finance, investor relations |
186
- | **Stops on errors** | **Self-healing**: circuit breakers, dead letter queues, exponential backoff, automatic recovery |
187
- | **No visibility** into progress | **Real-time dashboard** with agent monitoring, task queues, and live status updates |
188
- | **"Done" when code is written** | **Never "done"**: continuous optimization, A/B testing, customer feedback loops, perpetual improvement |
189
- | **No security controls** | **Enterprise-ready**: TLS/HTTPS, OIDC/SSO, RBAC, audit trails, SIEM integration, Prometheus metrics (v5.36.0-v5.38.0) |
190
- | **Direct commits to main** | **Branch protection**: auto-create feature branches, clean PR workflow, never touches main directly (v5.37.0) |
191
-
192
- ### **Core Advantages**
193
-
194
- 1. **Self-Verifying**: RARV (Reason-Act-Reflect-Verify) cycle with continuous self-verification catches errors early
195
- 2. **Parallel Execution**: Multiple agents working simultaneously, not sequential single-agent bottlenecks
196
- 3. **Production-Ready**: Not just code—handles deployment, monitoring, incident response, and business operations
197
- 4. **Self-Improving**: Learns from mistakes, updates continuity logs, prevents repeated errors
198
- 5. **Zero Babysitting**: Auto-resumes on rate limits, recovers from failures, runs until completion
199
- 6. **Efficiency Optimized**: ToolOrchestra-inspired metrics track cost per task, reward signals drive continuous improvement
200
-
201
- ---
202
-
203
- ## Features & Documentation
204
-
205
- | Feature | Description | Documentation |
206
- |---------|-------------|---------------|
207
- | **VS Code Extension** | Visual interface with sidebar, status bar | [Marketplace](https://marketplace.visualstudio.com/items?itemName=asklokesh.loki-mode) |
208
- | **Multi-Provider (v5.0.0)** | Claude, Codex, Gemini support | [Provider Guide](skills/providers.md) |
209
- | **CLI (v4.1.0)** | `loki` command for start/stop/pause/status | [CLI Commands](#cli-commands-v410) |
210
- | **Config Files** | YAML configuration support | [autonomy/config.example.yaml](autonomy/config.example.yaml) |
211
- | **Dashboard** | Realtime Kanban board, agent monitoring | [Dashboard Guide](docs/dashboard-guide.md) |
212
- | **TLS/HTTPS (v5.36.0)** | Dashboard encryption with self-signed certs | [Network Security](docs/network-security.md) |
213
- | **OIDC/SSO (v5.36.0)** | Google, Azure AD, Okta authentication | [Authentication Guide](docs/authentication.md) |
214
- | **RBAC (v5.37.0)** | Admin, operator, viewer, auditor roles | [Authorization Guide](docs/authorization.md) |
215
- | **Metrics Export (v5.38.0)** | Prometheus/OpenMetrics `/metrics` endpoint | [Metrics Guide](docs/metrics.md) |
216
- | **Branch Protection (v5.37.0)** | Auto-create feature branches for PRs | [Git Workflow](docs/git-workflow.md) |
217
- | **Audit Trail (v5.37.0)** | Agent action logging with integrity chain | [Audit Logging](docs/audit-logging.md) |
218
- | **SIEM Integration (v5.38.0)** | Syslog forwarding for enterprise security | [SIEM Guide](docs/siem-integration.md) |
219
- | **OpenClaw Bridge (v5.38.0)** | Multi-agent coordination protocol | [OpenClaw Integration](docs/openclaw-integration.md) |
220
- | **41 Agent Types** | Engineering, Ops, Business, Data, Product, Growth, Orchestration | [Agent Definitions](references/agent-types.md) |
221
- | **RARV Cycle** | Reason-Act-Reflect-Verify workflow | [Core Workflow](references/core-workflow.md) |
222
- | **Quality Gates** | 9-gate system: guardrails, static analysis, blind review, anti-sycophancy, severity blocking, test coverage, mock detection, mutation detection | [Quality Control](references/quality-control.md) |
223
- | **Memory System (v5.15.0)** | Complete 3-tier memory with progressive disclosure | [Memory Architecture](references/memory-system.md) |
224
- | **Parallel Workflows** | Git worktree-based parallelism | [Parallel Workflows](skills/parallel-workflows.md) |
225
- | **GitHub Integration** | Issue import, PR creation, status sync | [GitHub Integration](skills/github-integration.md) |
226
- | **Distribution** | npm, Homebrew, Docker installation | [Installation Guide](docs/INSTALLATION.md) |
227
- | **Research Foundation** | OpenAI, DeepMind, Anthropic patterns | [Acknowledgements](docs/ACKNOWLEDGEMENTS.md) |
228
- | **Benchmarks** | HumanEval and SWE-bench infrastructure included | [Benchmark Harness](benchmarks/) |
229
- | **Comparisons** | vs Auto-Claude, Cursor | [Auto-Claude](docs/auto-claude-comparison.md), [Cursor](docs/cursor-comparison.md) |
45
+ ## Quick Start
230
46
 
231
- ---
47
+ **Requirements:** Node.js 18+, Python 3.8+, macOS/Linux/WSL2, and at least one AI CLI (Claude Code, Codex, or Gemini).
232
48
 
233
- ## Enterprise Security & Compliance (v5.36.0-v5.38.0)
234
-
235
- Loki Mode now includes production-ready security and compliance features for enterprise deployments:
236
-
237
- ### **Authentication & Authorization**
238
- - **TLS/HTTPS Encryption**: Self-signed certificates for dashboard encryption (v5.36.0)
239
- - **OIDC/SSO Integration**: Support for Google, Azure AD, and Okta authentication (v5.36.0)
240
- - **RBAC Roles**: Four-tier role system (v5.37.0)
241
- - **Admin**: Full control, configuration changes, user management
242
- - **Operator**: Start/stop sessions, modify tasks, execute actions
243
- - **Viewer**: Read-only dashboard access, view logs and metrics
244
- - **Auditor**: Access audit logs, compliance reports, security events
245
-
246
- ### **Observability & Monitoring**
247
- - **Prometheus/OpenMetrics**: `/metrics` endpoint for production monitoring (v5.38.0)
248
- - Task completion rates, agent performance, memory usage
249
- - Integration with Grafana, Datadog, New Relic
250
- - **Audit Trail**: SHA-256 integrity chain for all agent actions (v5.37.0)
251
- - Tamper-evident logging with cryptographic verification
252
- - Complete action history: who did what, when, and why
253
- - **SIEM Integration**: Syslog forwarding (RFC 5424) for enterprise security (v5.38.0)
254
- - Send logs to Splunk, QRadar, ArcSight, Elastic SIEM
255
- - Real-time security event detection and alerting
256
-
257
- ### **Development Safety**
258
- - **Branch Protection**: Auto-create feature branches for all PR work (v5.37.0)
259
- - Never commits directly to main/master
260
- - Automatic branch naming: `loki/feature/<task-id>`
261
- - Clean merge workflow with squash commits
262
- - **OpenClaw Bridge**: Multi-agent coordination protocol integration (v5.38.0)
263
- - Standardized inter-agent communication
264
- - Cross-system orchestration support
265
-
266
- ### **Quick Start (Enterprise Mode)**
49
+ ### CLI Mode
267
50
 
268
51
  ```bash
269
- # Enable TLS/HTTPS
270
- export LOKI_TLS_ENABLED=true
271
- export LOKI_TLS_CERT=/path/to/cert.pem
272
- export LOKI_TLS_KEY=/path/to/key.pem
273
-
274
- # Configure OIDC
275
- export LOKI_OIDC_PROVIDER=google
276
- export LOKI_OIDC_CLIENT_ID=your-client-id
277
- export LOKI_OIDC_CLIENT_SECRET=your-client-secret
278
-
279
- # Enable audit logging
280
- export LOKI_AUDIT_ENABLED=true
281
- export LOKI_AUDIT_INTEGRITY_CHECK=true
282
-
283
- # Enable Prometheus metrics
284
- export LOKI_METRICS_ENABLED=true
285
-
286
- # Start with enterprise features
287
- loki start --enterprise ./my-prd.md
52
+ npm install -g loki-mode
53
+ loki doctor # verify environment
54
+ loki start ./prd.md # uses Claude Code by default
288
55
  ```
289
56
 
290
- For detailed configuration, see [docs/network-security.md](docs/network-security.md), [docs/authentication.md](docs/authentication.md), and [docs/authorization.md](docs/authorization.md).
291
-
292
- ---
293
-
294
- ## Dashboard & Real-Time Monitoring
295
-
296
- Monitor your autonomous startup being built in real-time through the Loki Mode dashboard:
297
-
298
- ### **Agent Monitoring**
299
-
300
- <img width="1200" alt="Loki Mode Dashboard - Active Agents" src="docs/screenshots/dashboard-agents.png" />
301
-
302
- **Track all active agents in real-time:**
303
- - **Agent ID** and **Type** (frontend, backend, QA, DevOps, etc.)
304
- - **Model Badge** (Sonnet, Haiku, Opus) with color coding
305
- - **Current Work** being performed
306
- - **Runtime** and **Tasks Completed**
307
- - **Status** (active, completed)
308
-
309
- ### **Task Queue Visualization**
310
-
311
- <img width="1200" alt="Loki Mode Dashboard - Task Queue" src="docs/screenshots/dashboard-tasks.png" />
312
-
313
- **Four-column kanban view:**
314
- - **Pending**: Queued tasks waiting for agents
315
- - **In Progress**: Currently being worked on
316
- - **Completed**: Successfully finished (shows last 10)
317
- - **Failed**: Tasks requiring attention
318
-
319
- ### **Live Status Monitor**
57
+ ### Interactive Mode (inside Claude Code)
320
58
 
321
59
  ```bash
322
- # Watch status updates in terminal
323
- watch -n 2 cat .loki/STATUS.txt
324
- ```
325
-
60
+ claude --dangerously-skip-permissions
61
+ # Then type: "Loki Mode" or "Loki Mode with PRD at ./my-prd.md"
326
62
  ```
327
- ╔════════════════════════════════════════════════════════════════╗
328
- ║ LOKI MODE STATUS ║
329
- ╚════════════════════════════════════════════════════════════════╝
330
-
331
- Phase: DEVELOPMENT
332
63
 
333
- Active Agents: 47
334
- ├─ Engineering: 18
335
- ├─ Operations: 12
336
- ├─ QA: 8
337
- └─ Business: 9
64
+ This is the easiest way to try it if you already have Claude Code installed. No separate `loki` CLI installation needed.
338
65
 
339
- Tasks:
340
- ├─ Pending: 10
341
- ├─ In Progress: 47
342
- ├─ Completed: 203
343
- └─ Failed: 0
344
-
345
- Last Updated: 2026-01-04 20:45:32
346
- ```
66
+ ### What Happens
347
67
 
348
- **Access the dashboard:**
349
- ```bash
350
- # Automatically starts when running autonomously
351
- ./autonomy/run.sh ./docs/requirements.md
68
+ The system classifies your PRD complexity, assembles an agent team, and runs RARV cycles with 9 quality gates. Output is committed to a Git repo with source code, tests, deployment configs, and audit logs. The dashboard auto-starts at `http://localhost:57374` for real-time monitoring, or use `loki status` from the terminal.
352
69
 
353
- # Or open manually
354
- open http://localhost:57374
355
- # HTTPS mode (v5.36.0+):
356
- open https://localhost:57374
357
- ```
70
+ **Other install methods:** Homebrew (`brew tap asklokesh/tap && brew install loki-mode`), Docker, Git clone, VS Code Extension. See [Installation Guide](docs/INSTALLATION.md).
358
71
 
359
- The dashboard at `http://localhost:57374` (or `https://localhost:57374` with TLS enabled) auto-refreshes via WebSocket. Works with any modern browser.
72
+ **Cost:** Loki Mode uses your AI provider's API. Simple projects typically consume modest token usage; complex projects with parallel agents use more. Monitor token economics with `loki memory economics`. See [Token Economics](references/memory-system.md) for details.
360
73
 
361
74
  ---
362
75
 
363
- ## Autonomous Capabilities
76
+ ## Presentation
364
77
 
365
- ### **RARV Cycle: Reason-Act-Reflect-Verify**
78
+ ![Loki Mode Presentation](docs/loki-mode-presentation.gif)
366
79
 
367
- Loki Mode doesn't just write code—it **thinks, acts, learns, and verifies**:
80
+ *9 slides: Problem, Solution, 41 Agents, RARV Cycle, Benchmarks, Multi-Provider, Full Lifecycle* | **[Download PPTX](docs/loki-mode-presentation.pptx)**
368
81
 
369
- ```
370
- 1. REASON
371
- └─ Read .loki/CONTINUITY.md including "Mistakes & Learnings"
372
- └─ Check .loki/state/ and .loki/queue/
373
- └─ Identify next task or improvement
374
-
375
- 2. ACT
376
- └─ Execute task, write code
377
- └─ Commit changes atomically (git checkpoint)
378
-
379
- 3. REFLECT
380
- └─ Update .loki/CONTINUITY.md with progress
381
- └─ Update state files
382
- └─ Identify NEXT improvement
383
-
384
- 4. VERIFY
385
- └─ Run automated tests (unit, integration, E2E)
386
- └─ Check compilation/build
387
- └─ Verify against spec
388
-
389
- IF VERIFICATION FAILS:
390
- ├─ Capture error details (stack trace, logs)
391
- ├─ Analyze root cause
392
- ├─ UPDATE "Mistakes & Learnings" in CONTINUITY.md
393
- ├─ Rollback to last good git checkpoint if needed
394
- └─ Apply learning and RETRY from REASON
395
- ```
82
+ ---
396
83
 
397
- **Result:** Improved quality through continuous self-verification and multi-reviewer code review.
84
+ ## Architecture
398
85
 
399
- ### **Perpetual Improvement Mode**
86
+ <img width="5989" height="2875" alt="image" src="https://github.com/user-attachments/assets/c9798120-9587-4847-8e8d-8f421f984dfc" />
400
87
 
401
- There is **NEVER** a "finished" state. After completing the PRD, Loki Mode:
402
- - Runs performance optimizations
403
- - Adds missing test coverage
404
- - Improves documentation
405
- - Refactors code smells
406
- - Updates dependencies
407
- - Enhances user experience
408
- - Implements A/B test learnings
409
88
 
410
- **It keeps going until you stop it.**
89
+ *Fallback: PRD -> Classifier -> Agent Team (41 types, 8 swarms) -> RARV Cycle <-> Memory System -> Quality Gates (pass/fail loop) -> Output*
411
90
 
412
- ### **Auto-Resume & Self-Healing**
91
+ See [full architecture documentation](docs/enterprise/architecture.md) for the detailed view.
413
92
 
414
- **Rate limits?** Exponential backoff and automatic resume.
415
- **Errors?** Circuit breakers, dead letter queues, retry logic.
416
- **Interruptions?** State checkpoints every 5 seconds—just restart.
93
+ **Key components:**
417
94
 
418
- ```bash
419
- # Start autonomous mode
420
- ./autonomy/run.sh ./docs/requirements.md
421
-
422
- # Hit rate limit? Script automatically:
423
- # ├─ Saves state checkpoint
424
- # ├─ Waits with exponential backoff (60s → 120s → 240s...)
425
- # ├─ Resumes from exact point
426
- # └─ Continues until completion or max retries (default: 50)
427
- ```
95
+ - **RARV Cycle** -- Reason-Act-Reflect-Verify with self-correction on failure. [Core Workflow](references/core-workflow.md)
96
+ - **41 Agent Types** -- 8 swarms auto-composed by PRD complexity. [Agent Types](references/agent-types.md)
97
+ - **9 Quality Gates** -- Blind review, anti-sycophancy, severity blocking, mock/mutation detection. [Quality Gates](skills/quality-gates.md)
98
+ - **Memory System** -- Episodic, semantic, procedural tiers with progressive disclosure. [Memory Architecture](references/memory-system.md)
99
+ - **Dashboard** -- Real-time monitoring, API v2, WebSocket at port 57374. [Dashboard Guide](docs/dashboard-guide.md)
100
+ - **Enterprise Layer** -- OTEL, policy engine, audit trails, RBAC, SSO (requires env var activation). [Enterprise Guide](docs/enterprise/architecture.md)
428
101
 
429
102
  ---
430
103
 
431
- ## Quick Start
432
-
433
- ### **1. Write a PRD**
434
-
435
- ```markdown
436
- # Product: AI-Powered Todo App
437
-
438
- ## Overview
439
- Build a todo app with AI-powered task suggestions and deadline predictions.
440
-
441
104
  ## Features
442
- - User authentication (email/password)
443
- - Create, read, update, delete todos
444
- - AI suggests next tasks based on patterns
445
- - Smart deadline predictions
446
- - Mobile-responsive design
447
-
448
- ## Tech Stack
449
- - Next.js 14 with TypeScript
450
- - PostgreSQL database
451
- - OpenAI API for suggestions
452
- - Deploy to Vercel
453
- ```
454
-
455
- Save as `my-prd.md`.
456
105
 
457
- ### **2. Run It**
458
-
459
- ```bash
460
- loki start ./my-prd.md
461
- ```
106
+ | Category | Highlights | Docs |
107
+ |---|---|---|
108
+ | **Agents** | 41 types across 8 swarms, auto-composed by PRD complexity | [Agent Types](references/agent-types.md) |
109
+ | **Quality** | 9 gates: blind review, anti-sycophancy, mock/mutation detection | [Quality Gates](skills/quality-gates.md) |
110
+ | **Dashboard** | Real-time monitoring, API v2, WebSocket, auto-starts with `loki start` | [Dashboard Guide](docs/dashboard-guide.md) |
111
+ | **Memory** | 3-tier (episodic/semantic/procedural), knowledge graph, vector search | [Memory System](references/memory-system.md) |
112
+ | **Providers** | Claude (full), Codex (sequential), Gemini (sequential) | [Provider Guide](skills/providers.md) |
113
+ | **Enterprise** | TLS, OIDC/SSO, RBAC, OTEL, policy engine, audit trails | [Enterprise Guide](docs/enterprise/architecture.md) |
114
+ | **Integrations** | Jira, Slack, Teams, GitHub Actions (Linear: partial) | [Integration Cookbook](docs/enterprise/integration-cookbook.md) |
115
+ | **Deployment** | Helm, Docker Compose, Terraform configs (AWS/Azure/GCP) | [Deployment Guide](deploy/helm/README.md) |
116
+ | **SDKs** | Python (`loki-mode-sdk`), TypeScript (`loki-mode-sdk`) | [SDK Guide](docs/enterprise/sdk-guide.md) |
462
117
 
463
- ### **3. Monitor and Walk Away**
118
+ ### Multi-Provider Support
464
119
 
465
- ```bash
466
- loki status # Check progress
467
- loki dashboard # Open web dashboard
468
- ```
120
+ | Provider | Install | Autonomous Flag | Parallel Agents |
121
+ |----------|---------|-----------------|-----------------|
122
+ | Claude Code | `npm i -g @anthropic-ai/claude-code` | `--dangerously-skip-permissions` | Yes (10+) |
123
+ | Codex CLI | `npm i -g @openai/codex` | `--full-auto` | No (sequential) |
124
+ | Gemini CLI | `npm i -g @google/gemini-cli` | `--approval-mode=yolo` | No (sequential) |
469
125
 
470
- Go get coffee. It'll be deployed when you get back.
126
+ Claude gets full features (subagents, parallelization, MCP, Task tool). Codex and Gemini run in sequential mode -- one agent at a time, no Task tool. See [Provider Guide](skills/providers.md) for the full comparison.
471
127
 
472
128
  ---
473
129
 
474
- ## Architecture
475
-
476
- <img width="6961" height="6302" alt="architecture" src="https://github.com/user-attachments/assets/d9954dd2-5cb6-4b1c-8cd2-67f68141dffa" />
477
-
478
-
479
- **Key components:**
480
- - **RARV+C Cycle** -- Reason, Act, Reflect, Verify, Compound. Every iteration follows this loop. Failed verification triggers retry from Reason.
481
- - **Provider Layer** -- Claude Code (full parallel agents, Task tool, MCP), Codex CLI and Gemini CLI (sequential, degraded mode).
482
- - **Agent Swarms** -- 41 specialized agent types across 8 swarms, spawned on demand based on project complexity.
483
- - **Completion Council** -- 3 members vote on whether the project is done. Anti-sycophancy devil's advocate on unanimous votes.
484
- - **Memory System** -- Episodic traces, semantic patterns, procedural skills. Progressive disclosure reduces context usage by 60-80%.
485
- - **Dashboard** -- FastAPI server reading `.loki/` flat files, with real-time web UI for task queue, agents, logs, and council state. Now with TLS/HTTPS, OIDC/SSO, and RBAC (v5.36.0-v5.37.0).
486
- - **Metrics Export** -- Prometheus/OpenMetrics endpoint for production monitoring (v5.38.0).
487
- - **Audit Trail** -- SHA-256 integrity chain for tamper-evident logging of all agent actions (v5.37.0).
488
-
489
- ---
490
-
491
- ## CLI Commands
492
-
493
- The `loki` CLI provides easy access to all Loki Mode features:
130
+ ## CLI
494
131
 
495
132
  | Command | Description |
496
133
  |---------|-------------|
497
- | `loki start [PRD]` | Start Loki Mode with optional PRD file |
498
- | `loki stop` | Stop execution immediately |
499
- | `loki pause` | Pause after current session |
500
- | `loki resume` | Resume paused execution |
134
+ | `loki start [PRD]` | Start with optional PRD file |
135
+ | `loki stop` | Stop execution |
136
+ | `loki pause` / `resume` | Pause/resume after current session |
501
137
  | `loki status` | Show current status |
502
- | `loki dashboard` | Open dashboard in browser |
138
+ | `loki dashboard` | Open web dashboard |
139
+ | `loki doctor` | Check environment and dependencies |
503
140
  | `loki import` | Import GitHub issues as tasks |
504
- | `loki config show` | Show configuration |
505
- | `loki config init` | Create config file from template |
506
- | `loki audit logs` | View audit trail (v5.37.0) |
507
- | `loki audit verify` | Verify log integrity chain (v5.37.0) |
508
- | `loki metrics` | Display Prometheus metrics (v5.38.0) |
509
- | `loki syslog test` | Test SIEM integration (v5.38.0) |
141
+ | `loki memory <cmd>` | Memory system CLI (index, timeline, search, consolidate) |
142
+ | `loki enterprise` | Enterprise feature management (tokens, OIDC) |
510
143
  | `loki version` | Show version |
511
144
 
512
- ### Configuration File
513
-
514
- Create a YAML config file for persistent settings:
515
-
516
- ```bash
517
- # Initialize config
518
- loki config init
519
-
520
- # Or copy template manually
521
- cp ~/.claude/skills/loki-mode/autonomy/config.example.yaml .loki/config.yaml
522
- ```
523
-
524
- Config search order: `.loki/config.yaml` (project) -> `~/.config/loki-mode/config.yaml` (global)
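The search order above (project file first, then the global file) can be sketched as a first-match lookup. This is an illustrative sketch only: the function name `find_config` and its parameters are hypothetical, and the real CLI may resolve paths differently.

```python
from pathlib import Path
from typing import Optional


def find_config(project_dir: Path, home_dir: Path) -> Optional[Path]:
    """Return the first existing config file, preferring the project-level one."""
    candidates = [
        project_dir / ".loki" / "config.yaml",               # project config
        home_dir / ".config" / "loki-mode" / "config.yaml",  # global config
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # no config file found; the caller falls back to defaults
```

A typical call would be `find_config(Path.cwd(), Path.home())`; the directories are passed in explicitly so the lookup is easy to test.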
145
+ Run `loki --help` for all commands. Full reference: [CLI Reference](wiki/CLI-Reference.md) | Configuration: [config.example.yaml](autonomy/config.example.yaml)
525
146
 
526
147
  ---
527
148
 
528
- ## Agent Swarms (41 Types)
529
-
530
- Loki Mode has **41 predefined agent types** organized into **8 specialized swarms**. The orchestrator spawns only what you need -- simple projects typically use 5-10 agents, complex ones may use more.
531
-
532
- <img width="5309" height="979" alt="Agent Swarms Visualization" src="https://github.com/user-attachments/assets/7d18635d-a606-401f-8d9f-430e6e4ee689" />
533
-
534
- ### **Engineering (8 types)**
535
- `eng-frontend` `eng-backend` `eng-database` `eng-mobile` `eng-api` `eng-qa` `eng-perf` `eng-infra`
536
-
537
- ### **Operations (8 types)**
538
- `ops-devops` `ops-sre` `ops-security` `ops-monitor` `ops-incident` `ops-release` `ops-cost` `ops-compliance`
539
-
540
- ### **Business (8 types)**
541
- `biz-marketing` `biz-sales` `biz-finance` `biz-legal` `biz-support` `biz-hr` `biz-investor` `biz-partnerships`
542
-
543
- ### **Data (3 types)**
544
- `data-ml` `data-eng` `data-analytics`
545
-
546
- ### **Product (3 types)**
547
- `prod-pm` `prod-design` `prod-techwriter`
548
-
549
- ### **Growth (4 types)**
550
- `growth-hacker` `growth-community` `growth-success` `growth-lifecycle`
551
-
552
- ### **Review (3 types)**
553
- `review-code` `review-business` `review-security`
554
-
555
- ### **Orchestration (4 types)**
556
- `orch-planner` `orch-sub-planner` `orch-judge` `orch-coordinator`
557
-
558
- See [Agent Types](references/agent-types.md) for the full list of 41 specialized agents with detailed capabilities.
559
-
560
- ---
561
-
562
- ## How It Works
563
-
564
- ### **Skill Architecture (v3.0+)**
565
-
566
- Loki Mode uses a **progressive disclosure architecture** to minimize context usage:
567
-
568
- ```
569
- SKILL.md (~190 lines) # Always loaded: core RARV cycle, autonomy rules
570
- skills/
571
- 00-index.md # Module routing table
572
- agents.md # Agent dispatch, A2A patterns
573
- production.md # HN patterns, batch processing, CI/CD
574
- quality-gates.md # Review system, severity handling
575
- testing.md # Playwright, E2E, property-based
576
- model-selection.md # Task tool, parallelization
577
- artifacts.md # Code generation patterns
578
- patterns-advanced.md # Constitutional AI, debate
579
- troubleshooting.md # Error recovery, fallbacks
580
- references/ # Deep documentation (23KB+ files)
581
- ```
149
+ ## Enterprise
582
150
 
583
- **Why this matters:**
584
- - Original 1,517-line SKILL.md consumed ~15% of context before any work began
585
- - Now only ~1% of context for core skill + on-demand modules
586
- - More room for actual code and reasoning
587
-
588
- ### **Phase Execution**
589
-
590
- | Phase | Description |
591
- |-------|-------------|
592
- | **0. Bootstrap** | Create `.loki/` directory structure, initialize state |
593
- | **1. Discovery** | Parse PRD, competitive research via web search |
594
- | **2. Architecture** | Tech stack selection with self-reflection |
595
- | **3. Infrastructure** | Provision cloud, CI/CD, monitoring |
596
- | **4. Development** | Implement with TDD, parallel code review |
597
- | **5. QA** | 9 quality gates, security audit, load testing |
598
- | **6. Deployment** | Blue-green deploy, auto-rollback on errors |
599
- | **7. Business** | Marketing, sales, legal, support setup |
600
- | **8. Growth** | Continuous optimization, A/B testing, feedback loops |
601
-
602
- ### **Parallel Code Review**
603
-
604
- Every code change goes through **3 specialized reviewers simultaneously**:
605
-
606
- ```
607
- IMPLEMENT → REVIEW (parallel) → AGGREGATE → FIX → RE-REVIEW → COMPLETE
608
-
609
- ├─ code-reviewer (Sonnet) - Code quality, patterns, best practices
610
- ├─ business-logic-reviewer (Sonnet) - Requirements, edge cases, UX
611
- └─ security-reviewer (Sonnet) - Vulnerabilities, OWASP Top 10
612
- ```
613
-
614
- **Severity-based issue handling:**
615
- - **Critical/High/Medium**: Block. Fix immediately. Re-review.
616
- - **Low**: Add `// TODO(review): ...` comment, continue.
617
- - **Cosmetic**: Add `// FIXME(nitpick): ...` comment, continue.
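The severity rules above amount to a small triage function: blocking severities stop the pipeline, the rest produce a comment and continue. The sketch below is an assumption about how this could be expressed, not the project's reviewer code; `triage`, `BLOCKING`, and `COMMENT_TEMPLATES` are hypothetical names.

```python
from typing import Optional, Tuple

BLOCKING = {"critical", "high", "medium"}  # must be fixed and re-reviewed
COMMENT_TEMPLATES = {
    "low": "// TODO(review): {msg}",
    "cosmetic": "// FIXME(nitpick): {msg}",
}


def triage(severity: str, message: str) -> Tuple[bool, Optional[str]]:
    """Return (blocks, comment): whether work must stop, and the comment to add if not."""
    level = severity.lower()
    if level in BLOCKING:
        return True, None
    if level in COMMENT_TEMPLATES:
        return False, COMMENT_TEMPLATES[level].format(msg=message)
    raise ValueError(f"unknown severity: {severity!r}")
```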
618
-
619
- ### **Directory Structure**
620
-
621
- ```
622
- .loki/
623
- ├── state/ # Orchestrator and agent states
624
- ├── queue/ # Task queue (pending, in-progress, completed, dead-letter)
625
- ├── memory/ # Episodic, semantic, and procedural memory
626
- ├── metrics/ # Efficiency tracking and reward signals
627
- ├── messages/ # Inter-agent communication
628
- ├── logs/ # Audit logs
629
- ├── audit/ # Audit trail with SHA-256 integrity chain (v5.37.0)
630
- ├── security/ # TLS certificates, OIDC configs (v5.36.0)
631
- ├── rbac/ # Role definitions and permissions (v5.37.0)
632
- ├── config/ # Configuration files
633
- ├── prompts/ # Agent role prompts
634
- ├── artifacts/ # Releases, reports, backups
635
- ├── dashboard/ # Real-time monitoring dashboard
636
- └── scripts/ # Helper scripts
637
- ```
638
-
639
- ### **Memory System (v5.15.0)**
640
-
641
- Complete 3-tier memory architecture with progressive disclosure:
642
-
643
- ```
644
- WORKING MEMORY (CONTINUITY.md)
645
- |
646
- v
647
- EPISODIC MEMORY (.loki/memory/episodic/)
648
- |
649
- v (consolidation)
650
- SEMANTIC MEMORY (.loki/memory/semantic/)
651
- |
652
- v
653
- PROCEDURAL MEMORY (.loki/memory/skills/)
654
- ```
655
-
656
- **Key Features:**
657
- - **Progressive Disclosure**: 3-layer loading (index ~100 tokens, timeline ~500 tokens, full details) reduces context usage by 60-80%
658
- - **Token Economics**: Track discovery vs read tokens, automatic threshold-based optimization
659
- - **Vector Search**: Optional embedding-based similarity search (sentence-transformers)
660
- - **Consolidation Pipeline**: Automatic episodic-to-semantic transformation
661
- - **Task-Aware Retrieval**: Different memory strategies for exploration, implementation, debugging, review, and refactoring
662
-
663
- **CLI Commands:**
664
- ```bash
665
- loki memory index # View index layer
666
- loki memory timeline # View compressed history
667
- loki memory consolidate # Run consolidation pipeline
668
- loki memory economics # View token usage metrics
669
- loki memory retrieve "query" # Test task-aware retrieval
670
- ```
671
-
672
- **API Endpoints:**
673
- - `GET /api/memory/summary` - Memory summary
674
- - `POST /api/memory/retrieve` - Query memories
675
- - `POST /api/memory/consolidate` - Trigger consolidation
676
- - `GET /api/memory/economics` - Token economics
677
-
678
- See [references/memory-system.md](references/memory-system.md) for complete documentation.
679
-
680
- ---
681
-
682
- ## Example PRDs
683
-
684
- Test Loki Mode with these pre-built PRDs in the `examples/` directory:
685
-
686
- | PRD | Complexity | Est. Time | Description |
687
- |-----|------------|-----------|-------------|
688
- | `simple-todo-app.md` | Low | ~10 min | Basic todo app - tests core functionality |
689
- | `api-only.md` | Low | ~10 min | REST API only - tests backend agents |
690
- | `static-landing-page.md` | Low | ~5 min | HTML/CSS only - tests frontend/marketing |
691
- | `full-stack-demo.md` | Medium | ~30-60 min | Complete bookmark manager - full test |
151
+ Enterprise features ship with the package but stay inactive until enabled through environment variables. Self-audit results: 35/45 capabilities working, 0 broken, 1,314 tests passing (683 npm + 631 pytest). 2 items partial, 3 scaffolding (OTEL/policy active only when configured). See [Audit Results](.loki/audit/integrity-audit-v5.52.0.md).
692
152
 
693
153
  ```bash
694
- # Example: Run with simple todo app
695
- ./autonomy/run.sh examples/simple-todo-app.md
696
- ```
697
-
698
- ---
699
-
700
- ## Configuration
701
-
702
- ### **Autonomy Settings**
703
-
704
- Customize the autonomous runner with environment variables:
705
-
706
- ```bash
707
- LOKI_MAX_RETRIES=100 \
708
- LOKI_BASE_WAIT=120 \
709
- LOKI_MAX_WAIT=7200 \
710
- ./autonomy/run.sh ./docs/requirements.md
711
- ```
712
-
713
- | Variable | Default | Description |
714
- |----------|---------|-------------|
715
- | `LOKI_PROVIDER` | claude | AI provider: claude, codex, gemini |
716
- | `LOKI_MAX_RETRIES` | 50 | Maximum retry attempts before giving up |
717
- | `LOKI_BASE_WAIT` | 60 | Base wait time in seconds |
718
- | `LOKI_MAX_WAIT` | 3600 | Maximum wait time (1 hour) |
719
- | `LOKI_SKIP_PREREQS` | false | Skip prerequisite checks |
720
- | `LOKI_TLS_ENABLED` | false | Enable HTTPS/TLS for dashboard (v5.36.0) |
721
- | `LOKI_OIDC_PROVIDER` | - | OIDC provider: google, azure, okta (v5.36.0) |
722
- | `LOKI_RBAC_ENABLED` | false | Enable role-based access control (v5.37.0) |
723
- | `LOKI_AUDIT_ENABLED` | false | Enable audit logging with integrity chain (v5.37.0) |
724
- | `LOKI_METRICS_ENABLED` | false | Enable Prometheus /metrics endpoint (v5.38.0) |
725
- | `LOKI_SYSLOG_ENABLED` | false | Enable syslog forwarding to SIEM (v5.38.0) |
726
- | `LOKI_BRANCH_PROTECTION` | true | Auto-create feature branches (v5.37.0) |
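The table gives a base wait, a maximum wait, and a retry limit but does not state how the wait grows between attempts. The sketch below assumes exponential doubling capped at the maximum, which is a common choice and not confirmed by the source; `wait_seconds` and `settings_from_env` are hypothetical helper names, and only the three variable names and their defaults come from the table.

```python
import os


def wait_seconds(attempt: int, base_wait: int = 60, max_wait: int = 3600) -> int:
    """Delay before retry number `attempt` (1-based), doubling each time up to max_wait."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Cap the exponent so very large attempt numbers do not build huge integers.
    return min(max_wait, base_wait * 2 ** min(attempt - 1, 32))


def settings_from_env(env=os.environ) -> dict:
    """Read the retry settings, falling back to the defaults listed in the table."""
    return {
        "max_retries": int(env.get("LOKI_MAX_RETRIES", "50")),
        "base_wait": int(env.get("LOKI_BASE_WAIT", "60")),
        "max_wait": int(env.get("LOKI_MAX_WAIT", "3600")),
    }
```

Under this assumption the default settings reach the one-hour cap on the seventh retry and stay there for the remaining attempts.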
727
-
728
- ### **Circuit Breakers**
729
-
730
- ```yaml
731
- # .loki/config/circuit-breakers.yaml
732
- defaults:
733
- failureThreshold: 5
734
- cooldownSeconds: 300
154
+ export LOKI_TLS_ENABLED=true
155
+ export LOKI_OIDC_PROVIDER=google
156
+ export LOKI_AUDIT_ENABLED=true
157
+ export LOKI_METRICS_ENABLED=true
158
+ loki enterprise status # check what's enabled
159
+ loki start ./prd.md # enterprise features activate via env vars
735
160
  ```
736
161
 
737
- ### **External Alerting**
738
-
739
- ```yaml
740
- # .loki/config/alerting.yaml
741
- channels:
742
- slack:
743
- webhook_url: "${SLACK_WEBHOOK_URL}"
744
- severity: [critical, high]
745
- pagerduty:
746
- integration_key: "${PAGERDUTY_KEY}"
747
- severity: [critical]
748
- ```
162
+ [Enterprise Architecture](docs/enterprise/architecture.md) | [Security](docs/enterprise/security.md) | [Authentication](docs/authentication.md) | [Authorization](docs/authorization.md) | [Metrics](docs/metrics.md) | [Audit Logging](docs/audit-logging.md) | [SIEM](docs/siem-integration.md)
749
163
 
750
164
  ---
751
165
 
752
- ## Requirements
166
+ ## Benchmarks
753
167
 
754
- - **Claude Code** with `--dangerously-skip-permissions` flag
755
- - **Internet access** for competitive research and deployment
756
- - **Cloud provider credentials** (for deployment phase)
757
- - **Python 3** (for test suite)
168
+ Results come from the included test harness; they are self-reported and not independently verified. Verification scripts are included so you can reproduce them. See [benchmarks/](benchmarks/) for methodology.
758
169
 
759
- **Optional but recommended:**
760
- - Git (for version control and checkpoints)
761
- - Node.js/npm (for dashboard and web projects)
762
- - Docker (for containerized deployments)
170
+ | Benchmark | Result | Notes |
171
+ |-----------|--------|-------|
172
+ | HumanEval | 162/164 (98.78%) | Max 3 retries per problem, RARV self-verification |
173
+ | SWE-bench | 299/300 patches generated | Patch generation only -- SWE-bench evaluator not yet run to confirm resolution |
763
174
 
764
175
  ---
765
176
 
766
- ## Integrations
767
-
768
- ### **Vibe Kanban (Visual Dashboard)**
769
-
770
- Integrate with [Vibe Kanban](https://github.com/BloopAI/vibe-kanban) for a visual kanban board:
771
-
772
- ```bash
773
- # 1. Start Vibe Kanban (terminal 1)
774
- npx vibe-kanban
775
-
776
- # 2. Run Loki Mode (terminal 2)
777
- ./autonomy/run.sh ./prd.md
778
-
779
- # 3. Export tasks to see them in Vibe Kanban (terminal 3)
780
- ./scripts/export-to-vibe-kanban.sh
781
-
782
- # 4. Optional: Auto-sync for real-time updates
783
- ./scripts/vibe-sync-watcher.sh
784
- ```
785
-
786
- **Important:** Vibe Kanban integration requires manual export. Tasks don't automatically appear - you must run the export script to sync.
787
-
788
- **Benefits:**
789
- - Visual progress tracking of all active agents
790
- - Manual intervention/prioritization when needed
791
- - Code review with visual diffs
792
- - Multi-project dashboard
793
-
794
- See [integrations/vibe-kanban.md](integrations/vibe-kanban.md) for complete step-by-step setup guide and troubleshooting.
795
-
796
- ### **OpenClaw Bridge (v5.38.0)**
797
-
798
- Loki Mode now supports the OpenClaw multi-agent coordination protocol for cross-system orchestration:
799
-
800
- ```bash
801
- # Enable OpenClaw bridge
802
- export LOKI_OPENCLAW_ENABLED=true
803
- export LOKI_OPENCLAW_ENDPOINT=http://openclaw-server:8080
804
-
805
- # Start with OpenClaw integration
806
- loki start --openclaw ./prd.md
807
- ```
808
-
809
- **Benefits:**
810
- - Standardized inter-agent communication across different AI systems
811
- - Coordinate with external agent frameworks (AutoGPT, MetaGPT, etc.)
812
- - Share task queues and state between multiple orchestrators
813
- - Cross-platform agent collaboration
177
+ ## Research Foundation
814
178
 
815
- See [docs/openclaw-integration.md](docs/openclaw-integration.md) for complete setup and API reference.
179
+ | Source | What We Use From It |
180
+ |--------|---------------------|
181
+ | [Anthropic: Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) | Evaluator-optimizer pattern, parallelization strategy |
182
+ | [Anthropic: Constitutional AI](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback) | Self-critique against quality principles |
183
+ | [DeepMind: Scalable Oversight via Debate](https://deepmind.google/research/publications/34920/) | Debate-based verification in council review |
184
+ | [DeepMind: SIMA 2](https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/) | Self-improvement loop design |
185
+ | [OpenAI: Agents SDK](https://openai.github.io/openai-agents-python/) | Guardrails, tripwires, tracing patterns |
186
+ | [NVIDIA ToolOrchestra](https://github.com/NVlabs/ToolOrchestra) | Efficiency metrics, reward signal tracking |
187
+ | [CONSENSAGENT (ACL 2025)](https://aclanthology.org/2025.findings-acl.1141/) | Anti-sycophancy checks in blind review |
188
+ | [GoalAct](https://arxiv.org/abs/2504.16563) | Hierarchical planning for complex PRDs |
816
189
 
817
- ---
818
-
819
- ## Testing
190
+ **Practitioner insights:** Boris Cherny -- self-verification loop patterns | Simon Willison -- sub-agents for context isolation | [HN Community](https://news.ycombinator.com/item?id=44623207) -- production patterns from real deployments
820
191
 
821
- Run the comprehensive test suite:
822
-
823
- ```bash
824
- # Run all tests
825
- ./tests/run-all-tests.sh
826
-
827
- # Or run individual test suites
828
- ./tests/test-bootstrap.sh # Directory structure, state init
829
- ./tests/test-task-queue.sh # Queue operations, priorities
830
- ./tests/test-circuit-breaker.sh # Failure handling, recovery
831
- ./tests/test-agent-timeout.sh # Timeout, stuck process handling
832
- ./tests/test-state-recovery.sh # Checkpoints, recovery
833
- ```
192
+ **[Full Acknowledgements](docs/ACKNOWLEDGEMENTS.md)** -- 50+ research papers, articles, and resources
834
193
 
835
194
  ---
836
195
 
837
196
  ## Contributing
838
197
 
839
- Contributions welcome! Please:
840
- 1. Read [SKILL.md](SKILL.md) to understand the core architecture
841
- 2. Review [skills/00-index.md](skills/00-index.md) for module organization (v3.0+)
842
- 3. Check [references/agent-types.md](references/agent-types.md) for agent definitions
843
- 4. Open an issue for bugs or feature requests
844
- 5. Submit PRs with clear descriptions and tests
845
-
846
- **Dev setup:**
847
198
  ```bash
848
199
  git clone https://github.com/asklokesh/loki-mode.git && cd loki-mode
849
- npm install # Install dependencies
850
- bash -n autonomy/run.sh # Validate shell scripts
851
- cd dashboard-ui && npm ci && npm run build:all # Build dashboard
200
+ npm install && npm test # 683 tests, ~10 sec
201
+ python3 -m pytest # 631 tests, ~3 sec
202
+ bash tests/run-all-tests.sh # shell tests, ~2 min
852
203
  ```
853
204
 
854
- See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed development guidelines.
855
-
856
- ---
205
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
857
206
 
858
207
  ## License
859
208
 
860
- MIT License - see [LICENSE](LICENSE) for details.
861
-
862
- ---
863
-
864
- ## Acknowledgments
865
-
866
- Loki Mode incorporates research and patterns from leading AI labs and practitioners:
867
-
868
- ### Research Foundation
869
-
870
- | Source | Key Contribution |
871
- |--------|------------------|
872
- | [Anthropic: Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) | Evaluator-optimizer pattern, parallelization |
873
- | [Anthropic: Constitutional AI](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback) | Self-critique against principles |
874
- | [DeepMind: Scalable Oversight via Debate](https://deepmind.google/research/publications/34920/) | Debate-based verification |
875
- | [DeepMind: SIMA 2](https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/) | Self-improvement loop |
876
- | [OpenAI: Agents SDK](https://openai.github.io/openai-agents-python/) | Guardrails, tripwires, tracing |
877
- | [NVIDIA ToolOrchestra](https://github.com/NVlabs/ToolOrchestra) | Efficiency metrics, reward signals |
878
- | [CONSENSAGENT (ACL 2025)](https://aclanthology.org/2025.findings-acl.1141/) | Anti-sycophancy, blind review |
879
- | [GoalAct](https://arxiv.org/abs/2504.16563) | Hierarchical planning |
880
-
881
- ### Practitioner Insights
882
-
883
- - **Boris Cherny** (Claude Code creator) - Self-verification loop, extended thinking
884
- - **Simon Willison** - Sub-agents for context isolation, skills system
885
- - **Hacker News Community** - [Production patterns](https://news.ycombinator.com/item?id=44623207) from real deployments
886
-
887
- ### Inspirations
888
-
889
- - [LerianStudio/ring](https://github.com/LerianStudio/ring) - Subagent-driven-development pattern
890
- - [Awesome Agentic Patterns](https://github.com/nibzard/awesome-agentic-patterns) - 105+ production patterns
891
-
892
- **[Full Acknowledgements](docs/ACKNOWLEDGEMENTS.md)** - Complete list of 50+ research papers, articles, and resources
893
-
894
- Built for the [Claude Code](https://claude.ai) ecosystem, powered by Anthropic's Claude models (Sonnet, Haiku, Opus).
895
-
896
- ---
897
-
898
- ## Autonomi
899
-
900
- Loki Mode is the flagship product of **[Autonomi](https://www.autonomi.dev/)** -- a platform for autonomous AI systems. As Alphabet is to Google, Autonomi is the parent brand under which Loki Mode and future products operate.
901
-
902
- **Why Autonomi?** Loki Mode proved that multi-agent autonomous systems can build real software from a PRD with minimal human intervention. Autonomi is the expansion of that vision into a broader platform of autonomous services and products.
903
-
904
- - **[autonomi.dev](https://www.autonomi.dev/)** -- Main website
905
- - **[Documentation](https://www.autonomi.dev/docs)** -- Full documentation
906
- - **Loki Mode** -- Autonomous multi-agent startup system (this repo)
907
- - More products coming soon
908
-
909
- ---
910
-
911
- **Ready to build a startup while you sleep?**
912
-
913
- ```bash
914
- git clone https://github.com/asklokesh/loki-mode.git ~/.claude/skills/loki-mode
915
- ./autonomy/run.sh your-prd.md
916
- ```
209
+ MIT -- see [LICENSE](LICENSE).
917
210
 
918
211
  ---
919
212
 
920
- **Keywords:** autonomi, loki-mode, claude-code, claude-skills, ai-agents, autonomous-development, multi-agent-system, sdlc-automation, startup-automation, devops, mlops, deployment-automation, self-healing, perpetual-improvement
213
+ [Autonomi](https://www.autonomi.dev/) | [Documentation](wiki/Home.md) | [Changelog](CHANGELOG.md) | [Installation](docs/INSTALLATION.md) | [Comparisons](references/competitive-analysis.md)