loki-mode 7.45.1 → 7.46.0
- package/README.md +16 -12
- package/SKILL.md +5 -5
- package/VERSION +1 -1
- package/autonomy/CONSTITUTION.md +9 -2
- package/autonomy/lib/sentrux-gate.sh +1 -1
- package/autonomy/loki +2 -2
- package/autonomy/run.sh +355 -92
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +9 -10
- package/docs/COMPARISON.md +10 -10
- package/docs/COMPETITIVE-ANALYSIS.md +1 -1
- package/docs/INSTALLATION.md +2 -2
- package/docs/P0-SWEEP-PLAN.md +163 -0
- package/docs/architecture/STATE-MACHINES.md +18 -19
- package/docs/architecture/bmad-loki-voice-agent-council-analysis.md +1 -1
- package/docs/auto-claude-comparison.md +14 -11
- package/docs/certification/01-core-concepts/lesson.md +12 -11
- package/docs/certification/01-core-concepts/quiz.md +6 -6
- package/docs/certification/05-troubleshooting/lesson.md +23 -13
- package/docs/certification/05-troubleshooting/quiz.md +3 -3
- package/docs/certification/answer-key.md +2 -2
- package/docs/certification/certification-exam.md +9 -9
- package/docs/competitive/bolt-new-analysis.md +1 -1
- package/docs/competitive/emergence-others-analysis.md +9 -9
- package/docs/competitive/replit-lovable-analysis.md +3 -3
- package/docs/cursor-comparison.md +15 -12
- package/docs/dashboard-guide.md +9 -7
- package/docs/prd-purple-lab-platform-v2.md +1 -1
- package/docs/prd-purple-lab-platform.md +3 -3
- package/docs/show-hn-post.md +2 -2
- package/loki-ts/dist/loki.js +2 -2
- package/mcp/__init__.py +1 -1
- package/package.json +2 -2
- package/plugins/loki-mode/.claude-plugin/plugin.json +2 -2
- package/plugins/loki-mode/README.md +1 -1
- package/references/magic-rarv-integration.md +1 -1
- package/references/quality-control.md +5 -5
- package/references/sdlc-phases.md +1 -2
- package/skills/00-index.md +1 -1
- package/skills/artifacts.md +1 -1
- package/skills/healing.md +1 -1
- package/skills/magic-modules.md +3 -3
- package/skills/quality-gates.md +52 -39
- package/skills/testing.md +1 -1
package/docs/COMPARISON.md
CHANGED
@@ -14,8 +14,8 @@
 | **Type** | Skill/Framework | Enterprise Platform | Standalone Agent | Cloud Agent | AI IDE | CLI Agent | AI IDE | AI IDE | Cloud Agent | AI IDE (OSS) |
 | **Autonomy Level** | High (minimal human) | High | Full | High | Medium-High | High | High | High | High | High |
 | **Max Runtime** | Unlimited | Async/Scheduled | Hours | Per-task | Session | Session | Days | Async | Per-task | Session |
-| **Pricing** | Free (
-| **
+| **Pricing** | Free (source-available) | Enterprise | $20/mo | ChatGPT Plus | $20/mo | API costs | Free preview | Free preview | $19/mo | Free (OSS) |
+| **Source model** | Source-available (BUSL-1.1) | No | No | No | No | No | No | No | No | Yes |
 | **GitHub Stars** | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 70.9k |
 
 ---
@@ -37,7 +37,7 @@
 |---------|--------------|-----------|-----------|------------|----------|-----------------|--------------|--------------|
 | **Code Review** | 3 blind reviewers + devil's advocate | Basic | Basic | BugBot PR | Property-based | Artifacts | Doc/Review | Basic |
 | **Anti-Sycophancy** | Yes (CONSENSAGENT) | No | No | No | No | No | No | No |
-| **Quality Gates** |
+| **Quality Gates** | 8 gates + PBT | Basic | Sandbox | Tests | Spec validation | Artifact checks | Tests | Permissions |
 | **Constitutional AI** | Yes (principles) | No | Refusal training | No | No | No | No | No |
 
 ---
@@ -146,10 +146,10 @@
 
 | Feature | **Zencoder** | **Loki Mode** | **Assessment** |
 |---------|-------------|---------------|----------------|
-| **Four Pillars** | Structured Workflows, SDD, Multi-Agent Verification, Parallel Execution | SDLC + RARV +
+| **Four Pillars** | Structured Workflows, SDD, Multi-Agent Verification, Parallel Execution | SDLC + RARV + 8 Gates + Worktrees | TIE |
 | **Spec-Driven Dev** | Specs as first-class objects | OpenAPI-first | TIE |
 | **Multi-Agent Verification** | Model diversity (Claude vs OpenAI, 54% improvement) | 3 blind reviewers + devil's advocate | Different approach (N/A for Claude Code - only Claude models) |
-| **Quality Gates** | Built-in verification loops |
+| **Quality Gates** | Built-in verification loops | 8 explicit gates + anti-sycophancy | **Loki Mode** |
 | **Memory System** | Not documented | 3-tier episodic/semantic/procedural | **Loki Mode** |
 | **Agent Specialization** | Custom Zen Agents | 41 pre-defined specialized agent roles | **Loki Mode** |
 | **CI Failure Analysis** | Explicit pattern with auto-fix | DevOps agent only | **ADOPTED from Zencoder** |
@@ -178,7 +178,7 @@
 
 ### Where Loki Mode EXCEEDS Zencoder
 
-1. **Quality Control**:
+1. **Quality Control**: 8 explicit gates + blind review + devil's advocate vs built-in loops
 2. **Memory System**: 3-tier (episodic/semantic/procedural) with cross-project learning
 3. **Agent Specialization**: 41 pre-defined specialized agent roles across 8 domains
 4. **Anti-Sycophancy**: CONSENSAGENT patterns prevent reviewer groupthink
@@ -207,7 +207,7 @@
 | **Skills** | Progressive disclosure | 6 slash commands | N/A | 129 skills | N/A | 35 skills | Memory focus |
 | **Multi-Provider** | Yes (Claude/Codex/Gemini) | 3 CLIs (separate) | No | No | No | No | No |
 | **Memory System** | 3-tier (episodic/semantic/procedural) | None | N/A | N/A | Hybrid | N/A | SQLite+FTS5 |
-| **Quality Gates** |
+| **Quality Gates** | 8 gates + Completion Council | User verify only | Two-Stage Review | N/A | Consensus | Tiered | N/A |
 | **Context Mgmt** | Standard | Fresh per task (core innovation) | Fresh per task | N/A | N/A | N/A | Progressive |
 | **Autonomy** | High (minimal human) | Semi (checkpoints) | Human-guided | Human-guided | Orchestrated | Human-guided | N/A |
 
@@ -232,7 +232,7 @@ These are patterns from competing projects that are **practically and scientific
 |----------|---------|-------------------------|
 | **Multi-Provider Support** | Only skill supporting Claude, Codex, and Gemini with graceful degradation | All 8 competitors are Claude-only |
 | **RARV Cycle** | Reason-Act-Reflect-Verify is more rigorous than Plan-Execute | Most use simple Plan-Execute |
-| **
+| **8-Gate Quality System** | Static analysis + test suite (pass/fail) + 3 blind reviewers with severity blocking + devil's advocate + mock-integrity + test-mutation + documentation coverage + Magic Modules debate (backward-compat is a conditional healing auditor) + Phase 1 closure | Superpowers has 2-stage, others have less |
 | **Constitutional AI Integration** | Principles-based self-critique from Anthropic research | None have this |
 | **Anti-Sycophancy (CONSENSAGENT)** | Blind review + devil's advocate prevents groupthink | None have this |
 | **Provider Abstraction Layer** | Clean degradation from full-featured to sequential-only | Claude-only projects can't degrade |
@@ -359,12 +359,12 @@ Tiered agent architecture with explicit escalation:
 |-----------|-------------------|
 | **Autonomy** | Designed for high autonomy with minimal human intervention |
 | **Multi-Agent** | 41 prompt-defined agent roles in 8 domains adopted per phase (parallel review council + optional worktree streams on Claude, sequential elsewhere) vs 1-8 in competitors, with all output gated by blind review + council |
-| **Quality** |
+| **Quality** | 8 gates + blind review + devil's advocate + property-based testing |
 | **Research** | 10+ academic papers integrated vs proprietary/undisclosed |
 | **Anti-Sycophancy** | Only agent with CONSENSAGENT-based blind review |
 | **Memory** | 3-tier memory (episodic/semantic/procedural) + review learning + cross-project |
 | **Transformation** | Code migration workflows (language, database, framework) |
-| **Cost** | Free (
+| **Cost** | Free (source-available, BUSL-1.1) vs $20-500/month |
 | **Customization** | Full source access vs black box |
 
 ---
@@ -20,7 +20,7 @@ GSD is the closest competitor -- a context engineering system that spawns fresh
 | Adoption | 594 stars, 6K/wk npm | 11,903 stars, 21K/wk npm | GSD (20x) |
 | Simplicity | Complex (5.4K-line run.sh, 12 Python modules) | Simple (markdown agents + slash commands) | GSD |
 | Full autonomy | Walk away, come back to deployed product | Human checkpoints at discuss/verify/milestone | Loki |
-| Quality gates |
+| Quality gates | 8-gate + Completion Council + anti-sycophancy | User verification only | Loki |
 | Memory system | Episodic/semantic/procedural + vector search | None | Loki |
 | Context management | Standard | Fresh subagent contexts per task (core innovation) | GSD |
 | Time to value | Learn architecture, understand CLI flags | `npx get-shit-done-cc` and go | GSD |
package/docs/INSTALLATION.md
CHANGED
@@ -2,7 +2,7 @@
 
 The flagship product of [Autonomi](https://www.autonomi.dev/). Loki Mode is a spec-driven autonomous builder with a built-in trust layer that takes any spec to a deployed product and verifies completion with evidence (quality gates plus a completion council), not just a "done" claim. Complete installation instructions for all platforms and use cases.
 
-**Version:** v7.
+**Version:** v7.46.0
 
 ---
 
@@ -389,7 +389,7 @@ provider works inside the container. Provide auth with your Anthropic API key:
 # Run Loki Mode in Docker (Claude provider, API-key auth)
 docker run --rm -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
 -v $(pwd):/workspace -w /workspace \
-asklokesh/loki-mode:7.
+asklokesh/loki-mode:7.46.0 start ./my-spec.md
 ```
 
 ##### docker compose + .env (no host install)
@@ -0,0 +1,163 @@
+# P0 Verification-Credibility Sweep -- Architecture Plan
+
+Persisted from the Architect (opus). Every line number re-verified by grep.
+Goal: make Loki's verification layer honest and real. A hollow wedge is
+existential for a "proof of done" product. Fix or remove every false/hollow gate
+claim, wire the unwired detectors, make anti-sycophancy act.
+
+## 0. Verified ground truth
+
+- P0-1: enforce_test_coverage() at autonomy/run.sh:7031. `local coverage_pct=0`
+at 7038 is never reassigned; no coverage tool invoked. 7257 emits min_coverage
+(the threshold), not a measured value. Gate decides purely on test_passed.
+- P0-2: skills/quality-gates.md:5-17 lists 11 gates; gates 1 (Input Guardrails)
+and 5 (Output Guardrails) have NO gate function. wiki/Quality-Gates.md:14-28
+duplicates. (21 'guardrail' refs in autonomy/ are CLI help/comments/flags.)
+- P0-3: tests/detect-mock-problems.sh + tests/detect-test-mutations.sh invoked
+0 times in autonomy/run.sh. quality-gates.md:74-77 claims HIGH=FAIL.
+- P0-4: anti-sycophancy block run.sh:8316-8323 only logs + writes
+anti-sycophancy.txt. No Devil's-Advocate re-review. INERT. Bun mirror
+loki-ts/src/runner/quality_gates.ts:804-808 equally inert.
+- Gate inventory: phantom (Input/Output Guardrails); wired-but-unlisted
+(run_magic_debate_gate at run.sh:14067); "Gate 10 Backward Compat" is the
+legacy-healing-auditor SPECIALIST (run.sh:7875-7979), conditional, not a loop
+gate; "Gate 6 Severity Blocking" is the block policy inside code review, not a
+function.
+
+### Functions actually invoked in orchestration (run.sh:13938-14084)
+enforce_static_analysis (13945); enforce_test_coverage (13967); run_code_review
+(13987); run_doc_quality_gate (14058); run_magic_debate_gate (14070); plus
+conditional legacy-healing-auditor reviewer.
+
+## 1. Canonical final gate list (THE CONTRACT -- docs transcribe, never recompute)
+
+Honest count after this sweep: 8 gates.
+
+| # | Gate | Function / mechanism | Blocking | Opt-out flag |
+|---|------|---------------------|----------|--------------|
+| 1 | Static Analysis | enforce_static_analysis (run.sh:6699) | Yes (ladder) | PHASE_STATIC_ANALYSIS=false |
+| 2 | Test Suite (pass/fail) | enforce_test_coverage (run.sh:7031) | Yes (red blocks) | PHASE_UNIT_TESTS=false |
+| 3 | Blind Code Review (3-reviewer council + severity blocking) | run_code_review (run.sh:7788) | Yes (Crit/High block) | PHASE_CODE_REVIEW=false |
+| 4 | Anti-Sycophancy / Devil's Advocate (on unanimous PASS) | run_code_review sub-step (run.sh:8316+) | Yes (DA Crit/High block) | LOKI_GATE_DEVILS_ADVOCATE=false |
+| 5 | Mock Integrity Detector | enforce_mock_integrity -> tests/detect-mock-problems.sh | Yes (HIGH blocks) | LOKI_GATE_MOCK=false |
+| 6 | Test Mutation Detector | enforce_mutation_integrity -> tests/detect-test-mutations.sh | Yes (HIGH blocks) | LOKI_GATE_MUTATION=false |
+| 7 | Documentation Coverage | run_doc_quality_gate (run.sh:7388) | Yes | LOKI_GATE_DOC_COVERAGE=false |
+| 8 | Magic Modules Debate | run_magic_debate_gate (run.sh:7495) | Yes (BLOCK sev) | LOKI_GATE_MAGIC_DEBATE=false |
+
+Conditional auditor (documented separately, NOT numbered): Backward-Compatibility
+/ legacy-healing-auditor (healing mode only). Removed: Input/Output Guardrails.
+
+### Doc files to update to "8 gates" (docs owner)
+README.md (22,29,196,255); SKILL.md (3,10); CLAUDE.md (44);
+plugins/loki-mode/README.md (4); wiki/Quality-Gates.md (14-48);
+wiki/Environment-Variables.md (62); wiki/Home.md (3,13); wiki/CLI-Reference.md
+(230); docs/cursor-comparison.md (14,177,195); docs/COMPARISON.md (40,210,362);
+skills/quality-gates.md (5,13,14-17,19-66,69-82,650,655,668); skills/00-index.md
+(51). CHANGELOG.md: NEW top entry ONLY; never rewrite historical entries
+(5837/6181/6335/6340).
+
+## 2. P0-1 Coverage honesty (Fix B) -- Slice A (run.sh owner) + Slice B (docs)
+- run.sh: remove dead `local coverage_pct=0` (7038). Relabel logs: 13966
+"test suite (pass/fail)"; 7265/7270 "Test suite gate".
+- KEEP the min_coverage JSON field at 7257 (consumed by autonomy/loki:27529-27530,
+16138 and asserted in tests/test-report-command.sh:116,
+tests/test-completion-council-affirmative-evidence.sh:126,
+tests/test-evidence-gate.sh:155). Only change misleading consumer strings in
+autonomy/loki (27530, 16138) to "Min coverage TARGET (not measured)".
+- docs (skills/quality-gates.md): :13 drop ">80% coverage" -> "coverage % not
+measured in this release"; :650/:655 reword to pass/fail + target-only; :668
+remove coverage.json artifact line. Note Fix A (real measurement) as follow-up.
+
+## 3. P0-2 Phantom guardrails -- Slice B (docs only)
+Remove gates 1 & 5 entirely (do not "mark planned"). Renumber to the 8-gate
+table. Edit skills/quality-gates.md:5-17, wiki/Quality-Gates.md:14-28, + all
+list files in section 1.
+
+## 4. P0-3 Wire detectors -- Slice A (run.sh) + Slice D (scripts) + Slice C (Bun)
+Exit-code asymmetry (load-bearing):
+- detect-mock-problems.sh exits 1 on CRITICAL/HIGH (179-182), 0 otherwise.
+Exit code already = block-on-HIGH.
+- detect-test-mutations.sh exits 0 unless --strict; --strict blocks on ANY
+finding (over-blocks MED/LOW). DO NOT use --strict. Wrapper greps stdout for
+[HIGH] to decide block; route MED/LOW to findings injection.
+
+New run.sh functions (place after run_magic_debate_gate ~7560):
+enforce_mock_integrity() # HIGH -> return 1; MED/LOW -> findings file
+enforce_mutation_integrity() # grep -c '\[HIGH\]' >0 -> return 1; MED/LOW -> findings
+Both cd "${TARGET_DIR}", use LOKI_GATE_TIMEOUT wrapping, write findings into
+${TARGET_DIR}/.loki/quality/ for the Phase-1 findings injector.
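
The two wrappers named in the plan could be sketched roughly as follows. This is a minimal illustration, not the shipped run.sh code: the detector command is passed in as arguments so the sketch runs without the real scripts, the `_sketch` function names and the findings file names are hypothetical, the `[MED]`/`[LOW]` tag spelling is assumed by analogy with the `[HIGH]` tag the plan mentions, and `LOKI_GATE_TIMEOUT` is treated as a plain number of seconds.

```shell
#!/usr/bin/env bash
# Sketch only. Real wrappers would cd "${TARGET_DIR}" and call the scripts under tests/.

# Run a detector command under timeout(1) when it is available.
run_detector_sketch() {
    if command -v timeout >/dev/null 2>&1; then
        timeout "${LOKI_GATE_TIMEOUT:-300}" "$@"
    else
        "$@"
    fi
}

# Mock detector: its exit code already means "block on CRITICAL/HIGH".
enforce_mock_integrity_sketch() {
    local findings_dir="$1"; shift
    local output status=0
    mkdir -p "$findings_dir"
    output="$(run_detector_sketch "$@" 2>&1)" || status=$?
    printf '%s\n' "$output" > "$findings_dir/mock-findings.txt"
    [ "$status" -eq 0 ]
}

# Mutation detector: exits 0 without --strict, so count [HIGH] lines instead.
enforce_mutation_integrity_sketch() {
    local findings_dir="$1"; shift
    local output high_count
    mkdir -p "$findings_dir"
    output="$(run_detector_sketch "$@" 2>&1)" || true
    # grep -c prints 0 and exits 1 on no match; '|| true' keeps the count.
    high_count="$(printf '%s\n' "$output" | grep -c '\[HIGH\]' || true)"
    # MED/LOW lines are kept for findings injection and never block.
    printf '%s\n' "$output" | grep -E '\[(MED|LOW)\]' \
        > "$findings_dir/mutation-findings.txt" || true
    [ "$high_count" -eq 0 ]
}
```

The asymmetry is the point of the sketch: the mock wrapper trusts the exit code, while the mutation wrapper ignores it and decides from stdout, which is what keeps MED/LOW findings advisory.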
+
+Orchestration insert: after the pause-check at 13983, before code-review at
+13985. Mirror the existing pattern with track_gate_failure/clear_gate_failure +
+gate_failures string. Toggles LOKI_GATE_MOCK / LOKI_GATE_MUTATION (matches
+existing LOKI_GATE_DOC_COVERAGE / LOKI_GATE_MAGIC_DEBATE convention).
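
A rough sketch of that toggle-and-track pattern is shown here. The bodies of `track_gate_failure` and `clear_gate_failure` are hypothetical stand-ins (run.sh has its own), the comma-separated `gate_failures` format is assumed, and `true`/`false` stand in for the real gate functions so the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins so the sketch is self-contained.
gate_failures=""
track_gate_failure() { echo "gate failed: $1"; }
clear_gate_failure() { echo "gate cleared: $1"; }

# $1 = toggle value, $2 = gate name, remaining args = gate command.
run_toggled_gate_sketch() {
    local toggle="$1" name="$2"; shift 2
    if [ "$toggle" = "false" ]; then
        echo "gate skipped: $name"
        return 0
    fi
    if "$@"; then
        clear_gate_failure "$name"
    else
        track_gate_failure "$name"
        gate_failures="${gate_failures}${name},"
    fi
}

# Placement per the plan: after the pause-check, before code review.
run_toggled_gate_sketch "${LOKI_GATE_MOCK:-true}" mock_integrity true
run_toggled_gate_sketch "${LOKI_GATE_MUTATION:-true}" mutation_integrity false
echo "gate_failures=${gate_failures}"
```

Defaulting each toggle to `true` means a gate only disappears when someone sets the flag to `false` explicitly, which matches the opt-out column of the canonical gate table.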
+
+Detector-script (Slice D): optional --block-high mode on detect-test-mutations.sh
+(exit 2 on HIGH) keeping --strict intact; OR rely on wrapper grep (no script
+change). Verify detect-mock-problems.sh exit semantics. Do NOT touch run.sh.
+
+## 5. P0-4 Anti-sycophancy acts -- Slice A (run.sh) + Slice C (Bun)
+Read run_code_review 7788-8316 first. At 8316-8323 unanimous block: dispatch ONE
+Devil's-Advocate reviewer reusing the existing reviewer-invocation +
+parse_verdict helpers; if DA returns Crit/High set has_blocking=true so the
+EXISTING block at 8326-8330 fires (return 1). Keep anti-sycophancy.txt for audit.
+Gate behind LOKI_GATE_DEVILS_ADVOCATE (default true).
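
The unanimous-PASS branch described in this section might look something like the sketch below. It is an illustration under assumptions, not the run.sh implementation: the reviewer dispatch is replaced by a hypothetical stub driven by a `DA_SEVERITY_FIXTURE` variable, the verdict and severity strings are assumed, and `AUDIT_FILE` is a made-up override for the audit path.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real reviewer dispatch; prints the highest severity found.
dispatch_devils_advocate_sketch() { echo "${DA_SEVERITY_FIXTURE:-none}"; }

# $@ = the blind reviewers' verdicts. Returns 1 when the review should block.
# Non-unanimous outcomes belong to the existing severity-blocking logic,
# which this sketch does not model.
review_with_devils_advocate_sketch() {
    local has_blocking=false unanimous=true verdict da_severity
    for verdict in "$@"; do
        [ "$verdict" = "PASS" ] || unanimous=false
    done
    if [ "$unanimous" = true ] && [ "${LOKI_GATE_DEVILS_ADVOCATE:-true}" != "false" ]; then
        da_severity="$(dispatch_devils_advocate_sketch)"
        # Keep the audit trail the plan asks for.
        echo "unanimous PASS; devil's advocate severity: ${da_severity}" \
            >> "${AUDIT_FILE:-anti-sycophancy.txt}"
        case "$da_severity" in
            critical|high) has_blocking=true ;;
        esac
    fi
    [ "$has_blocking" = false ]
}
```

Setting a flag and letting one existing check decide the return value mirrors the plan's intent of reusing the block that already fires on `has_blocking`, rather than adding a second exit path.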
+
+## 6. P0-5 Honest per-gate table -- Slice B (docs)
+Replace skills/quality-gates.md:5-17 + prose 19-82 with the 8-gate table plus
+columns: detects X / does NOT detect Y / opt-out flag / blocking. Honesty
+entries: gate 2 "does NOT detect coverage %"; gate 5 "does NOT detect semantic
+correctness of mocks"; gate 6 "does NOT detect logically-correct-but-weak
+assertions".
+
+## 7. Bash <-> Bun parity matrix
+| Change | Bun mirror | File |
+|--------|-----------|------|
+| P0-1 label/honesty | Yes (light) | quality_gates.ts runTestCoverage (402): no false % strings |
+| P0-2 gate count | docs only | -- |
+| P0-3 mock gate | Yes | quality_gates.ts: add mock_integrity to GateName (69-74) + runMockIntegrity + sequence (1474-1480) + toggle |
+| P0-3 mutation gate | Yes | quality_gates.ts: add mutation_integrity + runMutationIntegrity + sequence + toggle |
+| P0-4 devil's advocate | Yes | quality_gates.ts runCodeReview (709), inert at 804-808: add DA dispatch + block |
+| P0-5 doc table | docs only | -- |
+Bun escalation ladder is generic; new gates inherit once added to union+sequence.
+
+## 8. Slice boundaries (independent; no file collisions)
+- Slice A -- run.sh runtime (ONE owner, serialized): P0-1 (run.sh + autonomy/loki
+strings), P0-3 new funcs + orchestration insert, P0-4. Owns autonomy/run.sh +
+autonomy/loki exclusively.
+- Slice B -- Docs (ONE owner): P0-2 + P0-5 + all "11->8 gates" edits. Both edit
+skills/quality-gates.md so MUST be one slice. New CHANGELOG entry only.
+- Slice C -- Bun parity (ONE owner): loki-ts/src/runner/quality_gates.ts only.
+- Slice D -- Detector scripts (ONE owner): tests/detect-test-mutations.sh
+--block-high; verify detect-mock-problems.sh. No run.sh.
+- Slice E -- SDET tests (ONE owner; after A/C/D): fixtures + assertions.
+Order: D and B parallel anytime; A depends on D contract; C mirrors A; E last.
+
+## 9. Test plan (SDET, Slice E)
+- P0-1: grep assert no ">80%"/"min_coverage: 80% # Never drop"/"coverage.json"
+in any list doc. Behavior: passing tests pass, failing tests block.
+- P0-2: grep assert zero live "11 gates"/"Input Guardrails"/"Output Guardrails"
+(CHANGELOG excepted); "8" present in quality-gates.md + wiki.
+- P0-3 mock: fixture with tautological assertion -> enforce_mock_integrity
+returns 1, BLOCKS, track_gate_failure increments. Clean -> 0, clears. MED-only
+-> 0 + findings file.
+- P0-3 mutation: fixture commit changing assertion values + impl (HIGH) ->
+returns 1, BLOCKS. MED-only -> 0 + findings (proves not over-blocking).
+- P0-4: unanimous PASS + DA High -> run_code_review returns 1. Unanimous PASS +
+DA clean -> 0 + anti-sycophancy.txt exists.
+- Parity: Bun sequence includes mock_integrity + mutation_integrity; runCodeReview
+blocks on DA High; existing loki-ts tests green.
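
As one possible shape for the P0-2 grep assertion in this test plan, the sketch below scans a docs directory for the removed gate names while skipping CHANGELOG.md. The function name and the throwaway fixture directory are hypothetical; only the search strings and the CHANGELOG exception come from the plan.

```shell
#!/usr/bin/env bash
# Succeeds only when no file under $1 (CHANGELOG.md excepted) mentions a removed gate.
assert_no_phantom_gates_sketch() {
    local docs_dir="$1"
    if grep -rnE --exclude=CHANGELOG.md \
        '11 gates|Input Guardrails|Output Guardrails' "$docs_dir"; then
        echo "FAIL: stale gate references found" >&2
        return 1
    fi
    return 0
}

# Throwaway fixture: one live doc, one historical changelog entry.
fixture="$(mktemp -d)"
printf 'Loki Mode enforces an 8-gate quality system.\n' > "$fixture/quality-gates.md"
printf 'v7.45: 11 gates\n' > "$fixture/CHANGELOG.md"
assert_no_phantom_gates_sketch "$fixture" && echo "docs clean"
```

Printing the matching lines on failure (the default `grep -n` output) gives the docs owner the file and line to fix without a second search.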
+
+## 10. Risks + binding constraints
+Risks: (1) min_coverage JSON field has live consumers + 3 test assertions -- keep
+field, fix strings only. (2) mutation --strict over-blocks -- parse HIGH instead.
+(3) detectors run against TARGET project test files -- cd TARGET_DIR + timeout
+wrap. (4) stale cross-file comment line refs exist; do not chase, do not add new.
+
+Binding constraints (every dev agent): NO version bumps (integrator once); NO
+commits/push; NO emojis; NO em dashes; full gate applies (touches runtime/gates/
+parity); stay inside your slice file ownership; run.sh is single-owner.
+
+Canonical count decision: 8 (recommended). Keeping backward-compat numbered
+would make it 9 but reintroduces the listed-but-not-a-loop-gate honesty gap this
+sweep exists to close.
@@ -972,7 +972,7 @@ Source: `run.sh:7880-7881` (checklist_should_verify, checklist_verify)
 
 ## 7. Quality Gates
 
-### 7.1
+### 7.1 Eight-Gate Pipeline
 
 Source: `skills/quality-gates.md`
 
@@ -980,40 +980,39 @@ Source: `skills/quality-gates.md`
 Code Change
 |
 v
-Gate 1: Static Analysis (CodeQL, ESLint)
-|──BLOCK (
+Gate 1: Static Analysis (CodeQL, ESLint/Pylint, type-checker)
+|──BLOCK (severity ladder)──> [REJECTED]
 v
-Gate 2:
+Gate 2: Test Suite (pass/fail; red blocks; coverage % not measured this release)
 |──BLOCK──> [REJECTED]
 v
-Gate 3:
-|──BLOCK──> [REJECTED]
-v
-Gate 4: Integration Tests
-|──BLOCK──> [REJECTED]
-v
-Gate 5: 3-Reviewer Blind Review (see 7.3)
+Gate 3: Blind 3-Reviewer Review with severity blocking (see 7.3)
 |──BLOCK (Critical/High severity)──> [REJECTED]
 v
-Gate
-|──BLOCK (devil's advocate
+Gate 4: Anti-Sycophancy Devil's Advocate (on unanimous PASS)
+|──BLOCK (devil's advocate Crit/High findings)──> [REJECTED]
 v
-Gate
-|──BLOCK──> [REJECTED]
+Gate 5: Mock Integrity Detector
+|──BLOCK (HIGH findings)──> [REJECTED]
 v
-Gate
-|──BLOCK──> [REJECTED]
+Gate 6: Test Mutation Detector
+|──BLOCK (HIGH findings)──> [REJECTED]
 v
-Gate
+Gate 7: Documentation Coverage
 |──BLOCK──> [REJECTED]
 v
+Gate 8: Magic Modules Debate
+|──BLOCK (BLOCK-severity findings)──> [REJECTED]
+v
 [APPROVED]
 ```
 
+Backward-compatibility is a conditional healing-mode auditor, not a numbered gate.
+
 Gate status values: `passed`, `failed`, `skipped`
 Persistence: `.loki/dashboard-state.json` field `qualityGates`
 Severity levels: `critical`, `high`, `medium`, `low`
-Blocking threshold: Critical and High
+Blocking threshold: Critical and High block; Medium and Low are advisory.
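
The flow in the diagram, together with the blocking threshold, can be sketched as below. This follows the diagram only: gates run in order and the first BLOCK rejects the change. The actual orchestrator may instead accumulate failures (the sweep plan mentions a gate_failures string), and both function names here are hypothetical.

```shell
#!/usr/bin/env bash
# Critical and High block; Medium and Low are advisory.
severity_blocks_sketch() {
    case "$1" in
        critical|high) return 0 ;;
        *) return 1 ;;
    esac
}

# Each argument is a gate command; the first failing gate yields REJECTED.
run_gate_pipeline_sketch() {
    local gate
    for gate in "$@"; do
        if ! "$gate"; then
            echo "REJECTED"
            return 1
        fi
    done
    echo "APPROVED"
}
```

A caller would pass the eight gate functions in diagram order, for example `run_gate_pipeline_sketch gate_static gate_tests ...`, where those gate names are placeholders.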
 
 ### 7.2 Model Escalation
 
@@ -57,5 +57,5 @@ architecture, and adversarial review -- complementing Loki Mode's autonomous exe
 1. P0 must ship independently and prove value before P1/P2 begin
 2. No runtime dependency on BMAD repo -- adapter reads BMAD output artifacts only
 3. Zero regression on existing non-BMAD workflows
-4. All code must pass existing
+4. All code must pass existing 8-gate quality system
 5. Context budget: BMAD additions must stay under 15K tokens per iteration
@@ -120,21 +120,24 @@ Loki Mode implements CONSENSAGENT (ACL 2025):
 **Verdict: Loki Mode wins** - Research-backed quality assurance.
 
 ### 5. Quality Gates
-Loki Mode
+Loki Mode runs 8 deterministic quality gates plus full SDLC phase coverage.
+
+The 8 deterministic quality gates: static analysis (CodeQL, ESLint), test suite (pass/fail), blind 3-reviewer review with severity blocking, anti-sycophancy Devil's Advocate, mock-integrity, test-mutation, documentation coverage, and Magic Modules debate. (Backward-compatibility is a conditional healing-mode auditor, not a numbered gate.)
+
+Beyond the gates, the SDLC pipeline covers these phases:
 1. Static analysis (CodeQL, ESLint)
-2. Unit tests (
+2. Unit tests (test suite passes; coverage % not measured this release)
 3. API/Integration tests
 4. E2E tests (Playwright)
 5. Security scanning (OWASP)
-6.
-7.
-8.
-9.
-10.
-11.
-12.
-13.
-14. Continuous monitoring
+6. Parallel code review (3 reviewers)
+7. Performance/load testing
+8. Accessibility (WCAG)
+9. Regression testing
+10. UAT simulation
+11. Anti-sycophancy check
+12. Scale-aware review intensity
+13. Continuous monitoring
 
 **Auto-Claude:** Single QA validation loop (up to 50 iterations).
 
@@ -105,19 +105,20 @@ Full agent type definitions are in `references/agent-types.md`.
 
 ## Quality Gates
 
-Loki Mode enforces
+Loki Mode enforces an 8-gate quality system. Code must pass all applicable gates before moving forward:
 
 | Gate | Name | Purpose |
 |------|------|---------|
-| 1 |
-| 2 |
-| 3 | Blind Review
-| 4 | Anti-Sycophancy
-| 5 |
-| 6 |
-| 7 |
-| 8 |
-
+| 1 | Static Analysis | CodeQL, ESLint/Pylint, type checking |
+| 2 | Test Suite (pass/fail) | Red blocks; coverage % not measured this release |
+| 3 | Blind Code Review (3-reviewer council + severity blocking) | 3 specialist reviewers in parallel, blind to each other; Critical/High = BLOCK; Medium/Low advisory |
+| 4 | Anti-Sycophancy / Devil's Advocate | If reviewers unanimously approve, run a Devil's Advocate reviewer |
+| 5 | Mock Integrity Detector | Flags tests that mock internal modules instead of real code |
+| 6 | Test Mutation Detector | Detects assertion value changes alongside implementation changes |
+| 7 | Documentation Coverage | README exists, docs freshness, API docs for packages |
+| 8 | Magic Modules Debate | Spec-vs-implementation debate on generated Magic Modules |
+
+A conditional backward-compatibility / legacy-healing auditor also runs in healing mode (not one of the 8 numbered gates).
 
 The blind review system (Gate 3) selects 3 reviewers from a pool of 5 named specialists:
 
@@ -179,4 +180,4 @@ Every Loki Mode project uses these files in the `.loki/` directory:
 
 ## Summary
 
-Loki Mode is an autonomous multi-agent system that follows the RARV cycle to build software from PRDs. It uses 41 agent types organized into 8 domains, enforces quality through
+Loki Mode is an autonomous multi-agent system that follows the RARV cycle to build software from PRDs. It uses 41 agent types organized into 8 domains, enforces quality through 8 gates with blind peer review, and maintains episodic/semantic/procedural memory for continuous learning. Projects are classified into simple, standard, or complex tiers that determine the number of phases executed.
@@ -45,7 +45,7 @@ D) test-coverage-auditor
 A) 3
 B) 5
 C) 7
-D)
+D) 8
 
 ---
 
@@ -67,12 +67,12 @@ D) complex
 
 ---
 
-**Question 8:** What
+**Question 8:** What does Gate 7 (Documentation Coverage) check?
 
-A)
-B)
-C)
-D)
+A) That unit test coverage is at least 80%
+B) That every function has an inline comment
+C) That a README exists, docs are fresh within 10 commits, and packages have API docs
+D) That cyclomatic complexity stays under 10
 
 ---
 
@@ -17,30 +17,40 @@ This module covers diagnosing and resolving common issues in Loki Mode: gate fai
 
 ## Quality Gate Failures
 
-When a quality gate fails, identify which gate triggered the failure
+When a quality gate fails, identify which gate triggered the failure (the 8-gate
+system is detailed in `skills/quality-gates.md`):
 
-**Gates 1-
+**Gates 1-2 (Static analysis and test suite):**
+- Gate 1 (Static Analysis): fix CodeQL/ESLint/Pylint/type-checker findings
+- Gate 2 (Test Suite): the test runner must pass; red blocks. Coverage % is not
+measured this release. Fix failing tests before proceeding (never delete or
+skip tests)
+
+**Gates 3-4 (Review gates):**
 - Check the review output for severity levels
-- Critical/High
+- Critical/High = BLOCK; Medium/Low advisory (recommended to fix)
 - Low/Cosmetic = TODO (informational)
 - If all 3 reviewers pass unanimously, Gate 4 runs Devil's Advocate
 
-**Gate
-- Unit tests must have 100% pass rate and >80% coverage
-- Integration tests must have 100% pass rate
-- Fix failing tests before proceeding (never delete or skip tests)
-
-**Gate 8 (Mock detector):**
+**Gate 5 (Mock integrity detector):**
 - Runs `tests/detect-mock-problems.sh`
 - Flags tests that mock internal modules instead of using real code
 - Flags tautological assertions and high internal mock ratios
-- Disable with `
+- Disable with `LOKI_GATE_MOCK=false` (not recommended)
 
-**Gate
+**Gate 6 (Test mutation detector):**
 - Runs `tests/detect-test-mutations.sh`
 - Detects assertion values changed alongside implementation (test fitting)
-- Detects low assertion density
-- Disable with `
+- Detects low assertion density
+- Disable with `LOKI_GATE_MUTATION=false` (not recommended)
+
+**Gate 7 (Documentation coverage):**
+- Checks README presence, docs freshness within 10 commits, and API docs for packages
+- Disable with `LOKI_GATE_DOC_COVERAGE=false` (not recommended for packages)
+
+**Gate 8 (Magic Modules debate):**
+- Runs the spec-vs-implementation debate on generated Magic Modules
+- BLOCK-severity findings block; disable with `LOKI_GATE_MAGIC_DEBATE=false`
 
 ## Circuit Breaker System
 
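The added lesson text names four opt-out variables for Gates 5 through 8. As a rough illustration of how those toggles could be inspected before a run, here is a minimal Python sketch. It is not Loki Mode's own code: the helper names `gate_enabled` and `gate_report` are hypothetical, and the rule that a gate stays on unless its variable is set to `false` is an assumption drawn only from the "Disable with ...=false" wording in the lesson.

```python
import os

# Variable names come from the lesson text; Gates 1-4 list no toggle there.
GATE_TOGGLES = {
    5: ("Mock integrity detector", "LOKI_GATE_MOCK"),
    6: ("Test mutation detector", "LOKI_GATE_MUTATION"),
    7: ("Documentation coverage", "LOKI_GATE_DOC_COVERAGE"),
    8: ("Magic Modules debate", "LOKI_GATE_MAGIC_DEBATE"),
}


def gate_enabled(var, env=None):
    """Assumed rule: a gate is on unless its variable equals 'false'."""
    env = os.environ if env is None else env
    return env.get(var, "true").strip().lower() != "false"


def gate_report(env=None):
    """Map each gate number to True (enabled) or False (disabled)."""
    return {num: gate_enabled(var, env) for num, (_, var) in GATE_TOGGLES.items()}


if __name__ == "__main__":
    for num, (name, var) in GATE_TOGGLES.items():
        state = "enabled" if gate_enabled(var) else "disabled"
        print(f"Gate {num} ({name}): {state} [{var}]")
```

Running the script with `LOKI_GATE_MOCK=false` in the environment would report Gate 5 as disabled and the other three as enabled, under the assumed default.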
@@ -67,11 +67,11 @@ D) Removes the entire `.loki/` directory
 
 ---
 
-**Question 8:** Which environment variable disables Gate
+**Question 8:** Which environment variable disables Gate 5 (Mock Integrity Detector)?
 
-A) `
+A) `LOKI_GATE_MOCK=false`
 B) `LOKI_GATE_MOCK_DETECTOR=false`
-C) `
+C) `LOKI_DISABLE_GATE_5=true`
 D) `LOKI_NO_MOCK_DETECTION=true`
 
 ---
@@ -12,10 +12,10 @@ This file contains answers for all module quizzes and the final certification ex
 | 2 | C | 41 agent types: 37 domain + 4 orchestration |
 | 3 | B | After 5 failures, the task moves to `.loki/queue/dead-letter.json` |
 | 4 | C | architecture-strategist is always one of the 3 selected reviewers |
-| 5 | D |
+| 5 | D | 8 quality gates (Static Analysis through Magic Modules Debate); backward-compatibility is a conditional healing-mode auditor, not one of the 8 |
 | 6 | B | Episodic, semantic, and procedural memory |
 | 7 | B | Simple tier uses 3 phases |
-| 8 | C | Gate 7
+| 8 | C | Gate 7 (Documentation Coverage) checks README presence, docs freshness within 10 commits, and API docs for packages; coverage % is not measured this release |
 | 9 | C | Claude Code supports full features; Codex and Gemini run in degraded mode |
 | 10 | B | If all 3 reviewers unanimously approve, a Devil's Advocate reviewer runs |
 
@@ -49,7 +49,7 @@ D) test-coverage-auditor
 A) 3
 B) 5
 C) 7
-D)
+D) 8
 
 ---
 
@@ -71,12 +71,12 @@ D) complex
 
 ---
 
-**Question 8:** What
+**Question 8:** What does Gate 7 (Documentation Coverage) check?
 
-A)
-B)
-C)
-D)
+A) That unit test coverage is at least 80%
+B) That every function has an inline comment
+C) That a README exists, docs are fresh within 10 commits, and packages have API docs
+D) That cyclomatic complexity stays under 10
 
 ---
 
@@ -439,11 +439,11 @@ D) Removes the entire `.loki/` directory
 
 ---
 
-**Question 48:** Which environment variable disables Gate
+**Question 48:** Which environment variable disables Gate 5 (Mock Integrity Detector)?
 
-A) `
+A) `LOKI_GATE_MOCK=false`
 B) `LOKI_SKIP_MOCK_CHECK=true`
-C) `
+C) `LOKI_DISABLE_GATE_5=true`
 D) `LOKI_NO_MOCK_DETECTION=true`
 
 ---
@@ -409,7 +409,7 @@ These are bolt.new weaknesses that Loki Mode already solves or can emphasize:
 
 #### R5: Advertise Production Readiness as Key Differentiator
 - **bolt.new's gap**: 70% done code, no tests, no review, $5-20K remediation
-- **Loki Mode's advantage**: RARV cycle,
+- **Loki Mode's advantage**: RARV cycle, 8 quality gates, 3-reviewer system, automated testing
 - **Action**: Create comparison content showing: "bolt.new gives you a prototype. Loki Mode gives you a product."
 - **Messaging**: "From PRD to production, not PRD to prototype"
 