loki-mode 7.12.0 → 7.13.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/SKILL.md +2 -2
- package/VERSION +1 -1
- package/autonomy/lib/wiki-ask.py +137 -0
- package/autonomy/lib/wiki-generator.py +322 -0
- package/autonomy/lib/wiki_index.py +258 -0
- package/autonomy/lib/wiki_llm.py +140 -0
- package/autonomy/loki +121 -0
- package/bin/loki +1 -1
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +108 -0
- package/dashboard/static/index.html +394 -329
- package/docs/INSTALLATION.md +1 -1
- package/docs/R5-AUTO-WIKI-DESIGN.md +137 -0
- package/loki-ts/dist/loki.js +224 -198
- package/mcp/__init__.py +1 -1
- package/package.json +1 -1
package/docs/INSTALLATION.md
CHANGED
@@ -0,0 +1,137 @@
# R5: Auto-wiki + Cited Codebase Q&A (Loki's DeepWiki) -- Design Note

Status: implemented in worktree (do not commit to main). Target release: R5 of the
competitive-stickiness arc. NO version bump in this worktree.

## Goal

A persistent, queryable per-project wiki generated from the codebase, surfaced in
the dashboard, with cited answers. Loki's answer to Devin DeepWiki. Sections cite
the real source files they were built from; `loki wiki ask` returns a grounded,
cited answer (citations = `file:line`). Generation is incremental: it skips when
the codebase has not changed (reuses the R1 codebase-signature idea).

## What already exists (verified against source, 2026-06-03)

| Asset | File | Reused? |
|---|---|---|
| `loki docs generate` (LLM-written README/ARCHITECTURE/...) | `autonomy/loki:20577` (`cmd_docs`) | Patterns reused: `_docs_scan_project`, `_docs_build_context`, `_docs_invoke_provider`, `_docs_write_manifest`. Not the command itself -- docs has no citations and no Q&A. |
| Proof-of-run generator (Python core, thin CLI readers) | `autonomy/lib/proof-generator.py`, bash `cmd_proof`, `loki-ts/src/commands/proof.ts` | Architecture precedent reused exactly: Python core does the heavy work; bash + Bun are thin readers; dashboard exposes read APIs. |
| PII redaction | `autonomy/lib/proof_redact.py` (`redact_tree`, `_redact_paths`) | Reused: wiki output is normalized to repo-relative paths and passed through the redactor so no `/Users/<name>/...` leaks. |
| Org knowledge graph | `memory/knowledge_graph.py` (`OrganizationKnowledgeGraph`) | Token-overlap scoring idea reused (`_tokenize` / `query_patterns`). NOT a codebase index -- it aggregates `.loki/memory/semantic/*.json` patterns across projects, keyed on `~/.loki/knowledge`. See "Honest reuse" below. |
| Memory retrieval | `memory/retrieval.py` (`MemoryRetrieval`) | Inspected. It retrieves memory entries (episodic/semantic/procedural), NOT source code. Not a code indexer. Not reused for code retrieval. |
| Dashboard read-API + traversal-safety | `dashboard/server.py:7191` (`_safe_proof_run_dir`) | Pattern reused for the wiki section/path param (`_safe_wiki_section`). |
| Dashboard web components | `dashboard-ui/components/*.js` (Web Components) | New `loki-wiki-browser.js` follows the same `LokiElement` convention; registered in `index.js`. |

### Honest reuse statement

The task says "reuse memory/knowledge_graph.py and memory/retrieval.py." Both were
read. Neither is a *codebase* index: `knowledge_graph.py` aggregates cross-project
*memory patterns* (`.loki/memory/semantic`), and `retrieval.py` retrieves *memory
entries*, not source files. There was no existing per-file code index to query for
grounded citations. So R5 adds a new lightweight, dependency-free code index
(`autonomy/lib/wiki_index.py`) and reuses the parts that genuinely fit: the
token-overlap retrieval scoring (ported from `knowledge_graph._tokenize` /
`query_patterns`), the docs scanner, the proof manifest/signature idea, and the
redactor. Reuse is stated where real; not fabricated to satisfy a constraint.

ChromaDB (`tools/index-codebase.py`, MEMORY.md) is an OPTIONAL future backend. The
core deliberately does NOT depend on it: it needs Docker + python3.12 and is not
CI-safe. Default retrieval is deterministic and dependency-light.

## The grounding contract (the part that must be right)

Fabricated citations are made structurally impossible, not merely prompt-discouraged:

1. **Index**: `wiki_index.py` scans source files (git-tracked when available, else a
   filtered `find`), splits each into line-anchored chunks
   `{file, start_line, end_line, text}` where `file` is REPO-RELATIVE.
2. **Retrieve** (`ask`): deterministic token-overlap scoring (no LLM, no network)
   selects the top-K chunks for the question. Each is a record we own.
3. **Prompt**: the LLM sees NUMBERED chunks `[1]..[K]` and is told to cite by chunk
   index only (`[1]`, `[2]`). It never emits raw paths.
4. **Map + validate**: indices in the answer are mapped back to `{file, start_line}`
   from the retrieval records. Every citation is then validated against the
   filesystem (file exists AND start_line <= file length). Non-resolving citations
   are DROPPED. The LLM can only reference chunks we supplied, and only ones that
   resolve on disk -- so a fabricated citation cannot survive.
5. **generate**: per-section "sources" are CODE-DERIVED (the files the scanner read,
   the real def/class line numbers from a grep parse), not LLM-emitted. The LLM
   writes prose; the citation list comes from the index.
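The index, retrieve, and map + validate steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `wiki_index.py` or `wiki-ask.py`: the function names, the 40-line chunk size, and the tokenizer regex are all hypothetical, since the design note does not specify them.

```python
import re
from pathlib import Path

CHUNK_LINES = 40  # assumed chunk size; the design note does not state one


def tokenize(text):
    """Lowercase identifier-like tokens (an assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def chunk_file(root, rel_path, size=CHUNK_LINES):
    """Split one file into line-anchored chunks keyed by a repo-relative path."""
    lines = (Path(root) / rel_path).read_text(errors="replace").splitlines()
    return [
        {
            "file": rel_path,
            "start_line": i + 1,
            "end_line": min(i + size, len(lines)),
            "text": "\n".join(lines[i:i + size]),
        }
        for i in range(0, len(lines), size)
    ]


def retrieve(chunks, question, k=3):
    """Deterministic token-overlap scoring; ties broken by file, then line."""
    q = tokenize(question)
    scored = [(len(q & tokenize(c["text"])), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: (-sc[0], sc[1]["file"], sc[1]["start_line"]))
    return [c for _, c in scored[:k]]


def resolve_citations(answer, retrieved, root):
    """Map [n] markers to file:line and drop any that do not resolve on disk."""
    citations = []
    for n in sorted({int(m) for m in re.findall(r"\[(\d+)\]", answer)}):
        if not 1 <= n <= len(retrieved):
            continue  # an index the model was never given, e.g. [99]
        chunk = retrieved[n - 1]
        path = Path(root) / chunk["file"]
        if not path.is_file():
            continue  # file vanished since indexing
        length = len(path.read_text(errors="replace").splitlines())
        if chunk["start_line"] <= length:
            citations.append(f'{chunk["file"]}:{chunk["start_line"]}')
    return citations
```

Because the model only ever sees bracketed indices, the worst a bad completion can do in this sketch is name an index outside `1..K`, which `resolve_citations` discards.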

If the LLM is unavailable (CI, no provider), `ask` returns an EXTRACTIVE answer
(the top chunk snippets with their real citations) and `generate` writes a
template-based wiki whose citations are still the real scanned files. No fabrication
in any path.
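A rough sketch of what such an extractive fallback for `ask` might look like, assuming retrieved chunks are dicts shaped like `{file, start_line, end_line, text}` as described in the grounding contract. The function name, the `max_lines` cutoff, and the returned dict layout are hypothetical.

```python
def extractive_answer(chunks, max_lines=5):
    """Build an answer from retrieved chunks without calling an LLM."""
    parts, citations = [], []
    for n, chunk in enumerate(chunks, start=1):
        # Quote only the first few lines of each chunk as the snippet.
        snippet = "\n".join(chunk["text"].splitlines()[:max_lines])
        cite = f'{chunk["file"]}:{chunk["start_line"]}'
        parts.append(f"[{n}] {cite}\n{snippet}")
        citations.append(cite)
    return {
        "answer": "\n\n".join(parts),
        "citations": citations,
        "mode": "extractive",
    }
```

Every citation here is copied from a retrieval record, so the fallback has no way to produce a path the index did not supply.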

## Mocking the LLM in CI

The Python core reads `LOKI_WIKI_LLM_STUB`:
- unset -> call the real provider via the same path `_docs_invoke_provider` uses
  (`claude -p` etc.), OR fall back to extractive/template if no provider on PATH.
- set to a file path -> read the stubbed completion from that file.
- set to any other value -> use it literally as the completion.

Tests set `LOKI_WIKI_LLM_STUB` so CI makes ZERO paid calls. This mirrors how the
proof tests fake `gh`/`open` via PATH and env.
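The three cases for `LOKI_WIKI_LLM_STUB` could be resolved along these lines. This is a sketch under assumptions: `resolve_completion` and its `call_provider` parameter are hypothetical stand-ins (the real provider path is whatever `_docs_invoke_provider` uses), and treating an empty value the same as unset is a choice made here, not something the note specifies.

```python
import os


def resolve_completion(prompt, call_provider=None, env=None):
    """Return a completion string, or None to signal the extractive/template fallback.

    call_provider is a hypothetical callable wrapping the real provider;
    pass None when no provider is available on PATH.
    """
    env = os.environ if env is None else env
    stub = env.get("LOKI_WIKI_LLM_STUB")
    if stub:  # assumption: an empty value is treated like unset
        if os.path.isfile(stub):
            # Set to a file path: read the stubbed completion from that file.
            with open(stub, encoding="utf-8") as fh:
                return fh.read()
        # Set to any other value: use it literally as the completion.
        return stub
    if call_provider is None:
        return None  # caller falls back to extractive/template output
    return call_provider(prompt)
```

A test can then pass `env={"LOKI_WIKI_LLM_STUB": "Answer, see [1]."}` and never reach the provider branch.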

## Storage layout (per project, generated)

```
.loki/wiki/
  wiki.json            # structured: sections[], each with title, body, citations[]
  index.md             # human-readable rendered wiki (overview + module list)
  architecture.md      # rendered architecture section
  modules.md           # rendered key-modules section
  data-flow.md         # rendered data-flow section
  wiki-manifest.json   # signature (git sha + per-file content hash), generated_at
  code-index.json      # the chunk index (file, start_line, end_line, tokens)
```

NOTE: `.loki/wiki/` (this deliverable, per-project, generated, gitignored) is a
DIFFERENT namespace from the repo-root `wiki/` (the GitHub wiki in the release
workflow). R5 never touches the latter.

## Incremental regeneration (R1 signature idea)

`wiki-manifest.json` stores a `signature` = sha256 over (git HEAD sha + sorted list
of `path:content-hash` for every scanned source file). `loki wiki generate` computes
the current signature; if it equals the stored one, it SKIPS regeneration and prints
"up to date" (unless `--force`). This is the same cheap-incremental idea as the docs
manifest and the R1 codebase signature.
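One way to compute such a signature is sketched below. The newline-joined payload, the use of sha256 for the per-file content hash, the empty-string fallback outside a git repository, and all function names are assumptions for illustration; the note only fixes the ingredients (git HEAD sha plus a sorted `path:content-hash` list).

```python
import hashlib
import subprocess
from pathlib import Path


def git_head(root):
    """Return the HEAD sha, or "" when git or a repository is unavailable."""
    try:
        out = subprocess.run(
            ["git", "-C", str(root), "rev-parse", "HEAD"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return ""
    return out.stdout.strip() if out.returncode == 0 else ""


def compute_signature(root, rel_paths, head=None):
    """sha256 over the HEAD sha plus sorted path:content-hash entries."""
    head = git_head(root) if head is None else head
    entries = []
    for rel in sorted(rel_paths):
        digest = hashlib.sha256((Path(root) / rel).read_bytes()).hexdigest()
        entries.append(f"{rel}:{digest}")
    payload = "\n".join([head] + entries)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_regeneration(stored_signature, current_signature, force=False):
    """Regenerate when forced or when the stored signature no longer matches."""
    return force or stored_signature != current_signature
```

Sorting the paths keeps the signature independent of scan order, so two runs over an unchanged tree compare equal.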

## Command surface

```
loki wiki generate [path] [--force]   # build/refresh .loki/wiki/ (incremental)
loki wiki show [section]              # print rendered wiki (or one section)
loki wiki ask "<question>"            # grounded, cited answer (file:line)
```

## Build surface (mirrors the proof precedent)

- `autonomy/lib/wiki_index.py` -- scan + chunk + token-overlap retrieve + signature
  (importable module; underscore name).
- `autonomy/lib/wiki-generator.py` -- generate wiki.json + rendered md (LLM or
  template), citations code-derived; subprocess-invoked (hyphen in name, like
  proof-generator.py).
- `autonomy/lib/wiki-ask.py` -- retrieve K chunks, prompt (stub-aware), map + validate
  citations, print grounded answer. Subprocess-invoked.
- bash `cmd_wiki` in `autonomy/loki` (generate|show|ask) -- thin dispatcher to Python.
- Bun `loki-ts/src/commands/wiki.ts` -- native `show` (reads `.loki/wiki/`); `generate`
  and `ask` delegate to the bash/Python core (heavy work, provider). Added to the
  `bin/loki` allowlist and `cli.ts` switch.
- Dashboard: `GET /api/wiki` (list sections + manifest), `GET /api/wiki/{section}`,
  `POST /api/wiki/ask` -- all traversal-safe; web component `loki-wiki-browser.js`.
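For the traversal-safe `{section}` parameter, a check in the spirit of `_safe_wiki_section` might look like the sketch below. The actual helper in `dashboard/server.py` is not shown in this note, so the name `safe_wiki_section`, the allowed-character pattern, and the `<section>.md` mapping are assumptions.

```python
import re
from pathlib import Path

# Assumed naming rule: lowercase slugs such as "architecture" or "data-flow".
SECTION_RE = re.compile(r"[a-z0-9][a-z0-9-]*")


def safe_wiki_section(wiki_dir, section):
    """Return the resolved section file, or None if the name is unsafe or missing."""
    if not SECTION_RE.fullmatch(section):
        return None  # rejects "..", slashes, dots, and empty names
    base = Path(wiki_dir).resolve()
    candidate = (base / f"{section}.md").resolve()
    # After resolving symlinks the file must still sit directly in wiki_dir.
    if candidate.parent != base or not candidate.is_file():
        return None
    return candidate
```

Validating the name first and the resolved path second means a request like `../../etc/passwd` fails before any filesystem access happens.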

## Tests

- `tests/test_wiki_index.py` -- chunking is line-accurate; retrieval is deterministic;
  signature stable + changes on edit; repo-relative paths only.
- `tests/test_wiki_generator.py` -- generate on a fixture repo; citations point to REAL
  files; incremental skip when unchanged; LLM stubbed; no absolute paths (no PII).
- `tests/test_wiki_ask.py` -- `ask` returns grounded answer; every citation resolves on
  disk; a stub that emits a bogus `[99]` index is dropped (anti-fabrication).
- `tests/cli/test-wiki-command.sh` -- bash route generate/show/ask on a fixture, stubbed.
- `loki-ts/tests/commands/wiki.test.ts` -- Bun `show` parity with the rendered md.
- `tests/dashboard/test_wiki_routes.py` -- API list/get/ask + traversal rejection.