droste-memory 1.0.0__tar.gz
- droste_memory-1.0.0/CONTRIBUTING.md +39 -0
- droste_memory-1.0.0/LICENSE +21 -0
- droste_memory-1.0.0/MANIFEST.in +4 -0
- droste_memory-1.0.0/PKG-INFO +208 -0
- droste_memory-1.0.0/README.md +185 -0
- droste_memory-1.0.0/core/__init__.py +6 -0
- droste_memory-1.0.0/core/droste_cli.py +409 -0
- droste_memory-1.0.0/core/droste_engine.py +1067 -0
- droste_memory-1.0.0/core/droste_ingester.py +2742 -0
- droste_memory-1.0.0/core/droste_watcher.py +147 -0
- droste_memory-1.0.0/core/embedding_projector.py +279 -0
- droste_memory-1.0.0/core/treesitter_extract.py +391 -0
- droste_memory-1.0.0/droste_memory.egg-info/PKG-INFO +208 -0
- droste_memory-1.0.0/droste_memory.egg-info/SOURCES.txt +31 -0
- droste_memory-1.0.0/droste_memory.egg-info/dependency_links.txt +1 -0
- droste_memory-1.0.0/droste_memory.egg-info/entry_points.txt +2 -0
- droste_memory-1.0.0/droste_memory.egg-info/requires.txt +16 -0
- droste_memory-1.0.0/droste_memory.egg-info/top_level.txt +1 -0
- droste_memory-1.0.0/pyproject.toml +41 -0
- droste_memory-1.0.0/requirements.txt +18 -0
- droste_memory-1.0.0/setup.cfg +4 -0
- droste_memory-1.0.0/tests/test_core_invariants.py +197 -0
- droste_memory-1.0.0/tests/test_shards_race.py +218 -0
- droste_memory-1.0.0/visualizer/__init__.py +1 -0
- droste_memory-1.0.0/visualizer/app.py +240 -0
- droste_memory-1.0.0/visualizer/cockpit.html +1519 -0
- droste_memory-1.0.0/visualizer/context.json +81 -0
- droste_memory-1.0.0/visualizer/demo_graph.json +1 -0
- droste_memory-1.0.0/visualizer/export_graph.py +174 -0
- droste_memory-1.0.0/visualizer/graph.json +1 -0
- droste_memory-1.0.0/visualizer/server.py +75 -0
- droste_memory-1.0.0/visualizer/status.json +1 -0
- droste_memory-1.0.0/visualizer/templates/index.html +1164 -0
@@ -0,0 +1,39 @@
# Contributing to Droste

Thanks for helping build the causal-memory layer for AI agents.

## Dev setup

```bash
git clone <your fork>
cd droste-memory
pip install -e ".[dev]"
pytest # should be green before you start
```

## Ground rules

- **Tests stay green.** `pytest` runs the deterministic regression suite
  (`tests/`). Add a test for any behaviour change to the engine, ingester, or
  packer. The suite forces the deterministic hash embedding backend, so it runs
  offline with no model download.
- **`eval/` is for benchmarks, `tests/` is for invariants.** Don't mix them.
- **Keep the zero-config moat.** New required deps are a big deal — prefer
  optional extras. `fastembed` (no torch) and `tree-sitter-language-pack` are the
  only heavy runtime deps and both degrade gracefully if missing.
- **Never commit user data.** `droste_memory_db.json`, `.droste/`,
  `visualizer/graph.json` and `status.json` are gitignored — they can contain a
  user's source. Only `visualizer/demo_graph.json` (Droste indexing itself) is
  public.
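
The "deterministic hash embedding backend" named in the first rule is not shown in this file. As a rough illustration only, a backend of that kind could be built with feature hashing, as in the sketch below. The names `hash_embed` and `DIM` are hypothetical, and the 384 dimensions are an assumption borrowed from the README's `bge-small-en-v1.5` figure; none of this is taken from Droste's source.

```python
import hashlib
import math
import re

DIM = 384  # assumed size, chosen to match the README's 384-dim model


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Feature-hash the tokens of `text` into a fixed-size, L2-normalised vector."""
    vec = [0.0] * dim
    for token in re.findall(r"[A-Za-z0-9_]+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] & 1 else -1.0  # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # An empty input has no tokens, so the all-zero vector is returned unchanged.
    return [x / norm for x in vec] if norm else vec
```

Because the vector depends only on the input bytes, two runs on any machine give identical results, which is the property a regression suite needs and a downloaded model cannot guarantee.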

## Good first issues

- New language extractor / edge rules in `core/treesitter_extract.py`.
- More cross-language bridges in `core/droste_ingester.py`
  (`_build_dependency_links`) — e.g. ORM table refs, GraphQL, gRPC.
- Visualizer polish in `visualizer/cockpit.html`.

## PRs

Small, focused, with a one-line rationale and a test. CI runs `pytest` on
Linux across Python 3.10–3.12.
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Droste-Memory authors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,208 @@
Metadata-Version: 2.4
Name: droste-memory
Version: 1.0.0
Summary: Local hybrid structural and semantic graph memory engine.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.2.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: fastembed>=0.8.0
Requires-Dist: truststore>=0.10.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn[standard]>=0.27.0
Requires-Dist: tree-sitter>=0.23.0
Requires-Dist: tree-sitter-language-pack>=1.9.0
Provides-Extra: heavy-embed
Requires-Dist: sentence-transformers>=2.7.0; extra == "heavy-embed"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

<div align="center">

# Droste

### See your codebase as a living galaxy — and give your agents causal memory of it.

Droste indexes any repo into a fractal, zoomable map of its symbols, wires them
together with their real call / import / DB edges across languages, and serves an
agent the *causal* slice of code it actually needs — not just keyword matches.

**Local-first · zero-config · polyglot · MCP-native**

*Zooming out reveals the causal web — every cyan arc is a real `syntax_dependency`
edge. [Full flythrough (FastAPI)](docs/assets/demo.mp4)*

[Quickstart](#quickstart) · [Why it's different](#why-its-different) · [How it works](#how-it-works) · [MCP](#use-it-as-an-mcp-server) · [Benchmarks](#benchmarks)

</div>

---

## Quickstart

```bash
pip install -e . # or: pip install droste-memory (once on PyPI)
droste index . # index the current repo
droste view # open the fractal galaxy in your browser
```

Three commands. `droste view` opens a full-screen, 60fps zoomable map of your
code — scroll to dive from the project star into folder orbits, down to the
individual functions, with the causal edges glowing between them.

Need it for an agent instead of your eyes?

```bash
droste context "checkout flow" --budget 1500 # causal context slice for an LLM
```

Running `droste` with no arguments prints the command palette:

```text
.-----------------------.
.----' | '----.
.---' .-----+-----. '---.
.' .----' | '----. '.
/ .---' .---+---. '---. \
/ .-' .-' | '-. '-. \
| .' .-' .---+---. '-. '. |
| / .' .' | '. '. \ |
| | | | .--+--. | | | |
| --+--------+-----+---+ @ +---+-----+--------+-- |
| | | | '--+--' | | | |
| \ '. '. | .' .' / |
| '. '-. '---+---' .-' .' |
\ '-. '-. | .-' .-' /
\ '---. '---+---' .---' /
'. '----. | .----' .'
'---. '-----+-----' .---'
'----. | .----'
'-----------------------'

DROSTE-MEMORY // RIGID FRACTAL RADIAL LAYOUT
Local Graph Engine v1.0-Alpha-Sharded

Commands
droste index <path> [--reset]
droste status
droste zoom <symbol_name>
droste context [query] --budget 1500

Fast path: droste context hub_core --budget 1000 | clip
```

---

## Why it's different

Most "code context" tools rank by keyword (ctags / ripgrep / repo-maps) or by
embedding cosine (vector-RAG). Both can only return what *resembles* your query.
A caller that shares no tokens — or a database function in a different language —
is invisible to them, yet it's exactly what you need to understand or change the
code.

Droste's edge is the causal graph:

- **Causal wormholes.** Real `syntax_dependency` edges (calls, imports,
  inheritance) in both directions — Droste hands the caller and callees, ordered,
  within a token budget.
- **Cross-language bridges.** The part nobody else does well: Droste links across
  languages — app code to SQL functions/tables (`.rpc('x')`, `.from('table')`),
  to edge functions, and same-name handlers between any two languages. Your
  Dart/TS/Python frontend and your database stop being two separate worlds on the
  map.
- **A map you actually want to look at.** The fractal galaxy isn't a gimmick —
  it's how you see coupling, risk hotspots, and the blast radius of a change.
- **Zero-config and local.** No cloud, no account, no API key. fastembed (ONNX,
  no torch) gives real semantics; a deterministic fallback keeps it runnable
  anywhere.
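
The cross-language bridge bullet can be pictured with a small regex sketch. It scans app code for `.rpc('x')` and `.from('table')` calls and keeps only those whose target is defined in SQL found in the same repo. The patterns and the `bridge_edges` helper are hypothetical simplifications written for this illustration; the real ingester (`core/droste_ingester.py`) is not shown in this diff and may work quite differently.

```python
import re

# Hypothetical pattern: client calls that name a SQL object in a string literal.
CALL_RE = re.compile(r"""\.(rpc|from)\(\s*['"]([A-Za-z_][A-Za-z0-9_]*)['"]""")

# Hypothetical pattern: SQL definitions of functions and tables,
# with an optional schema prefix such as `public.`.
DEF_RE = re.compile(
    r"create\s+(?:or\s+replace\s+)?(function|table)\s+(?:if\s+not\s+exists\s+)?"
    r"(?:[A-Za-z_][A-Za-z0-9_]*\.)?([A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)


def bridge_edges(app_source: str, sql_source: str) -> list[tuple[str, str, str]]:
    """Return (call_kind, name, sql_kind) for calls whose target is defined in SQL."""
    defined = {name.lower(): kind.lower() for kind, name in DEF_RE.findall(sql_source)}
    edges = []
    for call_kind, name in CALL_RE.findall(app_source):
        sql_kind = defined.get(name.lower())
        if sql_kind is not None:  # skip targets that are not defined in the repo
            edges.append((call_kind, name, sql_kind))
    return edges
```

Dropping calls with no matching definition mirrors the README's own caveat that bridges are strongest where the target is actually defined in the indexed repo.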

Polyglot: Python (AST) + tree-sitter for Dart, TypeScript/JavaScript, Go, Rust,
Java, C#, C/C++, Kotlin, Swift, Ruby, PHP, SQL — symbols *and* edges.

> **Honest scope:** the measured advantage is structural / causal retrieval. On
> pure semantic "concept" queries it's competitive with a vector baseline, not a
> leap. Cross-language bridges are strongest where the target is actually defined
> in the indexed repo (e.g. SQL schema in your migrations).

---

## Benchmarks

Self-supervised eval (gold = the true caller/callee set from the AST), equal
retrieval breadth *k*, real embeddings, across Python + Dart repos
(`eval/comparative_eval.py`):

| structural retrieval | Droste | vector-RAG core | lexical core |
| --- | --- | --- | --- |
| neighbour-recall | **0.94** | 0.18 | 0.42 |
| nDCG@k | **0.65** | 0.10 | 0.29 |

…plus hundreds of true causal neighbours that both baselines structurally miss.
This is a retrieval-method comparison (the cores of vector-RAG and lexical
search), not a head-to-head against the finished products that wrap them.

---

## How it works

- **Causal graph.** Each definition is parsed (Python `ast`; tree-sitter for the
  rest) into the names it calls / imports / inherits, becoming first-class
  `syntax_dependency` edges. Cross-language edges add DB calls (`.rpc`, `.from`,
  `.functions.invoke`) and string-literal name matches across languages.
- **Hybrid seed.** A query is matched by a normalized blend of lexical score and
  semantic cosine (fastembed `bge-small-en-v1.5`, 384-dim), then the graph
  expands the seed bidirectionally (callees and callers).
- **Token packer.** Results fit a budget with LOD-demotion (full to contract to
  skeleton) and a hard guardrail that never cuts a line of code mid-token.
- **Sharded persistence.** One shard per file under `.droste/`, blake2b
  dirty-tracking so a re-index rewrites only what changed; atomic writes + meta
  written last, so it is crash-safe and self-heals on the next run.

---

## Use it as an MCP server

Droste is a drop-in MCP server — an agent calls it as primary code memory instead
of blind file reads.

```jsonc
{
  "mcpServers": {
    "droste": { "command": "python", "args": ["/abs/path/to/droste-memory/server.py"] }
  }
}
```

Key tools: `droste_index_project`, `droste_get_context`, `droste_status`.

---

## Development

```bash
pip install -e ".[dev]"
pytest # deterministic regression suite (tests/)
python eval/comparative_eval.py # retrieval benchmark vs lexical & vector cores
```

`tests/` = invariants + concurrency (round-trip, dirty-oracle, packer guardrail,
cross-process shard race). `eval/` = performance/quality benchmarks.

---

## Status

**v1.0.0a0 (alpha).** Engine, polyglot + cross-language graph, CLI, fractal
visualizer and MCP server are working and tested. Packaging/distribution are
maturing — issues and PRs welcome (see `CONTRIBUTING.md`).

## License

MIT — see [LICENSE](LICENSE).
@@ -0,0 +1,185 @@
<div align="center">

# Droste

### See your codebase as a living galaxy — and give your agents causal memory of it.

Droste indexes any repo into a fractal, zoomable map of its symbols, wires them
together with their real call / import / DB edges across languages, and serves an
agent the *causal* slice of code it actually needs — not just keyword matches.

**Local-first · zero-config · polyglot · MCP-native**

*Zooming out reveals the causal web — every cyan arc is a real `syntax_dependency`
edge. [Full flythrough (FastAPI)](docs/assets/demo.mp4)*

[Quickstart](#quickstart) · [Why it's different](#why-its-different) · [How it works](#how-it-works) · [MCP](#use-it-as-an-mcp-server) · [Benchmarks](#benchmarks)

</div>

---

## Quickstart

```bash
pip install -e . # or: pip install droste-memory (once on PyPI)
droste index . # index the current repo
droste view # open the fractal galaxy in your browser
```

Three commands. `droste view` opens a full-screen, 60fps zoomable map of your
code — scroll to dive from the project star into folder orbits, down to the
individual functions, with the causal edges glowing between them.

Need it for an agent instead of your eyes?

```bash
droste context "checkout flow" --budget 1500 # causal context slice for an LLM
```

Running `droste` with no arguments prints the command palette:

```text
.-----------------------.
.----' | '----.
.---' .-----+-----. '---.
.' .----' | '----. '.
/ .---' .---+---. '---. \
/ .-' .-' | '-. '-. \
| .' .-' .---+---. '-. '. |
| / .' .' | '. '. \ |
| | | | .--+--. | | | |
| --+--------+-----+---+ @ +---+-----+--------+-- |
| | | | '--+--' | | | |
| \ '. '. | .' .' / |
| '. '-. '---+---' .-' .' |
\ '-. '-. | .-' .-' /
\ '---. '---+---' .---' /
'. '----. | .----' .'
'---. '-----+-----' .---'
'----. | .----'
'-----------------------'

DROSTE-MEMORY // RIGID FRACTAL RADIAL LAYOUT
Local Graph Engine v1.0-Alpha-Sharded

Commands
droste index <path> [--reset]
droste status
droste zoom <symbol_name>
droste context [query] --budget 1500

Fast path: droste context hub_core --budget 1000 | clip
```

---

## Why it's different

Most "code context" tools rank by keyword (ctags / ripgrep / repo-maps) or by
embedding cosine (vector-RAG). Both can only return what *resembles* your query.
A caller that shares no tokens — or a database function in a different language —
is invisible to them, yet it's exactly what you need to understand or change the
code.

Droste's edge is the causal graph:

- **Causal wormholes.** Real `syntax_dependency` edges (calls, imports,
  inheritance) in both directions — Droste hands the caller and callees, ordered,
  within a token budget.
- **Cross-language bridges.** The part nobody else does well: Droste links across
  languages — app code to SQL functions/tables (`.rpc('x')`, `.from('table')`),
  to edge functions, and same-name handlers between any two languages. Your
  Dart/TS/Python frontend and your database stop being two separate worlds on the
  map.
- **A map you actually want to look at.** The fractal galaxy isn't a gimmick —
  it's how you see coupling, risk hotspots, and the blast radius of a change.
- **Zero-config and local.** No cloud, no account, no API key. fastembed (ONNX,
  no torch) gives real semantics; a deterministic fallback keeps it runnable
  anywhere.

Polyglot: Python (AST) + tree-sitter for Dart, TypeScript/JavaScript, Go, Rust,
Java, C#, C/C++, Kotlin, Swift, Ruby, PHP, SQL — symbols *and* edges.

> **Honest scope:** the measured advantage is structural / causal retrieval. On
> pure semantic "concept" queries it's competitive with a vector baseline, not a
> leap. Cross-language bridges are strongest where the target is actually defined
> in the indexed repo (e.g. SQL schema in your migrations).

---

## Benchmarks

Self-supervised eval (gold = the true caller/callee set from the AST), equal
retrieval breadth *k*, real embeddings, across Python + Dart repos
(`eval/comparative_eval.py`):

| structural retrieval | Droste | vector-RAG core | lexical core |
| --- | --- | --- | --- |
| neighbour-recall | **0.94** | 0.18 | 0.42 |
| nDCG@k | **0.65** | 0.10 | 0.29 |

…plus hundreds of true causal neighbours that both baselines structurally miss.
This is a retrieval-method comparison (the cores of vector-RAG and lexical
search), not a head-to-head against the finished products that wrap them.
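
The README does not spell out how the two metrics are computed, and `eval/comparative_eval.py` is not part of this diff. For readers who want a reference point, the sketch below uses the common binary-relevance definitions: recall against the gold neighbour set, and nDCG@k with a log2 rank discount. Treat these formulas as an assumption about what the table reports, not as a description of the eval script. The function names are hypothetical, and `retrieved` is assumed to contain no duplicates.

```python
import math


def neighbour_recall(retrieved: list[str], gold: set[str]) -> float:
    """Fraction of the gold neighbours that appear anywhere in `retrieved`."""
    if not gold:
        return 0.0
    return len(set(retrieved) & gold) / len(gold)


def ndcg_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    """Binary-relevance nDCG over the top-k results (rank 0 is the first result)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(retrieved[:k])
        if item in gold
    )
    # Ideal ordering: every gold item ranked first, capped at k positions.
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(gold))))
    return dcg / ideal if ideal else 0.0
```

Recall rewards finding the neighbours at all, while nDCG@k also rewards ranking them early, which is why the two rows of the table can differ so much for the same retriever.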

---

## How it works

- **Causal graph.** Each definition is parsed (Python `ast`; tree-sitter for the
  rest) into the names it calls / imports / inherits, becoming first-class
  `syntax_dependency` edges. Cross-language edges add DB calls (`.rpc`, `.from`,
  `.functions.invoke`) and string-literal name matches across languages.
- **Hybrid seed.** A query is matched by a normalized blend of lexical score and
  semantic cosine (fastembed `bge-small-en-v1.5`, 384-dim), then the graph
  expands the seed bidirectionally (callees and callers).
- **Token packer.** Results fit a budget with LOD-demotion (full to contract to
  skeleton) and a hard guardrail that never cuts a line of code mid-token.
- **Sharded persistence.** One shard per file under `.droste/`, blake2b
  dirty-tracking so a re-index rewrites only what changed; atomic writes + meta
  written last, so it is crash-safe and self-heals on the next run.
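
The "Sharded persistence" bullet combines three ideas: content digests to detect dirty files, atomic shard writes, and a meta file written last. The sketch below shows one way those pieces can fit together using only the standard library. The on-disk layout (`shards/`, `meta.json`) and the names `content_digest`, `atomic_write_json` and `reindex` are assumptions made for illustration; the actual format under `.droste/` is not visible in this diff.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def content_digest(data: bytes) -> str:
    """blake2b digest used to decide whether a file changed since the last index."""
    return hashlib.blake2b(data, digest_size=16).hexdigest()


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # readers see the old or the new file, never half
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def reindex(files: dict[str, bytes], store: Path) -> list[str]:
    """Rewrite only shards whose source digest changed, then write meta last."""
    meta_path = store / "meta.json"
    old = json.loads(meta_path.read_text(encoding="utf-8")) if meta_path.exists() else {}
    new_meta: dict[str, str] = {}
    rewritten: list[str] = []
    for rel_path, data in sorted(files.items()):
        digest = content_digest(data)
        new_meta[rel_path] = digest
        shard = store / "shards" / (content_digest(rel_path.encode("utf-8")) + ".json")
        if old.get(rel_path) != digest or not shard.exists():
            atomic_write_json(shard, {"path": rel_path, "digest": digest})
            rewritten.append(rel_path)
    # If the process dies before this line, the stale meta makes the next run
    # treat the affected files as dirty again, so the store repairs itself.
    atomic_write_json(meta_path, new_meta)
    return rewritten
```

Writing meta last is the design choice that matters: the meta file only ever claims a digest for a shard that has already been written completely.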

---

## Use it as an MCP server

Droste is a drop-in MCP server — an agent calls it as primary code memory instead
of blind file reads.

```jsonc
{
  "mcpServers": {
    "droste": { "command": "python", "args": ["/abs/path/to/droste-memory/server.py"] }
  }
}
```

Key tools: `droste_index_project`, `droste_get_context`, `droste_status`.

---

## Development

```bash
pip install -e ".[dev]"
pytest # deterministic regression suite (tests/)
python eval/comparative_eval.py # retrieval benchmark vs lexical & vector cores
```

`tests/` = invariants + concurrency (round-trip, dirty-oracle, packer guardrail,
cross-process shard race). `eval/` = performance/quality benchmarks.

---

## Status

**v1.0.0a0 (alpha).** Engine, polyglot + cross-language graph, CLI, fractal
visualizer and MCP server are working and tested. Packaging/distribution are
maturing — issues and PRs welcome (see `CONTRIBUTING.md`).

## License

MIT — see [LICENSE](LICENSE).