memweave 0.1.0__tar.gz
- memweave-0.1.0/LICENSE +21 -0
- memweave-0.1.0/PKG-INFO +529 -0
- memweave-0.1.0/README.md +501 -0
- memweave-0.1.0/memweave/__init__.py +100 -0
- memweave-0.1.0/memweave/_internal/__init__.py +0 -0
- memweave-0.1.0/memweave/_internal/hashing.py +295 -0
- memweave-0.1.0/memweave/_progress.py +76 -0
- memweave-0.1.0/memweave/chunking/__init__.py +0 -0
- memweave-0.1.0/memweave/chunking/markdown.py +232 -0
- memweave-0.1.0/memweave/config.py +622 -0
- memweave-0.1.0/memweave/embedding/__init__.py +0 -0
- memweave-0.1.0/memweave/embedding/cache.py +355 -0
- memweave-0.1.0/memweave/embedding/provider.py +308 -0
- memweave-0.1.0/memweave/embedding/vectors.py +47 -0
- memweave-0.1.0/memweave/exceptions.py +113 -0
- memweave-0.1.0/memweave/flush/__init__.py +0 -0
- memweave-0.1.0/memweave/flush/memory_flush.py +115 -0
- memweave-0.1.0/memweave/py.typed +1 -0
- memweave-0.1.0/memweave/search/__init__.py +49 -0
- memweave-0.1.0/memweave/search/hybrid.py +209 -0
- memweave-0.1.0/memweave/search/keyword.py +849 -0
- memweave-0.1.0/memweave/search/mmr.py +224 -0
- memweave-0.1.0/memweave/search/postprocessor.py +108 -0
- memweave-0.1.0/memweave/search/query_expansion.py +107 -0
- memweave-0.1.0/memweave/search/strategy.py +128 -0
- memweave-0.1.0/memweave/search/temporal_decay.py +368 -0
- memweave-0.1.0/memweave/search/vector.py +166 -0
- memweave-0.1.0/memweave/storage/__init__.py +0 -0
- memweave-0.1.0/memweave/storage/files.py +273 -0
- memweave-0.1.0/memweave/storage/schema.py +248 -0
- memweave-0.1.0/memweave/storage/sqlite_store.py +813 -0
- memweave-0.1.0/memweave/store.py +1178 -0
- memweave-0.1.0/memweave/sync/__init__.py +0 -0
- memweave-0.1.0/memweave/sync/watcher.py +144 -0
- memweave-0.1.0/memweave/types.py +307 -0
- memweave-0.1.0/pyproject.toml +72 -0
memweave-0.1.0/LICENSE
ADDED

MIT License

Copyright (c) 2026 Sachin Sharma

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
memweave-0.1.0/PKG-INFO
ADDED

Metadata-Version: 2.4
Name: memweave
Version: 0.1.0
Summary: Agentic memory you can read, search, and git diff
License: MIT
License-File: LICENSE
Keywords: ai,agents,memory,rag,llm,multi-agent
Author: Sachin Sharma
Author-email: sachin.sharma.tukl@gmail.com
Requires-Python: >=3.12,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: aiofiles (>=23.0.0)
Requires-Dist: aiosqlite (>=0.20.0)
Requires-Dist: litellm (>=1.80.13,<2.0)
Requires-Dist: numpy (>=1.24.0)
Requires-Dist: sqlite-vec (>=0.1.0)
Requires-Dist: watchfiles (>=0.20.0)
Project-URL: Repository, https://github.com/memweave/memweave
Description-Content-Type: text/markdown

# memweave

**Agent memory you can read, search, and `git diff`.**

[PyPI](https://pypi.org/project/memweave/) · [MIT License](https://opensource.org/licenses/MIT)

memweave is a zero-infrastructure, async-first Python library that gives AI agents persistent, searchable memory — stored as plain Markdown files and indexed by SQLite. No external services. No black-box databases. Every memory is a file you can open, edit, grep, and version-control.

---

## Why memweave?

- 📄 **Human-readable by design.** Memories live in plain `.md` files on disk. Open them in your editor, inspect them in your terminal, or `git diff` what your agent learned between runs.
- 🔍 **Hybrid search out of the box.** Combines BM25 keyword ranking (FTS5) with semantic vector search (sqlite-vec) and merges them — so "PostgreSQL JSONB" finds both exact matches and conceptually related content.
- ⚡ **Zero LLM calls on core operations.** Writing and searching memories never touches an LLM. Embeddings are cached by content hash — compute once, reuse forever.
- 🌐 **Works completely offline.** If your embedding API is down, memweave falls back to pure keyword search. It never crashes; it degrades gracefully.
- 💸 **Zero server cost, zero setup.** The entire memory store is a single SQLite file on disk — no vector database to provision, no cloud service to pay for, no Docker container to manage.
- 🔌 **Pluggable at every layer.** Swap in a custom search strategy, add a post-processing step, or bring your own embedding provider via a single protocol.

---

## 🚀 Quickstart Guide

```bash
pip install memweave
```

Set an embedding provider (or skip to use keyword-only mode):

```bash
export OPENAI_API_KEY=sk-...
```

```python
import asyncio
from pathlib import Path
from memweave import MemWeave, MemoryConfig

async def main():
    async with MemWeave(MemoryConfig(workspace_dir=".")) as mem:
        # Write a memory file, then index it
        memory_file = Path("memory/preferences.md")
        memory_file.parent.mkdir(exist_ok=True)
        memory_file.write_text("The user prefers dark mode and concise answers.")
        await mem.add(memory_file)

        # Search across all memories.
        # min_score=0.0 ensures results surface in a small corpus;
        # in production the default 0.35 threshold filters low-confidence matches.
        results = await mem.search("What is the user preference?", min_score=0.0)
        for r in results:
            print(f"[{r.score:.2f}] {r.snippet} ← {r.path}:{r.start_line}")

asyncio.run(main())
```

Memories are plain Markdown files in `memory/`. Inspect them any time:

```bash
cat memory/*.md
```

Each result includes a relevance score and the exact file and line it came from:

```
[0.35] The user prefers dark mode and concise answers. ← memory/preferences.md:1
```

---

## How it works

memweave separates **storage** from **search**:

```
┌──────────────────────────────────────────────────────────────┐
│  SOURCE OF TRUTH (Markdown files)                            │
│    memory/MEMORY.md       ← evergreen knowledge              │
│    memory/2026-03-21.md   ← daily logs                       │
│    memory/agents/coder/   ← agent-scoped namespace           │
└───────────────────────┬──────────────────────────────────────┘
                        │  chunking → hashing → embedding
┌───────────────────────▼──────────────────────────────────────┐
│  DERIVED INDEX (SQLite)                                      │
│    chunks           — text + metadata                        │
│    chunks_fts       — FTS5 full-text index (BM25)            │
│    chunks_vec       — sqlite-vec SIMD index (cosine)         │
│    embedding_cache  — hash → vector (skip re-embedding)      │
│    files            — SHA-256 change detection               │
└───────────────────────┬──────────────────────────────────────┘
                        │  hybrid merge → post-processing
                        ▼
                list[SearchResult]
```

**Write path** — `await mem.add(path)` takes any Markdown file you've written — dated, evergreen, agent-scoped, or session — chunks it, checks the embedding cache (hash lookup), calls the embedding API only on a miss, and inserts into both the FTS5 and vector tables. No LLM involved.
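The cache lookup in the write path boils down to hashing the chunk text and checking before calling the API. A minimal sketch of the idea (illustrative only, not memweave's actual `embedding/cache.py`; `embed_fn` stands in for the real embedding call):

```python
import hashlib

def embed_with_cache(text: str, cache: dict, embed_fn) -> list[float]:
    # Key by content hash: identical text never hits the embedding API twice.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in cache:
        return cache[key]          # cache hit, no API call
    vec = embed_fn(text)           # cache miss, compute once
    cache[key] = vec
    return vec

# Stand-in for a real embedding call; counts invocations.
calls = {"n": 0}
def fake_embed(text: str) -> list[float]:
    calls["n"] += 1
    return [float(len(text))]

cache: dict = {}
embed_with_cache("dark mode preferred", cache, fake_embed)
embed_with_cache("dark mode preferred", cache, fake_embed)  # served from cache
print(calls["n"])  # 1
```

Because the key is the content hash rather than the file path, moving or renaming a file costs nothing at re-index time.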

**Search path** — `await mem.search(query)` embeds the query, runs vector search and keyword search in parallel, merges scores (`0.7 × vector + 0.3 × BM25`), applies post-processors (threshold → MMR → temporal decay), and returns ranked results.
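The weighted merge can be sketched in a few lines. This is an illustrative sketch of the scoring rule only; memweave's real merge lives in `memweave/search/hybrid.py` and joins SQL result sets rather than dicts:

```python
def merge_scores(vector_scores: dict[str, float],
                 keyword_scores: dict[str, float],
                 vector_weight: float = 0.7,
                 text_weight: float = 0.3) -> list[tuple[str, float]]:
    # Union of chunk ids from both result sets; a chunk missing from one
    # side simply contributes 0 from that side.
    merged = {
        chunk_id: (vector_weight * vector_scores.get(chunk_id, 0.0)
                   + text_weight * keyword_scores.get(chunk_id, 0.0))
        for chunk_id in vector_scores.keys() | keyword_scores.keys()
    }
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# "a" is a strong semantic match, "b" a strong keyword match, "c" keyword-only.
ranked = merge_scores({"a": 0.9, "b": 0.2}, {"b": 1.0, "c": 0.5})
print([chunk_id for chunk_id, _ in ranked])  # ['a', 'b', 'c']
```

With the default 0.7/0.3 split, a strong semantic hit outranks a strong keyword hit, which matches the library's bias toward conceptual relevance.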

---

## Core concepts

### Markdown as the source of truth

The SQLite index is a **derived cache** — always rebuildable from the Markdown files. This means:

- You can edit memories directly in your editor and re-index with `await mem.index()`.
- `git diff memory/` shows exactly what an agent learned between commits.
- Losing the database is not data loss. Losing the files is.

### Evergreen vs dated files

| File | Behaviour |
|------|-----------|
| `memory/MEMORY.md` | **Evergreen** — never decays, write-protected during `flush()` |
| `memory/2026-03-21.md` | **Dated** — subject to temporal decay (older memories rank lower) |
| `memory/researcher_agent/` | **Agent-scoped** — isolated namespace per agent |
| `memory/episodes/event.md` | **Episodic** — named events, timestamped |

Evergreen files hold foundational facts that should always surface at full score. Dated files accumulate daily learning and fade naturally — recent memories rank higher.

### Agent namespaces & source labels

Every file gets a `source` label derived from its path — the **immediate subdirectory** under `memory/` becomes the label:

| File path | `source` |
|-----------|----------|
| `memory/notes.md` | `"memory"` |
| `memory/sessions/2026-04-03.md` | `"sessions"` |
| `memory/researcher_agent/findings.md` | `"researcher_agent"` |
| Outside `memory/` | `"external"` |

Pass `source_filter="researcher_agent"` to `search()` to scope results exclusively to that namespace. Only the first path component counts — `memory/researcher_agent/sub/x.md` has source `"researcher_agent"`, not `"sub"`.
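The labelling rule above can be sketched as a small function. This is illustrative only (`source_label` is a hypothetical name; memweave derives the label internally from the workspace-relative path):

```python
from pathlib import PurePosixPath

def source_label(path: str) -> str:
    # Workspace-relative path, POSIX-style, per the rule described above.
    parts = PurePosixPath(path).parts
    if not parts or parts[0] != "memory":
        return "external"               # anything outside memory/
    if len(parts) == 2:
        return "memory"                 # file sits directly in memory/
    return parts[1]                     # first subdirectory wins

print(source_label("memory/notes.md"))                    # memory
print(source_label("memory/sessions/2026-04-03.md"))      # sessions
print(source_label("memory/researcher_agent/sub/x.md"))   # researcher_agent
print(source_label("docs/readme.md"))                     # external
```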

### Hybrid search

```
query: "which database should I use for JSON?"
        │
        ├─ FTS5 BM25 ──────────── exact keywords → score A
        │
        └─ sqlite-vec cosine ──── semantic match → score B
                   │
                   ▼
        merged = 0.7 × B + 0.3 × A
```

Post-processors run after merging:

- **Score threshold** — drops results below `min_score` (default `0.35`)
- **MMR re-ranking** — penalises redundant results, promotes diversity (disabled by default)
- **Temporal decay** — exponential score reduction by file age, evergreen exempt (disabled by default)
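The temporal decay step is a standard exponential half-life applied to merged scores. A minimal sketch of the idea (illustrative, not memweave's actual `search/temporal_decay.py`; the 30-day default mirrors `TemporalDecayConfig.half_life_days`):

```python
import math

def decayed_score(score: float, age_days: float,
                  half_life_days: float = 30.0, evergreen: bool = False) -> float:
    # Score halves every half_life_days; evergreen files are exempt.
    if evergreen:
        return score
    return score * math.pow(0.5, age_days / half_life_days)

print(decayed_score(0.8, 0))                    # 0.8  (fresh file, full score)
print(decayed_score(0.8, 30))                   # 0.4  (one half-life old)
print(decayed_score(0.8, 60, evergreen=True))   # 0.8  (evergreen never fades)
```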

---

## Usage examples

### Single agent with persistent memory

```python
import asyncio
from pathlib import Path
from memweave import MemWeave, MemoryConfig

async def run_agent_session():
    config = MemoryConfig(workspace_dir="./my_project")

    async with MemWeave(config) as mem:
        # Write memory files, then index them
        memory_dir = Path("my_project/memory")
        memory_dir.mkdir(parents=True, exist_ok=True)

        (memory_dir / "stack.md").write_text("User's preferred stack: FastAPI + PostgreSQL + Redis.")
        (memory_dir / "guidelines.md").write_text("Avoid using global state in this codebase.")

        await mem.index()

        # Retrieve relevant context before responding
        context = await mem.search("database recommendations", min_score=0.0, max_results=2)
        for result in context:
            print(f"  [{result.score:.2f}] {result.snippet} ({result.path}:{result.start_line})")

asyncio.run(run_agent_session())
```

### Multi-agent with shared and isolated namespaces

Agents share one workspace but write to separate subdirectories under `memory/`. The subdirectory name becomes the `source` label — pass `source_filter="researcher_agent"` to scope a search exclusively to that agent's files.

```python
import asyncio
from pathlib import Path
from memweave import MemWeave, MemoryConfig

async def main():
    # Both agents share the same workspace root
    researcher = MemWeave(MemoryConfig(workspace_dir="./project"))
    writer = MemWeave(MemoryConfig(workspace_dir="./project"))

    async with researcher, writer:
        # Researcher writes space exploration findings to its own namespace
        memory_dir = Path("project/memory/researcher_agent")
        memory_dir.mkdir(parents=True, exist_ok=True)

        (memory_dir / "mars_habitat.md").write_text(
            "Mars surface pressure is ~0.6% of Earth's, requiring fully pressurised habitats. "
            "NASA's MOXIE experiment on Perseverance successfully produced oxygen from CO2 in 2021, "
            "validating in-situ resource utilisation (ISRU) as a viable strategy for long-duration missions."
        )
        (memory_dir / "artemis_mission.md").write_text(
            "Artemis III aims to land the first woman and next man near the lunar south pole. "
            "Permanently shadowed craters there hold water ice deposits confirmed by LCROSS in 2009. "
            "Ice can be electrolysed into hydrogen and oxygen, serving as both breathable air and rocket propellant."
        )
        (memory_dir / "deep_space_propulsion.md").write_text(
            "Ion drives expel charged xenon atoms at ~90,000 km/h, achieving far higher specific impulse "
            "than chemical rockets, though thrust is measured in millinewtons. NASA's Dawn spacecraft used "
            "ion propulsion to orbit both Vesta and Ceres — the first mission to orbit two extraterrestrial bodies."
        )

        await researcher.index()

        # Writer queries the researcher's findings — scoped to the researcher_agent source
        queries = [
            "how do astronauts get oxygen on Mars",
            "water ice on the Moon",
            "spacecraft propulsion beyond chemical rockets",
        ]

        for query in queries:
            print(f"\nQuery: {query!r}")
            results = await writer.search(query, source_filter="researcher_agent", min_score=0.0, max_results=1)
            for r in results:
                print(f"  [{r.score:.2f}] {r.snippet} ({r.path}:{r.start_line})")

asyncio.run(main())
```

### Memory flush — persist conversation facts before context compaction

LLM context windows are finite. When a long conversation is compacted or a session ends, anything not written to memory is lost. `flush()` solves this by sending the conversation to an LLM with a structured extraction prompt — the model distils durable facts (decisions, preferences, constraints) and discards small talk. The extracted text is appended to the dated memory file (`memory/YYYY-MM-DD.md`) and immediately re-indexed, so it surfaces in future searches. If the LLM finds nothing worth storing it returns a silent sentinel and `flush()` returns `None` — nothing is written.

Requires an LLM API key (configured via `FlushConfig.model`, default `gpt-4o-mini`).

```python
import asyncio
from pathlib import Path
from memweave import MemWeave, MemoryConfig

WORKSPACE = Path(__file__).parent / "workspace"

conversation = [
    {"role": "user", "content": "We just decided to use Valkey instead of Redis for caching."},
    {"role": "assistant", "content": "Got it. I'll note that Valkey is the new caching layer."},
    {"role": "user", "content": "Also, we're targeting a 5ms p99 latency SLA for the cache."},
]

async def main():
    config = MemoryConfig(workspace_dir=WORKSPACE)

    async with MemWeave(config) as mem:
        # Extract durable facts from the conversation and write to workspace/memory/YYYY-MM-DD.md.
        # Returns the extracted text, or None if there was nothing worth storing.
        extracted = await mem.flush(conversation=conversation)
        if extracted:
            print(f"Stored:\n{extracted}\n")
        else:
            print("Nothing worth storing.\n")

        # Search the indexed knowledge immediately after flush
        results = await mem.search("Valkey caching latency", min_score=0.0)
        print(f"Search results ({len(results)} hits):")
        for r in results:
            print(f"  [{r.score:.3f}] {r.snippet.strip()}")

asyncio.run(main())
```

### Custom search strategy

The built-in `"hybrid"`, `"vector"`, and `"keyword"` strategies cover most cases, but sometimes you need ranking logic that none of them support — for example, boosting results from recently modified files, hard-pinning results from a specific file to the top, or implementing a completely different scoring algorithm. A custom strategy gives you direct access to the SQLite database, so you can write any query you like and return results in whatever order you want. memweave applies your results through the same post-processing pipeline (score threshold, MMR, temporal decay) as built-in strategies.

Register a strategy once with `mem.register_strategy(name, obj)`, then activate it per-call via `strategy=name`.

```python
import asyncio
import aiosqlite
from memweave import MemWeave, MemoryConfig
from memweave.search.strategy import RawSearchRow

class RecencyBoostStrategy:
    async def search(
        self,
        db: aiosqlite.Connection,
        query: str,
        query_vec: list[float] | None,
        model: str,
        limit: int,
        *,
        source_filter: str | None = None,
    ) -> list[RawSearchRow]:
        # Your custom ranking logic here — query `db` directly and return RawSearchRow objects
        ...

async def main():
    async with MemWeave(MemoryConfig(workspace_dir=".")) as mem:
        mem.register_strategy("recency", RecencyBoostStrategy())
        results = await mem.search("recent decisions", strategy="recency")

asyncio.run(main())
```

### File watcher — auto-reindex on file change

When running a long-lived agent, memory files can be edited externally — by another process, a human, or a separate agent writing to the same workspace. Without the watcher, those changes are invisible until the next explicit `await mem.index()` call. `start_watching()` launches a background task that monitors the `memory/` directory and re-indexes any `.md` file the moment it changes, so searches always reflect the latest content. Rapid successive writes are debounced (default 1500 ms) to avoid redundant re-indexing. The watcher stops automatically when the context manager exits.

Requires the `watchfiles` package (`pip install memweave[watch]`).

```python
import asyncio
from memweave import MemWeave

async def main():
    async with MemWeave() as mem:
        await mem.start_watching()  # starts background task, watches memory/
        # ... run your agent loop
        # any .md file edits are picked up and re-indexed automatically
        # watcher stops automatically on context manager exit

asyncio.run(main())
```

### Inspect memory status

`status()` gives a point-in-time snapshot of the store — how many files and chunks are indexed, which search mode is active (`hybrid`, `fts-only`, or `vector-only`), whether there are unindexed changes pending (`dirty`), and how many embeddings are cached. Useful for health checks, debugging, or surfacing store state in agent logs.

```python
async with MemWeave() as mem:
    status = await mem.status()
    print(f"Files: {status.files}")
    print(f"Chunks: {status.chunks}")
    print(f"Search mode: {status.search_mode}")  # hybrid | fts-only | vector-only
    print(f"Dirty: {status.dirty}")  # unindexed changes pending
```

### List indexed files

`files()` returns metadata for every file currently tracked in the index — path, size, chunk count, source label, and whether the file is evergreen. Useful when an agent needs to audit what it has access to, detect stale files, or decide which namespace to write to next.

```python
async with MemWeave() as mem:
    for f in await mem.files():
        print(f"{f.path} ({f.chunks} chunks, evergreen={f.is_evergreen}, source={f.source})")
```

---

## Configuring memweave

All configuration is optional — sensible defaults work out of the box. Pass a `MemoryConfig` to override.

`MemoryConfig` is a single nested dataclass that groups every tunable knob into focused sub-configs. Each sub-config has its own defaults and can be overridden independently:

- **`EmbeddingConfig`** — which model to use for vectorising text, API key, batch size, timeout.
- **`ChunkingConfig`** — chunk size and overlap in tokens. Smaller chunks give more precise retrieval; larger chunks give more context per result.
- **`QueryConfig`** — default search strategy, max results, score threshold, and the settings for the three built-in post-processors (hybrid weights, MMR, temporal decay).
- **`CacheConfig`** — embedding cache toggle and optional LRU eviction cap to bound disk usage.
- **`SyncConfig`** — when to auto-reindex (before each search, on file change, or on a periodic interval).
- **`FlushConfig`** — the LLM model and system prompt used by `flush()` for fact extraction.

Search settings can also be overridden per call (e.g. `min_score`, `max_results`, `strategy`) without touching the config.

```python
from memweave import MemWeave
from memweave.config import (
    MemoryConfig, EmbeddingConfig, QueryConfig,
    HybridConfig, MMRConfig, TemporalDecayConfig,
    SyncConfig, FlushConfig,
)

config = MemoryConfig(
    workspace_dir="./memory",  # where .md files live

    embedding=EmbeddingConfig(
        model="text-embedding-3-small",  # any LiteLLM-compatible model
        api_key="sk-...",                # or set via environment variable
        batch_size=100,
    ),

    query=QueryConfig(
        strategy="hybrid",  # "hybrid" | "vector" | "keyword"
        max_results=10,
        min_score=0.35,

        hybrid=HybridConfig(
            vector_weight=0.7,  # weight for semantic similarity
            text_weight=0.3,    # weight for BM25 keyword score
        ),

        mmr=MMRConfig(
            enabled=True,
            lambda_param=0.5,  # 0 = max diversity, 1 = max relevance
        ),

        temporal_decay=TemporalDecayConfig(
            enabled=True,
            half_life_days=30.0,  # score halves every 30 days
        ),
    ),

    sync=SyncConfig(
        on_search=True,  # sync dirty files before each search
        watch=False,     # enable file watcher
        watch_debounce_ms=500,
    ),

    flush=FlushConfig(
        enabled=True,
        model="gpt-4o-mini",  # LLM used for fact extraction
    ),
)

async with MemWeave(config) as mem:
    ...
```

### Embedding providers

memweave uses [LiteLLM](https://github.com/BerriAI/litellm) under the hood — any LiteLLM-compatible embedding model works with zero code changes:

| Provider | Model example |
|----------|---------------|
| OpenAI | `text-embedding-3-small` |
| Gemini | `gemini/text-embedding-004` |
| Voyage AI | `voyage/voyage-3` |
| Mistral | `mistral/mistral-embed` |
| Ollama (local) | `ollama/nomic-embed-text` |
| Cohere | `cohere/embed-english-v3.0` |

**Ollama (no API key required):**

```python
from memweave.config import MemoryConfig, EmbeddingConfig

config = MemoryConfig(
    embedding=EmbeddingConfig(
        model="ollama/nomic-embed-text",
        api_base="http://localhost:11434",
    )
)
```

**Keyword-only mode (fully offline, no embeddings):**

```python
from memweave.config import MemoryConfig, QueryConfig

config = MemoryConfig(
    query=QueryConfig(strategy="keyword")
)
```

---

## API reference

### `MemWeave`

| Method | Description |
|--------|-------------|
| `await mem.add(path, *, force=False)` | Index a single Markdown file immediately |
| `await mem.index(*, force=False)` | (Re)index all Markdown files in the workspace |
| `await mem.search(query, *, max_results, min_score, strategy, source_filter)` | Search indexed memories |
| `await mem.flush(conversation, *, model=None, system_prompt=None)` | Extract and persist facts from a conversation via LLM |
| `await mem.status()` | Return `StoreStatus` (file count, chunk count, search mode, …) |
| `await mem.files()` | Return `list[FileInfo]` for all indexed files |
| `await mem.start_watching()` | Start background file watcher (auto-reindex on `.md` changes) |
| `await mem.close()` | Stop watcher and close database |
| `mem.register_strategy(name, strategy)` | Register a custom `SearchStrategy` |
| `mem.register_postprocessor(processor)` | Register a custom `PostProcessor` |

---

## 🤝 Contributing

Issues and pull requests are welcome. Please open an issue before starting large changes.

---

## Acknowledgements

🦞 [OpenClaw](https://github.com/openclaw/openclaw) — the memory architecture that inspired memweave.

---

## License

MIT