bub-semantic-memory 0.1.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,267 @@
1
+ Metadata-Version: 2.4
2
+ Name: bub-semantic-memory
3
+ Version: 0.1.0
4
+ Summary: Semantic memory plugin for Bub
5
+ Project-URL: Repository, https://github.com/bubbuild/bub
6
+ Author: Bub Community
7
+ License: Apache-2.0
8
+ Requires-Python: >=3.12
9
+ Requires-Dist: aiofiles>=25.1.0
10
+ Requires-Dist: bub>=0.3.0
11
+ Requires-Dist: pydantic>=2.0.0
12
+ Requires-Dist: republic>=0.5.4
13
+ Description-Content-Type: text/markdown
14
+
15
+ # Semantic Memory Plugin for Bub
16
+
17
+ A plugin that extracts and retains semantic entities and relations from conversation histories, enriching agent context with semantic memory.
18
+
19
+ ## Overview
20
+
21
+ This plugin intercepts the tape context building process to:
22
+ 1. **Extract semantics** from conversation entries using an LLM
23
+ 2. **Store snapshots** of entities (people, tasks, concepts) and relations between them
24
+ 3. **Inject memory** into subsequent agent prompts, enabling long-context awareness
25
+
26
+ The plugin follows Bub's philosophy: it's completely optional, zero-config after installation, and hooks into the existing `build_tape_context` architecture without modifying core.
27
+
28
+ ## Installation
29
+
30
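+ Install the package into the same environment as Bub (for example, `pip install bub-semantic-memory`). The plugin registers itself through an entry point declared in its `pyproject.toml`: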
31
+
32
+ ```toml
33
+ [project.entry-points."bub"]
34
+ semantic_memory = "bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin"
35
+ ```
36
+
37
+ Bub's framework automatically loads and instantiates it on startup. No additional setup required.
38
+
39
+ ## How It Works
40
+
41
+ ### Per-Turn Flow
42
+
43
+ 1. **Input**: The agent receives a new message and the tape entries are loaded
44
+ 2. **Extract**: LLM analyzes entries and identifies:
45
+    - **Entities**: people, tasks, events, concepts
46
+    - **Relations**: created, depends_on, mentions, etc.
47
+ 3. **Store**: SemanticSnapshot is appended to `~/.bub/tapes/semantic/{tape_id}.jsonl`
48
+ 4. **Load**: All historical snapshots for this tape are loaded
49
+ 5. **Inject**: Semantic memory is formatted as a system prompt block and appended to the end of the context as an extra system message
50
+ 6. **Output**: Agent receives enriched context with semantic awareness
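The per-turn flow can be sketched end to end with stand-ins. Everything here is hypothetical: `Snapshot`, `InMemoryStore`, `fake_extract`, and `build_context` are illustrative names, the extractor is a keyword stub instead of an LLM call, and the base context is simplified to one user message per entry.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Simplified stand-in for SemanticSnapshot (hypothetical)."""

    entities: list[tuple[str, str]]          # (type, name)
    relations: list[tuple[str, str, str]]    # (from_name, relation, to_name)


@dataclass
class InMemoryStore:
    """Hypothetical store keyed by tape id, standing in for the JSONL store."""

    data: dict[str, list[Snapshot]] = field(default_factory=dict)

    async def append(self, tape_id: str, snap: Snapshot) -> None:
        self.data.setdefault(tape_id, []).append(snap)

    async def load(self, tape_id: str) -> list[Snapshot]:
        return list(self.data.get(tape_id, []))


async def fake_extract(entries: list[str]) -> Snapshot:
    """Stub for step 2; the plugin calls the LLM here instead."""
    if any("Alice created" in e for e in entries):
        return Snapshot([("person", "Alice"), ("task", "deploy_v1")], [("Alice", "created", "deploy_v1")])
    return Snapshot([], [])


async def build_context(entries: list[str], tape_id: str, store: InMemoryStore) -> list[dict]:
    messages = [{"role": "user", "content": e} for e in entries]  # 1. base context
    snap = await fake_extract(entries)                            # 2. extract
    await store.append(tape_id, snap)                             # 3. store (even if empty)
    snapshots = await store.load(tape_id)                         # 4. load history
    if snapshots:                                                 # 5. inject at the end
        ents = sorted({f"- {t}:{n}" for s in snapshots for t, n in s.entities})
        rels = sorted({f"- {a} --{r}--> {b}" for s in snapshots for a, r, b in s.relations})
        block = "\n".join(["## Semantic Memory", *ents, *rels])
        messages.append({"role": "system", "content": block})
    return messages                                               # 6. output
```

On a second call with only "What did Alice do?", `fake_extract` finds nothing new, yet the first turn's facts are still injected because step 4 reloads every stored snapshot.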
51
+
52
+ ### Example
53
+
54
+ Given this conversation:
55
+ ```
56
+ User: "Alice created a task to deploy v1.0"
57
+ Agent: [responds]
58
+ User: "What did Alice do?"
59
+ ```
60
+
61
+ On the second turn, the semantic memory block is appended after the base context messages as an extra system message (entity ids come from the stored snapshots):
62
+
63
+ ```
64
+ [base context messages]
65
+
66
+ ## Semantic Memory
67
+
68
+ ### Entities (2):
69
+ - person:Alice (id=ent_abc123)
70
+ - task:deploy_v1 (id=ent_def456)
71
+
72
+ ### Relations (1):
73
+ - Alice --created--> deploy_v1
74
+ ```
77
+
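`_format_snapshots` in context.py deduplicates entities by `id`. The extractor creates a new `Entity` for every slug in every response, so if `Entity` ids are generated per instance (models.py is not shown, so this is an assumption), the same person extracted on two turns would be listed twice. A hedged alternative that keys on `(type, name)` and works on snapshot dicts in the storage format (`format_snapshots` is a hypothetical name):

```python
def format_snapshots(snapshots: list[dict]) -> str:
    """Render stored snapshot dicts as a semantic-memory block.

    Hypothetical variant: entities are deduplicated by (type, name) instead of
    by id, so an entity extracted on several turns is listed once even if each
    extraction assigned it a fresh id.
    """
    entities: dict[tuple[str, str], None] = {}  # dict used as an insertion-ordered set
    relations: dict[tuple[str, str, str], None] = {}
    for snap in snapshots:
        # Relation endpoints reference ids from the same snapshot
        id_to_name = {e["id"]: e["name"] for e in snap.get("entities", [])}
        for e in snap.get("entities", []):
            entities[(e["type"], e["name"])] = None
        for r in snap.get("relations", []):
            src = id_to_name.get(r["from"], r["from"])
            dst = id_to_name.get(r["to"], r["to"])
            relations[(src, r["type"], dst)] = None

    lines = ["## Semantic Memory", "", f"### Entities ({len(entities)}):"]
    lines += [f"- {t}:{n}" for t, n in entities]
    lines += ["", f"### Relations ({len(relations)}):"]
    lines += [f"- {s} --{rel}--> {d}" for s, rel, d in relations]
    return "\n".join(lines)
```

Because relations are rendered by name rather than id, the same fact extracted on two turns also collapses to a single line.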
78
+ ## Architecture
79
+
80
+ ### Core Modules
81
+
82
+ - **`models.py`**: Pydantic dataclasses for Entity, Relation, SemanticSnapshot
83
+ - **`extractor.py`**: LLM-based extraction from tape entries
84
+ - **`store.py`**: JSONL file storage at `~/.bub/tapes/semantic/`
85
+ - **`context.py`**: Formatting snapshots into system prompts
86
+ - **`hook_impl.py`**: Bub hookimpl that wires everything together
87
+
88
+ ### Storage Format
89
+
90
+ Snapshots are stored as JSONL: each snapshot is serialized as one JSON object on a single line. The example below is pretty-printed for readability:
91
+
92
+ ```json
93
+ {
94
+   "entities": [
95
+     {"id": "ent_abc123", "type": "person", "name": "Alice", "metadata": {}},
96
+     {"id": "ent_def456", "type": "task", "name": "deploy_v1", "metadata": {"version": "1.0"}}
97
+   ],
98
+   "relations": [
99
+     {"from": "ent_abc123", "to": "ent_def456", "type": "created", "metadata": {}}
100
+   ],
101
+   "tape_id": "527c9ae0c6f31e05__0b871d5e50e7c192",
102
+   "anchor_id": "anchor_001",
103
+   "created_at": "2026-06-06T09:35:00Z"
104
+ }
105
+ ```
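Appending and reading the JSONL file needs only the standard library. This is a synchronous sketch with hypothetical helper names; the plugin depends on `aiofiles`, so its own `store.py` is probably asynchronous.

```python
import json
from pathlib import Path


def append_snapshot(path: Path, snapshot: dict) -> None:
    """Append one snapshot as a single JSON line (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # Compact separators keep one record per line; ensure_ascii=False keeps names readable
        fh.write(json.dumps(snapshot, ensure_ascii=False, separators=(",", ":")) + "\n")


def load_snapshots(path: Path) -> list[dict]:
    """Read every snapshot back, skipping blank or undecodable lines."""
    if not path.exists():
        return []
    snapshots: list[dict] = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                snapshots.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # e.g. a last line truncated by an interrupted write
    return snapshots
```

Skipping undecodable lines means a write interrupted mid-line costs one snapshot rather than the whole tape's memory.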
106
+
107
+ ## Configuration
108
+
109
+ The plugin **reuses your main LLM settings** (`BUB_MODEL`, `BUB_API_KEY`, etc.):
110
+
111
+ ```bash
112
+ # Your existing setup (e.g., DeepSeek)
113
+ export BUB_MODEL=deepseek:deepseek-chat
114
+ export BUB_API_KEY=sk-...
115
+ uv run bub chat
116
+ ```
117
+
118
+ No separate `BUB_SEMANTIC_*` variables needed. Semantic extraction uses the same model as your agent.
119
+
120
+ ## Testing
121
+
122
+ Run the test suite:
123
+
124
+ ```bash
125
+ uv run pytest tests/plugins/semantic_memory/test_semantic_memory.py -v
126
+ ```
127
+
128
+ **Coverage**: 43 tests across unit and integration scenarios:
129
+ - Entity/Relation serialization
130
+ - JSONL storage I/O
131
+ - LLM extraction with mocks
132
+ - Context building
133
+ - Multi-turn memory retention
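The suite's mocked extraction tests are not reproduced here. A common way to mock an async client is `unittest.mock.AsyncMock`; the pytest-style sketch below assumes only that the client exposes an awaitable `chat_async(...)`, as extractor.py calls it. `extract_names` is a toy stand-in for `extract_semantics`, not the plugin's function.

```python
import asyncio
import json
from unittest.mock import AsyncMock, MagicMock


async def extract_names(llm) -> list[str]:
    """Toy stand-in for extract_semantics: entity names, or [] on any failure."""
    try:
        raw = await llm.chat_async(prompt="Alice is a data scientist.", system_prompt="...", max_tokens=1000)
        data = json.loads(raw)
    except Exception:
        return []  # mirrors the "empty snapshot instead of raising" rule
    if not isinstance(data, dict):
        return []
    return [str(e.get("name", "")) for e in data.get("entities", []) if isinstance(e, dict)]


def test_extracts_entities_from_mocked_reply() -> None:
    llm = MagicMock()
    llm.chat_async = AsyncMock(return_value=json.dumps(
        {"entities": [{"id": "alice", "type": "person", "name": "Alice"}], "relations": []}
    ))
    assert asyncio.run(extract_names(llm)) == ["Alice"]
    llm.chat_async.assert_awaited_once()


def test_llm_error_degrades_to_empty() -> None:
    llm = MagicMock()
    llm.chat_async = AsyncMock(side_effect=RuntimeError("provider unavailable"))
    assert asyncio.run(extract_names(llm)) == []


def test_invalid_json_degrades_to_empty() -> None:
    llm = MagicMock()
    llm.chat_async = AsyncMock(return_value="Sure, here are the entities: Alice")
    assert asyncio.run(extract_names(llm)) == []
```

pytest collects the `test_*` functions automatically; they also run as plain calls.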
134
+
135
+ ## Usage Examples
136
+
137
+ ### Example 1: CLI Multi-Turn
138
+
139
+ ```bash
140
+ $ uv run bub chat
141
+ bub > Alice is a data scientist.
142
+ Agent > Got it.
143
+
144
+ bub > What is Alice's profession?
145
+ Agent > Alice is a data scientist. (retrieved from semantic memory)
146
+
147
+ bub > ,tape.info
148
+ [Shows: 2 entries, 1 anchor, ... semantic snapshots: 2]
149
+ ```
150
+
151
+ ### Example 2: Telegram
152
+
153
+ ```
154
+ You: "I need to fix a critical bug in the payment module"
155
+ Bot: [Uses semantic memory to track bug, module]
156
+
157
+ You: "What was I working on?"
158
+ Bot: [Recalls semantic memory: bug:critical_payment, module:payment]
159
+ ```
160
+
161
+ ### Example 3: Inspect Semantic Store
162
+
163
+ ```bash
164
+ $ python -m json.tool --json-lines ~/.bub/tapes/semantic/527c9ae0c6f31e05__0b871d5e50e7c192.jsonl
165
+ [Shows stored entities and relations]
166
+ ```
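For a quick summary instead of raw JSON, a small stdlib script works. Since context.py persists a snapshot on every call, even when nothing was extracted, counting empty snapshots shows how often extraction came back empty. `summarize_tape` is a hypothetical helper and assumes the field names from the storage format example.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_tape(path: Path) -> dict[str, object]:
    """Summarize one semantic JSONL file (hypothetical helper)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    snapshots = [json.loads(line) for line in lines if line.strip()]
    # Count entity types across all snapshots (duplicates across turns included)
    types = Counter(e["type"] for s in snapshots for e in s.get("entities", []))
    return {
        "snapshots": len(snapshots),
        "empty_snapshots": sum(1 for s in snapshots if not s.get("entities") and not s.get("relations")),
        "entity_types": dict(types),
        "relations": sum(len(s.get("relations", [])) for s in snapshots),
    }
```

For example, `summarize_tape(Path.home() / ".bub/tapes/semantic/<tape_id>.jsonl")`, with `<tape_id>` replaced by a real tape id.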
167
+
168
+ ## Performance & Cost
169
+
170
+ ### Token Usage
171
+ - Each extraction call: ~300-500 tokens (depends on entry volume)
172
+ - Estimated overhead: **+10-20%** per turn (configurable via extraction prompt)
173
+
174
+ ### Storage
175
+ - JSONL format: ~1-2 KB per snapshot (grows with entities/relations)
176
+ - Typical session: ~50-100 KB
177
+
178
+ ### Latency
179
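+ - Extraction is an async LLM call, but the context builder awaits it, so each turn waits for extraction to finish
180
+ - Turns that run extraction: ~500ms extra (one additional LLM round trip)
181
+ - Loading and formatting stored snapshots: ~50ms extra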
182
+
183
+ ## Graceful Degradation
184
+
185
+ If semantic extraction fails for any reason:
186
+ - LLM error: Returns empty snapshot, continues
187
+ - Invalid JSON: Logged as warning, continues
188
+ - Storage error: Logged, continues with base context
189
+
190
+ The agent **always** keeps working: semantic memory is an optional enhancement.
191
+
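The degradation rule amounts to "enrich if possible, otherwise fall back to the base context". A hedged sketch of that pattern as a wrapper; `with_fallback` is hypothetical, and where the plugin actually catches storage errors is not visible in this README.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable
from typing import Any

logger = logging.getLogger("semantic_memory_sketch")

Messages = list[dict[str, Any]]


async def with_fallback(enrich: Callable[[], Awaitable[Messages]], base: Messages) -> Messages:
    """Return the enriched context, or the base context if enrichment fails.

    Hypothetical wrapper illustrating the rule above, not the plugin's code.
    """
    try:
        return await enrich()
    except Exception:  # storage errors, unexpected bugs, ...
        logger.warning("semantic memory unavailable; using base context", exc_info=True)
        return base
```

Catching `Exception` rather than `BaseException` lets `asyncio.CancelledError` propagate, so cancelling a turn still works.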
192
+ ## Future Enhancements
193
+
194
+ ### Phase 2: Smart Retrieval
195
+ - Vector embeddings for semantic similarity search
196
+ - Retrieval-augmented context injection (only include relevant entities)
197
+ - Reduces prompt bloat for long sessions
198
+
199
+ ### Phase 3: Advanced Graphs
200
+ - Entity dependency analysis (who depends on what)
201
+ - Centrality metrics (who/what is most important)
202
+ - Causal reasoning (what led to what)
203
+
204
+ ### Phase 4: Multi-Session Memory
205
+ - Cross-session entity resolution
206
+ - Long-term memory across multiple conversations
207
+ - Persistent entity graph (not just per-tape)
208
+
209
+ ## Troubleshooting
210
+
211
+ **Q: Plugin not loading?**
212
+ A: Check that the entry point is registered:
213
+ ```bash
214
+ python -c "import importlib.metadata; print(list(importlib.metadata.entry_points(group='bub')))"
215
+ ```
216
+
217
+ **Q: Semantic snapshots not appearing?**
218
+ A: Check that the `~/.bub/tapes/semantic/` directory exists, then check the logs with `BUB_VERBOSE=1`.
219
+
220
+ **Q: LLM calls are expensive?**
221
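+ A: Reduce extraction frequency, or use a cheaper model (e.g., a DeepSeek distilled model). Because extraction reuses `BUB_MODEL`, a cheaper model currently means switching the agent's model too; future releases will support model selection per plugin.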
222
+
223
+ ## API Reference
224
+
225
+ ### `async build_semantic_context(entries, context, llm, store) → list[dict]`
226
+
227
+ Build context with semantic memory. This coroutine is called by the framework automatically.
228
+
229
+ **Args:**
230
+ - `entries`: Iterable of TapeEntry objects
231
+ - `context`: TapeContext instance
232
+ - `llm`: republic.LLM instance used for extraction (required)
233
+ - `store`: SemanticStore instance used to append and load snapshots (required)
234
+
235
+ **Returns:** List of message dicts ready for model input
236
+
237
+ ### `async extract_semantics(entries, llm, *, tape_id="", anchor_id="", max_tokens=1000) → SemanticSnapshot`
238
+
239
+ Extract entities and relations from tape entries. `tape_id`, `anchor_id`, and `max_tokens` are keyword-only.
240
+
241
+ **Args:**
242
+ - `entries`: List of TapeEntry objects
243
+ - `llm`: republic.LLM instance for extraction
244
+ - `tape_id`: Session/tape identifier
245
+ - `anchor_id`: Anchor point identifier (defaults to an empty string)
246
+ - `max_tokens`: Max tokens for LLM response
247
+
248
+ **Returns:** SemanticSnapshot with extracted entities/relations; an empty snapshot is returned instead of raising if the LLM call, JSON parsing, or snapshot construction fails
249
+
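The model is asked for short slugs (`"alice"`), and `_build_snapshot` in extractor.py maps each slug to the id of a newly created `Entity`, skipping relations whose endpoints are unknown slugs. A simplified sketch of that resolution step; the `Entity` stand-in and `resolve` are hypothetical, and the random id format is an assumption because models.py is not shown.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    type: str
    name: str
    # Assumption: a fresh random id per instance; models.py may differ
    id: str = field(default_factory=lambda: f"ent_{uuid.uuid4().hex[:6]}")


def resolve(entities_raw: list[dict], relations_raw: list[dict]) -> tuple[list[Entity], list[tuple[str, str, str]]]:
    """Map LLM slugs to entity ids and drop relations that point at unknown slugs."""
    slug_to_id: dict[str, str] = {}
    entities: list[Entity] = []
    for item in entities_raw:
        entity = Entity(type=str(item.get("type", "concept")), name=str(item.get("name", item.get("id", ""))))
        entities.append(entity)
        if item.get("id"):
            slug_to_id[str(item["id"])] = entity.id
    relations: list[tuple[str, str, str]] = []
    for item in relations_raw:
        src = slug_to_id.get(str(item.get("from", "")))
        dst = slug_to_id.get(str(item.get("to", "")))
        if src and dst:  # both endpoints must be entities from the same reply
            relations.append((src, str(item.get("type", "related_to")), dst))
    return entities, relations
```

A relation such as `alice --mentions--> bob` is dropped when the reply never declared a `bob` entity.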
250
+ ## Contributing
251
+
252
+ This plugin is part of Bub's extensibility model. To extend:
253
+
254
+ 1. **Custom entity types**: Modify the Entity.type enum in models.py and the allowed types listed in the extraction prompt in extractor.py
255
+ 2. **Custom extractors**: Replace or wrap extractor.py
256
+ 3. **Custom storage**: Implement SemanticStore interface
257
+ 4. **Custom formatters**: Replace _format_snapshots in context.py
258
+
259
+ All without modifying Bub core.
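The `SemanticStore` interface is not spelled out here. Judging from context.py, which calls `await store.append(tape_id, snapshot)` and `await store.load(tape_id)`, a custom backend needs at least those two coroutines. A hedged sketch as a `typing.Protocol`; `SnapshotStore` and `InMemorySnapshotStore` are hypothetical names.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SnapshotStore(Protocol):
    """Hypothetical protocol inferred from how context.py calls the store."""

    async def append(self, tape_id: str, snapshot: Any) -> None: ...

    async def load(self, tape_id: str) -> list[Any]: ...


class InMemorySnapshotStore:
    """Example custom backend that keeps snapshots in a dict, e.g. for tests."""

    def __init__(self) -> None:
        self._data: dict[str, list[Any]] = {}

    async def append(self, tape_id: str, snapshot: Any) -> None:
        # No await between read and write, so no lock is needed within one event loop
        self._data.setdefault(tape_id, []).append(snapshot)

    async def load(self, tape_id: str) -> list[Any]:
        return list(self._data.get(tape_id, []))  # copy so callers cannot mutate state
```

`runtime_checkable` only verifies that the methods exist, not their signatures, so a type checker is still the better guard.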
260
+
261
+ ## License
262
+
263
+ Same as Bub (Apache 2.0)
264
+
265
+ ---
266
+
267
+ **Questions?** See [Bub documentation](https://bub.build) or open an issue.
@@ -0,0 +1,15 @@
1
+ src/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
+ src/bub/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
3
+ src/bub/plugins/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
4
+ src/bub/plugins/semantic_memory/README.md,sha256=fd5pv1jQYCGZ8v_QA7sxb8UUkrJWqORBp_Zfdn-W-U4,7318
5
+ src/bub/plugins/semantic_memory/__init__.py,sha256=952JzuGJLuw6OFU6Vkmdjen3IMLizFB7ox0nast4v1E,146
6
+ src/bub/plugins/semantic_memory/context.py,sha256=56TzsJlOFJsTjDJD7RdQI9tVl0jYAyTaILrYaRiqDj8,4000
7
+ src/bub/plugins/semantic_memory/extractor.py,sha256=1P2NdaDFrRzF7DCEsh05UB0mbe1INJlao_tWVtGYelU,7845
8
+ src/bub/plugins/semantic_memory/hook_impl.py,sha256=iHJyZ3i-i7dCdlxGciKMWhM7iNdiRTtjKRGPMg9gDqw,5196
9
+ src/bub/plugins/semantic_memory/models.py,sha256=d3CoylcYuyqtKL8P9PJ_xgKJ-XiLiQqd2S0mixYShas,3491
10
+ src/bub/plugins/semantic_memory/store.py,sha256=ALux1Mzb_2Bqk6jHPAJbOspC_aFXF0L58JmZ-sONyXk,1481
11
+ src/bub/plugins/semantic_memory/types.py,sha256=IcoGmD3UFTo6jVd-lfzjuz29NsgYT7LbjZm4TMhWmjs,187
12
+ bub_semantic_memory-0.1.0.dist-info/METADATA,sha256=7cMMdEevZX3x4oOTz8C4pW9kN8P4d6OI-t7EuYj0J9I,7705
13
+ bub_semantic_memory-0.1.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
14
+ bub_semantic_memory-0.1.0.dist-info/entry_points.txt,sha256=qGOhhdgXLJJofygtiR5ffbhctU4HAFE7m3YhZkjhnio,83
15
+ bub_semantic_memory-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.30.1
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,2 @@
1
+ [bub]
2
+ semantic_memory = bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin
src/__init__.py ADDED
File without changes
src/bub/__init__.py ADDED
File without changes
File without changes
@@ -0,0 +1,253 @@
1
+ # Semantic Memory Plugin for Bub
2
+
3
+ A plugin that extracts and retains semantic entities and relations from conversation histories, enriching agent context with semantic memory.
4
+
5
+ ## Overview
6
+
7
+ This plugin intercepts the tape context building process to:
8
+ 1. **Extract semantics** from conversation entries using an LLM
9
+ 2. **Store snapshots** of entities (people, tasks, concepts) and relations between them
10
+ 3. **Inject memory** into subsequent agent prompts, enabling long-context awareness
11
+
12
+ The plugin follows Bub's philosophy: it's completely optional, zero-config after installation, and hooks into the existing `build_tape_context` architecture without modifying core.
13
+
14
+ ## Installation
15
+
16
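+ Install the package into the same environment as Bub (for example, `pip install bub-semantic-memory`). The plugin registers itself through an entry point declared in its `pyproject.toml`: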
17
+
18
+ ```toml
19
+ [project.entry-points."bub"]
20
+ semantic_memory = "bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin"
21
+ ```
22
+
23
+ Bub's framework automatically loads and instantiates it on startup. No additional setup required.
24
+
25
+ ## How It Works
26
+
27
+ ### Per-Turn Flow
28
+
29
+ 1. **Input**: The agent receives a new message and the tape entries are loaded
30
+ 2. **Extract**: LLM analyzes entries and identifies:
31
+    - **Entities**: people, tasks, events, concepts
32
+    - **Relations**: created, depends_on, mentions, etc.
33
+ 3. **Store**: SemanticSnapshot is appended to `~/.bub/tapes/semantic/{tape_id}.jsonl`
34
+ 4. **Load**: All historical snapshots for this tape are loaded
35
+ 5. **Inject**: Semantic memory is formatted as a system prompt block and appended to the end of the context as an extra system message
36
+ 6. **Output**: Agent receives enriched context with semantic awareness
37
+
38
+ ### Example
39
+
40
+ Given this conversation:
41
+ ```
42
+ User: "Alice created a task to deploy v1.0"
43
+ Agent: [responds]
44
+ User: "What did Alice do?"
45
+ ```
46
+
47
+ On the second turn, the semantic memory block is appended after the base context messages as an extra system message (entity ids come from the stored snapshots):
48
+
49
+ ```
50
+ [base context messages]
51
+
52
+ ## Semantic Memory
53
+
54
+ ### Entities (2):
55
+ - person:Alice (id=ent_abc123)
56
+ - task:deploy_v1 (id=ent_def456)
57
+
58
+ ### Relations (1):
59
+ - Alice --created--> deploy_v1
60
+ ```
63
+
64
+ ## Architecture
65
+
66
+ ### Core Modules
67
+
68
+ - **`models.py`**: Pydantic dataclasses for Entity, Relation, SemanticSnapshot
69
+ - **`extractor.py`**: LLM-based extraction from tape entries
70
+ - **`store.py`**: JSONL file storage at `~/.bub/tapes/semantic/`
71
+ - **`context.py`**: Formatting snapshots into system prompts
72
+ - **`hook_impl.py`**: Bub hookimpl that wires everything together
73
+
74
+ ### Storage Format
75
+
76
+ Snapshots are stored as JSONL: each snapshot is serialized as one JSON object on a single line. The example below is pretty-printed for readability:
77
+
78
+ ```json
79
+ {
80
+   "entities": [
81
+     {"id": "ent_abc123", "type": "person", "name": "Alice", "metadata": {}},
82
+     {"id": "ent_def456", "type": "task", "name": "deploy_v1", "metadata": {"version": "1.0"}}
83
+   ],
84
+   "relations": [
85
+     {"from": "ent_abc123", "to": "ent_def456", "type": "created", "metadata": {}}
86
+   ],
87
+   "tape_id": "527c9ae0c6f31e05__0b871d5e50e7c192",
88
+   "anchor_id": "anchor_001",
89
+   "created_at": "2026-06-06T09:35:00Z"
90
+ }
91
+ ```
92
+
93
+ ## Configuration
94
+
95
+ The plugin **reuses your main LLM settings** (`BUB_MODEL`, `BUB_API_KEY`, etc.):
96
+
97
+ ```bash
98
+ # Your existing setup (e.g., DeepSeek)
99
+ export BUB_MODEL=deepseek:deepseek-chat
100
+ export BUB_API_KEY=sk-...
101
+ uv run bub chat
102
+ ```
103
+
104
+ No separate `BUB_SEMANTIC_*` variables needed. Semantic extraction uses the same model as your agent.
105
+
106
+ ## Testing
107
+
108
+ Run the test suite:
109
+
110
+ ```bash
111
+ uv run pytest tests/plugins/semantic_memory/test_semantic_memory.py -v
112
+ ```
113
+
114
+ **Coverage**: 43 tests across unit and integration scenarios:
115
+ - Entity/Relation serialization
116
+ - JSONL storage I/O
117
+ - LLM extraction with mocks
118
+ - Context building
119
+ - Multi-turn memory retention
120
+
121
+ ## Usage Examples
122
+
123
+ ### Example 1: CLI Multi-Turn
124
+
125
+ ```bash
126
+ $ uv run bub chat
127
+ bub > Alice is a data scientist.
128
+ Agent > Got it.
129
+
130
+ bub > What is Alice's profession?
131
+ Agent > Alice is a data scientist. (retrieved from semantic memory)
132
+
133
+ bub > ,tape.info
134
+ [Shows: 2 entries, 1 anchor, ... semantic snapshots: 2]
135
+ ```
136
+
137
+ ### Example 2: Telegram
138
+
139
+ ```
140
+ You: "I need to fix a critical bug in the payment module"
141
+ Bot: [Uses semantic memory to track bug, module]
142
+
143
+ You: "What was I working on?"
144
+ Bot: [Recalls semantic memory: bug:critical_payment, module:payment]
145
+ ```
146
+
147
+ ### Example 3: Inspect Semantic Store
148
+
149
+ ```bash
150
+ $ python -m json.tool --json-lines ~/.bub/tapes/semantic/527c9ae0c6f31e05__0b871d5e50e7c192.jsonl
151
+ [Shows stored entities and relations]
152
+ ```
153
+
154
+ ## Performance & Cost
155
+
156
+ ### Token Usage
157
+ - Each extraction call: ~300-500 tokens (depends on entry volume)
158
+ - Estimated overhead: **+10-20%** per turn (configurable via extraction prompt)
159
+
160
+ ### Storage
161
+ - JSONL format: ~1-2 KB per snapshot (grows with entities/relations)
162
+ - Typical session: ~50-100 KB
163
+
164
+ ### Latency
165
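+ - Extraction is an async LLM call, but the context builder awaits it, so each turn waits for extraction to finish
166
+ - Turns that run extraction: ~500ms extra (one additional LLM round trip)
167
+ - Loading and formatting stored snapshots: ~50ms extra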
168
+
169
+ ## Graceful Degradation
170
+
171
+ If semantic extraction fails for any reason:
172
+ - LLM error: Returns empty snapshot, continues
173
+ - Invalid JSON: Logged as warning, continues
174
+ - Storage error: Logged, continues with base context
175
+
176
+ The agent **always** keeps working: semantic memory is an optional enhancement.
177
+
178
+ ## Future Enhancements
179
+
180
+ ### Phase 2: Smart Retrieval
181
+ - Vector embeddings for semantic similarity search
182
+ - Retrieval-augmented context injection (only include relevant entities)
183
+ - Reduces prompt bloat for long sessions
184
+
185
+ ### Phase 3: Advanced Graphs
186
+ - Entity dependency analysis (who depends on what)
187
+ - Centrality metrics (who/what is most important)
188
+ - Causal reasoning (what led to what)
189
+
190
+ ### Phase 4: Multi-Session Memory
191
+ - Cross-session entity resolution
192
+ - Long-term memory across multiple conversations
193
+ - Persistent entity graph (not just per-tape)
194
+
195
+ ## Troubleshooting
196
+
197
+ **Q: Plugin not loading?**
198
+ A: Check that the entry point is registered:
199
+ ```bash
200
+ python -c "import importlib.metadata; print(list(importlib.metadata.entry_points(group='bub')))"
201
+ ```
202
+
203
+ **Q: Semantic snapshots not appearing?**
204
+ A: Check that the `~/.bub/tapes/semantic/` directory exists, then check the logs with `BUB_VERBOSE=1`.
205
+
206
+ **Q: LLM calls are expensive?**
207
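+ A: Reduce extraction frequency, or use a cheaper model (e.g., a DeepSeek distilled model). Because extraction reuses `BUB_MODEL`, a cheaper model currently means switching the agent's model too; future releases will support model selection per plugin.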
208
+
209
+ ## API Reference
210
+
211
+ ### `async build_semantic_context(entries, context, llm, store) → list[dict]`
212
+
213
+ Build context with semantic memory. This coroutine is called by the framework automatically.
214
+
215
+ **Args:**
216
+ - `entries`: Iterable of TapeEntry objects
217
+ - `context`: TapeContext instance
218
+ - `llm`: republic.LLM instance used for extraction (required)
219
+ - `store`: SemanticStore instance used to append and load snapshots (required)
220
+
221
+ **Returns:** List of message dicts ready for model input
222
+
223
+ ### `async extract_semantics(entries, llm, *, tape_id="", anchor_id="", max_tokens=1000) → SemanticSnapshot`
224
+
225
+ Extract entities and relations from tape entries. `tape_id`, `anchor_id`, and `max_tokens` are keyword-only.
226
+
227
+ **Args:**
228
+ - `entries`: List of TapeEntry objects
229
+ - `llm`: republic.LLM instance for extraction
230
+ - `tape_id`: Session/tape identifier
231
+ - `anchor_id`: Anchor point identifier (defaults to an empty string)
232
+ - `max_tokens`: Max tokens for LLM response
233
+
234
+ **Returns:** SemanticSnapshot with extracted entities/relations; an empty snapshot is returned instead of raising if the LLM call, JSON parsing, or snapshot construction fails
235
+
236
+ ## Contributing
237
+
238
+ This plugin is part of Bub's extensibility model. To extend:
239
+
240
+ 1. **Custom entity types**: Modify the Entity.type enum in models.py and the allowed types listed in the extraction prompt in extractor.py
241
+ 2. **Custom extractors**: Replace or wrap extractor.py
242
+ 3. **Custom storage**: Implement SemanticStore interface
243
+ 4. **Custom formatters**: Replace _format_snapshots in context.py
244
+
245
+ All without modifying Bub core.
246
+
247
+ ## License
248
+
249
+ Same as Bub (Apache 2.0)
250
+
251
+ ---
252
+
253
+ **Questions?** See [Bub documentation](https://bub.build) or open an issue.
@@ -0,0 +1,5 @@
1
+ """Semantic memory plugin for Bub."""
2
+
3
+ from bub.plugins.semantic_memory.hook_impl import SemanticMemoryPlugin
4
+
5
+ __all__ = ["SemanticMemoryPlugin"]
@@ -0,0 +1,113 @@
1
+ """Semantic memory context builder for the semantic_memory plugin."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from collections.abc import Iterable
6
+ from typing import Any
7
+
8
+ from republic import LLM, TapeContext, TapeEntry
9
+
10
+ import bub.builtin.context as _builtin_context
11
+ from bub.plugins.semantic_memory import extractor
12
+ from bub.plugins.semantic_memory.models import SemanticSnapshot
13
+ from bub.plugins.semantic_memory.store import SemanticStore
14
+
15
+
16
+ async def build_semantic_context(
17
+     entries: Iterable[TapeEntry],
18
+     context: TapeContext,
19
+     llm: LLM,
20
+     store: SemanticStore,
21
+ ) -> list[dict[str, Any]]:
22
+     """Build a message list enriched with semantic memory.
23
+
24
+     Steps:
25
+     1. Collect base messages via the default tape context selector.
26
+     2. Extract a SemanticSnapshot from the current entries using the LLM.
27
+     3. Append the snapshot to the persistent store.
28
+     4. Load all historical snapshots for this tape.
29
+     5. Format snapshots into a system prompt block and append it to messages.
30
+     6. Return the complete message list.
31
+
32
+     If no snapshots exist after loading, the function returns only the base
33
+     messages without adding an empty semantic block.
34
+     """
35
+     entries_list = list(entries)
36
+
37
+     # Step 1: Build base messages using the default selector
38
+     messages: list[dict[str, Any]] = _builtin_context._select_messages(entries_list, context)
39
+
40
+     # Determine tape_id from context state (session_id is stored there at runtime)
41
+     tape_id: str = str(context.state.get("session_id", "")) or "default"
42
+
43
+     # Determine anchor_id from the last entry id (or empty string)
44
+     anchor_id: str = str(entries_list[-1].id) if entries_list else ""
45
+
46
+     # Step 2: Extract semantics from current entries
47
+     snapshot: SemanticSnapshot = await extractor.extract_semantics(
48
+         entries_list,
49
+         llm,
50
+         tape_id=tape_id,
51
+         anchor_id=anchor_id,
52
+     )
53
+
54
+     # Step 3: Append snapshot to store (always persist, even if empty)
55
+     await store.append(tape_id, snapshot)
56
+
57
+     # Step 4: Load all historical snapshots for this tape
58
+     snapshots: list[SemanticSnapshot] = await store.load(tape_id)
59
+
60
+     if not snapshots:
61
+         return messages
62
+
63
+     # Step 5: Format snapshots into a Markdown system prompt block
64
+     system_content = _format_snapshots(snapshots)
65
+
66
+     # Step 6: Append semantic memory block to messages
67
+     messages.append({"role": "system", "content": system_content})
68
+
69
+     return messages
70
+
71
+
72
+ def _format_snapshots(snapshots: list[SemanticSnapshot]) -> str:
73
+     """Render a list of SemanticSnapshot objects as a Markdown system prompt."""
74
+     # Deduplicate entities and relations across all snapshots
75
+     seen_entity_ids: set[str] = set()
76
+     seen_relation_keys: set[tuple[str, str, str]] = set()
77
+
78
+     all_entities = []
79
+     all_relations = []
80
+
81
+     # Build an id->name lookup for relation rendering
82
+     id_to_name: dict[str, str] = {}
83
+
84
+     for snap in snapshots:
85
+         for entity in snap.entities:
86
+             id_to_name[entity.id] = entity.name
87
+             if entity.id not in seen_entity_ids:
88
+                 seen_entity_ids.add(entity.id)
89
+                 all_entities.append(entity)
90
+
91
+         for relation in snap.relations:
92
+             key = (relation.from_id, relation.to_id, relation.type)
93
+             if key not in seen_relation_keys:
94
+                 seen_relation_keys.add(key)
95
+                 all_relations.append(relation)
96
+
97
+     entity_count = len(all_entities)
98
+     relation_count = len(all_relations)
99
+
100
+     lines: list[str] = ["## Semantic Memory", ""]
101
+
102
+     lines.append(f"### Entities ({entity_count}):")
103
+     for entity in all_entities:
104
+         lines.append(f"- {entity.type}:{entity.name} (id={entity.id})")
105
+
106
+     lines.append("")
107
+     lines.append(f"### Relations ({relation_count}):")
108
+     for relation in all_relations:
109
+         from_name = id_to_name.get(relation.from_id, relation.from_id)
110
+         to_name = id_to_name.get(relation.to_id, relation.to_id)
111
+         lines.append(f"- {from_name} --{relation.type}--> {to_name}")
112
+
113
+     return "\n".join(lines)
@@ -0,0 +1,233 @@
1
+ """
2
+ Semantic extraction for the semantic_memory plugin.
3
+
4
+ Provides extract_semantics(), which takes a list of TapeEntry objects and an LLM
5
+ instance, calls the LLM to identify entities and relations in the conversation, and
6
+ returns the result as a SemanticSnapshot.
7
+ """
8
+
9
+ from __future__ import annotations
10
+
11
+ import json
12
+ import logging
13
+ from typing import Any
14
+
15
+ from republic import LLM
16
+ from republic.tape.entries import TapeEntry
17
+
18
+ from bub.plugins.semantic_memory.models import Entity, Relation, SemanticSnapshot
19
+
20
+ logger = logging.getLogger(__name__)
21
+
22
+ _SYSTEM_PROMPT = """\
23
+ You are a semantic extraction assistant. Given a snippet of conversation history, \
24
+ identify the key entities and the relations between them.
25
+
26
+ Respond ONLY with a single valid JSON object — no markdown fences, no prose. \
27
+ The JSON must conform exactly to this schema:
28
+
29
+ {
30
+   "entities": [
31
+     {"id": "<short_stable_slug>", "type": "person|task|event|concept", "name": "<human name>", "metadata": {}}
32
+   ],
33
+   "relations": [
34
+     {"from": "<entity_id>", "to": "<entity_id>", "type": "<relation_label>", "metadata": {}}
35
+   ]
36
+ }
37
+
38
+ Guidelines:
39
+ - Use concise, lowercase slugs for entity ids (e.g. "alice", "deploy_task").
40
+ - Entity types must be one of: person, task, event, concept.
41
+ - Relation types are free-form verbs/labels (e.g. "creates", "depends_on", "mentions").
42
+ - Only include entities and relations that are clearly present or implied in the text.
43
+ - If nothing meaningful can be extracted, return {"entities": [], "relations": []}.
44
+ """
45
+
46
+
47
+ def _entries_to_text(entries: list[TapeEntry]) -> str:
48
+     """Convert a list of TapeEntry objects into a compact human-readable string."""
49
+     lines: list[str] = []
50
+     for entry in entries:
51
+         kind = entry.kind
52
+         payload = entry.payload
53
+
54
+         if kind == "message":
55
+             role = payload.get("role", "unknown")
56
+             content = payload.get("content", "")
57
+             if isinstance(content, list):
58
+                 # Multimodal content: extract text parts only
59
+                 content = " ".join(
60
+                     part.get("text", "") for part in content if isinstance(part, dict) and part.get("type") == "text"
61
+                 )
62
+             lines.append(f"[{role}] {content}")
63
+
64
+         elif kind == "system":
65
+             content = payload.get("content", "")
66
+             lines.append(f"[system] {content}")
67
+
68
+         elif kind == "tool_call":
69
+             for call in payload.get("calls", []):
70
+                 name = call.get("function", {}).get("name") or call.get("name", "unknown_tool")
71
+                 lines.append(f"[tool_call] {name}")
72
+
73
+         elif kind == "tool_result":
74
+             for result in payload.get("results", []):
75
+                 if isinstance(result, dict):
76
+                     text = result.get("content", result.get("output", str(result)))
77
+                 else:
78
+                     text = str(result)
79
+                 lines.append(f"[tool_result] {text}")
80
+
81
+         elif kind == "event":
82
+             event_name = payload.get("name", "")
83
+             lines.append(f"[event] {event_name}")
84
+
85
+         elif kind == "anchor":
86
+             anchor_name = payload.get("name", "")
87
+             lines.append(f"[anchor] {anchor_name}")
88
+
89
+         # Skip other internal entry types silently
90
+
91
+     return "\n".join(lines)
92
+
93
+
94
+ def _parse_llm_response(raw: str) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
95
+     """Parse raw LLM output into (entities_list, relations_list).
96
+
97
+     Returns empty lists on any parse error.
98
+     """
99
+     try:
100
+         data = json.loads(raw)
101
+     except json.JSONDecodeError as exc:
102
+         logger.warning("semantic_memory: failed to parse LLM JSON response: %s", exc)
103
+         return [], []
104
+
105
+     if not isinstance(data, dict):
106
+         logger.warning("semantic_memory: LLM response is not a JSON object")
107
+         return [], []
108
+
109
+     entities = data.get("entities", [])
110
+     relations = data.get("relations", [])
111
+
112
+     if not isinstance(entities, list) or not isinstance(relations, list):
113
+         logger.warning("semantic_memory: 'entities' or 'relations' is not a list")
114
+         return [], []
115
+
116
+     return entities, relations
117
+
118
+
119
+ def _build_snapshot(
120
+     entities_raw: list[dict[str, Any]],
121
+     relations_raw: list[dict[str, Any]],
122
+     tape_id: str,
123
+     anchor_id: str,
124
+ ) -> SemanticSnapshot:
125
+     """Convert raw dicts from the LLM into a SemanticSnapshot."""
126
+     # Build a mapping from LLM slug -> UUID so relations can reference entities
127
+     slug_to_uuid: dict[str, str] = {}
128
+     entities: list[Entity] = []
129
+
130
+     for item in entities_raw:
131
+         if not isinstance(item, dict):
132
+             continue
133
+         slug = str(item.get("id", ""))
134
+         entity_type = str(item.get("type", "concept"))
135
+         name = str(item.get("name", slug))
136
+         metadata = item.get("metadata", {}) or {}
137
+         if not isinstance(metadata, dict):
138
+             metadata = {}
139
+
140
+         entity = Entity(type=entity_type, name=name, metadata=metadata)
141
+         entities.append(entity)
142
+         if slug:
143
+             slug_to_uuid[slug] = entity.id
144
+
145
+     relations: list[Relation] = []
146
+     for item in relations_raw:
147
+         if not isinstance(item, dict):
148
+             continue
149
+         from_slug = str(item.get("from", ""))
150
+         to_slug = str(item.get("to", ""))
151
+         relation_type = str(item.get("type", "related_to"))
152
+         metadata = item.get("metadata", {}) or {}
153
+         if not isinstance(metadata, dict):
154
+             metadata = {}
155
+
156
+         from_id = slug_to_uuid.get(from_slug)
157
+         to_id = slug_to_uuid.get(to_slug)
158
+         if not from_id or not to_id:
159
+             logger.debug(
160
+                 "semantic_memory: skipping relation '%s' -> '%s' (unknown slug)",
161
+                 from_slug,
162
+                 to_slug,
163
+             )
164
+             continue
165
+
166
+         relations.append(Relation(from_id=from_id, to_id=to_id, type=relation_type, metadata=metadata))
167
+
168
+     return SemanticSnapshot(
169
+         entities=tuple(entities),
170
+         relations=tuple(relations),
171
+         tape_id=tape_id,
172
+         anchor_id=anchor_id,
173
+     )
174
+
175
+
176
+ async def extract_semantics(
177
+     entries: list[TapeEntry],
178
+     llm: LLM,
179
+     *,
180
+     tape_id: str = "",
181
+     anchor_id: str = "",
182
+     max_tokens: int = 1000,
183
+ ) -> SemanticSnapshot:
184
+     """Extract semantic entities and relations from a list of TapeEntry objects.
185
+
186
+     Calls the LLM to analyse the conversation represented by *entries* and returns
187
+     a SemanticSnapshot. On any failure (LLM error, invalid JSON, …) an empty
188
+     snapshot is returned rather than raising.
189
+
190
+     Args:
191
+         entries: The tape entries to analyse.
192
+         llm: A republic.LLM instance used to call the model.
193
+         tape_id: Identifier for the tape these entries belong to.
194
+         anchor_id: Identifier of the tape anchor this snapshot is tied to.
195
+         max_tokens: Maximum tokens for the LLM response.
196
+
197
+     Returns:
198
+         A SemanticSnapshot (possibly empty on failure).
199
+     """
200
+     empty_snapshot = SemanticSnapshot(
201
+         entities=(),
202
+         relations=(),
203
+         tape_id=tape_id,
204
+         anchor_id=anchor_id,
205
+     )
206
+
207
+     if not entries:
208
+         return empty_snapshot
209
+
210
+     conversation_text = _entries_to_text(entries)
211
+     if not conversation_text.strip():
212
+         return empty_snapshot
213
+
214
+     try:
215
+         raw_response: str = await llm.chat_async(
216
+             prompt=conversation_text,
217
+             system_prompt=_SYSTEM_PROMPT,
218
+             max_tokens=max_tokens,
219
+         )
220
+     except Exception as exc:
221
+         logger.warning("semantic_memory: LLM call failed: %s", exc)
222
+         return empty_snapshot
223
+
224
+     entities_raw, relations_raw = _parse_llm_response(raw_response)
225
+
226
+     if not entities_raw and not relations_raw:
227
+         return empty_snapshot
228
+
229
+     try:
230
+         return _build_snapshot(entities_raw, relations_raw, tape_id=tape_id, anchor_id=anchor_id)
231
+     except Exception as exc:
232
+         logger.warning("semantic_memory: failed to build snapshot: %s", exc)
233
+         return empty_snapshot
@@ -0,0 +1,144 @@
1
+ """Semantic memory plugin hook implementation."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ from collections.abc import Iterable
7
+ from typing import Any
8
+
9
+ from republic import LLM, TapeContext, TapeEntry
10
+
11
+ from bub.builtin.settings import load_settings
12
+ from bub.framework import BubFramework
13
+ from bub.hookspecs import hookimpl
14
+ from bub.plugins.semantic_memory.store import SemanticStore
15
+
16
+
17
+ def _build_default_context(
18
+ entries: Iterable[TapeEntry],
19
+ ) -> list[dict[str, Any]]:
20
+ """Build default tape context (sync helper)."""
21
+ messages: list[dict[str, Any]] = []
22
+ pending_calls: list[dict[str, Any]] = []
23
+
24
+ for entry in entries:
25
+ match entry.kind:
26
+ case "anchor":
27
+ payload = entry.payload
28
+ content = (
29
+ f"[Anchor created: {payload.get('name')}]: "
30
+ f"{json.dumps(payload.get('state'), ensure_ascii=False)}"
31
+ )
32
+ messages.append({"role": "assistant", "content": content})
33
+ case "message":
34
+ payload = entry.payload
35
+ if isinstance(payload, dict):
36
+ messages.append(dict(payload))
37
+ case "tool_call":
38
+ calls = entry.payload.get("calls")
39
+ if isinstance(calls, list):
40
+ normalized = [dict(c) for c in calls if isinstance(c, dict)]
41
+ if normalized:
42
+ messages.append(
43
+ {"role": "assistant", "content": "", "tool_calls": normalized}
44
+ )
45
+ pending_calls = normalized
46
+ case "tool_result":
47
+ results = entry.payload.get("results")
48
+ if isinstance(results, list):
49
+ for i, result in enumerate(results):
50
+ msg: dict[str, Any] = {
51
+ "role": "tool",
52
+ "content": (
53
+ result
54
+ if isinstance(result, str)
55
+ else json.dumps(result, ensure_ascii=False)
56
+ ),
57
+ }
58
+ if i < len(pending_calls):
59
+ call = pending_calls[i]
60
+ if call_id := call.get("id"):
61
+ msg["tool_call_id"] = call_id
62
+ fn = call.get("function")
63
+ if isinstance(fn, dict) and (name := fn.get("name")):
64
+ msg["name"] = name
65
+ messages.append(msg)
66
+ pending_calls = []
67
+
68
+ return messages
69
+
70
+
71
+ async def build_semantic_context(
72
+ entries: Iterable[TapeEntry],
73
+ context: TapeContext,
74
+ llm: LLM | None = None,
75
+ store: SemanticStore | None = None,
76
+ ) -> list[dict[str, Any]]:
77
+ """Build context with semantic memory enhancements.
78
+
79
+ If either llm or store is None, only the base context is returned.
80
+ """
81
+ entries = list(entries)  # materialize once: a one-shot iterator is read twice below
82
+ messages = _build_default_context(entries)  # build base context
83
+
84
+ # Extract and append semantic context if both llm and store are available
85
+ if llm is None or store is None:
86
+ return messages
87
+
88
+ try:
89
+ from bub.plugins.semantic_memory.extractor import extract_semantics
90
+ from bub.plugins.semantic_memory.context import _format_snapshots
91
+
92
+ entries_list = list(entries)
93
+ if not entries_list:
94
+ return messages
95
+
96
+ # Get tape_id from context
97
+ tape_id = context.state.get("session_id", "unknown") if hasattr(context, "state") else "unknown"
98
+
99
+ # Extract new semantics
100
+ snapshot = await extract_semantics(entries_list, llm, tape_id=tape_id)
101
+ if snapshot.entities or snapshot.relations:
102
+ await store.append(tape_id, snapshot)
103
+
104
+ # Load all historical snapshots
105
+ snapshots = await store.load(tape_id)
106
+ if snapshots:
107
+ semantic_block = _format_snapshots(snapshots)
108
+ messages.append({"role": "system", "content": semantic_block})
109
+ except Exception as e:
110
+ # Graceful degradation: if semantic extraction fails, just use base context
111
+ import logging
112
+ logging.getLogger(__name__).warning("semantic_memory: semantic extraction failed: %s", e)
113
+
114
+ return messages
115
+
116
+
117
+ class SemanticMemoryPlugin:
118
+ """Bub plugin that provides semantic memory via a TapeContext selector."""
119
+
120
+ def __init__(self, framework: BubFramework) -> None:
121
+ self.framework = framework
122
+ settings = load_settings()
123
+ from republic.tape import InMemoryTapeStore
124
+
125
+ tape_store = InMemoryTapeStore()
126
+ self.llm = LLM(
127
+ settings.model,
128
+ api_key=settings.api_key,
129
+ api_base=settings.api_base,
130
+ tape_store=tape_store,
131
+ )
132
+ self.store = SemanticStore()
133
+
134
+ @hookimpl
135
+ def build_tape_context(self) -> TapeContext:
136
+ llm = self.llm
137
+ store = self.store
138
+
139
+ async def select_with_semantics(entries: Iterable[TapeEntry], context: TapeContext) -> list[dict]:
140
+ return await build_semantic_context(entries, context, llm=llm, store=store)
141
+
142
+ return TapeContext(select=select_with_semantics)
143
+
144
+
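`_build_default_context` turns tape entries into chat messages and pairs each tool result with a pending tool call purely by position: the i-th result receives the i-th call's `id` as `tool_call_id` and its `function.name` as `name`. The sketch below isolates that pairing step. `Entry` is a hypothetical stand-in for `republic.TapeEntry` (only `kind` and `payload`), and `anchor` entries are omitted for brevity.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Entry:
    # Hypothetical stand-in for republic.TapeEntry: only kind and payload.
    kind: str
    payload: dict[str, Any]


def to_messages(entries: list[Entry]) -> list[dict[str, Any]]:
    messages: list[dict[str, Any]] = []
    pending: list[dict[str, Any]] = []  # calls awaiting results, in call order
    for entry in entries:
        if entry.kind == "message":
            messages.append(dict(entry.payload))
        elif entry.kind == "tool_call":
            calls = [dict(c) for c in entry.payload.get("calls", []) if isinstance(c, dict)]
            if calls:
                messages.append({"role": "assistant", "content": "", "tool_calls": calls})
                pending = calls
        elif entry.kind == "tool_result":
            for i, result in enumerate(entry.payload.get("results", [])):
                msg: dict[str, Any] = {
                    "role": "tool",
                    # Non-string results are serialized to JSON text.
                    "content": result if isinstance(result, str) else json.dumps(result),
                }
                if i < len(pending):  # the i-th result answers the i-th call
                    call = pending[i]
                    if call_id := call.get("id"):
                        msg["tool_call_id"] = call_id
                    fn = call.get("function")
                    if isinstance(fn, dict) and (name := fn.get("name")):
                        msg["name"] = name
                messages.append(msg)
            pending = []  # calls are answered once; later results stay unpaired
    return messages
```

Because pairing is positional, results must arrive in the same order as the calls; a result list that is reordered or shorter than the call list would attach the wrong `tool_call_id` or leave calls unanswered, and the plugin code behaves the same way.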
@@ -0,0 +1,110 @@
1
+ """
2
+ Pydantic models for the semantic_memory plugin.
3
+
4
+ Entity
5
+ Represents a named concept extracted from conversation context (e.g. a person,
6
+ place, technology, or abstract idea). Each entity has a stable UUID, a type
7
+ label, a human-readable name, and an open-ended metadata bag.
8
+
9
+ Relation
10
+ Represents a directed edge between two entities (from_id -> to_id) with a
11
+ type label (e.g. "uses", "belongs_to", "created_by") and optional metadata.
12
+
13
+ SemanticSnapshot
14
+ An immutable point-in-time capture of a set of entities and relations,
15
+ anchored to a specific tape position (tape_id + anchor_id). Snapshots are
16
+ the unit of persistence and retrieval for the plugin.
17
+ """
18
+
19
+ import uuid
20
+ from datetime import datetime, timezone
21
+ from typing import Any
22
+
23
+ from pydantic import BaseModel, Field
24
+
25
+
26
+ class Entity(BaseModel):
27
+ """A named concept node in the semantic graph.
28
+
29
+ Attributes:
30
+ id: Stable UUID that uniquely identifies this entity.
31
+ type: A short label describing the category (e.g. "person", "tool").
32
+ name: Human-readable name for the entity.
33
+ metadata: Arbitrary key-value pairs for extra context.
34
+ """
35
+
36
+ model_config = {
37
+ "frozen": True,
38
+ "json_encoders": {datetime: lambda v: v.isoformat()},
39
+ }
40
+
41
+ id: str = Field(default_factory=lambda: str(uuid.uuid4()))
42
+ type: str
43
+ name: str
44
+ metadata: dict[str, Any] = Field(default_factory=dict)
45
+
46
+ def __hash__(self) -> int:
47
+ return hash(self.id)
48
+
49
+ def __eq__(self, other: object) -> bool:
50
+ if not isinstance(other, Entity):
51
+ return NotImplemented
52
+ return self.id == other.id
53
+
54
+
55
+ class Relation(BaseModel):
56
+ """A directed edge between two Entity nodes.
57
+
58
+ Attributes:
59
+ from_id: UUID of the source entity.
60
+ to_id: UUID of the target entity.
61
+ type: Label describing the relationship (e.g. "uses", "created_by").
62
+ metadata: Arbitrary key-value pairs for extra context.
63
+ """
64
+
65
+ model_config = {
66
+ "frozen": True,
67
+ "json_encoders": {datetime: lambda v: v.isoformat()},
68
+ }
69
+
70
+ from_id: str
71
+ to_id: str
72
+ type: str
73
+ metadata: dict[str, Any] = Field(default_factory=dict)
74
+
75
+ def __hash__(self) -> int:
76
+ return hash((self.from_id, self.to_id, self.type))
77
+
78
+ def __eq__(self, other: object) -> bool:
79
+ if not isinstance(other, Relation):
80
+ return NotImplemented
81
+ return (
82
+ self.from_id == other.from_id
83
+ and self.to_id == other.to_id
84
+ and self.type == other.type
85
+ )
86
+
87
+
88
+ class SemanticSnapshot(BaseModel):
89
+ """An immutable snapshot of entities and relations at a tape position.
90
+
91
+ Attributes:
92
+ entities: Ordered tuple of Entity nodes captured at this snapshot.
93
+ relations: Ordered tuple of Relation edges captured at this snapshot.
94
+ tape_id: Identifier of the tape (conversation thread) this belongs to.
95
+ anchor_id: Identifier of the specific tape entry this snapshot is anchored to.
96
+ created_at: UTC timestamp when this snapshot was created.
97
+ """
98
+
99
+ model_config = {
100
+ "frozen": True,
101
+ "json_encoders": {datetime: lambda v: v.isoformat()},
102
+ }
103
+
104
+ entities: tuple[Entity, ...] = Field(default_factory=tuple)
105
+ relations: tuple[Relation, ...] = Field(default_factory=tuple)
106
+ tape_id: str
107
+ anchor_id: str
108
+ created_at: datetime = Field(
109
+ default_factory=lambda: datetime.now(tz=timezone.utc)
110
+ )
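`Entity` and `Relation` override `__eq__` and `__hash__`, so identity is decided by `id` for entities and by the `(from_id, to_id, type)` triple for relations; `metadata` never participates. The stdlib-only sketch below mirrors those two rules with frozen dataclasses so it runs without pydantic; `Ent` and `Rel` are illustrative stand-ins, not the plugin's classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True, eq=False)
class Ent:
    # Stand-in for Entity: equality and hashing use id only.
    id: str
    type: str
    name: str

    def __hash__(self) -> int:
        return hash(self.id)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Ent):
            return NotImplemented
        return self.id == other.id


@dataclass(frozen=True, eq=False)
class Rel:
    # Stand-in for Relation: equality and hashing use (from_id, to_id, type).
    from_id: str
    to_id: str
    type: str
    metadata: dict[str, Any] = field(default_factory=dict)  # ignored by __eq__/__hash__

    def _key(self) -> tuple[str, str, str]:
        return (self.from_id, self.to_id, self.type)

    def __hash__(self) -> int:
        return hash(self._key())

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Rel):
            return NotImplemented
        return self._key() == other._key()


if __name__ == "__main__":
    a = Rel("e1", "e2", "uses", {"source": "turn 3"})
    b = Rel("e1", "e2", "uses", {"source": "turn 7"})
    print(len({a, b}))  # duplicates collapse despite different metadata
```

One consequence of the model definitions: `Entity.id` defaults to a fresh `uuid4()`, so two entities named "Ada" produced by separate extractions compare unequal unless the same `id` is supplied explicitly.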
@@ -0,0 +1,42 @@
1
+ from __future__ import annotations
2
+
3
+ import json
4
+ from pathlib import Path
5
+
6
+ import aiofiles
7
+
8
+ from bub.plugins.semantic_memory.types import SemanticSnapshot
9
+
10
+
11
+ class SemanticStore:
12
+ def __init__(self, storage_root: Path | None = None) -> None:
13
+ if storage_root is None:
14
+ storage_root = Path.home() / ".bub" / "tapes" / "semantic"
15
+ self._storage_root = storage_root
16
+ self._storage_root.mkdir(parents=True, exist_ok=True)
17
+
18
+ def tape_file_path(self, tape_id: str) -> Path:
19
+ return self._storage_root / f"{tape_id}.jsonl"
20
+
21
+ async def append(self, tape_id: str, snapshot: SemanticSnapshot) -> None:
22
+ path = self.tape_file_path(tape_id)
23
+ async with aiofiles.open(path, mode="a", encoding="utf-8") as f:
24
+ await f.write(snapshot.model_dump_json() + "\n")
25
+
26
+ async def load(self, tape_id: str) -> list[SemanticSnapshot]:
27
+ path = self.tape_file_path(tape_id)
28
+ if not path.exists():
29
+ return []
30
+ async with aiofiles.open(path, mode="r", encoding="utf-8") as f:
31
+ lines = await f.readlines()
32
+ snapshots: list[SemanticSnapshot] = []
33
+ for line in lines:
34
+ line = line.strip()
35
+ if not line:
36
+ continue
37
+ try:
38
+ data = json.loads(line)
39
+ snapshots.append(SemanticSnapshot.model_validate(data))
40
+ except ValueError:  # json.JSONDecodeError and pydantic.ValidationError both subclass ValueError
41
+ continue
42
+ return snapshots
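`SemanticStore` keeps one JSON Lines file per tape: `append` writes one serialized snapshot per line, and `load` parses line by line, skipping blank or unparseable lines. The sketch below shows the same append-and-tolerant-load pattern with synchronous stdlib I/O and plain dicts. The real store uses `aiofiles` together with `SemanticSnapshot.model_dump_json` and `model_validate`; `append_record` and `load_records` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    # One JSON document per line; "a" mode creates the file if it is missing.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_records(path: Path) -> list[dict]:
    if not path.exists():
        return []  # no history yet for this tape
    records: list[dict] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip a truncated or corrupt line instead of failing the load
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "tape-1.jsonl"
        append_record(p, {"tape_id": "tape-1", "entities": []})
        print(load_records(p))
```

Skipping bad lines means a partially written final line (for example after a crash mid-append) costs one snapshot rather than the whole history. Because `build_semantic_context` appends whenever an extraction yields a non-empty snapshot, the file also grows with each such turn.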
@@ -0,0 +1,5 @@
1
+ """Re-export types for backward compatibility."""
2
+
3
+ from bub.plugins.semantic_memory.models import Entity, Relation, SemanticSnapshot
4
+
5
+ __all__ = ["Entity", "Relation", "SemanticSnapshot"]