bub-semantic-memory 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,151 @@
1
+ docs/source
2
+
3
+ # From https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore
4
+
5
+ # Byte-compiled / optimized / DLL files
6
+ __pycache__/
7
+ *.py[cod]
8
+ *$py.class
9
+
10
+ # C extensions
11
+ *.so
12
+
13
+ # Distribution / packaging
14
+ .Python
15
+ build/
16
+ develop-eggs/
17
+ dist/
18
+ downloads/
19
+ eggs/
20
+ .eggs/
21
+ lib/
22
+ lib64/
23
+ parts/
24
+ sdist/
25
+ var/
26
+ wheels/
27
+ share/python-wheels/
28
+ *.egg-info/
29
+ .installed.cfg
30
+ *.egg
31
+ MANIFEST
32
+
33
+ # PyInstaller
34
+ # Usually these files are written by a python script from a template
35
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
36
+ *.manifest
37
+ *.spec
38
+
39
+ # Installer logs
40
+ pip-log.txt
41
+ pip-delete-this-directory.txt
42
+
43
+ # Unit test / coverage reports
44
+ htmlcov/
45
+ .tox/
46
+ .nox/
47
+ .coverage
48
+ .coverage.*
49
+ .cache
50
+ nosetests.xml
51
+ coverage.xml
52
+ *.cover
53
+ *.py,cover
54
+ .hypothesis/
55
+ .pytest_cache/
56
+ cover/
57
+
58
+ # Translations
59
+ *.mo
60
+ *.pot
61
+
62
+ # Django stuff:
63
+ *.log
64
+ local_settings.py
65
+ db.sqlite3
66
+ db.sqlite3-journal
67
+
68
+ # Flask stuff:
69
+ instance/
70
+ .webassets-cache
71
+
72
+ # Scrapy stuff:
73
+ .scrapy
74
+
75
+ # Sphinx documentation
76
+ docs/_build/
77
+
78
+ # PyBuilder
79
+ .pybuilder/
80
+ target/
81
+
82
+ # Jupyter Notebook
83
+ .ipynb_checkpoints
84
+
85
+ # IPython
86
+ profile_default/
87
+ ipython_config.py
88
+
89
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
90
+ __pypackages__/
91
+
92
+ # Celery stuff
93
+ celerybeat-schedule
94
+ celerybeat.pid
95
+
96
+ # SageMath parsed files
97
+ *.sage.py
98
+
99
+ # Environments
100
+ .env
101
+ .venv
102
+ env/
103
+ venv/
104
+ ENV/
105
+ env.bak/
106
+ venv.bak/
107
+ .pdm-python
108
+
109
+ # Spyder project settings
110
+ .spyderproject
111
+ .spyproject
112
+
113
+ # Rope project settings
114
+ .ropeproject
115
+
116
+ # mkdocs documentation
117
+ /site
118
+
119
+ # mypy
120
+ .mypy_cache/
121
+ .dmypy.json
122
+ dmypy.json
123
+
124
+ # Pyre type checker
125
+ .pyre/
126
+
127
+ # pytype static type analyzer
128
+ .pytype/
129
+
130
+ # Cython debug symbols
131
+ cython_debug/
132
+
133
+ # Vscode config files
134
+ .vscode/
135
+
136
+ # PyCharm
137
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
138
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
139
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
140
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
141
+ #.idea/
142
+
143
+ # Reference directory - ignore all reference projects
144
+ reference/
145
+ !website/src/content/docs/**/build
146
+ !website/src/content/docs/**/reference
147
+
148
+ # Local legacy backups created during framework migrations
149
+ backup/
150
+ _version.py
151
+ !website/src/lib/
@@ -0,0 +1,267 @@
1
+ Metadata-Version: 2.4
2
+ Name: bub-semantic-memory
3
+ Version: 0.1.0
4
+ Summary: Semantic memory plugin for Bub
5
+ Project-URL: Repository, https://github.com/bubbuild/bub
6
+ Author: Bub Community
7
+ License: Apache-2.0
8
+ Requires-Python: >=3.12
9
+ Requires-Dist: aiofiles>=25.1.0
10
+ Requires-Dist: bub>=0.3.0
11
+ Requires-Dist: pydantic>=2.0.0
12
+ Requires-Dist: republic>=0.5.4
13
+ Description-Content-Type: text/markdown
14
+
15
+ # Semantic Memory Plugin for Bub
16
+
17
+ A plugin that extracts and retains semantic entities and relations from conversation histories, enriching agent context with semantic memory.
18
+
19
+ ## Overview
20
+
21
+ This plugin intercepts the tape context building process to:
22
+ 1. **Extract semantics** from conversation entries using an LLM
23
+ 2. **Store snapshots** of entities (people, tasks, concepts) and relations between them
24
+ 3. **Inject memory** into subsequent agent prompts, enabling long-context awareness
25
+
26
+ The plugin follows Bub's philosophy: it's completely optional, zero-config after installation, and hooks into the existing `build_tape_context` architecture without modifying core.
27
+
28
+ ## Installation
29
+
30
+ The plugin is already registered in `pyproject.toml`:
31
+
32
+ ```toml
33
+ [project.entry-points."bub"]
34
+ semantic_memory = "bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin"
35
+ ```
36
+
37
+ Bub's framework automatically loads and instantiates it on startup. No additional setup required.
38
+
39
+ ## How It Works
40
+
41
+ ### Per-Turn Flow
42
+
43
+ 1. **Input**: Agent receives a new message, tape entries are loaded
44
+ 2. **Extract**: LLM analyzes entries and identifies:
45
+ - **Entities**: people, tasks, events, concepts
46
+ - **Relations**: created, depends_on, mentions, etc.
47
+ 3. **Store**: SemanticSnapshot is appended to `~/.bub/tapes/semantic/{tape_id}.jsonl`
48
+ 4. **Load**: All historical snapshots for this tape are loaded
49
+ 5. **Inject**: Semantic memory is formatted as a system prompt block and prepended to the context
50
+ 6. **Output**: Agent receives enriched context with semantic awareness
51
+
52
+ ### Example
53
+
54
+ Given this conversation:
55
+ ```
56
+ User: "Alice created a task to deploy v1.0"
57
+ Agent: [responds]
58
+ User: "What did Alice do?"
59
+ ```
60
+
61
+ On the second turn, the agent sees:
62
+
63
+ ```
64
+ ## Semantic Memory
65
+
66
+ ### Entities (2):
67
+ - person:alice
68
+ - task:deploy_v1 (v1.0 deployment)
69
+
70
+ ### Relations (1):
71
+ - alice --created--> deploy_v1
72
+
73
+ ---
74
+
75
+ [rest of context]
76
+ ```
77
+
78
+ ## Architecture
79
+
80
+ ### Core Modules
81
+
82
+ - **`models.py`**: Pydantic dataclasses for Entity, Relation, SemanticSnapshot
83
+ - **`extractor.py`**: LLM-based extraction from tape entries
84
+ - **`store.py`**: JSONL file storage at `~/.bub/tapes/semantic/`
85
+ - **`context.py`**: Formatting snapshots into system prompts
86
+ - **`hook_impl.py`**: Bub hookimpl that wires everything together
87
+
88
+ ### Storage Format
89
+
90
+ Snapshots are stored as JSONL (one JSON object per line):
91
+
92
+ ```json
93
+ {
94
+ "entities": [
95
+ {"id": "ent_abc123", "type": "person", "name": "Alice", "metadata": {}},
96
+ {"id": "ent_def456", "type": "task", "name": "deploy_v1", "metadata": {"version": "1.0"}}
97
+ ],
98
+ "relations": [
99
+ {"from": "ent_abc123", "to": "ent_def456", "type": "created", "metadata": {}}
100
+ ],
101
+ "tape_id": "527c9ae0c6f31e05__0b871d5e50e7c192",
102
+ "anchor_id": "anchor_001",
103
+ "created_at": "2026-06-06T09:35:00Z"
104
+ }
105
+ ```
106
+
107
+ ## Configuration
108
+
109
+ The plugin **reuses your main LLM settings** (`BUB_MODEL`, `BUB_API_KEY`, etc.):
110
+
111
+ ```bash
112
+ # Your existing setup (e.g., DeepSeek)
113
+ export BUB_MODEL=deepseek:deepseek-chat
114
+ export BUB_API_KEY=sk-...
115
+ uv run bub chat
116
+ ```
117
+
118
+ No separate `BUB_SEMANTIC_*` variables needed. Semantic extraction uses the same model as your agent.
119
+
120
+ ## Testing
121
+
122
+ Run the test suite:
123
+
124
+ ```bash
125
+ uv run pytest tests/plugins/semantic_memory/test_semantic_memory.py -v
126
+ ```
127
+
128
+ **Coverage**: 43 tests across unit and integration scenarios:
129
+ - Entity/Relation serialization
130
+ - JSONL storage I/O
131
+ - LLM extraction with mocks
132
+ - Context building
133
+ - Multi-turn memory retention
134
+
135
+ ## Usage Examples
136
+
137
+ ### Example 1: CLI Multi-Turn
138
+
139
+ ```bash
140
+ $ uv run bub chat
141
+ bub > Alice is a data scientist.
142
+ Agent > Got it.
143
+
144
+ bub > What is Alice's profession?
145
+ Agent > Alice is a data scientist. (retrieved from semantic memory)
146
+
147
+ bub > ,tape.info
148
+ [Shows: 2 entries, 1 anchor, ... semantic snapshots: 2]
149
+ ```
150
+
151
+ ### Example 2: Telegram
152
+
153
+ ```
154
+ You: "I need to fix a critical bug in the payment module"
155
+ Bot: [Uses semantic memory to track bug, module]
156
+
157
+ You: "What was I working on?"
158
+ Bot: [Recalls semantic memory: bug:critical_payment, module:payment]
159
+ ```
160
+
161
+ ### Example 3: Inspect Semantic Store
162
+
163
+ ```bash
164
+ $ cat ~/.bub/tapes/semantic/527c9ae0c6f31e05__0b871d5e50e7c192.jsonl | python -m json.tool
165
+ [Shows stored entities and relations]
166
+ ```
167
+
168
+ ## Performance & Cost
169
+
170
+ ### Token Usage
171
+ - Each extraction call: ~300-500 tokens (depends on entry volume)
172
+ - Estimated overhead: **+10-20%** per turn (configurable via extraction prompt)
173
+
174
+ ### Storage
175
+ - JSONL format: ~1-2 KB per snapshot (grows with entities/relations)
176
+ - Typical session: ~50-100 KB
177
+
178
+ ### Latency
179
+ - Extraction is async, non-blocking
180
+ - First turn (with extraction): ~500ms extra
181
+ - Subsequent turns: ~50ms extra (just loading snapshots)
182
+
183
+ ## Graceful Degradation
184
+
185
+ If semantic extraction fails for any reason:
186
+ - LLM error: Returns empty snapshot, continues
187
+ - Invalid JSON: Logged as warning, continues
188
+ - Storage error: Logged, continues with base context
189
+
190
+ The agent **always** works, semantic memory is optional enhancement.
191
+
192
+ ## Future Enhancements
193
+
194
+ ### Phase 2: Smart Retrieval
195
+ - Vector embeddings for semantic similarity search
196
+ - Retrieval-augmented context injection (only include relevant entities)
197
+ - Reduces prompt bloat for long sessions
198
+
199
+ ### Phase 3: Advanced Graphs
200
+ - Entity dependency analysis (who depends on what)
201
+ - Centrality metrics (who/what is most important)
202
+ - Causal reasoning (what led to what)
203
+
204
+ ### Phase 4: Multi-Session Memory
205
+ - Cross-session entity resolution
206
+ - Long-term memory across multiple conversations
207
+ - Persistent entity graph (not just per-tape)
208
+
209
+ ## Troubleshooting
210
+
211
+ **Q: Plugin not loading?**
212
+ A: Check that entry-point is registered:
213
+ ```bash
214
+ python -c "import importlib.metadata; print(list(importlib.metadata.entry_points(group='bub')))"
215
+ ```
216
+
217
+ **Q: Semantic snapshots not appearing?**
218
+ A: Check `~/.bub/tapes/semantic/` directory exists. Check logs with `BUB_VERBOSE=1`.
219
+
220
+ **Q: LLM calls are expensive?**
221
+ A: Reduce extraction frequency or use a cheaper model (e.g., DeepSeek distill). Future releases will support model selection per plugin.
222
+
223
+ ## API Reference
224
+
225
+ ### `build_semantic_context(entries, context, llm=None, store=None) → list[dict]`
226
+
227
+ Build context with semantic memory. Called by the framework automatically.
228
+
229
+ **Args:**
230
+ - `entries`: Iterable of TapeEntry objects
231
+ - `context`: TapeContext instance
232
+ - `llm`: republic.LLM instance (optional; if None, returns base context)
233
+ - `store`: SemanticStore instance (optional; if None, returns base context)
234
+
235
+ **Returns:** List of message dicts ready for model input
236
+
237
+ ### `extract_semantics(entries, llm, tape_id, anchor_id=None, max_tokens=1000) → SemanticSnapshot`
238
+
239
+ Extract entities and relations from tape entries.
240
+
241
+ **Args:**
242
+ - `entries`: List of TapeEntry objects
243
+ - `llm`: republic.LLM instance for extraction
244
+ - `tape_id`: Session/tape identifier
245
+ - `anchor_id`: Optional anchor point identifier
246
+ - `max_tokens`: Max tokens for LLM response
247
+
248
+ **Returns:** SemanticSnapshot with extracted entities/relations
249
+
250
+ ## Contributing
251
+
252
+ This plugin is part of Bub's extensibility model. To extend:
253
+
254
+ 1. **Custom entity types**: Modify Entity.type enum in models.py
255
+ 2. **Custom extractors**: Replace or wrap extractor.py
256
+ 3. **Custom storage**: Implement SemanticStore interface
257
+ 4. **Custom formatters**: Replace _format_snapshots in context.py
258
+
259
+ All without modifying Bub core.
260
+
261
+ ## License
262
+
263
+ Same as Bub (Apache 2.0)
264
+
265
+ ---
266
+
267
+ **Questions?** See [Bub documentation](https://bub.build) or open an issue.
@@ -0,0 +1,64 @@
1
+ # Publish bub-semantic-memory to PyPI
2
+
3
+ ## 📍 Location
4
+ ```
5
+ ~/Documents/playground/bub/packages/semantic-memory/
6
+ ```
7
+
8
+ ## 📦 What's Included
9
+ - `src/bub/plugins/semantic_memory/` - 7 modules (652 lines)
10
+ - `tests/` - 43 tests
11
+ - `README.md` - Documentation
12
+ - `pyproject.toml` - PyPI metadata
13
+
14
+ ## 🚀 Quick Publish
15
+
16
+ ### Step 1: Create GitHub Repo
17
+ ```bash
18
+ cd ~/Documents/playground/bub
19
+ git add packages/semantic-memory
20
+ git commit -m "feat: add semantic-memory plugin for distribution"
21
+ git push
22
+
23
+ # Or create separate repo at https://github.com/bubbuild/bub-semantic-memory
24
+ ```
25
+
26
+ ### Step 2: Build & Publish to PyPI
27
+ ```bash
28
+ cd ~/Documents/playground/bub/packages/semantic-memory
29
+ uv build
30
+ uv publish
31
+ ```
32
+
33
+ ### Step 3: Submit to hub.bub.build
34
+ Fork https://github.com/bubbuild/buildscape
35
+ Add `plugins/semantic-memory.json`:
36
+ ```json
37
+ {
38
+ "name": "semantic-memory",
39
+ "title": "Semantic Memory",
40
+ "description": "Extract and retain semantic entities and relations from conversations",
41
+ "author": "Bub Community",
42
+ "license": "Apache-2.0",
43
+ "repository": "https://github.com/bubbuild/bub",
44
+ "pypi": "bub-semantic-memory",
45
+ "documentation": "https://github.com/bubbuild/bub#semantic-memory",
46
+ "categories": ["memory", "context"]
47
+ }
48
+ ```
49
+
50
+ ## 📊 Package Info
51
+ - **PyPI Name**: bub-semantic-memory
52
+ - **Entry Point**: bub.plugins.semantic_memory.hook_impl:SemanticMemoryPlugin
53
+ - **Version**: 0.1.0
54
+ - **Python**: 3.12+
55
+ - **License**: Apache 2.0
56
+
57
+ ## ✅ Verification
58
+ - [x] Code: 652 lines, 7 modules
59
+ - [x] Tests: 43 passing
60
+ - [x] Documentation: README.md
61
+ - [x] pyproject.toml: Configured for PyPI
62
+ - [x] Entry point: bub.plugins.semantic_memory
63
+
64
+ Ready to publish! 🎉