shaheen-db 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Saif
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,333 @@
1
+ Metadata-Version: 2.4
2
+ Name: shaheen-db
3
+ Version: 0.1.0
4
+ Summary: A lightweight, self-consolidating cognitive memory layer for AI agents. Combines SQLite, vector search, and a knowledge graph with a biologically-inspired sleep/forget cycle.
5
+ Author-email: Saif <spellsaif@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/spellsaif/shaheen
8
+ Project-URL: Repository, https://github.com/spellsaif/shaheen
9
+ Project-URL: Bug Tracker, https://github.com/spellsaif/shaheen/issues
10
+ Keywords: ai,agent,memory,cognitive,database,vector,knowledge-graph,graphrag,llm,rag,sqlite,embeddings,openai,gemini,anthropic
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.8
15
+ Classifier: Programming Language :: Python :: 3.9
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
21
+ Classifier: Topic :: Database
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Requires-Python: >=3.8
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ Requires-Dist: numpy>=1.20.0
27
+ Requires-Dist: openai>=1.0.0
28
+ Requires-Dist: google-genai>=0.1.0
29
+ Requires-Dist: pydantic>=2.0
30
+ Provides-Extra: local
31
+ Requires-Dist: sentence-transformers>=2.2.0; extra == "local"
32
+ Provides-Extra: anthropic
33
+ Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
34
+ Provides-Extra: all
35
+ Requires-Dist: sentence-transformers>=2.2.0; extra == "all"
36
+ Requires-Dist: anthropic>=0.18.0; extra == "all"
37
+ Provides-Extra: dev
38
+ Requires-Dist: pytest>=7.0; extra == "dev"
39
+ Requires-Dist: pytest-mock>=3.6.0; extra == "dev"
40
+ Requires-Dist: sentence-transformers>=2.2.0; extra == "dev"
41
+ Requires-Dist: anthropic>=0.18.0; extra == "dev"
42
+ Dynamic: license-file
43
+
44
+ # πŸ¦… Shaheen DB
45
+
46
+ [![PyPI version](https://img.shields.io/pypi/v/shaheen-db.svg)](https://pypi.org/project/shaheen-db/)
47
+ [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/)
48
+ [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
49
+ [![Tests](https://img.shields.io/badge/tests-10%20passed-brightgreen.svg)]()
50
+ [![Database](https://img.shields.io/badge/database-SQLite--WAL-orange.svg)]()
51
+ [![Vector Math](https://img.shields.io/badge/vector--math-NumPy%20Local-red.svg)]()
52
+
53
+ ```bash
54
+ pip install shaheen-db
55
+ ```
56
+
57
+ **Shaheen DB** (named after the Royal Falcon) is a lightweight, zero-ops, self-consolidating cognitive memory layer designed specifically for AI Agents.
58
+
59
+ Instead of treating agent memory as a massive, unstructured dumping ground of raw vector embeddingsβ€”which leads to bloated context windows, duplicate facts, high token costs, and retrieval noiseβ€”**Shaheen DB** implements an opinionated, biologically inspired cognitive memory pipeline.
60
+
61
+ ---
62
+
63
+ ## πŸ’‘ The Problem with Traditional Vector Databases
64
+ Most AI agents use standard vector databases (like Pinecone or Chroma) for memory. This approach has three fatal flaws in production:
65
+ 1. **Semantic Noise:** If a user mentions eating a croissant on Day 1, and asks about their business strategy on Day 30, a standard vector search will often retrieve the croissant log due to query overlaps. This pollutes the LLM's prompt.
66
+ 2. **Context Bloat:** Raw transcripts contain filler words, greetings, and temporary statements. Feeding these directly to LLMs burns tokens and incurs high latency.
67
+ 3. **No Fact Evolution:** If a user says "My name is Saif" on Day 1, and "Actually, write my name as Saif Al-Islam" on Day 5, a vector search will return both statements, forcing the LLM to guess which name is correct.
68
+
69
+ **Shaheen DB solves this by separating sensory ingestion from long-term factual consolidation, and automatically forgetting trivial logs over time.**
70
+
71
+ ---
72
+
73
+ ## 🧠 Core Memory Architecture
74
+
75
+ ```
76
+ [ Agent Conversations / Logs ]
77
+ β”‚
78
+ β–Ό (Immediate Ingestion < 5ms)
79
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
80
+ β”‚ Sensory Buffer (SQL) β”‚ ◄─── Instant Vector Match
81
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
82
+ β”‚
83
+ β–Ό (Background "Sleep" Loop)
84
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
85
+ β”‚ Consolidation Engine β”‚
86
+ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
87
+ β”‚ β”‚
88
+ (Extract facts) β”‚ β”‚ (Extract relationships)
89
+ β–Ό β–Ό
90
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
91
+ β”‚ Vectors β”‚ β”‚ Knowledge β”‚
92
+ β”‚ Index β”‚ β”‚ Graph β”‚
93
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
94
+ ```
95
+
96
+ 1. **Sensory Store (Short-term Memory):** Immediate, high-throughput writes (under 5ms) to an SQLite buffer running in Write-Ahead Logging (WAL) mode. Vectorized immediately for real-time semantic retrieval.
97
+ 2. **Consolidation Loop (The "Sleep" Cycle):** An asynchronous process that processes raw logs using structured LLM outputs to extract structured entities (nodes) and relationships (edges).
98
+ 3. **Cognitive Decay (Forgetting):** Applies mathematical exponential decay ($decay = e^{-\lambda \Delta t}$) to temporary sensory logs. Trivial information decays and is deleted, while pinned/permanent facts and organized graph connections are preserved permanently.
99
+ 4. **Hybrid GraphRAG Retrieval:** A unified retrieval query that automatically builds a context pack containing:
100
+ * **Recent Conversations:** Highly relevant active sensory logs.
101
+ * **Associated Entities & Facts:** Consolidated entities matching the query.
102
+ * **Associative Relationships:** Sub-graphs of connections between matching entities.
103
+
104
+ ---
105
+
106
+ ## ⚑ Features at a Glance
107
+
108
+ * **Zero-Ops GraphRAG:** Get the reasoning benefits of a Knowledge Graph and Vector Database in a single local SQLite file. No Neo4j servers, no Docker, no external cloud configurations.
109
+ * **Biologically Inspired Sleep Cycle:** Runs LLM extraction in the background to convert unstructured dialogue into structured relationships.
110
+ * **Native Memory Decay:** Automatic time-based cleanup of trivial logs to keep context window sizes minimal and token costs low.
111
+ * **Microsecond Graph Traversals:** Uses optimized SQL queries inside SQLite to fetch 1-hop and 2-hop entity neighborhoods in microseconds.
112
+ * **Dependency-Optional Offline Mode:** Support for local CPU/GPU embedding generation via `sentence-transformers` or local API calls to **Ollama** to avoid heavy PyTorch dependencies.
113
+
114
+ ---
115
+
116
+ ## πŸš€ Quickstart
117
+
118
+ ### 1. Installation
119
+ Install the package requirements:
120
+ ```bash
121
+ pip install -r requirements.txt
122
+ ```
123
+
124
+ ### 2. Basic Usage
125
+
126
+ ```python
127
+ from shaheen import Shaheen
128
+
129
+ # Initialize Shaheen DB (creates a local SQLite file)
130
+ db = Shaheen(db_path="memory.db")
131
+
132
+ # 1. Store sensory memories (fast write, immediate indexing)
133
+ db.remember("User: Hello! My name is Saif. I am a software engineer.")
134
+ db.remember("User: I am building Shaheen DB in Python to solve agent memory.")
135
+ db.remember("User: By the way, I have a pet falcon named Swift.", permanent=True) # Pinned context!
136
+
137
+ # 2. Trigger the "Sleep" cycle to organize memory into a graph
138
+ db.consolidate()
139
+
140
+ # 3. Recall coordinated memory (returns a unified GraphRAG context)
141
+ result = db.recall("Who is Saif and what is his pet?")
142
+
143
+ print(result["context_text"])
144
+ ```
145
+
146
+ ### 3. Output Context Pack Structure
147
+ The generated context pack is cleanly formatted, ready to be injected straight into your LLM prompt template:
148
+
149
+ ```markdown
150
+ ### Recent Conversations & Logs:
151
+ - [2026-05-20 08:31:13] User: By the way, I have a pet falcon named Swift. [Permanent] (decay relevance: 1.00, query match: 0.85)
152
+
153
+ ### Relevant Entities & Facts:
154
+ - saif (Person): role: software engineer (query match: 0.88)
155
+ - swift (Animal): type: falcon, owner: saif (query match: 0.84)
156
+ - shaheen-db (Software): purpose: AI agent memory, creator: saif (query match: 0.75)
157
+
158
+ ### Associative Relationships:
159
+ - saif --[BUILDING]--> shaheen-db (status: active)
160
+ - saif --[OWNER_OF]--> swift
161
+ ```
162
+
163
+ ---
164
+
165
+ ## ⏱️ Running Consolidation in Production
166
+
167
+ Because the "Sleep Cycle" (`db.consolidate()`) calls generative LLMs to extract information, it takes a few seconds to complete. You should **never** run it inline on the main thread during active user conversations, as it will slow down your agent's response time.
168
+
169
+ Instead, use one of the following production patterns:
170
+
171
+ ### Pattern A: Threshold-Based Trigger (Self-Sleep)
172
+ Automatically trigger consolidation in a background thread when the count of raw, unconsolidated memories reaches a threshold (e.g., every 15 conversation turns).
173
+
174
+ ```python
175
+ import threading
176
+
177
+ def handle_incoming_message(text: str):
178
+ # 1. Ingest immediately (takes < 5ms)
179
+ db.remember(f"User: {text}")
180
+
181
+ # 2. Trigger consolidation in the background if the threshold is met
182
+ unconsolidated_count = len(db.db.get_unconsolidated_memories())
183
+ if unconconsolidated_count >= 15:
184
+ threading.Thread(target=db.consolidate, daemon=True).start()
185
+
186
+ # 3. Retrieve context & query agent LLM
187
+ context = db.recall(text)
188
+ return agent.respond(text, context)
189
+ ```
190
+
191
+ ### Pattern B: Inactivity/Idle-Time Trigger
192
+ Trigger consolidation only when the user has stopped talking to the agent for a set duration (e.g., 10 minutes of inactivity).
193
+
194
+ ```python
195
+ import time
196
+ import threading
197
+
198
+ last_activity = time.time()
199
+ has_slept = False
200
+
201
+ def idle_watcher():
202
+ global last_activity, has_slept
203
+ while True:
204
+ time.sleep(30)
205
+ if time.time() - last_activity > 600 and not has_slept:
206
+ db.consolidate()
207
+ has_slept = True
208
+
209
+ # Start idle watcher thread on startup
210
+ threading.Thread(target=idle_watcher, daemon=True).start()
211
+ ```
212
+
213
+ ### Pattern C: Scheduled Cron Job
214
+ For multi-user SaaS setups, run a nightly script (e.g., at 2:00 AM) that iterates through all active user databases and triggers `.consolidate()`.
215
+ ```bash
216
+ # Run daily at 2:00 AM
217
+ 0 2 * * * python /app/scripts/trigger_nightly_consolidation.py
218
+ ```
219
+
220
+ ---
221
+
222
+ ## βš™οΈ Configuration Scenarios
223
+
224
+ > [!NOTE]
225
+ > **Base URL is Optional:** Standard OpenAI, Google Gemini, Anthropic, and local Ollama setups are auto-configured by default. You **do not** need to set `SHAHEEN_LLM_BASE_URL` unless you are using a third-party gateway like OpenRouter or Groq.
226
+
227
+ ### Supported Providers
228
+
229
+ | Provider | `SHAHEEN_LLM_PROVIDER` | Native SDK | Embedding Source | Notes |
230
+ | :--- | :--- | :--- | :--- | :--- |
231
+ | **OpenAI** | `openai` | βœ… | OpenAI API | Default provider |
232
+ | **Google Gemini** | `gemini` | βœ… | Gemini API | Uses `google-genai` SDK |
233
+ | **Anthropic Claude** | `anthropic` | βœ… | SentenceTransformers (local) | Anthropic has no embedding API |
234
+ | **Ollama (local)** | `openai` | Via base URL | Ollama API | Point base URL to `localhost:11434` |
235
+ | **Groq** | `openai` | Via base URL | Groq API | Ultra-fast inference |
236
+ | **OpenRouter** | `openai` | Via base URL | OpenRouter API | Access to 200+ models |
237
+ | **DeepSeek** | `openai` | Via base URL | DeepSeek API | Cost-effective cloud inference |
238
+ | **LM Studio** | `openai` | Via base URL | LM Studio API | Local GUI-based model runner |
239
+ | **SentenceTransformers** | `local` | βœ… | Python (in-process) | 100% offline, no API needed |
240
+
241
+ Shaheen DB is highly flexible and supports 100% cloud, hybrid, and 100% offline stacks. Configure it using these environment variables:
242
+
243
+ ### Scenario 1: 100% Google Gemini Cloud
244
+ ```bash
245
+ export SHAHEEN_LLM_PROVIDER="gemini"
246
+ export GEMINI_API_KEY="your-gemini-api-key"
247
+
248
+ # Defaults used automatically:
249
+ # SHAHEEN_LLM_MODEL="gemini-2.5-flash"
250
+ # SHAHEEN_EMBEDDING_MODEL="text-embedding-004"
251
+ ```
252
+
253
+ ### Scenario 2: 100% OpenAI Cloud
254
+ ```bash
255
+ export SHAHEEN_LLM_PROVIDER="openai"
256
+ export SHAHEEN_LLM_API_KEY="your-openai-key"
257
+
258
+ # Defaults used automatically:
259
+ # SHAHEEN_LLM_MODEL="gpt-4o-mini"
260
+ # SHAHEEN_EMBEDDING_MODEL="text-embedding-3-small"
261
+ ```
262
+
263
+ ### Scenario 3: Anthropic Claude (Native)
264
+ Uses the native Claude SDK with Tool Use to guarantee structured JSON extraction during the sleep cycle. Since Anthropic has no embedding API, vector search runs locally via SentenceTransformers.
265
+ ```bash
266
+ export SHAHEEN_LLM_PROVIDER="anthropic"
267
+ export ANTHROPIC_API_KEY="your-anthropic-key"
268
+
269
+ # Defaults used automatically:
270
+ # SHAHEEN_LLM_MODEL="claude-3-5-sonnet-20241022"
271
+ # SHAHEEN_EMBEDDING_MODEL="all-MiniLM-L6-v2" (runs locally, no API cost)
272
+ ```
273
+
274
+ ### Scenario 4: Universal Gateway (OpenRouter / Groq / DeepSeek)
275
+ Point Shaheen DB at any OpenAI-compatible provider to access hundreds of models with a single API key.
276
+ ```bash
277
+ export SHAHEEN_LLM_PROVIDER="openai"
278
+
279
+ # OpenRouter (access to 200+ models including Claude, Llama, Mistral, Gemini)
280
+ export SHAHEEN_LLM_BASE_URL="https://openrouter.ai/api/v1"
281
+ export SHAHEEN_LLM_API_KEY="your-openrouter-key"
282
+ export SHAHEEN_LLM_MODEL="meta-llama/llama-3-70b-instruct" # or any model slug
283
+
284
+ # Groq (ultra-fast Llama / Mixtral inference)
285
+ # export SHAHEEN_LLM_BASE_URL="https://api.groq.com/openai/v1"
286
+ # export SHAHEEN_LLM_API_KEY="your-groq-key"
287
+ # export SHAHEEN_LLM_MODEL="llama3-70b-8192"
288
+ ```
289
+
290
+ ### Scenario 3: Hybrid (Local Search + DeepSeek Cloud via OpenRouter)
291
+ Runs vector search 100% locally on your CPU (saving API costs), and only queries OpenRouter's cloud for the background consolidation sleep cycle.
292
+ ```bash
293
+ export SHAHEEN_LLM_PROVIDER="local" # Runs SentenceTransformers locally
294
+
295
+ export SHAHEEN_LLM_BASE_URL="https://openrouter.ai/api/v1"
296
+ export SHAHEEN_LLM_API_KEY="your-openrouter-key"
297
+ export SHAHEEN_LLM_MODEL="deepseek/deepseek-chat"
298
+ ```
299
+
300
+ ### Scenario 4: 100% Offline (Ollama-only, Lightweight)
301
+ Runs both embeddings and the LLM locally via Ollama. Keeps your Python environment lightweight (does NOT require PyTorch or `sentence-transformers`).
302
+ ```bash
303
+ export SHAHEEN_LLM_PROVIDER="openai"
304
+ export SHAHEEN_LLM_BASE_URL="http://localhost:11434/v1" # Required to redirect the API client to Ollama
305
+ export SHAHEEN_LLM_API_KEY="ollama" # Dummy key
306
+
307
+ export SHAHEEN_LLM_MODEL="llama3"
308
+ export SHAHEEN_EMBEDDING_MODEL="nomic-embed-text"
309
+ export SHAHEEN_EMBEDDING_DIM="768"
310
+ ```
311
+
312
+ ### Scenario 5: 100% Offline (Local Python Embeddings + Ollama LLM)
313
+ Uses native SentenceTransformers locally on your CPU for vector search, and calls Ollama for the LLM consolidation sleep cycle.
314
+ ```bash
315
+ export SHAHEEN_LLM_PROVIDER="local" # Runs SentenceTransformers locally (no base URL needed for embeddings)
316
+
317
+ # Points the LLM client to Ollama (Defaults to http://localhost:11434/v1 automatically)
318
+ export SHAHEEN_LLM_MODEL="llama3"
319
+ export SHAHEEN_LLM_API_KEY="ollama" # Dummy key
320
+ ```
321
+
322
+ ---
323
+
324
+ ## πŸ›οΈ Systems & Engineering Design
325
+
326
+ * **ACID Transactions:** The Knowledge Graph (entities/edges) and vector embeddings reside in the *same SQLite database*. All consolidation updates happen in single database transactions. If an LLM extraction fails or gets interrupted, the database rolls back cleanly.
327
+ * **WAL Mode Concurrency:** Configured with SQLite Write-Ahead Logging. Your AI agent can stream messages and write sensory logs on the main thread, while the consolidation loop runs on a background scheduler without locking the database.
328
+ * **Lazy Loading:** Offline embedding dependencies (`sentence-transformers`) are only loaded if you explicitly set `SHAHEEN_LLM_PROVIDER="local"`. If you use cloud APIs or Ollama, your application starts instantly without PyTorch memory overhead.
329
+
330
+ ---
331
+
332
+ ## πŸ“œ License
333
+ Shaheen DB is open-source software licensed under the [MIT License](LICENSE).
@@ -0,0 +1,290 @@
1
+ # πŸ¦… Shaheen DB
2
+
3
+ [![PyPI version](https://img.shields.io/pypi/v/shaheen-db.svg)](https://pypi.org/project/shaheen-db/)
4
+ [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/)
5
+ [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
6
+ [![Tests](https://img.shields.io/badge/tests-10%20passed-brightgreen.svg)]()
7
+ [![Database](https://img.shields.io/badge/database-SQLite--WAL-orange.svg)]()
8
+ [![Vector Math](https://img.shields.io/badge/vector--math-NumPy%20Local-red.svg)]()
9
+
10
+ ```bash
11
+ pip install shaheen-db
12
+ ```
13
+
14
+ **Shaheen DB** (named after the Royal Falcon) is a lightweight, zero-ops, self-consolidating cognitive memory layer designed specifically for AI Agents.
15
+
16
+ Instead of treating agent memory as a massive, unstructured dumping ground of raw vector embeddingsβ€”which leads to bloated context windows, duplicate facts, high token costs, and retrieval noiseβ€”**Shaheen DB** implements an opinionated, biologically inspired cognitive memory pipeline.
17
+
18
+ ---
19
+
20
+ ## πŸ’‘ The Problem with Traditional Vector Databases
21
+ Most AI agents use standard vector databases (like Pinecone or Chroma) for memory. This approach has three fatal flaws in production:
22
+ 1. **Semantic Noise:** If a user mentions eating a croissant on Day 1, and asks about their business strategy on Day 30, a standard vector search will often retrieve the croissant log due to query overlaps. This pollutes the LLM's prompt.
23
+ 2. **Context Bloat:** Raw transcripts contain filler words, greetings, and temporary statements. Feeding these directly to LLMs burns tokens and incurs high latency.
24
+ 3. **No Fact Evolution:** If a user says "My name is Saif" on Day 1, and "Actually, write my name as Saif Al-Islam" on Day 5, a vector search will return both statements, forcing the LLM to guess which name is correct.
25
+
26
+ **Shaheen DB solves this by separating sensory ingestion from long-term factual consolidation, and automatically forgetting trivial logs over time.**
27
+
28
+ ---
29
+
30
+ ## 🧠 Core Memory Architecture
31
+
32
+ ```
33
+ [ Agent Conversations / Logs ]
34
+ β”‚
35
+ β–Ό (Immediate Ingestion < 5ms)
36
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
37
+ β”‚ Sensory Buffer (SQL) β”‚ ◄─── Instant Vector Match
38
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
39
+ β”‚
40
+ β–Ό (Background "Sleep" Loop)
41
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
42
+ β”‚ Consolidation Engine β”‚
43
+ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
44
+ β”‚ β”‚
45
+ (Extract facts) β”‚ β”‚ (Extract relationships)
46
+ β–Ό β–Ό
47
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
48
+ β”‚ Vectors β”‚ β”‚ Knowledge β”‚
49
+ β”‚ Index β”‚ β”‚ Graph β”‚
50
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
51
+ ```
52
+
53
+ 1. **Sensory Store (Short-term Memory):** Immediate, high-throughput writes (under 5ms) to an SQLite buffer running in Write-Ahead Logging (WAL) mode. Vectorized immediately for real-time semantic retrieval.
54
+ 2. **Consolidation Loop (The "Sleep" Cycle):** An asynchronous process that processes raw logs using structured LLM outputs to extract structured entities (nodes) and relationships (edges).
55
+ 3. **Cognitive Decay (Forgetting):** Applies mathematical exponential decay ($decay = e^{-\lambda \Delta t}$) to temporary sensory logs. Trivial information decays and is deleted, while pinned/permanent facts and organized graph connections are preserved permanently.
56
+ 4. **Hybrid GraphRAG Retrieval:** A unified retrieval query that automatically builds a context pack containing:
57
+ * **Recent Conversations:** Highly relevant active sensory logs.
58
+ * **Associated Entities & Facts:** Consolidated entities matching the query.
59
+ * **Associative Relationships:** Sub-graphs of connections between matching entities.
60
+
61
+ ---
62
+
63
+ ## ⚑ Features at a Glance
64
+
65
+ * **Zero-Ops GraphRAG:** Get the reasoning benefits of a Knowledge Graph and Vector Database in a single local SQLite file. No Neo4j servers, no Docker, no external cloud configurations.
66
+ * **Biologically Inspired Sleep Cycle:** Runs LLM extraction in the background to convert unstructured dialogue into structured relationships.
67
+ * **Native Memory Decay:** Automatic time-based cleanup of trivial logs to keep context window sizes minimal and token costs low.
68
+ * **Microsecond Graph Traversals:** Uses optimized SQL queries inside SQLite to fetch 1-hop and 2-hop entity neighborhoods in microseconds.
69
+ * **Dependency-Optional Offline Mode:** Support for local CPU/GPU embedding generation via `sentence-transformers` or local API calls to **Ollama** to avoid heavy PyTorch dependencies.
70
+
71
+ ---
72
+
73
+ ## πŸš€ Quickstart
74
+
75
+ ### 1. Installation
76
+ Install the package requirements:
77
+ ```bash
78
+ pip install -r requirements.txt
79
+ ```
80
+
81
+ ### 2. Basic Usage
82
+
83
+ ```python
84
+ from shaheen import Shaheen
85
+
86
+ # Initialize Shaheen DB (creates a local SQLite file)
87
+ db = Shaheen(db_path="memory.db")
88
+
89
+ # 1. Store sensory memories (fast write, immediate indexing)
90
+ db.remember("User: Hello! My name is Saif. I am a software engineer.")
91
+ db.remember("User: I am building Shaheen DB in Python to solve agent memory.")
92
+ db.remember("User: By the way, I have a pet falcon named Swift.", permanent=True) # Pinned context!
93
+
94
+ # 2. Trigger the "Sleep" cycle to organize memory into a graph
95
+ db.consolidate()
96
+
97
+ # 3. Recall coordinated memory (returns a unified GraphRAG context)
98
+ result = db.recall("Who is Saif and what is his pet?")
99
+
100
+ print(result["context_text"])
101
+ ```
102
+
103
+ ### 3. Output Context Pack Structure
104
+ The generated context pack is cleanly formatted, ready to be injected straight into your LLM prompt template:
105
+
106
+ ```markdown
107
+ ### Recent Conversations & Logs:
108
+ - [2026-05-20 08:31:13] User: By the way, I have a pet falcon named Swift. [Permanent] (decay relevance: 1.00, query match: 0.85)
109
+
110
+ ### Relevant Entities & Facts:
111
+ - saif (Person): role: software engineer (query match: 0.88)
112
+ - swift (Animal): type: falcon, owner: saif (query match: 0.84)
113
+ - shaheen-db (Software): purpose: AI agent memory, creator: saif (query match: 0.75)
114
+
115
+ ### Associative Relationships:
116
+ - saif --[BUILDING]--> shaheen-db (status: active)
117
+ - saif --[OWNER_OF]--> swift
118
+ ```
119
+
120
+ ---
121
+
122
+ ## ⏱️ Running Consolidation in Production
123
+
124
+ Because the "Sleep Cycle" (`db.consolidate()`) calls generative LLMs to extract information, it takes a few seconds to complete. You should **never** run it inline on the main thread during active user conversations, as it will slow down your agent's response time.
125
+
126
+ Instead, use one of the following production patterns:
127
+
128
+ ### Pattern A: Threshold-Based Trigger (Self-Sleep)
129
+ Automatically trigger consolidation in a background thread when the count of raw, unconsolidated memories reaches a threshold (e.g., every 15 conversation turns).
130
+
131
+ ```python
132
+ import threading
133
+
134
+ def handle_incoming_message(text: str):
135
+ # 1. Ingest immediately (takes < 5ms)
136
+ db.remember(f"User: {text}")
137
+
138
+ # 2. Trigger consolidation in the background if the threshold is met
139
+ unconsolidated_count = len(db.db.get_unconsolidated_memories())
140
+ if unconconsolidated_count >= 15:
141
+ threading.Thread(target=db.consolidate, daemon=True).start()
142
+
143
+ # 3. Retrieve context & query agent LLM
144
+ context = db.recall(text)
145
+ return agent.respond(text, context)
146
+ ```
147
+
148
+ ### Pattern B: Inactivity/Idle-Time Trigger
149
+ Trigger consolidation only when the user has stopped talking to the agent for a set duration (e.g., 10 minutes of inactivity).
150
+
151
+ ```python
152
+ import time
153
+ import threading
154
+
155
+ last_activity = time.time()
156
+ has_slept = False
157
+
158
+ def idle_watcher():
159
+ global last_activity, has_slept
160
+ while True:
161
+ time.sleep(30)
162
+ if time.time() - last_activity > 600 and not has_slept:
163
+ db.consolidate()
164
+ has_slept = True
165
+
166
+ # Start idle watcher thread on startup
167
+ threading.Thread(target=idle_watcher, daemon=True).start()
168
+ ```
169
+
170
+ ### Pattern C: Scheduled Cron Job
171
+ For multi-user SaaS setups, run a nightly script (e.g., at 2:00 AM) that iterates through all active user databases and triggers `.consolidate()`.
172
+ ```bash
173
+ # Run daily at 2:00 AM
174
+ 0 2 * * * python /app/scripts/trigger_nightly_consolidation.py
175
+ ```
176
+
177
+ ---
178
+
179
+ ## βš™οΈ Configuration Scenarios
180
+
181
+ > [!NOTE]
182
+ > **Base URL is Optional:** Standard OpenAI, Google Gemini, Anthropic, and local Ollama setups are auto-configured by default. You **do not** need to set `SHAHEEN_LLM_BASE_URL` unless you are using a third-party gateway like OpenRouter or Groq.
183
+
184
+ ### Supported Providers
185
+
186
+ | Provider | `SHAHEEN_LLM_PROVIDER` | Native SDK | Embedding Source | Notes |
187
+ | :--- | :--- | :--- | :--- | :--- |
188
+ | **OpenAI** | `openai` | βœ… | OpenAI API | Default provider |
189
+ | **Google Gemini** | `gemini` | βœ… | Gemini API | Uses `google-genai` SDK |
190
+ | **Anthropic Claude** | `anthropic` | βœ… | SentenceTransformers (local) | Anthropic has no embedding API |
191
+ | **Ollama (local)** | `openai` | Via base URL | Ollama API | Point base URL to `localhost:11434` |
192
+ | **Groq** | `openai` | Via base URL | Groq API | Ultra-fast inference |
193
+ | **OpenRouter** | `openai` | Via base URL | OpenRouter API | Access to 200+ models |
194
+ | **DeepSeek** | `openai` | Via base URL | DeepSeek API | Cost-effective cloud inference |
195
+ | **LM Studio** | `openai` | Via base URL | LM Studio API | Local GUI-based model runner |
196
+ | **SentenceTransformers** | `local` | βœ… | Python (in-process) | 100% offline, no API needed |
197
+
198
+ Shaheen DB is highly flexible and supports 100% cloud, hybrid, and 100% offline stacks. Configure it using these environment variables:
199
+
200
+ ### Scenario 1: 100% Google Gemini Cloud
201
+ ```bash
202
+ export SHAHEEN_LLM_PROVIDER="gemini"
203
+ export GEMINI_API_KEY="your-gemini-api-key"
204
+
205
+ # Defaults used automatically:
206
+ # SHAHEEN_LLM_MODEL="gemini-2.5-flash"
207
+ # SHAHEEN_EMBEDDING_MODEL="text-embedding-004"
208
+ ```
209
+
210
+ ### Scenario 2: 100% OpenAI Cloud
211
+ ```bash
212
+ export SHAHEEN_LLM_PROVIDER="openai"
213
+ export SHAHEEN_LLM_API_KEY="your-openai-key"
214
+
215
+ # Defaults used automatically:
216
+ # SHAHEEN_LLM_MODEL="gpt-4o-mini"
217
+ # SHAHEEN_EMBEDDING_MODEL="text-embedding-3-small"
218
+ ```
219
+
220
+ ### Scenario 3: Anthropic Claude (Native)
221
+ Uses the native Claude SDK with Tool Use to guarantee structured JSON extraction during the sleep cycle. Since Anthropic has no embedding API, vector search runs locally via SentenceTransformers.
222
+ ```bash
223
+ export SHAHEEN_LLM_PROVIDER="anthropic"
224
+ export ANTHROPIC_API_KEY="your-anthropic-key"
225
+
226
+ # Defaults used automatically:
227
+ # SHAHEEN_LLM_MODEL="claude-3-5-sonnet-20241022"
228
+ # SHAHEEN_EMBEDDING_MODEL="all-MiniLM-L6-v2" (runs locally, no API cost)
229
+ ```
230
+
231
+ ### Scenario 4: Universal Gateway (OpenRouter / Groq / DeepSeek)
232
+ Point Shaheen DB at any OpenAI-compatible provider to access hundreds of models with a single API key.
233
+ ```bash
234
+ export SHAHEEN_LLM_PROVIDER="openai"
235
+
236
+ # OpenRouter (access to 200+ models including Claude, Llama, Mistral, Gemini)
237
+ export SHAHEEN_LLM_BASE_URL="https://openrouter.ai/api/v1"
238
+ export SHAHEEN_LLM_API_KEY="your-openrouter-key"
239
+ export SHAHEEN_LLM_MODEL="meta-llama/llama-3-70b-instruct" # or any model slug
240
+
241
+ # Groq (ultra-fast Llama / Mixtral inference)
242
+ # export SHAHEEN_LLM_BASE_URL="https://api.groq.com/openai/v1"
243
+ # export SHAHEEN_LLM_API_KEY="your-groq-key"
244
+ # export SHAHEEN_LLM_MODEL="llama3-70b-8192"
245
+ ```
246
+
247
+ ### Scenario 3: Hybrid (Local Search + DeepSeek Cloud via OpenRouter)
248
+ Runs vector search 100% locally on your CPU (saving API costs), and only queries OpenRouter's cloud for the background consolidation sleep cycle.
249
+ ```bash
250
+ export SHAHEEN_LLM_PROVIDER="local" # Runs SentenceTransformers locally
251
+
252
+ export SHAHEEN_LLM_BASE_URL="https://openrouter.ai/api/v1"
253
+ export SHAHEEN_LLM_API_KEY="your-openrouter-key"
254
+ export SHAHEEN_LLM_MODEL="deepseek/deepseek-chat"
255
+ ```
256
+
257
+ ### Scenario 4: 100% Offline (Ollama-only, Lightweight)
258
+ Runs both embeddings and the LLM locally via Ollama. Keeps your Python environment lightweight (does NOT require PyTorch or `sentence-transformers`).
259
+ ```bash
260
+ export SHAHEEN_LLM_PROVIDER="openai"
261
+ export SHAHEEN_LLM_BASE_URL="http://localhost:11434/v1" # Required to redirect the API client to Ollama
262
+ export SHAHEEN_LLM_API_KEY="ollama" # Dummy key
263
+
264
+ export SHAHEEN_LLM_MODEL="llama3"
265
+ export SHAHEEN_EMBEDDING_MODEL="nomic-embed-text"
266
+ export SHAHEEN_EMBEDDING_DIM="768"
267
+ ```
268
+
269
+ ### Scenario 5: 100% Offline (Local Python Embeddings + Ollama LLM)
270
+ Uses native SentenceTransformers locally on your CPU for vector search, and calls Ollama for the LLM consolidation sleep cycle.
271
+ ```bash
272
+ export SHAHEEN_LLM_PROVIDER="local" # Runs SentenceTransformers locally (no base URL needed for embeddings)
273
+
274
+ # Points the LLM client to Ollama (Defaults to http://localhost:11434/v1 automatically)
275
+ export SHAHEEN_LLM_MODEL="llama3"
276
+ export SHAHEEN_LLM_API_KEY="ollama" # Dummy key
277
+ ```
278
+
279
+ ---
280
+
281
+ ## πŸ›οΈ Systems & Engineering Design
282
+
283
+ * **ACID Transactions:** The Knowledge Graph (entities/edges) and vector embeddings reside in the *same SQLite database*. All consolidation updates happen in single database transactions. If an LLM extraction fails or gets interrupted, the database rolls back cleanly.
284
+ * **WAL Mode Concurrency:** Configured with SQLite Write-Ahead Logging. Your AI agent can stream messages and write sensory logs on the main thread, while the consolidation loop runs on a background scheduler without locking the database.
285
+ * **Lazy Loading:** Offline embedding dependencies (`sentence-transformers`) are only loaded if you explicitly set `SHAHEEN_LLM_PROVIDER="local"`. If you use cloud APIs or Ollama, your application starts instantly without PyTorch memory overhead.
286
+
287
+ ---
288
+
289
+ ## πŸ“œ License
290
+ Shaheen DB is open-source software licensed under the [MIT License](LICENSE).