@199-bio/engram 0.1.0

package/.env.example ADDED
@@ -0,0 +1,19 @@
+ # Engram Configuration
+ # Copy this file to .env and fill in your values
+
+ # Required: Path to store the database
+ ENGRAM_DB_PATH=~/.engram
+
+ # Optional: Cloud API keys for enhanced quality
+ # These are optional - Engram works fully offline without them
+ # GEMINI_API_KEY=your-gemini-api-key
+ # COHERE_API_KEY=your-cohere-api-key
+
+ # Optional: Model configuration
+ # COLBERT_MODEL=colbert-ir/colbertv2.0
+ # EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
+
+ # Optional: Performance tuning
+ # MAX_MEMORY_CACHE=1000
+ # RETRIEVAL_TOP_K=50
+ # RERANK_TOP_K=10
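Once loaded, these variables are plain strings, so whatever reads them has to supply defaults, parse the numeric settings and expand `~` itself. The following is a minimal sketch, not Engram's actual loader: the `EngramConfig` interface, `loadConfig` and helper names are hypothetical, and treating the commented example values as defaults is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical typed view of the variables in .env.example.
interface EngramConfig {
  dbPath: string;
  geminiApiKey?: string;
  cohereApiKey?: string;
  colbertModel: string;
  embeddingModel: string;
  maxMemoryCache: number;
  retrievalTopK: number;
  rerankTopK: number;
}

// .env loaders such as dotenv do no shell expansion, so "~" must be resolved here.
function expandHome(p: string): string {
  if (p === "~") return os.homedir();
  if (p.startsWith("~/")) return path.join(os.homedir(), p.slice(2));
  return p;
}

// Parse a positive integer, falling back when the variable is unset or malformed.
function positiveIntOr(value: string | undefined, fallback: number): number {
  const n = value === undefined ? Number.NaN : Number.parseInt(value, 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Defaults mirror the commented example values (an assumption, not confirmed behavior).
function loadConfig(env: Record<string, string | undefined> = process.env): EngramConfig {
  return {
    dbPath: expandHome(env.ENGRAM_DB_PATH ?? "~/.engram"),
    geminiApiKey: env.GEMINI_API_KEY || undefined, // empty string counts as unset
    cohereApiKey: env.COHERE_API_KEY || undefined,
    colbertModel: env.COLBERT_MODEL ?? "colbert-ir/colbertv2.0",
    embeddingModel: env.EMBEDDING_MODEL ?? "Qwen/Qwen3-Embedding-0.6B",
    maxMemoryCache: positiveIntOr(env.MAX_MEMORY_CACHE, 1000),
    retrievalTopK: positiveIntOr(env.RETRIEVAL_TOP_K, 50),
    rerankTopK: positiveIntOr(env.RERANK_TOP_K, 10),
  };
}
```

Falling back on malformed numbers (rather than throwing) keeps the server usable with a half-edited `.env`; a stricter loader could reject bad values at startup instead.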
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Boris Djordjevic, 199 Biotechnologies
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/LIVING_PLAN.md ADDED
@@ -0,0 +1,180 @@
+ # Engram Development - Living Plan
+
+ **Last Updated**: 2024-12-22 03:50 UTC
+
+ This file tracks development progress. If context is lost, read this file to continue.
+
+ ---
+
+ ## Current Status: Phase 5 - Production Ready
+
+ ### Completed
+ - [x] Project structure created
+ - [x] package.json, tsconfig.json, .gitignore, LICENSE
+ - [x] SQLite storage layer (`src/storage/database.ts`)
+ - Memories table with FTS5 for BM25
+ - Entities, Observations, Relations tables
+ - Graph traversal queries
+ - All CRUD operations
+ - [x] Entity extractor (`src/graph/extractor.ts`)
+ - Heuristic-based name extraction
+ - Organization detection (Goldman Sachs, etc.)
+ - Known organizations database
+ - Relationship extraction
+ - No external dependencies
+ - [x] Knowledge graph manager (`src/graph/knowledge-graph.ts`)
+ - High-level graph operations
+ - Auto-extraction from text
+ - Graph traversal
+ - [x] ColBERT Python bridge (`src/retrieval/colbert-bridge.py`)
+ - RAGatouille integration
+ - JSON stdin/stdout protocol
+ - [x] TypeScript ColBERT wrapper (`src/retrieval/colbert.ts`)
+ - Subprocess management
+ - Fallback SimpleRetriever when Python unavailable
+ - [x] Hybrid search (`src/retrieval/hybrid.ts`)
+ - BM25 + Semantic + Graph
+ - Reciprocal Rank Fusion (RRF)
+ - [x] MCP server with all tools (`src/index.ts`)
+ - remember, recall, forget
+ - create_entity, observe, relate, query_entity, list_entities
+ - stats
+ - [x] Install dependencies and build
+ - [x] Test end-to-end with fictional examples (11 tests pass)
+ - [x] Entity extraction improvements
+ - Goldman Sachs correctly detected as organization
+ - Known organizations database
+ - Place filtering (California, etc.)
+ - Nationality/religion filtering
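The storage layer uses SQLite FTS5 for BM25. Raw user text can contain FTS5 query syntax (double quotes, `*`, `:`, or keywords such as `NOT` and `NEAR`), so a common defensive step is to quote every term before passing it to `MATCH`. This is a sketch under assumptions: the `memories_fts` and `memories` table names and `toFtsQuery` are hypothetical, and OR-joining terms is one choice among several. FTS5's `bm25()` returns numerically smaller values for better matches, hence the ascending sort.

```typescript
// Turn free text into an FTS5 MATCH expression: every whitespace-separated
// token becomes a quoted string (embedded quotes doubled, as FTS5 requires),
// and tokens are OR-ed so BM25 can still rank partial matches.
// Returns "" for blank input; callers should skip the query in that case.
function toFtsQuery(text: string): string {
  return text
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((t) => `"${t.replace(/"/g, '""')}"`)
    .join(" OR ");
}

// Hypothetical table and column names; bm25() is smaller for better matches,
// so ascending order puts the best hits first.
const SEARCH_SQL = `
  SELECT m.id, m.content, bm25(memories_fts) AS score
  FROM memories_fts
  JOIN memories m ON m.rowid = memories_fts.rowid
  WHERE memories_fts MATCH ?
  ORDER BY score
  LIMIT ?`;
```

With a driver such as better-sqlite3 this would run as `db.prepare(SEARCH_SQL).all(toFtsQuery(q), 50)`, binding both the query and the limit as parameters rather than interpolating them.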
+
+ ### Verified Working
+ - All 11 MCP test cases pass
+ - BM25 search working (FTS5)
+ - Graph-based entity linking working
+ - ColBERT Python bridge working
+ - Entity extraction correctly identifies orgs vs persons
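The ColBERT bridge talks to Python over a JSON stdin/stdout protocol. Whatever the exact message format, stdout arrives in arbitrary chunks that need not line up with message boundaries, so the TypeScript side has to buffer partial data. A minimal sketch, assuming the bridge writes one JSON object per line; `JsonLineDecoder` is a hypothetical name, not part of `src/retrieval/colbert.ts`:

```typescript
// Accumulates raw stdout chunks from a child process and returns complete
// newline-delimited JSON messages; a trailing partial line waits for the next chunk.
class JsonLineDecoder<T = unknown> {
  private buffer = "";

  push(chunk: string): T[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // last element is incomplete (or "")
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as T);
  }
}
```

In use it would sit behind `child.stdout.setEncoding("utf8")` and `child.stdout.on("data", (c: string) => decoder.push(c).forEach(handleResponse))`, where `handleResponse` is whatever matches replies to pending requests; setting the encoding ensures multi-byte characters are not split across chunks.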
56
+
57
+ ---
58
+
59
+ ## File Structure
60
+
61
+ ```
62
+ engram/
63
+ ├── src/
64
+ │ ├── index.ts # MCP server (DONE)
65
+ │ ├── storage/
66
+ │ │ ├── database.ts # SQLite + FTS5 (DONE)
67
+ │ │ └── index.ts # Exports (DONE)
68
+ │ ├── graph/
69
+ │ │ ├── extractor.ts # Entity extraction (DONE)
70
+ │ │ ├── knowledge-graph.ts # Graph operations (DONE)
71
+ │ │ └── index.ts # Exports (DONE)
72
+ │ ├── retrieval/
73
+ │ │ ├── colbert.ts # TypeScript wrapper (DONE)
74
+ │ │ ├── colbert-bridge.py # Python RAGatouille (DONE)
75
+ │ │ ├── hybrid.ts # RRF fusion (DONE)
76
+ │ │ └── index.ts # Exports (DONE)
77
+ ├── tests/
78
+ │ ├── test-interactive.js # Full test suite (DONE)
79
+ │ └── test-mcp.sh # Shell test script (DONE)
80
+ ├── dist/ # Compiled JS (auto-generated)
81
+ ├── package.json # Dependencies (DONE)
82
+ ├── tsconfig.json # TypeScript config (DONE)
83
+ ├── README.md # Documentation (DONE)
84
+ └── LIVING_PLAN.md # This file (DONE)
85
+ ```
86
+
87
+ ---
88
+
89
+ ## MCP Tools Available
90
+
91
+ 1. **remember** - Store a new memory, auto-extracts entities
92
+ 2. **recall** - Hybrid search (BM25 + semantic + graph)
93
+ 3. **forget** - Remove a memory by ID
94
+ 4. **create_entity** - Manually create an entity
95
+ 5. **observe** - Add an observation about an entity
96
+ 6. **relate** - Create a relationship between entities
97
+ 7. **query_entity** - Get entity details and relationships
98
+ 8. **list_entities** - List all entities by type
99
+ 9. **stats** - Get memory/entity/relation counts
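Each tool above is invoked through an MCP `tools/call` JSON-RPC request. Over the stdio transport, messages are newline-delimited JSON, and a client is expected to complete the `initialize` handshake before calling tools. The sketch below builds such a stream; the `recall` argument name `query` is an assumption (check the schema returned by `tools/list`), and `buildToolCall` and the client name are hypothetical.

```typescript
// Minimal JSON-RPC 2.0 message shape used by MCP over stdio.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number; // omitted for notifications
  method: string;
  params?: Record<string, unknown>;
}

// Build the newline-delimited stream a client would write to the server's stdin:
// initialize handshake, the initialized notification, then one tools/call.
function buildToolCall(tool: string, args: Record<string, unknown>): string {
  const messages: JsonRpcMessage[] = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // one published MCP revision; servers may negotiate
        capabilities: {},
        clientInfo: { name: "manual-test", version: "0.0.0" },
      },
    },
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: tool, arguments: args } },
  ];
  // JSON.stringify without indentation never emits raw newlines,
  // so each message stays on exactly one line.
  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
}
```

Writing `buildToolCall("recall", { query: "Goldman Sachs" })` to the server's stdin exercises a single tool end to end; the response with `id: 2` carries the tool result.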
+
+ ---
+
+ ## Key Decisions
+
+ 1. **ColBERT via Python**: RAGatouille is proven, well-maintained. Use subprocess.
+ 2. **BM25 via SQLite FTS5**: Already implemented, zero deps.
+ 3. **Local-first**: No API keys required.
+ 4. **Entity extraction**: Heuristics + known org database. Can add GLiNER later.
+ 5. **Hybrid Search**: RRF fusion with k=60 constant.
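Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, with 1-based ranks. A minimal sketch with k = 60, assuming each retriever (BM25, semantic, graph) returns an ordered list of unique memory IDs; `reciprocalRankFusion` is a hypothetical name, not the code in `src/retrieval/hybrid.ts`.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// score(d) = sum over rankings of 1 / (k + rank(d)), ranks starting at 1.
// A document absent from a list contributes nothing for that list.
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

For example, fusing `["a", "b", "c"]`, `["b", "a"]` and `["b"]` ranks `b` first even though `a` tops the first list: the large k flattens the gap between neighbouring ranks, so agreement across retrievers outweighs a single first place.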
+
+ ---
+
+ ## Testing Commands
+
+ ```bash
+ # Build TypeScript
+ cd /Users/biobook/Code/stuff/engram
+ npm install
+ npm run build
+
+ # Run full test suite
+ node tests/test-interactive.js
+
+ # Test MCP server manually
+ echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | node dist/index.js
+
+ # Install as MCP for Claude Desktop
+ # Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
+ # {
+ # "mcpServers": {
+ # "engram": {
+ # "command": "node",
+ # "args": ["/Users/biobook/Code/stuff/engram/dist/index.js"]
+ # }
+ # }
+ # }
+ ```
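Hand-editing the Claude Desktop config risks overwriting other servers already registered under `mcpServers`. A small merge step avoids that. This is a sketch only: `withEngramServer` and the interfaces are hypothetical, and only the `command` and `args` fields from the snippet above are assumed.

```typescript
// Shape of one MCP server entry, limited to the fields used in the snippet above.
interface McpServerEntry {
  command: string;
  args: string[];
}

// The config file may hold other top-level keys; keep them untouched.
interface DesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a new config with an "engram" entry added (or replaced),
// preserving every other server and top-level setting.
function withEngramServer(config: DesktopConfig, distIndexPath: string): DesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      engram: { command: "node", args: [distIndexPath] },
    },
  };
}
```

The caller would `JSON.parse` the existing file, pass it through this function with an absolute path to `dist/index.js`, and write it back with `JSON.stringify(result, null, 2)`; Claude Desktop typically needs a restart to pick up the change.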
+
+ ---
+
+ ## Known Limitations
+
+ - Windows not supported (RAGatouille limitation)
+ - ColBERT models are ~500MB (downloaded on first use)
+ - BM25 scores for named entities are low (graph search compensates)
+ - Place extraction not implemented (California detected as person)
+
+ ---
+
+ ## Future Enhancements
+
+ - [ ] GLiNER for better NER
+ - [ ] Gemini embeddings (optional cloud enhancement)
+ - [ ] Cohere reranking (optional cloud enhancement)
+ - [ ] Temporal memory decay
+ - [ ] Memory consolidation (merge similar memories)
+ - [ ] Export/import functionality
+
+ ---
+
+ ## To Continue Development
+
+ If starting fresh, run these commands:
+
+ ```bash
+ cd /Users/biobook/Code/stuff/engram
+ cat LIVING_PLAN.md # Read this file
+ npm run build # Rebuild if needed
+ node tests/test-interactive.js # Run tests
+ ```
+
+ ---
+
+ ## API Keys Needed
+
+ **NONE** - This is a local-first implementation.
+
+ Optional (for future cloud enhancement):
+ - GEMINI_API_KEY - embeddings
+ - COHERE_API_KEY - reranking