htm 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
- data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
- data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
- data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
- data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
- data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
- data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
- data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
- data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
- data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
- data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
- data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
- data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
- data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
- data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
- data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
- data/.architecture/members.yml +144 -0
- data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
- data/.architecture/reviews/initial-system-analysis.md +330 -0
- data/.envrc +32 -0
- data/.irbrc +145 -0
- data/CHANGELOG.md +150 -0
- data/COMMITS.md +196 -0
- data/LICENSE +21 -0
- data/README.md +1347 -0
- data/Rakefile +51 -0
- data/SETUP.md +268 -0
- data/config/database.yml +67 -0
- data/db/migrate/20250101000001_enable_extensions.rb +14 -0
- data/db/migrate/20250101000002_create_robots.rb +14 -0
- data/db/migrate/20250101000003_create_nodes.rb +42 -0
- data/db/migrate/20250101000005_create_tags.rb +38 -0
- data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
- data/db/schema.sql +473 -0
- data/db/seed_data/README.md +100 -0
- data/db/seed_data/presidents.md +136 -0
- data/db/seed_data/states.md +151 -0
- data/db/seeds.rb +208 -0
- data/dbdoc/README.md +173 -0
- data/dbdoc/public.node_stats.md +48 -0
- data/dbdoc/public.node_stats.svg +41 -0
- data/dbdoc/public.node_tags.md +40 -0
- data/dbdoc/public.node_tags.svg +112 -0
- data/dbdoc/public.nodes.md +54 -0
- data/dbdoc/public.nodes.svg +118 -0
- data/dbdoc/public.nodes_tags.md +39 -0
- data/dbdoc/public.nodes_tags.svg +112 -0
- data/dbdoc/public.ontology_structure.md +48 -0
- data/dbdoc/public.ontology_structure.svg +38 -0
- data/dbdoc/public.operations_log.md +42 -0
- data/dbdoc/public.operations_log.svg +130 -0
- data/dbdoc/public.relationships.md +39 -0
- data/dbdoc/public.relationships.svg +41 -0
- data/dbdoc/public.robot_activity.md +46 -0
- data/dbdoc/public.robot_activity.svg +35 -0
- data/dbdoc/public.robots.md +35 -0
- data/dbdoc/public.robots.svg +90 -0
- data/dbdoc/public.schema_migrations.md +29 -0
- data/dbdoc/public.schema_migrations.svg +26 -0
- data/dbdoc/public.tags.md +35 -0
- data/dbdoc/public.tags.svg +60 -0
- data/dbdoc/public.topic_relationships.md +45 -0
- data/dbdoc/public.topic_relationships.svg +32 -0
- data/dbdoc/schema.json +1437 -0
- data/dbdoc/schema.svg +154 -0
- data/docs/api/database.md +806 -0
- data/docs/api/embedding-service.md +532 -0
- data/docs/api/htm.md +797 -0
- data/docs/api/index.md +259 -0
- data/docs/api/long-term-memory.md +1096 -0
- data/docs/api/working-memory.md +665 -0
- data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
- data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
- data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
- data/docs/architecture/adrs/004-hive-mind.md +437 -0
- data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
- data/docs/architecture/adrs/006-context-assembly.md +496 -0
- data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
- data/docs/architecture/adrs/008-robot-identification.md +625 -0
- data/docs/architecture/adrs/009-never-forget.md +648 -0
- data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
- data/docs/architecture/adrs/011-pgai-integration.md +494 -0
- data/docs/architecture/adrs/index.md +215 -0
- data/docs/architecture/hive-mind.md +736 -0
- data/docs/architecture/index.md +351 -0
- data/docs/architecture/overview.md +538 -0
- data/docs/architecture/two-tier-memory.md +873 -0
- data/docs/assets/css/custom.css +83 -0
- data/docs/assets/images/htm-core-components.svg +63 -0
- data/docs/assets/images/htm-database-schema.svg +93 -0
- data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
- data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
- data/docs/assets/images/htm-layered-architecture.svg +71 -0
- data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
- data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
- data/docs/assets/images/htm.jpg +0 -0
- data/docs/assets/images/htm_demo.gif +0 -0
- data/docs/assets/js/mathjax.js +18 -0
- data/docs/assets/videos/htm_video.mp4 +0 -0
- data/docs/database_rake_tasks.md +322 -0
- data/docs/development/contributing.md +787 -0
- data/docs/development/index.md +336 -0
- data/docs/development/schema.md +596 -0
- data/docs/development/setup.md +719 -0
- data/docs/development/testing.md +819 -0
- data/docs/guides/adding-memories.md +824 -0
- data/docs/guides/context-assembly.md +1009 -0
- data/docs/guides/getting-started.md +577 -0
- data/docs/guides/index.md +118 -0
- data/docs/guides/long-term-memory.md +941 -0
- data/docs/guides/multi-robot.md +866 -0
- data/docs/guides/recalling-memories.md +927 -0
- data/docs/guides/search-strategies.md +953 -0
- data/docs/guides/working-memory.md +717 -0
- data/docs/index.md +214 -0
- data/docs/installation.md +477 -0
- data/docs/multi_framework_support.md +519 -0
- data/docs/quick-start.md +655 -0
- data/docs/setup_local_database.md +302 -0
- data/docs/using_rake_tasks_in_your_app.md +383 -0
- data/examples/basic_usage.rb +93 -0
- data/examples/cli_app/README.md +317 -0
- data/examples/cli_app/htm_cli.rb +270 -0
- data/examples/custom_llm_configuration.rb +183 -0
- data/examples/example_app/Rakefile +71 -0
- data/examples/example_app/app.rb +206 -0
- data/examples/sinatra_app/Gemfile +21 -0
- data/examples/sinatra_app/app.rb +335 -0
- data/lib/htm/active_record_config.rb +113 -0
- data/lib/htm/configuration.rb +342 -0
- data/lib/htm/database.rb +594 -0
- data/lib/htm/embedding_service.rb +115 -0
- data/lib/htm/errors.rb +34 -0
- data/lib/htm/job_adapter.rb +154 -0
- data/lib/htm/jobs/generate_embedding_job.rb +65 -0
- data/lib/htm/jobs/generate_tags_job.rb +82 -0
- data/lib/htm/long_term_memory.rb +965 -0
- data/lib/htm/models/node.rb +109 -0
- data/lib/htm/models/node_tag.rb +33 -0
- data/lib/htm/models/robot.rb +52 -0
- data/lib/htm/models/tag.rb +76 -0
- data/lib/htm/railtie.rb +76 -0
- data/lib/htm/sinatra.rb +157 -0
- data/lib/htm/tag_service.rb +135 -0
- data/lib/htm/tasks.rb +38 -0
- data/lib/htm/version.rb +5 -0
- data/lib/htm/working_memory.rb +182 -0
- data/lib/htm.rb +400 -0
- data/lib/tasks/db.rake +19 -0
- data/lib/tasks/htm.rake +147 -0
- data/lib/tasks/jobs.rake +312 -0
- data/mkdocs.yml +190 -0
- data/scripts/install_local_database.sh +309 -0
- metadata +341 -0
data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md
@@ -0,0 +1,339 @@

# ADR-003: Ollama as Default Embedding Provider

**Status**: Accepted

**Date**: 2025-10-25

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

## Context

HTM requires vector embeddings for semantic search. Embeddings convert text into high-dimensional vectors that capture semantic meaning, enabling similarity search beyond keyword matching.
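
To make "similarity search" concrete, here is a minimal sketch of the comparison that happens once embeddings exist; `embed` stands in for any provider call, and the scores are illustrative:

```ruby
# Cosine similarity between two embedding vectors (plain Ruby arrays).
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y }))
end

# Semantically related texts score higher than unrelated ones:
cosine_similarity(embed("dog"), embed("puppy"))   # => high, e.g. ~0.8
cosine_similarity(embed("dog"), embed("invoice")) # => low,  e.g. ~0.2
```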

Requirements:

- Generate embeddings for memory nodes
- Support semantic similarity search
- Consistent embedding dimensions (1536 recommended)
- Reasonable latency (< 1 second per embedding)
- Cost-effective for development and production
- Privacy-preserving (sensitive data handling)

Options considered:

1. **OpenAI**: text-embedding-3-small, excellent quality
2. **Ollama**: Local models (gpt-oss, nomic-embed-text), privacy-first
3. **Cohere**: embed-english-v3.0, good performance
4. **Anthropic**: No native embedding API (yet)
5. **Sentence Transformers**: Local Python models via API
6. **Voyage AI**: Specialized embeddings, high quality

## Decision

We will use **Ollama with the gpt-oss model** as the default embedding provider for HTM, while supporting pluggable alternatives (OpenAI, Cohere, etc.).

## Rationale

### Why Ollama?

**Local-first approach**:

- ✅ Runs on the user's machine (an M2 Mac handles it well)
- ✅ No API costs during development
- ✅ No internet dependency once models are downloaded
- ✅ Fast iteration without rate limits

**Privacy-preserving**:

- ✅ Data never leaves the user's machine
- ✅ Critical for sensitive conversations
- ✅ No terms-of-service restrictions
- ✅ Full control over data

**Developer-friendly**:

- ✅ Simple installation (`ollama pull gpt-oss`)
- ✅ HTTP API at localhost:11434
- ✅ Multiple model support
- ✅ Growing ecosystem

**Cost-effective**:

- ✅ Zero ongoing costs
- ✅ Pay once for compute (user's hardware)
- ✅ No per-token pricing
- ✅ Predictable operational costs

### Why the gpt-oss Model?

**Technical characteristics**:

- Vector dimension: 1536 (matches OpenAI text-embedding-3-small)
- Speed: ~100-300ms per embedding on an M2 Mac
- Quality: Good semantic understanding for general text
- Size: Reasonable model size (~274MB)

**Compatibility**:

- Same dimension as OpenAI (easier migration)
- Works with pgvector, which supports any dimension (see the sketch below)
- Compatible with other tools expecting 1536d vectors
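
A minimal pgvector sketch of what this compatibility means in practice; the table and column names are illustrative, not HTM's actual schema (that lives in data/db/schema.sql):

```sql
-- A 1536-dimension embedding column, matching gpt-oss/OpenAI output.
CREATE TABLE example_nodes (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT,
  embedding vector(1536)
);

-- Nearest neighbors by cosine distance (pgvector's <=> operator);
-- bind the 1536-d query embedding as $1.
SELECT id, content
FROM example_nodes
ORDER BY embedding <=> $1
LIMIT 5;
```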

### Why Not Alternatives?

**OpenAI text-embedding-3-small**:

- ❌ Costs $0.02 per 1M tokens (~$20 for 1M embeddings of ~1,000 tokens each)
- ❌ Requires API key and internet
- ❌ Data sent to OpenAI (privacy concern)
- ❌ Rate limits (3000 RPM on Tier 1)
- ✅ Excellent quality
- ✅ Proven at scale

**Cohere embed-english-v3.0**:

- ❌ Similar cost/privacy concerns as OpenAI
- ❌ Requires API key
- ✅ Good quality
- ✅ Multilingual support

**Local Sentence Transformers**:

- ❌ Requires Python runtime
- ❌ More complex deployment
- ❌ Python/Ruby interop overhead
- ✅ No external dependencies
- ✅ Good quality

## Implementation Details

### EmbeddingService Architecture
```ruby
require 'net/http'
require 'uri'
require 'json'

class EmbeddingService
  def initialize(provider = :ollama, model: 'gpt-oss', ollama_url: nil)
    @provider = provider
    @model = model
    @ollama_url = ollama_url || ENV['OLLAMA_URL'] || 'http://localhost:11434'
  end

  def embed(text)
    case @provider
    when :ollama
      embed_ollama(text)
    when :openai
      embed_openai(text)
    when :cohere
      embed_cohere(text)
    end
  end

  private

  def embed_ollama(text)
    # Direct HTTP call to Ollama's embeddings API
    response = Net::HTTP.post(
      URI("#{@ollama_url}/api/embeddings"),
      {model: @model, prompt: text}.to_json,
      {'Content-Type' => 'application/json'}
    )
    JSON.parse(response.body)['embedding']
  end
end
```

### Fallback Strategy
If Ollama is unavailable, `embed_ollama` rescues the failure and returns a stub vector:
```ruby
def embed_ollama(text)
  # ... HTTP call shown above ...
rescue => e
  warn "Error generating embedding with Ollama: #{e.message}"
  warn "Falling back to stub embeddings (random vectors)"
  warn "Please ensure Ollama is running: curl http://localhost:11434/api/version"
  Array.new(1536) { rand(-1.0..1.0) } # stub embedding
end
```

### User Configuration
```ruby
# Default: Ollama with gpt-oss
htm = HTM.new(robot_name: "My Robot")

# Explicit Ollama configuration
htm = HTM.new(
  robot_name: "My Robot",
  embedding_service: :ollama,
  embedding_model: 'gpt-oss'
)

# Use a different Ollama model
htm = HTM.new(
  robot_name: "My Robot",
  embedding_service: :ollama,
  embedding_model: 'nomic-embed-text' # 768d model
)

# Use OpenAI (requires implementation + API key)
htm = HTM.new(
  robot_name: "My Robot",
  embedding_service: :openai
)
```

## Consequences

### Positive

✅ **Zero cost**: No API fees for embedding generation
✅ **Privacy-first**: Data stays local
✅ **Fast iteration**: No rate limits during development
✅ **Offline capable**: Works without internet
✅ **Simple setup**: One command to install the model
✅ **Flexible**: Easy to swap providers later

### Negative

❌ **Setup required**: Users must install Ollama and pull the model
❌ **Hardware dependency**: Requires a decent CPU/GPU (an M2 Mac is sufficient)
❌ **Quality trade-off**: Not quite OpenAI quality (acceptable for most use cases)
❌ **Compatibility**: Users on older hardware may struggle
❌ **Debugging**: Local issues are harder to diagnose than API errors

### Neutral

➡️ **Model choice**: gpt-oss is a reasonable default, but users can experiment
➡️ **Version drift**: Ollama model updates may change embeddings
➡️ **Dimension flexibility**: Could support other dimensions with schema changes

## Risks and Mitigations

### Risk: Ollama Not Installed

- **Risk**: Users try to use HTM without Ollama
- **Likelihood**: High (on first run)
- **Impact**: High (no embeddings, broken search)
- **Mitigation**:
  - Clear error messages with installation instructions
  - Fallback to stub embeddings (with warning)
  - Check Ollama availability in setup script

### Risk: Model Not Downloaded

- **Risk**: Ollama installed but gpt-oss model not pulled
- **Likelihood**: Medium
- **Impact**: High (embedding generation fails)
- **Mitigation**:
  - Setup script checks for the model
  - Error message includes `ollama pull gpt-oss`
  - Document in README and SETUP.md

### Risk: Performance on Low-end Hardware

- **Risk**: Slow embedding generation on older machines
- **Likelihood**: Medium
- **Impact**: Medium (poor user experience)
- **Mitigation**:
  - Document minimum requirements
  - Provide alternative providers
  - Batch embedding generation where possible

### Risk: Model Quality Insufficient

- **Risk**: gpt-oss embeddings not good enough for semantic search
- **Likelihood**: Low (the model is decent)
- **Impact**: Medium (degraded search quality)
- **Mitigation**:
  - Provide easy provider switching
  - Document trade-offs
  - Consider hybrid search (full-text + vector)

## Performance Characteristics

### Ollama (gpt-oss on M2 Mac)

- **Latency**: 100-300ms per embedding
- **Throughput**: ~5-10 embeddings/second
- **Memory**: ~500MB for the model
- **CPU**: Moderate (benefits from Apple Silicon)

### OpenAI (for comparison)

- **Latency**: 50-150ms (network + API)
- **Throughput**: Limited by rate limits (3000 RPM = 50/sec)
- **Cost**: $0.02 per 1M tokens
- **Quality**: Slightly better semantic understanding

## Setup Instructions

Documented in SETUP.md:
```bash
# Install Ollama
curl https://ollama.ai/install.sh | sh

# Or download from: https://ollama.ai/download

# Pull the gpt-oss model
ollama pull gpt-oss

# Verify Ollama is running
curl http://localhost:11434/api/version
```

## Migration Path

### To OpenAI

1. Set up an OpenAI API key
2. Change initialization:
```ruby
htm = HTM.new(embedding_service: :openai)
```
3. Re-embed existing nodes, since embeddings from different models are not compatible (see the sketch below)
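
A hypothetical re-embedding pass over stored nodes; it assumes the ActiveRecord-backed `HTM::Models::Node` with `value` (text) and `embedding` columns, plus the `EmbeddingService` sketched above — illustrative, not a confirmed HTM API:

```ruby
# Regenerate every node's embedding with the new provider.
service = EmbeddingService.new(:openai, model: 'text-embedding-3-small')

HTM::Models::Node.find_each do |node|
  node.update!(embedding: service.embed(node.value))
end
```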

### To Cohere
Similar process; implement the `embed_cohere` method.

### To Custom URL
```ruby
htm = HTM.new(
  embedding_service: :ollama,
  ollama_url: 'http://custom-host:11434'
)
```

## Future Enhancements

1. **Batch embedding**: Generate multiple embeddings in one call
2. **Caching**: Cache embeddings for identical text (sketched below)
3. **Dimension flexibility**: Support models with different dimensions
4. **Quality metrics**: Compare embedding quality across providers
5. **Auto-selection**: Choose the best available provider
6. **Fallback chain**: Try Ollama, then OpenAI, then stub
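
A sketch of enhancement 2 under simple assumptions (an in-process hash keyed by a SHA-256 digest of the text); this is not part of the current gem:

```ruby
require 'digest'

# Wraps any object responding to #embed and memoizes results,
# so identical text is only embedded once per process.
class CachingEmbeddingService
  def initialize(service)
    @service = service
    @cache = {} # digest => embedding
  end

  def embed(text)
    key = Digest::SHA256.hexdigest(text)
    @cache[key] ||= @service.embed(text)
  end
end

cached = CachingEmbeddingService.new(EmbeddingService.new(:ollama))
cached.embed("same text") # computed once
cached.embed("same text") # served from the cache
```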

## Alternatives Considered

| Provider | Quality | Cost | Privacy | Decision |
|----------|---------|------|---------|----------|
| Ollama (gpt-oss) | Good | Free | ✅ Local | ✅ **Default** |
| OpenAI | Excellent | $0.02/1M | ❌ Cloud | ✅ Optional |
| Cohere | Excellent | $0.10/1M | ❌ Cloud | ✅ Optional |
| Sentence Transformers | Good | Free | ✅ Local | ⏸️ Future |
| Voyage AI | Excellent | $0.12/1M | ❌ Cloud | ❌ Rejected |

## References

- [Ollama Documentation](https://ollama.ai/)
- [gpt-oss Model](https://ollama.ai/library/gpt-oss)
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [pgvector Documentation](https://github.com/pgvector/pgvector)
- [HTM Embedding Service](../../lib/htm/embedding_service.rb)

## Review Notes

**AI Engineer**: ✅ Local-first approach is excellent for privacy. Consider batch embedding for performance.

**Performance Specialist**: ✅ 100-300ms is acceptable. Monitor for bottlenecks with large recall operations.

**Security Specialist**: ✅ Privacy-preserving by default. Ensure users are aware of trade-offs when switching to cloud providers.

**Ruby Expert**: ✅ Clean abstraction. Consider using Faraday for HTTP calls for better connection management.

**Systems Architect**: ✅ Pluggable design allows easy provider switching. Good balance of pragmatism and flexibility.
data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md
@@ -0,0 +1,374 @@

# ADR-004: Multi-Robot Shared Memory (Hive Mind)

**Status**: Accepted

**Date**: 2025-10-25

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

## Context

In LLM-based applications, users often interact with multiple "robots" (AI agents) over time. These robots may serve different purposes (coding assistant, research assistant, chat companion) or represent different instances of the same application across sessions.

Challenges with isolated memory:

- Each robot has independent context
- User repeats information across robots
- No cross-robot learning
- Conversations fragmented across agents
- Lost context when switching robots

Alternative approaches:

1. **Isolated memory**: Each robot has completely separate memory
2. **Shared memory (hive mind)**: All robots access a global memory pool
3. **Hierarchical memory**: Per-robot memory + shared global memory
4. **Explicit sharing**: User chooses what to share across robots

## Decision

We will implement a **shared memory (hive mind) architecture** where all robots access a single global memory database, with attribution tracking to identify which robot contributed each memory.

## Rationale

### Why Shared Memory?

**Context continuity**:

- User doesn't repeat themselves across robots
- "You" refers to the user consistently
- Preferences persist across sessions
- Conversation history accessible to all

**Cross-robot learning**:

- Knowledge gained by one robot benefits all
- Architectural decisions visible to coding assistants
- Research findings available to writers
- Bug fixes remembered globally

**Simplified data model**:

- Single source of truth
- No synchronization complexity
- Unified search across all conversations
- Consistent robot registry

**User experience**:

- Seamless switching between robots
- Coherent memory across interactions
- No need to "catch up" new robots
- Transparent collaboration

### Attribution Tracking

Every node stores a `robot_id`:
```sql
CREATE TABLE nodes (
  ...
  robot_id TEXT NOT NULL,
  ...
);
```

Benefits:

- Track which robot said what
- Debug conversation attribution
- Analyze robot behavior patterns
- Support privacy controls (future)

### Hive Mind Queries

```ruby
# Which robot discussed this topic?
breakdown = htm.which_robot_said("PostgreSQL")
# => { "robot-123" => 15, "robot-456" => 8 }

# Get chronological conversation
timeline = htm.conversation_timeline("HTM design", limit: 50)
# => [{ timestamp: ..., robot: "...", content: "..." }, ...]
```
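
One way `which_robot_said` could be realized over the attributed `nodes` table, assuming the ActiveRecord `HTM::Models::Node` model; a sketch, not HTM's confirmed implementation (input escaping omitted for brevity):

```ruby
def which_robot_said(topic)
  # Count matching nodes per robot; .group(...).count returns
  # a hash like { "robot-123" => 15, "robot-456" => 8 }.
  HTM::Models::Node
    .where("value ILIKE ?", "%#{topic}%")
    .group(:robot_id)
    .count
end
```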

### Robot Registry

```sql
CREATE TABLE robots (
  id TEXT PRIMARY KEY,
  name TEXT,
  created_at TIMESTAMP,
  last_active TIMESTAMP,
  metadata JSONB
);
```

Tracks all robots using the system:
- Registration on first use
- Activity timestamps
- Custom metadata (configuration, purpose, etc.)

## Implementation Details

### Robot Initialization
```ruby
htm = HTM.new(
  robot_name: "Code Helper",
  robot_id: "robot-123" # optional, auto-generated if not provided
)

# Registers robot in database
@long_term_memory.register_robot(@robot_id, @robot_name)
```
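
A minimal sketch of what registration might look like against the `robots` table above, assuming a raw `pg` connection in `@db`; HTM's actual implementation lives in lib/htm/long_term_memory.rb:

```ruby
def register_robot(robot_id, robot_name)
  # Insert on first use; on later runs just refresh last_active.
  @db.exec_params(<<~SQL, [robot_id, robot_name])
    INSERT INTO robots (id, name, created_at, last_active)
    VALUES ($1, $2, NOW(), NOW())
    ON CONFLICT (id) DO UPDATE SET last_active = NOW()
  SQL
end
```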

### Adding Memories with Attribution
```ruby
def add_node(key, value, ...)
  node_id = @long_term_memory.add(
    key: key,
    value: value,
    robot_id: @robot_id, # Attribution
    ...
  )
end
```

### Querying by Robot
```sql
-- All nodes by a specific robot
SELECT * FROM nodes WHERE robot_id = 'robot-123';

-- Breakdown by robot
SELECT robot_id, COUNT(*)
FROM nodes
WHERE value ILIKE '%PostgreSQL%'
GROUP BY robot_id;
```

### Working Memory: Per-Robot

**Important distinction**: While long-term memory is shared globally, working memory is per-robot instance (per-process):

```ruby
class HTM
  def initialize(...)
    @working_memory = WorkingMemory.new(max_tokens: 128_000) # Per-instance
    @long_term_memory = LongTermMemory.new(db_config) # Shared database
  end
end
```

Each robot has:
- **Own working memory**: Token-limited, process-local
- **Shared long-term memory**: Durable, global PostgreSQL

This design provides:

- Fast local access (working memory)
- Global knowledge sharing (long-term memory)
- Process isolation (no cross-process RAM access needed)
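
A sketch of the per-instance side, assuming a simple oldest-first eviction once the token budget is exceeded; HTM's actual eviction strategy is specified in ADR-007:

```ruby
class SimpleWorkingMemory
  def initialize(max_tokens:)
    @max_tokens = max_tokens
    @entries = [] # [{ text:, tokens: }, ...] in arrival order
  end

  def add(text, tokens:)
    @entries << { text: text, tokens: tokens }
    # Evict oldest entries until the budget fits again.
    @entries.shift while total_tokens > @max_tokens
  end

  private

  def total_tokens
    @entries.sum { |e| e[:tokens] }
  end
end
```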

## Consequences

### Positive

✅ **Seamless context**: User never repeats information
✅ **Cross-robot learning**: Knowledge compounds across agents
✅ **Conversation attribution**: Clear ownership of memories
✅ **Unified search**: Find information regardless of which robot stored it
✅ **Simplified architecture**: Single database, no synchronization
✅ **Activity tracking**: Monitor robot usage patterns
✅ **Debugging**: Trace memories back to the source robot

### Negative

❌ **Privacy complexity**: All robots see all data (no isolation)
❌ **Namespace conflicts**: Key collisions across robots (mitigated by UUID keys)
❌ **Context pollution**: Irrelevant memories from other robots
❌ **Testing complexity**: Shared state is harder to isolate in tests
❌ **Multi-tenancy**: No built-in tenant isolation (future requirement)

### Neutral

➡️ **Global namespace**: Requires coordination for key naming
➡️ **Robot identity**: User must provide meaningful robot names
➡️ **Memory attribution**: "Who said this?" vs. "What was said?"

## Design Decisions

### Decision: Global by Default
**Rationale**: Simplicity and user experience trump isolation. Users can implement privacy layers on top if needed.

**Alternative**: Per-robot namespaces with opt-in sharing
**Rejected**: Adds complexity, defeats the purpose of the hive mind

### Decision: Robot ID Required
**Rationale**: Essential for attribution and debugging

**Alternative**: Optional robot_id
**Rejected**: Loses critical context and debugging capability

### Decision: Working Memory Per-Process
**Rationale**: Avoids distributed state synchronization complexity

**Alternative**: Shared working memory (Redis)
**Deferred**: Consider for multi-process/multi-host scenarios

## Risks and Mitigations

### Risk: Context Pollution

- **Risk**: Robot sees irrelevant memories from other robots
- **Likelihood**: Medium (depends on use patterns)
- **Impact**: Medium (degraded relevance)
- **Mitigation**:
  - Importance scoring helps filter
  - Robot-specific recall filters (future)
  - Category/tag-based filtering
  - Smart context assembly

### Risk: Privacy Violations

- **Risk**: Sensitive data accessible to all robots
- **Likelihood**: Low (single-user scenario)
- **Impact**: High (if multi-user)
- **Mitigation**:
  - Document the single-user assumption
  - Add row-level security for multi-tenant (future)
  - Encryption for sensitive data (future)

### Risk: Key Collisions

- **Risk**: Different robots use the same key for different data
- **Likelihood**: Low (UUID recommendations)
- **Impact**: Medium (data corruption)
- **Mitigation**:
  - Recommend UUIDs or prefixed keys (example below)
  - Unique constraint on the key column
  - Error handling for collisions
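
An illustrative key scheme for that first mitigation; `robot_id` and the `add_node` call mirror the examples elsewhere in this ADR:

```ruby
require 'securerandom'

# Prefix keys with the robot id plus a UUID so two robots can
# never generate the same key.
key = "#{robot_id}:#{SecureRandom.uuid}"
# => e.g. "robot-123:0f8fad5b-d9cb-469f-a165-70867728950e"
htm.add_node(key, "some memory", type: :fact)
```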

### Risk: Unbounded Growth

- **Risk**: Memory grows indefinitely with multiple robots
- **Likelihood**: High (no automatic cleanup)
- **Impact**: Medium (storage costs, query slowdown)
- **Mitigation**:
  - Retention policies (future)
  - Archival strategies
  - Importance-based pruning (future)

## Use Cases

### Use Case 1: Cross-Session Context
```ruby
# Session 1 - Robot A
htm_a = HTM.new(robot_name: "Code Helper A")
htm_a.add_node("user_pref_001", "User prefers debug_me over puts",
               type: :preference)

# Session 2 - Robot B (different process, later time)
htm_b = HTM.new(robot_name: "Code Helper B")
memories = htm_b.recall(timeframe: "last week", topic: "debugging")
# => Finds preference from Robot A
```

### Use Case 2: Collaborative Development
```ruby
# Robot A (architecture discussion)
htm_a.add_node("decision_001",
               "We decided to use PostgreSQL for storage",
               type: :decision)

# Robot B (implementation)
htm_b.recall(timeframe: "today", topic: "database")
# => Finds architectural decision from Robot A
```

### Use Case 3: Activity Analysis
```sql
-- Which robot has been most active?
SELECT robot_id, COUNT(*) AS contributions
FROM nodes
GROUP BY robot_id
ORDER BY contributions DESC;

-- What did each robot contribute this week?
SELECT r.name, COUNT(n.id) AS memories_added
FROM robots r
JOIN nodes n ON n.robot_id = r.id
WHERE n.created_at > NOW() - INTERVAL '7 days'
GROUP BY r.name;
```

## Future Enhancements

### Privacy Controls
```ruby
# Mark memories as private to a specific robot
htm.add_node("private_key", "sensitive data",
             visibility: :private) # Only accessible to this robot

# Or shared with specific robots
htm.add_node("shared_key", "team data",
             visibility: [:shared, robot_ids: ['robot-a', 'robot-b']])
```

### Robot Groups/Teams
```ruby
# Group robots by purpose
htm.add_robot_to_group("robot-123", "coding-team")
htm.add_robot_to_group("robot-456", "research-team")

# Query by group
memories = htm.recall(robot_group: "coding-team", topic: "APIs")
```

### Multi-Tenancy
```ruby
# Tenant isolation
htm = HTM.new(
  robot_name: "Helper",
  tenant_id: "user-abc123" # Row-level security
)
```

## Alternatives Considered

### Isolated Memory (Per-Robot)
**Pros**: Complete isolation, no pollution, simpler privacy
**Cons**: User repeats information, no cross-robot learning
**Decision**: ❌ Rejected - defeats the purpose of persistent memory

### Hierarchical Memory (Per-Robot + Global)
**Pros**: Best of both worlds, explicit sharing
**Cons**: Complex synchronization, unclear semantics
**Decision**: ❌ Rejected - too complex for v1

### Explicit Sharing
**Pros**: User controls what's shared
**Cons**: Friction, user burden, complexity
**Decision**: ❌ Rejected - simplicity and UX trump control

### Federated Memory (P2P)
**Pros**: Distributed, no central database
**Cons**: Sync complexity, consistency challenges
**Decision**: ❌ Rejected - unnecessary complexity

## References

- [Collective Intelligence](https://en.wikipedia.org/wiki/Collective_intelligence)
- [Hive Mind Concept](https://en.wikipedia.org/wiki/Hive_mind)
- [Multi-Agent Systems](https://en.wikipedia.org/wiki/Multi-agent_system)
- [HTM Robot Registry](../../lib/htm/long_term_memory.rb)

## Review Notes

**Systems Architect**: ✅ Simple and effective for the single-user scenario. Plan for multi-tenancy early.

**Domain Expert**: ✅ The hive mind metaphor maps well to a shared knowledge base. Consider robot personality/role in memory interpretation.

**Security Specialist**: ⚠️ The single-user assumption is critical. Document it clearly and add tenant isolation before production multi-user deployment.

**AI Engineer**: ✅ Cross-robot context sharing improves LLM effectiveness. Monitor for context pollution in practice.

**Database Architect**: ✅ `robot_id` indexing will scale well. Consider partitioning by robot_id if one robot dominates.