htm 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
- data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
- data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
- data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
- data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
- data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
- data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
- data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
- data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
- data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
- data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
- data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
- data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
- data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
- data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
- data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
- data/.architecture/members.yml +144 -0
- data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
- data/.architecture/reviews/initial-system-analysis.md +330 -0
- data/.envrc +32 -0
- data/.irbrc +145 -0
- data/CHANGELOG.md +150 -0
- data/COMMITS.md +196 -0
- data/LICENSE +21 -0
- data/README.md +1347 -0
- data/Rakefile +51 -0
- data/SETUP.md +268 -0
- data/config/database.yml +67 -0
- data/db/migrate/20250101000001_enable_extensions.rb +14 -0
- data/db/migrate/20250101000002_create_robots.rb +14 -0
- data/db/migrate/20250101000003_create_nodes.rb +42 -0
- data/db/migrate/20250101000005_create_tags.rb +38 -0
- data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
- data/db/schema.sql +473 -0
- data/db/seed_data/README.md +100 -0
- data/db/seed_data/presidents.md +136 -0
- data/db/seed_data/states.md +151 -0
- data/db/seeds.rb +208 -0
- data/dbdoc/README.md +173 -0
- data/dbdoc/public.node_stats.md +48 -0
- data/dbdoc/public.node_stats.svg +41 -0
- data/dbdoc/public.node_tags.md +40 -0
- data/dbdoc/public.node_tags.svg +112 -0
- data/dbdoc/public.nodes.md +54 -0
- data/dbdoc/public.nodes.svg +118 -0
- data/dbdoc/public.nodes_tags.md +39 -0
- data/dbdoc/public.nodes_tags.svg +112 -0
- data/dbdoc/public.ontology_structure.md +48 -0
- data/dbdoc/public.ontology_structure.svg +38 -0
- data/dbdoc/public.operations_log.md +42 -0
- data/dbdoc/public.operations_log.svg +130 -0
- data/dbdoc/public.relationships.md +39 -0
- data/dbdoc/public.relationships.svg +41 -0
- data/dbdoc/public.robot_activity.md +46 -0
- data/dbdoc/public.robot_activity.svg +35 -0
- data/dbdoc/public.robots.md +35 -0
- data/dbdoc/public.robots.svg +90 -0
- data/dbdoc/public.schema_migrations.md +29 -0
- data/dbdoc/public.schema_migrations.svg +26 -0
- data/dbdoc/public.tags.md +35 -0
- data/dbdoc/public.tags.svg +60 -0
- data/dbdoc/public.topic_relationships.md +45 -0
- data/dbdoc/public.topic_relationships.svg +32 -0
- data/dbdoc/schema.json +1437 -0
- data/dbdoc/schema.svg +154 -0
- data/docs/api/database.md +806 -0
- data/docs/api/embedding-service.md +532 -0
- data/docs/api/htm.md +797 -0
- data/docs/api/index.md +259 -0
- data/docs/api/long-term-memory.md +1096 -0
- data/docs/api/working-memory.md +665 -0
- data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
- data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
- data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
- data/docs/architecture/adrs/004-hive-mind.md +437 -0
- data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
- data/docs/architecture/adrs/006-context-assembly.md +496 -0
- data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
- data/docs/architecture/adrs/008-robot-identification.md +625 -0
- data/docs/architecture/adrs/009-never-forget.md +648 -0
- data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
- data/docs/architecture/adrs/011-pgai-integration.md +494 -0
- data/docs/architecture/adrs/index.md +215 -0
- data/docs/architecture/hive-mind.md +736 -0
- data/docs/architecture/index.md +351 -0
- data/docs/architecture/overview.md +538 -0
- data/docs/architecture/two-tier-memory.md +873 -0
- data/docs/assets/css/custom.css +83 -0
- data/docs/assets/images/htm-core-components.svg +63 -0
- data/docs/assets/images/htm-database-schema.svg +93 -0
- data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
- data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
- data/docs/assets/images/htm-layered-architecture.svg +71 -0
- data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
- data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
- data/docs/assets/images/htm.jpg +0 -0
- data/docs/assets/images/htm_demo.gif +0 -0
- data/docs/assets/js/mathjax.js +18 -0
- data/docs/assets/videos/htm_video.mp4 +0 -0
- data/docs/database_rake_tasks.md +322 -0
- data/docs/development/contributing.md +787 -0
- data/docs/development/index.md +336 -0
- data/docs/development/schema.md +596 -0
- data/docs/development/setup.md +719 -0
- data/docs/development/testing.md +819 -0
- data/docs/guides/adding-memories.md +824 -0
- data/docs/guides/context-assembly.md +1009 -0
- data/docs/guides/getting-started.md +577 -0
- data/docs/guides/index.md +118 -0
- data/docs/guides/long-term-memory.md +941 -0
- data/docs/guides/multi-robot.md +866 -0
- data/docs/guides/recalling-memories.md +927 -0
- data/docs/guides/search-strategies.md +953 -0
- data/docs/guides/working-memory.md +717 -0
- data/docs/index.md +214 -0
- data/docs/installation.md +477 -0
- data/docs/multi_framework_support.md +519 -0
- data/docs/quick-start.md +655 -0
- data/docs/setup_local_database.md +302 -0
- data/docs/using_rake_tasks_in_your_app.md +383 -0
- data/examples/basic_usage.rb +93 -0
- data/examples/cli_app/README.md +317 -0
- data/examples/cli_app/htm_cli.rb +270 -0
- data/examples/custom_llm_configuration.rb +183 -0
- data/examples/example_app/Rakefile +71 -0
- data/examples/example_app/app.rb +206 -0
- data/examples/sinatra_app/Gemfile +21 -0
- data/examples/sinatra_app/app.rb +335 -0
- data/lib/htm/active_record_config.rb +113 -0
- data/lib/htm/configuration.rb +342 -0
- data/lib/htm/database.rb +594 -0
- data/lib/htm/embedding_service.rb +115 -0
- data/lib/htm/errors.rb +34 -0
- data/lib/htm/job_adapter.rb +154 -0
- data/lib/htm/jobs/generate_embedding_job.rb +65 -0
- data/lib/htm/jobs/generate_tags_job.rb +82 -0
- data/lib/htm/long_term_memory.rb +965 -0
- data/lib/htm/models/node.rb +109 -0
- data/lib/htm/models/node_tag.rb +33 -0
- data/lib/htm/models/robot.rb +52 -0
- data/lib/htm/models/tag.rb +76 -0
- data/lib/htm/railtie.rb +76 -0
- data/lib/htm/sinatra.rb +157 -0
- data/lib/htm/tag_service.rb +135 -0
- data/lib/htm/tasks.rb +38 -0
- data/lib/htm/version.rb +5 -0
- data/lib/htm/working_memory.rb +182 -0
- data/lib/htm.rb +400 -0
- data/lib/tasks/db.rake +19 -0
- data/lib/tasks/htm.rake +147 -0
- data/lib/tasks/jobs.rake +312 -0
- data/mkdocs.yml +190 -0
- data/scripts/install_local_database.sh +309 -0
- metadata +341 -0
@@ -0,0 +1,569 @@

# ADR-014: Client-Side Embedding Generation Workflow

**Status**: ~~Accepted~~ **SUPERSEDED** (2025-10-29)

**Superseded By**: ADR-016 (Async Embedding and Tag Generation)

**Date**: 2025-10-29

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

---

## ⚠️ DECISION SUPERSEDED (2025-10-29)

**This ADR has been superseded by ADR-016.**

**Reason**: Synchronous embedding generation before save added 50-100ms latency to node creation. The async approach (ADR-016) provides much better user experience:
- Node saved immediately (~15ms)
- Embedding generated in background job
- User doesn't wait for LLM operations

See [ADR-016: Async Embedding and Tag Generation](./016-async-embedding-and-tag-generation.md) for current architecture.

---

## Context (Historical)

After the reversal of ADR-011 (database-side embedding generation with pgai), HTM returned to client-side embedding generation. However, the specific workflow, timing, and error handling strategies for embedding generation were not formally documented.

This ADR establishes the canonical approach for when, how, and where embeddings are generated in the HTM architecture.

### Key Questions

1. **When**: When are embeddings generated during the node lifecycle?
2. **Where**: Client-side (Ruby) vs. database-side (PostgreSQL)?
3. **How**: Synchronous vs. asynchronous generation?
4. **Error Handling**: What happens if embedding generation fails?
5. **Updates**: When/how are embeddings regenerated?
6. **Dimensions**: How are variable embedding dimensions handled?

---

## Decision

HTM will generate embeddings **client-side in Ruby before database insertion** using the `EmbeddingService` class, with **synchronous generation** and **graceful degradation** on failures.

### Embedding Generation Workflow

```ruby
# 1. Application creates content
content = "PostgreSQL with pgvector provides vector similarity search"

# 2. EmbeddingService generates embedding BEFORE database operation
embedding_service = HTM::EmbeddingService.new(:ollama, model: 'nomic-embed-text')
embedding = embedding_service.embed(content)  # Array<Float>, e.g. 768 dimensions

# 3. Embedding included in database INSERT
ltm.add(
  content: content,
  speaker: 'user',
  robot_id: robot.id,
  embedding: embedding,  # Pre-generated
  embedding_dimension: embedding.length
)

# 4. PostgreSQL stores embedding in vector column
# nodes.embedding::vector(2000)
```

### Key Principles

**Principle 1: Pre-Generation**
- Embeddings generated in application code BEFORE database operation
- Never rely on database triggers for embedding generation
- Embedding passed to database as parameter, not generated in-database

**Principle 2: Synchronous by Default**
- Embeddings generated synchronously in request path
- Acceptable latency (50-100ms per embedding with local Ollama)
- Simplifies error handling and debugging

**Principle 3: Graceful Degradation**
- If embedding generation fails, node still inserted (with `embedding: nil`)
- Background job can retry embedding generation later
- Nodes without embeddings excluded from vector search results

**Principle 4: Dimension Flexibility**
- Support embeddings from 1 to 2000 dimensions
- Store actual dimension in `embedding_dimension` column
- Validate dimension doesn't exceed database column limit (2000)
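
The dimension check in Principle 4 could be sketched as follows. This is a minimal illustration, not HTM's actual code: `validate_embedding!`, `EmbeddingDimensionError`, and `MAX_EMBEDDING_DIMENSION` are hypothetical names, and only the 1 to 2000 range comes from this ADR. A `nil` embedding passes through so that graceful degradation (Principle 3) still works.

```ruby
# Hypothetical helper; names are illustrative and not part of the HTM API.
MAX_EMBEDDING_DIMENSION = 2000 # column limit stated in this ADR

class EmbeddingDimensionError < StandardError; end

# Returns the embedding's dimension, or nil when no embedding was generated.
# Raises EmbeddingDimensionError when the vector cannot be stored.
def validate_embedding!(embedding, max: MAX_EMBEDDING_DIMENSION)
  return nil if embedding.nil? # graceful degradation: node stored without embedding

  unless embedding.is_a?(Array) && embedding.all? { |v| v.is_a?(Numeric) }
    raise EmbeddingDimensionError, "embedding must be an Array of numbers"
  end

  unless (1..max).cover?(embedding.length)
    raise EmbeddingDimensionError,
          "embedding has #{embedding.length} dimensions; expected 1..#{max}"
  end

  embedding.length
end
```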

---

## Rationale

### Why Client-Side?

**Developer Experience**:
- ✅ Works reliably on all platforms (macOS, Linux, Cloud)
- ✅ Simple setup (just Ollama + Ruby gem)
- ✅ Easy debugging (errors visible in Ruby stack traces)
- ✅ No PostgreSQL extension dependencies

**Code Clarity**:
- ✅ Explicit embedding generation visible in code
- ✅ Easy to mock/stub in tests
- ✅ Clear separation: Ruby generates, PostgreSQL stores
- ✅ Embedding logic can be modified without database migrations

**Operational Simplicity**:
- ✅ Unified architecture (no local vs. cloud split)
- ✅ No database trigger management
- ✅ Connection pooling handled by Ruby HTTP library
- ✅ Retry logic in application layer (more flexible)

### Why Synchronous?

**Performance Acceptable**:
- Local Ollama: ~50ms per embedding (nomic-embed-text)
- Batch operations: Can optimize with connection reuse
- Most operations add single nodes (not bulk)

**Simpler Error Handling**:
- Immediate feedback if embedding fails
- Can present error to user or log synchronously
- No need for background job infrastructure for simple case

**Consistency**:
- Embedding available immediately after insertion
- No window where node exists but has no embedding
- Vector search works immediately after node creation

### Why Graceful Degradation?

**Reliability**:
- Ollama service may be down temporarily
- Network issues may prevent embedding generation
- Node data is more valuable than embedding

**Recovery**:
- Background job can retry embedding generation
- Manual re-embedding possible through an explicit application call such as `regenerate_embedding(node_id)` (Strategy 2); a raw `UPDATE nodes SET content = content` regenerates nothing, because no database trigger produces embeddings
- Query can filter for nodes missing embeddings

---

## Implementation

### EmbeddingService API

Located in `lib/htm/embedding_service.rb`:

```ruby
class HTM::EmbeddingService
  # Initialize with provider and model
  def initialize(provider = :ollama, model: nil, ollama_url: nil, dimensions: nil)
    @provider = provider  # :ollama or :openai
    @model = model || default_model_for_provider(provider)
    @ollama_url = ollama_url || ENV['OLLAMA_URL'] || 'http://localhost:11434'
    @dimensions = dimensions || KNOWN_DIMENSIONS[@model] || 768
  end

  # Generate embedding for text (synchronous)
  # @param text [String] Content to embed
  # @return [Array<Float>] Embedding vector
  # @raise [HTM::EmbeddingError] If generation fails
  def embed(text)
    case @provider
    when :ollama
      embed_with_ollama(text)  # HTTP POST to Ollama API
    when :openai
      embed_with_openai(text)  # HTTP POST to OpenAI API
    end
  end

  # Get expected embedding dimensions for current model
  # @return [Integer] Dimension count
  def embedding_dimensions
    @dimensions
  end

  # Count tokens in text (for working memory management)
  # @param text [String] Text to count
  # @return [Integer] Token count
  def count_tokens(text)
    @tokenizer.encode(text.to_s).length
  end
end
```

### LongTermMemory Integration

Located in `lib/htm/long_term_memory.rb`:

```ruby
class HTM::LongTermMemory
  def add(content:, speaker:, robot_id:, embedding: nil, **options)
    # Embedding is OPTIONAL parameter
    # If not provided, node inserted without embedding
    # If provided, must be Array<Float> with length <= 2000

    node = HTM::Models::Node.create!(
      content: content,
      speaker: speaker,
      robot_id: robot_id,
      embedding: embedding,  # Can be nil
      embedding_dimension: embedding&.length,
      **options
    )

    node
  end
end
```

### Error Handling

```ruby
class HTM
  def add_message(content, speaker: 'user', type: nil, **options)
    # Generate embedding with error handling
    begin
      embedding = @embedding_service.embed(content)
    rescue HTM::EmbeddingError => e
      # Log error but continue with node insertion
      warn "Embedding generation failed: #{e.message}"
      embedding = nil  # Node will be created without embedding
    end

    # Insert node (with or without embedding)
    node = @ltm.add(
      content: content,
      speaker: speaker,
      robot_id: @robot.id,
      type: type,
      embedding: embedding,
      embedding_dimension: embedding&.length,
      **options
    )

    # Add to working memory
    @working_memory.add(node)

    node
  end
end
```

### Vector Search Behavior

```ruby
def vector_search(query_text:, limit: 10)
  # Generate query embedding
  query_embedding = @embedding_service.embed(query_text)

  # pgvector expects a vector literal such as '[0.1,0.2,...]'; a Ruby Array
  # bound directly to "?" would expand to a bare comma-separated list
  vector_literal = "[#{query_embedding.join(',')}]"
  distance_sql = HTM::Models::Node.sanitize_sql_array(
    ["embedding <=> ?::vector", vector_literal]
  )

  # Search only nodes WITH embeddings
  HTM::Models::Node
    .where.not(embedding: nil)  # Exclude nodes without embeddings
    .order(Arel.sql(distance_sql))
    .limit(limit)
end
```
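
For intuition about the ordering above: pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity, so smaller values mean closer matches. The following plain-Ruby sketch reproduces that calculation outside the database. `cosine_distance` and the sample vectors are illustrative only and not part of HTM.

```ruby
# Cosine distance as computed by pgvector's <=> operator: 1 - cosine similarity.
# A zero-length (all zeros) vector yields NaN here, since its norm is 0.
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.length == b.length

  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Rank a few toy "nodes" against a query, nearest first (like ORDER BY ... <=> ...)
query = [1.0, 0.0]
nodes = {
  "same direction" => [2.0, 0.0],
  "orthogonal"     => [0.0, 3.0],
  "opposite"       => [-1.0, 0.0]
}
ranked = nodes.sort_by { |_name, vec| cosine_distance(query, vec) }.map(&:first)
```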

---

## Embedding Update Strategies

### Strategy 1: Content Change Detection

```ruby
class HTM::Models::Node < ActiveRecord::Base
  before_update :regenerate_embedding_if_content_changed

  private

  def regenerate_embedding_if_content_changed
    if content_changed? && HTM.embedding_service
      new_embedding = HTM.embedding_service.embed(content)
      self.embedding = new_embedding
      self.embedding_dimension = new_embedding.length
    end
  end
end
```

**Trade-offs**:
- ✅ Automatic embedding regeneration on content change
- ❌ Embedding service must be globally accessible
- ❌ Adds latency to UPDATE operations

### Strategy 2: Explicit Re-Embedding

```ruby
class HTM
  def regenerate_embedding(node_id)
    node = HTM::Models::Node.find(node_id)
    embedding = @embedding_service.embed(node.content)

    node.update!(
      embedding: embedding,
      embedding_dimension: embedding.length
    )
  end

  def regenerate_all_embeddings
    HTM::Models::Node.find_each do |node|
      regenerate_embedding(node.id)
    end
  end
end
```

**Trade-offs**:
- ✅ Explicit control over when embeddings regenerate
- ✅ Can batch operations efficiently
- ❌ Manual intervention required

### Strategy 3: Background Job (Future)

```ruby
class EmbeddingRegenerationJob
  def perform(node_id)
    node = HTM::Models::Node.find(node_id)
    return if node.embedding.present?  # Skip if already has embedding

    embedding = HTM::EmbeddingService.new.embed(node.content)
    node.update!(
      embedding: embedding,
      embedding_dimension: embedding.length
    )
  end
end
```

**Trade-offs**:
- ✅ Non-blocking embedding generation
- ✅ Can retry failures automatically
- ❌ Requires background job infrastructure (Sidekiq, etc.)

**Current Decision**: Use **Strategy 2 (Explicit Re-Embedding)** for simplicity.

---

## Embedding Provider Configuration

### Ollama (Default)

```ruby
# Default configuration
embedding_service = HTM::EmbeddingService.new(:ollama)
# Uses:
# - Model: nomic-embed-text (768 dimensions)
# - URL: http://localhost:11434
# - Requires: Ollama running locally

# Custom configuration
embedding_service = HTM::EmbeddingService.new(
  :ollama,
  model: 'mxbai-embed-large',  # 1024 dimensions
  ollama_url: ENV['OLLAMA_URL']
)
```

**Requirements**:
- Ollama installed and running
- Model pulled: `ollama pull nomic-embed-text`
- Accessible at configured URL

### OpenAI

```ruby
# Configure OpenAI
embedding_service = HTM::EmbeddingService.new(
  :openai,
  model: 'text-embedding-3-small'  # 1536 dimensions
)
# Requires: ENV['OPENAI_API_KEY'] set
```

**Requirements**:
- `OPENAI_API_KEY` environment variable
- Internet connectivity
- API rate limits considered

---

## Consequences

### Positive

✅ **Simple and reliable**: Works consistently across all environments
✅ **Debuggable**: Errors occur in Ruby code with full stack traces
✅ **Flexible**: Easy to modify embedding logic without database changes
✅ **Testable**: Can mock EmbeddingService in tests
✅ **No extensions**: No PostgreSQL extension dependencies
✅ **Graceful degradation**: System works even if embeddings fail
✅ **Dimension flexibility**: Supports 1-2000 dimension embeddings

### Negative

❌ **Latency**: 50-100ms per embedding (vs. potential database-side optimization)
❌ **HTTP overhead**: Ruby → Ollama HTTP call for each embedding
❌ **Memory**: Embedding array held in Ruby memory before database insert
❌ **No automatic updates**: Embeddings not automatically regenerated on content change

### Neutral

➡️ **Provider coupling**: Application chooses provider, not database
➡️ **Connection management**: Ruby HTTP client handles connections
➡️ **Error visibility**: Failures visible in application logs, not database logs

---

## Performance Characteristics

### Benchmarks (M2 Mac, Ollama local, nomic-embed-text)

| Operation | Time | Notes |
|-----------|------|-------|
| Generate single embedding | ~50ms | HTTP round-trip to Ollama |
| Insert node with embedding | ~60ms | 50ms embed + 10ms INSERT |
| Batch 100 nodes | ~6s | ~60ms each (can optimize with connection reuse) |
| Vector search (10 results) | ~30ms | HNSW index efficient |

### Optimization Opportunities

**Connection Pooling**:
```ruby
# Reuse HTTP connection for multiple embeddings
Net::HTTP.start(uri.hostname, uri.port) do |http|
  nodes.each do |node|
    embedding = generate_embedding(http, node.content)
    insert_node(node, embedding)
  end
end
```

**Parallel Generation** (Future):
```ruby
# Generate embeddings in parallel for batch operations
threads = nodes.map do |node|
  Thread.new { [node, embedding_service.embed(node.content)] }
end

results = threads.map(&:value)  # [node, embedding] pairs
```
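
The thread-per-node pattern opens one concurrent request per node, which can overwhelm a local Ollama instance on large batches. One possible variation, sketched here under assumptions, caps concurrency with a fixed pool of workers reading from a queue. `embed_in_parallel` is a hypothetical helper, not HTM code; the block stands in for a call such as `embedding_service.embed(text)`.

```ruby
# Hypothetical bounded-concurrency helper; not part of the HTM API.
# Runs the given block for each item on at most `workers` threads and
# returns results in the same order as the input.
def embed_in_parallel(items, workers: 4)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # pop returns nil once the queue is drained

  results = Array.new(items.length)
  threads = Array.new(workers) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        results[index] = yield(item) # e.g. embedding_service.embed(item)
      end
    end
  end

  threads.each(&:join) # re-raises any exception raised inside a worker
  results
end

# Usage with a stand-in "embedding" block (real code would call the service)
fake_embeddings = embed_in_parallel(%w[a bb ccc], workers: 2) { |text| [text.length.to_f] }
```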

---

## Risks and Mitigations

### Risk: Ollama Service Down

**Risk**: Embedding generation fails if Ollama not running
- **Likelihood**: Medium (local development)
- **Impact**: Medium (nodes created without embeddings)
- **Mitigation**:
  - Graceful degradation (nodes still created)
  - Health check endpoint for Ollama
  - Clear error messages with troubleshooting steps
  - Background job retry for failed embeddings (future)
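
A health check for the Ollama mitigation might look like the sketch below. It is an assumption-laden illustration rather than HTM's implementation: `ollama_reachable?` is a hypothetical helper, and it probes Ollama's `/api/tags` model-listing endpoint, treating any connection failure or timeout as "not reachable".

```ruby
require "net/http"
require "uri"

# Hypothetical helper; not part of the HTM API.
# Returns true when an HTTP 2xx response comes back from the Ollama server.
def ollama_reachable?(base_url = ENV.fetch("OLLAMA_URL", "http://localhost:11434"), timeout: 2)
  uri = URI.join(base_url, "/api/tags")
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    http.get(uri.path).is_a?(Net::HTTPSuccess)
  end
rescue SystemCallError, SocketError, Timeout::Error, IOError
  # Connection refused, DNS failure, timeout, or broken stream
  false
end
```

Calling `ollama_reachable?` before a batch import lets the application warn early instead of creating many nodes without embeddings.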

### Risk: API Rate Limits (OpenAI)

**Risk**: Hit rate limits with high-volume operations
- **Likelihood**: Medium (for OpenAI provider)
- **Impact**: Medium (batch operations fail)
- **Mitigation**:
  - Rate limiting in application layer
  - Exponential backoff retry logic
  - Prefer local Ollama for development
  - Batch API if available
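
The exponential backoff mitigation can be illustrated with a small retry wrapper. This is a sketch under stated assumptions: `with_backoff` and `RateLimitError` are hypothetical names (a real provider client would raise its own error class), and the delay doubles on each attempt with a little random jitter, capped at `max_delay`.

```ruby
# Hypothetical error class standing in for a provider's rate-limit error.
class RateLimitError < StandardError; end

# Hypothetical helper; not part of the HTM API.
# Retries the block on RateLimitError, waiting base_delay * 2^(attempt - 1)
# seconds (plus up to 10% jitter) between attempts.
def with_backoff(max_attempts: 5, base_delay: 0.5, max_delay: 30.0,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue RateLimitError
    raise if attempt >= max_attempts # give up and surface the error

    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay + rand * delay * 0.1)
    retry
  end
end
```

The `sleeper` parameter exists only so the waiting can be swapped out in tests; production code would rely on the default `sleep`.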

### Risk: Dimension Mismatch

**Risk**: Model returns unexpected dimension count
- **Likelihood**: Low (models are consistent)
- **Impact**: High (database constraint violation)
- **Mitigation**:
  - Validate embedding dimensions before insert
  - Store actual dimension in `embedding_dimension` column
  - Raise clear error if dimension > 2000
  - Document supported models and dimensions

### Risk: Stale Embeddings

**Risk**: Content updated but embedding not regenerated
- **Likelihood**: Medium (manual updates)
- **Impact**: Low (search quality degrades slightly)
- **Mitigation**:
  - Document re-embedding procedures
  - Provide utility methods for bulk re-embedding
  - Consider ActiveRecord callback (future)
  - Track last embedding generation time (future)

---

## Future Enhancements

### 1. Automatic Re-Embedding on Content Change

```ruby
class HTM::Models::Node < ActiveRecord::Base
  # In after_* callbacks the dirty state is already reset, so check
  # saved_change_to_content? rather than content_changed?
  after_update :regenerate_embedding, if: :saved_change_to_content?
end
```

### 2. Background Embedding Generation

```ruby
# Queue for asynchronous processing
EmbeddingGenerationJob.perform_later(node_id)
```

### 3. Embedding Caching

```ruby
class EmbeddingCache
  def get_or_generate(content)
    cache_key = Digest::SHA256.hexdigest(content)
    Rails.cache.fetch("embedding:#{cache_key}") do
      embedding_service.embed(content)
    end
  end
end
```

### 4. Batch Embedding Optimization

```ruby
# Generate multiple embeddings in single HTTP request
def embed_batch(texts)
  # Ollama doesn't support batch embedding yet
  # OpenAI supports batches
end
```

### 5. Embedding Versioning

```ruby
# Track which model/version generated embedding
# (The migration version is an assumption; match your ActiveRecord version)
class AddEmbeddingMetadataToNodes < ActiveRecord::Migration[7.1]
  def change
    add_column :nodes, :embedding_model, :text
    add_column :nodes, :embedding_generated_at, :timestamptz
  end
end
```

---

## Related ADRs

- [ADR-001: PostgreSQL Storage](./001-use-postgresql-timescaledb-storage.md) - Database foundation
- [ADR-003: Ollama as Default Embedding Provider](./003-ollama-default-embedding-provider.md) - Provider choice
- [ADR-005: RAG-Based Retrieval](./005-rag-based-retrieval-with-hybrid-search.md) - How embeddings are used
- [ADR-011: Database-Side Embedding (REVERSED)](./011-database-side-embedding-generation-with-pgai.md) - Previous approach

---

## Review Notes

**AI Engineer**: ✅ Client-side generation is pragmatic. Graceful degradation ensures reliability.

**Performance Specialist**: ✅ 50ms latency is acceptable for this use case. Local Ollama performs well.

**Ruby Expert**: ✅ Clear separation of concerns. EmbeddingService is well-designed.

**Systems Architect**: ✅ Synchronous generation simplifies architecture. Async can be added later if needed.

**Database Architect**: ✅ Storing embedding_dimension alongside embedding is smart for future flexibility.