htm 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (155)
  1. checksums.yaml +7 -0
  2. data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
  3. data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
  4. data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
  5. data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
  6. data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
  7. data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
  8. data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
  9. data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
  10. data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
  11. data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
  12. data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
  13. data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
  14. data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
  15. data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
  16. data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
  17. data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
  18. data/.architecture/members.yml +144 -0
  19. data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
  20. data/.architecture/reviews/initial-system-analysis.md +330 -0
  21. data/.envrc +32 -0
  22. data/.irbrc +145 -0
  23. data/CHANGELOG.md +150 -0
  24. data/COMMITS.md +196 -0
  25. data/LICENSE +21 -0
  26. data/README.md +1347 -0
  27. data/Rakefile +51 -0
  28. data/SETUP.md +268 -0
  29. data/config/database.yml +67 -0
  30. data/db/migrate/20250101000001_enable_extensions.rb +14 -0
  31. data/db/migrate/20250101000002_create_robots.rb +14 -0
  32. data/db/migrate/20250101000003_create_nodes.rb +42 -0
  33. data/db/migrate/20250101000005_create_tags.rb +38 -0
  34. data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
  35. data/db/schema.sql +473 -0
  36. data/db/seed_data/README.md +100 -0
  37. data/db/seed_data/presidents.md +136 -0
  38. data/db/seed_data/states.md +151 -0
  39. data/db/seeds.rb +208 -0
  40. data/dbdoc/README.md +173 -0
  41. data/dbdoc/public.node_stats.md +48 -0
  42. data/dbdoc/public.node_stats.svg +41 -0
  43. data/dbdoc/public.node_tags.md +40 -0
  44. data/dbdoc/public.node_tags.svg +112 -0
  45. data/dbdoc/public.nodes.md +54 -0
  46. data/dbdoc/public.nodes.svg +118 -0
  47. data/dbdoc/public.nodes_tags.md +39 -0
  48. data/dbdoc/public.nodes_tags.svg +112 -0
  49. data/dbdoc/public.ontology_structure.md +48 -0
  50. data/dbdoc/public.ontology_structure.svg +38 -0
  51. data/dbdoc/public.operations_log.md +42 -0
  52. data/dbdoc/public.operations_log.svg +130 -0
  53. data/dbdoc/public.relationships.md +39 -0
  54. data/dbdoc/public.relationships.svg +41 -0
  55. data/dbdoc/public.robot_activity.md +46 -0
  56. data/dbdoc/public.robot_activity.svg +35 -0
  57. data/dbdoc/public.robots.md +35 -0
  58. data/dbdoc/public.robots.svg +90 -0
  59. data/dbdoc/public.schema_migrations.md +29 -0
  60. data/dbdoc/public.schema_migrations.svg +26 -0
  61. data/dbdoc/public.tags.md +35 -0
  62. data/dbdoc/public.tags.svg +60 -0
  63. data/dbdoc/public.topic_relationships.md +45 -0
  64. data/dbdoc/public.topic_relationships.svg +32 -0
  65. data/dbdoc/schema.json +1437 -0
  66. data/dbdoc/schema.svg +154 -0
  67. data/docs/api/database.md +806 -0
  68. data/docs/api/embedding-service.md +532 -0
  69. data/docs/api/htm.md +797 -0
  70. data/docs/api/index.md +259 -0
  71. data/docs/api/long-term-memory.md +1096 -0
  72. data/docs/api/working-memory.md +665 -0
  73. data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
  74. data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
  75. data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
  76. data/docs/architecture/adrs/004-hive-mind.md +437 -0
  77. data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
  78. data/docs/architecture/adrs/006-context-assembly.md +496 -0
  79. data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
  80. data/docs/architecture/adrs/008-robot-identification.md +625 -0
  81. data/docs/architecture/adrs/009-never-forget.md +648 -0
  82. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
  83. data/docs/architecture/adrs/011-pgai-integration.md +494 -0
  84. data/docs/architecture/adrs/index.md +215 -0
  85. data/docs/architecture/hive-mind.md +736 -0
  86. data/docs/architecture/index.md +351 -0
  87. data/docs/architecture/overview.md +538 -0
  88. data/docs/architecture/two-tier-memory.md +873 -0
  89. data/docs/assets/css/custom.css +83 -0
  90. data/docs/assets/images/htm-core-components.svg +63 -0
  91. data/docs/assets/images/htm-database-schema.svg +93 -0
  92. data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
  93. data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
  94. data/docs/assets/images/htm-layered-architecture.svg +71 -0
  95. data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
  96. data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
  97. data/docs/assets/images/htm.jpg +0 -0
  98. data/docs/assets/images/htm_demo.gif +0 -0
  99. data/docs/assets/js/mathjax.js +18 -0
  100. data/docs/assets/videos/htm_video.mp4 +0 -0
  101. data/docs/database_rake_tasks.md +322 -0
  102. data/docs/development/contributing.md +787 -0
  103. data/docs/development/index.md +336 -0
  104. data/docs/development/schema.md +596 -0
  105. data/docs/development/setup.md +719 -0
  106. data/docs/development/testing.md +819 -0
  107. data/docs/guides/adding-memories.md +824 -0
  108. data/docs/guides/context-assembly.md +1009 -0
  109. data/docs/guides/getting-started.md +577 -0
  110. data/docs/guides/index.md +118 -0
  111. data/docs/guides/long-term-memory.md +941 -0
  112. data/docs/guides/multi-robot.md +866 -0
  113. data/docs/guides/recalling-memories.md +927 -0
  114. data/docs/guides/search-strategies.md +953 -0
  115. data/docs/guides/working-memory.md +717 -0
  116. data/docs/index.md +214 -0
  117. data/docs/installation.md +477 -0
  118. data/docs/multi_framework_support.md +519 -0
  119. data/docs/quick-start.md +655 -0
  120. data/docs/setup_local_database.md +302 -0
  121. data/docs/using_rake_tasks_in_your_app.md +383 -0
  122. data/examples/basic_usage.rb +93 -0
  123. data/examples/cli_app/README.md +317 -0
  124. data/examples/cli_app/htm_cli.rb +270 -0
  125. data/examples/custom_llm_configuration.rb +183 -0
  126. data/examples/example_app/Rakefile +71 -0
  127. data/examples/example_app/app.rb +206 -0
  128. data/examples/sinatra_app/Gemfile +21 -0
  129. data/examples/sinatra_app/app.rb +335 -0
  130. data/lib/htm/active_record_config.rb +113 -0
  131. data/lib/htm/configuration.rb +342 -0
  132. data/lib/htm/database.rb +594 -0
  133. data/lib/htm/embedding_service.rb +115 -0
  134. data/lib/htm/errors.rb +34 -0
  135. data/lib/htm/job_adapter.rb +154 -0
  136. data/lib/htm/jobs/generate_embedding_job.rb +65 -0
  137. data/lib/htm/jobs/generate_tags_job.rb +82 -0
  138. data/lib/htm/long_term_memory.rb +965 -0
  139. data/lib/htm/models/node.rb +109 -0
  140. data/lib/htm/models/node_tag.rb +33 -0
  141. data/lib/htm/models/robot.rb +52 -0
  142. data/lib/htm/models/tag.rb +76 -0
  143. data/lib/htm/railtie.rb +76 -0
  144. data/lib/htm/sinatra.rb +157 -0
  145. data/lib/htm/tag_service.rb +135 -0
  146. data/lib/htm/tasks.rb +38 -0
  147. data/lib/htm/version.rb +5 -0
  148. data/lib/htm/working_memory.rb +182 -0
  149. data/lib/htm.rb +400 -0
  150. data/lib/tasks/db.rake +19 -0
  151. data/lib/tasks/htm.rake +147 -0
  152. data/lib/tasks/jobs.rake +312 -0
  153. data/mkdocs.yml +190 -0
  154. data/scripts/install_local_database.sh +309 -0
  155. metadata +341 -0
@@ -0,0 +1,569 @@
1
+ # ADR-014: Client-Side Embedding Generation Workflow
2
+
3
+ **Status**: ~~Accepted~~ **SUPERSEDED** (2025-10-29)
4
+
5
+ **Superseded By**: ADR-016 (Async Embedding and Tag Generation)
6
+
7
+ **Date**: 2025-10-29
8
+
9
+ **Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)
10
+
11
+ ---
12
+
13
+ ## ⚠️ DECISION SUPERSEDED (2025-10-29)
14
+
15
+ **This ADR has been superseded by ADR-016.**
16
+
17
+ **Reason**: Synchronous embedding generation before save added 50-100ms latency to node creation. The async approach (ADR-016) provides much better user experience:
18
+ - Node saved immediately (~15ms)
19
+ - Embedding generated in background job
20
+ - User doesn't wait for LLM operations
21
+
22
+ See [ADR-016: Async Embedding and Tag Generation](./016-async-embedding-and-tag-generation.md) for current architecture.
23
+
24
+ ---
25
+
26
+ ## Context (Historical)
27
+
28
+ After the reversal of ADR-011 (database-side embedding generation with pgai), HTM returned to client-side embedding generation. However, the specific workflow, timing, and error handling strategies for embedding generation were not formally documented.
29
+
30
+ This ADR establishes the canonical approach for when, how, and where embeddings are generated in the HTM architecture.
31
+
32
+ ### Key Questions
33
+
34
+ 1. **When**: When are embeddings generated during the node lifecycle?
35
+ 2. **Where**: Client-side (Ruby) vs. database-side (PostgreSQL)?
36
+ 3. **How**: Synchronous vs. asynchronous generation?
37
+ 4. **Error Handling**: What happens if embedding generation fails?
38
+ 5. **Updates**: When/how are embeddings regenerated?
39
+ 6. **Dimensions**: How are variable embedding dimensions handled?
40
+
41
+ ---
42
+
43
+ ## Decision
44
+
45
+ HTM will generate embeddings **client-side in Ruby before database insertion** using the `EmbeddingService` class, with **synchronous generation** and **graceful degradation** on failures.
46
+
47
+ ### Embedding Generation Workflow
48
+
49
+ ```ruby
50
+ # 1. Application creates content
51
+ content = "PostgreSQL with pgvector provides vector similarity search"
52
+
53
+ # 2. EmbeddingService generates embedding BEFORE database operation
54
+ embedding_service = HTM::EmbeddingService.new(:ollama, model: 'nomic-embed-text')
55
+ embedding = embedding_service.embed(content) # Array<Float>, e.g. 768 dimensions
56
+
57
+ # 3. Embedding included in database INSERT
58
+ ltm.add(
59
+ content: content,
60
+ speaker: 'user',
61
+ robot_id: robot.id,
62
+ embedding: embedding, # Pre-generated
63
+ embedding_dimension: embedding.length
64
+ )
65
+
66
+ # 4. PostgreSQL stores embedding in vector column
67
+ # nodes.embedding::vector(2000)
68
+ ```
69
+
70
+ ### Key Principles
71
+
72
+ **Principle 1: Pre-Generation**
73
+ - Embeddings generated in application code BEFORE database operation
74
+ - Never rely on database triggers for embedding generation
75
+ - Embedding passed to database as parameter, not generated in-database
76
+
77
+ **Principle 2: Synchronous by Default**
78
+ - Embeddings generated synchronously in request path
79
+ - Acceptable latency (50-100ms per embedding with local Ollama)
80
+ - Simplifies error handling and debugging
81
+
82
+ **Principle 3: Graceful Degradation**
83
+ - If embedding generation fails, node still inserted (with `embedding: nil`)
84
+ - Background job can retry embedding generation later
85
+ - Nodes without embeddings excluded from vector search results
86
+
87
+ **Principle 4: Dimension Flexibility**
88
+ - Support embeddings from 1 to 2000 dimensions
89
+ - Store actual dimension in `embedding_dimension` column
90
+ - Validate dimension doesn't exceed database column limit (2000)
91
+
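The dimension check in Principle 4 could be sketched roughly as follows. This is a minimal illustration, not the gem's code: `validate_embedding_dimension!`, `EmbeddingDimensionError`, and `MAX_EMBEDDING_DIMENSION` are hypothetical names, and the 2000 limit is taken from the column limit stated above.

```ruby
# Hypothetical constant mirroring the 2000-dimension column limit stated above.
MAX_EMBEDDING_DIMENSION = 2000

# Hypothetical error class used only in this sketch.
class EmbeddingDimensionError < StandardError; end

# Returns the embedding's dimension, or nil when there is no embedding
# (the graceful-degradation case). Raises EmbeddingDimensionError for
# non-numeric, empty, or oversized vectors.
def validate_embedding_dimension!(embedding, max: MAX_EMBEDDING_DIMENSION)
  return nil if embedding.nil?

  unless embedding.is_a?(Array) && embedding.all? { |v| v.is_a?(Numeric) }
    raise EmbeddingDimensionError, 'embedding must be an Array of numbers'
  end

  dimension = embedding.length
  unless dimension.between?(1, max)
    raise EmbeddingDimensionError,
          "embedding has #{dimension} dimensions; expected 1..#{max}"
  end

  dimension
end
```

The returned value is what would be stored in `embedding_dimension`, so validation and bookkeeping happen in one place before the INSERT.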
92
+ ---
93
+
94
+ ## Rationale
95
+
96
+ ### Why Client-Side?
97
+
98
+ **Developer Experience**:
99
+ - ✅ Works reliably on all platforms (macOS, Linux, Cloud)
100
+ - ✅ Simple setup (just Ollama + Ruby gem)
101
+ - ✅ Easy debugging (errors visible in Ruby stack traces)
102
+ - ✅ No PostgreSQL extension dependencies
103
+
104
+ **Code Clarity**:
105
+ - ✅ Explicit embedding generation visible in code
106
+ - ✅ Easy to mock/stub in tests
107
+ - ✅ Clear separation: Ruby generates, PostgreSQL stores
108
+ - ✅ Embedding logic can be modified without database migrations
109
+
110
+ **Operational Simplicity**:
111
+ - ✅ Unified architecture (no local vs. cloud split)
112
+ - ✅ No database trigger management
113
+ - ✅ Connection pooling handled by Ruby HTTP library
114
+ - ✅ Retry logic in application layer (more flexible)
115
+
116
+ ### Why Synchronous?
117
+
118
+ **Performance Acceptable**:
119
+ - Local Ollama: ~50ms per embedding (nomic-embed-text)
120
+ - Batch operations: Can optimize with connection reuse
121
+ - Most operations add single nodes (not bulk)
122
+
123
+ **Simpler Error Handling**:
124
+ - Immediate feedback if embedding fails
125
+ - Can present error to user or log synchronously
126
+ - No need for background job infrastructure for simple case
127
+
128
+ **Consistency**:
129
+ - Embedding available immediately after insertion
130
+ - No window where node exists but has no embedding
131
+ - Vector search works immediately after node creation
132
+
133
+ ### Why Graceful Degradation?
134
+
135
+ **Reliability**:
136
+ - Ollama service may be down temporarily
137
+ - Network issues may prevent embedding generation
138
+ - Node data is more valuable than embedding
139
+
140
+ **Recovery**:
141
+ - Background job can retry embedding generation
142
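+ - Manual re-embedding possible via an explicit call such as `regenerate_embedding(node_id)` (see Strategy 2); with no database triggers, a no-op `UPDATE nodes SET content = content` does not regenerate embeddings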
143
+ - Query can filter for nodes missing embeddings
144
+
145
+ ---
146
+
147
+ ## Implementation
148
+
149
+ ### EmbeddingService API
150
+
151
+ Located in `lib/htm/embedding_service.rb`:
152
+
153
+ ```ruby
154
+ class HTM::EmbeddingService
155
+ # Initialize with provider and model
156
+ def initialize(provider = :ollama, model: nil, ollama_url: nil, dimensions: nil)
157
+ @provider = provider # :ollama or :openai
158
+ @model = model || default_model_for_provider(provider)
159
+ @ollama_url = ollama_url || ENV['OLLAMA_URL'] || 'http://localhost:11434'
160
+ @dimensions = dimensions || KNOWN_DIMENSIONS[@model] || 768
161
+ end
162
+
163
+ # Generate embedding for text (synchronous)
164
+ # @param text [String] Content to embed
165
+ # @return [Array<Float>] Embedding vector
166
+ # @raise [HTM::EmbeddingError] If generation fails
167
+ def embed(text)
168
+ case @provider
169
+ when :ollama
170
+ embed_with_ollama(text) # HTTP POST to Ollama API
171
+ when :openai
172
+ embed_with_openai(text) # HTTP POST to OpenAI API
173
+ end
174
+ end
175
+
176
+ # Get expected embedding dimensions for current model
177
+ # @return [Integer] Dimension count
178
+ def embedding_dimensions
179
+ @dimensions
180
+ end
181
+
182
+ # Count tokens in text (for working memory management)
183
+ # @param text [String] Text to count
184
+ # @return [Integer] Token count
185
+ def count_tokens(text)
186
+ @tokenizer.encode(text.to_s).length
187
+ end
188
+ end
189
+ ```
190
+
191
+ ### LongTermMemory Integration
192
+
193
+ Located in `lib/htm/long_term_memory.rb`:
194
+
195
+ ```ruby
196
+ class HTM::LongTermMemory
197
+ def add(content:, speaker:, robot_id:, embedding: nil, **options)
198
+ # Embedding is OPTIONAL parameter
199
+ # If not provided, node inserted without embedding
200
+ # If provided, must be Array<Float> with length <= 2000
201
+
202
+ node = HTM::Models::Node.create!(
203
+ content: content,
204
+ speaker: speaker,
205
+ robot_id: robot_id,
206
+ embedding: embedding, # Can be nil
207
+ embedding_dimension: embedding&.length,
208
+ **options
209
+ )
210
+
211
+ node
212
+ end
213
+ end
214
+ ```
215
+
216
+ ### Error Handling
217
+
218
+ ```ruby
219
+ class HTM
220
+ def add_message(content, speaker: 'user', type: nil, **options)
221
+ # Generate embedding with error handling
222
+ begin
223
+ embedding = @embedding_service.embed(content)
224
+ rescue HTM::EmbeddingError => e
225
+ # Log error but continue with node insertion
226
+ warn "Embedding generation failed: #{e.message}"
227
+ embedding = nil # Node will be created without embedding
228
+ end
229
+
230
+ # Insert node (with or without embedding)
231
+ node = @ltm.add(
232
+ content: content,
233
+ speaker: speaker,
234
+ robot_id: @robot.id,
235
+ type: type,
236
+ embedding: embedding,
237
+ embedding_dimension: embedding&.length,
238
+ **options
239
+ )
240
+
241
+ # Add to working memory
242
+ @working_memory.add(node)
243
+
244
+ node
245
+ end
246
+ end
247
+ ```
248
+
249
+ ### Vector Search Behavior
250
+
251
+ ```ruby
252
+ def vector_search(query_text:, limit: 10)
253
+ # Generate query embedding
254
+ query_embedding = @embedding_service.embed(query_text)
255
+
256
+ # Search only nodes WITH embeddings
257
+ HTM::Models::Node
258
+ .where.not(embedding: nil) # Exclude nodes without embeddings
259
+ .order(Arel.sql(HTM::Models::Node.sanitize_sql_array(["embedding <=> ?::vector", "[#{query_embedding.join(',')}]"])))
260
+ .limit(limit)
261
+ end
262
+ ```
263
+
264
+ ---
265
+
266
+ ## Embedding Update Strategies
267
+
268
+ ### Strategy 1: Content Change Detection
269
+
270
+ ```ruby
271
+ class HTM::Models::Node < ActiveRecord::Base
272
+ before_update :regenerate_embedding_if_content_changed
273
+
274
+ private
275
+
276
+ def regenerate_embedding_if_content_changed
277
+ if content_changed? && HTM.embedding_service
278
+ new_embedding = HTM.embedding_service.embed(content)
279
+ self.embedding = new_embedding
280
+ self.embedding_dimension = new_embedding.length
281
+ end
282
+ end
283
+ end
284
+ ```
285
+
286
+ **Trade-offs**:
287
+ - ✅ Automatic embedding regeneration on content change
288
+ - ❌ Embedding service must be globally accessible
289
+ - ❌ Adds latency to UPDATE operations
290
+
291
+ ### Strategy 2: Explicit Re-Embedding
292
+
293
+ ```ruby
294
+ class HTM
295
+ def regenerate_embedding(node_id)
296
+ node = HTM::Models::Node.find(node_id)
297
+ embedding = @embedding_service.embed(node.content)
298
+
299
+ node.update!(
300
+ embedding: embedding,
301
+ embedding_dimension: embedding.length
302
+ )
303
+ end
304
+
305
+ def regenerate_all_embeddings
306
+ HTM::Models::Node.find_each do |node|
307
+ regenerate_embedding(node.id)
308
+ end
309
+ end
310
+ end
311
+ ```
312
+
313
+ **Trade-offs**:
314
+ - ✅ Explicit control over when embeddings regenerate
315
+ - ✅ Can batch operations efficiently
316
+ - ❌ Manual intervention required
317
+
318
+ ### Strategy 3: Background Job (Future)
319
+
320
+ ```ruby
321
+ class EmbeddingRegenerationJob
322
+ def perform(node_id)
323
+ node = HTM::Models::Node.find(node_id)
324
+ return if node.embedding.present? # Skip if already has embedding
325
+
326
+ embedding = HTM::EmbeddingService.new.embed(node.content)
327
+ node.update!(
328
+ embedding: embedding,
329
+ embedding_dimension: embedding.length
330
+ )
331
+ end
332
+ end
333
+ ```
334
+
335
+ **Trade-offs**:
336
+ - ✅ Non-blocking embedding generation
337
+ - ✅ Can retry failures automatically
338
+ - ❌ Requires background job infrastructure (Sidekiq, etc.)
339
+
340
+ **Current Decision**: Use **Strategy 2 (Explicit Re-Embedding)** for simplicity.
341
+
342
+ ---
343
+
344
+ ## Embedding Provider Configuration
345
+
346
+ ### Ollama (Default)
347
+
348
+ ```ruby
349
+ # Default configuration
350
+ embedding_service = HTM::EmbeddingService.new(:ollama)
351
+ # Uses:
352
+ # - Model: nomic-embed-text (768 dimensions)
353
+ # - URL: http://localhost:11434
354
+ # - Requires: Ollama running locally
355
+
356
+ # Custom configuration
357
+ embedding_service = HTM::EmbeddingService.new(
358
+ :ollama,
359
+ model: 'mxbai-embed-large', # 1024 dimensions
360
+ ollama_url: ENV['OLLAMA_URL']
361
+ )
362
+ ```
363
+
364
+ **Requirements**:
365
+ - Ollama installed and running
366
+ - Model pulled: `ollama pull nomic-embed-text`
367
+ - Accessible at configured URL
368
+
369
+ ### OpenAI
370
+
371
+ ```ruby
372
+ # Configure OpenAI
373
+ embedding_service = HTM::EmbeddingService.new(
374
+ :openai,
375
+ model: 'text-embedding-3-small' # 1536 dimensions
376
+ )
377
+ # Requires: ENV['OPENAI_API_KEY'] set
378
+ ```
379
+
380
+ **Requirements**:
381
+ - `OPENAI_API_KEY` environment variable
382
+ - Internet connectivity
383
+ - API rate limits considered
384
+
385
+ ---
386
+
387
+ ## Consequences
388
+
389
+ ### Positive
390
+
391
+ ✅ **Simple and reliable**: Works consistently across all environments
392
+ ✅ **Debuggable**: Errors occur in Ruby code with full stack traces
393
+ ✅ **Flexible**: Easy to modify embedding logic without database changes
394
+ ✅ **Testable**: Can mock EmbeddingService in tests
395
+ ✅ **No extensions**: No PostgreSQL extension dependencies
396
+ ✅ **Graceful degradation**: System works even if embeddings fail
397
+ ✅ **Dimension flexibility**: Supports 1-2000 dimension embeddings
398
+
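The "Testable" point can be illustrated with a small test double. This is a sketch under assumptions: `FakeEmbeddingService` is a hypothetical class, not part of the gem, and it only mirrors the `embed` and `embedding_dimensions` methods described in the EmbeddingService API section.

```ruby
require 'digest'

# Hypothetical test double that responds to the same methods as the
# EmbeddingService described above, without any HTTP call.
class FakeEmbeddingService
  def initialize(dimensions: 8)
    @dimensions = dimensions
  end

  # Deterministic pseudo-embedding: the same text always yields the same
  # vector, derived from the bytes of a SHA-256 digest and scaled to 0.0..1.0.
  def embed(text)
    bytes = Digest::SHA256.digest(text.to_s).bytes
    Array.new(@dimensions) { |i| bytes[i % bytes.length] / 255.0 }
  end

  def embedding_dimensions
    @dimensions
  end
end
```

Because the vectors are deterministic, tests can assert on stored embeddings without a running Ollama instance. The values carry no semantic meaning, so this double is unsuitable for testing search quality.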
399
+ ### Negative
400
+
401
+ ❌ **Latency**: 50-100ms per embedding (vs. potential database-side optimization)
402
+ ❌ **HTTP overhead**: Ruby → Ollama HTTP call for each embedding
403
+ ❌ **Memory**: Embedding array held in Ruby memory before database insert
404
+ ❌ **No automatic updates**: Embeddings not automatically regenerated on content change
405
+
406
+ ### Neutral
407
+
408
+ ➡️ **Provider coupling**: Application chooses provider, not database
409
+ ➡️ **Connection management**: Ruby HTTP client handles connections
410
+ ➡️ **Error visibility**: Failures visible in application logs, not database logs
411
+
412
+ ---
413
+
414
+ ## Performance Characteristics
415
+
416
+ ### Benchmarks (M2 Mac, Ollama local, nomic-embed-text)
417
+
418
+ | Operation | Time | Notes |
419
+ |-----------|------|-------|
420
+ | Generate single embedding | ~50ms | HTTP round-trip to Ollama |
421
+ | Insert node with embedding | ~60ms | 50ms embed + 10ms INSERT |
422
+ | Batch 100 nodes | ~6s | ~60ms each (can optimize with connection reuse) |
423
+ | Vector search (10 results) | ~30ms | HNSW index efficient |
424
+
425
+ ### Optimization Opportunities
426
+
427
+ **Connection Pooling**:
428
+ ```ruby
429
+ # Reuse HTTP connection for multiple embeddings
430
+ Net::HTTP.start(uri.hostname, uri.port) do |http|
431
+ nodes.each do |node|
432
+ embedding = generate_embedding(http, node.content)
433
+ insert_node(node, embedding)
434
+ end
435
+ end
436
+ ```
437
+
438
+ **Parallel Generation** (Future):
439
+ ```ruby
440
+ # Generate embeddings in parallel for batch operations
441
+ threads = nodes.map do |node|
442
+ Thread.new { [node, embedding_service.embed(node.content)] }
443
+ end
444
+
445
+ results = threads.map(&:value) # [node, embedding] pairs
446
+ ```
447
+
448
+ ---
449
+
450
+ ## Risks and Mitigations
451
+
452
+ ### Risk: Ollama Service Down
453
+
454
+ **Risk**: Embedding generation fails if Ollama not running
455
+ - **Likelihood**: Medium (local development)
456
+ - **Impact**: Medium (nodes created without embeddings)
457
+ - **Mitigation**:
458
+ - Graceful degradation (nodes still created)
459
+ - Health check endpoint for Ollama
460
+ - Clear error messages with troubleshooting steps
461
+ - Background job retry for failed embeddings (future)
462
+
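A health check along the lines of the mitigation above might look like this. It is a minimal sketch: `ollama_available?` is a hypothetical helper, and it assumes the server answers a plain `GET /` with a 2xx status when it is up.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns true when an HTTP server answers 2xx at
# base_url, false on any connection error, timeout, or malformed URL.
def ollama_available?(base_url = ENV['OLLAMA_URL'] || 'http://localhost:11434', timeout: 2)
  uri = URI.parse(base_url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    http.get('/').is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  false
end
```

Calling this before a batch operation lets the application skip embedding generation up front rather than failing once per node.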
463
+ ### Risk: API Rate Limits (OpenAI)
464
+
465
+ **Risk**: Hit rate limits with high-volume operations
466
+ - **Likelihood**: Medium (for OpenAI provider)
467
+ - **Impact**: Medium (batch operations fail)
468
+ - **Mitigation**:
469
+ - Rate limiting in application layer
470
+ - Exponential backoff retry logic
471
+ - Prefer local Ollama for development
472
+ - Batch API if available
473
+
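The exponential backoff mitigation can be sketched as a small wrapper. The names here are assumptions for illustration: `with_backoff` and `RateLimitError` are hypothetical, and a real client would rescue whatever error its HTTP library raises for a rate-limited response.

```ruby
# Hypothetical error class standing in for a provider's rate-limit failure.
class RateLimitError < StandardError; end

# Calls the block, retrying on RateLimitError with exponentially growing
# delays (base_delay, 2x, 4x, ...). Re-raises after max_attempts failures.
# `sleeper` is injectable so tests can record delays instead of sleeping.
def with_backoff(max_attempts: 5, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue RateLimitError
    raise if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

Usage would wrap the embedding call, for example `with_backoff { embedding_service.embed(text) }`. Adding random jitter to each delay is a common refinement when many workers retry at once.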
474
+ ### Risk: Dimension Mismatch
475
+
476
+ **Risk**: Model returns unexpected dimension count
477
+ - **Likelihood**: Low (models are consistent)
478
+ - **Impact**: High (database constraint violation)
479
+ - **Mitigation**:
480
+ - Validate embedding dimensions before insert
481
+ - Store actual dimension in `embedding_dimension` column
482
+ - Raise clear error if dimension > 2000
483
+ - Document supported models and dimensions
484
+
485
+ ### Risk: Stale Embeddings
486
+
487
+ **Risk**: Content updated but embedding not regenerated
488
+ - **Likelihood**: Medium (manual updates)
489
+ - **Impact**: Low (search quality degrades slightly)
490
+ - **Mitigation**:
491
+ - Document re-embedding procedures
492
+ - Provide utility methods for bulk re-embedding
493
+ - Consider ActiveRecord callback (future)
494
+ - Track last embedding generation time (future)
495
+
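One possible way to detect stale embeddings, offered as an alternative to the timestamp idea listed above, is to store a digest of the content each embedding was built from. This sketch uses a plain Struct rather than the ActiveRecord model, and `EmbeddedNode`, `embedded_content_digest`, `record_embedding!`, and `needs_embedding?` are all hypothetical names.

```ruby
require 'digest'

# Hypothetical in-memory stand-in for a node record. It remembers the
# SHA-256 digest of the content that the current embedding was built from.
EmbeddedNode = Struct.new(:content, :embedding, :embedded_content_digest, keyword_init: true) do
  # Stores the vector together with the digest of the current content.
  def record_embedding!(vector)
    self.embedding = vector
    self.embedded_content_digest = Digest::SHA256.hexdigest(content)
  end

  # True when there is no embedding yet, or the content has changed since
  # the embedding was generated.
  def needs_embedding?
    embedding.nil? || embedded_content_digest != Digest::SHA256.hexdigest(content)
  end
end
```

A bulk re-embedding utility could then select only the nodes for which `needs_embedding?` is true instead of regenerating every embedding.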
496
+ ---
497
+
498
+ ## Future Enhancements
499
+
500
+ ### 1. Automatic Re-Embedding on Content Change
501
+
502
+ ```ruby
503
+ class HTM::Models::Node < ActiveRecord::Base
504
+ after_update :regenerate_embedding, if: :saved_change_to_content?
505
+ end
506
+ ```
507
+
508
+ ### 2. Background Embedding Generation
509
+
510
+ ```ruby
511
+ # Queue for asynchronous processing
512
+ EmbeddingGenerationJob.perform_later(node_id)
513
+ ```
514
+
515
+ ### 3. Embedding Caching
516
+
517
+ ```ruby
518
+ class EmbeddingCache
519
+ def get_or_generate(content)
520
+ cache_key = Digest::SHA256.hexdigest(content)
521
+ Rails.cache.fetch("embedding:#{cache_key}") do
522
+ embedding_service.embed(content)
523
+ end
524
+ end
525
+ end
526
+ ```
527
+
528
+ ### 4. Batch Embedding Optimization
529
+
530
+ ```ruby
531
+ # Generate multiple embeddings in single HTTP request
532
+ def embed_batch(texts)
533
+ # Ollama: the /api/embed endpoint accepts an array for `input`; the legacy /api/embeddings endpoint takes one prompt per request
534
+ # OpenAI supports batches
535
+ end
536
+ ```
537
+
538
+ ### 5. Embedding Versioning
539
+
540
+ ```ruby
541
+ # Track which model/version generated embedding
542
+ class AddEmbeddingMetadataToNodes < ActiveRecord::Migration
543
+ add_column :nodes, :embedding_model, :text
544
+ add_column :nodes, :embedding_generated_at, :timestamptz
545
+ end
546
+ ```
547
+
548
+ ---
549
+
550
+ ## Related ADRs
551
+
552
+ - [ADR-001: PostgreSQL Storage](./001-use-postgresql-timescaledb-storage.md) - Database foundation
553
+ - [ADR-003: Ollama as Default Embedding Provider](./003-ollama-default-embedding-provider.md) - Provider choice
554
+ - [ADR-005: RAG-Based Retrieval](./005-rag-based-retrieval-with-hybrid-search.md) - How embeddings are used
555
+ - [ADR-011: Database-Side Embedding (REVERSED)](./011-database-side-embedding-generation-with-pgai.md) - Previous approach
556
+
557
+ ---
558
+
559
+ ## Review Notes
560
+
561
+ **AI Engineer**: ✅ Client-side generation is pragmatic. Graceful degradation ensures reliability.
562
+
563
+ **Performance Specialist**: ✅ 50ms latency is acceptable for this use case. Local Ollama performs well.
564
+
565
+ **Ruby Expert**: ✅ Clear separation of concerns. EmbeddingService is well-designed.
566
+
567
+ **Systems Architect**: ✅ Synchronous generation simplifies architecture. Async can be added later if needed.
568
+
569
+ **Database Architect**: ✅ Storing embedding_dimension alongside embedding is smart for future flexibility.