htm 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (155)
  1. checksums.yaml +7 -0
  2. data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
  3. data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
  4. data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
  5. data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
  6. data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
  7. data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
  8. data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
  9. data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
  10. data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
  11. data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
  12. data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
  13. data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
  14. data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
  15. data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
  16. data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
  17. data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
  18. data/.architecture/members.yml +144 -0
  19. data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
  20. data/.architecture/reviews/initial-system-analysis.md +330 -0
  21. data/.envrc +32 -0
  22. data/.irbrc +145 -0
  23. data/CHANGELOG.md +150 -0
  24. data/COMMITS.md +196 -0
  25. data/LICENSE +21 -0
  26. data/README.md +1347 -0
  27. data/Rakefile +51 -0
  28. data/SETUP.md +268 -0
  29. data/config/database.yml +67 -0
  30. data/db/migrate/20250101000001_enable_extensions.rb +14 -0
  31. data/db/migrate/20250101000002_create_robots.rb +14 -0
  32. data/db/migrate/20250101000003_create_nodes.rb +42 -0
  33. data/db/migrate/20250101000005_create_tags.rb +38 -0
  34. data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
  35. data/db/schema.sql +473 -0
  36. data/db/seed_data/README.md +100 -0
  37. data/db/seed_data/presidents.md +136 -0
  38. data/db/seed_data/states.md +151 -0
  39. data/db/seeds.rb +208 -0
  40. data/dbdoc/README.md +173 -0
  41. data/dbdoc/public.node_stats.md +48 -0
  42. data/dbdoc/public.node_stats.svg +41 -0
  43. data/dbdoc/public.node_tags.md +40 -0
  44. data/dbdoc/public.node_tags.svg +112 -0
  45. data/dbdoc/public.nodes.md +54 -0
  46. data/dbdoc/public.nodes.svg +118 -0
  47. data/dbdoc/public.nodes_tags.md +39 -0
  48. data/dbdoc/public.nodes_tags.svg +112 -0
  49. data/dbdoc/public.ontology_structure.md +48 -0
  50. data/dbdoc/public.ontology_structure.svg +38 -0
  51. data/dbdoc/public.operations_log.md +42 -0
  52. data/dbdoc/public.operations_log.svg +130 -0
  53. data/dbdoc/public.relationships.md +39 -0
  54. data/dbdoc/public.relationships.svg +41 -0
  55. data/dbdoc/public.robot_activity.md +46 -0
  56. data/dbdoc/public.robot_activity.svg +35 -0
  57. data/dbdoc/public.robots.md +35 -0
  58. data/dbdoc/public.robots.svg +90 -0
  59. data/dbdoc/public.schema_migrations.md +29 -0
  60. data/dbdoc/public.schema_migrations.svg +26 -0
  61. data/dbdoc/public.tags.md +35 -0
  62. data/dbdoc/public.tags.svg +60 -0
  63. data/dbdoc/public.topic_relationships.md +45 -0
  64. data/dbdoc/public.topic_relationships.svg +32 -0
  65. data/dbdoc/schema.json +1437 -0
  66. data/dbdoc/schema.svg +154 -0
  67. data/docs/api/database.md +806 -0
  68. data/docs/api/embedding-service.md +532 -0
  69. data/docs/api/htm.md +797 -0
  70. data/docs/api/index.md +259 -0
  71. data/docs/api/long-term-memory.md +1096 -0
  72. data/docs/api/working-memory.md +665 -0
  73. data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
  74. data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
  75. data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
  76. data/docs/architecture/adrs/004-hive-mind.md +437 -0
  77. data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
  78. data/docs/architecture/adrs/006-context-assembly.md +496 -0
  79. data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
  80. data/docs/architecture/adrs/008-robot-identification.md +625 -0
  81. data/docs/architecture/adrs/009-never-forget.md +648 -0
  82. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
  83. data/docs/architecture/adrs/011-pgai-integration.md +494 -0
  84. data/docs/architecture/adrs/index.md +215 -0
  85. data/docs/architecture/hive-mind.md +736 -0
  86. data/docs/architecture/index.md +351 -0
  87. data/docs/architecture/overview.md +538 -0
  88. data/docs/architecture/two-tier-memory.md +873 -0
  89. data/docs/assets/css/custom.css +83 -0
  90. data/docs/assets/images/htm-core-components.svg +63 -0
  91. data/docs/assets/images/htm-database-schema.svg +93 -0
  92. data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
  93. data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
  94. data/docs/assets/images/htm-layered-architecture.svg +71 -0
  95. data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
  96. data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
  97. data/docs/assets/images/htm.jpg +0 -0
  98. data/docs/assets/images/htm_demo.gif +0 -0
  99. data/docs/assets/js/mathjax.js +18 -0
  100. data/docs/assets/videos/htm_video.mp4 +0 -0
  101. data/docs/database_rake_tasks.md +322 -0
  102. data/docs/development/contributing.md +787 -0
  103. data/docs/development/index.md +336 -0
  104. data/docs/development/schema.md +596 -0
  105. data/docs/development/setup.md +719 -0
  106. data/docs/development/testing.md +819 -0
  107. data/docs/guides/adding-memories.md +824 -0
  108. data/docs/guides/context-assembly.md +1009 -0
  109. data/docs/guides/getting-started.md +577 -0
  110. data/docs/guides/index.md +118 -0
  111. data/docs/guides/long-term-memory.md +941 -0
  112. data/docs/guides/multi-robot.md +866 -0
  113. data/docs/guides/recalling-memories.md +927 -0
  114. data/docs/guides/search-strategies.md +953 -0
  115. data/docs/guides/working-memory.md +717 -0
  116. data/docs/index.md +214 -0
  117. data/docs/installation.md +477 -0
  118. data/docs/multi_framework_support.md +519 -0
  119. data/docs/quick-start.md +655 -0
  120. data/docs/setup_local_database.md +302 -0
  121. data/docs/using_rake_tasks_in_your_app.md +383 -0
  122. data/examples/basic_usage.rb +93 -0
  123. data/examples/cli_app/README.md +317 -0
  124. data/examples/cli_app/htm_cli.rb +270 -0
  125. data/examples/custom_llm_configuration.rb +183 -0
  126. data/examples/example_app/Rakefile +71 -0
  127. data/examples/example_app/app.rb +206 -0
  128. data/examples/sinatra_app/Gemfile +21 -0
  129. data/examples/sinatra_app/app.rb +335 -0
  130. data/lib/htm/active_record_config.rb +113 -0
  131. data/lib/htm/configuration.rb +342 -0
  132. data/lib/htm/database.rb +594 -0
  133. data/lib/htm/embedding_service.rb +115 -0
  134. data/lib/htm/errors.rb +34 -0
  135. data/lib/htm/job_adapter.rb +154 -0
  136. data/lib/htm/jobs/generate_embedding_job.rb +65 -0
  137. data/lib/htm/jobs/generate_tags_job.rb +82 -0
  138. data/lib/htm/long_term_memory.rb +965 -0
  139. data/lib/htm/models/node.rb +109 -0
  140. data/lib/htm/models/node_tag.rb +33 -0
  141. data/lib/htm/models/robot.rb +52 -0
  142. data/lib/htm/models/tag.rb +76 -0
  143. data/lib/htm/railtie.rb +76 -0
  144. data/lib/htm/sinatra.rb +157 -0
  145. data/lib/htm/tag_service.rb +135 -0
  146. data/lib/htm/tasks.rb +38 -0
  147. data/lib/htm/version.rb +5 -0
  148. data/lib/htm/working_memory.rb +182 -0
  149. data/lib/htm.rb +400 -0
  150. data/lib/tasks/db.rake +19 -0
  151. data/lib/tasks/htm.rake +147 -0
  152. data/lib/tasks/jobs.rake +312 -0
  153. data/mkdocs.yml +190 -0
  154. data/scripts/install_local_database.sh +309 -0
  155. metadata +341 -0
@@ -0,0 +1,531 @@
1
+ # ADR-005: RAG-Based Retrieval with Hybrid Search
2
+
3
+ **Status**: Accepted (Updated for Client-Side Embeddings)
4
+
5
+ **Date**: 2025-10-25 (Updated: 2025-10-27)
6
+
7
+ **Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)
8
+
9
+ ---
10
+
11
+ !!! info "Architecture Update (October 2025)"
12
+     Following the reversal of ADR-011, query embeddings are now generated client-side in Ruby using `EmbeddingService` before being passed to SQL for vector similarity search. This provides a reliable, cross-platform solution.
13
+
14
+ ## Quick Summary
15
+
16
+ HTM implements **RAG-based retrieval with three search strategies**: vector search (semantic), full-text search (keywords), and hybrid search (combined). All strategies include temporal filtering to leverage TimescaleDB's time-series optimization.
17
+
18
+ **Why**: Different queries benefit from different approaches. Semantic search handles concepts, full-text handles precise terms, and hybrid provides the best balance for most use cases.
19
+
20
+ **Impact**: Flexible retrieval with excellent recall and precision. Client-side embedding generation provides reliable, debuggable operation across all platforms.
21
+
22
+ ---
23
+
24
+ ## Context
25
+
26
+ Traditional memory systems for LLMs face challenges in retrieving relevant information:
27
+
28
+ - **Keyword-only search**: Misses semantic relationships ("car" vs "automobile")
29
+ - **Vector-only search**: May miss exact keyword matches ("PostgreSQL 17.2" vs "database")
30
+ - **No temporal context**: Doesn't leverage time-based relevance
31
+ - **Scalability**: Simple linear scans don't scale to thousands of memories
32
+
33
+ ### Requirements
34
+
35
+ HTM needs intelligent retrieval that balances:
36
+
37
+ - Semantic understanding (what does the query mean?)
38
+ - Keyword precision (exact term matching)
39
+ - Temporal relevance (recent vs historical context)
40
+ - Performance (fast retrieval from large datasets)
41
+
42
+ ### Alternative Approaches
43
+
44
+ 1. **Pure vector search**: Semantic only, no keyword precision
45
+ 2. **Pure full-text search**: Keywords only, no semantic understanding
46
+ 3. **Hybrid search**: Combine vector + full-text + temporal filtering
47
+ 4. **LLM-as-retriever**: Use LLM to generate retrieval queries (slow, expensive)
48
+
49
+ ---
50
+
51
+ ## Decision
52
+
53
+ We will implement **RAG-based retrieval with three search strategies**: vector, full-text, and hybrid, all with temporal filtering.
54
+
55
+ ### Search Strategies
56
+
57
+ **1. Vector Search (`:vector`)**
58
+
59
+ - Generate embedding for query
60
+ - Compute cosine similarity with stored embeddings
61
+ - Temporal filtering on timeframe
62
+ - Best for: Semantic queries, conceptual relationships
63
+
64
+ **2. Full-Text Search (`:fulltext`)**
65
+
66
+ - PostgreSQL `to_tsvector` and `plainto_tsquery`
67
+ - `ts_rank` scoring for relevance
68
+ - Temporal filtering on timeframe
69
+ - Best for: Exact keywords, technical terms, proper nouns
70
+
71
+ **3. Hybrid Search (`:hybrid`)** - **Recommended Default**
72
+
73
+ - Full-text pre-filter to get candidates (top 100)
74
+ - Vector reranking of candidates for semantic relevance
75
+ - Temporal filtering on timeframe
76
+ - Best for: Balanced retrieval with precision + recall
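A minimal sketch of how a `strategy:` symbol could be routed to one of the three searches described above. The `SEARCHERS` table and `run_search` are hypothetical names, not HTM's API, and the lambdas are stand-ins that return strings instead of querying a database.

```ruby
# Hypothetical dispatcher: maps a strategy symbol to a search callable.
# The lambdas are placeholders for the real vector/full-text/hybrid queries.
SEARCHERS = {
  vector:   ->(query) { "vector search for #{query}" },
  fulltext: ->(query) { "fulltext search for #{query}" },
  hybrid:   ->(query) { "hybrid search for #{query}" }
}.freeze

# Defaults to :hybrid, the recommended strategy, and rejects unknown symbols.
def run_search(query, strategy: :hybrid)
  searcher = SEARCHERS.fetch(strategy) do |key|
    raise ArgumentError, "unknown strategy: #{key.inspect}"
  end
  searcher.call(query)
end

puts run_search("PostgreSQL performance")
puts run_search("PostgreSQL 17.2", strategy: :fulltext)
```

Failing fast on an unknown symbol keeps a typo such as `:full_text` from silently falling through to a different strategy.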
77
+
78
+ ---
79
+
80
+ ## Rationale
81
+
82
+ ### Why RAG-Based Retrieval?
83
+
84
+ **Temporal filtering is foundational**:
85
+
86
+ - "What did we discuss last week?" - time is the primary filter
87
+ - Recent context often more relevant than old context
88
+ - TimescaleDB optimized for time-range queries
89
+
90
+ **Semantic search handles synonyms**:
91
+
92
+ - User says "database", finds memories about "PostgreSQL"
93
+ - "Bug fix" matches "resolved issue"
94
+ - Captures conceptual relationships
95
+
96
+ **Full-text handles precision**:
97
+
98
+ - "PostgreSQL 17.2" needs exact version match
99
+ - Technical terminology like "pgvector", "HNSW"
100
+ - Proper nouns like robot names, project names
101
+
102
+ **Hybrid combines strengths**:
103
+
104
+ - Pre-filter with keywords reduces vector search space
105
+ - Vector reranking improves relevance of keyword matches
106
+ - Avoids false positives from pure vector search
107
+ - Avoids missing results from pure keyword search
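The two-stage idea (keyword pre-filter, then vector rerank) can be illustrated in plain Ruby without a database. This is an in-memory sketch only: `Doc`, `cosine`, and `hybrid_search` are hypothetical names, substring matching stands in for PostgreSQL full-text matching, and the two-dimensional embeddings are made-up sample data.

```ruby
# In-memory illustration of pre-filter + rerank; not HTM's implementation.
Doc = Struct.new(:content, :embedding)

# Cosine similarity between two equal-length vectors.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

def hybrid_search(docs, keywords:, query_embedding:, prefilter_limit: 100, limit: 20)
  # Stage 1: keep documents containing every keyword (stand-in for full-text match).
  matches = docs.select { |d| keywords.all? { |kw| d.content.downcase.include?(kw.downcase) } }
  candidates = matches.first(prefilter_limit)

  # Stage 2: order the surviving candidates by similarity to the query embedding.
  candidates.sort_by { |d| -cosine(d.embedding, query_embedding) }.first(limit)
end

docs = [
  Doc.new("PostgreSQL index tuning",  [0.9, 0.1]),
  Doc.new("PostgreSQL release party", [0.1, 0.9]),
  Doc.new("Query planner internals",  [0.95, 0.05])
]

results = hybrid_search(docs, keywords: ["postgresql"], query_embedding: [1.0, 0.0])
puts results.map(&:content)
```

Note that "Query planner internals" is the closest vector match in this sample but never reaches the rerank stage because it lacks the keyword. That is the recall trade-off the pre-filter introduces.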
108
+
109
+ ---
110
+
111
+ ## Implementation Details
112
+
113
+ !!! info "Client-Side Embedding Generation"
114
+     Query embeddings are generated client-side in Ruby via `EmbeddingService` before being passed to SQL for vector similarity search.
115
+
116
+ ### Vector Search
117
+
118
+ ```ruby
119
+ def search(timeframe:, query:, limit:, embedding_service:)
120
+ # Generate query embedding client-side
121
+ query_embedding = embedding_service.embed(query)
122
+
123
+ # Pad to 2000 dimensions if needed
124
+ query_embedding += Array.new(2000 - query_embedding.length, 0.0) if query_embedding.length < 2000
125
+
126
+ # Convert to PostgreSQL vector format
127
+ embedding_str = "[#{query_embedding.join(',')}]"
128
+
129
+ # Vector search in database
130
+ conn.exec_params(<<~SQL, [embedding_str, timeframe.begin, timeframe.end, limit])
131
+ SELECT id, content, speaker, type, category, importance, created_at, robot_id, token_count,
132
+ 1 - (embedding <=> $1::vector) as similarity
133
+ FROM nodes
134
+ WHERE created_at BETWEEN $2 AND $3
135
+ AND embedding IS NOT NULL
136
+ ORDER BY embedding <=> $1::vector
137
+ LIMIT $4
138
+ SQL
139
+ end
140
+ ```
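Two details of the block above are worth a worked example: zero-padding does not change cosine similarity (the extra zeros add nothing to the dot product or to either norm), and pgvector's `<=>` operator returns cosine distance, which is why the query reports `1 - (embedding <=> $1::vector)` as similarity. The helper names below are hypothetical and the vectors are sample data.

```ruby
# Hypothetical helpers mirroring the padding and formatting steps; not HTM's API.
TARGET_DIMS = 2000

# Right-pad an embedding with zeros up to `dims` (unchanged if already long enough).
def pad_embedding(vec, dims = TARGET_DIMS)
  vec.length < dims ? vec + Array.new(dims - vec.length, 0.0) : vec
end

# Format an array of floats as a pgvector text literal such as "[0.1,0.2]".
def to_pgvector(vec)
  "[#{vec.join(',')}]"
end

# Cosine similarity; pgvector's <=> yields the cosine distance, i.e. 1 minus this.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

a = [0.1, 0.3, 0.5]
b = [0.2, 0.1, 0.4]

unpadded = cosine_similarity(a, b)
padded   = cosine_similarity(pad_embedding(a), pad_embedding(b))

puts "difference after padding: #{(unpadded - padded).abs}"
puts to_pgvector(a)
```

The invariance only holds when stored and query embeddings are padded the same way and come from the same model.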
141
+
142
+ ### Full-Text Search
143
+
144
+ ```ruby
145
+ def search_fulltext(timeframe:, query:, limit:)
146
+ # No embedding needed for full-text search
147
+ conn.exec_params(<<~SQL, [query, timeframe.begin, timeframe.end, limit])
148
+ SELECT *, ts_rank(to_tsvector('english', content), plainto_tsquery('english', $1)) as rank
149
+ FROM nodes
150
+ WHERE created_at BETWEEN $2 AND $3
151
+ AND to_tsvector('english', content) @@ plainto_tsquery('english', $1)
152
+ ORDER BY rank DESC
153
+ LIMIT $4
154
+ SQL
155
+ end
156
+ ```
157
+
158
+ ### Hybrid Search
159
+
160
+ ```ruby
161
+ def search_hybrid(timeframe:, query:, limit:, embedding_service:, prefilter_limit: 100)
162
+ # Generate query embedding client-side
163
+ query_embedding = embedding_service.embed(query)
164
+ query_embedding += Array.new(2000 - query_embedding.length, 0.0) if query_embedding.length < 2000
165
+ embedding_str = "[#{query_embedding.join(',')}]"
166
+
167
+ # Combine full-text pre-filter with vector reranking
168
+ conn.exec_params(<<~SQL, [embedding_str, timeframe.begin, timeframe.end, query, prefilter_limit, limit])
169
+ WITH candidates AS (
170
+ SELECT id, content, speaker, type, category, importance, created_at, robot_id, token_count, embedding
171
+ FROM nodes
172
+ WHERE created_at BETWEEN $2 AND $3
173
+ AND to_tsvector('english', content) @@ plainto_tsquery('english', $4)
174
+ AND embedding IS NOT NULL
175
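+ ORDER BY ts_rank(to_tsvector('english', content), plainto_tsquery('english', $4)) DESC
+ LIMIT $5 -- Pre-filter to the top-ranked full-text candidates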
176
+ )
177
+ SELECT id, content, speaker, type, category, importance, created_at, robot_id, token_count,
178
+ 1 - (embedding <=> $1::vector) as similarity
179
+ FROM candidates
180
+ ORDER BY embedding <=> $1::vector
181
+ LIMIT $6 -- Final top results
182
+ SQL
183
+ end
184
+ ```
185
+
186
+ ### User API
187
+
188
+ ```ruby
189
+ # Use hybrid search (recommended)
190
+ memories = htm.recall(
191
+ timeframe: "last week",
192
+ topic: "PostgreSQL performance",
193
+ limit: 20,
194
+ strategy: :hybrid # default recommended
195
+ )
196
+
197
+ # Use pure vector search
198
+ memories = htm.recall(
199
+ timeframe: "last month",
200
+ topic: "database design philosophy",
201
+ strategy: :vector # best for conceptual queries
202
+ )
203
+
204
+ # Use pure full-text search
205
+ memories = htm.recall(
206
+ timeframe: "yesterday",
207
+ topic: "PostgreSQL 17.2 upgrade",
208
+ strategy: :fulltext # best for exact keywords
209
+ )
210
+ ```
211
+
212
+ ---
213
+
214
+ ## Consequences
215
+
216
+ ### Positive
217
+
218
+ - Flexible retrieval: Choose strategy based on query type
219
+ - Temporal context: Time-range filtering built into all strategies
220
+ - Semantic understanding: Vector search captures relationships
221
+ - Keyword precision: Full-text search handles exact matches
222
+ - Balanced hybrid: Best of both worlds with pre-filter optimization
223
+ - Scalable: HNSW indexing on vectors, GIN indexing on tsvectors
224
+ - Transparent scoring: Return similarity/rank scores for debugging
225
+
226
+ ### Negative
227
+
228
+ - Complexity: Three strategies to understand and choose from
229
+ - Embedding latency: Vector/hybrid require embedding generation
230
+ - Storage overhead: Both embeddings and full-text indexes
231
+ - English-only: Full-text optimized for English language
232
+ - Tuning required: Hybrid prefilter_limit may need adjustment
233
+
234
+ ### Neutral
235
+
236
+ - Strategy selection: User must choose appropriate strategy
237
+ - Timeframe parsing: Natural language time parsing adds complexity
238
+ - Embedding consistency: Different embedding models produce different results
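To make the timeframe-parsing point concrete, here is a small sketch that turns a few phrases into a `Range` of `Time` values, which is the shape the SQL examples consume via `timeframe.begin` and `timeframe.end`. `parse_timeframe` is a hypothetical helper, the supported phrases are an arbitrary subset, and treating each phrase as a rolling window ending now is an assumption; HTM's actual parser may behave differently.

```ruby
# Hypothetical parser for a handful of phrases, using rolling windows ending at `now`.
DAY = 24 * 60 * 60 # seconds

def parse_timeframe(phrase, now: Time.now)
  case phrase.downcase.strip
  when "yesterday"  then (now - DAY)..now
  when "last week"  then (now - 7 * DAY)..now
  when "last month" then (now - 30 * DAY)..now
  else raise ArgumentError, "unsupported timeframe: #{phrase.inspect}"
  end
end

now   = Time.utc(2025, 10, 27, 12, 0, 0)
range = parse_timeframe("last week", now: now)

puts range.begin
puts range.end
```

Passing `now:` explicitly keeps the function deterministic, which makes the parsing rules easy to test.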
239
+
240
+ ---
241
+
242
+ ## Use Cases
243
+
244
+ ### Use Case 1: Semantic Concept Retrieval
245
+
246
+ ```ruby
247
+ # Query: "What architectural decisions have we made?"
248
+ # Best strategy: :vector (semantic concept matching)
249
+
250
+ memories = htm.recall(
251
+ timeframe: "last month",
252
+ topic: "architectural decisions design choices",
253
+ strategy: :vector
254
+ )
255
+
256
+ # Finds: "We decided to use PostgreSQL", "Chose two-tier memory model", etc.
257
+ # Matches conceptually even without exact keywords
258
+ ```
259
+
260
+ ### Use Case 2: Exact Technical Term
261
+
262
+ ```ruby
263
+ # Query: "Find all mentions of PostgreSQL 17.2"
264
+ # Best strategy: :fulltext (exact version number)
265
+
266
+ memories = htm.recall(
267
+ timeframe: "this week",
268
+ topic: "PostgreSQL 17.2",
269
+ strategy: :fulltext
270
+ )
271
+
272
+ # Finds: Exact "PostgreSQL 17.2" mentions
273
+ # Avoids false matches to "PostgreSQL 16" or generic "database"
274
+ ```
275
+
276
+ ### Use Case 3: Balanced Query
277
+
278
+ ```ruby
279
+ # Query: "What did we discuss about database performance?"
280
+ # Best strategy: :hybrid (keyword + semantic)
281
+
282
+ memories = htm.recall(
283
+ timeframe: "last week",
284
+ topic: "database performance optimization",
285
+ strategy: :hybrid
286
+ )
287
+
288
+ # Pre-filters: Documents matching all of "database", "performance", "optimization" (after stemming)
289
+ # Reranks: By semantic similarity to full query
290
+ # Result: Best balance of precision + recall
291
+ ```
292
+
293
+ ### Use Case 4: Conversation Timeline
294
+
295
+ ```ruby
296
+ # Get chronological conversation about a topic
297
+ timeline = htm.conversation_timeline("HTM design", limit: 50)
298
+
299
+ # Returns memories sorted by created_at
300
+ # Useful for replaying decision evolution over time
301
+ ```
302
+
303
+ ---
304
+
305
+ ## Performance Characteristics
306
+
307
+ !!! info "Client-Side Embedding Generation"
308
+     Embeddings are generated client-side before SQL queries. Latency includes the HTTP call to Ollama/OpenAI for embedding generation.
309
+
310
+ ### Vector Search
311
+
312
+ - **Latency**: ~30-50ms for client-side embedding + index lookup
313
+ - **Index**: HNSW (Hierarchical Navigable Small World)
314
+ - **Scalability**: Approximately O(log n) with HNSW (sublinear; results are approximate nearest neighbors)
315
+ - **Best case**: Conceptual queries, semantic relationships
316
+ - **Breakdown**: ~20-30ms embedding generation, ~10-20ms vector search
317
+
318
+ ### Full-Text Search
319
+
320
+ - **Latency**: ~5-20ms (no embedding generation)
321
+ - **Index**: GIN (Generalized Inverted Index) on tsvector
322
+ - **Scalability**: O(log n) with GIN index
323
+ - **Best case**: Exact keywords, technical terms
324
+ - **Benefit**: Fastest option when embeddings not needed
325
+
326
+ ### Hybrid Search
327
+
328
+ - **Latency**: Full-text pre-filter + client-side embedding + vector reranking
329
+ - **Total**: ~35-70ms
330
+ - **Optimization**: Pre-filter reduces vector search space
331
+ - **Best case**: Large datasets where full-text can narrow candidates
332
+ - **Breakdown**: ~20-30ms embedding, ~5-10ms full-text, ~10-30ms vector reranking
333
+
334
+ ### Temporal Filtering
335
+
336
+ - **Optimization**: TimescaleDB hypertable partitioning by time
337
+ - **Index**: B-tree on `created_at` column
338
+ - **Benefit**: Prunes partitions outside timeframe, faster scans
339
+
340
+ ---
341
+
342
+ ## Design Decisions
343
+
344
+ ### Decision: Three Strategies Instead of One
345
+
346
+ **Rationale**: Different queries benefit from different approaches. Give users flexibility.
347
+
348
+ **Alternative**: Single hybrid strategy for all queries
349
+
350
+ **Rejected**: Forces hybrid approach even when pure vector or full-text is better
351
+
352
+ ### Decision: Temporal Filtering is Mandatory
353
+
354
+ **Rationale**: HTM is time-oriented. All retrieval should consider temporal context.
355
+
356
+ **Alternative**: Optional timeframe parameter
357
+
358
+ **Rejected**: Easy to forget, defeats TimescaleDB optimization benefits
359
+
360
+ ### Decision: Hybrid Pre-filter Limit = 100
361
+
362
+ **Rationale**: Balances recall (enough candidates) with performance (vector search cost)
363
+
364
+ **Alternative**: Dynamic limit based on result count
365
+
366
+ **Deferred**: Can optimize later based on real-world usage patterns
367
+
368
+ ### Decision: Return Similarity/Rank Scores
369
+
370
+ **Rationale**: Enables debugging, threshold filtering, and understanding retrieval quality
371
+
372
+ **Alternative**: Just return nodes without scores
373
+
374
+ **Rejected**: Lose valuable signal for debugging and optimization
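As an illustration of the threshold filtering that returned scores make possible, the sketch below drops low-similarity rows on the client. The row shape mimics the `similarity` column from the SQL examples; the `filter_by_similarity` helper, the 0.5 cutoff, and the sample rows are assumptions for illustration.

```ruby
# Sample rows shaped like the vector-search result set. Depending on the
# driver's type mapping, numeric columns may arrive as strings, hence `to_f`.
rows = [
  { "id" => "1", "similarity" => "0.91" },
  { "id" => "2", "similarity" => "0.42" },
  { "id" => "3", "similarity" => 0.67 }
]

# Hypothetical helper: keep only rows at or above a minimum similarity.
def filter_by_similarity(rows, min:)
  rows.select { |row| row["similarity"].to_f >= min }
end

kept = filter_by_similarity(rows, min: 0.5)
puts kept.map { |row| row["id"] }.inspect
```

A suitable cutoff depends on the embedding model, so it is best chosen by inspecting the scores of known-good and known-bad matches.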
375
+
376
+ ---
377
+
378
+ ## Risks and Mitigations
379
+
380
+ ### Risk: Wrong Strategy Selection
381
+
382
+ !!! warning "Risk"
383
+     User chooses vector for exact keyword query (poor results)
384
+
385
+ **Likelihood**: Medium (requires understanding differences)
386
+
387
+ **Impact**: Medium (degraded retrieval quality)
388
+
389
+ **Mitigation**:
390
+
391
+ - Default to hybrid for balanced results
392
+ - Document use cases clearly
393
+ - Provide examples in API docs
394
+ - Consider auto-detection in future
395
+
396
+ ### Risk: Embedding Latency
397
+
398
+ !!! info "Risk"
399
+     Vector/hybrid slow due to embedding generation
400
+
401
+ **Likelihood**: High (embedding is I/O bound)
402
+
403
+ **Impact**: Medium (100-500ms for Ollama)
404
+
405
+ **Mitigation**:
406
+
407
+ - Cache embeddings for common queries (future)
408
+ - Use fast local embedding models (gpt-oss)
409
+ - Provide fallback to full-text if embedding fails
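The full-text fallback in the last mitigation could look roughly like this. It is a sketch under assumptions: `EmbeddingError` is a hypothetical error class, `recall_with_fallback` is not part of HTM's API, and the three callables are injected stand-ins for the embedding service and the two searches.

```ruby
# Hypothetical error raised when the embedding provider is unavailable.
class EmbeddingError < StandardError; end

# Try vector search first; if embedding generation fails, fall back to full-text.
def recall_with_fallback(query, embed:, vector_search:, fulltext_search:)
  embedding = embed.call(query)
  { strategy: :vector, results: vector_search.call(embedding) }
rescue EmbeddingError => e
  warn "embedding failed (#{e.message}); falling back to full-text"
  { strategy: :fulltext, results: fulltext_search.call(query) }
end

# Stand-in callables for demonstration.
broken_embedder = ->(_query) { raise EmbeddingError, "provider timeout" }
vector_search   = ->(_embedding) { ["vector hit"] }
fulltext_search = ->(query) { ["full-text hit for #{query}"] }

outcome = recall_with_fallback(
  "PostgreSQL 17.2",
  embed: broken_embedder,
  vector_search: vector_search,
  fulltext_search: fulltext_search
)
puts outcome.inspect
```

Returning the strategy that was actually used lets callers see when results came from the degraded path.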
410
+
411
+ ### Risk: Language Limitation
412
+
413
+ !!! danger "Risk"
414
+     Full-text search optimized for English only
415
+
416
+ **Likelihood**: Low (single-user, likely English)
417
+
418
+ **Impact**: High (non-English users)
419
+
420
+ **Mitigation**:
421
+
422
+ - Document English assumption
423
+ - Support language parameter in future
424
+ - Vector search is not tied to the English text-search configuration (it works for any language the embedding model supports)
425
+
426
+ ### Risk: Pre-filter Misses Results
427
+
428
+ !!! info "Risk"
429
+     Hybrid pre-filter (100) misses relevant candidates
430
+
431
+ **Likelihood**: Low (100 is generous)
432
+
433
+ **Impact**: Medium (reduced recall)
434
+
435
+ **Mitigation**:
436
+
437
+ - Make prefilter_limit configurable
438
+ - Monitor recall metrics in practice
439
+ - Adjust default if needed
440
+
441
+ ---
442
+
443
+ ## Future Enhancements
444
+
445
+ ### Query Auto-Detection
446
+
447
+ ```ruby
448
+ # Automatically choose strategy based on query
449
+ htm.recall_smart(timeframe: "last week", topic: "PostgreSQL 17.2")
450
+ # Detects version number → uses :fulltext
451
+
452
+ htm.recall_smart(timeframe: "last month", topic: "architectural philosophy")
453
+ # Detects conceptual query → uses :vector
454
+ ```
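One possible shape for the detection step is a small rule-based classifier. Everything here is an assumption for illustration: `detect_strategy`, the two regular expressions, and the rule that an all-lowercase phrase counts as "conceptual" are heuristics, not how `recall_smart` is specified.

```ruby
# Hypothetical heuristics for picking a strategy from the topic text.
VERSION_PATTERN = /\b\d+(\.\d+)+\b/ # dotted version numbers such as "17.2"
QUOTED_PATTERN  = /"[^"]+"/         # an explicitly quoted phrase

def detect_strategy(topic)
  # Exact-looking input (versions, quoted phrases) favors keyword matching.
  return :fulltext if topic.match?(VERSION_PATTERN) || topic.match?(QUOTED_PATTERN)
  # Plain lowercase words with no identifiers are treated as conceptual.
  return :vector if topic.match?(/\A[a-z\s]+\z/)

  # Anything else gets the balanced default.
  :hybrid
end

puts detect_strategy("PostgreSQL 17.2")
puts detect_strategy("architectural philosophy")
puts detect_strategy("PostgreSQL performance")
```

Falling back to `:hybrid` when no rule fires keeps misclassification cheap, since hybrid is already the recommended default.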
455
+
456
+ ### Re-ranking Strategies
457
+
458
+ ```ruby
459
+ # Custom re-ranking based on multiple signals
460
+ memories = htm.recall(
461
+ timeframe: "last week",
462
+ topic: "PostgreSQL",
463
+ strategy: :hybrid,
464
+ rerank: [:similarity, :importance, :recency] # Multi-factor scoring
465
+ )
466
+ ```
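A multi-factor score could be a weighted sum of the three signals. The sketch below is one way to do it, under stated assumptions: the weights, the 7-day half-life, and an importance scale of 0 to 10 are all hypothetical choices, and `rerank` operates on plain hashes, not HTM node objects.

```ruby
# Hypothetical weights; they sum to 1.0 so the score stays in 0..1.
WEIGHTS = { similarity: 0.6, importance: 0.25, recency: 0.15 }.freeze
HALF_LIFE_DAYS = 7.0 # recency halves every 7 days (assumed)

# Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life.
def recency_score(created_at, now)
  age_days = (now - created_at) / 86_400.0
  0.5**(age_days / HALF_LIFE_DAYS)
end

# Weighted sum of similarity, normalized importance (assumed 0..10), and recency.
def combined_score(memory, now:)
  WEIGHTS[:similarity] * memory[:similarity] +
    WEIGHTS[:importance] * (memory[:importance] / 10.0) +
    WEIGHTS[:recency] * recency_score(memory[:created_at], now)
end

def rerank(memories, now: Time.now)
  memories.sort_by { |m| -combined_score(m, now: now) }
end

now = Time.utc(2025, 10, 27)
memories = [
  { id: 1, similarity: 0.80, importance: 5.0, created_at: now - 70 * 86_400 },
  { id: 2, similarity: 0.75, importance: 5.0, created_at: now }
]

puts rerank(memories, now: now).map { |m| m[:id] }.inspect
```

In this sample the slightly less similar but fresh memory overtakes the older one, which shows how much influence even a small recency weight can have.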
467
+
468
+ ### Query Expansion
469
+
470
+ ```ruby
471
+ # LLM-powered query expansion
472
+ original = "database"
473
+ expanded = ["database", "PostgreSQL", "TimescaleDB", "SQL", "storage"]
474
+
475
+ memories = htm.recall(
476
+ timeframe: "last month",
477
+ topic: expanded,
478
+ strategy: :fulltext
479
+ )
480
+ ```
481
+
482
+ ### Caching Layer
483
+
484
+ ```ruby
485
+ # Cache embedding generation for common queries
486
+ @embedding_cache = {}
487
+
488
+ def search_cached(query)
489
+ @embedding_cache[query] ||= embedding_service.embed(query)
490
+ end
491
+ ```
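The memoizing hash above grows without bound. A size-limited variant with least-recently-used eviction is sketched below, relying on the fact that Ruby hashes preserve insertion order. `EmbeddingCache` and its `max_size:` parameter are hypothetical, and the block passed to the constructor stands in for `embedding_service.embed`.

```ruby
# Hypothetical bounded cache with least-recently-used eviction.
class EmbeddingCache
  def initialize(max_size: 1000, &embedder)
    @max_size = max_size
    @embedder = embedder
    @store = {} # insertion order doubles as recency order
  end

  def fetch(query)
    if @store.key?(query)
      # Hit: delete and re-insert so this entry becomes the most recent.
      @store[query] = @store.delete(query)
    else
      # Miss: evict the oldest entry when full, then compute and store.
      @store.shift if @store.size >= @max_size
      @store[query] = @embedder.call(query)
    end
  end

  def size
    @store.size
  end
end

calls = 0
cache = EmbeddingCache.new(max_size: 2) do |query|
  calls += 1
  [query.length.to_f] # stand-in for a real embedding
end

cache.fetch("a")
cache.fetch("a")   # served from the cache
cache.fetch("bb")
cache.fetch("a")   # refreshes "a", leaving "bb" as least recently used
cache.fetch("ccc") # evicts "bb"
puts "embedder calls: #{calls}, cached entries: #{cache.size}"
```

This version is not thread-safe; concurrent callers would need a `Mutex` around `fetch`.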
492
+
493
+ ---
494
+
495
+ ## Alternatives Comparison
496
+
497
+ | Approach | Pros | Cons | Decision |
498
+ |----------|------|------|----------|
499
+ | **Hybrid Search** | **Balanced precision + recall** | **Strategy selection** | **ACCEPTED** |
500
+ | Pure Vector Only | Simplest API, semantic | Misses exact matches, slower | Rejected |
501
+ | Pure Full-Text Only | Fast, no embeddings | No semantic understanding | Rejected |
502
+ | LLM-as-Retriever | Most flexible queries | Too slow, expensive | Rejected |
503
+ | Elasticsearch | Dedicated search engine | Additional infrastructure | Rejected |
504
+
505
+ ---
506
+
507
+ ## References
508
+
509
+ - [RAG (Retrieval-Augmented Generation)](https://arxiv.org/abs/2005.11401)
510
+ - [pgvector Documentation](https://github.com/pgvector/pgvector)
511
+ - [PostgreSQL Full-Text Search](https://www.postgresql.org/docs/current/textsearch.html)
512
+ - [HNSW Algorithm](https://arxiv.org/abs/1603.09320)
513
+ - [Hybrid Search Best Practices](https://www.pinecone.io/learn/hybrid-search-intro/)
514
+ - [ADR-001: PostgreSQL Storage](001-postgresql-timescaledb.md)
515
+ - [ADR-003: Ollama Embeddings](003-ollama-embeddings.md) - **Superseded by ADR-011**
516
+ - [ADR-011: Database-Side Embedding Generation with pgai](011-pgai-integration.md) - **Superseded (returned to client-side)**
517
+ - [Search Strategies Guide](../../guides/search-strategies.md)
518
+
519
+ ---
520
+
521
+ ## Review Notes
522
+
523
+ **AI Engineer**: Hybrid search is the right approach for RAG systems. Pre-filter optimization is smart.
524
+
525
+ **Database Architect**: TimescaleDB + pgvector + full-text is well-architected. Consider query plan analysis for optimization.
526
+
527
+ **Performance Specialist**: HNSW and GIN indexes will scale. Monitor embedding latency in production.
528
+
529
+ **Systems Architect**: Three strategies provide good flexibility. Document decision matrix clearly for users.
530
+
531
+ **Ruby Expert**: Clean API design. Consider making `:hybrid` the default value of the `strategy` parameter: `recall(..., strategy: :hybrid)`