htm 0.0.1

Files changed (155)
  1. checksums.yaml +7 -0
  2. data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
  3. data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
  4. data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
  5. data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
  6. data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
  7. data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
  8. data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
  9. data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
  10. data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
  11. data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
  12. data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
  13. data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
  14. data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
  15. data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
  16. data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
  17. data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
  18. data/.architecture/members.yml +144 -0
  19. data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
  20. data/.architecture/reviews/initial-system-analysis.md +330 -0
  21. data/.envrc +32 -0
  22. data/.irbrc +145 -0
  23. data/CHANGELOG.md +150 -0
  24. data/COMMITS.md +196 -0
  25. data/LICENSE +21 -0
  26. data/README.md +1347 -0
  27. data/Rakefile +51 -0
  28. data/SETUP.md +268 -0
  29. data/config/database.yml +67 -0
  30. data/db/migrate/20250101000001_enable_extensions.rb +14 -0
  31. data/db/migrate/20250101000002_create_robots.rb +14 -0
  32. data/db/migrate/20250101000003_create_nodes.rb +42 -0
  33. data/db/migrate/20250101000005_create_tags.rb +38 -0
  34. data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
  35. data/db/schema.sql +473 -0
  36. data/db/seed_data/README.md +100 -0
  37. data/db/seed_data/presidents.md +136 -0
  38. data/db/seed_data/states.md +151 -0
  39. data/db/seeds.rb +208 -0
  40. data/dbdoc/README.md +173 -0
  41. data/dbdoc/public.node_stats.md +48 -0
  42. data/dbdoc/public.node_stats.svg +41 -0
  43. data/dbdoc/public.node_tags.md +40 -0
  44. data/dbdoc/public.node_tags.svg +112 -0
  45. data/dbdoc/public.nodes.md +54 -0
  46. data/dbdoc/public.nodes.svg +118 -0
  47. data/dbdoc/public.nodes_tags.md +39 -0
  48. data/dbdoc/public.nodes_tags.svg +112 -0
  49. data/dbdoc/public.ontology_structure.md +48 -0
  50. data/dbdoc/public.ontology_structure.svg +38 -0
  51. data/dbdoc/public.operations_log.md +42 -0
  52. data/dbdoc/public.operations_log.svg +130 -0
  53. data/dbdoc/public.relationships.md +39 -0
  54. data/dbdoc/public.relationships.svg +41 -0
  55. data/dbdoc/public.robot_activity.md +46 -0
  56. data/dbdoc/public.robot_activity.svg +35 -0
  57. data/dbdoc/public.robots.md +35 -0
  58. data/dbdoc/public.robots.svg +90 -0
  59. data/dbdoc/public.schema_migrations.md +29 -0
  60. data/dbdoc/public.schema_migrations.svg +26 -0
  61. data/dbdoc/public.tags.md +35 -0
  62. data/dbdoc/public.tags.svg +60 -0
  63. data/dbdoc/public.topic_relationships.md +45 -0
  64. data/dbdoc/public.topic_relationships.svg +32 -0
  65. data/dbdoc/schema.json +1437 -0
  66. data/dbdoc/schema.svg +154 -0
  67. data/docs/api/database.md +806 -0
  68. data/docs/api/embedding-service.md +532 -0
  69. data/docs/api/htm.md +797 -0
  70. data/docs/api/index.md +259 -0
  71. data/docs/api/long-term-memory.md +1096 -0
  72. data/docs/api/working-memory.md +665 -0
  73. data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
  74. data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
  75. data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
  76. data/docs/architecture/adrs/004-hive-mind.md +437 -0
  77. data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
  78. data/docs/architecture/adrs/006-context-assembly.md +496 -0
  79. data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
  80. data/docs/architecture/adrs/008-robot-identification.md +625 -0
  81. data/docs/architecture/adrs/009-never-forget.md +648 -0
  82. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
  83. data/docs/architecture/adrs/011-pgai-integration.md +494 -0
  84. data/docs/architecture/adrs/index.md +215 -0
  85. data/docs/architecture/hive-mind.md +736 -0
  86. data/docs/architecture/index.md +351 -0
  87. data/docs/architecture/overview.md +538 -0
  88. data/docs/architecture/two-tier-memory.md +873 -0
  89. data/docs/assets/css/custom.css +83 -0
  90. data/docs/assets/images/htm-core-components.svg +63 -0
  91. data/docs/assets/images/htm-database-schema.svg +93 -0
  92. data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
  93. data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
  94. data/docs/assets/images/htm-layered-architecture.svg +71 -0
  95. data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
  96. data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
  97. data/docs/assets/images/htm.jpg +0 -0
  98. data/docs/assets/images/htm_demo.gif +0 -0
  99. data/docs/assets/js/mathjax.js +18 -0
  100. data/docs/assets/videos/htm_video.mp4 +0 -0
  101. data/docs/database_rake_tasks.md +322 -0
  102. data/docs/development/contributing.md +787 -0
  103. data/docs/development/index.md +336 -0
  104. data/docs/development/schema.md +596 -0
  105. data/docs/development/setup.md +719 -0
  106. data/docs/development/testing.md +819 -0
  107. data/docs/guides/adding-memories.md +824 -0
  108. data/docs/guides/context-assembly.md +1009 -0
  109. data/docs/guides/getting-started.md +577 -0
  110. data/docs/guides/index.md +118 -0
  111. data/docs/guides/long-term-memory.md +941 -0
  112. data/docs/guides/multi-robot.md +866 -0
  113. data/docs/guides/recalling-memories.md +927 -0
  114. data/docs/guides/search-strategies.md +953 -0
  115. data/docs/guides/working-memory.md +717 -0
  116. data/docs/index.md +214 -0
  117. data/docs/installation.md +477 -0
  118. data/docs/multi_framework_support.md +519 -0
  119. data/docs/quick-start.md +655 -0
  120. data/docs/setup_local_database.md +302 -0
  121. data/docs/using_rake_tasks_in_your_app.md +383 -0
  122. data/examples/basic_usage.rb +93 -0
  123. data/examples/cli_app/README.md +317 -0
  124. data/examples/cli_app/htm_cli.rb +270 -0
  125. data/examples/custom_llm_configuration.rb +183 -0
  126. data/examples/example_app/Rakefile +71 -0
  127. data/examples/example_app/app.rb +206 -0
  128. data/examples/sinatra_app/Gemfile +21 -0
  129. data/examples/sinatra_app/app.rb +335 -0
  130. data/lib/htm/active_record_config.rb +113 -0
  131. data/lib/htm/configuration.rb +342 -0
  132. data/lib/htm/database.rb +594 -0
  133. data/lib/htm/embedding_service.rb +115 -0
  134. data/lib/htm/errors.rb +34 -0
  135. data/lib/htm/job_adapter.rb +154 -0
  136. data/lib/htm/jobs/generate_embedding_job.rb +65 -0
  137. data/lib/htm/jobs/generate_tags_job.rb +82 -0
  138. data/lib/htm/long_term_memory.rb +965 -0
  139. data/lib/htm/models/node.rb +109 -0
  140. data/lib/htm/models/node_tag.rb +33 -0
  141. data/lib/htm/models/robot.rb +52 -0
  142. data/lib/htm/models/tag.rb +76 -0
  143. data/lib/htm/railtie.rb +76 -0
  144. data/lib/htm/sinatra.rb +157 -0
  145. data/lib/htm/tag_service.rb +135 -0
  146. data/lib/htm/tasks.rb +38 -0
  147. data/lib/htm/version.rb +5 -0
  148. data/lib/htm/working_memory.rb +182 -0
  149. data/lib/htm.rb +400 -0
  150. data/lib/tasks/db.rake +19 -0
  151. data/lib/tasks/htm.rake +147 -0
  152. data/lib/tasks/jobs.rake +312 -0
  153. data/mkdocs.yml +190 -0
  154. data/scripts/install_local_database.sh +309 -0
  155. metadata +341 -0
@@ -0,0 +1,443 @@
1
+ # ADR-005: RAG-Based Retrieval with Hybrid Search
2
+
3
+ **Status**: Accepted
4
+
5
+ **Date**: 2025-10-25
6
+
7
+ **Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)
8
+
9
+ ---
10
+
11
+ ## ⚠️ UPDATE (2025-10-28)
12
+
13
+ **References to TimescaleDB optimization in this ADR are now historical.**
14
+
15
+ After initial struggles with database configuration, the decision was made to drop the TimescaleDB extension as it was not providing sufficient value for the current proof-of-concept applications. The RAG-based retrieval strategies remain unchanged, but temporal filtering now uses standard PostgreSQL B-tree indexes instead of TimescaleDB hypertable partitioning.
16
+
17
+ See [ADR-001](001-use-postgresql-timescaledb-storage.md) for details on the TimescaleDB removal.
18
+
19
+ ---
20
+
21
+ ## Context
22
+
23
+ Traditional memory systems for LLMs face challenges in retrieving relevant information:
24
+
25
+ - **Keyword-only search**: Misses semantic relationships ("car" vs "automobile")
26
+ - **Vector-only search**: May miss exact keyword matches ("PostgreSQL 17.2" vs "database")
27
+ - **No temporal context**: Doesn't leverage time-based relevance
28
+ - **Scalability**: Simple linear scans don't scale to thousands of memories
29
+
30
+ HTM needs intelligent retrieval that balances:
31
+
32
+ - Semantic understanding (what does the query mean?)
33
+ - Keyword precision (exact term matching)
34
+ - Temporal relevance (recent vs historical context)
35
+ - Performance (fast retrieval from large datasets)
36
+
37
+ Alternative approaches:
38
+
39
+ 1. **Pure vector search**: Semantic only, no keyword precision
40
+ 2. **Pure full-text search**: Keywords only, no semantic understanding
41
+ 3. **Hybrid search**: Combine vector + full-text + temporal filtering
42
+ 4. **LLM-as-retriever**: Use LLM to generate retrieval queries (slow, expensive)
43
+
44
+ ## Decision
45
+
46
+ We will implement **RAG-based retrieval with three search strategies**: vector, full-text, and hybrid, all with temporal filtering.
47
+
48
+ ### Search Strategies
49
+
50
+ **1. Vector Search (`:vector`)**
51
+
52
+ - Generate embedding for query
53
+ - Compute cosine similarity with stored embeddings
54
+ - Temporal filtering on timeframe
55
+ - Best for: Semantic queries, conceptual relationships
56
+
57
+ **2. Full-Text Search (`:fulltext`)**
58
+
59
+ - PostgreSQL `to_tsvector` and `plainto_tsquery`
60
+ - `ts_rank` scoring for relevance
61
+ - Temporal filtering on timeframe
62
+ - Best for: Exact keywords, technical terms, proper nouns
63
+
64
+ **3. Hybrid Search (`:hybrid`)**
65
+
66
+ - Full-text pre-filter to get candidates (top 100)
67
+ - Vector reranking of candidates for semantic relevance
68
+ - Temporal filtering on timeframe
69
+ - Best for: Balanced retrieval with precision + recall
70
+
71
+ ### Default Strategy
72
+
73
+ **Hybrid** is recommended for most use cases as it provides the best balance of semantic understanding and keyword precision.
74
+
75
+ ## Rationale
76
+
77
+ ### Why RAG-Based Retrieval?
78
+
79
+ **Temporal filtering is foundational**:
80
+
81
+ - "What did we discuss last week?" - time is the primary filter
82
+ - Recent context often more relevant than old context
83
+ - TimescaleDB optimized for time-range queries
84
+
85
+ **Semantic search handles synonyms**:
86
+
87
+ - User says "database", finds memories about "PostgreSQL"
88
+ - "Bug fix" matches "resolved issue"
89
+ - Captures conceptual relationships
90
+
91
+ **Full-text handles precision**:
92
+
93
+ - "PostgreSQL 17.2" needs exact version match
94
+ - Technical terminology like "pgvector", "HNSW"
95
+ - Proper nouns like robot names, project names
96
+
97
+ **Hybrid combines strengths**:
98
+
99
+ - Pre-filter with keywords reduces vector search space
100
+ - Vector reranking improves relevance of keyword matches
101
+ - Avoids false positives from pure vector search
102
+ - Avoids missing results from pure keyword search
103
+
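The pre-filter-then-rerank idea can be illustrated with a small in-memory sketch. This is not HTM's implementation: the `Memory` struct, the three-dimensional embeddings, and `hybrid_search` are hypothetical, and a simple substring check stands in for PostgreSQL full-text matching.

```ruby
# Toy in-memory illustration of hybrid retrieval: keyword pre-filter, then vector rerank.
# Memory, hybrid_search, and the sample embeddings are hypothetical, not part of HTM.
Memory = Struct.new(:text, :embedding)

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

def hybrid_search(memories, keywords:, query_embedding:, prefilter_limit: 100, limit: 20)
  # Step 1: keep only memories that contain every keyword (case-insensitive).
  matching = memories.select { |m| keywords.all? { |k| m.text.downcase.include?(k.downcase) } }
  candidates = matching.first(prefilter_limit)

  # Step 2: rerank the candidates by cosine similarity to the query embedding.
  scored = candidates.map { |m| [m, cosine_similarity(m.embedding, query_embedding)] }
  scored.sort_by { |_, score| -score }.first(limit)
end

MEMORIES = [
  Memory.new("PostgreSQL index tuning notes", [0.9, 0.1, 0.0]),
  Memory.new("PostgreSQL upgrade checklist",  [0.2, 0.9, 0.1]),
  Memory.new("Lunch order for Friday",        [0.0, 0.1, 0.9])
].freeze

RESULTS = hybrid_search(MEMORIES, keywords: ["postgresql"], query_embedding: [1.0, 0.0, 0.0])
RESULTS.each { |memory, score| puts format("%.3f  %s", score, memory.text) }
```

The keyword step removes the unrelated memory before any similarity is computed, and the vector step orders the two remaining candidates by closeness to the query.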
104
+ ### Implementation Details
105
+
106
+ ```ruby
107
+ # Vector search
108
+ def search(timeframe:, query:, limit:, embedding_service:)
109
+ query_embedding = embedding_service.embed(query)
110
+
111
+ SELECT *, 1 - (embedding <=> $1::vector) as similarity
112
+ FROM nodes
113
+ WHERE created_at BETWEEN $2 AND $3
114
+ ORDER BY embedding <=> $1::vector
115
+ LIMIT $4
116
+ end
117
+ ```
118
+
119
+ ```ruby
120
+ # Full-text search
121
+ def search_fulltext(timeframe:, query:, limit:)
122
+ SELECT *, ts_rank(to_tsvector('english', value), plainto_tsquery('english', $1)) as rank
123
+ FROM nodes
124
+ WHERE created_at BETWEEN $2 AND $3
125
+ AND to_tsvector('english', value) @@ plainto_tsquery('english', $1)
126
+ ORDER BY rank DESC
127
+ LIMIT $4
128
+ end
129
+ ```
130
+
131
+ ```ruby
132
+ # Hybrid search
133
+ def search_hybrid(timeframe:, query:, limit:, embedding_service:, prefilter_limit: 100)
134
+ query_embedding = embedding_service.embed(query)
135
+
136
+ WITH candidates AS (
137
+ SELECT *
138
+ FROM nodes
139
+ WHERE created_at BETWEEN $2 AND $3
140
+ AND to_tsvector('english', value) @@ plainto_tsquery('english', $1)
141
+ LIMIT $5 -- Pre-filter to 100 candidates
142
+ )
143
+ SELECT *, 1 - (embedding <=> $4::vector) as similarity
144
+ FROM candidates
145
+ ORDER BY embedding <=> $4::vector
146
+ LIMIT $6 -- Final top results
147
+ end
148
+ ```
149
+
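The snippets above mix Ruby and SQL for illustration. One way to make them executable, assuming a driver that accepts a SQL string plus positional bind parameters, is to build `[sql, params]` pairs. The helper names `vector_literal`, `build_vector_query`, and `build_hybrid_query` are hypothetical and are not HTM methods.

```ruby
# Sketch: return [sql, params] pairs that a PostgreSQL driver could execute.
# All helper names here are hypothetical; only the SQL mirrors the ADR.

# pgvector accepts a text literal such as "[0.1,0.2]" cast to ::vector.
def vector_literal(embedding)
  "[#{embedding.join(',')}]"
end

def build_vector_query(timeframe:, query_embedding:, limit:)
  sql = <<~SQL
    SELECT *, 1 - (embedding <=> $1::vector) AS similarity
    FROM nodes
    WHERE created_at BETWEEN $2 AND $3
    ORDER BY embedding <=> $1::vector
    LIMIT $4
  SQL
  [sql, [vector_literal(query_embedding), timeframe.begin, timeframe.end, limit]]
end

def build_hybrid_query(timeframe:, query:, query_embedding:, limit:, prefilter_limit: 100)
  sql = <<~SQL
    WITH candidates AS (
      SELECT *
      FROM nodes
      WHERE created_at BETWEEN $2 AND $3
        AND to_tsvector('english', value) @@ plainto_tsquery('english', $1)
      LIMIT $5
    )
    SELECT *, 1 - (embedding <=> $4::vector) AS similarity
    FROM candidates
    ORDER BY embedding <=> $4::vector
    LIMIT $6
  SQL
  # Parameter order must match the $1..$6 placeholders above.
  params = [query, timeframe.begin, timeframe.end,
            vector_literal(query_embedding), prefilter_limit, limit]
  [sql, params]
end
```

With the `pg` gem, such a pair could be passed to `PG::Connection#exec_params(sql, params)`; depending on the driver's type mapping, the `Time` values may need to be converted to strings first.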
150
+ ### User API
151
+
152
+ ```ruby
153
+ # Use hybrid search (recommended)
154
+ memories = htm.recall(
155
+   timeframe: "last week",
156
+   topic: "PostgreSQL performance",
157
+   limit: 20,
158
+   strategy: :hybrid # recommended default
159
+ )
160
+
161
+ # Use pure vector search
162
+ memories = htm.recall(
163
+   timeframe: "last month",
164
+   topic: "database design philosophy",
165
+   strategy: :vector # best for conceptual queries
166
+ )
167
+
168
+ # Use pure full-text search
169
+ memories = htm.recall(
170
+   timeframe: "yesterday",
171
+   topic: "PostgreSQL 17.2 upgrade",
172
+   strategy: :fulltext # best for exact keywords
173
+ )
174
+ ```
175
+
176
+ ## Consequences
177
+
178
+ ### Positive
179
+
180
+ ✅ **Flexible retrieval**: Choose strategy based on query type
181
+ ✅ **Temporal context**: Time-range filtering built into all strategies
182
+ ✅ **Semantic understanding**: Vector search captures relationships
183
+ ✅ **Keyword precision**: Full-text search handles exact matches
184
+ ✅ **Balanced hybrid**: Best of both worlds with pre-filter optimization
185
+ ✅ **Scalable**: HNSW indexing on vectors, GIN indexing on tsvectors
186
+ ✅ **Transparent scoring**: Return similarity/rank scores for debugging
187
+
188
+ ### Negative
189
+
190
+ ❌ **Complexity**: Three strategies to understand and choose from
191
+ ❌ **Embedding latency**: Vector/hybrid require embedding generation
192
+ ❌ **Storage overhead**: Both embeddings and full-text indexes
193
+ ❌ **English-only**: Full-text optimized for English language
194
+ ❌ **Tuning required**: Hybrid prefilter_limit may need adjustment
195
+
196
+ ### Neutral
197
+
198
+ ➡️ **Strategy selection**: User must choose appropriate strategy
199
+ ➡️ **Timeframe parsing**: Natural language time parsing adds complexity
200
+ ➡️ **Embedding consistency**: Different embedding models produce different results
201
+
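To make the timeframe-parsing point concrete, here is a minimal sketch that maps a few phrases to a `Range` of `Time` values. `parse_timeframe` is a hypothetical helper; the supported phrases and the rolling-window interpretation (for example, "yesterday" as the previous 24 hours) are simplifying assumptions, and HTM's actual parser may behave differently.

```ruby
# Sketch of natural-language timeframe parsing. parse_timeframe is hypothetical;
# phrases are treated as rolling windows that end at `now`.
SECONDS_PER_DAY = 24 * 60 * 60

def parse_timeframe(phrase, now: Time.now)
  days = case phrase.downcase.strip
         when "yesterday"              then 1
         when "last week", "this week" then 7
         when "last month"             then 30
         else raise ArgumentError, "unsupported timeframe: #{phrase.inspect}"
         end
  (now - days * SECONDS_PER_DAY)..now
end
```

The resulting range supplies the two bounds for the `created_at BETWEEN $2 AND $3` filter used by all three strategies.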
202
+ ## Design Decisions
203
+
204
+ ### Decision: Three Strategies Instead of One
205
+ **Rationale**: Different queries benefit from different approaches. Give users flexibility.
206
+
207
+ **Alternative**: Single hybrid strategy for all queries
208
+ **Rejected**: Forces hybrid approach even when pure vector or full-text is better
209
+
210
+ ### Decision: Temporal Filtering is Mandatory
211
+ **Rationale**: HTM is time-oriented. All retrieval should consider temporal context.
212
+
213
+ **Alternative**: Optional timeframe parameter
214
+ **Rejected**: Easy to forget, defeats TimescaleDB optimization benefits
215
+
216
+ ### Decision: Hybrid Pre-filter Limit = 100
217
+ **Rationale**: Balances recall (enough candidates) with performance (vector search cost)
218
+
219
+ **Alternative**: Dynamic limit based on result count
220
+ **Deferred**: Can optimize later based on real-world usage patterns
221
+
222
+ ### Decision: Return Similarity/Rank Scores
223
+ **Rationale**: Enables debugging, threshold filtering, and understanding retrieval quality
224
+
225
+ **Alternative**: Just return nodes without scores
226
+ **Rejected**: Lose valuable signal for debugging and optimization
227
+
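As an example of what returned scores enable, a caller could drop weak matches with a threshold. This sketch assumes result rows are hashes with a `"similarity"` key, as the vector and hybrid queries would produce; `filter_by_similarity` and the 0.5 cutoff are hypothetical.

```ruby
# Sketch: threshold filtering on returned similarity scores.
# Rows are assumed to be hashes with a "similarity" key; drivers often return
# numeric columns as strings, hence the to_f conversion.
def filter_by_similarity(rows, min_similarity: 0.5)
  rows.select { |row| row["similarity"].to_f >= min_similarity }
end
```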
228
+ ## Use Cases
229
+
230
+ ### Use Case 1: Semantic Concept Retrieval
231
+ ```ruby
232
+ # Query: "What architectural decisions have we made?"
233
+ # Best strategy: :vector (semantic concept matching)
234
+
235
+ memories = htm.recall(
236
+   timeframe: "last month",
237
+   topic: "architectural decisions design choices",
238
+   strategy: :vector
239
+ )
240
+
241
+ # Finds: "We decided to use PostgreSQL", "Chose two-tier memory model", etc.
242
+ # Matches conceptually even without exact keywords
243
+ ```
244
+
245
+ ### Use Case 2: Exact Technical Term
246
+ ```ruby
247
+ # Query: "Find all mentions of PostgreSQL 17.2"
248
+ # Best strategy: :fulltext (exact version number)
249
+
250
+ memories = htm.recall(
251
+   timeframe: "this week",
252
+   topic: "PostgreSQL 17.2",
253
+   strategy: :fulltext
254
+ )
255
+
256
+ # Finds: Exact "PostgreSQL 17.2" mentions
257
+ # Avoids false matches to "PostgreSQL 16" or generic "database"
258
+ ```
259
+
260
+ ### Use Case 3: Balanced Query
261
+ ```ruby
262
+ # Query: "What did we discuss about database performance?"
263
+ # Best strategy: :hybrid (keyword + semantic)
264
+
265
+ memories = htm.recall(
266
+   timeframe: "last week",
267
+   topic: "database performance optimization",
268
+   strategy: :hybrid
269
+ )
270
+
271
+ # Pre-filters: Documents containing all of "database", "performance", "optimization" (plainto_tsquery ANDs the stemmed terms)
272
+ # Reranks: By semantic similarity to full query
273
+ # Result: Best balance of precision + recall
274
+ ```
275
+
276
+ ### Use Case 4: Conversation Timeline
277
+ ```ruby
278
+ # Get chronological conversation about a topic
279
+ timeline = htm.conversation_timeline("HTM design", limit: 50)
280
+
281
+ # Returns memories sorted by created_at
282
+ # Useful for replaying decision evolution over time
283
+ ```
284
+
285
+ ## Performance Characteristics
286
+
287
+ ### Vector Search
288
+
289
+ - **Latency**: ~10-50ms for embedding generation + index lookup
290
+ - **Index**: HNSW (Hierarchical Navigable Small World)
291
+ - **Scalability**: Approximately O(log n) per query with HNSW (sublinear in the number of stored vectors)
292
+ - **Best case**: Conceptual queries, semantic relationships
293
+
294
+ ### Full-Text Search
295
+
296
+ - **Latency**: ~5-20ms (no embedding generation)
297
+ - **Index**: GIN (Generalized Inverted Index) on tsvector
298
+ - **Scalability**: O(log n) with GIN index
299
+ - **Best case**: Exact keywords, technical terms
300
+
301
+ ### Hybrid Search
302
+
303
+ - **Latency**: Full-text pre-filter + vector reranking
304
+ - **Total**: ~15-70ms (faster than pure vector on large datasets)
305
+ - **Optimization**: Pre-filter reduces vector search space
306
+ - **Best case**: Large datasets where full-text can narrow candidates
307
+
308
+ ### Temporal Filtering
309
+
310
+ - **Optimization**: TimescaleDB hypertable partitioning by time
311
+ - **Index**: B-tree on `created_at` column
312
+ - **Benefit**: Prunes partitions outside timeframe, faster scans
313
+
314
+ ## Risks and Mitigations
315
+
316
+ ### Risk: Wrong Strategy Selection
317
+
318
+ - **Risk**: User chooses vector for exact keyword query (poor results)
319
+ - **Likelihood**: Medium (requires understanding differences)
320
+ - **Impact**: Medium (degraded retrieval quality)
321
+ - **Mitigation**:
322
+   - Default to hybrid for balanced results
323
+   - Document use cases clearly
324
+   - Provide examples in API docs
325
+   - Consider auto-detection in future
326
+
327
+ ### Risk: Embedding Latency
328
+
329
+ - **Risk**: Vector/hybrid slow due to embedding generation
330
+ - **Likelihood**: High (embedding is I/O bound)
331
+ - **Impact**: Medium (100-500ms for Ollama)
332
+ - **Mitigation**:
333
+   - Cache embeddings for common queries (future)
334
+   - Use fast local embedding models (gpt-oss)
335
+   - Provide fallback to full-text if embedding fails
336
+
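The fallback mitigation could look roughly like the sketch below. `EmbeddingError`, `recall_with_fallback`, and the three callables are hypothetical stand-ins for the embedding service and the two search paths; the ADR does not specify how HTM signals an embedding failure.

```ruby
# Sketch: fall back to full-text search when embedding generation fails.
# EmbeddingError and recall_with_fallback are hypothetical names.
class EmbeddingError < StandardError; end

def recall_with_fallback(query, embed:, vector_search:, fulltext_search:)
  embedding = embed.call(query)
  { strategy: :vector, results: vector_search.call(embedding) }
rescue EmbeddingError
  # The embedding provider is unavailable; keyword search needs no embedding.
  { strategy: :fulltext, results: fulltext_search.call(query) }
end
```

Returning the strategy that was actually used keeps the degradation visible to the caller instead of silently changing result quality.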
337
+ ### Risk: Language Limitation
338
+
339
+ - **Risk**: Full-text search is configured for English only
340
+ - **Likelihood**: Low (single-user, likely English)
341
+ - **Impact**: High for non-English users
342
+ - **Mitigation**:
343
+   - Document English assumption
344
+   - Support language parameter in future
345
+   - Vector search does not depend on the `'english'` text-search configuration, so it can serve other languages if the embedding model supports them
346
+
347
+ ### Risk: Pre-filter Misses Results
348
+
349
+ - **Risk**: Hybrid pre-filter (100) misses relevant candidates
350
+ - **Likelihood**: Low (100 is generous)
351
+ - **Impact**: Medium (reduced recall)
352
+ - **Mitigation**:
353
+   - Make `prefilter_limit` configurable
354
+   - Monitor recall metrics in practice
355
+   - Adjust default if needed
356
+
357
+ ## Future Enhancements
358
+
359
+ ### Query Auto-Detection
360
+ ```ruby
361
+ # Automatically choose strategy based on query
362
+ htm.recall_smart(timeframe: "last week", topic: "PostgreSQL 17.2")
363
+ # Detects version number → uses :fulltext
364
+
365
+ htm.recall_smart(timeframe: "last month", topic: "architectural philosophy")
366
+ # Detects conceptual query → uses :vector
367
+ ```
368
+
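A possible starting point for auto-detection is a small rule-based classifier. The rules below (version numbers or quoted phrases select `:fulltext`; multi-word queries without digits or acronyms select `:vector`; everything else selects `:hybrid`) are illustrative assumptions, and `detect_strategy` is a hypothetical helper rather than planned HTM behavior.

```ruby
# Heuristic sketch only; detect_strategy and its rules are assumptions.
VERSION_PATTERN = /\b\d+(\.\d+)+\b/ # e.g. "17.2", "1.0.3"
QUOTED_PATTERN  = /"[^"]+"/         # explicitly quoted phrase

def detect_strategy(topic)
  # Exact tokens such as version numbers favor keyword matching.
  return :fulltext if topic.match?(VERSION_PATTERN) || topic.match?(QUOTED_PATTERN)
  # Multi-word queries with no digits or acronyms are treated as conceptual.
  return :vector if topic.split.length >= 2 && topic !~ /[A-Z]{2,}|\d/
  :hybrid
end
```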
369
+ ### Re-ranking Strategies
370
+ ```ruby
371
+ # Custom re-ranking based on multiple signals
372
+ memories = htm.recall(
373
+   timeframe: "last week",
374
+   topic: "PostgreSQL",
375
+   strategy: :hybrid,
376
+   rerank: [:similarity, :importance, :recency] # Multi-factor scoring
377
+ )
378
+ ```
379
+
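One way multi-factor scoring might combine these signals is a weighted sum. In this sketch the weights, the 0..1 importance scale, and the exponential recency decay with a 7-day half-life are all illustrative assumptions; `Candidate`, `recency_score`, `combined_score`, and `rerank` are hypothetical names.

```ruby
# Sketch of multi-factor reranking; weights and decay are illustrative assumptions.
Candidate = Struct.new(:text, :similarity, :importance, :created_at)

WEIGHTS = { similarity: 0.6, importance: 0.25, recency: 0.15 }.freeze
HALF_LIFE_SECONDS = 7 * 24 * 60 * 60.0

# 1.0 for a brand-new memory, halving every HALF_LIFE_SECONDS.
def recency_score(created_at, now)
  age = [now - created_at, 0.0].max
  0.5**(age / HALF_LIFE_SECONDS)
end

def combined_score(candidate, now)
  WEIGHTS[:similarity] * candidate.similarity +
    WEIGHTS[:importance] * candidate.importance +
    WEIGHTS[:recency] * recency_score(candidate.created_at, now)
end

# Highest combined score first.
def rerank(candidates, now: Time.now)
  candidates.sort_by { |c| -combined_score(c, now) }
end
```

With these weights, a slightly less similar but recent memory can outrank an older one, which matches the ADR's observation that recent context is often more relevant.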
380
+ ### Query Expansion
381
+ ```ruby
382
+ # LLM-powered query expansion
383
+ original = "database"
384
+ expanded = ["database", "PostgreSQL", "TimescaleDB", "SQL", "storage"]
385
+
386
+ memories = htm.recall(
387
+   timeframe: "last month",
388
+   topic: expanded,
389
+   strategy: :fulltext
390
+ )
391
+ ```
392
+
393
+ ### Caching Layer
394
+ ```ruby
395
+ # Cache embedding generation for common queries
396
+ @embedding_cache = {}
397
+
398
+ def search_cached(query)
399
+   @embedding_cache[query] ||= embedding_service.embed(query)
400
+ end
401
+ ```
402
+
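An unbounded hash grows with every distinct query, so a real cache would likely need a size limit. The sketch below adds least-recently-used eviction, relying on the fact that Ruby hashes preserve insertion order. `EmbeddingCache` is a hypothetical class, and the block passed to `fetch` stands in for a call such as `embedding_service.embed(query)`.

```ruby
# Sketch of a size-bounded LRU embedding cache. EmbeddingCache is hypothetical.
class EmbeddingCache
  def initialize(max_size: 1000)
    @max_size = max_size
    @store = {} # insertion order doubles as recency order
  end

  # Returns the cached embedding, or computes it with the given block on a miss.
  def fetch(query)
    if @store.key?(query)
      embedding = @store.delete(query) # re-inserted below as most recently used
    else
      embedding = yield(query)
      @store.shift if @store.size >= @max_size # evict the least recently used entry
    end
    @store[query] = embedding
  end

  def size
    @store.size
  end
end
```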
403
+ ## Alternatives Considered
404
+
405
+ ### Pure Vector Search Only
406
+ **Pros**: Simplest API, semantic by default
407
+ **Cons**: Misses exact keyword matches, slower on large datasets
408
+ **Decision**: ❌ Rejected - need keyword precision
409
+
410
+ ### Pure Full-Text Only
411
+ **Pros**: Fast, no embedding overhead
412
+ **Cons**: No semantic understanding, synonym issues
413
+ **Decision**: ❌ Rejected - semantic understanding essential for LLMs
414
+
415
+ ### LLM-as-Retriever
416
+ **Pros**: Most flexible, natural language queries
417
+ **Cons**: Expensive, slow, requires online LLM
418
+ **Decision**: ❌ Rejected - too slow and expensive for retrieval path
419
+
420
+ ### Elasticsearch/Meilisearch
421
+ **Pros**: Dedicated search engines, advanced features
422
+ **Cons**: Additional infrastructure, complexity, cost
423
+ **Decision**: ❌ Rejected - PostgreSQL sufficient for v1, consolidation benefits
424
+
425
+ ## References
426
+
427
+ - [RAG (Retrieval-Augmented Generation)](https://arxiv.org/abs/2005.11401)
428
+ - [pgvector Documentation](https://github.com/pgvector/pgvector)
429
+ - [PostgreSQL Full-Text Search](https://www.postgresql.org/docs/current/textsearch.html)
430
+ - [HNSW Algorithm](https://arxiv.org/abs/1603.09320)
431
+ - [Hybrid Search Best Practices](https://www.pinecone.io/learn/hybrid-search-intro/)
432
+
433
+ ## Review Notes
434
+
435
+ **AI Engineer**: ✅ Hybrid search is the right approach for RAG systems. Pre-filter optimization is smart.
436
+
437
+ **Database Architect**: ✅ TimescaleDB + pgvector + full-text is well-architected. Consider query plan analysis for optimization.
438
+
439
+ **Performance Specialist**: ✅ HNSW and GIN indexes will scale. Monitor embedding latency in production.
440
+
441
+ **Systems Architect**: ✅ Three strategies provide good flexibility. Document decision matrix clearly for users.
442
+
443
+ **Ruby Expert**: ✅ Clean API design. Consider making `:hybrid` the default value of the `strategy` parameter: `recall(..., strategy: :hybrid)`