htm 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
- data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
- data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
- data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
- data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
- data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
- data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
- data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
- data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
- data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
- data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
- data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
- data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
- data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
- data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
- data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
- data/.architecture/members.yml +144 -0
- data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
- data/.architecture/reviews/initial-system-analysis.md +330 -0
- data/.envrc +32 -0
- data/.irbrc +145 -0
- data/CHANGELOG.md +150 -0
- data/COMMITS.md +196 -0
- data/LICENSE +21 -0
- data/README.md +1347 -0
- data/Rakefile +51 -0
- data/SETUP.md +268 -0
- data/config/database.yml +67 -0
- data/db/migrate/20250101000001_enable_extensions.rb +14 -0
- data/db/migrate/20250101000002_create_robots.rb +14 -0
- data/db/migrate/20250101000003_create_nodes.rb +42 -0
- data/db/migrate/20250101000005_create_tags.rb +38 -0
- data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
- data/db/schema.sql +473 -0
- data/db/seed_data/README.md +100 -0
- data/db/seed_data/presidents.md +136 -0
- data/db/seed_data/states.md +151 -0
- data/db/seeds.rb +208 -0
- data/dbdoc/README.md +173 -0
- data/dbdoc/public.node_stats.md +48 -0
- data/dbdoc/public.node_stats.svg +41 -0
- data/dbdoc/public.node_tags.md +40 -0
- data/dbdoc/public.node_tags.svg +112 -0
- data/dbdoc/public.nodes.md +54 -0
- data/dbdoc/public.nodes.svg +118 -0
- data/dbdoc/public.nodes_tags.md +39 -0
- data/dbdoc/public.nodes_tags.svg +112 -0
- data/dbdoc/public.ontology_structure.md +48 -0
- data/dbdoc/public.ontology_structure.svg +38 -0
- data/dbdoc/public.operations_log.md +42 -0
- data/dbdoc/public.operations_log.svg +130 -0
- data/dbdoc/public.relationships.md +39 -0
- data/dbdoc/public.relationships.svg +41 -0
- data/dbdoc/public.robot_activity.md +46 -0
- data/dbdoc/public.robot_activity.svg +35 -0
- data/dbdoc/public.robots.md +35 -0
- data/dbdoc/public.robots.svg +90 -0
- data/dbdoc/public.schema_migrations.md +29 -0
- data/dbdoc/public.schema_migrations.svg +26 -0
- data/dbdoc/public.tags.md +35 -0
- data/dbdoc/public.tags.svg +60 -0
- data/dbdoc/public.topic_relationships.md +45 -0
- data/dbdoc/public.topic_relationships.svg +32 -0
- data/dbdoc/schema.json +1437 -0
- data/dbdoc/schema.svg +154 -0
- data/docs/api/database.md +806 -0
- data/docs/api/embedding-service.md +532 -0
- data/docs/api/htm.md +797 -0
- data/docs/api/index.md +259 -0
- data/docs/api/long-term-memory.md +1096 -0
- data/docs/api/working-memory.md +665 -0
- data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
- data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
- data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
- data/docs/architecture/adrs/004-hive-mind.md +437 -0
- data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
- data/docs/architecture/adrs/006-context-assembly.md +496 -0
- data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
- data/docs/architecture/adrs/008-robot-identification.md +625 -0
- data/docs/architecture/adrs/009-never-forget.md +648 -0
- data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
- data/docs/architecture/adrs/011-pgai-integration.md +494 -0
- data/docs/architecture/adrs/index.md +215 -0
- data/docs/architecture/hive-mind.md +736 -0
- data/docs/architecture/index.md +351 -0
- data/docs/architecture/overview.md +538 -0
- data/docs/architecture/two-tier-memory.md +873 -0
- data/docs/assets/css/custom.css +83 -0
- data/docs/assets/images/htm-core-components.svg +63 -0
- data/docs/assets/images/htm-database-schema.svg +93 -0
- data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
- data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
- data/docs/assets/images/htm-layered-architecture.svg +71 -0
- data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
- data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
- data/docs/assets/images/htm.jpg +0 -0
- data/docs/assets/images/htm_demo.gif +0 -0
- data/docs/assets/js/mathjax.js +18 -0
- data/docs/assets/videos/htm_video.mp4 +0 -0
- data/docs/database_rake_tasks.md +322 -0
- data/docs/development/contributing.md +787 -0
- data/docs/development/index.md +336 -0
- data/docs/development/schema.md +596 -0
- data/docs/development/setup.md +719 -0
- data/docs/development/testing.md +819 -0
- data/docs/guides/adding-memories.md +824 -0
- data/docs/guides/context-assembly.md +1009 -0
- data/docs/guides/getting-started.md +577 -0
- data/docs/guides/index.md +118 -0
- data/docs/guides/long-term-memory.md +941 -0
- data/docs/guides/multi-robot.md +866 -0
- data/docs/guides/recalling-memories.md +927 -0
- data/docs/guides/search-strategies.md +953 -0
- data/docs/guides/working-memory.md +717 -0
- data/docs/index.md +214 -0
- data/docs/installation.md +477 -0
- data/docs/multi_framework_support.md +519 -0
- data/docs/quick-start.md +655 -0
- data/docs/setup_local_database.md +302 -0
- data/docs/using_rake_tasks_in_your_app.md +383 -0
- data/examples/basic_usage.rb +93 -0
- data/examples/cli_app/README.md +317 -0
- data/examples/cli_app/htm_cli.rb +270 -0
- data/examples/custom_llm_configuration.rb +183 -0
- data/examples/example_app/Rakefile +71 -0
- data/examples/example_app/app.rb +206 -0
- data/examples/sinatra_app/Gemfile +21 -0
- data/examples/sinatra_app/app.rb +335 -0
- data/lib/htm/active_record_config.rb +113 -0
- data/lib/htm/configuration.rb +342 -0
- data/lib/htm/database.rb +594 -0
- data/lib/htm/embedding_service.rb +115 -0
- data/lib/htm/errors.rb +34 -0
- data/lib/htm/job_adapter.rb +154 -0
- data/lib/htm/jobs/generate_embedding_job.rb +65 -0
- data/lib/htm/jobs/generate_tags_job.rb +82 -0
- data/lib/htm/long_term_memory.rb +965 -0
- data/lib/htm/models/node.rb +109 -0
- data/lib/htm/models/node_tag.rb +33 -0
- data/lib/htm/models/robot.rb +52 -0
- data/lib/htm/models/tag.rb +76 -0
- data/lib/htm/railtie.rb +76 -0
- data/lib/htm/sinatra.rb +157 -0
- data/lib/htm/tag_service.rb +135 -0
- data/lib/htm/tasks.rb +38 -0
- data/lib/htm/version.rb +5 -0
- data/lib/htm/working_memory.rb +182 -0
- data/lib/htm.rb +400 -0
- data/lib/tasks/db.rake +19 -0
- data/lib/tasks/htm.rake +147 -0
- data/lib/tasks/jobs.rake +312 -0
- data/mkdocs.yml +190 -0
- data/scripts/install_local_database.sh +309 -0
- metadata +341 -0
@@ -0,0 +1,531 @@
# ADR-005: RAG-Based Retrieval with Hybrid Search

**Status**: Accepted (Updated for Client-Side Embeddings)

**Date**: 2025-10-25 (Updated: 2025-10-27)

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

---

!!! info "Architecture Update (October 2025)"
    Following the reversal of ADR-011, query embeddings are now generated client-side in Ruby using `EmbeddingService` before being passed to SQL for vector similarity search. This provides a reliable, cross-platform solution.

## Quick Summary

HTM implements **RAG-based retrieval with three search strategies**: vector search (semantic), full-text search (keywords), and hybrid search (combined). All strategies include temporal filtering to leverage TimescaleDB's time-series optimization.

**Why**: Different queries benefit from different approaches. Semantic search handles concepts, full-text handles precise terms, and hybrid provides the best balance for most use cases.

**Impact**: Flexible retrieval with excellent recall and precision. Client-side embedding generation provides reliable, debuggable operation across all platforms.

---

## Context

Traditional memory systems for LLMs face challenges in retrieving relevant information:

- **Keyword-only search**: Misses semantic relationships ("car" vs "automobile")
- **Vector-only search**: May miss exact keyword matches ("PostgreSQL 17.2" vs "database")
- **No temporal context**: Doesn't leverage time-based relevance
- **Scalability**: Simple linear scans don't scale to thousands of memories

### Requirements

HTM needs intelligent retrieval that balances:

- Semantic understanding (what does the query mean?)
- Keyword precision (exact term matching)
- Temporal relevance (recent vs historical context)
- Performance (fast retrieval from large datasets)

### Alternative Approaches

1. **Pure vector search**: Semantic only, no keyword precision
2. **Pure full-text search**: Keywords only, no semantic understanding
3. **Hybrid search**: Combine vector + full-text + temporal filtering
4. **LLM-as-retriever**: Use LLM to generate retrieval queries (slow, expensive)

---

## Decision

We will implement **RAG-based retrieval with three search strategies**: vector, full-text, and hybrid, all with temporal filtering.

### Search Strategies

**1. Vector Search (`:vector`)**

- Generate embedding for query
- Compute cosine similarity with stored embeddings
- Temporal filtering on timeframe
- Best for: Semantic queries, conceptual relationships

**2. Full-Text Search (`:fulltext`)**

- PostgreSQL `to_tsvector` and `plainto_tsquery`
- `ts_rank` scoring for relevance
- Temporal filtering on timeframe
- Best for: Exact keywords, technical terms, proper nouns

**3. Hybrid Search (`:hybrid`)** - **Recommended Default**

- Full-text pre-filter to get candidates (top 100)
- Vector reranking of candidates for semantic relevance
- Temporal filtering on timeframe
- Best for: Balanced retrieval with precision + recall

---

## Rationale

### Why RAG-Based Retrieval?

**Temporal filtering is foundational**:

- "What did we discuss last week?" - time is the primary filter
- Recent context is often more relevant than old context
- TimescaleDB is optimized for time-range queries

**Semantic search handles synonyms**:

- User says "database", finds memories about "PostgreSQL"
- "Bug fix" matches "resolved issue"
- Captures conceptual relationships

**Full-text handles precision**:

- "PostgreSQL 17.2" needs exact version match
- Technical terminology like "pgvector", "HNSW"
- Proper nouns like robot names, project names

**Hybrid combines strengths**:

- Pre-filter with keywords reduces vector search space
- Vector reranking improves relevance of keyword matches
- Avoids false positives from pure vector search
- Avoids missing results from pure keyword search

---

## Implementation Details

!!! info "Client-Side Embedding Generation"
    Query embeddings are generated client-side in Ruby via `EmbeddingService` before being passed to SQL for vector similarity search.

### Vector Search

```ruby
def search(timeframe:, query:, limit:, embedding_service:)
  # Generate query embedding client-side
  query_embedding = embedding_service.embed(query)

  # Pad to 2000 dimensions if needed
  query_embedding += Array.new(2000 - query_embedding.length, 0.0) if query_embedding.length < 2000

  # Convert to PostgreSQL vector format
  embedding_str = "[#{query_embedding.join(',')}]"

  # Vector search in database
  conn.exec_params(<<~SQL, [embedding_str, timeframe.begin, timeframe.end, limit])
    SELECT id, content, speaker, type, category, importance, created_at, robot_id, token_count,
           1 - (embedding <=> $1::vector) as similarity
    FROM nodes
    WHERE created_at BETWEEN $2 AND $3
      AND embedding IS NOT NULL
    ORDER BY embedding <=> $1::vector
    LIMIT $4
  SQL
end
```
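
To make the SQL expressions above more concrete, here is a minimal pure-Ruby sketch of what they compute. It is illustrative only and not part of HTM: the helper names `pad_embedding`, `cosine_similarity`, and `to_pgvector_literal` are hypothetical, and returning `0.0` for zero vectors is a choice made for this sketch. pgvector's `<=>` operator returns cosine distance, so `1 - (embedding <=> $1::vector)` corresponds to the similarity computed here.

```ruby
# Illustrative helpers (hypothetical names, not HTM's API).
TARGET_DIMENSIONS = 2000 # padding width used in the example above

# Right-pad an embedding with zeros up to the target width.
def pad_embedding(embedding, width = TARGET_DIMENSIONS)
  return embedding if embedding.length >= width

  embedding + Array.new(width - embedding.length, 0.0)
end

# Cosine similarity of two equal-length vectors.
# pgvector's `<=>` yields the cosine distance, i.e. 1 minus this value.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero? # sketch-only convention

  dot / (norm_a * norm_b)
end

# Serialize to the text form that can be cast with `::vector`.
def to_pgvector_literal(embedding)
  "[#{embedding.join(',')}]"
end
```

One consequence worth noting: zero-padding both vectors to the same width leaves their cosine similarity unchanged, because the extra zeros contribute nothing to the dot product or to either norm.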

### Full-Text Search

```ruby
def search_fulltext(timeframe:, query:, limit:)
  # No embedding needed for full-text search
  conn.exec_params(<<~SQL, [query, timeframe.begin, timeframe.end, limit])
    SELECT *, ts_rank(to_tsvector('english', content), plainto_tsquery('english', $1)) as rank
    FROM nodes
    WHERE created_at BETWEEN $2 AND $3
      AND to_tsvector('english', content) @@ plainto_tsquery('english', $1)
    ORDER BY rank DESC
    LIMIT $4
  SQL
end
```

### Hybrid Search

```ruby
def search_hybrid(timeframe:, query:, limit:, embedding_service:, prefilter_limit: 100)
  # Generate query embedding client-side
  query_embedding = embedding_service.embed(query)
  query_embedding += Array.new(2000 - query_embedding.length, 0.0) if query_embedding.length < 2000
  embedding_str = "[#{query_embedding.join(',')}]"

  # Combine full-text pre-filter with vector reranking
  conn.exec_params(<<~SQL, [embedding_str, timeframe.begin, timeframe.end, query, prefilter_limit, limit])
    WITH candidates AS (
      SELECT id, content, speaker, type, category, importance, created_at, robot_id, token_count, embedding
      FROM nodes
      WHERE created_at BETWEEN $2 AND $3
        AND to_tsvector('english', content) @@ plainto_tsquery('english', $4)
        AND embedding IS NOT NULL
      -- Rank before limiting so the pre-filter keeps the best keyword matches
      -- rather than an arbitrary subset
      ORDER BY ts_rank(to_tsvector('english', content), plainto_tsquery('english', $4)) DESC
      LIMIT $5  -- Pre-filter to top candidates
    )
    SELECT id, content, speaker, type, category, importance, created_at, robot_id, token_count,
           1 - (embedding <=> $1::vector) as similarity
    FROM candidates
    ORDER BY embedding <=> $1::vector
    LIMIT $6  -- Final top results
  SQL
end
```
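
The two-stage shape of the query above (keyword pre-filter, then vector rerank) can be sketched in memory without a database. This is a simplified illustration under stated assumptions, not HTM's implementation: `Node`, `keyword_candidates`, and `hybrid_search` are hypothetical names, and the keyword stage is a crude whole-word match with no stemming or stop words, unlike PostgreSQL's `to_tsvector`. It does mirror `plainto_tsquery` in requiring every query term to be present.

```ruby
# Hypothetical in-memory stand-in for a row of the nodes table.
Node = Struct.new(:id, :content, :embedding, keyword_init: true)

def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Stage 1: keep nodes containing every query term, ordered by term frequency,
# capped at prefilter_limit. Stands in for the full-text CTE.
def keyword_candidates(nodes, query, prefilter_limit)
  terms = query.downcase.split
  scored = nodes.filter_map do |node|
    words = node.content.downcase.scan(/\w+/)
    next unless terms.all? { |term| words.include?(term) }

    [node, terms.sum { |term| words.count(term) }]
  end
  scored.sort_by { |_, hits| -hits }.first(prefilter_limit).map(&:first)
end

# Stage 2: rerank the candidates by cosine similarity to the query embedding.
# Returns [node, similarity] pairs, best match first.
def hybrid_search(nodes, query, query_embedding, limit:, prefilter_limit: 100)
  keyword_candidates(nodes, query, prefilter_limit)
    .map { |node| [node, cosine_similarity(node.embedding, query_embedding)] }
    .sort_by { |_, similarity| -similarity }
    .first(limit)
end
```

The sketch also shows the main trade-off of this design: a node whose embedding is close to the query but which lacks the query's keywords never reaches the rerank stage.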

### User API

```ruby
# Use hybrid search (recommended)
memories = htm.recall(
  timeframe: "last week",
  topic: "PostgreSQL performance",
  limit: 20,
  strategy: :hybrid # default recommended
)

# Use pure vector search
memories = htm.recall(
  timeframe: "last month",
  topic: "database design philosophy",
  strategy: :vector # best for conceptual queries
)

# Use pure full-text search
memories = htm.recall(
  timeframe: "yesterday",
  topic: "PostgreSQL 17.2 upgrade",
  strategy: :fulltext # best for exact keywords
)
```
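
The search methods shown earlier call `timeframe.begin` and `timeframe.end`, so a string such as `"last week"` has to become a time range somewhere along the way. The sketch below shows one way that conversion could look. It is an assumption-laden illustration, not HTM's parser: `parse_timeframe` is a hypothetical helper, "last week" and "last month" are treated as rolling 7-day and 30-day windows, and daylight-saving transitions are ignored.

```ruby
SECONDS_PER_DAY = 86_400

# Convert a small set of natural-language expressions into a Range of Time.
# Hypothetical helper: the supported phrases and their meanings are assumptions.
def parse_timeframe(expression, now: Time.now)
  case expression.downcase.strip
  when "yesterday"
    # Midnight-to-midnight window for the previous calendar day.
    start_of_today = Time.new(now.year, now.month, now.day, 0, 0, 0, now.utc_offset)
    (start_of_today - SECONDS_PER_DAY)..start_of_today
  when "last week"
    (now - 7 * SECONDS_PER_DAY)..now
  when "last month"
    (now - 30 * SECONDS_PER_DAY)..now
  else
    raise ArgumentError, "unsupported timeframe: #{expression.inspect}"
  end
end
```

Passing `now:` explicitly keeps the helper deterministic in tests. The resulting range can be handed to the SQL parameters as `range.begin` and `range.end`.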

---

## Consequences

### Positive

- Flexible retrieval: Choose strategy based on query type
- Temporal context: Time-range filtering built into all strategies
- Semantic understanding: Vector search captures relationships
- Keyword precision: Full-text search handles exact matches
- Balanced hybrid: Best of both worlds with pre-filter optimization
- Scalable: HNSW indexing on vectors, GIN indexing on tsvectors
- Transparent scoring: Return similarity/rank scores for debugging

### Negative

- Complexity: Three strategies to understand and choose from
- Embedding latency: Vector/hybrid require embedding generation
- Storage overhead: Both embeddings and full-text indexes
- English-only: Full-text optimized for English language
- Tuning required: Hybrid prefilter_limit may need adjustment

### Neutral

- Strategy selection: User must choose appropriate strategy
- Timeframe parsing: Natural language time parsing adds complexity
- Embedding consistency: Different embedding models produce different results

---

## Use Cases

### Use Case 1: Semantic Concept Retrieval

```ruby
# Query: "What architectural decisions have we made?"
# Best strategy: :vector (semantic concept matching)

memories = htm.recall(
  timeframe: "last month",
  topic: "architectural decisions design choices",
  strategy: :vector
)

# Finds: "We decided to use PostgreSQL", "Chose two-tier memory model", etc.
# Matches conceptually even without exact keywords
```

### Use Case 2: Exact Technical Term

```ruby
# Query: "Find all mentions of PostgreSQL 17.2"
# Best strategy: :fulltext (exact version number)

memories = htm.recall(
  timeframe: "this week",
  topic: "PostgreSQL 17.2",
  strategy: :fulltext
)

# Finds: Exact "PostgreSQL 17.2" mentions
# Avoids false matches to "PostgreSQL 16" or generic "database"
```

### Use Case 3: Balanced Query

```ruby
# Query: "What did we discuss about database performance?"
# Best strategy: :hybrid (keyword + semantic)

memories = htm.recall(
  timeframe: "last week",
  topic: "database performance optimization",
  strategy: :hybrid
)

# Pre-filters: Documents containing "database", "performance", "optimization"
# Reranks: By semantic similarity to full query
# Result: Best balance of precision + recall
```

### Use Case 4: Conversation Timeline

```ruby
# Get chronological conversation about a topic
timeline = htm.conversation_timeline("HTM design", limit: 50)

# Returns memories sorted by created_at
# Useful for replaying decision evolution over time
```

---

## Performance Characteristics

!!! info "Client-Side Embedding Generation"
    Embeddings are generated client-side before SQL queries. Latency includes the HTTP call to Ollama/OpenAI for embedding generation.

### Vector Search

- **Latency**: ~30-50ms for client-side embedding + index lookup
- **Index**: HNSW (Hierarchical Navigable Small World)
- **Scalability**: Approximately O(log n) with HNSW (sublinear)
- **Best case**: Conceptual queries, semantic relationships
- **Breakdown**: ~20-30ms embedding generation, ~10-20ms vector search

### Full-Text Search

- **Latency**: ~5-20ms (no embedding generation)
- **Index**: GIN (Generalized Inverted Index) on tsvector
- **Scalability**: O(log n) with GIN index
- **Best case**: Exact keywords, technical terms
- **Benefit**: Fastest option when embeddings are not needed

### Hybrid Search

- **Latency**: Full-text pre-filter + client-side embedding + vector reranking
- **Total**: ~35-70ms
- **Optimization**: Pre-filter reduces vector search space
- **Best case**: Large datasets where full-text can narrow candidates
- **Breakdown**: ~20-30ms embedding, ~5-10ms full-text, ~10-30ms vector reranking

### Temporal Filtering

- **Optimization**: TimescaleDB hypertable partitioning by time
- **Index**: B-tree on `created_at` column
- **Benefit**: Prunes partitions outside timeframe, faster scans

---

## Design Decisions

### Decision: Three Strategies Instead of One

**Rationale**: Different queries benefit from different approaches. Give users flexibility.

**Alternative**: Single hybrid strategy for all queries

**Rejected**: Forces hybrid approach even when pure vector or full-text is better

### Decision: Temporal Filtering is Mandatory

**Rationale**: HTM is time-oriented. All retrieval should consider temporal context.

**Alternative**: Optional timeframe parameter

**Rejected**: Easy to forget, defeats TimescaleDB optimization benefits

### Decision: Hybrid Pre-filter Limit = 100

**Rationale**: Balances recall (enough candidates) with performance (vector search cost)

**Alternative**: Dynamic limit based on result count

**Deferred**: Can optimize later based on real-world usage patterns

### Decision: Return Similarity/Rank Scores

**Rationale**: Enables debugging, threshold filtering, and understanding retrieval quality

**Alternative**: Just return nodes without scores

**Rejected**: Lose valuable signal for debugging and optimization
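
As a small illustration of the threshold filtering that returned scores make possible, the sketch below drops low-similarity rows on the client. It assumes rows arrive as hashes keyed by column name, with values possibly still strings (the `pg` gem returns text values unless a type map is configured). The helper name `filter_by_similarity` and the 0.5 threshold are hypothetical.

```ruby
# Keep only rows whose similarity score meets a minimum threshold.
# Assumes each row is a Hash with a "similarity" key; values may be Strings,
# hence the to_f conversion. Hypothetical helper, not part of HTM's API.
def filter_by_similarity(rows, min_similarity: 0.5)
  rows.select { |row| row["similarity"].to_f >= min_similarity }
end
```

For example, `filter_by_similarity(rows, min_similarity: 0.75)` would keep only the closest matches from a vector or hybrid result set.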

---

## Risks and Mitigations

### Risk: Wrong Strategy Selection

!!! warning "Risk"
    User chooses vector for exact keyword query (poor results)

**Likelihood**: Medium (requires understanding differences)

**Impact**: Medium (degraded retrieval quality)

**Mitigation**:

- Default to hybrid for balanced results
- Document use cases clearly
- Provide examples in API docs
- Consider auto-detection in future

### Risk: Embedding Latency

!!! info "Risk"
    Vector/hybrid slow due to embedding generation

**Likelihood**: High (embedding is I/O bound)

**Impact**: Medium (100-500ms for Ollama)

**Mitigation**:

- Cache embeddings for common queries (future)
- Use fast local embedding models (gpt-oss)
- Provide fallback to full-text if embedding fails
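
The fallback mitigation listed above could take roughly the following shape. This is a sketch under assumptions rather than HTM's behaviour: `EmbeddingUnavailable` is a hypothetical error class, and the three collaborators are passed in as callables so the example stays self-contained.

```ruby
require "timeout"

# Hypothetical error an embedding client might raise when the provider is down.
class EmbeddingUnavailable < StandardError; end

# Try vector search first; if the embedding step fails or times out,
# fall back to full-text search, which needs no embedding.
# embed, vector_search and fulltext_search are any objects responding to #call.
def recall_with_fallback(query, embed:, vector_search:, fulltext_search:)
  embedding =
    begin
      embed.call(query)
    rescue EmbeddingUnavailable, Timeout::Error
      nil # signal that no embedding is available
    end

  if embedding
    { strategy: :vector, results: vector_search.call(embedding) }
  else
    { strategy: :fulltext, results: fulltext_search.call(query) }
  end
end
```

Returning the strategy that was actually used keeps the fallback visible to callers, so degraded results are not mistaken for semantic matches.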

### Risk: Language Limitation

!!! danger "Risk"
    Full-text search optimized for English only

**Likelihood**: Low (single-user, likely English)

**Impact**: High (non-English users)

**Mitigation**:

- Document English assumption
- Support language parameter in future
- Vector search is largely language-agnostic, provided the embedding model covers the language

### Risk: Pre-filter Misses Results

!!! info "Risk"
    Hybrid pre-filter (100) misses relevant candidates

**Likelihood**: Low (100 is generous)

**Impact**: Medium (reduced recall)

**Mitigation**:

- Make prefilter_limit configurable
- Monitor recall metrics in practice
- Adjust default if needed

---

## Future Enhancements

### Query Auto-Detection

```ruby
# Automatically choose strategy based on query
htm.recall_smart(timeframe: "last week", topic: "PostgreSQL 17.2")
# Detects version number → uses :fulltext

htm.recall_smart(timeframe: "last month", topic: "architectural philosophy")
# Detects conceptual query → uses :vector
```
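
One simple way such detection could work is a set of pattern heuristics. The sketch below is purely illustrative: `choose_strategy` and both patterns are hypothetical, and treating capitalized words as a signal for `:hybrid` is an assumption of this example rather than anything HTM specifies.

```ruby
# Tokens that look like they need an exact match: version numbers ("17.2"),
# quoted phrases, or snake_case identifiers. Hypothetical heuristic.
EXACT_TOKEN = /\b\d+(?:\.\d+)+\b|"[^"]+"|\b\w+_\w+\b/

# Capitalized words are treated as likely proper nouns (product or project names).
PROPER_NOUN = /\b[A-Z]\w*/

# Pick a search strategy from the shape of the topic string.
def choose_strategy(topic)
  return :fulltext if topic.match?(EXACT_TOKEN)
  return :hybrid if topic.match?(PROPER_NOUN)

  :vector
end
```

Under these rules, `choose_strategy("PostgreSQL 17.2")` selects `:fulltext` and `choose_strategy("architectural philosophy")` selects `:vector`, matching the two cases above. A heuristic like this will misclassify some queries, so an explicit `strategy:` argument should still take precedence.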

### Re-ranking Strategies

```ruby
# Custom re-ranking based on multiple signals
memories = htm.recall(
  timeframe: "last week",
  topic: "PostgreSQL",
  strategy: :hybrid,
  rerank: [:similarity, :importance, :recency] # Multi-factor scoring
)
```
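
A multi-factor score of this kind is often a weighted sum of normalized signals. The sketch below shows one possible form. Everything in it is an assumption made for illustration: the `Candidate` struct, the weights, the 0-10 importance scale, and the 7-day half-life for recency are not taken from HTM.

```ruby
# Hypothetical search result carrying the three signals to combine.
Candidate = Struct.new(:id, :similarity, :importance, :created_at, keyword_init: true)

# Illustrative weights; real values would need tuning against actual queries.
DEFAULT_WEIGHTS = { similarity: 0.6, importance: 0.3, recency: 0.1 }.freeze

# Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life.
def recency_score(created_at, now:, half_life_days: 7.0)
  age_days = (now - created_at) / 86_400.0
  0.5**(age_days / half_life_days)
end

# Sort candidates by a weighted sum of similarity, normalized importance,
# and recency, best first. Importance is assumed to lie in 0..max_importance.
def rerank(candidates, now: Time.now, weights: DEFAULT_WEIGHTS, max_importance: 10.0)
  candidates.sort_by do |c|
    score = weights[:similarity] * c.similarity +
            weights[:importance] * (c.importance / max_importance) +
            weights[:recency] * recency_score(c.created_at, now: now)
    -score
  end
end
```

With these weights, a recent and important memory can outrank an older one with slightly higher similarity, which is the behaviour the `rerank:` option is meant to allow.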

### Query Expansion

```ruby
# LLM-powered query expansion
original = "database"
expanded = ["database", "PostgreSQL", "TimescaleDB", "SQL", "storage"]

memories = htm.recall(
  timeframe: "last month",
  topic: expanded,
  strategy: :fulltext
)
```

### Caching Layer

```ruby
# Cache embedding generation for common queries
@embedding_cache = {}

def search_cached(query)
  @embedding_cache[query] ||= embedding_service.embed(query)
end
```
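
A plain hash cache like the one above grows without bound in a long-running process. One possible refinement is a small least-recently-used cache, sketched here. `EmbeddingCache` is a hypothetical class, the embedder is any object responding to `call`, and the default capacity of 1000 entries is an arbitrary choice for the example.

```ruby
# Bounded LRU cache for query embeddings (hypothetical, not part of HTM).
class EmbeddingCache
  def initialize(embedder, max_entries: 1000)
    @embedder = embedder
    @max_entries = max_entries
    @store = {} # Ruby hashes keep insertion order; the first key is the oldest
  end

  # Return the cached embedding for the query, computing it on a miss.
  def fetch(query)
    key = query.strip
    if @store.key?(key)
      @store[key] = @store.delete(key) # re-insert to mark as most recently used
    else
      @store[key] = @embedder.call(key)
      @store.delete(@store.keys.first) if @store.size > @max_entries
    end
    @store[key]
  end

  def size
    @store.size
  end
end
```

This version is not thread-safe. A process that serves concurrent requests would need to guard `fetch` with a `Mutex`.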

---

## Alternatives Comparison

| Approach | Pros | Cons | Decision |
|----------|------|------|----------|
| **Hybrid Search** | **Balanced precision + recall** | **Strategy selection** | **ACCEPTED** |
| Pure Vector Only | Simplest API, semantic | Misses exact matches, slower | Rejected |
| Pure Full-Text Only | Fast, no embeddings | No semantic understanding | Rejected |
| LLM-as-Retriever | Most flexible queries | Too slow, expensive | Rejected |
| Elasticsearch | Dedicated search engine | Additional infrastructure | Rejected |

---

## References

- [RAG (Retrieval-Augmented Generation)](https://arxiv.org/abs/2005.11401)
- [pgvector Documentation](https://github.com/pgvector/pgvector)
- [PostgreSQL Full-Text Search](https://www.postgresql.org/docs/current/textsearch.html)
- [HNSW Algorithm](https://arxiv.org/abs/1603.09320)
- [Hybrid Search Best Practices](https://www.pinecone.io/learn/hybrid-search-intro/)
- [ADR-001: PostgreSQL Storage](001-postgresql-timescaledb.md)
- [ADR-003: Ollama Embeddings](003-ollama-embeddings.md) - **Superseded by ADR-011**
- [ADR-011: Database-Side Embedding Generation with pgai](011-pgai-integration.md) - **Superseded (returned to client-side)**
- [Search Strategies Guide](../../guides/search-strategies.md)

---

## Review Notes

**AI Engineer**: Hybrid search is the right approach for RAG systems. Pre-filter optimization is smart.

**Database Architect**: TimescaleDB + pgvector + full-text is well-architected. Consider query plan analysis for optimization.

**Performance Specialist**: HNSW and GIN indexes will scale. Monitor embedding latency in production.

**Systems Architect**: Three strategies provide good flexibility. Document decision matrix clearly for users.

**Ruby Expert**: Clean API design. Consider making `:hybrid` the default value of the `strategy` parameter: `recall(..., strategy: :hybrid)`