htm 0.0.1

# ADR-005: RAG-Based Retrieval with Hybrid Search

**Status**: Accepted

**Date**: 2025-10-25

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

---

## ⚠️ UPDATE (2025-10-28)

**References to TimescaleDB optimization in this ADR are now historical.**

After initial struggles with database configuration, the TimescaleDB extension was dropped because it was not providing enough value for the current proof-of-concept applications. The RAG-based retrieval strategies are unchanged, but temporal filtering now uses standard PostgreSQL B-tree indexes instead of TimescaleDB hypertable partitioning.

See [ADR-001](001-use-postgresql-timescaledb-storage.md) for details on the TimescaleDB removal.

---

## Context

Traditional memory systems for LLMs struggle to retrieve relevant information:

- **Keyword-only search**: Misses semantic relationships ("car" vs. "automobile")
- **Vector-only search**: May miss exact keyword matches (a query for "PostgreSQL 17.2" may surface memories about databases in general rather than that exact version)
- **No temporal context**: Ignores time-based relevance
- **Scalability**: Simple linear scans don't scale to thousands of memories

HTM needs intelligent retrieval that balances:

- Semantic understanding (what does the query mean?)
- Keyword precision (exact term matching)
- Temporal relevance (recent vs. historical context)
- Performance (fast retrieval from large datasets)

Alternative approaches:

1. **Pure vector search**: Semantic only, no keyword precision
2. **Pure full-text search**: Keywords only, no semantic understanding
3. **Hybrid search**: Combine vector + full-text + temporal filtering
4. **LLM-as-retriever**: Use an LLM to generate retrieval queries (slow, expensive)

## Decision

We will implement **RAG-based retrieval with three search strategies** (vector, full-text, and hybrid), all with temporal filtering.

### Search Strategies

**1. Vector Search (`:vector`)**

- Generate an embedding for the query
- Compute cosine similarity against stored embeddings
- Filter by the requested timeframe
- Best for: Semantic queries, conceptual relationships
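
The cosine-similarity step can be checked in plain Ruby, independent of the database. The helpers below (`cosine_similarity`, `cosine_distance`, `nearest`) are hypothetical names for illustration, not HTM's API. pgvector's `<=>` operator returns cosine *distance*, which is why the SQL under Implementation Details computes `1 - (embedding <=> query)` as the similarity.

```ruby
# Illustrative helpers (not part of HTM's API).
# cosine_similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
# cosine_distance: what pgvector's <=> operator returns, i.e. 1 - similarity.
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

def cosine_distance(a, b)
  1.0 - cosine_similarity(a, b)
end

# Exact nearest-neighbor ranking by scanning every stored embedding
# (stored is a Hash of id => embedding). An HNSW index approximates this
# ordering without comparing the query against every row.
def nearest(query_embedding, stored, limit)
  stored
    .map { |id, embedding| [id, cosine_similarity(query_embedding, embedding)] }
    .sort_by { |_id, similarity| -similarity }
    .first(limit)
end
```

Orthogonal vectors such as `[1, 0]` and `[0, 1]` have a similarity of `0.0` and therefore a distance of `1.0`.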

**2. Full-Text Search (`:fulltext`)**

- PostgreSQL `to_tsvector` and `plainto_tsquery` matching
- `ts_rank` scoring for relevance
- Filter by the requested timeframe
- Best for: Exact keywords, technical terms, proper nouns
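
`plainto_tsquery` normalizes the query text and joins the surviving terms with AND (`&`), so a row matches only when its `tsvector` contains every term. The toy Ruby sketch below mimics that matching rule to make it concrete. It is an approximation, not PostgreSQL's behavior: there is no stemming or stop-word removal, and the hypothetical `toy_rank` simply counts term occurrences rather than reproducing `ts_rank`.

```ruby
# Toy approximation of `to_tsvector(...) @@ plainto_tsquery(...)`:
# lowercase word tokens (dotted tokens like "17.2" kept whole), and every
# query term must appear in the document (AND semantics).
# No stemming or stop words; toy_rank is a plain occurrence count, not ts_rank.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+(?:\.[a-z0-9]+)*/)
end

def fulltext_match?(document, query)
  doc_terms = tokenize(document)
  tokenize(query).all? { |term| doc_terms.include?(term) }
end

def toy_rank(document, query)
  doc_terms = tokenize(document)
  tokenize(query).uniq.sum { |term| doc_terms.count(term) }
end

def toy_fulltext_search(documents, query, limit)
  documents
    .select { |doc| fulltext_match?(doc, query) }
    .sort_by { |doc| -toy_rank(doc, query) }
    .first(limit)
end
```

Because this toy has no stemming, a query for "upgrade" would not match "upgraded"; PostgreSQL's `english` configuration reduces both words to the same lexeme.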

**3. Hybrid Search (`:hybrid`)**

- Full-text pre-filter to select the top keyword-matching candidates (`prefilter_limit`, default 100)
- Vector reranking of those candidates for semantic relevance
- Filter by the requested timeframe
- Best for: Balanced retrieval with both precision and recall
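
The two-stage hybrid flow can be sketched in memory. This is a hypothetical illustration, not HTM's implementation: records are assumed to be Hashes with `:value` (text) and `:embedding` (numeric Array) keys, the pre-filter uses a naive AND keyword match ranked by term counts instead of PostgreSQL's `@@` and `ts_rank`, and the rerank uses exact cosine similarity.

```ruby
# Illustrative two-stage hybrid search over in-memory records (not HTM's
# implementation). Records are Hashes with :value (text) and :embedding.
module ToyHybrid
  module_function

  def terms(text)
    text.downcase.scan(/[a-z0-9]+(?:\.[a-z0-9]+)*/)
  end

  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
  end

  def search(records, query, query_embedding, limit:, prefilter_limit: 100)
    query_terms = terms(query)

    # Stage 1: keyword pre-filter (every query term must appear), keeping the
    # prefilter_limit records with the most query-term occurrences.
    candidates = records
      .map { |r| [r, terms(r[:value])] }
      .select { |_, doc_terms| query_terms.all? { |t| doc_terms.include?(t) } }
      .sort_by { |_, doc_terms| -query_terms.sum { |t| doc_terms.count(t) } }
      .first(prefilter_limit)
      .map(&:first)

    # Stage 2: rerank the surviving candidates by semantic similarity.
    candidates
      .map { |r| r.merge(similarity: cosine(query_embedding, r[:embedding])) }
      .sort_by { |r| -r[:similarity] }
      .first(limit)
  end
end
```

A record without the query keywords never reaches stage 2, however close its embedding is to the query; that is the recall cost of pre-filtering.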

### Default Strategy

**Hybrid** is recommended for most use cases because it offers the best balance of semantic understanding and keyword precision.
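
One way to make `:hybrid` the default while rejecting unknown strategies is a small dispatcher. This is a hypothetical sketch: `Retriever`, its injected backend callables, and the default `limit: 20` are assumptions standing in for HTM's real search methods.

```ruby
# Hypothetical strategy dispatcher (not HTM's actual code). Each backend is
# any callable accepting (timeframe:, query:, limit:) keyword arguments.
class Retriever
  STRATEGIES = %i[vector fulltext hybrid].freeze

  def initialize(backends)
    missing = STRATEGIES - backends.keys
    raise ArgumentError, "missing backends: #{missing.join(', ')}" unless missing.empty?

    @backends = backends
  end

  # :hybrid is the default, matching the recommendation above.
  def recall(timeframe:, topic:, limit: 20, strategy: :hybrid)
    unless STRATEGIES.include?(strategy)
      raise ArgumentError, "unknown strategy #{strategy.inspect}; expected one of #{STRATEGIES.inspect}"
    end

    @backends.fetch(strategy).call(timeframe: timeframe, query: topic, limit: limit)
  end
end
```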

## Rationale

### Why RAG-Based Retrieval?

**Temporal filtering is foundational**:

- "What did we discuss last week?": time is the primary filter
- Recent context is often more relevant than older context
- Time-range queries are index-assisted (originally via TimescaleDB; now via a B-tree index on `created_at`, per the update above)

**Semantic search handles synonyms**:

- User says "database", finds memories about "PostgreSQL"
- "Bug fix" matches "resolved issue"
- Captures conceptual relationships

**Full-text handles precision**:

- "PostgreSQL 17.2" needs an exact version match
- Technical terminology like "pgvector", "HNSW"
- Proper nouns like robot names and project names

**Hybrid combines strengths**:

- Keyword pre-filtering shrinks the vector search space
- Vector reranking improves the ordering of keyword matches
- Reduces the false positives of pure vector search
- Ranks results better than pure keyword search, although recall is still bounded by the full-text pre-filter

### Implementation Details

```ruby
# Vector search. pgvector's <=> is cosine distance, so 1 - distance = similarity.
# Sketch: assumes @conn is a PG::Connection (pg gem) and timeframe is a Range of Time.
def search(timeframe:, query:, limit:, embedding_service:)
  query_embedding = embedding_service.embed(query)
  vector = "[#{query_embedding.join(',')}]" # pgvector text format

  sql = <<~SQL
    SELECT *, 1 - (embedding <=> $1::vector) AS similarity
    FROM nodes
    WHERE created_at BETWEEN $2 AND $3
    ORDER BY embedding <=> $1::vector
    LIMIT $4
  SQL

  @conn.exec_params(sql, [vector, timeframe.begin, timeframe.end, limit]).to_a
end
```

```ruby
# Full-text search using PostgreSQL's built-in English text search configuration
def search_fulltext(timeframe:, query:, limit:)
  sql = <<~SQL
    SELECT *, ts_rank(to_tsvector('english', value), plainto_tsquery('english', $1)) AS rank
    FROM nodes
    WHERE created_at BETWEEN $2 AND $3
      AND to_tsvector('english', value) @@ plainto_tsquery('english', $1)
    ORDER BY rank DESC
    LIMIT $4
  SQL

  @conn.exec_params(sql, [query, timeframe.begin, timeframe.end, limit]).to_a
end
```

```ruby
# Hybrid search: full-text pre-filter, then vector reranking of the candidates
def search_hybrid(timeframe:, query:, limit:, embedding_service:, prefilter_limit: 100)
  query_embedding = embedding_service.embed(query)
  vector = "[#{query_embedding.join(',')}]"

  sql = <<~SQL
    WITH candidates AS (
      SELECT *
      FROM nodes
      WHERE created_at BETWEEN $2 AND $3
        AND to_tsvector('english', value) @@ plainto_tsquery('english', $1)
      ORDER BY ts_rank(to_tsvector('english', value), plainto_tsquery('english', $1)) DESC
      LIMIT $5 -- keep the top prefilter_limit keyword matches (default 100)
    )
    SELECT *, 1 - (embedding <=> $4::vector) AS similarity
    FROM candidates
    ORDER BY embedding <=> $4::vector
    LIMIT $6 -- final top results
  SQL

  params = [query, timeframe.begin, timeframe.end, vector, prefilter_limit, limit]
  @conn.exec_params(sql, params).to_a
end
```

### User API

```ruby
# Use hybrid search (recommended)
memories = htm.recall(
  timeframe: "last week",
  topic: "PostgreSQL performance",
  limit: 20,
  strategy: :hybrid # recommended default
)

# Use pure vector search
memories = htm.recall(
  timeframe: "last month",
  topic: "database design philosophy",
  strategy: :vector # best for conceptual queries
)

# Use pure full-text search
memories = htm.recall(
  timeframe: "yesterday",
  topic: "PostgreSQL 17.2 upgrade",
  strategy: :fulltext # best for exact keywords
)
```

## Consequences

### Positive

✅ **Flexible retrieval**: Choose a strategy based on query type
✅ **Temporal context**: Time-range filtering built into all strategies
✅ **Semantic understanding**: Vector search captures relationships
✅ **Keyword precision**: Full-text search handles exact matches
✅ **Balanced hybrid**: Best of both worlds, with pre-filter optimization
✅ **Scalable**: HNSW indexing on vectors, GIN indexing on tsvectors
✅ **Transparent scoring**: Returns similarity/rank scores for debugging

### Negative

❌ **Complexity**: Three strategies to understand and choose from
❌ **Embedding latency**: Vector and hybrid searches require generating a query embedding
❌ **Storage overhead**: Stores both embeddings and full-text indexes
❌ **English-only**: Full-text search uses the `english` text search configuration
❌ **Tuning required**: Hybrid `prefilter_limit` may need adjustment

### Neutral

➡️ **Strategy selection**: Users must choose an appropriate strategy
➡️ **Timeframe parsing**: Natural-language time parsing adds complexity
➡️ **Embedding consistency**: Different embedding models produce different results
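
Before the `created_at BETWEEN` filter can run, a phrase such as "last week" has to become concrete bounds. A minimal sketch of that step follows; `parse_timeframe` is a hypothetical name, and its rules (local time zone, Monday-start weeks, "last week" and "last month" meaning the previous calendar period) are assumptions that HTM's actual parser may not share.

```ruby
require "date"

# Hypothetical parser (not HTM's actual rules) that turns a few natural-language
# timeframes into a Range of Time for `created_at BETWEEN $2 AND $3`.
# Assumptions: local time zone, weeks start on Monday, and "last week" /
# "last month" mean the previous calendar week / month.
def parse_timeframe(phrase, now: Time.now)
  today = now.to_date
  monday = today - ((today.wday - 1) % 7) # Date#wday: Sunday = 0

  first_day, last_day =
    case phrase.strip.downcase
    when "yesterday" then [today - 1, today - 1]
    when "this week" then [monday, today]
    when "last week" then [monday - 7, monday - 1]
    when "last month"
      first_of_month = Date.new(today.year, today.month, 1)
      [first_of_month << 1, first_of_month - 1]
    else
      raise ArgumentError, "unsupported timeframe: #{phrase.inspect}"
    end

  # Inclusive bounds to match BETWEEN; this sketch ignores sub-second
  # timestamps inside the final second of the range.
  start_time = Time.new(first_day.year, first_day.month, first_day.day)
  end_time = Time.new(last_day.year, last_day.month, last_day.day, 23, 59, 59)
  start_time..end_time
end
```

A rolling window (for example, the past seven days for "last week") is an equally plausible reading; whichever interpretation is chosen changes which memories are recalled, so it is worth documenting.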

## Design Decisions

### Decision: Three Strategies Instead of One
**Rationale**: Different queries benefit from different approaches. Giving users the choice lets each query use the best-suited one.

**Alternative**: Single hybrid strategy for all queries
**Rejected**: Forces the hybrid approach even when pure vector or full-text search is better

### Decision: Temporal Filtering is Mandatory
**Rationale**: HTM is time-oriented. All retrieval should consider temporal context.

**Alternative**: Optional timeframe parameter
**Rejected**: Easy to forget, and defeats the benefit of time-indexed queries (originally TimescaleDB partitioning; now a B-tree index on `created_at`)

### Decision: Hybrid Pre-filter Limit = 100
**Rationale**: Balances recall (enough candidates) with performance (cost of vector scoring)

**Alternative**: Dynamic limit based on result count
**Deferred**: Can be optimized later based on real-world usage patterns

### Decision: Return Similarity/Rank Scores
**Rationale**: Enables debugging, threshold filtering, and assessment of retrieval quality

**Alternative**: Return nodes without scores
**Rejected**: Loses a valuable signal for debugging and optimization
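
Returned scores make threshold filtering straightforward on the caller's side. A minimal sketch, assuming rows come from the `pg` gem's `PG::Result#to_a` (Hashes keyed by column name, with values returned as Strings when no type map is configured) and using an arbitrary illustrative cutoff of 0.7; `above_threshold` is a hypothetical helper.

```ruby
# Hypothetical helper: keep only results whose score clears a threshold,
# best first. score_key is "similarity" for vector/hybrid results and
# "rank" for full-text results (the column aliases used in the SQL).
def above_threshold(rows, min_score: 0.7, score_key: "similarity")
  rows
    .select { |row| Float(row.fetch(score_key)) >= min_score } # pg returns Strings
    .sort_by { |row| -Float(row.fetch(score_key)) }
end
```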

## Use Cases

### Use Case 1: Semantic Concept Retrieval

```ruby
# Query: "What architectural decisions have we made?"
# Best strategy: :vector (semantic concept matching)

memories = htm.recall(
  timeframe: "last month",
  topic: "architectural decisions design choices",
  strategy: :vector
)

# Finds: "We decided to use PostgreSQL", "Chose two-tier memory model", etc.
# Matches conceptually even without exact keywords
```

### Use Case 2: Exact Technical Term

```ruby
# Query: "Find all mentions of PostgreSQL 17.2"
# Best strategy: :fulltext (exact version number)

memories = htm.recall(
  timeframe: "this week",
  topic: "PostgreSQL 17.2",
  strategy: :fulltext
)

# Finds: Exact "PostgreSQL 17.2" mentions
# Avoids false matches to "PostgreSQL 16" or generic "database"
```

### Use Case 3: Balanced Query

```ruby
# Query: "What did we discuss about database performance?"
# Best strategy: :hybrid (keyword + semantic)

memories = htm.recall(
  timeframe: "last week",
  topic: "database performance optimization",
  strategy: :hybrid
)

# Pre-filters: Memories containing all of "database", "performance", and
#   "optimization" (stemmed; plainto_tsquery ANDs the terms)
# Reranks: By semantic similarity to the full query
# Result: Best balance of precision + recall
```

### Use Case 4: Conversation Timeline

```ruby
# Get a chronological conversation about a topic
timeline = htm.conversation_timeline("HTM design", limit: 50)

# Returns memories sorted by created_at
# Useful for replaying how decisions evolved over time
```

## Performance Characteristics

### Vector Search

- **Latency**: ~10-50ms for embedding generation + index lookup with a fast embedding model (Ollama embedding alone can take 100-500ms; see Risk: Embedding Latency)
- **Index**: HNSW (Hierarchical Navigable Small World)
- **Scalability**: Roughly O(log n) with HNSW (sublinear, approximate nearest neighbor)
- **Best case**: Conceptual queries, semantic relationships

### Full-Text Search

- **Latency**: ~5-20ms (no embedding generation)
- **Index**: GIN (Generalized Inverted Index) on tsvector
- **Scalability**: O(log n) with GIN index
- **Best case**: Exact keywords, technical terms

### Hybrid Search

- **Latency**: Full-text pre-filter + vector reranking
- **Total**: ~15-70ms (often faster than pure vector search on large datasets)
- **Optimization**: Pre-filter reduces the vector search space
- **Best case**: Large datasets where full-text can narrow candidates

### Temporal Filtering

- **Optimization**: TimescaleDB hypertable partitioning by time (historical; removed per the 2025-10-28 update)
- **Index**: B-tree on `created_at` column
- **Benefit**: Index range scans skip rows outside the timeframe (partition pruning applied only under TimescaleDB)

## Risks and Mitigations

### Risk: Wrong Strategy Selection

- **Risk**: User chooses vector search for an exact-keyword query (poor results)
- **Likelihood**: Medium (requires understanding the differences)
- **Impact**: Medium (degraded retrieval quality)
- **Mitigation**:
  - Default to hybrid for balanced results
  - Document use cases clearly
  - Provide examples in API docs
  - Consider auto-detection in the future

### Risk: Embedding Latency

- **Risk**: Vector/hybrid searches are slow due to embedding generation
- **Likelihood**: High (embedding is I/O bound)
- **Impact**: Medium (100-500ms for Ollama)
- **Mitigation**:
  - Cache embeddings for common queries (future)
  - Use fast, dedicated local embedding models served by Ollama
  - Provide fallback to full-text if embedding fails
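
The full-text fallback can be isolated around the embedding call so that only embedding failures, not database errors, trigger it. A hypothetical sketch: `search_with_fallback`, the injected `embedder`/`search` callables, and the 2-second default budget are assumptions, and because HTM's own error classes are not shown here it rescues `StandardError` (which includes `Timeout::Error`).

```ruby
require "timeout"

# Hypothetical wrapper: generate the query embedding within a time budget;
# if that fails, downgrade to :fulltext, which needs no embedding. Errors
# raised by the search itself are deliberately NOT rescued here.
def search_with_fallback(query, strategy:, embedder:, search:, embed_timeout: 2.0, logger: nil)
  embedding = nil

  if strategy != :fulltext
    begin
      embedding = Timeout.timeout(embed_timeout) { embedder.call(query) }
    rescue StandardError => e
      logger&.warn("embedding failed (#{e.class}: #{e.message}); using :fulltext")
      strategy = :fulltext
    end
  end

  search.call(query, strategy: strategy, embedding: embedding)
end
```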

### Risk: Language Limitation

- **Risk**: Full-text search is configured for English only
- **Likelihood**: Low (single user, likely English)
- **Impact**: High (for non-English users)
- **Mitigation**:
  - Document the English assumption
  - Support a language parameter in the future
  - Vector search can cover other languages if the embedding model is multilingual

### Risk: Pre-filter Misses Results

- **Risk**: The hybrid pre-filter (100 candidates) misses relevant results, including any memory that lacks the query keywords entirely
- **Likelihood**: Low (100 is generous)
- **Impact**: Medium (reduced recall)
- **Mitigation**:
  - Make `prefilter_limit` configurable
  - Monitor recall metrics in practice
  - Adjust the default if needed

## Future Enhancements

### Query Auto-Detection

```ruby
# Automatically choose a strategy based on the query (proposed API, not yet implemented)
htm.recall_smart(timeframe: "last week", topic: "PostgreSQL 17.2")
# Detects version number → uses :fulltext

htm.recall_smart(timeframe: "last month", topic: "architectural philosophy")
# Detects conceptual query → uses :vector
```
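
A rule-based classifier is one simple way `recall_smart` might choose a strategy. This sketch is hypothetical and deliberately crude: `detect_strategy`, its regular expressions, and the two-word cutoff are illustrative assumptions, not tuned heuristics.

```ruby
# Hypothetical, deliberately crude heuristic for picking a search strategy.
VERSION_NUMBER = /\b\d+(?:\.\d+)+\b/                           # "17.2", "3.3.0"
QUOTED_PHRASE  = /"[^"]+"/                                     # explicit exact phrase
IDENTIFIER     = /\b[a-z]+_[a-z_]+\b|\b[a-z]+[A-Z][A-Za-z]*\b/ # snake_case, camelCase

def detect_strategy(topic)
  # Exact-match signals favor full-text search.
  return :fulltext if [VERSION_NUMBER, QUOTED_PHRASE, IDENTIFIER].any? { |re| topic.match?(re) }

  # Very short topics carry little keyword signal, so lean on meaning.
  return :vector if topic.split.size <= 2

  # Otherwise use the recommended default.
  :hybrid
end
```

A rule set like this would need evaluation against real queries; for instance, it sends "architectural decisions design choices" (Use Case 1) to `:hybrid` rather than `:vector`.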

### Re-ranking Strategies

```ruby
# Custom re-ranking based on multiple signals (proposed `rerank:` option)
memories = htm.recall(
  timeframe: "last week",
  topic: "PostgreSQL",
  strategy: :hybrid,
  rerank: [:similarity, :importance, :recency] # Multi-factor scoring
)
```
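
The `rerank: [:similarity, :importance, :recency]` idea could be realized as a weighted sum of signals. A hypothetical sketch: the weights, the 7-day recency half-life, and the assumption that `importance` is already scaled to 0..1 are illustrative choices, not HTM behavior.

```ruby
# Hypothetical multi-factor reranker. Each result is a Hash with
# :similarity (0..1), :importance (assumed pre-scaled to 0..1) and
# :created_at (Time). Recency decays exponentially with a half-life.
DEFAULT_WEIGHTS = { similarity: 0.6, importance: 0.25, recency: 0.15 }.freeze

def recency_score(created_at, now:, half_life_days: 7.0)
  age_days = (now - created_at) / 86_400.0
  0.5**(age_days / half_life_days)
end

def rerank(results, now: Time.now, weights: DEFAULT_WEIGHTS, half_life_days: 7.0)
  results
    .map do |r|
      signals = {
        similarity: r[:similarity],
        importance: r[:importance],
        recency: recency_score(r[:created_at], now: now, half_life_days: half_life_days)
      }
      score = weights.sum { |name, weight| weight * signals.fetch(name) }
      r.merge(score: score)
    end
    .sort_by { |r| -r[:score] }
end
```

With these weights a slightly less similar but much newer memory can outrank an older one; tuning the weights trades semantic fit against freshness.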

### Query Expansion

```ruby
# LLM-powered query expansion (proposed; the expanded list is illustrative)
original = "database"
expanded = [original, "PostgreSQL", "TimescaleDB", "SQL", "storage"]

memories = htm.recall(
  timeframe: "last month",
  topic: expanded, # proposed: accept an Array of alternative terms
  strategy: :fulltext
)
```
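
Because `plainto_tsquery` ANDs its terms, feeding it an expanded synonym list would require every synonym to appear in the same memory. Expanded terms are better ORed and passed to `to_tsquery`. A hypothetical sketch of building that string, assuming the terms come from an LLM and therefore need sanitizing; `expansion_tsquery` is an illustrative name.

```ruby
# Hypothetical helper: build to_tsquery syntax that ORs expanded terms.
# Each term is reduced to alphanumeric words (keeping dotted tokens such as
# "17.2"), which strips tsquery operators like & | ! : ( ) from LLM output.
# Multi-word terms become phrase queries joined with <-> (FOLLOWED BY).
def expansion_tsquery(terms)
  clauses = terms.filter_map do |term|
    words = term.downcase.scan(/[[:alnum:]]+(?:\.[[:alnum:]]+)*/)
    next if words.empty?

    words.size == 1 ? words.first : "(#{words.join(' <-> ')})"
  end
  clauses.uniq.join(" | ")
end
```

The result would be bound as a parameter, for example in `to_tsvector('english', value) @@ to_tsquery('english', $1)`, so that `to_tsquery` applies the same English stemming as the indexed text.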

### Caching Layer

```ruby
# Cache embedding generation for common queries
class CachedEmbedder
  def initialize(embedding_service)
    @embedding_service = embedding_service
    @embedding_cache = {}
  end

  def embed(query)
    @embedding_cache[query] ||= @embedding_service.embed(query)
  end
end
```
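
A plain Hash cache grows with every distinct query. A bounded variant can lean on Ruby's insertion-ordered Hash to evict the least recently used entry. `LruEmbeddingCache` and its 1,000-entry default are hypothetical; any object responding to `embed(text)` can be wrapped.

```ruby
# Hypothetical bounded LRU cache for query embeddings. Ruby Hashes keep
# insertion order, so re-inserting a key marks it most recently used and
# the first key is always the least recently used.
class LruEmbeddingCache
  def initialize(embedder, max_size: 1_000)
    raise ArgumentError, "max_size must be at least 1" if max_size < 1

    @embedder = embedder # any object responding to #embed(text)
    @max_size = max_size
    @entries = {}
  end

  def embed(query)
    if @entries.key?(query)
      embedding = @entries.delete(query) # re-added below as most recent
    else
      embedding = @embedder.embed(query)
      @entries.delete(@entries.first.first) if @entries.size >= @max_size # evict LRU
    end
    @entries[query] = embedding
  end

  def size
    @entries.size
  end
end
```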

## Alternatives Considered

### Pure Vector Search Only
**Pros**: Simplest API, semantic by default
**Cons**: Misses exact keyword matches, slower on large datasets
**Decision**: ❌ Rejected - keyword precision is needed

### Pure Full-Text Only
**Pros**: Fast, no embedding overhead
**Cons**: No semantic understanding, synonym issues
**Decision**: ❌ Rejected - semantic understanding is essential for LLMs

### LLM-as-Retriever
**Pros**: Most flexible, natural language queries
**Cons**: Expensive, slow, requires online LLM
**Decision**: ❌ Rejected - too slow and expensive for the retrieval path

### Elasticsearch/Meilisearch
**Pros**: Dedicated search engines, advanced features
**Cons**: Additional infrastructure, complexity, cost
**Decision**: ❌ Rejected - PostgreSQL is sufficient for v1, and keeping everything in one database simplifies operations

## References

- [RAG (Retrieval-Augmented Generation)](https://arxiv.org/abs/2005.11401)
- [pgvector Documentation](https://github.com/pgvector/pgvector)
- [PostgreSQL Full-Text Search](https://www.postgresql.org/docs/current/textsearch.html)
- [HNSW Algorithm](https://arxiv.org/abs/1603.09320)
- [Hybrid Search Best Practices](https://www.pinecone.io/learn/hybrid-search-intro/)

## Review Notes

**AI Engineer**: ✅ Hybrid search is the right approach for RAG systems. Pre-filter optimization is smart.

**Database Architect**: ✅ TimescaleDB + pgvector + full-text is well-architected. Consider query plan analysis for optimization.

**Performance Specialist**: ✅ HNSW and GIN indexes will scale. Monitor embedding latency in production.

**Systems Architect**: ✅ Three strategies provide good flexibility. Document decision matrix clearly for users.

**Ruby Expert**: ✅ Clean API design. Consider strategy as default parameter: `recall(..., strategy: :hybrid)`