htm 0.0.1
- checksums.yaml +7 -0
- data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
- data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
- data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
- data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
- data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
- data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
- data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
- data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
- data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
- data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
- data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
- data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
- data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
- data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
- data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
- data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
- data/.architecture/members.yml +144 -0
- data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
- data/.architecture/reviews/initial-system-analysis.md +330 -0
- data/.envrc +32 -0
- data/.irbrc +145 -0
- data/CHANGELOG.md +150 -0
- data/COMMITS.md +196 -0
- data/LICENSE +21 -0
- data/README.md +1347 -0
- data/Rakefile +51 -0
- data/SETUP.md +268 -0
- data/config/database.yml +67 -0
- data/db/migrate/20250101000001_enable_extensions.rb +14 -0
- data/db/migrate/20250101000002_create_robots.rb +14 -0
- data/db/migrate/20250101000003_create_nodes.rb +42 -0
- data/db/migrate/20250101000005_create_tags.rb +38 -0
- data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
- data/db/schema.sql +473 -0
- data/db/seed_data/README.md +100 -0
- data/db/seed_data/presidents.md +136 -0
- data/db/seed_data/states.md +151 -0
- data/db/seeds.rb +208 -0
- data/dbdoc/README.md +173 -0
- data/dbdoc/public.node_stats.md +48 -0
- data/dbdoc/public.node_stats.svg +41 -0
- data/dbdoc/public.node_tags.md +40 -0
- data/dbdoc/public.node_tags.svg +112 -0
- data/dbdoc/public.nodes.md +54 -0
- data/dbdoc/public.nodes.svg +118 -0
- data/dbdoc/public.nodes_tags.md +39 -0
- data/dbdoc/public.nodes_tags.svg +112 -0
- data/dbdoc/public.ontology_structure.md +48 -0
- data/dbdoc/public.ontology_structure.svg +38 -0
- data/dbdoc/public.operations_log.md +42 -0
- data/dbdoc/public.operations_log.svg +130 -0
- data/dbdoc/public.relationships.md +39 -0
- data/dbdoc/public.relationships.svg +41 -0
- data/dbdoc/public.robot_activity.md +46 -0
- data/dbdoc/public.robot_activity.svg +35 -0
- data/dbdoc/public.robots.md +35 -0
- data/dbdoc/public.robots.svg +90 -0
- data/dbdoc/public.schema_migrations.md +29 -0
- data/dbdoc/public.schema_migrations.svg +26 -0
- data/dbdoc/public.tags.md +35 -0
- data/dbdoc/public.tags.svg +60 -0
- data/dbdoc/public.topic_relationships.md +45 -0
- data/dbdoc/public.topic_relationships.svg +32 -0
- data/dbdoc/schema.json +1437 -0
- data/dbdoc/schema.svg +154 -0
- data/docs/api/database.md +806 -0
- data/docs/api/embedding-service.md +532 -0
- data/docs/api/htm.md +797 -0
- data/docs/api/index.md +259 -0
- data/docs/api/long-term-memory.md +1096 -0
- data/docs/api/working-memory.md +665 -0
- data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
- data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
- data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
- data/docs/architecture/adrs/004-hive-mind.md +437 -0
- data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
- data/docs/architecture/adrs/006-context-assembly.md +496 -0
- data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
- data/docs/architecture/adrs/008-robot-identification.md +625 -0
- data/docs/architecture/adrs/009-never-forget.md +648 -0
- data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
- data/docs/architecture/adrs/011-pgai-integration.md +494 -0
- data/docs/architecture/adrs/index.md +215 -0
- data/docs/architecture/hive-mind.md +736 -0
- data/docs/architecture/index.md +351 -0
- data/docs/architecture/overview.md +538 -0
- data/docs/architecture/two-tier-memory.md +873 -0
- data/docs/assets/css/custom.css +83 -0
- data/docs/assets/images/htm-core-components.svg +63 -0
- data/docs/assets/images/htm-database-schema.svg +93 -0
- data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
- data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
- data/docs/assets/images/htm-layered-architecture.svg +71 -0
- data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
- data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
- data/docs/assets/images/htm.jpg +0 -0
- data/docs/assets/images/htm_demo.gif +0 -0
- data/docs/assets/js/mathjax.js +18 -0
- data/docs/assets/videos/htm_video.mp4 +0 -0
- data/docs/database_rake_tasks.md +322 -0
- data/docs/development/contributing.md +787 -0
- data/docs/development/index.md +336 -0
- data/docs/development/schema.md +596 -0
- data/docs/development/setup.md +719 -0
- data/docs/development/testing.md +819 -0
- data/docs/guides/adding-memories.md +824 -0
- data/docs/guides/context-assembly.md +1009 -0
- data/docs/guides/getting-started.md +577 -0
- data/docs/guides/index.md +118 -0
- data/docs/guides/long-term-memory.md +941 -0
- data/docs/guides/multi-robot.md +866 -0
- data/docs/guides/recalling-memories.md +927 -0
- data/docs/guides/search-strategies.md +953 -0
- data/docs/guides/working-memory.md +717 -0
- data/docs/index.md +214 -0
- data/docs/installation.md +477 -0
- data/docs/multi_framework_support.md +519 -0
- data/docs/quick-start.md +655 -0
- data/docs/setup_local_database.md +302 -0
- data/docs/using_rake_tasks_in_your_app.md +383 -0
- data/examples/basic_usage.rb +93 -0
- data/examples/cli_app/README.md +317 -0
- data/examples/cli_app/htm_cli.rb +270 -0
- data/examples/custom_llm_configuration.rb +183 -0
- data/examples/example_app/Rakefile +71 -0
- data/examples/example_app/app.rb +206 -0
- data/examples/sinatra_app/Gemfile +21 -0
- data/examples/sinatra_app/app.rb +335 -0
- data/lib/htm/active_record_config.rb +113 -0
- data/lib/htm/configuration.rb +342 -0
- data/lib/htm/database.rb +594 -0
- data/lib/htm/embedding_service.rb +115 -0
- data/lib/htm/errors.rb +34 -0
- data/lib/htm/job_adapter.rb +154 -0
- data/lib/htm/jobs/generate_embedding_job.rb +65 -0
- data/lib/htm/jobs/generate_tags_job.rb +82 -0
- data/lib/htm/long_term_memory.rb +965 -0
- data/lib/htm/models/node.rb +109 -0
- data/lib/htm/models/node_tag.rb +33 -0
- data/lib/htm/models/robot.rb +52 -0
- data/lib/htm/models/tag.rb +76 -0
- data/lib/htm/railtie.rb +76 -0
- data/lib/htm/sinatra.rb +157 -0
- data/lib/htm/tag_service.rb +135 -0
- data/lib/htm/tasks.rb +38 -0
- data/lib/htm/version.rb +5 -0
- data/lib/htm/working_memory.rb +182 -0
- data/lib/htm.rb +400 -0
- data/lib/tasks/db.rake +19 -0
- data/lib/tasks/htm.rake +147 -0
- data/lib/tasks/jobs.rake +312 -0
- data/mkdocs.yml +190 -0
- data/scripts/install_local_database.sh +309 -0
- metadata +341 -0
@@ -0,0 +1,314 @@
# ADR-001: PostgreSQL with TimescaleDB for Storage

**Status**: Accepted

**Date**: 2025-10-25

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

---

## Quick Summary

HTM uses **PostgreSQL with TimescaleDB** as its primary storage backend, providing time-series optimization, vector embeddings (pgvector), full-text search, and ACID compliance in a single, production-proven database system.

**Why**: Consolidates time-series data, vector search, and full-text capabilities in one system rather than maintaining multiple specialized databases.

**Impact**: Production-ready storage with excellent tooling, at the cost of some operational complexity compared to simpler alternatives.

---

## Context

HTM requires a persistent storage solution that can handle:

- Time-series data with efficient time-range queries
- Vector embeddings for semantic search
- Full-text search capabilities
- ACID compliance for data integrity
- Scalability for growing memory databases
- Production-grade reliability

### Alternative Options Considered

1. **Pure PostgreSQL**: Solid relational database, pgvector support
2. **TimescaleDB**: PostgreSQL extension optimized for time-series
3. **Elasticsearch**: Strong full-text search, vector support added
4. **Pinecone/Weaviate**: Specialized vector databases
5. **SQLite + extensions**: Simple, embedded option

---

## Decision

We will use **PostgreSQL with TimescaleDB** as the primary storage backend for HTM.

---

## Rationale

### Why PostgreSQL?

**Production-proven**:
- Decades of reliability in demanding environments
- ACID compliance guarantees data integrity for memory operations
- Rich ecosystem with extensive tooling, monitoring, and support

**Search capabilities**:
- **pgvector extension**: Native vector similarity search with HNSW indexing
- **Full-text search**: Built-in tsvector with GIN indexing
- **pg_trgm extension**: Trigram-based fuzzy matching

**Developer experience**:
- Strong typing with schema enforcement prevents data corruption
- Wide adoption means it is well understood by developers
- Standard SQL with PostgreSQL-specific enhancements

### Why TimescaleDB?

**Time-series optimization**:
- **Hypertable partitioning**: Automatic chunk-based time partitioning
- **Compression policies**: Automatic compression of old data (70-90% reduction)
- **Time-range optimization**: Fast queries on temporal data via chunk exclusion

**PostgreSQL compatibility**:
- Drop-in extension, not a fork
- All PostgreSQL features remain available
- Standard PostgreSQL tools work seamlessly

**Operational features**:
- **Continuous aggregates**: Pre-computed summaries for analytics
- **Retention policies**: Automatic data lifecycle management
- **Cloud offering**: Managed service available (TimescaleDB Cloud)

### Why Not Alternatives?

!!! warning "Elasticsearch"
    - High operational complexity (JVM tuning, cluster management)
    - Higher resource usage
    - Vector support more recent, less mature
    - Strength: superior full-text search, but this is not critical for our use case

!!! info "Specialized Vector DBs (Pinecone, Weaviate, Qdrant)"
    - Additional service dependency increases complexity
    - Limited relational capabilities
    - Vendor lock-in concerns
    - Cost considerations for managed services
    - Strength: excellent vector search performance
    - Strength: purpose-built for embeddings

!!! note "SQLite"
    - Limited concurrency (write locks)
    - No native vector search (extensions experimental)
    - Not suitable for multi-robot scenarios
    - Strength: simple deployment
    - Strength: zero configuration

---

## Implementation Details

### Schema Design

```sql
-- Nodes table as TimescaleDB hypertable
CREATE TABLE nodes (
  id SERIAL,
  key TEXT NOT NULL,
  value TEXT NOT NULL,
  embedding vector(1536),
  robot_id TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL,
  importance FLOAT DEFAULT 1.0,
  type TEXT,
  metadata JSONB,
  -- Unique constraints on a hypertable must include the partitioning
  -- column, so key uniqueness here is per (key, created_at) pair
  PRIMARY KEY (id, created_at),
  UNIQUE (key, created_at)
);

-- Convert to hypertable (TimescaleDB)
SELECT create_hypertable('nodes', 'created_at');

-- Vector indexing (HNSW for approximate nearest neighbor)
CREATE INDEX nodes_embedding_idx ON nodes
  USING hnsw (embedding vector_cosine_ops);

-- Full-text indexing
CREATE INDEX nodes_fts_idx ON nodes
  USING GIN (to_tsvector('english', value));

-- Additional indexes
CREATE INDEX nodes_robot_id_idx ON nodes (robot_id);
CREATE INDEX nodes_created_at_idx ON nodes (created_at DESC);
CREATE INDEX nodes_type_idx ON nodes (type);
```

### Connection Configuration

```ruby
# Via environment variable (preferred)
ENV['HTM_DBURL'] = "postgresql://tsdbadmin:secret@host:5432/tsdb?sslmode=require"

# Parsed into connection hash
{
  host: 'host',
  port: 5432,
  dbname: 'tsdb',
  user: 'tsdbadmin',
  password: 'secret',
  sslmode: 'require'
}
```
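
The parsing step above could be sketched as follows. This is a minimal illustration using Ruby's standard `uri` library; `parse_db_url` is a hypothetical helper, not necessarily how HTM parses the URL, and the `prefer` fallback for `sslmode` is an assumption that mirrors libpq's default.

```ruby
require "uri"

# Hypothetical helper: turn a PostgreSQL URL into a connection hash.
# Percent-encoded credentials are not decoded in this sketch.
def parse_db_url(url)
  uri = URI.parse(url)
  query = URI.decode_www_form(uri.query.to_s).to_h

  {
    host: uri.host,
    port: uri.port || 5432,                  # assumed default PostgreSQL port
    dbname: uri.path.delete_prefix("/"),
    user: uri.user,
    password: uri.password,
    sslmode: query.fetch("sslmode", "prefer") # assumed fallback
  }
end

config = parse_db_url("postgresql://tsdbadmin:secret@host:5432/tsdb?sslmode=require")
```

Keeping the URL in a single environment variable means one secret to manage per deployment; the hash form is what a lower-level driver call typically consumes.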

### Key Features Enabled

**Vector similarity search**:
```sql
-- Find semantically similar nodes
SELECT *, 1 - (embedding <=> $1::vector) as similarity
FROM nodes
WHERE created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;
```
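
In pgvector, `<=>` is the cosine distance operator, so `1 - (a <=> b)` is cosine similarity. The arithmetic can be illustrated in plain Ruby; the function names below are illustrative only, and the database performs this computation itself.

```ruby
# Cosine distance between two equal-length numeric arrays,
# matching what pgvector's <=> operator computes.
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Similarity as used in the query above: 1 minus the distance.
def similarity(a, b)
  1.0 - cosine_distance(a, b)
end
```

Identical directions score near 1, orthogonal vectors near 0, and opposite vectors near -1, which is why the query orders by distance ascending to get the most similar nodes first.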

**Full-text search**:
```sql
-- Find nodes containing keywords
SELECT *, ts_rank(to_tsvector('english', value),
                  plainto_tsquery('english', $1)) as rank
FROM nodes
WHERE to_tsvector('english', value) @@ plainto_tsquery('english', $1)
ORDER BY rank DESC
LIMIT 10;
```

**Time-range queries** (optimized by chunk exclusion):
```sql
-- Fast time-range query (TimescaleDB prunes chunks)
SELECT * FROM nodes
WHERE created_at BETWEEN '2025-10-01' AND '2025-10-25'
  AND robot_id = 'robot-123'
ORDER BY created_at DESC;
```

**Automatic compression**:
```sql
-- Enable compression first, segmenting by robot_id and type for better compression
ALTER TABLE nodes SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'robot_id, type'
);

-- Then compress chunks older than 30 days
-- (the policy can only be added once compression is enabled)
SELECT add_compression_policy('nodes', INTERVAL '30 days');
```

---

## Consequences

### Positive

- Production-ready with battle-tested reliability
- Multi-modal search: vector, full-text, and hybrid strategies
- Time-series optimization for efficient temporal queries
- Cost-effective storage with compression reducing cloud costs
- Familiar tooling: standard PostgreSQL tools and practices
- Flexible querying: full SQL power for complex operations
- ACID guarantees for critical memory operations

### Negative

- Operational complexity: requires database management (mitigated by managed service)
- Vertical scaling limits (mitigated by partitioning)
- Connection overhead: PostgreSQL connections are relatively heavy
- Vector search performance: slower than specialized vector DBs at massive scale

### Neutral

- Learning curve: developers need PostgreSQL + TimescaleDB knowledge
- Cloud dependency: currently using TimescaleDB Cloud (could self-host)
- Extension management: requires the timescaledb, pgvector, and pg_trgm extensions

---

## Risks and Mitigations

### Risk: Extension Availability

!!! danger "Risk"
    Extensions not available in all PostgreSQL environments

**Likelihood**: Low (extensions widely available)
**Impact**: High (breaks core functionality)
**Mitigation**: Document requirements clearly, verify in setup process
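
A setup-time check along these lines is one way to verify the requirement. This is a sketch, not HTM's actual setup code: `missing_extensions` and `verify_extensions!` are hypothetical helpers, and the installed list is assumed to come from `SELECT extname FROM pg_extension`. Note that pgvector registers under the extension name `vector`.

```ruby
# Extension names as they appear in pg_extension.
REQUIRED_EXTENSIONS = %w[timescaledb vector pg_trgm].freeze

# Hypothetical helper: which required extensions are absent
# from the list of installed extension names?
def missing_extensions(installed)
  REQUIRED_EXTENSIONS - installed
end

# Raise a descriptive error when anything required is missing.
def verify_extensions!(installed)
  missing = missing_extensions(installed)
  return true if missing.empty?

  raise "Missing PostgreSQL extensions: #{missing.join(', ')}"
end
```

Failing fast at setup turns a confusing runtime error (for example, an unknown `vector` type) into a clear message about what to install.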

### Risk: Connection Exhaustion

!!! warning "Risk"
    PostgreSQL connections limited (default ~100)

**Likelihood**: Medium (with many robots)
**Impact**: Medium (service degradation)
**Mitigation**: Implement connection pooling (ConnectionPool gem)
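
To show the idea behind pooling without depending on any gem, here is a minimal fixed-size pool built on Ruby's thread-safe `Queue`. `SimplePool` is a hypothetical stand-in for illustration; it is not the ConnectionPool gem and omits features such as checkout timeouts.

```ruby
# Minimal fixed-size resource pool (illustrative stand-in only).
class SimplePool
  def initialize(size:, &factory)
    @queue = Queue.new
    size.times { @queue << factory.call }
  end

  # Check a resource out, yield it, and always return it to the pool,
  # even when the block raises.
  def with
    resource = @queue.pop # blocks while every resource is checked out
    yield resource
  ensure
    @queue << resource if resource
  end

  # Number of resources currently idle in the pool.
  def available
    @queue.size
  end
end
```

With a real driver the factory block would open a database connection, so the number of open connections is capped at the pool size no matter how many threads are running.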

### Risk: Storage Costs

!!! info "Risk"
    Vector data storage can be expensive at scale

**Likelihood**: Medium (depends on usage)
**Impact**: Medium (operational cost)
**Mitigation**: Compression policies, retention policies, archival strategies

### Risk: Query Performance at Scale

!!! warning "Risk"
    Complex hybrid searches may slow with millions of nodes

**Likelihood**: Low (with proper indexing)
**Impact**: Medium (user experience)
**Mitigation**: Query optimization, read replicas, caching layer

---

## Alternatives Comparison

| Solution | Pros | Cons | Decision |
|----------|------|------|----------|
| Pure PostgreSQL | Simple, reliable, pgvector | No time-series optimization | Rejected |
| **PostgreSQL + TimescaleDB** | **Best of both worlds** | **Slight complexity** | **ACCEPTED** |
| Elasticsearch | Excellent full-text search | High resource usage, complexity | Rejected |
| Pinecone | Purpose-built vectors | Vendor lock-in, cost, limited relational | Rejected |
| SQLite | Simple, embedded | Limited concurrency, no vectors | Rejected |

---

## Future Considerations

- **Read replicas**: For query scaling when needed
- **Partitioning strategies**: By robot_id for tenant isolation
- **Caching layer**: Redis for hot nodes
- **Archive tier**: S3/Glacier for very old memories
- **Multi-region**: For global deployment

---

## References

- [TimescaleDB Documentation](https://docs.timescale.com/)
- [pgvector Documentation](https://github.com/pgvector/pgvector)
- [PostgreSQL Full-Text Search](https://www.postgresql.org/docs/current/textsearch.html)
- [HTM Database Schema Guide](../../development/schema.md)
- [HTM Configuration Guide](../../installation.md)

---

## Review Notes

**Systems Architect**: Solid choice for time-series + vector workload. Consider read replicas for scaling.

**Database Architect**: Excellent indexing strategy. Monitor query performance as data grows.

**Performance Specialist**: TimescaleDB compression will help with costs. Add connection pooling soon.

**Maintainability Expert**: PostgreSQL tooling is mature and well-documented. Good choice for long-term maintenance.
@@ -0,0 +1,411 @@
# ADR-002: Two-Tier Memory Architecture

**Status**: Accepted

**Date**: 2025-10-25

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

---

## Quick Summary

HTM implements a **two-tier memory architecture** with token-limited working memory (hot tier) and unlimited long-term memory (cold tier), managing LLM context windows while preserving all historical data through RAG-based retrieval.

**Why**: LLMs have limited context windows but need awareness across long conversations. Two tiers provide fast access to recent context while maintaining complete history.

**Impact**: Efficient token budget management with never-forget guarantees, at the cost of coordination between two storage layers.

---

## Context

LLM-based applications face a fundamental challenge: LLMs have limited context windows (typically 128K-200K tokens) but need to maintain awareness across long conversations and sessions spanning days, weeks, or months.

### Requirements

- Persist memories across sessions (durable storage)
- Provide fast access to recent/relevant context
- Manage token budgets efficiently
- Never lose data accidentally
- Support contextual recall from the past

### Alternative Approaches

1. **Database-only**: Store everything in PostgreSQL, load on demand
2. **Memory-only**: Keep everything in RAM, serialize on shutdown
3. **Two-tier**: Combine fast working memory with durable long-term storage
4. **External service**: Use a managed memory service

---

## Decision

We will implement a **two-tier memory architecture** with:

- **Working Memory**: Token-limited, in-memory active context
- **Long-term Memory**: Durable PostgreSQL storage

---

## Rationale

### Working Memory (Hot Tier)

**Characteristics**:
- **Purpose**: Immediate context for LLM
- **Storage**: In-memory Ruby data structures
- **Capacity**: Token-limited (default 128K tokens)
- **Eviction**: LRU-based eviction when full
- **Access pattern**: Frequent reads, moderate writes
- **Lifetime**: Process lifetime

**Benefits**:
- O(1) hash lookups for fast context access
- Token budget control prevents context overflow
- Explicit eviction policy with transparent behavior

### Long-term Memory (Cold Tier)

**Characteristics**:
- **Purpose**: Permanent knowledge base
- **Storage**: PostgreSQL with TimescaleDB
- **Capacity**: Effectively unlimited
- **Retention**: Permanent (explicit deletion only)
- **Access pattern**: RAG-based retrieval
- **Lifetime**: Forever

**Benefits**:
- Never lose data, survives restarts
- Search historical context semantically
- Time-series queries for temporal context

### Data Flow

```
Add Memory:
  User Input → Working Memory → Long-term Memory
               (immediate)      (persisted)

Recall Memory:
  Query → Long-term Memory (RAG search) → Working Memory
          (semantic + temporal)           (evict if needed)

Eviction:
  Working Memory (full) → Evict LRU → Long-term Memory (already there)
                                      (mark as evicted, not deleted)
```

---

## Implementation Details

### Working Memory

```ruby
class WorkingMemory
  attr_reader :max_tokens, :token_count

  def initialize(max_tokens: 128_000)
    @nodes = {}          # key => {value, token_count, importance, added_at, last_accessed}
    @max_tokens = max_tokens
    @token_count = 0
    @access_order = []   # Track access for LRU
  end

  def add(key, value, token_count:, importance: 1.0)
    evict_to_make_space(token_count) if needs_eviction?(token_count)
    @nodes[key] = {
      value: value,
      token_count: token_count,
      importance: importance,
      added_at: Time.now,
      last_accessed: Time.now
    }
    @token_count += token_count
    @access_order << key
  end

  # True when adding `needed_tokens` would exceed the token budget
  def needs_eviction?(needed_tokens)
    @token_count + needed_tokens > @max_tokens
  end

  def evict_to_make_space(needed_tokens)
    # LRU eviction based on last access + importance
    # See ADR-007 for detailed eviction strategy
  end

  def assemble_context(strategy: :balanced, max_tokens: nil)
    # Sort by strategy and assemble within budget
    # See ADR-006 for context assembly strategies
  end
end
```

### Long-term Memory

```ruby
class LongTermMemory
  def add(key:, value:, embedding:, robot_id:, importance: 1.0, type: nil)
    # Insert into PostgreSQL with vector embedding
    # (pgvector accepts embeddings in text form, e.g. "[0.1,0.2,0.3]")
    @db.exec_params(<<~SQL, [key, value, embedding, robot_id, importance, type])
      INSERT INTO nodes (key, value, embedding, robot_id, importance, type, created_at)
      VALUES ($1, $2, $3, $4, $5, $6, CURRENT_TIMESTAMP)
      RETURNING id
    SQL
  end

  def search(timeframe:, query:, embedding_service:, limit:, strategy: :hybrid)
    # RAG-based retrieval: temporal + semantic
    # See ADR-005 for retrieval strategies
  end

  def mark_evicted(keys)
    # Update in_working_memory flag (not deleted)
    # Encode the Ruby array as a PostgreSQL array literal for the text[] parameter
    pg_keys = PG::TextEncoder::Array.new.encode(keys)
    @db.exec_params(<<~SQL, [pg_keys])
      UPDATE nodes
      SET in_working_memory = FALSE
      WHERE key = ANY($1::text[])
    SQL
  end
end
```

### Coordination (HTM Class)

```ruby
class HTM
  def initialize(robot_name:, robot_id: nil, max_tokens: 128_000, ...)
    @working_memory = WorkingMemory.new(max_tokens: max_tokens)
    @long_term_memory = LongTermMemory.new(db_config)
    @embedding_service = EmbeddingService.new(...)
    @robot_id = robot_id || SecureRandom.uuid
    @robot_name = robot_name
  end

  def add_node(key, value, importance: 1.0, type: nil)
    # 1. Generate embedding
    embedding = @embedding_service.embed(value)

    # 2. Store in long-term memory
    @long_term_memory.add(
      key: key,
      value: value,
      embedding: embedding,
      robot_id: @robot_id,
      importance: importance,
      type: type
    )

    # 3. Add to working memory (evict if needed)
    token_count = estimate_tokens(value)
    @working_memory.add(key, value,
                        token_count: token_count,
                        importance: importance)
  end

  def recall(timeframe:, topic:, limit: 10, strategy: :hybrid)
    # 1. Search long-term memory (RAG)
    results = @long_term_memory.search(
      timeframe: timeframe,
      query: topic,
      embedding_service: @embedding_service,
      limit: limit,
      strategy: strategy
    )

    # 2. Add results to working memory (evict if needed)
    results.each do |node|
      @working_memory.add(node[:key], node[:value],
                          token_count: node[:token_count],
                          importance: node[:importance])
    end

    # 3. Return nodes
    results
  end
end
```

---

## Consequences

### Positive

- Fast context access through O(1) working memory lookups
- Durable storage: data is never lost and survives restarts
- Token budget control with automatic management
- Explicit eviction policy provides transparent behavior
- RAG-enabled semantic search of historical context
- Never-delete philosophy: eviction moves data, never removes it
- Process-isolated: each robot instance has independent working memory

### Negative

- Complexity of coordinating two storage layers
- Memory overhead from working memory consuming RAM
- Synchronization challenges keeping both tiers consistent
- Eviction overhead when moving data between tiers

### Neutral

- Token counting requires accurate estimation
- Strategy tuning for eviction and assembly needs calibration
- Per-process state means working memory is not shared across processes

---

## Eviction Strategies

### LRU-based (Implemented)

```ruby
def eviction_score(node)
  recency = Time.now - node[:last_accessed]  # seconds since last access
  importance = node[:importance]

  # Lower score = evict first
  importance / (recency + 1.0)
end
```

See [ADR-007: Working Memory Eviction Strategy](007-eviction-strategy.md) for detailed eviction algorithm.
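
To make the scoring concrete, the sketch below applies the same formula to free a requested number of tokens. It is an illustration under stated assumptions, not the algorithm from ADR-007: this `evict_to_make_space` is a hypothetical standalone function operating on a plain hash of nodes, and the `now:` parameter exists only to make the behavior deterministic.

```ruby
# Same formula as the snippet above: lower score = evict first.
def eviction_score(node, now: Time.now)
  seconds_idle = now - node[:last_accessed]
  node[:importance] / (seconds_idle + 1.0)
end

# Hypothetical helper: remove the lowest-scoring nodes until at least
# `needed_tokens` have been freed. Returns the evicted keys.
def evict_to_make_space(nodes, needed_tokens, now: Time.now)
  freed = 0
  evicted = []

  nodes.sort_by { |_key, node| eviction_score(node, now: now) }.each do |key, node|
    break if freed >= needed_tokens

    freed += node[:token_count]
    evicted << key
  end

  evicted.each { |key| nodes.delete(key) }
  evicted
end
```

Because importance sits in the numerator and idle time in the denominator, a stale but important node outlives an equally stale trivial one, while a node touched just now scores highest of all.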

### Future Strategies

- **Importance-only**: Keep most important nodes
- **Recency-only**: Pure LRU cache
- **Frequency-based**: Track access counts
- **Category-based**: Pin certain types (facts, preferences)
- **Smart eviction**: ML-based prediction of future access

---

## Context Assembly Strategies

### Recent (`:recent`)
Sort by `created_at DESC`, newest first

### Important (`:important`)
Sort by `importance DESC`, most important first

### Balanced (`:balanced`)
```ruby
score = importance * (1.0 / (1 + age_in_hours))
```

See [ADR-006: Context Assembly Strategies](006-context-assembly.md) for detailed assembly algorithms.
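
The three orderings can be combined with a token budget roughly as follows. This is a sketch, not the algorithm from ADR-006: the standalone `assemble_context` function, the greedy skip-if-too-large packing, and the use of `added_at` as the node's creation time are all assumptions made for illustration.

```ruby
# Balanced score from the formula above: importance decayed by age in hours.
def balanced_score(node, now: Time.now)
  age_in_hours = (now - node[:added_at]) / 3600.0
  node[:importance] * (1.0 / (1 + age_in_hours))
end

# Hypothetical helper: order nodes by strategy, then greedily take
# every node that still fits in the token budget.
def assemble_context(nodes, max_tokens:, strategy: :balanced, now: Time.now)
  ordered =
    case strategy
    when :recent    then nodes.sort_by { |n| -n[:added_at].to_f }
    when :important then nodes.sort_by { |n| -n[:importance] }
    when :balanced  then nodes.sort_by { |n| -balanced_score(n, now: now) }
    else raise ArgumentError, "unknown strategy: #{strategy}"
    end

  used = 0
  ordered.each_with_object([]) do |node, picked|
    next if used + node[:token_count] > max_tokens

    used += node[:token_count]
    picked << node[:value]
  end
end
```

With a budget that fits two of three equally sized nodes, `:important` keeps an old high-importance fact, while `:balanced` and `:recent` favor newer material.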
|
|
295
|
+
|
|
296
|
+

---

## Design Principles

### Never Forget (Unless Told)

- Eviction moves data, never deletes
- Only `forget(confirm: :confirmed)` deletes
- Long-term memory is append-only (updates rare)

See [ADR-009: Never-Forget Philosophy](009-never-forget.md) for deletion policies.

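The difference between eviction and deletion can be illustrated with a small in-memory stand-in. Everything here is hypothetical except the `confirm: :confirmed` guard named above: the `MemorySketch` class, its method names, and the use of a hash in place of the database are assumptions made for the sketch.

```ruby
# Hypothetical stand-in for the two tiers; not HTM's actual classes.
class MemorySketch
  def initialize
    @working = {}   # node id => node (fast, token-limited tier)
    @long_term = {} # node id => node (stands in for durable storage)
  end

  def add(id, node)
    @long_term[id] = node
    @working[id] = node
  end

  # Eviction drops only the working-memory copy; long-term data is untouched.
  def evict(id)
    @working.delete(id)
  end

  # Deletion is refused unless the caller explicitly confirms it.
  def forget(id, confirm: false)
    raise ArgumentError, "pass confirm: :confirmed to delete" unless confirm == :confirmed

    @working.delete(id)
    @long_term.delete(id)
  end

  def in_working_memory?(id)
    @working.key?(id)
  end

  def stored?(id)
    @long_term.key?(id)
  end
end
```

Requiring a specific symbol rather than a boolean makes an accidental `forget(id, confirm: true)` fail loudly instead of deleting data.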
### Token Budget Management

- Token counting happens at add time
- Working memory enforces hard token limit
- Context assembly respects token budget
- Safety margin (10%) for token estimation errors

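As a rough sketch of these rules, a budget object can reserve the 10% margin up front and count tokens when content is added. The `TokenBudget` class and its methods are hypothetical, and the four-characters-per-token estimate is a placeholder assumption standing in for a real tokenizer.

```ruby
# Hypothetical helper; not part of HTM's API.
class TokenBudget
  SAFETY_MARGIN_PERCENT = 10

  attr_reader :limit, :used

  def initialize(max_tokens)
    # Reserve the safety margin with integer arithmetic to avoid float rounding.
    @limit = max_tokens - (max_tokens * SAFETY_MARGIN_PERCENT / 100)
    @used = 0
  end

  # Placeholder estimate (assumption): roughly four characters per token.
  def self.estimate_tokens(text)
    (text.length / 4.0).ceil
  end

  def fits?(tokens)
    @used + tokens <= @limit
  end

  # Counts tokens at add time; returns false when the hard limit would be exceeded,
  # signalling that the caller needs to evict before retrying.
  def add(text)
    tokens = self.class.estimate_tokens(text)
    return false unless fits?(tokens)

    @used += tokens
    true
  end
end
```

With a nominal budget of 1,000 tokens, the usable limit in this sketch is 900, which leaves headroom if the estimate undercounts relative to the model's own tokenizer.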
### Transparent Behavior

- Log all evictions
- Track `in_working_memory` flag
- Operations log for audit trail

---

## Risks and Mitigations

### Risk: Token Count Inaccuracy

!!! warning "Risk"
    Tiktoken approximation differs from LLM's actual count

**Likelihood**: Medium (different tokenizers)
**Impact**: Medium (context overflow)
**Mitigation**: Add safety margin (10%), use LLM-specific counters

### Risk: Eviction Thrashing

!!! info "Risk"
    Constant eviction/recall cycles

**Likelihood**: Low (with proper sizing)
**Impact**: Medium (performance degradation)
**Mitigation**: Larger working memory, smarter eviction, caching

### Risk: Working Memory Growth

!!! danger "Risk"
    Memory leaks or unbounded growth

**Likelihood**: Low (token budget enforced)
**Impact**: High (OOM crashes)
**Mitigation**: Hard limits, monitoring, alerts

### Risk: Stale Working Memory

!!! note "Risk"
    Working memory doesn't reflect long-term updates

**Likelihood**: Low (single-writer pattern)
**Impact**: Low (eventual consistency OK)
**Mitigation**: Refresh on recall, invalidation on update

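The two mitigations for stale working memory, refresh on recall and invalidation on update, can be sketched as follows. The `CachedWorkingMemory` class is hypothetical, and a plain Ruby hash stands in for long-term storage; the real system's interfaces may differ.

```ruby
# Hypothetical sketch; a Hash stands in for the long-term store.
class CachedWorkingMemory
  def initialize(long_term)
    @long_term = long_term
    @working = {}
  end

  # Refresh on recall: re-read from long-term memory and overwrite the working copy.
  def recall(id)
    node = @long_term[id]
    if node.nil?
      @working.delete(id)
    else
      @working[id] = node
    end
    node
  end

  # Invalidation on update: write through, then drop the now-stale working copy.
  def update(id, node)
    @long_term[id] = node
    @working.delete(id)
    node
  end

  # Returns whatever the working tier currently holds, possibly stale.
  def cached(id)
    @working[id]
  end
end
```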

---

## Performance Characteristics

### Working Memory

- **Add**: O(1) when no eviction is needed; an add that triggers eviction also pays the eviction cost below
- **Retrieve**: O(1) hash lookup
- **Eviction**: O(n log n) for sorting, O(k) for removing k nodes
- **Context assembly**: O(n log n) for sorting, O(k) for selecting

### Long-term Memory

- **Add**: O(log n) PostgreSQL insert with indexes
- **Vector search**: O(log n) with HNSW index (approximate)
- **Full-text search**: O(log n) with GIN index
- **Hybrid search**: O(log n) for both, then merge

---

## Future Enhancements

1. **Shared working memory**: Redis-backed for multi-process
2. **Lazy loading**: Load nodes on first access
3. **Pre-fetching**: Anticipate needed context
4. **Compression**: Compress old working memory nodes
5. **Tiered eviction**: Multiple working memory levels
6. **Smart assembly**: ML-driven context selection

---

## References

- [Working Memory (Psychology)](https://en.wikipedia.org/wiki/Working_memory)
- [Cache Eviction Policies](https://en.wikipedia.org/wiki/Cache_replacement_policies)
- [LLM Context Window Management](https://www.anthropic.com/research/context-windows)
- [ADR-001: PostgreSQL Storage](001-postgresql-timescaledb.md)
- [ADR-006: Context Assembly](006-context-assembly.md)
- [ADR-007: Eviction Strategy](007-eviction-strategy.md)

---

## Review Notes

**Systems Architect**: Clean separation of concerns. Consider shared cache for horizontal scaling.

**Performance Specialist**: Good balance of speed and durability. Monitor eviction frequency.

**AI Engineer**: Token budget management is critical. Add safety margins for token count variance.

**Ruby Expert**: Consider using `Concurrent::Map` for thread-safe working memory in the future.
|