htm 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (155)
  1. checksums.yaml +7 -0
  2. data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
  3. data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
  4. data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
  5. data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
  6. data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
  7. data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
  8. data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
  9. data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
  10. data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
  11. data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
  12. data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
  13. data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
  14. data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
  15. data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
  16. data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
  17. data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
  18. data/.architecture/members.yml +144 -0
  19. data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
  20. data/.architecture/reviews/initial-system-analysis.md +330 -0
  21. data/.envrc +32 -0
  22. data/.irbrc +145 -0
  23. data/CHANGELOG.md +150 -0
  24. data/COMMITS.md +196 -0
  25. data/LICENSE +21 -0
  26. data/README.md +1347 -0
  27. data/Rakefile +51 -0
  28. data/SETUP.md +268 -0
  29. data/config/database.yml +67 -0
  30. data/db/migrate/20250101000001_enable_extensions.rb +14 -0
  31. data/db/migrate/20250101000002_create_robots.rb +14 -0
  32. data/db/migrate/20250101000003_create_nodes.rb +42 -0
  33. data/db/migrate/20250101000005_create_tags.rb +38 -0
  34. data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
  35. data/db/schema.sql +473 -0
  36. data/db/seed_data/README.md +100 -0
  37. data/db/seed_data/presidents.md +136 -0
  38. data/db/seed_data/states.md +151 -0
  39. data/db/seeds.rb +208 -0
  40. data/dbdoc/README.md +173 -0
  41. data/dbdoc/public.node_stats.md +48 -0
  42. data/dbdoc/public.node_stats.svg +41 -0
  43. data/dbdoc/public.node_tags.md +40 -0
  44. data/dbdoc/public.node_tags.svg +112 -0
  45. data/dbdoc/public.nodes.md +54 -0
  46. data/dbdoc/public.nodes.svg +118 -0
  47. data/dbdoc/public.nodes_tags.md +39 -0
  48. data/dbdoc/public.nodes_tags.svg +112 -0
  49. data/dbdoc/public.ontology_structure.md +48 -0
  50. data/dbdoc/public.ontology_structure.svg +38 -0
  51. data/dbdoc/public.operations_log.md +42 -0
  52. data/dbdoc/public.operations_log.svg +130 -0
  53. data/dbdoc/public.relationships.md +39 -0
  54. data/dbdoc/public.relationships.svg +41 -0
  55. data/dbdoc/public.robot_activity.md +46 -0
  56. data/dbdoc/public.robot_activity.svg +35 -0
  57. data/dbdoc/public.robots.md +35 -0
  58. data/dbdoc/public.robots.svg +90 -0
  59. data/dbdoc/public.schema_migrations.md +29 -0
  60. data/dbdoc/public.schema_migrations.svg +26 -0
  61. data/dbdoc/public.tags.md +35 -0
  62. data/dbdoc/public.tags.svg +60 -0
  63. data/dbdoc/public.topic_relationships.md +45 -0
  64. data/dbdoc/public.topic_relationships.svg +32 -0
  65. data/dbdoc/schema.json +1437 -0
  66. data/dbdoc/schema.svg +154 -0
  67. data/docs/api/database.md +806 -0
  68. data/docs/api/embedding-service.md +532 -0
  69. data/docs/api/htm.md +797 -0
  70. data/docs/api/index.md +259 -0
  71. data/docs/api/long-term-memory.md +1096 -0
  72. data/docs/api/working-memory.md +665 -0
  73. data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
  74. data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
  75. data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
  76. data/docs/architecture/adrs/004-hive-mind.md +437 -0
  77. data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
  78. data/docs/architecture/adrs/006-context-assembly.md +496 -0
  79. data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
  80. data/docs/architecture/adrs/008-robot-identification.md +625 -0
  81. data/docs/architecture/adrs/009-never-forget.md +648 -0
  82. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
  83. data/docs/architecture/adrs/011-pgai-integration.md +494 -0
  84. data/docs/architecture/adrs/index.md +215 -0
  85. data/docs/architecture/hive-mind.md +736 -0
  86. data/docs/architecture/index.md +351 -0
  87. data/docs/architecture/overview.md +538 -0
  88. data/docs/architecture/two-tier-memory.md +873 -0
  89. data/docs/assets/css/custom.css +83 -0
  90. data/docs/assets/images/htm-core-components.svg +63 -0
  91. data/docs/assets/images/htm-database-schema.svg +93 -0
  92. data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
  93. data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
  94. data/docs/assets/images/htm-layered-architecture.svg +71 -0
  95. data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
  96. data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
  97. data/docs/assets/images/htm.jpg +0 -0
  98. data/docs/assets/images/htm_demo.gif +0 -0
  99. data/docs/assets/js/mathjax.js +18 -0
  100. data/docs/assets/videos/htm_video.mp4 +0 -0
  101. data/docs/database_rake_tasks.md +322 -0
  102. data/docs/development/contributing.md +787 -0
  103. data/docs/development/index.md +336 -0
  104. data/docs/development/schema.md +596 -0
  105. data/docs/development/setup.md +719 -0
  106. data/docs/development/testing.md +819 -0
  107. data/docs/guides/adding-memories.md +824 -0
  108. data/docs/guides/context-assembly.md +1009 -0
  109. data/docs/guides/getting-started.md +577 -0
  110. data/docs/guides/index.md +118 -0
  111. data/docs/guides/long-term-memory.md +941 -0
  112. data/docs/guides/multi-robot.md +866 -0
  113. data/docs/guides/recalling-memories.md +927 -0
  114. data/docs/guides/search-strategies.md +953 -0
  115. data/docs/guides/working-memory.md +717 -0
  116. data/docs/index.md +214 -0
  117. data/docs/installation.md +477 -0
  118. data/docs/multi_framework_support.md +519 -0
  119. data/docs/quick-start.md +655 -0
  120. data/docs/setup_local_database.md +302 -0
  121. data/docs/using_rake_tasks_in_your_app.md +383 -0
  122. data/examples/basic_usage.rb +93 -0
  123. data/examples/cli_app/README.md +317 -0
  124. data/examples/cli_app/htm_cli.rb +270 -0
  125. data/examples/custom_llm_configuration.rb +183 -0
  126. data/examples/example_app/Rakefile +71 -0
  127. data/examples/example_app/app.rb +206 -0
  128. data/examples/sinatra_app/Gemfile +21 -0
  129. data/examples/sinatra_app/app.rb +335 -0
  130. data/lib/htm/active_record_config.rb +113 -0
  131. data/lib/htm/configuration.rb +342 -0
  132. data/lib/htm/database.rb +594 -0
  133. data/lib/htm/embedding_service.rb +115 -0
  134. data/lib/htm/errors.rb +34 -0
  135. data/lib/htm/job_adapter.rb +154 -0
  136. data/lib/htm/jobs/generate_embedding_job.rb +65 -0
  137. data/lib/htm/jobs/generate_tags_job.rb +82 -0
  138. data/lib/htm/long_term_memory.rb +965 -0
  139. data/lib/htm/models/node.rb +109 -0
  140. data/lib/htm/models/node_tag.rb +33 -0
  141. data/lib/htm/models/robot.rb +52 -0
  142. data/lib/htm/models/tag.rb +76 -0
  143. data/lib/htm/railtie.rb +76 -0
  144. data/lib/htm/sinatra.rb +157 -0
  145. data/lib/htm/tag_service.rb +135 -0
  146. data/lib/htm/tasks.rb +38 -0
  147. data/lib/htm/version.rb +5 -0
  148. data/lib/htm/working_memory.rb +182 -0
  149. data/lib/htm.rb +400 -0
  150. data/lib/tasks/db.rake +19 -0
  151. data/lib/tasks/htm.rake +147 -0
  152. data/lib/tasks/jobs.rake +312 -0
  153. data/mkdocs.yml +190 -0
  154. data/scripts/install_local_database.sh +309 -0
  155. metadata +341 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 0a25e07ce28cc74ddb8b2fca0a9b316108b97362c2281f2e77b7bec5124da0f7
+   data.tar.gz: 9978ce3bd0c1b3c589436a5c8637bf8043bac3b0dfb542657851fa7cb96e1b49
+ SHA512:
+   metadata.gz: bbd241d5bda941b6d3df5b35bb13e8dbf630af1be02debf814b4396fba5e76e67b6a22756690d3f510d4a0ad441710d61b3d8d20018db8ba1347220818ecefcf
+   data.tar.gz: 57e49d9629c934dd875e3d65a9c4b476bd96bc04dbc168455d3652c8a8b0aededaa2cd0163e377feadf28507550f6adbd0d13b5a9870b9c54b95437e7fd4e8bb
data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md ADDED
@@ -0,0 +1,227 @@
+ # ADR-001: Use PostgreSQL with TimescaleDB for Storage
+
+ **Status**: ~~Accepted~~ **SUPERSEDED** (2025-10-28)
+
+ **Date**: 2025-10-25
+
+ **Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)
+
+ ---
+
+ ## ⚠️ DECISION UPDATE (2025-10-28)
+
+ **TimescaleDB extension has been removed from HTM.**
+
+ **Reason**: After initial struggles with database configuration and practical usage, the decision was made to drop the TimescaleDB extension as it was not providing sufficient value for the current proof-of-concept applications.
+
+ **Key findings**:
+ - No hypertables were actually created in the implementation (the `setup_hypertables` method was essentially a no-op)
+ - Time-range queries use standard PostgreSQL B-tree indexes on timestamp columns, not TimescaleDB-specific optimizations
+ - No compression policies were configured or used
+ - The extension added deployment complexity without delivering measurable benefits
+
+ **Current Implementation**: HTM now uses **standard PostgreSQL 17** with only the following extensions:
+ - `vector` (pgvector) - for embedding similarity search with up to 2000 dimensions
+ - `pg_trgm` - for fuzzy text matching
+
+ **Migration to ActiveRecord** (2025-10-29): The database layer has been migrated to use ActiveRecord ORM with proper models and migrations. See ADR-013 for details.
+
+ No functionality was lost in the removal, as TimescaleDB features were never actually utilized despite being documented.
+
+ ---
+
+ ## Context
+
+ HTM requires a persistent storage solution that can handle:
+
+ - Time-series data with efficient time-range queries
+ - Vector embeddings for semantic search
+ - Full-text search capabilities
+ - ACID compliance for data integrity
+ - Scalability for growing memory databases
+ - Production-grade reliability
+
+ Alternative options considered:
+
+ 1. **Pure PostgreSQL**: Solid relational database, pgvector support
+ 2. **TimescaleDB**: PostgreSQL extension optimized for time-series
+ 3. **Elasticsearch**: Strong full-text search, vector support added
+ 4. **Pinecone/Weaviate**: Specialized vector databases
+ 5. **SQLite + extensions**: Simple, embedded option
+
+ ## Decision
+
+ We will use **PostgreSQL with TimescaleDB** as the primary storage backend for HTM.
+
+ ## Rationale
+
+ ### Why PostgreSQL?
+
+ - **Production-proven**: Decades of reliability in demanding environments
+ - **ACID compliance**: Guarantees data integrity for memory operations
+ - **Rich ecosystem**: Extensive tooling, monitoring, and support
+ - **pgvector extension**: Native vector similarity search with HNSW indexing
+ - **Full-text search**: Built-in tsvector with GIN indexing
+ - **pg_trgm extension**: Trigram-based fuzzy matching
+ - **Strong typing**: Schema enforcement prevents data corruption
+ - **Wide adoption**: Well-understood by developers
+
+ ### Why TimescaleDB?
+
+ - **Hypertable partitioning**: Automatic chunk-based time partitioning
+ - **Compression policies**: Automatic compression of old data (70-90% reduction)
+ - **Time-range optimization**: Fast queries on temporal data
+ - **PostgreSQL compatibility**: Drop-in extension, not a fork
+ - **Continuous aggregates**: Pre-computed summaries for analytics
+ - **Retention policies**: Automatic data lifecycle management
+ - **Cloud offering**: Managed service available (TimescaleDB Cloud)
+
+ ### Why Not Alternatives?
+
+ **Elasticsearch**:
+
+ - ❌ Operational complexity (JVM tuning, cluster management)
+ - ❌ Higher resource usage
+ - ❌ Vector support more recent, less mature
+ - ✅ Superior full-text search (not critical for our use case)
+
+ **Specialized Vector DBs** (Pinecone, Weaviate, Qdrant):
+
+ - ❌ Additional service dependency
+ - ❌ Limited relational capabilities
+ - ❌ Vendor lock-in concerns
+ - ❌ Cost considerations for managed services
+ - ✅ Excellent vector search performance
+ - ✅ Purpose-built for embeddings
+
+ **SQLite**:
+
+ - ❌ Limited concurrency (write locks)
+ - ❌ No native vector search (extensions experimental)
+ - ❌ Not suitable for multi-robot scenarios
+ - ✅ Simple deployment
+ - ✅ Zero configuration
+
+ ## Implementation Details
+
+ ### Schema Design
+
+ - **nodes table**: TimescaleDB hypertable partitioned by `created_at`
+ - **operations_log**: TimescaleDB hypertable for audit trail
+ - **Vector indexing**: HNSW algorithm for approximate nearest neighbor
+ - **Full-text indexing**: GIN indexes on tsvector columns
+ - **Compression**: Automatic after 30 days, segmented by robot_id and type
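The Schema Design list above names HNSW and GIN indexes without showing how they are declared. As a rough illustration only (the gem's actual migrations under `data/db/migrate/` may differ, and the table and column names `nodes`, `embedding`, and `search_vector` are assumptions), such indexes could be created from an ActiveRecord migration like this:

```ruby
# Hypothetical migration sketch -- names are assumed, not copied from the gem.
class AddNodeIndexes < ActiveRecord::Migration[7.1]
  def up
    # Approximate nearest-neighbor search over pgvector embeddings (cosine distance)
    execute <<~SQL
      CREATE INDEX idx_nodes_embedding
        ON nodes USING hnsw (embedding vector_cosine_ops);
    SQL

    # Full-text search over a precomputed tsvector column
    execute <<~SQL
      CREATE INDEX idx_nodes_search_vector
        ON nodes USING gin (search_vector);
    SQL
  end

  def down
    execute "DROP INDEX IF EXISTS idx_nodes_embedding;"
    execute "DROP INDEX IF EXISTS idx_nodes_search_vector;"
  end
end
```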
+
+ ### Connection Configuration
+ ```ruby
+ # Via environment variable (preferred)
+ ENV['HTM_DBURL'] = "postgresql://user:pass@host:port/dbname?sslmode=require"
+
+ # Parsed into connection hash
+ {
+   host: 'host',
+   port: 5432,
+   dbname: 'tsdb',
+   user: 'tsdbadmin',
+   password: 'secret',
+   sslmode: 'require'
+ }
+ ```
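The block above shows the `HTM_DBURL` environment variable and the resulting connection hash, but not the parsing step in between. A minimal sketch of one way to derive the hash using Ruby's standard `URI` and `CGI` libraries; the helper name `parse_htm_dburl` is hypothetical and not part of the gem:

```ruby
require 'uri'
require 'cgi'

# Hypothetical helper -- only the HTM_DBURL variable name comes from the ADR above.
def parse_htm_dburl(url = ENV.fetch('HTM_DBURL'))
  uri    = URI.parse(url)
  params = CGI.parse(uri.query.to_s)

  {
    host:     uri.host,
    port:     uri.port || 5432,
    dbname:   uri.path.to_s.delete_prefix('/'),
    user:     uri.user,
    password: uri.password,
    sslmode:  params.fetch('sslmode', ['prefer']).first
  }
end

# parse_htm_dburl("postgresql://tsdbadmin:secret@host:5432/tsdb?sslmode=require")
# # => { host: "host", port: 5432, dbname: "tsdb", user: "tsdbadmin",
# #      password: "secret", sslmode: "require" }
```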
+
+ ### Key Features Enabled
+
+ - Vector similarity search with `<=>` operator (cosine distance)
+ - Full-text search with `to_tsvector()` and `@@` operator
+ - Trigram fuzzy matching with `pg_trgm`
+ - Time-range queries optimized by chunk exclusion
+ - Automatic compression of old chunks
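For reference, a single query exercising the operators listed above (`<=>`, `to_tsvector()`/`@@`, and `pg_trgm`'s `similarity`). The `nodes`, `content`, and `embedding` names are assumptions for illustration, not the gem's actual schema:

```ruby
# Illustrative hybrid-search SQL, held as a Ruby heredoc constant.
# $1 is a pgvector query embedding, $2 is the text query.
HYBRID_SEARCH_SQL = <<~SQL
  SELECT id, content,
         embedding <=> $1                         AS cosine_distance,    -- pgvector
         ts_rank(to_tsvector('english', content),
                 plainto_tsquery('english', $2))  AS text_rank,          -- full-text
         similarity(content, $2)                  AS trigram_similarity  -- pg_trgm
  FROM nodes
  WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
     OR (embedding <=> $1) < 0.5
  ORDER BY cosine_distance
  LIMIT 20
SQL
```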
+
+ ## Consequences
+
+ ### Positive
+
+ ✅ **Production-ready**: Battle-tested database with proven reliability
+ ✅ **Multi-modal search**: Vector, full-text, and hybrid strategies
+ ✅ **Time-series optimization**: Efficient temporal queries
+ ✅ **Cost-effective storage**: Compression reduces cloud storage costs
+ ✅ **Familiar tooling**: Standard PostgreSQL tools and practices apply
+ ✅ **Flexible querying**: Full SQL power for complex operations
+ ✅ **ACID guarantees**: Data integrity for critical memory operations
+
+ ### Negative
+
+ ❌ **Operational complexity**: Requires database management (mitigated by managed service)
+ ❌ **Scaling limits**: Vertical scaling ceiling (mitigated by partitioning)
+ ❌ **Connection overhead**: PostgreSQL connections are relatively heavy
+ ❌ **Vector search performance**: Slower than specialized vector DBs at massive scale
+
+ ### Neutral
+
+ ➡️ **Learning curve**: Developers need PostgreSQL + TimescaleDB knowledge
+ ➡️ **Cloud dependency**: Currently using TimescaleDB Cloud (could self-host)
+ ➡️ **Extension management**: Requires extensions (timescaledb, pgvector, pg_trgm)
+
+ ## Risks and Mitigations
+
+ ### Risk: Extension Availability
+
+ - **Risk**: Extensions not available in all PostgreSQL environments
+ - **Likelihood**: Low (extensions widely available)
+ - **Impact**: High (breaks core functionality)
+ - **Mitigation**: Document requirements clearly, verify in setup process
+
+ ### Risk: Connection Exhaustion
+
+ - **Risk**: PostgreSQL connections limited (default ~100)
+ - **Likelihood**: Medium (with many robots)
+ - **Impact**: Medium (service degradation)
+ - **Mitigation**: Implement connection pooling (ConnectionPool gem)
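A minimal sketch of the pooling mitigation mentioned above, using the `connection_pool` and `pg` gems; the pool size and timeout are illustrative values, not HTM defaults:

```ruby
require 'connection_pool'
require 'pg'

# Bounded pool of PostgreSQL connections shared across threads.
DB_POOL = ConnectionPool.new(size: 10, timeout: 5) do
  PG.connect(ENV.fetch('HTM_DBURL'))
end

# Check a connection out, run a query, and return it to the pool.
DB_POOL.with do |conn|
  conn.exec('SELECT count(*) FROM nodes') # table name assumed for illustration
end
```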
+
+ ### Risk: Storage Costs
+
+ - **Risk**: Vector data storage can be expensive at scale
+ - **Likelihood**: Medium (depends on usage)
+ - **Impact**: Medium (operational cost)
+ - **Mitigation**: Compression policies, retention policies, archival strategies
+
+ ### Risk: Query Performance at Scale
+
+ - **Risk**: Complex hybrid searches may slow down with millions of nodes
+ - **Likelihood**: Low (with proper indexing)
+ - **Impact**: Medium (user experience)
+ - **Mitigation**: Query optimization, read replicas, caching layer
+
+ ## Alternatives Considered
+
+ | Solution | Pros | Cons | Decision |
+ |----------|------|------|----------|
+ | Pure PostgreSQL | Simple, reliable, pgvector | No time-series optimization | ❌ Rejected |
+ | PostgreSQL + TimescaleDB | Best of both worlds | Slight complexity increase | ✅ **Accepted** |
+ | Elasticsearch | Excellent full-text search | High resource usage, complexity | ❌ Rejected |
+ | Pinecone | Purpose-built vectors | Vendor lock-in, cost, limited relational | ❌ Rejected |
+ | SQLite | Simple, embedded | Limited concurrency, no vectors | ❌ Rejected |
+
+ ## Future Considerations
+
+ - **Read replicas**: For query scaling when needed
+ - **Partitioning strategies**: By robot_id for tenant isolation
+ - **Caching layer**: Redis for hot nodes
+ - **Archive tier**: S3/Glacier for very old memories
+ - **Multi-region**: For global deployment
+
+ ## References
+
+ - [TimescaleDB Documentation](https://docs.timescale.com/)
+ - [pgvector Documentation](https://github.com/pgvector/pgvector)
+ - [PostgreSQL Full-Text Search](https://www.postgresql.org/docs/current/textsearch.html)
+ - [HTM Planning Document](../../htm_teamwork.md)
+
+ ## Review Notes
+
+ **Systems Architect**: ✅ Solid choice for time-series + vector workload. Consider read replicas for scaling.
+
+ **Database Architect**: ✅ Excellent indexing strategy. Monitor query performance as data grows.
+
+ **Performance Specialist**: ✅ TimescaleDB compression will help with costs. Add connection pooling soon.
+
+ **Maintainability Expert**: ✅ PostgreSQL tooling is mature and well-documented. Good choice for long-term maintenance.
data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md ADDED
@@ -0,0 +1,322 @@
+ # ADR-002: Two-Tier Memory Architecture
+
+ **Status**: Accepted
+
+ **Date**: 2025-10-25
+
+ **Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)
+
+ ---
+
+ ## ⚠️ UPDATE (2025-10-28)
+
+ **References to TimescaleDB in this ADR are now historical.**
+
+ After initial struggles with database configuration, the decision was made to drop the TimescaleDB extension as it was not providing sufficient value for the current proof-of-concept applications. The two-tier architecture remains unchanged, but long-term memory now uses **standard PostgreSQL** instead of PostgreSQL + TimescaleDB.
+
+ See [ADR-001](001-use-postgresql-timescaledb-storage.md) for details on the TimescaleDB removal.
+
+ ---
+
+ ## Context
+
+ LLM-based applications ("robots") face a fundamental challenge: LLMs have limited context windows (typically 128K-200K tokens) but need to maintain awareness across long conversations and sessions spanning days, weeks, or months.
+
+ Requirements:
+
+ - Persist memories across sessions (durable storage)
+ - Provide fast access to recent/relevant context
+ - Manage token budgets efficiently
+ - Never lose data accidentally
+ - Support contextual recall from the past
+
+ Alternative approaches:
+
+ 1. **Database-only**: Store everything in PostgreSQL, load on demand
+ 2. **Memory-only**: Keep everything in RAM, serialize on shutdown
+ 3. **Two-tier**: Combine fast working memory with durable long-term storage
+ 4. **External service**: Use a managed memory service
+
+ ## Decision
+
+ We will implement a **two-tier memory architecture** with:
+
+ - **Working Memory**: Token-limited, in-memory active context
+ - **Long-term Memory**: Durable PostgreSQL/TimescaleDB storage
+
+ ## Rationale
+
+ ### Working Memory (Hot Tier)
+
+ - **Purpose**: Immediate context for LLM
+ - **Storage**: In-memory Ruby data structures
+ - **Capacity**: Token-limited (default 128K tokens)
+ - **Eviction**: LRU-based eviction when full
+ - **Access pattern**: Frequent reads, moderate writes
+ - **Lifetime**: Process lifetime
+
+ ### Long-term Memory (Cold Tier)
+
+ - **Purpose**: Permanent knowledge base
+ - **Storage**: PostgreSQL with TimescaleDB
+ - **Capacity**: Effectively unlimited
+ - **Retention**: Permanent (explicit deletion only)
+ - **Access pattern**: RAG-based retrieval
+ - **Lifetime**: Forever
+
+ ### Data Flow
+ ```
+ Add Memory:
+   User Input → Working Memory → Long-term Memory
+                (immediate)      (persisted)
+
+ Recall Memory:
+   Query → Long-term Memory (RAG search) → Working Memory
+           (semantic + temporal)            (evict if needed)
+
+ Eviction:
+   Working Memory (full) → Evict LRU → Long-term Memory (already there)
+                           (mark as evicted, not deleted)
+ ```
+
+ ## Implementation Details
+
+ ### Working Memory
+ ```ruby
+ class WorkingMemory
+   attr_reader :max_tokens, :token_count
+
+   def initialize(max_tokens: 128_000)
+     @nodes = {} # key => {value, token_count, importance, timestamp}
+     @max_tokens = max_tokens
+     @token_count = 0
+   end
+
+   def add(key, value, token_count:, importance: 1.0)
+     evict_to_make_space(token_count) if needs_eviction?(token_count)
+     @nodes[key] = {value: value, token_count: token_count, ...}
+     @token_count += token_count
+   end
+
+   def evict_to_make_space(needed_tokens)
+     # LRU eviction based on last access + importance
+   end
+
+   def assemble_context(strategy: :balanced, max_tokens: nil)
+     # Sort by strategy and assemble within budget
+   end
+ end
+ ```
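The `needs_eviction?` and `evict_to_make_space` methods above are only stubbed with comments. One way they could be filled in, consistent with the LRU-plus-importance scoring described under "Eviction Strategies" later in this ADR; the `:last_accessed` field and the return value are assumptions, not the gem's implementation:

```ruby
# Sketch only: reopens the WorkingMemory class from the ADR snippet above.
class WorkingMemory
  def needs_eviction?(incoming_tokens)
    @token_count + incoming_tokens > @max_tokens
  end

  def evict_to_make_space(needed_tokens)
    # Lowest score first: least important and least recently used go first.
    candidates = @nodes.sort_by do |_key, node|
      node[:importance] / ((Time.now - node[:last_accessed]) + 1.0)
    end

    evicted_keys = []
    candidates.each do |key, node|
      break if @token_count + needed_tokens <= @max_tokens
      @token_count -= node[:token_count]
      @nodes.delete(key)
      evicted_keys << key
    end
    evicted_keys # caller can mark these as evicted in long-term memory
  end
end
```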
+
+ ### Long-term Memory
+ ```ruby
+ class LongTermMemory
+   def add(key:, value:, embedding:, ...)
+     # Insert into PostgreSQL with vector embedding
+   end
+
+   def search(timeframe:, query:, embedding_service:, limit:)
+     # RAG-based retrieval: temporal + semantic
+   end
+
+   def mark_evicted(keys)
+     # Update in_working_memory flag (not deleted)
+   end
+ end
+ ```
+
+ ### Coordination (HTM class)
+ ```ruby
+ class HTM
+   def add_node(key, value, ...)
+     # 1. Generate embedding
+     # 2. Store in long-term memory
+     # 3. Add to working memory (evict if needed)
+   end
+
+   def recall(timeframe:, topic:, ...)
+     # 1. Search long-term memory (RAG)
+     # 2. Add results to working memory (evict if needed)
+     # 3. Return nodes
+   end
+ end
+ ```
+
+ ## Consequences
+
+ ### Positive
+
+ ✅ **Fast context access**: Working memory provides O(1) lookups
+ ✅ **Durable storage**: Never lose data, survives restarts
+ ✅ **Token budget control**: Automatic management of context size
+ ✅ **Explicit eviction policy**: Transparent behavior
+ ✅ **RAG-enabled**: Search historical context semantically
+ ✅ **Never-delete philosophy**: Eviction moves data, never removes
+ ✅ **Process-isolated**: Each robot instance has independent working memory
+
+ ### Negative
+
+ ❌ **Complexity**: Two storage layers to coordinate
+ ❌ **Memory overhead**: Working memory consumes RAM
+ ❌ **Synchronization**: Keep both tiers consistent
+ ❌ **Eviction overhead**: Moving data between tiers
+
+ ### Neutral
+
+ ➡️ **Token counting**: Requires accurate token estimation
+ ➡️ **Strategy tuning**: Eviction and assembly strategies need calibration
+ ➡️ **Per-process state**: Working memory not shared across processes
+
+ ## Eviction Strategies
+
+ ### LRU-based (Implemented)
+ ```ruby
+ def eviction_score(node)
+   recency = Time.now - node[:last_accessed]
+   importance = node[:importance]
+
+   # Lower score = evict first
+   importance / (recency + 1.0)
+ end
+ ```
+
+ ### Future Strategies
+ - **Importance-only**: Keep most important nodes
+ - **Recency-only**: Pure LRU cache
+ - **Frequency-based**: Track access counts
+ - **Category-based**: Pin certain types (facts, preferences)
+ - **Smart eviction**: ML-based prediction of future access
+
+ ## Context Assembly Strategies
+
+ ### Recent (`:recent`)
+ Sort by `created_at DESC`, newest first
+
+ ### Important (`:important`)
+ Sort by `importance DESC`, most important first
+
+ ### Balanced (`:balanced`)
+ ```ruby
+ score = importance * (1.0 / age_in_days)
+ ```
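A sketch of how the balanced score above could drive context assembly within a token budget; the node hash layout follows the `WorkingMemory` sketch earlier in this ADR, and the method name and details are illustrative rather than the gem's implementation:

```ruby
# Assemble a context from working-memory nodes using the balanced score,
# stopping once the token budget is exhausted. Assumes node[:timestamp] is a Time.
def assemble_balanced(nodes, max_tokens:)
  scored = nodes.sort_by do |_key, node|
    age_in_days = [(Time.now - node[:timestamp]) / 86_400.0, 0.01].max
    -(node[:importance] * (1.0 / age_in_days)) # negate so highest score comes first
  end

  context = []
  used = 0
  scored.each do |_key, node|
    next if used + node[:token_count] > max_tokens
    context << node[:value]
    used += node[:token_count]
  end
  context
end
```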
+
+ ### Future Strategies
+ - **Semantic clustering**: Group related memories
+ - **Conversation threading**: Follow reply chains
+ - **Category grouping**: Facts first, then context, etc.
+ - **Hybrid scoring**: Multiple factors weighted
+
+ ## Design Principles
+
+ ### Never Forget (Unless Told)
+
+ - Eviction moves data, never deletes
+ - Only `forget(confirm: :confirmed)` deletes
+ - Long-term memory is append-only (updates rare)
+
+ ### Token Budget Management
+
+ - Token counting happens at add time
+ - Working memory enforces hard token limit
+ - Context assembly respects token budget
+ - Safety margin (10%) for token estimation errors
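A small worked example of the budget rules above, assuming the default 128K-token working memory and the 10% safety margin; the helper name is hypothetical:

```ruby
MAX_TOKENS    = 128_000   # default working-memory capacity
SAFETY_MARGIN = 0.10      # headroom for token-count estimation error

effective_budget = (MAX_TOKENS * (1 - SAFETY_MARGIN)).to_i
# => 115_200 tokens actually offered to the LLM context

# A node is only admitted if it fits within the remaining effective budget.
def fits_budget?(current_tokens, incoming_tokens, budget)
  current_tokens + incoming_tokens <= budget
end

fits_budget?(110_000, 4_000, effective_budget)  # => true
fits_budget?(114_000, 4_000, effective_budget)  # => false
```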
+
+ ### Transparent Behavior
+
+ - Log all evictions
+ - Track in_working_memory flag
+ - Operations log for audit trail
+
+ ## Risks and Mitigations
+
+ ### Risk: Token Count Inaccuracy
+
+ - **Risk**: Tiktoken approximation differs from LLM's actual count
+ - **Likelihood**: Medium (different tokenizers)
+ - **Impact**: Medium (context overflow)
+ - **Mitigation**: Add safety margin (10%), use LLM-specific counters
+
+ ### Risk: Eviction Thrashing
+
+ - **Risk**: Constant eviction/recall cycles
+ - **Likelihood**: Low (with proper sizing)
+ - **Impact**: Medium (performance degradation)
+ - **Mitigation**: Larger working memory, smarter eviction, caching
+
+ ### Risk: Working Memory Growth
+
+ - **Risk**: Memory leaks or unbounded growth
+ - **Likelihood**: Low (token budget enforced)
+ - **Impact**: High (OOM crashes)
+ - **Mitigation**: Hard limits, monitoring, alerts
+
+ ### Risk: Stale Working Memory
+
+ - **Risk**: Working memory doesn't reflect long-term updates
+ - **Likelihood**: Low (single-writer pattern)
+ - **Impact**: Low (eventual consistency OK)
+ - **Mitigation**: Refresh on recall, invalidation on update
+
+ ## Alternatives Considered
+
+ ### Database-Only
+ **Pros**: Simple, no synchronization
+ **Cons**: Slow access, no token budget management
+ **Decision**: ❌ Rejected - too slow for every LLM call
+
+ ### Memory-Only
+ **Pros**: Fast, simple
+ **Cons**: Not durable, lost on crash
+ **Decision**: ❌ Rejected - unacceptable data loss risk
+
+ ### External Service (Redis, Memcached)
+ **Pros**: Shared across processes, mature caching
+ **Cons**: Additional dependency, serialization overhead
+ **Decision**: ⏸️ Deferred - consider for multi-process scenarios
+
+ ### Three-Tier (L1/L2/L3)
+ **Pros**: More granular caching
+ **Cons**: Much higher complexity
+ **Decision**: ❌ Rejected - YAGNI for v1
+
+ ## Performance Characteristics
+
+ ### Working Memory
+
+ - **Add**: O(1) amortized (eviction is O(n) when needed)
+ - **Retrieve**: O(1) hash lookup
+ - **Eviction**: O(n log n) for sorting, O(k) for removing k nodes
+ - **Context assembly**: O(n log n) for sorting, O(k) for selecting
+
+ ### Long-term Memory
+
+ - **Add**: O(log n) PostgreSQL insert with indexes
+ - **Vector search**: O(log n) with HNSW index (approximate)
+ - **Full-text search**: O(log n) with GIN index
+ - **Hybrid search**: O(log n) for both, then merge
+
+ ## Future Enhancements
+
+ 1. **Shared working memory**: Redis-backed for multi-process
+ 2. **Lazy loading**: Load nodes on first access
+ 3. **Pre-fetching**: Anticipate needed context
+ 4. **Compression**: Compress old working memory nodes
+ 5. **Tiered eviction**: Multiple working memory levels
+ 6. **Smart assembly**: ML-driven context selection
+
+ ## References
+
+ - [Working Memory (Psychology)](https://en.wikipedia.org/wiki/Working_memory)
+ - [Cache Eviction Policies](https://en.wikipedia.org/wiki/Cache_replacement_policies)
+ - [LLM Context Window Management](https://www.anthropic.com/research/context-windows)
+ - [HTM Planning Document](../../htm_teamwork.md)
+
+ ## Review Notes
+
+ **Systems Architect**: ✅ Clean separation of concerns. Consider shared cache for horizontal scaling.
+
+ **Performance Specialist**: ✅ Good balance of speed and durability. Monitor eviction frequency.
+
+ **AI Engineer**: ✅ Token budget management is critical. Add safety margins for token count variance.
+
+ **Ruby Expert**: ✅ Consider using Concurrent::Map for thread-safe working memory in future.