htm 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (155)
  1. checksums.yaml +7 -0
  2. data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
  3. data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
  4. data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
  5. data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
  6. data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
  7. data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
  8. data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
  9. data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
  10. data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
  11. data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
  12. data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
  13. data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
  14. data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
  15. data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
  16. data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
  17. data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
  18. data/.architecture/members.yml +144 -0
  19. data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
  20. data/.architecture/reviews/initial-system-analysis.md +330 -0
  21. data/.envrc +32 -0
  22. data/.irbrc +145 -0
  23. data/CHANGELOG.md +150 -0
  24. data/COMMITS.md +196 -0
  25. data/LICENSE +21 -0
  26. data/README.md +1347 -0
  27. data/Rakefile +51 -0
  28. data/SETUP.md +268 -0
  29. data/config/database.yml +67 -0
  30. data/db/migrate/20250101000001_enable_extensions.rb +14 -0
  31. data/db/migrate/20250101000002_create_robots.rb +14 -0
  32. data/db/migrate/20250101000003_create_nodes.rb +42 -0
  33. data/db/migrate/20250101000005_create_tags.rb +38 -0
  34. data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
  35. data/db/schema.sql +473 -0
  36. data/db/seed_data/README.md +100 -0
  37. data/db/seed_data/presidents.md +136 -0
  38. data/db/seed_data/states.md +151 -0
  39. data/db/seeds.rb +208 -0
  40. data/dbdoc/README.md +173 -0
  41. data/dbdoc/public.node_stats.md +48 -0
  42. data/dbdoc/public.node_stats.svg +41 -0
  43. data/dbdoc/public.node_tags.md +40 -0
  44. data/dbdoc/public.node_tags.svg +112 -0
  45. data/dbdoc/public.nodes.md +54 -0
  46. data/dbdoc/public.nodes.svg +118 -0
  47. data/dbdoc/public.nodes_tags.md +39 -0
  48. data/dbdoc/public.nodes_tags.svg +112 -0
  49. data/dbdoc/public.ontology_structure.md +48 -0
  50. data/dbdoc/public.ontology_structure.svg +38 -0
  51. data/dbdoc/public.operations_log.md +42 -0
  52. data/dbdoc/public.operations_log.svg +130 -0
  53. data/dbdoc/public.relationships.md +39 -0
  54. data/dbdoc/public.relationships.svg +41 -0
  55. data/dbdoc/public.robot_activity.md +46 -0
  56. data/dbdoc/public.robot_activity.svg +35 -0
  57. data/dbdoc/public.robots.md +35 -0
  58. data/dbdoc/public.robots.svg +90 -0
  59. data/dbdoc/public.schema_migrations.md +29 -0
  60. data/dbdoc/public.schema_migrations.svg +26 -0
  61. data/dbdoc/public.tags.md +35 -0
  62. data/dbdoc/public.tags.svg +60 -0
  63. data/dbdoc/public.topic_relationships.md +45 -0
  64. data/dbdoc/public.topic_relationships.svg +32 -0
  65. data/dbdoc/schema.json +1437 -0
  66. data/dbdoc/schema.svg +154 -0
  67. data/docs/api/database.md +806 -0
  68. data/docs/api/embedding-service.md +532 -0
  69. data/docs/api/htm.md +797 -0
  70. data/docs/api/index.md +259 -0
  71. data/docs/api/long-term-memory.md +1096 -0
  72. data/docs/api/working-memory.md +665 -0
  73. data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
  74. data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
  75. data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
  76. data/docs/architecture/adrs/004-hive-mind.md +437 -0
  77. data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
  78. data/docs/architecture/adrs/006-context-assembly.md +496 -0
  79. data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
  80. data/docs/architecture/adrs/008-robot-identification.md +625 -0
  81. data/docs/architecture/adrs/009-never-forget.md +648 -0
  82. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
  83. data/docs/architecture/adrs/011-pgai-integration.md +494 -0
  84. data/docs/architecture/adrs/index.md +215 -0
  85. data/docs/architecture/hive-mind.md +736 -0
  86. data/docs/architecture/index.md +351 -0
  87. data/docs/architecture/overview.md +538 -0
  88. data/docs/architecture/two-tier-memory.md +873 -0
  89. data/docs/assets/css/custom.css +83 -0
  90. data/docs/assets/images/htm-core-components.svg +63 -0
  91. data/docs/assets/images/htm-database-schema.svg +93 -0
  92. data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
  93. data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
  94. data/docs/assets/images/htm-layered-architecture.svg +71 -0
  95. data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
  96. data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
  97. data/docs/assets/images/htm.jpg +0 -0
  98. data/docs/assets/images/htm_demo.gif +0 -0
  99. data/docs/assets/js/mathjax.js +18 -0
  100. data/docs/assets/videos/htm_video.mp4 +0 -0
  101. data/docs/database_rake_tasks.md +322 -0
  102. data/docs/development/contributing.md +787 -0
  103. data/docs/development/index.md +336 -0
  104. data/docs/development/schema.md +596 -0
  105. data/docs/development/setup.md +719 -0
  106. data/docs/development/testing.md +819 -0
  107. data/docs/guides/adding-memories.md +824 -0
  108. data/docs/guides/context-assembly.md +1009 -0
  109. data/docs/guides/getting-started.md +577 -0
  110. data/docs/guides/index.md +118 -0
  111. data/docs/guides/long-term-memory.md +941 -0
  112. data/docs/guides/multi-robot.md +866 -0
  113. data/docs/guides/recalling-memories.md +927 -0
  114. data/docs/guides/search-strategies.md +953 -0
  115. data/docs/guides/working-memory.md +717 -0
  116. data/docs/index.md +214 -0
  117. data/docs/installation.md +477 -0
  118. data/docs/multi_framework_support.md +519 -0
  119. data/docs/quick-start.md +655 -0
  120. data/docs/setup_local_database.md +302 -0
  121. data/docs/using_rake_tasks_in_your_app.md +383 -0
  122. data/examples/basic_usage.rb +93 -0
  123. data/examples/cli_app/README.md +317 -0
  124. data/examples/cli_app/htm_cli.rb +270 -0
  125. data/examples/custom_llm_configuration.rb +183 -0
  126. data/examples/example_app/Rakefile +71 -0
  127. data/examples/example_app/app.rb +206 -0
  128. data/examples/sinatra_app/Gemfile +21 -0
  129. data/examples/sinatra_app/app.rb +335 -0
  130. data/lib/htm/active_record_config.rb +113 -0
  131. data/lib/htm/configuration.rb +342 -0
  132. data/lib/htm/database.rb +594 -0
  133. data/lib/htm/embedding_service.rb +115 -0
  134. data/lib/htm/errors.rb +34 -0
  135. data/lib/htm/job_adapter.rb +154 -0
  136. data/lib/htm/jobs/generate_embedding_job.rb +65 -0
  137. data/lib/htm/jobs/generate_tags_job.rb +82 -0
  138. data/lib/htm/long_term_memory.rb +965 -0
  139. data/lib/htm/models/node.rb +109 -0
  140. data/lib/htm/models/node_tag.rb +33 -0
  141. data/lib/htm/models/robot.rb +52 -0
  142. data/lib/htm/models/tag.rb +76 -0
  143. data/lib/htm/railtie.rb +76 -0
  144. data/lib/htm/sinatra.rb +157 -0
  145. data/lib/htm/tag_service.rb +135 -0
  146. data/lib/htm/tasks.rb +38 -0
  147. data/lib/htm/version.rb +5 -0
  148. data/lib/htm/working_memory.rb +182 -0
  149. data/lib/htm.rb +400 -0
  150. data/lib/tasks/db.rake +19 -0
  151. data/lib/tasks/htm.rake +147 -0
  152. data/lib/tasks/jobs.rake +312 -0
  153. data/mkdocs.yml +190 -0
  154. data/scripts/install_local_database.sh +309 -0
  155. metadata +341 -0
@@ -0,0 +1,299 @@
1
+ # ADR-013: ActiveRecord ORM and Many-to-Many Tagging System
2
+
3
+ **Status**: Accepted
4
+
5
+ **Date**: 2025-10-29
6
+
7
+ **Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)
8
+
9
+ ---
10
+
11
+ ## Context
12
+
13
+ HTM's database layer initially issued direct SQL queries through the PG gem for all database operations. As the system evolved, several pain points emerged:
14
+
15
+ - **Code duplication**: Similar SQL queries repeated across methods
16
+ - **No schema versioning**: Schema changes were manual and error-prone
17
+ - **Limited validation**: No model-level validations or constraints
18
+ - **Complex queries**: Hand-written SQL for relationships and joins
19
+ - **Testing difficulty**: Hard to mock database interactions
20
+ - **Migration management**: No systematic way to evolve schema
21
+
22
+ Additionally, the initial tagging system stored tags as a simple column in the nodes table, limiting flexibility:
23
+
24
+ - **Single tag per node**: Could not express multiple categories
25
+ - **No tag reuse**: Each node duplicated tag text
26
+ - **Difficult queries**: Finding nodes by tag required text matching
27
+ - **No tag hierarchy**: Could not organize tags into hierarchies
28
+
29
+ ## Decision
30
+
31
+ We will:
32
+
33
+ 1. **Adopt ActiveRecord ORM** for database interactions with PostgreSQL
34
+ 2. **Implement proper ActiveRecord models**: Robot, Node, Tag, NodeTag
35
+ 3. **Use ActiveRecord migrations** for schema version control
36
+ 4. **Create many-to-many tagging** via a join table (`nodes_tags`)
37
+ 5. **Support hierarchical tags** using colon-separated namespaces
38
+
39
+ ## Rationale
40
+
41
+ ### Why ActiveRecord?
42
+
43
+ **Production-proven ORM**:
44
+ - Mature, battle-tested Rails component
45
+ - Extensive documentation and community support
46
+ - Works standalone without Rails framework
47
+ - Handles connection pooling automatically
48
+
49
+ **Schema Management**:
50
+ - Migration system provides version control for database changes
51
+ - Rollback capability for schema changes
52
+ - `schema.sql` dump for canonical schema representation
53
+ - Easy team collaboration on schema evolution
54
+
55
+ **Model Layer Benefits**:
56
+ - Associations handle complex joins automatically
57
+ - Validations at model level prevent bad data
58
+ - Callbacks for lifecycle hooks
59
+ - Scopes for reusable query patterns
60
+ - Testing helpers for mocking
61
+
62
+ **Query Building**:
63
+ - Chainable query interface (`.where().order().limit()`)
64
+ - Parameterizes hash and placeholder conditions, which guards against SQL injection (interpolated SQL strings remain unsafe)
65
+ - Generates optimized SQL
66
+ - Database-agnostic query interface (could switch from PostgreSQL if needed, although the `vector` column and `schema.sql` dump are PostgreSQL-specific)
67
+
68
+ ### Why Many-to-Many Tagging?
69
+
70
+ **Flexibility**:
71
+ - Nodes can have multiple tags
72
+ - Tags can be applied to multiple nodes
73
+ - Tag relationships are explicit and queryable
74
+
75
+ **Data Normalization**:
76
+ - Tags stored once, referenced many times
77
+ - Tag updates affect all associated nodes
78
+ - Referential integrity via foreign keys
79
+
80
+ **Hierarchical Organization**:
81
+ - Colon-separated namespaces: `ai:llm:embeddings`
82
+ - Query by prefix: `WHERE name LIKE 'ai:llm:%'`
83
+ - Enables ontology-like structure
84
+ - LLM can generate contextual tags
85
+
86
+ **Query Power**:
87
+ - Find all tags for a node (simple join)
88
+ - Find all nodes with a given tag (reverse join)
89
+ - Find related tags by shared nodes
90
+ - Combine with vector/full-text search
91
+
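The "find related tags by shared nodes" item above amounts to counting tag co-occurrence across the join table. A minimal pure-Ruby sketch, assuming the `nodes_tags` rows have already been loaded as `[node_id, tag_name]` pairs; the `related_tags` helper and the sample data are hypothetical and not part of HTM:

```ruby
# Hypothetical helper: rank tags that share at least one node with `tag`.
# `pairs` stands in for rows of the nodes_tags join (node id, tag name).
def related_tags(pairs, tag)
  # Nodes carrying the tag of interest.
  node_ids = pairs.select { |_, name| name == tag }.map(&:first).uniq

  # Count every other tag attached to those nodes.
  counts = Hash.new(0)
  pairs.each do |node_id, name|
    next if name == tag || !node_ids.include?(node_id)
    counts[name] += 1
  end

  # Most frequently co-occurring tags first; ties broken by name.
  counts.sort_by { |name, count| [-count, name] }
end

# Illustrative data only.
PAIRS = [
  [1, 'ai:llm:embeddings'], [1, 'database:postgresql:extensions'],
  [2, 'ai:llm:embeddings'], [2, 'ai:rag:retrieval'],
  [3, 'ai:rag:retrieval'],  [3, 'database:postgresql:extensions'],
  [4, 'ai:llm:embeddings'], [4, 'ai:rag:retrieval']
].freeze

RELATED = related_tags(PAIRS, 'ai:llm:embeddings')
```

In SQL the same result would come from a self-join on `nodes_tags` grouped by tag; the in-memory version only illustrates the counting logic.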
92
+ ## Implementation Details
93
+
94
+ ### ActiveRecord Models
95
+
96
+ ```ruby
97
+ # lib/htm/models/robot.rb
98
+ class HTM::Models::Robot < ActiveRecord::Base
99
+ has_many :nodes, dependent: :destroy
100
+ end
101
+
102
+ # lib/htm/models/node.rb
103
+ class HTM::Models::Node < ActiveRecord::Base
104
+ belongs_to :robot
105
+ has_many :node_tags, dependent: :destroy
106
+ has_many :tags, through: :node_tags
107
+ end
108
+
109
+ # lib/htm/models/tag.rb
110
+ class HTM::Models::Tag < ActiveRecord::Base
111
+ has_many :node_tags, dependent: :destroy
112
+ has_many :nodes, through: :node_tags
113
+ validates :name, presence: true, uniqueness: true
114
+ end
115
+
116
+ # lib/htm/models/node_tag.rb
117
+ class HTM::Models::NodeTag < ActiveRecord::Base
118
+ self.table_name = 'nodes_tags'
119
+ belongs_to :node
120
+ belongs_to :tag
121
+ validates :tag_id, uniqueness: { scope: :node_id }
122
+ end
123
+ ```
124
+
125
+ ### Database Schema
126
+
127
+ **robots** table:
128
+ - `id` (bigint, primary key)
129
+ - `name` (text)
130
+ - `created_at`, `last_active` (timestamptz)
131
+ - `metadata` (jsonb)
132
+
133
+ **nodes** table:
134
+ - `id` (bigint, primary key)
135
+ - `content`, `speaker` (text, not null)
136
+ - `type`, `category` (text)
137
+ - `importance` (double precision)
138
+ - `created_at`, `updated_at`, `last_accessed` (timestamptz)
139
+ - `token_count` (integer)
140
+ - `in_working_memory` (boolean)
141
+ - `robot_id` (bigint, foreign key → robots)
142
+ - `embedding` (vector(2000))
143
+ - `embedding_dimension` (integer)
144
+
145
+ **tags** table:
146
+ - `id` (bigint, primary key)
147
+ - `name` (text, unique, not null)
148
+ - `created_at` (timestamptz)
149
+
150
+ **nodes_tags** join table:
151
+ - `id` (bigint, primary key)
152
+ - `node_id` (bigint, foreign key → nodes)
153
+ - `tag_id` (bigint, foreign key → tags)
154
+ - `created_at` (timestamptz)
155
+ - Unique constraint on (node_id, tag_id)
156
+
157
+ ### Migration System
158
+
159
+ Migrations in `db/migrate/`:
160
+ - `20250101000002_create_robots.rb`
161
+ - `20250101000003_create_nodes.rb`
162
+ - `20250101000005_create_tags.rb`
163
+
164
+ Apply: `bundle exec rake htm:db:migrate`
165
+ Dump: `bundle exec rake htm:db:schema:dump`
166
+
167
+ ### Tag Hierarchy Examples
168
+
169
+ ```ruby
170
+ # Programming tags
171
+ 'programming:ruby:gems'
172
+ 'programming:ruby:activerecord'
173
+ 'programming:python:django'
174
+
175
+ # AI tags
176
+ 'ai:llm:embeddings'
177
+ 'ai:llm:prompts'
178
+ 'ai:rag:retrieval'
179
+
180
+ # Database tags
181
+ 'database:postgresql:indexes'
182
+ 'database:postgresql:extensions'
183
+ ```
184
+
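Because the hierarchy lives entirely in the tag name, a tag's ancestors can be derived by splitting on the colon. A minimal sketch, where `tag_ancestors` is a hypothetical helper rather than part of HTM's API:

```ruby
# Hypothetical helper: expand a colon-separated tag into every ancestor
# namespace plus the tag itself, ordered from the root down.
def tag_ancestors(name)
  parts = name.split(':')
  (1..parts.length).map { |depth| parts.first(depth).join(':') }
end

ANCESTORS = tag_ancestors('ai:llm:embeddings')
```

Here `ANCESTORS` holds `'ai'`, `'ai:llm'`, and `'ai:llm:embeddings'`. Whether HTM stores or indexes these prefixes is not stated in this ADR; the sketch only shows that the namespace convention makes them cheap to compute.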
185
+ Query patterns:
186
+ ```sql
187
+ -- All Ruby-related tags
188
+ SELECT * FROM tags WHERE name LIKE 'programming:ruby:%';
189
+
190
+ -- All LLM-related nodes
191
+ SELECT n.* FROM nodes n
192
+ JOIN nodes_tags nt ON n.id = nt.node_id
193
+ JOIN tags t ON nt.tag_id = t.id
194
+ WHERE t.name LIKE 'ai:llm:%';
195
+ ```
196
+
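When the prefix comes from user or LLM input, any `%` or `_` inside it would act as a wildcard in these `LIKE` queries. A minimal sketch of building the pattern with those characters escaped; `like_prefix` is a hypothetical helper, and the backslash escape assumes PostgreSQL's default `LIKE` escape character. Inside ActiveRecord, `sanitize_sql_like` performs the same escaping.

```ruby
# Hypothetical helper: build a LIKE pattern matching every tag beneath
# `namespace`, escaping the LIKE metacharacters (\, %, _) in the input.
def like_prefix(namespace)
  escaped = namespace.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "#{escaped}:%"
end

PLAIN_PATTERN   = like_prefix('ai:llm')      # no metacharacters to escape
ESCAPED_PATTERN = like_prefix('snake_case')  # underscore gets a backslash
```

The resulting string would then be passed as a bind value, for example `Tag.where('name LIKE ?', pattern)`, rather than interpolated into the SQL.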
197
+ ## Consequences
198
+
199
+ ### Positive
200
+
201
+ ✅ **Schema version control**: Migrations provide audit trail of all schema changes
202
+ ✅ **Model validations**: Prevent invalid data at application layer
203
+ ✅ **Association power**: `node.tags` and `tag.nodes` work automatically
204
+ ✅ **Query safety**: ActiveRecord's parameterized conditions guard against SQL injection
205
+ ✅ **Testing improvement**: Models can be easily stubbed/mocked
206
+ ✅ **Code clarity**: `Node.where(type: 'fact')` vs raw SQL
207
+ ✅ **Tag flexibility**: Multiple tags per node, hierarchical organization
208
+ ✅ **Tag reuse**: Same tag on many nodes without duplication
209
+ ✅ **Referential integrity**: Foreign keys enforce consistency
210
+ ✅ **Cascade deletes**: Deleting a node removes its tag associations
211
+
212
+ ### Negative
213
+
214
+ ❌ **Added dependency**: ActiveRecord gem and its dependencies
215
+ ❌ **Learning curve**: Developers need to know ActiveRecord API
216
+ ❌ **Abstraction overhead**: Slight performance cost vs raw SQL
217
+ ❌ **Magic behavior**: Callbacks and hooks can surprise developers
218
+ ❌ **Migration complexity**: Schema changes require migration files
219
+
220
+ ### Neutral
221
+
222
+ ➡️ **File organization**: Models in `lib/htm/models/`, migrations in `db/migrate/`
223
+ ➡️ **Configuration**: `lib/htm/active_record_config.rb` manages setup
224
+ ➡️ **Naming conventions**: Rails conventions (snake_case tables, CamelCase models)
225
+
226
+ ## Removed Features
227
+
228
+ To streamline the schema, the following tables were removed:
229
+
230
+ **relationships table**: Originally intended for knowledge graph edges between nodes
231
+ - **Reason**: Not used in current implementation
232
+ - **Future**: Could be re-added via migration if graph features are needed
233
+
234
+ **operations_log table**: Originally intended for audit trail
235
+ - **Reason**: Not used in current implementation
236
+ - **Future**: Could use ActiveRecord callbacks or separate audit gem
237
+
238
+ ## Risks and Mitigations
239
+
240
+ ### Risk: ActiveRecord Complexity
241
+
242
+ - **Risk**: Developers misuse callbacks or create N+1 queries
243
+ - **Likelihood**: Medium
244
+ - **Impact**: Medium (performance degradation)
245
+ - **Mitigation**: Code reviews, use `includes()` for associations, monitor query patterns
246
+
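To make the N+1 risk concrete, the toy store below counts lookups for a per-node tag fetch versus one batched fetch, which is the access shape that eager loading with `includes(:tags)` aims for. Everything here (`FakeTagStore`, the sample data) is a hypothetical in-memory stand-in, not ActiveRecord itself:

```ruby
# Hypothetical in-memory stand-in for tag lookups; counts "queries".
class FakeTagStore
  attr_reader :query_count

  def initialize(tags_by_node)
    @tags_by_node = tags_by_node
    @query_count = 0
  end

  # One lookup per node: the N+1 shape.
  def tags_for(node_id)
    @query_count += 1
    @tags_by_node.fetch(node_id, [])
  end

  # One lookup covering many nodes: the batched (eager-loaded) shape.
  def tags_for_all(node_ids)
    @query_count += 1
    node_ids.to_h { |id| [id, @tags_by_node.fetch(id, [])] }
  end
end

# Illustrative data only.
DATA = { 1 => ['ai:llm:prompts'], 2 => ['ai:rag:retrieval'], 3 => [] }.freeze

lazy = FakeTagStore.new(DATA)
DATA.keys.each { |id| lazy.tags_for(id) }
LAZY_QUERIES = lazy.query_count    # grows with the number of nodes

eager = FakeTagStore.new(DATA)
eager.tags_for_all(DATA.keys)
EAGER_QUERIES = eager.query_count  # stays constant for the whole batch
```

With real models the batched form would look like `Node.includes(:tags).each { |node| node.tags }`, which issues a small fixed number of queries instead of one per node.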
247
+ ### Risk: Migration Conflicts
248
+
249
+ - **Risk**: Multiple developers create conflicting migrations
250
+ - **Likelihood**: Low (small team)
251
+ - **Impact**: Low (easy to resolve)
252
+ - **Mitigation**: Communication, timestamp-based migration names
253
+
254
+ ### Risk: Tag Proliferation
255
+
256
+ - **Risk**: Too many similar tags created (typos, inconsistent naming)
257
+ - **Likelihood**: Medium
258
+ - **Impact**: Low (cluttered tag space)
259
+ - **Mitigation**: LLM-driven tag normalization, tag search/suggestion features
260
+
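Part of the tag-proliferation mitigation can be mechanical. A minimal sketch of normalizing tag names before lookup, so that case and whitespace variants collapse to one tag. The rules used here (lowercase, hyphens for inner whitespace, empty segments dropped) and the `normalize_tag` helper are assumptions for illustration; this ADR does not specify a normalization scheme:

```ruby
# Hypothetical helper: canonicalize a colon-separated tag name.
def normalize_tag(raw)
  raw.split(':')
     .map { |segment| segment.strip.downcase.gsub(/\s+/, '-') }
     .reject(&:empty?)
     .join(':')
end

NORMALIZED = normalize_tag(' AI : LLM:Prompt Engineering ')
```

Here `NORMALIZED` is `'ai:llm:prompt-engineering'`. Running names through such a step before `Tag.find_or_create_by(name: ...)` would let the unique constraint on `tags.name` catch near-duplicates that differ only in case or spacing.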
261
+ ## Alternatives Considered
262
+
263
+ | Approach | Pros | Cons | Decision |
264
+ |----------|------|------|----------|
265
+ | Raw SQL (PG gem) | Maximum control, no dependencies | Boilerplate, no validations, error-prone | ❌ Rejected |
266
+ | Sequel ORM | Lightweight, flexible | Less mature than ActiveRecord | ❌ Rejected |
267
+ | ActiveRecord | Production-proven, migrations, associations | Heavier, Rails conventions | ✅ **Accepted** |
268
+ | Single-table tags | Simpler schema | No tag reuse, limited queries | ❌ Rejected |
269
+ | EAV pattern | Maximum flexibility | Query complexity, performance | ❌ Rejected |
270
+ | Many-to-many tags | Normalized, flexible, powerful | Join table overhead | ✅ **Accepted** |
271
+
272
+ ## Future Considerations
273
+
274
+ - **Tag autocomplete**: Suggest existing tags when tagging nodes
275
+ - **Tag merging**: Combine similar/duplicate tags
276
+ - **Tag statistics**: Most used tags, tag co-occurrence
277
+ - **Tag hierarchies**: Formal parent-child relationships beyond namespace convention
278
+ - **Tag permissions**: Some tags restricted to certain robots
279
+ - **ActiveRecord optimizations**: Eager loading, counter caches, read replicas
280
+
281
+ ## References
282
+
283
+ - [ActiveRecord Documentation](https://api.rubyonrails.org/classes/ActiveRecord/Base.html)
284
+ - [ActiveRecord Migrations](https://guides.rubyonrails.org/active_record_migrations.html)
285
+ - [ActiveRecord Associations](https://guides.rubyonrails.org/association_basics.html)
286
+ - [ADR-001: PostgreSQL Storage](./001-use-postgresql-timescaledb-storage.md)
287
+ - [Schema Documentation](../../../docs/development/schema.md)
288
+
289
+ ## Review Notes
290
+
291
+ **Systems Architect**: ✅ ActiveRecord is a solid choice for this scale. The many-to-many tagging provides good flexibility without over-engineering.
292
+
293
+ **Database Architect**: ✅ Proper foreign keys and unique constraints ensure data integrity. The join table follows best practices.
294
+
295
+ **Ruby Expert**: ✅ ActiveRecord integration is clean. Models follow Rails conventions which makes the codebase more approachable.
296
+
297
+ **Maintainability Expert**: ✅ Migrations provide crucial schema version control. Much better than manual SQL scripts.
298
+
299
+ **Performance Specialist**: ⚠️ Monitor for N+1 queries. Consider adding an index on `tags.name` that supports prefix pattern queries, and counter caches if tag counts are frequently accessed.