htm 0.0.1 → 0.0.10

Files changed (184)
  1. checksums.yaml +4 -4
  2. data/.aigcm_msg +1 -0
  3. data/.architecture/reviews/comprehensive-codebase-review.md +577 -0
  4. data/.claude/settings.local.json +92 -0
  5. data/.envrc +1 -0
  6. data/.irbrc +283 -80
  7. data/.tbls.yml +31 -0
  8. data/CHANGELOG.md +314 -16
  9. data/CLAUDE.md +603 -0
  10. data/README.md +76 -5
  11. data/Rakefile +5 -0
  12. data/SETUP.md +132 -101
  13. data/db/migrate/{20250101000001_enable_extensions.rb → 00001_enable_extensions.rb} +0 -1
  14. data/db/migrate/00002_create_robots.rb +11 -0
  15. data/db/migrate/00003_create_file_sources.rb +20 -0
  16. data/db/migrate/00004_create_nodes.rb +65 -0
  17. data/db/migrate/00005_create_tags.rb +13 -0
  18. data/db/migrate/00006_create_node_tags.rb +18 -0
  19. data/db/migrate/00007_create_robot_nodes.rb +26 -0
  20. data/db/migrate/00009_add_working_memory_to_robot_nodes.rb +12 -0
  21. data/db/schema.sql +390 -36
  22. data/docs/api/database.md +19 -232
  23. data/docs/api/embedding-service.md +1 -7
  24. data/docs/api/htm.md +305 -364
  25. data/docs/api/index.md +1 -7
  26. data/docs/api/long-term-memory.md +342 -590
  27. data/docs/api/yard/HTM/ActiveRecordConfig.md +23 -0
  28. data/docs/api/yard/HTM/AuthorizationError.md +11 -0
  29. data/docs/api/yard/HTM/CircuitBreaker.md +92 -0
  30. data/docs/api/yard/HTM/CircuitBreakerOpenError.md +34 -0
  31. data/docs/api/yard/HTM/Configuration.md +175 -0
  32. data/docs/api/yard/HTM/Database.md +99 -0
  33. data/docs/api/yard/HTM/DatabaseError.md +14 -0
  34. data/docs/api/yard/HTM/EmbeddingError.md +18 -0
  35. data/docs/api/yard/HTM/EmbeddingService.md +58 -0
  36. data/docs/api/yard/HTM/Error.md +11 -0
  37. data/docs/api/yard/HTM/JobAdapter.md +39 -0
  38. data/docs/api/yard/HTM/LongTermMemory.md +342 -0
  39. data/docs/api/yard/HTM/NotFoundError.md +17 -0
  40. data/docs/api/yard/HTM/Observability.md +107 -0
  41. data/docs/api/yard/HTM/QueryTimeoutError.md +19 -0
  42. data/docs/api/yard/HTM/Railtie.md +27 -0
  43. data/docs/api/yard/HTM/ResourceExhaustedError.md +13 -0
  44. data/docs/api/yard/HTM/TagError.md +18 -0
  45. data/docs/api/yard/HTM/TagService.md +67 -0
  46. data/docs/api/yard/HTM/Timeframe/Result.md +24 -0
  47. data/docs/api/yard/HTM/Timeframe.md +40 -0
  48. data/docs/api/yard/HTM/TimeframeExtractor/Result.md +24 -0
  49. data/docs/api/yard/HTM/TimeframeExtractor.md +45 -0
  50. data/docs/api/yard/HTM/ValidationError.md +20 -0
  51. data/docs/api/yard/HTM/WorkingMemory.md +131 -0
  52. data/docs/api/yard/HTM.md +80 -0
  53. data/docs/api/yard/index.csv +179 -0
  54. data/docs/api/yard-reference.md +51 -0
  55. data/docs/architecture/adrs/001-postgresql-timescaledb.md +1 -1
  56. data/docs/architecture/adrs/003-ollama-embeddings.md +1 -1
  57. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +2 -27
  58. data/docs/architecture/adrs/index.md +2 -13
  59. data/docs/architecture/hive-mind.md +165 -166
  60. data/docs/architecture/index.md +2 -2
  61. data/docs/architecture/overview.md +5 -171
  62. data/docs/architecture/two-tier-memory.md +1 -35
  63. data/docs/assets/images/adr-010-current-architecture.svg +37 -0
  64. data/docs/assets/images/adr-010-proposed-architecture.svg +48 -0
  65. data/docs/assets/images/adr-dependency-tree.svg +93 -0
  66. data/docs/assets/images/class-hierarchy.svg +55 -0
  67. data/docs/assets/images/exception-hierarchy.svg +45 -0
  68. data/docs/assets/images/htm-architecture-overview.svg +83 -0
  69. data/docs/assets/images/htm-complete-memory-flow.svg +160 -0
  70. data/docs/assets/images/htm-context-assembly-flow.svg +148 -0
  71. data/docs/assets/images/htm-eviction-process.svg +141 -0
  72. data/docs/assets/images/htm-memory-addition-flow.svg +138 -0
  73. data/docs/assets/images/htm-memory-recall-flow.svg +152 -0
  74. data/docs/assets/images/htm-node-states.svg +123 -0
  75. data/docs/assets/images/project-structure.svg +78 -0
  76. data/docs/assets/images/test-directory-structure.svg +38 -0
  77. data/{dbdoc → docs/database}/README.md +127 -125
  78. data/docs/database/public.file_sources.md +42 -0
  79. data/docs/database/public.file_sources.svg +211 -0
  80. data/{dbdoc → docs/database}/public.node_tags.md +7 -8
  81. data/docs/database/public.node_tags.svg +239 -0
  82. data/{dbdoc → docs/database}/public.nodes.md +22 -17
  83. data/docs/database/public.nodes.svg +271 -0
  84. data/docs/database/public.robot_nodes.md +46 -0
  85. data/docs/database/public.robot_nodes.svg +243 -0
  86. data/{dbdoc → docs/database}/public.robots.md +2 -3
  87. data/docs/database/public.robots.svg +161 -0
  88. data/docs/database/public.tags.svg +139 -0
  89. data/{dbdoc → docs/database}/schema.json +941 -630
  90. data/docs/database/schema.svg +282 -0
  91. data/docs/development/index.md +1 -29
  92. data/docs/development/schema.md +134 -309
  93. data/docs/development/testing.md +1 -9
  94. data/docs/getting-started/index.md +47 -0
  95. data/docs/{installation.md → getting-started/installation.md} +2 -2
  96. data/docs/{quick-start.md → getting-started/quick-start.md} +5 -5
  97. data/docs/guides/adding-memories.md +295 -643
  98. data/docs/guides/recalling-memories.md +36 -1
  99. data/docs/guides/search-strategies.md +85 -51
  100. data/docs/images/htm-er-diagram.svg +156 -0
  101. data/docs/index.md +16 -31
  102. data/docs/multi_framework_support.md +4 -4
  103. data/examples/README.md +280 -0
  104. data/examples/basic_usage.rb +18 -16
  105. data/examples/cli_app/htm_cli.rb +146 -8
  106. data/examples/cli_app/temp.log +93 -0
  107. data/examples/custom_llm_configuration.rb +1 -2
  108. data/examples/example_app/app.rb +11 -14
  109. data/examples/file_loader_usage.rb +177 -0
  110. data/examples/robot_groups/lib/robot_group.rb +419 -0
  111. data/examples/robot_groups/lib/working_memory_channel.rb +140 -0
  112. data/examples/robot_groups/multi_process.rb +286 -0
  113. data/examples/robot_groups/robot_worker.rb +136 -0
  114. data/examples/robot_groups/same_process.rb +229 -0
  115. data/examples/sinatra_app/Gemfile +1 -0
  116. data/examples/sinatra_app/Gemfile.lock +166 -0
  117. data/examples/sinatra_app/app.rb +219 -24
  118. data/examples/timeframe_demo.rb +276 -0
  119. data/lib/htm/active_record_config.rb +10 -3
  120. data/lib/htm/circuit_breaker.rb +202 -0
  121. data/lib/htm/configuration.rb +313 -80
  122. data/lib/htm/database.rb +67 -36
  123. data/lib/htm/embedding_service.rb +39 -2
  124. data/lib/htm/errors.rb +131 -11
  125. data/lib/htm/{sinatra.rb → integrations/sinatra.rb} +87 -12
  126. data/lib/htm/job_adapter.rb +10 -3
  127. data/lib/htm/jobs/generate_embedding_job.rb +5 -4
  128. data/lib/htm/jobs/generate_tags_job.rb +4 -0
  129. data/lib/htm/loaders/markdown_loader.rb +263 -0
  130. data/lib/htm/loaders/paragraph_chunker.rb +112 -0
  131. data/lib/htm/long_term_memory.rb +601 -321
  132. data/lib/htm/models/file_source.rb +99 -0
  133. data/lib/htm/models/node.rb +116 -12
  134. data/lib/htm/models/robot.rb +53 -4
  135. data/lib/htm/models/robot_node.rb +51 -0
  136. data/lib/htm/models/tag.rb +302 -0
  137. data/lib/htm/observability.rb +395 -0
  138. data/lib/htm/tag_service.rb +60 -3
  139. data/lib/htm/tasks.rb +29 -0
  140. data/lib/htm/timeframe.rb +194 -0
  141. data/lib/htm/timeframe_extractor.rb +307 -0
  142. data/lib/htm/version.rb +1 -1
  143. data/lib/htm/working_memory.rb +165 -70
  144. data/lib/htm.rb +352 -133
  145. data/lib/tasks/doc.rake +300 -0
  146. data/lib/tasks/files.rake +299 -0
  147. data/lib/tasks/htm.rake +188 -2
  148. data/lib/tasks/jobs.rake +10 -12
  149. data/lib/tasks/tags.rake +194 -0
  150. data/mkdocs.yml +91 -9
  151. data/notes/ARCHITECTURE_REVIEW.md +1167 -0
  152. data/notes/IMPLEMENTATION_SUMMARY.md +606 -0
  153. data/notes/MULTI_FRAMEWORK_IMPLEMENTATION.md +451 -0
  154. data/notes/next_steps.md +100 -0
  155. data/notes/plan.md +627 -0
  156. data/notes/tag_ontology_enhancement_ideas.md +222 -0
  157. data/notes/timescaledb_removal_summary.md +200 -0
  158. metadata +177 -37
  159. data/db/migrate/20250101000002_create_robots.rb +0 -14
  160. data/db/migrate/20250101000003_create_nodes.rb +0 -42
  161. data/db/migrate/20250101000005_create_tags.rb +0 -38
  162. data/db/migrate/20250101000007_add_node_vector_indexes.rb +0 -30
  163. data/dbdoc/public.node_tags.svg +0 -112
  164. data/dbdoc/public.nodes.svg +0 -118
  165. data/dbdoc/public.robots.svg +0 -90
  166. data/dbdoc/public.tags.svg +0 -60
  167. data/dbdoc/schema.svg +0 -154
  168. data/{dbdoc → docs/database}/public.node_stats.md +0 -0
  169. data/{dbdoc → docs/database}/public.node_stats.svg +0 -0
  170. data/{dbdoc → docs/database}/public.nodes_tags.md +0 -0
  171. data/{dbdoc → docs/database}/public.nodes_tags.svg +0 -0
  172. data/{dbdoc → docs/database}/public.ontology_structure.md +0 -0
  173. data/{dbdoc → docs/database}/public.ontology_structure.svg +0 -0
  174. data/{dbdoc → docs/database}/public.operations_log.md +0 -0
  175. data/{dbdoc → docs/database}/public.operations_log.svg +0 -0
  176. data/{dbdoc → docs/database}/public.relationships.md +0 -0
  177. data/{dbdoc → docs/database}/public.relationships.svg +0 -0
  178. data/{dbdoc → docs/database}/public.robot_activity.md +0 -0
  179. data/{dbdoc → docs/database}/public.robot_activity.svg +0 -0
  180. data/{dbdoc → docs/database}/public.schema_migrations.md +0 -0
  181. data/{dbdoc → docs/database}/public.schema_migrations.svg +0 -0
  182. data/{dbdoc → docs/database}/public.tags.md +3 -3
  183. data/{dbdoc → docs/database}/public.topic_relationships.md +0 -0
  184. data/{dbdoc → docs/database}/public.topic_relationships.svg +0 -0
@@ -1,6 +1,6 @@
1
1
  # Database Schema Documentation
2
2
 
3
- This document provides a comprehensive reference for HTM's PostgreSQL database schema, including all tables, indexes, and relationships.
3
+ This document provides a comprehensive reference for HTM's PostgreSQL database schema, including query patterns, optimization strategies, and best practices.
4
4
 
5
5
  ## Schema Overview
6
6
 
@@ -22,367 +22,179 @@ CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA public;
22
22
 
23
23
  ## Entity-Relationship Diagram
24
24
 
25
- Here's the complete database structure:
25
+ Here's the complete database structure (auto-generated by tbls):
26
26
 
27
- ```svg
28
- <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1200 900" style="background: transparent;">
29
- <defs>
30
- <style>
31
- .table-box { fill: #1e1e1e; stroke: #4a9eff; stroke-width: 2; }
32
- .table-header { fill: #2d5a8e; }
33
- .text-header { fill: #ffffff; font-family: monospace; font-size: 14px; font-weight: bold; }
34
- .text-field { fill: #d4d4d4; font-family: monospace; font-size: 11px; }
35
- .text-type { fill: #8cb4e8; font-family: monospace; font-size: 10px; }
36
- .relation-line { stroke: #4a9eff; stroke-width: 1.5; fill: none; }
37
- .arrow { fill: #4a9eff; }
38
- .join-table { fill: #1e3a1e; stroke: #4a9eff; stroke-width: 2; }
39
- </style>
40
- </defs>
27
+ ![HTM Entity-Relationship Diagram](../database/schema.svg)
41
28
 
42
- <!-- Robots Table -->
43
- <rect class="table-box" x="50" y="50" width="280" height="140" rx="5"/>
44
- <rect class="table-header" x="50" y="50" width="280" height="35" rx="5"/>
45
- <text class="text-header" x="190" y="73" text-anchor="middle">robots</text>
29
+ ## Table Reference
46
30
 
47
- <text class="text-field" x="60" y="100">id</text>
48
- <text class="text-type" x="320" y="100" text-anchor="end">BIGSERIAL PK</text>
31
+ For detailed table definitions, columns, indexes, and constraints, see the auto-generated documentation:
49
32
 
50
- <text class="text-field" x="60" y="120">name</text>
51
- <text class="text-type" x="320" y="120" text-anchor="end">TEXT</text>
33
+ ### Core Tables
52
34
 
53
- <text class="text-field" x="60" y="140">created_at</text>
54
- <text class="text-type" x="320" y="140" text-anchor="end">TIMESTAMPTZ</text>
35
+ | Table | Description | Details |
36
+ |-------|-------------|---------|
37
+ | [robots](../database/public.robots.md) | Registry of all LLM robots using the HTM system | Stores robot metadata and activity tracking |
38
+ | [nodes](../database/public.nodes.md) | Core memory storage for conversation messages and context | Vector embeddings, full-text search, deduplication |
39
+ | [tags](../database/public.tags.md) | Unique hierarchical tag names for categorization | Colon-separated namespaces (e.g., `ai:llm:embeddings`) |
40
+ | file_sources | Source file metadata for loaded documents | Path, mtime, frontmatter, sync tracking |
55
41
 
56
- <text class="text-field" x="60" y="160">last_active</text>
57
- <text class="text-type" x="320" y="160" text-anchor="end">TIMESTAMPTZ</text>
42
+ ### Join Tables
58
43
 
59
- <text class="text-field" x="60" y="180">metadata</text>
60
- <text class="text-type" x="320" y="180" text-anchor="end">JSONB</text>
44
+ | Table | Description | Details |
45
+ |-------|-------------|---------|
46
+ | [robot_nodes](../database/public.robot_nodes.md) | Links robots to nodes (many-to-many) | Enables "hive mind" shared memory; includes `working_memory` boolean for per-robot working memory state |
47
+ | [node_tags](../database/public.node_tags.md) | Links nodes to tags (many-to-many) | Flexible multi-tag categorization |
61
48
 
62
- <!-- Nodes Table -->
63
- <rect class="table-box" x="50" y="250" width="280" height="400" rx="5"/>
64
- <rect class="table-header" x="50" y="250" width="280" height="35" rx="5"/>
65
- <text class="text-header" x="190" y="273" text-anchor="middle">nodes</text>
49
+ ### System Tables
66
50
 
67
- <text class="text-field" x="60" y="300">id</text>
68
- <text class="text-type" x="320" y="300" text-anchor="end">BIGSERIAL PK</text>
51
+ | Table | Description | Details |
52
+ |-------|-------------|---------|
53
+ | [schema_migrations](../database/public.schema_migrations.md) | ActiveRecord migration tracking | Tracks applied migrations |
69
54
 
70
- <text class="text-field" x="60" y="320">content</text>
71
- <text class="text-type" x="320" y="320" text-anchor="end">TEXT NOT NULL</text>
55
+ For the complete schema overview including all stored procedures and functions, see the [Database Tables Overview](../database/README.md).
72
56
 
73
- <text class="text-field" x="60" y="340">speaker</text>
74
- <text class="text-type" x="320" y="340" text-anchor="end">TEXT NOT NULL</text>
57
+ ## Key Concepts
75
58
 
76
- <text class="text-field" x="60" y="360">type</text>
77
- <text class="text-type" x="320" y="360" text-anchor="end">TEXT</text>
59
+ ### Content Deduplication
78
60
 
79
- <text class="text-field" x="60" y="380">category</text>
80
- <text class="text-type" x="320" y="380" text-anchor="end">TEXT</text>
61
+ Content deduplication is enforced via SHA-256 hashing in the `nodes` table:
81
62
 
82
- <text class="text-field" x="60" y="400">importance</text>
83
- <text class="text-type" x="320" y="400" text-anchor="end">DOUBLE PRECISION</text>
63
+ 1. When `remember()` is called, a SHA-256 hash of the content is computed
64
+ 2. If a node with the same `content_hash` exists, the existing node is reused
65
+ 3. A new `robot_nodes` association is created (or updated if it already exists)
66
+ 4. This ensures identical memories are stored once but can be "remembered" by multiple robots
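Reviewer note: the four deduplication steps above can be sketched in plain Ruby. This is a minimal in-memory illustration, not the gem's implementation. The `DedupStore` class and its hash-and-set storage are hypothetical stand-ins for the `nodes` and `robot_nodes` tables; only the SHA-256 hashing and the reuse-then-associate flow come from the documented text.

```ruby
require "digest"
require "set"

# Hypothetical in-memory stand-in for the nodes and robot_nodes tables.
class DedupStore
  attr_reader :nodes, :robot_nodes

  def initialize
    @nodes = {}            # content_hash => node id
    @robot_nodes = Set.new # [robot_id, node_id] pairs
  end

  # Returns the node id, reusing an existing node when the hash matches.
  def remember(robot_id, content)
    content_hash = Digest::SHA256.hexdigest(content)
    # Allocate a new id only when this hash has not been seen before.
    node_id = (@nodes[content_hash] ||= @nodes.size + 1)
    # Adding to a Set is idempotent, so a repeated association is a no-op.
    @robot_nodes.add([robot_id, node_id])
    node_id
  end
end

store  = DedupStore.new
first  = store.remember(1, "PostgreSQL supports JSONB")
second = store.remember(2, "PostgreSQL supports JSONB")
```

Both calls return the same node id, while the set holds one association per robot, which mirrors "stored once, remembered by multiple robots".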
84
67
 
85
- <text class="text-field" x="60" y="420">created_at</text>
86
- <text class="text-type" x="320" y="420" text-anchor="end">TIMESTAMPTZ</text>
68
+ ### JSONB Metadata
87
69
 
88
- <text class="text-field" x="60" y="440">updated_at</text>
89
- <text class="text-type" x="320" y="440" text-anchor="end">TIMESTAMPTZ</text>
70
+ The `nodes` table includes a `metadata` JSONB column for flexible key-value storage:
90
71
 
91
- <text class="text-field" x="60" y="460">last_accessed</text>
92
- <text class="text-type" x="320" y="460" text-anchor="end">TIMESTAMPTZ</text>
72
+ | Column | Type | Default | Description |
73
+ |--------|------|---------|-------------|
74
+ | `metadata` | jsonb | `{}` | Arbitrary key-value data |
93
75
 
94
- <text class="text-field" x="60" y="480">token_count</text>
95
- <text class="text-type" x="320" y="480" text-anchor="end">INTEGER</text>
76
+ **Features:**
77
+ - Stores any valid JSON data (strings, numbers, booleans, arrays, objects)
78
+ - GIN index (`idx_nodes_metadata`) for efficient containment queries
79
+ - Queried using PostgreSQL's `@>` containment operator
96
80
 
97
- <text class="text-field" x="60" y="500">in_working_memory</text>
98
- <text class="text-type" x="320" y="500" text-anchor="end">BOOLEAN</text>
99
-
100
- <text class="text-field" x="60" y="520">robot_id</text>
101
- <text class="text-type" x="320" y="520" text-anchor="end">BIGINT FK</text>
102
-
103
- <text class="text-field" x="60" y="540">embedding</text>
104
- <text class="text-type" x="320" y="540" text-anchor="end">vector(2000)</text>
105
-
106
- <text class="text-field" x="60" y="560">embedding_dimension</text>
107
- <text class="text-type" x="320" y="560" text-anchor="end">INTEGER</text>
108
-
109
- <!-- Tags Table -->
110
- <rect class="table-box" x="850" y="250" width="280" height="120" rx="5"/>
111
- <rect class="table-header" x="850" y="250" width="280" height="35" rx="5"/>
112
- <text class="text-header" x="990" y="273" text-anchor="middle">tags</text>
113
-
114
- <text class="text-field" x="860" y="300">id</text>
115
- <text class="text-type" x="1120" y="300" text-anchor="end">BIGSERIAL PK</text>
116
-
117
- <text class="text-field" x="860" y="320">name</text>
118
- <text class="text-type" x="1120" y="320" text-anchor="end">TEXT UNIQUE</text>
119
-
120
- <text class="text-field" x="860" y="340">created_at</text>
121
- <text class="text-type" x="1120" y="340" text-anchor="end">TIMESTAMPTZ</text>
122
-
123
- <!-- nodes_tags Join Table -->
124
- <rect class="join-table" x="450" y="420" width="280" height="140" rx="5"/>
125
- <rect class="table-header" x="450" y="420" width="280" height="35" rx="5"/>
126
- <text class="text-header" x="590" y="443" text-anchor="middle">nodes_tags</text>
127
-
128
- <text class="text-field" x="460" y="470">id</text>
129
- <text class="text-type" x="720" y="470" text-anchor="end">BIGSERIAL PK</text>
130
-
131
- <text class="text-field" x="460" y="490">node_id</text>
132
- <text class="text-type" x="720" y="490" text-anchor="end">BIGINT FK</text>
133
-
134
- <text class="text-field" x="460" y="510">tag_id</text>
135
- <text class="text-type" x="720" y="510" text-anchor="end">BIGINT FK</text>
136
-
137
- <text class="text-field" x="460" y="530">created_at</text>
138
- <text class="text-type" x="720" y="530" text-anchor="end">TIMESTAMPTZ</text>
139
-
140
- <!-- Relationships: robots -> nodes -->
141
- <path class="relation-line" d="M 190 190 L 190 250"/>
142
- <polygon class="arrow" points="190,250 185,240 195,240"/>
143
-
144
- <!-- Relationships: nodes -> nodes_tags -->
145
- <path class="relation-line" d="M 330 490 L 450 490"/>
146
- <polygon class="arrow" points="450,490 440,485 440,495"/>
147
-
148
- <!-- Relationships: tags -> nodes_tags -->
149
- <path class="relation-line" d="M 850 310 L 730 310 L 730 510 L 730 510"/>
150
- <polygon class="arrow" points="730,510 725,500 735,500"/>
81
+ **Query examples:**
82
+ ```sql
83
+ -- Find nodes with specific metadata
84
+ SELECT * FROM nodes WHERE metadata @> '{"priority": "high"}'::jsonb;
151
85
 
152
- <!-- Legend -->
153
- <text class="text-field" x="50" y="720" font-weight="bold">Legend:</text>
154
- <text class="text-field" x="50" y="740">PK = Primary Key</text>
155
- <text class="text-field" x="200" y="740">FK = Foreign Key</text>
156
- <text class="text-field" x="50" y="760">Green box = Join table (many-to-many)</text>
86
+ -- Find nodes with nested metadata
87
+ SELECT * FROM nodes WHERE metadata @> '{"user": {"role": "admin"}}'::jsonb;
157
88
 
158
- <!-- Annotations -->
159
- <text class="text-field" x="400" y="370" font-style="italic">1:N</text>
160
- <text class="text-field" x="380" y="480" font-style="italic">N:M</text>
161
- <text class="text-field" x="770" y="480" font-style="italic">N:M</text>
162
- </svg>
89
+ -- Find nodes with multiple conditions
90
+ SELECT * FROM nodes WHERE metadata @> '{"environment": "production", "version": 2}'::jsonb;
163
91
  ```
164
92
 
165
- ## Table Definitions
93
+ **Ruby usage:**
94
+ ```ruby
95
+ # Store with metadata
96
+ htm.remember("API config", metadata: { environment: "production", version: 2 })
166
97
 
167
- ### robots
98
+ # Recall filtering by metadata
99
+ htm.recall("config", metadata: { environment: "production" })
100
+ ```
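Reviewer note: for readers unfamiliar with the `@>` operator used above, the following Ruby sketch approximates its containment rule on parsed JSON values. The `jsonb_contains?` helper is hypothetical and simplified; PostgreSQL has additional edge cases (for example, an array containing a bare scalar) that are not modelled here.

```ruby
# Simplified approximation of `haystack @> needle` for parsed JSON values.
def jsonb_contains?(haystack, needle)
  case needle
  when Hash
    # Every key in the needle must exist in the haystack with a contained value.
    haystack.is_a?(Hash) &&
      needle.all? { |key, value| haystack.key?(key) && jsonb_contains?(haystack[key], value) }
  when Array
    # Every needle element must be contained in some haystack element.
    haystack.is_a?(Array) &&
      needle.all? { |item| haystack.any? { |candidate| jsonb_contains?(candidate, item) } }
  else
    # Scalars must match exactly.
    haystack == needle
  end
end

metadata = {
  "environment" => "production",
  "version"     => 2,
  "user"        => { "role" => "admin" }
}

flat_match   = jsonb_contains?(metadata, { "environment" => "production" })
nested_match = jsonb_contains?(metadata, { "user" => { "role" => "admin" } })
no_match     = jsonb_contains?(metadata, { "version" => 3 })
```

The first two checks succeed and the third fails, matching how the three SQL examples would filter rows with this metadata.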
168
101
 
169
- The robots table stores registration and metadata for all LLM agents using the HTM system.
102
+ ### Hierarchical Tags
170
103
 
171
- **Purpose**: Registry of all robots (LLM agents) with their configuration and activity tracking.
104
+ Tags use colon-separated hierarchies for organization:
105
+ - `programming:ruby:gems` - Programming > Ruby > Gems
106
+ - `database:postgresql:extensions` - Database > PostgreSQL > Extensions
107
+ - `ai:llm:embeddings` - AI > LLM > Embeddings
172
108
 
109
+ Query by prefix to find all related tags:
173
110
  ```sql
174
- CREATE TABLE public.robots (
175
- id bigint NOT NULL,
176
- name text,
177
- created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
178
- last_active timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
179
- metadata jsonb
180
- );
181
-
182
- ALTER TABLE ONLY public.robots ALTER COLUMN id SET DEFAULT nextval('public.robots_id_seq'::regclass);
183
- ALTER TABLE ONLY public.robots ADD CONSTRAINT robots_pkey PRIMARY KEY (id);
111
+ SELECT * FROM tags WHERE name LIKE 'database:%'; -- All database-related tags
112
+ SELECT * FROM tags WHERE name LIKE 'ai:llm:%'; -- All LLM-related tags
184
113
  ```
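Reviewer note: the colon-separated hierarchy can also be handled on the Ruby side. The helpers below are a hypothetical sketch (they are not taken from the gem's API): one expands a tag into its ancestor prefixes, the other is an in-memory equivalent of the `LIKE 'prefix:%'` queries shown above.

```ruby
# "ai:llm:embeddings" => ["ai", "ai:llm", "ai:llm:embeddings"]
def tag_ancestry(tag)
  parts = tag.split(":")
  parts.each_index.map { |i| parts[0..i].join(":") }
end

# In-memory equivalent of `WHERE name LIKE 'prefix:%'`.
def tags_under(tags, prefix)
  tags.select { |name| name.start_with?("#{prefix}:") }
end

all_tags = [
  "programming:ruby:gems",
  "database:postgresql:extensions",
  "ai:llm:embeddings"
]

ancestry      = tag_ancestry("ai:llm:embeddings")
database_tags = tags_under(all_tags, "database")
```

Appending the colon before matching keeps a prefix such as `ai` from also matching an unrelated root like `aircraft`.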
185
114
 
186
- **Columns**:
115
+ ### File Source Tracking
187
116
 
188
- | Column | Type | Nullable | Default | Description |
189
- |--------|------|----------|---------|-------------|
190
- | `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
191
- | `name` | TEXT | YES | NULL | Human-readable name for the robot |
192
- | `created_at` | TIMESTAMPTZ | YES | NOW() | When the robot was first registered |
193
- | `last_active` | TIMESTAMPTZ | YES | NOW() | Last time the robot accessed the system |
194
- | `metadata` | JSONB | YES | NULL | Robot-specific configuration and metadata |
117
+ The `file_sources` table tracks loaded documents for re-sync support:
195
118
 
196
- **Indexes**:
197
- - `PRIMARY KEY` on `id`
119
+ | Column | Type | Description |
120
+ |--------|------|-------------|
121
+ | `id` | bigint | Primary key |
122
+ | `file_path` | text | Absolute path to the source file |
123
+ | `file_hash` | varchar(64) | SHA-256 hash of file contents |
124
+ | `mtime` | timestamptz | File modification time for change detection |
125
+ | `file_size` | integer | File size in bytes |
126
+ | `frontmatter` | jsonb | Parsed YAML frontmatter metadata |
127
+ | `last_synced_at` | timestamptz | When file was last synced |
128
+ | `created_at` | timestamptz | When source was first loaded |
129
+ | `updated_at` | timestamptz | When source was last updated |
198
130
 
199
- **Relationships**:
200
- - One robot has many nodes (1:N)
201
-
202
- ---
131
+ Nodes loaded from files have:
132
+ - `source_id` - Foreign key to file_sources (nullable, ON DELETE SET NULL)
133
+ - `chunk_position` - Integer position within the file (0-indexed)
203
134
 
204
- ### nodes
135
+ Query nodes from a file:
136
+ ```sql
137
+ SELECT n.*
138
+ FROM nodes n
139
+ JOIN file_sources fs ON n.source_id = fs.id
140
+ WHERE fs.file_path = '/path/to/file.md'
141
+ ORDER BY n.chunk_position;
142
+ ```
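Reviewer note: the `mtime`, `file_size`, and `file_hash` columns suggest a two-stage change check. The sketch below is one plausible way to use them, assuming a cheap metadata comparison first and a SHA-256 comparison only when that differs. `FileSourceRecord`, `snapshot`, and `needs_sync?` are hypothetical names, and the diff does not confirm that the gem's loader works this way.

```ruby
require "digest"

# Hypothetical snapshot of the change-detection columns of a file_sources row.
FileSourceRecord = Struct.new(:file_path, :file_hash, :mtime, :file_size, keyword_init: true)

# Capture the current state of a file on disk.
def snapshot(path)
  FileSourceRecord.new(
    file_path: File.expand_path(path),
    file_hash: Digest::SHA256.file(path).hexdigest,
    mtime:     File.mtime(path),
    file_size: File.size(path)
  )
end

# True when the file is missing or its contents differ from the snapshot.
def needs_sync?(record)
  path = record.file_path
  return true unless File.exist?(path)
  # Cheap check: unchanged mtime and size are treated as "no change".
  return false if File.mtime(path) == record.mtime && File.size(path) == record.file_size
  # Metadata changed, so fall back to comparing content hashes.
  Digest::SHA256.file(path).hexdigest != record.file_hash
end
```

Hashing only after the metadata check avoids reading every file on each sync, at the cost of missing an edit that preserves both size and modification time.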
205
143
 
206
- The core table storing all memory nodes with vector embeddings for semantic search.
144
+ ### Remember Tracking
207
145
 
208
- **Purpose**: Stores all memories (conversation messages, facts, decisions, code, etc.) with full-text and vector search capabilities.
146
+ The `robot_nodes` table tracks per-robot remember metadata:
209
147
 
210
- ```sql
211
- CREATE TABLE public.nodes (
212
- id bigint NOT NULL,
213
- content text NOT NULL,
214
- speaker text NOT NULL,
215
- type text,
216
- category text,
217
- importance double precision DEFAULT 1.0,
218
- created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
219
- updated_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
220
- last_accessed timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
221
- token_count integer,
222
- in_working_memory boolean DEFAULT false,
223
- robot_id bigint NOT NULL,
224
- embedding public.vector(2000),
225
- embedding_dimension integer,
226
- CONSTRAINT check_embedding_dimension CHECK (((embedding_dimension IS NULL) OR ((embedding_dimension > 0) AND (embedding_dimension <= 2000))))
227
- );
228
-
229
- ALTER TABLE ONLY public.nodes ALTER COLUMN id SET DEFAULT nextval('public.nodes_id_seq'::regclass);
230
- ALTER TABLE ONLY public.nodes ADD CONSTRAINT nodes_pkey PRIMARY KEY (id);
231
- ALTER TABLE ONLY public.nodes
232
- ADD CONSTRAINT fk_rails_60162e9d3a FOREIGN KEY (robot_id) REFERENCES public.robots(id) ON DELETE CASCADE;
233
- ```
148
+ 1. `first_remembered_at` - When this robot first encountered this content
149
+ 2. `last_remembered_at` - Updated each time the robot tries to remember the same content
150
+ 3. `remember_count` - Incremented each time (useful for identifying frequently reinforced memories)
234
151
 
235
- **Columns**:
236
-
237
- | Column | Type | Nullable | Default | Description |
238
- |--------|------|----------|---------|-------------|
239
- | `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
240
- | `content` | TEXT | NO | - | The conversation message/utterance content |
241
- | `speaker` | TEXT | NO | - | Who said it: user or robot name |
242
- | `type` | TEXT | YES | NULL | Memory type: fact, context, code, preference, decision, question |
243
- | `category` | TEXT | YES | NULL | Optional category for organizing memories |
244
- | `importance` | DOUBLE PRECISION | YES | 1.0 | Importance score (0.0-1.0) for prioritizing recall |
245
- | `created_at` | TIMESTAMPTZ | YES | NOW() | When this memory was created |
246
- | `updated_at` | TIMESTAMPTZ | YES | NOW() | When this memory was last modified |
247
- | `last_accessed` | TIMESTAMPTZ | YES | NOW() | When this memory was last accessed |
248
- | `token_count` | INTEGER | YES | NULL | Number of tokens in the content (for context budget management) |
249
- | `in_working_memory` | BOOLEAN | YES | FALSE | Whether this memory is currently in working memory |
250
- | `robot_id` | BIGINT | NO | - | ID of the robot that owns this memory |
251
- | `embedding` | vector(2000) | YES | NULL | Vector embedding (max 2000 dimensions) for semantic search |
252
- | `embedding_dimension` | INTEGER | YES | NULL | Actual number of dimensions used in the embedding vector (max 2000) |
253
-
254
- **Indexes**:
255
-
256
- - `PRIMARY KEY` on `id`
257
- - `idx_nodes_robot_id` BTREE on `robot_id`
258
- - `idx_nodes_speaker` BTREE on `speaker`
259
- - `idx_nodes_type` BTREE on `type`
260
- - `idx_nodes_category` BTREE on `category`
261
- - `idx_nodes_created_at` BTREE on `created_at`
262
- - `idx_nodes_updated_at` BTREE on `updated_at`
263
- - `idx_nodes_last_accessed` BTREE on `last_accessed`
264
- - `idx_nodes_in_working_memory` BTREE on `in_working_memory`
265
- - `idx_nodes_embedding` HNSW on `embedding` using `vector_cosine_ops` (m=16, ef_construction=64)
266
- - `idx_nodes_content_gin` GIN on `to_tsvector('english', content)` for full-text search
267
- - `idx_nodes_content_trgm` GIN on `content` using `gin_trgm_ops` for fuzzy matching
268
-
269
- **Foreign Keys**:
270
- - `robot_id` references `robots(id)` ON DELETE CASCADE
271
-
272
- **Relationships**:
273
- - Many nodes belong to one robot (N:1)
274
- - Many nodes have many tags through nodes_tags (N:M)
275
-
276
- **Check Constraints**:
277
- - `check_embedding_dimension`: Ensures embedding_dimension is NULL or between 1 and 2000
152
+ This allows querying for:
153
+ - Recently reinforced memories: `ORDER BY last_remembered_at DESC`
154
+ - Frequently remembered content: `ORDER BY remember_count DESC`
155
+ - New vs old memories: Compare `first_remembered_at` across robots
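Reviewer note: the three tracking columns behave like an upsert. A minimal in-memory sketch follows. `RobotNode` and `RememberLog` are hypothetical names, and starting `remember_count` at 1 on the first call is an assumption the diff does not confirm.

```ruby
# Hypothetical in-memory model of one robot_nodes row.
RobotNode = Struct.new(
  :robot_id, :node_id, :first_remembered_at, :last_remembered_at, :remember_count,
  keyword_init: true
)

class RememberLog
  def initialize
    @rows = {} # [robot_id, node_id] => RobotNode
  end

  # Insert the row on first sight, otherwise update the tracking fields.
  def record(robot_id, node_id, now: Time.now)
    row = (@rows[[robot_id, node_id]] ||= RobotNode.new(
      robot_id: robot_id, node_id: node_id,
      first_remembered_at: now, last_remembered_at: now, remember_count: 0
    ))
    row.last_remembered_at = now
    row.remember_count += 1
    row
  end

  # In-memory equivalent of ORDER BY remember_count DESC LIMIT n.
  def most_reinforced(robot_id, limit: 10)
    @rows.values
         .select { |row| row.robot_id == robot_id }
         .sort_by { |row| -row.remember_count }
         .first(limit)
  end
end

log = RememberLog.new
t0  = Time.at(0)
t1  = Time.at(60)
log.record(1, 100, now: t0)
log.record(1, 100, now: t1)
log.record(1, 200, now: t1)
top = log.most_reinforced(1, limit: 1).first
```

Here `top` is the row for node 100: its `first_remembered_at` stays at the initial timestamp while `last_remembered_at` and `remember_count` advance.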
278
156
 
279
157
  ---
280
158
 
281
- ### tags
282
-
283
- The tags table stores unique hierarchical tag names for categorization.
159
+ ## Common Query Patterns
284
160
 
285
- **Purpose**: Provides flexible, hierarchical categorization using colon-separated namespaces (e.g., `database:postgresql:timescaledb`).
161
+ ### Finding Nodes for a Robot
286
162
 
287
163
  ```sql
288
- CREATE TABLE public.tags (
289
- id bigint NOT NULL,
290
- name text NOT NULL,
291
- created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
292
- );
293
-
294
- ALTER TABLE ONLY public.tags ALTER COLUMN id SET DEFAULT nextval('public.tags_id_seq'::regclass);
295
- ALTER TABLE ONLY public.tags ADD CONSTRAINT tags_pkey PRIMARY KEY (id);
164
+ SELECT n.*
165
+ FROM nodes n
166
+ JOIN robot_nodes rn ON n.id = rn.node_id
167
+ WHERE rn.robot_id = $1
168
+ ORDER BY rn.last_remembered_at DESC;
296
169
  ```
297
170
 
298
- **Columns**:
299
-
300
- | Column | Type | Nullable | Default | Description |
301
- |--------|------|----------|---------|-------------|
302
- | `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
303
- | `name` | TEXT | NO | - | Hierarchical tag in format: root:level1:level2 (e.g., database:postgresql:timescaledb) |
304
- | `created_at` | TIMESTAMPTZ | YES | NOW() | When this tag was created |
305
-
306
- **Indexes**:
307
- - `PRIMARY KEY` on `id`
308
- - `idx_tags_name_unique` UNIQUE BTREE on `name`
309
- - `idx_tags_name_pattern` BTREE on `name` with `text_pattern_ops` for pattern matching
-
- **Relationships**:
- - Many tags belong to many nodes through nodes_tags (N:M)
+ ### Finding Robots that Share a Node
 
- **Tag Hierarchy**:
-
- Tags use colon-separated hierarchies for organization:
- - `programming:ruby:gems` - Programming > Ruby > Gems
- - `database:postgresql:extensions` - Database > PostgreSQL > Extensions
- - `ai:llm:embeddings` - AI > LLM > Embeddings
-
- This allows querying by prefix to find all related tags:
  ```sql
- SELECT * FROM tags WHERE name LIKE 'database:%'; -- All database-related tags
- SELECT * FROM tags WHERE name LIKE 'ai:llm:%'; -- All LLM-related tags
+ SELECT r.*
+ FROM robots r
+ JOIN robot_nodes rn ON r.id = rn.robot_id
+ WHERE rn.node_id = $1
+ ORDER BY rn.first_remembered_at;
  ```
 
- ---
-
- ### nodes_tags
-
- The nodes_tags join table implements the many-to-many relationship between nodes and tags.
-
- **Purpose**: Links nodes to tags, allowing each node to have multiple tags and each tag to be applied to multiple nodes.
+ ### Finding Frequently Remembered Content
 
  ```sql
- CREATE TABLE public.nodes_tags (
- id bigint NOT NULL,
- node_id bigint NOT NULL,
- tag_id bigint NOT NULL,
- created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
- );
-
- ALTER TABLE ONLY public.nodes_tags ALTER COLUMN id SET DEFAULT nextval('public.node_tags_id_seq'::regclass);
- ALTER TABLE ONLY public.nodes_tags ADD CONSTRAINT node_tags_pkey PRIMARY KEY (id);
-
- ALTER TABLE ONLY public.nodes_tags
- ADD CONSTRAINT fk_rails_b0b726ecf8 FOREIGN KEY (node_id) REFERENCES public.nodes(id) ON DELETE CASCADE;
- ALTER TABLE ONLY public.nodes_tags
- ADD CONSTRAINT fk_rails_eccc99cec5 FOREIGN KEY (tag_id) REFERENCES public.tags(id) ON DELETE CASCADE;
+ SELECT n.*, rn.remember_count, rn.first_remembered_at, rn.last_remembered_at
+ FROM nodes n
+ JOIN robot_nodes rn ON n.id = rn.node_id
+ WHERE rn.robot_id = $1
+ ORDER BY rn.remember_count DESC
+ LIMIT 10;
  ```
 
- **Columns**:
-
- | Column | Type | Nullable | Default | Description |
- |--------|------|----------|---------|-------------|
- | `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
- | `node_id` | BIGINT | NO | - | ID of the node being tagged |
- | `tag_id` | BIGINT | NO | - | ID of the tag being applied |
- | `created_at` | TIMESTAMPTZ | YES | NOW() | When this association was created |
-
- **Indexes**:
- - `PRIMARY KEY` on `id`
- - `idx_node_tags_unique` UNIQUE BTREE on `(node_id, tag_id)` - Prevents duplicate associations
- - `idx_node_tags_node_id` BTREE on `node_id` - Fast lookups of tags for a node
- - `idx_node_tags_tag_id` BTREE on `tag_id` - Fast lookups of nodes for a tag
-
- **Foreign Keys**:
- - `node_id` references `nodes(id)` ON DELETE CASCADE
- - `tag_id` references `tags(id)` ON DELETE CASCADE
-
- **Cascade Behavior**:
- - When a node is deleted, all its tag associations are automatically removed
- - When a tag is deleted, all associations to that tag are automatically removed
- - The join table ensures referential integrity between nodes and tags
-
- ---
-
- ## Common Query Patterns
-
  ### Finding Tags for a Node
 
  ```sql
  SELECT t.name
  FROM tags t
- JOIN nodes_tags nt ON t.id = nt.tag_id
+ JOIN node_tags nt ON t.id = nt.tag_id
  WHERE nt.node_id = $1
  ORDER BY t.name;
  ```
@@ -392,7 +204,7 @@ ORDER BY t.name;
  ```sql
  SELECT n.*
  FROM nodes n
- JOIN nodes_tags nt ON n.id = nt.node_id
+ JOIN node_tags nt ON n.id = nt.node_id
  JOIN tags t ON nt.tag_id = t.id
  WHERE t.name = 'database:postgresql'
  ORDER BY n.created_at DESC;
@@ -403,7 +215,7 @@ ORDER BY n.created_at DESC;
  ```sql
  SELECT n.*
  FROM nodes n
- JOIN nodes_tags nt ON n.id = nt.node_id
+ JOIN node_tags nt ON n.id = nt.node_id
  JOIN tags t ON nt.tag_id = t.id
  WHERE t.name LIKE 'ai:llm:%'
  ORDER BY n.created_at DESC;
@@ -417,8 +229,8 @@ SELECT
  t2.name AS topic2,
  COUNT(DISTINCT nt1.node_id) AS shared_nodes
  FROM tags t1
- JOIN nodes_tags nt1 ON t1.id = nt1.tag_id
- JOIN nodes_tags nt2 ON nt1.node_id = nt2.node_id
+ JOIN node_tags nt1 ON t1.id = nt1.tag_id
+ JOIN node_tags nt2 ON nt1.node_id = nt2.node_id
  JOIN tags t2 ON nt2.tag_id = t2.id
  WHERE t1.name < t2.name
  GROUP BY t1.name, t2.name
@@ -431,7 +243,7 @@ ORDER BY shared_nodes DESC;
  ```sql
  SELECT n.*, n.embedding <=> $1::vector AS distance
  FROM nodes n
- JOIN nodes_tags nt ON n.id = nt.node_id
+ JOIN node_tags nt ON n.id = nt.node_id
  JOIN tags t ON nt.tag_id = t.id
  WHERE t.name = 'programming:ruby'
  AND n.embedding IS NOT NULL
@@ -444,7 +256,7 @@ LIMIT 10;
  ```sql
  SELECT n.*, ts_rank(to_tsvector('english', n.content), query) AS rank
  FROM nodes n
- JOIN nodes_tags nt ON n.id = nt.node_id
+ JOIN node_tags nt ON n.id = nt.node_id
  JOIN tags t ON nt.tag_id = t.id,
  to_tsquery('english', 'database & optimization') query
  WHERE to_tsvector('english', n.content) @@ query
@@ -453,6 +265,17 @@ ORDER BY rank DESC
  LIMIT 20;
  ```
 
+ ### Finding Content Shared by Multiple Robots
+
+ ```sql
+ SELECT n.*, COUNT(DISTINCT rn.robot_id) AS robot_count
+ FROM nodes n
+ JOIN robot_nodes rn ON n.id = rn.node_id
+ GROUP BY n.id
+ HAVING COUNT(DISTINCT rn.robot_id) > 1
+ ORDER BY robot_count DESC;
+ ```
+
  ---
 
  ## Database Optimization
@@ -518,6 +341,8 @@ The schema is managed through ActiveRecord migrations located in `db/migrate/`:
  1. `20250101000001_create_robots.rb` - Creates robots table
  2. `20250101000002_create_nodes.rb` - Creates nodes table with all indexes
  3. `20250101000005_create_tags.rb` - Creates tags and nodes_tags tables
+ 4. `20251128000002_create_file_sources.rb` - Creates file_sources table for document tracking
+ 5. `20251128000003_add_source_to_nodes.rb` - Adds source_id and chunk_position to nodes
 
  To apply migrations:
  ```bash
@@ -147,15 +147,7 @@ ruby -r debug test/htm_test.rb
 
  ### Test File Layout
 
- ```
- test/
- ├── test_helper.rb # Shared test configuration
- ├── htm_test.rb # Unit tests for HTM class
- ├── embedding_service_test.rb # Unit tests for EmbeddingService
- ├── integration_test.rb # Integration tests
- └── fixtures/ # Test data (future)
- └── sample_memories.json
- ```
+ ![Test Directory Structure](../assets/images/test-directory-structure.svg)
 
  ### Test File Template
 
@@ -0,0 +1,47 @@
+ # Getting Started
+
+ Welcome to HTM (Hierarchical Temporary Memory)! This section will help you get up and running quickly.
+
+ ## Overview
+
+ HTM provides intelligent memory management for LLM-based applications (robots) with a two-tier architecture:
+
+ - **Long-term Memory**: Durable PostgreSQL storage with vector embeddings for semantic search
+ - **Working Memory**: Token-limited in-memory context for immediate LLM use
+
+ ## What You'll Learn
+
+ <div class="grid cards" markdown>
+
+ - :material-download:{ .lg .middle } **Installation**
+
+ ---
+
+ Set up HTM in your Ruby project with all required dependencies including PostgreSQL, pgvector, and Ollama.
+
+ [:octicons-arrow-right-24: Install HTM](installation.md)
+
+ - :material-rocket-launch:{ .lg .middle } **Quick Start**
+
+ ---
+
+ Build your first memory-enabled robot in minutes with practical examples and code snippets.
+
+ [:octicons-arrow-right-24: Get started](quick-start.md)
+
+ </div>
+
+ ## Prerequisites
+
+ Before installing HTM, ensure you have:
+
+ - **Ruby 3.1+** - HTM uses modern Ruby features
+ - **PostgreSQL 14+** - With pgvector and pg_trgm extensions
+ - **Ollama** (optional) - For local embedding generation
+
+ ## Next Steps
+
+ 1. **[Install HTM](installation.md)** - Set up the gem and database
+ 2. **[Quick Start](quick-start.md)** - Create your first memory-enabled robot
+ 3. **[Architecture Overview](../architecture/overview.md)** - Understand how HTM works
+ 4. **[Guides](../guides/index.md)** - Deep dive into specific features
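
The Overview in the file added above describes working memory as a token-limited in-memory context. A minimal Ruby sketch of that idea follows. It is an illustration only, not HTM's implementation: the class name `SketchWorkingMemory`, its methods, the four-characters-per-token estimate, and the oldest-first eviction policy are all assumptions made for this example.

```ruby
# Hypothetical sketch of a token-limited working memory.
# Not HTM's actual API; names and policies are assumptions.
class SketchWorkingMemory
  attr_reader :max_tokens

  def initialize(max_tokens:)
    @max_tokens = max_tokens
    @entries = {} # Ruby hashes keep insertion order, so the oldest entry is first
  end

  # Rough estimate: about four characters per token (an assumption).
  def estimate_tokens(text)
    (text.length / 4.0).ceil
  end

  # Total estimated tokens currently held.
  def token_count
    @entries.values.sum { |entry| entry[:tokens] }
  end

  # Adds content under key, evicting the oldest entries until it fits.
  # Returns the list of evicted keys.
  def add(key, content)
    tokens = estimate_tokens(content)
    raise ArgumentError, "content exceeds max_tokens" if tokens > max_tokens

    @entries.delete(key) # re-adding a key moves it to the newest position
    evicted = []
    while token_count + tokens > max_tokens
      oldest_key = @entries.keys.first
      @entries.delete(oldest_key)
      evicted << oldest_key
    end
    @entries[key] = { content: content, tokens: tokens }
    evicted
  end

  def keys
    @entries.keys
  end
end
```

In a two-tier design like the one the Overview outlines, an entry evicted from such a structure would presumably remain available in durable long-term storage; this sketch does not model that tier.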
@@ -453,8 +453,8 @@ psql $HTM_DBURL -c "
  Now that HTM is installed, you're ready to start building:
 
  1. **[Quick Start Guide](quick-start.md)**: Build your first HTM application in 5 minutes
- 2. **[User Guide](guides/getting-started.md)**: Learn all HTM features in depth
- 3. **[API Reference](api/htm.md)**: Explore the complete API documentation
+ 2. **[User Guide](../guides/getting-started.md)**: Learn all HTM features in depth
+ 3. **[API Reference](../api/htm.md)**: Explore the complete API documentation
  4. **[Examples](https://github.com/madbomber/htm/tree/main/examples)**: See real-world usage examples
 
  ## Getting Help