htm 0.0.18 → 0.0.30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (216)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +119 -1
  3. data/README.md +12 -0
  4. data/Rakefile +104 -18
  5. data/db/migrate/00001_enable_extensions.rb +9 -5
  6. data/db/migrate/00002_create_robots.rb +18 -6
  7. data/db/migrate/00003_create_file_sources.rb +30 -17
  8. data/db/migrate/00004_create_nodes.rb +60 -48
  9. data/db/migrate/00005_create_tags.rb +24 -12
  10. data/db/migrate/00006_create_node_tags.rb +28 -13
  11. data/db/migrate/00007_create_robot_nodes.rb +40 -26
  12. data/db/schema.sql +17 -1
  13. data/db/seeds.rb +34 -34
  14. data/docs/api/embedding-service.md +140 -110
  15. data/docs/api/yard/HTM/ActiveRecordConfig.md +6 -0
  16. data/docs/api/yard/HTM/Config.md +173 -0
  17. data/docs/api/yard/HTM/ConfigSection.md +28 -0
  18. data/docs/api/yard/HTM/Database.md +1 -1
  19. data/docs/api/yard/HTM/Railtie.md +2 -2
  20. data/docs/api/yard/HTM.md +0 -57
  21. data/docs/api/yard/index.csv +76 -61
  22. data/docs/api/yard-reference.md +2 -1
  23. data/docs/architecture/adrs/003-ollama-embeddings.md +45 -36
  24. data/docs/architecture/adrs/004-hive-mind.md +1 -1
  25. data/docs/architecture/adrs/008-robot-identification.md +1 -1
  26. data/docs/architecture/index.md +11 -9
  27. data/docs/architecture/overview.md +11 -7
  28. data/docs/assets/images/balanced-strategy-decay.svg +41 -0
  29. data/docs/assets/images/class-hierarchy.svg +1 -1
  30. data/docs/assets/images/eviction-priority.svg +43 -0
  31. data/docs/assets/images/exception-hierarchy.svg +2 -2
  32. data/docs/assets/images/hive-mind-shared-memory.svg +52 -0
  33. data/docs/assets/images/htm-architecture-overview.svg +3 -3
  34. data/docs/assets/images/htm-core-components.svg +4 -4
  35. data/docs/assets/images/htm-layered-architecture.svg +1 -1
  36. data/docs/assets/images/htm-memory-addition-flow.svg +2 -2
  37. data/docs/assets/images/htm-memory-recall-flow.svg +2 -2
  38. data/docs/assets/images/memory-topology.svg +53 -0
  39. data/docs/assets/images/two-tier-memory-architecture.svg +55 -0
  40. data/docs/database/naming-convention.md +244 -0
  41. data/docs/database_rake_tasks.md +31 -0
  42. data/docs/development/rake-tasks.md +80 -35
  43. data/docs/development/setup.md +76 -44
  44. data/docs/examples/basic-usage.md +133 -0
  45. data/docs/examples/config-files.md +170 -0
  46. data/docs/examples/file-loading.md +208 -0
  47. data/docs/examples/index.md +116 -0
  48. data/docs/examples/llm-configuration.md +168 -0
  49. data/docs/examples/mcp-client.md +172 -0
  50. data/docs/examples/rails-integration.md +173 -0
  51. data/docs/examples/robot-groups.md +210 -0
  52. data/docs/examples/sinatra-integration.md +218 -0
  53. data/docs/examples/standalone-app.md +216 -0
  54. data/docs/examples/telemetry.md +224 -0
  55. data/docs/examples/timeframes.md +143 -0
  56. data/docs/getting-started/installation.md +97 -40
  57. data/docs/getting-started/quick-start.md +28 -11
  58. data/docs/guides/configuration.md +515 -0
  59. data/docs/guides/file-loading.md +322 -0
  60. data/docs/guides/getting-started.md +40 -9
  61. data/docs/guides/index.md +3 -3
  62. data/docs/guides/mcp-server.md +100 -13
  63. data/docs/guides/propositions.md +264 -0
  64. data/docs/guides/recalling-memories.md +4 -4
  65. data/docs/guides/search-strategies.md +3 -3
  66. data/docs/guides/tags.md +318 -0
  67. data/docs/guides/telemetry.md +229 -0
  68. data/docs/index.md +8 -16
  69. data/docs/{architecture → robots}/hive-mind.md +8 -111
  70. data/docs/robots/index.md +73 -0
  71. data/docs/{guides → robots}/multi-robot.md +3 -3
  72. data/docs/{guides → robots}/robot-groups.md +8 -7
  73. data/docs/{architecture → robots}/two-tier-memory.md +13 -149
  74. data/docs/robots/why-robots.md +85 -0
  75. data/examples/.envrc +6 -0
  76. data/examples/.gitignore +2 -0
  77. data/examples/00_create_examples_db.rb +94 -0
  78. data/examples/{basic_usage.rb → 01_basic_usage.rb} +12 -16
  79. data/examples/{custom_llm_configuration.rb → 03_custom_llm_configuration.rb} +13 -3
  80. data/examples/{file_loader_usage.rb → 04_file_loader_usage.rb} +11 -14
  81. data/examples/{timeframe_demo.rb → 05_timeframe_demo.rb} +10 -3
  82. data/examples/{example_app → 06_example_app}/app.rb +15 -15
  83. data/examples/{cli_app → 07_cli_app}/htm_cli.rb +15 -22
  84. data/examples/08_sinatra_app/Gemfile.lock +241 -0
  85. data/examples/{sinatra_app → 08_sinatra_app}/app.rb +19 -18
  86. data/examples/{mcp_client.rb → 09_mcp_client.rb} +5 -8
  87. data/examples/{telemetry → 10_telemetry}/SETUP_README.md +1 -1
  88. data/examples/{telemetry → 10_telemetry}/demo.rb +14 -10
  89. data/examples/11_robot_groups/README.md +335 -0
  90. data/examples/{robot_groups → 11_robot_groups/lib}/robot_worker.rb +17 -3
  91. data/examples/{robot_groups → 11_robot_groups}/multi_process.rb +9 -9
  92. data/examples/{robot_groups → 11_robot_groups}/same_process.rb +9 -12
  93. data/examples/{rails_app → 12_rails_app}/Gemfile +3 -0
  94. data/examples/{rails_app → 12_rails_app}/Gemfile.lock +87 -58
  95. data/examples/{rails_app → 12_rails_app}/app/controllers/dashboard_controller.rb +10 -6
  96. data/examples/{rails_app → 12_rails_app}/app/controllers/files_controller.rb +5 -5
  97. data/examples/{rails_app → 12_rails_app}/app/controllers/memories_controller.rb +11 -7
  98. data/examples/{rails_app → 12_rails_app}/app/controllers/robots_controller.rb +8 -8
  99. data/examples/12_rails_app/app/controllers/tags_controller.rb +36 -0
  100. data/examples/{rails_app → 12_rails_app}/app/views/dashboard/index.html.erb +2 -2
  101. data/examples/{rails_app → 12_rails_app}/app/views/files/new.html.erb +5 -2
  102. data/examples/{rails_app → 12_rails_app}/app/views/memories/_memory_card.html.erb +3 -3
  103. data/examples/{rails_app → 12_rails_app}/app/views/memories/deleted.html.erb +3 -3
  104. data/examples/{rails_app → 12_rails_app}/app/views/memories/edit.html.erb +3 -3
  105. data/examples/{rails_app → 12_rails_app}/app/views/memories/show.html.erb +4 -4
  106. data/examples/{rails_app → 12_rails_app}/app/views/robots/index.html.erb +2 -2
  107. data/examples/{rails_app → 12_rails_app}/app/views/robots/show.html.erb +4 -4
  108. data/examples/{rails_app → 12_rails_app}/app/views/search/index.html.erb +1 -1
  109. data/examples/{rails_app → 12_rails_app}/app/views/tags/index.html.erb +2 -2
  110. data/examples/{rails_app → 12_rails_app}/app/views/tags/show.html.erb +1 -1
  111. data/examples/12_rails_app/config/initializers/htm.rb +7 -0
  112. data/examples/12_rails_app/config/initializers/rack.rb +5 -0
  113. data/examples/README.md +230 -211
  114. data/examples/examples_helper.rb +138 -0
  115. data/lib/htm/config/builder.rb +167 -0
  116. data/lib/htm/config/database.rb +317 -0
  117. data/lib/htm/config/defaults.yml +41 -13
  118. data/lib/htm/config/section.rb +74 -0
  119. data/lib/htm/config/validator.rb +83 -0
  120. data/lib/htm/config.rb +65 -361
  121. data/lib/htm/database.rb +85 -127
  122. data/lib/htm/errors.rb +14 -0
  123. data/lib/htm/integrations/sinatra.rb +13 -44
  124. data/lib/htm/job_adapter.rb +75 -1
  125. data/lib/htm/jobs/generate_embedding_job.rb +3 -4
  126. data/lib/htm/jobs/generate_propositions_job.rb +4 -5
  127. data/lib/htm/jobs/generate_tags_job.rb +16 -15
  128. data/lib/htm/loaders/defaults_loader.rb +23 -0
  129. data/lib/htm/loaders/markdown_loader.rb +17 -15
  130. data/lib/htm/loaders/xdg_config_loader.rb +9 -9
  131. data/lib/htm/long_term_memory/fulltext_search.rb +14 -14
  132. data/lib/htm/long_term_memory/hybrid_search.rb +396 -229
  133. data/lib/htm/long_term_memory/node_operations.rb +24 -23
  134. data/lib/htm/long_term_memory/relevance_scorer.rb +23 -20
  135. data/lib/htm/long_term_memory/robot_operations.rb +4 -4
  136. data/lib/htm/long_term_memory/tag_operations.rb +91 -77
  137. data/lib/htm/long_term_memory/vector_search.rb +4 -5
  138. data/lib/htm/long_term_memory.rb +13 -13
  139. data/lib/htm/mcp/cli.rb +115 -8
  140. data/lib/htm/mcp/resources.rb +4 -3
  141. data/lib/htm/mcp/server.rb +5 -4
  142. data/lib/htm/mcp/tools.rb +37 -28
  143. data/lib/htm/migration.rb +72 -0
  144. data/lib/htm/models/file_source.rb +52 -31
  145. data/lib/htm/models/node.rb +224 -108
  146. data/lib/htm/models/node_tag.rb +49 -28
  147. data/lib/htm/models/robot.rb +38 -27
  148. data/lib/htm/models/robot_node.rb +63 -35
  149. data/lib/htm/models/tag.rb +126 -123
  150. data/lib/htm/observability.rb +45 -41
  151. data/lib/htm/proposition_service.rb +76 -7
  152. data/lib/htm/railtie.rb +2 -2
  153. data/lib/htm/robot_group.rb +30 -18
  154. data/lib/htm/sequel_config.rb +215 -0
  155. data/lib/htm/sql_builder.rb +14 -16
  156. data/lib/htm/tag_service.rb +78 -0
  157. data/lib/htm/tasks.rb +3 -0
  158. data/lib/htm/version.rb +1 -1
  159. data/lib/htm/workflows/remember_workflow.rb +213 -0
  160. data/lib/htm.rb +27 -22
  161. data/lib/tasks/db.rake +0 -2
  162. data/lib/tasks/doc.rake +2 -2
  163. data/lib/tasks/files.rake +11 -18
  164. data/lib/tasks/htm.rake +190 -62
  165. data/lib/tasks/jobs.rake +179 -54
  166. data/lib/tasks/tags.rake +8 -13
  167. data/mkdocs.yml +33 -8
  168. data/scripts/backfill_parent_tags.rb +376 -0
  169. data/scripts/normalize_plural_tags.rb +335 -0
  170. metadata +168 -86
  171. data/docs/api/yard/HTM/Configuration.md +0 -240
  172. data/docs/telemetry.md +0 -391
  173. data/examples/rails_app/app/controllers/tags_controller.rb +0 -30
  174. data/examples/sinatra_app/Gemfile.lock +0 -166
  175. data/lib/htm/active_record_config.rb +0 -104
  176. /data/examples/{config_file_example → 02_config_file_example}/README.md +0 -0
  177. /data/examples/{config_file_example → 02_config_file_example}/config/htm.local.yml +0 -0
  178. /data/examples/{config_file_example → 02_config_file_example}/custom_config.yml +0 -0
  179. /data/examples/{config_file_example → 02_config_file_example}/show_config.rb +0 -0
  180. /data/examples/{example_app → 06_example_app}/Rakefile +0 -0
  181. /data/examples/{cli_app → 07_cli_app}/README.md +0 -0
  182. /data/examples/{sinatra_app → 08_sinatra_app}/Gemfile +0 -0
  183. /data/examples/{telemetry → 10_telemetry}/README.md +0 -0
  184. /data/examples/{telemetry → 10_telemetry}/grafana/dashboards/htm-metrics.json +0 -0
  185. /data/examples/{rails_app → 12_rails_app}/.gitignore +0 -0
  186. /data/examples/{rails_app → 12_rails_app}/Procfile.dev +0 -0
  187. /data/examples/{rails_app → 12_rails_app}/README.md +0 -0
  188. /data/examples/{rails_app → 12_rails_app}/Rakefile +0 -0
  189. /data/examples/{rails_app → 12_rails_app}/app/assets/stylesheets/application.css +0 -0
  190. /data/examples/{rails_app → 12_rails_app}/app/assets/stylesheets/inter-font.css +0 -0
  191. /data/examples/{rails_app → 12_rails_app}/app/controllers/application_controller.rb +0 -0
  192. /data/examples/{rails_app → 12_rails_app}/app/controllers/search_controller.rb +0 -0
  193. /data/examples/{rails_app → 12_rails_app}/app/javascript/application.js +0 -0
  194. /data/examples/{rails_app → 12_rails_app}/app/javascript/controllers/application.js +0 -0
  195. /data/examples/{rails_app → 12_rails_app}/app/javascript/controllers/index.js +0 -0
  196. /data/examples/{rails_app → 12_rails_app}/app/views/files/index.html.erb +0 -0
  197. /data/examples/{rails_app → 12_rails_app}/app/views/files/show.html.erb +0 -0
  198. /data/examples/{rails_app → 12_rails_app}/app/views/layouts/application.html.erb +0 -0
  199. /data/examples/{rails_app → 12_rails_app}/app/views/memories/index.html.erb +0 -0
  200. /data/examples/{rails_app → 12_rails_app}/app/views/memories/new.html.erb +0 -0
  201. /data/examples/{rails_app → 12_rails_app}/app/views/robots/new.html.erb +0 -0
  202. /data/examples/{rails_app → 12_rails_app}/app/views/shared/_navbar.html.erb +0 -0
  203. /data/examples/{rails_app → 12_rails_app}/app/views/shared/_stat_card.html.erb +0 -0
  204. /data/examples/{rails_app → 12_rails_app}/bin/dev +0 -0
  205. /data/examples/{rails_app → 12_rails_app}/bin/rails +0 -0
  206. /data/examples/{rails_app → 12_rails_app}/bin/rake +0 -0
  207. /data/examples/{rails_app → 12_rails_app}/config/application.rb +0 -0
  208. /data/examples/{rails_app → 12_rails_app}/config/boot.rb +0 -0
  209. /data/examples/{rails_app → 12_rails_app}/config/database.yml +0 -0
  210. /data/examples/{rails_app → 12_rails_app}/config/environment.rb +0 -0
  211. /data/examples/{rails_app → 12_rails_app}/config/importmap.rb +0 -0
  212. /data/examples/{rails_app → 12_rails_app}/config/routes.rb +0 -0
  213. /data/examples/{rails_app → 12_rails_app}/config/tailwind.config.js +0 -0
  214. /data/examples/{rails_app → 12_rails_app}/config.ru +0 -0
  215. /data/examples/{rails_app → 12_rails_app}/log/.keep +0 -0
  216. /data/examples/{rails_app → 12_rails_app}/tmp/local_secret.txt +0 -0
data/docs/guides/file-loading.md ADDED
@@ -0,0 +1,322 @@
1
+ # File Loading
2
+
3
+ HTM can load text-based files (currently markdown) into long-term memory with automatic chunking, source tracking, and re-sync support. This is ideal for building knowledge bases from documentation, notes, or any text content.
4
+
5
+ ## Overview
6
+
7
+ The file loading system provides:
8
+
9
+ - **Automatic chunking**: Large files are split into semantically aware chunks
10
+ - **YAML frontmatter extraction**: Metadata from file headers is preserved
11
+ - **Source tracking**: Files are tracked for re-sync when content changes
12
+ - **Duplicate detection**: Content hashing prevents duplicate chunks
13
+ - **Soft delete**: Unloading files uses soft delete for recovery
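The duplicate-detection bullet can be illustrated with a minimal sketch that hashes each chunk with SHA-256 and keeps only the first occurrence of each digest. The helper name `unique_chunks` and the whitespace normalization before hashing are assumptions for illustration; the guide does not say how HTM normalizes text, if at all.

```ruby
require "digest"

# Hypothetical helper: keep the first occurrence of each chunk,
# identified by a SHA-256 digest of its whitespace-normalized text.
def unique_chunks(chunks)
  seen = {}
  chunks.each_with_object([]) do |chunk, kept|
    normalized = chunk.strip.gsub(/\s+/, " ")
    digest = Digest::SHA256.hexdigest(normalized)
    next if seen.key?(digest)

    seen[digest] = true
    kept << chunk
  end
end

chunks = ["# Setup\nInstall Ruby.", "# Setup\n Install   Ruby.", "# Usage\nRun it."]
unique_chunks(chunks).size # => 2 (the second chunk differs only in whitespace)
```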
14
+
15
+ ## Quick Start
16
+
17
+ ```ruby
18
+ require 'htm'
19
+
20
+ htm = HTM.new(robot_name: "Document Loader")
21
+
22
+ # Load a single markdown file
23
+ result = htm.load_file("docs/guide.md")
24
+ # => { file_source_id: 1, chunks_created: 5, chunks_updated: 0, skipped: false }
25
+
26
+ # Load all markdown files from a directory
27
+ results = htm.load_directory("docs/", pattern: "**/*.md")
28
+ # => [{ file_path: "docs/guide.md", ... }, { file_path: "docs/api.md", ... }]
29
+
30
+ # Query nodes from a specific file
31
+ nodes = htm.nodes_from_file("docs/guide.md")
32
+
33
+ # Unload a file (soft deletes chunks)
34
+ htm.unload_file("docs/guide.md")
35
+ ```
36
+
37
+ ## API Reference
38
+
39
+ ### load_file(path, force: false)
40
+
41
+ Loads a single file into long-term memory.
42
+
43
+ | Parameter | Type | Default | Description |
44
+ |-----------|------|---------|-------------|
45
+ | `path` | String | required | Path to the file |
46
+ | `force` | Boolean | `false` | Force reload even if file unchanged |
47
+
48
+ **Returns:** Hash with keys:
49
+ - `file_source_id`: ID of the FileSource record
50
+ - `chunks_created`: Number of new chunks created
51
+ - `chunks_updated`: Number of existing chunks updated
52
+ - `chunks_deleted`: Number of chunks removed
53
+ - `skipped`: Whether file was skipped (unchanged)
54
+
55
+ ```ruby
56
+ # Normal load - skips unchanged files
57
+ result = htm.load_file("docs/guide.md")
58
+
59
+ # Force reload even if file hasn't changed
60
+ result = htm.load_file("docs/guide.md", force: true)
61
+ ```
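Because `load_file` returns a plain hash and `load_directory` returns an array of them, a caller can roll per-file results into one summary. This sketch uses made-up sample hashes shaped like the documented keys; `summarize_loads` is a hypothetical helper, and treating a missing count key as zero is an assumption.

```ruby
# Hypothetical helper: aggregate result hashes shaped like the ones above.
def summarize_loads(results)
  totals = Hash.new(0)
  results.each do |r|
    if r[:skipped]
      totals[:skipped] += 1
      next
    end
    %i[chunks_created chunks_updated chunks_deleted].each do |key|
      totals[key] += r.fetch(key, 0) # absent keys count as zero
    end
    totals[:loaded] += 1
  end
  totals
end

# Sample data only; not real HTM output.
results = [
  { file_source_id: 1, chunks_created: 5, chunks_updated: 0, chunks_deleted: 0, skipped: false },
  { file_source_id: 2, skipped: true },
  { file_source_id: 3, chunks_created: 1, chunks_updated: 2, skipped: false }
]

summary = summarize_loads(results)
puts "loaded=#{summary[:loaded]} skipped=#{summary[:skipped]} created=#{summary[:chunks_created]}"
# prints "loaded=2 skipped=1 created=6"
```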
62
+
63
+ ### load_directory(path, pattern: "**/*.md", force: false)
64
+
65
+ Loads all matching files from a directory.
66
+
67
+ | Parameter | Type | Default | Description |
68
+ |-----------|------|---------|-------------|
69
+ | `path` | String | required | Directory path |
70
+ | `pattern` | String | `"**/*.md"` | Glob pattern for files |
71
+ | `force` | Boolean | `false` | Force reload all files |
72
+
73
+ **Returns:** Array of result hashes (one per file)
74
+
75
+ ```ruby
76
+ # Load all markdown files
77
+ results = htm.load_directory("docs/")
78
+
79
+ # Load only top-level markdown files
80
+ results = htm.load_directory("docs/", pattern: "*.md")
81
+
82
+ # Load specific subdirectory
83
+ results = htm.load_directory("docs/guides/", pattern: "**/*.md")
84
+ ```
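The `pattern` argument is a glob. Assuming HTM resolves it like Ruby's `Dir.glob` relative to `path` (the guide does not state this explicitly), the difference between `*.md` and `**/*.md` can be checked with the standard library alone. Note that `**/` also matches zero directories, so the recursive pattern includes top-level files.

```ruby
require "tmpdir"
require "fileutils"

# Build a throwaway tree with one top-level and one nested file,
# then compare what each pattern matches.
def glob_demo
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, "guides"))
    File.write(File.join(root, "index.md"), "# Index")
    File.write(File.join(root, "guides", "setup.md"), "# Setup")

    [Dir.glob("*.md", base: root).sort, Dir.glob("**/*.md", base: root).sort]
  end
end

top_level, recursive = glob_demo
puts "*.md    -> #{top_level.inspect}" # ["index.md"]
puts "**/*.md -> #{recursive.inspect}" # ["guides/setup.md", "index.md"]
```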
85
+
86
+ ### nodes_from_file(path)
87
+
88
+ Returns all nodes loaded from a specific file.
89
+
90
+ ```ruby
91
+ nodes = htm.nodes_from_file("docs/guide.md")
92
+ nodes.each do |node|
93
+ puts "#{node.id}: #{node.content[0..50]}..."
94
+ end
95
+ ```
96
+
97
+ ### unload_file(path)
98
+
99
+ Soft deletes all nodes from a file and removes the file source.
100
+
101
+ ```ruby
102
+ count = htm.unload_file("docs/guide.md")
103
+ puts "Removed #{count} chunks"
104
+ ```
105
+
106
+ ## YAML Frontmatter
107
+
108
+ Files with YAML frontmatter have their metadata extracted and stored:
109
+
110
+ ```markdown
111
+ ---
112
+ title: PostgreSQL Guide
113
+ author: HTM Team
114
+ tags:
115
+ - database
116
+ - postgresql
117
+ version: 1.2
118
+ ---
119
+
120
+ # PostgreSQL Guide
121
+
122
+ Content starts here...
123
+ ```
124
+
125
+ Access frontmatter via the FileSource model:
126
+
127
+ ```ruby
128
+ source = HTM::Models::FileSource.find_by(file_path: "docs/guide.md")
129
+ source.title # => "PostgreSQL Guide"
130
+ source.author # => "HTM Team"
131
+ source.frontmatter_tags # => ["database", "postgresql"]
132
+ source.frontmatter # => { "title" => "...", "author" => "...", ... }
133
+ ```
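A minimal sketch of frontmatter extraction, assuming the common convention of a `---`-delimited YAML block at the very top of the file. `split_frontmatter` is a hypothetical helper; HTM's `MarkdownLoader` may differ in details such as error handling for malformed YAML.

```ruby
require "yaml"

# Opening "---" line, YAML body (lazy), closing "---" line.
FRONTMATTER = /\A---\s*\n(.*?)\n---\s*\n/m

# Returns [metadata_hash, body]; files without frontmatter get an empty hash.
def split_frontmatter(text)
  match = FRONTMATTER.match(text)
  return [{}, text] unless match

  metadata = YAML.safe_load(match[1]) || {}
  [metadata, match.post_match]
end

doc = <<~MD
  ---
  title: PostgreSQL Guide
  author: HTM Team
  tags:
    - database
    - postgresql
  version: 1.2
  ---

  # PostgreSQL Guide
MD

meta, body = split_frontmatter(doc)
meta["title"] # => "PostgreSQL Guide"
meta["tags"]  # => ["database", "postgresql"]
```

`YAML.safe_load` parses `version: 1.2` as a Float, so quote such values in frontmatter if they must stay strings.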
134
+
135
+ ## Chunking Strategy
136
+
137
+ HTM uses the [Baran gem](https://rubygems.org/gems/baran) with `MarkdownSplitter` for chunking that respects markdown structure:
138
+
139
+ - **Headers**: Chunks break at header boundaries
140
+ - **Code blocks**: Code blocks are kept intact
141
+ - **Horizontal rules**: Natural section breaks
142
+ - **Configurable size**: Control chunk size and overlap
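HTM delegates chunking to Baran's `MarkdownSplitter`. The sketch below is not that library's algorithm but a simplified stand-in showing how header boundaries, chunk size, and overlap interact: it breaks before each ATX header, then cuts oversized sections into character windows that share `overlap` characters. Unlike the splitter described above, it does not keep code blocks intact, so a `#` comment line inside a fenced block would wrongly start a new section.

```ruby
# Simplified stand-in for a markdown-aware splitter (not Baran's algorithm).
def split_markdown(text, chunk_size:, overlap:)
  raise ArgumentError, "overlap must be smaller than chunk_size" unless overlap < chunk_size

  # Break before every ATX header ("#", "##", ...) at the start of a line.
  sections = text.split(/^(?=#+\s)/).map(&:strip).reject(&:empty?)

  sections.flat_map do |section|
    next [section] if section.length <= chunk_size

    # Slide a fixed-size window; consecutive windows share `overlap` chars.
    step = chunk_size - overlap
    windows = []
    start = 0
    loop do
      windows << section[start, chunk_size]
      break if start + chunk_size >= section.length

      start += step
    end
    windows
  end
end

doc = "# Intro\nShort intro.\n\n## Details\n" + ("x" * 30)
chunks = split_markdown(doc, chunk_size: 20, overlap: 5)
chunks.size # => 4 (one short section, three windows for the long one)
```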
143
+
144
+ ### Configuration
145
+
146
+ ```ruby
147
+ # Global configuration
148
+ HTM.configure do |config|
149
+ config.chunk_size = 1024 # Characters per chunk (default: 1024)
150
+ config.chunk_overlap = 64 # Overlap between chunks (default: 64)
151
+ end
152
+
153
+ # Or via environment variables
154
+ # HTM_CHUNK_SIZE=512
155
+ # HTM_CHUNK_OVERLAP=50
156
+ ```
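The comments above name `HTM_CHUNK_SIZE` and `HTM_CHUNK_OVERLAP` as environment overrides. A hedged sketch of reading them with the documented defaults (1024 and 64) plus a sanity check that overlap stays below size; `chunking_settings` is hypothetical, and HTM's own config loader may apply different precedence or validation.

```ruby
# Hypothetical helper: read chunking settings from an ENV-like hash.
def chunking_settings(env = ENV)
  size    = Integer(env.fetch("HTM_CHUNK_SIZE", "1024")) # Integer() rejects non-numeric text
  overlap = Integer(env.fetch("HTM_CHUNK_OVERLAP", "64"))

  raise ArgumentError, "HTM_CHUNK_SIZE must be positive" unless size.positive?
  unless (0...size).cover?(overlap)
    raise ArgumentError, "HTM_CHUNK_OVERLAP must be >= 0 and < HTM_CHUNK_SIZE"
  end

  { chunk_size: size, chunk_overlap: overlap }
end

defaults = chunking_settings({}) # falls back to 1024 / 64
tuned = chunking_settings({ "HTM_CHUNK_SIZE" => "512", "HTM_CHUNK_OVERLAP" => "50" })
```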
157
+
158
+ ### Per-Loader Configuration
159
+
160
+ ```ruby
161
+ loader = HTM::Loaders::MarkdownLoader.new(
162
+ htm,
163
+ chunk_size: 512,
164
+ chunk_overlap: 50
165
+ )
166
+ loader.load("docs/guide.md")
167
+ ```
168
+
169
+ ## Re-Sync Behavior
170
+
171
+ The file loading system tracks file modification times for efficient re-syncing:
172
+
173
+ 1. **First load**: Creates FileSource record and chunks
174
+ 2. **Subsequent loads**: Compares mtime, skips unchanged files
175
+ 3. **Changed files**: Re-chunks and updates nodes
176
+ 4. **Force reload**: Bypasses mtime check
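The four steps above amount to a small decision function. In this sketch a `nil` stored mtime stands for "never loaded", and any mtime difference (not only a newer one) triggers a re-sync; both the helper name `sync_action` and that comparison rule are assumptions rather than HTM's code.

```ruby
require "tempfile"

# Hypothetical helper mirroring the four re-sync steps.
# stored_mtime is nil when the file has never been loaded.
def sync_action(path, stored_mtime, force: false)
  return :create if stored_mtime.nil?
  return :reload if force

  File.mtime(path) == stored_mtime ? :skip : :resync
end

file = Tempfile.new(["guide", ".md"])
file.write("# Guide")
file.flush
loaded_at = File.mtime(file.path)

first  = sync_action(file.path, nil)       # => :create
second = sync_action(file.path, loaded_at) # => :skip

File.utime(Time.now, loaded_at + 60, file.path) # simulate an edit
third  = sync_action(file.path, loaded_at)              # => :resync
forced = sync_action(file.path, loaded_at, force: true) # => :reload

file.close! # close and delete the temp file
```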
177
+
178
+ ```ruby
179
+ # First load - creates chunks
180
+ htm.load_file("docs/guide.md")
181
+ # => { skipped: false, chunks_created: 5 }
182
+
183
+ # Second load - skipped (unchanged)
184
+ htm.load_file("docs/guide.md")
185
+ # => { skipped: true }
186
+
187
+ # After editing file - re-syncs
188
+ htm.load_file("docs/guide.md")
189
+ # => { skipped: false, chunks_updated: 2, chunks_created: 1 }
190
+
191
+ # Force reload
192
+ htm.load_file("docs/guide.md", force: true)
193
+ # => { skipped: false, chunks_updated: 5 }
194
+ ```
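The `chunks_created` / `chunks_updated` / `chunks_deleted` counts in this example can be derived by comparing old and new chunks. The sketch below compares content digests position by position; matching by position is an assumption, and HTM may instead pair chunks by content hash or another key. Storing digests rather than full text keeps the comparison cheap.

```ruby
require "digest"

# Hypothetical: compare chunk lists position by position via SHA-256 digests.
def chunk_changes(old_chunks, new_chunks)
  old_digests = old_chunks.map { |c| Digest::SHA256.hexdigest(c) }
  new_digests = new_chunks.map { |c| Digest::SHA256.hexdigest(c) }
  shared = [old_digests.size, new_digests.size].min

  {
    chunks_updated: (0...shared).count { |i| old_digests[i] != new_digests[i] },
    chunks_created: [new_digests.size - old_digests.size, 0].max,
    chunks_deleted: [old_digests.size - new_digests.size, 0].max
  }
end

before = ["# A", "# B", "# C", "# D", "# E"]
after  = ["# A", "# B (edited)", "# C", "# D (edited)", "# E", "# F"]
changes = chunk_changes(before, after) # 2 updated, 1 created, 0 deleted
```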
195
+
196
+ ## FileSource Model
197
+
198
+ The `HTM::Models::FileSource` tracks loaded files:
199
+
200
+ ```ruby
201
+ source = HTM::Models::FileSource.find_by(file_path: "docs/guide.md")
202
+
203
+ source.file_path # Full path to file
204
+ source.mtime # Last modification time
205
+ source.needs_sync? # Check if file changed since load
206
+ source.chunks # Associated nodes (ordered by position)
207
+ source.frontmatter # Parsed YAML frontmatter
208
+ source.title # Frontmatter title (convenience)
209
+ source.author # Frontmatter author (convenience)
210
+ source.frontmatter_tags # Tags from frontmatter
211
+ ```
212
+
213
+ ## Rake Tasks
214
+
215
+ HTM provides rake tasks for file management:
216
+
217
+ ```bash
218
+ # Load a single file
219
+ rake 'htm:files:load[docs/guide.md]'
220
+
221
+ # Load directory
222
+ rake 'htm:files:load_dir[docs/]'
223
+ rake 'htm:files:load_dir[docs/,**/*.md]'
224
+
225
+ # List loaded files
226
+ rake htm:files:list
227
+
228
+ # Show file details
229
+ rake 'htm:files:info[docs/guide.md]'
230
+
231
+ # Unload a file
232
+ rake 'htm:files:unload[docs/guide.md]'
233
+
234
+ # Sync all files (reload changed)
235
+ rake htm:files:sync
236
+
237
+ # Show statistics
238
+ rake htm:files:stats
239
+
240
+ # Force reload with FORCE=true
241
+ FORCE=true rake 'htm:files:load[docs/guide.md]'
242
+ ```
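The quotes around task names with brackets matter because shells such as zsh treat `[...]` as a glob pattern. Rake itself then parses `name[arg1,arg2]`. The sketch below is a simplified, hypothetical re-implementation of that convention (Rake's real parser also handles escaped commas), together with a reader for the `FORCE=true` toggle; `parse_task_string` and `force_requested?` are illustrative names, not HTM or Rake APIs.

```ruby
# Simplified, hypothetical parser for rake's "task[arg1,arg2]" syntax.
def parse_task_string(string)
  match = string.match(/\A([^\[\]]+)(?:\[(.*)\])?\z/)
  raise ArgumentError, "malformed task string: #{string}" unless match

  args = match[2] ? match[2].split(",").map(&:strip) : []
  [match[1], args]
end

# FORCE=true in the environment requests a forced reload.
def force_requested?(env = ENV)
  env.fetch("FORCE", "false").casecmp?("true")
end

parse_task_string("htm:files:load_dir[docs/,**/*.md]")
# => ["htm:files:load_dir", ["docs/", "**/*.md"]]
```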
243
+
244
+ ## Best Practices
245
+
246
+ ### Organize Files Logically
247
+
248
+ ```ruby
249
+ # Load by category
250
+ htm.load_directory("docs/guides/", pattern: "**/*.md")
251
+ htm.load_directory("docs/api/", pattern: "**/*.md")
252
+ htm.load_directory("docs/tutorials/", pattern: "**/*.md")
253
+ ```
254
+
255
+ ### Use Frontmatter for Metadata
256
+
257
+ ```markdown
258
+ ---
259
+ title: API Authentication
260
+ category: api
261
+ tags:
262
+ - security
263
+ - authentication
264
+ priority: high
265
+ ---
266
+ ```
267
+
268
+ ### Tune Chunk Size for Your Content
269
+
270
+ ```ruby
271
+ # Smaller chunks for dense technical content
272
+ HTM.configure { |c| c.chunk_size = 512 }
273
+
274
+ # Larger chunks for narrative content
275
+ HTM.configure { |c| c.chunk_size = 2048 }
276
+ ```
277
+
278
+ ### Regular Sync for Updated Content
279
+
280
+ ```ruby
281
+ # Sync all loaded files periodically
282
+ htm.sync_files # Re-checks all FileSource records
283
+ ```
284
+
285
+ ## Example: Building a Knowledge Base
286
+
287
+ ```ruby
288
+ require 'htm'
289
+
290
+ # Initialize
291
+ htm = HTM.new(robot_name: "Knowledge Base")
292
+
293
+ # Configure chunking for technical docs
294
+ HTM.configure do |config|
295
+ config.chunk_size = 768
296
+ config.chunk_overlap = 100
297
+ end
298
+
299
+ # Load documentation
300
+ htm.load_directory("docs/", pattern: "**/*.md")
301
+ htm.load_file("README.md")
302
+ htm.load_file("CHANGELOG.md")
303
+
304
+ # Query the knowledge base
305
+ results = htm.recall(
306
+ "How do I configure authentication?",
307
+ strategy: :hybrid,
308
+ limit: 5
309
+ )
310
+
311
+ results.each do |result|
312
+ puts result['content']
313
+ puts "---"
314
+ end
315
+ ```
316
+
317
+ ## Related Documentation
318
+
319
+ - [Adding Memories](adding-memories.md) - Core memory operations
320
+ - [Search Strategies](search-strategies.md) - Querying loaded content
321
+ - [API Reference: HTM](../api/htm.md) - Complete API documentation
322
+ - [Example: File Loading](../examples/file-loading.md) - Working example
data/docs/guides/getting-started.md CHANGED
@@ -8,26 +8,47 @@ Before starting, ensure you have:
8
8
 
9
9
  1. **Ruby 3.0+** installed
10
10
  2. **PostgreSQL with TimescaleDB** (or access to a TimescaleDB cloud instance)
11
- 3. **Ollama** installed and running (for embeddings)
11
+ 3. **LLM Provider** configured - Ollama (default for local development), OpenAI, Anthropic, Gemini, or others via RubyLLM
12
12
  4. Basic understanding of Ruby and LLMs
13
13
 
14
- ### Installing Ollama
14
+ ### Configuring an LLM Provider
15
15
 
16
- HTM uses Ollama for generating vector embeddings by default:
16
+ HTM uses RubyLLM, which supports multiple providers for generating embeddings and extracting tags.
17
+
18
+ **Option A: Ollama (Recommended for Local Development)**
17
19
 
18
20
  ```bash
19
21
  # Install Ollama
20
22
  curl https://ollama.ai/install.sh | sh
21
23
 
22
- # Pull the gpt-oss model (default for HTM)
23
- ollama pull gpt-oss
24
+ # Pull required models
25
+ ollama pull nomic-embed-text
26
+ ollama pull gemma3:latest
24
27
 
25
28
  # Verify Ollama is running
26
29
  curl http://localhost:11434/api/version
27
30
  ```
28
31
 
32
+ **Option B: OpenAI (Recommended for Production)**
33
+
34
+ ```bash
35
+ export OPENAI_API_KEY="sk-..."
36
+ ```
37
+
38
+ Configure HTM:
39
+ ```ruby
40
+ HTM.configure do |config|
41
+ config.embedding.provider = :openai
42
+ config.embedding.model = 'text-embedding-3-small'
43
+ end
44
+ ```
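The `config.embedding.provider` form implies a configuration object with nested sections that `configure` yields to the block. To show that pattern in plain Ruby, here is a minimal sketch built from `Struct`; the names `MiniHTM`, `MiniConfig`, and `EmbeddingSection` are hypothetical and far smaller than HTM's real `HTM::Config`. The defaults are taken from the MCP guide's environment-variable table.

```ruby
# Hypothetical stand-in for a sectioned configuration object.
EmbeddingSection = Struct.new(:provider, :model)
MiniConfig = Struct.new(:embedding)

module MiniHTM
  def self.config
    @config ||= MiniConfig.new(EmbeddingSection.new(:ollama, "nomic-embed-text"))
  end

  # Yields the shared config object so callers can mutate nested sections.
  def self.configure
    yield config
    config
  end
end

MiniHTM.configure do |config|
  config.embedding.provider = :openai
  config.embedding.model = "text-embedding-3-small"
end
```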
45
+
46
+ **Option C: Other Providers** (Anthropic, Gemini, Azure, Bedrock, DeepSeek)
47
+
48
+ Set the appropriate API key and configure HTM with your preferred provider.
49
+
29
50
  !!! tip
30
- The gpt-oss model provides high-quality embeddings optimized for semantic search. HTM uses these embeddings to understand the meaning of your memories, not just keyword matches.
51
+ HTM uses vector embeddings to understand the semantic meaning of your memories, not just keyword matches. Any provider will work; choose one based on your privacy, cost, and quality requirements.
31
52
 
32
53
  ## Installation
33
54
 
@@ -463,9 +484,9 @@ htm.forget(node_id, soft: false, confirm: :confirmed)
463
484
 
464
485
  ## Troubleshooting
465
486
 
466
- ### Ollama Connection Issues
487
+ ### LLM Provider Connection Issues
467
488
 
468
- If you see embedding errors:
489
+ **If using Ollama:**
469
490
 
470
491
  ```bash
471
492
  # Check Ollama is running
@@ -474,10 +495,20 @@ curl http://localhost:11434/api/version
474
495
  # If not running, start it
475
496
  ollama serve
476
497
 
477
- # Verify the model is available
498
+ # Verify the models are available
478
499
  ollama list
479
500
  ```
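`ollama list` is the interactive check; a script can do the same through Ollama's HTTP API, where `GET /api/tags` lists locally available models. So that it runs offline, this sketch only inspects an already-parsed response (in a live check the body would come from something like `Net::HTTP.get` followed by `JSON.parse`). The sample payload is trimmed to the `name` field, and `missing_models` is a hypothetical helper.

```ruby
require "json"

REQUIRED_MODELS = ["nomic-embed-text", "gemma3:latest"].freeze

# Ollama reports names with an explicit tag, e.g. "nomic-embed-text:latest",
# so normalize bare names to ":latest" before comparing.
def with_tag(name)
  name.include?(":") ? name : "#{name}:latest"
end

# tags_response: parsed JSON from GET /api/tags.
def missing_models(tags_response, required = REQUIRED_MODELS)
  available = tags_response.fetch("models", []).map { |m| with_tag(m.fetch("name")) }
  required.map { |name| with_tag(name) } - available
end

sample = JSON.parse('{"models":[{"name":"nomic-embed-text:latest"}]}')
missing_models(sample) # => ["gemma3:latest"]
```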
480
501
 
502
+ **If using cloud providers:**
503
+
504
+ ```bash
505
+ # Verify API key is set
506
+ echo $OPENAI_API_KEY # or ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.
507
+
508
+ # Test connectivity
509
+ curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"
510
+ ```
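The same `curl` connectivity test can be written with Ruby's standard `net/http`. This sketch builds the authenticated `GET /v1/models` request separately from sending it, so the header logic can be checked without network access; `models_request` and `check_openai` are hypothetical helper names.

```ruby
require "net/http"
require "uri"

MODELS_URI = URI("https://api.openai.com/v1/models")

# Build the same request as the curl command above.
def models_request(api_key)
  raise ArgumentError, "OPENAI_API_KEY is not set" if api_key.nil? || api_key.empty?

  request = Net::HTTP::Get.new(MODELS_URI)
  request["Authorization"] = "Bearer #{api_key}"
  request
end

# Sends the request; returns true on any 2xx response. Requires network access.
def check_openai(api_key = ENV["OPENAI_API_KEY"])
  response = Net::HTTP.start(MODELS_URI.host, MODELS_URI.port, use_ssl: true) do |http|
    http.request(models_request(api_key))
  end
  response.is_a?(Net::HTTPSuccess)
end
```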
511
+
481
512
  ### Database Connection Issues
482
513
 
483
514
  ```ruby
data/docs/guides/index.md CHANGED
@@ -32,7 +32,7 @@ Learn how to work with HTM's memory system effectively.
32
32
 
33
33
  Dive deeper into HTM's powerful capabilities.
34
34
 
35
- - [**Multi-Robot Usage**](multi-robot.md) - Building hive mind systems with multiple robots
35
+ - [**Multi-Robot Usage**](../robots/multi-robot.md) - Building hive mind systems with multiple robots
36
36
  - [**Search Strategies**](search-strategies.md) - Vector, full-text, and hybrid search
37
37
  - [**Context Assembly**](context-assembly.md) - Creating optimized context for LLMs
38
38
 
@@ -62,7 +62,7 @@ We recommend the following progression:
62
62
  - [Search Strategies](search-strategies.md) - Optimize retrieval
63
63
 
64
64
  4. **Advanced Topics**: Multi-Robot Systems
65
- - [Multi-Robot Usage](multi-robot.md) - Build collaborative systems
65
+ - [Multi-Robot Usage](../robots/multi-robot.md) - Build collaborative systems
66
66
 
67
67
  ## Quick Reference
68
68
 
@@ -73,7 +73,7 @@ We recommend the following progression:
73
73
  - **Search for memories**: See [Recalling Memories](recalling-memories.md#basic-recall)
74
74
  - **Create LLM context**: See [Context Assembly](context-assembly.md#basic-usage)
75
75
  - **Monitor memory usage**: See [Working Memory](working-memory.md#monitoring-utilization)
76
- - **Multi-robot setup**: See [Multi-Robot Usage](multi-robot.md#setting-up-multiple-robots)
76
+ - **Multi-robot setup**: See [Multi-Robot Usage](../robots/multi-robot.md#setting-up-multiple-robots)
77
77
  - **Use with Claude/AIA**: See [MCP Server](mcp-server.md#client-configuration)
78
78
 
79
79
  ### Memory Types
data/docs/guides/mcp-server.md CHANGED
@@ -30,11 +30,19 @@ Before using the MCP server, ensure you have:
30
30
  htm_mcp setup
31
31
  ```
32
32
 
33
- 3. **Ollama running** (for embeddings and tag extraction)
33
+ 3. **LLM provider configured** (for embeddings and tag extraction)
34
+
35
+ **Option A: Ollama (default for local development)**
34
36
  ```bash
35
37
  ollama serve
36
38
  ollama pull nomic-embed-text
37
- ollama pull llama3
39
+ ollama pull gemma3:latest
40
+ ```
41
+
42
+ **Option B: Cloud providers** (OpenAI, Anthropic, etc.)
43
+ ```bash
44
+ export OPENAI_API_KEY="sk-..."
45
+ # Configure via HTM.configure or environment variables
38
46
  ```
39
47
 
40
48
  ## Starting the Server
@@ -49,7 +57,7 @@ The server logs to STDERR to avoid corrupting the JSON-RPC protocol on STDOUT.
49
57
 
50
58
  ## CLI Commands
51
59
 
52
- The `htm_mcp` executable includes management commands for database setup and diagnostics:
60
+ The `htm_mcp` executable includes management commands for database setup, diagnostics, and rake task execution:
53
61
 
54
62
  | Command | Description |
55
63
  |---------|-------------|
@@ -61,6 +69,9 @@ The `htm_mcp` executable includes management commands for database setup and dia
61
69
  | `htm_mcp stats` | Show memory statistics (nodes, tags, robots, database size) |
62
70
  | `htm_mcp version` | Show HTM version |
63
71
  | `htm_mcp help` | Show help with all environment variables |
72
+ | `htm_mcp rake <task>` | Run any HTM rake task |
73
+ | `htm_mcp rake -T` | List all available HTM rake tasks |
74
+ | `htm_mcp rake -T <pattern>` | List HTM rake tasks matching pattern |
64
75
 
65
76
  ### First-Time Setup
66
77
 
@@ -93,6 +104,72 @@ Migration Status
93
104
  3 applied, 1 pending
94
105
  ```
95
106
 
107
+ ### Rake Task Passthrough
108
+
109
+ The `htm_mcp rake` command allows you to run any HTM rake task directly through the MCP CLI. This is useful when working with HTM without a full Rails/Rake environment.
110
+
111
+ **List all available tasks:**
112
+
113
+ ```bash
114
+ $ htm_mcp rake -T
115
+ # or
116
+ $ htm_mcp rake --tasks
117
+
118
+ HTM Rake Tasks
119
+ ================================================================================
120
+ htm:db:console # Open psql console to database
121
+ htm:db:create # Create the database if it doesn't exist
122
+ htm:db:drop # Drop all HTM tables (WARNING: destructive!)
123
+ htm:db:info # Show database information
124
+ htm:db:migrate # Run pending database migrations
125
+ htm:db:purge_all # Permanently delete all soft-deleted records
126
+ ...
127
+ ```
128
+
129
+ **Filter tasks by pattern** (like standard `rake -T`):
130
+
131
+ ```bash
132
+ $ htm_mcp rake -T htm:jobs
133
+
134
+ HTM Rake Tasks
135
+ ================================================================================
136
+ htm:jobs:process_all # Process all pending jobs (embeddings, tags, propositions)
137
+ htm:jobs:process_embeddings # Process pending embedding jobs
138
+ htm:jobs:process_propositions # Process pending proposition extraction jobs
139
+ htm:jobs:process_tags # Process pending tag extraction jobs
140
+ htm:jobs:stats # Show job processing statistics
141
+
142
+ $ htm_mcp rake -T db:rebuild
143
+
144
+ HTM Rake Tasks
145
+ ================================================================================
146
+ htm:db:rebuild:embeddings # Clear and regenerate all embeddings
147
+ htm:db:rebuild:propositions # Extract propositions from all non-proposition nodes
148
+ ```
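`htm_mcp rake -T db:rebuild` matched `htm:db:rebuild:*` even though the pattern omits the `htm:` prefix, which is consistent with `rake -T` treating PATTERN as a regular expression matched anywhere in the task name. A sketch of that filtering over a few task names copied from the listings above; how `htm_mcp` actually implements the passthrough is not documented here, and `list_tasks` is a hypothetical helper.

```ruby
# Task names and descriptions taken from the listings above.
TASKS = {
  "htm:db:migrate" => "Run pending database migrations",
  "htm:db:rebuild:embeddings" => "Clear and regenerate all embeddings",
  "htm:db:rebuild:propositions" => "Extract propositions from all non-proposition nodes",
  "htm:jobs:process_all" => "Process all pending jobs (embeddings, tags, propositions)",
  "htm:jobs:stats" => "Show job processing statistics"
}.freeze

# Like `rake -T PATTERN`: keep tasks whose name matches the pattern as a regex.
def list_tasks(tasks, pattern = nil)
  regex = Regexp.new(pattern.to_s) # nil pattern => empty regex => match all
  width = tasks.keys.map(&:length).max
  tasks.select { |name, _| name.match?(regex) }
       .sort
       .map { |name, desc| "#{name.ljust(width)}  # #{desc}" }
end

puts list_tasks(TASKS, "db:rebuild")
```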
149
+
150
+ **Run specific tasks:**
151
+
152
+ ```bash
153
+ # Database tasks
154
+ $ htm_mcp rake htm:db:stats
155
+ $ htm_mcp rake htm:db:verify
156
+ $ htm_mcp rake htm:db:purge_all
157
+
158
+ # Job processing tasks
159
+ $ htm_mcp rake htm:jobs:process_all
160
+ $ htm_mcp rake htm:jobs:process_embeddings
161
+
162
+ # Tag tasks
163
+ $ htm_mcp rake htm:tags:tree
164
+ $ htm_mcp rake 'htm:tags:tree[database]' # With argument
165
+
166
+ # File tasks
167
+ $ htm_mcp rake htm:files:list
168
+ $ htm_mcp rake htm:files:sync
169
+ ```
170
+
171
+ **Note:** Tasks requiring arguments use the standard rake syntax with brackets quoted for shell safety: `htm_mcp rake 'htm:files:load[path/to/file.md]'`
172
+
96
173
  ## Tools Reference
97
174
 
98
175
  ### SetRobotTool
@@ -681,8 +758,8 @@ Memory statistics as JSON.
681
758
  "current_robot": "my-assistant",
682
759
  "robot_id": 5,
683
760
  "robot_initialized": true,
684
- "embedding_provider": "ollama",
685
- "embedding_model": "nomic-embed-text"
761
+ "embedding_provider": "ollama", // or "openai", "gemini", etc.
762
+ "embedding_model": "nomic-embed-text" // provider-specific model
686
763
  }
687
764
  ```
688
765
 
@@ -986,12 +1063,18 @@ psql htm_development -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"
986
1063
 
987
1064
  ### Embedding/Tag Errors
988
1065
 
989
- **Error: `Connection refused` (Ollama)**
1066
+ **Error: `Connection refused` (when using Ollama)**
990
1067
  1. Start Ollama: `ollama serve`
991
1068
  2. Pull required models:
992
1069
  ```bash
993
1070
  ollama pull nomic-embed-text
994
- ollama pull llama3
1071
+ ollama pull gemma3:latest
1072
+ ```
1073
+
1074
+ **Error: `API key invalid` (when using cloud providers)**
1075
+ 1. Verify the API key is set:
1076
+ ```bash
1077
+ echo $OPENAI_API_KEY # or ANTHROPIC_API_KEY, GEMINI_API_KEY
995
1078
  ```
996
1079
 
997
1080
  ### Debugging
@@ -1026,19 +1109,23 @@ Run `htm_mcp help` for a complete list. Key variables:
1026
1109
 
1027
1110
  ### LLM Providers
1028
1111
 
1112
+ HTM uses RubyLLM, which supports multiple providers; it defaults to Ollama for local development.
1113
+
1029
1114
  | Variable | Description | Default |
1030
1115
  |----------|-------------|---------|
1031
- | `HTM_EMBEDDING_PROVIDER` | Embedding provider | `ollama` |
1032
- | `HTM_EMBEDDING_MODEL` | Embedding model | `nomic-embed-text:latest` |
1116
+ | `HTM_EMBEDDING_PROVIDER` | Embedding provider (`ollama`, `openai`, `gemini`, etc.) | `ollama` |
1117
+ | `HTM_EMBEDDING_MODEL` | Embedding model (provider-specific) | `nomic-embed-text` |
1033
1118
  | `HTM_TAG_PROVIDER` | Tag extraction provider | `ollama` |
1034
1119
  | `HTM_TAG_MODEL` | Tag model | `gemma3:latest` |
1035
- | `HTM_OLLAMA_URL` | Ollama server URL | `http://localhost:11434` |
1120
+ | `HTM_OLLAMA_URL` | Ollama server URL (if using Ollama) | `http://localhost:11434` |
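A hedged sketch of resolving the variables in this table into settings, falling back to the listed defaults when a variable is unset or blank. `llm_settings` is a hypothetical helper, not part of HTM, and treating blank values as unset is an assumption; HTM's own loader may also consult config files before the environment.

```ruby
# Defaults copied from the table above.
LLM_DEFAULTS = {
  "HTM_EMBEDDING_PROVIDER" => "ollama",
  "HTM_EMBEDDING_MODEL" => "nomic-embed-text",
  "HTM_TAG_PROVIDER" => "ollama",
  "HTM_TAG_MODEL" => "gemma3:latest",
  "HTM_OLLAMA_URL" => "http://localhost:11434"
}.freeze

# Hypothetical helper: environment values win unless unset or blank.
def llm_settings(env = ENV)
  LLM_DEFAULTS.to_h do |key, default|
    value = env[key].to_s.strip
    [key, value.empty? ? default : value]
  end
end

settings = llm_settings({ "HTM_EMBEDDING_PROVIDER" => "openai",
                          "HTM_EMBEDDING_MODEL" => "text-embedding-3-small" })
settings["HTM_TAG_MODEL"] # => "gemma3:latest"
```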
1036
1121
 
1037
- ### Other Providers (set API keys as needed)
1122
+ ### Cloud Provider API Keys
1038
1123
 
1039
1124
  | Variable | Description |
1040
1125
  |----------|-------------|
1041
- | `HTM_OPENAI_API_KEY` | OpenAI API key |
1126
+ | `OPENAI_API_KEY` | OpenAI API key |
1127
+ | `ANTHROPIC_API_KEY` | Anthropic API key |
1128
+ | `GEMINI_API_KEY` | Google Gemini API key |
1042
1129
  | `HTM_ANTHROPIC_API_KEY` | Anthropic API key |
1043
1130
  | `HTM_GEMINI_API_KEY` | Google Gemini API key |
1044
1131
  | `HTM_AZURE_API_KEY` | Azure OpenAI API key |
@@ -1049,4 +1136,4 @@ Run `htm_mcp help` for a complete list. Key variables:
1049
1136
  - [Getting Started](getting-started.md) - HTM basics
1050
1137
  - [Adding Memories](adding-memories.md) - Learn about tags and metadata
1051
1138
  - [Recalling Memories](recalling-memories.md) - Search strategies
1052
- - [Multi-Robot Systems](multi-robot.md) - Working with multiple robots
1139
+ - [Multi-Robot Systems](../robots/multi-robot.md) - Working with multiple robots