htm 0.0.1

Files changed (155)
  1. checksums.yaml +7 -0
  2. data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
  3. data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
  4. data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
  5. data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
  6. data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
  7. data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
  8. data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
  9. data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
  10. data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
  11. data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
  12. data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
  13. data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
  14. data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
  15. data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
  16. data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
  17. data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
  18. data/.architecture/members.yml +144 -0
  19. data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
  20. data/.architecture/reviews/initial-system-analysis.md +330 -0
  21. data/.envrc +32 -0
  22. data/.irbrc +145 -0
  23. data/CHANGELOG.md +150 -0
  24. data/COMMITS.md +196 -0
  25. data/LICENSE +21 -0
  26. data/README.md +1347 -0
  27. data/Rakefile +51 -0
  28. data/SETUP.md +268 -0
  29. data/config/database.yml +67 -0
  30. data/db/migrate/20250101000001_enable_extensions.rb +14 -0
  31. data/db/migrate/20250101000002_create_robots.rb +14 -0
  32. data/db/migrate/20250101000003_create_nodes.rb +42 -0
  33. data/db/migrate/20250101000005_create_tags.rb +38 -0
  34. data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
  35. data/db/schema.sql +473 -0
  36. data/db/seed_data/README.md +100 -0
  37. data/db/seed_data/presidents.md +136 -0
  38. data/db/seed_data/states.md +151 -0
  39. data/db/seeds.rb +208 -0
  40. data/dbdoc/README.md +173 -0
  41. data/dbdoc/public.node_stats.md +48 -0
  42. data/dbdoc/public.node_stats.svg +41 -0
  43. data/dbdoc/public.node_tags.md +40 -0
  44. data/dbdoc/public.node_tags.svg +112 -0
  45. data/dbdoc/public.nodes.md +54 -0
  46. data/dbdoc/public.nodes.svg +118 -0
  47. data/dbdoc/public.nodes_tags.md +39 -0
  48. data/dbdoc/public.nodes_tags.svg +112 -0
  49. data/dbdoc/public.ontology_structure.md +48 -0
  50. data/dbdoc/public.ontology_structure.svg +38 -0
  51. data/dbdoc/public.operations_log.md +42 -0
  52. data/dbdoc/public.operations_log.svg +130 -0
  53. data/dbdoc/public.relationships.md +39 -0
  54. data/dbdoc/public.relationships.svg +41 -0
  55. data/dbdoc/public.robot_activity.md +46 -0
  56. data/dbdoc/public.robot_activity.svg +35 -0
  57. data/dbdoc/public.robots.md +35 -0
  58. data/dbdoc/public.robots.svg +90 -0
  59. data/dbdoc/public.schema_migrations.md +29 -0
  60. data/dbdoc/public.schema_migrations.svg +26 -0
  61. data/dbdoc/public.tags.md +35 -0
  62. data/dbdoc/public.tags.svg +60 -0
  63. data/dbdoc/public.topic_relationships.md +45 -0
  64. data/dbdoc/public.topic_relationships.svg +32 -0
  65. data/dbdoc/schema.json +1437 -0
  66. data/dbdoc/schema.svg +154 -0
  67. data/docs/api/database.md +806 -0
  68. data/docs/api/embedding-service.md +532 -0
  69. data/docs/api/htm.md +797 -0
  70. data/docs/api/index.md +259 -0
  71. data/docs/api/long-term-memory.md +1096 -0
  72. data/docs/api/working-memory.md +665 -0
  73. data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
  74. data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
  75. data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
  76. data/docs/architecture/adrs/004-hive-mind.md +437 -0
  77. data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
  78. data/docs/architecture/adrs/006-context-assembly.md +496 -0
  79. data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
  80. data/docs/architecture/adrs/008-robot-identification.md +625 -0
  81. data/docs/architecture/adrs/009-never-forget.md +648 -0
  82. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
  83. data/docs/architecture/adrs/011-pgai-integration.md +494 -0
  84. data/docs/architecture/adrs/index.md +215 -0
  85. data/docs/architecture/hive-mind.md +736 -0
  86. data/docs/architecture/index.md +351 -0
  87. data/docs/architecture/overview.md +538 -0
  88. data/docs/architecture/two-tier-memory.md +873 -0
  89. data/docs/assets/css/custom.css +83 -0
  90. data/docs/assets/images/htm-core-components.svg +63 -0
  91. data/docs/assets/images/htm-database-schema.svg +93 -0
  92. data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
  93. data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
  94. data/docs/assets/images/htm-layered-architecture.svg +71 -0
  95. data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
  96. data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
  97. data/docs/assets/images/htm.jpg +0 -0
  98. data/docs/assets/images/htm_demo.gif +0 -0
  99. data/docs/assets/js/mathjax.js +18 -0
  100. data/docs/assets/videos/htm_video.mp4 +0 -0
  101. data/docs/database_rake_tasks.md +322 -0
  102. data/docs/development/contributing.md +787 -0
  103. data/docs/development/index.md +336 -0
  104. data/docs/development/schema.md +596 -0
  105. data/docs/development/setup.md +719 -0
  106. data/docs/development/testing.md +819 -0
  107. data/docs/guides/adding-memories.md +824 -0
  108. data/docs/guides/context-assembly.md +1009 -0
  109. data/docs/guides/getting-started.md +577 -0
  110. data/docs/guides/index.md +118 -0
  111. data/docs/guides/long-term-memory.md +941 -0
  112. data/docs/guides/multi-robot.md +866 -0
  113. data/docs/guides/recalling-memories.md +927 -0
  114. data/docs/guides/search-strategies.md +953 -0
  115. data/docs/guides/working-memory.md +717 -0
  116. data/docs/index.md +214 -0
  117. data/docs/installation.md +477 -0
  118. data/docs/multi_framework_support.md +519 -0
  119. data/docs/quick-start.md +655 -0
  120. data/docs/setup_local_database.md +302 -0
  121. data/docs/using_rake_tasks_in_your_app.md +383 -0
  122. data/examples/basic_usage.rb +93 -0
  123. data/examples/cli_app/README.md +317 -0
  124. data/examples/cli_app/htm_cli.rb +270 -0
  125. data/examples/custom_llm_configuration.rb +183 -0
  126. data/examples/example_app/Rakefile +71 -0
  127. data/examples/example_app/app.rb +206 -0
  128. data/examples/sinatra_app/Gemfile +21 -0
  129. data/examples/sinatra_app/app.rb +335 -0
  130. data/lib/htm/active_record_config.rb +113 -0
  131. data/lib/htm/configuration.rb +342 -0
  132. data/lib/htm/database.rb +594 -0
  133. data/lib/htm/embedding_service.rb +115 -0
  134. data/lib/htm/errors.rb +34 -0
  135. data/lib/htm/job_adapter.rb +154 -0
  136. data/lib/htm/jobs/generate_embedding_job.rb +65 -0
  137. data/lib/htm/jobs/generate_tags_job.rb +82 -0
  138. data/lib/htm/long_term_memory.rb +965 -0
  139. data/lib/htm/models/node.rb +109 -0
  140. data/lib/htm/models/node_tag.rb +33 -0
  141. data/lib/htm/models/robot.rb +52 -0
  142. data/lib/htm/models/tag.rb +76 -0
  143. data/lib/htm/railtie.rb +76 -0
  144. data/lib/htm/sinatra.rb +157 -0
  145. data/lib/htm/tag_service.rb +135 -0
  146. data/lib/htm/tasks.rb +38 -0
  147. data/lib/htm/version.rb +5 -0
  148. data/lib/htm/working_memory.rb +182 -0
  149. data/lib/htm.rb +400 -0
  150. data/lib/tasks/db.rake +19 -0
  151. data/lib/tasks/htm.rake +147 -0
  152. data/lib/tasks/jobs.rake +312 -0
  153. data/mkdocs.yml +190 -0
  154. data/scripts/install_local_database.sh +309 -0
  155. metadata +341 -0
data/db/schema.sql ADDED
@@ -0,0 +1,473 @@
1
+ -- HTM Database Schema
2
+ -- Auto-generated from database using pg_dump
3
+ -- DO NOT EDIT THIS FILE MANUALLY
4
+ -- Run 'rake htm:db:schema:dump' to regenerate
5
+
6
+ --
7
+ -- Name: pg_trgm; Type: EXTENSION; Schema: -; Owner: -
8
+ --
9
+
10
+ CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public;
11
+
12
+ --
13
+ -- Name: EXTENSION pg_trgm; Type: COMMENT; Schema: -; Owner: -
14
+ --
15
+
16
+ --
17
+ -- Name: vector; Type: EXTENSION; Schema: -; Owner: -
18
+ --
19
+
20
+ CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA public;
21
+
22
+ --
23
+ -- Name: EXTENSION vector; Type: COMMENT; Schema: -; Owner: -
24
+ --
25
+
26
+ --
27
+ -- Name: node_tags; Type: TABLE; Schema: public; Owner: -
28
+ --
29
+
30
+ CREATE TABLE public.node_tags (
31
+ id bigint NOT NULL,
32
+ node_id bigint NOT NULL,
33
+ tag_id bigint NOT NULL,
34
+ created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
35
+ );
36
+
37
+ --
38
+ -- Name: TABLE node_tags; Type: COMMENT; Schema: public; Owner: -
39
+ --
40
+
41
+ COMMENT ON TABLE public.node_tags IS 'Join table connecting nodes to tags (many-to-many)';
42
+
43
+ --
44
+ -- Name: COLUMN node_tags.node_id; Type: COMMENT; Schema: public; Owner: -
45
+ --
46
+
47
+ COMMENT ON COLUMN public.node_tags.node_id IS 'ID of the node being tagged';
48
+
49
+ --
50
+ -- Name: COLUMN node_tags.tag_id; Type: COMMENT; Schema: public; Owner: -
51
+ --
52
+
53
+ COMMENT ON COLUMN public.node_tags.tag_id IS 'ID of the tag being applied';
54
+
55
+ --
56
+ -- Name: COLUMN node_tags.created_at; Type: COMMENT; Schema: public; Owner: -
57
+ --
58
+
59
+ COMMENT ON COLUMN public.node_tags.created_at IS 'When this association was created';
60
+
61
+ --
62
+ -- Name: node_tags_id_seq; Type: SEQUENCE; Schema: public; Owner: -
63
+ --
64
+
65
+ CREATE SEQUENCE public.node_tags_id_seq
66
+ START WITH 1
67
+ INCREMENT BY 1
68
+ NO MINVALUE
69
+ NO MAXVALUE
70
+ CACHE 1;
71
+
72
+ --
73
+ -- Name: node_tags_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
74
+ --
75
+
76
+ ALTER SEQUENCE public.node_tags_id_seq OWNED BY public.node_tags.id;
77
+
78
+ --
79
+ -- Name: nodes; Type: TABLE; Schema: public; Owner: -
80
+ --
81
+
82
+ CREATE TABLE public.nodes (
83
+ id bigint NOT NULL,
84
+ content text NOT NULL,
85
+ source text DEFAULT ''::text,
86
+ access_count integer DEFAULT 0 NOT NULL,
87
+ created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
88
+ updated_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
89
+ last_accessed timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
90
+ token_count integer,
91
+ in_working_memory boolean DEFAULT false,
92
+ robot_id bigint NOT NULL,
93
+ embedding public.vector(2000),
94
+ embedding_dimension integer,
95
+ CONSTRAINT check_embedding_dimension CHECK (((embedding_dimension IS NULL) OR ((embedding_dimension > 0) AND (embedding_dimension <= 2000))))
96
+ );
97
+
98
+ --
99
+ -- Name: TABLE nodes; Type: COMMENT; Schema: public; Owner: -
100
+ --
101
+
102
+ COMMENT ON TABLE public.nodes IS 'Core memory storage for conversation messages and context';
103
+
104
+ --
105
+ -- Name: COLUMN nodes.content; Type: COMMENT; Schema: public; Owner: -
106
+ --
107
+
108
+ COMMENT ON COLUMN public.nodes.content IS 'The conversation message/utterance content';
109
+
110
+ --
111
+ -- Name: COLUMN nodes.source; Type: COMMENT; Schema: public; Owner: -
112
+ --
113
+
114
+ COMMENT ON COLUMN public.nodes.source IS 'From where the content came (empty string if unknown)';
115
+
116
+ --
117
+ -- Name: COLUMN nodes.access_count; Type: COMMENT; Schema: public; Owner: -
118
+ --
119
+
120
+ COMMENT ON COLUMN public.nodes.access_count IS 'Number of times this node has been accessed/retrieved';
121
+
122
+ --
123
+ -- Name: COLUMN nodes.created_at; Type: COMMENT; Schema: public; Owner: -
124
+ --
125
+
126
+ COMMENT ON COLUMN public.nodes.created_at IS 'When this memory was created';
127
+
128
+ --
129
+ -- Name: COLUMN nodes.updated_at; Type: COMMENT; Schema: public; Owner: -
130
+ --
131
+
132
+ COMMENT ON COLUMN public.nodes.updated_at IS 'When this memory was last modified';
133
+
134
+ --
135
+ -- Name: COLUMN nodes.last_accessed; Type: COMMENT; Schema: public; Owner: -
136
+ --
137
+
138
+ COMMENT ON COLUMN public.nodes.last_accessed IS 'When this memory was last accessed';
139
+
140
+ --
141
+ -- Name: COLUMN nodes.token_count; Type: COMMENT; Schema: public; Owner: -
142
+ --
143
+
144
+ COMMENT ON COLUMN public.nodes.token_count IS 'Number of tokens in the content (for context budget management)';
145
+
146
+ --
147
+ -- Name: COLUMN nodes.in_working_memory; Type: COMMENT; Schema: public; Owner: -
148
+ --
149
+
150
+ COMMENT ON COLUMN public.nodes.in_working_memory IS 'Whether this memory is currently in working memory';
151
+
152
+ --
153
+ -- Name: COLUMN nodes.robot_id; Type: COMMENT; Schema: public; Owner: -
154
+ --
155
+
156
+ COMMENT ON COLUMN public.nodes.robot_id IS 'ID of the robot that owns this memory';
157
+
158
+ --
159
+ -- Name: COLUMN nodes.embedding; Type: COMMENT; Schema: public; Owner: -
160
+ --
161
+
162
+ COMMENT ON COLUMN public.nodes.embedding IS 'Vector embedding (max 2000 dimensions) for semantic search';
163
+
164
+ --
165
+ -- Name: COLUMN nodes.embedding_dimension; Type: COMMENT; Schema: public; Owner: -
166
+ --
167
+
168
+ COMMENT ON COLUMN public.nodes.embedding_dimension IS 'Actual number of dimensions used in the embedding vector (max 2000)';
169
+
170
+ --
171
+ -- Name: nodes_id_seq; Type: SEQUENCE; Schema: public; Owner: -
172
+ --
173
+
174
+ CREATE SEQUENCE public.nodes_id_seq
175
+ START WITH 1
176
+ INCREMENT BY 1
177
+ NO MINVALUE
178
+ NO MAXVALUE
179
+ CACHE 1;
180
+
181
+ --
182
+ -- Name: nodes_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
183
+ --
184
+
185
+ ALTER SEQUENCE public.nodes_id_seq OWNED BY public.nodes.id;
186
+
187
+ --
188
+ -- Name: robots; Type: TABLE; Schema: public; Owner: -
189
+ --
190
+
191
+ CREATE TABLE public.robots (
192
+ id bigint NOT NULL,
193
+ name text,
194
+ created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
195
+ last_active timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
196
+ metadata jsonb
197
+ );
198
+
199
+ --
200
+ -- Name: TABLE robots; Type: COMMENT; Schema: public; Owner: -
201
+ --
202
+
203
+ COMMENT ON TABLE public.robots IS 'Registry of all LLM robots using the HTM system';
204
+
205
+ --
206
+ -- Name: COLUMN robots.name; Type: COMMENT; Schema: public; Owner: -
207
+ --
208
+
209
+ COMMENT ON COLUMN public.robots.name IS 'Human-readable name for the robot';
210
+
211
+ --
212
+ -- Name: COLUMN robots.created_at; Type: COMMENT; Schema: public; Owner: -
213
+ --
214
+
215
+ COMMENT ON COLUMN public.robots.created_at IS 'When the robot was first registered';
216
+
217
+ --
218
+ -- Name: COLUMN robots.last_active; Type: COMMENT; Schema: public; Owner: -
219
+ --
220
+
221
+ COMMENT ON COLUMN public.robots.last_active IS 'Last time the robot accessed the system';
222
+
223
+ --
224
+ -- Name: COLUMN robots.metadata; Type: COMMENT; Schema: public; Owner: -
225
+ --
226
+
227
+ COMMENT ON COLUMN public.robots.metadata IS 'Robot-specific configuration and metadata';
228
+
229
+ --
230
+ -- Name: robots_id_seq; Type: SEQUENCE; Schema: public; Owner: -
231
+ --
232
+
233
+ CREATE SEQUENCE public.robots_id_seq
234
+ START WITH 1
235
+ INCREMENT BY 1
236
+ NO MINVALUE
237
+ NO MAXVALUE
238
+ CACHE 1;
239
+
240
+ --
241
+ -- Name: robots_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
242
+ --
243
+
244
+ ALTER SEQUENCE public.robots_id_seq OWNED BY public.robots.id;
245
+
246
+ --
247
+ -- Name: schema_migrations; Type: TABLE; Schema: public; Owner: -
248
+ --
249
+
250
+ CREATE TABLE public.schema_migrations (
251
+ version character varying NOT NULL
252
+ );
253
+
254
+ --
255
+ -- Name: tags; Type: TABLE; Schema: public; Owner: -
256
+ --
257
+
258
+ CREATE TABLE public.tags (
259
+ id bigint NOT NULL,
260
+ name text NOT NULL,
261
+ created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
262
+ );
263
+
264
+ --
265
+ -- Name: TABLE tags; Type: COMMENT; Schema: public; Owner: -
266
+ --
267
+
268
+ COMMENT ON TABLE public.tags IS 'Unique tag names for categorization';
269
+
270
+ --
271
+ -- Name: COLUMN tags.name; Type: COMMENT; Schema: public; Owner: -
272
+ --
273
+
274
+ COMMENT ON COLUMN public.tags.name IS 'Hierarchical tag in format: root:level1:level2 (e.g., database:postgresql:timescaledb)';
275
+
276
+ --
277
+ -- Name: COLUMN tags.created_at; Type: COMMENT; Schema: public; Owner: -
278
+ --
279
+
280
+ COMMENT ON COLUMN public.tags.created_at IS 'When this tag was created';
281
+
282
+ --
283
+ -- Name: tags_id_seq; Type: SEQUENCE; Schema: public; Owner: -
284
+ --
285
+
286
+ CREATE SEQUENCE public.tags_id_seq
287
+ START WITH 1
288
+ INCREMENT BY 1
289
+ NO MINVALUE
290
+ NO MAXVALUE
291
+ CACHE 1;
292
+
293
+ --
294
+ -- Name: tags_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
295
+ --
296
+
297
+ ALTER SEQUENCE public.tags_id_seq OWNED BY public.tags.id;
298
+
299
+ --
300
+ -- Name: node_tags id; Type: DEFAULT; Schema: public; Owner: -
301
+ --
302
+
303
+ ALTER TABLE ONLY public.node_tags ALTER COLUMN id SET DEFAULT nextval('public.node_tags_id_seq'::regclass);
304
+
305
+ --
306
+ -- Name: nodes id; Type: DEFAULT; Schema: public; Owner: -
307
+ --
308
+
309
+ ALTER TABLE ONLY public.nodes ALTER COLUMN id SET DEFAULT nextval('public.nodes_id_seq'::regclass);
310
+
311
+ --
312
+ -- Name: robots id; Type: DEFAULT; Schema: public; Owner: -
313
+ --
314
+
315
+ ALTER TABLE ONLY public.robots ALTER COLUMN id SET DEFAULT nextval('public.robots_id_seq'::regclass);
316
+
317
+ --
318
+ -- Name: tags id; Type: DEFAULT; Schema: public; Owner: -
319
+ --
320
+
321
+ ALTER TABLE ONLY public.tags ALTER COLUMN id SET DEFAULT nextval('public.tags_id_seq'::regclass);
322
+
323
+ --
324
+ -- Name: node_tags node_tags_pkey; Type: CONSTRAINT; Schema: public; Owner: -
325
+ --
326
+
327
+ ALTER TABLE ONLY public.node_tags
328
+ ADD CONSTRAINT node_tags_pkey PRIMARY KEY (id);
329
+
330
+ --
331
+ -- Name: nodes nodes_pkey; Type: CONSTRAINT; Schema: public; Owner: -
332
+ --
333
+
334
+ ALTER TABLE ONLY public.nodes
335
+ ADD CONSTRAINT nodes_pkey PRIMARY KEY (id);
336
+
337
+ --
338
+ -- Name: robots robots_pkey; Type: CONSTRAINT; Schema: public; Owner: -
339
+ --
340
+
341
+ ALTER TABLE ONLY public.robots
342
+ ADD CONSTRAINT robots_pkey PRIMARY KEY (id);
343
+
344
+ --
345
+ -- Name: schema_migrations schema_migrations_pkey; Type: CONSTRAINT; Schema: public; Owner: -
346
+ --
347
+
348
+ ALTER TABLE ONLY public.schema_migrations
349
+ ADD CONSTRAINT schema_migrations_pkey PRIMARY KEY (version);
350
+
351
+ --
352
+ -- Name: tags tags_pkey; Type: CONSTRAINT; Schema: public; Owner: -
353
+ --
354
+
355
+ ALTER TABLE ONLY public.tags
356
+ ADD CONSTRAINT tags_pkey PRIMARY KEY (id);
357
+
358
+ --
359
+ -- Name: idx_node_tags_node_id; Type: INDEX; Schema: public; Owner: -
360
+ --
361
+
362
+ CREATE INDEX idx_node_tags_node_id ON public.node_tags USING btree (node_id);
363
+
364
+ --
365
+ -- Name: idx_node_tags_tag_id; Type: INDEX; Schema: public; Owner: -
366
+ --
367
+
368
+ CREATE INDEX idx_node_tags_tag_id ON public.node_tags USING btree (tag_id);
369
+
370
+ --
371
+ -- Name: idx_node_tags_unique; Type: INDEX; Schema: public; Owner: -
372
+ --
373
+
374
+ CREATE UNIQUE INDEX idx_node_tags_unique ON public.node_tags USING btree (node_id, tag_id);
375
+
376
+ --
377
+ -- Name: idx_nodes_access_count; Type: INDEX; Schema: public; Owner: -
378
+ --
379
+
380
+ CREATE INDEX idx_nodes_access_count ON public.nodes USING btree (access_count);
381
+
382
+ --
383
+ -- Name: idx_nodes_content_gin; Type: INDEX; Schema: public; Owner: -
384
+ --
385
+
386
+ CREATE INDEX idx_nodes_content_gin ON public.nodes USING gin (to_tsvector('english'::regconfig, content));
387
+
388
+ --
389
+ -- Name: idx_nodes_content_trgm; Type: INDEX; Schema: public; Owner: -
390
+ --
391
+
392
+ CREATE INDEX idx_nodes_content_trgm ON public.nodes USING gin (content public.gin_trgm_ops);
393
+
394
+ --
395
+ -- Name: idx_nodes_created_at; Type: INDEX; Schema: public; Owner: -
396
+ --
397
+
398
+ CREATE INDEX idx_nodes_created_at ON public.nodes USING btree (created_at);
399
+
400
+ --
401
+ -- Name: idx_nodes_embedding; Type: INDEX; Schema: public; Owner: -
402
+ --
403
+
404
+ CREATE INDEX idx_nodes_embedding ON public.nodes USING hnsw (embedding public.vector_cosine_ops) WITH (m='16', ef_construction='64');
405
+
406
+ --
407
+ -- Name: idx_nodes_in_working_memory; Type: INDEX; Schema: public; Owner: -
408
+ --
409
+
410
+ CREATE INDEX idx_nodes_in_working_memory ON public.nodes USING btree (in_working_memory);
411
+
412
+ --
413
+ -- Name: idx_nodes_last_accessed; Type: INDEX; Schema: public; Owner: -
414
+ --
415
+
416
+ CREATE INDEX idx_nodes_last_accessed ON public.nodes USING btree (last_accessed);
417
+
418
+ --
419
+ -- Name: idx_nodes_robot_id; Type: INDEX; Schema: public; Owner: -
420
+ --
421
+
422
+ CREATE INDEX idx_nodes_robot_id ON public.nodes USING btree (robot_id);
423
+
424
+ --
425
+ -- Name: idx_nodes_source; Type: INDEX; Schema: public; Owner: -
426
+ --
427
+
428
+ CREATE INDEX idx_nodes_source ON public.nodes USING btree (source);
429
+
430
+ --
431
+ -- Name: idx_nodes_updated_at; Type: INDEX; Schema: public; Owner: -
432
+ --
433
+
434
+ CREATE INDEX idx_nodes_updated_at ON public.nodes USING btree (updated_at);
435
+
436
+ --
437
+ -- Name: idx_tags_name_pattern; Type: INDEX; Schema: public; Owner: -
438
+ --
439
+
440
+ CREATE INDEX idx_tags_name_pattern ON public.tags USING btree (name text_pattern_ops);
441
+
442
+ --
443
+ -- Name: idx_tags_name_unique; Type: INDEX; Schema: public; Owner: -
444
+ --
445
+
446
+ CREATE UNIQUE INDEX idx_tags_name_unique ON public.tags USING btree (name);
447
+
448
+ --
449
+ -- Name: nodes fk_rails_60162e9d3a; Type: FK CONSTRAINT; Schema: public; Owner: -
450
+ --
451
+
452
+ ALTER TABLE ONLY public.nodes
453
+ ADD CONSTRAINT fk_rails_60162e9d3a FOREIGN KEY (robot_id) REFERENCES public.robots(id) ON DELETE CASCADE;
454
+
455
+ --
456
+ -- Name: node_tags fk_rails_b51cdcc57f; Type: FK CONSTRAINT; Schema: public; Owner: -
457
+ --
458
+
459
+ ALTER TABLE ONLY public.node_tags
460
+ ADD CONSTRAINT fk_rails_b51cdcc57f FOREIGN KEY (tag_id) REFERENCES public.tags(id) ON DELETE CASCADE;
461
+
462
+ --
463
+ -- Name: node_tags fk_rails_ebc9aafd9f; Type: FK CONSTRAINT; Schema: public; Owner: -
464
+ --
465
+
466
+ ALTER TABLE ONLY public.node_tags
467
+ ADD CONSTRAINT fk_rails_ebc9aafd9f FOREIGN KEY (node_id) REFERENCES public.nodes(id) ON DELETE CASCADE;
468
+
469
+ --
470
+ -- PostgreSQL database dump complete
471
+ --
472
+
473
+ \unrestrict f5a75Zsnuw7NUeDmu1kxeQ3pRMbaORhrsWHJyDdXV4wbRfzQweTumJBXu85kf1z
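The `tags.name` comment in the schema above describes hierarchical tags of the form `root:level1:level2`, and `idx_tags_name_pattern` uses `text_pattern_ops`, an operator class that lets a btree index serve left-anchored `LIKE 'prefix%'` queries. The sketch below shows one way such tags could be expanded into their ancestor paths and turned into a prefix pattern. The helper names `tag_ancestors` and `descendant_pattern` are hypothetical, chosen for illustration, and are not taken from the htm gem.

```ruby
# Hypothetical helpers for hierarchical tags such as
# "database:postgresql:timescaledb". These names are illustrative
# assumptions, not part of the htm gem's API.

# Returns every ancestor path of a tag, shortest first, ending with the tag itself.
def tag_ancestors(tag)
  parts = tag.split(":")
  (1..parts.length).map { |n| parts.first(n).join(":") }
end

# Builds a LIKE pattern that matches only the descendants of +prefix+.
# The trailing ":" keeps "database" from matching an unrelated tag
# such as "databases:foo". LIKE wildcards in the prefix are escaped.
def descendant_pattern(prefix)
  escaped = prefix.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
  "#{escaped}:%"
end
```

A query for a tag and everything beneath it could then combine an equality test with the pattern, for example `WHERE name = $1 OR name LIKE $2`, passing the tag itself and `descendant_pattern(tag)` as bind parameters.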
data/db/seed_data/README.md ADDED
@@ -0,0 +1,100 @@
1
+ # HTM Seed Data
2
+
3
+ This directory contains markdown files used to seed the HTM database with sample data.
4
+
5
+ ## Configuration
6
+
7
+ The seeding process uses environment variables for configuration. Every setting except the database connection has a default:
8
+
9
+ ### LLM Provider Settings
10
+
11
+ - `HTM_EMBEDDING_PROVIDER` - Embedding provider (default: `ollama`)
12
+ - `HTM_EMBEDDING_MODEL` - Embedding model (default: `nomic-embed-text`)
13
+ - `HTM_EMBEDDING_DIMENSIONS` - Embedding dimensions (default: `768`)
14
+ - `HTM_TAG_PROVIDER` - Tag extraction provider (default: `ollama`)
15
+ - `HTM_TAG_MODEL` - Tag extraction model (default: `gemma3`)
16
+ - `OLLAMA_URL` - Ollama server URL (default: `http://localhost:11434`)
17
+
18
+ ### Timeout Settings
19
+
20
+ - `HTM_EMBEDDING_TIMEOUT` - Embedding generation timeout in seconds (default: `120`)
21
+ - `HTM_TAG_TIMEOUT` - Tag generation timeout in seconds (default: `180`)
22
+ - `HTM_CONNECTION_TIMEOUT` - LLM connection timeout in seconds (default: `30`)
23
+
24
+ ### Database Settings
25
+
26
+ - `HTM_DBURL` - Full PostgreSQL connection URL (required unless the individual settings below are provided)
27
+ - Or individual settings: `HTM_DBHOST`, `HTM_DBPORT`, `HTM_DBNAME`, `HTM_DBUSER`, `HTM_DBPASS`
28
+
29
+ ### Other Settings
30
+
31
+ - `HTM_ROBOT_NAME` - Name for the seeding robot (default: `"Seed Robot"`)
32
+
33
+ ## Format
34
+
35
+ All `.md` files in this directory will be automatically processed by `db/seeds.rb`.
36
+
37
+ Each markdown file should follow this structure:
38
+
39
+ ```markdown
40
+ # Title (optional, will be ignored)
41
+
42
+ ## Section Name 1
43
+ Paragraph of content for this section. This entire paragraph will be stored
44
+ as a single memory node in HTM.
45
+
46
+ ## Section Name 2
47
+ Another paragraph of content. Each ## header denotes a new section that will
48
+ become a separate memory node.
49
+
50
+ ## Section Name 3
51
+ Content for the third section...
52
+ ```
53
+
54
+ ## Processing
55
+
56
+ The seeding script (`db/seeds.rb`):
57
+
58
+ 1. Reads all `*.md` files from this directory
59
+ 2. Parses each file looking for `## Header` sections
60
+ 3. Extracts the paragraph(s) following each header
61
+ 4. Creates an HTM memory node for each section
62
+ 5. Uses the filename (without `.md`) as the `source` field
63
+ 6. Automatically generates embeddings and hierarchical tags for each node
64
+
65
+ ## Current Seed Data
66
+
67
+ - **states.md**: Interesting facts about all 50 US states
68
+ - **presidents.md**: Interesting facts about all 45 US presidents
69
+
70
+ ## Adding New Seed Data
71
+
72
+ To add new seed data:
73
+
74
+ 1. Create a new `.md` file in this directory
75
+ 2. Follow the format above with `## Header` sections
76
+ 3. Run `rake htm:db:seed` to populate the database
77
+
78
+ The filename (without extension) will be used as the source identifier for all nodes
79
+ created from that file.
80
+
81
+ ## Example
82
+
83
+ Given a file `countries.md`:
84
+
85
+ ```markdown
86
+ # World Countries
87
+
88
+ ## France
89
+ France is known for the Eiffel Tower...
90
+
91
+ ## Japan
92
+ Japan is an island nation...
93
+ ```
94
+
95
+ Running `rake htm:db:seed` will create:
96
+ - 2 memory nodes
97
+ - Both with `source: "countries"`
98
+ - Each with embeddings (768-dimensional vectors)
99
+ - Each with hierarchical tags extracted by LLM
100
+ - Stored in the `nodes` table with full-text and vector search capabilities
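The processing steps in the README above (split each file on `## Header` lines, keep the text under each header, and use the filename without `.md` as the source) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual contents of `db/seeds.rb`: the `Section` struct and the `source_for` and `parse_seed_markdown` helpers are hypothetical names, and the sketch stops before embedding or tag generation.

```ruby
# A hypothetical parser for the seed file format described above.
# All names here are illustrative; the real db/seeds.rb may differ.
Section = Struct.new(:title, :content, :source)

# Derives the source identifier from a seed file path ("countries.md" -> "countries").
def source_for(path)
  File.basename(path, ".md")
end

# Splits markdown text into one Section per "## " header.
# Text before the first "## " header (such as the "# Title" line) is ignored.
def parse_seed_markdown(text, source)
  sections = []
  title = nil
  lines = []

  # Stores the section collected so far, skipping empty ones.
  flush = lambda do
    body = lines.join("\n").strip
    sections << Section.new(title, body, source) if title && !body.empty?
    lines = []
  end

  text.each_line(chomp: true) do |line|
    if line.start_with?("## ")
      flush.call
      title = line.sub(/\A##\s+/, "").strip
    elsif title
      lines << line
    end
  end
  flush.call

  sections
end
```

Applied to the `countries.md` example, `parse_seed_markdown(File.read(path), source_for(path))` would yield two sections, titled `France` and `Japan`, both with the source `countries`. Each section would then be handed to whatever call creates a memory node.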