htm 0.0.1
- checksums.yaml +7 -0
- data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
- data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
- data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
- data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
- data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
- data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
- data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
- data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
- data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
- data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
- data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
- data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
- data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
- data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
- data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
- data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
- data/.architecture/members.yml +144 -0
- data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
- data/.architecture/reviews/initial-system-analysis.md +330 -0
- data/.envrc +32 -0
- data/.irbrc +145 -0
- data/CHANGELOG.md +150 -0
- data/COMMITS.md +196 -0
- data/LICENSE +21 -0
- data/README.md +1347 -0
- data/Rakefile +51 -0
- data/SETUP.md +268 -0
- data/config/database.yml +67 -0
- data/db/migrate/20250101000001_enable_extensions.rb +14 -0
- data/db/migrate/20250101000002_create_robots.rb +14 -0
- data/db/migrate/20250101000003_create_nodes.rb +42 -0
- data/db/migrate/20250101000005_create_tags.rb +38 -0
- data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
- data/db/schema.sql +473 -0
- data/db/seed_data/README.md +100 -0
- data/db/seed_data/presidents.md +136 -0
- data/db/seed_data/states.md +151 -0
- data/db/seeds.rb +208 -0
- data/dbdoc/README.md +173 -0
- data/dbdoc/public.node_stats.md +48 -0
- data/dbdoc/public.node_stats.svg +41 -0
- data/dbdoc/public.node_tags.md +40 -0
- data/dbdoc/public.node_tags.svg +112 -0
- data/dbdoc/public.nodes.md +54 -0
- data/dbdoc/public.nodes.svg +118 -0
- data/dbdoc/public.nodes_tags.md +39 -0
- data/dbdoc/public.nodes_tags.svg +112 -0
- data/dbdoc/public.ontology_structure.md +48 -0
- data/dbdoc/public.ontology_structure.svg +38 -0
- data/dbdoc/public.operations_log.md +42 -0
- data/dbdoc/public.operations_log.svg +130 -0
- data/dbdoc/public.relationships.md +39 -0
- data/dbdoc/public.relationships.svg +41 -0
- data/dbdoc/public.robot_activity.md +46 -0
- data/dbdoc/public.robot_activity.svg +35 -0
- data/dbdoc/public.robots.md +35 -0
- data/dbdoc/public.robots.svg +90 -0
- data/dbdoc/public.schema_migrations.md +29 -0
- data/dbdoc/public.schema_migrations.svg +26 -0
- data/dbdoc/public.tags.md +35 -0
- data/dbdoc/public.tags.svg +60 -0
- data/dbdoc/public.topic_relationships.md +45 -0
- data/dbdoc/public.topic_relationships.svg +32 -0
- data/dbdoc/schema.json +1437 -0
- data/dbdoc/schema.svg +154 -0
- data/docs/api/database.md +806 -0
- data/docs/api/embedding-service.md +532 -0
- data/docs/api/htm.md +797 -0
- data/docs/api/index.md +259 -0
- data/docs/api/long-term-memory.md +1096 -0
- data/docs/api/working-memory.md +665 -0
- data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
- data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
- data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
- data/docs/architecture/adrs/004-hive-mind.md +437 -0
- data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
- data/docs/architecture/adrs/006-context-assembly.md +496 -0
- data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
- data/docs/architecture/adrs/008-robot-identification.md +625 -0
- data/docs/architecture/adrs/009-never-forget.md +648 -0
- data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
- data/docs/architecture/adrs/011-pgai-integration.md +494 -0
- data/docs/architecture/adrs/index.md +215 -0
- data/docs/architecture/hive-mind.md +736 -0
- data/docs/architecture/index.md +351 -0
- data/docs/architecture/overview.md +538 -0
- data/docs/architecture/two-tier-memory.md +873 -0
- data/docs/assets/css/custom.css +83 -0
- data/docs/assets/images/htm-core-components.svg +63 -0
- data/docs/assets/images/htm-database-schema.svg +93 -0
- data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
- data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
- data/docs/assets/images/htm-layered-architecture.svg +71 -0
- data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
- data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
- data/docs/assets/images/htm.jpg +0 -0
- data/docs/assets/images/htm_demo.gif +0 -0
- data/docs/assets/js/mathjax.js +18 -0
- data/docs/assets/videos/htm_video.mp4 +0 -0
- data/docs/database_rake_tasks.md +322 -0
- data/docs/development/contributing.md +787 -0
- data/docs/development/index.md +336 -0
- data/docs/development/schema.md +596 -0
- data/docs/development/setup.md +719 -0
- data/docs/development/testing.md +819 -0
- data/docs/guides/adding-memories.md +824 -0
- data/docs/guides/context-assembly.md +1009 -0
- data/docs/guides/getting-started.md +577 -0
- data/docs/guides/index.md +118 -0
- data/docs/guides/long-term-memory.md +941 -0
- data/docs/guides/multi-robot.md +866 -0
- data/docs/guides/recalling-memories.md +927 -0
- data/docs/guides/search-strategies.md +953 -0
- data/docs/guides/working-memory.md +717 -0
- data/docs/index.md +214 -0
- data/docs/installation.md +477 -0
- data/docs/multi_framework_support.md +519 -0
- data/docs/quick-start.md +655 -0
- data/docs/setup_local_database.md +302 -0
- data/docs/using_rake_tasks_in_your_app.md +383 -0
- data/examples/basic_usage.rb +93 -0
- data/examples/cli_app/README.md +317 -0
- data/examples/cli_app/htm_cli.rb +270 -0
- data/examples/custom_llm_configuration.rb +183 -0
- data/examples/example_app/Rakefile +71 -0
- data/examples/example_app/app.rb +206 -0
- data/examples/sinatra_app/Gemfile +21 -0
- data/examples/sinatra_app/app.rb +335 -0
- data/lib/htm/active_record_config.rb +113 -0
- data/lib/htm/configuration.rb +342 -0
- data/lib/htm/database.rb +594 -0
- data/lib/htm/embedding_service.rb +115 -0
- data/lib/htm/errors.rb +34 -0
- data/lib/htm/job_adapter.rb +154 -0
- data/lib/htm/jobs/generate_embedding_job.rb +65 -0
- data/lib/htm/jobs/generate_tags_job.rb +82 -0
- data/lib/htm/long_term_memory.rb +965 -0
- data/lib/htm/models/node.rb +109 -0
- data/lib/htm/models/node_tag.rb +33 -0
- data/lib/htm/models/robot.rb +52 -0
- data/lib/htm/models/tag.rb +76 -0
- data/lib/htm/railtie.rb +76 -0
- data/lib/htm/sinatra.rb +157 -0
- data/lib/htm/tag_service.rb +135 -0
- data/lib/htm/tasks.rb +38 -0
- data/lib/htm/version.rb +5 -0
- data/lib/htm/working_memory.rb +182 -0
- data/lib/htm.rb +400 -0
- data/lib/tasks/db.rake +19 -0
- data/lib/tasks/htm.rake +147 -0
- data/lib/tasks/jobs.rake +312 -0
- data/mkdocs.yml +190 -0
- data/scripts/install_local_database.sh +309 -0
- metadata +341 -0
data/db/schema.sql
ADDED
@@ -0,0 +1,473 @@
+-- HTM Database Schema
+-- Auto-generated from database using pg_dump
+-- DO NOT EDIT THIS FILE MANUALLY
+-- Run 'rake htm:db:schema:dump' to regenerate
+
+--
+-- Name: pg_trgm; Type: EXTENSION; Schema: -; Owner: -
+--
+
+CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public;
+
+--
+-- Name: EXTENSION pg_trgm; Type: COMMENT; Schema: -; Owner: -
+--
+
+--
+-- Name: vector; Type: EXTENSION; Schema: -; Owner: -
+--
+
+CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA public;
+
+--
+-- Name: EXTENSION vector; Type: COMMENT; Schema: -; Owner: -
+--
+
+--
+-- Name: node_tags; Type: TABLE; Schema: public; Owner: -
+--
+
+CREATE TABLE public.node_tags (
+    id bigint NOT NULL,
+    node_id bigint NOT NULL,
+    tag_id bigint NOT NULL,
+    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
+);
+
+--
+-- Name: TABLE node_tags; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON TABLE public.node_tags IS 'Join table connecting nodes to tags (many-to-many)';
+
+--
+-- Name: COLUMN node_tags.node_id; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.node_tags.node_id IS 'ID of the node being tagged';
+
+--
+-- Name: COLUMN node_tags.tag_id; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.node_tags.tag_id IS 'ID of the tag being applied';
+
+--
+-- Name: COLUMN node_tags.created_at; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.node_tags.created_at IS 'When this association was created';
+
+--
+-- Name: node_tags_id_seq; Type: SEQUENCE; Schema: public; Owner: -
+--
+
+CREATE SEQUENCE public.node_tags_id_seq
+    START WITH 1
+    INCREMENT BY 1
+    NO MINVALUE
+    NO MAXVALUE
+    CACHE 1;
+
+--
+-- Name: node_tags_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
+--
+
+ALTER SEQUENCE public.node_tags_id_seq OWNED BY public.node_tags.id;
+
+--
+-- Name: nodes; Type: TABLE; Schema: public; Owner: -
+--
+
+CREATE TABLE public.nodes (
+    id bigint NOT NULL,
+    content text NOT NULL,
+    source text DEFAULT ''::text,
+    access_count integer DEFAULT 0 NOT NULL,
+    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
+    updated_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
+    last_accessed timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
+    token_count integer,
+    in_working_memory boolean DEFAULT false,
+    robot_id bigint NOT NULL,
+    embedding public.vector(2000),
+    embedding_dimension integer,
+    CONSTRAINT check_embedding_dimension CHECK (((embedding_dimension IS NULL) OR ((embedding_dimension > 0) AND (embedding_dimension <= 2000))))
+);
+
+--
+-- Name: TABLE nodes; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON TABLE public.nodes IS 'Core memory storage for conversation messages and context';
+
+--
+-- Name: COLUMN nodes.content; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.content IS 'The conversation message/utterance content';
+
+--
+-- Name: COLUMN nodes.source; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.source IS 'From where the content came (empty string if unknown)';
+
+--
+-- Name: COLUMN nodes.access_count; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.access_count IS 'Number of times this node has been accessed/retrieved';
+
+--
+-- Name: COLUMN nodes.created_at; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.created_at IS 'When this memory was created';
+
+--
+-- Name: COLUMN nodes.updated_at; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.updated_at IS 'When this memory was last modified';
+
+--
+-- Name: COLUMN nodes.last_accessed; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.last_accessed IS 'When this memory was last accessed';
+
+--
+-- Name: COLUMN nodes.token_count; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.token_count IS 'Number of tokens in the content (for context budget management)';
+
+--
+-- Name: COLUMN nodes.in_working_memory; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.in_working_memory IS 'Whether this memory is currently in working memory';
+
+--
+-- Name: COLUMN nodes.robot_id; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.robot_id IS 'ID of the robot that owns this memory';
+
+--
+-- Name: COLUMN nodes.embedding; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.embedding IS 'Vector embedding (max 2000 dimensions) for semantic search';
+
+--
+-- Name: COLUMN nodes.embedding_dimension; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.nodes.embedding_dimension IS 'Actual number of dimensions used in the embedding vector (max 2000)';
+
+--
+-- Name: nodes_id_seq; Type: SEQUENCE; Schema: public; Owner: -
+--
+
+CREATE SEQUENCE public.nodes_id_seq
+    START WITH 1
+    INCREMENT BY 1
+    NO MINVALUE
+    NO MAXVALUE
+    CACHE 1;
+
+--
+-- Name: nodes_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
+--
+
+ALTER SEQUENCE public.nodes_id_seq OWNED BY public.nodes.id;
+
+--
+-- Name: robots; Type: TABLE; Schema: public; Owner: -
+--
+
+CREATE TABLE public.robots (
+    id bigint NOT NULL,
+    name text,
+    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
+    last_active timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
+    metadata jsonb
+);
+
+--
+-- Name: TABLE robots; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON TABLE public.robots IS 'Registry of all LLM robots using the HTM system';
+
+--
+-- Name: COLUMN robots.name; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.robots.name IS 'Human-readable name for the robot';
+
+--
+-- Name: COLUMN robots.created_at; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.robots.created_at IS 'When the robot was first registered';
+
+--
+-- Name: COLUMN robots.last_active; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.robots.last_active IS 'Last time the robot accessed the system';
+
+--
+-- Name: COLUMN robots.metadata; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.robots.metadata IS 'Robot-specific configuration and metadata';
+
+--
+-- Name: robots_id_seq; Type: SEQUENCE; Schema: public; Owner: -
+--
+
+CREATE SEQUENCE public.robots_id_seq
+    START WITH 1
+    INCREMENT BY 1
+    NO MINVALUE
+    NO MAXVALUE
+    CACHE 1;
+
+--
+-- Name: robots_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
+--
+
+ALTER SEQUENCE public.robots_id_seq OWNED BY public.robots.id;
+
+--
+-- Name: schema_migrations; Type: TABLE; Schema: public; Owner: -
+--
+
+CREATE TABLE public.schema_migrations (
+    version character varying NOT NULL
+);
+
+--
+-- Name: tags; Type: TABLE; Schema: public; Owner: -
+--
+
+CREATE TABLE public.tags (
+    id bigint NOT NULL,
+    name text NOT NULL,
+    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
+);
+
+--
+-- Name: TABLE tags; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON TABLE public.tags IS 'Unique tag names for categorization';
+
+--
+-- Name: COLUMN tags.name; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.tags.name IS 'Hierarchical tag in format: root:level1:level2 (e.g., database:postgresql:timescaledb)';
+
+--
+-- Name: COLUMN tags.created_at; Type: COMMENT; Schema: public; Owner: -
+--
+
+COMMENT ON COLUMN public.tags.created_at IS 'When this tag was created';
+
+--
+-- Name: tags_id_seq; Type: SEQUENCE; Schema: public; Owner: -
+--
+
+CREATE SEQUENCE public.tags_id_seq
+    START WITH 1
+    INCREMENT BY 1
+    NO MINVALUE
+    NO MAXVALUE
+    CACHE 1;
+
+--
+-- Name: tags_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: -
+--
+
+ALTER SEQUENCE public.tags_id_seq OWNED BY public.tags.id;
+
+--
+-- Name: node_tags id; Type: DEFAULT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.node_tags ALTER COLUMN id SET DEFAULT nextval('public.node_tags_id_seq'::regclass);
+
+--
+-- Name: nodes id; Type: DEFAULT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.nodes ALTER COLUMN id SET DEFAULT nextval('public.nodes_id_seq'::regclass);
+
+--
+-- Name: robots id; Type: DEFAULT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.robots ALTER COLUMN id SET DEFAULT nextval('public.robots_id_seq'::regclass);
+
+--
+-- Name: tags id; Type: DEFAULT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.tags ALTER COLUMN id SET DEFAULT nextval('public.tags_id_seq'::regclass);
+
+--
+-- Name: node_tags node_tags_pkey; Type: CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.node_tags
+    ADD CONSTRAINT node_tags_pkey PRIMARY KEY (id);
+
+--
+-- Name: nodes nodes_pkey; Type: CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.nodes
+    ADD CONSTRAINT nodes_pkey PRIMARY KEY (id);
+
+--
+-- Name: robots robots_pkey; Type: CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.robots
+    ADD CONSTRAINT robots_pkey PRIMARY KEY (id);
+
+--
+-- Name: schema_migrations schema_migrations_pkey; Type: CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.schema_migrations
+    ADD CONSTRAINT schema_migrations_pkey PRIMARY KEY (version);
+
+--
+-- Name: tags tags_pkey; Type: CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.tags
+    ADD CONSTRAINT tags_pkey PRIMARY KEY (id);
+
+--
+-- Name: idx_node_tags_node_id; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_node_tags_node_id ON public.node_tags USING btree (node_id);
+
+--
+-- Name: idx_node_tags_tag_id; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_node_tags_tag_id ON public.node_tags USING btree (tag_id);
+
+--
+-- Name: idx_node_tags_unique; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE UNIQUE INDEX idx_node_tags_unique ON public.node_tags USING btree (node_id, tag_id);
+
+--
+-- Name: idx_nodes_access_count; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_access_count ON public.nodes USING btree (access_count);
+
+--
+-- Name: idx_nodes_content_gin; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_content_gin ON public.nodes USING gin (to_tsvector('english'::regconfig, content));
+
+--
+-- Name: idx_nodes_content_trgm; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_content_trgm ON public.nodes USING gin (content public.gin_trgm_ops);
+
+--
+-- Name: idx_nodes_created_at; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_created_at ON public.nodes USING btree (created_at);
+
+--
+-- Name: idx_nodes_embedding; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_embedding ON public.nodes USING hnsw (embedding public.vector_cosine_ops) WITH (m='16', ef_construction='64');
+
+--
+-- Name: idx_nodes_in_working_memory; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_in_working_memory ON public.nodes USING btree (in_working_memory);
+
+--
+-- Name: idx_nodes_last_accessed; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_last_accessed ON public.nodes USING btree (last_accessed);
+
+--
+-- Name: idx_nodes_robot_id; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_robot_id ON public.nodes USING btree (robot_id);
+
+--
+-- Name: idx_nodes_source; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_source ON public.nodes USING btree (source);
+
+--
+-- Name: idx_nodes_updated_at; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_nodes_updated_at ON public.nodes USING btree (updated_at);
+
+--
+-- Name: idx_tags_name_pattern; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE INDEX idx_tags_name_pattern ON public.tags USING btree (name text_pattern_ops);
+
+--
+-- Name: idx_tags_name_unique; Type: INDEX; Schema: public; Owner: -
+--
+
+CREATE UNIQUE INDEX idx_tags_name_unique ON public.tags USING btree (name);
+
+--
+-- Name: nodes fk_rails_60162e9d3a; Type: FK CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.nodes
+    ADD CONSTRAINT fk_rails_60162e9d3a FOREIGN KEY (robot_id) REFERENCES public.robots(id) ON DELETE CASCADE;
+
+--
+-- Name: node_tags fk_rails_b51cdcc57f; Type: FK CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.node_tags
+    ADD CONSTRAINT fk_rails_b51cdcc57f FOREIGN KEY (tag_id) REFERENCES public.tags(id) ON DELETE CASCADE;
+
+--
+-- Name: node_tags fk_rails_ebc9aafd9f; Type: FK CONSTRAINT; Schema: public; Owner: -
+--
+
+ALTER TABLE ONLY public.node_tags
+    ADD CONSTRAINT fk_rails_ebc9aafd9f FOREIGN KEY (node_id) REFERENCES public.nodes(id) ON DELETE CASCADE;
+
+--
+-- PostgreSQL database dump complete
+--
+
+\unrestrict f5a75Zsnuw7NUeDmu1kxeQ3pRMbaORhrsWHJyDdXV4wbRfzQweTumJBXu85kf1z
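
An editorial illustration of the `tags.name` comment in the schema: tags are stored as colon-separated paths such as `database:postgresql:timescaledb`, and `idx_tags_name_pattern` is a btree index using `text_pattern_ops`, an operator class PostgreSQL can use for left-anchored `LIKE` patterns. The Ruby sketch below shows one way a client might expand a tag into its ancestor paths and build a prefix pattern for descendant lookups. The helper names `tag_prefixes` and `descendant_pattern` are hypothetical and are not taken from the gem.

```ruby
# Hypothetical helpers for the root:level1:level2 tag format.
# These names are illustrative assumptions, not part of the htm gem's API.

# Return every ancestor path of a hierarchical tag, from the root down to
# the full tag itself.
def tag_prefixes(tag)
  parts = tag.split(":")
  (1..parts.length).map { |n| parts.first(n).join(":") }
end

# Build a left-anchored LIKE pattern matching the descendants of a tag.
# Backslash, % and _ are escaped so they are treated as literal characters
# (backslash is PostgreSQL's default LIKE escape character).
def descendant_pattern(tag)
  escaped = tag.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "#{escaped}:%"
end
```

A query such as `SELECT id FROM tags WHERE name = $1 OR name LIKE $2`, with the second parameter produced by `descendant_pattern`, would match a tag and everything beneath it. Whether the gem queries tags this way is not shown in this diff.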
data/db/seed_data/README.md
ADDED
@@ -0,0 +1,100 @@
+# HTM Seed Data
+
+This directory contains markdown files used to seed the HTM database with sample data.
+
+## Configuration
+
+The seeding process uses environment variables for configuration. All settings have sensible defaults:
+
+### LLM Provider Settings
+
+- `HTM_EMBEDDING_PROVIDER` - Embedding provider (default: `ollama`)
+- `HTM_EMBEDDING_MODEL` - Embedding model (default: `nomic-embed-text`)
+- `HTM_EMBEDDING_DIMENSIONS` - Embedding dimensions (default: `768`)
+- `HTM_TAG_PROVIDER` - Tag extraction provider (default: `ollama`)
+- `HTM_TAG_MODEL` - Tag extraction model (default: `gemma3`)
+- `OLLAMA_URL` - Ollama server URL (default: `http://localhost:11434`)
+
+### Timeout Settings
+
+- `HTM_EMBEDDING_TIMEOUT` - Embedding generation timeout in seconds (default: `120`)
+- `HTM_TAG_TIMEOUT` - Tag generation timeout in seconds (default: `180`)
+- `HTM_CONNECTION_TIMEOUT` - LLM connection timeout in seconds (default: `30`)
+
+### Database Settings
+
+- `HTM_DBURL` - Full PostgreSQL connection URL (required)
+- Or individual settings: `HTM_DBHOST`, `HTM_DBPORT`, `HTM_DBNAME`, `HTM_DBUSER`, `HTM_DBPASS`
+
+### Other Settings
+
+- `HTM_ROBOT_NAME` - Name for the seeding robot (default: `"Seed Robot"`)
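
An editorial sketch of how the settings listed above could be resolved in Ruby. The variable names and defaults are the ones this README documents. The method name `seed_settings` and the layout of the returned hash are assumptions made for illustration, not the gem's actual configuration code.

```ruby
# Hypothetical sketch: resolve seeding settings from an environment-like
# object (ENV by default, or a plain Hash in tests). Names and defaults are
# copied from the README; the method itself is not part of the htm gem.
def seed_settings(env = ENV)
  {
    embedding_provider:   env.fetch("HTM_EMBEDDING_PROVIDER", "ollama"),
    embedding_model:      env.fetch("HTM_EMBEDDING_MODEL", "nomic-embed-text"),
    embedding_dimensions: Integer(env.fetch("HTM_EMBEDDING_DIMENSIONS", "768")),
    tag_provider:         env.fetch("HTM_TAG_PROVIDER", "ollama"),
    tag_model:            env.fetch("HTM_TAG_MODEL", "gemma3"),
    ollama_url:           env.fetch("OLLAMA_URL", "http://localhost:11434"),
    # Timeouts are documented in seconds.
    embedding_timeout:    Integer(env.fetch("HTM_EMBEDDING_TIMEOUT", "120")),
    tag_timeout:          Integer(env.fetch("HTM_TAG_TIMEOUT", "180")),
    connection_timeout:   Integer(env.fetch("HTM_CONNECTION_TIMEOUT", "30")),
    robot_name:           env.fetch("HTM_ROBOT_NAME", "Seed Robot"),
    # No default is documented for the database URL, so it may be nil here.
    database_url:         env["HTM_DBURL"]
  }
end
```

Calling `seed_settings` with no arguments reads from the process environment. Passing a hash such as `{ "HTM_TAG_MODEL" => "other-model" }` overrides a single value, which is convenient in tests.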
+
+## Format
+
+All `.md` files in this directory will be automatically processed by `db/seeds.rb`.
+
+Each markdown file should follow this structure:
+
+```markdown
+# Title (optional, will be ignored)
+
+## Section Name 1
+Paragraph of content for this section. This entire paragraph will be stored
+as a single memory node in HTM.
+
+## Section Name 2
+Another paragraph of content. Each ## header denotes a new section that will
+become a separate memory node.
+
+## Section Name 3
+Content for the third section...
+```
+
+## Processing
+
+The seeding script (`db/seeds.rb`):
+
+1. Reads all `*.md` files from this directory
+2. Parses each file looking for `## Header` sections
+3. Extracts the paragraph(s) following each header
+4. Creates an HTM memory node for each section
+5. Uses the filename (without `.md`) as the `source` field
+6. Automatically generates embeddings and hierarchical tags for each node
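
An editorial sketch of steps 1 through 5 above, written in plain Ruby. It is not the contents of `db/seeds.rb`, which this diff does not show. The names `parse_sections` and `load_seed_dir` and the hash layout are hypothetical. Step 6 (embeddings and tags) is left out because it depends on the gem's services.

```ruby
# Hypothetical sketch of the seed parsing steps; not the gem's db/seeds.rb.

# Split one markdown document into sections, one per "## Header" line.
# Text before the first "## " header (for example a "# Title") is ignored.
def parse_sections(markdown, source)
  sections = []
  current = nil
  markdown.each_line do |line|
    if (match = line.match(/\A##\s+(.+?)\s*\z/))
      current = { source: source, header: match[1], lines: [] }
      sections << current
    elsif current
      current[:lines] << line.chomp
    end
  end
  sections
    .map { |s| { source: s[:source], header: s[:header], content: s[:lines].join("\n").strip } }
    .reject { |s| s[:content].empty? }
end

# Read every *.md file in a directory, using the filename without ".md"
# as the source identifier for all sections from that file.
def load_seed_dir(dir)
  Dir.glob(File.join(dir, "*.md")).sort.flat_map do |path|
    parse_sections(File.read(path), File.basename(path, ".md"))
  end
end
```

Each returned hash corresponds to one would-be memory node. A real seeding script would pass `content` and `source` on to whatever call creates a node and leave embedding and tag generation to the library.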
+
+## Current Seed Data
+
+- **states.md**: Interesting facts about all 50 US states
+- **presidents.md**: Interesting facts about all 45 US presidents
+
+## Adding New Seed Data
+
+To add new seed data:
+
+1. Create a new `.md` file in this directory
+2. Follow the format above with `## Header` sections
+3. Run `rake htm:db:seed` to populate the database
+
+The filename (without extension) will be used as the source identifier for all nodes
+created from that file.
+
+## Example
+
+Given a file `countries.md`:
+
+```markdown
+# World Countries
+
+## France
+France is known for the Eiffel Tower...
+
+## Japan
+Japan is an island nation...
+```
+
+Running `rake htm:db:seed` will create:
+- 2 memory nodes
+- Both with `source: "countries"`
+- Each with embeddings (768-dimensional vectors)
+- Each with hierarchical tags extracted by LLM
+- Stored in the `nodes` table with full-text and vector search capabilities