htm 0.0.1

Files changed (155)
  1. checksums.yaml +7 -0
  2. data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
  3. data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
  4. data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
  5. data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
  6. data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
  7. data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
  8. data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
  9. data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
  10. data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
  11. data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
  12. data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
  13. data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
  14. data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
  15. data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
  16. data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
  17. data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
  18. data/.architecture/members.yml +144 -0
  19. data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
  20. data/.architecture/reviews/initial-system-analysis.md +330 -0
  21. data/.envrc +32 -0
  22. data/.irbrc +145 -0
  23. data/CHANGELOG.md +150 -0
  24. data/COMMITS.md +196 -0
  25. data/LICENSE +21 -0
  26. data/README.md +1347 -0
  27. data/Rakefile +51 -0
  28. data/SETUP.md +268 -0
  29. data/config/database.yml +67 -0
  30. data/db/migrate/20250101000001_enable_extensions.rb +14 -0
  31. data/db/migrate/20250101000002_create_robots.rb +14 -0
  32. data/db/migrate/20250101000003_create_nodes.rb +42 -0
  33. data/db/migrate/20250101000005_create_tags.rb +38 -0
  34. data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
  35. data/db/schema.sql +473 -0
  36. data/db/seed_data/README.md +100 -0
  37. data/db/seed_data/presidents.md +136 -0
  38. data/db/seed_data/states.md +151 -0
  39. data/db/seeds.rb +208 -0
  40. data/dbdoc/README.md +173 -0
  41. data/dbdoc/public.node_stats.md +48 -0
  42. data/dbdoc/public.node_stats.svg +41 -0
  43. data/dbdoc/public.node_tags.md +40 -0
  44. data/dbdoc/public.node_tags.svg +112 -0
  45. data/dbdoc/public.nodes.md +54 -0
  46. data/dbdoc/public.nodes.svg +118 -0
  47. data/dbdoc/public.nodes_tags.md +39 -0
  48. data/dbdoc/public.nodes_tags.svg +112 -0
  49. data/dbdoc/public.ontology_structure.md +48 -0
  50. data/dbdoc/public.ontology_structure.svg +38 -0
  51. data/dbdoc/public.operations_log.md +42 -0
  52. data/dbdoc/public.operations_log.svg +130 -0
  53. data/dbdoc/public.relationships.md +39 -0
  54. data/dbdoc/public.relationships.svg +41 -0
  55. data/dbdoc/public.robot_activity.md +46 -0
  56. data/dbdoc/public.robot_activity.svg +35 -0
  57. data/dbdoc/public.robots.md +35 -0
  58. data/dbdoc/public.robots.svg +90 -0
  59. data/dbdoc/public.schema_migrations.md +29 -0
  60. data/dbdoc/public.schema_migrations.svg +26 -0
  61. data/dbdoc/public.tags.md +35 -0
  62. data/dbdoc/public.tags.svg +60 -0
  63. data/dbdoc/public.topic_relationships.md +45 -0
  64. data/dbdoc/public.topic_relationships.svg +32 -0
  65. data/dbdoc/schema.json +1437 -0
  66. data/dbdoc/schema.svg +154 -0
  67. data/docs/api/database.md +806 -0
  68. data/docs/api/embedding-service.md +532 -0
  69. data/docs/api/htm.md +797 -0
  70. data/docs/api/index.md +259 -0
  71. data/docs/api/long-term-memory.md +1096 -0
  72. data/docs/api/working-memory.md +665 -0
  73. data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
  74. data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
  75. data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
  76. data/docs/architecture/adrs/004-hive-mind.md +437 -0
  77. data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
  78. data/docs/architecture/adrs/006-context-assembly.md +496 -0
  79. data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
  80. data/docs/architecture/adrs/008-robot-identification.md +625 -0
  81. data/docs/architecture/adrs/009-never-forget.md +648 -0
  82. data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
  83. data/docs/architecture/adrs/011-pgai-integration.md +494 -0
  84. data/docs/architecture/adrs/index.md +215 -0
  85. data/docs/architecture/hive-mind.md +736 -0
  86. data/docs/architecture/index.md +351 -0
  87. data/docs/architecture/overview.md +538 -0
  88. data/docs/architecture/two-tier-memory.md +873 -0
  89. data/docs/assets/css/custom.css +83 -0
  90. data/docs/assets/images/htm-core-components.svg +63 -0
  91. data/docs/assets/images/htm-database-schema.svg +93 -0
  92. data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
  93. data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
  94. data/docs/assets/images/htm-layered-architecture.svg +71 -0
  95. data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
  96. data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
  97. data/docs/assets/images/htm.jpg +0 -0
  98. data/docs/assets/images/htm_demo.gif +0 -0
  99. data/docs/assets/js/mathjax.js +18 -0
  100. data/docs/assets/videos/htm_video.mp4 +0 -0
  101. data/docs/database_rake_tasks.md +322 -0
  102. data/docs/development/contributing.md +787 -0
  103. data/docs/development/index.md +336 -0
  104. data/docs/development/schema.md +596 -0
  105. data/docs/development/setup.md +719 -0
  106. data/docs/development/testing.md +819 -0
  107. data/docs/guides/adding-memories.md +824 -0
  108. data/docs/guides/context-assembly.md +1009 -0
  109. data/docs/guides/getting-started.md +577 -0
  110. data/docs/guides/index.md +118 -0
  111. data/docs/guides/long-term-memory.md +941 -0
  112. data/docs/guides/multi-robot.md +866 -0
  113. data/docs/guides/recalling-memories.md +927 -0
  114. data/docs/guides/search-strategies.md +953 -0
  115. data/docs/guides/working-memory.md +717 -0
  116. data/docs/index.md +214 -0
  117. data/docs/installation.md +477 -0
  118. data/docs/multi_framework_support.md +519 -0
  119. data/docs/quick-start.md +655 -0
  120. data/docs/setup_local_database.md +302 -0
  121. data/docs/using_rake_tasks_in_your_app.md +383 -0
  122. data/examples/basic_usage.rb +93 -0
  123. data/examples/cli_app/README.md +317 -0
  124. data/examples/cli_app/htm_cli.rb +270 -0
  125. data/examples/custom_llm_configuration.rb +183 -0
  126. data/examples/example_app/Rakefile +71 -0
  127. data/examples/example_app/app.rb +206 -0
  128. data/examples/sinatra_app/Gemfile +21 -0
  129. data/examples/sinatra_app/app.rb +335 -0
  130. data/lib/htm/active_record_config.rb +113 -0
  131. data/lib/htm/configuration.rb +342 -0
  132. data/lib/htm/database.rb +594 -0
  133. data/lib/htm/embedding_service.rb +115 -0
  134. data/lib/htm/errors.rb +34 -0
  135. data/lib/htm/job_adapter.rb +154 -0
  136. data/lib/htm/jobs/generate_embedding_job.rb +65 -0
  137. data/lib/htm/jobs/generate_tags_job.rb +82 -0
  138. data/lib/htm/long_term_memory.rb +965 -0
  139. data/lib/htm/models/node.rb +109 -0
  140. data/lib/htm/models/node_tag.rb +33 -0
  141. data/lib/htm/models/robot.rb +52 -0
  142. data/lib/htm/models/tag.rb +76 -0
  143. data/lib/htm/railtie.rb +76 -0
  144. data/lib/htm/sinatra.rb +157 -0
  145. data/lib/htm/tag_service.rb +135 -0
  146. data/lib/htm/tasks.rb +38 -0
  147. data/lib/htm/version.rb +5 -0
  148. data/lib/htm/working_memory.rb +182 -0
  149. data/lib/htm.rb +400 -0
  150. data/lib/tasks/db.rake +19 -0
  151. data/lib/tasks/htm.rake +147 -0
  152. data/lib/tasks/jobs.rake +312 -0
  153. data/mkdocs.yml +190 -0
  154. data/scripts/install_local_database.sh +309 -0
  155. metadata +341 -0
# ADR-015: Hierarchical Tag Ontology and LLM Extraction

**Status**: ~~Accepted (Manual) / Proposed (LLM)~~ **SUPERSEDED** (2025-10-29)

**Superseded By**: ADR-016 (Async Embedding and Tag Generation)

**Date**: 2025-10-29

**Decision Makers**: Dewayne VanHoozer, Claude (Anthropic)

---

## ⚠️ DECISION SUPERSEDED (2025-10-29)

**This ADR has been superseded by ADR-016.**

**Reason**: The manual-first, LLM-later approach has been replaced with automatic async LLM extraction via `TagService` and background jobs. Key changes:
- LLM tag extraction is now implemented (not future)
- Runs automatically via `GenerateTagsJob` background job
- Uses `TagService` class (parallel to `EmbeddingService`)
- No manual tagging step required

See [ADR-016: Async Embedding and Tag Generation](./016-async-embedding-and-tag-generation.md) for current architecture.

---

## Context (Historical)

HTM's tagging system enables organizing memories with hierarchical, namespace-based tags. Following the removal of database-side LLM extraction (ADR-012 reversal), the architecture for tag generation and ontology management needs clear documentation.

### Current State

**What Exists**:
- Many-to-many tagging via `nodes_tags` join table (ADR-013)
- Hierarchical namespace format: `root:level1:level2`
- Manual tag assignment via `add_tag()` method
- Tag queries and relationship analysis

**What's Missing**:
- Automatic tag extraction from content
- LLM-driven topic identification
- Tag normalization and merging
- Ontology evolution strategies

### The Ontology Vision

**Emerging Knowledge Structure**:
- Tags create navigable hierarchies across all memories
- Multiple classification paths for same content
- Complements vector embeddings (symbolic + sub-symbolic)
- Reveals patterns in knowledge base over time

**Example Ontology**:
```
ai:llm:embeddings
ai:llm:prompts
ai:rag:retrieval
database:postgresql:indexes
database:postgresql:pgvector
programming:ruby:activerecord
programming:ruby:gems
performance:optimization:database
```
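
Because every colon opens a new level, each tag implicitly names all of its ancestors, which is what makes prefix queries such as `database:` meaningful. A minimal pure-Ruby sketch (the `tag_ancestors` helper is hypothetical, not part of HTM) that expands a tag into every level of its path:

```ruby
# Hypothetical helper: expand a hierarchical tag into every prefix of its path.
def tag_ancestors(tag)
  parts = tag.split(':')
  # Re-join the first 1..n segments with colons
  (1..parts.length).map { |depth| parts.first(depth).join(':') }
end

p tag_ancestors('ai:llm:embeddings')
# => ["ai", "ai:llm", "ai:llm:embeddings"]
```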

---

## Decision

HTM will support **hierarchical tags with manual assignment now** and **LLM-driven extraction in the future**, using a **client-side extraction approach** that learns from existing ontology.

### Phase 1: Manual Tagging (Current - ACCEPTED)

**Implementation**:
```ruby
# Add single tag
ltm.add_tag(node_id: node.id, tag: 'database:postgresql')

# Add multiple tags during node creation
htm.add_message(
  "PostgreSQL with pgvector provides vector search",
  tags: [
    'database:postgresql',
    'database:pgvector',
    'ai:embeddings'
  ]
)

# Query nodes by tag
nodes = ltm.nodes_with_tag('database:postgresql')

# Query by tag prefix (hierarchical)
nodes = ltm.nodes_with_tag_prefix('database:') # All database-related
```

**Current API** (in `lib/htm/long_term_memory.rb`):
```ruby
class HTM::LongTermMemory
  # Add tag to existing node
  def add_tag(node_id:, tag:)
    tag_record = HTM::Models::Tag.find_or_create_by(name: tag)
    HTM::Models::NodeTag.create(
      node_id: node_id,
      tag_id: tag_record.id
    )
  end

  # Get all tags for a node
  def node_topics(node_id)
    HTM::Models::Tag
      .joins(:node_tags)
      .where(nodes_tags: { node_id: node_id })
      .order(:name)
      .pluck(:name)
  end

  # Find related topics by shared nodes
  def topic_relationships(min_shared_nodes: 2, limit: 50)
    result = ActiveRecord::Base.connection.select_all(
      <<~SQL
        SELECT t1.name AS topic1, t2.name AS topic2,
               COUNT(DISTINCT nt1.node_id) AS shared_nodes
        FROM tags t1
        JOIN nodes_tags nt1 ON t1.id = nt1.tag_id
        JOIN nodes_tags nt2 ON nt1.node_id = nt2.node_id
        JOIN tags t2 ON nt2.tag_id = t2.id
        WHERE t1.name < t2.name
        GROUP BY t1.name, t2.name
        HAVING COUNT(DISTINCT nt1.node_id) >= #{min_shared_nodes}
        ORDER BY shared_nodes DESC
        LIMIT #{limit}
      SQL
    )
    result.to_a
  end
end
```

### Phase 2: LLM Extraction (Future - PROPOSED)

**Client-Side Extraction Service**:
```ruby
require 'net/http'
require 'json'

class HTM::TopicExtractor
  def initialize(llm_provider: :ollama, model: 'llama3', base_url: nil)
    @provider = llm_provider
    @model = model
    @base_url = base_url || ENV['OLLAMA_URL'] || 'http://localhost:11434'
  end

  # Extract hierarchical topics from content
  # @param content [String] Text to analyze
  # @param existing_ontology [Array<String>] Current tags for context
  # @return [Array<String>] Extracted topic tags
  def extract_topics(content, existing_ontology: [])
    prompt = build_extraction_prompt(content, existing_ontology)
    response = call_llm(prompt)
    parse_and_validate_topics(response)
  end

  private

  def build_extraction_prompt(content, ontology_sample)
    ontology_context = if ontology_sample.any?
      "Existing ontology includes: #{ontology_sample.sample(20).join(', ')}"
    else
      "This is a new ontology - create appropriate hierarchical tags."
    end

    <<~PROMPT
      Extract hierarchical topic tags from the following text.

      #{ontology_context}

      Format: root:level1:level2:level3 (use colons to separate levels)
      Rules:
      - Use lowercase letters, numbers, and hyphens only
      - Maximum depth: 5 levels
      - Return 2-5 tags per text
      - Tags should be reusable and consistent
      - Prefer existing ontology tags when applicable

      Text: #{content}

      Return ONLY the topic tags, one per line, no explanations.
    PROMPT
  end

  def call_llm(prompt)
    case @provider
    when :ollama
      call_ollama(prompt)
    when :openai
      call_openai(prompt)
    else
      raise ArgumentError, "Unsupported LLM provider: #{@provider.inspect}"
    end
  end

  def call_ollama(prompt)
    uri = URI("#{@base_url}/api/generate")
    request = Net::HTTP::Post.new(uri)
    request['Content-Type'] = 'application/json'
    request.body = JSON.generate({
      model: @model,
      prompt: prompt,
      stream: false,
      system: 'You are a precise topic extraction system. Output only topic tags in hierarchical format: root:subtopic:detail'
    })

    response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end
    raise "Ollama request failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    JSON.parse(response.body)['response']
  end

  # Placeholder until an OpenAI client is chosen
  def call_openai(_prompt)
    raise NotImplementedError, 'OpenAI topic extraction is not implemented yet'
  end

  def parse_and_validate_topics(response)
    # Parse response (one tag per line)
    tags = response.to_s.split("\n").map(&:strip).reject(&:empty?)

    # Validate format: lowercase alphanumeric + hyphens, colon-separated levels.
    # \A and \z anchor the whole string; ^ and $ only anchor a single line in Ruby.
    valid_tags = tags.select do |tag|
      tag.match?(/\A[a-z0-9\-]+(:[a-z0-9\-]+)*\z/)
    end

    # Limit depth to 5 levels (at most 4 colons)
    valid_tags.select { |tag| tag.count(':') < 5 }.uniq
  end
end
```
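
LLMs do not always honor "one tag per line, no explanations"; replies often arrive as bullets or numbered lists with stray capitalization. A hedged sketch of a more forgiving parser (the `parse_llm_tags` name and the list-marker handling are assumptions, not part of this proposal) that strips common markers, lowercases, and applies the same depth limit. Its regex is slightly stricter than the one above: each segment must start with a letter or digit, so a bare `-` is rejected.

```ruby
# Each segment must start with a letter or digit, then letters, digits, or hyphens
TAG_FORMAT = /\A[a-z0-9][a-z0-9-]*(?::[a-z0-9][a-z0-9-]*)*\z/
MAX_DEPTH = 5

# Hypothetical helper: turn a raw LLM reply into validated, de-duplicated tags.
def parse_llm_tags(raw)
  raw.to_s.each_line.filter_map do |line|
    # Drop a leading bullet ("- ", "* ") or number ("1. ", "2) ")
    tag = line.strip.sub(/\A(?:[-*]|\d+[.)])\s+/, '').downcase
    next if tag.empty? || !tag.match?(TAG_FORMAT)
    next if tag.split(':').length > MAX_DEPTH

    tag
  end.uniq
end

reply = "1. Database:PostgreSQL\n- ai:embeddings\nHere are your tags!\n- ai:embeddings\n"
p parse_llm_tags(reply)
# => ["database:postgresql", "ai:embeddings"]
```

Lowercasing before validation is a design choice: it accepts `Database:PostgreSQL` instead of discarding it, at the cost of silently rewriting what the model returned.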

**Integration with HTM**:
```ruby
class HTM
  # Assumes HTM#initialize stores the auto_tag: and topic_extractor: options
  # in @auto_tag and @topic_extractor, so the constructor setting is the default.
  def add_message(content, speaker: 'user', auto_tag: @auto_tag, tags: [], **options)
    # Generate embedding
    embedding = @embedding_service.embed(content)

    # Extract topics if auto-tagging is enabled; otherwise keep the caller's tags
    if auto_tag && @topic_extractor
      existing_ontology = @ltm.all_tags # Sample for context
      tags = @topic_extractor.extract_topics(content, existing_ontology: existing_ontology)
    end

    # Create node (tags are not forwarded; they are attached below)
    node = @ltm.add(
      content: content,
      speaker: speaker,
      robot_id: @robot.id,
      embedding: embedding,
      **options
    )

    # Add tags
    tags.each do |tag|
      @ltm.add_tag(node_id: node.id, tag: tag)
    end

    node
  end
end
```

**Usage**:
```ruby
# Enable auto-tagging
htm = HTM.new(
  robot_name: 'CodeBot',
  auto_tag: true,
  topic_extractor: HTM::TopicExtractor.new(llm_provider: :ollama, model: 'llama3')
)

# Topics automatically extracted and added
node = htm.add_message("PostgreSQL supports vector similarity search via pgvector")
# Example auto-generated tags (actual LLM output varies):
# database:postgresql, database:pgvector, ai:vectors
```

---

## Rationale

### Why Hierarchical Tags?

**Structure and Flexibility**:
- ✅ Multiple abstraction levels: `ai` → `ai:llm` → `ai:llm:embeddings`
- ✅ Multiple classification paths: Same content can be `database:postgresql` AND `performance:optimization`
- ✅ Browsable hierarchy: Navigate from broad to specific
- ✅ Pattern recognition: See knowledge base emphasis by root tags

**Complementary to Vector Search**:
```ruby
# Filtered semantic search
results = ltm.search(
  query_embedding: query_embedding,
  tag_filter: 'database:postgresql' # Limit search scope by tag
)

# Discover cross-domain connections
similar_nodes = ltm.vector_search(node.embedding)
# Node about "database optimization" finds similar "ai model training"
# (both are optimization problems)
```

### Why Manual First, LLM Later?

**Start Simple** (Current):
- ✅ No LLM dependency for basic tagging
- ✅ User controls ontology evolution
- ✅ Predictable behavior
- ✅ Works offline

**Add Intelligence** (Future):
- ✅ Consistent automated tagging
- ✅ Discovers implicit topics
- ✅ Learns from existing ontology
- ✅ Scales to large knowledge bases

### Why Client-Side LLM Extraction?

Following ADR-014 pattern (client-side embedding):
- ✅ No database extension dependencies
- ✅ Easy debugging (Ruby stack traces)
- ✅ Flexible prompt engineering
- ✅ Can provide ontology context to LLM
- ✅ Testable and mockable

---

## Tag Hierarchy Guidelines

### Naming Conventions

**Format**: `root:level1:level2:level3:level4`

**Rules**:
- Lowercase letters, numbers, hyphens only
- Colon separates hierarchy levels
- Maximum 5 levels deep
- Use hyphens for multi-word terms: `natural-language-processing`
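
These rules can be enforced mechanically before a tag is stored. A minimal sketch, assuming normalization should lowercase, turn spaces and underscores into hyphens, drop other characters, and discard empty levels (the `normalize_tag` name is illustrative, not an HTM API):

```ruby
MAX_TAG_DEPTH = 5

# Hypothetical helper: coerce a free-form label into root:level1:... form.
# Returns nil when nothing valid remains or the tag exceeds MAX_TAG_DEPTH levels.
def normalize_tag(raw)
  segments = raw.split(':').map do |segment|
    s = segment.strip.downcase
    s = s.gsub(/[\s_]+/, '-')    # spaces and underscores become hyphens
    s = s.gsub(/[^a-z0-9-]/, '') # drop anything outside the allowed set
    s = s.squeeze('-')           # collapse repeated hyphens
    s.gsub(/\A-+|-+\z/, '')      # trim hyphens at either end
  end
  segments.reject!(&:empty?)
  return nil if segments.empty? || segments.length > MAX_TAG_DEPTH

  segments.join(':')
end

p normalize_tag('AI : Natural Language Processing') # => "ai:natural-language-processing"
p normalize_tag('Database:PostgreSQL_Extensions')   # => "database:postgresql-extensions"
p normalize_tag('a:b:c:d:e:f')                       # => nil
```

Dropping empty levels means `ai::llm` becomes `ai:llm`; rejecting such input outright would be an equally reasonable policy.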

**Examples**:
```
ai:llm:providers:anthropic
ai:llm:providers:openai
ai:llm:techniques:prompting
ai:llm:techniques:rag
ai:embeddings:models:ollama
ai:embeddings:models:openai
database:postgresql:extensions:pgvector
database:postgresql:extensions:pg-trgm
database:postgresql:performance:indexes
programming:ruby:gems:activerecord
programming:ruby:gems:pg
programming:ruby:testing:minitest
performance:optimization:database
performance:optimization:algorithms
```

### Root Category Suggestions

Common root tags for software knowledge bases:

- `ai` - Artificial intelligence, ML, LLM
- `database` - Databases, SQL, NoSQL
- `programming` - Languages, frameworks, libraries
- `architecture` - System design, patterns
- `performance` - Optimization, profiling
- `security` - Authentication, encryption, vulnerabilities
- `devops` - Deployment, CI/CD, infrastructure
- `testing` - Unit tests, integration tests, QA
- `documentation` - README, API docs, tutorials
- `tools` - CLI tools, IDEs, utilities
- `concepts` - General CS concepts, algorithms
- `business` - Domain logic, requirements, processes

### Tag Relationships

**Parent-Child** (via prefix):
```ruby
# Get all descendants of 'ai:llm' (any depth below it)
tags = Tag.where("name LIKE 'ai:llm:%'")
# Returns: ai:llm:embeddings, ai:llm:prompts, ai:llm:providers, etc.
```

**Siblings** (same prefix):
```ruby
# Get siblings of 'ai:llm:embeddings': direct children of its parent
parent = 'ai:llm'
# Bind parameters instead of interpolating into the SQL string
tags = Tag.where("name LIKE ? AND name NOT LIKE ?", "#{parent}:%", "#{parent}:%:%")
# Returns: ai:llm:embeddings, ai:llm:prompts, ai:llm:providers (includes the tag itself)
```
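
The same prefix logic can be checked without a database. A small in-memory sketch over a hypothetical array of tag names that separates all descendants, direct children, and siblings; unlike the SQL sibling query above, this `siblings` helper excludes the tag itself:

```ruby
TAGS = %w[
  ai:llm:embeddings ai:llm:prompts ai:llm:providers
  ai:llm:providers:anthropic ai:rag:retrieval database:postgresql
].freeze

# Every tag below the prefix, at any depth (same as LIKE 'prefix:%')
def descendants(tags, prefix)
  tags.select { |t| t.start_with?("#{prefix}:") }
end

# Only tags exactly one level below the prefix
def direct_children(tags, prefix)
  depth = prefix.count(':') + 1
  descendants(tags, prefix).select { |t| t.count(':') == depth }
end

# Other tags sharing the same parent; root tags have no parent here
def siblings(tags, tag)
  parent = tag.split(':')[0...-1].join(':')
  return [] if parent.empty?

  direct_children(tags, parent) - [tag]
end

p descendants(TAGS, 'ai:llm').size    # => 4
p siblings(TAGS, 'ai:llm:embeddings') # => ["ai:llm:prompts", "ai:llm:providers"]
```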

**Related Topics** (co-occurrence):
```ruby
# Find topics that frequently appear together
ltm.topic_relationships(min_shared_nodes: 5)
# Example: 'database:postgresql' often appears with 'performance:optimization'
```
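
`topic_relationships` counts, for every pair of tags, how many nodes carry both. The same computation in plain Ruby, sketched over a hypothetical `node_id => tags` hash, mirrors the SQL's `t1.name < t2.name` condition: sorting each node's tags and taking 2-combinations counts every unordered pair exactly once per node.

```ruby
# Count how many nodes share each unordered pair of tags.
def co_occurrence(node_tags, min_shared_nodes: 2)
  counts = Hash.new(0)
  node_tags.each_value do |tags|
    # uniq + sort: each pair appears once per node, in (smaller, larger) order
    tags.uniq.sort.combination(2) { |pair| counts[pair] += 1 }
  end
  counts.select { |_pair, n| n >= min_shared_nodes }
        .sort_by { |pair, n| [-n, pair] }
end

node_tags = {
  1 => %w[database:postgresql performance:optimization],
  2 => %w[database:postgresql performance:optimization ai:embeddings],
  3 => %w[database:postgresql ai:embeddings],
  4 => %w[performance:optimization database:postgresql]
}
p co_occurrence(node_tags)
# => [[["database:postgresql", "performance:optimization"], 3], [["ai:embeddings", "database:postgresql"], 2]]
```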

---

## Ontology Evolution Strategies

### 1. Tag Normalization

**Problem**: Similar tags with inconsistent naming
- `database:postgres` vs `database:postgresql`
- `ai:large-language-models` vs `ai:llm`

**Solution**: Merge tags
```ruby
class TagMerger
  def merge(from_tag:, to_tag:)
    from_record = Tag.find_by(name: from_tag)
    return if from_record.nil? || from_tag == to_tag # Nothing to merge

    Tag.transaction do
      to_record = Tag.find_or_create_by!(name: to_tag)

      # Drop associations that would duplicate an existing (node, to_tag) pair
      already_tagged = NodeTag.where(tag_id: to_record.id).select(:node_id)
      NodeTag.where(tag_id: from_record.id, node_id: already_tagged).delete_all

      # Repoint the remaining node associations
      NodeTag.where(tag_id: from_record.id).update_all(tag_id: to_record.id)

      # Delete old tag
      from_record.destroy!
    end
  end
end

# Usage
merger = TagMerger.new
merger.merge(from_tag: 'database:postgres', to_tag: 'database:postgresql')
```

### 2. Tag Splitting

**Problem**: Tag too broad, needs sub-categories
- `programming` → `programming:ruby`, `programming:python`

**Solution**: Retroactive sub-categorization
```ruby
class TagSplitter
  def initialize(ltm)
    @ltm = ltm
  end

  def split(broad_tag:, specific_tags:, remove_broad: true)
    nodes = Node.joins(:tags).where(tags: { name: broad_tag })

    nodes.find_each do |node|
      # LLM determines which specific tag(s) apply (helper still to be defined)
      specific = determine_specific_tags(node.content, specific_tags)

      specific.each do |tag|
        @ltm.add_tag(node_id: node.id, tag: tag)
      end

      # Optionally remove the broad tag, but only once it has been replaced
      # (remove_tag is a proposed LongTermMemory method)
      @ltm.remove_tag(node_id: node.id, tag: broad_tag) if remove_broad && specific.any?
    end
  end
end
```
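
`determine_specific_tags` is left undefined above. Until an LLM is wired in, a cheap keyword heuristic could stand in for it. The sketch below is one such assumption (the keyword table is invented for illustration); it returns an empty list when nothing matches, so the splitter keeps the broad tag on that node.

```ruby
# Hypothetical keyword hints per specific tag; an LLM call would replace this.
SPLIT_HINTS = {
  'programming:ruby'   => %w[ruby gem rails activerecord sinatra],
  'programming:python' => %w[python pip django pandas]
}.freeze

# Pick the specific tags whose keywords appear in the content.
# Tags without hints fall back to their last segment as the keyword.
def determine_specific_tags(content, specific_tags, hints: SPLIT_HINTS)
  words = content.downcase.scan(/[a-z0-9]+/)
  specific_tags.select do |tag|
    keywords = hints.fetch(tag, [tag.split(':').last])
    (keywords & words).any?
  end
end

p determine_specific_tags('Wrote a Rails gem for ActiveRecord', %w[programming:ruby programming:python])
# => ["programming:ruby"]
```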

### 3. Orphan Tag Detection

**Problem**: Single-use tags clutter ontology

**Solution**: Identify and consolidate
```ruby
class OntologyAnalyzer
  # LEFT JOIN so tags attached to zero nodes are reported too.
  # The association is node_tags, but the join table is named nodes_tags.
  def find_orphans(min_usage: 2)
    Tag.left_joins(:node_tags)
       .group('tags.id')
       .having('COUNT(nodes_tags.node_id) < ?', min_usage)
       .pluck(:name)
  end

  def suggest_merges
    orphans = find_orphans
    # Use LLM or string similarity to suggest merge candidates
  end
end
```
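
For the string-similarity route that `suggest_merges` mentions, one option is edit distance between tag names. A plain-Ruby sketch (no gem; the `max_distance: 2` threshold is an arbitrary assumption) that pairs each rarely used tag with close matches among all tags:

```ruby
# Dynamic-programming Levenshtein distance (insert/delete/substitute cost 1 each).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Map each orphan to the other tags whose names are within max_distance edits.
def suggest_merges(orphans, all_tags, max_distance: 2)
  orphans.each_with_object({}) do |orphan, suggestions|
    candidates = (all_tags - [orphan]).select { |t| levenshtein(orphan, t) <= max_distance }
    suggestions[orphan] = candidates unless candidates.empty?
  end
end

all_tags = %w[database:postgresql database:postgres ai:llm ai:llms testing:rspec]
p suggest_merges(%w[database:postgres ai:llms], all_tags)
# suggests database:postgresql for database:postgres and ai:llm for ai:llms
```

Edit distance catches spelling variants but not synonyms such as `ai:large-language-models` vs `ai:llm`; those still need the LLM route or a synonym table.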

### 4. Ontology Visualization

**Problem**: Hard to see structure of large ontology

**Solution**: Generate hierarchy tree
```ruby
class OntologyVisualizer
  # Returns a nested Hash, e.g. { "ai" => { "llm" => { "embeddings" => {} } } }
  def render_tree(root: nil)
    tags = root ? Tag.where('name LIKE ?', "#{root}:%") : Tag.all
    build_tree(tags)
  end

  private

  def build_tree(tags)
    tree = {}
    tags.each do |tag|
      parts = tag.name.split(':')
      insert_into_tree(tree, parts)
    end
    tree
  end

  # Creates (or reuses) one nested Hash level per tag segment
  def insert_into_tree(tree, parts)
    parts.reduce(tree) { |level, part| level[part] ||= {} }
  end
end

# Output (once the tree is printed with per-tag node counts):
# ai/
#   llm/
#     embeddings/ (5 nodes)
#     prompts/ (12 nodes)
#   rag/
#     retrieval/ (8 nodes)
```
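
`render_tree` returns a nested Hash; producing the indented listing also needs a printer and per-tag node counts. A self-contained sketch, assuming the counts arrive as a `{ tag_name => node_count }` Hash (for example from a grouped count query), that builds the same nesting and prints it depth-first:

```ruby
# Build the nested Hash the same way OntologyVisualizer#build_tree does.
def build_tree(tag_names)
  tag_names.each_with_object({}) do |name, tree|
    name.split(':').reduce(tree) { |level, part| level[part] ||= {} }
  end
end

# Render depth-first, two spaces per level, appending a count when the full tag has one.
def render(tree, counts, prefix = nil, depth = 0, out = [])
  tree.sort.each do |part, children|
    full = prefix ? "#{prefix}:#{part}" : part
    label = "#{'  ' * depth}#{part}/"
    label += " (#{counts[full]} nodes)" if counts.key?(full)
    out << label
    render(children, counts, full, depth + 1, out)
  end
  out
end

counts = { 'ai:llm:embeddings' => 5, 'ai:llm:prompts' => 12, 'ai:rag:retrieval' => 8 }
puts render(build_tree(counts.keys), counts)
```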

---

## Consequences

### Positive

- ✅ **Structured navigation**: Browse memories by category hierarchy
- ✅ **Multiple perspectives**: Same content tagged from different angles
- ✅ **Complementary to vectors**: Symbolic + sub-symbolic retrieval
- ✅ **Emergent ontology**: Knowledge structure evolves with content
- ✅ **Pattern recognition**: See knowledge base emphasis
- ✅ **Cross-domain discovery**: Find unexpected connections
- ✅ **Manual control**: User directs ontology evolution (Phase 1)
- ✅ **Automatic extraction**: LLM discovers topics (Phase 2)
- ✅ **Learning ontology**: LLM uses existing tags for consistency

### Negative

- ❌ **Manual effort**: Phase 1 requires manual tagging (time-consuming)
- ❌ **Consistency**: Manual tagging prone to inconsistencies
- ❌ **LLM cost**: Phase 2 requires LLM calls (OpenAI cost or Ollama latency)
- ❌ **Quality variation**: LLM may generate suboptimal tags
- ❌ **Maintenance**: Ontology needs periodic cleanup/consolidation

### Neutral

- ➡️ **Schema complexity**: Many-to-many adds join table queries
- ➡️ **Storage overhead**: Tags stored separately from nodes
- ➡️ **Configuration**: LLM settings for topic extraction

---

## Performance Considerations

### Query Patterns

**Find nodes by tag**:
```sql
-- Optimized with idx_nodes_tags_tag_id
SELECT n.*
FROM nodes n
JOIN nodes_tags nt ON n.id = nt.node_id
JOIN tags t ON nt.tag_id = t.id
WHERE t.name = 'database:postgresql';
```

**Find nodes by tag prefix** (hierarchical):
```sql
-- Uses idx_tags_name_pattern for LIKE with text_pattern_ops.
-- EXISTS returns each node once, even when several of its tags match the prefix.
SELECT n.*
FROM nodes n
WHERE EXISTS (
  SELECT 1
  FROM nodes_tags nt
  JOIN tags t ON nt.tag_id = t.id
  WHERE nt.node_id = n.id
    AND t.name LIKE 'ai:llm:%'
);
```

**Combined vector + tag search**:
```sql
-- Most powerful: semantic similarity within category
-- (EXISTS keeps a node with several database:* tags from filling multiple result slots)
SELECT n.*, n.embedding <=> $1::vector AS distance
FROM nodes n
WHERE n.embedding IS NOT NULL
  AND EXISTS (
    SELECT 1
    FROM nodes_tags nt
    JOIN tags t ON nt.tag_id = t.id
    WHERE nt.node_id = n.id
      AND t.name LIKE 'database:%'
  )
ORDER BY distance
LIMIT 10;
```
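
pgvector's `<=>` operator is cosine distance, `1 - (a·b)/(|a||b|)`, so smaller means more similar. An in-memory sketch of the combined query, with toy 3-dimensional vectors and a hypothetical `Node` struct standing in for the `nodes` table, shows the filter-then-rank order: keep nodes that have an embedding and at least one tag under the prefix, sort by distance, take the top results.

```ruby
# Cosine distance as computed by pgvector's <=> operator: 1 - cosine similarity.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / norm
end

Node = Struct.new(:id, :tags, :embedding)

# any? plays the role of EXISTS: one result per node regardless of how many tags match
def tag_filtered_search(nodes, query, prefix:, limit: 10)
  nodes
    .select { |n| n.embedding && n.tags.any? { |t| t.start_with?(prefix) } }
    .map { |n| [n, cosine_distance(n.embedding, query)] }
    .sort_by { |_n, distance| distance }
    .first(limit)
end

nodes = [
  Node.new(1, %w[database:postgresql database:pgvector], [1.0, 0.0, 0.0]),
  Node.new(2, %w[database:postgresql], [0.6, 0.8, 0.0]),
  Node.new(3, %w[ai:embeddings], [1.0, 0.0, 0.0]), # excluded: no database:* tag
  Node.new(4, %w[database:indexes], nil)            # excluded: no embedding
]
tag_filtered_search(nodes, [1.0, 0.0, 0.0], prefix: 'database:').each do |node, distance|
  puts format('node %d  distance %.2f', node.id, distance)
end
```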

### LLM Extraction Latency (Future)

| Operation | Time | Notes |
|-----------|------|-------|
| Extract topics (Ollama) | ~500ms | LLM generation time |
| Extract topics (OpenAI) | ~200ms | Network + API processing |
| Tag insertion | ~5ms | Per tag (database INSERT) |
| **Total per node** | ~550ms | Ollama local |

**Optimization**: Batch extraction for multiple nodes
```ruby
# Extract topics for 10 nodes in one LLM call
topics_batch = topic_extractor.extract_topics_batch(nodes.map(&:content))
```

---

## Future Enhancements

### 1. Tag Confidence Scores

```ruby
# Store confidence with tag association
class AddConfidenceToNodesTags < ActiveRecord::Migration[7.0]
  def change
    add_column :nodes_tags, :confidence, :real, default: 1.0
  end
end

# Usage (assumes add_tag gains a confidence: keyword)
ltm.add_tag(node_id: node.id, tag: 'database:postgresql', confidence: 0.95)
```

### 2. Ontology Templates

```ruby
# Pre-defined ontology templates for domains
class OntologyTemplate
  RUBY_GEMS = {
    root: 'programming:ruby:gems',
    tags: [
      'programming:ruby:gems:activerecord',
      'programming:ruby:gems:sinatra',
      'programming:ruby:gems:rails'
    ]
  }.freeze

  def apply(template_name)
    # const_get is defined on Module, so look the constant up on the class
    template = self.class.const_get(template_name)
    template[:tags].each do |tag|
      Tag.find_or_create_by(name: tag)
    end
  end
end
```

### 3. Tag Synonyms

```ruby
# Map synonyms to canonical tags
# (assumes a tag_synonyms table with synonym and canonical_tag_id columns)
class TagSynonym < ActiveRecord::Base
  belongs_to :canonical_tag, class_name: 'Tag'
end

# When user tags with 'db', map to 'database'
TagSynonym.create(synonym: 'db', canonical_tag: Tag.find_or_create_by(name: 'database'))
```
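
Synonym lookup could also run per segment, so a single `db` entry rewrites `db:postgres` as well as `db`. A sketch assuming an in-memory `{ synonym => canonical }` map (in HTM this would presumably be loaded from the proposed `TagSynonym` table); whether to map whole tags or individual segments is an open design choice:

```ruby
SYNONYMS = {
  'db'       => 'database',
  'postgres' => 'postgresql',
  'llms'     => 'llm'
}.freeze

# Replace each segment with its canonical form, leaving unknown segments unchanged.
def canonicalize(tag, synonyms = SYNONYMS)
  tag.split(':').map { |segment| synonyms.fetch(segment, segment) }.join(':')
end

p canonicalize('db:postgres:indexes') # => "database:postgresql:indexes"
p canonicalize('ai:llms')             # => "ai:llm"
```

Segment-level mapping is broader than the whole-tag mapping above and can misfire when a word is a synonym in one branch but meaningful in another, so whole-tag entries are the safer default.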

### 4. Batch Topic Extraction

```ruby
# Extract topics for multiple texts efficiently
# (build_batch_prompt and parse_batch_response are helpers still to be written)
def extract_topics_batch(contents)
  combined_prompt = build_batch_prompt(contents)
  response = call_llm(combined_prompt)
  parse_batch_response(response, contents)
end
```
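
The two batch helpers are not specified. One possible protocol (an assumption, not HTM's design): number each text in the prompt, ask for one `[n] tag1, tag2` line per text, and map answers back by number, so a skipped or reordered answer never shifts tags onto the wrong node.

```ruby
# Hypothetical prompt builder: number each text so answers can be matched back.
def build_batch_prompt(contents)
  items = contents.each_with_index.map { |text, i| "[#{i + 1}] #{text}" }
  <<~PROMPT
    Extract 2-5 hierarchical topic tags (root:level1:level2) for each numbered text.
    Answer with one line per text in the form: [n] tag1, tag2

    #{items.join("\n")}
  PROMPT
end

# Hypothetical parser: one tag array per input text; [] when a text got no answer.
def parse_batch_response(response, contents)
  result = Array.new(contents.size) { [] }
  response.each_line do |line|
    match = line.chomp.match(/\A\s*\[(\d+)\]\s*(.*)\z/)
    next unless match

    index = match[1].to_i - 1
    next unless index.between?(0, contents.size - 1) # ignore out-of-range numbers

    result[index] = match[2].split(',').map(&:strip).reject(&:empty?)
  end
  result
end

reply = "[2] ai:embeddings\n[1] database:postgresql, database:pgvector\n"
p parse_batch_response(reply, ['pgvector text', 'embedding text'])
# => [["database:postgresql", "database:pgvector"], ["ai:embeddings"]]
```

Each returned tag would still need to pass the same format and depth validation as single-text extraction.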

### 5. Ontology Import/Export

```ruby
require 'time'
require 'yaml'

# Export ontology to YAML for sharing
class OntologyExporter
  def export
    usage = NodeTag.group(:tag_id).count # one query instead of one per tag
    {
      'version' => 1,
      'exported_at' => Time.now.utc.iso8601,
      'tags' => Tag.order(:name).map { |t| { 'name' => t.name, 'usage_count' => usage.fetch(t.id, 0) } }
    }.to_yaml
  end

  def import(yaml)
    # String keys and an ISO 8601 string keep the file loadable with YAML.safe_load
    data = YAML.safe_load(yaml)
    data.fetch('tags', []).each do |tag_data|
      Tag.find_or_create_by(name: tag_data['name'])
    end
  end
end
```

---

## Related ADRs

- [ADR-013: ActiveRecord ORM and Many-to-Many Tagging](./013-activerecord-orm-and-many-to-many-tagging.md) - Database schema
- [ADR-012: LLM-Driven Ontology (PARTIALLY SUPERSEDED)](./012-llm-driven-ontology-topic-extraction.md) - Previous database-side approach
- [ADR-014: Client-Side Embedding Generation](./014-client-side-embedding-generation-workflow.md) - Parallel pattern for LLM extraction

---

## Review Notes

**AI Engineer**: ✅ Hierarchical tags + LLM extraction is powerful. Client-side approach provides flexibility.

**Knowledge Engineer**: ✅ Ontology evolution strategies are essential. Tag normalization will be critical.

**Ruby Expert**: ✅ Manual first, LLM later is pragmatic. Good use of ActiveRecord associations.

**Database Architect**: ✅ Indexes support hierarchical queries well. LIKE with pattern ops is efficient.

**Systems Architect**: ✅ Complementary to vector search. Provides structure that embeddings lack.