RubyGems - htm - Versions diffs - 0.0.10 → 0.0.14 - Mend

htm 0.0.10 → 0.0.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (75)

checksums.yaml +4 -4
data/.dictate.toml +46 -0
data/.envrc +2 -0
data/CHANGELOG.md +86 -3
data/README.md +86 -7
data/Rakefile +14 -2
data/bin/htm_mcp.rb +621 -0
data/config/database.yml +20 -13
data/db/migrate/00010_add_soft_delete_to_associations.rb +29 -0
data/db/migrate/00011_add_performance_indexes.rb +21 -0
data/db/migrate/00012_add_tags_trigram_index.rb +18 -0
data/db/migrate/00013_enable_lz4_compression.rb +43 -0
data/db/schema.sql +49 -92
data/docs/api/index.md +1 -1
data/docs/api/yard/HTM.md +2 -4
data/docs/architecture/index.md +1 -1
data/docs/development/index.md +1 -1
data/docs/getting-started/index.md +1 -1
data/docs/guides/index.md +1 -1
data/docs/images/telemetry-architecture.svg +153 -0
data/docs/telemetry.md +391 -0
data/examples/README.md +171 -1
data/examples/cli_app/README.md +1 -1
data/examples/cli_app/htm_cli.rb +1 -1
data/examples/mcp_client.rb +529 -0
data/examples/sinatra_app/app.rb +1 -1
data/examples/telemetry/README.md +147 -0
data/examples/telemetry/SETUP_README.md +169 -0
data/examples/telemetry/demo.rb +498 -0
data/examples/telemetry/grafana/dashboards/htm-metrics.json +457 -0
data/lib/htm/configuration.rb +261 -70
data/lib/htm/database.rb +46 -22
data/lib/htm/embedding_service.rb +24 -14
data/lib/htm/errors.rb +15 -1
data/lib/htm/jobs/generate_embedding_job.rb +19 -0
data/lib/htm/jobs/generate_propositions_job.rb +103 -0
data/lib/htm/jobs/generate_tags_job.rb +24 -0
data/lib/htm/loaders/markdown_chunker.rb +79 -0
data/lib/htm/loaders/markdown_loader.rb +41 -15
data/lib/htm/long_term_memory/fulltext_search.rb +138 -0
data/lib/htm/long_term_memory/hybrid_search.rb +324 -0
data/lib/htm/long_term_memory/node_operations.rb +209 -0
data/lib/htm/long_term_memory/relevance_scorer.rb +355 -0
data/lib/htm/long_term_memory/robot_operations.rb +34 -0
data/lib/htm/long_term_memory/tag_operations.rb +428 -0
data/lib/htm/long_term_memory/vector_search.rb +109 -0
data/lib/htm/long_term_memory.rb +51 -1153
data/lib/htm/models/node.rb +35 -2
data/lib/htm/models/node_tag.rb +31 -0
data/lib/htm/models/robot_node.rb +31 -0
data/lib/htm/models/tag.rb +44 -0
data/lib/htm/proposition_service.rb +169 -0
data/lib/htm/query_cache.rb +214 -0
data/lib/htm/sql_builder.rb +178 -0
data/lib/htm/tag_service.rb +16 -6
data/lib/htm/tasks.rb +8 -2
data/lib/htm/telemetry.rb +224 -0
data/lib/htm/version.rb +1 -1
data/lib/htm.rb +64 -3
data/lib/tasks/doc.rake +1 -1
data/lib/tasks/htm.rake +259 -13
data/mkdocs.yml +96 -96
metadata +75 -18
data/.aigcm_msg +0 -1
data/.claude/settings.local.json +0 -92
data/CLAUDE.md +0 -603
data/examples/cli_app/temp.log +0 -93
data/lib/htm/loaders/paragraph_chunker.rb +0 -112
data/notes/ARCHITECTURE_REVIEW.md +0 -1167
data/notes/IMPLEMENTATION_SUMMARY.md +0 -606
data/notes/MULTI_FRAMEWORK_IMPLEMENTATION.md +0 -451
data/notes/next_steps.md +0 -100
data/notes/plan.md +0 -627
data/notes/tag_ontology_enhancement_ideas.md +0 -222
data/notes/timescaledb_removal_summary.md +0 -200

data/notes/tag_ontology_enhancement_ideas.md DELETED

@@ -1,222 +0,0 @@
-# Tag Ontology Enhancement Ideas
-## Problem Statement
-HTM builds a dynamic hierarchical tag ontology but doesn't fully leverage it for retrieval. The tag extraction is sophisticated (LLM-based, ontology-consistent, hierarchical), but the retrieval capabilities are limited.
-## Current State Analysis
-### What's Working Well
-- Sophisticated LLM-based tag extraction with ontology consistency
-- Hierarchical format (`database:postgresql:indexes`)
-- Beautiful visualization (tree, Mermaid, SVG)
-- Tag relationship analysis (co-occurrence)
-- Async generation via `GenerateTagsJob`
-### What's Missing
-1. **`search_by_tags()` and `nodes_by_topic()` exist in LongTermMemory but aren't exposed in the public HTM API**
-2. `recall()` accepts a `query_tags:` parameter but **does not filter by it**; the tags only affect relevance scoring
-3. No hierarchical traversal (parent/child/ancestor queries)
-4. No tag-only retrieval (without text query)
-5. No query expansion using tag relationships
-6. No faceted navigation
-7. No tag-based result grouping
----
-## Proposed Enhancements
-### 1. Expose Tag-Based Retrieval in HTM API
-**Priority: High** - These methods already exist; they just need to be exposed.
-```ruby
-# Browse by topic path (hierarchical navigation)
-htm.browse("database:postgresql")  # All nodes under this branch
-# Filter recall with tags
-htm.recall("query", tags: ["database:postgresql"], match_all: false)
-# Tag-only retrieval (no text query required)
-htm.by_tags(["database:postgresql", "ai:embeddings"])
-```
-**Implementation:**
-- Add `browse(topic_path, exact: false, limit: 20)` to HTM class
-- Add `by_tags(tags, match_all: false, limit: 20)` to HTM class
-- Wire `query_tags:` parameter through to actual filtering in `recall()`
----
-### 2. Hierarchical Query Expansion
-**Priority: High** - Leverages the hierarchical structure.
-```ruby
-# Searching "database:postgresql" should optionally include children:
-#   database:postgresql:indexes
-#   database:postgresql:extensions
-#   database:postgresql:partitioning
-htm.recall("indexes", tags: ["database:postgresql"], expand_children: true)
-# Or expand upward to include parent context
-htm.recall("specific query", tags: ["database:postgresql:indexes"], expand_ancestors: true)
-```
-**Implementation:**
-- Add `expand_children:` and `expand_ancestors:` options
-- Query `tags.name LIKE 'database:postgresql:%'` for children
-- Parse tag path and query ancestors for upward expansion
----
-### 3. Faceted Search / Tag Aggregation
-**Priority: Medium** - Enables discovery and navigation.
-```ruby
-# "What topics are represented in my search results?"
-results = htm.recall("machine learning")
-facets = results.facet_by_tags
-# => { "ai:ml" => 15, "ai:llm" => 8, "database:vector" => 5 }
-# Or as a standalone method
-htm.tag_facets(query: "machine learning", limit: 10)
-```
-**Implementation:**
-- Return tag counts grouped by topic
-- Weight by specificity (deeper tags = more specific)
-- Consider returning hierarchical facet structure
----
-### 4. Semantic Tag Matching in Recall
-**Priority: Medium** - Auto-extract tags from query.
-```ruby
-# Auto-extract tags from query and use them to boost results
-htm.recall("PostgreSQL vector search", auto_tag: true)
-# Internally: extracts ["database:postgresql", "database:vector-search"]
-# and uses these to boost relevant results
-```
-**Implementation:**
-- Use `find_query_matching_tags()` (already exists in LongTermMemory)
-- Boost results that match extracted tags
-- Make this opt-in or opt-out via configuration
----
-### 5. Tag-Based Context Assembly
-**Priority: Medium** - Topic-focused context building.
-```ruby
-# Assemble context prioritizing specific topic branches
-htm.assemble_context(
-  token_budget: 4000,
-  focus_topics: ["database:postgresql", "ai:embeddings"],
-  strategy: :topic_balanced  # New strategy
-)
-```
-**Implementation:**
-- Add `:topic_balanced` strategy to WorkingMemory
-- Weight nodes by tag overlap with focus topics
-- Ensure diverse topic coverage within budget
----
-### 6. Ontology-Aware Related Memories
-**Priority: Low** - Nice-to-have for exploration.
-```ruby
-# Find related memories via shared tags (not just vector similarity)
-htm.related_by_topic(node_id)
-# Returns nodes sharing the most tags, weighted by specificity
-# Compare to existing vector-based similarity
-htm.similar(node_id)  # Vector similarity
-htm.related_by_topic(node_id)  # Tag-based similarity
-```
-**Implementation:**
-- Count shared tags between nodes
-- Weight by tag depth (more specific = higher weight)
-- Combine with vector similarity for hybrid relatedness
----
-### 7. Tag Model Enhancements
-**Priority: Medium** - Better hierarchy navigation.
-```ruby
-# Add to HTM::Models::Tag
-tag.parent          # Parent tag (e.g., "database:postgresql" -> "database")
-tag.children        # Child tags (e.g., "database" -> ["database:postgresql", "database:mysql"])
-tag.siblings        # Same-level tags under same parent
-tag.ancestors       # All ancestors up to root
-tag.descendants     # All descendants (recursive)
-# Class methods
-Tag.roots           # All root-level tags
-Tag.at_depth(2)     # All tags at specific depth
-Tag.under("database")  # All tags in this branch
-```
-**Implementation:**
-- Parse colon-separated paths
-- Use SQL LIKE queries for efficient hierarchy traversal
-- Consider materialized path or nested set for performance at scale
----
-### 8. Tag-Based Grouping in Results
-**Priority: Low** - Organizational feature.
-```ruby
-# Group results by their primary tag
-results = htm.recall("query", group_by_tag: true)
-# => {
-#   "database:postgresql" => [node1, node2],
-#   "ai:embeddings" => [node3, node4],
-#   "uncategorized" => [node5]
-# }
-```
----
-## Implementation Priority
-### Phase 1 (High Value, Low Effort)
-1. Expose existing `search_by_tags()` and `nodes_by_topic()` in HTM API
-2. Wire `query_tags:` parameter through `recall()` for actual filtering
-### Phase 2 (High Value, Medium Effort)
-3. Add hierarchical query expansion (`expand_children:`, `expand_ancestors:`)
-4. Add Tag model hierarchy methods (parent, children, ancestors, descendants)
-### Phase 3 (Medium Value, Medium Effort)
-5. Faceted search / tag aggregation
-6. Semantic tag matching in recall (auto-extract from query)
-7. Tag-based context assembly strategy
-### Phase 4 (Nice to Have)
-8. Ontology-aware related memories
-9. Tag-based result grouping
----
-## Notes
-- All enhancements should maintain backward compatibility
-- Consider adding configuration options for default behaviors
-- Tag operations should be efficient (indexed queries)
-- Consider caching popular tag queries
-- Document new methods with YARD comments
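Reviewer note on the deleted file above: the hierarchy navigation it proposes (section 7, "Parse colon-separated paths" and "Use SQL LIKE queries") can be illustrated without a database. The following is a minimal sketch only. `TagPath` and all of its methods are hypothetical names chosen for illustration and are not part of the htm gem, and the LIKE pattern assumes PostgreSQL's default backslash escape character.

```ruby
# Hypothetical helper, NOT part of the htm gem: pure-Ruby navigation over
# colon-separated tag paths such as "database:postgresql:indexes".
module TagPath
  SEPARATOR = ":"

  module_function

  # "database:postgresql:indexes" -> "database:postgresql"; nil for a root tag.
  def parent(name)
    parts = name.split(SEPARATOR)
    parts.length > 1 ? parts[0...-1].join(SEPARATOR) : nil
  end

  # All ancestors, nearest first, ending at the root tag.
  def ancestors(name)
    result = []
    current = parent(name)
    while current
      result << current
      current = parent(current)
    end
    result
  end

  # Number of path segments; a root tag has depth 1.
  def depth(name)
    name.split(SEPARATOR).length
  end

  # Pattern for a SQL LIKE query matching every descendant of `name`.
  # LIKE wildcards (% and _) and backslashes in the tag are escaped,
  # assuming the default backslash escape character.
  def descendants_pattern(name)
    escaped = name.gsub(/[\\%_]/) { |char| "\\#{char}" }
    "#{escaped}#{SEPARATOR}%"
  end

  # In-memory equivalent of the LIKE query: tags strictly under `name`.
  def descendants(name, all_names)
    prefix = "#{name}#{SEPARATOR}"
    all_names.select { |candidate| candidate.start_with?(prefix) }
  end

  # Direct children only: descendants exactly one level deeper.
  def children(name, all_names)
    descendants(name, all_names).select { |c| depth(c) == depth(name) + 1 }
  end
end
```

A model method such as the proposed `tag.children` could delegate to helpers like these, passing `descendants_pattern` as a bound parameter to a `tags.name LIKE ?` query. Escaping matters because tag names containing `_` would otherwise match any single character.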

data/notes/timescaledb_removal_summary.md DELETED

@@ -1,200 +0,0 @@
-# TimescaleDB Removal Summary
-**Date:** 2025-10-28
-**Decision:** Remove TimescaleDB extension from HTM gem as it does not add sufficient value
-## Overview
-TimescaleDB was originally included in the HTM gem for time-series optimization capabilities. However, analysis revealed that:
-1. **No hypertables were actually created** - The `setup_hypertables` method in `lib/htm/database.rb` was essentially a no-op with a comment stating "All tables use simple PRIMARY KEY (id), no hypertable conversions"
-2. **Time-range queries use standard indexed columns** - The `created_at` column on `nodes` and `timestamp` column on `operations_log` are indexed using standard PostgreSQL B-tree indexes
-3. **No compression policies were used** - Despite documentation mentioning compression, no actual compression was implemented
-4. **Additional dependency overhead** - Required users to have TimescaleDB available even though it provided no actual benefit
-## Code Files Modified
-The following code files were modified to remove TimescaleDB:
-### 1. `lib/htm/active_record_config.rb`
-**Lines modified:** 71-74
-**Changes:**
-- Removed `'timescaledb' => 'TimescaleDB extension'` from the `required_extensions` hash in `verify_extensions!` method
-**Before:**
-```ruby
-required_extensions = {
-  'timescaledb' => 'TimescaleDB extension',
-  'vector' => 'pgvector extension',
-  'pg_trgm' => 'PostgreSQL trigram extension'
-}
-```
-**After:**
-```ruby
-required_extensions = {
-  'vector' => 'pgvector extension',
-  'pg_trgm' => 'PostgreSQL trigram extension'
-}
-```
-### 2. `lib/htm/database.rb`
-**Multiple sections modified**
-**Changes:**
-- **Line 9:** Updated class documentation comment from "Handles schema creation and TimescaleDB hypertable setup" to "Handles schema creation and database initialization"
-- **Lines 31-39:** Removed entire hypertable conversion block that called `setup_hypertables(conn)`
-- **Lines 342-347:** Removed TimescaleDB version check from `verify_extensions` method
-- **Lines 432-437:** Removed entire `setup_hypertables` method definition
-**Impact:** The Database class now only handles standard PostgreSQL schema setup without any TimescaleDB-specific code.
-### 3. `db/README.md`
-**Lines modified:** 29, 127
-**Changes:**
-- **Line 29:** Changed "Vector similarity search (pgvector on TimescaleDB Cloud)" to "Vector similarity search (pgvector)"
-- **Line 127:** Removed "**TimescaleDB** extension (optional, for hypertables)" from Database Requirements section
-### 4. `lib/tasks/htm.rake`
-**Lines modified:** 67-73
-**Changes:**
-- Removed TimescaleDB version check from the `htm:db:test` rake task
-**Before:**
-```ruby
-# Check TimescaleDB
-timescale = conn.exec("SELECT extversion FROM pg_extension WHERE extname='timescaledb'").first
-if timescale
-  puts "  ✓ TimescaleDB version: #{timescale['extversion']}"
-else
-  puts "  ⚠ Warning: TimescaleDB extension not found"
-end
-# Check pgvector
-```
-**After:**
-```ruby
-# Check pgvector
-```
-## Other Potentially Impacted Files
-A codebase-wide search revealed **114 total files** containing references to "TimescaleDB", "timescaledb", or "hypertable". These fall into the following categories:
-### Documentation Files
-- `README.md` - Main project documentation
-- `CLAUDE.md` - AI assistant context documentation
-- `.architecture/` directory - Architecture Decision Records (ADRs) and reviews
-- `dbdoc/` directory - Auto-generated database documentation (120+ files)
-### Test Files
-- `test/` directory - Unit and integration tests may reference TimescaleDB in comments or mock data
-### Example Files
-- `examples/` directory - Example code may mention TimescaleDB in documentation
-### Migration Files
-- `db/migrate/` directory - Migration files may have comments referencing TimescaleDB optimization
-## Recommended Follow-up Actions
-### High Priority
-1. **Update README.md** - Remove TimescaleDB from installation instructions and feature descriptions
-2. **Update CLAUDE.md** - Remove TimescaleDB references from project overview and architecture descriptions
-3. **Review ADRs** - Update or create new ADR documenting the decision to remove TimescaleDB
-### Medium Priority
-4. **Update test files** - Remove TimescaleDB references from test comments and documentation
-5. **Update example code** - Remove TimescaleDB mentions from example documentation
-6. **Regenerate dbdoc/** - Run `tbls` again to regenerate database documentation without TimescaleDB references
-### Low Priority
-7. **Update migration comments** - Clean up any comments in migration files that reference TimescaleDB optimization
-8. **Review dependencies** - Verify that `Gemfile` or gemspec doesn't list TimescaleDB as a requirement (not found in initial search)
-## Benefits of Removal
-1. **Simplified deployment** - Users no longer need TimescaleDB-enabled PostgreSQL instances
-2. **Reduced complexity** - One less extension to manage and verify
-3. **Broader compatibility** - Works with any PostgreSQL 12+ installation (not just TimescaleDB Cloud or self-hosted TimescaleDB)
-4. **Clearer documentation** - Removes confusion about TimescaleDB's role (since it wasn't actually used)
-5. **Honest architecture** - Codebase now accurately reflects what it actually uses
-## No Loss of Functionality
-Removing TimescaleDB results in **zero loss of functionality** because:
-- No hypertables were created
-- No compression policies were used
-- Time-range queries already use standard B-tree indexes
-- All existing queries continue to work identically
-- Performance characteristics remain unchanged
-## Database Requirements After Removal
-The HTM gem now requires:
-- **PostgreSQL** 12+
-- **vector** extension (pgvector) - for embedding similarity search
-- **pg_trgm** extension - for fuzzy text matching
-No TimescaleDB required.
-## Testing Verification
-After these changes, verify:
-1. **Database setup works:**
-   ```bash
-   rake htm:db:setup
-   ```
-2. **Database connection test works:**
-   ```bash
-   rake htm:db:test
-   ```
-3. **All tests pass:**
-   ```bash
-   rake test
-   ```
-4. **Example code runs:**
-   ```bash
-   rake example
-   ```
-## Git Commit Message Suggestion
-```
-refactor!: remove TimescaleDB extension dependency
-BREAKING CHANGE: TimescaleDB is no longer required or checked for.
-TimescaleDB was originally included for time-series optimization
-but was never actually used (no hypertables were created, no
-compression policies configured). Time-range queries use standard
-PostgreSQL B-tree indexes on timestamp columns.
-This change:
-- Removes TimescaleDB from required extensions check
-- Removes the TimescaleDB check from verify_extensions and removes the setup_hypertables method
-- Updates documentation to reflect PostgreSQL-only requirements
-- Simplifies deployment by removing unnecessary dependency
-No functionality is lost as TimescaleDB features were not being used.
-Modified files:
-- lib/htm/active_record_config.rb
-- lib/htm/database.rb
-- db/README.md
-- lib/tasks/htm.rake
-```
-## Conclusion
-The removal of TimescaleDB from the HTM gem is a **low-risk refactoring** that simplifies the architecture and deployment requirements without any loss of functionality. All code changes have been completed in the core library files, with follow-up documentation updates recommended.
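Reviewer note on the deleted file above: the reduced `required_extensions` hash it describes implies a check of roughly the following shape. This is a sketch under stated assumptions, not the gem's actual `verify_extensions!` code. `ExtensionCheck`, `missing_extensions`, and the `installed` argument are hypothetical, and the caller is assumed to have already fetched the installed extension names (for example, the `extname` column of `pg_extension`).

```ruby
# Hypothetical sketch, NOT the htm gem's implementation: checks a list of
# installed extension names against the post-removal requirements.
module ExtensionCheck
  # Mirrors the "After" hash from the summary: TimescaleDB is no longer listed.
  REQUIRED_EXTENSIONS = {
    "vector"  => "pgvector extension",
    "pg_trgm" => "PostgreSQL trigram extension"
  }.freeze

  module_function

  # Returns the subset of required extensions that is not installed,
  # as a name => description hash (empty when everything is present).
  def missing_extensions(installed, required = REQUIRED_EXTENSIONS)
    required.reject { |name, _description| installed.include?(name) }
  end

  # Returns true when all requirements are met; raises otherwise.
  def verify_extensions!(installed)
    missing = missing_extensions(installed)
    return true if missing.empty?

    raise "Missing required PostgreSQL extensions: #{missing.values.join(', ')}"
  end
end
```

Because the check only looks for required names, a server that still has `timescaledb` installed passes unchanged, which matches the summary's claim that existing deployments keep working.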