RubyGems - htm - Versions diffs - 0.0.1 → 0.0.10 - Mend

htm 0.0.1 → 0.0.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (184)

checksums.yaml +4 -4
data/.aigcm_msg +1 -0
data/.architecture/reviews/comprehensive-codebase-review.md +577 -0
data/.claude/settings.local.json +92 -0
data/.envrc +1 -0
data/.irbrc +283 -80
data/.tbls.yml +31 -0
data/CHANGELOG.md +314 -16
data/CLAUDE.md +603 -0
data/README.md +76 -5
data/Rakefile +5 -0
data/SETUP.md +132 -101
data/db/migrate/{20250101000001_enable_extensions.rb → 00001_enable_extensions.rb} +0 -1
data/db/migrate/00002_create_robots.rb +11 -0
data/db/migrate/00003_create_file_sources.rb +20 -0
data/db/migrate/00004_create_nodes.rb +65 -0
data/db/migrate/00005_create_tags.rb +13 -0
data/db/migrate/00006_create_node_tags.rb +18 -0
data/db/migrate/00007_create_robot_nodes.rb +26 -0
data/db/migrate/00009_add_working_memory_to_robot_nodes.rb +12 -0
data/db/schema.sql +390 -36
data/docs/api/database.md +19 -232
data/docs/api/embedding-service.md +1 -7
data/docs/api/htm.md +305 -364
data/docs/api/index.md +1 -7
data/docs/api/long-term-memory.md +342 -590
data/docs/api/yard/HTM/ActiveRecordConfig.md +23 -0
data/docs/api/yard/HTM/AuthorizationError.md +11 -0
data/docs/api/yard/HTM/CircuitBreaker.md +92 -0
data/docs/api/yard/HTM/CircuitBreakerOpenError.md +34 -0
data/docs/api/yard/HTM/Configuration.md +175 -0
data/docs/api/yard/HTM/Database.md +99 -0
data/docs/api/yard/HTM/DatabaseError.md +14 -0
data/docs/api/yard/HTM/EmbeddingError.md +18 -0
data/docs/api/yard/HTM/EmbeddingService.md +58 -0
data/docs/api/yard/HTM/Error.md +11 -0
data/docs/api/yard/HTM/JobAdapter.md +39 -0
data/docs/api/yard/HTM/LongTermMemory.md +342 -0
data/docs/api/yard/HTM/NotFoundError.md +17 -0
data/docs/api/yard/HTM/Observability.md +107 -0
data/docs/api/yard/HTM/QueryTimeoutError.md +19 -0
data/docs/api/yard/HTM/Railtie.md +27 -0
data/docs/api/yard/HTM/ResourceExhaustedError.md +13 -0
data/docs/api/yard/HTM/TagError.md +18 -0
data/docs/api/yard/HTM/TagService.md +67 -0
data/docs/api/yard/HTM/Timeframe/Result.md +24 -0
data/docs/api/yard/HTM/Timeframe.md +40 -0
data/docs/api/yard/HTM/TimeframeExtractor/Result.md +24 -0
data/docs/api/yard/HTM/TimeframeExtractor.md +45 -0
data/docs/api/yard/HTM/ValidationError.md +20 -0
data/docs/api/yard/HTM/WorkingMemory.md +131 -0
data/docs/api/yard/HTM.md +80 -0
data/docs/api/yard/index.csv +179 -0
data/docs/api/yard-reference.md +51 -0
data/docs/architecture/adrs/001-postgresql-timescaledb.md +1 -1
data/docs/architecture/adrs/003-ollama-embeddings.md +1 -1
data/docs/architecture/adrs/010-redis-working-memory-rejected.md +2 -27
data/docs/architecture/adrs/index.md +2 -13
data/docs/architecture/hive-mind.md +165 -166
data/docs/architecture/index.md +2 -2
data/docs/architecture/overview.md +5 -171
data/docs/architecture/two-tier-memory.md +1 -35
data/docs/assets/images/adr-010-current-architecture.svg +37 -0
data/docs/assets/images/adr-010-proposed-architecture.svg +48 -0
data/docs/assets/images/adr-dependency-tree.svg +93 -0
data/docs/assets/images/class-hierarchy.svg +55 -0
data/docs/assets/images/exception-hierarchy.svg +45 -0
data/docs/assets/images/htm-architecture-overview.svg +83 -0
data/docs/assets/images/htm-complete-memory-flow.svg +160 -0
data/docs/assets/images/htm-context-assembly-flow.svg +148 -0
data/docs/assets/images/htm-eviction-process.svg +141 -0
data/docs/assets/images/htm-memory-addition-flow.svg +138 -0
data/docs/assets/images/htm-memory-recall-flow.svg +152 -0
data/docs/assets/images/htm-node-states.svg +123 -0
data/docs/assets/images/project-structure.svg +78 -0
data/docs/assets/images/test-directory-structure.svg +38 -0
data/{dbdoc → docs/database}/README.md +127 -125
data/docs/database/public.file_sources.md +42 -0
data/docs/database/public.file_sources.svg +211 -0
data/{dbdoc → docs/database}/public.node_tags.md +7 -8
data/docs/database/public.node_tags.svg +239 -0
data/{dbdoc → docs/database}/public.nodes.md +22 -17
data/docs/database/public.nodes.svg +271 -0
data/docs/database/public.robot_nodes.md +46 -0
data/docs/database/public.robot_nodes.svg +243 -0
data/{dbdoc → docs/database}/public.robots.md +2 -3
data/docs/database/public.robots.svg +161 -0
data/docs/database/public.tags.svg +139 -0
data/{dbdoc → docs/database}/schema.json +941 -630
data/docs/database/schema.svg +282 -0
data/docs/development/index.md +1 -29
data/docs/development/schema.md +134 -309
data/docs/development/testing.md +1 -9
data/docs/getting-started/index.md +47 -0
data/docs/{installation.md → getting-started/installation.md} +2 -2
data/docs/{quick-start.md → getting-started/quick-start.md} +5 -5
data/docs/guides/adding-memories.md +295 -643
data/docs/guides/recalling-memories.md +36 -1
data/docs/guides/search-strategies.md +85 -51
data/docs/images/htm-er-diagram.svg +156 -0
data/docs/index.md +16 -31
data/docs/multi_framework_support.md +4 -4
data/examples/README.md +280 -0
data/examples/basic_usage.rb +18 -16
data/examples/cli_app/htm_cli.rb +146 -8
data/examples/cli_app/temp.log +93 -0
data/examples/custom_llm_configuration.rb +1 -2
data/examples/example_app/app.rb +11 -14
data/examples/file_loader_usage.rb +177 -0
data/examples/robot_groups/lib/robot_group.rb +419 -0
data/examples/robot_groups/lib/working_memory_channel.rb +140 -0
data/examples/robot_groups/multi_process.rb +286 -0
data/examples/robot_groups/robot_worker.rb +136 -0
data/examples/robot_groups/same_process.rb +229 -0
data/examples/sinatra_app/Gemfile +1 -0
data/examples/sinatra_app/Gemfile.lock +166 -0
data/examples/sinatra_app/app.rb +219 -24
data/examples/timeframe_demo.rb +276 -0
data/lib/htm/active_record_config.rb +10 -3
data/lib/htm/circuit_breaker.rb +202 -0
data/lib/htm/configuration.rb +313 -80
data/lib/htm/database.rb +67 -36
data/lib/htm/embedding_service.rb +39 -2
data/lib/htm/errors.rb +131 -11
data/lib/htm/{sinatra.rb → integrations/sinatra.rb} +87 -12
data/lib/htm/job_adapter.rb +10 -3
data/lib/htm/jobs/generate_embedding_job.rb +5 -4
data/lib/htm/jobs/generate_tags_job.rb +4 -0
data/lib/htm/loaders/markdown_loader.rb +263 -0
data/lib/htm/loaders/paragraph_chunker.rb +112 -0
data/lib/htm/long_term_memory.rb +601 -321
data/lib/htm/models/file_source.rb +99 -0
data/lib/htm/models/node.rb +116 -12
data/lib/htm/models/robot.rb +53 -4
data/lib/htm/models/robot_node.rb +51 -0
data/lib/htm/models/tag.rb +302 -0
data/lib/htm/observability.rb +395 -0
data/lib/htm/tag_service.rb +60 -3
data/lib/htm/tasks.rb +29 -0
data/lib/htm/timeframe.rb +194 -0
data/lib/htm/timeframe_extractor.rb +307 -0
data/lib/htm/version.rb +1 -1
data/lib/htm/working_memory.rb +165 -70
data/lib/htm.rb +352 -133
data/lib/tasks/doc.rake +300 -0
data/lib/tasks/files.rake +299 -0
data/lib/tasks/htm.rake +188 -2
data/lib/tasks/jobs.rake +10 -12
data/lib/tasks/tags.rake +194 -0
data/mkdocs.yml +91 -9
data/notes/ARCHITECTURE_REVIEW.md +1167 -0
data/notes/IMPLEMENTATION_SUMMARY.md +606 -0
data/notes/MULTI_FRAMEWORK_IMPLEMENTATION.md +451 -0
data/notes/next_steps.md +100 -0
data/notes/plan.md +627 -0
data/notes/tag_ontology_enhancement_ideas.md +222 -0
data/notes/timescaledb_removal_summary.md +200 -0
metadata +177 -37
data/db/migrate/20250101000002_create_robots.rb +0 -14
data/db/migrate/20250101000003_create_nodes.rb +0 -42
data/db/migrate/20250101000005_create_tags.rb +0 -38
data/db/migrate/20250101000007_add_node_vector_indexes.rb +0 -30
data/dbdoc/public.node_tags.svg +0 -112
data/dbdoc/public.nodes.svg +0 -118
data/dbdoc/public.robots.svg +0 -90
data/dbdoc/public.tags.svg +0 -60
data/dbdoc/schema.svg +0 -154
data/{dbdoc → docs/database}/public.node_stats.md +0 -0
data/{dbdoc → docs/database}/public.node_stats.svg +0 -0
data/{dbdoc → docs/database}/public.nodes_tags.md +0 -0
data/{dbdoc → docs/database}/public.nodes_tags.svg +0 -0
data/{dbdoc → docs/database}/public.ontology_structure.md +0 -0
data/{dbdoc → docs/database}/public.ontology_structure.svg +0 -0
data/{dbdoc → docs/database}/public.operations_log.md +0 -0
data/{dbdoc → docs/database}/public.operations_log.svg +0 -0
data/{dbdoc → docs/database}/public.relationships.md +0 -0
data/{dbdoc → docs/database}/public.relationships.svg +0 -0
data/{dbdoc → docs/database}/public.robot_activity.md +0 -0
data/{dbdoc → docs/database}/public.robot_activity.svg +0 -0
data/{dbdoc → docs/database}/public.schema_migrations.md +0 -0
data/{dbdoc → docs/database}/public.schema_migrations.svg +0 -0
data/{dbdoc → docs/database}/public.tags.md +3 -3
data/{dbdoc → docs/database}/public.topic_relationships.md +0 -0
data/{dbdoc → docs/database}/public.topic_relationships.svg +0 -0

data/docs/development/schema.md CHANGED

@@ -1,6 +1,6 @@
 # Database Schema Documentation
-This document provides a comprehensive reference for HTM's PostgreSQL database schema, including all tables, indexes, and relationships.
+This document provides a comprehensive reference for HTM's PostgreSQL database schema, including query patterns, optimization strategies, and best practices.
 ## Schema Overview
@@ -22,367 +22,179 @@ CREATE EXTENSION IF NOT EXISTS vector WITH SCHEMA public;
 ## Entity-Relationship Diagram
-Here's the complete database structure:
+Here's the complete database structure (auto-generated by tbls):
-```svg
-<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1200 900" style="background: transparent;">
-  <defs>
-    <style>
-      .table-box { fill: #1e1e1e; stroke: #4a9eff; stroke-width: 2; }
-      .table-header { fill: #2d5a8e; }
-      .text-header { fill: #ffffff; font-family: monospace; font-size: 14px; font-weight: bold; }
-      .text-field { fill: #d4d4d4; font-family: monospace; font-size: 11px; }
-      .text-type { fill: #8cb4e8; font-family: monospace; font-size: 10px; }
-      .relation-line { stroke: #4a9eff; stroke-width: 1.5; fill: none; }
-      .arrow { fill: #4a9eff; }
-      .join-table { fill: #1e3a1e; stroke: #4a9eff; stroke-width: 2; }
-    </style>
-  </defs>
+![HTM Entity-Relationship Diagram](../database/schema.svg)
-  <!-- Robots Table -->
-  <rect class="table-box" x="50" y="50" width="280" height="140" rx="5"/>
-  <rect class="table-header" x="50" y="50" width="280" height="35" rx="5"/>
-  <text class="text-header" x="190" y="73" text-anchor="middle">robots</text>
+## Table Reference
-  <text class="text-field" x="60" y="100">id</text>
-  <text class="text-type" x="320" y="100" text-anchor="end">BIGSERIAL PK</text>
+For detailed table definitions, columns, indexes, and constraints, see the auto-generated documentation:
-  <text class="text-field" x="60" y="120">name</text>
-  <text class="text-type" x="320" y="120" text-anchor="end">TEXT</text>
+### Core Tables
-  <text class="text-field" x="60" y="140">created_at</text>
-  <text class="text-type" x="320" y="140" text-anchor="end">TIMESTAMPTZ</text>
+| Table | Description | Details |
+|-------|-------------|---------|
+| [robots](../database/public.robots.md) | Registry of all LLM robots using the HTM system | Stores robot metadata and activity tracking |
+| [nodes](../database/public.nodes.md) | Core memory storage for conversation messages and context | Vector embeddings, full-text search, deduplication |
+| [tags](../database/public.tags.md) | Unique hierarchical tag names for categorization | Colon-separated namespaces (e.g., `ai:llm:embeddings`) |
+| file_sources | Source file metadata for loaded documents | Path, mtime, frontmatter, sync tracking |
-  <text class="text-field" x="60" y="160">last_active</text>
-  <text class="text-type" x="320" y="160" text-anchor="end">TIMESTAMPTZ</text>
+### Join Tables
-  <text class="text-field" x="60" y="180">metadata</text>
-  <text class="text-type" x="320" y="180" text-anchor="end">JSONB</text>
+| Table | Description | Details |
+|-------|-------------|---------|
+| [robot_nodes](../database/public.robot_nodes.md) | Links robots to nodes (many-to-many) | Enables "hive mind" shared memory; includes `working_memory` boolean for per-robot working memory state |
+| [node_tags](../database/public.node_tags.md) | Links nodes to tags (many-to-many) | Flexible multi-tag categorization |
-  <!-- Nodes Table -->
-  <rect class="table-box" x="50" y="250" width="280" height="400" rx="5"/>
-  <rect class="table-header" x="50" y="250" width="280" height="35" rx="5"/>
-  <text class="text-header" x="190" y="273" text-anchor="middle">nodes</text>
+### System Tables
-  <text class="text-field" x="60" y="300">id</text>
-  <text class="text-type" x="320" y="300" text-anchor="end">BIGSERIAL PK</text>
+| Table | Description | Details |
+|-------|-------------|---------|
+| [schema_migrations](../database/public.schema_migrations.md) | ActiveRecord migration tracking | Tracks applied migrations |
-  <text class="text-field" x="60" y="320">content</text>
-  <text class="text-type" x="320" y="320" text-anchor="end">TEXT NOT NULL</text>
+For the complete schema overview including all stored procedures and functions, see the [Database Tables Overview](../database/README.md).
-  <text class="text-field" x="60" y="340">speaker</text>
-  <text class="text-type" x="320" y="340" text-anchor="end">TEXT NOT NULL</text>
+## Key Concepts
-  <text class="text-field" x="60" y="360">type</text>
-  <text class="text-type" x="320" y="360" text-anchor="end">TEXT</text>
+### Content Deduplication
-  <text class="text-field" x="60" y="380">category</text>
-  <text class="text-type" x="320" y="380" text-anchor="end">TEXT</text>
+Content deduplication is enforced via SHA-256 hashing in the `nodes` table:
-  <text class="text-field" x="60" y="400">importance</text>
-  <text class="text-type" x="320" y="400" text-anchor="end">DOUBLE PRECISION</text>
+1. When `remember()` is called, a SHA-256 hash of the content is computed
+2. If a node with the same `content_hash` exists, the existing node is reused
+3. A new `robot_nodes` association is created (or updated if it already exists)
+4. This ensures identical memories are stored once but can be "remembered" by multiple robots
-  <text class="text-field" x="60" y="420">created_at</text>
-  <text class="text-type" x="320" y="420" text-anchor="end">TIMESTAMPTZ</text>
+### JSONB Metadata
-  <text class="text-field" x="60" y="440">updated_at</text>
-  <text class="text-type" x="320" y="440" text-anchor="end">TIMESTAMPTZ</text>
+The `nodes` table includes a `metadata` JSONB column for flexible key-value storage:
-  <text class="text-field" x="60" y="460">last_accessed</text>
-  <text class="text-type" x="320" y="460" text-anchor="end">TIMESTAMPTZ</text>
+| Column | Type | Default | Description |
+|--------|------|---------|-------------|
+| `metadata` | jsonb | `{}` | Arbitrary key-value data |
-  <text class="text-field" x="60" y="480">token_count</text>
-  <text class="text-type" x="320" y="480" text-anchor="end">INTEGER</text>
+**Features:**
+- Stores any valid JSON data (strings, numbers, booleans, arrays, objects)
+- GIN index (`idx_nodes_metadata`) for efficient containment queries
+- Queried using PostgreSQL's `@>` containment operator
-  <text class="text-field" x="60" y="500">in_working_memory</text>
-  <text class="text-type" x="320" y="500" text-anchor="end">BOOLEAN</text>
-  <text class="text-field" x="60" y="520">robot_id</text>
-  <text class="text-type" x="320" y="520" text-anchor="end">BIGINT FK</text>
-  <text class="text-field" x="60" y="540">embedding</text>
-  <text class="text-type" x="320" y="540" text-anchor="end">vector(2000)</text>
-  <text class="text-field" x="60" y="560">embedding_dimension</text>
-  <text class="text-type" x="320" y="560" text-anchor="end">INTEGER</text>
-  <!-- Tags Table -->
-  <rect class="table-box" x="850" y="250" width="280" height="120" rx="5"/>
-  <rect class="table-header" x="850" y="250" width="280" height="35" rx="5"/>
-  <text class="text-header" x="990" y="273" text-anchor="middle">tags</text>
-  <text class="text-field" x="860" y="300">id</text>
-  <text class="text-type" x="1120" y="300" text-anchor="end">BIGSERIAL PK</text>
-  <text class="text-field" x="860" y="320">name</text>
-  <text class="text-type" x="1120" y="320" text-anchor="end">TEXT UNIQUE</text>
-  <text class="text-field" x="860" y="340">created_at</text>
-  <text class="text-type" x="1120" y="340" text-anchor="end">TIMESTAMPTZ</text>
-  <!-- nodes_tags Join Table -->
-  <rect class="join-table" x="450" y="420" width="280" height="140" rx="5"/>
-  <rect class="table-header" x="450" y="420" width="280" height="35" rx="5"/>
-  <text class="text-header" x="590" y="443" text-anchor="middle">nodes_tags</text>
-  <text class="text-field" x="460" y="470">id</text>
-  <text class="text-type" x="720" y="470" text-anchor="end">BIGSERIAL PK</text>
-  <text class="text-field" x="460" y="490">node_id</text>
-  <text class="text-type" x="720" y="490" text-anchor="end">BIGINT FK</text>
-  <text class="text-field" x="460" y="510">tag_id</text>
-  <text class="text-type" x="720" y="510" text-anchor="end">BIGINT FK</text>
-  <text class="text-field" x="460" y="530">created_at</text>
-  <text class="text-type" x="720" y="530" text-anchor="end">TIMESTAMPTZ</text>
-  <!-- Relationships: robots -> nodes -->
-  <path class="relation-line" d="M 190 190 L 190 250"/>
-  <polygon class="arrow" points="190,250 185,240 195,240"/>
-  <!-- Relationships: nodes -> nodes_tags -->
-  <path class="relation-line" d="M 330 490 L 450 490"/>
-  <polygon class="arrow" points="450,490 440,485 440,495"/>
-  <!-- Relationships: tags -> nodes_tags -->
-  <path class="relation-line" d="M 850 310 L 730 310 L 730 510 L 730 510"/>
-  <polygon class="arrow" points="730,510 725,500 735,500"/>
+**Query examples:**
+```sql
+-- Find nodes with specific metadata
+SELECT * FROM nodes WHERE metadata @> '{"priority": "high"}'::jsonb;
-  <!-- Legend -->
-  <text class="text-field" x="50" y="720" font-weight="bold">Legend:</text>
-  <text class="text-field" x="50" y="740">PK = Primary Key</text>
-  <text class="text-field" x="200" y="740">FK = Foreign Key</text>
-  <text class="text-field" x="50" y="760">Green box = Join table (many-to-many)</text>
+-- Find nodes with nested metadata
+SELECT * FROM nodes WHERE metadata @> '{"user": {"role": "admin"}}'::jsonb;
-  <!-- Annotations -->
-  <text class="text-field" x="400" y="370" font-style="italic">1:N</text>
-  <text class="text-field" x="380" y="480" font-style="italic">N:M</text>
-  <text class="text-field" x="770" y="480" font-style="italic">N:M</text>
-</svg>
+-- Find nodes with multiple conditions
+SELECT * FROM nodes WHERE metadata @> '{"environment": "production", "version": 2}'::jsonb;
 ```
-## Table Definitions
+**Ruby usage:**
+```ruby
+# Store with metadata
+htm.remember("API config", metadata: { environment: "production", version: 2 })
-### robots
+# Recall filtering by metadata
+htm.recall("config", metadata: { environment: "production" })
+```
-The robots table stores registration and metadata for all LLM agents using the HTM system.
+### Hierarchical Tags
-**Purpose**: Registry of all robots (LLM agents) with their configuration and activity tracking.
+Tags use colon-separated hierarchies for organization:
+- `programming:ruby:gems` - Programming > Ruby > Gems
+- `database:postgresql:extensions` - Database > PostgreSQL > Extensions
+- `ai:llm:embeddings` - AI > LLM > Embeddings
+Query by prefix to find all related tags:
 ```sql
-CREATE TABLE public.robots (
-    id bigint NOT NULL,
-    name text,
-    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
-    last_active timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
-    metadata jsonb
-);
-ALTER TABLE ONLY public.robots ALTER COLUMN id SET DEFAULT nextval('public.robots_id_seq'::regclass);
-ALTER TABLE ONLY public.robots ADD CONSTRAINT robots_pkey PRIMARY KEY (id);
+SELECT * FROM tags WHERE name LIKE 'database:%';  -- All database-related tags
+SELECT * FROM tags WHERE name LIKE 'ai:llm:%';    -- All LLM-related tags
 ```
-**Columns**:
+### File Source Tracking
-| Column | Type | Nullable | Default | Description |
-|--------|------|----------|---------|-------------|
-| `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
-| `name` | TEXT | YES | NULL | Human-readable name for the robot |
-| `created_at` | TIMESTAMPTZ | YES | NOW() | When the robot was first registered |
-| `last_active` | TIMESTAMPTZ | YES | NOW() | Last time the robot accessed the system |
-| `metadata` | JSONB | YES | NULL | Robot-specific configuration and metadata |
+The `file_sources` table tracks loaded documents for re-sync support:
-**Indexes**:
-- `PRIMARY KEY` on `id`
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | bigint | Primary key |
+| `file_path` | text | Absolute path to the source file |
+| `file_hash` | varchar(64) | SHA-256 hash of file contents |
+| `mtime` | timestamptz | File modification time for change detection |
+| `file_size` | integer | File size in bytes |
+| `frontmatter` | jsonb | Parsed YAML frontmatter metadata |
+| `last_synced_at` | timestamptz | When file was last synced |
+| `created_at` | timestamptz | When source was first loaded |
+| `updated_at` | timestamptz | When source was last updated |
-**Relationships**:
-- One robot has many nodes (1:N)
----
+Nodes loaded from files have:
+- `source_id` - Foreign key to file_sources (nullable, ON DELETE SET NULL)
+- `chunk_position` - Integer position within the file (0-indexed)
-### nodes
+Query nodes from a file:
+```sql
+SELECT n.*
+FROM nodes n
+JOIN file_sources fs ON n.source_id = fs.id
+WHERE fs.file_path = '/path/to/file.md'
+ORDER BY n.chunk_position;
+```
-The core table storing all memory nodes with vector embeddings for semantic search.
+### Remember Tracking
-**Purpose**: Stores all memories (conversation messages, facts, decisions, code, etc.) with full-text and vector search capabilities.
+The `robot_nodes` table tracks per-robot remember metadata:
-```sql
-CREATE TABLE public.nodes (
-    id bigint NOT NULL,
-    content text NOT NULL,
-    speaker text NOT NULL,
-    type text,
-    category text,
-    importance double precision DEFAULT 1.0,
-    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
-    updated_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
-    last_accessed timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
-    token_count integer,
-    in_working_memory boolean DEFAULT false,
-    robot_id bigint NOT NULL,
-    embedding public.vector(2000),
-    embedding_dimension integer,
-    CONSTRAINT check_embedding_dimension CHECK (((embedding_dimension IS NULL) OR ((embedding_dimension > 0) AND (embedding_dimension <= 2000))))
-);
-ALTER TABLE ONLY public.nodes ALTER COLUMN id SET DEFAULT nextval('public.nodes_id_seq'::regclass);
-ALTER TABLE ONLY public.nodes ADD CONSTRAINT nodes_pkey PRIMARY KEY (id);
-ALTER TABLE ONLY public.nodes
-    ADD CONSTRAINT fk_rails_60162e9d3a FOREIGN KEY (robot_id) REFERENCES public.robots(id) ON DELETE CASCADE;
-```
+1. `first_remembered_at` - When this robot first encountered this content
+2. `last_remembered_at` - Updated each time the robot tries to remember the same content
+3. `remember_count` - Incremented each time (useful for identifying frequently reinforced memories)
-**Columns**:
-| Column | Type | Nullable | Default | Description |
-|--------|------|----------|---------|-------------|
-| `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
-| `content` | TEXT | NO | - | The conversation message/utterance content |
-| `speaker` | TEXT | NO | - | Who said it: user or robot name |
-| `type` | TEXT | YES | NULL | Memory type: fact, context, code, preference, decision, question |
-| `category` | TEXT | YES | NULL | Optional category for organizing memories |
-| `importance` | DOUBLE PRECISION | YES | 1.0 | Importance score (0.0-1.0) for prioritizing recall |
-| `created_at` | TIMESTAMPTZ | YES | NOW() | When this memory was created |
-| `updated_at` | TIMESTAMPTZ | YES | NOW() | When this memory was last modified |
-| `last_accessed` | TIMESTAMPTZ | YES | NOW() | When this memory was last accessed |
-| `token_count` | INTEGER | YES | NULL | Number of tokens in the content (for context budget management) |
-| `in_working_memory` | BOOLEAN | YES | FALSE | Whether this memory is currently in working memory |
-| `robot_id` | BIGINT | NO | - | ID of the robot that owns this memory |
-| `embedding` | vector(2000) | YES | NULL | Vector embedding (max 2000 dimensions) for semantic search |
-| `embedding_dimension` | INTEGER | YES | NULL | Actual number of dimensions used in the embedding vector (max 2000) |
-**Indexes**:
-- `PRIMARY KEY` on `id`
-- `idx_nodes_robot_id` BTREE on `robot_id`
-- `idx_nodes_speaker` BTREE on `speaker`
-- `idx_nodes_type` BTREE on `type`
-- `idx_nodes_category` BTREE on `category`
-- `idx_nodes_created_at` BTREE on `created_at`
-- `idx_nodes_updated_at` BTREE on `updated_at`
-- `idx_nodes_last_accessed` BTREE on `last_accessed`
-- `idx_nodes_in_working_memory` BTREE on `in_working_memory`
-- `idx_nodes_embedding` HNSW on `embedding` using `vector_cosine_ops` (m=16, ef_construction=64)
-- `idx_nodes_content_gin` GIN on `to_tsvector('english', content)` for full-text search
-- `idx_nodes_content_trgm` GIN on `content` using `gin_trgm_ops` for fuzzy matching
-**Foreign Keys**:
-- `robot_id` references `robots(id)` ON DELETE CASCADE
-**Relationships**:
-- Many nodes belong to one robot (N:1)
-- Many nodes have many tags through nodes_tags (N:M)
-**Check Constraints**:
-- `check_embedding_dimension`: Ensures embedding_dimension is NULL or between 1 and 2000
+This allows querying for:
+- Recently reinforced memories: `ORDER BY last_remembered_at DESC`
+- Frequently remembered content: `ORDER BY remember_count DESC`
+- New vs old memories: Compare `first_remembered_at` across robots
 ---
-### tags
-The tags table stores unique hierarchical tag names for categorization.
+## Common Query Patterns
-**Purpose**: Provides flexible, hierarchical categorization using colon-separated namespaces (e.g., `database:postgresql:timescaledb`).
+### Finding Nodes for a Robot
 ```sql
-CREATE TABLE public.tags (
-    id bigint NOT NULL,
-    name text NOT NULL,
-    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
-);
-ALTER TABLE ONLY public.tags ALTER COLUMN id SET DEFAULT nextval('public.tags_id_seq'::regclass);
-ALTER TABLE ONLY public.tags ADD CONSTRAINT tags_pkey PRIMARY KEY (id);
+SELECT n.*
+FROM nodes n
+JOIN robot_nodes rn ON n.id = rn.node_id
+WHERE rn.robot_id = $1
+ORDER BY rn.last_remembered_at DESC;
 ```
-**Columns**:
-| Column | Type | Nullable | Default | Description |
-|--------|------|----------|---------|-------------|
-| `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
-| `name` | TEXT | NO | - | Hierarchical tag in format: root:level1:level2 (e.g., database:postgresql:timescaledb) |
-| `created_at` | TIMESTAMPTZ | YES | NOW() | When this tag was created |
-**Indexes**:
-- `PRIMARY KEY` on `id`
-- `idx_tags_name_unique` UNIQUE BTREE on `name`
-- `idx_tags_name_pattern` BTREE on `name` with `text_pattern_ops` for pattern matching
-**Relationships**:
-- Many tags belong to many nodes through nodes_tags (N:M)
+### Finding Robots that Share a Node
-**Tag Hierarchy**:
-Tags use colon-separated hierarchies for organization:
-- `programming:ruby:gems` - Programming > Ruby > Gems
-- `database:postgresql:extensions` - Database > PostgreSQL > Extensions
-- `ai:llm:embeddings` - AI > LLM > Embeddings
-This allows querying by prefix to find all related tags:
 ```sql
-SELECT * FROM tags WHERE name LIKE 'database:%';  -- All database-related tags
-SELECT * FROM tags WHERE name LIKE 'ai:llm:%';    -- All LLM-related tags
+SELECT r.*
+FROM robots r
+JOIN robot_nodes rn ON r.id = rn.robot_id
+WHERE rn.node_id = $1
+ORDER BY rn.first_remembered_at;
 ```
----
-### nodes_tags
-The nodes_tags join table implements the many-to-many relationship between nodes and tags.
-**Purpose**: Links nodes to tags, allowing each node to have multiple tags and each tag to be applied to multiple nodes.
+### Finding Frequently Remembered Content
 ```sql
-CREATE TABLE public.nodes_tags (
-    id bigint NOT NULL,
-    node_id bigint NOT NULL,
-    tag_id bigint NOT NULL,
-    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP
-);
-ALTER TABLE ONLY public.nodes_tags ALTER COLUMN id SET DEFAULT nextval('public.node_tags_id_seq'::regclass);
-ALTER TABLE ONLY public.nodes_tags ADD CONSTRAINT node_tags_pkey PRIMARY KEY (id);
-ALTER TABLE ONLY public.nodes_tags
-    ADD CONSTRAINT fk_rails_b0b726ecf8 FOREIGN KEY (node_id) REFERENCES public.nodes(id) ON DELETE CASCADE;
-ALTER TABLE ONLY public.nodes_tags
-    ADD CONSTRAINT fk_rails_eccc99cec5 FOREIGN KEY (tag_id) REFERENCES public.tags(id) ON DELETE CASCADE;
+SELECT n.*, rn.remember_count, rn.first_remembered_at, rn.last_remembered_at
+FROM nodes n
+JOIN robot_nodes rn ON n.id = rn.node_id
+WHERE rn.robot_id = $1
+ORDER BY rn.remember_count DESC
+LIMIT 10;
 ```
-**Columns**:
-| Column | Type | Nullable | Default | Description |
-|--------|------|----------|---------|-------------|
-| `id` | BIGINT | NO | AUTO | Unique identifier (primary key) |
-| `node_id` | BIGINT | NO | - | ID of the node being tagged |
-| `tag_id` | BIGINT | NO | - | ID of the tag being applied |
-| `created_at` | TIMESTAMPTZ | YES | NOW() | When this association was created |
-**Indexes**:
-- `PRIMARY KEY` on `id`
-- `idx_node_tags_unique` UNIQUE BTREE on `(node_id, tag_id)` - Prevents duplicate associations
-- `idx_node_tags_node_id` BTREE on `node_id` - Fast lookups of tags for a node
-- `idx_node_tags_tag_id` BTREE on `tag_id` - Fast lookups of nodes for a tag
-**Foreign Keys**:
-- `node_id` references `nodes(id)` ON DELETE CASCADE
-- `tag_id` references `tags(id)` ON DELETE CASCADE
-**Cascade Behavior**:
-- When a node is deleted, all its tag associations are automatically removed
-- When a tag is deleted, all associations to that tag are automatically removed
-- The join table ensures referential integrity between nodes and tags
----
-## Common Query Patterns
 ### Finding Tags for a Node
 ```sql
 SELECT t.name
 FROM tags t
-JOIN nodes_tags nt ON t.id = nt.tag_id
+JOIN node_tags nt ON t.id = nt.tag_id
 WHERE nt.node_id = $1
 ORDER BY t.name;
 ```
@@ -392,7 +204,7 @@ ORDER BY t.name;
 ```sql
 SELECT n.*
 FROM nodes n
-JOIN nodes_tags nt ON n.id = nt.node_id
+JOIN node_tags nt ON n.id = nt.node_id
 JOIN tags t ON nt.tag_id = t.id
 WHERE t.name = 'database:postgresql'
 ORDER BY n.created_at DESC;
@@ -403,7 +215,7 @@ ORDER BY n.created_at DESC;
 ```sql
 SELECT n.*
 FROM nodes n
-JOIN nodes_tags nt ON n.id = nt.node_id
+JOIN node_tags nt ON n.id = nt.node_id
 JOIN tags t ON nt.tag_id = t.id
 WHERE t.name LIKE 'ai:llm:%'
 ORDER BY n.created_at DESC;
@@ -417,8 +229,8 @@ SELECT
     t2.name AS topic2,
     COUNT(DISTINCT nt1.node_id) AS shared_nodes
 FROM tags t1
-JOIN nodes_tags nt1 ON t1.id = nt1.tag_id
-JOIN nodes_tags nt2 ON nt1.node_id = nt2.node_id
+JOIN node_tags nt1 ON t1.id = nt1.tag_id
+JOIN node_tags nt2 ON nt1.node_id = nt2.node_id
 JOIN tags t2 ON nt2.tag_id = t2.id
 WHERE t1.name < t2.name
 GROUP BY t1.name, t2.name
@@ -431,7 +243,7 @@ ORDER BY shared_nodes DESC;
 ```sql
 SELECT n.*, n.embedding <=> $1::vector AS distance
 FROM nodes n
-JOIN nodes_tags nt ON n.id = nt.node_id
+JOIN node_tags nt ON n.id = nt.node_id
 JOIN tags t ON nt.tag_id = t.id
 WHERE t.name = 'programming:ruby'
   AND n.embedding IS NOT NULL
@@ -444,7 +256,7 @@ LIMIT 10;
 ```sql
 SELECT n.*, ts_rank(to_tsvector('english', n.content), query) AS rank
 FROM nodes n
-JOIN nodes_tags nt ON n.id = nt.node_id
+JOIN node_tags nt ON n.id = nt.node_id
 JOIN tags t ON nt.tag_id = t.id,
      to_tsquery('english', 'database & optimization') query
 WHERE to_tsvector('english', n.content) @@ query
@@ -453,6 +265,17 @@ ORDER BY rank DESC
 LIMIT 20;
 ```
+### Finding Content Shared by Multiple Robots
+```sql
+SELECT n.*, COUNT(DISTINCT rn.robot_id) AS robot_count
+FROM nodes n
+JOIN robot_nodes rn ON n.id = rn.node_id
+GROUP BY n.id
+HAVING COUNT(DISTINCT rn.robot_id) > 1
+ORDER BY robot_count DESC;
+```
 ---
 ## Database Optimization
@@ -518,6 +341,8 @@ The schema is managed through ActiveRecord migrations located in `db/migrate/`:
 1. `20250101000001_create_robots.rb` - Creates robots table
 2. `20250101000002_create_nodes.rb` - Creates nodes table with all indexes
 3. `20250101000005_create_tags.rb` - Creates tags and nodes_tags tables
+4. `20251128000002_create_file_sources.rb` - Creates file_sources table for document tracking
+5. `20251128000003_add_source_to_nodes.rb` - Adds source_id and chunk_position to nodes
 To apply migrations:
 ```bash

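The content deduplication steps and the per-robot remember tracking described in the `schema.md` changes above can be sketched in plain Ruby. This is a minimal in-memory illustration under stated assumptions, not HTM's implementation: `DedupStore`, its `Node` and `RobotNode` structs, and the `remember(robot_id, content, now:)` signature are hypothetical names invented for this example. Only the SHA-256 content hash and the `first_remembered_at`, `last_remembered_at`, and `remember_count` fields are taken from the documented schema.

```ruby
require "digest"

# Hypothetical in-memory stand-in for the nodes and robot_nodes tables.
# It mirrors the documented behaviour only; it does not touch PostgreSQL.
class DedupStore
  Node = Struct.new(:id, :content, :content_hash)
  RobotNode = Struct.new(:robot_id, :node_id, :first_remembered_at,
                         :last_remembered_at, :remember_count)

  attr_reader :nodes, :robot_nodes

  def initialize
    @nodes = {}        # content_hash => Node
    @robot_nodes = {}  # [robot_id, node_id] => RobotNode
  end

  # Stores content once per unique SHA-256 hash and records (or updates)
  # the association between the robot and the node.
  def remember(robot_id, content, now: Time.now)
    hash = Digest::SHA256.hexdigest(content)

    # Reuse the existing node when the hash is already known.
    node = (@nodes[hash] ||= Node.new(@nodes.size + 1, content, hash))

    key = [robot_id, node.id]
    if (link = @robot_nodes[key])
      # Same robot remembering the same content again: reinforce it.
      link.last_remembered_at = now
      link.remember_count += 1
    else
      # First time this robot encounters this content.
      @robot_nodes[key] = RobotNode.new(robot_id, node.id, now, now, 1)
    end

    node
  end
end
```

In this sketch, two robots remembering the same string share one `Node` and get separate `RobotNode` links, and a repeat call from the same robot only bumps `remember_count` and `last_remembered_at`.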
data/docs/development/testing.md CHANGED

@@ -147,15 +147,7 @@ ruby -r debug test/htm_test.rb
 ### Test File Layout
-```
-test/
-├── test_helper.rb              # Shared test configuration
-├── htm_test.rb                 # Unit tests for HTM class
-├── embedding_service_test.rb   # Unit tests for EmbeddingService
-├── integration_test.rb         # Integration tests
-└── fixtures/                   # Test data (future)
-    └── sample_memories.json
-```
+![Test Directory Structure](../assets/images/test-directory-structure.svg)
 ### Test File Template

data/docs/getting-started/index.md ADDED

@@ -0,0 +1,47 @@
+# Getting Started
+Welcome to HTM (Hierarchical Temporary Memory)! This section will help you get up and running quickly.
+## Overview
+HTM provides intelligent memory management for LLM-based applications (robots) with a two-tier architecture:
+- **Long-term Memory**: Durable PostgreSQL storage with vector embeddings for semantic search
+- **Working Memory**: Token-limited in-memory context for immediate LLM use
+## What You'll Learn
+<div class="grid cards" markdown>
+-   :material-download:{ .lg .middle } **Installation**
+    ---
+    Set up HTM in your Ruby project with all required dependencies including PostgreSQL, pgvector, and Ollama.
+    [:octicons-arrow-right-24: Install HTM](installation.md)
+-   :material-rocket-launch:{ .lg .middle } **Quick Start**
+    ---
+    Build your first memory-enabled robot in minutes with practical examples and code snippets.
+    [:octicons-arrow-right-24: Get started](quick-start.md)
+</div>
+## Prerequisites
+Before installing HTM, ensure you have:
+- **Ruby 3.1+** - HTM uses modern Ruby features
+- **PostgreSQL 14+** - With pgvector and pg_trgm extensions
+- **Ollama** (optional) - For local embedding generation
+## Next Steps
+1. **[Install HTM](installation.md)** - Set up the gem and database
+2. **[Quick Start](quick-start.md)** - Create your first memory-enabled robot
+3. **[Architecture Overview](../architecture/overview.md)** - Understand how HTM works
+4. **[Guides](../guides/index.md)** - Deep dive into specific features

data/docs/{installation.md → getting-started/installation.md} RENAMED

@@ -453,8 +453,8 @@ psql $HTM_DBURL -c "
 Now that HTM is installed, you're ready to start building:
 1. **[Quick Start Guide](quick-start.md)**: Build your first HTM application in 5 minutes
-2. **[User Guide](guides/getting-started.md)**: Learn all HTM features in depth
-3. **[API Reference](api/htm.md)**: Explore the complete API documentation
+2. **[User Guide](../guides/getting-started.md)**: Learn all HTM features in depth
+3. **[API Reference](../api/htm.md)**: Explore the complete API documentation
 4. **[Examples](https://github.com/madbomber/htm/tree/main/examples)**: See real-world usage examples
 ## Getting Help