RubyGems - htm - Versions diffs - 0.0.1 - Mend

htm 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (155)

checksums.yaml +7 -0
data/.architecture/decisions/adrs/001-use-postgresql-timescaledb-storage.md +227 -0
data/.architecture/decisions/adrs/002-two-tier-memory-architecture.md +322 -0
data/.architecture/decisions/adrs/003-ollama-default-embedding-provider.md +339 -0
data/.architecture/decisions/adrs/004-multi-robot-shared-memory-hive-mind.md +374 -0
data/.architecture/decisions/adrs/005-rag-based-retrieval-with-hybrid-search.md +443 -0
data/.architecture/decisions/adrs/006-context-assembly-strategies.md +444 -0
data/.architecture/decisions/adrs/007-working-memory-eviction-strategy.md +461 -0
data/.architecture/decisions/adrs/008-robot-identification-system.md +550 -0
data/.architecture/decisions/adrs/009-never-forget-explicit-deletion-only.md +570 -0
data/.architecture/decisions/adrs/010-redis-working-memory-rejected.md +323 -0
data/.architecture/decisions/adrs/011-database-side-embedding-generation-with-pgai.md +585 -0
data/.architecture/decisions/adrs/012-llm-driven-ontology-topic-extraction.md +583 -0
data/.architecture/decisions/adrs/013-activerecord-orm-and-many-to-many-tagging.md +299 -0
data/.architecture/decisions/adrs/014-client-side-embedding-generation-workflow.md +569 -0
data/.architecture/decisions/adrs/015-hierarchical-tag-ontology-and-llm-extraction.md +701 -0
data/.architecture/decisions/adrs/016-async-embedding-and-tag-generation.md +694 -0
data/.architecture/members.yml +144 -0
data/.architecture/reviews/2025-10-29-llm-configuration-and-async-processing-review.md +1137 -0
data/.architecture/reviews/initial-system-analysis.md +330 -0
data/.envrc +32 -0
data/.irbrc +145 -0
data/CHANGELOG.md +150 -0
data/COMMITS.md +196 -0
data/LICENSE +21 -0
data/README.md +1347 -0
data/Rakefile +51 -0
data/SETUP.md +268 -0
data/config/database.yml +67 -0
data/db/migrate/20250101000001_enable_extensions.rb +14 -0
data/db/migrate/20250101000002_create_robots.rb +14 -0
data/db/migrate/20250101000003_create_nodes.rb +42 -0
data/db/migrate/20250101000005_create_tags.rb +38 -0
data/db/migrate/20250101000007_add_node_vector_indexes.rb +30 -0
data/db/schema.sql +473 -0
data/db/seed_data/README.md +100 -0
data/db/seed_data/presidents.md +136 -0
data/db/seed_data/states.md +151 -0
data/db/seeds.rb +208 -0
data/dbdoc/README.md +173 -0
data/dbdoc/public.node_stats.md +48 -0
data/dbdoc/public.node_stats.svg +41 -0
data/dbdoc/public.node_tags.md +40 -0
data/dbdoc/public.node_tags.svg +112 -0
data/dbdoc/public.nodes.md +54 -0
data/dbdoc/public.nodes.svg +118 -0
data/dbdoc/public.nodes_tags.md +39 -0
data/dbdoc/public.nodes_tags.svg +112 -0
data/dbdoc/public.ontology_structure.md +48 -0
data/dbdoc/public.ontology_structure.svg +38 -0
data/dbdoc/public.operations_log.md +42 -0
data/dbdoc/public.operations_log.svg +130 -0
data/dbdoc/public.relationships.md +39 -0
data/dbdoc/public.relationships.svg +41 -0
data/dbdoc/public.robot_activity.md +46 -0
data/dbdoc/public.robot_activity.svg +35 -0
data/dbdoc/public.robots.md +35 -0
data/dbdoc/public.robots.svg +90 -0
data/dbdoc/public.schema_migrations.md +29 -0
data/dbdoc/public.schema_migrations.svg +26 -0
data/dbdoc/public.tags.md +35 -0
data/dbdoc/public.tags.svg +60 -0
data/dbdoc/public.topic_relationships.md +45 -0
data/dbdoc/public.topic_relationships.svg +32 -0
data/dbdoc/schema.json +1437 -0
data/dbdoc/schema.svg +154 -0
data/docs/api/database.md +806 -0
data/docs/api/embedding-service.md +532 -0
data/docs/api/htm.md +797 -0
data/docs/api/index.md +259 -0
data/docs/api/long-term-memory.md +1096 -0
data/docs/api/working-memory.md +665 -0
data/docs/architecture/adrs/001-postgresql-timescaledb.md +314 -0
data/docs/architecture/adrs/002-two-tier-memory.md +411 -0
data/docs/architecture/adrs/003-ollama-embeddings.md +421 -0
data/docs/architecture/adrs/004-hive-mind.md +437 -0
data/docs/architecture/adrs/005-rag-retrieval.md +531 -0
data/docs/architecture/adrs/006-context-assembly.md +496 -0
data/docs/architecture/adrs/007-eviction-strategy.md +645 -0
data/docs/architecture/adrs/008-robot-identification.md +625 -0
data/docs/architecture/adrs/009-never-forget.md +648 -0
data/docs/architecture/adrs/010-redis-working-memory-rejected.md +323 -0
data/docs/architecture/adrs/011-pgai-integration.md +494 -0
data/docs/architecture/adrs/index.md +215 -0
data/docs/architecture/hive-mind.md +736 -0
data/docs/architecture/index.md +351 -0
data/docs/architecture/overview.md +538 -0
data/docs/architecture/two-tier-memory.md +873 -0
data/docs/assets/css/custom.css +83 -0
data/docs/assets/images/htm-core-components.svg +63 -0
data/docs/assets/images/htm-database-schema.svg +93 -0
data/docs/assets/images/htm-hive-mind-architecture.svg +125 -0
data/docs/assets/images/htm-importance-scoring-framework.svg +83 -0
data/docs/assets/images/htm-layered-architecture.svg +71 -0
data/docs/assets/images/htm-long-term-memory-architecture.svg +115 -0
data/docs/assets/images/htm-working-memory-architecture.svg +120 -0
data/docs/assets/images/htm.jpg +0 -0
data/docs/assets/images/htm_demo.gif +0 -0
data/docs/assets/js/mathjax.js +18 -0
data/docs/assets/videos/htm_video.mp4 +0 -0
data/docs/database_rake_tasks.md +322 -0
data/docs/development/contributing.md +787 -0
data/docs/development/index.md +336 -0
data/docs/development/schema.md +596 -0
data/docs/development/setup.md +719 -0
data/docs/development/testing.md +819 -0
data/docs/guides/adding-memories.md +824 -0
data/docs/guides/context-assembly.md +1009 -0
data/docs/guides/getting-started.md +577 -0
data/docs/guides/index.md +118 -0
data/docs/guides/long-term-memory.md +941 -0
data/docs/guides/multi-robot.md +866 -0
data/docs/guides/recalling-memories.md +927 -0
data/docs/guides/search-strategies.md +953 -0
data/docs/guides/working-memory.md +717 -0
data/docs/index.md +214 -0
data/docs/installation.md +477 -0
data/docs/multi_framework_support.md +519 -0
data/docs/quick-start.md +655 -0
data/docs/setup_local_database.md +302 -0
data/docs/using_rake_tasks_in_your_app.md +383 -0
data/examples/basic_usage.rb +93 -0
data/examples/cli_app/README.md +317 -0
data/examples/cli_app/htm_cli.rb +270 -0
data/examples/custom_llm_configuration.rb +183 -0
data/examples/example_app/Rakefile +71 -0
data/examples/example_app/app.rb +206 -0
data/examples/sinatra_app/Gemfile +21 -0
data/examples/sinatra_app/app.rb +335 -0
data/lib/htm/active_record_config.rb +113 -0
data/lib/htm/configuration.rb +342 -0
data/lib/htm/database.rb +594 -0
data/lib/htm/embedding_service.rb +115 -0
data/lib/htm/errors.rb +34 -0
data/lib/htm/job_adapter.rb +154 -0
data/lib/htm/jobs/generate_embedding_job.rb +65 -0
data/lib/htm/jobs/generate_tags_job.rb +82 -0
data/lib/htm/long_term_memory.rb +965 -0
data/lib/htm/models/node.rb +109 -0
data/lib/htm/models/node_tag.rb +33 -0
data/lib/htm/models/robot.rb +52 -0
data/lib/htm/models/tag.rb +76 -0
data/lib/htm/railtie.rb +76 -0
data/lib/htm/sinatra.rb +157 -0
data/lib/htm/tag_service.rb +135 -0
data/lib/htm/tasks.rb +38 -0
data/lib/htm/version.rb +5 -0
data/lib/htm/working_memory.rb +182 -0
data/lib/htm.rb +400 -0
data/lib/tasks/db.rake +19 -0
data/lib/tasks/htm.rake +147 -0
data/lib/tasks/jobs.rake +312 -0
data/mkdocs.yml +190 -0
data/scripts/install_local_database.sh +309 -0
metadata +341 -0

data/docs/architecture/overview.md ADDED

@@ -0,0 +1,538 @@
+# Detailed Architecture
+This document describes HTM's system architecture in detail: component interactions, data flows, database schema, and performance characteristics.
+## Table of Contents
+- [System Architecture](#system-architecture)
+- [Component Diagrams](#component-diagrams)
+- [Data Flow Diagrams](#data-flow-diagrams)
+- [Memory Lifecycle](#memory-lifecycle)
+- [Database Schema](#database-schema)
+- [Technology Stack](#technology-stack)
+- [Performance Characteristics](#performance-characteristics)
+- [Scalability Considerations](#scalability-considerations)
+## System Architecture
+HTM implements a layered architecture with clear separation of concerns between presentation (API), business logic (memory management), and data access (database).
+### Architecture Layers
+![HTM Layered Architecture](../assets/images/htm-layered-architecture.svg)
+### Component Responsibilities
+#### API Layer (HTM class)
+- Public interface for all memory operations
+- Robot identification and initialization
+- Request routing to appropriate subsystems
+- Response aggregation and formatting
+- Activity logging and statistics
+#### Coordination Layer
+- **Robot Management**: Registration, activity tracking, metadata
+- **Embedding Coordination**: Generate embeddings for new memories and search queries
+- **Memory Orchestration**: Coordinate between working and long-term memory
+- **Context Assembly**: Build LLM context strings from working memory
+- **Token Management**: Count tokens and enforce limits
+#### Memory Management Layer
+##### Working Memory
+- **In-Memory Store**: Fast Ruby Hash-based storage
+- **Token Budget**: Enforce maximum token limit (default 128K)
+- **Eviction Policy**: Hybrid importance + recency eviction
+- **Access Tracking**: LRU-style access order for recency
+- **Context Assembly**: Three strategies (recent, important, balanced)
+##### Long-Term Memory
+- **Persistence**: Write all memories to PostgreSQL
+- **RAG Search**: Vector + temporal + full-text search
+- **Relationship Management**: Store and query node relationships
+- **Robot Registry**: Track all robots using the system
+- **Eviction Marking**: Mark which nodes are in working memory
+#### Services Layer
+##### Embedding Service
+- **Client-Side Generation**: Generate embeddings before database insertion
+- **Token Counting**: Estimate token counts for strings
+- **Model Management**: Handle different models per provider
+- **Provider Support**: Ollama (default) and OpenAI
+!!! info "Architecture Change (October 2025)"
+    Embeddings are generated client-side in Ruby before database insertion. This provides reliable, cross-platform operation without complex database extension dependencies.
+##### Database Service
+- **Connection Pooling**: Manage PostgreSQL connections
+- **Query Execution**: Execute parameterized queries safely
+- **Transaction Management**: ACID guarantees for operations
+- **Error Handling**: Retry logic and failure recovery
+#### Data Layer
+- **PostgreSQL**: Relational storage with ACID guarantees
+- **TimescaleDB**: Time-series optimization and compression
+- **pgvector**: Vector similarity search with HNSW
+- **pg_trgm**: Fuzzy text matching for search
+## Component Diagrams
+### HTM Core Components
+![HTM Core Components](../assets/images/htm-core-components.svg)
+## Data Flow Diagrams
+### Memory Addition Flow
+This diagram shows the complete flow of adding a new memory node to HTM with **client-side embedding generation**.
+!!! info "Architecture Note"
+    With client-side generation (October 2025), embeddings are generated in Ruby before database insertion. This provides reliable, cross-platform operation.
+```mermaid
+graph TD
+    A[User: add_message] -->|1. Request| B[HTM]
+    B -->|2. Count tokens| C[EmbeddingService]
+    C -->|3. Return count| B
+    B -->|4. Generate embedding| C
+    C -->|5. HTTP call| D[Ollama/OpenAI]
+    D -->|6. Return vector| C
+    C -->|7. Return embedding| B
+    B -->|8. Persist with embedding| E[LongTermMemory]
+    E -->|9. INSERT nodes with embedding| F[PostgreSQL]
+    F -->|10. Return node_id| E
+    E -->|11. Return node_id| B
+    B -->|12. Check space| G[WorkingMemory]
+    G -->|13. Space available?| H{Has Space?}
+    H -->|No| I[Evict nodes]
+    I -->|14. Mark evicted| E
+    H -->|Yes| J[Add to WM]
+    I --> J
+    J -->|15. Success| B
+    B -->|16. Log operation| E
+    B -->|17. Return node_id| A
+    style A fill:rgba(76,175,80,0.3)
+    style B fill:rgba(33,150,243,0.3)
+    style C fill:rgba(255,152,0,0.3)
+    style D fill:rgba(255,193,7,0.3)
+    style E fill:rgba(156,39,176,0.3)
+    style F fill:rgba(156,39,176,0.3)
+    style G fill:rgba(33,150,243,0.3)
+```
+### Memory Recall Flow
+This diagram illustrates the RAG-based retrieval process with **client-side query embeddings**.
+!!! info "Architecture Note"
+    With client-side generation, query embeddings are generated in Ruby before being passed to SQL for vector similarity search.
+```mermaid
+graph TD
+    A[User: recall] -->|1. Request| B[HTM]
+    B -->|2. Parse timeframe| C[Parse Natural Language]
+    C -->|3. Return range| B
+    B -->|4. Generate query embedding| D[EmbeddingService]
+    D -->|5. HTTP call| E[Ollama/OpenAI]
+    E -->|6. Return vector| D
+    D -->|7. Return embedding| B
+    B -->|8. Search with embedding| F[LongTermMemory]
+    F -->|9. Select strategy| G{Search Strategy}
+    G -->|:vector| H[Vector Search]
+    G -->|:fulltext| I[Full-Text Search]
+    G -->|:hybrid| J[Hybrid Search]
+    H -->|10. pgvector HNSW| K[Return results]
+    I -->|10. ts_rank GIN| K
+    J -->|10. Hybrid + RRF| K
+    K -->|11. Results| F
+    F -->|12. Results| B
+    B -->|13. For each result| L[WorkingMemory]
+    L -->|14. Add to WM| M{Has Space?}
+    M -->|No| N[Evict old nodes]
+    M -->|Yes| R[Add node]
+    N --> R
+    R -->|15. Log operation| F
+    B -->|16. Return memories| A
+    style A fill:rgba(76,175,80,0.3)
+    style B fill:rgba(33,150,243,0.3)
+    style F fill:rgba(156,39,176,0.3)
+    style J fill:rgba(255,193,7,0.3)
+    style L fill:rgba(33,150,243,0.3)
+```
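The hybrid branch above combines the vector and full-text rankings with RRF (Reciprocal Rank Fusion). The following is a minimal Ruby sketch of that fusion step, not HTM's actual implementation: the method name `reciprocal_rank_fusion`, the constant `k = 60`, and the sample node ids are assumptions made for illustration.

```ruby
# Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) to an
# item's score, where rank is 1-based. k = 60 is a commonly used default
# and is an assumption here, not a value taken from HTM.
def reciprocal_rank_fusion(rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranked_ids|
    ranked_ids.each_with_index do |id, index|
      scores[id] += 1.0 / (k + index + 1)
    end
  end
  # Highest fused score first; ties are broken by id for a stable order.
  scores.sort_by { |id, score| [-score, id] }.map(&:first)
end

vector_hits   = [101, 102, 103] # hypothetical node ids, best match first
fulltext_hits = [103, 101, 104] # hypothetical node ids, best match first

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
# => [101, 103, 102, 104]
```

Nodes that appear in both rankings (101 and 103) rise above nodes found by only one search, which is the property that makes RRF useful when the two scores are not directly comparable.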
+### Context Assembly Flow
+This diagram shows how working memory assembles context for LLM consumption using different strategies.
+```mermaid
+graph TD
+    A[User: create_context] -->|1. Request with strategy| B[HTM]
+    B -->|2. Assemble| C[WorkingMemory]
+    C -->|3. Strategy?| D{Strategy Type}
+    D -->|:recent| E[Sort by access order]
+    D -->|:important| F[Sort by importance]
+    D -->|:balanced| G[Hybrid score]
+    E --> H[Sorted nodes]
+    F --> H
+    G --> H
+    H -->|4. Build context| I[Token budget loop]
+    I -->|5. Check tokens| J{Tokens < max?}
+    J -->|Yes| K[Add node to context]
+    J -->|No| L[Stop, return context]
+    K --> I
+    L -->|6. Join nodes| M[Assembled context string]
+    M -->|7. Return| C
+    C -->|8. Return| B
+    B -->|9. Return| A
+    style A fill:rgba(76,175,80,0.3)
+    style C fill:rgba(33,150,243,0.3)
+    style G fill:rgba(255,193,7,0.3)
+```
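The flow above can be sketched in a few lines of Ruby. This is an illustrative sketch under stated assumptions, not the gem's code: the `Node` struct, the method name `assemble_context`, and in particular the `:balanced` scoring formula (importance divided by one plus the hours since last access) are hypothetical.

```ruby
# Hypothetical in-memory node; field names mirror columns of the nodes table.
Node = Struct.new(:key, :value, :importance, :token_count, :last_accessed)

# Sorts nodes according to the strategy, then adds them to the context
# until the next node would exceed the token budget.
def assemble_context(nodes, strategy: :balanced, max_tokens: 128_000, now: Time.now)
  sorted =
    case strategy
    when :recent    then nodes.sort_by { |n| -n.last_accessed.to_f }
    when :important then nodes.sort_by { |n| -n.importance }
    when :balanced
      # Assumed hybrid score: importance decayed by hours since last access.
      nodes.sort_by { |n| -(n.importance / (1.0 + (now - n.last_accessed) / 3600.0)) }
    else
      raise ArgumentError, "unknown strategy: #{strategy.inspect}"
    end

  used = 0
  selected = sorted.take_while do |n|
    fits = used + n.token_count <= max_tokens
    used += n.token_count if fits
    fits # stop at the first node that does not fit
  end
  selected.map(&:value).join("\n\n")
end

now = Time.at(1_700_000_000)
nodes = [
  Node.new("a", "alpha", 1.0, 10, now - 10),
  Node.new("b", "beta",  9.0, 10, now - 7200),
  Node.new("c", "gamma", 5.0, 10, now - 3600)
]

assemble_context(nodes, strategy: :recent, max_tokens: 20, now: now)
# => "alpha\n\ngamma"
```

Stopping at the first node that does not fit mirrors the "Stop, return context" branch in the diagram; a variant could instead skip oversized nodes and keep filling the budget.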
+## Memory Lifecycle
+### Node States
+A memory node transitions through several states during its lifetime in HTM:
+```mermaid
+stateDiagram-v2
+    [*] --> Created: add_node()
+    Created --> InBothMemories: Initial state
+    InBothMemories --> LongTermMemoryOnly: Evicted from WM
+    InBothMemories --> LongTermMemoryOnly: WM cleared
+    LongTermMemoryOnly --> InBothMemories: Recalled
+    InBothMemories --> Forgotten: forget(confirm: :confirmed)
+    LongTermMemoryOnly --> Forgotten: forget(confirm: :confirmed)
+    Forgotten --> [*]
+    note right of InBothMemories
+        Node exists in:
+        - Working Memory (fast access)
+        - Long-Term Memory (persistent)
+        in_working_memory = TRUE
+    end note
+    note right of LongTermMemoryOnly
+        Node exists only in:
+        - Long-Term Memory
+        in_working_memory = FALSE
+        (Evicted due to token limit)
+    end note
+    note right of Forgotten
+        Node permanently deleted
+        from both memories
+        (Explicit user action)
+    end note
+```
+### Eviction Process
+When working memory reaches its token limit, the eviction process runs to free up space:
+```mermaid
+sequenceDiagram
+    participant User
+    participant HTM
+    participant WM as WorkingMemory
+    participant LTM as LongTermMemory
+    participant DB as Database
+    User->>HTM: add_node(large_memory)
+    HTM->>WM: add(key, value, token_count)
+    WM->>WM: Check: token_count + current > max?
+    alt Space Available
+        WM->>WM: Add node directly
+        WM-->>HTM: Success
+    else No Space
+        WM->>WM: Sort by [importance, -recency]
+        WM->>WM: Evict low-importance old nodes
+        Note over WM: Free enough tokens
+        WM->>HTM: Return evicted nodes
+        HTM->>LTM: mark_evicted(keys)
+        LTM->>DB: UPDATE in_working_memory = FALSE
+        DB-->>LTM: Updated
+        WM->>WM: Add new node
+        WM-->>HTM: Success
+    end
+    HTM-->>User: node_id
+```
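The eviction sequence above can be illustrated with a small, self-contained Ruby class. It is a sketch only: `TinyWorkingMemory`, its method names, and the logical clock used for recency are hypothetical and do not describe HTM's `WorkingMemory` class.

```ruby
# Minimal working-memory sketch: a Hash-based store with a token budget.
class TinyWorkingMemory
  Entry = Struct.new(:key, :value, :token_count, :importance, :accessed_at)

  def initialize(max_tokens:)
    @max_tokens = max_tokens
    @entries = {} # key => Entry
    @clock = 0    # logical clock standing in for access timestamps
  end

  def used_tokens
    @entries.each_value.sum(&:token_count)
  end

  def keys
    @entries.keys
  end

  # Adds a node, evicting others if needed. Returns the evicted keys so the
  # caller can mark them as no longer in working memory.
  def add(key, value, token_count:, importance: 1.0)
    raise ArgumentError, "node exceeds token budget" if token_count > @max_tokens

    @entries.delete(key) # replacing an existing key frees its tokens first
    evicted = []
    overflow = used_tokens + token_count - @max_tokens
    if overflow.positive?
      # Lowest importance first; among equals, least recently accessed first.
      victims = @entries.values.sort_by { |e| [e.importance, e.accessed_at] }
      victims.each do |entry|
        break if overflow <= 0
        @entries.delete(entry.key)
        overflow -= entry.token_count
        evicted << entry.key
      end
    end
    @clock += 1
    @entries[key] = Entry.new(key, value, token_count, importance, @clock)
    evicted
  end
end

wm = TinyWorkingMemory.new(max_tokens: 100)
wm.add("a", "first",  token_count: 40, importance: 5.0)
wm.add("b", "second", token_count: 40, importance: 1.0)
evicted = wm.add("c", "third", token_count: 40, importance: 3.0)
# => ["b"]
```

In the sequence diagram, the returned keys correspond to the evicted nodes that HTM passes to `mark_evicted(keys)`, so long-term memory can set `in_working_memory = FALSE`.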
+## Database Schema
+### Entity-Relationship Diagram
+![HTM Database Schema](../assets/images/htm-database-schema.svg)
+### Table Details
+#### nodes
+The main table storing all memory nodes with vector embeddings, metadata, and timestamps.
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | BIGSERIAL | Primary key, auto-incrementing |
+| `key` | TEXT | Unique identifier for node (user-defined) |
+| `value` | TEXT | Content of the memory |
+| `type` | TEXT | Memory type (fact, context, code, preference, decision, question) |
+| `category` | TEXT | Optional category for organization |
+| `importance` | REAL | Importance score (0.0-10.0, default 1.0) |
+| `created_at` | TIMESTAMP | Creation timestamp |
+| `updated_at` | TIMESTAMP | Last update timestamp |
+| `last_accessed` | TIMESTAMP | Last access timestamp |
+| `token_count` | INTEGER | Number of tokens in value |
+| `in_working_memory` | BOOLEAN | Whether currently in working memory |
+| `robot_id` | TEXT | Foreign key to robots table |
+| `embedding` | vector(1536) | Vector embedding for semantic search |
+**Indexes:**
+- Primary key on `id`
+- Unique index on `key`
+- B-tree indexes on `created_at`, `updated_at`, `last_accessed`, `type`, `category`, `robot_id`
+- HNSW index on `embedding` for vector similarity
+- GIN index on `to_tsvector('english', value)` for full-text search
+- GIN trigram index on `value` for fuzzy matching
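To show how a client-side embedding might be passed to SQL against this table, here is a hedged Ruby sketch that only builds the query text and parameters. The helper names `vector_literal` and `nearest_nodes_query` are hypothetical, and the choice of cosine distance (pgvector's `<=>` operator) is an assumption; the HNSW index is only used if it was created with the matching operator class.

```ruby
# Formats a Ruby array as a pgvector text literal, e.g. "[0.1,2.0,-3.5]".
def vector_literal(embedding)
  "[" + embedding.map { |x| Float(x).to_s }.join(",") + "]"
end

# Returns [sql, params] for a nearest-neighbour query over nodes.embedding.
# `<=>` is pgvector's cosine distance operator (smaller means more similar).
def nearest_nodes_query(embedding, limit: 10)
  sql = <<~SQL
    SELECT id, key, value, embedding <=> $1::vector AS distance
    FROM nodes
    WHERE embedding IS NOT NULL
    ORDER BY embedding <=> $1::vector
    LIMIT $2
  SQL
  [sql, [vector_literal(embedding), Integer(limit)]]
end

sql, params = nearest_nodes_query([0.25, 0.5], limit: 5)
# With the pg gem, this pair could be run as: conn.exec_params(sql, params)
```

Passing the vector as a bound parameter rather than interpolating it into the SQL string keeps the query parameterized, in line with the Database Service responsibilities listed earlier.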
+#### robots
+Registry of all robots using the HTM system.
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | TEXT | Primary key, UUID v4 |
+| `name` | TEXT | Human-readable robot name |
+| `created_at` | TIMESTAMP | Registration timestamp |
+| `last_active` | TIMESTAMP | Last activity timestamp |
+| `metadata` | JSONB | Flexible robot configuration |
+#### relationships
+Graph edges connecting related nodes.
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | BIGSERIAL | Primary key |
+| `from_node_id` | BIGINT | Source node foreign key |
+| `to_node_id` | BIGINT | Target node foreign key |
+| `relationship_type` | TEXT | Type of relationship (e.g., "related_to", "follows") |
+| `strength` | REAL | Relationship strength (0.0-1.0) |
+| `created_at` | TIMESTAMP | Creation timestamp |
+**Indexes:**
+- B-tree indexes on `from_node_id` and `to_node_id`
+- Unique constraint on `(from_node_id, to_node_id, relationship_type)`
+#### tags
+Flexible categorization system for nodes.
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | BIGSERIAL | Primary key |
+| `node_id` | BIGINT | Foreign key to nodes |
+| `tag` | TEXT | Tag name |
+| `created_at` | TIMESTAMP | Creation timestamp |
+**Indexes:**
+- B-tree index on `node_id`
+- B-tree index on `tag`
+- Unique constraint on `(node_id, tag)`
+#### operations_log
+Audit trail of all memory operations for debugging and replay.
+| Column | Type | Description |
+|--------|------|-------------|
+| `id` | BIGSERIAL | Primary key |
+| `timestamp` | TIMESTAMP | Operation timestamp |
+| `operation` | TEXT | Operation type (add, retrieve, recall, forget, evict) |
+| `node_id` | BIGINT | Foreign key to nodes (nullable) |
+| `robot_id` | TEXT | Foreign key to robots |
+| `details` | JSONB | Flexible operation metadata |
+**Indexes:**
+- B-tree indexes on `timestamp`, `robot_id`, `operation`
+## Technology Stack
+### Core Technologies
+| Technology | Version | Purpose | Why Chosen |
+|-----------|---------|---------|------------|
+| **Ruby** | 3.2+ | Implementation language | Readable, expressive, mature ecosystem |
+| **PostgreSQL** | 16+ | Relational database | ACID guarantees, rich extensions, production-proven |
+| **TimescaleDB** | 2.13+ | Time-series extension | Hypertable partitioning, automatic compression |
+| **pgvector** | 0.5+ | Vector similarity | HNSW indexing, PostgreSQL-native, fast approximate search |
+| **pg_trgm** | - | Fuzzy text search | Built-in PostgreSQL extension for trigram matching |
+### Ruby Dependencies
+```ruby
+# Core dependencies
+gem 'pg', '~> 1.5'                    # PostgreSQL client
+gem 'pgvector', '~> 0.2'              # Vector operations
+gem 'connection_pool', '~> 2.4'      # Connection pooling
+gem 'faraday', '~> 2.7'              # HTTP client (for embedding APIs)
+# Optional dependencies
+gem 'tiktoken_ruby', '~> 0.0.6'      # Token counting (OpenAI-compatible)
+```
+### Embedding Providers
+!!! info "Client-Side Generation"
+    Embeddings are generated client-side in Ruby before database insertion. This provides reliable, cross-platform operation.
+| Provider | Models | Dimensions | Speed | Cost |
+|----------|--------|------------|-------|------|
+| **Ollama** (default) | nomic-embed-text, mxbai-embed-large, all-minilm | 384-1024 | Fast (local HTTP) | Free |
+| **OpenAI** | text-embedding-3-small, text-embedding-ada-002 | 1536 | Fast (API) | $0.0001/1K tokens |
+## Performance Characteristics
+### Latency Benchmarks
+Based on typical production workloads with 10,000 nodes in long-term memory (client-side embeddings):
+!!! info "Performance Characteristics"
+    Client-side embedding generation provides reliable, debuggable operation. Latency includes HTTP call to Ollama/OpenAI for embedding generation.
+| Operation | Median | P95 | P99 | Notes |
+|-----------|--------|-----|-----|-------|
+| `add_message()` | 50ms | 110ms | 190ms | Client-side embedding generation + insert |
+| `recall()` (vector) | 80ms | 140ms | 230ms | Client-side query embedding + vector search |
+| `recall()` (fulltext) | 30ms | 60ms | 100ms | GIN index search (no embedding needed) |
+| `recall()` (hybrid) | 110ms | 190ms | 330ms | Client-side embedding + hybrid search |
+| `retrieve()` | 5ms | 10ms | 20ms | Simple primary key lookup |
+| `create_context()` | 8ms | 15ms | 25ms | In-memory sort + join |
+| `forget()` | 10ms | 20ms | 40ms | DELETE with cascades |
+!!! tip "Performance Optimization"
+    - Use connection pooling (included by default)
+    - Add database indexes for common query patterns
+    - Consider read replicas for query-heavy workloads
+    - Monitor HNSW build time for large embedding tables
+### Throughput
+| Workload | Throughput | Resource Usage |
+|----------|-----------|----------------|
+| Add nodes | 500-1000/sec | CPU-bound (embeddings) |
+| Vector search | 2000-5000/sec | I/O-bound (database) |
+| Full-text search | 5000-10000/sec | I/O-bound (database) |
+| Context assembly | 10000+/sec | Memory-bound (working memory) |
+### Storage
+| Component | Size Estimate | Compression |
+|-----------|--------------|-------------|
+| Node (text only) | ~1KB average | None |
+| Node (with embedding) | ~7KB (~1KB text + 1536 dims × 4 bytes ≈ 6KB) | TimescaleDB compression (70-90%) |
+| Indexes | ~2x data size | Minimal |
+| Operations log | ~200 bytes/op | TimescaleDB compression |
+**Example:** 100,000 nodes with embeddings:
+- Raw data: ~700 MB
+- With indexes: ~2.1 GB
+- With compression (after 30 days): ~300 MB
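The arithmetic behind this example can be reproduced in a few lines of Ruby. The sketch uses only the figures stated in the storage table; applying the 70-90% compression ratio to the whole footprint, indexes included, is a simplifying assumption.

```ruby
nodes           = 100_000
text_bytes      = 1_000    # ~1KB average text per node
embedding_bytes = 1536 * 4 # 1536 four-byte floats = 6,144 bytes

raw_bytes   = nodes * (text_bytes + embedding_bytes) # 714,400,000 (~700 MB)
index_bytes = raw_bytes * 2                          # indexes at ~2x data size
total_bytes = raw_bytes + index_bytes                # 2,143,200,000 (~2.1 GB)

# 70-90% compression leaves 30% down to 10% of the original size.
compressed_max = total_bytes * 0.30 # roughly 643 MB
compressed_min = total_bytes * 0.10 # roughly 214 MB
```

The ~300 MB figure quoted above falls inside this range, toward the high-compression end.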
+## Scalability Considerations
+### Vertical Scaling Limits
+| Resource | Limit | Mitigation |
+|----------|-------|------------|
+| **Working Memory (RAM)** | ~2GB per robot process | Use smaller `working_memory_size`, evict more aggressively |
+| **PostgreSQL Connections** | ~100-200 (default) | Connection pooling, adjust `max_connections` |
+| **Embedding API Rate Limits** | Provider-dependent | Implement rate limiting, use local models |
+| **HNSW Build Time** | O(n log n) on large tables | Partition tables by timeframe |
+### Horizontal Scaling Strategies
+#### Multi-Process (Single Host)
+- Each robot process has independent working memory
+- All processes share single PostgreSQL instance
+- Connection pooling prevents connection exhaustion
+#### Multi-Host (Distributed)
+- **Option 1: Shared Database**
+  - All hosts connect to central PostgreSQL
+  - Read replicas for query scaling
+  - Write operations to primary only
+- **Option 2: Sharded Database**
+  - Partition by `robot_id` or timeframe
+  - Requires coordination for cross-shard queries
+  - More complex but scales writes
+#### Read Scaling
+- Add PostgreSQL read replicas
+- Route `recall()` and `retrieve()` to replicas
+- Primary handles writes only
+- TimescaleDB is compatible with PostgreSQL streaming replication
+!!! warning "Consistency Considerations"
+    Read replicas may lag primary by seconds. For strong consistency requirements, query primary database.
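One way to picture the read-scaling rules above is a small router that chooses a connection per operation. This is a hypothetical sketch: `ReadWriteRouter`, the `consistency:` option, and the round-robin policy are illustrative assumptions, not part of HTM, and the symbols stand in for real connection objects.

```ruby
# Routes read operations to replicas and everything else to the primary.
class ReadWriteRouter
  READ_OPERATIONS = %i[recall retrieve].freeze

  def initialize(primary:, replicas: [])
    @primary = primary
    @replicas = replicas
    @cursor = 0
  end

  # Returns the connection to use for an operation. Reads that need strong
  # consistency go to the primary because replicas may lag behind it.
  def connection_for(operation, consistency: :eventual)
    return @primary unless READ_OPERATIONS.include?(operation)
    return @primary if consistency == :strong || @replicas.empty?

    replica = @replicas[@cursor % @replicas.size] # simple round-robin
    @cursor += 1
    replica
  end
end

router = ReadWriteRouter.new(primary: :primary_db, replicas: %i[replica_1 replica_2])
router.connection_for(:add_message)                  # write, goes to primary
router.connection_for(:recall)                       # read, goes to a replica
router.connection_for(:recall, consistency: :strong) # read, goes to primary
```

The cursor is not synchronized, so a multi-threaded application would need a mutex or a per-thread router.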
+### Future Scaling Enhancements
+1. **Redis-backed Working Memory**: Share working memory across processes
+2. **Horizontal Partitioning**: Shard `nodes` table by `robot_id` or time ranges
+3. **Caching Layer**: Add Redis cache for hot nodes
+4. **Async Embedding Generation**: Queue embedding jobs for batch processing
+5. **Vector Database Migration**: Consider specialized vector DB (Pinecone, Weaviate) at massive scale
+## Related Documentation
+- [Architecture Index](index.md) - Architecture overview and component summary
+- [Two-Tier Memory System](two-tier-memory.md) - Working memory and long-term memory deep dive
+- [Hive Mind Architecture](hive-mind.md) - Multi-robot shared memory design
+- [API Reference](../api/htm.md) - Complete API documentation
+- [Architecture Decision Records](adrs/index.md) - Decision history