remdb 0.3.0__py3-none-any.whl → 0.3.114__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

Files changed (98)
  1. rem/__init__.py +129 -2
  2. rem/agentic/README.md +76 -0
  3. rem/agentic/__init__.py +15 -0
  4. rem/agentic/agents/__init__.py +16 -2
  5. rem/agentic/agents/sse_simulator.py +500 -0
  6. rem/agentic/context.py +28 -22
  7. rem/agentic/llm_provider_models.py +301 -0
  8. rem/agentic/otel/setup.py +92 -4
  9. rem/agentic/providers/phoenix.py +32 -43
  10. rem/agentic/providers/pydantic_ai.py +142 -22
  11. rem/agentic/schema.py +358 -21
  12. rem/agentic/tools/rem_tools.py +3 -3
  13. rem/api/README.md +238 -1
  14. rem/api/deps.py +255 -0
  15. rem/api/main.py +151 -37
  16. rem/api/mcp_router/resources.py +1 -1
  17. rem/api/mcp_router/server.py +17 -2
  18. rem/api/mcp_router/tools.py +143 -7
  19. rem/api/middleware/tracking.py +172 -0
  20. rem/api/routers/admin.py +277 -0
  21. rem/api/routers/auth.py +124 -0
  22. rem/api/routers/chat/completions.py +152 -16
  23. rem/api/routers/chat/models.py +7 -3
  24. rem/api/routers/chat/sse_events.py +526 -0
  25. rem/api/routers/chat/streaming.py +608 -45
  26. rem/api/routers/dev.py +81 -0
  27. rem/api/routers/feedback.py +148 -0
  28. rem/api/routers/messages.py +473 -0
  29. rem/api/routers/models.py +78 -0
  30. rem/api/routers/query.py +357 -0
  31. rem/api/routers/shared_sessions.py +406 -0
  32. rem/auth/middleware.py +126 -27
  33. rem/cli/commands/README.md +201 -70
  34. rem/cli/commands/ask.py +13 -10
  35. rem/cli/commands/cluster.py +1359 -0
  36. rem/cli/commands/configure.py +4 -3
  37. rem/cli/commands/db.py +350 -137
  38. rem/cli/commands/experiments.py +76 -72
  39. rem/cli/commands/process.py +22 -15
  40. rem/cli/commands/scaffold.py +47 -0
  41. rem/cli/commands/schema.py +95 -49
  42. rem/cli/main.py +29 -6
  43. rem/config.py +2 -2
  44. rem/models/core/core_model.py +7 -1
  45. rem/models/core/rem_query.py +5 -2
  46. rem/models/entities/__init__.py +21 -0
  47. rem/models/entities/domain_resource.py +38 -0
  48. rem/models/entities/feedback.py +123 -0
  49. rem/models/entities/message.py +30 -1
  50. rem/models/entities/session.py +83 -0
  51. rem/models/entities/shared_session.py +180 -0
  52. rem/models/entities/user.py +10 -3
  53. rem/registry.py +373 -0
  54. rem/schemas/agents/rem.yaml +7 -3
  55. rem/services/content/providers.py +94 -140
  56. rem/services/content/service.py +92 -20
  57. rem/services/dreaming/affinity_service.py +2 -16
  58. rem/services/dreaming/moment_service.py +2 -15
  59. rem/services/embeddings/api.py +24 -17
  60. rem/services/embeddings/worker.py +16 -16
  61. rem/services/phoenix/EXPERIMENT_DESIGN.md +3 -3
  62. rem/services/phoenix/client.py +252 -19
  63. rem/services/postgres/README.md +159 -15
  64. rem/services/postgres/__init__.py +2 -1
  65. rem/services/postgres/diff_service.py +426 -0
  66. rem/services/postgres/pydantic_to_sqlalchemy.py +427 -129
  67. rem/services/postgres/repository.py +132 -0
  68. rem/services/postgres/schema_generator.py +86 -5
  69. rem/services/postgres/service.py +6 -6
  70. rem/services/rate_limit.py +113 -0
  71. rem/services/rem/README.md +14 -0
  72. rem/services/rem/parser.py +44 -9
  73. rem/services/rem/service.py +36 -2
  74. rem/services/session/compression.py +17 -1
  75. rem/services/session/reload.py +1 -1
  76. rem/services/user_service.py +98 -0
  77. rem/settings.py +169 -17
  78. rem/sql/background_indexes.sql +21 -16
  79. rem/sql/migrations/001_install.sql +231 -54
  80. rem/sql/migrations/002_install_models.sql +457 -393
  81. rem/sql/migrations/003_optional_extensions.sql +326 -0
  82. rem/utils/constants.py +97 -0
  83. rem/utils/date_utils.py +228 -0
  84. rem/utils/embeddings.py +17 -4
  85. rem/utils/files.py +167 -0
  86. rem/utils/mime_types.py +158 -0
  87. rem/utils/model_helpers.py +156 -1
  88. rem/utils/schema_loader.py +191 -35
  89. rem/utils/sql_types.py +3 -1
  90. rem/utils/vision.py +9 -14
  91. rem/workers/README.md +14 -14
  92. rem/workers/db_maintainer.py +74 -0
  93. {remdb-0.3.0.dist-info → remdb-0.3.114.dist-info}/METADATA +303 -164
  94. {remdb-0.3.0.dist-info → remdb-0.3.114.dist-info}/RECORD +96 -70
  95. {remdb-0.3.0.dist-info → remdb-0.3.114.dist-info}/WHEEL +1 -1
  96. rem/sql/002_install_models.sql +0 -1068
  97. rem/sql/install_models.sql +0 -1038
  98. {remdb-0.3.0.dist-info → remdb-0.3.114.dist-info}/entry_points.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: remdb
- Version: 0.3.0
+ Version: 0.3.114
  Summary: Resources Entities Moments - Bio-inspired memory system for agentic AI workloads
  Project-URL: Homepage, https://github.com/Percolation-Labs/reminiscent
  Project-URL: Documentation, https://github.com/Percolation-Labs/reminiscent/blob/main/README.md
@@ -14,7 +14,7 @@ Classifier: Intended Audience :: Developers
  Classifier: License :: OSI Approved :: MIT License
  Classifier: Programming Language :: Python :: 3.12
  Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
- Requires-Python: >=3.12
+ Requires-Python: <3.13,>=3.12
  Requires-Dist: aioboto3>=13.0.0
  Requires-Dist: arize-phoenix>=5.0.0
  Requires-Dist: asyncpg>=0.30.0
@@ -23,11 +23,10 @@ Requires-Dist: click>=8.1.0
  Requires-Dist: fastapi>=0.115.0
  Requires-Dist: fastmcp>=0.5.0
  Requires-Dist: gitpython>=3.1.45
- Requires-Dist: gmft==0.3.1
  Requires-Dist: hypercorn>=0.17.0
  Requires-Dist: itsdangerous>=2.0.0
  Requires-Dist: json-schema-to-pydantic>=0.2.0
- Requires-Dist: kreuzberg[gmft]>=3.21.0
+ Requires-Dist: kreuzberg<4.0.0,>=3.21.0
  Requires-Dist: loguru>=0.7.0
  Requires-Dist: openinference-instrumentation-pydantic-ai>=0.1.0
  Requires-Dist: opentelemetry-api>=1.28.0
@@ -102,23 +101,22 @@ Cloud-native unified memory infrastructure for agentic AI systems built with Pyd
  - **Database Layer**: PostgreSQL 18 with pgvector for multi-index memory (KV + Vector + Graph)
  - **REM Query Dialect**: Custom query language with O(1) lookups, semantic search, graph traversal
  - **Ingestion & Dreaming**: Background workers for content extraction and progressive index enrichment (0% → 100% answerable)
- - **Observability & Evals**: OpenTelemetry tracing + Arize Phoenix + LLM-as-a-Judge evaluation framework
+ - **Observability & Evals**: OpenTelemetry tracing supporting LLM-as-a-Judge evaluation frameworks

  ## Features

  | Feature | Description | Benefits |
  |---------|-------------|----------|
  | **OpenAI-Compatible Chat API** | Drop-in replacement for OpenAI chat completions API with streaming support | Use with existing OpenAI clients, switch models across providers (OpenAI, Anthropic, etc.) |
- | **Built-in MCP Server** | FastMCP server with 4 tools + 3 resources for memory operations | Export memory to Claude Desktop, Cursor, or any MCP-compatible host |
+ | **Built-in MCP Server** | FastMCP server with 4 tools + 5 resources for memory operations | Export memory to Claude Desktop, Cursor, or any MCP-compatible host |
  | **REM Query Engine** | Multi-index query system (LOOKUP, FUZZY, SEARCH, SQL, TRAVERSE) with custom dialect | O(1) lookups, semantic search, graph traversal - all tenant-isolated |
  | **Dreaming Workers** | Background workers for entity extraction, moment generation, and affinity matching | Automatic knowledge graph construction from resources (0% → 100% query answerable) |
  | **PostgreSQL + pgvector** | CloudNativePG with PostgreSQL 18, pgvector extension, streaming replication | Production-ready vector search, no external vector DB needed |
  | **AWS EKS Recipe** | Complete infrastructure-as-code with Pulumi, Karpenter, ArgoCD | Deploy to production EKS in minutes with auto-scaling and GitOps |
  | **JSON Schema Agents** | Dynamic agent creation from YAML schemas via Pydantic AI factory | Define agents declaratively, version control schemas, load dynamically |
- | **Content Providers** | Audio transcription (Whisper), vision (GPT-4V, Claude), PDFs, DOCX, images | Multimodal ingestion out of the box with format detection |
- | **Configurable Embeddings** | Provider-agnostic embedding system (OpenAI, Cohere, Jina) | Switch embedding providers via env vars, no code changes |
+ | **Content Providers** | Audio transcription (Whisper), vision (OpenAI, Anthropic, Gemini), PDFs, DOCX, PPTX, XLSX, images | Multimodal ingestion out of the box with format detection |
+ | **Configurable Embeddings** | OpenAI embedding system (text-embedding-3-small) | Production-ready embeddings, additional providers planned |
  | **Multi-Tenancy** | Tenant isolation at database level with automatic scoping | SaaS-ready with complete data separation per tenant |
- | **Streaming Everything** | SSE for chat, background workers for embeddings, async throughout | Real-time responses, non-blocking operations, scalable |
  | **Zero Vendor Lock-in** | Raw HTTP clients (no OpenAI SDK), swappable providers, open standards | Not tied to any vendor, easy to migrate, full control |
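The drop-in claim in the first row maps directly onto the stock `openai` client. A minimal sketch, assuming the local endpoint used later in this README (`http://localhost:8000/api/v1`) and a provider-prefixed model string as in the quickstart; the `api_key` value is a placeholder since auth setup is not shown here:

```python
# Sketch: point the standard OpenAI client at REM's chat completions API.
# Assumes REM is serving on localhost:8000 (see Quick Start below); the
# model string and api_key handling are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="placeholder")

# One-shot completion
response = client.chat.completions.create(
    model="openai:gpt-4.1-nano",
    messages=[{"role": "user", "content": "What documents exist in the system?"}],
)
print(response.choices[0].message.content)

# The same call with SSE streaming
stream = client.chat.completions.create(
    model="openai:gpt-4.1-nano",
    messages=[{"role": "user", "content": "Show me meetings about API design"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```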

  ## Quick Start
@@ -136,42 +134,50 @@ Choose your path:
  **Best for**: First-time users who want to explore REM with curated example datasets.

  ```bash
+ # Install system dependencies (tesseract for OCR)
+ brew install tesseract # macOS (Linux/Windows: see tesseract-ocr.github.io)
+
  # Install remdb
- pip install remdb[all]
+ pip install "remdb[all]"

  # Clone example datasets
  git clone https://github.com/Percolation-Labs/remstack-lab.git
  cd remstack-lab

- # Configure REM (interactive wizard)
- rem configure --install
+ # Optional: Set default LLM provider via environment variable
+ # export LLM__DEFAULT_MODEL="openai:gpt-4.1-nano" # Fast and cheap
+ # export LLM__DEFAULT_MODEL="anthropic:claude-sonnet-4-5-20250929" # High quality (default)

- # Start PostgreSQL
- docker run -d \
- --name rem-postgres \
- -e POSTGRES_USER=rem \
- -e POSTGRES_PASSWORD=rem \
- -e POSTGRES_DB=rem \
- -p 5050:5432 \
- pgvector/pgvector:pg18
+ # Start PostgreSQL with docker-compose
+ curl -O https://gist.githubusercontent.com/percolating-sirsh/d117b673bc0edfdef1a5068ccd3cf3e5/raw/docker-compose.prebuilt.yml
+ docker compose -f docker-compose.prebuilt.yml up -d postgres

- # Load quickstart dataset
- rem db load --file datasets/quickstart/sample_data.yaml --user-id demo-user
+ # Configure REM (creates ~/.rem/config.yaml and installs database schema)
+ # Add --claude-desktop to register with Claude Desktop app
+ rem configure --install --claude-desktop
+
+ # Load quickstart dataset (uses default user)
+ rem db load datasets/quickstart/sample_data.yaml

  # Ask questions
- rem ask --user-id demo-user "What documents exist in the system?"
- rem ask --user-id demo-user "Show me meetings about API design"
+ rem ask "What documents exist in the system?"
+ rem ask "Show me meetings about API design"
+
+ # Ingest files (PDF, DOCX, images, etc.) - note: requires remstack-lab
+ rem process ingest datasets/formats/files/bitcoin_whitepaper.pdf --category research --tags bitcoin,whitepaper
+
+ # Query ingested content
+ rem ask "What is the Bitcoin whitepaper about?"

- # Try other datasets
- rem db load --file datasets/domains/recruitment/scenarios/candidate_pipeline/data.yaml --user-id my-company
- rem ask --user-id my-company "Show me candidates with Python experience"
+ # Try other datasets (use --user-id for multi-tenant scenarios)
+ rem db load datasets/domains/recruitment/scenarios/candidate_pipeline/data.yaml --user-id acme-corp
+ rem ask --user-id acme-corp "Show me candidates with Python experience"
  ```

  **What you get:**
  - Quickstart: 3 users, 3 resources, 3 moments, 4 messages
  - Domain datasets: recruitment, legal, enterprise, misc
  - Format examples: engrams, documents, conversations, files
- - Jupyter notebooks and experiments

  **Learn more**: [remstack-lab repository](https://github.com/Percolation-Labs/remstack-lab)

@@ -252,28 +258,28 @@ Configuration saved to `~/.rem/config.yaml` (can edit with `rem configure --edit
  # Clone datasets repository
  git clone https://github.com/Percolation-Labs/remstack-lab.git

- # Load quickstart dataset
- rem db load --file remstack-lab/datasets/quickstart/sample_data.yaml --user-id demo-user
+ # Load quickstart dataset (uses default user)
+ rem db load --file remstack-lab/datasets/quickstart/sample_data.yaml

  # Test with sample queries
- rem ask --user-id demo-user "What documents exist in the system?"
- rem ask --user-id demo-user "Show me meetings about API design"
- rem ask --user-id demo-user "Who is Sarah Chen?"
+ rem ask "What documents exist in the system?"
+ rem ask "Show me meetings about API design"
+ rem ask "Who is Sarah Chen?"

- # Try domain-specific datasets
- rem db load --file remstack-lab/datasets/domains/recruitment/scenarios/candidate_pipeline/data.yaml --user-id my-company
- rem ask --user-id my-company "Show me candidates with Python experience"
+ # Try domain-specific datasets (use --user-id for multi-tenant scenarios)
+ rem db load --file remstack-lab/datasets/domains/recruitment/scenarios/candidate_pipeline/data.yaml --user-id acme-corp
+ rem ask --user-id acme-corp "Show me candidates with Python experience"
  ```

  **Option B: Bring your own data**

  ```bash
- # Ingest your own files
+ # Ingest your own files (uses default user)
  echo "REM is a bio-inspired memory system for agentic AI workloads." > test-doc.txt
- rem process ingest test-doc.txt --user-id test-user --category documentation --tags rem,ai
+ rem process ingest test-doc.txt --category documentation --tags rem,ai

  # Query your ingested data
- rem ask --user-id test-user "What do you know about REM from my knowledge base?"
+ rem ask "What do you know about REM from my knowledge base?"
  ```

  ### Step 4: Test the API
@@ -310,13 +316,13 @@ curl -X POST http://localhost:8000/api/v1/chat/completions \
  ```bash
  cd remstack-lab

- # Load any dataset
- rem db load --file datasets/quickstart/sample_data.yaml --user-id demo-user
+ # Load any dataset (uses default user)
+ rem db load --file datasets/quickstart/sample_data.yaml

  # Explore formats
- rem db load --file datasets/formats/engrams/scenarios/team_meeting/team_standup_meeting.yaml --user-id demo-user
+ rem db load --file datasets/formats/engrams/scenarios/team_meeting/team_standup_meeting.yaml

- # Try domain-specific examples
+ # Try domain-specific examples (use --user-id for multi-tenant scenarios)
  rem db load --file datasets/domains/recruitment/scenarios/candidate_pipeline/data.yaml --user-id acme-corp
  ```

@@ -411,30 +417,24 @@ json_schema_extra:
  ```bash
  # Ingest the schema (stores in database schemas table)
  rem process ingest my-research-assistant.yaml \
- --user-id my-user \
  --category agents \
  --tags custom,research

  # Verify schema is in database (should show schema details)
- rem ask "LOOKUP 'my-research-assistant' FROM schemas" --user-id my-user
+ rem ask "LOOKUP 'my-research-assistant' FROM schemas"
  ```

  **Step 3: Use Your Custom Agent**

  ```bash
  # Run a query with your custom agent
- rem ask research-assistant "Find documents about machine learning architecture" \
- --user-id my-user
+ rem ask research-assistant "Find documents about machine learning architecture"

  # With streaming
- rem ask research-assistant "Summarize recent API design documents" \
- --user-id my-user \
- --stream
+ rem ask research-assistant "Summarize recent API design documents" --stream

  # With session continuity
- rem ask research-assistant "What did we discuss about ML?" \
- --user-id my-user \
- --session-id abc-123
+ rem ask research-assistant "What did we discuss about ML?" --session-id abc-123
  ```

  ### Agent Schema Structure
@@ -505,10 +505,10 @@ Custom agents can also be used as **ontology extractors** to extract structured
  **Schema not found error:**
  ```bash
  # Check if schema was ingested correctly
- rem ask "SEARCH 'my-agent' FROM schemas" --user-id my-user
+ rem ask "SEARCH 'my-agent' FROM schemas"

- # List all schemas for your user
- rem ask "SELECT name, category, created_at FROM schemas ORDER BY created_at DESC LIMIT 10" --user-id my-user
+ # List all schemas
+ rem ask "SELECT name, category, created_at FROM schemas ORDER BY created_at DESC LIMIT 10"
  ```

  **Agent not loading tools:**
@@ -533,15 +533,15 @@ REM provides a custom query language designed for **LLM-driven iterated retrieva
  Unlike traditional single-shot SQL queries, the REM dialect is optimized for **multi-turn exploration** where LLMs participate in query planning:

  - **Iterated Queries**: Queries return partial results that LLMs use to refine subsequent queries
- - **Composable WITH Syntax**: Chain operations together (e.g., `TRAVERSE FROM ... WITH LOOKUP "..."`)
+ - **Composable WITH Syntax**: Chain operations together (e.g., `TRAVERSE edge_type WITH LOOKUP "..."`)
  - **Mixed Indexes**: Combines exact lookups (O(1)), semantic search (vector), and graph traversal
  - **Query Planner Participation**: Results include metadata for LLMs to decide next steps

  **Example Multi-Turn Flow**:
  ```
  Turn 1: LOOKUP "sarah-chen" → Returns entity + available edge types
- Turn 2: TRAVERSE FROM "sarah-chen" TYPE "authored_by" DEPTH 1 → Returns connected documents
- Turn 3: SEARCH "architecture decisions" WITH TRAVERSE FROM "sarah-chen" Combines semantic + graph
+ Turn 2: TRAVERSE authored_by WITH LOOKUP "sarah-chen" DEPTH 1 → Returns connected documents
+ Turn 3: SEARCH "architecture decisions" Semantic search, then explore graph from results
  ```

  This enables LLMs to **progressively build context** rather than requiring perfect queries upfront.
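A minimal sketch of that loop driven from Python, assuming only the `rem ask` CLI shown throughout this README. The dialect strings are the three turns above; in a real agent, each turn's output would inform the LLM's choice of the next query:

```python
# Sketch: an iterated-retrieval loop over the REM dialect via the CLI.
# Assumes `rem` is installed and configured; the queries are taken
# verbatim from the example flow above.
import subprocess

def rem_ask(query: str) -> str:
    """Run one REM dialect query through `rem ask` and return its stdout."""
    result = subprocess.run(
        ["rem", "ask", query], capture_output=True, text=True, check=True
    )
    return result.stdout

entity = rem_ask('LOOKUP "sarah-chen"')                                  # Turn 1
docs = rem_ask('TRAVERSE authored_by WITH LOOKUP "sarah-chen" DEPTH 1')  # Turn 2
related = rem_ask('SEARCH "architecture decisions"')                     # Turn 3
```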
@@ -594,8 +594,8 @@ SEARCH "contract disputes" FROM resources WHERE tags @> ARRAY['legal'] LIMIT 5
  Follow `graph_edges` relationships across the knowledge graph.

  ```sql
- TRAVERSE FROM "sarah-chen" TYPE "authored_by" DEPTH 2
- TRAVERSE FROM "api-design-v2" TYPE "references,depends_on" DEPTH 3
+ TRAVERSE authored_by WITH LOOKUP "sarah-chen" DEPTH 2
+ TRAVERSE references,depends_on WITH LOOKUP "api-design-v2" DEPTH 3
  ```

  **Features**:
@@ -688,7 +688,7 @@ SEARCH "API migration planning" FROM resources LIMIT 5
  LOOKUP "tidb-migration-spec" FROM resources

  # Query 3: Find related people
- TRAVERSE FROM "tidb-migration-spec" TYPE "authored_by,reviewed_by" DEPTH 1
+ TRAVERSE authored_by,reviewed_by WITH LOOKUP "tidb-migration-spec" DEPTH 1

  # Query 4: Recent activity
  SELECT * FROM moments WHERE
@@ -705,7 +705,7 @@ All queries automatically scoped by `user_id` for complete data isolation:
  SEARCH "contracts" FROM resources LIMIT 10

  -- No cross-user data leakage
- TRAVERSE FROM "project-x" TYPE "references" DEPTH 3
+ TRAVERSE references WITH LOOKUP "project-x" DEPTH 3
  ```

  ## API Endpoints
@@ -857,81 +857,131 @@ rem serve --log-level debug

  ### Database Management

- #### `rem db migrate` - Run Migrations
+ REM uses a **code-as-source-of-truth** approach for database schema management. Pydantic models define the schema, and the database is kept in sync via diff-based migrations.

- Apply database migrations (install.sql and install_models.sql).
+ #### Schema Management Philosophy
+
+ **Two migration files only:**
+ - `001_install.sql` - Core infrastructure (extensions, functions, KV store)
+ - `002_install_models.sql` - Entity tables (auto-generated from Pydantic models)
+
+ **No incremental migrations** (003, 004, etc.) - the models file is always regenerated to match code.
+
+ #### `rem db schema generate` - Regenerate Schema SQL
+
+ Generate `002_install_models.sql` from registered Pydantic models.

  ```bash
- # Apply all migrations
- rem db migrate
+ # Regenerate from model registry
+ rem db schema generate

- # Core infrastructure only (extensions, functions)
- rem db migrate --install
+ # Output: src/rem/sql/migrations/002_install_models.sql
+ ```

- # Entity tables only (Resource, Message, etc.)
- rem db migrate --models
+ This generates:
+ - CREATE TABLE statements for each registered entity
+ - Embeddings tables (`embeddings_<table>`)
+ - KV_STORE triggers for cache maintenance
+ - Foreground indexes (GIN for JSONB, B-tree for lookups)

- # Background indexes (HNSW for vectors)
- rem db migrate --background-indexes
+ #### `rem db diff` - Detect Schema Drift
+
+ Compare Pydantic models against the live database using Alembic autogenerate.
+
+ ```bash
+ # Show differences
+ rem db diff

- # Custom connection string
- rem db migrate --connection "postgresql://user:pass@host:5432/db"
+ # CI mode: exit 1 if drift detected
+ rem db diff --check

- # Custom SQL directory
- rem db migrate --sql-dir /path/to/sql
+ # Generate migration SQL for changes
+ rem db diff --generate
  ```

- #### `rem db status` - Migration Status
+ **Output shows:**
+ - `+ ADD COLUMN` - Column in model but not in DB
+ - `- DROP COLUMN` - Column in DB but not in model
+ - `~ ALTER COLUMN` - Column type or constraints differ
+ - `+ CREATE TABLE` / `- DROP TABLE` - Table additions/removals
+
+ #### `rem db apply` - Apply SQL Directly

- Show applied migrations and execution times.
+ Apply a SQL file directly to the database (bypasses migration tracking).

  ```bash
- rem db status
+ # Apply with audit logging (default)
+ rem db apply src/rem/sql/migrations/002_install_models.sql
+
+ # Preview without executing
+ rem db apply --dry-run src/rem/sql/migrations/002_install_models.sql
+
+ # Apply without audit logging
+ rem db apply --no-log src/rem/sql/migrations/002_install_models.sql
  ```

- #### `rem db rebuild-cache` - Rebuild KV Cache
+ #### `rem db migrate` - Initial Setup

- Rebuild KV_STORE cache from entity tables (after database restart or bulk imports).
+ Apply standard migrations (001 + 002). Use for initial setup only.

  ```bash
- rem db rebuild-cache
+ # Apply infrastructure + entity tables
+ rem db migrate
+
+ # Include background indexes (HNSW for vectors)
+ rem db migrate --background-indexes
  ```

- ### Schema Management
+ #### Database Workflows

- #### `rem db schema generate` - Generate SQL Schema
+ **Initial Setup (Local):**
+ ```bash
+ rem db schema generate # Generate from models
+ rem db migrate # Apply 001 + 002
+ rem db diff # Verify no drift
+ ```

- Generate database schema from Pydantic models.
+ **Adding/Modifying Models:**
+ ```bash
+ # 1. Edit models in src/rem/models/entities/
+ # 2. Register new models in src/rem/registry.py
+ rem db schema generate # Regenerate schema
+ rem db diff # See what changed
+ rem db apply src/rem/sql/migrations/002_install_models.sql
+ ```

+ **CI/CD Pipeline:**
  ```bash
- # Generate install_models.sql from entity models
- rem db schema generate \
- --models src/rem/models/entities \
- --output rem/src/rem/sql/install_models.sql
+ rem db diff --check # Fail build if drift detected
+ ```

- # Generate migration file
- rem db schema generate \
- --models src/rem/models/entities \
- --output rem/src/rem/sql/migrations/003_add_fields.sql
+ **Remote Database (Production/Staging):**
+ ```bash
+ # Port-forward to cluster database
+ kubectl port-forward -n <namespace> svc/rem-postgres-rw 5433:5432 &
+
+ # Override connection for diff check
+ POSTGRES__CONNECTION_STRING="postgresql://rem:rem@localhost:5433/rem" rem db diff
+
+ # Apply changes if needed
+ POSTGRES__CONNECTION_STRING="postgresql://rem:rem@localhost:5433/rem" \
+ rem db apply src/rem/sql/migrations/002_install_models.sql
  ```
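Step 2 of the "Adding/Modifying Models" workflow above ("register new models in `src/rem/registry.py`") looks roughly like this in code. A sketch assuming the `CoreModel` base and `rem.register_models()` shown in the "Using REM as a Library" section below; the `Note` entity and its fields are hypothetical:

```python
# Sketch: define an entity and register it so `rem db schema generate`
# emits a CREATE TABLE (plus an embeddings table) for it in
# 002_install_models.sql. The Note model is illustrative only.
import rem
from rem.models.core import CoreModel

class Note(CoreModel):
    """Hypothetical entity; fields become columns in the generated schema."""
    title: str
    body: str
    tags: list[str] = []

rem.register_models(Note)
```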

- #### `rem db schema indexes` - Generate Background Indexes
+ #### `rem db rebuild-cache` - Rebuild KV Cache

- Generate SQL for background index creation (HNSW for vectors).
+ Rebuild KV_STORE cache from entity tables (after database restart or bulk imports).

  ```bash
- # Generate background_indexes.sql
- rem db schema indexes \
- --models src/rem/models/entities \
- --output rem/src/rem/sql/background_indexes.sql
+ rem db rebuild-cache
  ```

  #### `rem db schema validate` - Validate Models

- Validate Pydantic models for schema generation.
+ Validate registered Pydantic models for schema generation.

  ```bash
- rem db schema validate --models src/rem/models/entities
+ rem db schema validate
  ```

  ### File Processing
@@ -941,22 +991,14 @@ rem db schema validate --models src/rem/models/entities
  Process files with optional custom extractor (ontology extraction).

  ```bash
- # Process all completed files for tenant
- rem process files \
- --tenant-id acme-corp \
- --status completed \
- --limit 10
+ # Process all completed files
+ rem process files --status completed --limit 10

  # Process with custom extractor
- rem process files \
- --tenant-id acme-corp \
- --extractor cv-parser-v1 \
- --limit 50
+ rem process files --extractor cv-parser-v1 --limit 50

- # Process files from the last 7 days
- rem process files \
- --tenant-id acme-corp \
- --lookback-hours 168
+ # Process files for specific user
+ rem process files --user-id user-123 --status completed
  ```

  #### `rem process ingest` - Ingest File into REM
@@ -964,14 +1006,13 @@ rem process files \
  Ingest a file into REM with full pipeline (storage + parsing + embedding + database).

  ```bash
- # Ingest local file
+ # Ingest local file with metadata
  rem process ingest /path/to/document.pdf \
- --user-id user-123 \
  --category legal \
  --tags contract,2024

  # Ingest with minimal options
- rem process ingest ./meeting-notes.md --user-id user-123
+ rem process ingest ./meeting-notes.md
  ```

  #### `rem process uri` - Parse File (Read-Only)
@@ -996,28 +1037,17 @@ rem process uri s3://bucket/key.docx --output text
  Run full dreaming workflow: extractors → moments → affinity → user model.

  ```bash
- # Full workflow for user
- rem dreaming full \
- --user-id user-123 \
- --tenant-id acme-corp
+ # Full workflow (uses default user from settings)
+ rem dreaming full

  # Skip ontology extractors
- rem dreaming full \
- --user-id user-123 \
- --tenant-id acme-corp \
- --skip-extractors
+ rem dreaming full --skip-extractors

  # Process last 24 hours only
- rem dreaming full \
- --user-id user-123 \
- --tenant-id acme-corp \
- --lookback-hours 24
+ rem dreaming full --lookback-hours 24

- # Limit resources processed
- rem dreaming full \
- --user-id user-123 \
- --tenant-id acme-corp \
- --limit 100
+ # Limit resources processed for specific user
+ rem dreaming full --user-id user-123 --limit 100
  ```

  #### `rem dreaming custom` - Custom Extractor
@@ -1025,16 +1055,11 @@ rem dreaming full \
  Run specific ontology extractor on user's data.

  ```bash
- # Run CV parser on user's files
- rem dreaming custom \
- --user-id user-123 \
- --tenant-id acme-corp \
- --extractor cv-parser-v1
+ # Run CV parser on files
+ rem dreaming custom --extractor cv-parser-v1

- # Process last week's files
+ # Process last week's files with limit
  rem dreaming custom \
- --user-id user-123 \
- --tenant-id acme-corp \
  --extractor contract-analyzer-v1 \
  --lookback-hours 168 \
  --limit 50
@@ -1045,17 +1070,11 @@ rem dreaming custom \
  Extract temporal narratives from resources.

  ```bash
- # Generate moments for user
- rem dreaming moments \
- --user-id user-123 \
- --tenant-id acme-corp \
- --limit 50
+ # Generate moments
+ rem dreaming moments --limit 50

  # Process last 7 days
- rem dreaming moments \
- --user-id user-123 \
- --tenant-id acme-corp \
- --lookback-hours 168
+ rem dreaming moments --lookback-hours 168
  ```

  #### `rem dreaming affinity` - Build Relationships
@@ -1063,17 +1082,11 @@ rem dreaming moments \
  Build semantic relationships between resources using embeddings.

  ```bash
- # Build affinity graph for user
- rem dreaming affinity \
- --user-id user-123 \
- --tenant-id acme-corp \
- --limit 100
+ # Build affinity graph
+ rem dreaming affinity --limit 100

  # Process recent resources only
- rem dreaming affinity \
- --user-id user-123 \
- --tenant-id acme-corp \
- --lookback-hours 24
+ rem dreaming affinity --lookback-hours 24
  ```

  #### `rem dreaming user-model` - Update User Model
@@ -1082,9 +1095,7 @@ Update user model from recent activity (preferences, interests, patterns).

  ```bash
  # Update user model
- rem dreaming user-model \
- --user-id user-123 \
- --tenant-id acme-corp
+ rem dreaming user-model
  ```

  ### Evaluation & Experiments
@@ -1335,6 +1346,30 @@ S3__BUCKET_NAME=rem-storage
  S3__REGION=us-east-1
  ```

+ ### Building Docker Images
+
+ We tag Docker images with three labels for traceability:
+ 1. `latest` - Always points to most recent build
+ 2. `<git-sha>` - Short commit hash for exact version tracing
+ 3. `<version>` - Semantic version from `pyproject.toml`
+
+ ```bash
+ # Build and push multi-platform image to Docker Hub
+ VERSION=$(grep '^version' pyproject.toml | cut -d'"' -f2) && \
+ docker buildx build --platform linux/amd64,linux/arm64 \
+ -t percolationlabs/rem:latest \
+ -t percolationlabs/rem:$(git rev-parse --short HEAD) \
+ -t percolationlabs/rem:$VERSION \
+ --push \
+ -f Dockerfile .
+
+ # Load locally for testing (single platform, no push)
+ docker buildx build --platform linux/arm64 \
+ -t percolationlabs/rem:latest \
+ --load \
+ -f Dockerfile .
+ ```
+
  ### Production Deployment (Optional)

  For production deployment to AWS EKS with Kubernetes, see the main repository README:
@@ -1450,6 +1485,110 @@ TraverseQuery ::= TRAVERSE [<edge_types:list>] WITH <initial_query:Query> [DEPTH

  **Stage 4** (100% answerable): Mature graph with rich historical data. All query types fully functional with high-quality results.

+ ## Troubleshooting
+
+ ### Apple Silicon Mac: "Failed to build kreuzberg" Error
+
+ **Problem**: Installation fails with `ERROR: Failed building wheel for kreuzberg` on Apple Silicon Macs.
+
+ **Root Cause**: REM uses `kreuzberg>=4.0.0rc1` for document parsing with native ONNX/Rust table extraction. Kreuzberg 4.0.0rc1 provides pre-built wheels for ARM64 macOS (`macosx_14_0_arm64.whl`) but NOT for x86_64 (Intel) macOS. If you're using an x86_64 Python binary (running under Rosetta 2), pip cannot find a compatible wheel and attempts to build from source, which fails.
+
+ **Solution**: Use ARM64 (native) Python instead of x86_64 Python.
+
+ **Step 1: Verify your Python architecture**
+
+ ```bash
+ python3 -c "import platform; print(f'Machine: {platform.machine()}')"
+ ```
+
+ - **Correct**: `Machine: arm64` (native ARM Python)
+ - **Wrong**: `Machine: x86_64` (Intel Python under Rosetta)
+
+ **Step 2: Install ARM Python via Homebrew** (if not already installed)
+
+ ```bash
+ # Install ARM Python
+ brew install python@3.12
+
+ # Verify it's ARM
+ /opt/homebrew/bin/python3.12 -c "import platform; print(platform.machine())"
+ # Should output: arm64
+ ```
+
+ **Step 3: Create venv with ARM Python**
+
+ ```bash
+ # Use full path to ARM Python
+ /opt/homebrew/bin/python3.12 -m venv .venv
+
+ # Activate and install
+ source .venv/bin/activate
+ pip install "remdb[all]"
+ ```
+
+ **Why This Happens**: Some users have both Intel Homebrew (`/usr/local`) and ARM Homebrew (`/opt/homebrew`) installed. If your system `python3` points to the Intel version at `/usr/local/bin/python3`, you'll hit this issue. The fix is to explicitly use the ARM Python from `/opt/homebrew/bin/python3.12`.
+
+ **Verification**: After successful installation, you should see:
+ ```
+ Using cached kreuzberg-4.0.0rc1-cp310-abi3-macosx_14_0_arm64.whl (19.8 MB)
+ Successfully installed ... kreuzberg-4.0.0rc1 ... remdb-0.3.10
+ ```
+
+ ## Using REM as a Library
+
+ REM wraps FastAPI - extend it exactly as you would any FastAPI app.
+
+ ```python
+ import rem
+ from rem import create_app
+ from rem.models.core import CoreModel
+
+ # 1. Register models (for schema generation)
+ rem.register_models(MyModel, AnotherModel)
+
+ # 2. Register schema paths (for custom agents/evaluators)
+ rem.register_schema_path("./schemas")
+
+ # 3. Create app
+ app = create_app()
+
+ # 4. Extend like normal FastAPI
+ app.include_router(my_router)
+
+ @app.mcp_server.tool()
+ async def my_tool(query: str) -> dict:
+ """Custom MCP tool."""
+ return {"result": query}
+ ```
+
+ ### Project Structure
+
+ ```
+ my-rem-app/
+ ├── my_app/
+ │ ├── main.py # Entry point (create_app + extensions)
+ │ ├── models.py # Custom models (inherit CoreModel)
+ │ └── routers/ # Custom FastAPI routers
+ ├── schemas/
+ │ ├── agents/ # Custom agent YAML schemas
+ │ └── evaluators/ # Custom evaluator schemas
+ ├── sql/migrations/ # Custom SQL migrations
+ └── pyproject.toml
+ ```
+
+ Generate this structure with: `rem scaffold my-app`
+
+ ### Extension Points
+
+ | Extension | How |
+ |-----------|-----|
+ | **Routes** | `app.include_router(router)` or `@app.get()` |
+ | **MCP Tools** | `@app.mcp_server.tool()` decorator or `app.mcp_server.add_tool(fn)` |
+ | **MCP Resources** | `@app.mcp_server.resource("uri://...")` or `app.mcp_server.add_resource(fn)` |
+ | **MCP Prompts** | `@app.mcp_server.prompt()` or `app.mcp_server.add_prompt(fn)` |
+ | **Models** | `rem.register_models(Model)` then `rem db schema generate` |
+ | **Agent Schemas** | `rem.register_schema_path("./schemas")` or `SCHEMA__PATHS` env var |
+
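The extension points compose on a single app. A short sketch combining a route, an MCP tool, and an MCP resource from the table above; the route path, tool logic, and resource URI are illustrative, not part of REM:

```python
# Sketch: one app exercising three extension points from the table.
# The /healthz path, word_count tool, and memory://status URI are
# hypothetical examples.
from rem import create_app

app = create_app()

@app.get("/healthz")  # plain FastAPI route
async def healthz() -> dict:
    return {"status": "ok"}

@app.mcp_server.tool()  # MCP tool
async def word_count(text: str) -> dict:
    """Hypothetical tool: count words in a string."""
    return {"words": len(text.split())}

@app.mcp_server.resource("memory://status")  # MCP resource
async def status() -> str:
    """Hypothetical resource returning a fixed status string."""
    return "ok"
```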
  ## License

  MIT