remdb 0.2.6__py3-none-any.whl → 0.3.118__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of remdb might be problematic.
- rem/__init__.py +129 -2
- rem/agentic/README.md +76 -0
- rem/agentic/__init__.py +15 -0
- rem/agentic/agents/__init__.py +16 -2
- rem/agentic/agents/sse_simulator.py +500 -0
- rem/agentic/context.py +28 -22
- rem/agentic/llm_provider_models.py +301 -0
- rem/agentic/mcp/tool_wrapper.py +29 -3
- rem/agentic/otel/setup.py +92 -4
- rem/agentic/providers/phoenix.py +32 -43
- rem/agentic/providers/pydantic_ai.py +168 -24
- rem/agentic/schema.py +358 -21
- rem/agentic/tools/rem_tools.py +3 -3
- rem/api/README.md +238 -1
- rem/api/deps.py +255 -0
- rem/api/main.py +154 -37
- rem/api/mcp_router/resources.py +1 -1
- rem/api/mcp_router/server.py +26 -5
- rem/api/mcp_router/tools.py +454 -7
- rem/api/middleware/tracking.py +172 -0
- rem/api/routers/admin.py +494 -0
- rem/api/routers/auth.py +124 -0
- rem/api/routers/chat/completions.py +152 -16
- rem/api/routers/chat/models.py +7 -3
- rem/api/routers/chat/sse_events.py +526 -0
- rem/api/routers/chat/streaming.py +608 -45
- rem/api/routers/dev.py +81 -0
- rem/api/routers/feedback.py +148 -0
- rem/api/routers/messages.py +473 -0
- rem/api/routers/models.py +78 -0
- rem/api/routers/query.py +360 -0
- rem/api/routers/shared_sessions.py +406 -0
- rem/auth/middleware.py +126 -27
- rem/cli/commands/README.md +237 -64
- rem/cli/commands/ask.py +15 -11
- rem/cli/commands/cluster.py +1300 -0
- rem/cli/commands/configure.py +170 -97
- rem/cli/commands/db.py +396 -139
- rem/cli/commands/experiments.py +278 -96
- rem/cli/commands/process.py +22 -15
- rem/cli/commands/scaffold.py +47 -0
- rem/cli/commands/schema.py +97 -50
- rem/cli/main.py +37 -6
- rem/config.py +2 -2
- rem/models/core/core_model.py +7 -1
- rem/models/core/rem_query.py +5 -2
- rem/models/entities/__init__.py +21 -0
- rem/models/entities/domain_resource.py +38 -0
- rem/models/entities/feedback.py +123 -0
- rem/models/entities/message.py +30 -1
- rem/models/entities/session.py +83 -0
- rem/models/entities/shared_session.py +180 -0
- rem/models/entities/user.py +10 -3
- rem/registry.py +373 -0
- rem/schemas/agents/rem.yaml +7 -3
- rem/services/content/providers.py +94 -140
- rem/services/content/service.py +115 -24
- rem/services/dreaming/affinity_service.py +2 -16
- rem/services/dreaming/moment_service.py +2 -15
- rem/services/embeddings/api.py +24 -17
- rem/services/embeddings/worker.py +16 -16
- rem/services/phoenix/EXPERIMENT_DESIGN.md +3 -3
- rem/services/phoenix/client.py +252 -19
- rem/services/postgres/README.md +159 -15
- rem/services/postgres/__init__.py +2 -1
- rem/services/postgres/diff_service.py +531 -0
- rem/services/postgres/pydantic_to_sqlalchemy.py +427 -129
- rem/services/postgres/repository.py +132 -0
- rem/services/postgres/schema_generator.py +291 -9
- rem/services/postgres/service.py +6 -6
- rem/services/rate_limit.py +113 -0
- rem/services/rem/README.md +14 -0
- rem/services/rem/parser.py +44 -9
- rem/services/rem/service.py +36 -2
- rem/services/session/compression.py +17 -1
- rem/services/session/reload.py +1 -1
- rem/services/user_service.py +98 -0
- rem/settings.py +169 -22
- rem/sql/background_indexes.sql +21 -16
- rem/sql/migrations/001_install.sql +387 -54
- rem/sql/migrations/002_install_models.sql +2320 -393
- rem/sql/migrations/003_optional_extensions.sql +326 -0
- rem/sql/migrations/004_cache_system.sql +548 -0
- rem/utils/__init__.py +18 -0
- rem/utils/constants.py +97 -0
- rem/utils/date_utils.py +228 -0
- rem/utils/embeddings.py +17 -4
- rem/utils/files.py +167 -0
- rem/utils/mime_types.py +158 -0
- rem/utils/model_helpers.py +156 -1
- rem/utils/schema_loader.py +284 -21
- rem/utils/sql_paths.py +146 -0
- rem/utils/sql_types.py +3 -1
- rem/utils/vision.py +9 -14
- rem/workers/README.md +14 -14
- rem/workers/__init__.py +2 -1
- rem/workers/db_maintainer.py +74 -0
- rem/workers/unlogged_maintainer.py +463 -0
- {remdb-0.2.6.dist-info → remdb-0.3.118.dist-info}/METADATA +598 -171
- {remdb-0.2.6.dist-info → remdb-0.3.118.dist-info}/RECORD +102 -73
- {remdb-0.2.6.dist-info → remdb-0.3.118.dist-info}/WHEEL +1 -1
- rem/sql/002_install_models.sql +0 -1068
- rem/sql/install_models.sql +0 -1038
- {remdb-0.2.6.dist-info → remdb-0.3.118.dist-info}/entry_points.txt +0 -0
@@ -1,11 +1,11 @@
 Metadata-Version: 2.4
 Name: remdb
-Version: 0.
+Version: 0.3.118
 Summary: Resources Entities Moments - Bio-inspired memory system for agentic AI workloads
-Project-URL: Homepage, https://github.com/
-Project-URL: Documentation, https://github.com/
-Project-URL: Repository, https://github.com/
-Project-URL: Issues, https://github.com/
+Project-URL: Homepage, https://github.com/Percolation-Labs/reminiscent
+Project-URL: Documentation, https://github.com/Percolation-Labs/reminiscent/blob/main/README.md
+Project-URL: Repository, https://github.com/Percolation-Labs/reminiscent
+Project-URL: Issues, https://github.com/Percolation-Labs/reminiscent/issues
 Author-email: mr-saoirse <amartey@gmail.com>
 License: MIT
 Keywords: agents,ai,mcp,memory,postgresql,vector-search
@@ -14,7 +14,7 @@ Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
-Requires-Python:
+Requires-Python: <3.13,>=3.12
 Requires-Dist: aioboto3>=13.0.0
 Requires-Dist: arize-phoenix>=5.0.0
 Requires-Dist: asyncpg>=0.30.0
@@ -23,11 +23,10 @@ Requires-Dist: click>=8.1.0
 Requires-Dist: fastapi>=0.115.0
 Requires-Dist: fastmcp>=0.5.0
 Requires-Dist: gitpython>=3.1.45
-Requires-Dist: gmft
 Requires-Dist: hypercorn>=0.17.0
 Requires-Dist: itsdangerous>=2.0.0
 Requires-Dist: json-schema-to-pydantic>=0.2.0
-Requires-Dist: kreuzberg
+Requires-Dist: kreuzberg<4.0.0,>=3.21.0
 Requires-Dist: loguru>=0.7.0
 Requires-Dist: openinference-instrumentation-pydantic-ai>=0.1.0
 Requires-Dist: opentelemetry-api>=1.28.0
@@ -91,32 +90,9 @@ Cloud-native unified memory infrastructure for agentic AI systems built with Pyd

 ## Architecture Overview

-
-
-
-AGENTS --> TOOLS[MCP Tools<br/>5 Tools]
-
-TOOLS --> QUERY[REM Query<br/>Dialect]
-QUERY --> DB[(PostgreSQL<br/>+pgvector)]
-
-FILES[File Processor] --> DREAM[Dreaming<br/>Workers]
-DREAM --> DB
-
-AGENTS --> OTEL[OpenTelemetry]
-OTEL --> PHOENIX[Arize<br/>Phoenix]
-
-EVAL[Evaluation<br/>Framework] --> PHOENIX
-
-classDef api fill:#4A90E2,stroke:#2E5C8A,color:#fff
-classDef agent fill:#7B68EE,stroke:#483D8B,color:#fff
-classDef db fill:#50C878,stroke:#2E7D4E,color:#fff
-classDef obs fill:#9B59B6,stroke:#6C3483,color:#fff
-
-class API,TOOLS api
-class AGENTS agent
-class DB,QUERY db
-class OTEL,PHOENIX,EVAL obs
-```
+<p align="center">
+<img src="https://mermaid.ink/img/Z3JhcGggVEQKICAgIEFQSVtGYXN0QVBJPGJyLz5DaGF0ICsgTUNQXSAtLT4gQUdFTlRTW0pTT04gU2NoZW1hPGJyLz5BZ2VudHNdCiAgICBBR0VOVFMgLS0-IFRPT0xTW01DUCBUb29sczxici8-NSBUb29sc10KCiAgICBUT09MUyAtLT4gUVVFUllbUkVNIFF1ZXJ5PGJyLz5EaWFsZWN0XQogICAgUVVFUlkgLS0-IERCWyhQb3N0Z3JlU1FMPGJyLz4rcGd2ZWN0b3IpXQoKICAgIEZJTEVTW0ZpbGUgUHJvY2Vzc29yXSAtLT4gRFJFQU1bRHJlYW1pbmc8YnIvPldvcmtlcnNdCiAgICBEUkVBTSAtLT4gREIKCiAgICBBR0VOVFMgLS0-IE9URUxbT3BlblRlbGVtZXRyeV0KICAgIE9URUwgLS0-IFBIT0VOSVhbQXJpemU8YnIvPlBob2VuaXhdCgogICAgRVZBTFtFdmFsdWF0aW9uPGJyLz5GcmFtZXdvcmtdIC0tPiBQSE9FTklYCgogICAgY2xhc3NEZWYgYXBpIGZpbGw6IzRBOTBFMixzdHJva2U6IzJFNUM4QSxjb2xvcjojZmZmCiAgICBjbGFzc0RlZiBhZ2VudCBmaWxsOiM3QjY4RUUsc3Ryb2tlOiM0ODNEOEIsY29sb3I6I2ZmZgogICAgY2xhc3NEZWYgZGIgZmlsbDojNTBDODc4LHN0cm9rZTojMkU3RDRFLGNvbG9yOiNmZmYKICAgIGNsYXNzRGVmIG9icyBmaWxsOiM5QjU5QjYsc3Ryb2tlOiM2QzM0ODMsY29sb3I6I2ZmZgoKICAgIGNsYXNzIEFQSSxUT09MUyBhcGkKICAgIGNsYXNzIEFHRU5UUyBhZ2VudAogICAgY2xhc3MgREIsUVVFUlkgZGIKICAgIGNsYXNzIE9URUwsUEhPRU5JWCxFVkFMIG9icwo=" alt="REM Architecture" width="700">
+</p>

 **Key Components:**

@@ -125,37 +101,87 @@ graph TD
 - **Database Layer**: PostgreSQL 18 with pgvector for multi-index memory (KV + Vector + Graph)
 - **REM Query Dialect**: Custom query language with O(1) lookups, semantic search, graph traversal
 - **Ingestion & Dreaming**: Background workers for content extraction and progressive index enrichment (0% → 100% answerable)
-- **Observability & Evals**: OpenTelemetry tracing
+- **Observability & Evals**: OpenTelemetry tracing supporting LLM-as-a-Judge evaluation frameworks

 ## Features

 | Feature | Description | Benefits |
 |---------|-------------|----------|
 | **OpenAI-Compatible Chat API** | Drop-in replacement for OpenAI chat completions API with streaming support | Use with existing OpenAI clients, switch models across providers (OpenAI, Anthropic, etc.) |
-| **Built-in MCP Server** | FastMCP server with
+| **Built-in MCP Server** | FastMCP server with 4 tools + 5 resources for memory operations | Export memory to Claude Desktop, Cursor, or any MCP-compatible host |
 | **REM Query Engine** | Multi-index query system (LOOKUP, FUZZY, SEARCH, SQL, TRAVERSE) with custom dialect | O(1) lookups, semantic search, graph traversal - all tenant-isolated |
 | **Dreaming Workers** | Background workers for entity extraction, moment generation, and affinity matching | Automatic knowledge graph construction from resources (0% → 100% query answerable) |
 | **PostgreSQL + pgvector** | CloudNativePG with PostgreSQL 18, pgvector extension, streaming replication | Production-ready vector search, no external vector DB needed |
 | **AWS EKS Recipe** | Complete infrastructure-as-code with Pulumi, Karpenter, ArgoCD | Deploy to production EKS in minutes with auto-scaling and GitOps |
 | **JSON Schema Agents** | Dynamic agent creation from YAML schemas via Pydantic AI factory | Define agents declaratively, version control schemas, load dynamically |
-| **Content Providers** | Audio transcription (Whisper), vision (
-| **Configurable Embeddings** |
+| **Content Providers** | Audio transcription (Whisper), vision (OpenAI, Anthropic, Gemini), PDFs, DOCX, PPTX, XLSX, images | Multimodal ingestion out of the box with format detection |
+| **Configurable Embeddings** | OpenAI embedding system (text-embedding-3-small) | Production-ready embeddings, additional providers planned |
 | **Multi-Tenancy** | Tenant isolation at database level with automatic scoping | SaaS-ready with complete data separation per tenant |
-| **Streaming Everything** | SSE for chat, background workers for embeddings, async throughout | Real-time responses, non-blocking operations, scalable |
 | **Zero Vendor Lock-in** | Raw HTTP clients (no OpenAI SDK), swappable providers, open standards | Not tied to any vendor, easy to migrate, full control |

 ## Quick Start

 Choose your path:

-- **Option 1: Package Users** (Recommended for
-- **Option 2:
+- **Option 1: Package Users with Example Data** (Recommended for first-time users) - PyPI + example datasets
+- **Option 2: Package Users** (Recommended for non-developers) - PyPI package + dockerized database
+- **Option 3: Developers** - Clone repo, local development with uv
+
+---
+
+## Option 1: Package Users with Example Data (Recommended)
+
+**Best for**: First-time users who want to explore REM with curated example datasets.
+
+```bash
+# Install system dependencies (tesseract for OCR)
+brew install tesseract # macOS (Linux/Windows: see tesseract-ocr.github.io)
+
+# Install remdb
+pip install "remdb[all]"
+
+# Clone example datasets
+git clone https://github.com/Percolation-Labs/remstack-lab.git
+cd remstack-lab
+
+# Optional: Set default LLM provider via environment variable
+# export LLM__DEFAULT_MODEL="openai:gpt-4.1-nano" # Fast and cheap
+# export LLM__DEFAULT_MODEL="anthropic:claude-sonnet-4-5-20250929" # High quality (default)
+
+# Start PostgreSQL with docker-compose
+curl -O https://gist.githubusercontent.com/percolating-sirsh/d117b673bc0edfdef1a5068ccd3cf3e5/raw/docker-compose.prebuilt.yml
+docker compose -f docker-compose.prebuilt.yml up -d postgres
+
+# Configure REM (creates ~/.rem/config.yaml and installs database schema)
+# Add --claude-desktop to register with Claude Desktop app
+rem configure --install --claude-desktop
+
+# Load quickstart dataset
+rem db load datasets/quickstart/sample_data.yaml
+
+# Ask questions
+rem ask "What documents exist in the system?"
+rem ask "Show me meetings about API design"
+
+# Ingest files (PDF, DOCX, images, etc.) - note: requires remstack-lab
+rem process ingest datasets/formats/files/bitcoin_whitepaper.pdf --category research --tags bitcoin,whitepaper
+
+# Query ingested content
+rem ask "What is the Bitcoin whitepaper about?"
+```
+
+**What you get:**
+- Quickstart: 3 users, 3 resources, 3 moments, 4 messages
+- Domain datasets: recruitment, legal, enterprise, misc
+- Format examples: engrams, documents, conversations, files
+
+**Learn more**: [remstack-lab repository](https://github.com/Percolation-Labs/remstack-lab)

 ---

-## Option
+## Option 2: Package Users (No Example Data)

-**Best for**: Using REM as a service (API + CLI) without modifying code.
+**Best for**: Using REM as a service (API + CLI) without modifying code, bringing your own data.

 ### Step 1: Start Database and API with Docker Compose

@@ -220,43 +246,264 @@ Configuration saved to `~/.rem/config.yaml` (can edit with `rem configure --edit
 - See [REM Query Dialect](#rem-query-dialect) for query examples
 - See [API Endpoints](#api-endpoints) for OpenAI-compatible API usage

-### Step 3:
+### Step 3: Load Sample Data (Optional but Recommended)
+
+**Option A: Clone example datasets** (Recommended - works with all README examples)
+
+```bash
+# Clone datasets repository
+git clone https://github.com/Percolation-Labs/remstack-lab.git
+
+# Load quickstart dataset
+rem db load --file remstack-lab/datasets/quickstart/sample_data.yaml
+
+# Test with sample queries
+rem ask "What documents exist in the system?"
+rem ask "Show me meetings about API design"
+rem ask "Who is Sarah Chen?"
+```
+
+**Option B: Bring your own data**

 ```bash
-# Ingest
+# Ingest your own files
 echo "REM is a bio-inspired memory system for agentic AI workloads." > test-doc.txt
-rem process ingest test-doc.txt --
+rem process ingest test-doc.txt --category documentation --tags rem,ai

 # Query your ingested data
-rem ask "What do you know about REM from my knowledge base?"
+rem ask "What do you know about REM from my knowledge base?"
+```

-
-rem ask "What is REM?" --user-id test-user
+### Step 4: Test the API

-
+```bash
+# Test the OpenAI-compatible chat completions API
 curl -X POST http://localhost:8000/api/v1/chat/completions \
 -H "Content-Type: application/json" \
--H "X-User-Id: test-user" \
 -d '{
 "model": "anthropic:claude-sonnet-4-5-20250929",
-"messages": [{"role": "user", "content": "What
+"messages": [{"role": "user", "content": "What documents did Sarah Chen author?"}],
 "stream": false
 }'
 ```

-**
+**Available Commands:**
+- `rem ask` - Natural language queries to REM
 - `rem process ingest <file>` - Full ingestion pipeline (storage + parsing + embedding + database)
 - `rem process uri <file>` - READ-ONLY parsing (no database storage, useful for testing parsers)
+- `rem db load --file <yaml>` - Load structured datasets directly

+## Example Datasets
+
+🎯 **Recommended**: Clone [remstack-lab](https://github.com/Percolation-Labs/remstack-lab) for curated datasets organized by domain and format.
+
+**What's included:**
+- **Quickstart**: Minimal dataset (3 users, 3 resources, 3 moments) - perfect for first-time users
+- **Domains**: Recruitment (CV parsing), Legal (contracts), Enterprise (team collaboration)
+- **Formats**: Engrams (voice memos), Documents (markdown/PDF), Conversations (chat logs)
+- **Evaluation**: Golden datasets for Phoenix-based agent testing
+
+**Working from remstack-lab:**
+```bash
+cd remstack-lab
+
+# Load any dataset
+rem db load --file datasets/quickstart/sample_data.yaml
+
+# Explore formats
+rem db load --file datasets/formats/engrams/scenarios/team_meeting/team_standup_meeting.yaml
+```

 ## See Also

 - [REM Query Dialect](#rem-query-dialect) - LOOKUP, SEARCH, TRAVERSE, SQL query types
 - [API Endpoints](#api-endpoints) - OpenAI-compatible chat completions, MCP server
 - [CLI Reference](#cli-reference) - Complete command-line interface documentation
+- [Bring Your Own Agent](#bring-your-own-agent) - Create custom agents with your own prompts and tools
 - [Production Deployment](#production-deployment) - AWS EKS with Kubernetes
+- [Example Datasets](https://github.com/Percolation-Labs/remstack-lab) - Curated datasets by domain and format
+
+---
+
+## Bring Your Own Agent
+
+REM allows you to create **custom agents** with your own system prompts, tools, and output schemas. Custom agents are stored in the database and dynamically loaded when referenced, enabling **no-code agent creation** without modifying the codebase.
+
+### How It Works

-**
+1. **Define Agent Schema** - Create a YAML file with your agent's prompt, tools, and output structure
+2. **Ingest Schema** - Use `rem process ingest` to store the schema in the database
+3. **Use Your Agent** - Reference your agent by name with `rem ask <agent-name> "query"`
+
+When you run `rem ask my-agent "query"`, REM:
+1. Checks if `my-agent` exists in the filesystem (`schemas/agents/`)
+2. If not found, performs a **LOOKUP** query on the `schemas` table in the database
+3. Loads the schema dynamically and creates a Pydantic AI agent
+4. Runs your query with the custom agent
+
+### Expected Behavior
+
+**Schema Ingestion Flow** (`rem process ingest my-agent.yaml`):
+- Parse YAML file to extract JSON Schema content
+- Extract `json_schema_extra.kind` field → maps to `category` column
+- Extract `json_schema_extra.provider_configs` → stores provider configurations
+- Extract `json_schema_extra.embedding_fields` → stores semantic search fields
+- Create `Schema` entity in `schemas` table with `user_id` scoping
+- Schema is now queryable via `LOOKUP "my-agent" FROM schemas`
+
+**Agent Loading Flow** (`rem ask my-agent "query"`):
+1. `load_agent_schema("my-agent")` checks filesystem cache → miss
+2. Falls back to database: `LOOKUP "my-agent" FROM schemas WHERE user_id = '<user-id>'`
+3. Returns `Schema.spec` (JSON Schema dict) from database
+4. `create_agent()` factory creates Pydantic AI agent from schema
+5. Agent runs with tools specified in `json_schema_extra.tools`
+6. Returns structured output defined in `properties` field
+
+### Quick Example
+
+**Step 1: Create Agent Schema** (`my-research-assistant.yaml`)
+
+```yaml
+type: object
+description: |
+  You are a research assistant that helps users find and analyze documents.
+
+  Use the search_rem tool to find relevant documents, then analyze and summarize them.
+  Be concise and cite specific documents in your responses.
+
+properties:
+  summary:
+    type: string
+    description: A concise summary of findings
+  sources:
+    type: array
+    items:
+      type: string
+    description: List of document labels referenced
+
+required:
+  - summary
+  - sources
+
+json_schema_extra:
+  kind: agent
+  name: research-assistant
+  version: 1.0.0
+  tools:
+    - search_rem
+    - ask_rem_agent
+  resources: []
+```
+
+**For more examples**, see:
+- Simple agent (no tools): `src/rem/schemas/agents/examples/simple.yaml`
+- Agent with REM tools: `src/rem/schemas/agents/core/rem-query-agent.yaml`
+- Ontology extractor: `src/rem/schemas/agents/examples/cv-parser.yaml`
+
+**Step 2: Ingest Schema into Database**
+
+```bash
+# Ingest the schema (stores in database schemas table)
+rem process ingest my-research-assistant.yaml \
+--category agents \
+--tags custom,research
+
+# Verify schema is in database (should show schema details)
+rem ask "LOOKUP 'my-research-assistant' FROM schemas"
+```
+
+**Step 3: Use Your Custom Agent**
+
+```bash
+# Run a query with your custom agent
+rem ask research-assistant "Find documents about machine learning architecture"
+
+# With streaming
+rem ask research-assistant "Summarize recent API design documents" --stream
+
+# With session continuity
+rem ask research-assistant "What did we discuss about ML?" --session-id abc-123
+```
+
+### Agent Schema Structure
+
+Every agent schema must include:
+
+**Required Fields:**
+- `type: object` - JSON Schema type (always "object")
+- `description` - System prompt with instructions for the agent
+- `properties` - Output schema defining structured response fields
+
+**Optional Metadata** (`json_schema_extra`):
+- `kind` - Agent category ("agent", "evaluator", etc.) → maps to `Schema.category`
+- `name` - Agent identifier (used for LOOKUP)
+- `version` - Semantic version (e.g., "1.0.0")
+- `tools` - List of MCP tools to load (e.g., `["search_rem", "lookup_rem"]`)
+- `resources` - List of MCP resources to expose (e.g., `["user_profile"]`)
+- `provider_configs` - Multi-provider testing configurations (for ontology extractors)
+- `embedding_fields` - Fields to embed for semantic search (for ontology extractors)
+
+### Available MCP Tools
+
+REM provides **4 built-in MCP tools** your agents can use:
+
+| Tool | Purpose | Parameters |
+|------|---------|------------|
+| `search_rem` | Execute REM queries (LOOKUP, FUZZY, SEARCH, SQL, TRAVERSE) | `query_type`, `entity_key`, `query_text`, `table`, `sql_query`, `initial_query`, `edge_types`, `depth` |
+| `ask_rem_agent` | Natural language to REM query via agent-driven reasoning | `query`, `agent_schema`, `agent_version` |
+| `ingest_into_rem` | Full file ingestion pipeline (read → store → parse → chunk → embed) | `file_uri`, `category`, `tags`, `is_local_server` |
+| `read_resource` | Access MCP resources (schemas, status) for Claude Desktop | `uri` |
+
+**Tool Reference**: Tools are defined in `src/rem/api/mcp_router/tools.py`
+
+**Note**: `search_rem` is a unified tool that handles all REM query types via the `query_type` parameter:
+- `query_type="lookup"` - O(1) entity lookup by label
+- `query_type="fuzzy"` - Fuzzy text matching with similarity threshold
+- `query_type="search"` - Semantic vector search (table-specific)
+- `query_type="sql"` - Direct SQL queries (WHERE clause)
+- `query_type="traverse"` - Graph traversal with depth control
+
+### Multi-User Isolation
+
+For multi-tenant deployments, custom agents are **scoped by `user_id`**, ensuring complete data isolation. Use `--user-id` flag when you need tenant separation:
+
+```bash
+# Create agent for specific tenant
+rem process ingest my-agent.yaml --user-id tenant-a --category agents
+
+# Query with tenant context
+rem ask my-agent "test" --user-id tenant-a
+```
+
+### Advanced: Ontology Extractors
+
+Custom agents can also be used as **ontology extractors** to extract structured knowledge from files. See [CLAUDE.md](../CLAUDE.md#ontology-extraction-pattern) for details on:
+- Multi-provider testing (`provider_configs`)
+- Semantic search configuration (`embedding_fields`)
+- File matching rules (`OntologyConfig`)
+- Dreaming workflow integration
+
+### Troubleshooting
+
+**Schema not found error:**
+```bash
+# Check if schema was ingested correctly
+rem ask "SEARCH 'my-agent' FROM schemas"
+
+# List all schemas
+rem ask "SELECT name, category, created_at FROM schemas ORDER BY created_at DESC LIMIT 10"
+```
+
+**Agent not loading tools:**
+- Verify `json_schema_extra.tools` lists correct tool names
+- Valid tool names: `search_rem`, `ask_rem_agent`, `ingest_into_rem`, `read_resource`
+- Check MCP tool names in `src/rem/api/mcp_router/tools.py`
+- Tools are case-sensitive: use `search_rem`, not `Search_REM`
+
+**Agent not returning structured output:**
+- Ensure `properties` field defines all expected output fields
+- Use `required` field to mark mandatory fields
+- Check agent response with `--stream` disabled to see full JSON output

 ---

@@ -269,15 +516,15 @@ REM provides a custom query language designed for **LLM-driven iterated retrieva
 Unlike traditional single-shot SQL queries, the REM dialect is optimized for **multi-turn exploration** where LLMs participate in query planning:

 - **Iterated Queries**: Queries return partial results that LLMs use to refine subsequent queries
-- **Composable WITH Syntax**: Chain operations together (e.g., `TRAVERSE
+- **Composable WITH Syntax**: Chain operations together (e.g., `TRAVERSE edge_type WITH LOOKUP "..."`)
 - **Mixed Indexes**: Combines exact lookups (O(1)), semantic search (vector), and graph traversal
 - **Query Planner Participation**: Results include metadata for LLMs to decide next steps

 **Example Multi-Turn Flow**:
 ```
 Turn 1: LOOKUP "sarah-chen" → Returns entity + available edge types
-Turn 2: TRAVERSE
-Turn 3: SEARCH "architecture decisions"
+Turn 2: TRAVERSE authored_by WITH LOOKUP "sarah-chen" DEPTH 1 → Returns connected documents
+Turn 3: SEARCH "architecture decisions" → Semantic search, then explore graph from results
 ```

 This enables LLMs to **progressively build context** rather than requiring perfect queries upfront.
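The three-turn flow can also be expressed as arguments to the `search_rem` MCP tool. The parameter names below come from that tool's row in the MCP tools table; everything else is assumed:

- Which parameters each `query_type` requires is a guess, encoded in the hypothetical `REQUIRED_PARAMS` map.
- The list form of `edge_types` is assumed.
- The choice of `table="resources"` for turn 3 is assumed.

Check `src/rem/api/mcp_router/tools.py` for the real signature.

```python
from typing import Any

# Hypothetical minimum parameters per query_type. The names come from the
# search_rem row of the MCP tools table; which ones are required is an assumption.
REQUIRED_PARAMS: dict[str, set[str]] = {
    "lookup": {"entity_key"},
    "fuzzy": {"query_text"},
    "search": {"query_text", "table"},
    "sql": {"sql_query"},
    "traverse": {"initial_query", "edge_types"},
}


def search_rem_args(query_type: str, **params: Any) -> dict[str, Any]:
    """Build and sanity-check an argument dict for one search_rem call."""
    if query_type not in REQUIRED_PARAMS:
        raise ValueError(f"unknown query_type {query_type!r}")
    missing = REQUIRED_PARAMS[query_type] - set(params)
    if missing:
        raise ValueError(f"{query_type} is missing {sorted(missing)}")
    return {"query_type": query_type, **params}


# The three turns of the example flow, expressed as tool-call arguments.
TURNS = [
    search_rem_args("lookup", entity_key="sarah-chen"),
    search_rem_args(
        "traverse",
        initial_query='LOOKUP "sarah-chen"',
        edge_types=["authored_by"],
        depth=1,
    ),
    search_rem_args("search", query_text="architecture decisions", table="resources"),
]

if __name__ == "__main__":
    for turn, args in enumerate(TURNS, start=1):
        print(turn, args)
```

In an agent loop, each turn's arguments would be chosen after reading the previous turn's results. Here they are fixed to mirror the example.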
@@ -330,8 +577,8 @@ SEARCH "contract disputes" FROM resources WHERE tags @> ARRAY['legal'] LIMIT 5
 Follow `graph_edges` relationships across the knowledge graph.

 ```sql
-TRAVERSE
-TRAVERSE
+TRAVERSE authored_by WITH LOOKUP "sarah-chen" DEPTH 2
+TRAVERSE references,depends_on WITH LOOKUP "api-design-v2" DEPTH 3
 ```

 **Features**:
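The new TRAVERSE examples follow one shape: a comma-separated list of edge types, a `WITH LOOKUP "<key>"` starting point, and a `DEPTH`. The sketch below shows a hypothetical `traverse_query` helper that formats that shape. It rejects keys containing double quotes rather than guessing at the dialect's escaping rules, which the README does not document.

```python
from collections.abc import Sequence


def traverse_query(edge_types: Sequence[str], start_key: str, depth: int) -> str:
    """Hypothetical helper: format a TRAVERSE ... WITH LOOKUP ... DEPTH n query string."""
    if not edge_types:
        raise ValueError("at least one edge type is required")
    if depth < 1:
        raise ValueError("depth must be >= 1")
    if '"' in start_key:
        # Quoting/escaping rules of the dialect are not documented here, so refuse.
        raise ValueError("start_key must not contain double quotes")
    return f'TRAVERSE {",".join(edge_types)} WITH LOOKUP "{start_key}" DEPTH {depth}'


if __name__ == "__main__":
    print(traverse_query(["authored_by"], "sarah-chen", 2))
    print(traverse_query(["references", "depends_on"], "api-design-v2", 3))
```

Both calls reproduce the two query strings in the `sql` block above.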
@@ -424,7 +671,7 @@ SEARCH "API migration planning" FROM resources LIMIT 5
 LOOKUP "tidb-migration-spec" FROM resources

 # Query 3: Find related people
-TRAVERSE
+TRAVERSE authored_by,reviewed_by WITH LOOKUP "tidb-migration-spec" DEPTH 1

 # Query 4: Recent activity
 SELECT * FROM moments WHERE
@@ -441,7 +688,7 @@ All queries automatically scoped by `user_id` for complete data isolation:
 SEARCH "contracts" FROM resources LIMIT 10

 -- No cross-user data leakage
-TRAVERSE
+TRAVERSE references WITH LOOKUP "project-x" DEPTH 3
 ```

 ## API Endpoints
@@ -453,8 +700,8 @@ POST /api/v1/chat/completions
 ```
 
 **Headers**:
-- `X-Tenant-Id`: Tenant identifier (
-- `X-User-Id`: User identifier
+- `X-Tenant-Id`: Tenant identifier (optional, for multi-tenant deployments)
+- `X-User-Id`: User identifier (optional, uses default if not provided)
 - `X-Session-Id`: Session/conversation identifier
 - `X-Agent-Schema`: Agent schema URI to use
 
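Here is a minimal standard-library sketch of a request to `POST /api/v1/chat/completions` carrying the headers listed above. Only the path and the header names come from this diff. The following are assumptions for illustration:

- the base URL `http://localhost:8000`;
- the OpenAI-style `model`/`messages` body and the placeholder model name;
- the `build_chat_request` helper itself.

The two optional headers are omitted entirely when no value is given.

```python
import json
import urllib.request
from typing import Optional

# Assumed local server address; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_chat_request(
    message: str,
    session_id: str,
    agent_schema: str,
    tenant_id: Optional[str] = None,
    user_id: Optional[str] = None,
    model: str = "default",  # hypothetical model name
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/chat/completions."""
    headers = {
        "Content-Type": "application/json",
        "X-Session-Id": session_id,
        "X-Agent-Schema": agent_schema,
    }
    # Optional headers: leave them out rather than sending empty values.
    if tenant_id is not None:
        headers["X-Tenant-Id"] = tenant_id
    if user_id is not None:
        headers["X-User-Id"] = user_id

    # Assumed OpenAI-compatible request body.
    body = {"model": model, "messages": [{"role": "user", "content": message}]}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Sending it against a running server:
# with urllib.request.urlopen(build_chat_request("Hi", "sess-1", "contract-analyzer")) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the header logic easy to check without a live server.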
@@ -593,81 +840,144 @@ rem serve --log-level debug
 
 ### Database Management
 
-
+REM uses a **code-as-source-of-truth** approach for database schema management. Pydantic models define the schema, and the database is kept in sync via diff-based migrations.
+
+#### Schema Management Philosophy
+
+**Two migration files only:**
+- `001_install.sql` - Core infrastructure (extensions, functions, KV store)
+- `002_install_models.sql` - Entity tables (auto-generated from Pydantic models)
+
+**No incremental migrations** (003, 004, etc.) - the models file is always regenerated to match code.
 
-
+#### `rem db schema generate` - Regenerate Schema SQL
+
+Generate `002_install_models.sql` from registered Pydantic models.
 
 ```bash
-#
-rem db
+# Regenerate from model registry
+rem db schema generate
+
+# Output: src/rem/sql/migrations/002_install_models.sql
+```
 
-
-
+This generates:
+- CREATE TABLE statements for each registered entity
+- Embeddings tables (`embeddings_<table>`)
+- KV_STORE triggers for cache maintenance
+- Foreground indexes (GIN for JSONB, B-tree for lookups)
 
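As a rough illustration of the first two items `rem db schema generate` produces, the sketch below turns a model class into a `CREATE TABLE` statement and derives the matching `embeddings_<table>` name. It makes several assumptions:

- a stdlib `dataclass` stands in for a Pydantic model;
- the Python-to-Postgres type map, the `id UUID` key, and the naive pluralised table name are all assumptions;
- mapping `list[str]` to `TEXT[]` is a guess that fits the `tags @> ARRAY['legal']` query shown earlier.

KV_STORE triggers and indexes are not modelled here.

```python
import dataclasses
import types
import typing
from datetime import datetime
from typing import Any, Optional

# Hypothetical Python -> Postgres scalar type map.
PG_TYPES: dict[Any, str] = {
    str: "TEXT",
    int: "BIGINT",
    float: "DOUBLE PRECISION",
    bool: "BOOLEAN",
    datetime: "TIMESTAMPTZ",
    dict: "JSONB",
}
# typing.Union covers Optional[X]; types.UnionType covers `X | None` (3.10+).
UNION_ORIGINS = (typing.Union, getattr(types, "UnionType", typing.Union))


def pg_column_type(tp: Any) -> tuple[str, bool]:
    """Return (postgres_type, nullable) for one field annotation."""
    origin = typing.get_origin(tp)
    args = typing.get_args(tp)
    if origin in UNION_ORIGINS and type(None) in args:
        inner = next(a for a in args if a is not type(None))
        return pg_column_type(inner)[0], True
    if (origin or tp) is list:
        (elem,) = args or (str,)  # bare `list` defaults to TEXT[]
        return PG_TYPES.get(elem, "TEXT") + "[]", False
    return PG_TYPES.get(origin or tp, "TEXT"), False


def table_name(model: type) -> str:
    return model.__name__.lower() + "s"  # naive pluralisation (assumption)


def create_table_sql(model: type) -> str:
    hints = typing.get_type_hints(model)  # resolves string annotations too
    columns = ["    id UUID PRIMARY KEY"]  # assumed surrogate key
    for field in dataclasses.fields(model):
        pg_type, nullable = pg_column_type(hints[field.name])
        columns.append(f"    {field.name} {pg_type}" + ("" if nullable else " NOT NULL"))
    return f"CREATE TABLE IF NOT EXISTS {table_name(model)} (\n" + ",\n".join(columns) + "\n);"


def embeddings_table_name(model: type) -> str:
    return f"embeddings_{table_name(model)}"


@dataclasses.dataclass
class Resource:  # stand-in for a registered Pydantic entity
    name: str
    content: str
    tags: list[str]
    metadata: Optional[dict] = None


if __name__ == "__main__":
    print(create_table_sql(Resource))
    print(embeddings_table_name(Resource))  # embeddings_resources
```

Driving the DDL from type annotations is what lets the models stay the single source of truth. Changing a field and regenerating changes the emitted SQL.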
-
-rem db migrate --models
+#### `rem db diff` - Detect Schema Drift
 
-
-
+Compare Pydantic models against the live database using Alembic autogenerate.
+
+```bash
+# Show additive changes only (default, safe for production)
+rem db diff
+
+# Show all changes including drops
+rem db diff --strategy full
+
+# Show additive + safe type widenings
+rem db diff --strategy safe
 
-#
-rem db
+# CI mode: exit 1 if drift detected
+rem db diff --check
 
-#
-rem db
+# Generate migration SQL for changes
+rem db diff --generate
 ```
 
-
+**Migration Strategies:**
+| Strategy | Description |
+|----------|-------------|
+| `additive` | Only ADD columns/tables/indexes (safe, no data loss) - **default** |
+| `full` | All changes including DROPs (use with caution) |
+| `safe` | Additive + safe column type widenings (e.g., VARCHAR(50) → VARCHAR(256)) |
 
-
+**Output shows:**
+- `+ ADD COLUMN` - Column in model but not in DB
+- `- DROP COLUMN` - Column in DB but not in model (only with `--strategy full`)
+- `~ ALTER COLUMN` - Column type or constraints differ
+- `+ CREATE TABLE` / `- DROP TABLE` - Table additions/removals
+
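The three strategies in the table can be read as filters over the detected changes. This hypothetical sketch keeps only the change records a given strategy would allow. Its simplifications are:

- the `Change` record and its operation names (loosely mirroring the `+`/`-`/`~` output markers) are made up;
- the only "safe widening" recognised is a growing VARCHAR length.

REM's actual Alembic-based comparison is not shown in this diff and is certainly richer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    # Hypothetical op names: add_table, add_column, add_index,
    # drop_table, drop_column, alter_column.
    op: str
    target: str
    old_type: Optional[str] = None
    new_type: Optional[str] = None


ADDITIVE_OPS = {"add_table", "add_column", "add_index"}
VARCHAR = re.compile(r"VARCHAR\((\d+)\)", re.IGNORECASE)


def is_safe_widening(old_type: Optional[str], new_type: Optional[str]) -> bool:
    """True only for VARCHAR(n) -> VARCHAR(m) with m > n (simplified rule)."""
    if not old_type or not new_type:
        return False
    old = VARCHAR.fullmatch(old_type.strip())
    new = VARCHAR.fullmatch(new_type.strip())
    return bool(old and new and int(new.group(1)) > int(old.group(1)))


def allowed_changes(changes: list[Change], strategy: str = "additive") -> list[Change]:
    """Keep the changes a strategy permits: additive (default), safe, or full."""
    if strategy == "full":
        return list(changes)  # includes DROPs: review before applying
    if strategy not in ("additive", "safe"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    kept = []
    for change in changes:
        if change.op in ADDITIVE_OPS:
            kept.append(change)
        elif (
            strategy == "safe"
            and change.op == "alter_column"
            and is_safe_widening(change.old_type, change.new_type)
        ):
            kept.append(change)
    return kept
```

Treating `additive` as the default fits the "no data loss" goal. Anything that drops or narrows data only passes through when `full` is requested explicitly.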
+#### `rem db apply` - Apply SQL Directly
+
+Apply a SQL file directly to the database (bypasses migration tracking).
 
 ```bash
-
+# Apply with audit logging (default)
+rem db apply src/rem/sql/migrations/002_install_models.sql
+
+# Preview without executing
+rem db apply --dry-run src/rem/sql/migrations/002_install_models.sql
+
+# Apply without audit logging
+rem db apply --no-log src/rem/sql/migrations/002_install_models.sql
 ```
 
-#### `rem db
+#### `rem db migrate` - Initial Setup
 
-
+Apply standard migrations (001 + 002). Use for initial setup only.
 
 ```bash
-
+# Apply infrastructure + entity tables
+rem db migrate
+
+# Include background indexes (HNSW for vectors)
+rem db migrate --background-indexes
 ```
 
-
+#### Database Workflows
+
+**Initial Setup (Local):**
+```bash
+rem db schema generate # Generate from models
+rem db migrate # Apply 001 + 002
+rem db diff # Verify no drift
+```
 
-
+**Adding/Modifying Models:**
+```bash
+# 1. Edit models in src/rem/models/entities/
+# 2. Register new models in src/rem/registry.py
+rem db schema generate # Regenerate schema
+rem db diff # See what changed
+rem db apply src/rem/sql/migrations/002_install_models.sql
+```
 
-
+**CI/CD Pipeline:**
+```bash
+rem db diff --check # Fail build if drift detected
+```
 
+**Remote Database (Production/Staging):**
 ```bash
-#
-
-
-
+# Port-forward to cluster database
+kubectl port-forward -n <namespace> svc/rem-postgres-rw 5433:5432 &
+
+# Override connection for diff check
+POSTGRES__CONNECTION_STRING="postgresql://rem:rem@localhost:5433/rem" rem db diff
 
-#
-rem
-
---output rem/src/rem/sql/migrations/003_add_fields.sql
+# Apply changes if needed
+POSTGRES__CONNECTION_STRING="postgresql://rem:rem@localhost:5433/rem" \
+rem db apply src/rem/sql/migrations/002_install_models.sql
 ```
 
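Variables such as `POSTGRES__CONNECTION_STRING` (and `S3__REGION` elsewhere in this README) use a double underscore to separate a settings group from a field. A common way to support this is a nested delimiter: pydantic-settings, for example, has an `env_nested_delimiter` option. The variable name is split on `__` and the value is stored under the resulting path. The stdlib sketch below shows that mapping. That REM resolves its settings exactly this way, and that keys are lower-cased, are both assumptions.

```python
import os
from typing import Any, Mapping


def nest_env(environ: Mapping[str, str], delimiter: str = "__") -> dict[str, Any]:
    """Map GROUP__FIELD=value variables to {"group": {"field": value}}.

    Names without the delimiter (e.g. HOME) are skipped because they
    do not name a settings group. Key lower-casing is an assumption.
    """
    nested: dict[str, Any] = {}
    for name, value in environ.items():
        if delimiter not in name:
            continue
        *groups, field = name.lower().split(delimiter)
        node = nested
        for group in groups:
            child = node.setdefault(group, {})
            if not isinstance(child, dict):
                raise ValueError(f"{name} conflicts with scalar setting {group!r}")
            node = child
        if isinstance(node.get(field), dict):
            raise ValueError(f"{name} would overwrite settings group {field!r}")
        node[field] = value
    return nested


if __name__ == "__main__":
    # e.g. POSTGRES__CONNECTION_STRING=postgresql://... python this_script.py
    print(nest_env(os.environ).get("postgres", {}))
```

Under this scheme, a per-command override like the port-forwarded `rem db diff` above only needs one environment variable and no config file edits.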
-#### `rem db
+#### `rem db rebuild-cache` - Rebuild KV Cache
 
-
+Rebuild KV_STORE cache from entity tables (after database restart or bulk imports).
 
 ```bash
-
-rem db schema indexes \
---models src/rem/models/entities \
---output rem/src/rem/sql/background_indexes.sql
+rem db rebuild-cache
 ```
 
 #### `rem db schema validate` - Validate Models
 
-Validate Pydantic models for schema generation.
+Validate registered Pydantic models for schema generation.
 
 ```bash
-rem db schema validate
+rem db schema validate
 ```
 
 ### File Processing
@@ -677,22 +987,14 @@ rem db schema validate --models src/rem/models/entities
 Process files with optional custom extractor (ontology extraction).
 
 ```bash
-# Process all completed files
-rem process files
---tenant-id acme-corp \
---status completed \
---limit 10
+# Process all completed files
+rem process files --status completed --limit 10
 
 # Process with custom extractor
-rem process files
---tenant-id acme-corp \
---extractor cv-parser-v1 \
---limit 50
+rem process files --extractor cv-parser-v1 --limit 50
 
-# Process files
-rem process files
---tenant-id acme-corp \
---lookback-hours 168
+# Process files for specific user
+rem process files --user-id user-123 --status completed
 ```
 
 #### `rem process ingest` - Ingest File into REM
@@ -700,14 +1002,13 @@ rem process files \
 Ingest a file into REM with full pipeline (storage + parsing + embedding + database).
 
 ```bash
-# Ingest local file
+# Ingest local file with metadata
 rem process ingest /path/to/document.pdf \
---user-id user-123 \
 --category legal \
 --tags contract,2024
 
 # Ingest with minimal options
-rem process ingest ./meeting-notes.md
+rem process ingest ./meeting-notes.md
 ```
 
 #### `rem process uri` - Parse File (Read-Only)
@@ -732,28 +1033,17 @@ rem process uri s3://bucket/key.docx --output text
 Run full dreaming workflow: extractors → moments → affinity → user model.
 
 ```bash
-# Full workflow
-rem dreaming full
---user-id user-123 \
---tenant-id acme-corp
+# Full workflow (uses default user from settings)
+rem dreaming full
 
 # Skip ontology extractors
-rem dreaming full
---user-id user-123 \
---tenant-id acme-corp \
---skip-extractors
+rem dreaming full --skip-extractors
 
 # Process last 24 hours only
-rem dreaming full
---user-id user-123 \
---tenant-id acme-corp \
---lookback-hours 24
+rem dreaming full --lookback-hours 24
 
-# Limit resources processed
-rem dreaming full
---user-id user-123 \
---tenant-id acme-corp \
---limit 100
+# Limit resources processed for specific user
+rem dreaming full --user-id user-123 --limit 100
 ```
 
 #### `rem dreaming custom` - Custom Extractor
@@ -761,16 +1051,11 @@ rem dreaming full \
 Run specific ontology extractor on user's data.
 
 ```bash
-# Run CV parser on
-rem dreaming custom
---user-id user-123 \
---tenant-id acme-corp \
---extractor cv-parser-v1
+# Run CV parser on files
+rem dreaming custom --extractor cv-parser-v1
 
-# Process last week's files
+# Process last week's files with limit
 rem dreaming custom \
---user-id user-123 \
---tenant-id acme-corp \
 --extractor contract-analyzer-v1 \
 --lookback-hours 168 \
 --limit 50
@@ -781,17 +1066,11 @@ rem dreaming custom \
 Extract temporal narratives from resources.
 
 ```bash
-# Generate moments
-rem dreaming moments
---user-id user-123 \
---tenant-id acme-corp \
---limit 50
+# Generate moments
+rem dreaming moments --limit 50
 
 # Process last 7 days
-rem dreaming moments
---user-id user-123 \
---tenant-id acme-corp \
---lookback-hours 168
+rem dreaming moments --lookback-hours 168
 ```
 
 #### `rem dreaming affinity` - Build Relationships
@@ -799,17 +1078,11 @@ rem dreaming moments \
 Build semantic relationships between resources using embeddings.
 
 ```bash
-# Build affinity graph
-rem dreaming affinity
---user-id user-123 \
---tenant-id acme-corp \
---limit 100
+# Build affinity graph
+rem dreaming affinity --limit 100
 
 # Process recent resources only
-rem dreaming affinity
---user-id user-123 \
---tenant-id acme-corp \
---lookback-hours 24
+rem dreaming affinity --lookback-hours 24
 ```
 
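The affinity step is described only as building "semantic relationships between resources using embeddings". One plausible reading, sketched below, compares resource embeddings by cosine similarity and emits an edge for each pair at or above a threshold. The following are all assumptions:

- the 0.8 threshold;
- the `similar_to` edge type;
- the brute-force pairwise loop.

A real system would more likely query a vector index, such as the HNSW indexes that `rem db migrate --background-indexes` creates, rather than compare every pair.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def affinity_edges(
    embeddings: dict[str, list[float]], threshold: float = 0.8
) -> list[tuple[str, str, str, float]]:
    """Brute-force O(n^2) pass: one (source, target, type, weight) edge per close pair.

    The threshold and the "similar_to" edge type are illustrative assumptions.
    """
    edges = []
    # sorted() makes the output order deterministic
    for (key_a, vec_a), (key_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            edges.append((key_a, key_b, "similar_to", round(score, 4)))
    return edges
```

For n resources the loop makes n(n-1)/2 comparisons. Limits like `--limit 100` or `--lookback-hours 24` keep a pass of this shape cheap.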
 #### `rem dreaming user-model` - Update User Model
@@ -818,9 +1091,7 @@ Update user model from recent activity (preferences, interests, patterns).
 
 ```bash
 # Update user model
-rem dreaming user-model
---user-id user-123 \
---tenant-id acme-corp
+rem dreaming user-model
 ```
 
 ### Evaluation & Experiments
@@ -912,14 +1183,11 @@ Test Pydantic AI agent with natural language queries.
 # Ask a question
 rem ask "What documents did Sarah Chen author?"
 
-# With context headers
-rem ask "Find all resources about API design" \
---user-id user-123 \
---tenant-id acme-corp
-
 # Use specific agent schema
-rem ask "Analyze this contract"
-
+rem ask contract-analyzer "Analyze this contract"
+
+# Stream response
+rem ask "Find all resources about API design" --stream
 ```
 
 ### Global Options
@@ -1071,6 +1339,30 @@ S3__BUCKET_NAME=rem-storage
 S3__REGION=us-east-1
 ```
 
+### Building Docker Images
+
+We tag Docker images with three labels for traceability:
+1. `latest` - Always points to most recent build
+2. `<git-sha>` - Short commit hash for exact version tracing
+3. `<version>` - Semantic version from `pyproject.toml`
+
+```bash
+# Build and push multi-platform image to Docker Hub
+VERSION=$(grep '^version' pyproject.toml | cut -d'"' -f2) && \
+docker buildx build --platform linux/amd64,linux/arm64 \
+-t percolationlabs/rem:latest \
+-t percolationlabs/rem:$(git rev-parse --short HEAD) \
+-t percolationlabs/rem:$VERSION \
+--push \
+-f Dockerfile .
+
+# Load locally for testing (single platform, no push)
+docker buildx build --platform linux/arm64 \
+-t percolationlabs/rem:latest \
+--load \
+-f Dockerfile .
+```
+
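The `VERSION=$(grep '^version' pyproject.toml | cut -d'"' -f2)` step takes the text between the first pair of double quotes on each line that starts with `version`. The sketch below mirrors that in Python and assembles the three tags. The `read_version`, `git_short_sha`, and `image_tags` helpers are hypothetical. Unlike the shell pipeline, which captures every matching line, this version returns the first match. Note that a line such as `version_scheme = ...` would also match `^version`. On Python 3.11+, `tomllib.loads(text)["project"]["version"]` is a sturdier choice when the version lives under `[project]`.

```python
import subprocess


def read_version(pyproject_text: str) -> str:
    """Mimic `grep '^version' | cut -d'"' -f2`, taking the first matching line."""
    for line in pyproject_text.splitlines():
        if line.startswith("version"):
            parts = line.split('"')
            if len(parts) >= 2:  # cut -f2: text after the first double quote
                return parts[1]
    raise ValueError("no quoted version line found")


def git_short_sha() -> str:
    """Same as `git rev-parse --short HEAD` (requires git and a repository)."""
    return subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()


def image_tags(repo: str, version: str, sha: str) -> list[str]:
    """The three traceability tags: latest, <git-sha>, <version>."""
    return [f"{repo}:latest", f"{repo}:{sha}", f"{repo}:{version}"]


# Usage from a checkout:
# from pathlib import Path
# tags = image_tags("percolationlabs/rem",
#                   read_version(Path("pyproject.toml").read_text()),
#                   git_short_sha())
```

Keeping the tag list in one function makes it easy to pass the same three `-t` values to every build invocation.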
 ### Production Deployment (Optional)
 
 For production deployment to AWS EKS with Kubernetes, see the main repository README:
@@ -1186,6 +1478,141 @@ TraverseQuery ::= TRAVERSE [<edge_types:list>] WITH <initial_query:Query> [DEPTH
 
 **Stage 4** (100% answerable): Mature graph with rich historical data. All query types fully functional with high-quality results.
 
+## Troubleshooting
+
+### Apple Silicon Mac: "Failed to build kreuzberg" Error
+
+**Problem**: Installation fails with `ERROR: Failed building wheel for kreuzberg` on Apple Silicon Macs.
+
+**Root Cause**: REM uses `kreuzberg>=4.0.0rc1` for document parsing with native ONNX/Rust table extraction. Kreuzberg 4.0.0rc1 provides pre-built wheels for ARM64 macOS (`macosx_14_0_arm64.whl`) but NOT for x86_64 (Intel) macOS. If you're using an x86_64 Python binary (running under Rosetta 2), pip cannot find a compatible wheel and attempts to build from source, which fails.
+
+**Solution**: Use ARM64 (native) Python instead of x86_64 Python.
+
+**Step 1: Verify your Python architecture**
+
+```bash
+python3 -c "import platform; print(f'Machine: {platform.machine()}')"
+```
+
+- **Correct**: `Machine: arm64` (native ARM Python)
+- **Wrong**: `Machine: x86_64` (Intel Python under Rosetta)
+
+**Step 2: Install ARM Python via Homebrew** (if not already installed)
+
+```bash
+# Install ARM Python
+brew install python@3.12
+
+# Verify it's ARM
+/opt/homebrew/bin/python3.12 -c "import platform; print(platform.machine())"
+# Should output: arm64
+```
+
+**Step 3: Create venv with ARM Python**
+
+```bash
+# Use full path to ARM Python
+/opt/homebrew/bin/python3.12 -m venv .venv
+
+# Activate and install
+source .venv/bin/activate
+pip install "remdb[all]"
+```
+
+**Why This Happens**: Some users have both Intel Homebrew (`/usr/local`) and ARM Homebrew (`/opt/homebrew`) installed. If your system `python3` points to the Intel version at `/usr/local/bin/python3`, you'll hit this issue. The fix is to explicitly use the ARM Python from `/opt/homebrew/bin/python3.12`.
+
+**Verification**: After successful installation, you should see:
+```
+Using cached kreuzberg-4.0.0rc1-cp310-abi3-macosx_14_0_arm64.whl (19.8 MB)
+Successfully installed ... kreuzberg-4.0.0rc1 ... remdb-0.3.10
+```
+
+## Using REM as a Library
+
+REM wraps FastAPI - extend it exactly as you would any FastAPI app.
+
+```python
+import rem
+from rem import create_app
+from rem.models.core import CoreModel
+
+# 1. Register models (for schema generation)
+rem.register_models(MyModel, AnotherModel)
+
+# 2. Register schema paths (for custom agents/evaluators)
+rem.register_schema_path("./schemas")
+
+# 3. Create app
+app = create_app()
+
+# 4. Extend like normal FastAPI
+app.include_router(my_router)
+
+@app.mcp_server.tool()
+async def my_tool(query: str) -> dict:
+    """Custom MCP tool."""
+    return {"result": query}
+```
+
+### Project Structure
+
+```
+my-rem-app/
+├── my_app/
+│   ├── main.py # Entry point (create_app + extensions)
+│   ├── models.py # Custom models (inherit CoreModel)
+│   └── routers/ # Custom FastAPI routers
+├── schemas/
+│   ├── agents/ # Custom agent YAML schemas
+│   └── evaluators/ # Custom evaluator schemas
+├── sql/migrations/ # Custom SQL migrations
+└── pyproject.toml
+```
+
+Generate this structure with: `rem scaffold my-app`
+
+### Extension Points
+
+| Extension | How |
+|-----------|-----|
+| **Routes** | `app.include_router(router)` or `@app.get()` |
+| **MCP Tools** | `@app.mcp_server.tool()` decorator or `app.mcp_server.add_tool(fn)` |
+| **MCP Resources** | `@app.mcp_server.resource("uri://...")` or `app.mcp_server.add_resource(fn)` |
+| **MCP Prompts** | `@app.mcp_server.prompt()` or `app.mcp_server.add_prompt(fn)` |
+| **Models** | `rem.register_models(Model)` then `rem db schema generate` |
+| **Agent Schemas** | `rem.register_schema_path("./schemas")` or `SCHEMA__PATHS` env var |
+| **SQL Migrations** | Place in `sql/migrations/` (auto-detected) |
+
+### Custom Migrations
+
+REM automatically discovers migrations from two sources:
+
+1. **Package migrations** (001-099): Built-in migrations from the `remdb` package
+2. **User migrations** (100+): Your custom migrations in `./sql/migrations/`
+
+**Convention**: Place custom SQL files in `sql/migrations/` relative to your project root:
+
+```
+my-rem-app/
+├── sql/
+│   └── migrations/
+│       ├── 100_custom_table.sql # Runs after package migrations
+│       ├── 101_add_indexes.sql
+│       └── 102_custom_functions.sql
+└── ...
+```
+
+**Numbering**: Use 100+ for user migrations to ensure they run after package migrations (001-099). All migrations are sorted by filename, so proper numbering ensures correct execution order.
+
+**Running migrations**:
+```bash
+# Apply all migrations (package + user)
+rem db migrate
+
+# Apply with background indexes (for production)
+rem db migrate --background-indexes
+```
+
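The ordering rule above can be sketched as a merge of two directories followed by a sort on the bare file name. The rule is: package migrations take 001-099, user migrations take 100+, and everything is sorted by filename. The `discover_migrations` and `misnumbered_user_migrations` helpers and the directory paths are hypothetical. REM's own discovery code may locate the package directory differently and apply further checks.

```python
from pathlib import Path


def discover_migrations(package_dir: Path, user_dir: Path) -> list[Path]:
    """Collect *.sql from both sources and order them by file name only."""
    files: list[Path] = []
    for directory in (package_dir, user_dir):
        if directory.is_dir():  # a project may have no sql/migrations/ yet
            files.extend(directory.glob("*.sql"))
    # Sorting on the name (not the full path) interleaves the two sources by
    # numeric prefix. Plain string order also means "1000_x.sql" would sort
    # before "101_x.sql", so fixed-width prefixes keep the order predictable.
    return sorted(files, key=lambda p: p.name)


def misnumbered_user_migrations(user_dir: Path) -> list[str]:
    """Names of user migrations whose numeric prefix falls in the 001-099 range."""
    bad = []
    for path in sorted(user_dir.glob("*.sql")):
        prefix = path.name.split("_", 1)[0]
        if prefix.isdigit() and int(prefix) < 100:
            bad.append(path.name)
    return bad


if __name__ == "__main__":
    # Hypothetical locations; the real package directory ships inside remdb.
    for path in discover_migrations(Path("src/rem/sql/migrations"), Path("sql/migrations")):
        print(path.name)
```

A check like `misnumbered_user_migrations` is an easy guard in CI. A user file numbered below 100 would otherwise run in the middle of the package migrations.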
 ## License
 
 MIT