PyPI - remdb - Versions diffs - 0.3.163__py3-none-any.whl → 0.3.200__py3-none-any.whl - Mend

remdb 0.3.163__py3-none-any.whl → 0.3.200__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of remdb might be problematic.

Files changed (48)

rem/agentic/agents/agent_manager.py +2 -1
rem/agentic/context.py +101 -0
rem/agentic/context_builder.py +30 -8
rem/agentic/mcp/tool_wrapper.py +43 -14
rem/agentic/providers/pydantic_ai.py +76 -34
rem/agentic/schema.py +4 -3
rem/agentic/tools/rem_tools.py +11 -0
rem/api/main.py +1 -1
rem/api/mcp_router/resources.py +75 -14
rem/api/mcp_router/server.py +31 -24
rem/api/mcp_router/tools.py +476 -155
rem/api/routers/auth.py +11 -6
rem/api/routers/chat/completions.py +52 -10
rem/api/routers/chat/sse_events.py +2 -2
rem/api/routers/chat/streaming.py +162 -19
rem/api/routers/messages.py +96 -23
rem/auth/middleware.py +59 -42
rem/cli/README.md +62 -0
rem/cli/commands/ask.py +1 -1
rem/cli/commands/db.py +148 -70
rem/cli/commands/process.py +171 -43
rem/models/entities/ontology.py +93 -101
rem/schemas/agents/core/agent-builder.yaml +143 -42
rem/services/content/service.py +18 -5
rem/services/email/service.py +17 -6
rem/services/embeddings/worker.py +26 -12
rem/services/postgres/__init__.py +28 -3
rem/services/postgres/diff_service.py +57 -5
rem/services/postgres/programmable_diff_service.py +635 -0
rem/services/postgres/pydantic_to_sqlalchemy.py +2 -2
rem/services/postgres/register_type.py +12 -11
rem/services/postgres/repository.py +32 -21
rem/services/postgres/schema_generator.py +5 -5
rem/services/postgres/sql_builder.py +6 -5
rem/services/session/__init__.py +7 -1
rem/services/session/pydantic_messages.py +210 -0
rem/services/user_service.py +12 -9
rem/settings.py +7 -1
rem/sql/background_indexes.sql +5 -0
rem/sql/migrations/001_install.sql +148 -11
rem/sql/migrations/002_install_models.sql +162 -132
rem/sql/migrations/004_cache_system.sql +7 -275
rem/utils/model_helpers.py +101 -0
rem/utils/schema_loader.py +51 -13
{remdb-0.3.163.dist-info → remdb-0.3.200.dist-info}/METADATA +1 -1
{remdb-0.3.163.dist-info → remdb-0.3.200.dist-info}/RECORD +48 -46
{remdb-0.3.163.dist-info → remdb-0.3.200.dist-info}/WHEEL +0 -0
{remdb-0.3.163.dist-info → remdb-0.3.200.dist-info}/entry_points.txt +0 -0

rem/schemas/agents/core/agent-builder.yaml CHANGED

@@ -2,65 +2,148 @@ type: object
 description: |
   # Agent Builder - Create Custom AI Agents Through Conversation
-  You help users create custom AI agents by chatting with them naturally.
-  Gather requirements conversationally, show previews, and save the agent when ready.
+  You help users create custom AI agents for the REM platform through natural conversation.
+  Guide them step-by-step, gather requirements, show previews, and save when ready.
   ## Your Workflow
   1. **Understand the need**: Ask what they want the agent to do
-  2. **Define personality**: Help them choose tone and style
-  3. **Structure outputs**: If needed, define what data the agent captures
-  4. **Preview**: Show them what the agent will look like
-  5. **Save**: Use `save_agent` tool to persist it
+  2. **Define personality**: Help them choose tone and communication style
+  3. **Set guardrails**: What should the agent NOT do?
+  4. **Structure outputs**: Define what data the agent captures (optional)
+  5. **Preview**: Show them what the agent will look like
+  6. **Save**: Use `save_agent` tool to persist it
   ## Conversation Style
   Be friendly and helpful. Ask one or two questions at a time.
   Don't overwhelm with options - guide them step by step.
-  ## Gathering Requirements
+  ## IMPORTANT: Tool Usage
+  - `save_agent` - Use ONLY in Step 6 when user approves the preview
+  - `get_agents_list` - Use if user asks to see existing agents as examples
+  - `get_agent_schema` - Use to load a specific agent (like "rem") as reference
+  DO NOT loop on tools. If a user asks for examples, call get_agents_list ONCE,
+  then discuss what you found. This is a conversational workflow.
+  ## Step 1: Identity & Purpose
   Ask about:
-  - What should this agent help with?
-  - What tone should it have? (casual, professional, empathetic, etc.)
-  - Should it capture any specific information? (optional)
-  - What should it be called?
+  - What should this agent help with? (primary purpose)
+  - What would you like to call it? (suggest kebab-case like "sales-assistant")
+  - What role/persona should it embody?
+  ## Step 2: Tone & Communication Style
+  Help define tone using this framework:
+  | Dimension | Options |
+  |-----------|---------|
+  | Formality | casual, conversational, professional, formal |
+  | Warmth | empathetic, friendly, neutral, businesslike |
+  | Pace | patient, balanced, efficient, direct |
+  | Expertise | peer, guide, expert, authority |
+  Ask: "What tone feels right? For example, should it be friendly and casual, or more professional?"
+  ## Step 3: Guardrails
+  Ask what the agent should NOT do:
+  - Topics to avoid?
+  - Actions it shouldn't take?
+  - Boundaries to respect?
-  ## Preview Format
+  Example guardrails:
+  - "Never provide medical/legal/financial advice"
+  - "Don't make promises about timelines"
+  - "Always recommend consulting a professional for serious issues"
-  Before saving, show a preview using markdown:
+  ## Step 4: Structured Outputs (Optional)
+  Most agents just need an `answer` field. But some use cases benefit from structured data:
+  | Field | Type | Description |
+  |-------|------|-------------|
+  | answer | string | Natural language response (always required) |
+  | confidence | number | 0.0-1.0 confidence score |
+  | category | string | Classification of the request |
+  | follow_up_needed | boolean | Whether follow-up is required |
+  Field types available:
+  - `string` - text values
+  - `number` - numeric values (can add minimum/maximum)
+  - `boolean` - true/false
+  - `array` - list of items
+  - `string` with `enum` - fixed set of choices
+  Only suggest structured outputs if the use case clearly benefits from them.
+  ## Step 5: Preview
+  Before saving, show a preview:
   ```
   ## Agent Preview: {name}
-  **Personality:**
-  {brief description of tone and approach}
+  **Purpose:** {brief description}
+  **Personality:** {tone and approach}
   **System Prompt:**
   {the actual prompt that will guide the agent}
-  **Structured Fields:** (if any)
+  **Guardrails:**
+  - {guardrail 1}
+  - {guardrail 2}
+  **Structured Fields:** (if any beyond answer)
   | Field | Type | Description |
   |-------|------|-------------|
   | answer | string | Response to user |
-  | ... | ... | ... |
   ```
-  Ask: "Does this look good? I can save it now or we can adjust anything."
+  Ask: "Does this look good? I can save it now or adjust anything."
-  ## Saving the Agent
+  ## Step 6: Save the Agent
   When the user approves, call `save_agent` with:
   - `name`: kebab-case name (e.g., "customer-support-bot")
-  - `description`: The full system prompt
+  - `description`: The full system prompt (this is the most important part!)
   - `properties`: Structured output fields (optional, defaults to just "answer")
   - `required`: Which fields are required (optional, defaults to ["answer"])
   - `tags`: Relevant tags for organization
-  ## Default Tools
+  ## Writing Great System Prompts
+  The `description` field IS the system prompt. Make it:
+  1. **Clear about role**: "You are a [role] that helps users [do what]"
+  2. **Specific about behavior**: How should it respond? What's the communication style?
+  3. **Explicit about guardrails**: What should it NOT do?
+  4. **Structured with sections**: Use markdown headers for organization
+  Example structure:
+  ```
+  # [Agent Name]
-  All agents automatically get `search_rem` and `register_metadata` tools.
-  You don't need to specify these.
+  You are a [role description]. Your purpose is to [main goal].
+  ## Communication Style
+  [Tone, approach, language preferences]
+  ## Core Responsibilities
+  - [Responsibility 1]
+  - [Responsibility 2]
+  ## Guardrails
+  - Never [thing to avoid]
+  - Always [thing to ensure]
+  ## How to Respond
+  [Guidelines for responses]
+  ```
   ## Example Conversation
@@ -68,47 +151,62 @@ description: |
   You: "Great idea! Let me help you create an email writing assistant.
-  What tone should it have? For example:
+  First, what tone should it have? For example:
   - Professional and formal
   - Friendly but polished
   - Direct and concise"
   User: "Friendly but polished"
-  You: "Perfect! Should it just write emails, or also help with things like:
+  You: "Perfect! Should it focus just on writing emails, or also help with:
   - Suggesting subject lines
   - Adjusting tone for different recipients
   - Summarizing long email threads"
   User: "All of those would be helpful"
-  You: "Let me show you a preview:
+  You: "Great! Any things it should NOT do? For example, some email assistants avoid:
+  - Writing emails that could be seen as legally binding
+  - Making commitments on behalf of the user"
+  User: "Yeah, it shouldn't make commitments or promises"
+  You: "Perfect! Here's a preview:
   ## Agent Preview: email-assistant
-  **Personality:** Friendly, professional, helpful writing partner
+  **Purpose:** Help users write polished, professional emails
+  **Personality:** Friendly, helpful, professional writing partner
   **System Prompt:**
+  # Email Writing Assistant
   You are a friendly email writing assistant. Help users craft polished,
-  professional emails while maintaining a warm tone. You can:
-  - Write new emails from scratch
+  professional emails while maintaining a warm, approachable tone.
+  ## What You Can Do
+  - Write new emails from scratch based on user's intent
   - Suggest compelling subject lines
-  - Adjust tone for different audiences
+  - Adjust tone for different audiences (colleagues, executives, clients)
   - Summarize long email threads
+  - Proofread and improve existing drafts
-  Always ask clarifying questions if the request is unclear.
+  ## Communication Style
+  Be helpful and collaborative. Suggest improvements but respect the user's voice.
+  Ask clarifying questions when the request is ambiguous.
-  **Structured Fields:**
-  | Field | Type | Description |
-  |-------|------|-------------|
-  | answer | string | Your response or the drafted email |
+  ## Guardrails
+  - Never write emails that make commitments or promises on behalf of the user
+  - Don't write anything that could be legally binding
+  - Always let the user review before sending
   Does this look good? I can save it now or adjust anything."
   User: "Looks great, save it!"
   You: *calls save_agent tool*
-  "Done! Your email-assistant is ready. Use `/custom-agent email-assistant` to start chatting with it."
+  "Done! Your email-assistant is ready to use."
 properties:
   answer:
@@ -121,14 +219,17 @@ required:
 json_schema_extra:
   kind: agent
   name: agent-builder
-  version: "1.0.0"
+  version: "1.2.0"
   tags:
     - meta
     - builder
+  structured_output: false  # Stream text responses, don't return JSON
+  mcp_servers: []  # Disable default MCP tools to prevent search_rem looping
+  resources:
+    - uri: rem://agents
+      description: "List all available agent schemas with descriptions"
+    - uri: rem://agents/{agent_name}
+      description: "Load a specific agent schema by name (e.g., 'rem', 'siggy')"
   tools:
     - name: save_agent
-      description: "Save the agent schema to make it available for use"
-    - name: search_rem
-      description: "Search for existing agents as examples"
-    - name: register_metadata
-      description: "Record session metadata"
+      description: "Save the agent schema. Only call when user approves the preview in Step 6."

rem/services/content/service.py CHANGED

@@ -274,7 +274,7 @@ class ContentService:
     async def ingest_file(
         self,
         file_uri: str,
-        user_id: str,
+        user_id: str | None = None,
         category: str | None = None,
         tags: list[str] | None = None,
         is_local_server: bool = False,
@@ -283,6 +283,10 @@ class ContentService:
         """
         Complete file ingestion pipeline: read → store → parse → chunk → embed.
+        **IMPORTANT: Data is PUBLIC by default (user_id=None).**
+        This is correct for shared knowledge bases (ontologies, procedures, reference data).
+        Private user-scoped data is rarely needed - only set user_id for truly personal content.
         **CENTRALIZED INGESTION**: This is the single entry point for all file ingestion
         in REM. It handles:
@@ -319,7 +323,9 @@ class ContentService:
         Args:
             file_uri: Source file location (local path, s3://, or https://)
-            user_id: User identifier for data isolation and ownership
+            user_id: User identifier for PRIVATE data only. Default None = PUBLIC/shared.
+                Leave as None for shared knowledge bases, ontologies, reference data.
+                Only set for truly private user-specific content.
             category: Optional category tag (document, code, audio, etc.)
             tags: Optional list of tags
             is_local_server: True if running as local/stdio MCP server
@@ -347,12 +353,19 @@ class ContentService:
         Example:
             >>> service = ContentService()
+            >>> # PUBLIC data (default) - visible to all users
             >>> result = await service.ingest_file(
-            ...     file_uri="s3://bucket/contract.pdf",
-            ...     user_id="user-123",
-            ...     category="legal"
+            ...     file_uri="s3://bucket/procedure.pdf",
+            ...     category="medical"
             ... )
             >>> print(f"Created {result['resources_created']} searchable chunks")
+            >>>
+            >>> # PRIVATE data (rare) - only for user-specific content
+            >>> result = await service.ingest_file(
+            ...     file_uri="s3://bucket/personal-notes.pdf",
+            ...     user_id="user-123",  # Only this user can access
+            ...     category="personal"
+            ... )
         """
         from pathlib import Path
         from uuid import uuid4

rem/services/email/service.py CHANGED

@@ -200,8 +200,8 @@ class EmailService:
         """
         Generate a deterministic UUID from email address.
-        Uses UUID v5 with DNS namespace for consistency.
-        Same email always produces same UUID.
+        Uses the centralized email_to_user_id() for consistency.
+        Same email always produces same UUID (bijection).
         Args:
             email: Email address
@@ -209,7 +209,8 @@ class EmailService:
         Returns:
             UUID string
         """
-        return str(uuid.uuid5(uuid.NAMESPACE_DNS, email.lower().strip()))
+        from rem.utils.user_id import email_to_user_id
+        return email_to_user_id(email)
     async def send_login_code(
         self,
@@ -375,8 +376,17 @@ class EmailService:
             await user_repo.upsert(existing_user)
             return {"allowed": True, "error": None}
         else:
-            # New user - check if domain is trusted
-            if settings and hasattr(settings, 'email') and settings.email.trusted_domain_list:
+            # New user - first check if they're a subscriber (by email lookup)
+            from ...models.entities import Subscriber
+            subscriber_repo = Repository(Subscriber, db=db)
+            existing_subscriber = await subscriber_repo.find_one({"email": email})
+            if existing_subscriber:
+                # Subscriber exists - allow them to create account
+                # (approved field may not exist in older schemas, so just check existence)
+                logger.info(f"Subscriber {email} creating user account")
+            elif settings and hasattr(settings, 'email') and settings.email.trusted_domain_list:
+                # Not an approved subscriber - check if domain is trusted
                 if not settings.email.is_domain_trusted(email):
                     email_domain = email.split("@")[-1]
                     logger.warning(f"Untrusted domain attempted signup: {email_domain}")
@@ -393,7 +403,8 @@ class EmailService:
             new_user = User(
                 id=uuid.UUID(user_id),
                 tenant_id=tenant_id,
-                name=email.split("@")[0],  # Default name from email
+                user_id=user_id,  # UUID5 hash of email (same as id)
+                name=email,  # Full email as entity_key for LOOKUP
                 email=email,
                 role=user_role,
                 metadata=login_metadata,

rem/services/embeddings/worker.py CHANGED

@@ -23,6 +23,8 @@ Future:
 import asyncio
 import os
 from typing import Any, Optional
+import hashlib
+import uuid
 from uuid import uuid4
 import httpx
@@ -108,6 +110,7 @@ class EmbeddingWorker:
         self.task_queue: asyncio.Queue = asyncio.Queue()
         self.workers: list[asyncio.Task] = []
         self.running = False
+        self._in_flight_count = 0  # Track tasks being processed (not just in queue)
         # Store API key for direct HTTP requests
         from ...settings import settings
@@ -143,17 +146,18 @@ class EmbeddingWorker:
             return
         queue_size = self.task_queue.qsize()
-        logger.debug(f"Stopping EmbeddingWorker (processing {queue_size} queued tasks first)")
+        in_flight = self._in_flight_count
+        logger.debug(f"Stopping EmbeddingWorker (queue={queue_size}, in_flight={in_flight})")
-        # Wait for queue to drain (with timeout)
+        # Wait for both queue to drain AND in-flight tasks to complete
         max_wait = 30  # 30 seconds max
         waited = 0.0
-        while not self.task_queue.empty() and waited < max_wait:
+        while (not self.task_queue.empty() or self._in_flight_count > 0) and waited < max_wait:
             await asyncio.sleep(0.5)
             waited += 0.5
-        if not self.task_queue.empty():
-            remaining = self.task_queue.qsize()
+        if not self.task_queue.empty() or self._in_flight_count > 0:
+            remaining = self.task_queue.qsize() + self._in_flight_count
             logger.warning(
                 f"EmbeddingWorker timeout: {remaining} tasks remaining after {max_wait}s"
             )
@@ -205,12 +209,18 @@ class EmbeddingWorker:
                 if not batch:
                     continue
-                logger.debug(f"Worker {worker_id} processing batch of {len(batch)} tasks")
+                # Track in-flight tasks
+                self._in_flight_count += len(batch)
-                # Generate embeddings for batch
-                await self._process_batch(batch)
+                logger.debug(f"Worker {worker_id} processing batch of {len(batch)} tasks")
-                logger.debug(f"Worker {worker_id} completed batch")
+                try:
+                    # Generate embeddings for batch
+                    await self._process_batch(batch)
+                    logger.debug(f"Worker {worker_id} completed batch")
+                finally:
+                    # Always decrement in-flight count, even on error
+                    self._in_flight_count -= len(batch)
             except asyncio.CancelledError:
                 logger.debug(f"Worker {worker_id} cancelled")
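The shutdown and worker-loop hunks above wait until the queue is empty *and* no dequeued batch is still being processed. Checking `queue.empty()` alone misses work a worker has already taken off the queue. The sketch below reproduces that pattern in isolation, with these stand-ins:

- `DrainingWorker` is a hypothetical class.
- Each queued item is treated as a one-task "batch".
- `asyncio.sleep` takes the place of the embedding API call.

```python
import asyncio


class DrainingWorker:
    """Hypothetical worker that counts in-flight tasks, not just queued ones."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.in_flight = 0
        self.processed: list[int] = []

    async def run(self) -> None:
        while True:
            item = await self.queue.get()
            self.in_flight += 1  # item has left the queue but is not finished
            try:
                await asyncio.sleep(0.01)  # stand-in for the embedding API call
                self.processed.append(item)
            finally:
                self.in_flight -= 1  # always decrement, even on error

    async def drain(self, max_wait: float = 5.0, step: float = 0.005) -> int:
        """Wait for queue AND in-flight work; return how many tasks remain."""
        waited = 0.0
        while (not self.queue.empty() or self.in_flight > 0) and waited < max_wait:
            await asyncio.sleep(step)
            waited += step
        return self.queue.qsize() + self.in_flight


async def main() -> list[int]:
    worker = DrainingWorker()
    task = asyncio.create_task(worker.run())
    for i in range(3):
        worker.queue.put_nowait(i)
    remaining = await worker.drain()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    assert remaining == 0
    return worker.processed


processed = asyncio.run(main())
print(processed)  # [0, 1, 2]
```

The unsynchronized `+=`/`-=` on the counter is safe here only because everything runs on one event loop and there is no `await` between taking an item and incrementing the counter.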
@@ -373,7 +383,11 @@ class EmbeddingWorker:
         for task, embedding in zip(tasks, embeddings):
             table_name = f"embeddings_{task.table_name}"
-            # Build upsert SQL
+            # Generate deterministic ID from key fields (entity_id, field_name, provider)
+            key_string = f"{task.entity_id}:{task.field_name}:{task.provider}"
+            embedding_id = str(uuid.UUID(hashlib.md5(key_string.encode()).hexdigest()))
+            # Build upsert SQL - conflict on deterministic ID
             sql = f"""
                 INSERT INTO {table_name} (
                     id,
@@ -386,7 +400,7 @@ class EmbeddingWorker:
                     updated_at
                 )
                 VALUES ($1, $2, $3, $4, $5, $6, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
-                ON CONFLICT (entity_id, field_name, provider)
+                ON CONFLICT (id)
                 DO UPDATE SET
                     model = EXCLUDED.model,
                     embedding = EXCLUDED.embedding,
@@ -400,7 +414,7 @@ class EmbeddingWorker:
                 await self.postgres_service.execute(
                     sql,
                     (
-                        str(uuid4()),
+                        embedding_id,
                         task.entity_id,
                         task.field_name,
                         task.provider,
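The worker now derives each row ID from `entity_id:field_name:provider` via MD5 and upserts on `id`. Re-embedding the same field therefore updates one row instead of inserting a duplicate. The sketch below demonstrates that idempotency with Python's built-in `sqlite3` in place of PostgreSQL; SQLite supports the same `ON CONFLICT (id) DO UPDATE` form from version 3.24. Several details are simplified or invented:

- The table name `embeddings_resources` is hypothetical.
- The column set is reduced.
- The embedding is stored as text.
- The model and provider strings are made up.

```python
import hashlib
import sqlite3
import uuid


def embedding_id(entity_id: str, field_name: str, provider: str) -> str:
    """Deterministic ID from key fields, following the worker's new code."""
    key_string = f"{entity_id}:{field_name}:{provider}"
    return str(uuid.UUID(hashlib.md5(key_string.encode()).hexdigest()))


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE embeddings_resources (
        id TEXT PRIMARY KEY,
        entity_id TEXT, field_name TEXT, provider TEXT,
        model TEXT, embedding TEXT
    )"""
)

UPSERT = """
    INSERT INTO embeddings_resources (id, entity_id, field_name, provider, model, embedding)
    VALUES (?, ?, ?, ?, ?, ?)
    ON CONFLICT (id) DO UPDATE SET
        model = excluded.model,
        embedding = excluded.embedding
"""

# Embedding the same (entity, field, provider) twice hits the same row
for model, vector in [("model-a", "[0.1, 0.2]"), ("model-b", "[0.3, 0.4]")]:
    row_id = embedding_id("entity-1", "content", "openai")
    conn.execute(UPSERT, (row_id, "entity-1", "content", "openai", model, vector))

rows = conn.execute("SELECT id, model FROM embeddings_resources").fetchall()
print(len(rows), rows[0][1])  # 1 model-b
```

Passing a 32-digit MD5 hex digest to `uuid.UUID` yields a valid UUID value, but one without RFC 4122 version bits. `uuid.uuid5` is the standards-based way to derive a name-based UUID.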

rem/services/postgres/__init__.py CHANGED

@@ -3,22 +3,47 @@ PostgreSQL service for CloudNativePG database operations.
 """
 from .diff_service import DiffService, SchemaDiff
+from .programmable_diff_service import (
+    DiffResult,
+    ObjectDiff,
+    ObjectType,
+    ProgrammableDiffService,
+)
 from .repository import Repository
 from .service import PostgresService
+_postgres_instance: PostgresService | None = None
 def get_postgres_service() -> PostgresService | None:
     """
-    Get PostgresService instance.
+    Get PostgresService singleton instance.
     Returns None if Postgres is disabled.
+    Uses singleton pattern to prevent connection pool exhaustion.
     """
+    global _postgres_instance
     from ...settings import settings
     if not settings.postgres.enabled:
         return None
-    return PostgresService()
+    if _postgres_instance is None:
+        _postgres_instance = PostgresService()
+    return _postgres_instance
-__all__ = ["PostgresService", "get_postgres_service", "Repository", "DiffService", "SchemaDiff"]
+__all__ = [
+    "DiffResult",
+    "DiffService",
+    "ObjectDiff",
+    "ObjectType",
+    "PostgresService",
+    "ProgrammableDiffService",
+    "Repository",
+    "SchemaDiff",
+    "get_postgres_service",
+]

rem/services/postgres/diff_service.py CHANGED

@@ -5,12 +5,17 @@ Uses Alembic autogenerate to detect differences between:
 - Target schema (derived from Pydantic models)
 - Current database schema
+Also compares programmable objects (functions, triggers, views) which
+Alembic does not track.
 This enables:
 1. Local development: See what would change before applying migrations
 2. CI validation: Detect drift between code and database (--check mode)
 3. Migration generation: Create incremental migration files
 """
+import asyncio
+import re
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Optional
@@ -51,11 +56,14 @@ class SchemaDiff:
     sql: str = ""
     upgrade_ops: Optional[ops.UpgradeOps] = None
     filtered_count: int = 0  # Number of operations filtered out by strategy
+    # Programmable objects (functions, triggers, views)
+    programmable_summary: list[str] = field(default_factory=list)
+    programmable_sql: str = ""
     @property
     def change_count(self) -> int:
         """Total number of detected changes."""
-        return len(self.summary)
+        return len(self.summary) + len(self.programmable_summary)
 class DiffService:
@@ -127,10 +135,13 @@ class DiffService:
             # These are now generated in pydantic_to_sqlalchemy
         return True
-    def compute_diff(self) -> SchemaDiff:
+    def compute_diff(self, include_programmable: bool = True) -> SchemaDiff:
         """
         Compare Pydantic models against database and return differences.
+        Args:
+            include_programmable: If True, also diff functions/triggers/views
         Returns:
             SchemaDiff with detected changes
         """
@@ -167,21 +178,62 @@ class DiffService:
                 for op in filtered_ops:
                     summary.extend(self._describe_operation(op))
-        has_changes = len(summary) > 0
         # Generate SQL if there are changes
         sql = ""
-        if has_changes and upgrade_ops:
+        if summary and upgrade_ops:
             sql = self._render_sql(upgrade_ops, engine)
+        # Programmable objects diff (functions, triggers, views)
+        programmable_summary = []
+        programmable_sql = ""
+        if include_programmable:
+            prog_summary, prog_sql = self._compute_programmable_diff()
+            programmable_summary = prog_summary
+            programmable_sql = prog_sql
+        has_changes = len(summary) > 0 or len(programmable_summary) > 0
         return SchemaDiff(
             has_changes=has_changes,
             summary=summary,
             sql=sql,
             upgrade_ops=upgrade_ops,
             filtered_count=filtered_count,
+            programmable_summary=programmable_summary,
+            programmable_sql=programmable_sql,
         )
+    def _compute_programmable_diff(self) -> tuple[list[str], str]:
+        """
+        Compute diff for programmable objects (functions, triggers, views).
+        Returns:
+            Tuple of (summary_lines, sync_sql)
+        """
+        from .programmable_diff_service import ProgrammableDiffService
+        service = ProgrammableDiffService()
+        # Run async diff in sync context
+        try:
+            loop = asyncio.get_event_loop()
+        except RuntimeError:
+            loop = asyncio.new_event_loop()
+            asyncio.set_event_loop(loop)
+        result = loop.run_until_complete(service.compute_diff())
+        summary = []
+        for diff in result.diffs:
+            if diff.status == "missing":
+                summary.append(f"+ {diff.object_type.value.upper()} {diff.name} (missing)")
+            elif diff.status == "different":
+                summary.append(f"~ {diff.object_type.value.upper()} {diff.name} (different)")
+            elif diff.status == "extra":
+                summary.append(f"- {diff.object_type.value.upper()} {diff.name} (extra in db)")
+        return summary, result.sync_sql
     def _filter_operations(self, operations: list) -> tuple[list, int]:
         """
         Filter operations based on migration strategy.
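`_compute_programmable_diff` runs an async diff from synchronous code and turns each object diff into a `+`, `~` or `-` summary line. Statuses other than `missing`, `different` and `extra` are skipped. The sketch below covers both steps, with these substitutions:

- `ObjectType`, `ObjectDiff` and `DiffResult` are hypothetical stand-ins. The real classes live in `programmable_diff_service.py`, which this page does not display.
- The object names are invented.
- `asyncio.run` replaces the `get_event_loop()`/`run_until_complete()` pair as the standard way to run a coroutine from synchronous code. Neither form can be called while an event loop is already running in the same thread.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class ObjectType(Enum):
    """Hypothetical stand-in for programmable_diff_service.ObjectType."""
    FUNCTION = "function"
    TRIGGER = "trigger"
    VIEW = "view"


@dataclass
class ObjectDiff:
    object_type: ObjectType
    name: str
    status: str  # "missing", "different", "extra"; anything else is skipped


@dataclass
class DiffResult:
    diffs: list[ObjectDiff] = field(default_factory=list)
    sync_sql: str = ""


async def fake_compute_diff() -> DiffResult:
    """Stand-in for ProgrammableDiffService().compute_diff()."""
    return DiffResult(
        diffs=[
            ObjectDiff(ObjectType.FUNCTION, "rem_lookup", "missing"),
            ObjectDiff(ObjectType.VIEW, "old_view", "extra"),
        ],
        sync_sql="-- sync SQL would go here",
    )


# Same sign/label mapping as the if/elif chain in the diff
PREFIX = {
    "missing": ("+", "missing"),
    "different": ("~", "different"),
    "extra": ("-", "extra in db"),
}


def summarize(result: DiffResult) -> list[str]:
    lines = []
    for diff in result.diffs:
        if diff.status in PREFIX:
            sign, note = PREFIX[diff.status]
            lines.append(f"{sign} {diff.object_type.value.upper()} {diff.name} ({note})")
    return lines


result = asyncio.run(fake_compute_diff())
for line in summarize(result):
    print(line)
# + FUNCTION rem_lookup (missing)
# - VIEW old_view (extra in db)
```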
