remdb 0.3.163__py3-none-any.whl → 0.3.181__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.



@@ -1,63 +1,55 @@
1
- """Ontology entity for tenant-specific knowledge extensions.
1
+ """Ontology entity for domain-specific knowledge.
2
2
 
3
- **What is Ontology Extraction?**
3
+ **What are Ontologies?**
4
4
 
5
- Ontologies are **domain-specific structured knowledge** extracted from files using custom
6
- agent schemas. They extend REM's normal file processing pipeline with tenant-specific
7
- parsers that extract structured data the standard chunking pipeline would miss.
5
+ Ontologies are **domain-specific structured knowledge** that can be:
6
+ 1. **Extracted** from files using custom agent schemas (agent-extracted)
7
+ 2. **Loaded directly** from external sources like git repos or S3 (direct-loaded)
8
8
 
9
- **Normal File Processing:**
10
- File → extract text → chunk → embed → resources (semantic search ready)
9
+ **Use Case 1: Agent-Extracted Ontologies**
11
10
 
12
- **Ontology Processing (Tenant Knowledge Extensions):**
13
11
  File → custom agent → structured JSON → ontology (domain knowledge)
14
12
 
15
- **Why Ontologies?**
16
- - Standard chunking gives you semantic search over raw content
17
- - Ontologies give you **structured queryable fields** from domain logic
18
- - Example: A contract PDF becomes both searchable chunks AND a structured record with
19
- parties, dates, payment terms, obligations as queryable fields
13
+ Example: A contract PDF becomes a structured record with parties, dates, payment terms.
14
+
15
+ **Use Case 2: Direct-Loaded Ontologies (Knowledge Bases)**
16
+
17
+ External source (git/S3) → load → ontology (reference knowledge)
18
+
19
+ Example: A psychiatric ontology of disorders, symptoms, and drugs loaded from markdown
20
+ files in a git repository. Each markdown file becomes an ontology node with:
21
+ - `uri`: git path (e.g., `git://org/repo/ontology/disorders/anxiety/panic-disorder.md`)
22
+ - `content`: markdown content for embedding/search
23
+ - `extracted_data`: parsed frontmatter or structure
20
24
 
21
25
  **Architecture:**
22
- - Runs as part of dreaming worker (background knowledge extraction)
23
- - OntologyConfig defines which files trigger which extractors (MIME type, URI pattern, tags)
26
+ - Runs as part of dreaming worker (background knowledge extraction) OR
27
+ - Loaded directly via `rem db load` for external knowledge bases
28
+ - OntologyConfig defines which files trigger which extractors
24
29
  - Multiple ontologies per file (apply different domain lenses)
25
- - Tenant-scoped: Each tenant can define their own extractors
30
+ - Tenant-scoped: Each tenant can define their own extractors and knowledge bases
26
31
 
27
32
  **Use Cases:**
28
33
 
29
- 1. **Recruitment (CV Parsing)**
30
- - Standard pipeline: Chunks for "find me candidates with Python experience"
31
- - Ontology: Structured fields for filtering/sorting (years_experience, seniority_level, skills[])
32
-
33
- 2. **Legal (Contract Analysis)**
34
- - Standard pipeline: Semantic search over contract text
35
- - Ontology: Queryable fields (parties, effective_date, payment_amount, key_obligations[])
34
+ 1. **Recruitment (CV Parsing)** - Agent-extracted
35
+ - Ontology: Structured fields for filtering/sorting (years_experience, skills[])
36
36
 
37
- 3. **Medical (Health Records)**
38
- - Standard pipeline: Find mentions of conditions
39
- - Ontology: Structured diagnoses, medications, dosages, treatment plans
37
+ 2. **Legal (Contract Analysis)** - Agent-extracted
38
+ - Ontology: Queryable fields (parties, effective_date, payment_amount)
40
39
 
41
- 4. **Finance (Report Analysis)**
42
- - Standard pipeline: Search for financial terms
43
- - Ontology: Extracted metrics, risk_flags, trends, forecasts
40
+ 3. **Medical Knowledge Base** - Direct-loaded
41
+ - Ontology: Disorders, symptoms, medications from curated markdown files
42
+ - Enables semantic search over psychiatric/medical domain knowledge
44
43
 
45
- **Example Flow:**
46
- 1. Tenant creates OntologyConfig: "Run cv-parser-v1 on files with mime_type='application/pdf' and tags=['resume']"
47
- 2. File uploaded with tags=["resume"]
48
- 3. Normal processing: File → chunks → resources
49
- 4. Dreaming worker detects matching OntologyConfig
50
- 5. Loads cv-parser-v1 agent schema from database
51
- 6. Runs agent on file content → extracts structured data
52
- 7. Stores Ontology with extracted_data = {candidate_name, skills, experience, education, ...}
53
- 8. Ontology is now queryable via LOOKUP, SEARCH, or direct SQL
44
+ 4. **Documentation/Procedures** - Direct-loaded
45
+ - Ontology: Clinical procedures (e.g., SCID-5 assessment steps)
46
+ - Reference material accessible via RAG
54
47
 
55
48
  **Design:**
56
- - Each ontology links to a File via file_id
57
- - Agent schema tracked via agent_schema_id (human-readable label, not UUID)
58
- - Structured data in `extracted_data` (arbitrary JSON, schema defined by agent)
59
- - Embeddings generated for semantic search (configurable fields via agent schema)
60
- - Multiple ontologies per file using different schemas
49
+ - `file_id` and `agent_schema_id` are optional (only needed for agent-extracted)
50
+ - `uri` field for external source references (git://, s3://, https://)
51
+ - Structured data in `extracted_data` (arbitrary JSON)
52
+ - Embeddings generated for semantic search via `content` field
61
53
  - Tenant-isolated: OntologyConfigs are tenant-scoped
62
54
  """
63
55
 
@@ -70,18 +62,19 @@ from ..core.core_model import CoreModel
70
62
 
71
63
 
72
64
  class Ontology(CoreModel):
73
- """Domain-specific knowledge extracted from files using custom agents.
65
+ """Domain-specific knowledge - either agent-extracted or direct-loaded.
74
66
 
75
67
  Attributes:
76
68
  name: Human-readable label for this ontology instance
77
- file_id: Foreign key to File entity that was processed
78
- agent_schema_id: Foreign key to Schema entity that performed extraction
79
- provider_name: LLM provider used for extraction (e.g., "anthropic", "openai")
80
- model_name: Specific model used (e.g., "claude-sonnet-4-5")
81
- extracted_data: Structured data extracted by agent (arbitrary JSON)
69
+ uri: External source reference (git://, s3://, https://) for direct-loaded ontologies
70
+ file_id: Foreign key to File entity (optional - only for agent-extracted)
71
+ agent_schema_id: Schema that performed extraction (optional - only for agent-extracted)
72
+ provider_name: LLM provider used for extraction (optional)
73
+ model_name: Specific model used (optional)
74
+ extracted_data: Structured data - either extracted by agent or parsed from source
82
75
  confidence_score: Optional confidence score from extraction (0.0-1.0)
83
76
  extraction_timestamp: When extraction was performed
84
- embedding_text: Text used for generating embedding (derived from extracted_data)
77
+ content: Text used for generating embedding
85
78
 
86
79
  Inherited from CoreModel:
87
80
  id: UUID or string identifier
@@ -93,10 +86,9 @@ class Ontology(CoreModel):
93
86
  graph_edges: Relationships to other entities
94
87
  metadata: Flexible metadata storage
95
88
  tags: Classification tags
96
- column: Database schema metadata
97
89
 
98
90
  Example Usage:
99
- # CV extraction
91
+ # Agent-extracted: CV parsing
100
92
  cv_ontology = Ontology(
101
93
  name="john-doe-cv-2024",
102
94
  file_id="file-uuid-123",
@@ -105,73 +97,74 @@ class Ontology(CoreModel):
105
97
  model_name="claude-sonnet-4-5-20250929",
106
98
  extracted_data={
107
99
  "candidate_name": "John Doe",
108
- "email": "john@example.com",
109
100
  "skills": ["Python", "PostgreSQL", "Kubernetes"],
110
- "experience": [
111
- {
112
- "company": "TechCorp",
113
- "role": "Senior Engineer",
114
- "years": 3,
115
- "achievements": ["Led migration to k8s", "Reduced costs 40%"]
116
- }
117
- ],
118
- "education": [
119
- {"degree": "BS Computer Science", "institution": "MIT", "year": 2018}
120
- ]
121
101
  },
122
102
  confidence_score=0.95,
123
- tags=["cv", "engineering", "senior-level"]
103
+ tags=["cv", "engineering"]
124
104
  )
125
105
 
126
- # Contract extraction
127
- contract_ontology = Ontology(
128
- name="acme-supplier-agreement-2024",
129
- file_id="file-uuid-456",
130
- agent_schema_id="contract-parser-v2",
131
- provider_name="openai",
132
- model_name="gpt-4.1",
106
+ # Direct-loaded: Medical knowledge base from git
107
+ disorder_ontology = Ontology(
108
+ name="panic-disorder",
109
+ uri="git://bwolfson-siggie/Siggy-MVP/ontology/disorders/anxiety/panic-disorder.md",
110
+ content="# Panic Disorder\\n\\nPanic disorder is characterized by...",
133
111
  extracted_data={
134
- "contract_type": "supplier_agreement",
135
- "parties": [
136
- {"name": "ACME Corp", "role": "buyer"},
137
- {"name": "SupplyChain Inc", "role": "supplier"}
138
- ],
139
- "effective_date": "2024-01-01",
140
- "termination_date": "2026-12-31",
141
- "payment_terms": {
142
- "amount": 500000,
143
- "currency": "USD",
144
- "frequency": "quarterly"
145
- },
146
- "key_obligations": [
147
- "Supplier must deliver within 30 days",
148
- "Buyer must pay within 60 days of invoice"
149
- ]
112
+ "type": "disorder",
113
+ "category": "anxiety",
114
+ "icd10": "F41.0",
115
+ "dsm5_criteria": ["A", "B", "C", "D"],
116
+ },
117
+ tags=["disorder", "anxiety", "dsm5"]
118
+ )
119
+
120
+ # Direct-loaded: Clinical procedure from git
121
+ scid_node = Ontology(
122
+ name="scid-5-f1",
123
+ uri="git://bwolfson-siggie/Siggy-MVP/ontology/procedures/scid-5/module-f/scid-5-f1.md",
124
+ content="# scid-5-f1: Panic Attack Screening\\n\\n...",
125
+ extracted_data={
126
+ "type": "procedure",
127
+ "module": "F",
128
+ "section": "Panic Disorder",
129
+ "dsm5_criterion": "Panic Attack Specifier",
150
130
  },
151
- confidence_score=0.92,
152
- tags=["contract", "supplier", "procurement"]
131
+ tags=["scid-5", "procedure", "anxiety"]
153
132
  )
154
133
  """
155
134
 
156
135
  # Core fields
157
136
  name: str
158
- file_id: UUID | str
159
- agent_schema_id: str # Natural language label of Schema entity
137
+ uri: Optional[str] = None # External source: git://, s3://, https://
160
138
 
161
- # Extraction metadata
162
- provider_name: str # LLM provider (anthropic, openai, etc.)
163
- model_name: str # Specific model used
164
- extracted_data: dict[str, Any] # Arbitrary structured data from agent
139
+ # Agent extraction fields (optional - only for agent-extracted ontologies)
140
+ file_id: Optional[UUID | str] = None # FK to File entity
141
+ agent_schema_id: Optional[str] = None # Schema that performed extraction
142
+ provider_name: Optional[str] = None # LLM provider (anthropic, openai, etc.)
143
+ model_name: Optional[str] = None # Specific model used
144
+
145
+ # Data fields
146
+ extracted_data: Optional[dict[str, Any]] = None # Structured data
165
147
  confidence_score: Optional[float] = None # 0.0-1.0 if provided by agent
166
148
  extraction_timestamp: Optional[str] = None # ISO8601 timestamp
167
149
 
168
- # Semantic search support
169
- embedding_text: Optional[str] = None # Text for embedding generation
150
+ # Semantic search support - 'content' is a default embeddable field name
151
+ content: Optional[str] = None # Text for embedding generation
170
152
 
171
153
  model_config = ConfigDict(
172
154
  json_schema_extra={
173
- "description": "Domain-specific knowledge extracted from files using custom agents",
155
+ "description": "Domain-specific knowledge - agent-extracted or direct-loaded from external sources",
174
156
  "examples": [
157
+ {
158
+ "name": "panic-disorder",
159
+ "uri": "git://org/repo/ontology/disorders/anxiety/panic-disorder.md",
160
+ "content": "# Panic Disorder\n\nPanic disorder is characterized by...",
161
+ "extracted_data": {
162
+ "type": "disorder",
163
+ "category": "anxiety",
164
+ "icd10": "F41.0"
165
+ },
166
+ "tags": ["disorder", "anxiety"]
167
+ },
175
168
  {
176
169
  "name": "john-doe-cv-2024",
177
170
  "file_id": "550e8400-e29b-41d4-a716-446655440000",
@@ -180,8 +173,7 @@ class Ontology(CoreModel):
180
173
  "model_name": "claude-sonnet-4-5-20250929",
181
174
  "extracted_data": {
182
175
  "candidate_name": "John Doe",
183
- "skills": ["Python", "PostgreSQL"],
184
- "experience": []
176
+ "skills": ["Python", "PostgreSQL"]
185
177
  },
186
178
  "confidence_score": 0.95,
187
179
  "tags": ["cv", "engineering"]
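Note on the direct-loaded path: the new docstring says each markdown file becomes a node with a `uri`, a `content` body, and `extracted_data` parsed from frontmatter. The diff does not show the loader, so the following is only a minimal sketch of that mapping. `OntologyNode`, `parse_frontmatter`, and `node_from_markdown` are hypothetical names, and the simple `key: value` frontmatter format is an assumption, not remdb's actual parser.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class OntologyNode:
    """Hypothetical stand-in for the direct-loaded fields of Ontology."""
    name: str
    uri: Optional[str] = None
    content: Optional[str] = None
    extracted_data: Optional[dict[str, Any]] = None
    tags: list[str] = field(default_factory=list)


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited 'key: value' frontmatter from the markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter found: treat the whole text as body.
    return {}, text


def node_from_markdown(repo: str, path: str, text: str) -> OntologyNode:
    """Build a node whose uri is the git path and whose name is the file stem."""
    meta, body = parse_frontmatter(text)
    name = path.rsplit("/", 1)[-1].removesuffix(".md")
    return OntologyNode(
        name=name,
        uri=f"git://{repo}/{path}",
        content=body,
        extracted_data=meta or None,
    )
```

Because `file_id` and `agent_schema_id` are optional in 0.3.181, a node built this way needs no extraction metadata at all; only `name` remains required.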
@@ -2,65 +2,148 @@ type: object
2
2
  description: |
3
3
  # Agent Builder - Create Custom AI Agents Through Conversation
4
4
 
5
- You help users create custom AI agents by chatting with them naturally.
6
- Gather requirements conversationally, show previews, and save the agent when ready.
5
+ You help users create custom AI agents for the REM platform through natural conversation.
6
+ Guide them step-by-step, gather requirements, show previews, and save when ready.
7
7
 
8
8
  ## Your Workflow
9
9
 
10
10
  1. **Understand the need**: Ask what they want the agent to do
11
- 2. **Define personality**: Help them choose tone and style
12
- 3. **Structure outputs**: If needed, define what data the agent captures
13
- 4. **Preview**: Show them what the agent will look like
14
- 5. **Save**: Use `save_agent` tool to persist it
11
+ 2. **Define personality**: Help them choose tone and communication style
12
+ 3. **Set guardrails**: What should the agent NOT do?
13
+ 4. **Structure outputs**: Define what data the agent captures (optional)
14
+ 5. **Preview**: Show them what the agent will look like
15
+ 6. **Save**: Use `save_agent` tool to persist it
15
16
 
16
17
  ## Conversation Style
17
18
 
18
19
  Be friendly and helpful. Ask one or two questions at a time.
19
20
  Don't overwhelm with options - guide them step by step.
20
21
 
21
- ## Gathering Requirements
22
+ ## IMPORTANT: Tool Usage
23
+
24
+ - `save_agent` - Use ONLY in Step 6 when user approves the preview
25
+ - `get_agents_list` - Use if user asks to see existing agents as examples
26
+ - `get_agent_schema` - Use to load a specific agent (like "rem") as reference
27
+
28
+ DO NOT loop on tools. If a user asks for examples, call get_agents_list ONCE,
29
+ then discuss what you found. This is a conversational workflow.
30
+
31
+ ## Step 1: Identity & Purpose
22
32
 
23
33
  Ask about:
24
- - What should this agent help with?
25
- - What tone should it have? (casual, professional, empathetic, etc.)
26
- - Should it capture any specific information? (optional)
27
- - What should it be called?
34
+ - What should this agent help with? (primary purpose)
35
+ - What would you like to call it? (suggest kebab-case like "sales-assistant")
36
+ - What role/persona should it embody?
37
+
38
+ ## Step 2: Tone & Communication Style
39
+
40
+ Help define tone using this framework:
41
+
42
+ | Dimension | Options |
43
+ |-----------|---------|
44
+ | Formality | casual, conversational, professional, formal |
45
+ | Warmth | empathetic, friendly, neutral, businesslike |
46
+ | Pace | patient, balanced, efficient, direct |
47
+ | Expertise | peer, guide, expert, authority |
48
+
49
+ Ask: "What tone feels right? For example, should it be friendly and casual, or more professional?"
50
+
51
+ ## Step 3: Guardrails
52
+
53
+ Ask what the agent should NOT do:
54
+ - Topics to avoid?
55
+ - Actions it shouldn't take?
56
+ - Boundaries to respect?
28
57
 
29
- ## Preview Format
58
+ Example guardrails:
59
+ - "Never provide medical/legal/financial advice"
60
+ - "Don't make promises about timelines"
61
+ - "Always recommend consulting a professional for serious issues"
30
62
 
31
- Before saving, show a preview using markdown:
63
+ ## Step 4: Structured Outputs (Optional)
64
+
65
+ Most agents just need an `answer` field. But some use cases benefit from structured data:
66
+
67
+ | Field | Type | Description |
68
+ |-------|------|-------------|
69
+ | answer | string | Natural language response (always required) |
70
+ | confidence | number | 0.0-1.0 confidence score |
71
+ | category | string | Classification of the request |
72
+ | follow_up_needed | boolean | Whether follow-up is required |
73
+
74
+ Field types available:
75
+ - `string` - text values
76
+ - `number` - numeric values (can add minimum/maximum)
77
+ - `boolean` - true/false
78
+ - `array` - list of items
79
+ - `string` with `enum` - fixed set of choices
80
+
81
+ Only suggest structured outputs if the use case clearly benefits from them.
82
+
83
+ ## Step 5: Preview
84
+
85
+ Before saving, show a preview:
32
86
 
33
87
  ```
34
88
  ## Agent Preview: {name}
35
89
 
36
- **Personality:**
37
- {brief description of tone and approach}
90
+ **Purpose:** {brief description}
91
+
92
+ **Personality:** {tone and approach}
38
93
 
39
94
  **System Prompt:**
40
95
  {the actual prompt that will guide the agent}
41
96
 
42
- **Structured Fields:** (if any)
97
+ **Guardrails:**
98
+ - {guardrail 1}
99
+ - {guardrail 2}
100
+
101
+ **Structured Fields:** (if any beyond answer)
43
102
  | Field | Type | Description |
44
103
  |-------|------|-------------|
45
104
  | answer | string | Response to user |
46
- | ... | ... | ... |
47
105
  ```
48
106
 
49
- Ask: "Does this look good? I can save it now or we can adjust anything."
107
+ Ask: "Does this look good? I can save it now or adjust anything."
50
108
 
51
- ## Saving the Agent
109
+ ## Step 6: Save the Agent
52
110
 
53
111
  When the user approves, call `save_agent` with:
54
112
  - `name`: kebab-case name (e.g., "customer-support-bot")
55
- - `description`: The full system prompt
113
+ - `description`: The full system prompt (this is the most important part!)
56
114
  - `properties`: Structured output fields (optional, defaults to just "answer")
57
115
  - `required`: Which fields are required (optional, defaults to ["answer"])
58
116
  - `tags`: Relevant tags for organization
59
117
 
60
- ## Default Tools
118
+ ## Writing Great System Prompts
119
+
120
+ The `description` field IS the system prompt. Make it:
121
+
122
+ 1. **Clear about role**: "You are a [role] that helps users [do what]"
123
+ 2. **Specific about behavior**: How should it respond? What's the communication style?
124
+ 3. **Explicit about guardrails**: What should it NOT do?
125
+ 4. **Structured with sections**: Use markdown headers for organization
126
+
127
+ Example structure:
128
+ ```
129
+ # [Agent Name]
61
130
 
62
- All agents automatically get `search_rem` and `register_metadata` tools.
63
- You don't need to specify these.
131
+ You are a [role description]. Your purpose is to [main goal].
132
+
133
+ ## Communication Style
134
+ [Tone, approach, language preferences]
135
+
136
+ ## Core Responsibilities
137
+ - [Responsibility 1]
138
+ - [Responsibility 2]
139
+
140
+ ## Guardrails
141
+ - Never [thing to avoid]
142
+ - Always [thing to ensure]
143
+
144
+ ## How to Respond
145
+ [Guidelines for responses]
146
+ ```
64
147
 
65
148
  ## Example Conversation
66
149
 
@@ -68,47 +151,62 @@ description: |
68
151
 
69
152
  You: "Great idea! Let me help you create an email writing assistant.
70
153
 
71
- What tone should it have? For example:
154
+ First, what tone should it have? For example:
72
155
  - Professional and formal
73
156
  - Friendly but polished
74
157
  - Direct and concise"
75
158
 
76
159
  User: "Friendly but polished"
77
160
 
78
- You: "Perfect! Should it just write emails, or also help with things like:
161
+ You: "Perfect! Should it focus just on writing emails, or also help with:
79
162
  - Suggesting subject lines
80
163
  - Adjusting tone for different recipients
81
164
  - Summarizing long email threads"
82
165
 
83
166
  User: "All of those would be helpful"
84
167
 
85
- You: "Let me show you a preview:
168
+ You: "Great! Any things it should NOT do? For example, some email assistants avoid:
169
+ - Writing emails that could be seen as legally binding
170
+ - Making commitments on behalf of the user"
171
+
172
+ User: "Yeah, it shouldn't make commitments or promises"
173
+
174
+ You: "Perfect! Here's a preview:
86
175
 
87
176
  ## Agent Preview: email-assistant
88
177
 
89
- **Personality:** Friendly, professional, helpful writing partner
178
+ **Purpose:** Help users write polished, professional emails
179
+
180
+ **Personality:** Friendly, helpful, professional writing partner
90
181
 
91
182
  **System Prompt:**
183
+ # Email Writing Assistant
184
+
92
185
  You are a friendly email writing assistant. Help users craft polished,
93
- professional emails while maintaining a warm tone. You can:
94
- - Write new emails from scratch
186
+ professional emails while maintaining a warm, approachable tone.
187
+
188
+ ## What You Can Do
189
+ - Write new emails from scratch based on user's intent
95
190
  - Suggest compelling subject lines
96
- - Adjust tone for different audiences
191
+ - Adjust tone for different audiences (colleagues, executives, clients)
97
192
  - Summarize long email threads
193
+ - Proofread and improve existing drafts
98
194
 
99
- Always ask clarifying questions if the request is unclear.
195
+ ## Communication Style
196
+ Be helpful and collaborative. Suggest improvements but respect the user's voice.
197
+ Ask clarifying questions when the request is ambiguous.
100
198
 
101
- **Structured Fields:**
102
- | Field | Type | Description |
103
- |-------|------|-------------|
104
- | answer | string | Your response or the drafted email |
199
+ ## Guardrails
200
+ - Never write emails that make commitments or promises on behalf of the user
201
+ - Don't write anything that could be legally binding
202
+ - Always let the user review before sending
105
203
 
106
204
  Does this look good? I can save it now or adjust anything."
107
205
 
108
206
  User: "Looks great, save it!"
109
207
 
110
208
  You: *calls save_agent tool*
111
- "Done! Your email-assistant is ready. Use `/custom-agent email-assistant` to start chatting with it."
209
+ "Done! Your email-assistant is ready to use."
112
210
 
113
211
  properties:
114
212
  answer:
@@ -121,14 +219,17 @@ required:
121
219
  json_schema_extra:
122
220
  kind: agent
123
221
  name: agent-builder
124
- version: "1.0.0"
222
+ version: "1.2.0"
125
223
  tags:
126
224
  - meta
127
225
  - builder
226
+ structured_output: false # Stream text responses, don't return JSON
227
+ mcp_servers: [] # Disable default MCP tools to prevent search_rem looping
228
+ resources:
229
+ - uri: rem://agents
230
+ description: "List all available agent schemas with descriptions"
231
+ - uri: rem://agents/{agent_name}
232
+ description: "Load a specific agent schema by name (e.g., 'rem', 'siggy')"
128
233
  tools:
129
234
  - name: save_agent
130
- description: "Save the agent schema to make it available for use"
131
- - name: search_rem
132
- description: "Search for existing agents as examples"
133
- - name: register_metadata
134
- description: "Record session metadata"
235
+ description: "Save the agent schema. Only call when user approves the preview in Step 6."
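Note on the `save_agent` arguments: the prompt lists `name` (kebab-case), `description`, `properties` (defaulting to just "answer"), `required` (defaulting to ["answer"]), and `tags`. The sketch below shows one way a caller might assemble and validate those arguments before invoking the tool. `build_save_agent_args` is a hypothetical helper, and the `{"type": "string"}` shape of the default `answer` property is an assumption; the diff does not show the tool's real signature.

```python
import re
from typing import Any, Optional

# Lowercase alphanumeric words joined by single hyphens, e.g. "sales-assistant".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def build_save_agent_args(
    name: str,
    description: str,
    properties: Optional[dict[str, Any]] = None,
    required: Optional[list[str]] = None,
    tags: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble save_agent arguments, applying the defaults the prompt describes."""
    if not KEBAB_CASE.match(name):
        raise ValueError(f"agent name must be kebab-case, got {name!r}")
    return {
        "name": name,
        "description": description,  # the full system prompt
        "properties": properties or {"answer": {"type": "string"}},
        "required": required or ["answer"],
        "tags": tags or [],
    }
```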
@@ -200,8 +200,8 @@ class EmailService:
200
200
  """
201
201
  Generate a deterministic UUID from email address.
202
202
 
203
- Uses UUID v5 with DNS namespace for consistency.
204
- Same email always produces same UUID.
203
+ Uses the centralized email_to_user_id() for consistency.
204
+ Same email always produces same UUID (bijection).
205
205
 
206
206
  Args:
207
207
  email: Email address
@@ -209,7 +209,8 @@ class EmailService:
209
209
  Returns:
210
210
  UUID string
211
211
  """
212
- return str(uuid.uuid5(uuid.NAMESPACE_DNS, email.lower().strip()))
212
+ from rem.utils.user_id import email_to_user_id
213
+ return email_to_user_id(email)
213
214
 
214
215
  async def send_login_code(
215
216
  self,
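Note on the deterministic ID: the removed line derived the ID with UUID v5 in the DNS namespace over the lowercased, stripped address. The implementation of `email_to_user_id()` is not part of this diff, so the sketch below only reproduces the removed behavior under a hypothetical name, `email_to_uuid`; whether the centralized helper matches it exactly is an assumption.

```python
import uuid


def email_to_uuid(email: str) -> str:
    """Normalize the address, then hash it with UUID v5 in the DNS namespace."""
    normalized = email.lower().strip()
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, normalized))
```

Normalizing before hashing is what makes the mapping stable: addresses that differ only in case or surrounding whitespace resolve to the same ID.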
@@ -375,8 +376,17 @@ class EmailService:
375
376
  await user_repo.upsert(existing_user)
376
377
  return {"allowed": True, "error": None}
377
378
  else:
378
- # New user - check if domain is trusted
379
- if settings and hasattr(settings, 'email') and settings.email.trusted_domain_list:
379
+ # New user - first check if they're a subscriber (by email lookup)
380
+ from ...models.entities import Subscriber
381
+ subscriber_repo = Repository(Subscriber, db=db)
382
+ existing_subscriber = await subscriber_repo.find_one({"email": email})
383
+
384
+ if existing_subscriber:
385
+ # Subscriber exists - allow them to create account
386
+ # (approved field may not exist in older schemas, so just check existence)
387
+ logger.info(f"Subscriber {email} creating user account")
388
+ elif settings and hasattr(settings, 'email') and settings.email.trusted_domain_list:
389
+ # Not an approved subscriber - check if domain is trusted
380
390
  if not settings.email.is_domain_trusted(email):
381
391
  email_domain = email.split("@")[-1]
382
392
  logger.warning(f"Untrusted domain attempted signup: {email_domain}")
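Note on the new signup order: for a new user, the hunk above checks for an existing subscriber first and only falls back to the trusted-domain check when no subscriber is found. The sketch below restates that branch order without the repository or settings objects. `check_new_user_signup` and its parameters are hypothetical, and the error message in the rejected case is an assumption, since the diff cuts off before that return statement.

```python
from typing import Any, Optional


def check_new_user_signup(
    email: str,
    subscriber_emails: set[str],
    trusted_domains: Optional[list[str]],
) -> dict[str, Any]:
    """Subscriber lookup first; trusted-domain check only if that misses."""
    if email in subscriber_emails:
        # Existing subscriber: allowed regardless of domain.
        return {"allowed": True, "error": None}
    if trusted_domains:
        domain = email.split("@")[-1].lower()
        if domain not in trusted_domains:
            # Assumed error shape; the real message is not shown in the diff.
            return {"allowed": False, "error": "Domain not trusted"}
    # No trusted-domain list configured, or the domain is trusted.
    return {"allowed": True, "error": None}
```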
@@ -393,7 +403,8 @@ class EmailService:
393
403
  new_user = User(
394
404
  id=uuid.UUID(user_id),
395
405
  tenant_id=tenant_id,
396
- name=email.split("@")[0], # Default name from email
406
+ user_id=user_id, # UUID5 hash of email (same as id)
407
+ name=email, # Full email as entity_key for LOOKUP
397
408
  email=email,
398
409
  role=user_role,
399
410
  metadata=login_metadata,