@gonzih/meet-the-one-ai 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (175)
  1. package/.env.example +41 -0
  2. package/.node-version +1 -0
  3. package/basis/BERNAYS.md +233 -0
  4. package/basis/FOUNDING_TRANSCRIPT.md +218 -0
  5. package/basis/TECH_SPEC.md +303 -0
  6. package/basis/VALS.md +255 -0
  7. package/basis/layers/L1_IDENTITY_AUTH.md +78 -0
  8. package/basis/layers/L2_CONVERSATION.md +159 -0
  9. package/basis/layers/L3_RECORDING_STORE.md +104 -0
  10. package/basis/layers/L4_ANALYSIS_PIPELINE.md +257 -0
  11. package/basis/layers/L5_MATCHING_ENGINE.md +164 -0
  12. package/basis/layers/L6_CONSENT_INTRODUCTION.md +143 -0
  13. package/basis/layers/L7_PORTABLE_IDENTITY.md +139 -0
  14. package/basis/layers/STACK.md +64 -0
  15. package/basis/schema.sql +203 -0
  16. package/dist/agent.d.ts +2 -0
  17. package/dist/agent.d.ts.map +1 -0
  18. package/dist/agent.js +114 -0
  19. package/dist/agent.js.map +1 -0
  20. package/dist/api/routes/auth.d.ts +2 -0
  21. package/dist/api/routes/auth.d.ts.map +1 -0
  22. package/dist/api/routes/auth.js +79 -0
  23. package/dist/api/routes/auth.js.map +1 -0
  24. package/dist/api/routes/identity.d.ts +2 -0
  25. package/dist/api/routes/identity.d.ts.map +1 -0
  26. package/dist/api/routes/identity.js +92 -0
  27. package/dist/api/routes/identity.js.map +1 -0
  28. package/dist/api/routes/text-submission.d.ts +2 -0
  29. package/dist/api/routes/text-submission.d.ts.map +1 -0
  30. package/dist/api/routes/text-submission.js +56 -0
  31. package/dist/api/routes/text-submission.js.map +1 -0
  32. package/dist/api/webhooks/twilio.d.ts +2 -0
  33. package/dist/api/webhooks/twilio.d.ts.map +1 -0
  34. package/dist/api/webhooks/twilio.js +144 -0
  35. package/dist/api/webhooks/twilio.js.map +1 -0
  36. package/dist/api/webhooks/vapi.d.ts +2 -0
  37. package/dist/api/webhooks/vapi.d.ts.map +1 -0
  38. package/dist/api/webhooks/vapi.js +177 -0
  39. package/dist/api/webhooks/vapi.js.map +1 -0
  40. package/dist/bot.d.ts +3 -0
  41. package/dist/bot.d.ts.map +1 -0
  42. package/dist/bot.js +39 -0
  43. package/dist/bot.js.map +1 -0
  44. package/dist/index.d.ts +2 -0
  45. package/dist/index.d.ts.map +1 -0
  46. package/dist/index.js +9 -0
  47. package/dist/index.js.map +1 -0
  48. package/dist/jobs/compact-identity.d.ts +2 -0
  49. package/dist/jobs/compact-identity.d.ts.map +1 -0
  50. package/dist/jobs/compact-identity.js +159 -0
  51. package/dist/jobs/compact-identity.js.map +1 -0
  52. package/dist/jobs/consent-call.d.ts +2 -0
  53. package/dist/jobs/consent-call.d.ts.map +1 -0
  54. package/dist/jobs/consent-call.js +70 -0
  55. package/dist/jobs/consent-call.js.map +1 -0
  56. package/dist/jobs/export-identity.d.ts +2 -0
  57. package/dist/jobs/export-identity.d.ts.map +1 -0
  58. package/dist/jobs/export-identity.js +129 -0
  59. package/dist/jobs/export-identity.js.map +1 -0
  60. package/dist/jobs/introduction-call.d.ts +2 -0
  61. package/dist/jobs/introduction-call.d.ts.map +1 -0
  62. package/dist/jobs/introduction-call.js +86 -0
  63. package/dist/jobs/introduction-call.js.map +1 -0
  64. package/dist/jobs/reanalyze-identity.d.ts +2 -0
  65. package/dist/jobs/reanalyze-identity.d.ts.map +1 -0
  66. package/dist/jobs/reanalyze-identity.js +56 -0
  67. package/dist/jobs/reanalyze-identity.js.map +1 -0
  68. package/dist/jobs/run-matching.d.ts +2 -0
  69. package/dist/jobs/run-matching.d.ts.map +1 -0
  70. package/dist/jobs/run-matching.js +200 -0
  71. package/dist/jobs/run-matching.js.map +1 -0
  72. package/dist/jobs/scheduled-matching.d.ts +2 -0
  73. package/dist/jobs/scheduled-matching.d.ts.map +1 -0
  74. package/dist/jobs/scheduled-matching.js +44 -0
  75. package/dist/jobs/scheduled-matching.js.map +1 -0
  76. package/dist/jobs/transcribe-session.d.ts +2 -0
  77. package/dist/jobs/transcribe-session.d.ts.map +1 -0
  78. package/dist/jobs/transcribe-session.js +66 -0
  79. package/dist/jobs/transcribe-session.js.map +1 -0
  80. package/dist/lib/anthropic.d.ts +4 -0
  81. package/dist/lib/anthropic.d.ts.map +1 -0
  82. package/dist/lib/anthropic.js +32 -0
  83. package/dist/lib/anthropic.js.map +1 -0
  84. package/dist/lib/config.d.ts +57 -0
  85. package/dist/lib/config.d.ts.map +1 -0
  86. package/dist/lib/config.js +73 -0
  87. package/dist/lib/config.js.map +1 -0
  88. package/dist/lib/deepgram.d.ts +15 -0
  89. package/dist/lib/deepgram.d.ts.map +1 -0
  90. package/dist/lib/deepgram.js +37 -0
  91. package/dist/lib/deepgram.js.map +1 -0
  92. package/dist/lib/inngest.d.ts +42 -0
  93. package/dist/lib/inngest.d.ts.map +1 -0
  94. package/dist/lib/inngest.js +7 -0
  95. package/dist/lib/inngest.js.map +1 -0
  96. package/dist/lib/openai.d.ts +3 -0
  97. package/dist/lib/openai.d.ts.map +1 -0
  98. package/dist/lib/openai.js +13 -0
  99. package/dist/lib/openai.js.map +1 -0
  100. package/dist/lib/prompts.d.ts +8 -0
  101. package/dist/lib/prompts.d.ts.map +1 -0
  102. package/dist/lib/prompts.js +258 -0
  103. package/dist/lib/prompts.js.map +1 -0
  104. package/dist/lib/r2.d.ts +7 -0
  105. package/dist/lib/r2.d.ts.map +1 -0
  106. package/dist/lib/r2.js +49 -0
  107. package/dist/lib/r2.js.map +1 -0
  108. package/dist/lib/session-helpers.d.ts +8 -0
  109. package/dist/lib/session-helpers.d.ts.map +1 -0
  110. package/dist/lib/session-helpers.js +31 -0
  111. package/dist/lib/session-helpers.js.map +1 -0
  112. package/dist/lib/supabase.d.ts +2 -0
  113. package/dist/lib/supabase.d.ts.map +1 -0
  114. package/dist/lib/supabase.js +11 -0
  115. package/dist/lib/supabase.js.map +1 -0
  116. package/dist/lib/twilio.d.ts +7 -0
  117. package/dist/lib/twilio.d.ts.map +1 -0
  118. package/dist/lib/twilio.js +34 -0
  119. package/dist/lib/twilio.js.map +1 -0
  120. package/dist/lib/vapi.d.ts +4 -0
  121. package/dist/lib/vapi.d.ts.map +1 -0
  122. package/dist/lib/vapi.js +59 -0
  123. package/dist/lib/vapi.js.map +1 -0
  124. package/dist/mcp-server.d.ts +3 -0
  125. package/dist/mcp-server.d.ts.map +1 -0
  126. package/dist/mcp-server.js +177 -0
  127. package/dist/mcp-server.js.map +1 -0
  128. package/dist/types/index.d.ts +104 -0
  129. package/dist/types/index.d.ts.map +1 -0
  130. package/dist/types/index.js +3 -0
  131. package/dist/types/index.js.map +1 -0
  132. package/package.json +28 -0
  133. package/railway.json +14 -0
  134. package/src/agent.ts +123 -0
  135. package/src/api/routes/auth.ts +95 -0
  136. package/src/api/routes/identity.ts +112 -0
  137. package/src/api/routes/text-submission.ts +64 -0
  138. package/src/api/webhooks/twilio.ts +181 -0
  139. package/src/api/webhooks/vapi.ts +219 -0
  140. package/src/bot.ts +44 -0
  141. package/src/index.ts +11 -0
  142. package/src/jobs/compact-identity.ts +211 -0
  143. package/src/jobs/consent-call.ts +87 -0
  144. package/src/jobs/export-identity.ts +166 -0
  145. package/src/jobs/introduction-call.ts +101 -0
  146. package/src/jobs/reanalyze-identity.ts +65 -0
  147. package/src/jobs/run-matching.ts +243 -0
  148. package/src/jobs/scheduled-matching.ts +59 -0
  149. package/src/jobs/transcribe-session.ts +77 -0
  150. package/src/lib/anthropic.ts +37 -0
  151. package/src/lib/config.ts +81 -0
  152. package/src/lib/deepgram.ts +57 -0
  153. package/src/lib/inngest.ts +33 -0
  154. package/src/lib/openai.ts +14 -0
  155. package/src/lib/prompts.ts +266 -0
  156. package/src/lib/r2.ts +79 -0
  157. package/src/lib/session-helpers.ts +37 -0
  158. package/src/lib/supabase.ts +15 -0
  159. package/src/lib/twilio.ts +49 -0
  160. package/src/lib/vapi.ts +80 -0
  161. package/src/mcp-server.ts +195 -0
  162. package/src/types/index.ts +146 -0
  163. package/supabase/.branches/_current_branch +1 -0
  164. package/supabase/.temp/cli-latest +1 -0
  165. package/supabase/.temp/gotrue-version +1 -0
  166. package/supabase/.temp/pooler-url +1 -0
  167. package/supabase/.temp/postgres-version +1 -0
  168. package/supabase/.temp/project-ref +1 -0
  169. package/supabase/.temp/rest-version +1 -0
  170. package/supabase/.temp/storage-migration +1 -0
  171. package/supabase/.temp/storage-version +1 -0
  172. package/supabase/config.toml +384 -0
  173. package/supabase/migrations/20260303000000_initial_schema.sql +203 -0
  174. package/supabase/migrations/20260304000000_brand_consents.sql +13 -0
  175. package/tsconfig.json +25 -0
@@ -0,0 +1,159 @@
# Layer 2 — Conversation (Intake LLM)

**Goal:** User gets a call from an AI within minutes of verifying. Conversation feels warm, curious, unhurried. Not a form. Not a questionnaire. A real conversation that reaches past the social mask.

---

## Calling Infrastructure

**Twilio Voice + Twilio Media Streams.**

Twilio handles the PSTN (real phone network) — calls actual phones globally. Media Streams pipes bidirectional audio in real time over WebSocket.

Why Twilio over alternatives (Vonage, Plivo, Bandwidth):
- Best real-time audio streaming support for AI integration
- Most mature webhook architecture
- Global PSTN coverage including developing markets
- Handles both inbound (user calls system) and outbound (system calls user) on same infrastructure
- One vendor, one integration, one billing relationship

---

## Voice AI Stack

**Vapi.ai for v0.1. Twilio + OpenAI Realtime API for v1.**

### v0.1 — Vapi.ai
Purpose-built voice AI platform. Abstracts Twilio + STT + LLM + TTS into one product with a single API.

What it handles:
- Outbound and inbound calls
- Real-time STT → LLM → TTS pipeline
- Interruption handling (user cuts AI off mid-sentence — works naturally)
- Call recording
- Conversation history across sessions
- System prompt configuration

Why start here: ship in days, not weeks. One integration. Good-enough latency (~600ms). Full control later.

Cost: ~$0.10–0.15/minute of conversation. 30-minute intake ≈ $3–4.50. Acceptable at early scale.

### v1 — Twilio + OpenAI Realtime API
Direct integration when we need full control, lower latency, or Vapi pricing becomes a factor.

```
User phone ←→ Twilio Media Streams (WebSocket) ←→ OpenAI Realtime API ←→ LLM
```

OpenAI Realtime API handles STT + LLM + TTS in one bidirectional WebSocket. Single round-trip. Latency under 500ms. Voice interruptions handled natively.

---

## Conversation LLM

**Claude Sonnet (`claude-sonnet-4-6`, Anthropic)** as the underlying model.

Why Claude over GPT-4o for conversation:
- Superior at nuanced, sensitive psychological content (desire, fear, sexuality, trauma) without triggering or becoming clinical
- Better at holding a persona across long context
- Longer context window — carries full multi-session history
- Less likely to deflect on edgy topics that are directly relevant here (kink, open relationships, sexual desire)

System prompt contains:
- Conversation persona (warm, curious, genuinely interested, non-leading)
- Projective prompt library (rotated, not scripted — AI picks the right moment)
- Behavioral signal extraction instructions (indirect, never asked as a list)
- World-view bifurcation questions (embedded naturally into conversation flow)
- Hard rule: never ask directly "what kind of relationship are you looking for?"
- Hard rule: never ask about modality preferences directly
- Instruction to note time-of-day and flag late-night sessions

Model sees: full conversation history for THIS user across all sessions. Nothing from any other user. Ever.

---

## Isolation Principle

This LLM has zero access to:
- Other users' profiles
- The matching engine
- Compacted identities
- Modality weights

It knows one person. It talks to one person. The conversation is a safe container. Users can say anything, and what they say stays in their lane only.

---

## What the Conversation Captures

### Projective elicitation (reach the unconscious)
Not asked as a list. Woven into conversation naturally when the moment is right:
- "Close your eyes for a second. Imagine a big white empty space. What's the first thing that comes to mind?"
- "Imagine a big empty black space. What do you feel?"
- "Tell me about a moment you felt completely free."
- "What are you most afraid of in another person?"
- "What would your perfect Saturday look like — be specific."
- Symbols that emerge: scary or joyful? Open or constrictive?

### Indirect behavioral signals
Never asked directly. Come up in natural conversation:
- Travel history, passport (open vs closed worldview signal)
- Communication style (do they text in exclamation points? ellipses? full sentences?)
- Living situation, pets, routines
- Energy level — depleted vs running hot
- Yes-person or no-person

### World-view bifurcations (the big sorts — asked directly, once, naturally)
- "Do you think the world is basically a dangerous place or an adventurous one?"
- "When you meet someone new, is your default to trust them or to wait and see?"

These are the non-negotiable compatibility splits. Asked once. Answer weighted heavily.

### Natural conversation signals
Everything else the person brings up unprompted. Stories about exes. What they're proud of. What they regret. What they want. What they're afraid they want but won't admit.

The AI doesn't steer. It listens and follows the thread.

---

## The 4AM Principle

System actively encourages off-hours contact. Specifically: late night, early morning, moments of vulnerability.

SMS sent after first call: "You can call anytime — even at 3 or 4 in the morning. That's actually when the most interesting conversations happen."

Late-night sessions tagged in metadata. Analysis pipeline weights those sessions more heavily — masks are down, defenses are off, signal is richer.
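
The spec names the `time_of_day_bucket` values but not their hour ranges. A minimal sketch of the tagging step, assuming hypothetical boundaries (late night = 00:00 to 04:59 on the caller's own clock) and that the caller's IANA time zone is known:

```typescript
type TimeOfDayBucket = "morning" | "afternoon" | "evening" | "night" | "late_night";

// Hypothetical hour ranges: the spec names the buckets but not their boundaries.
function timeOfDayBucket(hour: number): TimeOfDayBucket {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`hour must be an integer from 0 to 23, got ${hour}`);
  }
  if (hour < 5) return "late_night"; // 00:00-04:59
  if (hour < 12) return "morning";
  if (hour < 17) return "afternoon";
  if (hour < 21) return "evening";
  return "night"; // 21:00-23:59
}

// Hour of `date` on the caller's wall clock, using the runtime's built-in time zone data.
function localHourIn(date: Date, timeZone: string): number {
  const hour = date.toLocaleString("en-US", { timeZone, hour: "numeric", hour12: false });
  return Number(hour) % 24; // some runtimes render midnight as "24"
}

// Produces the two metadata columns used by the sessions table.
function tagSession(startedAt: Date, timeZone: string) {
  const bucket = timeOfDayBucket(localHourIn(startedAt, timeZone));
  return { time_of_day_bucket: bucket, late_night_flag: bucket === "late_night" };
}
```

Bucketing on the caller's clock rather than server UTC is what makes the 4AM principle hold: a call at 04:17 UTC from New York is 23:17 local time, so it tags as `night`, not `late_night`.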

---

## Inbound Calls

User can call the system number at any time. Twilio routes inbound calls to the same WebSocket pipeline. System picks up immediately — no hold music, no IVR menu, no "press 1 for..."

System recognizes the phone number, loads conversation history, continues where they left off.

"Hey, you're back. Last time you were telling me about..."
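
A sketch of the lookup behind this, assuming caller ID arrives as an E.164 string and using hypothetical in-memory `User` and `Session` shapes (the `summary` field is invented for the greeting). The `userId` filter is the isolation rule from the Conversation LLM section, expressed in code.

```typescript
// Hypothetical shapes; the production data lives in Postgres (Supabase).
interface User {
  userId: string;
  phoneE164: string;
}

interface Session {
  userId: string;
  sessionNumber: number;
  summary: string; // hypothetical one-line recap of the session
}

interface InboundContext {
  userId: string;
  history: Session[]; // this caller's sessions only, oldest first
  nextSessionNumber: number;
}

function loadInboundContext(callerE164: string, users: User[], sessions: Session[]): InboundContext | null {
  const user = users.find((u) => u.phoneE164 === callerE164);
  if (!user) return null; // unknown number: not covered by this sketch
  const userId = user.userId;
  const history = sessions
    .filter((s) => s.userId === userId) // isolation: never load another user's sessions
    .sort((a, b) => a.sessionNumber - b.sessionNumber);
  const last = history[history.length - 1];
  return { userId, history, nextSessionNumber: last ? last.sessionNumber + 1 : 1 };
}

// Resume where the caller left off, using the most recent session's recap.
function openingLine(ctx: InboundContext): string {
  const last = ctx.history[ctx.history.length - 1];
  return last
    ? `Hey, you're back. Last time you were telling me about ${last.summary}.`
    : "Hey, good to hear from you.";
}
```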

---

## Async Text Input

Between calls, users can submit text:
- Written reflections
- Dream journals
- Things they forgot to say
- Books / podcasts they want to flag as signal

Simple web form (same URL, phone-verified). Text submitted → stored as session with `source: text` → processed in next analysis run.

System may reference it in next call: "You mentioned you'd been reading [X] — what drew you to that?"
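
A sketch of how a submission could become a `sessions` row. Column names follow the Layer 3 `sessions` table (other columns omitted); the `raw_text` field, zero duration, and `complete` transcription status are assumptions, since the spec doesn't say how text rows fill the audio-era columns.

```typescript
interface TextSessionRow {
  session_id: string;
  user_id: string;
  started_at: string;
  ended_at: string;
  duration_seconds: number;
  source: "voice" | "text" | "whatsapp";
  session_number: number;
  audio_r2_key: string | null;
  transcription_status: "pending" | "processing" | "complete" | "failed";
  analysis_status: "pending" | "processing" | "complete";
  raw_text: string; // hypothetical column for the submitted text
}

function textSubmissionToSession(
  userId: string,
  sessionId: string, // caller-generated UUID
  priorSessionCount: number,
  submittedAt: Date,
  text: string,
): TextSessionRow {
  const body = text.trim();
  if (body.length === 0) throw new Error("empty submission");
  const iso = submittedAt.toISOString();
  return {
    session_id: sessionId,
    user_id: userId,
    started_at: iso,
    ended_at: iso,
    duration_seconds: 0,
    source: "text",
    session_number: priorSessionCount + 1, // sessions are numbered per user
    audio_r2_key: null, // text submissions have no audio file
    transcription_status: "complete", // nothing to transcribe
    analysis_status: "pending", // picked up by the next analysis run
    raw_text: body,
  };
}
```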

---

## Session Continuity

Every call is a session. Sessions are numbered. History is cumulative.

The AI never makes the user re-explain themselves. It builds. It remembers. The longer someone engages, the more it knows them — and the better the eventual match.

This is a relationship, not a form.
@@ -0,0 +1,104 @@
# Layer 3 — Recording Store

**Goal:** Every interaction captured at full fidelity. Immutable. Cheap to store forever. Re-analyzable as models improve. The recordings are the ground truth.

---

## What Gets Stored

Per session:
- **Raw audio** — full fidelity `.wav` or `.mp3`
- **Transcript** — auto-generated, word-level timestamps, speaker-separated
- **Metadata** — timestamp, duration, source, time-of-day bucket

For text submissions:
- Raw text content
- Submission timestamp
- No audio file

---

## Audio Storage

**Cloudflare R2** (S3-compatible, no egress fees, unlike AWS S3).

Structure:
```
/{user_id}/{session_id}/audio.mp3
/{user_id}/{session_id}/transcript.json
/{user_id}/{session_id}/metadata.json
```

Access: private only. No public URLs. Presigned URLs generated on demand for internal processing. Users never directly access their audio files (they get narrative summaries instead).

Retention: indefinite. This is the user's identity asset.

Cost math: 1 hour of audio ≈ 7MB as low-bitrate MP3 (~16 kbps). 10,000 users × 5 hours average = 50,000 hours ≈ 350GB ≈ $5/month at R2's $0.015/GB-month storage rate. Cost is negligible at any realistic scale.
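
The cost math written out, assuming decimal units, R2 Standard storage at $0.015 per GB-month, and ignoring the free tier. The bitrate helper shows where the ~7MB/hour figure comes from.

```typescript
// Assumptions: decimal units (1 GB = 1000 MB), R2 Standard storage at
// $0.015 per GB-month, free-tier allowance ignored.
const R2_USD_PER_GB_MONTH = 0.015;

// kilobits per second -> megabytes per hour (8 bits per byte, 1000 kB per MB)
function mp3MegabytesPerHour(bitrateKbps: number): number {
  return (bitrateKbps * 3600) / 8 / 1000;
}

function monthlyStorage(users: number, hoursPerUser: number, mbPerHour: number): { gb: number; usd: number } {
  const gb = (users * hoursPerUser * mbPerHour) / 1000;
  return { gb, usd: gb * R2_USD_PER_GB_MONTH };
}

console.log(mp3MegabytesPerHour(16).toFixed(1)); // "7.2": a ~16 kbps stream gives ~7MB/hour
const estimate = monthlyStorage(10000, 5, 7);
console.log(`${estimate.gb} GB, $${estimate.usd.toFixed(2)}/month`); // 350 GB, $5.25/month
```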

---

## Transcription

**Deepgram Nova-2** — runs async after each call ends.

Why Deepgram over Whisper (OpenAI):
- 10–20x faster on long audio
- Better accuracy on conversational speech (not just dictation)
- Speaker diarization built in — separates system voice from user voice cleanly
- Word-level timestamps — useful for future analysis refinement
- Cheaper at scale: $0.0043/minute (1-hour call = $0.26)
- Handles accents, informal speech, filler words better than Whisper

Transcript stored as JSON:
```json
{
  "duration": 1847,
  "words": [
    {"word": "yeah", "start": 0.24, "end": 0.48, "confidence": 0.99, "speaker": 1},
    ...
  ],
  "paragraphs": [...],
  "summary": "...",
  "topics": [...]
}
```

Speaker 0 = system AI. Speaker 1 = user. All downstream analysis processes speaker 1 only.
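
A minimal sketch of the speaker filter, assuming the transcript JSON above with `speaker: 1` marking the user. Consecutive user words are grouped into turns so separate answers stay separate; the grouping rule is an assumption, not part of the spec.

```typescript
interface TranscriptWord {
  word: string;
  start: number;
  end: number;
  confidence: number;
  speaker: number;
}

interface StoredTranscript {
  duration: number;
  words: TranscriptWord[];
}

const USER_SPEAKER = 1; // "Speaker 1 = user"

// Returns user speech only, one string per uninterrupted user turn.
function userTurns(transcript: StoredTranscript): string[] {
  const turns: string[] = [];
  let current: string[] = [];
  for (const w of transcript.words) {
    if (w.speaker === USER_SPEAKER) {
      current.push(w.word);
    } else if (current.length > 0) {
      turns.push(current.join(" ")); // the AI spoke: close the user's turn
      current = [];
    }
  }
  if (current.length > 0) turns.push(current.join(" "));
  return turns;
}
```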

---

## Database Record

PostgreSQL `sessions` table (Supabase):

| Column | Type | Notes |
|--------|------|-------|
| `session_id` | UUID | Primary key |
| `user_id` | UUID | Foreign key → users |
| `started_at` | timestamptz | Call start |
| `ended_at` | timestamptz | Call end |
| `duration_seconds` | integer | |
| `source` | enum | `voice` / `text` / `whatsapp` |
| `time_of_day_bucket` | enum | `morning` / `afternoon` / `evening` / `night` / `late_night` |
| `session_number` | integer | nth session for this user |
| `audio_r2_key` | text | Path in R2 |
| `transcript_r2_key` | text | Path in R2 |
| `transcription_status` | enum | `pending` / `processing` / `complete` / `failed` |
| `analysis_status` | enum | `pending` / `processing` / `complete` / `stale` (`stale` is set by re-analysis) |
| `late_night_flag` | boolean | True if `time_of_day_bucket` = `late_night` |

---

## Immutability Principle

Raw store is append-only. Analysis pipeline reads from it, never writes back to it. Raw recordings are never modified or deleted (unless the user explicitly requests data deletion under GDPR/CCPA).

Why this matters: analysis models will improve. When we upgrade from Claude Opus (`claude-opus-4-6`, the analysis model) to whatever is better in 6 months, we re-run analysis on all existing raw transcripts and get better identity profiles retroactively. The recordings are the permanent asset. The analysis is just the current best interpretation of them.
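
The append-only contract can be sketched as a store that refuses overwrites and exposes deletion only as a per-user erase. A `Map` stands in for R2 here; class and method names are hypothetical, and how R2 itself would enforce the rule is out of scope.

```typescript
type RawFile = "audio.mp3" | "transcript.json" | "metadata.json";

class AppendOnlyRawStore {
  private readonly objects = new Map<string, Uint8Array>();

  // Same key layout as the R2 structure under Audio Storage.
  static key(userId: string, sessionId: string, file: RawFile): string {
    return `/${userId}/${sessionId}/${file}`;
  }

  // Writes are allowed once per key; any second write is rejected.
  put(key: string, body: Uint8Array): void {
    if (this.objects.has(key)) {
      throw new Error(`${key} already exists: the raw store is append-only`);
    }
    this.objects.set(key, body);
  }

  get(key: string): Uint8Array | undefined {
    return this.objects.get(key);
  }

  // The only deletion path: an explicit user data-deletion request (GDPR/CCPA).
  deleteAllForUser(userId: string): number {
    const prefix = `/${userId}/`;
    let removed = 0;
    for (const key of Array.from(this.objects.keys())) {
      if (key.startsWith(prefix)) {
        this.objects.delete(key);
        removed++;
      }
    }
    return removed;
  }
}
```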

---

## Processing Trigger

After call ends → Vapi.ai (or Twilio) webhook fires → event pushed to job queue → transcription job queued → on completion, analysis job queued.

User is completely unaware of this pipeline. They hung up. The system does the rest.
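
The chain can be sketched as a small in-process queue. This is a stand-in for the real job system (Layer 4 uses Inngest); the event names and the `transcribe`/`analyze` callbacks are hypothetical.

```typescript
// Hypothetical event names; a plain array stands in for the real job queue.
type PipelineEvent =
  | { name: "call.ended"; sessionId: string }
  | { name: "transcription.completed"; sessionId: string };

interface SessionStatus {
  transcription_status: "pending" | "processing" | "complete" | "failed";
  analysis_status: "pending" | "processing" | "complete";
}

function runPipeline(
  first: PipelineEvent,
  sessions: Map<string, SessionStatus>,
  transcribe: (sessionId: string) => boolean, // stand-in for the Deepgram job
  analyze: (sessionId: string) => void, // stand-in for compact_identity
): string[] {
  const handled: string[] = [];
  const queue: PipelineEvent[] = [first];
  while (queue.length > 0) {
    const event = queue.shift() as PipelineEvent;
    handled.push(event.name);
    const s = sessions.get(event.sessionId);
    if (!s) throw new Error(`unknown session ${event.sessionId}`);
    if (event.name === "call.ended") {
      s.transcription_status = "processing";
      const ok = transcribe(event.sessionId);
      s.transcription_status = ok ? "complete" : "failed"; // a real queue would retry
      if (ok) queue.push({ name: "transcription.completed", sessionId: event.sessionId });
    } else {
      // Analysis is only ever queued after a completed transcription.
      s.analysis_status = "processing";
      analyze(event.sessionId);
      s.analysis_status = "complete";
    }
  }
  return handled;
}
```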
@@ -0,0 +1,257 @@
# Layer 4 — Analysis Pipeline (Identity Compaction)

**Goal:** Take raw transcripts → produce a compacted identity profile + relationship modality weights. Runs async, invisible to user. Re-runnable as models improve.

---

## Trigger

Every time a session's transcription completes, an analysis job is queued.

**Inngest** for job orchestration — serverless background functions, no Redis or worker infrastructure to operate. Handles retries, delays, fan-out cleanly.

Job: `compact_identity(user_id, session_id)`

Runs in background. User never waits for this. They hung up, it happens.

---

## Analysis LLM

**Claude Opus (`claude-opus-4-6`, Anthropic)** — most capable model for this step.

This is an offline batch process. Latency doesn't matter. Quality does. Opus over Sonnet here.

---

## Two-Pass Process Per Session

### Pass 1 — Session Signal Extraction

Input: transcript (user speech only — speaker 1 from Deepgram diarization)

Prompt instructs model to extract:

**Domain signals:**
- Relationships — attachment cues, trust language, conflict references, depth vs breadth indicators
- Desire — explicit and inferred. Kink signals, vanilla signals, shame vs sovereignty language, body relationship
- Money — scarcity vs abundance framing, risk language, ambition signals, what they said about work/success/security
- Health — energy descriptors, physicality references, self-care language, mortality framing

**World-view signals:**
- Responses to the two bifurcation questions (dangerous/adventurous, good/bad people)
- Symbolic responses from any projective prompts used in this session
- Fear vs joy valence of any imagery that emerged

**Behavioral signals:**
- Travel mentions, passport references
- Communication style descriptions
- Living situation, pets, routines mentioned
- Energy level as described or implied

**Modality signals:**
- Any explicit statements about relationship type preferences
- Inferred modality lean from language patterns, stories told, desires expressed
- Divergence between stated and implied (itself high signal)

**Framework signals (invisible to user):**
- VALS type indicators — which segment language patterns match
- Eight Games — which games appear active (Game 1 binary morality language, Game 8 mechanism language, etc.)

Output: `session_signals.json` — structured, per-domain, with direct transcript quotes as evidence for each signal.

---

### Pass 2 — Identity Merge

Input:
- Existing compacted identity (if any — null for session 1)
- New `session_signals.json`
- Session metadata (session number, time-of-day, duration)

Prompt instructs model to:
- Integrate new signals with existing identity
- Weight recent sessions slightly higher than older ones
- Flag contradictions or evolution (person said X in session 1, implies Y in session 4 — that's data)
- Update modality weights based on accumulated signals
- Note confidence level per domain (some domains may have sparse signal still)
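
Pass 2 leaves the merge to the LLM. For intuition only, "weight recent sessions slightly higher" applied to one numeric score could look like an exponentially decayed average; the sketch also folds in the late-night weighting from Layer 2. `DECAY` and `LATE_NIGHT_BOOST` are hypothetical values, not from the spec.

```typescript
// Illustrative constants: each older session counts 0.85x the next newer one,
// and late-night sessions count 1.5x.
const DECAY = 0.85;
const LATE_NIGHT_BOOST = 1.5;

interface SessionScore {
  sessionNumber: number;
  score: number; // 0..1, e.g. world_danger_adventure as read from one session
  lateNight: boolean;
}

// Weighted mean where weight = DECAY^(sessions since latest) x late-night boost.
function recencyWeightedScore(history: SessionScore[]): number | null {
  if (history.length === 0) return null; // no signal yet
  const latest = Math.max(...history.map((h) => h.sessionNumber));
  let weighted = 0;
  let totalWeight = 0;
  for (const h of history) {
    const w = Math.pow(DECAY, latest - h.sessionNumber) * (h.lateNight ? LATE_NIGHT_BOOST : 1);
    weighted += w * h.score;
    totalWeight += w;
  }
  return weighted / totalWeight;
}
```

With two sessions scoring 0.4 then 0.8, the result is about 0.62 rather than the plain mean of 0.6: the newer session pulls slightly harder, which is the intent of the prompt rule.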
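
**Outputs:** the base identity profile (Output A) and updated modality weights (Output B). Modality-specific profiles (Output C) come from a separate third pass.

---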

## Output A — Base Identity Profile

Stored as structured JSON. Narrative descriptions + underlying signal data.

```json
{
  "updated_at": "2026-03-03T04:17:00Z",
  "session_count": 4,
  "total_minutes": 87,

  "relationships": {
    "attachment_style": "anxious-secure spectrum, leans secure with time",
    "trust_pattern": "slow to open, deep once established",
    "conflict_mode": "avoidant initially, expressive when safe",
    "depth_vs_breadth": 0.8,
    "confidence": 0.75,
    "evidence": ["said 'I need to feel safe before I open up'", "mentioned 2 long relationships, both 4+ years"]
  },

  "desire": {
    "expressed_desires": ["emotional intimacy before physical", "eye contact", "slow burn"],
    "inferred_desires": ["dominance dynamic, soft", "being truly seen"],
    "kink_vanilla_spectrum": 0.35,
    "shame_sovereignty_score": 0.6,
    "confidence": 0.55,
    "evidence": [...]
  },

  "money": {
    "scarcity_abundance_orientation": 0.65,
    "risk_tolerance": 0.7,
    "ambition_contentment": 0.6,
    "financial_mythology_active": ["money = freedom", "security as prerequisite for love"],
    "confidence": 0.5,
    "evidence": [...]
  },

  "health": {
    "energy_pattern": "high baseline, crashes under stress",
    "physicality_orientation": 0.7,
    "self_care_mode": "reactive not proactive",
    "confidence": 0.4,
    "evidence": [...]
  },

  "worldview": {
    "world_danger_adventure": 0.72,
    "people_good_bad": 0.65,
    "vals_type": "Experiencer",
    "vals_confidence": 0.7,
    "games_active": ["Game 5 — meta-awareness emerging", "Game 1 residual — binary moral language occasionally"],
    "symbolic_responses": [
      {"prompt": "white empty space", "response": "ocean", "valence": "expansive/joyful"}
    ]
  }
}
```

---

## Output B — Modality Weights

Updated after every session. Weighted distribution across all relationship modalities.

```json
{
  "long-term": 0.65,
  "open-relationship": 0.20,
  "friends": 0.10,
  "casual": 0.05,
  "kink": 0.0,
  "polyamory": 0.0,
  "swinging": 0.0
}
```

**Dual-track logic:**
- Explicit statements about modality preference captured directly as signal
- Inferred modality from language, stories, desires, fears weighted alongside
- Stated ≠ true. Divergence between the two is its own signal, logged separately
- Weights evolve every session — identity is never frozen

**Threshold for modality engagement:** weight > 0.1 (strictly greater, so `friends` at 0.10 above does not qualify) triggers modality-specific rendering.
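
A sketch of the threshold check. Modality names come from the example weights; ordering heaviest-first is an assumption.

```typescript
type Modality =
  | "long-term"
  | "open-relationship"
  | "friends"
  | "casual"
  | "kink"
  | "polyamory"
  | "swinging";

type ModalityWeights = Record<Modality, number>;

const ENGAGEMENT_THRESHOLD = 0.1;

// Strictly greater than the threshold, heaviest first.
function activeModalities(weights: ModalityWeights): Modality[] {
  return (Object.keys(weights) as Modality[])
    .filter((m) => weights[m] > ENGAGEMENT_THRESHOLD)
    .sort((a, b) => weights[b] - weights[a]);
}

const example: ModalityWeights = {
  "long-term": 0.65,
  "open-relationship": 0.2,
  friends: 0.1,
  casual: 0.05,
  kink: 0.0,
  polyamory: 0.0,
  swinging: 0.0,
};
console.log(activeModalities(example)); // [ 'long-term', 'open-relationship' ]
```

With the example weights, `friends` sits exactly at 0.10 and is excluded, which matches the two modality files listed under Output C.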

---

## Output C — Modality-Specific Profiles

For each modality where weight > 0.1, a third LLM pass generates a narrative profile through that modality's lens.

Purpose: the matching LLM reads these, not the raw base identity. Each lens emphasizes different dimensions.

**Long-term lens** emphasizes: attachment, depth, values alignment, life vision, conflict resolution style

**Open-relationship lens** emphasizes: autonomy signals, boundary clarity, jealousy patterns, communication about needs

**Kink lens** emphasizes: power dynamic preferences, safety language, community orientation, shame vs sovereignty score

**Friends/adventure lens** emphasizes: energy level, activity orientation, spontaneity vs planning, worldview (adventurous score critical here)

**Casual lens** emphasizes: physical energy, low-attachment signals, communication style, time orientation

Output stored as markdown narrative per modality:
```
/identity/{user_id}/modality/long-term.md
/identity/{user_id}/modality/open-relationship.md
```

---

## Data Sufficiency Tracking

`identities` table tracks:

| Column | Notes |
|--------|-------|
| `session_count` | Total sessions completed |
| `total_minutes` | Total conversation time |
| `signal_completeness_score` | 0–1, how complete identity feels across all domains + worldview |
| `domains_confident` | JSON — which of the four domains have confidence > 0.6 |
| `ready_for_matching` | Boolean — true when completeness threshold met |

**Completeness threshold (v0.1):** (3+ sessions OR 30+ minutes) AND both worldview bifurcation questions answered.
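
The threshold as a predicate, with hypothetical input names; "answered" is modelled as a non-null score on each bifurcation dimension from Output A.

```typescript
interface SufficiencyInput {
  session_count: number;
  total_minutes: number;
  world_danger_adventure: number | null; // null until the question is answered
  people_good_bad: number | null;
}

function readyForMatching(i: SufficiencyInput): boolean {
  const enoughEngagement = i.session_count >= 3 || i.total_minutes >= 30;
  const bifurcationsAnswered = i.world_danger_adventure !== null && i.people_good_bad !== null;
  return enoughEngagement && bifurcationsAnswered;
}
```

The grouping matters: 45 minutes in a single session satisfies the engagement half, but without both bifurcation answers the user still stays out of matching.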

Below threshold: matching engine skips this person. System may prompt them to call again via SMS.

"The more you share, the better your matches get. Call whenever you feel like it."

---

## Storage

PostgreSQL `identities` table (Supabase):

| Column | Type | Notes |
|--------|------|-------|
| `user_id` | UUID | 1:1 with users |
| `updated_at` | timestamptz | Last compaction run |
| `version` | integer | Increments each compaction |
| `base_profile` | JSONB | Queryable, the full structured identity |
| `modality_weights` | JSONB | Queryable, the weight distribution |
| `signal_completeness_score` | float | 0–1 |
| `ready_for_matching` | boolean | |
| `embedding_id` | UUID | FK to pgvector store |

Modality profile markdown files stored in R2:
```
/identity/{user_id}/modality/{modality}.md
```

---

## Embedding

After each identity update, re-embed the compacted profile for vector search.

**OpenAI text-embedding-3-large** (3072 dimensions) — best semantic coverage for nuanced psychological content. Captures meaning, not just keywords.

What gets embedded: the full base identity JSON serialized to text + all active modality profiles concatenated.

Per-modality embeddings also generated — separate embedding per active modality profile. Matching engine uses per-modality embeddings for per-modality search.

Stored in **pgvector** (PostgreSQL extension via Supabase) — vector search co-located with identity data. No separate vector DB to operate at v0.1. Expected to handle millions of users before needing to migrate to dedicated infrastructure (Pinecone, Weaviate). One caveat: pgvector's HNSW and IVFFlat indexes support at most 2,000 dimensions on `vector` columns, so 3072-dimension embeddings need a `halfvec` column (indexable up to 4,000 dimensions) or shortened embeddings via the API's `dimensions` parameter.
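
A sketch of how the embedding inputs could be assembled: one combined text per user and one text per active modality. The function name, the `## modality` separators, and the alphabetical ordering (so the same profile always yields the same text) are assumptions; the embedding API call is left out.

```typescript
interface EmbeddingInputs {
  combined: string; // base identity JSON + all active modality profiles
  perModality: Record<string, string>; // one input per active modality
}

function buildEmbeddingInputs(
  baseProfile: object,
  modalityProfiles: Record<string, string>, // modality -> markdown narrative (active only)
): EmbeddingInputs {
  const entries = Object.entries(modalityProfiles)
    .map(([m, text]) => [m, text.trim()] as const)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)); // stable order -> stable text
  const combined = [
    JSON.stringify(baseProfile),
    ...entries.map(([m, text]) => `## ${m}\n${text}`),
  ].join("\n\n");
  const perModality: Record<string, string> = {};
  for (const [m, text] of entries) perModality[m] = text;
  return { combined, perModality };
}
```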

---

## Re-Analysis

When a better model is available (Claude Opus 5, whatever comes next):
1. Mark all sessions as `analysis_status: stale`
2. Re-queue analysis jobs for all sessions in chronological order per user
3. Compacted identities rebuilt from scratch with better model
4. Better identities → better matches automatically

The raw recordings + transcripts are the permanent asset. Everything else is derived and re-derivable.
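
Steps 1 to 3 as a planning function: mark every session stale, then produce one job list per user, oldest session first, with the identity reset before replay. Types and the plan shape are hypothetical; enqueueing is left to the job system.

```typescript
interface SessionRef {
  session_id: string;
  user_id: string;
  started_at: string; // ISO-8601 timestamp
  analysis_status: "pending" | "processing" | "complete" | "stale";
}

interface ReanalysisPlan {
  userId: string;
  resetIdentity: true; // rebuilt from scratch, not merged into the old profile
  sessionIds: string[]; // oldest first
}

function planReanalysis(sessions: SessionRef[]): ReanalysisPlan[] {
  // Step 1: mark everything stale (mutates the passed-in rows).
  for (const s of sessions) s.analysis_status = "stale";

  // Step 2: group per user, then order each user's sessions chronologically.
  const byUser = new Map<string, SessionRef[]>();
  for (const s of sessions) {
    const list = byUser.get(s.user_id) ?? [];
    list.push(s);
    byUser.set(s.user_id, list);
  }
  return Array.from(byUser, ([userId, list]): ReanalysisPlan => ({
    userId,
    resetIdentity: true,
    sessionIds: list
      .slice()
      .sort((a, b) => Date.parse(a.started_at) - Date.parse(b.started_at))
      .map((s) => s.session_id),
  }));
}
```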
@@ -0,0 +1,164 @@
# Layer 5 — Matching Engine

**Goal:** Find compatible people globally. Run continuously as profiles update. Surface only high-confidence matches. Never show a match the system isn't confident about.

---

## Isolation Principle

Entirely separate LLM context from the Conversation LLM. The matching engine sees compacted profiles and modality narratives only. Never raw recordings. Never conversation transcripts. Never personally identifying information.

Two people reduced to their psychographic essence. Matched on that.

---

## When It Runs

**Inngest scheduled function — every hour.**

Processes all users whose identity was updated since the last run (new session compacted, new embedding generated).

Not triggered by user action. Runs silently in background forever.

---

## Step 1 — Vector Candidate Retrieval

**pgvector cosine similarity search** across all ready-for-matching users.

For each user being processed:
- Run per-modality vector search for each modality where their weight > 0.1
- Each search returns top 50 candidates by embedding similarity in that modality's embedding space
- Results weighted by the user's modality weight for that modality
- Merge all modality results, deduplicate, rank by weighted score

Filters applied at query level (cheap, no LLM):
- `ready_for_matching = true` on both sides
- `user_id != self`
- Not already in `matches` table for this pair (in any direction or status)
- Not in `declined_pairs` table

Result: up to 150 candidates per user (50 per modality × 3 modalities max), deduplicated and ranked.
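
The merge step, sketched in memory. It assumes per-modality hits already carry pgvector cosine similarity, applies the `user_id != self` and exclusion filters in code rather than SQL, and deduplicates by keeping each candidate's best weighted score. Summing across modalities would be an equally valid reading of "merge"; the spec doesn't say which.

```typescript
interface Hit {
  userId: string;
  similarity: number; // cosine similarity returned by the per-modality search
}

interface RankedCandidate {
  userId: string;
  score: number;
  modality: string; // modality that produced the best score
}

const ENGAGEMENT_THRESHOLD = 0.1;
const PER_MODALITY_K = 50;
const MAX_MODALITIES = 3; // the "3 modalities max" from the Result line

function mergeCandidates(
  selfId: string,
  weights: Record<string, number>,
  hitsByModality: Record<string, Hit[]>,
  excluded: Set<string>, // existing matches + declined pairs, resolved to user ids
): RankedCandidate[] {
  const modalities = Object.entries(weights)
    .filter(([, w]) => w > ENGAGEMENT_THRESHOLD)
    .sort((a, b) => b[1] - a[1])
    .slice(0, MAX_MODALITIES);

  const best = new Map<string, RankedCandidate>();
  for (const [modality, weight] of modalities) {
    const hits = (hitsByModality[modality] ?? [])
      .slice()
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, PER_MODALITY_K);
    for (const hit of hits) {
      if (hit.userId === selfId || excluded.has(hit.userId)) continue;
      const score = hit.similarity * weight; // weighted by the user's modality weight
      const prev = best.get(hit.userId);
      if (!prev || score > prev.score) {
        best.set(hit.userId, { userId: hit.userId, score, modality });
      }
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```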

---

## Step 2 — Hard Filter Pass

Eliminate obvious incompatibilities before touching the LLM. These are binary blocks.

**World-view bifurcation gates (non-negotiable):**
- `world_danger_adventure` delta between the two people > 0.5 → eliminated
- `people_good_bad` delta > 0.5 → eliminated

A person who sees the world as dangerous and people as bad lives in a different universe from someone who sees it as adventurous and people as good. No amount of profile similarity bridges that. Eliminate the pair before spending compute on it.

**Data gate:**
- Either person's `signal_completeness_score` < 0.5 → deferred, not eliminated. Re-check on the next run, once more data has arrived.
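
The world-view and data gates reduce to a few comparisons. The sketch below assumes both bifurcation scores share one scale on which the 0.5 threshold is meaningful (0 to 1 is a guess). It also assumes the gates are checked in the order listed above. Checking the data gate first would be equally defensible, since sparse profiles may carry unreliable world-view scores.

```typescript
interface GateProfile {
  world_danger_adventure: number; // bifurcation score; 0..1 scale assumed
  people_good_bad: number; // bifurcation score; 0..1 scale assumed
  signal_completeness_score: number;
}

type GateResult = "pass" | "eliminated" | "deferred";

const MAX_BIFURCATION_DELTA = 0.5;
const MIN_SIGNAL_COMPLETENESS = 0.5;

function applyHardGates(a: GateProfile, b: GateProfile): GateResult {
  const worldDelta = Math.abs(a.world_danger_adventure - b.world_danger_adventure);
  const peopleDelta = Math.abs(a.people_good_bad - b.people_good_bad);
  // World-view gates are binary: either delta over 0.5 eliminates the pair.
  if (worldDelta > MAX_BIFURCATION_DELTA || peopleDelta > MAX_BIFURCATION_DELTA) {
    return "eliminated";
  }
  // The data gate defers rather than eliminates; the pair is re-checked next run.
  if (
    a.signal_completeness_score < MIN_SIGNAL_COMPLETENESS ||
    b.signal_completeness_score < MIN_SIGNAL_COMPLETENESS
  ) {
    return "deferred";
  }
  return "pass";
}
```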

**Geographic gate (optional, user-configurable):**
- Default: global, with no geographic filter
- The user can set a radius preference during conversation ("I'm open to long distance" vs "I really want someone in the same city")
- If set, the filter is applied here
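
When a radius is set, the gate needs a distance. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km. It assumes each profile stores a coarse location, for example a city centroid; the fields are hypothetical. It also assumes a pair must satisfy both people's radius preferences when both are set.

```typescript
interface GeoPreference {
  lat: number; // degrees; coarse location assumed (e.g. a city centroid)
  lon: number; // degrees
  maxRadiusKm: number | null; // null = default global matching, no filter
}

const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance via the haversine formula.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  // Clamp guards against rounding pushing sqrt(h) just above 1 near antipodes.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// A pair passes only if it falls within every radius that either person has set.
function passesGeoGate(a: GeoPreference, b: GeoPreference): boolean {
  if (a.maxRadiusKm === null && b.maxRadiusKm === null) return true; // both global
  const distance = haversineKm(a.lat, a.lon, b.lat, b.lon);
  if (a.maxRadiusKm !== null && distance > a.maxRadiusKm) return false;
  if (b.maxRadiusKm !== null && distance > b.maxRadiusKm) return false;
  return true;
}
```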

Result after the hard filter: typically 10–30 candidates out of the (up to) 150 retrieved.

---

## Step 3 — LLM Validation Loop

**Claude (`claude-sonnet-4-6`):** needs to be fast enough to handle the volume. Sonnet, not Opus, here.

One LLM call per candidate pair that survives the hard filter.

**Input to LLM:**
- Person A's modality profile for the relevant modality (the markdown narrative)
- Person B's modality profile for the relevant modality
- Both people's worldview JSON (bifurcation scores, VALS type, active Games)
- Cross-modality flag plus both modality weights, if this is a cross-modality candidate
- Geographic info, if relevant
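
The engine must never see raw recordings, transcripts, or PII. One way to enforce the Isolation Principle at this step is to build the LLM input from an explicit allow-list of fields instead of passing stored profiles through. In the sketch below, the `StoredProfile` shape and its field names (including `vals_type` and `games_active`) are hypothetical. Geographic info is left out for brevity.

```typescript
interface Worldview {
  world_danger_adventure: number;
  people_good_bad: number;
  vals_type: string;
  games_active: string[];
}

// Hypothetical stored profile; it holds fields the matcher must never see.
interface StoredProfile {
  userId: string;
  phoneNumber: string; // PII
  transcripts: string[]; // raw conversation
  modalityNarratives: Record<string, string>; // compacted markdown per modality
  modalityWeights: Record<string, number>;
  worldview: Worldview;
}

interface MatcherPerson {
  narrative: string;
  worldview: Worldview;
}

interface MatcherInput {
  personA: MatcherPerson;
  personB: MatcherPerson;
  // Present only when the two people are compared across different modalities.
  crossModality: { modalityA: string; modalityB: string; weightA: number; weightB: number } | null;
}

// Copies only allow-listed fields: ids, phone numbers and transcripts never leave here.
function buildMatcherInput(
  a: StoredProfile,
  b: StoredProfile,
  modalityA: string,
  modalityB: string,
): MatcherInput {
  const pick = (p: StoredProfile, modality: string): MatcherPerson => ({
    narrative: p.modalityNarratives[modality] ?? "",
    worldview: {
      world_danger_adventure: p.worldview.world_danger_adventure,
      people_good_bad: p.worldview.people_good_bad,
      vals_type: p.worldview.vals_type,
      games_active: p.worldview.games_active.slice(),
    },
  });
  return {
    personA: pick(a, modalityA),
    personB: pick(b, modalityB),
    crossModality:
      modalityA === modalityB
        ? null
        : {
            modalityA,
            modalityB,
            weightA: a.modalityWeights[modalityA] ?? 0,
            weightB: b.modalityWeights[modalityB] ?? 0,
          },
  };
}
```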

**LLM outputs structured assessment:**

```json
{
  "compatible": true,
  "confidence": 0.84,
  "primary_modality": "long-term",
  "cross_modality": false,
  "cross_modality_bridge": null,
  "resonances": [
    "both hold world as adventurous and expansive",
    "depth orientation strongly aligned — both slow to open, deep once in",
    "trust-building style compatible — neither rushes intimacy"
  ],
  "tensions": [
    "energy levels may differ — one runs hot, one more measured"
  ],
  "tension_fatal": false,
  "consent_call_framing": "I found someone who shares the way you see the world — as a place to explore, not to fear. They take their time getting close to people too. There's something here worth a conversation.",
  "data_gaps": ["desire domain sparse for Person B — only 1 session"]
}
```
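
The assessment drives routing, so the model's JSON is worth validating before it is trusted. The hand-rolled sketch below mirrors the example above. The field names come from that example; the 0–1 range check on `confidence` is an assumption consistent with the confidence thresholds. A schema-validation library would do the same job.

```typescript
interface MatchAssessment {
  compatible: boolean;
  confidence: number;
  primary_modality: string;
  cross_modality: boolean;
  cross_modality_bridge: string | null;
  resonances: string[];
  tensions: string[];
  tension_fatal: boolean;
  consent_call_framing: string;
  data_gaps: string[];
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

// Parses raw model output and rejects anything that doesn't match the expected shape.
function parseAssessment(raw: string): MatchAssessment {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("assessment must be a JSON object");
  }
  const o = parsed as Record<string, unknown>;
  const confidence = o.confidence;
  const bridge = o.cross_modality_bridge;
  const valid =
    typeof o.compatible === "boolean" &&
    typeof confidence === "number" && confidence >= 0 && confidence <= 1 &&
    typeof o.primary_modality === "string" &&
    typeof o.cross_modality === "boolean" &&
    (bridge === null || typeof bridge === "string") &&
    isStringArray(o.resonances) &&
    isStringArray(o.tensions) &&
    typeof o.tension_fatal === "boolean" &&
    typeof o.consent_call_framing === "string" &&
    isStringArray(o.data_gaps);
  if (!valid) throw new Error("assessment does not match the expected shape");
  return o as unknown as MatchAssessment;
}
```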

**Confidence thresholds:**
- `confidence < 0.70` → discard
- `0.70 ≤ confidence < 0.85` → hold. Check whether more data arrives in the next 2 runs, then surface or discard
- `confidence ≥ 0.85` → queue for the consent gate immediately
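
Here is the routing rule as a function. The hold band is treated as half-open (0.70 ≤ c < 0.85), so that a score such as 0.845 still has a defined route.

```typescript
type Routing = "discard" | "hold" | "consent_gate";

const DISCARD_BELOW = 0.7;
const SURFACE_AT = 0.85;

function routeByConfidence(confidence: number): Routing {
  if (confidence >= SURFACE_AT) return "consent_gate"; // queue for consent immediately
  if (confidence >= DISCARD_BELOW) return "hold"; // wait up to 2 runs for more data
  return "discard";
}
```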

**Data gap handling:**
If the LLM flags data gaps for either person → downgrade the confidence, add the pair to the hold queue, and have the system prompt that person to call again.
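
A sketch of the data-gap rule. The source doesn't say how much to downgrade, so `DATA_GAP_PENALTY` is a hypothetical placeholder. The `flaggedUserIds` parameter assumes the caller has already mapped the LLM's free-text `data_gaps` entries to the person each one concerns.

```typescript
interface GapAssessment {
  confidence: number;
  data_gaps: string[];
}

interface GapDecision {
  confidence: number;
  holdPair: boolean;
  usersToPrompt: string[]; // people the system should ask to call again
}

const DATA_GAP_PENALTY = 0.1; // hypothetical; the amount is not specified

function applyDataGapPolicy(a: GapAssessment, flaggedUserIds: string[]): GapDecision {
  if (a.data_gaps.length === 0) {
    return { confidence: a.confidence, holdPair: false, usersToPrompt: [] };
  }
  return {
    confidence: Math.max(0, a.confidence - DATA_GAP_PENALTY), // downgrade
    holdPair: true, // gaps always send the pair to the hold queue
    usersToPrompt: flaggedUserIds.slice(),
  };
}
```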

**Cross-modality matching:**
Enabled. Suppose Person A's weights lean long-term (0.65) and Person B's lean open-relationship (0.55), but their profiles are highly compatible. The LLM then assesses the cross-modality bridge:
- Is there genuine compatibility across the modality difference?
- What is the bridge: what do they share that transcends the modality difference?
- Flag `cross_modality: true` so the consent call framing can acknowledge the difference without explicitly revealing the other person's modality
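
The spec doesn't say exactly how a pair becomes a cross-modality candidate. One plausible reading, sketched here as an assumption: compare each person's dominant (highest-weighted) modality, flag the pair when the two differ, and pass both weights along to the LLM.

```typescript
interface ModalityLean {
  modality: string;
  weight: number;
}

// Highest-weighted modality; throws if no weights are present.
function dominantModality(weights: Record<string, number>): ModalityLean {
  let best: ModalityLean | null = null;
  for (const modality of Object.keys(weights)) {
    const weight = weights[modality];
    if (best === null || weight > best.weight) best = { modality, weight };
  }
  if (best === null) throw new Error("no modality weights");
  return best;
}

interface CrossModalityCheck {
  crossModality: boolean;
  a: ModalityLean;
  b: ModalityLean;
}

function checkCrossModality(
  weightsA: Record<string, number>,
  weightsB: Record<string, number>,
): CrossModalityCheck {
  const a = dominantModality(weightsA);
  const b = dominantModality(weightsB);
  return { crossModality: a.modality !== b.modality, a, b };
}
```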

---

## Step 4 — Match Confirmation & Storage

A match is confirmed when:
- `compatible: true`
- `confidence ≥ 0.85` (or the pair was held at 0.70–0.84 and the second run did not improve it)
- `tension_fatal: false`
- No data gaps are flagged by the LLM
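
The four confirmation conditions combine into a single predicate. In this sketch, `heldWithoutImprovement` is a hypothetical flag. The caller would set it when a pair held in the 0.70–0.84 band has been re-checked and its score did not improve.

```typescript
interface ConfirmationInput {
  compatible: boolean;
  confidence: number;
  tension_fatal: boolean;
  data_gaps: string[];
}

function isMatchConfirmed(a: ConfirmationInput, heldWithoutImprovement: boolean): boolean {
  const highConfidence = a.confidence >= 0.85;
  const heldInBand = heldWithoutImprovement && a.confidence >= 0.7 && a.confidence < 0.85;
  return (
    a.compatible &&
    (highConfidence || heldInBand) &&
    !a.tension_fatal &&
    a.data_gaps.length === 0 // any flagged gap blocks confirmation
  );
}
```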

**`matches` table:**

| Column | Type | Notes |
|--------|------|-------|
| `match_id` | UUID | |
| `user_a_id` | UUID | |
| `user_b_id` | UUID | |
| `primary_modality` | text | |
| `cross_modality` | boolean | |
| `confidence_score` | float | |
| `resonances` | JSONB | Array of resonance strings |
| `tensions` | JSONB | Array of tension strings |
| `tension_fatal` | boolean | |
| `consent_call_framing` | text | Pre-written by LLM for consent call |
| `status` | enum | `pending_consent` / `a_accepted` / `b_accepted` / `both_accepted` / `declined` / `introduced` |
| `created_at` | timestamptz | |
| `a_consented_at` | timestamptz | |
| `b_consented_at` | timestamptz | |
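
The `status` column implies a small state machine. This sketch shows the transitions it suggests, under assumptions the table doesn't settle:
- Either person may decline until both have accepted.
- A repeated accept is a no-op.
- `introduced` is reachable only from `both_accepted`.

Setting `a_consented_at` and `b_consented_at` would happen alongside the accept transitions.

```typescript
type MatchStatus =
  | "pending_consent"
  | "a_accepted"
  | "b_accepted"
  | "both_accepted"
  | "declined"
  | "introduced";

type ConsentEvent =
  | { kind: "accept"; side: "a" | "b" }
  | { kind: "decline"; side: "a" | "b" }
  | { kind: "introduce" };

function nextStatus(current: MatchStatus, e: ConsentEvent): MatchStatus {
  const awaitingConsent =
    current === "pending_consent" || current === "a_accepted" || current === "b_accepted";
  switch (e.kind) {
    case "accept":
      if (current === "pending_consent") return e.side === "a" ? "a_accepted" : "b_accepted";
      if (current === "a_accepted") return e.side === "b" ? "both_accepted" : current; // repeat is a no-op
      if (current === "b_accepted") return e.side === "a" ? "both_accepted" : current;
      break;
    case "decline":
      if (awaitingConsent) return "declined"; // either side can decline before both accept
      break;
    case "introduce":
      if (current === "both_accepted") return "introduced";
      break;
  }
  throw new Error(`invalid transition: ${current} on ${e.kind}`);
}
```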

**Output per user:** at most 5–10 confirmed matches in the queue at any time. The system doesn't pile up 50 pending matches; it throttles. Quality over volume.
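
The throttle can be sketched as admitting the highest-confidence confirmations into whatever slots remain. The cap of 10 is the upper end of the stated 5–10 range. Keeping overflow as `waiting` rather than discarding it is an assumption. Every match involves two people, so a real implementation would need free slots on both sides; this sketch handles one user's queue.

```typescript
interface ConfirmedMatch {
  matchId: string;
  confidence: number;
}

interface QueueDecision {
  admitted: ConfirmedMatch[];
  waiting: ConfirmedMatch[];
}

const MAX_QUEUED_MATCHES = 10; // assumption: upper end of the 5–10 range

function admitToQueue(
  currentQueueSize: number,
  confirmed: ConfirmedMatch[],
  cap: number = MAX_QUEUED_MATCHES,
): QueueDecision {
  const freeSlots = Math.max(0, cap - currentQueueSize);
  // Highest confidence first, so the best matches take the limited slots.
  const ranked = confirmed.slice().sort((x, y) => y.confidence - x.confidence);
  return { admitted: ranked.slice(0, freeSlots), waiting: ranked.slice(freeSlots) };
}
```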

---

## Feedback Loop

After the introduction call, a follow-up SMS 24 hours later asks for one-tap thumbs up/down feedback:

Positive outcome → strengthen the signal patterns that generated this match:
- Note which resonance signals predicted success
- Adjust embedding weights over time

Negative outcome → note what didn't work:
- Which tensions were underweighted?
- Was the confidence score miscalibrated?

Over time, the matching engine learns from outcomes, not just profiles. This is the moat: the system gets better with every introduction.

V0.1: simple logging. V1: feedback feeds back into embedding and confidence calibration.
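
For the V0.1 logging stage, one simple way to answer "Was the confidence score miscalibrated?" is to bucket logged matches by confidence and compare thumbs-up rates per bucket. The `OutcomeLog` shape and the bucket edges in this sketch are assumptions. If confidence tracks outcomes, the positive rate should rise from bucket to bucket. A flat or falling rate points to miscalibration.

```typescript
interface OutcomeLog {
  matchId: string;
  confidence: number; // confidence_score at confirmation time
  thumbsUp: boolean; // one-tap SMS feedback after the introduction
}

interface CalibrationBucket {
  lower: number;
  upper: number;
  count: number;
  positiveRate: number | null; // null when the bucket is empty
}

function calibrationTable(
  logs: OutcomeLog[],
  edges: number[] = [0.7, 0.8, 0.9, 1.0],
): CalibrationBucket[] {
  const buckets: CalibrationBucket[] = [];
  for (let i = 0; i < edges.length - 1; i++) {
    const lower = edges[i];
    const upper = edges[i + 1];
    const isLast = i === edges.length - 2;
    // Buckets are [lower, upper); the last one also includes its upper edge.
    const inBucket = logs.filter(
      (l) => l.confidence >= lower && (l.confidence < upper || (isLast && l.confidence === upper)),
    );
    const positives = inBucket.filter((l) => l.thumbsUp).length;
    buckets.push({
      lower,
      upper,
      count: inBucket.length,
      positiveRate: inBucket.length > 0 ? positives / inBucket.length : null,
    });
  }
  return buckets;
}
```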