recallmem 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -6,25 +6,9 @@
  </p>
 
  <p align="center">
- <strong>Persistent personal AI.</strong> Powered by Gemma 4 running locally on your own machine.
- <br> <br>
- This is not a chatbot (chatbots forget you) & this is not an agent (agents don't remember you). <br> <br> This IS a private AI, built on a deterministic memory framework where the LLM never touches your data.
+ <strong>Persistent Private AI.</strong> Powered by Gemma 4 running locally on your own machine.
  </p>
 
- <p align="center">
- Two products in one repo:
- <br>
- <strong>👤 For users:</strong> install with <code>npx recallmem</code> and start chatting with an AI that actually remembers you.
- <br>
- <strong>👨‍💻 For developers:</strong> fork it and build your own AI app on top of the memory framework in <code>lib/</code>.
- </p>
-
- ```bash
- npx recallmem
- ```
-
- That's the install. One command. The CLI handles the rest. It clones the repo, sets up the database, pulls the local AI models, writes the config file, opens the chat in your browser. If you've already got Node, Postgres, and Ollama installed, you're chatting with your own private AI in about 5 minutes.
-
  <p align="center">
  <img src="./public/screenshots/demo.png" alt="RecallMEM chat UI showing the AI remembering the user's name across conversations" width="900">
  </p>
@@ -35,701 +19,162 @@ That's the install. One command. The CLI handles the rest. It clones the repo, s
 
  ---
 
- ## Why I built this
-
- I wanted my own private AI for the kind of conversations I don't want sitting on someone else's server. Personal stuff. The stuff you'd actually want a real friend to help you think through.
-
- The default model is **Gemma 4** (Google's open weights model that just dropped, Apache 2.0) running locally via Ollama. You can pick any size from E2B (runs on a phone) up to the 31B Dense (best quality, needs a workstation). Or skip Ollama entirely and bring your own API key for Claude, GPT, Groq, Together, OpenRouter, or anything OpenAI-compatible. Your call.
-
- The thing is, the memory is the actual differentiator. Not the model. Not the UI. The memory. The AI builds a profile of who you are over time. It extracts facts after every conversation. It vector-searches across every chat you've ever had to find relevant context. By the time you've used it for a week, it knows you better than ChatGPT ever will, because ChatGPT forgets you the second you close the tab.
-
- <details>
- <summary><strong>The longer version (what's wrong with every other "private AI" tool)</strong></summary>
-
- Here's the problem with every "private AI" tool I tried: they all fall into one of three buckets.
-
- 1. **Local chat UIs for Ollama.** Look pretty, but the AI has zero memory between conversations. Every chat is a stranger.
- 2. **Memory libraries on GitHub.** Powerful, but they're SDKs. You have to build the whole UI yourself.
- 3. **Cloud-based memory products like Mem0.** Have the full feature set, but your data goes to their servers. Defeats the whole point.
-
- There's a gap right in the middle: a **complete personal AI app with real working memory that runs 100% on your machine**. So I built it.
-
- </details>
-
- ---
-
- ## What it does
-
- Persistent memory across every chat (profile + facts + vector search) with **temporal awareness** so the model knows what's current vs historical. Auto-extracts facts in real time, retires stale ones when the truth changes, stamps every memory with dates. Vector search over every past chat. Memory inspector you can edit. Custom rules. Wipe memory unrecoverably. File uploads (images, PDFs, code). Web search when using Anthropic. Bring your own LLM (Ollama, Anthropic, OpenAI, or any OpenAI-compatible API). Warm Claude-style dark mode.
-
- <details>
- <summary><strong>Full feature list</strong></summary>
-
- - **Persistent memory across every chat.** Three layers: a synthesized profile of who you are, an extracted facts table, and vector search over all past conversations.
- - **Live fact extraction.** Facts get extracted after every assistant reply, not just when the chat ends. Say "my birthday is 11/27" and refresh `/memory` a moment later, it's already there. Always uses the local FAST_MODEL so cloud users don't get billed per turn.
- - **Temporal awareness solves context collapse.** Every fact is stamped with a `valid_from` date. When new information contradicts an old fact ("left Acme" replaces "works at Acme"), the old fact gets retired automatically. The model always sees what's current.
- - **Self-healing categories.** Facts re-route to the correct category after every chat, edit, or delete. No LLM, just a deterministic loop. So when the categorizer improves, your existing memory improves with it.
- - **Resumed-conversation markers.** Open a chat from last week and continue it, the AI sees a system marker like `[Conversation resumed 6 days later]` so it knows time passed and earlier turns are historical.
- - **Dated recall.** When the vector search pulls relevant chunks from past chats, each one is prefixed with the date it came from so the model can tell history from the present.
- - **Auto-builds your profile** from the extracted facts, with date stamps in every section. Updates after every reply.
- - **Vector search across past conversations.** Ask about something you discussed last month, the AI finds it and uses it as context.
- - **Memory inspector page.** View, edit, or delete every fact, with collapsible category sections and a search filter for navigating long lists.
- - **Sidebar chat search.** Toggle between vector search (semantic, needs Ollama for embeddings) and text search (literal ILIKE on titles + transcripts, instant). Both search inside the conversations, not just titles.
- - **Web search toggle.** When you're using an Anthropic provider, a globe button next to the input lets Claude actually browse the web. Hidden for Ollama since local models don't have it.
- - **Custom rules.** Tell the AI how you want to be talked to. "Don't gaslight me." "I have dyslexia, no bullet points." "Don't add disclaimers." It applies them in every chat.
- - **Wipe memory unrecoverably.** `DELETE` + `VACUUM FULL` + `CHECKPOINT`. Gone for good at the database level.
- - **File uploads.** Drag and drop images, PDFs, code, text. Gemma 4 handles vision natively.
- - **Warm dark mode.** Claude-style charcoal palette via CSS variables, persisted across refreshes with no flash-of-light.
- - **Chat history sidebar** with date grouping, pinned chats, and the search toggle described above.
- - **Markdown rendering** for headings, code blocks, tables.
- - **Streaming responses** with smooth typewriter rendering.
- - **Bring any LLM you want.** Local Gemma 4 via Ollama, or plug in Anthropic (Claude), OpenAI (GPT), or any OpenAI-compatible API (Groq, Together, OpenRouter, Mistral, vLLM, LM Studio, etc).
- - **Test connection** for cloud providers before saving the API key, so you don't find out your key is wrong mid-chat.
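The resumed-conversation markers in the list above are a deterministic function of the gap between messages. A minimal sketch of that idea in TypeScript; `resumeMarker` and its 4-hour threshold are hypothetical names and values for illustration, not the app's actual implementation:

```typescript
// Hypothetical sketch: compute a resumed-conversation marker from the
// elapsed time since the last message. Returns null for short gaps.
function resumeMarker(gapMs: number, thresholdHours = 4): string | null {
  const hours = gapMs / (1000 * 60 * 60);
  if (hours < thresholdHours) return null; // short gaps get no marker
  const days = Math.floor(hours / 24);
  if (days >= 1) {
    return `[Conversation resumed ${days} day${days === 1 ? "" : "s"} later]`;
  }
  return `[Conversation resumed ${Math.floor(hours)} hours later]`;
}
```

Because this is plain arithmetic, the same gap always produces the same marker, which is the property the README is describing.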
-
- </details>
-
- ---
-
- ## How is this different?
-
- <details>
- <summary><strong>Comparison table vs ChatGPT, Claude.ai, and Mem0</strong></summary>
-
- | | RecallMEM | ChatGPT / Claude.ai | Mem0 |
- |---|---|---|---|
- | **Runs locally** | ✅ | ❌ | ❌ |
- | **Memory retrieval is deterministic (no LLM tool calls)** | ✅ | ❌ | ❌ |
- | **Persistent memory across chats** | ✅ | partial | ✅ |
- | **Temporal awareness (memories know when they were true)** | ✅ | ❌ | ❌ |
- | **Auto-retires stale facts when truth changes** | ✅ | ❌ | ❌ |
- | **You can edit / delete memories** | ✅ | partial | ✅ |
- | **Vector search over past chats** | ✅ | ❌ | ✅ |
- | **Custom rules / behavior** | ✅ | ✅ | ❌ |
- | **Bring your own LLM (any provider)** | ✅ | ❌ | ❌ |
- | **Use local models (Gemma 4, Llama, etc)** | ✅ | ❌ | ❌ |
- | **No account / no signup** | ✅ | ❌ | ❌ |
- | **Free** | ✅ | partial | partial |
- | **Source available** | ✅ Apache 2.0 | ❌ | partial |
-
- </details>
-
- <details>
- <summary><strong>The actual differentiator nobody talks about (deterministic memory)</strong></summary>
-
- The thing nobody is doing right is **how memory is read and written**.
-
- In ChatGPT and Claude.ai with memory turned on, the LLM is in charge of memory. The model decides when to remember something during your conversation. The model decides what to remember. The model decides what to retrieve when you ask a question. The whole memory layer is implemented as model behavior. You're trusting the LLM to be a librarian, and LLMs are not librarians. They hallucinate.
-
- RecallMEM does it backwards. **The chat LLM never touches your memory database.** Not for reads, not for writes. The LLM only ever sees a system prompt that's already been assembled by deterministic TypeScript and SQL. Here's the actual flow:
-
- **When you send a message (memory READ path, 100% deterministic):**
-
- 1. Plain SQL `SELECT` pulls your profile from `s2m_user_profiles`
- 2. Plain SQL `SELECT` pulls your top *active* facts from `s2m_user_facts` (retired facts are excluded automatically)
- 3. Each fact is stamped with its `valid_from` date so the model can reason about timelines
- 4. EmbeddingGemma converts your message to a 768-dim vector (math, not generation)
- 5. pgvector cosine similarity search ranks chunks from past conversations
- 6. Each retrieved chunk is stamped with its source-chat date (`[from conversation on 2026-03-12]`) so the model can tell history from now
- 7. If the chat is being resumed after a multi-hour gap, a one-time system marker like `[Conversation resumed 6 days later]` gets injected before the new user turn
- 8. TypeScript template assembles all of it into a system prompt
- 9. **Then** the chat LLM gets called, with the assembled context already in its prompt
-
- The chat LLM never queries the database. It can't decide what to retrieve. It can't pick which facts are relevant. It can't hallucinate a memory that doesn't exist, because if it's not in the prompt, it doesn't exist for the model. The retrieval is 100% deterministic SQL + cosine similarity. No LLM tool calls touching your memory store.
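Steps 4-6 of the read path boil down to cosine similarity plus a date prefix. A pure-TypeScript illustration of that ranking, with made-up types and 2-dimensional vectors for brevity; in the real app the vectors come from EmbeddingGemma and the ranking runs inside pgvector:

```typescript
// Illustrative only: retrieval as math, not model behavior.
type Chunk = { text: string; date: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank past-conversation chunks by similarity to the query embedding and
// stamp each survivor with its source-chat date, matching the
// "[from conversation on YYYY-MM-DD]" prefix described above.
function recall(query: number[], chunks: Chunk[], topK = 5): string[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(({ c }) => `[from conversation on ${c.date}] ${c.text}`);
}
```

Run the same query twice and you get the same chunks in the same order, which is the "predictability" claim made later in this section.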
138
-
139
- **After every assistant reply (memory WRITE path, LLM proposes, TypeScript validates):**
140
-
141
- A small local LLM (Gemma 4 E4B via Ollama) runs in the background to extract candidate facts from the running transcript. This happens fire-and-forget after the stream closes, so you never wait for it. It always uses the local model regardless of which provider the chat itself is using, so cloud users (Claude, GPT) don't get billed per turn for extraction.
142
-
143
- The same LLM call also returns the IDs of any **existing** facts the new conversation contradicts. So when you say "I just left Acme to start a new job," the extractor returns the new fact AND flags the old "User works at Acme" fact for retirement. The TypeScript layer flips those rows to `is_active=false` and stamps `valid_to=NOW()`. History is preserved, the active set always reflects current truth.
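The retirement flip described above can be sketched as a pure function. The names below (`Fact`, `retireFacts`) are illustrative, not the app's API; the real implementation issues a SQL `UPDATE` against `s2m_user_facts`, but the logic is the same: the extractor only proposes IDs, and the deterministic layer does the flipping:

```typescript
// Sketch of supersession: flip proposed facts to inactive, stamp valid_to.
type Fact = {
  id: number;
  text: string;
  isActive: boolean;
  validFrom: string;
  validTo: string | null;
};

function retireFacts(facts: Fact[], retiredIds: number[], now: string): Fact[] {
  const retired = new Set(retiredIds);
  return facts.map((f) =>
    retired.has(f.id) && f.isActive
      ? { ...f, isActive: false, validTo: now } // history preserved, not deleted
      : f
  );
}
```

Retired rows keep their `validFrom`/`validTo` window, so the timeline survives even though the active set only reflects current truth.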
-
- But here's the key: the LLM only **proposes** facts and supersession decisions. It cannot write to the database. The TypeScript layer is the actual gatekeeper, and it runs every candidate fact through six validation steps before storage:
-
- 1. **Quality gate.** Conversations under 100 characters get zero facts extracted. The LLM never even sees them.
- 2. **JSON parse validation.** If the LLM returns malformed JSON or no array, the entire batch is dropped.
- 3. **Type validation.** Only strings survive. Objects, numbers, nested arrays, all rejected.
- 4. **Garbage pattern filtering.** A regex filter catches the most common LLM hallucinations: meta-observations like "user asked about X", AI behavior notes like "AI suggested Y", non-facts like "not mentioned", mood observations like "had a good conversation", and anything under 10 characters.
- 5. **Deduplication.** Case-insensitive normalized match against the entire facts table. Duplicates get dropped.
- 6. **Categorization.** The category (Identity, Family, Work, Health, etc.) is decided by **keyword matching in TypeScript**, not by the LLM. The LLM has no say in how facts get organized.
-
- After all six steps, the surviving facts get a plain SQL `INSERT`. And even then, you can edit or delete any fact in the Memory page if you don't agree with it.
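The six steps above fit in one small function. This is a condensed sketch, not the app's actual code: the garbage regex, the thresholds, and the function name are illustrative stand-ins, and real categorization plus the SQL `INSERT` are omitted. The shape of the pipeline is what matters:

```typescript
// Illustrative garbage patterns (the real list is longer).
const GARBAGE = /^(user asked|ai suggested|not mentioned|had a good conversation)/i;

// Hypothetical sketch of the six-step gatekeeper: the LLM's raw output goes
// in, and only validated, deduplicated fact strings come out.
function validateFacts(
  transcript: string,
  rawLlmOutput: string,
  existing: string[]
): string[] {
  if (transcript.length < 100) return []; // 1. quality gate
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawLlmOutput); // 2. JSON parse validation
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  const seen = new Set(existing.map((f) => f.trim().toLowerCase()));
  const out: string[] = [];
  for (const item of parsed) {
    if (typeof item !== "string") continue; // 3. type validation
    if (item.length < 10 || GARBAGE.test(item)) continue; // 4. garbage filter
    const key = item.trim().toLowerCase();
    if (seen.has(key)) continue; // 5. deduplication
    seen.add(key);
    out.push(item); // 6. categorization + SQL INSERT would happen here
  }
  return out;
}
```

Note that a malformed batch fails closed: a parse error drops everything rather than letting a partial guess through.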
-
- **Why this matters:**
-
- - **Predictability.** When you mention "my dog" in a chat, RecallMEM **always** retrieves the facts that match "dog" via cosine similarity. ChatGPT retrieves whatever the model decides to retrieve, which can vary run to run.
- - **No hallucinated retrieval.** The LLM cannot remember something that isn't actually in your facts table. If it's not in the database, it's not in the prompt.
- - **Auditability.** You can look at any chat and trace exactly which facts and chunks were loaded into the system prompt. With ChatGPT, you can't see what the model decided to surface from memory.
- - **No prompt injection memory leaks.** The LLM in RecallMEM only sees what the deterministic layer feeds it. It can't query the rest of the database. With ChatGPT, the model has tool access to memory, which means a prompt injection attack could theoretically make it dump memory contents.
- - **Your data, your database.** Memory is data you control, not behavior you have to trust the model to do correctly. You can write a script that queries Postgres directly, edit facts manually, run analytics on your own conversations.
-
- This is the actual reason RecallMEM exists. Not "another local chat UI." A memory architecture where the LLM is intentionally not in charge.
-
- </details>
-
- ---
-
- ## For developers (the memory framework)
-
- Underneath the chat UI, RecallMEM is a **deterministic memory framework** you can fork and use in your own AI app. The whole `lib/` folder is intentionally framework-shaped. It's not a polished SDK with a public API contract, but it IS a working, opinionated memory architecture you can copy into your own project.
+ ## What is this
 
- <details>
- <summary><strong>What's in <code>lib/</code> and how to embed it in your app</strong></summary>
-
- **The core files in `lib/`:**
-
- ```
- lib/
- ├── memory.ts Memory orchestrator. Loads profile + facts + vector recall in parallel.
- ├── prompts.ts Assembles the system prompt with all the memory context.
- ├── facts.ts Fact extraction (LLM proposes) + validation (TypeScript decides).
- ├── profile.ts Synthesizes a structured profile from the active facts.
- ├── chunks.ts Splits transcripts into chunks, embeds them, runs vector search.
- ├── chats.ts Chat CRUD + transcript serialization with the smart parser.
- ├── post-chat.ts The post-chat pipeline (title gen, fact extract, profile rebuild, embed).
- ├── rules.ts Custom user rules / instructions.
- ├── embeddings.ts EmbeddingGemma calls via Ollama.
- ├── llm.ts LLM router (Ollama, Anthropic, OpenAI, OpenAI-compatible).
- └── db.ts Postgres pool + the configurable user ID resolver.
- ```
-
- **Embedding it into your own app:**
-
- The lib functions default to a single-user setup (`user_id = "local-user"`) but you can wire in your own auth system with two function calls at startup:
-
- ```typescript
- import { Pool } from "pg";
- import { configureDb, setUserIdResolver } from "./lib/db";
-
- // Use your existing Postgres pool (or skip this and let lib/ create its own)
- const myPool = new Pool({ connectionString: process.env.DATABASE_URL });
- configureDb({ pool: myPool });
-
- // Wire in your auth system. Called whenever a lib function needs the current user.
- // Can be sync or async. Return whatever string identifies the user in your app.
- setUserIdResolver(() => getCurrentUserFromMyAuthSystem());
- ```
-
- That's it. No other changes needed. Every lib function (`getProfile`, `getActiveFacts`, `searchChunks`, `storeFacts`, `rebuildProfile`, etc.) reads from the configured resolver. Your auth system stays in your code, the memory framework stays in `lib/`.
-
- **Using the memory layer in a chat request:**
+ A personal AI chat app with real memory that runs 100% on your machine. Your conversations stay local. The AI builds a profile of who you are over time, extracts facts after every chat, and vector-searches across your entire history to find relevant context. By the time you've used it for a week, it knows you better than any cloud AI because it never forgets.
 
- ```typescript
- import { buildMemoryAwareSystemPrompt } from "./lib/memory";
- import { runPostChatPipeline } from "./lib/post-chat";
- import { createChat, updateChat } from "./lib/chats";
+ The default model is **Gemma 4** (Apache 2.0) running locally via Ollama. Pick any size from E2B (runs on a phone) up to 31B Dense (best quality, needs a workstation). Or skip Ollama entirely and bring your own API key for Claude, GPT, Groq, Together, OpenRouter, or anything OpenAI-compatible.
 
- // 1. Build the system prompt from the user's memory
- const systemPrompt = await buildMemoryAwareSystemPrompt(
- userMessage,
- currentChatId
- );
+ The memory is the actual differentiator. Not the model. Not the UI. Memory reads are deterministic SQL + cosine similarity, not LLM tool calls. The chat model never touches your database. Facts are proposed by a local LLM but validated by TypeScript before storage. [Deep dive on the architecture →](./docs/ARCHITECTURE.md)
 
- // 2. Send to your LLM however you want (Ollama, Claude, GPT, whatever)
- const response = await yourLLM.chat([
- { role: "system", content: systemPrompt },
- ...conversationHistory,
- { role: "user", content: userMessage },
- ]);
+ ## Features
 
- // 3. Save the chat
- await updateChat(chatId, [...conversationHistory, { role: "assistant", content: response }]);
+ - **Three-layer memory** across every chat: synthesized profile, extracted facts table, and vector search over all past conversations
+ - **Temporal awareness** so the model knows what's current vs. historical. Auto-retires stale facts when the truth changes.
+ - **Live fact extraction** after every assistant reply, not just when the chat ends
+ - **Memory inspector** where you can view, edit, or delete every fact
+ - **Vector search** across past conversations with dated recall
+ - **Custom rules** for how you want the AI to talk to you
+ - **File uploads** (images, PDFs, code). Gemma 4 handles vision natively.
+ - **Web search** when using Anthropic or Ollama (via Brave Search)
+ - **Wipe memory unrecoverably** with `DELETE` + `VACUUM FULL` + `CHECKPOINT`
+ - **Bring any LLM.** Ollama, Anthropic, OpenAI, or any OpenAI-compatible API.
 
- // 4. (Async) Run the post-chat pipeline to extract facts, rebuild profile, embed chunks
- runPostChatPipeline(chatId);
- ```
-
- The memory framework doesn't care which LLM you use. It just assembles context. Bring your own model.
-
- **The schema lives in `migrations/001_init.sql`.** Run it against any Postgres 17+ database with the pgvector extension installed. Tables are prefixed `s2m_` (for "speak2me," the project this came from). Rename them in the migration if you want a different prefix.
-
- **License:** Apache 2.0. Fork it, modify it, ship it commercially. The only ask is that you preserve the copyright notice and the NOTICE file. See [CONTRIBUTING.md](./CONTRIBUTING.md) for the full guide.
+ ## Quick start (Mac)
 
- </details>
-
- ---
+ RecallMEM is built and tested on macOS. Mac is the supported platform.
 
- ## Quick start
+ **Prerequisites:** Node.js 20+ and [Homebrew](https://brew.sh).
 
  ```bash
  npx recallmem
  ```
 
- You need three things on your machine first: **Node.js 20+**, **Postgres 17 with pgvector**, and **Ollama** (optional, skip if you only want cloud providers). If any are missing, the CLI tells you exactly what to install for your OS.
-
- <details>
- <summary><strong>Architecture diagrams (system, memory layers, post-chat sequence)</strong></summary>
-
- ### System architecture
-
- ```mermaid
- flowchart TB
- Browser["Browser<br/>Chat UI<br/>localhost:3000"]
- NextJS["Next.js App<br/>API routes + SSR"]
- Postgres[("Postgres + pgvector<br/>localhost:5432<br/>Chats, facts, profile, embeddings")]
- Ollama["Ollama<br/>localhost:11434<br/>Gemma 4 + EmbeddingGemma"]
- Cloud{{"Optional: Cloud LLMs<br/>Anthropic / OpenAI / etc.<br/>Only if you add a provider"}}
-
- Browser <-->|HTTP / SSE| NextJS
- NextJS <-->|SQL + vector queries| Postgres
- NextJS <-->|"/api/chat<br/>/api/embed"| Ollama
- NextJS -.->|Optional API call| Cloud
-
- style Cloud stroke-dasharray: 5 5
- style Ollama fill:#dfe
- style Postgres fill:#dfe
- style NextJS fill:#dfe
- style Browser fill:#dfe
- ```
-
- Everything in green runs on your machine. The dashed cloud box only activates if you explicitly add a cloud provider in settings. Otherwise, nothing leaves your computer. Ever.
-
- ### The three-layer memory system
-
- ```mermaid
- flowchart LR
- Chat[New chat message]
- Memory["Memory loader<br/>(parallel)"]
- Profile["Layer 1: Profile<br/>Synthesized summary<br/>(IDENTITY, FAMILY,<br/>WORK, HEALTH...)"]
- Facts["Layer 2: Facts<br/>Top 50 atomic statements<br/>(pinned to system prompt)"]
- Vector["Layer 3: Vector search<br/>Top 5 chunks from past<br/>conversations<br/>(semantic similarity)"]
- Rules["User custom rules<br/>(behavior instructions)"]
- Prompt["System prompt<br/>(profile + facts + recall + rules)"]
- LLM[LLM]
- Response[Streaming response]
-
- Chat --> Memory
- Memory --> Profile
- Memory --> Facts
- Memory --> Vector
- Memory --> Rules
- Profile --> Prompt
- Facts --> Prompt
- Vector --> Prompt
- Rules --> Prompt
- Prompt --> LLM
- LLM --> Response
- ```
-
- Each layer does a different job:
-
- - **Profile** loads instantly. It's the "who am I talking to" baseline. One database row, always loaded into every system prompt.
- - **Facts** are atomic statements you can view, edit, and delete. Stored as individual rows. Pinned into the prompt every conversation.
- - **Vector search** finds semantically relevant prose from any past conversation. Catches the stuff that doesn't fit cleanly into facts, like that idea you were working through three weeks ago.
-
- Together, they let the AI know your name, your family, your job, AND remember the specific thing you mentioned a month ago when it becomes relevant.
-
- ### What happens when you end a chat
-
- ```mermaid
- sequenceDiagram
- actor User
- participant UI as Chat UI
- participant API as /api/chat/finalize
- participant LLM
- participant DB as Postgres
-
- User->>UI: Click "New chat"
- UI->>UI: Show "Saving memory..."
- UI->>API: POST chatId
- API->>LLM: Generate title (Gemma E4B)
- LLM-->>API: "Discussing project ideas"
- API->>DB: Save title
- API->>LLM: Extract facts (Gemma E4B)
- LLM-->>API: ["User's name is...", "User works at...", ...]
- API->>DB: Insert new facts (deduped)
- API->>DB: Rebuild profile from all facts
- API->>API: Embed transcript chunks
- API->>DB: Insert embeddings
- API-->>UI: Done
- UI->>UI: Clear chat, ready for next
- ```
-
- Click "New chat", wait a few seconds, and the next conversation immediately sees the new memory.
-
- </details>
-
- <details>
- <summary><strong>Hardware requirements (which model fits which machine)</strong></summary>
-
- The biggest variable is which LLM you pick. RecallMEM lets you choose.
-
- ### Fully open source (Ollama + Gemma 4 locally)
-
- | Setup | Model | RAM | Speed | Quality |
- |---|---|---|---|---|
- | Phone / iPad | Gemma 4 E2B | 8GB | Fast | Basic |
- | MacBook Air / Mac Mini M4 | Gemma 4 E4B | 16GB | Fast | Good |
- | Mac Studio M2+ | Gemma 4 26B MoE | 32GB+ | Very fast | Great |
- | Workstation / server | Gemma 4 31B Dense | 32GB+ | Slower | Best |
-
- The 26B MoE is what I use as the default. It's a Mixture of Experts model, so it only activates 3.8B parameters per token even though it has 26B total. Much faster than the 31B Dense, almost the same quality. Ranked #6 globally on the Arena leaderboard.
-
- ### Using cloud providers (Claude, GPT, Groq, etc.)
-
- If you don't want to run a local LLM at all, you can plug in any cloud API:
-
- | Setup | RAM | Notes |
- |---|---|---|
- | Any laptop | ~4GB free | Just runs Postgres + the Node.js app + browser. The LLM runs on the provider's servers. |
-
- You bring your own API key. The database, memory, profile, and rules still stay on your machine. Only the chat messages get sent to the provider.
-
- **One thing to know:** when you use a cloud provider, your conversation goes to their servers. Your facts and profile get sent as part of the system prompt so the cloud LLM has context. This breaks the local-only guarantee for those specific conversations. Use Ollama for anything you want fully private.
+ That's the whole install. Here's what happens after you hit Enter:
 
- </details>
-
- <details>
- <summary><strong>CLI commands</strong></summary>
-
- ```bash
- npx recallmem # Setup if needed, then start the app
- npx recallmem init # Setup only (deps check, DB, models, env)
- npx recallmem start # Start the server (assumes setup was done)
- npx recallmem doctor # Check what's missing or broken
- npx recallmem upgrade # Pull latest code, run pending migrations
- npx recallmem version # Print version
- npx recallmem --help # Show help
- ```
+ 1. **It checks what you already have** on your Mac (Node, Postgres, Ollama). Anything already installed gets skipped.
+ 2. **It shows you a list** of what's missing with ✓ and ✗ marks.
+ 3. **It asks one question:** `Install everything now? [Y/n]`. Hit Enter to say yes.
+ 4. **It runs `brew install`** for Postgres 17, pgvector, and Ollama. You'll see real-time progress in your terminal.
+ 5. **It starts Postgres and Ollama as background services** so they keep running across reboots.
+ 6. **It downloads EmbeddingGemma** (~600 MB, ~1-2 min). This is required for the memory system.
+ 7. **It asks which Gemma 4 model you want.** Three options:
+ - **1) Gemma 4 26B:** 18 GB, fast, recommended for most people
+ - **2) Gemma 4 31B:** 19 GB, slower, smartest answers
+ - **3) Gemma 4 E2B:** 2 GB, very fast, good for testing or older laptops
+ 8. **It downloads the model you picked.** E2B finishes in 2-3 min. The 18 GB option takes 10-30 min depending on your internet.
+ 9. **It runs database migrations** (~5 seconds).
+ 10. **It builds the app for production** (~30-60 seconds, first install only).
+ 11. **It starts the server.** Open `http://localhost:3000` in your browser and start chatting.
 
- The default `npx recallmem` is what you'll use 99% of the time. It's smart about its state. On the first run it sets everything up, on subsequent runs it just starts the server.
+ Total time: **5-45 minutes** depending on which model you picked and your internet speed. Most of that is the model download. You only have to interact with it twice: once to confirm install, once to pick a model. After that, walk away.
 
- If something breaks, run `npx recallmem doctor` first. It tells you exactly what's wrong and how to fix it.
-
- </details>
+ **Subsequent runs are instant.** Just `npx recallmem` and the chat opens.
 
  <details>
- <summary><strong>Two ways to use it (just-run-it vs fork-and-hack)</strong></summary>
-
- The `npx recallmem` command auto-detects which workflow you're in.
-
- ### Workflow 1: Just run it (most users)
+ <summary><strong>Just want cloud models? (Claude / GPT)</strong></summary>
 
- You want to use RecallMEM as your daily AI tool. You don't care about the code.
+ You still need Postgres for local memory storage, but you can skip Ollama entirely:
 
  ```bash
+ brew install postgresql@17 pgvector
+ brew services start postgresql@17
  npx recallmem
  ```
 
- The CLI:
- 1. Detects nothing is installed yet
- 2. Clones the repo to `~/.recallmem` (one-time, ~50MB)
- 3. Runs `npm install` inside `~/.recallmem`
- 4. Checks your dependencies (Postgres, pgvector, Ollama)
- 5. Pulls the embedding model if missing
- 6. Asks if you want to pull a chat model (~18GB, optional)
- 7. Creates the database, runs migrations, writes the config file
- 8. Starts the server and opens the chat in your browser
-
- Subsequent runs are instant. Just `npx recallmem` and the chat opens.
-
- To upgrade later when I ship a new version:
-
- ```bash
- npx recallmem upgrade
- ```
-
- That does a `git pull`, runs `npm install` if deps changed, and applies any pending migrations.
-
- ### Workflow 2: Fork it and hack on it (developers)
-
- You want to modify the code, contribute back, run your own variant.
-
- ```bash
- git clone https://github.com/RealChrisSean/RecallMEM.git
- cd RecallMEM
- npm install
- npx recallmem
- ```
-
- The CLI detects you're already inside a recallmem checkout and uses your current directory instead of cloning to `~/.recallmem`. Hot reload works. Edits to the code are reflected immediately on the next dev server reload.
-
- Same `npx recallmem` command. Different behavior because the CLI is smart about where it's running.
-
- See [CONTRIBUTING.md](./CONTRIBUTING.md) for the dev workflow.
-
- **Testing:**
-
- ```bash
- npm test # run the suite once
- npm run test:watch # re-run on file change
- ```
-
- The test suite uses Vitest and currently covers the deterministic memory primitives (keyword inflection, the categorization router, and the regression cases that have bitten us in the past: `son` matching `Sonnet`, `work` matching `framework`, etc). It's intentionally narrow and fast (~150ms). New tests go in `test/unit/` and follow the same shape as `test/unit/facts.test.ts`. No DB or LLM required, pure functions only.
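Those regression cases are the classic substring-matching bug: a naive `includes("son")` also hits "Sonnet", and `includes("work")` hits "framework". A word-boundary regex avoids both. This is an illustrative sketch; `keywordMatches` is a hypothetical name, not the categorizer's actual API:

```typescript
// Hypothetical sketch: match a keyword only at word boundaries so "son"
// does not match "Sonnet" and "work" does not match "framework".
function keywordMatches(keyword: string, text: string): boolean {
  // Escape regex metacharacters so arbitrary keywords are safe to embed.
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}
```

This is exactly the kind of pure function the narrow Vitest suite can cover without a database or an LLM.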
-
- **Optional observability (Langfuse):**
-
- If you're hacking on RecallMEM and want full trace timelines for every chat turn (memory build, LLM generation, fact extraction, supersession decisions, etc), there's a built-in Langfuse integration. It's a peer dependency, so it's NOT installed by default and zero cost when unused.
-
- ```bash
- npm install langfuse
- ```
-
- Then set these in `.env.local`:
-
- ```
- LANGFUSE_PUBLIC_KEY=pk-lf-...
- LANGFUSE_SECRET_KEY=sk-lf-...
- LANGFUSE_BASEURL=http://localhost:3000 # optional, defaults to cloud.langfuse.com
- ```
-
- Self-host Langfuse via Docker so traces stay on your machine. This is a developer-only debugging tool. Trace payloads include the actual user message content, so don't enable it on machines where conversation contents shouldn't leave the local environment.
-
- </details>
-
- <details>
- <summary><strong>Where things live on disk (and how to fully uninstall)</strong></summary>
-
- The default install location is `~/.recallmem`. Override with `RECALLMEM_HOME=/custom/path npx recallmem` if you want it somewhere else.
-
- What's in `~/.recallmem`:
-
- - The full RecallMEM source code (cloned from GitHub)
- - `node_modules/` with all dependencies
- - `.env.local` with your config
- - The Next.js build output (when you run it)
-
- What's NOT in `~/.recallmem`:
-
- - Your conversations, facts, profile, embeddings, rules, and API keys. Those all live in your Postgres database at `/opt/homebrew/var/postgresql@17/` (Mac) or `/var/lib/postgresql/` (Linux). The Postgres data directory is the actual source of truth.
-
- To completely uninstall:
-
- ```bash
- rm -rf ~/.recallmem # Remove the app
- dropdb recallmem # Remove the database (or use the in-app "Nuke everything" button first)
- ```
-
- </details>
-
- ---
-
- ## Privacy
-
- If you only use Ollama, **nothing leaves your machine, ever**. You can air-gap the computer and it keeps working. If you add a cloud provider (Claude, GPT, etc.), only the chat messages and your assembled system prompt go to that provider's servers. Your database, embeddings, and saved API keys stay local.
-
- <details>
510
- <summary><strong>Privacy diagram + truly unrecoverable deletion</strong></summary>
511
-
512
- ```mermaid
513
- flowchart TB
514
- subgraph Local["Your machine (always private)"]
515
- DB[(Postgres database<br/>Chats, facts, profile, embeddings, API keys, rules)]
516
- App[Next.js app]
517
- Ollama_Box[Ollama]
518
- end
519
-
520
- subgraph CloudOpt["Optional cloud (only if you add a provider)"]
521
- Anthropic[Anthropic API]
522
- OpenAI_API[OpenAI API]
523
- Other[Other LLM APIs]
524
- end
525
-
526
- User[You] <--> App
527
- App <--> DB
528
- App <--> Ollama_Box
529
- App -.->|"Conversation messages<br/>+ system prompt<br/>(only if you pick a cloud provider)"| Anthropic
530
- App -.-> OpenAI_API
531
- App -.-> Other
532
-
533
- style Local fill:#dfe
534
- style CloudOpt stroke-dasharray: 5 5
535
- ```
536
-
537
- **Always on your machine, never sent anywhere:**
538
- - Your chat history
539
- - Your facts and profile
540
- - Your custom rules
541
- - Your vector embeddings
542
- - Your saved API keys
543
-
544
- **Sent only when you actively use a cloud provider:**
545
- - The current conversation messages
546
- - The system prompt (which includes your profile, facts, and rules so the cloud LLM has context)
547
-
548
- ### Truly unrecoverable deletion
549
-
550
- When you click "Wipe memory" or "Nuke everything" on the Memory page, the app runs:
551
-
552
- 1. `DELETE` to remove the rows (Postgres marks them as dead tuples, invisible to queries)
553
- 2. `VACUUM FULL <table>` to physically rewrite the table on disk and release the dead row space
554
- 3. `CHECKPOINT` to force Postgres to flush dirty pages to disk so old WAL segments can be recycled
555
-
556
- After those three steps, the data is gone from the database in any practically recoverable way.
557
-
558
- **One thing I want to be honest about:** filesystem-level forensic recovery (raw disk block scanning) is a separate problem. SSDs have wear leveling, so file overwrites don't always touch the original physical cells. The complete solution is **full-disk encryption** (FileVault on Mac, LUKS on Linux, BitLocker on Windows). With disk encryption and a strong login password, the data is genuinely unrecoverable. Not even Apple could read it.
559
-
560
- </details>
561
-
562
- ---
563
-
564
- <details>
565
- <summary><strong>What it doesn't do (yet), honest limitations</strong></summary>
566
-
567
- I'm being honest about the limitations. This is v0.1.
568
-
569
- - **No voice yet.** It's text only. I want to add Whisper for speech-to-text and Piper for text-to-speech, both local. On the roadmap.
570
- **Web search works with Anthropic and Ollama; OpenAI isn't wired up yet.** Anthropic uses the native `web_search_20250305` tool, no setup. Ollama (Gemma) uses **Brave Search** as a backend, which needs a free API key (about 5 minutes): sign up at [brave.com/search/api](https://brave.com/search/api), pick the Free tier (2,000 searches/month), add `BRAVE_SEARCH_API_KEY=your_key_here` to your `.env.local`, and restart RecallMEM. The first time you toggle web search in the chat UI, you'll see a privacy modal explaining that Brave will see your message text but NOT your memory, profile, facts, or past conversations. If the key isn't set or the quota is exhausted, the toggle still works but the AI tells you what to do instead of failing silently. OpenAI's native web search requires the Responses API path, which isn't plumbed through yet.
571
- - **No multi-user.** This is a personal app for one person on one machine. If you want a multi-user version, that's a separate fork.
572
- - **Reasoning models (OpenAI o1/o3, Claude extended thinking) might have edge cases.** They use different API parameters that I don't fully handle yet. Standard chat models work fine.
573
- - **OpenAI vision isn't fully wired up.** Gemma 4 (4B and up) handles images natively via Ollama. OpenAI uses a different format that I haven't plumbed through. Use Ollama or Anthropic for images.
574
- - **No mobile app.** It's a web app you run locally. You access it from your browser at `localhost:3000`. A native iOS/Android app is theoretically possible but it's a separate project I haven't started.
575
- - **Fact supersession is LLM-judged and conservative.** The local Gemma extractor decides whether a new fact contradicts an old one. It's intentionally cautious (only retires a fact when the replacement is unambiguous), so it might occasionally miss a real contradiction or, more rarely, retire something it shouldn't have. You can always inspect and edit/restore in the Memory page. For higher-stakes use cases, you'd want a stricter rule-based supersession layer on top, or a periodic profile-rebuild from full history.
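On the supersession point, a conservative policy can be as simple as requiring both a matching subject and an unambiguous judge verdict before retiring anything. This is an illustrative shape only — the real extractor's fields and verdict labels may differ:

```javascript
// Illustrative conservative supersession check, NOT the real extractor.
function shouldSupersede(oldFact, newFact, judgeVerdict) {
  // Facts about different subjects never collide, no matter what the judge says.
  if (oldFact.subject !== newFact.subject) return false;
  // Only an unambiguous "contradicts" verdict retires the old fact;
  // "related" or "unclear" keeps both, erring on the side of remembering.
  return judgeVerdict === "contradicts";
}
```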
85
+ After the app starts, go to **Settings → Providers → Add a new provider**, paste your API key, and pick that model from the chat dropdown.
576
86
 
577
87
  </details>
578
88
 
579
89
  <details>
580
- <summary><strong>Tech stack</strong></summary>
581
-
582
- - **Frontend / Backend:** Next.js 16 (App Router) + TypeScript + Tailwind CSS v4
583
- - **Database:** Postgres 17 + pgvector (HNSW vector indexes)
584
- - **Local LLM:** Ollama with Gemma 4 (E2B / E4B / 26B MoE / 31B Dense)
585
- - **Embeddings:** EmbeddingGemma 300M (768 dimensions, runs in Ollama)
586
- - **PDF parsing:** pdf-parse v2
587
- - **Markdown rendering:** react-markdown + remark-gfm + @tailwindcss/typography
588
- - **Cloud LLM transports (optional):** Anthropic Messages API, OpenAI Chat Completions, OpenAI-compatible
589
-
590
- </details>
591
-
592
- <details>
593
- <summary><strong>Manual install (for the curious or for when <code>npx recallmem</code> can't be used)</strong></summary>
594
-
595
- If you want to know what `npx recallmem` is doing under the hood, or you don't want to use the CLI for some reason, here's the manual install.
596
-
597
- ### macOS
598
-
599
- ```bash
600
- # 1. Install Node.js
601
- brew install node
602
-
603
- # 2. Install Postgres 17 + pgvector
604
- brew install postgresql@17 pgvector
605
- brew services start postgresql@17
606
-
607
- # 3. Install Ollama (skip if using cloud only)
608
- brew install ollama
609
- brew services start ollama
610
-
611
- # 4. Pull the models
612
- ollama pull embeddinggemma # ~600MB, REQUIRED
613
- ollama pull gemma4:26b # ~18GB, recommended chat model
614
- ollama pull gemma4:e4b # ~4GB, fast model for background tasks
615
- ```
90
+ <summary><strong>Linux (not officially supported, manual install)</strong></summary>
616
91
 
617
- ### Linux (Ubuntu/Debian)
92
+ Auto-install isn't wired up for Linux. You'll need to install everything by hand:
618
93
 
619
94
  ```bash
620
- # 1. Node.js
621
- curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
622
- sudo apt install -y nodejs
623
-
624
- # 2. Postgres + pgvector
95
+ # Postgres + pgvector (apt example)
625
96
  sudo apt install postgresql-17 postgresql-17-pgvector
626
97
  sudo systemctl start postgresql
627
98
 
628
- # 3. Ollama
99
+ # Ollama
629
100
  curl -fsSL https://ollama.com/install.sh | sh
630
-
631
- # 4. Pull models
101
+ sudo systemctl start ollama
632
102
  ollama pull embeddinggemma
633
103
  ollama pull gemma4:26b
634
- ollama pull gemma4:e4b
635
- ```
636
-
637
- ### Windows
638
-
639
- Use WSL2 with Ubuntu and follow the Linux steps. Native Windows works too but it's rougher.
640
-
641
- ### Setup
642
104
 
643
- ```bash
644
- # 1. Clone the repo
645
- git clone https://github.com/RealChrisSean/RecallMEM.git
646
- cd RecallMEM
647
-
648
- # 2. Install dependencies
649
- npm install
650
-
651
- # 3. Create the database
652
- createdb recallmem
653
-
654
- # 4. Run migrations
655
- npm run migrate
656
-
657
- # 5. Configure .env.local
658
- cat > .env.local <<EOF
659
- DATABASE_URL=postgres://$USER@localhost:5432/recallmem
660
- OLLAMA_URL=http://localhost:11434
661
- OLLAMA_CHAT_MODEL=gemma4:26b
662
- OLLAMA_FAST_MODEL=gemma4:e4b
663
- OLLAMA_EMBED_MODEL=embeddinggemma
664
- EOF
665
-
666
- # 6. Start the dev server
667
- npm run dev
105
+ # Run
106
+ npx recallmem
668
107
  ```
669
108
 
670
- Open [http://localhost:3000](http://localhost:3000).
671
-
672
109
  </details>
673
110
 
674
111
  <details>
675
- <summary><strong>Troubleshooting (the real gotchas I hit)</strong></summary>
112
+ <summary><strong>Windows (not supported, use WSL2)</strong></summary>
676
113
 
677
- Stuff I've actually hit. If you run into something else, run `npx recallmem doctor` first. It tells you exactly what's broken.
114
+ Native Windows is not supported. Use [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) with Ubuntu and follow the Linux steps above inside WSL.
678
115
 
679
- **`createdb: command not found`**
116
+ </details>
117
+
118
+ ## CLI commands
680
119
 
681
- Add Postgres to your PATH:
682
120
  ```bash
683
- export PATH="/opt/homebrew/opt/postgresql@17/bin:$PATH"
121
+ npx recallmem # Setup if needed, then start the app
122
+ npx recallmem init # Setup only (deps, DB, models, env)
123
+ npx recallmem start # Start the server (assumes setup done)
124
+ npx recallmem doctor # Check what's missing or broken
125
+ npx recallmem upgrade # Pull latest code, run pending migrations
126
+ npx recallmem version # Print version
684
127
  ```
685
128
 
686
- **`extension "vector" is not available`**
129
+ ## Privacy
687
130
 
688
- You're running Postgres 16 or older. The `pgvector` Homebrew bottle only ships extensions for Postgres 17 and 18. Switch to `postgresql@17`. I learned this the hard way. The install error message is cryptic and the fix took me 30 minutes the first time.
131
+ If you only use Ollama, **nothing leaves your machine, ever.** You can air-gap the computer and it keeps working. If you add a cloud provider, only the chat messages and your assembled system prompt go to that provider's servers. Your database, embeddings, and saved API keys stay local.
689
132
 
690
- **Ollama silently fails to pull a new model**
133
+ ## For developers
691
134
 
692
- You've got a version mismatch between the Ollama CLI and the Ollama server. This bites you if you have both Homebrew Ollama AND the desktop Ollama app installed. Check `ollama --version`. Both client and server should match.
135
+ Underneath the chat UI, RecallMEM is a **deterministic memory framework** you can fork and use in your own AI app. The whole `lib/` folder is intentionally framework-shaped.
693
136
 
694
- ```bash
695
- brew upgrade ollama
696
- pkill -f "Ollama" # kill the old desktop app server
697
- brew services start ollama # start the new server from Homebrew
137
+ ```
138
+ lib/
139
+ ├── memory.ts Memory orchestrator (profile + facts + vector recall in parallel)
140
+ ├── prompts.ts System prompt assembly with all memory context
141
+ ├── facts.ts Fact extraction (LLM proposes) + validation (TypeScript decides)
142
+ ├── profile.ts Synthesizes a structured profile from active facts
143
+ ├── chunks.ts Transcript splitting, embedding, vector search
144
+ ├── chats.ts Chat CRUD + transcript serialization
145
+ ├── post-chat.ts Post-chat pipeline (title, facts, profile rebuild, embed)
146
+ ├── rules.ts Custom user rules / instructions
147
+ ├── embeddings.ts EmbeddingGemma calls via Ollama
148
+ ├── llm.ts LLM router (Ollama, Anthropic, OpenAI, OpenAI-compatible)
149
+ └── db.ts Postgres pool + configurable user ID resolver
698
150
  ```
699
151
 
700
- **Gemma 4 31B is slow**
701
-
702
- Two reasons:
703
-
704
- 1. **Thinking mode is on.** The app already disables it via `think: false`, but if you bypass the app and call Ollama directly, you'll see slow responses. Gemma 4 spends a ton of tokens "thinking" before answering when it's enabled.
705
- 2. **Dense vs MoE.** 31B Dense activates all 31B parameters per token. Switch to `gemma4:26b` (Mixture of Experts, only 3.8B active per token) for ~3-5x the speed with minimal quality loss. This is what I use as the default.
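If you're calling Ollama directly and hitting the slow path, disable thinking the same way the app does. A minimal request-body sketch — the `think: false` field comes from the text above; the rest of the shape follows Ollama's chat API, and the helper name is illustrative:

```javascript
// Build an Ollama chat request with Gemma's thinking mode off, as the app does.
function buildChatRequest(model, messages) {
  return { model, messages, stream: false, think: false };
}

const body = buildChatRequest("gemma4:26b", [{ role: "user", content: "hi" }]);
// Send with: fetch("http://localhost:11434/api/chat", { method: "POST", body: JSON.stringify(body) })
```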
706
-
707
- **"My memory isn't being used in new chats"**
152
+ Wire in your own auth with two calls at startup, and every lib function respects it. See the [developer docs](./docs/DEVELOPERS.md) for embedding the memory layer in your own app, the database schema, testing, and optional Langfuse observability.
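The "two calls" shape can be sketched as a resolver you register once at startup. The function names here are hypothetical, not the actual `lib/db.ts` exports:

```javascript
// Hypothetical resolver registry; the real names in lib/db.ts may differ.
let resolveUserId = () => "local-user"; // default: single-user mode

function setUserIdResolver(fn) {
  resolveUserId = fn; // call once at startup with your auth logic
}

function currentUserId() {
  return resolveUserId(); // every lib function asks this for the user ID
}

// Example: wire in your own session-based auth
setUserIdResolver(() => "user-42");
```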
708
153
 
709
- Make sure you click "New chat" (or switch to another chat in the sidebar) to trigger the synchronous "Saving memory..." finalize step. If you just refresh the browser without ending the chat, the post-chat pipeline runs as a best-effort `sendBeacon()` and may not finish before the next chat starts.
154
+ ## Docs
710
155
 
711
- The fix: always click "New chat" or switch chats in the sidebar before closing the browser if you said something you want remembered.
156
+ | Doc | What's in it |
157
+ |---|---|
158
+ | [Architecture deep dive](./docs/ARCHITECTURE.md) | How deterministic memory works, read/write paths, validation pipeline, why the LLM is not in charge |
159
+ | [Developer guide](./docs/DEVELOPERS.md) | Embedding the memory framework, auth wiring, schema, testing, Langfuse setup |
160
+ | [Hardware guide](./docs/HARDWARE.md) | Which model fits which machine, RAM requirements, cloud vs. local tradeoffs |
161
+ | [Troubleshooting](./docs/TROUBLESHOOTING.md) | Every gotcha I've hit and how to fix it |
162
+ | [Manual install](./docs/MANUAL_INSTALL.md) | Step-by-step if you don't want to use the CLI |
712
163
 
713
- </details>
164
+ ## Limitations (v0.1)
714
165
 
715
- ---
166
+ Text only (no voice yet). No multi-user. No mobile app. OpenAI vision not fully wired. Reasoning models (o1/o3, extended thinking) may have edge cases. Fact supersession is LLM-judged and intentionally conservative. See the [full limitations list](./docs/LIMITATIONS.md).
716
167
 
717
168
  ## Contributing
718
169
 
719
- Forks, PRs, bug reports, ideas, all welcome. See [CONTRIBUTING.md](./CONTRIBUTING.md) for the dev setup and how the codebase is organized.
720
-
721
- If you build something cool on top of RecallMEM, I'd love to hear about it.
170
+ Forks, PRs, bug reports, ideas, all welcome. See [CONTRIBUTING.md](./CONTRIBUTING.md) for the dev setup.
722
171
 
723
172
  ## License
724
173
 
725
- Apache License 2.0. See [LICENSE](./LICENSE) for the full text and [NOTICE](./NOTICE) for third-party attributions. You can use, modify, fork, and redistribute this for any purpose, personal or commercial. The license includes a patent grant and the standard "no warranty, no liability" disclaimer.
174
+ Apache 2.0. See [LICENSE](./LICENSE) and [NOTICE](./NOTICE). Use it, modify it, fork it, ship it commercially.
726
175
 
727
176
  ## Status
728
177
 
729
- This is v0.1. It works. I use it every day.
730
-
731
- It's also not "production ready" in the corporate sense. There's no CI, no error monitoring, no SLA. There's a small Vitest test suite that covers the deterministic memory primitives (keyword routing, inflection, regression cases), but it's intentionally narrow. If you want to use it as your daily AI tool, fork it, make it yours, and expect to read the code if something breaks. That's the deal.
732
-
733
- I built RecallMEM because I wanted my own private AI. I'm sharing it because there's a real gap in the local AI ecosystem and someone needed to fill it. If this is useful to you, that's cool. If not, no hard feelings.
178
+ v0.1. It works. I use it every day. There's no CI, no error monitoring, no SLA. If you want to use it as your daily AI tool, fork it, make it yours, and expect to read the code if something breaks. That's the deal.
734
179
 
735
- The repo: [github.com/RealChrisSean/RecallMEM](https://github.com/RealChrisSean/RecallMEM)
180
+ [github.com/RealChrisSean/RecallMEM](https://github.com/RealChrisSean/RecallMEM)
@@ -1,21 +1,26 @@
1
1
  /**
2
2
  * recallmem init / setup
3
3
  *
4
- * Idempotent setup pipeline:
5
- * 1. Check Node.js version
6
- * 2. Check Postgres is installed and running
7
- * 3. Check pgvector is available
8
- * 4. Create the database if missing
9
- * 5. Run migrations
10
- * 6. Check Ollama (optional - skip if user wants cloud-only)
11
- * 7. Pull embeddinggemma (required, ~600MB)
12
- * 8. Offer to pull gemma4:26b (recommended chat model, ~18GB)
13
- * 9. Generate .env.local with sensible defaults
4
+ * Real installer (not a hint-giver). Detects what's missing, asks the user
5
+ * ONE yes/no question, then installs everything for them. Then asks which
6
+ * Gemma 4 model to download. Then runs the app.
7
+ *
8
+ * Pipeline:
9
+ * 1. Check Node 20+ (hard fail if missing - we can't bootstrap node from npx)
10
+ * 2. Detect missing pieces (Postgres, pgvector, Ollama)
11
+ * 3. Show one summary + one prompt: "install everything? Y/n"
12
+ * 4. Run brew install / brew services start for everything missing
13
+ * 5. Verify each piece is actually up before moving on
14
+ * 6. Pull EmbeddingGemma (always, required for memory)
15
+ * 7. Ask which Gemma 4 chat model to install (1, 2, or 3)
16
+ * 8. Run migrations
17
+ * 9. Production build (skipped in dev mode)
18
+ * 10. Done
14
19
  */
15
20
 
16
21
  const fs = require("node:fs");
17
22
  const path = require("node:path");
18
- const { spawn, spawnSync, execSync } = require("node:child_process");
23
+ const { spawnSync, execSync } = require("node:child_process");
19
24
 
20
25
  const {
21
26
  getOS,
@@ -26,6 +31,7 @@ const {
26
31
  detectOllama,
27
32
  detectOllamaModel,
28
33
  detectDatabase,
34
+ commandExists,
29
35
  } = require("../lib/detect");
30
36
 
31
37
  const {
@@ -46,7 +52,7 @@ const {
46
52
  blank,
47
53
  } = require("../lib/output");
48
54
 
49
- const { confirm } = require("../lib/prompt");
55
+ const { confirm, ask } = require("../lib/prompt");
50
56
 
51
57
  const DEFAULT_DB_NAME = "recallmem";
52
58
 
@@ -71,6 +77,54 @@ function writeEnv(envPath, env) {
71
77
  fs.writeFileSync(envPath, lines.join("\n") + "\n");
72
78
  }
73
79
 
80
+ // Run a shell command and stream output to the user. Returns true on success.
81
+ function run(command, args, label) {
82
+ if (label) step(label);
83
+ const result = spawnSync(command, args, { stdio: "inherit" });
84
+ return result.status === 0;
85
+ }
86
+
87
+ // Wait up to N seconds for a service to become ready. Used after starting
88
+ // brew services so we don't race ahead before postgres/ollama is actually up.
89
+ async function waitFor(checkFn, timeoutMs = 15000, intervalMs = 500) {
90
+ const start = Date.now();
91
+ while (Date.now() - start < timeoutMs) {
92
+ if (await checkFn()) return true;
93
+ await new Promise((r) => setTimeout(r, intervalMs));
94
+ }
95
+ return false;
96
+ }
97
+
98
+ // Pretty model menu — short lines, plain words, dyslexia-friendly.
99
+ async function pickGemmaModel() {
100
+ blank();
101
+ console.log(color.bold("Pick a Gemma 4 model."));
102
+ blank();
103
+ console.log(" 1) Gemma 4 26B");
104
+ console.log(" Size: 18 GB");
105
+ console.log(" Speed: Fast");
106
+ console.log(" Best for: Most people. Recommended.");
107
+ blank();
108
+ console.log(" 2) Gemma 4 31B");
109
+ console.log(" Size: 19 GB");
110
+ console.log(" Speed: Slower");
111
+ console.log(" Best for: People who want the smartest answers, even if it takes longer.");
112
+ blank();
113
+ console.log(" 3) Gemma 4 E2B");
114
+ console.log(" Size: 2 GB");
115
+ console.log(" Speed: Very fast");
116
+ console.log(" Best for: A quick test. Or older laptops.");
117
+ blank();
118
+
119
+ while (true) {
120
+ const answer = await ask("Type 1, 2, or 3 and press Enter [1]: ");
121
+ if (!answer || answer === "1") return { id: "gemma4:26b", label: "Gemma 4 26B" };
122
+ if (answer === "2") return { id: "gemma4:31b", label: "Gemma 4 31B" };
123
+ if (answer === "3") return { id: "gemma4:e2b", label: "Gemma 4 E2B" };
124
+ console.log(" Type 1, 2, or 3.");
125
+ }
126
+ }
127
+
74
128
  async function setupCommand(opts = {}) {
75
129
  const {
76
130
  silent = false,
@@ -79,9 +133,10 @@ async function setupCommand(opts = {}) {
79
133
  devMode = false,
80
134
  } = opts;
81
135
  const ENV_PATH = path.join(installPath, ".env.local");
136
+ const os = getOS();
82
137
 
83
- // ─── Step 1: Node.js ───────────────────────────────────────────────────
84
- if (!silent) section("Checking dependencies");
138
+ // ─── Step 1: Node.js (hard requirement, we're already running on it) ───
139
+ if (!silent) section("Checking what you have");
85
140
  const node = detectNode();
86
141
  if (!node.ok) {
87
142
  fail(`Node.js ${node.version} is too old (need ${node.needed}+)`);
@@ -91,54 +146,157 @@ async function setupCommand(opts = {}) {
91
146
  }
92
147
  success(`Node.js ${node.version}`);
93
148
 
94
- // ─── Step 2: Postgres ──────────────────────────────────────────────────
95
- const pg = detectPostgres();
96
- if (!pg.installed) {
97
- fail("Postgres not found");
98
- blank();
99
- console.log(postgresInstallHint());
149
+ // ─── Step 2: Detect everything else ───────────────────────────────────
150
+ let pg = detectPostgres();
151
+ let pgService = pg.installed ? detectPostgresService() : { running: false };
152
+ let ollama = detectOllama();
153
+
154
+ // ─── Step 3: Print a summary of what's there and what's missing ───────
155
+ blank();
156
+ console.log(pg.installed && pg.ok
157
+ ? ` ✓ Postgres ${pg.major}`
158
+ : " ✗ Postgres 17 with pgvector — missing");
159
+ console.log(pgService.running
160
+ ? " ✓ Postgres is running"
161
+ : pg.installed
162
+ ? " ✗ Postgres is installed but not running"
163
+ : " ✗ Postgres is not running");
164
+ console.log(ollama.installed && ollama.running
165
+ ? " ✓ Ollama is running"
166
+ : ollama.installed
167
+ ? " ✗ Ollama is installed but not running"
168
+ : " ✗ Ollama — missing");
169
+ blank();
170
+
171
+ const needPostgres = !pg.installed || !pg.ok;
172
+ const needPostgresStart = pg.installed && pg.ok && !pgService.running;
173
+ const needOllama = !ollama.installed;
174
+ const needOllamaStart = ollama.installed && !ollama.running;
175
+ const anythingMissing = needPostgres || needPostgresStart || needOllama || needOllamaStart;
176
+
177
+ if (anythingMissing) {
178
+ // Check Homebrew is available before offering auto-install on Mac
179
+ const hasBrew = os === "mac" && commandExists("brew");
180
+
181
+ if (os === "mac" && !hasBrew) {
182
+ fail("Homebrew is required to auto-install dependencies.");
183
+ blank();
184
+ console.log("Install Homebrew first by pasting this in your terminal:");
185
+ console.log("");
186
+ console.log(" /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"");
187
+ console.log("");
188
+ console.log("Then re-run: npx recallmem");
189
+ return { ok: false };
190
+ }
191
+
192
+ if (os !== "mac" && os !== "linux") {
193
+ fail(`Auto-install isn't supported on ${os}.`);
194
+ info("On Windows, use WSL2 with Ubuntu and re-run npx recallmem inside WSL.");
195
+ return { ok: false };
196
+ }
197
+
198
+ if (os === "linux") {
199
+ // Linux is doable but we can't run apt without sudo, and the package
200
+ // names vary by distro. Print clear instructions and exit.
201
+ fail("Auto-install is currently only set up for Mac (Homebrew).");
202
+ blank();
203
+ console.log("On Linux, install these manually then re-run npx recallmem:");
204
+ console.log("");
205
+ console.log(" Postgres 17 with pgvector (your distro's package manager)");
206
+ console.log(" Ollama: curl -fsSL https://ollama.com/install.sh | sh");
207
+ console.log(" Then: systemctl start postgresql && systemctl start ollama");
208
+ return { ok: false };
209
+ }
210
+
211
+ // Mac path: ask once, install everything
100
212
  blank();
101
- info("Once installed, re-run: npx recallmem");
102
- return { ok: false };
103
- }
104
- if (!pg.ok) {
105
- fail(`Postgres ${pg.major} found, but version 17+ is required`);
213
+ console.log("I can install and start the missing pieces for you using Homebrew.");
214
+ console.log("This takes about 2-5 minutes (not counting the model download).");
106
215
  blank();
107
- console.log(postgresInstallHint());
108
- return { ok: false };
109
- }
110
- success(`Postgres ${pg.major}`);
111
-
112
- // ─── Step 3: Postgres service running ──────────────────────────────────
113
- const pgService = detectPostgresService();
114
- if (!pgService.running) {
115
- warn("Postgres is installed but not running");
116
- if (getOS() === "mac") {
117
- info("Try: brew services start postgresql@17");
118
- } else if (getOS() === "linux") {
119
- info("Try: sudo systemctl start postgresql");
216
+ const wantsInstall = await confirm("Install everything now?", true);
217
+ if (!wantsInstall) {
218
+ blank();
219
+ info("Skipped. You can install manually with:");
220
+ if (needPostgres) console.log(" brew install postgresql@17 pgvector");
221
+ if (needPostgresStart || needPostgres) console.log(" brew services start postgresql@17");
222
+ if (needOllama) console.log(" brew install ollama");
223
+ if (needOllamaStart || needOllama) console.log(" brew services start ollama");
224
+ return { ok: false };
120
225
  }
121
- return { ok: false };
226
+
227
+ // Install Postgres if missing
228
+ if (needPostgres) {
229
+ if (!run("brew", ["install", "postgresql@17", "pgvector"], "Installing Postgres 17 + pgvector...")) {
230
+ fail("Failed to install Postgres. Try running it manually and re-run npx recallmem.");
231
+ return { ok: false };
232
+ }
233
+ success("Installed Postgres 17 + pgvector");
234
+ // Re-detect after install
235
+ pg = detectPostgres();
236
+ }
237
+
238
+ // Start Postgres if not running
239
+ if (!pgService.running || needPostgres) {
240
+ step("Starting Postgres in the background...");
241
+ run("brew", ["services", "start", "postgresql@17"]);
242
+ // Wait for it to actually accept connections
243
+ const isUp = await waitFor(() => {
244
+ const r = detectPostgresService();
245
+ return r.running;
246
+ });
247
+ if (!isUp) {
248
+ fail("Postgres started but isn't accepting connections after 15s.");
249
+ info("Try: brew services restart postgresql@17");
250
+ return { ok: false };
251
+ }
252
+ success("Postgres is running on localhost:5432");
253
+ pgService = { running: true };
254
+ }
255
+
256
+ // Install Ollama if missing
257
+ if (needOllama) {
258
+ if (!run("brew", ["install", "ollama"], "Installing Ollama...")) {
259
+ fail("Failed to install Ollama. Try running it manually and re-run npx recallmem.");
260
+ return { ok: false };
261
+ }
262
+ success("Installed Ollama");
263
+ ollama = detectOllama();
264
+ }
265
+
266
+ // Start Ollama if not running
267
+ if (!ollama.running || needOllama) {
268
+ step("Starting Ollama in the background...");
269
+ run("brew", ["services", "start", "ollama"]);
270
+ const isUp = await waitFor(() => {
271
+ const r = detectOllama();
272
+ return r.running;
273
+ });
274
+ if (!isUp) {
275
+ fail("Ollama started but isn't responding after 15s.");
276
+ info("Try: brew services restart ollama");
277
+ return { ok: false };
278
+ }
279
+ success("Ollama is running on localhost:11434");
280
+ ollama = detectOllama();
281
+ }
282
+ } else {
283
+ success("Everything is already installed and running.");
122
284
  }
123
- success("Postgres service running on localhost:5432");
124
285
 
125
286
  // ─── Step 4: env file (we need DATABASE_URL before checking pgvector) ──
126
287
  const env = readEnv(ENV_PATH);
127
288
  let connectionString = env.DATABASE_URL;
128
-
129
289
  if (!connectionString) {
130
290
  connectionString = defaultConnectionString();
131
- step(`No .env.local found, will create one with default DATABASE_URL`);
132
291
  }
133
292
 
134
293
  // ─── Step 5: Database exists ───────────────────────────────────────────
135
- // Extract the database name from the connection string for accurate messages
136
294
  const dbNameMatch = connectionString.match(/\/([^/?]+)(\?|$)/);
137
295
  const dbName = dbNameMatch ? dbNameMatch[1] : DEFAULT_DB_NAME;
138
296
 
139
297
  const dbCheck = await detectDatabase(connectionString);
140
298
  if (!dbCheck.exists) {
141
- step(`Database '${dbName}' not found, creating...`);
299
+ step(`Creating database '${dbName}'...`);
142
300
  try {
143
  execSync(`${pg.psqlPath.replace(/psql$/, "createdb")} ${dbName}`, {
  stdio: "pipe",
@@ -164,7 +322,7 @@ async function setupCommand(opts = {}) {
  success("pgvector extension available");

  // ─── Step 7: Run migrations ────────────────────────────────────────────
- step("Running migrations...");
+ step("Running database migrations...");
  try {
  process.env.DATABASE_URL = connectionString;
  const migrateResult = spawnSync("npx", ["tsx", "scripts/migrate.ts"], {
@@ -181,84 +339,59 @@ async function setupCommand(opts = {}) {
  return { ok: false };
  }

- // ─── Step 8: Ollama (optional) ─────────────────────────────────────────
- section("Checking LLM runtime");
- const ollama = detectOllama();
- let ollamaUrl = env.OLLAMA_URL || "http://localhost:11434";
-
- if (!ollama.installed) {
- warn("Ollama not installed (optional - you can use cloud providers instead)");
- blank();
- console.log(ollamaInstallHint());
- blank();
- info("Continuing without Ollama. You can add Claude/OpenAI as a provider in the app.");
- blank();
- } else if (!ollama.running) {
- warn("Ollama is installed but not running");
- if (getOS() === "mac") {
- info("Try: brew services start ollama");
- } else {
- info("Try: ollama serve");
- }
- blank();
- } else {
- success(`Ollama running (${ollama.version || "unknown version"})`);
-
- // ─── Step 9: Required model: embeddinggemma ──────────────────────────
+ // ─── Step 8: Always pull EmbeddingGemma (required for memory) ──────────
+ if (ollama.running) {
+ section("Setting up models");
  const embedModel = await detectOllamaModel("embeddinggemma");
  if (!embedModel.installed) {
- step("Pulling embeddinggemma (~600MB, required for vector search)...");
+ step("Downloading EmbeddingGemma (~600 MB, required for memory)...");
  try {
  execSync("ollama pull embeddinggemma", { stdio: "inherit" });
- success("Pulled embeddinggemma");
+ success("EmbeddingGemma installed");
  } catch (err) {
  fail(`Failed to pull embeddinggemma: ${err.message}`);
  return { ok: false };
  }
  } else {
- success("embeddinggemma installed");
+ success("EmbeddingGemma already installed");
  }

- // ─── Step 10: Recommended model: gemma4:26b ──────────────────────────
- const chatModel = await detectOllamaModel("gemma4:26b");
- if (!chatModel.installed && !skipIfDone) {
- blank();
- info("Recommended chat model: gemma4:26b (~18GB)");
- info("Optional - you can use cloud providers (Claude, GPT) instead.");
- const wantsPull = await confirm("Pull gemma4:26b now?", false);
- if (wantsPull) {
- try {
- execSync("ollama pull gemma4:26b", { stdio: "inherit" });
- success("Pulled gemma4:26b");
- } catch (err) {
- fail(`Failed to pull gemma4:26b: ${err.message}`);
- info("You can pull it later with: ollama pull gemma4:26b");
- }
- } else {
- info("Skipped. You can pull it later with: ollama pull gemma4:26b");
+ // ─── Step 9: Pick a Gemma 4 chat model ───────────────────────────────
+ // Check if any Gemma 4 chat model is already installed first.
+ const has26 = await detectOllamaModel("gemma4:26b");
+ const has31 = await detectOllamaModel("gemma4:31b");
+ const hasE2 = await detectOllamaModel("gemma4:e2b");
+ const hasAny = has26.installed || has31.installed || hasE2.installed;
+
+ // Always show the picker when no Gemma chat model is installed.
+ // skipIfDone is intentionally NOT checked here - on a fresh machine
+ // we MUST pull a model or the chat 404s on first message.
+ if (!hasAny) {
+ const choice = await pickGemmaModel();
+ step(`Downloading ${choice.label}... (this can take a while)`);
+ try {
+ execSync(`ollama pull ${choice.id}`, { stdio: "inherit" });
+ success(`${choice.label} installed`);
+ } catch (err) {
+ fail(`Failed to pull ${choice.id}: ${err.message}`);
+ info(`You can pull it later with: ollama pull ${choice.id}`);
  }
- } else if (chatModel.installed) {
- success("gemma4:26b installed");
+ } else if (hasAny) {
+ success("A Gemma 4 chat model is already installed");
  }
  }

- // ─── Step 11: Write .env.local ─────────────────────────────────────────
- section("Writing config");
+ // ─── Step 10: Write .env.local ─────────────────────────────────────────
  const finalEnv = {
  DATABASE_URL: env.DATABASE_URL || connectionString,
- OLLAMA_URL: env.OLLAMA_URL || ollamaUrl,
+ OLLAMA_URL: env.OLLAMA_URL || "http://localhost:11434",
  OLLAMA_CHAT_MODEL: env.OLLAMA_CHAT_MODEL || "gemma4:26b",
  OLLAMA_FAST_MODEL: env.OLLAMA_FAST_MODEL || "gemma4:e4b",
  OLLAMA_EMBED_MODEL: env.OLLAMA_EMBED_MODEL || "embeddinggemma",
  };
- const existedBefore = fs.existsSync(ENV_PATH);
  writeEnv(ENV_PATH, finalEnv);
- success(`.env.local ${existedBefore ? "updated" : "created"}`);

- // ─── Step 12: Production build (skipped in dev mode) ──────────────────
- // End users get a production build for speed and to avoid dev-mode hot-reload
- // hydration warnings. Developers in their own checkout (devMode=true) skip the
- // build so they get hot reload via `next dev`.
+ // ─── Step 11: Production build (skipped in dev mode) ──────────────────
  if (!devMode) {
  const hasBuild = fs.existsSync(
  path.join(installPath, ".next", "BUILD_ID")
@@ -279,16 +412,20 @@ async function setupCommand(opts = {}) {
  }
  } catch (err) {
  warn(`Build failed: ${err.message}`);
- info("Falling back to dev mode at runtime");
  }
- } else {
- success("Production build already exists");
  }
  }

  blank();
  success(color.bold("Setup complete!"));
  blank();
+ console.log("Want a different Gemma 4 model later? Run one of these:");
+ console.log(" ollama pull gemma4:26b");
+ console.log(" ollama pull gemma4:31b");
+ console.log(" ollama pull gemma4:e2b");
+ console.log("");
+ console.log("Then pick it from the dropdown at the top of the chat.");
+ blank();

  return { ok: true };
  }
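The new Step 9 flow (check whether any Gemma 4 chat model is already installed, and show the picker only when none is) can be sketched as a standalone script. `detectOllamaModel` and `pickGemmaModel` are the setup script's own helpers; they are stubbed below, and the stubbed behavior (nothing installed, default choice returned) is an assumption for illustration only:

```javascript
// Sketch of the Step 9 model-selection flow, with the helpers stubbed
// so the control flow runs standalone.
const GEMMA_CHOICES = [
  { id: "gemma4:26b", label: "Gemma 4 26B (~18 GB)" },
  { id: "gemma4:31b", label: "Gemma 4 31B" },
  { id: "gemma4:e2b", label: "Gemma 4 E2B" },
];

// Stub: pretend no model is installed yet (a real helper queries Ollama).
async function detectOllamaModel(_id) {
  return { installed: false };
}

// Stub: a real implementation prompts the user; default to the first choice.
async function pickGemmaModel() {
  return GEMMA_CHOICES[0];
}

async function chooseChatModel() {
  const checks = await Promise.all(
    GEMMA_CHOICES.map((c) => detectOllamaModel(c.id))
  );
  const hasAny = checks.some((c) => c.installed);
  if (hasAny) return null; // something is installed: nothing to pull
  // Fresh machine: a model must be pulled or the first chat message 404s.
  return pickGemmaModel();
}

chooseChatModel().then((choice) => {
  console.log(choice ? `would pull: ${choice.id}` : "already installed");
  // → "would pull: gemma4:26b"
});
```

When any of the three models is already present, `chooseChatModel` resolves to `null` and the pull is skipped, which matches the diff's comment that `skipIfDone` is deliberately ignored here: a fresh install must always end with at least one chat model.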
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "recallmem",
- "version": "0.1.0",
+ "version": "0.1.2",
  "description": "Private, local-first AI chatbot with persistent working memory. One command install via npx.",
  "license": "Apache-2.0",
  "author": "Chris Sean",
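The Step 10 config write in the setup.js diff above keeps any value already present in `.env.local` and only fills gaps with defaults, so re-running setup never clobbers a user's overrides. A minimal sketch of that precedence rule (the function name `mergeEnv` is hypothetical; the keys and defaults are taken from the diff):

```javascript
// Existing env values win over defaults via `||`, per the finalEnv
// object in setup.js. The keys and fallback values mirror the diff.
function mergeEnv(env, connectionString) {
  return {
    DATABASE_URL: env.DATABASE_URL || connectionString,
    OLLAMA_URL: env.OLLAMA_URL || "http://localhost:11434",
    OLLAMA_CHAT_MODEL: env.OLLAMA_CHAT_MODEL || "gemma4:26b",
    OLLAMA_FAST_MODEL: env.OLLAMA_FAST_MODEL || "gemma4:e4b",
    OLLAMA_EMBED_MODEL: env.OLLAMA_EMBED_MODEL || "embeddinggemma",
  };
}

const merged = mergeEnv(
  { OLLAMA_CHAT_MODEL: "gemma4:e2b" }, // user already picked a smaller model
  "postgres://localhost/recallmem"
);
console.log(merged.OLLAMA_CHAT_MODEL); // → "gemma4:e2b" (override kept)
console.log(merged.DATABASE_URL);      // → "postgres://localhost/recallmem"
```

One consequence of this pattern: an empty string in `.env.local` is falsy and would be replaced by the default, which is usually the desired behavior for a setup script.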