recallmem 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -6,25 +6,9 @@
  </p>
 
  <p align="center">
- <strong>Persistent personal AI.</strong> Powered by Gemma 4 running locally on your own machine.
- <br> <br>
- This is not a chatbot (chatbots forget you) & this is not an agent (agents don't remember you). <br> <br> This IS a private AI, built on a deterministic memory framework where the LLM never touches your data.
+ <strong>Persistent Private AI.</strong> Powered by Gemma 4 running locally on your own machine.
  </p>
 
- <p align="center">
- Two products in one repo:
- <br>
- <strong>👤 For users:</strong> install with <code>npx recallmem</code> and start chatting with an AI that actually remembers you.
- <br>
- <strong>👨‍💻 For developers:</strong> fork it and build your own AI app on top of the memory framework in <code>lib/</code>.
- </p>
-
- ```bash
- npx recallmem
- ```
-
- That's the install. One command. The CLI handles the rest. It clones the repo, sets up the database, pulls the local AI models, writes the config file, opens the chat in your browser. If you've already got Node, Postgres, and Ollama installed, you're chatting with your own private AI in about 5 minutes.
-
  <p align="center">
  <img src="./public/screenshots/demo.png" alt="RecallMEM chat UI showing the AI remembering the user's name across conversations" width="900">
  </p>
@@ -35,701 +19,162 @@ That's the install. One command. The CLI handles the rest. It clones the repo, s
 
  ---
 
- ## Why I built this
-
- I wanted my own private AI for the kind of conversations I don't want sitting on someone else's server. Personal stuff. The stuff you'd actually want a real friend to help you think through.
-
- The default model is **Gemma 4** (Google's open weights model that just dropped, Apache 2.0) running locally via Ollama. You can pick any size from E2B (runs on a phone) up to the 31B Dense (best quality, needs a workstation). Or skip Ollama entirely and bring your own API key for Claude, GPT, Groq, Together, OpenRouter, or anything OpenAI-compatible. Your call.
-
- The thing is, the memory is the actual differentiator. Not the model. Not the UI. The memory. The AI builds a profile of who you are over time. It extracts facts after every conversation. It vector-searches across every chat you've ever had to find relevant context. By the time you've used it for a week, it knows you better than ChatGPT ever will, because ChatGPT forgets you the second you close the tab.
-
- <details>
- <summary><strong>The longer version (what's wrong with every other "private AI" tool)</strong></summary>
-
- Here's the problem with every "private AI" tool I tried: they all fall into one of three buckets.
-
- 1. **Local chat UIs for Ollama.** Look pretty, but the AI has zero memory between conversations. Every chat is a stranger.
- 2. **Memory libraries on GitHub.** Powerful, but they're SDKs. You have to build the whole UI yourself.
- 3. **Cloud-based memory products like Mem0.** Have the full feature set, but your data goes to their servers. Defeats the whole point.
-
- There's a gap right in the middle: a **complete personal AI app with real working memory that runs 100% on your machine**. So I built it.
-
- </details>
-
- ---
-
- ## What it does
-
- Persistent memory across every chat (profile + facts + vector search) with **temporal awareness** so the model knows what's current vs historical. Auto-extracts facts in real time, retires stale ones when the truth changes, stamps every memory with dates. Vector search over every past chat. Memory inspector you can edit. Custom rules. Wipe memory unrecoverably. File uploads (images, PDFs, code). Web search when using Anthropic. Bring your own LLM (Ollama, Anthropic, OpenAI, or any OpenAI-compatible API). Warm Claude-style dark mode.
-
- <details>
- <summary><strong>Full feature list</strong></summary>
-
- - **Persistent memory across every chat.** Three layers: a synthesized profile of who you are, an extracted facts table, and vector search over all past conversations.
- - **Live fact extraction.** Facts get extracted after every assistant reply, not just when the chat ends. Say "my birthday is 11/27" and refresh `/memory` a moment later, it's already there. Always uses the local FAST_MODEL so cloud users don't get billed per turn.
- - **Temporal awareness solves context collapse.** Every fact is stamped with a `valid_from` date. When new information contradicts an old fact ("left Acme" replaces "works at Acme"), the old fact gets retired automatically. The model always sees what's current.
- - **Self-healing categories.** Facts re-route to the correct category after every chat, edit, or delete. No LLM, just a deterministic loop. So when the categorizer improves, your existing memory improves with it.
- - **Resumed-conversation markers.** Open a chat from last week and continue it, the AI sees a system marker like `[Conversation resumed 6 days later]` so it knows time passed and earlier turns are historical.
- - **Dated recall.** When the vector search pulls relevant chunks from past chats, each one is prefixed with the date it came from so the model can tell history from the present.
- - **Auto-builds your profile** from the extracted facts, with date stamps in every section. Updates after every reply.
- - **Vector search across past conversations.** Ask about something you discussed last month, the AI finds it and uses it as context.
- - **Memory inspector page.** View, edit, or delete every fact, with collapsible category sections and a search filter for navigating long lists.
- - **Sidebar chat search.** Toggle between vector search (semantic, needs Ollama for embeddings) and text search (literal ILIKE on titles + transcripts, instant). Both search inside the conversations, not just titles.
- - **Web search toggle.** When you're using an Anthropic provider, a globe button next to the input lets Claude actually browse the web. Hidden for Ollama since local models don't have it.
- - **Custom rules.** Tell the AI how you want to be talked to. "Don't gaslight me." "I have dyslexia, no bullet points." "Don't add disclaimers." It applies them in every chat.
- - **Wipe memory unrecoverably.** `DELETE` + `VACUUM FULL` + `CHECKPOINT`. Gone for good at the database level.
- - **File uploads.** Drag and drop images, PDFs, code, text. Gemma 4 handles vision natively.
- - **Warm dark mode.** Claude-style charcoal palette via CSS variables, persisted across refreshes with no flash-of-light.
- - **Chat history sidebar** with date grouping, pinned chats, and the search toggle described above.
- - **Markdown rendering** for headings, code blocks, tables.
- - **Streaming responses** with smooth typewriter rendering.
- - **Bring any LLM you want.** Local Gemma 4 via Ollama, or plug in Anthropic (Claude), OpenAI (GPT), or any OpenAI-compatible API (Groq, Together, OpenRouter, Mistral, vLLM, LM Studio, etc).
- - **Test connection** for cloud providers before saving the API key, so you don't find out your key is wrong mid-chat.
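The resumed-conversation markers in the list above are a deterministic function of the gap between messages. A minimal sketch of that idea in TypeScript; `resumeMarker` and its 4-hour threshold are hypothetical names and values for illustration, not the app's actual implementation:

```typescript
// Hypothetical sketch: compute a resumed-conversation marker from the
// elapsed time since the last message. Returns null for short gaps.
function resumeMarker(gapMs: number, thresholdHours = 4): string | null {
  const hours = gapMs / (1000 * 60 * 60);
  if (hours < thresholdHours) return null; // short gaps get no marker
  const days = Math.floor(hours / 24);
  if (days >= 1) {
    return `[Conversation resumed ${days} day${days === 1 ? "" : "s"} later]`;
  }
  return `[Conversation resumed ${Math.floor(hours)} hours later]`;
}
```

Because this is plain arithmetic, the same gap always produces the same marker, which is the property the README is describing.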
-
- </details>
-
- ---
-
- ## How is this different?
-
- <details>
- <summary><strong>Comparison table vs ChatGPT, Claude.ai, and Mem0</strong></summary>
-
- | | RecallMEM | ChatGPT / Claude.ai | Mem0 |
- |---|---|---|---|
- | **Runs locally** | ✅ | ❌ | ❌ |
- | **Memory retrieval is deterministic (no LLM tool calls)** | ✅ | ❌ | ❌ |
- | **Persistent memory across chats** | ✅ | partial | ✅ |
- | **Temporal awareness (memories know when they were true)** | ✅ | ❌ | ❌ |
- | **Auto-retires stale facts when truth changes** | ✅ | ❌ | ❌ |
- | **You can edit / delete memories** | ✅ | partial | ✅ |
- | **Vector search over past chats** | ✅ | ❌ | ✅ |
- | **Custom rules / behavior** | ✅ | ✅ | ❌ |
- | **Bring your own LLM (any provider)** | ✅ | ❌ | ❌ |
- | **Use local models (Gemma 4, Llama, etc)** | ✅ | ❌ | ❌ |
- | **No account / no signup** | ✅ | ❌ | ❌ |
- | **Free** | ✅ | partial | partial |
- | **Source available** | ✅ Apache 2.0 | ❌ | partial |
-
- </details>
-
- <details>
- <summary><strong>The actual differentiator nobody talks about (deterministic memory)</strong></summary>
-
- The thing nobody is doing right is **how memory is read and written**.
-
- In ChatGPT and Claude.ai with memory turned on, the LLM is in charge of memory. The model decides when to remember something during your conversation. The model decides what to remember. The model decides what to retrieve when you ask a question. The whole memory layer is implemented as model behavior. You're trusting the LLM to be a librarian, and LLMs are not librarians. They hallucinate.
-
- RecallMEM does it backwards. **The chat LLM never touches your memory database.** Not for reads, not for writes. The LLM only ever sees a system prompt that's already been assembled by deterministic TypeScript and SQL. Here's the actual flow:
-
- **When you send a message (memory READ path, 100% deterministic):**
-
- 1. Plain SQL `SELECT` pulls your profile from `s2m_user_profiles`
- 2. Plain SQL `SELECT` pulls your top *active* facts from `s2m_user_facts` (retired facts are excluded automatically)
- 3. Each fact is stamped with its `valid_from` date so the model can reason about timelines
- 4. EmbeddingGemma converts your message to a 768-dim vector (math, not generation)
- 5. pgvector cosine similarity search ranks chunks from past conversations
- 6. Each retrieved chunk is stamped with its source-chat date (`[from conversation on 2026-03-12]`) so the model can tell history from now
- 7. If the chat is being resumed after a multi-hour gap, a one-time system marker like `[Conversation resumed 6 days later]` gets injected before the new user turn
- 8. TypeScript template assembles all of it into a system prompt
- 9. **Then** the chat LLM gets called, with the assembled context already in its prompt
-
- The chat LLM never queries the database. It can't decide what to retrieve. It can't pick which facts are relevant. It can't hallucinate a memory that doesn't exist, because if it's not in the prompt, it doesn't exist for the model. The retrieval is 100% deterministic SQL + cosine similarity. No LLM tool calls touching your memory store.
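Steps 4-6 of the read path boil down to cosine similarity plus a date prefix. A pure-TypeScript illustration of that ranking, with made-up types and 2-dimensional vectors for brevity; in the real app the vectors come from EmbeddingGemma and the ranking runs inside pgvector:

```typescript
// Illustrative only: retrieval as math, not model behavior.
type Chunk = { text: string; date: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank past-conversation chunks by similarity to the query embedding and
// stamp each survivor with its source-chat date, matching the
// "[from conversation on YYYY-MM-DD]" prefix described above.
function recall(query: number[], chunks: Chunk[], topK = 5): string[] {
  return chunks
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(({ c }) => `[from conversation on ${c.date}] ${c.text}`);
}
```

Run the same query twice and you get the same chunks in the same order, which is the "predictability" claim made later in this section.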
138
-
139
- **After every assistant reply (memory WRITE path, LLM proposes, TypeScript validates):**
140
-
141
- A small local LLM (Gemma 4 E4B via Ollama) runs in the background to extract candidate facts from the running transcript. This happens fire-and-forget after the stream closes, so you never wait for it. It always uses the local model regardless of which provider the chat itself is using, so cloud users (Claude, GPT) don't get billed per turn for extraction.
142
-
143
- The same LLM call also returns the IDs of any **existing** facts the new conversation contradicts. So when you say "I just left Acme to start a new job," the extractor returns the new fact AND flags the old "User works at Acme" fact for retirement. The TypeScript layer flips those rows to `is_active=false` and stamps `valid_to=NOW()`. History is preserved, the active set always reflects current truth.
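The retirement flip described above can be sketched as a pure function. The names below (`Fact`, `retireFacts`) are illustrative, not the app's API; the real implementation issues a SQL `UPDATE` against `s2m_user_facts`, but the logic is the same: the extractor only proposes IDs, and the deterministic layer does the flipping:

```typescript
// Sketch of supersession: flip proposed facts to inactive, stamp valid_to.
type Fact = {
  id: number;
  text: string;
  isActive: boolean;
  validFrom: string;
  validTo: string | null;
};

function retireFacts(facts: Fact[], retiredIds: number[], now: string): Fact[] {
  const retired = new Set(retiredIds);
  return facts.map((f) =>
    retired.has(f.id) && f.isActive
      ? { ...f, isActive: false, validTo: now } // history preserved, not deleted
      : f
  );
}
```

Retired rows keep their `validFrom`/`validTo` window, so the timeline survives even though the active set only reflects current truth.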
-
- But here's the key: the LLM only **proposes** facts and supersession decisions. It cannot write to the database. The TypeScript layer is the actual gatekeeper, and it runs every candidate fact through six validation steps before storage:
-
- 1. **Quality gate.** Conversations under 100 characters get zero facts extracted. The LLM never even sees them.
- 2. **JSON parse validation.** If the LLM returns malformed JSON or no array, the entire batch is dropped.
- 3. **Type validation.** Only strings survive. Objects, numbers, nested arrays, all rejected.
- 4. **Garbage pattern filtering.** A regex filter catches the most common LLM hallucinations: meta-observations like "user asked about X", AI behavior notes like "AI suggested Y", non-facts like "not mentioned", mood observations like "had a good conversation", and anything under 10 characters.
- 5. **Deduplication.** Case-insensitive normalized match against the entire facts table. Duplicates get dropped.
- 6. **Categorization.** The category (Identity, Family, Work, Health, etc.) is decided by **keyword matching in TypeScript**, not by the LLM. The LLM has no say in how facts get organized.
-
- After all six steps, the surviving facts get a plain SQL `INSERT`. And even then, you can edit or delete any fact in the Memory page if you don't agree with it.
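The six steps above fit in one small function. This is a condensed sketch, not the app's actual code: the garbage regex, the thresholds, and the function name are illustrative stand-ins, and real categorization plus the SQL `INSERT` are omitted. The shape of the pipeline is what matters:

```typescript
// Illustrative garbage patterns (the real list is longer).
const GARBAGE = /^(user asked|ai suggested|not mentioned|had a good conversation)/i;

// Hypothetical sketch of the six-step gatekeeper: the LLM's raw output goes
// in, and only validated, deduplicated fact strings come out.
function validateFacts(
  transcript: string,
  rawLlmOutput: string,
  existing: string[]
): string[] {
  if (transcript.length < 100) return []; // 1. quality gate
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawLlmOutput); // 2. JSON parse validation
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  const seen = new Set(existing.map((f) => f.trim().toLowerCase()));
  const out: string[] = [];
  for (const item of parsed) {
    if (typeof item !== "string") continue; // 3. type validation
    if (item.length < 10 || GARBAGE.test(item)) continue; // 4. garbage filter
    const key = item.trim().toLowerCase();
    if (seen.has(key)) continue; // 5. deduplication
    seen.add(key);
    out.push(item); // 6. categorization + SQL INSERT would happen here
  }
  return out;
}
```

Note that a malformed batch fails closed: a parse error drops everything rather than letting a partial guess through.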
-
- **Why this matters:**
-
- - **Predictability.** When you mention "my dog" in a chat, RecallMEM **always** retrieves the facts that match "dog" via cosine similarity. ChatGPT retrieves whatever the model decides to retrieve, which can vary run to run.
- - **No hallucinated retrieval.** The LLM cannot remember something that isn't actually in your facts table. If it's not in the database, it's not in the prompt.
- - **Auditability.** You can look at any chat and trace exactly which facts and chunks were loaded into the system prompt. With ChatGPT, you can't see what the model decided to surface from memory.
- - **No prompt injection memory leaks.** The LLM in RecallMEM only sees what the deterministic layer feeds it. It can't query the rest of the database. With ChatGPT, the model has tool access to memory, which means a prompt injection attack could theoretically make it dump memory contents.
- - **Your data, your database.** Memory is data you control, not behavior you have to trust the model to do correctly. You can write a script that queries Postgres directly, edit facts manually, run analytics on your own conversations.
-
- This is the actual reason RecallMEM exists. Not "another local chat UI." A memory architecture where the LLM is intentionally not in charge.
-
- </details>
-
- ---
-
- ## For developers (the memory framework)
-
- Underneath the chat UI, RecallMEM is a **deterministic memory framework** you can fork and use in your own AI app. The whole `lib/` folder is intentionally framework-shaped. It's not a polished SDK with a public API contract, but it IS a working, opinionated memory architecture you can copy into your own project.
+ ## What is this
 
- <details>
- <summary><strong>What's in <code>lib/</code> and how to embed it in your app</strong></summary>
-
- **The core files in `lib/`:**
-
- ```
- lib/
- ├── memory.ts Memory orchestrator. Loads profile + facts + vector recall in parallel.
- ├── prompts.ts Assembles the system prompt with all the memory context.
- ├── facts.ts Fact extraction (LLM proposes) + validation (TypeScript decides).
- ├── profile.ts Synthesizes a structured profile from the active facts.
- ├── chunks.ts Splits transcripts into chunks, embeds them, runs vector search.
- ├── chats.ts Chat CRUD + transcript serialization with the smart parser.
- ├── post-chat.ts The post-chat pipeline (title gen, fact extract, profile rebuild, embed).
- ├── rules.ts Custom user rules / instructions.
- ├── embeddings.ts EmbeddingGemma calls via Ollama.
- ├── llm.ts LLM router (Ollama, Anthropic, OpenAI, OpenAI-compatible).
- └── db.ts Postgres pool + the configurable user ID resolver.
- ```
-
- **Embedding it into your own app:**
-
- The lib functions default to a single-user setup (`user_id = "local-user"`) but you can wire in your own auth system with two function calls at startup:
-
- ```typescript
- import { Pool } from "pg";
- import { configureDb, setUserIdResolver } from "./lib/db";
-
- // Use your existing Postgres pool (or skip this and let lib/ create its own)
- const myPool = new Pool({ connectionString: process.env.DATABASE_URL });
- configureDb({ pool: myPool });
-
- // Wire in your auth system. Called whenever a lib function needs the current user.
- // Can be sync or async. Return whatever string identifies the user in your app.
- setUserIdResolver(() => getCurrentUserFromMyAuthSystem());
- ```
-
- That's it. No other changes needed. Every lib function (`getProfile`, `getActiveFacts`, `searchChunks`, `storeFacts`, `rebuildProfile`, etc.) reads from the configured resolver. Your auth system stays in your code, the memory framework stays in `lib/`.
-
- **Using the memory layer in a chat request:**
+ A personal AI chat app with real memory that runs 100% on your machine. Your conversations stay local. The AI builds a profile of who you are over time, extracts facts after every chat, and vector-searches across your entire history to find relevant context. By the time you've used it for a week, it knows you better than any cloud AI because it never forgets.
 
- ```typescript
- import { buildMemoryAwareSystemPrompt } from "./lib/memory";
- import { runPostChatPipeline } from "./lib/post-chat";
- import { createChat, updateChat } from "./lib/chats";
+ The default model is **Gemma 4** (Apache 2.0) running locally via Ollama. Pick any size from E2B (runs on a phone) up to 31B Dense (best quality, needs a workstation). Or skip Ollama entirely and bring your own API key for Claude, GPT, Groq, Together, OpenRouter, or anything OpenAI-compatible.
 
- // 1. Build the system prompt from the user's memory
- const systemPrompt = await buildMemoryAwareSystemPrompt(
- userMessage,
- currentChatId
- );
+ The memory is the actual differentiator. Not the model. Not the UI. Memory reads are deterministic SQL + cosine similarity, not LLM tool calls. The chat model never touches your database. Facts are proposed by a local LLM but validated by TypeScript before storage. [Deep dive on the architecture →](./docs/ARCHITECTURE.md)
 
- // 2. Send to your LLM however you want (Ollama, Claude, GPT, whatever)
- const response = await yourLLM.chat([
- { role: "system", content: systemPrompt },
- ...conversationHistory,
- { role: "user", content: userMessage },
- ]);
+ ## Features
 
- // 3. Save the chat
- await updateChat(chatId, [...conversationHistory, { role: "assistant", content: response }]);
+ - **Three-layer memory** across every chat: synthesized profile, extracted facts table, and vector search over all past conversations
+ - **Temporal awareness** so the model knows what's current vs. historical. Auto-retires stale facts when the truth changes.
+ - **Live fact extraction** after every assistant reply, not just when the chat ends
+ - **Memory inspector** where you can view, edit, or delete every fact
+ - **Vector search** across past conversations with dated recall
+ - **Custom rules** for how you want the AI to talk to you
+ - **File uploads** (images, PDFs, code). Gemma 4 handles vision natively.
+ - **Web search** when using Anthropic or Ollama (via Brave Search)
+ - **Wipe memory unrecoverably** with `DELETE` + `VACUUM FULL` + `CHECKPOINT`
+ - **Bring any LLM.** Ollama, Anthropic, OpenAI, or any OpenAI-compatible API.
 
- // 4. (Async) Run the post-chat pipeline to extract facts, rebuild profile, embed chunks
- runPostChatPipeline(chatId);
- ```
-
- The memory framework doesn't care which LLM you use. It just assembles context. Bring your own model.
-
- **The schema lives in `migrations/001_init.sql`.** Run it against any Postgres 17+ database with the pgvector extension installed. Tables are prefixed `s2m_` (for "speak2me," the project this came from). Rename them in the migration if you want a different prefix.
-
- **License:** Apache 2.0. Fork it, modify it, ship it commercially. The only ask is that you preserve the copyright notice and the NOTICE file. See [CONTRIBUTING.md](./CONTRIBUTING.md) for the full guide.
+ ## Quick start (Mac)
 
- </details>
-
- ---
+ RecallMEM is built and tested on macOS. Mac is the supported platform.
 
- ## Quick start
+ **Prerequisites:** Node.js 20+ and [Homebrew](https://brew.sh).
 
  ```bash
  npx recallmem
  ```
 
- You need three things on your machine first: **Node.js 20+**, **Postgres 17 with pgvector**, and **Ollama** (optional, skip if you only want cloud providers). If any are missing, the CLI tells you exactly what to install for your OS.
-
- <details>
- <summary><strong>Architecture diagrams (system, memory layers, post-chat sequence)</strong></summary>
-
- ### System architecture
-
- ```mermaid
- flowchart TB
- Browser["Browser<br/>Chat UI<br/>localhost:3000"]
- NextJS["Next.js App<br/>API routes + SSR"]
- Postgres[("Postgres + pgvector<br/>localhost:5432<br/>Chats, facts, profile, embeddings")]
- Ollama["Ollama<br/>localhost:11434<br/>Gemma 4 + EmbeddingGemma"]
- Cloud{{"Optional: Cloud LLMs<br/>Anthropic / OpenAI / etc.<br/>Only if you add a provider"}}
-
- Browser <-->|HTTP / SSE| NextJS
- NextJS <-->|SQL + vector queries| Postgres
- NextJS <-->|"/api/chat<br/>/api/embed"| Ollama
- NextJS -.->|Optional API call| Cloud
-
- style Cloud stroke-dasharray: 5 5
- style Ollama fill:#dfe
- style Postgres fill:#dfe
- style NextJS fill:#dfe
- style Browser fill:#dfe
- ```
-
- Everything in green runs on your machine. The dashed cloud box only activates if you explicitly add a cloud provider in settings. Otherwise, nothing leaves your computer. Ever.
-
- ### The three-layer memory system
-
- ```mermaid
- flowchart LR
- Chat[New chat message]
- Memory["Memory loader<br/>(parallel)"]
- Profile["Layer 1: Profile<br/>Synthesized summary<br/>(IDENTITY, FAMILY,<br/>WORK, HEALTH...)"]
- Facts["Layer 2: Facts<br/>Top 50 atomic statements<br/>(pinned to system prompt)"]
- Vector["Layer 3: Vector search<br/>Top 5 chunks from past<br/>conversations<br/>(semantic similarity)"]
- Rules["User custom rules<br/>(behavior instructions)"]
- Prompt["System prompt<br/>(profile + facts + recall + rules)"]
- LLM[LLM]
- Response[Streaming response]
-
- Chat --> Memory
- Memory --> Profile
- Memory --> Facts
- Memory --> Vector
- Memory --> Rules
- Profile --> Prompt
- Facts --> Prompt
- Vector --> Prompt
- Rules --> Prompt
- Prompt --> LLM
- LLM --> Response
- ```
-
- Each layer does a different job:
-
- - **Profile** loads instantly. It's the "who am I talking to" baseline. One database row, always loaded into every system prompt.
- - **Facts** are atomic statements you can view, edit, and delete. Stored as individual rows. Pinned into the prompt every conversation.
- - **Vector search** finds semantically relevant prose from any past conversation. Catches the stuff that doesn't fit cleanly into facts, like that idea you were working through three weeks ago.
-
- Together, they let the AI know your name, your family, your job, AND remember the specific thing you mentioned a month ago when it becomes relevant.
-
- ### What happens when you end a chat
-
- ```mermaid
- sequenceDiagram
- actor User
- participant UI as Chat UI
- participant API as /api/chat/finalize
- participant LLM
- participant DB as Postgres
-
- User->>UI: Click "New chat"
- UI->>UI: Show "Saving memory..."
- UI->>API: POST chatId
- API->>LLM: Generate title (Gemma E4B)
- LLM-->>API: "Discussing project ideas"
- API->>DB: Save title
- API->>LLM: Extract facts (Gemma E4B)
- LLM-->>API: ["User's name is...", "User works at...", ...]
- API->>DB: Insert new facts (deduped)
- API->>DB: Rebuild profile from all facts
- API->>API: Embed transcript chunks
- API->>DB: Insert embeddings
- API-->>UI: Done
- UI->>UI: Clear chat, ready for next
- ```
-
- Click "New chat", wait a few seconds, and the next conversation immediately sees the new memory.
-
- </details>
-
- <details>
- <summary><strong>Hardware requirements (which model fits which machine)</strong></summary>
-
- The biggest variable is which LLM you pick. RecallMEM lets you choose.
-
- ### Fully open source (Ollama + Gemma 4 locally)
-
- | Setup | Model | RAM | Speed | Quality |
- |---|---|---|---|---|
- | Phone / iPad | Gemma 4 E2B | 8GB | Fast | Basic |
- | MacBook Air / Mac Mini M4 | Gemma 4 E4B | 16GB | Fast | Good |
- | Mac Studio M2+ | Gemma 4 26B MoE | 32GB+ | Very fast | Great |
- | Workstation / server | Gemma 4 31B Dense | 32GB+ | Slower | Best |
-
- The 26B MoE is what I use as the default. It's a Mixture of Experts model, so it only activates 3.8B parameters per token even though it has 26B total. Much faster than the 31B Dense, almost the same quality. Ranked #6 globally on the Arena leaderboard.
-
- ### Using cloud providers (Claude, GPT, Groq, etc.)
-
- If you don't want to run a local LLM at all, you can plug in any cloud API:
-
- | Setup | RAM | Notes |
- |---|---|---|
- | Any laptop | ~4GB free | Just runs Postgres + the Node.js app + browser. The LLM runs on the provider's servers. |
-
- You bring your own API key. The database, memory, profile, and rules still stay on your machine. Only the chat messages get sent to the provider.
-
- **One thing to know:** when you use a cloud provider, your conversation goes to their servers. Your facts and profile get sent as part of the system prompt so the cloud LLM has context. This breaks the local-only guarantee for those specific conversations. Use Ollama for anything you want fully private.
+ That's the whole install. Here's what happens after you hit Enter:
 
- </details>
-
- <details>
- <summary><strong>CLI commands</strong></summary>
-
- ```bash
- npx recallmem # Setup if needed, then start the app
- npx recallmem init # Setup only (deps check, DB, models, env)
- npx recallmem start # Start the server (assumes setup was done)
- npx recallmem doctor # Check what's missing or broken
- npx recallmem upgrade # Pull latest code, run pending migrations
- npx recallmem version # Print version
- npx recallmem --help # Show help
- ```
+ 1. **It checks what you already have** on your Mac (Node, Postgres, Ollama). Anything already installed gets skipped.
+ 2. **It shows you a list** of what's missing with ✓ and ✗ marks.
+ 3. **It asks one question:** `Install everything now? [Y/n]`. Hit Enter to say yes.
+ 4. **It runs `brew install`** for Postgres 17, pgvector, and Ollama. You'll see real-time progress in your terminal.
+ 5. **It starts Postgres and Ollama as background services** so they keep running across reboots.
+ 6. **It downloads EmbeddingGemma** (~600 MB, ~1-2 min). This is required for the memory system.
+ 7. **It asks which Gemma 4 model you want.** Three options:
+ - **1) Gemma 4 26B:** 18 GB, fast, recommended for most people
+ - **2) Gemma 4 31B:** 19 GB, slower, smartest answers
+ - **3) Gemma 4 E2B:** 2 GB, very fast, good for testing or older laptops
+ 8. **It downloads the model you picked.** E2B finishes in 2-3 min. The 18 GB option takes 10-30 min depending on your internet.
+ 9. **It runs database migrations** (~5 seconds).
+ 10. **It builds the app for production** (~30-60 seconds, first install only).
+ 11. **It starts the server.** Open `http://localhost:3000` in your browser and start chatting.
 
- The default `npx recallmem` is what you'll use 99% of the time. It's smart about its state. On the first run it sets everything up, on subsequent runs it just starts the server.
+ Total time: **5-45 minutes** depending on which model you picked and your internet speed. Most of that is the model download. You only have to interact with it twice: once to confirm install, once to pick a model. After that, walk away.
 
- If something breaks, run `npx recallmem doctor` first. It tells you exactly what's wrong and how to fix it.
-
- </details>
+ **Subsequent runs are instant.** Just `npx recallmem` and the chat opens.
 
  <details>
- <summary><strong>Two ways to use it (just-run-it vs fork-and-hack)</strong></summary>
-
- The `npx recallmem` command auto-detects which workflow you're in.
-
- ### Workflow 1: Just run it (most users)
+ <summary><strong>Just want cloud models? (Claude / GPT)</strong></summary>
 
- You want to use RecallMEM as your daily AI tool. You don't care about the code.
+ You still need Postgres for local memory storage, but you can skip Ollama entirely:
 
  ```bash
+ brew install postgresql@17 pgvector
+ brew services start postgresql@17
  npx recallmem
  ```
 
- The CLI:
- 1. Detects nothing is installed yet
- 2. Clones the repo to `~/.recallmem` (one-time, ~50MB)
- 3. Runs `npm install` inside `~/.recallmem`
- 4. Checks your dependencies (Postgres, pgvector, Ollama)
- 5. Pulls the embedding model if missing
- 6. Asks if you want to pull a chat model (~18GB, optional)
- 7. Creates the database, runs migrations, writes the config file
- 8. Starts the server and opens the chat in your browser
-
- Subsequent runs are instant. Just `npx recallmem` and the chat opens.
-
- To upgrade later when I ship a new version:
-
- ```bash
- npx recallmem upgrade
- ```
-
- That does a `git pull`, runs `npm install` if deps changed, and applies any pending migrations.
-
- ### Workflow 2: Fork it and hack on it (developers)
-
- You want to modify the code, contribute back, run your own variant.
-
- ```bash
- git clone https://github.com/RealChrisSean/RecallMEM.git
- cd RecallMEM
- npm install
- npx recallmem
- ```
-
- The CLI detects you're already inside a recallmem checkout and uses your current directory instead of cloning to `~/.recallmem`. Hot reload works. Edits to the code are reflected immediately on the next dev server reload.
-
- Same `npx recallmem` command. Different behavior because the CLI is smart about where it's running.
-
- See [CONTRIBUTING.md](./CONTRIBUTING.md) for the dev workflow.
-
- **Testing:**
-
- ```bash
- npm test # run the suite once
- npm run test:watch # re-run on file change
- ```
-
- The test suite uses Vitest and currently covers the deterministic memory primitives (keyword inflection, the categorization router, and the regression cases that have bitten us in the past: `son` matching `Sonnet`, `work` matching `framework`, etc). It's intentionally narrow and fast (~150ms). New tests go in `test/unit/` and follow the same shape as `test/unit/facts.test.ts`. No DB or LLM required, pure functions only.
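Those regression cases are the classic substring-matching bug: a naive `includes("son")` also hits "Sonnet", and `includes("work")` hits "framework". A word-boundary regex avoids both. This is an illustrative sketch; `keywordMatches` is a hypothetical name, not the categorizer's actual API:

```typescript
// Hypothetical sketch: match a keyword only at word boundaries so "son"
// does not match "Sonnet" and "work" does not match "framework".
function keywordMatches(keyword: string, text: string): boolean {
  // Escape regex metacharacters so arbitrary keywords are safe to embed.
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}
```

This is exactly the kind of pure function the narrow Vitest suite can cover without a database or an LLM.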
-
- **Optional observability (Langfuse):**
-
- If you're hacking on RecallMEM and want full trace timelines for every chat turn (memory build, LLM generation, fact extraction, supersession decisions, etc), there's a built-in Langfuse integration. It's a peer dependency, so it's NOT installed by default and zero cost when unused.
-
- ```bash
- npm install langfuse
- ```
-
- Then set these in `.env.local`:
-
- ```
- LANGFUSE_PUBLIC_KEY=pk-lf-...
- LANGFUSE_SECRET_KEY=sk-lf-...
- LANGFUSE_BASEURL=http://localhost:3000 # optional, defaults to cloud.langfuse.com
- ```
-
- Self-host Langfuse via Docker so traces stay on your machine. This is a developer-only debugging tool. Trace payloads include the actual user message content, so don't enable it on machines where conversation contents shouldn't leave the local environment.
-
- </details>
-
- <details>
- <summary><strong>Where things live on disk (and how to fully uninstall)</strong></summary>
-
- The default install location is `~/.recallmem`. Override with `RECALLMEM_HOME=/custom/path npx recallmem` if you want it somewhere else.
-
- What's in `~/.recallmem`:
-
- - The full RecallMEM source code (cloned from GitHub)
- - `node_modules/` with all dependencies
- - `.env.local` with your config
- - The Next.js build output (when you run it)
-
- What's NOT in `~/.recallmem`:
-
- - Your conversations, facts, profile, embeddings, rules, and API keys. Those all live in your Postgres database at `/opt/homebrew/var/postgresql@17/` (Mac) or `/var/lib/postgresql/` (Linux). The Postgres data directory is the actual source of truth.
-
- To completely uninstall:
-
- ```bash
- rm -rf ~/.recallmem # Remove the app
- dropdb recallmem # Remove the database (or use the in-app "Nuke everything" button first)
- ```
-
- </details>
-
- ---
-
- ## Privacy
-
- If you only use Ollama, **nothing leaves your machine, ever**. You can air-gap the computer and it keeps working. If you add a cloud provider (Claude, GPT, etc.), only the chat messages and your assembled system prompt go to that provider's servers. Your database, embeddings, and saved API keys stay local.
-
- <details>
510
- <summary><strong>Privacy diagram + truly unrecoverable deletion</strong></summary>
511
-
512
- ```mermaid
513
- flowchart TB
514
- subgraph Local["Your machine (always private)"]
515
- DB[(Postgres database<br/>Chats, facts, profile, embeddings, API keys, rules)]
516
- App[Next.js app]
517
- Ollama_Box[Ollama]
518
- end
519
-
520
- subgraph CloudOpt["Optional cloud (only if you add a provider)"]
521
- Anthropic[Anthropic API]
522
- OpenAI_API[OpenAI API]
523
- Other[Other LLM APIs]
524
- end
525
-
526
- User[You] <--> App
527
- App <--> DB
528
- App <--> Ollama_Box
529
- App -.->|"Conversation messages<br/>+ system prompt<br/>(only if you pick a cloud provider)"| Anthropic
530
- App -.-> OpenAI_API
531
- App -.-> Other
532
-
533
- style Local fill:#dfe
534
- style CloudOpt stroke-dasharray: 5 5
535
- ```
536
-
537
- **Always on your machine, never sent anywhere:**
538
- - Your chat history
539
- - Your facts and profile
540
- - Your custom rules
541
- - Your vector embeddings
542
- - Your saved API keys
543
-
544
- **Sent only when you actively use a cloud provider:**
545
- - The current conversation messages
546
- - The system prompt (which includes your profile, facts, and rules so the cloud LLM has context)
547
-
548
- ### Truly unrecoverable deletion
549
-
550
- When you click "Wipe memory" or "Nuke everything" on the Memory page, the app runs:
551
-
552
- 1. `DELETE` to remove the rows (Postgres marks them as dead tuples, invisible to queries)
553
- 2. `VACUUM FULL <table>` to physically rewrite the table on disk and release the dead row space
554
- 3. `CHECKPOINT` to force Postgres to flush dirty pages to disk so old WAL segments can be recycled
555
-
556
- After those three steps, the data is gone from the database in any practically recoverable way.
557
-
558
- **One thing I want to be honest about:** filesystem-level forensic recovery (raw disk block scanning) is a separate problem. SSDs have wear leveling, so file overwrites don't always touch the original physical cells. The complete solution is **full-disk encryption** (FileVault on Mac, LUKS on Linux, BitLocker on Windows). With disk encryption and a strong login password, the data is genuinely unrecoverable. Not even Apple could read it.
559
-
560
- </details>
561
-
562
- ---
563
-
564
- <details>
565
- <summary><strong>What it doesn't do (yet), honest limitations</strong></summary>
566
-
567
- I'm being honest about the limitations. This is v0.1.
568
-
569
- - **No voice yet.** It's text only. I want to add Whisper for speech-to-text and Piper for text-to-speech, both local. On the roadmap.
570
- **Web search works with Anthropic and Ollama; OpenAI isn't wired up yet.** Anthropic uses the native `web_search_20250305` tool, no setup. Ollama (Gemma) uses **Brave Search** as a backend, which needs a free API key (about 5 minutes): sign up at [brave.com/search/api](https://brave.com/search/api), pick the Free tier (2,000 searches/month), add `BRAVE_SEARCH_API_KEY=your_key_here` to your `.env.local`, and restart RecallMEM. The first time you toggle web search in the chat UI, you'll see a privacy modal explaining that Brave will see your message text but NOT your memory, profile, facts, or past conversations. If the key isn't set or the quota is exhausted, the toggle still works but the AI tells you what to do instead of failing silently. OpenAI's native web search requires the Responses API path, which isn't plumbed through yet.
571
- - **No multi-user.** This is a personal app for one person on one machine. If you want a multi-user version, that's a separate fork.
572
- - **Reasoning models (OpenAI o1/o3, Claude extended thinking) might have edge cases.** They use different API parameters that I don't fully handle yet. Standard chat models work fine.
573
- - **OpenAI vision isn't fully wired up.** Gemma 4 (4B and up) handles images natively via Ollama. OpenAI uses a different format that I haven't plumbed through. Use Ollama or Anthropic for images.
574
- - **No mobile app.** It's a web app you run locally. You access it from your browser at `localhost:3000`. A native iOS/Android app is theoretically possible but it's a separate project I haven't started.
575
- - **Fact supersession is LLM-judged and conservative.** The local Gemma extractor decides whether a new fact contradicts an old one. It's intentionally cautious (only retires a fact when the replacement is unambiguous), so it might occasionally miss a real contradiction or, more rarely, retire something it shouldn't have. You can always inspect and edit/restore in the Memory page. For higher-stakes use cases, you'd want a stricter rule-based supersession layer on top, or a periodic profile-rebuild from full history.
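On the supersession point, a conservative policy can be as simple as requiring both a matching subject and an unambiguous judge verdict before retiring anything. This is an illustrative shape only — the real extractor's fields and verdict labels may differ:

```javascript
// Illustrative conservative supersession check, NOT the real extractor.
function shouldSupersede(oldFact, newFact, judgeVerdict) {
  // Facts about different subjects never collide, no matter what the judge says.
  if (oldFact.subject !== newFact.subject) return false;
  // Only an unambiguous "contradicts" verdict retires the old fact;
  // "related" or "unclear" keeps both, erring on the side of remembering.
  return judgeVerdict === "contradicts";
}
```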
85
+ After the app starts, go to **Settings → Providers → Add a new provider**, paste your API key, and pick that model from the chat dropdown.
576
86
 
577
87
  </details>
578
88
 
579
89
  <details>
580
- <summary><strong>Tech stack</strong></summary>
581
-
582
- - **Frontend / Backend:** Next.js 16 (App Router) + TypeScript + Tailwind CSS v4
583
- - **Database:** Postgres 17 + pgvector (HNSW vector indexes)
584
- - **Local LLM:** Ollama with Gemma 4 (E2B / E4B / 26B MoE / 31B Dense)
585
- - **Embeddings:** EmbeddingGemma 300M (768 dimensions, runs in Ollama)
586
- - **PDF parsing:** pdf-parse v2
587
- - **Markdown rendering:** react-markdown + remark-gfm + @tailwindcss/typography
588
- - **Cloud LLM transports (optional):** Anthropic Messages API, OpenAI Chat Completions, OpenAI-compatible
589
-
590
- </details>
591
-
592
- <details>
593
- <summary><strong>Manual install (for the curious or for when <code>npx recallmem</code> can't be used)</strong></summary>
594
-
595
- If you want to know what `npx recallmem` is doing under the hood, or you don't want to use the CLI for some reason, here's the manual install.
596
-
597
- ### macOS
598
-
599
- ```bash
600
- # 1. Install Node.js
601
- brew install node
602
-
603
- # 2. Install Postgres 17 + pgvector
604
- brew install postgresql@17 pgvector
605
- brew services start postgresql@17
606
-
607
- # 3. Install Ollama (skip if using cloud only)
608
- brew install ollama
609
- brew services start ollama
610
-
611
- # 4. Pull the models
612
- ollama pull embeddinggemma # ~600MB, REQUIRED
613
- ollama pull gemma4:26b # ~18GB, recommended chat model
614
- ollama pull gemma4:e4b # ~4GB, fast model for background tasks
615
- ```
90
+ <summary><strong>Linux (not officially supported, manual install)</strong></summary>
616
91
 
617
- ### Linux (Ubuntu/Debian)
92
+ Auto-install isn't wired up for Linux. You'll need to install everything by hand:
618
93
 
619
94
  ```bash
620
- # 1. Node.js
621
- curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
622
- sudo apt install -y nodejs
623
-
624
- # 2. Postgres + pgvector
95
+ # Postgres + pgvector (apt example)
625
96
  sudo apt install postgresql-17 postgresql-17-pgvector
626
97
  sudo systemctl start postgresql
627
98
 
628
- # 3. Ollama
99
+ # Ollama
629
100
  curl -fsSL https://ollama.com/install.sh | sh
630
-
631
- # 4. Pull models
101
+ sudo systemctl start ollama
632
102
  ollama pull embeddinggemma
633
103
  ollama pull gemma4:26b
634
- ollama pull gemma4:e4b
635
- ```
636
-
637
- ### Windows
638
-
639
- Use WSL2 with Ubuntu and follow the Linux steps. Native Windows works too but it's rougher.
640
-
641
- ### Setup
642
104
 
643
- ```bash
644
- # 1. Clone the repo
645
- git clone https://github.com/RealChrisSean/RecallMEM.git
646
- cd RecallMEM
647
-
648
- # 2. Install dependencies
649
- npm install
650
-
651
- # 3. Create the database
652
- createdb recallmem
653
-
654
- # 4. Run migrations
655
- npm run migrate
656
-
657
- # 5. Configure .env.local
658
- cat > .env.local <<EOF
659
- DATABASE_URL=postgres://$USER@localhost:5432/recallmem
660
- OLLAMA_URL=http://localhost:11434
661
- OLLAMA_CHAT_MODEL=gemma4:26b
662
- OLLAMA_FAST_MODEL=gemma4:e4b
663
- OLLAMA_EMBED_MODEL=embeddinggemma
664
- EOF
665
-
666
- # 6. Start the dev server
667
- npm run dev
105
+ # Run
106
+ npx recallmem
668
107
  ```
669
108
 
670
- Open [http://localhost:3000](http://localhost:3000).
671
-
672
109
  </details>
673
110
 
674
111
  <details>
675
- <summary><strong>Troubleshooting (the real gotchas I hit)</strong></summary>
112
+ <summary><strong>Windows (not supported, use WSL2)</strong></summary>
676
113
 
677
- Stuff I've actually hit. If you run into something else, run `npx recallmem doctor` first. It tells you exactly what's broken.
114
+ Native Windows is not supported. Use [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) with Ubuntu and follow the Linux steps above inside WSL.
678
115
 
679
- **`createdb: command not found`**
116
+ </details>
117
+
118
+ ## CLI commands
680
119
 
681
- Add Postgres to your PATH:
682
120
  ```bash
683
- export PATH="/opt/homebrew/opt/postgresql@17/bin:$PATH"
121
+ npx recallmem # Setup if needed, then start the app
122
+ npx recallmem init # Setup only (deps, DB, models, env)
123
+ npx recallmem start # Start the server (assumes setup done)
124
+ npx recallmem doctor # Check what's missing or broken
125
+ npx recallmem upgrade # Pull latest code, run pending migrations
126
+ npx recallmem version # Print version
684
127
  ```
685
128
 
686
- **`extension "vector" is not available`**
129
+ ## Privacy
687
130
 
688
- You're running Postgres 16 or older. The `pgvector` Homebrew bottle only ships extensions for Postgres 17 and 18. Switch to `postgresql@17`. I learned this the hard way. The install error message is cryptic and the fix took me 30 minutes the first time.
131
+ If you only use Ollama, **nothing leaves your machine, ever.** You can air-gap the computer and it keeps working. If you add a cloud provider, only the chat messages and your assembled system prompt go to that provider's servers. Your database, embeddings, and saved API keys stay local.
689
132
 
690
- **Ollama silently fails to pull a new model**
133
+ ## For developers
691
134
 
692
- You've got a version mismatch between the Ollama CLI and the Ollama server. This bites you if you have both Homebrew Ollama AND the desktop Ollama app installed. Check `ollama --version`. Both client and server should match.
135
+ Underneath the chat UI, RecallMEM is a **deterministic memory framework** you can fork and use in your own AI app. The whole `lib/` folder is intentionally framework-shaped.
693
136
 
694
- ```bash
695
- brew upgrade ollama
696
- pkill -f "Ollama" # kill the old desktop app server
697
- brew services start ollama # start the new server from Homebrew
137
+ ```
138
+ lib/
139
+ ├── memory.ts Memory orchestrator (profile + facts + vector recall in parallel)
140
+ ├── prompts.ts System prompt assembly with all memory context
141
+ ├── facts.ts Fact extraction (LLM proposes) + validation (TypeScript decides)
142
+ ├── profile.ts Synthesizes a structured profile from active facts
143
+ ├── chunks.ts Transcript splitting, embedding, vector search
144
+ ├── chats.ts Chat CRUD + transcript serialization
145
+ ├── post-chat.ts Post-chat pipeline (title, facts, profile rebuild, embed)
146
+ ├── rules.ts Custom user rules / instructions
147
+ ├── embeddings.ts EmbeddingGemma calls via Ollama
148
+ ├── llm.ts LLM router (Ollama, Anthropic, OpenAI, OpenAI-compatible)
149
+ └── db.ts Postgres pool + configurable user ID resolver
698
150
  ```
699
151
 
700
- **Gemma 4 31B is slow**
701
-
702
- Two reasons:
703
-
704
- 1. **Thinking mode is on.** The app already disables it via `think: false`, but if you bypass the app and call Ollama directly, you'll see slow responses. Gemma 4 spends a ton of tokens "thinking" before answering when it's enabled.
705
- 2. **Dense vs MoE.** 31B Dense activates all 31B parameters per token. Switch to `gemma4:26b` (Mixture of Experts, only 3.8B active per token) for ~3-5x the speed with minimal quality loss. This is what I use as the default.
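If you're calling Ollama directly and hitting the slow path, disable thinking the same way the app does. A minimal request-body sketch — the `think: false` field comes from the text above; the rest of the shape follows Ollama's chat API, and the helper name is illustrative:

```javascript
// Build an Ollama chat request with Gemma's thinking mode off, as the app does.
function buildChatRequest(model, messages) {
  return { model, messages, stream: false, think: false };
}

const body = buildChatRequest("gemma4:26b", [{ role: "user", content: "hi" }]);
// Send with: fetch("http://localhost:11434/api/chat", { method: "POST", body: JSON.stringify(body) })
```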
706
-
707
- **"My memory isn't being used in new chats"**
152
+ Wire in your own auth with two calls at startup, and every lib function respects it. See the [developer docs](./docs/DEVELOPERS.md) for embedding the memory layer in your own app, the database schema, testing, and optional Langfuse observability.
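The "two calls" shape can be sketched as a resolver you register once at startup. The function names here are hypothetical, not the actual `lib/db.ts` exports:

```javascript
// Hypothetical resolver registry; the real names in lib/db.ts may differ.
let resolveUserId = () => "local-user"; // default: single-user mode

function setUserIdResolver(fn) {
  resolveUserId = fn; // call once at startup with your auth logic
}

function currentUserId() {
  return resolveUserId(); // every lib function asks this for the user ID
}

// Example: wire in your own session-based auth
setUserIdResolver(() => "user-42");
```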
708
153
 
709
- Make sure you click "New chat" (or switch to another chat in the sidebar) to trigger the synchronous "Saving memory..." finalize step. If you just refresh the browser without ending the chat, the post-chat pipeline runs as a best-effort `sendBeacon()` and may not finish before the next chat starts.
154
+ ## Docs
710
155
 
711
- The fix: always click "New chat" or switch chats in the sidebar before closing the browser if you said something you want remembered.
156
+ | Doc | What's in it |
157
+ |---|---|
158
+ | [Architecture deep dive](./docs/ARCHITECTURE.md) | How deterministic memory works, read/write paths, validation pipeline, why the LLM is not in charge |
159
+ | [Developer guide](./docs/DEVELOPERS.md) | Embedding the memory framework, auth wiring, schema, testing, Langfuse setup |
160
+ | [Hardware guide](./docs/HARDWARE.md) | Which model fits which machine, RAM requirements, cloud vs. local tradeoffs |
161
+ | [Troubleshooting](./docs/TROUBLESHOOTING.md) | Every gotcha I've hit and how to fix it |
162
+ | [Manual install](./docs/MANUAL_INSTALL.md) | Step-by-step if you don't want to use the CLI |
712
163
 
713
- </details>
164
+ ## Limitations (v0.1)
714
165
 
715
- ---
166
+ Text only (no voice yet). No multi-user. No mobile app. OpenAI vision not fully wired. Reasoning models (o1/o3, extended thinking) may have edge cases. Fact supersession is LLM-judged and intentionally conservative. See the [full limitations list](./docs/LIMITATIONS.md).
716
167
 
717
168
  ## Contributing
718
169
 
719
- Forks, PRs, bug reports, ideas, all welcome. See [CONTRIBUTING.md](./CONTRIBUTING.md) for the dev setup and how the codebase is organized.
720
-
721
- If you build something cool on top of RecallMEM, I'd love to hear about it.
170
+ Forks, PRs, bug reports, ideas, all welcome. See [CONTRIBUTING.md](./CONTRIBUTING.md) for the dev setup.
722
171
 
723
172
  ## License
724
173
 
725
- Apache License 2.0. See [LICENSE](./LICENSE) for the full text and [NOTICE](./NOTICE) for third-party attributions. You can use, modify, fork, and redistribute this for any purpose, personal or commercial. The license includes a patent grant and the standard "no warranty, no liability" disclaimer.
174
+ Apache 2.0. See [LICENSE](./LICENSE) and [NOTICE](./NOTICE). Use it, modify it, fork it, ship it commercially.
726
175
 
727
176
  ## Status
728
177
 
729
- This is v0.1. It works. I use it every day.
730
-
731
- It's also not "production ready" in the corporate sense. There's no CI, no error monitoring, no SLA. There's a small Vitest test suite that covers the deterministic memory primitives (keyword routing, inflection, regression cases), but it's intentionally narrow. If you want to use it as your daily AI tool, fork it, make it yours, and expect to read the code if something breaks. That's the deal.
732
-
733
- I built RecallMEM because I wanted my own private AI. I'm sharing it because there's a real gap in the local AI ecosystem and someone needed to fill it. If this is useful to you, that's cool. If not, no hard feelings.
178
+ v0.1. It works. I use it every day. There's no CI, no error monitoring, no SLA. If you want to use it as your daily AI tool, fork it, make it yours, and expect to read the code if something breaks. That's the deal.
734
179
 
735
- The repo: [github.com/RealChrisSean/RecallMEM](https://github.com/RealChrisSean/RecallMEM)
180
+ [github.com/RealChrisSean/RecallMEM](https://github.com/RealChrisSean/RecallMEM)
@@ -1,21 +1,26 @@
1
1
  /**
2
2
  * recallmem init / setup
3
3
  *
4
- * Idempotent setup pipeline:
5
- * 1. Check Node.js version
6
- * 2. Check Postgres is installed and running
7
- * 3. Check pgvector is available
8
- * 4. Create the database if missing
9
- * 5. Run migrations
10
- * 6. Check Ollama (optional - skip if user wants cloud-only)
11
- * 7. Pull embeddinggemma (required, ~600MB)
12
- * 8. Offer to pull gemma4:26b (recommended chat model, ~18GB)
13
- * 9. Generate .env.local with sensible defaults
4
+ * Real installer (not a hint-giver). Detects what's missing, asks the user
5
+ * ONE yes/no question, then installs everything for them. Then asks which
6
+ * Gemma 4 model to download. Then runs the app.
7
+ *
8
+ * Pipeline:
9
+ * 1. Check Node 20+ (hard fail if missing - we can't bootstrap node from npx)
10
+ * 2. Detect missing pieces (Postgres, pgvector, Ollama)
11
+ * 3. Show one summary + one prompt: "install everything? Y/n"
12
+ * 4. Run brew install / brew services start for everything missing
13
+ * 5. Verify each piece is actually up before moving on
14
+ * 6. Pull EmbeddingGemma (always, required for memory)
15
+ * 7. Ask which Gemma 4 chat model to install (1, 2, or 3)
16
+ * 8. Run migrations
17
+ * 9. Production build (skipped in dev mode)
18
+ * 10. Done
14
19
  */
15
20
 
16
21
  const fs = require("node:fs");
17
22
  const path = require("node:path");
18
- const { spawn, spawnSync, execSync } = require("node:child_process");
23
+ const { spawnSync, execSync } = require("node:child_process");
19
24
 
20
25
  const {
21
26
  getOS,
@@ -26,6 +31,7 @@ const {
26
31
  detectOllama,
27
32
  detectOllamaModel,
28
33
  detectDatabase,
34
+ commandExists,
29
35
  } = require("../lib/detect");
30
36
 
31
37
  const {
@@ -46,7 +52,7 @@ const {
46
52
  blank,
47
53
  } = require("../lib/output");
48
54
 
49
- const { confirm } = require("../lib/prompt");
55
+ const { confirm, ask } = require("../lib/prompt");
50
56
 
51
57
  const DEFAULT_DB_NAME = "recallmem";
52
58
 
@@ -71,6 +77,54 @@ function writeEnv(envPath, env) {
71
77
  fs.writeFileSync(envPath, lines.join("\n") + "\n");
72
78
  }
73
79
 
80
+ // Run a shell command and stream output to the user. Returns true on success.
81
+ function run(command, args, label) {
82
+ if (label) step(label);
83
+ const result = spawnSync(command, args, { stdio: "inherit" });
84
+ return result.status === 0;
85
+ }
86
+
87
+ // Wait up to N seconds for a service to become ready. Used after starting
88
+ // brew services so we don't race ahead before postgres/ollama is actually up.
89
+ async function waitFor(checkFn, timeoutMs = 15000, intervalMs = 500) {
90
+ const start = Date.now();
91
+ while (Date.now() - start < timeoutMs) {
92
+ if (await checkFn()) return true;
93
+ await new Promise((r) => setTimeout(r, intervalMs));
94
+ }
95
+ return false;
96
+ }
97
+
98
+ // Pretty model menu — short lines, plain words, dyslexia-friendly.
99
+ async function pickGemmaModel() {
100
+ blank();
101
+ console.log(color.bold("Pick a Gemma 4 model."));
102
+ blank();
103
+ console.log(" 1) Gemma 4 26B");
104
+ console.log(" Size: 18 GB");
105
+ console.log(" Speed: Fast");
106
+ console.log(" Best for: Most people. Recommended.");
107
+ blank();
108
+ console.log(" 2) Gemma 4 31B");
109
+ console.log(" Size: 19 GB");
110
+ console.log(" Speed: Slower");
111
+ console.log(" Best for: People who want the smartest answers, even if it takes longer.");
112
+ blank();
113
+ console.log(" 3) Gemma 4 E2B");
114
+ console.log(" Size: 2 GB");
115
+ console.log(" Speed: Very fast");
116
+ console.log(" Best for: A quick test. Or older laptops.");
117
+ blank();
118
+
119
+ while (true) {
120
+ const answer = await ask("Type 1, 2, or 3 and press Enter [1]: ");
121
+ if (!answer || answer === "1") return { id: "gemma4:26b", label: "Gemma 4 26B" };
122
+ if (answer === "2") return { id: "gemma4:31b", label: "Gemma 4 31B" };
123
+ if (answer === "3") return { id: "gemma4:e2b", label: "Gemma 4 E2B" };
124
+ console.log(" Type 1, 2, or 3.");
125
+ }
126
+ }
127
+
74
128
  async function setupCommand(opts = {}) {
75
129
  const {
76
130
  silent = false,
@@ -79,9 +133,10 @@ async function setupCommand(opts = {}) {
79
133
  devMode = false,
80
134
  } = opts;
81
135
  const ENV_PATH = path.join(installPath, ".env.local");
136
+ const os = getOS();
82
137
 
83
- // ─── Step 1: Node.js ───────────────────────────────────────────────────
84
- if (!silent) section("Checking dependencies");
138
+ // ─── Step 1: Node.js (hard requirement, we're already running on it) ───
139
+ if (!silent) section("Checking what you have");
85
140
  const node = detectNode();
86
141
  if (!node.ok) {
87
142
  fail(`Node.js ${node.version} is too old (need ${node.needed}+)`);
@@ -91,54 +146,157 @@ async function setupCommand(opts = {}) {
91
146
  }
92
147
  success(`Node.js ${node.version}`);
93
148
 
94
- // ─── Step 2: Postgres ──────────────────────────────────────────────────
95
- const pg = detectPostgres();
96
- if (!pg.installed) {
97
- fail("Postgres not found");
98
- blank();
99
- console.log(postgresInstallHint());
149
+ // ─── Step 2: Detect everything else ───────────────────────────────────
150
+ let pg = detectPostgres();
151
+ let pgService = pg.installed ? detectPostgresService() : { running: false };
152
+ let ollama = detectOllama();
153
+
154
+ // ─── Step 3: Print a summary of what's there and what's missing ───────
155
+ blank();
156
+ console.log(pg.installed && pg.ok
157
+ ? ` ✓ Postgres ${pg.major}`
158
+ : " ✗ Postgres 17 with pgvector — missing");
159
+ console.log(pgService.running
160
+ ? " ✓ Postgres is running"
161
+ : pg.installed
162
+ ? " ✗ Postgres is installed but not running"
163
+ : " ✗ Postgres is not running");
164
+ console.log(ollama.installed && ollama.running
165
+ ? " ✓ Ollama is running"
166
+ : ollama.installed
167
+ ? " ✗ Ollama is installed but not running"
168
+ : " ✗ Ollama — missing");
169
+ blank();
170
+
171
+ const needPostgres = !pg.installed || !pg.ok;
172
+ const needPostgresStart = pg.installed && pg.ok && !pgService.running;
173
+ const needOllama = !ollama.installed;
174
+ const needOllamaStart = ollama.installed && !ollama.running;
175
+ const anythingMissing = needPostgres || needPostgresStart || needOllama || needOllamaStart;
176
+
177
+ if (anythingMissing) {
178
+ // Check Homebrew is available before offering auto-install on Mac
179
+ const hasBrew = os === "mac" && commandExists("brew");
180
+
181
+ if (os === "mac" && !hasBrew) {
182
+ fail("Homebrew is required to auto-install dependencies.");
183
+ blank();
184
+ console.log("Install Homebrew first by pasting this in your terminal:");
185
+ console.log("");
186
+ console.log(" /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"");
187
+ console.log("");
188
+ console.log("Then re-run: npx recallmem");
189
+ return { ok: false };
190
+ }
191
+
192
+ if (os !== "mac" && os !== "linux") {
193
+ fail(`Auto-install isn't supported on ${os}.`);
194
+ info("On Windows, use WSL2 with Ubuntu and re-run npx recallmem inside WSL.");
195
+ return { ok: false };
196
+ }
197
+
198
+ if (os === "linux") {
199
+ // Linux is doable but we can't run apt without sudo, and the package
200
+ // names vary by distro. Print clear instructions and exit.
201
+ fail("Auto-install is currently only set up for Mac (Homebrew).");
202
+ blank();
203
+ console.log("On Linux, install these manually then re-run npx recallmem:");
204
+ console.log("");
205
+ console.log(" Postgres 17 with pgvector (your distro's package manager)");
206
+ console.log(" Ollama: curl -fsSL https://ollama.com/install.sh | sh");
207
+ console.log(" Then: systemctl start postgresql && systemctl start ollama");
208
+ return { ok: false };
209
+ }
210
+
211
+ // Mac path: ask once, install everything
100
212
  blank();
101
- info("Once installed, re-run: npx recallmem");
102
- return { ok: false };
103
- }
104
- if (!pg.ok) {
105
- fail(`Postgres ${pg.major} found, but version 17+ is required`);
213
+ console.log("I can install and start the missing pieces for you using Homebrew.");
214
+ console.log("This takes about 2-5 minutes (not counting the model download).");
106
215
  blank();
107
- console.log(postgresInstallHint());
108
- return { ok: false };
109
- }
110
- success(`Postgres ${pg.major}`);
111
-
112
- // ─── Step 3: Postgres service running ──────────────────────────────────
113
- const pgService = detectPostgresService();
114
- if (!pgService.running) {
115
- warn("Postgres is installed but not running");
116
- if (getOS() === "mac") {
117
- info("Try: brew services start postgresql@17");
118
- } else if (getOS() === "linux") {
119
- info("Try: sudo systemctl start postgresql");
216
+ const wantsInstall = await confirm("Install everything now?", true);
217
+ if (!wantsInstall) {
218
+ blank();
219
+ info("Skipped. You can install manually with:");
220
+ if (needPostgres) console.log(" brew install postgresql@17 pgvector");
221
+ if (needPostgresStart || needPostgres) console.log(" brew services start postgresql@17");
222
+ if (needOllama) console.log(" brew install ollama");
223
+ if (needOllamaStart || needOllama) console.log(" brew services start ollama");
224
+ return { ok: false };
120
225
  }
121
- return { ok: false };
226
+
227
+ // Install Postgres if missing
228
+ if (needPostgres) {
229
+ if (!run("brew", ["install", "postgresql@17", "pgvector"], "Installing Postgres 17 + pgvector...")) {
230
+ fail("Failed to install Postgres. Try running it manually and re-run npx recallmem.");
231
+ return { ok: false };
232
+ }
233
+ success("Installed Postgres 17 + pgvector");
234
+ // Re-detect after install
235
+ pg = detectPostgres();
236
+ }
237
+
238
+ // Start Postgres if not running
239
+ if (!pgService.running || needPostgres) {
240
+ step("Starting Postgres in the background...");
241
+ run("brew", ["services", "start", "postgresql@17"]);
242
+ // Wait for it to actually accept connections
243
+ const isUp = await waitFor(() => {
244
+ const r = detectPostgresService();
245
+ return r.running;
246
+ });
247
+ if (!isUp) {
248
+ fail("Postgres started but isn't accepting connections after 15s.");
249
+ info("Try: brew services restart postgresql@17");
250
+ return { ok: false };
251
+ }
252
+ success("Postgres is running on localhost:5432");
253
+ pgService = { running: true };
254
+ }
255
+
256
+ // Install Ollama if missing
257
+ if (needOllama) {
258
+ if (!run("brew", ["install", "ollama"], "Installing Ollama...")) {
259
+ fail("Failed to install Ollama. Try running it manually and re-run npx recallmem.");
260
+ return { ok: false };
261
+ }
262
+ success("Installed Ollama");
263
+ ollama = detectOllama();
264
+ }
265
+
266
+ // Start Ollama if not running
267
+ if (!ollama.running || needOllama) {
268
+ step("Starting Ollama in the background...");
269
+ run("brew", ["services", "start", "ollama"]);
270
+ const isUp = await waitFor(() => {
271
+ const r = detectOllama();
272
+ return r.running;
273
+ });
274
+ if (!isUp) {
275
+ fail("Ollama started but isn't responding after 15s.");
276
+ info("Try: brew services restart ollama");
277
+ return { ok: false };
278
+ }
279
+ success("Ollama is running on localhost:11434");
280
+ ollama = detectOllama();
281
+ }
282
+ } else {
283
+ success("Everything is already installed and running.");
122
284
  }
123
- success("Postgres service running on localhost:5432");
124
285
 
125
286
  // ─── Step 4: env file (we need DATABASE_URL before checking pgvector) ──
126
287
  const env = readEnv(ENV_PATH);
127
288
  let connectionString = env.DATABASE_URL;
128
-
129
289
  if (!connectionString) {
130
290
  connectionString = defaultConnectionString();
131
- step(`No .env.local found, will create one with default DATABASE_URL`);
132
291
  }
133
292
 
134
293
  // ─── Step 5: Database exists ───────────────────────────────────────────
135
- // Extract the database name from the connection string for accurate messages
136
294
  const dbNameMatch = connectionString.match(/\/([^/?]+)(\?|$)/);
137
295
  const dbName = dbNameMatch ? dbNameMatch[1] : DEFAULT_DB_NAME;
138
296
 
139
297
  const dbCheck = await detectDatabase(connectionString);
140
298
  if (!dbCheck.exists) {
141
- step(`Database '${dbName}' not found, creating...`);
299
+ step(`Creating database '${dbName}'...`);
142
300
  try {
143
  execSync(`${pg.psqlPath.replace(/psql$/, "createdb")} ${dbName}`, {
  stdio: "pipe",
@@ -164,7 +322,7 @@ async function setupCommand(opts = {}) {
  success("pgvector extension available");

  // ─── Step 7: Run migrations ────────────────────────────────────────────
- step("Running migrations...");
+ step("Running database migrations...");
  try {
  process.env.DATABASE_URL = connectionString;
  const migrateResult = spawnSync("npx", ["tsx", "scripts/migrate.ts"], {
@@ -181,84 +339,59 @@ async function setupCommand(opts = {}) {
  return { ok: false };
  }

- // ─── Step 8: Ollama (optional) ─────────────────────────────────────────
- section("Checking LLM runtime");
- const ollama = detectOllama();
- let ollamaUrl = env.OLLAMA_URL || "http://localhost:11434";
-
- if (!ollama.installed) {
- warn("Ollama not installed (optional - you can use cloud providers instead)");
- blank();
- console.log(ollamaInstallHint());
- blank();
- info("Continuing without Ollama. You can add Claude/OpenAI as a provider in the app.");
- blank();
- } else if (!ollama.running) {
- warn("Ollama is installed but not running");
- if (getOS() === "mac") {
- info("Try: brew services start ollama");
- } else {
- info("Try: ollama serve");
- }
- blank();
- } else {
- success(`Ollama running (${ollama.version || "unknown version"})`);
-
- // ─── Step 9: Required model: embeddinggemma ──────────────────────────
+ // ─── Step 8: Always pull EmbeddingGemma (required for memory) ──────────
+ if (ollama.running) {
+ section("Setting up models");
  const embedModel = await detectOllamaModel("embeddinggemma");
  if (!embedModel.installed) {
- step("Pulling embeddinggemma (~600MB, required for vector search)...");
+ step("Downloading EmbeddingGemma (~600 MB, required for memory)...");
  try {
  execSync("ollama pull embeddinggemma", { stdio: "inherit" });
- success("Pulled embeddinggemma");
+ success("EmbeddingGemma installed");
  } catch (err) {
  fail(`Failed to pull embeddinggemma: ${err.message}`);
  return { ok: false };
  }
  } else {
- success("embeddinggemma installed");
+ success("EmbeddingGemma already installed");
  }

- // ─── Step 10: Recommended model: gemma4:26b ──────────────────────────
- const chatModel = await detectOllamaModel("gemma4:26b");
- if (!chatModel.installed && !skipIfDone) {
- blank();
- info("Recommended chat model: gemma4:26b (~18GB)");
- info("Optional - you can use cloud providers (Claude, GPT) instead.");
- const wantsPull = await confirm("Pull gemma4:26b now?", false);
- if (wantsPull) {
- try {
- execSync("ollama pull gemma4:26b", { stdio: "inherit" });
- success("Pulled gemma4:26b");
- } catch (err) {
- fail(`Failed to pull gemma4:26b: ${err.message}`);
- info("You can pull it later with: ollama pull gemma4:26b");
- }
- } else {
- info("Skipped. You can pull it later with: ollama pull gemma4:26b");
+ // ─── Step 9: Pick a Gemma 4 chat model ───────────────────────────────
+ // Check if any Gemma 4 chat model is already installed first.
+ const has26 = await detectOllamaModel("gemma4:26b");
+ const has31 = await detectOllamaModel("gemma4:31b");
+ const hasE2 = await detectOllamaModel("gemma4:e2b");
+ const hasAny = has26.installed || has31.installed || hasE2.installed;
+
+ // Always show the picker when no Gemma chat model is installed.
+ // skipIfDone is intentionally NOT checked here - on a fresh machine
+ // we MUST pull a model or the chat 404s on first message.
+ if (!hasAny) {
+ const choice = await pickGemmaModel();
+ step(`Downloading ${choice.label}... (this can take a while)`);
+ try {
+ execSync(`ollama pull ${choice.id}`, { stdio: "inherit" });
+ success(`${choice.label} installed`);
+ } catch (err) {
+ fail(`Failed to pull ${choice.id}: ${err.message}`);
+ info(`You can pull it later with: ollama pull ${choice.id}`);
  }
- } else if (chatModel.installed) {
- success("gemma4:26b installed");
+ } else if (hasAny) {
+ success("A Gemma 4 chat model is already installed");
  }
  }

- // ─── Step 11: Write .env.local ─────────────────────────────────────────
- section("Writing config");
+ // ─── Step 10: Write .env.local ─────────────────────────────────────────
  const finalEnv = {
  DATABASE_URL: env.DATABASE_URL || connectionString,
- OLLAMA_URL: env.OLLAMA_URL || ollamaUrl,
+ OLLAMA_URL: env.OLLAMA_URL || "http://localhost:11434",
  OLLAMA_CHAT_MODEL: env.OLLAMA_CHAT_MODEL || "gemma4:26b",
  OLLAMA_FAST_MODEL: env.OLLAMA_FAST_MODEL || "gemma4:e4b",
  OLLAMA_EMBED_MODEL: env.OLLAMA_EMBED_MODEL || "embeddinggemma",
  };
- const existedBefore = fs.existsSync(ENV_PATH);
  writeEnv(ENV_PATH, finalEnv);
- success(`.env.local ${existedBefore ? "updated" : "created"}`);

- // ─── Step 12: Production build (skipped in dev mode) ──────────────────
- // End users get a production build for speed and to avoid dev-mode hot-reload
- // hydration warnings. Developers in their own checkout (devMode=true) skip the
- // build so they get hot reload via `next dev`.
+ // ─── Step 11: Production build (skipped in dev mode) ──────────────────
  if (!devMode) {
  const hasBuild = fs.existsSync(
  path.join(installPath, ".next", "BUILD_ID")
@@ -279,16 +412,20 @@ async function setupCommand(opts = {}) {
  }
  } catch (err) {
  warn(`Build failed: ${err.message}`);
- info("Falling back to dev mode at runtime");
  }
- } else {
- success("Production build already exists");
  }
  }

  blank();
  success(color.bold("Setup complete!"));
  blank();
+ console.log("Want a different Gemma 4 model later? Run one of these:");
+ console.log(" ollama pull gemma4:26b");
+ console.log(" ollama pull gemma4:31b");
+ console.log(" ollama pull gemma4:e2b");
+ console.log("");
+ console.log("Then pick it from the dropdown at the top of the chat.");
+ blank();

  return { ok: true };
  }
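The new Step 9 flow (check whether any Gemma 4 chat model is already installed, and show the picker only when none is) can be sketched as a standalone script. `detectOllamaModel` and `pickGemmaModel` are the setup script's own helpers; they are stubbed below, and the stubbed behavior (nothing installed, default choice returned) is an assumption for illustration only:

```javascript
// Sketch of the Step 9 model-selection flow, with the helpers stubbed
// so the control flow runs standalone.
const GEMMA_CHOICES = [
  { id: "gemma4:26b", label: "Gemma 4 26B (~18 GB)" },
  { id: "gemma4:31b", label: "Gemma 4 31B" },
  { id: "gemma4:e2b", label: "Gemma 4 E2B" },
];

// Stub: pretend no model is installed yet (a real helper queries Ollama).
async function detectOllamaModel(_id) {
  return { installed: false };
}

// Stub: a real implementation prompts the user; default to the first choice.
async function pickGemmaModel() {
  return GEMMA_CHOICES[0];
}

async function chooseChatModel() {
  const checks = await Promise.all(
    GEMMA_CHOICES.map((c) => detectOllamaModel(c.id))
  );
  const hasAny = checks.some((c) => c.installed);
  if (hasAny) return null; // something is installed: nothing to pull
  // Fresh machine: a model must be pulled or the first chat message 404s.
  return pickGemmaModel();
}

chooseChatModel().then((choice) => {
  console.log(choice ? `would pull: ${choice.id}` : "already installed");
  // → "would pull: gemma4:26b"
});
```

When any of the three models is already present, `chooseChatModel` resolves to `null` and the pull is skipped, which matches the diff's comment that `skipIfDone` is deliberately ignored here: a fresh install must always end with at least one chat model.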
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "recallmem",
- "version": "0.1.0",
+ "version": "0.1.2",
  "description": "Private, local-first AI chatbot with persistent working memory. One command install via npx.",
  "license": "Apache-2.0",
  "author": "Chris Sean",
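The Step 10 config write in the setup.js diff above keeps any value already present in `.env.local` and only fills gaps with defaults, so re-running setup never clobbers a user's overrides. A minimal sketch of that precedence rule (the function name `mergeEnv` is hypothetical; the keys and defaults are taken from the diff):

```javascript
// Existing env values win over defaults via `||`, per the finalEnv
// object in setup.js. The keys and fallback values mirror the diff.
function mergeEnv(env, connectionString) {
  return {
    DATABASE_URL: env.DATABASE_URL || connectionString,
    OLLAMA_URL: env.OLLAMA_URL || "http://localhost:11434",
    OLLAMA_CHAT_MODEL: env.OLLAMA_CHAT_MODEL || "gemma4:26b",
    OLLAMA_FAST_MODEL: env.OLLAMA_FAST_MODEL || "gemma4:e4b",
    OLLAMA_EMBED_MODEL: env.OLLAMA_EMBED_MODEL || "embeddinggemma",
  };
}

const merged = mergeEnv(
  { OLLAMA_CHAT_MODEL: "gemma4:e2b" }, // user already picked a smaller model
  "postgres://localhost/recallmem"
);
console.log(merged.OLLAMA_CHAT_MODEL); // → "gemma4:e2b" (override kept)
console.log(merged.DATABASE_URL);      // → "postgres://localhost/recallmem"
```

One consequence of this pattern: an empty string in `.env.local` is falsy and would be replaced by the default, which is usually the desired behavior for a setup script.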