arcrift-setup 1.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (4)
  1. package/LICENSE +21 -0
  2. package/README.md +685 -0
  3. package/bin/setup.js +74 -0
  4. package/package.json +35 -0
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Eshaan Nair
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,685 @@
1
+ <div align="center">
2
+
3
+ # ArcRift — Persistent Memory for AI Coding Tools
4
+
5
+ ### Your AI forgets everything between sessions. ArcRift fixes that.
6
+ ### Memory saved in a browser chat is instantly available in your coding tool, and vice versa.
7
+
8
+ **A local-first memory layer that captures your conversations, builds a searchable knowledge graph, and automatically injects the right context into every new prompt — no cloud, no subscriptions, no re-explaining yourself.**
9
+
10
+ <br/>
11
+
12
+ [![Stars](https://img.shields.io/github/stars/Eshaan-Nair/ARCRIFT?style=for-the-badge&logo=github&labelColor=0B0E14&color=6366F1)](https://github.com/Eshaan-Nair/ARCRIFT/stargazers)
13
+ [![Forks](https://img.shields.io/github/forks/Eshaan-Nair/ARCRIFT?style=for-the-badge&logo=github&labelColor=0B0E14&color=06B6D4)](https://github.com/Eshaan-Nair/ARCRIFT/forks)
14
+ [![Issues](https://img.shields.io/github/issues/Eshaan-Nair/ARCRIFT?style=for-the-badge&logo=github&labelColor=0B0E14&color=02C39A)](https://github.com/Eshaan-Nair/ARCRIFT/issues)
15
+ [![Downloads](https://img.shields.io/npm/dt/arcrift-setup?style=for-the-badge&logo=npm&labelColor=0B0E14&color=CB3837)](https://www.npmjs.com/package/arcrift-setup)
16
+ [![Version](https://img.shields.io/badge/version-1.5.3-6366F1?style=for-the-badge&labelColor=0B0E14)](CHANGELOG.md)
17
+ [![License: MIT](https://img.shields.io/badge/License-MIT-F8FAFC?style=for-the-badge&labelColor=0B0E14)](LICENSE)
18
+
19
+ <br/>
20
+
21
+ **Browser Extension:** Claude · ChatGPT · Gemini · DeepSeek · Grok · Copilot · Mistral
22
+
23
+ **MCP (AI Coding Tools):** Claude Code · Cursor · Windsurf · Claude Desktop
24
+
25
+ https://github.com/user-attachments/assets/49d8eb52-c266-449a-ae45-147ec755ec09
26
+
27
+ <br/>
28
+
29
+ </div>
30
+
31
+ ## One Command Setup
32
+
33
+ ```bash
34
+ npx arcrift-setup
35
+ ```
36
+
37
+
38
+ ---
39
+
40
+ ## The Problem
41
+
42
+ You are deep in a complex project. You have had 30 conversations with Claude about your auth flow, database schema, and deployment strategy. You open a new chat — and it is all gone. You spend 10 minutes re-explaining context you have already covered, and the AI gives you advice that contradicts decisions you made two weeks ago.
43
+
44
+ ArcRift stops the cycle. It captures your AI conversations, extracts structured facts into a knowledge graph, embeds them as searchable vectors, and automatically prepends the most relevant context to every new prompt — before you even finish typing.
45
+
46
+ ---
47
+
48
+ ## Table of Contents
49
+
50
+ - [How the Two Modes Work](#how-the-two-modes-work)
51
+ - [Key Features](#key-features)
52
+ - [Performance Benchmarks](#performance-benchmarks)
53
+ - [System Requirements](#system-requirements)
54
+ - [Installation](#installation)
55
+ - [Web Extension Setup](#web-extension-setup)
56
+ - [MCP Server Setup](#mcp-server-setup)
57
+ - [Running Both Together](#running-both-together)
58
+ - [Usage Guide](#usage-guide)
59
+ - [Using the Browser Extension](#using-the-browser-extension)
60
+ - [Using the MCP Tools](#using-the-mcp-tools)
61
+ - [Dashboard](#dashboard)
62
+ - [How It Works](#how-it-works)
63
+ - [Quality-of-Life Details](#quality-of-life-details)
64
+ - [Architecture](#architecture)
65
+ - [Privacy and Security](#privacy-and-security)
66
+ - [What's New in v1.5.3](#whats-new-in-v153)
67
+ - [Documentation](#documentation)
68
+ - [Contributing](#contributing)
69
+ - [License](#license)
70
+
71
+ ---
72
+
73
+ ## How the Two Modes Work
74
+
75
+ ArcRift has two complementary modes that share the same memory store. You can use one, the other, or both at the same time.
76
+
77
+ ### Mode 1 — Browser Extension (Web)
78
+
79
+ The extension runs inside Chrome on the supported AI chat websites listed below. When you save a conversation, it scrapes the page, scrubs PII, chunks and embeds the text locally, and sends it to the ArcRift backend. On every subsequent prompt you type, the extension intercepts the input, queries the backend for relevant context, and prepends it to your message automatically, before the request reaches the AI.
80
+
81
+ Best for: Claude, ChatGPT, Gemini, DeepSeek, Grok, Microsoft Copilot, and Mistral web interfaces.
82
+
83
+ ### Mode 2 — MCP Server (Coding Tools)
84
+
85
+ The MCP server exposes ArcRift as a set of tools that coding agents can call directly. Instead of intercepting DOM events, the AI tool calls `recall_context` at the start of a session to pull in relevant memory, and `store_memory` after completing work to save decisions and context for future sessions.
86
+
87
+ Best for: Claude Code, Cursor, Windsurf — anywhere you write code with an AI coding agent.
88
+
89
+ ### Shared Memory
90
+
91
+ Both modes write to and read from the same backend database. A conversation you save via the browser extension is immediately available to `recall_context` in your coding tool, and vice versa. They are two interfaces into one unified knowledge base.
92
+
93
+ ---
94
+
95
+ ## Key Features
96
+
97
+ ### Core Retrieval Engine
98
+
99
+ | Feature | Detail |
100
+ |:---|:---|
101
+ | **Three-Layer Hybrid Search** | Sentence vectors, chunk vectors, and FTS5 keyword search run in parallel. Results are fused and ranked by a combined score. |
102
+ | **Surgical Sentence Trimming** | Chunks are split into individual sentences at index time. On retrieval, only the sentences that directly match the query are returned — not the entire surrounding paragraph. Reduces prompt noise by up to 95%. |
103
+ | **HyDE (Hypothetical Document Embeddings)** | Before querying the vector store, ArcRift generates a hypothetical answer to your query and uses that embedding alongside the raw query. This improves recall for rephrased or indirect questions, where the query shares few words with the stored text. |
104
+ | **Small-to-Big Retrieval** | High-precision sentence match triggers fetching the parent chunk for broader context. Precision of a sentence search, context of a full paragraph. |
105
+ | **Knowledge Graph Layer** | Every saved conversation is processed to extract subject-relation-object triples (22 entity types, 20+ relation types). Graph facts are fused with vector results on every recall. |
106
+ | **Background Indexing** | Sentence-level embedding is offloaded to a background job queue so Save is instant. The deep index is built asynchronously without blocking the UI. |
107
+
108
+ ### Extension Quality-of-Life
109
+
110
+ | Feature | Detail |
111
+ |:---|:---|
112
+ | **Auto-Connect** | Once a session is active, ArcRift re-attaches automatically on every page load. No clicking required — just type. |
113
+ | **SPA Navigation Awareness** | Detects "New Chat" clicks in single-page apps (ChatGPT, Claude, Gemini) without a full page reload. Automatically resets the active session so context does not bleed between conversations. |
114
+ | **Pause / Resume** | One click in the popup pauses auto-injection. Click again to resume. State persists across tabs. |
115
+ | **Classic Inject** | One-time manual inject button for priming a cold start without enabling auto-connect. |
116
+ | **FNV-1a Deduplication** | Identical conversation segments are fingerprinted and skipped — re-saving a chat never creates duplicate embeddings. |
117
+ | **Multi-Strategy DOM Resolver** | Each platform has five ordered selector strategies. If one breaks after a UI update, the next activates automatically. |
118
+ | **Restricted URL Guard** | Injection is blocked on `chrome://`, `about:`, and extension pages. Prevents crashes on non-chat pages. |
119
+
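The FNV-1a deduplication row above can be illustrated with a minimal sketch. It assumes the 32-bit variant of FNV-1a over UTF-8 bytes; the README does not say which width ArcRift uses, and `skipDuplicates` is a hypothetical helper name, not part of the ArcRift codebase.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of a string.
// Assumption: the 32-bit variant; ArcRift's actual hash width is not documented here.
function fnv1a32(text: string): number {
  const bytes = new TextEncoder().encode(text);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical helper: keeps only segments whose fingerprint has not been seen.
function skipDuplicates(segments: string[], seen: Set<number>): string[] {
  const fresh: string[] = [];
  for (const segment of segments) {
    const fingerprint = fnv1a32(segment.trim());
    if (seen.has(fingerprint)) continue; // already embedded, skip it
    seen.add(fingerprint);
    fresh.push(segment);
  }
  return fresh;
}
```

A 32-bit fingerprint can collide on large corpora, so a real implementation might compare the stored text on a fingerprint match before skipping a segment.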
120
+ ### MCP Tool Quality-of-Life
121
+
122
+ | Tool | What it does |
123
+ |:---|:---|
124
+ | `recall_context` | Retrieves the top-N most relevant memory chunks for a prompt, scoped to a project. Includes knowledge graph facts. |
125
+ | `store_memory` | Saves text or a transcript to ArcRift Memory. Auto-creates the project if it does not exist. Triggers full background indexing. |
126
+ | `search_memory` | Cross-project global search. Useful for finding decisions made in a different project that apply to the current one. |
127
+ | `list_projects` | Lists all saved projects with metadata — chunk count, triple count, last updated. |
128
+ | `get_project_summary` | Returns a structured knowledge graph summary for a project as readable markdown. |
129
+ | `identify_active_project` | Matches a folder path against saved project names. Lets the AI agent auto-detect which project it is working on from the CWD. |
130
+ | `prune_memory` | Surgically removes facts or chunks matching a description. Corrects outdated information without wiping an entire project. |
131
+
132
+ ### Infrastructure
133
+
134
+ | Feature | Detail |
135
+ |:---|:---|
136
+ | **Zero-Docker Mode** | `ARCRIFT_STORAGE_MODE=sqlite` replaces all Docker services with a single `ArcRift.db` file. Full feature parity — vector search, knowledge graph, job queue, everything. |
137
+ | **WAL Concurrency** | SQLite runs in Write-Ahead Logging mode, allowing simultaneous reads from the dashboard, extension, and MCP server without lock contention. |
138
+ | **Dead Letter Queue** | Background jobs that fail are retried up to 5 times with exponential backoff. Failed jobs move to a dead letter queue visible in the dashboard — nothing is silently lost. |
139
+ | **Ghost Job Cleanup** | On startup, any jobs stuck in PROCESSING state from a previous crashed run are automatically reset to PENDING. |
140
+ | **Rate Limiting** | Save endpoint is rate-limited independently from read endpoints. Prevents accidental flooding from rapid saves. |
141
+ | **Helmet Security Headers** | All responses include `Content-Security-Policy`, `X-Frame-Options`, `X-Content-Type-Options`, and related headers. |
142
+
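The retry and recovery rows above can be sketched as two pure functions. This is an illustration under stated assumptions, not ArcRift's job queue: the `Job` shape, the `DEAD` and `COMPLETED` status names, and the 1-second base delay are hypothetical; only the five-retry limit and the `PROCESSING` to `PENDING` reset come from the README.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "DEAD";

// Hypothetical job record; field names are assumptions for this sketch.
interface Job {
  id: string;
  status: JobStatus;
  failures: number;   // how many times the job has failed so far
  runAfterMs: number; // earliest time (epoch ms) the job may run again
}

const MAX_RETRIES = 5;      // from the README: retried up to 5 times
const BASE_DELAY_MS = 1000; // assumption: first retry waits one second

// Called when a job fails: schedule a retry with exponential backoff,
// or move the job to the dead letter queue once retries are exhausted.
function onJobFailure(job: Job, nowMs: number): Job {
  const failures = job.failures + 1;
  if (failures > MAX_RETRIES) {
    return { ...job, failures, status: "DEAD" };
  }
  const delayMs = BASE_DELAY_MS * Math.pow(2, failures - 1); // 1s, 2s, 4s, 8s, 16s
  return { ...job, failures, status: "PENDING", runAfterMs: nowMs + delayMs };
}

// Called on startup: jobs left in PROCESSING by a crashed run become PENDING again.
function recoverGhostJobs(jobs: Job[]): Job[] {
  return jobs.map((job): Job =>
    job.status === "PROCESSING" ? { ...job, status: "PENDING" } : job
  );
}
```

Keeping both functions pure makes the state transitions easy to test without a database.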
143
+ ---
144
+
145
+ ## Performance Benchmarks
146
+
147
+ Every release is stress-tested across four independent audits. All results are reproducible using the scripts in `backend/scripts/`.
148
+
149
+ ### Web Context Engine (Browser Extension)
150
+
151
+ **Scale:** 1,000 chunks (~300,000 words) | **Needles:** 20 facts | **Queries:** 60 phrasings
152
+
153
+ | Metric | Result | What it means |
154
+ |:---|:---|:---|
155
+ | **Recall @ 1** | **90.0%** | Correct fact was the top result in 54 of 60 searches |
156
+ | **Mean Reciprocal Rank** | **0.806** | Equivalent to a harmonic-mean rank of about 1.24 for the correct answer (1.0 is perfect) |
157
+ | **Context Compression** | **95.0%** | Payload reduced from 55,350 chars to 2,784 chars before injection |
158
+ | **Mean Relevance Score** | **0.464** | Average semantic similarity of retrieved results (0–1 scale) |
159
+
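For readers unfamiliar with the first two metrics in the table, here is a minimal sketch of the textbook definitions. It assumes each query produces the 1-based rank of the correct fact, or `null` when the fact was not retrieved; the function names are hypothetical, and the scripts in `backend/scripts/` may define the metrics differently.

```typescript
// One entry per query: the 1-based rank of the correct fact, or null if it was not retrieved.
type Rank = number | null;

// Fraction of queries where the correct fact was the top result.
function recallAt1(ranks: Rank[]): number {
  if (ranks.length === 0) return 0;
  const hits = ranks.filter((rank) => rank === 1).length;
  return hits / ranks.length;
}

// Mean of 1/rank over all queries; a miss contributes 0.
function meanReciprocalRank(ranks: Rank[]): number {
  if (ranks.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < ranks.length; i++) {
    const rank = ranks[i];
    if (rank !== null) sum += 1 / rank;
  }
  return sum / ranks.length;
}
```

For example, four queries with ranks `[1, 1, 2, null]` give a Recall @ 1 of 0.5 and an MRR of 0.625.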
160
+ Engine contribution across 54 successful recalls:
161
+
162
+ | Engine | Hits | Role |
163
+ |:---|:---|:---|
164
+ | Sentence Vector | 50 | High-precision match against individual sentences |
165
+ | Chunk Vector | 47 | Thematic match against full 150-word context windows |
166
+ | FTS5 Keyword | 43 | Exact literal matching, boosts low-similarity vector results |
167
+
168
+ The 6 misses were all on degenerate "Context on X?" queries with no semantic content. All natural-language and rephrased queries passed.
169
+
170
+ Full report: [reports/benchmark_web.md](reports/benchmark_web.md)
171
+
172
+ ---
173
+
174
+ ### MCP Context Engine (Coding Tools)
175
+
176
+ **Scale:** 10 facts across real project memory | **Queries:** 30 (3 phrasings each) | **TopN:** 6
177
+
178
+ | Metric | Result | Target | |
179
+ |:---|:---|:---|:---|
180
+ | **Total Recall** | **90%** | ≥90% | PASS |
181
+ | **Context Compression** | **81.3%** | >75% | PASS |
182
+ | **Noise Redacted** | **131,700 chars** | — | vs. returning 6 full chunks raw |
183
+
184
+ Engine contribution across 27 successful recalls:
185
+
186
+ | Engine | Hits | Contribution |
187
+ |:---|:---|:---|
188
+ | Sentence Vector | 26 | 96.3% of recalls |
189
+ | FTS Keyword | 24 | 88.9% of recalls |
190
+ | Chunk Vector | 9 | 33.3% of recalls |
191
+
192
+ The 3 misses were all on highly rephrased semantic queries with no shared keywords. Standard and lowercase phrasings passed in every case.
193
+
194
+ Full report: [reports/benchmark_mcp.md](reports/benchmark_mcp.md)
195
+
196
+ ---
197
+
198
+ ### MCP Project Isolation Audit
199
+
200
+ **Scale:** 10 simultaneous projects | **Checks:** Store + own-recall + cross-leak per project
201
+
202
+ | Metric | Result | Status |
203
+ |:---|:---|:---|
204
+ | **Isolation Integrity** | **100%** | ELITE — zero cross-project leakage |
205
+ | **Concurrent Access** | **Pass** | All projects readable under simultaneous load |
206
+ | **Leak Detection** | **Negative** | No data from any project visible in another |
207
+
208
+ Each project's vector space and knowledge graph are strictly siloed via `sessionId` constraints. Between runs, cleanup logic purges both project IDs and names to prevent identity drift.
209
+
210
+ Full report: [reports/mcp_stress_test.md](reports/mcp_stress_test.md)
211
+
212
+ ---
213
+
214
+ ### Knowledge Graph Stress Audit
215
+
216
+ **Scale:** 1,200+ nodes, 1,087 triples in a single session
217
+
218
+ | Metric | Result | Status |
219
+ |:---|:---|:---|
220
+ | **Total Triples Stored** | **1,087** | PASS |
221
+ | **Ingestion Throughput** | **4,056 triples/sec** | OPTIMIZED |
222
+ | **Generation Time** | **0.3 seconds** | ELITE |
223
+ | **Dashboard Load** | **< 1.5 seconds** | Physics-simulated D3.js render |
224
+ | **Storage Cost** | **~0.2 MB** | SQLite increase for entire stress session |
225
+
226
+ Graph structure: 5 major hubs (40+ edges each), 15 intermediate clusters, 400 mesh entities, 100 isolated standalone facts.
227
+
228
+ Full report: [reports/graph_stress_test.md](reports/graph_stress_test.md)
229
+
230
+ ---
231
+
232
+ ## System Requirements
233
+
234
+ | Mode | Min RAM | Disk | Docker | What runs |
235
+ |:---|:---|:---|:---|:---|
236
+ | **SQLite (Recommended)** | 2 GB | 3 GB | Not required | All features — single `.db` file + Ollama |
237
+ | **Full Docker** | 8 GB | 15 GB | Required | Neo4j + MongoDB + ChromaDB + Ollama |
238
+ | **Lite Docker** | 4 GB | 10 GB | Required | MongoDB + ChromaDB (no knowledge graph) |
239
+
240
+ SQLite mode is the recommended default. The installer detects Docker automatically and sets SQLite mode if Docker is not available.
241
+
242
+ ### Prerequisites
243
+
244
+ | Requirement | Version | Notes |
245
+ |:---|:---|:---|
246
+ | Node.js | 20 LTS+ | [nodejs.org](https://nodejs.org) |
247
+ | Ollama | Latest | [ollama.com](https://ollama.com) — required for local embeddings and extraction |
248
+ | Docker Desktop | 24.0+ | [docker.com](https://docker.com) — only needed for Docker mode |
249
+ | Groq API Key | — | [console.groq.com](https://console.groq.com) — free, used as fallback if Ollama is slow |
250
+
251
+ ---
252
+
253
+ ## Installation
254
+
255
+ ### One-Command Setup (All Platforms)
256
+
257
+ ```bash
258
+ npx arcrift-setup
259
+ ```
260
+
261
+ This is the recommended starting point for all users. It clones the repo, checks dependencies, pulls Ollama models, installs packages, and builds everything. Run it once and then use `start.bat` or `start.sh` for daily use.
262
+
263
+ ---
264
+
265
+ ### Web Extension Setup
266
+
267
+ The extension requires the ArcRift backend to be running. It does not work standalone.
268
+
269
+ **Step 1 — Install and start the backend**
270
+
271
+ ```bash
272
+ # One-command (recommended)
273
+ npx arcrift-setup
274
+
275
+ # Or manual
276
+ git clone https://github.com/Eshaan-Nair/ARCRIFT.git
277
+ cd ARCRIFT/backend
278
+ cp .env.example .env # Edit .env — add GROQ_API_KEY if using Groq
279
+ npm install
280
+ ```
281
+
282
+ Set storage mode in `backend/.env`:
283
+ ```
284
+ ARCRIFT_STORAGE_MODE=sqlite # Recommended — no Docker needed
285
+ OLLAMA_URL=http://localhost:11434
286
+ GROQ_API_KEY=gsk_your_key_here
287
+ ```
288
+
289
+ Start the backend:
290
+ ```bash
291
+ # Windows (run from the repository root)
292
+ start.bat
293
+
294
+ # macOS / Linux
295
+ ./start.sh
296
+ ```
297
+
298
+ The backend starts on `http://localhost:3001`. The dashboard is served from the same port.
299
+
300
+ **Step 2 — Build the extension**
301
+
302
+ ```bash
303
+ cd extension    # from the repository root
304
+ npm install
305
+ npm run build
306
+ ```
307
+
308
+ This produces the `extension/dist/` folder.
309
+
310
+ **Step 3 — Load into Chrome**
311
+
312
+ 1. Open `chrome://extensions`
313
+ 2. Enable **Developer mode** (top-right toggle)
314
+ 3. Click **Load unpacked**
315
+ 4. Select the `ARCRIFT/extension/dist` folder
316
+ 5. The ArcRift icon appears in your toolbar
317
+
318
+ **Step 4 — Use it**
319
+
320
+ Navigate to Claude, ChatGPT, Gemini, DeepSeek, Grok, Copilot, or Mistral. Click the ArcRift popup, enter a project name, and click **Save Chat**. Auto-connect activates immediately.
321
+
322
+ **Daily use:**
323
+ - Windows: double-click `start.bat`
324
+ - macOS/Linux: `./start.sh`
325
+
326
+ ---
327
+
328
+ ### MCP Server Setup
329
+
330
+ The MCP server runs as a separate process and communicates with AI coding tools over stdio. The backend does **not** need to be running as an HTTP server — the MCP server initializes its own storage connection.
331
+
332
+ **Step 1 — Build the backend**
333
+
334
+ ```bash
335
+ cd backend
336
+ npm install
337
+ npm run build
338
+ ```
339
+
340
+ This produces `backend/dist/mcp/server.js`.
341
+
342
+ **Step 2 — Generate your config (easiest)**
343
+
344
+ ```bash
345
+ cd backend
346
+ npm run mcp:config
347
+ ```
348
+
349
+ This prints a pre-formatted JSON block with absolute paths resolved for your machine. Copy it directly into your tool's config file.
350
+
351
+ **Step 3 — Add to your AI tool**
352
+
353
+ **Claude Desktop** — `%APPDATA%\Claude\claude_desktop_config.json` (Windows) or `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS):
354
+ ```json
355
+ {
356
+ "mcpServers": {
357
+ "arcrift": {
358
+ "command": "node",
359
+ "args": ["C:/path/to/ARCRIFT/backend/dist/mcp/server.js"]
360
+ }
361
+ }
362
+ }
363
+ ```
364
+
365
+ **Claude Code** — run in your project directory:
366
+ ```bash
367
+ claude mcp add ArcRift node /path/to/ARCRIFT/backend/dist/mcp/server.js
368
+ ```
369
+
370
+ **Cursor** — create `.cursor/mcp.json` in your project root:
371
+ ```json
372
+ {
373
+ "mcpServers": {
374
+ "arcrift": {
375
+ "command": "node",
376
+ "args": ["/path/to/ARCRIFT/backend/dist/mcp/server.js"]
377
+ }
378
+ }
379
+ }
380
+ ```
381
+
382
+ **Windsurf** — edit `~/.codeium/windsurf/mcp_config.json`:
383
+ ```json
384
+ {
385
+ "mcpServers": {
386
+ "arcrift": {
387
+ "command": "node",
388
+ "args": ["/path/to/ARCRIFT/backend/dist/mcp/server.js"]
389
+ }
390
+ }
391
+ }
392
+ ```
393
+
394
+ > Use forward slashes in all paths, even on Windows. Restart your AI tool after editing the config.
395
+
396
+ **Step 4 — Set the storage mode**
397
+
398
+ The MCP server reads `backend/.env`. Make sure it contains:
399
+ ```
400
+ ARCRIFT_STORAGE_MODE=sqlite
401
+ OLLAMA_URL=http://localhost:11434
402
+ ```
403
+
404
+ Ollama must be running for the MCP server to generate embeddings and extract knowledge graph triples.
405
+
406
+ ---
407
+
408
+ ### Running Both Together
409
+
410
+ When running the browser extension and MCP server together, they share the same `ArcRift.db` database. No extra configuration is needed.
411
+
412
+ 1. Start the HTTP backend: `start.bat` or `./start.sh`
413
+ 2. Load the extension in Chrome (it talks to `http://localhost:3001`)
414
+ 3. Your AI coding tool starts the MCP server automatically when you open a project
415
+
416
+ Memory saved via the extension is immediately available in `recall_context`, and memory stored via `store_memory` appears in the dashboard history. They are the same database.
417
+
418
+ The HTTP backend and MCP server both open SQLite in WAL mode, so reads from one process are not blocked while the other is writing. SQLite still serializes the writes themselves, one writer at a time.
419
+
420
+ ---
421
+
422
+ ## Usage Guide
423
+
424
+ ### Using the Browser Extension
425
+
426
+ **Saving a conversation:**
427
+ 1. Have a conversation on any supported platform
428
+ 2. Click the ArcRift icon in the Chrome toolbar
429
+ 3. Enter a project name (e.g. `AuthService`, `MyApp-Backend`)
430
+ 4. Click **Save Chat**
431
+
432
+ ArcRift scrubs PII, chunks the text, embeds it locally with Ollama, and sends it to the backend. The UI confirms success in under 5 seconds. Background indexing (sentence-level embeddings, knowledge graph extraction) continues asynchronously.
433
+
434
+ **Auto-connect:**
435
+
436
+ Once a session is saved and activated, ArcRift intercepts every prompt you type on that platform. Before the request is sent, it queries the backend for relevant context and prepends the top results. You do not need to do anything — just type normally.
437
+
438
+ To pause: click the ArcRift popup and hit **Pause**. The badge dims. Click again to resume.
439
+
440
+ **New chat detection:**
441
+
442
+ When you click "New Chat" on ChatGPT, Claude.ai, or Gemini, ArcRift detects the URL or DOM change and resets the active session. The next Save will start a fresh project, and context from the previous session will not bleed in.
443
+
444
+ **Classic inject:**
445
+
446
+ For a one-time context push without enabling auto-connect, click **Inject Context** in the popup. ArcRift pastes the knowledge graph summary directly into the chat input field. You review it and send manually.
447
+
448
+ ---
449
+
450
+ ### Using the MCP Tools
451
+
452
+ Once connected, your coding agent has access to seven ArcRift tools. A typical session looks like this:
453
+
454
+ **At session start — recall project memory:**
455
+ ```
456
+ Use recall_context with prompt: "implementing JWT refresh token rotation"
457
+ and project: "AuthService"
458
+ ```
459
+
460
+ **After completing work — save decisions:**
461
+ ```
462
+ Use store_memory with content: "We implemented refresh token rotation using
463
+ Redis for token invalidation. The key insight was using a sliding expiry window
464
+ of 15 minutes for access tokens and 7 days for refresh tokens." and project: "AuthService"
465
+ ```
466
+
467
+ **Finding something from a different project:**
468
+ ```
469
+ Use search_memory with query: "rate limiting strategy"
470
+ ```
471
+
472
+ **Getting an overview before starting:**
473
+ ```
474
+ Use get_project_summary for project: "AuthService"
475
+ ```
476
+
477
+ **Auto-detecting the current project:**
478
+ ```
479
+ Use identify_active_project with path: "/Users/me/code/auth-service"
480
+ ```
481
+
482
+ **Correcting outdated information:**
483
+ ```
484
+ Use prune_memory with prompt: "Redis rate limiting" and project: "AuthService"
485
+ ```
486
+
487
+ ---
488
+
489
+ ### Dashboard
490
+
491
+ Open `http://localhost:3001` while the backend is running.
492
+
493
+ | Tab | What you see |
494
+ |:---|:---|
495
+ | **Graph** | D3.js force-directed knowledge graph. Nodes are entities, edges are relations. Degree-scaled sizing — high-connectivity nodes appear larger. Hover for details, scroll to zoom, drag to reposition. |
496
+ | **History** | All extracted triples (subject / relation / object) with timestamps. Filterable by project and relation type. |
497
+ | **Chat** | The full saved conversation rendered as color-coded chat bubbles, with platform attribution. |
498
+ | **Job Queue** | Live view of background indexing jobs — pending, processing, completed, dead-lettered. |
499
+
500
+ ---
501
+
502
+ ## How It Works
503
+
504
+ ```
505
+ SAVE
506
+ Browser scrapes conversation → FNV-1a dedup check
507
+ → PII scrub (API keys, JWTs, emails, IPs → [REDACTED])
508
+ → POST to backend
509
+
510
+ STORAGE (two parallel tracks)
511
+
512
+ Vector Track Graph Track
513
+ Sliding window chunker Text sent to Ollama llama3.1:8b
514
+ 300 words, 80-word overlap (Groq as fallback)
515
+ Embeds with nomic-embed-text Extracts subject-relation-object triples
516
+ Stores in SQLite vec0 Stores in SQLite facts table
517
+ Background: sentence-level Background: stores after chunk embedding
518
+ embedding job queued
519
+
520
+ RECALL (on every prompt or tool call)
521
+ Query → HyDE (generate hypothetical answer → embed both)
522
+ → Sentence vector search (top 100, filter by session)
523
+ → Chunk vector search (top 20, filter by session)
524
+ → FTS5 keyword search (prefix match, filter by session)
525
+ → Fuse results, score, deduplicate
526
+ → Surgical trim (keep only matching sentences from each chunk)
527
+ → sanitizeChunks() (scan for injection patterns → redact)
528
+ → wrapInContextBlock() (lean text header)
529
+ → Prepend to prompt
530
+ ```
531
+
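The sliding-window step in the Vector Track can be sketched as follows. The 300-word window and 80-word overlap come from the diagram above; the function name `chunkWords` and the whitespace-based word split are assumptions for illustration.

```typescript
// Splits text into overlapping word windows.
// Defaults follow the README (300 words, 80-word overlap); the rest is a sketch.
function chunkWords(text: string, size: number = 300, overlap: number = 80): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const step = size - overlap; // each new window starts 220 words after the previous one
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}
```

With these defaults, a 500-word conversation yields two chunks: words 0 to 299 and words 220 to 499, so the last 80 words of the first chunk reappear at the start of the second.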
532
+ ---
533
+
534
+ ## Quality-of-Life Details
535
+
536
+ These are the smaller decisions that make the system faster and more reliable in practice.
537
+
538
+ **Instant save, deep index later.** When you click Save, only the chunk-level embeddings are computed synchronously (1–2 embeddings). Sentence-level embeddings (20–40 embeddings per conversation) are offloaded to a background job. The UI confirms success immediately; the deep index catches up within seconds.
539
+
540
+ **Delete-then-insert for vector updates.** SQLite virtual tables do not support `UPDATE` on vector columns. ArcRift uses a delete-then-insert pattern to avoid `UNIQUE constraint` errors when re-saving a conversation.
541
+
542
+ **Prefix keyword matching.** FTS5 queries use wildcard suffixes (`encrypt*` matches `encryption`, `encrypted`, `encryptor`). This significantly improves recall for technical terms where the exact suffix varies.
543
+
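A minimal sketch of how a user query could be turned into an FTS5 prefix query. FTS5 treats a string followed by `*` as a prefix token; joining the terms with `OR` and the helper name `toPrefixQuery` are assumptions here, since the README does not show how ArcRift combines terms.

```typescript
// Builds an FTS5 MATCH expression where every term is a prefix token.
// Terms are double-quoted so words such as AND/OR/NOT are not read as operators.
function toPrefixQuery(userQuery: string): string {
  const tokens: string[] = userQuery.match(/[A-Za-z0-9]+/g) ?? [];
  return tokens.map((token) => `"${token}"*`).join(" OR ");
}
```

For `encrypt db-keys` this returns `"encrypt"* OR "db"* OR "keys"*`. An empty result should be treated as "no keyword search", because FTS5 rejects an empty MATCH expression.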
544
+ **Threshold set at 0.30, not 0.45.** Surgical trimming allows a lower similarity threshold. Even if a chunk is only loosely related, if the matching sentences are precise, the noise penalty is near zero.
545
+
546
+ **History-aware fallback.** If a query is detected as a history-seeking question ("what did we talk about", "what was decided"), the trimmer falls back to the first three sentences of the chunk rather than returning nothing.
547
+
548
+ **5-character minimum sentence filter.** The sentence splitter ignores fragments shorter than 5 characters. This prevents code snippets and punctuation artifacts from polluting the sentence index.
549
+
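The three trimming rules above (0.30 threshold, history-aware fallback, 5-character minimum) can be combined in one small sketch. The scorer is passed in as a function because the real system scores with embeddings; `wordOverlap` below is a toy stand-in, and all names and the naive sentence splitter are assumptions, not ArcRift's implementation.

```typescript
type Scorer = (sentence: string, query: string) => number;

const MIN_SENTENCE_CHARS = 5;     // from the README
const SIMILARITY_THRESHOLD = 0.3; // from the README
// Example phrasings taken from the README; the real detector is not shown there.
const HISTORY_PATTERNS = [/what did we talk about/i, /what was decided/i];

// Naive splitter on ., ! and ? that drops fragments shorter than the minimum.
function splitSentences(chunk: string): string[] {
  const parts: string[] = chunk.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((part) => part.trim()).filter((part) => part.length >= MIN_SENTENCE_CHARS);
}

// Keeps only sentences that clear the threshold; history questions fall back
// to the first three sentences instead of returning nothing.
function trimChunk(chunk: string, query: string, score: Scorer): string[] {
  const sentences = splitSentences(chunk);
  const matches = sentences.filter((s) => score(s, query) >= SIMILARITY_THRESHOLD);
  if (matches.length > 0) return matches;
  const isHistoryQuery = HISTORY_PATTERNS.some((pattern) => pattern.test(query));
  return isHistoryQuery ? sentences.slice(0, 3) : [];
}

// Toy scorer: fraction of query words that appear in the sentence.
// A stand-in for cosine similarity between embeddings.
function wordOverlap(sentence: string, query: string): number {
  const words = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const sentenceWords = words(sentence);
  const queryWords = words(query);
  if (queryWords.length === 0) return 0;
  const hits = queryWords.filter((word) => sentenceWords.indexOf(word) !== -1).length;
  return hits / queryWords.length;
}
```

Passing the scorer as a parameter keeps the trimming logic independent of the embedding model, so the same function works with a cosine-similarity scorer in place of `wordOverlap`.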
550
+ **WAL mode on all writes.** SQLite is opened in WAL mode on startup. The MCP server, HTTP backend, and dashboard can read while a write is in progress; SQLite still applies writes one at a time, so a writer may briefly wait for another to finish.
551
+
552
+ **Ghost job recovery.** On startup, any jobs stuck in `PROCESSING` from a previous crash are reset to `PENDING` automatically. No manual intervention needed after an unclean shutdown.
553
+
554
+ **CORS locked to localhost.** The backend only accepts requests from `localhost` origins. External requests are rejected before they reach any route handler.
555
+
556
+ ---
557
+
558
+ ## Architecture
559
+
560
+ ```
561
+ ARCRIFT/
562
+ ├── backend/
563
+ │ ├── src/
564
+ │ │ ├── mcp/ MCP server and seven tool implementations
565
+ │ │ ├── routes/ REST API (chat, rag, session, jobs)
566
+ │ │ ├── services/ Storage bridge, SQLite engine, vector store,
567
+ │ │ │ graph store, embeddings, job queue, extractor
568
+ │ │ ├── middleware/ Rate limiting, sanitization, CORS
569
+ │ │ └── utils/ Logger, privacy scrubber
570
+ │ └── scripts/ Benchmarking, stress testing, maintenance tools
571
+ ├── dashboard/ React 19 + D3.js + Vite — built to dashboard/dist/
572
+ ├── extension/
573
+ │ ├── src/
574
+ │ │ ├── platform/ Multi-strategy DOM resolver
575
+ │ │ ├── platforms/ claude, chatgpt, gemini, deepseek, grok, copilot, mistral
576
+ │ │ ├── content.ts DOM scraping, prompt interception, auto-connect
577
+ │ │ └── background.ts Service worker, backend proxy
578
+ │ └── popup/ Popup UI and controls
579
+ ├── reports/ Benchmark and audit outputs
580
+ ├── .env.example Configuration template
581
+ ├── docker-compose.yml Full Docker profile
582
+ ├── install.bat / .sh First-time setup
583
+ └── start.bat / .sh Daily launcher
584
+ ```
585
+
586
+ ### Ports
587
+
588
+ | Service | Port | Notes |
589
+ |:---|:---|:---|
590
+ | Backend API + Dashboard | 3001 | Single process — API and static files |
591
+ | MCP Server | stdio | Spawned by your AI tool on demand |
592
+ | Ollama | 11434 | Local LLM and embeddings |
593
+ | Neo4j | 7474 / 7687 | Docker full mode only |
594
+ | MongoDB | 27017 | Docker mode only |
595
+ | ChromaDB | 8000 | Docker mode only |
596
+
597
+ ### Tech Stack
598
+
599
+ | Layer | Technology |
600
+ |:---|:---|
601
+ | Extension | TypeScript, Chrome MV3, esbuild |
602
+ | Backend | Node.js, Express 5, TypeScript, Pino |
603
+ | Vector Store | SQLite-vec (vec0 virtual tables, 768-dim float32) |
604
+ | Full-Text Search | SQLite FTS5 with Porter stemmer |
605
+ | Knowledge Graph | SQLite facts table (or Neo4j in Docker mode) |
606
+ | Embeddings | Ollama `nomic-embed-text` (768-dim, CPU-optimized) |
607
+ | LLM | Ollama `llama3.1:8b` primary — Groq fallback |
608
+ | MCP | `@modelcontextprotocol/sdk` v1.29+ (stdio transport) |
609
+ | Dashboard | React 19, Vite 7, D3.js v7 |
610
+ | Static Serving | sirv (served from same process as the API) |
611
+ | Security | Helmet, express-rate-limit |
612
+
+ ---
+
+ ## Privacy and Security
+
+ ArcRift was designed with a local-first philosophy from the ground up. Your conversations never leave your machine unless you explicitly configure a cloud LLM.
+
+ | Control | Detail |
+ |:---|:---|
+ | **Local Storage** | All data lives in `ArcRift.db` on your machine or in local Docker volumes. Nothing syncs to any external service. |
+ | **Local Embeddings** | `nomic-embed-text` runs entirely via Ollama, with zero API calls for embeddings. |
+ | **Local Extraction** | `llama3.1:8b` runs via Ollama for knowledge graph extraction. Groq is only used as a fallback and only if you provide a key. |
+ | **PII Scrubbing** | API keys, JWTs, connection strings, email addresses, and internal IPs are redacted to `[REDACTED]` in the browser before any data is sent to the backend. |
+ | **Injection Defence** | Retrieved chunks are scanned for 10 known prompt injection patterns before being injected into any prompt. Matching content is replaced with `[Content redacted]`. |
+ | **CORS Locked** | The backend rejects requests from any origin other than `localhost`. |
+ | **Security Headers** | Helmet adds `CSP`, `X-Frame-Options`, `X-Content-Type-Options`, and other headers to every response. |
+ | **No Shared Secret** | The pre-v1.4.7 shared secret requirement has been removed. The extension communicates directly with the local backend. |
+
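The PII Scrubbing row describes redaction that happens before data leaves the browser. A minimal sketch of that idea follows, assuming a small set of regular expressions. The patterns, the `sk-` key shape, and the `scrub` function are illustrative assumptions, not ArcRift's actual rule set, and a real scrubber would need broader coverage.

```javascript
// Illustrative patterns only; ArcRift's real rules are not shown in this README.
// Order matters: connection strings are redacted before the email pattern runs,
// so "user:pass@host" is removed as a whole.
const PATTERNS = [
  /\bsk-[A-Za-z0-9]{20,}\b/g,                                   // assumed API key shape
  /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g,         // JWT-shaped tokens
  /\b[a-z][a-z0-9+.-]*:\/\/[^\s:@/]+:[^\s@/]+@[^\s]+/gi,        // URLs with embedded credentials
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,            // email addresses
  /\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.\d{1,3}\.\d{1,3}\b/g, // private IPv4
];

// Replace every match of every pattern with the placeholder from the table above.
function scrub(text) {
  return PATTERNS.reduce((out, re) => out.replace(re, '[REDACTED]'), text);
}
```

Public addresses such as `8.8.8.8` pass through unchanged in this sketch, since only the private IPv4 ranges are matched.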
+ See [SECURITY.md](SECURITY.md) for the full threat model and vulnerability reporting policy.
+
+ ---
+
+ ## What's New in v1.5.3
+
+ - **Global Search Bar**: A debounced search bar in the dashboard header queries all projects at once and returns both vector chunks and graph facts.
+ - **Knowledge Graph Pruning**: Click a node in the graph to prune it instantly without a page reload.
+ - **System Health Panel**: Live SQLite metrics, session count, job queue status, and Ollama connectivity are pinned to the dashboard sidebar.
+ - **Selector Warning Badge**: The extension popup now shows an amber warning banner if it fails to locate the chat input element due to a stale CSS selector.
+ - **SQLite-Native CI Tests**: Integration tests now go through the Unified Storage Interface (`initStorage()`), which falls back to SQLite automatically, so CI no longer needs Docker service containers.
+
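Debouncing, as mentioned for the Global Search Bar, delays a call until input has been quiet for a set interval, so that each keystroke does not trigger its own query. A minimal sketch, assuming a hypothetical `debounce` helper; neither the helper nor any particular delay is taken from the ArcRift dashboard code.

```javascript
// Hypothetical helper: returns a wrapper that runs `fn` only after
// `delayMs` milliseconds have passed without another call.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer); // cancel the pending call, if any
    timer = setTimeout(() => fn(...args), delayMs);
  };
}
```

Wrapping a search handler this way means that typing `a`, `ar`, `arc` in quick succession results in a single call with `arc`.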
+ See [CHANGELOG.md](CHANGELOG.md) for the full history.
+
+ ---
+
+ ## Documentation
+
+ | File | Description |
+ |:---|:---|
+ | [ARCHITECTURE.md](ARCHITECTURE.md) | Data flow, storage schema, environment variables |
+ | [RAG_PIPELINE.md](RAG_PIPELINE.md) | Retrieval pipeline, scoring, threshold tuning |
+ | [MCP_SETUP.md](MCP_SETUP.md) | MCP setup guide for all supported tools |
+ | [PLATFORM_SELECTORS.md](PLATFORM_SELECTORS.md) | DOM resolver system, adding new platforms |
+ | [SECURITY.md](SECURITY.md) | Threat model, vulnerability reporting |
+ | [SELF_HOSTING.md](SELF_HOSTING.md) | Ports, passwords, backups, reverse proxy |
+ | [CONTRIBUTING.md](CONTRIBUTING.md) | Fork workflow, commit format, adding platforms |
+ | [CHANGELOG.md](CHANGELOG.md) | Full version history |
+ | [TROUBLESHOOTING.md](TROUBLESHOOTING.md) | Common issues and fixes |
+
+ ---
+
+ ## Contributing
+
+ Bug fixes, new platform support, UI improvements, and test coverage are all welcome.
+
+ [Contributing Guide](CONTRIBUTING.md) · [Code of Conduct](CODE_OF_CONDUCT.md)
+
+ Good first issues: [`good first issue`](https://github.com/Eshaan-Nair/ARCRIFT/issues?q=is%3Aissue+label%3A%22good+first+issue%22)
+
+ ---
+
+ ## License
+
+ MIT. See [LICENSE](LICENSE).
+
+ ---
+
+ <div align="center">
+ <br/>
+
+ **Stop re-explaining yourself. Give your AI the memory it should have had from day one.**
+
+ *Built by [Eshaan Nair](https://github.com/Eshaan-Nair)*
+
+ </div>
package/bin/setup.js ADDED
@@ -0,0 +1,74 @@
+ #!/usr/bin/env node
+
+ const { spawn } = require('child_process');
+ const path = require('path');
+ const fs = require('fs');
+ const readline = require('readline');
+
+ const rl = readline.createInterface({
+   input: process.stdin,
+   output: process.stdout
+ });
+
+ const REPO_URL = 'https://github.com/Eshaan-Nair/ARCRIFT.git';
+
+ console.log(`
+ ===================================
+ ArcRift v1.5.1 - Initializer
+ ===================================
+ `);
+
+ function ask(question, defaultVal) {
+   return new Promise((resolve) => {
+     rl.question(`${question} (${defaultVal}): `, (answer) => {
+       resolve(answer || defaultVal);
+     });
+   });
+ }
+
+ async function run() {
+   const parentDir = await ask('Where should we install ArcRift? (press Enter for current folder)', '.');
+   const targetDirName = 'ARCRIFT';
+   const fullPath = path.resolve(process.cwd(), parentDir, targetDirName);
+
+   if (fs.existsSync(fullPath)) {
+     console.log(`\n [!] Folder "${fullPath}" already exists. Please delete it or choose a different location.`);
+     process.exit(1);
+   }
+
+   console.log(`\n [*] Cloning ArcRift into ${fullPath}...`);
+
+   const clone = spawn('git', ['clone', REPO_URL, fullPath], { stdio: 'inherit' });
+
+   clone.on('close', (code) => {
+     if (code !== 0) {
+       console.error('\n [!] Failed to clone repository. Is Git installed?');
+       process.exit(1);
+     }
+
+     console.log('\n [*] Repository cloned successfully.');
+     console.log(' [*] Starting interactive installer...\n');
+
+     const isWindows = process.platform === 'win32';
+     const installerCmd = isWindows ? 'install.bat' : './install.sh';
+     const shell = isWindows ? true : false;
+
+     // Change working directory to the cloned repo
+     const installer = spawn(installerCmd, [], {
+       cwd: fullPath,
+       stdio: 'inherit',
+       shell: shell
+     });
+
+     installer.on('close', (code) => {
+       if (code === 0) {
+         console.log('\n [OK] Setup complete!');
+       } else {
+         console.log('\n [!] Setup exited with code ' + code);
+       }
+       process.exit(code);
+     });
+   });
+ }
+
+ run();
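One caveat about the script above: when the `git` executable is missing, Node's `child_process.spawn` reports the failure through an `'error'` event rather than through an exit code. With no `'error'` listener attached, the script may terminate with an unhandled error before the "Is Git installed?" hint is printed. A minimal sketch of a wrapper that covers both paths follows; `runCommand` is a hypothetical helper and not part of the published package.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: resolves with the child's exit code, or rejects if the
// process could not be started at all (for example, the executable is missing).
function runCommand(command, args, options = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'inherit', ...options });
    child.on('error', reject);  // spawn failure such as ENOENT
    child.on('close', resolve); // resolves with the exit code (null if killed by a signal)
  });
}
```

Under this shape, the clone step could be written as `runCommand('git', ['clone', REPO_URL, fullPath])`, with the rejection branch printing the Git hint and the resolved exit code checked as before.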
package/package.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "name": "arcrift-setup",
+   "version": "1.5.3",
+   "description": "Smart, local-first RAG and Knowledge Graph for AI conversations.",
+   "main": "index.js",
+   "bin": {
+     "arcrift-setup": "bin/setup.js"
+   },
+   "files": [
+     "bin/",
+     "package.json",
+     "README.md"
+   ],
+   "scripts": {
+     "setup": "node bin/setup.js"
+   },
+   "repository": {
+     "type": "git",
+     "url": "https://github.com/Eshaan-Nair/ArcRift"
+   },
+   "keywords": [
+     "arcrift",
+     "ai",
+     "memory",
+     "rag",
+     "knowledge-graph",
+     "ollama",
+     "groq"
+   ],
+   "author": "Eshaan Nair",
+   "license": "MIT",
+   "dependencies": {
+     "playwright": "^1.60.0"
+   }
+ }