persyst-mcp 2.2.6 → 2.2.7
- package/README.md +94 -128
- package/bin/export.js +4 -4
- package/bin/extract.js +8 -8
- package/bin/import.js +15 -15
- package/bin/init.js +44 -33
- package/bin/mcp.js +1 -5
- package/bin/monitor.js +511 -0
- package/bin/setup.js +9 -9
- package/package.json +12 -8
- package/src/cache.js +3 -1
- package/src/database.js +4 -2
- package/src/embeddings.js +4 -2
- package/src/events.js +2 -0
- package/src/search.js +2 -2
- package/src/server.js +179 -151
- package/src/text-utils.js +11 -0
- package/src/tools.js +49 -8
- package/src/watcher.js +12 -13
package/README.md CHANGED
@@ -2,7 +2,9 @@

 **Local-first, compliance-grade MCP memory layer for regulated enterprise coding teams using AI assistants.**

-Persyst gives AI coding agents (Claude Code, Cursor, VS Code, Aider,
+Persyst gives AI coding agents (Claude Code, Cursor, VS Code, Aider, Windsurf, Antigravity) persistent memory across sessions. It stores memories in a local SQLite database with hybrid keyword + semantic search — operating 100% offline with zero cloud egress.
+
+---

 ## Compliance-Grade Security Features

@@ -11,216 +13,180 @@ Persyst is built from the ground up for highly regulated enterprise environments
 * **100% Data Residency (Zero-Egress)**: All vector calculations, full-text searches, and model inferences run locally on the developer's workstation. No database records or context data ever leave the local machine. Bypasses Business Associate Agreement (BAA) complexity for HIPAA.
 * **Cryptographic Chain of Custody**: Every context retrieval generates an Ed25519 cryptographic signature sealing the query and retrieved memory hashes. Each attestation is chained to the previous one via SHA-256 hash chains, creating a tamper-evident audit ledger verifiable by security teams.
 * **Automatic Secret Redaction**: Scans incoming log files and text writes to redact high-entropy secrets (API keys, JWTs, database strings, private keys) before they reach the persistent database.
-* **
+* **Event-Driven File Watching**: Integrates `chokidar` for instant scanning of agent transcript folders, guaranteeing that your memories are synchronized immediately after each agent interaction.
+* **Workspace Project Isolation**: Supports `PERSYST_PROJECT` environment partitioning, preventing cross-project context leaks while allowing shared enterprise compliance rules.

 *Read more in our compliance mapping guides:*
-- [SOC 2 Type II Controls](
-- [HIPAA Mapping & PHI Boundaries](
-- [EU AI Act Article 13 Transparency](
-- [Compliance Audit Trail Sample](
+- [SOC 2 Type II Controls](compliance/SOC2-controls.md)
+- [HIPAA Mapping & PHI Boundaries](compliance/HIPAA-mapping.md)
+- [EU AI Act Article 13 Transparency](compliance/EU-AI-Act-Article13.md)
+- [Compliance Audit Trail Sample](compliance/audit-trail-sample.md)

-
+---

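The Cryptographic Chain of Custody bullet above describes Ed25519 signatures chained with SHA-256 hashes. The following is a minimal illustration of that pattern using Node's built-in `crypto` module. The record fields (`prevHash`, `payloadHash`, `hash`, `signature`), the all-zero genesis marker, and the helper names `attest` and `verifyChain` are assumptions made for illustration; the diff does not show Persyst's actual attestation schema.

```javascript
const { createHash, generateKeyPairSync, sign, verify } = require('node:crypto');

// Hex-encoded SHA-256 of a string.
const sha256 = (text) => createHash('sha256').update(text).digest('hex');

// Hypothetical attestation: hash the query plus the retrieved memory hashes,
// bind that to the previous attestation's hash, then sign the result.
function attest(prev, query, memoryHashes, privateKey) {
  const prevHash = prev ? prev.hash : '0'.repeat(64); // assumed genesis marker
  const payloadHash = sha256(JSON.stringify({ query, memoryHashes }));
  const hash = sha256(prevHash + payloadHash);
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  const signature = sign(null, Buffer.from(hash), privateKey).toString('base64');
  return { prevHash, payloadHash, hash, signature };
}

// Walk the ledger in order and confirm every link and every signature.
function verifyChain(ledger, publicKey) {
  let prevHash = '0'.repeat(64);
  for (const entry of ledger) {
    if (entry.prevHash !== prevHash) return false;
    if (sha256(entry.prevHash + entry.payloadHash) !== entry.hash) return false;
    const sig = Buffer.from(entry.signature, 'base64');
    if (!verify(null, Buffer.from(entry.hash), publicKey, sig)) return false;
    prevHash = entry.hash;
  }
  return true;
}

const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const first = attest(null, 'dark mode', [sha256('memory A')], privateKey);
const second = attest(first, 'night theme', [sha256('memory B')], privateKey);
const ledger = [first, second];
```

In this sketch, altering any stored field of an entry makes its hash check, its signature check, or the next entry's `prevHash` check fail, which is the property that makes such a ledger tamper-evident.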
-
-Your AI Agent ←→ MCP (stdio) ←→ Persyst ←→ SQLite (local)
-```
+## Quick Start & Automatic IDE Setup

-
-2. **Agent searches memories** → Persyst finds matches by both keywords AND meaning
-3. **"dark mode" ↔ "night theme"** → Semantic search understands synonyms
+You don't need to configure MCP files manually. Persyst includes an automated setup CLI that detects installed editors and configures rule wrappers and global settings in seconds.

-
+### Automatic One-Command Setup

-
+Run the setup wizard in your target project directory:

-
-
-
->
-> The IDE itself does not automatically inject retrieved memories into prompt inputs unless configured to do so via workspace rules (e.g. `.cursorrules`, `.windsurfrules`, `.agents/AGENTS.md`) or custom system prompt builders. To ensure the agent utilizes its memory, make sure your agent instructions direct it to query the database.
-
----
-
-## Quick Start
+```bash
+npx persyst-mcp init
+```

-
+This command automatically:
+1. Generates local cryptographic Ed25519 keypairs in `~/.persyst`.
+2. Creates workspace rule files (`.cursorrules`, `.windsurfrules`, `.clinerules`, `.persystrules.md`) to instruct agents on memory retrieval.
+3. Automatically writes global MCP server configurations for **Cursor**, **Claude Code**, **Aider**, and **Continue.dev** with project-scoped environment parameters (`PERSYST_PROJECT`).

-
+---

-
-Add this to your global configuration file located at `~/.claude.json`:
-```json
-{
-  "mcpServers": {
-    "persyst": {
-      "command": "npx",
-      "args": ["-y", "persyst-mcp"]
-    }
-  }
-}
-```
+## Manual MCP Configuration

-
-Add this to your Claude Desktop configuration file:
-* **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
-* **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
-* **Linux**: `~/.config/Claude/claude_desktop_config.json`
+If you prefer to configure your agent manually, add the MCP server definition to your editor:

+### Claude Code (`~/.claude.json`) & Claude Desktop
 ```json
 {
   "mcpServers": {
     "persyst": {
       "command": "npx",
-      "args": ["-y", "persyst-mcp"]
+      "args": ["-y", "persyst-mcp"],
+      "env": {
+        "PERSYST_PROJECT": "my-project"
+      }
     }
   }
 }
 ```

----
-
-## Setup for Other Agents
-
 ### VS Code (Cline / Roo Code)
-Add
+Add to your user settings under `cline_mcp_settings.json`:
 ```json
 {
   "mcpServers": {
     "persyst": {
       "command": "npx",
-      "args": ["-y", "persyst-mcp"]
+      "args": ["-y", "persyst-mcp"],
+      "env": {
+        "PERSYST_PROJECT": "my-project"
+      }
     }
   }
 }
 ```

 ### Cursor
-
+Under **Settings → Features → MCP**:
 1. Click **+ Add New MCP Server**
 2. Name: `persyst`
 3. Type: `stdio`
 4. Command: `npx -y persyst-mcp`

 ### Aider
-
-```bash
-aider --mcp-server persyst:npx -y persyst-mcp
-```
-Or append this to your `.aider.conf.yml` project file:
+Append to your `.aider.conf.yml` project file:
 ```yaml
 mcp-server:
   - name: persyst
     command: npx -y persyst-mcp
-
-
-### Antigravity
-Add Persyst to your Antigravity agent configuration file at `~/.gemini/antigravity/mcp_config.json`:
-```json
-{
-  "mcpServers": {
-    "persyst": {
-      "command": "npx",
-      "args": ["-y", "persyst-mcp"]
-    }
-  }
-}
+    env:
+      PERSYST_PROJECT: my-project
 ```

 ---

-##
-
-
-
-
-
-
-| `update_memory` | Update memory content | `id` (number), `content` (string) |
-| `delete_memory` | Delete a memory and clean up edges | `id` (number) |
-| `get_recent_memories` | Get latest memories | `limit` (number) |
-| `get_important_memories` | Get by importance score | `limit` (number) |
-| `get_optimized_context` | Get compressed, ranked context block | `query` (string), `max_tokens` (number) |
-| `ingest_git_commits` | Import recent git commits as memories | `repo_path` (string), `count` (number) |
-| `consolidate_memories` | Merge highly similar duplicate memories | — |
-| `get_memory_history` | Retrieve all versions of a memory | `query` (string) |
-| `get_agent_stats` | Agent reputation stats | — |
-| `export_audit_log` | Export attestation audit log | `start_date`, `end_date` (ISO8601) |
-| `verify_attestation` | Verify Ed25519 signature chain | `attestation_id` (string) |
+## Passive Recording vs. Active Retrieval
+
+> **Note on Agent Integration**: Persyst operates in two complementary modes:
+> 1. **Passive Recording**: The file watcher automatically extracts and saves memories from your agent conversation transcripts in the background.
+> 2. **Active Retrieval**: The AI agent calls `search_memories` or `get_optimized_context` to fetch relevant context.
+>
+> The IDE itself does not automatically inject retrieved memories into prompt inputs unless configured to do so via workspace rules (e.g. `.cursorrules`, `.windsurfrules`, `.clinerules`) or custom system prompt builders.

 ---

-##
+## Available Tools (19 MCP Endpoints)
+
+| Tool | Description | Key Parameters |
+|------|-------------|----------------|
+| `add_memory` | Store a new memory with secret redaction & contradiction check | `content`, `importance` (0-1), `agent_id`, `shared` |
+| `search_memories` | Hybrid keyword + semantic search with attestation | `query`, `limit`, `agent_id` |
+| `get_memory` | Retrieve a specific memory by ID (boosts importance) | `id`, `agent_id` |
+| `update_memory` | Update content & archive previous version | `id`, `content`, `agent_id` |
+| `delete_memory` | Permanently delete a memory & clean knowledge graph edges | `id` |
+| `get_recent_memories` | Fetch latest memories ordered by creation date | `limit`, `agent_id` |
+| `get_important_memories` | Fetch memories ranked by importance score | `limit`, `agent_id` |
+| `get_optimized_context` | Graph-hopped context prompt compiled within token budget | `query`, `max_tokens`, `agent_id`, `intent` |
+| `ingest_git_commits` | Parse & import recent git commits as structured memories | `repo_path`, `count` |
+| `watch_git_repo` | Poll repository for changes and auto-ingest new commits | `repo_path` |
+| `consolidate_memories` | Semantic deduplication sweep merging similar memories | — |
+| `get_memory_history` | Retrieve complete version history and semantic diffs | `query` |
+| `get_agent_stats` | View agent reputation scores & contradiction metrics | — |
+| `export_audit_log` | Export cryptographic attestation audit log (JSON/Markdown) | `start_date`, `end_date` |
+| `verify_attestation` | Verify Ed25519 signature & SHA-256 chain integrity | `attestation_id` |
+| `add_entity` | Add named entity to knowledge graph | `name`, `type` |
+| `link_entity_memory` | Create edge between knowledge graph entity and memory | `entity_id`, `memory_id`, `relation` |
+| `search_by_entity` | Query linked memories via knowledge graph traversal | `entity_name` |

-
+---

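For orientation on how an agent reaches the tools in the table above: MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request for `search_memories` using the parameter names from the table. The helper name `buildToolCall` and all argument values are placeholders, and the shape of Persyst's response is not shown in this diff.

```javascript
// Build a JSON-RPC 2.0 request for an MCP tool invocation.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, 'search_memories', {
  query: 'night theme',
  limit: 5,
  agent_id: 'cursor', // placeholder value
});

// On the stdio transport each message is a single line of JSON.
const wireMessage = JSON.stringify(request) + '\n';
```

In practice an MCP client library handles this framing; the sketch only shows what travels over the pipe.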
-
-2. **Semantic Search (sqlite-vec)** — Meaning-based using local embeddings
+## Local HTTP Gateway & Swarm Integration

-
+In addition to STDIO transport, Persyst automatically launches a high-throughput local HTTP Gateway on port `4321` (`http://127.0.0.1:4321`).

-
+- **`/health`**: Health check and database status
+- **`/stats`**: Global memory & agent reputation statistics
+- **`/system-prompt`**: Formatted prompt context injection
+- **`/compliance/export`**: Cryptographic compliance audit report export (supports `format=markdown`)
+- **`/events`**: Real-time Server-Sent Events (SSE) stream for agent swarms

-
+---

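A small sketch of how a script might address the gateway endpoints listed above. The base address and the `format=markdown` parameter come from the README text. The helper names, the use of GET requests, and the assumption that `/health` replies with JSON are illustrative guesses, since the diff does not show the gateway's request or response formats.

```javascript
const GATEWAY = 'http://127.0.0.1:4321'; // address stated in the README text

// Build a gateway URL, optionally with query parameters.
function gatewayUrl(path, params = {}) {
  const url = new URL(path, GATEWAY);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Fetch and decode an endpoint; assumes the gateway answers GET with JSON.
async function getJson(path) {
  const res = await fetch(gatewayUrl(path)); // global fetch, Node.js 18+
  if (!res.ok) throw new Error(`Gateway returned HTTP ${res.status}`);
  return res.json();
}

const exportUrl = gatewayUrl('/compliance/export', { format: 'markdown' });
```

With the server running, `getJson('/health')` would be the natural first call to confirm the gateway is reachable.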
-
-- **Database:** SQLite via better-sqlite3
-- **Vector Search:** sqlite-vec (local, no cloud)
-- **Full-Text Search:** SQLite FTS5
-- **Embeddings:** @huggingface/transformers + all-MiniLM-L6-v2 (384-dim, ~50MB)
-- **Protocol:** MCP over stdio
+## How Hybrid Search Works

-
+Persyst combines two complementary search strategies:

-
+1. **Keyword Search (SQLite FTS5)** — Fast, exact string matching using BM25 ranking.
+2. **Semantic Search (sqlite-vec)** — Deep meaning-based matching using local `all-MiniLM-L6-v2` embeddings.

-
-`better-sqlite3` compiles native C++ code on installation. Make sure you have python and C++ build tools installed on your system:
-* **Windows:** Run `npm install --global windows-build-tools` or install Visual Studio Build Tools.
-* **macOS/Linux:** Run `xcode-select --install` or install `build-essential`.
+Results are merged dynamically. Keyword matches receive a score boost so exact matches rank at the top, while semantic similarity surfaces conceptually relevant memories even when different phrasing is used.

-
-This is normal on the **very first run** because Persyst is downloading the ~50MB embedding model. Wait 30-60 seconds for it to complete. The next runs will be instant.
+---

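The merge step described in the hybrid search section can be pictured with a short sketch. It assumes both searches return `{ id, score }` pairs with scores already normalised to the 0 to 1 range, and it uses an additive `KEYWORD_BOOST` constant chosen for illustration. Persyst's real scoring formula lives in `src/search.js`, which this diff shows only as a two-line change, so none of these names or values should be read as the package's own.

```javascript
// Illustrative constant, not a value taken from Persyst.
const KEYWORD_BOOST = 0.5;

// Merge keyword and semantic hits into one ranking.
// Both inputs are arrays of { id, score } with score in [0, 1].
function mergeHybrid(keywordHits, semanticHits, limit = 10) {
  const combined = new Map();
  for (const { id, score } of semanticHits) {
    combined.set(id, score);
  }
  for (const { id, score } of keywordHits) {
    // A keyword match adds its own score plus a fixed boost.
    const base = combined.get(id) ?? 0;
    combined.set(id, base + score + KEYWORD_BOOST);
  }
  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const ranked = mergeHybrid(
  [{ id: 1, score: 0.9 }],
  [{ id: 1, score: 0.4 }, { id: 2, score: 0.8 }],
);
```

Here memory 1 ranks first because it matched on keywords as well as meaning, while memory 2 still surfaces on semantic similarity alone.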
-
-Instead of running it globally, prefer using the `npx -y persyst-mcp` command in your agent configurations. It automatically installs and updates the server non-interactively.
+## Tech Stack

-
-
+- **Runtime:** Node.js 18+
+- **Database:** SQLite via `better-sqlite3` (synchronous, WAL mode)
+- **Vector Search:** `sqlite-vec` (in-process, zero cloud egress)
+- **Full-Text Search:** SQLite FTS5
+- **Embeddings:** `@huggingface/transformers` + `all-MiniLM-L6-v2` (384-dim, local ONNX)
+- **Watcher:** `chokidar` event-driven file monitoring
+- **Protocol:** MCP over stdio + HTTP Gateway

 ---

 ## Backup & Migration

-Persyst includes built-in JSONL export/import commands for portable memory backup and cross-machine migration
+Persyst includes built-in JSONL export/import commands for portable memory backup and cross-machine migration:

 ```bash
-# Export all memories to a file
+# Export all memories to a JSONL file
 npx persyst-mcp export
-# → persyst-export-<timestamp>.jsonl

 # Export to a specific file
 npx persyst-mcp export my-backup.jsonl

-# Preview
+# Preview import (dry run)
 npx persyst-mcp import my-backup.jsonl --dry-run

-# Import memories (
+# Import memories (deduplicates automatically)
 npx persyst-mcp import my-backup.jsonl
 ```

 ---

-## Roadmap & Future Directions
-
-Persyst is built for the privacy-focused solo developer. We are actively hardening the local-first experience before introducing network dependencies.
-
-* **File-Based Sync** ✅ **Done**: `persyst-export` / `persyst-import` JSONL commands for backup and migration.
-* **IDE Integrations**: First-class extensions for Cursor, VS Code, and Aider configuration helper commands.
-* **True P2P Sync (Roadmap)**: Peer-to-peer secure sync between developer devices without relying on central cloud servers.
-
----
-
 ## License

 MIT License. See [LICENSE](LICENSE) for details.
-
package/bin/export.js CHANGED
@@ -100,16 +100,16 @@ try {
 });
 });

-console.log(
+console.log(`[OK] Exported ${count} memories to: ${outputFile}`);
 if (namespace) {
-console.log(`
+console.log(` Namespace filter: "${namespace}" + shared`);
 }
 if (includeArchived) {
-console.log('
+console.log(' Includes archived (superseded) memories.');
 }

 } catch (err) {
-console.error(
+console.error(`[ERROR] Export failed: ${err.message}`);
 process.exit(1);
 } finally {
 closeDatabase();
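The export hunk above only touches log messages, so the JSONL format itself is not visible here. As a rough picture of what one-record-per-line export and tolerant parsing look like, here is a sketch. The field names (`content`, `importance_score`, `namespace`) are the ones `bin/import.js` destructures in this diff. The helper names and sample values are hypothetical, and the real exporter may write additional fields.

```javascript
// Serialize memory records as JSONL: one JSON object per line.
function toJsonl(records) {
  return records.map((record) => JSON.stringify(record)).join('\n') + '\n';
}

// Parse JSONL, skipping blank lines and collecting the 1-based line
// numbers of invalid rows instead of aborting on the first bad one.
function parseJsonl(text) {
  const records = [];
  const invalidLines = [];
  text.split('\n').forEach((line, index) => {
    const trimmed = line.trim();
    if (!trimmed) return;
    try {
      records.push(JSON.parse(trimmed));
    } catch {
      invalidLines.push(index + 1);
    }
  });
  return { records, invalidLines };
}

// Sample values are placeholders for illustration.
const sample = [
  { content: 'Use WAL mode for SQLite', importance_score: 0.8, namespace: 'my-project' },
  { content: 'Prefer night theme in UI mocks', importance_score: 0.5, namespace: 'shared' },
];
const roundTrip = parseJsonl(toJsonl(sample) + 'not json\n');
```

Because every line is independent, a single corrupt row costs one record rather than the whole backup.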
package/bin/extract.js CHANGED
@@ -114,9 +114,9 @@ async function run() {
 }

 if (!jsonOutput) {
-console.log(`\n
+console.log(`\n[INFO] Heuristic fact(s) extracted: ${heuristicFacts.length}`);
 for (const f of heuristicFacts) {
-console.log(`
+console.log(` [OK] [${f.category}] (conf: ${f.confidence}) ${f.content}`);
 }
 }

@@ -128,7 +128,7 @@ async function run() {
 // --- Store to database (unless dry-run) ---
 if (!dryRun && allFacts.length > 0) {
 if (!jsonOutput) {
-console.log(`\n
+console.log(`\n[INFO] Storing to database...`);
 }

 const { insertMemory, insertVector, memoryExists } = await import('../src/database.js');
@@ -142,7 +142,7 @@ async function run() {
 if (memoryExists(fact.content)) {
 dupes++;
 if (!jsonOutput) {
-console.log(`
+console.log(` [SKIP] Duplicate: "${fact.content.slice(0, 50)}..."`);
 }
 continue;
 }
@@ -158,15 +158,15 @@ async function run() {

 stored++;
 if (!jsonOutput) {
-console.log(`
+console.log(` [OK] Stored memory #${id}: "${fact.content.slice(0, 60)}..."`);
 }
 }

 if (!jsonOutput) {
-console.log(`\n
+console.log(`\n[INFO] Result: ${stored} stored, ${dupes} duplicates skipped`);
 }
 } else if (dryRun && !jsonOutput) {
-console.log(`\n
+console.log(`\n[INFO] Dry run — no facts stored.`);
 }

 // --- JSON output ---
@@ -180,6 +180,6 @@ async function run() {
 }

 run().catch(err => {
-console.error(`\n
+console.error(`\n[ERROR] Extraction failed: ${err.message}`);
 process.exit(1);
 });
package/bin/import.js CHANGED
@@ -40,7 +40,7 @@ const skipEmbeddings = args.includes('--skip-embeddings');
 const DEDUP_THRESHOLD = 0.85;

 if (!inputFile) {
-console.error('
+console.error('[ERROR] Usage: persyst-import <file.jsonl> [--dry-run] [--namespace=<ns>] [--skip-embeddings]');
 process.exit(1);
 }

@@ -49,10 +49,10 @@ if (!inputFile) {
 // ============================================================

 async function main() {
-console.log(
-console.log(`
-if (forceNamespace) console.log(`
-if (skipEmbeddings) console.log('
+console.log(`[IMPORT] Persyst Import${isDryRun ? ' (DRY RUN — nothing will be written)' : ''}`);
+console.log(` Source: ${inputFile}`);
+if (forceNamespace) console.log(` Forcing namespace: "${forceNamespace}"`);
+if (skipEmbeddings) console.log(' Skipping embedding regeneration.');
 console.log('');

 const rl = createInterface({
@@ -74,7 +74,7 @@ async function main() {
 try {
 record = JSON.parse(trimmed);
 } catch (err) {
-console.error(`
+console.error(` [WARN] Line ${lineNum}: Invalid JSON — skipping`);
 errors++;
 continue;
 }
@@ -82,7 +82,7 @@ async function main() {
 const { content, importance_score = 1.0, namespace, provenance, valid_until } = record;

 if (!content || typeof content !== 'string' || content.trim().length === 0) {
-console.error(`
+console.error(` [WARN] Line ${lineNum}: Empty content — skipping`);
 errors++;
 continue;
 }
@@ -97,7 +97,7 @@ async function main() {

 // --- Dedup: exact content match ---
 if (memoryExists(content, targetNamespace)) {
-console.log(`
+console.log(` [SKIP] Line ${lineNum}: Already exists — skipping "${content.slice(0, 60)}..."`);
 skipped++;
 continue;
 }
@@ -107,7 +107,7 @@ async function main() {
 try {
 const similar = await searchHybrid(content, 1, null, null, targetNamespace);
 if (similar.length > 0 && parseFloat(similar[0].similarity) >= DEDUP_THRESHOLD) {
-console.log(`
+console.log(` [SKIP] Line ${lineNum}: Semantically similar to #${similar[0].id} (sim=${similar[0].similarity}) — skipping`);
 skipped++;
 continue;
 }
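The two hunks above show the importer's two-stage deduplication: an exact content match first, then a similarity check against `DEDUP_THRESHOLD` (0.85). The sketch below mirrors that decision order on in-memory data. It substitutes plain cosine similarity over toy embedding vectors for the package's `searchHybrid` call, whose similarity metric is not visible in this diff, and the function names and return labels are hypothetical.

```javascript
const DEDUP_THRESHOLD = 0.85; // value shown in bin/import.js

// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Decide what to do with an incoming record.
// candidate and each item of existing: { content, embedding }.
function dedupDecision(candidate, existing) {
  // Stage 1: exact content match.
  if (existing.some((m) => m.content === candidate.content)) {
    return 'skip-exact';
  }
  // Stage 2: closest stored memory by similarity (0 when the store is empty).
  const best = Math.max(
    0,
    ...existing.map((m) => cosine(m.embedding, candidate.embedding)),
  );
  return best >= DEDUP_THRESHOLD ? 'skip-similar' : 'import';
}

const stored = [{ content: 'Use dark mode', embedding: [1, 0] }];
```

Running the cheap exact check first avoids a similarity search for the common case of re-importing the same backup.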
@@ -117,7 +117,7 @@ async function main() {
 }

 if (isDryRun) {
-console.log(`
+console.log(` [OK] Would import: "${content.slice(0, 80)}${content.length > 80 ? '...' : ''}" → ns="${targetNamespace}"`);
 imported++;
 continue;
 }
@@ -132,10 +132,10 @@ async function main() {
 insertVector(id, embedding);
 }

-console.log(`
+console.log(` [OK] Imported #${id}: "${content.slice(0, 70)}${content.length > 70 ? '...' : ''}"`);
 imported++;
 } catch (err) {
-console.error(`
+console.error(` [ERROR] Line ${lineNum}: Failed to insert — ${err.message}`);
 errors++;
 }
 }
@@ -143,16 +143,16 @@ async function main() {
 console.log('');
 console.log('═'.repeat(50));
 if (isDryRun) {
-console.log(
+console.log(`[INFO] Dry run complete: ${imported} would import, ${skipped} skipped, ${errors} errors`);
 } else {
-console.log(
+console.log(`[INFO] Import complete: ${imported} imported, ${skipped} skipped, ${errors} errors`);
 }
 console.log('═'.repeat(50));
 }

 main()
 .catch(err => {
-console.error(
+console.error(`[ERROR] Import crashed: ${err.message}`);
 process.exit(1);
 })
 .finally(() => {