memshell 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,7 +2,7 @@
 
  <h1>mem.sh</h1>
 
- <p><strong>Persistent memory for AI agents.</strong><br>One line to save. One line to recall.</p>
+ <p><strong>Persistent memory for AI agents.</strong><br>One line to save. One line to recall. Auto-ingest conversations.</p>
 
  [![npm version](https://img.shields.io/npm/v/memshell.svg?style=flat-square)](https://www.npmjs.com/package/memshell)
  [![license](https://img.shields.io/npm/l/memshell.svg?style=flat-square)](https://github.com/justedv/mem.sh/blob/main/LICENSE)
@@ -10,7 +10,7 @@
 
  <br>
 
- [Quick Start](#quick-start) · [SDK](#sdk) · [API Server](#api-server) · [Architecture](#how-it-works) · [Contributing](CONTRIBUTING.md)
+ [Quick Start](#quick-start) · [Auto-Ingest](#auto-ingest) · [OpenClaw Integration](#openclaw-integration) · [SDK](#sdk) · [API Server](#api-server) · [Architecture](#how-it-works)
 
  </div>
 
@@ -29,17 +29,20 @@ Agents forget everything between sessions. **mem.sh** gives them a brain.
  | | mem.sh | LangChain Memory | Roll your own |
  |---|---|---|---|
  | **Setup** | `npx memshell set "..."` | 47 dependencies + config | Hours of boilerplate |
- | **External APIs** | None | OpenAI key required | Depends |
+ | **Auto-ingest** | Built-in | No | You build it |
+ | **External APIs** | None (optional) | OpenAI key required | Depends |
  | **Semantic search** | Built-in TF-IDF | Embedding models | You build it |
  | **Storage** | SQLite (local) | Varies | You choose |
- | **Lines of code** | ~1 | ~50+ | ~200+ |
 
  ## Features
 
- - **Fast** TF-IDF vectorization with cosine similarity, instant results
- - **Local-first** SQLite storage at `~/.mem/mem.db`, no data leaves your machine
- - **Semantic** Recall by meaning, not exact match
- - **Zero config** `npx` and go. No API keys, no setup, no dependencies
+ - **Fast** -- TF-IDF vectorization with cosine similarity, instant results
+ - **Local-first** -- SQLite storage at `~/.mem/mem.db`, no data leaves your machine
+ - **Semantic** -- Recall by meaning, not exact match
+ - **Auto-ingest** -- Feed raw conversations, auto-extract key facts via LLM
+ - **OpenClaw integration** -- Watch session transcripts and auto-learn
+ - **Zero config** -- `npx` and go. No API keys needed for core features
+ - **Smart recall** -- Shows source, creation time, and recall frequency
 
  ## Quick Start
 
@@ -51,7 +54,7 @@ npx memshell set "user prefers dark mode"
 
  # Recall semantically
  npx memshell recall "what theme does the user like?"
- # user prefers dark mode (score: 0.87)
+ # => user prefers dark mode (score: 0.87)
 
  # List all memories
  npx memshell list
@@ -63,7 +66,92 @@ npx memshell forget <id>
  npx memshell clear
  ```
 
- ### SDK
+ ## Auto-Ingest
+
+ Feed raw conversations and let the LLM extract key facts automatically.
+
+ Requires `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` (or configure via `memshell config set apiKey <key>`).
+
+ ### From a file
+
+ ```bash
+ npx memshell ingest conversation.txt
+ npx memshell ingest chat.jsonl
+ npx memshell ingest notes.md
+ ```
+
+ ### From stdin
+
+ ```bash
+ echo "User said they prefer dark mode and use vim" | npx memshell ingest --stdin
+ ```
+
+ ### Watch a directory
+
+ ```bash
+ npx memshell ingest --watch ./logs/
+ ```
+
+ Watches for new or changed `.txt`, `.md`, `.json`, and `.jsonl` files. Tracks what has been processed to avoid duplicates.
+
+ ### Via API
+
+ ```bash
+ curl -X POST http://localhost:3456/mem/ingest \
+ -H "Content-Type: application/json" \
+ -d '{"text": "User mentioned they love Rust and prefer dark themes"}'
+ # => {"extracted": 2, "stored": 2, "duplicates": 0}
+ ```
+
+ ### How it works
+
+ 1. Text is split into ~2000-token chunks
+ 2. Each chunk is sent to an LLM (gpt-4o-mini or claude-3-haiku) to extract standalone facts
+ 3. Facts are deduplicated against existing memories (Jaccard similarity > 0.85 = skip)
+ 4. New facts are stored with auto-generated tags and source tracking
+
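The pipeline steps above can be sketched in a few lines. This is an illustrative sketch only, assuming tokens are roughly whitespace-separated words; the package's actual chunker and tokenizer are not shown in this diff.

```javascript
// Step 1 (sketch): split text into ~maxTokens-word chunks.
// Words are a naive stand-in for LLM tokens.
function chunkText(text, maxTokens = 2000) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(' '));
  }
  return chunks;
}

// Jaccard similarity over lowercase token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const ta = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  let inter = 0;
  for (const t of ta) if (tb.has(t)) inter++;
  const union = ta.size + tb.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Step 3 (sketch): a new fact is skipped when it is too similar
// to any existing memory (threshold 0.85, per the list above).
function isDuplicate(fact, existing, threshold = 0.85) {
  return existing.some(m => jaccard(fact, m) > threshold);
}
```

Note that set-based Jaccard ignores word order, so "dark mode prefers user" would count as a duplicate of "user prefers dark mode"; that trade-off is inherent to the measure, not specific to this package.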
+ ## OpenClaw Integration
+
+ Automatically learn from your OpenClaw agent conversations:
+
+ ```bash
+ # Start watching OpenClaw session transcripts
+ npx memshell connect openclaw
+
+ # Or specify a custom path
+ npx memshell connect openclaw /path/to/sessions/
+ ```
+
+ This watches the OpenClaw sessions directory (`~/.openclaw/agents/main/sessions/` by default), parses JSONL transcripts, and auto-ingests new conversations.
+
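The diff does not show how transcripts are parsed. As a hedged sketch, flattening a JSONL session file into ingestable text could look like the following; the `role` and `content` field names are assumptions about the transcript format, not confirmed by the package:

```javascript
// Flatten a JSONL transcript into "role: text" lines, skipping blank or
// malformed rows. Field names (role, content) are hypothetical.
function transcriptToText(jsonl) {
  const lines = [];
  for (const raw of jsonl.split('\n')) {
    if (!raw.trim()) continue;
    try {
      const msg = JSON.parse(raw);
      if (typeof msg.content === 'string') {
        lines.push(`${msg.role || 'unknown'}: ${msg.content}`);
      }
    } catch {
      // skip lines that are not valid JSON
    }
  }
  return lines.join('\n');
}
```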
+ ### Daemon mode
+
+ Run continuous ingestion in the background:
+
+ ```bash
+ # Configure watchers first
+ npx memshell config set watch.openclaw ~/.openclaw/agents/main/sessions/
+
+ # Start the daemon
+ npx memshell daemon
+ ```
+
+ ### Configuration
+
+ ```bash
+ # Set LLM API key
+ npx memshell config set apiKey sk-...
+
+ # Set model
+ npx memshell config set model gpt-4o-mini
+
+ # View config
+ npx memshell config get
+ ```
+
+ Config is stored at `~/.mem/config.json`.
+
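Keys such as `watch.openclaw` imply dot-path nesting inside `config.json` (the CLI code later in this diff reads config with a matching dot-path getter). A minimal sketch of such a setter, under that assumption and not the package's actual code:

```javascript
// Set a dot-separated key like "watch.openclaw" on a nested config object,
// creating intermediate objects as needed.
function setPath(config, key, value) {
  const parts = key.split('.');
  let node = config;
  for (const p of parts.slice(0, -1)) {
    if (typeof node[p] !== 'object' || node[p] === null) node[p] = {};
    node = node[p];
  }
  node[parts[parts.length - 1]] = value;
  return config;
}
```

So `setPath({}, 'watch.openclaw', '/tmp/sessions')` yields `{ watch: { openclaw: '/tmp/sessions' } }`, which would serialize naturally into `~/.mem/config.json`.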
+ ## SDK
 
  ```js
  const mem = require('memshell');
@@ -72,9 +160,9 @@ const mem = require('memshell');
  await mem.set('user prefers dark mode');
  await mem.set('favorite language is rust', { agent: 'coder-bot' });
 
- // Recall (semantic search)
+ // Recall (semantic search) -- now includes source and recall count
  const results = await mem.recall('what does the user like?');
- // [{ id, text, score, created_at }]
+ // [{ id, text, score, created_at, source, recall_count }]
 
  // List all
  const all = await mem.list();
@@ -92,7 +180,7 @@ mem.sh uses **TF-IDF vectorization** with **cosine similarity** for semantic sea
 
  Memories are stored in `~/.mem/mem.db` (SQLite). Each memory is tokenized and vectorized on write. Queries are vectorized at recall time and ranked by cosine similarity against stored vectors.
 
- [Full architecture docs](docs/ARCHITECTURE.md)
+ Optional: Enable OpenAI embeddings with `--embeddings` flag for higher quality recall (requires `OPENAI_API_KEY`).
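For intuition about the recall path described above, here is a toy TF-IDF plus cosine-similarity ranker. It is an illustrative sketch only; the package's actual tokenizer, weighting scheme, and SQLite-backed storage are not shown in this diff.

```javascript
// Toy TF-IDF recall: vectorize stored texts and a query, rank by cosine.
function tokenize(s) { return s.toLowerCase().split(/\W+/).filter(Boolean); }

function rank(docs, query) {
  const N = docs.length;
  const df = {};                               // document frequency per term
  const docTokens = docs.map(tokenize);
  for (const toks of docTokens) {
    for (const t of new Set(toks)) df[t] = (df[t] || 0) + 1;
  }
  // Smoothed inverse document frequency.
  const idf = t => Math.log(1 + N / ((df[t] || 0) + 1));
  // Sparse tf-idf vector as a term -> weight map.
  const vec = toks => {
    const v = {};
    for (const t of toks) v[t] = (v[t] || 0) + idf(t);
    return v;
  };
  const cosine = (a, b) => {
    let dot = 0, na = 0, nb = 0;
    for (const t in a) { na += a[t] ** 2; if (b[t]) dot += a[t] * b[t]; }
    for (const t in b) nb += b[t] ** 2;
    return dot ? dot / Math.sqrt(na * nb) : 0;
  };
  const q = vec(tokenize(query));
  return docTokens
    .map((toks, i) => ({ text: docs[i], score: cosine(vec(toks), q) }))
    .sort((x, y) => y.score - x.score);
}
```

A query sharing terms like "dark mode" with a stored memory scores above unrelated memories even when the phrasing differs, which is the "recall by meaning, not exact match" behavior the README claims.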
 
  ## API Server
 
@@ -107,29 +195,19 @@ npx memshell serve --port 3456 --key my-secret-key
  | Method | Path | Description |
  |--------|------|-------------|
  | `POST` | `/mem` | Store a memory |
+ | `POST` | `/mem/ingest` | Auto-ingest raw text |
  | `GET` | `/mem/recall?q=` | Semantic recall |
  | `GET` | `/mem/list` | List all memories |
+ | `GET` | `/mem/stats` | Memory statistics |
+ | `GET` | `/mem/export` | Export all memories |
+ | `POST` | `/mem/import` | Import memories |
  | `DELETE` | `/mem/:id` | Delete a memory |
  | `DELETE` | `/mem` | Clear all memories |
 
  ### Headers
 
- - `X-Mem-Key` API key (required if `--key` is set)
- - `X-Mem-Agent` Agent namespace (optional, isolates memories per agent)
-
- ### Example
-
- ```bash
- # Store
- curl -X POST http://localhost:3456/mem \
- -H "Content-Type: application/json" \
- -H "X-Mem-Key: my-secret-key" \
- -d '{"text": "user prefers dark mode"}'
-
- # Recall
- curl "http://localhost:3456/mem/recall?q=theme+preference" \
- -H "X-Mem-Key: my-secret-key"
- ```
+ - `X-Mem-Key` -- API key (required if `--key` is set)
+ - `X-Mem-Agent` -- Agent namespace (optional, isolates memories per agent)
 
  ### SDK with API Mode
 
@@ -146,9 +224,27 @@ await mem.set('user prefers dark mode');
  const results = await mem.recall('theme preference');
  ```
 
- ## Contributing
+ ## All CLI Commands
 
- We welcome contributions. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+ ```
+ memshell set <text> Store a memory
+ memshell recall <query> Semantic recall
+ memshell list List all memories
+ memshell forget <id> Delete a memory by ID
+ memshell clear Wipe all memories
+ memshell important <id> Boost memory importance
+ memshell ingest <file> Extract facts from a file
+ memshell ingest --stdin Extract facts from piped text
+ memshell ingest --watch <dir> Watch directory for new files
+ memshell connect openclaw Watch OpenClaw transcripts
+ memshell daemon Run continuous ingestion
+ memshell config set <key> <val> Set config value
+ memshell config get [key] Show config
+ memshell stats Show memory statistics
+ memshell export Export all memories as JSON
+ memshell import <file.json> Import memories from JSON
+ memshell serve [--port N] Start API server
+ ```
 
  ## License
 
package/bin/mem.js CHANGED
@@ -4,6 +4,7 @@
  const fs = require('fs');
  const path = require('path');
  const mem = require('../src/index');
+ const { LocalStore } = require('../src/index');
 
  const args = process.argv.slice(2);
  const cmd = args[0];
@@ -11,13 +12,26 @@ const cmd = args[0];
  const HELP = `
  \x1b[1mmem.sh\x1b[0m — persistent memory for AI agents
 
- \x1b[36mUsage:\x1b[0m
+ \x1b[36mCore Commands:\x1b[0m
  memshell set <text> Store a memory
  memshell recall <query> Semantic recall
  memshell list List all memories
  memshell forget <id> Delete a memory by ID
  memshell clear Wipe all memories
  memshell important <id> Boost memory importance
+
+ \x1b[36mAuto-Ingest:\x1b[0m
+ memshell ingest <file> Extract facts from a file
+ memshell ingest --stdin Extract facts from piped text
+ memshell ingest --watch <dir> Watch a directory for new files
+
+ \x1b[36mIntegrations:\x1b[0m
+ memshell connect openclaw Watch OpenClaw session transcripts
+ memshell daemon Run continuous ingestion daemon
+
+ \x1b[36mManagement:\x1b[0m
+ memshell config set <key> <val> Set config value
+ memshell config get [key] Show config
  memshell stats Show memory statistics
  memshell export Export all memories as JSON
  memshell import <file.json> Import memories from JSON
@@ -34,10 +48,9 @@ const HELP = `
  \x1b[36mExamples:\x1b[0m
  memshell set "user prefers dark mode" --tags preferences,ui
  memshell recall "what theme?" --tags preferences --top 3
- memshell important 5
- memshell stats
- memshell export > backup.json
- memshell import backup.json
+ echo "User likes vim and dark mode" | memshell ingest --stdin
+ memshell connect openclaw
+ memshell config set apiKey sk-...
  `;
 
  // Parse flags
@@ -52,18 +65,14 @@ function hasFlag(name) {
  return args.includes('--' + name);
  }
 
- function getTextArgs() {
- return args.slice(1).filter(a => !a.startsWith('--') && (args.indexOf(a) === 0 || !['--agent', '--api', '--key', '--tags', '--top', '--port'].includes(args[args.indexOf(a) - 1]))).join(' ').replace(/^["']|["']$/g, '');
- }
-
  // Smarter text extraction: skip flag values
  function getText() {
- const skip = new Set(['--agent', '--api', '--key', '--tags', '--top', '--port']);
+ const skip = new Set(['--agent', '--api', '--key', '--tags', '--top', '--port', '--watch']);
  const parts = [];
- let i = 1; // skip command
+ let i = 1;
  while (i < args.length) {
  if (skip.has(args[i])) { i += 2; continue; }
- if (args[i] === '--embeddings') { i++; continue; }
+ if (args[i] === '--embeddings' || args[i] === '--stdin' || args[i] === '--force') { i++; continue; }
  if (args[i].startsWith('--')) { i++; continue; }
  parts.push(args[i]);
  i++;
@@ -71,6 +80,17 @@ function getText() {
  return parts.join(' ').replace(/^["']|["']$/g, '');
  }
 
+ function readStdin() {
+ return new Promise((resolve) => {
+ let data = '';
+ process.stdin.setEncoding('utf8');
+ process.stdin.on('data', chunk => data += chunk);
+ process.stdin.on('end', () => resolve(data));
+ // If nothing after 100ms and stdin is a TTY, resolve empty
+ if (process.stdin.isTTY) resolve('');
+ });
+ }
+
  async function main() {
  const agent = flag('agent') || 'default';
  const api = flag('api');
@@ -92,7 +112,7 @@ async function main() {
  const text = getText();
  if (!text) return console.log('Usage: memshell set <text>');
  const r = await mem.set(text, { ...opts, tags });
- console.log(`\x1b[32m✓\x1b[0m Stored (id: \x1b[1m${r.id}\x1b[0m)${tags ? ` [tags: ${tags}]` : ''}`);
+ console.log(`\x1b[32m+\x1b[0m Stored (id: \x1b[1m${r.id}\x1b[0m)${tags ? ` [tags: ${tags}]` : ''}`);
  break;
  }
  case 'recall': case 'r': case 'search': case 'q': {
@@ -102,7 +122,9 @@ async function main() {
  if (!results.length) return console.log('\x1b[33mNo memories found.\x1b[0m');
  for (const r of results) {
  const tagStr = r.tags ? ` \x1b[35m[${r.tags}]\x1b[0m` : '';
- console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text} \x1b[33m(score: ${r.score})\x1b[0m${tagStr}`);
+ const srcStr = r.source && r.source !== 'manual' ? ` \x1b[2m(src: ${r.source})\x1b[0m` : '';
+ const recallStr = r.recall_count ? ` \x1b[2m(recalled ${r.recall_count}x)\x1b[0m` : '';
+ console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text} \x1b[33m(score: ${r.score})\x1b[0m${tagStr}${srcStr}${recallStr}`);
  }
  break;
  }
@@ -111,8 +133,9 @@ async function main() {
  if (!all.length) return console.log('\x1b[33mNo memories stored.\x1b[0m');
  for (const r of all) {
  const tagStr = r.tags ? ` \x1b[35m[${r.tags}]\x1b[0m` : '';
- const imp = r.importance !== 1.0 ? ` \x1b[33m★${r.importance.toFixed(1)}\x1b[0m` : '';
- console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text}${tagStr}${imp} \x1b[2m(${r.created_at})\x1b[0m`);
+ const imp = r.importance !== 1.0 ? ` \x1b[33m*${r.importance.toFixed(1)}\x1b[0m` : '';
+ const srcStr = r.source && r.source !== 'manual' ? ` \x1b[2m[${r.source}]\x1b[0m` : '';
+ console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text}${tagStr}${imp}${srcStr} \x1b[2m(${r.created_at})\x1b[0m`);
  }
  console.log(`\n \x1b[1m${all.length}\x1b[0m memor${all.length === 1 ? 'y' : 'ies'}`);
  break;
@@ -121,12 +144,12 @@ async function main() {
  const id = args[1];
  if (!id) return console.log('Usage: memshell forget <id>');
  await mem.forget(id);
- console.log(`\x1b[32m✓\x1b[0m Forgotten (id: ${id})`);
+ console.log(`\x1b[32m+\x1b[0m Forgotten (id: ${id})`);
  break;
  }
  case 'clear': case 'wipe': case 'reset': {
  await mem.clear(opts);
- console.log('\x1b[32m✓\x1b[0m All memories cleared');
+ console.log('\x1b[32m+\x1b[0m All memories cleared');
  break;
  }
  case 'important': case 'boost': {
@@ -134,15 +157,15 @@ async function main() {
  if (!id) return console.log('Usage: memshell important <id>');
  const r = await mem.important(Number(id));
  if (!r) return console.log('\x1b[31mMemory not found.\x1b[0m');
- console.log(`\x1b[32m✓\x1b[0m Boosted memory ${r.id} importance: \x1b[1m${r.importance.toFixed(1)}\x1b[0m`);
+ console.log(`\x1b[32m+\x1b[0m Boosted memory ${r.id} -> importance: \x1b[1m${r.importance.toFixed(1)}\x1b[0m`);
  break;
  }
  case 'stats': {
  const s = await mem.stats(opts);
- console.log(`\n \x1b[1m🧠 Memory Stats\x1b[0m`);
- console.log(` Total: \x1b[36m${s.total}\x1b[0m`);
- console.log(` Oldest: ${s.oldest || 'n/a'}`);
- console.log(` Newest: ${s.newest || 'n/a'}`);
+ console.log(`\n \x1b[1mMemory Stats\x1b[0m`);
+ console.log(` Total: \x1b[36m${s.total}\x1b[0m`);
+ console.log(` Oldest: ${s.oldest || 'n/a'}`);
+ console.log(` Newest: ${s.newest || 'n/a'}`);
  console.log(` Avg importance: \x1b[33m${s.avg_importance}\x1b[0m\n`);
  break;
  }
@@ -157,7 +180,121 @@ async function main() {
  const raw = fs.readFileSync(path.resolve(file), 'utf8');
  const data = JSON.parse(raw);
  const r = await mem.importAll(Array.isArray(data) ? data : data.memories || []);
- console.log(`\x1b[32m✓\x1b[0m Imported ${r.imported} memories`);
+ console.log(`\x1b[32m+\x1b[0m Imported ${r.imported} memories`);
+ break;
+ }
+ case 'ingest': {
+ const { ingestFile, ingest: ingestText } = require('../src/ingest');
+ const store = new LocalStore(undefined, useEmbeddings ? { openaiKey: process.env.OPENAI_API_KEY } : {});
+ await store.init();
+
+ if (hasFlag('stdin')) {
+ const text = await readStdin();
+ if (!text.trim()) return console.log('No input received via stdin.');
+ console.log(' Extracting facts from stdin...');
+ const result = await ingestText(text, store, { agent });
+ console.log(`\x1b[32m+\x1b[0m Extracted: ${result.extracted}, Stored: ${result.stored}, Duplicates: ${result.duplicates}`);
+ } else if (hasFlag('watch')) {
+ const dir = flag('watch');
+ if (!dir || dir === true) return console.log('Usage: memshell ingest --watch <directory>');
+ const { watchDirectory } = require('../src/ingest');
+ console.log(' Starting directory watcher (Ctrl+C to stop)...');
+ watchDirectory(dir, store, { agent });
+ // Keep process alive
+ process.on('SIGINT', () => { console.log('\n Stopped.'); process.exit(0); });
+ } else {
+ const file = getText();
+ if (!file) return console.log('Usage: memshell ingest <file> | --stdin | --watch <dir>');
+ console.log(` Ingesting: ${file}`);
+ const result = await ingestFile(file, store, { agent, force: hasFlag('force') });
+ if (result.skipped) {
+ console.log(` Skipped: ${result.file} (already processed, use --force to re-ingest)`);
+ } else {
+ console.log(`\x1b[32m+\x1b[0m Extracted: ${result.extracted}, Stored: ${result.stored}, Duplicates: ${result.duplicates}`);
+ }
+ }
+ break;
+ }
+ case 'connect': {
+ const target = args[1];
+ if (target !== 'openclaw') return console.log('Usage: memshell connect openclaw');
+
+ const { watchOpenClaw, defaultOpenClawPath, setConfigValue } = require('../src/ingest');
+ const store = new LocalStore(undefined, useEmbeddings ? { openaiKey: process.env.OPENAI_API_KEY } : {});
+ await store.init();
+
+ const sessionsPath = args[2] || defaultOpenClawPath();
+ setConfigValue('watch.openclaw', sessionsPath);
+ console.log(` OpenClaw integration configured.`);
+ console.log(` Sessions path: ${sessionsPath}`);
+ console.log(' Watching for new transcripts (Ctrl+C to stop)...\n');
+ watchOpenClaw(sessionsPath, store, { agent });
+ process.on('SIGINT', () => { console.log('\n Stopped.'); process.exit(0); });
+ break;
+ }
+ case 'daemon': {
+ const { loadConfig, watchDirectory, watchOpenClaw } = require('../src/ingest');
+ const store = new LocalStore(undefined, useEmbeddings ? { openaiKey: process.env.OPENAI_API_KEY } : {});
+ await store.init();
+
+ const config = loadConfig();
+ const watchers = config.watch || {};
+ let activeWatchers = 0;
+
+ console.log(' \x1b[1mmem.sh daemon\x1b[0m starting...\n');
+
+ if (watchers.openclaw) {
+ watchOpenClaw(watchers.openclaw, store, { agent });
+ activeWatchers++;
+ }
+
+ // Support array of dir watchers
+ if (Array.isArray(watchers.dirs)) {
+ for (const dir of watchers.dirs) {
+ watchDirectory(typeof dir === 'string' ? dir : dir.path, store, { agent });
+ activeWatchers++;
+ }
+ } else if (watchers.dir) {
+ watchDirectory(watchers.dir, store, { agent });
+ activeWatchers++;
+ }
+
+ if (activeWatchers === 0) {
+ console.log(' No watchers configured. Use:');
+ console.log(' memshell config set watch.openclaw ~/.openclaw/agents/main/sessions/');
+ console.log(' memshell config set watch.dir /path/to/watch');
+ process.exit(1);
+ }
+
+ console.log(`\n ${activeWatchers} watcher(s) active. Ctrl+C to stop.\n`);
+ process.on('SIGINT', () => { console.log('\n Daemon stopped.'); process.exit(0); });
+ break;
+ }
+ case 'config': {
+ const { loadConfig, setConfigValue } = require('../src/ingest');
+ const subCmd = args[1];
+
+ if (subCmd === 'set') {
+ const configKey = args[2];
+ const configVal = args.slice(3).join(' ');
+ if (!configKey || !configVal) return console.log('Usage: memshell config set <key> <value>');
+ const result = setConfigValue(configKey, configVal);
+ console.log(`\x1b[32m+\x1b[0m Set ${configKey} = ${configVal}`);
+ } else if (subCmd === 'get') {
+ const config = loadConfig();
+ const configKey = args[2];
+ if (configKey) {
+ const parts = configKey.split('.');
+ let val = config;
+ for (const p of parts) val = val?.[p];
+ console.log(val !== undefined ? JSON.stringify(val, null, 2) : 'Not set');
+ } else {
+ console.log(JSON.stringify(config, null, 2));
+ }
+ } else {
+ const config = loadConfig();
+ console.log(JSON.stringify(config, null, 2));
+ }
  break;
  }
  case 'serve': case 'server': {
package/bin/memshell.js CHANGED
@@ -4,6 +4,7 @@
  const fs = require('fs');
  const path = require('path');
  const mem = require('../src/index');
+ const { LocalStore } = require('../src/index');
 
  const args = process.argv.slice(2);
  const cmd = args[0];
@@ -11,13 +12,26 @@ const cmd = args[0];
  const HELP = `
  \x1b[1mmem.sh\x1b[0m — persistent memory for AI agents
 
- \x1b[36mUsage:\x1b[0m
+ \x1b[36mCore Commands:\x1b[0m
  memshell set <text> Store a memory
  memshell recall <query> Semantic recall
  memshell list List all memories
  memshell forget <id> Delete a memory by ID
  memshell clear Wipe all memories
  memshell important <id> Boost memory importance
+
+ \x1b[36mAuto-Ingest:\x1b[0m
+ memshell ingest <file> Extract facts from a file
+ memshell ingest --stdin Extract facts from piped text
+ memshell ingest --watch <dir> Watch a directory for new files
+
+ \x1b[36mIntegrations:\x1b[0m
+ memshell connect openclaw Watch OpenClaw session transcripts
+ memshell daemon Run continuous ingestion daemon
+
+ \x1b[36mManagement:\x1b[0m
+ memshell config set <key> <val> Set config value
+ memshell config get [key] Show config
  memshell stats Show memory statistics
  memshell export Export all memories as JSON
  memshell import <file.json> Import memories from JSON
@@ -34,10 +48,9 @@ const HELP = `
  \x1b[36mExamples:\x1b[0m
  memshell set "user prefers dark mode" --tags preferences,ui
  memshell recall "what theme?" --tags preferences --top 3
- memshell important 5
- memshell stats
- memshell export > backup.json
- memshell import backup.json
+ echo "User likes vim and dark mode" | memshell ingest --stdin
+ memshell connect openclaw
+ memshell config set apiKey sk-...
  `;
 
  // Parse flags
@@ -52,18 +65,14 @@ function hasFlag(name) {
  return args.includes('--' + name);
  }
 
- function getTextArgs() {
- return args.slice(1).filter(a => !a.startsWith('--') && (args.indexOf(a) === 0 || !['--agent', '--api', '--key', '--tags', '--top', '--port'].includes(args[args.indexOf(a) - 1]))).join(' ').replace(/^["']|["']$/g, '');
- }
-
  // Smarter text extraction: skip flag values
  function getText() {
- const skip = new Set(['--agent', '--api', '--key', '--tags', '--top', '--port']);
+ const skip = new Set(['--agent', '--api', '--key', '--tags', '--top', '--port', '--watch']);
  const parts = [];
- let i = 1; // skip command
+ let i = 1;
  while (i < args.length) {
  if (skip.has(args[i])) { i += 2; continue; }
- if (args[i] === '--embeddings') { i++; continue; }
+ if (args[i] === '--embeddings' || args[i] === '--stdin' || args[i] === '--force') { i++; continue; }
  if (args[i].startsWith('--')) { i++; continue; }
  parts.push(args[i]);
  i++;
@@ -71,6 +80,17 @@ function getText() {
  return parts.join(' ').replace(/^["']|["']$/g, '');
  }
 
+ function readStdin() {
+ return new Promise((resolve) => {
+ let data = '';
+ process.stdin.setEncoding('utf8');
+ process.stdin.on('data', chunk => data += chunk);
+ process.stdin.on('end', () => resolve(data));
+ // If nothing after 100ms and stdin is a TTY, resolve empty
+ if (process.stdin.isTTY) resolve('');
+ });
+ }
+
  async function main() {
  const agent = flag('agent') || 'default';
  const api = flag('api');
@@ -92,7 +112,7 @@ async function main() {
  const text = getText();
  if (!text) return console.log('Usage: memshell set <text>');
  const r = await mem.set(text, { ...opts, tags });
- console.log(`\x1b[32m✓\x1b[0m Stored (id: \x1b[1m${r.id}\x1b[0m)${tags ? ` [tags: ${tags}]` : ''}`);
+ console.log(`\x1b[32m+\x1b[0m Stored (id: \x1b[1m${r.id}\x1b[0m)${tags ? ` [tags: ${tags}]` : ''}`);
  break;
  }
  case 'recall': case 'r': case 'search': case 'q': {
@@ -102,7 +122,9 @@ async function main() {
  if (!results.length) return console.log('\x1b[33mNo memories found.\x1b[0m');
  for (const r of results) {
  const tagStr = r.tags ? ` \x1b[35m[${r.tags}]\x1b[0m` : '';
- console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text} \x1b[33m(score: ${r.score})\x1b[0m${tagStr}`);
+ const srcStr = r.source && r.source !== 'manual' ? ` \x1b[2m(src: ${r.source})\x1b[0m` : '';
+ const recallStr = r.recall_count ? ` \x1b[2m(recalled ${r.recall_count}x)\x1b[0m` : '';
+ console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text} \x1b[33m(score: ${r.score})\x1b[0m${tagStr}${srcStr}${recallStr}`);
  }
  break;
  }
@@ -111,8 +133,9 @@ async function main() {
  if (!all.length) return console.log('\x1b[33mNo memories stored.\x1b[0m');
  for (const r of all) {
  const tagStr = r.tags ? ` \x1b[35m[${r.tags}]\x1b[0m` : '';
- const imp = r.importance !== 1.0 ? ` \x1b[33m★${r.importance.toFixed(1)}\x1b[0m` : '';
- console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text}${tagStr}${imp} \x1b[2m(${r.created_at})\x1b[0m`);
+ const imp = r.importance !== 1.0 ? ` \x1b[33m*${r.importance.toFixed(1)}\x1b[0m` : '';
+ const srcStr = r.source && r.source !== 'manual' ? ` \x1b[2m[${r.source}]\x1b[0m` : '';
+ console.log(` \x1b[36m[${r.id}]\x1b[0m ${r.text}${tagStr}${imp}${srcStr} \x1b[2m(${r.created_at})\x1b[0m`);
  }
  console.log(`\n \x1b[1m${all.length}\x1b[0m memor${all.length === 1 ? 'y' : 'ies'}`);
  break;
@@ -121,12 +144,12 @@ async function main() {
  const id = args[1];
  if (!id) return console.log('Usage: memshell forget <id>');
  await mem.forget(id);
- console.log(`\x1b[32m✓\x1b[0m Forgotten (id: ${id})`);
+ console.log(`\x1b[32m+\x1b[0m Forgotten (id: ${id})`);
  break;
  }
  case 'clear': case 'wipe': case 'reset': {
  await mem.clear(opts);
- console.log('\x1b[32m✓\x1b[0m All memories cleared');
+ console.log('\x1b[32m+\x1b[0m All memories cleared');
  break;
  }
  case 'important': case 'boost': {
@@ -134,15 +157,15 @@ async function main() {
  if (!id) return console.log('Usage: memshell important <id>');
  const r = await mem.important(Number(id));
  if (!r) return console.log('\x1b[31mMemory not found.\x1b[0m');
- console.log(`\x1b[32m✓\x1b[0m Boosted memory ${r.id} importance: \x1b[1m${r.importance.toFixed(1)}\x1b[0m`);
+ console.log(`\x1b[32m+\x1b[0m Boosted memory ${r.id} -> importance: \x1b[1m${r.importance.toFixed(1)}\x1b[0m`);
  break;
  }
  case 'stats': {
  const s = await mem.stats(opts);
- console.log(`\n \x1b[1m🧠 Memory Stats\x1b[0m`);
- console.log(` Total: \x1b[36m${s.total}\x1b[0m`);
- console.log(` Oldest: ${s.oldest || 'n/a'}`);
- console.log(` Newest: ${s.newest || 'n/a'}`);
+ console.log(`\n \x1b[1mMemory Stats\x1b[0m`);
+ console.log(` Total: \x1b[36m${s.total}\x1b[0m`);
+ console.log(` Oldest: ${s.oldest || 'n/a'}`);
+ console.log(` Newest: ${s.newest || 'n/a'}`);
  console.log(` Avg importance: \x1b[33m${s.avg_importance}\x1b[0m\n`);
  break;
  }
@@ -157,7 +180,121 @@ async function main() {
  const raw = fs.readFileSync(path.resolve(file), 'utf8');
  const data = JSON.parse(raw);
  const r = await mem.importAll(Array.isArray(data) ? data : data.memories || []);
- console.log(`\x1b[32m✓\x1b[0m Imported ${r.imported} memories`);
+ console.log(`\x1b[32m+\x1b[0m Imported ${r.imported} memories`);
+ break;
+ }
+ case 'ingest': {
+ const { ingestFile, ingest: ingestText } = require('../src/ingest');
+ const store = new LocalStore(undefined, useEmbeddings ? { openaiKey: process.env.OPENAI_API_KEY } : {});
+ await store.init();
+
+ if (hasFlag('stdin')) {
+ const text = await readStdin();
+ if (!text.trim()) return console.log('No input received via stdin.');
+ console.log(' Extracting facts from stdin...');
+ const result = await ingestText(text, store, { agent });
+ console.log(`\x1b[32m+\x1b[0m Extracted: ${result.extracted}, Stored: ${result.stored}, Duplicates: ${result.duplicates}`);
+ } else if (hasFlag('watch')) {
+ const dir = flag('watch');
+ if (!dir || dir === true) return console.log('Usage: memshell ingest --watch <directory>');
+ const { watchDirectory } = require('../src/ingest');
+ console.log(' Starting directory watcher (Ctrl+C to stop)...');
+ watchDirectory(dir, store, { agent });
+ // Keep process alive
+ process.on('SIGINT', () => { console.log('\n Stopped.'); process.exit(0); });
+ } else {
+ const file = getText();
+ if (!file) return console.log('Usage: memshell ingest <file> | --stdin | --watch <dir>');
+ console.log(` Ingesting: ${file}`);
+ const result = await ingestFile(file, store, { agent, force: hasFlag('force') });
+ if (result.skipped) {
+ console.log(` Skipped: ${result.file} (already processed, use --force to re-ingest)`);
+ } else {
+ console.log(`\x1b[32m+\x1b[0m Extracted: ${result.extracted}, Stored: ${result.stored}, Duplicates: ${result.duplicates}`);
+ }
+ }
+ break;
+ }
+ case 'connect': {
+ const target = args[1];
+ if (target !== 'openclaw') return console.log('Usage: memshell connect openclaw');
+
+ const { watchOpenClaw, defaultOpenClawPath, setConfigValue } = require('../src/ingest');
+ const store = new LocalStore(undefined, useEmbeddings ? { openaiKey: process.env.OPENAI_API_KEY } : {});
+ await store.init();
+
+ const sessionsPath = args[2] || defaultOpenClawPath();
+ setConfigValue('watch.openclaw', sessionsPath);
+ console.log(` OpenClaw integration configured.`);
+ console.log(` Sessions path: ${sessionsPath}`);
+ console.log(' Watching for new transcripts (Ctrl+C to stop)...\n');
+ watchOpenClaw(sessionsPath, store, { agent });
+ process.on('SIGINT', () => { console.log('\n Stopped.'); process.exit(0); });
+ break;
+ }
+ case 'daemon': {
+ const { loadConfig, watchDirectory, watchOpenClaw } = require('../src/ingest');
+ const store = new LocalStore(undefined, useEmbeddings ? { openaiKey: process.env.OPENAI_API_KEY } : {});
+ await store.init();
+
+ const config = loadConfig();
+ const watchers = config.watch || {};
+ let activeWatchers = 0;
+
+ console.log(' \x1b[1mmem.sh daemon\x1b[0m starting...\n');
+
+ if (watchers.openclaw) {
+ watchOpenClaw(watchers.openclaw, store, { agent });
+ activeWatchers++;
+ }
+
+ // Support array of dir watchers
+ if (Array.isArray(watchers.dirs)) {
+ for (const dir of watchers.dirs) {
+ watchDirectory(typeof dir === 'string' ? dir : dir.path, store, { agent });
+ activeWatchers++;
+ }
+ } else if (watchers.dir) {
+ watchDirectory(watchers.dir, store, { agent });
+ activeWatchers++;
+ }
+
+ if (activeWatchers === 0) {
+ console.log(' No watchers configured. Use:');
+ console.log(' memshell config set watch.openclaw ~/.openclaw/agents/main/sessions/');
+ console.log(' memshell config set watch.dir /path/to/watch');
+ process.exit(1);
+ }
+
+ console.log(`\n ${activeWatchers} watcher(s) active. Ctrl+C to stop.\n`);
+ process.on('SIGINT', () => { console.log('\n Daemon stopped.'); process.exit(0); });
+ break;
+ }
+ case 'config': {
+ const { loadConfig, setConfigValue } = require('../src/ingest');
+ const subCmd = args[1];
276
+
277
+ if (subCmd === 'set') {
278
+ const configKey = args[2];
279
+ const configVal = args.slice(3).join(' ');
280
+ if (!configKey || !configVal) return console.log('Usage: memshell config set <key> <value>');
281
+ const result = setConfigValue(configKey, configVal);
282
+ console.log(`\x1b[32m+\x1b[0m Set ${configKey} = ${configVal}`);
283
+ } else if (subCmd === 'get') {
284
+ const config = loadConfig();
285
+ const configKey = args[2];
286
+ if (configKey) {
287
+ const parts = configKey.split('.');
288
+ let val = config;
289
+ for (const p of parts) val = val?.[p];
290
+ console.log(val !== undefined ? JSON.stringify(val, null, 2) : 'Not set');
291
+ } else {
292
+ console.log(JSON.stringify(config, null, 2));
293
+ }
294
+ } else {
295
+ const config = loadConfig();
296
+ console.log(JSON.stringify(config, null, 2));
297
+ }
161
298
  break;
162
299
  }
163
300
  case 'serve': case 'server': {
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "memshell",
-  "version": "0.3.0",
+  "version": "0.4.0",
   "description": "Persistent memory for AI agents. Like localStorage but for LLMs.",
   "main": "src/index.js",
   "bin": {
package/server.js CHANGED
@@ -27,6 +27,19 @@ app.use('/mem', async (req, res, next) => {
   next();
 });
 
+// Ingest raw text
+app.post('/mem/ingest', async (req, res) => {
+  const { text, source } = req.body;
+  if (!text) return res.status(400).json({ error: 'text is required' });
+  try {
+    const { ingest } = require('./src/ingest');
+    const result = await ingest(text, store, { agent: req.agent, source: source || 'api' });
+    res.json(result);
+  } catch (e) {
+    res.status(500).json({ error: e.message });
+  }
+});
+
 // Store a memory
 app.post('/mem', async (req, res) => {
   const { text, tags, importance, metadata } = req.body;
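The new `POST /mem/ingest` route takes a JSON body with a required `text` field and an optional `source` (defaulting to `'api'` server-side). A minimal client sketch, assuming the server is listening on a local port of your choosing (`MEMSHELL_URL` and the sample payload are illustrative, not part of the package):

```javascript
// Sketch of a client for POST /mem/ingest. Building the request separately
// from sending it makes the payload shape easy to inspect.
const BASE = process.env.MEMSHELL_URL || 'http://localhost:3000'; // assumed base URL

function buildIngestRequest(text, source) {
  return {
    url: `${BASE}/mem/ingest`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // `text` is required; omit `source` to let the server default it to 'api'
      body: JSON.stringify({ text, source }),
    },
  };
}

const req = buildIngestRequest('user: I prefer tabs over spaces', 'transcript');
// To actually send it: const res = await fetch(req.url, req.options);
```

The response is the ingest summary object: `{ extracted, stored, duplicates }`.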
package/src/index.js CHANGED
@@ -141,8 +141,13 @@ class LocalStore {
       embedding TEXT,
       tags TEXT DEFAULT '',
       created_at TEXT NOT NULL,
-      importance REAL DEFAULT 1.0
+      importance REAL DEFAULT 1.0,
+      source TEXT DEFAULT 'manual',
+      recall_count INTEGER DEFAULT 0
     )`);
+    // Migration: add columns if they don't exist (for existing DBs)
+    try { this._db.run("ALTER TABLE memories ADD COLUMN source TEXT DEFAULT 'manual'"); } catch {}
+    try { this._db.run('ALTER TABLE memories ADD COLUMN recall_count INTEGER DEFAULT 0'); } catch {}
     this._save();
   }
 
@@ -169,6 +174,7 @@ class LocalStore {
     const agent = opts.agent || 'default';
     const tags = opts.tags || '';
     const importance = opts.importance || 1.0;
+    const source = opts.source || 'manual';
     const created_at = new Date().toISOString();
 
     let embedding = null;
@@ -182,8 +188,8 @@ class LocalStore {
     }
 
     this._db.run(
-      'INSERT INTO memories (text, agent, embedding, tags, created_at, importance) VALUES (?, ?, ?, ?, ?, ?)',
-      [text, agent, embedding, tags, created_at, importance]
+      'INSERT INTO memories (text, agent, embedding, tags, created_at, importance, source) VALUES (?, ?, ?, ?, ?, ?, ?)',
+      [text, agent, embedding, tags, created_at, importance, source]
     );
     const id = this._db.exec('SELECT last_insert_rowid() as id')[0].values[0][0];
     this._save();
@@ -198,7 +204,7 @@ class LocalStore {
     const filterTags = opts.tags ? opts.tags.split(',').map(t => t.trim()) : null;
 
     const stmt = this._db.exec(
-      'SELECT id, text, agent, embedding, tags, created_at, importance FROM memories WHERE agent = ?',
+      'SELECT id, text, agent, embedding, tags, created_at, importance, source, recall_count FROM memories WHERE agent = ?',
       [agent]
     );
     if (!stmt.length) return [];
@@ -257,9 +263,9 @@ class LocalStore {
     const resultLimit = top || limit;
     const results = scored.slice(0, resultLimit);
 
-    // Bump importance for recalled memories
+    // Bump importance and recall_count for recalled memories
     for (const r of results) {
-      this._db.run('UPDATE memories SET importance = importance + 0.1 WHERE id = ?', [r.id]);
+      this._db.run('UPDATE memories SET importance = importance + 0.1, recall_count = recall_count + 1 WHERE id = ?', [r.id]);
     }
     this._save();
 
@@ -270,7 +276,7 @@ class LocalStore {
     await this.init();
     const agent = opts.agent || 'default';
     const stmt = this._db.exec(
-      'SELECT id, text, agent, tags, created_at, importance FROM memories WHERE agent = ? ORDER BY id DESC',
+      'SELECT id, text, agent, tags, created_at, importance, source, recall_count FROM memories WHERE agent = ? ORDER BY id DESC',
       [agent]
    );
    if (!stmt.length) return [];
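The migration above leans on SQLite rejecting a duplicate `ADD COLUMN` with an error, so wrapping the `ALTER TABLE` in a try/catch makes it idempotent: fresh and pre-0.4.0 databases converge on the same schema. The idiom in isolation, with a hypothetical `db` stub standing in for the sql.js handle:

```javascript
// Idempotent add-column idiom: attempt the ALTER and treat a
// "duplicate column" error as "already migrated".
function ensureColumn(db, table, columnDef) {
  try {
    db.run(`ALTER TABLE ${table} ADD COLUMN ${columnDef}`);
    return true;  // column was added
  } catch {
    return false; // column already existed; safe to ignore
  }
}

// Hypothetical stub mimicking a database that already has a `source` column.
const db = {
  cols: new Set(['source']),
  run(sql) {
    const col = sql.match(/ADD COLUMN (\S+)/)[1];
    if (this.cols.has(col)) throw new Error('duplicate column name: ' + col);
    this.cols.add(col);
  },
};

ensureColumn(db, 'memories', "source TEXT DEFAULT 'manual'");   // false: already there
ensureColumn(db, 'memories', 'recall_count INTEGER DEFAULT 0'); // true: added
```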
package/src/ingest.js ADDED
@@ -0,0 +1,348 @@
+'use strict';
+
+const fs = require('fs');
+const path = require('path');
+const os = require('os');
+
+// ── LLM Extraction ────────────────────────────────────────────
+async function callLLM(text, config = {}) {
+  const anthropicKey = config.anthropicKey || process.env.ANTHROPIC_API_KEY;
+  const openaiKey = config.apiKey || config.openaiKey || process.env.OPENAI_API_KEY;
+  const model = config.model || 'gpt-4o-mini';
+
+  const systemPrompt = 'Extract key facts, user preferences, decisions, and important context from this conversation. Return as a JSON array of strings, each a standalone fact. Only return the JSON array, nothing else.';
+
+  if (anthropicKey && (model.startsWith('claude') || !openaiKey)) {
+    const res = await fetch('https://api.anthropic.com/v1/messages', {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'x-api-key': anthropicKey,
+        'anthropic-version': '2023-06-01'
+      },
+      body: JSON.stringify({
+        model: model.startsWith('claude') ? model : 'claude-3-haiku-20240307',
+        max_tokens: 2048,
+        system: systemPrompt,
+        messages: [{ role: 'user', content: text }]
+      })
+    });
+    if (!res.ok) throw new Error(`Anthropic API error: ${res.status} ${await res.text()}`);
+    const data = await res.json();
+    const content = data.content[0].text;
+    // Extract the JSON array even if the model wraps it in prose
+    const match = content.match(/\[[\s\S]*\]/);
+    if (!match) throw new Error('LLM did not return a valid JSON array');
+    return JSON.parse(match[0]);
+  }
+
+  if (openaiKey) {
+    const res = await fetch('https://api.openai.com/v1/chat/completions', {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'Authorization': `Bearer ${openaiKey}`
+      },
+      body: JSON.stringify({
+        model: model.startsWith('claude') ? 'gpt-4o-mini' : model,
+        messages: [
+          { role: 'system', content: systemPrompt },
+          { role: 'user', content: text }
+        ],
+        temperature: 0.3
+      })
+    });
+    if (!res.ok) throw new Error(`OpenAI API error: ${res.status} ${await res.text()}`);
+    const data = await res.json();
+    const content = data.choices[0].message.content;
+    // Extract the JSON array even if the model wraps it in prose
+    const match = content.match(/\[[\s\S]*\]/);
+    if (!match) throw new Error('LLM did not return a valid JSON array');
+    return JSON.parse(match[0]);
+  }
+
+  throw new Error('No API key found. Set OPENAI_API_KEY or ANTHROPIC_API_KEY, or run: memshell config set apiKey <key>');
+}
+
+// ── Chunking ───────────────────────────────────────────────────
+function chunkText(text, maxTokens = 2000) {
+  // Rough estimate: 1 token ≈ 4 chars
+  const maxChars = maxTokens * 4;
+  if (text.length <= maxChars) return [text];
+
+  const chunks = [];
+  const lines = text.split('\n');
+  let current = '';
+
+  for (const line of lines) {
+    if ((current + '\n' + line).length > maxChars && current.length > 0) {
+      chunks.push(current.trim());
+      current = line;
+    } else {
+      current += (current ? '\n' : '') + line;
+    }
+  }
+  if (current.trim()) chunks.push(current.trim());
+  return chunks;
+}
+
+// ── Similarity (simple word overlap for dedup) ─────────────────
+function wordSet(text) {
+  return new Set(text.toLowerCase().replace(/[^a-z0-9\s]/g, '').split(/\s+/).filter(Boolean));
+}
+
+function jaccardSimilarity(a, b) {
+  const setA = wordSet(a);
+  const setB = wordSet(b);
+  let intersection = 0;
+  for (const w of setA) if (setB.has(w)) intersection++;
+  const union = setA.size + setB.size - intersection;
+  return union === 0 ? 0 : intersection / union;
+}
+
+// ── Main Ingest Function ──────────────────────────────────────
+async function ingest(text, store, opts = {}) {
+  const config = loadConfig();
+  const mergedConfig = { ...config, ...opts };
+  const source = opts.source || 'auto-ingest';
+  const agent = opts.agent || 'default';
+
+  const chunks = chunkText(text);
+  let totalExtracted = 0;
+  let totalStored = 0;
+  let totalDuplicates = 0;
+
+  // Get existing memories for dedup
+  const existing = await store.list({ agent });
+  const existingTexts = existing.map(m => m.text);
+
+  for (const chunk of chunks) {
+    if (chunk.trim().length < 20) continue; // skip tiny chunks
+
+    let facts;
+    try {
+      facts = await callLLM(chunk, mergedConfig);
+    } catch (e) {
+      console.error(` Warning: LLM extraction failed for chunk: ${e.message}`);
+      continue;
+    }
+
+    if (!Array.isArray(facts)) continue;
+    totalExtracted += facts.length;
+
+    for (const fact of facts) {
+      if (typeof fact !== 'string' || fact.trim().length < 5) continue;
+
+      // Dedup check against existing and newly stored facts
+      let isDuplicate = false;
+      for (const prior of existingTexts) {
+        if (jaccardSimilarity(fact, prior) > 0.85) {
+          isDuplicate = true;
+          break;
+        }
+      }
+
+      if (isDuplicate) {
+        totalDuplicates++;
+        continue;
+      }
+
+      // Tag auto-ingested facts with their source
+      const tags = [source, 'auto'].join(',');
+      await store.set(fact, { agent, tags, source });
+      existingTexts.push(fact); // prevent self-duplication within a batch
+      totalStored++;
+    }
+  }
+
+  return { extracted: totalExtracted, stored: totalStored, duplicates: totalDuplicates };
+}
+
+// ── JSONL Parser (OpenClaw format) ─────────────────────────────
+function parseJSONL(content) {
+  const lines = content.split('\n').filter(l => l.trim());
+  const messages = [];
+
+  for (const line of lines) {
+    try {
+      const obj = JSON.parse(line);
+      if (obj.role && obj.content) {
+        if (obj.role === 'user' || obj.role === 'assistant') {
+          const text = typeof obj.content === 'string' ? obj.content : JSON.stringify(obj.content);
+          messages.push(`${obj.role}: ${text}`);
+        }
+      }
+    } catch {
+      // skip invalid lines
+    }
+  }
+
+  return messages.join('\n');
+}
+
+// ── Config Management ──────────────────────────────────────────
+function configPath() {
+  return path.join(os.homedir(), '.mem', 'config.json');
+}
+
+function loadConfig() {
+  try {
+    return JSON.parse(fs.readFileSync(configPath(), 'utf8'));
+  } catch {
+    return {};
+  }
+}
+
+function saveConfig(config) {
+  const dir = path.dirname(configPath());
+  fs.mkdirSync(dir, { recursive: true });
+  fs.writeFileSync(configPath(), JSON.stringify(config, null, 2));
+}
+
+function setConfigValue(key, value) {
+  const config = loadConfig();
+  // Support dotted keys like watch.openclaw
+  const parts = key.split('.');
+  let obj = config;
+  for (let i = 0; i < parts.length - 1; i++) {
+    if (!obj[parts[i]] || typeof obj[parts[i]] !== 'object') obj[parts[i]] = {};
+    obj = obj[parts[i]];
+  }
+  obj[parts[parts.length - 1]] = value;
+  saveConfig(config);
+  return config;
+}
+
+// ── Processed Tracker ──────────────────────────────────────────
+function processedPath() {
+  return path.join(os.homedir(), '.mem', 'processed.json');
+}
+
+function loadProcessed() {
+  try {
+    return JSON.parse(fs.readFileSync(processedPath(), 'utf8'));
+  } catch {
+    return { files: {} };
+  }
+}
+
+function saveProcessed(data) {
+  const dir = path.dirname(processedPath());
+  fs.mkdirSync(dir, { recursive: true });
+  fs.writeFileSync(processedPath(), JSON.stringify(data, null, 2));
+}
+
+function markProcessed(filePath, mtime) {
+  const data = loadProcessed();
+  data.files[filePath] = { mtime: mtime || Date.now(), processedAt: new Date().toISOString() };
+  saveProcessed(data);
+}
+
+function isProcessed(filePath, mtime) {
+  const data = loadProcessed();
+  const entry = data.files[filePath];
+  if (!entry) return false;
+  if (mtime && entry.mtime < mtime) return false; // file was modified since last ingest
+  return true;
+}
+
+// ── File Ingestion ─────────────────────────────────────────────
+async function ingestFile(filePath, store, opts = {}) {
+  const absPath = path.resolve(filePath);
+  const stat = fs.statSync(absPath);
+  const mtime = stat.mtimeMs;
+
+  if (!opts.force && isProcessed(absPath, mtime)) {
+    return { skipped: true, file: absPath };
+  }
+
+  const content = fs.readFileSync(absPath, 'utf8');
+  let text;
+
+  const ext = path.extname(absPath).toLowerCase();
+  if (ext === '.jsonl') {
+    text = parseJSONL(content);
+  } else if (ext === '.json') {
+    try {
+      const data = JSON.parse(content);
+      if (Array.isArray(data)) {
+        text = data.map(d => typeof d === 'string' ? d : JSON.stringify(d)).join('\n');
+      } else {
+        text = JSON.stringify(data);
+      }
+    } catch {
+      text = content;
+    }
+  } else {
+    text = content;
+  }
+
+  if (!text || text.trim().length < 20) {
+    return { skipped: true, file: absPath, reason: 'too short' };
+  }
+
+  const source = opts.source || `file:${path.basename(absPath)}`;
+  const result = await ingest(text, store, { ...opts, source });
+  markProcessed(absPath, mtime);
+  return { ...result, file: absPath };
+}
+
+// ── Directory Watcher (polling) ────────────────────────────────
+function watchDirectory(dir, store, opts = {}) {
+  const interval = opts.interval || 10000;
+  const absDir = path.resolve(dir);
+
+  console.log(` Watching: ${absDir} (every ${interval / 1000}s)`);
+
+  async function scan() {
+    try {
+      const files = fs.readdirSync(absDir).filter(f => {
+        const ext = path.extname(f).toLowerCase();
+        return ['.txt', '.md', '.json', '.jsonl'].includes(ext);
+      });
+
+      for (const file of files) {
+        const filePath = path.join(absDir, file);
+        try {
+          const result = await ingestFile(filePath, store, opts);
+          if (!result.skipped) {
+            console.log(` Ingested: ${file} (${result.extracted} extracted, ${result.stored} stored, ${result.duplicates} duplicates)`);
+          }
+        } catch (e) {
+          console.error(` Error processing ${file}: ${e.message}`);
+        }
+      }
+    } catch (e) {
+      console.error(` Watch error: ${e.message}`);
+    }
+  }
+
+  scan(); // initial scan
+  return setInterval(scan, interval);
+}
+
+// ── OpenClaw Connector ─────────────────────────────────────────
+function defaultOpenClawPath() {
+  return path.join(os.homedir(), '.openclaw', 'agents', 'main', 'sessions');
+}
+
+function watchOpenClaw(sessionsPath, store, opts = {}) {
+  const dir = sessionsPath || defaultOpenClawPath();
+  console.log(` Connecting to OpenClaw sessions: ${dir}`);
+  return watchDirectory(dir, store, { ...opts, source: 'openclaw' });
+}
+
+module.exports = {
+  ingest,
+  ingestFile,
+  callLLM,
+  chunkText,
+  jaccardSimilarity,
+  parseJSONL,
+  loadConfig,
+  saveConfig,
+  setConfigValue,
+  watchDirectory,
+  watchOpenClaw,
+  defaultOpenClawPath,
+  loadProcessed,
+  isProcessed,
+  markProcessed
+};
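The dedup threshold (Jaccard similarity > 0.85) and the ~4-characters-per-token chunking heuristic are easy to sanity-check in isolation. A self-contained copy of the two helpers (the sample sentences are illustrative):

```javascript
// Standalone copies of the dedup/chunking helpers from src/ingest.js.
function wordSet(text) {
  return new Set(text.toLowerCase().replace(/[^a-z0-9\s]/g, '').split(/\s+/).filter(Boolean));
}

function jaccardSimilarity(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  let intersection = 0;
  for (const w of setA) if (setB.has(w)) intersection++;
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

function chunkText(text, maxTokens = 2000) {
  const maxChars = maxTokens * 4; // rough estimate: 1 token ≈ 4 chars
  if (text.length <= maxChars) return [text];
  const chunks = [];
  let current = '';
  for (const line of text.split('\n')) {
    if ((current + '\n' + line).length > maxChars && current.length > 0) {
      chunks.push(current.trim());
      current = line;
    } else {
      current += (current ? '\n' : '') + line;
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// One extra stopword ("the") gives 4 shared words out of a 5-word union:
// 4/5 = 0.8, just under the 0.85 cutoff, so both facts would be stored.
jaccardSimilarity('user prefers dark mode', 'the user prefers dark mode'); // 0.8
chunkText('aaa\nbbb\nccc', 1); // maxChars = 4, so each line becomes its own chunk
```

Note the threshold only catches near-verbatim repeats; paraphrases with different wording will still be stored twice.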