chattercatcher 0.1.13 → 0.1.15

package/README.md CHANGED
@@ -15,7 +15,7 @@
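  </p>
 
  <p align="center">
- Silently saves important messages and files from the family group chat, and answers with traceable citations when @-mentioned.
+ Silently saves important messages, files, and fragmented context from the family group chat, and answers with traceable citations when @-mentioned.
  </p>
 
  <p align="center">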
@@ -53,7 +53,15 @@
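 
  ## Project status
 
- ChatterCatcher is an early MVP. It already provides Feishu long-connection access, local message storage, SQLite FTS, SQLite embedding vector retrieval, OpenAI-compatible LLM/Embedding, a CLI, a local Web UI, and answers with citations.
+ ChatterCatcher is an early MVP. It already provides Feishu long-connection access, local message storage, SQLite FTS, SQLite embedding vector retrieval, conversation memory blocks, OpenAI-compatible LLM/Embedding, a CLI, a local Web UI, and answers with citations.
+
+ Recent highlights:
+
+ - **No native vector database dependency**: semantic vectors are written to SQLite, which avoids installation failures of the LanceDB platform packages on different macOS/CPU architectures.
+ - **SQLite FTS + embedding hybrid RAG**: keyword and semantic retrieval recall in parallel, and local evidence must be found before answering.
+ - **Automatic Feishu bot identity detection**: `botOpenId` can be fetched automatically from the App ID / App Secret, which reduces manual configuration errors.
+ - **Conversation memory blocks**: fragmented chat from a 10-minute window, followed by 2 minutes of silence, is organized into an episode summary, so that "I'm about to send an API key" stays linked in context to the short messages that follow.
+ - **Sensitive summary protection**: conversation summaries redact suspected tokens/API keys; the raw messages are still kept locally so they can be traced when necessary.
 
  The current core direction is:
 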
@@ -109,15 +117,16 @@ ChatterCatcher 是一个早期 MVP。它已经具备飞书长连接接入、本
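 
  | Module | Capability |
  | --- | --- |
- | Feishu Gateway | Official long connection, `im.message.receive_v1` events, duplicate-delivery protection, attachment download entry point |
+ | Feishu Gateway | Official long connection, `im.message.receive_v1` events, automatic `botOpenId` lookup, duplicate-delivery protection, attachment download entry point |
  | Message ingestion | Regular text messages are written to SQLite; `@` questions are answered directly and skip ingestion |
- | RAG retrieval | SQLite FTS keyword retrieval, SQLite embedding vector retrieval, hybrid reranking, evidence source retention |
+ | Conversation memory blocks | Default 10-minute window + 2-minute quiet period; organizes fragmented chat into searchable episode summaries linked to the raw messages |
+ | RAG retrieval | SQLite FTS keyword retrieval, SQLite embedding vector retrieval, episode summary retrieval, hybrid reranking, evidence source retention |
  | Q&A | OpenAI-compatible chat completions, says "I don't know" when evidence is insufficient, answers carry citations |
  | Citation format | Shows "who said what and when", and avoids exposing opaque ids such as `ou_` / `oc_` |
  | File knowledge sources | Supports importing and parsing txt, md, json, csv, tsv, log, docx, pdf |
  | CLI | setup, settings, doctor, gateway, process, index, files, export, restore |
  | Web UI | Local status dashboard, auto-refresh, recent messages, group chats, file library, and parsing jobs |
- | Privacy | Configuration and secrets are stored separately; exports do not include API Key, App Secret, or tokens |
+ | Privacy | Configuration and secrets are stored separately; exports do not include API Key, App Secret, or tokens; conversation summaries redact suspected secrets |
  | Data management | Local export/restore; delete local knowledge-base data by message, file, or group |
 
  ---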
@@ -130,13 +139,16 @@ flowchart LR
  Gateway --> Router["Message router"]
 
  Router -->|"Regular message"| SQLite["SQLite messages"]
+ SQLite --> Episode["Episode summaries"]
  SQLite --> FTS["SQLite FTS5"]
+ Episode --> EpisodeFTS["Episode FTS5"]
  SQLite --> Indexer["Embedding Indexer"]
  Indexer --> Vectors["SQLite embedding vectors"]
 
  Router -->|"@ question"| QA["Question Handler"]
  QA --> Hybrid["Hybrid Retriever"]
  FTS --> Hybrid
+ EpisodeFTS --> Hybrid
  Vectors --> Hybrid
  Hybrid --> LLM["OpenAI-compatible LLM"]
  LLM --> Reply["Reply to the original message with citations"]
@@ -257,6 +269,7 @@ http://127.0.0.1:3878
  | `chattercatcher gateway status` | Show Gateway status |
  | `chattercatcher gateway stop` | Stop the Gateway |
  | `chattercatcher process messages` | Process message indexing jobs immediately |
+ | `chattercatcher process episodes` | Generate conversation memory blocks immediately, organizing fragmented chat into searchable summaries |
  | `chattercatcher index rebuild` | Rebuild the SQLite embedding vector index |
  | `chattercatcher files add <path...>` | Import local files as knowledge sources |
  | `chattercatcher files jobs` | List file parsing jobs |
@@ -265,6 +278,29 @@ http://127.0.0.1:3878
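 
  ---
 
+ ## Conversation memory blocks
+
+ Family group chats are often fragmented: one message gives the background, and the next contains only a short word, a link, or a secret. When only individual raw messages are retrieved, RAG easily loses the context.
+
+ After a regular message is ingested, ChatterCatcher tries to generate a **conversation memory block (episode summary)**:
+
+ 1. Read the raw messages that have not yet been summarized, grouped by chat.
+ 2. Aggregate adjacent messages, using a 10-minute window by default.
+ 3. Once 2 minutes have passed in silence after the last message in a window, treat that stretch of conversation as ready to summarize.
+ 4. Call the LLM to summarize the fragmented chat into searchable facts.
+ 5. Write the summary to the local SQLite database and record the IDs of the raw messages it is linked to.
+ 6. At question time, search raw messages, file evidence, and conversation memory blocks together.
+
+ Conversation summaries redact suspected API keys, tokens, cookies, private keys, and URL credentials; the raw messages are still stored in the local database, so an answer can go back to the original evidence when it needs to be traced.
+
+ Trigger manually:
+
+ ```bash
+ chattercatcher process episodes
+ ```
+
+ ---
+
  ## Local data directory
 
  Default data directory: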
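
Steps 2 and 3 of the list in the hunk above (window aggregation and the quiet-period check) can be sketched as standalone JavaScript. This is a minimal sketch modeled on the `summarizeReadyWindows` logic visible in the `dist/cli.js` part of this diff; the helper names `groupIntoWindows` and `isReady` and the sample messages are illustrative assumptions, not names from the package.

```javascript
// Thresholds mirror the README defaults; the constant names are hypothetical.
const WINDOW_MS = 10 * 60 * 1000; // 10-minute aggregation window
const QUIET_MS = 2 * 60 * 1000; // 2-minute quiet period

// Split one chat's messages (already sorted by sentAt) into windows.
// A new window starts when a message is more than windowMs after the
// FIRST message of the current window.
function groupIntoWindows(messages, windowMs = WINDOW_MS) {
  const windows = [];
  let current = [];
  for (const message of messages) {
    const first = current[0];
    if (first && Date.parse(message.sentAt) - Date.parse(first.sentAt) > windowMs) {
      windows.push(current);
      current = [];
    }
    current.push(message);
  }
  if (current.length > 0) {
    windows.push(current);
  }
  return windows;
}

// A window is ready to summarize once its last message is at least quietMs old.
function isReady(windowMessages, now, quietMs = QUIET_MS) {
  const last = windowMessages[windowMessages.length - 1];
  return Boolean(last) && now.getTime() - Date.parse(last.sentAt) >= quietMs;
}

// Illustrative data only.
const sample = [
  { id: "m1", sentAt: "2024-01-01T10:00:00.000Z", text: "I'm about to send an API key" },
  { id: "m2", sentAt: "2024-01-01T10:01:00.000Z", text: "(short follow-up)" },
  { id: "m3", sentAt: "2024-01-01T10:30:00.000Z", text: "Dinner at seven?" },
];
const windows = groupIntoWindows(sample);
const now = new Date("2024-01-01T10:31:00.000Z");
const ready = windows.filter((w) => isReady(w, now));
console.log(windows.map((w) => w.map((m) => m.id)), ready.length);
```

In this sample the first two messages share a window and are ready to summarize, while the third message sits in its own window that is still inside the quiet period. Because the window is measured from its first message, a busy conversation is cut roughly every 10 minutes rather than growing without bound.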
@@ -297,6 +333,8 @@ dist/
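  - By default the Web UI listens only on `127.0.0.1`.
  - Chat logs, file contents, OCR results, and speech transcripts are all treated as private data.
  - App Secret, API Key, and tokens are stored separately from regular configuration.
+ - Conversation memory blocks redact suspected API keys, tokens, cookies, private keys, and URL credentials, so that sensitive values do not spread into summaries.
+ - Raw messages are still stored in the local database, so context can be traced when necessary.
  - Exported files do not contain secrets.
  - Factual answers must be based on retrieved evidence.
  - When no evidence is retrieved, the answer must be "I don't know".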
@@ -345,6 +383,14 @@ npm install -g chattercatcher@latest
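 
  Family chat is a long-term knowledge base and should not rely on stuffing the entire message history into the context. RAG can limit the scope of evidence, preserve sources, reduce hallucinations, and keep answers traceable.
 
+ ### What is a conversation memory block?
+
+ A conversation memory block is a local summary that ChatterCatcher generates for a short stretch of fragmented chat. By default it is generated after the 10-minute window ends and 2 minutes of silence have passed, and it preserves the contextual link of "the previous message explains the background, the next one contains only short content". Run `chattercatcher process episodes` to trigger it manually.
+
+ ### Can conversation summaries leak API keys?
+
+ The summary layer redacts suspected API keys, tokens, cookies, private keys, and URL credentials; raw messages are still stored only in the local database, for tracing evidence when necessary.
+
  ### Can the Web UI be exposed to the public internet?
 
  Not recommended by default. ChatterCatcher is built for private family data and by default listens only on `127.0.0.1`.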
package/dist/cli.js CHANGED
@@ -8,7 +8,7 @@ import fs13 from "fs/promises";
  // package.json
  var package_default = {
    name: "chattercatcher",
-   version: "0.1.13",
+   version: "0.1.14",
    description: "\u672C\u5730\u4F18\u5148\u7684\u98DE\u4E66/Lark \u5BB6\u5EAD\u7FA4\u77E5\u8BC6\u5E93\u673A\u5668\u4EBA",
    type: "module",
    main: "dist/index.js",
@@ -110,6 +110,10 @@ var appConfigSchema = z.object({
    }),
    schedules: z.object({
      indexing: z.string().default("*/10 * * * *")
+   }),
+   episodes: z.object({
+     windowMinutes: z.number().int().positive().default(10),
+     quietMinutes: z.number().int().positive().default(2)
    })
  });
  var appSecretsSchema = z.object({
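
The new `episodes` settings are positive integers in minutes, defaulting to 10 and 2, and `processEpisodesNow` (later in this diff) converts them to milliseconds. The following is a dependency-free sketch of that behavior, not the package's zod schema; `resolveEpisodeConfig` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical stand-in for the zod rules `.int().positive().default(n)`:
// undefined falls back to the default, anything else must be a positive integer.
function resolveEpisodeConfig(raw = {}) {
  const pick = (value, fallback) => {
    if (value === undefined) {
      return fallback;
    }
    if (!Number.isInteger(value) || value <= 0) {
      throw new RangeError(`expected a positive integer, got ${value}`);
    }
    return value;
  };
  const windowMinutes = pick(raw.windowMinutes, 10);
  const quietMinutes = pick(raw.quietMinutes, 2);
  return {
    windowMinutes,
    quietMinutes,
    // Same minutes-to-milliseconds conversion that processEpisodesNow applies.
    windowMs: windowMinutes * 60 * 1000,
    quietMs: quietMinutes * 60 * 1000,
  };
}

const defaults = resolveEpisodeConfig();
const custom = resolveEpisodeConfig({ windowMinutes: 15 });
console.log(defaults, custom);
```

With no overrides this yields a 600000 ms window and a 120000 ms quiet period; overriding only `windowMinutes` keeps the default quiet period.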
@@ -130,7 +134,8 @@ function createDefaultConfig() {
      embedding: {},
      storage: {},
      web: {},
-     schedules: {}
+     schedules: {},
+     episodes: {}
    });
  }
  function createDefaultSecrets() {
@@ -412,6 +417,39 @@ function migrateDatabase(database) {
      tokenize = 'unicode61'
    );
 
+   CREATE TABLE IF NOT EXISTS memory_episodes (
+     id TEXT PRIMARY KEY,
+     chat_id TEXT NOT NULL REFERENCES chats(id) ON DELETE CASCADE,
+     summary TEXT NOT NULL,
+     message_count INTEGER NOT NULL,
+     started_at TEXT NOT NULL,
+     ended_at TEXT NOT NULL,
+     created_at TEXT NOT NULL,
+     UNIQUE(chat_id, started_at, ended_at)
+   );
+
+   CREATE TABLE IF NOT EXISTS memory_episode_messages (
+     episode_id TEXT NOT NULL REFERENCES memory_episodes(id) ON DELETE CASCADE,
+     message_id TEXT NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
+     position INTEGER NOT NULL,
+     PRIMARY KEY (episode_id, message_id)
+   );
+
+   CREATE VIRTUAL TABLE IF NOT EXISTS memory_episodes_fts USING fts5(
+     summary,
+     episode_id UNINDEXED,
+     tokenize = 'unicode61'
+   );
+
+   CREATE TRIGGER IF NOT EXISTS memory_episodes_delete_fts
+   AFTER DELETE ON memory_episodes
+   BEGIN
+     DELETE FROM memory_episodes_fts WHERE episode_id = old.id;
+   END;
+
+   CREATE INDEX IF NOT EXISTS memory_episode_messages_message_idx
+   ON memory_episode_messages(message_id);
+
    CREATE TABLE IF NOT EXISTS message_chunk_embeddings (
      chunk_id TEXT NOT NULL REFERENCES message_chunks(id) ON DELETE CASCADE,
      model TEXT NOT NULL,
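
The `memory_episodes_fts` table above is queried with `MATCH`, so a free-form question has to be turned into a safe FTS5 query string first. The sketch below follows the same approach as the `escapeFtsQuery2` helper that appears later in this diff (strip punctuation, quote each term, join with `OR`); the function name `toFtsMatchQuery` and the sample inputs are assumptions for illustration.

```javascript
// Turn free-form user text into an FTS5 MATCH expression of quoted terms.
function toFtsMatchQuery(query) {
  const terms = query
    .trim()
    .split(/\s+/)
    // Replace anything that is not a letter, digit, "_" or "-" with a space.
    .map((term) => term.replace(/[^\p{L}\p{N}_-]+/gu, " ").trim())
    // A term may have been split into several pieces by the replacement.
    .flatMap((term) => term.split(/\s+/))
    .filter(Boolean);
  if (terms.length === 0) {
    // An empty quoted string keeps the MATCH expression syntactically valid.
    return '""';
  }
  // Double any embedded quotes, then wrap each term in double quotes.
  return terms.map((term) => `"${term.replace(/"/g, '""')}"`).join(" OR ");
}

console.log(toFtsMatchQuery('api-key "where"?'));
console.log(toFtsMatchQuery("wifi 密码"));
console.log(toFtsMatchQuery("   "));
```

Quoting every term keeps FTS5 operators and punctuation in user input from being interpreted as query syntax, and joining with `OR` favors recall over precision.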
@@ -1268,6 +1306,211 @@ var MessageRepository = class {
    }
  };
 
+ // src/episodes/repository.ts
+ import crypto3 from "crypto";
+
+ // src/episodes/sanitizer.ts
+ var SECRET_PATTERNS = [
+   [/-----BEGIN [^-]+ PRIVATE KEY-----[\s\S]*?-----END [^-]+ PRIVATE KEY-----/g, "[REDACTED_SECRET]"],
+   [/(\bAuthorization\s*:\s*Bearer\s+)[A-Za-z0-9._~+/=-]{12,}/gi, "$1[REDACTED_SECRET]"],
+   [/(https?:\/\/)[^\s/@:]+:[^\s/@]+@/gi, "$1[REDACTED_SECRET]@"],
+   [/([?&](?:api[_-]?key|access[_-]?token|refresh[_-]?token|token|secret|password|session(?:id)?|client[_-]?secret)=)[^\s&,。;;]+/gi, "$1[REDACTED_SECRET]"],
+   [/("(?:api[_-]?key|access[_-]?token|refresh[_-]?token|token|secret|password|session(?:id)?|client[_-]?secret|private[_-]?key)"\s*:\s*")[^"]+(")/gi, "$1[REDACTED_SECRET]$2"],
+   [/(\b(?:api[_-]?key|access[_-]?token|refresh[_-]?token|token|secret|password|session(?:id)?|client[_-]?secret)\s*[=:]\s*)[^\s;,。]+/gi, "$1[REDACTED_SECRET]"],
+   [/\b(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{20,}\b/g, "[REDACTED_SECRET]"],
+   [/\bxox[baprs]-[A-Za-z0-9-]{20,}\b/g, "[REDACTED_SECRET]"],
+   [/\bsk-[A-Za-z0-9_-]{6,}\b/g, "[REDACTED_SECRET]"]
+ ];
+ function sanitizeEpisodeSummary(summary) {
+   let sanitized = summary;
+   for (const [pattern, replacement] of SECRET_PATTERNS) {
+     sanitized = sanitized.replace(pattern, replacement);
+   }
+   return sanitized;
+ }
+
+ // src/episodes/repository.ts
+ function nowIso3() {
+   return (/* @__PURE__ */ new Date()).toISOString();
+ }
+ function stableId2(parts) {
+   return crypto3.createHash("sha256").update(parts.join("")).digest("hex").slice(0, 32);
+ }
+ function escapeFtsQuery2(query) {
+   const terms = query.trim().split(/\s+/).map((term) => term.replace(/[^\p{L}\p{N}_-]+/gu, " ").trim()).flatMap((term) => term.split(/\s+/)).filter(Boolean);
+   if (terms.length === 0) {
+     return '""';
+   }
+   return terms.map((term) => `"${term.replace(/"/g, '""')}"`).join(" OR ");
+ }
+ function toMillis(value) {
+   const time = Date.parse(value);
+   return Number.isFinite(time) ? time : 0;
+ }
+ var EpisodeRepository = class {
+   constructor(database) {
+     this.database = database;
+   }
+   database;
+   async summarizeReadyWindows(input2) {
+     const rows = this.database.prepare(
+       `
+       SELECT
+         m.id,
+         m.chat_id AS chatId,
+         c.name AS chatName,
+         m.sender_name AS senderName,
+         m.text,
+         m.sent_at AS sentAt
+       FROM messages m
+       JOIN chats c ON c.id = m.chat_id
+       WHERE NOT EXISTS (
+         SELECT 1 FROM memory_episode_messages mem WHERE mem.message_id = m.id
+       )
+       ORDER BY m.chat_id ASC, m.sent_at ASC
+       `
+     ).all();
+     const byChat = /* @__PURE__ */ new Map();
+     for (const row of rows) {
+       byChat.set(row.chatId, [...byChat.get(row.chatId) ?? [], row]);
+     }
+     const created = [];
+     const nowMs = input2.now.getTime();
+     for (const messages of byChat.values()) {
+       const windows = [];
+       let current = [];
+       for (const message of messages) {
+         const first = current[0];
+         if (first && toMillis(message.sentAt) - toMillis(first.sentAt) > input2.windowMs) {
+           windows.push(current);
+           current = [];
+         }
+         current.push(message);
+       }
+       if (current.length > 0) {
+         windows.push(current);
+       }
+       for (const windowMessages of windows) {
+         const last = windowMessages.at(-1);
+         if (!last || nowMs - toMillis(last.sentAt) < input2.quietMs) {
+           continue;
+         }
+         const first = windowMessages[0];
+         const window = {
+           chatId: first.chatId,
+           chatName: first.chatName,
+           startedAt: first.sentAt,
+           endedAt: last.sentAt,
+           messages: windowMessages
+         };
+         const summary = await input2.summarize(window);
+         created.push(this.insertEpisode(window, summary));
+       }
+     }
+     return created;
+   }
+   insertEpisode(window, summary) {
+     const safeSummary = sanitizeEpisodeSummary(summary);
+     const createdAt = nowIso3();
+     const id = stableId2([window.chatId, window.startedAt, window.endedAt]);
+     const transaction = this.database.transaction(() => {
+       this.database.prepare(
+         `
+         INSERT INTO memory_episodes (id, chat_id, summary, message_count, started_at, ended_at, created_at)
+         VALUES (?, ?, ?, ?, ?, ?, ?)
+         ON CONFLICT(chat_id, started_at, ended_at)
+         DO UPDATE SET summary = excluded.summary, message_count = excluded.message_count
+         `
+       ).run(id, window.chatId, safeSummary, window.messages.length, window.startedAt, window.endedAt, createdAt);
+       this.database.prepare("DELETE FROM memory_episode_messages WHERE episode_id = ?").run(id);
+       this.database.prepare("DELETE FROM memory_episodes_fts WHERE episode_id = ?").run(id);
+       const insertMessage = this.database.prepare(
+         "INSERT INTO memory_episode_messages (episode_id, message_id, position) VALUES (?, ?, ?)"
+       );
+       for (const [index2, message] of window.messages.entries()) {
+         insertMessage.run(id, message.id, index2);
+       }
+       this.database.prepare("INSERT INTO memory_episodes_fts (summary, episode_id) VALUES (?, ?)").run(safeSummary, id);
+     });
+     transaction();
+     return {
+       id,
+       chatId: window.chatId,
+       chatName: window.chatName,
+       text: safeSummary,
+       startedAt: window.startedAt,
+       endedAt: window.endedAt,
+       messageIds: window.messages.map((message) => message.id)
+     };
+   }
+   searchEpisodes(query, limit = 8) {
+     const ftsQuery = escapeFtsQuery2(query);
+     return this.database.prepare(
+       `
+       SELECT
+         e.id AS chunkId,
+         e.id AS messageId,
+         'episode' AS platform,
+         e.summary AS text,
+         1.0 AS score,
+         'episode' AS messageType,
+         c.name AS chatName,
+         '\u4F1A\u8BDD\u8BB0\u5FC6' AS senderName,
+         e.ended_at AS sentAt,
+         e.started_at AS startedAt,
+         e.ended_at AS endedAt,
+         (
+           SELECT json_group_array(message_id)
+           FROM (
+             SELECT message_id
+             FROM memory_episode_messages
+             WHERE episode_id = e.id
+             ORDER BY position ASC
+           )
+         ) AS sourceMessageIdsJson
+       FROM memory_episodes_fts fts
+       JOIN memory_episodes e ON e.id = fts.episode_id
+       JOIN chats c ON c.id = e.chat_id
+       WHERE memory_episodes_fts MATCH ?
+       GROUP BY e.id
+       ORDER BY e.ended_at DESC
+       LIMIT ?
+       `
+     ).all(ftsQuery, limit).map((row) => {
+       const item = row;
+       return {
+         ...item,
+         sourceMessageIds: JSON.parse(item.sourceMessageIdsJson)
+       };
+     });
+   }
+ };
+
+ // src/rag/episode-retriever.ts
+ function toEpisodeEvidence(result) {
+   return {
+     id: result.chunkId,
+     text: result.text,
+     score: result.score,
+     source: {
+       type: "episode",
+       label: result.chatName,
+       sender: result.senderName,
+       timestamp: result.endedAt,
+       location: `${result.startedAt} - ${result.endedAt}`
+     }
+   };
+ }
+ var EpisodeFtsRetriever = class {
+   constructor(episodes) {
+     this.episodes = episodes;
+   }
+   episodes;
+   async retrieve(question) {
+     return this.episodes.searchEpisodes(question, 8).map(toEpisodeEvidence);
+   }
+ };
+
  // src/rag/hybrid-retriever.ts
  function normalizeScore(score) {
    if (!Number.isFinite(score)) {
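
To see what the summary sanitizer in the hunk above does to typical inputs, the sketch below reuses three of its `SECRET_PATTERNS` entries verbatim (Bearer tokens, URL credentials, and `sk-` keys) in a standalone function. The names `SAMPLE_PATTERNS` and `redact` and all sample strings are made up for illustration; the real list in the diff is longer.

```javascript
// Three patterns copied from SECRET_PATTERNS in the diff; the rest are omitted here.
const SAMPLE_PATTERNS = [
  [/(\bAuthorization\s*:\s*Bearer\s+)[A-Za-z0-9._~+/=-]{12,}/gi, "$1[REDACTED_SECRET]"],
  [/(https?:\/\/)[^\s/@:]+:[^\s/@]+@/gi, "$1[REDACTED_SECRET]@"],
  [/\bsk-[A-Za-z0-9_-]{6,}\b/g, "[REDACTED_SECRET]"],
];

// Apply every pattern in order, as sanitizeEpisodeSummary does.
function redact(text) {
  let sanitized = text;
  for (const [pattern, replacement] of SAMPLE_PATTERNS) {
    sanitized = sanitized.replace(pattern, replacement);
  }
  return sanitized;
}

// Fabricated sample values, not real credentials.
const bearer = redact("Authorization: Bearer abcdefghijkl123456");
const url = redact("https://user:pass@example.com/path");
const key = redact("the key is sk-abc123def456");
const plain = redact("dinner at seven");
console.log(bearer, url, key, plain);
```

The capture groups keep the surrounding context (the header name or URL scheme) so the summary still says what kind of secret was shared, while the value itself is replaced. Pattern-based redaction is best-effort: secrets in formats the list does not cover would pass through unchanged.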
@@ -1275,6 +1518,14 @@ function normalizeScore(score) {
    }
    return Math.max(0, Math.min(1, score));
  }
+ function evidenceTimestampMs(evidence) {
+   const timestamp = evidence.source.timestamp;
+   if (!timestamp) {
+     return 0;
+   }
+   const parsed = Date.parse(timestamp);
+   return Number.isFinite(parsed) ? parsed : 0;
+ }
  var HybridRetriever = class {
    constructor(retrievers, options = {}) {
      this.retrievers = retrievers;
@@ -1285,19 +1536,19 @@ var HybridRetriever = class {
    async retrieve(question) {
      const results = await Promise.all(this.retrievers.map((retriever) => retriever.retrieve(question)));
      const merged = /* @__PURE__ */ new Map();
-     for (const [retrieverIndex, evidenceList] of results.entries()) {
+     for (const evidenceList of results) {
        for (const evidence of evidenceList) {
          const existing = merged.get(evidence.id);
-         const weightedScore = normalizeScore(evidence.score) + (this.retrievers.length - retrieverIndex) * 0.01;
-         if (!existing || weightedScore > existing.score) {
+         const score = normalizeScore(evidence.score);
+         if (!existing || score > existing.score) {
            merged.set(evidence.id, {
              ...evidence,
-             score: weightedScore
+             score
            });
          }
        }
      }
-     return [...merged.values()].sort((left, right) => right.score - left.score).slice(0, this.options.limit ?? 8);
+     return [...merged.values()].sort((left, right) => right.score - left.score || evidenceTimestampMs(right) - evidenceTimestampMs(left)).slice(0, this.options.limit ?? 8);
    }
  };
 
@@ -1471,7 +1722,10 @@ function hasEmbeddingConfig(config, secrets) {
    return Boolean((config.embedding.baseUrl || config.llm.baseUrl) && config.embedding.model && (secrets.embedding.apiKey || secrets.llm.apiKey));
  }
  async function createHybridRetriever(input2) {
-   const retrievers = [new MessageFtsRetriever(input2.messages, { excludeMessageIds: input2.excludeMessageIds })];
+   const retrievers = [
+     new EpisodeFtsRetriever(new EpisodeRepository(input2.database)),
+     new MessageFtsRetriever(input2.messages, { excludeMessageIds: input2.excludeMessageIds })
+   ];
    const closers = [];
    if (hasEmbeddingConfig(input2.config, input2.secrets)) {
      const vectorStore = new SqliteVectorStore(input2.database, {
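
With the episode retriever added to the list above, the revised `HybridRetriever.retrieve` merges all result lists by evidence id, keeps the highest clamped score per id, and breaks score ties by recency instead of by retriever position. The following is a standalone sketch of that merge under stated assumptions: `mergeEvidence`, `clamp01`, and `timestampMs` are illustrative names, the sample evidence is fabricated, and returning 0 for a non-finite score is an assumption because that branch of `normalizeScore` is not visible in this diff.

```javascript
// Clamp a score into [0, 1]. Treating non-finite scores as 0 is an assumption.
function clamp01(score) {
  return Number.isFinite(score) ? Math.max(0, Math.min(1, score)) : 0;
}

// Parse the evidence timestamp; missing or invalid values sort as oldest.
function timestampMs(evidence) {
  const parsed = Date.parse(evidence.source?.timestamp ?? "");
  return Number.isFinite(parsed) ? parsed : 0;
}

// Merge result lists by id, keep the best score per id,
// then sort by score (descending) with newer evidence first on ties.
function mergeEvidence(resultLists, limit = 8) {
  const merged = new Map();
  for (const evidenceList of resultLists) {
    for (const evidence of evidenceList) {
      const score = clamp01(evidence.score);
      const existing = merged.get(evidence.id);
      if (!existing || score > existing.score) {
        merged.set(evidence.id, { ...evidence, score });
      }
    }
  }
  return [...merged.values()]
    .sort((left, right) => right.score - left.score || timestampMs(right) - timestampMs(left))
    .slice(0, limit);
}

// Fabricated sample results from three hypothetical retrievers.
const episodeHits = [{ id: "ep1", score: 1.0, source: { timestamp: "2024-01-02T00:00:00Z" } }];
const messageHits = [
  { id: "msg1", score: 1.0, source: { timestamp: "2024-01-03T00:00:00Z" } },
  { id: "msg2", score: 0.4, source: {} },
];
const vectorHits = [{ id: "msg2", score: 0.9, source: { timestamp: "2024-01-01T00:00:00Z" } }];

const ranked = mergeEvidence([episodeHits, messageHits, vectorHits]);
console.log(ranked.map((evidence) => `${evidence.id}:${evidence.score}`));
```

Here `msg1` and `ep1` tie on score, so the newer `msg1` ranks first, and `msg2` keeps the higher of its two scores. Since `searchEpisodes` returns a constant score of 1.0 for every episode hit, the recency tie-break is what orders episode summaries among themselves and against equally scored messages.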
@@ -1945,6 +2199,40 @@ async function restoreLocalData(input2) {
    };
  }
 
+ // src/episodes/summarizer.ts
+ async function summarizeEpisodeWindow(window, model) {
+   const transcript = window.messages.map((message) => `[${message.sentAt}] ${message.senderName}\uFF1A${message.text}`).join("\n");
+   const summary = await model.complete([
+     {
+       role: "system",
+       content: "\u4F60\u662F ChatterCatcher \u7684\u4F1A\u8BDD\u8BB0\u5FC6\u6574\u7406\u6A21\u5757\u3002\u4F60\u7684\u4EFB\u52A1\u662F\u628A\u788E\u7247\u5316\u95F2\u804A\u6574\u7406\u6210\u53EF\u68C0\u7D22\u4E8B\u5B9E\uFF0C\u8865\u5168\u77ED\u6D88\u606F\u3001\u4EE3\u8BCD\u3001\u7F29\u5199\u4E0E\u4E0A\u4E0B\u6587\u4E4B\u95F4\u7684\u5173\u7CFB\u3002\u53EA\u603B\u7ED3\u660E\u786E\u4E8B\u5B9E\uFF0C\u4E0D\u8981\u7F16\u9020\u3002\u4FDD\u7559\u91CD\u8981\u6570\u5B57\u3001\u65E5\u671F\u3001\u94FE\u63A5\u548C\u4EE3\u7801\uFF1B\u5982\u679C\u5185\u5BB9\u50CF\u5BC6\u7801\u3001API key\u3001token \u6216\u5BC6\u94A5\uFF0C\u53EA\u63CF\u8FF0\u5176\u4E0A\u4E0B\u6587\u5173\u7CFB\uFF0C\u4E0D\u8981\u5728\u6458\u8981\u4E2D\u590D\u5199\u539F\u6587\u3002"
+     },
+     {
+       role: "user",
+       content: `\u7FA4\u804A\uFF1A${window.chatName}
+ \u65F6\u95F4\uFF1A${window.startedAt} - ${window.endedAt}
+
+ \u804A\u5929\u8BB0\u5F55\uFF1A
+ ${transcript}
+
+ \u8BF7\u8F93\u51FA\u4E00\u6BB5\u7B80\u6D01\u7684\u4F1A\u8BDD\u8BB0\u5FC6\u6458\u8981\u3002`
+     }
+   ]);
+   return sanitizeEpisodeSummary(summary);
+ }
+
+ // src/episodes/manual-process.ts
+ async function processEpisodesNow(input2) {
+   const episodes = new EpisodeRepository(input2.database);
+   const created = await episodes.summarizeReadyWindows({
+     now: input2.now ?? /* @__PURE__ */ new Date(),
+     quietMs: input2.config.episodes.quietMinutes * 60 * 1e3,
+     windowMs: input2.config.episodes.windowMinutes * 60 * 1e3,
+     summarize: (window) => summarizeEpisodeWindow(window, input2.model)
+   });
+   return { created: created.length };
+ }
+
  // src/feishu/bot-info.ts
  function getOpenApiBaseUrl(domain) {
    return domain === "lark" ? "https://open.larksuite.com/open-apis" : "https://open.feishu.cn/open-apis";
@@ -2416,6 +2704,18 @@ function createFeishuEventDispatcher(options) {
        console.log("\u98DE\u4E66\u6D88\u606F\u91CD\u590D\u6295\u9012\uFF1A\u5DF2\u8DF3\u8FC7\u9644\u4EF6\u5904\u7406\u548C\u56DE\u7B54\u3002");
        return;
      }
+     if (options.episodeProcessor) {
+       const episodeResult = await processEpisodesNow({
+         config: options.config,
+         secrets: options.secrets,
+         database: options.episodeProcessor.database,
+         model: options.episodeProcessor.model,
+         now: options.episodeProcessor.now?.()
+       });
+       if (episodeResult.created > 0) {
+         console.log(`\u98DE\u4E66\u4F1A\u8BDD\u8BB0\u5FC6\u5DF2\u751F\u6210\uFF1A${episodeResult.created}`);
+       }
+     }
      if (result.attachment?.downloaded) {
        console.log(`\u98DE\u4E66\u9644\u4EF6\u5DF2\u4E0B\u8F7D\uFF1A${result.attachment.downloaded.storedPath}`);
        if (result.attachment.indexedMessageId) {
@@ -2463,10 +2763,12 @@ function createFeishuGateway(options) {
    });
    const eventDispatcher = createFeishuEventDispatcher({
      config: options.config,
+     secrets: options.secrets,
      ingestor: options.ingestor,
      questionHandler: options.questionHandler,
      resourceDownloader: options.resourceDownloader,
-     attachmentVectorIndexer: options.attachmentVectorIndexer
+     attachmentVectorIndexer: options.attachmentVectorIndexer,
+     episodeProcessor: options.episodeProcessor
    });
    return {
      async start() {
@@ -2544,7 +2846,7 @@ var FeishuResourceDownloader = class _FeishuResourceDownloader {
  };
 
  // src/files/ingest.ts
- import crypto3 from "crypto";
+ import crypto4 from "crypto";
  import fs11 from "fs/promises";
  import path13 from "path";
 
@@ -2608,7 +2910,7 @@ function ensureSupportedTextFile(filePath) {
    }
  }
  function stableStoredName(sourcePath, fileName) {
-   const digest = crypto3.createHash("sha256").update(sourcePath).digest("hex").slice(0, 16);
+   const digest = crypto4.createHash("sha256").update(sourcePath).digest("hex").slice(0, 16);
    return `${digest}-${fileName}`;
  }
  async function ingestLocalFile(input2) {
@@ -3700,6 +4002,8 @@ async function promptForConfiguration(config, secrets) {
      message: "\u7FA4\u804A\u56DE\u7B54\u662F\u5426\u8981\u6C42 @ \u673A\u5668\u4EBA\uFF1F",
      default: config.feishu.requireMention
    });
+   config.episodes.windowMinutes = await number({ message: "\u4F1A\u8BDD\u8BB0\u5FC6\u805A\u5408\u7A97\u53E3\uFF08\u5206\u949F\uFF09", default: config.episodes.windowMinutes, required: true }) ?? config.episodes.windowMinutes;
+   config.episodes.quietMinutes = await number({ message: "\u4F1A\u8BDD\u9759\u9ED8\u591A\u4E45\u540E\u751F\u6210\u8BB0\u5FC6\uFF08\u5206\u949F\uFF09", default: config.episodes.quietMinutes, required: true }) ?? config.episodes.quietMinutes;
  }
  async function tryEnsureFeishuBotOpenId(config, secrets) {
    if (config.feishu.botOpenId || !config.feishu.appId || !secrets.feishu.appSecret) {
@@ -3828,6 +4132,10 @@ async function startGatewayForegroundCommand() {
        store: vectorStore,
        messageIds: [messageId]
      }) : void 0,
+     episodeProcessor: {
+       database,
+       model: createChatModel(config, secrets)
+     },
      questionHandler: new FeishuQuestionHandler({
        config,
        secrets,
@@ -4007,6 +4315,22 @@ processCommand.command("messages").description("\u7ACB\u5373\u5904\u7406\u6D88\u
      database.close();
    }
  });
+ processCommand.command("episodes").description("\u7ACB\u5373\u751F\u6210\u4F1A\u8BDD\u8BB0\u5FC6\u5757\uFF0C\u628A\u788E\u7247\u5316\u95F2\u804A\u6574\u7406\u6210\u53EF\u68C0\u7D22\u6458\u8981").action(async () => {
+   const config = await loadConfig();
+   const secrets = await loadSecrets();
+   const database = openDatabase(config);
+   try {
+     const result = await processEpisodesNow({
+       config,
+       secrets,
+       database,
+       model: createChatModel(config, secrets)
+     });
+     console.log(`\u4F1A\u8BDD\u8BB0\u5FC6\u5904\u7406\u5B8C\u6210\uFF1Aepisodes=${result.created}`);
+   } finally {
+     database.close();
+   }
+ });
  var files = program.command("files").description("\u7BA1\u7406\u672C\u5730\u6587\u4EF6\u77E5\u8BC6\u6E90");
  files.command("add").description("\u628A\u672C\u5730\u6587\u4EF6\u89E3\u6790\u3001\u4FDD\u5B58\u5230\u6570\u636E\u76EE\u5F55\u5E76\u5199\u5165 RAG \u77E5\u8BC6\u5E93").argument("<paths...>", "\u6587\u4EF6\u8DEF\u5F84\uFF0C\u652F\u6301 txt\u3001md\u3001json\u3001csv\u3001tsv\u3001log\u3001docx\u3001pdf").action(async (paths) => {
    const config = await loadConfig();