openkb 0.1.0.dev1__tar.gz → 0.1.2__tar.gz

@@ -1,7 +1,7 @@
1
1
  Metadata-Version: 2.3
2
2
  Name: openkb
3
- Version: 0.1.0.dev1
4
- Summary: OpenKB Open LLM Knowledge Base, powered by PageIndex
3
+ Version: 0.1.2
4
+ Summary: OpenKB: Open LLM Knowledge Base, powered by PageIndex
5
5
  License: Apache-2.0
6
6
  Keywords: ai,rag,retrieval,knowledge-base,llm,pageindex,agents,document
7
7
  Author: Ray
@@ -22,7 +22,8 @@ Requires-Dist: json-repair
22
22
  Requires-Dist: litellm
23
23
  Requires-Dist: markitdown[all]
24
24
  Requires-Dist: openai-agents
25
- Requires-Dist: pageindex (==0.3.0.dev0)
25
+ Requires-Dist: pageindex (==0.3.0.dev1)
26
+ Requires-Dist: prompt_toolkit (>=3.0)
26
27
  Requires-Dist: python-dotenv
27
28
  Requires-Dist: pyyaml
28
29
  Requires-Dist: watchdog (>=3.0)
@@ -57,11 +58,12 @@ Traditional RAG rediscovers knowledge from scratch on every query. Nothing accum
57
58
 
58
59
  ### Features
59
60
 
60
- - **Any format** — PDF, Word, PowerPoint, Excel, HTML, Markdown, text, CSV, and more via markitdown
61
+ - **Broad format support** — PDF, Word, Markdown, PowerPoint, HTML, Excel, CSV, text, and more via markitdown
61
62
  - **Scale to long documents** — Long and complex documents are handled via [PageIndex](https://github.com/VectifyAI/PageIndex) tree indexing, enabling accurate, vectorless long-context retrieval
62
63
  - **Native multi-modality** — Retrieves and understands figures, tables, and images, not just text
63
- - **Auto wiki** — LLM generates summaries, concept pages, and cross-links. You curate sources; the LLM does the rest
64
- - **Query** — Ask questions against your wiki. The LLM navigates your compiled knowledge to answer
64
+ - **Compiled Wiki** — The LLM compiles your documents into summaries, concept pages, and cross-links, and keeps them in sync
65
+ - **Query** — Ask one-off questions against your wiki. The LLM navigates your compiled knowledge to answer them
66
+ - **Interactive Chat** — Multi-turn conversations with persisted sessions you can resume across runs
65
67
  - **Lint** — Health checks find contradictions, gaps, orphans, and stale content
66
68
  - **Watch mode** — Drop files into `raw/`, wiki updates automatically
67
69
  - **Obsidian compatible** — Wiki is plain `.md` files with `[[wikilinks]]`. Open in Obsidian for graph view and browsing
@@ -88,11 +90,11 @@ openkb add paper.pdf
88
90
  openkb add ~/papers/ # Add a whole directory
89
91
  openkb add article.html
90
92
 
91
- # 4. Ask questions
93
+ # 4. Ask a question
92
94
  openkb query "What are the main findings?"
93
95
 
94
- # 5. Check wiki health
95
- openkb lint
96
+ # 5. Or start an interactive chat session
97
+ openkb chat
96
98
  ```
97
99
 
98
100
  ### Set up your LLM
@@ -165,6 +167,7 @@ A single source might touch 10-15 wiki pages. Knowledge accumulates: each docume
165
167
  | `openkb add <file_or_dir>` | Add documents and compile to wiki |
166
168
  | `openkb query "question"` | Ask a question against the knowledge base |
167
169
  | `openkb query "question" --save` | Ask and save the answer to `wiki/explorations/` |
170
+ | `openkb chat` | Start an interactive multi-turn chat (use `--resume`, `--list`, `--delete` to manage sessions) |
168
171
  | `openkb watch` | Watch `raw/` and auto-compile new files |
169
172
  | `openkb lint` | Run structural + knowledge health checks |
170
173
  | `openkb list` | List indexed documents and concepts |
@@ -172,6 +175,20 @@ A single source might touch 10-15 wiki pages. Knowledge accumulates: each docume
172
175
 
173
176
  <!-- | `openkb lint --fix` | Auto-fix what it can | -->
174
177
 
178
+ ### Interactive chat
179
+
180
+ `openkb chat` opens an interactive chat session over your wiki knowledge base. Unlike the one-shot `openkb query`, each turn carries the conversation history, so you can dig into a topic without retyping context. Sessions are saved under `.openkb/chats/`, so you can resume them later.
181
+
182
+ ```bash
183
+ openkb chat # start a new session
184
+ openkb chat --resume # resume the most recent session
185
+ openkb chat --resume 20260411 # resume by id (unique prefix works)
186
+ openkb chat --list # list all sessions
187
+ openkb chat --delete <id> # delete a session
188
+ ```
189
+
190
+ `/help` lists all slash commands. For example, `/save [name]` exports the transcript to `wiki/explorations/`, and `/clear` starts a fresh session while keeping the current one on disk.
191
+
175
192
  ### Configuration
176
193
 
177
194
  Settings are initialized by `openkb init`, and stored in `.openkb/config.yaml`:
@@ -24,11 +24,12 @@ Traditional RAG rediscovers knowledge from scratch on every query. Nothing accum
24
24
 
25
25
  ### Features
26
26
 
27
- - **Any format** — PDF, Word, PowerPoint, Excel, HTML, Markdown, text, CSV, and more via markitdown
27
+ - **Broad format support** — PDF, Word, Markdown, PowerPoint, HTML, Excel, CSV, text, and more via markitdown
28
28
  - **Scale to long documents** — Long and complex documents are handled via [PageIndex](https://github.com/VectifyAI/PageIndex) tree indexing, enabling accurate, vectorless long-context retrieval
29
29
  - **Native multi-modality** — Retrieves and understands figures, tables, and images, not just text
30
- - **Auto wiki** — LLM generates summaries, concept pages, and cross-links. You curate sources; the LLM does the rest
31
- - **Query** — Ask questions against your wiki. The LLM navigates your compiled knowledge to answer
30
+ - **Compiled Wiki** — The LLM compiles your documents into summaries, concept pages, and cross-links, and keeps them in sync
31
+ - **Query** — Ask one-off questions against your wiki. The LLM navigates your compiled knowledge to answer them
32
+ - **Interactive Chat** — Multi-turn conversations with persisted sessions you can resume across runs
32
33
  - **Lint** — Health checks find contradictions, gaps, orphans, and stale content
33
34
  - **Watch mode** — Drop files into `raw/`, wiki updates automatically
34
35
  - **Obsidian compatible** — Wiki is plain `.md` files with `[[wikilinks]]`. Open in Obsidian for graph view and browsing
@@ -55,11 +56,11 @@ openkb add paper.pdf
55
56
  openkb add ~/papers/ # Add a whole directory
56
57
  openkb add article.html
57
58
 
58
- # 4. Ask questions
59
+ # 4. Ask a question
59
60
  openkb query "What are the main findings?"
60
61
 
61
- # 5. Check wiki health
62
- openkb lint
62
+ # 5. Or start an interactive chat session
63
+ openkb chat
63
64
  ```
64
65
 
65
66
  ### Set up your LLM
@@ -132,6 +133,7 @@ A single source might touch 10-15 wiki pages. Knowledge accumulates: each docume
132
133
  | `openkb add <file_or_dir>` | Add documents and compile to wiki |
133
134
  | `openkb query "question"` | Ask a question against the knowledge base |
134
135
  | `openkb query "question" --save` | Ask and save the answer to `wiki/explorations/` |
136
+ | `openkb chat` | Start an interactive multi-turn chat (use `--resume`, `--list`, `--delete` to manage sessions) |
135
137
  | `openkb watch` | Watch `raw/` and auto-compile new files |
136
138
  | `openkb lint` | Run structural + knowledge health checks |
137
139
  | `openkb list` | List indexed documents and concepts |
@@ -139,6 +141,20 @@ A single source might touch 10-15 wiki pages. Knowledge accumulates: each docume
139
141
 
140
142
  <!-- | `openkb lint --fix` | Auto-fix what it can | -->
141
143
 
144
+ ### Interactive chat
145
+
146
+ `openkb chat` opens an interactive chat session over your wiki knowledge base. Unlike the one-shot `openkb query`, each turn carries the conversation history, so you can dig into a topic without retyping context. Sessions are saved under `.openkb/chats/`, so you can resume them later.
147
+
148
+ ```bash
149
+ openkb chat # start a new session
150
+ openkb chat --resume # resume the most recent session
151
+ openkb chat --resume 20260411 # resume by id (unique prefix works)
152
+ openkb chat --list # list all sessions
153
+ openkb chat --delete <id> # delete a session
154
+ ```
155
+
156
+ `/help` lists all slash commands. For example, `/save [name]` exports the transcript to `wiki/explorations/`, and `/clear` starts a fresh session while keeping the current one on disk.
157
+
142
158
  ### Configuration
143
159
 
144
160
  Settings are initialized by `openkb init`, and stored in `.openkb/config.yaml`:
@@ -0,0 +1,7 @@
1
+ """OpenKB package."""
2
+ from importlib.metadata import PackageNotFoundError, version as _version
3
+
4
+ try:
5
+ __version__ = _version("openkb")
6
+ except PackageNotFoundError:
7
+ __version__ = "0.0.0+unknown"
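The new `openkb/__init__.py` reads the version from installed package metadata and falls back to a placeholder when no installed distribution matches, for example when running from a plain source checkout. A minimal standalone sketch of the same pattern; the `package_version` helper name is hypothetical and exists only so the lookup can be tried with any distribution name:

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed version of ``dist_name``, or ``fallback``.

    Mirrors the try/except in the diff: importlib.metadata raises
    PackageNotFoundError when no installed distribution has that name.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return fallback


if __name__ == "__main__":
    # "openkb" resolves only if the package is installed in this environment.
    print(package_version("openkb"))
    print(package_version("surely-not-an-installed-dist-xyz"))
```

The `+unknown` local-version label keeps the fallback a valid PEP 440 version string, so tools that parse `__version__` still accept it.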
@@ -0,0 +1,378 @@
1
+ """Interactive multi-turn chat REPL for the OpenKB knowledge base.
2
+
3
+ Builds on the single-shot Q&A agent in ``openkb.agent.query`` and keeps
4
+ conversation state in ``ChatSession``. Uses prompt_toolkit for the input
5
+ line (history, editing, bottom toolbar) and streams responses directly to
6
+ stdout to match the existing ``query`` output.
7
+ """
8
+ from __future__ import annotations
9
+
10
+ import os
11
+ import re
12
+ import sys
13
+ import time
14
+ from pathlib import Path
15
+ from typing import Any
16
+
17
+ from prompt_toolkit import PromptSession
18
+ from prompt_toolkit.formatted_text import FormattedText
19
+ from prompt_toolkit.shortcuts import print_formatted_text
20
+ from prompt_toolkit.styles import Style
21
+
22
+ from openkb.agent.chat_session import ChatSession
23
+ from openkb.agent.query import MAX_TURNS, build_query_agent
24
+ from openkb.log import append_log
25
+
26
+
27
+ _STYLE_DICT: dict[str, str] = {
28
+ "prompt": "bold #5fa0e0",
29
+ "bottom-toolbar": "noreverse nobold #8a8a8a bg:default",
30
+ "toolbar": "noreverse nobold #8a8a8a bg:default",
31
+ "toolbar.session": "noreverse #8a8a8a bg:default bold",
32
+ "header": "#8a8a8a",
33
+ "header.title": "bold #5fa0e0",
34
+ "tool": "#a8a8a8",
35
+ "tool.name": "#a8a8a8 bold",
36
+ "slash.ok": "ansigreen",
37
+ "slash.help": "#8a8a8a",
38
+ "error": "ansired bold",
39
+ "resume.turn": "#5fa0e0",
40
+ "resume.user": "bold",
41
+ "resume.assistant": "#8a8a8a",
42
+ }
43
+
44
+ _HELP_TEXT = (
45
+ "Commands:\n"
46
+ " /exit Exit (Ctrl-D also works)\n"
47
+ " /clear Start a fresh session (current one is kept on disk)\n"
48
+ " /save [name] Export transcript to wiki/explorations/\n"
49
+ " /help Show this"
50
+ )
51
+
52
+ _SIGINT_EXIT_WINDOW = 2.0
53
+
54
+
55
+ def _use_color(force_off: bool) -> bool:
56
+ if force_off:
57
+ return False
58
+ if os.environ.get("NO_COLOR", ""):
59
+ return False
60
+ if not sys.stdout.isatty():
61
+ return False
62
+ return True
63
+
64
+
65
+ def _build_style(use_color: bool) -> Style:
66
+ return Style.from_dict(_STYLE_DICT if use_color else {})
67
+
68
+
69
+ def _fmt(style: Style, *fragments: tuple[str, str]) -> None:
70
+ print_formatted_text(FormattedText(list(fragments)), style=style, end="")
71
+
72
+
73
+ def _format_tool_line(name: str, args: str, width: int = 78) -> str:
74
+ args = args or ""
75
+ args = args.replace("\n", " ")
76
+ base = f" \u00b7 {name}({args})"
77
+ if len(base) > width:
78
+ base = base[: width - 1] + "\u2026"
79
+ return base
80
+
81
+
82
+ def _extract_preview(text: str, limit: int = 150) -> str:
83
+ text = " ".join((text or "").strip().split())
84
+ if len(text) <= limit:
85
+ return text
86
+ return text[: limit - 1] + "\u2026"
87
+
88
+
89
+ def _openkb_version() -> str:
90
+ from openkb import __version__
91
+ return __version__
92
+
93
+
94
+ def _display_kb_dir(kb_dir: Path) -> str:
95
+ home = str(Path.home())
96
+ s = str(kb_dir)
97
+ if s == home:
98
+ return "~"
99
+ if s.startswith(home + "/"):
100
+ return "~" + s[len(home):]
101
+ return s
102
+
103
+
104
+ def _print_header(session: ChatSession, kb_dir: Path, style: Style) -> None:
105
+ disp_dir = _display_kb_dir(kb_dir)
106
+ version = _openkb_version()
107
+ version_suffix = f" v{version}\n" if version else "\n"
108
+ print()
109
+ _fmt(
110
+ style,
111
+ ("class:header.title", "OpenKB Chat"),
112
+ ("class:header", version_suffix),
113
+ )
114
+ _fmt(
115
+ style,
116
+ (
117
+ "class:header",
118
+ f"{disp_dir} \u00b7 {session.model} \u00b7 session {session.id}\n",
119
+ ),
120
+ )
121
+ _fmt(
122
+ style,
123
+ (
124
+ "class:header",
125
+ "Type /help for commands, Ctrl-D to exit, "
126
+ "Ctrl-C to abort current response.\n",
127
+ ),
128
+ )
129
+ print()
130
+
131
+
132
+ def _print_resume_view(session: ChatSession, style: Style) -> None:
133
+ turns = list(zip(session.user_turns, session.assistant_texts))
134
+ if not turns:
135
+ return
136
+ total = len(turns)
137
+ if total > 5:
138
+ omitted = total - 5
139
+ _fmt(
140
+ style,
141
+ ("class:header", f"... {omitted} earlier turn(s) omitted\n"),
142
+ )
143
+ turns = turns[-5:]
144
+ start = omitted + 1
145
+ else:
146
+ start = 1
147
+
148
+ _fmt(
149
+ style,
150
+ ("class:header", f"Resumed session: {total} turn(s)\n"),
151
+ )
152
+ for i, (u, a) in enumerate(turns, start):
153
+ _fmt(
154
+ style,
155
+ ("class:resume.turn", f"[{i}] "),
156
+ ("class:resume.user", f">>> {u}\n"),
157
+ )
158
+ if a:
159
+ preview = _extract_preview(a, 180)
160
+ extra = ""
161
+ if len(a) > len(preview):
162
+ extra = f" ({len(a)} chars)"
163
+ _fmt(
164
+ style,
165
+ ("class:resume.turn", f"[{i}] "),
166
+ ("class:resume.assistant", f" {preview}{extra}\n"),
167
+ )
168
+ print()
169
+
170
+
171
+ def _bottom_toolbar(session: ChatSession) -> FormattedText:
172
+ return FormattedText(
173
+ [
174
+ ("class:toolbar", " session "),
175
+ ("class:toolbar.session", session.id),
176
+ (
177
+ "class:toolbar",
178
+ f" {session.turn_count} turn(s) {session.model} ",
179
+ ),
180
+ ]
181
+ )
182
+
183
+
184
+ def _make_prompt_session(session: ChatSession, style: Style, use_color: bool) -> PromptSession:
185
+ return PromptSession(
186
+ message=FormattedText([("class:prompt", ">>> ")]),
187
+ style=style,
188
+ bottom_toolbar=(lambda: _bottom_toolbar(session)) if use_color else None,
189
+ )
190
+
191
+
192
+ async def _run_turn(agent: Any, session: ChatSession, user_input: str, style: Style) -> None:
193
+ """Run one agent turn with streaming output and persist the new history."""
194
+ from agents import (
195
+ RawResponsesStreamEvent,
196
+ RunItemStreamEvent,
197
+ Runner,
198
+ )
199
+ from openai.types.responses import ResponseTextDeltaEvent
200
+
201
+ new_input = session.history + [{"role": "user", "content": user_input}]
202
+
203
+ result = Runner.run_streamed(agent, new_input, max_turns=MAX_TURNS)
204
+
205
+ sys.stdout.write("\n")
206
+ sys.stdout.flush()
207
+ collected: list[str] = []
208
+ last_was_text = False
209
+ need_blank_before_text = False
210
+ try:
211
+ async for event in result.stream_events():
212
+ if isinstance(event, RawResponsesStreamEvent):
213
+ if isinstance(event.data, ResponseTextDeltaEvent):
214
+ text = event.data.delta
215
+ if text:
216
+ if need_blank_before_text:
217
+ sys.stdout.write("\n")
218
+ need_blank_before_text = False
219
+ sys.stdout.write(text)
220
+ sys.stdout.flush()
221
+ collected.append(text)
222
+ last_was_text = True
223
+ elif isinstance(event, RunItemStreamEvent):
224
+ item = event.item
225
+ if item.type == "tool_call_item":
226
+ if last_was_text:
227
+ sys.stdout.write("\n")
228
+ sys.stdout.flush()
229
+ last_was_text = False
230
+ raw = item.raw_item
231
+ name = getattr(raw, "name", "?")
232
+ args = getattr(raw, "arguments", "") or ""
233
+ _fmt(style, ("class:tool", _format_tool_line(name, args) + "\n"))
234
+ need_blank_before_text = True
235
+ finally:
236
+ sys.stdout.write("\n\n")
237
+ sys.stdout.flush()
238
+
239
+ answer = "".join(collected).strip()
240
+ if not answer:
241
+ answer = (result.final_output or "").strip()
242
+ session.record_turn(user_input, answer, result.to_input_list())
243
+
244
+
245
+ def _save_transcript(kb_dir: Path, session: ChatSession, name: str | None) -> Path:
246
+ explore_dir = kb_dir / "wiki" / "explorations"
247
+ explore_dir.mkdir(parents=True, exist_ok=True)
248
+
249
+ base = name or session.title or (session.user_turns[0] if session.user_turns else session.id)
250
+ slug = re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")[:60] or session.id
251
+ date = session.created_at[:10].replace("-", "")
252
+ path = explore_dir / f"{slug}-{date}.md"
253
+
254
+ lines: list[str] = [
255
+ "---",
256
+ f'session: "{session.id}"',
257
+ f'model: "{session.model}"',
258
+ f'created: "{session.created_at}"',
259
+ "---",
260
+ "",
261
+ f"# Chat transcript: {session.title or session.id}",
262
+ "",
263
+ ]
264
+ for i, (u, a) in enumerate(zip(session.user_turns, session.assistant_texts), 1):
265
+ lines.append(f"## [{i}] {u}")
266
+ lines.append("")
267
+ lines.append(a or "_(no response recorded)_")
268
+ lines.append("")
269
+
270
+ path.write_text("\n".join(lines), encoding="utf-8")
271
+ return path
272
+
273
+
274
+ async def _handle_slash(
275
+ cmd: str,
276
+ kb_dir: Path,
277
+ session: ChatSession,
278
+ style: Style,
279
+ ) -> str | None:
280
+ """Return ``"exit"`` to end the REPL, ``"new_session"`` to swap sessions,
281
+ or ``None`` to continue with the current session."""
282
+ parts = cmd.split(maxsplit=1)
283
+ head = parts[0].lower()
284
+ arg = parts[1].strip() if len(parts) > 1 else ""
285
+
286
+ if head in ("/exit", "/quit"):
287
+ _fmt(style, ("class:header", "Bye. Thanks for using OpenKB.\n\n"))
288
+ return "exit"
289
+
290
+ if head == "/help":
291
+ _fmt(style, ("class:slash.help", _HELP_TEXT + "\n"))
292
+ return None
293
+
294
+ if head == "/clear":
295
+ old_id = session.id
296
+ _fmt(
297
+ style,
298
+ ("class:slash.ok", f"Started new session (previous: {old_id})\n"),
299
+ )
300
+ return "new_session"
301
+
302
+ if head == "/save":
303
+ if not session.user_turns:
304
+ _fmt(style, ("class:error", "Nothing to save yet.\n"))
305
+ return None
306
+ path = _save_transcript(kb_dir, session, arg or None)
307
+ _fmt(style, ("class:slash.ok", f"Saved to {path}\n"))
308
+ return None
309
+
310
+ _fmt(
311
+ style,
312
+ ("class:error", f"Unknown command: {head}. Try /help.\n"),
313
+ )
314
+ return None
315
+
316
+
317
+ async def run_chat(
318
+ kb_dir: Path,
319
+ session: ChatSession,
320
+ *,
321
+ no_color: bool = False,
322
+ ) -> None:
323
+ """Run the chat REPL against ``session`` until the user exits."""
324
+ from openkb.config import load_config
325
+
326
+ use_color = _use_color(force_off=no_color)
327
+ style = _build_style(use_color)
328
+
329
+ config = load_config(kb_dir / ".openkb" / "config.yaml")
330
+ language = session.language or config.get("language", "en")
331
+ wiki_root = str(kb_dir / "wiki")
332
+ agent = build_query_agent(wiki_root, session.model, language=language)
333
+
334
+ _print_header(session, kb_dir, style)
335
+ if session.turn_count > 0:
336
+ _print_resume_view(session, style)
337
+
338
+ prompt_session = _make_prompt_session(session, style, use_color)
339
+
340
+ last_sigint = 0.0
341
+
342
+ while True:
343
+ try:
344
+ user_input = await prompt_session.prompt_async()
345
+ last_sigint = 0.0
346
+ except KeyboardInterrupt:
347
+ now = time.monotonic()
348
+ if last_sigint and (now - last_sigint) < _SIGINT_EXIT_WINDOW:
349
+ _fmt(style, ("class:header", "\nBye. Thanks for using OpenKB.\n\n"))
350
+ return
351
+ last_sigint = now
352
+ _fmt(style, ("class:header", "\n(Press Ctrl-C again to exit)\n"))
353
+ continue
354
+ except EOFError:
355
+ _fmt(style, ("class:header", "Bye. Thanks for using OpenKB.\n\n"))
356
+ return
357
+
358
+ user_input = (user_input or "").strip()
359
+ if not user_input:
360
+ continue
361
+
362
+ if user_input.startswith("/"):
363
+ action = await _handle_slash(user_input, kb_dir, session, style)
364
+ if action == "exit":
365
+ return
366
+ if action == "new_session":
367
+ session = ChatSession.new(kb_dir, session.model, session.language)
368
+ agent = build_query_agent(wiki_root, session.model, language=language)
369
+ prompt_session = _make_prompt_session(session, style, use_color)
370
+ continue
371
+
372
+ append_log(kb_dir / "wiki", "query", user_input)
373
+ try:
374
+ await _run_turn(agent, session, user_input, style)
375
+ except KeyboardInterrupt:
376
+ _fmt(style, ("class:error", "\n[aborted]\n"))
377
+ except Exception as exc:
378
+ _fmt(style, ("class:error", f"[ERROR] {exc}\n"))
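`run_chat` treats Ctrl-C at the prompt as a two-step exit: the first press arms a timer and prints a hint, a second press within `_SIGINT_EXIT_WINDOW` (2.0 seconds) ends the REPL, and any successfully read line disarms it. That decision can be isolated into a small class for testing. The `ExitGate` name, its methods, and the injectable clock are hypothetical, not part of OpenKB:

```python
import time
from typing import Callable

SIGINT_EXIT_WINDOW = 2.0  # seconds; same value as _SIGINT_EXIT_WINDOW in chat.py


class ExitGate:
    """Tracks Ctrl-C presses and decides when a second press means 'exit'."""

    def __init__(self, window: float = SIGINT_EXIT_WINDOW,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.clock = clock
        self.last_sigint = 0.0  # 0.0 means "not armed", as in run_chat

    def on_input(self) -> None:
        # A successfully read line disarms the gate.
        self.last_sigint = 0.0

    def on_sigint(self) -> bool:
        """Return True if this Ctrl-C should end the REPL."""
        now = self.clock()
        if self.last_sigint and (now - self.last_sigint) < self.window:
            return True
        # First press, or the previous one is too old: (re)arm and keep going.
        self.last_sigint = now
        return False
```

Injecting the clock keeps the logic deterministic under test. Like the original loop, it uses `0.0` as the "not armed" sentinel, so a fake clock should not start at exactly `0.0`.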
@@ -0,0 +1,280 @@
1
+ """Chat session persistence for `openkb chat`.
2
+
3
+ Each session lives in ``<kb>/.openkb/chats/<id>.json`` and stores a sanitized
4
+ agent-SDK history (from ``RunResult.to_input_list()``) alongside the user
5
+ messages and full assistant replies kept as plain strings for display and
6
+ export. Large tool-returned image payloads are replaced with lightweight
7
+ references before the history is reused or persisted.
8
+ """
9
+ from __future__ import annotations
10
+
11
+ import json
12
+ import os
13
+ import random
14
+ import string
15
+ from dataclasses import dataclass
16
+ from datetime import datetime, timezone
17
+ from pathlib import Path
18
+ from typing import Any
19
+
20
+
21
+ _IMAGE_HISTORY_NOTE = (
22
+ "Image output omitted from chat history to avoid persisting raw data URLs."
23
+ )
24
+
25
+
26
+ def _utcnow_iso() -> str:
27
+ return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
28
+
29
+
30
+ def _gen_id() -> str:
31
+ ts = datetime.now().strftime("%Y%m%d-%H%M%S")
32
+ rand = "".join(random.choices(string.ascii_lowercase + string.digits, k=3))
33
+ return f"{ts}-{rand}"
34
+
35
+
36
+ def chats_dir(kb_dir: Path) -> Path:
37
+ return kb_dir / ".openkb" / "chats"
38
+
39
+
40
+ def _title_from(msg: str, limit: int = 60) -> str:
41
+ msg = " ".join(msg.strip().split())
42
+ if len(msg) <= limit:
43
+ return msg
44
+ return msg[: limit - 1] + "\u2026"
45
+
46
+
47
+ def _image_history_placeholder(image_path: str | None) -> dict[str, str]:
48
+ text = _IMAGE_HISTORY_NOTE
49
+ if image_path:
50
+ text += f" Source path: {image_path}."
51
+ text += " Call get_image again if you need to inspect it."
52
+ return {"type": "input_text", "text": text}
53
+
54
+
55
+ def _extract_get_image_path(item: dict[str, Any]) -> str | None:
56
+ if item.get("type") != "function_call" or item.get("name") != "get_image":
57
+ return None
58
+ arguments = item.get("arguments")
59
+ if not isinstance(arguments, str):
60
+ return None
61
+ try:
62
+ payload = json.loads(arguments)
63
+ except json.JSONDecodeError:
64
+ return None
65
+ image_path = payload.get("image_path")
66
+ if isinstance(image_path, str) and image_path:
67
+ return image_path
68
+ return None
69
+
70
+
71
+ def _sanitize_history_value(value: Any, image_path: str | None = None) -> Any:
72
+ if isinstance(value, list):
73
+ return [_sanitize_history_value(item, image_path) for item in value]
74
+ if not isinstance(value, dict):
75
+ return value
76
+
77
+ if value.get("type") == "input_image":
78
+ image_url = value.get("image_url")
79
+ if isinstance(image_url, str) and image_url.startswith("data:"):
80
+ return _image_history_placeholder(image_path)
81
+
82
+ return {
83
+ key: _sanitize_history_value(item, image_path)
84
+ for key, item in value.items()
85
+ }
86
+
87
+
88
+ def sanitize_history(history: list[dict[str, Any]]) -> list[dict[str, Any]]:
89
+ """Strip large image payloads from model history while keeping a re-fetch hint."""
90
+ image_paths_by_call_id: dict[str, str] = {}
91
+ sanitized: list[dict[str, Any]] = []
92
+
93
+ for item in history:
94
+ if not isinstance(item, dict):
95
+ sanitized.append(item)
96
+ continue
97
+
98
+ image_path = _extract_get_image_path(item)
99
+ call_id = item.get("call_id")
100
+ if image_path and isinstance(call_id, str):
101
+ image_paths_by_call_id[call_id] = image_path
102
+
103
+ history_image_path = None
104
+ if item.get("type") == "function_call_output" and isinstance(call_id, str):
105
+ history_image_path = image_paths_by_call_id.get(call_id)
106
+
107
+ sanitized.append(_sanitize_history_value(item, history_image_path))
108
+
109
+ return sanitized
110
+
111
+
112
+ @dataclass
113
+ class ChatSession:
114
+ id: str
115
+ created_at: str
116
+ updated_at: str
117
+ model: str
118
+ language: str
119
+ title: str
120
+ turn_count: int
121
+ history: list[dict[str, Any]]
122
+ user_turns: list[str]
123
+ assistant_texts: list[str]
124
+ path: Path
125
+
126
+ @classmethod
127
+ def new(cls, kb_dir: Path, model: str, language: str) -> "ChatSession":
128
+ now = _utcnow_iso()
129
+ sid = _gen_id()
130
+ return cls(
131
+ id=sid,
132
+ created_at=now,
133
+ updated_at=now,
134
+ model=model,
135
+ language=language,
136
+ title="",
137
+ turn_count=0,
138
+ history=[],
139
+ user_turns=[],
140
+ assistant_texts=[],
141
+ path=chats_dir(kb_dir) / f"{sid}.json",
142
+ )
143
+
144
+ def to_dict(self) -> dict[str, Any]:
145
+ return {
146
+ "id": self.id,
147
+ "created_at": self.created_at,
148
+ "updated_at": self.updated_at,
149
+ "model": self.model,
150
+ "language": self.language,
151
+ "title": self.title,
152
+ "turn_count": self.turn_count,
153
+ "history": self.history,
154
+ "user_turns": self.user_turns,
155
+ "assistant_texts": self.assistant_texts,
156
+ }
157
+
158
+ def save(self) -> None:
159
+ self.path.parent.mkdir(parents=True, exist_ok=True)
160
+ tmp = self.path.with_suffix(".json.tmp")
161
+ tmp.write_text(
162
+ json.dumps(self.to_dict(), ensure_ascii=False, indent=2, default=str),
163
+ encoding="utf-8",
164
+ )
165
+ os.replace(tmp, self.path)
166
+
167
+ def record_turn(
168
+ self,
169
+ user_message: str,
170
+ assistant_text: str,
171
+ new_history: list[dict[str, Any]],
172
+ ) -> None:
173
+ self.history = sanitize_history(new_history)
174
+ self.user_turns.append(user_message)
175
+ self.assistant_texts.append(assistant_text)
176
+ self.turn_count = len(self.user_turns)
177
+ if not self.title:
178
+ self.title = _title_from(user_message)
179
+ self.updated_at = _utcnow_iso()
180
+ self.save()
181
+
182
+
183
+ def load_session(kb_dir: Path, session_id: str) -> ChatSession:
184
+ path = chats_dir(kb_dir) / f"{session_id}.json"
185
+ data = json.loads(path.read_text(encoding="utf-8"))
186
+ return ChatSession(
187
+ id=data["id"],
188
+ created_at=data["created_at"],
189
+ updated_at=data["updated_at"],
190
+ model=data["model"],
191
+ language=data.get("language", "en"),
192
+ title=data.get("title", ""),
193
+ turn_count=data.get("turn_count", 0),
194
+ history=sanitize_history(data.get("history", [])),
195
+ user_turns=data.get("user_turns", []),
196
+ assistant_texts=data.get("assistant_texts", []),
197
+ path=path,
198
+ )
199
+
200
+
201
+ def list_sessions(kb_dir: Path) -> list[dict[str, Any]]:
202
+ """Return session metadata dicts, most recently updated first."""
203
+ d = chats_dir(kb_dir)
204
+ if not d.exists():
205
+ return []
206
+ out: list[dict[str, Any]] = []
207
+ for p in d.glob("*.json"):
208
+ try:
209
+ data = json.loads(p.read_text(encoding="utf-8"))
210
+ except (json.JSONDecodeError, OSError):
211
+ continue
212
+ out.append(
213
+ {
214
+ "id": data.get("id", p.stem),
215
+ "title": data.get("title", ""),
216
+ "turn_count": data.get("turn_count", 0),
217
+ "updated_at": data.get("updated_at", ""),
218
+ "model": data.get("model", ""),
219
+ }
220
+ )
221
+ out.sort(key=lambda s: (s["updated_at"], s["id"]), reverse=True)
222
+ return out
223
+
224
+
225
+ def resolve_session_id(kb_dir: Path, query: str) -> str | None:
226
+ """Resolve a query to a full session id.
227
+
228
+ ``query`` may be:
229
+ - ``"__latest__"`` — returns the most recently updated session id.
230
+ - A full session id — returned as-is if it exists.
231
+ - A unique prefix of a session id — expanded to the full id.
232
+
233
+ Returns ``None`` if no session matches. Raises ``ValueError`` when a
234
+ prefix is ambiguous.
235
+ """
236
+ sessions = list_sessions(kb_dir)
237
+ if not sessions:
238
+ return None
239
+ if query == "__latest__":
240
+ return sessions[0]["id"]
241
+ for s in sessions:
242
+ if s["id"] == query:
243
+ return s["id"]
244
+ matches = [s["id"] for s in sessions if s["id"].startswith(query)]
245
+ if len(matches) == 1:
246
+ return matches[0]
247
+ if len(matches) > 1:
248
+ raise ValueError(
249
+ f"Ambiguous session prefix '{query}' matches: {', '.join(matches)}"
250
+ )
251
+ return None
252
+
253
+
254
+ def delete_session(kb_dir: Path, session_id: str) -> bool:
255
+ path = chats_dir(kb_dir) / f"{session_id}.json"
256
+ if path.exists():
257
+ path.unlink()
258
+ return True
259
+ return False
260
+
261
+
262
+ def relative_time(iso_str: str) -> str:
263
+ """Render an ISO-8601 timestamp as a short relative string."""
264
+ try:
265
+ t = datetime.strptime(iso_str, "%Y-%m-%dT%H:%M:%SZ").replace(
266
+ tzinfo=timezone.utc
267
+ )
268
+ except (ValueError, TypeError):
269
+ return iso_str or ""
270
+ now = datetime.now(timezone.utc)
271
+ seconds = int((now - t).total_seconds())
272
+ if seconds < 60:
273
+ return "just now"
274
+ if seconds < 3600:
275
+ return f"{seconds // 60}m ago"
276
+ if seconds < 86400:
277
+ return f"{seconds // 3600}h ago"
278
+ if seconds < 86400 * 7:
279
+ return f"{seconds // 86400}d ago"
280
+ return t.strftime("%Y-%m-%d")
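`sanitize_history` pairs each `get_image` function call with its output through `call_id`, then swaps any `input_image` part that carries a `data:` URL for a text note naming the source path. A condensed standalone sketch of that pairing; `NOTE` is a shortened stand-in for `_IMAGE_HISTORY_NOTE`, and the toy history assumes the `function_call` / `function_call_output` dict shapes the original code checks for:

```python
from __future__ import annotations

import json
from typing import Any

NOTE = "Image output omitted from chat history."  # shortened stand-in


def _scrub(value: Any, image_path: str | None) -> Any:
    """Recursively replace inline data-URL images with a text placeholder."""
    if isinstance(value, list):
        return [_scrub(v, image_path) for v in value]
    if not isinstance(value, dict):
        return value
    url = value.get("image_url")
    if value.get("type") == "input_image" and isinstance(url, str) and url.startswith("data:"):
        hint = f" Source path: {image_path}." if image_path else ""
        return {"type": "input_text", "text": NOTE + hint}
    return {k: _scrub(v, image_path) for k, v in value.items()}


def _get_image_path(item: dict[str, Any]) -> str | None:
    """Return image_path from a get_image function call, if there is one."""
    if item.get("type") != "function_call" or item.get("name") != "get_image":
        return None
    args = item.get("arguments")
    if not isinstance(args, str):
        return None
    try:
        payload = json.loads(args)
    except json.JSONDecodeError:
        return None
    path = payload.get("image_path") if isinstance(payload, dict) else None
    return path if isinstance(path, str) and path else None


def sanitize(history: list[dict[str, Any]]) -> list[dict[str, Any]]:
    paths: dict[str, str] = {}  # call_id -> image_path, filled as calls are seen
    out: list[Any] = []
    for item in history:
        if not isinstance(item, dict):
            out.append(item)
            continue
        call_id = item.get("call_id")
        path = _get_image_path(item)
        if path and isinstance(call_id, str):
            paths[call_id] = path
        # Only tool outputs inherit the path recorded for their call_id.
        hint_path = None
        if item.get("type") == "function_call_output" and isinstance(call_id, str):
            hint_path = paths.get(call_id)
        out.append(_scrub(item, hint_path))
    return out


if __name__ == "__main__":
    history = [
        {"type": "function_call", "name": "get_image", "call_id": "c1",
         "arguments": json.dumps({"image_path": "wiki/images/fig1.png"})},
        {"type": "function_call_output", "call_id": "c1",
         "output": [{"type": "input_image", "image_url": "data:image/png;base64,AAAA"}]},
    ]
    print(json.dumps(sanitize(history)[1], indent=2))
```

Because `_scrub` builds new dicts rather than mutating, the caller's history list is left untouched; only the returned copy is persisted.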
@@ -30,7 +30,7 @@ logger = logging.getLogger(__name__)
30
30
  # ---------------------------------------------------------------------------
31
31
 
32
32
  _SYSTEM_TEMPLATE = """\
33
- You are a wiki compilation agent for a personal knowledge base.
33
+ You are OpenKB's wiki compilation agent for a personal knowledge base.
34
34
 
35
35
  {schema_md}
36
36
 
@@ -284,6 +284,57 @@ def _read_concept_briefs(wiki_dir: Path) -> str:
284
284
  return "\n".join(lines) or "(none yet)"
285
285
 
286
286
 
287
+ def _get_section_bounds(lines: list[str], heading: str) -> tuple[int, int] | None:
288
+ """Return the [start, end) bounds for a Markdown H2 section."""
289
+ for i, line in enumerate(lines):
290
+ if line == heading:
291
+ start = i + 1
292
+ end = len(lines)
293
+ for j in range(start, len(lines)):
294
+ if lines[j].startswith("## "):
295
+ end = j
296
+ break
297
+ return start, end
298
+ return None
299
+
300
+
301
+ def _section_contains_link(lines: list[str], heading: str, link: str) -> bool:
302
+ """Check whether an index entry already exists inside the named section."""
303
+ bounds = _get_section_bounds(lines, heading)
304
+ if bounds is None:
305
+ return False
306
+
307
+ start, end = bounds
308
+ entry_prefix = f"- {link}"
309
+ return any(line.startswith(entry_prefix) for line in lines[start:end])
310
+
311
+
312
+ def _replace_section_entry(lines: list[str], heading: str, link: str, entry: str) -> bool:
313
+ """Replace the first matching entry within a specific section."""
314
+ bounds = _get_section_bounds(lines, heading)
315
+ if bounds is None:
316
+ return False
317
+
318
+ start, end = bounds
319
+ entry_prefix = f"- {link}"
320
+ for i in range(start, end):
321
+ if lines[i].startswith(entry_prefix):
322
+ lines[i] = entry
323
+ return True
324
+ return False
325
+
326
+
327
+ def _insert_section_entry(lines: list[str], heading: str, entry: str) -> bool:
328
+ """Insert a new entry at the top of a specific section."""
329
+ bounds = _get_section_bounds(lines, heading)
330
+ if bounds is None:
331
+ return False
332
+
333
+ start, _ = bounds
334
+ lines.insert(start, entry)
335
+ return True
336
+
337
+
287
338
 
288
339
  def _write_summary(wiki_dir: Path, doc_name: str, summary: str,
289
340
  doc_type: str = "short") -> None:
@@ -460,7 +511,6 @@ def _backlink_concepts(wiki_dir: Path, doc_name: str, concept_slugs: list[str])
460
511
  text += f"\n\n## Related Documents\n- {link}\n"
461
512
  path.write_text(text, encoding="utf-8")
462
513
 
463
-
464
514
  def _update_index(
465
515
  wiki_dir: Path, doc_name: str, concept_names: list[str],
466
516
  doc_brief: str = "", concept_briefs: dict[str, str] | None = None,
@@ -469,8 +519,9 @@ def _update_index(
469
519
  """Append document and concept entries to index.md.
470
520
 
471
521
  When ``doc_brief`` or entries in ``concept_briefs`` are provided, entries
472
- are written as ``- [[link]] (type) — brief text``. Existing entries are
473
- detected by the link part only and skipped to avoid duplicates.
522
+ are written as ``- [[link]] (type) — brief text``. Existing entries are
523
+ detected within their own section by exact entry prefix and skipped to
524
+ avoid duplicates.
474
525
  ``doc_type`` is ``"short"`` or ``"pageindex"`` — shown in the entry so the
475
526
  query agent knows how to access detailed content.
476
527
  """
@@ -484,34 +535,27 @@ def _update_index(
484
535
  encoding="utf-8",
485
536
  )
486
537
 
487
- text = index_path.read_text(encoding="utf-8")
538
+ lines = index_path.read_text(encoding="utf-8").split("\n")
488
539
 
489
540
  doc_link = f"[[summaries/{doc_name}]]"
490
- if doc_link not in text:
541
+ if not _section_contains_link(lines, "## Documents", doc_link):
491
542
  doc_entry = f"- {doc_link} ({doc_type})"
492
543
  if doc_brief:
493
544
  doc_entry += f" — {doc_brief}"
494
- if "## Documents" in text:
495
- text = text.replace("## Documents\n", f"## Documents\n{doc_entry}\n", 1)
545
+ _insert_section_entry(lines, "## Documents", doc_entry)
496
546
 
497
547
  for name in concept_names:
498
548
  concept_link = f"[[concepts/{name}]]"
499
549
  concept_entry = f"- {concept_link}"
500
550
  if name in concept_briefs:
501
551
  concept_entry += f" — {concept_briefs[name]}"
502
- if concept_link in text:
552
+ if _section_contains_link(lines, "## Concepts", concept_link):
503
553
  if name in concept_briefs:
504
- lines = text.split("\n")
505
- for i, line in enumerate(lines):
506
- if concept_link in line:
507
- lines[i] = concept_entry
508
- break
509
- text = "\n".join(lines)
554
+ _replace_section_entry(lines, "## Concepts", concept_link, concept_entry)
510
555
  else:
511
- if "## Concepts" in text:
512
- text = text.replace("## Concepts\n", f"## Concepts\n{concept_entry}\n", 1)
556
+ _insert_section_entry(lines, "## Concepts", concept_entry)
513
557
 
514
- index_path.write_text(text, encoding="utf-8")
558
+ index_path.write_text("\n".join(lines), encoding="utf-8")
515
559
 
516
560
 
517
561
  # ---------------------------------------------------------------------------
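The new helpers call `_get_section_bounds`, whose body is not part of this diff. Below is a minimal sketch of the section-scoped check. It assumes a section runs from the line after its `## ` heading to the next `## ` heading or the end of the file. `get_section_bounds`, `section_contains_link`, and `add_concept` are hypothetical stand-ins, not OpenKB's code. The example shows the case the old whole-file `concept_link in text` test mishandled. A concept link mentioned in a document's brief made the old code either skip the concept entry or overwrite the document line with it.

```python
from __future__ import annotations


def get_section_bounds(lines: list[str], heading: str) -> tuple[int, int] | None:
    """Return (start, end) indices of the body under `heading`, or None.

    Hypothetical: the body begins on the line after the heading and stops at
    the next "## " heading or at the end of the file.
    """
    try:
        head = lines.index(heading)
    except ValueError:
        return None
    end = len(lines)
    for i in range(head + 1, len(lines)):
        if lines[i].startswith("## "):
            end = i
            break
    return head + 1, end


def section_contains_link(lines: list[str], heading: str, link: str) -> bool:
    """True only if an entry starting with "- {link}" exists inside that section."""
    bounds = get_section_bounds(lines, heading)
    if bounds is None:
        return False
    start, end = bounds
    return any(line.startswith(f"- {link}") for line in lines[start:end])


def add_concept(lines: list[str], link: str, entry: str) -> None:
    """Insert `entry` at the top of ## Concepts unless that section already lists `link`."""
    if section_contains_link(lines, "## Concepts", link):
        return
    bounds = get_section_bounds(lines, "## Concepts")
    if bounds is not None:
        lines.insert(bounds[0], entry)


index = [
    "# Index",
    "## Documents",
    "- [[summaries/paper]] (short) - extends [[concepts/attention]]",
    "## Concepts",
    "- [[concepts/rnn]]",
]
link = "[[concepts/attention]]"
print(link in "\n".join(index))  # True: a whole-file substring check would skip it
add_concept(index, link, f"- {link}")
print(index[3:])  # ['## Concepts', '- [[concepts/attention]]', '- [[concepts/rnn]]']
```

Calling `add_concept` a second time is a no-op, because the entry is now found inside `## Concepts` itself.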
@@ -11,7 +11,7 @@ MAX_TURNS = 50
11
11
  from openkb.schema import SCHEMA_MD, get_agents_md
12
12
 
13
13
  _LINTER_INSTRUCTIONS_TEMPLATE = """\
14
- You are a knowledge-base semantic lint agent. Your job is to audit the wiki
14
+ You are OpenKB's semantic lint agent. Your job is to audit the wiki
15
15
  for quality issues that structural tools cannot detect.
16
16
 
17
17
  {schema_md}
@@ -50,7 +50,7 @@ def build_lint_agent(wiki_root: str, model: str, language: str = "en") -> Agent:
50
50
  """
51
51
  schema_md = get_agents_md(Path(wiki_root))
52
52
  instructions = _LINTER_INSTRUCTIONS_TEMPLATE.format(schema_md=schema_md)
53
- instructions += f"\n\nIMPORTANT: Write all wiki content in {language} language."
53
+ instructions += f"\n\nIMPORTANT: Write the lint report in {language} language."
54
54
 
55
55
  @function_tool
56
56
  def list_files(directory: str) -> str:
@@ -6,13 +6,13 @@ from pathlib import Path
6
6
  from agents import Agent, Runner, function_tool
7
7
 
8
8
  from agents import ToolOutputImage, ToolOutputText
9
- from openkb.agent.tools import read_wiki_file, read_wiki_image
9
+ from openkb.agent.tools import get_wiki_page_content, read_wiki_file, read_wiki_image
10
10
 
11
11
  MAX_TURNS = 50
12
12
  from openkb.schema import get_agents_md
13
13
 
14
14
  _QUERY_INSTRUCTIONS_TEMPLATE = """\
15
- You are a knowledge-base Q&A agent. You answer questions by searching the wiki.
15
+ You are OpenKB, a knowledge-base Q&A agent. You answer questions by searching the wiki.
16
16
 
17
17
  {schema_md}
18
18
 
@@ -20,7 +20,8 @@ You are a knowledge-base Q&A agent. You answer questions by searching the wiki.
20
20
  1. Read index.md to see all documents and concepts with brief summaries.
21
21
  Each document is marked (short) or (pageindex) to indicate its type.
22
22
  2. Read relevant summary pages (summaries/) for document overviews.
23
- Note: summaries may omit details.
23
+ Summaries may omit details — if you need more, follow the summary's
24
+ `full_text` frontmatter field to the source (see step 4).
24
25
  3. Read concept pages (concepts/) for cross-document synthesis.
25
26
  4. When you need detailed source document content, each summary page has a
26
27
  `full_text` frontmatter field with the path to the original document content:
@@ -28,9 +29,8 @@ You are a knowledge-base Q&A agent. You answer questions by searching the wiki.
28
29
  - PageIndex documents (doc_type: pageindex): use get_page_content(doc_name, pages)
29
30
  with tight page ranges. The summary shows document tree structure with page
30
31
  ranges to help you target. Never fetch the whole document.
31
- 5. When source content references images (e.g. ![image](sources/images/doc/file.png)),
32
- use get_image to view them. Always view images when the question asks about
33
- a figure, chart, diagram, or visual content.
32
+ 5. Source content may reference images (e.g. ![image](sources/images/doc/file.png)).
33
+ Use the get_image tool to view them when needed.
34
34
  6. Synthesize a clear, concise, well-cited answer grounded in wiki content.
35
35
 
36
36
  Answer based only on wiki content. Be concise.
@@ -44,7 +44,7 @@ def build_query_agent(wiki_root: str, model: str, language: str = "en") -> Agent
44
44
  """Build and return the Q&A agent."""
45
45
  schema_md = get_agents_md(Path(wiki_root))
46
46
  instructions = _QUERY_INSTRUCTIONS_TEMPLATE.format(schema_md=schema_md)
47
- instructions += f"\n\nIMPORTANT: Write all wiki content in {language} language."
47
+ instructions += f"\n\nIMPORTANT: Answer in {language} language."
48
48
 
49
49
  @function_tool
50
50
  def read_file(path: str) -> str:
@@ -55,7 +55,7 @@ def build_query_agent(wiki_root: str, model: str, language: str = "en") -> Agent
55
55
  return read_wiki_file(path, wiki_root)
56
56
 
57
57
  @function_tool
58
- def get_page_content_tool(doc_name: str, pages: str) -> str:
58
+ def get_page_content(doc_name: str, pages: str) -> str:
59
59
  """Get text content of specific pages from a PageIndex (long) document.
60
60
  Only use for documents with doc_type: pageindex. For short documents,
61
61
  use read_file instead.
@@ -63,13 +63,15 @@ def build_query_agent(wiki_root: str, model: str, language: str = "en") -> Agent
63
63
  doc_name: Document name (e.g. 'attention-is-all-you-need').
64
64
  pages: Page specification (e.g. '3-5,7,10-12').
65
65
  """
66
- from openkb.agent.tools import get_page_content
67
- return get_page_content(doc_name, pages, wiki_root)
66
+ return get_wiki_page_content(doc_name, pages, wiki_root)
68
67
 
69
68
  @function_tool
70
69
  def get_image(image_path: str) -> ToolOutputImage | ToolOutputText:
71
70
  """View an image from the wiki.
72
- Use when source content references images you need to see.
71
+
72
+ Use when a question asks about a specific figure, chart, or diagram
73
+ you'd need to see to answer accurately.
74
+
73
75
  Args:
74
76
  image_path: Image path relative to wiki root (e.g. 'sources/images/doc/p1_img1.png').
75
77
  """
@@ -83,7 +85,7 @@ def build_query_agent(wiki_root: str, model: str, language: str = "en") -> Agent
83
85
  return Agent(
84
86
  name="wiki-query",
85
87
  instructions=instructions,
86
- tools=[read_file, get_page_content_tool, get_image],
88
+ tools=[read_file, get_page_content, get_image],
87
89
  model=f"litellm/{model}",
88
90
  model_settings=ModelSettings(parallel_tool_calls=False),
89
91
  )
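The `get_page_content` tool documents its `pages` argument as a spec like `'3-5,7,10-12'`. The `parse_pages` hunk that follows ends by returning `sorted(n for n in result if n > 0)`. Here is a minimal sketch of a parser with that contract. It assumes comma-separated single pages plus inclusive `a-b` ranges. The whitespace and empty-part handling are assumptions, since the real `parse_pages` body is not shown in this diff.

```python
from __future__ import annotations


def parse_pages(pages: str) -> list[int]:
    """Expand a spec like '3-5,7,10-12' into sorted, de-duplicated page numbers."""
    result: set[int] = set()
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '3,,5'
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            result.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            result.add(int(part))
    # Same final step as the diff: drop non-positive numbers, then sort.
    return sorted(n for n in result if n > 0)


print(parse_pages("3-5,7,10-12"))  # [3, 4, 5, 7, 10, 11, 12]
```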
@@ -89,7 +89,7 @@ def parse_pages(pages: str) -> list[int]:
89
89
  return sorted(n for n in result if n > 0)
90
90
 
91
91
 
92
- def get_page_content(doc_name: str, pages: str, wiki_root: str) -> str:
92
+ def get_wiki_page_content(doc_name: str, pages: str, wiki_root: str) -> str:
93
93
  """Return formatted content for specified pages of a document.
94
94
 
95
95
  Reads ``{wiki_root}/sources/{doc_name}.json`` which must be a JSON array of
@@ -1,6 +1,12 @@
1
1
  """OpenKB CLI — command-line interface for the knowledge base workflow."""
2
2
  from __future__ import annotations
3
3
 
4
+ # Silence import-time warnings (e.g. pydub's missing-ffmpeg warning emitted
5
+ # when markitdown pulls it in). markitdown later clobbers the filters during
6
+ # its own import, so we re-apply after all imports below.
7
+ import warnings
8
+ warnings.filterwarnings("ignore")
9
+
4
10
  import asyncio
5
11
  import json
6
12
  import logging
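The comment above explains why the CLI calls `warnings.filterwarnings("ignore")` both before and after its imports. A library that resets the filter list while importing undoes the first call. The sketch below reproduces that effect with the standard `warnings` module. `clobbering_import` and `noisy_call` are hypothetical stand-ins for markitdown's import and pydub's missing-ffmpeg warning, not their real code.

```python
import warnings


def noisy_call() -> None:
    # Stand-in for pydub's import-time warning about a missing ffmpeg.
    warnings.warn("ffmpeg not found (stand-in)", RuntimeWarning)


def clobbering_import() -> None:
    # Stand-in for a library that wipes all warning filters during import.
    warnings.resetwarnings()


def count_warnings(reapply: bool) -> int:
    """Count warnings that escape a blanket "ignore" set before the import."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.filterwarnings("ignore")  # silence import-time noise
        clobbering_import()  # ...which this import undoes
        if reapply:
            warnings.filterwarnings("ignore")  # re-apply after all imports
        noisy_call()
    return len(caught)


print(count_warnings(reapply=False))  # 1
print(count_warnings(reapply=True))  # 0
```

`catch_warnings` restores the previous global filters on exit, so the experiment does not leak into the rest of the process.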
@@ -256,22 +262,23 @@ def init():
256
262
  return
257
263
 
258
264
  # Interactive prompts
265
+ click.echo("Pick an LLM in `provider/model` LiteLLM format:")
266
+ click.echo(" OpenAI: gpt-5.4-mini, gpt-5.4")
267
+ click.echo(" Anthropic: anthropic/claude-sonnet-4-6, anthropic/claude-opus-4-6")
268
+ click.echo(" Gemini: gemini/gemini-3.1-pro-preview, gemini/gemini-3-flash-preview")
269
+ click.echo(" Others: see https://docs.litellm.ai/docs/providers")
270
+ click.echo()
259
271
  model = click.prompt(
260
- f"Model (e.g. gpt-5.4-mini, anthropic/claude-sonnet-4-6) [default: {DEFAULT_CONFIG['model']}]",
272
+ f"Model (enter for default {DEFAULT_CONFIG['model']})",
261
273
  default=DEFAULT_CONFIG["model"],
262
274
  show_default=False,
263
275
  )
264
- language = click.prompt(
265
- f"Language [default: {DEFAULT_CONFIG['language']}]",
266
- default=DEFAULT_CONFIG["language"],
276
+ api_key = click.prompt(
277
+ "LLM API Key (saved to .env, enter to skip)",
278
+ default="",
279
+ hide_input=True,
267
280
  show_default=False,
268
- )
269
- pageindex_threshold = click.prompt(
270
- f"PageIndex threshold (pages) [default: {DEFAULT_CONFIG['pageindex_threshold']}]",
271
- default=DEFAULT_CONFIG["pageindex_threshold"],
272
- type=int,
273
- show_default=False,
274
- )
281
+ ).strip()
275
282
  # Create directory structure
276
283
  Path("raw").mkdir(exist_ok=True)
277
284
  Path("wiki/sources/images").mkdir(parents=True, exist_ok=True)
@@ -290,12 +297,22 @@ def init():
290
297
  openkb_dir.mkdir()
291
298
  config = {
292
299
  "model": model,
293
- "language": language,
294
- "pageindex_threshold": pageindex_threshold,
300
+ "language": DEFAULT_CONFIG["language"],
301
+ "pageindex_threshold": DEFAULT_CONFIG["pageindex_threshold"],
295
302
  }
296
303
  save_config(openkb_dir / "config.yaml", config)
297
304
  (openkb_dir / "hashes.json").write_text(json.dumps({}), encoding="utf-8")
298
305
 
306
+ # Write API key to KB-local .env (0600) if the user provided one
307
+ if api_key:
308
+ env_path = Path(".env")
309
+ if env_path.exists():
310
+ click.echo(".env already exists, skipping write. Add LLM_API_KEY manually if needed.")
311
+ else:
312
+ env_path.write_text(f"LLM_API_KEY={api_key}\n", encoding="utf-8")
313
+ os.chmod(env_path, 0o600)
314
+ click.echo("Saved LLM API key to .env.")
315
+
299
316
  # Register this KB in the global config
300
317
  register_kb(Path.cwd())
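The new `init` step checks `env_path.exists()`, writes the key with `write_text`, and only then calls `os.chmod(env_path, 0o600)`. Until that call, the file carries its default umask-derived mode. One possible tightening creates the file with `O_CREAT | O_EXCL` and mode `0o600` in a single `os.open` call. That also folds the existence check into the create. This is offered as a suggestion, not what OpenKB ships, and `write_env_key` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def write_env_key(env_path: Path, api_key: str) -> bool:
    """Create env_path containing LLM_API_KEY, readable only by the owner.

    Returns False, leaving the file untouched, if it already exists.
    """
    try:
        # O_EXCL fails if the file exists, so check and create cannot race.
        # Mode 0o600 applies at creation; the umask can only remove bits.
        fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"LLM_API_KEY={api_key}\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".env"
    print(write_env_key(path, "sk-test"))  # True
    print(write_env_key(path, "sk-other"))  # False: existing .env is kept
```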
301
318
 
@@ -378,6 +395,107 @@ def query(ctx, question, save):
378
395
  click.echo(f"\nSaved to {explore_path}")
379
396
 
380
397
 
398
+ @cli.command()
399
+ @click.option(
400
+ "--resume", "-r", "resume",
401
+ is_flag=False, flag_value="__latest__", default=None, metavar="[ID]",
402
+ help="Resume the latest chat session, or a specific one by id or prefix.",
403
+ )
404
+ @click.option(
405
+ "--list", "list_sessions_flag",
406
+ is_flag=True, default=False,
407
+ help="List chat sessions.",
408
+ )
409
+ @click.option(
410
+ "--delete", "delete_id",
411
+ default=None, metavar="ID",
412
+ help="Delete a chat session by id or prefix.",
413
+ )
414
+ @click.option(
415
+ "--no-color", "no_color",
416
+ is_flag=True, default=False,
417
+ help="Disable colored output.",
418
+ )
419
+ @click.pass_context
420
+ def chat(ctx, resume, list_sessions_flag, delete_id, no_color):
421
+ """Start an interactive chat with the knowledge base."""
422
+ kb_dir = _find_kb_dir(ctx.obj.get("kb_dir_override"))
423
+ if kb_dir is None:
424
+ click.echo("No knowledge base found. Run `openkb init` first.")
425
+ return
426
+
427
+ from openkb.agent.chat_session import (
428
+ ChatSession,
429
+ delete_session,
430
+ list_sessions,
431
+ load_session,
432
+ relative_time,
433
+ resolve_session_id,
434
+ )
435
+
436
+ if list_sessions_flag:
437
+ sessions = list_sessions(kb_dir)
438
+ if not sessions:
439
+ click.echo("No chat sessions yet.")
440
+ return
441
+ click.echo(f" {'ID':<22} {'TURNS':<6} {'UPDATED':<12} TITLE")
442
+ click.echo(f" {'-'*22} {'-'*6} {'-'*12} {'-'*30}")
443
+ for s in sessions:
444
+ rel = relative_time(s.get("updated_at", ""))
445
+ title = s.get("title") or "(empty)"
446
+ click.echo(
447
+ f" {s['id']:<22} {s['turn_count']:<6} {rel:<12} {title}"
448
+ )
449
+ click.echo(
450
+ f"\n{len(sessions)} session(s) in {kb_dir / '.openkb' / 'chats'}"
451
+ )
452
+ return
453
+
454
+ if delete_id is not None:
455
+ try:
456
+ resolved = resolve_session_id(kb_dir, delete_id)
457
+ except ValueError as exc:
458
+ click.echo(f"[ERROR] {exc}")
459
+ return
460
+ if not resolved:
461
+ click.echo(f"No matching session: {delete_id}")
462
+ return
463
+ if delete_session(kb_dir, resolved):
464
+ click.echo(f"Deleted session {resolved}")
465
+ else:
466
+ click.echo(f"Could not delete session: {resolved}")
467
+ return
468
+
469
+ openkb_dir = kb_dir / ".openkb"
470
+ config = load_config(openkb_dir / "config.yaml")
471
+ _setup_llm_key(kb_dir)
472
+
473
+ if resume is not None:
474
+ try:
475
+ resolved = resolve_session_id(kb_dir, resume)
476
+ except ValueError as exc:
477
+ click.echo(f"[ERROR] {exc}")
478
+ return
479
+ if not resolved:
480
+ if resume == "__latest__":
481
+ click.echo("No previous chat sessions to resume.")
482
+ else:
483
+ click.echo(f"No matching session: {resume}")
484
+ return
485
+ session = load_session(kb_dir, resolved)
486
+ else:
487
+ model: str = config.get("model", DEFAULT_CONFIG["model"])
488
+ language: str = config.get("language", "en")
489
+ session = ChatSession.new(kb_dir, model, language)
490
+
491
+ from openkb.agent.chat import run_chat
492
+
493
+ try:
494
+ asyncio.run(run_chat(kb_dir, session, no_color=no_color))
495
+ except Exception as exc:
496
+ click.echo(f"[ERROR] Chat failed: {exc}")
497
+
498
+
381
499
  @cli.command()
382
500
  @click.pass_context
383
501
  def watch(ctx):
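The `chat` command treats a `ValueError` from `resolve_session_id` as an error message and a falsy result as "no match". It also passes the sentinel `"__latest__"`, which `--resume` yields when given without an ID. The `chat_session` module is not in this diff, so the following is only a sketch of plausible id-or-prefix resolution. It assumes sessions are known as `(id, updated_at)` pairs with same-format UTC ISO-8601 timestamps, and that an ambiguous prefix is the `ValueError` case. This `resolve_session_id` is a hypothetical reimplementation with a different signature, and the session ids are made up.

```python
from __future__ import annotations

LATEST = "__latest__"  # what click passes when --resume is given without an ID


def resolve_session_id(sessions: list[tuple[str, str]], query: str) -> str | None:
    """Map an exact id, a unique id prefix, or LATEST to a session id.

    Same-format UTC ISO-8601 timestamps sort chronologically as strings.
    Returns None when nothing matches; raises ValueError when a prefix is
    shared by more than one session.
    """
    if not sessions:
        return None
    if query == LATEST:
        return max(sessions, key=lambda s: s[1])[0]
    ids = [sid for sid, _ in sessions]
    if query in ids:
        return query  # an exact match wins even if it prefixes other ids
    matches = [sid for sid in ids if sid.startswith(query)]
    if len(matches) > 1:
        raise ValueError(f"Ambiguous session prefix {query!r}: {', '.join(matches)}")
    return matches[0] if matches else None


sessions = [
    ("20250101-0900-ab12", "2025-01-01T09:00:00Z"),
    ("20250102-1000-cd34", "2025-01-02T10:00:00Z"),
]
print(resolve_session_id(sessions, LATEST))  # 20250102-1000-cd34
print(resolve_session_id(sessions, "20250101"))  # 20250101-0900-ab12
```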
@@ -77,13 +77,28 @@ def index_long_document(pdf_path: Path, kb_dir: Path) -> IndexResult:
77
77
  "structure": structure,
78
78
  }
79
79
 
80
- # Write wiki/sources/ — extract per-page content with pymupdf (not PageIndex)
80
+ # Write wiki/sources/ — per-page content
81
81
  sources_dir = kb_dir / "wiki" / "sources"
82
82
  sources_dir.mkdir(parents=True, exist_ok=True)
83
83
  images_dir = sources_dir / "images" / pdf_path.stem
84
84
 
85
85
  from openkb.images import convert_pdf_to_pages
86
- all_pages = convert_pdf_to_pages(pdf_path, pdf_path.stem, images_dir)
86
+
87
+ all_pages: list = []
88
+ if pageindex_api_key:
89
+ # Cloud mode: fetch OCR'd markdown from PageIndex. get_page_content
90
+ # requires a page range, so pass "1-N".
91
+ from openkb.converter import get_pdf_page_count
92
+ page_count = get_pdf_page_count(pdf_path)
93
+ try:
94
+ all_pages = col.get_page_content(doc_id, f"1-{page_count}")
95
+ except Exception as exc:
96
+ logger.warning("Cloud get_page_content failed for %s: %s", pdf_path.name, exc)
97
+
98
+ if not all_pages:
99
+ if pageindex_api_key:
100
+ logger.warning("Cloud returned no pages for %s; falling back to local pymupdf", pdf_path.name)
101
+ all_pages = convert_pdf_to_pages(pdf_path, pdf_path.stem, images_dir)
87
102
 
88
103
  (sources_dir / f"{pdf_path.stem}.json").write_text(
89
104
  json_mod.dumps(all_pages, ensure_ascii=False, indent=2), encoding="utf-8",
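The updated `index_long_document` tries PageIndex cloud first when an API key is set. It logs and swallows any exception, and falls back to local `convert_pdf_to_pages` whenever the cloud result is empty. The branching can be isolated as below. `fetch_cloud` and `extract_local` are hypothetical callables standing in for `col.get_page_content` and `convert_pdf_to_pages`, and the page dict shape is made up. So this mirrors only the control flow, not the real APIs.

```python
from __future__ import annotations

import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)

Pages = list[dict]  # hypothetical page record shape


def load_pages(
    name: str,
    page_count: int,
    fetch_cloud: Callable[[str], Pages] | None,
    extract_local: Callable[[], Pages],
) -> Pages:
    pages: Pages = []
    if fetch_cloud is not None:
        try:
            # The cloud call needs an explicit range, hence "1-N".
            pages = fetch_cloud(f"1-{page_count}")
        except Exception as exc:
            logger.warning("Cloud fetch failed for %s: %s", name, exc)
    if not pages:
        if fetch_cloud is not None:
            logger.warning("Cloud returned no pages for %s; falling back to local", name)
        pages = extract_local()
    return pages


def failing_cloud(_spec: str) -> Pages:
    raise RuntimeError("network down")


def local_extract() -> Pages:
    return [{"page": 1, "content": "local text"}]


print(load_pages("paper.pdf", 12, failing_cloud, local_extract))
# [{'page': 1, 'content': 'local text'}]
```

As in the diff, a failed cloud call logs two warnings (the exception, then the fallback) before local extraction runs.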
@@ -1,7 +1,7 @@
1
1
  [tool.poetry]
2
2
  name = "openkb"
3
- version = "0.1.0.dev1"
4
- description = "OpenKB Open LLM Knowledge Base, powered by PageIndex"
3
+ version = "0.1.2"
4
+ description = "OpenKB: Open LLM Knowledge Base, powered by PageIndex"
5
5
  readme = "README.md"
6
6
  license = "Apache-2.0"
7
7
  authors = [
@@ -37,14 +37,22 @@ json-repair = "*"
37
37
  litellm = "*"
38
38
  markitdown = {version = "*", extras = ["all"]}
39
39
  openai-agents = "*"
40
- pageindex = "0.3.0.dev0"
40
+ pageindex = "0.3.0.dev1"
41
+ prompt_toolkit = ">=3.0"
41
42
  python-dotenv = "*"
42
43
  pyyaml = "*"
43
44
  watchdog = ">=3.0"
44
45
 
46
+ [tool.poetry.group.dev.dependencies]
47
+ pytest = "*"
48
+ pytest-asyncio = "*"
49
+
45
50
  [tool.poetry.scripts]
46
51
  openkb = "openkb.cli:cli"
47
52
 
53
+ [tool.pytest.ini_options]
54
+ testpaths = ["tests"]
55
+
48
56
  [build-system]
49
57
  requires = ["poetry-core"]
50
58
  build-backend = "poetry.core.masonry.api"
@@ -1 +0,0 @@
1
- __version__ = "0.1.0"
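This hunk deletes the hard-coded `__version__ = "0.1.0"`, which had already drifted from the package metadata (0.1.0.dev1 before this release, 0.1.2 after). Code that still needs the version at runtime can read it from the installed distribution with the standard-library `importlib.metadata.version`. The fallback default below is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "openkb", default: str = "unknown") -> str:
    """Return the installed distribution's version, or `default` if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # e.g. running from a source checkout that was never pip-installed
        return default


print(installed_version())  # the installed openkb version, or "unknown"
```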
File without changes