@bndynet/ragbox 0.1.0 → 0.1.1

package/README.md CHANGED
@@ -41,6 +41,11 @@ Before continuing, edit `ragbox.config.json`: add your model settings and point
41
41
  "model": "gpt-4o-mini",
42
42
  "apiKey": "sk-..."
43
43
  },
44
+ "serve": {
45
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
46
+ "host": "127.0.0.1",
47
+ "port": 8787
48
+ },
44
49
  "docs": {
45
50
  "rootDir": "./docs",
46
51
  "outputDir": "./.ragbox-index"
@@ -67,6 +72,13 @@ Use `start` when you want one foreground process for indexing, watching, and ser
67
72
  ragbox start
68
73
  ```
69
74
 
75
+ For a quick detached local run, use `--background`:
76
+
77
+ ```bash
78
+ ragbox start --background
79
+ ragbox stop
80
+ ```
81
+
70
82
  You can still pass explicit paths when you want to override the config for one command:
71
83
 
72
84
  ```bash
@@ -122,7 +134,7 @@ The default setup needs:
122
134
  | Avoid repeating paths | Use the `ragbox.config.json` written by `ragbox setup pageindex`, or run `ragbox init` |
123
135
  | Query several docs folders together | Configure `sources`, run `ragbox index --source <name>`, then `ragbox query --all-sources "..."` |
124
136
  | Debug answer quality | `ragbox query --trace --json "..."` or `ragbox trace query "..."` |
125
- | Check whether an index is usable | `ragbox status ./.ragbox-index` |
137
+ | Check index and HTTP server status | `ragbox status ./.ragbox-index` |
126
138
  | Diagnose local setup issues | `ragbox doctor` |
127
139
  | Keep docs indexed while editing | `ragbox watch ./docs --output-dir ./.ragbox-index --jsonl` |
128
140
  | Run the full local service loop | `ragbox start --auth-token <token>` |
@@ -144,6 +156,11 @@ The default setup needs:
144
156
  "model": "gpt-4o-mini",
145
157
  "apiKey": "sk-..."
146
158
  },
159
+ "serve": {
160
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
161
+ "host": "127.0.0.1",
162
+ "port": 8787
163
+ },
147
164
  "docs": {
148
165
  "rootDir": "./docs",
149
166
  "outputDir": "./.ragbox-index"
@@ -158,7 +175,7 @@ ragbox init
158
175
  ragbox init --docs-dir ./content --output-dir ./.idx
159
176
  ```
160
177
 
161
- Relative paths are resolved from the config file directory. Server-side deployments can keep `baseUrl`, `model`, and `apiKey` together in a private `ragbox.config.json` or environment-specific config such as `ragbox.config.prod.json`. If a config file is committed or shared, keep `apiKey` in environment variables or a secret manager instead.
178
+ Relative paths are resolved from the config file directory. Server-side deployments can keep `baseUrl`, `model`, `serve.host`, `serve.port`, and `apiKey` together in a private `ragbox.config.json` or environment-specific config such as `ragbox.config.prod.json`. If a config file is committed or shared, keep `apiKey` in environment variables or a secret manager instead.
162
179
 
163
180
  For one documentation source, use the top-level `docs` object. No `--source` flag is needed. If a project needs multiple named sources, use the optional `sources` map.
164
181
 
@@ -186,6 +203,11 @@ For multiple documentation directories, name each one under `sources`. This is u
186
203
  "model": "gpt-4o-mini",
187
204
  "apiKey": "sk-..."
188
205
  },
206
+ "serve": {
207
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
208
+ "host": "127.0.0.1",
209
+ "port": 8787
210
+ },
189
211
  "sources": {
190
212
  "ragbox": {
191
213
  "rootDir": "./ragbox",
@@ -227,22 +249,23 @@ ragbox --config ./ragbox.config.prod.json query "How do I deploy?"
227
249
 
228
250
  ## Configuration
229
251
 
230
- For server-side use, keep the stable settings in `ragbox.config.json`: PageIndex paths, docs paths, LLM `baseUrl`, `model`, and, when the file is private, `apiKey`. Environment variables and CLI flags are still supported for overrides, secret managers, and one-off runs.
252
+ For server-side use, keep the stable settings in `ragbox.config.json`: PageIndex paths, docs paths, serve host/port, LLM `baseUrl`, `model`, and, when the file is private, `apiKey`. Environment variables and CLI flags are still supported for overrides, secret managers, and one-off runs.
231
253
 
232
254
  Resolution order is command-line flags, then `ragbox.config.json`, then environment variables, then defaults.
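As a rough illustration of that precedence, the sketch below picks the first non-empty value among a flag, a config value, an environment value, and a default. `resolve_setting` is a hypothetical helper written for this note, not part of ragbox, and how ragbox actually reads each layer is not shown here.

```shell
# Hypothetical helper (not a ragbox command): print the first non-empty
# argument, checked in precedence order.
#   $1 = CLI flag value      $2 = value from ragbox.config.json
#   $3 = environment value   $4 = built-in default
resolve_setting() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# No --port flag, serve.port is 9000 in the config, RAGBOX_SERVE_PORT is 9100:
# the config value is printed because it outranks the environment.
resolve_setting "" "9000" "9100" "8787"
```

Keeping the order in one place like this makes it easy to see why an exported variable does not override a value that is already set in the config file.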
233
255
 
234
- | Setting | Env | CLI flag | Used by | Default |
256
+ | Setting | Env | Config / CLI | Used by | Default |
235
257
  | --- | --- | --- | --- | --- |
236
- | PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch` | required when indexing |
237
- | Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch` | `python3` |
238
- | Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch` | `<folder>/.pageindex` |
239
- | Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch` | `1` |
258
+ | PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch`, `start` | required when indexing |
259
+ | Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch`, `start` | `python3` |
260
+ | Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch`, `start` | `<folder>/.pageindex` |
261
+ | Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch`, `start` | `1` |
262
+ | PageIndex runner | `PAGEINDEX_RUNNER` | `--pageindex-runner` | `index`, `watch`, `start` | `auto` |
240
263
  | API base URL | `OPENAI_BASE_URL` | `--base-url` | `index`, `watch`, `query` | `https://api.openai.com/v1` |
241
264
  | API key | `OPENAI_API_KEY` | `--api-key` | `index`, `watch`, `query` | required for query and usually PageIndex |
242
265
  | Model | `PAGEINDEX_MODEL`, `LLM_MODEL` | `--model` | `index`, `watch`, `query` | `gpt-4o-mini` |
243
- | Serve host | `RAGBOX_SERVE_HOST` | `--host` | `serve` | `127.0.0.1` |
244
- | Serve port | `RAGBOX_SERVE_PORT` | `--port` | `serve` | `8787` |
245
- | Serve token | `RAGBOX_SERVE_TOKEN` | `--auth-token` | `serve` | none |
266
+ | Serve host | `RAGBOX_SERVE_HOST` | `serve.host`, `--host` | `start`, `serve`, `status`, `doctor` | `127.0.0.1` |
267
+ | Serve port | `RAGBOX_SERVE_PORT` | `serve.port`, `--port` | `start`, `serve`, `status`, `doctor` | `8787` |
268
+ | Serve token | `RAGBOX_SERVE_TOKEN` | `serve.authToken`, `--auth-token` | `start`, `serve` | none |
246
269
  | Watch debounce | `RAGBOX_WATCH_DEBOUNCE_MS` | `--debounce-ms` | `watch` | `500` |
247
270
  | Watch retry attempts | `RAGBOX_WATCH_RETRY_ATTEMPTS` | `--retry-attempts` | `watch` | `0` |
248
271
  | Watch retry delay | `RAGBOX_WATCH_RETRY_DELAY_MS` | `--retry-delay-ms` | `watch` | `1000` |
@@ -335,7 +358,7 @@ ragbox inspect --all-sources --json
335
358
 
336
359
  ### `ragbox status [target]`
337
360
 
338
- Checks whether an index is ready to query. This is useful in CI, deploy scripts, and smoke checks.
361
+ Checks whether an index is ready to query and whether the local HTTP server health endpoint is reachable. The server probe uses `serve.host` / `serve.port`, then `RAGBOX_SERVE_HOST` / `RAGBOX_SERVE_PORT`, defaulting to `127.0.0.1:8787`.
339
362
 
340
363
  ```bash
341
364
  ragbox status ./.ragbox-index
@@ -345,7 +368,7 @@ ragbox status --json
345
368
 
346
369
  ### `ragbox doctor [target]`
347
370
 
348
- Checks the local setup: config, PageIndex CLI path, LLM settings, API key presence, and index validity. It does not call the network.
371
+ Checks the local setup: config, PageIndex CLI path, LLM settings, API key presence, index validity, and local HTTP server health.
349
372
 
350
373
  ```bash
351
374
  ragbox doctor
@@ -404,23 +427,40 @@ Fatal query errors include the stage that failed, for example `Query failed duri
404
427
 
405
428
  ### `ragbox start [folder]`
406
429
 
407
- Runs the complete local service loop: index first, watch for future changes, and serve the HTTP query API.
430
+ Runs the complete local service loop: start watch, serve the HTTP query API, and keep indexes fresh.
408
431
 
409
432
  ```bash
410
433
  ragbox start
411
434
  ragbox start --auth-token dev-token
412
435
  ragbox start --host 127.0.0.1 --port 8787 --jsonl
436
+ ragbox start --background
413
437
  ragbox start --source ragbox
414
438
  ragbox start --all-sources
415
439
  ragbox start ./docs --output-dir ./.ragbox-index
416
440
  ```
417
441
 
418
- Use `start` after `ragbox setup pageindex` when you want one foreground process for local development, an internal service, or a container. It waits for the initial index run before starting HTTP `serve`, then reloads the serve index snapshot after successful watch updates.
442
+ Use `start` after `ragbox setup pageindex` when you want one foreground process for local development, an internal service, or a container. HTTP `serve` starts as soon as the watchers are registered, so `/` and `/health` respond while the initial index is still running. `/health` returns 503 until the first index snapshot is query-ready, and `start` reloads the serve index snapshot after the initial run and every successful watch update.
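Because `/health` answers 503 until the first snapshot is ready, a script that launches `ragbox start` and then queries may want to wait for a successful probe first. The sketch below is one way to do that; `wait_until_ready` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions. It relies on `curl --fail`, which exits non-zero on HTTP error statuses such as 503.

```shell
# Hypothetical helper: retry a probe command until it succeeds.
#   $1 = maximum attempts, $2 = seconds between attempts, rest = probe command
wait_until_ready() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example probe (commented out so the sketch runs without a server):
# wait_until_ready 30 1 curl --fail --silent --output /dev/null http://127.0.0.1:8787/health
```

Passing the probe as a command keeps the loop independent of the host and port, so the same helper works with whatever `serve.host` and `serve.port` resolve to.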
443
+
444
+ Pass `--background` to detach `start` from the current terminal. Background runs write stdout/stderr to `./ragbox.log` and the process id to `./ragbox.pid` by default. Override those paths with `--log-file <path>` and `--pid-file <path>`, or pass `--no-pid-file` when you do not want a pid file.
445
+
446
+ Use `ragbox stop` from the same working directory to stop the background process recorded in `./ragbox.pid`. Pass `--pid-file <path>` if you started with a custom pid file.
419
447
 
420
448
  With multiple configured sources, `ragbox start` starts all sources by default. Use `--source ragbox,icharts` to limit the running sources, or `--all-sources` to make the global behavior explicit.
421
449
 
422
450
  `start` does not create or edit `ragbox.config.json`; run `ragbox setup pageindex` for the default local setup, or `ragbox init` when you want to manage PageIndex yourself.
423
451
 
452
+ ### `ragbox stop`
453
+
454
+ Stops a `ragbox start --background` process by reading `./ragbox.pid` in the current working directory.
455
+
456
+ ```bash
457
+ ragbox stop
458
+ ragbox stop --pid-file /var/run/ragbox.pid
459
+ ragbox stop --force
460
+ ```
461
+
462
+ By default, `stop` sends `SIGTERM`, waits for the process to exit, and removes the pid file. Use `--force` to send `SIGKILL`.
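To make the stop sequence concrete, here is a minimal shell sketch of the same steps against a pid file: signal, wait, then remove the file. It is not ragbox's implementation; `stop_from_pid_file`, the roughly five-second wait limit, and the polling interval are assumptions made for illustration.

```shell
# Hypothetical helper mirroring the documented steps; not ragbox's own code.
#   $1 = pid file path, $2 = signal name (TERM by default, KILL for a forced stop)
stop_from_pid_file() {
  pid_file=$1
  signal=${2:-TERM}
  if [ ! -f "$pid_file" ]; then
    echo "pid file not found: $pid_file" >&2
    return 1
  fi
  pid=$(cat "$pid_file")
  # Ignore "no such process" so a stale pid file is still cleaned up.
  kill -s "$signal" "$pid" 2>/dev/null || true
  # Poll until the process is gone, giving up after about five seconds.
  tries=0
  while kill -0 "$pid" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge 50 ]; then
      echo "process $pid did not exit" >&2
      return 1
    fi
    sleep 0.1
  done
  rm -f "$pid_file"
}

# Usage (commented out so the sketch has no side effects when run):
# stop_from_pid_file ./ragbox.pid        # graceful, like `ragbox stop`
# stop_from_pid_file ./ragbox.pid KILL   # forced, like `ragbox stop --force`
```

The pid file is removed only after the process has actually exited, so a failed stop leaves the file in place for a later retry.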
463
+
424
464
  ### `ragbox serve [target]`
425
465
 
426
466
  Starts a foreground HTTP server for external systems. Index first with `ragbox index`, or keep the index fresh with `ragbox watch`.
@@ -451,6 +491,7 @@ Single-index requests:
451
491
  ```bash
452
492
  curl http://127.0.0.1:8787/
453
493
  curl http://127.0.0.1:8787/health
494
+ ragbox status ./.ragbox-index
454
495
 
455
496
  curl -H "Authorization: Bearer dev-token" \
456
497
  http://127.0.0.1:8787/indexes
@@ -479,7 +520,7 @@ curl -X POST http://localhost:8787/query \
479
520
  curl -X POST http://localhost:8787/reload
480
521
  ```
481
522
 
482
- `serve` is designed for local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` as static files, because it can contain source document text. Browser widgets should call your own backend first; the backend can enforce user auth, rate limits, and audit logging before forwarding requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure `--auth-token` or `RAGBOX_SERVE_TOKEN`.
523
+ `serve` is designed for local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` as static files, because it can contain source document text. Browser widgets should call your own backend first; the backend can enforce user auth, rate limits, and audit logging before forwarding requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure `serve.authToken`, `--auth-token`, or `RAGBOX_SERVE_TOKEN`.
483
524
 
484
525
  ### `ragbox watch <folder>`
485
526
 
@@ -550,8 +591,9 @@ Common patterns:
550
591
  - Store the output directory outside the source tree, for example `/var/lib/ragbox/docs-index`.
551
592
  - Mount or copy the completed output directory to every app replica that needs querying.
552
593
  - Keep API keys in a private server config, environment variables, or your secret manager. Do not commit real keys.
553
- - Use `RAGBOX_SERVE_TOKEN` or `--auth-token` when `serve` is reachable beyond localhost.
594
+ - Use `serve.authToken`, `RAGBOX_SERVE_TOKEN`, or `--auth-token` when `serve` is reachable beyond localhost. Treat the token like a secret if the config is committed or shared.
554
595
  - Start with `--concurrency 1`; raise it only after checking PageIndex and API rate limits.
596
+ - Keep the default `--pageindex-runner auto` for Markdown/MDX indexes; it uses warm PageIndex workers when possible and falls back to the single-file CLI when needed.
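For the serve token mentioned in the list above, one possible way to produce a random value on a Unix-like host is to read bytes from `/dev/urandom` and hex-encode them. This is a sketch under that assumption; `generate_serve_token` is a hypothetical helper, and the 32-byte length is an arbitrary choice rather than a ragbox requirement.

```shell
# Hypothetical helper: print 32 random bytes as 64 lowercase hex characters.
generate_serve_token() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  printf '\n'
}

# Export the token for the current shell instead of writing it to a config file.
RAGBOX_SERVE_TOKEN=$(generate_serve_token)
export RAGBOX_SERVE_TOKEN
```

Supplying the token through `RAGBOX_SERVE_TOKEN` keeps it out of any `ragbox.config.json` that might later be committed or shared.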
555
597
 
556
598
  Example private server config:
557
599
 
@@ -560,13 +602,19 @@ Example private server config:
560
602
  "version": 1,
561
603
  "pageIndex": {
562
604
  "cli": "/opt/PageIndex/run_pageindex.py",
563
- "python": "/opt/pageindex-venv/bin/python"
605
+ "python": "/opt/pageindex-venv/bin/python",
606
+ "runner": "auto"
564
607
  },
565
608
  "llm": {
566
609
  "baseUrl": "https://api.openai.com/v1",
567
610
  "model": "gpt-4o-mini",
568
611
  "apiKey": "sk-..."
569
612
  },
613
+ "serve": {
614
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
615
+ "host": "127.0.0.1",
616
+ "port": 8787
617
+ },
570
618
  "docs": {
571
619
  "rootDir": "/srv/app/docs",
572
620
  "outputDir": "/var/lib/ragbox/docs-index"
@@ -581,14 +629,19 @@ ragbox --config ./ragbox.config.prod.json query "How do I configure authenticati
581
629
 
582
630
  ### Run In The Background
583
631
 
584
- `ragbox start` intentionally runs in the foreground. For a server, run it under a process supervisor so it survives terminal closes, SSH disconnects, and crashes.
632
+ For quick local or internal testing, `ragbox start --background` detaches the same start loop from the current terminal:
585
633
 
586
- For quick testing, `nohup` is enough:
634
+ ```bash
635
+ ragbox --config ./ragbox.config.prod.json start \
636
+ --background
637
+ ```
587
638
 
588
639
  ```bash
589
- nohup ragbox --config ./ragbox.config.prod.json start > ragbox.log 2>&1 &
640
+ ragbox stop
590
641
  ```
591
642
 
643
+ For a long-running server, prefer a process supervisor so crashes are restarted and startup ordering is explicit.
644
+
592
645
  For Linux servers, prefer `systemd`:
593
646
 
594
647
  ```ini
package/README.zh-CN.md CHANGED
@@ -39,6 +39,11 @@ ragbox setup pageindex
39
39
  "model": "gpt-4o-mini",
40
40
  "apiKey": "sk-..."
41
41
  },
42
+ "serve": {
43
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
44
+ "host": "127.0.0.1",
45
+ "port": 8787
46
+ },
42
47
  "docs": {
43
48
  "rootDir": "./docs",
44
49
  "outputDir": "./.ragbox-index"
@@ -65,6 +70,13 @@ ragbox watch --jsonl
65
70
  ragbox start
66
71
  ```
67
72
 
73
+ For a quick run detached from the current terminal, add `--background`:
74
+
75
+ ```bash
76
+ ragbox start --background
77
+ ragbox stop
78
+ ```
79
+
68
80
  You can still pass explicit paths when you need to override the config temporarily:
69
81
 
70
82
  ```bash
@@ -120,7 +132,7 @@ ragbox query ./.ragbox-index "怎么配置认证?" \
120
132
  | Avoid repeating paths | Use the `ragbox.config.json` written by `ragbox setup pageindex`, or run `ragbox init` |
121
133
  | Query several docs folders together | Configure `sources`, run `ragbox index --source <name>` for each, then `ragbox query --all-sources "..."` |
122
134
  | Debug answer quality | `ragbox query --trace --json "..."` or `ragbox trace query "..."` |
123
- | Check whether an index is usable | `ragbox status ./.ragbox-index` |
135
+ | Check index and HTTP server status | `ragbox status ./.ragbox-index` |
124
136
  | Diagnose local setup issues | `ragbox doctor` |
125
137
  | Keep docs indexed while editing | `ragbox watch ./docs --output-dir ./.ragbox-index --jsonl` |
126
138
  | Run the full local service loop | `ragbox start --auth-token <token>` |
@@ -142,6 +154,11 @@ ragbox query ./.ragbox-index "怎么配置认证?" \
142
154
  "model": "gpt-4o-mini",
143
155
  "apiKey": "sk-..."
144
156
  },
157
+ "serve": {
158
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
159
+ "host": "127.0.0.1",
160
+ "port": 8787
161
+ },
145
162
  "docs": {
146
163
  "rootDir": "./docs",
147
164
  "outputDir": "./.ragbox-index"
@@ -156,7 +173,7 @@ ragbox init
156
173
  ragbox init --docs-dir ./content --output-dir ./.idx
157
174
  ```
158
175
 
159
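- Relative paths in the config file are resolved from the config file's directory. Server-side deployments can keep `baseUrl`, `model`, and `apiKey` together in a private `ragbox.config.json`, or split them per environment into files such as `ragbox.config.prod.json`. If the config file will be committed to a repository or shared with others, do not put a real `apiKey` in it; use environment variables or a secret manager instead.
176
+ Relative paths in the config file are resolved from the config file's directory. Server-side deployments can keep `baseUrl`, `model`, `serve.host`, `serve.port`, and `apiKey` together in a private `ragbox.config.json`, or split them per environment into files such as `ragbox.config.prod.json`. If the config file will be committed to a repository or shared with others, do not put a real `apiKey` in it; use environment variables or a secret manager instead.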
160
177
 
161
178
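  For a single documentation source, the top-level `docs` object is enough and no `--source` flag is needed. Use the optional `sources` map only when a project really has multiple named sources.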
162
179
 
@@ -184,6 +201,11 @@ ragbox --config ./ragbox.config.json index
184
201
  "model": "gpt-4o-mini",
185
202
  "apiKey": "sk-..."
186
203
  },
204
+ "serve": {
205
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
206
+ "host": "127.0.0.1",
207
+ "port": 8787
208
+ },
187
209
  "sources": {
188
210
  "ragbox": {
189
211
  "rootDir": "./ragbox",
@@ -225,22 +247,23 @@ ragbox --config ./ragbox.config.prod.json query "怎么部署?"
225
247
 
226
248
  ## Configuration
227
249
 
228
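- For server-side use, keep the stable settings together in `ragbox.config.json`: PageIndex paths, docs paths, LLM `baseUrl`, `model`, and, when the config file is private, `apiKey`. Environment variables and CLI flags are still supported and suit overrides, secret managers, and one-off runs.
250
+ For server-side use, keep the stable settings together in `ragbox.config.json`: PageIndex paths, docs paths, serve host/port, LLM `baseUrl`, `model`, and, when the config file is private, `apiKey`. Environment variables and CLI flags are still supported and suit overrides, secret managers, and one-off runs.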
229
251
 
230
252
  Resolution order is command-line flags, then `ragbox.config.json`, then environment variables, then defaults.
231
253
 
232
- | Setting | Env | CLI flag | Used by | Default |
254
+ | Setting | Env | Config / CLI | Used by | Default |
233
255
  | --- | --- | --- | --- | --- |
234
- | PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch` | required when indexing |
235
- | Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch` | `python3` |
236
- | Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch` | `<folder>/.pageindex` |
237
- | Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch` | `1` |
256
+ | PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch`, `start` | required when indexing |
257
+ | Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch`, `start` | `python3` |
258
+ | Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch`, `start` | `<folder>/.pageindex` |
259
+ | Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch`, `start` | `1` |
260
+ | PageIndex runner | `PAGEINDEX_RUNNER` | `--pageindex-runner` | `index`, `watch`, `start` | `auto` |
238
261
  | API base URL | `OPENAI_BASE_URL` | `--base-url` | `index`, `watch`, `query` | `https://api.openai.com/v1` |
239
262
  | API key | `OPENAI_API_KEY` | `--api-key` | `index`, `watch`, `query` | required for query and usually PageIndex |
240
263
  | Model | `PAGEINDEX_MODEL`, `LLM_MODEL` | `--model` | `index`, `watch`, `query` | `gpt-4o-mini` |
241
- | Serve host | `RAGBOX_SERVE_HOST` | `--host` | `serve` | `127.0.0.1` |
242
- | Serve port | `RAGBOX_SERVE_PORT` | `--port` | `serve` | `8787` |
243
- | Serve token | `RAGBOX_SERVE_TOKEN` | `--auth-token` | `serve` | none |
264
+ | Serve host | `RAGBOX_SERVE_HOST` | `serve.host`, `--host` | `start`, `serve`, `status`, `doctor` | `127.0.0.1` |
265
+ | Serve port | `RAGBOX_SERVE_PORT` | `serve.port`, `--port` | `start`, `serve`, `status`, `doctor` | `8787` |
266
+ | Serve token | `RAGBOX_SERVE_TOKEN` | `serve.authToken`, `--auth-token` | `start`, `serve` | none |
244
267
  | Watch debounce | `RAGBOX_WATCH_DEBOUNCE_MS` | `--debounce-ms` | `watch` | `500` |
245
268
  | Watch retry attempts | `RAGBOX_WATCH_RETRY_ATTEMPTS` | `--retry-attempts` | `watch` | `0` |
246
269
  | Watch retry delay | `RAGBOX_WATCH_RETRY_DELAY_MS` | `--retry-delay-ms` | `watch` | `1000` |
@@ -333,7 +356,7 @@ ragbox inspect --all-sources --json
333
356
 
334
357
  ### `ragbox status [target]`
335
358
 
336
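- Checks whether an index is ready to query. This is useful in CI, deploy scripts, and smoke checks.
359
+ Checks whether an index is ready to query and probes whether the local HTTP server's `/health` endpoint is reachable. The server probe uses `serve.host` / `serve.port` first, then `RAGBOX_SERVE_HOST` / `RAGBOX_SERVE_PORT`, and defaults to `127.0.0.1:8787`.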
337
360
 
338
361
  ```bash
339
362
  ragbox status ./.ragbox-index
@@ -343,7 +366,7 @@ ragbox status --json
343
366
 
344
367
  ### `ragbox doctor [target]`
345
368
 
346
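- Checks the local config, PageIndex CLI path, LLM settings, API key presence, and index validity. This command does not make network requests.
369
+ Checks the local config, PageIndex CLI path, LLM settings, API key presence, index validity, and whether the local ragbox HTTP server is healthy.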
347
370
 
348
371
  ```bash
349
372
  ragbox doctor
@@ -409,23 +432,40 @@ indexes/
409
432
 
410
433
  ### `ragbox start [folder]`
411
434
 
412
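- Runs the complete local service loop: build the initial index first, keep watching for document changes, and serve the HTTP query API.
435
+ Runs the complete local service loop: start watch, serve the HTTP query API, and keep indexes fresh.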
413
436
 
414
437
  ```bash
415
438
  ragbox start
416
439
  ragbox start --auth-token dev-token
417
440
  ragbox start --host 127.0.0.1 --port 8787 --jsonl
441
+ ragbox start --background
418
442
  ragbox start --source ragbox
419
443
  ragbox start --all-sources
420
444
  ragbox start ./docs --output-dir ./.ragbox-index
421
445
  ```
422
446
 
423
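- Prefer `start` once `ragbox setup pageindex` has prepared the default local config and you want one foreground process for local development, an internal service, or a container. It waits for the initial index to finish before starting HTTP `serve`, and refreshes the serve index snapshot after every successful watch update.
447
+ Prefer `start` once `ragbox setup pageindex` has prepared the default local config and you want one foreground process for local development, an internal service, or a container. HTTP `serve` starts as soon as the watchers are registered, so `/` and `/health` already respond while the initial index is still running. `/health` returns 503 until the first index snapshot is query-ready; the serve index snapshot is refreshed after the initial index completes and after every successful watch update.
448
+
449
+ With `--background`, `start` detaches from the current terminal and runs in the background. By default the background process writes stdout/stderr to `./ragbox.log` and its PID to `./ragbox.pid`. Override those paths with `--log-file <path>` and `--pid-file <path>`, or pass `--no-pid-file` if you do not want a PID file.
450
+
451
+ Running `ragbox stop` from the same working directory reads `./ragbox.pid` and stops the corresponding background process. If you started with a custom pid file, pass the same `--pid-file <path>` when stopping.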
424
452
 
425
453
  With multiple configured sources, `ragbox start` starts all sources by default. Use `--source ragbox,icharts` to limit the scope, or `--all-sources` to make the global start explicit.
426
454
 
427
455
  `start` does not create or modify `ragbox.config.json`; for the default local setup run `ragbox setup pageindex` first, or use `ragbox init` to configure things manually if you want to manage PageIndex yourself.
428
456
 
457
+ ### `ragbox stop`
458
+
459
+ Reads `./ragbox.pid` in the current working directory and stops the background process started by `ragbox start --background`.
460
+
461
+ ```bash
462
+ ragbox stop
463
+ ragbox stop --pid-file /var/run/ragbox.pid
464
+ ragbox stop --force
465
+ ```
466
+
467
+ By default, `stop` sends `SIGTERM`, waits for the process to exit, then removes the pid file. With `--force`, it sends `SIGKILL`.
468
+
429
469
  ### `ragbox serve [target]`
430
470
 
431
471
  Starts a foreground HTTP server so external systems can query the docs through a REST API. Before using it, build the index with `ragbox index`, or keep it fresh with `ragbox watch`.
@@ -456,6 +496,7 @@ Public HTTP contract:
456
496
  ```bash
457
497
  curl http://127.0.0.1:8787/
458
498
  curl http://127.0.0.1:8787/health
499
+ ragbox status ./.ragbox-index
459
500
 
460
501
  curl -H "Authorization: Bearer dev-token" \
461
502
  http://127.0.0.1:8787/indexes
@@ -484,7 +525,7 @@ curl -X POST http://localhost:8787/query \
484
525
  curl -X POST http://localhost:8787/reload
485
526
  ```
486
527
 
487
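- The first version of `serve` targets local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` directly as a static directory, because it can contain source document text. Browser widgets should not carry the ragbox token directly; have them call your own backend first, which handles user login, rate limiting, and auditing before forwarding requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure `--auth-token` or `RAGBOX_SERVE_TOKEN`.
528
+ The first version of `serve` targets local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` directly as a static directory, because it can contain source document text. Browser widgets should not carry the ragbox token directly; have them call your own backend first, which handles user login, rate limiting, and auditing before forwarding requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure `serve.authToken`, `--auth-token`, or `RAGBOX_SERVE_TOKEN`.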
488
529
 
489
530
  ### `ragbox watch <folder>`
490
531
 
@@ -558,8 +599,9 @@ ragbox query ./.ragbox-index "..."
558
599
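  - Store the output directory outside the source tree, for example `/var/lib/ragbox/docs-index`
559
600
  - Multi-replica apps need to read the same complete index; mount it as a read-only volume or ship it with the deployment artifact
560
601
  - API keys can live in a private server config, environment variables, or a secret manager; do not commit real keys
561
- - Use `RAGBOX_SERVE_TOKEN` or `--auth-token` when `serve` is bound beyond localhost
602
+ - Use `serve.authToken`, `RAGBOX_SERVE_TOKEN`, or `--auth-token` when `serve` is bound beyond localhost; if the config will be committed or shared, treat the token as a secret
562
603
  - Start with `--concurrency 1`; raise it only after checking PageIndex and model service rate limits
604
+ - Keep the default `--pageindex-runner auto` for Markdown/MDX indexes; it prefers warm PageIndex workers and falls back automatically to the single-file CLI when they are unavailable
563
605
  - For zero-downtime updates, index into a staging directory first, then switch the read directory after it succeeds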
564
606
 
565
607
  Example private server config:
@@ -569,13 +611,19 @@ ragbox query ./.ragbox-index "..."
569
611
  "version": 1,
570
612
  "pageIndex": {
571
613
  "cli": "/opt/PageIndex/run_pageindex.py",
572
- "python": "/opt/pageindex-venv/bin/python"
614
+ "python": "/opt/pageindex-venv/bin/python",
615
+ "runner": "auto"
573
616
  },
574
617
  "llm": {
575
618
  "baseUrl": "https://api.openai.com/v1",
576
619
  "model": "gpt-4o-mini",
577
620
  "apiKey": "sk-..."
578
621
  },
622
+ "serve": {
623
+ "authToken": "YOUR_RAGBOX_SERVE_TOKEN",
624
+ "host": "127.0.0.1",
625
+ "port": 8787
626
+ },
579
627
  "docs": {
580
628
  "rootDir": "/srv/app/docs",
581
629
  "outputDir": "/var/lib/ragbox/docs-index"
@@ -590,14 +638,19 @@ ragbox --config ./ragbox.config.prod.json query "怎么配置认证?"
590
638
 
591
639
  ### Run In The Background
592
640
 
593
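- `ragbox start` intentionally runs as a foreground process. For server deployments, run it under a process supervisor so it keeps running or restarts automatically after terminal closes, SSH disconnects, or crashes.
641
+ For quick local or internal testing, `ragbox start --background` detaches the same start loop from the current terminal: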
594
642
 
595
- For quick testing, `nohup` is enough:
643
+ ```bash
644
+ ragbox --config ./ragbox.config.prod.json start \
645
+ --background
646
+ ```
596
647
 
597
648
  ```bash
598
- nohup ragbox --config ./ragbox.config.prod.json start > ragbox.log 2>&1 &
649
+ ragbox stop
599
650
  ```
600
651
 
652
+ For a long-running server, a process supervisor is still recommended so crashes are restarted automatically and startup ordering is clearer.
653
+
601
654
  For Linux servers, `systemd` is recommended:
602
655
 
603
656
  ```ini