@bndynet/ragbox 0.1.0 → 0.1.1
- package/README.md +74 -21
- package/README.zh-CN.md +74 -21
- package/dist/src/cli.js +382 -14
- package/dist/src/config-file.d.ts +17 -1
- package/dist/src/config-file.js +49 -4
- package/dist/src/folder-index/config.js +7 -0
- package/dist/src/folder-index/indexer.js +48 -2
- package/dist/src/folder-index/pageindex-runner.d.ts +15 -0
- package/dist/src/folder-index/pageindex-runner.js +436 -0
- package/dist/src/folder-index/types.d.ts +2 -0
- package/dist/src/index.d.ts +1 -0
- package/dist/src/sdk.d.ts +2 -1
- package/dist/src/sdk.js +1 -0
- package/dist/src/serve.js +24 -15
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -41,6 +41,11 @@ Before continuing, edit `ragbox.config.json`: add your model settings and point
|
|
|
41
41
|
"model": "gpt-4o-mini",
|
|
42
42
|
"apiKey": "sk-..."
|
|
43
43
|
},
|
|
44
|
+
"serve": {
|
|
45
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
46
|
+
"host": "127.0.0.1",
|
|
47
|
+
"port": 8787
|
|
48
|
+
},
|
|
44
49
|
"docs": {
|
|
45
50
|
"rootDir": "./docs",
|
|
46
51
|
"outputDir": "./.ragbox-index"
|
|
@@ -67,6 +72,13 @@ Use `start` when you want one foreground process for indexing, watching, and ser
|
|
|
67
72
|
ragbox start
|
|
68
73
|
```
|
|
69
74
|
|
|
75
|
+
For a quick detached local run, use `--background`:
|
|
76
|
+
|
|
77
|
+
```bash
|
|
78
|
+
ragbox start --background
|
|
79
|
+
ragbox stop
|
|
80
|
+
```
|
|
81
|
+
|
|
70
82
|
You can still pass explicit paths when you want to override the config for one command:
|
|
71
83
|
|
|
72
84
|
```bash
|
|
@@ -122,7 +134,7 @@ The default setup needs:
|
|
|
122
134
|
| Avoid repeating paths | Use the `ragbox.config.json` written by `ragbox setup pageindex`, or run `ragbox init` |
|
|
123
135
|
| Query several docs folders together | Configure `sources`, run `ragbox index --source <name>`, then `ragbox query --all-sources "..."` |
|
|
124
136
|
| Debug answer quality | `ragbox query --trace --json "..."` or `ragbox trace query "..."` |
|
|
125
|
-
| Check
|
|
137
|
+
| Check index and HTTP server status | `ragbox status ./.ragbox-index` |
|
|
126
138
|
| Diagnose local setup issues | `ragbox doctor` |
|
|
127
139
|
| Keep docs indexed while editing | `ragbox watch ./docs --output-dir ./.ragbox-index --jsonl` |
|
|
128
140
|
| Run the full local service loop | `ragbox start --auth-token <token>` |
|
|
@@ -144,6 +156,11 @@ The default setup needs:
|
|
|
144
156
|
"model": "gpt-4o-mini",
|
|
145
157
|
"apiKey": "sk-..."
|
|
146
158
|
},
|
|
159
|
+
"serve": {
|
|
160
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
161
|
+
"host": "127.0.0.1",
|
|
162
|
+
"port": 8787
|
|
163
|
+
},
|
|
147
164
|
"docs": {
|
|
148
165
|
"rootDir": "./docs",
|
|
149
166
|
"outputDir": "./.ragbox-index"
|
|
@@ -158,7 +175,7 @@ ragbox init
|
|
|
158
175
|
ragbox init --docs-dir ./content --output-dir ./.idx
|
|
159
176
|
```
|
|
160
177
|
|
|
161
|
-
Relative paths are resolved from the config file directory. Server-side deployments can keep `baseUrl`, `model`, and `apiKey` together in a private `ragbox.config.json` or environment-specific config such as `ragbox.config.prod.json`. If a config file is committed or shared, keep `apiKey` in environment variables or a secret manager instead.
|
|
178
|
+
Relative paths are resolved from the config file directory. Server-side deployments can keep `baseUrl`, `model`, `serve.host`, `serve.port`, and `apiKey` together in a private `ragbox.config.json` or environment-specific config such as `ragbox.config.prod.json`. If a config file is committed or shared, keep `apiKey` in environment variables or a secret manager instead.
|
|
162
179
|
|
|
163
180
|
For one documentation source, use the top-level `docs` object. No `--source` flag is needed. If a project needs multiple named sources, use the optional `sources` map.
|
|
164
181
|
|
|
@@ -186,6 +203,11 @@ For multiple documentation directories, name each one under `sources`. This is u
|
|
|
186
203
|
"model": "gpt-4o-mini",
|
|
187
204
|
"apiKey": "sk-..."
|
|
188
205
|
},
|
|
206
|
+
"serve": {
|
|
207
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
208
|
+
"host": "127.0.0.1",
|
|
209
|
+
"port": 8787
|
|
210
|
+
},
|
|
189
211
|
"sources": {
|
|
190
212
|
"ragbox": {
|
|
191
213
|
"rootDir": "./ragbox",
|
|
@@ -227,22 +249,23 @@ ragbox --config ./ragbox.config.prod.json query "How do I deploy?"
|
|
|
227
249
|
|
|
228
250
|
## Configuration
|
|
229
251
|
|
|
230
|
-
For server-side use, keep the stable settings in `ragbox.config.json`: PageIndex paths, docs paths, LLM `baseUrl`, `model`, and, when the file is private, `apiKey`. Environment variables and CLI flags are still supported for overrides, secret managers, and one-off runs.
|
|
252
|
+
For server-side use, keep the stable settings in `ragbox.config.json`: PageIndex paths, docs paths, serve host/port, LLM `baseUrl`, `model`, and, when the file is private, `apiKey`. Environment variables and CLI flags are still supported for overrides, secret managers, and one-off runs.
|
|
231
253
|
|
|
232
254
|
Resolution order is command-line flags, then `ragbox.config.json`, then environment variables, then defaults.
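
The resolution order above can be sketched in a few lines. This is a minimal illustration, not ragbox's actual implementation: `resolveSetting` and `parsePort` are hypothetical helpers, and the example values (a config port of 9000, an environment port of 9100) are made up.

```typescript
// Hypothetical helpers, not part of the ragbox API. They only illustrate the
// documented precedence: CLI flag, then config file, then env var, then default.
type SettingSource = "cli" | "config" | "env" | "default";

interface Resolved<T> {
  value: T;
  source: SettingSource;
}

function resolveSetting<T>(
  cli: T | undefined,
  config: T | undefined,
  env: T | undefined,
  fallback: T,
): Resolved<T> {
  if (cli !== undefined) return { value: cli, source: "cli" };
  if (config !== undefined) return { value: config, source: "config" };
  if (env !== undefined) return { value: env, source: "env" };
  return { value: fallback, source: "default" };
}

// Environment values arrive as strings, so parse the port before resolving it.
// Invalid or empty values are treated as "not set" (an assumption of this sketch).
function parsePort(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const port = Number(raw);
  return port % 1 === 0 && port > 0 && port <= 65535 ? port : undefined;
}

// Assumed inputs: no --port flag, "port": 9000 in the config file,
// and RAGBOX_SERVE_PORT=9100 in the environment.
const resolvedPort = resolveSetting<number>(undefined, 9000, parsePort("9100"), 8787);
console.log(resolvedPort.value, resolvedPort.source);
```

Under this order the config file wins over the environment variable, so an exported `RAGBOX_SERVE_PORT` only takes effect when `serve.port` is absent from the config.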
|
|
233
255
|
|
|
234
|
-
| Setting | Env | CLI
|
|
256
|
+
| Setting | Env | Config / CLI | Used by | Default |
|
|
235
257
|
| --- | --- | --- | --- | --- |
|
|
236
|
-
| PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch` | required when indexing |
|
|
237
|
-
| Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch` | `python3` |
|
|
238
|
-
| Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch` | `<folder>/.pageindex` |
|
|
239
|
-
| Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch` | `1` |
|
|
258
|
+
| PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch`, `start` | required when indexing |
|
|
259
|
+
| Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch`, `start` | `python3` |
|
|
260
|
+
| Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch`, `start` | `<folder>/.pageindex` |
|
|
261
|
+
| Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch`, `start` | `1` |
|
|
262
|
+
| PageIndex runner | `PAGEINDEX_RUNNER` | `--pageindex-runner` | `index`, `watch`, `start` | `auto` |
|
|
240
263
|
| API base URL | `OPENAI_BASE_URL` | `--base-url` | `index`, `watch`, `query` | `https://api.openai.com/v1` |
|
|
241
264
|
| API key | `OPENAI_API_KEY` | `--api-key` | `index`, `watch`, `query` | required for query and usually PageIndex |
|
|
242
265
|
| Model | `PAGEINDEX_MODEL`, `LLM_MODEL` | `--model` | `index`, `watch`, `query` | `gpt-4o-mini` |
|
|
243
|
-
| Serve host | `RAGBOX_SERVE_HOST` | `--host` | `serve` | `127.0.0.1` |
|
|
244
|
-
| Serve port | `RAGBOX_SERVE_PORT` | `--port` | `serve` | `8787` |
|
|
245
|
-
| Serve token | `RAGBOX_SERVE_TOKEN` | `--auth-token` | `serve` | none |
|
|
266
|
+
| Serve host | `RAGBOX_SERVE_HOST` | `serve.host`, `--host` | `start`, `serve`, `status`, `doctor` | `127.0.0.1` |
|
|
267
|
+
| Serve port | `RAGBOX_SERVE_PORT` | `serve.port`, `--port` | `start`, `serve`, `status`, `doctor` | `8787` |
|
|
268
|
+
| Serve token | `RAGBOX_SERVE_TOKEN` | `serve.authToken`, `--auth-token` | `start`, `serve` | none |
|
|
246
269
|
| Watch debounce | `RAGBOX_WATCH_DEBOUNCE_MS` | `--debounce-ms` | `watch` | `500` |
|
|
247
270
|
| Watch retry attempts | `RAGBOX_WATCH_RETRY_ATTEMPTS` | `--retry-attempts` | `watch` | `0` |
|
|
248
271
|
| Watch retry delay | `RAGBOX_WATCH_RETRY_DELAY_MS` | `--retry-delay-ms` | `watch` | `1000` |
|
|
@@ -335,7 +358,7 @@ ragbox inspect --all-sources --json
|
|
|
335
358
|
|
|
336
359
|
### `ragbox status [target]`
|
|
337
360
|
|
|
338
|
-
Checks whether an index is ready to query
|
|
361
|
+
Checks whether an index is ready to query and whether the local HTTP server health endpoint is reachable. The server probe uses `serve.host` / `serve.port`, then `RAGBOX_SERVE_HOST` / `RAGBOX_SERVE_PORT`, defaulting to `127.0.0.1:8787`.
|
|
339
362
|
|
|
340
363
|
```bash
|
|
341
364
|
ragbox status ./.ragbox-index
|
|
@@ -345,7 +368,7 @@ ragbox status --json
|
|
|
345
368
|
|
|
346
369
|
### `ragbox doctor [target]`
|
|
347
370
|
|
|
348
|
-
Checks the local setup: config, PageIndex CLI path, LLM settings, API key presence,
|
|
371
|
+
Checks the local setup: config, PageIndex CLI path, LLM settings, API key presence, index validity, and local HTTP server health.
|
|
349
372
|
|
|
350
373
|
```bash
|
|
351
374
|
ragbox doctor
|
|
@@ -404,23 +427,40 @@ Fatal query errors include the stage that failed, for example `Query failed duri
|
|
|
404
427
|
|
|
405
428
|
### `ragbox start [folder]`
|
|
406
429
|
|
|
407
|
-
Runs the complete local service loop:
|
|
430
|
+
Runs the complete local service loop: start watch, serve the HTTP query API, and keep indexes fresh.
|
|
408
431
|
|
|
409
432
|
```bash
|
|
410
433
|
ragbox start
|
|
411
434
|
ragbox start --auth-token dev-token
|
|
412
435
|
ragbox start --host 127.0.0.1 --port 8787 --jsonl
|
|
436
|
+
ragbox start --background
|
|
413
437
|
ragbox start --source ragbox
|
|
414
438
|
ragbox start --all-sources
|
|
415
439
|
ragbox start ./docs --output-dir ./.ragbox-index
|
|
416
440
|
```
|
|
417
441
|
|
|
418
|
-
Use `start` after `ragbox setup pageindex` when you want one foreground process for local development, an internal service, or a container.
|
|
442
|
+
Use `start` after `ragbox setup pageindex` when you want one foreground process for local development, an internal service, or a container. HTTP `serve` starts as soon as the watchers are registered, so `/` and `/health` respond while the initial index is still running. `/health` returns 503 until the first index snapshot is query-ready, and `start` reloads the serve index snapshot after the initial run and every successful watch update.
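
A client that starts alongside `ragbox start` may want to wait for the first index snapshot before sending queries. The sketch below shows one way to turn a `/health` status into a next step. `nextAction` is a hypothetical helper, and two things in it are assumptions rather than documented behavior: any 2xx status is treated as ready, and a failed connection is treated as retryable.

```typescript
// Hypothetical client-side helper, not part of ragbox.
// `status` is the HTTP status of GET /health, or undefined when the
// connection failed.
type HealthAction = "query" | "wait" | "give-up";

function nextAction(
  status: number | undefined,
  attempt: number,
  maxAttempts: number,
): HealthAction {
  // Assumption: any 2xx response means the index snapshot is query-ready.
  const ready = status !== undefined && status >= 200 && status < 300;
  if (ready) return "query";

  // Documented: /health returns 503 until the first snapshot is query-ready.
  // Assumption: a connection failure early in startup is also worth retrying.
  const retryable = status === undefined || status === 503;
  if (retryable && attempt < maxAttempts) return "wait";

  return "give-up";
}

// Simulated probe results for a server that is still running its initial index.
const observedStatuses: Array<number | undefined> = [undefined, 503, 503, 200];
const plannedActions = observedStatuses.map((status, i) => nextAction(status, i + 1, 10));
console.log(plannedActions);
```

Keeping the decision separate from the HTTP call makes it easy to test and lets the caller choose its own delay between probes.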
|
|
443
|
+
|
|
444
|
+
Pass `--background` to detach `start` from the current terminal. Background runs write stdout/stderr to `./ragbox.log` and the process id to `./ragbox.pid` by default. Override those paths with `--log-file <path>` and `--pid-file <path>`, or pass `--no-pid-file` when you do not want a pid file.
|
|
445
|
+
|
|
446
|
+
Use `ragbox stop` from the same working directory to stop the background process recorded in `./ragbox.pid`. Pass `--pid-file <path>` if you started with a custom pid file.
|
|
419
447
|
|
|
420
448
|
With multiple configured sources, `ragbox start` starts all sources by default. Use `--source ragbox,icharts` to limit the running sources, or `--all-sources` to make the global behavior explicit.
|
|
421
449
|
|
|
422
450
|
`start` does not create or edit `ragbox.config.json`; run `ragbox setup pageindex` for the default local setup, or `ragbox init` when you want to manage PageIndex yourself.
|
|
423
451
|
|
|
452
|
+
### `ragbox stop`
|
|
453
|
+
|
|
454
|
+
Stops a `ragbox start --background` process by reading `./ragbox.pid` in the current working directory.
|
|
455
|
+
|
|
456
|
+
```bash
|
|
457
|
+
ragbox stop
|
|
458
|
+
ragbox stop --pid-file /var/run/ragbox.pid
|
|
459
|
+
ragbox stop --force
|
|
460
|
+
```
|
|
461
|
+
|
|
462
|
+
By default, `stop` sends `SIGTERM`, waits for the process to exit, and removes the pid file. Use `--force` to send `SIGKILL`.
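
The signal choice described above can be modeled as a small planning step. This is a sketch under stated assumptions, not the code behind `ragbox stop`: `parsePid`, `signalFor`, and `planStop` are hypothetical names, and the sketch assumes the pid file holds a single decimal process id.

```typescript
// Hypothetical helpers, not part of ragbox.
type StopSignal = "SIGTERM" | "SIGKILL";

interface StopPlan {
  pid: number;
  signal: StopSignal;
}

// Assumption: the pid file contains one decimal process id, optionally
// followed by whitespace or a trailing newline.
function parsePid(fileContents: string): number {
  const text = fileContents.trim();
  if (!/^[0-9]+$/.test(text)) {
    throw new Error("pid file does not contain a process id");
  }
  const pid = parseInt(text, 10);
  if (pid <= 0) {
    throw new Error("process id must be positive");
  }
  return pid;
}

// Documented behavior: SIGTERM by default, SIGKILL with --force.
function signalFor(force: boolean): StopSignal {
  return force ? "SIGKILL" : "SIGTERM";
}

function planStop(pidFileContents: string, force: boolean): StopPlan {
  return { pid: parsePid(pidFileContents), signal: signalFor(force) };
}

// Example with a made-up process id.
console.log(planStop("4242\n", false));
```

Validating the file contents before signalling guards against sending a signal to an unrelated process when the pid file is empty or corrupted.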
|
|
463
|
+
|
|
424
464
|
### `ragbox serve [target]`
|
|
425
465
|
|
|
426
466
|
Starts a foreground HTTP server for external systems. Index first with `ragbox index`, or keep the index fresh with `ragbox watch`.
|
|
@@ -451,6 +491,7 @@ Single-index requests:
|
|
|
451
491
|
```bash
|
|
452
492
|
curl http://127.0.0.1:8787/
|
|
453
493
|
curl http://127.0.0.1:8787/health
|
|
494
|
+
ragbox status ./.ragbox-index
|
|
454
495
|
|
|
455
496
|
curl -H "Authorization: Bearer dev-token" \
|
|
456
497
|
http://127.0.0.1:8787/indexes
|
|
@@ -479,7 +520,7 @@ curl -X POST http://localhost:8787/query \
|
|
|
479
520
|
curl -X POST http://localhost:8787/reload
|
|
480
521
|
```
|
|
481
522
|
|
|
482
|
-
`serve` is designed for local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` as static files, because it can contain source document text. Browser widgets should call your own backend first; the backend can enforce user auth, rate limits, and audit logging before forwarding requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure `--auth-token
|
|
523
|
+
`serve` is designed for local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` as static files, because it can contain source document text. Browser widgets should call your own backend first; the backend can enforce user auth, rate limits, and audit logging before forwarding requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure `serve.authToken`, `--auth-token`, or `RAGBOX_SERVE_TOKEN`.
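
The backend-forwarding pattern above keeps the serve token on the server side. The sketch below builds the request a backend could send to `ragbox serve`. `buildServeRequest` and its types are hypothetical, and the `/indexes` path with the `Authorization: Bearer` header is taken from the curl example earlier in this README.

```typescript
// Hypothetical backend helper, not part of ragbox. The browser never sees
// the token; only the backend attaches it when forwarding.
interface ServeTarget {
  host: string;
  port: number;
  authToken?: string;
}

interface ForwardRequest {
  url: string;
  headers: Record<string, string>;
}

function buildServeRequest(target: ServeTarget, path: string): ForwardRequest {
  const headers: Record<string, string> = {};
  if (target.authToken) {
    // Same header shape as the curl example: Authorization: Bearer <token>
    headers["Authorization"] = "Bearer " + target.authToken;
  }
  return {
    url: "http://" + target.host + ":" + target.port + path,
    headers: headers,
  };
}

// Example values mirror the README defaults; "dev-token" is a placeholder.
const forwardRequest = buildServeRequest(
  { host: "127.0.0.1", port: 8787, authToken: "dev-token" },
  "/indexes",
);
console.log(forwardRequest.url);
```

The backend would run its own user auth, rate limiting, and audit logging before calling this helper and issuing the request.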
|
|
483
524
|
|
|
484
525
|
### `ragbox watch <folder>`
|
|
485
526
|
|
|
@@ -550,8 +591,9 @@ Common patterns:
|
|
|
550
591
|
- Store the output directory outside the source tree, for example `/var/lib/ragbox/docs-index`.
|
|
551
592
|
- Mount or copy the completed output directory to every app replica that needs querying.
|
|
552
593
|
- Keep API keys in a private server config, environment variables, or your secret manager. Do not commit real keys.
|
|
553
|
-
- Use `RAGBOX_SERVE_TOKEN
|
|
594
|
+
- Use `serve.authToken`, `RAGBOX_SERVE_TOKEN`, or `--auth-token` when `serve` is reachable beyond localhost. Treat the token like a secret if the config is committed or shared.
|
|
554
595
|
- Start with `--concurrency 1`; raise it only after checking PageIndex and API rate limits.
|
|
596
|
+
- Keep the default `--pageindex-runner auto` for Markdown/MDX indexes; it uses warm PageIndex workers when possible and falls back to the single-file CLI when needed.
|
|
555
597
|
|
|
556
598
|
Example private server config:
|
|
557
599
|
|
|
@@ -560,13 +602,19 @@ Example private server config:
|
|
|
560
602
|
"version": 1,
|
|
561
603
|
"pageIndex": {
|
|
562
604
|
"cli": "/opt/PageIndex/run_pageindex.py",
|
|
563
|
-
"python": "/opt/pageindex-venv/bin/python"
|
|
605
|
+
"python": "/opt/pageindex-venv/bin/python",
|
|
606
|
+
"runner": "auto"
|
|
564
607
|
},
|
|
565
608
|
"llm": {
|
|
566
609
|
"baseUrl": "https://api.openai.com/v1",
|
|
567
610
|
"model": "gpt-4o-mini",
|
|
568
611
|
"apiKey": "sk-..."
|
|
569
612
|
},
|
|
613
|
+
"serve": {
|
|
614
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
615
|
+
"host": "127.0.0.1",
|
|
616
|
+
"port": 8787
|
|
617
|
+
},
|
|
570
618
|
"docs": {
|
|
571
619
|
"rootDir": "/srv/app/docs",
|
|
572
620
|
"outputDir": "/var/lib/ragbox/docs-index"
|
|
@@ -581,14 +629,19 @@ ragbox --config ./ragbox.config.prod.json query "How do I configure authenticati
|
|
|
581
629
|
|
|
582
630
|
### Run In The Background
|
|
583
631
|
|
|
584
|
-
|
|
632
|
+
For quick local or internal testing, `ragbox start --background` detaches the same start loop from the current terminal:
|
|
585
633
|
|
|
586
|
-
|
|
634
|
+
```bash
|
|
635
|
+
ragbox --config ./ragbox.config.prod.json start \
|
|
636
|
+
--background
|
|
637
|
+
```
|
|
587
638
|
|
|
588
639
|
```bash
|
|
589
|
-
|
|
640
|
+
ragbox stop
|
|
590
641
|
```
|
|
591
642
|
|
|
643
|
+
For a long-running server, prefer a process supervisor so crashes are restarted and startup ordering is explicit.
|
|
644
|
+
|
|
592
645
|
For Linux servers, prefer `systemd`:
|
|
593
646
|
|
|
594
647
|
```ini
|
package/README.zh-CN.md
CHANGED
|
@@ -39,6 +39,11 @@ ragbox setup pageindex
|
|
|
39
39
|
"model": "gpt-4o-mini",
|
|
40
40
|
"apiKey": "sk-..."
|
|
41
41
|
},
|
|
42
|
+
"serve": {
|
|
43
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
44
|
+
"host": "127.0.0.1",
|
|
45
|
+
"port": 8787
|
|
46
|
+
},
|
|
42
47
|
"docs": {
|
|
43
48
|
"rootDir": "./docs",
|
|
44
49
|
"outputDir": "./.ragbox-index"
|
|
@@ -65,6 +70,13 @@ ragbox watch --jsonl
|
|
|
65
70
|
ragbox start
|
|
66
71
|
```
|
|
67
72
|
|
|
73
|
+
To run detached from the current terminal for a quick session, add `--background`:
|
|
74
|
+
|
|
75
|
+
```bash
|
|
76
|
+
ragbox start --background
|
|
77
|
+
ragbox stop
|
|
78
|
+
```
|
|
79
|
+
|
|
68
80
|
You can still pass explicit paths when you need to override the config temporarily:
|
|
69
81
|
|
|
70
82
|
```bash
|
|
@@ -120,7 +132,7 @@ ragbox query ./.ragbox-index "怎么配置认证?" \
|
|
|
120
132
|
| Avoid repeating paths every time | Use the `ragbox.config.json` written by `ragbox setup pageindex`, or run `ragbox init` |
|
|
121
133
|
| Query several docs directories together | Configure `sources`, run `ragbox index --source <name>` for each, then use `ragbox query --all-sources "..."` |
|
|
122
134
|
| Debug answer quality | `ragbox query --trace --json "..."` or `ragbox trace query "..."` |
|
|
123
|
-
|
|
|
135
|
+
| Check index and HTTP server status | `ragbox status ./.ragbox-index` |
|
|
124
136
|
| Diagnose local setup issues | `ragbox doctor` |
|
|
125
137
|
| Update the index automatically while editing docs | `ragbox watch ./docs --output-dir ./.ragbox-index --jsonl` |
|
|
126
138
|
| Run the full local service loop | `ragbox start --auth-token <token>` |
|
|
@@ -142,6 +154,11 @@ ragbox query ./.ragbox-index "怎么配置认证?" \
|
|
|
142
154
|
"model": "gpt-4o-mini",
|
|
143
155
|
"apiKey": "sk-..."
|
|
144
156
|
},
|
|
157
|
+
"serve": {
|
|
158
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
159
|
+
"host": "127.0.0.1",
|
|
160
|
+
"port": 8787
|
|
161
|
+
},
|
|
145
162
|
"docs": {
|
|
146
163
|
"rootDir": "./docs",
|
|
147
164
|
"outputDir": "./.ragbox-index"
|
|
@@ -156,7 +173,7 @@ ragbox init
|
|
|
156
173
|
ragbox init --docs-dir ./content --output-dir ./.idx
|
|
157
174
|
```
|
|
158
175
|
|
|
159
|
-
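Relative paths in the config file are resolved from the directory that contains the config file. Server-side deployments can keep `baseUrl`, `model`, and `apiKey` together in a private `ragbox.config.json`, or split them per environment into files such as `ragbox.config.prod.json`. If the config file will be committed to a repository or shared with others, do not write a real `apiKey` in it; use environment variables or a secret manager instead.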
|
|
176
|
+
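Relative paths in the config file are resolved from the directory that contains the config file. Server-side deployments can keep `baseUrl`, `model`, `serve.host`, `serve.port`, and `apiKey` together in a private `ragbox.config.json`, or split them per environment into files such as `ragbox.config.prod.json`. If the config file will be committed to a repository or shared with others, do not write a real `apiKey` in it; use environment variables or a secret manager instead.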
|
|
160
177
|
|
|
161
178
|
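With a single documentation source, the top-level `docs` object is enough and no `--source` flag is needed. Use the optional `sources` map only when a project really has multiple named sources.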
|
|
162
179
|
|
|
@@ -184,6 +201,11 @@ ragbox --config ./ragbox.config.json index
|
|
|
184
201
|
"model": "gpt-4o-mini",
|
|
185
202
|
"apiKey": "sk-..."
|
|
186
203
|
},
|
|
204
|
+
"serve": {
|
|
205
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
206
|
+
"host": "127.0.0.1",
|
|
207
|
+
"port": 8787
|
|
208
|
+
},
|
|
187
209
|
"sources": {
|
|
188
210
|
"ragbox": {
|
|
189
211
|
"rootDir": "./ragbox",
|
|
@@ -225,22 +247,23 @@ ragbox --config ./ragbox.config.prod.json query "怎么部署?"
|
|
|
225
247
|
|
|
226
248
|
## Configuration
|
|
227
249
|
|
|
228
|
-
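For server-side use, keep the stable settings together in `ragbox.config.json`: PageIndex paths, docs paths, LLM `baseUrl`, `model`, and, in a private config file, `apiKey`. Environment variables and command-line flags are still supported and suit overrides, secret-manager integration, and one-off runs.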
|
|
250
|
+
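For server-side use, keep the stable settings together in `ragbox.config.json`: PageIndex paths, docs paths, serve host/port, LLM `baseUrl`, `model`, and, in a private config file, `apiKey`. Environment variables and command-line flags are still supported and suit overrides, secret-manager integration, and one-off runs.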
|
|
229
251
|
|
|
230
252
|
Settings are resolved in this order: command-line flags, `ragbox.config.json`, environment variables, defaults.
|
|
231
253
|
|
|
232
|
-
| Setting | Env | CLI | Used by | Default |
|
|
254
|
+
| Setting | Env | Config / CLI | Used by | Default |
|
|
233
255
|
| --- | --- | --- | --- | --- |
|
|
234
|
-
| PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch` | required when indexing |
|
|
235
|
-
| Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch` | `python3` |
|
|
236
|
-
| Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch` | `<folder>/.pageindex` |
|
|
237
|
-
| Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch` | `1` |
|
|
256
|
+
| PageIndex script | `PAGEINDEX_CLI` | `ragbox setup pageindex` writes config | `index`, `watch`, `start` | required when indexing |
|
|
257
|
+
| Python executable | `PAGEINDEX_PYTHON` | `--pageindex-python` | `index`, `watch`, `start` | `python3` |
|
|
258
|
+
| Output directory | `RAGBOX_OUTPUT_DIR` | `--output-dir` | `index`, `watch`, `start` | `<folder>/.pageindex` |
|
|
259
|
+
| Concurrency | `PAGEINDEX_CONCURRENCY` | `--concurrency` | `index`, `watch`, `start` | `1` |
|
|
260
|
+
| PageIndex runner | `PAGEINDEX_RUNNER` | `--pageindex-runner` | `index`, `watch`, `start` | `auto` |
|
|
238
261
|
| API Base URL | `OPENAI_BASE_URL` | `--base-url` | `index`, `watch`, `query` | `https://api.openai.com/v1` |
|
|
239
262
|
| API Key | `OPENAI_API_KEY` | `--api-key` | `index`, `watch`, `query` | required for query, usually also needed by PageIndex |
|
|
240
263
|
| Model | `PAGEINDEX_MODEL`, `LLM_MODEL` | `--model` | `index`, `watch`, `query` | `gpt-4o-mini` |
|
|
241
|
-
| Serve host | `RAGBOX_SERVE_HOST` | `--host` | `serve` | `127.0.0.1` |
|
|
242
|
-
| Serve port | `RAGBOX_SERVE_PORT` | `--port` | `serve` | `8787` |
|
|
243
|
-
| Serve token | `RAGBOX_SERVE_TOKEN` | `--auth-token` | `serve` | none |
|
|
264
|
+
| Serve host | `RAGBOX_SERVE_HOST` | `serve.host`, `--host` | `start`, `serve`, `status`, `doctor` | `127.0.0.1` |
|
|
265
|
+
| Serve port | `RAGBOX_SERVE_PORT` | `serve.port`, `--port` | `start`, `serve`, `status`, `doctor` | `8787` |
|
|
266
|
+
| Serve token | `RAGBOX_SERVE_TOKEN` | `serve.authToken`, `--auth-token` | `start`, `serve` | none |
|
|
244
267
|
| Watch debounce | `RAGBOX_WATCH_DEBOUNCE_MS` | `--debounce-ms` | `watch` | `500` |
|
|
245
268
|
| Watch retry attempts | `RAGBOX_WATCH_RETRY_ATTEMPTS` | `--retry-attempts` | `watch` | `0` |
|
|
246
269
|
| Watch retry delay | `RAGBOX_WATCH_RETRY_DELAY_MS` | `--retry-delay-ms` | `watch` | `1000` |
|
|
@@ -333,7 +356,7 @@ ragbox inspect --all-sources --json
|
|
|
333
356
|
|
|
334
357
|
### `ragbox status [target]`
|
|
335
358
|
|
|
336
|
-
Checks whether the index is ready to query
|
|
359
|
+
Checks whether the index is ready to query and probes whether `/health` on the local HTTP server is reachable. The server probe uses `serve.host` / `serve.port` first, then `RAGBOX_SERVE_HOST` / `RAGBOX_SERVE_PORT`, and defaults to `127.0.0.1:8787`.
|
|
337
360
|
|
|
338
361
|
```bash
|
|
339
362
|
ragbox status ./.ragbox-index
|
|
@@ -343,7 +366,7 @@ ragbox status --json
|
|
|
343
366
|
|
|
344
367
|
### `ragbox doctor [target]`
|
|
345
368
|
|
|
346
|
-
Checks the local config, PageIndex CLI path, LLM settings, API key
|
|
369
|
+
Checks the local config, PageIndex CLI path, LLM settings, API key presence, index validity, and whether the local ragbox HTTP server is healthy.
|
|
347
370
|
|
|
348
371
|
```bash
|
|
349
372
|
ragbox doctor
|
|
@@ -409,23 +432,40 @@ indexes/
|
|
|
409
432
|
|
|
410
433
|
### `ragbox start [folder]`
|
|
411
434
|
|
|
412
|
-
|
|
435
|
+
Runs the complete local service loop: starts watch, serves the HTTP query API, and keeps indexes fresh.
|
|
413
436
|
|
|
414
437
|
```bash
|
|
415
438
|
ragbox start
|
|
416
439
|
ragbox start --auth-token dev-token
|
|
417
440
|
ragbox start --host 127.0.0.1 --port 8787 --jsonl
|
|
441
|
+
ragbox start --background
|
|
418
442
|
ragbox start --source ragbox
|
|
419
443
|
ragbox start --all-sources
|
|
420
444
|
ragbox start ./docs --output-dir ./.ragbox-index
|
|
421
445
|
```
|
|
422
446
|
|
|
423
|
-
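Once `ragbox setup pageindex` has prepared the default local config and you want a single foreground process for local development, an internal service, or a container, prefer `start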
|
|
447
|
+
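Once `ragbox setup pageindex` has prepared the default local config and you want a single foreground process for local development, an internal service, or a container, prefer `start`. HTTP `serve` starts as soon as the watchers are registered, so `/` and `/health` already respond while the initial index is still running. `/health` returns 503 until the first index snapshot is query-ready; the index snapshot held by serve is reloaded after the initial index completes and after every successful watch update.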
|
|
448
|
+
|
|
449
|
+
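With `--background`, `start` detaches from the current terminal and runs in the background. By default the background process writes stdout/stderr to `./ragbox.log` and its PID to `./ragbox.pid`. Override those paths with `--log-file <path>` and `--pid-file <path>`; pass `--no-pid-file` if you do not want a PID file.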
|
|
450
|
+
|
|
451
|
+
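Running `ragbox stop` from the same working directory reads `./ragbox.pid` and stops the matching background process. If you started with a custom pid file, pass the same `--pid-file <path>` when stopping.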
|
|
424
452
|
|
|
425
453
|
With multiple configured sources, `ragbox start` starts all of them by default. Use `--source ragbox,icharts` to limit the scope, or `--all-sources` to make the global start explicit.
|
|
426
454
|
|
|
427
455
|
`start` does not create or modify `ragbox.config.json`; run `ragbox setup pageindex` first for the default local setup, or configure manually with `ragbox init` if you want to manage PageIndex yourself.
|
|
428
456
|
|
|
457
|
+
### `ragbox stop`
|
|
458
|
+
|
|
459
|
+
Reads `./ragbox.pid` in the current working directory and stops the background process started by `ragbox start --background`.
|
|
460
|
+
|
|
461
|
+
```bash
|
|
462
|
+
ragbox stop
|
|
463
|
+
ragbox stop --pid-file /var/run/ragbox.pid
|
|
464
|
+
ragbox stop --force
|
|
465
|
+
```
|
|
466
|
+
|
|
467
|
+
By default, `stop` sends `SIGTERM`, waits for the process to exit, and then removes the pid file. With `--force`, it sends `SIGKILL`.
|
|
468
|
+
|
|
429
469
|
### `ragbox serve [target]`
|
|
430
470
|
|
|
431
471
|
Starts a foreground HTTP server so external systems can query the docs through a REST API. Before using it, build the index with `ragbox index`, or keep it fresh with `ragbox watch`.
|
|
@@ -456,6 +496,7 @@ Public HTTP contract:
|
|
|
456
496
|
```bash
|
|
457
497
|
curl http://127.0.0.1:8787/
|
|
458
498
|
curl http://127.0.0.1:8787/health
|
|
499
|
+
ragbox status ./.ragbox-index
|
|
459
500
|
|
|
460
501
|
curl -H "Authorization: Bearer dev-token" \
|
|
461
502
|
http://127.0.0.1:8787/indexes
|
|
@@ -484,7 +525,7 @@ curl -X POST http://localhost:8787/query \
|
|
|
484
525
|
curl -X POST http://localhost:8787/reload
|
|
485
526
|
```
|
|
486
527
|
|
|
487
|
-
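The first version of `serve` targets local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` directly as a static directory, because it can contain source document text. Browser widgets should not carry the ragbox token directly; have them call your own backend first, let the backend handle user login, rate limiting, and auditing, and then forward requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure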
|
|
528
|
+
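The first version of `serve` targets local services, internal services, container sidecars, and docs backends. Do not expose `.ragbox-index` directly as a static directory, because it can contain source document text. Browser widgets should not carry the ragbox token directly; have them call your own backend first, let the backend handle user login, rate limiting, and auditing, and then forward requests to `ragbox serve`. In production, bind to localhost or an internal network address and configure `serve.authToken`, `--auth-token`, or `RAGBOX_SERVE_TOKEN`.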
|
|
488
529
|
|
|
489
530
|
### `ragbox watch <folder>`
|
|
490
531
|
|
|
@@ -558,8 +599,9 @@ ragbox query ./.ragbox-index "..."
|
|
|
558
599
|
- Store the output directory outside the source tree, for example `/var/lib/ragbox/docs-index`
|
|
559
600
|
- Multi-replica apps need to read the same complete index; mount a read-only volume or ship the index with the deployment artifact
|
|
560
601
|
- Keep API keys in a private server config, environment variables, or a secret manager; do not commit real keys
|
|
561
|
-
- When `serve` is bound beyond localhost, use `RAGBOX_SERVE_TOKEN` or `--auth-token
|
|
602
|
+
- When `serve` is bound beyond localhost, use `serve.authToken`, `RAGBOX_SERVE_TOKEN`, or `--auth-token`; treat the token as a secret if the config will be committed or shared
|
|
562
603
|
- Start with `--concurrency 1`; raise it only after checking PageIndex and model-service rate limits
|
|
604
|
+
- Keep the default `--pageindex-runner auto` for Markdown/MDX indexes; it prefers warm PageIndex workers and automatically falls back to the single-file CLI when they are unavailable
|
|
563
605
|
- For zero-downtime updates, index into a staging directory first and switch the read directory after it succeeds
|
|
564
606
|
|
|
565
607
|
Example private server config:
|
|
@@ -569,13 +611,19 @@ ragbox query ./.ragbox-index "..."
|
|
|
569
611
|
"version": 1,
|
|
570
612
|
"pageIndex": {
|
|
571
613
|
"cli": "/opt/PageIndex/run_pageindex.py",
|
|
572
|
-
"python": "/opt/pageindex-venv/bin/python"
|
|
614
|
+
"python": "/opt/pageindex-venv/bin/python",
|
|
615
|
+
"runner": "auto"
|
|
573
616
|
},
|
|
574
617
|
"llm": {
|
|
575
618
|
"baseUrl": "https://api.openai.com/v1",
|
|
576
619
|
"model": "gpt-4o-mini",
|
|
577
620
|
"apiKey": "sk-..."
|
|
578
621
|
},
|
|
622
|
+
"serve": {
|
|
623
|
+
"authToken": "YOUR_RAGBOX_SERVE_TOKEN",
|
|
624
|
+
"host": "127.0.0.1",
|
|
625
|
+
"port": 8787
|
|
626
|
+
},
|
|
579
627
|
"docs": {
|
|
580
628
|
"rootDir": "/srv/app/docs",
|
|
581
629
|
"outputDir": "/var/lib/ragbox/docs-index"
|
|
@@ -590,14 +638,19 @@ ragbox --config ./ragbox.config.prod.json query "怎么配置认证?"
|
|
|
590
638
|
|
|
591
639
|
### Run In The Background
|
|
592
640
|
|
|
593
|
-
`ragbox start`
|
|
641
|
+
For temporary local or internal testing, `ragbox start --background` detaches the same start loop from the current terminal:
|
|
594
642
|
|
|
595
|
-
|
|
643
|
+
```bash
|
|
644
|
+
ragbox --config ./ragbox.config.prod.json start \
|
|
645
|
+
--background
|
|
646
|
+
```
|
|
596
647
|
|
|
597
648
|
```bash
|
|
598
|
-
|
|
649
|
+
ragbox stop
|
|
599
650
|
```
|
|
600
651
|
|
|
652
|
+
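For a long-running server, a process supervisor is still recommended, so the process restarts automatically after a crash and startup ordering is clearer.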
|
|
653
|
+
|
|
601
654
|
For Linux servers, `systemd` is recommended:
|
|
602
655
|
|
|
603
656
|
```ini
|