docdex 0.1.4 → 0.1.6
- package/CHANGELOG.md +8 -0
- package/LICENSE +1 -1
- package/README.md +232 -38
- package/package.json +3 -2
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,13 @@
 # Changelog
 
+## 0.1.6
+- Align with MCP spec fixes (notification handling, CallToolResult content payloads, underscore tool names) so Codex and other clients stay stable.
+- Publish npm wrapper with the latest MCP-compliant binary.
+
+## 0.1.5
+- Publish the MCP-enabled CLI wrapper (use `docdex mcp` for MCP clients) and align docs with the new stdio mode.
+- Keep npm version in sync with the MCP release for binary downloads.
+
 ## 0.1.4
 - Version bump for republish (0.1.3 already exists on npm).
 
package/LICENSE
CHANGED
package/README.md
CHANGED
@@ -1,53 +1,247 @@
-# Docdex
+# Docdex
 
-Docdex is a lightweight, local
+Docdex is a lightweight, local documentation indexer/search daemon. It runs per-project, keeps an on-disk index of your markdown/text docs, and serves top-k snippets over HTTP or CLI for any coding assistant or tool—no external services or uploads required.
 
-## Install
+## Install via npm
+- Requires Node.js >= 18.
+- Install: `npm i -g docdex` (or run `npx docdex --version` to verify).
+- Commands: `docdex` (alias `docdexd`) downloads the right binary for your platform from the matching GitHub release.
+- Supported targets: macOS (arm64, x64), Linux glibc (arm64, x64), Linux musl (arm64, x64), Windows (x64, arm64); installer fetches the matching platform release asset.
+- If you publish from a fork, set `DOCDEX_DOWNLOAD_REPO=<owner/repo>` before installing so the downloader fetches your release assets.
+- Distribution: binaries stay in GitHub Releases (small npm package); postinstall fetches `docdexd-<platform>.tar.gz` matching the npm version.
+- Publishing uses npm Trusted Publishing (OIDC) — no NPM token needed; see `.github/workflows/release.yml`.
+
+## Features at a glance
+- Per-repo, local indexing of Markdown/text files (tantivy-backed; no network calls).
+- HTTP API (`/search`, `/snippet`, `/healthz`) and CLI (`query`, `ingest`, `self-check`) share the same index.
+- Live file watching while serving for incremental updates.
+- Security knobs: TLS (manual certs or Certbot), auth token required by default (disable with `--secure-mode=false`), loopback-only allowlist by default, default rate limiting, request-size limits, strict state-dir perms, audit log, chroot/privilege drop/unshare net (Unix).
+- Output ready for coding assistants: summaries, snippets, and doc metadata.
+- AI-friendly: `GET /ai-help` returns a JSON playbook (endpoints, CLI commands, limits, best practices) for agents.
+
+## What it does
+- Indexes Markdown/text docs inside a repo and stores them locally (tantivy-based index under `<repo>/.docdex/index` by default).
+- Serves the same index over HTTP (`/search`, `/snippet`, `/healthz`) and via CLI (`query`, `ingest`, `self-check`), so automation and interactive use share one dataset.
+- Watches files while serving to incrementally ingest changes.
+- Hardened defaults: loopback binding, TLS enforcement on non-loopback, auth token required by default (disable with `--secure-mode=false`), loopback-only allowlist and default rate limit (60 req/min) in secure mode, audit log enabled, and strict state-dir perms.
+
+## How it works
+1) `docdexd index` builds the on-disk index for your repo (or reuses a legacy `.gpt-creator/docdex/index` if present).
+2) `docdexd serve` loads that index, starts a file watcher for incremental updates, and exposes the HTTP API.
+3) HTTP clients or the CLI (`docdexd query`) read from the same index; `ingest` can update a single file without full reindexing.
+4) Optional TLS/auth/rate-limit settings secure remote access; audit logging can record access actions.
+
+## Quick start
 ```bash
-#
+# install (npm)
 npm i -g docdex
-
-# One-off use
+# or use once
 npx docdex --version
+
+# full index for a repo/workspace
+docdexd index --repo /path/to/repo
+
+# serve HTTP API with live file watching (secure mode requires an auth token)
+docdexd serve --repo /path/to/repo --host 127.0.0.1 --port 46137 --log info --auth-token <token>
+# for local, token-free use, add --secure-mode=false
+# docdexd serve --repo /path/to/repo --host 127.0.0.1 --port 46137 --log info --secure-mode=false
+
+# ad-hoc search via CLI (JSON)
+docdexd query --repo /path/to/repo --query "otp flow" --limit 5
 ```
 
-##
--
--
+## TL;DR for agents
+- Use Docdex for repo docs: run `docdexd index --repo .` once, then either `docdexd serve --repo . --host 127.0.0.1 --port 46137` for HTTP, or `docdexd mcp --repo . --log warn` for MCP.
+- Add `.docdex/` to `.gitignore` so indexes aren’t committed.
+- When MCP-aware, register a server named `docdex` that runs `docdexd mcp --repo . --log warn --max-results 8`, then call `docdex_search` before coding and `docdex_index` when stale.
+- Prefer summary-first (snippets=false), fetch specific snippets only when needed, keep queries short, and respect token estimates.
 
-##
--
--
+## Usage cheat sheet
+- Build index: `docdexd index --repo <path>` (add `--exclude-*` to skip paths).
+- Serve with watcher: `docdexd serve --repo <path> --host 127.0.0.1 --port 46137 --log warn --auth-token <token>` (secure mode also allowlists loopback and rate-limits by default; add `--allow-ip`/`--secure-mode=false`/`--rate-limit-per-min` as needed for remote use).
+- Secure serving: add `--auth-token <token>` (required by default); use TLS with `--tls-cert/--tls-key` or `--certbot-domain <domain>`.
+- Single-file ingest: `docdexd ingest --repo <path> --file docs/new.md` (honors excludes).
+- Query via CLI: `docdexd query --repo <path> --query "term" --limit 4`.
+- Git hygiene: add `.docdex/` (and especially `.docdex/index/`) to your repo’s `.gitignore` so index artifacts never get committed.
+- Health check: `curl http://127.0.0.1:46137/healthz`.
+- Summary-only search responses: `curl "http://127.0.0.1:46137/search?q=foo&snippets=false"`; fetch snippets only for top hits.
+- Token budgets: `curl "http://127.0.0.1:46137/search?q=foo&max_tokens=800"` to drop hits that would exceed your prompt budget; pair with `snippets=false` then fetch 1–2 snippets you keep.
+- Text-only snippets: append `text_only=true` to `/snippet/:doc_id` or start `serve` with `--strip-snippet-html` (or `--disable-snippet-text` to return metadata only).
+- Keep requests compact: defaults enforce `max_query_bytes=4096` and `max_request_bytes=16384`; keep queries short and leave `--max-limit` low (default 8) to avoid oversized responses.
+- Prompt hygiene: in agent prompts, normalize whitespace and include only `rel_path`, `summary`, and trimmed `snippet` (omit `score`/`token_estimate`/`doc_id`).
+- Trim noise early: use `--exclude-dir` and `--exclude-prefix` to keep vendor/build/cache/secrets out of the index so snippets stay relevant and short.
+- Quiet logging for agents: run `docdexd serve --log warn --access-log=false` if you marshal responses elsewhere to cut log overhead.
+- Cache hits client-side: store `doc_id` ↔ `rel_path` ↔ `summary` to avoid repeat snippet calls; fetch snippets only for new doc_ids.
+- Agent help: `curl http://127.0.0.1:46137/ai-help` (requires auth if configured; include `Authorization: Bearer <token>` when you’ve set `--auth-token`). The response includes a short MCP registration recipe.
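The summary-first and token-budget calls in the cheat sheet can also be issued from a script. The following is a minimal Python sketch, assuming the default address `127.0.0.1:46137` and the `q`, `limit`, `snippets`, and `max_tokens` query parameters as this README describes them. The helper name `build_search_request` is hypothetical, and the request is only constructed here, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:46137"  # default bind address from this README


def build_search_request(query: str, limit: int = 4, snippets: bool = False,
                         max_tokens: Optional[int] = None,
                         token: Optional[str] = None) -> Request:
    """Build (but do not send) a GET /search request. Hypothetical helper."""
    params = {"q": query, "limit": limit,
              "snippets": "true" if snippets else "false"}
    if max_tokens is not None:
        params["max_tokens"] = max_tokens
    req = Request(f"{BASE_URL}/search?{urlencode(params)}")
    if token:
        # Secure mode (the default) expects a bearer token on HTTP calls.
        req.add_header("Authorization", f"Bearer {token}")
    return req


req = build_search_request("otp flow", max_tokens=800, token="example-token")
print(req.full_url)
```

Once the daemon is running, the returned object can be passed to `urllib.request.urlopen` and the body decoded as JSON.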
 
-##
--
--
--
--
+## Versioning
+- Semantic versioning with tagged releases (`vX.Y.Z`). The Rust crate and npm package share the same version.
+- Conventional Commits drive release notes via Release Please; it opens release PRs that bump `Cargo.toml` and `npm/package.json`, update changelogs, and creates the tag/release on merge.
+- Pin to a released version when integrating (e.g., in scripts or Dockerfiles) so upgrades are explicit and reversible.
+- If you build from source, the version comes from `Cargo.toml` in this repo; the npm wrapper uses the matching version to fetch binaries.
 
-##
-
-
-
+## Paths and defaults
+- State/index directory: `<repo>/.docdex/index` (if missing but legacy `<repo>/.gpt-creator/docdex/index` exists, Docdex will reuse it and warn). The directory is created with `0700` permissions by default.
+- HTTP API: defaults to `127.0.0.1:46137` when serving.
+- Docdex data and logs stay inside the repo; no external services.
 
-
-
+## Configuration knobs
+- `--repo <path>`: workspace root to index (defaults to `.`).
+- `--state-dir <path>` / `DOCDEX_STATE_DIR`: override index storage path (relative paths are resolved under `repo`).
+- `--exclude-prefix a,b,c` / `DOCDEX_EXCLUDE_PREFIXES`: extra relative prefixes to skip.
+- `--exclude-dir a,b,c` / `DOCDEX_EXCLUDE_DIRS`: extra directory names to skip anywhere in the tree.
+- `--auth-token <token>` / `DOCDEX_AUTH_TOKEN`: bearer token required in secure mode (default); omit only when starting with `--secure-mode=false`.
+- `--secure-mode <true|false>` / `DOCDEX_SECURE_MODE`: default `true`; when enabled, requires an auth token, loopback allowlist by default, and default rate limiting (60 req/min).
+- `--allow-ip a,b,c` / `DOCDEX_ALLOW_IPS`: optional comma-separated IPs/CIDRs allowed to reach the HTTP API (default: loopback-only in secure mode; allow all when secure mode is disabled).
+- `--tls-cert` / `DOCDEX_TLS_CERT` and `--tls-key` / `DOCDEX_TLS_KEY`: serve HTTPS with the provided cert/key. With TLS enforcement on, non-loopback binds must use HTTPS unless you explicitly opt out.
+- `--certbot-domain <domain>` / `DOCDEX_CERTBOT_DOMAIN`: point TLS at `/etc/letsencrypt/live/<domain>/{fullchain.pem,privkey.pem}` (Certbot). Conflicts with manual `--tls-*`.
+- `--certbot-live-dir <path>` / `DOCDEX_CERTBOT_LIVE_DIR`: use a specific Certbot live dir containing `fullchain.pem` and `privkey.pem`.
+- `--require-tls <true|false>` / `DOCDEX_REQUIRE_TLS`: default `true`. Enforce TLS for non-loopback binds; set to `false` when TLS is already terminated by a trusted proxy.
+- `--insecure` / `DOCDEX_INSECURE_HTTP=true`: allow plain HTTP on non-loopback binds even when TLS is enforced (only use behind a trusted proxy).
+- `--max-limit <n>` / `DOCDEX_MAX_LIMIT`: clamp HTTP `limit` to at most `n` (default: 8).
+- `--max-query-bytes <n>` / `DOCDEX_MAX_QUERY_BYTES`: reject requests whose query string exceeds `n` bytes (default: 4096).
+- `--max-request-bytes <n>` / `DOCDEX_MAX_REQUEST_BYTES`: reject requests whose Content-Length or size hint exceeds `n` bytes (default: 16384).
+- `--rate-limit-per-min <n>` / `DOCDEX_RATE_LIMIT_PER_MIN`: per-IP request budget per minute (default 60 in secure mode when unset/0; 0 disables when secure mode is off).
+- `--rate-limit-burst <n>` / `DOCDEX_RATE_LIMIT_BURST`: optional burst capacity for the rate limiter (defaults to per-minute limit when 0).
+- `--audit-log-path <path>` / `DOCDEX_AUDIT_LOG_PATH`: write audit log JSONL to this path (default: `<state-dir>/audit.log`).
+- `--audit-max-bytes <n>` / `DOCDEX_AUDIT_MAX_BYTES`: rotate audit log after this many bytes (default: 5_000_000).
+- `--audit-max-files <n>` / `DOCDEX_AUDIT_MAX_FILES`: keep at most this many rotated audit files (default: 5).
+- `--audit-disable` / `DOCDEX_AUDIT_DISABLE=true`: disable audit logging entirely.
+- `--strip-snippet-html` / `DOCDEX_STRIP_SNIPPET_HTML=true`: omit `snippet.html` in responses to force text-only snippets (HTML is sanitized by default when present).
+- `--disable-snippet-text` / `DOCDEX_DISABLE_SNIPPET_TEXT=true`: omit snippet text/html in responses entirely (only doc metadata is returned).
+- `--access-log <true|false>` / `DOCDEX_ACCESS_LOG`: emit minimal structured access logs with query values redacted (default: true).
+- `--run-as-uid` / `DOCDEX_RUN_AS_UID`, `--run-as-gid` / `DOCDEX_RUN_AS_GID`: (Unix) drop privileges to the provided UID/GID after startup prep.
+- `--chroot <path>` / `DOCDEX_CHROOT`: (Unix) chroot into `path` before serving; repo/state paths must exist inside that jail.
+- `--unshare-net` / `DOCDEX_UNSHARE_NET=true`: (Linux only) unshare the network namespace before serving (requires CAP_SYS_ADMIN/root); no-op on other platforms.
+- Logging: `--log <level>` on `serve` (defaults to `info`), or `RUST_LOG=docdexd=debug` style filters.
+- Secure mode defaults: when `--secure-mode=true` (default), docdex requires an auth token, allows only loopback IPs unless overridden, and applies a 60 req/min rate limit. Set `--secure-mode=false` to opt out for local dev and adjust `--allow-ip`/rate limits as needed.
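The interaction between `--secure-mode`, `--rate-limit-per-min`, and `--rate-limit-burst` can be easier to follow in code. The sketch below is illustrative only: `effective_rate_limit` restates the defaults listed above, while `TokenBucket` is one common way to implement a per-minute budget with a burst capacity. The README does not say which algorithm docdex uses, so the bucket is an assumption, and both names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def effective_rate_limit(secure_mode: bool, per_min: int) -> Optional[int]:
    """Resolve the per-IP budget the way the flag descriptions state it."""
    if per_min > 0:
        return per_min
    # Unset/0: 60 req/min in secure mode, otherwise limiting is disabled.
    return 60 if secure_mode else None


@dataclass
class TokenBucket:
    """Illustrative token bucket; not docdex's actual limiter."""
    per_min: int
    burst: int = 0
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        # Burst capacity falls back to the per-minute limit when 0.
        self.capacity = self.burst or self.per_min
        self.tokens = float(self.capacity)

    def allow(self, now: float) -> bool:
        """Refill for the elapsed seconds, then spend one token if possible."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.per_min / 60.0)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A server following this approach would keep one bucket per client IP and call `allow(time.monotonic())` for each request.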
 
-
-
+## Indexing rules (see `index/mod.rs`)
+- File types: `.md`, `.markdown`, `.mdx`, `.txt` (extend `DEFAULT_EXTENSIONS` to add more).
+- Skipped directories: broad VCS/build/cache/vendor folders across ecosystems (e.g., `.git`, `.hg`, `.svn`, `node_modules`, `.pnpm-store`, `.yarn*`, `.nx`, `.rollup-cache`, `.webpack-cache`, `.tsbuildinfo`, `.next`, `.nuxt`, `.svelte-kit`, `.mypy_cache`, `.ruff_cache`, `.venv`, `target`, `go-build`, `.gradle`, `.mvn`, `pods`, `.dart_tool`, `.android`, `.serverless`, `.vercel`, `.netlify`, `_build`, `_opam`, `.stack-work`, `elm-stuff`, `library`, `intermediate`, `.godot`, etc.; see `DEFAULT_EXCLUDED_DIR_NAMES` for the full list).
+- Skipped relative prefixes: `logs/`, `.docdex/`, `.docdex/logs/`, `.docdex/tmp/`, `.gpt-creator/logs/`, `.gpt-creator/tmp/`, `.mastercoda/logs/`, `.mastercoda/tmp/`, `docker/.data/`, `docker-data/`, `.docker/`.
+- Snippet sizing: summaries ~360 chars (up to 4 segments); snippets ~420 chars.
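The three filters above (extension, directory name anywhere in the tree, and relative prefix) compose into a single predicate. The following Python sketch shows one way to express that, assuming repo-relative POSIX paths and exact, case-sensitive matching. It is not the logic in `index/mod.rs`; the directory and prefix sets are partial samples copied from the lists above, and `should_index` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Values copied from the rules above; the directory and prefix sets are
# partial samples, not the full defaults.
EXTENSIONS = {".md", ".markdown", ".mdx", ".txt"}
EXCLUDED_DIR_NAMES = {".git", ".hg", ".svn", "node_modules", "target", ".venv"}
EXCLUDED_PREFIXES = ("logs/", ".docdex/", ".gpt-creator/logs/", "docker-data/")


def should_index(rel_path: str) -> bool:
    """Return True if a repo-relative path passes all three filters."""
    path = PurePosixPath(rel_path)
    # 1. Only the listed document extensions are indexed.
    if path.suffix not in EXTENSIONS:
        return False
    # 2. Excluded directory names are skipped anywhere in the tree.
    if any(part in EXCLUDED_DIR_NAMES for part in path.parts[:-1]):
        return False
    # 3. Excluded prefixes are matched from the repo root.
    return not rel_path.startswith(EXCLUDED_PREFIXES)
```

Values passed through `--exclude-dir` and `--exclude-prefix` would extend the second and third collections in this model.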
 
-
-
+## HTTP API
+- `GET /healthz` — returns `ok`; this endpoint is unauthenticated and not rate-limited (IP allowlist still applies).
+- `GET /search?q=<text>&limit=<n>&snippets=<bool>&max_tokens=<u64>` — returns `{ hits: [...] }` with doc id, rel path, summary, snippet, score, token estimate. Set `snippets=false` for summary-only responses; set `max_tokens` to drop hits above your budget.
+- `GET /snippet/:doc_id?window=<lines>&q=<query>&text_only=<bool>&max_tokens=<u64>` — returns `{ doc, snippet }` with optional highlighted snippet; falls back to preview when query highlighting is empty (default window: 40 lines). Set `text_only=true` to drop HTML and shrink payloads; set `max_tokens` to omit the snippet if the doc exceeds your budget.
+- `GET /ai-help` — returns a JSON quickstart for agents (endpoints, CLI commands, limits, best practices).
+- `GET /metrics` — returns Prometheus-style counters for rate-limit/auth/error metrics.
+- If `--auth-token` is set, include `Authorization: Bearer <token>` on HTTP calls (including `/ai-help`).
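Because `/search` hits carry a doc id, rel path, and summary, a client can remember what it has already seen and call `/snippet/:doc_id` only for new documents. The sketch below is a minimal in-memory cache along those lines. The class name `HitCache` is hypothetical, the hit field names `doc_id`, `rel_path`, and `summary` follow the response shape shown elsewhere in this README, and treating `doc_id` as a string is an assumption.

```python
from typing import Dict, Iterable, List


class HitCache:
    """Remember summaries per doc_id so snippets are fetched only once."""

    def __init__(self) -> None:
        self._seen: Dict[str, dict] = {}

    def update(self, hits: Iterable[dict]) -> List[str]:
        """Store rel_path/summary per hit; return doc_ids not seen before."""
        new_ids = []
        for hit in hits:
            doc_id = hit["doc_id"]
            if doc_id not in self._seen:
                new_ids.append(doc_id)
            # Refresh the stored metadata even for known documents.
            self._seen[doc_id] = {
                "rel_path": hit["rel_path"],
                "summary": hit["summary"],
            }
        return new_ids

    def summary(self, doc_id: str) -> str:
        """Return the cached summary for a known doc_id."""
        return self._seen[doc_id]["summary"]
```

A caller would run a summary-only search, feed the hits to `update`, and request snippets only for the ids it returns.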
+
+## CLI commands
+- `serve --repo <path> [--host 127.0.0.1] [--port 46137] [--log info]` — start HTTP API with file watching for incremental updates.
+- `index --repo <path>` — rebuild the entire index.
+- `ingest --repo <path> --file <file>` — reindex a single file.
+- `query --repo <path> --query "<text>" [--limit 8]` — run a search and print JSON hits.
+- `self-check --repo <path> --terms "foo,bar" [--limit 5]` — scan the index for sensitive terms before enabling access (fails with non-zero exit if any are found; reports sample hits and if more exist). Includes built-in token/password patterns by default; disable with `--include-default-patterns=false` if you only want your provided terms.
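The `self-check` contract described above (sample hits capped by a limit, a flag when more exist, non-zero exit on any match) can be modelled in a few lines. This Python sketch scans an in-memory mapping of path to text rather than the real index. The case-insensitive substring matching and the exit value `1` are assumptions, the built-in token/password patterns are not reproduced, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str]  # (path, matched term)


def scan_for_terms(docs: Dict[str, str], terms: List[str],
                   limit: int = 5) -> Tuple[List[Hit], bool]:
    """Return up to `limit` sample hits and whether more exist."""
    hits: List[Hit] = []
    for path, text in sorted(docs.items()):
        lowered = text.lower()
        for term in terms:
            # Assumption: plain case-insensitive substring matching.
            if term.lower() in lowered:
                hits.append((path, term))
    return hits[:limit], len(hits) > limit


def exit_code(sample: List[Hit]) -> int:
    """Non-zero when anything was found; the value 1 is an assumption."""
    return 1 if sample else 0
```

In a pipeline, a non-zero result would be the signal to stop before exposing the service.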
+
+## Help and command discovery
+- List all commands/flags: `docdexd --help`.
+- Dump help for every subcommand: `docdexd help-all`.
+- See `serve` options (TLS, auth, rate limits, watcher): `docdexd serve --help`.
+- Indexing options: `docdexd index --help` (exclude paths, custom state dir).
+- Ad-hoc queries: `docdexd query --help`.
+- Self-check scanner options: `docdexd self-check --help`.
+- Agent help endpoint: `curl http://127.0.0.1:46137/ai-help` (include `Authorization: Bearer <token>` if `--auth-token` is set) for a JSON listing of endpoints, limits, and best practices.
+- MCP help/registration: `docdexd mcp --help` lists MCP flags; register with your client using `docdexd mcp --repo <repo> --log warn --max-results 8` (Codex CLI shortcut: `codex mcp add docdex -- docdexd mcp --repo <repo> --log warn --max-results 8`).
+- Environment variables mirror the flags (e.g., `DOCDEX_AUTH_TOKEN`, `DOCDEX_TLS_CERT`, `DOCDEX_MAX_LIMIT`).
+- Command overview (same as `docdexd --help`):
+  - `serve` — run HTTP API with watcher and security knobs.
+  - `index` — build or rebuild the whole index.
+  - `ingest` — reindex a single file.
+  - `query` — run an ad-hoc search, JSON to stdout.
+  - `self-check` — scan index for sensitive terms with report.
+  - `help-all` — print help for every command/flag in one output.
+
+## Troubleshooting
+- Stale index: re-run `docdexd index --repo <path>`.
+- Port conflicts: change `--host/--port`.
+
+## Security considerations
+- Default bind is `127.0.0.1`; keep it unless you are behind a trusted reverse proxy/firewall. Avoid `--host 0.0.0.0` on untrusted networks.
+- By default, non-loopback binds require TLS; opt out only with `--require-tls=false` or `--insecure` when traffic is already terminating at a trusted proxy.
+- If exposing externally, place a reverse proxy in front, terminate TLS, and require auth (basic/OAuth/mTLS) plus IP/VPN allowlisting. Example (nginx):
+```
+server {
+  listen 443 ssl;
+  server_name docdex.example.com;
+  ssl_certificate /path/fullchain.pem;
+  ssl_certificate_key /path/privkey.pem;
+  auth_basic "Protected";
+  auth_basic_user_file /etc/nginx/.htpasswd; # or hook OAuth/mTLS instead
+  allow 10.0.0.0/8;
+  allow 192.168.0.0/16;
+  deny all;
+  location / {
+    proxy_pass http://127.0.0.1:46137;
+    proxy_set_header Host $host;
+  }
+}
+```
+- Trim the corpus: prefer a curated staging directory, or use `--exclude-dir` / `--exclude-prefix` to keep secrets/private paths out before indexing; the watcher will ingest any in-scope file change under `repo`.
+- Mind logs: avoid verbose logging in production if snippets/paths are sensitive; reverse-proxy access logs can also capture query terms and paths.
+- Least privilege: run docdex under a low-privilege user/container and keep the state dir on a path with restricted permissions.
+- Validate before publish: run `docdexd query` for sensitive keywords to confirm no hits; store indexes on encrypted disks if required.
+- Optional hardening: require an auth token on the HTTP API (or proxy); enforce TLS when not on localhost (default) or explicitly opt out with `--require-tls=false`/`--insecure` only behind a trusted proxy; enable rate limiting (`--rate-limit-per-min`) and clamp `limit`/request sizes (`--max-limit`, `--max-query-bytes`, `--max-request-bytes`); escape/sanitize snippet HTML if embedding or disable snippets entirely with `--disable-snippet-text`; state dir is created `0700` by default—keep it under an unprivileged user, optionally `--run-as-uid/--run-as-gid`, `--chroot`, or containerize; keep access logging minimal/redacted (`--access-log`), and run `self-check` for sensitive terms before exposing the service; for at-rest confidentiality, place the state dir on encrypted storage or use host-level disk encryption.
+
+## Integrating with LLM tools
+Docdex is tool-agnostic. Drop-in recipe for agents/codegen tools:
+- Start once per repo: `docdexd index --repo <repo>` then `docdexd serve --repo <repo> --host 127.0.0.1 --port 46137 --log warn` (or use the CLI directly without serving).
+- Configure via env: `DOCDEX_STATE_DIR` (index location), `DOCDEX_EXCLUDE_PREFIXES`, `DOCDEX_EXCLUDE_DIRS`, `RUST_LOG=docdexd=debug` (optional verbose logs).
+- Query over HTTP: `GET /search?q=<text>&limit=<n>` returns `{"hits":[{"doc_id","rel_path","score","summary","snippet","token_estimate"}...]}`; `GET /snippet/:doc_id` fetches a focused snippet plus doc metadata.
+- Or query via CLI: `docdexd query --repo <repo> --query "<text>" --limit 8` (JSON to stdout).
+- Health check: `GET /healthz` should return `ok` before issuing search requests.
+- Inject snippets into prompts:
 ```
+"You are building features for this repo. Use the following documentation snippets for context. If a snippet cites a path, keep that path in your response. Snippets:\n<insert docdex snippets here>\nQuestion: <your question>"
+```
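The prompt template above combines naturally with the prompt-hygiene advice (normalized whitespace, only `rel_path`, `summary`, and a trimmed snippet). The Python sketch below shows one way to assemble such a prompt. It assumes each hit's snippet has already been reduced to a plain-text string, which may not match the raw response shape, and `build_prompt`, `_squash`, and the bracketed path layout are hypothetical choices rather than anything Docdex prescribes.

```python
import re
from typing import Iterable

PROMPT_TEMPLATE = (
    "You are building features for this repo. Use the following documentation "
    "snippets for context. If a snippet cites a path, keep that path in your "
    "response. Snippets:\n{snippets}\nQuestion: {question}"
)


def _squash(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def build_prompt(hits: Iterable[dict], question: str,
                 max_snippet_chars: int = 420) -> str:
    """Keep only rel_path, summary, and a trimmed snippet from each hit."""
    blocks = []
    for hit in hits:
        # Assumption: the snippet is a plain-text string (or missing).
        snippet = _squash(hit.get("snippet") or "")[:max_snippet_chars]
        blocks.append(f"[{hit['rel_path']}] {_squash(hit['summary'])}\n{snippet}")
    return PROMPT_TEMPLATE.format(snippets="\n".join(blocks), question=question)
```

Fields such as `score`, `token_estimate`, and `doc_id` are never copied into the prompt, which keeps it short.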
+
+### MCP (optional stdio server for MCP-aware clients)
+Docdex can run as an MCP tool provider over stdio; it does not replace the HTTP daemon—pick whichever fits your agent/editor. If your MCP client supports resource templates, Docdex advertises a `docdex_file` template (`docdex://{path}`) which delegates to `docdex_open`.
+- Run: `docdexd mcp --repo /path/to/repo --log warn --max-results 8` (alias: `--mcp-max-results 8`).
+- Env override: `DOCDEX_MCP_MAX_RESULTS` clamps `docdex_search` results (min 1).
+- Packaging: MCP server is built into the main `docdexd` binary (invoked via `docdexd mcp` or `docdex mcp` from the npm bin); no separate `docdex-mcp` download required.
+- Registering with MCP clients: add a server named `docdex` that runs `docdexd mcp --repo <repo> --log warn`. Example Codex config snippet:
+```json
+{
+  "mcpServers": {
+    "docdex": {
+      "command": "docdexd",
+      "args": ["mcp", "--repo", ".", "--log", "warn", "--max-results", "8"],
+      "env": {}
+    }
+  }
+}
+```
+- MCP quick add commands (popular agents):
+  - Docdex helper: `docdex mcp-add --repo /path/to/repo --log warn --max-results 8` auto-detects supported agents; add `--all` to attempt every known client and print manual steps for UI-only ones, or `--remove` to uninstall.
+  - Codex CLI: `codex mcp add docdex -- docdexd mcp --repo /path/to/repo --log warn --max-results 8`.
+  - Generic JSON config (Cursor, Continue, Windsurf, Cline, Claude Desktop devtools): add the `mcpServers.docdex` block above to your MCP config file (paths vary by client; most accept the `command`/`args` schema shown).
+  - Manual/stdio-only clients: start `docdexd mcp --repo /path/to/repo --log warn --max-results 8` yourself and point the client at that command/binary.
+- Tools exposed (CallToolResult content: result.content[0].text contains JSON):
+  - `docdex_search` — args: `{ "query": "<text>", "limit": <int optional>, "project_root": "<path optional>" }`. Returns `{ "results": [...], "repo_root": "...", "state_dir": "...", "limit": <int>, "project_root": "...", "meta": {...} }`.
+  - `docdex_index` — args: `{ "paths": ["relative/or/absolute"], "project_root": "<path optional>" }`. Empty `paths` reindexes everything; otherwise ingests the listed files.
+  - `docdex_files` — args: `{ "limit": <int optional, default 200, max 1000>, "offset": <int optional, default 0>, "project_root": "<path optional>" }`. Returns `{ "results": [{ "doc_id", "rel_path", "summary", "token_estimate" }], "total", "limit", "offset", "repo_root", "project_root" }`.
+  - `docdex_open` — args: `{ "path": "<relative file>", "start_line": <int optional>, "end_line": <int optional>, "project_root": "<path optional>" }`. Returns `{ "path", "start_line", "end_line", "total_lines", "content", "repo_root", "project_root" }` (rejects paths outside repo and large files).
+  - `docdex_stats` — args: `{ "project_root": "<path optional>" }`. Returns `{ "num_docs", "state_dir", "index_size_bytes", "segments", "avg_bytes_per_doc", "generated_at_epoch_ms", "last_updated_epoch_ms", "repo_root", "project_root" }`.
+- Example calls:
+  - Initialize: `{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}`
+  - Initialize with workspace root: `{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"workspace_root":"/path/to/repo"}}` (must match the server repo; sets default project_root for later calls)
+  - List tools: `{"jsonrpc":"2.0","id":2,"method":"tools/list"}`
+  - Reindex: `{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"docdex_index","arguments":{"paths":[]}}}`
+  - Search: `{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"docdex_search","arguments":{"query":"payment auth flow","limit":3,"project_root":"/repo"}}}`
+  - List files: `{"jsonrpc":"2.0","id":5,"method":"tools/call","params":{"name":"docdex_files","arguments":{"limit":10,"offset":0}}}`
+  - Open file: `{"jsonrpc":"2.0","id":6,"method":"tools/call","params":{"name":"docdex_open","arguments":{"path":"docs/readme.md","start_line":1,"end_line":20}}}`
+  - Stats: `{"jsonrpc":"2.0","id":7,"method":"tools/call","params":{"name":"docdex_stats","arguments":{}}}`
+- Errors: invalid JSON → code -32700; unsupported/missing `jsonrpc` → -32600; unknown tool/method → -32601; invalid params (empty query, bad args, project_root mismatch) → -32602; internal errors include a `reason` string in `error.data`.
+- Agent guidance: call `docdex_search` with concise queries before coding; fetch only a few hits; if results look stale, call `docdex_index`; keep using HTTP/CLI if your stack isn’t MCP-aware.
+- Help: `docdexd mcp --help` shows MCP flags and defaults; `docdexd help-all` includes an MCP section listing tools and usage.
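The example calls and the `result.content[0].text` convention above can be wrapped in two small helpers. This Python sketch builds a `tools/call` request and decodes a reply. The helper names are hypothetical, and it assumes each message is a single JSON document; how messages are framed on stdio is not specified in this README.

```python
import json
from typing import Any, Dict


def tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as compact JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }, separators=(",", ":"))


def parse_tool_result(message_text: str) -> Any:
    """Decode the JSON payload carried in result.content[0].text."""
    message = json.loads(message_text)
    if "error" in message:
        # Codes such as -32602 (invalid params) arrive in error.code.
        err = message["error"]
        raise RuntimeError(f"MCP error {err.get('code')}: {err.get('message')}")
    return json.loads(message["result"]["content"][0]["text"])


request = tool_call(4, "docdex_search",
                    {"query": "payment auth flow", "limit": 3})
print(request)
```

The double decode in `parse_tool_result` is the important part: the outer JSON-RPC envelope is parsed first, then the text payload inside it.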
 
-##
--
--
-- `
--
--
-
-## Notes
-- Release assets are expected to be named `docdexd-<platform>.tar.gz` with a matching `.sha256`.
-- License: MIT (see `LICENSE`).
-- Changelog: see `CHANGELOG.md`.
+## HTTPS and Certbot
+- TLS accepts PKCS8, PKCS1/RSA, and SEC1/EC private keys (compatible with Certbot output).
+- Manual cert/key: `docdexd serve --repo <repo> --tls-cert /path/fullchain.pem --tls-key /path/privkey.pem`.
+- Certbot helper: `docdexd serve --repo <repo> --host 0.0.0.0 --port 46137 --certbot-domain docs.example.com` (uses `/etc/letsencrypt/live/docs.example.com/{fullchain.pem,privkey.pem}`), or pass `--certbot-live-dir /custom/live/dir`.
+- When using Certbot, set a deploy hook to restart/reload docdex after renewals (e.g., `certbot renew --deploy-hook "systemctl restart docdexd.service"` or kill -HUP your process supervisor).
+- If binding to 443 directly, you need privileges; otherwise, keep docdex on 127.0.0.1 and let a reverse proxy terminate TLS.
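The rule that non-loopback binds need HTTPS unless `--require-tls=false` or `--insecure` is given can be written as a small decision function. The Python sketch below only illustrates that rule as this README words it. It is not docdex's startup check, the function names are hypothetical, and treating the name `localhost` as loopback is an assumption.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True for loopback IP literals and the name 'localhost'."""
    if host == "localhost":
        return True  # assumption: the name is treated like 127.0.0.1
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames are treated as non-loopback in this sketch.
        return False


def must_use_tls(host: str, require_tls: bool = True,
                 insecure: bool = False) -> bool:
    """Non-loopback binds need HTTPS unless the operator opted out."""
    if is_loopback(host):
        return False
    return require_tls and not insecure
```

Under this model, `--host 0.0.0.0` with default flags calls for a cert/key or Certbot setup, while `127.0.0.1` behind a reverse proxy does not.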
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "docdex",
-  "version": "0.1.
+  "version": "0.1.6",
   "description": "Docdex CLI as an npm-installable binary wrapper.",
   "bin": {
     "docdex": "bin/docdex.js",
@@ -21,7 +21,8 @@
   },
   "os": [
     "darwin",
-    "linux"
+    "linux",
+    "win32"
   ],
   "cpu": [
     "arm64",