docdex 0.2.20 → 0.2.22
- package/CHANGELOG.md +6 -1
- package/README.md +217 -272
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -1,6 +1,11 @@
 # Changelog
 
-## 0.2.
+## 0.2.22
+- Add Smithery session config schema metadata (titles/descriptions, defaults, example config) for local MCP sessions.
+- Enrich MCP tools with titles, descriptions, parameter descriptions, and annotations to improve Smithery scoring.
+- Expose MCP prompts and resources (with titles/mime types/annotations) for onboarding, incident triage, and refactor planning.
+
+## 0.2.21
 - Prompt for npm updates at CLI start (TTY-only, opt-out via `DOCDEX_UPDATE_CHECK=0`).
 - Export bundled Playwright fetcher for daemon startup (launchd/systemd/schtasks + immediate spawn).
 - Pass `DOCDEX_PLAYWRIGHT_FETCHER` in the npm wrapper when launching the daemon.
package/README.md
CHANGED
|
@@ -1,286 +1,231 @@
|
|
|
1
|
+
[](https://docdex.org)
|
|
2
|
+
[](https://smithery.ai/server/bekirdag/docdex)
|
|
3
|
+

|
|
4
|
+

|
|
5
|
+

|
|
6
|
+

|
|
7
|
+
[](https://lobehub.com/mcp/bekirdag-docdex)
|
|
8
|
+
|
|
9
|
+
<a href="https://glama.ai/mcp/servers/@bekirdag/docdex">
|
|
10
|
+
<img width="380" height="200" src="https://glama.ai/mcp/servers/@bekirdag/docdex/badge" />
|
|
11
|
+
</a>
|
|
12
|
+
|
|
1
13
|
# Docdex
|
|
2
14
|
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
## Features
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
4) Optional TLS/auth/rate-limit settings secure remote access; audit logging can record access actions.
|
|
40
|
-
|
|
41
|
-
## Quick start
|
|
15
|
+
> **Turn your repository into fast, private context that humans and AI can trust.**
|
|
16
|
+
|
|
17
|
+
Docdex is a **local-first indexer and search daemon** for documentation and source code. It sits between your raw files and your AI assistant, providing deterministic search, code intelligence, and persistent memory without ever uploading your code to a cloud vector store.
|
|
18
|
+
|
|
19
|
+
## ⚡ Why Docdex?
|
|
20
|
+
|
|
21
|
+
Most AI tools rely on "grep" (fast but dumb) or hosted RAG (slow and requires uploads). Docdex runs locally, understands code structure, and gives your AI agents a persistent memory.
|
|
22
|
+
|
|
23
|
+
| Problem | Typical Approach | The Docdex Solution |
|
|
24
|
+
| --- | --- | --- |
|
|
25
|
+
| **Finding Context** | `grep`/`rg` (Noisy, literal matches) | **Ranked, structured results** based on intent. |
|
|
26
|
+
| **Code Privacy** | Hosted RAG (Requires uploading code) | **Local-only indexing.** Your code stays on your machine. |
|
|
27
|
+
| **Siloed Search** | IDE-only search bars | **Shared Daemon** serving CLI, HTTP, and MCP clients simultaneously. |
|
|
28
|
+
| **Code Awareness** | String matching | **AST & Impact Graph** to understand dependencies and definitions. |
|
|
29
|
+
|
|
30
|
+
---
|
|
31
|
+
|
|
32
|
+
## 🚀 Features
|
|
33
|
+
|
|
34
|
+
* **📚 Document Indexing:** Rank and summarize repo documentation instantly.
|
|
35
|
+
* **🧠 AST & Impact Graph:** Search by function intent and track downstream dependencies (supports Rust, Python, JS/TS, Go, Java, C++, and more).
|
|
36
|
+
* **💾 Repo Memory:** Stores project facts, decisions, and notes locally.
|
|
37
|
+
* **👤 Agent Memory:** Remembers user preferences (e.g., "Use concise bullet points") across different repositories.
|
|
38
|
+
* **🔌 MCP Native:** Auto-configures for tools like Claude Desktop, Cursor, and Windsurf.
|
|
39
|
+
* **🌐 Web Enrichment:** Optional web search with local LLM filtering (via Ollama).
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## 📦 Set-and-Forget Install
|
|
44
|
+
|
|
45
|
+
Install once, point your agent at Docdex, and it keeps working in the background.
|
|
46
|
+
|
|
47
|
+
### 1. Install via npm (Recommended)
|
|
48
|
+
|
|
49
|
+
Requires Node.js >= 18. This will download the correct binary for your OS (macOS, Linux, Windows).
|
|
50
|
+
|
|
42
51
|
```bash
|
|
43
|
-
# install (npm)
|
|
44
52
|
npm i -g docdex
|
|
45
|
-
# or use once
|
|
46
|
-
npx docdex --version
|
|
47
53
|
|
|
48
|
-
|
|
49
|
-
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### 2. Auto-Configuration
|
|
57
|
+
|
|
58
|
+
If you have any of the following clients installed, Docdex automatically configures them to use the local MCP server:
|
|
59
|
+
|
|
60
|
+
> **Claude Desktop, Cursor, Windsurf, Cline, Roo Code, Continue, VS Code, PearAI, Void, Zed, Codex.**
|
|
61
|
+
|
|
62
|
+
*Note: Restart your AI client after installation.*
|
|
50
63
|
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
64
|
+
---
|
|
65
|
+
|
|
66
|
+
## 🛠️ Usage Workflow
|
|
67
|
+
|
|
68
|
+
### 1. Index a Repository
|
|
69
|
+
|
|
70
|
+
Run this once to build the index and graph data.
|
|
71
|
+
|
|
72
|
+
```bash
|
|
73
|
+
docdexd index --repo /path/to/my-project
|
|
55
74
|
|
|
56
|
-
# ad-hoc chat via CLI (JSON)
|
|
57
|
-
docdexd chat --repo /path/to/repo --query "otp flow" --limit 5
|
|
58
75
|
```
|
|
59
76
|
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
- Health check: `curl http://127.0.0.1:3210/healthz`.
|
|
77
|
-
- Summary-only search responses: `curl "http://127.0.0.1:3210/search?q=foo&snippets=false"`; fetch snippets only for top hits.
|
|
78
|
-
- Repo-only HTTP search (ignore libs index hits): `curl "http://127.0.0.1:3210/search?q=foo&include_libs=false"`.
|
|
79
|
-
- Token budgets: `curl "http://127.0.0.1:3210/search?q=foo&max_tokens=800"` to drop hits that would exceed your prompt budget; pair with `snippets=false` then fetch 1–2 snippets you keep.
|
|
80
|
-
- Text-only snippets: append `text_only=true` to `/snippet/:doc_id` or start `serve` with `--strip-snippet-html` (or `--disable-snippet-text` to return metadata only).
|
|
81
|
-
- Keep requests compact: defaults enforce `max_query_bytes=4096` and `max_request_bytes=16384`; keep queries short and leave `--max-limit` low (default 8) to avoid oversized responses.
|
|
82
|
-
- Prompt hygiene: in agent prompts, normalize whitespace and include only `rel_path`, `summary`, and trimmed `snippet` (omit `score`/`token_estimate`/`doc_id`).
|
|
83
|
-
- Trim noise early: use `--exclude-dir` and `--exclude-prefix` to keep vendor/build/cache/secrets out of the index so snippets stay relevant and short.
|
|
84
|
-
- Quiet logging for agents: run `docdexd serve --log warn --access-log=false` if you marshal responses elsewhere to cut log overhead.
|
|
85
|
-
- Cache hits client-side: store `doc_id` ↔ `rel_path` ↔ `summary` to avoid repeat snippet calls; fetch snippets only for new doc_ids.
|
|
86
|
-
- Agent help: `curl http://127.0.0.1:3210/ai-help` (requires auth if configured; include `Authorization: Bearer <token>` when you’ve set `--auth-token`). The response includes a short MCP registration recipe.
|
|
87
|
-
|
|
88
|
-
## Versioning
|
|
89
|
-
- Semantic versioning with tagged releases (`vX.Y.Z`). The Rust crate and npm package share the same version.
|
|
90
|
-
- Conventional Commits drive release notes via Release Please; it opens release PRs that bump `Cargo.toml` and `npm/package.json`, update changelogs, and creates the tag/release on merge.
|
|
91
|
-
- Pin to a released version when integrating (e.g., in scripts or Dockerfiles) so upgrades are explicit and reversible.
|
|
92
|
-
- If you build from source, the version comes from `Cargo.toml` in this repo; the npm wrapper uses the matching version to fetch binaries.
|
|
93
|
-
|
|
94
|
-
## Paths and defaults
|
|
95
|
-
- State/index directory: `~/.docdex/state/repos/<fingerprint>/index` (legacy `.gpt-creator/docdex/index` is reused with a warning). The directory is created with `0700` permissions by default.
|
|
96
|
-
- HTTP API: defaults to `127.0.0.1:3210` when serving.
|
|
97
|
-
- State and logs stay local; no external services are required.
|
|
98
|
-
|
|
99
|
-
## Configuration knobs
|
|
100
|
-
- `--repo <path>`: workspace root to index (defaults to `.`).
|
|
101
|
-
- `--state-dir <path>` / `DOCDEX_STATE_DIR`: override state storage path (relative paths resolve under the repo root; absolute paths outside the repo are treated as shared base dirs and scoped to `<state-dir>/repos/<repo_id>/index`).
|
|
102
|
-
- `--exclude-prefix a,b,c` / `DOCDEX_EXCLUDE_PREFIXES`: extra relative prefixes to skip.
|
|
103
|
-
- `--exclude-dir a,b,c` / `DOCDEX_EXCLUDE_DIRS`: extra directory names to skip anywhere in the tree.
|
|
104
|
-
- `--auth-token <token>` / `DOCDEX_AUTH_TOKEN`: bearer token required in secure mode (default); omit only when starting with `--secure-mode=false`.
|
|
105
|
-
- `--secure-mode <true|false>` / `DOCDEX_SECURE_MODE`: default `true`; when enabled, requires an auth token, loopback allowlist by default, and default rate limiting (60 req/min).
|
|
106
|
-
- `--allow-ip a,b,c` / `DOCDEX_ALLOW_IPS`: optional comma-separated IPs/CIDRs allowed to reach the HTTP API (default: loopback-only in secure mode; allow all when secure mode is disabled).
|
|
107
|
-
- `--tls-cert` / `DOCDEX_TLS_CERT` and `--tls-key` / `DOCDEX_TLS_KEY`: serve HTTPS with the provided cert/key. With TLS enforcement on, non-loopback binds must use HTTPS unless you explicitly opt out.
|
|
108
|
-
- `--certbot-domain <domain>` / `DOCDEX_CERTBOT_DOMAIN`: point TLS at `/etc/letsencrypt/live/<domain>/{fullchain.pem,privkey.pem}` (Certbot). Conflicts with manual `--tls-*`.
|
|
109
|
-
- `--certbot-live-dir <path>` / `DOCDEX_CERTBOT_LIVE_DIR`: use a specific Certbot live dir containing `fullchain.pem` and `privkey.pem`.
|
|
110
|
-
- `--require-tls <true|false>` / `DOCDEX_REQUIRE_TLS`: default `true`. Enforce TLS for non-loopback binds; set to `false` when TLS is already terminated by a trusted proxy.
|
|
111
|
-
- `--insecure` / `DOCDEX_INSECURE_HTTP=true`: allow plain HTTP on non-loopback binds even when TLS is enforced (only use behind a trusted proxy).
|
|
112
|
-
- `--max-limit <n>` / `DOCDEX_MAX_LIMIT`: clamp HTTP `limit` to at most `n` (default: 8).
|
|
113
|
-
- `--max-query-bytes <n>` / `DOCDEX_MAX_QUERY_BYTES`: reject requests whose query string exceeds `n` bytes (default: 4096).
|
|
114
|
-
- `--max-request-bytes <n>` / `DOCDEX_MAX_REQUEST_BYTES`: reject requests whose Content-Length or size hint exceeds `n` bytes (default: 16384).
|
|
115
|
-
- `--rate-limit-per-min <n>` / `DOCDEX_RATE_LIMIT_PER_MIN`: per-IP request budget per minute (default 60 in secure mode when unset/0; 0 disables when secure mode is off).
|
|
116
|
-
- `--rate-limit-burst <n>` / `DOCDEX_RATE_LIMIT_BURST`: optional burst capacity for the rate limiter (defaults to per-minute limit when 0).
|
|
117
|
-
- `--audit-log-path <path>` / `DOCDEX_AUDIT_LOG_PATH`: write audit log JSONL to this path (default: `<state-dir>/audit.log`).
|
|
118
|
-
- `--audit-max-bytes <n>` / `DOCDEX_AUDIT_MAX_BYTES`: rotate audit log after this many bytes (default: 5_000_000).
|
|
119
|
-
- `--audit-max-files <n>` / `DOCDEX_AUDIT_MAX_FILES`: keep at most this many rotated audit files (default: 5).
|
|
120
|
-
- `--audit-disable` / `DOCDEX_AUDIT_DISABLE=true`: disable audit logging entirely.
|
|
121
|
-
- `--strip-snippet-html` / `DOCDEX_STRIP_SNIPPET_HTML=true`: omit `snippet.html` in responses to force text-only snippets (HTML is sanitized by default when present).
|
|
122
|
-
- `--disable-snippet-text` / `DOCDEX_DISABLE_SNIPPET_TEXT=true`: omit snippet text/html in responses entirely (only doc metadata is returned).
|
|
123
|
-
- `--enable-memory <true|false>` / `DOCDEX_ENABLE_MEMORY`: control memory endpoints (enabled by default via config; set `[memory].enabled=false` or `DOCDEX_ENABLE_MEMORY=0` to disable).
|
|
124
|
-
- `DOCDEX_WEB_ENABLED=1` / `DOCDEX_OFFLINE=1`: enable web fallback or force offline mode.
|
|
125
|
-
- `--access-log <true|false>` / `DOCDEX_ACCESS_LOG`: emit minimal structured access logs with query values redacted (default: true).
|
|
126
|
-
- `--run-as-uid` / `DOCDEX_RUN_AS_UID`, `--run-as-gid` / `DOCDEX_RUN_AS_GID`: (Unix) drop privileges to the provided UID/GID after startup prep.
|
|
127
|
-
- `--chroot <path>` / `DOCDEX_CHROOT`: (Unix) chroot into `path` before serving; repo/state paths must exist inside that jail.
|
|
128
|
-
- `--unshare-net` / `DOCDEX_UNSHARE_NET=true`: (Linux only) unshare the network namespace before serving (requires CAP_SYS_ADMIN/root); no-op on other platforms.
|
|
129
|
-
- Logging: `--log <level>` on `serve` (defaults to `info`), or `RUST_LOG=docdexd=debug` style filters.
|
|
130
|
-
- Secure mode defaults: when `--secure-mode=true` (default), docdex requires an auth token, allows only loopback IPs unless overridden, and applies a 60 req/min rate limit. Set `--secure-mode=false` to opt out for local dev and adjust `--allow-ip`/rate limits as needed.
|
|
131
|
-
|
|
132
|
-
## Indexing rules (see `index/mod.rs`)
|
|
133
|
-
- File types: `.md`, `.markdown`, `.mdx`, `.txt`, `.rs`, `.py`, `.js`, `.jsx`, `.ts`, `.tsx`, `.go` (extend `DEFAULT_EXTENSIONS` to add more).
|
|
134
|
-
- Skipped directories: broad VCS/build/cache/vendor folders across ecosystems (e.g., `.git`, `.hg`, `.svn`, `node_modules`, `.pnpm-store`, `.yarn*`, `.nx`, `.rollup-cache`, `.webpack-cache`, `.tsbuildinfo`, `.next`, `.nuxt`, `.svelte-kit`, `.mypy_cache`, `.ruff_cache`, `.venv`, `target`, `go-build`, `.gradle`, `.mvn`, `pods`, `.dart_tool`, `.android`, `.serverless`, `.vercel`, `.netlify`, `_build`, `_opam`, `.stack-work`, `elm-stuff`, `library`, `intermediate`, `.godot`, etc.; see `DEFAULT_EXCLUDED_DIR_NAMES` for the full list).
|
|
135
|
-
- Skipped relative prefixes: `logs/`, `.docdex/`, `.docdex/logs/`, `.docdex/tmp/`, `.gpt-creator/logs/`, `.gpt-creator/tmp/`, `.mastercoda/logs/`, `.mastercoda/tmp/`, `docker/.data/`, `docker-data/`, `.docker/`.
|
|
136
|
-
- Snippet sizing: summaries ~360 chars (up to 4 segments); snippets ~420 chars.
|
|
137
|
-
|
|
138
|
-
## HTTP API
|
|
139
|
-
- `GET /healthz` — returns `ok`; this endpoint is unauthenticated and not rate-limited (IP allowlist still applies).
|
|
140
|
-
- `GET /search?q=<text>&limit=<n>&snippets=<bool>&max_tokens=<u64>&include_libs=<bool>` — returns `{ hits: [...] }` with doc id, `rel_path`/`path`, `kind` (`doc`|`code`), summary, snippet, score, token estimate. Optional: `force_web`, `skip_local_search`, `no_cache`, `max_web_results`, `llm_filter_local_results`, `diff_mode`, `diff_base`, `diff_head`, `diff_path`, `repo_id`.
|
|
141
|
-
- `GET /snippet/:doc_id?window=<lines>&q=<query>&text_only=<bool>&max_tokens=<u64>` — returns `{ doc, snippet }` with optional highlighted snippet; falls back to preview when query highlighting is empty (default window: 40 lines).
|
|
142
|
-
- `POST /v1/index/rebuild` — rebuild the repo index.
|
|
143
|
-
- `POST /v1/index/ingest` — ingest a single file.
|
|
144
|
-
- `POST /v1/chat/completions` — OpenAI-compatible chat completion with docdex context.
|
|
145
|
-
- `GET /v1/graph/impact` / `GET /v1/graph/impact/diagnostics` — impact graph edges + unresolved imports.
|
|
146
|
-
- `GET /v1/symbols`, `GET /v1/symbols/status` — symbols per file + parser drift status.
|
|
147
|
-
- `GET /v1/ast`, `GET /v1/ast/search`, `POST /v1/ast/query` — AST queries.
|
|
148
|
-
- `POST /v1/memory/store`, `POST /v1/memory/recall` — memory endpoints (enabled by default).
|
|
149
|
-
- `POST /v1/web/search`, `POST /v1/web/fetch`, `POST /v1/web/cache/flush` — web discovery/fetch (requires `DOCDEX_WEB_ENABLED=1`).
|
|
150
|
-
- `GET /ai-help` — JSON quickstart for agents.
|
|
151
|
-
- `GET /metrics` — Prometheus-style counters/gauges (see `docs/ops/browser_guard.md` in the repo).
|
|
152
|
-
- Repo scoping: include `repo_id` in query/body or the `x-docdex-repo-id` header; mismatches are rejected.
|
|
153
|
-
- If `--auth-token` is set, include `Authorization: Bearer <token>` on HTTP calls (including `/ai-help`).
|
|
154
|
-
|
|
155
|
-
## CLI commands
|
|
156
|
-
- `serve --repo <path> [--host 127.0.0.1] [--port 3210] [--log info]` — start HTTP API with file watching for incremental updates.
|
|
157
|
-
- `index --repo <path>` — rebuild the entire index.
|
|
158
|
-
- `ingest --repo <path> --file <file>` — reindex a single file.
|
|
159
|
-
- `chat --repo <path> --query "<text>" [--limit 8] [--repo-only|--web-only] [--max-web-results N]` — run a chat/search query (omit `--query` to enter REPL mode).
|
|
160
|
-
- `web-search --query "<text>"`, `web-fetch --url <url>`, `web-rag --query "<text>"` — web discovery/fetch and web-assisted queries.
|
|
161
|
-
- `memory-store --text "<text>"` / `memory-recall --query "<text>" --top-k 5` — memory store/recall (enabled by default).
|
|
162
|
-
- `symbols-status --repo <path>` — report Tree-sitter parser drift.
|
|
163
|
-
- `impact-diagnostics --repo <path>` — list unresolved dynamic imports.
|
|
164
|
-
- `self-check --repo <path> --terms "foo,bar" [--limit 5]` — scan the index for sensitive terms before enabling access.
|
|
165
|
-
|
|
166
|
-
## Perf checks
|
|
167
|
-
- Repo-only search latency (p95 < 50ms; see `docs/sds/sds.md`): `cargo test --release repo_only_search_p95_under_50ms_with_libs_index_present -- --ignored --nocapture`.
|
|
168
|
-
|
|
169
|
-
## Help and command discovery
|
|
170
|
-
- List all commands/flags: `docdexd --help`.
|
|
171
|
-
- Dump help for every subcommand: `docdexd help-all`.
|
|
172
|
-
- See `serve` options (TLS, auth, rate limits, watcher): `docdexd serve --help`.
|
|
173
|
-
- Indexing options: `docdexd index --help` (exclude paths, custom state dir).
|
|
174
|
-
- Ad-hoc queries: `docdexd chat --help`.
|
|
175
|
-
- Self-check scanner options: `docdexd self-check --help`.
|
|
176
|
-
- Agent help endpoint: `curl http://127.0.0.1:3210/ai-help` (include `Authorization: Bearer <token>` if `--auth-token` is set) for a JSON listing of endpoints, limits, and best practices.
|
|
177
|
-
- MCP help/registration: `docdexd mcp --help` lists MCP flags; register with your client using `docdexd mcp --repo <repo> --log warn --max-results 8` (Codex CLI shortcut: `codex mcp add docdex -- docdexd mcp --repo <repo> --log warn --max-results 8`).
|
|
178
|
-
- Environment variables mirror the flags (e.g., `DOCDEX_AUTH_TOKEN`, `DOCDEX_TLS_CERT`, `DOCDEX_MAX_LIMIT`).
|
|
179
|
-
- Command overview (same as `docdexd --help`):
|
|
180
|
-
- `serve` — run HTTP API with watcher and security knobs.
|
|
181
|
-
- `index` — build or rebuild the whole index.
|
|
182
|
-
- `ingest` — reindex a single file.
|
|
183
|
-
- `chat` — run an ad-hoc search, JSON to stdout (omit `--query` for REPL).
|
|
184
|
-
- `web-search` / `web-fetch` / `web-rag` — web discovery and web-assisted queries (requires `DOCDEX_WEB_ENABLED=1`).
|
|
185
|
-
- `memory-store` / `memory-recall` — memory store/recall.
|
|
186
|
-
- `symbols-status` / `impact-diagnostics` — code intelligence status and unresolved imports.
|
|
187
|
-
- `repo` — inspect or reassociate repo identity for shared state dirs.
|
|
188
|
-
- `mcp` / `mcp-add` — MCP server + helper for agent CLIs.
|
|
189
|
-
- `self-check` — scan index for sensitive terms with report.
|
|
190
|
-
- `help-all` — print help for every command/flag in one output.
|
|
191
|
-
|
|
192
|
-
## Troubleshooting
|
|
193
|
-
- Stale index: re-run `docdexd index --repo <path>`.
|
|
194
|
-
- Port conflicts: change `--host/--port`.
|
|
195
|
-
- Installer failures (`npm i -g docdex`): use the printed `DOCDEX_*` error code; see `docs/ops/installer_error_codes.md`.
|
|
196
|
-
|
|
197
|
-
## Security considerations
|
|
198
|
-
- Default bind is `127.0.0.1`; keep it unless you are behind a trusted reverse proxy/firewall. Avoid `--host 0.0.0.0` on untrusted networks.
|
|
199
|
-
- By default, non-loopback binds require TLS; opt out only with `--require-tls=false` or `--insecure` when traffic is already terminating at a trusted proxy.
|
|
200
|
-
- If exposing externally, place a reverse proxy in front, terminate TLS, and require auth (basic/OAuth/mTLS) plus IP/VPN allowlisting. Example (nginx):
|
|
201
|
-
```
|
|
202
|
-
server {
|
|
203
|
-
listen 443 ssl;
|
|
204
|
-
server_name docdex.example.com;
|
|
205
|
-
ssl_certificate /path/fullchain.pem;
|
|
206
|
-
ssl_certificate_key /path/privkey.pem;
|
|
207
|
-
auth_basic "Protected";
|
|
208
|
-
auth_basic_user_file /etc/nginx/.htpasswd; # or hook OAuth/mTLS instead
|
|
209
|
-
allow 10.0.0.0/8;
|
|
210
|
-
allow 192.168.0.0/16;
|
|
211
|
-
deny all;
|
|
212
|
-
location / {
|
|
213
|
-
proxy_pass http://127.0.0.1:3210;
|
|
214
|
-
proxy_set_header Host $host;
|
|
215
|
-
}
|
|
216
|
-
}
|
|
217
|
-
```
|
|
218
|
-
- Trim the corpus: prefer a curated staging directory, or use `--exclude-dir` / `--exclude-prefix` to keep secrets/private paths out before indexing; the watcher will ingest any in-scope file change under `repo`.
|
|
219
|
-
- Mind logs: avoid verbose logging in production if snippets/paths are sensitive; reverse-proxy access logs can also capture query terms and paths.
|
|
220
|
-
- Least privilege: run docdex under a low-privilege user/container and keep the state dir on a path with restricted permissions.
|
|
221
|
-
- Validate before publish: run `docdexd chat` for sensitive keywords to confirm no hits; store indexes on encrypted disks if required.
|
|
222
|
-
- Optional hardening: require an auth token on the HTTP API (or proxy); enforce TLS when not on localhost (default) or explicitly opt out with `--require-tls=false`/`--insecure` only behind a trusted proxy; enable rate limiting (`--rate-limit-per-min`) and clamp `limit`/request sizes (`--max-limit`, `--max-query-bytes`, `--max-request-bytes`); escape/sanitize snippet HTML if embedding or disable snippets entirely with `--disable-snippet-text`; state dir is created `0700` by default—keep it under an unprivileged user, optionally `--run-as-uid/--run-as-gid`, `--chroot`, or containerize; keep access logging minimal/redacted (`--access-log`), and run `self-check` for sensitive terms before exposing the service; for at-rest confidentiality, place the state dir on encrypted storage or use host-level disk encryption.
|
|
223
|
-
|
|
224
|
-
## Integrating with LLM tools
|
|
225
|
-
Docdex is tool-agnostic. Drop-in recipe for agents/codegen tools:
|
|
226
|
-
- Start once per repo: `docdexd index --repo <repo>` then `docdexd serve --repo <repo> --host 127.0.0.1 --port 3210 --log warn` (or use the CLI directly without serving).
|
|
227
|
-
- Configure via env: `DOCDEX_STATE_DIR` (state location), `DOCDEX_EXCLUDE_PREFIXES`, `DOCDEX_EXCLUDE_DIRS`, `RUST_LOG=docdexd=debug` (optional verbose logs).
|
|
228
|
-
- Query over HTTP: `GET /search?q=<text>&limit=<n>` returns `{hits:[...], top_score, meta}`; `GET /snippet/:doc_id` fetches a focused snippet plus doc metadata.
|
|
229
|
-
- Or chat over HTTP: `POST /v1/chat/completions` (OpenAI-compatible) with a `docdex` object to control gating and repo context.
|
|
230
|
-
- Or query via CLI: `docdexd chat --repo <repo> --query "<text>" --limit 8` (JSON to stdout).
|
|
231
|
-
- Health check: `GET /healthz` should return `ok` before issuing search requests.
|
|
232
|
-
- Inject snippets into prompts:
|
|
77
|
+
### 2. Start the Daemon
|
|
78
|
+
|
|
79
|
+
Start the shared server. This handles HTTP requests and MCP connections.
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
docdexd daemon --repo /path/to/my-project --host 127.0.0.1 --port 3210
|
|
83
|
+
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
### 3. Ask Questions (CLI)
|
|
87
|
+
|
|
88
|
+
You can chat directly from the terminal.
|
|
89
|
+
|
|
90
|
+
```bash
|
|
91
|
+
docdexd chat --repo /path/to/my-project --query "how does auth work?"
|
|
92
|
+
|
|
233
93
|
```
|
|
234
|
-
|
|
94
|
+
|
|
95
|
+
---
|
|
96
|
+
|
|
97
|
+
## 🔌 Model Context Protocol (MCP)
|
|
98
|
+
|
|
99
|
+
Docdex is designed to be the "brain" for your AI agents. It exposes an MCP endpoint that agents connect to.
|
|
100
|
+
|
|
101
|
+
### Architecture
|
|
102
|
+
|
|
103
|
+
```mermaid
|
|
104
|
+
flowchart LR
|
|
105
|
+
Repo[Repo on disk] --> Indexer[Docdex Indexer]
|
|
106
|
+
Indexer --> Daemon[Docdex Daemon]
|
|
107
|
+
Daemon -->|HTTP + SSE| MCPClient[MCP Client]
|
|
108
|
+
MCPClient --> Host[AI Agent / Editor]
|
|
109
|
+
|
|
235
110
|
```
|
|
236
111
|
|
|
237
|
-
###
|
|
238
|
-
|
|
239
|
-
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
{
|
|
246
|
-
"
|
|
247
|
-
"
|
|
248
|
-
"url": "http://localhost:3210/sse"
|
|
249
|
-
}
|
|
112
|
+
### Manual Configuration
|
|
113
|
+
|
|
114
|
+
If you need to configure your client manually:
|
|
115
|
+
|
|
116
|
+
**JSON (Claude/Cursor/Continue):**
|
|
117
|
+
|
|
118
|
+
```json
|
|
119
|
+
{
|
|
120
|
+
"mcpServers": {
|
|
121
|
+
"docdex": {
|
|
122
|
+
"url": "http://localhost:3210/sse"
|
|
250
123
|
}
|
|
251
124
|
}
|
|
252
|
-
|
|
253
|
-
|
|
254
|
-
|
|
255
|
-
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
|
|
265
|
-
|
|
266
|
-
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
|
|
270
|
-
|
|
271
|
-
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
|
|
279
|
-
|
|
280
|
-
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
285
|
-
|
|
286
|
-
|
|
125
|
+
}
|
|
126
|
+
|
|
127
|
+
```
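
For clients that already have an MCP config file, the entry above can be merged in rather than pasted by hand. The following is a minimal Python sketch, assuming the client keeps its servers under the `mcpServers` key as shown. `add_docdex` is a hypothetical helper, not part of Docdex, and no config file path is assumed because each client stores its config in a different place.

```python
import copy
import json

DOCDEX_SSE_URL = "http://localhost:3210/sse"  # URL taken from the JSON example above


def add_docdex(config: dict, url: str = DOCDEX_SSE_URL) -> dict:
    """Return a copy of a client config with a docdex entry under mcpServers."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    # Create the mcpServers table if it is missing, then add or overwrite docdex.
    merged.setdefault("mcpServers", {})["docdex"] = {"url": url}
    return merged


if __name__ == "__main__":
    # "other" is an illustrative, made-up entry standing in for existing servers.
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    print(json.dumps(add_docdex(existing), indent=2))
```

Working on a copy keeps any other servers in the config intact and makes the change easy to review before writing it back.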
|
|
128
|
+
|
|
129
|
+
**TOML (Codex):**
|
|
130
|
+
|
|
131
|
+
```toml
|
|
132
|
+
[mcp_servers]
|
|
133
|
+
docdex = { url = "http://localhost:3210/v1/mcp" }
|
|
134
|
+
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
---
|
|
138
|
+
|
|
139
|
+
## 🤖 Capabilities & Examples
|
|
140
|
+
|
|
141
|
+
### 1. AST & Impact Analysis
|
|
142
|
+
|
|
143
|
+
Don't just find the string "addressGenerator"; find the **definition** and what it impacts.
|
|
144
|
+
|
|
145
|
+
```bash
|
|
146
|
+
# Find definition
|
|
147
|
+
curl "http://127.0.0.1:3210/v1/ast?name=addressGenerator&pathPrefix=src"
|
|
148
|
+
|
|
149
|
+
# Track downstream impact (what breaks if I change this?)
|
|
150
|
+
curl "http://127.0.0.1:3210/v1/graph/impact?file=src/app.ts&maxDepth=3"
|
|
151
|
+
|
|
152
|
+
```
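
The two requests above can also be assembled programmatically. This is a minimal Python sketch, assuming the daemon listens on `127.0.0.1:3210` as in the examples. The helper names `ast_url` and `impact_url` are hypothetical, and only the query parameters that appear in the curl commands are used.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:3210"  # assumed daemon address from the examples above


def ast_url(name: str, path_prefix: str, base: str = BASE_URL) -> str:
    """Build the /v1/ast lookup URL for a symbol name under a path prefix."""
    return f"{base}/v1/ast?" + urlencode({"name": name, "pathPrefix": path_prefix})


def impact_url(file: str, max_depth: int, base: str = BASE_URL) -> str:
    """Build the /v1/graph/impact URL for a file with a maxDepth limit."""
    # safe="/" keeps path separators readable instead of encoding them as %2F.
    return f"{base}/v1/graph/impact?" + urlencode(
        {"file": file, "maxDepth": max_depth}, safe="/"
    )


if __name__ == "__main__":
    print(ast_url("addressGenerator", "src"))
    print(impact_url("src/app.ts", 3))
```

Building the query string with `urlencode` avoids hand-escaping symbol names or paths that contain spaces or other reserved characters.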
|
|
153
|
+
|
|
154
|
+
### 2. Memory System
|
|
155
|
+
|
|
156
|
+
Docdex lets you store "facts" that can be recalled later through retrieval.
|
|
157
|
+
|
|
158
|
+
**Repo Memory (Project specific):**
|
|
159
|
+
|
|
160
|
+
```bash
|
|
161
|
+
# Teach the repo a fact
|
|
162
|
+
docdexd memory-store --repo . --text "Payments retry up to 3 times with backoff."
|
|
163
|
+
|
|
164
|
+
# Recall it later
|
|
165
|
+
docdexd memory-recall --repo . --query "payments retry policy"
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
**Agent Memory (User preference):**
|
|
170
|
+
|
|
171
|
+
```bash
|
|
172
|
+
# Set a style preference
|
|
173
|
+
docdexd profile add --agent-id "default" --category style --content "Use concise bullet points."
|
|
174
|
+
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
### 3. Local LLM (Ollama)
|
|
178
|
+
|
|
179
|
+
Docdex uses Ollama for embeddings and optional local chat.
|
|
180
|
+
|
|
181
|
+
* **Setup:** Run `docdex setup` for an interactive wizard.
|
|
182
|
+
* **Manual:** Ensure `nomic-embed-text` is pulled in Ollama (`ollama pull nomic-embed-text`).
|
|
183
|
+
* **Custom URL:**
|
|
184
|
+
```bash
|
|
185
|
+
DOCDEX_OLLAMA_BASE_URL=http://127.0.0.1:11434 docdexd daemon ...
|
|
186
|
+
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
|
|
190
|
+
|
|
191
|
+
---
|
|
192
|
+
|
|
193
|
+
## ⚙️ Configuration & HTTP API
|
|
194
|
+
|
|
195
|
+
Docdex runs as a local daemon serving:
|
|
196
|
+
|
|
197
|
+
* **CLI Commands:** `docdexd chat`
|
|
198
|
+
* **HTTP API:** `/search`, `/v1/ast`, `/v1/graph/impact`
|
|
199
|
+
* **MCP Endpoints:** `/v1/mcp` and `/sse`
|
|
200
|
+
|
|
201
|
+
### Multi-Repo Setup
|
|
202
|
+
|
|
203
|
+
Run a single daemon and mount additional repos on demand.
|
|
204
|
+
|
|
205
|
+
```bash
|
|
206
|
+
docdexd daemon --repo /path/to/repo-a --port 3210
|
|
207
|
+
|
|
208
|
+
# Mount another repo and capture its repo_id
|
|
209
|
+
curl -X POST "http://127.0.0.1:3210/v1/initialize" \
|
|
210
|
+
-H "Content-Type: application/json" \
|
|
211
|
+
-d '{"rootUri":"file:///path/to/repo-b"}'
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
Notes:
|
|
215
|
+
- When more than one repo is mounted, include `x-docdex-repo-id: <sha256>` on HTTP requests.
|
|
216
|
+
- MCP sessions bind to the repo provided in `initialize.rootUri` and reuse that repo automatically.
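
As a rough illustration of the notes above, the sketch below prepares, without sending, the mount request and the per-repo header in Python. `mount_request` and `repo_headers` are hypothetical helper names. The endpoint, the `rootUri` field, and the `x-docdex-repo-id` header come from this README. The response shape of `/v1/initialize` is not documented here, so the repo id is passed in as a plain argument.

```python
import json
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

BASE_URL = "http://127.0.0.1:3210"  # assumed daemon address from the example above


def mount_request(repo_path: str, base: str = BASE_URL) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for mounting another repo via /v1/initialize."""
    # as_uri() requires an absolute path and yields e.g. file:///path/to/repo-b
    root_uri = PurePosixPath(repo_path).as_uri()
    body = json.dumps({"rootUri": root_uri}).encode("utf-8")
    return f"{base}/v1/initialize", {"Content-Type": "application/json"}, body


def repo_headers(repo_id: str, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return request headers that scope an HTTP call to one mounted repo."""
    headers = dict(extra or {})  # copy so the caller's dict is not modified
    headers["x-docdex-repo-id"] = repo_id
    return headers


if __name__ == "__main__":
    url, headers, body = mount_request("/path/to/repo-b")
    print(url, headers, body.decode("utf-8"))
    # "<sha256>" is the placeholder from the note above, not a real id.
    print(repo_headers("<sha256>"))
```

Keeping request construction separate from sending makes it easy to plug these values into whichever HTTP client the agent already uses.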
|
|
217
|
+
|
|
218
|
+
### Security
|
|
219
|
+
|
|
220
|
+
* **Secure Mode:** By default, Docdex enforces TLS on non-loopback binds.
|
|
221
|
+
* **Loopback:** `127.0.0.1` is accessible without TLS for local agents.
|
|
222
|
+
* To expose to a network (use with caution), use `--expose` and `--auth-token`.
|
|
223
|
+
|
|
224
|
+
---
|
|
225
|
+
|
|
226
|
+
## 📚 Learn More
|
|
227
|
+
|
|
228
|
+
* **Smithery:** [View on Smithery.ai](https://smithery.ai/server/@bekirdag/docdex)
|
|
229
|
+
* **Detailed Usage:** `docs/usage.md`
|
|
230
|
+
* **API Reference:** `docs/http_api.md`
|
|
231
|
+
* **MCP Specs:** `docs/mcp/errors.md`
|