PyPI - deepdoc - Versions diffs - 1.2.0__tar.gz → 1.4.0__tar.gz - Mend

deepdoc 1.2.0tar.gz → 1.4.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (132)

{deepdoc-1.2.0 → deepdoc-1.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: deepdoc
-Version: 1.2.0
+Version: 1.4.0
 Summary: Auto-generate beautiful docs from any codebase
 Author: Pranav Kumar
 License: MIT
@@ -55,6 +55,8 @@ DeepDoc scans your repo, builds a bucket-based documentation plan, generates ric
 - **Five-Phase Pipeline** — Scan, plan, generate, playground, build. Planning and generation are separated so large repos and large files are handled more cleanly.
 - **Multi-Step AI Planner** — The planner classifies the repo, proposes buckets, then assigns files, symbols, artifacts, and dependencies into the final doc structure.
 - **Giant-File Handling** — Large files are decomposed into feature-aligned clusters so giant controllers or service files can feed multiple doc pages.
+- **Reader-First Repo-Agnostic Nav** — The planner normalizes bucket output into a natural onboarding flow (for backend repos: Start Here → Core Workflows → API Reference → Data Model → runtime/integrations/ops) while preserving full coverage.
+- **Large-Database Anti-Noise Grouping** — Sparse singleton model files are coalesced into stable aggregate groups (for example `core-models`) so huge schemas stay complete without one-file-per-page nav spam.
 - **Endpoint-Family + Per-Endpoint Docs** — High-level endpoint family pages are AI-planned, and individual `endpoint_ref` pages are derived from scan data and generated separately.
 - **Integration Discovery** — Third-party systems like payment gateways, delivery providers, warehouse systems, and webhook integrations can be grouped into integration docs.
 - **Incremental Updates** — `deepdoc update` uses persisted plan and ledger data to regenerate only stale or structurally affected docs.
@@ -310,6 +312,22 @@ deepdoc config set output_dir documentation            # Change output dir
 deepdoc config set llm.api_key_env AZURE_API_KEY       # Change API key env var
 ```
+### `deepdoc benchmark`
+Run planner benchmark cases and optionally generate a combined docs+chatbot quality scorecard.
+```bash
+deepdoc benchmark --catalog benchmarks/catalog.json
+deepdoc benchmark --repo /path/to/repo --gold benchmarks/gold.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json --scorecard-out .deepdoc/quality_scorecard.json --strict-scorecard
+deepdoc benchmark --generated-root /Users/apple/autodoc/docs --scorecard-out /Users/apple/autodoc/docs/_scorecards/latest.json
+```
+Use `--strict-scorecard` to fail the command when completeness gates are not met.
+When you do not have a hand-written benchmark catalog or chatbot eval file yet, use artifact mode (`--generated-root` or `--artifact-repo`) to compute a provisional scorecard directly from persisted `.deepdoc/` outputs.
 ---
 ## LLM Provider Setup
@@ -618,7 +636,9 @@ chatbot:
     base_url: ""
     api_version: ""
     temperature: 0.1
-    max_tokens: 16000
+    max_tokens: 24000
+    continuation_retries: 2                   # Auto-continue if answer ends abruptly
+    continuation_context_chars: 12000         # Tail chars included in continuation prompt
   embeddings:                                 # LLM used for embedding code/docs
     provider: "azure"
@@ -643,16 +663,25 @@ chatbot:
     top_k_code: 15
     top_k_artifact: 8
     top_k_docs: 6
-    top_k_relationship: 6
+    top_k_relationship: 8
     candidate_top_k_code: 30
     candidate_top_k_artifact: 16
     candidate_top_k_docs: 12
     candidate_top_k_relationship: 12
     max_prompt_code_chunks: 12
     max_prompt_artifact_chunks: 6
-    max_prompt_doc_chunks: 4
-    max_prompt_relationship_chunks: 4
-    max_prompt_chars: 200000
+    max_prompt_doc_chunks: 6
+    max_prompt_relationship_chunks: 6
+    max_prompt_chars: 120000
+    fast_mode_use_llm_retrieval_steps: false  # Fast mode skips expansion/rerank by default
+    fast_mode_iterative_retrieval: false      # Fast mode skips second-pass follow-up retrieval
+    fast_mode_max_prompt_chars: 90000         # Smaller prompt budget for faster /query answers
+    deep_mode_max_prompt_chars: 140000        # Larger budget for /deep-research synthesis
+    code_deep_mode_max_prompt_chars: 180000   # Largest prompt budget for /code-deep
+    code_deep_top_k: 16                       # Code chunks retrieved for code-aware mode
+    code_deep_top_k_relationship: 12          # Relationship chunks retrieved for code-aware mode
+    code_deep_top_k_docs: 4                   # Cap docs chunks in code-aware mode
+    code_deep_file_inventory_limit: 18        # Max files listed in code-aware inventory
     lexical_retrieval: true
     lexical_candidate_limit: 24
     query_expansion: true
@@ -666,7 +695,8 @@ chatbot:
     graph_neighbor_relationship_chunks_per_file: 2
     graph_neighbor_max_docs: 4
     rerank: true
-    rerank_candidate_limit: 20
+    rerank_candidate_limit: 32
+    rerank_candidate_limit_per_kind: 8
     rerank_preview_chars: 450
     stitch_adjacent_code_chunks: true
     stitch_max_adjacent_chunks: 2
@@ -674,7 +704,8 @@ chatbot:
     live_fallback_max_files: 6
     live_fallback_max_per_file: 2
     live_fallback_context_lines: 12
-    deep_research_chunk_chars: 1600
+    deep_research_chunk_chars: 3200
+    deep_research_top_k: 10
   chunking:
     code_chunk_lines: 120
@@ -709,7 +740,9 @@ chatbot:
 | `chatbot.answer.base_url` | `""` | Custom endpoint (for Azure, Ollama, etc.) |
 | `chatbot.answer.api_version` | `""` | Azure API version string |
 | `chatbot.answer.temperature` | `0.1` | Sampling temperature (lower = more deterministic) |
-| `chatbot.answer.max_tokens` | `16000` | Max tokens per answer |
+| `chatbot.answer.max_tokens` | `24000` | Max tokens per answer |
+| `chatbot.answer.continuation_retries` | `2` | Extra completion attempts when an answer appears truncated |
+| `chatbot.answer.continuation_context_chars` | `12000` | Number of trailing chars passed when asking the model to continue |
 | **Embeddings LLM** | | |
 | `chatbot.embeddings.provider` | `azure` | Provider for the embedding model |
 | `chatbot.embeddings.model` | `azure/text-embedding-3-large` | Embedding model |
@@ -721,16 +754,25 @@ chatbot:
 | `chatbot.retrieval.top_k_code` | `15` | Top code chunks retrieved per query |
 | `chatbot.retrieval.top_k_artifact` | `8` | Top artifact chunks retrieved per query |
 | `chatbot.retrieval.top_k_docs` | `6` | Top generated-doc and repo-doc chunks retrieved per query |
-| `chatbot.retrieval.top_k_relationship` | `6` | Top relationship chunks retrieved per query |
+| `chatbot.retrieval.top_k_relationship` | `8` | Top relationship chunks retrieved per query |
 | `chatbot.retrieval.candidate_top_k_code` | `30` | Candidate code chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_artifact` | `16` | Candidate artifact chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_docs` | `12` | Candidate doc chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_relationship` | `12` | Candidate relationship chunks gathered before reranking |
 | `chatbot.retrieval.max_prompt_code_chunks` | `12` | Max code chunks included in the final prompt |
 | `chatbot.retrieval.max_prompt_artifact_chunks` | `6` | Max artifact chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_doc_chunks` | `4` | Max doc chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_relationship_chunks` | `4` | Max relationship chunks included in the final prompt |
-| `chatbot.retrieval.max_prompt_chars` | `200000` | Total character budget for the assembled prompt |
+| `chatbot.retrieval.max_prompt_doc_chunks` | `6` | Max doc chunks in the final prompt |
+| `chatbot.retrieval.max_prompt_relationship_chunks` | `6` | Max relationship chunks included in the final prompt |
+| `chatbot.retrieval.max_prompt_chars` | `120000` | Default character budget for assembled prompts |
+| `chatbot.retrieval.fast_mode_use_llm_retrieval_steps` | `false` | In `/query` fast mode, disable LLM query expansion and reranking |
+| `chatbot.retrieval.fast_mode_iterative_retrieval` | `false` | In `/query` fast mode, disable iterative follow-up retrieval |
+| `chatbot.retrieval.fast_mode_max_prompt_chars` | `90000` | Prompt budget used by `/query` fast mode |
+| `chatbot.retrieval.deep_mode_max_prompt_chars` | `140000` | Prompt budget used by `/deep-research` |
+| `chatbot.retrieval.code_deep_mode_max_prompt_chars` | `180000` | Prompt budget used by `/code-deep` |
+| `chatbot.retrieval.code_deep_top_k` | `16` | Code chunks retrieved in code-aware mode |
+| `chatbot.retrieval.code_deep_top_k_relationship` | `12` | Relationship chunks retrieved in code-aware mode |
+| `chatbot.retrieval.code_deep_top_k_docs` | `4` | Docs chunk cap in code-aware mode |
+| `chatbot.retrieval.code_deep_file_inventory_limit` | `18` | Max files listed in code-aware inventory |
 | `chatbot.retrieval.lexical_retrieval` | `true` | Blend exact-match retrieval with embedding retrieval |
 | `chatbot.retrieval.lexical_candidate_limit` | `24` | Max lexical candidates gathered before merge/rerank |
 | `chatbot.retrieval.query_expansion` | `true` | Use LLM to generate alternative search queries |
@@ -744,7 +786,8 @@ chatbot:
 | `chatbot.retrieval.graph_neighbor_relationship_chunks_per_file` | `2` | Relationship chunks per linked file during graph expansion |
 | `chatbot.retrieval.graph_neighbor_max_docs` | `4` | Max linked docs pulled in during graph expansion |
 | `chatbot.retrieval.rerank` | `true` | Use LLM to rerank retrieved chunks |
-| `chatbot.retrieval.rerank_candidate_limit` | `20` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit` | `32` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit_per_kind` | `8` | Per-kind candidate cap before filling the global rerank pool |
 | `chatbot.retrieval.rerank_preview_chars` | `450` | Characters of each chunk shown to the reranker |
 | `chatbot.retrieval.stitch_adjacent_code_chunks` | `true` | Expand exact-match code hits with adjacent windows from the same file |
 | `chatbot.retrieval.stitch_max_adjacent_chunks` | `2` | Max adjacent code windows stitched onto a top hit |
@@ -752,7 +795,8 @@ chatbot:
 | `chatbot.retrieval.live_fallback_max_files` | `6` | Max repo files inspected during a deep-research live fallback |
 | `chatbot.retrieval.live_fallback_max_per_file` | `2` | Max fallback snippets returned per inspected file |
 | `chatbot.retrieval.live_fallback_context_lines` | `12` | Lines per fallback snippet around each exact match |
-| `chatbot.retrieval.deep_research_chunk_chars` | `1600` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_chunk_chars` | `3200` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_top_k` | `10` | Retrieved chunks per deep-research sub-question |
 | **Chunking** | | |
 | `chatbot.chunking.code_chunk_lines` | `120` | Lines per code chunk |
 | `chatbot.chunking.code_chunk_overlap` | `20` | Overlap lines between code chunks |
@@ -829,22 +873,26 @@ During `deepdoc generate`, six corpora are built and stored in `.deepdoc/chatbot
 ### Chatbot Query Pipeline
-When a user asks a question, the backend runs a multi-step retrieval pipeline:
+When a user asks a question, the backend runs a mode-aware retrieval pipeline:
-1. **Query expansion** — The LLM generates up to 3 alternative search queries to improve recall.
+1. **Query expansion** — In default/deep/code-aware mode, the LLM can generate alternative search queries to improve recall. Fast mode disables this by default.
 2. **Embedding** — All queries are embedded using the configured embedding model.
 3. **Hybrid retrieval** — FAISS similarity search and exact-match lexical search both gather candidates from each corpus.
-4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion.
+4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion. Fast mode can skip follow-up queries for lower latency.
 5. **Chunk stitching** — Exact-match code hits can pull adjacent code windows from the same file so larger implementations survive chunk boundaries.
-6. **Reranking** — The LLM scores and reranks the retrieved chunks for relevance.
+6. **Reranking** — In default/deep/code-aware mode, the LLM can rerank candidates for relevance. Fast mode disables this by default.
 7. **Prompt assembly** — Query-type-aware budgets reserve space for the most important evidence types within the character budget.
-8. **Answer generation** — The answer LLM produces a grounded response with code, artifact, doc, repo-doc, relationship, and live-fallback citations when used.
+8. **Answer generation + continuity guard** — The answer LLM produces a grounded response, and if the output appears truncated (for example ending on a dangling heading), DeepDoc retries with a continuation prompt so the response finishes cleanly.
 `POST /deep-research` uses the same indexed corpora first, but it can also inspect a small bounded set of live repo files when exact-match evidence is missing from the index. This fallback respects the repo's exclude rules, skips oversized/binary files, and is only used in deep research mode.
+`POST /code-deep` uses a code-heavy retrieval profile and returns an explicit file inventory plus step trace so users can see where evidence came from while answering file-oriented questions such as “where is auth defined?”.
+`POST /query`, `POST /deep-research`, and `POST /code-deep` return `response_mode` in the payload (`fast`, `deep`, `code_deep`, or `default`) so clients can confirm which retrieval profile generated the result.
 ### Chatbot API Endpoints
-The generated `chatbot_backend/` exposes two endpoints:
+The generated `chatbot_backend/` exposes five endpoints:
 **Health check:**
 ```
@@ -865,6 +913,43 @@ POST /query
 The response includes the answer text, code citations (file path + line range), artifact citations, and links to relevant generated doc pages.
+`/query` is optimized for speed: it runs retrieval in fast mode (no LLM query expansion/rerank by default) and returns an answer plus citations.
+**Code-aware deep query:**
+```
+POST /code-deep
+{
+  "question": "Where is authentication defined?",
+  "history": [],
+  "max_rounds": 4
+}
+```
+`/code-deep` returns a code-aware answer plus `trace` and `file_inventory` fields so clients can show reasoning progress and files considered.
+**Code-aware live stream (SSE):**
+```
+POST /code-deep/stream
+{
+  "question": "Where is authentication defined?",
+  "history": [],
+  "max_rounds": 4
+}
+```
+`/code-deep/stream` emits `trace` events while researching, then a final `result` event and `done`.
+**Retrieve context only (no answer generation):**
+```
+POST /query-context
+{
+  "question": "Where is reshipping implemented?",
+  "history": []
+}
+```
+`/query-context` returns selected citations/chunks only. Use this endpoint to inspect retrieval quality independently from answer generation.
 ### Deploying the Chatbot
 For local development, `deepdoc serve` handles everything automatically. For production:
@@ -1111,6 +1196,15 @@ Add your API key to repo Settings → Secrets → Actions → `ANTHROPIC_API_KEY
 DeepDoc now supports automated releases through GitHub Actions.
+### Release tracks
+This repository now has two independent release tracks:
+- **Python package (`deepdoc`)**: controlled by `pyproject.toml`, root `CHANGELOG.md`, and `.github/workflows/release.yml`.
+- **VS Code extension (`vscode-extension/`)**: controlled by `vscode-extension/package.json`, `vscode-extension/CHANGELOG.md`, and `.github/workflows/release-vscode-extension.yml`.
+Keep versions and changelog entries separated by track.
 ### What happens automatically
 When you push to `main`, the release workflow checks the version in `pyproject.toml`.
@@ -1168,6 +1262,30 @@ If the matching version section is missing, GitHub falls back to auto-generated
 After that, every new version pushed to `main` can publish without a PyPI token.
+### VS Code extension release flow
+The VS Code extension release is automated from `main` when files under `vscode-extension/` change.
+What the extension workflow does:
+- reads `vscode-extension/package.json` version
+- checks whether tag `vscode-extension-v<version>` already exists
+- builds and packages the extension
+- publishes to Marketplace using `VSCE_PAT`
+- creates and pushes the matching git tag
+- creates a GitHub release with notes from `vscode-extension/CHANGELOG.md` (fallback to generated notes)
+One-time setup for extension publishing:
+1. Create a VS Code Marketplace PAT with Manage scope for publisher `Pranawww`
+2. Add repo secret `VSCE_PAT` in GitHub Actions secrets
+Extension release flow on each version:
+1. Update `vscode-extension/package.json` version
+2. Add matching section to `vscode-extension/CHANGELOG.md`
+3. Commit and push to `main`
 ---
 ## Typical Workflow
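
Note on the PKG-INFO changes above: the new `continuation_retries` and `continuation_context_chars` settings describe a "continuity guard" that retries when an answer looks truncated. The diff does not show how DeepDoc implements it, so the following is only a minimal sketch of the idea. The function names, the truncation heuristic, and the wording of the follow-up prompt are all assumptions made for illustration; only the two default values come from the diff.

```python
from typing import Callable

CONTINUATION_RETRIES = 2            # default of chatbot.answer.continuation_retries
CONTINUATION_CONTEXT_CHARS = 12000  # default of chatbot.answer.continuation_context_chars
FENCE = "`" * 3                     # Markdown code-fence marker


def looks_truncated(text: str) -> bool:
    """Hypothetical heuristic: empty output, a dangling heading, or an unclosed fence."""
    stripped = text.rstrip()
    if not stripped:
        return True
    last_line = stripped.splitlines()[-1].strip()
    if last_line.startswith("#"):
        return True
    return stripped.count(FENCE) % 2 == 1


def answer_with_continuation(
    generate: Callable[[str], str],
    prompt: str,
    retries: int = CONTINUATION_RETRIES,
    context_chars: int = CONTINUATION_CONTEXT_CHARS,
) -> str:
    """Call `generate`, then ask for a continuation while the answer looks cut off."""
    answer = generate(prompt)
    for _ in range(retries):
        if not looks_truncated(answer):
            break
        tail = answer[-context_chars:]  # only the trailing chars are sent back
        follow_up = (
            f"{prompt}\n\nThe previous answer stopped early. "
            f"Continue exactly where this text ends:\n{tail}"
        )
        answer += generate(follow_up)
    return answer


# Usage with a stand-in model that is cut off once, then finishes.
replies = iter(["## Setup\nInstall it.\n\n## Usage", "\nRun `deepdoc serve`."])
result = answer_with_continuation(lambda _prompt: next(replies), "How do I start?")
```

Because the loop is bounded by `retries`, a model that keeps returning truncated text is called at most `retries + 1` times.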

{deepdoc-1.2.0 → deepdoc-1.4.0}/README.md RENAMED

@@ -16,6 +16,8 @@ DeepDoc scans your repo, builds a bucket-based documentation plan, generates ric
 - **Five-Phase Pipeline** — Scan, plan, generate, playground, build. Planning and generation are separated so large repos and large files are handled more cleanly.
 - **Multi-Step AI Planner** — The planner classifies the repo, proposes buckets, then assigns files, symbols, artifacts, and dependencies into the final doc structure.
 - **Giant-File Handling** — Large files are decomposed into feature-aligned clusters so giant controllers or service files can feed multiple doc pages.
+- **Reader-First Repo-Agnostic Nav** — The planner normalizes bucket output into a natural onboarding flow (for backend repos: Start Here → Core Workflows → API Reference → Data Model → runtime/integrations/ops) while preserving full coverage.
+- **Large-Database Anti-Noise Grouping** — Sparse singleton model files are coalesced into stable aggregate groups (for example `core-models`) so huge schemas stay complete without one-file-per-page nav spam.
 - **Endpoint-Family + Per-Endpoint Docs** — High-level endpoint family pages are AI-planned, and individual `endpoint_ref` pages are derived from scan data and generated separately.
 - **Integration Discovery** — Third-party systems like payment gateways, delivery providers, warehouse systems, and webhook integrations can be grouped into integration docs.
 - **Incremental Updates** — `deepdoc update` uses persisted plan and ledger data to regenerate only stale or structurally affected docs.
@@ -271,6 +273,22 @@ deepdoc config set output_dir documentation            # Change output dir
 deepdoc config set llm.api_key_env AZURE_API_KEY       # Change API key env var
 ```
+### `deepdoc benchmark`
+Run planner benchmark cases and optionally generate a combined docs+chatbot quality scorecard.
+```bash
+deepdoc benchmark --catalog benchmarks/catalog.json
+deepdoc benchmark --repo /path/to/repo --gold benchmarks/gold.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json --scorecard-out .deepdoc/quality_scorecard.json --strict-scorecard
+deepdoc benchmark --generated-root /Users/apple/autodoc/docs --scorecard-out /Users/apple/autodoc/docs/_scorecards/latest.json
+```
+Use `--strict-scorecard` to fail the command when completeness gates are not met.
+When you do not have a hand-written benchmark catalog or chatbot eval file yet, use artifact mode (`--generated-root` or `--artifact-repo`) to compute a provisional scorecard directly from persisted `.deepdoc/` outputs.
 ---
 ## LLM Provider Setup
@@ -579,7 +597,9 @@ chatbot:
     base_url: ""
     api_version: ""
     temperature: 0.1
-    max_tokens: 16000
+    max_tokens: 24000
+    continuation_retries: 2                   # Auto-continue if answer ends abruptly
+    continuation_context_chars: 12000         # Tail chars included in continuation prompt
   embeddings:                                 # LLM used for embedding code/docs
     provider: "azure"
@@ -604,16 +624,25 @@ chatbot:
     top_k_code: 15
     top_k_artifact: 8
     top_k_docs: 6
-    top_k_relationship: 6
+    top_k_relationship: 8
     candidate_top_k_code: 30
     candidate_top_k_artifact: 16
     candidate_top_k_docs: 12
     candidate_top_k_relationship: 12
     max_prompt_code_chunks: 12
     max_prompt_artifact_chunks: 6
-    max_prompt_doc_chunks: 4
-    max_prompt_relationship_chunks: 4
-    max_prompt_chars: 200000
+    max_prompt_doc_chunks: 6
+    max_prompt_relationship_chunks: 6
+    max_prompt_chars: 120000
+    fast_mode_use_llm_retrieval_steps: false  # Fast mode skips expansion/rerank by default
+    fast_mode_iterative_retrieval: false      # Fast mode skips second-pass follow-up retrieval
+    fast_mode_max_prompt_chars: 90000         # Smaller prompt budget for faster /query answers
+    deep_mode_max_prompt_chars: 140000        # Larger budget for /deep-research synthesis
+    code_deep_mode_max_prompt_chars: 180000   # Largest prompt budget for /code-deep
+    code_deep_top_k: 16                       # Code chunks retrieved for code-aware mode
+    code_deep_top_k_relationship: 12          # Relationship chunks retrieved for code-aware mode
+    code_deep_top_k_docs: 4                   # Cap docs chunks in code-aware mode
+    code_deep_file_inventory_limit: 18        # Max files listed in code-aware inventory
     lexical_retrieval: true
     lexical_candidate_limit: 24
     query_expansion: true
@@ -627,7 +656,8 @@ chatbot:
     graph_neighbor_relationship_chunks_per_file: 2
     graph_neighbor_max_docs: 4
     rerank: true
-    rerank_candidate_limit: 20
+    rerank_candidate_limit: 32
+    rerank_candidate_limit_per_kind: 8
     rerank_preview_chars: 450
     stitch_adjacent_code_chunks: true
     stitch_max_adjacent_chunks: 2
@@ -635,7 +665,8 @@ chatbot:
     live_fallback_max_files: 6
     live_fallback_max_per_file: 2
     live_fallback_context_lines: 12
-    deep_research_chunk_chars: 1600
+    deep_research_chunk_chars: 3200
+    deep_research_top_k: 10
   chunking:
     code_chunk_lines: 120
@@ -670,7 +701,9 @@ chatbot:
 | `chatbot.answer.base_url` | `""` | Custom endpoint (for Azure, Ollama, etc.) |
 | `chatbot.answer.api_version` | `""` | Azure API version string |
 | `chatbot.answer.temperature` | `0.1` | Sampling temperature (lower = more deterministic) |
-| `chatbot.answer.max_tokens` | `16000` | Max tokens per answer |
+| `chatbot.answer.max_tokens` | `24000` | Max tokens per answer |
+| `chatbot.answer.continuation_retries` | `2` | Extra completion attempts when an answer appears truncated |
+| `chatbot.answer.continuation_context_chars` | `12000` | Number of trailing chars passed when asking the model to continue |
 | **Embeddings LLM** | | |
 | `chatbot.embeddings.provider` | `azure` | Provider for the embedding model |
 | `chatbot.embeddings.model` | `azure/text-embedding-3-large` | Embedding model |
@@ -682,16 +715,25 @@ chatbot:
 | `chatbot.retrieval.top_k_code` | `15` | Top code chunks retrieved per query |
 | `chatbot.retrieval.top_k_artifact` | `8` | Top artifact chunks retrieved per query |
 | `chatbot.retrieval.top_k_docs` | `6` | Top generated-doc and repo-doc chunks retrieved per query |
-| `chatbot.retrieval.top_k_relationship` | `6` | Top relationship chunks retrieved per query |
+| `chatbot.retrieval.top_k_relationship` | `8` | Top relationship chunks retrieved per query |
 | `chatbot.retrieval.candidate_top_k_code` | `30` | Candidate code chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_artifact` | `16` | Candidate artifact chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_docs` | `12` | Candidate doc chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_relationship` | `12` | Candidate relationship chunks gathered before reranking |
 | `chatbot.retrieval.max_prompt_code_chunks` | `12` | Max code chunks included in the final prompt |
 | `chatbot.retrieval.max_prompt_artifact_chunks` | `6` | Max artifact chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_doc_chunks` | `4` | Max doc chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_relationship_chunks` | `4` | Max relationship chunks included in the final prompt |
-| `chatbot.retrieval.max_prompt_chars` | `200000` | Total character budget for the assembled prompt |
+| `chatbot.retrieval.max_prompt_doc_chunks` | `6` | Max doc chunks in the final prompt |
+| `chatbot.retrieval.max_prompt_relationship_chunks` | `6` | Max relationship chunks included in the final prompt |
+| `chatbot.retrieval.max_prompt_chars` | `120000` | Default character budget for assembled prompts |
+| `chatbot.retrieval.fast_mode_use_llm_retrieval_steps` | `false` | In `/query` fast mode, disable LLM query expansion and reranking |
+| `chatbot.retrieval.fast_mode_iterative_retrieval` | `false` | In `/query` fast mode, disable iterative follow-up retrieval |
+| `chatbot.retrieval.fast_mode_max_prompt_chars` | `90000` | Prompt budget used by `/query` fast mode |
+| `chatbot.retrieval.deep_mode_max_prompt_chars` | `140000` | Prompt budget used by `/deep-research` |
+| `chatbot.retrieval.code_deep_mode_max_prompt_chars` | `180000` | Prompt budget used by `/code-deep` |
+| `chatbot.retrieval.code_deep_top_k` | `16` | Code chunks retrieved in code-aware mode |
+| `chatbot.retrieval.code_deep_top_k_relationship` | `12` | Relationship chunks retrieved in code-aware mode |
+| `chatbot.retrieval.code_deep_top_k_docs` | `4` | Docs chunk cap in code-aware mode |
+| `chatbot.retrieval.code_deep_file_inventory_limit` | `18` | Max files listed in code-aware inventory |
 | `chatbot.retrieval.lexical_retrieval` | `true` | Blend exact-match retrieval with embedding retrieval |
 | `chatbot.retrieval.lexical_candidate_limit` | `24` | Max lexical candidates gathered before merge/rerank |
 | `chatbot.retrieval.query_expansion` | `true` | Use LLM to generate alternative search queries |
@@ -705,7 +747,8 @@ chatbot:
 | `chatbot.retrieval.graph_neighbor_relationship_chunks_per_file` | `2` | Relationship chunks per linked file during graph expansion |
 | `chatbot.retrieval.graph_neighbor_max_docs` | `4` | Max linked docs pulled in during graph expansion |
 | `chatbot.retrieval.rerank` | `true` | Use LLM to rerank retrieved chunks |
-| `chatbot.retrieval.rerank_candidate_limit` | `20` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit` | `32` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit_per_kind` | `8` | Per-kind candidate cap before filling the global rerank pool |
 | `chatbot.retrieval.rerank_preview_chars` | `450` | Characters of each chunk shown to the reranker |
 | `chatbot.retrieval.stitch_adjacent_code_chunks` | `true` | Expand exact-match code hits with adjacent windows from the same file |
 | `chatbot.retrieval.stitch_max_adjacent_chunks` | `2` | Max adjacent code windows stitched onto a top hit |
@@ -713,7 +756,8 @@ chatbot:
 | `chatbot.retrieval.live_fallback_max_files` | `6` | Max repo files inspected during a deep-research live fallback |
 | `chatbot.retrieval.live_fallback_max_per_file` | `2` | Max fallback snippets returned per inspected file |
 | `chatbot.retrieval.live_fallback_context_lines` | `12` | Lines per fallback snippet around each exact match |
-| `chatbot.retrieval.deep_research_chunk_chars` | `1600` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_chunk_chars` | `3200` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_top_k` | `10` | Retrieved chunks per deep-research sub-question |
 | **Chunking** | | |
 | `chatbot.chunking.code_chunk_lines` | `120` | Lines per code chunk |
 | `chatbot.chunking.code_chunk_overlap` | `20` | Overlap lines between code chunks |
@@ -790,22 +834,26 @@ During `deepdoc generate`, six corpora are built and stored in `.deepdoc/chatbot
 ### Chatbot Query Pipeline
-When a user asks a question, the backend runs a multi-step retrieval pipeline:
+When a user asks a question, the backend runs a mode-aware retrieval pipeline:
-1. **Query expansion** — The LLM generates up to 3 alternative search queries to improve recall.
+1. **Query expansion** — In default/deep/code-aware mode, the LLM can generate alternative search queries to improve recall. Fast mode disables this by default.
 2. **Embedding** — All queries are embedded using the configured embedding model.
 3. **Hybrid retrieval** — FAISS similarity search and exact-match lexical search both gather candidates from each corpus.
-4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion.
+4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion. Fast mode can skip follow-up queries for lower latency.
 5. **Chunk stitching** — Exact-match code hits can pull adjacent code windows from the same file so larger implementations survive chunk boundaries.
-6. **Reranking** — The LLM scores and reranks the retrieved chunks for relevance.
+6. **Reranking** — In default/deep/code-aware mode, the LLM can rerank candidates for relevance. Fast mode disables this by default.
 7. **Prompt assembly** — Query-type-aware budgets reserve space for the most important evidence types within the character budget.
-8. **Answer generation** — The answer LLM produces a grounded response with code, artifact, doc, repo-doc, relationship, and live-fallback citations when used.
+8. **Answer generation + continuity guard** — The answer LLM produces a grounded response, and if the output appears truncated (for example ending on a dangling heading), DeepDoc retries with a continuation prompt so the response finishes cleanly.
 `POST /deep-research` uses the same indexed corpora first, but it can also inspect a small bounded set of live repo files when exact-match evidence is missing from the index. This fallback respects the repo's exclude rules, skips oversized/binary files, and is only used in deep research mode.
+`POST /code-deep` uses a code-heavy retrieval profile and returns an explicit file inventory plus step trace so users can see where evidence came from while answering file-oriented questions such as “where is auth defined?”.
+`POST /query`, `POST /deep-research`, and `POST /code-deep` return `response_mode` in the payload (`fast`, `deep`, `code_deep`, or `default`) so clients can confirm which retrieval profile generated the result.
 ### Chatbot API Endpoints
-The generated `chatbot_backend/` exposes two endpoints:
+The generated `chatbot_backend/` exposes five endpoints:
 **Health check:**
 ```
@@ -826,6 +874,43 @@ POST /query
 The response includes the answer text, code citations (file path + line range), artifact citations, and links to relevant generated doc pages.
+`/query` is optimized for speed: it runs retrieval in fast mode (no LLM query expansion/rerank by default) and returns an answer plus citations.
+**Code-aware deep query:**
+```
+POST /code-deep
+{
+  "question": "Where is authentication defined?",
+  "history": [],
+  "max_rounds": 4
+}
+```
+`/code-deep` returns a code-aware answer plus `trace` and `file_inventory` fields so clients can show reasoning progress and files considered.
+**Code-aware live stream (SSE):**
+```
+POST /code-deep/stream
+{
+  "question": "Where is authentication defined?",
+  "history": [],
+  "max_rounds": 4
+}
+```
+`/code-deep/stream` emits `trace` events while researching, then a final `result` event and `done`.
+**Retrieve context only (no answer generation):**
+```
+POST /query-context
+{
+  "question": "Where is reshipping implemented?",
+  "history": []
+}
+```
+`/query-context` returns selected citations/chunks only. Use this endpoint to inspect retrieval quality independently from answer generation.
 ### Deploying the Chatbot
 For local development, `deepdoc serve` handles everything automatically. For production:
@@ -1072,6 +1157,15 @@ Add your API key to repo Settings → Secrets → Actions → `ANTHROPIC_API_KEY
 DeepDoc now supports automated releases through GitHub Actions.
+### Release tracks
+This repository now has two independent release tracks:
+- **Python package (`deepdoc`)**: controlled by `pyproject.toml`, root `CHANGELOG.md`, and `.github/workflows/release.yml`.
+- **VS Code extension (`vscode-extension/`)**: controlled by `vscode-extension/package.json`, `vscode-extension/CHANGELOG.md`, and `.github/workflows/release-vscode-extension.yml`.
+Keep versions and changelog entries separated by track.
 ### What happens automatically
 When you push to `main`, the release workflow checks the version in `pyproject.toml`.
@@ -1129,6 +1223,30 @@ If the matching version section is missing, GitHub falls back to auto-generated
 After that, every new version pushed to `main` can publish without a PyPI token.
+### VS Code extension release flow
+The VS Code extension release is automated from `main` when files under `vscode-extension/` change.
+What the extension workflow does:
+- reads `vscode-extension/package.json` version
+- checks whether tag `vscode-extension-v<version>` already exists
+- builds and packages the extension
+- publishes to Marketplace using `VSCE_PAT`
+- creates and pushes the matching git tag
+- creates a GitHub release with notes from `vscode-extension/CHANGELOG.md` (fallback to generated notes)
+One-time setup for extension publishing:
+1. Create a VS Code Marketplace PAT with Manage scope for publisher `Pranawww`
+2. Add repo secret `VSCE_PAT` in GitHub Actions secrets
+Extension release flow on each version:
+1. Update `vscode-extension/package.json` version
+2. Add matching section to `vscode-extension/CHANGELOG.md`
+3. Commit and push to `main`
 ---
 ## Typical Workflow
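
Note on the README changes above: 1.4.0 introduces per-mode prompt budgets and fast-mode switches for the retrieval pipeline. As a rough illustration of how those settings could resolve into one profile per request, here is a small sketch. The default values and the endpoint-to-mode pairing are taken from the diff; the `RetrievalProfile` class, the `profile_for` function, and the way the flags are combined are hypothetical and may differ from DeepDoc's actual code.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Defaults copied from the chatbot.retrieval section of the 1.4.0 README.
RETRIEVAL_DEFAULTS: Mapping[str, Any] = {
    "max_prompt_chars": 120000,
    "fast_mode_use_llm_retrieval_steps": False,
    "fast_mode_iterative_retrieval": False,
    "fast_mode_max_prompt_chars": 90000,
    "deep_mode_max_prompt_chars": 140000,
    "code_deep_mode_max_prompt_chars": 180000,
    "query_expansion": True,
    "rerank": True,
}

# Endpoint-to-mode pairing as described in the README.
ENDPOINT_MODES = {"/query": "fast", "/deep-research": "deep", "/code-deep": "code_deep"}

BUDGET_KEYS = {
    "fast": "fast_mode_max_prompt_chars",
    "deep": "deep_mode_max_prompt_chars",
    "code_deep": "code_deep_mode_max_prompt_chars",
}


@dataclass(frozen=True)
class RetrievalProfile:  # hypothetical container, not a DeepDoc class
    response_mode: str
    max_prompt_chars: int
    query_expansion: bool
    rerank: bool
    iterative_retrieval: bool


def profile_for(mode: str, cfg: Mapping[str, Any] = RETRIEVAL_DEFAULTS) -> RetrievalProfile:
    """Resolve a mode name into concrete limits; unknown modes fall back to 'default'."""
    resolved = mode if mode in BUDGET_KEYS else "default"
    budget = cfg[BUDGET_KEYS.get(resolved, "max_prompt_chars")]
    # Assumption: outside fast mode, LLM steps follow the global flags and
    # iterative follow-up retrieval stays enabled.
    llm_steps = resolved != "fast" or bool(cfg["fast_mode_use_llm_retrieval_steps"])
    iterative = resolved != "fast" or bool(cfg["fast_mode_iterative_retrieval"])
    return RetrievalProfile(
        response_mode=resolved,
        max_prompt_chars=int(budget),
        query_expansion=bool(cfg["query_expansion"]) and llm_steps,
        rerank=bool(cfg["rerank"]) and llm_steps,
        iterative_retrieval=iterative,
    )


fast_profile = profile_for(ENDPOINT_MODES["/query"])
code_profile = profile_for(ENDPOINT_MODES["/code-deep"])
```

With the defaults above, the fast profile gets the 90000-character budget with expansion, rerank, and iterative retrieval switched off, while the code-aware profile gets the 180000-character budget with all three left on.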

{deepdoc-1.2.0 → deepdoc-1.4.0}/deepdoc/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """DeepDoc — Auto-generate beautiful docs from any codebase."""
-__version__ = "1.2.0"
+__version__ = "1.4.0"
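
Note: the version string is bumped in two places in this diff, the `Version:` field of PKG-INFO and `__version__` in `deepdoc/__init__.py`. A quick way to confirm the two stay in sync is sketched below. The helper names are hypothetical and the sample strings are trimmed copies of the lines shown in the diff; this is not part of DeepDoc's release workflow.

```python
import re
from typing import Optional


def metadata_version(pkg_info: str) -> Optional[str]:
    """Return the value of the 'Version:' header (not 'Metadata-Version:')."""
    match = re.search(r"^Version:\s*(\S+)\s*$", pkg_info, flags=re.MULTILINE)
    return match.group(1) if match else None


def module_version(source: str) -> Optional[str]:
    """Return the string assigned to __version__ in a module's source text."""
    match = re.search(r"^__version__\s*=\s*(['\"])(.+?)\1", source, flags=re.MULTILINE)
    return match.group(2) if match else None


# Sample inputs trimmed from the 1.4.0 side of the diff.
PKG_INFO = "Metadata-Version: 2.4\nName: deepdoc\nVersion: 1.4.0\n"
INIT_PY = '"""DeepDoc docstring."""\n__version__ = "1.4.0"\n'

versions_match = metadata_version(PKG_INFO) == module_version(INIT_PY)
```

Anchoring the first pattern at the start of a line keeps it from matching the `Metadata-Version: 2.4` header, which would otherwise report the wrong number.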
