PyPI - deepdoc - Versions diffs - 1.2.0__tar.gz → 1.3.0__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (131)

{deepdoc-1.2.0 → deepdoc-1.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: deepdoc
-Version: 1.2.0
+Version: 1.3.0
 Summary: Auto-generate beautiful docs from any codebase
 Author: Pranav Kumar
 License: MIT
@@ -310,6 +310,22 @@ deepdoc config set output_dir documentation            # Change output dir
 deepdoc config set llm.api_key_env AZURE_API_KEY       # Change API key env var
 ```
+### `deepdoc benchmark`
+Run planner benchmark cases and optionally generate a combined docs+chatbot quality scorecard.
+```bash
+deepdoc benchmark --catalog benchmarks/catalog.json
+deepdoc benchmark --repo /path/to/repo --gold benchmarks/gold.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json --scorecard-out .deepdoc/quality_scorecard.json --strict-scorecard
+deepdoc benchmark --generated-root /Users/apple/autodoc/docs --scorecard-out /Users/apple/autodoc/docs/_scorecards/latest.json
+```
+Use `--strict-scorecard` to fail the command when completeness gates are not met.
+When you do not have a hand-written benchmark catalog or chatbot eval file yet, use artifact mode (`--generated-root` or `--artifact-repo`) to compute a provisional scorecard directly from persisted `.deepdoc/` outputs.
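In CI, `--strict-scorecard` can act as a quality gate. The Python sketch below only assembles the documented flags and runs the CLI. It assumes `deepdoc` is on `PATH` and that a failed gate surfaces as a non-zero exit code, which the text above implies but does not spell out. `build_benchmark_cmd` and `run_quality_gate` are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def build_benchmark_cmd(
    catalog: str,
    chatbot_eval: Optional[str] = None,
    scorecard_out: Optional[str] = None,
    strict: bool = False,
) -> List[str]:
    """Assemble a `deepdoc benchmark` argv from the flags documented above."""
    cmd = ["deepdoc", "benchmark", "--catalog", catalog]
    if chatbot_eval:
        cmd += ["--chatbot-eval", chatbot_eval]
    if scorecard_out:
        cmd += ["--scorecard-out", scorecard_out]
    if strict:
        cmd.append("--strict-scorecard")  # fail when completeness gates are not met
    return cmd


def run_quality_gate(cmd: List[str]) -> int:
    """Run the benchmark; a non-zero return code is assumed to mean a failed gate."""
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_quality_gate(build_benchmark_cmd("benchmarks/catalog.json", chatbot_eval="benchmarks/chatbot_eval.json", scorecard_out=".deepdoc/quality_scorecard.json", strict=True))` returns an exit code that a CI step can propagate.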
 ---
 ## LLM Provider Setup
@@ -618,7 +634,9 @@ chatbot:
     base_url: ""
     api_version: ""
     temperature: 0.1
-    max_tokens: 16000
+    max_tokens: 24000
+    continuation_retries: 2                   # Auto-continue if answer ends abruptly
+    continuation_context_chars: 12000         # Tail chars included in continuation prompt
   embeddings:                                 # LLM used for embedding code/docs
     provider: "azure"
@@ -643,16 +661,20 @@ chatbot:
     top_k_code: 15
     top_k_artifact: 8
     top_k_docs: 6
-    top_k_relationship: 6
+    top_k_relationship: 8
     candidate_top_k_code: 30
     candidate_top_k_artifact: 16
     candidate_top_k_docs: 12
     candidate_top_k_relationship: 12
     max_prompt_code_chunks: 12
     max_prompt_artifact_chunks: 6
-    max_prompt_doc_chunks: 4
-    max_prompt_relationship_chunks: 4
-    max_prompt_chars: 200000
+    max_prompt_doc_chunks: 6
+    max_prompt_relationship_chunks: 6
+    max_prompt_chars: 120000
+    fast_mode_use_llm_retrieval_steps: false  # Fast mode skips expansion/rerank by default
+    fast_mode_iterative_retrieval: false      # Fast mode skips second-pass follow-up retrieval
+    fast_mode_max_prompt_chars: 90000         # Smaller prompt budget for faster /query answers
+    deep_mode_max_prompt_chars: 140000        # Larger budget for /deep-research synthesis
     lexical_retrieval: true
     lexical_candidate_limit: 24
     query_expansion: true
@@ -666,7 +688,8 @@ chatbot:
     graph_neighbor_relationship_chunks_per_file: 2
     graph_neighbor_max_docs: 4
     rerank: true
-    rerank_candidate_limit: 20
+    rerank_candidate_limit: 32
+    rerank_candidate_limit_per_kind: 8
     rerank_preview_chars: 450
     stitch_adjacent_code_chunks: true
     stitch_max_adjacent_chunks: 2
@@ -674,7 +697,8 @@ chatbot:
     live_fallback_max_files: 6
     live_fallback_max_per_file: 2
     live_fallback_context_lines: 12
-    deep_research_chunk_chars: 1600
+    deep_research_chunk_chars: 3200
+    deep_research_top_k: 10
   chunking:
     code_chunk_lines: 120
@@ -709,7 +733,9 @@ chatbot:
 | `chatbot.answer.base_url` | `""` | Custom endpoint (for Azure, Ollama, etc.) |
 | `chatbot.answer.api_version` | `""` | Azure API version string |
 | `chatbot.answer.temperature` | `0.1` | Sampling temperature (lower = more deterministic) |
-| `chatbot.answer.max_tokens` | `16000` | Max tokens per answer |
+| `chatbot.answer.max_tokens` | `24000` | Max tokens per answer |
+| `chatbot.answer.continuation_retries` | `2` | Extra completion attempts when an answer appears truncated |
+| `chatbot.answer.continuation_context_chars` | `12000` | Number of trailing chars passed when asking the model to continue |
 | **Embeddings LLM** | | |
 | `chatbot.embeddings.provider` | `azure` | Provider for the embedding model |
 | `chatbot.embeddings.model` | `azure/text-embedding-3-large` | Embedding model |
@@ -721,16 +747,20 @@ chatbot:
 | `chatbot.retrieval.top_k_code` | `15` | Top code chunks retrieved per query |
 | `chatbot.retrieval.top_k_artifact` | `8` | Top artifact chunks retrieved per query |
 | `chatbot.retrieval.top_k_docs` | `6` | Top generated-doc and repo-doc chunks retrieved per query |
-| `chatbot.retrieval.top_k_relationship` | `6` | Top relationship chunks retrieved per query |
+| `chatbot.retrieval.top_k_relationship` | `8` | Top relationship chunks retrieved per query |
 | `chatbot.retrieval.candidate_top_k_code` | `30` | Candidate code chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_artifact` | `16` | Candidate artifact chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_docs` | `12` | Candidate doc chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_relationship` | `12` | Candidate relationship chunks gathered before reranking |
 | `chatbot.retrieval.max_prompt_code_chunks` | `12` | Max code chunks included in the final prompt |
 | `chatbot.retrieval.max_prompt_artifact_chunks` | `6` | Max artifact chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_doc_chunks` | `4` | Max doc chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_relationship_chunks` | `4` | Max relationship chunks included in the final prompt |
-| `chatbot.retrieval.max_prompt_chars` | `200000` | Total character budget for the assembled prompt |
+| `chatbot.retrieval.max_prompt_doc_chunks` | `6` | Max doc chunks in the final prompt |
+| `chatbot.retrieval.max_prompt_relationship_chunks` | `6` | Max relationship chunks included in the final prompt |
+| `chatbot.retrieval.max_prompt_chars` | `120000` | Default character budget for assembled prompts |
+| `chatbot.retrieval.fast_mode_use_llm_retrieval_steps` | `false` | In `/query` fast mode, disable LLM query expansion and reranking |
+| `chatbot.retrieval.fast_mode_iterative_retrieval` | `false` | In `/query` fast mode, disable iterative follow-up retrieval |
+| `chatbot.retrieval.fast_mode_max_prompt_chars` | `90000` | Prompt budget used by `/query` fast mode |
+| `chatbot.retrieval.deep_mode_max_prompt_chars` | `140000` | Prompt budget used by `/deep-research` |
 | `chatbot.retrieval.lexical_retrieval` | `true` | Blend exact-match retrieval with embedding retrieval |
 | `chatbot.retrieval.lexical_candidate_limit` | `24` | Max lexical candidates gathered before merge/rerank |
 | `chatbot.retrieval.query_expansion` | `true` | Use LLM to generate alternative search queries |
@@ -744,7 +774,8 @@ chatbot:
 | `chatbot.retrieval.graph_neighbor_relationship_chunks_per_file` | `2` | Relationship chunks per linked file during graph expansion |
 | `chatbot.retrieval.graph_neighbor_max_docs` | `4` | Max linked docs pulled in during graph expansion |
 | `chatbot.retrieval.rerank` | `true` | Use LLM to rerank retrieved chunks |
-| `chatbot.retrieval.rerank_candidate_limit` | `20` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit` | `32` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit_per_kind` | `8` | Per-kind candidate cap before filling the global rerank pool |
 | `chatbot.retrieval.rerank_preview_chars` | `450` | Characters of each chunk shown to the reranker |
 | `chatbot.retrieval.stitch_adjacent_code_chunks` | `true` | Expand exact-match code hits with adjacent windows from the same file |
 | `chatbot.retrieval.stitch_max_adjacent_chunks` | `2` | Max adjacent code windows stitched onto a top hit |
@@ -752,7 +783,8 @@ chatbot:
 | `chatbot.retrieval.live_fallback_max_files` | `6` | Max repo files inspected during a deep-research live fallback |
 | `chatbot.retrieval.live_fallback_max_per_file` | `2` | Max fallback snippets returned per inspected file |
 | `chatbot.retrieval.live_fallback_context_lines` | `12` | Lines per fallback snippet around each exact match |
-| `chatbot.retrieval.deep_research_chunk_chars` | `1600` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_chunk_chars` | `3200` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_top_k` | `10` | Retrieved chunks per deep-research sub-question |
 | **Chunking** | | |
 | `chatbot.chunking.code_chunk_lines` | `120` | Lines per code chunk |
 | `chatbot.chunking.code_chunk_overlap` | `20` | Overlap lines between code chunks |
@@ -829,22 +861,24 @@ During `deepdoc generate`, six corpora are built and stored in `.deepdoc/chatbot
 ### Chatbot Query Pipeline
-When a user asks a question, the backend runs a multi-step retrieval pipeline:
+When a user asks a question, the backend runs a mode-aware retrieval pipeline:
-1. **Query expansion** — The LLM generates up to 3 alternative search queries to improve recall.
+1. **Query expansion** — In default/deep mode, the LLM can generate alternative search queries to improve recall. Fast mode disables this by default.
 2. **Embedding** — All queries are embedded using the configured embedding model.
 3. **Hybrid retrieval** — FAISS similarity search and exact-match lexical search both gather candidates from each corpus.
-4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion.
+4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion. Fast mode can skip follow-up queries for lower latency.
 5. **Chunk stitching** — Exact-match code hits can pull adjacent code windows from the same file so larger implementations survive chunk boundaries.
-6. **Reranking** — The LLM scores and reranks the retrieved chunks for relevance.
+6. **Reranking** — In default/deep mode, the LLM can rerank candidates for relevance. Fast mode disables this by default.
 7. **Prompt assembly** — Query-type-aware budgets reserve space for the most important evidence types within the character budget.
-8. **Answer generation** — The answer LLM produces a grounded response with code, artifact, doc, repo-doc, relationship, and live-fallback citations when used.
+8. **Answer generation + continuity guard** — The answer LLM produces a grounded response, and if the output appears truncated (for example ending on a dangling heading), DeepDoc retries with a continuation prompt so the response finishes cleanly.
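Step 8's continuity guard is driven by `continuation_retries` and `continuation_context_chars`. The following is a minimal sketch of how such a loop could work, not DeepDoc's implementation. The truncation heuristic (dangling heading, unclosed code fence, missing terminal punctuation), the continuation prompt wording, and the `complete` callable standing in for the answer LLM are all assumptions.

```python
import re
from typing import Callable

FENCE = "`" * 3  # Markdown code fence marker
# Hypothetical heuristic: a last line that is a Markdown heading means the
# model stopped right after opening a new section.
_DANGLING_HEADING = re.compile(r"(?:^|\n)#{1,6} [^\n]*$")
_TERMINAL_CHARS = ".!?)`|\"'"


def looks_truncated(text: str) -> bool:
    """Guess whether an answer stopped mid-thought."""
    stripped = text.rstrip()
    if not stripped:
        return True
    if stripped.count(FENCE) % 2 == 1:  # a code fence was opened but never closed
        return True
    if _DANGLING_HEADING.search(stripped):
        return True
    return stripped[-1] not in _TERMINAL_CHARS


def answer_with_continuation(
    prompt: str,
    complete: Callable[[str], str],
    continuation_retries: int = 2,
    continuation_context_chars: int = 12000,
) -> str:
    """Call the model, then ask it to continue up to `continuation_retries` times."""
    answer = complete(prompt)
    for _ in range(continuation_retries):
        if not looks_truncated(answer):
            break
        tail = answer[-continuation_context_chars:]  # only the trailing context
        follow_up = (
            f"{prompt}\n\nYour previous answer was cut off. It ended with:\n"
            f"{tail}\n\nContinue exactly where it stopped without repeating text."
        )
        answer += complete(follow_up)
    return answer
```

Passing only the tail of the answer keeps each continuation prompt bounded even when `max_tokens` is large.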
 `POST /deep-research` uses the same indexed corpora first, but it can also inspect a small bounded set of live repo files when exact-match evidence is missing from the index. This fallback respects the repo's exclude rules, skips oversized/binary files, and is only used in deep research mode.
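The bounded live fallback can be pictured as a capped scan over candidate files, using the `live_fallback_*` defaults from the configuration table. This is a hypothetical sketch. How DeepDoc picks candidate files is not described here, so the caller passes them in. Exclude rules are modeled as `fnmatch` globs, "binary" means a NUL byte in the first 8 KiB, and the 200,000-byte size cap is an invented placeholder.

```python
import fnmatch
import os
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Snippet:
    path: str  # repo-relative path as passed in
    start_line: int  # 1-based, inclusive
    end_line: int  # 1-based, inclusive
    text: str


def _excluded(rel_path: str, exclude_globs: Sequence[str]) -> bool:
    return any(fnmatch.fnmatch(rel_path, pattern) for pattern in exclude_globs)


def _is_binary(path: str, probe_bytes: int = 8192) -> bool:
    # Heuristic: treat any NUL byte in the first block as binary content.
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(probe_bytes)


def live_fallback(
    repo_root: str,
    candidates: Sequence[str],
    needle: str,
    exclude_globs: Sequence[str] = (),
    max_files: int = 6,
    max_per_file: int = 2,
    context_lines: int = 12,
    max_bytes: int = 200_000,  # hypothetical size cap
) -> List[Snippet]:
    """Return exact-match snippets from at most `max_files` inspected files."""
    snippets: List[Snippet] = []
    inspected = 0
    for rel in candidates:
        if inspected >= max_files:
            break
        full = os.path.join(repo_root, rel)
        if _excluded(rel, exclude_globs) or not os.path.isfile(full):
            continue
        if os.path.getsize(full) > max_bytes or _is_binary(full):
            continue  # skip oversized and binary files without counting them
        inspected += 1
        with open(full, encoding="utf-8", errors="replace") as fh:
            lines = fh.read().splitlines()
        hits = [i for i, line in enumerate(lines) if needle in line][:max_per_file]
        for i in hits:
            # Window of up to `context_lines` lines that contains the match.
            lo = max(0, i - context_lines // 2)
            hi = min(len(lines), lo + context_lines)
            snippets.append(Snippet(rel, lo + 1, hi, "\n".join(lines[lo:hi])))
    return snippets
```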
+`POST /query` and `POST /deep-research` now return `response_mode` in the payload (`fast`, `deep`, or `default`) so clients can confirm which retrieval profile generated the result.
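The three response modes differ mainly in which LLM-assisted steps run and how large the prompt budget is. Below is a sketch of how the documented `chatbot.retrieval` keys might resolve into a per-mode profile. `RetrievalProfile` and `resolve_profile` are hypothetical names, and keeping follow-up retrieval on outside fast mode is an assumption, since no default/deep toggle is listed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping

# Documented defaults for the keys this sketch reads.
RETRIEVAL_DEFAULTS: Dict[str, Any] = {
    "query_expansion": True,
    "rerank": True,
    "max_prompt_chars": 120000,
    "fast_mode_use_llm_retrieval_steps": False,
    "fast_mode_iterative_retrieval": False,
    "fast_mode_max_prompt_chars": 90000,
    "deep_mode_max_prompt_chars": 140000,
}


@dataclass(frozen=True)
class RetrievalProfile:
    response_mode: str  # mirrors the response_mode field: "fast", "deep" or "default"
    query_expansion: bool
    rerank: bool
    iterative_retrieval: bool
    max_prompt_chars: int


def resolve_profile(mode: str, retrieval_cfg: Mapping[str, Any]) -> RetrievalProfile:
    cfg = {**RETRIEVAL_DEFAULTS, **retrieval_cfg}
    if mode == "fast":
        # Fast mode gates expansion and rerank behind a single switch.
        llm_steps = bool(cfg["fast_mode_use_llm_retrieval_steps"])
        return RetrievalProfile(
            response_mode="fast",
            query_expansion=bool(cfg["query_expansion"]) and llm_steps,
            rerank=bool(cfg["rerank"]) and llm_steps,
            iterative_retrieval=bool(cfg["fast_mode_iterative_retrieval"]),
            max_prompt_chars=int(cfg["fast_mode_max_prompt_chars"]),
        )
    deep = mode == "deep"
    return RetrievalProfile(
        response_mode="deep" if deep else "default",
        query_expansion=bool(cfg["query_expansion"]),
        rerank=bool(cfg["rerank"]),
        iterative_retrieval=True,  # assumption: no documented default/deep toggle
        max_prompt_chars=int(cfg["deep_mode_max_prompt_chars" if deep else "max_prompt_chars"]),
    )
```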
 ### Chatbot API Endpoints
-The generated `chatbot_backend/` exposes two endpoints:
+The generated `chatbot_backend/` exposes three endpoints:
 **Health check:**
 ```
@@ -865,6 +899,19 @@ POST /query
 The response includes the answer text, code citations (file path + line range), artifact citations, and links to relevant generated doc pages.
+`/query` is optimized for speed: it runs retrieval in fast mode (no LLM query expansion/rerank by default) and returns an answer plus citations.
+**Retrieve context only (no answer generation):**
+```
+POST /query-context
+{
+  "question": "Where is reshipping implemented?",
+  "history": []
+}
+```
+`/query-context` returns selected citations/chunks only. Use this endpoint to inspect retrieval quality independently from answer generation.
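A client can call `/query-context` to inspect retrieval on its own. The Python sketch below uses only the standard library. The base URL is a hypothetical local address, the shape of `history` entries is not documented here (so the example defaults to an empty list), and the response is returned as raw JSON because its exact schema is not shown.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

BASE_URL = "http://localhost:8000"  # hypothetical; point at your chatbot_backend


def build_query_context_request(
    question: str,
    history: Optional[List[Dict[str, Any]]] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST /query-context request with the documented JSON body."""
    payload = {"question": question, "history": history or []}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/query-context",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_context(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response as-is."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_context(build_query_context_request("Where is reshipping implemented?"))` returns the selected citations and chunks, which you can compare against the citations `/query` returns for the same question.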
 ### Deploying the Chatbot
 For local development, `deepdoc serve` handles everything automatically. For production:

{deepdoc-1.2.0 → deepdoc-1.3.0}/README.md RENAMED

@@ -271,6 +271,22 @@ deepdoc config set output_dir documentation            # Change output dir
 deepdoc config set llm.api_key_env AZURE_API_KEY       # Change API key env var
 ```
+### `deepdoc benchmark`
+Run planner benchmark cases and optionally generate a combined docs+chatbot quality scorecard.
+```bash
+deepdoc benchmark --catalog benchmarks/catalog.json
+deepdoc benchmark --repo /path/to/repo --gold benchmarks/gold.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json
+deepdoc benchmark --catalog benchmarks/catalog.json --chatbot-eval benchmarks/chatbot_eval.json --scorecard-out .deepdoc/quality_scorecard.json --strict-scorecard
+deepdoc benchmark --generated-root /Users/apple/autodoc/docs --scorecard-out /Users/apple/autodoc/docs/_scorecards/latest.json
+```
+Use `--strict-scorecard` to fail the command when completeness gates are not met.
+When you do not have a hand-written benchmark catalog or chatbot eval file yet, use artifact mode (`--generated-root` or `--artifact-repo`) to compute a provisional scorecard directly from persisted `.deepdoc/` outputs.
 ---
 ## LLM Provider Setup
@@ -579,7 +595,9 @@ chatbot:
     base_url: ""
     api_version: ""
     temperature: 0.1
-    max_tokens: 16000
+    max_tokens: 24000
+    continuation_retries: 2                   # Auto-continue if answer ends abruptly
+    continuation_context_chars: 12000         # Tail chars included in continuation prompt
   embeddings:                                 # LLM used for embedding code/docs
     provider: "azure"
@@ -604,16 +622,20 @@ chatbot:
     top_k_code: 15
     top_k_artifact: 8
     top_k_docs: 6
-    top_k_relationship: 6
+    top_k_relationship: 8
     candidate_top_k_code: 30
     candidate_top_k_artifact: 16
     candidate_top_k_docs: 12
     candidate_top_k_relationship: 12
     max_prompt_code_chunks: 12
     max_prompt_artifact_chunks: 6
-    max_prompt_doc_chunks: 4
-    max_prompt_relationship_chunks: 4
-    max_prompt_chars: 200000
+    max_prompt_doc_chunks: 6
+    max_prompt_relationship_chunks: 6
+    max_prompt_chars: 120000
+    fast_mode_use_llm_retrieval_steps: false  # Fast mode skips expansion/rerank by default
+    fast_mode_iterative_retrieval: false      # Fast mode skips second-pass follow-up retrieval
+    fast_mode_max_prompt_chars: 90000         # Smaller prompt budget for faster /query answers
+    deep_mode_max_prompt_chars: 140000        # Larger budget for /deep-research synthesis
     lexical_retrieval: true
     lexical_candidate_limit: 24
     query_expansion: true
@@ -627,7 +649,8 @@ chatbot:
     graph_neighbor_relationship_chunks_per_file: 2
     graph_neighbor_max_docs: 4
     rerank: true
-    rerank_candidate_limit: 20
+    rerank_candidate_limit: 32
+    rerank_candidate_limit_per_kind: 8
     rerank_preview_chars: 450
     stitch_adjacent_code_chunks: true
     stitch_max_adjacent_chunks: 2
@@ -635,7 +658,8 @@ chatbot:
     live_fallback_max_files: 6
     live_fallback_max_per_file: 2
     live_fallback_context_lines: 12
-    deep_research_chunk_chars: 1600
+    deep_research_chunk_chars: 3200
+    deep_research_top_k: 10
   chunking:
     code_chunk_lines: 120
@@ -670,7 +694,9 @@ chatbot:
 | `chatbot.answer.base_url` | `""` | Custom endpoint (for Azure, Ollama, etc.) |
 | `chatbot.answer.api_version` | `""` | Azure API version string |
 | `chatbot.answer.temperature` | `0.1` | Sampling temperature (lower = more deterministic) |
-| `chatbot.answer.max_tokens` | `16000` | Max tokens per answer |
+| `chatbot.answer.max_tokens` | `24000` | Max tokens per answer |
+| `chatbot.answer.continuation_retries` | `2` | Extra completion attempts when an answer appears truncated |
+| `chatbot.answer.continuation_context_chars` | `12000` | Number of trailing chars passed when asking the model to continue |
 | **Embeddings LLM** | | |
 | `chatbot.embeddings.provider` | `azure` | Provider for the embedding model |
 | `chatbot.embeddings.model` | `azure/text-embedding-3-large` | Embedding model |
@@ -682,16 +708,20 @@ chatbot:
 | `chatbot.retrieval.top_k_code` | `15` | Top code chunks retrieved per query |
 | `chatbot.retrieval.top_k_artifact` | `8` | Top artifact chunks retrieved per query |
 | `chatbot.retrieval.top_k_docs` | `6` | Top generated-doc and repo-doc chunks retrieved per query |
-| `chatbot.retrieval.top_k_relationship` | `6` | Top relationship chunks retrieved per query |
+| `chatbot.retrieval.top_k_relationship` | `8` | Top relationship chunks retrieved per query |
 | `chatbot.retrieval.candidate_top_k_code` | `30` | Candidate code chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_artifact` | `16` | Candidate artifact chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_docs` | `12` | Candidate doc chunks gathered before reranking |
 | `chatbot.retrieval.candidate_top_k_relationship` | `12` | Candidate relationship chunks gathered before reranking |
 | `chatbot.retrieval.max_prompt_code_chunks` | `12` | Max code chunks included in the final prompt |
 | `chatbot.retrieval.max_prompt_artifact_chunks` | `6` | Max artifact chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_doc_chunks` | `4` | Max doc chunks in the final prompt |
-| `chatbot.retrieval.max_prompt_relationship_chunks` | `4` | Max relationship chunks included in the final prompt |
-| `chatbot.retrieval.max_prompt_chars` | `200000` | Total character budget for the assembled prompt |
+| `chatbot.retrieval.max_prompt_doc_chunks` | `6` | Max doc chunks in the final prompt |
+| `chatbot.retrieval.max_prompt_relationship_chunks` | `6` | Max relationship chunks included in the final prompt |
+| `chatbot.retrieval.max_prompt_chars` | `120000` | Default character budget for assembled prompts |
+| `chatbot.retrieval.fast_mode_use_llm_retrieval_steps` | `false` | In `/query` fast mode, disable LLM query expansion and reranking |
+| `chatbot.retrieval.fast_mode_iterative_retrieval` | `false` | In `/query` fast mode, disable iterative follow-up retrieval |
+| `chatbot.retrieval.fast_mode_max_prompt_chars` | `90000` | Prompt budget used by `/query` fast mode |
+| `chatbot.retrieval.deep_mode_max_prompt_chars` | `140000` | Prompt budget used by `/deep-research` |
 | `chatbot.retrieval.lexical_retrieval` | `true` | Blend exact-match retrieval with embedding retrieval |
 | `chatbot.retrieval.lexical_candidate_limit` | `24` | Max lexical candidates gathered before merge/rerank |
 | `chatbot.retrieval.query_expansion` | `true` | Use LLM to generate alternative search queries |
@@ -705,7 +735,8 @@ chatbot:
 | `chatbot.retrieval.graph_neighbor_relationship_chunks_per_file` | `2` | Relationship chunks per linked file during graph expansion |
 | `chatbot.retrieval.graph_neighbor_max_docs` | `4` | Max linked docs pulled in during graph expansion |
 | `chatbot.retrieval.rerank` | `true` | Use LLM to rerank retrieved chunks |
-| `chatbot.retrieval.rerank_candidate_limit` | `20` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit` | `32` | Max candidates sent to the reranker |
+| `chatbot.retrieval.rerank_candidate_limit_per_kind` | `8` | Per-kind candidate cap before filling the global rerank pool |
 | `chatbot.retrieval.rerank_preview_chars` | `450` | Characters of each chunk shown to the reranker |
 | `chatbot.retrieval.stitch_adjacent_code_chunks` | `true` | Expand exact-match code hits with adjacent windows from the same file |
 | `chatbot.retrieval.stitch_max_adjacent_chunks` | `2` | Max adjacent code windows stitched onto a top hit |
@@ -713,7 +744,8 @@ chatbot:
 | `chatbot.retrieval.live_fallback_max_files` | `6` | Max repo files inspected during a deep-research live fallback |
 | `chatbot.retrieval.live_fallback_max_per_file` | `2` | Max fallback snippets returned per inspected file |
 | `chatbot.retrieval.live_fallback_context_lines` | `12` | Lines per fallback snippet around each exact match |
-| `chatbot.retrieval.deep_research_chunk_chars` | `1600` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_chunk_chars` | `3200` | Max chars per evidence chunk passed into deep-research step answers |
+| `chatbot.retrieval.deep_research_top_k` | `10` | Retrieved chunks per deep-research sub-question |
 | **Chunking** | | |
 | `chatbot.chunking.code_chunk_lines` | `120` | Lines per code chunk |
 | `chatbot.chunking.code_chunk_overlap` | `20` | Overlap lines between code chunks |
@@ -790,22 +822,24 @@ During `deepdoc generate`, six corpora are built and stored in `.deepdoc/chatbot
 ### Chatbot Query Pipeline
-When a user asks a question, the backend runs a multi-step retrieval pipeline:
+When a user asks a question, the backend runs a mode-aware retrieval pipeline:
-1. **Query expansion** — The LLM generates up to 3 alternative search queries to improve recall.
+1. **Query expansion** — In default/deep mode, the LLM can generate alternative search queries to improve recall. Fast mode disables this by default.
 2. **Embedding** — All queries are embedded using the configured embedding model.
 3. **Hybrid retrieval** — FAISS similarity search and exact-match lexical search both gather candidates from each corpus.
-4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion.
+4. **Follow-up retrieval** — The backend can derive focused second-pass searches and pull linked files/docs via graph-neighbor expansion. Fast mode can skip follow-up queries for lower latency.
 5. **Chunk stitching** — Exact-match code hits can pull adjacent code windows from the same file so larger implementations survive chunk boundaries.
-6. **Reranking** — The LLM scores and reranks the retrieved chunks for relevance.
+6. **Reranking** — In default/deep mode, the LLM can rerank candidates for relevance. Fast mode disables this by default.
 7. **Prompt assembly** — Query-type-aware budgets reserve space for the most important evidence types within the character budget.
-8. **Answer generation** — The answer LLM produces a grounded response with code, artifact, doc, repo-doc, relationship, and live-fallback citations when used.
+8. **Answer generation + continuity guard** — The answer LLM produces a grounded response, and if the output appears truncated (for example ending on a dangling heading), DeepDoc retries with a continuation prompt so the response finishes cleanly.
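The new `rerank_candidate_limit_per_kind` setting suggests a two-stage pool for step 6: each evidence kind first gets a bounded share, then the global `rerank_candidate_limit` is filled from whatever is left. The sketch below is one plausible reading of that table description, not DeepDoc's code. `Candidate` and `build_rerank_pool` are hypothetical, and ordering by first-stage score is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    kind: str  # e.g. "code", "artifact", "doc", "relationship"
    score: float  # first-stage retrieval score; higher is better


def build_rerank_pool(
    candidates: List[Candidate],
    limit: int = 32,
    limit_per_kind: int = 8,
) -> List[Candidate]:
    """Cap each kind first, then fill the global pool with the best leftovers."""
    by_kind: Dict[str, List[Candidate]] = defaultdict(list)
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        by_kind[cand.kind].append(cand)

    pool: List[Candidate] = []
    leftovers: List[Candidate] = []
    for ranked in by_kind.values():
        pool.extend(ranked[:limit_per_kind])  # per-kind guaranteed share
        leftovers.extend(ranked[limit_per_kind:])

    pool.sort(key=lambda c: c.score, reverse=True)
    if len(pool) >= limit:
        # Per-kind shares alone already exceed the global limit.
        return pool[:limit]

    leftovers.sort(key=lambda c: c.score, reverse=True)
    pool.extend(leftovers[: limit - len(pool)])
    return pool
```

Without the per-kind cap, a code-heavy result set could fill all 32 slots and starve doc or relationship chunks before the reranker ever sees them.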
 `POST /deep-research` uses the same indexed corpora first, but it can also inspect a small bounded set of live repo files when exact-match evidence is missing from the index. This fallback respects the repo's exclude rules, skips oversized/binary files, and is only used in deep research mode.
+`POST /query` and `POST /deep-research` now return `response_mode` in the payload (`fast`, `deep`, or `default`) so clients can confirm which retrieval profile generated the result.
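Step 7's prompt assembly combines per-kind caps (`max_prompt_*_chunks`) with a character budget (`max_prompt_chars`, or its fast/deep variants). The sketch below is a hypothetical model of that idea. The caller-supplied `priority` order stands in for DeepDoc's query-type awareness, "reserving space" is modeled as a first pass that admits the best chunk of each kind before any kind is topped up, and separators or headers are not counted against the budget.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

# Per-kind caps mirroring the documented max_prompt_*_chunks defaults.
MAX_PROMPT_CHUNKS: Mapping[str, int] = {
    "code": 12,
    "artifact": 6,
    "doc": 6,
    "relationship": 6,
}


def assemble_context(
    ranked: Mapping[str, List[str]],
    priority: Sequence[str],
    max_prompt_chars: int = 120000,
    max_chunks: Mapping[str, int] = MAX_PROMPT_CHUNKS,
) -> List[Tuple[str, str]]:
    """Pick (kind, text) pairs within per-kind caps and a total character budget."""
    selected: List[Tuple[str, str]] = []
    used = 0
    cursor: Dict[str, int] = {kind: 0 for kind in priority}  # next unread chunk per kind
    count: Dict[str, int] = {kind: 0 for kind in priority}  # chunks selected per kind

    def take_next(kind: str) -> bool:
        """Select the next chunk of `kind` that fits; skip ones that would overflow."""
        nonlocal used
        chunks = ranked.get(kind, [])
        while cursor[kind] < len(chunks) and count[kind] < max_chunks.get(kind, 0):
            text = chunks[cursor[kind]]
            cursor[kind] += 1
            if used + len(text) <= max_prompt_chars:
                selected.append((kind, text))
                used += len(text)
                count[kind] += 1
                return True
        return False

    # Pass 1: reserve room for the best chunk of every kind, in priority order.
    for kind in priority:
        take_next(kind)
    # Pass 2: top each kind up to its cap, again in priority order.
    for kind in priority:
        while take_next(kind):
            pass
    return selected
```

With a tight budget, a plain code-first greedy fill could spend the whole budget on code; the reservation pass keeps at least one doc and one relationship chunk when they fit.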
 ### Chatbot API Endpoints
-The generated `chatbot_backend/` exposes two endpoints:
+The generated `chatbot_backend/` exposes three endpoints:
 **Health check:**
 ```
@@ -826,6 +860,19 @@ POST /query
 The response includes the answer text, code citations (file path + line range), artifact citations, and links to relevant generated doc pages.
+`/query` is optimized for speed: it runs retrieval in fast mode (no LLM query expansion/rerank by default) and returns an answer plus citations.
+**Retrieve context only (no answer generation):**
+```
+POST /query-context
+{
+  "question": "Where is reshipping implemented?",
+  "history": []
+}
+```
+`/query-context` returns selected citations/chunks only. Use this endpoint to inspect retrieval quality independently from answer generation.
 ### Deploying the Chatbot
 For local development, `deepdoc serve` handles everything automatically. For production:

{deepdoc-1.2.0 → deepdoc-1.3.0}/deepdoc/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """DeepDoc — Auto-generate beautiful docs from any codebase."""
-__version__ = "1.2.0"
+__version__ = "1.3.0"
