biblicus 1.1.0__tar.gz → 1.1.1__tar.gz

Files changed (425)
  1. {biblicus-1.1.0/src/biblicus.egg-info → biblicus-1.1.1}/PKG-INFO +24 -24
  2. {biblicus-1.1.0 → biblicus-1.1.1}/README.md +23 -23
  3. biblicus-1.1.0/docs/ANALYSIS.md → biblicus-1.1.1/docs/analysis.md +3 -3
  4. biblicus-1.1.1/docs/architecture.md +107 -0
  5. {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/index.md +5 -5
  6. {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/scan.md +2 -2
  7. {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/sqlite-full-text-search.md +2 -2
  8. biblicus-1.1.0/docs/BACKENDS.md → biblicus-1.1.1/docs/backends.md +1 -1
  9. biblicus-1.1.0/docs/DEMOS.md → biblicus-1.1.1/docs/demos.md +29 -116
  10. biblicus-1.1.0/docs/EMBEDDING_RETRIEVAL.md → biblicus-1.1.1/docs/embedding-retrieval.md +1 -1
  11. biblicus-1.1.0/docs/EXTRACTION.md → biblicus-1.1.1/docs/extraction.md +1 -1
  12. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/index.md +1 -1
  13. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/ocr/paddleocr-vl.md +2 -2
  14. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/ocr/rapidocr.md +1 -1
  15. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/pipeline.md +2 -2
  16. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-longest.md +2 -2
  17. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-override.md +2 -2
  18. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-smart-override.md +2 -2
  19. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-text.md +2 -2
  20. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/speech-to-text/deepgram.md +2 -2
  21. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/speech-to-text/openai.md +2 -2
  22. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/markitdown.md +1 -1
  23. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/metadata.md +2 -2
  24. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/pass-through.md +2 -2
  25. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/pdf.md +1 -1
  26. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/unstructured.md +1 -1
  27. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/vlm-document/docling-granite.md +1 -1
  28. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/vlm-document/docling-smol.md +1 -1
  29. biblicus-1.1.0/docs/FEATURE_INDEX.md → biblicus-1.1.1/docs/feature-index.md +23 -23
  30. {biblicus-1.1.0 → biblicus-1.1.1}/docs/index.rst +35 -36
  31. biblicus-1.1.0/docs/RETRIEVAL.md → biblicus-1.1.1/docs/retrieval.md +3 -3
  32. biblicus-1.1.0/docs/ROADMAP.md → biblicus-1.1.1/docs/roadmap.md +2 -2
  33. biblicus-1.1.0/docs/TEXT_UTILITIES.md → biblicus-1.1.1/docs/text-utilities.md +27 -27
  34. {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/sequence_markov.md +1 -1
  35. biblicus-1.1.0/docs/UTILITIES.md → biblicus-1.1.1/docs/utilities.md +3 -3
  36. {biblicus-1.1.0 → biblicus-1.1.1}/pyproject.toml +1 -1
  37. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/__init__.py +1 -1
  38. {biblicus-1.1.0 → biblicus-1.1.1/src/biblicus.egg-info}/PKG-INFO +24 -24
  39. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/SOURCES.txt +28 -29
  40. biblicus-1.1.0/docs/ARCHITECTURE.md +0 -46
  41. biblicus-1.1.0/docs/ARCHITECTURE_DETAIL.md +0 -267
  42. {biblicus-1.1.0 → biblicus-1.1.1}/LICENSE +0 -0
  43. {biblicus-1.1.0 → biblicus-1.1.1}/MANIFEST.in +0 -0
  44. {biblicus-1.1.0 → biblicus-1.1.1}/THIRD_PARTY_NOTICES.md +0 -0
  45. {biblicus-1.1.0 → biblicus-1.1.1}/datasets/extraction_lab/labels.json +0 -0
  46. {biblicus-1.1.0 → biblicus-1.1.1}/datasets/retrieval_lab/labels.json +0 -0
  47. {biblicus-1.1.0 → biblicus-1.1.1}/datasets/wikipedia_mini.json +0 -0
  48. {biblicus-1.1.0 → biblicus-1.1.1}/docs/CHUNKING.md +0 -0
  49. {biblicus-1.1.0 → biblicus-1.1.1}/docs/CORPUS.md +0 -0
  50. {biblicus-1.1.0 → biblicus-1.1.1}/docs/PROFILING.md +0 -0
  51. {biblicus-1.1.0 → biblicus-1.1.1}/docs/STT.md +0 -0
  52. {biblicus-1.1.0 → biblicus-1.1.1}/docs/TESTING.md +0 -0
  53. {biblicus-1.1.0 → biblicus-1.1.1}/docs/api.rst +0 -0
  54. {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/embedding-index-file.md +0 -0
  55. {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/embedding-index-inmemory.md +0 -0
  56. {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/tf-vector.md +0 -0
  57. {biblicus-1.1.0 → biblicus-1.1.1}/docs/conf.py +0 -0
  58. biblicus-1.1.0/docs/CONTEXT_ENGINE_DEMO.md → biblicus-1.1.1/docs/context-engine-demo.md +0 -0
  59. biblicus-1.1.0/docs/CONTEXT_ENGINE.md → biblicus-1.1.1/docs/context-engine.md +0 -0
  60. biblicus-1.1.0/docs/CONTEXT_PACK.md → biblicus-1.1.1/docs/context-pack.md +0 -0
  61. biblicus-1.1.0/docs/CORPUS_DESIGN.md → biblicus-1.1.1/docs/corpus-design.md +0 -0
  62. biblicus-1.1.0/docs/EXTRACTION_EVALUATION.md → biblicus-1.1.1/docs/extraction-evaluation.md +0 -0
  63. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/ocr/index.md +0 -0
  64. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/index.md +0 -0
  65. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/speech-to-text/index.md +0 -0
  66. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/index.md +0 -0
  67. {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/vlm-document/index.md +0 -0
  68. biblicus-1.1.0/docs/KNOWLEDGE_BASE.md → biblicus-1.1.1/docs/knowledge-base.md +0 -0
  69. biblicus-1.1.0/docs/MARKOV_ANALYSIS.md → biblicus-1.1.1/docs/markov-analysis.md +0 -0
  70. biblicus-1.1.0/docs/RETRIEVAL_EVALUATION.md → biblicus-1.1.1/docs/retrieval-evaluation.md +0 -0
  71. biblicus-1.1.0/docs/RETRIEVAL_QUALITY.md → biblicus-1.1.1/docs/retrieval-quality.md +0 -0
  72. biblicus-1.1.0/docs/TEXT_ANNOTATE.md → biblicus-1.1.1/docs/text-annotate.md +0 -0
  73. biblicus-1.1.0/docs/TEXT_EXTRACT.md → biblicus-1.1.1/docs/text-extract.md +0 -0
  74. biblicus-1.1.0/docs/TEXT_LINK.md → biblicus-1.1.1/docs/text-link.md +0 -0
  75. biblicus-1.1.0/docs/TEXT_REDACT.md → biblicus-1.1.1/docs/text-redact.md +0 -0
  76. biblicus-1.1.0/docs/TEXT_SLICE.md → biblicus-1.1.1/docs/text-slice.md +0 -0
  77. biblicus-1.1.0/docs/TOPIC_MODELING.md → biblicus-1.1.1/docs/topic-modeling.md +0 -0
  78. biblicus-1.1.0/docs/USE_CASES.md → biblicus-1.1.1/docs/use-cases.md +0 -0
  79. {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/notes_to_context_pack.md +0 -0
  80. {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/text_folder_search.md +0 -0
  81. {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/text_redact.md +0 -0
  82. biblicus-1.1.0/docs/USER_CONFIGURATION.md → biblicus-1.1.1/docs/user-configuration.md +0 -0
  83. {biblicus-1.1.0 → biblicus-1.1.1}/features/70_context_retriever.feature +0 -0
  84. {biblicus-1.1.0 → biblicus-1.1.1}/features/71_context_compaction.feature +0 -0
  85. {biblicus-1.1.0 → biblicus-1.1.1}/features/72_context_history_compaction.feature +0 -0
  86. {biblicus-1.1.0 → biblicus-1.1.1}/features/73_context_nested_compaction.feature +0 -0
  87. {biblicus-1.1.0 → biblicus-1.1.1}/features/74_context_regeneration.feature +0 -0
  88. {biblicus-1.1.0 → biblicus-1.1.1}/features/75_context_default_regeneration.feature +0 -0
  89. {biblicus-1.1.0 → biblicus-1.1.1}/features/76_context_pack_budget_weights.feature +0 -0
  90. {biblicus-1.1.0 → biblicus-1.1.1}/features/77_context_default_pack_priority.feature +0 -0
  91. {biblicus-1.1.0 → biblicus-1.1.1}/features/78_context_default_pack_weights.feature +0 -0
  92. {biblicus-1.1.0 → biblicus-1.1.1}/features/79_context_nested_context_packs.feature +0 -0
  93. {biblicus-1.1.0 → biblicus-1.1.1}/features/80_context_nested_pack_budget_cap.feature +0 -0
  94. {biblicus-1.1.0 → biblicus-1.1.1}/features/81_context_nested_regeneration.feature +0 -0
  95. {biblicus-1.1.0 → biblicus-1.1.1}/features/82_context_explicit_regeneration.feature +0 -0
  96. {biblicus-1.1.0 → biblicus-1.1.1}/features/83_context_explicit_pack_priority.feature +0 -0
  97. {biblicus-1.1.0 → biblicus-1.1.1}/features/84_context_explicit_pack_weights.feature +0 -0
  98. {biblicus-1.1.0 → biblicus-1.1.1}/features/85_context_expansion.feature +0 -0
  99. {biblicus-1.1.0 → biblicus-1.1.1}/features/86_context_engine_errors.feature +0 -0
  100. {biblicus-1.1.0 → biblicus-1.1.1}/features/87_context_compactor_strategies.feature +0 -0
  101. {biblicus-1.1.0 → biblicus-1.1.1}/features/88_context_engine_model_validation.feature +0 -0
  102. {biblicus-1.1.0 → biblicus-1.1.1}/features/89_context_engine_internal_branches.feature +0 -0
  103. {biblicus-1.1.0 → biblicus-1.1.1}/features/90_embedding_index_evidence_fallback.feature +0 -0
  104. {biblicus-1.1.0 → biblicus-1.1.1}/features/91_tf_vector_internal_branches.feature +0 -0
  105. {biblicus-1.1.0 → biblicus-1.1.1}/features/93_context_engine_full_paths.feature +0 -0
  106. {biblicus-1.1.0 → biblicus-1.1.1}/features/ai_llm.feature +0 -0
  107. {biblicus-1.1.0 → biblicus-1.1.1}/features/ai_models.feature +0 -0
  108. {biblicus-1.1.0 → biblicus-1.1.1}/features/analysis_schema.feature +0 -0
  109. {biblicus-1.1.0 → biblicus-1.1.1}/features/backend_validation.feature +0 -0
  110. {biblicus-1.1.0 → biblicus-1.1.1}/features/biblicus_corpus.feature +0 -0
  111. {biblicus-1.1.0 → biblicus-1.1.1}/features/cli_entrypoint.feature +0 -0
  112. {biblicus-1.1.0 → biblicus-1.1.1}/features/cli_parsing.feature +0 -0
  113. {biblicus-1.1.0 → biblicus-1.1.1}/features/cli_step_spec_parsing.feature +0 -0
  114. {biblicus-1.1.0 → biblicus-1.1.1}/features/content_sniffing.feature +0 -0
  115. {biblicus-1.1.0 → biblicus-1.1.1}/features/context_engine_retrieval_internal_branches.feature +0 -0
  116. {biblicus-1.1.0 → biblicus-1.1.1}/features/context_engine_retrieve_context_pack.feature +0 -0
  117. {biblicus-1.1.0 → biblicus-1.1.1}/features/context_pack.feature +0 -0
  118. {biblicus-1.1.0 → biblicus-1.1.1}/features/context_pack_cli.feature +0 -0
  119. {biblicus-1.1.0 → biblicus-1.1.1}/features/context_pack_policies.feature +0 -0
  120. {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_edge_cases.feature +0 -0
  121. {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_identity.feature +0 -0
  122. {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_internal_branches.feature +0 -0
  123. {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_purge.feature +0 -0
  124. {biblicus-1.1.0 → biblicus-1.1.1}/features/crawl.feature +0 -0
  125. {biblicus-1.1.0 → biblicus-1.1.1}/features/docling_granite_extractor.feature +0 -0
  126. {biblicus-1.1.0 → biblicus-1.1.1}/features/docling_smol_extractor.feature +0 -0
  127. {biblicus-1.1.0 → biblicus-1.1.1}/features/embedding_index_internal_branches.feature +0 -0
  128. {biblicus-1.1.0 → biblicus-1.1.1}/features/embedding_retrieval.feature +0 -0
  129. {biblicus-1.1.0 → biblicus-1.1.1}/features/embeddings.feature +0 -0
  130. {biblicus-1.1.0 → biblicus-1.1.1}/features/environment.py +0 -0
  131. {biblicus-1.1.0 → biblicus-1.1.1}/features/error_cases.feature +0 -0
  132. {biblicus-1.1.0 → biblicus-1.1.1}/features/evaluation.feature +0 -0
  133. {biblicus-1.1.0 → biblicus-1.1.1}/features/evidence_processing.feature +0 -0
  134. {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_error_handling.feature +0 -0
  135. {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_evaluation.feature +0 -0
  136. {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_evaluation_lab.feature +0 -0
  137. {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_run_lifecycle.feature +0 -0
  138. {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_selection.feature +0 -0
  139. {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_selection_longest.feature +0 -0
  140. {biblicus-1.1.0 → biblicus-1.1.1}/features/extractor_pipeline.feature +0 -0
  141. {biblicus-1.1.0 → biblicus-1.1.1}/features/extractor_validation.feature +0 -0
  142. {biblicus-1.1.0 → biblicus-1.1.1}/features/frontmatter.feature +0 -0
  143. {biblicus-1.1.0 → biblicus-1.1.1}/features/hook_config_validation.feature +0 -0
  144. {biblicus-1.1.0 → biblicus-1.1.1}/features/hook_error_handling.feature +0 -0
  145. {biblicus-1.1.0 → biblicus-1.1.1}/features/hook_logging_internal_branches.feature +0 -0
  146. {biblicus-1.1.0 → biblicus-1.1.1}/features/import_tree.feature +0 -0
  147. {biblicus-1.1.0 → biblicus-1.1.1}/features/inference_backend.feature +0 -0
  148. {biblicus-1.1.0 → biblicus-1.1.1}/features/ingest_namespacing.feature +0 -0
  149. {biblicus-1.1.0 → biblicus-1.1.1}/features/ingest_sources.feature +0 -0
  150. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_audio_samples.feature +0 -0
  151. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_image_samples.feature +0 -0
  152. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_mixed_corpus.feature +0 -0
  153. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_mixed_extraction.feature +0 -0
  154. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_ocr_image_extraction.feature +0 -0
  155. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_pdf_retrieval.feature +0 -0
  156. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_pdf_samples.feature +0 -0
  157. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_annotate.feature +0 -0
  158. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_extract.feature +0 -0
  159. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_link.feature +0 -0
  160. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_redact.feature +0 -0
  161. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_slice.feature +0 -0
  162. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_unstructured_extraction.feature +0 -0
  163. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_use_cases.feature +0 -0
  164. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_use_cases_sequence_markov.feature +0 -0
  165. {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_wikipedia.feature +0 -0
  166. {biblicus-1.1.0 → biblicus-1.1.1}/features/knowledge_base.feature +0 -0
  167. {biblicus-1.1.0 → biblicus-1.1.1}/features/lifecycle_hooks.feature +0 -0
  168. {biblicus-1.1.0 → biblicus-1.1.1}/features/markitdown_extractor.feature +0 -0
  169. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis.feature +0 -0
  170. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_categorical.feature +0 -0
  171. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_llm.feature +0 -0
  172. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_topic_modeling.feature +0 -0
  173. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_variants.feature +0 -0
  174. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_embeddings_errors.feature +0 -0
  175. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_internal_branches.feature +0 -0
  176. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_schema.feature +0 -0
  177. {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_start_end_labels.feature +0 -0
  178. {biblicus-1.1.0 → biblicus-1.1.1}/features/model_validation.feature +0 -0
  179. {biblicus-1.1.0 → biblicus-1.1.1}/features/ocr_extractor.feature +0 -0
  180. {biblicus-1.1.0 → biblicus-1.1.1}/features/paddleocr_vl_extractor.feature +0 -0
  181. {biblicus-1.1.0 → biblicus-1.1.1}/features/paddleocr_vl_parse_api_response.feature +0 -0
  182. {biblicus-1.1.0 → biblicus-1.1.1}/features/pdf_text_extraction.feature +0 -0
  183. {biblicus-1.1.0 → biblicus-1.1.1}/features/profiling.feature +0 -0
  184. {biblicus-1.1.0 → biblicus-1.1.1}/features/profiling_config_overrides.feature +0 -0
  185. {biblicus-1.1.0 → biblicus-1.1.1}/features/python_api.feature +0 -0
  186. {biblicus-1.1.0 → biblicus-1.1.1}/features/python_hook_logging.feature +0 -0
  187. {biblicus-1.1.0 → biblicus-1.1.1}/features/query_processing.feature +0 -0
  188. {biblicus-1.1.0 → biblicus-1.1.1}/features/recipe_cascading.feature +0 -0
  189. {biblicus-1.1.0 → biblicus-1.1.1}/features/recipe_file_extraction.feature +0 -0
  190. {biblicus-1.1.0 → biblicus-1.1.1}/features/recipe_utilities.feature +0 -0
  191. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_budget.feature +0 -0
  192. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_build_recipes.feature +0 -0
  193. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_evaluation_lab.feature +0 -0
  194. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_quality.feature +0 -0
  195. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_scan.feature +0 -0
  196. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_sqlite_full_text_search.feature +0 -0
  197. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_uses_extraction_run.feature +0 -0
  198. {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_utilities.feature +0 -0
  199. {biblicus-1.1.0 → biblicus-1.1.1}/features/select_override.feature +0 -0
  200. {biblicus-1.1.0 → biblicus-1.1.1}/features/select_override_defaults.feature +0 -0
  201. {biblicus-1.1.0 → biblicus-1.1.1}/features/smart_override_selection.feature +0 -0
  202. {biblicus-1.1.0 → biblicus-1.1.1}/features/source_helper_internal_branches.feature +0 -0
  203. {biblicus-1.1.0 → biblicus-1.1.1}/features/source_loading.feature +0 -0
  204. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/ai_llm_steps.py +0 -0
  205. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/ai_models_steps.py +0 -0
  206. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/analysis_steps.py +0 -0
  207. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/backend_steps.py +0 -0
  208. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/cli_parsing_steps.py +0 -0
  209. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/cli_steps.py +0 -0
  210. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_compaction_steps.py +0 -0
  211. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_compactor_steps.py +0 -0
  212. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_default_pack_priority_steps.py +0 -0
  213. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_default_pack_weights_steps.py +0 -0
  214. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_default_regeneration_steps.py +0 -0
  215. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_error_steps.py +0 -0
  216. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_full_paths_steps.py +0 -0
  217. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_internal_steps.py +0 -0
  218. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_model_steps.py +0 -0
  219. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_registry.py +0 -0
  220. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_retrieval_internal_steps.py +0 -0
  221. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_retrieve_context_pack_steps.py +0 -0
  222. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_retriever.py +0 -0
  223. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_expansion_steps.py +0 -0
  224. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_explicit_pack_priority_steps.py +0 -0
  225. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_explicit_pack_weights_steps.py +0 -0
  226. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_explicit_regeneration_steps.py +0 -0
  227. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_history_compaction_steps.py +0 -0
  228. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_compaction_steps.py +0 -0
  229. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_context_packs_steps.py +0 -0
  230. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_pack_budget_cap_steps.py +0 -0
  231. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_regeneration_steps.py +0 -0
  232. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_pack_budget_steps.py +0 -0
  233. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_pack_steps.py +0 -0
  234. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_regeneration_steps.py +0 -0
  235. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_retriever_steps.py +0 -0
  236. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/corpus_internal_steps.py +0 -0
  237. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/crawl_steps.py +0 -0
  238. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/deepgram_steps.py +0 -0
  239. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/docling_steps.py +0 -0
  240. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embedding_index_evidence_steps.py +0 -0
  241. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embedding_index_internal_steps.py +0 -0
  242. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embedding_retrieval_coverage_steps.py +0 -0
  243. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embeddings_steps.py +0 -0
  244. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/evidence_processing_steps.py +0 -0
  245. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_evaluation_lab_steps.py +0 -0
  246. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_evaluation_steps.py +0 -0
  247. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_run_lifecycle_steps.py +0 -0
  248. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_steps.py +0 -0
  249. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extractor_steps.py +0 -0
  250. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/frontmatter_steps.py +0 -0
  251. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/hook_logging_steps.py +0 -0
  252. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/inference_steps.py +0 -0
  253. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/knowledge_base_steps.py +0 -0
  254. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markitdown_steps.py +0 -0
  255. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_embeddings_error_steps.py +0 -0
  256. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_internal_steps.py +0 -0
  257. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_schema_steps.py +0 -0
  258. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_start_end_steps.py +0 -0
  259. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_steps.py +0 -0
  260. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/model_steps.py +0 -0
  261. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/openai_steps.py +0 -0
  262. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/paddleocr_mock_steps.py +0 -0
  263. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/paddleocr_vl_steps.py +0 -0
  264. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/paddleocr_vl_unit_steps.py +0 -0
  265. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/pdf_steps.py +0 -0
  266. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/profiling_steps.py +0 -0
  267. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/python_api_steps.py +0 -0
  268. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/rapidocr_steps.py +0 -0
  269. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/recipe_steps.py +0 -0
  270. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/requests_mock_steps.py +0 -0
  271. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_build_recipe_steps.py +0 -0
  272. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_evaluation_lab_steps.py +0 -0
  273. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_quality_steps.py +0 -0
  274. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_steps.py +0 -0
  275. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/select_override_defaults_steps.py +0 -0
  276. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/source_helper_steps.py +0 -0
  277. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/stt_deepgram_steps.py +0 -0
  278. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/stt_steps.py +0 -0
  279. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_annotate_steps.py +0 -0
  280. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_extract_steps.py +0 -0
  281. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_internal_steps.py +0 -0
  282. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_link_internal_steps.py +0 -0
  283. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_link_steps.py +0 -0
  284. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_mock_steps.py +0 -0
  285. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_redact_steps.py +0 -0
  286. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_slice_steps.py +0 -0
  287. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_tool_loop_steps.py +0 -0
  288. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/tf_vector_internal_steps.py +0 -0
  289. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/topic_modeling_steps.py +0 -0
  290. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/unstructured_steps.py +0 -0
  291. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/use_cases_steps.py +0 -0
  292. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/user_config_steps.py +0 -0
  293. {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/wikitext_steps.py +0 -0
  294. {biblicus-1.1.0 → biblicus-1.1.1}/features/streaming_ingest.feature +0 -0
  295. {biblicus-1.1.0 → biblicus-1.1.1}/features/stt_deepgram_extractor.feature +0 -0
  296. {biblicus-1.1.0 → biblicus-1.1.1}/features/stt_extractor.feature +0 -0
  297. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_annotate.feature +0 -0
  298. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_extract.feature +0 -0
  299. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_extraction_runs.feature +0 -0
  300. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_internal_branches.feature +0 -0
  301. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_link.feature +0 -0
  302. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_link_internal_branches.feature +0 -0
  303. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_mock.feature +0 -0
  304. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_redact.feature +0 -0
  305. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_slice.feature +0 -0
  306. {biblicus-1.1.0 → biblicus-1.1.1}/features/text_utilities.feature +0 -0
  307. {biblicus-1.1.0 → biblicus-1.1.1}/features/token_budget.feature +0 -0
  308. {biblicus-1.1.0 → biblicus-1.1.1}/features/topic_modeling.feature +0 -0
  309. {biblicus-1.1.0 → biblicus-1.1.1}/features/unstructured_extractor.feature +0 -0
  310. {biblicus-1.1.0 → biblicus-1.1.1}/features/use_cases.feature +0 -0
  311. {biblicus-1.1.0 → biblicus-1.1.1}/features/user_config.feature +0 -0
  312. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/demo_context_engine.py +0 -0
  313. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_ag_news.py +0 -0
  314. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_audio_samples.py +0 -0
  315. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_image_samples.py +0 -0
  316. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_mixed_samples.py +0 -0
  317. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_pdf_samples.py +0 -0
  318. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_wikipedia.py +0 -0
  319. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/extraction_evaluation_demo.py +0 -0
  320. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/extraction_evaluation_lab.py +0 -0
  321. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/markov_analysis_demo.py +0 -0
  322. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/markov_cached_segments_demo.py +0 -0
  323. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/markov_run_report.py +0 -0
  324. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/profiling_demo.py +0 -0
  325. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/readme_end_to_end_demo.py +0 -0
  326. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/retrieval_evaluation_lab.py +0 -0
  327. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/test.py +0 -0
  328. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/topic_modeling_integration.py +0 -0
  329. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/notes_to_context_pack_demo.py +0 -0
  330. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/sequence_markov_demo.py +0 -0
  331. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/text_folder_search_demo.py +0 -0
  332. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/text_redact_demo.py +0 -0
  333. {biblicus-1.1.0 → biblicus-1.1.1}/scripts/wikipedia_rag_demo.py +0 -0
  334. {biblicus-1.1.0 → biblicus-1.1.1}/setup.cfg +0 -0
  335. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/__main__.py +0 -0
  336. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/__init__.py +0 -0
  337. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/interpolation.py +0 -0
  338. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/loader.py +0 -0
  339. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/transformer.py +0 -0
  340. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/__init__.py +0 -0
  341. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/embeddings.py +0 -0
  342. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/llm.py +0 -0
  343. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/models.py +0 -0
  344. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/__init__.py +0 -0
  345. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/base.py +0 -0
  346. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/markov.py +0 -0
  347. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/models.py +0 -0
  348. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/profiling.py +0 -0
  349. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/schema.py +0 -0
  350. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/topic_modeling.py +0 -0
  351. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/chunking.py +0 -0
  352. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/cli.py +0 -0
  353. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/configuration.py +0 -0
  354. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/constants.py +0 -0
  355. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context.py +0 -0
  356. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/__init__.py +0 -0
  357. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/assembler.py +0 -0
  358. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/compaction.py +0 -0
  359. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/models.py +0 -0
  360. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/retrieval.py +0 -0
  361. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/corpus.py +0 -0
  362. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/crawl.py +0 -0
  363. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/embedding_providers.py +0 -0
  364. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/errors.py +0 -0
  365. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/evaluation.py +0 -0
  366. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/evidence_processing.py +0 -0
  367. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extraction.py +0 -0
  368. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extraction_evaluation.py +0 -0
  369. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/__init__.py +0 -0
  370. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/base.py +0 -0
  371. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/deepgram_stt.py +0 -0
  372. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/docling_granite_text.py +0 -0
  373. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/docling_smol_text.py +0 -0
  374. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/markitdown_text.py +0 -0
  375. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/metadata_text.py +0 -0
  376. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/openai_stt.py +0 -0
  377. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/paddleocr_vl_text.py +0 -0
  378. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/pass_through_text.py +0 -0
  379. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/pdf_text.py +0 -0
  380. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/pipeline.py +0 -0
  381. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/rapidocr_text.py +0 -0
  382. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_longest_text.py +0 -0
  383. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_override.py +0 -0
  384. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_smart_override.py +0 -0
  385. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_text.py +0 -0
  386. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/unstructured_text.py +0 -0
  387. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/frontmatter.py +0 -0
  388. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/hook_logging.py +0 -0
  389. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/hook_manager.py +0 -0
  390. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/hooks.py +0 -0
  391. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ignore.py +0 -0
  392. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/inference.py +0 -0
  393. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/knowledge_base.py +0 -0
  394. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/models.py +0 -0
  395. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrieval.py +0 -0
  396. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/__init__.py +0 -0
  397. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/base.py +0 -0
  398. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/embedding_index_common.py +0 -0
  399. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/embedding_index_file.py +0 -0
  400. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/embedding_index_inmemory.py +0 -0
  401. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/hybrid.py +0 -0
  402. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/scan.py +0 -0
  403. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/sqlite_full_text_search.py +0 -0
  404. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/tf_vector.py +0 -0
  405. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/sources.py +0 -0
  406. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/__init__.py +0 -0
  407. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/annotate.py +0 -0
  408. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/extract.py +0 -0
  409. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/link.py +0 -0
  410. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/markup.py +0 -0
  411. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/models.py +0 -0
  412. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/prompts.py +0 -0
  413. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/redact.py +0 -0
  414. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/slice.py +0 -0
  415. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/tool_loop.py +0 -0
  416. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/time.py +0 -0
  417. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/uris.py +0 -0
  418. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/user_config.py +0 -0
  419. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/dependency_links.txt +0 -0
  420. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/entry_points.txt +0 -0
  421. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/requires.txt +0 -0
  422. {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/top_level.txt +0 -0
  423. {biblicus-1.1.0 → biblicus-1.1.1}/tests/test_text_extract_tool_calls.py +0 -0
  424. {biblicus-1.1.0 → biblicus-1.1.1}/tests/test_text_utility_tool_calls.py +0 -0
  425. {biblicus-1.1.0 → biblicus-1.1.1}/tests/test_tool_loop_safeguards.py +0 -0
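Note on the hunks that follow: most textual changes replace upper-case, underscore-separated documentation file names (for example `docs/TOPIC_MODELING.md`) with lower-case, hyphen-separated ones (`docs/topic-modeling.md`). The sketch below restates that apparent convention in Python. The function name `kebab_doc_path` is hypothetical and is not part of the biblicus package; the rule itself is inferred from the renames visible in this diff rather than from any stated project policy.

```python
from pathlib import PurePosixPath


def kebab_doc_path(old_path: str) -> str:
    """Map a docs path to the lower-case, hyphenated form seen in this diff.

    Hypothetical helper: only the final path component is changed, and
    directory names are left untouched.
    """
    path = PurePosixPath(old_path)
    # Lower-case the file stem and swap underscores for hyphens.
    new_stem = path.stem.lower().replace("_", "-")
    return str(path.with_name(new_stem + path.suffix.lower()))


if __name__ == "__main__":
    for old in ("docs/TOPIC_MODELING.md", "docs/STT.md", "docs/extractors/index.md"):
        print(old, "->", kebab_doc_path(old))
```

Paths that already match the convention, such as `docs/extractors/index.md`, pass through unchanged, which is consistent with those files showing only reference updates in the listing.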
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: biblicus
3
- Version: 1.1.0
3
+ Version: 1.1.1
4
4
  Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
5
5
  License: MIT
6
6
  Requires-Python: >=3.9
@@ -82,8 +82,8 @@ See [retrieval augmented generation overview] for a short introduction to the id
82
82
  - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
83
83
  - YAML configurations support cascading composition plus dotted `--config key=value` overrides.
84
84
  - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
85
- - See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
86
- - See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
85
+ - See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
86
+ - See `docs/text-extract.md` for the text extract utility and examples.
87
87
 
88
88
  ## Start with a knowledge base
89
89
 
@@ -552,9 +552,9 @@ For detailed documentation including configuration options, performance characte
552
552
 
553
553
  ## Retrieval documentation
554
554
 
555
- For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
556
- (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
557
- and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
555
+ For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
556
+ (tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
557
+ and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
558
558
  script (`scripts/retrieval_evaluation_lab.py`).
559
559
 
560
560
  ## Extraction backends
@@ -594,7 +594,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
594
594
  For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
595
595
 
596
596
  For extraction evaluation workflows, dataset formats, and report interpretation, see
597
- `docs/EXTRACTION_EVALUATION.md`.
597
+ `docs/extraction-evaluation.md`.
598
598
 
599
599
  ## Text extract utility
600
600
 
@@ -602,14 +602,14 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
602
602
  entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
603
603
  analysis.
604
604
 
605
- See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
605
+ See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
606
606
 
607
607
  ## Text slice utility
608
608
 
609
609
  Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
610
610
  re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
611
611
 
612
- See `docs/TEXT_SLICE.md` for the utility API and examples.
612
+ See `docs/text-slice.md` for the utility API and examples.
613
613
 
614
614
  ## Topic modeling analysis
615
615
 
@@ -618,8 +618,8 @@ are the first analysis backends. Profiling summarizes corpus composition and ext
618
618
  an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
619
619
  optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
620
620
 
621
- See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
622
- `docs/TOPIC_MODELING.md` for topic modeling details.
621
+ See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
622
+ `docs/topic-modeling.md` for topic modeling details.
623
623
 
624
624
  Run a topic analysis using a configuration file:
625
625
 
@@ -668,7 +668,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
668
668
  python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
669
669
  ```
670
670
 
671
- See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
671
+ See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
672
672
 
673
673
  ## Integration corpus and evaluation dataset
674
674
 
@@ -726,20 +726,20 @@ Open `http://localhost:8000` in your browser.
726
726
  License terms are in `LICENSE`.
727
727
 
728
728
  [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
729
- [architecture]: docs/ARCHITECTURE.md
730
- [roadmap]: docs/ROADMAP.md
731
- [feature-index]: docs/FEATURE_INDEX.md
732
- [corpus]: docs/CORPUS.md
733
- [knowledge-base]: docs/KNOWLEDGE_BASE.md
734
- [text-extraction]: docs/EXTRACTION.md
729
+ [architecture]: docs/architecture.md
730
+ [roadmap]: docs/roadmap.md
731
+ [feature-index]: docs/feature-index.md
732
+ [corpus]: docs/corpus.md
733
+ [knowledge-base]: docs/knowledge-base.md
734
+ [text-extraction]: docs/extraction.md
735
735
  [extractor-reference]: docs/extractors/index.md
736
736
  [backend-reference]: docs/backends/index.md
737
- [speech-to-text]: docs/STT.md
738
- [user-configuration]: docs/USER_CONFIGURATION.md
739
- [backends]: docs/BACKENDS.md
740
- [context-packs]: docs/CONTEXT_PACK.md
741
- [demos]: docs/DEMOS.md
742
- [testing]: docs/TESTING.md
737
+ [speech-to-text]: docs/stt.md
738
+ [user-configuration]: docs/user-configuration.md
739
+ [backends]: docs/backends.md
740
+ [context-packs]: docs/context-pack.md
741
+ [demos]: docs/demos.md
742
+ [testing]: docs/testing.md
743
743
 
744
744
  [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
745
745
  [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
@@ -28,8 +28,8 @@ See [retrieval augmented generation overview] for a short introduction to the id
28
28
  - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
29
29
  - YAML configurations support cascading composition plus dotted `--config key=value` overrides.
30
30
  - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
31
- - See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
32
- - See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
31
+ - See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
32
+ - See `docs/text-extract.md` for the text extract utility and examples.
33
33
 
34
34
  ## Start with a knowledge base
35
35
 
@@ -498,9 +498,9 @@ For detailed documentation including configuration options, performance characte
498
498
 
499
499
  ## Retrieval documentation
500
500
 
501
- For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
502
- (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
503
- and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
501
+ For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
502
+ (tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
503
+ and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
504
504
  script (`scripts/retrieval_evaluation_lab.py`).
505
505
 
506
506
  ## Extraction backends
@@ -540,7 +540,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
540
540
  For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
541
541
 
542
542
  For extraction evaluation workflows, dataset formats, and report interpretation, see
543
- `docs/EXTRACTION_EVALUATION.md`.
543
+ `docs/extraction-evaluation.md`.
544
544
 
545
545
  ## Text extract utility
546
546
 
@@ -548,14 +548,14 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
548
548
  entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
549
549
  analysis.
550
550
 
551
- See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
551
+ See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
552
552
 
553
553
  ## Text slice utility
554
554
 
555
555
  Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
556
556
  re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
557
557
 
558
- See `docs/TEXT_SLICE.md` for the utility API and examples.
558
+ See `docs/text-slice.md` for the utility API and examples.
559
559
 
560
560
  ## Topic modeling analysis
561
561
 
@@ -564,8 +564,8 @@ are the first analysis backends. Profiling summarizes corpus composition and ext
564
564
  an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
565
565
  optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
566
566
 
567
- See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
568
- `docs/TOPIC_MODELING.md` for topic modeling details.
567
+ See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
568
+ `docs/topic-modeling.md` for topic modeling details.
569
569
 
570
570
  Run a topic analysis using a configuration file:
571
571
 
@@ -614,7 +614,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
614
614
  python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
615
615
  ```
616
616
 
617
- See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
617
+ See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
618
618
 
619
619
  ## Integration corpus and evaluation dataset
620
620
 
@@ -672,20 +672,20 @@ Open `http://localhost:8000` in your browser.
672
672
  License terms are in `LICENSE`.
673
673
 
674
674
  [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
675
- [architecture]: docs/ARCHITECTURE.md
676
- [roadmap]: docs/ROADMAP.md
677
- [feature-index]: docs/FEATURE_INDEX.md
678
- [corpus]: docs/CORPUS.md
679
- [knowledge-base]: docs/KNOWLEDGE_BASE.md
680
- [text-extraction]: docs/EXTRACTION.md
675
+ [architecture]: docs/architecture.md
676
+ [roadmap]: docs/roadmap.md
677
+ [feature-index]: docs/feature-index.md
678
+ [corpus]: docs/corpus.md
679
+ [knowledge-base]: docs/knowledge-base.md
680
+ [text-extraction]: docs/extraction.md
681
681
  [extractor-reference]: docs/extractors/index.md
682
682
  [backend-reference]: docs/backends/index.md
683
- [speech-to-text]: docs/STT.md
684
- [user-configuration]: docs/USER_CONFIGURATION.md
685
- [backends]: docs/BACKENDS.md
686
- [context-packs]: docs/CONTEXT_PACK.md
687
- [demos]: docs/DEMOS.md
688
- [testing]: docs/TESTING.md
683
+ [speech-to-text]: docs/stt.md
684
+ [user-configuration]: docs/user-configuration.md
685
+ [backends]: docs/backends.md
686
+ [context-packs]: docs/context-pack.md
687
+ [demos]: docs/demos.md
688
+ [testing]: docs/testing.md
689
689
 
690
690
  [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
691
691
  [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
@@ -103,7 +103,7 @@ observations:
103
103
  ## Topic modeling
104
104
 
105
105
  Topic modeling is the first analysis backend. It uses BERTopic to cluster extracted text, produces per-topic evidence,
106
- and optionally labels topics using an LLM. See `docs/TOPIC_MODELING.md` for detailed configuration and examples.
106
+ and optionally labels topics using an LLM. See `docs/topic-modeling.md` for detailed configuration and examples.
107
107
 
108
108
  The integration demo script is a working reference you can use as a starting point:
109
109
 
@@ -117,7 +117,7 @@ labels, keywords, and document examples.
117
117
  ## Markov analysis
118
118
 
119
119
  Markov analysis learns a directed, weighted state transition graph over sequences of text segments. The output includes
120
- per-state exemplars, per-item decoded paths, and optional GraphViz exports. See `docs/MARKOV_ANALYSIS.md` for detailed
120
+ per-state exemplars, per-item decoded paths, and optional GraphViz exports. See `docs/markov-analysis.md` for detailed
121
121
  configuration and examples.
122
122
 
123
123
  Text extract is available as a segmentation strategy for long texts. It inserts XML tags in-place using a virtual file
@@ -126,7 +126,7 @@ editing loop, then extracts spans without requiring the model to re-emit the ful
126
126
  ## Profiling analysis
127
127
 
128
128
  Profiling is the baseline analysis backend. It summarizes corpus composition and extraction coverage using
129
- deterministic counts and distribution metrics. See `docs/PROFILING.md` for the full reference and working demo.
129
+ deterministic counts and distribution metrics. See `docs/profiling.md` for the full reference and working demo.
130
130
 
131
131
  ### Minimal profiling run
132
132
 
@@ -0,0 +1,107 @@
1
+ # Biblicus Architecture
2
+
3
+ Biblicus sits between raw, unstructured data and the moment you need reliable answers from it.
4
+ It is built for teams who receive large, messy corpora and must extract usable signals without
5
+ losing provenance or reproducibility. Retrieval-augmented generation is one use case, but the
6
+ system is broader than chatbots: it supports any pipeline that needs structured insight from
7
+ unstructured data.
8
+
9
+ At a high level the system does five things:
10
+
11
+ 1. **Ingests** raw content into a corpus with minimal friction.
12
+ 2. **Extracts** text from diverse media (documents, images, audio).
13
+ 3. **Transforms** and annotates text with reusable LLM utilities.
14
+ 4. **Retrieves** evidence through explicit, reproducible stages.
15
+ 5. **Evaluates** results so improvements are measurable, not anecdotal.
16
+
17
+ The guiding idea is that every retrieval produces **evidence**: structured outputs with scores
18
+ and provenance that can be inspected, audited, and reused. Context packs, summaries, and downstream
19
+ generation are all derived from that evidence.
20
+
21
+ ## Core Concepts
22
+
23
+ - **Corpus**: a named, mutable collection rooted at a path or uniform resource identifier. In
24
+ version zero it is typically a local folder containing raw files plus a `.biblicus/` directory
25
+ for minimal metadata.
26
+ - **Item**: the unit of ingestion in a corpus: raw bytes of any modality, including text, images,
27
+ Portable Document Format documents, audio, and video, plus optional metadata and provenance.
28
+ - **Knowledge base backend**: an implementation that can ingest and retrieve from a corpus, such
29
+ as scan, full text search, vector retrieval, or hybrid retrieval, exposed to procedures through
30
+ retrieval primitives.
31
+ - **Retrieval configuration**: a named configuration bundle for a backend, such as chunking rules,
32
+ embedding model and version, hybrid weights, reranker choice, and filters. This is what we
33
+ benchmark and compare.
34
+ - **Configuration manifest**: a reproducibility record describing the backend and configuration parameters,
35
+ plus any referenced snapshot artifacts and build snapshots.
36
+ - **Snapshot artifacts**: optional, persisted representations derived from raw content for a given
37
+ configuration and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
38
+ none and operate on demand.
39
+ - **Evidence**: structured retrieval output from backend queries. Evidence includes spans, scores,
40
+ and provenance used by downstream retrieval augmented generation procedures.
41
+ - **Pipeline stage / editorial layer**: a structured step that transforms, filters, extracts, or
42
+ curates content, such as raw, curated, and published, or extract text from Portable Document
43
+ Format documents.
44
+
45
+ ## Design Principles
46
+
47
+ - **Primitives + derived constructs**: keep the protocol surface small and composable; ship
48
+ higher-level helpers and example procedures on top.
49
+ - **Composability definition**: composable means each stage has a small input and output contract,
50
+ so you can connect stages in different orders without rewriting them.
51
+ - **Minimal opinion raw store**: raw ingestion should work for a folder of files with optional
52
+ lightweight tagging.
53
+ - **Reproducibility by default**: comparisons require manifests (even when there are no persisted
54
+ snapshot artifacts).
55
+ - **Mutability is real**: corpora are edited, pruned, and reorganized; re-indexing must be a core
56
+ workflow.
57
+ - **Separation of concerns**: retrieval returns evidence; retrieval-augmented generation patterns
58
+ live in Tactus procedures (not inside the knowledge base backend).
59
+ - **Deployment flexibility**: same interface across local/offline, brokered external services, and
60
+ hybrid environments.
61
+ - **Evidence is the primary output**: every retrieval returns structured evidence; everything else
62
+ is a derived helper.
63
+
64
+ ## The Python Developer Mental Model
65
+
66
+ If this system is pleasant to use, a Python developer should be able to describe intent with the
67
+ core nouns:
68
+
69
+ - I have a **corpus** at this path or uniform resource identifier.
70
+ - I ingest an **item** with optional **metadata**.
71
+ - I rebuild the derived **index** after edits.
72
+ - I run a **configuration** against the same corpus.
73
+ - I query and receive **evidence**.
74
+
75
+ Anything that does not map cleanly to these nouns is either a derived helper or a backend-specific
76
+ implementation detail that should not leak.
77
+
78
+ ## Evidence Lifecycle
79
+
80
+ Evidence flows through explicit stages and remains inspectable at every step:
81
+
82
+ 1. **Retrieval**: backends return evidence with `stage` labels and scores.
83
+ 2. **Processing**: optional reranking or filtering updates scores while preserving provenance.
84
+ 3. **Context shaping**: context packs select and format evidence into model-ready text.
85
+ 4. **Evaluation**: evaluation datasets compare evidence rankings to expectations.
86
+
87
+ At each stage, the output remains a structured object, so you can inspect, store, and compare
88
+ runs without re-running the entire pipeline.
89
+
90
+ ## Relationship to Agent Frameworks
91
+
92
+ Biblicus integrates with agent frameworks through explicit tool interfaces. It does not hide
93
+ retrieval inside the model. Instead, it provides repeatable pipelines that expose *what* was
94
+ retrieved and *why*, so models can use evidence directly and safely.
95
+
96
+ - **Tools and toolsets**, including the Model Context Protocol, are the primary capability
97
+ boundary.
98
+ - **Sandboxing and brokered or secretless execution** are primary deployment modes.
99
+ - **Durability and evaluations** are central: invariants via specifications, quality via
100
+ evaluations.
101
+
102
+ ## Where to go next
103
+
104
+ - Start with **corpus.md** and **extraction.md** to understand how raw content is ingested.
105
+ - Move to **retrieval.md** and **retrieval-evaluation.md** to see how evidence is produced and tested.
106
+ - Explore **topic-modeling.md** and **markov-analysis.md** if you need higher-level analysis tools.
107
+ - See **text-utilities.md** for reusable, AI-assisted text transformations.
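Note on the new `docs/architecture.md` above: its "Evidence Lifecycle" section describes four stages (retrieval, processing, context shaping, evaluation) that each keep evidence as a structured object. The following is a minimal, self-contained sketch of that idea, not the biblicus API. Every name here (`Evidence`, `retrieve`, `rerank`, `build_context_pack`, `hit_at_k`) and both scoring rules are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record: provenance (item_id), text, score, stage label."""
    item_id: str
    text: str
    score: float
    stage: str


def retrieve(query: str, documents: Dict[str, str]) -> List[Evidence]:
    """Stage 1: score each document by the number of distinct query terms it shares."""
    terms = set(query.lower().split())
    hits = []
    for item_id, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            hits.append(Evidence(item_id, text, float(overlap), "retrieval"))
    return sorted(hits, key=lambda e: e.score, reverse=True)


def rerank(evidence: List[Evidence]) -> List[Evidence]:
    """Stage 2: update scores (overlap per word) while keeping item_id and text."""
    rescored = [
        replace(e, score=e.score / len(e.text.split()), stage="rerank")
        for e in evidence
    ]
    return sorted(rescored, key=lambda e: e.score, reverse=True)


def build_context_pack(evidence: List[Evidence], max_items: int) -> str:
    """Stage 3: format the top evidence into model-ready text."""
    return "\n\n".join(e.text for e in evidence[:max_items])


def hit_at_k(evidence: List[Evidence], expected_item_id: str, k: int) -> bool:
    """Stage 4: check whether the expected item appears in the top k results."""
    return any(e.item_id == expected_item_id for e in evidence[:k])


if __name__ == "__main__":
    docs = {"a": "topic graphs", "b": "topic modeling clusters extracted text"}
    retrieved = retrieve("topic modeling", docs)
    reranked = rerank(retrieved)
    print([e.item_id for e in retrieved], [e.item_id for e in reranked])
    print(build_context_pack(reranked, max_items=1))
    print(hit_at_k(reranked, "b", k=2))
```

Because each stage returns plain `Evidence` records, the output of any stage can be stored and compared across runs without re-running earlier stages, which is the property the architecture document emphasizes.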
@@ -96,7 +96,7 @@ biblicus build my-corpus --backend sqlite-full-text-search
96
96
  biblicus query my-corpus --query "search terms"
97
97
  ```
98
98
 
99
- See `docs/RETRIEVAL.md` for a step-by-step retrieval walkthrough.
99
+ See `docs/retrieval.md` for a step-by-step retrieval walkthrough.
100
100
 
101
101
  #### Python API
102
102
 
@@ -126,7 +126,7 @@ result = backend.query(
126
126
  )
127
127
  ```
128
128
 
129
- See `docs/RETRIEVAL_EVALUATION.md` for evaluation workflows and dataset formats.
129
+ See `docs/retrieval-evaluation.md` for evaluation workflows and dataset formats.
130
130
 
131
131
  ## Choosing a Backend
132
132
 
@@ -291,12 +291,12 @@ To implement a custom backend:
291
291
  3. Register in `biblicus.backends.available_backends`
292
292
  4. Add BDD specifications with 100% coverage
293
293
 
294
- See [BACKENDS.md](../BACKENDS.md) for implementation details.
294
+ See [backends.md](../backends.md) for implementation details.
295
295
 
296
296
  ## See Also
297
297
 
298
298
  - [scan backend](scan.md) - Naive full-scan backend
299
299
  - [sqlite-full-text-search backend](sqlite-full-text-search.md) - SQLite FTS5 backend
300
- - [BACKENDS.md](../BACKENDS.md) - Backend implementation guide
301
- - [EXTRACTION.md](../EXTRACTION.md) - Text extraction pipeline
300
+ - [backends.md](../backends.md) - Backend implementation guide
301
+ - [extraction.md](../extraction.md) - Text extraction pipeline
302
302
  - [Extractor Reference](../extractors/index.md) - Text extraction plugins
@@ -322,6 +322,6 @@ Query result statistics:
322
322
  ## See Also
323
323
 
324
324
  - [Backends Overview](index.md) - All available backends
325
- - [BACKENDS.md](../BACKENDS.md) - Backend implementation guide
326
- - [EXTRACTION.md](../EXTRACTION.md) - Text extraction pipeline
325
+ - [backends.md](../backends.md) - Backend implementation guide
326
+ - [extraction.md](../extraction.md) - Text extraction pipeline
327
327
  - [Extractor Reference](../extractors/index.md) - Text extraction plugins
@@ -481,7 +481,7 @@ CREATE VIRTUAL TABLE chunks_full_text_search USING fts5(
481
481
  ## See Also
482
482
 
483
483
  - [Backends Overview](index.md) - All available backends
484
- - [BACKENDS.md](../BACKENDS.md) - Backend implementation guide
485
- - [EXTRACTION.md](../EXTRACTION.md) - Text extraction pipeline
484
+ - [backends.md](../backends.md) - Backend implementation guide
485
+ - [extraction.md](../extraction.md) - Text extraction pipeline
486
486
  - [Extractor Reference](../extractors/index.md) - Text extraction plugins
487
487
  - [SQLite FTS5 Documentation](https://www.sqlite.org/fts5.html) - Official SQLite FTS5 docs
@@ -41,7 +41,7 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
41
41
  - Treat **runs** as immutable manifests with reproducible parameters.
42
42
  - If your backend needs artifacts, store them under `.biblicus/runs/` and record paths in `artifact_paths`.
43
43
  - Keep **text extraction** in explicit pipeline stages, not in backend ingestion.
44
- See `docs/EXTRACTION.md` for how extraction snapshots are built and referenced from backend configs.
44
+ See `docs/extraction.md` for how extraction snapshots are built and referenced from backend configs.
45
45
 
46
46
  ## Reproducibility checklist
47
47
 
@@ -3,94 +3,7 @@
3
3
  This document is a set of runnable examples you can use to see the current system working end to end.
4
4
  Each section links to a textbook chapter so you can read the concept and then run the code.
5
5
 
6
- For the ordered plan of what to build next, see `docs/ROADMAP.md`.
7
-
8
- ## Diagram of the current system and the next layers
9
-
10
- Blue boxes are implemented now. Purple boxes are layers not implemented yet that we can build and compare.
11
-
12
- ```mermaid
13
- %%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
14
- flowchart TB
15
- subgraph Legend[Legend]
16
- direction LR
17
- LegendNow[Implemented now]
18
- LegendPlanned[Planned]
19
- LegendNow --- LegendPlanned
20
- end
21
-
22
- subgraph ExistsNow[Implemented now]
23
- direction TB
24
-
25
- Ingest[Ingest] --> RawFiles[Raw item files]
26
- RawFiles --> CatalogFile[Catalog file]
27
- CatalogFile --> ExtractionRun[Extraction run]
28
- ExtractionRun --> ExtractedText[Extracted text artifacts]
29
-
30
- subgraph PluggableBackend[Pluggable backend]
31
- direction LR
32
-
33
- subgraph BackendIngestionIndexing[Ingestion and indexing]
34
- direction TB
35
- CatalogFile --> BuildRun[Build run]
36
- ExtractedText -.-> BuildRun
37
- BuildRun --> BackendIndex[Backend index]
38
- BackendIndex --> RunManifest[Run manifest]
39
- end
40
-
41
- subgraph BackendRetrievalGeneration[Retrieval and generation]
42
- direction TB
43
- RunManifest --> Query[Query]
44
- Query --> Evidence[Evidence]
45
- Evidence --> EvaluationMetrics[Evaluation metrics]
46
- end
47
- end
48
- end
49
-
50
- subgraph PlannedLayers[Planned]
51
- direction TB
52
- RerankStage[Rerank<br/>pipeline stage]
53
- FilterStage[Filter<br/>pipeline stage]
54
- ToolServer[Tool server<br/>for external backends]
55
- OpticalCharacterRecognition[Optical character recognition<br/>extraction plugin]
56
- SpeechToText[Speech to text<br/>extraction plugin]
57
- end
58
-
59
- OpticalCharacterRecognition -.-> ExtractionRun
60
- SpeechToText -.-> ExtractionRun
61
- RerankStage -.-> Evidence
62
- FilterStage -.-> Evidence
63
- ToolServer -.-> PluggableBackend
64
-
65
- style Legend fill:#ffffff,stroke:#ffffff,color:#111111
66
- style ExistsNow fill:#ffffff,stroke:#ffffff,color:#111111
67
- style PlannedLayers fill:#ffffff,stroke:#ffffff,color:#111111
68
-
69
- style LegendNow fill:#e3f2fd,stroke:#1e88e5,color:#111111
70
- style LegendPlanned fill:#f3e5f5,stroke:#8e24aa,color:#111111
71
-
72
- style Ingest fill:#e3f2fd,stroke:#1e88e5,color:#111111
73
- style RawFiles fill:#e3f2fd,stroke:#1e88e5,color:#111111
74
- style CatalogFile fill:#e3f2fd,stroke:#1e88e5,color:#111111
75
- style ExtractionRun fill:#e3f2fd,stroke:#1e88e5,color:#111111
76
- style ExtractedText fill:#e3f2fd,stroke:#1e88e5,color:#111111
77
- style BuildRun fill:#e3f2fd,stroke:#1e88e5,color:#111111
78
- style BackendIndex fill:#e3f2fd,stroke:#1e88e5,color:#111111
79
- style RunManifest fill:#e3f2fd,stroke:#1e88e5,color:#111111
80
- style Query fill:#e3f2fd,stroke:#1e88e5,color:#111111
81
- style Evidence fill:#e3f2fd,stroke:#1e88e5,color:#111111
82
- style EvaluationMetrics fill:#e3f2fd,stroke:#1e88e5,color:#111111
83
-
84
- style PluggableBackend fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
85
- style BackendIngestionIndexing fill:#ffffff,stroke:#cfd8dc,color:#111111
86
- style BackendRetrievalGeneration fill:#ffffff,stroke:#cfd8dc,color:#111111
87
-
88
- style RerankStage fill:#f3e5f5,stroke:#8e24aa,color:#111111
89
- style FilterStage fill:#f3e5f5,stroke:#8e24aa,color:#111111
90
- style ToolServer fill:#f3e5f5,stroke:#8e24aa,color:#111111
91
- style OpticalCharacterRecognition fill:#f3e5f5,stroke:#8e24aa,color:#111111
92
- style SpeechToText fill:#f3e5f5,stroke:#8e24aa,color:#111111
93
- ```
6
+ For the ordered plan of what to build next, see `docs/roadmap.md`.
94
7
 
95
8
  ## Working examples you can run now
96
9
 
@@ -169,10 +82,10 @@ In another terminal:
169
82
  ```
170
83
  rm -rf corpora/crawl-demo
171
84
  python -m biblicus init corpora/crawl-demo
172
- python -m biblicus crawl --corpus corpora/crawl-demo \\
173
- --root-url http://127.0.0.1:8000/site/index.html \\
174
- --allowed-prefix http://127.0.0.1:8000/site/ \\
175
- --max-items 50 \\
85
+ python -m biblicus crawl --corpus corpora/crawl-demo \
86
+ --root-url http://127.0.0.1:8000/site/index.html \
87
+ --allowed-prefix http://127.0.0.1:8000/site/ \
88
+ --max-items 50 \
176
89
  --tag crawled
177
90
  python -m biblicus list --corpus corpora/crawl-demo
178
91
  ```
@@ -189,7 +102,7 @@ python -m biblicus extract build --corpus corpora/demo --step pass-through-text
189
102
 
190
103
  The output includes a `snapshot_id` you can reuse when building a retrieval backend.
191
104
 
192
- Text extraction details: `docs/EXTRACTION.md`
105
+ Text extraction details: `docs/extraction.md`
193
106
 
194
107
  ### Topic modeling integration run
195
108
 
@@ -204,7 +117,7 @@ python -m pip install "biblicus[datasets,topic-modeling]"
204
117
  python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
205
118
  ```
206
119
 
207
- Topic modeling details: `docs/TOPIC_MODELING.md`
120
+ Topic modeling details: `docs/topic-modeling.md`
208
121
 
209
122
  ### Extraction evaluation demo run
210
123
 
@@ -223,7 +136,7 @@ python scripts/extraction_evaluation_demo.py --corpus corpora/ag_news_extraction
223
136
 
224
137
  The script prints the dataset path, extraction snapshot reference, and evaluation output path so you can inspect the results.
225
138
 
226
- Extraction evaluation details: `docs/EXTRACTION_EVALUATION.md`
139
+ Extraction evaluation details: `docs/extraction-evaluation.md`
227
140
 
228
141
  ### Extraction evaluation lab run
229
142
 
@@ -235,7 +148,7 @@ python scripts/extraction_evaluation_lab.py --corpus corpora/extraction_eval_lab
235
148
 
236
149
  The lab writes a generated dataset file and evaluation output path and prints both in the command output.
237
150
 
238
- Extraction evaluation lab details: `docs/EXTRACTION_EVALUATION.md`
151
+ Extraction evaluation lab details: `docs/extraction-evaluation.md`
239
152
 
240
153
  ### Retrieval evaluation lab run
241
154
 
@@ -248,7 +161,7 @@ python scripts/retrieval_evaluation_lab.py --corpus corpora/retrieval_eval_lab -
248
161
 
249
162
  The script prints the dataset path, retrieval snapshot identifier, and evaluation output location.
250
163
 
251
- Retrieval evaluation details: `docs/RETRIEVAL_EVALUATION.md`
164
+ Retrieval evaluation details: `docs/retrieval-evaluation.md`
252
165
 
253
166
  Run with a larger corpus and a higher topic count:
254
167
 
@@ -274,27 +187,27 @@ The profiling demo downloads AG News, runs extraction, and produces a profiling
274
187
  python scripts/profiling_demo.py --corpus corpora/profiling_demo --force
275
188
  ```
276
189
 
277
- Profiling details: `docs/PROFILING.md`
190
+ Profiling details: `docs/profiling.md`
278
191
 
279
192
  ### Select extracted text within a pipeline
280
193
 
281
194
  When you want an explicit choice among multiple extraction outputs, add a selection extractor step at the end of the pipeline.
282
195
 
283
196
  ```
284
- python -m biblicus extract build --corpus corpora/demo \\
285
- --step pass-through-text \\
286
- --step metadata-text \\
197
+ python -m biblicus extract build --corpus corpora/demo \
198
+ --step pass-through-text \
199
+ --step metadata-text \
287
200
  --step select-text
288
201
  ```
289
202
 
290
203
  Copy the `snapshot_id` from the JavaScript Object Notation output. Use it as `EXTRACTION_SNAPSHOT_ID` in the next command.
291
204
 
292
205
  ```
293
- python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search \\
206
+ python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search \
294
207
  --config extraction_snapshot=pipeline:EXTRACTION_SNAPSHOT_ID
295
208
  ```
296
209
 
297
- Extraction pipeline details: `docs/EXTRACTION.md`
210
+ Extraction pipeline details: `docs/extraction.md`
298
211
 
299
212
  ### Portable Document Format extraction and retrieval
300
213
 
@@ -314,7 +227,7 @@ python -m biblicus build --corpus corpora/pdf_samples --backend sqlite-full-text
314
227
  python -m biblicus query --corpus corpora/pdf_samples --query "Dummy PDF file"
315
228
  ```
316
229
 
317
- Retrieval details: `docs/RETRIEVAL.md`
230
+ Retrieval details: `docs/retrieval.md`
318
231
 
319
232
  ### MarkItDown extraction demo (Python 3.10+)
320
233
 
@@ -386,9 +299,9 @@ python -m biblicus extract build --corpus corpora/mixed_samples --step unstructu
386
299
  When you want to prefer one extractor over another for the same item types, order the steps and end with `select-text`:
387
300
 
388
301
  ```
389
- python -m biblicus extract build --corpus corpora/pdf_samples \\
390
- --step unstructured \\
391
- --step pdf-text \\
302
+ python -m biblicus extract build --corpus corpora/pdf_samples \
303
+ --step unstructured \
304
+ --step pdf-text \
392
305
  --step select-text
393
306
  ```
394
307
 
@@ -429,7 +342,7 @@ python -m biblicus build --corpus corpora/demo --backend scan
429
342
  python -m biblicus query --corpus corpora/demo --query "Hello"
430
343
  ```
431
344
 
432
- Backend details: `docs/BACKENDS.md`
345
+ Backend details: `docs/backends.md`
433
346
 
434
347
  ### Build and query the practical backend
435
348
 
@@ -440,7 +353,7 @@ python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search
440
353
  python -m biblicus query --corpus corpora/demo --query "tiny"
441
354
  ```
442
355
 
443
- Backend details: `docs/BACKENDS.md`
356
+ Backend details: `docs/backends.md`
444
357
 
445
358
  ### Run the test suite and view coverage
446
359
 
@@ -455,14 +368,14 @@ To include integration scenarios that download public test data at runtime:
455
368
  python scripts/test.py --integration
456
369
  ```
457
370
 
458
- Testing details: `docs/TESTING.md`
371
+ Testing details: `docs/testing.md`
459
372
 
460
373
  ## Documentation map
461
374
 
462
- - Corpus: `docs/CORPUS.md`
463
- - Text extraction: `docs/EXTRACTION.md`
464
- - Backends: `docs/BACKENDS.md`
465
- - Testing: `docs/TESTING.md`
466
- - Roadmap: `docs/ROADMAP.md`
375
+ - Corpus: `docs/corpus.md`
376
+ - Text extraction: `docs/extraction.md`
377
+ - Backends: `docs/backends.md`
378
+ - Testing: `docs/testing.md`
379
+ - Roadmap: `docs/roadmap.md`
467
380
 
468
- For what to build next, see `docs/ROADMAP.md`.
381
+ For what to build next, see `docs/roadmap.md`.