biblicus 0.16.0.tar.gz → 1.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (440)
  1. {biblicus-0.16.0/src/biblicus.egg-info → biblicus-1.1.0}/PKG-INFO +32 -23
  2. {biblicus-0.16.0 → biblicus-1.1.0}/README.md +31 -22
  3. {biblicus-0.16.0 → biblicus-1.1.0}/docs/ANALYSIS.md +25 -25
  4. {biblicus-0.16.0 → biblicus-1.1.0}/docs/ARCHITECTURE_DETAIL.md +17 -17
  5. {biblicus-0.16.0 → biblicus-1.1.0}/docs/BACKENDS.md +7 -7
  6. {biblicus-0.16.0 → biblicus-1.1.0}/docs/CHUNKING.md +1 -1
  7. biblicus-1.1.0/docs/CONTEXT_ENGINE.md +145 -0
  8. biblicus-1.1.0/docs/CONTEXT_ENGINE_DEMO.md +96 -0
  9. {biblicus-0.16.0 → biblicus-1.1.0}/docs/CONTEXT_PACK.md +1 -1
  10. {biblicus-0.16.0 → biblicus-1.1.0}/docs/CORPUS.md +2 -2
  11. {biblicus-0.16.0 → biblicus-1.1.0}/docs/CORPUS_DESIGN.md +13 -13
  12. {biblicus-0.16.0 → biblicus-1.1.0}/docs/DEMOS.md +15 -15
  13. biblicus-1.1.0/docs/EMBEDDING_RETRIEVAL.md +68 -0
  14. {biblicus-0.16.0 → biblicus-1.1.0}/docs/EXTRACTION.md +19 -19
  15. {biblicus-0.16.0 → biblicus-1.1.0}/docs/EXTRACTION_EVALUATION.md +13 -13
  16. {biblicus-0.16.0 → biblicus-1.1.0}/docs/FEATURE_INDEX.md +55 -6
  17. {biblicus-0.16.0 → biblicus-1.1.0}/docs/KNOWLEDGE_BASE.md +6 -6
  18. {biblicus-0.16.0 → biblicus-1.1.0}/docs/MARKOV_ANALYSIS.md +27 -22
  19. {biblicus-0.16.0 → biblicus-1.1.0}/docs/PROFILING.md +17 -17
  20. {biblicus-0.16.0 → biblicus-1.1.0}/docs/RETRIEVAL.md +9 -9
  21. {biblicus-0.16.0 → biblicus-1.1.0}/docs/RETRIEVAL_EVALUATION.md +18 -18
  22. {biblicus-0.16.0 → biblicus-1.1.0}/docs/RETRIEVAL_QUALITY.md +6 -6
  23. {biblicus-0.16.0 → biblicus-1.1.0}/docs/ROADMAP.md +1 -1
  24. {biblicus-0.16.0 → biblicus-1.1.0}/docs/TEXT_ANNOTATE.md +39 -9
  25. {biblicus-0.16.0 → biblicus-1.1.0}/docs/TEXT_EXTRACT.md +105 -55
  26. {biblicus-0.16.0 → biblicus-1.1.0}/docs/TEXT_LINK.md +18 -8
  27. {biblicus-0.16.0 → biblicus-1.1.0}/docs/TEXT_REDACT.md +28 -13
  28. {biblicus-0.16.0 → biblicus-1.1.0}/docs/TEXT_SLICE.md +44 -24
  29. biblicus-1.1.0/docs/TEXT_UTILITIES.md +414 -0
  30. {biblicus-0.16.0 → biblicus-1.1.0}/docs/TOPIC_MODELING.md +13 -13
  31. {biblicus-0.16.0 → biblicus-1.1.0}/docs/backends/embedding-index-file.md +2 -2
  32. {biblicus-0.16.0 → biblicus-1.1.0}/docs/backends/embedding-index-inmemory.md +2 -2
  33. {biblicus-0.16.0 → biblicus-1.1.0}/docs/backends/index.md +15 -15
  34. {biblicus-0.16.0 → biblicus-1.1.0}/docs/backends/scan.md +19 -19
  35. {biblicus-0.16.0 → biblicus-1.1.0}/docs/backends/sqlite-full-text-search.md +20 -20
  36. {biblicus-0.16.0 → biblicus-1.1.0}/docs/backends/tf-vector.md +5 -5
  37. {biblicus-0.16.0 → biblicus-1.1.0}/docs/conf.py +3 -1
  38. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/ocr/paddleocr-vl.md +2 -2
  39. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/ocr/rapidocr.md +2 -2
  40. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/pipeline.md +7 -7
  41. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-longest.md +2 -2
  42. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-override.md +2 -2
  43. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-smart-override.md +2 -2
  44. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-text.md +2 -2
  45. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/speech-to-text/deepgram.md +2 -2
  46. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/speech-to-text/openai.md +2 -2
  47. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/text-document/markitdown.md +2 -2
  48. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/text-document/metadata.md +2 -2
  49. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/text-document/pass-through.md +3 -3
  50. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/text-document/pdf.md +2 -2
  51. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/text-document/unstructured.md +2 -2
  52. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/vlm-document/docling-granite.md +3 -3
  53. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/vlm-document/docling-smol.md +3 -3
  54. {biblicus-0.16.0 → biblicus-1.1.0}/docs/index.rst +5 -6
  55. {biblicus-0.16.0 → biblicus-1.1.0}/docs/use_cases/sequence_markov.md +6 -6
  56. {biblicus-0.16.0 → biblicus-1.1.0}/docs/use_cases/text_folder_search.md +1 -1
  57. biblicus-1.1.0/features/70_context_retriever.feature +12 -0
  58. biblicus-1.1.0/features/71_context_compaction.feature +22 -0
  59. biblicus-1.1.0/features/72_context_history_compaction.feature +9 -0
  60. biblicus-1.1.0/features/73_context_nested_compaction.feature +9 -0
  61. biblicus-1.1.0/features/74_context_regeneration.feature +9 -0
  62. biblicus-1.1.0/features/75_context_default_regeneration.feature +9 -0
  63. biblicus-1.1.0/features/76_context_pack_budget_weights.feature +9 -0
  64. biblicus-1.1.0/features/77_context_default_pack_priority.feature +10 -0
  65. biblicus-1.1.0/features/78_context_default_pack_weights.feature +9 -0
  66. biblicus-1.1.0/features/79_context_nested_context_packs.feature +9 -0
  67. biblicus-1.1.0/features/80_context_nested_pack_budget_cap.feature +9 -0
  68. biblicus-1.1.0/features/81_context_nested_regeneration.feature +9 -0
  69. biblicus-1.1.0/features/82_context_explicit_regeneration.feature +9 -0
  70. biblicus-1.1.0/features/83_context_explicit_pack_priority.feature +9 -0
  71. biblicus-1.1.0/features/84_context_explicit_pack_weights.feature +9 -0
  72. biblicus-1.1.0/features/85_context_expansion.feature +10 -0
  73. biblicus-1.1.0/features/86_context_engine_errors.feature +24 -0
  74. biblicus-1.1.0/features/87_context_compactor_strategies.feature +22 -0
  75. biblicus-1.1.0/features/88_context_engine_model_validation.feature +64 -0
  76. biblicus-1.1.0/features/89_context_engine_internal_branches.feature +68 -0
  77. biblicus-1.1.0/features/90_embedding_index_evidence_fallback.feature +10 -0
  78. biblicus-1.1.0/features/91_tf_vector_internal_branches.feature +10 -0
  79. biblicus-1.1.0/features/93_context_engine_full_paths.feature +6 -0
  80. {biblicus-0.16.0 → biblicus-1.1.0}/features/analysis_schema.feature +6 -6
  81. biblicus-1.1.0/features/backend_validation.feature +14 -0
  82. {biblicus-0.16.0 → biblicus-1.1.0}/features/biblicus_corpus.feature +1 -1
  83. {biblicus-0.16.0 → biblicus-1.1.0}/features/cli_entrypoint.feature +1 -1
  84. {biblicus-0.16.0 → biblicus-1.1.0}/features/cli_step_spec_parsing.feature +5 -5
  85. biblicus-1.1.0/features/context_engine_retrieval_internal_branches.feature +6 -0
  86. biblicus-1.1.0/features/context_engine_retrieve_context_pack.feature +38 -0
  87. {biblicus-0.16.0 → biblicus-1.1.0}/features/context_pack_cli.feature +5 -5
  88. {biblicus-0.16.0 → biblicus-1.1.0}/features/context_pack_policies.feature +40 -0
  89. {biblicus-0.16.0 → biblicus-1.1.0}/features/corpus_edge_cases.feature +3 -3
  90. biblicus-1.1.0/features/corpus_internal_branches.feature +53 -0
  91. {biblicus-0.16.0 → biblicus-1.1.0}/features/corpus_purge.feature +4 -4
  92. {biblicus-0.16.0 → biblicus-1.1.0}/features/docling_granite_extractor.feature +36 -36
  93. {biblicus-0.16.0 → biblicus-1.1.0}/features/docling_smol_extractor.feature +36 -36
  94. biblicus-1.1.0/features/embedding_index_internal_branches.feature +22 -0
  95. {biblicus-0.16.0 → biblicus-1.1.0}/features/embedding_retrieval.feature +59 -59
  96. {biblicus-0.16.0 → biblicus-1.1.0}/features/environment.py +3 -0
  97. {biblicus-0.16.0 → biblicus-1.1.0}/features/error_cases.feature +37 -37
  98. {biblicus-0.16.0 → biblicus-1.1.0}/features/evaluation.feature +18 -18
  99. {biblicus-0.16.0 → biblicus-1.1.0}/features/extraction_error_handling.feature +10 -10
  100. {biblicus-0.16.0 → biblicus-1.1.0}/features/extraction_evaluation.feature +28 -28
  101. {biblicus-0.16.0 → biblicus-1.1.0}/features/extraction_evaluation_lab.feature +1 -1
  102. biblicus-1.1.0/features/extraction_run_lifecycle.feature +117 -0
  103. {biblicus-0.16.0 → biblicus-1.1.0}/features/extraction_selection.feature +8 -8
  104. {biblicus-0.16.0 → biblicus-1.1.0}/features/extraction_selection_longest.feature +7 -7
  105. {biblicus-0.16.0 → biblicus-1.1.0}/features/extractor_pipeline.feature +15 -15
  106. biblicus-1.1.0/features/hook_logging_internal_branches.feature +6 -0
  107. {biblicus-0.16.0 → biblicus-1.1.0}/features/import_tree.feature +3 -3
  108. {biblicus-0.16.0 → biblicus-1.1.0}/features/inference_backend.feature +12 -12
  109. biblicus-1.1.0/features/ingest_namespacing.feature +43 -0
  110. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_audio_samples.feature +2 -2
  111. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_mixed_extraction.feature +6 -6
  112. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_ocr_image_extraction.feature +8 -4
  113. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_pdf_retrieval.feature +4 -4
  114. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_text_annotate.feature +2 -2
  115. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_text_extract.feature +2 -2
  116. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_unstructured_extraction.feature +3 -2
  117. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_use_cases.feature +1 -1
  118. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_use_cases_sequence_markov.feature +2 -3
  119. {biblicus-0.16.0 → biblicus-1.1.0}/features/markitdown_extractor.feature +24 -24
  120. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_analysis.feature +4 -4
  121. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_analysis_categorical.feature +3 -3
  122. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_analysis_llm.feature +4 -4
  123. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_analysis_topic_modeling.feature +4 -4
  124. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_analysis_variants.feature +70 -70
  125. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_internal_branches.feature +8 -8
  126. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_schema.feature +39 -39
  127. {biblicus-0.16.0 → biblicus-1.1.0}/features/ocr_extractor.feature +9 -9
  128. {biblicus-0.16.0 → biblicus-1.1.0}/features/paddleocr_vl_extractor.feature +32 -32
  129. {biblicus-0.16.0 → biblicus-1.1.0}/features/pdf_text_extraction.feature +13 -13
  130. {biblicus-0.16.0 → biblicus-1.1.0}/features/profiling.feature +35 -35
  131. {biblicus-0.16.0 → biblicus-1.1.0}/features/profiling_config_overrides.feature +4 -4
  132. {biblicus-0.16.0 → biblicus-1.1.0}/features/query_processing.feature +2 -2
  133. {biblicus-0.16.0 → biblicus-1.1.0}/features/recipe_cascading.feature +12 -12
  134. biblicus-1.1.0/features/recipe_file_extraction.feature +35 -0
  135. {biblicus-0.16.0 → biblicus-1.1.0}/features/recipe_utilities.feature +2 -2
  136. {biblicus-0.16.0 → biblicus-1.1.0}/features/retrieval_budget.feature +4 -0
  137. biblicus-1.1.0/features/retrieval_build_recipes.feature +19 -0
  138. {biblicus-0.16.0 → biblicus-1.1.0}/features/retrieval_evaluation_lab.feature +1 -1
  139. {biblicus-0.16.0 → biblicus-1.1.0}/features/retrieval_quality.feature +44 -44
  140. {biblicus-0.16.0 → biblicus-1.1.0}/features/retrieval_scan.feature +20 -32
  141. {biblicus-0.16.0 → biblicus-1.1.0}/features/retrieval_sqlite_full_text_search.feature +13 -13
  142. {biblicus-0.16.0 → biblicus-1.1.0}/features/retrieval_uses_extraction_run.feature +33 -33
  143. {biblicus-0.16.0 → biblicus-1.1.0}/features/retrieval_utilities.feature +5 -5
  144. {biblicus-0.16.0 → biblicus-1.1.0}/features/select_override.feature +10 -10
  145. biblicus-1.1.0/features/select_override_defaults.feature +14 -0
  146. {biblicus-0.16.0 → biblicus-1.1.0}/features/smart_override_selection.feature +27 -27
  147. biblicus-1.1.0/features/source_helper_internal_branches.feature +22 -0
  148. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/analysis_steps.py +28 -25
  149. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/backend_steps.py +48 -41
  150. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/cli_steps.py +84 -18
  151. biblicus-1.1.0/features/steps/context_compaction_steps.py +139 -0
  152. biblicus-1.1.0/features/steps/context_compactor_steps.py +28 -0
  153. biblicus-1.1.0/features/steps/context_default_pack_priority_steps.py +98 -0
  154. biblicus-1.1.0/features/steps/context_default_pack_weights_steps.py +91 -0
  155. biblicus-1.1.0/features/steps/context_default_regeneration_steps.py +69 -0
  156. biblicus-1.1.0/features/steps/context_engine_error_steps.py +111 -0
  157. biblicus-1.1.0/features/steps/context_engine_full_paths_steps.py +696 -0
  158. biblicus-1.1.0/features/steps/context_engine_internal_steps.py +521 -0
  159. biblicus-1.1.0/features/steps/context_engine_model_steps.py +144 -0
  160. biblicus-1.1.0/features/steps/context_engine_registry.py +123 -0
  161. biblicus-1.1.0/features/steps/context_engine_retrieval_internal_steps.py +114 -0
  162. biblicus-1.1.0/features/steps/context_engine_retrieve_context_pack_steps.py +131 -0
  163. biblicus-1.1.0/features/steps/context_engine_retriever.py +104 -0
  164. biblicus-1.1.0/features/steps/context_expansion_steps.py +79 -0
  165. biblicus-1.1.0/features/steps/context_explicit_pack_priority_steps.py +94 -0
  166. biblicus-1.1.0/features/steps/context_explicit_pack_weights_steps.py +83 -0
  167. biblicus-1.1.0/features/steps/context_explicit_regeneration_steps.py +84 -0
  168. biblicus-1.1.0/features/steps/context_history_compaction_steps.py +46 -0
  169. biblicus-1.1.0/features/steps/context_nested_compaction_steps.py +50 -0
  170. biblicus-1.1.0/features/steps/context_nested_context_packs_steps.py +74 -0
  171. biblicus-1.1.0/features/steps/context_nested_pack_budget_cap_steps.py +84 -0
  172. biblicus-1.1.0/features/steps/context_nested_regeneration_steps.py +91 -0
  173. biblicus-1.1.0/features/steps/context_pack_budget_steps.py +81 -0
  174. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/context_pack_steps.py +69 -15
  175. biblicus-1.1.0/features/steps/context_regeneration_steps.py +73 -0
  176. biblicus-1.1.0/features/steps/context_retriever_steps.py +68 -0
  177. biblicus-1.1.0/features/steps/corpus_internal_steps.py +190 -0
  178. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/docling_steps.py +13 -6
  179. biblicus-1.1.0/features/steps/embedding_index_evidence_steps.py +151 -0
  180. biblicus-1.1.0/features/steps/embedding_index_internal_steps.py +34 -0
  181. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/embedding_retrieval_coverage_steps.py +44 -34
  182. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/extraction_evaluation_lab_steps.py +1 -1
  183. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/extraction_evaluation_steps.py +7 -7
  184. biblicus-1.1.0/features/steps/extraction_run_lifecycle_steps.py +156 -0
  185. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/extraction_steps.py +241 -193
  186. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/extractor_steps.py +2 -2
  187. biblicus-1.1.0/features/steps/hook_logging_steps.py +13 -0
  188. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/markitdown_steps.py +7 -0
  189. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/markov_embeddings_error_steps.py +3 -3
  190. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/markov_internal_steps.py +49 -49
  191. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/markov_schema_steps.py +143 -111
  192. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/markov_steps.py +69 -64
  193. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/model_steps.py +2 -2
  194. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/openai_steps.py +21 -1
  195. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/paddleocr_vl_steps.py +12 -0
  196. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/profiling_steps.py +82 -37
  197. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/rapidocr_steps.py +7 -0
  198. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/recipe_steps.py +5 -1
  199. biblicus-1.1.0/features/steps/retrieval_build_recipe_steps.py +66 -0
  200. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/retrieval_evaluation_lab_steps.py +3 -1
  201. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/retrieval_quality_steps.py +29 -24
  202. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/retrieval_steps.py +129 -81
  203. biblicus-1.1.0/features/steps/select_override_defaults_steps.py +21 -0
  204. biblicus-1.1.0/features/steps/source_helper_steps.py +35 -0
  205. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_annotate_steps.py +4 -2
  206. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_extract_steps.py +24 -12
  207. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_link_internal_steps.py +32 -0
  208. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_link_steps.py +4 -2
  209. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_redact_steps.py +4 -2
  210. biblicus-1.1.0/features/steps/text_tool_loop_steps.py +138 -0
  211. biblicus-1.1.0/features/steps/tf_vector_internal_steps.py +14 -0
  212. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/topic_modeling_steps.py +46 -34
  213. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/unstructured_steps.py +7 -0
  214. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/use_cases_steps.py +26 -5
  215. {biblicus-0.16.0 → biblicus-1.1.0}/features/stt_deepgram_extractor.feature +13 -13
  216. {biblicus-0.16.0 → biblicus-1.1.0}/features/stt_extractor.feature +14 -14
  217. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_extraction_runs.feature +29 -29
  218. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_link_internal_branches.feature +8 -0
  219. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_utilities.feature +26 -0
  220. {biblicus-0.16.0 → biblicus-1.1.0}/features/topic_modeling.feature +117 -117
  221. {biblicus-0.16.0 → biblicus-1.1.0}/features/unstructured_extractor.feature +15 -15
  222. {biblicus-0.16.0 → biblicus-1.1.0}/features/use_cases.feature +3 -3
  223. {biblicus-0.16.0 → biblicus-1.1.0}/features/user_config.feature +2 -2
  224. {biblicus-0.16.0 → biblicus-1.1.0}/pyproject.toml +1 -1
  225. biblicus-1.1.0/scripts/demo_context_engine.py +328 -0
  226. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/extraction_evaluation_demo.py +12 -12
  227. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/extraction_evaluation_lab.py +12 -12
  228. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/markov_analysis_demo.py +77 -71
  229. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/markov_cached_segments_demo.py +88 -76
  230. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/markov_run_report.py +8 -8
  231. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/profiling_demo.py +22 -22
  232. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/readme_end_to_end_demo.py +12 -8
  233. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/retrieval_evaluation_lab.py +21 -21
  234. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/topic_modeling_integration.py +28 -28
  235. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/use_cases/notes_to_context_pack_demo.py +13 -7
  236. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/use_cases/sequence_markov_demo.py +37 -31
  237. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/use_cases/text_folder_search_demo.py +15 -15
  238. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/wikipedia_rag_demo.py +14 -14
  239. biblicus-1.1.0/src/biblicus/__init__.py +50 -0
  240. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/analysis/__init__.py +1 -1
  241. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/analysis/base.py +10 -10
  242. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/analysis/markov.py +78 -68
  243. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/analysis/models.py +47 -47
  244. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/analysis/profiling.py +58 -48
  245. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/analysis/topic_modeling.py +56 -51
  246. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/cli.py +248 -191
  247. biblicus-0.16.0/src/biblicus/recipes.py → biblicus-1.1.0/src/biblicus/configuration.py +14 -14
  248. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/constants.py +2 -2
  249. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/context.py +27 -12
  250. biblicus-1.1.0/src/biblicus/context_engine/__init__.py +53 -0
  251. biblicus-1.1.0/src/biblicus/context_engine/assembler.py +1090 -0
  252. biblicus-1.1.0/src/biblicus/context_engine/compaction.py +110 -0
  253. biblicus-1.1.0/src/biblicus/context_engine/models.py +423 -0
  254. biblicus-1.1.0/src/biblicus/context_engine/retrieval.py +133 -0
  255. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/corpus.py +233 -124
  256. biblicus-1.1.0/src/biblicus/errors.py +39 -0
  257. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/evaluation.py +27 -25
  258. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extraction.py +103 -98
  259. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extraction_evaluation.py +26 -26
  260. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/deepgram_stt.py +7 -7
  261. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/docling_granite_text.py +11 -11
  262. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/docling_smol_text.py +11 -11
  263. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/markitdown_text.py +4 -4
  264. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/openai_stt.py +7 -7
  265. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/paddleocr_vl_text.py +20 -18
  266. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/pipeline.py +8 -8
  267. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/rapidocr_text.py +3 -3
  268. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/unstructured_text.py +3 -3
  269. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/hooks.py +4 -4
  270. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/knowledge_base.py +34 -32
  271. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/models.py +84 -81
  272. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/retrieval.py +49 -42
  273. biblicus-1.1.0/src/biblicus/retrievers/__init__.py +50 -0
  274. biblicus-1.1.0/src/biblicus/retrievers/base.py +65 -0
  275. {biblicus-0.16.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/embedding_index_common.py +80 -44
  276. {biblicus-0.16.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/embedding_index_file.py +96 -61
  277. {biblicus-0.16.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/embedding_index_inmemory.py +100 -69
  278. biblicus-1.1.0/src/biblicus/retrievers/hybrid.py +301 -0
  279. {biblicus-0.16.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/scan.py +84 -73
  280. {biblicus-0.16.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/sqlite_full_text_search.py +115 -101
  281. {biblicus-0.16.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/tf_vector.py +103 -100
  282. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/sources.py +46 -11
  283. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/link.py +6 -0
  284. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/prompts.py +18 -8
  285. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/tool_loop.py +63 -5
  286. {biblicus-0.16.0 → biblicus-1.1.0/src/biblicus.egg-info}/PKG-INFO +32 -23
  287. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus.egg-info/SOURCES.txt +85 -13
  288. biblicus-1.1.0/tests/test_text_extract_tool_calls.py +110 -0
  289. biblicus-1.1.0/tests/test_text_utility_tool_calls.py +314 -0
  290. biblicus-1.1.0/tests/test_tool_loop_safeguards.py +171 -0
  291. biblicus-0.16.0/docs/EMBEDDING_RETRIEVAL.md +0 -57
  292. biblicus-0.16.0/docs/PR_FAQ_EMBEDDING_RETRIEVAL.md +0 -105
  293. biblicus-0.16.0/docs/PR_FAQ_TEXT_ANNOTATE.md +0 -118
  294. biblicus-0.16.0/docs/TEXT_UTILITIES.md +0 -137
  295. biblicus-0.16.0/features/backend_validation.feature +0 -14
  296. biblicus-0.16.0/features/extraction_run_lifecycle.feature +0 -117
  297. biblicus-0.16.0/features/recipe_file_extraction.feature +0 -35
  298. biblicus-0.16.0/features/retrieval_build_recipes.feature +0 -19
  299. biblicus-0.16.0/features/steps/extraction_run_lifecycle_steps.py +0 -152
  300. biblicus-0.16.0/features/steps/retrieval_build_recipe_steps.py +0 -64
  301. biblicus-0.16.0/features/steps/text_tool_loop_steps.py +0 -36
  302. biblicus-0.16.0/src/biblicus/__init__.py +0 -30
  303. biblicus-0.16.0/src/biblicus/backends/__init__.py +0 -50
  304. biblicus-0.16.0/src/biblicus/backends/base.py +0 -65
  305. biblicus-0.16.0/src/biblicus/backends/hybrid.py +0 -291
  306. biblicus-0.16.0/src/biblicus/errors.py +0 -15
  307. {biblicus-0.16.0 → biblicus-1.1.0}/LICENSE +0 -0
  308. {biblicus-0.16.0 → biblicus-1.1.0}/MANIFEST.in +0 -0
  309. {biblicus-0.16.0 → biblicus-1.1.0}/THIRD_PARTY_NOTICES.md +0 -0
  310. {biblicus-0.16.0 → biblicus-1.1.0}/datasets/extraction_lab/labels.json +0 -0
  311. {biblicus-0.16.0 → biblicus-1.1.0}/datasets/retrieval_lab/labels.json +0 -0
  312. {biblicus-0.16.0 → biblicus-1.1.0}/datasets/wikipedia_mini.json +0 -0
  313. {biblicus-0.16.0 → biblicus-1.1.0}/docs/ARCHITECTURE.md +0 -0
  314. {biblicus-0.16.0 → biblicus-1.1.0}/docs/STT.md +0 -0
  315. {biblicus-0.16.0 → biblicus-1.1.0}/docs/TESTING.md +0 -0
  316. {biblicus-0.16.0 → biblicus-1.1.0}/docs/USER_CONFIGURATION.md +0 -0
  317. {biblicus-0.16.0 → biblicus-1.1.0}/docs/USE_CASES.md +0 -0
  318. {biblicus-0.16.0 → biblicus-1.1.0}/docs/UTILITIES.md +0 -0
  319. {biblicus-0.16.0 → biblicus-1.1.0}/docs/api.rst +0 -0
  320. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/index.md +0 -0
  321. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/ocr/index.md +0 -0
  322. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/index.md +0 -0
  323. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/speech-to-text/index.md +0 -0
  324. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/text-document/index.md +0 -0
  325. {biblicus-0.16.0 → biblicus-1.1.0}/docs/extractors/vlm-document/index.md +0 -0
  326. {biblicus-0.16.0 → biblicus-1.1.0}/docs/use_cases/notes_to_context_pack.md +0 -0
  327. {biblicus-0.16.0 → biblicus-1.1.0}/docs/use_cases/text_redact.md +0 -0
  328. {biblicus-0.16.0 → biblicus-1.1.0}/features/ai_llm.feature +0 -0
  329. {biblicus-0.16.0 → biblicus-1.1.0}/features/ai_models.feature +0 -0
  330. {biblicus-0.16.0 → biblicus-1.1.0}/features/cli_parsing.feature +0 -0
  331. {biblicus-0.16.0 → biblicus-1.1.0}/features/content_sniffing.feature +0 -0
  332. {biblicus-0.16.0 → biblicus-1.1.0}/features/context_pack.feature +0 -0
  333. {biblicus-0.16.0 → biblicus-1.1.0}/features/corpus_identity.feature +0 -0
  334. {biblicus-0.16.0 → biblicus-1.1.0}/features/crawl.feature +0 -0
  335. {biblicus-0.16.0 → biblicus-1.1.0}/features/embeddings.feature +0 -0
  336. {biblicus-0.16.0 → biblicus-1.1.0}/features/evidence_processing.feature +0 -0
  337. {biblicus-0.16.0 → biblicus-1.1.0}/features/extractor_validation.feature +0 -0
  338. {biblicus-0.16.0 → biblicus-1.1.0}/features/frontmatter.feature +0 -0
  339. {biblicus-0.16.0 → biblicus-1.1.0}/features/hook_config_validation.feature +0 -0
  340. {biblicus-0.16.0 → biblicus-1.1.0}/features/hook_error_handling.feature +0 -0
  341. {biblicus-0.16.0 → biblicus-1.1.0}/features/ingest_sources.feature +0 -0
  342. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_image_samples.feature +0 -0
  343. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_mixed_corpus.feature +0 -0
  344. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_pdf_samples.feature +0 -0
  345. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_text_link.feature +0 -0
  346. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_text_redact.feature +0 -0
  347. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_text_slice.feature +0 -0
  348. {biblicus-0.16.0 → biblicus-1.1.0}/features/integration_wikipedia.feature +0 -0
  349. {biblicus-0.16.0 → biblicus-1.1.0}/features/knowledge_base.feature +0 -0
  350. {biblicus-0.16.0 → biblicus-1.1.0}/features/lifecycle_hooks.feature +0 -0
  351. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_embeddings_errors.feature +0 -0
  352. {biblicus-0.16.0 → biblicus-1.1.0}/features/markov_start_end_labels.feature +0 -0
  353. {biblicus-0.16.0 → biblicus-1.1.0}/features/model_validation.feature +0 -0
  354. {biblicus-0.16.0 → biblicus-1.1.0}/features/paddleocr_vl_parse_api_response.feature +0 -0
  355. {biblicus-0.16.0 → biblicus-1.1.0}/features/python_api.feature +0 -0
  356. {biblicus-0.16.0 → biblicus-1.1.0}/features/python_hook_logging.feature +0 -0
  357. {biblicus-0.16.0 → biblicus-1.1.0}/features/source_loading.feature +0 -0
  358. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/ai_llm_steps.py +0 -0
  359. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/ai_models_steps.py +0 -0
  360. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/cli_parsing_steps.py +0 -0
  361. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/crawl_steps.py +0 -0
  362. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/deepgram_steps.py +0 -0
  363. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/embeddings_steps.py +0 -0
  364. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/evidence_processing_steps.py +0 -0
  365. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/frontmatter_steps.py +0 -0
  366. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/inference_steps.py +0 -0
  367. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/knowledge_base_steps.py +0 -0
  368. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/markov_start_end_steps.py +0 -0
  369. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/paddleocr_mock_steps.py +0 -0
  370. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/paddleocr_vl_unit_steps.py +0 -0
  371. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/pdf_steps.py +0 -0
  372. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/python_api_steps.py +0 -0
  373. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/requests_mock_steps.py +0 -0
  374. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/stt_deepgram_steps.py +0 -0
  375. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/stt_steps.py +0 -0
  376. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_internal_steps.py +0 -0
  377. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_mock_steps.py +0 -0
  378. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/text_slice_steps.py +0 -0
  379. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/user_config_steps.py +0 -0
  380. {biblicus-0.16.0 → biblicus-1.1.0}/features/steps/wikitext_steps.py +0 -0
  381. {biblicus-0.16.0 → biblicus-1.1.0}/features/streaming_ingest.feature +0 -0
  382. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_annotate.feature +0 -0
  383. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_extract.feature +0 -0
  384. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_internal_branches.feature +0 -0
  385. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_link.feature +0 -0
  386. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_mock.feature +0 -0
  387. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_redact.feature +0 -0
  388. {biblicus-0.16.0 → biblicus-1.1.0}/features/text_slice.feature +0 -0
  389. {biblicus-0.16.0 → biblicus-1.1.0}/features/token_budget.feature +0 -0
  390. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/download_ag_news.py +0 -0
  391. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/download_audio_samples.py +0 -0
  392. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/download_image_samples.py +0 -0
  393. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/download_mixed_samples.py +0 -0
  394. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/download_pdf_samples.py +0 -0
  395. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/download_wikipedia.py +0 -0
  396. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/test.py +0 -0
  397. {biblicus-0.16.0 → biblicus-1.1.0}/scripts/use_cases/text_redact_demo.py +0 -0
  398. {biblicus-0.16.0 → biblicus-1.1.0}/setup.cfg +0 -0
  399. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/__main__.py +0 -0
  400. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/__init__.py +0 -0
  401. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/interpolation.py +0 -0
  402. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/loader.py +0 -0
  403. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/transformer.py +0 -0
  404. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/ai/__init__.py +0 -0
  405. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/ai/embeddings.py +0 -0
  406. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/ai/llm.py +0 -0
  407. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/ai/models.py +0 -0
  408. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/analysis/schema.py +0 -0
  409. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/chunking.py +0 -0
  410. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/crawl.py +0 -0
  411. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/embedding_providers.py +0 -0
  412. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/evidence_processing.py +0 -0
  413. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/__init__.py +0 -0
  414. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/base.py +0 -0
  415. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/metadata_text.py +0 -0
  416. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/pass_through_text.py +0 -0
  417. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/pdf_text.py +0 -0
  418. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/select_longest_text.py +0 -0
  419. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/select_override.py +0 -0
  420. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/select_smart_override.py +0 -0
  421. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/extractors/select_text.py +0 -0
  422. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/frontmatter.py +0 -0
  423. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/hook_logging.py +0 -0
  424. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/hook_manager.py +0 -0
  425. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/ignore.py +0 -0
  426. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/inference.py +0 -0
  427. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/__init__.py +0 -0
  428. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/annotate.py +0 -0
  429. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/extract.py +0 -0
  430. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/markup.py +0 -0
  431. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/models.py +0 -0
  432. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/redact.py +0 -0
  433. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/text/slice.py +0 -0
  434. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/time.py +0 -0
  435. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/uris.py +0 -0
  436. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus/user_config.py +0 -0
  437. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus.egg-info/dependency_links.txt +0 -0
  438. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus.egg-info/entry_points.txt +0 -0
  439. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus.egg-info/requires.txt +0 -0
  440. {biblicus-0.16.0 → biblicus-1.1.0}/src/biblicus.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: biblicus
- Version: 0.16.0
+ Version: 1.1.0
  Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
  License: MIT
  Requires-Python: >=3.9
@@ -80,7 +80,7 @@ See [retrieval augmented generation overview] for a short introduction to the id
  ## Analysis highlights
 
  - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
- - YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+ - YAML configurations support cascading composition plus dotted `--config key=value` overrides.
  - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
  - See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
  - See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
@@ -167,7 +167,7 @@ sequenceDiagram
 
  - You can ingest raw material once, then try many retrieval approaches over time.
  - You can keep raw files readable and portable, without locking your data inside a database.
- - You can evaluate retrieval runs against shared datasets and compare backends using the same corpus.
+ - You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
 
  ## Typical flow
 
@@ -176,7 +176,7 @@ sequenceDiagram
  - Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
  - Run extraction when you want derived text artifacts from non-text sources.
  - Reindex to refresh the catalog after edits.
- - Build a retrieval run with a backend.
+ - Build a retrieval snapshot with a backend.
  - Query the run to collect evidence and evaluate it with datasets.
 
  ## Install
@@ -292,8 +292,8 @@ for note_title, note_text in notes:
  corpus.ingest_note(note_text, title=note_title, tags=["memory"])
 
  backend = get_backend("scan")
- run = backend.build_run(corpus, recipe_name="Story demo", config={})
- budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=None)
+ run = backend.build_run(corpus, configuration_name="Story demo", config={})
+ budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
  result = backend.query(
  corpus,
  run=run,
@@ -333,11 +333,11 @@ Example output:
  "query_text": "Primary button style preference",
  "budget": {
  "max_total_items": 5,
- "max_total_characters": 2000,
+ "maximum_total_characters": 2000,
  "max_items_per_source": null
  },
- "run_id": "RUN_ID",
- "recipe_id": "RECIPE_ID",
+ "snapshot_id": "RUN_ID",
+ "configuration_id": "RECIPE_ID",
  "backend_id": "scan",
  "generated_at": "2026-01-29T00:00:00.000000Z",
  "evidence": [
@@ -352,8 +352,8 @@ Example output:
  "span_start": null,
  "span_end": null,
  "stage": "scan",
- "recipe_id": "RECIPE_ID",
- "run_id": "RUN_ID",
+ "configuration_id": "RECIPE_ID",
+ "snapshot_id": "RUN_ID",
  "hash": null
  }
  ],
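The hunks above rename several fields in the query output between 0.16.0 and 1.1.0: `max_total_characters` becomes `maximum_total_characters`, `run_id` becomes `snapshot_id`, and `recipe_id` becomes `configuration_id`. If you have saved 0.16.0 output and want to read it with the new names, a recursive key rename is one option. This is a sketch under assumptions, not a Biblicus API: `RENAMED_FIELDS` and `migrate_keys` are hypothetical names, and the mapping covers only the renames visible in this diff.

```python
from typing import Any, Dict

# Hypothetical mapping built from the renames visible in this diff
# (0.16.0 name -> 1.1.0 name); it may not be complete.
RENAMED_FIELDS: Dict[str, str] = {
    "max_total_characters": "maximum_total_characters",
    "run_id": "snapshot_id",
    "recipe_id": "configuration_id",
}


def migrate_keys(value: Any) -> Any:
    """Recursively rename 0.16.0-style keys in a parsed JSON value."""
    if isinstance(value, dict):
        return {RENAMED_FIELDS.get(key, key): migrate_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [migrate_keys(item) for item in value]
    return value  # scalars are returned unchanged


old_output = {
    "query_text": "Primary button style preference",
    "budget": {"max_total_items": 5, "max_total_characters": 2000, "max_items_per_source": None},
    "run_id": "RUN_ID",
    "recipe_id": "RECIPE_ID",
    "backend_id": "scan",
    "evidence": [{"stage": "scan", "recipe_id": "RECIPE_ID", "run_id": "RUN_ID", "hash": None}],
}
new_output = migrate_keys(old_output)
print(new_output["budget"]["maximum_total_characters"])  # 2000
print(new_output["evidence"][0]["snapshot_id"])  # RUN_ID
```

Only keys are renamed; placeholder values such as `"RUN_ID"` pass through untouched.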
@@ -422,7 +422,7 @@ flowchart TB
 
  subgraph RowExtraction[Pluggable: extraction pipeline]
  direction TB
- Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction run manifest]
+ Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
  end
 
  subgraph RowRetrieval[Pluggable: retrieval backend]
@@ -484,7 +484,7 @@ From Python, the same flow is available through the Corpus class and backend int
  - Ingest notes with `Corpus.ingest_note`.
  - Ingest files or web addresses with `Corpus.ingest_source`.
  - List items with `Corpus.list_items`.
- - Build a retrieval run with `get_backend` and `backend.build_run`.
+ - Build a retrieval snapshot with `get_backend` and `backend.build_run`.
  - Query a run with `backend.query`.
  - Evaluate with `evaluate_run`.
 
@@ -530,13 +530,13 @@ corpus/
  runs/
  extraction/
  pipeline/
- <run id>/
+ <snapshot id>/
  manifest.json
  text/
  <item id>.txt
  retrieval/
  <backend id>/
- <run id>/
+ <snapshot id>/
  manifest.json
  ```
 
@@ -552,7 +552,7 @@ For detailed documentation including configuration options, performance characte
 
  ## Retrieval documentation
 
- For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
+ For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
  (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
  and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
  script (`scripts/retrieval_evaluation_lab.py`).
@@ -615,26 +615,26 @@ See `docs/TEXT_SLICE.md` for the utility API and examples.
 
  Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
  are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
- an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+ an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
  optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
 
  See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
  `docs/TOPIC_MODELING.md` for topic modeling details.
 
- Run a topic analysis using a recipe file:
+ Run a topic analysis using a configuration file:
 
  ```
- biblicus analyze topics --corpus corpora/example --recipe recipes/topic-modeling.yml --extraction-run pipeline:<run_id>
+ biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
  ```
 
- If `--extraction-run` is omitted, Biblicus uses the most recent extraction run and emits a warning about
+ If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
  reproducibility. The analysis output is stored under:
 
  ```
- .biblicus/runs/analysis/topic-modeling/<run_id>/output.json
+ .biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
  ```
 
- Minimal recipe example:
+ Minimal configuration example:
 
  ```yaml
  schema_version: 1
@@ -659,7 +659,7 @@ llm_fine_tuning:
  ```
 
  LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
- Recipe files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
+ Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
  AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
 
  For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
@@ -712,6 +712,15 @@ Build the documentation:
  python -m sphinx -b html docs docs/_build/html
  ```
 
+ Preview the documentation locally:
+
+ ```
+ cd docs/_build/html
+ python -m http.server
+ ```
+
+ Open `http://localhost:8000` in your browser.
+
  ## License
 
  License terms are in `LICENSE`.
@@ -26,7 +26,7 @@ See [retrieval augmented generation overview] for a short introduction to the id
  ## Analysis highlights
 
  - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
- - YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+ - YAML configurations support cascading composition plus dotted `--config key=value` overrides.
  - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
  - See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
  - See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
@@ -113,7 +113,7 @@ sequenceDiagram
 
  - You can ingest raw material once, then try many retrieval approaches over time.
  - You can keep raw files readable and portable, without locking your data inside a database.
- - You can evaluate retrieval runs against shared datasets and compare backends using the same corpus.
+ - You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
 
  ## Typical flow
 
@@ -122,7 +122,7 @@ sequenceDiagram
  - Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
  - Run extraction when you want derived text artifacts from non-text sources.
  - Reindex to refresh the catalog after edits.
- - Build a retrieval run with a backend.
+ - Build a retrieval snapshot with a backend.
  - Query the run to collect evidence and evaluate it with datasets.
 
  ## Install
@@ -238,8 +238,8 @@ for note_title, note_text in notes:
  corpus.ingest_note(note_text, title=note_title, tags=["memory"])
 
  backend = get_backend("scan")
- run = backend.build_run(corpus, recipe_name="Story demo", config={})
- budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=None)
+ run = backend.build_run(corpus, configuration_name="Story demo", config={})
+ budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
  result = backend.query(
  corpus,
  run=run,
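The 1.1.0 call above constructs `QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)`, but the diff does not show how a backend enforces these limits. The sketch below shows one plausible interpretation, greedy selection in rank order. It is an assumption, not the library's `QueryBudget`: the `Budget` dataclass, `apply_budget`, and the `item_id` and `text` evidence keys are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Budget:
    """Hypothetical stand-in that mirrors the QueryBudget keyword names above."""

    max_total_items: int
    maximum_total_characters: Optional[int] = None
    max_items_per_source: Optional[int] = None


def apply_budget(ranked: List[Dict[str, str]], budget: Budget) -> List[Dict[str, str]]:
    """Keep evidence in rank order, skipping items that would break a limit."""
    kept: List[Dict[str, str]] = []
    total_characters = 0
    per_source: Dict[str, int] = {}
    for evidence in ranked:
        if len(kept) >= budget.max_total_items:
            break  # item limit reached; nothing else can be added
        source = evidence["item_id"]  # hypothetical source key
        if (
            budget.max_items_per_source is not None
            and per_source.get(source, 0) >= budget.max_items_per_source
        ):
            continue  # this source already used its share
        length = len(evidence["text"])
        if (
            budget.maximum_total_characters is not None
            and total_characters + length > budget.maximum_total_characters
        ):
            continue  # too long for the remaining character budget
        kept.append(evidence)
        total_characters += length
        per_source[source] = per_source.get(source, 0) + 1
    return kept


ranked = [
    {"item_id": "a", "text": "x" * 1500},
    {"item_id": "a", "text": "y" * 300},
    {"item_id": "b", "text": "z" * 400},
    {"item_id": "c", "text": "w" * 100},
]
budget = Budget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
print([evidence["item_id"] for evidence in apply_budget(ranked, budget)])  # ['a', 'a', 'c']
```

Skipping an item that would overflow the character limit, rather than stopping, lets shorter lower-ranked evidence use the remaining budget; a stricter backend might stop at the first overflow instead.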
@@ -279,11 +279,11 @@ Example output:
  "query_text": "Primary button style preference",
  "budget": {
  "max_total_items": 5,
- "max_total_characters": 2000,
+ "maximum_total_characters": 2000,
  "max_items_per_source": null
  },
- "run_id": "RUN_ID",
- "recipe_id": "RECIPE_ID",
+ "snapshot_id": "RUN_ID",
+ "configuration_id": "RECIPE_ID",
  "backend_id": "scan",
  "generated_at": "2026-01-29T00:00:00.000000Z",
  "evidence": [
@@ -298,8 +298,8 @@ Example output:
  "span_start": null,
  "span_end": null,
  "stage": "scan",
- "recipe_id": "RECIPE_ID",
- "run_id": "RUN_ID",
+ "configuration_id": "RECIPE_ID",
+ "snapshot_id": "RUN_ID",
  "hash": null
  }
  ],
@@ -368,7 +368,7 @@ flowchart TB
 
  subgraph RowExtraction[Pluggable: extraction pipeline]
  direction TB
- Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction run manifest]
+ Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
  end
 
  subgraph RowRetrieval[Pluggable: retrieval backend]
@@ -430,7 +430,7 @@ From Python, the same flow is available through the Corpus class and backend int
  - Ingest notes with `Corpus.ingest_note`.
  - Ingest files or web addresses with `Corpus.ingest_source`.
  - List items with `Corpus.list_items`.
- - Build a retrieval run with `get_backend` and `backend.build_run`.
+ - Build a retrieval snapshot with `get_backend` and `backend.build_run`.
  - Query a run with `backend.query`.
  - Evaluate with `evaluate_run`.
 
@@ -476,13 +476,13 @@ corpus/
  runs/
  extraction/
  pipeline/
- <run id>/
+ <snapshot id>/
  manifest.json
  text/
  <item id>.txt
  retrieval/
  <backend id>/
- <run id>/
+ <snapshot id>/
  manifest.json
  ```
 
@@ -498,7 +498,7 @@ For detailed documentation including configuration options, performance characte
 
  ## Retrieval documentation
 
- For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
+ For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
  (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
  and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
  script (`scripts/retrieval_evaluation_lab.py`).
@@ -561,26 +561,26 @@ See `docs/TEXT_SLICE.md` for the utility API and examples.
 
  Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
  are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
- an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+ an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
  optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
 
  See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
  `docs/TOPIC_MODELING.md` for topic modeling details.
 
- Run a topic analysis using a recipe file:
+ Run a topic analysis using a configuration file:
 
  ```
- biblicus analyze topics --corpus corpora/example --recipe recipes/topic-modeling.yml --extraction-run pipeline:<run_id>
+ biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
  ```
 
- If `--extraction-run` is omitted, Biblicus uses the most recent extraction run and emits a warning about
+ If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
  reproducibility. The analysis output is stored under:
 
  ```
- .biblicus/runs/analysis/topic-modeling/<run_id>/output.json
+ .biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
  ```
 
- Minimal recipe example:
+ Minimal configuration example:
 
  ```yaml
  schema_version: 1
@@ -605,7 +605,7 @@ llm_fine_tuning:
  ```
 
  LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
- Recipe files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
+ Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
  AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
 
  For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
@@ -658,6 +658,15 @@ Build the documentation:
  python -m sphinx -b html docs docs/_build/html
  ```
 
+ Preview the documentation locally:
+
+ ```
+ cd docs/_build/html
+ python -m http.server
+ ```
+
+ Open `http://localhost:8000` in your browser.
+
  ## License
 
  License terms are in `LICENSE`.
@@ -1,31 +1,31 @@
  # Corpus analysis
 
  Biblicus supports analysis backends that run on extracted text artifacts without changing the raw corpus. Analysis is a
- pluggable phase that reads an extraction run, produces structured output, and stores artifacts under the corpus runs
+ pluggable phase that reads an extraction snapshot, produces structured output, and stores artifacts under the corpus runs
  folder. Each analysis backend declares its own configuration schema and output contract, and all schemas are validated
  strictly.
 
- ## How analysis runs work
+ ## How analysis snapshots work
 
- - Analysis runs are tied to a corpus state via the extraction run reference.
- - The analysis output is written under `.biblicus/runs/analysis/<analysis-id>/<run_id>/`.
- - Analysis is reproducible when you supply the same extraction run and corpus catalog state.
- - Analysis configuration is stored as a recipe manifest in the run metadata.
+ - Analysis runs are tied to a corpus state via the extraction snapshot reference.
+ - The analysis output is written under `.biblicus/runs/analysis/<analysis-id>/<snapshot_id>/`.
+ - Analysis is reproducible when you supply the same extraction snapshot and corpus catalog state.
+ - Analysis configuration is stored as a configuration manifest in the run metadata.
 
- If you omit the extraction run, Biblicus uses the most recent extraction run and emits a reproducibility warning. For
- repeatable analysis runs, always pass the extraction run reference explicitly.
+ If you omit the extraction snapshot, Biblicus uses the most recent extraction snapshot and emits a reproducibility warning. For
+ repeatable analysis snapshots, always pass the extraction snapshot reference explicitly.
 
- ## Analysis run artifacts
+ ## Analysis snapshot artifacts
 
- Every analysis run records a manifest alongside the output:
+ Every analysis snapshot records a manifest alongside the output:
 
  ```
- .biblicus/runs/analysis/<analysis-id>/<run_id>/
+ .biblicus/runs/analysis/<analysis-id>/<snapshot_id>/
  manifest.json
  output.json
  ```
 
- The manifest captures the recipe, extraction run reference, and catalog timestamp so results can be reproduced and
+ The manifest captures the configuration, extraction snapshot reference, and catalog timestamp so results can be reproduced and
  compared later.
 
  ## Inspecting output
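The layout above places each analysis snapshot at `.biblicus/runs/analysis/<analysis-id>/<snapshot_id>/`, and the `docs/BACKENDS.md` hunk in this diff uses `.biblicus/runs/retrieval/<backend_id>/<snapshot_id>/` for retrieval snapshots. A minimal sketch of path helpers for scripts that inspect these directories; the function names are hypothetical, and the paths assume the 1.1.0 layout shown in this diff.

```python
from pathlib import Path


def analysis_snapshot_dir(corpus_root: Path, analysis_id: str, snapshot_id: str) -> Path:
    """Directory that holds manifest.json and output.json for one analysis snapshot."""
    return corpus_root / ".biblicus" / "runs" / "analysis" / analysis_id / snapshot_id


def retrieval_snapshot_dir(corpus_root: Path, backend_id: str, snapshot_id: str) -> Path:
    """Directory that holds manifest.json and backend artifacts for one retrieval snapshot."""
    return corpus_root / ".biblicus" / "runs" / "retrieval" / backend_id / snapshot_id


corpus_root = Path("corpora/example")
output_path = analysis_snapshot_dir(corpus_root, "profiling", "RUN_ID") / "output.json"
print(output_path.as_posix())
# corpora/example/.biblicus/runs/analysis/profiling/RUN_ID/output.json
```

`as_posix()` keeps the printed path identical across operating systems.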
@@ -38,21 +38,21 @@ cat corpora/example/.biblicus/runs/analysis/profiling/RUN_ID/output.json
 
  Each analysis backend defines its own `report` payload. The run metadata is consistent across backends.
 
- ## Comparing analysis runs
+ ## Comparing analysis snapshots
 
  When you compare analysis results, record:
 
  - Corpus path and catalog timestamp.
  - Extraction run reference.
- - Analysis recipe name and configuration.
- - Analysis run identifier and output path.
+ - Analysis configuration name and configuration.
+ - Analysis snapshot identifier and output path.
 
  These make it possible to rerun the analysis and explain differences.
 
  ## Pluggable analysis backends
 
  Analysis backends implement the `CorpusAnalysisBackend` interface and are registered under `biblicus.analysis`.
- A backend receives the corpus, a recipe name, a configuration mapping, and an extraction run reference. It returns a
+ A backend receives the corpus, a configuration name, a configuration mapping, and an extraction snapshot reference. It returns a
  Pydantic model that is serialized to JavaScript Object Notation for storage.
 
  ## Choosing an analysis backend
@@ -61,22 +61,22 @@ Start with profiling when you need fast, deterministic baselines. Use topic mode
  and exploratory labels. Use Markov analysis when you want state-transition structure over sequences of segments.
  Combine multiple backends for a clear view of corpus composition, themes, and state dynamics.
 
- ## Recipe files
+ ## Configuration files
 
- Analysis recipes are optional JavaScript Object Notation or YAML files that capture configuration in a repeatable way.
+ Analysis configurations are optional JavaScript Object Notation or YAML files that capture configuration in a repeatable way.
  They are useful for sharing experiments and keeping runs reproducible.
 
- Recipes support cascading composition. When a command accepts `--recipe`, you can pass multiple recipe files. Biblicus
- merges them in order, where later recipes override earlier recipes via a deep merge. You can then apply `--config`
+ Recipes support cascading composition. When a command accepts `--configuration`, you can pass multiple configuration files. Biblicus
+ merges them in order, where later configurations override earlier configurations via a deep merge. You can then apply `--config`
  overrides on top of the composed view.
 
- Minimal profiling recipe:
+ Minimal profiling configuration:
 
  ```
  schema_version: 1
  ```
 
- Minimal topic modeling recipe:
+ Minimal topic modeling configuration:
 
  ```
  schema_version: 1
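The composition rule in the hunk above (configuration files deep-merged in order, later files winning, then dotted `--config key=value` overrides applied on top) can be sketched as follows. This is an illustration under assumptions, not the Biblicus loader: `deep_merge`, `apply_dotted_override`, and `compose` are hypothetical; lists are replaced rather than merged; override values stay strings (the real CLI may parse them as YAML scalars); and the `example_section` keys are placeholders, not real schema fields.

```python
import copy
from typing import Any, Dict, List


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new mapping where `override` wins and nested mappings merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace wholesale
    return merged


def apply_dotted_override(config: Dict[str, Any], assignment: str) -> Dict[str, Any]:
    """Apply one `a.b.c=value` override; the value is kept as a string in this sketch."""
    dotted_key, separator, raw_value = assignment.partition("=")
    if not separator or not dotted_key:
        raise ValueError(f"expected key=value, got {assignment!r}")
    result = copy.deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = result
    for part in parents:
        child = node.setdefault(part, {})  # create missing intermediate mappings
        if not isinstance(child, dict):
            raise ValueError(f"cannot descend into non-mapping key {part!r}")
        node = child
    node[leaf] = raw_value
    return result


def compose(layers: List[Dict[str, Any]], overrides: List[str]) -> Dict[str, Any]:
    """Deep-merge layers in order, then apply dotted overrides on top."""
    composed: Dict[str, Any] = {}
    for layer in layers:
        composed = deep_merge(composed, layer)
    for assignment in overrides:
        composed = apply_dotted_override(composed, assignment)
    return composed


base = {"schema_version": 1, "example_section": {"alpha": 1, "beta": 2}}
experiment = {"example_section": {"beta": 3}}
print(compose([base, experiment], ["example_section.alpha=10"]))
# {'schema_version': 1, 'example_section': {'alpha': '10', 'beta': 3}}
```

Every step returns a new mapping, so the input layers stay unchanged and can be reused across experiments.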
@@ -87,7 +87,7 @@ bertopic_analysis:
  nr_topics: 8
  ```
 
- Minimal Markov analysis recipe:
+ Minimal Markov analysis configuration:
 
  ```
  schema_version: 1
@@ -111,7 +111,7 @@ The integration demo script is a working reference you can use as a starting poi
  python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
  ```
 
- The command prints the analysis run identifier and the output path. Open the resulting `output.json` to inspect per-topic
+ The command prints the analysis snapshot identifier and the output path. Open the resulting `output.json` to inspect per-topic
  labels, keywords, and document examples.
 
  ## Markov analysis
@@ -134,7 +134,7 @@ deterministic counts and distribution metrics. See `docs/PROFILING.md` for the f
  python -m biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
  ```
 
- The command writes an analysis run directory and prints the run identifier.
+ The command writes an analysis snapshot directory and prints the snapshot identifier.
 
  Run profiling from the CLI:
 
@@ -15,7 +15,7 @@ Design starts from strict behavior-driven development:
  - All changes should follow specification-first behavior-driven development: failing scenario,
  implementation, passing scenario, then refactor.
  - Behavior-driven development scenarios are not an afterthought: they are how we keep the domain
- vocabulary consistent and the platform comparable across backends and recipes.
+ vocabulary consistent and the platform comparable across backends and configurations.
  - **Specification completeness** is mandatory: if behavior exists, it must be specified.
  Ambiguous or untestable behavior should be removed or turned into an explicit error.
 
@@ -42,7 +42,7 @@ core nouns:
  - I have a **corpus** at this path or uniform resource identifier.
  - I ingest an **item** with optional **metadata**.
  - I rebuild the derived **index** after edits.
- - I run a **recipe** against the same corpus.
+ - I run a **configuration** against the same corpus.
  - I query and receive **evidence**.
 
  Anything that does not map cleanly to these nouns is either a derived helper or a backend-specific
@@ -72,13 +72,13 @@ requirements.
  - **Knowledge base backend**: an implementation that can ingest and retrieve from a corpus, such
  as scan, full text search, vector retrieval, or hybrid retrieval, exposed to procedures through
  retrieval primitives.
- - **Retrieval recipe**: a named configuration bundle for a backend, such as chunking rules,
+ - **Retrieval configuration**: a named configuration bundle for a backend, such as chunking rules,
  embedding model and version, hybrid weights, reranker choice, and filters. This is what we
  benchmark and compare.
- - **Recipe manifest**: a reproducibility record describing the backend and recipe parameters,
- plus any referenced materializations and build runs.
- - **Materialization**: an optional, persisted representation derived from raw content for a given
- recipe and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
+ - **Configuration manifest**: a reproducibility record describing the backend and configuration parameters,
+ plus any referenced snapshot artifacts and build snapshots.
+ - **Snapshot artifacts**: optional, persisted representations derived from raw content for a given
+ configuration and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
  none and operate on demand.
  - **Evidence**: structured retrieval output from backend queries. Evidence includes spans, scores,
  and provenance used by downstream retrieval augmented generation procedures.
@@ -95,7 +95,7 @@ requirements.
  - **Minimal opinion raw store**: raw ingestion should work for a folder of files with optional
  lightweight tagging.
  - **Reproducibility by default**: comparisons require manifests (even when there are no persisted
- materializations).
+ snapshot artifacts).
  - **Mutability is real**: corpora are edited, pruned, and reorganized; re-indexing must be a core
  workflow.
  - **Separation of concerns**: retrieval returns evidence; retrieval-augmented generation patterns
@@ -110,7 +110,7 @@ requirements.
  These are explicit, opinionated policies encoded into the project:
 
  - **Evidence schema strictness**: moderate-to-strong schema. Evidence must include stable
- identifiers, provenance, and retrieval scores; richer fields (spans, stage, recipe and run
+ identifiers, provenance, and retrieval scores; richer fields (spans, stage, configuration and run
  identifiers) are expected.
  - **Retrieval stages**: multi-stage is explicit (retrieve, rerank, then filter). Pipelines are
  expressed through evidence metadata rather than hard-coded backends.
@@ -131,7 +131,7 @@ Evidence is the canonical output of retrieval. Required fields:
  - `score` and `rank`
  - `text` (or `content_ref` when non-text)
  - `stage` (for example, `scan`, `full-text-search`, `rerank`)
- - `recipe_id` / `run_id` (for reproducibility)
+ - `configuration_id` / `snapshot_id` (for reproducibility)
  - Optional: `span_start`, `span_end`, `hash`
 
  ## Evidence lifecycle
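In 1.1.0 the reproducibility fields on evidence become `configuration_id` and `snapshot_id`. A minimal sketch of a presence check for the required fields visible in this hunk; the entries listed before line 131 of the document are not shown here, so the check is deliberately partial. `missing_evidence_fields` is hypothetical, it treats `configuration_id` and `snapshot_id` as both required (matching the example output earlier in this diff), and the sample record values are made up.

```python
from typing import Any, Dict, List

# Required fields visible in this hunk only; earlier entries of the list are not shown.
REQUIRED_VISIBLE_FIELDS = ("score", "rank", "stage", "configuration_id", "snapshot_id")


def missing_evidence_fields(evidence: Dict[str, Any]) -> List[str]:
    """Return the visible required fields that are absent from an evidence record."""
    missing = [name for name in REQUIRED_VISIBLE_FIELDS if name not in evidence]
    # Non-text evidence may carry `content_ref` instead of `text`.
    if "text" not in evidence and "content_ref" not in evidence:
        missing.append("text or content_ref")
    return missing


current = {
    "score": 1.0,
    "rank": 1,
    "text": "Primary button style preference",
    "stage": "scan",
    "configuration_id": "RECIPE_ID",
    "snapshot_id": "RUN_ID",
    "span_start": None,
    "span_end": None,
    "hash": None,
}
# A 0.16.0-style record still using the old field names.
legacy = {key: value for key, value in current.items() if key not in ("configuration_id", "snapshot_id")}
legacy.update({"recipe_id": "RECIPE_ID", "run_id": "RUN_ID"})
print(missing_evidence_fields(current))  # []
print(missing_evidence_fields(legacy))  # ['configuration_id', 'snapshot_id']
```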
@@ -220,12 +220,12 @@ The interface stays the same; topology is configuration.
 
  ### Reproducibility
 
- - Biblicus always records a **recipe manifest** for reproducibility.
- - When a backend produces persisted materializations, Biblicus treats them as **versioned build
- runs** identified by `run_id` (rather than overwriting in place by default).
- - Manifests exist even for just-in-time backends (materializations may be empty).
+ - Biblicus always records a **configuration manifest** for reproducibility.
+ - When a backend produces persisted snapshot artifacts, Biblicus treats them as **versioned build
+ snapshots** identified by `snapshot_id` (rather than overwriting in place by default).
+ - Manifests exist even for just-in-time backends (snapshot artifacts may be empty).
  - Full directed acyclic graph lineage is not included in version zero; revisit only if needed.
- - Optional: define **shared materialization formats** (canonical chunk and embedding stores) so
+ - Optional: define **shared snapshot artifact formats** (canonical chunk and embedding stores) so
  multiple backends can reuse intermediates when it makes sense; keep it opt-in.
 
  ### Evaluation
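One way to make a configuration manifest reproducible is to derive its identifier from the configuration content. The source does not say how Biblicus computes `configuration_id`. The sketch below assumes a SHA-256 over canonical JSON, and the helpers `configuration_id` and `configuration_manifest` are hypothetical. The empty `artifact_paths` list shows the "manifest exists even for just-in-time backends" case.

```python
import hashlib
import json
from typing import Any, Dict


def configuration_id(configuration: Dict[str, Any]) -> str:
    # Canonical JSON: sorted keys and fixed separators make equal configurations hash equally.
    canonical = json.dumps(configuration, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def configuration_manifest(backend_id: str, configuration: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical manifest shape; recorded even when a backend persists no artifacts.
    return {
        "backend_id": backend_id,
        "configuration_id": configuration_id(configuration),
        "configuration": configuration,
        "artifact_paths": [],
    }


a = configuration_manifest("scan", {"chunker_id": "paragraph", "top_k": 5})
b = configuration_manifest("scan", {"top_k": 5, "chunker_id": "paragraph"})
print(a["configuration_id"] == b["configuration_id"])  # True: key order does not matter
```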
@@ -243,8 +243,8 @@ The interface stays the same; topology is configuration.
  backend/tool can consume it without requiring a database engine.
  - Canonical version zero format is a single JavaScript Object Notation file at
  `.biblicus/catalog.json`, written atomically (temporary file and rename) on updates.
- - The catalog includes `latest_run_id` and run manifests are stored at
- `.biblicus/runs/<run_id>.json`.
+ - The catalog includes `latest_snapshot_id` and snapshot manifests are stored at
+ `.biblicus/snapshots/<snapshot_id>.json`.
  - If this becomes a bottleneck at very large scales, we **change the specification** (bump
  `schema_version`) rather than introduce multiple “supported” catalog storage modes.
 
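The "temporary file and rename" update of `.biblicus/catalog.json` follows a standard pattern. This is a minimal sketch of it, not Biblicus's own writer. `write_json_atomically` is a hypothetical helper, and the catalog payload below is illustrative apart from the `schema_version` and `latest_snapshot_id` keys named in the text. The temporary file lives in the same directory so `os.replace` renames within one filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_json_atomically(path: Path, payload: Dict[str, Any]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temporary file in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # make the bytes durable before the rename
        # os.replace swaps the file in one step; on POSIX the rename is atomic,
        # so readers see either the old catalog or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


root = Path(tempfile.mkdtemp()) / ".biblicus"
write_json_atomically(root / "catalog.json", {"schema_version": 1, "latest_snapshot_id": "snap-456"})
```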
@@ -17,7 +17,7 @@ Backends implement two operations:
  Backends store artifacts and manifests under:
 
  ```
- .biblicus/runs/retrieval/<backend_id>/<run_id>/
+ .biblicus/runs/retrieval/<backend_id>/<snapshot_id>/
  manifest.json
  <backend artifacts>
  ```
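Here is a sketch of the directory layout above combined with the "versioned, not overwritten" rule. It assumes a random `snapshot_id` per build, which the source does not specify, and the helpers `snapshot_dir` and `write_snapshot_manifest` are hypothetical. Opening the manifest with mode `"x"` makes a second write to the same path fail instead of silently replacing it.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Any, Dict, List


def snapshot_dir(corpus_root: Path, backend_id: str, snapshot_id: str) -> Path:
    # Layout from the documentation: .biblicus/runs/retrieval/<backend_id>/<snapshot_id>/
    return corpus_root / ".biblicus" / "runs" / "retrieval" / backend_id / snapshot_id


def write_snapshot_manifest(corpus_root: Path, backend_id: str, artifact_paths: List[str]) -> Path:
    # Hypothetical: a fresh random identifier per build, so rebuilds never reuse a directory.
    snapshot_id = uuid.uuid4().hex
    directory = snapshot_dir(corpus_root, backend_id, snapshot_id)
    directory.mkdir(parents=True, exist_ok=False)
    manifest: Dict[str, Any] = {
        "backend_id": backend_id,
        "snapshot_id": snapshot_id,
        "artifact_paths": artifact_paths,
    }
    manifest_path = directory / "manifest.json"
    # Mode "x" raises FileExistsError if the manifest already exists.
    with manifest_path.open("x", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest_path


corpus = Path(tempfile.mkdtemp())
first = write_snapshot_manifest(corpus, "scan", [])
second = write_snapshot_manifest(corpus, "scan", [])
```

Each build lands in its own `<snapshot_id>` directory, so earlier snapshots remain available for comparison.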
@@ -26,12 +26,12 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
 
  ## Implementation checklist
 
- 1. **Define a Pydantic configuration model** for your backend recipe.
+ 1. **Define a Pydantic configuration model** for your backend configuration.
  2. **Implement `RetrievalBackend`**:
- - `build_run(corpus, recipe_name, config)`
+ - `build_run(corpus, configuration_name, config)`
  - `query(corpus, run, query_text, budget)`
  3. **Emit `Evidence`** with required fields:
- - `item_id`, `source_uri`, `media_type`, `score`, `rank`, `stage`, `recipe_id`, `run_id`
+ - `item_id`, `source_uri`, `media_type`, `score`, `rank`, `stage`, `configuration_id`, `snapshot_id`
  - `text` **or** `content_ref`
  4. **Register the backend** in `biblicus.backends.available_backends`.
  5. **Add behavior-driven development specifications** before implementation and make them pass with 100% coverage.
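The checklist can be illustrated with a self-contained toy backend. This does not use the real `biblicus` package. `RetrievalBackendLike`, `ToyScanBackend`, `ToyScanConfiguration`, and the `max_total_items` field on the stand-in `QueryBudget` are all hypothetical, and a stdlib dataclass takes the place of the Pydantic model the checklist asks for. The two method signatures and the emitted evidence fields follow the checklist.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol

Corpus = Dict[str, str]  # hypothetical: item_id -> text
Run = Dict[str, Any]
EvidenceRecord = Dict[str, Any]


@dataclass(frozen=True)
class QueryBudget:
    # Hypothetical stand-in; the real Biblicus QueryBudget fields may differ.
    max_total_items: int


@dataclass(frozen=True)
class ToyScanConfiguration:
    # Hypothetical backend configuration (Biblicus asks for a Pydantic model here).
    min_term_matches: int = 1


class RetrievalBackendLike(Protocol):
    # Mirrors the two operations named in the checklist; not the real base class.
    def build_run(self, corpus: Corpus, configuration_name: str, config: Dict[str, Any]) -> Run: ...

    def query(self, corpus: Corpus, run: Run, query_text: str, budget: QueryBudget) -> List[EvidenceRecord]: ...


class ToyScanBackend:
    def build_run(self, corpus: Corpus, configuration_name: str, config: Dict[str, Any]) -> Run:
        parsed = ToyScanConfiguration(**config)  # unknown keys raise TypeError
        canonical = json.dumps({"name": configuration_name, "config": config}, sort_keys=True)
        return {
            "configuration_id": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
            "snapshot_id": uuid.uuid4().hex,  # fresh identifier per build
            "config": parsed,
            "artifact_paths": [],  # just-in-time backend: nothing persisted
        }

    def query(self, corpus: Corpus, run: Run, query_text: str, budget: QueryBudget) -> List[EvidenceRecord]:
        terms = query_text.lower().split()
        scored = []
        for item_id, text in corpus.items():
            matches = sum(text.lower().count(term) for term in terms)
            if matches >= run["config"].min_term_matches:
                scored.append((float(matches), item_id, text))
        scored.sort(key=lambda row: row[0], reverse=True)
        # Emit every required evidence field; the budget caps how many items come back.
        return [
            {
                "item_id": item_id,
                "source_uri": f"memory://{item_id}",
                "media_type": "text/plain",
                "score": score,
                "rank": rank,
                "stage": "scan",
                "configuration_id": run["configuration_id"],
                "snapshot_id": run["snapshot_id"],
                "text": text,
            }
            for rank, (score, item_id, text) in enumerate(scored[: budget.max_total_items], start=1)
        ]


corpus = {"a": "Biblicus stores evidence", "b": "evidence, evidence everywhere", "c": "nothing here"}
backend: RetrievalBackendLike = ToyScanBackend()
run = backend.build_run(corpus, "default", {"min_term_matches": 1})
results = backend.query(corpus, run, "evidence", QueryBudget(max_total_items=2))
```

Passing the same `QueryBudget` instance to each backend under comparison keeps the result sizes equal, which is what the reproducibility checklist asks for.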
@@ -41,12 +41,12 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
  - Treat **runs** as immutable manifests with reproducible parameters.
  - If your backend needs artifacts, store them under `.biblicus/runs/` and record paths in `artifact_paths`.
  - Keep **text extraction** in explicit pipeline stages, not in backend ingestion.
- See `docs/EXTRACTION.md` for how extraction runs are built and referenced from backend configs.
+ See `docs/EXTRACTION.md` for how extraction snapshots are built and referenced from backend configs.
 
  ## Reproducibility checklist
 
- - Record the extraction run reference used to build the backend.
- - Keep the backend recipe configuration in source control.
+ - Record the extraction snapshot reference used to build the backend.
+ - Keep the backend configuration in source control.
  - Reuse the same `QueryBudget` when comparing backends.
 
  ## Common pitfalls
@@ -8,7 +8,7 @@ returns evidence with chunk boundaries so you can trace results back to the orig
 
  ## Chunkers are pluggable
 
- Chunking is a pluggable interface selected by identifier in a retrieval recipe:
+ Chunking is a pluggable interface selected by identifier in a retrieval configuration:
 
  - `chunker_id`
  - `chunker_config` (Pydantic validated; `extra="forbid"`)
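Selecting a chunker by `chunker_id` and validating `chunker_config` strictly can be sketched without Pydantic. A dataclass's generated `__init__` rejects unknown keyword arguments with `TypeError`, which gives a guard similar in spirit to `extra="forbid"`. The registry `CHUNKERS`, the `"fixed-window"` and `"paragraph"` identifiers, their config classes, and the `chunk` function are hypothetical illustrations, not the chunkers Biblicus ships. Chunks are returned as `(span_start, span_end)` offsets so results can be traced back to the source text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Chunk = Tuple[int, int]  # (span_start, span_end) character offsets


@dataclass(frozen=True)
class FixedWindowConfig:
    window: int = 200
    overlap: int = 0

    def __post_init__(self) -> None:
        if self.window <= 0 or not 0 <= self.overlap < self.window:
            raise ValueError("require window > 0 and 0 <= overlap < window")


def fixed_window(text: str, config: FixedWindowConfig) -> List[Chunk]:
    # Slide a window of `window` characters, stepping by window - overlap.
    spans: List[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + config.window, len(text))
        spans.append((start, end))
        if end == len(text):
            break
        start += config.window - config.overlap
    return spans


@dataclass(frozen=True)
class ParagraphConfig:
    separator: str = "\n\n"


def paragraphs(text: str, config: ParagraphConfig) -> List[Chunk]:
    # Split on the separator while tracking each part's offset in the original text.
    spans: List[Chunk] = []
    start = 0
    for part in text.split(config.separator):
        spans.append((start, start + len(part)))
        start += len(part) + len(config.separator)
    return spans


# Registry keyed by chunker_id: each entry pairs a config type with a chunking function.
CHUNKERS: Dict[str, Tuple[type, Callable[..., List[Chunk]]]] = {
    "fixed-window": (FixedWindowConfig, fixed_window),
    "paragraph": (ParagraphConfig, paragraphs),
}


def chunk(text: str, chunker_id: str, chunker_config: Dict[str, Any]) -> List[Chunk]:
    config_type, chunker = CHUNKERS[chunker_id]  # KeyError for an unknown chunker_id
    # Unknown config keys raise TypeError, similar in spirit to extra="forbid".
    return chunker(text, config_type(**chunker_config))


print(chunk("alpha\n\nbeta", "paragraph", {}))  # [(0, 5), (7, 11)]
```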