biblicus 1.0.0__tar.gz → 1.1.0__tar.gz
This diff shows the changes between two publicly released versions of the package, as they appear in the public registry they were published to. It is provided for informational purposes only.
- {biblicus-1.0.0/src/biblicus.egg-info → biblicus-1.1.0}/PKG-INFO +30 -21
- {biblicus-1.0.0 → biblicus-1.1.0}/README.md +29 -20
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/ANALYSIS.md +25 -25
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/ARCHITECTURE_DETAIL.md +17 -17
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/BACKENDS.md +7 -7
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/CHUNKING.md +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/CONTEXT_ENGINE.md +28 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/CONTEXT_PACK.md +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/CORPUS.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/CORPUS_DESIGN.md +13 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/DEMOS.md +15 -15
- biblicus-1.1.0/docs/EMBEDDING_RETRIEVAL.md +68 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/EXTRACTION.md +19 -19
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/EXTRACTION_EVALUATION.md +13 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/FEATURE_INDEX.md +6 -6
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/KNOWLEDGE_BASE.md +5 -5
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/MARKOV_ANALYSIS.md +27 -22
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/PROFILING.md +17 -17
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/RETRIEVAL.md +9 -9
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/RETRIEVAL_EVALUATION.md +15 -15
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/RETRIEVAL_QUALITY.md +5 -5
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/ROADMAP.md +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/TEXT_ANNOTATE.md +39 -9
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/TEXT_EXTRACT.md +105 -55
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/TEXT_LINK.md +18 -8
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/TEXT_REDACT.md +28 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/TEXT_SLICE.md +44 -24
- biblicus-1.1.0/docs/TEXT_UTILITIES.md +414 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/TOPIC_MODELING.md +13 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/backends/embedding-index-file.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/backends/embedding-index-inmemory.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/backends/index.md +15 -15
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/backends/scan.md +19 -19
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/backends/sqlite-full-text-search.md +20 -20
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/backends/tf-vector.md +5 -5
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/conf.py +3 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/ocr/paddleocr-vl.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/ocr/rapidocr.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/pipeline.md +7 -7
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-longest.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-override.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-smart-override.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/select-text.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/speech-to-text/deepgram.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/speech-to-text/openai.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/text-document/markitdown.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/text-document/metadata.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/text-document/pass-through.md +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/text-document/pdf.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/text-document/unstructured.md +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/vlm-document/docling-granite.md +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/vlm-document/docling-smol.md +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/index.rst +4 -6
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/use_cases/sequence_markov.md +6 -6
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/use_cases/text_folder_search.md +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/89_context_engine_internal_branches.feature +21 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/90_embedding_index_evidence_fallback.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/analysis_schema.feature +6 -6
- biblicus-1.1.0/features/backend_validation.feature +14 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/cli_entrypoint.feature +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/cli_step_spec_parsing.feature +5 -5
- biblicus-1.1.0/features/context_engine_retrieval_internal_branches.feature +6 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/context_engine_retrieve_context_pack.feature +10 -10
- {biblicus-1.0.0 → biblicus-1.1.0}/features/context_pack_cli.feature +5 -5
- {biblicus-1.0.0 → biblicus-1.1.0}/features/corpus_edge_cases.feature +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/features/corpus_purge.feature +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/features/docling_granite_extractor.feature +36 -36
- {biblicus-1.0.0 → biblicus-1.1.0}/features/docling_smol_extractor.feature +36 -36
- {biblicus-1.0.0 → biblicus-1.1.0}/features/embedding_retrieval.feature +47 -47
- {biblicus-1.0.0 → biblicus-1.1.0}/features/error_cases.feature +36 -36
- {biblicus-1.0.0 → biblicus-1.1.0}/features/evaluation.feature +13 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/features/extraction_error_handling.feature +10 -10
- {biblicus-1.0.0 → biblicus-1.1.0}/features/extraction_evaluation.feature +28 -28
- {biblicus-1.0.0 → biblicus-1.1.0}/features/extraction_evaluation_lab.feature +1 -1
- biblicus-1.1.0/features/extraction_run_lifecycle.feature +117 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/extraction_selection.feature +8 -8
- {biblicus-1.0.0 → biblicus-1.1.0}/features/extraction_selection_longest.feature +7 -7
- {biblicus-1.0.0 → biblicus-1.1.0}/features/extractor_pipeline.feature +15 -15
- {biblicus-1.0.0 → biblicus-1.1.0}/features/import_tree.feature +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/features/inference_backend.feature +12 -12
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_audio_samples.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_mixed_extraction.feature +6 -6
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_ocr_image_extraction.feature +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_pdf_retrieval.feature +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_text_annotate.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_text_extract.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_unstructured_extraction.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_use_cases.feature +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_use_cases_sequence_markov.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markitdown_extractor.feature +24 -24
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_analysis.feature +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_analysis_categorical.feature +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_analysis_llm.feature +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_analysis_topic_modeling.feature +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_analysis_variants.feature +70 -70
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_internal_branches.feature +8 -8
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_schema.feature +39 -39
- {biblicus-1.0.0 → biblicus-1.1.0}/features/ocr_extractor.feature +9 -9
- {biblicus-1.0.0 → biblicus-1.1.0}/features/paddleocr_vl_extractor.feature +32 -32
- {biblicus-1.0.0 → biblicus-1.1.0}/features/pdf_text_extraction.feature +13 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/features/profiling.feature +35 -35
- {biblicus-1.0.0 → biblicus-1.1.0}/features/profiling_config_overrides.feature +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/features/query_processing.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/recipe_cascading.feature +12 -12
- biblicus-1.1.0/features/recipe_file_extraction.feature +35 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/recipe_utilities.feature +2 -2
- biblicus-1.1.0/features/retrieval_build_recipes.feature +19 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/retrieval_evaluation_lab.feature +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/retrieval_quality.feature +37 -37
- {biblicus-1.0.0 → biblicus-1.1.0}/features/retrieval_scan.feature +14 -14
- {biblicus-1.0.0 → biblicus-1.1.0}/features/retrieval_sqlite_full_text_search.feature +12 -12
- {biblicus-1.0.0 → biblicus-1.1.0}/features/retrieval_uses_extraction_run.feature +28 -28
- {biblicus-1.0.0 → biblicus-1.1.0}/features/retrieval_utilities.feature +5 -5
- {biblicus-1.0.0 → biblicus-1.1.0}/features/select_override.feature +10 -10
- {biblicus-1.0.0 → biblicus-1.1.0}/features/smart_override_selection.feature +27 -27
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/analysis_steps.py +28 -25
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/backend_steps.py +47 -40
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/cli_steps.py +11 -11
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_engine_full_paths_steps.py +8 -8
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_engine_internal_steps.py +200 -1
- biblicus-1.1.0/features/steps/context_engine_retrieval_internal_steps.py +114 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_engine_retrieve_context_pack_steps.py +24 -22
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_pack_steps.py +20 -20
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/docling_steps.py +6 -6
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/embedding_index_evidence_steps.py +25 -24
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/embedding_index_internal_steps.py +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/embedding_retrieval_coverage_steps.py +42 -32
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/extraction_evaluation_lab_steps.py +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/extraction_evaluation_steps.py +7 -7
- biblicus-1.1.0/features/steps/extraction_run_lifecycle_steps.py +156 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/extraction_steps.py +241 -193
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/extractor_steps.py +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/markov_embeddings_error_steps.py +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/markov_internal_steps.py +49 -49
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/markov_schema_steps.py +143 -111
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/markov_steps.py +69 -64
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/model_steps.py +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/paddleocr_vl_steps.py +5 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/profiling_steps.py +82 -37
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/recipe_steps.py +5 -1
- biblicus-1.1.0/features/steps/retrieval_build_recipe_steps.py +66 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/retrieval_evaluation_lab_steps.py +3 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/retrieval_quality_steps.py +28 -23
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/retrieval_steps.py +104 -76
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_annotate_steps.py +4 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_extract_steps.py +24 -12
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_link_steps.py +4 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_redact_steps.py +4 -2
- biblicus-1.1.0/features/steps/text_tool_loop_steps.py +138 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/tf_vector_internal_steps.py +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/topic_modeling_steps.py +46 -34
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/use_cases_steps.py +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/features/stt_deepgram_extractor.feature +13 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/features/stt_extractor.feature +14 -14
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_extraction_runs.feature +29 -29
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_utilities.feature +26 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/topic_modeling.feature +117 -117
- {biblicus-1.0.0 → biblicus-1.1.0}/features/unstructured_extractor.feature +15 -15
- {biblicus-1.0.0 → biblicus-1.1.0}/features/use_cases.feature +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/features/user_config.feature +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/pyproject.toml +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/extraction_evaluation_demo.py +12 -12
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/extraction_evaluation_lab.py +12 -12
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/markov_analysis_demo.py +77 -71
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/markov_cached_segments_demo.py +88 -76
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/markov_run_report.py +8 -8
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/profiling_demo.py +22 -22
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/readme_end_to_end_demo.py +11 -7
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/retrieval_evaluation_lab.py +20 -20
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/topic_modeling_integration.py +28 -28
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/use_cases/notes_to_context_pack_demo.py +10 -6
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/use_cases/sequence_markov_demo.py +37 -31
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/use_cases/text_folder_search_demo.py +14 -14
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/wikipedia_rag_demo.py +13 -13
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/__init__.py +5 -5
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/analysis/__init__.py +1 -1
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/analysis/base.py +10 -10
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/analysis/markov.py +78 -68
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/analysis/models.py +47 -47
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/analysis/profiling.py +58 -48
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/analysis/topic_modeling.py +56 -51
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/cli.py +224 -177
- biblicus-1.0.0/src/biblicus/recipes.py → biblicus-1.1.0/src/biblicus/configuration.py +14 -14
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/constants.py +2 -2
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/context_engine/assembler.py +49 -19
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/context_engine/retrieval.py +46 -42
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/corpus.py +116 -108
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/errors.py +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/evaluation.py +27 -25
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extraction.py +103 -98
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extraction_evaluation.py +26 -26
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/deepgram_stt.py +7 -7
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/docling_granite_text.py +11 -11
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/docling_smol_text.py +11 -11
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/markitdown_text.py +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/openai_stt.py +7 -7
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/paddleocr_vl_text.py +20 -18
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/pipeline.py +8 -8
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/rapidocr_text.py +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/unstructured_text.py +3 -3
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/hooks.py +4 -4
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/knowledge_base.py +33 -31
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/models.py +78 -78
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/retrieval.py +47 -40
- biblicus-1.1.0/src/biblicus/retrievers/__init__.py +50 -0
- biblicus-1.1.0/src/biblicus/retrievers/base.py +65 -0
- {biblicus-1.0.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/embedding_index_common.py +44 -41
- {biblicus-1.0.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/embedding_index_file.py +87 -58
- {biblicus-1.0.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/embedding_index_inmemory.py +88 -59
- biblicus-1.1.0/src/biblicus/retrievers/hybrid.py +301 -0
- {biblicus-1.0.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/scan.py +83 -73
- {biblicus-1.0.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/sqlite_full_text_search.py +115 -101
- {biblicus-1.0.0/src/biblicus/backends → biblicus-1.1.0/src/biblicus/retrievers}/tf_vector.py +87 -77
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/prompts.py +16 -8
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/tool_loop.py +63 -5
- {biblicus-1.0.0 → biblicus-1.1.0/src/biblicus.egg-info}/PKG-INFO +30 -21
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus.egg-info/SOURCES.txt +14 -14
- biblicus-1.1.0/tests/test_text_extract_tool_calls.py +110 -0
- biblicus-1.1.0/tests/test_text_utility_tool_calls.py +314 -0
- biblicus-1.1.0/tests/test_tool_loop_safeguards.py +171 -0
- biblicus-1.0.0/docs/EMBEDDING_RETRIEVAL.md +0 -57
- biblicus-1.0.0/docs/PR_FAQ_CONTEXT_ENGINE.md +0 -43
- biblicus-1.0.0/docs/PR_FAQ_EMBEDDING_RETRIEVAL.md +0 -105
- biblicus-1.0.0/docs/PR_FAQ_TEXT_ANNOTATE.md +0 -118
- biblicus-1.0.0/docs/TEXT_UTILITIES.md +0 -137
- biblicus-1.0.0/features/backend_validation.feature +0 -14
- biblicus-1.0.0/features/context_engine_retrieval_internal_branches.feature +0 -6
- biblicus-1.0.0/features/extraction_run_lifecycle.feature +0 -117
- biblicus-1.0.0/features/recipe_file_extraction.feature +0 -35
- biblicus-1.0.0/features/retrieval_build_recipes.feature +0 -19
- biblicus-1.0.0/features/steps/context_engine_retrieval_internal_steps.py +0 -113
- biblicus-1.0.0/features/steps/extraction_run_lifecycle_steps.py +0 -152
- biblicus-1.0.0/features/steps/retrieval_build_recipe_steps.py +0 -64
- biblicus-1.0.0/features/steps/text_tool_loop_steps.py +0 -36
- biblicus-1.0.0/src/biblicus/backends/__init__.py +0 -50
- biblicus-1.0.0/src/biblicus/backends/base.py +0 -65
- biblicus-1.0.0/src/biblicus/backends/hybrid.py +0 -292
- {biblicus-1.0.0 → biblicus-1.1.0}/LICENSE +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/MANIFEST.in +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/THIRD_PARTY_NOTICES.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/datasets/extraction_lab/labels.json +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/datasets/retrieval_lab/labels.json +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/datasets/wikipedia_mini.json +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/ARCHITECTURE.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/CONTEXT_ENGINE_DEMO.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/STT.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/TESTING.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/USER_CONFIGURATION.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/USE_CASES.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/UTILITIES.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/api.rst +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/index.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/ocr/index.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/pipeline-utilities/index.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/speech-to-text/index.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/text-document/index.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/extractors/vlm-document/index.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/use_cases/notes_to_context_pack.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/docs/use_cases/text_redact.md +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/70_context_retriever.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/71_context_compaction.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/72_context_history_compaction.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/73_context_nested_compaction.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/74_context_regeneration.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/75_context_default_regeneration.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/76_context_pack_budget_weights.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/77_context_default_pack_priority.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/78_context_default_pack_weights.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/79_context_nested_context_packs.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/80_context_nested_pack_budget_cap.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/81_context_nested_regeneration.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/82_context_explicit_regeneration.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/83_context_explicit_pack_priority.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/84_context_explicit_pack_weights.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/85_context_expansion.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/86_context_engine_errors.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/87_context_compactor_strategies.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/88_context_engine_model_validation.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/91_tf_vector_internal_branches.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/93_context_engine_full_paths.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/ai_llm.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/ai_models.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/biblicus_corpus.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/cli_parsing.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/content_sniffing.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/context_pack.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/context_pack_policies.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/corpus_identity.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/corpus_internal_branches.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/crawl.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/embedding_index_internal_branches.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/embeddings.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/environment.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/evidence_processing.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/extractor_validation.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/frontmatter.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/hook_config_validation.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/hook_error_handling.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/hook_logging_internal_branches.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/ingest_namespacing.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/ingest_sources.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_image_samples.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_mixed_corpus.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_pdf_samples.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_text_link.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_text_redact.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_text_slice.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/integration_wikipedia.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/knowledge_base.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/lifecycle_hooks.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_embeddings_errors.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/markov_start_end_labels.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/model_validation.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/paddleocr_vl_parse_api_response.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/python_api.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/python_hook_logging.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/retrieval_budget.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/select_override_defaults.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/source_helper_internal_branches.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/source_loading.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/ai_llm_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/ai_models_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/cli_parsing_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_compaction_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_compactor_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_default_pack_priority_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_default_pack_weights_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_default_regeneration_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_engine_error_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_engine_model_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_engine_registry.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_engine_retriever.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_expansion_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_explicit_pack_priority_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_explicit_pack_weights_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_explicit_regeneration_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_history_compaction_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_nested_compaction_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_nested_context_packs_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_nested_pack_budget_cap_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_nested_regeneration_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_pack_budget_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_regeneration_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/context_retriever_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/corpus_internal_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/crawl_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/deepgram_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/embeddings_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/evidence_processing_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/frontmatter_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/hook_logging_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/inference_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/knowledge_base_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/markitdown_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/markov_start_end_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/openai_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/paddleocr_mock_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/paddleocr_vl_unit_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/pdf_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/python_api_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/rapidocr_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/requests_mock_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/select_override_defaults_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/source_helper_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/stt_deepgram_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/stt_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_internal_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_link_internal_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_mock_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/text_slice_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/unstructured_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/user_config_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/steps/wikitext_steps.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/streaming_ingest.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_annotate.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_extract.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_internal_branches.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_link.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_link_internal_branches.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_mock.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_redact.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/text_slice.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/features/token_budget.feature +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/demo_context_engine.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/download_ag_news.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/download_audio_samples.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/download_image_samples.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/download_mixed_samples.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/download_pdf_samples.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/download_wikipedia.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/test.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/scripts/use_cases/text_redact_demo.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/setup.cfg +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/__main__.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/__init__.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/interpolation.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/loader.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/_vendor/dotyaml/transformer.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/ai/__init__.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/ai/embeddings.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/ai/llm.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/ai/models.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/analysis/schema.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/chunking.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/context.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/context_engine/__init__.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/context_engine/compaction.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/context_engine/models.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/crawl.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/embedding_providers.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/evidence_processing.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/__init__.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/base.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/metadata_text.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/pass_through_text.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/pdf_text.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/select_longest_text.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/select_override.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/select_smart_override.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/extractors/select_text.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/frontmatter.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/hook_logging.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/hook_manager.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/ignore.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/inference.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/sources.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/__init__.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/annotate.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/extract.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/link.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/markup.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/models.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/redact.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/text/slice.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/time.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/uris.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus/user_config.py +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus.egg-info/dependency_links.txt +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus.egg-info/entry_points.txt +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus.egg-info/requires.txt +0 -0
- {biblicus-1.0.0 → biblicus-1.1.0}/src/biblicus.egg-info/top_level.txt +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: biblicus
|
|
3
|
-
Version: 1.
|
|
3
|
+
Version: 1.1.0
|
|
4
4
|
Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
|
|
5
5
|
License: MIT
|
|
6
6
|
Requires-Python: >=3.9
|
|
@@ -80,7 +80,7 @@ See [retrieval augmented generation overview] for a short introduction to the id
|
|
|
80
80
|
## Analysis highlights
|
|
81
81
|
|
|
82
82
|
- `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
|
|
83
|
-
- YAML
|
|
83
|
+
- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
|
|
84
84
|
- Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
|
|
85
85
|
- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
|
|
86
86
|
- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
|
|
@@ -167,7 +167,7 @@ sequenceDiagram
|
|
|
167
167
|
|
|
168
168
|
- You can ingest raw material once, then try many retrieval approaches over time.
|
|
169
169
|
- You can keep raw files readable and portable, without locking your data inside a database.
|
|
170
|
-
- You can evaluate retrieval
|
|
170
|
+
- You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
|
|
171
171
|
|
|
172
172
|
## Typical flow
|
|
173
173
|
|
|
@@ -176,7 +176,7 @@ sequenceDiagram
|
|
|
176
176
|
- Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
|
|
177
177
|
- Run extraction when you want derived text artifacts from non-text sources.
|
|
178
178
|
- Reindex to refresh the catalog after edits.
|
|
179
|
-
- Build a retrieval
|
|
179
|
+
- Build a retrieval snapshot with a backend.
|
|
180
180
|
- Query the run to collect evidence and evaluate it with datasets.
|
|
181
181
|
|
|
182
182
|
## Install
|
|
@@ -292,7 +292,7 @@ for note_title, note_text in notes:
|
|
|
292
292
|
corpus.ingest_note(note_text, title=note_title, tags=["memory"])
|
|
293
293
|
|
|
294
294
|
backend = get_backend("scan")
|
|
295
|
-
run = backend.build_run(corpus,
|
|
295
|
+
run = backend.build_run(corpus, configuration_name="Story demo", config={})
|
|
296
296
|
budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
|
|
297
297
|
result = backend.query(
|
|
298
298
|
corpus,
|
|
@@ -336,8 +336,8 @@ Example output:
|
|
|
336
336
|
"maximum_total_characters": 2000,
|
|
337
337
|
"max_items_per_source": null
|
|
338
338
|
},
|
|
339
|
-
"
|
|
340
|
-
"
|
|
339
|
+
"snapshot_id": "RUN_ID",
|
|
340
|
+
"configuration_id": "RECIPE_ID",
|
|
341
341
|
"backend_id": "scan",
|
|
342
342
|
"generated_at": "2026-01-29T00:00:00.000000Z",
|
|
343
343
|
"evidence": [
|
|
@@ -352,8 +352,8 @@ Example output:
|
|
|
352
352
|
"span_start": null,
|
|
353
353
|
"span_end": null,
|
|
354
354
|
"stage": "scan",
|
|
355
|
-
"
|
|
356
|
-
"
|
|
355
|
+
"configuration_id": "RECIPE_ID",
|
|
356
|
+
"snapshot_id": "RUN_ID",
|
|
357
357
|
"hash": null
|
|
358
358
|
}
|
|
359
359
|
],
|
|
@@ -422,7 +422,7 @@ flowchart TB
|
|
|
422
422
|
|
|
423
423
|
subgraph RowExtraction[Pluggable: extraction pipeline]
|
|
424
424
|
direction TB
|
|
425
|
-
Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction
|
|
425
|
+
Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
|
|
426
426
|
end
|
|
427
427
|
|
|
428
428
|
subgraph RowRetrieval[Pluggable: retrieval backend]
|
|
@@ -484,7 +484,7 @@ From Python, the same flow is available through the Corpus class and backend int
|
|
|
484
484
|
- Ingest notes with `Corpus.ingest_note`.
|
|
485
485
|
- Ingest files or web addresses with `Corpus.ingest_source`.
|
|
486
486
|
- List items with `Corpus.list_items`.
|
|
487
|
-
- Build a retrieval
|
|
487
|
+
- Build a retrieval snapshot with `get_backend` and `backend.build_run`.
|
|
488
488
|
- Query a run with `backend.query`.
|
|
489
489
|
- Evaluate with `evaluate_run`.
|
|
490
490
|
|
|
@@ -530,13 +530,13 @@ corpus/
|
|
|
530
530
|
runs/
|
|
531
531
|
extraction/
|
|
532
532
|
pipeline/
|
|
533
|
-
<
|
|
533
|
+
<snapshot id>/
|
|
534
534
|
manifest.json
|
|
535
535
|
text/
|
|
536
536
|
<item id>.txt
|
|
537
537
|
retrieval/
|
|
538
538
|
<backend id>/
|
|
539
|
-
<
|
|
539
|
+
<snapshot id>/
|
|
540
540
|
manifest.json
|
|
541
541
|
```
|
|
542
542
|
|
|
@@ -552,7 +552,7 @@ For detailed documentation including configuration options, performance characte
|
|
|
552
552
|
|
|
553
553
|
## Retrieval documentation
|
|
554
554
|
|
|
555
|
-
For the retrieval pipeline overview and
|
|
555
|
+
For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
|
|
556
556
|
(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
|
|
557
557
|
and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
|
|
558
558
|
script (`scripts/retrieval_evaluation_lab.py`).
|
|
@@ -615,26 +615,26 @@ See `docs/TEXT_SLICE.md` for the utility API and examples.
|
|
|
615
615
|
|
|
616
616
|
Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
|
|
617
617
|
are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
|
|
618
|
-
an extraction
|
|
618
|
+
an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
|
|
619
619
|
optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
|
|
620
620
|
|
|
621
621
|
See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
|
|
622
622
|
`docs/TOPIC_MODELING.md` for topic modeling details.
|
|
623
623
|
|
|
624
|
-
Run a topic analysis using a
|
|
624
|
+
Run a topic analysis using a configuration file:
|
|
625
625
|
|
|
626
626
|
```
|
|
627
|
-
biblicus analyze topics --corpus corpora/example --
|
|
627
|
+
biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
|
|
628
628
|
```
|
|
629
629
|
|
|
630
|
-
If `--extraction-run` is omitted, Biblicus uses the most recent extraction
|
|
630
|
+
If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
|
|
631
631
|
reproducibility. The analysis output is stored under:
|
|
632
632
|
|
|
633
633
|
```
|
|
634
|
-
.biblicus/runs/analysis/topic-modeling/<
|
|
634
|
+
.biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
|
|
635
635
|
```
|
|
636
636
|
|
|
637
|
-
Minimal
|
|
637
|
+
Minimal configuration example:
|
|
638
638
|
|
|
639
639
|
```yaml
|
|
640
640
|
schema_version: 1
|
|
@@ -659,7 +659,7 @@ llm_fine_tuning:
|
|
|
659
659
|
```
|
|
660
660
|
|
|
661
661
|
LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
|
|
662
|
-
|
|
662
|
+
Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
|
|
663
663
|
AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
|
|
664
664
|
|
|
665
665
|
For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
|
|
@@ -712,6 +712,15 @@ Build the documentation:
|
|
|
712
712
|
python -m sphinx -b html docs docs/_build/html
|
|
713
713
|
```
|
|
714
714
|
|
|
715
|
+
Preview the documentation locally:
|
|
716
|
+
|
|
717
|
+
```
|
|
718
|
+
cd docs/_build/html
|
|
719
|
+
python -m http.server
|
|
720
|
+
```
|
|
721
|
+
|
|
722
|
+
Open `http://localhost:8000` in your browser.
|
|
723
|
+
|
|
715
724
|
## License
|
|
716
725
|
|
|
717
726
|
License terms are in `LICENSE`.
|
|
@@ -26,7 +26,7 @@ See [retrieval augmented generation overview] for a short introduction to the id
|
|
|
26
26
|
## Analysis highlights
|
|
27
27
|
|
|
28
28
|
- `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
|
|
29
|
-
- YAML
|
|
29
|
+
- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
|
|
30
30
|
- Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
|
|
31
31
|
- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
|
|
32
32
|
- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
|
|
@@ -113,7 +113,7 @@ sequenceDiagram
|
|
|
113
113
|
|
|
114
114
|
- You can ingest raw material once, then try many retrieval approaches over time.
|
|
115
115
|
- You can keep raw files readable and portable, without locking your data inside a database.
|
|
116
|
-
- You can evaluate retrieval
|
|
116
|
+
- You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
|
|
117
117
|
|
|
118
118
|
## Typical flow
|
|
119
119
|
|
|
@@ -122,7 +122,7 @@ sequenceDiagram
|
|
|
122
122
|
- Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
|
|
123
123
|
- Run extraction when you want derived text artifacts from non-text sources.
|
|
124
124
|
- Reindex to refresh the catalog after edits.
|
|
125
|
-
- Build a retrieval
|
|
125
|
+
- Build a retrieval snapshot with a backend.
|
|
126
126
|
- Query the run to collect evidence and evaluate it with datasets.
|
|
127
127
|
|
|
128
128
|
## Install
|
|
@@ -238,7 +238,7 @@ for note_title, note_text in notes:
|
|
|
238
238
|
corpus.ingest_note(note_text, title=note_title, tags=["memory"])
|
|
239
239
|
|
|
240
240
|
backend = get_backend("scan")
|
|
241
|
-
run = backend.build_run(corpus,
|
|
241
|
+
run = backend.build_run(corpus, configuration_name="Story demo", config={})
|
|
242
242
|
budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
|
|
243
243
|
result = backend.query(
|
|
244
244
|
corpus,
|
|
@@ -282,8 +282,8 @@ Example output:
|
|
|
282
282
|
"maximum_total_characters": 2000,
|
|
283
283
|
"max_items_per_source": null
|
|
284
284
|
},
|
|
285
|
-
"
|
|
286
|
-
"
|
|
285
|
+
"snapshot_id": "RUN_ID",
|
|
286
|
+
"configuration_id": "RECIPE_ID",
|
|
287
287
|
"backend_id": "scan",
|
|
288
288
|
"generated_at": "2026-01-29T00:00:00.000000Z",
|
|
289
289
|
"evidence": [
|
|
@@ -298,8 +298,8 @@ Example output:
|
|
|
298
298
|
"span_start": null,
|
|
299
299
|
"span_end": null,
|
|
300
300
|
"stage": "scan",
|
|
301
|
-
"
|
|
302
|
-
"
|
|
301
|
+
"configuration_id": "RECIPE_ID",
|
|
302
|
+
"snapshot_id": "RUN_ID",
|
|
303
303
|
"hash": null
|
|
304
304
|
}
|
|
305
305
|
],
|
|
@@ -368,7 +368,7 @@ flowchart TB
|
|
|
368
368
|
|
|
369
369
|
subgraph RowExtraction[Pluggable: extraction pipeline]
|
|
370
370
|
direction TB
|
|
371
|
-
Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction
|
|
371
|
+
Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
|
|
372
372
|
end
|
|
373
373
|
|
|
374
374
|
subgraph RowRetrieval[Pluggable: retrieval backend]
|
|
@@ -430,7 +430,7 @@ From Python, the same flow is available through the Corpus class and backend int
|
|
|
430
430
|
- Ingest notes with `Corpus.ingest_note`.
|
|
431
431
|
- Ingest files or web addresses with `Corpus.ingest_source`.
|
|
432
432
|
- List items with `Corpus.list_items`.
|
|
433
|
-
- Build a retrieval
|
|
433
|
+
- Build a retrieval snapshot with `get_backend` and `backend.build_run`.
|
|
434
434
|
- Query a run with `backend.query`.
|
|
435
435
|
- Evaluate with `evaluate_run`.
|
|
436
436
|
|
|
@@ -476,13 +476,13 @@ corpus/
|
|
|
476
476
|
runs/
|
|
477
477
|
extraction/
|
|
478
478
|
pipeline/
|
|
479
|
-
<
|
|
479
|
+
<snapshot id>/
|
|
480
480
|
manifest.json
|
|
481
481
|
text/
|
|
482
482
|
<item id>.txt
|
|
483
483
|
retrieval/
|
|
484
484
|
<backend id>/
|
|
485
|
-
<
|
|
485
|
+
<snapshot id>/
|
|
486
486
|
manifest.json
|
|
487
487
|
```
|
|
488
488
|
|
|
@@ -498,7 +498,7 @@ For detailed documentation including configuration options, performance characte
|
|
|
498
498
|
|
|
499
499
|
## Retrieval documentation
|
|
500
500
|
|
|
501
|
-
For the retrieval pipeline overview and
|
|
501
|
+
For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
|
|
502
502
|
(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
|
|
503
503
|
and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
|
|
504
504
|
script (`scripts/retrieval_evaluation_lab.py`).
|
|
@@ -561,26 +561,26 @@ See `docs/TEXT_SLICE.md` for the utility API and examples.
|
|
|
561
561
|
|
|
562
562
|
Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
|
|
563
563
|
are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
|
|
564
|
-
an extraction
|
|
564
|
+
an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
|
|
565
565
|
optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
|
|
566
566
|
|
|
567
567
|
See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
|
|
568
568
|
`docs/TOPIC_MODELING.md` for topic modeling details.
|
|
569
569
|
|
|
570
|
-
Run a topic analysis using a
|
|
570
|
+
Run a topic analysis using a configuration file:
|
|
571
571
|
|
|
572
572
|
```
|
|
573
|
-
biblicus analyze topics --corpus corpora/example --
|
|
573
|
+
biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
|
|
574
574
|
```
|
|
575
575
|
|
|
576
|
-
If `--extraction-run` is omitted, Biblicus uses the most recent extraction
|
|
576
|
+
If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
|
|
577
577
|
reproducibility. The analysis output is stored under:
|
|
578
578
|
|
|
579
579
|
```
|
|
580
|
-
.biblicus/runs/analysis/topic-modeling/<
|
|
580
|
+
.biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
|
|
581
581
|
```
|
|
582
582
|
|
|
583
|
-
Minimal
|
|
583
|
+
Minimal configuration example:
|
|
584
584
|
|
|
585
585
|
```yaml
|
|
586
586
|
schema_version: 1
|
|
@@ -605,7 +605,7 @@ llm_fine_tuning:
|
|
|
605
605
|
```
|
|
606
606
|
|
|
607
607
|
LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
|
|
608
|
-
|
|
608
|
+
Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
|
|
609
609
|
AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
|
|
610
610
|
|
|
611
611
|
For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
|
|
@@ -658,6 +658,15 @@ Build the documentation:
|
|
|
658
658
|
python -m sphinx -b html docs docs/_build/html
|
|
659
659
|
```
|
|
660
660
|
|
|
661
|
+
Preview the documentation locally:
|
|
662
|
+
|
|
663
|
+
```
|
|
664
|
+
cd docs/_build/html
|
|
665
|
+
python -m http.server
|
|
666
|
+
```
|
|
667
|
+
|
|
668
|
+
Open `http://localhost:8000` in your browser.
|
|
669
|
+
|
|
661
670
|
## License
|
|
662
671
|
|
|
663
672
|
License terms are in `LICENSE`.
|
|
@@ -1,31 +1,31 @@
|
|
|
1
1
|
# Corpus analysis
|
|
2
2
|
|
|
3
3
|
Biblicus supports analysis backends that run on extracted text artifacts without changing the raw corpus. Analysis is a
|
|
4
|
-
pluggable phase that reads an extraction
|
|
4
|
+
pluggable phase that reads an extraction snapshot, produces structured output, and stores artifacts under the corpus runs
|
|
5
5
|
folder. Each analysis backend declares its own configuration schema and output contract, and all schemas are validated
|
|
6
6
|
strictly.
|
|
7
7
|
|
|
8
|
-
## How analysis
|
|
8
|
+
## How analysis snapshots work
|
|
9
9
|
|
|
10
|
-
- Analysis runs are tied to a corpus state via the extraction
|
|
11
|
-
- The analysis output is written under `.biblicus/runs/analysis/<analysis-id>/<
|
|
12
|
-
- Analysis is reproducible when you supply the same extraction
|
|
13
|
-
- Analysis configuration is stored as a
|
|
10
|
+
- Analysis runs are tied to a corpus state via the extraction snapshot reference.
|
|
11
|
+
- The analysis output is written under `.biblicus/runs/analysis/<analysis-id>/<snapshot_id>/`.
|
|
12
|
+
- Analysis is reproducible when you supply the same extraction snapshot and corpus catalog state.
|
|
13
|
+
- Analysis configuration is stored as a configuration manifest in the run metadata.
|
|
14
14
|
|
|
15
|
-
If you omit the extraction
|
|
16
|
-
repeatable analysis
|
|
15
|
+
If you omit the extraction snapshot, Biblicus uses the most recent extraction snapshot and emits a reproducibility warning. For
|
|
16
|
+
repeatable analysis snapshots, always pass the extraction snapshot reference explicitly.
|
|
17
17
|
|
|
18
|
-
## Analysis
|
|
18
|
+
## Analysis snapshot artifacts
|
|
19
19
|
|
|
20
|
-
Every analysis
|
|
20
|
+
Every analysis snapshot records a manifest alongside the output:
|
|
21
21
|
|
|
22
22
|
```
|
|
23
|
-
.biblicus/runs/analysis/<analysis-id>/<
|
|
23
|
+
.biblicus/runs/analysis/<analysis-id>/<snapshot_id>/
|
|
24
24
|
manifest.json
|
|
25
25
|
output.json
|
|
26
26
|
```
|
|
27
27
|
|
|
28
|
-
The manifest captures the
|
|
28
|
+
The manifest captures the configuration, extraction snapshot reference, and catalog timestamp so results can be reproduced and
|
|
29
29
|
compared later.
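
As an illustration of the layout above, here is a minimal Python sketch that loads both files from one snapshot directory. The helper name `load_analysis_snapshot` is hypothetical and is not part of Biblicus; it assumes only the documented path layout and that both files contain JavaScript Object Notation.

```python
import json
from pathlib import Path


def load_analysis_snapshot(corpus_root, analysis_id, snapshot_id):
    """Return (manifest, output) parsed from one analysis snapshot directory.

    Hypothetical helper: it relies only on the documented directory layout,
    not on any Biblicus function.
    """
    snapshot_dir = (
        Path(corpus_root) / ".biblicus" / "runs" / "analysis" / analysis_id / snapshot_id
    )
    # Both files are plain JSON documents stored side by side.
    manifest = json.loads((snapshot_dir / "manifest.json").read_text(encoding="utf-8"))
    output = json.loads((snapshot_dir / "output.json").read_text(encoding="utf-8"))
    return manifest, output
```

Reading the manifest next to the output keeps the extraction snapshot reference and catalog timestamp at hand when two results are compared.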
|
|
30
30
|
|
|
31
31
|
## Inspecting output
|
|
@@ -38,21 +38,21 @@ cat corpora/example/.biblicus/runs/analysis/profiling/RUN_ID/output.json
|
|
|
38
38
|
|
|
39
39
|
Each analysis backend defines its own `report` payload. The run metadata is consistent across backends.
|
|
40
40
|
|
|
41
|
-
## Comparing analysis
|
|
41
|
+
## Comparing analysis snapshots
|
|
42
42
|
|
|
43
43
|
When you compare analysis results, record:
|
|
44
44
|
|
|
45
45
|
- Corpus path and catalog timestamp.
|
|
46
46
|
- Extraction run reference.
|
|
47
|
-
- Analysis
|
|
48
|
-
- Analysis
|
|
47
|
+
- Analysis configuration name and configuration.
|
|
48
|
+
- Analysis snapshot identifier and output path.
|
|
49
49
|
|
|
50
50
|
These make it possible to rerun the analysis and explain differences.
|
|
51
51
|
|
|
52
52
|
## Pluggable analysis backends
|
|
53
53
|
|
|
54
54
|
Analysis backends implement the `CorpusAnalysisBackend` interface and are registered under `biblicus.analysis`.
|
|
55
|
-
A backend receives the corpus, a
|
|
55
|
+
A backend receives the corpus, a configuration name, a configuration mapping, and an extraction snapshot reference. It returns a
|
|
56
56
|
Pydantic model that is serialized to JavaScript Object Notation for storage.
|
|
57
57
|
|
|
58
58
|
## Choosing an analysis backend
|
|
@@ -61,22 +61,22 @@ Start with profiling when you need fast, deterministic baselines. Use topic mode
|
|
|
61
61
|
and exploratory labels. Use Markov analysis when you want state-transition structure over sequences of segments.
|
|
62
62
|
Combine multiple backends for a clear view of corpus composition, themes, and state dynamics.
|
|
63
63
|
|
|
64
|
-
##
|
|
64
|
+
## Configuration files
|
|
65
65
|
|
|
66
|
-
Analysis
|
|
66
|
+
Analysis configurations are optional JavaScript Object Notation or YAML files that capture configuration in a repeatable way.
|
|
67
67
|
They are useful for sharing experiments and keeping runs reproducible.
|
|
68
68
|
|
|
69
|
-
Recipes support cascading composition. When a command accepts `--
|
|
70
|
-
merges them in order, where later
|
|
69
|
+
Configurations support cascading composition. When a command accepts `--configuration`, you can pass multiple configuration files. Biblicus
|
|
70
|
+
merges them in order, where later configurations override earlier configurations via a deep merge. You can then apply `--config`
|
|
71
71
|
overrides on top of the composed view.
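
The merge order described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Biblicus implementation: the function names `deep_merge` and `apply_dotted_override` are hypothetical, the keys in the sample mappings are made up, and the sketch assumes lists and scalars are replaced rather than merged.

```python
import copy


def deep_merge(base, override):
    """Return a new mapping where override wins; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Assumption: non-dict values (including lists) are replaced wholesale.
            merged[key] = copy.deepcopy(value)
    return merged


def apply_dotted_override(config, dotted_key, value):
    """Set a nested value addressed as "a.b.c", creating missing levels."""
    result = copy.deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        child = node.get(part)
        if not isinstance(child, dict):
            child = {}
            node[part] = child
        node = child
    node[leaf] = value
    return result


# Illustrative mappings; these keys are not taken from a real Biblicus schema.
first = {"schema_version": 1, "model": {"name": "small", "topics": 8}}
second = {"model": {"topics": 12}}

composed = deep_merge(first, second)  # the later file overrides the earlier one
final = apply_dotted_override(composed, "model.name", "large")
```

Applying the dotted override last mirrors the stated order: files are composed first, then command-line overrides are layered on top of the composed view.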
|
|
72
72
|
|
|
73
|
-
Minimal profiling
|
|
73
|
+
Minimal profiling configuration:
|
|
74
74
|
|
|
75
75
|
```
|
|
76
76
|
schema_version: 1
|
|
77
77
|
```
|
|
78
78
|
|
|
79
|
-
Minimal topic modeling
|
|
79
|
+
Minimal topic modeling configuration:
|
|
80
80
|
|
|
81
81
|
```
|
|
82
82
|
schema_version: 1
|
|
@@ -87,7 +87,7 @@ bertopic_analysis:
|
|
|
87
87
|
nr_topics: 8
|
|
88
88
|
```
|
|
89
89
|
|
|
90
|
-
Minimal Markov analysis
|
|
90
|
+
Minimal Markov analysis configuration:
|
|
91
91
|
|
|
92
92
|
```
|
|
93
93
|
schema_version: 1
|
|
@@ -111,7 +111,7 @@ The integration demo script is a working reference you can use as a starting poi
|
|
|
111
111
|
python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
|
|
112
112
|
```
|
|
113
113
|
|
|
114
|
-
The command prints the analysis
|
|
114
|
+
The command prints the analysis snapshot identifier and the output path. Open the resulting `output.json` to inspect per-topic
|
|
115
115
|
labels, keywords, and document examples.
|
|
116
116
|
|
|
117
117
|
## Markov analysis
|
|
@@ -134,7 +134,7 @@ deterministic counts and distribution metrics. See `docs/PROFILING.md` for the f
|
|
|
134
134
|
python -m biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
|
|
135
135
|
```
|
|
136
136
|
|
|
137
|
-
The command writes an analysis
|
|
137
|
+
The command writes an analysis snapshot directory and prints the snapshot identifier.
|
|
138
138
|
|
|
139
139
|
Run profiling from the CLI:
|
|
140
140
|
|
|
@@ -15,7 +15,7 @@ Design starts from strict behavior-driven development:
|
|
|
15
15
|
- All changes should follow specification-first behavior-driven development: failing scenario,
|
|
16
16
|
implementation, passing scenario, then refactor.
|
|
17
17
|
- Behavior-driven development scenarios are not an afterthought: they are how we keep the domain
|
|
18
|
-
vocabulary consistent and the platform comparable across backends and
|
|
18
|
+
vocabulary consistent and the platform comparable across backends and configurations.
|
|
19
19
|
- **Specification completeness** is mandatory: if behavior exists, it must be specified.
|
|
20
20
|
Ambiguous or untestable behavior should be removed or turned into an explicit error.
|
|
21
21
|
|
|
@@ -42,7 +42,7 @@ core nouns:
|
|
|
42
42
|
- I have a **corpus** at this path or uniform resource identifier.
|
|
43
43
|
- I ingest an **item** with optional **metadata**.
|
|
44
44
|
- I rebuild the derived **index** after edits.
|
|
45
|
-
- I run a **
|
|
45
|
+
- I run a **configuration** against the same corpus.
|
|
46
46
|
- I query and receive **evidence**.
|
|
47
47
|
|
|
48
48
|
Anything that does not map cleanly to these nouns is either a derived helper or a backend-specific
|
|
@@ -72,13 +72,13 @@ requirements.
|
|
|
72
72
|
- **Knowledge base backend**: an implementation that can ingest and retrieve from a corpus, such
|
|
73
73
|
as scan, full text search, vector retrieval, or hybrid retrieval, exposed to procedures through
|
|
74
74
|
retrieval primitives.
|
|
75
|
-
- **Retrieval
|
|
75
|
+
- **Retrieval configuration**: a named configuration bundle for a backend, such as chunking rules,
|
|
76
76
|
embedding model and version, hybrid weights, reranker choice, and filters. This is what we
|
|
77
77
|
benchmark and compare.
|
|
78
|
-
- **
|
|
79
|
-
plus any referenced
|
|
80
|
-
- **
|
|
81
|
-
|
|
78
|
+
- **Configuration manifest**: a reproducibility record describing the backend and configuration parameters,
|
|
79
|
+
plus any referenced snapshot artifacts and build snapshots.
|
|
80
|
+
- **Snapshot artifacts**: optional, persisted representations derived from raw content for a given
|
|
81
|
+
configuration and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
|
|
82
82
|
none and operate on demand.
|
|
83
83
|
- **Evidence**: structured retrieval output from backend queries. Evidence includes spans, scores,
|
|
84
84
|
and provenance used by downstream retrieval augmented generation procedures.
|
|
@@ -95,7 +95,7 @@ requirements.
|
|
|
95
95
|
- **Minimal opinion raw store**: raw ingestion should work for a folder of files with optional
|
|
96
96
|
lightweight tagging.
|
|
97
97
|
- **Reproducibility by default**: comparisons require manifests (even when there are no persisted
|
|
98
|
-
|
|
98
|
+
snapshot artifacts).
|
|
99
99
|
- **Mutability is real**: corpora are edited, pruned, and reorganized; re-indexing must be a core
|
|
100
100
|
workflow.
|
|
101
101
|
- **Separation of concerns**: retrieval returns evidence; retrieval-augmented generation patterns
|
|
@@ -110,7 +110,7 @@ requirements.
|
|
|
110
110
|
These are explicit, opinionated policies encoded into the project:
|
|
111
111
|
|
|
112
112
|
- **Evidence schema strictness**: moderate-to-strong schema. Evidence must include stable
|
|
113
|
-
identifiers, provenance, and retrieval scores; richer fields (spans, stage,
|
|
113
|
+
identifiers, provenance, and retrieval scores; richer fields (spans, stage, configuration and run
|
|
114
114
|
identifiers) are expected.
|
|
115
115
|
- **Retrieval stages**: multi-stage is explicit (retrieve, rerank, then filter). Pipelines are
|
|
116
116
|
expressed through evidence metadata rather than hard-coded backends.
|
|
@@ -131,7 +131,7 @@ Evidence is the canonical output of retrieval. Required fields:
|
|
|
131
131
|
- `score` and `rank`
|
|
132
132
|
- `text` (or `content_ref` when non-text)
|
|
133
133
|
- `stage` (for example, `scan`, `full-text-search`, `rerank`)
|
|
134
|
-
- `
|
|
134
|
+
- `configuration_id` / `snapshot_id` (for reproducibility)
|
|
135
135
|
- Optional: `span_start`, `span_end`, `hash`
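
To make the field list concrete, here is a minimal Python sketch of an evidence record. It is not the Biblicus model: the class name `EvidenceSketch` is hypothetical, the field types are assumptions, and the identifier fields (`item_id`, `source_uri`, `media_type`) are taken from the backend checklist rather than from this list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceSketch:
    """Illustrative evidence record; field types are assumptions."""

    item_id: str
    source_uri: str
    media_type: str
    score: float
    rank: int
    stage: str
    configuration_id: str
    snapshot_id: str
    text: Optional[str] = None
    content_ref: Optional[str] = None
    span_start: Optional[int] = None
    span_end: Optional[int] = None
    hash: Optional[str] = None

    def __post_init__(self):
        # The list above requires text, or content_ref for non-text content.
        if self.text is None and self.content_ref is None:
            raise ValueError("evidence needs either text or content_ref")
```

Rejecting a record that has neither `text` nor `content_ref` at construction time is one way to keep the "text or content reference" rule from being silently skipped.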
|
|
136
136
|
|
|
137
137
|
## Evidence lifecycle
|
|
@@ -220,12 +220,12 @@ The interface stays the same; topology is configuration.
|
|
|
220
220
|
|
|
221
221
|
### Reproducibility
|
|
222
222
|
|
|
223
|
-
- Biblicus always records a **
|
|
224
|
-
- When a backend produces persisted
|
|
225
|
-
|
|
226
|
-
- Manifests exist even for just-in-time backends (
|
|
223
|
+
- Biblicus always records a **configuration manifest** for reproducibility.
|
|
224
|
+
- When a backend produces persisted snapshot artifacts, Biblicus treats them as **versioned build
|
|
225
|
+
snapshots** identified by `snapshot_id` (rather than overwriting in place by default).
|
|
226
|
+
- Manifests exist even for just-in-time backends (snapshot artifacts may be empty).
|
|
227
227
|
- Full directed acyclic graph lineage is not included in version zero; revisit only if needed.
|
|
228
|
-
- Optional: define **shared
|
|
228
|
+
- Optional: define **shared snapshot artifact formats** (canonical chunk and embedding stores) so
|
|
229
229
|
multiple backends can reuse intermediates when it makes sense; keep it opt-in.
|
|
230
230
|
|
|
231
231
|
### Evaluation
|
|
@@ -243,8 +243,8 @@ The interface stays the same; topology is configuration.
|
|
|
243
243
|
backend/tool can consume it without requiring a database engine.
|
|
244
244
|
- Canonical version zero format is a single JavaScript Object Notation file at
|
|
245
245
|
`.biblicus/catalog.json`, written atomically (temporary file and rename) on updates.
|
|
246
|
-
- The catalog includes `
|
|
247
|
-
`.biblicus/
|
|
246
|
+
- The catalog includes `latest_snapshot_id` and snapshot manifests are stored at
|
|
247
|
+
`.biblicus/snapshots/<snapshot_id>.json`.
|
|
248
248
|
- If this becomes a bottleneck at very large scales, we **change the specification** (bump
|
|
249
249
|
`schema_version`) rather than introduce multiple “supported” catalog storage modes.
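
The "temporary file and rename" pattern mentioned for the catalog can be sketched with the Python standard library. This is a generic illustration, not the Biblicus code: the function name `write_json_atomically` is hypothetical, and the sketch assumes the temporary file lives in the same directory as the target so the rename stays on one filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path, payload):
    """Write payload as JSON so readers see either the old or the new file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary file next to the target so os.replace is a same-filesystem rename.
    fd, temp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites the destination in a single rename step.
        os.replace(temp_name, path)
    except BaseException:
        # Clean up the partial temporary file, then re-raise the original error.
        if os.path.exists(temp_name):
            os.unlink(temp_name)
        raise
```

A reader that opens the catalog during an update sees the previous complete file until the rename happens, which is the property the catalog design relies on.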
|
|
250
250
|
|
|
@@ -17,7 +17,7 @@ Backends implement two operations:
|
|
|
17
17
|
Backends store artifacts and manifests under:
|
|
18
18
|
|
|
19
19
|
```
|
|
20
|
-
.biblicus/runs/retrieval/<backend_id>/<
|
|
20
|
+
.biblicus/runs/retrieval/<backend_id>/<snapshot_id>/
|
|
21
21
|
manifest.json
|
|
22
22
|
<backend artifacts>
|
|
23
23
|
```
|
|
@@ -26,12 +26,12 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
|
|
|
26
26
|
|
|
27
27
|
## Implementation checklist
|
|
28
28
|
|
|
29
|
-
1. **Define a Pydantic configuration model** for your backend
|
|
29
|
+
1. **Define a Pydantic configuration model** for your backend configuration.
|
|
30
30
|
2. **Implement `RetrievalBackend`**:
|
|
31
|
-
- `build_run(corpus,
|
|
31
|
+
- `build_run(corpus, configuration_name, config)`
|
|
32
32
|
- `query(corpus, run, query_text, budget)`
|
|
33
33
|
3. **Emit `Evidence`** with required fields:
|
|
34
|
-
- `item_id`, `source_uri`, `media_type`, `score`, `rank`, `stage`, `
|
|
34
|
+
- `item_id`, `source_uri`, `media_type`, `score`, `rank`, `stage`, `configuration_id`, `snapshot_id`
|
|
35
35
|
- `text` **or** `content_ref`
|
|
36
36
|
4. **Register the backend** in `biblicus.backends.available_backends`.
|
|
37
37
|
5. **Add behavior-driven development specifications** before implementation and make them pass with 100% coverage.
|
|
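The checklist in this hunk can be made concrete with a toy backend. The sketch below is an assumption-heavy illustration, not the Biblicus API: standard-library dataclasses stand in for Pydantic models, the corpus is a plain dictionary of `item_id` to text, `budget` is a plain integer rather than a real `QueryBudget`, and `ToyRun` and `ToySubstringBackend` are hypothetical names. Only the method names `build_run` and `query`, their parameter names, and the `Evidence` field names come from the text above.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Evidence:
    # Required fields named in the checklist.
    item_id: str
    source_uri: str
    media_type: str
    score: float
    rank: int
    stage: str
    configuration_id: str
    snapshot_id: str
    # The checklist asks for `text` or `content_ref`; this sketch reads
    # that as "at least one must be present".
    text: Optional[str] = None
    content_ref: Optional[str] = None

    def __post_init__(self) -> None:
        if self.text is None and self.content_ref is None:
            raise ValueError("Evidence needs text or content_ref")


@dataclass(frozen=True)
class ToyRun:
    """Hypothetical stand-in for a run manifest."""
    configuration_id: str
    snapshot_id: str


class ToySubstringBackend:
    """Hypothetical backend that scores items by substring occurrences."""

    def build_run(self, corpus: Dict[str, str], configuration_name: str, config: dict) -> ToyRun:
        # Derive a deterministic identifier from the inputs so that the
        # same corpus and configuration always yield the same snapshot.
        payload = repr((configuration_name, sorted(config.items()), sorted(corpus.items())))
        snapshot_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        return ToyRun(configuration_id=configuration_name, snapshot_id=snapshot_id)

    def query(self, corpus: Dict[str, str], run: ToyRun, query_text: str, budget: int) -> List[Evidence]:
        needle = query_text.lower()
        scored = [(text.lower().count(needle), item_id, text) for item_id, text in corpus.items()]
        # Highest score first; ties are broken by item_id for stable ranks.
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [
            Evidence(
                item_id=item_id,
                source_uri=f"file://{item_id}",  # placeholder URI scheme
                media_type="text/plain",
                score=float(score),
                rank=rank,
                stage="retrieve",  # placeholder stage label
                configuration_id=run.configuration_id,
                snapshot_id=run.snapshot_id,
                text=text,
            )
            for rank, (score, item_id, text) in enumerate(hits[:budget], start=1)
        ]
```

Carrying `configuration_id` and `snapshot_id` on every `Evidence` object is what lets a result be traced back to the run that produced it.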
@@ -41,12 +41,12 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
|
|
|
41
41
|
- Treat **runs** as immutable manifests with reproducible parameters.
|
|
42
42
|
- If your backend needs artifacts, store them under `.biblicus/runs/` and record paths in `artifact_paths`.
|
|
43
43
|
- Keep **text extraction** in explicit pipeline stages, not in backend ingestion.
|
|
44
|
-
See `docs/EXTRACTION.md` for how extraction
|
|
44
|
+
See `docs/EXTRACTION.md` for how extraction snapshots are built and referenced from backend configs.
|
|
45
45
|
|
|
46
46
|
## Reproducibility checklist
|
|
47
47
|
|
|
48
|
-
- Record the extraction
|
|
49
|
-
- Keep the backend
|
|
48
|
+
- Record the extraction snapshot reference used to build the backend.
|
|
49
|
+
- Keep the backend configuration in source control.
|
|
50
50
|
- Reuse the same `QueryBudget` when comparing backends.
|
|
51
51
|
|
|
52
52
|
## Common pitfalls
|
|
@@ -8,7 +8,7 @@ returns evidence with chunk boundaries so you can trace results back to the orig
|
|
|
8
8
|
|
|
9
9
|
## Chunkers are pluggable
|
|
10
10
|
|
|
11
|
-
Chunking is a pluggable interface selected by identifier in a retrieval
|
|
11
|
+
Chunking is a pluggable interface selected by identifier in a retrieval configuration:
|
|
12
12
|
|
|
13
13
|
- `chunker_id`
|
|
14
14
|
- `chunker_config` (Pydantic validated; `extra="forbid"`)
|
|
@@ -1,14 +1,25 @@
|
|
|
1
1
|
# Context Engine
|
|
2
2
|
|
|
3
|
-
The Context Engine is the Biblicus SDK for assembling elastic, budget-aware prompt contexts.
|
|
3
|
+
The Context Engine is the Biblicus SDK for assembling elastic, budget-aware prompt contexts. It lets AI engineers describe *what* should be in an LLM request while Biblicus handles *how* to fit it into a budgeted context window.
|
|
4
|
+
|
|
4
5
|
It turns a high-level plan into:
|
|
5
6
|
|
|
6
7
|
- a system prompt
|
|
7
8
|
- a history message list
|
|
8
9
|
- a user message
|
|
9
10
|
|
|
10
|
-
The Context Engine can **compact** content when it is too large and **expand** retriever packs by
|
|
11
|
-
|
|
11
|
+
The Context Engine can **compact** content when it is too large and **expand** retriever packs by paginating with `offset` and `limit`.
|
|
12
|
+
|
|
13
|
+
> “Context assembly is the most failure-prone part of agent engineering. Engineers need a reliable way to fit knowledge into limited context windows without hand-writing brittle logic.”
|
|
14
|
+
|
|
15
|
+
## Why Context Engine?
|
|
16
|
+
|
|
17
|
+
Context Engine provides a first-class, testable, and reusable context assembly surface. It is designed to be the shared foundation for both Python applications and Tactus procedures.
|
|
18
|
+
|
|
19
|
+
- **Composable Context plans**: Mix system/user messages, nested contexts, and retriever packs.
|
|
20
|
+
- **Budget-aware compaction**: Use pluggable compactor strategies.
|
|
21
|
+
- **Budget-aware expansion**: Use retriever pagination (`offset` + `limit`) to fill remaining budget.
|
|
22
|
+
- **Deterministic assembly**: Produce a predictable message history for model calls.
|
|
12
23
|
|
|
13
24
|
## Core Concepts
|
|
14
25
|
|
|
@@ -118,3 +129,17 @@ policy = {
|
|
|
118
129
|
```
|
|
119
130
|
|
|
120
131
|
Custom compactors can be registered via `compactor_registry`.
|
|
132
|
+
|
|
133
|
+
## FAQ
|
|
134
|
+
|
|
135
|
+
### What does “elastic” mean?
|
|
136
|
+
|
|
137
|
+
Elastic means the Context Engine can **contract** (compact) or **expand** (paginate) retrieval output depending on the current token budget. When a pack is too large it compacts; when it is too small and pagination is available, it can fetch additional pages.
|
|
138
|
+
|
|
139
|
+
### How is pagination used?
|
|
140
|
+
|
|
141
|
+
Retrievers accept `offset` and `limit`. The Context Engine uses those to request additional pages until a target budget is met or no more results are available.
|
|
142
|
+
|
|
143
|
+
### Does this replace Context packs?
|
|
144
|
+
|
|
145
|
+
No. Context packs are still derived from retrieval evidence. The Context Engine composes those packs into model messages and manages how they are sized and placed.
|
|
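The pagination behaviour described in the FAQ can be sketched as a small loop. This is an illustration under stated assumptions rather than the Context Engine's actual code: `expand_pack`, the `retrieve(offset, limit)` callable, and the whitespace-based token counter are all hypothetical. Only the idea of requesting further pages with `offset` and `limit` until the budget is met or results run out comes from the text above.

```python
from typing import Callable, List


def expand_pack(
    retrieve: Callable[[int, int], List[str]],
    token_budget: int,
    page_size: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Fetch pages of snippets until the token budget is filled.

    Hypothetical helper: `retrieve(offset, limit)` returns up to `limit`
    snippets starting at `offset`, and an empty list when exhausted.
    """
    selected: List[str] = []
    used = 0
    offset = 0
    while True:
        page = retrieve(offset, page_size)
        if not page:
            # No more results are available.
            return selected
        for snippet in page:
            cost = count_tokens(snippet)
            if used + cost > token_budget:
                # The next snippet would exceed the budget, so stop here.
                return selected
            selected.append(snippet)
            used += cost
        # Advance past the snippets just consumed and request the next page.
        offset += len(page)
```

Stopping at the first snippet that does not fit keeps the assembled order deterministic, which matches the "deterministic assembly" goal listed earlier. A real implementation would use the model's tokenizer instead of whitespace splitting.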
@@ -127,7 +127,7 @@ biblicus query --corpus corpora/example --query "primary button style preference
|
|
|
127
127
|
|
|
128
128
|
## Common pitfalls
|
|
129
129
|
|
|
130
|
-
- Building context packs from different retrieval
|
|
130
|
+
- Building context packs from different retrieval snapshots while comparing the results.
|
|
131
131
|
- Comparing outputs with different `ordering` or `include_metadata` values.
|
|
132
132
|
- Relying on token counts without recording the tokenizer identifier.
|
|
133
133
|
|