biblicus 1.1.0__tar.gz → 1.1.1__tar.gz
- {biblicus-1.1.0/src/biblicus.egg-info → biblicus-1.1.1}/PKG-INFO +24 -24
- {biblicus-1.1.0 → biblicus-1.1.1}/README.md +23 -23
- biblicus-1.1.0/docs/ANALYSIS.md → biblicus-1.1.1/docs/analysis.md +3 -3
- biblicus-1.1.1/docs/architecture.md +107 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/index.md +5 -5
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/scan.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/sqlite-full-text-search.md +2 -2
- biblicus-1.1.0/docs/BACKENDS.md → biblicus-1.1.1/docs/backends.md +1 -1
- biblicus-1.1.0/docs/DEMOS.md → biblicus-1.1.1/docs/demos.md +29 -116
- biblicus-1.1.0/docs/EMBEDDING_RETRIEVAL.md → biblicus-1.1.1/docs/embedding-retrieval.md +1 -1
- biblicus-1.1.0/docs/EXTRACTION.md → biblicus-1.1.1/docs/extraction.md +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/index.md +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/ocr/paddleocr-vl.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/ocr/rapidocr.md +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/pipeline.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-longest.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-override.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-smart-override.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/select-text.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/speech-to-text/deepgram.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/speech-to-text/openai.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/markitdown.md +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/metadata.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/pass-through.md +2 -2
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/pdf.md +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/unstructured.md +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/vlm-document/docling-granite.md +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/vlm-document/docling-smol.md +1 -1
- biblicus-1.1.0/docs/FEATURE_INDEX.md → biblicus-1.1.1/docs/feature-index.md +23 -23
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/index.rst +35 -36
- biblicus-1.1.0/docs/RETRIEVAL.md → biblicus-1.1.1/docs/retrieval.md +3 -3
- biblicus-1.1.0/docs/ROADMAP.md → biblicus-1.1.1/docs/roadmap.md +2 -2
- biblicus-1.1.0/docs/TEXT_UTILITIES.md → biblicus-1.1.1/docs/text-utilities.md +27 -27
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/sequence_markov.md +1 -1
- biblicus-1.1.0/docs/UTILITIES.md → biblicus-1.1.1/docs/utilities.md +3 -3
- {biblicus-1.1.0 → biblicus-1.1.1}/pyproject.toml +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/__init__.py +1 -1
- {biblicus-1.1.0 → biblicus-1.1.1/src/biblicus.egg-info}/PKG-INFO +24 -24
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/SOURCES.txt +28 -29
- biblicus-1.1.0/docs/ARCHITECTURE.md +0 -46
- biblicus-1.1.0/docs/ARCHITECTURE_DETAIL.md +0 -267
- {biblicus-1.1.0 → biblicus-1.1.1}/LICENSE +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/MANIFEST.in +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/THIRD_PARTY_NOTICES.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/datasets/extraction_lab/labels.json +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/datasets/retrieval_lab/labels.json +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/datasets/wikipedia_mini.json +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/CHUNKING.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/CORPUS.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/PROFILING.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/STT.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/TESTING.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/api.rst +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/embedding-index-file.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/embedding-index-inmemory.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/tf-vector.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/conf.py +0 -0
- biblicus-1.1.0/docs/CONTEXT_ENGINE_DEMO.md → biblicus-1.1.1/docs/context-engine-demo.md +0 -0
- biblicus-1.1.0/docs/CONTEXT_ENGINE.md → biblicus-1.1.1/docs/context-engine.md +0 -0
- biblicus-1.1.0/docs/CONTEXT_PACK.md → biblicus-1.1.1/docs/context-pack.md +0 -0
- biblicus-1.1.0/docs/CORPUS_DESIGN.md → biblicus-1.1.1/docs/corpus-design.md +0 -0
- biblicus-1.1.0/docs/EXTRACTION_EVALUATION.md → biblicus-1.1.1/docs/extraction-evaluation.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/ocr/index.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/pipeline-utilities/index.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/speech-to-text/index.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/text-document/index.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/extractors/vlm-document/index.md +0 -0
- biblicus-1.1.0/docs/KNOWLEDGE_BASE.md → biblicus-1.1.1/docs/knowledge-base.md +0 -0
- biblicus-1.1.0/docs/MARKOV_ANALYSIS.md → biblicus-1.1.1/docs/markov-analysis.md +0 -0
- biblicus-1.1.0/docs/RETRIEVAL_EVALUATION.md → biblicus-1.1.1/docs/retrieval-evaluation.md +0 -0
- biblicus-1.1.0/docs/RETRIEVAL_QUALITY.md → biblicus-1.1.1/docs/retrieval-quality.md +0 -0
- biblicus-1.1.0/docs/TEXT_ANNOTATE.md → biblicus-1.1.1/docs/text-annotate.md +0 -0
- biblicus-1.1.0/docs/TEXT_EXTRACT.md → biblicus-1.1.1/docs/text-extract.md +0 -0
- biblicus-1.1.0/docs/TEXT_LINK.md → biblicus-1.1.1/docs/text-link.md +0 -0
- biblicus-1.1.0/docs/TEXT_REDACT.md → biblicus-1.1.1/docs/text-redact.md +0 -0
- biblicus-1.1.0/docs/TEXT_SLICE.md → biblicus-1.1.1/docs/text-slice.md +0 -0
- biblicus-1.1.0/docs/TOPIC_MODELING.md → biblicus-1.1.1/docs/topic-modeling.md +0 -0
- biblicus-1.1.0/docs/USE_CASES.md → biblicus-1.1.1/docs/use-cases.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/notes_to_context_pack.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/text_folder_search.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/docs/use_cases/text_redact.md +0 -0
- biblicus-1.1.0/docs/USER_CONFIGURATION.md → biblicus-1.1.1/docs/user-configuration.md +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/70_context_retriever.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/71_context_compaction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/72_context_history_compaction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/73_context_nested_compaction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/74_context_regeneration.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/75_context_default_regeneration.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/76_context_pack_budget_weights.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/77_context_default_pack_priority.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/78_context_default_pack_weights.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/79_context_nested_context_packs.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/80_context_nested_pack_budget_cap.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/81_context_nested_regeneration.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/82_context_explicit_regeneration.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/83_context_explicit_pack_priority.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/84_context_explicit_pack_weights.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/85_context_expansion.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/86_context_engine_errors.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/87_context_compactor_strategies.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/88_context_engine_model_validation.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/89_context_engine_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/90_embedding_index_evidence_fallback.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/91_tf_vector_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/93_context_engine_full_paths.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/ai_llm.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/ai_models.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/analysis_schema.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/backend_validation.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/biblicus_corpus.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/cli_entrypoint.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/cli_parsing.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/cli_step_spec_parsing.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/content_sniffing.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/context_engine_retrieval_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/context_engine_retrieve_context_pack.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/context_pack.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/context_pack_cli.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/context_pack_policies.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_edge_cases.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_identity.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/corpus_purge.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/crawl.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/docling_granite_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/docling_smol_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/embedding_index_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/embedding_retrieval.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/embeddings.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/environment.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/error_cases.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/evaluation.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/evidence_processing.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_error_handling.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_evaluation.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_evaluation_lab.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_run_lifecycle.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_selection.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extraction_selection_longest.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extractor_pipeline.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/extractor_validation.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/frontmatter.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/hook_config_validation.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/hook_error_handling.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/hook_logging_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/import_tree.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/inference_backend.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/ingest_namespacing.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/ingest_sources.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_audio_samples.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_image_samples.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_mixed_corpus.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_mixed_extraction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_ocr_image_extraction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_pdf_retrieval.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_pdf_samples.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_annotate.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_extract.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_link.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_redact.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_text_slice.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_unstructured_extraction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_use_cases.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_use_cases_sequence_markov.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/integration_wikipedia.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/knowledge_base.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/lifecycle_hooks.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markitdown_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_categorical.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_llm.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_topic_modeling.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_analysis_variants.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_embeddings_errors.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_schema.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/markov_start_end_labels.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/model_validation.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/ocr_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/paddleocr_vl_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/paddleocr_vl_parse_api_response.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/pdf_text_extraction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/profiling.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/profiling_config_overrides.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/python_api.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/python_hook_logging.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/query_processing.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/recipe_cascading.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/recipe_file_extraction.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/recipe_utilities.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_budget.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_build_recipes.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_evaluation_lab.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_quality.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_scan.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_sqlite_full_text_search.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_uses_extraction_run.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/retrieval_utilities.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/select_override.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/select_override_defaults.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/smart_override_selection.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/source_helper_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/source_loading.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/ai_llm_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/ai_models_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/analysis_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/backend_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/cli_parsing_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/cli_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_compaction_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_compactor_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_default_pack_priority_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_default_pack_weights_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_default_regeneration_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_error_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_full_paths_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_model_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_registry.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_retrieval_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_retrieve_context_pack_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_engine_retriever.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_expansion_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_explicit_pack_priority_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_explicit_pack_weights_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_explicit_regeneration_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_history_compaction_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_compaction_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_context_packs_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_pack_budget_cap_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_nested_regeneration_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_pack_budget_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_pack_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_regeneration_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/context_retriever_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/corpus_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/crawl_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/deepgram_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/docling_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embedding_index_evidence_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embedding_index_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embedding_retrieval_coverage_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/embeddings_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/evidence_processing_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_evaluation_lab_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_evaluation_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_run_lifecycle_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extraction_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/extractor_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/frontmatter_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/hook_logging_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/inference_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/knowledge_base_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markitdown_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_embeddings_error_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_schema_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_start_end_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/markov_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/model_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/openai_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/paddleocr_mock_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/paddleocr_vl_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/paddleocr_vl_unit_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/pdf_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/profiling_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/python_api_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/rapidocr_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/recipe_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/requests_mock_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_build_recipe_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_evaluation_lab_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_quality_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/retrieval_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/select_override_defaults_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/source_helper_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/stt_deepgram_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/stt_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_annotate_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_extract_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_link_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_link_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_mock_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_redact_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_slice_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/text_tool_loop_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/tf_vector_internal_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/topic_modeling_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/unstructured_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/use_cases_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/user_config_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/steps/wikitext_steps.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/streaming_ingest.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/stt_deepgram_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/stt_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_annotate.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_extract.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_extraction_runs.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_link.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_link_internal_branches.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_mock.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_redact.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_slice.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/text_utilities.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/token_budget.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/topic_modeling.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/unstructured_extractor.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/use_cases.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/features/user_config.feature +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/demo_context_engine.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_ag_news.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_audio_samples.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_image_samples.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_mixed_samples.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_pdf_samples.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/download_wikipedia.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/extraction_evaluation_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/extraction_evaluation_lab.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/markov_analysis_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/markov_cached_segments_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/markov_run_report.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/profiling_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/readme_end_to_end_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/retrieval_evaluation_lab.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/test.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/topic_modeling_integration.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/notes_to_context_pack_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/sequence_markov_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/text_folder_search_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/use_cases/text_redact_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/scripts/wikipedia_rag_demo.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/setup.cfg +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/__main__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/__init__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/interpolation.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/loader.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/_vendor/dotyaml/transformer.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/__init__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/embeddings.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/llm.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ai/models.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/__init__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/base.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/markov.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/models.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/profiling.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/schema.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/analysis/topic_modeling.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/chunking.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/cli.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/configuration.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/constants.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/__init__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/assembler.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/compaction.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/models.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/context_engine/retrieval.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/corpus.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/crawl.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/embedding_providers.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/errors.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/evaluation.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/evidence_processing.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extraction.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extraction_evaluation.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/__init__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/base.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/deepgram_stt.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/docling_granite_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/docling_smol_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/markitdown_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/metadata_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/openai_stt.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/paddleocr_vl_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/pass_through_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/pdf_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/pipeline.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/rapidocr_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_longest_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_override.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_smart_override.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/select_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/extractors/unstructured_text.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/frontmatter.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/hook_logging.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/hook_manager.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/hooks.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/ignore.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/inference.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/knowledge_base.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/models.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrieval.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/__init__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/base.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/embedding_index_common.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/embedding_index_file.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/embedding_index_inmemory.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/hybrid.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/scan.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/sqlite_full_text_search.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/retrievers/tf_vector.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/sources.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/__init__.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/annotate.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/extract.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/link.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/markup.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/models.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/prompts.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/redact.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/slice.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/text/tool_loop.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/time.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/uris.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus/user_config.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/dependency_links.txt +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/entry_points.txt +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/requires.txt +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/src/biblicus.egg-info/top_level.txt +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/tests/test_text_extract_tool_calls.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/tests/test_text_utility_tool_calls.py +0 -0
- {biblicus-1.1.0 → biblicus-1.1.1}/tests/test_tool_loop_safeguards.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: biblicus
|
|
3
|
-
Version: 1.1.
|
|
3
|
+
Version: 1.1.1
|
|
4
4
|
Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
|
|
5
5
|
License: MIT
|
|
6
6
|
Requires-Python: >=3.9
|
|
@@ -82,8 +82,8 @@ See [retrieval augmented generation overview] for a short introduction to the id
|
|
|
82
82
|
- `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
|
|
83
83
|
- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
|
|
84
84
|
- Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
|
|
85
|
-
- See `docs/
|
|
86
|
-
- See `docs/
|
|
85
|
+
- See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
|
|
86
|
+
- See `docs/text-extract.md` for the text extract utility and examples.
|
|
87
87
|
|
|
88
88
|
## Start with a knowledge base
|
|
89
89
|
|
|
@@ -552,9 +552,9 @@ For detailed documentation including configuration options, performance characte
|
|
|
552
552
|
|
|
553
553
|
## Retrieval documentation
|
|
554
554
|
|
|
555
|
-
For the retrieval pipeline overview and snapshot artifacts, see `docs/
|
|
556
|
-
(tuned lexical baseline, reranking, hybrid retrieval), see `docs/
|
|
557
|
-
and dataset formats, see `docs/
|
|
555
|
+
For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
|
|
556
|
+
(tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
|
|
557
|
+
and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
|
|
558
558
|
script (`scripts/retrieval_evaluation_lab.py`).
|
|
559
559
|
|
|
560
560
|
## Extraction backends
|
|
@@ -594,7 +594,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
|
|
|
594
594
|
For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
|
|
595
595
|
|
|
596
596
|
For extraction evaluation workflows, dataset formats, and report interpretation, see
|
|
597
|
-
`docs/
|
|
597
|
+
`docs/extraction-evaluation.md`.
|
|
598
598
|
|
|
599
599
|
## Text extract utility
|
|
600
600
|
|
|
@@ -602,14 +602,14 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
|
|
|
602
602
|
entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
|
|
603
603
|
analysis.
|
|
604
604
|
|
|
605
|
-
See `docs/
|
|
605
|
+
See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
|
|
606
606
|
|
|
607
607
|
## Text slice utility
|
|
608
608
|
|
|
609
609
|
Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
|
|
610
610
|
re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
|
|
611
611
|
|
|
612
|
-
See `docs/
|
|
612
|
+
See `docs/text-slice.md` for the utility API and examples.
|
|
613
613
|
|
|
614
614
|
## Topic modeling analysis
|
|
615
615
|
|
|
@@ -618,8 +618,8 @@ are the first analysis backends. Profiling summarizes corpus composition and ext
|
|
|
618
618
|
an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
|
|
619
619
|
optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
|
|
620
620
|
|
|
621
|
-
See `docs/
|
|
622
|
-
`docs/
|
|
621
|
+
See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
|
|
622
|
+
`docs/topic-modeling.md` for topic modeling details.
|
|
623
623
|
|
|
624
624
|
Run a topic analysis using a configuration file:
|
|
625
625
|
|
|
@@ -668,7 +668,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
|
|
|
668
668
|
python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
|
|
669
669
|
```
|
|
670
670
|
|
|
671
|
-
See `docs/
|
|
671
|
+
See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
|
|
672
672
|
|
|
673
673
|
## Integration corpus and evaluation dataset
|
|
674
674
|
|
|
@@ -726,20 +726,20 @@ Open `http://localhost:8000` in your browser.
|
|
|
726
726
|
License terms are in `LICENSE`.
|
|
727
727
|
|
|
728
728
|
[retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
|
|
729
|
-
[architecture]: docs/
|
|
730
|
-
[roadmap]: docs/
|
|
731
|
-
[feature-index]: docs/
|
|
732
|
-
[corpus]: docs/
|
|
733
|
-
[knowledge-base]: docs/
|
|
734
|
-
[text-extraction]: docs/
|
|
729
|
+
[architecture]: docs/architecture.md
|
|
730
|
+
[roadmap]: docs/roadmap.md
|
|
731
|
+
[feature-index]: docs/feature-index.md
|
|
732
|
+
[corpus]: docs/corpus.md
|
|
733
|
+
[knowledge-base]: docs/knowledge-base.md
|
|
734
|
+
[text-extraction]: docs/extraction.md
|
|
735
735
|
[extractor-reference]: docs/extractors/index.md
|
|
736
736
|
[backend-reference]: docs/backends/index.md
|
|
737
|
-
[speech-to-text]: docs/
|
|
738
|
-
[user-configuration]: docs/
|
|
739
|
-
[backends]: docs/
|
|
740
|
-
[context-packs]: docs/
|
|
741
|
-
[demos]: docs/
|
|
742
|
-
[testing]: docs/
|
|
737
|
+
[speech-to-text]: docs/stt.md
|
|
738
|
+
[user-configuration]: docs/user-configuration.md
|
|
739
|
+
[backends]: docs/backends.md
|
|
740
|
+
[context-packs]: docs/context-pack.md
|
|
741
|
+
[demos]: docs/demos.md
|
|
742
|
+
[testing]: docs/testing.md
|
|
743
743
|
|
|
744
744
|
[continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
|
|
745
745
|
[coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
|
|
@@ -28,8 +28,8 @@ See [retrieval augmented generation overview] for a short introduction to the id
|
|
|
28
28
|
- `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
|
|
29
29
|
- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
|
|
30
30
|
- Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
|
|
31
|
-
- See `docs/
|
|
32
|
-
- See `docs/
|
|
31
|
+
- See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
|
|
32
|
+
- See `docs/text-extract.md` for the text extract utility and examples.
|
|
33
33
|
|
|
34
34
|
## Start with a knowledge base
|
|
35
35
|
|
|
@@ -498,9 +498,9 @@ For detailed documentation including configuration options, performance characte
|
|
|
498
498
|
|
|
499
499
|
## Retrieval documentation
|
|
500
500
|
|
|
501
|
-
For the retrieval pipeline overview and snapshot artifacts, see `docs/
|
|
502
|
-
(tuned lexical baseline, reranking, hybrid retrieval), see `docs/
|
|
503
|
-
and dataset formats, see `docs/
|
|
501
|
+
For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
|
|
502
|
+
(tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
|
|
503
|
+
and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
|
|
504
504
|
script (`scripts/retrieval_evaluation_lab.py`).
|
|
505
505
|
|
|
506
506
|
## Extraction backends
|
|
@@ -540,7 +540,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
|
|
|
540
540
|
For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
|
|
541
541
|
|
|
542
542
|
For extraction evaluation workflows, dataset formats, and report interpretation, see
|
|
543
|
-
`docs/
|
|
543
|
+
`docs/extraction-evaluation.md`.
|
|
544
544
|
|
|
545
545
|
## Text extract utility
|
|
546
546
|
|
|
@@ -548,14 +548,14 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
|
|
|
548
548
|
entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
|
|
549
549
|
analysis.
|
|
550
550
|
|
|
551
|
-
See `docs/
|
|
551
|
+
See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
|
|
552
552
|
|
|
553
553
|
## Text slice utility
|
|
554
554
|
|
|
555
555
|
Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
|
|
556
556
|
re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
|
|
557
557
|
|
|
558
|
-
See `docs/
|
|
558
|
+
See `docs/text-slice.md` for the utility API and examples.
|
|
559
559
|
|
|
560
560
|
## Topic modeling analysis
|
|
561
561
|
|
|
@@ -564,8 +564,8 @@ are the first analysis backends. Profiling summarizes corpus composition and ext
|
|
|
564
564
|
an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
|
|
565
565
|
optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
|
|
566
566
|
|
|
567
|
-
See `docs/
|
|
568
|
-
`docs/
|
|
567
|
+
See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
|
|
568
|
+
`docs/topic-modeling.md` for topic modeling details.
|
|
569
569
|
|
|
570
570
|
Run a topic analysis using a configuration file:
|
|
571
571
|
|
|
@@ -614,7 +614,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
|
|
|
614
614
|
python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
|
|
615
615
|
```
|
|
616
616
|
|
|
617
|
-
See `docs/
|
|
617
|
+
See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
|
|
618
618
|
|
|
619
619
|
## Integration corpus and evaluation dataset
|
|
620
620
|
|
|
@@ -672,20 +672,20 @@ Open `http://localhost:8000` in your browser.
|
|
|
672
672
|
License terms are in `LICENSE`.
|
|
673
673
|
|
|
674
674
|
[retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
|
|
675
|
-
[architecture]: docs/
|
|
676
|
-
[roadmap]: docs/
|
|
677
|
-
[feature-index]: docs/
|
|
678
|
-
[corpus]: docs/
|
|
679
|
-
[knowledge-base]: docs/
|
|
680
|
-
[text-extraction]: docs/
|
|
675
|
+
[architecture]: docs/architecture.md
|
|
676
|
+
[roadmap]: docs/roadmap.md
|
|
677
|
+
[feature-index]: docs/feature-index.md
|
|
678
|
+
[corpus]: docs/corpus.md
|
|
679
|
+
[knowledge-base]: docs/knowledge-base.md
|
|
680
|
+
[text-extraction]: docs/extraction.md
|
|
681
681
|
[extractor-reference]: docs/extractors/index.md
|
|
682
682
|
[backend-reference]: docs/backends/index.md
|
|
683
|
-
[speech-to-text]: docs/
|
|
684
|
-
[user-configuration]: docs/
|
|
685
|
-
[backends]: docs/
|
|
686
|
-
[context-packs]: docs/
|
|
687
|
-
[demos]: docs/
|
|
688
|
-
[testing]: docs/
|
|
683
|
+
[speech-to-text]: docs/stt.md
|
|
684
|
+
[user-configuration]: docs/user-configuration.md
|
|
685
|
+
[backends]: docs/backends.md
|
|
686
|
+
[context-packs]: docs/context-pack.md
|
|
687
|
+
[demos]: docs/demos.md
|
|
688
|
+
[testing]: docs/testing.md
|
|
689
689
|
|
|
690
690
|
[continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
|
|
691
691
|
[coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
|
|
@@ -103,7 +103,7 @@ observations:
|
|
|
103
103
|
## Topic modeling
|
|
104
104
|
|
|
105
105
|
Topic modeling is the first analysis backend. It uses BERTopic to cluster extracted text, produces per-topic evidence,
|
|
106
|
-
and optionally labels topics using an LLM. See `docs/
|
|
106
|
+
and optionally labels topics using an LLM. See `docs/topic-modeling.md` for detailed configuration and examples.
|
|
107
107
|
|
|
108
108
|
The integration demo script is a working reference you can use as a starting point:
|
|
109
109
|
|
|
@@ -117,7 +117,7 @@ labels, keywords, and document examples.
|
|
|
117
117
|
## Markov analysis
|
|
118
118
|
|
|
119
119
|
Markov analysis learns a directed, weighted state transition graph over sequences of text segments. The output includes
|
|
120
|
-
per-state exemplars, per-item decoded paths, and optional GraphViz exports. See `docs/
|
|
120
|
+
per-state exemplars, per-item decoded paths, and optional GraphViz exports. See `docs/markov-analysis.md` for detailed
|
|
121
121
|
configuration and examples.
|
|
122
122
|
|
|
123
123
|
Text extract is available as a segmentation strategy for long texts. It inserts XML tags in-place using a virtual file
|
|
@@ -126,7 +126,7 @@ editing loop, then extracts spans without requiring the model to re-emit the ful
|
|
|
126
126
|
## Profiling analysis
|
|
127
127
|
|
|
128
128
|
Profiling is the baseline analysis backend. It summarizes corpus composition and extraction coverage using
|
|
129
|
-
deterministic counts and distribution metrics. See `docs/
|
|
129
|
+
deterministic counts and distribution metrics. See `docs/profiling.md` for the full reference and working demo.
|
|
130
130
|
|
|
131
131
|
### Minimal profiling run
|
|
132
132
|
|
|
@@ -0,0 +1,107 @@
|
|
|
1
|
+
# Biblicus Architecture
|
|
2
|
+
|
|
3
|
+
Biblicus sits between raw, unstructured data and the moment you need reliable answers from it.
|
|
4
|
+
It is built for teams who receive large, messy corpora and must extract usable signals without
|
|
5
|
+
losing provenance or reproducibility. Retrieval-augmented generation is one use case, but the
|
|
6
|
+
system is broader than chatbots: it supports any pipeline that needs structured insight from
|
|
7
|
+
unstructured data.
|
|
8
|
+
|
|
9
|
+
At a high level the system does five things:
|
|
10
|
+
|
|
11
|
+
1. **Ingests** raw content into a corpus with minimal friction.
|
|
12
|
+
2. **Extracts** text from diverse media (documents, images, audio).
|
|
13
|
+
3. **Transforms** and annotates text with reusable LLM utilities.
|
|
14
|
+
4. **Retrieves** evidence through explicit, reproducible stages.
|
|
15
|
+
5. **Evaluates** results so improvements are measurable, not anecdotal.
|
|
16
|
+
|
|
17
|
+
The guiding idea is that every retrieval produces **evidence**: structured outputs with scores
|
|
18
|
+
and provenance that can be inspected, audited, and reused. Context packs, summaries, and downstream
|
|
19
|
+
generation are all derived from that evidence.
|
|
20
|
+
|
|
21
|
+
## Core Concepts
|
|
22
|
+
|
|
23
|
+
- **Corpus**: a named, mutable collection rooted at a path or uniform resource identifier. In
|
|
24
|
+
version zero it is typically a local folder containing raw files plus a `.biblicus/` directory
|
|
25
|
+
for minimal metadata.
|
|
26
|
+
- **Item**: the unit of ingestion in a corpus: raw bytes of any modality, including text, images,
|
|
27
|
+
Portable Document Format documents, audio, and video, plus optional metadata and provenance.
|
|
28
|
+
- **Knowledge base backend**: an implementation that can ingest and retrieve from a corpus, such
|
|
29
|
+
as scan, full text search, vector retrieval, or hybrid retrieval, exposed to procedures through
|
|
30
|
+
retrieval primitives.
|
|
31
|
+
- **Retrieval configuration**: a named configuration bundle for a backend, such as chunking rules,
|
|
32
|
+
embedding model and version, hybrid weights, reranker choice, and filters. This is what we
|
|
33
|
+
benchmark and compare.
|
|
34
|
+
- **Configuration manifest**: a reproducibility record describing the backend and configuration parameters,
|
|
35
|
+
plus any referenced snapshot artifacts and build snapshots.
|
|
36
|
+
- **Snapshot artifacts**: optional, persisted representations derived from raw content for a given
|
|
37
|
+
configuration and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
|
|
38
|
+
none and operate on demand.
|
|
39
|
+
- **Evidence**: structured retrieval output from backend queries. Evidence includes spans, scores,
|
|
40
|
+
and provenance used by downstream retrieval augmented generation procedures.
|
|
41
|
+
- **Pipeline stage / editorial layer**: a structured step that transforms, filters, extracts, or
|
|
42
|
+
curates content, such as raw, curated, and published, or extract text from Portable Document
|
|
43
|
+
Format documents.
|
|
44
|
+
|
|
45
|
+
## Design Principles
|
|
46
|
+
|
|
47
|
+
- **Primitives + derived constructs**: keep the protocol surface small and composable; ship
|
|
48
|
+
higher-level helpers and example procedures on top.
|
|
49
|
+
- **Composability definition**: composable means each stage has a small input and output contract,
|
|
50
|
+
so you can connect stages in different orders without rewriting them.
|
|
51
|
+
- **Minimal opinion raw store**: raw ingestion should work for a folder of files with optional
|
|
52
|
+
lightweight tagging.
|
|
53
|
+
- **Reproducibility by default**: comparisons require manifests (even when there are no persisted
|
|
54
|
+
snapshot artifacts).
|
|
55
|
+
- **Mutability is real**: corpora are edited, pruned, and reorganized; re-indexing must be a core
|
|
56
|
+
workflow.
|
|
57
|
+
- **Separation of concerns**: retrieval returns evidence; retrieval-augmented generation patterns
|
|
58
|
+
live in Tactus procedures (not inside the knowledge base backend).
|
|
59
|
+
- **Deployment flexibility**: same interface across local/offline, brokered external services, and
|
|
60
|
+
hybrid environments.
|
|
61
|
+
- **Evidence is the primary output**: every retrieval returns structured evidence; everything else
|
|
62
|
+
is a derived helper.
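Editorial sketch (not part of the released file): the composability principle above, where each stage has a small input and output contract, can be illustrated roughly as follows. The `Evidence` record, its field names, and the `keep_above`, `top_k`, and `compose` helpers are hypothetical names chosen for illustration; they are not taken from the Biblicus API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record; field names are illustrative only."""
    item_id: str
    text: str
    score: float
    stage: str


# The whole contract of a stage: a list of evidence in, a list of evidence out.
Stage = Callable[[List[Evidence]], List[Evidence]]


def keep_above(threshold: float) -> Stage:
    """Build a stage that drops evidence scoring below the threshold."""
    def stage(evidence: List[Evidence]) -> List[Evidence]:
        return [e for e in evidence if e.score >= threshold]
    return stage


def top_k(k: int) -> Stage:
    """Build a stage that keeps the k highest-scoring evidence records."""
    def stage(evidence: List[Evidence]) -> List[Evidence]:
        return sorted(evidence, key=lambda e: e.score, reverse=True)[:k]
    return stage


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right; the result is itself a stage."""
    def pipeline(evidence: List[Evidence]) -> List[Evidence]:
        for stage in stages:
            evidence = stage(evidence)
        return evidence
    return pipeline


records = [
    Evidence("a", "alpha", 0.9, "scan"),
    Evidence("b", "beta", 0.2, "scan"),
    Evidence("c", "gamma", 0.5, "scan"),
]
filter_then_cut = compose(keep_above(0.4), top_k(1))
cut_then_filter = compose(top_k(2), keep_above(0.6))
```

Because every stage shares the same signature, the two pipelines reuse the same building blocks in a different order without rewriting either stage.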
|
|
63
|
+
|
|
64
|
+
## The Python Developer Mental Model
|
|
65
|
+
|
|
66
|
+
If this system is pleasant to use, a Python developer should be able to describe intent with the
|
|
67
|
+
core nouns:
|
|
68
|
+
|
|
69
|
+
- I have a **corpus** at this path or uniform resource identifier.
|
|
70
|
+
- I ingest an **item** with optional **metadata**.
|
|
71
|
+
- I rebuild the derived **index** after edits.
|
|
72
|
+
- I run a **configuration** against the same corpus.
|
|
73
|
+
- I query and receive **evidence**.
|
|
74
|
+
|
|
75
|
+
Anything that does not map cleanly to these nouns is either a derived helper or a backend-specific
|
|
76
|
+
implementation detail that should not leak.
|
|
77
|
+
|
|
78
|
+
## Evidence Lifecycle
|
|
79
|
+
|
|
80
|
+
Evidence flows through explicit stages and remains inspectable at every step:
|
|
81
|
+
|
|
82
|
+
1. **Retrieval**: backends return evidence with `stage` labels and scores.
|
|
83
|
+
2. **Processing**: optional reranking or filtering updates scores while preserving provenance.
|
|
84
|
+
3. **Context shaping**: context packs select and format evidence into model-ready text.
|
|
85
|
+
4. **Evaluation**: evaluation datasets compare evidence rankings to expectations.
|
|
86
|
+
|
|
87
|
+
At each stage, the output remains a structured object, so you can inspect, store, and compare
|
|
88
|
+
runs without re-running the entire pipeline.
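Editorial sketch (not part of the released file): the four lifecycle stages above can be pictured with a toy, in-memory pipeline. Everything here is an assumption made for illustration: the `Evidence` fields, the term-count scoring, the length-normalising reranker, and the reciprocal-rank metric are stand-ins and do not describe how Biblicus backends, context packs, or evaluations are actually implemented.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record: provenance, text, score, and stage label."""
    item_id: str   # provenance: which corpus item produced this evidence
    text: str
    score: float
    stage: str


def retrieve(query: str, corpus: Dict[str, str]) -> List[Evidence]:
    """Stage 1: toy lexical retrieval that counts query-term occurrences."""
    terms = query.lower().split()
    results = []
    for item_id, text in corpus.items():
        words = text.lower().split()
        score = float(sum(words.count(term) for term in terms))
        if score > 0:
            results.append(Evidence(item_id, text, score, "retrieval"))
    return sorted(results, key=lambda e: e.score, reverse=True)


def rerank(evidence: List[Evidence]) -> List[Evidence]:
    """Stage 2: rescore by term density; item_id and text are left untouched."""
    rescored = [
        replace(e, score=e.score / len(e.text.split()), stage="rerank")
        for e in evidence
    ]
    return sorted(rescored, key=lambda e: e.score, reverse=True)


def context_pack(evidence: List[Evidence], max_items: int) -> str:
    """Stage 3: format the leading evidence records into model-ready text."""
    return "\n\n".join(f"[{e.item_id}] {e.text}" for e in evidence[:max_items])


def reciprocal_rank(evidence: List[Evidence], expected_item_id: str) -> float:
    """Stage 4: score a ranking against one expected item (0.0 if absent)."""
    for rank, e in enumerate(evidence, start=1):
        if e.item_id == expected_item_id:
            return 1.0 / rank
    return 0.0


corpus = {
    "a": "cats chase mice and cats sleep all day long",
    "b": "cats purr",
    "c": "dogs bark",
}
retrieved = retrieve("cats", corpus)
reranked = rerank(retrieved)
pack = context_pack(reranked, max_items=1)
```

Each intermediate value (`retrieved`, `reranked`, `pack`) is a plain object that can be stored and compared across runs, which is the property the paragraph above describes.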
|
|
89
|
+
|
|
90
|
+
## Relationship to Agent Frameworks
|
|
91
|
+
|
|
92
|
+
Biblicus integrates with agent frameworks through explicit tool interfaces. It does not hide
|
|
93
|
+
retrieval inside the model. Instead, it provides repeatable pipelines that expose *what* was
|
|
94
|
+
retrieved and *why*, so models can use evidence directly and safely.
|
|
95
|
+
|
|
96
|
+
- **Tools and toolsets**, including the Model Context Protocol, are the primary capability
|
|
97
|
+
boundary.
|
|
98
|
+
- **Sandboxing and brokered or secretless execution** are primary deployment modes.
|
|
99
|
+
- **Durability and evaluations** are central: invariants via specifications, quality via
|
|
100
|
+
evaluations.
|
|
101
|
+
|
|
102
|
+
## Where to go next
|
|
103
|
+
|
|
104
|
+
- Start with **corpus.md** and **extraction.md** to understand how raw content is ingested.
|
|
105
|
+
- Move to **retrieval.md** and **retrieval-evaluation.md** to see how evidence is produced and tested.
|
|
106
|
+
- Explore **topic-modeling.md** and **markov-analysis.md** if you need higher-level analysis tools.
|
|
107
|
+
- See **text-utilities.md** for reusable, AI-assisted text transformations.
|
|
@@ -96,7 +96,7 @@ biblicus build my-corpus --backend sqlite-full-text-search
|
|
|
96
96
|
biblicus query my-corpus --query "search terms"
|
|
97
97
|
```
|
|
98
98
|
|
|
99
|
-
See `docs/
|
|
99
|
+
See `docs/retrieval.md` for a step-by-step retrieval walkthrough.
|
|
100
100
|
|
|
101
101
|
#### Python API
|
|
102
102
|
|
|
@@ -126,7 +126,7 @@ result = backend.query(
|
|
|
126
126
|
)
|
|
127
127
|
```
|
|
128
128
|
|
|
129
|
-
See `docs/
|
|
129
|
+
See `docs/retrieval-evaluation.md` for evaluation workflows and dataset formats.
|
|
130
130
|
|
|
131
131
|
## Choosing a Backend
|
|
132
132
|
|
|
@@ -291,12 +291,12 @@ To implement a custom backend:
|
|
|
291
291
|
3. Register in `biblicus.backends.available_backends`
|
|
292
292
|
4. Add BDD specifications with 100% coverage
|
|
293
293
|
|
|
294
|
-
See [
|
|
294
|
+
See [backends.md](../backends.md) for implementation details.
|
|
295
295
|
|
|
296
296
|
## See Also
|
|
297
297
|
|
|
298
298
|
- [scan backend](scan.md) - Naive full-scan backend
|
|
299
299
|
- [sqlite-full-text-search backend](sqlite-full-text-search.md) - SQLite FTS5 backend
|
|
300
|
-
- [
|
|
301
|
-
- [
|
|
300
|
+
- [backends.md](../backends.md) - Backend implementation guide
|
|
301
|
+
- [extraction.md](../extraction.md) - Text extraction pipeline
|
|
302
302
|
- [Extractor Reference](../extractors/index.md) - Text extraction plugins
|
|
@@ -322,6 +322,6 @@ Query result statistics:
|
|
|
322
322
|
## See Also
|
|
323
323
|
|
|
324
324
|
- [Backends Overview](index.md) - All available backends
|
|
325
|
-
- [
|
|
326
|
-
- [
|
|
325
|
+
- [backends.md](../backends.md) - Backend implementation guide
|
|
326
|
+
- [extraction.md](../extraction.md) - Text extraction pipeline
|
|
327
327
|
- [Extractor Reference](../extractors/index.md) - Text extraction plugins
|
|
@@ -481,7 +481,7 @@ CREATE VIRTUAL TABLE chunks_full_text_search USING fts5(
|
|
|
481
481
|
## See Also
|
|
482
482
|
|
|
483
483
|
- [Backends Overview](index.md) - All available backends
|
|
484
|
-
- [
|
|
485
|
-
- [
|
|
484
|
+
- [backends.md](../backends.md) - Backend implementation guide
|
|
485
|
+
- [extraction.md](../extraction.md) - Text extraction pipeline
|
|
486
486
|
- [Extractor Reference](../extractors/index.md) - Text extraction plugins
|
|
487
487
|
- [SQLite FTS5 Documentation](https://www.sqlite.org/fts5.html) - Official SQLite FTS5 docs
|
|
@@ -41,7 +41,7 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
|
|
|
41
41
|
- Treat **runs** as immutable manifests with reproducible parameters.
|
|
42
42
|
- If your backend needs artifacts, store them under `.biblicus/runs/` and record paths in `artifact_paths`.
|
|
43
43
|
- Keep **text extraction** in explicit pipeline stages, not in backend ingestion.
|
|
44
|
-
See `docs/
|
|
44
|
+
See `docs/extraction.md` for how extraction snapshots are built and referenced from backend configs.
|
|
45
45
|
|
|
46
46
|
## Reproducibility checklist
|
|
47
47
|
|
|
@@ -3,94 +3,7 @@
|
|
|
3
3
|
This document is a set of runnable examples you can use to see the current system working end to end.
|
|
4
4
|
Each section links to a textbook chapter so you can read the concept and then run the code.
|
|
5
5
|
|
|
6
|
-
For the ordered plan of what to build next, see `docs/
|
|
7
|
-
|
|
8
|
-
## Diagram of the current system and the next layers
|
|
9
|
-
|
|
10
|
-
Blue boxes are implemented now. Purple boxes are layers not implemented yet that we can build and compare.
|
|
11
|
-
|
|
12
|
-
```mermaid
|
|
13
|
-
%%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
|
|
14
|
-
flowchart TB
|
|
15
|
-
subgraph Legend[Legend]
|
|
16
|
-
direction LR
|
|
17
|
-
LegendNow[Implemented now]
|
|
18
|
-
LegendPlanned[Planned]
|
|
19
|
-
LegendNow --- LegendPlanned
|
|
20
|
-
end
|
|
21
|
-
|
|
22
|
-
subgraph ExistsNow[Implemented now]
|
|
23
|
-
direction TB
|
|
24
|
-
|
|
25
|
-
Ingest[Ingest] --> RawFiles[Raw item files]
|
|
26
|
-
RawFiles --> CatalogFile[Catalog file]
|
|
27
|
-
CatalogFile --> ExtractionRun[Extraction run]
|
|
28
|
-
ExtractionRun --> ExtractedText[Extracted text artifacts]
|
|
29
|
-
|
|
30
|
-
subgraph PluggableBackend[Pluggable backend]
|
|
31
|
-
direction LR
|
|
32
|
-
|
|
33
|
-
subgraph BackendIngestionIndexing[Ingestion and indexing]
|
|
34
|
-
direction TB
|
|
35
|
-
CatalogFile --> BuildRun[Build run]
|
|
36
|
-
ExtractedText -.-> BuildRun
|
|
37
|
-
BuildRun --> BackendIndex[Backend index]
|
|
38
|
-
BackendIndex --> RunManifest[Run manifest]
|
|
39
|
-
end
|
|
40
|
-
|
|
41
|
-
subgraph BackendRetrievalGeneration[Retrieval and generation]
|
|
42
|
-
direction TB
|
|
43
|
-
RunManifest --> Query[Query]
|
|
44
|
-
Query --> Evidence[Evidence]
|
|
45
|
-
Evidence --> EvaluationMetrics[Evaluation metrics]
|
|
46
|
-
end
|
|
47
|
-
end
|
|
48
|
-
end
|
|
49
|
-
|
|
50
|
-
subgraph PlannedLayers[Planned]
|
|
51
|
-
direction TB
|
|
52
|
-
RerankStage[Rerank<br/>pipeline stage]
|
|
53
|
-
FilterStage[Filter<br/>pipeline stage]
|
|
54
|
-
ToolServer[Tool server<br/>for external backends]
|
|
55
|
-
OpticalCharacterRecognition[Optical character recognition<br/>extraction plugin]
|
|
56
|
-
SpeechToText[Speech to text<br/>extraction plugin]
|
|
57
|
-
end
|
|
58
|
-
|
|
59
|
-
OpticalCharacterRecognition -.-> ExtractionRun
|
|
60
|
-
SpeechToText -.-> ExtractionRun
|
|
61
|
-
RerankStage -.-> Evidence
|
|
62
|
-
FilterStage -.-> Evidence
|
|
63
|
-
ToolServer -.-> PluggableBackend
|
|
64
|
-
|
|
65
|
-
style Legend fill:#ffffff,stroke:#ffffff,color:#111111
|
|
66
|
-
style ExistsNow fill:#ffffff,stroke:#ffffff,color:#111111
|
|
67
|
-
style PlannedLayers fill:#ffffff,stroke:#ffffff,color:#111111
|
|
68
|
-
|
|
69
|
-
style LegendNow fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
70
|
-
style LegendPlanned fill:#f3e5f5,stroke:#8e24aa,color:#111111
|
|
71
|
-
|
|
72
|
-
style Ingest fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
73
|
-
style RawFiles fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
74
|
-
style CatalogFile fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
75
|
-
style ExtractionRun fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
76
|
-
style ExtractedText fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
77
|
-
style BuildRun fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
78
|
-
style BackendIndex fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
79
|
-
style RunManifest fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
80
|
-
style Query fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
81
|
-
style Evidence fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
82
|
-
style EvaluationMetrics fill:#e3f2fd,stroke:#1e88e5,color:#111111
|
|
83
|
-
|
|
84
|
-
style PluggableBackend fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
|
|
85
|
-
style BackendIngestionIndexing fill:#ffffff,stroke:#cfd8dc,color:#111111
|
|
86
|
-
style BackendRetrievalGeneration fill:#ffffff,stroke:#cfd8dc,color:#111111
|
|
87
|
-
|
|
88
|
-
style RerankStage fill:#f3e5f5,stroke:#8e24aa,color:#111111
|
|
89
|
-
style FilterStage fill:#f3e5f5,stroke:#8e24aa,color:#111111
|
|
90
|
-
style ToolServer fill:#f3e5f5,stroke:#8e24aa,color:#111111
|
|
91
|
-
style OpticalCharacterRecognition fill:#f3e5f5,stroke:#8e24aa,color:#111111
|
|
92
|
-
style SpeechToText fill:#f3e5f5,stroke:#8e24aa,color:#111111
|
|
93
|
-
```
|
|
6
|
+
For the ordered plan of what to build next, see `docs/roadmap.md`.
|
|
94
7
|
|
|
95
8
|
## Working examples you can run now
|
|
96
9
|
|
|
@@ -169,10 +82,10 @@ In another terminal:
|
|
|
169
82
|
```
|
|
170
83
|
rm -rf corpora/crawl-demo
|
|
171
84
|
python -m biblicus init corpora/crawl-demo
|
|
172
|
-
python -m biblicus crawl --corpus corpora/crawl-demo
|
|
173
|
-
--root-url http://127.0.0.1:8000/site/index.html
|
|
174
|
-
--allowed-prefix http://127.0.0.1:8000/site/
|
|
175
|
-
--max-items 50
|
|
85
|
+
python -m biblicus crawl --corpus corpora/crawl-demo \
|
|
86
|
+
--root-url http://127.0.0.1:8000/site/index.html \
|
|
87
|
+
--allowed-prefix http://127.0.0.1:8000/site/ \
|
|
88
|
+
--max-items 50 \
|
|
176
89
|
--tag crawled
|
|
177
90
|
python -m biblicus list --corpus corpora/crawl-demo
|
|
178
91
|
```
|
|
@@ -189,7 +102,7 @@ python -m biblicus extract build --corpus corpora/demo --step pass-through-text
|
|
|
189
102
|
|
|
190
103
|
The output includes a `snapshot_id` you can reuse when building a retrieval backend.
|
|
191
104
|
|
|
192
|
-
Text extraction details: `docs/
|
|
105
|
+
Text extraction details: `docs/extraction.md`
|
|
193
106
|
|
|
194
107
|
### Topic modeling integration run
|
|
195
108
|
|
|
@@ -204,7 +117,7 @@ python -m pip install "biblicus[datasets,topic-modeling]"
|
|
|
204
117
|
python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
|
|
205
118
|
```
|
|
206
119
|
|
|
207
|
-
Topic modeling details: `docs/
|
|
120
|
+
Topic modeling details: `docs/topic-modeling.md`
|
|
208
121
|
|
|
209
122
|
### Extraction evaluation demo run
|
|
210
123
|
|
|
@@ -223,7 +136,7 @@ python scripts/extraction_evaluation_demo.py --corpus corpora/ag_news_extraction
 
 The script prints the dataset path, extraction snapshot reference, and evaluation output path so you can inspect the results.
 
-Extraction evaluation details: `docs/
+Extraction evaluation details: `docs/extraction-evaluation.md`
 
 ### Extraction evaluation lab run
 
@@ -235,7 +148,7 @@ python scripts/extraction_evaluation_lab.py --corpus corpora/extraction_eval_lab
 
 The lab writes a generated dataset file and evaluation output path and prints both in the command output.
 
-Extraction evaluation lab details: `docs/
+Extraction evaluation lab details: `docs/extraction-evaluation.md`
 
 ### Retrieval evaluation lab run
 
@@ -248,7 +161,7 @@ python scripts/retrieval_evaluation_lab.py --corpus corpora/retrieval_eval_lab -
 
 The script prints the dataset path, retrieval snapshot identifier, and evaluation output location.
 
-Retrieval evaluation details: `docs/
+Retrieval evaluation details: `docs/retrieval-evaluation.md`
 
 Run with a larger corpus and a higher topic count:
 
@@ -274,27 +187,27 @@ The profiling demo downloads AG News, runs extraction, and produces a profiling
 python scripts/profiling_demo.py --corpus corpora/profiling_demo --force
 ```
 
-Profiling details: `docs/
+Profiling details: `docs/profiling.md`
 
 ### Select extracted text within a pipeline
 
 When you want an explicit choice among multiple extraction outputs, add a selection extractor step at the end of the pipeline.
 
 ```
-python -m biblicus extract build --corpus corpora/demo
---step pass-through-text
---step metadata-text
+python -m biblicus extract build --corpus corpora/demo \
+--step pass-through-text \
+--step metadata-text \
 --step select-text
 ```
 
 Copy the `snapshot_id` from the JavaScript Object Notation output. Use it as `EXTRACTION_SNAPSHOT_ID` in the next command.
 
 ```
-python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search
+python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search \
 --config extraction_snapshot=pipeline:EXTRACTION_SNAPSHOT_ID
 ```
 
-Extraction pipeline details: `docs/
+Extraction pipeline details: `docs/extraction.md`
 
 ### Portable Document Format extraction and retrieval
 
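Note on the hunk above: the "copy the `snapshot_id`" step can be scripted rather than done by hand. The following is a minimal sketch, assuming the `extract build` command prints a single JSON object with a top-level string field named `snapshot_id`; the diff only says the identifier appears in the JSON output, so the exact payload shape is an assumption. Both function names are hypothetical and not part of biblicus.

```python
import json


def read_snapshot_id(cli_output: str) -> str:
    """Pull `snapshot_id` out of a JSON document printed by a build command.

    Assumption: the output is one JSON object with a top-level
    `snapshot_id` string. Adjust the lookup if the real payload nests it.
    """
    payload = json.loads(cli_output)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    snapshot_id = payload.get("snapshot_id")
    if not isinstance(snapshot_id, str) or not snapshot_id:
        raise ValueError("no snapshot_id field in the output")
    return snapshot_id


def extraction_snapshot_config(snapshot_id: str) -> str:
    """Format the --config value used by the follow-up build command."""
    return f"extraction_snapshot=pipeline:{snapshot_id}"


# Hypothetical output; a real run prints a real identifier and more fields.
sample_output = '{"snapshot_id": "EXAMPLE-ID"}'
print(extraction_snapshot_config(read_snapshot_id(sample_output)))
```

The formatted value takes the place of `extraction_snapshot=pipeline:EXTRACTION_SNAPSHOT_ID` in the `--config` argument shown in the hunk.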
@@ -314,7 +227,7 @@ python -m biblicus build --corpus corpora/pdf_samples --backend sqlite-full-text
 python -m biblicus query --corpus corpora/pdf_samples --query "Dummy PDF file"
 ```
 
-Retrieval details: `docs/
+Retrieval details: `docs/retrieval.md`
 
 ### MarkItDown extraction demo (Python 3.10+)
 
@@ -386,9 +299,9 @@ python -m biblicus extract build --corpus corpora/mixed_samples --step unstructu
 When you want to prefer one extractor over another for the same item types, order the steps and end with `select-text`:
 
 ```
-python -m biblicus extract build --corpus corpora/pdf_samples
---step unstructured
---step pdf-text
+python -m biblicus extract build --corpus corpora/pdf_samples \
+--step unstructured \
+--step pdf-text \
 --step select-text
 ```
 
@@ -429,7 +342,7 @@ python -m biblicus build --corpus corpora/demo --backend scan
 python -m biblicus query --corpus corpora/demo --query "Hello"
 ```
 
-Backend details: `docs/
+Backend details: `docs/backends.md`
 
 ### Build and query the practical backend
 
@@ -440,7 +353,7 @@ python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search
 python -m biblicus query --corpus corpora/demo --query "tiny"
 ```
 
-Backend details: `docs/
+Backend details: `docs/backends.md`
 
 ### Run the test suite and view coverage
 
@@ -455,14 +368,14 @@ To include integration scenarios that download public test data at runtime:
 python scripts/test.py --integration
 ```
 
-Testing details: `docs/
+Testing details: `docs/testing.md`
 
 ## Documentation map
 
-- Corpus: `docs/
-- Text extraction: `docs/
-- Backends: `docs/
-- Testing: `docs/
-- Roadmap: `docs/
+- Corpus: `docs/corpus.md`
+- Text extraction: `docs/extraction.md`
+- Backends: `docs/backends.md`
+- Testing: `docs/testing.md`
+- Roadmap: `docs/roadmap.md`
 
-For what to build next, see `docs/
+For what to build next, see `docs/roadmap.md`.