PyPI - biblicus - Versions diffs - 1.1.0__tar.gz → 1.1.1__tar.gz - Mend

Files changed (425)

{biblicus-1.1.0/src/biblicus.egg-info → biblicus-1.1.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 1.1.0
+Version: 1.1.1
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -82,8 +82,8 @@ See [retrieval augmented generation overview] for a short introduction to the id
 - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
 - YAML configurations support cascading composition plus dotted `--config key=value` overrides.
 - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
-- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
-- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
+- See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
+- See `docs/text-extract.md` for the text extract utility and examples.
 ## Start with a knowledge base
@@ -552,9 +552,9 @@ For detailed documentation including configuration options, performance characte
 ## Retrieval documentation
-For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
-(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
-and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
+For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
+(tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
+and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
 script (`scripts/retrieval_evaluation_lab.py`).
 ## Extraction backends
@@ -594,7 +594,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
 For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
 For extraction evaluation workflows, dataset formats, and report interpretation, see
-`docs/EXTRACTION_EVALUATION.md`.
+`docs/extraction-evaluation.md`.
 ## Text extract utility
@@ -602,14 +602,14 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
 entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
 analysis.
-See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
+See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
 ## Text slice utility
 Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
 re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
-See `docs/TEXT_SLICE.md` for the utility API and examples.
+See `docs/text-slice.md` for the utility API and examples.
 ## Topic modeling analysis
@@ -618,8 +618,8 @@ are the first analysis backends. Profiling summarizes corpus composition and ext
 an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
 optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
-`docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
+`docs/topic-modeling.md` for topic modeling details.
 Run a topic analysis using a configuration file:
@@ -668,7 +668,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
 python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
-See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
+See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
 ## Integration corpus and evaluation dataset
@@ -726,20 +726,20 @@ Open `http://localhost:8000` in your browser.
 License terms are in `LICENSE`.
 [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
-[architecture]: docs/ARCHITECTURE.md
-[roadmap]: docs/ROADMAP.md
-[feature-index]: docs/FEATURE_INDEX.md
-[corpus]: docs/CORPUS.md
-[knowledge-base]: docs/KNOWLEDGE_BASE.md
-[text-extraction]: docs/EXTRACTION.md
+[architecture]: docs/architecture.md
+[roadmap]: docs/roadmap.md
+[feature-index]: docs/feature-index.md
+[corpus]: docs/corpus.md
+[knowledge-base]: docs/knowledge-base.md
+[text-extraction]: docs/extraction.md
 [extractor-reference]: docs/extractors/index.md
 [backend-reference]: docs/backends/index.md
-[speech-to-text]: docs/STT.md
-[user-configuration]: docs/USER_CONFIGURATION.md
-[backends]: docs/BACKENDS.md
-[context-packs]: docs/CONTEXT_PACK.md
-[demos]: docs/DEMOS.md
-[testing]: docs/TESTING.md
+[speech-to-text]: docs/stt.md
+[user-configuration]: docs/user-configuration.md
+[backends]: docs/backends.md
+[context-packs]: docs/context-pack.md
+[demos]: docs/demos.md
+[testing]: docs/testing.md
 [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
 [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json

{biblicus-1.1.0 → biblicus-1.1.1}/README.md RENAMED

@@ -28,8 +28,8 @@ See [retrieval augmented generation overview] for a short introduction to the id
 - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
 - YAML configurations support cascading composition plus dotted `--config key=value` overrides.
 - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
-- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
-- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
+- See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
+- See `docs/text-extract.md` for the text extract utility and examples.
 ## Start with a knowledge base
@@ -498,9 +498,9 @@ For detailed documentation including configuration options, performance characte
 ## Retrieval documentation
-For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
-(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
-and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
+For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
+(tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
+and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
 script (`scripts/retrieval_evaluation_lab.py`).
 ## Extraction backends
@@ -540,7 +540,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
 For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
 For extraction evaluation workflows, dataset formats, and report interpretation, see
-`docs/EXTRACTION_EVALUATION.md`.
+`docs/extraction-evaluation.md`.
 ## Text extract utility
@@ -548,14 +548,14 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
 entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
 analysis.
-See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
+See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
 ## Text slice utility
 Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
 re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
-See `docs/TEXT_SLICE.md` for the utility API and examples.
+See `docs/text-slice.md` for the utility API and examples.
 ## Topic modeling analysis
@@ -564,8 +564,8 @@ are the first analysis backends. Profiling summarizes corpus composition and ext
 an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
 optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
-`docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
+`docs/topic-modeling.md` for topic modeling details.
 Run a topic analysis using a configuration file:
@@ -614,7 +614,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
 python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
-See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
+See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
 ## Integration corpus and evaluation dataset
@@ -672,20 +672,20 @@ Open `http://localhost:8000` in your browser.
 License terms are in `LICENSE`.
 [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
-[architecture]: docs/ARCHITECTURE.md
-[roadmap]: docs/ROADMAP.md
-[feature-index]: docs/FEATURE_INDEX.md
-[corpus]: docs/CORPUS.md
-[knowledge-base]: docs/KNOWLEDGE_BASE.md
-[text-extraction]: docs/EXTRACTION.md
+[architecture]: docs/architecture.md
+[roadmap]: docs/roadmap.md
+[feature-index]: docs/feature-index.md
+[corpus]: docs/corpus.md
+[knowledge-base]: docs/knowledge-base.md
+[text-extraction]: docs/extraction.md
 [extractor-reference]: docs/extractors/index.md
 [backend-reference]: docs/backends/index.md
-[speech-to-text]: docs/STT.md
-[user-configuration]: docs/USER_CONFIGURATION.md
-[backends]: docs/BACKENDS.md
-[context-packs]: docs/CONTEXT_PACK.md
-[demos]: docs/DEMOS.md
-[testing]: docs/TESTING.md
+[speech-to-text]: docs/stt.md
+[user-configuration]: docs/user-configuration.md
+[backends]: docs/backends.md
+[context-packs]: docs/context-pack.md
+[demos]: docs/demos.md
+[testing]: docs/testing.md
 [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
 [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
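
The link-reference hunks above all appear to follow one naming rule: lowercase the file name and replace underscores with hyphens. The sketch below checks that reading against a few pairs copied from the diff and shows how the same rule could rewrite references in prose. The helper names `to_kebab` and `rewrite_references` are hypothetical and are not part of biblicus.

```python
import re

# Old -> new doc paths copied from the link-reference hunk above.
RENAMES = {
    "docs/ARCHITECTURE.md": "docs/architecture.md",
    "docs/FEATURE_INDEX.md": "docs/feature-index.md",
    "docs/KNOWLEDGE_BASE.md": "docs/knowledge-base.md",
    "docs/STT.md": "docs/stt.md",
    "docs/USER_CONFIGURATION.md": "docs/user-configuration.md",
    "docs/CONTEXT_PACK.md": "docs/context-pack.md",
}


def to_kebab(path: str) -> str:
    """Lowercase the file name and swap underscores for hyphens (hypothetical helper)."""
    directory, _, name = path.rpartition("/")
    new_name = name.lower().replace("_", "-")
    return f"{directory}/{new_name}" if directory else new_name


def rewrite_references(text: str) -> str:
    """Rewrite every docs/UPPER_SNAKE.md reference found in a block of text."""
    # Only matches upper-case file names directly under docs/, so paths such as
    # docs/extractors/index.md are left alone.
    pattern = re.compile(r"docs/[A-Z][A-Z_]*\.md")
    return pattern.sub(lambda match: to_kebab(match.group(0)), text)


# Pairs from the diff that the assumed rule fails to reproduce.
mismatches = {old: new for old, new in RENAMES.items() if to_kebab(old) != new}
print(mismatches)
print(rewrite_references("See `docs/TEXT_SLICE.md` for the utility API and examples."))
```

An empty `mismatches` dict means every listed pair is consistent with the rule. The pattern deliberately ignores relative links such as `../BACKENDS.md`, which the diff also updates; those would need a separate pattern.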

biblicus-1.1.0/docs/ANALYSIS.md → biblicus-1.1.1/docs/analysis.md RENAMED

@@ -103,7 +103,7 @@ observations:
 ## Topic modeling
 Topic modeling is the first analysis backend. It uses BERTopic to cluster extracted text, produces per-topic evidence,
-and optionally labels topics using an LLM. See `docs/TOPIC_MODELING.md` for detailed configuration and examples.
+and optionally labels topics using an LLM. See `docs/topic-modeling.md` for detailed configuration and examples.
 The integration demo script is a working reference you can use as a starting point:
@@ -117,7 +117,7 @@ labels, keywords, and document examples.
 ## Markov analysis
 Markov analysis learns a directed, weighted state transition graph over sequences of text segments. The output includes
-per-state exemplars, per-item decoded paths, and optional GraphViz exports. See `docs/MARKOV_ANALYSIS.md` for detailed
+per-state exemplars, per-item decoded paths, and optional GraphViz exports. See `docs/markov-analysis.md` for detailed
 configuration and examples.
 Text extract is available as a segmentation strategy for long texts. It inserts XML tags in-place using a virtual file
@@ -126,7 +126,7 @@ editing loop, then extracts spans without requiring the model to re-emit the ful
 ## Profiling analysis
 Profiling is the baseline analysis backend. It summarizes corpus composition and extraction coverage using
-deterministic counts and distribution metrics. See `docs/PROFILING.md` for the full reference and working demo.
+deterministic counts and distribution metrics. See `docs/profiling.md` for the full reference and working demo.
 ### Minimal profiling run

biblicus-1.1.1/docs/architecture.md ADDED

@@ -0,0 +1,107 @@
+# Biblicus Architecture
+Biblicus sits between raw, unstructured data and the moment you need reliable answers from it.
+It is built for teams who receive large, messy corpora and must extract usable signals without
+losing provenance or reproducibility. Retrieval-augmented generation is one use case, but the
+system is broader than chatbots: it supports any pipeline that needs structured insight from
+unstructured data.
+At a high level the system does five things:
+1. **Ingests** raw content into a corpus with minimal friction.
+2. **Extracts** text from diverse media (documents, images, audio).
+3. **Transforms** and annotates text with reusable LLM utilities.
+4. **Retrieves** evidence through explicit, reproducible stages.
+5. **Evaluates** results so improvements are measurable, not anecdotal.
+The guiding idea is that every retrieval produces **evidence**: structured outputs with scores
+and provenance that can be inspected, audited, and reused. Context packs, summaries, and downstream
+generation are all derived from that evidence.
+## Core Concepts
+- **Corpus**: a named, mutable collection rooted at a path or uniform resource identifier. In
+  version zero it is typically a local folder containing raw files plus a `.biblicus/` directory
+  for minimal metadata.
+- **Item**: the unit of ingestion in a corpus: raw bytes of any modality, including text, images,
+  Portable Document Format documents, audio, and video, plus optional metadata and provenance.
+- **Knowledge base backend**: an implementation that can ingest and retrieve from a corpus, such
+  as scan, full text search, vector retrieval, or hybrid retrieval, exposed to procedures through
+  retrieval primitives.
+- **Retrieval configuration**: a named configuration bundle for a backend, such as chunking rules,
+  embedding model and version, hybrid weights, reranker choice, and filters. This is what we
+  benchmark and compare.
+- **Configuration manifest**: a reproducibility record describing the backend and configuration parameters,
+  plus any referenced snapshot artifacts and build snapshots.
+- **Snapshot artifacts**: optional, persisted representations derived from raw content for a given
+  configuration and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
+  none and operate on demand.
+- **Evidence**: structured retrieval output from backend queries. Evidence includes spans, scores,
+  and provenance used by downstream retrieval augmented generation procedures.
+- **Pipeline stage / editorial layer**: a structured step that transforms, filters, extracts, or
+  curates content, such as raw, curated, and published, or extract text from Portable Document
+  Format documents.
+## Design Principles
+- **Primitives + derived constructs**: keep the protocol surface small and composable; ship
+  higher-level helpers and example procedures on top.
+- **Composability definition**: composable means each stage has a small input and output contract,
+  so you can connect stages in different orders without rewriting them.
+- **Minimal opinion raw store**: raw ingestion should work for a folder of files with optional
+  lightweight tagging.
+- **Reproducibility by default**: comparisons require manifests (even when there are no persisted
+  snapshot artifacts).
+- **Mutability is real**: corpora are edited, pruned, and reorganized; re-indexing must be a core
+  workflow.
+- **Separation of concerns**: retrieval returns evidence; retrieval-augmented generation patterns
+  live in Tactus procedures (not inside the knowledge base backend).
+- **Deployment flexibility**: same interface across local/offline, brokered external services, and
+  hybrid environments.
+- **Evidence is the primary output**: every retrieval returns structured evidence; everything else
+  is a derived helper.
+## The Python Developer Mental Model
+If this system is pleasant to use, a Python developer should be able to describe intent with the
+core nouns:
+- I have a **corpus** at this path or uniform resource identifier.
+- I ingest an **item** with optional **metadata**.
+- I rebuild the derived **index** after edits.
+- I run a **configuration** against the same corpus.
+- I query and receive **evidence**.
+Anything that does not map cleanly to these nouns is either a derived helper or a backend-specific
+implementation detail that should not leak.
+## Evidence Lifecycle
+Evidence flows through explicit stages and remains inspectable at every step:
+1. **Retrieval**: backends return evidence with `stage` labels and scores.
+2. **Processing**: optional reranking or filtering updates scores while preserving provenance.
+3. **Context shaping**: context packs select and format evidence into model-ready text.
+4. **Evaluation**: evaluation datasets compare evidence rankings to expectations.
+At each stage, the output remains a structured object, so you can inspect, store, and compare
+runs without re-running the entire pipeline.
+## Relationship to Agent Frameworks
+Biblicus integrates with agent frameworks through explicit tool interfaces. It does not hide
+retrieval inside the model. Instead, it provides repeatable pipelines that expose *what* was
+retrieved and *why*, so models can use evidence directly and safely.
+- **Tools and toolsets**, including the Model Context Protocol, are the primary capability
+  boundary.
+- **Sandboxing and brokered or secretless execution** are primary deployment modes.
+- **Durability and evaluations** are central: invariants via specifications, quality via
+  evaluations.
+## Where to go next
+- Start with **corpus.md** and **extraction.md** to understand how raw content is ingested.
+- Move to **retrieval.md** and **retrieval-evaluation.md** to see how evidence is produced and tested.
+- Explore **topic-modeling.md** and **markov-analysis.md** if you need higher-level analysis tools.
+- See **text-utilities.md** for reusable, AI-assisted text transformations.
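
The evidence lifecycle described in the added `architecture.md` (retrieval, processing, context shaping, evaluation) can be pictured with a small toy model. Everything below is an illustrative assumption: the `Evidence` fields, the scoring rules, and the function names are invented for this sketch and do not reflect the actual biblicus API. The only properties taken from the document are that evidence carries scores, a `stage` label, and provenance, and that later stages update scores while preserving provenance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Evidence:
    # Hypothetical shape; not the biblicus data model.
    item_id: str
    text: str
    score: float
    stage: str
    source_uri: str  # provenance


def retrieve(corpus: dict[str, str], query: str) -> list[Evidence]:
    """Toy scan retrieval: score is the number of query-term occurrences."""
    terms = query.lower().split()
    results = []
    for item_id, text in corpus.items():
        score = float(sum(text.lower().count(term) for term in terms))
        if score > 0:
            results.append(Evidence(item_id, text, score, "scan", f"file://{item_id}"))
    return sorted(results, key=lambda e: e.score, reverse=True)


def rerank(evidence: list[Evidence]) -> list[Evidence]:
    """Toy processing stage: normalize by text length; provenance is copied unchanged."""
    rescored = [replace(e, score=e.score / len(e.text), stage="rerank") for e in evidence]
    return sorted(rescored, key=lambda e: e.score, reverse=True)


def context_pack(evidence: list[Evidence], limit: int) -> str:
    """Context shaping: format the top `limit` records into model-ready text."""
    return "\n\n".join(f"[{e.source_uri}] {e.text}" for e in evidence[:limit])


def reciprocal_rank(evidence: list[Evidence], expected_item_id: str) -> float:
    """Evaluation: 1 / rank of the expected item, or 0.0 if it is absent."""
    for rank, e in enumerate(evidence, start=1):
        if e.item_id == expected_item_id:
            return 1.0 / rank
    return 0.0


corpus = {"a.txt": "tiny corpus, tiny demo", "b.txt": "hello world", "c.txt": "tiny note"}
retrieved = retrieve(corpus, "tiny")
reranked = rerank(retrieved)
print(context_pack(reranked, limit=1))
print(reciprocal_rank(retrieved, "c.txt"), reciprocal_rank(reranked, "c.txt"))
```

Because each stage returns plain structured records, the retrieved and reranked lists can be stored and compared side by side without re-running retrieval, which is the property the document emphasizes.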

{biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/index.md RENAMED

@@ -96,7 +96,7 @@ biblicus build my-corpus --backend sqlite-full-text-search
 biblicus query my-corpus --query "search terms"
 ```
-See `docs/RETRIEVAL.md` for a step-by-step retrieval walkthrough.
+See `docs/retrieval.md` for a step-by-step retrieval walkthrough.
 #### Python API
@@ -126,7 +126,7 @@ result = backend.query(
 )
 ```
-See `docs/RETRIEVAL_EVALUATION.md` for evaluation workflows and dataset formats.
+See `docs/retrieval-evaluation.md` for evaluation workflows and dataset formats.
 ## Choosing a Backend
@@ -291,12 +291,12 @@ To implement a custom backend:
 3. Register in `biblicus.backends.available_backends`
 4. Add BDD specifications with 100% coverage
-See [BACKENDS.md](../BACKENDS.md) for implementation details.
+See [backends.md](../backends.md) for implementation details.
 ## See Also
 - [scan backend](scan.md) - Naive full-scan backend
 - [sqlite-full-text-search backend](sqlite-full-text-search.md) - SQLite FTS5 backend
-- [BACKENDS.md](../BACKENDS.md) - Backend implementation guide
-- [EXTRACTION.md](../EXTRACTION.md) - Text extraction pipeline
+- [backends.md](../backends.md) - Backend implementation guide
+- [extraction.md](../extraction.md) - Text extraction pipeline
 - [Extractor Reference](../extractors/index.md) - Text extraction plugins

{biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/scan.md RENAMED

@@ -322,6 +322,6 @@ Query result statistics:
 ## See Also
 - [Backends Overview](index.md) - All available backends
-- [BACKENDS.md](../BACKENDS.md) - Backend implementation guide
-- [EXTRACTION.md](../EXTRACTION.md) - Text extraction pipeline
+- [backends.md](../backends.md) - Backend implementation guide
+- [extraction.md](../extraction.md) - Text extraction pipeline
 - [Extractor Reference](../extractors/index.md) - Text extraction plugins

{biblicus-1.1.0 → biblicus-1.1.1}/docs/backends/sqlite-full-text-search.md RENAMED

@@ -481,7 +481,7 @@ CREATE VIRTUAL TABLE chunks_full_text_search USING fts5(
 ## See Also
 - [Backends Overview](index.md) - All available backends
-- [BACKENDS.md](../BACKENDS.md) - Backend implementation guide
-- [EXTRACTION.md](../EXTRACTION.md) - Text extraction pipeline
+- [backends.md](../backends.md) - Backend implementation guide
+- [extraction.md](../extraction.md) - Text extraction pipeline
 - [Extractor Reference](../extractors/index.md) - Text extraction plugins
 - [SQLite FTS5 Documentation](https://www.sqlite.org/fts5.html) - Official SQLite FTS5 docs

biblicus-1.1.0/docs/BACKENDS.md → biblicus-1.1.1/docs/backends.md RENAMED

@@ -41,7 +41,7 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
 - Treat **runs** as immutable manifests with reproducible parameters.
 - If your backend needs artifacts, store them under `.biblicus/runs/` and record paths in `artifact_paths`.
 - Keep **text extraction** in explicit pipeline stages, not in backend ingestion.
-  See `docs/EXTRACTION.md` for how extraction snapshots are built and referenced from backend configs.
+  See `docs/extraction.md` for how extraction snapshots are built and referenced from backend configs.
 ## Reproducibility checklist

biblicus-1.1.0/docs/DEMOS.md → biblicus-1.1.1/docs/demos.md RENAMED

@@ -3,94 +3,7 @@
 This document is a set of runnable examples you can use to see the current system working end to end.
 Each section links to a textbook chapter so you can read the concept and then run the code.
-For the ordered plan of what to build next, see `docs/ROADMAP.md`.
-## Diagram of the current system and the next layers
-Blue boxes are implemented now. Purple boxes are layers not implemented yet that we can build and compare.
-```mermaid
-%%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
-flowchart TB
-  subgraph Legend[Legend]
-    direction LR
-    LegendNow[Implemented now]
-    LegendPlanned[Planned]
-    LegendNow --- LegendPlanned
-  end
-  subgraph ExistsNow[Implemented now]
-    direction TB
-    Ingest[Ingest] --> RawFiles[Raw item files]
-    RawFiles --> CatalogFile[Catalog file]
-    CatalogFile --> ExtractionRun[Extraction run]
-    ExtractionRun --> ExtractedText[Extracted text artifacts]
-    subgraph PluggableBackend[Pluggable backend]
-      direction LR
-      subgraph BackendIngestionIndexing[Ingestion and indexing]
-        direction TB
-        CatalogFile --> BuildRun[Build run]
-        ExtractedText -.-> BuildRun
-        BuildRun --> BackendIndex[Backend index]
-        BackendIndex --> RunManifest[Run manifest]
-      end
-      subgraph BackendRetrievalGeneration[Retrieval and generation]
-        direction TB
-        RunManifest --> Query[Query]
-        Query --> Evidence[Evidence]
-        Evidence --> EvaluationMetrics[Evaluation metrics]
-      end
-    end
-  end
-  subgraph PlannedLayers[Planned]
-    direction TB
-    RerankStage[Rerank<br/>pipeline stage]
-    FilterStage[Filter<br/>pipeline stage]
-    ToolServer[Tool server<br/>for external backends]
-    OpticalCharacterRecognition[Optical character recognition<br/>extraction plugin]
-    SpeechToText[Speech to text<br/>extraction plugin]
-  end
-  OpticalCharacterRecognition -.-> ExtractionRun
-  SpeechToText -.-> ExtractionRun
-  RerankStage -.-> Evidence
-  FilterStage -.-> Evidence
-  ToolServer -.-> PluggableBackend
-  style Legend fill:#ffffff,stroke:#ffffff,color:#111111
-  style ExistsNow fill:#ffffff,stroke:#ffffff,color:#111111
-  style PlannedLayers fill:#ffffff,stroke:#ffffff,color:#111111
-  style LegendNow fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style LegendPlanned fill:#f3e5f5,stroke:#8e24aa,color:#111111
-  style Ingest fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style RawFiles fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style CatalogFile fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style ExtractionRun fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style ExtractedText fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style BuildRun fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style BackendIndex fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style RunManifest fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style Query fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style Evidence fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style EvaluationMetrics fill:#e3f2fd,stroke:#1e88e5,color:#111111
-  style PluggableBackend fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
-  style BackendIngestionIndexing fill:#ffffff,stroke:#cfd8dc,color:#111111
-  style BackendRetrievalGeneration fill:#ffffff,stroke:#cfd8dc,color:#111111
-  style RerankStage fill:#f3e5f5,stroke:#8e24aa,color:#111111
-  style FilterStage fill:#f3e5f5,stroke:#8e24aa,color:#111111
-  style ToolServer fill:#f3e5f5,stroke:#8e24aa,color:#111111
-  style OpticalCharacterRecognition fill:#f3e5f5,stroke:#8e24aa,color:#111111
-  style SpeechToText fill:#f3e5f5,stroke:#8e24aa,color:#111111
-```
+For the ordered plan of what to build next, see `docs/roadmap.md`.
 ## Working examples you can run now
@@ -169,10 +82,10 @@ In another terminal:
 ```
 rm -rf corpora/crawl-demo
 python -m biblicus init corpora/crawl-demo
-python -m biblicus crawl --corpus corpora/crawl-demo \\
-  --root-url http://127.0.0.1:8000/site/index.html \\
-  --allowed-prefix http://127.0.0.1:8000/site/ \\
-  --max-items 50 \\
+python -m biblicus crawl --corpus corpora/crawl-demo \
+  --root-url http://127.0.0.1:8000/site/index.html \
+  --allowed-prefix http://127.0.0.1:8000/site/ \
+  --max-items 50 \
   --tag crawled
 python -m biblicus list --corpus corpora/crawl-demo
 ```
@@ -189,7 +102,7 @@ python -m biblicus extract build --corpus corpora/demo --step pass-through-text
 The output includes a `snapshot_id` you can reuse when building a retrieval backend.
-Text extraction details: `docs/EXTRACTION.md`
+Text extraction details: `docs/extraction.md`
 ### Topic modeling integration run
@@ -204,7 +117,7 @@ python -m pip install "biblicus[datasets,topic-modeling]"
 python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
-Topic modeling details: `docs/TOPIC_MODELING.md`
+Topic modeling details: `docs/topic-modeling.md`
 ### Extraction evaluation demo run
@@ -223,7 +136,7 @@ python scripts/extraction_evaluation_demo.py --corpus corpora/ag_news_extraction
 The script prints the dataset path, extraction snapshot reference, and evaluation output path so you can inspect the results.
-Extraction evaluation details: `docs/EXTRACTION_EVALUATION.md`
+Extraction evaluation details: `docs/extraction-evaluation.md`
 ### Extraction evaluation lab run
@@ -235,7 +148,7 @@ python scripts/extraction_evaluation_lab.py --corpus corpora/extraction_eval_lab
 The lab writes a generated dataset file and evaluation output path and prints both in the command output.
-Extraction evaluation lab details: `docs/EXTRACTION_EVALUATION.md`
+Extraction evaluation lab details: `docs/extraction-evaluation.md`
 ### Retrieval evaluation lab run
@@ -248,7 +161,7 @@ python scripts/retrieval_evaluation_lab.py --corpus corpora/retrieval_eval_lab -
 The script prints the dataset path, retrieval snapshot identifier, and evaluation output location.
-Retrieval evaluation details: `docs/RETRIEVAL_EVALUATION.md`
+Retrieval evaluation details: `docs/retrieval-evaluation.md`
 Run with a larger corpus and a higher topic count:
@@ -274,27 +187,27 @@ The profiling demo downloads AG News, runs extraction, and produces a profiling
 python scripts/profiling_demo.py --corpus corpora/profiling_demo --force
 ```
-Profiling details: `docs/PROFILING.md`
+Profiling details: `docs/profiling.md`
 ### Select extracted text within a pipeline
 When you want an explicit choice among multiple extraction outputs, add a selection extractor step at the end of the pipeline.
 ```
-python -m biblicus extract build --corpus corpora/demo \\
-  --step pass-through-text \\
-  --step metadata-text \\
+python -m biblicus extract build --corpus corpora/demo \
+  --step pass-through-text \
+  --step metadata-text \
   --step select-text
 ```
 Copy the `snapshot_id` from the JavaScript Object Notation output. Use it as `EXTRACTION_SNAPSHOT_ID` in the next command.
 ```
-python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search \\
+python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search \
   --config extraction_snapshot=pipeline:EXTRACTION_SNAPSHOT_ID
 ```
-Extraction pipeline details: `docs/EXTRACTION.md`
+Extraction pipeline details: `docs/extraction.md`
 ### Portable Document Format extraction and retrieval
@@ -314,7 +227,7 @@ python -m biblicus build --corpus corpora/pdf_samples --backend sqlite-full-text
 python -m biblicus query --corpus corpora/pdf_samples --query "Dummy PDF file"
 ```
-Retrieval details: `docs/RETRIEVAL.md`
+Retrieval details: `docs/retrieval.md`
 ### MarkItDown extraction demo (Python 3.10+)
@@ -386,9 +299,9 @@ python -m biblicus extract build --corpus corpora/mixed_samples --step unstructu
 When you want to prefer one extractor over another for the same item types, order the steps and end with `select-text`:
 ```
-python -m biblicus extract build --corpus corpora/pdf_samples \\
-  --step unstructured \\
-  --step pdf-text \\
+python -m biblicus extract build --corpus corpora/pdf_samples \
+  --step unstructured \
+  --step pdf-text \
   --step select-text
 ```
@@ -429,7 +342,7 @@ python -m biblicus build --corpus corpora/demo --backend scan
 python -m biblicus query --corpus corpora/demo --query "Hello"
 ```
-Backend details: `docs/BACKENDS.md`
+Backend details: `docs/backends.md`
 ### Build and query the practical backend
@@ -440,7 +353,7 @@ python -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search
 python -m biblicus query --corpus corpora/demo --query "tiny"
 ```
-Backend details: `docs/BACKENDS.md`
+Backend details: `docs/backends.md`
 ### Run the test suite and view coverage
@@ -455,14 +368,14 @@ To include integration scenarios that download public test data at runtime:
 python scripts/test.py --integration
 ```
-Testing details: `docs/TESTING.md`
+Testing details: `docs/testing.md`
 ## Documentation map
-- Corpus: `docs/CORPUS.md`
-- Text extraction: `docs/EXTRACTION.md`
-- Backends: `docs/BACKENDS.md`
-- Testing: `docs/TESTING.md`
-- Roadmap: `docs/ROADMAP.md`
+- Corpus: `docs/corpus.md`
+- Text extraction: `docs/extraction.md`
+- Backends: `docs/backends.md`
+- Testing: `docs/testing.md`
+- Roadmap: `docs/roadmap.md`
-For what to build next, see `docs/ROADMAP.md`.
+For what to build next, see `docs/roadmap.md`.
