PyPI - biblicus - Versions diffs - 0.3.0__tar.gz → 0.4.0__tar.gz - Mend

biblicus 0.3.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (139)

{biblicus-0.3.0/src/biblicus.egg-info → biblicus-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 0.3.0
+Version: 0.4.0
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -77,10 +77,7 @@ flowchart LR
     direction LR
     LegendArtifact[Stored artifact or evidence]
     LegendStep[Step]
-    LegendStable[Stable region]
-    LegendPluggable[Pluggable region]
     LegendArtifact --- LegendStep
-    LegendStable --- LegendPluggable
   end
   subgraph Main[" "]
@@ -93,14 +90,14 @@ flowchart LR
       Raw --> Catalog[Catalog file]
     end
-    subgraph PluggableExtractionPipeline[Pluggable extraction pipeline]
+    subgraph PluggableExtractionPipeline[Pluggable: extraction pipeline]
       direction TB
       Catalog --> Extract[Extract pipeline]
       Extract --> ExtractedText[Extracted text artifacts]
       ExtractedText --> ExtractionRun[Extraction run manifest]
     end
-    subgraph PluggableRetrievalBackend[Pluggable retrieval backend]
+    subgraph PluggableRetrievalBackend[Pluggable: retrieval backend]
       direction LR
       subgraph BackendIngestionIndexing[Ingestion and indexing]
@@ -154,8 +151,6 @@ flowchart LR
   style Main fill:#ffffff,stroke:#ffffff,color:#111111
   style LegendArtifact fill:#f3e5f5,stroke:#8e24aa,color:#111111
   style LegendStep fill:#eceff1,stroke:#90a4ae,color:#111111
-  style LegendStable fill:#ffffff,stroke:#8e24aa,stroke-width:2px,color:#111111
-  style LegendPluggable fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
 ```
 ## Practical value
@@ -168,6 +163,7 @@ flowchart LR
 - Initialize a corpus folder.
 - Ingest items from file paths, web addresses, or text input.
+- Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
 - Run extraction when you want derived text artifacts from non-text sources.
 - Reindex to refresh the catalog after edits.
 - Build a retrieval run with a backend.
@@ -205,11 +201,22 @@ biblicus init corpora/example
 biblicus ingest --corpus corpora/example notes/example.txt
 echo "A short note" | biblicus ingest --corpus corpora/example --stdin --title "First note"
 biblicus list --corpus corpora/example
-biblicus extract --corpus corpora/example --step pass-through-text --step metadata-text
+biblicus extract build --corpus corpora/example --step pass-through-text --step metadata-text
+biblicus extract list --corpus corpora/example
 biblicus build --corpus corpora/example --backend scan
 biblicus query --corpus corpora/example --query "note"
 ```
+If you want to turn a website section into corpus items, crawl a root web address while restricting the crawl to an allowed prefix:
+```
+biblicus crawl --corpus corpora/example \
+  --root-url https://example.com/docs/index.html \
+  --allowed-prefix https://example.com/docs/ \
+  --max-items 50 \
+  --tag crawled
+```
 ## Python usage
 From Python, the same flow is available through the Corpus class and backend interfaces. The public surface area is small on purpose.
@@ -233,7 +240,7 @@ In an assistant system, retrieval usually produces context for a model call. Thi
 ## Learn more
-Full documentation is available on [ReadTheDocs](https://biblicus.readthedocs.io/).
+Full documentation is published on GitHub Pages: https://anthusai.github.io/Biblicus/
 The documents below are written to be read in order.
@@ -262,7 +269,16 @@ corpus/
     config.json
     catalog.json
     runs/
-      run-id.json
+      extraction/
+        pipeline/
+          <run id>/
+            manifest.json
+            text/
+              <item id>.txt
+      retrieval/
+        <backend id>/
+          <run id>/
+            manifest.json
 ```
 ## Retrieval backends
@@ -313,7 +329,7 @@ python3 -m pip install -e ".[dev]"
 Build the documentation:
 ```
-python3 -m sphinx -b html docs docs/_build
+python3 -m sphinx -b html docs docs/_build/html
 ```
 ## License
@@ -333,4 +349,4 @@ License terms are in `LICENSE`.
 [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
 [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
-[documentation-badge]: https://readthedocs.org/projects/biblicus/badge/?version=latest
+[documentation-badge]: https://img.shields.io/badge/docs-GitHub%20Pages-blue

{biblicus-0.3.0 → biblicus-0.4.0}/README.md RENAMED

@@ -48,10 +48,7 @@ flowchart LR
     direction LR
     LegendArtifact[Stored artifact or evidence]
     LegendStep[Step]
-    LegendStable[Stable region]
-    LegendPluggable[Pluggable region]
     LegendArtifact --- LegendStep
-    LegendStable --- LegendPluggable
   end
   subgraph Main[" "]
@@ -64,14 +61,14 @@ flowchart LR
       Raw --> Catalog[Catalog file]
     end
-    subgraph PluggableExtractionPipeline[Pluggable extraction pipeline]
+    subgraph PluggableExtractionPipeline[Pluggable: extraction pipeline]
       direction TB
       Catalog --> Extract[Extract pipeline]
       Extract --> ExtractedText[Extracted text artifacts]
       ExtractedText --> ExtractionRun[Extraction run manifest]
     end
-    subgraph PluggableRetrievalBackend[Pluggable retrieval backend]
+    subgraph PluggableRetrievalBackend[Pluggable: retrieval backend]
       direction LR
       subgraph BackendIngestionIndexing[Ingestion and indexing]
@@ -125,8 +122,6 @@ flowchart LR
   style Main fill:#ffffff,stroke:#ffffff,color:#111111
   style LegendArtifact fill:#f3e5f5,stroke:#8e24aa,color:#111111
   style LegendStep fill:#eceff1,stroke:#90a4ae,color:#111111
-  style LegendStable fill:#ffffff,stroke:#8e24aa,stroke-width:2px,color:#111111
-  style LegendPluggable fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
 ```
 ## Practical value
@@ -139,6 +134,7 @@ flowchart LR
 - Initialize a corpus folder.
 - Ingest items from file paths, web addresses, or text input.
+- Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
 - Run extraction when you want derived text artifacts from non-text sources.
 - Reindex to refresh the catalog after edits.
 - Build a retrieval run with a backend.
@@ -176,11 +172,22 @@ biblicus init corpora/example
 biblicus ingest --corpus corpora/example notes/example.txt
 echo "A short note" | biblicus ingest --corpus corpora/example --stdin --title "First note"
 biblicus list --corpus corpora/example
-biblicus extract --corpus corpora/example --step pass-through-text --step metadata-text
+biblicus extract build --corpus corpora/example --step pass-through-text --step metadata-text
+biblicus extract list --corpus corpora/example
 biblicus build --corpus corpora/example --backend scan
 biblicus query --corpus corpora/example --query "note"
 ```
+If you want to turn a website section into corpus items, crawl a root web address while restricting the crawl to an allowed prefix:
+```
+biblicus crawl --corpus corpora/example \
+  --root-url https://example.com/docs/index.html \
+  --allowed-prefix https://example.com/docs/ \
+  --max-items 50 \
+  --tag crawled
+```
 ## Python usage
 From Python, the same flow is available through the Corpus class and backend interfaces. The public surface area is small on purpose.
@@ -204,7 +211,7 @@ In an assistant system, retrieval usually produces context for a model call. Thi
 ## Learn more
-Full documentation is available on [ReadTheDocs](https://biblicus.readthedocs.io/).
+Full documentation is published on GitHub Pages: https://anthusai.github.io/Biblicus/
 The documents below are written to be read in order.
@@ -233,7 +240,16 @@ corpus/
     config.json
     catalog.json
     runs/
-      run-id.json
+      extraction/
+        pipeline/
+          <run id>/
+            manifest.json
+            text/
+              <item id>.txt
+      retrieval/
+        <backend id>/
+          <run id>/
+            manifest.json
 ```
 ## Retrieval backends
@@ -284,7 +300,7 @@ python3 -m pip install -e ".[dev]"
 Build the documentation:
 ```
-python3 -m sphinx -b html docs docs/_build
+python3 -m sphinx -b html docs docs/_build/html
 ```
 ## License
@@ -304,4 +320,4 @@ License terms are in `LICENSE`.
 [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
 [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
-[documentation-badge]: https://readthedocs.org/projects/biblicus/badge/?version=latest
+[documentation-badge]: https://img.shields.io/badge/docs-GitHub%20Pages-blue
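
The revised `runs/` tree in the README separates extraction runs from retrieval runs. The Python sketch below shows how paths in that layout compose. The helper names (`extraction_run_dir`, `extracted_text_path`, `retrieval_manifest_path`) are hypothetical and not part of the biblicus API. The README shows a literal `pipeline` directory below `extraction/`; treating that level as the extractor identifier is an assumption, consistent with the `pipeline:EXTRACTION_RUN_ID` references in EXTRACTION.md. The parent of `runs/` is outside the visible hunk, so the sketch takes it as a parameter.

```python
from pathlib import Path


def extraction_run_dir(runs_root: Path, extractor_id: str, run_id: str) -> Path:
    """Directory for one extraction run: runs/extraction/<extractor id>/<run id>/."""
    return runs_root / "extraction" / extractor_id / run_id


def extracted_text_path(runs_root: Path, extractor_id: str, run_id: str, item_id: str) -> Path:
    """Derived text for one item: .../<run id>/text/<item id>.txt."""
    return extraction_run_dir(runs_root, extractor_id, run_id) / "text" / f"{item_id}.txt"


def retrieval_manifest_path(runs_root: Path, backend_id: str, run_id: str) -> Path:
    """Manifest for one retrieval run: runs/retrieval/<backend id>/<run id>/manifest.json."""
    return runs_root / "retrieval" / backend_id / run_id / "manifest.json"


if __name__ == "__main__":
    # Hypothetical parent directory; the README hunk does not show what contains runs/.
    runs = Path("corpus-metadata") / "runs"
    print(extraction_run_dir(runs, "pipeline", "RUN_ID") / "manifest.json")
    print(extracted_text_path(runs, "pipeline", "RUN_ID", "ITEM_ID"))
    print(retrieval_manifest_path(runs, "scan", "RUN_ID"))
```

Because extraction and retrieval live in sibling subtrees, several extraction runs and several backends can coexist over the same raw items without their artifacts colliding.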

{biblicus-0.3.0 → biblicus-0.4.0}/docs/CORPUS.md RENAMED

@@ -43,6 +43,20 @@ Ingest a web address:
 python3 -m biblicus ingest --corpus corpora/example https://example.com --tag web
 ```
+## Crawl a website prefix
+To build a corpus from a website section, crawl a root uniform resource locator and restrict the crawl to an allowed prefix.
+```
+python3 -m biblicus crawl --corpus corpora/example \
+  --root-url https://example.com/docs/index.html \
+  --allowed-prefix https://example.com/docs/ \
+  --max-items 50 \
+  --tag crawled
+```
+The crawl command only follows links within the allowed prefix, and it respects `.biblicusignore` patterns against the path relative to the allowed prefix.
 Ingest a text note:
 ```
@@ -100,4 +114,3 @@ Purging deletes all items and derived artifacts under the corpus. It requires yo
 ```
 python3 -m biblicus purge --corpus corpora/example --confirm example
 ```
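
The crawl rule added to CORPUS.md has two parts. A link is followed only if it falls under the allowed prefix. The remainder of the address after that prefix is what `.biblicusignore` patterns are matched against. The sketch below illustrates that rule under stated assumptions. `should_crawl` and `relative_to_prefix` are hypothetical helpers, and `fnmatch` glob matching stands in for the real pattern syntax, which this diff does not show.

```python
from fnmatch import fnmatch
from typing import Iterable, Optional


def relative_to_prefix(url: str, allowed_prefix: str) -> Optional[str]:
    """Return the part of url after allowed_prefix, or None if url is outside it."""
    if not url.startswith(allowed_prefix):
        return None
    return url[len(allowed_prefix):]


def should_crawl(url: str, allowed_prefix: str, ignore_patterns: Iterable[str]) -> bool:
    """Follow url only if it is under the prefix and no ignore pattern matches
    its prefix-relative path. fnmatch is an assumed stand-in for the real syntax."""
    relative = relative_to_prefix(url, allowed_prefix)
    if relative is None:
        return False
    return not any(fnmatch(relative, pattern) for pattern in ignore_patterns)


if __name__ == "__main__":
    prefix = "https://example.com/docs/"
    patterns = ["drafts/*", "*.pdf"]  # hypothetical .biblicusignore contents
    for url in [
        "https://example.com/docs/index.html",
        "https://example.com/docs/drafts/wip.html",
        "https://example.com/docs/manual.pdf",
        "https://example.com/blog/post.html",
    ]:
        print(url, should_crawl(url, prefix, patterns))
```

Matching against the prefix-relative path means the same ignore file works no matter which host the prefix points at. Note that `fnmatch`'s `*` also matches `/`, which gitignore-style matchers usually do not, so a real implementation may behave differently for nested paths.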

{biblicus-0.3.0 → biblicus-0.4.0}/docs/DEMOS.md RENAMED

@@ -133,6 +133,46 @@ The catalog is rebuildable. You can edit raw files or sidecar metadata, then ref
 python3 -m biblicus reindex --corpus corpora/demo
 ```
+### Crawl a website prefix
+To turn a website section into corpus items, crawl a root page and restrict the crawl to an allowed prefix.
+In one terminal, create a tiny local website and serve it:
+```
+rm -rf /tmp/biblicus-site
+mkdir -p /tmp/biblicus-site/site/subdir
+cat > /tmp/biblicus-site/site/index.html <<'HTML'
+<html>
+  <body>
+    <a href="page.html">Page</a>
+    <a href="subdir/">Subdir</a>
+  </body>
+</html>
+HTML
+cat > /tmp/biblicus-site/site/page.html <<'HTML'
+<html><body>hello</body></html>
+HTML
+cat > /tmp/biblicus-site/site/subdir/index.html <<'HTML'
+<html><body>subdir</body></html>
+HTML
+python3 -m http.server 8000 --directory /tmp/biblicus-site
+```
+In another terminal:
+```
+rm -rf corpora/crawl-demo
+python3 -m biblicus init corpora/crawl-demo
+python3 -m biblicus crawl --corpus corpora/crawl-demo \
+  --root-url http://127.0.0.1:8000/site/index.html \
+  --allowed-prefix http://127.0.0.1:8000/site/ \
+  --max-items 50 \
+  --tag crawled
+python3 -m biblicus list --corpus corpora/crawl-demo
+```
 ### Build an extraction run
 Text extraction is a separate pipeline stage from retrieval. An extraction run produces derived text artifacts under the corpus.
@@ -140,7 +180,7 @@ Text extraction is a separate pipeline stage from retrieval. An extraction run p
 This extractor reads text items and skips non-text items.
 ```
-python3 -m biblicus extract --corpus corpora/demo --step pass-through-text
+python3 -m biblicus extract build --corpus corpora/demo --step pass-through-text
 ```
 The output includes a `run_id` you can reuse when building a retrieval backend.
@@ -150,7 +190,7 @@ The output includes a `run_id` you can reuse when building a retrieval backend.
 When you want an explicit choice among multiple extraction outputs, add a selection extractor step at the end of the pipeline.
 ```
-python3 -m biblicus extract --corpus corpora/demo \
+python3 -m biblicus extract build --corpus corpora/demo \
   --step pass-through-text \
   --step metadata-text \
   --step select-text
@@ -171,7 +211,7 @@ This example downloads a small set of public Portable Document Format files, ext
 rm -rf corpora/pdf_samples
 python3 scripts/download_pdf_samples.py --corpus corpora/pdf_samples --force
-python3 -m biblicus extract --corpus corpora/pdf_samples --step pdf-text
+python3 -m biblicus extract build --corpus corpora/pdf_samples --step pdf-text
 ```
 Copy the `run_id` from the JavaScript Object Notation output. You will use it as `PDF_EXTRACTION_RUN_ID` in the next command.
@@ -211,7 +251,7 @@ python3 -m pip install "biblicus[ocr]"
 Then build an extraction run:
 ```
-python3 -m biblicus extract --corpus corpora/image_samples --step ocr-rapidocr
+python3 -m biblicus extract build --corpus corpora/image_samples --step ocr-rapidocr
 ```
 ### Optional: Unstructured as a last-resort extractor
@@ -227,7 +267,7 @@ python3 -m pip install "biblicus[unstructured]"
 Then build an extraction run:
 ```
-python3 -m biblicus extract --corpus corpora/pdf_samples --step unstructured
+python3 -m biblicus extract build --corpus corpora/pdf_samples --step unstructured
 ```
 To see Unstructured handle a non-Portable-Document-Format format, use the mixed corpus demo, which includes a `.docx` sample:
@@ -235,13 +275,13 @@ To see Unstructured handle a non-Portable-Document-Format format, use the mixed
 ```
 rm -rf corpora/mixed_samples
 python3 scripts/download_mixed_samples.py --corpus corpora/mixed_samples --force
-python3 -m biblicus extract --corpus corpora/mixed_samples --step unstructured
+python3 -m biblicus extract build --corpus corpora/mixed_samples --step unstructured
 ```
 When you want to prefer one extractor over another for the same item types, order the steps and end with `select-text`:
 ```
-python3 -m biblicus extract --corpus corpora/pdf_samples \
+python3 -m biblicus extract build --corpus corpora/pdf_samples \
   --step unstructured \
   --step pdf-text \
   --step select-text
@@ -263,7 +303,7 @@ python3 -m biblicus list --corpus corpora/audio_samples
 If you only want a metadata-only baseline, extract `metadata-text`:
 ```
-python3 -m biblicus extract --corpus corpora/audio_samples --step metadata-text
+python3 -m biblicus extract build --corpus corpora/audio_samples --step metadata-text
 ```
 For real speech to text transcription with the OpenAI backend, install the optional dependency and set an API key:
@@ -272,7 +312,7 @@ For real speech to text transcription with the OpenAI backend, install the optio
 python3 -m pip install "biblicus[openai]"
 mkdir -p .biblicus
 printf "openai:\n  api_key: ...\n" > .biblicus/config.yml
-python3 -m biblicus extract --corpus corpora/audio_samples --step stt-openai
+python3 -m biblicus extract build --corpus corpora/audio_samples --step stt-openai
 ```
 ### Build and query the minimal backend
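
The local site in the DEMOS.md crawl demo uses relative links (`page.html`, `subdir/`). A crawler has to resolve those against the page address before it can compare them with `--allowed-prefix`. The sketch below shows that step using only the standard library (`html.parser.HTMLParser` and `urllib.parse.urljoin`). It illustrates the idea and is not the biblicus crawler; `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url: str, html: str) -> list[str]:
    """Return absolute addresses for every <a href> on the page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(page_url, href) for href in collector.hrefs]


if __name__ == "__main__":
    # Same markup as /tmp/biblicus-site/site/index.html in the demo.
    index_html = """<html>
  <body>
    <a href="page.html">Page</a>
    <a href="subdir/">Subdir</a>
  </body>
</html>"""
    allowed_prefix = "http://127.0.0.1:8000/site/"
    for link in extract_links("http://127.0.0.1:8000/site/index.html", index_html):
        print(link, link.startswith(allowed_prefix))
```

The demo serves `/tmp/biblicus-site` rather than its `site/` subdirectory, which is why both the root address and the allowed prefix include `/site/`.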

{biblicus-0.3.0 → biblicus-0.4.0}/docs/EXTRACTION.md RENAMED

@@ -148,7 +148,7 @@ python3 -m biblicus init corpora/extraction-demo
 printf 'x' > /tmp/image.png
 python3 -m biblicus ingest --corpus corpora/extraction-demo /tmp/image.png --tag extracted
-python3 -m biblicus extract --corpus corpora/extraction-demo \
+python3 -m biblicus extract build --corpus corpora/extraction-demo \
   --step pass-through-text \
   --step pdf-text \
   --step metadata-text
@@ -161,7 +161,7 @@ The extracted text for the image comes from the `metadata-text` step because the
 Selection is a pipeline step that chooses extracted text from previous pipeline steps. Selection is just another extractor in the pipeline, and it decides which prior output to carry forward.
 ```
-python3 -m biblicus extract --corpus corpora/extraction-demo \
+python3 -m biblicus extract build --corpus corpora/extraction-demo \
   --step pass-through-text \
   --step metadata-text \
   --step select-text
@@ -169,6 +169,23 @@ python3 -m biblicus extract --corpus corpora/extraction-demo \
 The pipeline run produces one extraction run under `pipeline`. You can point retrieval backends at that run.
+## Inspecting and deleting extraction runs
+Extraction runs are stored under the corpus and can be listed and inspected.
+```
+python3 -m biblicus extract list --corpus corpora/extraction-demo
+python3 -m biblicus extract show --corpus corpora/extraction-demo --run pipeline:EXTRACTION_RUN_ID
+```
+Deletion is explicit and requires typing the exact run reference as confirmation:
+```
+python3 -m biblicus extract delete --corpus corpora/extraction-demo \
+  --run pipeline:EXTRACTION_RUN_ID \
+  --confirm pipeline:EXTRACTION_RUN_ID
+```
 ## Use extracted text in retrieval
 Retrieval backends can build and query using a selected extraction run. This is configured by passing `extraction_run=extractor_id:run_id` to the backend build command.
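
The EXTRACTION.md changes refer to runs as `extractor_id:run_id` (for example `pipeline:EXTRACTION_RUN_ID`), and `extract delete` requires `--confirm` to repeat that reference exactly. The sketch below shows both checks. `RunReference`, `parse_run_reference` and `confirm_delete` are hypothetical, and splitting on the first colon is an assumption about how references are parsed.

```python
from typing import NamedTuple


class RunReference(NamedTuple):
    extractor_id: str
    run_id: str


def parse_run_reference(text: str) -> RunReference:
    """Split 'extractor_id:run_id' on the first colon (assumed format)."""
    extractor_id, sep, run_id = text.partition(":")
    if not sep or not extractor_id or not run_id:
        raise ValueError(f"expected extractor_id:run_id, got {text!r}")
    return RunReference(extractor_id, run_id)


def confirm_delete(run: str, confirm: str) -> RunReference:
    """Allow deletion only when the confirmation matches the run reference exactly."""
    reference = parse_run_reference(run)
    if confirm != run:
        raise ValueError("confirmation does not match the run reference; nothing deleted")
    return reference


if __name__ == "__main__":
    print(confirm_delete("pipeline:EXTRACTION_RUN_ID", "pipeline:EXTRACTION_RUN_ID"))
```

Comparing the raw strings, rather than the parsed parts, keeps the check as strict as "typing the exact run reference" implies.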

biblicus-0.4.0/docs/ROADMAP.md ADDED

@@ -0,0 +1,200 @@
+# Roadmap
+This document is the ordered plan for what to build next.
+If you are looking for runnable examples, see `docs/DEMOS.md`.
+## Principles
+- Behavior specifications are the authoritative definition of behavior.
+- Every behavior that exists is specified.
+- Validation and documentation are part of the product.
+- Raw corpus items remain readable, portable files.
+- Derived artifacts are stored under the corpus and can coexist for multiple implementations.
+## Current state
+Version zero includes:
+- A file based corpus with ingestion, catalog rebuild, import, ignore rules, and lifecycle hooks.
+- A retrieval baseline (`scan`) and a practical local backend (`sqlite-full-text-search`).
+- A separate text extraction stage with extraction runs and a composable extractor pipeline.
+- Selection extractor steps that choose extracted text within a pipeline.
+- A speech to text extractor plugin (`stt-openai`) implemented as an optional dependency.
+- An optical character recognition extractor plugin (`ocr-rapidocr`) implemented as an optional dependency.
+- A broad catchall extractor plugin (`unstructured`) implemented as an optional dependency.
+- Integration corpora that include deterministic non-text cases such as a blank Portable Document Format file and a silence Waveform Audio File Format clip.
+Milestones 1 through 4 are complete. The next planned work begins at Milestone 5.
+## Near-term focus
+The next work will focus on the retrieval side of the pipeline:
+- Make retrieval runs and evidence production the simplest possible practical “minimum viable product”.
+- Add explicit evidence quality stages (rerank and filter) that are easy to compose, test, and evaluate.
+- Expand retrieval evaluation so it is easy to compare backends using the same corpora and datasets.
+Lower-priority work related to corpus ingestion conveniences and extractor evaluation remains valuable, but it is deferred while we make retrieval practical end to end.
+## Milestones
+### Milestone 1: Artifact lifecycle and storage layout
+Goal: make derived artifacts easy to inspect, compare, and retain across multiple extraction implementations.
+Status: complete.
+Deliverables:
+- A stable on-disk layout for extracted artifacts that partitions by extraction recipe and extractor identity.
+- A clear, human-readable manifest for each extraction run that includes configuration, timing, and summary stats.
+- Corpus-level tooling to list, inspect, and delete derived artifacts without touching raw items.
+Acceptance checks:
+- Raw items remain readable, portable files in `raw/`.
+- Derived artifacts can coexist for multiple extractors and multiple recipes over the same raw items.
+- Behavior specifications cover artifact layout and lifecycle operations.
+### Milestone 2: Idempotency and change detection
+Goal: make extraction runs repeatable, fast, and safe by skipping work when nothing relevant changed.
+Status: complete.
+Deliverables:
+- Change detection for extraction inputs (raw bytes identity) and extraction settings (extractor identity and configuration).
+- Extraction run behavior that cleanly separates “skipped because already present” from “skipped because unsupported”.
+- A simple “rebuild” workflow that is explicit and safe: delete an extraction run, then build it again.
+Acceptance checks:
+- Running the same extraction recipe twice produces the same outputs and reports predictable skip counts.
+- Behavior specifications cover idempotency and change detection outcomes.
+### Milestone 3: Failure semantics and reporting
+Goal: make extraction outcomes diagnosable and measurable without reading log output.
+Status: complete.
+Deliverables:
+- A clear set of extraction outcome categories (success, empty output, skipped, fatal error) with structured reasons.
+- Per-run reporting that summarizes outcomes and provides a path to per-item details.
+- Consistent, user-facing errors when optional dependencies or required configuration are missing.
+Acceptance checks:
+- Behavior specifications cover error classification and summary reporting.
+- Reports remain deterministic for the same corpus and recipe.
+### Milestone 4: Corpus import and crawl utilities
+Goal: make it easy to build a corpus from real-world sources while keeping the corpus readable and portable.
+Status: complete.
+Deliverables:
+- Folder tree import ergonomics: stable naming, media type detection, and predictable metadata sidecars.
+- A website crawl command that stays within an allow-listed uniform resource locator prefix and respects `.biblicusignore`.
+- Integration downloads that produce a small, realistic, repeatable corpus for experimentation without committing third-party content to the repository.
+Acceptance checks:
+- The crawl and import workflows are fully specified with behavior specifications.
+- Integration corpora remain gitignored, and can be regenerated from scripts.
+### Milestone 6: Evidence quality stages
+Goal: add explicit rerank and filter stages to retrieval.
+Status: next.
+Deliverables:
+- A rerank stage interface that takes evidence and returns reordered evidence.
+- A filter stage interface that applies metadata and source constraints.
+- Documentation that explains how to configure budgets and stage ordering.
+Acceptance checks:
+- Behavior specs cover the new stages.
+- Evaluation reports show per stage metrics and final metrics.
+### Milestone 7: Evaluation reports and datasets
+Goal: make evaluation results easier to interpret and compare.
+Status: next.
+Deliverables:
+- A dataset authoring workflow that supports small hand labeled sets and larger synthetic sets.
+- A report that includes per query diagnostics and a clear summary.
+Acceptance checks:
+- The existing dataset format remains stable or is versioned.
+- Reports remain deterministic for the same inputs.
+### Milestone 8: Pluggable backend hosting modes
+Goal: add one reference backend in an external process or remote service mode.
+Status: later.
+Deliverables:
+- A tool server that exposes a backend through a stable interface.
+- Documentation that shows how to run a backend out of process and connect to it.
+Acceptance checks:
+- Local tests remain fast and deterministic.
+- Integration tests validate end to end retrieval through the tool boundary.
+## Where to put design notes
+Design notes live in `docs/` so they are easy to browse and cross link.
+Executable behavior lives in `features/*.feature`.
+## Completed milestones (version zero)
+These milestones are complete as of version zero, and are maintained through behavior specifications:
+- Portable Document Format text extraction (`pdf-text`).
+- Optical character recognition extraction (`ocr-rapidocr`).
+- Catchall extraction for wide format coverage (`unstructured`).
+- Selection extractor steps (`select-text`, `select-longest-text`).
+## Completed milestones (post version zero)
+These milestones are complete after version zero, and remain defined by behavior specifications:
+- Extraction run lifecycle operations (`extract list`, `extract show`, `extract delete`) and a stable artifact layout.
+- Deterministic extraction run identifiers based on recipe and catalog version (idempotent extraction runs).
+- Crawl ingestion (`crawl`) with allow-listed prefix enforcement and `.biblicusignore` filtering.
+## Deferred milestones
+These milestones remain planned, but are not the near-term focus.
+### Milestone 5: Extractor datasets and evaluation harness (deferred)
+Goal: compare extraction approaches in a way that is measurable, repeatable, and useful for practical engineering decisions.
+Deliverables:
+- Dataset authoring workflow for extraction ground truth (for example: expected transcripts and expected optical character recognition text).
+- Evaluation metrics for accuracy, speed, and cost, including “processable fraction” for a given extractor recipe.
+- A report format that can compare multiple extraction recipes against the same corpus and dataset.
+Acceptance checks:
+- Evaluation results are stable and reproducible for the same corpus and dataset inputs.
+- Reports make it clear when an extractor fails to process an item versus producing empty output.
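
The roadmap lists "deterministic extraction run identifiers based on recipe and catalog version" as complete. One common way to get that property is to hash a canonical serialization of the inputs, so the same recipe over the same catalog always yields the same identifier and a repeated build can be recognized as already present. The sketch below assumes that approach. The recipe shape, the `catalog_version` string and the SHA-256 choice are illustrative and not taken from biblicus; `extraction_run_id` is a hypothetical name.

```python
import hashlib
import json
from typing import Any, Dict


def extraction_run_id(recipe: Dict[str, Any], catalog_version: str) -> str:
    """Derive a stable run identifier from the recipe and catalog version.

    json.dumps with sort_keys and fixed separators produces a canonical string,
    so dictionaries with the same content hash identically regardless of key order.
    List order is preserved, so reordering pipeline steps changes the identifier.
    """
    payload = json.dumps(
        {"recipe": recipe, "catalog_version": catalog_version},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical recipe shape; the step names come from the docs above.
    recipe = {"name": "default", "steps": ["pass-through-text", "select-text"]}
    reordered = {"steps": ["pass-through-text", "select-text"], "name": "default"}
    print(extraction_run_id(recipe, "catalog-v1") == extraction_run_id(reordered, "catalog-v1"))  # True
    print(extraction_run_id(recipe, "catalog-v1") == extraction_run_id(recipe, "catalog-v2"))  # False
```

Keeping step order significant matters here because a pipeline ending in `select-text` can produce different output when its earlier steps are reordered.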

{biblicus-0.3.0 → biblicus-0.4.0}/docs/conf.py RENAMED

@@ -23,7 +23,6 @@ autodoc_typehints = "description"
 html_theme = "sphinx_rtd_theme"
 html_theme_options = {
-    "display_version": True,
     "prev_next_buttons_location": "bottom",
     "style_external_links": False,
     "collapse_navigation": False,
