PyPI - biblicus - Versions diffs - 0.13.0__tar.gz → 1.0.0__tar.gz - Mend

biblicus 0.13.0__tar.gz → 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (436)

{biblicus-0.13.0/src/biblicus.egg-info → biblicus-1.0.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 0.13.0
+Version: 1.0.0
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -9,6 +9,9 @@ License-File: LICENSE
 Requires-Dist: pydantic>=2.0
 Requires-Dist: PyYAML>=6.0
 Requires-Dist: pypdf>=4.0
+Requires-Dist: Jinja2>=3.1
+Requires-Dist: dotyaml>=0.1.3
+Requires-Dist: numpy>=1.24
 Provides-Extra: dev
 Requires-Dist: behave>=1.2.6; extra == "dev"
 Requires-Dist: coverage[toml]>=7.0; extra == "dev"
@@ -18,6 +21,9 @@ Requires-Dist: sphinx_rtd_theme>=2.0; extra == "dev"
 Requires-Dist: ruff>=0.4.0; extra == "dev"
 Requires-Dist: black>=24.0; extra == "dev"
 Requires-Dist: python-semantic-release>=9.0.0; extra == "dev"
+Provides-Extra: dspy
+Requires-Dist: dspy>=2.5; extra == "dspy"
+Requires-Dist: litellm>=1.0; extra == "dspy"
 Provides-Extra: openai
 Requires-Dist: openai>=1.0; extra == "openai"
 Provides-Extra: unstructured
@@ -40,6 +46,8 @@ Provides-Extra: docling-mlx
 Requires-Dist: docling[mlx-vlm]>=2.0.0; extra == "docling-mlx"
 Provides-Extra: topic-modeling
 Requires-Dist: bertopic>=0.15.0; extra == "topic-modeling"
+Provides-Extra: markov-analysis
+Requires-Dist: hmmlearn>=0.3.0; extra == "markov-analysis"
 Provides-Extra: datasets
 Requires-Dist: datasets>=2.18.0; extra == "datasets"
 Dynamic: license-file
@@ -50,18 +58,33 @@ Dynamic: license-file
 ![Coverage][coverage-badge]
 ![Documentation][documentation-badge]
-Make your documents usable by your assistant, then decide later how you will search and retrieve them.
+<p>
+  <img
+    src="docs/_static/Biblicus-logo.png"
+    alt="Biblicus logo"
+    align="right"
+    width="216"
+  />
+  Make your documents usable by your assistant, then decide later how you will search and retrieve them.
+</p>
 If you are building an assistant in Python, you probably have material you want it to use: notes, documents, web pages, and reference files. A common approach is retrieval augmented generation, where a system retrieves relevant material and uses it as evidence when generating a response.
 The first practical problem is not retrieval. It is collection and care. You need a stable place to put raw items, you need a small amount of metadata so you can find them again, and you need a way to evolve your retrieval approach over time without rewriting ingestion.
-This library gives you a corpus, which is a normal folder on disk. It stores each ingested item as a file, with optional metadata stored next to it. You can open and inspect the raw files directly. Any derived catalog or index can be rebuilt from the raw corpus.
+Biblicus gives you a normal folder on disk to manage. In Biblicus documentation, that managed folder is called a *corpus* (plural: *corpora*). It stores each ingested item as a file, with optional metadata stored next to it. You can open and inspect the raw files directly. Any derived catalog or index can be rebuilt from the raw files.
 It can be used alongside LangGraph, Tactus, Pydantic AI, any agent framework, or your own setup. Use it from Python or from the command line interface.
 See [retrieval augmented generation overview] for a short introduction to the idea.
+## Analysis highlights
+- `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
+- YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+- Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
+- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
+- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
 ## Start with a knowledge base
 If you just want to hand a folder to your assistant and move on, use the high-level knowledge base interface. The folder can be nothing more than a handful of plain text files. You are not choosing a retrieval strategy yet. You are just collecting.
@@ -106,7 +129,7 @@ Think in three stages.
 If you learn a few project words, the rest of the system becomes predictable.
-- Corpus is the folder that holds raw items and their metadata.
+- Corpus is the managed folder that holds raw items and their metadata.
 - Item is the raw bytes plus optional metadata and source information.
 - Catalog is the rebuildable index of the corpus.
 - Extraction run is a recorded extraction build that produces text artifacts.
@@ -161,28 +184,28 @@ sequenceDiagram
 This repository is a working Python package. Install it into a virtual environment from the repository root.
 ```
-python3 -m pip install -e .
+python -m pip install -e .
 ```
 After the first release, you can install it from Python Package Index.
 ```
-python3 -m pip install biblicus
+python -m pip install biblicus
 ```
 ### Optional extras
 Some extractors are optional so the base install stays small.
-- Optical character recognition for images: `python3 -m pip install "biblicus[ocr]"`
-- Advanced optical character recognition with PaddleOCR: `python3 -m pip install "biblicus[paddleocr]"`
-- Document understanding with Docling VLM: `python3 -m pip install "biblicus[docling]"`
-- Document understanding with Docling VLM and MLX acceleration: `python3 -m pip install "biblicus[docling-mlx]"`
-- Speech to text transcription with OpenAI: `python3 -m pip install "biblicus[openai]"` (requires an OpenAI API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
-- Speech to text transcription with Deepgram: `python3 -m pip install "biblicus[deepgram]"` (requires a Deepgram API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
-- Broad document parsing fallback: `python3 -m pip install "biblicus[unstructured]"`
-- MarkItDown document conversion (requires Python 3.10 or higher): `python3 -m pip install "biblicus[markitdown]"`
-- Topic modeling analysis with BERTopic: `python3 -m pip install "biblicus[topic-modeling]"`
+- Optical character recognition for images: `python -m pip install "biblicus[ocr]"`
+- Advanced optical character recognition with PaddleOCR: `python -m pip install "biblicus[paddleocr]"`
+- Document understanding with Docling VLM: `python -m pip install "biblicus[docling]"`
+- Document understanding with Docling VLM and MLX acceleration: `python -m pip install "biblicus[docling-mlx]"`
+- Speech to text transcription with OpenAI: `python -m pip install "biblicus[openai]"` (requires an OpenAI API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
+- Speech to text transcription with Deepgram: `python -m pip install "biblicus[deepgram]"` (requires a Deepgram API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
+- Broad document parsing fallback: `python -m pip install "biblicus[unstructured]"`
+- MarkItDown document conversion (requires Python 3.10 or higher): `python -m pip install "biblicus[markitdown]"`
+- Topic modeling analysis with BERTopic: `python -m pip install "biblicus[topic-modeling]"`
 ## Quick start
@@ -200,16 +223,49 @@ biblicus build --corpus corpora/example --backend scan
 biblicus query --corpus corpora/example --query "note"
 ```
-If you want to turn a website section into corpus items, crawl a root web address while restricting the crawl to an allowed prefix:
+## Web Ingestion
+Biblicus supports ingesting content directly from the web using two approaches.
+### Ingest from URLs
+Ingest individual documents or web pages from URLs. The `ingest` command automatically detects content types including PDF, HTML, Markdown, images, and audio:
+```bash
+# Ingest a document from a URL
+biblicus ingest https://example.com/document.pdf --tags "research"
+# Ingest a web page
+biblicus ingest https://example.com/article.html --tags "article"
+# Ingest with a corpus path specified
+biblicus ingest --corpus corpora/example https://docs.example.com/guide.md --tags "documentation"
 ```
-biblicus crawl --corpus corpora/example \\
-  --root-url https://example.com/docs/index.html \\
-  --allowed-prefix https://example.com/docs/ \\
-  --max-items 50 \\
-  --tag crawled
+### Crawl Websites
+Crawl entire website sections with automatic link discovery. The crawler follows links within the allowed prefix and stores discovered content:
+```bash
+# Crawl a documentation site
+biblicus crawl \
+  --corpus corpora/example \
+  --root-url https://docs.example.com/ \
+  --allowed-prefix https://docs.example.com/ \
+  --max-items 100 \
+  --tags "documentation"
+# Crawl a specific blog category
+biblicus crawl \
+  --corpus corpora/example \
+  --root-url https://blog.example.com/category/tutorials/ \
+  --allowed-prefix https://blog.example.com/category/tutorials/ \
+  --max-items 50 \
+  --tags "tutorials,blog"
 ```
+The `--allowed-prefix` parameter restricts the crawler to only follow links that start with the specified URL prefix, preventing it from crawling outside the intended scope. The crawler respects `.biblicusignore` rules and stores items under `raw/imports/crawl/` in your corpus.
 ## End-to-end example: lower-level control
 The command-line interface returns JavaScript Object Notation by default. This makes it easy to use Biblicus in scripts and to treat retrieval as a deterministic, testable step.
@@ -237,7 +293,7 @@ for note_title, note_text in notes:
 backend = get_backend("scan")
 run = backend.build_run(corpus, recipe_name="Story demo", config={})
-budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=None)
+budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
 result = backend.query(
     corpus,
     run=run,
@@ -277,7 +333,7 @@ Example output:
   "query_text": "Primary button style preference",
   "budget": {
     "max_total_items": 5,
-    "max_total_characters": 2000,
+    "maximum_total_characters": 2000,
     "max_items_per_source": null
   },
   "run_id": "RUN_ID",
@@ -490,7 +546,7 @@ Three backends are included.
 - `scan` is a minimal baseline that scans raw items directly.
 - `sqlite-full-text-search` is a practical baseline that builds a full text search index in SQLite.
-- `vector` is a deterministic term-frequency vector baseline with cosine similarity scoring.
+- `tf-vector` is a deterministic term-frequency vector baseline with cosine similarity scoring.
 For detailed documentation including configuration options, performance characteristics, and usage examples, see the [Backend Reference][backend-reference].
@@ -498,7 +554,8 @@ For detailed documentation including configuration options, performance characte
 For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
 (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
-and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`.
+and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
+script (`scripts/retrieval_evaluation_lab.py`).
 ## Extraction backends
@@ -539,6 +596,21 @@ For detailed documentation on all extractors, see the [Extractor Reference][extr
 For extraction evaluation workflows, dataset formats, and report interpretation, see
 `docs/EXTRACTION_EVALUATION.md`.
+## Text extract utility
+Text extract is a reusable analysis utility that lets a model insert XML tags into a long text without re-emitting the
+entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
+analysis.
+See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
+## Text slice utility
+Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
+re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
+See `docs/TEXT_SLICE.md` for the utility API and examples.
 ## Topic modeling analysis
 Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
@@ -593,7 +665,7 @@ AG News integration runs require `biblicus[datasets]` in addition to `biblicus[t
 For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
 ```
-python3 scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
+python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
 See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
@@ -607,13 +679,13 @@ Use `scripts/download_pdf_samples.py` to download a small Portable Document Form
 ## Tests and coverage
 ```
-python3 scripts/test.py
+python scripts/test.py
 ```
 To include integration scenarios that download public test data at runtime, run this command.
 ```
-python3 scripts/test.py --integration
+python scripts/test.py --integration
 ```
 ## Releases
@@ -631,13 +703,13 @@ Reference documentation is generated from Sphinx style docstrings.
 Install development dependencies:
 ```
-python3 -m pip install -e ".[dev]"
+python -m pip install -e ".[dev]"
 ```
 Build the documentation:
 ```
-python3 -m sphinx -b html docs docs/_build/html
+python -m sphinx -b html docs docs/_build/html
 ```
 ## License

{biblicus-0.13.0 → biblicus-1.0.0}/README.md RENAMED

@@ -4,18 +4,33 @@
 ![Coverage][coverage-badge]
 ![Documentation][documentation-badge]
-Make your documents usable by your assistant, then decide later how you will search and retrieve them.
+<p>
+  <img
+    src="docs/_static/Biblicus-logo.png"
+    alt="Biblicus logo"
+    align="right"
+    width="216"
+  />
+  Make your documents usable by your assistant, then decide later how you will search and retrieve them.
+</p>
 If you are building an assistant in Python, you probably have material you want it to use: notes, documents, web pages, and reference files. A common approach is retrieval augmented generation, where a system retrieves relevant material and uses it as evidence when generating a response.
 The first practical problem is not retrieval. It is collection and care. You need a stable place to put raw items, you need a small amount of metadata so you can find them again, and you need a way to evolve your retrieval approach over time without rewriting ingestion.
-This library gives you a corpus, which is a normal folder on disk. It stores each ingested item as a file, with optional metadata stored next to it. You can open and inspect the raw files directly. Any derived catalog or index can be rebuilt from the raw corpus.
+Biblicus gives you a normal folder on disk to manage. In Biblicus documentation, that managed folder is called a *corpus* (plural: *corpora*). It stores each ingested item as a file, with optional metadata stored next to it. You can open and inspect the raw files directly. Any derived catalog or index can be rebuilt from the raw files.
 It can be used alongside LangGraph, Tactus, Pydantic AI, any agent framework, or your own setup. Use it from Python or from the command line interface.
 See [retrieval augmented generation overview] for a short introduction to the idea.
+## Analysis highlights
+- `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
+- YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+- Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
+- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
+- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
 ## Start with a knowledge base
 If you just want to hand a folder to your assistant and move on, use the high-level knowledge base interface. The folder can be nothing more than a handful of plain text files. You are not choosing a retrieval strategy yet. You are just collecting.
@@ -60,7 +75,7 @@ Think in three stages.
 If you learn a few project words, the rest of the system becomes predictable.
-- Corpus is the folder that holds raw items and their metadata.
+- Corpus is the managed folder that holds raw items and their metadata.
 - Item is the raw bytes plus optional metadata and source information.
 - Catalog is the rebuildable index of the corpus.
 - Extraction run is a recorded extraction build that produces text artifacts.
@@ -115,28 +130,28 @@ sequenceDiagram
 This repository is a working Python package. Install it into a virtual environment from the repository root.
 ```
-python3 -m pip install -e .
+python -m pip install -e .
 ```
 After the first release, you can install it from Python Package Index.
 ```
-python3 -m pip install biblicus
+python -m pip install biblicus
 ```
 ### Optional extras
 Some extractors are optional so the base install stays small.
-- Optical character recognition for images: `python3 -m pip install "biblicus[ocr]"`
-- Advanced optical character recognition with PaddleOCR: `python3 -m pip install "biblicus[paddleocr]"`
-- Document understanding with Docling VLM: `python3 -m pip install "biblicus[docling]"`
-- Document understanding with Docling VLM and MLX acceleration: `python3 -m pip install "biblicus[docling-mlx]"`
-- Speech to text transcription with OpenAI: `python3 -m pip install "biblicus[openai]"` (requires an OpenAI API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
-- Speech to text transcription with Deepgram: `python3 -m pip install "biblicus[deepgram]"` (requires a Deepgram API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
-- Broad document parsing fallback: `python3 -m pip install "biblicus[unstructured]"`
-- MarkItDown document conversion (requires Python 3.10 or higher): `python3 -m pip install "biblicus[markitdown]"`
-- Topic modeling analysis with BERTopic: `python3 -m pip install "biblicus[topic-modeling]"`
+- Optical character recognition for images: `python -m pip install "biblicus[ocr]"`
+- Advanced optical character recognition with PaddleOCR: `python -m pip install "biblicus[paddleocr]"`
+- Document understanding with Docling VLM: `python -m pip install "biblicus[docling]"`
+- Document understanding with Docling VLM and MLX acceleration: `python -m pip install "biblicus[docling-mlx]"`
+- Speech to text transcription with OpenAI: `python -m pip install "biblicus[openai]"` (requires an OpenAI API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
+- Speech to text transcription with Deepgram: `python -m pip install "biblicus[deepgram]"` (requires a Deepgram API key in `~/.biblicus/config.yml` or `./.biblicus/config.yml`)
+- Broad document parsing fallback: `python -m pip install "biblicus[unstructured]"`
+- MarkItDown document conversion (requires Python 3.10 or higher): `python -m pip install "biblicus[markitdown]"`
+- Topic modeling analysis with BERTopic: `python -m pip install "biblicus[topic-modeling]"`
 ## Quick start
@@ -154,16 +169,49 @@ biblicus build --corpus corpora/example --backend scan
 biblicus query --corpus corpora/example --query "note"
 ```
-If you want to turn a website section into corpus items, crawl a root web address while restricting the crawl to an allowed prefix:
+## Web Ingestion
+Biblicus supports ingesting content directly from the web using two approaches.
+### Ingest from URLs
+Ingest individual documents or web pages from URLs. The `ingest` command automatically detects content types including PDF, HTML, Markdown, images, and audio:
+```bash
+# Ingest a document from a URL
+biblicus ingest https://example.com/document.pdf --tags "research"
+# Ingest a web page
+biblicus ingest https://example.com/article.html --tags "article"
+# Ingest with a corpus path specified
+biblicus ingest --corpus corpora/example https://docs.example.com/guide.md --tags "documentation"
 ```
-biblicus crawl --corpus corpora/example \\
-  --root-url https://example.com/docs/index.html \\
-  --allowed-prefix https://example.com/docs/ \\
-  --max-items 50 \\
-  --tag crawled
+### Crawl Websites
+Crawl entire website sections with automatic link discovery. The crawler follows links within the allowed prefix and stores discovered content:
+```bash
+# Crawl a documentation site
+biblicus crawl \
+  --corpus corpora/example \
+  --root-url https://docs.example.com/ \
+  --allowed-prefix https://docs.example.com/ \
+  --max-items 100 \
+  --tags "documentation"
+# Crawl a specific blog category
+biblicus crawl \
+  --corpus corpora/example \
+  --root-url https://blog.example.com/category/tutorials/ \
+  --allowed-prefix https://blog.example.com/category/tutorials/ \
+  --max-items 50 \
+  --tags "tutorials,blog"
 ```
+The `--allowed-prefix` parameter restricts the crawler to only follow links that start with the specified URL prefix, preventing it from crawling outside the intended scope. The crawler respects `.biblicusignore` rules and stores items under `raw/imports/crawl/` in your corpus.
 ## End-to-end example: lower-level control
 The command-line interface returns JavaScript Object Notation by default. This makes it easy to use Biblicus in scripts and to treat retrieval as a deterministic, testable step.
@@ -191,7 +239,7 @@ for note_title, note_text in notes:
 backend = get_backend("scan")
 run = backend.build_run(corpus, recipe_name="Story demo", config={})
-budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=None)
+budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
 result = backend.query(
     corpus,
     run=run,
@@ -231,7 +279,7 @@ Example output:
   "query_text": "Primary button style preference",
   "budget": {
     "max_total_items": 5,
-    "max_total_characters": 2000,
+    "maximum_total_characters": 2000,
     "max_items_per_source": null
   },
   "run_id": "RUN_ID",
@@ -444,7 +492,7 @@ Three backends are included.
 - `scan` is a minimal baseline that scans raw items directly.
 - `sqlite-full-text-search` is a practical baseline that builds a full text search index in SQLite.
-- `vector` is a deterministic term-frequency vector baseline with cosine similarity scoring.
+- `tf-vector` is a deterministic term-frequency vector baseline with cosine similarity scoring.
 For detailed documentation including configuration options, performance characteristics, and usage examples, see the [Backend Reference][backend-reference].
@@ -452,7 +500,8 @@ For detailed documentation including configuration options, performance characte
 For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
 (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
-and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`.
+and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
+script (`scripts/retrieval_evaluation_lab.py`).
 ## Extraction backends
@@ -493,6 +542,21 @@ For detailed documentation on all extractors, see the [Extractor Reference][extr
 For extraction evaluation workflows, dataset formats, and report interpretation, see
 `docs/EXTRACTION_EVALUATION.md`.
+## Text extract utility
+Text extract is a reusable analysis utility that lets a model insert XML tags into a long text without re-emitting the
+entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
+analysis.
+See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
+## Text slice utility
+Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
+re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
+See `docs/TEXT_SLICE.md` for the utility API and examples.
 ## Topic modeling analysis
 Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
@@ -547,7 +611,7 @@ AG News integration runs require `biblicus[datasets]` in addition to `biblicus[t
 For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
 ```
-python3 scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
+python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
 See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
@@ -561,13 +625,13 @@ Use `scripts/download_pdf_samples.py` to download a small Portable Document Form
 ## Tests and coverage
 ```
-python3 scripts/test.py
+python scripts/test.py
 ```
 To include integration scenarios that download public test data at runtime, run this command.
 ```
-python3 scripts/test.py --integration
+python scripts/test.py --integration
 ```
 ## Releases
@@ -585,13 +649,13 @@ Reference documentation is generated from Sphinx style docstrings.
 Install development dependencies:
 ```
-python3 -m pip install -e ".[dev]"
+python -m pip install -e ".[dev]"
 ```
 Build the documentation:
 ```
-python3 -m sphinx -b html docs docs/_build/html
+python -m sphinx -b html docs docs/_build/html
 ```
 ## License

biblicus-1.0.0/datasets/retrieval_lab/labels.json ADDED

@@ -0,0 +1,25 @@
+{
+  "schema_version": 1,
+  "name": "retrieval-evaluation-lab",
+  "description": "Bundled labels for the retrieval evaluation lab.",
+  "queries": [
+    {
+      "query_id": "q1",
+      "query_text": "alpha unique",
+      "expected_filename": "alpha.txt",
+      "kind": "gold"
+    },
+    {
+      "query_id": "q2",
+      "query_text": "beta unique",
+      "expected_filename": "beta.txt",
+      "kind": "gold"
+    },
+    {
+      "query_id": "q3",
+      "query_text": "gamma unique",
+      "expected_filename": "gamma.txt",
+      "kind": "gold"
+    }
+  ]
+}
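
The added labels file can be sanity-checked with a short script. This is a minimal sketch that assumes only the fields visible in the diff above (`queries`, `query_id`, `query_text`, `expected_filename`, `kind`). The helpers `load_queries` and `hit_rate` and the sample `retrieved` mapping are hypothetical illustrations, not part of the Biblicus API, and the script does not reproduce what `scripts/retrieval_evaluation_lab.py` actually does.

```python
import json

# Copy of the labels added in datasets/retrieval_lab/labels.json.
LABELS_JSON = """
{
  "schema_version": 1,
  "name": "retrieval-evaluation-lab",
  "description": "Bundled labels for the retrieval evaluation lab.",
  "queries": [
    {"query_id": "q1", "query_text": "alpha unique", "expected_filename": "alpha.txt", "kind": "gold"},
    {"query_id": "q2", "query_text": "beta unique", "expected_filename": "beta.txt", "kind": "gold"},
    {"query_id": "q3", "query_text": "gamma unique", "expected_filename": "gamma.txt", "kind": "gold"}
  ]
}
"""

REQUIRED_FIELDS = {"query_id", "query_text", "expected_filename", "kind"}


def load_queries(text):
    """Parse the labels document and return its query records (hypothetical helper)."""
    data = json.loads(text)
    for query in data["queries"]:
        missing = REQUIRED_FIELDS - query.keys()
        if missing:
            raise ValueError(f"query {query.get('query_id')} is missing {sorted(missing)}")
    return data["queries"]


def hit_rate(queries, retrieved):
    """Fraction of queries whose expected file appears among the retrieved filenames."""
    hits = sum(
        1
        for query in queries
        if query["expected_filename"] in retrieved.get(query["query_id"], [])
    )
    return hits / len(queries)


queries = load_queries(LABELS_JSON)

# Made-up retrieval results keyed by query_id, for illustration only.
retrieved = {
    "q1": ["alpha.txt"],
    "q2": ["gamma.txt", "beta.txt"],
    "q3": ["alpha.txt"],
}
print(f"hit rate: {hit_rate(queries, retrieved):.2f}")
```

In a real evaluation the `retrieved` mapping would come from a retrieval backend's query results rather than a hand-written dictionary.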
