PyPI - biblicus - Versions diffs - 0.9.0__tar.gz → 0.11.0__tar.gz - Mend

biblicus 0.9.0__tar.gz → 0.11.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (234)

{biblicus-0.9.0/src/biblicus.egg-info → biblicus-0.11.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 0.9.0
+Version: 0.11.0
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -493,6 +493,12 @@ Two backends are included.
 For detailed documentation including configuration options, performance characteristics, and usage examples, see the [Backend Reference][backend-reference].
+## Retrieval documentation
+For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
+(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
+and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`.
 ## Extraction backends
 These extractors are built in. Optional ones require extra dependencies. See [text extraction documentation][text-extraction] for details.
@@ -531,12 +537,13 @@ For detailed documentation on all extractors, see the [Extractor Reference][extr
 ## Topic modeling analysis
-Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Topic modeling is the first
-analysis backend. It reads an extraction run, optionally applies an LLM-driven extraction pass, applies lexical
-processing, runs BERTopic, and optionally applies an LLM fine-tuning pass to label topics. The output is structured
-JavaScript Object Notation.
+Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
+are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
+an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview and `docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
+`docs/TOPIC_MODELING.md` for topic modeling details.
 Run a topic analysis using a recipe file:

{biblicus-0.9.0 → biblicus-0.11.0}/README.md RENAMED

@@ -447,6 +447,12 @@ Two backends are included.
 For detailed documentation including configuration options, performance characteristics, and usage examples, see the [Backend Reference][backend-reference].
+## Retrieval documentation
+For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
+(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
+and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`.
 ## Extraction backends
 These extractors are built in. Optional ones require extra dependencies. See [text extraction documentation][text-extraction] for details.
@@ -485,12 +491,13 @@ For detailed documentation on all extractors, see the [Extractor Reference][extr
 ## Topic modeling analysis
-Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Topic modeling is the first
-analysis backend. It reads an extraction run, optionally applies an LLM-driven extraction pass, applies lexical
-processing, runs BERTopic, and optionally applies an LLM fine-tuning pass to label topics. The output is structured
-JavaScript Object Notation.
+Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
+are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
+an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview and `docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
+`docs/TOPIC_MODELING.md` for topic modeling details.
 Run a topic analysis using a recipe file:

{biblicus-0.9.0 → biblicus-0.11.0}/docs/ANALYSIS.md RENAMED

@@ -34,3 +34,14 @@ python3 scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --fo
 The command prints the analysis run identifier and the output path. Open the resulting `output.json` to inspect per-topic
 labels, keywords, and document examples.
+## Profiling analysis
+Profiling is the baseline analysis backend. It summarizes corpus composition and extraction coverage using
+deterministic counts and distribution metrics. See `docs/PROFILING.md` for the full reference and working demo.
+Run profiling from the CLI:
+```
+biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
+```

{biblicus-0.9.0 → biblicus-0.11.0}/docs/ARCHITECTURE.md RENAMED

@@ -88,11 +88,11 @@ Evidence is the canonical output of retrieval. Required fields:
 ### Integration boundary
 - Biblicus can integrate with Tactus as a **Model Context Protocol toolset**, for example with tool names such as `knowledge_base_ingest`, `knowledge_base_query`, and `knowledge_base_stats`.
-- We will **not** add a knowledge base or retrieval augmented generation language primitive in version zero. Revisit only if we need semantics that tools cannot express cleanly, such as enforceable policy boundaries, runtime managed durability, caching hooks, or guaranteed instrumentation.
+- We do **not** add a knowledge base or retrieval augmented generation language primitive in version zero. Revisit only if we need semantics that tools cannot express cleanly, such as enforceable policy boundaries, runtime managed durability, caching hooks, or guaranteed instrumentation.
 ### Interface packaging
-- The knowledge base interface is a **small protocol and reference implementation**, including tool schemas and a reference Model Context Protocol server. We will not build a full managed service in version zero.
+- The knowledge base interface is a **small protocol and reference implementation**, including tool schemas and a reference Model Context Protocol server. We do not build a full managed service in version zero.
 ### Corpus identity and layout
@@ -143,7 +143,7 @@ The interface stays the same; topology is configuration.
 - When a backend produces persisted materializations, Biblicus treats them as **versioned build runs** identified by `run_id` (rather than overwriting in place by default).
 - Manifests exist even for just-in-time backends (materializations may be empty).
 - Full directed acyclic graph lineage is not included in version zero; revisit only if needed.
-- Future (optional): define **shared materialization formats** (canonical chunk and embedding stores) so multiple backends can reuse intermediates when it makes sense; keep it opt-in.
+- Optional: define **shared materialization formats** (canonical chunk and embedding stores) so multiple backends can reuse intermediates when it makes sense; keep it opt-in.
 ### Evaluation
@@ -156,7 +156,7 @@ The interface stays the same; topology is configuration.
 - The corpus catalog is **file-based** (committable, portable, backend-agnostic) so any backend/tool can consume it without requiring a database engine.
 - Canonical version zero format is a single JavaScript Object Notation file at `.biblicus/catalog.json`, written atomically (temporary file and rename) on updates.
 - The catalog includes `latest_run_id` and run manifests are stored at `.biblicus/runs/<run_id>.json`.
-- If this ever becomes a bottleneck at very large scales, we will **change the specification** (bump `schema_version`) rather than introduce multiple “supported” catalog storage modes.
+- If this becomes a bottleneck at very large scales, we **change the specification** (bump `schema_version`) rather than introduce multiple “supported” catalog storage modes.
 ## Near-term deliverables
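
Illustrative sketch, not part of the package diff: the catalog notes above describe writing `.biblicus/catalog.json` atomically with a temporary file and a rename. One common way to do that in Python is shown below. The function name `write_catalog_atomically` and the example catalog fields other than `latest_run_id` and `schema_version` are assumptions for illustration; the actual Biblicus implementation may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def write_catalog_atomically(catalog_path: Path, catalog: dict) -> None:
    """Write JSON to a temporary file next to the target, then rename over it."""
    catalog_path.parent.mkdir(parents=True, exist_ok=True)
    # The temporary file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=catalog_path.parent, prefix=".catalog-", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(catalog, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp_name, catalog_path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Readers of the catalog then see either the previous complete file or the new complete file, never a partially written one.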

{biblicus-0.9.0 → biblicus-0.11.0}/docs/CORPUS_DESIGN.md RENAMED

@@ -216,7 +216,7 @@ Version zero locked this as policy. A prune workflow was not implemented yet.
 Goal: retain derived artifacts from multiple implementations side by side so a user can compare results and switch between implementations without losing work.
-This decision applies to extraction plugins and retrieval backends, and to any future plugin type that produces derived artifacts.
+This decision applies to extraction plugins and retrieval backends, and to any plugin type that produces derived artifacts.
 Option A: store artifacts under the corpus, partitioned by plugin type
@@ -369,7 +369,7 @@ Version zero implemented option A by writing structured log entries for hook exe
 ## Outcomes and remaining questions
-The hook protocol and hook logging policy above were implemented in version zero. This section records what was implemented, plus the questions that remain for future iterations.
+The hook protocol and hook logging policy above were implemented in version zero. This section records what was implemented and the open questions tracked for later iterations.
 ### Hook contexts implemented in version zero

{biblicus-0.9.0 → biblicus-0.11.0}/docs/DEMOS.md RENAMED

@@ -6,7 +6,7 @@ For the ordered plan of what to build next, see `docs/ROADMAP.md`.
 ## Diagram of the current system and the next layers
-Blue boxes are implemented now. Purple boxes are planned next layers that we can build and compare.
+Blue boxes are implemented now. Purple boxes are layers not implemented yet that we can build and compare.
 ```mermaid
 %%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
@@ -214,6 +214,14 @@ python3 scripts/topic_modeling_integration.py \
 The command prints the analysis run identifier and the output path. Open the `output.json` file to inspect per-topic labels,
 keywords, and document examples.
+### Profiling analysis demo
+The profiling demo downloads AG News, runs extraction, and produces a profiling report.
+```
+python3 scripts/profiling_demo.py --corpus corpora/profiling_demo --force
+```
 ### Select extracted text within a pipeline
 When you want an explicit choice among multiple extraction outputs, add a selection extractor step at the end of the pipeline.
@@ -225,7 +233,7 @@ python3 -m biblicus extract build --corpus corpora/demo \\
   --step select-text
 ```
-Copy the `run_id` from the JavaScript Object Notation output. You will use it as `EXTRACTION_RUN_ID` in the next command.
+Copy the `run_id` from the JavaScript Object Notation output. Use it as `EXTRACTION_RUN_ID` in the next command.
 ```
 python3 -m biblicus build --corpus corpora/demo --backend sqlite-full-text-search \\
@@ -243,7 +251,7 @@ python3 scripts/download_pdf_samples.py --corpus corpora/pdf_samples --force
 python3 -m biblicus extract build --corpus corpora/pdf_samples --step pdf-text
 ```
-Copy the `run_id` from the JavaScript Object Notation output. You will use it as `PDF_EXTRACTION_RUN_ID` in the next command.
+Copy the `run_id` from the JavaScript Object Notation output. Use it as `PDF_EXTRACTION_RUN_ID` in the next command.
 ```
 python3 -m biblicus build --corpus corpora/pdf_samples --backend sqlite-full-text-search --config extraction_run=pipeline:PDF_EXTRACTION_RUN_ID --config chunk_size=200 --config chunk_overlap=50 --config snippet_characters=120

biblicus-0.11.0/docs/PROFILING.md ADDED

@@ -0,0 +1,98 @@
+# Corpus profiling analysis
+Biblicus provides a profiling analysis backend that summarizes corpus contents using deterministic counts and
+coverage metrics. Profiling is intended as a fast, local baseline before heavier analysis such as topic modeling.
+## What profiling does
+The profiling analysis reports:
+- Total item count and media type distribution
+- Extracted text coverage (present, empty, missing)
+- Size and length distributions with percentiles
+- Tag coverage and top tags
+The output is structured JSON that can be stored, versioned, and compared across runs.
+## Run profiling from the CLI
+```
+biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
+```
+If you omit `--extraction-run`, Biblicus uses the latest extraction run and emits a reproducibility warning.
+To customize profiling metrics, pass a recipe file:
+```
+biblicus analyze profile --corpus corpora/example --recipe recipes/profiling.yml --extraction-run pipeline:RUN_ID
+```
+### Profiling recipe configuration
+Profiling recipes use the analysis schema version and accept these fields:
+- `schema_version`: analysis schema version, currently `1`
+- `sample_size`: optional cap for distribution calculations
+- `min_text_characters`: minimum extracted text length for inclusion
+- `percentiles`: percentiles to compute for size and length distributions
+- `top_tag_count`: maximum number of tags to list in `top_tags`
+- `tag_filters`: optional list of tags to include in tag coverage metrics
+Example recipe:
+```
+schema_version: 1
+sample_size: 500
+min_text_characters: 50
+percentiles: [50, 90, 99]
+top_tag_count: 10
+tag_filters: ["ag_news", "label:World"]
+```
+## Run profiling from Python
+```
+from pathlib import Path
+from biblicus.analysis import get_analysis_backend
+from biblicus.corpus import Corpus
+from biblicus.models import ExtractionRunReference
+corpus = Corpus.open(Path("corpora/example"))
+backend = get_analysis_backend("profiling")
+output = backend.run_analysis(
+    corpus,
+    recipe_name="default",
+    config={
+        "schema_version": 1,
+        "sample_size": 500,
+        "min_text_characters": 50,
+        "percentiles": [50, 90, 99],
+        "top_tag_count": 10,
+        "tag_filters": ["ag_news"],
+    },
+    extraction_run=ExtractionRunReference(
+        extractor_id="pipeline",
+        run_id="RUN_ID",
+    ),
+)
+print(output.model_dump())
+```
+## Output location
+Profiling output is stored under:
+```
+.biblicus/runs/analysis/profiling/<run_id>/output.json
+```
+## Working demo
+A runnable demo is provided in `scripts/profiling_demo.py`. It downloads a corpus, runs extraction, and executes the
+profiling analysis so you can inspect the output:
+```
+python3 scripts/profiling_demo.py --corpus corpora/profiling_demo --force
+```
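
Illustrative sketch, not part of the package diff: the profiling document reports size and length distributions with percentiles, and the feature file in this release states that the percentile helper returns 0 for empty input. The sketch below uses the nearest-rank method, which is an assumption; the name `percentile` and the sample lengths are hypothetical, and Biblicus may compute percentiles differently.

```python
from typing import Dict, Sequence


def percentile(values: Sequence[int], percent: int) -> int:
    """Nearest-rank percentile over integer values; returns 0 for empty input."""
    if not values:
        return 0
    ordered = sorted(values)
    # Ceiling of percent * n / 100 using integer arithmetic, clamped to at least 1.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical extracted-text lengths for five items.
lengths = [120, 80, 300, 45, 210]
summary: Dict[str, int] = {f"p{p}": percentile(lengths, p) for p in (50, 90, 99)}
print(summary)
```

Integer arithmetic keeps the rank calculation free of floating-point rounding, which helps keep the report deterministic across platforms.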

biblicus-0.11.0/docs/RETRIEVAL.md ADDED

@@ -0,0 +1,47 @@
+# Retrieval
+Biblicus treats retrieval as a reproducible, explicit pipeline stage that transforms a corpus into structured evidence.
+Retrieval is separated from extraction and context shaping so each can be evaluated independently and swapped without
+rewriting ingestion.
+## Retrieval concepts
+- **Backend**: a pluggable retrieval implementation that can build and query runs.
+- **Run**: a recorded retrieval build for a corpus and extraction run.
+- **Evidence**: structured output containing identifiers, provenance, and scores.
+- **Stage**: explicit steps such as retrieve, rerank, and filter.
+## How retrieval runs work
+1) Ingest raw items into a corpus.
+2) Build an extraction run to produce text artifacts.
+3) Build a retrieval run with a backend, referencing the extraction run.
+4) Query the run to return evidence.
+Retrieval runs are stored under:
+```
+.biblicus/runs/retrieval/<backend_id>/<run_id>/
+```
+## Backends
+See `docs/backends/index.md` for backend selection and configuration.
+## Evaluation
+Retrieval runs are evaluated against datasets with explicit budgets. See `docs/RETRIEVAL_EVALUATION.md` for the
+dataset format and workflow, `docs/FEATURE_INDEX.md` for the behavior specifications, and `docs/CONTEXT_PACK.md` for
+how evidence feeds into context packs.
+## Why the separation matters
+Keeping extraction and retrieval distinct makes it possible to:
+- Reuse the same extracted artifacts across many retrieval backends.
+- Compare backends against the same corpus and dataset inputs.
+- Record and audit retrieval decisions without mixing in prompting or context formatting.
+## Retrieval quality
+For retrieval quality upgrades, see `docs/RETRIEVAL_QUALITY.md`.

biblicus-0.11.0/docs/RETRIEVAL_EVALUATION.md ADDED

@@ -0,0 +1,74 @@
+# Retrieval evaluation
+Biblicus evaluates retrieval runs against deterministic datasets so quality comparisons are repeatable across backends
+and corpora. Evaluations keep the evidence-first model intact by reporting per-query evidence alongside summary
+metrics.
+## Dataset format
+Retrieval datasets are stored as JavaScript Object Notation files with a strict schema:
+```json
+{
+  "schema_version": 1,
+  "name": "example-dataset",
+  "description": "Small hand-labeled dataset for smoke tests.",
+  "queries": [
+    {
+      "query_id": "q-001",
+      "query_text": "alpha",
+      "expected_item_id": "item-id-123",
+      "kind": "gold"
+    }
+  ]
+}
+```
+Each query includes either an `expected_item_id` or an `expected_source_uri`. The `kind` field records whether the
+query is hand-labeled (`gold`) or synthetic.
+## Running an evaluation
+Use the command-line interface to evaluate a retrieval run against a dataset:
+```bash
+biblicus eval --corpus corpora/example --run <run_id> --dataset datasets/retrieval.json \
+  --max-total-items 5 --max-total-characters 2000 --max-items-per-source 5
+```
+If `--run` is omitted, the latest retrieval run is used. Evaluations are deterministic for the same corpus, run, and
+budget.
+## Output
+The evaluation output includes:
+- Dataset metadata (name, description, query count).
+- Run metadata (backend ID, run ID, evaluation timestamp).
+- Metrics (hit rate, precision-at-k, mean reciprocal rank).
+- System diagnostics (latency percentiles and index size).
+The output is JavaScript Object Notation suitable for downstream reporting.
+## Python usage
+```python
+from pathlib import Path
+from biblicus.corpus import Corpus
+from biblicus.evaluation import evaluate_run, load_dataset
+from biblicus.models import QueryBudget
+corpus = Corpus.open("corpora/example")
+run = corpus.load_run("<run_id>")
+dataset = load_dataset(Path("datasets/retrieval.json"))
+budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=5)
+result = evaluate_run(corpus=corpus, run=run, dataset=dataset, budget=budget)
+print(result.model_dump_json(indent=2))
+```
+## Design notes
+- Evaluation is reproducible by construction: the run manifest, dataset, and budget fully determine the results.
+- The evaluation workflow expects retrieval stages to remain explicit in the run artifacts.
+- Reports are portable, so comparisons across backends and corpora are straightforward.
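
Illustrative sketch, not part of the package diff: the evaluation output lists hit rate, precision-at-k, and mean reciprocal rank. The sketch below computes the textbook versions of those metrics for queries that each have one expected item, as in the dataset format above. The function names and the sample rankings are hypothetical, and the exact definitions used by `biblicus.evaluation` are not shown in this diff.

```python
from typing import Dict, Sequence, Tuple


def reciprocal_rank(ranked_item_ids: Sequence[str], expected_item_id: str) -> float:
    """Return 1/position of the expected item, or 0.0 if it is absent."""
    for position, item_id in enumerate(ranked_item_ids, start=1):
        if item_id == expected_item_id:
            return 1.0 / position
    return 0.0


def summarize(results: Sequence[Tuple[Sequence[str], str]], k: int) -> Dict[str, float]:
    """Average hit rate, precision-at-k, and reciprocal rank over all queries."""
    if not results:
        return {"hit_rate": 0.0, "precision_at_k": 0.0, "mean_reciprocal_rank": 0.0}
    hits = 0
    precision_total = 0.0
    reciprocal_total = 0.0
    for ranked_item_ids, expected_item_id in results:
        top_k = list(ranked_item_ids[:k])
        if expected_item_id in top_k:
            hits += 1
        # With a single expected item, precision-at-k is (matches in top k) / k.
        precision_total += top_k.count(expected_item_id) / k
        reciprocal_total += reciprocal_rank(top_k, expected_item_id)
    count = len(results)
    return {
        "hit_rate": hits / count,
        "precision_at_k": precision_total / count,
        "mean_reciprocal_rank": reciprocal_total / count,
    }


# Hypothetical ranked results paired with each query's expected item identifier.
sample = [(["a", "b", "c"], "a"), (["x", "y", "z"], "z"), (["m", "n"], "q")]
metrics = summarize(sample, k=3)
print(metrics)
```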

biblicus-0.11.0/docs/RETRIEVAL_QUALITY.md ADDED

@@ -0,0 +1,42 @@
+# Retrieval quality upgrades
+This document describes the retrieval quality upgrades available in Biblicus. It is a reference for how retrieval
+quality is expressed in runs and should be read alongside `docs/ROADMAP.md`.
+## Goals
+- Improve relevance without losing determinism or reproducibility.
+- Keep retrieval stages explicit and visible in run artifacts.
+- Preserve the evidence-first output model.
+## Available upgrades
+### 1) Tuned lexical baseline
+- BM25-style scoring with configurable parameters.
+- N-gram range controls.
+- Stop word strategy per backend.
+- Field weighting (for example: title, body, metadata).
+### 2) Reranking stage
+- Optional rerank step that re-scores top-N candidates.
+- Deterministic scoring keeps rerank behavior reproducible.
+### 3) Hybrid retrieval
+- Combine lexical and embedding signals.
+- Expose fusion weights in the recipe schema.
+- Emit stage-level scores and weights in evidence metadata.
+## Evaluation guidance
+- Measure accuracy-at-k and compare against the same datasets.
+- Run artifacts capture each stage and configuration for auditability.
+- Deterministic settings remain available as the default baseline.
+## Non-goals
+- Automated hyperparameter tuning.
+- Hidden fallback stages that obscure retrieval behavior.
+- UI-driven tuning in this phase.
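
Illustrative sketch, not part of the package diff: the hybrid retrieval notes above call for combining lexical and embedding signals, exposing fusion weights, and emitting stage-level scores in evidence metadata. One simple way to do this is a weighted sum over min-max normalized scores, sketched below. The fusion method, the function names, and the evidence field names are all assumptions for illustration; the diff does not show how Biblicus fuses scores.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize scores into [0, 1]; constant inputs map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {item_id: 1.0 for item_id in scores}
    return {item_id: (value - low) / (high - low) for item_id, value in scores.items()}


def fuse(
    lexical: Dict[str, float],
    embedding: Dict[str, float],
    lexical_weight: float = 0.5,
    embedding_weight: float = 0.5,
) -> List[dict]:
    """Combine two score maps and keep per-stage scores and weights visible."""
    lexical_norm = normalize(lexical)
    embedding_norm = normalize(embedding)
    evidence = []
    for item_id in set(lexical_norm) | set(embedding_norm):
        stage_scores = {
            "lexical": lexical_norm.get(item_id, 0.0),
            "embedding": embedding_norm.get(item_id, 0.0),
        }
        score = (
            lexical_weight * stage_scores["lexical"]
            + embedding_weight * stage_scores["embedding"]
        )
        evidence.append(
            {
                "item_id": item_id,
                "score": score,
                "stage_scores": stage_scores,
                "weights": {"lexical": lexical_weight, "embedding": embedding_weight},
            }
        )
    # Sort by score, then by identifier, so ties resolve deterministically.
    evidence.sort(key=lambda entry: (-entry["score"], entry["item_id"]))
    return evidence


fused = fuse({"a": 10.0, "b": 5.0, "c": 0.0}, {"b": 0.9, "c": 0.8, "d": 0.1})
print([entry["item_id"] for entry in fused])
```

The explicit tie-break on `item_id` is what keeps the ordering reproducible when two items end up with the same fused score.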

{biblicus-0.9.0 → biblicus-0.11.0}/docs/ROADMAP.md RENAMED

@@ -31,6 +31,21 @@ Acceptance checks:
 - Dataset formats are versioned when they change.
 - Reports remain deterministic for the same inputs.
+## Next: retrieval quality upgrades
+Goal: make retrieval relevance stronger while keeping deterministic baselines and clear evaluation.
+Deliverables:
+- A tuned lexical baseline (for example: BM25 configuration, n-grams, field weighting, stop word controls).
+- A reranking stage that can refine top-N results with either a cross-encoder or an LLM re-ranker.
+- A hybrid retrieval mode that combines lexical signals with embeddings and exposes weights explicitly.
+Acceptance checks:
+- Accuracy-at-k improves on the same evaluation datasets without regressions in determinism.
+- Retrieval stages are explicitly recorded (retrieve, rerank, filter) in the output artifacts.
 ## Next: context pack policy surfaces
 Goal: make context shaping policies easier to evaluate and swap.
@@ -67,7 +82,6 @@ Goal: provide lightweight analysis utilities that summarize corpus themes and gu
 Deliverables:
-- Basic data profiling reports (counts, media types, size distributions, tag coverage).
 - Hidden Markov modeling analysis for sequence-driven corpora.
 - A way to compare analysis outputs across corpora or corpus snapshots.

{biblicus-0.9.0 → biblicus-0.11.0}/docs/conf.py RENAMED

@@ -4,8 +4,13 @@ Sphinx configuration for Biblicus documentation.
 from __future__ import annotations
+import os
+import sys
 from pathlib import Path
+from pygments.lexers.special import TextLexer
+from sphinx.highlighting import lexers
 PROJECT_ROOT = Path(__file__).resolve().parent.parent
 SOURCE_ROOT = PROJECT_ROOT / "src"
@@ -31,8 +36,6 @@ html_theme_options = {
 }
 # ReadTheDocs integration - canonical URL for SEO
-import os
 if os.environ.get("READTHEDOCS"):
     rtd_version = os.environ.get("READTHEDOCS_VERSION", "latest")
     rtd_project = os.environ.get("READTHEDOCS_PROJECT", "biblicus")
@@ -44,12 +47,6 @@ source_suffix = {
 }
 suppress_warnings = ["misc.highlighting_failure"]
-import sys
 sys.path.insert(0, str(SOURCE_ROOT))
-from pygments.lexers.special import TextLexer
-from sphinx.highlighting import lexers
 lexers["mermaid"] = TextLexer()

{biblicus-0.9.0 → biblicus-0.11.0}/docs/extractors/text-document/pass-through.md RENAMED

@@ -120,12 +120,12 @@ title: My Document
 tags: [note, draft]
 ---
-This is the body content that will be extracted.
+This is the body content that is extracted.
 ```
 Output text:
 ```
-This is the body content that will be extracted.
+This is the body content that is extracted.
 ```
 ### Mixed Format Pipeline
@@ -185,7 +185,7 @@ Non-text items are silently skipped (returns `None`). This allows the extractor
 ### Encoding Errors
-UTF-8 decoding errors will cause per-item failures recorded in `errored_items` but won't halt the entire extraction run.
+UTF-8 decoding errors cause per-item failures recorded in `errored_items` but do not halt the entire extraction run.
 ### Missing Files

{biblicus-0.9.0 → biblicus-0.11.0}/docs/extractors/text-document/unstructured.md RENAMED

@@ -78,7 +78,7 @@ class UnstructuredExtractorConfig(BaseModel):
 ### Configuration Options
-This extractor currently accepts no configuration. Future versions may expose Unstructured library options.
+This extractor currently accepts no configuration. Optional extensions may expose Unstructured library options.
 ## Usage

{biblicus-0.9.0 → biblicus-0.11.0}/docs/index.rst RENAMED

@@ -15,8 +15,12 @@ Contents
    KNOWLEDGE_BASE
    BACKENDS
    backends/index
+   RETRIEVAL
+   RETRIEVAL_QUALITY
+   RETRIEVAL_EVALUATION
    CONTEXT_PACK
    ANALYSIS
+   PROFILING
    TOPIC_MODELING
    DEMOS
    USER_CONFIGURATION

{biblicus-0.9.0 → biblicus-0.11.0}/features/analysis_schema.feature RENAMED

@@ -56,3 +56,55 @@ Feature: Analysis schema validation
     When I attempt to validate a vectorizer config with stop words "spanish"
     Then a model validation error is raised
     And the validation error mentions "vectorizer.stop_words must be"
+  Scenario: Profiling config rejects invalid sample size
+    When I attempt to validate a profiling config with sample size 0
+    Then a model validation error is raised
+    And the validation error mentions "sample_size"
+  Scenario: Profiling config rejects unsupported schema version
+    When I attempt to validate a profiling config with schema version 2
+    Then a model validation error is raised
+    And the validation error mentions "Unsupported analysis schema version"
+  Scenario: Profiling config rejects invalid percentiles
+    When I attempt to validate a profiling config with percentiles "0,101"
+    Then a model validation error is raised
+    And the validation error mentions "percentiles"
+  Scenario: Profiling config rejects empty percentiles
+    When I attempt to validate a profiling config with empty percentiles
+    Then a model validation error is raised
+    And the validation error mentions "percentiles"
+  Scenario: Profiling config rejects unsorted percentiles
+    When I attempt to validate a profiling config with percentiles "90,50"
+    Then a model validation error is raised
+    And the validation error mentions "percentiles"
+  Scenario: Profiling config rejects empty tag filters
+    When I attempt to validate a profiling config with tag filters "alpha,,beta"
+    Then a model validation error is raised
+    And the validation error mentions "tag_filters"
+  Scenario: Profiling config rejects non-list tag filters
+    When I attempt to validate a profiling config with tag filters string "alpha"
+    Then a model validation error is raised
+    And the validation error mentions "tag_filters"
+  Scenario: Profiling config accepts tag filters None
+    When I validate a profiling config with tag filters None
+    Then the profiling tag filters are absent
+  Scenario: Profiling config normalizes tag filters
+    When I validate a profiling config with tag filters list " alpha ,beta "
+    Then the profiling tag filters include "alpha"
+    And the profiling tag filters include "beta"
+  Scenario: Profiling ordering helper ignores missing items
+    When I order catalog items with missing entries
+    Then the ordered catalog item identifiers equal "a,c,b"
+  Scenario: Profiling percentile helper handles empty values
+    When I compute a profiling percentile on empty values
+    Then the profiling percentile value equals 0
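
Illustrative sketch, not part of the package diff: the scenarios above describe which profiling configurations are rejected or normalized. The standard-library sketch below mirrors those rules. The class name `ProfilingConfigSketch`, the error wording (apart from the quoted fragments the scenarios check for), and the accepted percentile range of 1 to 100 are assumptions; the real model lives in the Biblicus package and is not shown in this hunk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProfilingConfigSketch:
    """Hypothetical stand-in for the profiling recipe model."""

    schema_version: int = 1
    sample_size: Optional[int] = None
    percentiles: List[int] = field(default_factory=lambda: [50, 90, 99])
    tag_filters: Optional[List[str]] = None

    def __post_init__(self) -> None:
        if self.schema_version != 1:
            raise ValueError(f"Unsupported analysis schema version: {self.schema_version}")
        if self.sample_size is not None and self.sample_size <= 0:
            raise ValueError("sample_size must be greater than zero")
        if not self.percentiles:
            raise ValueError("percentiles must include at least one value")
        # Assumed bounds: the scenarios only show that 0 and 101 are rejected.
        if any(value < 1 or value > 100 for value in self.percentiles):
            raise ValueError("percentiles must be between 1 and 100")
        if self.percentiles != sorted(self.percentiles):
            raise ValueError("percentiles must be sorted in ascending order")
        if self.tag_filters is not None:
            if not isinstance(self.tag_filters, list):
                raise ValueError("tag_filters must be a list of strings")
            cleaned = [str(tag).strip() for tag in self.tag_filters]
            if any(not tag for tag in cleaned):
                raise ValueError("tag_filters must not contain empty entries")
            # Store the whitespace-trimmed tags.
            self.tag_filters = cleaned


config = ProfilingConfigSketch(sample_size=500, tag_filters=[" alpha ", "beta "])
print(config.tag_filters)
```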

{biblicus-0.9.0 → biblicus-0.11.0}/features/environment.py RENAMED

@@ -17,7 +17,6 @@ def _repo_root() -> Path:
     :return: Repository root path.
     :rtype: Path
     """
     return Path(__file__).resolve().parent.parent
@@ -32,7 +31,6 @@ def before_scenario(context, scenario) -> None:
     :return: None.
     :rtype: None
     """
     import biblicus.__main__ as _biblicus_main
     _ = _biblicus_main
@@ -74,7 +72,6 @@ def after_scenario(context, scenario) -> None:
     :return: None.
     :rtype: None
     """
     if getattr(context, "httpd", None) is not None:
         context.httpd.shutdown()
         context.httpd.server_close()
@@ -221,7 +218,9 @@ def after_scenario(context, scenario) -> None:
         context.fake_paddleocr_vl_behaviors.clear()
     if getattr(context, "_fake_paddleocr_installed", False):
         # Remove all paddle-related modules
-        paddle_module_names = [name for name in list(sys.modules.keys()) if "paddle" in name.lower()]
+        paddle_module_names = [
+            name for name in list(sys.modules.keys()) if "paddle" in name.lower()
+        ]
         for name in paddle_module_names:
             sys.modules.pop(name, None)
         # Restore original modules
@@ -345,7 +344,6 @@ def run_biblicus(
     :return: Captured execution result.
     :rtype: RunResult
     """
     import contextlib
     import io
