PyPI - biblicus - Versions diffs - 0.9.0__tar.gz → 0.10.0__tar.gz - Mend

biblicus 0.9.0__tar.gz → 0.10.0__tar.gz


Files changed (227)

{biblicus-0.9.0/src/biblicus.egg-info → biblicus-0.10.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 0.9.0
+Version: 0.10.0
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -531,12 +531,13 @@ For detailed documentation on all extractors, see the [Extractor Reference][extr
 ## Topic modeling analysis
-Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Topic modeling is the first
-analysis backend. It reads an extraction run, optionally applies an LLM-driven extraction pass, applies lexical
-processing, runs BERTopic, and optionally applies an LLM fine-tuning pass to label topics. The output is structured
-JavaScript Object Notation.
+Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
+are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
+an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview and `docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
+`docs/TOPIC_MODELING.md` for topic modeling details.
 Run a topic analysis using a recipe file:

{biblicus-0.9.0 → biblicus-0.10.0}/README.md RENAMED

@@ -485,12 +485,13 @@ For detailed documentation on all extractors, see the [Extractor Reference][extr
 ## Topic modeling analysis
-Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Topic modeling is the first
-analysis backend. It reads an extraction run, optionally applies an LLM-driven extraction pass, applies lexical
-processing, runs BERTopic, and optionally applies an LLM fine-tuning pass to label topics. The output is structured
-JavaScript Object Notation.
+Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
+are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
+an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview and `docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
+`docs/TOPIC_MODELING.md` for topic modeling details.
 Run a topic analysis using a recipe file:

{biblicus-0.9.0 → biblicus-0.10.0}/docs/ANALYSIS.md RENAMED

@@ -34,3 +34,14 @@ python3 scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --fo
 The command prints the analysis run identifier and the output path. Open the resulting `output.json` to inspect per-topic
 labels, keywords, and document examples.
+## Profiling analysis
+Profiling is the baseline analysis backend. It summarizes corpus composition and extraction coverage using
+deterministic counts and distribution metrics. See `docs/PROFILING.md` for the full reference and working demo.
+Run profiling from the CLI:
+```
+biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
+```

{biblicus-0.9.0 → biblicus-0.10.0}/docs/DEMOS.md RENAMED

@@ -214,6 +214,14 @@ python3 scripts/topic_modeling_integration.py \
 The command prints the analysis run identifier and the output path. Open the `output.json` file to inspect per-topic labels,
 keywords, and document examples.
+### Profiling analysis demo
+The profiling demo downloads AG News, runs extraction, and produces a profiling report.
+```
+python3 scripts/profiling_demo.py --corpus corpora/profiling_demo --force
+```
 ### Select extracted text within a pipeline
 When you want an explicit choice among multiple extraction outputs, add a selection extractor step at the end of the pipeline.

biblicus-0.10.0/docs/PROFILING.md ADDED

@@ -0,0 +1,98 @@
+# Corpus profiling analysis
+Biblicus provides a profiling analysis backend that summarizes corpus contents using deterministic counts and
+coverage metrics. Profiling is intended as a fast, local baseline before heavier analysis such as topic modeling.
+## What profiling does
+The profiling analysis reports:
+- Total item count and media type distribution
+- Extracted text coverage (present, empty, missing)
+- Size and length distributions with percentiles
+- Tag coverage and top tags
+The output is structured JSON that can be stored, versioned, and compared across runs.
+## Run profiling from the CLI
+```
+biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
+```
+If you omit `--extraction-run`, Biblicus uses the latest extraction run and emits a reproducibility warning.
+To customize profiling metrics, pass a recipe file:
+```
+biblicus analyze profile --corpus corpora/example --recipe recipes/profiling.yml --extraction-run pipeline:RUN_ID
+```
+### Profiling recipe configuration
+Profiling recipes use the analysis schema version and accept these fields:
+- `schema_version`: analysis schema version, currently `1`
+- `sample_size`: optional cap for distribution calculations
+- `min_text_characters`: minimum extracted text length for inclusion
+- `percentiles`: percentiles to compute for size and length distributions
+- `top_tag_count`: maximum number of tags to list in `top_tags`
+- `tag_filters`: optional list of tags to include in tag coverage metrics
+Example recipe:
+```
+schema_version: 1
+sample_size: 500
+min_text_characters: 50
+percentiles: [50, 90, 99]
+top_tag_count: 10
+tag_filters: ["ag_news", "label:World"]
+```
+## Run profiling from Python
+```
+from pathlib import Path
+from biblicus.analysis import get_analysis_backend
+from biblicus.corpus import Corpus
+from biblicus.models import ExtractionRunReference
+corpus = Corpus.open(Path("corpora/example"))
+backend = get_analysis_backend("profiling")
+output = backend.run_analysis(
+    corpus,
+    recipe_name="default",
+    config={
+        "schema_version": 1,
+        "sample_size": 500,
+        "min_text_characters": 50,
+        "percentiles": [50, 90, 99],
+        "top_tag_count": 10,
+        "tag_filters": ["ag_news"],
+    },
+    extraction_run=ExtractionRunReference(
+        extractor_id="pipeline",
+        run_id="RUN_ID",
+    ),
+)
+print(output.model_dump())
+```
+## Output location
+Profiling output is stored under:
+```
+.biblicus/runs/analysis/profiling/<run_id>/output.json
+```
+## Working demo
+A runnable demo is provided in `scripts/profiling_demo.py`. It downloads a corpus, runs extraction, and executes the
+profiling analysis so you can inspect the output:
+```
+python3 scripts/profiling_demo.py --corpus corpora/profiling_demo --force
+```
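
The new `docs/PROFILING.md` says the output is structured JSON that can be stored and compared across runs, and gives the path where it is written. As a minimal sketch of that workflow, the helpers below load one run's `output.json` and list the top-level keys that differ between two runs. The function names are hypothetical, the path is assumed to be relative to the corpus root, and nothing here relies on specific field names in the output, since the diff does not show them.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Sentinel so that "key absent" and "key present with value None" compare as different.
_MISSING = object()


def load_profiling_output(corpus_root: Path, run_id: str) -> Dict[str, Any]:
    """Load output.json for one profiling run.

    Assumption: the documented path is relative to the corpus root.
    """
    output_path = (
        corpus_root / ".biblicus" / "runs" / "analysis" / "profiling" / run_id / "output.json"
    )
    with output_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def changed_top_level_keys(first: Dict[str, Any], second: Dict[str, Any]) -> List[str]:
    """Return the sorted top-level keys whose values differ between two outputs."""
    keys = set(first) | set(second)
    return sorted(
        key for key in keys if first.get(key, _MISSING) != second.get(key, _MISSING)
    )
```

Comparing only top-level keys keeps the sketch independent of the report schema; a real comparison would likely descend into the nested sections.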

{biblicus-0.9.0 → biblicus-0.10.0}/docs/conf.py RENAMED

@@ -4,8 +4,13 @@ Sphinx configuration for Biblicus documentation.
 from __future__ import annotations
+import os
+import sys
 from pathlib import Path
+from pygments.lexers.special import TextLexer
+from sphinx.highlighting import lexers
 PROJECT_ROOT = Path(__file__).resolve().parent.parent
 SOURCE_ROOT = PROJECT_ROOT / "src"
@@ -31,8 +36,6 @@ html_theme_options = {
 }
 # ReadTheDocs integration - canonical URL for SEO
-import os
 if os.environ.get("READTHEDOCS"):
     rtd_version = os.environ.get("READTHEDOCS_VERSION", "latest")
     rtd_project = os.environ.get("READTHEDOCS_PROJECT", "biblicus")
@@ -44,12 +47,6 @@ source_suffix = {
 }
 suppress_warnings = ["misc.highlighting_failure"]
-import sys
 sys.path.insert(0, str(SOURCE_ROOT))
-from pygments.lexers.special import TextLexer
-from sphinx.highlighting import lexers
 lexers["mermaid"] = TextLexer()

{biblicus-0.9.0 → biblicus-0.10.0}/docs/index.rst RENAMED

@@ -17,6 +17,7 @@ Contents
    backends/index
    CONTEXT_PACK
    ANALYSIS
+   PROFILING
    TOPIC_MODELING
    DEMOS
    USER_CONFIGURATION

{biblicus-0.9.0 → biblicus-0.10.0}/features/analysis_schema.feature RENAMED

@@ -56,3 +56,55 @@ Feature: Analysis schema validation
     When I attempt to validate a vectorizer config with stop words "spanish"
     Then a model validation error is raised
     And the validation error mentions "vectorizer.stop_words must be"
+  Scenario: Profiling config rejects invalid sample size
+    When I attempt to validate a profiling config with sample size 0
+    Then a model validation error is raised
+    And the validation error mentions "sample_size"
+  Scenario: Profiling config rejects unsupported schema version
+    When I attempt to validate a profiling config with schema version 2
+    Then a model validation error is raised
+    And the validation error mentions "Unsupported analysis schema version"
+  Scenario: Profiling config rejects invalid percentiles
+    When I attempt to validate a profiling config with percentiles "0,101"
+    Then a model validation error is raised
+    And the validation error mentions "percentiles"
+  Scenario: Profiling config rejects empty percentiles
+    When I attempt to validate a profiling config with empty percentiles
+    Then a model validation error is raised
+    And the validation error mentions "percentiles"
+  Scenario: Profiling config rejects unsorted percentiles
+    When I attempt to validate a profiling config with percentiles "90,50"
+    Then a model validation error is raised
+    And the validation error mentions "percentiles"
+  Scenario: Profiling config rejects empty tag filters
+    When I attempt to validate a profiling config with tag filters "alpha,,beta"
+    Then a model validation error is raised
+    And the validation error mentions "tag_filters"
+  Scenario: Profiling config rejects non-list tag filters
+    When I attempt to validate a profiling config with tag filters string "alpha"
+    Then a model validation error is raised
+    And the validation error mentions "tag_filters"
+  Scenario: Profiling config accepts tag filters None
+    When I validate a profiling config with tag filters None
+    Then the profiling tag filters are absent
+  Scenario: Profiling config normalizes tag filters
+    When I validate a profiling config with tag filters list " alpha ,beta "
+    Then the profiling tag filters include "alpha"
+    And the profiling tag filters include "beta"
+  Scenario: Profiling ordering helper ignores missing items
+    When I order catalog items with missing entries
+    Then the ordered catalog item identifiers equal "a,c,b"
+  Scenario: Profiling percentile helper handles empty values
+    When I compute a profiling percentile on empty values
+    Then the profiling percentile value equals 0
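
The added scenarios imply a set of validation rules for profiling recipes: a positive sample size, schema version 1 only, non-empty ascending percentiles, and tag filters that are a list of non-empty strings, trimmed of whitespace. The sketch below restates those rules with a standard-library dataclass. `ProfilingRecipeSketch` and `validation_error_message` are hypothetical names. The real model is `ProfilingRecipeConfig`, whose implementation is not shown in this diff. The accepted percentile range of 1 to 100 and the default percentiles are assumptions inferred from the scenarios.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

SUPPORTED_SCHEMA_VERSION = 1


@dataclass
class ProfilingRecipeSketch:
    """Hypothetical stand-in for ProfilingRecipeConfig; not the library's model."""

    schema_version: int = SUPPORTED_SCHEMA_VERSION
    sample_size: Optional[int] = None
    # Assumed default, inferred from the "percentiles 50,90,99" scenario.
    percentiles: List[int] = field(default_factory=lambda: [50, 90, 99])
    tag_filters: Optional[List[str]] = None

    def __post_init__(self) -> None:
        if self.schema_version != SUPPORTED_SCHEMA_VERSION:
            raise ValueError(f"Unsupported analysis schema version: {self.schema_version}")
        if self.sample_size is not None and self.sample_size < 1:
            raise ValueError("sample_size must be a positive integer")
        if not self.percentiles:
            raise ValueError("percentiles must not be empty")
        # Assumed bounds: the scenarios only show that 0 and 101 are rejected.
        if any(value < 1 or value > 100 for value in self.percentiles):
            raise ValueError("percentiles must be between 1 and 100")
        if self.percentiles != sorted(self.percentiles):
            raise ValueError("percentiles must be sorted in ascending order")
        if self.tag_filters is not None:
            if not isinstance(self.tag_filters, list):
                raise ValueError("tag_filters must be a list of strings")
            cleaned = [str(tag).strip() for tag in self.tag_filters]
            if any(not tag for tag in cleaned):
                raise ValueError("tag_filters must not contain empty tags")
            self.tag_filters = cleaned


def validation_error_message(**fields: Any) -> Optional[str]:
    """Return the validation error text for the given fields, or None if they are valid."""
    try:
        ProfilingRecipeSketch(**fields)
    except ValueError as exc:
        return str(exc)
    return None
```

For example, `validation_error_message(percentiles=[90, 50])` returns a message mentioning `percentiles`, which mirrors the "rejects unsorted percentiles" scenario.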

{biblicus-0.9.0 → biblicus-0.10.0}/features/environment.py RENAMED

@@ -17,7 +17,6 @@ def _repo_root() -> Path:
     :return: Repository root path.
     :rtype: Path
     """
     return Path(__file__).resolve().parent.parent
@@ -32,7 +31,6 @@ def before_scenario(context, scenario) -> None:
     :return: None.
     :rtype: None
     """
     import biblicus.__main__ as _biblicus_main
     _ = _biblicus_main
@@ -74,7 +72,6 @@ def after_scenario(context, scenario) -> None:
     :return: None.
     :rtype: None
     """
     if getattr(context, "httpd", None) is not None:
         context.httpd.shutdown()
         context.httpd.server_close()
@@ -221,7 +218,9 @@ def after_scenario(context, scenario) -> None:
         context.fake_paddleocr_vl_behaviors.clear()
     if getattr(context, "_fake_paddleocr_installed", False):
         # Remove all paddle-related modules
-        paddle_module_names = [name for name in list(sys.modules.keys()) if "paddle" in name.lower()]
+        paddle_module_names = [
+            name for name in list(sys.modules.keys()) if "paddle" in name.lower()
+        ]
         for name in paddle_module_names:
             sys.modules.pop(name, None)
         # Restore original modules
@@ -345,7 +344,6 @@ def run_biblicus(
     :return: Captured execution result.
     :rtype: RunResult
     """
     import contextlib
     import io

biblicus-0.10.0/features/profiling.feature ADDED

@@ -0,0 +1,150 @@
+Feature: Profiling analysis
+  Profiling analysis summarizes raw corpus composition and extracted text coverage.
+  Scenario: Profiling analysis reports raw and extracted counts
+    Given I initialized a corpus at "corpus"
+    And a binary file "blob.bin" exists
+    When I ingest the text "Alpha note" with title "Alpha" and tags "t" into corpus "corpus"
+    And I ingest the file "blob.bin" into corpus "corpus"
+    And I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I run a profiling analysis in corpus "corpus" using the latest extraction run
+    Then the profiling output includes raw item total 2
+    And the profiling output includes media type count "text/markdown" 1
+    And the profiling output includes media type count "application/octet-stream" 1
+    And the profiling output includes raw bytes distribution count 2
+    And the profiling output includes raw bytes percentiles 50,90,99
+    And the profiling output includes tagged items 1
+    And the profiling output includes untagged items 1
+    And the profiling output includes top tag "t" with count 1
+    And the profiling output includes extracted source items 2
+    And the profiling output includes extracted nonempty items 1
+    And the profiling output includes extracted empty items 0
+    And the profiling output includes extracted missing items 1
+    And the profiling output includes extracted text distribution count 1
+    And the profiling output includes extracted text percentiles 50,90,99
+  Scenario: Profiling analysis uses the latest extraction run when omitted
+    Given I initialized a corpus at "corpus"
+    And a binary file "blob.bin" exists
+    When I ingest the text "Alpha note" with title "Alpha" and tags "t" into corpus "corpus"
+    And I ingest the file "blob.bin" into corpus "corpus"
+    And I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I run a profiling analysis in corpus "corpus"
+    Then the command succeeds
+    And standard error includes "latest extraction run"
+  Scenario: Profiling analysis accepts a recipe file
+    Given I initialized a corpus at "corpus"
+    And a binary file "blob.bin" exists
+    When I ingest the text "Alpha note" with title "Alpha" and tags "t" into corpus "corpus"
+    And I ingest the file "blob.bin" into corpus "corpus"
+    And I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I create a profiling recipe file "profiling_recipe.yml" with:
+      """
+      schema_version: 1
+      sample_size: 1
+      percentiles: [50]
+      top_tag_count: 1
+      """
+    And I run a profiling analysis in corpus "corpus" using recipe "profiling_recipe.yml" and the latest extraction run
+    Then the profiling output includes raw bytes distribution count 1
+    And the profiling output includes raw bytes percentiles 50
+    And the profiling output includes top tag "t" with count 1
+  Scenario: Profiling analysis reports empty corpus distributions
+    Given I initialized a corpus at "corpus"
+    When I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I run a profiling analysis in corpus "corpus" using the latest extraction run
+    Then the profiling output includes raw item total 0
+    And the profiling output includes raw bytes distribution count 0
+    And the profiling output includes extracted source items 0
+    And the profiling output includes extracted text distribution count 0
+  Scenario: Profiling analysis counts empty extracted text
+    Given I initialized a corpus at "corpus"
+    When I ingest the text "   " with title "Blank" and tags "t" into corpus "corpus"
+    And I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I run a profiling analysis in corpus "corpus" using the latest extraction run
+    Then the profiling output includes extracted nonempty items 0
+    And the profiling output includes extracted empty items 1
+  Scenario: Profiling analysis respects minimum text length
+    Given I initialized a corpus at "corpus"
+    When I ingest the text "short" with title "Short" and tags "t" into corpus "corpus"
+    And I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I create a profiling recipe file "profiling_min_text.yml" with:
+      """
+      schema_version: 1
+      min_text_characters: 10
+      """
+    And I run a profiling analysis in corpus "corpus" using recipe "profiling_min_text.yml" and the latest extraction run
+    Then the profiling output includes extracted nonempty items 0
+    And the profiling output includes extracted empty items 1
+  Scenario: Profiling analysis applies tag filters
+    Given I initialized a corpus at "corpus"
+    When I ingest the text "Alpha note" with title "Alpha" and tags "t" into corpus "corpus"
+    And I ingest the text "Beta note" with title "Beta" and tags "other" into corpus "corpus"
+    And I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I create a profiling recipe file "profiling_tags.yml" with:
+      """
+      schema_version: 1
+      tag_filters: ["t"]
+      """
+    And I run a profiling analysis in corpus "corpus" using recipe "profiling_tags.yml" and the latest extraction run
+    Then the profiling output includes top tag "t" with count 1
+    And the profiling output includes tagged items 1
+    And the profiling output includes untagged items 1
+  Scenario: Profiling analysis rejects missing recipe file
+    Given I initialized a corpus at "corpus"
+    When I run a profiling analysis in corpus "corpus" using recipe "missing.yml" without extraction run
+    Then the command fails with exit code 2
+    And standard error includes "Recipe file not found"
+  Scenario: Profiling analysis rejects non-mapping recipe
+    Given I initialized a corpus at "corpus"
+    When I create a profiling recipe file "profiling_invalid.yml" with:
+      """
+      - not
+      - a
+      - mapping
+      """
+    And I run a profiling analysis in corpus "corpus" using recipe "profiling_invalid.yml" without extraction run
+    Then the command fails with exit code 2
+    And standard error includes "Profiling recipe must be a mapping/object"
+  Scenario: Profiling analysis rejects invalid recipe values
+    Given I initialized a corpus at "corpus"
+    When I ingest the text "Alpha note" with title "Alpha" and tags "t" into corpus "corpus"
+    And I build a "pipeline" extraction run in corpus "corpus" with steps:
+      | extractor_id      | config_json |
+      | pass-through-text | {}          |
+    And I create a profiling recipe file "profiling_invalid_values.yml" with:
+      """
+      schema_version: 1
+      percentiles: ["bad"]
+      """
+    And I run a profiling analysis in corpus "corpus" using recipe "profiling_invalid_values.yml" and the latest extraction run
+    Then the command fails with exit code 2
+    And standard error includes "Invalid profiling recipe"
+  Scenario: Profiling analysis requires extraction run
+    Given I initialized a corpus at "corpus"
+    When I run a profiling analysis in corpus "corpus"
+    Then the command fails with exit code 2
+    And standard error includes "Profiling analysis requires an extraction run"
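
The coverage scenarios above distinguish three outcomes per item: extracted text that is usable, text that is blank or shorter than `min_text_characters`, and items with no extracted text at all. The following sketch shows one way to produce those counts. The function names and the returned keys are hypothetical, and stripping whitespace before measuring length is an assumption; the diff does not show how the backend actually measures text.

```python
from collections import Counter
from typing import Dict, Optional


def classify_extracted_text(text: Optional[str], min_text_characters: int = 0) -> str:
    """Classify one item's extracted text as 'missing', 'empty', or 'nonempty'."""
    if text is None:
        # No extraction output exists for this item (for example, a binary file).
        return "missing"
    stripped = text.strip()
    if not stripped or len(stripped) < min_text_characters:
        # Whitespace-only text and text below the minimum length both count as empty.
        return "empty"
    return "nonempty"


def coverage_counts(
    texts_by_item: Dict[str, Optional[str]], min_text_characters: int = 0
) -> Dict[str, int]:
    """Summarize extraction coverage for a mapping of item identifier to extracted text."""
    counts = Counter(
        classify_extracted_text(text, min_text_characters)
        for text in texts_by_item.values()
    )
    return {
        "source_items": len(texts_by_item),
        "nonempty": counts["nonempty"],
        "empty": counts["empty"],
        "missing": counts["missing"],
    }
```

With one text item and one binary item, as in the first scenario, `coverage_counts({"alpha": "Alpha note", "blob": None})` gives two source items, one nonempty, zero empty, and one missing.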

{biblicus-0.9.0 → biblicus-0.10.0}/features/steps/analysis_steps.py RENAMED

@@ -9,23 +9,25 @@ from biblicus.analysis import get_analysis_backend
 from biblicus.analysis.base import CorpusAnalysisBackend
 from biblicus.analysis.llm import LlmClientConfig, LlmProvider
 from biblicus.analysis.models import (
+    ProfilingRecipeConfig,
+    TopicModelingKeyword,
+    TopicModelingLabelSource,
     TopicModelingLlmExtractionConfig,
     TopicModelingLlmExtractionMethod,
     TopicModelingLlmFineTuningConfig,
-    TopicModelingKeyword,
-    TopicModelingLabelSource,
     TopicModelingTopic,
     TopicModelingVectorizerConfig,
 )
+from biblicus.analysis.profiling import _ordered_catalog_items, _percentile_value
 from biblicus.analysis.topic_modeling import (
-    _TopicDocument,
     _apply_llm_fine_tuning,
     _parse_itemized_response,
+    _TopicDocument,
 )
-from biblicus.models import ExtractionRunReference
+from biblicus.models import CatalogItem, ExtractionRunReference
 from features.steps.openai_steps import (
-    _FakeOpenAiChatBehavior,
     _ensure_fake_openai_chat_behaviors,
+    _FakeOpenAiChatBehavior,
     _install_fake_openai_module,
 )
@@ -163,9 +165,7 @@ def step_run_llm_fine_tuning_missing_documents(context) -> None:
             document_ids=["missing"],
         )
     ]
-    documents = [
-        _TopicDocument(document_id="present", source_item_id="present", text="Text")
-    ]
+    documents = [_TopicDocument(document_id="present", source_item_id="present", text="Text")]
     report, labeled_topics = _apply_llm_fine_tuning(
         topics=topics,
         documents=documents,
@@ -184,7 +184,7 @@ def step_fine_tuning_topics_labeled(context, count: int) -> None:
 @when("I parse an itemized response JSON string")
 def step_parse_itemized_response_json_string(context) -> None:
-    response_text = "\"[\\\"Alpha\\\", \\\"Beta\\\"]\""
+    response_text = '"[\\"Alpha\\", \\"Beta\\"]"'
     context.itemized_response = _parse_itemized_response(response_text)
@@ -247,3 +247,143 @@ def step_vectorizer_stop_words_equals(context, value: str) -> None:
 def step_vectorizer_stop_words_absent(context) -> None:
     model = context.last_model
     assert model.stop_words is None
+@when("I attempt to validate a profiling config with sample size {value:d}")
+def step_validate_profiling_sample_size(context, value: int) -> None:
+    try:
+        ProfilingRecipeConfig(sample_size=value)
+        context.validation_error = None
+    except ValidationError as exc:
+        context.validation_error = exc
+@when('I attempt to validate a profiling config with percentiles "{values}"')
+def step_validate_profiling_percentiles(context, values: str) -> None:
+    try:
+        percentiles = [int(value.strip()) for value in values.split(",") if value.strip()]
+        ProfilingRecipeConfig(percentiles=percentiles)
+        context.validation_error = None
+    except ValidationError as exc:
+        context.validation_error = exc
+@when('I attempt to validate a profiling config with tag filters "{values}"')
+def step_validate_profiling_tag_filters(context, values: str) -> None:
+    try:
+        tags = [value.strip() for value in values.split(",")]
+        ProfilingRecipeConfig(tag_filters=tags)
+        context.validation_error = None
+    except ValidationError as exc:
+        context.validation_error = exc
+@when("I attempt to validate a profiling config with schema version {value:d}")
+def step_validate_profiling_schema_version(context, value: int) -> None:
+    try:
+        ProfilingRecipeConfig(schema_version=value)
+        context.validation_error = None
+    except ValidationError as exc:
+        context.validation_error = exc
+@when("I attempt to validate a profiling config with empty percentiles")
+def step_validate_profiling_empty_percentiles(context) -> None:
+    try:
+        ProfilingRecipeConfig(percentiles=[])
+        context.validation_error = None
+    except ValidationError as exc:
+        context.validation_error = exc
+@when('I attempt to validate a profiling config with tag filters string "{value}"')
+def step_validate_profiling_tag_filters_string(context, value: str) -> None:
+    try:
+        ProfilingRecipeConfig(tag_filters=value)
+        context.validation_error = None
+    except ValidationError as exc:
+        context.validation_error = exc
+@when("I validate a profiling config with tag filters None")
+def step_validate_profiling_tag_filters_none(context) -> None:
+    context.last_model = ProfilingRecipeConfig(tag_filters=None)
+@when('I validate a profiling config with tag filters list "{values}"')
+def step_validate_profiling_tag_filters_list(context, values: str) -> None:
+    tags = [value.strip() for value in values.split(",")]
+    context.last_model = ProfilingRecipeConfig(tag_filters=tags)
+@then("the profiling tag filters are absent")
+def step_profiling_tag_filters_absent(context) -> None:
+    model = context.last_model
+    assert model.tag_filters is None
+@then('the profiling tag filters include "{value}"')
+def step_profiling_tag_filters_include(context, value: str) -> None:
+    model = context.last_model
+    assert model.tag_filters is not None
+    assert value in model.tag_filters
+@when("I order catalog items with missing entries")
+def step_order_catalog_items_with_missing_entries(context) -> None:
+    items = {
+        "a": CatalogItem(
+            id="a",
+            relpath="raw/a.txt",
+            sha256="a",
+            bytes=1,
+            media_type="text/plain",
+            title=None,
+            tags=[],
+            metadata={},
+            created_at="2020-01-01T00:00:00Z",
+            source_uri=None,
+        ),
+        "b": CatalogItem(
+            id="b",
+            relpath="raw/b.txt",
+            sha256="b",
+            bytes=2,
+            media_type="text/plain",
+            title=None,
+            tags=[],
+            metadata={},
+            created_at="2020-01-01T00:00:00Z",
+            source_uri=None,
+        ),
+        "c": CatalogItem(
+            id="c",
+            relpath="raw/c.txt",
+            sha256="c",
+            bytes=3,
+            media_type="text/plain",
+            title=None,
+            tags=[],
+            metadata={},
+            created_at="2020-01-01T00:00:00Z",
+            source_uri=None,
+        ),
+    }
+    ordered = _ordered_catalog_items(items, ["a", "missing", "c"])
+    context.ordered_catalog_ids = [item.id for item in ordered]
+@then('the ordered catalog item identifiers equal "{values}"')
+def step_ordered_catalog_item_identifiers_equal(context, values: str) -> None:
+    expected = [value.strip() for value in values.split(",") if value.strip()]
+    assert context.ordered_catalog_ids == expected
+@when("I compute a profiling percentile on empty values")
+def step_compute_profiling_percentile_empty(context) -> None:
+    context.percentile_value = _percentile_value([], 50)
+@then("the profiling percentile value equals {value:d}")
+def step_profiling_percentile_value_equals(context, value: int) -> None:
+    assert context.percentile_value == value
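
The last two steps exercise private helpers whose bodies are not part of this diff: `_percentile_value([], 50)` must equal 0, and ordering items `a`, `b`, `c` by `["a", "missing", "c"]` must yield `a,c,b`. The sketch below is one behavior consistent with those assertions, not the library's code. It uses a nearest-rank percentile and appends items absent from the requested order sorted by identifier; both choices are assumptions, as are the function names.

```python
from typing import Dict, List, Sequence, TypeVar

ItemT = TypeVar("ItemT")


def percentile_value(values: Sequence[int], percentile: int) -> int:
    """Nearest-rank percentile of integer values; returns 0 for an empty sequence."""
    if not values:
        return 0
    ordered = sorted(values)
    # Integer ceiling of percentile * n / 100, clamped to at least rank 1.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def ordered_items(items: Dict[str, ItemT], order: Sequence[str]) -> List[ItemT]:
    """Return items named in `order` first, skipping unknown identifiers, then the rest."""
    seen = set()
    result: List[ItemT] = []
    for item_id in order:
        if item_id in items and item_id not in seen:
            seen.add(item_id)
            result.append(items[item_id])
    # Remaining items follow in identifier order (an assumed tie-breaking rule).
    for item_id in sorted(items):
        if item_id not in seen:
            result.append(items[item_id])
    return result
```

Returning 0 for empty input lets an empty corpus produce a complete report without special-casing each distribution, which matches the "reports empty corpus distributions" scenario.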
