PyPI - biblicus - Versions diffs - 1.0.0__tar.gz → 1.1.1__tar.gz - Mend

biblicus 1.0.0tar.gz → 1.1.1tar.gz

Files changed (442)

{biblicus-1.0.0/src/biblicus.egg-info → biblicus-1.1.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 1.0.0
+Version: 1.1.1
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -80,10 +80,10 @@ See [retrieval augmented generation overview] for a short introduction to the id
 ## Analysis highlights
 - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
-- YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
 - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
-- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
-- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
+- See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
+- See `docs/text-extract.md` for the text extract utility and examples.
 ## Start with a knowledge base
@@ -167,7 +167,7 @@ sequenceDiagram
 - You can ingest raw material once, then try many retrieval approaches over time.
 - You can keep raw files readable and portable, without locking your data inside a database.
-- You can evaluate retrieval runs against shared datasets and compare backends using the same corpus.
+- You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
 ## Typical flow
@@ -176,7 +176,7 @@ sequenceDiagram
 - Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
 - Run extraction when you want derived text artifacts from non-text sources.
 - Reindex to refresh the catalog after edits.
-- Build a retrieval run with a backend.
+- Build a retrieval snapshot with a backend.
 - Query the run to collect evidence and evaluate it with datasets.
 ## Install
@@ -292,7 +292,7 @@ for note_title, note_text in notes:
     corpus.ingest_note(note_text, title=note_title, tags=["memory"])
 backend = get_backend("scan")
-run = backend.build_run(corpus, recipe_name="Story demo", config={})
+run = backend.build_run(corpus, configuration_name="Story demo", config={})
 budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
 result = backend.query(
     corpus,
@@ -336,8 +336,8 @@ Example output:
     "maximum_total_characters": 2000,
     "max_items_per_source": null
   },
-  "run_id": "RUN_ID",
-  "recipe_id": "RECIPE_ID",
+  "snapshot_id": "RUN_ID",
+  "configuration_id": "RECIPE_ID",
   "backend_id": "scan",
   "generated_at": "2026-01-29T00:00:00.000000Z",
   "evidence": [
@@ -352,8 +352,8 @@ Example output:
       "span_start": null,
       "span_end": null,
       "stage": "scan",
-      "recipe_id": "RECIPE_ID",
-      "run_id": "RUN_ID",
+      "configuration_id": "RECIPE_ID",
+      "snapshot_id": "RUN_ID",
       "hash": null
     }
   ],
@@ -422,7 +422,7 @@ flowchart TB
       subgraph RowExtraction[Pluggable: extraction pipeline]
         direction TB
-        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction run manifest]
+        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
       end
       subgraph RowRetrieval[Pluggable: retrieval backend]
@@ -484,7 +484,7 @@ From Python, the same flow is available through the Corpus class and backend int
 - Ingest notes with `Corpus.ingest_note`.
 - Ingest files or web addresses with `Corpus.ingest_source`.
 - List items with `Corpus.list_items`.
-- Build a retrieval run with `get_backend` and `backend.build_run`.
+- Build a retrieval snapshot with `get_backend` and `backend.build_run`.
 - Query a run with `backend.query`.
 - Evaluate with `evaluate_run`.
@@ -530,13 +530,13 @@ corpus/
     runs/
       extraction/
         pipeline/
-          <run id>/
+          <snapshot id>/
             manifest.json
             text/
               <item id>.txt
       retrieval/
         <backend id>/
-          <run id>/
+          <snapshot id>/
             manifest.json
 ```
@@ -552,9 +552,9 @@ For detailed documentation including configuration options, performance characte
 ## Retrieval documentation
-For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
-(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
-and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
+For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
+(tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
+and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
 script (`scripts/retrieval_evaluation_lab.py`).
 ## Extraction backends
@@ -594,7 +594,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
 For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
 For extraction evaluation workflows, dataset formats, and report interpretation, see
-`docs/EXTRACTION_EVALUATION.md`.
+`docs/extraction-evaluation.md`.
 ## Text extract utility
@@ -602,39 +602,39 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
 entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
 analysis.
-See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
+See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
 ## Text slice utility
 Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
 re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
-See `docs/TEXT_SLICE.md` for the utility API and examples.
+See `docs/text-slice.md` for the utility API and examples.
 ## Topic modeling analysis
 Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
 are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
-an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
 optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
-`docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
+`docs/topic-modeling.md` for topic modeling details.
-Run a topic analysis using a recipe file:
+Run a topic analysis using a configuration file:
 ```
-biblicus analyze topics --corpus corpora/example --recipe recipes/topic-modeling.yml --extraction-run pipeline:<run_id>
+biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
 ```
-If `--extraction-run` is omitted, Biblicus uses the most recent extraction run and emits a warning about
+If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
 reproducibility. The analysis output is stored under:
 ```
-.biblicus/runs/analysis/topic-modeling/<run_id>/output.json
+.biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
 ```
-Minimal recipe example:
+Minimal configuration example:
 ```yaml
 schema_version: 1
@@ -659,7 +659,7 @@ llm_fine_tuning:
 ```
 LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
-Recipe files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
+Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
 AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
 For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
@@ -668,7 +668,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
 python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
-See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
+See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
 ## Integration corpus and evaluation dataset
@@ -712,25 +712,34 @@ Build the documentation:
 python -m sphinx -b html docs docs/_build/html
 ```
+Preview the documentation locally:
+```
+cd docs/_build/html
+python -m http.server
+```
+Open `http://localhost:8000` in your browser.
 ## License
 License terms are in `LICENSE`.
 [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
-[architecture]: docs/ARCHITECTURE.md
-[roadmap]: docs/ROADMAP.md
-[feature-index]: docs/FEATURE_INDEX.md
-[corpus]: docs/CORPUS.md
-[knowledge-base]: docs/KNOWLEDGE_BASE.md
-[text-extraction]: docs/EXTRACTION.md
+[architecture]: docs/architecture.md
+[roadmap]: docs/roadmap.md
+[feature-index]: docs/feature-index.md
+[corpus]: docs/corpus.md
+[knowledge-base]: docs/knowledge-base.md
+[text-extraction]: docs/extraction.md
 [extractor-reference]: docs/extractors/index.md
 [backend-reference]: docs/backends/index.md
-[speech-to-text]: docs/STT.md
-[user-configuration]: docs/USER_CONFIGURATION.md
-[backends]: docs/BACKENDS.md
-[context-packs]: docs/CONTEXT_PACK.md
-[demos]: docs/DEMOS.md
-[testing]: docs/TESTING.md
+[speech-to-text]: docs/stt.md
+[user-configuration]: docs/user-configuration.md
+[backends]: docs/backends.md
+[context-packs]: docs/context-pack.md
+[demos]: docs/demos.md
+[testing]: docs/testing.md
 [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
 [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
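
Note on the key renames in the example output above: the hunks change `run_id` to `snapshot_id` and `recipe_id` to `configuration_id`, both at the top level and inside each evidence entry. The sketch below shows one way a consumer might upgrade result dictionaries written against 1.0.0. The helper name `upgrade_keys` and the mapping table are hypothetical and are not part of biblicus. The diff only shows the README example changing, so whether stored artifacts use the same keys is an assumption.

```python
from typing import Any

# Assumed mapping, read off the example output in this diff.
# Hypothetical helper data, not a biblicus API.
LEGACY_KEYS = {"run_id": "snapshot_id", "recipe_id": "configuration_id"}


def upgrade_keys(value: Any) -> Any:
    """Recursively rename 1.0.0-style keys to their 1.1.1 names."""
    if isinstance(value, dict):
        return {LEGACY_KEYS.get(key, key): upgrade_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [upgrade_keys(item) for item in value]
    return value  # scalars pass through unchanged


# Placeholder values copied from the README example output.
old_result = {
    "run_id": "RUN_ID",
    "recipe_id": "RECIPE_ID",
    "backend_id": "scan",
    "evidence": [
        {"stage": "scan", "recipe_id": "RECIPE_ID", "run_id": "RUN_ID", "hash": None}
    ],
}
new_result = upgrade_keys(old_result)
```

The function returns a new structure and leaves the input untouched, which keeps the original payload available for comparison.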

{biblicus-1.0.0 → biblicus-1.1.1}/README.md RENAMED

@@ -26,10 +26,10 @@ See [retrieval augmented generation overview] for a short introduction to the id
 ## Analysis highlights
 - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
-- YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
 - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
-- See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
-- See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
+- See `docs/markov-analysis.md` for Markov analysis details and runnable demos.
+- See `docs/text-extract.md` for the text extract utility and examples.
 ## Start with a knowledge base
@@ -113,7 +113,7 @@ sequenceDiagram
 - You can ingest raw material once, then try many retrieval approaches over time.
 - You can keep raw files readable and portable, without locking your data inside a database.
-- You can evaluate retrieval runs against shared datasets and compare backends using the same corpus.
+- You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
 ## Typical flow
@@ -122,7 +122,7 @@ sequenceDiagram
 - Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
 - Run extraction when you want derived text artifacts from non-text sources.
 - Reindex to refresh the catalog after edits.
-- Build a retrieval run with a backend.
+- Build a retrieval snapshot with a backend.
 - Query the run to collect evidence and evaluate it with datasets.
 ## Install
@@ -238,7 +238,7 @@ for note_title, note_text in notes:
     corpus.ingest_note(note_text, title=note_title, tags=["memory"])
 backend = get_backend("scan")
-run = backend.build_run(corpus, recipe_name="Story demo", config={})
+run = backend.build_run(corpus, configuration_name="Story demo", config={})
 budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
 result = backend.query(
     corpus,
@@ -282,8 +282,8 @@ Example output:
     "maximum_total_characters": 2000,
     "max_items_per_source": null
   },
-  "run_id": "RUN_ID",
-  "recipe_id": "RECIPE_ID",
+  "snapshot_id": "RUN_ID",
+  "configuration_id": "RECIPE_ID",
   "backend_id": "scan",
   "generated_at": "2026-01-29T00:00:00.000000Z",
   "evidence": [
@@ -298,8 +298,8 @@ Example output:
       "span_start": null,
       "span_end": null,
       "stage": "scan",
-      "recipe_id": "RECIPE_ID",
-      "run_id": "RUN_ID",
+      "configuration_id": "RECIPE_ID",
+      "snapshot_id": "RUN_ID",
       "hash": null
     }
   ],
@@ -368,7 +368,7 @@ flowchart TB
       subgraph RowExtraction[Pluggable: extraction pipeline]
         direction TB
-        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction run manifest]
+        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
       end
       subgraph RowRetrieval[Pluggable: retrieval backend]
@@ -430,7 +430,7 @@ From Python, the same flow is available through the Corpus class and backend int
 - Ingest notes with `Corpus.ingest_note`.
 - Ingest files or web addresses with `Corpus.ingest_source`.
 - List items with `Corpus.list_items`.
-- Build a retrieval run with `get_backend` and `backend.build_run`.
+- Build a retrieval snapshot with `get_backend` and `backend.build_run`.
 - Query a run with `backend.query`.
 - Evaluate with `evaluate_run`.
@@ -476,13 +476,13 @@ corpus/
     runs/
       extraction/
         pipeline/
-          <run id>/
+          <snapshot id>/
             manifest.json
             text/
               <item id>.txt
       retrieval/
         <backend id>/
-          <run id>/
+          <snapshot id>/
             manifest.json
 ```
@@ -498,9 +498,9 @@ For detailed documentation including configuration options, performance characte
 ## Retrieval documentation
-For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
-(tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
-and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
+For the retrieval pipeline overview and snapshot artifacts, see `docs/retrieval.md`. For retrieval quality upgrades
+(tuned lexical baseline, reranking, hybrid retrieval), see `docs/retrieval-quality.md`. For evaluation workflows
+and dataset formats, see `docs/retrieval-evaluation.md`. For a runnable walkthrough, use the retrieval evaluation lab
 script (`scripts/retrieval_evaluation_lab.py`).
 ## Extraction backends
@@ -540,7 +540,7 @@ These extractors are built in. Optional ones require extra dependencies. See [te
 For detailed documentation on all extractors, see the [Extractor Reference][extractor-reference].
 For extraction evaluation workflows, dataset formats, and report interpretation, see
-`docs/EXTRACTION_EVALUATION.md`.
+`docs/extraction-evaluation.md`.
 ## Text extract utility
@@ -548,39 +548,39 @@ Text extract is a reusable analysis utility that lets a model insert XML tags in
 entire document. It returns structured spans and the marked-up text, and it is used as a segmentation option in Markov
 analysis.
-See `docs/TEXT_EXTRACT.md` for the utility API and examples, and `docs/MARKOV_ANALYSIS.md` for the Markov integration.
+See `docs/text-extract.md` for the utility API and examples, and `docs/markov-analysis.md` for the Markov integration.
 ## Text slice utility
 Text slice is a reusable analysis utility that lets a model insert `<slice/>` markers into a long text without
 re-emitting the entire document. It returns ordered slices and the marked-up text for auditing and reuse.
-See `docs/TEXT_SLICE.md` for the utility API and examples.
+See `docs/text-slice.md` for the utility API and examples.
 ## Topic modeling analysis
 Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
 are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
-an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
 optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
-See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
-`docs/TOPIC_MODELING.md` for topic modeling details.
+See `docs/analysis.md` for the analysis pipeline overview, `docs/profiling.md` for profiling, and
+`docs/topic-modeling.md` for topic modeling details.
-Run a topic analysis using a recipe file:
+Run a topic analysis using a configuration file:
 ```
-biblicus analyze topics --corpus corpora/example --recipe recipes/topic-modeling.yml --extraction-run pipeline:<run_id>
+biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
 ```
-If `--extraction-run` is omitted, Biblicus uses the most recent extraction run and emits a warning about
+If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
 reproducibility. The analysis output is stored under:
 ```
-.biblicus/runs/analysis/topic-modeling/<run_id>/output.json
+.biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
 ```
-Minimal recipe example:
+Minimal configuration example:
 ```yaml
 schema_version: 1
@@ -605,7 +605,7 @@ llm_fine_tuning:
 ```
 LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
-Recipe files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
+Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
 AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
 For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
@@ -614,7 +614,7 @@ For a repeatable, real-world integration run that downloads AG News and executes
 python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
-See `docs/TOPIC_MODELING.md` for parameter examples and per-topic output behavior.
+See `docs/topic-modeling.md` for parameter examples and per-topic output behavior.
 ## Integration corpus and evaluation dataset
@@ -658,25 +658,34 @@ Build the documentation:
 python -m sphinx -b html docs docs/_build/html
 ```
+Preview the documentation locally:
+```
+cd docs/_build/html
+python -m http.server
+```
+Open `http://localhost:8000` in your browser.
 ## License
 License terms are in `LICENSE`.
 [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
-[architecture]: docs/ARCHITECTURE.md
-[roadmap]: docs/ROADMAP.md
-[feature-index]: docs/FEATURE_INDEX.md
-[corpus]: docs/CORPUS.md
-[knowledge-base]: docs/KNOWLEDGE_BASE.md
-[text-extraction]: docs/EXTRACTION.md
+[architecture]: docs/architecture.md
+[roadmap]: docs/roadmap.md
+[feature-index]: docs/feature-index.md
+[corpus]: docs/corpus.md
+[knowledge-base]: docs/knowledge-base.md
+[text-extraction]: docs/extraction.md
 [extractor-reference]: docs/extractors/index.md
 [backend-reference]: docs/backends/index.md
-[speech-to-text]: docs/STT.md
-[user-configuration]: docs/USER_CONFIGURATION.md
-[backends]: docs/BACKENDS.md
-[context-packs]: docs/CONTEXT_PACK.md
-[demos]: docs/DEMOS.md
-[testing]: docs/TESTING.md
+[speech-to-text]: docs/stt.md
+[user-configuration]: docs/user-configuration.md
+[backends]: docs/backends.md
+[context-packs]: docs/context-pack.md
+[demos]: docs/demos.md
+[testing]: docs/testing.md
 [continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
 [coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json
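
Note on the documentation link targets above: every changed link follows the same pattern, with the file name lowercased and underscores replaced by hyphens (for example `docs/RETRIEVAL_QUALITY.md` becomes `docs/retrieval-quality.md`). A minimal sketch of that rule, useful for updating external references to the README links, could look like this. The function name `new_doc_link` is hypothetical, and the rule is inferred only from the link pairs visible in these hunks.

```python
from pathlib import PurePosixPath


def new_doc_link(old_link: str) -> str:
    """Lowercase the file stem and swap underscores for hyphens.

    Directory parts and the suffix are left as they are.
    """
    path = PurePosixPath(old_link)
    new_stem = path.stem.lower().replace("_", "-")
    return str(path.with_name(new_stem + path.suffix))


# Link pairs taken from the hunks above.
converted = [
    new_doc_link("docs/RETRIEVAL_QUALITY.md"),
    new_doc_link("docs/STT.md"),
    new_doc_link("docs/extractors/index.md"),
]
```

Links that were already lowercase, such as `docs/extractors/index.md`, come back unchanged, which matches the unchanged context lines in the diff.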

{biblicus-1.0.0 → biblicus-1.1.1}/docs/CHUNKING.md RENAMED

@@ -8,7 +8,7 @@ returns evidence with chunk boundaries so you can trace results back to the orig
 ## Chunkers are pluggable
-Chunking is a pluggable interface selected by identifier in a retrieval recipe:
+Chunking is a pluggable interface selected by identifier in a retrieval configuration:
 - `chunker_id`
 - `chunker_config` (Pydantic validated; `extra="forbid"`)

{biblicus-1.0.0 → biblicus-1.1.1}/docs/CORPUS.md RENAMED

@@ -18,7 +18,7 @@ corpus/
     config.json
     catalog.json
     runs/
-      <run manifests and artifacts>
+      <snapshot manifests and artifacts>
 ```
 ## Core concepts
@@ -137,7 +137,7 @@ python -m biblicus reindex --corpus corpora/example
 ## Reproducibility checklist
 - Keep raw files and sidecars in source control or backed up as immutable inputs.
-- Record the catalog timestamp when comparing run outputs.
+- Record the catalog timestamp when comparing snapshot outputs.
 - Prefer `import-tree` for reproducible ingest of existing folder structures.
 ## Common pitfalls

{biblicus-1.0.0 → biblicus-1.1.1}/docs/PROFILING.md RENAMED

@@ -20,22 +20,22 @@ The output is structured JSON that can be stored, versioned, and compared across
 biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
 ```
-If you omit `--extraction-run`, Biblicus uses the latest extraction run and emits a reproducibility warning.
+If you omit `--extraction-run`, Biblicus uses the latest extraction snapshot and emits a reproducibility warning.
-To customize profiling metrics, pass a recipe file:
+To customize profiling metrics, pass a configuration file:
 ```
-biblicus analyze profile --corpus corpora/example --recipe recipes/profiling.yml --extraction-run pipeline:RUN_ID
+biblicus analyze profile --corpus corpora/example --configuration configurations/profiling.yml --extraction-run pipeline:RUN_ID
 ```
-Profiling recipes support cascading composition. Pass multiple `--recipe` files; later recipes override earlier recipes
+Profiling configurations support cascading composition. Pass multiple `--configuration` files; later configurations override earlier configurations
 via a deep merge:
 ```
 biblicus analyze profile \
   --corpus corpora/example \
-  --recipe recipes/profiling/base.yml \
-  --recipe recipes/profiling/strict.yml \
+  --configuration configurations/profiling/base.yml \
+  --configuration configurations/profiling/strict.yml \
   --extraction-run pipeline:RUN_ID
 ```
@@ -44,14 +44,14 @@ To override the composed configuration view from the command line, use `--config
 ```
 biblicus analyze profile \
   --corpus corpora/example \
-  --recipe recipes/profiling/base.yml \
+  --configuration configurations/profiling/base.yml \
   --config sample_size=200 \
   --extraction-run pipeline:RUN_ID
 ```
-### Profiling recipe configuration
+### Profiling configuration configuration
-Profiling recipes use the analysis schema version and accept these fields:
+Profiling configurations use the analysis schema version and accept these fields:
 - `schema_version`: analysis schema version, currently `1`
 - `sample_size`: optional cap for distribution calculations
@@ -60,7 +60,7 @@ Profiling recipes use the analysis schema version and accept these fields:
 - `top_tag_count`: maximum number of tags to list in `top_tags`
 - `tag_filters`: optional list of tags to include in tag coverage metrics
-Example recipe:
+Example configuration:
 ```
 schema_version: 1
@@ -84,7 +84,7 @@ corpus = Corpus.open(Path("corpora/example"))
 backend = get_analysis_backend("profiling")
 output = backend.run_analysis(
     corpus,
-    recipe_name="default",
+    configuration_name="default",
     config={
         "schema_version": 1,
         "sample_size": 500,
@@ -93,9 +93,9 @@ output = backend.run_analysis(
         "top_tag_count": 10,
         "tag_filters": ["ag_news"],
     },
-    extraction_run=ExtractionRunReference(
+    extraction_snapshot=ExtractionRunReference(
         extractor_id="pipeline",
-        run_id="RUN_ID",
+        snapshot_id="RUN_ID",
     ),
 )
 print(output.model_dump())
@@ -106,7 +106,7 @@ print(output.model_dump())
 Profiling output is stored under:
 ```
-.biblicus/runs/analysis/profiling/<run_id>/output.json
+.biblicus/runs/analysis/profiling/<snapshot_id>/output.json
 ```
 ## Reading the report
@@ -138,17 +138,17 @@ through extraction and how much was missing or empty.
 ## Comparing profiling runs
-Use the same extraction run and recipe configuration whenever you compare profiling outputs:
+Use the same extraction snapshot and configuration configuration whenever you compare profiling outputs:
 1) Run profiling on two corpus snapshots.
 2) Compare `raw_items.total_items`, media type counts, and tag coverage.
 3) Compare `extracted_text` coverage to spot extraction regressions.
-Record the run identifiers and catalog timestamps so you can trace differences later.
+Record the snapshot identifiers and catalog timestamps so you can trace differences later.
 ## Common pitfalls
-- Profiling without specifying an extraction run, which makes comparisons harder to reproduce.
+- Profiling without specifying an extraction snapshot, which makes comparisons harder to reproduce.
 - Comparing runs with different `sample_size` or `min_text_characters` settings.
 - Interpreting tag counts without noting the `tag_filters` applied.
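
Note on cascading composition: the profiling hunks describe passing several `--configuration` files that are combined "via a deep merge", followed by dotted `--config key=value` overrides. The sketch below illustrates that ordering in plain Python. The functions `deep_merge` and `apply_override` are hypothetical, and the details are assumptions rather than biblicus behavior: lists are replaced rather than concatenated, and the override value is passed already typed instead of being parsed from a command-line string. The `report` key is an invented nested example.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)  # scalars and lists are replaced whole
    return merged


def apply_override(config: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Return a copy of `config` with `a.b.c`-style `dotted_key` set to `value`."""
    updated = deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        child = node.get(part)
        if not isinstance(child, dict):
            child = {}
            node[part] = child
        node = child
    node[leaf] = value
    return updated


# Stand-ins for base.yml and strict.yml after YAML loading.
base = {"schema_version": 1, "sample_size": 500, "report": {"a": 1, "b": 2}}
strict = {"sample_size": 100, "report": {"b": 3}}

composed = deep_merge(base, strict)                    # later file overrides earlier
final = apply_override(composed, "sample_size", 200)   # like --config sample_size=200
```

Both helpers copy their inputs, so the earlier layers stay available for inspection when a composed value looks wrong.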
