PyPI - biblicus - Versions diffs - 0.16.0__tar.gz → 1.1.0__tar.gz - Mend

biblicus 0.16.0__tar.gz → 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (440)

{biblicus-0.16.0/src/biblicus.egg-info → biblicus-1.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 0.16.0
+Version: 1.1.0
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -80,7 +80,7 @@ See [retrieval augmented generation overview] for a short introduction to the id
 ## Analysis highlights
 - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
-- YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
 - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
 - See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
 - See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
@@ -167,7 +167,7 @@ sequenceDiagram
 - You can ingest raw material once, then try many retrieval approaches over time.
 - You can keep raw files readable and portable, without locking your data inside a database.
-- You can evaluate retrieval runs against shared datasets and compare backends using the same corpus.
+- You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
 ## Typical flow
@@ -176,7 +176,7 @@ sequenceDiagram
 - Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
 - Run extraction when you want derived text artifacts from non-text sources.
 - Reindex to refresh the catalog after edits.
-- Build a retrieval run with a backend.
+- Build a retrieval snapshot with a backend.
 - Query the run to collect evidence and evaluate it with datasets.
 ## Install
@@ -292,8 +292,8 @@ for note_title, note_text in notes:
     corpus.ingest_note(note_text, title=note_title, tags=["memory"])
 backend = get_backend("scan")
-run = backend.build_run(corpus, recipe_name="Story demo", config={})
-budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=None)
+run = backend.build_run(corpus, configuration_name="Story demo", config={})
+budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
 result = backend.query(
     corpus,
     run=run,
@@ -333,11 +333,11 @@ Example output:
   "query_text": "Primary button style preference",
   "budget": {
     "max_total_items": 5,
-    "max_total_characters": 2000,
+    "maximum_total_characters": 2000,
     "max_items_per_source": null
   },
-  "run_id": "RUN_ID",
-  "recipe_id": "RECIPE_ID",
+  "snapshot_id": "RUN_ID",
+  "configuration_id": "RECIPE_ID",
   "backend_id": "scan",
   "generated_at": "2026-01-29T00:00:00.000000Z",
   "evidence": [
@@ -352,8 +352,8 @@ Example output:
       "span_start": null,
       "span_end": null,
       "stage": "scan",
-      "recipe_id": "RECIPE_ID",
-      "run_id": "RUN_ID",
+      "configuration_id": "RECIPE_ID",
+      "snapshot_id": "RUN_ID",
       "hash": null
     }
   ],
@@ -422,7 +422,7 @@ flowchart TB
       subgraph RowExtraction[Pluggable: extraction pipeline]
         direction TB
-        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction run manifest]
+        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
       end
       subgraph RowRetrieval[Pluggable: retrieval backend]
@@ -484,7 +484,7 @@ From Python, the same flow is available through the Corpus class and backend int
 - Ingest notes with `Corpus.ingest_note`.
 - Ingest files or web addresses with `Corpus.ingest_source`.
 - List items with `Corpus.list_items`.
-- Build a retrieval run with `get_backend` and `backend.build_run`.
+- Build a retrieval snapshot with `get_backend` and `backend.build_run`.
 - Query a run with `backend.query`.
 - Evaluate with `evaluate_run`.
@@ -530,13 +530,13 @@ corpus/
     runs/
       extraction/
         pipeline/
-          <run id>/
+          <snapshot id>/
             manifest.json
             text/
               <item id>.txt
       retrieval/
         <backend id>/
-          <run id>/
+          <snapshot id>/
             manifest.json
 ```
@@ -552,7 +552,7 @@ For detailed documentation including configuration options, performance characte
 ## Retrieval documentation
-For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
+For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
 (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
 and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
 script (`scripts/retrieval_evaluation_lab.py`).
@@ -615,26 +615,26 @@ See `docs/TEXT_SLICE.md` for the utility API and examples.
 Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
 are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
-an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
 optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
 See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
 `docs/TOPIC_MODELING.md` for topic modeling details.
-Run a topic analysis using a recipe file:
+Run a topic analysis using a configuration file:
 ```
-biblicus analyze topics --corpus corpora/example --recipe recipes/topic-modeling.yml --extraction-run pipeline:<run_id>
+biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
 ```
-If `--extraction-run` is omitted, Biblicus uses the most recent extraction run and emits a warning about
+If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
 reproducibility. The analysis output is stored under:
 ```
-.biblicus/runs/analysis/topic-modeling/<run_id>/output.json
+.biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
 ```
-Minimal recipe example:
+Minimal configuration example:
 ```yaml
 schema_version: 1
@@ -659,7 +659,7 @@ llm_fine_tuning:
 ```
 LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
-Recipe files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
+Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
 AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
 For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
@@ -712,6 +712,15 @@ Build the documentation:
 python -m sphinx -b html docs docs/_build/html
 ```
+Preview the documentation locally:
+```
+cd docs/_build/html
+python -m http.server
+```
+Open `http://localhost:8000` in your browser.
 ## License
 License terms are in `LICENSE`.
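
The example-output hunks in this file rename three JSON keys: `run_id` → `snapshot_id`, `recipe_id` → `configuration_id`, and `max_total_characters` → `maximum_total_characters`. Result files saved with 0.16.0 would still carry the old names. The sketch below shows one way such files could be rewritten. `RENAMED_KEYS` and `migrate_keys` are hypothetical names for illustration and are not part of biblicus; the sketch assumes the key names shown in the diff are the only change.

```python
import json

# Old key -> new key, taken from the example-output hunks in this diff.
RENAMED_KEYS = {
    "run_id": "snapshot_id",
    "recipe_id": "configuration_id",
    "max_total_characters": "maximum_total_characters",
}


def migrate_keys(value):
    """Recursively rename 0.16.0-style keys in parsed JSON (hypothetical helper)."""
    if isinstance(value, dict):
        return {RENAMED_KEYS.get(key, key): migrate_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [migrate_keys(item) for item in value]
    # Scalars (strings, numbers, null) are returned unchanged.
    return value


# A trimmed copy of the old example output, using the placeholders from the diff.
old_result = json.loads(
    '{"budget": {"max_total_items": 5, "max_total_characters": 2000},'
    ' "run_id": "RUN_ID", "recipe_id": "RECIPE_ID",'
    ' "evidence": [{"stage": "scan", "recipe_id": "RECIPE_ID", "run_id": "RUN_ID"}]}'
)
new_result = migrate_keys(old_result)
print(json.dumps(new_result, indent=2))
```

Only keys are renamed; values such as the `RUN_ID` placeholder are left as they are.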

{biblicus-0.16.0 → biblicus-1.1.0}/README.md RENAMED

@@ -26,7 +26,7 @@ See [retrieval augmented generation overview] for a short introduction to the id
 ## Analysis highlights
 - `biblicus analyze markov` learns a directed, weighted state transition graph over segmented text.
-- YAML recipes support cascading composition plus dotted `--config key=value` overrides.
+- YAML configurations support cascading composition plus dotted `--config key=value` overrides.
 - Text extract splits long texts with an LLM by inserting XML tags in-place for structured spans.
 - See `docs/MARKOV_ANALYSIS.md` for Markov analysis details and runnable demos.
 - See `docs/TEXT_EXTRACT.md` for the text extract utility and examples.
@@ -113,7 +113,7 @@ sequenceDiagram
 - You can ingest raw material once, then try many retrieval approaches over time.
 - You can keep raw files readable and portable, without locking your data inside a database.
-- You can evaluate retrieval runs against shared datasets and compare backends using the same corpus.
+- You can evaluate retrieval snapshots against shared datasets and compare backends using the same corpus.
 ## Typical flow
@@ -122,7 +122,7 @@ sequenceDiagram
 - Crawl a website section into corpus items when you want a repeatable “import from the web” workflow.
 - Run extraction when you want derived text artifacts from non-text sources.
 - Reindex to refresh the catalog after edits.
-- Build a retrieval run with a backend.
+- Build a retrieval snapshot with a backend.
 - Query the run to collect evidence and evaluate it with datasets.
 ## Install
@@ -238,8 +238,8 @@ for note_title, note_text in notes:
     corpus.ingest_note(note_text, title=note_title, tags=["memory"])
 backend = get_backend("scan")
-run = backend.build_run(corpus, recipe_name="Story demo", config={})
-budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=None)
+run = backend.build_run(corpus, configuration_name="Story demo", config={})
+budget = QueryBudget(max_total_items=5, maximum_total_characters=2000, max_items_per_source=None)
 result = backend.query(
     corpus,
     run=run,
@@ -279,11 +279,11 @@ Example output:
   "query_text": "Primary button style preference",
   "budget": {
     "max_total_items": 5,
-    "max_total_characters": 2000,
+    "maximum_total_characters": 2000,
     "max_items_per_source": null
   },
-  "run_id": "RUN_ID",
-  "recipe_id": "RECIPE_ID",
+  "snapshot_id": "RUN_ID",
+  "configuration_id": "RECIPE_ID",
   "backend_id": "scan",
   "generated_at": "2026-01-29T00:00:00.000000Z",
   "evidence": [
@@ -298,8 +298,8 @@ Example output:
       "span_start": null,
       "span_end": null,
       "stage": "scan",
-      "recipe_id": "RECIPE_ID",
-      "run_id": "RUN_ID",
+      "configuration_id": "RECIPE_ID",
+      "snapshot_id": "RUN_ID",
       "hash": null
     }
   ],
@@ -368,7 +368,7 @@ flowchart TB
       subgraph RowExtraction[Pluggable: extraction pipeline]
         direction TB
-        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction run manifest]
+        Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction snapshot manifest]
       end
       subgraph RowRetrieval[Pluggable: retrieval backend]
@@ -430,7 +430,7 @@ From Python, the same flow is available through the Corpus class and backend int
 - Ingest notes with `Corpus.ingest_note`.
 - Ingest files or web addresses with `Corpus.ingest_source`.
 - List items with `Corpus.list_items`.
-- Build a retrieval run with `get_backend` and `backend.build_run`.
+- Build a retrieval snapshot with `get_backend` and `backend.build_run`.
 - Query a run with `backend.query`.
 - Evaluate with `evaluate_run`.
@@ -476,13 +476,13 @@ corpus/
     runs/
       extraction/
         pipeline/
-          <run id>/
+          <snapshot id>/
             manifest.json
             text/
               <item id>.txt
       retrieval/
         <backend id>/
-          <run id>/
+          <snapshot id>/
             manifest.json
 ```
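
The layout above keeps one `manifest.json` per `<snapshot id>` folder under `runs/retrieval/<backend id>/`. The sketch below shows how those manifests could be enumerated. It assumes the `runs/` folder sits under `.biblicus/`, as the analysis output path elsewhere in this file suggests. `list_retrieval_snapshots` is a hypothetical helper, not a biblicus function, and the manifest content in the demo is a placeholder because the manifest schema is not shown in this diff.

```python
import json
import tempfile
from pathlib import Path


def list_retrieval_snapshots(corpus_root, backend_id):
    """Return (snapshot id, manifest dict) pairs for one backend (hypothetical helper)."""
    backend_dir = Path(corpus_root) / ".biblicus" / "runs" / "retrieval" / backend_id
    if not backend_dir.is_dir():
        return []
    snapshots = []
    for snapshot_dir in sorted(backend_dir.iterdir()):
        manifest_path = snapshot_dir / "manifest.json"
        # Skip stray files and folders that have no manifest.
        if manifest_path.is_file():
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
            snapshots.append((snapshot_dir.name, manifest))
    return snapshots


# Demo against a throwaway directory so the sketch runs without a real corpus.
with tempfile.TemporaryDirectory() as tmp:
    snapshot_dir = Path(tmp) / ".biblicus" / "runs" / "retrieval" / "scan" / "SNAPSHOT_ID"
    snapshot_dir.mkdir(parents=True)
    # Placeholder manifest content; the real schema is not part of this diff.
    (snapshot_dir / "manifest.json").write_text(json.dumps({"backend_id": "scan"}), encoding="utf-8")
    found = list_retrieval_snapshots(tmp, "scan")
    missing = list_retrieval_snapshots(tmp, "no-such-backend")
    print(found)
```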
@@ -498,7 +498,7 @@ For detailed documentation including configuration options, performance characte
 ## Retrieval documentation
-For the retrieval pipeline overview and run artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
+For the retrieval pipeline overview and snapshot artifacts, see `docs/RETRIEVAL.md`. For retrieval quality upgrades
 (tuned lexical baseline, reranking, hybrid retrieval), see `docs/RETRIEVAL_QUALITY.md`. For evaluation workflows
 and dataset formats, see `docs/RETRIEVAL_EVALUATION.md`. For a runnable walkthrough, use the retrieval evaluation lab
 script (`scripts/retrieval_evaluation_lab.py`).
@@ -561,26 +561,26 @@ See `docs/TEXT_SLICE.md` for the utility API and examples.
 Biblicus can run analysis pipelines on extracted text without changing the raw corpus. Profiling and topic modeling
 are the first analysis backends. Profiling summarizes corpus composition and extraction coverage. Topic modeling reads
-an extraction run, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
+an extraction snapshot, optionally applies an LLM-driven extraction pass, applies lexical processing, runs BERTopic, and
 optionally applies an LLM fine-tuning pass to label topics. The output is structured JavaScript Object Notation.
 See `docs/ANALYSIS.md` for the analysis pipeline overview, `docs/PROFILING.md` for profiling, and
 `docs/TOPIC_MODELING.md` for topic modeling details.
-Run a topic analysis using a recipe file:
+Run a topic analysis using a configuration file:
 ```
-biblicus analyze topics --corpus corpora/example --recipe recipes/topic-modeling.yml --extraction-run pipeline:<run_id>
+biblicus analyze topics --corpus corpora/example --configuration configurations/topic-modeling.yml --extraction-run pipeline:<snapshot_id>
 ```
-If `--extraction-run` is omitted, Biblicus uses the most recent extraction run and emits a warning about
+If `--extraction-run` is omitted, Biblicus uses the most recent extraction snapshot and emits a warning about
 reproducibility. The analysis output is stored under:
 ```
-.biblicus/runs/analysis/topic-modeling/<run_id>/output.json
+.biblicus/runs/analysis/topic-modeling/<snapshot_id>/output.json
 ```
-Minimal recipe example:
+Minimal configuration example:
 ```yaml
 schema_version: 1
@@ -605,7 +605,7 @@ llm_fine_tuning:
 ```
 LLM extraction and fine-tuning require `biblicus[openai]` and a configured OpenAI API key.
-Recipe files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
+Configuration files are validated strictly against the topic modeling schema, so type mismatches or unknown fields are errors.
 AG News integration runs require `biblicus[datasets]` in addition to `biblicus[topic-modeling]`.
 For a repeatable, real-world integration run that downloads AG News and executes topic modeling, use:
@@ -658,6 +658,15 @@ Build the documentation:
 python -m sphinx -b html docs docs/_build/html
 ```
+Preview the documentation locally:
+```
+cd docs/_build/html
+python -m http.server
+```
+Open `http://localhost:8000` in your browser.
 ## License
 License terms are in `LICENSE`.

{biblicus-0.16.0 → biblicus-1.1.0}/docs/ANALYSIS.md RENAMED

@@ -1,31 +1,31 @@
 # Corpus analysis
 Biblicus supports analysis backends that run on extracted text artifacts without changing the raw corpus. Analysis is a
-pluggable phase that reads an extraction run, produces structured output, and stores artifacts under the corpus runs
+pluggable phase that reads an extraction snapshot, produces structured output, and stores artifacts under the corpus runs
 folder. Each analysis backend declares its own configuration schema and output contract, and all schemas are validated
 strictly.
-## How analysis runs work
+## How analysis snapshots work
-- Analysis runs are tied to a corpus state via the extraction run reference.
-- The analysis output is written under `.biblicus/runs/analysis/<analysis-id>/<run_id>/`.
-- Analysis is reproducible when you supply the same extraction run and corpus catalog state.
-- Analysis configuration is stored as a recipe manifest in the run metadata.
+- Analysis runs are tied to a corpus state via the extraction snapshot reference.
+- The analysis output is written under `.biblicus/runs/analysis/<analysis-id>/<snapshot_id>/`.
+- Analysis is reproducible when you supply the same extraction snapshot and corpus catalog state.
+- Analysis configuration is stored as a configuration manifest in the run metadata.
-If you omit the extraction run, Biblicus uses the most recent extraction run and emits a reproducibility warning. For
-repeatable analysis runs, always pass the extraction run reference explicitly.
+If you omit the extraction snapshot, Biblicus uses the most recent extraction snapshot and emits a reproducibility warning. For
+repeatable analysis snapshots, always pass the extraction snapshot reference explicitly.
-## Analysis run artifacts
+## Analysis snapshot artifacts
-Every analysis run records a manifest alongside the output:
+Every analysis snapshot records a manifest alongside the output:
 ```
-.biblicus/runs/analysis/<analysis-id>/<run_id>/
+.biblicus/runs/analysis/<analysis-id>/<snapshot_id>/
   manifest.json
   output.json
 ```
-The manifest captures the recipe, extraction run reference, and catalog timestamp so results can be reproduced and
+The manifest captures the configuration, extraction snapshot reference, and catalog timestamp so results can be reproduced and
 compared later.
 ## Inspecting output
@@ -38,21 +38,21 @@ cat corpora/example/.biblicus/runs/analysis/profiling/RUN_ID/output.json
 Each analysis backend defines its own `report` payload. The run metadata is consistent across backends.
-## Comparing analysis runs
+## Comparing analysis snapshots
 When you compare analysis results, record:
 - Corpus path and catalog timestamp.
 - Extraction run reference.
-- Analysis recipe name and configuration.
-- Analysis run identifier and output path.
+- Analysis configuration name and configuration.
+- Analysis snapshot identifier and output path.
 These make it possible to rerun the analysis and explain differences.
 ## Pluggable analysis backends
 Analysis backends implement the `CorpusAnalysisBackend` interface and are registered under `biblicus.analysis`.
-A backend receives the corpus, a recipe name, a configuration mapping, and an extraction run reference. It returns a
+A backend receives the corpus, a configuration name, a configuration mapping, and an extraction snapshot reference. It returns a
 Pydantic model that is serialized to JavaScript Object Notation for storage.
 ## Choosing an analysis backend
@@ -61,22 +61,22 @@ Start with profiling when you need fast, deterministic baselines. Use topic mode
 and exploratory labels. Use Markov analysis when you want state-transition structure over sequences of segments.
 Combine multiple backends for a clear view of corpus composition, themes, and state dynamics.
-## Recipe files
+## Configuration files
-Analysis recipes are optional JavaScript Object Notation or YAML files that capture configuration in a repeatable way.
+Analysis configurations are optional JavaScript Object Notation or YAML files that capture configuration in a repeatable way.
 They are useful for sharing experiments and keeping runs reproducible.
-Recipes support cascading composition. When a command accepts `--recipe`, you can pass multiple recipe files. Biblicus
-merges them in order, where later recipes override earlier recipes via a deep merge. You can then apply `--config`
+Recipes support cascading composition. When a command accepts `--configuration`, you can pass multiple configuration files. Biblicus
+merges them in order, where later configurations override earlier configurations via a deep merge. You can then apply `--config`
 overrides on top of the composed view.
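
The merge order described here (later configuration files override earlier ones via a deep merge, then dotted `--config key=value` overrides apply on top) can be sketched in plain Python. This is a minimal illustration under assumptions, not the biblicus implementation: `deep_merge`, `apply_dotted_override`, and the keys `section`, `alpha`, `beta`, and `gamma` are hypothetical. How biblicus parses override values or merges lists is not shown in this diff.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale (an assumption of this sketch).
            merged[key] = copy.deepcopy(value)
    return merged


def apply_dotted_override(config, dotted_key, value):
    """Set a nested value from a dotted key such as 'section.alpha' on a copy of `config`."""
    result = copy.deepcopy(config)
    cursor = result
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Create intermediate mappings when the path does not exist yet.
        if not isinstance(cursor.get(part), dict):
            cursor[part] = {}
        cursor = cursor[part]
    cursor[leaf] = value
    return result


base = {"schema_version": 1, "section": {"alpha": 1, "beta": 2}}
later = {"section": {"beta": 20, "gamma": 30}}

composed = deep_merge(base, later)  # the later file overrides `beta` and adds `gamma`
final = apply_dotted_override(composed, "section.alpha", 100)  # override applied last
print(final)
```

Both helpers return copies, so the earlier configuration stays intact for comparison.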
-Minimal profiling recipe:
+Minimal profiling configuration:
 ```
 schema_version: 1
 ```
-Minimal topic modeling recipe:
+Minimal topic modeling configuration:
 ```
 schema_version: 1
@@ -87,7 +87,7 @@ bertopic_analysis:
     nr_topics: 8
 ```
-Minimal Markov analysis recipe:
+Minimal Markov analysis configuration:
 ```
 schema_version: 1
@@ -111,7 +111,7 @@ The integration demo script is a working reference you can use as a starting poi
 python scripts/topic_modeling_integration.py --corpus corpora/ag_news_demo --force
 ```
-The command prints the analysis run identifier and the output path. Open the resulting `output.json` to inspect per-topic
+The command prints the analysis snapshot identifier and the output path. Open the resulting `output.json` to inspect per-topic
 labels, keywords, and document examples.
 ## Markov analysis
@@ -134,7 +134,7 @@ deterministic counts and distribution metrics. See `docs/PROFILING.md` for the f
 python -m biblicus analyze profile --corpus corpora/example --extraction-run pipeline:RUN_ID
 ```
-The command writes an analysis run directory and prints the run identifier.
+The command writes an analysis snapshot directory and prints the snapshot identifier.
 Run profiling from the CLI:

{biblicus-0.16.0 → biblicus-1.1.0}/docs/ARCHITECTURE_DETAIL.md RENAMED

@@ -15,7 +15,7 @@ Design starts from strict behavior-driven development:
 - All changes should follow specification-first behavior-driven development: failing scenario,
   implementation, passing scenario, then refactor.
 - Behavior-driven development scenarios are not an afterthought: they are how we keep the domain
-  vocabulary consistent and the platform comparable across backends and recipes.
+  vocabulary consistent and the platform comparable across backends and configurations.
 - **Specification completeness** is mandatory: if behavior exists, it must be specified.
   Ambiguous or untestable behavior should be removed or turned into an explicit error.
@@ -42,7 +42,7 @@ core nouns:
 - I have a **corpus** at this path or uniform resource identifier.
 - I ingest an **item** with optional **metadata**.
 - I rebuild the derived **index** after edits.
-- I run a **recipe** against the same corpus.
+- I run a **configuration** against the same corpus.
 - I query and receive **evidence**.
 Anything that does not map cleanly to these nouns is either a derived helper or a backend-specific
@@ -72,13 +72,13 @@ requirements.
 - **Knowledge base backend**: an implementation that can ingest and retrieve from a corpus, such
   as scan, full text search, vector retrieval, or hybrid retrieval, exposed to procedures through
   retrieval primitives.
-- **Retrieval recipe**: a named configuration bundle for a backend, such as chunking rules,
+- **Retrieval configuration**: a named configuration bundle for a backend, such as chunking rules,
   embedding model and version, hybrid weights, reranker choice, and filters. This is what we
   benchmark and compare.
-- **Recipe manifest**: a reproducibility record describing the backend and recipe parameters,
-  plus any referenced materializations and build runs.
-- **Materialization**: an optional, persisted representation derived from raw content for a given
-  recipe and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
+- **Configuration manifest**: a reproducibility record describing the backend and configuration parameters,
+  plus any referenced snapshot artifacts and build snapshots.
+- **Snapshot artifacts**: optional, persisted representations derived from raw content for a given
+  configuration and backend, such as chunks, embeddings, or indexes. Some backends intentionally have
   none and operate on demand.
 - **Evidence**: structured retrieval output from backend queries. Evidence includes spans, scores,
   and provenance used by downstream retrieval augmented generation procedures.
@@ -95,7 +95,7 @@ requirements.
 - **Minimal opinion raw store**: raw ingestion should work for a folder of files with optional
   lightweight tagging.
 - **Reproducibility by default**: comparisons require manifests (even when there are no persisted
-  materializations).
+  snapshot artifacts).
 - **Mutability is real**: corpora are edited, pruned, and reorganized; re-indexing must be a core
   workflow.
 - **Separation of concerns**: retrieval returns evidence; retrieval-augmented generation patterns
@@ -110,7 +110,7 @@ requirements.
 These are explicit, opinionated policies encoded into the project:
 - **Evidence schema strictness**: moderate-to-strong schema. Evidence must include stable
-  identifiers, provenance, and retrieval scores; richer fields (spans, stage, recipe and run
+  identifiers, provenance, and retrieval scores; richer fields (spans, stage, configuration and run
   identifiers) are expected.
 - **Retrieval stages**: multi-stage is explicit (retrieve, rerank, then filter). Pipelines are
   expressed through evidence metadata rather than hard-coded backends.
@@ -131,7 +131,7 @@ Evidence is the canonical output of retrieval. Required fields:
 - `score` and `rank`
 - `text` (or `content_ref` when non-text)
 - `stage` (for example, `scan`, `full-text-search`, `rerank`)
-- `recipe_id` / `run_id` (for reproducibility)
+- `configuration_id` / `snapshot_id` (for reproducibility)
 - Optional: `span_start`, `span_end`, `hash`
 ## Evidence lifecycle
@@ -220,12 +220,12 @@ The interface stays the same; topology is configuration.
 ### Reproducibility
-- Biblicus always records a **recipe manifest** for reproducibility.
-- When a backend produces persisted materializations, Biblicus treats them as **versioned build
-  runs** identified by `run_id` (rather than overwriting in place by default).
-- Manifests exist even for just-in-time backends (materializations may be empty).
+- Biblicus always records a **configuration manifest** for reproducibility.
+- When a backend produces persisted snapshot artifacts, Biblicus treats them as **versioned build
+  snapshots** identified by `snapshot_id` (rather than overwriting in place by default).
+- Manifests exist even for just-in-time backends (snapshot artifacts may be empty).
 - Full directed acyclic graph lineage is not included in version zero; revisit only if needed.
-- Optional: define **shared materialization formats** (canonical chunk and embedding stores) so
+- Optional: define **shared snapshot artifact formats** (canonical chunk and embedding stores) so
   multiple backends can reuse intermediates when it makes sense; keep it opt-in.
 ### Evaluation
@@ -243,8 +243,8 @@ The interface stays the same; topology is configuration.
   backend/tool can consume it without requiring a database engine.
 - Canonical version zero format is a single JavaScript Object Notation file at
   `.biblicus/catalog.json`, written atomically (temporary file and rename) on updates.
-- The catalog includes `latest_run_id` and run manifests are stored at
-  `.biblicus/runs/<run_id>.json`.
+- The catalog includes `latest_snapshot_id` and snapshot manifests are stored at
+  `.biblicus/snapshots/<snapshot_id>.json`.
 - If this becomes a bottleneck at very large scales, we **change the specification** (bump
   `schema_version`) rather than introduce multiple “supported” catalog storage modes.
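
The catalog bullets above describe a single JSON file that is written atomically with a temporary file and a rename. The sketch below shows that general pattern using only the standard library. `write_json_atomically` is a hypothetical helper rather than the biblicus implementation, and the demo payload contains only the two field names this hunk mentions (`schema_version`, `latest_snapshot_id`) with placeholder values.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path, payload):
    """Write `payload` as JSON so readers see the old file or the new one, never a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temporary file lives in the destination directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file into place in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    catalog_path = Path(tmp) / ".biblicus" / "catalog.json"
    write_json_atomically(catalog_path, {"schema_version": 1, "latest_snapshot_id": "SNAPSHOT_ID"})
    loaded = json.loads(catalog_path.read_text(encoding="utf-8"))
    leftovers = sorted(entry.name for entry in catalog_path.parent.iterdir())
    print(loaded, leftovers)
```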

{biblicus-0.16.0 → biblicus-1.1.0}/docs/BACKENDS.md RENAMED

@@ -17,7 +17,7 @@ Backends implement two operations:
 Backends store artifacts and manifests under:
 ```
-.biblicus/runs/retrieval/<backend_id>/<run_id>/
+.biblicus/runs/retrieval/<backend_id>/<snapshot_id>/
   manifest.json
   <backend artifacts>
 ```
@@ -26,12 +26,12 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
 ## Implementation checklist
-1. **Define a Pydantic configuration model** for your backend recipe.
+1. **Define a Pydantic configuration model** for your backend configuration.
 2. **Implement `RetrievalBackend`**:
-   - `build_run(corpus, recipe_name, config)`
+   - `build_run(corpus, configuration_name, config)`
    - `query(corpus, run, query_text, budget)`
 3. **Emit `Evidence`** with required fields:
-   - `item_id`, `source_uri`, `media_type`, `score`, `rank`, `stage`, `recipe_id`, `run_id`
+   - `item_id`, `source_uri`, `media_type`, `score`, `rank`, `stage`, `configuration_id`, `snapshot_id`
    - `text` **or** `content_ref`
 4. **Register the backend** in `biblicus.backends.available_backends`.
 5. **Add behavior-driven development specifications** before implementation and make them pass with 100% coverage.
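
Step 3 of the checklist lists the fields every `Evidence` record must carry after the rename. The sketch below restates that list as a standard-library dataclass so the requirement is easy to check in a test. `EvidenceSketch` is a hypothetical stand-in and not the real biblicus model, the field types are guesses, and reading "`text` or `content_ref`" as "at least one of the two" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceSketch:
    """Hypothetical stand-in for the checklist fields; not the real biblicus model."""

    item_id: str
    source_uri: str
    media_type: str
    score: float
    rank: int
    stage: str
    configuration_id: str
    snapshot_id: str
    text: Optional[str] = None
    content_ref: Optional[str] = None

    def __post_init__(self):
        # Assumption: at least one of `text` / `content_ref` must be present.
        if self.text is None and self.content_ref is None:
            raise ValueError("evidence needs `text` or `content_ref`")


# All values below are placeholders for illustration.
ok = EvidenceSketch(
    item_id="ITEM_ID",
    source_uri="file:///notes/example.md",
    media_type="text/markdown",
    score=1.0,
    rank=1,
    stage="scan",
    configuration_id="CONFIGURATION_ID",
    snapshot_id="SNAPSHOT_ID",
    text="example evidence text",
)

try:
    EvidenceSketch(
        item_id="ITEM_ID",
        source_uri="file:///notes/example.md",
        media_type="text/markdown",
        score=1.0,
        rank=1,
        stage="scan",
        configuration_id="CONFIGURATION_ID",
        snapshot_id="SNAPSHOT_ID",
    )
    rejected = False
except ValueError:
    rejected = True

print(ok.snapshot_id, rejected)
```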
@@ -41,12 +41,12 @@ The manifest is the reproducible contract. Artifacts are backend-specific and li
 - Treat **runs** as immutable manifests with reproducible parameters.
 - If your backend needs artifacts, store them under `.biblicus/runs/` and record paths in `artifact_paths`.
 - Keep **text extraction** in explicit pipeline stages, not in backend ingestion.
-  See `docs/EXTRACTION.md` for how extraction runs are built and referenced from backend configs.
+  See `docs/EXTRACTION.md` for how extraction snapshots are built and referenced from backend configs.
 ## Reproducibility checklist
-- Record the extraction run reference used to build the backend.
-- Keep the backend recipe configuration in source control.
+- Record the extraction snapshot reference used to build the backend.
+- Keep the backend configuration configuration in source control.
 - Reuse the same `QueryBudget` when comparing backends.
 ## Common pitfalls

{biblicus-0.16.0 → biblicus-1.1.0}/docs/CHUNKING.md RENAMED

@@ -8,7 +8,7 @@ returns evidence with chunk boundaries so you can trace results back to the orig
 ## Chunkers are pluggable
-Chunking is a pluggable interface selected by identifier in a retrieval recipe:
+Chunking is a pluggable interface selected by identifier in a retrieval configuration:
 - `chunker_id`
 - `chunker_config` (Pydantic validated; `extra="forbid"`)
