PyPI - biblicus - Versions diffs - 0.1.1__tar.gz → 0.2.0__tar.gz - Mend

biblicus 0.1.1tar.gz → 0.2.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (94)

{biblicus-0.1.1/src/biblicus.egg-info → biblicus-0.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: biblicus
-Version: 0.1.1
+Version: 0.2.0
 Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
 License: MIT
 Requires-Python: >=3.9
@@ -20,6 +20,9 @@ Dynamic: license-file
 # Biblicus
+![Continuous integration][continuous-integration-badge]
+![Coverage][coverage-badge]
 Make your documents usable by your assistant, then decide later how you will search and retrieve them.
 If you are building an assistant in Python, you probably have material you want it to use: notes, documents, web pages, and reference files. A common approach is retrieval augmented generation, where a system retrieves relevant material and uses it as evidence when generating a response.
@@ -45,6 +48,84 @@ The framework is a small, explicit vocabulary that appears in code, specificatio
 - Recipe is a named configuration for a backend.
 - Pipeline stage is a distinct retrieval step such as retrieve, rerank, and filter.
+## Diagram
+This diagram shows how a corpus becomes evidence for an assistant.
+The legend shows what the border styles and fill styles mean.
+The "Your code" region is where you decide how to turn evidence into context and how to call a model.
+```mermaid
+%%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
+flowchart LR
+  subgraph Legend[Legend]
+    direction LR
+    LegendArtifact[Stored artifact or evidence]
+    LegendStep[Step]
+    LegendArtifact --- LegendStep
+  end
+  subgraph Main[" "]
+    direction TB
+    subgraph StableCore[Stable core]
+      direction TB
+      Source[Source items] --> Ingest[Ingest]
+      Ingest --> Raw[Raw item files]
+      Raw --> Catalog[Catalog file]
+    end
+    subgraph PluggableRetrievalBackend[Pluggable retrieval backend]
+      direction LR
+      subgraph BackendIngestionIndexing[Ingestion and indexing]
+        direction TB
+        Catalog --> Build[Build run]
+        Build --> BackendIndex[Backend index]
+        BackendIndex --> Run[Run manifest]
+      end
+      subgraph BackendRetrievalGeneration[Retrieval and generation]
+        direction TB
+        Run --> Query[Query]
+        Query --> Evidence[Evidence]
+      end
+    end
+    Evidence --> Context
+    subgraph YourCode[Your code]
+      direction TB
+      Context[Assistant context] --> Model[Large language model call]
+      Model --> Answer[Answer]
+    end
+    style StableCore fill:#ffffff,stroke:#8e24aa,stroke-width:2px,color:#111111
+    style PluggableRetrievalBackend fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
+    style YourCode fill:#ffffff,stroke:#d81b60,stroke-width:2px,color:#111111
+    style BackendIngestionIndexing fill:#ffffff,stroke:#cfd8dc,color:#111111
+    style BackendRetrievalGeneration fill:#ffffff,stroke:#cfd8dc,color:#111111
+    style Raw fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Catalog fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style BackendIndex fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Run fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Evidence fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Context fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Answer fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Source fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Ingest fill:#eceff1,stroke:#90a4ae,color:#111111
+    style Build fill:#eceff1,stroke:#90a4ae,color:#111111
+    style Query fill:#eceff1,stroke:#90a4ae,color:#111111
+    style Model fill:#eceff1,stroke:#90a4ae,color:#111111
+  end
+  style Legend fill:#ffffff,stroke:#ffffff,color:#111111
+  style Main fill:#ffffff,stroke:#ffffff,color:#111111
+  style LegendArtifact fill:#f3e5f5,stroke:#8e24aa,color:#111111
+  style LegendStep fill:#eceff1,stroke:#90a4ae,color:#111111
+```
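The "Your code" step in the diagram, turning evidence into assistant context, can be sketched roughly as follows. This is a minimal illustration and not part of Biblicus: the `EvidenceItem` record, its fields, and the `build_context` helper are hypothetical names, and the character budget is an assumed policy.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EvidenceItem:
    """Hypothetical evidence record; not the Biblicus evidence model."""

    source: str
    text: str
    score: float


def build_context(evidence: Iterable[EvidenceItem], max_chars: int = 2000) -> str:
    """Join the highest-scoring evidence into one string within a character budget."""
    parts = []
    used = 0
    for item in sorted(evidence, key=lambda e: e.score, reverse=True):
        block = f"[{item.source}]\n{item.text.strip()}"
        # Account for the blank-line separator between blocks.
        cost = len(block) + (2 if parts else 0)
        if used + cost > max_chars:
            continue  # skip blocks that do not fit; a smaller one may still fit
        parts.append(block)
        used += cost
    return "\n\n".join(parts)
```

The resulting string would then be passed to whatever model call your application makes; how the budget is measured (characters, tokens) is a design choice left to that code.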
 ## Practical value
 - You can ingest raw material once, then try many retrieval approaches over time.
@@ -110,7 +191,11 @@ In an assistant system, retrieval usually produces context for a model call. Thi
 The documents below are written to be read in order.
 - [Architecture][architecture]
+- [Corpus][corpus]
+- [Text extraction][text-extraction]
 - [Backends][backends]
+- [Next steps][next-steps]
+- [Testing][testing]
 ## Metadata and catalog
@@ -143,12 +228,20 @@ Use `scripts/download_wikipedia.py` to download a small integration corpus from
 The dataset file `datasets/wikipedia_mini.json` provides a small evaluation set that matches the integration corpus.
+Use `scripts/download_pdf_samples.py` to download a small Portable Document Format integration corpus when running tests or demos. The repository does not include that content.
 ## Tests and coverage
 ```
 python3 scripts/test.py
 ```
+To include integration scenarios that download public test data at runtime, run this command.
+```
+python3 scripts/test.py --integration
+```
 ## Releases
 Releases are automated from the main branch using semantic versioning and conventional commit messages.
@@ -171,4 +264,11 @@ License terms are in `LICENSE`.
 [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
 [architecture]: docs/ARCHITECTURE.md
+[corpus]: docs/CORPUS.md
+[text-extraction]: docs/EXTRACTION.md
 [backends]: docs/BACKENDS.md
+[next-steps]: docs/NEXT_STEPS.md
+[testing]: docs/TESTING.md
+[continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
+[coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json

{biblicus-0.1.1 → biblicus-0.2.0}/README.md RENAMED

@@ -1,5 +1,8 @@
 # Biblicus
+![Continuous integration][continuous-integration-badge]
+![Coverage][coverage-badge]
 Make your documents usable by your assistant, then decide later how you will search and retrieve them.
 If you are building an assistant in Python, you probably have material you want it to use: notes, documents, web pages, and reference files. A common approach is retrieval augmented generation, where a system retrieves relevant material and uses it as evidence when generating a response.
@@ -25,6 +28,84 @@ The framework is a small, explicit vocabulary that appears in code, specificatio
 - Recipe is a named configuration for a backend.
 - Pipeline stage is a distinct retrieval step such as retrieve, rerank, and filter.
+## Diagram
+This diagram shows how a corpus becomes evidence for an assistant.
+The legend shows what the border styles and fill styles mean.
+The "Your code" region is where you decide how to turn evidence into context and how to call a model.
+```mermaid
+%%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
+flowchart LR
+  subgraph Legend[Legend]
+    direction LR
+    LegendArtifact[Stored artifact or evidence]
+    LegendStep[Step]
+    LegendArtifact --- LegendStep
+  end
+  subgraph Main[" "]
+    direction TB
+    subgraph StableCore[Stable core]
+      direction TB
+      Source[Source items] --> Ingest[Ingest]
+      Ingest --> Raw[Raw item files]
+      Raw --> Catalog[Catalog file]
+    end
+    subgraph PluggableRetrievalBackend[Pluggable retrieval backend]
+      direction LR
+      subgraph BackendIngestionIndexing[Ingestion and indexing]
+        direction TB
+        Catalog --> Build[Build run]
+        Build --> BackendIndex[Backend index]
+        BackendIndex --> Run[Run manifest]
+      end
+      subgraph BackendRetrievalGeneration[Retrieval and generation]
+        direction TB
+        Run --> Query[Query]
+        Query --> Evidence[Evidence]
+      end
+    end
+    Evidence --> Context
+    subgraph YourCode[Your code]
+      direction TB
+      Context[Assistant context] --> Model[Large language model call]
+      Model --> Answer[Answer]
+    end
+    style StableCore fill:#ffffff,stroke:#8e24aa,stroke-width:2px,color:#111111
+    style PluggableRetrievalBackend fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
+    style YourCode fill:#ffffff,stroke:#d81b60,stroke-width:2px,color:#111111
+    style BackendIngestionIndexing fill:#ffffff,stroke:#cfd8dc,color:#111111
+    style BackendRetrievalGeneration fill:#ffffff,stroke:#cfd8dc,color:#111111
+    style Raw fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Catalog fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style BackendIndex fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Run fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Evidence fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Context fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Answer fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Source fill:#f3e5f5,stroke:#8e24aa,color:#111111
+    style Ingest fill:#eceff1,stroke:#90a4ae,color:#111111
+    style Build fill:#eceff1,stroke:#90a4ae,color:#111111
+    style Query fill:#eceff1,stroke:#90a4ae,color:#111111
+    style Model fill:#eceff1,stroke:#90a4ae,color:#111111
+  end
+  style Legend fill:#ffffff,stroke:#ffffff,color:#111111
+  style Main fill:#ffffff,stroke:#ffffff,color:#111111
+  style LegendArtifact fill:#f3e5f5,stroke:#8e24aa,color:#111111
+  style LegendStep fill:#eceff1,stroke:#90a4ae,color:#111111
+```
 ## Practical value
 - You can ingest raw material once, then try many retrieval approaches over time.
@@ -90,7 +171,11 @@ In an assistant system, retrieval usually produces context for a model call. Thi
 The documents below are written to be read in order.
 - [Architecture][architecture]
+- [Corpus][corpus]
+- [Text extraction][text-extraction]
 - [Backends][backends]
+- [Next steps][next-steps]
+- [Testing][testing]
 ## Metadata and catalog
@@ -123,12 +208,20 @@ Use `scripts/download_wikipedia.py` to download a small integration corpus from
 The dataset file `datasets/wikipedia_mini.json` provides a small evaluation set that matches the integration corpus.
+Use `scripts/download_pdf_samples.py` to download a small Portable Document Format integration corpus when running tests or demos. The repository does not include that content.
 ## Tests and coverage
 ```
 python3 scripts/test.py
 ```
+To include integration scenarios that download public test data at runtime, run this command.
+```
+python3 scripts/test.py --integration
+```
 ## Releases
 Releases are automated from the main branch using semantic versioning and conventional commit messages.
@@ -151,4 +244,11 @@ License terms are in `LICENSE`.
 [retrieval augmented generation overview]: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
 [architecture]: docs/ARCHITECTURE.md
+[corpus]: docs/CORPUS.md
+[text-extraction]: docs/EXTRACTION.md
 [backends]: docs/BACKENDS.md
+[next-steps]: docs/NEXT_STEPS.md
+[testing]: docs/TESTING.md
+[continuous-integration-badge]: https://github.com/AnthusAI/Biblicus/actions/workflows/ci.yml/badge.svg?branch=main
+[coverage-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/AnthusAI/Biblicus/main/coverage_badge.json

{biblicus-0.1.1 → biblicus-0.2.0}/docs/BACKENDS.md RENAMED

@@ -27,6 +27,7 @@ Backends implement two operations:
 - Treat **runs** as immutable manifests with reproducible parameters.
 - If your backend needs artifacts, store them under `.biblicus/runs/` and record paths in `artifact_paths`.
 - Keep **text extraction** in explicit pipeline stages, not in backend ingestion.
+  See `docs/EXTRACTION.md` for how extraction runs are built and referenced from backend configs.
 ## Examples

biblicus-0.2.0/docs/CORPUS.md ADDED

@@ -0,0 +1,103 @@
+# Corpus
+A corpus is a normal folder on disk. It is the source of truth for your raw items.
+The main goals are:
+- You can ingest an item once and keep it as a file you can open and inspect.
+- You can rebuild the catalog at any time.
+- You can add derived artifacts later without changing the raw corpus.
+## On disk layout
+```
+corpus/
+  raw/
+    <item files>
+  .biblicus/
+    config.json
+    catalog.json
+    runs/
+      <run manifests and artifacts>
+```
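Because the layout above is plain files and folders, it can be inspected with ordinary tools. The sketch below checks which of the listed paths exist under a corpus root. The `describe_corpus` helper is hypothetical and not a Biblicus function; it only mirrors the path names shown in the layout.

```python
from pathlib import Path
from typing import Dict


def describe_corpus(root: Path) -> Dict[str, bool]:
    """Report which parts of the documented layout exist under a corpus root."""
    meta = root / ".biblicus"
    return {
        "raw": (root / "raw").is_dir(),
        "config": (meta / "config.json").is_file(),
        "catalog": (meta / "catalog.json").is_file(),
        "runs": (meta / "runs").is_dir(),
    }
```

For example, `describe_corpus(Path("corpora/example"))` returns a dictionary of booleans, one per expected path.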
+## Ingest items
+The simplest ingestion flows use the command line interface.
+Create a corpus:
+```
+python3 -m biblicus init corpora/example
+```
+Ingest a local file:
+```
+python3 -m biblicus ingest --corpus corpora/example path/to/file.pdf --tag paper
+```
+Ingest a web address:
+```
+python3 -m biblicus ingest --corpus corpora/example https://example.com --tag web
+```
+Ingest a text note:
+```
+python3 -m biblicus ingest --corpus corpora/example --note "Hello" --title "First note" --tag notes
+```
+List items:
+```
+python3 -m biblicus list --corpus corpora/example
+```
+Show an item:
+```
+python3 -m biblicus show --corpus corpora/example ITEM_ID
+```
+## Metadata
+Metadata is intentionally simple and file based.
+For Markdown items, metadata lives in a YAML front matter block.
+For non-Markdown items, metadata lives in a sidecar file with the suffix `.biblicus.yml`.
+The raw file and its metadata file are meant to be opened, edited, and backed up with ordinary tools.
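The two metadata locations described above can be illustrated with a short standard-library sketch. Two assumptions are made here: the sidecar name is formed by appending `.biblicus.yml` to the full raw file name (the text only states the suffix), and front matter is delimited by lines containing only `---`. Both helpers are hypothetical and return the YAML as raw text rather than parsing it.

```python
from pathlib import Path
from typing import Optional, Tuple

SIDECAR_SUFFIX = ".biblicus.yml"


def sidecar_path(raw_path: Path) -> Path:
    """Return the assumed sidecar location for a non-Markdown raw file."""
    # Assumption: the suffix is appended to the complete file name.
    return raw_path.with_name(raw_path.name + SIDECAR_SUFFIX)


def split_front_matter(text: str) -> Tuple[Optional[str], str]:
    """Split Markdown text into (front matter text, body); front matter may be None."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return None, text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "".join(lines[1:index]), "".join(lines[index + 1:])
    return None, text  # no closing delimiter, so treat everything as body
```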
+## Ignore rules
+If you are importing a folder tree, ignore rules can prevent accidental ingestion of build artifacts, caches, or other irrelevant files.
+Create a `.biblicusignore` file in the corpus root and add ignore patterns.
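The text does not specify the pattern syntax, so the following is only a sketch of how ignore rules of this kind are commonly evaluated. It assumes one shell-style glob per line, with blank lines and `#` comments skipped, matched against either the relative path or the file name. The `load_patterns` and `is_ignored` helpers are hypothetical, and the actual Biblicus matching rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List


def load_patterns(text: str) -> List[str]:
    """Parse ignore-file text into patterns, skipping blank lines and comments."""
    patterns = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            patterns.append(stripped)
    return patterns


def is_ignored(relative_path: str, patterns: List[str]) -> bool:
    """Return True if the relative path or its file name matches any pattern."""
    path = PurePosixPath(relative_path)
    return any(
        fnmatchcase(path.as_posix(), pattern) or fnmatchcase(path.name, pattern)
        for pattern in patterns
    )
```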
+## Import a folder tree
+To ingest an existing folder tree into a corpus while preserving relative paths, use the import command.
+```
+python3 -m biblicus import-tree --corpus corpora/example /path/to/folder/tree --tag imported
+```
+## Reindex
+The catalog is rebuildable. If you edit files or sidecar metadata, refresh the catalog.
+```
+python3 -m biblicus reindex --corpus corpora/example
+```
+## Purge
+Purging deletes all items and derived artifacts under the corpus. It requires you to type the corpus name as confirmation.
+```
+python3 -m biblicus purge --corpus corpora/example --confirm example
+```
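The confirmation rule described above amounts to comparing the typed value with the corpus folder name. A minimal sketch of that check, assuming the corpus name is the final component of the corpus path (as in the `corpora/example` command above), could look like this. The `confirmation_matches` helper is hypothetical and is not the Biblicus implementation.

```python
from pathlib import Path


def confirmation_matches(corpus_path: str, confirm: str) -> bool:
    """Return True when the typed confirmation equals the corpus folder name."""
    # Assumption: the corpus name is the last component of the resolved path.
    return Path(corpus_path).resolve().name == confirm
```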
