biblicus 0.4.0__tar.gz → 0.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (155)
  1. {biblicus-0.4.0/src/biblicus.egg-info → biblicus-0.6.0}/PKG-INFO +283 -102
  2. biblicus-0.6.0/README.md +504 -0
  3. {biblicus-0.4.0 → biblicus-0.6.0}/docs/ARCHITECTURE.md +1 -0
  4. biblicus-0.6.0/docs/CONTEXT_PACK.md +61 -0
  5. {biblicus-0.4.0 → biblicus-0.6.0}/docs/FEATURE_INDEX.md +35 -0
  6. biblicus-0.6.0/docs/KNOWLEDGE_BASE.md +68 -0
  7. biblicus-0.6.0/docs/ROADMAP.md +140 -0
  8. {biblicus-0.4.0 → biblicus-0.6.0}/docs/api.rst +8 -0
  9. {biblicus-0.4.0 → biblicus-0.6.0}/docs/index.rst +2 -0
  10. biblicus-0.6.0/features/context_pack.feature +42 -0
  11. biblicus-0.6.0/features/context_pack_cli.feature +29 -0
  12. biblicus-0.6.0/features/evidence_processing.feature +25 -0
  13. biblicus-0.6.0/features/knowledge_base.feature +55 -0
  14. biblicus-0.6.0/features/query_processing.feature +27 -0
  15. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/cli_steps.py +102 -0
  16. biblicus-0.6.0/features/steps/context_pack_steps.py +115 -0
  17. biblicus-0.6.0/features/steps/evidence_processing_steps.py +47 -0
  18. biblicus-0.6.0/features/steps/knowledge_base_steps.py +90 -0
  19. biblicus-0.6.0/features/token_budget.feature +37 -0
  20. {biblicus-0.4.0 → biblicus-0.6.0}/pyproject.toml +1 -1
  21. biblicus-0.6.0/scripts/readme_end_to_end_demo.py +81 -0
  22. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/__init__.py +3 -1
  23. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/cli.py +90 -1
  24. biblicus-0.6.0/src/biblicus/context.py +183 -0
  25. biblicus-0.6.0/src/biblicus/evidence_processing.py +201 -0
  26. biblicus-0.6.0/src/biblicus/knowledge_base.py +191 -0
  27. {biblicus-0.4.0 → biblicus-0.6.0/src/biblicus.egg-info}/PKG-INFO +283 -102
  28. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus.egg-info/SOURCES.txt +15 -0
  29. biblicus-0.4.0/README.md +0 -323
  30. biblicus-0.4.0/docs/ROADMAP.md +0 -200
  31. {biblicus-0.4.0 → biblicus-0.6.0}/LICENSE +0 -0
  32. {biblicus-0.4.0 → biblicus-0.6.0}/MANIFEST.in +0 -0
  33. {biblicus-0.4.0 → biblicus-0.6.0}/THIRD_PARTY_NOTICES.md +0 -0
  34. {biblicus-0.4.0 → biblicus-0.6.0}/datasets/wikipedia_mini.json +0 -0
  35. {biblicus-0.4.0 → biblicus-0.6.0}/docs/BACKENDS.md +0 -0
  36. {biblicus-0.4.0 → biblicus-0.6.0}/docs/CORPUS.md +0 -0
  37. {biblicus-0.4.0 → biblicus-0.6.0}/docs/CORPUS_DESIGN.md +0 -0
  38. {biblicus-0.4.0 → biblicus-0.6.0}/docs/DEMOS.md +0 -0
  39. {biblicus-0.4.0 → biblicus-0.6.0}/docs/EXTRACTION.md +0 -0
  40. {biblicus-0.4.0 → biblicus-0.6.0}/docs/TESTING.md +0 -0
  41. {biblicus-0.4.0 → biblicus-0.6.0}/docs/USER_CONFIGURATION.md +0 -0
  42. {biblicus-0.4.0 → biblicus-0.6.0}/docs/conf.py +0 -0
  43. {biblicus-0.4.0 → biblicus-0.6.0}/features/backend_validation.feature +0 -0
  44. {biblicus-0.4.0 → biblicus-0.6.0}/features/biblicus_corpus.feature +0 -0
  45. {biblicus-0.4.0 → biblicus-0.6.0}/features/cli_entrypoint.feature +0 -0
  46. {biblicus-0.4.0 → biblicus-0.6.0}/features/cli_parsing.feature +0 -0
  47. {biblicus-0.4.0 → biblicus-0.6.0}/features/content_sniffing.feature +0 -0
  48. {biblicus-0.4.0 → biblicus-0.6.0}/features/corpus_edge_cases.feature +0 -0
  49. {biblicus-0.4.0 → biblicus-0.6.0}/features/corpus_identity.feature +0 -0
  50. {biblicus-0.4.0 → biblicus-0.6.0}/features/corpus_purge.feature +0 -0
  51. {biblicus-0.4.0 → biblicus-0.6.0}/features/crawl.feature +0 -0
  52. {biblicus-0.4.0 → biblicus-0.6.0}/features/environment.py +0 -0
  53. {biblicus-0.4.0 → biblicus-0.6.0}/features/error_cases.feature +0 -0
  54. {biblicus-0.4.0 → biblicus-0.6.0}/features/evaluation.feature +0 -0
  55. {biblicus-0.4.0 → biblicus-0.6.0}/features/extraction_error_handling.feature +0 -0
  56. {biblicus-0.4.0 → biblicus-0.6.0}/features/extraction_run_lifecycle.feature +0 -0
  57. {biblicus-0.4.0 → biblicus-0.6.0}/features/extraction_selection.feature +0 -0
  58. {biblicus-0.4.0 → biblicus-0.6.0}/features/extraction_selection_longest.feature +0 -0
  59. {biblicus-0.4.0 → biblicus-0.6.0}/features/extractor_pipeline.feature +0 -0
  60. {biblicus-0.4.0 → biblicus-0.6.0}/features/extractor_validation.feature +0 -0
  61. {biblicus-0.4.0 → biblicus-0.6.0}/features/frontmatter.feature +0 -0
  62. {biblicus-0.4.0 → biblicus-0.6.0}/features/hook_config_validation.feature +0 -0
  63. {biblicus-0.4.0 → biblicus-0.6.0}/features/hook_error_handling.feature +0 -0
  64. {biblicus-0.4.0 → biblicus-0.6.0}/features/import_tree.feature +0 -0
  65. {biblicus-0.4.0 → biblicus-0.6.0}/features/ingest_sources.feature +0 -0
  66. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_audio_samples.feature +0 -0
  67. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_image_samples.feature +0 -0
  68. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_mixed_corpus.feature +0 -0
  69. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_mixed_extraction.feature +0 -0
  70. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_ocr_image_extraction.feature +0 -0
  71. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_pdf_retrieval.feature +0 -0
  72. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_pdf_samples.feature +0 -0
  73. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_unstructured_extraction.feature +0 -0
  74. {biblicus-0.4.0 → biblicus-0.6.0}/features/integration_wikipedia.feature +0 -0
  75. {biblicus-0.4.0 → biblicus-0.6.0}/features/lifecycle_hooks.feature +0 -0
  76. {biblicus-0.4.0 → biblicus-0.6.0}/features/model_validation.feature +0 -0
  77. {biblicus-0.4.0 → biblicus-0.6.0}/features/ocr_extractor.feature +0 -0
  78. {biblicus-0.4.0 → biblicus-0.6.0}/features/pdf_text_extraction.feature +0 -0
  79. {biblicus-0.4.0 → biblicus-0.6.0}/features/python_api.feature +0 -0
  80. {biblicus-0.4.0 → biblicus-0.6.0}/features/python_hook_logging.feature +0 -0
  81. {biblicus-0.4.0 → biblicus-0.6.0}/features/retrieval_budget.feature +0 -0
  82. {biblicus-0.4.0 → biblicus-0.6.0}/features/retrieval_scan.feature +0 -0
  83. {biblicus-0.4.0 → biblicus-0.6.0}/features/retrieval_sqlite_full_text_search.feature +0 -0
  84. {biblicus-0.4.0 → biblicus-0.6.0}/features/retrieval_uses_extraction_run.feature +0 -0
  85. {biblicus-0.4.0 → biblicus-0.6.0}/features/retrieval_utilities.feature +0 -0
  86. {biblicus-0.4.0 → biblicus-0.6.0}/features/source_loading.feature +0 -0
  87. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/backend_steps.py +0 -0
  88. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/cli_parsing_steps.py +0 -0
  89. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/crawl_steps.py +0 -0
  90. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/extraction_run_lifecycle_steps.py +0 -0
  91. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/extraction_steps.py +0 -0
  92. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/extractor_steps.py +0 -0
  93. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/frontmatter_steps.py +0 -0
  94. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/model_steps.py +0 -0
  95. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/openai_steps.py +0 -0
  96. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/pdf_steps.py +0 -0
  97. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/python_api_steps.py +0 -0
  98. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/rapidocr_steps.py +0 -0
  99. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/retrieval_steps.py +0 -0
  100. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/stt_steps.py +0 -0
  101. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/unstructured_steps.py +0 -0
  102. {biblicus-0.4.0 → biblicus-0.6.0}/features/steps/user_config_steps.py +0 -0
  103. {biblicus-0.4.0 → biblicus-0.6.0}/features/streaming_ingest.feature +0 -0
  104. {biblicus-0.4.0 → biblicus-0.6.0}/features/stt_extractor.feature +0 -0
  105. {biblicus-0.4.0 → biblicus-0.6.0}/features/text_extraction_runs.feature +0 -0
  106. {biblicus-0.4.0 → biblicus-0.6.0}/features/unstructured_extractor.feature +0 -0
  107. {biblicus-0.4.0 → biblicus-0.6.0}/features/user_config.feature +0 -0
  108. {biblicus-0.4.0 → biblicus-0.6.0}/scripts/download_audio_samples.py +0 -0
  109. {biblicus-0.4.0 → biblicus-0.6.0}/scripts/download_image_samples.py +0 -0
  110. {biblicus-0.4.0 → biblicus-0.6.0}/scripts/download_mixed_samples.py +0 -0
  111. {biblicus-0.4.0 → biblicus-0.6.0}/scripts/download_pdf_samples.py +0 -0
  112. {biblicus-0.4.0 → biblicus-0.6.0}/scripts/download_wikipedia.py +0 -0
  113. {biblicus-0.4.0 → biblicus-0.6.0}/scripts/test.py +0 -0
  114. {biblicus-0.4.0 → biblicus-0.6.0}/setup.cfg +0 -0
  115. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/__main__.py +0 -0
  116. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/_vendor/dotyaml/__init__.py +0 -0
  117. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/_vendor/dotyaml/interpolation.py +0 -0
  118. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/_vendor/dotyaml/loader.py +0 -0
  119. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/_vendor/dotyaml/transformer.py +0 -0
  120. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/backends/__init__.py +0 -0
  121. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/backends/base.py +0 -0
  122. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/backends/scan.py +0 -0
  123. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/backends/sqlite_full_text_search.py +0 -0
  124. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/constants.py +0 -0
  125. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/corpus.py +0 -0
  126. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/crawl.py +0 -0
  127. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/errors.py +0 -0
  128. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/evaluation.py +0 -0
  129. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extraction.py +0 -0
  130. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/__init__.py +0 -0
  131. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/base.py +0 -0
  132. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/metadata_text.py +0 -0
  133. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/openai_stt.py +0 -0
  134. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/pass_through_text.py +0 -0
  135. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/pdf_text.py +0 -0
  136. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/pipeline.py +0 -0
  137. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/rapidocr_text.py +0 -0
  138. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/select_longest_text.py +0 -0
  139. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/select_text.py +0 -0
  140. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/extractors/unstructured_text.py +0 -0
  141. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/frontmatter.py +0 -0
  142. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/hook_logging.py +0 -0
  143. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/hook_manager.py +0 -0
  144. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/hooks.py +0 -0
  145. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/ignore.py +0 -0
  146. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/models.py +0 -0
  147. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/retrieval.py +0 -0
  148. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/sources.py +0 -0
  149. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/time.py +0 -0
  150. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/uris.py +0 -0
  151. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus/user_config.py +0 -0
  152. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus.egg-info/dependency_links.txt +0 -0
  153. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus.egg-info/entry_points.txt +0 -0
  154. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus.egg-info/requires.txt +0 -0
  155. {biblicus-0.4.0 → biblicus-0.6.0}/src/biblicus.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: biblicus
3
- Version: 0.4.0
3
+ Version: 0.6.0
4
4
  Summary: Command line interface and Python library for corpus ingestion, retrieval, and evaluation.
5
5
  License: MIT
6
6
  Requires-Python: >=3.9
@@ -41,11 +41,45 @@ The first practical problem is not retrieval. It is collection and care. You nee
41
41
 
42
42
  This library gives you a corpus, which is a normal folder on disk. It stores each ingested item as a file, with optional metadata stored next to it. You can open and inspect the raw files directly. Any derived catalog or index can be rebuilt from the raw corpus.
43
43
 
44
- It can be used alongside LangChain, Tactus, Pydantic AI, or the agent development kit. Use it from Python or from the command line interface.
44
+ It can be used alongside LangGraph, Tactus, Pydantic AI, any agent framework, or your own setup. Use it from Python or from the command line interface.
45
45
 
46
46
  See [retrieval augmented generation overview] for a short introduction to the idea.
47
47
 
48
- ## A beginner friendly mental model
48
+ ## Start with a knowledge base
49
+
50
+ If you just want to hand a folder to your assistant and move on, use the high-level knowledge base interface. The folder can be nothing more than a handful of plain text files. You are not choosing a retrieval strategy yet. You are just collecting.
51
+
52
+ This example assumes a folder called `notes/` with a few `.txt` files. The knowledge base handles sensible defaults and still gives you a clear context pack for your model call.
53
+
54
+ ```python
55
+ from biblicus.knowledge_base import KnowledgeBase
56
+
57
+
58
+ kb = KnowledgeBase.from_folder("notes")
59
+ result = kb.query("Primary button style preference")
60
+ context_pack = kb.context_pack(result, max_tokens=800)
61
+
62
+ print(context_pack.text)
63
+ ```
64
+
65
+ If you want to run a real, executable version of this story, use `scripts/readme_end_to_end_demo.py` from a fresh clone.
66
+
67
+ This simplified sequence diagram shows the same idea at a high level.
68
+
69
+ ```mermaid
70
+ %%{init: {"theme": "base", "themeVariables": {"primaryColor": "#f3e5f5", "primaryTextColor": "#111111", "primaryBorderColor": "#8e24aa", "lineColor": "#90a4ae", "secondaryColor": "#eceff1", "tertiaryColor": "#ffffff", "noteBkgColor": "#ffffff", "noteTextColor": "#111111", "actorBkg": "#f3e5f5", "actorBorder": "#8e24aa", "actorTextColor": "#111111"}}}%%
71
+ sequenceDiagram
72
+ participant App as Your assistant code
73
+ participant KB as Knowledge base
74
+ participant LLM as Large language model
75
+
76
+ App->>KB: query
77
+ KB-->>App: evidence and context
78
+ App->>LLM: context plus prompt
79
+ LLM-->>App: response draft
80
+ ```
81
+
82
+ ## A simple mental model
49
83
 
50
84
  Think in three stages.
51
85
 
@@ -63,94 +97,30 @@ If you learn a few project words, the rest of the system becomes predictable.
63
97
  - Run is a recorded retrieval build for a corpus.
64
98
  - Evidence is what retrieval returns, with identifiers and source information.
65
99
 
66
- ## Diagram
67
-
68
- This diagram shows how a corpus becomes evidence for an assistant.
69
- Extraction is introduced here as a separate stage so you can swap extraction approaches without changing the raw corpus.
70
- The legend shows what the block styles mean.
71
- Your code is where you decide how to turn evidence into context and how to call a model.
72
-
73
- ```mermaid
74
- %%{init: {"flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
75
- flowchart LR
76
- subgraph Legend[Legend]
77
- direction LR
78
- LegendArtifact[Stored artifact or evidence]
79
- LegendStep[Step]
80
- LegendArtifact --- LegendStep
81
- end
82
-
83
- subgraph Main[" "]
84
- direction TB
85
-
86
- subgraph StableCore[Stable core]
87
- direction TB
88
- Source[Source items] --> Ingest[Ingest]
89
- Ingest --> Raw[Raw item files]
90
- Raw --> Catalog[Catalog file]
91
- end
92
-
93
- subgraph PluggableExtractionPipeline[Pluggable: extraction pipeline]
94
- direction TB
95
- Catalog --> Extract[Extract pipeline]
96
- Extract --> ExtractedText[Extracted text artifacts]
97
- ExtractedText --> ExtractionRun[Extraction run manifest]
98
- end
99
-
100
- subgraph PluggableRetrievalBackend[Pluggable: retrieval backend]
101
- direction LR
102
-
103
- subgraph BackendIngestionIndexing[Ingestion and indexing]
104
- direction TB
105
- ExtractionRun --> Build[Build run]
106
- Build --> BackendIndex[Backend index]
107
- BackendIndex --> Run[Run manifest]
108
- end
100
+ ## Where it fits in an assistant
109
101
 
110
- subgraph BackendRetrievalGeneration[Retrieval and generation]
111
- direction TB
112
- Run --> Query[Query]
113
- Query --> Evidence[Evidence]
114
- end
115
- end
102
+ Biblicus does not answer user questions. It is not a language model. It helps your assistant answer them by retrieving relevant material and returning it as structured evidence. Your code decides how to turn evidence into a context pack for the model call, which is then passed to a model you choose.
116
103
 
117
- Evidence --> Context
104
+ In a coding assistant, retrieval is often triggered by what the user is doing right now. For example: you are about to propose a user interface change, so you retrieve the user's stated preferences, then you include that as context for the model call.
118
105
 
119
- subgraph YourCode[Your code]
120
- direction TB
121
- Context[Assistant context] --> Model[Large language model call]
122
- Model --> Answer[Answer]
123
- end
106
+ This diagram shows two sequential Biblicus calls. They are shown separately to make the boundaries explicit: retrieval returns evidence, and context pack building consumes evidence.
124
107
 
125
- style StableCore fill:#ffffff,stroke:#8e24aa,stroke-width:2px,color:#111111
126
- style PluggableExtractionPipeline fill:#ffffff,stroke:#5e35b1,stroke-dasharray:6 3,stroke-width:2px,color:#111111
127
- style PluggableRetrievalBackend fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
128
- style YourCode fill:#ffffff,stroke:#d81b60,stroke-width:2px,color:#111111
129
- style BackendIngestionIndexing fill:#ffffff,stroke:#cfd8dc,color:#111111
130
- style BackendRetrievalGeneration fill:#ffffff,stroke:#cfd8dc,color:#111111
131
-
132
- style Raw fill:#f3e5f5,stroke:#8e24aa,color:#111111
133
- style Catalog fill:#f3e5f5,stroke:#8e24aa,color:#111111
134
- style ExtractedText fill:#f3e5f5,stroke:#8e24aa,color:#111111
135
- style ExtractionRun fill:#f3e5f5,stroke:#8e24aa,color:#111111
136
- style BackendIndex fill:#f3e5f5,stroke:#8e24aa,color:#111111
137
- style Run fill:#f3e5f5,stroke:#8e24aa,color:#111111
138
- style Evidence fill:#f3e5f5,stroke:#8e24aa,color:#111111
139
- style Context fill:#f3e5f5,stroke:#8e24aa,color:#111111
140
- style Answer fill:#f3e5f5,stroke:#8e24aa,color:#111111
141
- style Source fill:#f3e5f5,stroke:#8e24aa,color:#111111
142
-
143
- style Ingest fill:#eceff1,stroke:#90a4ae,color:#111111
144
- style Extract fill:#eceff1,stroke:#90a4ae,color:#111111
145
- style Build fill:#eceff1,stroke:#90a4ae,color:#111111
146
- style Query fill:#eceff1,stroke:#90a4ae,color:#111111
147
- style Model fill:#eceff1,stroke:#90a4ae,color:#111111
148
- end
149
-
150
- style Legend fill:#ffffff,stroke:#ffffff,color:#111111
151
- style Main fill:#ffffff,stroke:#ffffff,color:#111111
152
- style LegendArtifact fill:#f3e5f5,stroke:#8e24aa,color:#111111
153
- style LegendStep fill:#eceff1,stroke:#90a4ae,color:#111111
108
+ ```mermaid
109
+ %%{init: {"theme": "base", "themeVariables": {"primaryColor": "#f3e5f5", "primaryTextColor": "#111111", "primaryBorderColor": "#8e24aa", "lineColor": "#90a4ae", "secondaryColor": "#eceff1", "tertiaryColor": "#ffffff", "noteBkgColor": "#ffffff", "noteTextColor": "#111111", "actorBkg": "#f3e5f5", "actorBorder": "#8e24aa", "actorTextColor": "#111111"}}}%%
110
+ sequenceDiagram
111
+ participant User
112
+ participant App as Your assistant code
113
+ participant Bib as Biblicus
114
+ participant LLM as Large language model
115
+
116
+ User->>App: request
117
+ App->>Bib: query retrieval
118
+ Bib-->>App: retrieval result evidence JSON
119
+ App->>Bib: build context pack from evidence
120
+ Bib-->>App: context pack text
121
+ App->>LLM: context pack plus prompt
122
+ LLM-->>App: response draft
123
+ App-->>User: response
154
124
  ```
155
125
 
156
126
  ## Practical value
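The sequence diagram in the hunk above keeps retrieval and context pack building as two separate calls. The following is a minimal sketch of the second call only, assuming evidence arrives as a list of dictionaries carrying the `rank` and `text` fields that appear in the example query output later in this diff. The helper name `join_evidence` is hypothetical and is not part of the `biblicus.context` module.

```python
from typing import Dict, List


def join_evidence(evidence: List[Dict], join_with: str = "\n\n") -> str:
    """Join evidence texts in rank order into one context block (hypothetical helper)."""
    # Lower rank numbers come first, matching "rank": 1 for the best match.
    ranked = sorted(evidence, key=lambda item: item["rank"])
    # Skip evidence entries that carry no text (for example, a null "text" field).
    texts = [item["text"].strip() for item in ranked if item.get("text")]
    return join_with.join(texts)


evidence = [
    {"rank": 2, "text": "The user prefers concise answers."},
    {"rank": 1, "text": "Primary button style preference: the user's favorite color is magenta."},
    {"rank": 3, "text": None},
]
context_text = join_evidence(evidence)
print(context_text)
```

Keeping this step separate from retrieval means the join policy can change without touching how evidence is produced.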
@@ -217,6 +187,216 @@ biblicus crawl --corpus corpora/example \
217
187
  --tag crawled
218
188
  ```
219
189
 
190
+ ## End-to-end example: lower-level control
191
+
192
+ The command-line interface returns JavaScript Object Notation by default. This makes it easy to use Biblicus in scripts and to treat retrieval as a deterministic, testable step.
193
+
194
+ This version shows the lower-level pieces explicitly. You are building the corpus, controlling each memory string, choosing the backend, and shaping the context pack yourself.
195
+
196
+ ```python
197
+ from biblicus.backends import get_backend
198
+ from biblicus.context import ContextPackPolicy, TokenBudget, build_context_pack, fit_context_pack_to_token_budget
199
+ from biblicus.corpus import Corpus
200
+ from biblicus.models import QueryBudget
201
+
202
+
203
+ corpus = Corpus.init("corpora/story")
204
+
205
+ notes = [
206
+     ("User name", "The user's name is Tactus Maximus."),
207
+     ("Button style preference", "Primary button style preference: the user's favorite color is magenta."),
208
+     ("Style preference", "The user prefers concise answers."),
209
+     ("Language preference", "The user dislikes idioms and abbreviations."),
210
+     ("Engineering preference", "The user likes code that is over-documented and behavior-driven."),
211
+ ]
212
+ for note_title, note_text in notes:
213
+     corpus.ingest_note(note_text, title=note_title, tags=["memory"])
214
+
215
+ backend = get_backend("scan")
216
+ run = backend.build_run(corpus, recipe_name="Story demo", config={})
217
+ budget = QueryBudget(max_total_items=5, max_total_characters=2000, max_items_per_source=None)
218
+ result = backend.query(
219
+     corpus,
220
+     run=run,
221
+     query_text="Primary button style preference",
222
+     budget=budget,
223
+ )
224
+
225
+ policy = ContextPackPolicy(join_with="\n\n")
226
+ context_pack = build_context_pack(result, policy=policy)
227
+ context_pack = fit_context_pack_to_token_budget(
228
+     context_pack,
229
+     policy=policy,
230
+     token_budget=TokenBudget(max_tokens=60),
231
+ )
232
+ print(context_pack.text)
233
+ ```
234
+
235
+ If you want a runnable version of this story, use the script at `scripts/readme_end_to_end_demo.py`.
236
+
237
+ If you prefer the command-line interface, here is the same flow in compressed form:
238
+
239
+ ```
240
+ biblicus init corpora/story
241
+ biblicus ingest --corpus corpora/story --stdin --title "User name" --tag memory <<< "The user's name is Tactus Maximus."
242
+ biblicus ingest --corpus corpora/story --stdin --title "Button style preference" --tag memory <<< "Primary button style preference: the user's favorite color is magenta."
243
+ biblicus ingest --corpus corpora/story --stdin --title "Style preference" --tag memory <<< "The user prefers concise answers."
244
+ biblicus ingest --corpus corpora/story --stdin --title "Language preference" --tag memory <<< "The user dislikes idioms and abbreviations."
245
+ biblicus ingest --corpus corpora/story --stdin --title "Engineering preference" --tag memory <<< "The user likes code that is over-documented and behavior-driven."
246
+ biblicus build --corpus corpora/story --backend scan
247
+ biblicus query --corpus corpora/story --query "Primary button style preference"
248
+ ```
249
+
250
+ Example output:
251
+
252
+ ```json
253
+ {
254
+ "query_text": "Primary button style preference",
255
+ "budget": {
256
+ "max_total_items": 5,
257
+ "max_total_characters": 2000,
258
+ "max_items_per_source": null
259
+ },
260
+ "run_id": "RUN_ID",
261
+ "recipe_id": "RECIPE_ID",
262
+ "backend_id": "scan",
263
+ "generated_at": "2026-01-29T00:00:00.000000Z",
264
+ "evidence": [
265
+ {
266
+ "item_id": "ITEM_ID",
267
+ "source_uri": "text",
268
+ "media_type": "text/markdown",
269
+ "score": 1.0,
270
+ "rank": 1,
271
+ "text": "Primary button style preference: the user's favorite color is magenta.",
272
+ "content_ref": null,
273
+ "span_start": null,
274
+ "span_end": null,
275
+ "stage": "scan",
276
+ "recipe_id": "RECIPE_ID",
277
+ "run_id": "RUN_ID",
278
+ "hash": null
279
+ }
280
+ ],
281
+ "stats": {}
282
+ }
283
+ ```
284
+
285
+ Evidence is the output contract. Your code decides how to convert evidence into assistant context.
286
+
287
+ ### Turn evidence into a context pack
288
+
289
+ A context pack is a readable text block you send to a model. There is no single correct format. Treat it as a policy surface you can iterate on.
290
+
291
+ Here is a minimal example that builds a context pack from evidence:
292
+
293
+ ```python
294
+ from biblicus.context import ContextPackPolicy, build_context_pack
295
+
296
+
297
+ policy = ContextPackPolicy(
298
+     join_with="\n\n",
299
+ )
300
+ context_pack = build_context_pack(result, policy=policy)
301
+ print(context_pack.text)
302
+ ```
303
+
304
+ Example context pack output:
305
+
306
+ ```text
307
+ Primary button style preference: the user's favorite color is magenta.
308
+ ```
309
+
310
+ You can also build a context pack from the command-line interface by piping the retrieval result:
311
+
312
+ ```
313
+ biblicus query --corpus corpora/story --query "Primary button style preference" \
314
+   | biblicus context-pack build
315
+ ```
316
+
317
+ Most production systems also apply a budget when building context. If you want a precise token budget, the budgeting logic needs a specific tokenizer and should be treated as its own stage.
318
+
319
+ ## Pipeline diagram
320
+
321
+ This diagram shows how a corpus becomes evidence for your assistant. Your code decides how to turn evidence into context and how to call a model.
322
+
323
+ ```mermaid
324
+ %%{init: {"theme": "base", "themeVariables": {"primaryColor": "#f3e5f5", "primaryTextColor": "#111111", "primaryBorderColor": "#8e24aa", "lineColor": "#90a4ae", "secondaryColor": "#eceff1", "tertiaryColor": "#ffffff"}, "flowchart": {"useMaxWidth": true, "nodeSpacing": 18, "rankSpacing": 22}}}%%
325
+ flowchart TB
326
+ subgraph Legend[Legend]
327
+ direction LR
328
+ LegendArtifact[Stored artifact or evidence]
329
+ LegendStep[Step]
330
+ LegendArtifact --- LegendStep
331
+ end
332
+
333
+ subgraph Main[" "]
334
+ direction TB
335
+
336
+ subgraph Pipeline[" "]
337
+ direction TB
338
+
339
+ subgraph RowStable[Stable core]
340
+ direction TB
341
+ Source[Source items] --> Ingest[Ingest] --> Raw[Raw item files] --> Catalog[Catalog file]
342
+ end
343
+
344
+ subgraph RowExtraction[Pluggable: extraction pipeline]
345
+ direction TB
346
+ Catalog --> Extract[Extract pipeline] --> ExtractedText[Extracted text artifacts] --> ExtractionRun[Extraction run manifest]
347
+ end
348
+
349
+ subgraph RowRetrieval[Pluggable: retrieval backend]
350
+ direction TB
351
+ ExtractionRun --> Build[Build run] --> BackendIndex[Backend index] --> Run[Run manifest] --> Retrieve[Retrieve] --> Rerank[Rerank optional] --> Filter[Filter optional] --> Evidence[Evidence]
352
+ end
353
+
354
+ subgraph RowContext[Context]
355
+ direction TB
356
+ Evidence --> ContextPack[Context pack] --> FitTokens[Fit tokens optional] --> Context[Assistant context]
357
+ end
358
+
359
+ subgraph RowYourCode[Your code]
360
+ direction TB
361
+ Context --> Model[Large language model call] --> Answer[Answer]
362
+ end
363
+ end
364
+
365
+ style RowStable fill:#ffffff,stroke:#8e24aa,stroke-width:2px,color:#111111
366
+ style RowExtraction fill:#ffffff,stroke:#5e35b1,stroke-dasharray:6 3,stroke-width:2px,color:#111111
367
+ style RowRetrieval fill:#ffffff,stroke:#1e88e5,stroke-dasharray:6 3,stroke-width:2px,color:#111111
368
+ style RowContext fill:#ffffff,stroke:#7b1fa2,stroke-width:2px,color:#111111
369
+ style RowYourCode fill:#ffffff,stroke:#d81b60,stroke-width:2px,color:#111111
370
+
371
+ style Raw fill:#f3e5f5,stroke:#8e24aa,color:#111111
372
+ style Catalog fill:#f3e5f5,stroke:#8e24aa,color:#111111
373
+ style ExtractedText fill:#f3e5f5,stroke:#8e24aa,color:#111111
374
+ style ExtractionRun fill:#f3e5f5,stroke:#8e24aa,color:#111111
375
+ style BackendIndex fill:#f3e5f5,stroke:#8e24aa,color:#111111
376
+ style Run fill:#f3e5f5,stroke:#8e24aa,color:#111111
377
+ style Evidence fill:#f3e5f5,stroke:#8e24aa,color:#111111
378
+ style ContextPack fill:#f3e5f5,stroke:#8e24aa,color:#111111
379
+ style Context fill:#f3e5f5,stroke:#8e24aa,color:#111111
380
+ style Answer fill:#f3e5f5,stroke:#8e24aa,color:#111111
381
+ style Source fill:#f3e5f5,stroke:#8e24aa,color:#111111
382
+
383
+ style Ingest fill:#eceff1,stroke:#90a4ae,color:#111111
384
+ style Extract fill:#eceff1,stroke:#90a4ae,color:#111111
385
+ style Build fill:#eceff1,stroke:#90a4ae,color:#111111
386
+ style Retrieve fill:#eceff1,stroke:#90a4ae,color:#111111
387
+ style Rerank fill:#eceff1,stroke:#90a4ae,color:#111111
388
+ style Filter fill:#eceff1,stroke:#90a4ae,color:#111111
389
+ style FitTokens fill:#eceff1,stroke:#90a4ae,color:#111111
390
+ style Model fill:#eceff1,stroke:#90a4ae,color:#111111
391
+ end
392
+
393
+ style Legend fill:#ffffff,stroke:#ffffff,color:#111111
394
+ style Main fill:#ffffff,stroke:#ffffff,color:#111111
395
+ style Pipeline fill:#ffffff,stroke:#ffffff,color:#111111
396
+ style LegendArtifact fill:#f3e5f5,stroke:#8e24aa,color:#111111
397
+ style LegendStep fill:#eceff1,stroke:#90a4ae,color:#111111
398
+ ```
399
+
220
400
  ## Python usage
221
401
 
222
402
  From Python, the same flow is available through the Corpus class and backend interfaces. The public surface area is small on purpose.
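The README text added in this hunk notes that a precise token budget needs a specific tokenizer and should be its own stage. The following rough sketch shows the shape of such a stage, assuming whitespace-separated words as a stand-in for real tokens and assuming that the lowest-ranked blocks are dropped first. The names `count_tokens` and `fit_blocks_to_token_budget` are hypothetical; this is not the behavior of `fit_context_pack_to_token_budget`, which this diff does not show.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Approximate a token count by whitespace splitting (an assumption, not a real tokenizer)."""
    return len(text.split())


def fit_blocks_to_token_budget(blocks: List[str], max_tokens: int, join_with: str = "\n\n") -> str:
    """Drop trailing (lowest-ranked) blocks until the joined text fits the budget."""
    kept = list(blocks)
    while kept and count_tokens(join_with.join(kept)) > max_tokens:
        kept.pop()
    return join_with.join(kept)


# Blocks are assumed to be ordered best match first.
blocks = [
    "Primary button style preference: the user's favorite color is magenta.",
    "The user prefers concise answers.",
    "The user dislikes idioms and abbreviations.",
]
fitted = fit_blocks_to_token_budget(blocks, max_tokens=15)
print(fitted)
```

Swapping `count_tokens` for a model-specific tokenizer would change the counts but not the structure of the loop.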
@@ -229,30 +409,29 @@ From Python, the same flow is available through the Corpus class and backend int
229
409
  - Query a run with `backend.query`.
230
410
  - Evaluate with `evaluate_run`.
231
411
 
232
- ## How it fits into an assistant
233
-
234
- In an assistant system, retrieval usually produces context for a model call. This library treats evidence as the primary output so you can decide how to use it.
235
-
236
- - Use a corpus as the source of truth for raw items.
237
- - Use a backend run to build any derived artifacts needed for retrieval.
238
- - Use queries to obtain evidence objects.
239
- - Convert evidence into the format your framework expects, such as message content, tool output, or citations.
240
-
241
412
  ## Learn more
242
413
 
243
414
  Full documentation is published on GitHub Pages: https://anthusai.github.io/Biblicus/
244
415
 
245
- The documents below are written to be read in order.
416
+ The documents below follow the pipeline from raw items to model context:
246
417
 
247
- - [Architecture][architecture]
248
- - [Roadmap][roadmap]
249
- - [Feature index][feature-index]
250
418
  - [Corpus][corpus]
251
419
  - [Text extraction][text-extraction]
252
- - [User configuration][user-configuration]
420
+ - [Knowledge base][knowledge-base]
253
421
  - [Backends][backends]
422
+ - [Context packs][context-packs]
423
+ - [Testing and evaluation][testing]
424
+
425
+ Reference:
426
+
254
427
  - [Demos][demos]
255
- - [Testing][testing]
428
+ - [User configuration][user-configuration]
429
+
430
+ Design and implementation map:
431
+
432
+ - [Feature index][feature-index]
433
+ - [Roadmap][roadmap]
434
+ - [Architecture][architecture]
256
435
 
257
436
  ## Metadata and catalog
258
437
 
@@ -341,9 +520,11 @@ License terms are in `LICENSE`.
341
520
  [roadmap]: docs/ROADMAP.md
342
521
  [feature-index]: docs/FEATURE_INDEX.md
343
522
  [corpus]: docs/CORPUS.md
523
+ [knowledge-base]: docs/KNOWLEDGE_BASE.md
344
524
  [text-extraction]: docs/EXTRACTION.md
345
525
  [user-configuration]: docs/USER_CONFIGURATION.md
346
526
  [backends]: docs/BACKENDS.md
527
+ [context-packs]: docs/CONTEXT_PACK.md
347
528
  [demos]: docs/DEMOS.md
348
529
  [testing]: docs/TESTING.md
349
530