raglab-0.1.2.tar.gz → raglab-0.2.2.tar.gz

@@ -0,0 +1,78 @@
1
+ # raglab — agent instructions
2
+
3
+ `raglab` is the **agentic-search / RAG orchestration layer**: it turns a
4
+ retrieval substrate into a *Search Agent* (plan → formulate → retrieve →
5
+ evaluate → re-query → rerank → cite) and, later, RAG pipelines on top.
6
+
7
+ > Fresh start (v0.2.0+). This repo took over the `raglab` PyPI name; the old
8
+ > backend lives at `raglab_bak`. Development is just beginning — greenfield.
9
+
10
+ ## The architecture we're building toward
11
+
12
+ The reference design is **`ir_09 — A Composable Search Agent`**, in the **ir
13
+ repo** at `$PP/i/ir/misc/docs/ir_09 -- A Composable Search Agent ...md`, and the
14
+ cross-repo plan is **i2mint/ir epic #38**. Read both before non-trivial work.
15
+
16
+ raglab is the **orchestration layer on top of `ir`** (the retrieval substrate).
17
+ "Structure over concretion": a small set of **roles** (Protocols) wired by a
18
+ control loop, with concrete tools injected at the leaves.
19
+
20
+ ### raglab owns (the *agent*)
21
+
22
+ - **Value types** (frozen, plain data, ir_09 §3): `Query`, `SubTask(goal, sources)`,
23
+ `LowLevelQuery(source, spec)`, `Judgement(relevant, sufficient, refinement)`.
24
+ (`Result` = ir's `SearchHit`/`Disclosure`, reused as-is — do not redefine it.)
25
+ - **Role Protocols** (open-closed strategy seams, injected callables):
26
+ `Planner`, `Formulator`, `Retriever`, `Evaluator`, `Reranker`, `Citer`.
27
+ - **The control loop with the back-edge** (evaluator → reformulate). v1 = an
28
+ imperative `while` loop over a mutable `AgentState` (ir_09 §9), not a
29
+ cyclic-graph runner. This back-edge is what makes it an agent rather than a DAG.
30
+ - **Budget governor** (`max_rounds` / `max_sources_per_task` / `max_results_per_task`);
31
+ termination as a **separately injected policy** (ir_09 §9), not folded into the
32
+ evaluator.
33
+ - **Source registry** = a live `Mapping[name, Retriever]` across *heterogeneous*
34
+ backends (ir corpora + web/SQL/graph); cross-source merge + global rerank at
35
+ the fan-in point.
36
+ - **Two orchestrators behind one interface** (ir_09 §7): `SingleContextAgent`
37
+ (default, cheap, one ReAct loop) and `MultiAgentAgent` (one subagent per
38
+ sub-task/source, ~15× cost, breadth-first only). Promotion swaps only the
39
+ orchestrator, keeping role contracts identical.
40
+
41
+ ### What raglab CONSUMES from ir (do not reimplement these)
42
+
43
+ - `ir.as_retriever(corpus)` → register an ir corpus as one `Retriever` key (#33).
44
+ - `ir.registry.retrievers()` → the ir-corpus slice of the source registry (#34).
45
+ - `ir.make_llm_formulator` / the `formulate=` seam → the Formulator role (#32).
46
+ - `ir.Selection` + its derived `sufficient` signal → Evaluator input (#35).
47
+ - `ir.disclose(..., store=...)` + `SearchHit.to_dict()` → pointer-passing / lazy
48
+ deref across the subagent boundary (#36).
49
+
50
+ ### What belongs ELSEWHERE
51
+
52
+ - **Generation / answer synthesis and the Citer/Verifier** (which needs a
53
+ generated claim) sit with the RAG/generation layer (`srag`), not the search
54
+ agent. The agent's deliverable is **pointers + extractions**, not an essay.
55
+
56
+ ## Dependency direction (load-bearing)
57
+
58
+ **`raglab` imports `ir` (and `oa` for LLM strategies); `ir` NEVER imports
59
+ `raglab`.** Keep LLM ops (`oa`) lazy/opt-in so `import raglab` stays offline.
60
+
61
+ ## Build order (ir_09 §8 / epic #38)
62
+
63
+ 1. `Retriever` Protocol + source registry with 2–3 real backends (ir corpora via
64
+ `ir.as_retriever`; one web/SQL).
65
+ 2. `SingleContextAgent` with a trivial planner + pass-through evaluator wrapping
66
+ ir's search/select/disclose — the thin slice, **no loop yet**.
67
+ 3. LLM `Formulator` + `Evaluator` and **turn on the back-edge**.
68
+ 4. Reranker at fan-in; Citer (in `srag`).
69
+ 5. Budget governor + run-log / observability.
70
+ 6. `MultiAgentAgent` — only if breadth justifies the cost.
71
+
72
+ ## House style (i2mint ecosystem)
73
+
74
+ Functional > OOP; SOLID when OOP; facades, SSOT, dependency injection;
75
+ progressive disclosure; keyword-only beyond the 3rd positional; `collections.abc`
76
+ + frozen `dataclass`es; `Protocol`s for the role seams; every module has a
77
+ top-level docstring. Never `pip install` local ecosystem packages (`ir`, `ef`,
78
+ `vd`, `dol`, `oa`, …) — they're local via `.pth`. wads CI auto-publishes on merge.
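The roles, the mutable-state `while` loop, and the separately injected termination policy described in the agent instructions above can be pictured with a minimal sketch. Everything here is an assumption made for illustration, not raglab's actual code: `Hit` stands in for ir's `SearchHit`, queries are plain strings instead of the `Query`/`LowLevelQuery` value types, and `run_agent`, `stop_when_done` and the `toy_*` callables are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping, Sequence
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for ir's SearchHit."""
    source: str
    text: str
    score: float


@dataclass(frozen=True)
class Judgement:
    """Evaluator verdict; `refinement` is the hint carried over the back-edge."""
    relevant: bool
    sufficient: bool
    refinement: str | None = None


class Formulator(Protocol):
    def __call__(self, query: str, refinement: str | None) -> str: ...


class Retriever(Protocol):
    def __call__(self, query: str) -> Sequence[Hit]: ...


class Evaluator(Protocol):
    def __call__(self, query: str, hits: Sequence[Hit]) -> Judgement: ...


@dataclass
class AgentState:
    """Mutable state threaded through the loop."""
    query: str
    rounds: int = 0
    hits: list[Hit] = field(default_factory=list)


def stop_when_done(max_rounds: int) -> Callable[[AgentState, Judgement], bool]:
    """Termination policy kept apart from the evaluator: sufficiency or budget."""
    def should_stop(state: AgentState, judgement: Judgement) -> bool:
        return judgement.sufficient or state.rounds >= max_rounds
    return should_stop


def run_agent(
    query: str,
    sources: Mapping[str, Retriever],
    *,
    formulate: Formulator,
    evaluate: Evaluator,
    should_stop: Callable[[AgentState, Judgement], bool],
) -> AgentState:
    state = AgentState(query=query)
    refinement = None
    while True:
        low_level = formulate(state.query, refinement)
        for retrieve in sources.values():  # fan out over every registered source
            state.hits.extend(retrieve(low_level))
        state.rounds += 1
        judgement = evaluate(state.query, state.hits)
        if should_stop(state, judgement):
            return state
        refinement = judgement.refinement  # back-edge: evaluator -> reformulate


# Toy strategies, only to exercise the loop.
CORPUS = ("deploy the app with make release", "lint the code with ruff")


def toy_retriever(query: str) -> list[Hit]:
    words = set(query.lower().split())
    scored = ((text, len(words & set(text.split()))) for text in CORPUS)
    return [Hit("docs", text, float(n)) for text, n in scored if n]


def toy_formulator(query: str, refinement: str | None) -> str:
    return query if refinement is None else f"{query} {refinement}"


def toy_evaluator(query: str, hits: Sequence[Hit]) -> Judgement:
    if hits:
        return Judgement(relevant=True, sufficient=True)
    return Judgement(relevant=False, sufficient=False, refinement="deploy")


state = run_agent(
    "ship it",
    {"docs": toy_retriever},
    formulate=toy_formulator,
    evaluate=toy_evaluator,
    should_stop=stop_when_done(max_rounds=3),
)
```

In this toy run the first round finds nothing, the evaluator suggests a refinement, and the second round retrieves the deployment document. Keeping `should_stop` outside the evaluator means the budget can change without touching any role.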
@@ -0,0 +1 @@
1
+ *.ipynb linguist-documentation
@@ -0,0 +1,48 @@
1
+ # wads CI — calls the reusable workflow hosted in i2mint/wads.
2
+ #
3
+ # All configuration comes from this repo's pyproject.toml [tool.wads.ci.*].
4
+ # To customize the workflow itself (rare), replace this file with the
5
+ # full inline template `wads/data/github_ci_uv.yml` from i2mint/wads.
6
+ #
7
+ # Pinning: `@master` floats with wads. If you need version stability for
8
+ # a release-sensitive repo, change `@master` to a wads tag (e.g. `@v0.1.81`).
9
+ # A CI failure cannot affect an already-published release; it only blocks the
10
+ # publish step itself, so floating on `@master` is generally safe.
11
+ #
12
+ # Permissions: GitHub validates that the caller grants AT LEAST the
13
+ # permissions any job in the called workflow requests — at workflow-parse
14
+ # time, not at run-time, even if the job would be skipped via `if:`.
15
+ # The reusable workflow needs:
16
+ # contents: write for the publish job's version-bump push-back
17
+ # and for the github-pages job's gh-pages branch push
18
+ # pages: write for the github-pages job's REST API Pages config
19
+ # Both default to `write` on org-account GITHUB_TOKEN and need to be
20
+ # granted explicitly on personal-account callers (where the default is
21
+ # read-only). No `id-token: write` needed — the publish-github-pages
22
+ # action uses peaceiris/actions-gh-pages (branch-based) + REST API,
23
+ # not the OIDC `actions/deploy-pages` flow.
24
+ name: Continuous Integration
25
+ on: [push, pull_request]
26
+ jobs:
27
+   ci:
28
+     uses: i2mint/wads/.github/workflows/uv-ci.yml@master
29
+     permissions:
30
+       contents: write
31
+       pages: write
32
+     # Explicit pass-through (not `secrets: inherit`) because `inherit` does
33
+     # not reliably propagate caller-repo secrets to a reusable workflow owned
34
+     # by a different account (verified empirically: personal-account caller +
35
+     # i2mint-org workflow → `${{ secrets.PYPI_PASSWORD }}` resolved to empty).
36
+     #
37
+     # This list is the per-repo *transport*: it should contain PYPI_PASSWORD
38
+     # (for publishing) plus every secret your tests/CI need. It is generated
39
+     # from [tool.wads.ci.env] in pyproject.toml. To add one, run
40
+     # wads-secrets add VAR_NAME # updates pyproject + this block
41
+     # or just append a line below. *Which* of these become job env vars (and
42
+     # which are required) is controlled by [tool.wads.ci.env] — passing a
43
+     # secret here does not by itself put it in the environment.
44
+     #
45
+     # A secret name must also be declared in the reusable workflow's superset
46
+     # (wads/ci_secrets.py). `wads-secrets add` warns if it is not.
47
+     secrets:
48
+       PYPI_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
@@ -0,0 +1,120 @@
1
+ .claude/handoffs/
2
+ .claude/scratch/
3
+
4
+ # Byte-compiled / optimized / DLL files
5
+ __pycache__/
6
+ *.py[cod]
7
+ *$py.class
8
+
9
+
10
+ .DS_Store
11
+ # C extensions
12
+ *.so
13
+
14
+ # TLS certificates
15
+ ## Ignore all PEM files anywhere
16
+ *.pem
17
+ ## Also ignore any certs directory
18
+ certs/
19
+
20
+ # Distribution / packaging
21
+ .Python
22
+ build/
23
+ develop-eggs/
24
+ dist/
25
+ downloads/
26
+ eggs/
27
+ .eggs/
28
+ lib/
29
+ lib64/
30
+ parts/
31
+ sdist/
32
+ var/
33
+ wheels/
34
+ *.egg-info/
35
+ .installed.cfg
36
+ *.egg
37
+ MANIFEST
38
+ _build
39
+
40
+ # PyInstaller
41
+ # Usually these files are written by a Python script from a template
42
+ # before PyInstaller builds the exe, so as to inject date/other info into it.
43
+ *.manifest
44
+ *.spec
45
+
46
+ # Installer logs
47
+ pip-log.txt
48
+ pip-delete-this-directory.txt
49
+
50
+ # Unit test / coverage reports
51
+ htmlcov/
52
+ .tox/
53
+ .coverage
54
+ .coverage.*
55
+ .cache
56
+ nosetests.xml
57
+ coverage.xml
58
+ *.cover
59
+ .hypothesis/
60
+ .pytest_cache/
61
+
62
+ # Translations
63
+ *.mo
64
+ *.pot
65
+
66
+ # Django stuff:
67
+ *.log
68
+ local_settings.py
69
+ db.sqlite3
70
+
71
+ # Flask stuff:
72
+ instance/
73
+ .webassets-cache
74
+
75
+ # Scrapy stuff:
76
+ .scrapy
77
+
78
+ # Sphinx documentation
79
+ docs/_build/
80
+ docs/*
81
+
82
+ # PyBuilder
83
+ target/
84
+
85
+ # Jupyter Notebook
86
+ .ipynb_checkpoints
87
+
88
+ # pyenv
89
+ .python-version
90
+
91
+ # celery beat schedule file
92
+ celerybeat-schedule
93
+
94
+ # SageMath parsed files
95
+ *.sage.py
96
+
97
+ # Environments
98
+ .env
99
+ .venv
100
+ env/
101
+ venv/
102
+ ENV/
103
+ env.bak/
104
+ venv.bak/
105
+
106
+ # Spyder project settings
107
+ .spyderproject
108
+ .spyproject
109
+
110
+ # Rope project settings
111
+ .ropeproject
112
+
113
+ # mkdocs documentation
114
+ /site
115
+
116
+ # mypy
117
+ .mypy_cache/
118
+
119
+ # PyCharm
120
+ .idea
@@ -1,6 +1,6 @@
1
1
  MIT License
2
2
 
3
- Copyright (c) [year] [fullname]
3
+ Copyright (c) 2026 thorwhalen
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
raglab-0.2.2/PKG-INFO ADDED
@@ -0,0 +1,34 @@
1
+ Metadata-Version: 2.4
2
+ Name: raglab
3
+ Version: 0.2.2
4
+ Summary: A medley of tools to make RAG-based applications.
5
+ Project-URL: Homepage, https://github.com/thorwhalen/raglab
6
+ Project-URL: Repository, https://github.com/thorwhalen/raglab
7
+ Project-URL: Documentation, https://thorwhalen.github.io/raglab
8
+ Author: thorwhalen
9
+ License: mit
10
+ License-File: LICENSE
11
+ Requires-Python: >=3.10
12
+ Requires-Dist: ir>=0.1.12
13
+ Provides-Extra: dev
14
+ Requires-Dist: pytest-cov>=4.0; extra == 'dev'
15
+ Requires-Dist: pytest>=7.0; extra == 'dev'
16
+ Requires-Dist: ruff>=0.1.0; extra == 'dev'
17
+ Provides-Extra: docs
18
+ Requires-Dist: sphinx-rtd-theme>=1.0; extra == 'docs'
19
+ Requires-Dist: sphinx>=6.0; extra == 'docs'
20
+ Provides-Extra: llm
21
+ Requires-Dist: oa; extra == 'llm'
22
+ Description-Content-Type: text/markdown
23
+
24
+ # raglab
25
+
26
+ A medley of tools to make RAG-based applications.
27
+
28
+ To install: ```pip install raglab```
29
+
30
+ > **Note** — fresh start (v0.2.0+). This is a new project that has taken over the
31
+ > `raglab` name. The earlier `raglab` backend (PyPI 0.0.x–0.1.x) was renamed and
32
+ > now lives at [addaix/raglab_bak](https://github.com/addaix/raglab_bak)
33
+ > (published on PyPI as [`raglab_bak`](https://pypi.org/project/raglab_bak/)).
34
+ > Development of this new `raglab` is just beginning.
raglab-0.2.2/README.md ADDED
@@ -0,0 +1,11 @@
1
+ # raglab
2
+
3
+ A medley of tools to make RAG-based applications.
4
+
5
+ To install: ```pip install raglab```
6
+
7
+ > **Note** — fresh start (v0.2.0+). This is a new project that has taken over the
8
+ > `raglab` name. The earlier `raglab` backend (PyPI 0.0.x–0.1.x) was renamed and
9
+ > now lives at [addaix/raglab_bak](https://github.com/addaix/raglab_bak)
10
+ > (published on PyPI as [`raglab_bak`](https://pypi.org/project/raglab_bak/)).
11
+ > Development of this new `raglab` is just beginning.
@@ -0,0 +1,170 @@
1
+ [build-system]
2
+ requires = [
3
+ "hatchling",
4
+ ]
5
+ build-backend = "hatchling.build"
6
+
7
+ [project]
8
+ name = "raglab"
9
+ version = "0.2.2"
10
+ description = "A medley of tools to make RAG-based applications."
11
+ readme = "README.md"
12
+ requires-python = ">=3.10"
13
+ keywords = []
14
+ authors = [
15
+ { name = "thorwhalen" },
16
+ ]
17
+ # raglab orchestrates retrieval via `ir` (the retrieval substrate). LLM
18
+ # strategies (Planner/Formulator/Evaluator) use `oa` lazily — the `llm` extra
19
+ # below — so `import raglab` stays offline by default.
20
+ dependencies = [
21
+ "ir>=0.1.12",
22
+ ]
23
+
24
+ [project.license]
25
+ text = "mit"
26
+
27
+ [project.urls]
28
+ Homepage = "https://github.com/thorwhalen/raglab"
29
+ Repository = "https://github.com/thorwhalen/raglab"
30
+ Documentation = "https://thorwhalen.github.io/raglab"
31
+
32
+ [project.optional-dependencies]
33
+ llm = [
34
+ "oa",
35
+ ]
36
+ dev = [
37
+ "pytest>=7.0",
38
+ "pytest-cov>=4.0",
39
+ "ruff>=0.1.0",
40
+ ]
41
+ docs = [
42
+ "sphinx>=6.0",
43
+ "sphinx-rtd-theme>=1.0",
44
+ ]
45
+
46
+ [tool.ruff]
47
+ line-length = 88
48
+ target-version = "py310"
49
+ exclude = [
50
+ "**/*.ipynb",
51
+ ".git",
52
+ ".venv",
53
+ "build",
54
+ "dist",
55
+ "tests",
56
+ "examples",
57
+ "scrap",
58
+ ]
59
+
60
+ [tool.ruff.lint]
61
+ # Real lint rules from the start (pyflakes / pycodestyle / bugbear), not just
62
+ # docstring presence. E501 (line length) stays off — long descriptive docstrings.
63
+ select = [
64
+ "D100",
65
+ "F",
66
+ "E",
67
+ "W",
68
+ "B",
69
+ ]
70
+ ignore = [
71
+ "D203",
72
+ "E501",
73
+ ]
74
+
75
+ [tool.ruff.lint.pydocstyle]
76
+ convention = "google"
77
+
78
+ [tool.ruff.lint.per-file-ignores]
79
+ "**/tests/*" = [
80
+ "D",
81
+ ]
82
+ "**/examples/*" = [
83
+ "D",
84
+ ]
85
+ "**/scrap/*" = [
86
+ "D",
87
+ ]
88
+
89
+ [tool.pytest.ini_options]
90
+ minversion = "6.0"
91
+ testpaths = [
92
+ "tests",
93
+ ]
94
+ doctest_optionflags = [
95
+ "NORMALIZE_WHITESPACE",
96
+ "ELLIPSIS",
97
+ ]
98
+
99
+ [tool.wads.ci]
100
+ project_name = ""
101
+
102
+ [tool.wads.ci.commands]
103
+ pre_test = []
104
+ test = []
105
+ post_test = []
106
+ lint = []
107
+ format = []
108
+
109
+ [tool.wads.ci.env]
110
+ required_envvars = []
111
+ test_envvars = []
112
+ extra_envvars = []
113
+
114
+ [tool.wads.ci.env.defaults]
115
+
116
+ [tool.wads.ci.quality.ruff]
117
+ enabled = true
118
+
119
+ [tool.wads.ci.quality.black]
120
+ enabled = false
121
+
122
+ [tool.wads.ci.quality.mypy]
123
+ enabled = false
124
+
125
+ [tool.wads.ci.testing]
126
+ enabled = true
127
+ python_versions = [
128
+ "3.10",
129
+ "3.12",
130
+ ]
131
+ pytest_args = [
132
+ "-v",
133
+ "--tb=short",
134
+ ]
135
+ coverage_enabled = true
136
+ coverage_threshold = 0
137
+ coverage_report_format = [
138
+ "term",
139
+ "xml",
140
+ ]
141
+ exclude_paths = [
142
+ "examples",
143
+ "scrap",
144
+ ]
145
+ test_on_windows = true
146
+
147
+ [tool.wads.ci.metrics]
148
+ enabled = true
149
+ config_path = ".github/umpyre-config.yml"
150
+ storage_branch = "code-metrics"
151
+ python_version = "3.10"
152
+ force_run = false
153
+
154
+ [tool.wads.ci.build]
155
+ sdist = true
156
+ wheel = true
157
+
158
+ [tool.wads.ci.publish]
159
+ enabled = true
160
+ skip_ci_marker = "[skip ci]"
161
+ publish_marker = "[publish]"
162
+
163
+ [tool.wads.ci.docs]
164
+ enabled = true
165
+ builder = "epythet"
166
+ ignore_paths = [
167
+ "tests/",
168
+ "scrap/",
169
+ "examples/",
170
+ ]
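The `llm` extra above keeps `oa` out of the base install, so any LLM-backed strategy has to import it at call time rather than at module import. One way to do that is sketched below. The helper name `require_optional` and its error message are assumptions for illustration, not part of raglab or `oa`.

```python
import importlib
from types import ModuleType


def require_optional(
    module_name: str, *, extra: str, package: str = "raglab"
) -> ModuleType:
    """Import `module_name` on first use, or say which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is needed for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from err


# Usage inside a (hypothetical) LLM strategy factory; nothing runs at import time:
#     oa = require_optional("oa", extra="llm")
```

Because the import happens inside the function, `import raglab` never touches `oa` and stays usable offline. The cost of a missing extra surfaces only when an LLM role is actually requested.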
@@ -0,0 +1,72 @@
1
+ """``raglab`` — the agentic-search / RAG orchestration layer on top of ``ir``.
2
+
3
+ `raglab` turns a retrieval substrate into a *Composable Search Agent* (the ir_09
4
+ architecture): a small set of injected **roles** — Planner, Formulator,
5
+ Retriever, Evaluator, Reranker, Citer — wired by a control loop whose back-edge
6
+ (evaluator → reformulate) is what makes it an agent rather than a DAG. Concrete
7
+ tools live at the leaves: an ``ir`` corpus becomes one ``Retriever`` via
8
+ ``ir.as_retriever``.
9
+
10
+ Quick start (the no-LLM thin slice — runs offline)::
11
+
12
+     import ir
13
+     import raglab
14
+
15
+     # register ir corpora as the agent's sources, then search across them:
16
+     sources = raglab.ir_sources("skills", "reports", mode="hybrid")
17
+     agent = raglab.make_search_agent(sources)
18
+     results = agent("how do I deploy the app")  # ranked ir.SearchHits
19
+
20
+ Inject an LLM ``formulator`` (query rewrite/HyDE) and ``evaluator`` (sufficiency
21
+ + refinement) to turn on query understanding and the back-edge. Dependency
22
+ direction is one-way: ``raglab`` imports ``ir``; ``ir`` never imports ``raglab``.
23
+
24
+ > Fresh start (v0.2.0+). This repo took over the ``raglab`` PyPI name; the older
25
+ > backend now lives at ``raglab_bak``. Development is just beginning.
26
+ """
27
+
28
+ from .agent import (
29
+ Budget,
30
+ Citer,
31
+ Evaluator,
32
+ Formulator,
33
+ Judgement,
34
+ LowLevelQuery,
35
+ Planner,
36
+ Query,
37
+ Reranker,
38
+ Result,
39
+ Retriever,
40
+ SingleContextAgent,
41
+ SubTask,
42
+ identity_citer,
43
+ identity_formulator,
44
+ ir_sources,
45
+ make_search_agent,
46
+ passthrough_evaluator,
47
+ score_reranker,
48
+ single_subtask_planner,
49
+ )
50
+
51
+ __all__ = [
52
+ "Query",
53
+ "SubTask",
54
+ "LowLevelQuery",
55
+ "Judgement",
56
+ "Result",
57
+ "Retriever",
58
+ "Planner",
59
+ "Formulator",
60
+ "Evaluator",
61
+ "Reranker",
62
+ "Citer",
63
+ "Budget",
64
+ "SingleContextAgent",
65
+ "make_search_agent",
66
+ "ir_sources",
67
+ "single_subtask_planner",
68
+ "identity_formulator",
69
+ "passthrough_evaluator",
70
+ "score_reranker",
71
+ "identity_citer",
72
+ ]
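The exported names suggest a source registry feeding a score-based reranker. The sketch below shows the general shape of that fan-in step: query every registered source, merge the hits, and rank them globally. It is an illustration under assumptions, not the package's implementation: `Hit`, `search_all` and the lambda sources are hypothetical, and the signatures of `ir_sources`, `make_search_agent` and `score_reranker` are not shown in this diff.

```python
from collections.abc import Callable, Mapping, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for ir's SearchHit."""
    source: str
    doc_id: str
    score: float


Retriever = Callable[[str], Sequence[Hit]]


def search_all(
    query: str, sources: Mapping[str, Retriever], *, top_k: int = 5
) -> list[Hit]:
    """Fan out to every source, keep the best hit per document, rank globally."""
    best: dict[tuple[str, str], Hit] = {}
    for name, retrieve in sources.items():
        for hit in retrieve(query):
            key = (name, hit.doc_id)
            if key not in best or hit.score > best[key].score:
                best[key] = hit
    # Assumes scores are comparable across sources; heterogeneous backends
    # would need normalisation or a learned reranker at this point.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:top_k]


sources = {
    "skills": lambda q: [Hit("skills", "deploy.md", 0.9), Hit("skills", "lint.md", 0.2)],
    "reports": lambda q: [Hit("reports", "q3.pdf", 0.5)],
}
ranked = search_all("how do I deploy the app", sources, top_k=2)
```

Since the registry is just a `Mapping`, adding a web or SQL backend amounts to adding another key whose value has the same call signature.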