raglab 0.2.1__tar.gz → 0.2.2__tar.gz
- raglab-0.2.2/.claude/CLAUDE.md +78 -0
- {raglab-0.2.1 → raglab-0.2.2}/PKG-INFO +4 -1
- {raglab-0.2.1 → raglab-0.2.2}/pyproject.toml +16 -3
- raglab-0.2.2/raglab/__init__.py +72 -0
- raglab-0.2.2/raglab/agent.py +293 -0
- raglab-0.2.2/tests/test_agent.py +158 -0
- raglab-0.2.1/raglab/__init__.py +0 -7
- {raglab-0.2.1 → raglab-0.2.2}/.gitattributes +0 -0
- {raglab-0.2.1 → raglab-0.2.2}/.github/workflows/ci.yml +0 -0
- {raglab-0.2.1 → raglab-0.2.2}/.gitignore +0 -0
- {raglab-0.2.1 → raglab-0.2.2}/LICENSE +0 -0
- {raglab-0.2.1 → raglab-0.2.2}/README.md +0 -0
@@ -0,0 +1,78 @@
+# raglab — agent instructions
+
+`raglab` is the **agentic-search / RAG orchestration layer**: it turns a
+retrieval substrate into a *Search Agent* (plan → formulate → retrieve →
+evaluate → re-query → rerank → cite) and, later, RAG pipelines on top.
+
+> Fresh start (v0.2.0+). This repo took over the `raglab` PyPI name; the old
+> backend lives at `raglab_bak`. Development is just beginning — greenfield.
+
+## The architecture we're building toward
+
+The reference design is **`ir_09 — A Composable Search Agent`**, in the **ir
+repo** at `$PP/i/ir/misc/docs/ir_09 -- A Composable Search Agent ...md`, and the
+cross-repo plan is **i2mint/ir epic #38**. Read both before non-trivial work.
+
+raglab is the **orchestration layer on top of `ir`** (the retrieval substrate).
+"Structure over concretion": a small set of **roles** (Protocols) wired by a
+control loop, with concrete tools injected at the leaves.
+
+### raglab owns (the *agent*)
+
+- **Value types** (frozen, plain data, ir_09 §3): `Query`, `SubTask(goal, sources)`,
+  `LowLevelQuery(source, spec)`, `Judgement(relevant, sufficient, refinement)`.
+  (`Result` = ir's `SearchHit`/`Disclosure`, reused as-is — do not redefine it.)
+- **Role Protocols** (open-closed strategy seams, injected callables):
+  `Planner`, `Formulator`, `Retriever`, `Evaluator`, `Reranker`, `Citer`.
+- **The control loop with the back-edge** (evaluator → reformulate). v1 = an
+  imperative `while` loop over a mutable `AgentState` (ir_09 §9), not a
+  cyclic-graph runner. This back-edge is what makes it an agent vs a DAG.
+- **Budget governor** (`max_rounds` / `max_sources_per_task` / `max_results_per_task`);
+  termination as a **separately injected policy** (ir_09 §9), not folded into the
+  evaluator.
+- **Source registry** = a live `Mapping[name, Retriever]` across *heterogeneous*
+  backends (ir corpora + web/SQL/graph); cross-source merge + global rerank at
+  the fan-in point.
+- **Two orchestrators behind one interface** (ir_09 §7): `SingleContextAgent`
+  (default, cheap, one ReAct loop) and `MultiAgentAgent` (one subagent per
+  sub-task/source, ~15× cost, breadth-first only). Promotion swaps only the
+  orchestrator, keeping role contracts identical.
+
+### What raglab CONSUMES from ir (do not reimplement these)
+
+- `ir.as_retriever(corpus)` → register an ir corpus as one `Retriever` key (#33).
+- `ir.registry.retrievers()` → the ir-corpus slice of the source registry (#34).
+- `ir.make_llm_formulator` / the `formulate=` seam → the Formulator role (#32).
+- `ir.Selection` + its derived `sufficient` signal → Evaluator input (#35).
+- `ir.disclose(..., store=...)` + `SearchHit.to_dict()` → pointer-passing / lazy
+  deref across the subagent boundary (#36).
+
+### What belongs ELSEWHERE
+
+- **Generation / answer synthesis and the Citer/Verifier** (which needs a
+  generated claim) sit with the RAG/generation layer (`srag`), not the search
+  agent. The agent's deliverable is **pointers + extractions**, not an essay.
+
+## Dependency direction (load-bearing)
+
+**`raglab` imports `ir` (and `oa` for LLM strategies); `ir` NEVER imports
+`raglab`.** Keep LLM ops (`oa`) lazy/opt-in so `import raglab` stays offline.
+
+## Build order (ir_09 §8 / epic #38)
+
+1. `Retriever` Protocol + source registry with 2–3 real backends (ir corpora via
+   `ir.as_retriever`; one web/SQL).
+2. `SingleContextAgent` with a trivial planner + pass-through evaluator wrapping
+   ir's search/select/disclose — the thin slice, **no loop yet**.
+3. LLM `Formulator` + `Evaluator` and **turn on the back-edge**.
+4. Reranker at fan-in; Citer (in `srag`).
+5. Budget governor + run-log / observability.
+6. `MultiAgentAgent` — only if breadth justifies the cost.
+
+## House style (i2mint ecosystem)
+
+Functional > OOP; SOLID when OOP; facades, SSOT, dependency injection;
+progressive disclosure; keyword-only beyond the 3rd positional; `collections.abc`
++ frozen `dataclass`es; `Protocol`s for the role seams; every module has a
+top-level docstring. Never `pip install` local ecosystem packages (`ir`, `ef`,
+`vd`, `dol`, `oa`, …) — they're local via `.pth`. wads CI auto-publishes on merge.
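The "Dependency direction" note in the instructions file above asks that LLM operations stay lazy so that importing the package works offline. One common way to get that behaviour is to defer the import to call time. The following is a minimal sketch under stated assumptions, not code from the package: `lazy_import` and the wording of its error message are hypothetical names chosen for illustration; only the `llm` extra and the `oa` module name come from the diff.

```python
import importlib


def lazy_import(module_name: str, *, extra: str):
    """Import ``module_name`` only when called (hypothetical helper).

    If the module is missing, re-raise with a hint naming the optional extra
    that provides it, so the base package stays importable without it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"enable the optional {extra!r} extra to use it."
        ) from err


def make_llm_backend_sketch():
    """Hypothetical factory: the LLM dependency is resolved here, not at import time."""
    return lazy_import("oa", extra="llm")
```

Because the import happens inside the factory, nothing in this sketch touches the optional dependency until a caller explicitly asks for an LLM-backed role.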
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: raglab
-Version: 0.2.
+Version: 0.2.2
 Summary: A medley of tools to make RAG-based applications.
 Project-URL: Homepage, https://github.com/thorwhalen/raglab
 Project-URL: Repository, https://github.com/thorwhalen/raglab
@@ -9,6 +9,7 @@ Author: thorwhalen
 License: mit
 License-File: LICENSE
 Requires-Python: >=3.10
+Requires-Dist: ir>=0.1.12
 Provides-Extra: dev
 Requires-Dist: pytest-cov>=4.0; extra == 'dev'
 Requires-Dist: pytest>=7.0; extra == 'dev'
@@ -16,6 +17,8 @@ Requires-Dist: ruff>=0.1.0; extra == 'dev'
 Provides-Extra: docs
 Requires-Dist: sphinx-rtd-theme>=1.0; extra == 'docs'
 Requires-Dist: sphinx>=6.0; extra == 'docs'
+Provides-Extra: llm
+Requires-Dist: oa; extra == 'llm'
 Description-Content-Type: text/markdown
 
 # raglab
@@ -6,7 +6,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "raglab"
-version = "0.2.
+version = "0.2.2"
 description = "A medley of tools to make RAG-based applications."
 readme = "README.md"
 requires-python = ">=3.10"
@@ -14,7 +14,12 @@ keywords = []
 authors = [
     { name = "thorwhalen" },
 ]
-
+# raglab orchestrates retrieval via `ir` (the retrieval substrate). LLM
+# strategies (Planner/Formulator/Evaluator) use `oa` lazily — the `llm` extra
+# below — so `import raglab` stays offline by default.
+dependencies = [
+    "ir>=0.1.12",
+]
 
 [project.license]
 text = "mit"
@@ -25,6 +30,9 @@ Repository = "https://github.com/thorwhalen/raglab"
 Documentation = "https://thorwhalen.github.io/raglab"
 
 [project.optional-dependencies]
+llm = [
+    "oa",
+]
 dev = [
     "pytest>=7.0",
     "pytest-cov>=4.0",
@@ -50,13 +58,18 @@ exclude = [
 ]
 
 [tool.ruff.lint]
+# Real lint rules from the start (pyflakes / pycodestyle / bugbear), not just
+# docstring presence. E501 (line length) stays off — long descriptive docstrings.
 select = [
     "D100",
+    "F",
+    "E",
+    "W",
+    "B",
 ]
 ignore = [
     "D203",
     "E501",
-    "B905",
 ]
 
 [tool.ruff.lint.pydocstyle]
@@ -0,0 +1,72 @@
+"""``raglab`` — the agentic-search / RAG orchestration layer on top of ``ir``.
+
+`raglab` turns a retrieval substrate into a *Composable Search Agent* (the ir_09
+architecture): a small set of injected **roles** — Planner, Formulator,
+Retriever, Evaluator, Reranker, Citer — wired by a control loop whose back-edge
+(evaluator → reformulate) is what makes it an agent rather than a DAG. Concrete
+tools live at the leaves: an ``ir`` corpus becomes one ``Retriever`` via
+``ir.as_retriever``.
+
+Quick start (the no-LLM thin slice — runs offline)::
+
+    import ir
+    import raglab
+
+    # register ir corpora as the agent's sources, then search across them:
+    sources = raglab.ir_sources("skills", "reports", mode="hybrid")
+    agent = raglab.make_search_agent(sources)
+    results = agent("how do I deploy the app")  # ranked ir.SearchHits
+
+Inject an LLM ``formulator`` (query rewrite/HyDE) and ``evaluator`` (sufficiency
++ refinement) to turn on query understanding and the back-edge. Dependency
+direction is one-way: ``raglab`` imports ``ir``; ``ir`` never imports ``raglab``.
+
+> Fresh start (v0.2.0+). This repo took over the ``raglab`` PyPI name; the older
+> backend now lives at ``raglab_bak``. Development is just beginning.
+"""
+
+from .agent import (
+    Budget,
+    Citer,
+    Evaluator,
+    Formulator,
+    Judgement,
+    LowLevelQuery,
+    Planner,
+    Query,
+    Reranker,
+    Result,
+    Retriever,
+    SingleContextAgent,
+    SubTask,
+    identity_citer,
+    identity_formulator,
+    ir_sources,
+    make_search_agent,
+    passthrough_evaluator,
+    score_reranker,
+    single_subtask_planner,
+)
+
+__all__ = [
+    "Query",
+    "SubTask",
+    "LowLevelQuery",
+    "Judgement",
+    "Result",
+    "Retriever",
+    "Planner",
+    "Formulator",
+    "Evaluator",
+    "Reranker",
+    "Citer",
+    "Budget",
+    "SingleContextAgent",
+    "make_search_agent",
+    "ir_sources",
+    "single_subtask_planner",
+    "identity_formulator",
+    "passthrough_evaluator",
+    "score_reranker",
+    "identity_citer",
+]
@@ -0,0 +1,293 @@
+"""The Composable Search Agent — roles wired by a control loop (ir_09).
+
+`raglab` is the **orchestration layer on top of `ir`** (the retrieval substrate).
+This module is the v1 foundation: the immutable value types, the role *Protocols*
+(open-closed strategy seams), and a `SingleContextAgent` whose fixed control loop
+is fully parametrized by injected roles. Concrete tools live at the leaves — an
+`ir` corpus becomes one `Retriever` via :func:`ir.as_retriever`.
+
+The shape follows ir_09 §3/§6: a small set of named roles —
+``Planner / Formulator / Retriever / Evaluator / Reranker / Citer`` — and a loop
+whose defining feature is the **back-edge** (evaluator → reformulate) that makes
+it an *agent* rather than a DAG. v1 ships the loop with smart defaults (a trivial
+planner + a pass-through evaluator), so the thin slice runs end-to-end with no
+LLM; turning on the back-edge is just injecting an LLM ``Evaluator`` that returns
+a ``refinement`` (ir_09 §8 step 3).
+
+Progressive disclosure: :func:`make_search_agent` gives sensible defaults for
+every role, so the simple path is ``make_search_agent(sources)("query")``.
+
+Dependency direction is one-way: `raglab` imports `ir`; `ir` never imports
+`raglab`. The ``Result`` type and the ``Retriever`` contract are ir's (SSOT).
+"""
+
+from __future__ import annotations
+
+from collections.abc import Mapping, Sequence
+from dataclasses import dataclass, field
+from typing import Any, Protocol, runtime_checkable
+
+# ir owns the retrieval substrate: the Result type and the Retriever leaf
+# contract live there (one-way dependency, ir is the SSOT).
+from ir import Retriever, SearchHit
+
+#: A retrieved item — ir's :class:`~ir.base.SearchHit` (ir_09's ``Result``):
+#: a *pointer + snippet* (``text``) with a ``score`` and ``metadata``.
+Result = SearchHit
+
+__all__ = [
+    "Query",
+    "SubTask",
+    "LowLevelQuery",
+    "Judgement",
+    "Result",
+    "Retriever",
+    "Planner",
+    "Formulator",
+    "Evaluator",
+    "Reranker",
+    "Citer",
+    "Budget",
+    "SingleContextAgent",
+    "make_search_agent",
+    "ir_sources",
+    "single_subtask_planner",
+    "identity_formulator",
+    "passthrough_evaluator",
+    "score_reranker",
+    "identity_citer",
+]
+
+
+# --------------------------------------------------------------------------- #
+# Value types (immutable, plain data) — ir_09 §3
+# --------------------------------------------------------------------------- #
+
+
+@dataclass(frozen=True)
+class Query:
+    """A user intent: free text plus optional structured constraints."""
+
+    text: str
+    constraints: Mapping[str, Any] = field(default_factory=dict)
+
+
+@dataclass(frozen=True)
+class SubTask:
+    """A planner's unit of work: a sub-goal bound to a set of registered sources."""
+
+    goal: str
+    sources: tuple[str, ...]
+
+
+@dataclass(frozen=True)
+class LowLevelQuery:
+    """One concrete query against one source.
+
+    ``query`` is the text handed to the source's :data:`Retriever`; ``params``
+    are per-call retriever overrides (e.g. ``mode`` / ``filter`` / ``k`` for an
+    ir corpus). A formulator turns a :class:`SubTask` into these.
+    """
+
+    source: str
+    query: str
+    params: Mapping[str, Any] = field(default_factory=dict)
+
+
+@dataclass(frozen=True)
+class Judgement:
+    """An evaluator's verdict over a round's results.
+
+    ``relevant`` is the kept subset; ``sufficient`` says whether to stop; a
+    non-``None`` ``refinement`` is the **back-edge** — the next sub-task to
+    re-query with. The pass-through default returns ``sufficient=True`` and no
+    refinement (no loop).
+    """
+
+    relevant: Sequence[Result]
+    sufficient: bool
+    refinement: SubTask | None = None
+
+
+# --------------------------------------------------------------------------- #
+# Role Protocols (the open-closed strategy seams) — ir_09 §3
+# --------------------------------------------------------------------------- #
+
+
+@runtime_checkable
+class Planner(Protocol):
+    """Decompose a query into sub-tasks and select sources for each."""
+
+    def __call__(
+        self, query: Query, sources: Mapping[str, Retriever]
+    ) -> list[SubTask]: ...
+
+
+@runtime_checkable
+class Formulator(Protocol):
+    """Turn a sub-task + one source into concrete low-level queries."""
+
+    def __call__(self, task: SubTask, source: str) -> list[LowLevelQuery]: ...
+
+
+@runtime_checkable
+class Evaluator(Protocol):
+    """Judge relevance + sufficiency; optionally emit a refinement (back-edge)."""
+
+    def __call__(self, task: SubTask, results: Sequence[Result]) -> Judgement: ...
+
+
+@runtime_checkable
+class Reranker(Protocol):
+    """Produce the final ordering over the (cross-source) merged results."""
+
+    def __call__(self, results: Sequence[Result]) -> Sequence[Result]: ...
+
+
+@runtime_checkable
+class Citer(Protocol):
+    """Confirm/annotate that each result supports its use (identity by default)."""
+
+    def __call__(self, results: Sequence[Result]) -> Sequence[Result]: ...
+
+
+# --------------------------------------------------------------------------- #
+# Budget governor — ir_09 §4
+# --------------------------------------------------------------------------- #
+
+
+@dataclass(frozen=True)
+class Budget:
+    """Loop bounds: the safety net under the (harder) sufficiency decision."""
+
+    max_rounds: int = 3
+    max_sources_per_task: int = 4
+    max_results_per_task: int = 50
+
+
+# --------------------------------------------------------------------------- #
+# Default role implementations (the no-LLM thin slice) — ir_09 §8 step 2
+# --------------------------------------------------------------------------- #
+
+
+def single_subtask_planner(
+    query: Query, sources: Mapping[str, Retriever]
+) -> list[SubTask]:
+    """Trivial planner: one sub-task over *all* registered sources, no decomposition."""
+    return [SubTask(goal=query.text, sources=tuple(sources))]
+
+
+def identity_formulator(task: SubTask, source: str) -> list[LowLevelQuery]:
+    """Identity formulator: the sub-goal verbatim as one query (no rewrite/HyDE)."""
+    return [LowLevelQuery(source=source, query=task.goal)]
+
+
+def passthrough_evaluator(task: SubTask, results: Sequence[Result]) -> Judgement:
+    """Pass-through critic: keep everything, declare sufficient, never re-query."""
+    return Judgement(relevant=list(results), sufficient=True, refinement=None)
+
+
+def score_reranker(results: Sequence[Result]) -> Sequence[Result]:
+    """Final ordering by descending ``score`` (the cross-source merge, v1).
+
+    Note: a plain score sort assumes comparable score scales across sources
+    (true when they share an embedder + mode). A rank-based (RRF) cross-source
+    merge for heterogeneous backends is a documented follow-up.
+    """
+    return sorted(results, key=lambda r: float(getattr(r, "score", 0.0)), reverse=True)
+
+
+def identity_citer(results: Sequence[Result]) -> Sequence[Result]:
+    """No-op citer (verification needs a generated claim — that lives in srag)."""
+    return results
+
+
+# --------------------------------------------------------------------------- #
+# The orchestrator — a fixed control loop, fully parametrized by roles
+# --------------------------------------------------------------------------- #
+
+
+@dataclass
+class SingleContextAgent:
+    """One ReAct-style loop, sequential sub-tasks (ir_09 §7 — the cheap default).
+
+    The loop is fixed; every *decision* is an injected role. Promotion to a
+    multi-agent orchestrator (ir_09 §7) swaps this class while keeping the same
+    role contracts. The **back-edge** is the single line ``current =
+    judged.refinement`` in :meth:`_run_task` — that is what makes this an agent.
+    """
+
+    sources: Mapping[str, Retriever]
+    planner: Planner = single_subtask_planner
+    formulator: Formulator = identity_formulator
+    evaluator: Evaluator = passthrough_evaluator
+    reranker: Reranker = score_reranker
+    citer: Citer = identity_citer
+    budget: Budget = field(default_factory=Budget)
+
+    def __call__(self, query: str | Query) -> list[Result]:
+        """Run the agent for *query*; returns the final ranked, cited results."""
+        q = query if isinstance(query, Query) else Query(text=query)
+        accumulated: list[Result] = []
+        for task in self.planner(q, self.sources):
+            accumulated.extend(self._run_task(task))
+        ranked = self.reranker(accumulated)
+        return list(self.citer(ranked))
+
+    def _run_task(self, task: SubTask) -> list[Result]:
+        found: list[Result] = []
+        current = task
+        for _ in range(max(1, self.budget.max_rounds)):
+            for source in current.sources[: self.budget.max_sources_per_task]:
+                retriever = self.sources.get(source)
+                if retriever is None:
+                    continue
+                for llq in self.formulator(current, source):
+                    found.extend(retriever(llq.query, **dict(llq.params)))
+            judged = self.evaluator(current, found[: self.budget.max_results_per_task])
+            found = list(judged.relevant)
+            if judged.sufficient or judged.refinement is None:
+                break
+            current = judged.refinement  # the back-edge: re-query
+        return found
+
+
+def make_search_agent(
+    sources: Mapping[str, Retriever],
+    *,
+    planner: Planner | None = None,
+    formulator: Formulator | None = None,
+    evaluator: Evaluator | None = None,
+    reranker: Reranker | None = None,
+    citer: Citer | None = None,
+    budget: Budget | None = None,
+) -> SingleContextAgent:
+    """Build a :class:`SingleContextAgent` over *sources* with smart defaults.
+
+    ``sources`` is a ``Mapping[name, Retriever]`` — e.g.
+    ``{"skills": ir.as_retriever("skills")}``. Every role defaults to its no-LLM
+    thin-slice implementation, so ``make_search_agent(sources)("query")`` just
+    works; inject an LLM ``formulator`` / ``evaluator`` to turn on rewriting and
+    the back-edge.
+    """
+    return SingleContextAgent(
+        sources=sources,
+        planner=planner or single_subtask_planner,
+        formulator=formulator or identity_formulator,
+        evaluator=evaluator or passthrough_evaluator,
+        reranker=reranker or score_reranker,
+        citer=citer or identity_citer,
+        budget=budget or Budget(),
+    )
+
+
+def ir_sources(*names: str, **search_defaults: Any) -> dict[str, Retriever]:
+    """A source registry ``{name: Retriever}`` backed by named ``ir`` corpora.
+
+    Each name is bound to ``ir.as_retriever(name, **search_defaults)``. A thin
+    convenience; once ir ships ``registry.retrievers()`` (a lazy view), prefer
+    that. ``search_defaults`` (e.g. ``mode="hybrid"``) apply to every source.
+    """
+    import ir
+
+    return {name: ir.as_retriever(name, **search_defaults) for name in names}
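The `score_reranker` docstring in the module above names a rank-based (RRF) cross-source merge as a follow-up for backends whose scores are not comparable. The sketch below shows what reciprocal rank fusion looks like in isolation. It is an illustration under assumptions, not the package's implementation: `rrf_merge` is a hypothetical name, it works on plain item ids rather than `SearchHit` objects, and `k=60` is only a commonly used default for the smoothing constant.

```python
from collections import defaultdict
from collections.abc import Mapping, Sequence


def rrf_merge(
    ranked_by_source: Mapping[str, Sequence[str]], *, k: int = 60
) -> list[str]:
    """Fuse per-source rankings with reciprocal rank fusion (hypothetical helper).

    Each item scores sum(1 / (k + rank)) over the sources that returned it,
    with rank starting at 1. Raw retriever scores are never compared.
    """
    fused: dict[str, float] = defaultdict(float)
    for ranking in ranked_by_source.values():
        for rank, item_id in enumerate(ranking, start=1):
            fused[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(fused, key=lambda item_id: (-fused[item_id], item_id))


merged = rrf_merge({"s1": ["a", "b"], "s2": ["b", "c"]})
```

Here `"b"` appears in both rankings and so accumulates two contributions, which places it ahead of `"a"` and `"c"` regardless of how each source scales its scores.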
@@ -0,0 +1,158 @@
+"""Tests for the raglab Composable Search Agent foundation (ir_09).
+
+Hermetic: the control loop is exercised with a fake in-memory retriever (no
+model, no network); one end-to-end test wires a real ``ir`` corpus via
+``ir.as_retriever`` with the light (numpy-only) embedder and an in-memory store.
+"""
+
+import raglab
+from ir import SearchHit
+from raglab import Budget, Judgement, LowLevelQuery, Query, SubTask, make_search_agent
+
+
+def _hits(*specs):
+    """``(artifact_id, score)`` pairs -> ir.SearchHits (no corpus needed)."""
+    return [SearchHit(aid, "k", score, f"text {aid}", {}) for aid, score in specs]
+
+
+def _fake_retriever(hits):
+    """A Retriever that records its calls and returns canned hits."""
+    calls = []
+
+    def retrieve(query, **kw):
+        calls.append((query, kw))
+        return list(hits)
+
+    retrieve.calls = calls
+    return retrieve
+
+
+# ----- value types & protocols --------------------------------------------- #
+
+
+def test_result_is_ir_searchhit():
+    assert raglab.Result is SearchHit
+
+
+def test_defaults_satisfy_role_protocols():
+    assert isinstance(raglab.single_subtask_planner, raglab.Planner)
+    assert isinstance(raglab.identity_formulator, raglab.Formulator)
+    assert isinstance(raglab.passthrough_evaluator, raglab.Evaluator)
+    assert isinstance(raglab.score_reranker, raglab.Reranker)
+    assert isinstance(raglab.identity_citer, raglab.Citer)
+
+
+# ----- the thin-slice loop (no LLM) ----------------------------------------- #
+
+
+def test_make_search_agent_thin_slice_runs():
+    sources = {"s": _fake_retriever(_hits(("a", 0.9), ("b", 0.5)))}
+    results = make_search_agent(sources)("anything")
+    assert [r.artifact_id for r in results] == ["a", "b"]
+
+
+def test_query_string_or_object_equivalent():
+    sources = {"s": _fake_retriever(_hits(("a", 0.9)))}
+    agent = make_search_agent(sources)
+    assert agent("q") == agent(Query(text="q"))
+
+
+def test_cross_source_merge_reranks_by_score():
+    sources = {
+        "s1": _fake_retriever(_hits(("a", 0.3))),
+        "s2": _fake_retriever(_hits(("b", 0.9))),
+    }
+    results = make_search_agent(sources)("q")
+    assert [r.artifact_id for r in results] == ["b", "a"]  # by score desc
+
+
+def test_passthrough_evaluator_does_not_loop():
+    retr = _fake_retriever(_hits(("a", 0.9)))
+    make_search_agent({"s": retr})("q")
+    assert len(retr.calls) == 1  # one round only
+
+
+def test_formulator_fan_out_issues_each_query():
+    retr = _fake_retriever(_hits(("a", 0.9)))
+
+    def multi(task, source):
+        return [
+            LowLevelQuery(source, task.goal),
+            LowLevelQuery(source, task.goal + " alt"),
+        ]
+
+    make_search_agent({"s": retr}, formulator=multi)("q")
+    assert len(retr.calls) == 2
+
+
+def test_unknown_source_is_skipped_not_an_error():
+    retr = _fake_retriever(_hits(("a", 0.9)))
+
+    def planner(query, sources):
+        return [SubTask(query.text, ("s", "missing"))]
+
+    results = make_search_agent({"s": retr}, planner=planner)("q")
+    assert [r.artifact_id for r in results] == ["a"]
+
+
+# ----- the back-edge (the property that makes it an agent) ------------------ #
+
+
+def test_back_edge_reformulates_until_sufficient():
+    retr = _fake_retriever(_hits(("a", 0.9)))
+    rounds = {"n": 0}
+
+    def refining_evaluator(task, results):
+        rounds["n"] += 1
+        if rounds["n"] < 2:  # first round: not enough, re-query (back-edge)
+            return Judgement(
+                relevant=list(results),
+                sufficient=False,
+                refinement=SubTask(goal=task.goal + " more", sources=task.sources),
+            )
+        return Judgement(relevant=list(results), sufficient=True)
+
+    make_search_agent({"s": retr}, evaluator=refining_evaluator)("q")
+    assert rounds["n"] == 2  # looped once via the back-edge
+    assert len(retr.calls) == 2
+
+
+def test_budget_bounds_a_never_sufficient_loop():
+    retr = _fake_retriever(_hits(("a", 0.9)))
+
+    def never_sufficient(task, results):
+        return Judgement(
+            relevant=list(results),
+            sufficient=False,
+            refinement=SubTask(task.goal, task.sources),
+        )
+
+    make_search_agent(
+        {"s": retr}, evaluator=never_sufficient, budget=Budget(max_rounds=3)
+    )("q")
+    assert len(retr.calls) == 3  # exactly max_rounds — the safety net holds
+
+
+# ----- end-to-end over a REAL ir corpus (hermetic: light embedder) ---------- #
+
+
+def test_agent_over_real_ir_corpus():
+    import ir
+    from ir.store import CorpusStore
+
+    docs = {
+        "deploy": "deploy the app to the server with systemd units",
+        "embed": "embed text with a model and cache the vectors",
+        "search": "vector similarity search with metadata filters",
+    }
+    corpus = ir.build(
+        ir.CorpusSource.from_mapping(docs, name="t", strategy=ir.WholeText()),
+        store=CorpusStore.memory(),
+        embedder="light",
+    )
+    agent = make_search_agent({"t": ir.as_retriever(corpus, k=3)})
+    results = agent("deploy the app to the server")
+    assert results
+    assert results[0].artifact_id == "deploy"
+    assert isinstance(results[0], SearchHit)
+    results[0].to_dict()  # the substrate edge is serialization-clean
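The test `test_defaults_satisfy_role_protocols` above checks plain functions against `runtime_checkable` Protocols with `isinstance`. It may help to see how much that check actually guarantees: for a Protocol whose only member is `__call__`, `isinstance` confirms that the object is callable and does not inspect its signature. The sketch below uses a local stand-in class named `Reranker` and a hypothetical `wrong_arity` function; neither is imported from the package.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Reranker(Protocol):
    """Local stand-in for a role Protocol whose only member is ``__call__``."""

    def __call__(self, results: list) -> list: ...


def wrong_arity():
    """Callable, but takes no arguments, so it could not serve as a reranker."""
    return []


# The runtime check only looks for the presence of ``__call__``.
passes_builtin = isinstance(sorted, Reranker)
passes_wrong_signature = isinstance(wrong_arity, Reranker)
passes_non_callable = isinstance(42, Reranker)
```

Under this sketch the first two checks succeed and the third fails, so a test of this shape guards against passing a non-callable but leaves signature conformance to a static type checker.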
raglab-0.2.1/raglab/__init__.py
DELETED
@@ -1,7 +0,0 @@
-"""A medley of tools to make RAG-based applications.
-
-This is a fresh start for the ``raglab`` name (version 0.2.0+). The earlier
-``raglab`` backend (PyPI versions 0.0.x–0.1.x) now lives at
-`addaix/raglab_bak <https://github.com/addaix/raglab_bak>`_ and is published as
-``raglab_bak``. Development of this new ``raglab`` is just beginning.
-"""
File without changes