predict-rlm 0.2.0__tar.gz → 0.2.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (23)
  1. predict_rlm-0.2.2/PKG-INFO +132 -0
  2. predict_rlm-0.2.2/README.md +106 -0
  3. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/pyproject.toml +2 -1
  4. predict_rlm-0.2.0/PKG-INFO +0 -490
  5. predict_rlm-0.2.0/README.md +0 -465
  6. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/.gitignore +0 -0
  7. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/LICENSE +0 -0
  8. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/__init__.py +0 -0
  9. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/_shared.py +0 -0
  10. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/files.py +0 -0
  11. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/interpreter.py +0 -0
  12. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/predict_rlm.py +0 -0
  13. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/rlm_skills.py +0 -0
  14. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/sandbox/runner.js +0 -0
  15. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/__init__.py +0 -0
  16. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/docx/__init__.py +0 -0
  17. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/docx/modules/md2docx.py +0 -0
  18. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/docx/skill.py +0 -0
  19. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/pdf/__init__.py +0 -0
  20. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/pdf/skill.py +0 -0
  21. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/spreadsheet/__init__.py +0 -0
  22. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/spreadsheet/modules/formula_eval.py +0 -0
  23. {predict_rlm-0.2.0 → predict_rlm-0.2.2}/src/predict_rlm/skills/spreadsheet/skill.py +0 -0
@@ -0,0 +1,132 @@
1
+ Metadata-Version: 2.4
2
+ Name: predict-rlm
3
+ Version: 0.2.2
4
+ Summary: Production-grade RLMs (Recursive Language Models) with tool use, built on DSPy
5
+ Project-URL: Homepage, https://www.trampoline.ai/
6
+ Project-URL: Repository, https://github.com/Trampoline-AI/predict-rlm
7
+ Project-URL: Issues, https://github.com/Trampoline-AI/predict-rlm/issues
8
+ Author: Trampoline AI
9
+ License: MIT
10
+ License-File: LICENSE
11
+ Keywords: ai,dspy,language-models,reasoning,rlm,tool-use
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
18
+ Requires-Python: >=3.11
19
+ Requires-Dist: deno>=2
20
+ Requires-Dist: dspy>=3.1.2
21
+ Requires-Dist: nest-asyncio>=1.6.0
22
+ Requires-Dist: pydantic<3,>=2.8.2
23
+ Provides-Extra: examples
24
+ Requires-Dist: pymupdf>=1.24.0; extra == 'examples'
25
+ Description-Content-Type: text/markdown
26
+
27
+ # predict-rlm
28
+ Harness-less LM runtime built on top of [DSPy](https://dspy.ai). Define your inputs, outputs, and tools — the model handles its own control flow. Get fully interpretable trajectories and performance that scales directly with model improvements. Without context rot.
29
+
30
+ Based on the [Recursive Language Models](https://arxiv.org/abs/2512.24601v1) paper by [Alex L. Zhang](https://x.com/a1zhang), [Tim Kraska](https://x.com/tim_kraska), and [Omar Khattab](https://x.com/lateinteraction) from the Stanford NLP lab.<br/>
31
+
32
+ <br>
33
+ <p align="center">
34
+ <a href="https://github.com/Trampoline-AI/predict-rlm/actions/workflows/tests.yml"><img src="https://img.shields.io/github/actions/workflow/status/Trampoline-AI/predict-rlm/tests.yml?label=Tests" alt="Tests"></a>
35
+ <a href="https://codecov.io/gh/Trampoline-AI/predict-rlm"><img src="https://img.shields.io/codecov/c/github/Trampoline-AI/predict-rlm?token=NNS3R7OIT2&color=brightgreen&label=codecov" alt="codecov"></a>
36
+ <a href="https://pypi.org/project/predict-rlm/"><img src="https://img.shields.io/pypi/v/predict-rlm?color=blue" alt="PyPI"></a>
37
+ <a href="https://pypi.org/project/predict-rlm/"><img src="https://img.shields.io/pypi/pyversions/predict-rlm" alt="Python"></a>
38
+ <a href="https://discord.gg/BAkd288sGN"><img src="https://img.shields.io/badge/Discord-Join-5865F2?style=flat&logo=discord&logoColor=white" alt="Discord"></a>
39
+ <a href="https://github.com/Trampoline-AI/predict-rlm"><img src="https://img.shields.io/github/stars/trampoline-ai/predict-rlm?cacheSeconds=3600" alt="GitHub stars"></a>
40
+ <br/>
41
+ crafted with ♥ in MTL · NYC · FLP<br>by <a href="https://trampoline.ai">Trampoline AI</a>
42
+ </p>
43
+
44
+ ## Installation
45
+
46
+ ```bash
47
+ uv add predict-rlm
48
+ ```
49
+
50
+ ## Why RLMs?
51
+
52
+ <p align="center">
53
+ <img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/docs/bitter_lesson_spectrum.svg" alt="Bitter Lesson Spectrum — from hand-written prompts to RLMs" width="680"/>
54
+ </p>
55
+
56
+ - **Avoid context rot** — The root LM only interacts with its context programmatically through the REPL, staying well within its comfortable operating range — enabling complex, long-horizon tasks that would otherwise cause models to silently degrade.
57
+ - **Bitter lesson-proof: RLMs improve as LMs improve** — Unlike harnesses, which can cap or constrain the base model's capabilities, the performance, speed, and cost of RLM calls correlate directly with improvements to base model capabilities. [If the base model handles 10M tokens tomorrow, the RLM handles 100M.](https://alexzhang13.github.io/blog/2025/rlm/)
58
+ - **Symbolic reasoning & recursion** — like algebra, RLMs express the *structure* of computation rather than performing each operation individually; a single line can represent 1M sub-calls — in direct contrast to agents like Claude Code that must mechanically emit each sub-agent call one at a time.
59
+ - **Interpretability** — RLM trajectories are fully readable: you can trace every peek, chunk, sub-call, and verification step the model takes. This not only reveals *how* the model decomposed a problem, but provides concrete optimization signals which tools like [GEPA](https://gepa-ai.github.io/gepa) can ingest to evolve the RLM's strategies.
60
+ - **Ideal for improving performance per token** — RLMs allow small models to punch way above their weight (RLM(GPT-5-mini) outperforms base GPT-5) providing great opportunities for reducing costs or stretching limited compute budgets without sacrificing quality.
61
+
62
+ ## Features
63
+
64
+ <p align="center">
65
+ <img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/docs/harness_vs_rlm.svg" alt="Classic harness vs RLM architecture" width="600"/>
66
+ </p>
67
+
68
+
69
+ - **Multimodal** — process images, documents, audio, and video through sub-LM calls using native provider multimodal APIs.
70
+ - **Async tool calling** — native RLM async support in the WASM sandbox, enabling concurrent sub-LM invocations and tool calls
71
+ - **Prompt-optimized skills & tools** — predict-rlm skills come tested and optimized to ensure maximum LM interoperability and performance, bundling instructions, PyPI packages, and tools for domain-specific tasks
72
+ - **Simple file I/O** — pass local or cloud files as typed inputs and outputs via `File`, keeping interop with your existing data pipelines straightforward. (S3 files support soon)
73
+ - **Structured sub-LM calls** — native Pydantic and DSPy signature support for type-safe sub-LM invocations with structured outputs
74
+
75
+ ## Demos
76
+
77
+ | Description | Input / Output | Preview |
78
+ |---|---|---|
79
+ | [Document Analysis](examples/document_analysis/) — Analyze documents and extract key dates, entities, and financial information into a structured report | **Input:** PDFs<br>**Output:** Structured briefing report ([example output](examples/document_analysis/sample/output/report.md)) | <a href="examples/document_analysis/sample/output/report.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/document_analysis/sample/output/screenshot.png" width="280"></a> |
80
+ | [Document Redaction](examples/document_redaction/) — Redact PII from PDFs based on a policy, then verify the redactions visually | **Input:** PDFs<br>**Output:** Redacted PDFs ([example output](examples/document_redaction/sample/output/output.md)) | <a href="examples/document_redaction/sample/output/output.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/document_redaction/sample/output/screenshot.png" width="280"></a> |
81
+ | [Invoice Processing](examples/invoice_processing/) — Extract vendor info, line items, and totals from PDF invoices into a consolidated Excel spreadsheet | **Input:** PDF invoices<br>**Output:** Excel spreadsheet ([example output](examples/invoice_processing/sample/output/)) | <a href="examples/invoice_processing/sample/output/output.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/invoice_processing/sample/output/screenshot.png" width="280"></a> |
82
+ | [Contract Comparison](examples/contract_comparison/) — Compare two contract versions and produce a structured diff report with per-section analysis | **Input:** 2 PDF contracts<br>**Output:** Structured diff report ([example output](examples/contract_comparison/sample/output/)) | <a href="examples/contract_comparison/sample/output/comparison-report.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/contract_comparison/sample/output/screenshot.png" width="280"></a> |
83
+
84
+ ## Quick start
85
+
86
+ ### With your coding agent
87
+
88
+ Install the [predict-rlm skill](.agents/skills/rlm/SKILL.md) in Claude Code, Codex, Cursor, or any compatible coding agent:
89
+
90
+ ```bash
91
+ npx skills add Trampoline-AI/predict-rlm
92
+ ```
93
+
94
+ Then ask your agent to build an RLM:
95
+
96
+ ```
97
+ ❯ /rlm build an RLM that extracts line items from PDF invoices into a spreadsheet
98
+ ```
99
+
100
+ ### Quick Example
101
+
102
+ ```python
103
+ import dspy
104
+ from predict_rlm import File, PredictRLM
105
+
106
+ class AnalyzeImages(dspy.Signature):
107
+     """Analyze images and answer the query. Load each image as a base64 data
108
+     URI and use predict() with dspy.Image to extract visual information."""
109
+     images: list[File] = dspy.InputField()
110
+     query: str = dspy.InputField()
111
+     answer: str = dspy.OutputField()
112
+
113
+ rlm = PredictRLM(
114
+     AnalyzeImages,
115
+     lm="openai/gpt-5.4",
116
+     sub_lm="openai/gpt-5.1",
117
+ )
118
+
119
+ result = rlm(
120
+     images=[File(path="page.png")],
121
+     query="Extract all visible text, then count each letter A-Z (case-insensitive).",
122
+ )
123
+
124
+ print(result.answer)
125
+ ```
126
+
127
+ ## Next steps
128
+
129
+ - [How it works](docs/how-it-works.md) — understand the sandbox, REPL loop, signatures, and file I/O
130
+ - [API reference](docs/api.md) — constructor params for `PredictRLM`, `File`, and `Skill`
131
+ - [Skills](docs/skills.md) — define, compose, and mount custom skills
132
+ - [Examples](examples/) — end-to-end demos with setup instructions
@@ -0,0 +1,106 @@
1
+ # predict-rlm
2
+ Harness-less LM runtime built on top of [DSPy](https://dspy.ai). Define your inputs, outputs, and tools — the model handles its own control flow. Get fully interpretable trajectories and performance that scales directly with model improvements. Without context rot.
3
+
4
+ Based on the [Recursive Language Models](https://arxiv.org/abs/2512.24601v1) paper by [Alex L. Zhang](https://x.com/a1zhang), [Tim Kraska](https://x.com/tim_kraska), and [Omar Khattab](https://x.com/lateinteraction) from the Stanford NLP lab.<br/>
5
+
6
+ <br>
7
+ <p align="center">
8
+ <a href="https://github.com/Trampoline-AI/predict-rlm/actions/workflows/tests.yml"><img src="https://img.shields.io/github/actions/workflow/status/Trampoline-AI/predict-rlm/tests.yml?label=Tests" alt="Tests"></a>
9
+ <a href="https://codecov.io/gh/Trampoline-AI/predict-rlm"><img src="https://img.shields.io/codecov/c/github/Trampoline-AI/predict-rlm?token=NNS3R7OIT2&color=brightgreen&label=codecov" alt="codecov"></a>
10
+ <a href="https://pypi.org/project/predict-rlm/"><img src="https://img.shields.io/pypi/v/predict-rlm?color=blue" alt="PyPI"></a>
11
+ <a href="https://pypi.org/project/predict-rlm/"><img src="https://img.shields.io/pypi/pyversions/predict-rlm" alt="Python"></a>
12
+ <a href="https://discord.gg/BAkd288sGN"><img src="https://img.shields.io/badge/Discord-Join-5865F2?style=flat&logo=discord&logoColor=white" alt="Discord"></a>
13
+ <a href="https://github.com/Trampoline-AI/predict-rlm"><img src="https://img.shields.io/github/stars/trampoline-ai/predict-rlm?cacheSeconds=3600" alt="GitHub stars"></a>
14
+ <br/>
15
+ crafted with ♥ in MTL · NYC · FLP<br>by <a href="https://trampoline.ai">Trampoline AI</a>
16
+ </p>
17
+
18
+ ## Installation
19
+
20
+ ```bash
21
+ uv add predict-rlm
22
+ ```
23
+
24
+ ## Why RLMs?
25
+
26
+ <p align="center">
27
+ <img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/docs/bitter_lesson_spectrum.svg" alt="Bitter Lesson Spectrum — from hand-written prompts to RLMs" width="680"/>
28
+ </p>
29
+
30
+ - **Avoid context rot** — The root LM only interacts with its context programmatically through the REPL, staying well within its comfortable operating range — enabling complex, long-horizon tasks that would otherwise cause models to silently degrade.
31
+ - **Bitter lesson-proof: RLMs improve as LMs improve** — Unlike harnesses, which can cap or constrain the base model's capabilities, the performance, speed, and cost of RLM calls correlate directly with improvements to base model capabilities. [If the base model handles 10M tokens tomorrow, the RLM handles 100M.](https://alexzhang13.github.io/blog/2025/rlm/)
32
+ - **Symbolic reasoning & recursion** — like algebra, RLMs express the *structure* of computation rather than performing each operation individually; a single line can represent 1M sub-calls — in direct contrast to agents like Claude Code that must mechanically emit each sub-agent call one at a time.
33
+ - **Interpretability** — RLM trajectories are fully readable: you can trace every peek, chunk, sub-call, and verification step the model takes. This not only reveals *how* the model decomposed a problem, but provides concrete optimization signals which tools like [GEPA](https://gepa-ai.github.io/gepa) can ingest to evolve the RLM's strategies.
34
+ - **Ideal for improving performance per token** — RLMs allow small models to punch way above their weight (RLM(GPT-5-mini) outperforms base GPT-5) providing great opportunities for reducing costs or stretching limited compute budgets without sacrificing quality.
35
+
36
+ ## Features
37
+
38
+ <p align="center">
39
+ <img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/docs/harness_vs_rlm.svg" alt="Classic harness vs RLM architecture" width="600"/>
40
+ </p>
41
+
42
+
43
+ - **Multimodal** — process images, documents, audio, and video through sub-LM calls using native provider multimodal APIs.
44
+ - **Async tool calling** — native RLM async support in the WASM sandbox, enabling concurrent sub-LM invocations and tool calls
45
+ - **Prompt-optimized skills & tools** — predict-rlm skills come tested and optimized to ensure maximum LM interoperability and performance, bundling instructions, PyPI packages, and tools for domain-specific tasks
46
+ - **Simple file I/O** — pass local or cloud files as typed inputs and outputs via `File`, keeping interop with your existing data pipelines straightforward. (S3 files support soon)
47
+ - **Structured sub-LM calls** — native Pydantic and DSPy signature support for type-safe sub-LM invocations with structured outputs
48
+
49
+ ## Demos
50
+
51
+ | Description | Input / Output | Preview |
52
+ |---|---|---|
53
+ | [Document Analysis](examples/document_analysis/) — Analyze documents and extract key dates, entities, and financial information into a structured report | **Input:** PDFs<br>**Output:** Structured briefing report ([example output](examples/document_analysis/sample/output/report.md)) | <a href="examples/document_analysis/sample/output/report.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/document_analysis/sample/output/screenshot.png" width="280"></a> |
54
+ | [Document Redaction](examples/document_redaction/) — Redact PII from PDFs based on a policy, then verify the redactions visually | **Input:** PDFs<br>**Output:** Redacted PDFs ([example output](examples/document_redaction/sample/output/output.md)) | <a href="examples/document_redaction/sample/output/output.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/document_redaction/sample/output/screenshot.png" width="280"></a> |
55
+ | [Invoice Processing](examples/invoice_processing/) — Extract vendor info, line items, and totals from PDF invoices into a consolidated Excel spreadsheet | **Input:** PDF invoices<br>**Output:** Excel spreadsheet ([example output](examples/invoice_processing/sample/output/)) | <a href="examples/invoice_processing/sample/output/output.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/invoice_processing/sample/output/screenshot.png" width="280"></a> |
56
+ | [Contract Comparison](examples/contract_comparison/) — Compare two contract versions and produce a structured diff report with per-section analysis | **Input:** 2 PDF contracts<br>**Output:** Structured diff report ([example output](examples/contract_comparison/sample/output/)) | <a href="examples/contract_comparison/sample/output/comparison-report.md"><img src="https://raw.githubusercontent.com/Trampoline-AI/predict-rlm/main/examples/contract_comparison/sample/output/screenshot.png" width="280"></a> |
57
+
58
+ ## Quick start
59
+
60
+ ### With your coding agent
61
+
62
+ Install the [predict-rlm skill](.agents/skills/rlm/SKILL.md) in Claude Code, Codex, Cursor, or any compatible coding agent:
63
+
64
+ ```bash
65
+ npx skills add Trampoline-AI/predict-rlm
66
+ ```
67
+
68
+ Then ask your agent to build an RLM:
69
+
70
+ ```
71
+ ❯ /rlm build an RLM that extracts line items from PDF invoices into a spreadsheet
72
+ ```
73
+
74
+ ### Quick Example
75
+
76
+ ```python
77
+ import dspy
78
+ from predict_rlm import File, PredictRLM
79
+
80
+ class AnalyzeImages(dspy.Signature):
81
+     """Analyze images and answer the query. Load each image as a base64 data
82
+     URI and use predict() with dspy.Image to extract visual information."""
83
+     images: list[File] = dspy.InputField()
84
+     query: str = dspy.InputField()
85
+     answer: str = dspy.OutputField()
86
+
87
+ rlm = PredictRLM(
88
+     AnalyzeImages,
89
+     lm="openai/gpt-5.4",
90
+     sub_lm="openai/gpt-5.1",
91
+ )
92
+
93
+ result = rlm(
94
+     images=[File(path="page.png")],
95
+     query="Extract all visible text, then count each letter A-Z (case-insensitive).",
96
+ )
97
+
98
+ print(result.answer)
99
+ ```
100
+
101
+ ## Next steps
102
+
103
+ - [How it works](docs/how-it-works.md) — understand the sandbox, REPL loop, signatures, and file I/O
104
+ - [API reference](docs/api.md) — constructor params for `PredictRLM`, `File`, and `Skill`
105
+ - [Skills](docs/skills.md) — define, compose, and mount custom skills
106
+ - [Examples](examples/) — end-to-end demos with setup instructions
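The "Async tool calling" bullet in the README added above describes concurrent sub-LM invocations. A minimal sketch of that fan-out pattern follows. It assumes a stand-in coroutine, `fake_predict`, which is hypothetical: the real `predict()` is only available inside the predict-rlm sandbox, and its exact signature is not shown in this diff.

```python
import asyncio


async def fake_predict(signature: str, **kwargs) -> dict:
    """Hypothetical stand-in for a sub-LM call; it only echoes its input."""
    await asyncio.sleep(0.01)  # simulate the latency of a remote call
    return {"signature": signature, "page": kwargs["page"]}


async def process_pages(pages: list[str]) -> list[dict]:
    # Fan out one sub-call per page and await them all concurrently.
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(
        *(fake_predict("page: str -> summary: str", page=p) for p in pages)
    )


results = asyncio.run(process_pages(["p1", "p2", "p3"]))
print([r["page"] for r in results])  # ['p1', 'p2', 'p3']
```

Because the calls overlap, total wall time stays close to that of a single call rather than growing with the number of pages.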
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "predict-rlm"
3
- version = "0.2.0"
3
+ version = "0.2.2"
4
4
  description = "Production-grade RLMs (Recursive Language Models) with tool use, built on DSPy"
5
5
  authors = [{ name = "Trampoline AI" }]
6
6
  license = { text = "MIT" }
@@ -17,6 +17,7 @@ classifiers = [
17
17
  ]
18
18
 
19
19
  dependencies = [
20
+ "deno>=2",
20
21
  "dspy>=3.1.2",
21
22
  "nest-asyncio>=1.6.0",
22
23
  "pydantic>=2.8.2,<3",
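The hunk above adds `deno>=2` to the runtime dependencies, and the same change appears as a new `Requires-Dist` line in PKG-INFO. As a sketch of how this change could be confirmed programmatically, the snippet below parses two abridged metadata headers with the standard-library email parser (core metadata uses RFC 822-style headers) and compares the requirement sets. The embedded strings are shortened copies of the headers shown in this diff, and the helper name `requirements` is illustrative.

```python
from email.parser import Parser

# Abridged copies of the PKG-INFO headers shown in this diff.
OLD_METADATA = """\
Metadata-Version: 2.4
Name: predict-rlm
Version: 0.2.0
Requires-Dist: dspy>=3.1.2
Requires-Dist: nest-asyncio>=1.6.0
Requires-Dist: pydantic<3,>=2.8.2
"""

NEW_METADATA = """\
Metadata-Version: 2.4
Name: predict-rlm
Version: 0.2.2
Requires-Dist: deno>=2
Requires-Dist: dspy>=3.1.2
Requires-Dist: nest-asyncio>=1.6.0
Requires-Dist: pydantic<3,>=2.8.2
"""


def requirements(metadata_text: str) -> set[str]:
    """Collect every Requires-Dist value from a core-metadata header block."""
    headers = Parser().parsestr(metadata_text, headersonly=True)
    return set(headers.get_all("Requires-Dist", []))


added = requirements(NEW_METADATA) - requirements(OLD_METADATA)
removed = requirements(OLD_METADATA) - requirements(NEW_METADATA)
print(sorted(added))    # ['deno>=2']
print(sorted(removed))  # []
```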
@@ -1,490 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: predict-rlm
3
- Version: 0.2.0
4
- Summary: Production-grade RLMs (Recursive Language Models) with tool use, built on DSPy
5
- Project-URL: Homepage, https://www.trampoline.ai/
6
- Project-URL: Repository, https://github.com/Trampoline-AI/predict-rlm
7
- Project-URL: Issues, https://github.com/Trampoline-AI/predict-rlm/issues
8
- Author: Trampoline AI
9
- License: MIT
10
- License-File: LICENSE
11
- Keywords: ai,dspy,language-models,reasoning,rlm,tool-use
12
- Classifier: Development Status :: 3 - Alpha
13
- Classifier: Intended Audience :: Developers
14
- Classifier: License :: OSI Approved :: MIT License
15
- Classifier: Programming Language :: Python :: 3.11
16
- Classifier: Programming Language :: Python :: 3.12
17
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
18
- Requires-Python: >=3.11
19
- Requires-Dist: dspy>=3.1.2
20
- Requires-Dist: nest-asyncio>=1.6.0
21
- Requires-Dist: pydantic<3,>=2.8.2
22
- Provides-Extra: examples
23
- Requires-Dist: pymupdf>=1.24.0; extra == 'examples'
24
- Description-Content-Type: text/markdown
25
-
26
- # predict-rlm
27
-
28
- [![Tests](https://github.com/Trampoline-AI/predict-rlm/actions/workflows/tests.yml/badge.svg)](https://github.com/Trampoline-AI/predict-rlm/actions/workflows/tests.yml)
29
- [![codecov](https://codecov.io/gh/Trampoline-AI/predict-rlm/graph/badge.svg?token=NNS3R7OIT2)](https://codecov.io/gh/Trampoline-AI/predict-rlm)
30
- [![Book a Call](https://img.shields.io/badge/Book_a_Call-Cal.com-292929?style=flat&logo=cal.com)](https://cal.com/team/trampoline)
31
- [![Discord](https://img.shields.io/badge/Discord-Join-5865F2?style=flat&logo=discord&logoColor=white)](https://discord.gg/BAkd288sGN)
32
-
33
- Production-grade RLMs (Recursive Language Models) with tool use, built on [DSPy](https://github.com/stanfordnlp/dspy). By [Trampoline AI](https://www.trampoline.ai/).
34
-
35
- Based on the [Recursive Language Models](https://arxiv.org/abs/2512.24601v1) paper by [Alex L. Zhang](https://x.com/a1zhang), [Tim Kraska](https://x.com/tim_kraska), and [Omar Khattab](https://x.com/lateinteraction) from the Stanford NLP lab.
36
-
37
- ## Installation
38
-
39
- ```bash
40
- uv add predict-rlm
41
- ```
42
-
43
- Or with pip:
44
-
45
- ```bash
46
- pip install predict-rlm
47
- ```
48
-
49
- predict-rlm also requires [Deno](https://deno.com/) for its sandboxed code interpreter:
50
-
51
- ```bash
52
- curl -fsSL https://deno.land/install.sh | sh
53
- ```
54
-
55
- ## Quick start
56
-
57
- ```python
58
- import dspy
59
- from predict_rlm import File, PredictRLM
60
-
61
- class AnalyzeImages(dspy.Signature):
62
-     """Analyze images and answer the query. Load each image as a base64 data
63
-     URI and use predict() with dspy.Image to extract visual information."""
64
-     images: list[File] = dspy.InputField()
65
-     query: str = dspy.InputField()
66
-     answer: str = dspy.OutputField()
67
-
68
- rlm = PredictRLM(AnalyzeImages, lm="openai/gpt-5.4", sub_lm="openai/gpt-5.1")
69
- result = rlm(images=[File(path="page.png")], query="Extract all visible text, then count each letter A-Z (case-insensitive).")
70
- print(result.answer)
71
- ```
72
-
73
- ### Use it with your coding agent
74
-
75
- Add the [predict-rlm agent skill](skills/create-rlm/SKILL.md) to Claude Code, Codex, Cursor, or any compatible coding agent:
76
-
77
- ```bash
78
- npx skills add Trampoline-AI/predict-rlm
79
- ```
80
-
81
- Your agent will then know how to build RLMs using predict-rlm — including the file structure, signatures, tools, and skills patterns.
82
-
83
- ## Demos
84
-
85
- | Example | Description | Input / Output | Preview |
86
- |---|---|---|---|
87
- | [Document Analysis](examples/document_analysis/) | Analyze documents and extract key dates, entities, and financial information into a structured report | **Input:** 1 PDF, 136 pages<br>**Output:** Structured briefing report with key dates, entities, and financial info ([sample](examples/document_analysis/sample/output/report.md)) | <a href="examples/document_analysis/sample/output/report.md"><img src="examples/document_analysis/sample/output/screenshot.png" width="200"></a> |
88
- | [Document Redaction](examples/document_redaction/) | Redact PII from PDFs based on a policy, then verify the redactions visually | **Input:** 1 PDF, 6 pages<br>**Output:** 96 PII redactions across 6 categories, verified redacted PDF ([sample](examples/document_redaction/sample/output/output.md)) | <a href="examples/document_redaction/sample/output/output.md"><img src="examples/document_redaction/sample/output/screenshot.png" width="200"></a> |
89
- | [Invoice Processing](examples/invoice_processing/) | Extract vendor info, line items, and totals from PDF invoices into a consolidated Excel spreadsheet | **Input:** 2 PDFs, 2 pages<br>**Output:** Line items, totals, and vendor info in Excel ([sample](examples/invoice_processing/sample/output/)) | <a href="examples/invoice_processing/sample/output/output.md"><img src="examples/invoice_processing/sample/output/screenshot.png" width="200"></a> |
90
- | [Contract Comparison](examples/contract_comparison/) | Compare two contract versions and produce a structured diff report with per-section analysis | **Input:** 2 PDFs, 45 pages<br>**Output:** Per-section diff report with key differences ([sample](examples/contract_comparison/sample/output/)) | <a href="examples/contract_comparison/sample/output/comparison-report.md"><img src="examples/contract_comparison/sample/output/screenshot.png" width="200"></a> |
91
-
92
- ## Why RLMs?
93
-
94
- Think of an RLM as a **callable, pre-configured agent**. Like Claude Code or Cursor, it can autonomously explore context, write and execute code, call tools, inspect results, and iterate until the task is done. Unlike a chat agent, an RLM is a **function** — you define its inputs, outputs, and tools, then call it from your code. It returns structured data, not chat messages.
95
-
96
- This makes RLMs ideal for tasks that are:
97
-
98
- - **Specific and repeatable** — tasks with a well-defined SOP and a known desired outcome. Think of an RLM as a Claude Code that's been purpose-built for one task — with the right tools, the right instructions, and a tuned workflow that reliably produces the result you want. You define the procedure once, and the RLM follows it every time.
99
- - **Context-heavy** — too much data to fit in a single prompt. The RLM selectively loads what it needs via tools, working through documents page by page rather than stuffing everything into one call.
100
- - **Multi-step** — require exploring, extracting, computing, and synthesizing. The RLM writes code to orchestrate these steps, parallelizing where possible (e.g. processing 50 pages concurrently with `asyncio.gather()`).
101
- - **Action-oriented** — need to make changes, not just read. By giving the RLM tools that modify state (redact text, call APIs, write files), it becomes an autonomous executor — not just an analyzer.
102
- - **Iterative** — the RLM can inspect its own results, catch errors, retry with different approaches, and verify its work before submitting. It self-corrects in ways a single LLM call cannot.
103
-
104
- ## What is predict-rlm?
105
-
106
- `predict-rlm` extends DSPy's RLM with a built-in `predict()` tool — a sub-LM the RLM can call from within its sandbox to perform language understanding, vision analysis, and structured extraction via DSPy signatures.
107
-
108
- The architecture is two-level:
109
-
110
- 1. **The outer LLM** (the RLM itself) writes and executes Python code in a sandboxed REPL. It plans, orchestrates, and iterates.
111
- 2. **The sub-LM** (via `predict()`) handles perception and extraction — analyzing images, understanding text, and returning typed results.
112
-
113
- The sub-LM supports `dspy.Image` type hints, which means `predict()` calls can pass images (as URLs or base64) directly to a vision-capable model. This makes RLMs **natively multimodal** — the outer LLM renders a PDF page to an image, passes it to `predict()`, and gets back structured data. The RLM itself doesn't need to be a vision model; it delegates visual understanding to the sub-LM.
114
-
115
- The outer LLM decides *what* to look at and *when*; the sub-LM decides *what it sees*. This separation is key to context management — the outer LLM's context stays small (code + tool results), while context-heavy work like reading a full page image or analyzing a long text block is offloaded to `predict()` calls. Each `predict()` call gets its own context window with the sub-LM, so the RLM can process far more total data than any single LLM call could hold.
116
-
117
- ## Features
118
-
119
- - **Built-in `predict()` tool** — call a sub-LM from inside the sandbox with DSPy signatures and type hints
120
- - **JSPI-enabled WASM sandbox** — concurrent async tool execution via Pyodide with `asyncio.gather()`
121
- - **Structured outputs** — Pydantic models, typed fields, and lists as output types
122
- - **Custom tools** — give the RLM tools that read, write, or modify external state
123
- - **Skills** — composable bundles of instructions, PyPI packages, and tools for domain-specific tasks
124
- - **Multimodal** — sub-LM calls support `dspy.Image`, so the RLM can analyze images, PDFs, screenshots, etc. without the outer LLM needing vision capabilities
125
- - **Optimizable** — built on DSPy, so optimizers can tune prompts and few-shot examples automatically. Inference-time scaling techniques like [GEPA](https://arxiv.org/abs/2504.00294) push accuracy further by generating and selecting among multiple candidate solutions
126
-
127
- ## How it works
128
-
129
- 1. You define **inputs**, **outputs**, and **tools** — what the RLM receives, what it should produce, and what actions it can take
130
- 2. The outer LLM writes Python code in a sandboxed Pyodide/WASM REPL
131
- 3. Inside the sandbox, it calls `await predict(signature, **kwargs)` to invoke the sub-LM for understanding and extraction
132
- 4. It iterates — exploring data, calling tools, building up intermediate results, and handling errors
133
- 5. When done, it calls `SUBMIT()` with the final structured output
134
-
135
- Each iteration is a REPL turn: the LLM sees the output of its previous code, decides what to do next, and writes more code. State persists between iterations, so it can accumulate findings across many steps.
136
-
137
- ### Signatures and file I/O
138
-
139
- The DSPy signature defines the **inputs**, **outputs**, and **strategy** (via the docstring). Use `File` for file-typed fields — input files are mounted into the sandbox, output files are synced back (see [API](#file) for details).
140
-
141
- ```python
142
- from predict_rlm import File, PredictRLM, Skill
143
-
144
- class AnalyzeDocuments(dspy.Signature):
145
-     """Analyze documents and produce a structured report.
146
-
147
-     1. Survey the documents — file names, page counts, document types
148
-     2. Render pages as images and use predict() to extract content
149
-     3. Produce the report following the criteria's format
150
-     """
151
-     documents: list[File] = dspy.InputField()
152
-     analysis: DocumentAnalysis = dspy.OutputField()
153
-
154
- pdf_skill = Skill(
155
-     name="pdf",
156
-     instructions="Use pymupdf to open and render PDF pages...",
157
-     packages=["pymupdf"],
158
- )
159
-
160
- rlm = PredictRLM(
161
-     AnalyzeDocuments,
162
-     lm="openai/gpt-5.4",
163
-     sub_lm="openai/gpt-5.1",
164
-     skills=[pdf_skill],
165
- )
166
-
167
- documents = [File(path="report.pdf"), File(path="appendix.pdf")]
168
- result = rlm(documents=documents)
169
- ```
170
-
171
- Inside the sandbox, the RLM autonomously decides which pages to load and when:
172
-
173
- ```python
174
- # The RLM writes code like this — you don't write this, the LLM does:
175
- import pymupdf, base64, asyncio
176
-
177
- doc = pymupdf.open(documents[0])
178
- images = [
179
- f"data:image/png;base64,{base64.b64encode(doc[i].get_pixmap(dpi=200).tobytes('png')).decode()}"
180
- for i in range(3)
181
- ]
182
- results = await asyncio.gather(*[
183
- predict("page: dspy.Image -> dates: list[str]", page=img)
184
- for img in images
185
- ])
186
- ```
187
-
188
- ## Skills
189
-
190
- Skills are the primary way to extend what an RLM can do inside its sandbox. The sandbox starts with just Python's standard library and `predict()` — skills add **PyPI packages**, **instructions**, **modules**, and **tools** on top.
191
-
192
- Skills are for **general capabilities** — teaching the RLM how to use a library or approach a domain. For single specialized functions (fetch a URL, query a database, call an API), use the `tools=` parameter directly instead.
193
-
194
- This is powerful for the same reason CLI tools are powerful for Claude Code: if there's a Python package for it, the RLM can use it. Data manipulation with pandas, PDF parsing with pdfplumber, image processing with Pillow, web scraping with beautifulsoup4, geospatial analysis with shapely — skills make any of these available inside the sandbox, and the RLM can write code against them autonomously.
195
-
196
- **Package compatibility:** The sandbox runs [Pyodide](https://pyodide.org/) (CPython compiled to WebAssembly), which supports **pure-Python packages** out of the box via micropip. Packages with C extensions only work if they ship a pre-built Pyodide wheel — [many popular ones do](https://pyodide.org/en/stable/usage/packages-in-pyodide.html) (numpy, pandas, scipy, Pillow, pymupdf, etc.), but packages that rely on system libraries without a Pyodide build (e.g. psycopg2, torch) cannot be installed in the sandbox. For these, expose the functionality as a **host-side tool** instead — the tool runs in your normal Python environment and the RLM calls it from the sandbox via the tool bridge.
197
-
198
- Unlike Claude Code skills, which need to be discovered and loaded on demand (because Claude Code is a general-purpose agent that can't load every capability at once), RLM skills are **always loaded into context**. This works because an RLM is already scoped to a specific task — you know exactly what capabilities it needs when you define it, so you can confidently pass all relevant skills upfront without worrying about context bloat or dynamic discovery.
199
-
200
- ```python
201
- from predict_rlm import PredictRLM, Skill
202
-
203
- pdf_skill = Skill(
204
- name="pdf-extraction",
205
- instructions="Use pdfplumber for table extraction. Prefer page.extract_tables() for tabular content.",
206
- packages=["pdfplumber"],
207
- )
208
-
209
- rlm = PredictRLM(
210
- "documents -> tables: list[dict]",
211
- lm="openai/gpt-5.4",
212
- sub_lm="openai/gpt-5.1",
213
- skills=[pdf_skill],
214
- )
215
- ```
216
-
217
- Skills are composable — pass multiple skills and their instructions, packages, and tools are merged automatically:
218
-
219
- ```python
220
- data_skill = Skill(
221
- name="data-analysis",
222
- instructions="Use pandas for tabular data. Print df.head() to inspect before processing.",
223
- packages=["pandas", "openpyxl"],
224
- )
225
-
226
- viz_skill = Skill(
227
- name="visualization",
228
- instructions="Use matplotlib for charts. Save figures to bytes, don't call plt.show().",
229
- packages=["matplotlib"],
230
- )
231
-
232
- rlm = PredictRLM(
233
- "spreadsheet, query -> analysis: str, chart: bytes",
234
- lm="openai/gpt-5.4",
235
- skills=[data_skill, viz_skill],
236
- )
237
- ```
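As a rough mental model of what "merged automatically" could mean, here is a minimal sketch. `SkillSketch` and `merge_skills` are hypothetical stand-ins, not the library's classes, and the choices made here (one prompt section per skill, de-duplicated packages, an error on duplicate tool names) are assumptions for illustration rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSketch:
    """Hypothetical stand-in for Skill, holding only the fields being merged."""
    name: str
    instructions: str = ""
    packages: list = field(default_factory=list)
    tools: dict = field(default_factory=dict)


def merge_skills(skills):
    """Combine skills into one prompt text, one package list, and one tool map."""
    sections, packages, tools = [], [], {}
    for skill in skills:
        sections.append(f"## {skill.name}\n{skill.instructions}")
        for package in skill.packages:
            if package not in packages:  # keep each package once, in first-seen order
                packages.append(package)
        for tool_name, tool in skill.tools.items():
            if tool_name in tools:  # assumption: name clashes are treated as errors
                raise ValueError(f"duplicate tool name: {tool_name}")
            tools[tool_name] = tool
    return "\n\n".join(sections), packages, tools


data = SkillSketch("data-analysis", "Use pandas for tabular data.", ["pandas", "openpyxl"])
sheets = SkillSketch("spreadsheet", "Use openpyxl to build workbooks.", ["openpyxl", "formulas"])
prompt, packages, tools = merge_skills([data, sheets])
```

Here `openpyxl` is requested by both skills but appears once in the merged package list.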
238
-
239
- ### Sandbox modules
240
-
241
- Skills can mount Python modules directly into the sandbox via the `modules` field. This lets you ship custom Python code alongside a skill that the RLM can `import` in its sandbox code — without publishing it to PyPI.
242
-
243
- ```python
244
- from pathlib import Path
245
-
246
- spreadsheet_skill = Skill(
247
- name="spreadsheet",
248
- instructions="Use openpyxl to build workbooks. Use formula_eval to verify formulas.",
249
- packages=["openpyxl", "pandas", "formulas"],
250
- modules={"formula_eval": str(Path(__file__).parent / "modules" / "formula_eval.py")},
251
- )
252
- ```
253
-
254
- The key maps the **import name** to the **host filesystem path** of the `.py` file. When the RLM runs, the module is mounted into the sandbox and becomes importable:
255
-
256
- ```python
257
- # Inside the sandbox, the RLM can write:
258
- from formula_eval import evaluate
259
- report = evaluate("output.xlsx")
260
- ```
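The import-name-to-path mapping can be illustrated outside the sandbox with the standard library. The sketch below is an assumption about the general idea, not a description of how predict-rlm mounts files into Pyodide; `mount_module`, `helper_sketch`, and `helper_impl.py` are hypothetical names.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def mount_module(import_name, path):
    """Load the .py file at `path` and register it under `import_name`."""
    spec = importlib.util.spec_from_file_location(import_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[import_name] = module  # later imports of this name resolve here
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "helper_impl.py"
    source.write_text("def double(x):\n    return 2 * x\n")
    # Same shape as the `modules` field: import name -> host filesystem path.
    for name, path in {"helper_sketch": str(source)}.items():
        mount_module(name, path)

from helper_sketch import double

result = double(21)
```

Note that the import name (`helper_sketch`) is independent of the file name on disk, which matches the key/value roles described above.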
261
-
262
- ### Built-in skills
263
-
264
- predict-rlm ships a library of pre-built skills you can use directly:
265
-
266
- ```python
267
- from predict_rlm.skills import pdf, spreadsheet, docx
268
-
269
- rlm = PredictRLM(MySignature, skills=[pdf, spreadsheet, docx])
270
- ```
271
-
272
- | Skill | Import | Packages | Modules | What it teaches the RLM |
273
- |---|---|---|---|---|
274
- | **pdf** | `from predict_rlm.skills import pdf` | `pymupdf` | — | Read, render, modify, and redact PDFs |
275
- | **spreadsheet** | `from predict_rlm.skills import spreadsheet` | `openpyxl`, `pandas`, `formulas` | `formula_eval` | Build and modify Excel workbooks with formulas and formatting |
276
- | **docx** | `from predict_rlm.skills import docx` | `python-docx` | `md2docx` | Read, write, and modify Word documents with tables, formatting, and styles |
277
-
278
- ## Examples
279
-
280
- ### Running the examples
281
-
282
- ```bash
283
- git clone https://github.com/Trampoline-AI/predict-rlm.git
284
- cd predict-rlm
285
- uv sync --extra examples
286
- ```
287
-
288
- Set your API key for the LLM provider used in the example (defaults to OpenAI):
289
-
290
- ```bash
291
- export OPENAI_API_KEY=sk-...
292
- ```
293
-
294
- Each example defaults to the PDFs in its `sample/input/` directory. You can also pass file paths or a directory:
295
-
296
- ```bash
297
- # Document analysis
298
- uv run examples/document_analysis/run.py
299
-
300
- # Document redaction
301
- uv run examples/document_redaction/run.py
302
-
303
- # Invoice processing
304
- uv run examples/invoice_processing/run.py
305
-
306
- # Contract comparison
307
- uv run examples/contract_comparison/run.py
308
-
309
- # Pass custom files or a directory
310
- uv run examples/document_analysis/run.py /path/to/docs/
311
- uv run examples/invoice_processing/run.py invoice1.pdf invoice2.pdf
312
-
313
- # With debug output (prints REPL code and tool calls to stderr)
314
- uv run examples/document_analysis/run.py --debug
315
- ```
316
-
317
- Outputs are saved to `output/{timestamp}/` inside each example directory.
318
-
319
- ### Example #1: Document Analysis
320
-
321
- **What it does:** Takes a set of PDFs and a natural language prompt (e.g. "extract key dates, entities, and financial information") and produces a structured report with typed fields.
322
-
323
- The output is defined as Pydantic schemas:
324
-
325
- ```python
326
- from pydantic import BaseModel
-
- class KeyDate(BaseModel):
327
-     name: str  # e.g. "Submission Deadline"
328
-     date: str  # ISO format (YYYY-MM-DD)
329
-     time: str | None = None  # 24-hour format (HH:MM)
330
-     timezone: str | None = None  # e.g. "EST", "UTC"
331
-
332
- class KeyEntity(BaseModel):
333
-     name: str  # e.g. "Acme Corporation"
334
-     role: str | None = None  # e.g. "Contractor"
335
-     contact: str | None = None
336
-
337
- class DocumentAnalysis(BaseModel):
338
-     report: str  # Full markdown report
339
-     key_dates: list[KeyDate]
340
-     key_entities: list[KeyEntity]
341
- ```
342
-
343
- The DSPy signature ties them together with task instructions:
344
-
345
- ```python
346
- class AnalyzeDocuments(dspy.Signature):
347
-     """Analyze documents and produce a structured report.
348
-
349
-     1. Read the report criteria to understand what to extract
350
-     2. Survey the documents: file names, page counts, document types
351
-     3. Render pages and use predict() to extract content
352
-     4. Produce the report following the criteria's format
353
-     """
354
-     documents: list[File] = dspy.InputField()
355
-     analysis: DocumentAnalysis = dspy.OutputField()
356
- ```
357
-
358
- **How it works:**
359
-
360
- The RLM receives `File` references as input. The files are mounted into the sandbox, and the RLM opens them directly with pymupdf. This is the key design pattern: the RLM **manages its own context window**. Given a 200-page document set, it doesn't try to process everything at once. Instead, it:
361
-
362
- 1. **Surveys** the documents — checks file names and page counts to understand the structure
363
- 2. **Samples** strategically — renders a few pages to understand the format and identify where key information lives
364
- 3. **Extracts in parallel** — uses `asyncio.gather()` to send multiple pages to `predict()` concurrently, extracting dates, entities, or other fields from each page simultaneously
365
- 4. **Synthesizes** — aggregates findings across pages, deduplicates, and produces the final structured output
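The extract-then-synthesize steps above can be sketched with plain `asyncio`. Everything here is illustrative: `fake_predict` is a hypothetical stand-in for the sandbox's `predict()`, the concurrency cap is an assumption rather than a documented setting, and `asyncio.run` replaces the top-level `await` available inside the sandbox.

```python
import asyncio


async def fake_predict(page_text):
    """Hypothetical stand-in for predict(); returns a dict of typed fields."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"dates": [tok for tok in page_text.split() if tok.count("-") == 2]}


async def extract_dates(pages, max_concurrency=8):
    """Fan out one call per page, then merge and deduplicate in page order."""
    limit = asyncio.Semaphore(max_concurrency)

    async def one(page_text):
        async with limit:  # cap the number of in-flight calls
            return await fake_predict(page_text)

    results = await asyncio.gather(*(one(p) for p in pages))
    merged = []
    for result in results:  # gather() returns results in input order
        for date in result["dates"]:
            if date not in merged:
                merged.append(date)
    return merged


pages = ["due 2024-01-31 sharp", "kickoff 2024-02-15", "reminder: 2024-01-31"]
dates = asyncio.run(extract_dates(pages))
```

The duplicate date on the third page is dropped during the merge, which mirrors the deduplication mentioned in the synthesis step.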
366
-
367
- The `predict()` calls use DSPy signatures with type hints, so the sub-LM returns typed data (not free-form text) that the RLM can immediately work with in code:
368
-
369
- ```python
370
- # Inside the sandbox, the RLM writes code like this:
371
- result = await predict(
372
- "page: dspy.Image -> dates: list[str], entities: list[str]",
373
- instructions="Extract all dates and key entities from this page.",
374
- page=page_image,
375
- )
376
- # result["dates"] is a list of strings, ready to use
377
- ```
378
-
379
- **What you provide:** A [DSPy Signature](examples/document_analysis/signature.py) defining the task instructions, a [Pydantic schema](examples/document_analysis/schema.py) for the output, and a [pdf skill](examples/document_analysis/skills.py). The [service layer](examples/document_analysis/service.py) wires it all together in ~20 lines.
380
-
381
- **Sample run:** The [`sample/`](examples/document_analysis/sample/) directory contains a 136-page airport parking management document and the [full output](examples/document_analysis/sample/output/report.md) produced by the RLM. Here are the run stats:
382
-
383
- | | Main LM (`gpt-5.4`) | Sub-LM (`gpt-5.1`) |
384
- |---|---|---|
385
- | Calls | 8 | 63 |
386
- | Input tokens | 93,571 | 69,986 |
387
- | Output tokens | 9,241 | 21,274 |
388
- | Cost | $0.22 | $0.30 |
389
-
390
- **136 pages analyzed in ~4 minutes for $0.52 total ($0.004/page).** The outer LLM made 8 calls to orchestrate the entire run, while 63 sub-LM calls did the heavy lifting in parallel.
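The headline figures follow from the table by simple arithmetic. A quick check, using only the numbers reported above:

```python
main_lm_cost = 0.22  # Main LM cost from the run-stats table, in USD
sub_lm_cost = 0.30   # Sub-LM cost from the run-stats table, in USD
pages = 136

total = main_lm_cost + sub_lm_cost
per_page = total / pages
print(f"total=${total:.2f} per_page=${per_page:.4f}")
```

The per-page value comes out just under four tenths of a cent, which rounds to the $0.004/page quoted above.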
391
-
392
- ### Example #2: Document Redaction
393
-
394
- **What it does:** Takes PDFs and a redaction policy (e.g. "redact all PII: names, phone numbers, addresses, signatures") and produces redacted PDF files with sensitive content blacked out — plus a structured report of every redaction applied.
395
-
396
- This example demonstrates two key RLM capabilities:
397
-
398
- First, the RLM is an **autonomous executor that modifies files**. It inspects pages, identifies sensitive content, applies redactions, and then *re-inspects* the pages to verify the redactions worked. If a text match fails (the exact string wasn't found on the page), it retries with a shorter substring.
399
-
400
- Second, the RLM **parallelizes sub-LM calls to process large documents efficiently**. A 100-page PDF doesn't mean 100 sequential LLM calls — the RLM writes `asyncio.gather()` to fan out `predict()` calls across all pages concurrently. Each page gets its own sub-LM call with its own context window, all running in parallel.
401
-
- **How it works:**
402
-
403
- The RLM receives `File` references to PDFs, which are mounted into the sandbox. It uses pymupdf directly inside the sandbox (via skills) — no host-side tools needed. The workflow the RLM autonomously executes:
404
-
405
- 1. **Scans pages in parallel** — renders batches of pages as images and fans out `predict()` calls via `asyncio.gather()` to identify all text matching the redaction criteria across every page concurrently
406
- 2. **Applies redactions** — uses pymupdf's `search_for()` and `add_redact_annot()` to black out identified strings. If any are missed, it adjusts and retries
407
- 3. **Handles non-text content** — for signatures, logos, or images, it estimates bounding box coordinates and redacts by area
408
- 4. **Verifies** — re-renders redacted pages and confirms the sensitive content is gone
409
- 5. **Reports** — produces a `RedactionResult` with per-page summaries and the complete list of redaction targets
410
-
411
- Redacted PDFs are written to a `list[File]` output and synced back to the host automatically.
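The "retry with a shorter substring" behavior mentioned earlier can be sketched on a plain string. This is only an illustration of the idea: `find_with_fallback` is a hypothetical helper, and in the example itself the RLM would run the search against the PDF page with pymupdf's `search_for()` rather than against a Python string.

```python
def find_with_fallback(page_text, target, min_length=4):
    """Return the longest leading slice of `target` found in `page_text`, or None."""
    candidate = target.strip()
    while len(candidate) >= min_length:
        if candidate in page_text:
            return candidate
        candidate = candidate[:-1].rstrip()  # drop one trailing character and retry
    return None


page_text = "Employee: Jane Q. Doe\nPhone: 555-0100"
match = find_with_fallback(page_text, "Jane Q. Doe, Esq.")
miss = find_with_fallback(page_text, "John Smith")
```

The `min_length` floor is an assumption added here so the fallback cannot shrink to a fragment short enough to match unrelated text.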
412
-
413
- **What you provide:** A [DSPy Signature](examples/document_redaction/signature.py) with step-by-step redaction instructions, a [Pydantic schema](examples/document_redaction/schema.py) for the result, and [skills](examples/document_redaction/skills.py) for pymupdf and redaction patterns. The [service layer](examples/document_redaction/service.py) wires it together in ~20 lines.
414
-
415
- **Sample run:** The [`sample/`](examples/document_redaction/sample/) directory contains a 6-page mock employment agreement filled with PII (names, SINs, bank accounts, addresses, phone numbers, health cards) and the [full output](examples/document_redaction/sample/output/output.md) produced by the RLM — 96 redactions across all 6 pages. Here are the run stats:
416
-
417
- | | Main LM (`gpt-5.4`) | Sub-LM (`gpt-5.1`) |
418
- |---|---|---|
419
- | Calls | 6 | 20 |
420
- | Input tokens | 55,432 | 19,572 |
421
- | Output tokens | 5,866 | 6,905 |
422
- | Cost | $0.14 | $0.09 |
423
-
424
- **6 pages fully redacted in about 2 minutes for $0.24 total.** The RLM identified and redacted 96 instances of PII across 6 categories (names, addresses, phone numbers, emails, government IDs, financial info), then verified each page.
425
-
426
- ## API
427
-
428
- ### `PredictRLM`
429
-
430
- The main class. Extends `dspy.RLM` with a built-in `predict()` tool.
431
-
432
- ```python
433
- PredictRLM(
434
- signature, # DSPy signature (str or Signature class)
435
- lm=None, # Main LM — LM instance or model string
436
- sub_lm=None, # LM for predict() — LM instance or model string
437
- max_iterations=30, # Max REPL iterations
438
- max_llm_calls=50, # Max LM calls per execution
439
- tools=None, # Additional tool functions
440
- skills=None, # List of Skill instances
441
- allowed_domains=None, # Domains the sandbox can access
442
- debug=False, # Print REPL activity to stderr
443
- )
444
- ```
445
-
446
- ### `File`
447
-
448
- Unified file type for inputs and outputs. Behavior is determined by the field position in the signature.
449
-
450
- ```python
451
- File(path="report.pdf") # Single file
452
- File.from_dir("docs/") # All files in a directory -> list[File]
453
- ```
454
-
455
- As an **input field**, the file is mounted into the sandbox. As an **output field**, it's synced back to the host after execution.
456
-
457
- ```python
458
- class MySignature(dspy.Signature):
459
-     source: File = dspy.InputField()          # mounted into sandbox
460
-     docs: list[File] = dspy.InputField()      # multiple files mounted
461
-     result: File = dspy.OutputField()         # single file synced back
462
-     outputs: list[File] = dspy.OutputField()  # multiple files synced back
463
- ```
464
-
465
- ### `Skill`
466
-
467
- Reusable bundle of instructions, packages, modules, and tools.
468
-
469
- ```python
470
- Skill(
471
-     name="my-skill",                        # Short identifier
472
-     instructions="How to approach...",      # Injected into the RLM prompt
473
-     packages=["pandas", "pdfplumber"],      # Installed in the sandbox
474
-     modules={"helper": "/path/to/mod.py"},  # Mounted as importable modules in the sandbox
475
-     tools={"my_func": my_func},             # Exposed alongside predict()
476
- )
477
- ```
478
-
479
- ## Requirements
480
-
481
- - Python 3.11+
482
- - [Deno](https://deno.com/) (for the sandboxed code interpreter)
483
-
484
- The RLM executes generated Python code inside a [Pyodide](https://pyodide.org/) WASM sandbox managed by Deno. Deno provides the V8 runtime with JSPI support, fine-grained permissions (network, filesystem), and runs the sandbox as a subprocess — your host Python process never executes untrusted code directly.
485
-
486
- See the [Deno installation docs](https://docs.deno.com/runtime/getting_started/installation/) for setup instructions. Deno is automatically invoked when `PredictRLM` runs — no additional configuration needed.
487
-
488
- ## License
489
-
490
- MIT — see [LICENSE](LICENSE) for details.
@@ -1,465 +0,0 @@
1
- # predict-rlm
2
-
3
- [![Tests](https://github.com/Trampoline-AI/predict-rlm/actions/workflows/tests.yml/badge.svg)](https://github.com/Trampoline-AI/predict-rlm/actions/workflows/tests.yml)
4
- [![codecov](https://codecov.io/gh/Trampoline-AI/predict-rlm/graph/badge.svg?token=NNS3R7OIT2)](https://codecov.io/gh/Trampoline-AI/predict-rlm)
5
- [![Book a Call](https://img.shields.io/badge/Book_a_Call-Cal.com-292929?style=flat&logo=cal.com)](https://cal.com/team/trampoline)
6
- [![Discord](https://img.shields.io/badge/Discord-Join-5865F2?style=flat&logo=discord&logoColor=white)](https://discord.gg/BAkd288sGN)
7
-
8
- Production-grade RLMs (Recursive Language Models) with tool use, built on [DSPy](https://github.com/stanfordnlp/dspy). By [Trampoline AI](https://www.trampoline.ai/).
9
-
10
- Based on the [Recursive Language Models](https://arxiv.org/abs/2512.24601v1) paper by [Alex L. Zhang](https://x.com/a1zhang), [Tim Kraska](https://x.com/tim_kraska), and [Omar Khattab](https://x.com/lateinteraction) from MIT CSAIL.
11
-
12
- ## Installation
13
-
14
- ```bash
15
- uv add predict-rlm
16
- ```
17
-
18
- Or with pip:
19
-
20
- ```bash
21
- pip install predict-rlm
22
- ```
23
-
24
- predict-rlm also requires [Deno](https://deno.com/) for its sandboxed code interpreter:
25
-
26
- ```bash
27
- curl -fsSL https://deno.land/install.sh | sh
28
- ```
29
-
30
- ## Quick start
31
-
32
- ```python
33
- import dspy
34
- from predict_rlm import File, PredictRLM
35
-
36
- class AnalyzeImages(dspy.Signature):
37
-     """Analyze images and answer the query. Load each image as a base64 data
38
-     URI and use predict() with dspy.Image to extract visual information."""
39
-     images: list[File] = dspy.InputField()
40
-     query: str = dspy.InputField()
41
-     answer: str = dspy.OutputField()
42
-
43
- rlm = PredictRLM(AnalyzeImages, lm="openai/gpt-5.4", sub_lm="openai/gpt-5.1")
44
- result = rlm(images=[File(path="page.png")], query="Extract all visible text, then count each letter A-Z (case-insensitive).")
45
- print(result.answer)
46
- ```
47
-
48
- ### Use it with your coding agent
49
-
50
- Add the [predict-rlm agent skill](skills/create-rlm/SKILL.md) to Claude Code, Codex, Cursor, or any compatible coding agent:
51
-
52
- ```bash
53
- npx skills add Trampoline-AI/predict-rlm
54
- ```
55
-
56
- Your agent will then know how to build RLMs using predict-rlm — including the file structure, signatures, tools, and skills patterns.
57
-
58
- ## Demos
59
-
60
- | Example | Description | Input / Output | Preview |
61
- |---|---|---|---|
62
- | [Document Analysis](examples/document_analysis/) | Analyze documents and extract key dates, entities, and financial information into a structured report | **Input:** 1 PDF, 136 pages<br>**Output:** Structured briefing report with key dates, entities, and financial info ([sample](examples/document_analysis/sample/output/report.md)) | <a href="examples/document_analysis/sample/output/report.md"><img src="examples/document_analysis/sample/output/screenshot.png" width="200"></a> |
63
- | [Document Redaction](examples/document_redaction/) | Redact PII from PDFs based on a policy, then verify the redactions visually | **Input:** 1 PDF, 6 pages<br>**Output:** 96 PII redactions across 6 categories, verified redacted PDF ([sample](examples/document_redaction/sample/output/output.md)) | <a href="examples/document_redaction/sample/output/output.md"><img src="examples/document_redaction/sample/output/screenshot.png" width="200"></a> |
64
- | [Invoice Processing](examples/invoice_processing/) | Extract vendor info, line items, and totals from PDF invoices into a consolidated Excel spreadsheet | **Input:** 2 PDFs, 2 pages<br>**Output:** Line items, totals, and vendor info in Excel ([sample](examples/invoice_processing/sample/output/)) | <a href="examples/invoice_processing/sample/output/output.md"><img src="examples/invoice_processing/sample/output/screenshot.png" width="200"></a> |
65
- | [Contract Comparison](examples/contract_comparison/) | Compare two contract versions and produce a structured diff report with per-section analysis | **Input:** 2 PDFs, 45 pages<br>**Output:** Per-section diff report with key differences ([sample](examples/contract_comparison/sample/output/)) | <a href="examples/contract_comparison/sample/output/comparison-report.md"><img src="examples/contract_comparison/sample/output/screenshot.png" width="200"></a> |
66
-
67
- ## Why RLMs?
68
-
69
- Think of an RLM as a **callable, pre-configured agent**. Like Claude Code or Cursor, it can autonomously explore context, write and execute code, call tools, inspect results, and iterate until the task is done. Unlike a chat agent, an RLM is a **function** — you define its inputs, outputs, and tools, then call it from your code. It returns structured data, not chat messages.
70
-
71
- This makes RLMs ideal for tasks that are:
72
-
73
- - **Specific and repeatable** — tasks with a well-defined SOP and a known desired outcome. Think of an RLM as a Claude Code that's been purpose-built for one task — with the right tools, the right instructions, and a tuned workflow that reliably produces the result you want. You define the procedure once, and the RLM follows it every time.
74
- - **Context-heavy** — too much data to fit in a single prompt. The RLM selectively loads what it needs via tools, working through documents page by page rather than stuffing everything into one call.
75
- - **Multi-step** — require exploring, extracting, computing, and synthesizing. The RLM writes code to orchestrate these steps, parallelizing where possible (e.g. processing 50 pages concurrently with `asyncio.gather()`).
76
- - **Action-oriented** — need to make changes, not just read. By giving the RLM tools that modify state (redact text, call APIs, write files), it becomes an autonomous executor — not just an analyzer.
77
- - **Iterative** — the RLM can inspect its own results, catch errors, retry with different approaches, and verify its work before submitting. It self-corrects in ways a single LLM call cannot.
78
-
79
- ## What is predict-rlm?
80
-
81
- `predict-rlm` extends DSPy's RLM with a built-in `predict()` tool — a sub-LM the RLM can call from within its sandbox to perform language understanding, vision analysis, and structured extraction via DSPy signatures.
82
-
83
- The architecture is two-level:
84
-
85
- 1. **The outer LLM** (the RLM itself) writes and executes Python code in a sandboxed REPL. It plans, orchestrates, and iterates.
86
- 2. **The sub-LM** (via `predict()`) handles perception and extraction — analyzing images, understanding text, and returning typed results.
87
-
88
- The sub-LM supports `dspy.Image` type hints, which means `predict()` calls can pass images (as URLs or base64) directly to a vision-capable model. This makes RLMs **natively multimodal** — the outer LLM renders a PDF page to an image, passes it to `predict()`, and gets back structured data. The RLM itself doesn't need to be a vision model; it delegates visual understanding to the sub-LM.
89
-
90
- The outer LLM decides *what* to look at and *when*; the sub-LM decides *what it sees*. This separation is key to context management — the outer LLM's context stays small (code + tool results), while context-heavy work like reading a full page image or analyzing a long text block is offloaded to `predict()` calls. Each `predict()` call gets its own context window with the sub-LM, so the RLM can process far more total data than any single LLM call could hold.
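For the base64 route mentioned above, an image is typically wrapped in a data URI before it is handed to a vision model. A minimal sketch using only the standard library follows; `to_data_uri` is a hypothetical helper name, and the eight-byte PNG signature stands in for real image bytes.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


png_signature = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file
uri = to_data_uri(png_signature)
decoded = base64.b64decode(uri.split(",", 1)[1])  # round-trip back to bytes
```

The same string shape (`data:image/png;base64,...`) is what the sandbox snippets elsewhere in this README build from rendered PDF pages.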
91
-
92
- ## Features
93
-
94
- - **Built-in `predict()` tool** — call a sub-LM from inside the sandbox with DSPy signatures and type hints
95
- - **JSPI-enabled WASM sandbox** — concurrent async tool execution via Pyodide with `asyncio.gather()`
96
- - **Structured outputs** — Pydantic models, typed fields, and lists as output types
97
- - **Custom tools** — give the RLM tools that read, write, or modify external state
98
- - **Skills** — composable bundles of instructions, PyPI packages, and tools for domain-specific tasks
99
- - **Multimodal** — sub-LM calls support `dspy.Image`, so the RLM can analyze images, PDFs, screenshots, etc. without the outer LLM needing vision capabilities
100
- - **Optimizable** — built on DSPy, so optimizers can tune prompts and few-shot examples automatically. Inference-time scaling techniques like [GEPA](https://arxiv.org/abs/2504.00294) push accuracy further by generating and selecting among multiple candidate solutions
101
-
102
- ## How it works
103
-
104
- 1. You define **inputs**, **outputs**, and **tools** — what the RLM receives, what it should produce, and what actions it can take
105
- 2. The outer LLM writes Python code in a sandboxed Pyodide/WASM REPL
106
- 3. Inside the sandbox, it calls `await predict(signature, **kwargs)` to invoke the sub-LM for understanding and extraction
107
- 4. It iterates — exploring data, calling tools, building up intermediate results, and handling errors
108
- 5. When done, it calls `SUBMIT()` with the final structured output
109
-
110
- Each iteration is a REPL turn: the LLM sees the output of its previous code, decides what to do next, and writes more code. State persists between iterations, so it can accumulate findings across many steps.
111
-
112
- ### Signatures and file I/O
113
-
114
- The DSPy signature defines the **inputs**, **outputs**, and **strategy** (via the docstring). Use `File` for file-typed fields — input files are mounted into the sandbox, output files are synced back (see [API](#file) for details).
115
-
116
- ```python
117
- import dspy
- from predict_rlm import File, PredictRLM, Skill
118
-
119
- class AnalyzeDocuments(dspy.Signature):
120
-     """Analyze documents and produce a structured report.
121
-
122
-     1. Survey the documents: file names, page counts, document types
123
-     2. Render pages as images and use predict() to extract content
124
-     3. Produce the report following the criteria's format
125
-     """
126
-     documents: list[File] = dspy.InputField()
127
-     analysis: DocumentAnalysis = dspy.OutputField()
128
-
129
- pdf_skill = Skill(
130
- name="pdf",
131
- instructions="Use pymupdf to open and render PDF pages...",
132
- packages=["pymupdf"],
133
- )
134
-
135
- rlm = PredictRLM(
136
- AnalyzeDocuments,
137
- lm="openai/gpt-5.4",
138
- sub_lm="openai/gpt-5.1",
139
- skills=[pdf_skill],
140
- )
141
-
142
- documents = [File(path="report.pdf"), File(path="appendix.pdf")]
143
- result = rlm(documents=documents)
144
- ```
145
-
146
- Inside the sandbox, the RLM autonomously decides which pages to load and when:
147
-
148
- ```python
149
- # The RLM writes code like this — you don't write this, the LLM does:
150
- import pymupdf, base64, asyncio
151
-
152
- doc = pymupdf.open(documents[0])
153
- images = [
154
- f"data:image/png;base64,{base64.b64encode(doc[i].get_pixmap(dpi=200).tobytes('png')).decode()}"
155
- for i in range(3)
156
- ]
157
- results = await asyncio.gather(*[
158
- predict("page: dspy.Image -> dates: list[str]", page=img)
159
- for img in images
160
- ])
161
- ```
162
-
163
- ## Skills
164
-
165
- Skills are the primary way to extend what an RLM can do inside its sandbox. The sandbox starts with just Python's standard library and `predict()` — skills add **PyPI packages**, **instructions**, **modules**, and **tools** on top.
166
-
167
- Skills are for **general capabilities** — teaching the RLM how to use a library or approach a domain. For single specialized functions (fetch a URL, query a database, call an API), use the `tools=` parameter directly instead.
168
-
169
- This is powerful for the same reason CLI tools are powerful for Claude Code: if there's a Python package for it, the RLM can use it. Data manipulation with pandas, PDF parsing with pdfplumber, image processing with Pillow, web scraping with beautifulsoup4, geospatial analysis with shapely — skills make any of these available inside the sandbox, and the RLM can write code against them autonomously.
170
-
171
- **Package compatibility:** The sandbox runs [Pyodide](https://pyodide.org/) (CPython compiled to WebAssembly), which supports **pure-Python packages** out of the box via micropip. Packages with C extensions only work if they ship a pre-built Pyodide wheel — [many popular ones do](https://pyodide.org/en/stable/usage/packages-in-pyodide.html) (numpy, pandas, scipy, Pillow, pymupdf, etc.), but packages that rely on system libraries without a Pyodide build (e.g. psycopg2, torch) cannot be installed in the sandbox. For these, expose the functionality as a **host-side tool** instead — the tool runs in your normal Python environment and the RLM calls it from the sandbox via the tool bridge.
172
-
173
- Unlike Claude Code skills, which need to be discovered and loaded on demand (because Claude Code is a general-purpose agent that can't load every capability at once), RLM skills are **always loaded into context**. This works because an RLM is already scoped to a specific task — you know exactly what capabilities it needs when you define it, so you can confidently pass all relevant skills upfront without worrying about context bloat or dynamic discovery.
174
-
175
- ```python
176
- from predict_rlm import PredictRLM, Skill
177
-
178
- pdf_skill = Skill(
179
- name="pdf-extraction",
180
- instructions="Use pdfplumber for table extraction. Prefer page.extract_tables() for tabular content.",
181
- packages=["pdfplumber"],
182
- )
183
-
184
- rlm = PredictRLM(
185
- "documents -> tables: list[dict]",
186
- lm="openai/gpt-5.4",
187
- sub_lm="openai/gpt-5.1",
188
- skills=[pdf_skill],
189
- )
190
- ```
191
-
192
- Skills are composable — pass multiple skills and their instructions, packages, and tools are merged automatically:
193
-
194
- ```python
195
- data_skill = Skill(
196
- name="data-analysis",
197
- instructions="Use pandas for tabular data. Print df.head() to inspect before processing.",
198
- packages=["pandas", "openpyxl"],
199
- )
200
-
201
- viz_skill = Skill(
202
- name="visualization",
203
- instructions="Use matplotlib for charts. Save figures to bytes, don't call plt.show().",
204
- packages=["matplotlib"],
205
- )
206
-
207
- rlm = PredictRLM(
208
- "spreadsheet, query -> analysis: str, chart: bytes",
209
- lm="openai/gpt-5.4",
210
- skills=[data_skill, viz_skill],
211
- )
212
- ```
213
-
- ### Sandbox modules
-
- Skills can mount Python modules directly into the sandbox via the `modules` field. This lets you ship custom Python code alongside a skill that the RLM can `import` in its sandbox code — without publishing it to PyPI.
-
- ```python
- from pathlib import Path
-
- spreadsheet_skill = Skill(
-     name="spreadsheet",
-     instructions="Use openpyxl to build workbooks. Use formula_eval to verify formulas.",
-     packages=["openpyxl", "pandas", "formulas"],
-     modules={"formula_eval": str(Path(__file__).parent / "modules" / "formula_eval.py")},
- )
- ```
-
- The key maps the **import name** to the **host filesystem path** of the `.py` file. When the RLM runs, the module is mounted into the sandbox and becomes importable:
-
- ```python
- # Inside the sandbox, the RLM can write:
- from formula_eval import evaluate
- report = evaluate("output.xlsx")
- ```
-
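The import-name-to-path mapping can be illustrated on a plain CPython interpreter with `importlib`. This is only a sketch of the idea: how predict-rlm actually mounts files into its sandbox is not shown in this README, and `load_module`, `demo_helper`, and `helper_impl.py` are hypothetical names.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module(import_name: str, path: str):
    """Load the .py file at `path` and register it under `import_name`."""
    spec = importlib.util.spec_from_file_location(import_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[import_name] = module  # makes `import <import_name>` resolve
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    helper_path = Path(tmp) / "helper_impl.py"
    helper_path.write_text("def double(x):\n    return 2 * x\n")
    # Same shape as the `modules` field: import name -> file path.
    modules = {"demo_helper": str(helper_path)}
    for name, path in modules.items():
        load_module(name, path)

from demo_helper import double  # resolved from sys.modules

print(double(21))  # 42
```

Note that the import name (`demo_helper`) does not have to match the file name (`helper_impl.py`), which mirrors the key/value split in the `modules` mapping.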
- ### Built-in skills
-
- predict-rlm ships a library of pre-built skills you can use directly:
-
- ```python
- from predict_rlm.skills import pdf, spreadsheet, docx
-
- rlm = PredictRLM(MySignature, skills=[pdf, spreadsheet, docx])
- ```
-
- | Skill | Import | Packages | Modules | What it teaches the RLM |
- |---|---|---|---|---|
- | **pdf** | `from predict_rlm.skills import pdf` | `pymupdf` | — | Read, render, modify, and redact PDFs |
- | **spreadsheet** | `from predict_rlm.skills import spreadsheet` | `openpyxl`, `pandas`, `formulas` | `formula_eval` | Build and modify Excel workbooks with formulas and formatting |
- | **docx** | `from predict_rlm.skills import docx` | `python-docx` | `md2docx` | Read, write, and modify Word documents with tables, formatting, and styles |
-
- ## Examples
-
- ### Running the examples
-
- ```bash
- git clone https://github.com/Trampoline-AI/predict-rlm.git
- cd predict-rlm
- uv sync --extra examples
- ```
-
- Set your API key for the LLM provider used in the example (defaults to OpenAI):
-
- ```bash
- export OPENAI_API_KEY=sk-...
- ```
-
- Each example defaults to the PDFs in its `sample/input/` directory. You can also pass file paths or a directory:
-
- ```bash
- # Document analysis
- uv run examples/document_analysis/run.py
-
- # Document redaction
- uv run examples/document_redaction/run.py
-
- # Invoice processing
- uv run examples/invoice_processing/run.py
-
- # Contract comparison
- uv run examples/contract_comparison/run.py
-
- # Pass custom files or a directory
- uv run examples/document_analysis/run.py /path/to/docs/
- uv run examples/invoice_processing/run.py invoice1.pdf invoice2.pdf
-
- # With debug output (prints REPL code and tool calls to stderr)
- uv run examples/document_analysis/run.py --debug
- ```
-
- Outputs are saved to `output/{timestamp}/` inside each example directory.
-
- ### Example #1: Document Analysis
-
- **What it does:** Takes a set of PDFs and a natural language prompt (e.g. "extract key dates, entities, and financial information") and produces a structured report with typed fields.
-
- The output is defined as Pydantic schemas:
-
- ```python
- class KeyDate(BaseModel):
-     name: str  # e.g. "Submission Deadline"
-     date: str  # ISO format (YYYY-MM-DD)
-     time: str | None = None  # 24-hour format (HH:MM)
-     timezone: str | None = None  # e.g. "EST", "UTC"
-
- class KeyEntity(BaseModel):
-     name: str  # e.g. "Acme Corporation"
-     role: str | None = None  # e.g. "Contractor"
-     contact: str | None = None
-
- class DocumentAnalysis(BaseModel):
-     report: str  # Full markdown report
-     key_dates: list[KeyDate]
-     key_entities: list[KeyEntity]
- ```
-
- The DSPy signature ties them together with task instructions:
-
- ```python
- class AnalyzeDocuments(dspy.Signature):
-     """Analyze documents and produce a structured report.
-
-     1. Read the report criteria to understand what to extract
-     2. Survey the documents — file names, page counts, document types
-     3. Render pages and use predict() to extract content
-     4. Produce the report following the criteria's format
-     """
-     documents: list[File] = dspy.InputField()
-     analysis: DocumentAnalysis = dspy.OutputField()
- ```
-
- **How it works:**
-
- The RLM receives `File` references as input. The files are mounted into the sandbox, and the RLM opens them directly with pymupdf. This is the key design pattern: the RLM **manages its own context window**. Given a 200-page document set, it doesn't try to process everything at once. Instead, it:
-
- 1. **Surveys** the documents — checks file names and page counts to understand the structure
- 2. **Samples** strategically — renders a few pages to understand the format and identify where key information lives
- 3. **Extracts in parallel** — uses `asyncio.gather()` to send multiple pages to `predict()` concurrently, extracting dates, entities, or other fields from each page simultaneously
- 4. **Synthesizes** — aggregates findings across pages, deduplicates, and produces the final structured output
-
- The `predict()` calls use DSPy signatures with type hints, so the sub-LM returns typed data (not free-form text) that the RLM can immediately work with in code:
-
- ```python
- # Inside the sandbox, the RLM writes code like this:
- result = await predict(
-     "page: dspy.Image -> dates: list[str], entities: list[str]",
-     instructions="Extract all dates and key entities from this page.",
-     page=page_image,
- )
- # result["dates"] is a list of strings, ready to use
- ```
-
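The "extract in parallel, then synthesize" steps can be sketched outside the sandbox with a stub in place of the sub-LM. `fake_predict` and `extract_all` are hypothetical names; the stub returns canned data and only mimics the call shape shown above, so this illustrates the `asyncio.gather()` pattern rather than the library's behavior.

```python
import asyncio


async def fake_predict(signature: str, instructions: str, page: str) -> dict:
    """Stand-in for predict(); returns canned data instead of calling a sub-LM."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"dates": [f"date-from-{page}"], "entities": [f"entity-from-{page}"]}


async def extract_all(pages: list[str]) -> dict:
    # Fan out one call per page and wait for all of them together.
    results = await asyncio.gather(*(
        fake_predict(
            "page: dspy.Image -> dates: list[str], entities: list[str]",
            instructions="Extract all dates and key entities from this page.",
            page=p,
        )
        for p in pages
    ))
    # Synthesize: flatten and de-duplicate while preserving order.
    dates = list(dict.fromkeys(d for r in results for d in r["dates"]))
    entities = list(dict.fromkeys(e for r in results for e in r["entities"]))
    return {"dates": dates, "entities": entities}


summary = asyncio.run(extract_all(["p1", "p2", "p1"]))
print(summary["dates"])  # ['date-from-p1', 'date-from-p2']
```

`asyncio.gather()` returns results in the order the awaitables were passed, so per-page results can be matched back to page numbers by position.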
- **What you provide:** A [DSPy Signature](examples/document_analysis/signature.py) defining the task instructions, a [Pydantic schema](examples/document_analysis/schema.py) for the output, and a [pdf skill](examples/document_analysis/skills.py). The [service layer](examples/document_analysis/service.py) wires it all together in ~20 lines.
-
- **Sample run:** The [`sample/`](examples/document_analysis/sample/) directory contains a 136-page airport parking management document and the [full output](examples/document_analysis/sample/output/report.md) produced by the RLM. Here are the run stats:
-
- | | Main LM (`gpt-5.4`) | Sub-LM (`gpt-5.1`) |
- |---|---|---|
- | Calls | 8 | 63 |
- | Input tokens | 93,571 | 69,986 |
- | Output tokens | 9,241 | 21,274 |
- | Cost | $0.22 | $0.30 |
-
- **136 pages analyzed in ~4 minutes for $0.52 total ($0.004/page).** The outer LLM made 8 calls to orchestrate the entire run, while 63 sub-LM calls did the heavy lifting in parallel.
-
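The headline figures follow directly from the table. A quick check, using only the rounded costs and page count quoted above:

```python
main_cost, sub_cost = 0.22, 0.30  # rounded costs from the table, in USD
pages = 136

total = round(main_cost + sub_cost, 2)
per_page = total / pages
print(f"${total:.2f} total, ${per_page:.4f}/page")  # $0.52 total, $0.0038/page
```

The per-page figure rounds to the $0.004 quoted in the summary.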
- ### Example #2: Document Redaction
-
- **What it does:** Takes PDFs and a redaction policy (e.g. "redact all PII: names, phone numbers, addresses, signatures") and produces redacted PDF files with sensitive content blacked out — plus a structured report of every redaction applied.
-
- This example demonstrates two key RLM capabilities:
-
- First, the RLM is an **autonomous executor that modifies files**. It inspects pages, identifies sensitive content, applies redactions, and then *re-inspects* the pages to verify the redactions worked. If a text match fails (the exact string wasn't found on the page), it retries with a shorter substring.
-
- Second, the RLM **parallelizes sub-LM calls to process large documents efficiently**. A 100-page PDF doesn't mean 100 sequential LLM calls — the RLM writes `asyncio.gather()` to fan out `predict()` calls across all pages concurrently. Each page gets its own sub-LM call with its own context window, all running in parallel.
-
- **How it works:**
-
- The RLM receives `File` references to PDFs, which are mounted into the sandbox. It uses pymupdf directly inside the sandbox (via skills) — no host-side tools needed. The workflow the RLM autonomously executes:
-
- 1. **Scans pages in parallel** — renders batches of pages as images and fans out `predict()` calls via `asyncio.gather()` to identify all text matching the redaction criteria across every page concurrently
- 2. **Applies redactions** — uses pymupdf's `search_for()` and `add_redact_annot()` to black out identified strings. If any are missed, it adjusts and retries
- 3. **Handles non-text content** — for signatures, logos, or images, it estimates bounding box coordinates and redacts by area
- 4. **Verifies** — re-renders redacted pages and confirms the sensitive content is gone
- 5. **Reports** — produces a `RedactionResult` with per-page summaries and the complete list of redaction targets
-
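The "retry with a shorter substring" behavior in step 2 can be sketched in pure Python. The shortening rule here (drop the last word on each miss) is an assumption, since the README does not say how the RLM shortens the string, and `find_with_fallback` and `search_page` are hypothetical names. The `search` callable stands in for a page-level text search such as pymupdf's `search_for()`.

```python
from typing import Callable


def find_with_fallback(
    search: Callable[[str], list],
    target: str,
    min_len: int = 4,
) -> tuple[str, list]:
    """Search for `target`; on a miss, drop the last word and try again."""
    candidate = target.strip()
    while len(candidate) >= min_len:
        hits = search(candidate)
        if hits:
            return candidate, hits
        shorter = candidate.rsplit(" ", 1)[0]
        if shorter == candidate:  # single word left, nothing more to drop
            break
        candidate = shorter
    return "", []


page_text = "Employee: Jane Q. Doe, 12 Elm Street"


def search_page(needle: str) -> list:
    # Hypothetical stand-in for a page search; returns character offsets.
    i = page_text.find(needle)
    return [i] if i >= 0 else []


matched, hits = find_with_fallback(search_page, "Jane Q. Doe Jr.")
print(matched, hits)  # Jane Q. Doe [10]
```

The `min_len` floor guards against shrinking a target until it matches unrelated text, which would over-redact the page.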
- Redacted PDFs are written to a `list[File]` output and synced back to the host automatically.
-
- **What you provide:** A [DSPy Signature](examples/document_redaction/signature.py) with step-by-step redaction instructions, a [Pydantic schema](examples/document_redaction/schema.py) for the result, and [skills](examples/document_redaction/skills.py) for pymupdf and redaction patterns. The [service layer](examples/document_redaction/service.py) wires it together in ~20 lines.
-
- **Sample run:** The [`sample/`](examples/document_redaction/sample/) directory contains a 6-page mock employment agreement filled with PII (names, SINs, bank accounts, addresses, phone numbers, health cards) and the [full output](examples/document_redaction/sample/output/output.md) produced by the RLM — 96 redactions across all 6 pages. Here are the run stats:
-
- | | Main LM (`gpt-5.4`) | Sub-LM (`gpt-5.1`) |
- |---|---|---|
- | Calls | 6 | 20 |
- | Input tokens | 55,432 | 19,572 |
- | Output tokens | 5,866 | 6,905 |
- | Cost | $0.14 | $0.09 |
-
- **6 pages fully redacted in about 2 minutes for $0.24 total.** The RLM identified and redacted 96 instances of PII across 6 categories (names, addresses, phone numbers, emails, government IDs, financial info), then verified each page.
-
- ## API
-
- ### `PredictRLM`
-
- The main class. Extends `dspy.RLM` with a built-in `predict()` tool.
-
- ```python
- PredictRLM(
-     signature,  # DSPy signature (str or Signature class)
-     lm=None,  # Main LM — LM instance or model string
-     sub_lm=None,  # LM for predict() — LM instance or model string
-     max_iterations=30,  # Max REPL iterations
-     max_llm_calls=50,  # Max LM calls per execution
-     tools=None,  # Additional tool functions
-     skills=None,  # List of Skill instances
-     allowed_domains=None,  # Domains the sandbox can access
-     debug=False,  # Print REPL activity to stderr
- )
- ```
-
- ### `File`
-
- Unified file type for inputs and outputs. Behavior is determined by the field position in the signature.
-
- ```python
- File(path="report.pdf")  # Single file
- File.from_dir("docs/")  # All files in a directory -> list[File]
- ```
-
- As an **input field**, the file is mounted into the sandbox. As an **output field**, it's synced back to the host after execution.
-
- ```python
- class MySignature(dspy.Signature):
-     source: File = dspy.InputField()  # mounted into sandbox
-     docs: list[File] = dspy.InputField()  # multiple files mounted
-     result: File = dspy.OutputField()  # single file synced back
-     outputs: list[File] = dspy.OutputField()  # multiple files synced back
- ```
-
- ### `Skill`
-
- Reusable bundle of instructions, packages, modules, and tools.
-
- ```python
- Skill(
-     name="my-skill",  # Short identifier
-     instructions="How to approach...",  # Injected into the RLM prompt
-     packages=["pandas", "pdfplumber"],  # Installed in the sandbox
-     modules={"helper": "/path/to/mod.py"},  # Mounted as importable modules in the sandbox
-     tools={"my_func": my_func},  # Exposed alongside predict()
- )
- ```
-
- ## Requirements
-
- - Python 3.11+
- - [Deno](https://deno.com/) (for the sandboxed code interpreter)
-
- The RLM executes generated Python code inside a [Pyodide](https://pyodide.org/) WASM sandbox managed by Deno. Deno provides the V8 runtime with JSPI support, fine-grained permissions (network, filesystem), and runs the sandbox as a subprocess — your host Python process never executes untrusted code directly.
-
- See the [Deno installation docs](https://docs.deno.com/runtime/getting_started/installation/) for setup instructions. Deno is automatically invoked when `PredictRLM` runs — no additional configuration needed.
-
- ## License
-
- MIT — see [LICENSE](LICENSE) for details.
File without changes
File without changes