tokenpack-rag 0.1.0__tar.gz
- tokenpack_rag-0.1.0/LICENSE +75 -0
- tokenpack_rag-0.1.0/PKG-INFO +551 -0
- tokenpack_rag-0.1.0/README.md +433 -0
- tokenpack_rag-0.1.0/pyproject.toml +51 -0
- tokenpack_rag-0.1.0/setup.cfg +4 -0
- tokenpack_rag-0.1.0/src/tokenpack/__init__.py +28 -0
- tokenpack_rag-0.1.0/src/tokenpack/benchmark.py +256 -0
- tokenpack_rag-0.1.0/src/tokenpack/chunk_profiles.py +35 -0
- tokenpack_rag-0.1.0/src/tokenpack/chunking.py +340 -0
- tokenpack_rag-0.1.0/src/tokenpack/cli.py +436 -0
- tokenpack_rag-0.1.0/src/tokenpack/compression.py +177 -0
- tokenpack_rag-0.1.0/src/tokenpack/dataset.py +141 -0
- tokenpack_rag-0.1.0/src/tokenpack/doctor.py +41 -0
- tokenpack_rag-0.1.0/src/tokenpack/embeddings.py +106 -0
- tokenpack_rag-0.1.0/src/tokenpack/export.py +66 -0
- tokenpack_rag-0.1.0/src/tokenpack/generation.py +132 -0
- tokenpack_rag-0.1.0/src/tokenpack/index.py +46 -0
- tokenpack_rag-0.1.0/src/tokenpack/loaders.py +826 -0
- tokenpack_rag-0.1.0/src/tokenpack/mcp_server.py +216 -0
- tokenpack_rag-0.1.0/src/tokenpack/models.py +105 -0
- tokenpack_rag-0.1.0/src/tokenpack/packing.py +405 -0
- tokenpack_rag-0.1.0/src/tokenpack/pipeline.py +50 -0
- tokenpack_rag-0.1.0/src/tokenpack/reporting.py +96 -0
- tokenpack_rag-0.1.0/src/tokenpack/reranking.py +104 -0
- tokenpack_rag-0.1.0/src/tokenpack/scoring.py +325 -0
- tokenpack_rag-0.1.0/src/tokenpack/scoring_experimental.py +853 -0
- tokenpack_rag-0.1.0/src/tokenpack/selectors.py +360 -0
- tokenpack_rag-0.1.0/src/tokenpack/tokenization.py +26 -0
- tokenpack_rag-0.1.0/src/tokenpack_rag.egg-info/PKG-INFO +551 -0
- tokenpack_rag-0.1.0/src/tokenpack_rag.egg-info/SOURCES.txt +36 -0
- tokenpack_rag-0.1.0/src/tokenpack_rag.egg-info/dependency_links.txt +1 -0
- tokenpack_rag-0.1.0/src/tokenpack_rag.egg-info/entry_points.txt +5 -0
- tokenpack_rag-0.1.0/src/tokenpack_rag.egg-info/requires.txt +30 -0
- tokenpack_rag-0.1.0/src/tokenpack_rag.egg-info/top_level.txt +1 -0
- tokenpack_rag-0.1.0/tests/test_core.py +1062 -0
- tokenpack_rag-0.1.0/tests/test_longbench_eval.py +274 -0
- tokenpack_rag-0.1.0/tests/test_modal_generation_eval.py +117 -0
- tokenpack_rag-0.1.0/tests/test_qasper_compression_eval.py +115 -0
tokenpack_rag-0.1.0/LICENSE
@@ -0,0 +1,75 @@

Business Source License 1.1

Licensor: TokenPack Contributors
Licensed Work: TokenPack
Additional Use Grant: None
Change Date: 2030-05-10
Change License: Apache License, Version 2.0

License text copyright © 2017 MariaDB Corporation Ab, All Rights Reserved.
"Business Source License" is a trademark of MariaDB Corporation Ab.

Terms

The Licensor hereby grants you the right to copy, modify, create derivative
works, redistribute, and make non-production use of the Licensed Work. The
Licensor may make an Additional Use Grant, above, permitting limited production
use.

Effective on the Change Date, or the fourth anniversary of the first publicly
available distribution of a specific version of the Licensed Work under this
License, whichever comes first, the Licensor hereby grants you rights under the
terms of the Change License, and the rights granted in the paragraph above
terminate.

If your use of the Licensed Work does not comply with the requirements currently
in effect as described in this License, you must purchase a commercial license
from the Licensor, its affiliated entities, or authorized resellers, or you must
refrain from using the Licensed Work.

All copies of the original and modified Licensed Work, and derivative works of
the Licensed Work, are subject to this License. This License applies separately
for each version of the Licensed Work and the Change Date may vary for each
version of the Licensed Work released by Licensor.

You must conspicuously display this License on each original or modified copy of
the Licensed Work. If you receive the Licensed Work in original or modified form
from a third party, the terms and conditions set forth in this License apply to
your use of that work.

Any use of the Licensed Work in violation of this License will automatically
terminate your rights under this License for the current and all other versions
of the Licensed Work.

This License does not grant you any right in any trademark or logo of Licensor
or its affiliates (provided that you may use a trademark or logo of Licensor as
expressly required by this License).

TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE LICENSED WORK IS PROVIDED ON AN
"AS IS" BASIS. LICENSOR HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS
OR IMPLIED, INCLUDING (WITHOUT LIMITATION) WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, AND TITLE.

MariaDB hereby grants you permission to use this License's text to license your
works, and to refer to it using the trademark "Business Source License", as long
as you comply with the Covenants of Licensor below.

Covenants of Licensor

In consideration of the right to use this License's text and the "Business
Source License" name and trademark, Licensor covenants to MariaDB, and to all
other recipients of the licensed work to be provided by Licensor:

1. To specify as the Change License the GPL Version 2.0 or any later version, or
a license that is compatible with GPL Version 2.0 or a later version, where
"compatible" means that software provided under the Change License can be
included in a program with software provided under GPL Version 2.0 or a later
version. Licensor may specify additional Change Licenses without limitation.

2. To either: (a) specify an additional grant of rights to use that does not
impose any additional restriction on the right granted in this License, as the
Additional Use Grant; or (b) insert the text "None".

3. To specify a Change Date.

4. Not to modify this License in any other way.
tokenpack_rag-0.1.0/PKG-INFO
@@ -0,0 +1,551 @@

Metadata-Version: 2.4
Name: tokenpack-rag
Version: 0.1.0
Summary: Query-aware semantic chunk selection under LLM context-window budgets.
Author-email: Metehan Kizilcik <metekizilcik@gmail.com>
License: Business Source License 1.1 (full text identical to LICENSE above)
Project-URL: Homepage, https://github.com/mo-tunn/TokenPack
Project-URL: Repository, https://github.com/mo-tunn/TokenPack
Project-URL: Paper, https://github.com/mo-tunn/TokenPack/blob/main/submission/TokenPack-paper.pdf
Keywords: rag,llm,context-compression,retrieval,knapsack
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sentence-transformers>=3.0.0
Provides-Extra: reranking
Requires-Dist: sentence-transformers>=3.0.0; extra == "reranking"
Provides-Extra: pdf
Requires-Dist: PyMuPDF>=1.24.0; extra == "pdf"
Requires-Dist: pypdf>=4.0.0; extra == "pdf"
Provides-Extra: tokens
Requires-Dist: tiktoken>=0.7.0; extra == "tokens"
Provides-Extra: compression
Requires-Dist: llmlingua>=0.2.2; extra == "compression"
Provides-Extra: modal
Requires-Dist: modal>=0.64.0; extra == "modal"
Requires-Dist: pandas>=2.0.0; extra == "modal"
Provides-Extra: mcp
Requires-Dist: mcp>=1.2.0; extra == "mcp"
Provides-Extra: office
Requires-Dist: python-docx>=1.1.0; extra == "office"
Requires-Dist: python-pptx>=0.6.23; extra == "office"
Requires-Dist: openpyxl>=3.1.0; extra == "office"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: mcp>=1.2.0; extra == "dev"
Dynamic: license-file

# TokenPack-RAG

**TokenPack-RAG packs the most useful evidence chunks into a smaller LLM-ready context file.**

It turns long-context selection into a budgeted context-packing problem: chunks are items, token counts are weights, and query-conditioned evidence scores are values. The default pipeline is the current strongest setting from the paper:

```text
structure-aware semantic chunks + evidence-hybrid scoring + hybrid-greedy budget fill
```
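The items/weights/values framing can be sketched as a tiny greedy-density packer. This is an illustrative toy, not the library's actual selector; the `Chunk` fields and `greedy_pack` helper are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    tokens: int   # weight: token count of the chunk
    score: float  # value: query-conditioned evidence score

def greedy_pack(chunks, budget):
    """Fill the token budget with the highest score-per-token chunks first."""
    picked, used = [], 0
    for c in sorted(chunks, key=lambda c: c.score / c.tokens, reverse=True):
        if used + c.tokens <= budget:
            picked.append(c)
            used += c.tokens
    return picked, used

chunks = [
    Chunk("a", tokens=120, score=0.9),
    Chunk("b", tokens=300, score=1.0),
    Chunk("c", tokens=80, score=0.5),
]
selected, used = greedy_pack(chunks, budget=250)  # picks "a" then "c"; "b" no longer fits
```

The real pipeline layers redundancy penalties and a candidate pool on top of this idea, but the budget-feasibility invariant is the same: the packed token total never exceeds the budget.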

The practical goal is simple: give your LLM less context while keeping the evidence that matters.

## What You Get

- A one-command CLI: `tokenpack-rag pack SOURCE --query "..."`
- Automatic token-budget estimation when you do not know what budget to choose.
- Automatic Markdown output next to your source file, such as `paper-tp.md`.
- Budget-valid context selection for long documents, code, PDFs, or mixed folders.
- Advanced `ingest`, `select`, `export-context`, `answer`, and `benchmark` commands for experiments.
- Optional local MCP server for agent tools such as Claude Desktop, Cursor, or Codex.
- Optional second-stage prompt compression with LLMLingua / LongLLMLingua.
- Reproducible paper artifacts under [`submission/`](submission).

## Install

From PyPI, once published:

```bash
pip install tokenpack-rag
```

From GitHub today:

```bash
pip install "tokenpack-rag @ git+https://github.com/mo-tunn/TokenPack.git"
```

For PDF parsing, Office files, token counting, compression, and development tools:

```bash
pip install "tokenpack-rag[pdf,office,tokens,compression,dev] @ git+https://github.com/mo-tunn/TokenPack.git"
```

For local agent/MCP usage:

```bash
pip install "tokenpack-rag[mcp,pdf,office,tokens] @ git+https://github.com/mo-tunn/TokenPack.git"
```

For local editable development:

```bash
git clone https://github.com/mo-tunn/TokenPack.git
cd TokenPack
pip install -e ".[pdf,office,tokens,compression,dev]"
```

TokenPack-RAG uses `sentence-transformers/all-MiniLM-L6-v2` as the default embedding model. Use `--offline-models` only when the model is already cached locally.

## Quick Start

Pack one document into an LLM-ready Markdown context:

```bash
tokenpack-rag pack README.md --query "How does TokenPack reduce LLM context cost?"
```

This writes:

```text
README-tp.md
```

For a PDF:

```bash
tokenpack-rag pack paper.pdf --query "What are the main contributions?"
```

This writes:

```text
paper-tp.md
```

For a folder:

```bash
tokenpack-rag pack docs/ --query "Summarize the design decisions in this project."
```

This writes:

```text
docs-tp.md
```

The output is not a modified PDF. It is a packed Markdown context file that you can paste or upload into your own LLM.

## Supported Inputs

TokenPack-RAG accepts a single file or a folder. Folder inputs are scanned recursively, and unsupported binary/media files are skipped.

| Category | Extensions |
|---|---|
| Text and docs | `.txt`, `.text`, `.md`, `.markdown`, `.rst`, `.adoc`, `.tex`, `.log` |
| PDF | `.pdf` with the `pdf` extra |
| Web | `.html`, `.htm` |
| Data/config | `.json`, `.jsonl`, `.csv`, `.tsv`, `.yaml`, `.yml`, `.toml` |
| Office | `.docx`, `.pptx`, `.xlsx` with the `office` extra |
| Code | `.py`, `.js`, `.jsx`, `.ts`, `.tsx`, `.java`, `.go`, `.rs`, `.c`, `.cpp`, `.cs`, `.php`, `.rb`, `.swift`, `.kt`, `.scala`, `.sh`, `.ps1`, `.sql`, `.css`, `.xml`, and related variants |

Office support is optional so the base install stays lighter:

```bash
pip install "tokenpack-rag[office]"
```
## Auto Budget

`--budget` is optional. When you omit it, TokenPack-RAG estimates a budget from the source size:

```text
source_tokens = sum(chunk.token_count for chunk in index.chunks)
raw_budget = ceil(source_tokens * 0.50)
budget = clamp(raw_budget, min_budget=1200, max_budget=64000)
reserve_output = min(4000, max(512, int(budget * 0.10)))
selection_budget = budget - reserve_output
```
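The same rule in runnable form, as a direct transcription of the pseudocode above (the `estimate_budget` helper name is invented here; it is not the package's internal function):

```python
import math

def estimate_budget(source_tokens, ratio=0.50, min_budget=1200, max_budget=64000):
    """Auto-budget rule: a ratio of the source size, clamped to [min, max],
    with a slice reserved for the model's answer."""
    raw = math.ceil(source_tokens * ratio)
    budget = max(min_budget, min(raw, max_budget))
    reserve = min(4000, max(512, int(budget * 0.10)))
    return budget, budget - reserve

# A 142,000-token source hits the 64k cap, leaving 60,000 tokens for selection.
budget, selection_budget = estimate_budget(142_000)
```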

Example terminal summary:

```text
Source: paper.pdf
Output: paper-tp.md
Source tokens: 142,000
Auto budget: 64,000 tokens (ratio=50%, capped by max-budget)
Reserved for answer: 4,000
Selection budget: 60,000
Selected: 188 chunks / 59,240 tokens
```

You can still take control when you want a smaller or larger packed context:

```bash
tokenpack-rag pack paper.pdf \
  --query "What evidence supports the main claim?" \
  --budget 32000 \
  --overwrite
```

Other budget controls:

```bash
tokenpack-rag pack paper.pdf --query "..." --budget-ratio 0.35
tokenpack-rag pack paper.pdf --query "..." --max-budget 128000
tokenpack-rag pack paper.pdf --query "..." --reserve-output 2000
```

The default `64k` cap is intentional: TokenPack-RAG does local embedding and selection, so the packing step itself does not spend LLM API tokens. The cap is aimed at modern long-context models while still preventing unexpectedly huge output files.

## Output Files

By default, TokenPack-RAG writes the packed context next to the source:

| Source | Output |
|---|---|
| `paper.pdf` | `paper-tp.md` |
| `notes.txt` | `notes-tp.md` |
| `docs/` | `docs-tp.md` |
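The naming rule in the table amounts to dropping the extension (or keeping the folder name) and appending `-tp.md` next to the source. A minimal sketch, assuming a hypothetical `default_output` helper rather than the CLI's actual code:

```python
from pathlib import Path

def default_output(source: str) -> str:
    """Derive the `<stem>-tp.md` sibling path for a file or folder source."""
    p = Path(source)
    stem = p.stem if p.suffix else p.name  # file: drop extension; folder: keep name
    return str(p.with_name(f"{stem}-tp.md"))
```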

Existing output files are protected by default:

```bash
tokenpack-rag pack paper.pdf --query "..."
```

If `paper-tp.md` already exists, the command stops. Use `--overwrite` or choose an explicit path:

```bash
tokenpack-rag pack paper.pdf --query "..." --overwrite
tokenpack-rag pack paper.pdf --query "..." --out packed-context.md
```

Internal artifacts go under `.tokenpack/runs/<timestamp>/` unless you choose paths:

```bash
tokenpack-rag pack paper.pdf \
  --query "..." \
  --index-out .tokenpack/paper.index.json \
  --selection-out paper-tp.selection.json
```

## Optional Compression

TokenPack-RAG is selection-first by default. You can optionally compress the selected evidence with LLMLingua:

```bash
tokenpack-rag pack paper.pdf \
  --query "What evidence supports the main claim?" \
  --compress llmlingua \
  --compression-rate 0.85
```

For LongLLMLingua-style query-conditioned compression:

```bash
tokenpack-rag pack paper.pdf \
  --query "What evidence supports the main claim?" \
  --compress llmlingua \
  --longllmlingua \
  --compression-rate 0.85
```

By default, compression models are expected to be cached locally. Add `--allow-download` when you intentionally want Hugging Face downloads during compression.

## Use With Agents / MCP

TokenPack-RAG can also run as a local stdio MCP server. This lets an agent call TokenPack directly as a tool, produce a packed Markdown context, and then reason over that selected context.

Install with MCP support:

```bash
pipx install "tokenpack-rag[mcp,pdf,office,tokens]"
```

Add a local MCP server to your agent config:

```json
{
  "mcpServers": {
    "tokenpack-rag": {
      "command": "tokenpack-rag-mcp",
      "args": ["--workspace", "/path/to/project"]
    }
  }
}
```

Or run it through `uvx` without a permanent install:

```json
{
  "mcpServers": {
    "tokenpack-rag": {
      "command": "uvx",
      "args": [
        "--from",
        "tokenpack-rag[mcp,pdf,office,tokens]",
        "tokenpack-rag-mcp",
        "--workspace",
        "/path/to/project"
      ]
    }
  }
}
```

The MCP server exposes:

| Tool | Purpose |
|---|---|
| `pack_context` | Packs a file or folder into selected Markdown context and writes the `-tp.md` artifact. |
| `read_packed_context` | Reads a packed context file, optionally in slices for large contexts. |

By default the MCP server can only read and write inside `--workspace`. Use `--allow-any-path` only for trusted local setups.
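A workspace-containment check of this kind can be sketched as follows. This is an illustrative sketch of the sandboxing rule only, not the server's actual code; `inside_workspace` is a hypothetical helper:

```python
from pathlib import Path

def inside_workspace(path: str, workspace: str) -> bool:
    """Reject any path that resolves outside the workspace root,
    including `..` traversal and symlink escapes."""
    root = Path(workspace).resolve()
    target = Path(path).resolve()
    return target == root or root in target.parents
```

Resolving both paths before comparing is what makes `../` tricks ineffective: `/tmp/proj/../etc/passwd` resolves to `/etc/passwd` and fails the parent check.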

## Advanced CLI

The one-command `pack` workflow is the main user-facing interface. The lower-level commands remain available for experiments and reproducible paper runs.

Build an index:

```bash
tokenpack-rag ingest README.md --index .tokenpack/readme-index.json
```

Select evidence under a manual budget:

```bash
tokenpack-rag select \
  --index .tokenpack/readme-index.json \
  --query "How does TokenPack reduce LLM context cost?" \
  --budget 3000 \
  --reserve-output 500 \
  --output .tokenpack/selection.json
```

Export the selected context:

```bash
tokenpack-rag export-context \
  --selection .tokenpack/selection.json \
  --output .tokenpack/context.txt
```

By default, these commands use:

```text
chunker: structure-aware semantic boundaries
chunk-size-preset: low-budget
scoring: evidence-hybrid
selector: budget-top-k (TokenPack hybrid-greedy)
```

Historical selectors such as `knapsack`, `knapsack-redundancy`, and `semantic-threshold` chunking remain available for ablation work, but the main pipeline is hybrid-greedy.

## Python API

```python
from tokenpack.embeddings import make_embedder
from tokenpack.pipeline import ingest_path
from tokenpack.scoring import score_chunks
from tokenpack.selectors import select_chunks

embedder = make_embedder()
index = ingest_path(
    "README.md",
    ".tokenpack/readme-index.json",
    embedder=embedder,
    chunker_name="structure-aware",
    target_tokens=250,
    min_tokens=40,
    max_tokens=320,
)

query = "How does TokenPack reduce LLM context cost?"
query_embedding = embedder.embed([query])[0]

scored = score_chunks(
    query_embedding,
    index.chunks,
    index.embeddings,
    scoring="evidence-hybrid",
    query_text=query,
    redundancy_penalty=0.35,
)

result = select_chunks(
    scored,
    strategy="budget-top-k",
    budget=3000,
    candidate_pool=250,
)

print(result.used_tokens, [item.chunk.id for item in result.selected])
```

## Headline Results

These are the cleanest results from the current paper artifacts. The paper is intentionally conservative: TokenPack-RAG does not claim universal knapsack dominance, but it does show that selection-first context packing is a strong budget-control layer.

| Setting | Main Result |
|---|---|
| **QASPER, matched ~50% saving** | TokenPack-only preserves **0.934 evidence recall** vs **0.713** for LLMLingua-2-only. |
| **QASPER complete evidence** | TokenPack-only preserves complete evidence on **0.870** of questions vs **0.120** for LLMLingua-2-only. |
| **QASPER cascade frontier** | TokenPack + LLMLingua-2 at rate 0.85 reaches **58.4% saving** with **0.851 evidence recall**. |
| **LongBench v2 generation pilot** | TP hybrid-greedy-50 answers **37/83** cases correctly vs **32/83** for full context and **34/83** for production-RAG, a **+15.6% relative accuracy gain** over full context with **50.6% saving**. |
| **LongBench aggressive cascade** | TP hybrid-greedy-50 + LongLLMLingua-50 keeps the same **37/83** correctness while reaching **74.6% context saving** on the 83-case eligible pilot. |

The strongest claim is:

> Select evidence first, then optionally compress it. Retrieval-time budget selection and prompt compression are not interchangeable.

## Reproduce Paper Runs

Fast local tests:

```bash
python -m pytest -q
```

QASPER selector baseline:

```bash
python submission/experiments/qasper_selector_eval.py \
  --data-file .tokenpack/data/qasper-validation.parquet \
  --chunker structure-aware \
  --strategies production-rag,budget-top-k,greedy-density,knapsack,knapsack-redundancy \
  --budget-ratios 0.20,0.30,0.40,0.50 \
  --max-papers 500 \
  --max-questions 861 \
  --candidate-pool 300 \
  --chunk-size-preset low-budget \
  --output-dir submission/results/qasper_selector_eval_strong_rerun
```

LongBench v2 Modal pilot used in the current paper:

```bash
python -m modal run submission/longbench_eval/app.py::build_and_run \
  --output-dir submission/results/longbench_v2_modal_hybrid_greedy_83_latency \
  --limit 83 \
  --source-min-tokens 8000 \
  --source-max-tokens 24000 \
  --max-scanned 503 \
  --model-id Qwen/Qwen2.5-14B-Instruct \
  --batch-size 1 \
  --context-order score-then-source \
  --latency-mode
```

See [`submission/source_code_manifest.md`](submission/source_code_manifest.md) for the full artifact map.

## Repository Layout

```text
src/tokenpack/                     Python package and CLI implementation
tests/                             Unit and smoke tests
examples/                          Small local examples for the CLI
submission/paper/                  LaTeX paper source, tables, figures
submission/experiments/            QASPER, LongBench, compression, and ablation scripts
submission/results/                Paper result artifacts and readouts
submission/longbench_eval/         Modal LongBench v2 generation harness
submission/modal_generation_eval/  Modal QASPER generation/judge harness
```

## Notes

- The default workflow is output-first: create a packed context file and send that file to your own LLM.
- Ollama is not required for `pack`; MCP support is optional and local-first.
- QASPER metrics are evidence-retention and answer-token-retention proxies, not human-judged generated-answer quality.
- LongBench v2 accuracy numbers are pilot-scale and should be read descriptively, not as statistically significant wins.
- Evidence-hybrid scoring weights are engineering defaults. The paper calls out weight calibration as future work.
- BudgetMem is discussed as related work; the old `budgetmem-style` proxy is kept only in `tokenpack.scoring_experimental`, not in the production CLI.

## License

TokenPack-RAG is licensed under the Business Source License 1.1. See [`LICENSE`](LICENSE).

## Citation

If you use TokenPack-RAG in research, cite the paper PDF in [`submission/TokenPack-paper.pdf`](submission/TokenPack-paper.pdf). A BibTeX entry will be added when the public preprint is available.