less-tokens 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Shamin Chokshi
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,4 @@
1
+ include README.md
2
+ include LICENSE
3
+ include pyproject.toml
4
+ recursive-include less_tokens *.py
@@ -0,0 +1,459 @@
1
+ Metadata-Version: 2.4
2
+ Name: less-tokens
3
+ Version: 0.1.0
4
+ Summary: Deterministic, training-free lexical compression for LLM prompts.
5
+ Author-email: Shamin Chokshi <shaminchokshi2000@gail.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/shaminchokshi/less-tokens
8
+ Project-URL: Issues, https://github.com/shaminchokshi/less-tokens/issues
9
+ Project-URL: Repository, https://github.com/shaminchokshi/less-tokens
10
+ Keywords: llm,prompt,compression,tokens,nlp,openai,tokenization,shortening,Less,reduce
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
22
+ Classifier: Topic :: Text Processing :: Linguistic
23
+ Requires-Python: >=3.9
24
+ Description-Content-Type: text/markdown
25
+ License-File: LICENSE
26
+ Requires-Dist: nltk>=3.8
27
+ Requires-Dist: tiktoken>=0.6.0
28
+ Requires-Dist: rouge-score>=0.1.2
29
+ Requires-Dist: sentence-transformers>=2.5.0
30
+ Requires-Dist: bert-score>=0.3.13
31
+ Requires-Dist: numpy>=1.21
32
+ Provides-Extra: dev
33
+ Requires-Dist: pytest>=7; extra == "dev"
34
+ Requires-Dist: build; extra == "dev"
35
+ Requires-Dist: twine; extra == "dev"
36
+ Dynamic: license-file
37
+
38
+ # less-tokens
39
+
40
+ [![tests](https://github.com/shaminchokshi/less-tokens/actions/workflows/tests.yml/badge.svg)](https://github.com/shaminchokshi/less-tokens/actions/workflows/tests.yml)
41
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
42
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
43
+
44
+ Shrink your LLM prompts by 30 to 40 percent without changing what the model says back.
45
+
46
+ `less-tokens` is a small Python library that compresses prompts before you send them to an LLM. It works by stripping out filler words, redundant phrases, and grammatical scaffolding that the model doesn't actually need. The result is a shorter prompt that costs less and responds faster, while producing essentially the same answer.
47
+
48
+ No neural model, no GPU, no API key for the compression itself. It's classical lexical NLP, runs in milliseconds on a laptop CPU, and is fully deterministic.
49
+
50
+ ```python
51
+ from less_tokens import compress
52
+
53
+ original = "I was wondering if you could please explain to me how I can run a Python script from the command line."
54
+ compressed = compress(original,
55
+                       remove_filler_phrases=1,
56
+                       remove_stopwords=1,
57
+                       apply_contractions=1)
58
+
59
+ print(compressed)
60
+ # "explain run Python script command line"
61
+ ```
62
+
63
+ ## Why this exists
64
+
65
+ If you're calling OpenAI, Anthropic, or any other LLM API at meaningful volume, every token has a cost. And typical prompts carry a lot of fat:
66
+
67
+ - *"I was wondering if you could..."* is hedging the model ignores
68
+ - *"the"*, *"a"*, *"is"* are function words that rarely change meaning
69
+ - *"basically"*, *"actually"*, *"really"* are fillers
70
+ - *"for example"* is just a verbose way to write *"e.g."*
71
+
72
+ Strip these out and the model still gets your point, but you pay less. On a large benchmark we ran (1,242 prompts, 18,630 paired LLM completions), here's how the headline numbers came out:
73
+
74
+ | Setting | Token reduction | Output similarity (BERTScore F1) |
75
+ |---------|-----------------|----------------------------------|
76
+ | Conservative | about 2% | 0.96 |
77
+ | **Balanced** | **about 30%** | **0.91** |
78
+ | Aggressive | about 35% | 0.91 |
79
+ | Maximum | about 40% | 0.88 |
80
+
81
+ The balanced setting is the sweet spot for most production use. Aggressive gets you a bit more compression without much extra quality loss.
82
+
83
+ ## Install
84
+
85
+ ```bash
86
+ pip install less-tokens
87
+ ```
88
+
89
+ On first use it downloads about 30 MB of NLTK data automatically. If you also call `compare()`, BERTScore will download an additional ~1 GB model the first time. You can skip that with `bertscore=False` if you don't need it.
90
+
91
+ Using a virtual environment is highly recommended:
92
+
93
+ ```bash
94
+ python -m venv .venv
95
+
96
+ # Windows
97
+ .venv\Scripts\Activate.ps1
98
+
99
+ # macOS or Linux
100
+ source .venv/bin/activate
101
+
102
+ pip install less-tokens
103
+ ```
104
+
105
+ ## How to use it
106
+
107
+ The library exposes two functions. That's it.
108
+
109
+ ### `compress()` shrinks a prompt
110
+
111
+ Pass your prompt and any combination of eleven flags. Each flag is `1` to enable or `0` to disable. Bool and string aliases like `True` or `"on"` work too. Defaults are off for everything except whitespace cleanup, so you choose what runs.
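As a rough illustration of how `1`/`0`, bool, and string values could be folded into a single boolean, here is a minimal sketch. The helper name `as_flag` and every accepted string other than `"on"` are assumptions made for illustration; this is not the library's actual parsing code.

```python
# Hypothetical helper, not part of less_tokens. The string sets below are
# assumptions: the README only mentions 1/0, True, and "on".
_TRUE_STRINGS = {"1", "on", "true", "yes"}
_FALSE_STRINGS = {"0", "off", "false", "no", ""}


def as_flag(value) -> bool:
    """Normalise an int, bool, or string flag value to a plain bool."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return value
    if isinstance(value, int):
        if value in (0, 1):
            return bool(value)
        raise ValueError(f"flag must be 0 or 1, got {value!r}")
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE_STRINGS:
            return True
        if text in _FALSE_STRINGS:
            return False
    raise ValueError(f"unrecognised flag value: {value!r}")
```

Rejecting unknown values loudly, rather than treating them as "off", is a design choice in this sketch: a misspelled flag value then fails fast instead of silently skipping a technique.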
112
+
113
+ ```python
114
+ from less_tokens import compress
115
+
116
+ short = compress(
117
+     "I was wondering if you could explain this to me.",
118
+     remove_filler_phrases=1,
119
+     remove_stopwords=1,
120
+ )
121
+ # "explain"
122
+ ```
123
+
124
+ #### The eleven techniques
125
+
126
+ | Flag | What it does | Example |
127
+ |------|--------------|---------|
128
+ | `remove_filler_phrases` | Strips hedging phrases | "I was wondering if you could explain" becomes "explain" |
129
+ | `apply_abbreviations` | Replaces verbose forms | "for example" becomes "e.g." |
130
+ | `apply_contractions` | Combines into contractions | "do not" becomes "don't" |
131
+ | `remove_filler_words` | Drops single-word fillers | "this is basically really good" becomes "this is good" |
132
+ | `remove_stopwords` | Drops common stopwords | "the cat is on the mat" becomes "cat mat" |
133
+ | `remove_function_words` | Drops articles and auxiliaries | "the cat is running" becomes "cat running" |
134
+ | `pos_keep_only` | Keeps only content words | "I need to read the book quickly" becomes "need read book" |
135
+ | `lemmatize` | Reduces words to root forms | "running studies" becomes "run study" |
136
+ | `shorten_synonyms` | Substitutes shorter synonyms | "automobile" becomes "car" |
137
+ | `preserve_named_entities` | Protects names from pruning | "New York" stays intact (modifier flag) |
138
+ | `normalize_whitespace_punct` | Cleans up spacing | "hello world!!!" becomes "hello world!" (always on) |
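To make the phrase-level rows of the table concrete, the sketch below applies two of the documented rewrites ("for example" to "e.g." and "do not" to "don't") with whole-word regular expressions. The rule tables contain only the examples from the table, and the function names `apply_rules` and `shorten` are hypothetical; the library's real rule lists and implementation are not shown in this README.

```python
import re

# Illustrative rule tables only, taken from the examples in the table above.
ABBREVIATIONS = {"for example": "e.g."}
CONTRACTIONS = {"do not": "don't"}


def apply_rules(text: str, rules: dict) -> str:
    """Replace each phrase in `rules` with its shorter form (whole words only)."""
    for phrase, short in rules.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, short, text, flags=re.IGNORECASE)
    return text


def shorten(text: str) -> str:
    """Apply abbreviations first, then contractions."""
    return apply_rules(apply_rules(text, ABBREVIATIONS), CONTRACTIONS)


print(shorten("Please do not use globals, for example counters."))
```

Because the rules are plain string substitutions applied in a fixed order, the same input always yields the same output, which is the property the README relies on when it calls the pipeline deterministic.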
139
+
140
+ #### What never gets removed
141
+
142
+ Two categories of words are hard-coded as protected, even at the most aggressive setting.
143
+
144
+ First, negations: words like `not`, `no`, `never`, `nothing`, `nor`, `nobody`, and `cannot`. Dropping one of these flips the meaning of a sentence, which would be catastrophic. "Do not run this code" becoming "Do run this code" is not a tradeoff anyone wants.
145
+
146
+ Second, question words: `what`, `why`, `how`, `when`, `where`, `which`. These carry the intent of a query.
147
+
148
+ Also, if your original prompt ended with a question mark, the compressed version will too. We re-assert question form at the end of the pipeline so it isn't lost during pruning.
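The protection rules described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's code: the tiny `STOPWORDS` set stands in for NLTK's list, the `PROTECTED` set contains only the words named in this section, and `prune` is a hypothetical function name.

```python
import re

# Tiny stand-in stopword list (assumption). It deliberately contains "not"
# and "how" to show that the protected set overrides it.
STOPWORDS = {"the", "a", "an", "is", "on", "to", "do", "this", "i", "can",
             "me", "not", "how"}
# Protected words named in the README: negations and question words.
PROTECTED = {"not", "no", "never", "nothing", "nor", "nobody", "cannot",
             "what", "why", "how", "when", "where", "which"}


def prune(text: str) -> str:
    """Drop stopwords, but keep protected words and a trailing question mark."""
    was_question = text.rstrip().endswith("?")
    words = re.findall(r"[A-Za-z']+", text)
    kept = [w for w in words
            if w.lower() in PROTECTED or w.lower() not in STOPWORDS]
    result = " ".join(kept)
    # Re-assert question form at the end, since punctuation was discarded.
    if was_question and not result.endswith("?"):
        result += "?"
    return result


print(prune("Do not run this code"))
print(prune("How can I run the script?"))
```

Checking the protected set before the stopword set is the important ordering: a word that appears in both lists is always kept.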
149
+
150
+ #### Four presets you can copy
151
+
152
+ You don't have to figure out which flags to combine. Here are four named recipes for different aggression levels:
153
+
154
+ ```python
155
+ # SAFE: barely shrinks anything, near-perfect quality preservation.
156
+ # Useful when you can't afford any quality risk.
157
+ compress(prompt,
158
+          remove_filler_phrases=1,
159
+          apply_contractions=1,
160
+          remove_filler_words=1)
161
+ # about 2% reduction, 0.96 BERTScore
162
+
163
+ # BALANCED: the production default. Roughly 30% reduction with minimal
164
+ # quality loss. Start here.
165
+ compress(prompt,
166
+          remove_filler_phrases=1,
167
+          apply_abbreviations=1,
168
+          apply_contractions=1,
169
+          remove_filler_words=1,
170
+          remove_stopwords=1)
171
+ # about 30% reduction, 0.91 BERTScore
172
+
173
+ # AGGRESSIVE: pure POS-based pruning. Slightly more reduction than balanced
174
+ # at very similar quality. Great for high-volume systems.
175
+ compress(prompt,
176
+          pos_keep_only=1,
177
+          preserve_named_entities=1)
178
+ # about 35% reduction, 0.91 BERTScore
179
+
180
+ # MAXIMUM: everything on. About 40% reduction at the cost of some output
181
+ # quality. Use when the cost savings really matter.
182
+ compress(prompt,
183
+          remove_filler_phrases=1, apply_abbreviations=1, apply_contractions=1,
184
+          remove_filler_words=1, remove_stopwords=1, remove_function_words=1,
185
+          pos_keep_only=1, lemmatize=1, shorten_synonyms=1, preserve_named_entities=1)
186
+ # about 40% reduction, 0.88 BERTScore
187
+ ```
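If you would rather select a recipe by name than copy the flag lists, one option is to keep them in a dictionary and unpack it into the call. The flag names and combinations below are copied from the four recipes above; the `PRESETS` dictionary, the preset names, and `preset_flags` are a hypothetical convenience wrapper, not something the library is known to ship.

```python
# Hypothetical wrapper: flag combinations mirror the README's four recipes.
PRESETS = {
    "safe": dict(remove_filler_phrases=1, apply_contractions=1,
                 remove_filler_words=1),
    "balanced": dict(remove_filler_phrases=1, apply_abbreviations=1,
                     apply_contractions=1, remove_filler_words=1,
                     remove_stopwords=1),
    "aggressive": dict(pos_keep_only=1, preserve_named_entities=1),
    "maximum": dict(remove_filler_phrases=1, apply_abbreviations=1,
                    apply_contractions=1, remove_filler_words=1,
                    remove_stopwords=1, remove_function_words=1,
                    pos_keep_only=1, lemmatize=1, shorten_synonyms=1,
                    preserve_named_entities=1),
}


def preset_flags(name: str) -> dict:
    """Return a copy of the keyword arguments for a named preset."""
    if name not in PRESETS:
        raise ValueError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    return dict(PRESETS[name])
```

With the library installed, `compress(prompt, **preset_flags("balanced"))` would then pass the same flags as the BALANCED recipe.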
188
+
189
+ ### `compare()` measures the quality tradeoff
190
+
191
+ Compression is only useful if the LLM still produces the same answer. `compare()` quantifies that across six different similarity metrics so you can see exactly what compressing cost you.
192
+
193
+ You make the LLM calls yourself, with whichever provider you like. `compare()` only looks at the four strings: original prompt, compressed prompt, output from original, output from compressed.
194
+
195
+ ```python
196
+ from less_tokens import compress, compare
197
+ from openai import OpenAI
198
+
199
+ client = OpenAI()
200
+
201
+ def call_llm(prompt: str) -> str:
202
+     r = client.chat.completions.create(
203
+         model="gpt-4o-mini",
204
+         messages=[{"role": "user", "content": prompt}],
205
+         temperature=0,
206
+     )
207
+     return r.choices[0].message.content
208
+
209
+ original = "I was wondering if you could explain how to brew good coffee at home."
210
+ compressed = compress(original, remove_filler_phrases=1, remove_stopwords=1)
211
+
212
+ out_original = call_llm(original)
213
+ out_compressed = call_llm(compressed)
214
+
215
+ metrics = compare(original, compressed, out_original, out_compressed)
216
+ ```
217
+
218
+ #### What you get back
219
+
220
+ ```python
221
+ {
222
+     "compression": {
223
+         "original_tokens": 18,
224
+         "compressed_tokens": 8,
225
+         "token_reduction_pct": 55.56,   # you saved about 56% of your tokens
226
+         "original_chars": 72,
227
+         "compressed_chars": 32,
228
+         "char_reduction_pct": 55.56,
229
+     },
230
+     "prompt_similarity": {
231
+         "cosine": 0.842,   # the two prompts mean roughly the same thing
232
+     },
233
+     "output_similarity": {   # six metrics on the LLM outputs
234
+         "cosine": 0.917,
235
+         "bleu": 0.412,
236
+         "rouge1_f": 0.673,
237
+         "rouge2_f": 0.418,
238
+         "rougeL_f": 0.601,
239
+         "bertscore_p": 0.923,
240
+         "bertscore_r": 0.918,
241
+         "bertscore_f": 0.920,
242
+     },
243
+     }
244
+ ```
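The two percentages in this sample are consistent with the usual relative-reduction formula, (original - compressed) / original. A minimal sketch of that arithmetic, assuming two-decimal rounding; `reduction_pct` is a hypothetical name, not a function exported by the library:

```python
def reduction_pct(original: int, compressed: int) -> float:
    """Percentage saved when going from `original` to `compressed` units."""
    if original <= 0:
        return 0.0  # avoid dividing by zero on an empty prompt
    return round(100.0 * (original - compressed) / original, 2)


# Token and character counts from the sample output above.
print(reduction_pct(18, 8))
print(reduction_pct(72, 32))
```

Both calls reproduce the 55.56 shown in the sample, for tokens (18 to 8) and characters (72 to 32) respectively.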
245
+
246
+ #### What each of the six metrics actually means
247
+
248
+ All six measure the same thing from different angles: how similar is the LLM's response to the compressed prompt, compared to its response to the original. Each one captures a different notion of "similar".
249
+
250
+ **1. `cosine`. Semantic similarity. Range 0.0 to 1.0.**
251
+
252
+ The plain-English question it answers: *do the two outputs mean the same thing?*
253
+
254
+ It works by embedding both outputs with Sentence-BERT (MiniLM-L6-v2) and taking the cosine of the angle between the two embedding vectors. This is the most forgiving metric in the set because it handles paraphrasing well.
255
+
256
+ Interpretation:
257
+ - 0.95 or above: essentially identical meaning
258
+ - 0.85 to 0.95: same meaning, different wording
259
+ - 0.70 to 0.85: related but starting to drift
260
+ - below 0.70: the meanings have meaningfully diverged
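For readers who want to see the arithmetic behind the score, here is cosine similarity on plain Python lists. It is only a sketch of the formula: the toy vectors stand in for real sentence embeddings, and `cosine_similarity` is an illustrative name rather than the function the library calls.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

The score depends only on the direction of the vectors, not their length, which is why a longer and a shorter output can still score close to 1.0 when they say the same thing.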
261
+
262
+ **2. `bleu`. Word-sequence overlap. Range 0.0 to 1.0.**
263
+
264
+ The plain-English question: *do the two outputs use the same exact words in the same order?*
265
+
266
+ BLEU-4 with smoothing, originally invented for machine translation (Papineni et al., 2002). This is very strict. It penalises rewording, even when the meaning is preserved perfectly.
267
+
268
+ Interpretation:
269
+ - 0.50 or above: near-identical phrasing
270
+ - 0.20 to 0.50: similar content but reworded
271
+ - below 0.20: very different word choices (which doesn't mean the answer is wrong, just that the LLM phrased it differently)
272
+
273
+ Don't panic if BLEU is low. That's expected when an LLM rephrases the same answer using different words.
274
+
275
+ **3. `rouge1_f`. Single-word overlap. Range 0.0 to 1.0.**
276
+
277
+ The plain-English question: *do the two outputs use the same words, regardless of order?*
278
+
279
+ ROUGE-1 F1 (Lin, 2004). Measures unigram overlap. Less strict than BLEU because word order doesn't matter.
280
+
281
+ Interpretation:
282
+ - 0.70 or above: strong vocabulary overlap
283
+ - 0.40 to 0.70: moderate overlap
284
+ - below 0.40: mostly different vocabulary
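A simplified version of the ROUGE-1 F1 calculation looks like the sketch below. It assumes lowercase whitespace tokenisation, whereas the `rouge_score` package applies its own tokeniser, so exact values can differ; `rouge1_f` here is an illustrative function, not the library's.

```python
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 using whitespace tokens (simplified ROUGE-1)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Same words in a different order still score a perfect overlap.
print(rouge1_f("grind the beans", "the beans grind"))
```

The multiset intersection (`ref & cand`) clips repeated words, so a candidate cannot inflate its score by repeating a word more often than the reference does.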
285
+
286
+ **4. `rouge2_f`. Two-word phrase overlap. Range 0.0 to 1.0.**
287
+
288
+ The plain-English question: *do the two outputs share the same two-word phrases?*
289
+
290
+ ROUGE-2 F1. Same idea as ROUGE-1 but measures bigrams (consecutive word pairs). Stricter than ROUGE-1 because the words have to appear in the same order locally.
291
+
292
+ Interpretation:
293
+ - 0.40 or above: strong phrasal similarity
294
+ - 0.15 to 0.40: some shared phrases
295
+ - below 0.15: mostly different phrasing
296
+
297
+ **5. `rougeL_f`. Longest matching subsequence. Range 0.0 to 1.0.**
298
+
299
+ The plain-English question: *what's the longest stretch of words that appear in both outputs in the same order?*
300
+
301
+ ROUGE-L F1. Measures the longest common subsequence: words that appear in both outputs in the same order, but allowing other words between them. Captures structural similarity better than BLEU does.
302
+
303
+ Interpretation:
304
+ - 0.60 or above: strong structural alignment
305
+ - 0.30 to 0.60: some shared structure
306
+ - below 0.30: mostly independent structure
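The longest-common-subsequence idea can be sketched with a small dynamic programme. As with the ROUGE-1 sketch, this assumes lowercase whitespace tokens and is only an approximation of what the `rouge_score` package computes; the function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f(reference: str, candidate: str) -> float:
    """LCS-based F1 over whitespace tokens (simplified ROUGE-L)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# "a", "c", "d" appear in both, in order, with one differing word between.
print(rougeL_f("a b c d", "a x c d"))
```

Only two rows of the table are kept at a time, so memory grows with the length of one output rather than the product of both.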
307
+
308
+ **6. `bertscore_f`. Contextual semantic similarity. Range 0.0 to 1.0.**
309
+
310
+ The plain-English question: *do the two outputs convey the same ideas, accounting for context?*
311
+
312
+ BERTScore F1 (Zhang et al., 2020). Computes per-token cosine similarity in a BERT embedding space, matching each token in one output to its most similar token in the other. This is the headline quality metric; Zhang et al. report that it correlates better with human judgment than n-gram overlap metrics such as BLEU.
313
+
314
+ Interpretation:
315
+ - 0.95 or above: essentially equivalent outputs
316
+ - 0.90 to 0.95: very close, with some phrasing differences
317
+ - 0.85 to 0.90: similar core content but noticeable rewording
318
+ - below 0.85: meaningful divergence
319
+
320
+ BERTScore also gives you `bertscore_p` for precision and `bertscore_r` for recall. F1 is the harmonic mean of both, and is the one you should focus on.
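The harmonic-mean relationship is easy to check by hand. The sketch below recomputes F1 from the precision and recall values in the sample output; `f1_from` is a hypothetical helper, and the relation is assumed to hold for unrescaled scores on a single output pair.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# bertscore_p and bertscore_r from the sample output above.
print(round(f1_from(0.923, 0.918), 3))
```

Rounded to three decimals this gives 0.92, matching the `bertscore_f` of 0.920 in the sample.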
321
+
322
+ #### Which metric should you care about?
323
+
324
+ It depends what you're trying to measure:
325
+
326
+ | Use case | Look at this | Threshold to aim for |
327
+ |----------|--------------|----------------------|
328
+ | General quality check | `bertscore_f` | 0.90 or higher |
329
+ | You need exact specific words in the output | `bleu` | 0.40 or higher |
330
+ | You need the same vocabulary, word order flexible | `rouge1_f` | 0.60 or higher |
331
+ | Cheap sanity check without downloading BERT model | `cosine` | 0.85 or higher |
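One way to act on these thresholds is a small gate over the `output_similarity` dictionary that `compare()` returns. The threshold values below are copied from the table; the `THRESHOLDS` mapping and the `failing_metrics` function are hypothetical and not part of the library.

```python
# Thresholds copied from the table above; names are illustrative only.
THRESHOLDS = {
    "bertscore_f": 0.90,
    "bleu": 0.40,
    "rouge1_f": 0.60,
    "cosine": 0.85,
}


def failing_metrics(output_similarity: dict, wanted=("bertscore_f",)) -> dict:
    """Return the wanted metrics that are missing or below their threshold."""
    failures = {}
    for name in wanted:
        value = output_similarity.get(name)
        if value is None or value < THRESHOLDS[name]:
            failures[name] = value
    return failures


# Values from the sample output shown earlier in this README.
sample = {"cosine": 0.917, "bleu": 0.412, "rouge1_f": 0.673,
          "bertscore_f": 0.920}
print(failing_metrics(sample, wanted=("bertscore_f", "cosine")))
```

An empty result means every requested metric cleared its bar. Treating a missing metric as a failure is a deliberate choice in this sketch, so that running with `bertscore=False` cannot pass a BERTScore gate by accident.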
332
+
333
+ If you don't want the 1 GB BERTScore model downloaded, skip it:
334
+
335
+ ```python
336
+ metrics = compare(original, compressed, out_original, out_compressed,
337
+                   bertscore=False)
338
+ ```
339
+
340
+ You still get the other five metrics, which together are very informative.
341
+
342
+ ## A complete example
343
+
344
+ Here's the whole flow end to end:
345
+
346
+ ```python
347
+ from less_tokens import compress, compare
348
+ from openai import OpenAI
349
+
350
+ client = OpenAI()
351
+
352
+ def ask_gpt(prompt: str) -> str:
353
+     r = client.chat.completions.create(
354
+         model="gpt-4o-mini",
355
+         messages=[{"role": "user", "content": prompt}],
356
+         temperature=0,
357
+     )
358
+     return r.choices[0].message.content
359
+
360
+ original = ("I was wondering if you could please give me a step-by-step "
361
+             "explanation of how to make a really good cup of pour-over "
362
+             "coffee at home using a Hario V60.")
363
+
364
+ compressed = compress(
365
+     original,
366
+     remove_filler_phrases=1,
367
+     apply_abbreviations=1,
368
+     apply_contractions=1,
369
+     remove_filler_words=1,
370
+     remove_stopwords=1,
371
+ )
372
+
373
+ print(f"Original ({len(original)} chars): {original}")
374
+ print(f"Compressed ({len(compressed)} chars): {compressed}")
375
+ print()
376
+
377
+ out_original = ask_gpt(original)
378
+ out_compressed = ask_gpt(compressed)
379
+
380
+ metrics = compare(original, compressed, out_original, out_compressed)
381
+
382
+ print(f"Token reduction: {metrics['compression']['token_reduction_pct']}%")
383
+ print(f"BERTScore F1: {metrics['output_similarity']['bertscore_f']}")
384
+ print(f"Cosine sim: {metrics['output_similarity']['cosine']}")
385
+ ```
386
+
387
+ Typical output looks like:
388
+
389
+ ```
390
+ Original (150 chars): I was wondering if you could please give me a step-by-step...
391
+ Compressed (95 chars): step-by-step explanation pour-over coffee home Hario V60.
392
+
393
+ Token reduction: 40.6%
394
+ BERTScore F1: 0.918
395
+ Cosine sim: 0.911
396
+ ```
397
+
398
+ You saved 40 percent of your tokens and the LLM still gave you essentially the same answer.
399
+
400
+ ## What's happening under the hood
401
+
402
+ `less-tokens` is built on classical lexical NLP. These are the same techniques used in information retrieval and pre-neural NLP pipelines, just packaged together with sensible defaults and safety guarantees:
403
+
404
+ - **NLTK** (Loper and Bird, 2002) handles tokenisation, POS tagging, and named entity recognition
405
+ - **WordNet** (Miller, 1995) provides the synonym graph
406
+ - **tiktoken** counts tokens the same way GPT models do
407
+ - **sentence-transformers** computes cosine similarity
408
+ - **bert_score** computes BERTScore F1
409
+ - **rouge_score** computes ROUGE-1, ROUGE-2, and ROUGE-L
410
+ - **NLTK's BLEU** implementation computes BLEU-4 with method-1 smoothing
411
+
412
+ Every technique is a pure function. Same input plus same flags always produces the same output, byte for byte. Compression itself runs in well under 100 ms on a single CPU core.
413
+
414
+ ## Limitations worth knowing about
415
+
416
+ A few honest caveats so you know what you're getting.
417
+
418
+ English only. NLTK stopwords and WordNet are English-language. Multilingual support is open work.
419
+
420
+ Best on short and medium prompts. Roughly 60 to 2000 characters. Very long retrieval-augmented contexts aren't the target use case. For those, look at learned compressors like [LLMLingua](https://github.com/microsoft/LLMLingua).
421
+
422
+ The `shorten_synonyms` flag is the riskiest. WordNet sometimes picks topically narrower terms. Don't enable it without testing on your own data first.
423
+
424
+ Quality is task-dependent. Open-ended Q&A and creative writing tolerate compression well. Commonsense reasoning (HellaSwag-style multiple choice) degrades faster.
425
+
426
+ `compare()` measures similarity, not correctness. If your original prompt produces a bad LLM output, a similar compressed output is still bad. Make sure your prompts work first, then compress.
427
+
428
+ ## Contributing
429
+
430
+ Issues and pull requests are very welcome at [github.com/shaminchokshi/less-tokens](https://github.com/shaminchokshi/less-tokens).
431
+
432
+ To run the test suite locally:
433
+
434
+ ```bash
435
+ git clone https://github.com/shaminchokshi/less-tokens.git
436
+ cd less-tokens
437
+ pip install -e ".[dev]"
438
+ pytest tests/ -v
439
+ ```
440
+
441
+ ## License
442
+
443
+ MIT. See [LICENSE](LICENSE).
444
+
445
+ ## Citations
446
+
447
+ If you're using `less-tokens` in research, the underlying techniques come from these foundational papers:
448
+
449
+ - **NLTK**: Loper and Bird (2002). *NLTK: The Natural Language Toolkit.* ACL Workshop.
450
+ - **WordNet**: Miller (1995). *WordNet: A Lexical Database for English.* CACM 38(11).
451
+ - **BERTScore**: Zhang et al. (2020). *BERTScore: Evaluating Text Generation with BERT.* ICLR.
452
+ - **BLEU**: Papineni et al. (2002). *BLEU: a Method for Automatic Evaluation of Machine Translation.* ACL.
453
+ - **ROUGE**: Lin (2004). *ROUGE: A Package for Automatic Evaluation of Summaries.* ACL Workshop.
454
+ - **Sentence-BERT**: Reimers and Gurevych (2019). *Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.* EMNLP.
455
+
456
+ Related work on prompt compression you might want to compare against:
457
+
458
+ - **LLMLingua**: Jiang et al. (2023). EMNLP. Learned token pruning with an auxiliary LM, up to 20x compression.
459
+ - **Selective Context**: Li et al. (2023). EMNLP. Self-information-based pruning.