python-token-killer 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- python_token_killer-0.1.0/.gitignore +38 -0
- python_token_killer-0.1.0/CHANGELOG.md +74 -0
- python_token_killer-0.1.0/CONTRIBUTING.md +141 -0
- python_token_killer-0.1.0/LICENSE +21 -0
- python_token_killer-0.1.0/PKG-INFO +269 -0
- python_token_killer-0.1.0/README.md +219 -0
- python_token_killer-0.1.0/benchmarks/bench.py +138 -0
- python_token_killer-0.1.0/benchmarks/samples/api_response.json +31 -0
- python_token_killer-0.1.0/benchmarks/samples/python_module.py +413 -0
- python_token_killer-0.1.0/benchmarks/samples/server_log.txt +57 -0
- python_token_killer-0.1.0/examples/claude_code_skill.py +70 -0
- python_token_killer-0.1.0/examples/clean_api_response.py +44 -0
- python_token_killer-0.1.0/examples/langchain_middleware.py +97 -0
- python_token_killer-0.1.0/pyproject.toml +100 -0
- python_token_killer-0.1.0/src/ptk/__init__.py +166 -0
- python_token_killer-0.1.0/src/ptk/_base.py +137 -0
- python_token_killer-0.1.0/src/ptk/_types.py +126 -0
- python_token_killer-0.1.0/src/ptk/minimizers/__init__.py +17 -0
- python_token_killer-0.1.0/src/ptk/minimizers/_code.py +156 -0
- python_token_killer-0.1.0/src/ptk/minimizers/_dict.py +167 -0
- python_token_killer-0.1.0/src/ptk/minimizers/_diff.py +83 -0
- python_token_killer-0.1.0/src/ptk/minimizers/_list.py +87 -0
- python_token_killer-0.1.0/src/ptk/minimizers/_log.py +94 -0
- python_token_killer-0.1.0/src/ptk/minimizers/_text.py +182 -0
- python_token_killer-0.1.0/src/ptk/py.typed +0 -0
- python_token_killer-0.1.0/tests/test_adversarial.py +983 -0
- python_token_killer-0.1.0/tests/test_ptk.py +1022 -0
- python_token_killer-0.1.0/tests/test_real_world.py +620 -0
python_token_killer-0.1.0/.gitignore
@@ -0,0 +1,38 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
*.egg
dist/
build/
*.whl

# uv
.venv/

# Legacy virtual environments (not used with uv)
venv/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# Testing
.pytest_cache/
.coverage
htmlcov/
.mypy_cache/

# OS
.DS_Store
Thumbs.db

# Build
*.so
*.dylib
dist/

python_token_killer-0.1.0/CHANGELOG.md
@@ -0,0 +1,74 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

### Changed

### Fixed

---

## [0.1.0] - 2026-04-09

Initial public release.

### API

- `ptk.minimize(obj)` — auto-detects content type, applies the right compression strategy, returns a minimized string. Accepts `aggressive`, `content_type`, and minimizer-specific kwargs.
- `ptk.stats(obj)` — same compression, returns a dict with `output`, `original_tokens`, `minimized_tokens`, `savings_pct`, `content_type`.
- `ptk.detect_type(obj)` — returns the auto-detected content type as a string.
- `ptk(obj)` — callable module shorthand for `ptk.minimize(obj)`.

### Minimizers

- **DictMinimizer** — recursive null/empty stripping (preserves `0` and `False`), key shortening (`description` → `desc`, `configuration` → `cfg`, 30+ mappings), single-child flattening, kv/tabular output formats.
- **ListMinimizer** — schema-once tabular encoding for uniform list-of-dicts, primitive dedup with `(xN)` counts, deterministic even-spaced sampling with first/last preservation.
- **CodeMinimizer** — comment stripping with pragma preservation (`# noqa`, `# type: ignore`, `# TODO`, `# FIXME`, `// eslint-disable`), multi-line docstring collapse to first line, multi-language signature extraction (Python, JS, Rust, Go).
- **LogMinimizer** — consecutive duplicate line collapse, timestamp stripping, error-only filtering with stack trace preservation (`Traceback`, `File`, `*Error:`, `*Exception:`), `"failed"` keyword preservation, FATAL/CRITICAL treated as errors.
- **DiffMinimizer** — context line folding to `... N lines ...`, noise stripping (`index`, `old mode`, `new mode`, `similarity`, `Binary files`), `` preservation.
- **TextMinimizer** — 20+ word abbreviations (`implementation` → `impl`, `configuration` → `config`, `production` → `prod`, case-preserving), 16 phrase abbreviations (`in order to` → `to`, `due to the fact that` → `because`), 13 filler phrase removals (`Furthermore,`, `Moreover,`, `Additionally,`), stopword removal (aggressive mode).

### Benchmarks

Real token counts via tiktoken (`cl100k_base`):

| Benchmark | Original | Default | Saved | Aggressive | Saved |
|---|---|---|---|---|---|
| API response (JSON) | 1,450 | 792 | 45.4% | 782 | 46.1% |
| Python module (code) | 2,734 | 2,113 | 22.7% | 309 | 88.7% |
| Server log (58 lines) | 1,389 | 1,388 | 0.1% | 231 | 83.4% |
| 50 user records (list) | 2,774 | 922 | 66.8% | 922 | 66.8% |
| Verbose paragraph (text) | 101 | 96 | 5.0% | 74 | 26.7% |
| **Total** | **11,182** | **7,424** | **33.6%** | **2,627** | **76.5%** |

Bundled sample data and runner: `python benchmarks/bench.py`

### Tests

322 tests across two suites:

- **test_ptk.py** (153 tests) — feature coverage for all 6 minimizers, type detection, base helpers, API contracts, and real-world payloads.
- **test_adversarial.py** (169 tests) — type chaos (None, bytes, sets, circular refs, broken `__str__`, dataclasses, generators, inf/nan), deep nesting (100-level dicts, 10k-wide structures), unicode (emoji, CJK, RTL, null bytes, surrogates, BOM), regex safety (pathological backtracking, unclosed constructs, 100k newlines), API contracts (parametrized across 9 input types), input mutation verification (deepcopy before/after), thread safety (10 concurrent threads), performance (all benchmarks under 5s), idempotency, and content type mismatch degradation.

### Examples

- `examples/clean_api_response.py` — standalone script + stdin pipe for JSON cleanup.
- `examples/langchain_middleware.py` — LangGraph node, callable wrapper, batch document minimizer.
- `examples/claude_code_skill.py` — CLI tool with `--stdin`, `--type`, `--aggressive`, `--stats` flags.

### Infrastructure

- Zero required dependencies — stdlib only. tiktoken optional (`pip install python-token-killer[tiktoken]`).
- `mypy --strict` clean across all 10 source files.
- `ruff check` clean across all source, tests, benchmarks, and examples.
- `py.typed` marker for PEP 561 type checker support.
- GitHub Actions CI workflow (Python 3.10–3.13 matrix).
- `AGENTS.md` + `CLAUDE.md` for coding agent context.
- MIT license.

python_token_killer-0.1.0/CONTRIBUTING.md
@@ -0,0 +1,141 @@
# Contributing to ptk

## Quick Start

```bash
git clone https://github.com/amahi2001/python-token-killer.git
cd python-token-killer

# Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install all dev dependencies — uv manages the venv automatically
uv sync

# Run everything CI runs — must pass before opening a PR
make check
```

That's it. `uv sync` reads `uv.lock`, creates `.venv`, and installs every dev tool pinned to exact versions. No manual venv activation needed — `uv run` handles it.

## Commands

| Command | What it does |
|---|---|
| `make check` | Lint + typecheck + tests (the one command before every PR) |
| `make test` | Tests only (361 tests, ~0.6s) |
| `make lint` | `ruff check` + `ruff format --check` |
| `make typecheck` | `mypy --strict` |
| `make bench` | Benchmarks with tiktoken |
| `make fix` | Auto-fix lint and formatting issues |
| `make build` | Build wheel + sdist (`dist/`) |
| `make clean` | Remove caches and build artifacts |

All commands use `uv run` — they work whether or not you've activated the venv.

## Architecture in 30 Seconds

```
ptk.minimize(obj)
  → _types.detect(obj)          # what is this? dict, list, code, log, diff, text
  → _ROUTER[type]               # pick the singleton minimizer
  → minimizer.run(obj)          # _serialize for measurement, _minimize for output
  → MinResult(output, lengths)  # frozen dataclass
```
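The following toy sketch shows the detect-then-route step in runnable form. The `ContentType` and `_ROUTER` names come from this guide. The detection rules and the per-type handlers are placeholder assumptions, not ptk's actual heuristics or minimizers.

```python
from __future__ import annotations

import json
from enum import Enum, auto
from typing import Any, Callable


class ContentType(Enum):
    DICT = auto()
    LIST = auto()
    CODE = auto()
    LOG = auto()
    DIFF = auto()
    TEXT = auto()


def detect(obj: Any) -> ContentType:
    """Toy heuristics: O(1) isinstance checks first, then a peek at the first 2 KB of text."""
    if isinstance(obj, dict):
        return ContentType.DICT
    if isinstance(obj, (list, tuple)):
        return ContentType.LIST
    head = str(obj)[:2048]
    if head.startswith(("diff --git", "@@ ")) or "\n@@ " in head:
        return ContentType.DIFF
    if any(level in head for level in ("ERROR", "WARN", "INFO", "DEBUG")):
        return ContentType.LOG
    if any(kw in head for kw in ("def ", "class ", "import ", "fn ", "func ")):
        return ContentType.CODE
    return ContentType.TEXT


def _compact_json(obj: Any) -> str:
    return json.dumps(obj, separators=(",", ":"), default=str)


# One shared, stateless handler per type (placeholders for the real minimizers).
_ROUTER: dict[ContentType, Callable[[Any], str]] = {
    ContentType.DICT: _compact_json,
    ContentType.LIST: _compact_json,
    ContentType.CODE: str,
    ContentType.LOG: str,
    ContentType.DIFF: str,
    ContentType.TEXT: str,
}


def minimize(obj: Any) -> str:
    return _ROUTER[detect(obj)](obj)
```

Because the router is a plain dict of shared handlers that hold no state, lookup is O(1). Concurrent calls can also safely share the same objects.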

Every file has one job:

```
src/ptk/
  __init__.py     Public API + callable module trick + router
  _types.py       ContentType enum + detect() heuristics
  _base.py        Minimizer ABC + MinResult + shared helpers
  minimizers/
    _dict.py      DictMinimizer (null strip, key shorten, flatten)
    _list.py      ListMinimizer (tabular, dedup, sampling)
    _code.py      CodeMinimizer (comments, docstrings, signatures)
    _log.py       LogMinimizer (dedup lines, error filter, stack traces)
    _diff.py      DiffMinimizer (context folding, noise strip)
    _text.py      TextMinimizer (abbreviation, filler removal, stopwords)
```

## The Three Rules

These are non-negotiable. PRs that break them will be rejected.

### 1. `minimize()` must never raise

Any Python object passed to `ptk.minimize()` must produce a string — never an exception. `Minimizer.run()` wraps every `_minimize()` call in a try/except that catches `RecursionError`, `ValueError`, `TypeError`, and `OverflowError`, falling back to `str(obj)`.

### 2. Never mutate the input

All minimizers must build new objects rather than modify their input. After `minimize()` returns, the original `obj` must be unchanged, that is, equal to a `deepcopy` taken before the call. `test_adversarial.py::TestInputMutation` verifies this with `deepcopy` before/after comparisons.
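The same before/after comparison works for any helper. The sketch below pairs a null-stripping function that returns a copy with a small `deepcopy` check. The function drops `None`, `""`, `[]` and `{}` but keeps `0` and `False`. ptk ships a shared `strip_nullish()`, but its real signature and behavior may differ from this hypothetical version, and `assert_not_mutated` is not the code in `TestInputMutation`.

```python
from __future__ import annotations

import copy
from typing import Any, Callable


def _is_nullish(v: Any) -> bool:
    # None and empty str/list/dict count as nullish; 0 and False do not.
    return v is None or (isinstance(v, (str, list, dict)) and len(v) == 0)


def strip_nullish(obj: Any) -> Any:
    """Return a NEW structure with nullish values removed; never edits obj in place."""
    if isinstance(obj, dict):
        cleaned = {k: strip_nullish(v) for k, v in obj.items()}
        return {k: v for k, v in cleaned.items() if not _is_nullish(v)}
    if isinstance(obj, list):
        cleaned_items = [strip_nullish(v) for v in obj]
        return [v for v in cleaned_items if not _is_nullish(v)]
    return obj


def assert_not_mutated(fn: Callable[[Any], Any], obj: Any) -> Any:
    """Run fn(obj), then check obj against a snapshot taken before the call."""
    snapshot = copy.deepcopy(obj)
    result = fn(obj)
    assert obj == snapshot, "input was mutated"
    return result
```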

### 3. Zero required dependencies

The library must work with `pip install python-token-killer` and nothing else. No numpy, no tiktoken in the core. Optional extras are fine — import them inside try/except.
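Below is one way to apply this rule to the optional tiktoken extra. `count_tokens` and `estimate_tokens` are hypothetical names. The fallback ratio of roughly four characters per token is an assumption, though it happens to match the ratio in the README's `ptk.stats` example. `tiktoken.get_encoding` fetches its encoding file on first use, so the sketch also falls back if that step fails.

```python
from __future__ import annotations

try:  # optional extra: pip install python-token-killer[tiktoken]
    import tiktoken
except ImportError:  # the core must keep working without it
    tiktoken = None  # type: ignore[assignment]


def estimate_tokens(text: str) -> int:
    """Rough fallback of ~4 characters per token (an assumption, not exact)."""
    return max(1, len(text) // 4) if text else 0


def count_tokens(text: str) -> int:
    if not text:
        return 0
    if tiktoken is not None:
        try:
            return len(tiktoken.get_encoding("cl100k_base").encode(text))
        except Exception:  # e.g. the encoding file could not be downloaded
            pass
    return estimate_tokens(text)
```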

## Gotchas You'll Hit

### The callable module trick

`__init__.py` swaps its own `__class__` to `_CallableModule` so `ptk(obj)` works. Imports must be structured carefully — `sys` and `types` are imported after the public API definitions with `# noqa: E402`. Don't reorganize imports without testing `ptk({"a": 1})` interactively.
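The trick rests on a standard Python mechanism: you can assign a `types.ModuleType` subclass to a module's `__class__`. The sketch applies this to a throwaway module so it runs standalone. `_minimize_stub` is a placeholder, not ptk's minimizer.

```python
from __future__ import annotations

import types
from typing import Any


class _CallableModule(types.ModuleType):
    """ModuleType subclass whose instances can be called like a function."""

    def __call__(self, obj: Any, **kw: Any) -> Any:
        # Forward to the module's own minimize(), so mod(x) == mod.minimize(x).
        return self.minimize(obj, **kw)


def _minimize_stub(obj: Any, **kw: Any) -> str:
    return str(obj)  # hypothetical stand-in for the real minimizer


# Build a throwaway module for the demo. Inside a package's __init__.py the
# equivalent line is: sys.modules[__name__].__class__ = _CallableModule
demo = types.ModuleType("ptk_demo")
demo.minimize = _minimize_stub  # type: ignore[attr-defined]
demo.__class__ = _CallableModule
```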

### `_serialize` vs `_minimize`

`_serialize(obj)` is called before `_minimize()` — it only measures the original length for stats. It must never raise (it has its own try/except). The actual output comes from `_minimize()`.

### `from __future__ import annotations`

Every source file uses this for PEP 563 deferred annotation evaluation on Python 3.10. Don't remove it.

### Regexes are precompiled

All regex patterns are compiled at module import time as module-level constants. Never call `re.compile()` inside a function.
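For example, a timestamp-stripping pattern would live at module level and every call would reuse it. The pattern below is an assumption that targets ISO-8601-style line prefixes; ptk's actual log regexes may differ.

```python
import re

# Compiled once at import time and reused on every call (illustrative pattern).
_TIMESTAMP_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?[ \t]*",
    re.MULTILINE,
)


def strip_timestamps(log: str) -> str:
    """Remove a leading ISO-8601-style timestamp from every line."""
    return _TIMESTAMP_RE.sub("", log)
```

Python's `re` module also caches recently compiled patterns. A module-level constant mainly skips that cache lookup and makes the one-time compile cost explicit.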

### Pragma preservation in CodeMinimizer

When stripping comments, `_strip_comment_if_safe()` checks each comment against `_PRAGMA_KEYWORDS` before removing it. Comments containing `noqa`, `type: ignore`, `TODO`, `FIXME`, `eslint-disable`, etc. survive.
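Here is a simplified, line-based version of that check. `strip_comment_keep_pragmas` is a hypothetical name, and the keyword set is an illustrative subset. A tokenizer-aware implementation would skip markers inside string literals; this sketch does not, so a `#` inside a string is treated as the start of a comment.

```python
# Illustrative subset of pragma markers that must survive comment stripping.
_PRAGMA_KEYWORDS = frozenset({"noqa", "type: ignore", "TODO", "FIXME", "eslint-disable"})


def strip_comment_keep_pragmas(line: str, marker: str = "#") -> str:
    """Drop a trailing or full-line comment unless it contains a pragma keyword."""
    idx = line.find(marker)
    if idx == -1:
        return line
    if any(keyword in line[idx:] for keyword in _PRAGMA_KEYWORDS):
        return line
    return line[:idx].rstrip()
```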

### Thread safety

Minimizers are stateless singletons stored in `_ROUTER`. Don't add instance attributes that change between calls.

## Adding a New Minimizer

1. Create `src/ptk/minimizers/_yourtype.py`, subclass `Minimizer`, implement `_minimize()`
2. Add `ContentType.YOURTYPE = auto()` to `_types.py` + detection heuristic in `detect()`
3. Register in `_ROUTER` in `__init__.py`
4. Export from `minimizers/__init__.py`
5. Add tests in `test_ptk.py` (feature) and `test_adversarial.py` (edge cases)

## Dependency Groups

ptk uses [PEP 735](https://peps.python.org/pep-0735/) dependency groups:

| Group | Contents | Install |
|---|---|---|
| `test` | pytest | `uv sync --only-group test` |
| `lint` | ruff | `uv sync --only-group lint` |
| `typecheck` | mypy | `uv sync --only-group typecheck` |
| `bench` | tiktoken | `uv sync --only-group bench` |
| `hooks` | pre-commit | `uv sync --only-group hooks` |
| `dev` | all of the above | `uv sync` (default) |

CI installs only what each job needs. `uv sync` with no flags installs `dev` (everything).

## Pre-commit Hooks

```bash
uv run pre-commit install
```

After that, `ruff` and `mypy` run automatically on every `git commit`. `make check` is the equivalent without needing hooks installed.

## PR Checklist

- `make check` passes
- New code has tests in `test_ptk.py` (feature) and/or `test_adversarial.py` (edge cases)
- No new required dependencies added
- Docstrings on new public classes/methods
- CHANGELOG.md updated under `[Unreleased]` if user-facing

python_token_killer-0.1.0/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 ptk contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

python_token_killer-0.1.0/PKG-INFO
@@ -0,0 +1,269 @@
Metadata-Version: 2.4
Name: python-token-killer
Version: 0.1.0
Summary: Minimize LLM tokens from Python objects — dicts, code, logs, diffs, and more.
Project-URL: Homepage, https://github.com/amahi2001/python-token-killer
Project-URL: Repository, https://github.com/amahi2001/python-token-killer
Project-URL: Issues, https://github.com/amahi2001/python-token-killer/issues
Project-URL: Changelog, https://github.com/amahi2001/python-token-killer/blob/main/CHANGELOG.md
Author-email: amahi2001 <amahi2001@gmail.com>
License: MIT License
License-File: LICENSE
Keywords: agents,claude,compression,context-window,langchain,langgraph,llm,nlp,openai,rag,tokens
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Typing :: Typed
Requires-Python: >=3.10
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.7; extra == 'tiktoken'
Description-Content-Type: text/markdown

<p align="center">
<img src="assets/mascot.png" alt="ptk" width="200"/>
</p>

<p align="center">
<strong>ptk — Python Token Killer</strong><br/>
<strong>Minimize LLM tokens from Python objects in one call</strong><br/>
Zero dependencies • Auto type detection • 322 tests
</p>

<table align="center">
<tr>
<td align="left" valign="middle">
<a href="https://github.com/amahi2001/python-token-killer/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/amahi2001/python-token-killer/ci.yml?branch=main&style=flat-square&label=CI" alt="CI"/></a><br/>
<img src="https://img.shields.io/badge/python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python 3.10+"/><br/>
<img src="https://img.shields.io/badge/mypy-strict-blue?style=flat-square" alt="mypy strict"/><br/>
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-yellow?style=flat-square" alt="License"/></a>
</td>
</tr>
</table>

---

## What is ptk?

ptk is a **Python library** that minimizes tokens before they reach an LLM. Pass in any Python object — dict, list, code, logs, diffs, text — and get back a compressed string representation.

Inspired by [RTK (Rust Token Killer)](https://github.com/rtk-ai/rtk), but designed as a library for programmatic use, not a CLI proxy.

```python
import ptk

ptk.minimize({"users": [{"name": "Alice", "bio": None, "age": 30}]})
# → '{"users":[{"name":"Alice","age":30}]}'

ptk(my_dict)                   # callable shorthand
ptk(my_dict, aggressive=True)  # max compression
```

```bash
pip install python-token-killer
# or
uv add python-token-killer
```

Optional: `pip install python-token-killer[tiktoken]` or `uv add python-token-killer[tiktoken]` for exact token counting.

## Benchmarks

Real token counts via tiktoken (`cl100k_base`, the tokenizer used by GPT-4 and GPT-3.5-turbo; other models, including Claude, tokenize differently, so their counts will vary):

```
Benchmark                  Original  Default   Saved  Aggressive   Saved
API response (JSON)            1450      792   45.4%         782   46.1%
Python module (code)           2734     2113   22.7%         309   88.7%
Server log (58 lines)          1389     1388    0.1%         231   83.4%
50 user records (list)         2774      922   66.8%         922   66.8%
Verbose paragraph (text)        101       96    5.0%          74   26.7%
────────────────────────────────────────────────────────────────────────
TOTAL                         11182     7424   33.6%        2627   76.5%
```

Run it yourself: `python benchmarks/bench.py`

## What It Does

ptk auto-detects your input type and routes to the right minimizer:

| Input Type | Strategy | Typical Savings |
|---|---|---|
| `dict` | Null stripping, key shortening, flattening, compact JSON | 30–60% |
| `list` | Dedup, schema-once tabular, sampling | 40–70% |
| Code `str` | Comment stripping (pragma-preserving), docstring collapse, signature extraction | 25–80% |
| Logs `str` | Line dedup with counts, error-only filtering, stack trace preservation | 60–90% |
| Diffs `str` | Context folding, noise stripping | 50–75% |
| Text `str` | Word/phrase abbreviation, filler removal, stopword removal | 10–30% |

## API

### `ptk.minimize(obj, *, aggressive=False, content_type=None, **kw) → str`

Main entry point. Auto-detects type, applies the right strategy, returns a minimized string.

```python
# auto-detect
ptk.minimize({"key": "value"})

# force content type
ptk.minimize(some_string, content_type="code")
ptk.minimize(some_string, content_type="log")

# dict output formats
ptk.minimize(data, format="kv")       # key:value lines
ptk.minimize(data, format="tabular")  # header-once tabular

# code: signatures only (huge savings)
ptk.minimize(code, content_type="code", mode="signatures")

# logs: errors only
ptk.minimize(logs, content_type="log", errors_only=True)
```

### `ptk.stats(obj, **kw) → dict`

Same compression, but returns statistics:

```python
ptk.stats(big_api_response)
# {
#   "output": "...",
#   "original_len": 4200,
#   "minimized_len": 1800,
#   "savings_pct": 57.1,
#   "content_type": "dict",
#   "original_tokens": 1050,
#   "minimized_tokens": 450,
# }
```
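`savings_pct` appears to be the percentage reduction. The length pair (4200 → 1800) and the token pair (1050 → 450) in the example both give 57.1%, and the same formula reproduces the Saved column of the benchmark table. The sketch below assumes that definition and one-decimal rounding; `savings_pct` as a standalone function is hypothetical.

```python
def savings_pct(original: int, minimized: int) -> float:
    """Percentage reduction from original to minimized size, rounded to one decimal."""
    if original <= 0:
        return 0.0
    return round((original - minimized) / original * 100, 1)


print(savings_pct(4200, 1800))  # 57.1
```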

### `ptk(obj)` — callable module

```python
import ptk
ptk(some_dict)  # equivalent to ptk.minimize(some_dict)
```

## Features by Minimizer

### DictMinimizer
- Strips `None`, `""`, `[]`, `{}` recursively (preserves `0` and `False`)
- Key shortening: `description` → `desc`, `timestamp` → `ts`, `configuration` → `cfg`, etc.
- Single-child flattening: `{"a": {"b": val}}` → `{"a.b": val}` (aggressive)
- Output formats: compact JSON (default), key-value lines, header-once tabular
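The standard library alone is enough to sketch key shortening, single-child flattening and compact JSON. The three key mappings are taken from the list above. Everything else is an assumption rather than DictMinimizer's actual code, including the function names, the dotted-key joining and the lack of handling for key collisions.

```python
from __future__ import annotations

import json
from typing import Any

# Three of the mappings listed above; ptk ships a longer table.
_KEY_ABBREV = {"description": "desc", "timestamp": "ts", "configuration": "cfg"}


def shorten_keys(obj: Any) -> Any:
    """Return a new structure with known long keys replaced by short aliases."""
    if isinstance(obj, dict):
        return {_KEY_ABBREV.get(k, k): shorten_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [shorten_keys(v) for v in obj]
    return obj


def flatten_single_child(d: dict[str, Any]) -> dict[str, Any]:
    """Collapse chains of one-key dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    out: dict[str, Any] = {}
    for key, val in d.items():
        while isinstance(val, dict) and len(val) == 1:
            [(child_key, val)] = val.items()
            key = f"{key}.{child_key}"
        if isinstance(val, dict):
            val = flatten_single_child(val)
        out[key] = val
    return out


def to_compact_json(obj: Any) -> str:
    """JSON with no spaces after ',' or ':'."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
```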

### ListMinimizer
- Uniform list-of-dicts → schema-once tabular: declare fields once, one row per item
- Primitive dedup with counts: `["a", "a", "a", "b"]` → `a (x3)\nb`
- Large array sampling with first/last preservation (aggressive, threshold: 50)
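This stdlib-only sketch covers the three list strategies. The `a (x3)` dedup format matches the example above. The pipe-separated tabular layout and the integer-spaced sampling formula are assumptions, since the exact output format and sample size are not documented here. `dedup_with_counts` counts every occurrence, not just adjacent ones, and needs hashable items; `1` and `True` would merge because they hash equal.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Sequence


def dedup_with_counts(items: Sequence[Any]) -> str:
    """Collapse repeated hashable primitives into 'value (xN)', in first-seen order."""
    counts = Counter(items)  # Counter keeps first-insertion order
    return "\n".join(f"{item} (x{n})" if n > 1 else str(item) for item, n in counts.items())


def to_tabular(rows: Sequence[dict[str, Any]], sep: str = "|") -> str:
    """Schema-once encoding: field names once, then one line of values per row."""
    if not rows:
        return ""
    fields = list(rows[0])
    lines = [sep.join(fields)]
    lines += [sep.join(str(row.get(f, "")) for f in fields) for row in rows]
    return "\n".join(lines)


def sample_evenly(items: Sequence[Any], k: int) -> list[Any]:
    """Pick k evenly spaced items deterministically, always keeping the first and last."""
    n = len(items)
    if k >= n:
        return list(items)
    if k <= 1:
        return list(items[: max(k, 0)])
    return [items[i * (n - 1) // (k - 1)] for i in range(k)]
```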

### CodeMinimizer
- Strips comments while **preserving pragmas**: `# noqa`, `# type: ignore`, `# TODO`, `# FIXME`, `// eslint-disable`
- Collapses multi-line docstrings to first line only
- Signature extraction mode: pulls `def`, `class`, `fn`, `func` across Python, JS, Rust, Go
- Normalizes blank lines and trailing whitespace
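A single precompiled regex that matches declaration lines can approximate signature extraction. This pattern is a simplified assumption, not CodeMinimizer's actual one. It covers common forms in each language but ignores decorators, multi-line signatures, arrow functions, and modifiers such as Rust's `unsafe` or `const`.

```python
import re

_SIGNATURE_RE = re.compile(
    r"\s*(?:"
    r"(?:async\s+)?def\s+\w+"                           # Python functions
    r"|class\s+\w+"                                     # Python / JS classes
    r"|(?:export\s+)?(?:async\s+)?function\s+\w+"       # JavaScript functions
    r"|(?:pub(?:\([^)]*\))?\s+)?(?:async\s+)?fn\s+\w+"  # Rust functions
    r"|func\s+"                                         # Go functions and methods
    r")"
)


def extract_signatures(code: str) -> str:
    """Keep only lines that look like a def/class/function/fn/func declaration."""
    return "\n".join(
        line.rstrip() for line in code.splitlines() if _SIGNATURE_RE.match(line)
    )
```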

### LogMinimizer
- Consecutive duplicate line collapse with `(xN)` counts
- Error-only filtering preserving: ERROR, WARN, FATAL, CRITICAL, stack traces, "failed" keyword
- Timestamp stripping (aggressive)
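The sketch below covers the first two behaviors. Consecutive repeats are collapsed with `itertools.groupby`. The errors-only filter keeps severity lines, `*Error:`/`*Exception:` lines, the `failed` keyword, and whole Python tracebacks. The regex and the traceback-tracking rule are assumptions about one way to do this, not LogMinimizer's code.

```python
from __future__ import annotations

import re
from itertools import groupby

# Severity keywords, "SomethingError:" / "SomethingException:" lines, and "failed".
_ERROR_RE = re.compile(
    r"\b(?:ERROR|WARN(?:ING)?|FATAL|CRITICAL)\b"
    r"|\w+(?:Error|Exception):"
    r"|(?i:\bfailed\b)"
)


def collapse_repeats(lines: list[str]) -> list[str]:
    """Collapse runs of identical consecutive lines into 'line (xN)'."""
    out: list[str] = []
    for line, group in groupby(lines):
        n = sum(1 for _ in group)
        out.append(f"{line} (x{n})" if n > 1 else line)
    return out


def errors_only(log: str) -> str:
    """Keep error-ish lines plus every line of a Python traceback."""
    kept: list[str] = []
    in_trace = False
    for line in log.splitlines():
        if line.startswith("Traceback"):
            in_trace = True
        if in_trace or _ERROR_RE.search(line):
            kept.append(line)
        # The first unindented line after the header (e.g. "ValueError: bad") ends the trace.
        if in_trace and not line.startswith((" ", "\t", "Traceback")):
            in_trace = False
    return "\n".join(kept)
```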

### DiffMinimizer
- Folds unchanged context lines to `... N lines ...`
- Strips noise: `index`, `old mode`, `new mode`, `similarity`, `Binary files` (aggressive)
- Preserves: `+`/`-` lines, `@@` hunks, `---`/`+++` headers, ``
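Context folding replaces long runs of unchanged lines with a count; in unified-diff format those are the lines that start with a space. In this sketch, `min_run` and `drop_noise` are hypothetical parameters. The noise prefixes mirror the list above.

```python
from __future__ import annotations

# Header prefixes treated as noise; dropped only when drop_noise=True.
_NOISE_PREFIXES = ("index ", "old mode", "new mode", "similarity", "Binary files")


def fold_context(diff: str, min_run: int = 3, drop_noise: bool = False) -> str:
    """Replace runs of >= min_run unchanged context lines with '... N lines ...'."""
    out: list[str] = []
    run: list[str] = []

    def flush() -> None:
        if len(run) >= min_run:
            out.append(f"... {len(run)} lines ...")
        else:
            out.extend(run)
        run.clear()

    for line in diff.splitlines():
        if line.startswith(" "):  # unified-diff context line
            run.append(line)
            continue
        flush()
        if drop_noise and line.startswith(_NOISE_PREFIXES):
            continue
        out.append(line)
    flush()
    return "\n".join(out)
```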

### TextMinimizer
- Word abbreviation: `implementation` → `impl`, `configuration` → `config`, `production` → `prod`, etc.
- Phrase abbreviation: `in order to` → `to`, `due to the fact that` → `because`, etc.
- Filler removal: strips `Furthermore,`, `Moreover,`, `In addition,`, `Additionally,`
- Stopword removal (aggressive): strips `the`, `a`, `is`, `very`, etc.
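Case-preserving word and phrase abbreviation needs only one alternation regex per table plus a replacement callback. The mappings below are small subsets of the examples above, and `abbreviate` is a hypothetical helper, not TextMinimizer's API. Phrases are applied before single words.

```python
from __future__ import annotations

import re

# Tiny illustrative subsets of the abbreviation tables.
_PHRASES = {"in order to": "to", "due to the fact that": "because"}
_WORDS = {"implementation": "impl", "configuration": "config", "production": "prod"}


def _alternation(keys: list[str]) -> re.Pattern[str]:
    # Longest alternatives first so a longer phrase wins over its prefix.
    body = "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))
    return re.compile(rf"\b(?:{body})\b", re.IGNORECASE)


_PHRASE_RE = _alternation(list(_PHRASES))
_WORD_RE = _alternation(list(_WORDS))


def _match_case(src: str, repl: str) -> str:
    """Carry the source casing (UPPER, Capitalized, lower) over to the replacement."""
    if src.isupper():
        return repl.upper()
    if src[0].isupper():
        return repl[0].upper() + repl[1:]
    return repl


def abbreviate(text: str) -> str:
    text = _PHRASE_RE.sub(lambda m: _match_case(m.group(0), _PHRASES[m.group(0).lower()]), text)
    return _WORD_RE.sub(lambda m: _match_case(m.group(0), _WORDS[m.group(0).lower()]), text)
```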

## Use Cases

### Agent Frameworks (LangGraph / LangChain)

```python
import ptk

def compress_context(state):
    state["context"] = ptk.minimize(state["context"], aggressive=True)
    return state
```

### Claude Code Skills

```python
#!/usr/bin/env python3
import json
import sys

import ptk

with open(sys.argv[1], encoding="utf-8") as f:  # close the file after reading
    data = json.load(f)
print(ptk(data))
```

### API Response Cleanup

```python
import ptk
import requests

response = requests.get("https://api.example.com/users", timeout=10).json()
clean = ptk.minimize(response)  # strip nulls, compact JSON
```

## Comparison with Alternatives

| Tool | Approach | Best For |
|---|---|---|
| **ptk** | Type-detecting Python library, one-liner API | Programmatic use in scripts, agents, frameworks |
| [RTK](https://github.com/rtk-ai/rtk) | Rust CLI proxy for shell commands | Coding agents (Claude Code, OpenCode) |
| [claw-compactor](https://github.com/open-compress/claw-compactor) | 14-stage pipeline, AST-aware | Heavy-duty workspace compression |
| [toons](https://pypi.org/project/toons/) | TOON serialization format | Tabular data in LLM prompts |
| [LLMLingua](https://github.com/microsoft/LLMLingua) | Neural prompt compression | Natural language; runs a local language model (GPU recommended) |

## Design Principles

- **Zero deps** — stdlib only. tiktoken is optional for exact counts.
- **Builtins-first** — `frozenset` for O(1) lookups, precompiled regexes, `slots=True` frozen dataclasses.
- **DRY** — shared `strip_nullish()`, `dedup_lines()` reused across minimizers.
- **Type-routed** — O(1) detection for dicts/lists, first-2KB heuristic for strings.
- **Safe by default** — aggressive mode is opt-in. Default never destroys meaning.

## Development

```bash
git clone https://github.com/amahi2001/python-token-killer.git
cd python-token-killer
uv sync      # installs all dev dependencies, creates .venv automatically
make check   # lint + typecheck + 361 tests
```

## License

MIT