python-token-killer 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (28)
  1. python_token_killer-0.1.0/.gitignore +38 -0
  2. python_token_killer-0.1.0/CHANGELOG.md +74 -0
  3. python_token_killer-0.1.0/CONTRIBUTING.md +141 -0
  4. python_token_killer-0.1.0/LICENSE +21 -0
  5. python_token_killer-0.1.0/PKG-INFO +269 -0
  6. python_token_killer-0.1.0/README.md +219 -0
  7. python_token_killer-0.1.0/benchmarks/bench.py +138 -0
  8. python_token_killer-0.1.0/benchmarks/samples/api_response.json +31 -0
  9. python_token_killer-0.1.0/benchmarks/samples/python_module.py +413 -0
  10. python_token_killer-0.1.0/benchmarks/samples/server_log.txt +57 -0
  11. python_token_killer-0.1.0/examples/claude_code_skill.py +70 -0
  12. python_token_killer-0.1.0/examples/clean_api_response.py +44 -0
  13. python_token_killer-0.1.0/examples/langchain_middleware.py +97 -0
  14. python_token_killer-0.1.0/pyproject.toml +100 -0
  15. python_token_killer-0.1.0/src/ptk/__init__.py +166 -0
  16. python_token_killer-0.1.0/src/ptk/_base.py +137 -0
  17. python_token_killer-0.1.0/src/ptk/_types.py +126 -0
  18. python_token_killer-0.1.0/src/ptk/minimizers/__init__.py +17 -0
  19. python_token_killer-0.1.0/src/ptk/minimizers/_code.py +156 -0
  20. python_token_killer-0.1.0/src/ptk/minimizers/_dict.py +167 -0
  21. python_token_killer-0.1.0/src/ptk/minimizers/_diff.py +83 -0
  22. python_token_killer-0.1.0/src/ptk/minimizers/_list.py +87 -0
  23. python_token_killer-0.1.0/src/ptk/minimizers/_log.py +94 -0
  24. python_token_killer-0.1.0/src/ptk/minimizers/_text.py +182 -0
  25. python_token_killer-0.1.0/src/ptk/py.typed +0 -0
  26. python_token_killer-0.1.0/tests/test_adversarial.py +983 -0
  27. python_token_killer-0.1.0/tests/test_ptk.py +1022 -0
  28. python_token_killer-0.1.0/tests/test_real_world.py +620 -0
@@ -0,0 +1,38 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.egg-info/
6
+ *.egg
7
+ dist/
8
+ build/
9
+ *.whl
10
+
11
+ # uv
12
+ .venv/
13
+
14
+ # Legacy virtual environments (not used with uv)
15
+ venv/
16
+ env/
17
+
18
+ # IDE
19
+ .vscode/
20
+ .idea/
21
+ *.swp
22
+ *.swo
23
+ *~
24
+
25
+ # Testing
26
+ .pytest_cache/
27
+ .coverage
28
+ htmlcov/
29
+ .mypy_cache/
30
+
31
+ # OS
32
+ .DS_Store
33
+ Thumbs.db
34
+
35
+ # Build
36
+ *.so
37
+ *.dylib
38
+ dist/
@@ -0,0 +1,74 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ### Added
11
+
12
+ ### Changed
13
+
14
+ ### Fixed
15
+
16
+ ---
17
+
18
+ ## [0.1.0] - 2026-04-09
19
+
20
+ Initial public release.
21
+
22
+ ### API
23
+
24
+ - `ptk.minimize(obj)` — auto-detects content type, applies the right compression strategy, returns a minimized string. Accepts `aggressive`, `content_type`, and minimizer-specific kwargs.
25
+ - `ptk.stats(obj)` — same compression, returns a dict with `output`, `original_tokens`, `minimized_tokens`, `savings_pct`, `content_type`.
26
+ - `ptk.detect_type(obj)` — returns the auto-detected content type as a string.
27
+ - `ptk(obj)` — callable module shorthand for `ptk.minimize(obj)`.
28
+
29
+ ### Minimizers
30
+
31
+ - **DictMinimizer** — recursive null/empty stripping (preserves `0` and `False`), key shortening (`description` → `desc`, `configuration` → `cfg`, 30+ mappings), single-child flattening, kv/tabular output formats.
32
+ - **ListMinimizer** — schema-once tabular encoding for uniform list-of-dicts, primitive dedup with `(xN)` counts, deterministic even-spaced sampling with first/last preservation.
33
+ - **CodeMinimizer** — comment stripping with pragma preservation (`# noqa`, `# type: ignore`, `# TODO`, `# FIXME`, `// eslint-disable`), multi-line docstring collapse to first line, multi-language signature extraction (Python, JS, Rust, Go).
34
+ - **LogMinimizer** — consecutive duplicate line collapse, timestamp stripping, error-only filtering with stack trace preservation (`Traceback`, `File`, `*Error:`, `*Exception:`), `"failed"` keyword preservation, FATAL/CRITICAL treated as errors.
35
+ - **DiffMinimizer** — context line folding to `... N lines ...`, noise stripping (`index`, `old mode`, `new mode`, `similarity`, `Binary files`), `` preservation.
36
+ - **TextMinimizer** — 20+ word abbreviations (`implementation` → `impl`, `configuration` → `config`, `production` → `prod`, case-preserving), 16 phrase abbreviations (`in order to` → `to`, `due to the fact that` → `because`), 13 filler phrase removals (`Furthermore,`, `Moreover,`, `Additionally,`), stopword removal (aggressive mode).
37
+
38
+ ### Benchmarks
39
+
40
+ Real token counts via tiktoken (`cl100k_base`):
41
+
42
+ | Benchmark | Original | Default | Saved | Aggressive | Saved |
43
+ |---|---|---|---|---|---|
44
+ | API response (JSON) | 1,450 | 792 | 45.4% | 782 | 46.1% |
45
+ | Python module (code) | 2,734 | 2,113 | 22.7% | 309 | 88.7% |
46
+ | Server log (58 lines) | 1,389 | 1,388 | 0.1% | 231 | 83.4% |
47
+ | 50 user records (list) | 2,774 | 922 | 66.8% | 922 | 66.8% |
48
+ | Verbose paragraph (text) | 101 | 96 | 5.0% | 74 | 26.7% |
49
+ | **Total** | **11,182** | **7,424** | **33.6%** | **2,627** | **76.5%** |
50
+
51
+ Bundled sample data and runner: `python benchmarks/bench.py`
52
+
53
+ ### Tests
54
+
55
+ 322 tests across two suites:
56
+
57
+ - **test_ptk.py** (153 tests) — feature coverage for all 6 minimizers, type detection, base helpers, API contracts, and real-world payloads.
58
+ - **test_adversarial.py** (169 tests) — type chaos (None, bytes, sets, circular refs, broken `__str__`, dataclasses, generators, inf/nan), deep nesting (100-level dicts, 10k-wide structures), unicode (emoji, CJK, RTL, null bytes, surrogates, BOM), regex safety (pathological backtracking, unclosed constructs, 100k newlines), API contracts (parametrized across 9 input types), input mutation verification (deepcopy before/after), thread safety (10 concurrent threads), performance (all benchmarks under 5s), idempotency, and content type mismatch degradation.
59
+
60
+ ### Examples
61
+
62
+ - `examples/clean_api_response.py` — standalone script + stdin pipe for JSON cleanup.
63
+ - `examples/langchain_middleware.py` — LangGraph node, callable wrapper, batch document minimizer.
64
+ - `examples/claude_code_skill.py` — CLI tool with `--stdin`, `--type`, `--aggressive`, `--stats` flags.
65
+
66
+ ### Infrastructure
67
+
68
+ - Zero required dependencies — stdlib only. tiktoken optional (`pip install python-token-killer[tiktoken]`).
69
+ - `mypy --strict` clean across all 10 source files.
70
+ - `ruff check` clean across all source, tests, benchmarks, and examples.
71
+ - `py.typed` marker for PEP 561 type checker support.
72
+ - GitHub Actions CI workflow (Python 3.10–3.13 matrix).
73
+ - `AGENTS.md` + `CLAUDE.md` for coding agent context.
74
+ - MIT license.
@@ -0,0 +1,141 @@
1
+ # Contributing to ptk
2
+
3
+ ## Quick Start
4
+
5
+ ```bash
6
+ git clone https://github.com/amahi2001/python-token-killer.git
7
+ cd python-token-killer
8
+
9
+ # Install uv (if you don't have it)
10
+ curl -LsSf https://astral.sh/uv/install.sh | sh
11
+
12
+ # Install all dev dependencies — uv manages the venv automatically
13
+ uv sync
14
+
15
+ # Run everything CI runs — must pass before opening a PR
16
+ make check
17
+ ```
18
+
19
+ That's it. `uv sync` reads `uv.lock`, creates `.venv`, and installs every dev tool pinned to exact versions. No manual venv activation needed — `uv run` handles it.
20
+
21
+ ## Commands
22
+
23
+ | Command | What it does |
24
+ |---|---|
25
+ | `make check` | Lint + typecheck + tests (the one command before every PR) |
26
+ | `make test` | Tests only (361 tests, ~0.6s) |
27
+ | `make lint` | `ruff check` + `ruff format --check` |
28
+ | `make typecheck` | `mypy --strict` |
29
+ | `make bench` | Benchmarks with tiktoken |
30
+ | `make fix` | Auto-fix lint and formatting issues |
31
+ | `make build` | Build wheel + sdist (`dist/`) |
32
+ | `make clean` | Remove caches and build artifacts |
33
+
34
+ All commands use `uv run` — they work whether or not you've activated the venv.
35
+
36
+ ## Architecture in 30 Seconds
37
+
38
+ ```
39
+ ptk.minimize(obj)
40
+ → _types.detect(obj) # what is this? dict, list, code, log, diff, text
41
+ → _ROUTER[type] # pick the singleton minimizer
42
+ → minimizer.run(obj) # _serialize for measurement, _minimize for output
43
+ → MinResult(output, lengths) # frozen dataclass
44
+ ```
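The pipeline above can be sketched in a few lines. This is an illustration only, not ptk's code: `ContentType`, `detect` and `_ROUTER` are names taken from the text, and the README mentions a first-2KB heuristic for strings, but the regexes and the stand-in handlers below are assumptions.

```python
import re
from enum import Enum, auto


class ContentType(Enum):
    DICT = auto()
    LIST = auto()
    LOG = auto()
    DIFF = auto()
    TEXT = auto()


# Hypothetical heuristics, compiled once at import time.
_DIFF_RE = re.compile(r"^(?:diff --git |@@ -\d+)", re.MULTILINE)
_LOG_RE = re.compile(r"\b(?:DEBUG|INFO|WARN|WARNING|ERROR|FATAL|CRITICAL)\b")


def detect(obj):
    """Classify obj; strings are judged from their first 2 KB only."""
    if isinstance(obj, dict):
        return ContentType.DICT
    if isinstance(obj, (list, tuple)):
        return ContentType.LIST
    head = str(obj)[:2048]
    if _DIFF_RE.search(head):
        return ContentType.DIFF
    if _LOG_RE.search(head):
        return ContentType.LOG
    return ContentType.TEXT


def _as_text(obj):
    # Stand-in handler for string-like content.
    return str(obj).strip()


# One handler per content type; a dict lookup keeps routing O(1).
_ROUTER = {
    ContentType.DICT: lambda obj: ",".join(f"{k}:{v}" for k, v in obj.items()),
    ContentType.LIST: lambda obj: ",".join(map(str, obj)),
    ContentType.LOG: _as_text,
    ContentType.DIFF: _as_text,
    ContentType.TEXT: _as_text,
}


def minimize(obj):
    return _ROUTER[detect(obj)](obj)


print(minimize({"a": 1}))  # a:1
```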
45
+
46
+ Every file has one job:
47
+
48
+ ```
49
+ src/ptk/
50
+ __init__.py Public API + callable module trick + router
51
+ _types.py ContentType enum + detect() heuristics
52
+ _base.py Minimizer ABC + MinResult + shared helpers
53
+ minimizers/
54
+ _dict.py DictMinimizer (null strip, key shorten, flatten)
55
+ _list.py ListMinimizer (tabular, dedup, sampling)
56
+ _code.py CodeMinimizer (comments, docstrings, signatures)
57
+ _log.py LogMinimizer (dedup lines, error filter, stack traces)
58
+ _diff.py DiffMinimizer (context folding, noise strip)
59
+ _text.py TextMinimizer (abbreviation, filler removal, stopwords)
60
+ ```
61
+
62
+ ## The Three Rules
63
+
64
+ These are non-negotiable. PRs that break them will be rejected.
65
+
66
+ ### 1. `minimize()` must never raise
67
+
68
+ Any Python object passed to `ptk.minimize()` must produce a string — never an exception. `Minimizer.run()` wraps every `_minimize()` call in a try/except that catches `RecursionError`, `ValueError`, `TypeError`, and `OverflowError`, falling back to `str(obj)`.
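A minimal sketch of that fallback, assuming a simplified `run()` that returns a plain string (the text says the real one returns a `MinResult`). `UpperMinimizer` is a hypothetical subclass used only to trigger the error path.

```python
from abc import ABC, abstractmethod


class Minimizer(ABC):
    """Stand-in base class; only the fallback behaviour is illustrated."""

    def run(self, obj):
        try:
            return self._minimize(obj)
        except (RecursionError, ValueError, TypeError, OverflowError):
            # Degrade to the plain string form instead of propagating.
            return str(obj)

    @abstractmethod
    def _minimize(self, obj):
        ...


class UpperMinimizer(Minimizer):
    """Hypothetical minimizer that only accepts strings."""

    def _minimize(self, obj):
        if not isinstance(obj, str):
            raise TypeError("expected str")
        return obj.upper()


m = UpperMinimizer()
print(m.run("ok"))  # OK
print(m.run(42))    # 42 (TypeError caught, fell back to str(obj))
```

Note that an object whose `__str__` itself raises would still fail in this sketch; how ptk covers that case is not shown in this section.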
69
+
70
+ ### 2. Never mutate the input
71
+
72
+ All minimizers must create new objects. The original `obj` passed to `minimize()` must be identical after the call. `test_adversarial.py::TestInputMutation` verifies this with `deepcopy` comparisons.
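One way to satisfy this rule is to build new containers at every level rather than deleting keys in place. The sketch below borrows the name `strip_nullish` from the README's design notes, but its body, and the choice to drop nullish list elements too, are assumptions.

```python
import copy


def _is_nullish(value):
    # None and empty str/list/dict count as nullish; 0 and False do not.
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def strip_nullish(obj):
    """Return a cleaned copy of obj; the input is never modified."""
    if isinstance(obj, dict):
        cleaned = {k: strip_nullish(v) for k, v in obj.items()}
        return {k: v for k, v in cleaned.items() if not _is_nullish(v)}
    if isinstance(obj, list):
        cleaned = [strip_nullish(v) for v in obj]
        return [v for v in cleaned if not _is_nullish(v)]
    return obj


original = {"name": "Alice", "bio": None, "tags": [], "age": 0,
            "active": False, "meta": {"note": ""}}
snapshot = copy.deepcopy(original)
result = strip_nullish(original)

print(result)                # {'name': 'Alice', 'age': 0, 'active': False}
print(original == snapshot)  # True
```

The final comparison mirrors the kind of `deepcopy` before/after check the text attributes to `TestInputMutation`.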
73
+
74
+ ### 3. Zero required dependencies
75
+
76
+ The library must work with `pip install python-token-killer` and nothing else. No numpy, no tiktoken in the core. Optional extras are fine — import them inside try/except.
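A sketch of the optional-import pattern this rule describes. `tiktoken.get_encoding` and `Encoding.encode` are real tiktoken calls; the `count_tokens` helper and the four-characters-per-token fallback are assumptions (the fallback is consistent with the figures in the README's `ptk.stats` example, but the source does not state it).

```python
try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:  # tiktoken missing, or its encoding data could not be loaded
    _ENCODING = None


def count_tokens(text: str) -> int:
    """Exact count when tiktoken is available, rough estimate otherwise."""
    if _ENCODING is not None:
        return len(_ENCODING.encode(text))
    # Assumed fallback: roughly four characters per token.
    return len(text) // 4


print(count_tokens("Minimize LLM tokens from Python objects in one call"))
```

Catching the failure at import time keeps the core importable on a bare install, which is the point of the rule.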
77
+
78
+ ## Gotchas You'll Hit
79
+
80
+ ### The callable module trick
81
+
82
+ `__init__.py` swaps its own `__class__` to `_CallableModule` so `ptk(obj)` works. Imports must be structured carefully — `sys` and `types` are imported after the public API definitions with `# noqa: E402`. Don't reorganize imports without testing `ptk({"a": 1})` interactively.
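The trick relies on Python allowing a module object's `__class__` to be reassigned to a `types.ModuleType` subclass. The sketch below demonstrates it on a throwaway module so it can run anywhere; `demo_ptk` and its stand-in `minimize` are hypothetical, and only the name `_CallableModule` comes from the text.

```python
import types


class _CallableModule(types.ModuleType):
    """Module subclass that forwards calls to the module's minimize()."""

    def __call__(self, obj, **kwargs):
        return self.minimize(obj, **kwargs)


# Build a throwaway module instead of touching a real package.
demo = types.ModuleType("demo_ptk")
demo.minimize = lambda obj, **kwargs: repr(obj)  # stand-in for the real API

# Inside a real __init__.py this would presumably be
# sys.modules[__name__].__class__ = _CallableModule
demo.__class__ = _CallableModule

print(demo({"a": 1}))  # {'a': 1}
```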
83
+
84
+ ### `_serialize` vs `_minimize`
85
+
86
+ `_serialize(obj)` is called before `_minimize()` — it only measures the original length for stats. It must never raise (it has its own try/except). The actual output comes from `_minimize()`.
87
+
88
+ ### `from __future__ import annotations`
89
+
90
+ Every source file uses this for PEP 563 deferred annotation evaluation on Python 3.10. Don't remove it.
91
+
92
+ ### Regexes are precompiled
93
+
94
+ All regex patterns are compiled at module import time as module-level constants. Never call `re.compile()` inside a function.
95
+
96
+ ### Pragma preservation in CodeMinimizer
97
+
98
+ When stripping comments, `_strip_comment_if_safe()` checks each comment against `_PRAGMA_KEYWORDS` before removing it. Comments containing `noqa`, `type: ignore`, `TODO`, `FIXME`, `eslint-disable`, etc. survive.
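A naive sketch of pragma-preserving comment removal for Python-style `#` comments, with the pattern compiled at module level as the previous section asks. The regex and function body are assumptions, not ptk's implementation, and the sketch does not handle `#` inside string literals.

```python
import re

# Compiled once at import time.
_COMMENT_RE = re.compile(r"\s*#.*$")
_PRAGMA_KEYWORDS = ("noqa", "type: ignore", "TODO", "FIXME")


def strip_comment_if_safe(line: str) -> str:
    """Drop a trailing comment unless it contains a pragma keyword."""
    match = _COMMENT_RE.search(line)
    if match is None:
        return line
    if any(keyword in match.group(0) for keyword in _PRAGMA_KEYWORDS):
        return line
    return line[: match.start()]


print(strip_comment_if_safe("x = 1  # set x"))            # x = 1
print(strip_comment_if_safe("import os  # noqa: F401"))   # import os  # noqa: F401
```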
99
+
100
+ ### Thread safety
101
+
102
+ Minimizers are stateless singletons stored in `_ROUTER`. Don't add instance attributes that change between calls.
103
+
104
+ ## Adding a New Minimizer
105
+
106
+ 1. Create `src/ptk/minimizers/_yourtype.py`, subclass `Minimizer`, implement `_minimize()`
107
+ 2. Add `ContentType.YOURTYPE = auto()` to `_types.py` + detection heuristic in `detect()`
108
+ 3. Register in `_ROUTER` in `__init__.py`
109
+ 4. Export from `minimizers/__init__.py`
110
+ 5. Add tests in `test_ptk.py` (feature) and `test_adversarial.py` (edge cases)
111
+
112
+ ## Dependency Groups
113
+
114
+ ptk uses [PEP 735](https://peps.python.org/pep-0735/) dependency groups:
115
+
116
+ | Group | Contents | Install |
117
+ |---|---|---|
118
+ | `test` | pytest | `uv sync --only-group test` |
119
+ | `lint` | ruff | `uv sync --only-group lint` |
120
+ | `typecheck` | mypy | `uv sync --only-group typecheck` |
121
+ | `bench` | tiktoken | `uv sync --only-group bench` |
122
+ | `hooks` | pre-commit | `uv sync --only-group hooks` |
123
+ | `dev` | all of the above | `uv sync` (default) |
124
+
125
+ CI installs only what each job needs. `uv sync` with no flags installs `dev` (everything).
126
+
127
+ ## Pre-commit Hooks
128
+
129
+ ```bash
130
+ uv run pre-commit install
131
+ ```
132
+
133
+ After that, `ruff` and `mypy` run automatically on every `git commit`. `make check` is the equivalent without needing hooks installed.
134
+
135
+ ## PR Checklist
136
+
137
+ - `make check` passes
138
+ - New code has tests in `test_ptk.py` (feature) and/or `test_adversarial.py` (edge cases)
139
+ - No new required dependencies added
140
+ - Docstrings on new public classes/methods
141
+ - CHANGELOG.md updated under `[Unreleased]` if user-facing
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 ptk contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,269 @@
1
+ Metadata-Version: 2.4
2
+ Name: python-token-killer
3
+ Version: 0.1.0
4
+ Summary: Minimize LLM tokens from Python objects — dicts, code, logs, diffs, and more.
5
+ Project-URL: Homepage, https://github.com/amahi2001/python-token-killer
6
+ Project-URL: Repository, https://github.com/amahi2001/python-token-killer
7
+ Project-URL: Issues, https://github.com/amahi2001/python-token-killer/issues
8
+ Project-URL: Changelog, https://github.com/amahi2001/python-token-killer/blob/main/CHANGELOG.md
9
+ Author-email: amahi2001 <amahi2001@gmail.com>
10
+ License: MIT License
11
+
12
+ Copyright (c) 2026 ptk contributors
13
+
14
+ Permission is hereby granted, free of charge, to any person obtaining a copy
15
+ of this software and associated documentation files (the "Software"), to deal
16
+ in the Software without restriction, including without limitation the rights
17
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
18
+ copies of the Software, and to permit persons to whom the Software is
19
+ furnished to do so, subject to the following conditions:
20
+
21
+ The above copyright notice and this permission notice shall be included in all
22
+ copies or substantial portions of the Software.
23
+
24
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
25
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
26
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
27
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
28
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
29
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
30
+ SOFTWARE.
31
+ License-File: LICENSE
32
+ Keywords: agents,claude,compression,context-window,langchain,langgraph,llm,nlp,openai,rag,tokens
33
+ Classifier: Development Status :: 3 - Alpha
34
+ Classifier: Intended Audience :: Developers
35
+ Classifier: License :: OSI Approved :: MIT License
36
+ Classifier: Operating System :: OS Independent
37
+ Classifier: Programming Language :: Python :: 3
38
+ Classifier: Programming Language :: Python :: 3.10
39
+ Classifier: Programming Language :: Python :: 3.11
40
+ Classifier: Programming Language :: Python :: 3.12
41
+ Classifier: Programming Language :: Python :: 3.13
42
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
43
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
44
+ Classifier: Topic :: Text Processing
45
+ Classifier: Typing :: Typed
46
+ Requires-Python: >=3.10
47
+ Provides-Extra: tiktoken
48
+ Requires-Dist: tiktoken>=0.7; extra == 'tiktoken'
49
+ Description-Content-Type: text/markdown
50
+
51
+ <p align="center">
52
+ <img src="assets/mascot.png" alt="ptk" width="200"/>
53
+ </p>
54
+
55
+ <p align="center">
56
+ <strong>ptk — Python Token Killer</strong><br/>
57
+ <strong>Minimize LLM tokens from Python objects in one call</strong><br/>
58
+ Zero dependencies • Auto type detection • 322 tests
59
+ </p>
60
+
61
+ <table align="center">
62
+ <tr>
63
+ <td align="left" valign="middle">
64
+ <a href="https://github.com/amahi2001/python-token-killer/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/amahi2001/python-token-killer/ci.yml?branch=main&style=flat-square&label=CI" alt="CI"/></a><br/>
65
+ <img src="https://img.shields.io/badge/python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python 3.10+"/><br/>
66
+ <img src="https://img.shields.io/badge/mypy-strict-blue?style=flat-square" alt="mypy strict"/><br/>
67
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-yellow?style=flat-square" alt="License"/></a>
68
+ </td>
69
+ </tr>
70
+ </table>
71
+
72
+ ---
73
+
74
+ ## What is ptk?
75
+
76
+ ptk is a **Python library** that minimizes tokens before they reach an LLM. Pass in any Python object — dict, list, code, logs, diffs, text — and get back a compressed string representation.
77
+
78
+ Inspired by [RTK (Rust Token Killer)](https://github.com/rtk-ai/rtk), but designed as a library for programmatic use, not a CLI proxy.
79
+
80
+ ```python
81
+ import ptk
82
+
83
+ ptk.minimize({"users": [{"name": "Alice", "bio": None, "age": 30}]})
84
+ # → '{"users":[{"name":"Alice","age":30}]}'
85
+
86
+ ptk(my_dict) # callable shorthand
87
+ ptk(my_dict, aggressive=True) # max compression
88
+ ```
89
+
90
+ ```bash
91
+ pip install python-token-killer
92
+ # or
93
+ uv add python-token-killer
94
+ ```
95
+
96
+ Optional: `pip install python-token-killer[tiktoken]` or `uv add python-token-killer[tiktoken]` for exact token counting.
97
+
98
+ ## Benchmarks
99
+
100
+ Real token counts via tiktoken (`cl100k_base`, the tokenizer used by GPT-4; counts for other models such as Claude are approximate):
101
+
102
+ ```
103
+ Benchmark Original Default Saved Aggressive Saved
104
+ API response (JSON) 1450 792 45.4% 782 46.1%
105
+ Python module (code) 2734 2113 22.7% 309 88.7%
106
+ Server log (58 lines) 1389 1388 0.1% 231 83.4%
107
+ 50 user records (list) 2774 922 66.8% 922 66.8%
108
+ Verbose paragraph (text) 101 96 5.0% 74 26.7%
109
+ ─────────────────────────────────────────────
110
+ TOTAL 11182 7424 33.6% 2627 76.5%
111
+ ```
112
+
113
+ Run yourself: `python benchmarks/bench.py`
114
+
115
+ ## What It Does
116
+
117
+ ptk auto-detects your input type and routes to the right minimizer:
118
+
119
+ | Input Type | Strategy | Typical Savings |
120
+ |---|---|---|
121
+ | `dict` | Null stripping, key shortening, flattening, compact JSON | 30–60% |
122
+ | `list` | Dedup, schema-once tabular, sampling | 40–70% |
123
+ | Code `str` | Comment stripping (pragma-preserving), docstring collapse, signature extraction | 25–80% |
124
+ | Logs `str` | Line dedup with counts, error-only filtering, stack trace preservation | 60–90% |
125
+ | Diffs `str` | Context folding, noise stripping | 50–75% |
126
+ | Text `str` | Word/phrase abbreviation, filler removal, stopword removal | 10–30% |
127
+
128
+ ## API
129
+
130
+ ### `ptk.minimize(obj, *, aggressive=False, content_type=None, **kw) → str`
131
+
132
+ Main entry point. Auto-detects type, applies the right strategy, returns a minimized string.
133
+
134
+ ```python
135
+ # auto-detect
136
+ ptk.minimize({"key": "value"})
137
+
138
+ # force content type
139
+ ptk.minimize(some_string, content_type="code")
140
+ ptk.minimize(some_string, content_type="log")
141
+
142
+ # dict output formats
143
+ ptk.minimize(data, format="kv") # key:value lines
144
+ ptk.minimize(data, format="tabular") # header-once tabular
145
+
146
+ # code: signatures only (huge savings)
147
+ ptk.minimize(code, content_type="code", mode="signatures")
148
+
149
+ # logs: errors only
150
+ ptk.minimize(logs, content_type="log", errors_only=True)
151
+ ```
152
+
153
+ ### `ptk.stats(obj, **kw) → dict`
154
+
155
+ Same compression, but returns statistics:
156
+
157
+ ```python
158
+ ptk.stats(big_api_response)
159
+ # {
160
+ # "output": "...",
161
+ # "original_len": 4200,
162
+ # "minimized_len": 1800,
163
+ # "savings_pct": 57.1,
164
+ # "content_type": "dict",
165
+ # "original_tokens": 1050,
166
+ # "minimized_tokens": 450,
167
+ # }
168
+ ```
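The `savings_pct` figure in the example above is consistent with a simple ratio rounded to one decimal place. The helper below is a hypothetical reconstruction of that arithmetic, checked against the example and against the totals row of the benchmark table.

```python
def savings_pct(original: int, minimized: int) -> float:
    """Percentage saved, rounded to one decimal place."""
    if original <= 0:
        return 0.0
    return round((1 - minimized / original) * 100, 1)


print(savings_pct(4200, 1800))    # 57.1
print(savings_pct(11182, 7424))   # 33.6
print(savings_pct(11182, 2627))   # 76.5
```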
169
+
170
+ ### `ptk(obj)` — callable module
171
+
172
+ ```python
173
+ import ptk
174
+ ptk(some_dict) # equivalent to ptk.minimize(some_dict)
175
+ ```
176
+
177
+ ## Features by Minimizer
178
+
179
+ ### DictMinimizer
180
+ - Strips `None`, `""`, `[]`, `{}` recursively (preserves `0` and `False`)
181
+ - Key shortening: `description` → `desc`, `timestamp` → `ts`, `configuration` → `cfg`, etc.
182
+ - Single-child flattening: `{"a": {"b": val}}` → `{"a.b": val}` (aggressive)
183
+ - Output formats: compact JSON (default), key-value lines, header-once tabular
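Single-child flattening can be sketched as follows. The function name is hypothetical and the handling of key collisions is ignored; only the `{"a": {"b": val}}` to `{"a.b": val}` behaviour comes from the list above.

```python
import json


def flatten_single_children(data: dict) -> dict:
    """Collapse chains of one-key dicts into dotted keys; returns a new dict."""
    out = {}
    for key, value in data.items():
        # Follow the chain while the value is a dict with exactly one key.
        while isinstance(value, dict) and len(value) == 1:
            ((child_key, child_value),) = value.items()
            key = f"{key}.{child_key}"
            value = child_value
        if isinstance(value, dict):
            value = flatten_single_children(value)
        out[key] = value
    return out


flat = flatten_single_children({"a": {"b": 1}})
# Compact separators drop the spaces json.dumps adds by default.
print(json.dumps(flat, separators=(",", ":")))  # {"a.b":1}
```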
184
+
185
+ ### ListMinimizer
186
+ - Uniform list-of-dicts → schema-once tabular: declare fields once, one row per item
187
+ - Primitive dedup with counts: `["a", "a", "a", "b"]` → `a (x3)\nb`
188
+ - Large array sampling with first/last preservation (aggressive, threshold: 50)
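The three list strategies can be illustrated with small stand-alone helpers. All function names, the comma-separated row format, and the choice to count duplicates across the whole list are assumptions; values containing commas would need escaping in a real encoder.

```python
from collections import Counter


def dedup_with_counts(items):
    """Collapse repeated primitives, keeping first-seen order."""
    counts = Counter(items)
    return "\n".join(f"{v} (x{n})" if n > 1 else str(v) for v, n in counts.items())


def to_tabular(rows):
    """Emit the field names once, then one comma-separated row per dict."""
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("rows do not share one schema")
    lines = [",".join(fields)]
    lines += [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join(lines)


def sample_evenly(items, k):
    """Pick k evenly spaced items, always including the first and last."""
    if k < 2 or k >= len(items):
        return list(items)
    step = (len(items) - 1) / (k - 1)
    return [items[round(i * step)] for i in range(k)]


print(dedup_with_counts(["a", "a", "a", "b"]))
# a (x3)
# b
print(to_tabular([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# id,name
# 1,Alice
# 2,Bob
```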
189
+
190
+ ### CodeMinimizer
191
+ - Strips comments while **preserving pragmas**: `# noqa`, `# type: ignore`, `# TODO`, `# FIXME`, `// eslint-disable`
192
+ - Collapses multi-line docstrings to first line only
193
+ - Signature extraction mode: pulls `def`, `class`, `fn`, `func` across Python, JS, Rust, Go
194
+ - Normalizes blank lines and trailing whitespace
195
+
196
+ ### LogMinimizer
197
+ - Consecutive duplicate line collapse with `(xN)` counts
198
+ - Error-only filtering preserving: ERROR, WARN, FATAL, CRITICAL, stack traces, "failed" keyword
199
+ - Timestamp stripping (aggressive)
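A sketch of the first two log strategies, using `itertools.groupby` for consecutive-duplicate collapse and a single precompiled pattern for error filtering. The pattern is an assumption assembled from the keywords listed above, not ptk's actual regex.

```python
import re
from itertools import groupby

# Hypothetical filter built from the keywords named in the list above.
_ERROR_RE = re.compile(
    r"\b(?:ERROR|WARN(?:ING)?|FATAL|CRITICAL)\b"
    r"|failed"
    r"|^Traceback"
    r'|^\s+File "'
    r"|\w+(?:Error|Exception):"
)


def collapse_consecutive(lines):
    """Replace runs of identical adjacent lines with 'line (xN)'."""
    out = []
    for line, group in groupby(lines):
        n = sum(1 for _ in group)
        out.append(f"{line} (x{n})" if n > 1 else line)
    return out


def errors_only(lines):
    """Keep error-level lines and anything that looks like a stack trace."""
    return [line for line in lines if _ERROR_RE.search(line)]


print(collapse_consecutive(["retrying", "retrying", "ok", "retrying"]))
# ['retrying (x2)', 'ok', 'retrying']
```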
200
+
201
+ ### DiffMinimizer
202
+ - Folds unchanged context lines to `... N lines ...`
203
+ - Strips noise: `index`, `old mode`, `new mode`, `similarity`, `Binary files` (aggressive)
204
+ - Preserves: `+`/`-` lines, `@@` hunks, `---`/`+++` headers, ``
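Context folding can be sketched as a single pass that buffers unchanged lines (those starting with a space in unified diff format) and replaces long runs with a marker. The function name and the `min_run` threshold are assumptions; the marker text follows the list above.

```python
def fold_context(diff_text: str, min_run: int = 3) -> str:
    """Replace runs of at least min_run context lines with '... N lines ...'."""
    out = []
    run = []

    def flush():
        if len(run) >= min_run:
            out.append(f"... {len(run)} lines ...")
        else:
            out.extend(run)  # short runs are kept verbatim
        run.clear()

    for line in diff_text.splitlines():
        if line.startswith(" "):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)


sample = "\n".join([
    "--- a/f.py", "+++ b/f.py", "@@ -1,6 +1,6 @@",
    " a", " b", " c", "-old", "+new", " d",
])
print(fold_context(sample))
```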
205
+
206
+ ### TextMinimizer
207
+ - Word abbreviation: `implementation` → `impl`, `configuration` → `config`, `production` → `prod`, etc.
208
+ - Phrase abbreviation: `in order to` → `to`, `due to the fact that` → `because`, etc.
209
+ - Filler removal: strips `Furthermore,`, `Moreover,`, `In addition,`, `Additionally,`
210
+ - Stopword removal (aggressive): strips `the`, `a`, `is`, `very`, etc.
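The non-aggressive text steps can be approximated with three precompiled patterns applied in sequence. The mappings below are the few examples quoted in the list above; the regexes, the helper names, and the capitalisation rule are assumptions.

```python
import re

_WORDS = {"implementation": "impl", "configuration": "config", "production": "prod"}
_PHRASES = {"in order to": "to", "due to the fact that": "because"}

_WORD_RE = re.compile(r"\b(?:" + "|".join(_WORDS) + r")\b", re.IGNORECASE)
# Longest phrases first so a longer phrase wins over any shorter overlap.
_PHRASE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in sorted(_PHRASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)
_FILLER_RE = re.compile(r"\b(?:Furthermore|Moreover|Additionally),\s*")


def _match_case(source: str, replacement: str) -> str:
    # Keep a leading capital if the original word or phrase had one.
    return replacement.capitalize() if source[0].isupper() else replacement


def abbreviate(text: str) -> str:
    text = _FILLER_RE.sub("", text)
    text = _PHRASE_RE.sub(lambda m: _match_case(m.group(0), _PHRASES[m.group(0).lower()]), text)
    return _WORD_RE.sub(lambda m: _match_case(m.group(0), _WORDS[m.group(0).lower()]), text)


print(abbreviate("Furthermore, the implementation runs in production in order to test the Configuration."))
# the impl runs in prod to test the Config.
```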
211
+
212
+ ## Use Cases
213
+
214
+ ### Agent Frameworks (LangGraph / LangChain)
215
+
216
+ ```python
217
+ import ptk
218
+
219
+ def compress_context(state):
220
+ state["context"] = ptk.minimize(state["context"], aggressive=True)
221
+ return state
222
+ ```
223
+
224
+ ### Claude Code Skills
225
+
226
+ ```python
227
+ #!/usr/bin/env python3
228
+ import ptk, json, sys
229
+ data = json.load(open(sys.argv[1]))
230
+ print(ptk(data))
231
+ ```
232
+
233
+ ### API Response Cleanup
234
+
235
+ ```python
236
+ response = requests.get("https://api.example.com/users").json()
237
+ clean = ptk.minimize(response) # strip nulls, compact JSON
238
+ ```
239
+
240
+ ## Comparison with Alternatives
241
+
242
+ | Tool | Approach | Best For |
243
+ |---|---|---|
244
+ | **ptk** | Type-detecting Python library, one-liner API | Programmatic use in scripts, agents, frameworks |
245
+ | [RTK](https://github.com/rtk-ai/rtk) | Rust CLI proxy for shell commands | Coding agents (Claude Code, OpenCode) |
246
+ | [claw-compactor](https://github.com/open-compress/claw-compactor) | 14-stage pipeline, AST-aware | Heavy-duty workspace compression |
247
+ | [toons](https://pypi.org/project/toons/) | TOON serialization format | Tabular data in LLM prompts |
248
+ | [LLMLingua](https://github.com/microsoft/LLMLingua) | Neural prompt compression | Natural language, requires GPU |
249
+
250
+ ## Design Principles
251
+
252
+ - **Zero deps** — stdlib only. tiktoken is optional for exact counts.
253
+ - **Builtins-first** — `frozenset` for O(1) lookups, precompiled regexes, `slots=True` frozen dataclasses.
254
+ - **DRY** — shared `strip_nullish()`, `dedup_lines()` reused across minimizers.
255
+ - **Type-routed** — O(1) detection for dicts/lists, first-2KB heuristic for strings.
256
+ - **Safe by default** — aggressive mode is opt-in. Default never destroys meaning.
257
+
258
+ ## Development
259
+
260
+ ```bash
261
+ git clone https://github.com/amahi2001/python-token-killer.git
262
+ cd python-token-killer
263
+ uv sync # installs all dev dependencies, creates .venv automatically
264
+ make check # lint + typecheck + 361 tests
265
+ ```
266
+
267
+ ## License
268
+
269
+ MIT