skimtoken 0.2.0__tar.gz → 0.2.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {skimtoken-0.2.0 → skimtoken-0.2.1}/Cargo.lock +1 -1
- {skimtoken-0.2.0 → skimtoken-0.2.1}/Cargo.toml +1 -1
- {skimtoken-0.2.0 → skimtoken-0.2.1}/PKG-INFO +19 -18
- {skimtoken-0.2.0 → skimtoken-0.2.1}/README.md +18 -18
- {skimtoken-0.2.0 → skimtoken-0.2.1}/pyproject.toml +1 -1
- {skimtoken-0.2.0 → skimtoken-0.2.1}/.github/workflows/ci.yml +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/.github/workflows/release.yml +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/.gitignore +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/CONTRIBUTING.md +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/LICENSE +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/data/test_dataset.jsonl +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/examples/example.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/examples/multilingual_estimate.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/params/basic.toml +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/params/multilingual.toml +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/params/multilingual_simple.toml +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/params/simple.toml +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/benchmark.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/optimize/optimize_basic.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/optimize/optimize_multilingual.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/optimize/optimize_multilingual_simple.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/optimize/optimize_simple.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/optimize/utils.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/optimize_all.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/prepare_cc100_dataset.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/update_rust_params.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/scripts/update_token_counts.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/__init__.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/__init__.pyi +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/basic.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/basic.pyi +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/multilingual.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/multilingual.pyi +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/multilingual_simple.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/multilingual_simple.pyi +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/simple.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/skimtoken/simple.pyi +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/src/lib.rs +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/src/main.rs +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/src/methods/method.rs +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/src/methods/method_basic.rs +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/src/methods/method_multilingual.rs +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/src/methods/method_multilingual_simple.rs +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/src/methods/method_simple.rs +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/tests/test_comprehensive.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/tests/test_hypothesis.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/tests/test_simple.py +0 -0
- {skimtoken-0.2.0 → skimtoken-0.2.1}/uv.lock +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: skimtoken
-Version: 0.2.
+Version: 0.2.1
 License-File: LICENSE
 Summary: Fast token count estimation library
 Home-Page: https://github.com/masaishi/skimtoken
@@ -25,7 +25,7 @@ A lightweight, fast token count estimation library written in Rust with Python b
 
 **This library is currently in early beta and has significant accuracy issues:**
 
-- **Multilingual method**: Takes
+- **Multilingual method**: Takes 1.13x longer than tiktoken due to inefficient implementation
 - **Overall accuracy**: 15.11% error rate, which is too high for most use cases
 
 
@@ -37,7 +37,7 @@ A lightweight, fast token count estimation library written in Rust with Python b
 
 - ✅ **64x less memory** (0.92MB vs 60MB)
 - ✅ **128x faster startup** (4ms vs 485ms)
-- ❌ **
+- ❌ **1.13x slower execution** (5.51s vs 4.59s) for multilingual method
 - ❌ Trade-off: ~15.11% error rate vs exact counts
 
 ## Installation
@@ -133,21 +133,21 @@ Total Characters: 13,062,391
 Mean RMSE: 21.3034 tokens
 Mean Error Rate: 15.11%
 
-
-┃ Metric ┃ tiktoken ┃
-
-│ Init Time │ 0.
-
-│ Init Memory │ 42.
-
-│ Exec Time │ 4.
-
-│ Exec Memory │ 17.
-
-│ Total Time │
-
-│ Total Memory │ 59.
-
+┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┓
+┃ Metric       ┃ tiktoken   ┃ skimtoken  ┃ Ratio  ┃
+┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━┩
+│ Init Time    │ 0.815441 s │ 0.138714 s │ 0.170x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Init Memory  │ 42.4791 MB │ 0.1613 MB  │ 0.004x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Exec Time    │ 4.041857 s │ 5.380782 s │ 1.331x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Exec Memory  │ 17.3227 MB │ 0.8950 MB  │ 0.052x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Total Time   │ 4.857297 s │ 5.519496 s │ 1.136x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Total Memory │ 59.8018 MB │ 1.0563 MB  │ 0.018x │
+└──────────────┴────────────┴────────────┴────────┘
 ```
 
 ## Available Methods
@@ -279,3 +279,4 @@ We are actively working to improve skimtoken's accuracy and performance:
 ## License
 
 MIT License - see [LICENSE](./LICENSE) for details.
+
@@ -12,7 +12,7 @@ A lightweight, fast token count estimation library written in Rust with Python b
 
 **This library is currently in early beta and has significant accuracy issues:**
 
-- **Multilingual method**: Takes
+- **Multilingual method**: Takes 1.13x longer than tiktoken due to inefficient implementation
 - **Overall accuracy**: 15.11% error rate, which is too high for most use cases
 
 
@@ -24,7 +24,7 @@ A lightweight, fast token count estimation library written in Rust with Python b
 
 - ✅ **64x less memory** (0.92MB vs 60MB)
 - ✅ **128x faster startup** (4ms vs 485ms)
-- ❌ **
+- ❌ **1.13x slower execution** (5.51s vs 4.59s) for multilingual method
 - ❌ Trade-off: ~15.11% error rate vs exact counts
 
 ## Installation
@@ -120,21 +120,21 @@ Total Characters: 13,062,391
 Mean RMSE: 21.3034 tokens
 Mean Error Rate: 15.11%
 
-
-┃ Metric ┃ tiktoken ┃
-
-│ Init Time │ 0.
-
-│ Init Memory │ 42.
-
-│ Exec Time │ 4.
-
-│ Exec Memory │ 17.
-
-│ Total Time │
-
-│ Total Memory │ 59.
-
+┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┓
+┃ Metric       ┃ tiktoken   ┃ skimtoken  ┃ Ratio  ┃
+┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━┩
+│ Init Time    │ 0.815441 s │ 0.138714 s │ 0.170x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Init Memory  │ 42.4791 MB │ 0.1613 MB  │ 0.004x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Exec Time    │ 4.041857 s │ 5.380782 s │ 1.331x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Exec Memory  │ 17.3227 MB │ 0.8950 MB  │ 0.052x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Total Time   │ 4.857297 s │ 5.519496 s │ 1.136x │
+├──────────────┼────────────┼────────────┼────────┤
+│ Total Memory │ 59.8018 MB │ 1.0563 MB  │ 0.018x │
+└──────────────┴────────────┴────────────┴────────┘
 ```
 
 ## Available Methods
@@ -265,4 +265,4 @@ We are actively working to improve skimtoken's accuracy and performance:
 
 ## License
 
-MIT License - see [LICENSE](./LICENSE) for details.
+MIT License - see [LICENSE](./LICENSE) for details.
File without changes
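The Ratio column in the benchmark table added in 0.2.1 appears to be the skimtoken value divided by the tiktoken value. The sketch below recomputes it from the figures in the diff under that assumption; the names `BENCHMARK` and `ratio_label` are hypothetical and are not part of skimtoken or its benchmark script.

```python
# Figures copied from the benchmark table added in 0.2.1.
# Each entry is (tiktoken value, skimtoken value) in the unit shown in the table.
BENCHMARK = {
    "Init Time": (0.815441, 0.138714),    # seconds
    "Init Memory": (42.4791, 0.1613),     # MB
    "Exec Time": (4.041857, 5.380782),    # seconds
    "Exec Memory": (17.3227, 0.8950),     # MB
    "Total Time": (4.857297, 5.519496),   # seconds
    "Total Memory": (59.8018, 1.0563),    # MB
}


def ratio_label(tiktoken_value: float, skimtoken_value: float) -> str:
    """Format skimtoken / tiktoken to three decimals (assumed definition of Ratio)."""
    return f"{skimtoken_value / tiktoken_value:.3f}x"


if __name__ == "__main__":
    for metric, (tik, skim) in BENCHMARK.items():
        print(f"{metric:<12} {ratio_label(tik, skim)}")
```

Under this reading, a ratio below 1 means skimtoken used less time or memory than tiktoken, and the 1.136x Total Time ratio is the figure that the updated README text rounds to "1.13x".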