charex-numba 0.5.2__tar.gz

@@ -0,0 +1,25 @@
1
+ BSD 2-Clause License
2
+
3
+ Copyright (c) 2022, Nima Mehrani
4
+ All rights reserved.
5
+
6
+ Redistribution and use in source and binary forms, with or without
7
+ modification, are permitted provided that the following conditions are met:
8
+
9
+ 1. Redistributions of source code must retain the above copyright notice, this
10
+ list of conditions and the following disclaimer.
11
+
12
+ 2. Redistributions in binary form must reproduce the above copyright notice,
13
+ this list of conditions and the following disclaimer in the documentation
14
+ and/or other materials provided with the distribution.
15
+
16
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
17
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,8 @@
1
+ prune benchmarks
2
+ prune docs
3
+ prune tests
4
+
5
+ global-exclude __pycache__
6
+ global-exclude *.py[cod]
7
+ global-exclude *.nbc
8
+ global-exclude *.nbi
@@ -0,0 +1,241 @@
1
+ Metadata-Version: 2.4
2
+ Name: charex-numba
3
+ Version: 0.5.2
4
+ Summary: Numba overloads for NumPy string operations
5
+ License-Expression: BSD-2-Clause
6
+ Project-URL: Homepage, https://github.com/nmehran/charex
7
+ Project-URL: Source, https://github.com/nmehran/charex
8
+ Project-URL: Issues, https://github.com/nmehran/charex/issues
9
+ Keywords: numba,numpy,strings,stringdtype,np.char,np.strings
10
+ Classifier: Development Status :: 4 - Beta
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3 :: Only
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Classifier: Programming Language :: Python :: 3.14
20
+ Classifier: Topic :: Scientific/Engineering
21
+ Classifier: Topic :: Software Development :: Compilers
22
+ Requires-Python: <3.15,>=3.10
23
+ Description-Content-Type: text/markdown
24
+ License-File: LICENSE
25
+ Requires-Dist: numba<0.66,>=0.65.1
26
+ Requires-Dist: numpy<2.5,>=1.22
27
+ Provides-Extra: test
28
+ Requires-Dist: pytest>=8; extra == "test"
29
+ Provides-Extra: bench
30
+ Requires-Dist: matplotlib>=3.8; extra == "bench"
31
+ Dynamic: license-file
32
+
33
+ # charex
34
+
35
+ Use NumPy string functions inside Numba-compiled code.
36
+
37
+ `charex` lets `@njit` functions call common NumPy string operations such as
38
+ comparisons, `find`, `startswith`, `endswith`, `str_len`, and string predicates.
39
+ It works with fixed-width NumPy string arrays and, on NumPy 2.x, variable-width
40
+ `StringDType` arrays.
41
+
42
+ ## Installation
43
+
44
+ The PyPI distribution is named `charex-numba`; the import package is `charex`:
45
+
46
+ ```bash
47
+ python -m pip install charex-numba
48
+ ```
49
+
50
+ ## Quick Start
51
+
52
+ ```python
53
+ import charex
54
+ import numpy as np
55
+ from numba import njit
56
+
57
+
58
+ @njit
59
+ def count_long(values, min_length):
60
+ # charex enables this NumPy string operation inside nopython mode.
61
+ lengths = np.strings.str_len(values)
62
+ return np.count_nonzero(lengths >= min_length)
63
+ ```
64
+
65
+ On NumPy 1.x, use the same pattern with `np.char` and fixed-width `S` or `U`
66
+ arrays.
67
+
68
+ ## Behavior
69
+
70
+ NumPy behavior is the contract. Supported operations aim to match NumPy's return
71
+ values, output shapes, dtypes, exception behavior, broadcasting, and input
72
+ immutability.
73
+
74
+ ## Supported Operations
75
+
76
+ Comparisons:
77
+
78
+ - `equal`
79
+ - `not_equal`
80
+ - `greater`
81
+ - `greater_equal`
82
+ - `less`
83
+ - `less_equal`
84
+
85
+ Occurrence and search:
86
+
87
+ - `count`
88
+ - `startswith`
89
+ - `endswith`
90
+ - `find`
91
+ - `rfind`
92
+ - `index`
93
+ - `rindex`
94
+
95
+ Information and predicates:
96
+
97
+ - `str_len`
98
+ - `isalpha`
99
+ - `isalnum`
100
+ - `isdigit`
101
+ - `isdecimal`
102
+ - `isnumeric`
103
+ - `isspace`
104
+ - `islower`
105
+ - `isupper`
106
+ - `istitle`
107
+
108
+ Additional `np.char` operation:
109
+
110
+ - `compare_chararrays`
111
+
112
+ ## Supported APIs And Dtypes
113
+
114
+ - `np.char` fixed-width byte strings: `S`
115
+ - `np.char` fixed-width Unicode strings: `U`
116
+ - `np.strings` fixed-width byte strings: `S`
117
+ - `np.strings` fixed-width Unicode strings: `U`
118
+ - `np.strings` variable-width Unicode strings: `StringDType`
119
+
120
+ NumPy stores `S` and `U` arrays in fixed-size records. `StringDType` is
121
+ variable-width and stores string payloads separately from the array's fixed-size
122
+ metadata records. `charex` supports both storage models.
123
+
124
+ The native `charex._stringdtype` extension is required for `StringDType` support
125
+ and is built by the package install.
126
+
127
+ ## Shapes
128
+
129
+ Supported inputs include scalars, 0-D arrays, 1-D arrays, N-D arrays, and
130
+ broadcast-compatible shapes for both fixed-width `S`/`U` and variable-width
131
+ `StringDType`. Array inputs may be contiguous, read-only, positively strided,
132
+ negatively strided, zero-stride, or empty views.
133
+
134
+ `StringDType()` and `StringDType(na_object=...)` variants are supported for the
135
+ listed `np.strings` operations; null handling matches NumPy's behavior for each
136
+ operation.
137
+
138
+ `np.char` and `np.strings` are not treated as aliases. For example, `np.char`
139
+ comparison semantics strip trailing whitespace/NULs, while `np.strings`
140
+ comparison semantics do not.
141
+
142
+ Byte inputs to Unicode-only predicates such as `isdecimal` and `isnumeric`
143
+ follow NumPy and raise unsupported-loop errors.
144
+
145
+ ## Not Supported
146
+
147
+ `charex` does not yet implement transformation or output-producing string
148
+ operations such as replace, case conversion, strip, pad, join, split, encode, or
149
+ decode. Object arrays and object-scalar bridges are also outside the current
150
+ nopython string path.
151
+
152
+ ## Performance
153
+
154
+ On the current Numba 0.65.1 matrix, `charex` runs `1.02x` to `6.51x` as fast
155
+ as NumPy across 135 fixed-width and `StringDType` cases, with a median speedup
156
+ of `1.60x`.
157
+
158
+ Benchmark artifacts are in
159
+ [docs/benchmarks/numba-v-0.65.1](docs/benchmarks/numba-v-0.65.1/).
160
+
161
+ ### Comparison Operators
162
+
163
+ ![comparison-operators-bytes.png](docs/benchmarks/numba-v-0.65.1/comparison-operators-bytes.png)
164
+ ![comparison-operators-strings.png](docs/benchmarks/numba-v-0.65.1/comparison-operators-strings.png)
165
+ ![stringdtype-comparison.png](docs/benchmarks/numba-v-0.65.1/stringdtype-comparison.png)
166
+
167
+ ### Occurrence Information
168
+
169
+ ![char-occurrence-bytes.png](docs/benchmarks/numba-v-0.65.1/char-occurrence-bytes.png)
170
+ ![char-occurrence-strings.png](docs/benchmarks/numba-v-0.65.1/char-occurrence-strings.png)
171
+ ![stringdtype-occurrence.png](docs/benchmarks/numba-v-0.65.1/stringdtype-occurrence.png)
172
+
173
+ ### Property Information
174
+
175
+ ![char-properties-bytes.png](docs/benchmarks/numba-v-0.65.1/char-properties-bytes.png)
176
+ ![char-properties-strings.png](docs/benchmarks/numba-v-0.65.1/char-properties-strings.png)
177
+ ![char-numerics-strings.png](docs/benchmarks/numba-v-0.65.1/char-numerics-strings.png)
178
+ ![stringdtype-properties.png](docs/benchmarks/numba-v-0.65.1/stringdtype-properties.png)
179
+ ![stringdtype-numerics.png](docs/benchmarks/numba-v-0.65.1/stringdtype-numerics.png)
180
+
181
+ The previous Numba 0.59 matrix is archived under
182
+ [benchmarks/numba-v-0.59](benchmarks/numba-v-0.59/).
183
+
184
+ ## Compatibility
185
+
186
+ `charex` targets Numba 0.65.1 and the NumPy ranges tested by that Numba release:
187
+
188
+ - Python `>=3.10,<3.15`
189
+ - Numba `>=0.65.1,<0.66`
190
+ - NumPy `>=1.22,<1.27` or `>=2.0,<2.5`
191
+ - llvmlite `0.47.x`
192
+
193
+ `np.strings` is available on NumPy 2.x only. On NumPy 1.x, `charex` registers the
194
+ `np.char` overloads and skips `np.strings`.
195
+
196
+ ## Development
197
+
198
+ Install test dependencies:
199
+
200
+ ```bash
201
+ python -m pip install -e ".[test]"
202
+ ```
203
+
204
+ Run tests:
205
+
206
+ ```bash
207
+ pytest -q
208
+ ```
209
+
210
+ Run the representative behavior audit:
211
+
212
+ ```bash
213
+ python docs/exploration/string_array_shape_audit.py --methods representative --api all --dtype all
214
+ ```
215
+
216
+ Run the benchmark smoke test:
217
+
218
+ ```bash
219
+ python benchmarks/benchmark.py --size 50000 --repeat 5
220
+ ```
221
+
222
+ Install benchmark plotting dependencies and write CSV/PNG output:
223
+
224
+ ```bash
225
+ python -m pip install -e ".[bench]"
226
+ python benchmarks/benchmark.py --size 50000 --repeat 5 --plot
227
+ ```
228
+
229
+ Regenerate the full benchmark matrix from the repository root:
230
+
231
+ ```bash
232
+ python -m pip install -e ".[bench]"
233
+ # Use a fresh NUMBA_CACHE_DIR for release matrices
234
+ CACHE_DIR=$(mktemp -d /tmp/charex-numba-cache.XXXXXX)
235
+ NUMBA_CACHE_DIR="$CACHE_DIR" PYTHONPATH=. \
236
+ python benchmarks/matrix.py --size 250000 --repeat 15
237
+ ```
238
+
239
+ CI runs Python 3.10-3.14 across representative NumPy 1.x and 2.x jobs with
240
+ Numba 0.65.1. The benchmark matrix above was generated on Python 3.12.8,
241
+ NumPy 2.4.6, Numba 0.65.1, and llvmlite 0.47.0.
@@ -0,0 +1,209 @@
1
+ # charex
2
+
3
+ Use NumPy string functions inside Numba-compiled code.
4
+
5
+ `charex` lets `@njit` functions call common NumPy string operations such as
6
+ comparisons, `find`, `startswith`, `endswith`, `str_len`, and string predicates.
7
+ It works with fixed-width NumPy string arrays and, on NumPy 2.x, variable-width
8
+ `StringDType` arrays.
9
+
10
+ ## Installation
11
+
12
+ The PyPI distribution is named `charex-numba`; the import package is `charex`:
13
+
14
+ ```bash
15
+ python -m pip install charex-numba
16
+ ```
17
+
18
+ ## Quick Start
19
+
20
+ ```python
21
+ import charex
22
+ import numpy as np
23
+ from numba import njit
24
+
25
+
26
+ @njit
27
+ def count_long(values, min_length):
28
+ # charex enables this NumPy string operation inside nopython mode.
29
+ lengths = np.strings.str_len(values)
30
+ return np.count_nonzero(lengths >= min_length)
31
+ ```
32
+
33
+ On NumPy 1.x, use the same pattern with `np.char` and fixed-width `S` or `U`
34
+ arrays.
35
+
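A short usage sketch for the Quick Start function. It assumes that importing `charex` registers the overloads as a side effect, as the example above implies. The `try`/`except` fallback to plain NumPy and the module-level `strings` alias are illustrative additions so the sketch runs on either NumPy major version; they are not part of the package.

```python
import numpy as np

try:
    import charex  # noqa: F401  (assumed: the import registers the overloads)
    from numba import njit
except ImportError:
    # Illustrative fallback so the sketch still runs without charex or Numba.
    def njit(func):
        return func

# np.strings exists on NumPy 2.x only; np.char provides str_len on NumPy 1.x.
strings = getattr(np, "strings", np.char)


@njit
def count_long(values, min_length):
    lengths = strings.str_len(values)
    return np.count_nonzero(lengths >= min_length)


words = np.array(["numba", "numpy", "charex", "jit"])
print(count_long(words, 5))  # counts the words with at least 5 characters
```

With the fallback in place the function behaves like ordinary NumPy code, which makes it easy to compare compiled and uncompiled results during development.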
36
+ ## Behavior
37
+
38
+ NumPy behavior is the contract. Supported operations aim to match NumPy's return
39
+ values, output shapes, dtypes, exception behavior, broadcasting, and input
40
+ immutability.
41
+
42
+ ## Supported Operations
43
+
44
+ Comparisons:
45
+
46
+ - `equal`
47
+ - `not_equal`
48
+ - `greater`
49
+ - `greater_equal`
50
+ - `less`
51
+ - `less_equal`
52
+
53
+ Occurrence and search:
54
+
55
+ - `count`
56
+ - `startswith`
57
+ - `endswith`
58
+ - `find`
59
+ - `rfind`
60
+ - `index`
61
+ - `rindex`
62
+
63
+ Information and predicates:
64
+
65
+ - `str_len`
66
+ - `isalpha`
67
+ - `isalnum`
68
+ - `isdigit`
69
+ - `isdecimal`
70
+ - `isnumeric`
71
+ - `isspace`
72
+ - `islower`
73
+ - `isupper`
74
+ - `istitle`
75
+
76
+ Additional `np.char` operation:
77
+
78
+ - `compare_chararrays`
79
+
80
+ ## Supported APIs And Dtypes
81
+
82
+ - `np.char` fixed-width byte strings: `S`
83
+ - `np.char` fixed-width Unicode strings: `U`
84
+ - `np.strings` fixed-width byte strings: `S`
85
+ - `np.strings` fixed-width Unicode strings: `U`
86
+ - `np.strings` variable-width Unicode strings: `StringDType`
87
+
88
+ NumPy stores `S` and `U` arrays in fixed-size records. `StringDType` is
89
+ variable-width and stores string payloads separately from the array's fixed-size
90
+ metadata records. `charex` supports both storage models.
91
+
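The difference between the two storage models can be inspected from plain NumPy. The sketch below only uses NumPy itself; the variable names are illustrative, and the `StringDType` part is skipped on NumPy 1.x, where that dtype does not exist.

```python
import numpy as np

fixed_bytes = np.array([b"a", b"bbb"])  # dtype S3: 3 bytes per element
fixed_text = np.array(["a", "bbb"])     # dtype U3: 3 UCS-4 code points per element

print(fixed_bytes.dtype, fixed_bytes.dtype.itemsize)
print(fixed_text.dtype, fixed_text.dtype.itemsize)

# StringDType exists on NumPy 2.x only, so look it up defensively.
string_dtype_cls = getattr(getattr(np, "dtypes", None), "StringDType", None)
if string_dtype_cls is not None:
    variable = np.array(
        ["a", "bbb", "a much longer string"], dtype=string_dtype_cls()
    )
    # itemsize is the size of the fixed per-element record; it does not grow
    # with the length of the longest string as it does for S and U arrays.
    print(variable.dtype, variable.dtype.itemsize)
```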
92
+ The native `charex._stringdtype` extension is required for `StringDType` support
93
+ and is built by the package install.
94
+
95
+ ## Shapes
96
+
97
+ Supported inputs include scalars, 0-D arrays, 1-D arrays, N-D arrays, and
98
+ broadcast-compatible shapes for both fixed-width `S`/`U` and variable-width
99
+ `StringDType`. Array inputs may be contiguous, read-only, positively strided,
100
+ negatively strided, zero-stride, or empty views.
101
+
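The input layouts listed above can be built with ordinary NumPy slicing and broadcasting. This sketch uses plain NumPy (`np.char.str_len`) to show what each view looks like; passing the same views to an `@njit` function would be the `charex` equivalent, which is assumed here rather than demonstrated.

```python
import numpy as np

values = np.array(["a", "bb", "ccc", "dddd"])

reversed_view = values[::-1]                  # negative stride
every_other = values[::2]                     # positive, non-contiguous stride
repeated = np.broadcast_to(values[:1], (3,))  # zero stride, read-only
empty = values[:0]                            # empty view
scalar_0d = np.array("hello")                 # 0-D array

for view in (reversed_view, every_other, repeated, empty, scalar_0d):
    # Print the strides next to the per-element string lengths.
    print(view.strides, np.char.str_len(view))
```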
102
+ `StringDType()` and `StringDType(na_object=...)` variants are supported for the
103
+ listed `np.strings` operations; null handling matches NumPy's behavior for each
104
+ operation.
105
+
106
+ `np.char` and `np.strings` are not treated as aliases. For example, `np.char`
107
+ comparison semantics strip trailing whitespace/NULs, while `np.strings`
108
+ comparison semantics do not.
109
+
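The trailing-whitespace difference is visible in plain NumPy, which is the behavior the overloads aim to reproduce. A minimal sketch; the array names are illustrative, and the `np.strings` call is guarded because that namespace only exists on NumPy 2.x.

```python
import numpy as np

left = np.array(["abc ", "abc"])
right = np.array(["abc", "abc"])

# np.char.equal strips trailing whitespace before comparing.
char_result = np.char.equal(left, right)
print(char_result)

# compare_chararrays exposes the same switch explicitly through `rstrip`.
unstripped_result = np.char.compare_chararrays(left, right, "==", False)
print(unstripped_result)

# On NumPy 2.x, np.strings.equal compares without stripping.
if hasattr(np, "strings"):
    print(np.strings.equal(left, right))
```

The first element differs only by a trailing space, so it compares equal under `np.char` semantics and unequal when stripping is disabled.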
110
+ Byte inputs to Unicode-only predicates such as `isdecimal` and `isnumeric`
111
+ follow NumPy and raise unsupported-loop errors.
112
+
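The NumPy behavior being matched can be checked directly. This sketch uses only NumPy; the `rejected` flag is an illustrative name. NumPy raises a `TypeError` (or a subclass of it) for byte input on both major versions, though the exact message differs.

```python
import numpy as np

text = np.array(["42", "x1"])
raw = np.array([b"42", b"x1"])

decimal_mask = np.char.isdecimal(text)  # Unicode input is accepted
print(decimal_mask)

rejected = False
try:
    np.char.isdecimal(raw)
except TypeError as exc:
    # Byte strings have no isdecimal implementation in NumPy.
    rejected = True
    print(type(exc).__name__, exc)
```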
113
+ ## Not Supported
114
+
115
+ `charex` does not yet implement transformation or output-producing string
116
+ operations such as replace, case conversion, strip, pad, join, split, encode, or
117
+ decode. Object arrays and object-scalar bridges are also outside the current
118
+ nopython string path.
119
+
120
+ ## Performance
121
+
122
+ On the current Numba 0.65.1 matrix, `charex` runs `1.02x` to `6.51x` as fast
123
+ as NumPy across 135 fixed-width and `StringDType` cases, with a median speedup
124
+ of `1.60x`.
125
+
126
+ Benchmark artifacts are in
127
+ [docs/benchmarks/numba-v-0.65.1](docs/benchmarks/numba-v-0.65.1/).
128
+
129
+ ### Comparison Operators
130
+
131
+ ![comparison-operators-bytes.png](docs/benchmarks/numba-v-0.65.1/comparison-operators-bytes.png)
132
+ ![comparison-operators-strings.png](docs/benchmarks/numba-v-0.65.1/comparison-operators-strings.png)
133
+ ![stringdtype-comparison.png](docs/benchmarks/numba-v-0.65.1/stringdtype-comparison.png)
134
+
135
+ ### Occurrence Information
136
+
137
+ ![char-occurrence-bytes.png](docs/benchmarks/numba-v-0.65.1/char-occurrence-bytes.png)
138
+ ![char-occurrence-strings.png](docs/benchmarks/numba-v-0.65.1/char-occurrence-strings.png)
139
+ ![stringdtype-occurrence.png](docs/benchmarks/numba-v-0.65.1/stringdtype-occurrence.png)
140
+
141
+ ### Property Information
142
+
143
+ ![char-properties-bytes.png](docs/benchmarks/numba-v-0.65.1/char-properties-bytes.png)
144
+ ![char-properties-strings.png](docs/benchmarks/numba-v-0.65.1/char-properties-strings.png)
145
+ ![char-numerics-strings.png](docs/benchmarks/numba-v-0.65.1/char-numerics-strings.png)
146
+ ![stringdtype-properties.png](docs/benchmarks/numba-v-0.65.1/stringdtype-properties.png)
147
+ ![stringdtype-numerics.png](docs/benchmarks/numba-v-0.65.1/stringdtype-numerics.png)
148
+
149
+ The previous Numba 0.59 matrix is archived under
150
+ [benchmarks/numba-v-0.59](benchmarks/numba-v-0.59/).
151
+
152
+ ## Compatibility
153
+
154
+ `charex` targets Numba 0.65.1 and the NumPy ranges tested by that Numba release:
155
+
156
+ - Python `>=3.10,<3.15`
157
+ - Numba `>=0.65.1,<0.66`
158
+ - NumPy `>=1.22,<1.27` or `>=2.0,<2.5`
159
+ - llvmlite `0.47.x`
160
+
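The ranges above can be checked against an environment with a few tuple comparisons. This is a standard-library sketch, not something the package provides: `release_tuple`, `numpy_in_range`, `numba_in_range`, and `python_in_range` are hypothetical helper names, and pre-release suffixes are simply ignored rather than handled according to PEP 440.

```python
import re


def release_tuple(version):
    """Return the leading numeric components, e.g. '2.4.6' -> (2, 4, 6)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def numpy_in_range(version):
    # NumPy >=1.22,<1.27 or >=2.0,<2.5
    release = release_tuple(version)
    return (1, 22) <= release < (1, 27) or (2, 0) <= release < (2, 5)


def numba_in_range(version):
    # Numba >=0.65.1,<0.66
    return (0, 65, 1) <= release_tuple(version) < (0, 66)


def python_in_range(version):
    # Python >=3.10,<3.15
    return (3, 10) <= release_tuple(version) < (3, 15)


print(numpy_in_range("2.4.6"), numba_in_range("0.65.1"), python_in_range("3.12.8"))
```

In a real environment the version strings would come from `numpy.__version__`, `numba.__version__`, and `platform.python_version()`.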
161
+ `np.strings` is available on NumPy 2.x only. On NumPy 1.x, `charex` registers the
162
+ `np.char` overloads and skips `np.strings`.
163
+
164
+ ## Development
165
+
166
+ Install test dependencies:
167
+
168
+ ```bash
169
+ python -m pip install -e ".[test]"
170
+ ```
171
+
172
+ Run tests:
173
+
174
+ ```bash
175
+ pytest -q
176
+ ```
177
+
178
+ Run the representative behavior audit:
179
+
180
+ ```bash
181
+ python docs/exploration/string_array_shape_audit.py --methods representative --api all --dtype all
182
+ ```
183
+
184
+ Run the benchmark smoke test:
185
+
186
+ ```bash
187
+ python benchmarks/benchmark.py --size 50000 --repeat 5
188
+ ```
189
+
190
+ Install benchmark plotting dependencies and write CSV/PNG output:
191
+
192
+ ```bash
193
+ python -m pip install -e ".[bench]"
194
+ python benchmarks/benchmark.py --size 50000 --repeat 5 --plot
195
+ ```
196
+
197
+ Regenerate the full benchmark matrix from the repository root:
198
+
199
+ ```bash
200
+ python -m pip install -e ".[bench]"
201
+ # Use a fresh NUMBA_CACHE_DIR for release matrices
202
+ CACHE_DIR=$(mktemp -d /tmp/charex-numba-cache.XXXXXX)
203
+ NUMBA_CACHE_DIR="$CACHE_DIR" PYTHONPATH=. \
204
+ python benchmarks/matrix.py --size 250000 --repeat 15
205
+ ```
206
+
207
+ CI runs Python 3.10-3.14 across representative NumPy 1.x and 2.x jobs with
208
+ Numba 0.65.1. The benchmark matrix above was generated on Python 3.12.8,
209
+ NumPy 2.4.6, Numba 0.65.1, and llvmlite 0.47.0.
@@ -0,0 +1,5 @@
1
+ from . import _stringdtype_support as _stringdtype
2
+ from . import _char_compat as _char
3
+ from . import _overloads as _strings
4
+
5
+ __all__ = []