mlsort-0.1.0-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,135 @@
+ Metadata-Version: 2.4
+ Name: mlsort
+ Version: 0.1.0
+ Summary: ML-guided sorting backend selector with install-time benchmarking
+ Author: Siddharth Chaudhary
+ License: MIT License
+
+ Copyright (c) 2025 Siddharth Chaudhary
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ Project-URL: Homepage, https://github.com/sidcoding/mlsort
+ Project-URL: Repository, https://github.com/sidcoding/mlsort
+ Project-URL: Issues, https://github.com/sidcoding/mlsort/issues
+ Keywords: sorting,machine-learning,numpy,performance,benchmark,timsort,radix,counting-sort,decision-tree
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3 :: Only
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Software Development :: Libraries
+ Classifier: Topic :: System :: Benchmark
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: numpy>=1.24
+ Requires-Dist: scikit-learn>=1.3
+ Requires-Dist: scipy>=1.10
+ Requires-Dist: joblib>=1.3
+ Dynamic: license-file
+
+ # mlsort
+
+ ML-guided sorting backend selector. mlsort chooses between Python's built-in Timsort, NumPy's sorts, and integer-only counting and radix sorts based on cheap properties estimated from a sample of your data. The defaults are safe: ML-based selection activates only for large arrays.
+
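The package ships a `mlsort/features.py` module (listed in the RECORD), but its contents are not part of this diff. As a rough, hypothetical illustration of what "cheap, sampled properties" can mean, the sketch below estimates a few statistics from a fixed-size random sample of a numeric list. The function name `sampled_features` and the chosen features are assumptions for illustration, not mlsort's API.

```python
import random
from typing import Dict, Sequence, Union

Number = Union[int, float]


def sampled_features(data: Sequence[Number], sample_size: int = 1024, seed: int = 0) -> Dict[str, float]:
    """Hypothetical cheap features computed from a random sample of numeric `data`.

    Not mlsort's actual feature set; an illustration of the idea only.
    """
    n = len(data)
    k = min(sample_size, n)
    rng = random.Random(seed)
    # Sorting the sampled indices keeps the sampled values in their input order.
    idx = sorted(rng.sample(range(n), k))
    sample = [data[i] for i in idx]

    # bool is a subclass of int, so exclude it from the integer-only check.
    is_int = all(isinstance(x, int) and not isinstance(x, bool) for x in sample)
    value_range = float(max(sample) - min(sample)) if sample else 0.0
    # Fraction of consecutive sampled pairs that are already non-decreasing.
    in_order = sum(1 for a, b in zip(sample, sample[1:]) if a <= b)
    sortedness = in_order / (k - 1) if k > 1 else 1.0
    distinct_ratio = len(set(sample)) / k if k else 1.0

    return {
        "n": float(n),
        "is_int": float(is_int),
        "value_range": value_range,
        "sortedness": sortedness,
        "distinct_ratio": distinct_ratio,
    }
```

Because only `sample_size` elements are inspected (plus `len(data)`), the cost stays bounded no matter how large the input is, which is what makes such features cheap enough to compute before every large sort.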
+ ## Install
+
+ ```bash
+ pip install mlsort
+ ```
+
+ Optionally, run a one-time initialization that writes the selector's thresholds and optimized size cutoffs (recommended once per machine or user):
+
+ ```bash
+ mlsort-init  # all options are optional; see below
+ ```
+
+ ## Quick usage
+
+ Top-level API:
+
+ ```python
+ from mlsort import sort, select_algorithm
+
+ data = [3, 1, 2, 5, 4]
+ algo = select_algorithm(data)  # e.g., 'timsort' or a NumPy backend
+ out = sort(data)  # returns a new sorted list
+
+ # Same keyword options as Python's built-in sorted()
+ out_desc = sort(data, reverse=True)
+ out_by_len = sort(["aa", "b", "cccc"], key=len)  # key forces built-in Timsort
+ ```
+
+ Behavior summary:
+ - Mixed, object, and string inputs default to built-in Timsort for safety and compatibility.
+ - Passing a `key` function forces built-in Timsort, because the NumPy, counting, and radix backends do not support `key`.
+ - `reverse=True` is supported by all backends; non-Timsort backends sort in ascending order and then reverse the result.
+ - Small arrays use Timsort and medium arrays use NumPy quicksort; the ML decision runs only for large arrays (see `cutoff_n` and `activation_n` below).
+
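A minimal sketch of the `key` and `reverse` rules above, assuming a backend that takes a list and returns it in ascending order. `sort_with_options` and its `backend` parameter are hypothetical names for illustration; mlsort's real `sort` lives in `mlsort/api.py` and may be structured differently. Input-type checks are deliberately left out here.

```python
from typing import Any, Callable, Iterable, List, Optional

Backend = Callable[[List[Any]], List[Any]]


def sort_with_options(
    data: Iterable[Any],
    backend: Backend,
    key: Optional[Callable[[Any], Any]] = None,
    reverse: bool = False,
) -> List[Any]:
    """Hypothetical handling of `key` and `reverse` around an ascending-only backend."""
    values = list(data)  # work on a copy and return a new list, like sorted()
    if key is not None:
        # Array backends cannot apply a key function, so fall back to built-in Timsort.
        return sorted(values, key=key, reverse=reverse)
    out = backend(values)  # assumed to return values in ascending order
    if reverse:
        out.reverse()  # reverse after sorting, as described for non-Timsort backends
    return out
```

For plain numbers, reversing an ascending result compares equal to `sorted(data, reverse=True)`, which is why a post-sort reversal is a reasonable way to support `reverse` on backends that only sort ascending.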
+ ## CLI: initialize thresholds (optional)
+
+ ```bash
+ # --samples:   training samples (default 1200)
+ # --max-n:     largest array size to consider (default 200000)
+ # --seed:      random seed (default: MLSORT_SEED, else 42)
+ # --artifacts: output directory (default: MLSORT_ARTIFACTS_DIR, else the OS cache)
+ mlsort-init \
+   --samples 1200 \
+   --max-n 200000 \
+   --seed 42 \
+   --artifacts /path/to/cache
+ ```
+
+ This writes `thresholds.json` to the artifacts directory and optimizes two size thresholds:
+ - `cutoff_n`: below this size, Timsort is always used.
+ - `activation_n`: the ML decision runs only at or above this size; between `cutoff_n` and `activation_n`, a fast default (NumPy quicksort) is used.
+
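The two thresholds split inputs into three size tiers. The sketch below shows that split, plus a tolerant reader for `thresholds.json`. It assumes the file is a JSON object with `cutoff_n` and `activation_n` keys (the README names the thresholds but not the file's schema), and the fallback values are hypothetical placeholders, not mlsort's defaults.

```python
import json
from pathlib import Path
from typing import Tuple

# Hypothetical placeholders used when no valid thresholds file exists;
# these are NOT mlsort's real defaults.
DEFAULT_CUTOFF_N = 1_000
DEFAULT_ACTIVATION_N = 50_000


def parse_thresholds(text: str) -> Tuple[int, int]:
    """Parse cutoff_n/activation_n from JSON text (assumed schema), or return placeholders."""
    try:
        raw = json.loads(text)
        cutoff_n = int(raw["cutoff_n"])
        activation_n = int(raw["activation_n"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing keys, or non-numeric values.
        return DEFAULT_CUTOFF_N, DEFAULT_ACTIVATION_N
    # Keep the tiers ordered even if the file is inconsistent.
    return cutoff_n, max(cutoff_n, activation_n)


def load_thresholds(path: Path) -> Tuple[int, int]:
    """Read thresholds.json from disk; a missing or undecodable file yields placeholders."""
    try:
        return parse_thresholds(path.read_text())
    except (OSError, UnicodeDecodeError):
        return DEFAULT_CUTOFF_N, DEFAULT_ACTIVATION_N


def size_tier(n: int, cutoff_n: int, activation_n: int) -> str:
    """Map an input length to one of the three tiers described in the README."""
    if n < cutoff_n:
        return "timsort"          # below cutoff_n: always Timsort
    if n < activation_n:
        return "numpy-quicksort"  # between the two thresholds: fast fixed default
    return "ml-decision"          # at or above activation_n: consult the model
```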
+ ## Configuration
+
+ These environment variables control behavior:
+
+ - `MLSORT_ARTIFACTS_DIR`: directory for cached artifacts (default: the OS cache directory, e.g., `~/Library/Caches/mlsort` on macOS).
+ - `MLSORT_ENABLE_INSTALL_BENCH=1`: allow benchmarking during lazy initialization on first use.
+ - `MLSORT_INIT_ON_IMPORT=1`: opt in to running a short initialization automatically on first import if artifacts are missing.
+ - `MLSORT_SEED=...`: random seed that makes benchmarking deterministic.
+ - `MLSORT_DEBUG=1`: enable debug logs showing the selected algorithm and paths.
+
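A sketch of how these variables might be resolved, assuming `1` enables a flag (accepting other truthy strings is an extra assumption) and that the seed falls back to 42, as documented for `mlsort-init --seed`. The non-macOS cache locations (`XDG_CACHE_HOME` or `~/.cache` on Linux, `LOCALAPPDATA` on Windows) are common conventions chosen here for illustration; the README only gives the macOS path. The helper names are hypothetical, and passing `env` explicitly keeps them easy to test.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # accepting more than "1" is an assumption


def _env(env: Optional[Mapping[str, str]]) -> Mapping[str, str]:
    """Use the given mapping, or the real process environment by default."""
    return os.environ if env is None else env


def artifacts_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """MLSORT_ARTIFACTS_DIR if set, else a per-OS user cache directory (assumed conventions)."""
    e = _env(env)
    override = e.get("MLSORT_ARTIFACTS_DIR")
    if override:
        return Path(override).expanduser()
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Caches" / "mlsort"  # the path given in the README
    if sys.platform.startswith("win"):
        base = e.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / "mlsort"
    base = e.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "mlsort"


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """True for MLSORT_DEBUG=1 and similar boolean-style variables."""
    return _env(env).get(name, "").strip().lower() in TRUTHY


def benchmark_seed(env: Optional[Mapping[str, str]] = None) -> int:
    """MLSORT_SEED when it parses as an integer, else 42."""
    try:
        return int(_env(env).get("MLSORT_SEED", "42"))
    except ValueError:
        return 42
```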
+ ## Supported algorithms
+
+ - Python Timsort (`list.sort`)
+ - NumPy quicksort and mergesort
+ - Counting sort (integers only; guarded by a value-range check to avoid excessive memory use)
+ - LSD radix sort (integers only)
+
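For reference, here are textbook pure-Python versions of the two integer-only algorithms. mlsort's own implementations are in `mlsort/algorithms.py` and are not shown in this diff. `MAX_COUNTING_RANGE` is a hypothetical guard value, since the README does not state the actual range limit.

```python
from typing import List, Sequence

MAX_COUNTING_RANGE = 1 << 20  # hypothetical guard; mlsort's real limit is not documented here


def counting_sort(values: Sequence[int], max_range: int = MAX_COUNTING_RANGE) -> List[int]:
    """Counting sort for integers; refuses inputs whose value range would need too much memory."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo + 1
    if span > max_range:
        raise ValueError(f"value range {span} exceeds guard {max_range}")
    counts = [0] * span
    for v in values:
        counts[v - lo] += 1
    out: List[int] = []
    for offset, c in enumerate(counts):
        if c:
            out.extend([lo + offset] * c)
    return out


def radix_sort_lsd(values: Sequence[int], bits: int = 8) -> List[int]:
    """LSD radix sort for integers using 2**bits buckets per pass.

    Negative values are handled by shifting every value by -min(values),
    so all keys are non-negative.
    """
    if not values:
        return []
    lo = min(values)
    keys = [v - lo for v in values]  # all keys >= 0
    mask = (1 << bits) - 1
    max_key = max(keys)
    shift = 0
    while (max_key >> shift) > 0:  # stop once every remaining digit is zero
        buckets: List[List[int]] = [[] for _ in range(1 << bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)  # stable: keeps previous pass order
        keys = [k for bucket in buckets for k in bucket]
        shift += bits
    return [k + lo for k in keys]
```

Counting sort costs O(n + range) time and O(range) memory, which is why it needs a range guard. Each LSD radix pass costs O(n + 2**bits), with one pass per `bits`-wide digit of the largest shifted key.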
+ ## Safety and limits
+
+ - Always-safe fallback: if selection fails or the input types are unsupported, we use built-in Timsort.
+ - Type handling: strings, bytes, and mixed objects use Timsort. Numeric-only arrays may use NumPy or the integer algorithms.
+ - Resource bounds: counting and radix sorts are used only when safe, and the ML decision is skipped for small and medium arrays to avoid overhead.
+
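A sketch of the fallback policy described above: classify the input, run a faster backend only for supported kinds, and fall back to built-in `sorted()` (Timsort) when the input is unsupported or the backend raises. `classify`, `safe_sort`, and the kind labels are hypothetical. The exact type checks are a deliberately conservative choice that also sends `bool` values and NumPy scalar types to Timsort.

```python
from typing import Any, Callable, List, Sequence

Backend = Callable[[List[Any]], List[Any]]


def classify(values: Sequence[Any]) -> str:
    """Label the input as 'int', 'numeric', or 'object' (hypothetical labels).

    Exact type checks are conservative: bool, other int/float subclasses,
    and NumPy scalar types all land in 'object' and therefore go to Timsort.
    """
    kinds = {type(v) for v in values}
    if kinds <= {int}:
        return "int"
    if kinds <= {int, float}:
        return "numeric"
    return "object"


def safe_sort(values: Sequence[Any], backend: Backend, allowed: Sequence[str] = ("int", "numeric")) -> List[Any]:
    """Use `backend` only for supported input kinds; otherwise, or on any error, use sorted().

    An integer-only backend (counting or radix sort) would pass allowed=("int",).
    """
    data = list(values)
    if classify(data) not in allowed:
        return sorted(data)  # strings, bytes, mixed objects: built-in Timsort
    try:
        return backend(list(data))  # pass a copy so a failing backend cannot corrupt `data`
    except Exception:
        return sorted(data)  # always-safe fallback
```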
+ ## Python versions
+
+ Tested on Python 3.9–3.11 in CI.
+
+ ## Troubleshooting
+
+ - Selection is slower than a single baseline sort: make sure you have run `mlsort-init` and that your arrays reach the activation threshold (`activation_n`). For mostly small arrays, Timsort or NumPy is chosen automatically.
+ - Custom cache location: set `MLSORT_ARTIFACTS_DIR` before running `mlsort-init` or your program.
+ - Need full control: call `select_algorithm(...)` to see what would be chosen, then run your preferred sort.
+
@@ -0,0 +1,22 @@
+ mlsort/__init__.py,sha256=49ZFRUBmCcD_YpHDLtAvb6CjCOAUoDqczL0c5pTWhPs,1121
+ mlsort/algorithms.py,sha256=MgOOe9SHy9D_af7siDS4jWtuLK6alhIv8sPusqCx9qI,4475
+ mlsort/api.py,sha256=T1T_ND-ybfld0FTHyjAihhNgPTcjcWgXnv50bnVnoKw,6026
+ mlsort/baseline.py,sha256=2nZrEY7P5QAQ8RPOxqNz47rR_WZMyd3iONOSR38u_-Y,1104
+ mlsort/benchmark.py,sha256=Ez_-HOnbvzfZD0323Nv_8vGj3xhENflztD-7IOEAalo,3713
+ mlsort/cli_bench_compare.py,sha256=HH1C8H8IpWWdORT81_1gsO_je7emLW9NEIpqPedDbgw,1771
+ mlsort/cli_bench_install.py,sha256=g28V9TZ_b5rJIEbV9LAkXZWYZaUMYQK10npGGxTZ5jI,936
+ mlsort/cli_init.py,sha256=kDcnne1lLTRg5IEZVBOYiV8kkzb_5JHmTybZEvvPpKw,1721
+ mlsort/cli_optimize_cutoffs.py,sha256=6De71xb93z6JScfMIduKkDe9DZ7xaG2Pg085JQe6HB0,1216
+ mlsort/config.py,sha256=3Qzumm41uCvseH9LbRaDe06ffFOsJ3k3f20pM_bdjg8,966
+ mlsort/data.py,sha256=HHtffrqOE15jLRxN6sc__mXBKsptT-USLxsZhq09SIc,3246
+ mlsort/decision.py,sha256=YB3epa2L7Wa1faFmbyTSNjdGj0NigO2mro7K_k5IklA,1527
+ mlsort/features.py,sha256=MJOwPnC4z8VDwiA9PXBgjwSr5r9djup7ZieGzSmwMbg,5303
+ mlsort/installer.py,sha256=M7Dj2lMEecNblWY4YoMBME3fjqNf8L2BolNbvLcYUxE,4904
+ mlsort/model.py,sha256=OY6b_04unIIbjrjmQ4LKrG52TmB7FvUBbflcBNg2-d0,2378
+ mlsort/optimize.py,sha256=7Yi6tmiJcnj_6NtDfViuxzY1nVR3zoq-sbOj0v7yEis,2945
+ mlsort-0.1.0.dist-info/licenses/LICENSE,sha256=yzOA5llIyAHw7tVsir3l5NgRm1_pkvXy2r4bUFcZY0g,1076
+ mlsort-0.1.0.dist-info/METADATA,sha256=m4fy4EuvfqrB-enbeFxXrEQZb0nhbErdEaqAooD05DU,5794
+ mlsort-0.1.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ mlsort-0.1.0.dist-info/entry_points.txt,sha256=HKRZnDWd50NuGw9uFEhAUZE6OOEKl74MT3J3yHL-1Is,218
+ mlsort-0.1.0.dist-info/top_level.txt,sha256=0tl8OhYGP3bgyXuS76DsDreFASPKloMccz5pGfteKp0,7
+ mlsort-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,5 @@
+ Wheel-Version: 1.0
+ Generator: setuptools (80.9.0)
+ Root-Is-Purelib: true
+ Tag: py3-none-any
+
@@ -0,0 +1,5 @@
+ [console_scripts]
+ mlsort-bench-compare = mlsort.cli_bench_compare:main
+ mlsort-bench-install = mlsort.cli_bench_install:main
+ mlsort-init = mlsort.cli_init:main
+ mlsort-optimize-cutoffs = mlsort.cli_optimize_cutoffs:main
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Siddharth Chaudhary
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1 @@
+ mlsort