polyleven 0.10.0__cp310-cp310-win_arm64.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,123 @@
Metadata-Version: 2.4
Name: polyleven
Version: 0.10.0
Summary: A fast C-implemented library for Levenshtein distance
Maintainer-email: Fujimoto Seiji <fujimoto@ceptord.net>
Project-URL: github, https://github.com/fujimotos/polyleven
Keywords: Levenshtein distance
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE
Dynamic: license-file

==============================================
Polyleven -- Fast Pythonic Levenshtein Library
==============================================

:License: MIT License

1. Introduction
===============

polyleven is a Pythonic Levenshtein distance library that:

- Is **fast** regardless of input length, and hence can be applied to
  both short inputs (like English words) and long inputs (like DNA sequences).

- Is **stand-alone**, depending only on core Python packages.

- Is distributed under the **MIT License**, and hence can be used freely
  in private projects.

2. How to install
=================

The official package is available on PyPI::

    $ pip install polyleven

3. How to use
=============

Polyleven provides a single interface function ``levenshtein()``. You
can use this function to measure the edit distance between two strings.

>>> from polyleven import levenshtein
>>> levenshtein('aaa', 'ccc')
3
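
``levenshtein()`` reports the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. For checking small results by hand, the textbook Wagner-Fischer dynamic programme below computes the same quantity in plain Python. It is only a reference sketch: ``levenshtein_reference`` is a hypothetical name, and the code is not meant to reflect polyleven's C implementation.

```python
def levenshtein_reference(a: str, b: str) -> int:
    """Plain Wagner-Fischer dynamic programming, O(len(a) * len(b)) time.

    Hypothetical helper for illustration only; not polyleven's C code.
    """
    # prev[j] holds the distance between the a-prefix processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


print(levenshtein_reference('aaa', 'ccc'))         # 3, as in the example above
print(levenshtein_reference('kitten', 'sitting'))  # 3
```

Only two rows of the table are kept at a time, so memory stays proportional to ``len(b)``.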

If you only care about distances up to a certain threshold, you can
pass the maximum threshold as the third argument.

>>> levenshtein('acc', 'ccc', 1)
1
>>> levenshtein('aaa', 'ccc', 1)
2

In general, you can gain a noticeable speed boost with a threshold
:math:`k < 3`.
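
Judging from the second example above (the distance between ``'aaa'`` and ``'ccc'`` is 3, yet the call with threshold 1 returns 2), a thresholded call appears to return ``k + 1`` whenever the true distance exceeds ``k``. One textbook way a threshold saves work is sketched below in plain Python: only cells with ``|i - j| <= k`` can hold a value of at most ``k``, so the table is filled only inside that diagonal band, and the loop stops as soon as a whole row exceeds ``k``. This Ukkonen-style banding is swapped in for illustration; polyleven's C code may use different algorithms, and ``bounded_levenshtein`` is a hypothetical name.

```python
def bounded_levenshtein(a: str, b: str, k: int) -> int:
    """Return the edit distance if it is <= k, otherwise k + 1.

    Hypothetical sketch: a banded Wagner-Fischer table, not polyleven's code.
    """
    if abs(len(a) - len(b)) > k:
        return k + 1  # the length difference alone already exceeds k
    big = k + 1  # every value above k is capped to this sentinel
    # prev[j] = capped distance(a[:i-1], b[:j]); cells outside the band are big
    prev = [j if j <= k else big for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [big] * (len(b) + 1)
        if i <= k:
            curr[0] = i
        # only columns with |i - j| <= k can hold a value <= k
        lo = max(1, i - k)
        hi = min(len(b), i + k)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost, big)
        if min(curr) > k:
            return k + 1  # early exit: every alignment path already exceeds k
        prev = curr
    return prev[len(b)]


print(bounded_levenshtein('acc', 'ccc', 1))  # 1
print(bounded_levenshtein('aaa', 'ccc', 1))  # 2, i.e. k + 1
```

The band touches roughly ``(2k + 1) * len(a)`` cells instead of ``len(a) * len(b)``, which is why a small ``k`` pays off most on long inputs.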

4. Benchmark
============

4.1. English Words
------------------

To compare Polyleven with other Pythonic edit distance libraries,
a million word pairs were generated from `SCOWL`_.

.. _SCOWL: http://wordlist.aspell.net/

For each library, I measured how long it takes to evaluate all of
these word pairs. The following table summarises the results:

============================== ============ ================
Function Name                  TIME[sec]    SPEED[pairs/s]
============================== ============ ================
edlib                          4.763        208216
editdistance                   1.943        510450
jellyfish.levenshtein_distance 0.722        1374081
distance.levenshtein           0.623        1591396
Levenshtein.distance           0.500        1982764
polyleven.levenshtein          0.431        2303420
============================== ============ ================
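
The two columns are linked by SPEED = (number of pairs) / TIME. A minimal timing harness in that spirit is sketched below; the name ``benchmark``, its signature and the use of ``time.perf_counter`` are assumptions, not the script that produced the table.

```python
import time
from typing import Callable, Sequence, Tuple


def benchmark(func: Callable[[str, str], int],
              pairs: Sequence[Tuple[str, str]]) -> Tuple[float, float]:
    """Call func on every pair and return (elapsed seconds, pairs per second).

    Hypothetical harness for illustration; not the original benchmark script.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    for a, b in pairs:
        func(a, b)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading for empty or extremely fast runs
    speed = len(pairs) / elapsed if elapsed > 0 else float("inf")
    return elapsed, speed
```

With polyleven installed, ``benchmark(levenshtein, pairs)`` would time ``levenshtein`` over a list ``pairs`` of ``(str, str)`` tuples; the same call with another library's function gives a comparable row.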

4.2. Longer Inputs
------------------

To evaluate the efficiency for longer inputs, I created 5000 pairs
of random strings of sizes 16, 32, 64, 128, 256, 512 and 1024.

I then measured how long each library takes to process these entries. [#fn1]_

============ ===== ===== ===== ===== ===== ===== ======
Library      N=16  N=32  N=64  N=128 N=256 N=512 N=1024
============ ===== ===== ===== ===== ===== ===== ======
edlib        0.040 0.063 0.094 0.205 0.432 0.908 2.089
editdistance 0.027 0.049 0.086 0.178 0.336 0.740 58.139
jellyfish    0.009 0.032 0.118 0.470 1.874 8.877 42.848
distance     0.007 0.029 0.109 0.431 1.726 6.950 27.998
Levenshtein  0.006 0.022 0.085 0.336 1.328 5.286 21.097
polyleven    0.003 0.005 0.010 0.043 0.149 0.550 2.109
============ ===== ===== ===== ===== ===== ===== ======
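
The description above leaves some details open: the alphabet, and whether 5000 pairs were drawn per size or in total. The sketch below assumes lowercase ASCII letters, 5000 pairs per size and a fixed seed; ``make_random_pairs`` and ``SIZES`` are hypothetical names, not part of the original benchmark.

```python
import random
import string
from typing import List, Tuple

# Input sizes matching the N=16 ... N=1024 columns of the table
SIZES = [16, 32, 64, 128, 256, 512, 1024]


def make_random_pairs(size: int, count: int = 5000,
                      alphabet: str = string.ascii_lowercase,
                      seed: int = 0) -> List[Tuple[str, str]]:
    """Generate `count` pairs of random strings, each of length `size`.

    Hypothetical helper: the alphabet and seed are assumptions, since the
    benchmark description does not specify them.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs get the same data

    def rand_str() -> str:
        return "".join(rng.choices(alphabet, k=size))

    return [(rand_str(), rand_str()) for _ in range(count)]
```

Building ``{n: make_random_pairs(n) for n in SIZES}`` yields one dataset per column of the table, each of which could then be fed to a timing loop.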

4.3. List of Libraries
----------------------

============ ======= ==========================================
Library      Version URL
============ ======= ==========================================
edlib        v1.2.1  https://github.com/Martinsos/edlib
editdistance v0.4    https://github.com/aflc/editdistance
jellyfish    v0.5.6  https://github.com/jamesturk/jellyfish
distance     v0.1.3  https://github.com/doukremt/distance
Levenshtein  v0.12   https://github.com/ztane/python-Levenshtein
polyleven    v0.3    https://github.com/fujimotos/polyleven
============ ======= ==========================================

.. [#fn1] Measured using Python 3.5.3 on Debian Jessie with Intel Core
   i3-4010U (1.70GHz)

@@ -0,0 +1,6 @@
polyleven.cp310-win_arm64.pyd,sha256=2a3Sg1MBJDjuCOwHzr1o0DOHROjj-HJEOqfDWj8uDfg,12800
polyleven-0.10.0.dist-info/licenses/LICENSE,sha256=xnZjXO5s8lo3gIlvxuEZIT0bkSbZWKupIH9qtavDHwE,1344
polyleven-0.10.0.dist-info/METADATA,sha256=Vb8eMlO2R4B9gw1miRKRU3M5ejZ3ldtHC9hNS3Lw3hc,4374
polyleven-0.10.0.dist-info/WHEEL,sha256=n3R1Sz-ViPg5U6Qx-ZSHSkX2Y2FTsVx0_jJlFgqycH0,102
polyleven-0.10.0.dist-info/top_level.txt,sha256=12GbQ6DLcEtqgc30L3CguDVut0T-AYu2LoAm0fY4-cY,21
polyleven-0.10.0.dist-info/RECORD,,
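
Each line of the RECORD file above has the form ``path,sha256=<digest>,<size>``: the digest is the SHA-256 hash of the file, encoded as URL-safe base64 with the trailing ``=`` padding removed, and the size is in bytes. RECORD lists itself with both fields empty, since it cannot contain its own hash. A minimal sketch for recomputing or checking an entry follows; ``record_entry`` and ``verify_entry`` are hypothetical helper names, and the simple comma split assumes paths without commas.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build one wheel RECORD line: path,sha256=<urlsafe-b64-nopad>,<size>.

    Hypothetical helper name; the line layout follows the wheel RECORD format.
    """
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 alphabet ('-' and '_'), with '=' padding stripped
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def verify_entry(line: str, data: bytes) -> bool:
    """Check raw file bytes against an existing RECORD line (comma-free paths)."""
    path, _hash, _size = line.rsplit(",", 2)
    return record_entry(path, data) == line
```

Reading, for example, ``polyleven-0.10.0.dist-info/top_level.txt`` out of the wheel and passing its bytes to ``verify_entry`` along with its line above should confirm the 21-byte size and the digest.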

@@ -0,0 +1,25 @@
Copyright (c) 2021 Fujimoto Seiji <fujimoto@ceptord.net>
Copyright (c) 2021 Max Bachmann <kontakt@maxbachmann.de>
Copyright (c) 2022 Nick Mazuk
Copyright (c) 2022 Michael Weiss <code@mweiss.ch>
Copyright (c) 2024 Alex Morgan <lexyym@gmail.com>
Copyright (c) 2026 Michael Mok <pmmmwh@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Binary file