bblean 0.6.0b2__cp313-cp313-win_amd64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,288 @@
1
+ Metadata-Version: 2.4
2
+ Name: bblean
3
+ Version: 0.6.0b2
4
+ Summary: BitBirch-Lean Python package
5
+ Author: The Miranda-Quintana Lab and other BitBirch developers
6
+ Author-email: Ramon Alain Miranda Quintana <quintana@chem.ufl.edu>, Krisztina Zsigmond <kzsigmond@ufl.edu>, Ignacio Pickering <ipickering@ufl.edu>, Kenneth Lopez Perez <klopezperez@chem.ufl.edu>, Miroslav Lzicar <miroslav.lzicar@deepmedchem.com>
7
+ License-Expression: GPL-3.0-only
8
+ Project-URL: homepage, https://github.com/mqcomplab/bblean.git
9
+ Project-URL: repository, https://github.com/mqcomplab/bblean.git
10
+ Project-URL: documentation, https://github.com/mqcomplab/bblean.git
11
+ Classifier: Intended Audience :: Science/Research
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Programming Language :: Python
14
+ Classifier: Topic :: Software Development
15
+ Classifier: Topic :: Scientific/Engineering
16
+ Classifier: Development Status :: 4 - Beta
17
+ Classifier: Operating System :: POSIX
18
+ Classifier: Operating System :: Unix
19
+ Classifier: Operating System :: MacOS
20
+ Classifier: Programming Language :: Python :: 3
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: Programming Language :: Python :: 3.13
24
+ Classifier: Programming Language :: Python :: Implementation :: CPython
25
+ Requires-Python: >=3.11
26
+ Description-Content-Type: text/markdown
27
+ License-File: LICENSE
28
+ Requires-Dist: scipy
29
+ Requires-Dist: rdkit
30
+ Requires-Dist: numpy>=2.0
31
+ Requires-Dist: pandas
32
+ Requires-Dist: psutil
33
+ Requires-Dist: matplotlib
34
+ Requires-Dist: colorcet
35
+ Requires-Dist: seaborn
36
+ Requires-Dist: scikit-learn
37
+ Requires-Dist: rich
38
+ Requires-Dist: typer
39
+ Requires-Dist: opentsne
40
+ Requires-Dist: umap-learn
41
+ Requires-Dist: pynndescent
42
+ Dynamic: license-file
43
+
44
+ <picture>
45
+ <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/mqcomplab/bblean/main/docs/src/_static/logo-dark-bw.svg">
46
+ <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/mqcomplab/bblean/main/docs/src/_static/logo-light-bw.svg">
47
+ <img alt="BitBIRCH-Lean logo" src="https://raw.githubusercontent.com/mqcomplab/bblean/main/docs/src/_static/logo-light-bw.svg">
48
+ </picture>
49
+ <br>
50
+ <br>
51
+
52
+ [![DOI](https://zenodo.org/badge/1051268662.svg)](https://doi.org/10.5281/zenodo.17139445)
53
+ [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
54
+ [![CI](https://github.com/mqcomplab/bblean/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/mqcomplab/bblean/actions/workflows/ci.yaml)
55
+ [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
56
+ ![Code coverage](https://img.shields.io/endpoint?url=https%3A%2F%2Fmqcomplab.github.io%2Fbblean%2Fcoverage%2Fcoverage-badge.json)
57
+
58
+ ## Overview
59
+
60
+ BitBIRCH-Lean is a high-throughput implementation of the BitBIRCH clustering
61
+ algorithm designed for very large molecular libraries.
62
+
63
+ If you find this software useful please cite the following articles:
64
+
65
+ - *BitBIRCH: efficient clustering of large molecular libraries*:
66
+ https://doi.org/10.1039/D5DD00030K
67
+ - *BitBIRCH Clustering Refinement Strategies*:
68
+ https://doi.org/10.1021/acs.jcim.5c00627
69
+ - *BitBIRCH-Lean*:
70
+ (preprint) https://www.biorxiv.org/content/10.1101/2025.10.22.684015v1
71
+
72
+ **NOTE**: BitBIRCH-Lean is currently beta software; expect minor breaking changes until
73
+ we hit version 1.0.
74
+
75
+ The [documentation](https://mqcomplab.github.io/bblean/devdocs) of the developer version is a work in progress. Please let us know if you find any issues.
76
+
77
+ ⚠️ **Important**: The default `threshold` is 0.3 and the default fingerprint kind is
78
+ *ecfp4*. We recommend setting `threshold` to 0.5-0.65 for *rdkit* fingerprints and
79
+ 0.3-0.4 for *ecfp4* or *ecfp6* fingerprints (although you may need further tuning for
80
+ your specific library / fingerprint set). For more information on tuning these
81
+ parameters see [the best
82
+ practices](https://mqcomplab.github.io/bblean/devdocs/user-guide/notebooks/bitbirch_best_practices.html)
83
+ and [parameter
84
+ tuning](https://mqcomplab.github.io/bblean/devdocs/user-guide/parameters.html) guides.
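To build some intuition for what `threshold` means, the sketch below assumes it is a cutoff on Tanimoto (Jaccard) similarity between binary fingerprints, the similarity measure used in the BitBIRCH papers. Exactly how BitBIRCH applies the cutoff depends on the merge criterion, so treat this as an illustration only. The `tanimoto` helper is hypothetical (not part of the `bblean` API), and the 8-bit fingerprints are made-up toy data.

```python
import numpy as np


def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto (Jaccard) similarity between two unpacked 0/1 fingerprints."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.count_nonzero(a | b)
    if union == 0:
        return 1.0  # convention assumed here: two all-zero vectors count as identical
    return np.count_nonzero(a & b) / union


# Two toy 8-bit fingerprints (made-up data, for illustration only)
fp1 = np.array([1, 1, 0, 1, 0, 0, 1, 0], dtype=np.uint8)
fp2 = np.array([1, 0, 0, 1, 0, 1, 1, 0], dtype=np.uint8)

# 3 shared on-bits / 5 on-bits in the union = 0.6
sim = tanimoto(fp1, fp2)
for threshold in (0.3, 0.65):
    print(threshold, sim >= threshold)  # 0.3 -> True, 0.65 -> False
```

The same pair passes a 0.3 cutoff but not a 0.65 cutoff, so the same library can be grouped quite differently depending on `threshold`, which is why it needs tuning per fingerprint type.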
85
+
86
+ ## Installation
87
+
88
+ BitBIRCH-Lean requires Python 3.11 or higher, and can be installed on Windows, Linux, or
89
+ macOS via pip, which automatically installs the C++ extensions:
90
+
91
+ ```bash
92
+ pip install bblean
93
+ bb --help
94
+ ```
95
+
96
+ We recommend installing `bblean` in a conda environment or a `venv`.
97
+
98
+ Memory usage and C++ extensions are most optimized for Linux / macOS. Windows is supported
99
+ on a best-effort basis; some releases may not have Windows support.
100
+
101
+ ### From source
102
+
103
+ To build from source instead (editable mode):
104
+
105
+ ```bash
106
+ git clone git@github.com:mqcomplab/bblean
107
+ cd bblean
108
+
109
+ conda env create --file ./environment.yaml
110
+ conda activate bblean
111
+
112
+ BITBIRCH_BUILD_CPP=1 pip install -e .
113
+
114
+ # If you want to build without the C++ extensions run this instead:
115
+ pip install -e .
116
+
117
+ bb --help
118
+ ```
119
+
120
+ If the extensions install successfully, they are used automatically whenever
121
+ BitBIRCH-Lean or its classes are used; no further configuration is needed.
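If you want to confirm that the compiled extension is present (for example, before opening a `C++` issue), one minimal diagnostic is to ask Python's import system whether it can locate it. The module name `bblean._cpp_similarity` is an assumption taken from this wheel's file list (`bblean/_cpp_similarity.cp313-win_amd64.pyd`). It is a private module, so the name may change between releases, and this check is not an official `bblean` API.

```python
import importlib.util


def cpp_extension_available(module_name: str = "bblean._cpp_similarity") -> bool:
    """Return True if Python can locate the compiled similarity extension."""
    try:
        # find_spec imports the parent package (bblean) to search its __path__
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package (or one of its imports) cannot be found
        return False


print(cpp_extension_available())
```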
122
+
123
+ If you run into any issues when installing the extensions, please open a GitHub issue
124
+ and tag it with `C++`.
125
+
126
+ ## CLI Quickstart
127
+
128
+ <div align="center">
129
+ <img src="bblean-demo-v2.gif" width="600" />
130
+ </div>
131
+
132
+ BitBIRCH-Lean provides a convenient command-line interface (CLI), `bb`. The CLI can be used to convert
133
+ SMILES files into compact fingerprint arrays, and cluster them in parallel or serial
134
+ mode with a single command, making it straightforward to triage collections with
135
+ millions of molecules. The CLI prints a run banner with the parameters used, memory
136
+ usage (when available), and elapsed timings so you can track each job at a glance.
137
+
138
+ The most important commands you need are:
139
+
140
+ - `bb fps-from-smiles`: Generate fingerprints from a `*.smi` file.
141
+ - `bb run` or `bb multiround`: Cluster the fingerprints.
142
+ - `bb plot-summary` or `bb plot-tsne`: Analyze the clusters.
143
+
144
+ A typical workflow is as follows:
145
+
146
+ 1. **Generate fingerprints from SMILES**: The repository ships with a ChEMBL
147
+ sample that you can use right away for testing:
148
+
149
+ ```bash
150
+ bb fps-from-smiles examples/chembl-33-natural-products-sample.smi
151
+ ```
152
+
153
+ This writes a packed fingerprint array to the current working directory (use
154
+ `--out-dir <dir>` for a different location). The naming convention is
155
+ `packed-fps-uint8-508e53ef.npy`, where `508e53ef` is a unique identifier (use `--name
156
+ <name>` if you prefer a different name). The packed `uint8` format is required for
157
+ maximum memory efficiency, so keep the default
158
+ `--pack` and `--dtype` values unless you have a very good reason to change them.
159
+ You can optionally split the output over multiple files for parallel processing with `--num-parts <num>`.
160
+
161
+ 2. **Cluster the fingerprints**: To cluster in serial mode, point `bb run` at the
162
+ generated array (or a directory with multiple `*.npy` files):
163
+
164
+ ```bash
165
+ bb run ./packed-fps-uint8-508e53ef.npy
166
+ ```
167
+
168
+ The outputs are stored in a directory such as `bb_run_outputs/504e40ef/`, where
169
+ `504e40ef` is a unique identifier (use `--out-dir <dir>` for a different location).
170
+ Additional flags control the BitBIRCH branching factor (`--branching`), threshold (`--threshold`),
171
+ and merge criterion. Optionally, cluster refinement can be performed with `--refine-num 1`.
172
+ Run `bb run --help` for details.
173
+
174
+ To cluster in parallel mode, use `bb multiround ./file-or-dir` instead. If pointed to
175
+ a directory with multiple `*.npy` files, the files are clustered in parallel and the
176
+ sub-trees are merged iteratively in intermediate rounds. For more information, run
177
+ `bb multiround --help`. Outputs are written by default to
178
+ `bb_multiround_outputs/<unique-id>/`.
179
+
180
+ 3. **Visualize the results**: You can plot a summary of the largest clusters with
181
+ `bb plot-summary <output-path> --top 20` (largest 20 clusters). Passing the optional `--smiles <path-to-file.smi>` argument
182
+ additionally generates a Murcko scaffold analysis. For a t-SNE
183
+ visualization try `bb plot-tsne <output-path> --top 20`.
184
+ t-SNE plots use [openTSNE](https://opentsne.readthedocs.io/en/latest/) as a backend,
185
+ which is a parallel, extremely fast implementation. We recommend you consult the corresponding
186
+ documentation for info on the available parameters.
187
+ Still, expect t-SNE plots to be slow for very large datasets (more than 1M molecules).
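To see why the packed `uint8` fingerprint format from the first step saves memory, here is a NumPy-only sketch that packs 2048-bit fingerprints (the size used in the Python quickstart) eight bits per byte with `np.packbits`. It uses random toy data, and it only illustrates the general idea; it does not claim that `bblean` uses exactly the same bit order internally.

```python
import numpy as np

n_molecules, n_features = 1_000, 2048
rng = np.random.default_rng(0)

# Random unpacked 0/1 fingerprints stored with one byte per bit (toy data)
unpacked = (rng.random((n_molecules, n_features)) < 0.1).astype(np.uint8)

# Pack 8 bits into each uint8 along the feature axis
packed = np.packbits(unpacked, axis=1)

print(unpacked.nbytes, packed.nbytes)  # 2048000 256000
print(packed.shape)  # (1000, 256)

# Unpacking (trimmed to n_features bits) recovers the original bits exactly
restored = np.unpackbits(packed, axis=1, count=n_features)
print(np.array_equal(restored, unpacked))  # True
```

Compared with one byte per bit, packing reduces memory by a factor of 8, and `np.unpackbits(..., count=n_features)` restores the original bits.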
188
+
189
+ ### Manually exploring clustering results
190
+
191
+ Every run directory contains a raw `clusters.pkl` file with the molecule indices for each
192
+ cluster, plus metadata in `*.json` files that captures the exact settings and
193
+ performance characteristics. A quick Python session is all you need to get started:
194
+
195
+ ```python
196
+ import pickle
197
+
198
+ clusters = pickle.load(open("bb_run_outputs/504e40ef/clusters.pkl", "rb"))
199
+ clusters[:2]
200
+ # [[321, 323, 326, 328, 337, ..., 9988, 9989],
201
+ # [5914, 5915, 5916, 5917, 5918, ..., 9990, 9991, 9992, 9993]]
202
+ ```
203
+
204
+ The indices refer to the position of each molecule in the order they were read from the
205
+ fingerprint files, making it easy to link back to your original SMILES records.
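Because the indices follow the input order, mapping clusters back to molecules is plain list indexing. The sketch below uses a hypothetical helper, `clusters_to_smiles` (not part of `bblean`), that takes the cluster index lists plus a list of SMILES in the same order as the fingerprints. For real data, loading that list with `bblean.load_smiles`, as in the Python quickstart, is one option, assuming no molecules were dropped during fingerprint generation (not verified here). The clusters and SMILES below are made up.

```python
from collections.abc import Sequence


def clusters_to_smiles(
    clusters: Sequence[Sequence[int]], smiles: Sequence[str], top: int = 3
) -> list[list[str]]:
    """Return the SMILES of the `top` largest clusters, largest first."""
    # sorted() is stable even with reverse=True, so equally sized clusters keep their order
    largest = sorted(clusters, key=len, reverse=True)[:top]
    return [[smiles[i] for i in cluster] for cluster in largest]


# Toy data: three clusters over six molecules (made up for illustration)
clusters = [[0, 2], [1, 3, 4], [5]]
smiles = ["CCO", "c1ccccc1", "CCN", "c1ccncc1", "c1ccoc1", "C"]
print(clusters_to_smiles(clusters, smiles, top=2))
# [['c1ccccc1', 'c1ccncc1', 'c1ccoc1'], ['CCO', 'CCN']]
```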
206
+
207
+ ## Python Quickstart and Examples
208
+
209
+ For an example of how to use the main `bblean` classes and functions consult
210
+ `examples/bitbirch_quickstart.ipynb`. The `examples/dataset_splitting.ipynb` notebook
211
+ contains an adaptation of a notebook by Pat Walters ([Some Thoughts on Splitting Chemical
212
+ Datasets](https://practicalcheminformatics.blogspot.com/2024/11/some-thoughts-on-splitting-chemical.html)).
213
+ More examples will be added soon!
214
+
215
+ A quick summary:
216
+
217
+ ```python
218
+ import pickle
219
+
220
+ import matplotlib.pyplot as plt
221
+ import numpy as np
222
+
223
+ import bblean
224
+ import bblean.plotting as plotting
225
+ import bblean.analysis as analysis
226
+
227
+ # Create the fingerprints and pack them into a numpy array, starting from a *.smi file
228
+ smiles = bblean.load_smiles("./examples/chembl-33-natural-products-sample.smi")
229
+ fps = bblean.fps_from_smiles(smiles, pack=True, n_features=2048, kind="rdkit")
230
+
231
+ # Fit the fingerprints (by default all bblean functions take *packed* fingerprints)
232
+ # A threshold of 0.5-0.65 is good for rdkit fingerprints, a threshold of 0.3-0.4
233
+ # is better for ECFPs
234
+ tree = bblean.BitBirch(branching_factor=50, threshold=0.65, merge_criterion="diameter")
235
+ tree.fit(fps)
236
+
237
+ # Refine the tree (if needed)
238
+ tree.set_merge(merge_criterion="tolerance-diameter", tolerance=0.0)
239
+ tree.refine_inplace(fps)
240
+
241
+ # Visualize the results
242
+ clusters = tree.get_cluster_mol_ids()
243
+ ca = analysis.cluster_analysis(clusters, fps, smiles)
244
+ plotting.summary_plot(ca, title="ChEMBL Sample")
245
+ plt.show()
246
+
247
+ # Save the resulting clusters, metrics, and fps
248
+ with open("./clusters.pkl", "wb") as f:
249
+ pickle.dump(clusters, f)
250
+ ca.dump_metrics("./metrics.csv")
251
+ np.save("./fps-packed-2048.npy", fps)
252
+ ```
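To pick up later where the quickstart leaves off, the saved files can be reloaded with `pickle` and `np.load`. The sketch below uses hypothetical helpers (`load_outputs`, `check_partition`) and toy stand-ins written to a temporary directory instead of real results. The sanity check assumes each molecule is assigned to exactly one cluster; that is an assumption of this sketch, not something stated above.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_outputs(clusters_path, fps_path):
    """Reload pickled clusters and a saved fingerprint array."""
    with open(clusters_path, "rb") as f:
        clusters = pickle.load(f)
    fps = np.load(fps_path)
    return clusters, fps


def check_partition(clusters, n_molecules):
    """Return True if indices 0..n_molecules-1 each appear in exactly one cluster."""
    seen = sorted(int(i) for cluster in clusters for i in cluster)
    return seen == list(range(n_molecules))


# Toy stand-ins for ./clusters.pkl and ./fps-packed-2048.npy (not real results)
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)
    with open(tmp / "clusters.pkl", "wb") as f:
        pickle.dump([[0, 3], [1, 2, 4]], f)
    np.save(tmp / "fps-packed-2048.npy", np.zeros((5, 256), dtype=np.uint8))

    clusters, fps = load_outputs(tmp / "clusters.pkl", tmp / "fps-packed-2048.npy")
    print(fps.shape, check_partition(clusters, len(fps)))  # (5, 256) True
```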
253
+
254
+ ## Public Python API and Documentation
255
+
256
+ By default all functions take *packed* fingerprints of dtype `uint8`. Many functions
257
+ support an `input_is_packed: bool` flag, which you can set to `False` if you need to
258
+ pass unpacked fingerprints (not recommended).
259
+
260
+ - Functions and classes whose names *start with an underscore* are considered private (such as
261
+ `_private_function(...)`) and should not be used, since they can be removed or
262
+ modified without warning.
263
+ - All functions and classes in *modules whose names start with an underscore* are also
264
+ considered private (such as `bblean._private_module.private_function(...)`) and should
265
+ not be used, since they can be removed or modified without warning.
266
+ - All other functions and classes are part of the stable public API and can be used.
267
+ However, expect minor breaking changes before we hit version 1.0.
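Under this convention, private names and modules begin with an underscore, as in `_private_function` and `bblean._private_module`. The stdlib-only sketch below lists the direct submodules of a package that fall on the public side of that rule. `public_submodules` is a hypothetical helper, and it is demonstrated on the standard-library `email` package (which has both public and underscore-prefixed submodules) so that it runs without `bblean` installed.

```python
import email  # stdlib package with both public and _private submodules
import pkgutil
from types import ModuleType


def public_submodules(package: ModuleType) -> list[str]:
    """Names of a package's direct submodules that do not start with an underscore."""
    return sorted(
        info.name
        for info in pkgutil.iter_modules(package.__path__)
        if not info.name.startswith("_")
    )


print(public_submodules(email))
```

After `import bblean`, calling `public_submodules(bblean)` would apply the same filter to BitBIRCH-Lean's own modules.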
268
+
269
+ ## Contributing
270
+
271
+ If you find a bug in BitBIRCH-Lean or have an issue with the usage
272
+ or documentation please open an issue in the GitHub issue tracker.
273
+
274
+ If you want to contribute to BitBIRCH-Lean with a bug fix, documentation improvements, or
275
+ usability, maintainability, or performance enhancements, please open an issue with your
276
+ idea/request (or directly open a PR from a fork if you prefer).
277
+
278
+ Currently we don't directly accept PRs with new features that have not been extensively
279
+ validated, but if you have an idea to improve the BitBIRCH algorithm you may want to
280
+ contact the Miranda-Quintana Lab; we are open to collaborations.
281
+
282
+ To contribute, first create a
283
+ [fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo),
284
+ then clone your fork (`git clone git@github.com:<user>/bblean`). We recommend you install
285
+ `pre-commit` (`pre-commit install --hook-type pre-push`), which will run some checks
286
+ before you push to your branch. After you have finished work on your branch, [open a
287
+ pull
288
+ request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request).
@@ -0,0 +1,31 @@
1
+ bblean/__init__.py,sha256=9cudBHEt0H5p0jKEvgrhLZIHPSzwNAx0uJRp-_iM32I,686
2
+ bblean/_config.py,sha256=WaONZilOWCLFdZulqWLKRqNM-ZLhY0YCXfwk-84FYmQ,1813
3
+ bblean/_console.py,sha256=Mk1hi1NdPw2HDmjWj1LLbCuV3vCxL5l6u2gXaEeOFBM,8021
4
+ bblean/_cpp_similarity.cp313-win_amd64.pyd,sha256=rwQ5EXqcPVWYLYmt4qEcdAEARwySvp5rgW4HANkyyBY,178688
5
+ bblean/_memory.py,sha256=eycXzXV_O_VEyIKpAv3QpbxtpB5WkBLChzm_e2Dqaw0,6892
6
+ bblean/_merges.py,sha256=xwFMJUPJ9VMujf2nSROx0NhsPoQ_R84KIxBF81x2hks,6432
7
+ bblean/_py_similarity.py,sha256=d1kbEc8lc0MgYsmW6nkFI-tV1Plo12e3bml32_8dkoU,9859
8
+ bblean/_timer.py,sha256=D1-_tTQFJqIQgzl4HSE__-P3Scw72EIVlNDaChJT8Qs,1402
9
+ bblean/_version.py,sha256=Z6NaqO7AvzfKUsoqEpOi7eBkzR_-GLsbF8CpiRFwVJo,746
10
+ bblean/analysis.py,sha256=apD5OgSoNGbIuBLSJFFzlUkVjZHBtb3fVEeEUJGbyqc,8118
11
+ bblean/bitbirch.py,sha256=fRS9dIHu3wx7rJztPYUyEINuv5KsridRpqLYh_DlmT0,58792
12
+ bblean/cli.py,sha256=FwO-jWO9Wt-1CGP8mL_PmbEyJyHPnQxo9BaGT2zLVjE,62506
13
+ bblean/fingerprints.py,sha256=cArsOt-946xjvoKM8qTXc0wfKA39ZFhzIht6MW9x-kQ,15315
14
+ bblean/metrics.py,sha256=4KB-PIQJtFMsNg7lG2uM1HEId_eR5vhqcdLpCVLuI5Y,7280
15
+ bblean/multiround.py,sha256=_-pr5LG_GLSBNZ60uLcy8XZ-qo7lr0Y048Kp041_ug8,19980
16
+ bblean/plotting.py,sha256=1ryJbWJBVY7gkoX_JDyhY4k62spjumz1_V8IhpObzbY,15676
17
+ bblean/similarity.py,sha256=nCrUH0t6k5GMNNWf6gD4r7ZszQEPR3b2qyk5Im7Naa8,10203
18
+ bblean/sklearn.py,sha256=USE5qfGrWLZokz4Ati_RsRIGn1mOwHSCAw82VXD7qhA,7512
19
+ bblean/smiles.py,sha256=fBoU41eLGmxq_uPkX-yWM9SBoPqb7_sWXmy0eo0MtNs,1855
20
+ bblean/utils.py,sha256=K0ttSPf54nxrKD1TwbLFuwDIRlAD0jdr6KnuTqXs-HQ,3836
21
+ bblean/_legacy/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
22
+ bblean/_legacy/bb_int64.py,sha256=TJ5vd71iVLHZW1gEit_tAd4nwpJ8PMoWys84e9E8RIk,45770
23
+ bblean/_legacy/bb_uint8.py,sha256=8kbeVAq7MxiR8hS_6lKhSDhVWc6acjLmLzNFCR466iA,41573
24
+ bblean/csrc/README.md,sha256=qOPPK6sTqkYgnlPWtcNu9P3PwuLH8cCNJ1FwJeewsrk,59
25
+ bblean/csrc/similarity.cpp,sha256=7zS76zHywEOnxPqK0kFPxrgsRjTKAD_YrSCYMgb1DJ4,21231
26
+ bblean-0.6.0b2.dist-info/licenses/LICENSE,sha256=Dq9t2XHr5wSrykVuVo8etKsAS35ENnDobU1h7t3H_-k,2598
27
+ bblean-0.6.0b2.dist-info/METADATA,sha256=9TcsxKr-RZCJGp6IFRXERdSsPbkO9GuYDYfx31kKg5w,13023
28
+ bblean-0.6.0b2.dist-info/WHEEL,sha256=qV0EIPljj1XC_vuSatRWjn02nZIz3N1t8jsZz7HBr2U,101
29
+ bblean-0.6.0b2.dist-info/entry_points.txt,sha256=a0jb2L5JFKioMD6CqbvJiI2unaArGzi-AMZsyY-uyGg,38
30
+ bblean-0.6.0b2.dist-info/top_level.txt,sha256=ybxTonvTC9zR25yR5B27aEDLl6CiwID093ZyS_--Cq4,7
31
+ bblean-0.6.0b2.dist-info/RECORD,,
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (80.9.0)
3
+ Root-Is-Purelib: false
4
+ Tag: cp313-cp313-win_amd64
5
+
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ bb = bblean.cli:app
@@ -0,0 +1,48 @@
1
+ BitBIRCH-Lean Python Package: An open-source clustering module based on iSIM.
2
+
3
+ If you find this software useful please cite the following articles:
4
+ - BitBIRCH: efficient clustering of large molecular libraries:
5
+ https://doi.org/10.1039/D5DD00030K
6
+ - BitBIRCH Clustering Refinement Strategies:
7
+ https://doi.org/10.1021/acs.jcim.5c00627
8
+ - BitBIRCH-Lean:
9
+ (preprint) https://www.biorxiv.org/content/10.1101/2025.10.22.684015v1
10
+
11
+ Copyright (C) 2025 The Miranda-Quintana Lab and other BitBirch developers, composed
12
+ exclusively of:
13
+ - Ramon Alain Miranda Quintana <ramirandaq@gmail.com>, <quintana@chem.ufl.edu>
14
+ - Krisztina Zsigmond <kzsigmond@ufl.edu>
15
+ - Ignacio Pickering <ipickering@chem.ufl.edu>
16
+ - Kenneth Lopez Perez <klopezperez@chem.ufl.edu>
17
+ - Miroslav Lzicar <miroslav.lzicar@deepmedchem.com>
18
+
19
+ Authors of ./bblean/multiround.py are:
20
+ - Ramon Alain Miranda Quintana <ramirandaq@gmail.com>, <quintana@chem.ufl.edu>
21
+ - Ignacio Pickering <ipickering@chem.ufl.edu>
22
+
23
+ This program is free software: you can redistribute it and/or modify it under the
24
+ terms of the GNU General Public License as published by the Free Software Foundation,
25
+ version 3 (SPDX-License-Identifier: GPL-3.0-only).
26
+
27
+ Portions of the file ./bblean/bitbirch.py are licensed under the BSD 3-Clause License
28
+ Copyright (c) 2007-2024 The scikit-learn developers. All rights reserved.
29
+ (SPDX-License-Identifier: BSD-3-Clause). Copies or reproductions of code in the
30
+ ./bblean/bitbirch.py file must in addition adhere to the BSD-3-Clause license terms. A
31
+ copy of the BSD-3-Clause license can be located at the root of this repository, under
32
+ ./LICENSES/BSD-3-Clause.txt.
33
+
34
+ Portions of the file ./bblean/bitbirch.py were previously licensed under the LGPL 3.0
35
+ license (SPDX-License-Identifier: LGPL-3.0-only); they are relicensed in this program
36
+ as GPL-3.0, with permission of all original copyright holders:
37
+ - Ramon Alain Miranda Quintana <ramirandaq@gmail.com>, <quintana@chem.ufl.edu>
38
+ - Vicky (Vic) Jung <jungvicky@ufl.edu>
39
+ - Kenneth Lopez Perez <klopezperez@chem.ufl.edu>
40
+ - Kate Huddleston <kdavis2@chem.ufl.edu>
41
+
42
+ This program is distributed in the hope that it will be useful, but WITHOUT ANY
43
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
44
+ PARTICULAR PURPOSE. See the GNU General Public License for more details.
45
+
46
+ You should have received a copy of the GNU General Public License along with this
47
+ program. This copy can be located at the root of this repository, under
48
+ ./LICENSES/GPL-3.0-only.txt. If not, see <http://www.gnu.org/licenses/gpl-3.0.html>.
@@ -0,0 +1 @@
1
+ bblean