pycompound 0.1.8__tar.gz → 0.1.10__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pycompound-0.1.10/PKG-INFO +28 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/README.md +2 -2
- pycompound-0.1.10/README_PyPI.md +3 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/pyproject.toml +2 -2
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/plot_spectra.py +6 -6
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/spec_lib_matching.py +37 -29
- pycompound-0.1.10/src/pycompound.egg-info/PKG-INFO +28 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/SOURCES.txt +1 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_plot_spectra.py +65 -32
- {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_spec_lib_matching.py +56 -46
- {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_tuning.py +32 -10
- pycompound-0.1.8/PKG-INFO +0 -824
- pycompound-0.1.8/src/pycompound.egg-info/PKG-INFO +0 -824
- {pycompound-0.1.8 → pycompound-0.1.10}/LICENSE +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/setup.cfg +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/build_library.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/plot_spectra_CLI.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/processing.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/similarity_measures.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/spec_lib_matching_CLI.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/tuning_CLI_DE.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/tuning_CLI_grid.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/dependency_links.txt +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/requires.txt +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/top_level.txt +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_build_library.py +0 -0
- {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_similarity_measures.py +0 -0
@@ -0,0 +1,28 @@
+Metadata-Version: 2.4
+Name: pycompound
+Version: 0.1.10
+Summary: Python package to perform compound identification in mass spectrometry via spectral library matching.
+Author-email: Hunter Dlugas <fy7392@wayne.edu>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/hdlugas/pycompound
+Project-URL: Issues, https://github.com/hdlugas/pycompound/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: matplotlib==3.8.4
+Requires-Dist: numpy==1.26.4
+Requires-Dist: pandas==2.2.2
+Requires-Dist: scipy==1.13.1
+Requires-Dist: pyteomics==4.7.2
+Requires-Dist: netCDF4==1.6.5
+Requires-Dist: lxml>=5.1.0
+Requires-Dist: orjson==3.11.0
+Requires-Dist: shiny==1.4.0
+Requires-Dist: joblib==1.5.2
+Dynamic: license-file
+
+# PyCompound
+
+A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.
@@ -19,9 +19,9 @@ A Python-based tool for spectral library matching, PyCompound is available as a
 ## 1. Install dependencies
 PyCompound requires the Python dependencies Matplotlib, NumPy, Pandas, SciPy, Pyteomics, and netCDF4. Specifically, this software was validated with python=3.12.4, matplotlib=3.8.4, numpy=1.26.4, pandas=2.2.2, scipy=1.13.1, pyteomics=4.7.2, netCDF4=1.6.5, lxml=5.1.0, joblib=1.5.2, and shiny=1.4.0, although it may work with other versions of these tools. A user may consider creating a conda environment (see [https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) for guidance on getting started with conda if you are unfamiliar). For a system with conda installed, one can create the environment pycompound_env, activate it, and install the necessary dependencies with:
 ```
-conda create -n pycompound_env python=3.12
+conda create -n pycompound_env python=3.12 -y
 conda activate pycompound_env
-pip install pycompound==0.1.
+pip install pycompound==0.1.10
 ```
 
 <a name="functionality"></a>
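After running the install commands above, the release that pip actually resolved can be confirmed from Python with the standard-library `importlib.metadata` module. The helper `installed_version` is a hypothetical convenience written for this note, not part of pycompound; the distribution name `pycompound` and the expected version `0.1.10` are taken from the install command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper (not part of pycompound) for checking a pinned install.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pycompound")
    if found is None:
        print("pycompound is not installed in the active environment")
    elif found != "0.1.10":
        print(f"expected pycompound 0.1.10, found {found}")
    else:
        print("pycompound 0.1.10 is installed")
```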
@@ -0,0 +1,3 @@
+# PyCompound
+
+A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.
@@ -4,12 +4,12 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "pycompound"
-version = "0.1.
+version = "0.1.10"
 authors = [
 { name="Hunter Dlugas", email="fy7392@wayne.edu" },
 ]
 description = "Python package to perform compound identification in mass spectrometry via spectral library matching."
-readme = "
+readme = "README_PyPI.md"
 requires-python = ">=3.9"
 classifiers = [
 "Programming Language :: Python :: 3",
@@ -14,7 +14,7 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
 else:
 extension = query_data.rsplit('.',1)
 extension = extension[(len(extension)-1)]
-if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
 output_path_tmp = query_data[:-3] + 'txt'
 build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=True)
 df_query = pd.read_csv(output_path_tmp, sep='\t')
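The 0.1.10 change adds `msp`/`MSP` and `json`/`JSON` to the chain of `==` comparisons that decides whether an input file is first converted with `build_library_from_raw_data`. A set lookup on the lower-cased suffix expresses the same test more compactly; note that lower-casing also accepts spellings such as `Mgf`, which the original comparisons reject. The sketch also builds the temporary TXT path with `pathlib`, because `query_data[:-3] + 'txt'` only swaps three-character extensions cleanly: `'spectra.mzML'[:-3] + 'txt'` gives `'spectra.mtxt'`. The names `RAW_EXTENSIONS`, `is_raw_spectral_file`, and `tmp_txt_path` are hypothetical.

```python
from pathlib import Path

# Raw formats that the 0.1.10 code routes through build_library_from_raw_data
# (the diff compares against mgf, mzML, cdf, msp and json in several spellings).
RAW_EXTENSIONS = {"mgf", "mzml", "cdf", "msp", "json"}


def is_raw_spectral_file(path: str) -> bool:
    """Hypothetical helper: True if the file's extension names a raw spectral format."""
    # Path.suffix is e.g. '.mzML'; drop the dot and compare case-insensitively.
    return Path(path).suffix.lstrip(".").lower() in RAW_EXTENSIONS


def tmp_txt_path(path: str) -> str:
    """Hypothetical helper: replace the whole extension with '.txt', whatever its length."""
    return str(Path(path).with_suffix(".txt"))


if __name__ == "__main__":
    for name in ["query.mgf", "spectra.mzML", "lib.json", "lib.txt"]:
        # Compare the pathlib result with the slicing used in the diff.
        print(name, is_raw_spectral_file(name), tmp_txt_path(name), name[:-3] + "txt")
```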
@@ -29,7 +29,7 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
 else:
 extension = reference_data.rsplit('.',1)
 extension = extension[(len(extension)-1)]
-if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
 output_path_tmp = reference_data[:-3] + 'txt'
 build_library_from_raw_data(input_path=reference_data, output_path=output_path_tmp, is_reference=True)
 df_reference = pd.read_csv(output_path_tmp, sep='\t')
@@ -298,7 +298,7 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
 else:
 extension = query_data.rsplit('.',1)
 extension = extension[(len(extension)-1)]
-if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
 output_path_tmp = query_data[:-3] + 'txt'
 build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
 df_query = pd.read_csv(output_path_tmp, sep='\t')
@@ -312,7 +312,7 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
 else:
 extension = reference_data.rsplit('.',1)
 extension = extension[(len(extension)-1)]
-if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
 output_path_tmp = reference_data[:-3] + 'txt'
 build_library_from_raw_data(input_path=reference_data, output_path=output_path_tmp, is_reference=True)
 df_reference = pd.read_csv(output_path_tmp, sep='\t')
@@ -395,8 +395,8 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
 print(f'Warning: plots will be saved to the PDF ./spectrum1_{spectrum_ID1}_spectrum2_{spectrum_ID2}_plot.pdf in the current working directory.')
 output_path = f'{Path.cwd()}/spectrum1_{spectrum_ID1}_spectrum2_{spectrum_ID2}.pdf'
 
-min_mz = np.min([np.min(df_query['mz_ratio'].tolist()), np.min(df_reference['mz_ratio'].tolist())])
-max_mz = np.max([np.max(df_query['mz_ratio'].tolist()), np.max(df_reference['mz_ratio'].tolist())])
+min_mz = int(np.min([np.min(df_query['mz_ratio'].tolist()), np.min(df_reference['mz_ratio'].tolist())]))
+max_mz = int(np.max([np.max(df_query['mz_ratio'].tolist()), np.max(df_reference['mz_ratio'].tolist())]))
 mzs = np.linspace(min_mz,max_mz,(max_mz-min_mz+1))
 
 unique_query_ids = df_query['id'].unique().tolist()
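In `generate_plots_on_NRMS_data`, 0.1.10 wraps `min_mz` and `max_mz` in `int(...)`. The third argument of `np.linspace` is the number of samples and must be an integer; when the `mz_ratio` column holds floats, `max_mz-min_mz+1` is a float and current NumPy versions reject it with a `TypeError`. After the cast, the call yields one grid point per nominal m/z value (`int` truncates, which for positive m/z is the same as flooring). The function name `nominal_mz_grid` is hypothetical; the sketch only reproduces the arithmetic shown in the diff.

```python
import numpy as np


def nominal_mz_grid(query_mz, reference_mz):
    """Hypothetical helper: unit-step m/z grid covering both spectra.

    Follows the 0.1.10 lines: int() truncates the extreme m/z values so that
    max_mz - min_mz + 1 is an integer sample count for np.linspace.
    """
    min_mz = int(np.min([np.min(query_mz), np.min(reference_mz)]))
    max_mz = int(np.max([np.max(query_mz), np.max(reference_mz)]))
    return np.linspace(min_mz, max_mz, max_mz - min_mz + 1)


if __name__ == "__main__":
    grid = nominal_mz_grid([41.0, 43.0, 57.2], [39.7, 55.0, 60.4])
    print(grid[0], grid[-1], len(grid))  # 39.0 60.0 22
```

With integer endpoints the result holds the same values as `np.arange(min_mz, max_mz + 1)`, only as floats.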
@@ -31,7 +31,8 @@ def objective_function_HRMS(X, ctx):
 p["wf_mz"], p["wf_int"], p["LET_threshold"],
 p["entropy_dimension"],
 ctx["high_quality_reference_library"],
-verbose=False
+verbose=False,
+exact_match_required=ctx["exact_match_required"]
 )
 print(f"\nparams({ctx['optimize_params']}) = {np.array(X)}\naccuracy: {acc*100}%")
 return 1.0 - acc
@@ -45,7 +46,8 @@ def objective_function_NRMS(X, ctx):
 ctx["mz_min"], ctx["mz_max"], ctx["int_min"], ctx["int_max"],
 p["noise_threshold"], p["wf_mz"], p["wf_int"], p["LET_threshold"], p["entropy_dimension"],
 ctx["high_quality_reference_library"],
-verbose=False
+verbose=False,
+exact_match_required=ctx["exact_match_required"]
 )
 print(f"\nparams({ctx['optimize_params']}) = {np.array(X)}\naccuracy: {acc*100}%")
 return 1.0 - acc
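Both objective functions now forward `ctx["exact_match_required"]` to the accuracy routine and still return `1.0 - acc`, so a minimiser (such as the differential evolution run by `tune_params_DE`) pushes accuracy up. The sketch below shows that flow in plain Python. Two parts are assumptions not visible in the diff: how the candidate vector `X` is merged with `default_params` into the dictionary `p` (here, defaults overwritten in `optimize_params` order), and `ctx["get_acc"]`, a hypothetical stand-in for `get_acc_HRMS`/`get_acc_NRMS`.

```python
from typing import Dict, List, Sequence


def make_params(X: Sequence[float], optimize_params: List[str],
                default_params: Dict[str, float]) -> Dict[str, float]:
    """Assumed scheme: start from the defaults, then overwrite the tuned entries
    with the candidate values in X (same order as optimize_params)."""
    p = dict(default_params)  # copy so the caller's defaults are not mutated
    p.update(zip(optimize_params, (float(x) for x in X)))
    return p


def objective(X: Sequence[float], ctx: dict) -> float:
    """Loss to minimise: 1 - accuracy, with the exact-match flag taken from ctx."""
    p = make_params(X, ctx["optimize_params"], ctx["default_params"])
    # ctx["get_acc"] is a hypothetical stand-in for get_acc_HRMS / get_acc_NRMS.
    acc = ctx["get_acc"](p, exact_match_required=ctx["exact_match_required"])
    return 1.0 - acc
```

Keeping the flag in `ctx` rather than in `X` means the optimiser never varies it; it is fixed for the whole search, matching how `tune_params_DE` exposes it as a plain keyword argument.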
@@ -53,7 +55,7 @@ def objective_function_NRMS(X, ctx):
 
 
 
-def tune_params_DE(query_data=None, reference_data=None, chromatography_platform='HRMS', precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, similarity_measure='cosine', weights=None, spectrum_preprocessing_order='CNMWL', mz_min=0, mz_max=999999999, int_min=0, int_max=999999999, high_quality_reference_library=False, optimize_params=["window_size_centroiding","window_size_matching","noise_threshold","wf_mz","wf_int","LET_threshold","entropy_dimension"], param_bounds={"window_size_centroiding":(0.0,0.5),"window_size_matching":(0.0,0.5),"noise_threshold":(0.0,0.25),"wf_mz":(0.0,5.0),"wf_int":(0.0,5.0),"LET_threshold":(0.0,5.0),"entropy_dimension":(1.0,3.0)}, default_params={"window_size_centroiding": 0.5, "window_size_matching":0.5, "noise_threshold":0.10, "wf_mz":0.0, "wf_int":1.0, "LET_threshold":0.0, "entropy_dimension":1.1}, maxiters=3, de_workers=1):
+def tune_params_DE(query_data=None, reference_data=None, chromatography_platform='HRMS', precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, similarity_measure='cosine', weights=None, spectrum_preprocessing_order='CNMWL', mz_min=0, mz_max=999999999, int_min=0, int_max=999999999, high_quality_reference_library=False, optimize_params=["window_size_centroiding","window_size_matching","noise_threshold","wf_mz","wf_int","LET_threshold","entropy_dimension"], param_bounds={"window_size_centroiding":(0.0,0.5),"window_size_matching":(0.0,0.5),"noise_threshold":(0.0,0.25),"wf_mz":(0.0,5.0),"wf_int":(0.0,5.0),"LET_threshold":(0.0,5.0),"entropy_dimension":(1.0,3.0)}, default_params={"window_size_centroiding": 0.5, "window_size_matching":0.5, "noise_threshold":0.10, "wf_mz":0.0, "wf_int":1.0, "LET_threshold":0.0, "entropy_dimension":1.1}, maxiters=3, de_workers=1, exact_match_required=False):
 
 if query_data is None:
 print('\nError: No argument passed to the mandatory query_data. Please pass the path to the TXT file of the query data.')
@@ -63,7 +65,7 @@ def tune_params_DE(query_data=None, reference_data=None, chromatography_platform
 extension = extension[(len(extension)-1)]
 if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
 output_path_tmp = query_data[:-3] + 'txt'
-build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=
+build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=True)
 df_query = pd.read_csv(output_path_tmp, sep='\t')
 if extension == 'txt' or extension == 'TXT':
 df_query = pd.read_csv(query_data, sep='\t')
@@ -106,6 +108,7 @@ def tune_params_DE(query_data=None, reference_data=None, chromatography_platform
 high_quality_reference_library=high_quality_reference_library,
 default_params=default_params,
 optimize_params=optimize_params,
+exact_match_required=exact_match_required
 )
 
 bounds = [param_bounds[p] for p in optimize_params]
@@ -136,14 +139,7 @@ default_HRMS_grid = {'similarity_measure':['cosine'], 'weight':[{'Cosine':0.25,'
 default_NRMS_grid = {'similarity_measure':['cosine'], 'weight':[{'Cosine':0.25,'Shannon':0.25,'Renyi':0.25,'Tsallis':0.25}], 'spectrum_preprocessing_order':['FCNMWL'], 'mz_min':[0], 'mz_max':[9999999], 'int_min':[0], 'int_max':[99999999], 'noise_threshold':[0.0], 'wf_mz':[0.0], 'wf_int':[1.0], 'LET_threshold':[0.0], 'entropy_dimension':[1.1], 'high_quality_reference_library':[False]}
 
 
-def _eval_one_HRMS(df_query, df_reference,
-precursor_ion_mz_tolerance_tmp, ionization_mode_tmp, adduct_tmp,
-similarity_measure_tmp, weight,
-spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp,
-int_min_tmp, int_max_tmp, noise_threshold_tmp,
-window_size_centroiding_tmp, window_size_matching_tmp,
-wf_mz_tmp, wf_int_tmp, LET_threshold_tmp,
-entropy_dimension_tmp, high_quality_reference_library_tmp):
+def _eval_one_HRMS(df_query, df_reference, precursor_ion_mz_tolerance_tmp, ionization_mode_tmp, adduct_tmp, similarity_measure_tmp, weight, spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp, int_min_tmp, int_max_tmp, noise_threshold_tmp, window_size_centroiding_tmp, window_size_matching_tmp, wf_mz_tmp, wf_int_tmp, LET_threshold_tmp, entropy_dimension_tmp, high_quality_reference_library_tmp, exact_match_required_tmp):
 
 acc = get_acc_HRMS(
 df_query=df_query, df_reference=df_reference,
@@ -160,7 +156,8 @@ def _eval_one_HRMS(df_query, df_reference,
 LET_threshold=LET_threshold_tmp,
 entropy_dimension=entropy_dimension_tmp,
 high_quality_reference_library=high_quality_reference_library_tmp,
-verbose=False
+verbose=False,
+exact_match_required=exact_match_required_tmp
 )
 
 return (
@@ -172,12 +169,7 @@ def _eval_one_HRMS(df_query, df_reference,
 )
 
 
-def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids,
-similarity_measure_tmp, weight,
-spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp,
-int_min_tmp, int_max_tmp, noise_threshold_tmp,
-wf_mz_tmp, wf_int_tmp, LET_threshold_tmp,
-entropy_dimension_tmp, high_quality_reference_library_tmp):
+def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure_tmp, weight, spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp, int_min_tmp, int_max_tmp, noise_threshold_tmp, wf_mz_tmp, wf_int_tmp, LET_threshold_tmp, entropy_dimension_tmp, high_quality_reference_library_tmp, exact_match_required):
 
 acc = get_acc_NRMS(
 df_query=df_query, df_reference=df_reference,
@@ -191,7 +183,8 @@ def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_id
 LET_threshold=LET_threshold_tmp,
 entropy_dimension=entropy_dimension_tmp,
 high_quality_reference_library=high_quality_reference_library_tmp,
-verbose=False
+verbose=False,
+exact_match_required=exact_match_required_tmp
 )
 
 return (
@@ -202,7 +195,7 @@ def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_id
 
 
 
-def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, grid=None, output_path=None, return_output=False):
+def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, grid=None, output_path=None, return_output=False, exact_match_required=False):
 grid = {**default_HRMS_grid, **(grid or {})}
 for key, value in grid.items():
 globals()[key] = value
@@ -251,7 +244,9 @@ def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precurso
 
 param_grid = product(similarity_measure, weight, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold,
 window_size_centroiding, window_size_matching, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library)
-results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct,
+#results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, (*params for params in param_grid), exact_match_required))
+results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, *params, exact_match_required) for params in param_grid)
+
 
 df_out = pd.DataFrame(results, columns=[
 'ACC','SIMILARITY.MEASURE','WEIGHT','SPECTRUM.PROCESSING.ORDER', 'MZ.MIN','MZ.MAX','INT.MIN','INT.MAX','NOISE.THRESHOLD',
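The grid tuners merge the caller's `grid` over the defaults, expand every combination with `itertools.product`, and, in 0.1.10, pass `exact_match_required` as the last positional argument after the unpacked `*params`, with the `for` clause outside the call. The package hands that generator to joblib's `Parallel(n_jobs=-1, verbose=10)` with each call wrapped in `delayed(...)`; this sketch evaluates sequentially so it needs only the standard library. `DEFAULT_GRID`, `eval_one`, and `run_grid` are hypothetical, and the merged dictionary is indexed directly instead of copying its keys into `globals()` as the package does.

```python
from itertools import product

# Toy defaults shaped like default_NRMS_grid: every value is a list of candidates.
DEFAULT_GRID = {
    "similarity_measure": ["cosine"],
    "noise_threshold": [0.0],
    "wf_int": [1.0],
}


def eval_one(label, similarity_measure, noise_threshold, wf_int, exact_match_required):
    """Hypothetical stand-in for _eval_one_HRMS/_eval_one_NRMS; echoes its inputs."""
    return (label, similarity_measure, noise_threshold, wf_int, exact_match_required)


def run_grid(grid=None, exact_match_required=False):
    # Keys supplied by the caller override the defaults; the rest fall back.
    grid = {**DEFAULT_GRID, **(grid or {})}
    param_grid = product(grid["similarity_measure"], grid["noise_threshold"], grid["wf_int"])
    # Fixed leading argument, then one unpacked combination, then the trailing flag.
    return [eval_one("query", *params, exact_match_required) for params in param_grid]


if __name__ == "__main__":
    for row in run_grid({"noise_threshold": [0.0, 0.1]}, exact_match_required=True):
        print(row)
```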
@@ -275,7 +270,7 @@ def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precurso
 
 
 
-def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=None, output_path=None, return_output=False):
+def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=None, output_path=None, return_output=False, exact_match_required=False):
 grid = {**default_NRMS_grid, **(grid or {})}
 for key, value in grid.items():
 globals()[key] = value
@@ -318,7 +313,8 @@ def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=Non
 
 param_grid = product(similarity_measure, weight, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max,
 noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library)
-results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params) for params in param_grid)
+#results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params) for params in param_grid, exact_match_required)
+results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params, exact_match_required) for params in param_grid)
 
 df_out = pd.DataFrame(results, columns=['ACC','SIMILARITY.MEASURE','WEIGHT','SPECTRUM.PROCESSING.ORDER', 'MZ.MIN','MZ.MAX','INT.MIN','INT.MAX',
 'NOISE.THRESHOLD','WF.MZ','WF.INT','LET.THRESHOLD','ENTROPY.DIMENSION', 'HIGH.QUALITY.REFERENCE.LIBRARY'])
@@ -339,7 +335,7 @@ def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=Non
 
 
 
-def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, window_size_centroiding, window_size_matching, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True):
+def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, window_size_centroiding, window_size_matching, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True, exact_match_required=False):
 
 n_top_matches_to_save = 1
 unique_reference_ids = df_reference['id'].dropna().astype(str).unique().tolist()
@@ -445,11 +441,17 @@ def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_
 df_tmp = pd.DataFrame({'TRUE.ID': df_scores.index.to_list(), 'PREDICTED.ID': top_ids, 'SCORE': top_scores})
 #if verbose:
 # print(df_tmp)
-
+if exact_match_required == True:
+acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
+else:
+true_lower = df_tmp['TRUE.ID'].str.lower()
+pred_lower = df_tmp['PREDICTED.ID'].str.lower()
+matches = [t in p for t, p in zip(true_lower, pred_lower)]
+acc = sum(matches) / len(matches)
 return acc
 
 
-def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True):
+def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True, exact_match_required=False):
 
 n_top_matches_to_save = 1
 
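With `exact_match_required=True`, `get_acc_HRMS` and `get_acc_NRMS` count a query as correct only when the top-ranked `PREDICTED.ID` equals `TRUE.ID`. Otherwise both IDs are lower-cased and a query counts as correct when `TRUE.ID` occurs anywhere inside `PREDICTED.ID`, so a library entry such as `Jamaicamide A M+H` still matches the true ID `Jamaicamide A`. The standalone function `top1_accuracy` is a hypothetical restatement of those lines on a toy `df_tmp`, written with a plain truth test instead of `== True`.

```python
import pandas as pd


def top1_accuracy(df_tmp: pd.DataFrame, exact_match_required: bool = False) -> float:
    """Fraction of queries whose top-ranked reference ID counts as correct.

    Mirrors the 0.1.10 logic: exact string equality when exact_match_required
    is True, otherwise a case-insensitive "TRUE.ID is a substring of
    PREDICTED.ID" test.
    """
    if exact_match_required:
        return float((df_tmp["TRUE.ID"] == df_tmp["PREDICTED.ID"]).mean())
    true_lower = df_tmp["TRUE.ID"].str.lower()
    pred_lower = df_tmp["PREDICTED.ID"].str.lower()
    matches = [t in p for t, p in zip(true_lower, pred_lower)]
    return sum(matches) / len(matches)


if __name__ == "__main__":
    df_tmp = pd.DataFrame({
        "TRUE.ID": ["Jamaicamide A", "Caffeine", "Glucose"],
        "PREDICTED.ID": ["Jamaicamide A M+H", "caffeine", "Fructose"],
        "SCORE": [0.91, 0.88, 0.42],
    })
    print(top1_accuracy(df_tmp, exact_match_required=True))   # 0.0
    print(top1_accuracy(df_tmp, exact_match_required=False))  # 2/3
```

The substring rule is permissive: a true ID of `Alanine` is also credited for a prediction of `Phenylalanine`, so the two settings can report noticeably different accuracies on the same matching results.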
@@ -532,7 +534,13 @@ def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids,
 df_tmp = pd.DataFrame(out, columns=['TRUE.ID','PREDICTED.ID','SCORE'])
 #if verbose:
 # print(df_tmp)
-
+if exact_match_required == True:
+acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
+else:
+true_lower = df_tmp['TRUE.ID'].str.lower()
+pred_lower = df_tmp['PREDICTED.ID'].str.lower()
+matches = [t in p for t, p in zip(true_lower, pred_lower)]
+acc = sum(matches) / len(matches)
 return acc
 
 
@@ -797,7 +805,7 @@ def run_spec_lib_matching_on_NRMS_data(query_data=None, reference_data=None, lik
 else:
 extension = query_data.rsplit('.',1)
 extension = extension[(len(extension)-1)]
-if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
 output_path_tmp = query_data[:-3] + 'txt'
 build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
 df_query = pd.read_csv(output_path_tmp, sep='\t')
@@ -0,0 +1,28 @@
+Metadata-Version: 2.4
+Name: pycompound
+Version: 0.1.10
+Summary: Python package to perform compound identification in mass spectrometry via spectral library matching.
+Author-email: Hunter Dlugas <fy7392@wayne.edu>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/hdlugas/pycompound
+Project-URL: Issues, https://github.com/hdlugas/pycompound/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: matplotlib==3.8.4
+Requires-Dist: numpy==1.26.4
+Requires-Dist: pandas==2.2.2
+Requires-Dist: scipy==1.13.1
+Requires-Dist: pyteomics==4.7.2
+Requires-Dist: netCDF4==1.6.5
+Requires-Dist: lxml>=5.1.0
+Requires-Dist: orjson==3.11.0
+Requires-Dist: shiny==1.4.0
+Requires-Dist: joblib==1.5.2
+Dynamic: license-file
+
+# PyCompound
+
+A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.
@@ -8,7 +8,7 @@ os.makedirs(f'{Path.cwd()}/plots', exist_ok=True)
 
 print('\n\ntest #1:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 high_quality_reference_library=True,
 noise_threshold=0.1,
@@ -17,7 +17,7 @@ generate_plots_on_HRMS_data(
 
 print('\n\ntest #2:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 noise_threshold=0.1,
 similarity_measure='shannon',
@@ -25,7 +25,7 @@ generate_plots_on_HRMS_data(
 
 print('\n\ntest #3:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 similarity_measure='renyi',
 entropy_dimension=1.2,
@@ -33,7 +33,7 @@ generate_plots_on_HRMS_data(
 
 print('\n\ntest #4:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 similarity_measure='tsallis',
 entropy_dimension=1.2,
@@ -41,7 +41,7 @@ generate_plots_on_HRMS_data(
 
 print('\n\ntest #5:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 similarity_measure='tsallis',
 entropy_dimension=1.2,
@@ -49,7 +49,7 @@ generate_plots_on_HRMS_data(
 
 print('\n\ntest #6:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 wf_intensity=0.8,
 wf_mz=1.1,
@@ -57,21 +57,21 @@ generate_plots_on_HRMS_data(
 
 print('\n\ntest #7:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 window_size_centroiding=0.1,
 output_path=f'{Path.cwd()}/plots/test7.pdf')
 
 print('\n\ntest #8:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 window_size_matching=0.25,
 output_path=f'{Path.cwd()}/plots/test8.pdf')
 
 print('\n\ntest #9:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 spectrum_preprocessing_order='WCM',
 wf_mz=0.8,
@@ -80,14 +80,14 @@ generate_plots_on_HRMS_data(

 print('\n\ntest #10:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 LET_threshold=3,
 output_path=f'{Path.cwd()}/plots/test10.pdf')

 print('\n\ntest #11:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 spectrum_ID1 = 212,
 spectrum_ID2 = 100,
@@ -96,7 +96,7 @@ generate_plots_on_HRMS_data(

 print('\n\ntest #12:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 spectrum_ID1 = 'Jamaicamide A M+H',
 spectrum_ID2 = 'Malyngamide J M+H',
@@ -105,7 +105,7 @@ generate_plots_on_HRMS_data(

 print('\n\ntest #13:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 spectrum_ID1 = 'Jamaicamide A M+H',
 spectrum_ID2 = 'Jamaicamide A M+H',
@@ -114,13 +114,13 @@ generate_plots_on_HRMS_data(

 print('\n\ntest #14:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 output_path=f'{Path.cwd()}/plots/test14.pdf')

 print('\n\ntest #15:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 spectrum_ID1 = 463514,
 spectrum_ID2 = 112312,
@@ -128,40 +128,40 @@ generate_plots_on_NRMS_data(

 print('\n\ntest #17:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 output_path=f'{Path.cwd()}/plots/test17.pdf')

 print('\n\ntest #18:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 y_axis_transformation='none',
 output_path=f'{Path.cwd()}/plots/test18.pdf')

 print('\n\ntest #19:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 y_axis_transformation='log10',
 output_path=f'{Path.cwd()}/plots/test19.pdf')

 print('\n\ntest #20:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 y_axis_transformation='sqrt',
 output_path=f'{Path.cwd()}/plots/test20.pdf')

 print('\n\ntest #21:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 output_path=f'{Path.cwd()}/plots/test_no_wf_normalized_y_axis_no_mz_zoom.pdf')

 print('\n\ntest #22:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 wf_mz=2,
 wf_intensity=0.5,
@@ -169,21 +169,21 @@ generate_plots_on_HRMS_data(

 print('\n\ntest #23:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 y_axis_transformation='log10',
 output_path=f'{Path.cwd()}/plots/test_no_wf_log10_y_axis_no_mz_zoom.pdf')

 print('\n\ntest #24:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 y_axis_transformation='sqrt',
 output_path=f'{Path.cwd()}/plots/test_no_wf_sqrt_y_axis_no_mz_zoom.pdf')

 print('\n\ntest #25:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 mz_min = 400,
 mz_max = 650,
@@ -192,49 +192,49 @@ generate_plots_on_HRMS_data(

 print('\n\ntest #26:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 high_quality_reference_library=False,
 output_path=f'{Path.cwd()}/plots/test_HRMS.pdf')

 print('\n\ntest #27:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 high_quality_reference_library=False,
 output_path=f'{Path.cwd()}/plots/test_NRMS.pdf')

 print('\n\ntest #28:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 similarity_measure='jaccard',
 output_path=f'{Path.cwd()}/plots/test28.pdf')

 print('\n\ntest #28:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 similarity_measure='hamming',
 output_path=f'{Path.cwd()}/plots/test28.pdf')

 print('\n\ntest #29:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 similarity_measure='sokal_sneath',
 output_path=f'{Path.cwd()}/plots/test29.pdf')

 print('\n\ntest #30:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 similarity_measure='simpson',
 output_path=f'{Path.cwd()}/plots/test30.pdf')

 print('\n\ntest #31:')
 generate_plots_on_NRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/gcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
 similarity_measure='mixture',
 weights={'Cosine':0.5, 'Shannon':0.3, 'Renyi':0.1, 'Tsallis':0.1},
@@ -242,9 +242,42 @@ generate_plots_on_NRMS_data(

 print('\n\ntest #32:')
 generate_plots_on_HRMS_data(
-query_data=f'{Path.cwd()}/data/
+query_data=f'{Path.cwd()}/data/lcms_query.txt',
 reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
 similarity_measure='mixture',
 weights={'Cosine':0.1, 'Shannon':0.2, 'Renyi':0.3, 'Tsallis':0.4},
 output_path=f'{Path.cwd()}/plots/test32.pdf')

+print('\n\ntest #33:')
+generate_plots_on_HRMS_data(
+query_data=f'{Path.cwd()}/data/lcms_query.msp',
+reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
+high_quality_reference_library=True,
+noise_threshold=0.1,
+mz_min=100,
+output_path=f'{Path.cwd()}/plots/test33.pdf')
+
+print('\n\ntest #34:')
+generate_plots_on_HRMS_data(
+query_data=f'{Path.cwd()}/data/lcms_query_tuning.msp',
+reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
+high_quality_reference_library=True,
+noise_threshold=0.1,
+mz_min=100,
+output_path=f'{Path.cwd()}/plots/test34.pdf')
+
+print('\n\ntest #35:')
+generate_plots_on_NRMS_data(
+query_data=f'{Path.cwd()}/data/gcms_query.msp',
+reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
+similarity_measure='shannon',
+weights={'Cosine':0.5, 'Shannon':0.3, 'Renyi':0.1, 'Tsallis':0.1},
+output_path=f'{Path.cwd()}/plots/test35.pdf')
+
+print('\n\ntest #36:')
+generate_plots_on_NRMS_data(
+query_data=f'{Path.cwd()}/data/gcms_query.msp',
+reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
+similarity_measure='cosine',
+output_path=f'{Path.cwd()}/plots/test36.pdf')
+