pycompound 0.1.8__tar.gz → 0.1.10__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. pycompound-0.1.10/PKG-INFO +28 -0
  2. {pycompound-0.1.8 → pycompound-0.1.10}/README.md +2 -2
  3. pycompound-0.1.10/README_PyPI.md +3 -0
  4. {pycompound-0.1.8 → pycompound-0.1.10}/pyproject.toml +2 -2
  5. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/plot_spectra.py +6 -6
  6. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/spec_lib_matching.py +37 -29
  7. pycompound-0.1.10/src/pycompound.egg-info/PKG-INFO +28 -0
  8. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/SOURCES.txt +1 -0
  9. {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_plot_spectra.py +65 -32
  10. {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_spec_lib_matching.py +56 -46
  11. {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_tuning.py +32 -10
  12. pycompound-0.1.8/PKG-INFO +0 -824
  13. pycompound-0.1.8/src/pycompound.egg-info/PKG-INFO +0 -824
  14. {pycompound-0.1.8 → pycompound-0.1.10}/LICENSE +0 -0
  15. {pycompound-0.1.8 → pycompound-0.1.10}/setup.cfg +0 -0
  16. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/build_library.py +0 -0
  17. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/plot_spectra_CLI.py +0 -0
  18. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/processing.py +0 -0
  19. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/similarity_measures.py +0 -0
  20. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/spec_lib_matching_CLI.py +0 -0
  21. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/tuning_CLI_DE.py +0 -0
  22. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/tuning_CLI_grid.py +0 -0
  23. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/dependency_links.txt +0 -0
  24. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/requires.txt +0 -0
  25. {pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/top_level.txt +0 -0
  26. {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_build_library.py +0 -0
  27. {pycompound-0.1.8 → pycompound-0.1.10}/tests/test_similarity_measures.py +0 -0
@@ -0,0 +1,28 @@
1
+ Metadata-Version: 2.4
2
+ Name: pycompound
3
+ Version: 0.1.10
4
+ Summary: Python package to perform compound identification in mass spectrometry via spectral library matching.
5
+ Author-email: Hunter Dlugas <fy7392@wayne.edu>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/hdlugas/pycompound
8
+ Project-URL: Issues, https://github.com/hdlugas/pycompound/issues
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.9
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: matplotlib==3.8.4
15
+ Requires-Dist: numpy==1.26.4
16
+ Requires-Dist: pandas==2.2.2
17
+ Requires-Dist: scipy==1.13.1
18
+ Requires-Dist: pyteomics==4.7.2
19
+ Requires-Dist: netCDF4==1.6.5
20
+ Requires-Dist: lxml>=5.1.0
21
+ Requires-Dist: orjson==3.11.0
22
+ Requires-Dist: shiny==1.4.0
23
+ Requires-Dist: joblib==1.5.2
24
+ Dynamic: license-file
25
+
26
+ # PyCompound
27
+
28
+ A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) and as a GUI application built with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and numerous binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow, given a query library of spectra with known IDs. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.
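For orientation while reading the hunks below, here is a minimal usage sketch of one of the Python entry points exercised by the updated test suite. The import path is assumed from the package layout (src/pycompound/plot_spectra.py), and the query/reference paths mirror the example data used in tests/test_plot_spectra.py.

```
# Minimal sketch (assumed import path; data paths mirror tests/test_plot_spectra.py).
from pathlib import Path
from pycompound.plot_spectra import generate_plots_on_HRMS_data

generate_plots_on_HRMS_data(
    query_data=f'{Path.cwd()}/data/lcms_query.txt',
    reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
    high_quality_reference_library=True,
    noise_threshold=0.1,
    output_path=f'{Path.cwd()}/plots/example.pdf')
```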
@@ -19,9 +19,9 @@ A Python-based tool for spectral library matching, PyCompound is available as a
19
19
  ## 1. Install dependencies
20
20
  PyCompound requires the Python dependencies Matplotlib, NumPy, Pandas, SciPy, Pyteomics, netCDF4, lxml, orjson, Shiny, and joblib. Specifically, this software was validated with python=3.12.4, matplotlib=3.8.4, numpy=1.26.4, pandas=2.2.2, scipy=1.13.1, pyteomics=4.7.2, netCDF4=1.6.5, lxml=5.1.0, joblib=1.5.2, and shiny=1.4.0, although it may work with other versions of these tools. Consider creating a conda environment (see [https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) if you are unfamiliar with conda). On a system with conda installed, one can create the environment pycompound_env, activate it, and install the necessary dependencies with:
21
21
  ```
22
- conda create -n pycompound_env python=3.12
22
+ conda create -n pycompound_env python=3.12 -y
23
23
  conda activate pycompound_env
24
- pip install pycompound==0.1.7
24
+ pip install pycompound==0.1.10
25
25
  ```
26
26
 
27
27
  <a name="functionality"></a>
@@ -0,0 +1,3 @@
1
+ # PyCompound
2
+
3
+ A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) and as a GUI application built with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and numerous binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow, given a query library of spectra with known IDs. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.
@@ -4,12 +4,12 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "pycompound"
7
- version = "0.1.8"
7
+ version = "0.1.10"
8
8
  authors = [
9
9
  { name="Hunter Dlugas", email="fy7392@wayne.edu" },
10
10
  ]
11
11
  description = "Python package to perform compound identification in mass spectrometry via spectral library matching."
12
- readme = "README.md"
12
+ readme = "README_PyPI.md"
13
13
  requires-python = ">=3.9"
14
14
  classifiers = [
15
15
  "Programming Language :: Python :: 3",
@@ -14,7 +14,7 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
14
14
  else:
15
15
  extension = query_data.rsplit('.',1)
16
16
  extension = extension[(len(extension)-1)]
17
- if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
17
+ if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
18
18
  output_path_tmp = query_data[:-3] + 'txt'
19
19
  build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=True)
20
20
  df_query = pd.read_csv(output_path_tmp, sep='\t')
@@ -29,7 +29,7 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
29
29
  else:
30
30
  extension = reference_data.rsplit('.',1)
31
31
  extension = extension[(len(extension)-1)]
32
- if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
32
+ if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
33
33
  output_path_tmp = reference_data[:-3] + 'txt'
34
34
  build_library_from_raw_data(input_path=reference_data, output_path=output_path_tmp, is_reference=True)
35
35
  df_reference = pd.read_csv(output_path_tmp, sep='\t')
@@ -298,7 +298,7 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
298
298
  else:
299
299
  extension = query_data.rsplit('.',1)
300
300
  extension = extension[(len(extension)-1)]
301
- if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
301
+ if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
302
302
  output_path_tmp = query_data[:-3] + 'txt'
303
303
  build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
304
304
  df_query = pd.read_csv(output_path_tmp, sep='\t')
@@ -312,7 +312,7 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
312
312
  else:
313
313
  extension = reference_data.rsplit('.',1)
314
314
  extension = extension[(len(extension)-1)]
315
- if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
315
+ if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
316
316
  output_path_tmp = reference_data[:-3] + 'txt'
317
317
  build_library_from_raw_data(input_path=reference_data, output_path=output_path_tmp, is_reference=True)
318
318
  df_reference = pd.read_csv(output_path_tmp, sep='\t')
@@ -395,8 +395,8 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
395
395
  print(f'Warning: plots will be saved to the PDF ./spectrum1_{spectrum_ID1}_spectrum2_{spectrum_ID2}_plot.pdf in the current working directory.')
396
396
  output_path = f'{Path.cwd()}/spectrum1_{spectrum_ID1}_spectrum2_{spectrum_ID2}.pdf'
397
397
 
398
- min_mz = np.min([np.min(df_query['mz_ratio'].tolist()), np.min(df_reference['mz_ratio'].tolist())])
399
- max_mz = np.max([np.max(df_query['mz_ratio'].tolist()), np.max(df_reference['mz_ratio'].tolist())])
398
+ min_mz = int(np.min([np.min(df_query['mz_ratio'].tolist()), np.min(df_reference['mz_ratio'].tolist())]))
399
+ max_mz = int(np.max([np.max(df_query['mz_ratio'].tolist()), np.max(df_reference['mz_ratio'].tolist())]))
400
400
  mzs = np.linspace(min_mz,max_mz,(max_mz-min_mz+1))
401
401
 
402
402
  unique_query_ids = df_query['id'].unique().tolist()
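The int() casts added above matter because np.linspace takes an integer sample count: with float bounds, max_mz - min_mz + 1 is a float and the call raises a TypeError on the pinned numpy==1.26.4. A standalone illustration with made-up bounds:

```
import numpy as np

# Made-up m/z bounds; in the package these come from the query and reference m/z columns.
min_mz, max_mz = int(49.7), int(501.2)                   # -> 49, 501
mzs = np.linspace(min_mz, max_mz, max_mz - min_mz + 1)   # one point per nominal m/z unit
assert mzs[1] - mzs[0] == 1.0
```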
@@ -31,7 +31,8 @@ def objective_function_HRMS(X, ctx):
31
31
  p["wf_mz"], p["wf_int"], p["LET_threshold"],
32
32
  p["entropy_dimension"],
33
33
  ctx["high_quality_reference_library"],
34
- verbose=False
34
+ verbose=False,
35
+ exact_match_required=ctx["exact_match_required"]
35
36
  )
36
37
  print(f"\nparams({ctx['optimize_params']}) = {np.array(X)}\naccuracy: {acc*100}%")
37
38
  return 1.0 - acc
@@ -45,7 +46,8 @@ def objective_function_NRMS(X, ctx):
45
46
  ctx["mz_min"], ctx["mz_max"], ctx["int_min"], ctx["int_max"],
46
47
  p["noise_threshold"], p["wf_mz"], p["wf_int"], p["LET_threshold"], p["entropy_dimension"],
47
48
  ctx["high_quality_reference_library"],
48
- verbose=False
49
+ verbose=False,
50
+ exact_match_required=ctx["exact_match_required"]
49
51
  )
50
52
  print(f"\nparams({ctx['optimize_params']}) = {np.array(X)}\naccuracy: {acc*100}%")
51
53
  return 1.0 - acc
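Both objective functions return 1.0 - accuracy so that a minimizer drives identification accuracy up. The optimizer call itself is not shown in this diff; the sketch below assumes SciPy's differential_evolution (suggested by the maxiters, de_workers, and bounds arguments) and uses a hypothetical evaluate_accuracy stand-in for get_acc_HRMS/get_acc_NRMS.

```
from scipy.optimize import differential_evolution  # assumed backend; not shown in the diff

def objective(X, ctx):
    acc = evaluate_accuracy(X, ctx)  # hypothetical stand-in for get_acc_HRMS / get_acc_NRMS
    return 1.0 - acc                 # lower objective value <=> higher identification accuracy

# bounds = [param_bounds[p] for p in optimize_params], as assembled in tune_params_DE below
# result = differential_evolution(objective, bounds, args=(ctx,), maxiter=maxiters, workers=de_workers)
```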
@@ -53,7 +55,7 @@ def objective_function_NRMS(X, ctx):
53
55
 
54
56
 
55
57
 
56
- def tune_params_DE(query_data=None, reference_data=None, chromatography_platform='HRMS', precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, similarity_measure='cosine', weights=None, spectrum_preprocessing_order='CNMWL', mz_min=0, mz_max=999999999, int_min=0, int_max=999999999, high_quality_reference_library=False, optimize_params=["window_size_centroiding","window_size_matching","noise_threshold","wf_mz","wf_int","LET_threshold","entropy_dimension"], param_bounds={"window_size_centroiding":(0.0,0.5),"window_size_matching":(0.0,0.5),"noise_threshold":(0.0,0.25),"wf_mz":(0.0,5.0),"wf_int":(0.0,5.0),"LET_threshold":(0.0,5.0),"entropy_dimension":(1.0,3.0)}, default_params={"window_size_centroiding": 0.5, "window_size_matching":0.5, "noise_threshold":0.10, "wf_mz":0.0, "wf_int":1.0, "LET_threshold":0.0, "entropy_dimension":1.1}, maxiters=3, de_workers=1):
58
+ def tune_params_DE(query_data=None, reference_data=None, chromatography_platform='HRMS', precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, similarity_measure='cosine', weights=None, spectrum_preprocessing_order='CNMWL', mz_min=0, mz_max=999999999, int_min=0, int_max=999999999, high_quality_reference_library=False, optimize_params=["window_size_centroiding","window_size_matching","noise_threshold","wf_mz","wf_int","LET_threshold","entropy_dimension"], param_bounds={"window_size_centroiding":(0.0,0.5),"window_size_matching":(0.0,0.5),"noise_threshold":(0.0,0.25),"wf_mz":(0.0,5.0),"wf_int":(0.0,5.0),"LET_threshold":(0.0,5.0),"entropy_dimension":(1.0,3.0)}, default_params={"window_size_centroiding": 0.5, "window_size_matching":0.5, "noise_threshold":0.10, "wf_mz":0.0, "wf_int":1.0, "LET_threshold":0.0, "entropy_dimension":1.1}, maxiters=3, de_workers=1, exact_match_required=False):
57
59
 
58
60
  if query_data is None:
59
61
  print('\nError: No argument passed to the mandatory query_data. Please pass the path to the TXT file of the query data.')
@@ -63,7 +65,7 @@ def tune_params_DE(query_data=None, reference_data=None, chromatography_platform
63
65
  extension = extension[(len(extension)-1)]
64
66
  if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
65
67
  output_path_tmp = query_data[:-3] + 'txt'
66
- build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
68
+ build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=True)
67
69
  df_query = pd.read_csv(output_path_tmp, sep='\t')
68
70
  if extension == 'txt' or extension == 'TXT':
69
71
  df_query = pd.read_csv(query_data, sep='\t')
@@ -106,6 +108,7 @@ def tune_params_DE(query_data=None, reference_data=None, chromatography_platform
106
108
  high_quality_reference_library=high_quality_reference_library,
107
109
  default_params=default_params,
108
110
  optimize_params=optimize_params,
111
+ exact_match_required=exact_match_required
109
112
  )
110
113
 
111
114
  bounds = [param_bounds[p] for p in optimize_params]
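With exact_match_required now threaded through the ctx dictionary, a call to the DE tuner might look like the sketch below. The import path is assumed from the package layout, the tolerance value is illustrative, and the data paths mirror the example files used in the test suite.

```
# Sketch only: assumed import path, illustrative parameter choices.
from pycompound.spec_lib_matching import tune_params_DE

tune_params_DE(
    query_data='data/lcms_query.txt',
    reference_data='data/trimmed_GNPS_reference_library.txt',
    chromatography_platform='HRMS',
    precursor_ion_mz_tolerance=0.01,        # illustrative tolerance
    optimize_params=['noise_threshold', 'wf_mz', 'wf_int'],
    maxiters=3,
    de_workers=1,
    exact_match_required=False)             # default: relaxed, case-insensitive substring matching
```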
@@ -136,14 +139,7 @@ default_HRMS_grid = {'similarity_measure':['cosine'], 'weight':[{'Cosine':0.25,'
136
139
  default_NRMS_grid = {'similarity_measure':['cosine'], 'weight':[{'Cosine':0.25,'Shannon':0.25,'Renyi':0.25,'Tsallis':0.25}], 'spectrum_preprocessing_order':['FCNMWL'], 'mz_min':[0], 'mz_max':[9999999], 'int_min':[0], 'int_max':[99999999], 'noise_threshold':[0.0], 'wf_mz':[0.0], 'wf_int':[1.0], 'LET_threshold':[0.0], 'entropy_dimension':[1.1], 'high_quality_reference_library':[False]}
137
140
 
138
141
 
139
- def _eval_one_HRMS(df_query, df_reference,
140
- precursor_ion_mz_tolerance_tmp, ionization_mode_tmp, adduct_tmp,
141
- similarity_measure_tmp, weight,
142
- spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp,
143
- int_min_tmp, int_max_tmp, noise_threshold_tmp,
144
- window_size_centroiding_tmp, window_size_matching_tmp,
145
- wf_mz_tmp, wf_int_tmp, LET_threshold_tmp,
146
- entropy_dimension_tmp, high_quality_reference_library_tmp):
142
+ def _eval_one_HRMS(df_query, df_reference, precursor_ion_mz_tolerance_tmp, ionization_mode_tmp, adduct_tmp, similarity_measure_tmp, weight, spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp, int_min_tmp, int_max_tmp, noise_threshold_tmp, window_size_centroiding_tmp, window_size_matching_tmp, wf_mz_tmp, wf_int_tmp, LET_threshold_tmp, entropy_dimension_tmp, high_quality_reference_library_tmp, exact_match_required_tmp):
147
143
 
148
144
  acc = get_acc_HRMS(
149
145
  df_query=df_query, df_reference=df_reference,
@@ -160,7 +156,8 @@ def _eval_one_HRMS(df_query, df_reference,
160
156
  LET_threshold=LET_threshold_tmp,
161
157
  entropy_dimension=entropy_dimension_tmp,
162
158
  high_quality_reference_library=high_quality_reference_library_tmp,
163
- verbose=False
159
+ verbose=False,
160
+ exact_match_required=exact_match_required_tmp
164
161
  )
165
162
 
166
163
  return (
@@ -172,12 +169,7 @@ def _eval_one_HRMS(df_query, df_reference,
172
169
  )
173
170
 
174
171
 
175
- def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids,
176
- similarity_measure_tmp, weight,
177
- spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp,
178
- int_min_tmp, int_max_tmp, noise_threshold_tmp,
179
- wf_mz_tmp, wf_int_tmp, LET_threshold_tmp,
180
- entropy_dimension_tmp, high_quality_reference_library_tmp):
172
+ def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure_tmp, weight, spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp, int_min_tmp, int_max_tmp, noise_threshold_tmp, wf_mz_tmp, wf_int_tmp, LET_threshold_tmp, entropy_dimension_tmp, high_quality_reference_library_tmp, exact_match_required):
181
173
 
182
174
  acc = get_acc_NRMS(
183
175
  df_query=df_query, df_reference=df_reference,
@@ -191,7 +183,8 @@ def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_id
191
183
  LET_threshold=LET_threshold_tmp,
192
184
  entropy_dimension=entropy_dimension_tmp,
193
185
  high_quality_reference_library=high_quality_reference_library_tmp,
194
- verbose=False
186
+ verbose=False,
187
+ exact_match_required=exact_match_required_tmp
195
188
  )
196
189
 
197
190
  return (
@@ -202,7 +195,7 @@ def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_id
202
195
 
203
196
 
204
197
 
205
- def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, grid=None, output_path=None, return_output=False):
198
+ def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, grid=None, output_path=None, return_output=False, exact_match_required=False):
206
199
  grid = {**default_HRMS_grid, **(grid or {})}
207
200
  for key, value in grid.items():
208
201
  globals()[key] = value
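The grid-search counterpart gained the same flag. A hedged example of a small HRMS grid search follows (assumed import path; illustrative values; grid keys left unspecified fall back to default_HRMS_grid):

```
# Sketch only: assumed import path and illustrative grid values.
from pycompound.spec_lib_matching import tune_params_on_HRMS_data_grid

df_results = tune_params_on_HRMS_data_grid(
    query_data='data/lcms_query.txt',
    reference_data='data/trimmed_GNPS_reference_library.txt',
    precursor_ion_mz_tolerance=0.01,                            # illustrative tolerance
    grid={'noise_threshold': [0.0, 0.1], 'wf_mz': [0.0, 1.0]},
    return_output=True,                                         # assumed to hand back the results table
    exact_match_required=False)
```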
@@ -251,7 +244,9 @@ def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precurso
251
244
 
252
245
  param_grid = product(similarity_measure, weight, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold,
253
246
  window_size_centroiding, window_size_matching, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library)
254
- results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, *params) for params in param_grid)
247
+ #results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, (*params for params in param_grid), exact_match_required))
248
+ results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, *params, exact_match_required) for params in param_grid)
249
+
255
250
 
256
251
  df_out = pd.DataFrame(results, columns=[
257
252
  'ACC','SIMILARITY.MEASURE','WEIGHT','SPECTRUM.PROCESSING.ORDER', 'MZ.MIN','MZ.MAX','INT.MIN','INT.MAX','NOISE.THRESHOLD',
@@ -275,7 +270,7 @@ def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precurso
275
270
 
276
271
 
277
272
 
278
- def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=None, output_path=None, return_output=False):
273
+ def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=None, output_path=None, return_output=False, exact_match_required=False):
279
274
  grid = {**default_NRMS_grid, **(grid or {})}
280
275
  for key, value in grid.items():
281
276
  globals()[key] = value
@@ -318,7 +313,8 @@ def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=Non
318
313
 
319
314
  param_grid = product(similarity_measure, weight, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max,
320
315
  noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library)
321
- results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params) for params in param_grid)
316
+ #results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params) for params in param_grid, exact_match_required)
317
+ results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params, exact_match_required) for params in param_grid)
322
318
 
323
319
  df_out = pd.DataFrame(results, columns=['ACC','SIMILARITY.MEASURE','WEIGHT','SPECTRUM.PROCESSING.ORDER', 'MZ.MIN','MZ.MAX','INT.MIN','INT.MAX',
324
320
  'NOISE.THRESHOLD','WF.MZ','WF.INT','LET.THRESHOLD','ENTROPY.DIMENSION', 'HIGH.QUALITY.REFERENCE.LIBRARY'])
@@ -339,7 +335,7 @@ def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=Non
339
335
 
340
336
 
341
337
 
342
- def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, window_size_centroiding, window_size_matching, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True):
338
+ def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, window_size_centroiding, window_size_matching, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True, exact_match_required=False):
343
339
 
344
340
  n_top_matches_to_save = 1
345
341
  unique_reference_ids = df_reference['id'].dropna().astype(str).unique().tolist()
@@ -445,11 +441,17 @@ def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_
445
441
  df_tmp = pd.DataFrame({'TRUE.ID': df_scores.index.to_list(), 'PREDICTED.ID': top_ids, 'SCORE': top_scores})
446
442
  #if verbose:
447
443
  # print(df_tmp)
448
- acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
444
+ if exact_match_required == True:
445
+ acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
446
+ else:
447
+ true_lower = df_tmp['TRUE.ID'].str.lower()
448
+ pred_lower = df_tmp['PREDICTED.ID'].str.lower()
449
+ matches = [t in p for t, p in zip(true_lower, pred_lower)]
450
+ acc = sum(matches) / len(matches)
449
451
  return acc
450
452
 
451
453
 
452
- def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True):
454
+ def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True, exact_match_required=False):
453
455
 
454
456
  n_top_matches_to_save = 1
455
457
 
@@ -532,7 +534,13 @@ def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids,
532
534
  df_tmp = pd.DataFrame(out, columns=['TRUE.ID','PREDICTED.ID','SCORE'])
533
535
  #if verbose:
534
536
  # print(df_tmp)
535
- acc = (df_tmp['TRUE.ID']==df_tmp['PREDICTED.ID']).mean()
537
+ if exact_match_required == True:
538
+ acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
539
+ else:
540
+ true_lower = df_tmp['TRUE.ID'].str.lower()
541
+ pred_lower = df_tmp['PREDICTED.ID'].str.lower()
542
+ matches = [t in p for t, p in zip(true_lower, pred_lower)]
543
+ acc = sum(matches) / len(matches)
536
544
  return acc
537
545
 
538
546
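The relaxed accuracy used above (the default, exact_match_required=False) counts a prediction as correct when the true ID is a case-insensitive substring of the predicted ID. A self-contained illustration with made-up IDs:

```
import pandas as pd

# Made-up IDs, for illustration only.
df_tmp = pd.DataFrame({'TRUE.ID':      ['Caffeine',     'Glucose'],
                       'PREDICTED.ID': ['Caffeine M+H', 'Fructose']})

true_lower = df_tmp['TRUE.ID'].str.lower()
pred_lower = df_tmp['PREDICTED.ID'].str.lower()
matches = [t in p for t, p in zip(true_lower, pred_lower)]  # [True, False]
acc = sum(matches) / len(matches)                           # 0.5
```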
 
@@ -797,7 +805,7 @@ def run_spec_lib_matching_on_NRMS_data(query_data=None, reference_data=None, lik
797
805
  else:
798
806
  extension = query_data.rsplit('.',1)
799
807
  extension = extension[(len(extension)-1)]
800
- if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
808
+ if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
801
809
  output_path_tmp = query_data[:-3] + 'txt'
802
810
  build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
803
811
  df_query = pd.read_csv(output_path_tmp, sep='\t')
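Several functions in this release extend the raw-data extension check to MSP and JSON inputs. The released code spells out every case explicitly; an equivalent, more compact check (a sketch, not the package's implementation) would be:

```
# Equivalent sketch of the extension check added in this release;
# SUPPORTED_RAW_EXTENSIONS is a hypothetical constant, not part of the package.
SUPPORTED_RAW_EXTENSIONS = {'mgf', 'mzml', 'cdf', 'msp', 'json'}

query_data = 'data/lcms_query.msp'              # illustrative path
extension = query_data.rsplit('.', 1)[-1]
if extension.lower() in SUPPORTED_RAW_EXTENSIONS:
    print(f'{query_data} is a supported raw-data format')
```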
@@ -0,0 +1,28 @@
1
+ Metadata-Version: 2.4
2
+ Name: pycompound
3
+ Version: 0.1.10
4
+ Summary: Python package to perform compound identification in mass spectrometry via spectral library matching.
5
+ Author-email: Hunter Dlugas <fy7392@wayne.edu>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/hdlugas/pycompound
8
+ Project-URL: Issues, https://github.com/hdlugas/pycompound/issues
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.9
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: matplotlib==3.8.4
15
+ Requires-Dist: numpy==1.26.4
16
+ Requires-Dist: pandas==2.2.2
17
+ Requires-Dist: scipy==1.13.1
18
+ Requires-Dist: pyteomics==4.7.2
19
+ Requires-Dist: netCDF4==1.6.5
20
+ Requires-Dist: lxml>=5.1.0
21
+ Requires-Dist: orjson==3.11.0
22
+ Requires-Dist: shiny==1.4.0
23
+ Requires-Dist: joblib==1.5.2
24
+ Dynamic: license-file
25
+
26
+ # PyCompound
27
+
28
+ A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) and as a GUI application built with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and numerous binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow, given a query library of spectra with known IDs. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.
@@ -1,5 +1,6 @@
1
1
  LICENSE
2
2
  README.md
3
+ README_PyPI.md
3
4
  pyproject.toml
4
5
  src/pycompound/build_library.py
5
6
  src/pycompound/plot_spectra.py
@@ -8,7 +8,7 @@ os.makedirs(f'{Path.cwd()}/plots', exist_ok=True)
8
8
 
9
9
  print('\n\ntest #1:')
10
10
  generate_plots_on_HRMS_data(
11
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
11
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
12
12
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
13
13
  high_quality_reference_library=True,
14
14
  noise_threshold=0.1,
@@ -17,7 +17,7 @@ generate_plots_on_HRMS_data(
17
17
 
18
18
  print('\n\ntest #2:')
19
19
  generate_plots_on_HRMS_data(
20
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
20
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
21
21
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
22
22
  noise_threshold=0.1,
23
23
  similarity_measure='shannon',
@@ -25,7 +25,7 @@ generate_plots_on_HRMS_data(
25
25
 
26
26
  print('\n\ntest #3:')
27
27
  generate_plots_on_HRMS_data(
28
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
28
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
29
29
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
30
30
  similarity_measure='renyi',
31
31
  entropy_dimension=1.2,
@@ -33,7 +33,7 @@ generate_plots_on_HRMS_data(
33
33
 
34
34
  print('\n\ntest #4:')
35
35
  generate_plots_on_HRMS_data(
36
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
36
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
37
37
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
38
38
  similarity_measure='tsallis',
39
39
  entropy_dimension=1.2,
@@ -41,7 +41,7 @@ generate_plots_on_HRMS_data(
41
41
 
42
42
  print('\n\ntest #5:')
43
43
  generate_plots_on_HRMS_data(
44
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
44
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
45
45
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
46
46
  similarity_measure='tsallis',
47
47
  entropy_dimension=1.2,
@@ -49,7 +49,7 @@ generate_plots_on_HRMS_data(
49
49
 
50
50
  print('\n\ntest #6:')
51
51
  generate_plots_on_HRMS_data(
52
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
52
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
53
53
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
54
54
  wf_intensity=0.8,
55
55
  wf_mz=1.1,
@@ -57,21 +57,21 @@ generate_plots_on_HRMS_data(
57
57
 
58
58
  print('\n\ntest #7:')
59
59
  generate_plots_on_HRMS_data(
60
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
60
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
61
61
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
62
62
  window_size_centroiding=0.1,
63
63
  output_path=f'{Path.cwd()}/plots/test7.pdf')
64
64
 
65
65
  print('\n\ntest #8:')
66
66
  generate_plots_on_HRMS_data(
67
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
67
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
68
68
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
69
69
  window_size_matching=0.25,
70
70
  output_path=f'{Path.cwd()}/plots/test8.pdf')
71
71
 
72
72
  print('\n\ntest #9:')
73
73
  generate_plots_on_HRMS_data(
74
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
74
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
75
75
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
76
76
  spectrum_preprocessing_order='WCM',
77
77
  wf_mz=0.8,
@@ -80,14 +80,14 @@ generate_plots_on_HRMS_data(
80
80
 
81
81
  print('\n\ntest #10:')
82
82
  generate_plots_on_HRMS_data(
83
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
83
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
84
84
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
85
85
  LET_threshold=3,
86
86
  output_path=f'{Path.cwd()}/plots/test10.pdf')
87
87
 
88
88
  print('\n\ntest #11:')
89
89
  generate_plots_on_HRMS_data(
90
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
90
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
91
91
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
92
92
  spectrum_ID1 = 212,
93
93
  spectrum_ID2 = 100,
@@ -96,7 +96,7 @@ generate_plots_on_HRMS_data(
96
96
 
97
97
  print('\n\ntest #12:')
98
98
  generate_plots_on_HRMS_data(
99
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
99
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
100
100
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
101
101
  spectrum_ID1 = 'Jamaicamide A M+H',
102
102
  spectrum_ID2 = 'Malyngamide J M+H',
@@ -105,7 +105,7 @@ generate_plots_on_HRMS_data(
105
105
 
106
106
  print('\n\ntest #13:')
107
107
  generate_plots_on_HRMS_data(
108
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
108
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
109
109
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
110
110
  spectrum_ID1 = 'Jamaicamide A M+H',
111
111
  spectrum_ID2 = 'Jamaicamide A M+H',
@@ -114,13 +114,13 @@ generate_plots_on_HRMS_data(
114
114
 
115
115
  print('\n\ntest #14:')
116
116
  generate_plots_on_NRMS_data(
117
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
117
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
118
118
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
119
119
  output_path=f'{Path.cwd()}/plots/test14.pdf')
120
120
 
121
121
  print('\n\ntest #15:')
122
122
  generate_plots_on_NRMS_data(
123
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
123
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
124
124
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
125
125
  spectrum_ID1 = 463514,
126
126
  spectrum_ID2 = 112312,
@@ -128,40 +128,40 @@ generate_plots_on_NRMS_data(
128
128
 
129
129
  print('\n\ntest #17:')
130
130
  generate_plots_on_NRMS_data(
131
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
131
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
132
132
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
133
133
  output_path=f'{Path.cwd()}/plots/test17.pdf')
134
134
 
135
135
  print('\n\ntest #18:')
136
136
  generate_plots_on_NRMS_data(
137
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
137
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
138
138
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
139
139
  y_axis_transformation='none',
140
140
  output_path=f'{Path.cwd()}/plots/test18.pdf')
141
141
 
142
142
  print('\n\ntest #19:')
143
143
  generate_plots_on_NRMS_data(
144
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
144
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
145
145
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
146
146
  y_axis_transformation='log10',
147
147
  output_path=f'{Path.cwd()}/plots/test19.pdf')
148
148
 
149
149
  print('\n\ntest #20:')
150
150
  generate_plots_on_NRMS_data(
151
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
151
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
152
152
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
153
153
  y_axis_transformation='sqrt',
154
154
  output_path=f'{Path.cwd()}/plots/test20.pdf')
155
155
 
156
156
  print('\n\ntest #21:')
157
157
  generate_plots_on_HRMS_data(
158
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
158
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
159
159
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
160
160
  output_path=f'{Path.cwd()}/plots/test_no_wf_normalized_y_axis_no_mz_zoom.pdf')
161
161
 
162
162
  print('\n\ntest #22:')
163
163
  generate_plots_on_HRMS_data(
164
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
164
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
165
165
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
166
166
  wf_mz=2,
167
167
  wf_intensity=0.5,
@@ -169,21 +169,21 @@ generate_plots_on_HRMS_data(
169
169
 
170
170
  print('\n\ntest #23:')
171
171
  generate_plots_on_HRMS_data(
172
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
172
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
173
173
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
174
174
  y_axis_transformation='log10',
175
175
  output_path=f'{Path.cwd()}/plots/test_no_wf_log10_y_axis_no_mz_zoom.pdf')
176
176
 
177
177
  print('\n\ntest #24:')
178
178
  generate_plots_on_HRMS_data(
179
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
179
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
180
180
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
181
181
  y_axis_transformation='sqrt',
182
182
  output_path=f'{Path.cwd()}/plots/test_no_wf_sqrt_y_axis_no_mz_zoom.pdf')
183
183
 
184
184
  print('\n\ntest #25:')
185
185
  generate_plots_on_HRMS_data(
186
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
186
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
187
187
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
188
188
  mz_min = 400,
189
189
  mz_max = 650,
@@ -192,49 +192,49 @@ generate_plots_on_HRMS_data(
192
192
 
193
193
  print('\n\ntest #26:')
194
194
  generate_plots_on_HRMS_data(
195
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
195
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
196
196
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
197
197
  high_quality_reference_library=False,
198
198
  output_path=f'{Path.cwd()}/plots/test_HRMS.pdf')
199
199
 
200
200
  print('\n\ntest #27:')
201
201
  generate_plots_on_NRMS_data(
202
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
202
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
203
203
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
204
204
  high_quality_reference_library=False,
205
205
  output_path=f'{Path.cwd()}/plots/test_NRMS.pdf')
206
206
 
207
207
  print('\n\ntest #28:')
208
208
  generate_plots_on_HRMS_data(
209
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
209
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
210
210
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
211
211
  similarity_measure='jaccard',
212
212
  output_path=f'{Path.cwd()}/plots/test28.pdf')
213
213
 
214
214
  print('\n\ntest #28:')
215
215
  generate_plots_on_HRMS_data(
216
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
216
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
217
217
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
218
218
  similarity_measure='hamming',
219
219
  output_path=f'{Path.cwd()}/plots/test28.pdf')
220
220
 
221
221
  print('\n\ntest #29:')
222
222
  generate_plots_on_NRMS_data(
223
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
223
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
224
224
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
225
225
  similarity_measure='sokal_sneath',
226
226
  output_path=f'{Path.cwd()}/plots/test29.pdf')
227
227
 
228
228
  print('\n\ntest #30:')
229
229
  generate_plots_on_NRMS_data(
230
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
230
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
231
231
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
232
232
  similarity_measure='simpson',
233
233
  output_path=f'{Path.cwd()}/plots/test30.pdf')
234
234
 
235
235
  print('\n\ntest #31:')
236
236
  generate_plots_on_NRMS_data(
237
- query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
237
+ query_data=f'{Path.cwd()}/data/gcms_query.txt',
238
238
  reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
239
239
  similarity_measure='mixture',
240
240
  weights={'Cosine':0.5, 'Shannon':0.3, 'Renyi':0.1, 'Tsallis':0.1},
@@ -242,9 +242,42 @@ generate_plots_on_NRMS_data(
242
242
 
243
243
  print('\n\ntest #32:')
244
244
  generate_plots_on_HRMS_data(
245
- query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
245
+ query_data=f'{Path.cwd()}/data/lcms_query.txt',
246
246
  reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
247
247
  similarity_measure='mixture',
248
248
  weights={'Cosine':0.1, 'Shannon':0.2, 'Renyi':0.3, 'Tsallis':0.4},
249
249
  output_path=f'{Path.cwd()}/plots/test32.pdf')
250
250
 
251
+ print('\n\ntest #33:')
252
+ generate_plots_on_HRMS_data(
253
+ query_data=f'{Path.cwd()}/data/lcms_query.msp',
254
+ reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
255
+ high_quality_reference_library=True,
256
+ noise_threshold=0.1,
257
+ mz_min=100,
258
+ output_path=f'{Path.cwd()}/plots/test33.pdf')
259
+
260
+ print('\n\ntest #34:')
261
+ generate_plots_on_HRMS_data(
262
+ query_data=f'{Path.cwd()}/data/lcms_query_tuning.msp',
263
+ reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
264
+ high_quality_reference_library=True,
265
+ noise_threshold=0.1,
266
+ mz_min=100,
267
+ output_path=f'{Path.cwd()}/plots/test34.pdf')
268
+
269
+ print('\n\ntest #35:')
270
+ generate_plots_on_NRMS_data(
271
+ query_data=f'{Path.cwd()}/data/gcms_query.msp',
272
+ reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
273
+ similarity_measure='shannon',
274
+ weights={'Cosine':0.5, 'Shannon':0.3, 'Renyi':0.1, 'Tsallis':0.1},
275
+ output_path=f'{Path.cwd()}/plots/test35.pdf')
276
+
277
+ print('\n\ntest #36:')
278
+ generate_plots_on_NRMS_data(
279
+ query_data=f'{Path.cwd()}/data/gcms_query.msp',
280
+ reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
281
+ similarity_measure='cosine',
282
+ output_path=f'{Path.cwd()}/plots/test36.pdf')
283
+