PyPI - pycompound - Versions diffs - 0.1.8__tar.gz → 0.1.10__tar.gz - Mend

pycompound 0.1.8__tar.gz → 0.1.10__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

pycompound-0.1.10/PKG-INFO ADDED

@@ -0,0 +1,28 @@
+Metadata-Version: 2.4
+Name: pycompound
+Version: 0.1.10
+Summary: Python package to perform compound identification in mass spectrometry via spectral library matching.
+Author-email: Hunter Dlugas <fy7392@wayne.edu>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/hdlugas/pycompound
+Project-URL: Issues, https://github.com/hdlugas/pycompound/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: matplotlib==3.8.4
+Requires-Dist: numpy==1.26.4
+Requires-Dist: pandas==2.2.2
+Requires-Dist: scipy==1.13.1
+Requires-Dist: pyteomics==4.7.2
+Requires-Dist: netCDF4==1.6.5
+Requires-Dist: lxml>=5.1.0
+Requires-Dist: orjson==3.11.0
+Requires-Dist: shiny==1.4.0
+Requires-Dist: joblib==1.5.2
+Dynamic: license-file
+# PyCompound
+A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.

{pycompound-0.1.8 → pycompound-0.1.10}/README.md RENAMED

@@ -19,9 +19,9 @@ A Python-based tool for spectral library matching, PyCompound is available as a
 ## 1. Install dependencies
 PyCompound requires the Python dependencies Matplotlib, NumPy, Pandas, SciPy, Pyteomics, and netCDF4. Specifically, this software was validated with python=3.12.4, matplotlib=3.8.4, numpy=1.26.4, pandas=2.2.2, scipy=1.13.1, pyteomics=4.7.2, netCDF4=1.6.5, lxml=5.1.0, joblib=1.5.2, and shiny=1.4.0, although it may work with other versions of these tools. A user may consider creating a conda environment (see [https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) for guidance on getting started with conda if you are unfamiliar). For a system with conda installed, one can create the environment pycompound_env, activate it, and install the necessary dependencies with:
 ```
-conda create -n pycompound_env python=3.12
+conda create -n pycompound_env python=3.12 -y
 conda activate pycompound_env
-pip install pycompound==0.1.7
+pip install pycompound==0.1.10
 ```
 <a name="functionality"></a>

pycompound-0.1.10/README_PyPI.md ADDED

@@ -0,0 +1,3 @@
+# PyCompound
+A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.

{pycompound-0.1.8 → pycompound-0.1.10}/pyproject.toml RENAMED

@@ -4,12 +4,12 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "pycompound"
-version = "0.1.8"
+version = "0.1.10"
 authors = [
   { name="Hunter Dlugas", email="fy7392@wayne.edu" },
 ]
 description = "Python package to perform compound identification in mass spectrometry via spectral library matching."
-readme = "README.md"
+readme = "README_PyPI.md"
 requires-python = ">=3.9"
 classifiers = [
     "Programming Language :: Python :: 3",

{pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/plot_spectra.py RENAMED

@@ -14,7 +14,7 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
     else:
         extension = query_data.rsplit('.',1)
         extension = extension[(len(extension)-1)]
-        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
             output_path_tmp = query_data[:-3] + 'txt'
             build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=True)
             df_query = pd.read_csv(output_path_tmp, sep='\t')
@@ -29,7 +29,7 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
     else:
         extension = reference_data.rsplit('.',1)
         extension = extension[(len(extension)-1)]
-        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
             output_path_tmp = reference_data[:-3] + 'txt'
             build_library_from_raw_data(input_path=reference_data, output_path=output_path_tmp, is_reference=True)
             df_reference = pd.read_csv(output_path_tmp, sep='\t')
@@ -298,7 +298,7 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
     else:
         extension = query_data.rsplit('.',1)
         extension = extension[(len(extension)-1)]
-        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
             output_path_tmp = query_data[:-3] + 'txt'
             build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
             df_query = pd.read_csv(output_path_tmp, sep='\t')
@@ -312,7 +312,7 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
     else:
         extension = reference_data.rsplit('.',1)
         extension = extension[(len(extension)-1)]
-        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
             output_path_tmp = reference_data[:-3] + 'txt'
             build_library_from_raw_data(input_path=reference_data, output_path=output_path_tmp, is_reference=True)
             df_reference = pd.read_csv(output_path_tmp, sep='\t')
@@ -395,8 +395,8 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
         print(f'Warning: plots will be saved to the PDF ./spectrum1_{spectrum_ID1}_spectrum2_{spectrum_ID2}_plot.pdf in the current working directory.')
         output_path = f'{Path.cwd()}/spectrum1_{spectrum_ID1}_spectrum2_{spectrum_ID2}.pdf'
-    min_mz = np.min([np.min(df_query['mz_ratio'].tolist()), np.min(df_reference['mz_ratio'].tolist())])
-    max_mz = np.max([np.max(df_query['mz_ratio'].tolist()), np.max(df_reference['mz_ratio'].tolist())])
+    min_mz = int(np.min([np.min(df_query['mz_ratio'].tolist()), np.min(df_reference['mz_ratio'].tolist())]))
+    max_mz = int(np.max([np.max(df_query['mz_ratio'].tolist()), np.max(df_reference['mz_ratio'].tolist())]))
     mzs = np.linspace(min_mz,max_mz,(max_mz-min_mz+1))
     unique_query_ids = df_query['id'].unique().tolist()
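
The four extension hunks above derive the temporary library path with `[:-3] + 'txt'`, which replaces exactly three trailing characters. That matches `mgf`, `cdf`, and `msp`, but for the four-character extensions in the same condition (`mzML`, `json`) it appears to leave one stray character, so `sample.json` would become `sample.jtxt`. The following is a minimal sketch of a suffix-aware alternative, not the package's code: `RAW_EXTENSIONS`, `is_raw_format`, `tmp_txt_path`, and `sliced_txt_path` are hypothetical names introduced only for illustration.

```python
from pathlib import Path

# Hypothetical set mirroring the extensions tested in the hunks above (lower-cased).
RAW_EXTENSIONS = {"mgf", "mzml", "cdf", "msp", "json"}


def is_raw_format(path):
    """Return True if the file extension is one of the raw formats.

    Assumption: a fully case-insensitive comparison is acceptable. This is
    slightly broader than the explicit list of spellings in the diff.
    """
    return Path(path).suffix.lstrip(".").lower() in RAW_EXTENSIONS


def tmp_txt_path(path):
    """Swap the whole extension for .txt, whatever its length."""
    return str(Path(path).with_suffix(".txt"))


def sliced_txt_path(path):
    """Reproduce the slicing used in the diff, for comparison only."""
    return path[:-3] + "txt"


if __name__ == "__main__":
    for name in ("sample.mgf", "sample.mzML", "sample.json"):
        print(name, "->", sliced_txt_path(name), "vs", tmp_txt_path(name))
```

Using `Path.with_suffix` keeps the behavior identical for three-character extensions while avoiding the off-by-one for longer ones.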

{pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound/spec_lib_matching.py RENAMED

@@ -31,7 +31,8 @@ def objective_function_HRMS(X, ctx):
         p["wf_mz"], p["wf_int"], p["LET_threshold"],
         p["entropy_dimension"],
         ctx["high_quality_reference_library"],
-        verbose=False
+        verbose=False,
+        exact_match_required=ctx["exact_match_required"]
     )
     print(f"\nparams({ctx['optimize_params']}) = {np.array(X)}\naccuracy: {acc*100}%")
     return 1.0 - acc
@@ -45,7 +46,8 @@ def objective_function_NRMS(X, ctx):
         ctx["mz_min"], ctx["mz_max"], ctx["int_min"], ctx["int_max"],
         p["noise_threshold"], p["wf_mz"], p["wf_int"], p["LET_threshold"], p["entropy_dimension"],
         ctx["high_quality_reference_library"],
-        verbose=False
+        verbose=False,
+        exact_match_required=ctx["exact_match_required"]
     )
     print(f"\nparams({ctx['optimize_params']}) = {np.array(X)}\naccuracy: {acc*100}%")
     return 1.0 - acc
@@ -53,7 +55,7 @@ def objective_function_NRMS(X, ctx):
-def tune_params_DE(query_data=None, reference_data=None, chromatography_platform='HRMS', precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, similarity_measure='cosine', weights=None, spectrum_preprocessing_order='CNMWL', mz_min=0, mz_max=999999999, int_min=0, int_max=999999999, high_quality_reference_library=False, optimize_params=["window_size_centroiding","window_size_matching","noise_threshold","wf_mz","wf_int","LET_threshold","entropy_dimension"], param_bounds={"window_size_centroiding":(0.0,0.5),"window_size_matching":(0.0,0.5),"noise_threshold":(0.0,0.25),"wf_mz":(0.0,5.0),"wf_int":(0.0,5.0),"LET_threshold":(0.0,5.0),"entropy_dimension":(1.0,3.0)}, default_params={"window_size_centroiding": 0.5, "window_size_matching":0.5, "noise_threshold":0.10, "wf_mz":0.0, "wf_int":1.0, "LET_threshold":0.0, "entropy_dimension":1.1}, maxiters=3, de_workers=1):
+def tune_params_DE(query_data=None, reference_data=None, chromatography_platform='HRMS', precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, similarity_measure='cosine', weights=None, spectrum_preprocessing_order='CNMWL', mz_min=0, mz_max=999999999, int_min=0, int_max=999999999, high_quality_reference_library=False, optimize_params=["window_size_centroiding","window_size_matching","noise_threshold","wf_mz","wf_int","LET_threshold","entropy_dimension"], param_bounds={"window_size_centroiding":(0.0,0.5),"window_size_matching":(0.0,0.5),"noise_threshold":(0.0,0.25),"wf_mz":(0.0,5.0),"wf_int":(0.0,5.0),"LET_threshold":(0.0,5.0),"entropy_dimension":(1.0,3.0)}, default_params={"window_size_centroiding": 0.5, "window_size_matching":0.5, "noise_threshold":0.10, "wf_mz":0.0, "wf_int":1.0, "LET_threshold":0.0, "entropy_dimension":1.1}, maxiters=3, de_workers=1, exact_match_required=False):
     if query_data is None:
         print('\nError: No argument passed to the mandatory query_data. Please pass the path to the TXT file of the query data.')
@@ -63,7 +65,7 @@ def tune_params_DE(query_data=None, reference_data=None, chromatography_platform
         extension = extension[(len(extension)-1)]
         if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
             output_path_tmp = query_data[:-3] + 'txt'
-            build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
+            build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=True)
             df_query = pd.read_csv(output_path_tmp, sep='\t')
         if extension == 'txt' or extension == 'TXT':
             df_query = pd.read_csv(query_data, sep='\t')
@@ -106,6 +108,7 @@ def tune_params_DE(query_data=None, reference_data=None, chromatography_platform
         high_quality_reference_library=high_quality_reference_library,
         default_params=default_params,
         optimize_params=optimize_params,
+        exact_match_required=exact_match_required
     )
     bounds = [param_bounds[p] for p in optimize_params]
@@ -136,14 +139,7 @@ default_HRMS_grid = {'similarity_measure':['cosine'], 'weight':[{'Cosine':0.25,'
 default_NRMS_grid = {'similarity_measure':['cosine'], 'weight':[{'Cosine':0.25,'Shannon':0.25,'Renyi':0.25,'Tsallis':0.25}], 'spectrum_preprocessing_order':['FCNMWL'], 'mz_min':[0], 'mz_max':[9999999], 'int_min':[0], 'int_max':[99999999], 'noise_threshold':[0.0], 'wf_mz':[0.0], 'wf_int':[1.0], 'LET_threshold':[0.0], 'entropy_dimension':[1.1], 'high_quality_reference_library':[False]}
-def _eval_one_HRMS(df_query, df_reference,
-              precursor_ion_mz_tolerance_tmp, ionization_mode_tmp, adduct_tmp,
-              similarity_measure_tmp, weight,
-              spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp,
-              int_min_tmp, int_max_tmp, noise_threshold_tmp,
-              window_size_centroiding_tmp, window_size_matching_tmp,
-              wf_mz_tmp, wf_int_tmp, LET_threshold_tmp,
-              entropy_dimension_tmp, high_quality_reference_library_tmp):
+def _eval_one_HRMS(df_query, df_reference, precursor_ion_mz_tolerance_tmp, ionization_mode_tmp, adduct_tmp, similarity_measure_tmp, weight, spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp, int_min_tmp, int_max_tmp, noise_threshold_tmp, window_size_centroiding_tmp, window_size_matching_tmp, wf_mz_tmp, wf_int_tmp, LET_threshold_tmp, entropy_dimension_tmp, high_quality_reference_library_tmp, exact_match_required_tmp):
     acc = get_acc_HRMS(
         df_query=df_query, df_reference=df_reference,
@@ -160,7 +156,8 @@ def _eval_one_HRMS(df_query, df_reference,
         LET_threshold=LET_threshold_tmp,
         entropy_dimension=entropy_dimension_tmp,
         high_quality_reference_library=high_quality_reference_library_tmp,
-        verbose=False
+        verbose=False,
+        exact_match_required=exact_match_required_tmp
     )
     return (
@@ -172,12 +169,7 @@ def _eval_one_HRMS(df_query, df_reference,
     )
-def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids,
-              similarity_measure_tmp, weight,
-              spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp,
-              int_min_tmp, int_max_tmp, noise_threshold_tmp,
-              wf_mz_tmp, wf_int_tmp, LET_threshold_tmp,
-              entropy_dimension_tmp, high_quality_reference_library_tmp):
+def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure_tmp, weight, spectrum_preprocessing_order_tmp, mz_min_tmp, mz_max_tmp, int_min_tmp, int_max_tmp, noise_threshold_tmp, wf_mz_tmp, wf_int_tmp, LET_threshold_tmp, entropy_dimension_tmp, high_quality_reference_library_tmp, exact_match_required):
     acc = get_acc_NRMS(
         df_query=df_query, df_reference=df_reference,
@@ -191,7 +183,8 @@ def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_id
         LET_threshold=LET_threshold_tmp,
         entropy_dimension=entropy_dimension_tmp,
         high_quality_reference_library=high_quality_reference_library_tmp,
-        verbose=False
+        verbose=False,
+        exact_match_required=exact_match_required_tmp
     )
     return (
@@ -202,7 +195,7 @@ def _eval_one_NRMS(df_query, df_reference, unique_query_ids, unique_reference_id
-def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, grid=None, output_path=None, return_output=False):
+def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precursor_ion_mz_tolerance=None, ionization_mode=None, adduct=None, grid=None, output_path=None, return_output=False, exact_match_required=False):
     grid = {**default_HRMS_grid, **(grid or {})}
     for key, value in grid.items():
         globals()[key] = value
@@ -251,7 +244,9 @@ def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precurso
     param_grid = product(similarity_measure, weight, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold,
                          window_size_centroiding, window_size_matching, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library)
-    results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct,  *params) for params in param_grid)
+    #results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, (*params for params in param_grid), exact_match_required))
+    results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_HRMS)(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, *params, exact_match_required) for params in param_grid)
     df_out = pd.DataFrame(results, columns=[
         'ACC','SIMILARITY.MEASURE','WEIGHT','SPECTRUM.PROCESSING.ORDER', 'MZ.MIN','MZ.MAX','INT.MIN','INT.MAX','NOISE.THRESHOLD',
@@ -275,7 +270,7 @@ def tune_params_on_HRMS_data_grid(query_data=None, reference_data=None, precurso
-def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=None, output_path=None, return_output=False):
+def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=None, output_path=None, return_output=False, exact_match_required=False):
     grid = {**default_NRMS_grid, **(grid or {})}
     for key, value in grid.items():
         globals()[key] = value
@@ -318,7 +313,8 @@ def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=Non
     param_grid = product(similarity_measure, weight, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max,
                          noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library)
-    results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params) for params in param_grid)
+    #results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params) for params in param_grid, exact_match_required)
+    results = Parallel(n_jobs=-1, verbose=10)(delayed(_eval_one_NRMS)(df_query, df_reference, unique_query_ids, unique_reference_ids, *params, exact_match_required) for params in param_grid)
     df_out = pd.DataFrame(results, columns=['ACC','SIMILARITY.MEASURE','WEIGHT','SPECTRUM.PROCESSING.ORDER', 'MZ.MIN','MZ.MAX','INT.MIN','INT.MAX',
                                             'NOISE.THRESHOLD','WF.MZ','WF.INT','LET.THRESHOLD','ENTROPY.DIMENSION', 'HIGH.QUALITY.REFERENCE.LIBRARY'])
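
The two grid hunks above append `exact_match_required` as a trailing positional argument after `*params` inside the generator expression, while the commented-out attempts place it outside the generator. A minimal sketch of the working pattern follows, using a plain list comprehension instead of joblib so it runs with the standard library only. `evaluate` is a hypothetical stand-in for `_eval_one_NRMS`, and the grid values are made up for illustration.

```python
from itertools import product


def evaluate(dataset, measure, threshold, exact_match_required):
    """Hypothetical stand-in for an evaluation function such as _eval_one_NRMS."""
    return (dataset, measure, threshold, exact_match_required)


# Illustrative grid; the real grids in the diff have many more keys.
grid = {"measure": ["cosine", "shannon"], "threshold": [0.0, 0.1]}
param_grid = product(grid["measure"], grid["threshold"])
exact_match_required = False

# Each tuple from the grid is unpacked, then the fixed flag is appended as the
# last positional argument of the same call (allowed since Python 3.5).
results = [
    evaluate("query", *params, exact_match_required)
    for params in param_grid
]

if __name__ == "__main__":
    for row in results:
        print(row)
```

Because the flag is passed positionally, it has to line up with the last parameter of the callee. As far as the visible hunks show, `_eval_one_NRMS` declares that parameter as `exact_match_required` but forwards `exact_match_required_tmp`, so the name used inside the function may need to be checked.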
@@ -339,7 +335,7 @@ def tune_params_on_NRMS_data_grid(query_data=None, reference_data=None, grid=Non
-def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, window_size_centroiding, window_size_matching, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True):
+def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_mode, adduct, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, window_size_centroiding, window_size_matching, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True, exact_match_required=False):
     n_top_matches_to_save = 1
     unique_reference_ids = df_reference['id'].dropna().astype(str).unique().tolist()
@@ -445,11 +441,17 @@ def get_acc_HRMS(df_query, df_reference, precursor_ion_mz_tolerance, ionization_
     df_tmp = pd.DataFrame({'TRUE.ID': df_scores.index.to_list(), 'PREDICTED.ID': top_ids, 'SCORE': top_scores})
     #if verbose:
     #    print(df_tmp)
-    acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
+    if exact_match_required == True:
+        acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
+    else:
+        true_lower = df_tmp['TRUE.ID'].str.lower()
+        pred_lower = df_tmp['PREDICTED.ID'].str.lower()
+        matches = [t in p for t, p in zip(true_lower, pred_lower)]
+        acc = sum(matches) / len(matches)
     return acc
-def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True):
+def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids, similarity_measure, weights, spectrum_preprocessing_order, mz_min, mz_max, int_min, int_max, noise_threshold, wf_mz, wf_int, LET_threshold, entropy_dimension, high_quality_reference_library, verbose=True, exact_match_required=False):
     n_top_matches_to_save = 1
@@ -532,7 +534,13 @@ def get_acc_NRMS(df_query, df_reference, unique_query_ids, unique_reference_ids,
     df_tmp = pd.DataFrame(out, columns=['TRUE.ID','PREDICTED.ID','SCORE'])
     #if verbose:
     #    print(df_tmp)
-    acc = (df_tmp['TRUE.ID']==df_tmp['PREDICTED.ID']).mean()
+    if exact_match_required == True:
+        acc = (df_tmp['TRUE.ID'] == df_tmp['PREDICTED.ID']).mean()
+    else:
+        true_lower = df_tmp['TRUE.ID'].str.lower()
+        pred_lower = df_tmp['PREDICTED.ID'].str.lower()
+        matches = [t in p for t, p in zip(true_lower, pred_lower)]
+        acc = sum(matches) / len(matches)
     return acc
@@ -797,7 +805,7 @@ def run_spec_lib_matching_on_NRMS_data(query_data=None, reference_data=None, lik
     else:
         extension = query_data.rsplit('.',1)
         extension = extension[(len(extension)-1)]
-        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF':
+        if extension == 'mgf' or extension == 'MGF' or extension == 'mzML' or extension == 'mzml' or extension == 'MZML' or extension == 'cdf' or extension == 'CDF' or extension == 'msp' or extension == 'MSP' or extension == 'json' or extension == 'JSON':
             output_path_tmp = query_data[:-3] + 'txt'
             build_library_from_raw_data(input_path=query_data, output_path=output_path_tmp, is_reference=False)
             df_query = pd.read_csv(output_path_tmp, sep='\t')
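
The `get_acc_HRMS` and `get_acc_NRMS` hunks above add a relaxed scoring mode: when `exact_match_required` is false, a prediction counts as correct if the lower-cased true ID is a substring of the lower-cased predicted ID. The sketch below restates that rule with plain Python lists. `identification_accuracy` is a hypothetical name, and casting IDs with `str()` is an assumption added here, since the pandas `.str` accessor used in the diff is only available for string data and numeric ID columns may need casting first.

```python
def identification_accuracy(true_ids, predicted_ids, exact_match_required=False):
    """Fraction of spectra whose predicted ID matches the true ID.

    exact_match_required=True  -> IDs must be equal.
    exact_match_required=False -> case-insensitive substring test, i.e. the
                                  true ID must occur inside the predicted ID.
    """
    if len(true_ids) != len(predicted_ids):
        raise ValueError("true_ids and predicted_ids must have the same length")
    if not true_ids:
        raise ValueError("at least one ID pair is required")

    if exact_match_required:
        matches = [t == p for t, p in zip(true_ids, predicted_ids)]
    else:
        # Assumption: cast to str so that numeric IDs are handled as well.
        matches = [
            str(t).lower() in str(p).lower()
            for t, p in zip(true_ids, predicted_ids)
        ]
    return sum(matches) / len(matches)


if __name__ == "__main__":
    true_ids = ["Jamaicamide A", "Malyngamide J", "Caffeine"]
    predicted_ids = ["jamaicamide a M+H", "Malyngamide J", "Theobromine"]
    print(identification_accuracy(true_ids, predicted_ids))
    print(identification_accuracy(true_ids, predicted_ids, exact_match_required=True))
```

The substring test is asymmetric: a short true ID can match any longer predicted ID that happens to contain it, so the relaxed mode can only report an accuracy equal to or higher than the exact mode on the same predictions.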

pycompound-0.1.10/src/pycompound.egg-info/PKG-INFO ADDED

@@ -0,0 +1,28 @@
+Metadata-Version: 2.4
+Name: pycompound
+Version: 0.1.10
+Summary: Python package to perform compound identification in mass spectrometry via spectral library matching.
+Author-email: Hunter Dlugas <fy7392@wayne.edu>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/hdlugas/pycompound
+Project-URL: Issues, https://github.com/hdlugas/pycompound/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: matplotlib==3.8.4
+Requires-Dist: numpy==1.26.4
+Requires-Dist: pandas==2.2.2
+Requires-Dist: scipy==1.13.1
+Requires-Dist: pyteomics==4.7.2
+Requires-Dist: netCDF4==1.6.5
+Requires-Dist: lxml>=5.1.0
+Requires-Dist: orjson==3.11.0
+Requires-Dist: shiny==1.4.0
+Requires-Dist: joblib==1.5.2
+Dynamic: license-file
+# PyCompound
+A Python-based tool for spectral library matching, PyCompound is available as a Python package (pycompound) with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.

{pycompound-0.1.8 → pycompound-0.1.10}/src/pycompound.egg-info/SOURCES.txt RENAMED

@@ -1,5 +1,6 @@
 LICENSE
 README.md
+README_PyPI.md
 pyproject.toml
 src/pycompound/build_library.py
 src/pycompound/plot_spectra.py

{pycompound-0.1.8 → pycompound-0.1.10}/tests/test_plot_spectra.py RENAMED

@@ -8,7 +8,7 @@ os.makedirs(f'{Path.cwd()}/plots', exist_ok=True)
 print('\n\ntest #1:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         high_quality_reference_library=True,
         noise_threshold=0.1,
@@ -17,7 +17,7 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #2:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         noise_threshold=0.1,
         similarity_measure='shannon',
@@ -25,7 +25,7 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #3:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         similarity_measure='renyi',
         entropy_dimension=1.2,
@@ -33,7 +33,7 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #4:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         similarity_measure='tsallis',
         entropy_dimension=1.2,
@@ -41,7 +41,7 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #5:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         similarity_measure='tsallis',
         entropy_dimension=1.2,
@@ -49,7 +49,7 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #6:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         wf_intensity=0.8,
         wf_mz=1.1,
@@ -57,21 +57,21 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #7:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         window_size_centroiding=0.1,
         output_path=f'{Path.cwd()}/plots/test7.pdf')
 print('\n\ntest #8:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         window_size_matching=0.25,
         output_path=f'{Path.cwd()}/plots/test8.pdf')
 print('\n\ntest #9:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         spectrum_preprocessing_order='WCM',
         wf_mz=0.8,
@@ -80,14 +80,14 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #10:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         LET_threshold=3,
         output_path=f'{Path.cwd()}/plots/test10.pdf')
 print('\n\ntest #11:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         spectrum_ID1 = 212,
         spectrum_ID2 = 100,
@@ -96,7 +96,7 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #12:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         spectrum_ID1 = 'Jamaicamide A M+H',
         spectrum_ID2 = 'Malyngamide J M+H',
@@ -105,7 +105,7 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #13:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         spectrum_ID1 = 'Jamaicamide A M+H',
         spectrum_ID2 = 'Jamaicamide A M+H',
@@ -114,13 +114,13 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #14:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         output_path=f'{Path.cwd()}/plots/test14.pdf')
 print('\n\ntest #15:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         spectrum_ID1 = 463514,
         spectrum_ID2 = 112312,
@@ -128,40 +128,40 @@ generate_plots_on_NRMS_data(
 print('\n\ntest #17:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         output_path=f'{Path.cwd()}/plots/test17.pdf')
 print('\n\ntest #18:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         y_axis_transformation='none',
         output_path=f'{Path.cwd()}/plots/test18.pdf')
 print('\n\ntest #19:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         y_axis_transformation='log10',
         output_path=f'{Path.cwd()}/plots/test19.pdf')
 print('\n\ntest #20:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         y_axis_transformation='sqrt',
         output_path=f'{Path.cwd()}/plots/test20.pdf')
 print('\n\ntest #21:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         output_path=f'{Path.cwd()}/plots/test_no_wf_normalized_y_axis_no_mz_zoom.pdf')
 print('\n\ntest #22:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         wf_mz=2,
         wf_intensity=0.5,
@@ -169,21 +169,21 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #23:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         y_axis_transformation='log10',
         output_path=f'{Path.cwd()}/plots/test_no_wf_log10_y_axis_no_mz_zoom.pdf')
 print('\n\ntest #24:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         y_axis_transformation='sqrt',
         output_path=f'{Path.cwd()}/plots/test_no_wf_sqrt_y_axis_no_mz_zoom.pdf')
 print('\n\ntest #25:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         mz_min = 400,
         mz_max = 650,
@@ -192,49 +192,49 @@ generate_plots_on_HRMS_data(
 print('\n\ntest #26:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         high_quality_reference_library=False,
         output_path=f'{Path.cwd()}/plots/test_HRMS.pdf')
 print('\n\ntest #27:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         high_quality_reference_library=False,
         output_path=f'{Path.cwd()}/plots/test_NRMS.pdf')
 print('\n\ntest #28:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         similarity_measure='jaccard',
         output_path=f'{Path.cwd()}/plots/test28.pdf')
 print('\n\ntest #28:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         similarity_measure='hamming',
         output_path=f'{Path.cwd()}/plots/test28.pdf')
 print('\n\ntest #29:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         similarity_measure='sokal_sneath',
         output_path=f'{Path.cwd()}/plots/test29.pdf')
 print('\n\ntest #30:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         similarity_measure='simpson',
         output_path=f'{Path.cwd()}/plots/test30.pdf')
 print('\n\ntest #31:')
 generate_plots_on_NRMS_data(
-        query_data=f'{Path.cwd()}/data/gcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/gcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
         similarity_measure='mixture',
         weights={'Cosine':0.5, 'Shannon':0.3, 'Renyi':0.1, 'Tsallis':0.1},
@@ -242,9 +242,42 @@ generate_plots_on_NRMS_data(
 print('\n\ntest #32:')
 generate_plots_on_HRMS_data(
-        query_data=f'{Path.cwd()}/data/lcms_query_library.txt',
+        query_data=f'{Path.cwd()}/data/lcms_query.txt',
         reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
         similarity_measure='mixture',
         weights={'Cosine':0.1, 'Shannon':0.2, 'Renyi':0.3, 'Tsallis':0.4},
         output_path=f'{Path.cwd()}/plots/test32.pdf')
+print('\n\ntest #33:')
+generate_plots_on_HRMS_data(
+        query_data=f'{Path.cwd()}/data/lcms_query.msp',
+        reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
+        high_quality_reference_library=True,
+        noise_threshold=0.1,
+        mz_min=100,
+        output_path=f'{Path.cwd()}/plots/test33.pdf')
+print('\n\ntest #34:')
+generate_plots_on_HRMS_data(
+        query_data=f'{Path.cwd()}/data/lcms_query_tuning.msp',
+        reference_data=f'{Path.cwd()}/data/trimmed_GNPS_reference_library.txt',
+        high_quality_reference_library=True,
+        noise_threshold=0.1,
+        mz_min=100,
+        output_path=f'{Path.cwd()}/plots/test34.pdf')
+print('\n\ntest #35:')
+generate_plots_on_NRMS_data(
+        query_data=f'{Path.cwd()}/data/gcms_query.msp',
+        reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
+        similarity_measure='shannon',
+        weights={'Cosine':0.5, 'Shannon':0.3, 'Renyi':0.1, 'Tsallis':0.1},
+        output_path=f'{Path.cwd()}/plots/test35.pdf')
+print('\n\ntest #36:')
+generate_plots_on_NRMS_data(
+        query_data=f'{Path.cwd()}/data/gcms_query.msp',
+        reference_data=f'{Path.cwd()}/data/trimmed_gcms_reference_library.txt',
+        similarity_measure='cosine',
+        output_path=f'{Path.cwd()}/plots/test36.pdf')
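
The renames above touch the same hard-coded path string in every call. The following is a minimal sketch, not part of the pycompound package, of how a test script could build these paths in one place. The helper names `data_file` and `plot_file` are hypothetical, and the `data/` and `plots/` layout under the working directory is assumed from the calls shown in the diff.

```python
from pathlib import Path

# Hypothetical helpers (not part of pycompound): resolve data and plot paths
# in one place so that a file rename becomes a single-line change.
DATA_DIR = Path.cwd() / "data"
PLOTS_DIR = Path.cwd() / "plots"


def data_file(name: str) -> str:
    """Return the path of a file inside the data directory as a string."""
    return str(DATA_DIR / name)


def plot_file(name: str) -> str:
    """Return the path of a file inside the plots directory as a string."""
    return str(PLOTS_DIR / name)


# Keyword arguments mirroring test #36 from the diff above.
test36_kwargs = dict(
    query_data=data_file("gcms_query.msp"),
    reference_data=data_file("trimmed_gcms_reference_library.txt"),
    similarity_measure="cosine",
    output_path=plot_file("test36.pdf"),
)
```

With pycompound installed, the call would then read `generate_plots_on_NRMS_data(**test36_kwargs)`, assuming the function accepts plain string paths as it does in the diff.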
