PyPI - pycompound - Versions diffs - 0.0.1__tar.gz → 0.0.6__tar.gz - Mend

pycompound 0.0.1.tar.gz → 0.0.6.tar.gz

Files changed (30)

{pycompound-0.0.1 → pycompound-0.0.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pycompound
-Version: 0.0.1
+Version: 0.0.6
 Summary: Python package to perform compound identification in mass spectrometry via spectral library matching.
 Author-email: Hunter Dlugas <fy7392@wayne.edu>
 License-Expression: MIT
@@ -18,9 +18,9 @@ Requires-Dist: scipy==1.13.1
 Requires-Dist: pyteomics==4.7.2
 Requires-Dist: netCDF4==1.6.5
 Requires-Dist: lxml>=5.1.0
-Requires-Dist: shiny==1.4.0
+Requires-Dist: orjson==3.11.0
+Requires-Dist: joblib==1.5.2
 Dynamic: license-file
 # PyCompound
-A Python-based tool for spectral library matching, PyCompound is available as a Python package with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine and three entropy-based similarity measures. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For documentation and usage instructions for PyCompound, please refer to the GitHub repository [https://github.com/hdlugas/pycompound](https://github.com/hdlugas/pycompound).
+A Python-based tool for spectral library matching, PyCompound is available as a Python package with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.

pycompound-0.0.6/README.md ADDED

@@ -0,0 +1,2 @@
+# PyCompound
+A Python-based tool for spectral library matching, PyCompound is available as a Python package with a command-line interface (CLI) available and as a GUI application build with Python/Shiny. It performs spectral library matching to identify chemical compounds, offering a range of spectrum preprocessing transformations and similarity measures, including Cosine, three entropy-based similarity measures, and a plethora of binary similarity measures. PyCompound also includes functionality to tune parameters commonly used in a compound identification workflow given a query library of spectra with known ID. PyCompound supports both high-resolution mass spectrometry (HRMS) data (e.g., LC-MS/MS) and nominal-resolution mass spectrometry (NRMS) data (e.g., GC-MS). For the full documentation, see the GitHub repository https://github.com/hdlugas/pycompound.
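
The README mentions tuning parameters against a query library of spectra with known IDs. The package's own tuning routine is not shown in this diff, so the following is only a generic sketch of the idea: an exhaustive grid search that keeps the parameter combination with the highest identification accuracy. The helper `grid_search` and the callable `accuracy_fn` are hypothetical; only the parameter names `wf_mz` and `wf_int` are taken from this diff.

```python
from itertools import product

def grid_search(param_grid, accuracy_fn):
    """Evaluate every parameter combination and return (best_params, best_accuracy).

    param_grid maps a parameter name to the list of values to try.
    accuracy_fn is a hypothetical callable that takes the parameters as keyword
    arguments and returns the fraction of query spectra identified correctly.
    """
    names = list(param_grid)
    best_params, best_accuracy = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        accuracy = accuracy_fn(**params)
        # Keep the first combination that reaches the highest accuracy seen so far.
        if accuracy > best_accuracy:
            best_params, best_accuracy = params, accuracy
    return best_params, best_accuracy
```

In practice `accuracy_fn` would run spectral library matching with the given parameters and compare the top match of each query spectrum against its known ID.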

{pycompound-0.0.1 → pycompound-0.0.6}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "pycompound"
-version = "0.0.1"
+version = "0.0.6"
 authors = [
   { name="Hunter Dlugas", email="fy7392@wayne.edu" },
 ]
@@ -26,7 +26,8 @@ dependencies = [
     "pyteomics==4.7.2",
     "netCDF4==1.6.5",
     "lxml>=5.1.0",
-    "shiny==1.4.0"
+    "orjson==3.11.0",
+    "joblib==1.5.2"
 ]
 [project.urls]
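
This hunk drops `shiny` from the dependencies, while the `src/app.py` added in the same release imports it. An environment built only from these pins may therefore lack the module the GUI needs. A minimal sketch for checking which modules are importable before launching the app; the helper name `missing_modules` is hypothetical and not part of pycompound:

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    # Modules touched by the dependency change in this hunk, plus shiny for the GUI.
    print(missing_modules(["orjson", "joblib", "shiny"]))
```

An empty list means every listed module can be imported. Any name that is printed would have to be installed separately.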

pycompound-0.0.6/src/app.py ADDED

@@ -0,0 +1,364 @@
+from shiny import App, ui, reactive, render
+from pycompound_fy7392.spec_lib_matching import run_spec_lib_matching_on_HRMS_data
+from pycompound_fy7392.spec_lib_matching import run_spec_lib_matching_on_NRMS_data
+from pycompound_fy7392.spec_lib_matching import tune_params_on_HRMS_data
+from pycompound_fy7392.spec_lib_matching import tune_params_on_NRMS_data
+from pycompound_fy7392.plot_spectra import generate_plots_on_HRMS_data
+from pycompound_fy7392.plot_spectra import generate_plots_on_NRMS_data
+from pathlib import Path
+import subprocess
+import traceback
+import asyncio
+import io
+import matplotlib.pyplot as plt
+def plot_spectra_ui(platform: str):
+    # Base inputs common to all platforms
+    base_inputs = [
+        ui.input_file("query_data", "Upload query dataset (mgf, mzML, cdf, msp, or csv):"),
+        ui.input_file("reference_data", "Upload reference dataset (mgf, mzML, cdf, msp, or csv):"),
+        ui.input_text("spectrum_ID1", "Input ID of one spectrum to be plotted:", None),
+        ui.input_text("spectrum_ID2", "Input ID of another spectrum to be plotted:", None),
+        ui.input_select("similarity_measure", "Select similarity measure:", ["cosine","shannon","renyi","tsallis","mixture","jaccard","dice","3w_jaccard","sokal_sneath","binary_cosine","mountford","mcconnaughey","driver_kroeber","simpson","braun_banquet","fager_mcgowan","kulczynski","intersection","hamming","hellinger"]),
+        ui.input_select(
+            "high_quality_reference_library",
+            "Indicate whether the reference library is considered high quality. "
+            "If True, filtering and noise removal are only applied to the query spectra.",
+            [False, True],
+        ),
+    ]
+    # Extra inputs depending on platform
+    if platform == "HRMS":
+        extra_inputs = [
+            ui.input_text(
+                "spectrum_preprocessing_order",
+                "Sequence of characters for preprocessing order (C, F, M, N, L, W). M must be included, C before M if used.",
+                "FCNMWL",
+            ),
+            ui.input_numeric("window_size_centroiding", "Centroiding window-size:", 0.5),
+            ui.input_numeric("window_size_matching", "Matching window-size:", 0.5),
+        ]
+    else:
+        extra_inputs = [
+            ui.input_text(
+                "spectrum_preprocessing_order",
+                "Sequence of characters for preprocessing order (F, N, L, W).",
+                "FNLW",
+            )
+        ]
+    # Numeric inputs
+    numeric_inputs = [
+        ui.input_numeric("mz_min", "Minimum m/z for filtering:", 0),
+        ui.input_numeric("mz_max", "Maximum m/z for filtering:", 99999999),
+        ui.input_numeric("int_min", "Minimum intensity for filtering:", 0),
+        ui.input_numeric("int_max", "Maximum intensity for filtering:", 999999999),
+        ui.input_numeric("noise_threshold", "Noise removal threshold:", 0.0),
+        ui.input_numeric("wf_mz", "Mass/charge weight factor:", 0.0),
+        ui.input_numeric("wf_int", "Intensity weight factor:", 1.0),
+        ui.input_numeric("LET_threshold", "Low-entropy threshold:", 0.0),
+        ui.input_numeric("entropy_dimension", "Entropy dimension (Renyi/Tsallis only):", 1.1),
+    ]
+    # Y-axis transformation select input
+    select_input = ui.input_select(
+        "y_axis_transformation",
+        "Transformation to apply to intensity axis:",
+        ["normalized", "none", "log10", "sqrt"],
+    )
+    # Run and Back buttons
+    run_button = ui.input_action_button("run_btn", "Run", style="font-size:16px; padding:15px 30px; width:200px; height:80px")
+    back_button = ui.input_action_button("back", "Back to main menu", style="font-size:16px; padding:15px 30px; width:200px; height:80px")
+    #print(len(extra_inputs))
+    # Layout base_inputs and extra_inputs in columns
+    if platform == "HRMS":
+        inputs_columns = ui.layout_columns(
+            ui.div(base_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div([base_inputs[5:6], *extra_inputs], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div(numeric_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div([numeric_inputs[5:10], select_input], style="display:flex; flex-direction:column; gap:10px;"),
+            col_widths=(3, 3, 3, 3),
+        )
+    elif platform == "NRMS":
+        inputs_columns = ui.layout_columns(
+            ui.div(base_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div([base_inputs[5:6], *extra_inputs], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div(numeric_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div([numeric_inputs[5:10], select_input], style="display:flex; flex-direction:column; gap:10px;"),
+            col_widths=(3, 3, 3, 3),
+        )
+    # Combine everything
+    return ui.div(
+        ui.TagList(
+            ui.h2("Plot Spectra"),
+            inputs_columns,
+            run_button,
+            back_button
+        ),
+    )
+def run_spec_lib_matching_ui(platform: str):
+    # Base inputs common to all platforms
+    base_inputs = [
+        ui.input_file("query_data", "Upload query dataset (mgf, mzML, cdf, msp, or csv):"),
+        ui.input_file("reference_data", "Upload reference dataset (mgf, mzML, cdf, msp, or csv):"),
+        ui.input_select("similarity_measure", "Select similarity measure:", ["cosine","shannon","renyi","tsallis","mixture","jaccard","dice","3w_jaccard","sokal_sneath","binary_cosine","mountford","mcconnaughey","driver_kroeber","simpson","braun_banquet","fager_mcgowan","kulczynski","intersection","hamming","hellinger"]),
+        ui.input_select(
+            "high_quality_reference_library",
+            "Indicate whether the reference library is considered high quality. "
+            "If True, filtering and noise removal are only applied to the query spectra.",
+            [False, True],
+        ),
+    ]
+    # Extra inputs depending on platform
+    if platform == "HRMS":
+        extra_inputs = [
+            ui.input_text(
+                "spectrum_preprocessing_order",
+                "Sequence of characters for preprocessing order (C, F, M, N, L, W). M must be included, C before M if used.",
+                "FCNMWL",
+            ),
+            ui.input_numeric("window_size_centroiding", "Centroiding window-size:", 0.5),
+            ui.input_numeric("window_size_matching", "Matching window-size:", 0.5),
+        ]
+    else:
+        extra_inputs = [
+            ui.input_text(
+                "spectrum_preprocessing_order",
+                "Sequence of characters for preprocessing order (F, N, L, W).",
+                "FNLW",
+            )
+        ]
+    # Numeric inputs
+    numeric_inputs = [
+        ui.input_numeric("mz_min", "Minimum m/z for filtering:", 0),
+        ui.input_numeric("mz_max", "Maximum m/z for filtering:", 99999999),
+        ui.input_numeric("int_min", "Minimum intensity for filtering:", 0),
+        ui.input_numeric("int_max", "Maximum intensity for filtering:", 999999999),
+        ui.input_numeric("noise_threshold", "Noise removal threshold:", 0.0),
+        ui.input_numeric("wf_mz", "Mass/charge weight factor:", 0.0),
+        ui.input_numeric("wf_int", "Intensity weight factor:", 1.0),
+        ui.input_numeric("LET_threshold", "Low-entropy threshold:", 0.0),
+        ui.input_numeric("entropy_dimension", "Entropy dimension (Renyi/Tsallis only):", 1.1),
+        ui.input_numeric("n_top_matches_to_save", "Number of top matches to save:", 1),
+    ]
+    # Run and Back buttons
+    run_button = ui.input_action_button("run_btn", "Run", style="font-size:16px; padding:15px 30px; width:200px; height:80px")
+    back_button = ui.input_action_button("back", "Back to main menu", style="font-size:16px; padding:15px 30px; width:200px; height:80px")
+    #print(len(extra_inputs))
+    # Layout base_inputs and extra_inputs in columns
+    if platform == "HRMS":
+        inputs_columns = ui.layout_columns(
+            ui.div(base_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div([base_inputs[5:6], *extra_inputs], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div(numeric_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div(numeric_inputs[5:10], style="display:flex; flex-direction:column; gap:10px;"),
+            col_widths=(3, 3, 3, 3),
+        )
+    elif platform == "NRMS":
+        inputs_columns = ui.layout_columns(
+            ui.div(base_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div([base_inputs[5:6], *extra_inputs], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div(numeric_inputs[0:5], style="display:flex; flex-direction:column; gap:10px;"),
+            ui.div(numeric_inputs[5:10], style="display:flex; flex-direction:column; gap:10px;"),
+            col_widths=(3, 3, 3, 3),
+        )
+    # Combine everything
+    return ui.div(
+        ui.TagList(
+            ui.h2("Run Spectral Library Matching"),
+            inputs_columns,
+            run_button,
+            back_button
+        ),
+    )
+app_ui = ui.page_fluid(
+    ui.output_ui("main_ui"),
+    ui.output_text("status_output")
+)
+def server(input, output, session):
+    # Track which page to show
+    current_page = reactive.Value("main_menu")
+    # Track button clicks
+    plot_clicks = reactive.Value(0)
+    match_clicks = reactive.Value(0)
+    back_clicks = reactive.Value(0)
+    run_status = reactive.Value("Waiting for input...")
+    @reactive.Effect
+    def _():
+        # Main menu buttons
+        if input.plot_spectra() > plot_clicks.get():
+            current_page.set("plot_spectra")
+            plot_clicks.set(input.plot_spectra())
+        elif input.run_spec_lib_matching() > match_clicks.get():
+            current_page.set("run_spec_lib_matching")
+            match_clicks.set(input.run_spec_lib_matching())
+        elif hasattr(input, "back") and input.back() > back_clicks.get():
+            current_page.set("main_menu")
+            back_clicks.set(input.back())
+    @render.image
+    def image():
+        from pathlib import Path
+        dir = Path(__file__).resolve().parent
+        img: ImgData = {"src": str(dir / "www/emblem.png"), "width": "320px", "height": "250px"}
+        return img
+    @output
+    @render.ui
+    def main_ui():
+        if current_page() == "main_menu":
+            return ui.page_fluid(
+                ui.h2("Main Menu"),
+                ui.div(
+                    ui.output_image("image"),
+                    style=(
+                        "position:fixed; top:0; left:50%; transform:translateX(-50%); "
+                        "z-index:1000; text-align:center; padding:10px; background-color:white;"
+                    ),
+                ),
+                ui.div(
+                    "Overview:",
+                    style="text-align:left; font-size:24px; font-weight:bold; margin-top:350px"
+                ),
+                ui.div(
+                    "PyCompound is a Python-based tool designed for performing spectral library matching on either high-resolution mass spectrometry data (HRMS) or low-resolution mass spectrometry data (NRMS). PyCompound offers a range of spectrum preprocessing transformations and similarity measures. These spectrum preprocessing transformations include filtering on mass/charge and/or intensity values, weight factor transformation, low-entropy transformation, centroiding, noise removal, and matching. The available similarity measures include the canonical Cosine similarity measure, three entropy-based similarity measures, and a variety of binary similarity measures: Jaccard, Dice, 3W-Jaccard, Sokal-Sneath, Binary Cosine, Mountford, McConnaughey, Driver-Kroeber, Simpson, Braun-Banquet, Fager-McGowan, Kulczynski, Intersection, Hamming, and Hellinger.",
+                    style="margin-top:10px; text-align:left; font-size:16px; font-weight:500"
+                ),
+                ui.div(
+                    "Select options:",
+                    style="margin-top:30px; text-align:left; font-size:24px; font-weight:bold"
+                ),
+                ui.div(
+                    ui.input_radio_buttons("chromatography_platform", "Specify chromatography platform:", ["HRMS","NRMS"]),
+                    style="font-size:18px; margin-top:10px; max-width:none"
+                ),
+                ui.input_action_button("plot_spectra", "Plot two spectra before and after preprocessing transformations.", style="font-size:18px; padding:20px 40px; width:550px; height:100px; margin-top:10px; margin-right:50px"),
+                ui.input_action_button("run_spec_lib_matching", "Run spectral library matching to perform compound identification on a query library of spectra.", style="font-size:18px; padding:20px 40px; width:550px; height:100px; margin-top:10px; margin-right:50px"),
+                ui.div(
+                    "References:",
+                    style="margin-top:35px; text-align:left; font-size:24px; font-weight:bold"
+                ),
+                ui.div(
+                    "If Shannon Entropy similarity measure, low-entropy transformation, or centroiding are used:",
+                    style="margin-top:10px; text-align:left; font-size:14px; font-weight:500"
+                ),
+                ui.div(
+                    ui.HTML(
+                        'Li, Y., Kind, T., Folz, J. et al. (2021) Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification. Nat Methods, 18 1524–1531. <a href="https://doi.org/10.1038/s41592-021-01331-z" target="_blank">https://doi.org/10.1038/s41592-021-01331-z</a>.'
+                    ),
+                    style="text-align:left; font-size:14px; font-weight:500"
+                ),
+                ui.div(
+                    "If Tsallis Entropy similarity measure or series of preprocessing transformations are used:",
+                    style="margin-top:10px; text-align:left; font-size:14px; font-weight:500"
+                ),
+                ui.div(
+                    ui.HTML(
+                        'Dlugas, H., Zhang, X., Kim, S. (2025) Comparative analysis of continuous similarity measures for compound identification in mass spectrometry-based metabolomics. Chemometrics and Intelligent Laboratory Systems, 263, 105417. <a href="https://doi.org/10.1016/j.chemolab.2025.105417", target="_blank">https://doi.org/10.1016/j.chemolab.2025.105417</a>.'
+                    ),
+                    style="text-align:left; font-size:14px; font-weight:500"
+                ),
+                ui.div(
+                    "If binary similarity measures are used:",
+                    style="margin-top:10px; text-align:left; font-size:14px; font-weight:500"
+                ),
+                ui.div(
+                    ui.HTML(
+                        'Kim, S., Kato, I., & Zhang, X. (2022). Comparative Analysis of Binary Similarity Measures for Compound Identification in Mass Spectrometry-Based Metabolomics. Metabolites, 12(8), 694. <a href="https://doi.org/10.3390/metabo12080694" target="_blank">https://doi.org/10.3390/metabo12080694</a>.'
+                    ),
+                    style="text-align:left; font-size:14px; font-weight:500"
+                ),
+                ui.div(
+                    "If weight factor transformation is used:",
+                    style="margin-top:10px; text-align:left; font-size:14px; font-weight:500"
+                ),
+                ui.div(
+                    ui.HTML(
+                        'Kim, S., Koo, I., Wei, X., & Zhang, X. (2012). A method of finding optimal weight factors for compound identification in gas chromatography-mass spectrometry. Bioinformatics, 28(8), 1158-1163. <a href="https://doi.org/10.1093/bioinformatics/bts083" target="_blank">https://doi.org/10.1093/bioinformatics/bts083</a>.'
+                    ),
+                    style="margin-bottom:40px; text-align:left; font-size:14px; font-weight:500"
+                ),
+            )
+        elif current_page() == "plot_spectra":
+            return plot_spectra_ui(input.chromatography_platform())
+        elif current_page() == "run_spec_lib_matching":
+            return run_spec_lib_matching_ui(input.chromatography_platform())
+    @reactive.effect
+    @reactive.event(input.run_btn)
+    def _():
+        if current_page() == "plot_spectra":
+            if len(input.spectrum_ID1())==0:
+                spectrum_ID1 = None
+            else:
+                spectrum_ID1 = input.spectrum_ID1()
+            if len(input.spectrum_ID2())==0:
+                spectrum_ID2 = None
+            else:
+                spectrum_ID2 = input.spectrum_ID2()
+            if input.chromatography_platform() == "HRMS":
+                try:
+                    fig = generate_plots_on_HRMS_data(query_data=input.query_data()[0]['datapath'], reference_data=input.reference_data()[0]['datapath'], spectrum_ID1=spectrum_ID1, spectrum_ID2=spectrum_ID2, similarity_measure=input.similarity_measure(), spectrum_preprocessing_order=input.spectrum_preprocessing_order(), high_quality_reference_library=input.high_quality_reference_library(), mz_min=input.mz_min(), mz_max=input.mz_max(), int_min=input.int_min(), int_max=input.int_max(), window_size_centroiding=input.window_size_centroiding(), window_size_matching=input.window_size_matching(), noise_threshold=input.noise_threshold(), wf_mz=input.wf_mz(), wf_intensity=input.wf_int(), LET_threshold=input.LET_threshold(), entropy_dimension=input.entropy_dimension(), y_axis_transformation=input.y_axis_transformation(), return_plot=True)
+                    plt.show()
+                    run_status.set(f"✅  Plotting has finished.")
+                except Exception as e:
+                    run_status.set(f"❌ Error: {traceback.format_exc()}")
+            elif input.chromatography_platform() == "NRMS":
+                try:
+                    generate_plots_on_NRMS_data(query_data=input.query_data()[0]['datapath'], reference_data=input.reference_data()[0]['datapath'], spectrum_ID1=spectrum_ID1, spectrum_ID2=spectrum_ID2, similarity_measure=input.similarity_measure(), spectrum_preprocessing_order=input.spectrum_preprocessing_order(), high_quality_reference_library=input.high_quality_reference_library(), mz_min=input.mz_min(), mz_max=input.mz_max(), int_min=input.int_min(), int_max=input.int_max(), noise_threshold=input.noise_threshold(), wf_mz=input.wf_mz(), wf_intensity=input.wf_int(), LET_threshold=input.LET_threshold(), entropy_dimension=input.entropy_dimension(), y_axis_transformation=input.y_axis_transformation(), return_plot=True)
+                    plt.show()
+                    run_status.set(f"✅  Plotting has finished.")
+                except Exception as e:
+                    run_status.set(f"❌ Error: {traceback.format_exc()}")
+        elif current_page() == 'run_spec_lib_matching':
+            if input.chromatography_platform() == 'HRMS':
+                try:
+                    run_spec_lib_matching_on_HRMS_data(query_data=input.query_data()[0]['datapath'], reference_data=input.reference_data()[0]['datapath'], likely_reference_ids=None, similarity_measure=input.similarity_measure(), spectrum_preprocessing_order=input.spectrum_preprocessing_order(), high_quality_reference_library=input.high_quality_reference_library(), mz_min=input.mz_min(), mz_max=input.mz_max(), int_min=input.int_min(), int_max=input.int_max(), window_size_centroiding=input.window_size_centroiding(), window_size_matching=input.window_size_matching(), noise_threshold=input.noise_threshold(), wf_mz=input.wf_mz(), wf_intensity=input.wf_int(), LET_threshold=input.LET_threshold(), entropy_dimension=input.entropy_dimension(), n_top_matches_to_save=input.n_top_matches_to_save(), print_id_results=False, output_identification=f'{Path.cwd()}/output_identification.csv', output_similarity_scores=f'{Path.cwd()}/')
+                    run_status.set(f"✅  Spectral library matching has finished and results were written to {Path.cwd()}/output_similarity_scores.csv.")
+                except Exception as e:
+                    run_status.set(f"❌ Error: {traceback.format_exc()}")
+            elif input.chromatography_platform() == 'NRMS':
+                try:
+                    run_spec_lib_matching_on_NRMS_data(query_data=input.query_data()[0]['datapath'], reference_data=input.reference_data()[0]['datapath'], likely_reference_ids=None, similarity_measure=input.similarity_measure(), spectrum_preprocessing_order=input.spectrum_preprocessing_order(), high_quality_reference_library=input.high_quality_reference_library(), mz_min=input.mz_min(), mz_max=input.mz_max(), int_min=input.int_min(), int_max=input.int_max(), noise_threshold=input.noise_threshold(), wf_mz=input.wf_mz(), wf_intensity=input.wf_int(), LET_threshold=input.LET_threshold(), entropy_dimension=input.entropy_dimension(), n_top_matches_to_save=input.n_top_matches_to_save(), print_id_results=False, output_identification=f'{Path.cwd()}/output_identification.csv', output_similarity_scores=f'{Path.cwd()}/output_similarity_scores.csv')
+                    run_status.set(f"✅  Spectral library matching has finished and results were written to {Path.cwd()}/")
+                except Exception as e:
+                    run_status.set(f"❌ Error: {traceback.format_exc()}")
+    @render.text
+    def status_output():
+        return run_status.get()
+app = App(app_ui, server)
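
In the file above, the `high_quality_reference_library` select is built from `[False, True]` and its value is passed straight to the plotting and matching functions. Select inputs in Shiny generally deliver the chosen value as a string, and a non-empty string such as "False" is truthy in Python, so an explicit conversion may be needed before the flag is used. A minimal sketch, assuming the input yields the strings "True" or "False"; the helper `parse_bool_choice` is hypothetical and not part of pycompound or Shiny:

```python
def parse_bool_choice(value):
    """Convert a select-input value such as "True" or "False" to a real bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"expected 'True' or 'False', got {value!r}")
```

Under that assumption, the server code would call `parse_bool_choice(input.high_quality_reference_library())` instead of passing the raw value.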

{pycompound-0.0.1/src/pycompound_fy7392 → pycompound-0.0.6/src/pycompound}/plot_spectra.py RENAMED

@@ -9,7 +9,7 @@ import sys
 import matplotlib.pyplot as plt
-def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_ID1=None, spectrum_ID2=None, similarity_measure='cosine', spectrum_preprocessing_order='FCNMWL', high_quality_reference_library=False, mz_min=0, mz_max=9999999, int_min=0, int_max=9999999, window_size_centroiding=0.5, window_size_matching=0.5, noise_threshold=0.0, wf_mz=0.0, wf_intensity=1.0, LET_threshold=0.0, entropy_dimension=1.1, y_axis_transformation='normalized', output_path=None):
+def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_ID1=None, spectrum_ID2=None, similarity_measure='cosine', weights={'Cosine':0.25,'Shannon':0.25,'Renyi':0.25,'Tsallis':0.25}, spectrum_preprocessing_order='FCNMWL', high_quality_reference_library=False, mz_min=0, mz_max=9999999, int_min=0, int_max=9999999, window_size_centroiding=0.5, window_size_matching=0.5, noise_threshold=0.0, wf_mz=0.0, wf_intensity=1.0, LET_threshold=0.0, entropy_dimension=1.1, y_axis_transformation='normalized', output_path=None, return_plot=False):
     '''
     plots two spectra against each other before and after preprocessing transformations for high-resolution mass spectrometry data
@@ -17,7 +17,8 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
     --reference_data: mgf, mzML, or csv file of the reference mass spectra. If csv file, each row should correspond to a mass spectrum, the left-most column should contain in identifier (i.e. the CAS registry number or the compound name), and the remaining column should correspond to a single mass/charge ratio. Mandatory argument.
     --spectrum_ID1: ID of one spectrum to be plotted. Default is first spectrum in the query library. Optional argument.
     --spectrum_ID2: ID of another spectrum to be plotted. Default is first spectrum in the reference library. Optional argument.
-    --similarity_measure: \'cosine\', \'shannon\', \'renyi\', and \'tsallis\'. Default: cosine.
+    --similarity_measure: cosine, shannon, renyi, tsallis, mixture, jaccard, dice, 3w_jaccard, sokal_sneath, binary_cosine, mountford, mcconnaughey, driver_kroeber, simpson, braun_banquet, fager_mcgowan, kulczynski, intersection, hamming, hellinger. Default: cosine.
+    --weights: dict of weights to give to each non-binary similarity measure (i.e. cosine, shannon, renyi, and tsallis) when the mixture similarity measure is specified. Default: 0.25 for each of the four non-binary similarity measures.
     --spectrum_preprocessing_order: The spectrum preprocessing transformations and the order in which they are to be applied. Note that these transformations are applied prior to computing similarity scores. Format must be a string with 2-6 characters chosen from C, F, M, N, L, W representing centroiding, filtering based on mass/charge and intensity values, matching, noise removal, low-entropy trannsformation, and weight-factor-transformation, respectively. For example, if \'WCM\' is passed, then each spectrum will undergo a weight factor transformation, then centroiding, and then matching. Note that if an argument is passed, then \'M\' must be contained in the argument, since matching is a required preprocessing step in spectral library matching of HRMS data. Furthermore, \'C\' must be performed before matching since centroiding can change the number of ion fragments in a given spectrum. Default: FCNMWL')
     --high_quality_reference_library: True/False flag indicating whether the reference library is considered to be of high quality. If True, then the spectrum preprocessing transformations of filtering and noise removal are performed only on the query spectrum/spectra. If False, all spectrum preprocessing transformations specified will be applied to both the query and reference spectra. Default: False')
     --mz_min: Remove all peaks with mass/charge value less than mz_min in each spectrum. Default: 0
@@ -95,8 +96,8 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
         print(f'Error: spectrum_preprocessing_order must contain only \'C\', \'F\', \'M\', \'N\', \'L\', \'W\'.')
         sys.exit()
-    if similarity_measure not in ['cosine','shannon','renyi','tsallis']:
-        print('\nError: similarity_measure must be either \'cosine\', \'shannon\', \'renyi\', or \'tsallis\'')
+    if similarity_measure not in ['cosine','shannon','renyi','tsallis','mixture','jaccard','dice','3w_jaccard','sokal_sneath','binary_cosine','mountford','mcconnaughey','driver_kroeber','simpson','braun_banquet','fager_mcgowan','kulczynski','interection','hamming','hellinger']:
+        print('\nError: similarity_measure must be either cosine, shannon, renyi, tsallis, mixture, jaccard, dice, 3w_jaccard, sokal_sneath, binary_cosine, mountford, mcconnaughey, driver_kroeber, simpson, braun_banquet, fager_mcgowan, kulczynski, interection, hamming, or hellinger.')
         sys.exit()
     if isinstance(int_min,int) is True:
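
The new validation list in this hunk spells one entry `interection`, whereas the select inputs in `src/app.py` offer `intersection`. If the check runs as shown, choosing `intersection` would reach the `sys.exit()` branch. One way to avoid this kind of drift is to keep the allowed names in a single shared constant. A minimal sketch; the names `SIMILARITY_MEASURES` and `validate_similarity_measure` are hypothetical and not part of pycompound:

```python
# Single source of truth for the measure names listed in the app.py select inputs.
SIMILARITY_MEASURES = (
    "cosine", "shannon", "renyi", "tsallis", "mixture",
    "jaccard", "dice", "3w_jaccard", "sokal_sneath", "binary_cosine",
    "mountford", "mcconnaughey", "driver_kroeber", "simpson", "braun_banquet",
    "fager_mcgowan", "kulczynski", "intersection", "hamming", "hellinger",
)

def validate_similarity_measure(name):
    """Return name unchanged if it is a known measure, otherwise raise ValueError."""
    if name not in SIMILARITY_MEASURES:
        allowed = ", ".join(SIMILARITY_MEASURES)
        raise ValueError(f"similarity_measure must be one of: {allowed}")
    return name
```

Raising an exception instead of calling `sys.exit()` is a design choice of this sketch. It lets a caller such as the Shiny server report the problem without terminating the process.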
@@ -157,10 +158,6 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
         output_path = f'{Path.cwd()}/spectrum1_{spectrum_ID1}_spectrum2_{spectrum_ID2}.pdf'
-    #print(spectrum_ID1)
-    #print(spectrum_ID2)
-    #print(unique_query_ids)
-    #print(unique_reference_ids)
     if spectrum_ID1 in unique_query_ids and spectrum_ID2 in unique_query_ids:
         query_idx = unique_query_ids.index(spectrum_ID1)
         reference_idx = unique_query_ids.index(spectrum_ID2)
@@ -266,17 +263,7 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
     # if there is at least one non-zero intensity ion fragment in either spectra, compute their similarity
     if np.sum(q_ints) != 0 and np.sum(r_ints) != 0 and q_spec.shape[0] > 1 and r_spec.shape[1] > 1:
-        if similarity_measure == 'cosine':
-            similarity_score = S_cos(q_ints, r_ints)
-        else:
-            q_ints = normalize(q_ints, method = normalization_method)
-            r_ints = normalize(r_ints, method = normalization_method)
-            if similarity_measure == 'shannon':
-                similarity_score = S_shannon(q_ints, r_ints)
-            elif similarity_measure == 'renyi':
-                similarity_score = S_renyi(q_ints, r_ints, q)
-            elif similarity_measure == 'tsallis':
-                similarity_score = S_tsallis(q_ints, r_ints, q)
+        similarity_score = get_similarity(similarity_measure, q_ints, r_ints, weights, entropy_dimension)
     else:
         similarity_score = 0
@@ -333,16 +320,20 @@ def generate_plots_on_HRMS_data(query_data=None, reference_data=None, spectrum_I
     fig.text(0.45, 0.06, f'Low-Entropy Threshold: {LET_threshold}', fontsize=7)
     plt.savefig(output_path, format='pdf')
+    if return_plot == True:
+        return plt
-def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_ID1=None, spectrum_ID2=None, similarity_measure='cosine', spectrum_preprocessing_order='FNLW', high_quality_reference_library=False, mz_min=0, mz_max=9999999, int_min=0, int_max=9999999, noise_threshold=0.0, wf_mz=0.0, wf_intensity=1.0, LET_threshold=0.0, entropy_dimension=1.1, y_axis_transformation='normalized', output_path=None):
+def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_ID1=None, spectrum_ID2=None, similarity_measure='cosine', weights={'Cosine':0.25,'Shannon':0.25,'Renyi':0.25,'Tsallis':0.25}, spectrum_preprocessing_order='FNLW', high_quality_reference_library=False, mz_min=0, mz_max=9999999, int_min=0, int_max=9999999, noise_threshold=0.0, wf_mz=0.0, wf_intensity=1.0, LET_threshold=0.0, entropy_dimension=1.1, y_axis_transformation='normalized', output_path=None, return_plot=False):
     '''
     plots two spectra against each other before and after preprocessing transformations for nominal-resolution mass spectrometry data
     --query_data: cdf or csv file of query mass spectrum/spectra to be identified. If csv file, each row should correspond to a mass spectrum, the left-most column should contain an identifier, and each of the other columns should correspond to a single mass/charge ratio. Mandatory argument.
     --reference_data: cdf or csv file of the reference mass spectra. If csv file, each row should correspond to a mass spectrum, the left-most column should contain an identifier (e.g. the CAS registry number or the compound name), and each of the remaining columns should correspond to a single mass/charge ratio. Mandatory argument.
-    --similarity_measure: \'cosine\', \'shannon\', \'renyi\', and \'tsallis\'. Default: cosine.
+    --similarity_measure: cosine, shannon, renyi, tsallis, mixture, jaccard, dice, 3w_jaccard, sokal_sneath, binary_cosine, mountford, mcconnaughey, driver_kroeber, simpson, braun_banquet, fager_mcgowan, kulczynski, intersection, hamming, hellinger. Default: cosine.
+    --weights: dict of weights to give to each non-binary similarity measure (i.e. cosine, shannon, renyi, and tsallis) when the mixture similarity measure is specified. Default: 0.25 for each of the four non-binary similarity measures.
     --spectrum_preprocessing_order: The spectrum preprocessing transformations and the order in which they are to be applied. Note that these transformations are applied prior to computing similarity scores. Format must be a string with 2-4 characters chosen from F, N, L, W representing filtering based on mass/charge and intensity values, noise removal, low-entropy transformation, and weight factor transformation, respectively. For example, if \'WN\' is passed, then each spectrum will undergo a weight factor transformation and then noise removal. Default: FNLW
     --high_quality_reference_library: True/False flag indicating whether the reference library is considered to be of high quality. If True, then the spectrum preprocessing transformations of filtering and noise removal are performed only on the query spectrum/spectra. If False, all spectrum preprocessing transformations specified will be applied to both the query and reference spectra. Default: False
     --mz_min: Remove all peaks with mass/charge value less than mz_min in each spectrum. Default: 0
@@ -409,8 +400,8 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
         print(f'Error: spectrum_preprocessing_order must contain only \'F\', \'N\', \'W\', \'L\'.')
         sys.exit()
-    if similarity_measure not in ['cosine','shannon','renyi','tsallis']:
-        print('\nError: similarity_measure must be either \'cosine\', \'shannon\', \'renyi\', or \'tsallis\'')
+    if similarity_measure not in ['cosine','shannon','renyi','tsallis','mixture','jaccard','dice','3w_jaccard','sokal_sneath','binary_cosine','mountford','mcconnaughey','driver_kroeber','simpson','braun_banquet','fager_mcgowan','kulczynski','intersection','hamming','hellinger']:
+        print('\nError: similarity_measure must be one of cosine, shannon, renyi, tsallis, mixture, jaccard, dice, 3w_jaccard, sokal_sneath, binary_cosine, mountford, mcconnaughey, driver_kroeber, simpson, braun_banquet, fager_mcgowan, kulczynski, intersection, hamming, or hellinger.')
         sys.exit()
     if isinstance(int_min,int) is True:
@@ -564,20 +555,9 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
             if high_quality_reference_library == False:
                 r_spec = filter_spec_gcms(r_spec, mz_min = mz_min, mz_max = mz_max, int_min = int_min, int_max = int_max)
-    # compute similarity score; if the spectra contain one point at most, their similarity is considered to be 0
+    # compute similarity score; if the spectra contain at most one point, their similarity is considered to be 0
     if q_spec.shape[0] > 1:
-        if similarity_measure == 'cosine':
-            similarity_score = S_cos(q_spec[:,1], r_spec[:,1])
-        else:
-            q_spec[:,1] = normalize(q_spec[:,1], method = normalization_method)
-            r_spec[:,1] = normalize(r_spec[:,1], method = normalization_method)
-            if similarity_measure == 'shannon':
-                similarity_score = S_shannon(q_spec[:,1].astype('float'), r_spec[:,1].astype('float'))
-            elif similarity_measure == 'renyi':
-                similarity_score = S_renyi(q_spec[:,1], r_spec[:,1], q)
-            elif similarity_measure == 'tsallis':
-                similarity_score = S_tsallis(q_spec[:,1], r_spec[:,1], q)
+        similarity_score = get_similarity(similarity_measure, q_spec[:,1], r_spec[:,1], weights, entropy_dimension)
     else:
         similarity_score = 0
@@ -633,4 +613,6 @@ def generate_plots_on_NRMS_data(query_data=None, reference_data=None, spectrum_I
     fig.text(0.45, 0.06, f'Low-Entropy Threshold: {LET_threshold}', fontsize=7)
     plt.savefig(output_path, format='pdf')
+    if return_plot == True:
+        return fig
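
The hunks above replace the per-measure if/elif chains with a single call to get_similarity(similarity_measure, q_ints, r_ints, weights, entropy_dimension). The body of get_similarity is not part of this diff, so the following is only a minimal sketch of how such a dispatcher could be organized. Everything in it is an assumption made for illustration: the names `sketch_similarity`, `_cosine`, `_shannon`, and `_jaccard` are hypothetical, the Shannon score uses the commonly published entropy-similarity form, binarization is taken to mean "intensity > 0", and the mixture score is assumed to be a weight-normalized sum. Only three of the twenty measures are sketched.

```python
import numpy as np

def _cosine(q, r):
    # Cosine similarity of two aligned intensity vectors.
    denom = np.linalg.norm(q) * np.linalg.norm(r)
    return float(np.dot(q, r) / denom) if denom > 0 else 0.0

def _entropy(p):
    # Shannon entropy (natural log) of a probability vector; zero entries are skipped.
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def _shannon(q, r):
    # Entropy-based similarity: 1 - (2*H(merged) - H(q) - H(r)) / ln(4),
    # computed on intensities normalized to sum to 1 (assumed form).
    if q.sum() == 0 or r.sum() == 0:
        return 0.0
    q, r = q / q.sum(), r / r.sum()
    merged = (q + r) / 2
    return 1.0 - (2 * _entropy(merged) - _entropy(q) - _entropy(r)) / np.log(4)

def _jaccard(q, r):
    # Binary measure: peaks are reduced to presence/absence before comparison.
    a, b = q > 0, r > 0
    union = np.sum(a | b)
    return float(np.sum(a & b) / union) if union > 0 else 0.0

NON_BINARY = {'cosine': _cosine, 'shannon': _shannon}
BINARY = {'jaccard': _jaccard}

def sketch_similarity(measure, q_ints, r_ints, weights=None):
    # Hypothetical stand-in for a get_similarity-style dispatcher.
    q = np.asarray(q_ints, dtype=float)
    r = np.asarray(r_ints, dtype=float)
    if measure == 'mixture':
        # Assumed convention: weighted average over the non-binary measures.
        weights = weights or {name: 1.0 for name in NON_BINARY}
        unknown = [k for k in weights if k.lower() not in NON_BINARY]
        if unknown:
            raise ValueError(f'unsupported mixture components: {unknown}')
        total = sum(weights.values())
        return sum(w * NON_BINARY[k.lower()](q, r) for k, w in weights.items()) / total
    table = {**NON_BINARY, **BINARY}
    if measure not in table:
        raise ValueError(f'unknown similarity_measure: {measure!r}')
    return table[measure](q, r)
```

Looking measures up in a dictionary means the set of accepted names and the set of implemented functions cannot drift apart, which is the kind of mismatch a hand-maintained validation list is prone to.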

{pycompound-0.0.1/src/pycompound_fy7392 → pycompound-0.0.6/src/pycompound}/plot_spectra_CLI.py RENAMED Viewed

@@ -3,6 +3,7 @@ from pycompound_fy7392.plot_spectra import generate_plots_on_HRMS_data
 from pycompound_fy7392.plot_spectra import generate_plots_on_NRMS_data
 import pandas as pd
 import argparse
+import json
 from pathlib import Path
 import sys
@@ -13,7 +14,8 @@ parser.add_argument('--query_data', type=str, metavar='\b', help='CSV file of qu
 parser.add_argument('--reference_data', type=str, metavar='\b', help='CSV file of the reference mass spectra. Each row should correspond to a mass spectrum, the left-most column should contain an identifier (e.g. the CAS registry number or the compound name), and each of the remaining columns should correspond to a single mass/charge ratio. Mandatory argument.')
 parser.add_argument('--spectrum_ID1', type=str, metavar='\b', help='The identifier of the query spectrum to be plotted. Default: first query spectrum in query_data.')
 parser.add_argument('--spectrum_ID2', type=str, metavar='\b', help='The identifier of the reference spectrum to be plotted. Default: first reference spectrum in reference_data.')
-parser.add_argument('--similarity_measure', type=str, default='cosine', metavar='\b', help='Similarity measure: options are \'cosine\', \'shannon\', \'renyi\', and \'tsallis\'. Default: cosine.')
+parser.add_argument('--similarity_measure', type=str, default='cosine', metavar='\b', help='Similarity measure: options are cosine, shannon, renyi, tsallis, mixture, jaccard, dice, 3w_jaccard, sokal_sneath, binary_cosine, mountford, mcconnaughey, driver_kroeber, simpson, braun_banquet, fager_mcgowan, kulczynski, intersection, hamming, or hellinger. Default: cosine.')
+parser.add_argument('--weights', type=json.loads, default={'Cosine':0.25,'Shannon':0.25,'Renyi':0.25,'Tsallis':0.25}, metavar='\b', help='dict of weights to give to each non-binary similarity measure (i.e. cosine, shannon, renyi, and tsallis) when the mixture similarity measure is specified. Default: 0.25 for each of the four non-binary similarity measures.')
 parser.add_argument('--chromatography_platform', type=str, metavar='\b', help='Chromatography platform: options are \'HRMS\' and \'NRMS\'. Mandatory argument.')
 parser.add_argument('--spectrum_preprocessing_order', type=str, metavar='\b', help='The LC-MS/MS spectrum preprocessing transformations and the order in which they are to be applied. Note that these transformations are applied prior to computing similarity scores. Format must be a string with 2-6 characters chosen from C, F, M, N, L, W representing centroiding, filtering based on mass/charge and intensity values, matching, noise removal, low-entropy transformation, and weight factor transformation, respectively. For example, if \'WCM\' is passed, then each spectrum will undergo a weight factor transformation, then centroiding, and then matching. Note that if an argument is passed, then \'M\' must be contained in the argument, since matching is a required preprocessing step in spectral library matching of LC-MS/MS data. Furthermore, \'C\' must be performed before matching since centroiding can change the number of ion fragments in a given spectrum. Default: FCNMWL for HRMS, FNLW for NRMS')
 parser.add_argument('--high_quality_reference_library', type=str, default='False', metavar='\b', help='True/False flag indicating whether the reference library is considered to be of high quality. If True, then the spectrum preprocessing transformations of filtering and noise removal are performed only on the query spectrum/spectra. If False, all spectrum preprocessing transformations specified will be applied to both the query and reference spectra. Default: False')
@@ -43,9 +45,8 @@ else:
 if args.chromatography_platform == 'HRMS':
-    generate_plots_on_HRMS_data(query_data=args.query_data, reference_data=args.reference_data, spectrum_ID1=args.spectrum_ID1, spectrum_ID2=args.spectrum_ID2, similarity_measure=args.similarity_measure, spectrum_preprocessing_order=spectrum_preprocessing_order, high_quality_reference_library=args.high_quality_reference_library, mz_min=args.mz_min, mz_max=args.mz_max, int_min=args.int_min, int_max=args.int_max, window_size_centroiding=args.window_size_centroiding, window_size_matching=args.window_size_matching, noise_threshold=args.noise_threshold, wf_mz=args.wf_mz, wf_intensity=args.wf_intensity, LET_threshold=args.LET_threshold, entropy_dimension=args.entropy_dimension, y_axis_transformation=args.y_axis_transformation, output_path=args.output_path)
+    generate_plots_on_HRMS_data(query_data=args.query_data, reference_data=args.reference_data, spectrum_ID1=args.spectrum_ID1, spectrum_ID2=args.spectrum_ID2, similarity_measure=args.similarity_measure, weights=args.weights, spectrum_preprocessing_order=spectrum_preprocessing_order, high_quality_reference_library=args.high_quality_reference_library, mz_min=args.mz_min, mz_max=args.mz_max, int_min=args.int_min, int_max=args.int_max, window_size_centroiding=args.window_size_centroiding, window_size_matching=args.window_size_matching, noise_threshold=args.noise_threshold, wf_mz=args.wf_mz, wf_intensity=args.wf_intensity, LET_threshold=args.LET_threshold, entropy_dimension=args.entropy_dimension, y_axis_transformation=args.y_axis_transformation, output_path=args.output_path)
 elif args.chromatography_platform == 'NRMS':
-    generate_plots_on_NRMS_data(query_data=args.query_data, reference_data=args.reference_data, spectrum_ID1=args.spectrum_ID1, spectrum_ID2=args.spectrum_ID2, similarity_measure=args.similarity_measure, spectrum_preprocessing_order=spectrum_preprocessing_order, high_quality_reference_library=args.high_quality_reference_library, mz_min=args.mz_min, mz_max=args.mz_max, int_min=args.int_min, int_max=args.int_max, noise_threshold=args.noise_threshold, wf_mz=args.wf_mz, wf_intensity=args.wf_intensity, LET_threshold=args.LET_threshold, entropy_dimension=args.entropy_dimension, y_axis_transformation=args.y_axis_transformation, output_path=args.output_path)
+    generate_plots_on_NRMS_data(query_data=args.query_data, reference_data=args.reference_data, spectrum_ID1=args.spectrum_ID1, spectrum_ID2=args.spectrum_ID2, similarity_measure=args.similarity_measure, weights=args.weights, spectrum_preprocessing_order=spectrum_preprocessing_order, high_quality_reference_library=args.high_quality_reference_library, mz_min=args.mz_min, mz_max=args.mz_max, int_min=args.int_min, int_max=args.int_max, noise_threshold=args.noise_threshold, wf_mz=args.wf_mz, wf_intensity=args.wf_intensity, LET_threshold=args.LET_threshold, entropy_dimension=args.entropy_dimension, y_axis_transformation=args.y_axis_transformation, output_path=args.output_path)
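
The new --weights option is declared with type=json.loads, so the value on the command line has to be a JSON object, for example --weights '{"Cosine":0.4,"Shannon":0.2,"Renyi":0.2,"Tsallis":0.2}' (JSON needs double quotes around keys). The sketch below shows one way this pattern could be hardened with a small validating wrapper. It is not taken from the package: `parse_weights`, `build_parser`, and `DEFAULT_WEIGHTS` are hypothetical names, and the requirement that exactly the four keys be present is an assumption.

```python
import argparse
import json

# Same default as in the diff; the key capitalization follows the diff.
DEFAULT_WEIGHTS = {'Cosine': 0.25, 'Shannon': 0.25, 'Renyi': 0.25, 'Tsallis': 0.25}

def parse_weights(text):
    # Hypothetical argparse "type" callable: decode JSON, then validate the keys.
    try:
        weights = json.loads(text)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f'--weights must be valid JSON: {exc}') from exc
    if not isinstance(weights, dict) or set(weights) != set(DEFAULT_WEIGHTS):
        raise argparse.ArgumentTypeError(
            f'--weights must be a JSON object with keys {sorted(DEFAULT_WEIGHTS)}')
    # float() raises TypeError/ValueError on bad values; argparse reports both as usage errors.
    return {key: float(value) for key, value in weights.items()}

def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--similarity_measure', type=str, default='cosine')
    # A non-string default is used as-is; the type callable only runs on command-line strings.
    parser.add_argument('--weights', type=parse_weights, default=dict(DEFAULT_WEIGHTS))
    return parser

# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ['--similarity_measure', 'mixture',
     '--weights', '{"Cosine": 0.4, "Shannon": 0.2, "Renyi": 0.2, "Tsallis": 0.2}'])
```

With plain type=json.loads, a malformed value is still rejected by argparse, but a well-formed value with the wrong keys would only fail later inside the similarity code; validating at parse time moves that error to the command line.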
