PyPI - masster - Versions diffs - 0.3.9__tar.gz → 0.3.10__tar.gz - Mend

masster 0.3.9 (tar.gz) → 0.3.10 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of masster might be problematic.

Files changed (80)

{masster-0.3.9 → masster-0.3.10}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: masster
-Version: 0.3.9
+Version: 0.3.10
 Summary: Mass spectrometry data analysis package
 Project-URL: homepage, https://github.com/zamboni-lab/masster
 Project-URL: repository, https://github.com/zamboni-lab/masster

{masster-0.3.9 → masster-0.3.10}/pyproject.toml

@@ -1,7 +1,7 @@
 [project]
 name = "masster"
-version = "0.3.9"
+version = "0.3.10"
 description = "Mass spectrometry data analysis package"
 authors = [
     { name = "Zamboni Lab" }

{masster-0.3.9 → masster-0.3.10}/src/masster/sample/defaults/find_features_def.py

@@ -17,102 +17,100 @@ from typing import Any
 @dataclass
 class find_features_defaults:
+    """Configuration defaults for the feature-finding pipeline.
+    This dataclass centralizes parameters used by the `find_features()` routine
+    (mass-trace detection, elution-peak detection and feature assembly).  The
+    purpose of this docstring is to explain the role and impact of the main
+    parameters users commonly tune.
+    Main parameters (what they mean, units and guidance):
+    - chrom_fwhm (float, seconds):
+        Expected chromatographic peak full-width at half-maximum (FWHM) in
+        seconds. This value informs the peak detection algorithms about the
+        typical temporal width of chromatographic peaks. It is used for
+        smoothing, window sizes when searching for local maxima and when
+        calculating RT-based tolerances. Use a value that matches your LC
+        method: smaller values for sharp, fast chromatography and larger values
+        for broader peaks. Default: 1.0 s.
+    - noise (float, intensity units):
+        Intensity threshold used to filter out low-intensity signals before
+        mass-trace and peak detection. Points with intensity below this
+        threshold are treated as background and typically ignored. Raising
+        `noise` reduces false positives from background fluctuations but may
+        remove low-abundance true peaks; lowering it increases sensitivity at
+        the cost of more noise. Default: 200.0 (instrument-dependent).
+    - chrom_peak_snr (float, unitless):
+        Minimum signal-to-noise ratio required to accept a detected
+        chromatographic peak. SNR is typically computed as peak height
+        (or crest intensity) divided by an estimate of local noise. A higher
+        `chrom_peak_snr` makes detection stricter (fewer false positives),
+        while a lower value makes detection more permissive (more low-SNR
+        peaks accepted). Typical values range from ~3 (relaxed) to >10
+        (stringent). Default: 10.0.
+    Use these three parameters together to balance sensitivity and
+    specificity for your dataset: tune `chrom_fwhm` to match chromatographic
+    peak shapes, set `noise` to a conservative background level for your
+    instrument, then adjust `chrom_peak_snr` to control how aggressively
+    peaks are accepted or rejected.
+    The class also contains many other configuration options (mass tolerances,
+    isotope handling, post-processing and reporting flags). See individual
+    parameter metadata (`_param_metadata`) for allowed ranges and types.
     """
-    Parameters for mass spectrometry feature detection using OpenMS algorithms.
-    This class consolidates all parameters used in the find_features() method including
-    mass trace detection (MTD), elution peak detection (EPD), and feature finding (FFM).
-    It provides type checking, validation, and comprehensive parameter descriptions.
-    Mass Trace Detection (MTD) Parameters:
-        tol_ppm: Mass error tolerance in parts-per-million for mass trace detection.
-        noise: Noise threshold intensity to filter out low-intensity signals.
-        min_trace_length_multiplier: Multiplier for minimum trace length (multiplied by chrom_fwhm_min).
-        trace_termination_outliers: Number of outliers allowed before terminating a trace.
-    Elution Peak Detection (EPD) Parameters:
-        chrom_fwhm: Full width at half maximum for chromatographic peak shape.
-        chrom_fwhm_min: Minimum FWHM for chromatographic peak detection.
-        chrom_peak_snr: Signal-to-noise ratio required for chromatographic peaks.
-        masstrace_snr_filtering: Whether to apply SNR filtering to mass traces.
-        mz_scoring_13C: Whether to enable scoring of 13C isotopic patterns.
-        width_filtering: Width filtering method for mass traces.
-    Feature Finding (FFM) Parameters:
-        remove_single_traces: Whether to remove mass traces without satellite isotopic traces.
-        report_convex_hulls: Whether to report convex hulls for features.
-        report_summed_ints: Whether to report summed intensities.
-        report_chromatograms: Whether to report chromatograms.
-    Post-processing Parameters:
-        deisotope: Whether to perform deisotoping of detected features.
-        deisotope_mz_tol: m/z tolerance for deisotoping.
-        deisotope_rt_tol_factor: RT tolerance factor for deisotoping (multiplied by chrom_fwhm_min/4).
-        eic_mz_tol: m/z tolerance for EIC extraction.
-        eic_rt_tol: RT tolerance for EIC extraction.
-    Available Methods:
-        - validate(param_name, value): Validate a single parameter value
-        - validate_all(): Validate all parameters at once
-        - to_dict(): Convert parameters to dictionary
-        - set_from_dict(param_dict, validate=True): Update multiple parameters from dict
-        - set(param_name, value, validate=True): Set parameter value with validation
-        - get(param_name): Get parameter value
-        - get_description(param_name): Get parameter description
-        - get_info(param_name): Get full parameter metadata
-        - list_parameters(): Get list of all parameter names
-    """
+    # Main params
+    noise: float = 200.0
+    chrom_fwhm: float = 1.0
+    chrom_peak_snr: float = 10.0
     # Mass Trace Detection parameters
     tol_ppm: float = 30.0
-    noise: float = 200.0
-    min_trace_length_multiplier: float = 1.0
-    trace_termination_outliers: int = 2
+    reestimate_mt_sd: bool = True
+    quant_method: str = "area"
+    trace_termination_criterion: str = "outlier"
+    trace_termination_outliers: int = 5
+    min_sample_rate: float = 0.5
+    min_trace_length: float = 0.5
+    min_trace_length_multiplier: float = 0.2
+    max_trace_length: float = -1.0
     # Elution Peak Detection parameters
-    chrom_fwhm: float = 1.0
-    chrom_fwhm_min: float = 0.5
-    chrom_peak_snr: float = 10.0
-    masstrace_snr_filtering: bool = False
-    mz_scoring_13C: bool = False
+    enabled: bool = True
+    chrom_fwhm_min: float = 0.2
+    chrom_fwhm_max: float = 60.0
     width_filtering: str = "fixed"
+    masstrace_snr_filtering: bool = False
     # Feature Finding parameters
+    local_rt_range: float = 1.0
+    local_mz_range: float = 5.0
+    charge_lower_bound: int = 0
+    charge_upper_bound: int = 5
+    report_smoothed_intensities: bool = False
     remove_single_traces: bool = False
     report_convex_hulls: bool = True
     report_summed_ints: bool = False
     report_chromatograms: bool = True
+    mz_scoring_13C: bool = False
+    threads: int = 1
+    no_progress: bool = False
+    debug: bool = False
     # Post-processing parameters
     deisotope: bool = True
     deisotope_mz_tol: float = 0.02
-    deisotope_rt_tol_factor: float = 0.25  # Will be multiplied by chrom_fwhm_min/4
-    eic_mz_tol: float = 0.01
-    eic_rt_tol: float = 10.0
+    deisotope_rt_tol_factor: float = 0.5  # Will be multiplied by chrom_fwhm
-    # Additional OpenMS FeatureFinderMetabo parameters
-    threads: int = 1
-    no_progress: bool = False
-    debug: bool = False
-    min_sample_rate: float = 0.5
-    min_trace_length: int = 5
-    min_fwhm: float = 1.0
-    max_fwhm: float = 60.0
-    # Additional Mass Trace Detection parameters
-    trace_termination_criterion: str = "outlier"
-    reestimate_mt_sd: bool = True
-    quant_method: str = "area"
-    # Additional Elution Peak Detection parameters
-    enabled: bool = True
-    # Additional Feature Finding parameters
-    local_rt_range: float = 10.0
-    local_mz_range: float = 6.5
-    charge_lower_bound: int = 1
-    charge_upper_bound: int = 3
-    report_smoothed_intensities: bool = False
+    # chrom extraction parameters
     # Parameter metadata for validation and description
     _param_metadata: dict[str, dict[str, Any]] = field(
@@ -132,8 +130,8 @@ class find_features_defaults:
             "min_trace_length_multiplier": {
                 "dtype": float,
                 "description": "Multiplier for minimum trace length calculation (multiplied by chrom_fwhm_min)",
-                "min_value": 1.0,
-                "max_value": 10.0,
+                "min_value": 0.1,
+                "max_value": 2.0,
             },
             "trace_termination_outliers": {
                 "dtype": int,
@@ -204,18 +202,6 @@ class find_features_defaults:
                 "min_value": 0.1,
                 "max_value": 2.0,
             },
-            "eic_mz_tol": {
-                "dtype": float,
-                "description": "m/z tolerance for EIC extraction (Da)",
-                "min_value": 0.001,
-                "max_value": 0.1,
-            },
-            "eic_rt_tol": {
-                "dtype": float,
-                "description": "RT tolerance for EIC extraction (seconds)",
-                "min_value": 1.0,
-                "max_value": 60.0,
-            },
             "threads": {
                 "dtype": int,
                 "description": "Number of threads to use for parallel processing",
@@ -242,13 +228,13 @@ class find_features_defaults:
                 "min_value": 2,
                 "max_value": 100,
             },
-            "min_fwhm": {
+'''            "min_fwhm": {
                 "dtype": float,
                 "description": "Minimum full width at half maximum for peaks (seconds)",
                 "min_value": 0.1,
                 "max_value": 10.0,
-            },
-            "max_fwhm": {
+            },'''
+            "chrom_fwhm_max": {
                 "dtype": float,
                 "description": "Maximum full width at half maximum for peaks (seconds)",
                 "min_value": 1.0,

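The hunks above change the default of `min_trace_length_multiplier` to 0.2 and its allowed range to 0.1 to 2.0; the 0.3.9 range of 1.0 to 10.0 would have rejected the new default. The following is a minimal sketch of how a range check driven by such metadata could work. The class name `FeatureParamsSketch` and the body of `validate` are assumptions made for illustration and are not taken from masster; only the default values and the one range come from the diff.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FeatureParamsSketch:
    """Hypothetical stand-in, NOT masster's find_features_defaults."""

    # Defaults copied from the 0.3.10 side of the diff.
    noise: float = 200.0
    chrom_fwhm: float = 1.0
    chrom_peak_snr: float = 10.0
    min_trace_length_multiplier: float = 0.2

    # Only this one range is taken from the diff; the sketch leaves the
    # other fields unconstrained on purpose.
    _param_metadata: dict[str, dict[str, Any]] = field(
        default_factory=lambda: {
            "min_trace_length_multiplier": {
                "dtype": float,
                "min_value": 0.1,
                "max_value": 2.0,
            },
        },
        repr=False,
    )

    def validate(self, name: str, value: Any) -> bool:
        """Return True if value fits the dtype and range recorded for name."""
        meta = self._param_metadata.get(name)
        if meta is None:
            # Assumed behaviour: known fields without metadata always pass.
            return hasattr(self, name)
        if not isinstance(value, meta["dtype"]):
            return False
        low = meta.get("min_value", float("-inf"))
        high = meta.get("max_value", float("inf"))
        return low <= value <= high


params = FeatureParamsSketch()
default_ok = params.validate(
    "min_trace_length_multiplier", params.min_trace_length_multiplier
)
too_small_ok = params.validate("min_trace_length_multiplier", 0.05)
print(default_ok, too_small_ok)
```

Keeping the range next to the default in one place makes it easier to notice when a changed default falls outside the previously allowed interval.
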
{masster-0.3.9 → masster-0.3.10}/src/masster/sample/defaults/sample_def.py

@@ -53,6 +53,9 @@ class sample_defaults:
     centroid_prominence: int = -1
     max_points_per_spectrum: int = 50000
     dia_window: float | None = None
+    eic_mz_tol: float = 0.01
+    eic_rt_tol: float = 10.0
     _param_metadata: dict[str, dict[str, Any]] = field(
         default_factory=lambda: {
@@ -163,6 +166,18 @@ class sample_defaults:
                 "default": None,
                 "min_value": 0.0,
             },
+            "eic_mz_tol": {
+                "dtype": float,
+                "description": "m/z tolerance for EIC extraction (Da)",
+                "min_value": 0.001,
+                "max_value": 1.0,
+            },
+            "eic_rt_tol": {
+                "dtype": float,
+                "description": "RT tolerance for EIC extraction (seconds)",
+                "min_value": 0.2,
+                "max_value": 60.0,
+            },
         },
         repr=False,
     )
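
The two parameters moved into `sample_defaults` are described only as an m/z tolerance in Da and an RT tolerance in seconds for EIC extraction. As a rough illustration, the sketch below assumes both act as symmetric half-widths around a target m/z and retention time; that interpretation, the function name `extract_eic`, and the toy data are assumptions, not masster's implementation.

```python
def extract_eic(points, mz, rt, eic_mz_tol=0.01, eic_rt_tol=10.0):
    """Return (rt, intensity) pairs inside both windows, sorted by RT.

    points is an iterable of (rt_seconds, mz, intensity) tuples.
    ASSUMPTION: each tolerance is a symmetric half-width around the target.
    """
    return sorted(
        (p_rt, p_inty)
        for p_rt, p_mz, p_inty in points
        if abs(p_mz - mz) <= eic_mz_tol and abs(p_rt - rt) <= eic_rt_tol
    )


# Toy data, invented for this example.
points = [
    (100.0, 200.000, 4000.0),  # inside both windows
    (95.0, 200.005, 1500.0),   # inside both windows
    (100.0, 200.050, 900.0),   # m/z off by 0.05 Da: excluded
    (120.0, 200.000, 700.0),   # RT off by 20 s: excluded
]
eic = extract_eic(points, mz=200.0, rt=100.0)
print(eic)
```
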

{masster-0.3.9 → masster-0.3.10}/src/masster/sample/lib.py

@@ -421,14 +421,14 @@ def save_lib_mgf(
             # trim spectrum 2 Da lower and 10 Da higher than precursor m/z
             spec = spec.mz_trim(mz_min=row["mz"] - 2.0, mz_max=row["mz"] + 10.0)
-            filename: str = os.path.basename(self.file_path)
+            file_basename: str = os.path.basename(self.file_path)
             mslevel = 1 if spec.ms_level is None else spec.ms_level
             activation = None
             energy = None
             kineticenergy = None
             if mslevel > 1:
-                if "CID" in filename.upper() or "ZTS" in filename.upper():
-                    if "EAD" in filename.upper():
+                if "CID" in file_basename.upper() or "ZTS" in file_basename.upper():
+                    if "EAD" in file_basename.upper():
                         activation = "CID-EAD"
                         # search ([0-9]*KE) in filename.upper() using regex
                         match = re.search(r"(\d+)KE", str(filename.upper()))
@@ -440,14 +440,14 @@ def save_lib_mgf(
                                 kineticenergy = int(match.group(1))
                     else:
                         activation = "CID"
-                elif "EAD" in filename.upper():
+                elif "EAD" in file_basename.upper():
                     activation = "EAD"
                     # search ([0-9]*KE) in filename.upper() using regex
-                    match = re.search(r"(\d+)KE", filename.upper())
+                    match = re.search(r"(\d+)KE", file_basename.upper())
                     if match:
                         kineticenergy = int(match.group(1))
                     else:
-                        match = re.search(r"(\d+)EV", filename.upper())
+                        match = re.search(r"(\d+)EV", file_basename.upper())
                         if match:
                             kineticenergy = int(match.group(1))
                 energy = spec.energy if hasattr(spec, "energy") else None
@@ -515,14 +515,14 @@ def save_lib_mgf(
                                         kineticenergy = int(match.group(1))
                             else:
                                 activation = "CID"
-                        elif "EAD" in filename.upper():
+                        elif "EAD" in file_basename.upper():
                             activation = "EAD"
-                            # search ([0-9]*KE) in filename.upper() using regex
-                            match = re.search(r"(\d+)KE", filename.upper())
+                            # search ([0-9]*KE) in file_basename.upper() using regex
+                            match = re.search(r"(\d+)KE", file_basename.upper())
                             if match:
                                 kineticenergy = int(match.group(1))
                             else:
-                                match = re.search(r"(\d+)EV", filename.upper())
+                                match = re.search(r"(\d+)EV", file_basename.upper())
                                 if match:
                                     kineticenergy = int(match.group(1))
                             energy = spec.energy if hasattr(spec, "energy") else None
@@ -541,7 +541,7 @@ def save_lib_mgf(
                             "ACTIVATION": activation,
                             "COLLISIONENERGY": energy,
                             "KINETICENERGY": kineticenergy,
-                            "FILENAME": filename,
+                            "FILENAME": file_basename,
                             "SCANS": ms1_scan_uid,
                             "FID": row["feature_uid"],
                             "MSLEVEL": 1 if spec.ms_level is None else spec.ms_level,

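The branches touched in `save_lib_mgf` infer the activation type and a kinetic energy from tokens in the upper-cased base name of the input file. A condensed sketch of that decision logic is given below. The function name `parse_activation` is hypothetical, and the sketch assumes the `CID-EAD` branch uses the same `KE` then `EV` fallback as the `EAD` branch, which the hunk boundaries do not show in full.

```python
import os
import re


def parse_activation(file_path):
    """Return (activation, kinetic_energy) inferred from the file's base name."""
    # Only the base name is inspected, so directory names cannot add tokens.
    name = os.path.basename(file_path).upper()

    def energy_from_name():
        # Prefer an "<digits>KE" token, then fall back to "<digits>EV".
        match = re.search(r"(\d+)KE", name) or re.search(r"(\d+)EV", name)
        return int(match.group(1)) if match else None

    activation = None
    kinetic_energy = None
    if "CID" in name or "ZTS" in name:
        if "EAD" in name:
            activation = "CID-EAD"
            kinetic_energy = energy_from_name()  # assumed fallback, see lead-in
        else:
            activation = "CID"
    elif "EAD" in name:
        activation = "EAD"
        kinetic_energy = energy_from_name()
    return activation, kinetic_energy


# File names below are invented for illustration.
for path in ["/data/run_EAD_12KE.wiff", "run_CID_EAD_10eV.mzML", "blank.mzML"]:
    print(path, parse_activation(path))
```
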
{masster-0.3.9 → masster-0.3.10}/src/masster/sample/plot.py

@@ -519,6 +519,14 @@ def plot_2d(
         # find features with ms2_scans not None  and iso==0
         features_df = feats[feats["ms2_scans"].notnull()]
         # Create feature points with proper sizing method
+        feature_hover_1 = HoverTool(tooltips=[
+            ("rt", "@rt"),
+            ("m/z", "@mz{0.0000}"),
+            ("feature_uid", "@feature_uid"),
+            ("inty", "@inty"),
+            ("quality", "@quality"),
+            ("rt_delta", "@rt_delta"),
+        ])
         feature_points_1 = hv.Points(
             features_df,
             kdims=["rt", "mz"],
@@ -536,11 +544,19 @@ def plot_2d(
             color=color_1,
             marker=marker_type,
             size=size_1,
-            tools=["hover"],
+            tools=[feature_hover_1],
             hooks=hooks,
         )
         # find features without MS2 data
         features_df = feats[feats["ms2_scans"].isnull()]
+        feature_hover_2 = HoverTool(tooltips=[
+            ("rt", "@rt"),
+            ("m/z", "@mz{0.0000}"),
+            ("feature_uid", "@feature_uid"),
+            ("inty", "@inty"),
+            ("quality", "@quality"),
+            ("rt_delta", "@rt_delta"),
+        ])
         feature_points_2 = hv.Points(
             features_df,
             kdims=["rt", "mz"],
@@ -557,7 +573,7 @@ def plot_2d(
             color="red",
             marker=marker_type,
             size=size_2,
-            tools=["hover"],
+            tools=[feature_hover_2],
             hooks=hooks,
         )
@@ -567,6 +583,16 @@ def plot_2d(
             # Convert to pandas for plotting compatibility
             if hasattr(features_df, "to_pandas"):
                 features_df = features_df.to_pandas()
+            feature_hover_iso = HoverTool(tooltips=[
+                ("rt", "@rt"),
+                ("m/z", "@mz{0.0000}"),
+                ("feature_uid", "@feature_uid"),
+                ("inty", "@inty"),
+                ("quality", "@quality"),
+                ("rt_delta", "@rt_delta"),
+                ("iso", "@iso"),
+                ("iso_of", "@iso_of"),
+            ])
             feature_points_iso = hv.Points(
                 features_df,
                 kdims=["rt", "mz"],
@@ -585,7 +611,7 @@ def plot_2d(
                 color="violet",
                 marker=marker_type,
                 size=size_1,
-                tools=["hover"],
+                tools=[feature_hover_iso],
                 hooks=hooks,
             )
     if show_ms2:
@@ -597,6 +623,13 @@ def plot_2d(
         if len(ms2_orphan) > 0:
             # pandalize
             ms2 = ms2_orphan.to_pandas()
+            ms2_hover_3 = HoverTool(tooltips=[
+                ("rt", "@rt"),
+                ("prec_mz", "@prec_mz{0.0000}"),
+                ("index", "@index"),
+                ("inty_tot", "@inty_tot"),
+                ("bl", "@bl"),
+            ])
             feature_points_3 = hv.Points(
                 ms2,
                 kdims=["rt", "prec_mz"],
@@ -606,7 +639,7 @@ def plot_2d(
                 color=color_2,
                 marker="x",
                 size=size_2,
-                tools=["hover"],
+                tools=[ms2_hover_3],
             )
         ms2_linked = self.scans_df.filter(pl.col("ms_level") == 2).filter(
@@ -615,6 +648,13 @@ def plot_2d(
         if len(ms2_linked) > 0:
             # pandalize
             ms2 = ms2_linked.to_pandas()
+            ms2_hover_4 = HoverTool(tooltips=[
+                ("rt", "@rt"),
+                ("prec_mz", "@prec_mz{0.0000}"),
+                ("index", "@index"),
+                ("inty_tot", "@inty_tot"),
+                ("bl", "@bl"),
+            ])
             feature_points_4 = hv.Points(
                 ms2,
                 kdims=["rt", "prec_mz"],
@@ -624,7 +664,7 @@ def plot_2d(
                 color=color_1,
                 marker="x",
                 size=size_2,
-                tools=["hover"],
+                tools=[ms2_hover_4],
             )
     overlay = raster
@@ -1041,6 +1081,18 @@ def plot_2d_oracle(
     feat_df = feats.copy()
     feat_df = feat_df[feat_df["id_level"] == 2]
+    oracle_hover_1 = HoverTool(tooltips=[
+        ("rt", "@rt"),
+        ("m/z", "@mz{0.0000}"),
+        ("feature_uid", "@feature_uid"),
+        ("id_level", "@id_level"),
+        ("id_class", "@id_class"),
+        ("id_label", "@id_label"),
+        ("id_ion", "@id_ion"),
+        ("id_evidence", "@id_evidence"),
+        ("score", "@score"),
+        ("score2", "@score2"),
+    ])
     feature_points_1 = hv.Points(
         feat_df,
         kdims=["rt", "mz"],
@@ -1062,7 +1114,7 @@ def plot_2d_oracle(
         marker="circle",
         size=markersize,
         fill_alpha=1.0,
-        tools=["hover"],
+        tools=[oracle_hover_1],
     )
     # feature_points_2 are all features that have ms2_scans not null and id_level ==1
@@ -1070,6 +1122,15 @@ def plot_2d_oracle(
     feat_df = feats.copy()
     feat_df = feat_df[(feat_df["ms2_scans"].notnull()) & (feat_df["id_level"] == 1)]
     if len(feat_df) > 0:
+        oracle_hover_2 = HoverTool(tooltips=[
+            ("rt", "@rt"),
+            ("m/z", "@mz{0.0000}"),
+            ("feature_uid", "@feature_uid"),
+            ("id_level", "@id_level"),
+            ("id_label", "@id_label"),
+            ("id_ion", "@id_ion"),
+            ("id_class", "@id_class"),
+        ])
         feature_points_2 = hv.Points(
             feat_df,
             kdims=["rt", "mz"],
@@ -1088,7 +1149,7 @@ def plot_2d_oracle(
             marker="circle",
             size=markersize,
             fill_alpha=0.0,
-            tools=["hover"],
+            tools=[oracle_hover_2],
         )
     # feature_points_3 are all features that have ms2_scans null and id_level ==1
@@ -1096,6 +1157,15 @@ def plot_2d_oracle(
     feat_df = feats.copy()
     feat_df = feat_df[(feat_df["ms2_scans"].isnull()) & (feat_df["id_level"] == 1)]
     if len(feat_df) > 0:
+        oracle_hover_3 = HoverTool(tooltips=[
+            ("rt", "@rt"),
+            ("m/z", "@mz{0.0000}"),
+            ("feature_uid", "@feature_uid"),
+            ("id_level", "@id_level"),
+            ("id_label", "@id_label"),
+            ("id_ion", "@id_ion"),
+            ("id_class", "@id_class"),
+        ])
         feature_points_3 = hv.Points(
             feat_df,
             kdims=["rt", "mz"],
@@ -1114,7 +1184,7 @@ def plot_2d_oracle(
             marker="diamond",
             size=markersize,
             fill_alpha=0.0,
-            tools=["hover"],
+            tools=[oracle_hover_3],
         )
     # feature_points_4 are all features that have ms2_scans null and id_level ==0
@@ -1122,6 +1192,12 @@ def plot_2d_oracle(
     feat_df = feats.copy()
     feat_df = feat_df[(feat_df["ms2_scans"].notnull()) & (feat_df["id_level"] < 1)]
     if len(feat_df) > 0:
+        oracle_hover_4 = HoverTool(tooltips=[
+            ("rt", "@rt"),
+            ("m/z", "@mz{0.0000}"),
+            ("feature_uid", "@feature_uid"),
+            ("inty", "@inty"),
+        ])
         feature_points_4 = hv.Points(
             feat_df,
             kdims=["rt", "mz"],
@@ -1132,14 +1208,20 @@ def plot_2d_oracle(
             marker="circle",
             size=markersize,
             fill_alpha=0.0,
-            tools=["hover"],
+            tools=[oracle_hover_4],
         )
-    # feature_points_4 are all features that have ms2_scans null and id_level ==0
+    # feature_points_5 are all features that have ms2_scans null and id_level ==0
     feature_points_5 = None
     feat_df = feats.copy()
     feat_df = feat_df[(feat_df["ms2_scans"].isnull()) & (feat_df["id_level"] < 1)]
     if len(feat_df) > 0:
+        oracle_hover_5 = HoverTool(tooltips=[
+            ("rt", "@rt"),
+            ("m/z", "@mz{0.0000}"),
+            ("feature_uid", "@feature_uid"),
+            ("inty", "@inty"),
+        ])
         feature_points_5 = hv.Points(
             feat_df,
             kdims=["rt", "mz"],
@@ -1150,7 +1232,7 @@ def plot_2d_oracle(
             marker="diamond",
             fill_alpha=0.0,
             size=markersize,
-            tools=["hover"],
+            tools=[oracle_hover_5],
         )
     overlay = raster
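
The `plot.py` hunks replace the generic `"hover"` tool with explicit `HoverTool` instances whose tooltip lists repeat the same `(label, "@column{format}")` pattern. As a sketch only, a small helper could build those lists from column names; `make_tooltips` and `FEATURE_COLUMNS` are hypothetical names that do not appear in masster, and the block stops at building plain tuples so that it runs without Bokeh installed.

```python
def make_tooltips(columns, formats=None, labels=None):
    """Build (label, field-spec) tuples in the shape used by the hunks above.

    formats maps a column to a format suffix such as "{0.0000}";
    labels maps a column to a display label that differs from its name.
    """
    formats = formats or {}
    labels = labels or {}
    return [
        (labels.get(col, col), "@" + col + formats.get(col, ""))
        for col in columns
    ]


# Column names copied from the feature hover tools in the diff.
FEATURE_COLUMNS = ["rt", "mz", "feature_uid", "inty", "quality", "rt_delta"]

feature_tooltips = make_tooltips(
    FEATURE_COLUMNS,
    formats={"mz": "{0.0000}"},
    labels={"mz": "m/z"},
)
iso_tooltips = make_tooltips(
    FEATURE_COLUMNS + ["iso", "iso_of"],
    formats={"mz": "{0.0000}"},
    labels={"mz": "m/z"},
)
print(feature_tooltips)
# With Bokeh available, a list like this can be passed as
# bokeh.models.HoverTool(tooltips=feature_tooltips).
```
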
