msreport 0.0.29__tar.gz → 0.0.31__tar.gz

This diff shows the content of publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. {msreport-0.0.29 → msreport-0.0.31}/PKG-INFO +20 -2
  2. {msreport-0.0.29 → msreport-0.0.31}/README.md +10 -0
  3. {msreport-0.0.29 → msreport-0.0.31}/msreport/__init__.py +1 -1
  4. msreport-0.0.31/msreport/aggregate/__init__.py +10 -0
  5. {msreport-0.0.29 → msreport-0.0.31}/msreport/aggregate/condense.py +9 -0
  6. {msreport-0.0.29 → msreport-0.0.31}/msreport/aggregate/pivot.py +14 -5
  7. {msreport-0.0.29 → msreport-0.0.31}/msreport/aggregate/summarize.py +14 -4
  8. {msreport-0.0.29 → msreport-0.0.31}/msreport/analyze.py +67 -5
  9. {msreport-0.0.29 → msreport-0.0.31}/msreport/export.py +9 -15
  10. {msreport-0.0.29 → msreport-0.0.31}/msreport/fasta.py +9 -2
  11. {msreport-0.0.29 → msreport-0.0.31}/msreport/helper/__init__.py +18 -0
  12. {msreport-0.0.29 → msreport-0.0.31}/msreport/impute.py +18 -10
  13. {msreport-0.0.29 → msreport-0.0.31}/msreport/isobar.py +11 -14
  14. {msreport-0.0.29 → msreport-0.0.31}/msreport/normalize.py +95 -10
  15. {msreport-0.0.29 → msreport-0.0.31}/msreport/peptidoform.py +21 -11
  16. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/__init__.py +3 -3
  17. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/distribution.py +2 -1
  18. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/quality.py +1 -1
  19. {msreport-0.0.29 → msreport-0.0.31}/msreport/qtable.py +44 -20
  20. {msreport-0.0.29 → msreport-0.0.31}/msreport/reader.py +321 -40
  21. {msreport-0.0.29 → msreport-0.0.31}/msreport/rinterface/limma.py +1 -1
  22. {msreport-0.0.29 → msreport-0.0.31}/msreport.egg-info/PKG-INFO +20 -2
  23. {msreport-0.0.29 → msreport-0.0.31}/msreport.egg-info/requires.txt +9 -1
  24. {msreport-0.0.29 → msreport-0.0.31}/pyproject.toml +10 -1
  25. {msreport-0.0.29 → msreport-0.0.31}/tests/test_analyze.py +71 -1
  26. {msreport-0.0.29 → msreport-0.0.31}/tests/test_plot.py +22 -12
  27. {msreport-0.0.29 → msreport-0.0.31}/tests/test_qtable.py +17 -31
  28. msreport-0.0.29/msreport/aggregate/__init__.py +0 -0
  29. {msreport-0.0.29 → msreport-0.0.31}/LICENSE.txt +0 -0
  30. {msreport-0.0.29 → msreport-0.0.31}/msreport/errors.py +0 -0
  31. {msreport-0.0.29 → msreport-0.0.31}/msreport/helper/calc.py +0 -0
  32. {msreport-0.0.29 → msreport-0.0.31}/msreport/helper/maxlfq.py +0 -0
  33. {msreport-0.0.29 → msreport-0.0.31}/msreport/helper/table.py +0 -0
  34. {msreport-0.0.29 → msreport-0.0.31}/msreport/helper/temp.py +0 -0
  35. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/_partial_plots.py +0 -0
  36. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/comparison.py +0 -0
  37. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/multivariate.py +0 -0
  38. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/style.py +0 -0
  39. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/style_sheets/msreport-notebook.mplstyle +0 -0
  40. {msreport-0.0.29 → msreport-0.0.31}/msreport/plot/style_sheets/seaborn-whitegrid.mplstyle +0 -0
  41. {msreport-0.0.29 → msreport-0.0.31}/msreport/rinterface/__init__.py +0 -0
  42. {msreport-0.0.29 → msreport-0.0.31}/msreport/rinterface/rinstaller.py +0 -0
  43. {msreport-0.0.29 → msreport-0.0.31}/msreport/rinterface/rscripts/limma.R +0 -0
  44. {msreport-0.0.29 → msreport-0.0.31}/msreport.egg-info/SOURCES.txt +0 -0
  45. {msreport-0.0.29 → msreport-0.0.31}/msreport.egg-info/dependency_links.txt +0 -0
  46. {msreport-0.0.29 → msreport-0.0.31}/msreport.egg-info/top_level.txt +0 -0
  47. {msreport-0.0.29 → msreport-0.0.31}/setup.cfg +0 -0
  48. {msreport-0.0.29 → msreport-0.0.31}/setup.py +0 -0
  49. {msreport-0.0.29 → msreport-0.0.31}/tests/test_export.py +0 -0
  50. {msreport-0.0.29 → msreport-0.0.31}/tests/test_helper.py +0 -0
  51. {msreport-0.0.29 → msreport-0.0.31}/tests/test_impute.py +0 -0
  52. {msreport-0.0.29 → msreport-0.0.31}/tests/test_isobar.py +0 -0
  53. {msreport-0.0.29 → msreport-0.0.31}/tests/test_maxlfq.py +0 -0
  54. {msreport-0.0.29 → msreport-0.0.31}/tests/test_peptidoform.py +0 -0
{msreport-0.0.29 → msreport-0.0.31}/PKG-INFO
@@ -1,10 +1,11 @@
  Metadata-Version: 2.4
  Name: msreport
- Version: 0.0.29
+ Version: 0.0.31
  Summary: Post processing and analysis of quantitative proteomics data
  Author-email: "David M. Hollenstein" <hollenstein.david@gmail.com>
  License-Expression: Apache-2.0
  Project-URL: homepage, https://github.com/hollenstein/msreport
+ Project-URL: documentation, https://hollenstein.github.io/msreport/
  Project-URL: changelog, https://github.com/hollenstein/msreport/blob/main/CHANGELOG.md
  Keywords: mass spectrometry,proteomics,post processing,data analysis
  Classifier: Development Status :: 4 - Beta
@@ -29,10 +30,17 @@ Requires-Dist: seaborn>=0.12.0
  Requires-Dist: statsmodels>=0.13.2
  Requires-Dist: typing_extensions>=4
  Provides-Extra: r
- Requires-Dist: rpy2!=3.5.13,>=3.5.3; extra == "r"
+ Requires-Dist: rpy2<3.5.13,>=3.5.3; extra == "r"
  Provides-Extra: dev
  Requires-Dist: mypy>=1.15.0; extra == "dev"
  Requires-Dist: pytest>=8.3.5; extra == "dev"
+ Provides-Extra: docs
+ Requires-Dist: mkdocs-awesome-nav>=3.1.2; extra == "docs"
+ Requires-Dist: mkdocs-macros-plugin>=1.3.7; extra == "docs"
+ Requires-Dist: mkdocs-material>=9.6.15; extra == "docs"
+ Requires-Dist: mkdocs-roamlinks-plugin>=0.3.2; extra == "docs"
+ Requires-Dist: mkdocstrings-python>=1.16.12; extra == "docs"
+ Requires-Dist: ruff>=0.12.2; extra == "docs"
  Provides-Extra: test
  Requires-Dist: pytest>=8.3.5; extra == "test"
  Dynamic: license-file
@@ -40,6 +48,7 @@ Dynamic: license-file
  # MsReport
  
  [![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15309090.svg)](https://doi.org/10.5281/zenodo.15309090)
  ![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fhollenstein%2Fmsreport%2Fmain%2Fpyproject.toml)
  [![Run tests](https://github.com/hollenstein/msreport/actions/workflows/run-tests.yml/badge.svg)](https://github.com/hollenstein/msreport/actions/workflows/run-tests.yml)
  
@@ -55,6 +64,7 @@ bottom-up mass spectrometry experiments.
  - [Additional requirements](#additional-requirements)
  - [Optional Dependencies](#optional-dependencies)
  - [Development status](#development-status)
+ - [How to cite](#how-to-cite)
  
  ## What is MsReport?
  
@@ -62,6 +72,8 @@ MsReport is a Python library designed to simplify the post-processing and analys
  
  The library supports importing protein and peptide-level quantification results from MaxQuant, FragPipe, and Spectronaut, as well as post-translational modification (PTM) data from MaxQuant and FragPipe. MsReport provides tools for data annotation, normalization and transformation, statistical testing, and data visualization.
  
+ The [documentation](https://hollenstein.github.io/msreport/) provides an overview of the library's public API.
+
  ### Key features of MsReport
  
  #### Data Import and Standardization
@@ -134,3 +146,9 @@ For example, the R home directory might look like this on Windows: `C:\Program F
  ## Development status
  
  MsReport is a stable and reliable library that has been used on a daily basis for over two years in the Mass Spectrometry Facility at the Max Perutz Labs and the Mass Spectrometry Facility of IMP/IMBA/GMI. While the current interface of MsReport is stable, the library is still under active development, with new features being added regularly. Please note that a major rewrite is planned, which may introduce changes to the API in the future.
+
+ ## How to cite
+
+ If you use MsReport for your research or publications, please include the following citation and consider giving the project a star on GitHub.
+
+ > Hollenstein, D. M., & Hartl, M. (2025). hollenstein/msreport: v0.0.29 (0.0.29). Zenodo. https://doi.org/10.5281/zenodo.15309090
{msreport-0.0.29 → msreport-0.0.31}/README.md
@@ -1,6 +1,7 @@
  # MsReport
  
  [![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
+ [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15309090.svg)](https://doi.org/10.5281/zenodo.15309090)
  ![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fhollenstein%2Fmsreport%2Fmain%2Fpyproject.toml)
  [![Run tests](https://github.com/hollenstein/msreport/actions/workflows/run-tests.yml/badge.svg)](https://github.com/hollenstein/msreport/actions/workflows/run-tests.yml)
  
@@ -16,6 +17,7 @@ bottom-up mass spectrometry experiments.
  - [Additional requirements](#additional-requirements)
  - [Optional Dependencies](#optional-dependencies)
  - [Development status](#development-status)
+ - [How to cite](#how-to-cite)
  
  ## What is MsReport?
  
@@ -23,6 +25,8 @@ MsReport is a Python library designed to simplify the post-processing and analys
  
  The library supports importing protein and peptide-level quantification results from MaxQuant, FragPipe, and Spectronaut, as well as post-translational modification (PTM) data from MaxQuant and FragPipe. MsReport provides tools for data annotation, normalization and transformation, statistical testing, and data visualization.
  
+ The [documentation](https://hollenstein.github.io/msreport/) provides an overview of the library's public API.
+
  ### Key features of MsReport
  
  #### Data Import and Standardization
@@ -95,3 +99,9 @@ For example, the R home directory might look like this on Windows: `C:\Program F
  ## Development status
  
  MsReport is a stable and reliable library that has been used on a daily basis for over two years in the Mass Spectrometry Facility at the Max Perutz Labs and the Mass Spectrometry Facility of IMP/IMBA/GMI. While the current interface of MsReport is stable, the library is still under active development, with new features being added regularly. Please note that a major rewrite is planned, which may introduce changes to the API in the future.
+
+ ## How to cite
+
+ If you use MsReport for your research or publications, please include the following citation and consider giving the project a star on GitHub.
+
+ > Hollenstein, D. M., & Hartl, M. (2025). hollenstein/msreport: v0.0.29 (0.0.29). Zenodo. https://doi.org/10.5281/zenodo.15309090
{msreport-0.0.29 → msreport-0.0.31}/msreport/__init__.py
@@ -8,4 +8,4 @@ from msreport.fasta import import_protein_database
  from msreport.qtable import Qtable
  from msreport.reader import FragPipeReader, MaxQuantReader, SpectronautReader
  
- __version__ = "0.0.29"
+ __version__ = "0.0.31"
msreport-0.0.31/msreport/aggregate/__init__.py (new file)
@@ -0,0 +1,10 @@
+ """A comprehensive set of tools for aggregating and reshaping tabular proteomics data.
+ 
+ The `aggregation` module contains submodules that offer functionalities to transform
+ data from lower levels of abstraction (e.g. ions, peptides) to higher levels (e.g.
+ peptides, proteins, PTMs) through various summarization and condensation techniques.
+ It also includes methods for reshaping tables from "long" to "wide" format, a common
+ prerequisite for aggregation. The MaxLFQ algorithm is integrated for specific
+ quantitative summarizations, enabling users to build customized, higher-level data
+ tables.
+ """
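The package docstring above describes aggregation from lower to higher levels of abstraction. As a rough illustration of that idea, and not code from msreport (the column names are hypothetical), condensing an ion-level table to peptide level with plain pandas might look like this:

```python
# Illustration only: lower-to-higher level aggregation in the spirit of the new
# msreport.aggregate docstring, sketched with plain pandas. Column names are
# hypothetical and not taken from msreport.
import pandas as pd

# Ion-level table: several ions map to the same peptide.
ions = pd.DataFrame({
    "Peptide": ["PEPTIDEA", "PEPTIDEA", "PEPTIDEB"],
    "Intensity S1": [1.0e6, 2.0e6, 3.0e6],
    "Intensity S2": [1.5e6, 2.5e6, 3.5e6],
})

# Condense to peptide level by summing the ion intensities per peptide.
peptides = ions.groupby("Peptide", as_index=False)[["Intensity S1", "Intensity S2"]].sum()
print(peptides)
```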
{msreport-0.0.29 → msreport-0.0.31}/msreport/aggregate/condense.py
@@ -1,3 +1,12 @@
+ """Low-level functions for aggregating numerical and string data.
+ 
+ This module defines fundamental "condenser" functions that operate directly on NumPy
+ arrays. These functions are designed to be applied to groups of data, performing
+ operations such as summing values, finding maximum/minimum, counting or joining unique
+ elements, and calculating abundance profiles. It includes the core implementations for
+ MaxLFQ summation.
+ """
+ 
  import numpy as np
  
  import msreport.helper.maxlfq as MAXLFQ
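The docstring characterizes condensers as functions applied to the values of one group. Two illustrative condensers in that spirit, written against NumPy only and not taken from msreport:

```python
# Illustration only: two hypothetical "condenser" functions in the spirit of the
# new condense.py docstring, i.e. functions applied to a NumPy array that holds
# the values of one group. They are not taken from msreport.
import numpy as np


def sum_finite(values: np.ndarray) -> float:
    """Sums a group's values while ignoring NaN entries."""
    return float(np.nansum(values))


def join_unique_strings(values: np.ndarray, sep: str = ";") -> str:
    """Joins the sorted unique string entries of a group."""
    return sep.join(sorted(set(values.tolist())))


print(sum_finite(np.array([1.0, np.nan, 2.5])))           # 3.5
print(join_unique_strings(np.array(["P1", "P2", "P1"])))  # P1;P2
```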
{msreport-0.0.29 → msreport-0.0.31}/msreport/aggregate/pivot.py
@@ -1,4 +1,12 @@
- from typing import Iterable, Union
+ """Functionalities for reshaping tabular quantitative proteomics data.
+ 
+ This module offers methods to transform data from a "long" format into a "wide" format,
+ which is a common and often necessary step before aggregation or analysis. It supports
+ pivoting data based on specified index and grouping columns, and can handle both
+ quantitative values and annotation columns.
+ """
+ 
+ from typing import Iterable
  
  import pandas as pd
  
@@ -12,11 +20,12 @@ def pivot_table(
      group_by: str,
      annotation_columns: Iterable[str],
      pivoting_columns: Iterable[str],
- ):
+ ) -> pd.DataFrame:
      """Generates a pivoted table in wide format.
  
      Args:
-         table: Dataframe in long format that is used to generate a table in wide format.
+         long_table: Dataframe in long format that is used to generate a table in wide
+             format.
          index: One or multiple column names that are used to group the table for
              pivoting.
          group_by: Column that is used to split the table on its unique entries.
@@ -58,7 +67,7 @@ def pivot_table(
  
  
  def pivot_column(
-     table: pd.DataFrame, index: Union[str, Iterable], group_by: str, values: str
+     table: pd.DataFrame, index: str | Iterable[str], group_by: str, values: str
  ) -> pd.DataFrame:
      """Returns a reshaped dataframe, generated by pivoting the table on one column.
  
@@ -98,7 +107,7 @@ def pivot_column(
  
  
  def join_unique(
-     table: pd.DataFrame, index: str | Iterable[str], values: str
+     table: pd.DataFrame, index: str | Iterable[str], values: str
  ) -> pd.DataFrame:
      """Returns a new dataframe with unique values from a column and grouped by 'index'.
  
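`pivot_column` now takes `str | Iterable[str]` for its `index` argument; its docstring describes pivoting one value column over the entries of a grouping column. A rough pandas equivalent of that reshape, with hypothetical column names (this is not the msreport implementation):

```python
# Rough pandas equivalent of the reshape described by pivot_column's docstring:
# one value column is spread over the unique entries of a grouping column. The
# column names are hypothetical; this is not the msreport implementation.
import pandas as pd

table = pd.DataFrame({
    "Protein": ["P1", "P1", "P2", "P2"],
    "Sample": ["S1", "S2", "S1", "S2"],
    "Intensity": [10.0, 12.0, 7.0, 8.0],
})

# Corresponds to index="Protein", group_by="Sample", values="Intensity".
wide = table.pivot(index="Protein", columns="Sample", values="Intensity").reset_index()
print(wide)
```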
{msreport-0.0.29 → msreport-0.0.31}/msreport/aggregate/summarize.py
@@ -1,4 +1,14 @@
- from typing import Callable, Iterable, Optional, Union
+ """High-level functions for aggregating quantitative proteomics data.
+ 
+ This module offers functions to summarize data from a lower level of abstraction (e.g.
+ ions, peptides) to a higher level (e.g., peptides, proteins, PTMs). It operates directly
+ on pandas DataFrames, allowing users to specify a grouping column and the columns to be
+ summarized. These functions often leverage low-level condenser operations defined in
+ `msreport.aggregate.condense`. It includes specific functions for MaxLFQ summation, as
+ well as general counting, joining, and summing of columns.
+ """
+ 
+ from typing import Callable, Iterable, Optional
  
  import numpy as np
  import pandas as pd
@@ -10,7 +20,7 @@ from msreport.helper import find_sample_columns
  def count_unique(
      table: pd.DataFrame,
      group_by: str,
-     input_column: Union[str, Iterable],
+     input_column: str | Iterable[str],
      output_column: str = "Unique counts",
      is_sorted: bool = False,
  ) -> pd.DataFrame:
@@ -55,7 +65,7 @@ def count_unique(
  def join_unique(
      table: pd.DataFrame,
      group_by: str,
-     input_column: Union[str, Iterable],
+     input_column: str | Iterable[str],
      output_column: str = "Unique values",
      sep: str = ";",
      is_sorted: bool = False,
@@ -215,7 +225,7 @@ def sum_columns_maxlfq(
  def aggregate_unique_groups(
      table: pd.DataFrame,
      group_by: str,
-     columns_to_aggregate: Union[str, Iterable],
+     columns_to_aggregate: str | Iterable[str],
      condenser: Callable,
      is_sorted: bool,
  ) -> tuple[np.ndarray, np.ndarray]:
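`count_unique` and `join_unique` now accept `str | Iterable[str]` for `input_column`. Based only on the parameters visible in this hunk, a call might look like the sketch below; the DataFrame columns are made up, and the exact layout of the returned table is not shown in the diff.

```python
# Hypothetical call based on the count_unique signature visible in this hunk; the
# DataFrame columns are made up and the layout of the returned table is not shown
# in the diff.
import pandas as pd

import msreport.aggregate.summarize as summarize

peptides = pd.DataFrame({
    "Protein": ["P1", "P1", "P2"],
    "Peptide": ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC"],
})

# input_column accepts a single column name or an iterable of names (PEP 604 union).
counts = summarize.count_unique(
    peptides,
    group_by="Protein",
    input_column="Peptide",
    output_column="Unique peptides",
)
```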
{msreport-0.0.29 → msreport-0.0.31}/msreport/analyze.py
@@ -1,12 +1,16 @@
- """The analyze module contains methods for analysing quantification results."""
+ """Tools for post-processing and statistical analysis of `Qtable` data.
  
- from __future__ import annotations
+ All functions in this module take a `Qtable` object and modify its data in place. The
+ module provides functionality for data evaluation, normalization, imputation of missing
+ values, and statistical testing, including integration with R's LIMMA package.
+ """
  
  import warnings
  from typing import Iterable, Optional, Protocol, Sequence
  
  import numpy as np
  import pandas as pd
+ from typing_extensions import Self
  
  import msreport.normalize
  from msreport.errors import OptionalDependencyError
@@ -24,7 +28,7 @@ except OptionalDependencyError as err:
  
  
  class Transformer(Protocol):
-     def fit(self, table: pd.DataFrame) -> Transformer:
+     def fit(self, table: pd.DataFrame) -> Self:
          """Fits the Transformer and returns a fitted Transformer instance."""
  
      def is_fitted(self) -> bool:
@@ -35,7 +39,7 @@ class Transformer(Protocol):
  
  
  class CategoryTransformer(Protocol):
-     def fit(self, table: pd.DataFrame) -> Transformer:
+     def fit(self, table: pd.DataFrame) -> Self:
          """Fits the Transformer and returns a fitted Transformer instance."""
  
      def is_fitted(self) -> bool:
@@ -162,7 +166,7 @@ def validate_proteins(
  
  
  def apply_transformer(
-     qtable: msreport.Qtable,
+     qtable: Qtable,
      transformer: Transformer,
      tag: str,
      exclude_invalid: bool,
@@ -205,6 +209,64 @@ def apply_transformer(
      qtable.data[data_table.columns] = data_table
  
  
+ def apply_category_transformer(
+     qtable: Qtable,
+     transformer: CategoryTransformer,
+     tag: str,
+     exclude_invalid: bool,
+     remove_invalid: bool,
+     new_tag: Optional[str] = None,
+ ) -> None:
+     """Apply a category transformer to Qtable columns selected by tag.
+ 
+     Args:
+         qtable: A Qtable instance, to which the transformer is applied.
+         transformer: The CategoryTransformer to apply.
+         tag: The tag used to identify the columns for applying the transformer.
+         exclude_invalid: Exclude invalid values from the transformation.
+         remove_invalid: Remove invalid values from the table after the transformation.
+         new_tag: Optional, if specified than the tag is replaced with this value in the
+             column names and the transformed data is stored to these new columns.
+ 
+     Raises:
+         KeyError: If the category column of the `transformer` is not found in the
+             `qtable.data`.
+         ValueError: If no sample columns are found for the specified tag.
+     """
+     category_column = transformer.get_category_column()
+     if category_column not in qtable.data.columns:
+         raise KeyError(
+             f'The category column "{category_column}" in the transformer '
+             f"is not found in `qtable.data`."
+         )
+ 
+     valid = qtable.data["Valid"]
+     samples = qtable.get_samples()
+     sample_columns = find_sample_columns(qtable.data, tag, samples)
+ 
+     if not sample_columns:
+         raise ValueError(f"No sample columns found for tag '{tag}'.")
+ 
+     if new_tag is not None:
+         sample_columns = [c.replace(tag, new_tag) for c in sample_columns]
+     column_mapping = dict(zip(samples, sample_columns))
+ 
+     data_table = qtable.make_sample_table(tag, samples_as_columns=True)
+     data_table[category_column] = qtable.data[category_column]
+ 
+     if exclude_invalid:
+         data_table.loc[valid, :] = transformer.transform(data_table.loc[valid, :])
+     else:
+         data_table = transformer.transform(data_table)
+     data_table = data_table.drop(columns=[category_column])
+ 
+     if remove_invalid:
+         data_table[~valid] = np.nan
+ 
+     data_table.columns = [column_mapping[s] for s in data_table.columns]
+     qtable.data[data_table.columns] = data_table
+ 
+ 
  def normalize_expression(
      qtable: Qtable,
      normalizer: Transformer,
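The new `apply_category_transformer` above calls `transformer.get_category_column()` and `transformer.transform(...)` on a `CategoryTransformer`. A minimal sketch of an object satisfying those calls is shown below; the per-category median centering is purely illustrative and not code from msreport.

```python
# Minimal sketch of an object satisfying what apply_category_transformer visibly
# requires: get_category_column(), fit() returning Self, is_fitted(), and
# transform() returning a DataFrame that still contains the category column.
# This class is illustrative and not part of msreport.
from typing import Optional

import pandas as pd
from typing_extensions import Self


class CategoryMedianCenterer:
    def __init__(self, category_column: str = "Category"):
        self._category_column = category_column
        self._medians: Optional[pd.DataFrame] = None

    def get_category_column(self) -> str:
        return self._category_column

    def fit(self, table: pd.DataFrame) -> Self:
        # Learn one median per category and value column.
        values = table.drop(columns=[self._category_column])
        self._medians = values.groupby(table[self._category_column]).median()
        return self

    def is_fitted(self) -> bool:
        return self._medians is not None

    def transform(self, table: pd.DataFrame) -> pd.DataFrame:
        # Subtract the fitted per-category medians from the value columns.
        transformed = table.copy()
        value_columns = [c for c in table.columns if c != self._category_column]
        offsets = self._medians.loc[table[self._category_column], value_columns].to_numpy()
        transformed[value_columns] = transformed[value_columns].to_numpy() - offsets
        return transformed
```

Such an object would then be passed to `apply_category_transformer` together with a column tag and the `exclude_invalid`/`remove_invalid` flags shown in the diff.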
{msreport-0.0.29 → msreport-0.0.31}/msreport/export.py
@@ -1,19 +1,13 @@
- """
- Columns that are not yet present in the amica output at the moment:
- Index([
- 'Protein Probability',
- 'Top Peptide Probability',
- 'Total peptides',
- 'Leading proteins',
- 'Protein entry name',
- 'Fasta header',
- 'Protein length',
- 'iBAQ peptides',
- 'Sequence coverage',
- ], dtype='object')
+ """Exporting of proteomics data from `Qtable` into external formats.
+ 
+ This module offers functionalities to convert and save `Qtable` data into files
+ compatible with external tools (Amica and Perseus), and creating sequence coverage maps
+ in HTML format. While most functions operate on `Qtable` instances, some may accept
+ other data structures.
  """
  
  import os
+ import pathlib
  import warnings
  from collections import defaultdict as ddict
  from typing import Iterable, Optional, Protocol, Sequence
@@ -99,7 +93,7 @@ def contaminants_to_clipboard(qtable: Qtable) -> None:
  
  def to_perseus_matrix(
      qtable: Qtable,
-     directory,
+     directory: str | pathlib.Path,
      table_name: str = "perseus_matrix.tsv",
  ) -> None:
      """Exports a qtable to a perseus matrix file in tsv format.
@@ -151,7 +145,7 @@ def to_perseus_matrix(
  
  def to_amica(
      qtable: Qtable,
-     directory,
+     directory: str | pathlib.Path,
      table_name: str = "amica_table.tsv",
      design_name: str = "amica_design.tsv",
  ) -> None:
{msreport-0.0.29 → msreport-0.0.31}/msreport/fasta.py
@@ -1,11 +1,18 @@
+ """Functionalities for import and access to protein sequence databases from FASTA files.
+ 
+ This module serves as an interface to the `profasta` library, offering a convenient way
+ to generate a `profasta.db.ProteinDatabase` from one or multiple FASTA files. It
+ supports custom FASTA header parsing through a configurable header parser.
+ """
+ 
  import pathlib
- from typing import Iterable, Union
+ from typing import Iterable
  
  from profasta.db import ProteinDatabase
  
  
  def import_protein_database(
-     fasta_path: Union[str, pathlib.Path, Iterable[Union[str, pathlib.Path]]],
+     fasta_path: str | pathlib.Path | Iterable[str | pathlib.Path],
      header_parser: str = "uniprot",
  ) -> ProteinDatabase:
      """Generates a protein database from one or a list of fasta files.
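An example call based on the `import_protein_database` signature shown above; the FASTA file names are hypothetical.

```python
# Example call based on the signature shown above; the FASTA file names are
# hypothetical. A single path or an iterable of str/pathlib.Path entries is accepted.
from msreport.fasta import import_protein_database

protein_db = import_protein_database(
    ["proteome.fasta", "contaminants.fasta"],
    header_parser="uniprot",
)
```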
{msreport-0.0.29 → msreport-0.0.31}/msreport/helper/__init__.py
@@ -1,3 +1,9 @@
+ """A collection of widely used helper and utility functions.
+ 
+ This module re-exports commonly used functions from various `msreport.helper`
+ submodules for convenience.
+ """
+ 
  from .calc import (
      calculate_monoisotopic_mass,
      calculate_sequence_coverage,
@@ -21,3 +27,15 @@ from .temp import (
      extract_modifications,
      modify_peptide,
  )
+ 
+ __all__ = [
+     "apply_intensity_cutoff",
+     "find_columns",
+     "find_sample_columns",
+     "guess_design",
+     "intensities_in_logspace",
+     "keep_rows_by_partial_match",
+     "remove_rows_by_partial_match",
+     "rename_mq_reporter_channels",
+     "rename_sample_columns",
+ ]
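The new `__all__` pins down the star-import surface of `msreport.helper`; this is standard Python behaviour rather than anything msreport-specific.

```python
# Standard Python behaviour: with __all__ defined, a star import exposes exactly
# the listed names, for example find_sample_columns.
from msreport.helper import *

print(find_sample_columns)  # one of the names listed in the new __all__
```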
{msreport-0.0.29 → msreport-0.0.31}/msreport/impute.py
@@ -1,9 +1,17 @@
- from __future__ import annotations
+ """Transformer classes for imputing missing values in quantitative proteomics data.
+ 
+ This module defines transformer classes that can be fitted to a table containing
+ quantitative values to learn imputation parameters. Once fitted, these transformers can
+ then be applied to another table to transform it by filling in missing values. The
+ transformation returns a new copy of the table with the imputed values, leaving the
+ original table unchanged.
+ """
  
  from typing import Any, Optional
  
  import numpy as np
  import pandas as pd
+ from typing_extensions import Self
  
  from msreport.errors import NotFittedError
  
@@ -42,7 +50,7 @@ class FixedValueImputer:
          self.column_wise = column_wise
          self._sample_fill_values: dict[str, float] = {}
  
-     def fit(self, table: pd.DataFrame) -> FixedValueImputer:
+     def fit(self, table: pd.DataFrame) -> Self:
          """Fits the FixedValueImputer.
  
          Args:
@@ -79,7 +87,7 @@ class FixedValueImputer:
          Returns:
              'table' with imputed missing values.
          """
-         confirm_is_fitted(self)
+         _confirm_is_fitted(self)
  
          _table = table.copy()
          for column in _table.columns:
@@ -108,7 +116,7 @@ class GaussianImputer:
          self.sigma = sigma
          self.seed = seed
  
-     def fit(self, table: pd.DataFrame) -> GaussianImputer:
+     def fit(self, table: pd.DataFrame) -> Self:
          """Fits the GaussianImputer, altough this is not necessary.
  
          Args:
@@ -134,7 +142,7 @@ class GaussianImputer:
          Returns:
              'table' with imputed missing values.
          """
-         confirm_is_fitted(self)
+         _confirm_is_fitted(self)
          np.random.seed(self.seed)
  
          _table = table.copy()
@@ -182,9 +190,9 @@ class PerseusImputer:
          self.std_width = std_width
          self.column_wise = column_wise
          self.seed = seed
-         self._column_params: dict[str, dict] = {}
+         self._column_params: dict[str, dict[str, float]] = {}
  
-     def fit(self, table: pd.DataFrame) -> PerseusImputer:
+     def fit(self, table: pd.DataFrame) -> Self:
          """Fits the PerseusImputer.
  
          Args:
@@ -223,7 +231,7 @@ class PerseusImputer:
          Returns:
              'table' with imputed missing values.
          """
-         confirm_is_fitted(self)
+         _confirm_is_fitted(self)
          np.random.seed(self.seed)
  
          _table = table.copy()
@@ -239,7 +247,7 @@
          return _table
  
  
- def confirm_is_fitted(imputer: Any, msg: Optional[str] = None) -> None:
+ def _confirm_is_fitted(imputer: Any, msg: Optional[str] = None) -> None:
      """Perform is_fitted validation for imputer instances.
  
      Checks if the imputer is fitted by verifying the presence of fitted attributes
@@ -266,7 +274,7 @@ def confirm_is_fitted(imputer: Any, msg: Optional[str] = None) -> None:
          raise NotFittedError(msg % {"name": type(imputer).__name__})
  
  
- def _calculate_integer_below_min(table) -> int:
+ def _calculate_integer_below_min(table: pd.DataFrame) -> int:
      minimal_value = np.nanmin(table.to_numpy().flatten())
      below_minimal = np.floor(minimal_value)
      if minimal_value <= below_minimal:
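The `fit()` return annotations switch from the concrete class names to `typing_extensions.Self`, which keeps the return type accurate for subclasses. A small illustration of that typing difference, not code from the package:

```python
# Why returning Self matters: with a hard-coded class name, a subclass's fit()
# would be typed as returning the parent class. Illustrative only, not msreport code.
import pandas as pd
from typing_extensions import Self


class BaseImputer:
    def fit(self, table: pd.DataFrame) -> Self:
        # ... learn imputation parameters from `table` ...
        return self


class LoggingImputer(BaseImputer):
    def describe(self) -> str:
        return "logs every imputation"


imputer = LoggingImputer().fit(pd.DataFrame())
print(imputer.describe())  # type checkers see a LoggingImputer here, not a BaseImputer
```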
{msreport-0.0.29 → msreport-0.0.31}/msreport/isobar.py
@@ -1,34 +1,31 @@
- from __future__ import annotations
+ """Provides a transformer class for processing isobarically labeled proteomics data.
+ 
+ This module defines the `IsotopeImpurityCorrecter` class for processing of isobaric
+ (e.g., TMT, iTRAQ) reporter intensities. This transformer must be fitted with an isotope
+ impurity matrix to correct interference in reporter intensities. Once fitted, the
+ transformer can then be applied to a table containing reporter ion intensities to adjust
+ its intensity values. The transformation returns a new copy of the table with the
+ processed values, leaving the original table unchanged.
+ """
  
  import functools
- from typing import Protocol
  
  import numpy as np
  import pandas as pd
  import scipy
+ from typing_extensions import Self
  
  import msreport.helper
  from msreport.errors import NotFittedError
  
  
- class Transformer(Protocol):
-     def fit(self, table: pd.DataFrame) -> Transformer:
-         """Fits the Transformer and returns a fitted Transformer instance."""
- 
-     def is_fitted(self) -> bool:
-         """Returns True if the Transformer has been fitted."""
- 
-     def transform(self, table: pd.DataFrame) -> pd.DataFrame:
-         """Transform values in 'table'."""
- 
- 
  class IsotopeImpurityCorrecter:
      """Corrects isotope impurity interference in isobaric reporter expression values."""
  
      def __init__(self):
          self._impurity_matrix = None
  
-     def fit(self, impurity_matrix: np.ndarray) -> IsotopeImpurityCorrecter:
+     def fit(self, impurity_matrix: np.ndarray) -> Self:
          """Fits the isotope impurity correcter to a given impurity matrix.
  
          Args:
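A sketch of the fit/transform flow described by the new module docstring. Only the `fit()` signature is taken from the diff; the `transform()` call, the impurity matrix values, and the column names are assumptions made for illustration.

```python
# Sketch of the fit/transform flow described by the new isobar.py docstring. Only
# the fit() signature is taken from the diff; the transform() call, the matrix
# values, and the column names are assumptions for illustration.
import numpy as np
import pandas as pd

from msreport.isobar import IsotopeImpurityCorrecter

# Hypothetical impurity matrix for a three-channel isobaric label.
impurity_matrix = np.array([
    [0.95, 0.05, 0.00],
    [0.03, 0.94, 0.03],
    [0.00, 0.05, 0.95],
])

reporter_intensities = pd.DataFrame({
    "Reporter intensity 1": [1000.0, 2000.0],
    "Reporter intensity 2": [1500.0, 2500.0],
    "Reporter intensity 3": [1200.0, 2200.0],
})

correcter = IsotopeImpurityCorrecter().fit(impurity_matrix)
corrected = correcter.transform(reporter_intensities)  # returns a corrected copy
```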