PyPI - paradigma - Versions diffs - 0.4.7__tar.gz → 1.0.1__tar.gz - Mend

paradigma 0.4.7.tar.gz → 1.0.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

{paradigma-0.4.7 → paradigma-1.0.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: paradigma
-Version: 0.4.7
+Version: 1.0.1
 Summary: ParaDigMa - A toolbox for deriving Parkinson's disease Digital Markers from real-life wrist sensor data
 License: Apache-2.0
 Author: Erik Post
@@ -26,7 +26,7 @@ Description-Content-Type: text/markdown
 |:----:|----|
 | **Packages and Releases** | [![Latest release](https://img.shields.io/github/release/biomarkersparkinson/paradigma.svg)](https://github.com/biomarkersparkinson/paradigma/releases/latest) [![PyPI](https://img.shields.io/pypi/v/paradigma.svg)](https://pypi.python.org/pypi/paradigma/)  [![Static Badge](https://img.shields.io/badge/RSD-paradigma-lib)](https://research-software-directory.org/software/paradigma) |
 | **DOI** | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13838392.svg)](https://doi.org/10.5281/zenodo.13838392) |
-| **Build Status** | [![](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) [![Build and test](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml) [![pages-build-deployment](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment) |
+| **Build Status** | [![](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/) [![Build and test](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml) [![pages-build-deployment](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment) |
 | **License** |  [![GitHub license](https://img.shields.io/github/license/biomarkersParkinson/paradigma)](https://github.com/biomarkersparkinson/paradigma/blob/main/LICENSE) |
 <!-- | **Fairness** |  [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/8083/badge)](https://www.bestpractices.dev/projects/8083) | -->
@@ -95,7 +95,7 @@ The ParaDigMa toolbox is designed for the analysis of passive monitoring data co
 Specific requirements include:
 | Pipeline               | Sensor Configuration                                                                                                       | Context of Use                                                                                             |
 |------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
-| **All**               | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait).  <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
+| **All**               | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait).  <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). <br> - Timeframe: contiguous, strictly increasing timestamps. | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
 | **Arm swing during gait** | - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. <br> - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Population: no walking aid, no severe dyskinesia in the watch-sided arm. <br> - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm), and at least 2 minutes of arm swing. |
 | **Tremor**            | - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm). |
 | **Pulse rate**        | - PPG*: minimum sampling rate of 30 Hz, green LED. <br> - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. | - Population: no rhythm disorders (e.g. atrial fibrillation, atrial flutter). <br> - Compliance: for weekly measures: minimum average of 12 hours of data per day. |
@@ -111,8 +111,10 @@ We have included support for [TSDF](https://biomarkersparkinson.github.io/tsdf/)
 ## Scientific validation
-The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/)
-and the Personalized Parkinson Project [[Bloem et al. 2019]](https://pubmed.ncbi.nlm.nih.gov/31315608/). Details and validation of the different pipelines shall be shared in upcoming scientific publications.
+The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/) and the Personalized Parkinson Project [[Bloem et al. 2019]](https://pubmed.ncbi.nlm.nih.gov/31315608/). The following publication contains the details and validation of the arm swing during gait pipeline:
+* [Post, E. et al. - Quantifying arm swing in Parkinson's disease: a method account for arm activities during free-living gait](https://doi.org/10.1186/s12984-025-01578-z)
+Details and validation of the other pipelines shall be shared in upcoming scientific publications.
 ## Contributing
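Version 1.0.1 adds a requirement that input data have contiguous, strictly increasing timestamps. A minimal pre-flight check could look like the sketch below. `check_timestamps` and its `tol` parameter are hypothetical (not part of paradigma), and "contiguous" is assumed to mean that no step between samples exceeds 1.5 sample periods.

```python
import numpy as np


def check_timestamps(time_s: np.ndarray, fs: float, tol: float = 0.5) -> None:
    """Raise ValueError unless timestamps are strictly increasing and gap-free.

    Hypothetical helper, not a paradigma function. "Contiguous" is interpreted
    here as: no step between consecutive samples exceeds (1 + tol) / fs.
    """
    time_s = np.asarray(time_s, dtype=float)
    steps = np.diff(time_s)
    if np.any(steps <= 0):
        idx = int(np.argmax(steps <= 0)) + 1
        raise ValueError(f"Timestamps not strictly increasing at index {idx}")
    max_step = (1 + tol) / fs
    if np.any(steps > max_step):
        idx = int(np.argmax(steps > max_step)) + 1
        raise ValueError(f"Gap of {steps[idx - 1]:.3f} s before index {idx}")
```

Running such a check before any pipeline step surfaces duplicated or out-of-order samples early, instead of letting them distort segmentation later.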

{paradigma-0.4.7 → paradigma-1.0.1}/README.md RENAMED

@@ -6,7 +6,7 @@
 |:----:|----|
 | **Packages and Releases** | [![Latest release](https://img.shields.io/github/release/biomarkersparkinson/paradigma.svg)](https://github.com/biomarkersparkinson/paradigma/releases/latest) [![PyPI](https://img.shields.io/pypi/v/paradigma.svg)](https://pypi.python.org/pypi/paradigma/)  [![Static Badge](https://img.shields.io/badge/RSD-paradigma-lib)](https://research-software-directory.org/software/paradigma) |
 | **DOI** | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13838392.svg)](https://doi.org/10.5281/zenodo.13838392) |
-| **Build Status** | [![](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) [![Build and test](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml) [![pages-build-deployment](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment) |
+| **Build Status** | [![](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/) [![Build and test](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/build-and-test.yml) [![pages-build-deployment](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/biomarkersParkinson/paradigma/actions/workflows/pages/pages-build-deployment) |
 | **License** |  [![GitHub license](https://img.shields.io/github/license/biomarkersParkinson/paradigma)](https://github.com/biomarkersparkinson/paradigma/blob/main/LICENSE) |
 <!-- | **Fairness** |  [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/8083/badge)](https://www.bestpractices.dev/projects/8083) | -->
@@ -75,7 +75,7 @@ The ParaDigMa toolbox is designed for the analysis of passive monitoring data co
 Specific requirements include:
 | Pipeline               | Sensor Configuration                                                                                                       | Context of Use                                                                                             |
 |------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
-| **All**               | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait).  <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
+| **All**               | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait).  <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). <br> - Timeframe: contiguous, strictly increasing timestamps. | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
 | **Arm swing during gait** | - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. <br> - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Population: no walking aid, no severe dyskinesia in the watch-sided arm. <br> - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm), and at least 2 minutes of arm swing. |
 | **Tremor**            | - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm). |
 | **Pulse rate**        | - PPG*: minimum sampling rate of 30 Hz, green LED. <br> - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. | - Population: no rhythm disorders (e.g. atrial fibrillation, atrial flutter). <br> - Compliance: for weekly measures: minimum average of 12 hours of data per day. |
@@ -91,8 +91,10 @@ We have included support for [TSDF](https://biomarkersparkinson.github.io/tsdf/)
 ## Scientific validation
-The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/)
-and the Personalized Parkinson Project [[Bloem et al. 2019]](https://pubmed.ncbi.nlm.nih.gov/31315608/). Details and validation of the different pipelines shall be shared in upcoming scientific publications.
+The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/) and the Personalized Parkinson Project [[Bloem et al. 2019]](https://pubmed.ncbi.nlm.nih.gov/31315608/). The following publication contains the details and validation of the arm swing during gait pipeline:
+* [Post, E. et al. - Quantifying arm swing in Parkinson's disease: a method account for arm activities during free-living gait](https://doi.org/10.1186/s12984-025-01578-z)
+Details and validation of the other pipelines shall be shared in upcoming scientific publications.
 ## Contributing

{paradigma-0.4.7 → paradigma-1.0.1}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "paradigma"
-version = "0.4.7"
+version = "1.0.1"
 description = "ParaDigMa - A toolbox for deriving Parkinson's disease Digital Markers from real-life wrist sensor data"
 authors = [ "Erik Post <erik.post@radboudumc.nl>",
             "Kars Veldkamp <kars.veldkamp@radboudumc.nl>",

{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/config.py RENAMED

@@ -244,7 +244,7 @@ class TremorConfig(IMUConfig):
             }
-class HeartRateConfig(PPGConfig):
+class PulseRateConfig(PPGConfig):
     def __init__(self, sensor: str = 'ppg', min_window_length_s: int = 30) -> None:
         super().__init__()
@@ -265,14 +265,14 @@ class HeartRateConfig(PPGConfig):
         self.freq_bin_resolution = 0.05 # Hz
         # ---------------------
-        # Heart rate estimation
+        # Pulse rate estimation
         # ---------------------
         self.set_tfd_length(min_window_length_s)  # Set tfd length to default of 30 seconds
         self.threshold_sqa = 0.5
-        self.threshold_sqa_accelerometer = 0.13
+        self.threshold_sqa_accelerometer = 0.10
-        hr_est_length = 2
-        self.hr_est_samples = hr_est_length * self.sampling_frequency
+        pr_est_length = 2  # pulse rate estimation length in seconds
+        self.pr_est_samples = pr_est_length * self.sampling_frequency
         # Time-frequency distribution parameters
         self.kern_type = 'sep'
@@ -297,7 +297,7 @@ class HeartRateConfig(PPGConfig):
     def set_tfd_length(self, tfd_length: int):
         self.tfd_length = tfd_length
-        self.min_hr_samples = int(round(self.tfd_length * self.sampling_frequency))
+        self.min_pr_samples = int(round(self.tfd_length * self.sampling_frequency))
     def set_sensor(self, sensor):
         self.sensor = sensor
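The renamed `PulseRateConfig` derives sample counts from durations: `pr_est_samples` is 2 s times the sampling frequency, and `min_pr_samples` is the rounded TFD length (30 s by default) times the sampling frequency. The stand-in class below reproduces only that arithmetic. `PulseRateParams` is hypothetical, and the sampling frequency is passed in rather than inherited from `PPGConfig` as in the real class.

```python
from dataclasses import dataclass, field


@dataclass
class PulseRateParams:
    """Hypothetical stand-in mirroring the derived attributes of PulseRateConfig.

    sampling_frequency is an assumed constructor argument; the real config
    takes it from PPGConfig.
    """
    sampling_frequency: int
    tfd_length: int = 30                        # seconds, default used by set_tfd_length
    pr_est_length: int = 2                      # pulse rate estimation length in seconds
    threshold_sqa: float = 0.5
    threshold_sqa_accelerometer: float = 0.10   # lowered from 0.13 in 1.0.1
    min_pr_samples: int = field(init=False)
    pr_est_samples: int = field(init=False)

    def __post_init__(self) -> None:
        self.set_tfd_length(self.tfd_length)
        self.pr_est_samples = self.pr_est_length * self.sampling_frequency

    def set_tfd_length(self, tfd_length: int) -> None:
        # Same rule as in the diff: minimum samples = round(tfd_length * fs)
        self.tfd_length = tfd_length
        self.min_pr_samples = int(round(self.tfd_length * self.sampling_frequency))
```

With a 30 Hz PPG signal (the minimum rate listed in the requirements table), this gives 900 samples for the default 30 s window and 60 samples per 2 s estimate.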

{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/constants.py RENAMED

@@ -58,8 +58,8 @@ class DataColumns():
     PRED_SQA_ACC_LABEL: str = "pred_sqa_acc_label"
     PRED_SQA: str = "pred_sqa"
-    # Constants for heart rate
-    HEART_RATE: str = "heart_rate"
+    # Constants for pulse rate
+    PULSE_RATE: str = "pulse_rate"
 @dataclass(frozen=True)
 class DataUnits():
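Because `DataColumns.HEART_RATE` (`"heart_rate"`) became `DataColumns.PULSE_RATE` (`"pulse_rate"`), results saved with 0.4.7 carry the old column name. A hypothetical migration helper, assuming the results are held in pandas DataFrames:

```python
import pandas as pd

# Old (0.4.7) -> new (1.0.1) column names, taken from DataColumns in the diff
LEGACY_COLUMNS = {"heart_rate": "pulse_rate"}


def upgrade_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename legacy heart-rate columns to their pulse-rate names (hypothetical helper)."""
    clashes = [new for old, new in LEGACY_COLUMNS.items()
               if old in df.columns and new in df.columns]
    if clashes:
        raise ValueError(f"Both legacy and new columns present: {clashes}")
    # rename() ignores keys that are absent, so already-upgraded frames pass through
    return df.rename(columns=LEGACY_COLUMNS)
```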

{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/feature_extraction.py RENAMED

@@ -7,7 +7,7 @@ from scipy.signal import find_peaks, windows
 from scipy.stats import kurtosis, skew
 from sklearn.decomposition import PCA
-from paradigma.config import HeartRateConfig
+from paradigma.config import PulseRateConfig
 def compute_statistics(data: np.ndarray, statistic: str, abs_stats: bool=False) -> np.ndarray:
@@ -353,7 +353,7 @@ def extract_frequency_peak(
 def compute_relative_power(
         freqs: np.ndarray,
         psd: np.ndarray,
-        config: HeartRateConfig
+        config: PulseRateConfig
     ) -> list:
     """
     Calculate relative power within the dominant frequency band in the physiological range (0.75 - 3 Hz).
@@ -364,11 +364,11 @@ def compute_relative_power(
         The frequency bins of the power spectral density.
     psd: np.ndarray
         The power spectral density of the signal.
-    config: HeartRateConfig
+    config: PulseRateConfig
         The configuration object containing the parameters for the feature extraction. The following
         attributes are used:
         - freq_band_physio: tuple
-            The frequency band for physiological heart rate (default: (0.75, 3)).
+            The frequency band for physiological pulse rate (default: (0.75, 3)).
         - bandwidth: float
             The bandwidth around the peak frequency to consider for relative power calculation (default: 0.5).
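The `compute_relative_power` docstring describes relative power around the dominant peak in the 0.75 to 3 Hz physiological band. A NumPy sketch of that idea follows. The band edges (peak ± `bandwidth`), the normalisation by each window's total power, and the `(n_windows, n_freqs)` layout of `psd` are assumptions; the function body is not part of this diff.

```python
import numpy as np


def relative_power(freqs, psd, freq_band_physio=(0.75, 3.0), bandwidth=0.5):
    """Sketch: share of spectral power near the dominant physiological peak.

    Assumptions (not confirmed by the diff): psd has shape (n_windows, n_freqs)
    on a uniform frequency grid, the band is peak +/- bandwidth, and power is
    normalised by the total power of each window.
    """
    freqs = np.asarray(freqs, dtype=float)
    psd = np.atleast_2d(np.asarray(psd, dtype=float))
    physio_idx = np.flatnonzero((freqs >= freq_band_physio[0]) & (freqs <= freq_band_physio[1]))
    # Dominant frequency per window, searched only inside the physiological band
    peak_freqs = freqs[physio_idx[np.argmax(psd[:, physio_idx], axis=1)]]
    result = []
    for row, peak in zip(psd, peak_freqs):
        in_band = np.abs(freqs - peak) <= bandwidth
        # Uniform bins: the bin width cancels in the ratio, so plain sums suffice
        result.append(float(row[in_band].sum() / row.sum()))
    return result
```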
@@ -597,11 +597,9 @@ def pca_transform_gyroscope(
         df: pd.DataFrame,
         y_gyro_colname: str,
         z_gyro_colname: str,
-        pred_colname: str | None = None,
 ) -> np.ndarray:
     """
-    Perform principal component analysis (PCA) on gyroscope data to estimate velocity. If pred_colname is provided,
-    the PCA is fitted on the predicted gait data. Otherwise, the PCA is fitted on the entire dataset.
+    Perform principal component analysis (PCA) on gyroscope data to estimate velocity.
     Parameters
     ----------
@@ -611,8 +609,6 @@ def pca_transform_gyroscope(
         The column name for the y-axis gyroscope data.
     z_gyro_colname : str
         The column name for the z-axis gyroscope data.
-    pred_colname : str, optional
-        The column name for the predicted gait (default: None).
     Returns
     -------
@@ -623,19 +619,9 @@ def pca_transform_gyroscope(
     y_gyro_array = df[y_gyro_colname].to_numpy()
     z_gyro_array = df[z_gyro_colname].to_numpy()
-    # Filter data based on predicted gait if pred_colname is provided
-    if pred_colname is not None:
-        pred_mask = df[pred_colname] == 1
-        y_gyro_fit_array = y_gyro_array[pred_mask]
-        z_gyro_fit_array = z_gyro_array[pred_mask]
-        # Fit PCA on predicted gait data
-        fit_data = np.column_stack((y_gyro_fit_array, z_gyro_fit_array))
-        full_data = np.column_stack((y_gyro_array, z_gyro_array))
-    else:
-        # Fit PCA on entire dataset
-        fit_data = np.column_stack((y_gyro_array, z_gyro_array))
-        full_data = fit_data
+    # Fit PCA
+    fit_data = np.column_stack((y_gyro_array, z_gyro_array))
+    full_data = fit_data
     pca = PCA(n_components=2, svd_solver='auto', random_state=22)
     pca.fit(fit_data)
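In 1.0.1 `pca_transform_gyroscope` no longer accepts `pred_colname`, so the PCA is fitted on exactly the rows the caller passes in; restricting the fit to predicted gait now means filtering the DataFrame first. The sketch below swaps scikit-learn's `PCA` for an SVD in NumPy and assumes the first principal component is the one used as velocity, which the diff does not show.

```python
import numpy as np


def first_pc_projection(y: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Project 2-D gyroscope samples onto their first principal axis.

    Minimal NumPy stand-in for the sklearn PCA step. Which component paradigma
    returns as velocity is not visible in the diff, so the first is assumed.
    The sign of the result is arbitrary, as with any PCA.
    """
    data = np.column_stack((y, z)).astype(float)
    centered = data - data.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]
```

Filtering before the call, e.g. `first_pc_projection(y[mask], z[mask])`, fits and transforms the same subset. This differs slightly from 0.4.7, which fitted on the predicted subset but then transformed all rows.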

{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/pipelines/gait_pipeline.py RENAMED

@@ -1,20 +1,17 @@
 import numpy as np
-import os
 import pandas as pd
-from pathlib import Path
 from scipy.signal import periodogram
 from typing import List, Tuple
-import tsdf
 from paradigma.classification import ClassifierPackage
-from paradigma.constants import DataColumns, TimeUnit
+from paradigma.constants import DataColumns
 from paradigma.config import GaitConfig
 from paradigma.feature_extraction import pca_transform_gyroscope, compute_angle, remove_moving_average_angle, \
     extract_angle_extremes, compute_range_of_motion, compute_peak_angular_velocity, compute_statistics, \
     compute_std_euclidean_norm, compute_power_in_bandwidth, compute_dominant_frequency, compute_mfccs, \
     compute_total_power
 from paradigma.segmenting import tabulate_windows, create_segments, discard_segments, categorize_segments, WindowedDataExtractor
-from paradigma.util import aggregate_parameter, merge_predictions_with_timestamps, read_metadata, write_df_data, get_end_iso8601
+from paradigma.util import aggregate_parameter
 def extract_gait_features(
@@ -160,66 +157,35 @@ def detect_gait(
 def extract_arm_activity_features(
+        df: pd.DataFrame,
         config: GaitConfig,
-        df_timestamps: pd.DataFrame,
-        df_predictions: pd.DataFrame,
-        threshold: float
     ) -> pd.DataFrame:
     """
     Extract features related to arm activity from a time-series DataFrame.
     This function processes a DataFrame containing accelerometer, gravity, and gyroscope signals,
     and extracts features related to arm activity by performing the following steps:
-    1. Merges the gait predictions with timestamps by expanding overlapping windows into individual timestamps.
-    2. Computes the angle and velocity from gyroscope data.
-    3. Filters the data to include only predicted gait segments.
-    4. Groups the data into segments based on consecutive timestamps and pre-specified gaps.
-    5. Removes segments that do not meet predefined criteria.
-    6. Creates fixed-length windows from the time series data.
-    7. Extracts angle-related features, temporal domain features, and spectral domain features.
+    1. Computes the angle and velocity from gyroscope data.
+    2. Filters the data to include only predicted gait segments.
+    3. Groups the data into segments based on consecutive timestamps and pre-specified gaps.
+    4. Removes segments that do not meet predefined criteria.
+    5. Creates fixed-length windows from the time series data.
+    6. Extracts angle-related features, temporal domain features, and spectral domain features.
     Parameters
     ----------
-    config : GaitConfig
-        Configuration object containing column names and parameters for feature extraction.
-    df_timestamps : pd.DataFrame
-        A DataFrame containing the raw sensor data, including accelerometer, gravity, and gyroscope columns.
-    df_predictions : pd.DataFrame
-        A DataFrame containing the predicted probabilities for gait activity per window.
+    df: pd.DataFrame
+        The input DataFrame containing accelerometer, gravity, and gyroscope data of predicted gait.
     config : ArmActivityFeatureExtractionConfig
         Configuration object containing column names and parameters for feature extraction.
-    path_to_classifier_input : str | Path
-        The path to the directory containing the classifier files and other necessary input files for feature extraction.
     Returns
     -------
     pd.DataFrame
         A DataFrame containing the extracted arm activity features, including angle, velocity,
         temporal, and spectral features.
     """
-    if not any(df_predictions[DataColumns.PRED_GAIT_PROBA] >= threshold):
-        raise ValueError("No gait detected in the input data.")
-    # Merge gait predictions with timestamps
-    gait_preprocessing_config = GaitConfig(step='gait')
-    df = merge_predictions_with_timestamps(
-        df_ts=df_timestamps,
-        df_predictions=df_predictions,
-        pred_proba_colname=DataColumns.PRED_GAIT_PROBA,
-        window_length_s=gait_preprocessing_config.window_length_s,
-        fs=gait_preprocessing_config.sampling_frequency
-    )
-    # Add a column for predicted gait based on a fitted threshold
-    df[DataColumns.PRED_GAIT] = (df[DataColumns.PRED_GAIT_PROBA] >= threshold).astype(int)
-    # Filter the DataFrame to only include predicted gait (1)
-    df = df.loc[df[DataColumns.PRED_GAIT]==1].reset_index(drop=True)
     # Group consecutive timestamps into segments, with new segments starting after a pre-specified gap
     df[DataColumns.SEGMENT_NR] = create_segments(
         time_array=df[DataColumns.TIME],
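`extract_arm_activity_features` now expects a DataFrame that already contains only predicted gait, so the merging, thresholding, and empty-input check removed in this hunk move to the caller. The sketch below turns per-window gait probabilities into a per-timestamp mask. Averaging overlapping windows is an assumption in the spirit of `merge_predictions_with_timestamps`, not its implementation, and `gait_mask` is a hypothetical name.

```python
import numpy as np


def gait_mask(time_s, window_starts_s, window_proba, window_length_s, threshold):
    """Boolean mask of timestamps predicted as gait (hypothetical helper).

    Each timestamp gets the mean probability of all windows covering it;
    timestamps covered by no window get probability 0.
    """
    time_s = np.asarray(time_s, dtype=float)
    starts = np.asarray(window_starts_s, dtype=float)
    proba = np.asarray(window_proba, dtype=float)
    # covers[i, j] is True when window j contains timestamp i
    covers = (time_s[:, None] >= starts[None, :]) & (time_s[:, None] < starts[None, :] + window_length_s)
    n_cover = covers.sum(axis=1)
    summed = covers.astype(float) @ proba
    mean_proba = np.divide(summed, n_cover, out=np.zeros_like(summed), where=n_cover > 0)
    return mean_proba >= threshold
```

The mask can then select rows, e.g. `df.loc[mask].reset_index(drop=True)`; an all-`False` mask corresponds to the case in which 0.4.7 raised `ValueError("No gait detected in the input data.")`. The pairwise comparison needs memory proportional to timestamps × windows, which suits a sketch but not week-long recordings.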
@@ -315,8 +281,8 @@ def filter_gait(
     ----------
     df : pd.DataFrame
         The input DataFrame containing features extracted from gait data.
-    full_path_to_classifier_package : str | Path
-        The path to the pre-trained classifier file.
+    clf_package: ClassifierPackage
+        The pre-trained classifier package containing the classifier, threshold, and scaler.
     parallel : bool, optional, default=False
         If `True`, enables parallel processing.
@@ -351,10 +317,10 @@ def filter_gait(
 def quantify_arm_swing(
         df: pd.DataFrame,
-        max_segment_gap_s: float,
-        min_segment_length_s: float,
         fs: int,
         filtered: bool = False,
+        max_segment_gap_s: float = 1.5,
+        min_segment_length_s: float = 1.5
     ) -> Tuple[dict[str, pd.DataFrame], dict]:
     """
     Quantify arm swing parameters for segments of motion based on gyroscope data.
@@ -362,28 +328,27 @@ def quantify_arm_swing(
     Parameters
     ----------
     df : pd.DataFrame
-        A DataFrame containing the raw sensor data, including gyroscope columns. Should include a column
+        A DataFrame containing the raw sensor data of predicted gait timestamps. Should include a column
         for predicted no other arm activity based on a fitted threshold if filtered is True.
-    max_segment_gap_s : float
-        The maximum gap allowed between segments.
-    min_segment_length_s : float
-        The minimum length required for a segment to be considered valid.
     fs : int
         The sampling frequency of the sensor data.
     filtered : bool, optional, default=True
         If `True`, the gyroscope data is filtered to only include predicted no other arm activity.
+    max_segment_gap_s : float, optional, default=1.5
+        The maximum gap in seconds between consecutive timestamps to group them into segments.
+    min_segment_length_s : float, optional, default=1.5
+        The minimum length in seconds for a segment to be considered valid.
     Returns
     -------
     Tuple[pd.DataFrame, dict]
         A tuple containing a dataframe with quantified arm swing parameters and a dictionary containing
         metadata for each segment.
     """
     # Group consecutive timestamps into segments, with new segments starting after a pre-specified gap.
     # Segments are made based on predicted gait
     df[DataColumns.SEGMENT_NR] = create_segments(
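`max_segment_gap_s` and `min_segment_length_s` now default to 1.5 s. The sketch below assumes that `create_segments` starts a new segment whenever consecutive timestamps are more than `max_segment_gap_s` apart, and that `discard_segments` drops segments shorter than `min_segment_length_s` (sample count divided by `fs`); both function bodies lie outside the diff. Note also that the new signature defaults `filtered` to `False`, while the unchanged docstring line still reads `default=True`.

```python
import numpy as np


def segment_numbers(time_s: np.ndarray, max_segment_gap_s: float = 1.5) -> np.ndarray:
    """Label consecutive timestamps 1, 2, ..., starting a new segment after a gap."""
    gaps = np.diff(np.asarray(time_s, dtype=float)) > max_segment_gap_s
    return np.concatenate(([1], 1 + np.cumsum(gaps)))


def keep_long_segments(segment_nr: np.ndarray, fs: int,
                       min_segment_length_s: float = 1.5) -> np.ndarray:
    """Boolean mask keeping samples whose segment lasts at least min_segment_length_s."""
    _, inverse, counts = np.unique(segment_nr, return_inverse=True, return_counts=True)
    # counts[inverse] broadcasts each segment's sample count back to its rows
    return counts[inverse] / fs >= min_segment_length_s
```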
@@ -391,6 +356,10 @@ def quantify_arm_swing(
         max_segment_gap_s=max_segment_gap_s
     )
+    # Segment category is determined based on predicted gait, hence it is set
+    # before filtering the DataFrame to only include predicted no other arm activity
+    df[DataColumns.SEGMENT_CAT] = categorize_segments(df=df, fs=fs)
     # Remove segments that do not meet predetermined criteria
     df = discard_segments(
         df=df,
@@ -401,40 +370,51 @@ def quantify_arm_swing(
     )
     if df.empty:
-        raise ValueError("No segments found in the input data.")
+        raise ValueError("No segments found in the input data after discarding segments of invalid shape.")
     # If no arm swing data is remaining, return an empty dictionary
     if filtered and df.loc[df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY]==1].empty:
         raise ValueError("No gait without other arm activities to quantify.")
-    df[DataColumns.SEGMENT_CAT] = categorize_segments(df=df, fs=fs)
-    # Group and process segments
-    arm_swing_quantified = []
-    segment_meta = {}
-    if filtered:
+    elif filtered:
         # Filter the DataFrame to only include predicted no other arm activity (1)
         df = df.loc[df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY]==1].reset_index(drop=True)
-        # Group consecutive timestamps into segments, with new segments starting after a pre-specified gap
-        # Now segments are based on predicted gait without other arm activity for subsequent processes
+        # Group consecutive timestamps into segments of filtered gait
         df[DataColumns.SEGMENT_NR] = create_segments(
             time_array=df[DataColumns.TIME],
             max_segment_gap_s=max_segment_gap_s
         )
-        pred_colname_pca = DataColumns.PRED_NO_OTHER_ARM_ACTIVITY
-    else:
-        pred_colname_pca = None
+        # Remove segments that do not meet predetermined criteria
+        df = discard_segments(
+            df=df,
+            segment_nr_colname=DataColumns.SEGMENT_NR,
+            min_segment_length_s=min_segment_length_s,
+            fs=fs,
+        )
+        if df.empty:
+            raise ValueError("No filtered gait segments found in the input data after discarding segments of invalid shape.")
+    arm_swing_quantified = []
+    segment_meta = {
+        'aggregated': {
+            'all': {
+                'duration_s': len(df[DataColumns.TIME]) / fs
+            },
+        },
+        'per_segment': {}
+    }
+    # PCA is fitted on only predicted gait without other arm activity if filtered, otherwise
+    # it is fitted on the entire gyroscope data
     df[DataColumns.VELOCITY] = pca_transform_gyroscope(
         df=df,
         y_gyro_colname=DataColumns.GYROSCOPE_Y,
         z_gyro_colname=DataColumns.GYROSCOPE_Z,
-        pred_colname=pred_colname_pca
     )
+    # Group and process segments
     for segment_nr, group in df.groupby(DataColumns.SEGMENT_NR, sort=False):
         segment_cat = group[DataColumns.SEGMENT_CAT].iloc[0]
         time_array = group[DataColumns.TIME].to_numpy()
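Version 1.0.1 assigns segment categories on the predicted-gait segments before filtering to gait without other arm activity, then re-segments the filtered rows. The sketch below shows why the order matters: a short filtered segment keeps the category of the longer gait segment it came from. The duration bins (5, 10, 20 s) and integer category codes are hypothetical stand-ins for `categorize_segments`, and discarding short filtered segments is omitted.

```python
import numpy as np


def _segments(time_s: np.ndarray, max_gap_s: float) -> np.ndarray:
    # New segment whenever consecutive timestamps are more than max_gap_s apart
    return np.concatenate(([1], 1 + np.cumsum(np.diff(time_s) > max_gap_s)))


def categorize_then_filter(time_s, no_other_arm_activity, fs, max_gap_s=1.5,
                           bins_s=(5.0, 10.0, 20.0)):
    """Categorise gait segments, then filter and re-segment (hypothetical sketch)."""
    time_s = np.asarray(time_s, dtype=float)
    keep = np.asarray(no_other_arm_activity) == 1
    if not keep.any():
        raise ValueError("No gait without other arm activities to quantify.")
    # Stage 1: categories come from the unfiltered predicted-gait segments
    seg = _segments(time_s, max_gap_s)
    _, inverse, counts = np.unique(seg, return_inverse=True, return_counts=True)
    category = np.digitize(counts[inverse] / fs, bins_s)  # 0 = shorter than 5 s
    # Stage 2: filter rows, then re-segment; each row keeps its stage-1 category
    t_f = time_s[keep]
    return t_f, _segments(t_f, max_gap_s), category[keep]
```

For example, a 3 s stretch of filtered gait cut out of a 7 s gait segment is still labelled with the 5 to 10 s category.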
@@ -452,8 +432,10 @@ def quantify_arm_swing(
             fs=fs,
         )
-        segment_meta[segment_nr] = {
-            'time_s': len(angle_array) / fs,
+        segment_meta['per_segment'][segment_nr] = {
+            'start_time_s': time_array.min(),
+            'end_time_s': time_array.max(),
+            'duration_s': len(angle_array) / fs,
             DataColumns.SEGMENT_CAT: segment_cat
         }
@@ -487,12 +469,20 @@ def quantify_arm_swing(
                 df_params_segment = pd.DataFrame({
                     DataColumns.SEGMENT_NR: segment_nr,
+                    DataColumns.SEGMENT_CAT: segment_cat,
                     DataColumns.RANGE_OF_MOTION: rom,
                     DataColumns.PEAK_VELOCITY: pav
                 })
                 arm_swing_quantified.append(df_params_segment)
+    # Combine segment categories
+    segment_categories = set([segment_meta['per_segment'][x][DataColumns.SEGMENT_CAT] for x in segment_meta['per_segment'].keys()])
+    for segment_cat in segment_categories:
+        segment_meta['aggregated'][segment_cat] = {
+            'duration_s': sum([segment_meta['per_segment'][x]['duration_s'] for x in segment_meta['per_segment'].keys() if segment_meta['per_segment'][x][DataColumns.SEGMENT_CAT] == segment_cat])
+        }
     arm_swing_quantified = pd.concat(arm_swing_quantified, ignore_index=True)
     return arm_swing_quantified, segment_meta
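The returned `segment_meta` is now nested: `'per_segment'` holds start time, end time, duration, and category for each segment, and `'aggregated'` holds the total duration under `'all'` plus one duration per category. The helper below rebuilds that layout from per-segment entries. The `"segment_category"` key is a stand-in for `DataColumns.SEGMENT_CAT`, whose string value is not shown in the diff, and `summarize_segments` is a hypothetical name.

```python
from collections import defaultdict


def summarize_segments(per_segment: dict, total_duration_s: float) -> dict:
    """Rebuild the 1.0.1-style segment_meta layout from per-segment entries.

    'segment_category' stands in for DataColumns.SEGMENT_CAT.
    """
    by_cat = defaultdict(float)
    for meta in per_segment.values():
        by_cat[meta["segment_category"]] += meta["duration_s"]
    aggregated = {"all": {"duration_s": total_duration_s}}
    aggregated.update({cat: {"duration_s": dur} for cat, dur in by_cat.items()})
    return {"aggregated": aggregated, "per_segment": per_segment}
```

Accumulating in one pass with `defaultdict` avoids rescanning all segments once per category, which the list comprehension in the diff does.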
@@ -527,7 +517,7 @@ def aggregate_arm_swing_params(df_arm_swing_params: pd.DataFrame, segment_meta:
         cat_segments = [x for x in segment_meta.keys() if segment_meta[x][DataColumns.SEGMENT_CAT] == segment_cat]
         aggregated_results[segment_cat] = {
-            'time_s': sum([segment_meta[x]['time_s'] for x in cat_segments])
+            'duration_s': sum([segment_meta[x]['duration_s'] for x in cat_segments])
         }
         df_arm_swing_params_cat = df_arm_swing_params[df_arm_swing_params[DataColumns.SEGMENT_NR].isin(cat_segments)]
@@ -537,7 +527,7 @@ def aggregate_arm_swing_params(df_arm_swing_params: pd.DataFrame, segment_meta:
                 aggregated_results[segment_cat][f'{aggregate}_{arm_swing_parameter}'] = aggregate_parameter(df_arm_swing_params_cat[arm_swing_parameter], aggregate)
     aggregated_results['all_segment_categories'] = {
-        'time_s': sum([segment_meta[x]['time_s'] for x in segment_meta.keys()])
+        'duration_s': sum([segment_meta[x]['duration_s'] for x in segment_meta.keys()])
     }
     for arm_swing_parameter in arm_swing_parameters:
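Judging from the lines visible in this hunk, `aggregate_arm_swing_params` still iterates over `segment_meta.keys()` and reads each entry's category and `'duration_s'`. That matches the flat `segment_meta['per_segment']` mapping rather than the full nested dictionary returned by `quantify_arm_swing`. A stdlib sketch of per-category aggregation on such a flat mapping follows; the median is only an example aggregate (the aggregates `aggregate_parameter` supports are not shown), and the key names are stand-ins for `DataColumns` values.

```python
from statistics import median


def aggregate_by_category(params: list[dict], per_segment: dict,
                          parameter: str = "range_of_motion") -> dict:
    """Median of one arm swing parameter per segment category (sketch).

    params: one dict per swing with 'segment_nr' and the parameter value.
    per_segment: a flat mapping like segment_meta['per_segment'] in 1.0.1.
    """
    results = {}
    for cat in {m["segment_category"] for m in per_segment.values()}:
        nrs = {nr for nr, m in per_segment.items() if m["segment_category"] == cat}
        values = [p[parameter] for p in params if p["segment_nr"] in nrs]
        results[cat] = {
            "duration_s": sum(per_segment[nr]["duration_s"] for nr in nrs),
            f"median_{parameter}": median(values) if values else None,
        }
    return results
```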

paradigma-0.4.7/src/paradigma/pipelines/heart_rate_pipeline.py → paradigma-1.0.1/src/paradigma/pipelines/pulse_rate_pipeline.py RENAMED

@@ -10,14 +10,14 @@ from typing import List
 from paradigma.classification import ClassifierPackage
 from paradigma.constants import DataColumns
-from paradigma.config import HeartRateConfig
+from paradigma.config import PulseRateConfig
 from paradigma.feature_extraction import compute_statistics, compute_signal_to_noise_ratio, compute_auto_correlation, \
     compute_dominant_frequency, compute_relative_power, compute_spectral_entropy
-from paradigma.pipelines.heart_rate_utils import assign_sqa_label, extract_hr_segments, extract_hr_from_segment
+from paradigma.pipelines.pulse_rate_utils import assign_sqa_label, extract_pr_segments, extract_pr_from_segment
 from paradigma.segmenting import tabulate_windows, WindowedDataExtractor
-from paradigma.util import read_metadata, aggregate_parameter
+from paradigma.util import aggregate_parameter
-def extract_signal_quality_features(df_ppg: pd.DataFrame, df_acc: pd.DataFrame, ppg_config: HeartRateConfig, acc_config: HeartRateConfig) -> pd.DataFrame:
+def extract_signal_quality_features(df_ppg: pd.DataFrame, df_acc: pd.DataFrame, ppg_config: PulseRateConfig, acc_config: PulseRateConfig) -> pd.DataFrame:
     """
     Extract signal quality features from the PPG signal.
     The features are extracted from the temporal and spectral domain of the PPG signal.
@@ -30,9 +30,9 @@ def extract_signal_quality_features(df_ppg: pd.DataFrame, df_acc: pd.DataFrame,
         The DataFrame containing the PPG signal.
     df_acc : pd.DataFrame
         The DataFrame containing the accelerometer signal.
-    ppg_config: HeartRateConfig
+    ppg_config: PulseRateConfig
         The configuration for the signal quality feature extraction of the PPG signal.
-    acc_config: HeartRateConfig
+    acc_config: PulseRateConfig
         The configuration for the signal quality feature extraction of the accelerometer signal.
     Returns
@@ -94,7 +94,7 @@ def extract_signal_quality_features(df_ppg: pd.DataFrame, df_acc: pd.DataFrame,
     return df_features
-def signal_quality_classification(df: pd.DataFrame, config: HeartRateConfig, full_path_to_classifier_package: str | Path) -> pd.DataFrame:
+def signal_quality_classification(df: pd.DataFrame, config: PulseRateConfig, full_path_to_classifier_package: str | Path) -> pd.DataFrame:
     """
     Classify the signal quality of the PPG signal using a logistic regression classifier. A probability close to 1 indicates a high-quality signal, while a probability close to 0 indicates a low-quality signal.
     The classifier is trained on features extracted from the PPG signal. The features are extracted using the extract_signal_quality_features function.
@@ -105,7 +105,7 @@ def signal_quality_classification(df: pd.DataFrame, config: HeartRateConfig, ful
     ----------
     df : pd.DataFrame
         The DataFrame containing the PPG features and the accelerometer feature for signal quality classification.
-    config : HeartRateConfig
+    config : PulseRateConfig
         The configuration for the signal quality classification.
     full_path_to_classifier_package : str | Path
         The path to the directory containing the classifier.
@@ -128,9 +128,9 @@ def signal_quality_classification(df: pd.DataFrame, config: HeartRateConfig, ful
     return df[[DataColumns.TIME, DataColumns.PRED_SQA_PROBA, DataColumns.PRED_SQA_ACC_LABEL]]  # Return only the relevant columns, namely the predicted probabilities for the PPG signal quality and the accelerometer label
-def estimate_heart_rate(df_sqa: pd.DataFrame, df_ppg_preprocessed: pd.DataFrame, config: HeartRateConfig) -> pd.DataFrame:
+def estimate_pulse_rate(df_sqa: pd.DataFrame, df_ppg_preprocessed: pd.DataFrame, config: PulseRateConfig) -> pd.DataFrame:
     """
-    Estimate the heart rate from the PPG signal using the time-frequency domain method.
+    Estimate the pulse rate from the PPG signal using the time-frequency domain method.
     Parameters
     ----------
@@ -138,13 +138,13 @@ def estimate_heart_rate(df_sqa: pd.DataFrame, df_ppg_preprocessed: pd.DataFrame,
         The DataFrame containing the signal quality assessment predictions.
     df_ppg_preprocessed : pd.DataFrame
         The DataFrame containing the preprocessed PPG signal.
-    config : HeartRateConfig
-        The configuration for the heart rate estimation.
+    config : PulseRateConfig
+        The configuration for the pulse rate estimation.
     Returns
     -------
-    df_hr : pd.DataFrame
-        The DataFrame containing the heart rate estimations.
+    df_pr : pd.DataFrame
+        The DataFrame containing the pulse rate estimations.
     """
     # Extract NumPy arrays for faster operations
@@ -156,13 +156,13 @@ def estimate_heart_rate(df_sqa: pd.DataFrame, df_ppg_preprocessed: pd.DataFrame,
     # Assign window-level probabilities to individual samples
     sqa_label = assign_sqa_label(ppg_post_prob, config, acc_label) # assigns a signal quality label to every individual data point
-    v_start_idx, v_end_idx = extract_hr_segments(sqa_label, config.min_hr_samples) # extracts heart rate segments based on the SQA label
+    v_start_idx, v_end_idx = extract_pr_segments(sqa_label, config.min_pr_samples) # extracts pulse rate segments based on the SQA label
-    v_hr_rel = np.array([])
-    t_hr_rel = np.array([])
+    v_pr_rel = np.array([])
+    t_pr_rel = np.array([])
-    edge_add = 2 * config.sampling_frequency  # Add 2s on both sides of the segment for HR estimation
-    step_size = config.hr_est_samples  # Step size for HR estimation
+    edge_add = 2 * config.sampling_frequency  # Add 2s on both sides of the segment for PR estimation
+    step_size = config.pr_est_samples  # Step size for PR estimation
     # Estimate the maximum size for preallocation
     valid_segments = (v_start_idx >= edge_add) & (v_end_idx <= len(ppg_preprocessed) - edge_add) # check if the segments are valid, e.g. not too close to the edges (2s)
@@ -171,55 +171,55 @@ def estimate_heart_rate(df_sqa: pd.DataFrame, df_ppg_preprocessed: pd.DataFrame,
     max_size = np.sum((valid_end_idx - valid_start_idx) // step_size) # maximum size for preallocation
     # Preallocate arrays
-    v_hr_rel = np.empty(max_size, dtype=float)
-    t_hr_rel = np.empty(max_size, dtype=float)
+    v_pr_rel = np.empty(max_size, dtype=float)
+    t_pr_rel = np.empty(max_size, dtype=float)
     # Track current position
-    hr_pos = 0
+    pr_pos = 0
     for start_idx, end_idx in zip(valid_start_idx, valid_end_idx):
         # Extract extended PPG segment
         extended_ppg_segment = ppg_preprocessed[start_idx - edge_add : end_idx + edge_add, ppg_idx]
-        # Estimate heart rate
-        hr_est = extract_hr_from_segment(
+        # Estimate pulse rate
+        pr_est = extract_pr_from_segment(
             extended_ppg_segment,
             config.tfd_length,
             config.sampling_frequency,
             config.kern_type,
             config.kern_params,
         )
-        n_hr = len(hr_est)  # Number of heart rate estimates
-        end_idx_time = n_hr * step_size + start_idx  # Calculate end index for time, different from end_idx since it is always a multiple of step_size, while end_idx is not
+        n_pr = len(pr_est)  # Number of pulse rate estimates
+        end_idx_time = n_pr * step_size + start_idx  # Calculate end index for time, different from end_idx since it is always a multiple of step_size, while end_idx is not
-        # Extract relative time for HR estimates
-        hr_time = ppg_preprocessed[start_idx : end_idx_time : step_size, time_idx]
+        # Extract relative time for PR estimates
+        pr_time = ppg_preprocessed[start_idx : end_idx_time : step_size, time_idx]
         # Insert into preallocated arrays
-        v_hr_rel[hr_pos:hr_pos + n_hr] = hr_est
-        t_hr_rel[hr_pos:hr_pos + n_hr] = hr_time
-        hr_pos += n_hr
+        v_pr_rel[pr_pos:pr_pos + n_pr] = pr_est
+        t_pr_rel[pr_pos:pr_pos + n_pr] = pr_time
+        pr_pos += n_pr
-    df_hr = pd.DataFrame({"time": t_hr_rel, "heart_rate": v_hr_rel})
+    df_pr = pd.DataFrame({"time": t_pr_rel, "pulse_rate": v_pr_rel})
-    return df_hr
+    return df_pr
-def aggregate_heart_rate(hr_values: np.ndarray, aggregates: List[str] = ['mode', '99p']) -> dict:
+def aggregate_pulse_rate(pr_values: np.ndarray, aggregates: List[str] = ['mode', '99p']) -> dict:
     """
-    Aggregate the heart rate estimates using the specified aggregation methods.
+    Aggregate the pulse rate estimates using the specified aggregation methods.
     Parameters
     ----------
-    hr_values : np.ndarray
-        The array containing the heart rate estimates
+    pr_values : np.ndarray
+        The array containing the pulse rate estimates
     aggregates : List[str]
-        The list of aggregation methods to be used for the heart rate estimates. The default is ['mode', '99p'].
+        The list of aggregation methods to be used for the pulse rate estimates. The default is ['mode', '99p'].
     Returns
     -------
     aggregated_results : dict
-        The dictionary containing the aggregated results of the heart rate estimates.
+        The dictionary containing the aggregated results of the pulse rate estimates.
     """
     # Initialize the dictionary for the aggregated results
     aggregated_results = {}
@@ -227,19 +227,19 @@ def aggregate_heart_rate(hr_values: np.ndarray, aggregates: List[str] = ['mode',
     # Initialize the dictionary for the aggregated results with the metadata
     aggregated_results = {
     'metadata': {
-        'nr_hr_est': len(hr_values)
+        'nr_pr_est': len(pr_values)
     },
-    'hr_aggregates': {}
+    'pr_aggregates': {}
 }
     for aggregate in aggregates:
-        aggregated_results['hr_aggregates'][f'{aggregate}_{DataColumns.HEART_RATE}'] = aggregate_parameter(hr_values, aggregate)
+        aggregated_results['pr_aggregates'][f'{aggregate}_{DataColumns.PULSE_RATE}'] = aggregate_parameter(pr_values, aggregate)
     return aggregated_results
 def extract_temporal_domain_features(
         ppg_windowed: np.ndarray,
-        config: HeartRateConfig,
+        config: PulseRateConfig,
         quality_stats: List[str] = ['mean', 'std']
     ) -> pd.DataFrame:
     """
@@ -250,7 +250,7 @@ def extract_temporal_domain_features(
     ppg_windowed: np.ndarray
         The dataframe containing the windowed accelerometer signal
-    config: HeartRateConfig
+    config: PulseRateConfig
         The configuration object containing the parameters for the feature extraction
     quality_stats: list, optional
@@ -273,7 +273,7 @@ def extract_temporal_domain_features(
 def extract_spectral_domain_features(
         ppg_windowed: np.ndarray,
-        config: HeartRateConfig,
+        config: PulseRateConfig,
     ) -> pd.DataFrame:
     """
     Calculate the spectral features (dominant frequency, relative power, and spectral entropy)
@@ -285,7 +285,7 @@ def extract_spectral_domain_features(
     ppg_windowed: np.ndarray
         The dataframe containing the windowed ppg signal
-    config: HeartRateConfig
+    config: PulseRateConfig
         The configuration object containing the parameters for the feature extraction
     Returns
@@ -371,7 +371,7 @@ def extract_acc_power_feature(
 def extract_accelerometer_feature(
         acc_windowed: np.ndarray,
         ppg_windowed: np.ndarray,
-        config: HeartRateConfig
+        config: PulseRateConfig
     ) -> pd.DataFrame:
     """
     Extract accelerometer features from the accelerometer signal in the PPG frequency range.
@@ -384,7 +384,7 @@ def extract_accelerometer_feature(
     ppg_windowed: np.ndarray
         The dataframe containing the corresponding windowed ppg signal
-    config: HeartRateConfig
+    config: PulseRateConfig
         The configuration object containing the parameters for the feature extraction
     Returns

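`estimate_pulse_rate` only estimates pulse rate on segments that keep 2 s of margin (`edge_add`) from both ends of the recording. It preallocates its output from the number of whole steps that fit in each segment. The sketch below reproduces only that bookkeeping with NumPy. `plan_pr_estimates` is a hypothetical helper, and `pr_est_samples` is passed in directly rather than read from `PulseRateConfig`.

```python
import numpy as np

def plan_pr_estimates(v_start_idx, v_end_idx, n_samples, fs, pr_est_samples):
    # 2 s of extra signal is needed on both sides of each segment
    edge_add = 2 * fs
    valid = (v_start_idx >= edge_add) & (v_end_idx <= n_samples - edge_add)
    valid_start = v_start_idx[valid]
    valid_end = v_end_idx[valid]
    # Upper bound on the number of estimates: whole steps per valid segment
    max_size = int(np.sum((valid_end - valid_start) // pr_est_samples))
    return valid_start, valid_end, max_size

starts, ends, max_size = plan_pr_estimates(
    np.array([10, 100, 500]), np.array([200, 400, 960]),
    n_samples=1000, fs=30, pr_est_samples=60)
```

A segment that lies within `edge_add` samples of either end is skipped rather than truncated, so its samples yield no estimate.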
paradigma-0.4.7/src/paradigma/pipelines/heart_rate_utils.py → paradigma-1.0.1/src/paradigma/pipelines/pulse_rate_utils.py RENAMED

@@ -2,12 +2,12 @@ import numpy as np
 from scipy import signal
 from typing import Tuple
-from paradigma.config import HeartRateConfig
+from paradigma.config import PulseRateConfig
 def assign_sqa_label(
         ppg_prob: np.ndarray,
-        config: HeartRateConfig,
+        config: PulseRateConfig,
         acc_label=None
     ) -> np.ndarray:
     """
@@ -17,7 +17,7 @@ def assign_sqa_label(
     ----------
     ppg_prob : np.ndarray
         The probabilities for PPG.
-    config : HeartRateConfig
+    config : PulseRateConfig
         The configuration parameters.
     acc_label : np.ndarray, optional
         The labels for the accelerometer.
@@ -61,23 +61,23 @@ def assign_sqa_label(
     return sqa_label
-def extract_hr_segments(sqa_label: np.ndarray, min_hr_samples: int) -> Tuple[np.ndarray, np.ndarray]:
+def extract_pr_segments(sqa_label: np.ndarray, min_pr_samples: int) -> Tuple[np.ndarray, np.ndarray]:
     """
-    Extracts heart rate segments based on the SQA label.
+    Extracts pulse rate segments based on the SQA label.
     Parameters
     ----------
     sqa_label : np.ndarray
         The signal quality assessment label.
-    min_hr_samples : int
-        The minimum number of samples required for a heart rate segment.
+    min_pr_samples : int
+        The minimum number of samples required for a pulse rate segment.
     Returns
     -------
     Tuple[v_start_idx_long, v_end_idx_long]
-        The start and end indices of the heart rate segments.
+        The start and end indices of the pulse rate segments.
     """
-    # Find the start and end indices of the heart rate segments
+    # Find the start and end indices of the pulse rate segments
     v_start_idx = np.where(np.diff(sqa_label.astype(int)) == 1)[0] + 1
     v_end_idx = np.where(np.diff(sqa_label.astype(int)) == -1)[0] + 1
@@ -88,13 +88,13 @@ def extract_hr_segments(sqa_label: np.ndarray, min_hr_samples: int) -> Tuple[np.
         v_end_idx = np.append(v_end_idx, len(sqa_label))
     # Check if the segments are long enough
-    v_start_idx_long = v_start_idx[(v_end_idx - v_start_idx) >= min_hr_samples]
-    v_end_idx_long = v_end_idx[(v_end_idx - v_start_idx) >= min_hr_samples]
+    v_start_idx_long = v_start_idx[(v_end_idx - v_start_idx) >= min_pr_samples]
+    v_end_idx_long = v_end_idx[(v_end_idx - v_start_idx) >= min_pr_samples]
     return v_start_idx_long, v_end_idx_long
-def extract_hr_from_segment(
+def extract_pr_from_segment(
         ppg: np.ndarray,
         tfd_length: int,
         fs: int,
@@ -102,7 +102,7 @@ def extract_hr_from_segment(
         kern_params: dict
     ) -> np.ndarray:
     """
-    Extracts heart rate from the time-frequency distribution of the PPG signal.
+    Extracts pulse rate from the time-frequency distribution of the PPG signal.
     Parameters
     ----------
@@ -121,7 +121,7 @@ def extract_hr_from_segment(
     Returns
     -------
     np.ndarray
-        The estimated heart rate.
+        The estimated pulse rate.
     """
     # Constants to handle boundary effects
@@ -145,23 +145,23 @@ def extract_hr_from_segment(
             end_idx = len(ppg)
         ppg_segments.append(ppg[start_idx:end_idx])
-    hr_est_from_ppg = np.array([])
+    pr_est_from_ppg = np.array([])
     for segment in ppg_segments:
         # Calculate the time-frequency distribution
-        hr_tfd = extract_hr_with_tfd(segment, fs, kern_type, kern_params)
-        hr_est_from_ppg = np.concatenate((hr_est_from_ppg, hr_tfd))
+        pr_tfd = extract_pr_with_tfd(segment, fs, kern_type, kern_params)
+        pr_est_from_ppg = np.concatenate((pr_est_from_ppg, pr_tfd))
-    return hr_est_from_ppg
+    return pr_est_from_ppg
-def extract_hr_with_tfd(
+def extract_pr_with_tfd(
         ppg: np.ndarray,
         fs: int,
         kern_type: str,
         kern_params: dict
     ) -> np.ndarray:
     """
-    Estimate heart rate (HR) from a PPG segment using a TFD method with optional
+    Estimate pulse rate (PR) from a PPG segment using a TFD method with optional
     moving average filtering.
     Parameters
@@ -177,8 +177,8 @@ def extract_hr_with_tfd(
     Returns
     -------
-    hr_smooth_tfd : np.ndarray
-        Estimated HR values (in beats per minute) for each 2-second segment of the PPG signal.
+    pr_smooth_tfd : np.ndarray
+        Estimated PR values (in beats per minute) for each 2-second segment of the PPG signal.
     """
     # Generate the TFD matrix using the specified kernel
     tfd_obj = TimeFreqDistr()
@@ -189,16 +189,16 @@ def extract_hr_with_tfd(
     time_axis = np.arange(num_time_samples) / fs
     freq_axis = np.linspace(0, 0.5, num_freq_bins) * fs
-    # Estimate HR by identifying the max frequency in the TFD
+    # Estimate pulse rate by identifying the max frequency in the TFD
     max_freq_indices = np.argmax(tfd, axis=0)
-    hr_smooth_tfd = np.array([])
+    pr_smooth_tfd = np.array([])
     for i in range(2, int(len(ppg) / fs) - 4 + 1, 2):  # Skip the first and last 2 seconds, add 1 to include the last segment
         relevant_indices = (time_axis >= i) & (time_axis < i + 2)
         avg_frequency = np.mean(freq_axis[max_freq_indices[relevant_indices]])
-        hr_smooth_tfd = np.concatenate((hr_smooth_tfd, [60 * avg_frequency]))  # Convert frequency to BPM
+        pr_smooth_tfd = np.concatenate((pr_smooth_tfd, [60 * avg_frequency]))  # Convert frequency to BPM
-    return hr_smooth_tfd
+    return pr_smooth_tfd
 class TimeFreqDistr:

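`extract_pr_segments` finds runs of good-quality samples from the rising and falling edges of the binary SQA label, then drops runs shorter than `min_pr_samples`. This diff shows only part of the edge handling. The sketch below therefore assumes that a run touching the first or last sample is opened at index 0 or closed at `len(sqa_label)`. `extract_segments_sketch` is a hypothetical name.

```python
import numpy as np

def extract_segments_sketch(sqa_label, min_samples: int):
    label = np.asarray(sqa_label).astype(int)
    # Rising edges (0 -> 1) open a segment, falling edges (1 -> 0) close one
    starts = np.where(np.diff(label) == 1)[0] + 1
    ends = np.where(np.diff(label) == -1)[0] + 1
    # Assumed edge handling: a segment already open at sample 0 starts there,
    # and one still open at the last sample ends at len(label)
    if label[0] == 1:
        starts = np.insert(starts, 0, 0)
    if label[-1] == 1:
        ends = np.append(ends, len(label))
    # Keep only segments with at least min_samples samples
    long_enough = (ends - starts) >= min_samples
    return starts[long_enough], ends[long_enough]

starts, ends = extract_segments_sketch([1, 1, 0, 0, 1, 1, 1, 0, 1], min_samples=2)
```

End indices are exclusive, so `label[start:end]` selects exactly one segment. In this example, the single-sample run at the end is discarded.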
{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/pipelines/tremor_pipeline.py RENAMED

@@ -143,7 +143,6 @@ def detect_tremor(df: pd.DataFrame, config: TremorConfig, full_path_to_classifie
     return df
 def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     """
     Quantifies the amount of tremor time and tremor power, aggregated over all windows in the input dataframe.
@@ -154,8 +153,8 @@ def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     Parameters
     ----------
     df : pd.DataFrame
-        The input DataFrame containing extracted tremor features. The DataFrame must include
-        the necessary columns as specified in the classifier's feature names.
+        The input DataFrame containing the tremor predictions and computed tremor power.
+        The DataFrame must also contain a datetime column ('time_dt').
     config : TremorConfig
         Configuration object containing the percentile for aggregating tremor power.
@@ -163,8 +162,8 @@ def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     Returns
     -------
     dict
-        A dictionary with the aggregated tremor time and tremor power measures, as well as the total number of windows
-        available in the input dataframe, and the number of windows at rest.
+        A dictionary with the aggregated tremor time and tremor power measures, as well as the number of valid days,
+        the total number of windows, and the number of windows at rest available in the input dataframe.
     Notes
     -----
@@ -173,7 +172,7 @@ def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     - The modal tremor power is computed based on gaussian kernel density estimation.
     """
+    nr_valid_days = df['time_dt'].dt.date.unique().size # number of valid days in the input dataframe
     nr_windows_total = df.shape[0] # number of windows in the input dataframe
     # remove windows with detected non-tremor arm movements to control for the amount of arm activities performed
@@ -216,6 +215,7 @@ def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     # store aggregates in json format
     d_aggregates = {
         'metadata': {
+            'nr_valid_days': nr_valid_days,
             'nr_windows_total': nr_windows_total,
             'nr_windows_rest': nr_windows_rest
         },
@@ -250,6 +250,7 @@ def extract_spectral_domain_features(data: np.ndarray, config) -> pd.DataFrame:
     pd.DataFrame
         The feature dataframe containing the extracted spectral features, including
         MFCCs, the frequency of the peak, the tremor power and below tremor power for each window.
     """
     # Initialize a dictionary to hold the results

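`aggregate_tremor` now also reports `nr_valid_days`, the number of distinct calendar dates in `time_dt`. Here is a minimal pandas sketch of that count. It assumes `time_dt` already has a datetime64 dtype, because the `.dt` accessor raises an error otherwise. `count_valid_days` is a hypothetical helper.

```python
import pandas as pd

def count_valid_days(df: pd.DataFrame) -> int:
    # .dt.date yields one datetime.date per row; unique() collapses repeated dates
    return int(df['time_dt'].dt.date.unique().size)

df = pd.DataFrame({'time_dt': pd.to_datetime([
    '2024-01-01 08:00', '2024-01-01 09:00', '2024-01-02 10:00'])})
print(count_valid_days(df))  # 2
```

A date counts once, however few windows it contains. For the count to reflect only days with enough data, a minimum-wear filter such as `select_days` in `util.py` would need to be applied first.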
{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/preprocessing.py RENAMED

@@ -17,7 +17,9 @@ def resample_data(
     df: pd.DataFrame,
     time_column : str,
     values_column_names: List[str],
+    sampling_frequency: int,
     resampling_frequency: int,
+    tolerance: float | None = None
 ) -> pd.DataFrame:
     """
     Resamples sensor data to a specified frequency using cubic interpolation.
@@ -30,8 +32,14 @@ def resample_data(
         The name of the column containing the time data.
     values_column_names : List[str]
         A list of column names that should be resampled.
+    sampling_frequency : int
+        The original sampling frequency of the data (in Hz).
     resampling_frequency : int
         The frequency to which the data should be resampled (in Hz).
+    tolerance : float, optional
+        The tolerance added to the expected difference when checking
+        for contiguous timestamps. If not provided, it defaults to
+        twice the expected interval.
     Returns
     -------
@@ -46,23 +54,35 @@ def resample_data(
     Notes
     -----
-    The function uses cubic interpolation to resample the data to the specified frequency.
-    It requires the input time array to be strictly increasing.
+    - Uses cubic interpolation for smooth resampling if more than three timestamps are available.
+    - With three or fewer timestamps, it falls back to linear interpolation.
     """
+    # Set default tolerance if not provided to twice the expected interval
+    if tolerance is None:
+        tolerance = 2 * 1 / sampling_frequency
-    # Extract time and values from DataFrame
+    # Extract time and values
     time_abs_array = np.array(df[time_column])
     values_array = np.array(df[values_column_names])
     # Ensure the time array is strictly increasing
     if not np.all(np.diff(time_abs_array) > 0):
-        raise ValueError("time_abs_array is not strictly increasing")
+        raise ValueError("Time array is not strictly increasing")
+    # Ensure the time array is contiguous
+    expected_interval = 1 / sampling_frequency
+    timestamp_diffs = np.diff(time_abs_array)
+    if np.any(np.abs(timestamp_diffs - expected_interval) > tolerance):
+        raise ValueError("Time array is not contiguous")
     # Resample the time data using the specified frequency
     t_resampled = np.arange(time_abs_array[0], time_abs_array[-1], 1 / resampling_frequency)
-    # Interpolate the data using cubic interpolation
-    interpolator = interp1d(time_abs_array, values_array, axis=0, kind="cubic")
+    # Choose interpolation method
+    interpolation_kind = "cubic" if len(time_abs_array) > 3 else "linear"
+    interpolator = interp1d(time_abs_array, values_array, axis=0, kind=interpolation_kind, fill_value="extrapolate")
+    # Interpolate
     resampled_values = interpolator(t_resampled)
     # Create a DataFrame with the resampled data
@@ -186,7 +206,8 @@ def preprocess_imu_data(df: pd.DataFrame, config: IMUConfig, sensor: str, watch_
     df = resample_data(
         df=df,
         time_column=DataColumns.TIME,
-        values_column_names = values_colnames,
+        values_column_names=values_colnames,
+        sampling_frequency=config.sampling_frequency,
         resampling_frequency=config.sampling_frequency
     )
@@ -259,6 +280,7 @@ def preprocess_ppg_data(df_ppg: pd.DataFrame, df_acc: pd.DataFrame, ppg_config:
         df=df_acc_overlapping,
         time_column=DataColumns.TIME,
         values_column_names = list(imu_config.d_channels_accelerometer.keys()),
+        sampling_frequency=imu_config.sampling_frequency,
         resampling_frequency=imu_config.sampling_frequency
     )
@@ -267,6 +289,7 @@ def preprocess_ppg_data(df_ppg: pd.DataFrame, df_acc: pd.DataFrame, ppg_config:
         df=df_ppg_overlapping,
         time_column=DataColumns.TIME,
         values_column_names = list(ppg_config.d_channels_ppg.keys()),
+        sampling_frequency=ppg_config.sampling_frequency,
         resampling_frequency=ppg_config.sampling_frequency
     )

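`resample_data` now takes the original `sampling_frequency` so it can reject recordings with gaps before interpolating. The sketch below isolates those checks and the resampled time grid in NumPy. It leaves out the `interp1d` step, which is cubic or, for very short inputs, linear. `resample_grid` is a hypothetical helper.

```python
import numpy as np

def resample_grid(time_s, sampling_frequency, resampling_frequency, tolerance=None):
    time_s = np.asarray(time_s, dtype=float)
    expected_interval = 1 / sampling_frequency
    if tolerance is None:
        # Same default as resample_data: twice the expected interval
        tolerance = 2 * expected_interval
    diffs = np.diff(time_s)
    if not np.all(diffs > 0):
        raise ValueError("Time array is not strictly increasing")
    # A step is rejected when it deviates from 1/fs by more than the tolerance,
    # i.e. with the default, when it is longer than 3/fs
    if np.any(np.abs(diffs - expected_interval) > tolerance):
        raise ValueError("Time array is not contiguous")
    # np.arange does not include the stop value (up to floating-point rounding)
    return np.arange(time_s[0], time_s[-1], 1 / resampling_frequency)

grid = resample_grid(np.arange(0, 1, 0.01), sampling_frequency=100, resampling_frequency=50)
```

With the default tolerance, a step of up to three sample intervals passes, so short dropouts are still bridged by interpolation. The call sites shown in this diff pass the same config value for both `sampling_frequency` and `resampling_frequency`.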
{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/segmenting.py RENAMED

@@ -168,7 +168,7 @@ def create_segments(
     gap_exceeds = time_diff > max_segment_gap_s
     # Create the segment number based on the cumulative sum of the gap_exceeds mask
-    segments = gap_exceeds.cumsum() + 1  # +1 to start enumeration from 1
+    segments = gap_exceeds.cumsum()
     return segments
@@ -236,6 +236,9 @@ def discard_segments(
     df = df[valid_segment_mask].copy()
+    if df.empty:
+        raise ValueError("All segments were removed.")
     # Reset segment numbers in a single step
     unique_segments = pd.factorize(df[segment_nr_colname])[0] + 1
     df[segment_nr_colname] = unique_segments

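`create_segments` starts a new segment wherever the time gap exceeds `max_segment_gap_s`. The diff does not show how the packaged code computes `time_diff`. The sketch below assumes `Series.diff()`, in which case segment numbering now starts at 0 instead of 1. `discard_segments` still renumbers the surviving segments from 1 with `pd.factorize`, and now raises if nothing survives. Both function names are hypothetical.

```python
import pandas as pd

def create_segments_sketch(time_s: pd.Series, max_segment_gap_s: float) -> pd.Series:
    # A new segment starts wherever the gap to the previous sample exceeds the limit;
    # the first diff is NaN, which compares as False, so numbering starts at 0
    gap_exceeds = time_s.diff() > max_segment_gap_s
    return gap_exceeds.cumsum()

def discard_short_segments(df: pd.DataFrame, segment_col: str, min_rows: int) -> pd.DataFrame:
    # Map each row to the size of its segment and keep rows from long-enough segments
    counts = df[segment_col].map(df[segment_col].value_counts())
    df = df[counts >= min_rows].copy()
    if df.empty:
        raise ValueError("All segments were removed.")
    # Renumber surviving segments consecutively, starting at 1
    df[segment_col] = pd.factorize(df[segment_col])[0] + 1
    return df

t = pd.Series([0.0, 1.0, 2.0, 10.0, 11.0, 30.0])
df = pd.DataFrame({'time': t, 'segment_nr': create_segments_sketch(t, 1.5)})
kept = discard_short_segments(df, 'segment_nr', min_rows=2)
```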
{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/testing.py RENAMED

@@ -7,16 +7,16 @@ import tsdf
 from typing import List
 from paradigma.classification import ClassifierPackage
-from paradigma.config import IMUConfig, PPGConfig, GaitConfig, TremorConfig, HeartRateConfig
+from paradigma.config import IMUConfig, PPGConfig, GaitConfig, TremorConfig, PulseRateConfig
 from paradigma.constants import DataColumns, TimeUnit
 from paradigma.pipelines.gait_pipeline import extract_gait_features, detect_gait, \
     extract_arm_activity_features, filter_gait
 from paradigma.pipelines.tremor_pipeline import extract_tremor_features, detect_tremor, \
     aggregate_tremor
-from paradigma.pipelines.heart_rate_pipeline import extract_signal_quality_features, signal_quality_classification, \
-    aggregate_heart_rate
+from paradigma.pipelines.pulse_rate_pipeline import extract_signal_quality_features, signal_quality_classification, \
+    aggregate_pulse_rate
 from paradigma.preprocessing import preprocess_imu_data, preprocess_ppg_data
-from paradigma.util import read_metadata, write_df_data, get_end_iso8601
+from paradigma.util import read_metadata, write_df_data, get_end_iso8601, merge_predictions_with_timestamps
 def preprocess_imu_data_io(path_to_input: str | Path, path_to_output: str | Path,
@@ -208,13 +208,27 @@ def extract_arm_activity_features_io(
     clf_package = ClassifierPackage.load(full_path_to_classifier_package)
+    gait_preprocessing_config = GaitConfig(step='gait')
+    df = merge_predictions_with_timestamps(
+        df_ts=df_ts,
+        df_predictions=df_pred_gait,
+        pred_proba_colname=DataColumns.PRED_GAIT_PROBA,
+        window_length_s=gait_preprocessing_config.window_length_s,
+        fs=gait_preprocessing_config.sampling_frequency
+    )
+    # Add a column for predicted gait based on a fitted threshold
+    df[DataColumns.PRED_GAIT] = (df[DataColumns.PRED_GAIT_PROBA] >= clf_package.threshold).astype(int)
+    # Filter the DataFrame to only include predicted gait (1)
+    df = df.loc[df[DataColumns.PRED_GAIT]==1].reset_index(drop=True)
     # Extract arm activity features
     config = GaitConfig(step='arm_activity')
     df_features = extract_arm_activity_features(
+        df=df,
         config=config,
-        df_timestamps=df_ts,
-        df_predictions=df_pred_gait,
-        threshold=clf_package.threshold
     )
     end_iso8601 = get_end_iso8601(metadata_ts_values.start_iso8601, df_features[DataColumns.TIME][-1:].values[0] + config.window_length_s)
@@ -339,7 +353,7 @@ def aggregate_tremor_io(path_to_feature_input: str | Path, path_to_prediction_in
         json.dump(d_aggregates, json_file, indent=4)
-def extract_signal_quality_features_io(input_path: str | Path, output_path: str | Path, ppg_config: HeartRateConfig, acc_config: HeartRateConfig) -> pd.DataFrame:
+def extract_signal_quality_features_io(input_path: str | Path, output_path: str | Path, ppg_config: PulseRateConfig, acc_config: PulseRateConfig) -> pd.DataFrame:
     """
     Extract signal quality features from the PPG signal and save them to a file.
@@ -349,9 +363,9 @@ def extract_signal_quality_features_io(input_path: str | Path, output_path: str
         The path to the directory containing the preprocessed PPG and accelerometer data.
     output_path : str | Path
         The path to the directory where the extracted features will be saved.
-    ppg_config: HeartRateConfig
+    ppg_config: PulseRateConfig
         The configuration for the signal quality feature extraction of the ppg signal.
-    acc_config: HeartRateConfig
+    acc_config: PulseRateConfig
         The configuration for the signal quality feature extraction of the accelerometer signal.
     Returns
@@ -376,7 +390,7 @@ def extract_signal_quality_features_io(input_path: str | Path, output_path: str
     return df_windowed
-def signal_quality_classification_io(input_path: str | Path, output_path: str | Path, path_to_classifier_input: str | Path, config: HeartRateConfig) -> None:
+def signal_quality_classification_io(input_path: str | Path, output_path: str | Path, path_to_classifier_input: str | Path, config: PulseRateConfig) -> None:
     # Load the data
     metadata_time, metadata_values = read_metadata(input_path, config.meta_filename, config.time_filename, config.values_filename)
@@ -385,32 +399,32 @@ def signal_quality_classification_io(input_path: str | Path, output_path: str |
     df_sqa = signal_quality_classification(df_windowed, config, path_to_classifier_input)
-def aggregate_heart_rate_io(
+def aggregate_pulse_rate_io(
         full_path_to_input: str | Path,
         full_path_to_output: str | Path,
         aggregates: List[str] = ['mode', '99p']
     ) -> None:
     """
-    Extract heart rate from the PPG signal and save the aggregated heart rate estimates to a file.
+    Extract pulse rate from the PPG signal and save the aggregated pulse rate estimates to a file.
     Parameters
     ----------
     input_path : str | Path
-        The path to the directory containing the heart rate estimates.
+        The path to the directory containing the pulse rate estimates.
     output_path : str | Path
-        The path to the directory where the aggregated heart rate estimates will be saved.
+        The path to the directory where the aggregated pulse rate estimates will be saved.
     aggregates : List[str]
-        The list of aggregation methods to be used for the heart rate estimates. The default is ['mode', '99p'].
+        The list of aggregation methods to be used for the pulse rate estimates. The default is ['mode', '99p'].
     """
-    # Load the heart rate estimates
+    # Load the pulse rate estimates
     with open(full_path_to_input, 'r') as f:
-        df_hr = json.load(f)
+        df_pr = json.load(f)
-    # Aggregate the heart rate estimates
-    hr_values = df_hr['heart_rate'].values
-    df_hr_aggregates = aggregate_heart_rate(hr_values, aggregates)
+    # Aggregate the pulse rate estimates
+    pr_values = df_pr['pulse_rate'].values
+    df_pr_aggregates = aggregate_pulse_rate(pr_values, aggregates)
-    # Save the aggregated heart rate estimates
+    # Save the aggregated pulse rate estimates
     with open(full_path_to_output, 'w') as json_file:
-        json.dump(df_hr_aggregates, json_file, indent=4)
+        json.dump(df_pr_aggregates, json_file, indent=4)

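`extract_arm_activity_features_io` now does two things itself before calling `extract_arm_activity_features`. It merges the window-level gait probabilities onto the sample timestamps, then keeps only samples predicted as gait. As a result, `extract_arm_activity_features` no longer takes `df_timestamps`, `df_predictions` or `threshold`. The thresholding and filtering step can be sketched as below. The column names are hypothetical stand-ins for `DataColumns.PRED_GAIT_PROBA` and `DataColumns.PRED_GAIT`, and the sketch does not reproduce `merge_predictions_with_timestamps`.

```python
import pandas as pd

def keep_predicted_gait(df: pd.DataFrame, threshold: float,
                        proba_col: str = 'pred_gait_proba',
                        label_col: str = 'pred_gait') -> pd.DataFrame:
    df = df.copy()
    # Binary gait label from the fitted threshold (probability >= threshold -> 1)
    df[label_col] = (df[proba_col] >= threshold).astype(int)
    # Keep only predicted gait and restore a clean 0..n-1 index
    return df.loc[df[label_col] == 1].reset_index(drop=True)

df = pd.DataFrame({'time': [0.0, 0.01, 0.02], 'pred_gait_proba': [0.2, 0.5, 0.9]})
gait_only = keep_predicted_gait(df, threshold=0.5)
```

A sample whose probability equals the threshold counts as gait, because the comparison is `>=`.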
{paradigma-0.4.7 → paradigma-1.0.1}/src/paradigma/util.py RENAMED

@@ -1,9 +1,7 @@
-import json
 import os
 import numpy as np
 import pandas as pd
-from pathlib import Path
-from datetime import timedelta
+from datetime import datetime, timedelta
 from dateutil import parser
 from typing import List, Tuple
@@ -432,3 +430,61 @@ def merge_predictions_with_timestamps(
     df_ts = df_ts.dropna(subset=[pred_proba_colname])
     return df_ts
+def select_hours(df: pd.DataFrame, select_hours_start: str, select_hours_end: str) -> pd.DataFrame:
+    """
+    Select hours of interest from the data to include in the aggregation step.
+    Parameters
+    ----------
+    df : pd.DataFrame
+        Input data.
+    select_hours_start: str
+        The start time of the selected hours in "HH:MM" format.
+    select_hours_end: str
+        The end time of the selected hours in "HH:MM" format.
+    Returns
+    -------
+    pd.DataFrame
+        The selected data.
+    """
+    select_hours_start = datetime.strptime(select_hours_start, '%H:%M').time() # convert to time object
+    select_hours_end = datetime.strptime(select_hours_end, '%H:%M').time()
+    df_subset = df[df['time_dt'].dt.time.between(select_hours_start, select_hours_end)] # select the hours of interest
+    return df_subset
+def select_days(df: pd.DataFrame, min_hours_per_day: int) -> pd.DataFrame:
+    """
+    Select days of interest from the data to include in the aggregation step.
+    Parameters
+    ----------
+    df : pd.DataFrame
+        Input data with column 'time_dt' in which the date is stored.
+    min_hours_per_day: int
+        The minimum number of hours per day required for including the day in the aggregation step.
+    Returns
+    -------
+    pd.DataFrame
+        The selected data.
+    """
+    min_s_per_day = min_hours_per_day * 3600
+    window_length_s = df['time_dt'].diff().dt.total_seconds()[1] # determine the length of the first window in seconds
+    min_windows_per_day = min_s_per_day / window_length_s
+    df_subset = df.groupby(df['time_dt'].dt.date).filter(lambda x: len(x) >= min_windows_per_day)
+    return df_subset
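`select_hours` and `select_days` restrict which windows enter aggregation. The sketch below mirrors them, with one deliberate difference: here the window length is passed in. The packaged `select_days` instead infers it from `diff().dt.total_seconds()[1]`. That label lookup assumes the index still contains `1` and that the first two windows are adjacent. The function names are hypothetical.

```python
import datetime as dt
import pandas as pd

def select_hours_sketch(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    t_start = dt.datetime.strptime(start, '%H:%M').time()
    t_end = dt.datetime.strptime(end, '%H:%M').time()
    # between() is inclusive on both ends; a window that wraps midnight
    # (start later than end) would select nothing
    return df[df['time_dt'].dt.time.between(t_start, t_end)]

def select_days_sketch(df: pd.DataFrame, min_hours_per_day: float, window_length_s: float) -> pd.DataFrame:
    # Number of windows needed to cover the minimum wear time on one day
    min_windows_per_day = min_hours_per_day * 3600 / window_length_s
    return df.groupby(df['time_dt'].dt.date).filter(lambda day: len(day) >= min_windows_per_day)

df = pd.DataFrame({'time_dt': pd.to_datetime([
    '2024-01-01 08:00', '2024-01-01 09:00', '2024-01-01 10:00', '2024-01-02 08:00'])})
by_hours = select_hours_sketch(df, '08:30', '10:00')
by_days = select_days_sketch(df, min_hours_per_day=2, window_length_s=3600)
```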