paradigma 0.4.7.tar.gz → 1.0.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {paradigma-0.4.7 → paradigma-1.0.0}/PKG-INFO +6 -4
- {paradigma-0.4.7 → paradigma-1.0.0}/README.md +5 -3
- {paradigma-0.4.7 → paradigma-1.0.0}/pyproject.toml +1 -1
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/feature_extraction.py +4 -18
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/pipelines/gait_pipeline.py +66 -76
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/pipelines/tremor_pipeline.py +4 -3
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/preprocessing.py +30 -7
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/segmenting.py +4 -1
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/testing.py +18 -4
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/util.py +59 -3
- {paradigma-0.4.7 → paradigma-1.0.0}/LICENSE +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/__init__.py +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/assets/gait_detection_clf_package.pkl +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/assets/gait_filtering_clf_package.pkl +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/assets/ppg_quality_clf_package.pkl +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/assets/tremor_detection_clf_package.pkl +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/classification.py +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/config.py +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/constants.py +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/pipelines/__init__.py +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/pipelines/heart_rate_pipeline.py +0 -0
- {paradigma-0.4.7 → paradigma-1.0.0}/src/paradigma/pipelines/heart_rate_utils.py +0 -0
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: paradigma
-Version: 0.4.7
+Version: 1.0.0
 Summary: ParaDigMa - A toolbox for deriving Parkinson's disease Digital Markers from real-life wrist sensor data
 License: Apache-2.0
 Author: Erik Post
@@ -95,7 +95,7 @@ The ParaDigMa toolbox is designed for the analysis of passive monitoring data co
 Specific requirements include:
 | Pipeline | Sensor Configuration | Context of Use |
 |------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
-| **All** | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait). <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
+| **All** | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait). <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). <br> - Timeframe: contiguous, strictly increasing timestamps. | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
 | **Arm swing during gait** | - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. <br> - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Population: no walking aid, no severe dyskinesia in the watch-sided arm. <br> - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm), and at least 2 minutes of arm swing. |
 | **Tremor** | - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm). |
 | **Pulse rate** | - PPG*: minimum sampling rate of 30 Hz, green LED. <br> - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. | - Population: no rhythm disorders (e.g. atrial fibrillation, atrial flutter). <br> - Compliance: for weekly measures: minimum average of 12 hours of data per day. |
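The new "Timeframe" requirement is enforced programmatically later in this diff (see the `resample_data` changes in `preprocessing.py`). As a rough sketch of what "contiguous, strictly increasing timestamps" means in practice — the helper name, the `time` column name, and the call at the end are illustrative, not part of the package API:

```python
import numpy as np
import pandas as pd

def check_timeframe(df: pd.DataFrame, fs: float, time_colname: str = "time") -> None:
    """Hypothetical pre-flight check mirroring the checks 1.0.0 adds internally."""
    diffs = np.diff(df[time_colname].to_numpy())
    if not np.all(diffs > 0):
        raise ValueError("Time array is not strictly increasing")
    # Tolerance of twice the expected interval, matching the new default in resample_data
    if np.any(np.abs(diffs - 1 / fs) > 2 / fs):
        raise ValueError("Time array is not contiguous")

check_timeframe(pd.DataFrame({"time": np.arange(0, 1, 0.01)}), fs=100)
```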
@@ -111,8 +111,10 @@ We have included support for [TSDF](https://biomarkersparkinson.github.io/tsdf/)
 
 ## Scientific validation
 
-The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/)
-
+The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/) and the Personalized Parkinson Project [[Bloem et al. 2019]](https://pubmed.ncbi.nlm.nih.gov/31315608/). The following publication contains the details and validation of the arm swing during gait pipeline:
+* [Post, E. et al. - Quantifying arm swing in Parkinson's disease: a method account for arm activities during free-living gait](https://doi.org/10.1186/s12984-025-01578-z)
+
+Details and validation of the other pipelines shall be shared in upcoming scientific publications.
 
 ## Contributing
--- a/README.md
+++ b/README.md
@@ -75,7 +75,7 @@ The ParaDigMa toolbox is designed for the analysis of passive monitoring data co
 Specific requirements include:
 | Pipeline | Sensor Configuration | Context of Use |
 |------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
-| **All** | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait). <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
+| **All** | - Sensor position: wrist-band on most or least affected side (validated for both, but different sensitivity for measuring disease progression for tremor and arm swing during gait). <br> - Sensor orientation: orientation as described in [Coordinate System](https://biomarkersparkinson.github.io/paradigma/guides/coordinate_system.html). <br> - Timeframe: contiguous, strictly increasing timestamps. | - Population: persons with PD. <br> - Data collection protocol: passive monitoring in daily life. |
 | **Arm swing during gait** | - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. <br> - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Population: no walking aid, no severe dyskinesia in the watch-sided arm. <br> - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm), and at least 2 minutes of arm swing. |
 | **Tremor** | - Gyroscope: minimum sampling rate of 100 Hz, minimum range of ± 1000 degrees/sec. | - Compliance: for weekly measures: at least three compliant days (with ≥10 hours of data between 8 am and 10 pm). |
 | **Pulse rate** | - PPG*: minimum sampling rate of 30 Hz, green LED. <br> - Accelerometer: minimum sampling rate of 100 Hz, minimum range of ± 4 g. | - Population: no rhythm disorders (e.g. atrial fibrillation, atrial flutter). <br> - Compliance: for weekly measures: minimum average of 12 hours of data per day. |
@@ -91,8 +91,10 @@ We have included support for [TSDF](https://biomarkersparkinson.github.io/tsdf/)
 
 ## Scientific validation
 
-The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/)
-
+The pipelines were developed and validated using data from the Parkinson@Home Validation study [[Evers et al. 2020]](https://pmc.ncbi.nlm.nih.gov/articles/PMC7584982/) and the Personalized Parkinson Project [[Bloem et al. 2019]](https://pubmed.ncbi.nlm.nih.gov/31315608/). The following publication contains the details and validation of the arm swing during gait pipeline:
+* [Post, E. et al. - Quantifying arm swing in Parkinson's disease: a method account for arm activities during free-living gait](https://doi.org/10.1186/s12984-025-01578-z)
+
+Details and validation of the other pipelines shall be shared in upcoming scientific publications.
 
 ## Contributing
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "paradigma"
-version = "0.4.7"
+version = "1.0.0"
 description = "ParaDigMa - A toolbox for deriving Parkinson's disease Digital Markers from real-life wrist sensor data"
 authors = [ "Erik Post <erik.post@radboudumc.nl>",
     "Kars Veldkamp <kars.veldkamp@radboudumc.nl>",
--- a/src/paradigma/feature_extraction.py
+++ b/src/paradigma/feature_extraction.py
@@ -597,11 +597,9 @@ def pca_transform_gyroscope(
     df: pd.DataFrame,
     y_gyro_colname: str,
     z_gyro_colname: str,
-    pred_colname: str | None = None,
 ) -> np.ndarray:
     """
-    Perform principal component analysis (PCA) on gyroscope data to estimate velocity.
-    the PCA is fitted on the predicted gait data. Otherwise, the PCA is fitted on the entire dataset.
+    Perform principal component analysis (PCA) on gyroscope data to estimate velocity.
 
     Parameters
     ----------
@@ -611,8 +609,6 @@ def pca_transform_gyroscope(
         The column name for the y-axis gyroscope data.
     z_gyro_colname : str
         The column name for the z-axis gyroscope data.
-    pred_colname : str, optional
-        The column name for the predicted gait (default: None).
 
     Returns
     -------
@@ -623,19 +619,9 @@ def pca_transform_gyroscope(
     y_gyro_array = df[y_gyro_colname].to_numpy()
     z_gyro_array = df[z_gyro_colname].to_numpy()
 
-    #
-
-
-        y_gyro_fit_array = y_gyro_array[pred_mask]
-        z_gyro_fit_array = z_gyro_array[pred_mask]
-
-        # Fit PCA on predicted gait data
-        fit_data = np.column_stack((y_gyro_fit_array, z_gyro_fit_array))
-        full_data = np.column_stack((y_gyro_array, z_gyro_array))
-    else:
-        # Fit PCA on entire dataset
-        fit_data = np.column_stack((y_gyro_array, z_gyro_array))
-        full_data = fit_data
+    # Fit PCA
+    fit_data = np.column_stack((y_gyro_array, z_gyro_array))
+    full_data = fit_data
 
     pca = PCA(n_components=2, svd_solver='auto', random_state=22)
     pca.fit(fit_data)
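For readers unfamiliar with this step: the function stacks the y- and z-axis gyroscope signals, fits a two-component PCA, and uses the result as a velocity estimate. A minimal sketch of the simplified 1.0.0 behavior on synthetic signals — the final transform/first-component step is an assumption based on the visible `pca.fit(fit_data)` call, since the remainder of the function body is not shown in this hunk:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic y/z gyroscope traces (deg/s); the real input comes from df[y_gyro_colname] etc.
rng = np.random.default_rng(22)
t = np.arange(0, 10, 0.01)
y_gyro_array = 80 * np.sin(2 * np.pi * t) + rng.normal(0, 5, t.size)
z_gyro_array = 40 * np.sin(2 * np.pi * t) + rng.normal(0, 5, t.size)

# As in 1.0.0: fit on the full stacked data, with no prediction mask
fit_data = np.column_stack((y_gyro_array, z_gyro_array))
full_data = fit_data

pca = PCA(n_components=2, svd_solver='auto', random_state=22)
pca.fit(fit_data)

# Assumed: the first principal component serves as the angular-velocity estimate
velocity = pca.transform(full_data)[:, 0]
print(velocity.shape)  # (1000,)
```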
--- a/src/paradigma/pipelines/gait_pipeline.py
+++ b/src/paradigma/pipelines/gait_pipeline.py
@@ -1,20 +1,17 @@
 import numpy as np
-import os
 import pandas as pd
-from pathlib import Path
 from scipy.signal import periodogram
 from typing import List, Tuple
-import tsdf
 
 from paradigma.classification import ClassifierPackage
-from paradigma.constants import DataColumns
+from paradigma.constants import DataColumns
 from paradigma.config import GaitConfig
 from paradigma.feature_extraction import pca_transform_gyroscope, compute_angle, remove_moving_average_angle, \
     extract_angle_extremes, compute_range_of_motion, compute_peak_angular_velocity, compute_statistics, \
     compute_std_euclidean_norm, compute_power_in_bandwidth, compute_dominant_frequency, compute_mfccs, \
     compute_total_power
 from paradigma.segmenting import tabulate_windows, create_segments, discard_segments, categorize_segments, WindowedDataExtractor
-from paradigma.util import aggregate_parameter
+from paradigma.util import aggregate_parameter
@@ -160,66 +157,35 @@ def detect_gait(
 
 
 def extract_arm_activity_features(
+    df: pd.DataFrame,
     config: GaitConfig,
-    df_timestamps: pd.DataFrame,
-    df_predictions: pd.DataFrame,
-    threshold: float
 ) -> pd.DataFrame:
     """
     Extract features related to arm activity from a time-series DataFrame.
 
     This function processes a DataFrame containing accelerometer, gravity, and gyroscope signals,
     and extracts features related to arm activity by performing the following steps:
-    1.
-    2.
-    3.
-    4.
-    5.
-    6.
-    7. Extracts angle-related features, temporal domain features, and spectral domain features.
+    1. Computes the angle and velocity from gyroscope data.
+    2. Filters the data to include only predicted gait segments.
+    3. Groups the data into segments based on consecutive timestamps and pre-specified gaps.
+    4. Removes segments that do not meet predefined criteria.
+    5. Creates fixed-length windows from the time series data.
+    6. Extracts angle-related features, temporal domain features, and spectral domain features.
 
     Parameters
     ----------
-
-
-
-    df_timestamps : pd.DataFrame
-        A DataFrame containing the raw sensor data, including accelerometer, gravity, and gyroscope columns.
-
-    df_predictions : pd.DataFrame
-        A DataFrame containing the predicted probabilities for gait activity per window.
+    df: pd.DataFrame
+        The input DataFrame containing accelerometer, gravity, and gyroscope data of predicted gait.
 
     config : ArmActivityFeatureExtractionConfig
         Configuration object containing column names and parameters for feature extraction.
 
-    path_to_classifier_input : str | Path
-        The path to the directory containing the classifier files and other necessary input files for feature extraction.
-
     Returns
     -------
     pd.DataFrame
         A DataFrame containing the extracted arm activity features, including angle, velocity,
         temporal, and spectral features.
     """
-    if not any(df_predictions[DataColumns.PRED_GAIT_PROBA] >= threshold):
-        raise ValueError("No gait detected in the input data.")
-
-    # Merge gait predictions with timestamps
-    gait_preprocessing_config = GaitConfig(step='gait')
-    df = merge_predictions_with_timestamps(
-        df_ts=df_timestamps,
-        df_predictions=df_predictions,
-        pred_proba_colname=DataColumns.PRED_GAIT_PROBA,
-        window_length_s=gait_preprocessing_config.window_length_s,
-        fs=gait_preprocessing_config.sampling_frequency
-    )
-
-    # Add a column for predicted gait based on a fitted threshold
-    df[DataColumns.PRED_GAIT] = (df[DataColumns.PRED_GAIT_PROBA] >= threshold).astype(int)
-
-    # Filter the DataFrame to only include predicted gait (1)
-    df = df.loc[df[DataColumns.PRED_GAIT]==1].reset_index(drop=True)
-
     # Group consecutive timestamps into segments, with new segments starting after a pre-specified gap
     df[DataColumns.SEGMENT_NR] = create_segments(
         time_array=df[DataColumns.TIME],
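Net effect of this hunk: `extract_arm_activity_features` no longer merges predictions or applies the gait threshold itself; callers now pass in a DataFrame already restricted to predicted gait. A sketch of the caller-side preparation, mirroring the code added to `extract_arm_activity_features_io` further down in this diff (it assumes `df_ts`, `df_pred_gait`, and `clf_package` are already loaded):

```python
from paradigma.config import GaitConfig
from paradigma.constants import DataColumns
from paradigma.pipelines.gait_pipeline import extract_arm_activity_features
from paradigma.util import merge_predictions_with_timestamps

# Merge window-level gait probabilities onto the raw timestamps
gait_config = GaitConfig(step='gait')
df = merge_predictions_with_timestamps(
    df_ts=df_ts,
    df_predictions=df_pred_gait,
    pred_proba_colname=DataColumns.PRED_GAIT_PROBA,
    window_length_s=gait_config.window_length_s,
    fs=gait_config.sampling_frequency,
)

# Threshold and keep only predicted gait before calling the feature extractor
df[DataColumns.PRED_GAIT] = (df[DataColumns.PRED_GAIT_PROBA] >= clf_package.threshold).astype(int)
df = df.loc[df[DataColumns.PRED_GAIT] == 1].reset_index(drop=True)

df_features = extract_arm_activity_features(df=df, config=GaitConfig(step='arm_activity'))
```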
@@ -315,8 +281,8 @@ def filter_gait(
     ----------
     df : pd.DataFrame
         The input DataFrame containing features extracted from gait data.
-
-        The
+    clf_package: ClassifierPackage
+        The pre-trained classifier package containing the classifier, threshold, and scaler.
     parallel : bool, optional, default=False
         If `True`, enables parallel processing.
@@ -351,10 +317,10 @@ def filter_gait(
 
 def quantify_arm_swing(
     df: pd.DataFrame,
-    max_segment_gap_s: float,
-    min_segment_length_s: float,
     fs: int,
     filtered: bool = False,
+    max_segment_gap_s: float = 1.5,
+    min_segment_length_s: float = 1.5
 ) -> Tuple[dict[str, pd.DataFrame], dict]:
     """
     Quantify arm swing parameters for segments of motion based on gyroscope data.
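A minimal call under the new signature — the segment parameters can now be omitted (the `df_gait` frame and the 100 Hz rate below are illustrative):

```python
from paradigma.pipelines.gait_pipeline import quantify_arm_swing

# Segment parameters now default to 1.5 s; override them only when needed
df_params, segment_meta = quantify_arm_swing(
    df=df_gait,      # hypothetical DataFrame of predicted gait timestamps
    fs=100,
    filtered=True,   # requires the predicted-no-other-arm-activity column
)
```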
@@ -362,28 +328,27 @@ def quantify_arm_swing(
     Parameters
     ----------
     df : pd.DataFrame
-        A DataFrame containing the raw sensor data
+        A DataFrame containing the raw sensor data of predicted gait timestamps. Should include a column
         for predicted no other arm activity based on a fitted threshold if filtered is True.
 
-    max_segment_gap_s : float
-        The maximum gap allowed between segments.
-
-    min_segment_length_s : float
-        The minimum length required for a segment to be considered valid.
-
     fs : int
         The sampling frequency of the sensor data.
 
     filtered : bool, optional, default=True
         If `True`, the gyroscope data is filtered to only include predicted no other arm activity.
 
+    max_segment_gap_s : float, optional, default=1.5
+        The maximum gap in seconds between consecutive timestamps to group them into segments.
+
+    min_segment_length_s : float, optional, default=1.5
+        The minimum length in seconds for a segment to be considered valid.
+
     Returns
     -------
     Tuple[pd.DataFrame, dict]
         A tuple containing a dataframe with quantified arm swing parameters and a dictionary containing
         metadata for each segment.
     """
-
     # Group consecutive timestamps into segments, with new segments starting after a pre-specified gap.
     # Segments are made based on predicted gait
     df[DataColumns.SEGMENT_NR] = create_segments(
@@ -391,6 +356,10 @@ def quantify_arm_swing(
         max_segment_gap_s=max_segment_gap_s
     )
 
+    # Segment category is determined based on predicted gait, hence it is set
+    # before filtering the DataFrame to only include predicted no other arm activity
+    df[DataColumns.SEGMENT_CAT] = categorize_segments(df=df, fs=fs)
+
     # Remove segments that do not meet predetermined criteria
     df = discard_segments(
         df=df,
@@ -401,40 +370,51 @@ def quantify_arm_swing(
     )
 
     if df.empty:
-        raise ValueError("No segments found in the input data.")
+        raise ValueError("No segments found in the input data after discarding segments of invalid shape.")
 
     # If no arm swing data is remaining, return an empty dictionary
     if filtered and df.loc[df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY]==1].empty:
         raise ValueError("No gait without other arm activities to quantify.")
-
-    df[DataColumns.SEGMENT_CAT] = categorize_segments(df=df, fs=fs)
-
-    # Group and process segments
-    arm_swing_quantified = []
-    segment_meta = {}
-
-    if filtered:
+    elif filtered:
         # Filter the DataFrame to only include predicted no other arm activity (1)
         df = df.loc[df[DataColumns.PRED_NO_OTHER_ARM_ACTIVITY]==1].reset_index(drop=True)
 
-        # Group consecutive timestamps into segments
-        # Now segments are based on predicted gait without other arm activity for subsequent processes
+        # Group consecutive timestamps into segments of filtered gait
         df[DataColumns.SEGMENT_NR] = create_segments(
             time_array=df[DataColumns.TIME],
             max_segment_gap_s=max_segment_gap_s
         )
 
-
-
-
+        # Remove segments that do not meet predetermined criteria
+        df = discard_segments(
+            df=df,
+            segment_nr_colname=DataColumns.SEGMENT_NR,
+            min_segment_length_s=min_segment_length_s,
+            fs=fs,
+        )
+
+        if df.empty:
+            raise ValueError("No filtered gait segments found in the input data after discarding segments of invalid shape.")
+
+    arm_swing_quantified = []
+    segment_meta = {
+        'aggregated': {
+            'all': {
+                'duration_s': len(df[DataColumns.TIME]) / fs
+            },
+        },
+        'per_segment': {}
+    }
 
+    # PCA is fitted on only predicted gait without other arm activity if filtered, otherwise
+    # it is fitted on the entire gyroscope data
     df[DataColumns.VELOCITY] = pca_transform_gyroscope(
         df=df,
         y_gyro_colname=DataColumns.GYROSCOPE_Y,
         z_gyro_colname=DataColumns.GYROSCOPE_Z,
-        pred_colname=pred_colname_pca
     )
 
+    # Group and process segments
     for segment_nr, group in df.groupby(DataColumns.SEGMENT_NR, sort=False):
         segment_cat = group[DataColumns.SEGMENT_CAT].iloc[0]
         time_array = group[DataColumns.TIME].to_numpy()
@@ -452,8 +432,10 @@ def quantify_arm_swing(
             fs=fs,
         )
 
-        segment_meta[segment_nr] = {
-            '
+        segment_meta['per_segment'][segment_nr] = {
+            'start_time_s': time_array.min(),
+            'end_time_s': time_array.max(),
+            'duration_s': len(angle_array) / fs,
             DataColumns.SEGMENT_CAT: segment_cat
         }
@@ -487,12 +469,20 @@ def quantify_arm_swing(
 
         df_params_segment = pd.DataFrame({
             DataColumns.SEGMENT_NR: segment_nr,
+            DataColumns.SEGMENT_CAT: segment_cat,
             DataColumns.RANGE_OF_MOTION: rom,
             DataColumns.PEAK_VELOCITY: pav
         })
 
         arm_swing_quantified.append(df_params_segment)
 
+    # Combine segment categories
+    segment_categories = set([segment_meta['per_segment'][x][DataColumns.SEGMENT_CAT] for x in segment_meta['per_segment'].keys()])
+    for segment_cat in segment_categories:
+        segment_meta['aggregated'][segment_cat] = {
+            'duration_s': sum([segment_meta['per_segment'][x]['duration_s'] for x in segment_meta['per_segment'].keys() if segment_meta['per_segment'][x][DataColumns.SEGMENT_CAT] == segment_cat])
+        }
+
     arm_swing_quantified = pd.concat(arm_swing_quantified, ignore_index=True)
 
     return arm_swing_quantified, segment_meta
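Taken together with the earlier `segment_meta` initialization, the metadata returned by `quantify_arm_swing` in 1.0.0 has roughly this shape. All values and the category label below are illustrative; the actual labels come from `categorize_segments`, and the per-segment category key is the literal value of `DataColumns.SEGMENT_CAT`:

```python
segment_meta = {
    'aggregated': {
        'all': {'duration_s': 512.3},            # total duration across all segments
        'example_category': {'duration_s': 311.2},  # one entry per observed category
    },
    'per_segment': {
        1: {
            'start_time_s': 12.0,
            'end_time_s': 19.5,
            'duration_s': 7.5,
            'segment_category': 'example_category',  # keyed by DataColumns.SEGMENT_CAT
        },
    },
}
```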
@@ -527,7 +517,7 @@ def aggregate_arm_swing_params(df_arm_swing_params: pd.DataFrame, segment_meta:
         cat_segments = [x for x in segment_meta.keys() if segment_meta[x][DataColumns.SEGMENT_CAT] == segment_cat]
 
         aggregated_results[segment_cat] = {
-            '
+            'duration_s': sum([segment_meta[x]['duration_s'] for x in cat_segments])
         }
 
         df_arm_swing_params_cat = df_arm_swing_params[df_arm_swing_params[DataColumns.SEGMENT_NR].isin(cat_segments)]
@@ -537,7 +527,7 @@ def aggregate_arm_swing_params(df_arm_swing_params: pd.DataFrame, segment_meta:
             aggregated_results[segment_cat][f'{aggregate}_{arm_swing_parameter}'] = aggregate_parameter(df_arm_swing_params_cat[arm_swing_parameter], aggregate)
 
     aggregated_results['all_segment_categories'] = {
-        '
+        'duration_s': sum([segment_meta[x]['duration_s'] for x in segment_meta.keys()])
     }
 
     for arm_swing_parameter in arm_swing_parameters:
--- a/src/paradigma/pipelines/tremor_pipeline.py
+++ b/src/paradigma/pipelines/tremor_pipeline.py
@@ -163,8 +163,8 @@ def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     Returns
     -------
     dict
-        A dictionary with the aggregated tremor time and tremor power measures, as well as the
-
+        A dictionary with the aggregated tremor time and tremor power measures, as well as the number of valid days,
+        the total number of windows, and the number of windows at rest available in the input dataframe.
 
     Notes
     -----
@@ -173,7 +173,7 @@ def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     - The modal tremor power is computed based on gaussian kernel density estimation.
 
     """
-
+    nr_valid_days = df['time_dt'].dt.date.unique().size # number of valid days in the input dataframe
     nr_windows_total = df.shape[0] # number of windows in the input dataframe
 
     # remove windows with detected non-tremor arm movements to control for the amount of arm activities performed
@@ -216,6 +216,7 @@ def aggregate_tremor(df: pd.DataFrame, config: TremorConfig):
     # store aggregates in json format
     d_aggregates = {
         'metadata': {
+            'nr_valid_days': nr_valid_days,
             'nr_windows_total': nr_windows_total,
             'nr_windows_rest': nr_windows_rest
         },
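The `nr_valid_days` computation counts distinct calendar dates in the window-level `time_dt` column. A minimal illustration on synthetic data:

```python
import pandas as pd

# Three windows spread over two calendar days
df = pd.DataFrame({
    'time_dt': pd.to_datetime([
        '2024-01-01 09:00', '2024-01-01 10:00', '2024-01-02 09:00',
    ]),
})
nr_valid_days = df['time_dt'].dt.date.unique().size
print(nr_valid_days)  # 2
```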
--- a/src/paradigma/preprocessing.py
+++ b/src/paradigma/preprocessing.py
@@ -17,7 +17,9 @@ def resample_data(
     df: pd.DataFrame,
     time_column : str,
     values_column_names: List[str],
+    sampling_frequency: int,
     resampling_frequency: int,
+    tolerance: float | None = None
 ) -> pd.DataFrame:
     """
     Resamples sensor data to a specified frequency using cubic interpolation.
@@ -30,8 +32,14 @@ def resample_data(
         The name of the column containing the time data.
     values_column_names : List[str]
         A list of column names that should be resampled.
+    sampling_frequency : int
+        The original sampling frequency of the data (in Hz).
     resampling_frequency : int
         The frequency to which the data should be resampled (in Hz).
+    tolerance : float, optional
+        The tolerance added to the expected difference when checking
+        for contiguous timestamps. If not provided, it defaults to
+        twice the expected interval.
 
     Returns
     -------
@@ -46,23 +54,35 @@ def resample_data(
 
     Notes
     -----
-
-
+    - Uses cubic interpolation for smooth resampling if there are enough points.
+    - If only two timestamps are available, it falls back to linear interpolation.
     """
+    # Set default tolerance if not provided to twice the expected interval
+    if tolerance is None:
+        tolerance = 2 * 1 / sampling_frequency
 
-    # Extract time and values
+    # Extract time and values
     time_abs_array = np.array(df[time_column])
     values_array = np.array(df[values_column_names])
 
     # Ensure the time array is strictly increasing
     if not np.all(np.diff(time_abs_array) > 0):
-        raise ValueError("
+        raise ValueError("Time array is not strictly increasing")
+
+    # Ensure the time array is contiguous
+    expected_interval = 1 / sampling_frequency
+    timestamp_diffs = np.diff(time_abs_array)
+    if np.any(np.abs(timestamp_diffs - expected_interval) > tolerance):
+        raise ValueError("Time array is not contiguous")
 
     # Resample the time data using the specified frequency
     t_resampled = np.arange(time_abs_array[0], time_abs_array[-1], 1 / resampling_frequency)
 
-    #
-
+    # Choose interpolation method
+    interpolation_kind = "cubic" if len(time_abs_array) > 3 else "linear"
+    interpolator = interp1d(time_abs_array, values_array, axis=0, kind=interpolation_kind, fill_value="extrapolate")
+
+    # Interpolate
     resampled_values = interpolator(t_resampled)
 
     # Create a DataFrame with the resampled data
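A usage sketch of the updated function, assuming it is importable from `paradigma.preprocessing` as the file path suggests; the DataFrame and the `accelerometer_x` column are synthetic:

```python
import numpy as np
import pandas as pd
from paradigma.preprocessing import resample_data

# Hypothetical 50 Hz accelerometer data, resampled to 100 Hz
fs_original = 50
df = pd.DataFrame({
    'time': np.arange(0, 2, 1 / fs_original),  # contiguous, strictly increasing
    'accelerometer_x': np.random.default_rng(0).normal(size=100),
})
df_resampled = resample_data(
    df=df,
    time_column='time',
    values_column_names=['accelerometer_x'],
    sampling_frequency=fs_original,   # new required argument in 1.0.0
    resampling_frequency=100,
    tolerance=None,                   # defaults to twice the expected interval
)
```

If the timestamps have gaps larger than the tolerance, the function now raises `ValueError("Time array is not contiguous")` instead of silently interpolating across the gap.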
@@ -186,7 +206,8 @@ def preprocess_imu_data(df: pd.DataFrame, config: IMUConfig, sensor: str, watch_
     df = resample_data(
         df=df,
         time_column=DataColumns.TIME,
-        values_column_names
+        values_column_names=values_colnames,
+        sampling_frequency=config.sampling_frequency,
         resampling_frequency=config.sampling_frequency
     )
@@ -259,6 +280,7 @@ def preprocess_ppg_data(df_ppg: pd.DataFrame, df_acc: pd.DataFrame, ppg_config:
         df=df_acc_overlapping,
         time_column=DataColumns.TIME,
         values_column_names = list(imu_config.d_channels_accelerometer.keys()),
+        sampling_frequency=imu_config.sampling_frequency,
         resampling_frequency=imu_config.sampling_frequency
     )
@@ -267,6 +289,7 @@ def preprocess_ppg_data(df_ppg: pd.DataFrame, df_acc: pd.DataFrame, ppg_config:
         df=df_ppg_overlapping,
         time_column=DataColumns.TIME,
         values_column_names = list(ppg_config.d_channels_ppg.keys()),
+        sampling_frequency=ppg_config.sampling_frequency,
         resampling_frequency=ppg_config.sampling_frequency
     )
--- a/src/paradigma/segmenting.py
+++ b/src/paradigma/segmenting.py
@@ -168,7 +168,7 @@ def create_segments(
     gap_exceeds = time_diff > max_segment_gap_s
 
     # Create the segment number based on the cumulative sum of the gap_exceeds mask
-    segments = gap_exceeds.cumsum()
+    segments = gap_exceeds.cumsum()
 
     return segments
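The cumulative-sum trick used here is worth spelling out: each timestamp gap larger than `max_segment_gap_s` contributes a `True` that bumps the running segment counter for all subsequent rows. A standalone demonstration with a toy time series:

```python
import pandas as pd

# Gap-based segmentation via cumulative sum, as in create_segments
time = pd.Series([0.0, 0.01, 0.02, 5.0, 5.01, 12.0])
max_segment_gap_s = 1.5

time_diff = time.diff()                       # first element is NaN (compares False)
gap_exceeds = time_diff > max_segment_gap_s   # True where a new segment starts
segments = gap_exceeds.cumsum()
print(segments.tolist())  # [0, 0, 0, 1, 1, 2]
```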
@@ -236,6 +236,9 @@ def discard_segments(
 
     df = df[valid_segment_mask].copy()
 
+    if df.empty:
+        raise ValueError("All segments were removed.")
+
     # Reset segment numbers in a single step
     unique_segments = pd.factorize(df[segment_nr_colname])[0] + 1
     df[segment_nr_colname] = unique_segments
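Callers that previously checked for an empty frame after `discard_segments` can now catch the error instead. A hedged sketch, assuming a pre-segmented `df` and the keyword arguments visible in the `gait_pipeline.py` hunks above:

```python
from paradigma.constants import DataColumns
from paradigma.segmenting import discard_segments

# 1.0.0 raises instead of silently returning an empty frame
try:
    df = discard_segments(
        df=df,                                    # assumed pre-segmented DataFrame
        segment_nr_colname=DataColumns.SEGMENT_NR,
        min_segment_length_s=1.5,
        fs=100,
    )
except ValueError as e:
    print(f"Skipping recording: {e}")             # e.g. "All segments were removed."
```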
--- a/src/paradigma/testing.py
+++ b/src/paradigma/testing.py
@@ -16,7 +16,7 @@ from paradigma.pipelines.tremor_pipeline import extract_tremor_features, detect_
 from paradigma.pipelines.heart_rate_pipeline import extract_signal_quality_features, signal_quality_classification, \
     aggregate_heart_rate
 from paradigma.preprocessing import preprocess_imu_data, preprocess_ppg_data
-from paradigma.util import read_metadata, write_df_data, get_end_iso8601
+from paradigma.util import read_metadata, write_df_data, get_end_iso8601, merge_predictions_with_timestamps
 
 
 def preprocess_imu_data_io(path_to_input: str | Path, path_to_output: str | Path,
@@ -208,13 +208,27 @@ def extract_arm_activity_features_io(
 
     clf_package = ClassifierPackage.load(full_path_to_classifier_package)
 
+    gait_preprocessing_config = GaitConfig(step='gait')
+
+    df = merge_predictions_with_timestamps(
+        df_ts=df_ts,
+        df_predictions=df_pred_gait,
+        pred_proba_colname=DataColumns.PRED_GAIT_PROBA,
+        window_length_s=gait_preprocessing_config.window_length_s,
+        fs=gait_preprocessing_config.sampling_frequency
+    )
+
+    # Add a column for predicted gait based on a fitted threshold
+    df[DataColumns.PRED_GAIT] = (df[DataColumns.PRED_GAIT_PROBA] >= clf_package.threshold).astype(int)
+
+    # Filter the DataFrame to only include predicted gait (1)
+    df = df.loc[df[DataColumns.PRED_GAIT]==1].reset_index(drop=True)
+
     # Extract arm activity features
     config = GaitConfig(step='arm_activity')
     df_features = extract_arm_activity_features(
+        df=df,
         config=config,
-        df_timestamps=df_ts,
-        df_predictions=df_pred_gait,
-        threshold=clf_package.threshold
     )
 
     end_iso8601 = get_end_iso8601(metadata_ts_values.start_iso8601, df_features[DataColumns.TIME][-1:].values[0] + config.window_length_s)
--- a/src/paradigma/util.py
+++ b/src/paradigma/util.py
@@ -1,9 +1,7 @@
-import json
 import os
 import numpy as np
 import pandas as pd
-from
-from datetime import timedelta
+from datetime import datetime, timedelta
 from dateutil import parser
 from typing import List, Tuple
@@ -432,3 +430,61 @@ def merge_predictions_with_timestamps(
     df_ts = df_ts.dropna(subset=[pred_proba_colname])
 
     return df_ts
+
+
+def select_hours(df: pd.DataFrame, select_hours_start: str, select_hours_end: str) -> pd.DataFrame:
+
+    """
+    Select hours of interest from the data to include in the aggregation step.
+
+    Parameters
+    ----------
+    df : pd.DataFrame
+        Input data.
+
+    select_hours_start: str
+        The start time of the selected hours in "HH:MM" format.
+
+    select_hours_end: str
+        The end time of the selected hours in "HH:MM" format.
+
+    Returns
+    -------
+    pd.DataFrame
+        The selected data.
+
+    """
+
+    select_hours_start = datetime.strptime(select_hours_start, '%H:%M').time() # convert to time object
+    select_hours_end = datetime.strptime(select_hours_end, '%H:%M').time()
+    df_subset = df[df['time_dt'].dt.time.between(select_hours_start, select_hours_end)] # select the hours of interest
+
+    return df_subset
+
+def select_days(df: pd.DataFrame, min_hours_per_day: int) -> pd.DataFrame:
+
+    """
+    Select days of interest from the data to include in the aggregation step.
+
+    Parameters
+    ----------
+    df : pd.DataFrame
+        Input data with column 'time_dt' in which the date is stored.
+
+    min_hours_per_day: int
+        The minimum number of hours per day required for including the day in the aggregation step.
+
+
+    Returns
+    -------
+    pd.DataFrame
+        The selected data.
+
+    """
+
+    min_s_per_day = min_hours_per_day * 3600
+    window_length_s = df['time_dt'].diff().dt.total_seconds()[1] # determine the length of the first window in seconds
+    min_windows_per_day = min_s_per_day / window_length_s
+    df_subset = df.groupby(df['time_dt'].dt.date).filter(lambda x: len(x) >= min_windows_per_day)
+
+    return df_subset
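A usage sketch for these two new helpers on a synthetic window-level frame. Note that `select_days` reads the window length from the gap between the first two rows by positional label, so it is applied here before `select_hours` disturbs the index; the data and thresholds are illustrative:

```python
import pandas as pd
from paradigma.util import select_days, select_hours

# Hypothetical window-level aggregates, one row per 30-minute window over 7 days
df = pd.DataFrame({'time_dt': pd.date_range('2024-01-01 00:00', periods=7 * 48, freq='30min')})

df = select_days(df, min_hours_per_day=10)   # keep days with >= 10 h of windows
df = select_hours(df, select_hours_start='08:00', select_hours_end='22:00')
print(len(df))  # 7 days x 29 half-hour windows between 08:00 and 22:00 inclusive
```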
The remaining files listed above with +0 -0 (LICENSE, `src/paradigma/__init__.py`, the four classifier package assets, `classification.py`, `config.py`, `constants.py`, `pipelines/__init__.py`, `pipelines/heart_rate_pipeline.py`, and `pipelines/heart_rate_utils.py`) are unchanged between 0.4.7 and 1.0.0.