sl-shared-assets 1.0.0rc19__py3-none-any.whl → 1.0.0rc21__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of sl-shared-assets has been flagged as possibly problematic.
Files changed (36)
  1. sl_shared_assets/__init__.py +27 -27
  2. sl_shared_assets/__init__.pyi +73 -0
  3. sl_shared_assets/cli.py +266 -40
  4. sl_shared_assets/cli.pyi +87 -0
  5. sl_shared_assets/data_classes/__init__.py +23 -20
  6. sl_shared_assets/data_classes/__init__.pyi +61 -0
  7. sl_shared_assets/data_classes/configuration_data.py +407 -26
  8. sl_shared_assets/data_classes/configuration_data.pyi +194 -0
  9. sl_shared_assets/data_classes/runtime_data.py +59 -41
  10. sl_shared_assets/data_classes/runtime_data.pyi +145 -0
  11. sl_shared_assets/data_classes/session_data.py +168 -914
  12. sl_shared_assets/data_classes/session_data.pyi +249 -0
  13. sl_shared_assets/data_classes/surgery_data.py +3 -3
  14. sl_shared_assets/data_classes/surgery_data.pyi +89 -0
  15. sl_shared_assets/server/__init__.pyi +8 -0
  16. sl_shared_assets/server/job.pyi +94 -0
  17. sl_shared_assets/server/server.pyi +95 -0
  18. sl_shared_assets/tools/__init__.py +8 -1
  19. sl_shared_assets/tools/__init__.pyi +15 -0
  20. sl_shared_assets/tools/ascension_tools.py +27 -26
  21. sl_shared_assets/tools/ascension_tools.pyi +68 -0
  22. sl_shared_assets/tools/packaging_tools.py +14 -1
  23. sl_shared_assets/tools/packaging_tools.pyi +56 -0
  24. sl_shared_assets/tools/project_management_tools.py +164 -0
  25. sl_shared_assets/tools/project_management_tools.pyi +48 -0
  26. sl_shared_assets/tools/transfer_tools.pyi +53 -0
  27. {sl_shared_assets-1.0.0rc19.dist-info → sl_shared_assets-1.0.0rc21.dist-info}/METADATA +21 -4
  28. sl_shared_assets-1.0.0rc21.dist-info/RECORD +36 -0
  29. sl_shared_assets-1.0.0rc21.dist-info/entry_points.txt +8 -0
  30. sl_shared_assets/suite2p/__init__.py +0 -8
  31. sl_shared_assets/suite2p/multi_day.py +0 -225
  32. sl_shared_assets/suite2p/single_day.py +0 -563
  33. sl_shared_assets-1.0.0rc19.dist-info/RECORD +0 -23
  34. sl_shared_assets-1.0.0rc19.dist-info/entry_points.txt +0 -4
  35. {sl_shared_assets-1.0.0rc19.dist-info → sl_shared_assets-1.0.0rc21.dist-info}/WHEEL +0 -0
  36. {sl_shared_assets-1.0.0rc19.dist-info → sl_shared_assets-1.0.0rc21.dist-info}/licenses/LICENSE +0 -0
@@ -5,13 +5,12 @@ an example for how to convert other data formats to match use the Sun lab data s
 
 from pathlib import Path
 import datetime
-import tempfile
 
 import numpy as np
-from ataraxis_base_utilities import LogLevel, console
+from ataraxis_base_utilities import LogLevel, console, ensure_directory_exists
 from ataraxis_time.time_helpers import extract_timestamp_from_bytes
 
-from ..data_classes import SessionData, ProjectConfiguration
+from ..data_classes import SessionData, ProjectConfiguration, get_system_configuration_data
 from .transfer_tools import transfer_directory
 from .packaging_tools import calculate_directory_checksum
 
@@ -170,7 +169,7 @@ def _reorganize_data(session_data: SessionData, source_root: Path) -> bool:
     return True
 
 
-def ascend_tyche_data(root_directory: Path, output_root_directory: Path, server_root_directory: Path) -> None:
+def ascend_tyche_data(root_directory: Path) -> None:
     """Reformats the old Tyche data to use the modern Sun lab layout and metadata files.
 
     This function is used to convert old Tyche data to the modern data management standard. This is used to make the
@@ -188,30 +187,24 @@ def ascend_tyche_data(root_directory: Path, output_root_directory: Path, server_
         this function for a large number of sessions will result in a long processing time due to the network data
         transfer.
 
+        Since SessionData can only be created on a PC that has a valid acquisition system config, this function will
+        only work on a machine that is part of an active Sun lab acquisition system.
+
     Args:
         root_directory: The directory that stores one or more Tyche animal folders. This can be conceptualized as the
             root directory for the Tyche project.
-        output_root_directory: The path to the local directory where to generate the converted Tyche project hierarchy.
-            Typically, this is the 'root' directory where all other Sun lab projects are stored.
-        server_root_directory: The path to the local filesystem-mounted BioHPC server storage directory. Note, this
-            directory hs to be mapped to the local filesystem via the SMB or equivalent protocol.
     """
     # Generates a (shared) project configuration file.
     project_configuration = ProjectConfiguration()
 
-    # Generates a temporary directory for NAS and Mesoscope paths. Since Tyche data is already backed up on the NAS and
-    # we are not generating new data, these root paths are not needed, but have to be created as part of the pipeline.
-    # Redirecting them to local temporary directories allows avoiding extra steps to manually remove these redundant
-    # directories after runtime.
-    temp_nas_dir = Path(tempfile.mkdtemp(prefix="nas_temp_"))
-    temp_mesoscope_dir = Path(tempfile.mkdtemp(prefix="mesoscope_temp_"))
+    # The acquisition system config resolves most paths and filesystem configuration arguments
+    acquisition_system = get_system_configuration_data()
+    output_root_directory = acquisition_system.paths.root_directory
+    server_root_directory = acquisition_system.paths.server_storage_directory
 
     # Statically defines project name and local root paths
-    project_configuration.project_name = "Tyche"
-    project_configuration.local_root_directory = output_root_directory
-    project_configuration.local_server_directory = server_root_directory
-    project_configuration.local_nas_directory = temp_nas_dir
-    project_configuration.local_mesoscope_directory = temp_mesoscope_dir
+    project_name = "Tyche"
+    project_configuration.project_name = project_name
 
     # Uses nonsensical google sheet IDs. Tyche project did not use Google Sheet processing like our modern projects do.
     project_configuration.water_log_sheet_id = "1xFh9Q2zT7pL3mVkJdR8bN6yXoE4wS5aG0cHu2Kf7D3v"
@@ -219,13 +212,14 @@ def ascend_tyche_data(root_directory: Path, output_root_directory: Path, server_
 
     # Dumps project configuration into the 'configuration' subfolder of the Tyche project.
     configuration_path = output_root_directory.joinpath("Tyche", "configuration", "project_configuration.yaml")
+    ensure_directory_exists(configuration_path)
     project_configuration.save(path=configuration_path)
 
     # Assumes that root directory stores all animal folders to be processed
     for animal_folder in root_directory.iterdir():
         # Each animal folder is named to include project name and a static animal ID, e.g.: Tyche-A7. This extracts each
         # animal ID.
-        animal_name = animal_folder.name.split(sep="-")[1]
+        animal_name = animal_folder.stem.split(sep="-")[1]
 
         # Under each animal root folder, there are day folders that use YYYY-MM-DD timestamps
         for session_folder in animal_folder.iterdir():
@@ -240,11 +234,11 @@ def ascend_tyche_data(root_directory: Path, output_root_directory: Path, server_
             # session data hierarchy using the output root. This generates a 'standard' Sun lab directory structure
             # for the Tyche data.
             session_data = SessionData.create(
+                project_name=project_configuration.project_name,
                 session_name=session_name,
                 animal_id=animal_name,
-                project_configuration=project_configuration,
-                session_type="Experiment",
-                experiment_name=None,  # Has to be none, otherwise the system tries to copy a configuration file.
+                session_type="mesoscope experiment",
+                experiment_name=None,
             )
 
             # Moves the data from the old hierarchy to the new hierarchy. If the process runs as expected, and
@@ -259,15 +253,22 @@ def ascend_tyche_data(root_directory: Path, output_root_directory: Path, server_
                 # noinspection PyTypeChecker
                 console.echo(message=message, level=LogLevel.WARNING)
             else:
-                # If the transfer process was successful, generates a new checksum for the moved data
+                # Generates the telomere.bin file to mark the session as 'complete'
+                session_data.raw_data.telomere_path.touch()
+
+                # If the local transfer process was successful, generates a new checksum for the moved data
                 calculate_directory_checksum(directory=Path(session_data.raw_data.raw_data_path))
+
                 # Next, copies the data to the BioHPC server for further processing
                 transfer_directory(
                     source=Path(session_data.raw_data.raw_data_path),
-                    destination=Path(session_data.destinations.server_raw_data_path),
+                    destination=Path(
+                        server_root_directory.joinpath(project_name, animal_name, session_name, "raw_data")
+                    ),
                     verify_integrity=False,
                 )
-                # Finally, removes the now-empty old session data directory.
+
+                # Removes the now-empty old session data directory.
                 acquisition_folder.rmdir()
 
         # If the loop above removed all acquisition folders, all data for that day has been successfully converted
@@ -0,0 +1,68 @@
+from pathlib import Path
+
+from ..data_classes import (
+    SessionData as SessionData,
+    ProjectConfiguration as ProjectConfiguration,
+    get_system_configuration_data as get_system_configuration_data,
+)
+from .transfer_tools import transfer_directory as transfer_directory
+from .packaging_tools import calculate_directory_checksum as calculate_directory_checksum
+
+def _generate_session_name(acquisition_path: Path) -> str:
+    """Generates a session name using the last modification time of a zstack.mat or MotionEstimator.me file.
+
+    This worker function uses one of the motion estimation files stored in each Tyche 'acquisition' subfolder to
+    generate a modern Sun lab timestamp-based session name. This is used to translate the original Tyche session naming
+    pattern into the pattern used by all modern Sun lab projects and pipelines.
+
+    Args:
+        acquisition_path: The absolute path to the target acquisition folder. These folders are found under the 'day'
+            folders for each animal, e.g.: Tyche-A7/2022_01_03/1.
+
+    Returns:
+        The modernized session name.
+    """
+
+def _reorganize_data(session_data: SessionData, source_root: Path) -> bool:
+    """Reorganizes and moves the session's data from the source folder in the old Tyche data hierarchy to the raw_data
+    folder in the newly created modern hierarchy.
+
+    This worker function is used to physically rearrange the data from the original Tyche data structure to the
+    new data structure. It both moves the existing files to their new destinations and renames certain files to match
+    the modern naming convention used in the Sun lab.
+
+    Args:
+        session_data: The initialized SessionData instance managing the 'ascended' (modernized) session data hierarchy.
+        source_root: The absolute path to the old Tyche data hierarchy folder that stores session's data.
+
+    Returns:
+        True if the ascension process was successfully completed. False if the process encountered missing data or
+        otherwise did not go as expected. When the method returns False, the runtime function requests user intervention
+        to finalize the process manually.
+    """
+
+def ascend_tyche_data(root_directory: Path) -> None:
+    """Reformats the old Tyche data to use the modern Sun lab layout and metadata files.
+
+    This function is used to convert old Tyche data to the modern data management standard. This is used to make the
+    data compatible with the modern Sun lab data workflows.
+
+    Notes:
+        This function is statically written to work with the raw Tyche dataset featured in the OSM manuscript:
+        https://www.nature.com/articles/s41586-024-08548-w. Additionally, it assumes that the dataset has been
+        preprocessed with the early Sun lab mesoscope compression pipeline. The function will not work for any other
+        project or data hierarchy.
+
+        As part of its runtime, the function automatically transfers the ascended session data to the BioHPC server.
+        Since transferring the data over the network is the bottleneck of this pipeline, it runs in a single-threaded
+        mode and is constrained by the communication channel between the local machine and the BioHPC server. Calling
+        this function for a large number of sessions will result in a long processing time due to the network data
+        transfer.
+
+        Since SessionData can only be created on a PC that has a valid acquisition system config, this function will
+        only work on a machine that is part of an active Sun lab acquisition system.
+
+    Args:
+        root_directory: The directory that stores one or more Tyche animal folders. This can be conceptualized as the
+            root directory for the Tyche project.
+    """
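A minimal usage sketch for the new single-argument `ascend_tyche_data` signature, assuming the calling machine holds a valid Sun lab acquisition-system configuration; the legacy data path below is illustrative, not part of the diff:

```python
# Sketch only: converts every Tyche animal/day/acquisition folder found under the
# (hypothetical) legacy root and transfers the resulting raw_data to the BioHPC server.
# Output and server roots now come from the acquisition system config, not arguments.
from pathlib import Path

from sl_shared_assets.tools.ascension_tools import ascend_tyche_data

ascend_tyche_data(root_directory=Path("/data/legacy/Tyche"))
```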
@@ -10,6 +10,19 @@ from concurrent.futures import ProcessPoolExecutor, as_completed
 from tqdm import tqdm
 import xxhash
 
+# Defines a 'blacklist' set of files. Primarily, this lit contains the service files that may change after the session
+# data has been acquired. Therefore, it does not make sense to include them in the checksum, as they do not reflect the
+# data that should remain permanently unchanged. Note, make sure all service files are added to this set!
+_excluded_files = {
+    "ax_checksum.txt",
+    "ubiquitin.bin",
+    "telomere.bin",
+    "single_day_suite2p.bin",
+    "multi_day_suite2p.bin",
+    "behavior.bin",
+    "dlc.bin",
+}
+
 
 def _calculate_file_checksum(base_directory: Path, file_path: Path) -> tuple[str, bytes]:
     """Calculates xxHash3-128 checksum for a single file and its path relative to the base directory.
@@ -89,7 +102,7 @@ def calculate_directory_checksum(
     files = sorted(
         path
         for path in directory.rglob("*")
-        if path.is_file() and path.stem != "ax_checksum" and path.suffix != ".txt"  # Excludes checksum files
+        if path.is_file() and f"{path.stem}{path.suffix}" not in _excluded_files  # Excludes service files
     )
 
     # Precreates the directory checksum
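The new filter compares each file's full name (stem plus suffix) against the `_excluded_files` set instead of the old ax_checksum/.txt heuristic. A self-contained sketch of the same check, using an illustrative subset of the set and made-up paths:

```python
# Sketch: service files are skipped by full file name; data files pass through.
from pathlib import Path

_excluded_files = {"ax_checksum.txt", "telomere.bin", "ubiquitin.bin"}  # illustrative subset

candidates = [Path("raw_data/telomere.bin"), Path("raw_data/frame_000001.tiff")]
included = [p for p in candidates if f"{p.stem}{p.suffix}" not in _excluded_files]
# Only the .tiff frame survives the filter; the telomere.bin marker is excluded.
print(included)
```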
@@ -0,0 +1,56 @@
+from pathlib import Path
+
+from _typeshed import Incomplete
+
+_excluded_files: Incomplete
+
+def _calculate_file_checksum(base_directory: Path, file_path: Path) -> tuple[str, bytes]:
+    """Calculates xxHash3-128 checksum for a single file and its path relative to the base directory.
+
+    This function is passed to parallel workers used by the calculate_directory_hash() method that iteratively
+    calculates the checksum for all files inside a directory. Each call to this function returns the checksum for the
+    target file, which includes both the contents of the file and its path relative to the base directory.
+
+    Args:
+        base_directory: The path to the base (root) directory which is being checksummed by the main
+            'calculate_directory_checksum' function.
+        file_path: The absolute path to the target file.
+
+    Returns:
+        A tuple with two elements. The first element is the path to the file relative to the base directory. The second
+        element is the xxHash3-128 checksum that covers the relative path and the contents of the file.
+    """
+
+def calculate_directory_checksum(
+    directory: Path, num_processes: int | None = None, batch: bool = False, save_checksum: bool = True
+) -> str:
+    """Calculates xxHash3-128 checksum for the input directory, which includes the data of all contained files and
+    the directory structure information.
+
+    This function is used to generate a checksum for the raw_data directory of each experiment or training session.
+    Checksums are used to verify the session data integrity during transmission between the PC that acquired the data
+    and long-term storage locations, such as the Synology NAS or the BioHPC server. The function can be configured to
+    write the generated checksum as a hexadecimal string to the ax_checksum.txt file stored at the highest level of the
+    input directory.
+
+    Note:
+        This method uses multiprocessing to efficiently parallelize checksum calculation for multiple files. In
+        combination with xxHash3, this achieves a significant speedup over more common checksums, such as MD5 and
+        SHA256. Note that xxHash3 is not suitable for security purposes and is only used to ensure data integrity.
+
+        The method notifies the user about the checksum calculation process via the terminal.
+
+        The returned checksum accounts for both the contents of each file and the layout of the input directory
+        structure.
+
+    Args:
+        directory: The Path to the directory to be checksummed.
+        num_processes: The number of CPU processes to use for parallelizing checksum calculation. If set to None, the
+            function defaults to using (logical CPU count - 4).
+        batch: Determines whether the function is called as part of batch-processing multiple directories. This is used
+            to optimize progress reporting to avoid cluttering the terminal.
+        save_checksum: Determines whether the checksum should be saved (written to) a .txt file.
+
+    Returns:
+        The xxHash3-128 checksum for the input directory as a hexadecimal string.
+    """
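A hedged usage sketch for the stubbed API above, assuming a session's raw_data folder at an illustrative path:

```python
# Sketch: checksumming a session's raw_data folder. The directory path is illustrative;
# with save_checksum=True the hex digest is also written to ax_checksum.txt in that folder.
from pathlib import Path

from sl_shared_assets.tools.packaging_tools import calculate_directory_checksum

checksum = calculate_directory_checksum(
    directory=Path("/data/Tyche/A7/2022-01-03-12-00-00-000000/raw_data"),
    num_processes=None,  # defaults to (logical CPU count - 4)
    batch=False,
    save_checksum=True,
)
print(f"xxHash3-128: {checksum}")
```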
@@ -0,0 +1,164 @@
+"""This module provides tools for managing the data of any Sun lab project. Tools from this module extend the
+functionality of SessionData class via a convenient API that allows working with the data of multiple sessions making
+up a given project."""
+
+from pathlib import Path
+
+import polars as pl
+
+from ..data_classes import SessionData
+from .packaging_tools import calculate_directory_checksum
+
+
+def generate_project_manifest(
+    raw_project_directory: Path, output_directory: Path, processed_project_directory: Path | None = None
+) -> None:
+    """Builds and saves the project manifest .feather file under the specified output directory.
+
+    This function evaluates the input project directory and builds the 'manifest' file for the project. The file
+    includes the descriptive information about every session stored inside the input project folder and the state of
+    session's data processing (which processing pipelines have been applied to each session). The file will be created
+    under the 'output_path' directory and use the following name pattern: {ProjectName}}_manifest.feather.
+
+    Notes:
+        The manifest file is primarily used to capture and move project state information between machines, typically
+        in the context of working with data stored on a remote compute server or cluster. However, it can also be used
+        on a local machine, since an up-to-date manifest file is required to run most data processing pipelines in the
+        lab regardless of the runtime context.
+
+    Args:
+        raw_project_directory: The path to the root project directory used to store raw session data.
+        output_directory: The path to the directory where to save the generated manifest file.
+        processed_project_directory: The path to the root project directory used to store processed session data if it
+            is different from the 'raw_project_directory'. Typically, this would be the case on remote compute server(s)
+            and not on local machines.
+    """
+    # Finds all raw data directories
+    session_directories = [directory.parent for directory in raw_project_directory.rglob("raw_data")]
+
+    # Precreates the 'manifest' dictionary structure
+    manifest: dict[str, list[str | bool]] = {
+        "animal": [],  # Animal IDs.
+        "session": [],  # Session names.
+        "type": [],  # Type of the session (e.g., Experiment, Training, etc.).
+        "raw_data": [],  # Server-side raw_data folder path.
+        "processed_data": [],  # Server-side processed_data folder path.
+        "complete": [],  # Determines if the session data is complete. Incomplete sessions are excluded from processing.
+        "single_day_suite2p": [],  # Determines whether the session has been processed with the single-day s2p pipeline.
+        "multi_day_suite2p": [],  # Determines whether the session has been processed with the multi-day s2p pipeline.
+        "behavior": [],  # Determines whether the session has been processed with the behavior extraction pipeline.
+        "dlc": [],  # Determines whether the session has been processed with the DeepLabCut pipeline.
+    }
+
+    # Loops over each session of every animal in the project and extracts session ID information and information
+    # about which processing steps have been successfully applied to the session.
+    for directory in session_directories:
+        # Instantiates the SessionData instance to resolve the paths to all session's data files and locations.
+        session_data = SessionData.load(
+            session_path=directory, processed_data_root=processed_project_directory, make_processed_data_directory=False
+        )
+
+        # Fills the manifest dictionary with data for the processed session:
+
+        # Extracts ID and data path information from the SessionData instance
+        manifest["animal"].append(session_data.animal_id)
+        manifest["session"].append(session_data.session_name)
+        manifest["type"].append(session_data.session_type)
+        manifest["raw_data"].append(str(session_data.raw_data.raw_data_path))
+        manifest["processed_data"].append(str(session_data.processed_data.processed_data_path))
+
+        # If the session raw_data folder contains the telomere.bin file, marks the session as complete.
+        manifest["complete"].append(session_data.raw_data.telomere_path.exists())
+
+        # If the session is incomplete, marks all processing steps as FALSE, as automatic processing is disabled for
+        # incomplete sessions.
+        if not manifest["complete"][-1]:
+            manifest["single_day_suite2p"].append(False)
+            manifest["multi_day_suite2p"].append(False)
+            manifest["behavior"].append(False)
+            manifest["dlc"].append(False)
+            continue  # Cycles to the next session
+
+        # If the session processed_data folder contains the single-day suite2p.bin file, marks the single-day suite2p
+        # processing step as complete.
+        manifest["single_day_suite2p"].append(session_data.processed_data.single_day_suite2p_bin_path.exists())
+
+        # If the session processed_data folder contains the multi-day suite2p.bin file, marks the multi-day suite2p
+        # processing step as complete.
+        manifest["multi_day_suite2p"].append(session_data.processed_data.multi_day_suite2p_bin_path.exists())
+
+        # If the session processed_data folder contains the behavior.bin file, marks the behavior processing step as
+        # complete.
+        manifest["behavior"].append(session_data.processed_data.behavior_data_path.exists())
+
+        # If the session processed_data folder contains the dlc.bin file, marks the dlc processing step as
+        # complete.
+        manifest["dlc"].append(session_data.processed_data.dlc_bin_path.exists())
+
+    # Converts the manifest dictionary to a Polars Dataframe
+    schema = {
+        "animal": pl.String,
+        "session": pl.String,
+        "raw_data": pl.String,
+        "processed_data": pl.String,
+        "type": pl.String,
+        "complete": pl.Boolean,
+        "single_day_suite2p": pl.Boolean,
+        "multi_day_suite2p": pl.Boolean,
+        "behavior": pl.Boolean,
+        "dlc": pl.Boolean,
+    }
+    df = pl.DataFrame(manifest, schema=schema)
+
+    # Sorts the DataFrame by animal and then session. Since we assign animal IDs sequentially and 'name' sessions based
+    # on acquisition timestamps, the sort order is chronological.
+    sorted_df = df.sort(["animal", "session"])
+
+    # Saves the generated manifest to the project-specific manifest .feather file for further processing.
+    sorted_df.write_ipc(
+        file=output_directory.joinpath(f"{raw_project_directory.stem}_manifest.feather"), compression="lz4"
+    )
+
+
+def verify_session_checksum(session_path: Path) -> bool:
+    """Verifies the integrity of the session's raw data by generating the checksum of the raw_data directory and
+    comparing it against the checksum stored in the ax_checksum.txt file.
+
+    Primarily, this function is used to verify data integrity after transferring it from a local PC to the remote
+    server for long-term storage. This function is designed to do nothing if the checksum matches and to remove the
+    'telomere.bin' marker file if it does not.
+
+    Notes:
+        Removing the telomere.bin marker file from session's raw_data folder marks the session as incomplete, excluding
+        it from all further automatic processing.
+
+    Args:
+        session_path: The path to the session directory to be verified. Note, the input session directory must contain
+            the 'raw_data' subdirectory.
+
+    Returns:
+        True if the checksum matches, False otherwise.
+    """
+
+    # Loads session data layout
+    session_data = SessionData.load(session_path=session_path)
+
+    # Re-calculates the checksum for the raw_data directory
+    calculated_checksum = calculate_directory_checksum(
+        directory=session_data.raw_data.raw_data_path, batch=False, save_checksum=False
+    )
+
+    # Loads the checksum stored inside the ax_checksum.txt file
+    with open(session_data.raw_data.checksum_path, "r") as f:
+        stored_checksum = f.read().strip()
+
+    # If the two checksums do not match, this likely indicates data corruption.
+    if stored_checksum != calculated_checksum:
+        # If the telomere.bin file exists, removes this file. This automatically marks the session as incomplete for
+        # all other Sun lab runtimes. The presence of the telomere.bin file after integrity verification is used as a
+        # heuristic for determining whether the session has passed the verification process.
+        if session_data.raw_data.telomere_path.exists():
+            session_data.raw_data.telomere_path.unlink()
+        return False
+
+    return True
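Because the manifest is written as an Arrow IPC (.feather) file, downstream tooling can load it back with polars. A sketch of reading it and selecting sessions that still need single-day suite2p processing; the file path is illustrative, while the column names follow the schema defined in generate_project_manifest above:

```python
# Sketch: consuming the generated manifest with polars (path is illustrative).
import polars as pl

manifest = pl.read_ipc("/server/Tyche_manifest.feather")

# Complete sessions that have not yet been processed with the single-day suite2p pipeline.
pending = manifest.filter(pl.col("complete") & ~pl.col("single_day_suite2p"))
print(pending.select(["animal", "session", "raw_data"]))
```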
@@ -0,0 +1,48 @@
+from pathlib import Path
+
+from ..data_classes import SessionData as SessionData
+from .packaging_tools import calculate_directory_checksum as calculate_directory_checksum
+
+def generate_project_manifest(
+    raw_project_directory: Path, output_directory: Path, processed_project_directory: Path | None = None
+) -> None:
+    """Builds and saves the project manifest .feather file under the specified output directory.
+
+    This function evaluates the input project directory and builds the 'manifest' file for the project. The file
+    includes the descriptive information about every session stored inside the input project folder and the state of
+    session's data processing (which processing pipelines have been applied to each session). The file will be created
+    under the 'output_path' directory and use the following name pattern: {ProjectName}}_manifest.feather.
+
+    Notes:
+        The manifest file is primarily used to capture and move project state information between machines, typically
+        in the context of working with data stored on a remote compute server or cluster. However, it can also be used
+        on a local machine, since an up-to-date manifest file is required to run most data processing pipelines in the
+        lab regardless of the runtime context.
+
+    Args:
+        raw_project_directory: The path to the root project directory used to store raw session data.
+        output_directory: The path to the directory where to save the generated manifest file.
+        processed_project_directory: The path to the root project directory used to store processed session data if it
+            is different from the 'raw_project_directory'. Typically, this would be the case on remote compute server(s)
+            and not on local machines.
+    """
+
+def verify_session_checksum(session_path: Path) -> bool:
+    """Verifies the integrity of the session's raw data by generating the checksum of the raw_data directory and
+    comparing it against the checksum stored in the ax_checksum.txt file.
+
+    Primarily, this function is used to verify data integrity after transferring it from a local PC to the remote
+    server for long-term storage. This function is designed to do nothing if the checksum matches and to remove the
+    'telomere.bin' marker file if it does not.
+
+    Notes:
+        Removing the telomere.bin marker file from session's raw_data folder marks the session as incomplete, excluding
+        it from all further automatic processing.
+
+    Args:
+        session_path: The path to the session directory to be verified. Note, the input session directory must contain
+            the 'raw_data' subdirectory.
+
+    Returns:
+        True if the checksum matches, False otherwise.
+    """
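A usage sketch for the verification helper, assuming a session directory that was just transferred to the server (the path is illustrative):

```python
# Sketch: post-transfer integrity check. On a mismatch the function removes telomere.bin,
# which excludes the session from further automatic processing.
from pathlib import Path

from sl_shared_assets.tools.project_management_tools import verify_session_checksum

session = Path("/server/Tyche/A7/2022-01-03-12-00-00-000000")
if not verify_session_checksum(session_path=session):
    print(f"Checksum mismatch for {session}; session marked incomplete.")
```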
@@ -0,0 +1,53 @@
+from pathlib import Path
+
+from .packaging_tools import calculate_directory_checksum as calculate_directory_checksum
+
+def _transfer_file(source_file: Path, source_directory: Path, destination_directory: Path) -> None:
+    """Copies the input file from the source directory to the destination directory while preserving the file metadata.
+
+    This is a worker method used by the transfer_directory() method to move multiple files in parallel.
+
+    Notes:
+        If the file is found under a hierarchy of subdirectories inside the input source_directory, that hierarchy will
+        be preserved in the destination directory.
+
+    Args:
+        source_file: The file to be copied.
+        source_directory: The root directory where the file is located.
+        destination_directory: The destination directory where to move the file.
+    """
+
+def transfer_directory(source: Path, destination: Path, num_threads: int = 1, verify_integrity: bool = True) -> None:
+    """Copies the contents of the input directory tree from source to destination while preserving the folder
+    structure.
+
+    This function is used to assemble the experimental data from all remote machines used in the acquisition process on
+    the VRPC before the data is preprocessed. It is also used to transfer the preprocessed data from the VRPC to the
+    SynologyNAS and the Sun lab BioHPC server.
+
+    Notes:
+        This method recreates the moved directory hierarchy on the destination if the hierarchy does not exist. This is
+        done before copying the files.
+
+        The method executes a multithreading copy operation. It does not clean up the source files. That job is handed
+        to the specific preprocessing function from the sl_experiment or sl-forgery libraries that calls this function.
+
+        If the method is configured to verify transferred file integrity, it reruns the xxHash3-128 checksum calculation
+        and compares the returned checksum to the one stored in the source directory. The method assumes that all input
+        directories contain the 'ax_checksum.txt' file that stores the 'source' directory checksum at the highest level
+        of the input directory tree.
+
+    Args:
+        source: The path to the directory that needs to be moved.
+        destination: The path to the destination directory where to move the contents of the source directory.
+        num_threads: The number of threads to use for parallel file transfer. This number should be set depending on the
+            type of transfer (local or remote) and is not guaranteed to provide improved transfer performance. For local
+            transfers, setting this number above 1 will likely provide a performance boost. For remote transfers using
+            a single TCP / IP socket (such as non-multichannel SMB protocol), the number should be set to 1.
+        verify_integrity: Determines whether to perform integrity verification for the transferred files. Note,
+            integrity verification is a time-consuming process and generally would not be a concern for most runtimes.
+            Therefore, it is often fine to disable this option to optimize method runtime speed.
+
+    Raises:
+        RuntimeError: If the transferred files do not pass the xxHas3-128 checksum integrity verification.
+    """
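A hedged sketch of moving a session's raw_data folder to a filesystem-mounted server share using the stubbed transfer_directory signature; both paths are illustrative, and a single thread is used because the docstring recommends num_threads=1 for non-multichannel SMB transfers:

```python
# Sketch: transferring a raw_data directory to a mounted server share (paths illustrative).
# verify_integrity=True requires ax_checksum.txt inside the source folder.
from pathlib import Path

from sl_shared_assets.tools.transfer_tools import transfer_directory

transfer_directory(
    source=Path("/data/Tyche/A7/2022-01-03-12-00-00-000000/raw_data"),
    destination=Path("/mnt/biohpc/Tyche/A7/2022-01-03-12-00-00-000000/raw_data"),
    num_threads=1,          # single TCP socket (plain SMB), so one thread
    verify_integrity=True,  # re-hashes the copy and compares to the stored checksum
)
```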
@@ -1,10 +1,10 @@
 Metadata-Version: 2.4
 Name: sl-shared-assets
-Version: 1.0.0rc19
+Version: 1.0.0rc21
 Summary: Stores assets shared between multiple Sun (NeuroAI) lab data pipelines.
 Project-URL: Homepage, https://github.com/Sun-Lab-NBB/sl-shared-assets
 Project-URL: Documentation, https://sl-shared-assets-api-docs.netlify.app/
-Author: Ivan Kondratyev, Kushaan Gupta, Yuantao Deng
+Author: Ivan Kondratyev, Kushaan Gupta, Yuantao Deng, Natalie Yeung
 Maintainer-email: Ivan Kondratyev <ik278@cornell.edu>
 License: GNU GENERAL PUBLIC LICENSE
         Version 3, 29 June 2007
@@ -695,8 +695,10 @@ Requires-Dist: ataraxis-base-utilities<4,>=3
 Requires-Dist: ataraxis-data-structures<4,>=3.1.1
 Requires-Dist: ataraxis-time<4,>=3
 Requires-Dist: click<9,>=8
-Requires-Dist: dacite<2,>=1
+Requires-Dist: natsort<9,>=8
 Requires-Dist: paramiko<4,>=3.5.1
+Requires-Dist: polars<2,>=1
+Requires-Dist: pyarrow<21,>=20
 Requires-Dist: simple-slurm<1,>=0
 Requires-Dist: tqdm<5,>=4
 Requires-Dist: xxhash<4,>=3
@@ -717,8 +719,10 @@ Requires-Dist: types-tqdm<5,>=4; extra == 'conda'
 Provides-Extra: condarun
 Requires-Dist: appdirs<2,>=1; extra == 'condarun'
 Requires-Dist: click<9,>=8; extra == 'condarun'
-Requires-Dist: dacite<2,>=1; extra == 'condarun'
+Requires-Dist: natsort<9,>=8; extra == 'condarun'
 Requires-Dist: paramiko<4,>=3.5.1; extra == 'condarun'
+Requires-Dist: polars<2,>=1; extra == 'condarun'
+Requires-Dist: pyarrow<21,>=20; extra == 'condarun'
 Requires-Dist: tqdm<5,>=4; extra == 'condarun'
 Provides-Extra: dev
 Requires-Dist: ataraxis-automation<5,>=4; extra == 'dev'
@@ -781,6 +785,7 @@ acquisition and processing and provides the API for accessing the lab’s main c
 
 - [Dependencies](#dependencies)
 - [Installation](#installation)
+- [Usage](#usage)
 - [API Documentation](#api-documentation)
 - [Versioning](#versioning)
 - [Authors](#authors)
@@ -811,11 +816,22 @@ Use the following command to install the library using pip: ```pip install sl-sh
 
 ---
 
+## Usage
+
+All library components are intended to be used via other Sun lab libraries. Developers should study the API and CLI
+documentation below to learn how to use library components in other Sun lab libraries.
+
+---
+
 ## API Documentation
 
 See the [API documentation](https://sl-shared-assets-api-docs.netlify.app/) for the
 detailed description of the methods and classes exposed by components of this library.
 
+**Note!** The API documentation includes important information about Command-Line-Interfaces (CLIs) exposed by this
+library as part of installation into a Python environment. All users are highly encouraged to study the CLI
+documentation to learn how to use library components via the terminal.
+
 ___
 
 ## Versioning
@@ -830,6 +846,7 @@ We use [semantic versioning](https://semver.org/) for this project. For the vers
 - Ivan Kondratyev ([Inkaros](https://github.com/Inkaros))
 - Kushaan Gupta ([kushaangupta](https://github.com/kushaangupta))
 - Yuantao Deng ([YuantaoDeng](https://github.com/YuantaoDeng))
+- Natalie Yeung
 
 ___
 