sl-shared-assets 1.0.0rc15__py3-none-any.whl → 1.0.0rc17__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of sl-shared-assets might be problematic.
- sl_shared_assets/data_classes/session_data.py +27 -9
- sl_shared_assets/suite2p/multi_day.py +119 -87
- sl_shared_assets/suite2p/single_day.py +220 -136
- {sl_shared_assets-1.0.0rc15.dist-info → sl_shared_assets-1.0.0rc17.dist-info}/METADATA +1 -1
- sl_shared_assets-1.0.0rc17.dist-info/RECORD +23 -0
- sl_shared_assets/__init__.pyi +0 -71
- sl_shared_assets/cli.pyi +0 -28
- sl_shared_assets/data_classes/__init__.pyi +0 -61
- sl_shared_assets/data_classes/configuration_data.pyi +0 -37
- sl_shared_assets/data_classes/runtime_data.pyi +0 -145
- sl_shared_assets/data_classes/session_data.pyi +0 -527
- sl_shared_assets/data_classes/surgery_data.pyi +0 -89
- sl_shared_assets/server/__init__.pyi +0 -8
- sl_shared_assets/server/job.pyi +0 -94
- sl_shared_assets/server/server.pyi +0 -95
- sl_shared_assets/suite2p/__init__.pyi +0 -4
- sl_shared_assets/suite2p/multi_day.pyi +0 -99
- sl_shared_assets/suite2p/single_day.pyi +0 -192
- sl_shared_assets/tools/__init__.pyi +0 -5
- sl_shared_assets/tools/ascension_tools.pyi +0 -68
- sl_shared_assets/tools/packaging_tools.pyi +0 -52
- sl_shared_assets/tools/transfer_tools.pyi +0 -53
- sl_shared_assets-1.0.0rc15.dist-info/RECORD +0 -40
- {sl_shared_assets-1.0.0rc15.dist-info → sl_shared_assets-1.0.0rc17.dist-info}/WHEEL +0 -0
- {sl_shared_assets-1.0.0rc15.dist-info → sl_shared_assets-1.0.0rc17.dist-info}/entry_points.txt +0 -0
- {sl_shared_assets-1.0.0rc15.dist-info → sl_shared_assets-1.0.0rc17.dist-info}/licenses/LICENSE +0 -0
sl_shared_assets/server/job.pyi
DELETED
@@ -1,94 +0,0 @@
-from pathlib import Path
-
-from _typeshed import Incomplete
-from simple_slurm import Slurm
-
-class Job:
-    """Aggregates the data of a single SLURM-managed job to be executed on the Sun lab BioHPC cluster.
-
-    This class provides the API for constructing any server-side job in the Sun lab. Internally, it wraps an instance
-    of a Slurm class to package the job data into the format expected by the SLURM job manager. All jobs managed by this
-    class instance should be submitted to an initialized Server class 'submit_job' method to be executed on the server.
-
-    Notes:
-        The initialization method of the class contains the arguments for configuring the SLURM and Conda environments
-        used by the job. Do not submit additional SLURM or Conda commands via the 'add_command' method, as this may
-        produce unexpected behavior.
-
-        Each job can be conceptualized as a sequence of shell instructions to execute on the remote compute server. For
-        the lab, that means that the bulk of the command consists of calling various CLIs exposed by data processing or
-        analysis pipelines, installed in the Conda environment on the server. Other than that, the job contains commands
-        for activating the target conda environment and, in some cases, doing other preparatory or cleanup work. The
-        source code of a 'remote' job is typically identical to what a human operator would type in a 'local' terminal
-        to run the same job on their PC.
-
-        A key feature of server-side jobs is that they are executed on virtual machines managed by SLURM. Since the
-        server has a lot more compute and memory resources than likely needed by individual jobs, each job typically
-        requests a subset of these resources. Upon being executed, SLURM creates an isolated environment with the
-        requested resources and runs the job in that environment.
-
-        Since all jobs are expected to use the CLIs from python packages (pre)installed on the BioHPC server, make sure
-        that the target environment is installed and configured before submitting jobs to the server. See notes in
-        ReadMe to learn more about configuring server-side conda environments.
-
-    Args:
-        job_name: The descriptive name of the SLURM job to be created. Primarily, this name is used in terminal
-            printouts to identify the job to human operators.
-        output_log: The absolute path to the .txt file on the processing server, where to store the standard output
-            data of the job.
-        error_log: The absolute path to the .txt file on the processing server, where to store the standard error
-            data of the job.
-        working_directory: The absolute path to the directory where temporary job files will be stored. During runtime,
-            classes from this library use that directory to store files such as the job's shell script. All such files
-            are automatically removed from the directory at the end of a non-errors runtime.
-        conda_environment: The name of the conda environment to activate on the server before running the job logic. The
-            environment should contain the necessary Python packages and CLIs to support running the job's logic.
-        cpus_to_use: The number of CPUs to use for the job.
-        ram_gb: The amount of RAM to allocate for the job, in Gigabytes.
-        time_limit: The maximum time limit for the job, in minutes. If the job is still running at the end of this time
-            period, it will be forcibly terminated. It is highly advised to always set adequate maximum runtime limits
-            to prevent jobs from hogging the server in case of runtime or algorithm errors.
-
-    Attributes:
-        remote_script_path: Stores the path to the script file relative to the root of the remote server that runs the
-            command.
-        job_id: Stores the unique job identifier assigned by the SLURM manager to this job, when it is accepted for
-            execution. This field initialized to None and is overwritten by the Server class that submits the job.
-        job_name: Stores the descriptive name of the SLURM job.
-        _command: Stores the managed SLURM command object.
-    """
-
-    remote_script_path: Incomplete
-    job_id: str | None
-    job_name: str
-    _command: Slurm
-    def __init__(
-        self,
-        job_name: str,
-        output_log: Path,
-        error_log: Path,
-        working_directory: Path,
-        conda_environment: str,
-        cpus_to_use: int = 10,
-        ram_gb: int = 10,
-        time_limit: int = 60,
-    ) -> None: ...
-    def __repr__(self) -> str:
-        """Returns the string representation of the Job instance."""
-    def add_command(self, command: str) -> None:
-        """Adds the input command string to the end of the managed SLURM job command list.
-
-        This method is a wrapper around simple_slurm's 'add_cmd' method. It is used to iteratively build the shell
-        command sequence of the job.
-
-        Args:
-            command: The command string to add to the command list, e.g.: 'python main.py --input 1'.
-        """
-    @property
-    def command_script(self) -> str:
-        """Translates the managed job data into a shell-script-writable string and returns it to caller.
-
-        This method is used by the Server class to translate the job into the format that can be submitted to and
-        executed on the remote compute server. Do not call this method manually unless you know what you are doing.
-        The returned string is safe to dump into a .sh (shell script) file and move to the BioHPC server for execution.
-        """
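For orientation, the following sketch shows how the removed Job stub was meant to be used, based solely on the signatures and docstrings above. The log and working-directory paths, environment name, resource requests, and the CLI command are illustrative placeholders, not values defined by the package; the import path is inferred from the stub location.

    from pathlib import Path

    from sl_shared_assets.server.job import Job  # module path inferred from the stub location

    # Resource requests and all paths below are illustrative placeholders.
    job = Job(
        job_name="suite2p_single_day",
        output_log=Path("/workdir/logs/suite2p_output.txt"),
        error_log=Path("/workdir/logs/suite2p_errors.txt"),
        working_directory=Path("/workdir/tmp"),
        conda_environment="suite2p_env",
        cpus_to_use=16,
        ram_gb=64,
        time_limit=240,
    )
    # Each added string becomes one shell instruction in the generated job script.
    job.add_command("sl-suite2p-cli --session /workdir/session_001")  # hypothetical CLI call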
sl_shared_assets/server/server.pyi
DELETED
@@ -1,95 +0,0 @@
-from pathlib import Path
-from dataclasses import dataclass
-
-from simple_slurm import Slurm as Slurm
-from paramiko.client import SSHClient as SSHClient
-from ataraxis_data_structures import YamlConfig
-
-from .job import Job as Job
-
-def generate_server_credentials(
-    output_directory: Path, username: str, password: str, host: str = "cbsuwsun.biohpc.cornell.edu"
-) -> None:
-    """Generates a new server_credentials.yaml file under the specified directory, using input information.
-
-    This function provides a convenience interface for generating new BioHPC server credential files. Generally, this is
-    only used when setting up new host-computers in the lab.
-    """
-@dataclass()
-class ServerCredentials(YamlConfig):
-    """This class stores the hostname and credentials used to log into the BioHPC cluster to run Sun lab processing
-    pipelines.
-
-    Primarily, this is used as part of the sl-experiment library runtime to start data processing once it is
-    transferred to the BioHPC server during preprocessing. However, the same file can be used together with the Server
-    class API to run any computation jobs on the lab's BioHPC server.
-    """
-
-    username: str = ...
-    password: str = ...
-    host: str = ...
-
-class Server:
-    """Encapsulates access to the Sun lab BioHPC processing server.
-
-    This class provides the API that allows accessing the BioHPC server to create and submit various SLURM-managed jobs
-    to the server. It functions as the central interface used by all processing pipelines in the lab to execute costly
-    data processing on the server.
-
-    Notes:
-        All lab processing pipelines expect the data to be stored on the server and all processing logic to be packaged
-        and installed into dedicated conda environments on the server.
-
-        This class assumes that the target server has SLURM job manager installed and accessible to the user whose
-        credentials are used to connect to the server as part of this class instantiation.
-
-    Args:
-        credentials_path: The path to the locally stored .yaml file that contains the server hostname and access
-            credentials.
-
-    Attributes:
-        _open: Tracks whether the connection to the server is open or not.
-        _client: Stores the initialized SSHClient instance used to interface with the server.
-    """
-
-    _open: bool
-    _credentials: ServerCredentials
-    _client: SSHClient
-    def __init__(self, credentials_path: Path) -> None: ...
-    def __del__(self) -> None:
-        """If the instance is connected to the server, terminates the connection before the instance is destroyed."""
-    def submit_job(self, job: Job) -> Job:
-        """Submits the input job to the managed BioHPC server via SLURM job manager.
-
-        This method submits various jobs for execution via SLURM-managed BioHPC cluster. As part of its runtime, the
-        method translates the Job object into the shell script, moves the script to the target working directory on
-        the server, and instructs the server to execute the shell script (via SLURM).
-
-        Args:
-            job: The Job object that contains all job data.
-
-        Returns:
-            The job object whose 'job_id' attribute had been modified with the job ID, if the job was successfully
-            submitted.
-
-        Raises:
-            RuntimeError: If job submission to the server fails.
-        """
-    def job_complete(self, job: Job) -> bool:
-        """Returns True if the job managed by the input Job instance has been completed or terminated its runtime due
-        to an error.
-
-        If the job is still running or is waiting inside the execution queue, returns False.
-
-        Args:
-            job: The Job object whose status needs to be checked.
-
-        Raises:
-            ValueError: If the input Job object does not contain a valid job_id, suggesting that it has not been
-                submitted to the server.
-        """
-    def close(self) -> None:
-        """Closes the SSH connection to the server.
-
-        This method has to be called before destroying the class instance to ensure proper resource cleanup.
-        """
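Continuing the sketch above, a hedged example of submitting that Job through the Server API described in this stub. The credentials path is a placeholder (generate_server_credentials() can create such a file), and the polling interval is an arbitrary choice rather than a library recommendation.

    import time
    from pathlib import Path

    from sl_shared_assets.server.server import Server  # module path inferred from the stub location

    server = Server(credentials_path=Path("/home/user/server_credentials.yaml"))  # placeholder path
    job = server.submit_job(job)          # returns the Job with its 'job_id' field populated
    while not server.job_complete(job):   # False while the job is queued or still running
        time.sleep(30)
    server.close()                        # release the SSH connection explicitly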
sl_shared_assets/suite2p/multi_day.pyi
DELETED
@@ -1,99 +0,0 @@
-from typing import Any
-from dataclasses import field, dataclass
-
-from _typeshed import Incomplete
-from ataraxis_data_structures import YamlConfig
-
-@dataclass()
-class IO:
-    """Stores parameters that control data input and output during various stages of the pipeline."""
-
-    sessions: list[str] = field(default_factory=list)
-    mesoscan: bool = ...
-
-@dataclass()
-class CellDetection:
-    """Stores parameters for selecting single-day-registered cells (ROIs) to be tracked across multiple sessions (days).
-
-    To maximize the tracking pipeline reliability, it is beneficial to pre-filter the cells whose identity (as cells)
-    is not certain or that may be hard to track across sessions.
-    """
-
-    probability_threshold: float = ...
-    maximum_size: int = ...
-    mesoscope_stripe_borders: list[int] = field(default_factory=list)
-    stripe_margin: int = ...
-
-@dataclass()
-class Registration:
-    """Stores parameters for aligning (registering) the sessions from multiple days to the same visual space.
-
-    Registration is used to create a 'shared' visual space, allowing to track the same cells (ROIs) across otherwise
-    variable visual space of each session.
-    """
-
-    image_type: str = ...
-    grid_sampling_factor: float = ...
-    scale_sampling: int = ...
-    speed_factor: float = ...
-
-@dataclass()
-class Clustering:
-    """Stores parameters for clustering cell (ROI) masks across multiple registered sessions.
-
-    Clustering is used to track cells across sessions. If a group of ROIs across sessions is clustered together, it
-    is likely that they represent the same cell (ROI) across all sessions. This process involves first creating a
-    'template' mask that tracks a cell using the registered (deformed) visual space and then using this template to
-    track the cell in the original (non-deformed) visual space of each session.
-    """
-
-    criterion: str = ...
-    threshold: float = ...
-    mask_prevalence: int = ...
-    pixel_prevalence: int = ...
-    step_sizes: list[int] = field(default_factory=Incomplete)
-    bin_size: int = ...
-    maximum_distance: int = ...
-    minimum_size: int = ...
-
-@dataclass()
-class Demix:
-    """Stores settings used to deconvolve fluorescence signals from cells tracked across multiple days.
-
-    This step applies the suite2p spike deconvolution algorithm to the cell masks isolated during clustering to extract
-    the fluorescence of the cells tracked across multiple sessions (days). Generally, it should use the same parameters
-    as were used by the single-day suite2p pipeline.
-    """
-
-    baseline: str = ...
-    win_baseline: float = ...
-    sig_baseline: float = ...
-    l2_reg: float = ...
-    neucoeff: float = ...
-
-@dataclass()
-class MultiDayS2PConfiguration(YamlConfig):
-    """Aggregates all parameters for the multi-day suite2p pipeline used to track cells across multiple days
-    (sessions) and extract their activity.
-
-    These settings are used to configure the multiday suite2p extraction pipeline, which is based on the reference
-    implementation here: https://github.com/sprustonlab/multiday-suite2p-public. This class behaves similar to the
-    SingleDayS2PConfiguration class. It can be saved and loaded from a .YAML file and translated to dictionary format,
-    expected by the multi-day sl-suite2p pipeline.
-    """
-
-    cell_detection: CellDetection = field(default_factory=CellDetection)
-    registration: Registration = field(default_factory=Registration)
-    clustering: Clustering = field(default_factory=Clustering)
-    demix: Demix = field(default_factory=Demix)
-    io: IO = field(default_factory=IO)
-    def to_ops(self) -> dict[str, Any]:
-        """Converts the class instance to a dictionary and returns it to caller.
-
-        This dictionary can be passed to sl-suite2p multi-day functions as the 'ops' argument.
-
-        Notes:
-            Unlike the single-day configuration class, the dictionary generated by this method uses section names as
-            top level keys and parameter names as second-level keys. This mimics the original multiday-pipeline
-            configuration scheme.
-        """
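As a usage sketch only: constructing the removed multi-day configuration class and converting it to the 'ops' dictionary. The override values are illustrative, and the top-level key names are assumed to match the dataclass field names; the stub states only that section names become top-level keys.

    from sl_shared_assets.suite2p.multi_day import CellDetection, MultiDayS2PConfiguration

    # Override a couple of cell-selection parameters; other sections keep their defaults.
    config = MultiDayS2PConfiguration(
        cell_detection=CellDetection(probability_threshold=0.8, stripe_margin=20),  # illustrative values
    )
    ops = config.to_ops()
    # Section names are top-level keys, parameter names are second-level keys.
    print(ops["cell_detection"]["probability_threshold"])  # key spelling assumed to match field names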
sl_shared_assets/suite2p/single_day.pyi
DELETED
@@ -1,192 +0,0 @@
-from typing import Any
-from dataclasses import field, dataclass
-
-from _typeshed import Incomplete
-from ataraxis_data_structures import YamlConfig
-
-@dataclass
-class Main:
-    """Stores global parameters that broadly define the suite2p single-day processing configuration."""
-
-    nplanes: int = ...
-    nchannels: int = ...
-    functional_chan: int = ...
-    tau: float = ...
-    force_sktiff: bool = ...
-    fs: float = ...
-    do_bidiphase: bool = ...
-    bidiphase: int = ...
-    bidi_corrected: bool = ...
-    frames_include: int = ...
-    multiplane_parallel: bool = ...
-    ignore_flyback: list[int] = field(default_factory=list)
-
-@dataclass
-class FileIO:
-    """Stores general I/O parameters that specify input data location, format, and working and output directories."""
-
-    fast_disk: list[str] = field(default_factory=list)
-    delete_bin: bool = ...
-    mesoscan: bool = ...
-    bruker: bool = ...
-    bruker_bidirectional: bool = ...
-    h5py: list[str] = field(default_factory=list)
-    h5py_key: str = ...
-    nwb_file: str = ...
-    nwb_driver: str = ...
-    nwb_series: str = ...
-    save_path0: list[str] = field(default_factory=list)
-    save_folder: list[str] = field(default_factory=list)
-    look_one_level_down: bool = ...
-    subfolders: list[str] = field(default_factory=list)
-    move_bin: bool = ...
-
-@dataclass
-class Output:
-    """Stores I/O settings that specify the output format and organization of the data processing results."""
-
-    preclassify: float = ...
-    save_nwb: bool = ...
-    save_mat: bool = ...
-    combined: bool = ...
-    aspect: float = ...
-    report_time: bool = ...
-
-@dataclass
-class Registration:
-    """Stores parameters for rigid registration, which is used to correct motion artifacts between frames."""
-
-    do_registration: bool = ...
-    align_by_chan: int = ...
-    nimg_init: int = ...
-    batch_size: int = ...
-    maxregshift: float = ...
-    smooth_sigma: float = ...
-    smooth_sigma_time: float = ...
-    keep_movie_raw: bool = ...
-    two_step_registration: bool = ...
-    reg_tif: bool = ...
-    reg_tif_chan2: bool = ...
-    subpixel: int = ...
-    th_badframes: float = ...
-    norm_frames: bool = ...
-    force_refImg: bool = ...
-    pad_fft: bool = ...
-
-@dataclass
-class OnePRegistration:
-    """Stores parameters for additional pre-registration processing used to improve the registration of 1-photon
-    datasets."""
-
-    one_p_reg: bool = ...
-    spatial_hp_reg: int = ...
-    pre_smooth: float = ...
-    spatial_taper: float = ...
-
-@dataclass
-class NonRigid:
-    """Stores parameters for non-rigid registration, which is used to improve motion registration in complex
-    datasets."""
-
-    nonrigid: bool = ...
-    block_size: list[int] = field(default_factory=Incomplete)
-    snr_thresh: float = ...
-    maxregshiftNR: float = ...
-
-@dataclass
-class ROIDetection:
-    """Stores parameters for cell ROI detection and extraction."""
-
-    roidetect: bool = ...
-    sparse_mode: bool = ...
-    spatial_scale: int = ...
-    connected: bool = ...
-    threshold_scaling: float = ...
-    spatial_hp_detect: int = ...
-    max_overlap: float = ...
-    high_pass: int = ...
-    smooth_masks: bool = ...
-    max_iterations: int = ...
-    nbinned: int = ...
-    denoise: bool = ...
-
-@dataclass
-class CellposeDetection:
-    """Stores parameters for the Cellpose algorithm, which can optionally be used to improve cell ROI extraction."""
-
-    anatomical_only: int = ...
-    diameter: int = ...
-    cellprob_threshold: float = ...
-    flow_threshold: float = ...
-    spatial_hp_cp: int = ...
-    pretrained_model: str = ...
-
-@dataclass
-class SignalExtraction:
-    """Stores parameters for extracting fluorescence signals from ROIs and surrounding neuropil regions."""
-
-    neuropil_extract: bool = ...
-    allow_overlap: bool = ...
-    min_neuropil_pixels: int = ...
-    inner_neuropil_radius: int = ...
-    lam_percentile: int = ...
-
-@dataclass
-class SpikeDeconvolution:
-    """Stores parameters for deconvolve fluorescence signals to infer spike trains."""
-
-    spikedetect: bool = ...
-    neucoeff: float = ...
-    baseline: str = ...
-    win_baseline: float = ...
-    sig_baseline: float = ...
-    prctile_baseline: float = ...
-
-@dataclass
-class Classification:
-    """Stores parameters for classifying detected ROIs as real cells or artifacts."""
-
-    soma_crop: bool = ...
-    use_builtin_classifier: bool = ...
-    classifier_path: str = ...
-
-@dataclass
-class Channel2:
-    """Stores parameters for processing the second channel in multichannel datasets."""
-
-    chan2_thres: float = ...
-
-@dataclass
-class SingleDayS2PConfiguration(YamlConfig):
-    """Stores the user-addressable suite2p configuration parameters for the single-day (original) pipeline, organized
-    into subsections.
-
-    This class is used during single-day processing to instruct suite2p on how to process the data. This class is based
-    on the 'default_ops' from the original suite2p package. As part of the suite2p refactoring performed in sl-suite2p
-    package, the 'default_ops' has been replaced with this class instance. Compared to 'original' ops, it allows saving
-    configuration parameters as a .YAML file, which offers a better way of viewing and editing the parameters and
-    running suite2p pipeline on remote compute servers.
-
-    Notes:
-        The .YAML file uses section names that match the suite2p documentation sections. This way, users can always
-        consult the suite2p documentation for information on the purpose of each field inside every subsection.
-    """
-
-    main: Main = field(default_factory=Main)
-    file_io: FileIO = field(default_factory=FileIO)
-    output: Output = field(default_factory=Output)
-    registration: Registration = field(default_factory=Registration)
-    one_p_registration: OnePRegistration = field(default_factory=OnePRegistration)
-    non_rigid: NonRigid = field(default_factory=NonRigid)
-    roi_detection: ROIDetection = field(default_factory=ROIDetection)
-    cellpose_detection: CellposeDetection = field(default_factory=CellposeDetection)
-    signal_extraction: SignalExtraction = field(default_factory=SignalExtraction)
-    spike_deconvolution: SpikeDeconvolution = field(default_factory=SpikeDeconvolution)
-    classification: Classification = field(default_factory=Classification)
-    channel2: Channel2 = field(default_factory=Channel2)
-    def to_ops(self) -> dict[str, Any]:
-        """Converts the class instance to a dictionary and returns it to caller.
-
-        This dictionary can be passed to suite2p functions either as an 'ops' or 'db' argument to control the
-        processing runtime.
-        """
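Similarly, a hedged sketch of the removed single-day configuration class; the override values are illustrative, not package defaults. Per the Notes on the multi-day class above, the single-day to_ops() dictionary is assumed to be flat (parameter names as keys), matching suite2p's usual 'ops' format.

    from sl_shared_assets.suite2p.single_day import Main, SingleDayS2PConfiguration

    # Illustrative overrides; every other section keeps the library defaults.
    config = SingleDayS2PConfiguration(
        main=Main(nplanes=1, tau=1.25, fs=30.0),
    )
    ops = config.to_ops()  # dictionary usable as the suite2p 'ops' or 'db' argument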
sl_shared_assets/tools/__init__.pyi
DELETED
@@ -1,5 +0,0 @@
-from .transfer_tools import transfer_directory as transfer_directory
-from .ascension_tools import ascend_tyche_data as ascend_tyche_data
-from .packaging_tools import calculate_directory_checksum as calculate_directory_checksum
-
-__all__ = ["transfer_directory", "calculate_directory_checksum", "ascend_tyche_data"]
sl_shared_assets/tools/ascension_tools.pyi
DELETED
@@ -1,68 +0,0 @@
-from pathlib import Path
-
-from ..data_classes import (
-    SessionData as SessionData,
-    ProjectConfiguration as ProjectConfiguration,
-)
-from .transfer_tools import transfer_directory as transfer_directory
-from .packaging_tools import calculate_directory_checksum as calculate_directory_checksum
-
-def _generate_session_name(acquisition_path: Path) -> str:
-    """Generates a session name using the last modification time of a zstack.mat or MotionEstimator.me file.
-
-    This worker function uses one of the motion estimation files stored in each Tyche 'acquisition' subfolder to
-    generate a modern Sun lab timestamp-based session name. This is used to translate the original Tyche session naming
-    pattern into the pattern used by all modern Sun lab projects and pipelines.
-
-    Args:
-        acquisition_path: The absolute path to the target acquisition folder. These folders are found under the 'day'
-            folders for each animal, e.g.: Tyche-A7/2022_01_03/1.
-
-    Returns:
-        The modernized session name.
-    """
-
-def _reorganize_data(session_data: SessionData, source_root: Path) -> bool:
-    """Reorganizes and moves the session's data from the source folder in the old Tyche data hierarchy to the raw_data
-    folder in the newly created modern hierarchy.
-
-    This worker function is used to physically rearrange the data from the original Tyche data structure to the
-    new data structure. It both moves the existing files to their new destinations and renames certain files to match
-    the modern naming convention used in the Sun lab.
-
-    Args:
-        session_data: The initialized SessionData instance managing the 'ascended' (modernized) session data hierarchy.
-        source_root: The absolute path to the old Tyche data hierarchy folder that stores session's data.
-
-    Returns:
-        True if the ascension process was successfully completed. False if the process encountered missing data or
-        otherwise did not go as expected. When the method returns False, the runtime function requests user intervention
-        to finalize the process manually.
-    """

-def ascend_tyche_data(root_directory: Path, output_root_directory: Path, server_root_directory: Path) -> None:
-    """Reformats the old Tyche data to use the modern Sun lab layout and metadata files.
-
-    This function is used to convert old Tyche data to the modern data management standard. This is used to make the
-    data compatible with the modern Sun lab data workflows.
-
-    Notes:
-        This function is statically written to work with the raw Tyche dataset featured in the OSM manuscript:
-        https://www.nature.com/articles/s41586-024-08548-w. Additionally, it assumes that the dataset has been
-        preprocessed with the early Sun lab mesoscope compression pipeline. The function will not work for any other
-        project or data hierarchy.
-
-        As part of its runtime, the function automatically transfers the ascended session data to the BioHPC server.
-        Since transferring the data over the network is the bottleneck of this pipeline, it runs in a single-threaded
-        mode and is constrained by the communication channel between the local machine and the BioHPC server. Calling
-        this function for a large number of sessions will result in a long processing time due to the network data
-        transfer.
-
-    Args:
-        root_directory: The directory that stores one or more Tyche animal folders. This can be conceptualized as the
-            root directory for the Tyche project.
-        output_root_directory: The path to the local directory where to generate the converted Tyche project hierarchy.
-            Typically, this is the 'root' directory where all other Sun lab projects are stored.
-        server_root_directory: The path to the local filesystem-mounted BioHPC server storage directory. Note, this
-            directory hs to be mapped to the local filesystem via the SMB or equivalent protocol.
-    """
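A minimal call sketch for the removed ascension entry point. All three paths are placeholders for a lab-specific filesystem layout; the import path is confirmed by the tools/__init__.pyi re-exports above.

    from pathlib import Path

    from sl_shared_assets.tools import ascend_tyche_data

    ascend_tyche_data(
        root_directory=Path("/data/Tyche"),                    # old Tyche project root (placeholder)
        output_root_directory=Path("/data/sun_lab_projects"),  # modern projects root (placeholder)
        server_root_directory=Path("/mnt/biohpc_storage"),     # SMB-mounted BioHPC share (placeholder)
    )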
sl_shared_assets/tools/packaging_tools.pyi
DELETED
@@ -1,52 +0,0 @@
-from pathlib import Path
-
-def _calculate_file_checksum(base_directory: Path, file_path: Path) -> tuple[str, bytes]:
-    """Calculates xxHash3-128 checksum for a single file and its path relative to the base directory.
-
-    This function is passed to parallel workers used by the calculate_directory_hash() method that iteratively
-    calculates the checksum for all files inside a directory. Each call to this function returns the checksum for the
-    target file, which includes both the contents of the file and its path relative to the base directory.
-
-    Args:
-        base_directory: The path to the base (root) directory which is being checksummed by the main
-            'calculate_directory_checksum' function.
-        file_path: The absolute path to the target file.
-
-    Returns:
-        A tuple with two elements. The first element is the path to the file relative to the base directory. The second
-        element is the xxHash3-128 checksum that covers the relative path and the contents of the file.
-    """
-
-def calculate_directory_checksum(
-    directory: Path, num_processes: int | None = None, batch: bool = False, save_checksum: bool = True
-) -> str:
-    """Calculates xxHash3-128 checksum for the input directory, which includes the data of all contained files and
-    the directory structure information.
-
-    This function is used to generate a checksum for the raw_data directory of each experiment or training session.
-    Checksums are used to verify the session data integrity during transmission between the PC that acquired the data
-    and long-term storage locations, such as the Synology NAS or the BioHPC server. The function can be configured to
-    write the generated checksum as a hexadecimal string to the ax_checksum.txt file stored at the highest level of the
-    input directory.
-
-    Note:
-        This method uses multiprocessing to efficiently parallelize checksum calculation for multiple files. In
-        combination with xxHash3, this achieves a significant speedup over more common checksums, such as MD5 and
-        SHA256. Note that xxHash3 is not suitable for security purposes and is only used to ensure data integrity.
-
-        The method notifies the user about the checksum calculation process via the terminal.
-
-        The returned checksum accounts for both the contents of each file and the layout of the input directory
-        structure.
-
-    Args:
-        directory: The Path to the directory to be checksummed.
-        num_processes: The number of CPU processes to use for parallelizing checksum calculation. If set to None, the
-            function defaults to using (logical CPU count - 4).
-        batch: Determines whether the function is called as part of batch-processing multiple directories. This is used
-            to optimize progress reporting to avoid cluttering the terminal.
-        save_checksum: Determines whether the checksum should be saved (written to) a .txt file.
-
-    Returns:
-        The xxHash3-128 checksum for the input directory as a hexadecimal string.
-    """
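A short sketch of the removed checksum helper; the session path is a placeholder. With save_checksum=True, the hexadecimal digest is also written to ax_checksum.txt at the top of the directory, per the docstring.

    from pathlib import Path

    from sl_shared_assets.tools import calculate_directory_checksum

    checksum = calculate_directory_checksum(
        directory=Path("/data/session_001/raw_data"),  # placeholder session directory
        num_processes=None,                            # defaults to (logical CPU count - 4)
        save_checksum=True,
    )
    print(checksum)  # xxHash3-128 digest as a hexadecimal string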
sl_shared_assets/tools/transfer_tools.pyi
DELETED
@@ -1,53 +0,0 @@
-from pathlib import Path
-
-from .packaging_tools import calculate_directory_checksum as calculate_directory_checksum
-
-def _transfer_file(source_file: Path, source_directory: Path, destination_directory: Path) -> None:
-    """Copies the input file from the source directory to the destination directory while preserving the file metadata.
-
-    This is a worker method used by the transfer_directory() method to move multiple files in parallel.
-
-    Notes:
-        If the file is found under a hierarchy of subdirectories inside the input source_directory, that hierarchy will
-        be preserved in the destination directory.
-
-    Args:
-        source_file: The file to be copied.
-        source_directory: The root directory where the file is located.
-        destination_directory: The destination directory where to move the file.
-    """
-
-def transfer_directory(source: Path, destination: Path, num_threads: int = 1, verify_integrity: bool = True) -> None:
-    """Copies the contents of the input directory tree from source to destination while preserving the folder
-    structure.
-
-    This function is used to assemble the experimental data from all remote machines used in the acquisition process on
-    the VRPC before the data is preprocessed. It is also used to transfer the preprocessed data from the VRPC to the
-    SynologyNAS and the Sun lab BioHPC server.
-
-    Notes:
-        This method recreates the moved directory hierarchy on the destination if the hierarchy does not exist. This is
-        done before copying the files.
-
-        The method executes a multithreading copy operation. It does not clean up the source files. That job is handed
-        to the specific preprocessing function from the sl_experiment or sl-forgery libraries that calls this function.
-
-        If the method is configured to verify transferred file integrity, it reruns the xxHash3-128 checksum calculation
-        and compares the returned checksum to the one stored in the source directory. The method assumes that all input
-        directories contain the 'ax_checksum.txt' file that stores the 'source' directory checksum at the highest level
-        of the input directory tree.
-
-    Args:
-        source: The path to the directory that needs to be moved.
-        destination: The path to the destination directory where to move the contents of the source directory.
-        num_threads: The number of threads to use for parallel file transfer. This number should be set depending on the
-            type of transfer (local or remote) and is not guaranteed to provide improved transfer performance. For local
-            transfers, setting this number above 1 will likely provide a performance boost. For remote transfers using
-            a single TCP / IP socket (such as non-multichannel SMB protocol), the number should be set to 1.
-        verify_integrity: Determines whether to perform integrity verification for the transferred files. Note,
-            integrity verification is a time-consuming process and generally would not be a concern for most runtimes.
-            Therefore, it is often fine to disable this option to optimize method runtime speed.
-
-    Raises:
-        RuntimeError: If the transferred files do not pass the xxHas3-128 checksum integrity verification.
-    """
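Finally, a hedged sketch of the removed transfer helper; both paths are placeholders, and num_threads=1 reflects the docstring's guidance for single-socket SMB transfers.

    from pathlib import Path

    from sl_shared_assets.tools import transfer_directory

    transfer_directory(
        source=Path("/data/session_001/raw_data"),                             # placeholder source
        destination=Path("/mnt/biohpc_storage/project/session_001/raw_data"),  # placeholder destination
        num_threads=1,
        verify_integrity=True,  # re-hashes the copy and compares it against ax_checksum.txt
    )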