octopi 1.0.tar.gz → 1.1.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release. This version of octopi might be problematic.
- octopi-1.1/PKG-INFO +108 -0
- octopi-1.1/README.md +69 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/common.py +5 -5
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/create_slurm_submission.py +15 -7
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_extract_mb_picks.py +17 -40
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_localize.py +32 -48
- {octopi-1.0 → octopi-1.1}/octopi/extract/localize.py +17 -2
- {octopi-1.0 → octopi-1.1}/octopi/io.py +1 -1
- {octopi-1.0 → octopi-1.1}/octopi/main.py +1 -1
- {octopi-1.0 → octopi-1.1}/octopi/processing/create_targets_from_picks.py +8 -1
- {octopi-1.0 → octopi-1.1}/octopi/processing/downsample.py +6 -10
- {octopi-1.0 → octopi-1.1}/octopi/pytorch/model_search_submitter.py +11 -11
- {octopi-1.0 → octopi-1.1}/octopi/pytorch/segmentation.py +12 -8
- {octopi-1.0 → octopi-1.1}/octopi/pytorch/trainer.py +7 -1
- {octopi-1.0 → octopi-1.1}/pyproject.toml +6 -1
- octopi-1.0/PKG-INFO +0 -209
- octopi-1.0/README.md +0 -173
- {octopi-1.0 → octopi-1.1}/LICENSE +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/datasets/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/datasets/augment.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/datasets/cached_datset.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/datasets/dataset.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/datasets/generators.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/datasets/mixup.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/datasets/multi_config_generator.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_create_targets.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_evaluate.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_extract_midpoint.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_optuna.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_segment_predict.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/entry_points/run_train.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/extract/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/extract/membranebound_extract.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/extract/midpoint_extract.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/losses.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/AttentionUnet.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/MedNeXt.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/ModelTemplate.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/SegResNet.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/Unet.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/UnetPlusPlus.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/models/common.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/processing/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/processing/evaluate.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/processing/importers.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/processing/my_metrics.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/processing/segmentation_from_picks.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/processing/writers.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/pytorch/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/pytorch/hyper_search.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/pytorch_lightning/__init__.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/pytorch_lightning/optuna_pl_ddp.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/pytorch_lightning/train_pl.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/stopping_criteria.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/submit_slurm.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/utils.py +0 -0
- {octopi-1.0 → octopi-1.1}/octopi/visualization_tools.py +0 -0
octopi-1.1/PKG-INFO
ADDED
@@ -0,0 +1,108 @@
+Metadata-Version: 2.3
+Name: octopi
+Version: 1.1
+Summary: Model architecture exploration for cryoET particle picking
+License: MIT
+Author: Jonathan Schwartz
+Requires-Python: >=3.9,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Dist: copick
+Requires-Dist: ipywidgets
+Requires-Dist: kaleido
+Requires-Dist: matplotlib
+Requires-Dist: mlflow (==2.17.0)
+Requires-Dist: monai-weekly (==1.5.dev2448)
+Requires-Dist: mrcfile
+Requires-Dist: multiprocess
+Requires-Dist: nibabel
+Requires-Dist: optuna (==4.0.0)
+Requires-Dist: optuna-integration[botorch,pytorch-lightning]
+Requires-Dist: pandas
+Requires-Dist: plotly
+Requires-Dist: python-dotenv
+Requires-Dist: pytorch-lightning (==2.4.0)
+Requires-Dist: requests (>=2.25.1,<3.0.0)
+Requires-Dist: seaborn
+Requires-Dist: torch-ema
+Requires-Dist: tqdm
+Project-URL: Documentation, https://chanzuckerberg.github.io/octopi/
+Project-URL: Homepage, https://github.com/chanzuckerberg/octopi
+Project-URL: Issues, https://github.com/chanzuckerberg/octopi/issues
+Description-Content-Type: text/markdown
+
+# OCTOPI 🐙🐙🐙
+
+[](https://github.com/chanzuckerberg/octopi/raw/main/LICENSE)
+[](https://pypi.org/project/octopi)
+[](https://www.python.org/)
+
+**O**bject dete**CT**ion **O**f **P**rote**I**ns. A deep learning framework for Cryo-ET 3D particle picking with autonomous model exploration capabilities.
+
+## 🚀 Introduction
+
+octopi addresses a critical bottleneck in cryo-electron tomography (cryo-ET) research: the efficient identification and extraction of proteins within complex cellular environments. As advances in cryo-ET enable the collection of thousands of tomograms, the need for automated, accurate particle picking has become increasingly urgent.
+
+Our deep learning-based pipeline streamlines the training and execution of 3D autoencoder models specifically designed for cryo-ET particle picking. Built on [copick](https://github.com/copick/copick), a storage-agnostic API, octopi seamlessly accesses tomograms and segmentations across local and remote environments.
+
+## 🧩 Core Features
+
+- **3D U-Net Training**: Train and evaluate custom 3D U-Net models for particle segmentation
+- **Automatic Architecture Search**: Explore optimal model configurations using Bayesian optimization via Optuna
+- **Flexible Data Access**: Seamlessly work with tomograms from local storage or remote data portals
+- **HPC Ready**: Built-in support for SLURM-based clusters
+- **Experiment Tracking**: Integrated MLflow support for monitoring training and optimization
+- **Dual Interface**: Use via command-line or Python API
+
+## 🚀 Quick Start
+
+### Installation
+
+```bash
+pip install octopi
+```
+
+### Basic Usage
+
+octopi provides two main command-line interfaces:
+
+```bash
+# Main CLI for training, inference, and data processing
+octopi --help
+```
+
+The main `octopi` command provides subcommands for:
+- Data import and preprocessing
+- Training label preparation
+- Model training and exploration
+- Inference and particle localization
+
+```bash
+# HPC-specific CLI for submitting jobs to SLURM clusters
+octopi-slurm --help
+```
+
+The `octopi-slurm` command provides utilities for:
+- Submitting training jobs to SLURM clusters
+- Managing distributed inference tasks
+- Handling batch processing on HPC systems
+
+## 📚 Documentation
+
+For detailed documentation, tutorials, CLI and API reference, visit our [documentation](https://chanzuckerberg.github.io/octopi/).
+
+## 🤝 Contributing
+
+This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.
+
+## 🔒 Security
+
+If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.
+
+
+
octopi-1.1/README.md
ADDED
@@ -0,0 +1,69 @@
+# OCTOPI 🐙🐙🐙
+
+[](https://github.com/chanzuckerberg/octopi/raw/main/LICENSE)
+[](https://pypi.org/project/octopi)
+[](https://www.python.org/)
+
+**O**bject dete**CT**ion **O**f **P**rote**I**ns. A deep learning framework for Cryo-ET 3D particle picking with autonomous model exploration capabilities.
+
+## 🚀 Introduction
+
+octopi addresses a critical bottleneck in cryo-electron tomography (cryo-ET) research: the efficient identification and extraction of proteins within complex cellular environments. As advances in cryo-ET enable the collection of thousands of tomograms, the need for automated, accurate particle picking has become increasingly urgent.
+
+Our deep learning-based pipeline streamlines the training and execution of 3D autoencoder models specifically designed for cryo-ET particle picking. Built on [copick](https://github.com/copick/copick), a storage-agnostic API, octopi seamlessly accesses tomograms and segmentations across local and remote environments.
+
+## 🧩 Core Features
+
+- **3D U-Net Training**: Train and evaluate custom 3D U-Net models for particle segmentation
+- **Automatic Architecture Search**: Explore optimal model configurations using Bayesian optimization via Optuna
+- **Flexible Data Access**: Seamlessly work with tomograms from local storage or remote data portals
+- **HPC Ready**: Built-in support for SLURM-based clusters
+- **Experiment Tracking**: Integrated MLflow support for monitoring training and optimization
+- **Dual Interface**: Use via command-line or Python API
+
+## 🚀 Quick Start
+
+### Installation
+
+```bash
+pip install octopi
+```
+
+### Basic Usage
+
+octopi provides two main command-line interfaces:
+
+```bash
+# Main CLI for training, inference, and data processing
+octopi --help
+```
+
+The main `octopi` command provides subcommands for:
+- Data import and preprocessing
+- Training label preparation
+- Model training and exploration
+- Inference and particle localization
+
+```bash
+# HPC-specific CLI for submitting jobs to SLURM clusters
+octopi-slurm --help
+```
+
+The `octopi-slurm` command provides utilities for:
+- Submitting training jobs to SLURM clusters
+- Managing distributed inference tasks
+- Handling batch processing on HPC systems
+
+## 📚 Documentation
+
+For detailed documentation, tutorials, CLI and API reference, visit our [documentation](https://chanzuckerberg.github.io/octopi/).
+
+## 🤝 Contributing
+
+This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.
+
+## 🔒 Security
+
+If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.
+
+
{octopi-1.0 → octopi-1.1}/octopi/entry_points/common.py
@@ -8,9 +8,9 @@ def add_model_parameters(parser, octopi = False):
 
     # Add U-Net model parameters
     parser.add_argument("--Nclass", type=int, required=False, default=3, help="Number of prediction classes in the model")
-    parser.add_argument("--channels", type=utils.parse_int_list, required=False, default='32,64,
+    parser.add_argument("--channels", type=utils.parse_int_list, required=False, default='32,64,96,96', help="List of channel sizes")
     parser.add_argument("--strides", type=utils.parse_int_list, required=False, default='2,2,1', help="List of stride sizes")
-    parser.add_argument("--res-units", type=int, required=False, default=
+    parser.add_argument("--res-units", type=int, required=False, default=1, help="Number of residual units in the UNet")
     parser.add_argument("--dim-in", type=int, required=False, default=96, help="Input dimension for the UNet model")
 
 def inference_model_parameters(parser):
@@ -24,7 +24,7 @@ def add_train_parameters(parser, octopi = False):
     """
     Add training parameters to the parser.
     """
-    parser.add_argument("--num-epochs", type=int, required=False, default=
+    parser.add_argument("--num-epochs", type=int, required=False, default=1000, help="Number of training epochs")
     parser.add_argument("--val-interval", type=int, required=False, default=10, help="Interval for validation metric calculations")
     parser.add_argument("--tomo-batch-size", type=int, required=False, default=15, help="Number of tomograms to load per epoch for training")
     parser.add_argument("--best-metric", type=str, default='avg_f1', required=False, help="Metric to Monitor for Determining Best Model. To track fBetaN, use fBetaN with N as the beta-value.")
@@ -32,8 +32,8 @@ def add_train_parameters(parser, octopi = False):
     if not octopi:
         parser.add_argument("--num-tomo-crops", type=int, required=False, default=16, help="Number of tomogram crops to use per patch")
         parser.add_argument("--lr", type=float, required=False, default=1e-3, help="Learning rate for the optimizer")
-        parser.add_argument("--tversky-alpha", type=float, required=False, default=0.
-        parser.add_argument("--model-save-path", required=
+        parser.add_argument("--tversky-alpha", type=float, required=False, default=0.3, help="Alpha parameter for the Tversky loss")
+        parser.add_argument("--model-save-path", required=False, default='results', help="Path to model save directory")
     else:
         parser.add_argument("--num-trials", type=int, default=10, required=False, help="Number of trials for architecture search (default: 10).")

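The restored defaults above flow through `utils.parse_int_list`, which converts comma-separated flag values such as `'32,64,96,96'` into integer lists. The helper itself is not part of this diff, so the following is a minimal sketch of how such a `type` callable typically looks, with a hypothetical `parse_int_list`; note that argparse also applies the `type` callable to string defaults, which is why a plain string default works here.

```python
import argparse

def parse_int_list(value: str) -> list[int]:
    """Hypothetical stand-in for utils.parse_int_list: '32,64,96,96' -> [32, 64, 96, 96]."""
    return [int(v) for v in value.split(",") if v.strip()]

parser = argparse.ArgumentParser()
# argparse runs `type` on string defaults too, so default='32,64,96,96' parses cleanly.
parser.add_argument("--channels", type=parse_int_list, required=False, default="32,64,96,96")
print(parser.parse_args([]).channels)                        # [32, 64, 96, 96]
print(parser.parse_args(["--channels", "16,32"]).channels)   # [16, 32]
```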
{octopi-1.0 → octopi-1.1}/octopi/entry_points/create_slurm_submission.py
@@ -16,19 +16,27 @@ def create_train_script(args):
 
     command = f"""
     octopi train \\
+        {strconfigs} \\
         --model-save-path {args.model_save_path} \\
-        --target-info {args.target_info} \\
-        --voxel-size {args.voxel_size} --tomo-
-        --best-metric {args.best_metric} --num-epochs {args.num_epochs} --val-interval {args.val_interval} \\
+        --target-info {','.join(args.target_info)} \\
+        --voxel-size {args.voxel_size} --tomo-alg {args.tomo_alg} --Nclass {args.Nclass} \\
         --tomo-batch-size {args.tomo_batch_size} --num-tomo-crops {args.num_tomo_crops} \\
-        {
-    """
+        --best-metric {args.best_metric} --num-epochs {args.num_epochs} --val-interval {args.val_interval} \\
+    """
 
     # If a model config is provided, use it to build the model
     if args.model_config is not None:
         command += f" --model-config {args.model_config}"
     else:
-
+        channels = ",".join(map(str, args.channels))
+        strides = ",".join(map(str, args.strides))
+        command += (
+            f" --tversky-alpha {args.tversky_alpha}"
+            f" --channels {channels}"
+            f" --strides {strides}"
+            f" --dim-in {args.dim_in}"
+            f" --res-units {args.res_units}"
+        )
 
     # If Model Weights are provided, use them to initialize the model
     if args.model_weights is not None and args.model_config is not None:
@@ -240,4 +248,4 @@ def download_dataportal_slurm():
     """
     parser_description = "Create a SLURM script for downloading tomograms from the Dataportal"
     args = cli_dataportal_parser(parser_description, add_slurm=True)
-    create_download_dataportal_script(args)
+    create_download_dataportal_script(args)

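For illustration, here is roughly the command string the reworked `create_train_script` assembles. The flag set mirrors the hunk above; the values and the `SimpleNamespace` stand-in are made up:

```python
# Hypothetical argparse namespace; values are illustrative, not from the package.
from types import SimpleNamespace

args = SimpleNamespace(
    model_save_path="results",
    target_info=["remotetargets", "train-octopi", "1"],  # now a list, hence ','.join(...)
    channels=[32, 64, 96, 96],
    strides=[2, 2, 1],
)

command = f"octopi train --model-save-path {args.model_save_path}"
command += f" --target-info {','.join(args.target_info)}"
command += f" --channels {','.join(map(str, args.channels))}"
command += f" --strides {','.join(map(str, args.strides))}"
print(command)
# octopi train --model-save-path results --target-info remotetargets,train-octopi,1
#              --channels 32,64,96,96 --strides 2,2,1
```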
{octopi-1.0 → octopi-1.1}/octopi/entry_points/run_extract_mb_picks.py
@@ -30,46 +30,23 @@ def extract_membrane_bound_picks(
     if n_procs is None:
         n_procs = min(mp.cpu_count(), n_run_ids)
     print(f"Using {n_procs} processes to parallelize across {n_run_ids} run IDs.")
-
-    #
-    with
-            target=extract.process_membrane_bound_extract,
-            args=(run,
-                  voxel_size,
-                  picks_info,
-                  membrane_info,
-                  organelle_info,
-                  save_user_id,
-                  save_session_id,
-                  distance_threshold),
-        )
-        processes.append(p)
-
-    for p in processes:
-        p.start()
-
-    for p in processes:
-        p.join()
-
-    for p in processes:
-        p.close()
-
-        # Update tqdm progress bar
-        pbar.update(len(processes))
+
+    # Run Membrane-Protein Isolation - Main Parallelization Loop
+    with mp.Pool(processes=n_procs) as pool:
+        with tqdm(total=n_run_ids, desc="Membrane-Protein Isolation", unit="run") as pbar:
+            worker_func = lambda run_id: extract.process_membrane_bound_extract(
+                root.get_run(run_id),
+                voxel_size,
+                picks_info,
+                membrane_info,
+                organelle_info,
+                save_user_id,
+                save_session_id,
+                distance_threshold
+            )
+
+            for _ in pool.imap_unordered(worker_func, run_ids, chunksize=1):
+                pbar.update(1)
 
     print('Extraction of Membrane-Bound Proteins Complete!')

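Both this file and `run_localize.py` replace a hand-rolled batch of `mp.Process` objects (with separate start/join/close loops and a per-batch progress update) with a worker pool that streams completions back one at a time, so the progress bar advances per finished run. A minimal, self-contained sketch of the pattern, with a stand-in worker instead of octopi's extract module:

```python
import multiprocess as mp   # dill-based fork of multiprocessing, as imported in these files
from tqdm import tqdm

def process_one(run_id):
    # Stand-in for extract.process_membrane_bound_extract(root.get_run(run_id), ...)
    return run_id

if __name__ == "__main__":
    run_ids = [f"run_{i}" for i in range(8)]
    n_procs = min(mp.cpu_count(), len(run_ids))

    with mp.Pool(processes=n_procs) as pool:
        with tqdm(total=len(run_ids), desc="Membrane-Protein Isolation", unit="run") as pbar:
            # imap_unordered yields results as workers finish, so the bar tracks real progress.
            for _ in pool.imap_unordered(process_one, run_ids, chunksize=1):
                pbar.update(1)
```

Passing a lambda as the worker, as the diff does, only works because `multiprocess` serializes functions with dill; the standard-library `multiprocessing` cannot pickle lambdas.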
{octopi-1.0 → octopi-1.1}/octopi/entry_points/run_localize.py
@@ -5,6 +5,7 @@ import copick, argparse, pprint
 from typing import List, Tuple
 import multiprocess as mp
 from tqdm import tqdm
+import os
 
 def pick_particles(
     copick_config_path: str,
@@ -40,56 +41,39 @@ def pick_particles(
     print(', '.join([f'{obj[0]} (Label: {obj[1]})' for obj in objects]) + '\n')
 
     # Either Specify Input RunIDs or Run on All RunIDs
-    if runIDs:
+    if runIDs:
+        print('Running Localization on the Following RunIDs: ' + ', '.join(runIDs) + '\n')
+        run_ids = runIDs
+    else:
+        run_ids = [run.name for run in root.runs if run.get_voxel_spacing(voxel_size) is not None]
+        skipped_run_ids = [run.name for run in root.runs if run.get_voxel_spacing(voxel_size) is None]
+
+        if skipped_run_ids:
+            print(f"Warning: skipping runs with no voxel spacing {voxel_size}: {skipped_run_ids}")
+
+    # Nprocesses shouldnt exceed computation resource or number of available runs
     n_run_ids = len(run_ids)
+    n_procs = min(mp.mp.cpu_count(), n_procs, n_run_ids)
 
-    #
-    if n_procs is None:
-        n_procs = min(int(mp.cpu_count()//4), n_run_ids)
+    # Run Localization - Main Parallelization Loop
     print(f"Using {n_procs} processes to parallelize across {n_run_ids} run IDs.")
-            target=localize.processs_localization,
-            args=(run,
-                  objects,
-                  seg_info,
-                  method,
-                  voxel_size,
-                  filter_size,
-                  radius_min_scale,
-                  radius_max_scale,
-                  pick_session_id,
-                  pick_user_id),
-        )
-        processes.append(p)
-
-    for p in processes:
-        p.start()
-
-    for p in processes:
-        p.join()
-
-    for p in processes:
-        p.close()
-
-        # Update tqdm progress bar
-        pbar.update(len(processes))
+    with mp.Pool(processes=n_procs) as pool:
+        with tqdm(total=n_run_ids, desc="Localization", unit="run") as pbar:
+            worker_func = lambda run_id: localize.processs_localization(
+                root.get_run(run_id),
+                objects,
+                seg_info,
+                method,
+                voxel_size,
+                filter_size,
+                radius_min_scale,
+                radius_max_scale,
+                pick_session_id,
+                pick_user_id
+            )
+
+            for _ in pool.imap_unordered(worker_func, run_ids, chunksize=1):
+                pbar.update(1)
 
     print('Localization Complete!')
@@ -110,7 +94,7 @@ def localize_parser(parser_description, add_slurm: bool = False):
     localize_group.add_argument("--radius-max-scale", type=float, default=1.0, required=False, help="Maximum radius scale for particles.")
     localize_group.add_argument("--filter-size", type=int, default=10, required=False, help="Filter size for localization.")
     localize_group.add_argument("--pick-objects", type=utils.parse_list, default=None, required=False, help="Specific Objects to Find Picks for.")
-    localize_group.add_argument("--n-procs", type=int, default=
+    localize_group.add_argument("--n-procs", type=int, default=8, required=False, help="Number of CPU processes to parallelize runs across. Defaults to the max number of cores available or available runs.")
 
     output_group = parser.add_argument_group("Output Arguments")
     output_group.add_argument("--pick-session-id", type=str, default='1', required=False, help="Session ID for the particle picks.")

{octopi-1.0 → octopi-1.1}/octopi/extract/localize.py
@@ -9,7 +9,7 @@ from octopi import io
 import scipy.ndimage as ndi
 from tqdm import tqdm
 import numpy as np
-import
+import gc
 
 def processs_localization(run,
                           objects,
@@ -107,7 +107,7 @@ def extract_particle_centroids_via_watershed(
     max_particle_size = (4 / 3) * np.pi * (max_particle_radius ** 3)
 
     # Create a binary mask for the specific segmentation label
-    binary_mask = (segmentation == segmentation_idx).astype(
+    binary_mask = (segmentation == segmentation_idx).astype(np.uint8)
 
     # Skip if the segmentation label is not present
     if np.sum(binary_mask) == 0:
@@ -117,7 +117,12 @@ def extract_particle_centroids_via_watershed(
     # Structuring element for erosion and dilation
     struct_elem = ball(1)
     eroded = binary_erosion(binary_mask, struct_elem)
+    del binary_mask
+    gc.collect()
+
     dilated = binary_dilation(eroded, struct_elem)
+    del eroded
+    gc.collect()
 
     # Distance transform and local maxima detection
     distance = ndi.distance_transform_edt(dilated)
@@ -125,7 +130,14 @@ def extract_particle_centroids_via_watershed(
 
     # Watershed segmentation
     markers, _ = ndi.label(local_max)
+    del local_max
+    markers = markers.astype(np.uint8)
+    gc.collect()
+
     watershed_labels = watershed(-distance, markers, mask=dilated)
+    del distance, markers, dilated
+    watershed_labels = watershed_labels.astype(np.uint8)
+    gc.collect()
 
     # Extract region properties and filter based on particle size
     all_centroids = []
@@ -135,6 +147,9 @@ def extract_particle_centroids_via_watershed(
         # Option 1: Use all centroids
         all_centroids.append(region.centroid)
 
+    del watershed_labels
+    gc.collect()
+
     return all_centroids
 
 def extract_particle_centroids_via_com(

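The surrounding function performs watershed-based centroid extraction for a single label, and each step above materializes another volume-sized array; the new `del`/`gc.collect()` calls retire each full-volume intermediate as soon as the next one is derived. A condensed sketch of the same pipeline, with local-maximum detection simplified to a `maximum_filter` comparison (using the CLI's default `--filter-size` of 10), since the exact detection code sits outside this hunk:

```python
import gc
import numpy as np
import scipy.ndimage as ndi
from skimage.morphology import ball, binary_erosion, binary_dilation
from skimage.segmentation import watershed
from skimage.measure import regionprops

def centroids_for_label(segmentation, label, filter_size=10):
    mask = (segmentation == label).astype(np.uint8)
    if mask.sum() == 0:
        return []
    opened = binary_dilation(binary_erosion(mask, ball(1)), ball(1))  # morphological opening
    del mask; gc.collect()                     # retire the full-volume mask immediately
    distance = ndi.distance_transform_edt(opened)
    local_max = (distance == ndi.maximum_filter(distance, size=filter_size)) & (distance > 0)
    markers, _ = ndi.label(local_max)          # seed one marker per local maximum
    del local_max; gc.collect()
    labels = watershed(-distance, markers, mask=opened)
    del distance, markers, opened; gc.collect()
    return [r.centroid for r in regionprops(labels)]
```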
{octopi-1.0 → octopi-1.1}/octopi/io.py
@@ -137,7 +137,7 @@ def get_segmentation_array(run,
     # No Segmentations Are Available, Result in Error
     if len(seg) == 0:
         # Get all available segmentations with their metadata
-        available_segs = run.get_segmentations(voxel_size=voxel_spacing)
+        available_segs = run.get_segmentations(voxel_size=float(voxel_spacing))
         seg_info = [(s.name, s.user_id, s.session_id) for s in available_segs]
 
         # Format the information for display

{octopi-1.0 → octopi-1.1}/octopi/main.py
@@ -33,7 +33,7 @@ def cli_main():
         "create-targets": (create_targets, "Generate segmentation targets from coordinates."),
         "train": (train_model, "Train a single U-Net model."),
         "model-explore": (model_explore, "Explore model architectures with Optuna / Bayesian Optimization."),
-        "
+        "segment": (inference, "Perform segmentation inference on tomograms."),
         "localize": (localize, "Perform localization of particles in tomograms."),
         "extract-mb-picks": (extract_mb_picks, "Extract MB Picks from tomograms."),
         "evaluate": (evaluate, "Evaluate the performance of a model."),

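The `cli_main` table maps each subcommand name to a `(handler, help)` pair; 1.1 fills in the `segment` entry that appears truncated on the removed side. As a sketch of the dispatch pattern this table implies (the wiring below is hypothetical, not octopi's actual code):

```python
import argparse

def segment(args): print("segment", args)
def localize(args): print("localize", args)

COMMANDS = {
    "segment": (segment, "Perform segmentation inference on tomograms."),
    "localize": (localize, "Perform localization of particles in tomograms."),
}

parser = argparse.ArgumentParser(prog="octopi")
sub = parser.add_subparsers(dest="command", required=True)
for name, (_, help_text) in COMMANDS.items():
    sub.add_parser(name, help=help_text)

args = parser.parse_args(["segment"])
COMMANDS[args.command][0](args)   # look up the handler and dispatch
```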
{octopi-1.0 → octopi-1.1}/octopi/processing/create_targets_from_picks.py
@@ -42,7 +42,11 @@ def generate_targets(
 
     # If runIDs are not provided, load all runs
     if run_ids is None:
-        run_ids = [run.name for run in root.runs]
+        run_ids = [run.name for run in root.runs if run.get_voxel_spacing(voxel_size) is not None]
+        skipped_run_ids = [run.name for run in root.runs if run.get_voxel_spacing(voxel_size) is None]
+
+        if skipped_run_ids:
+            print(f"Warning: skipping runs with no voxel spacing {voxel_size}: {skipped_run_ids}")
 
     # Iterate Over All Runs
     for runID in tqdm(run_ids):
@@ -87,6 +91,9 @@ def generate_targets(
             session_id=train_targets[target_name]["session_id"],
         )
 
+        # Filter out empty picks
+        query = [pick for pick in query if pick.points is not None]
+
         # Add Picks to Target
         for pick in query:
             numPicks += len(pick.points)

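This guard now appears in three files (`run_localize.py`, here, and `pytorch/segmentation.py`): runs lacking the requested voxel spacing are filtered out with a warning before the loop rather than failing inside it. The recurring pattern, extracted as a sketch that assumes only the copick-style `run.get_voxel_spacing(voxel_size)` accessor returning `None` when the spacing is absent:

```python
def runs_with_voxel_spacing(root, voxel_size):
    """Split root.runs by availability of the requested voxel spacing (sketch, not octopi API)."""
    kept, skipped = [], []
    for run in root.runs:
        (kept if run.get_voxel_spacing(voxel_size) is not None else skipped).append(run.name)
    if skipped:
        print(f"Warning: skipping runs with no voxel spacing {voxel_size}: {skipped}")
    return kept
```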
{octopi-1.0 → octopi-1.1}/octopi/processing/downsample.py
@@ -102,11 +102,6 @@ class FourierRescale:
         """
         in_depth, in_height, in_width = volume.shape[-3:]
 
-        # Check if dimensions are odd
-        d_is_odd = in_depth % 2
-        h_is_odd = in_height % 2
-        w_is_odd = in_width % 2
-
         # Calculate new dimensions
         extent_depth = in_depth * self.input_voxel_size[0]
         extent_height = in_height * self.input_voxel_size[1]
@@ -121,9 +116,10 @@ class FourierRescale:
         new_height = new_height - (new_height % 2)
         new_width = new_width - (new_width % 2)
 
-        # Calculate starting points
+        # Calculate starting points - properly centered around DC component
+        # No odd/even correction needed - just center the crop
+        start_d = (in_depth - new_depth) // 2
+        start_h = (in_height - new_height) // 2
+        start_w = (in_width - new_width) // 2
 
-        return start_d, start_h, start_w, new_depth, new_height, new_width
+        return start_d, start_h, start_w, new_depth, new_height, new_width

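The removed odd/even bookkeeping becomes a plain symmetric crop: once the target size is forced even, `start = (in_size - new_size) // 2` centers the retained band of the fftshifted spectrum on the DC component. A worked example with assumed voxel sizes (the `extent_*` lines above suggest the new size comes from the voxel-size ratio):

```python
in_depth, voxel_in, voxel_out = 201, 5.0, 10.0
extent = in_depth * voxel_in                    # physical extent is preserved (1005.0)
new_depth = int(extent / voxel_out)             # 100 samples at the coarser spacing
new_depth -= new_depth % 2                      # force an even crop size
start_d = (in_depth - new_depth) // 2           # 50: symmetric window around DC
stop_d = start_d + new_depth                    # 150
print(start_d, stop_d)                          # the shifted spectrum is cropped to [50, 150)
```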
{octopi-1.0 → octopi-1.1}/octopi/pytorch/model_search_submitter.py
@@ -16,16 +16,16 @@ class ModelSearchSubmit:
                  voxel_size: float,
                  Nclass: int,
                  model_type: str,
-                 trainRunIDs: List[str],
-                 validateRunIDs: List[str],
+                 best_metric: str = 'avg_f1',
+                 num_epochs: int = 1000,
+                 num_trials: int = 100,
+                 data_split: str = 0.8,
+                 random_seed: int = 42,
+                 val_interval: int = 10,
+                 tomo_batch_size: int = 15,
+                 trainRunIDs: List[str] = None,
+                 validateRunIDs: List[str] = None,
+                 mlflow_experiment_name: str = 'explore',
                  ):
         """
         Initialize the ModelSearch class for architecture search with Optuna.
@@ -207,7 +207,7 @@ class ModelSearchSubmit:
         # Run multi-GPU optimization
         study = self.get_optuna_study()
         study.optimize(
-            lambda trial: BayesianModelSearch(self.data_generator, self.model_type).multi_gpu_objective(
+            lambda trial: hyper_search.BayesianModelSearch(self.data_generator, self.model_type).multi_gpu_objective(
                 parent_run, trial,
                 self.num_epochs,
                 best_metric=self.best_metric,

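The second hunk is a namespace fix: `BayesianModelSearch` is now referenced through the `hyper_search` module. The surrounding pattern, building a fresh search object inside a lambda so extra arguments ride along with the `trial`, is standard Optuna usage; a minimal sketch with a toy objective in place of real training:

```python
import optuna

class BayesianModelSearch:
    """Toy stand-in for hyper_search.BayesianModelSearch; the real objective trains a model."""
    def multi_gpu_objective(self, trial, num_epochs):
        lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
        depth = trial.suggest_int("depth", 2, 5)
        return depth / (1.0 + abs(lr - 1e-3) * 1e3)   # pretend validation score

study = optuna.create_study(direction="maximize")
study.optimize(
    # The lambda closes over num_epochs, mirroring how the submitter forwards its settings.
    lambda trial: BayesianModelSearch().multi_gpu_objective(trial, num_epochs=10),
    n_trials=5,
)
print(study.best_params)
```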
{octopi-1.0 → octopi-1.1}/octopi/pytorch/segmentation.py
@@ -193,8 +193,12 @@ class Predictor:
 
         # If runIDs are not provided, load all runs
         if runIDs is None:
-            runIDs = [run.name for run in self.root.runs]
-
+            runIDs = [run.name for run in self.root.runs if run.get_voxel_spacing(voxel_spacing) is not None]
+            skippedRunIDs = [run.name for run in self.root.runs if run.get_voxel_spacing(voxel_spacing) is None]
+
+            if skippedRunIDs:
+                print(f"Warning: skipping runs with no voxel spacing {voxel_spacing}: {skippedRunIDs}")
+
         # Iterate over batches of runIDs
         for i in range(0, len(runIDs), num_tomos_per_batch):
 
@@ -227,9 +231,9 @@ class Predictor:
             lambda x: torch.rot90(x, k=1, dims=(3, 4)),  # 90° rotation
             lambda x: torch.rot90(x, k=2, dims=(3, 4)),  # 180° rotation
             lambda x: torch.rot90(x, k=3, dims=(3, 4)),  # 270° rotation
-            #
-            #
-            #
+            # lambda x: torch.flip(x, dims=(3,)),    # Flip along height (spatial_axis=1)
+            # lambda x: torch.flip(x, dims=(4,)),    # Flip along width (spatial_axis=2)
+            # lambda x: torch.flip(x, dims=(3, 4)),  # Flip along both height and width
         ]
 
         # Define inverse transformations (flip back to original orientation)
@@ -238,9 +242,9 @@ class Predictor:
             lambda x: torch.rot90(x, k=-1, dims=(2, 3)),  # Inverse of 90° (i.e. -90°)
             lambda x: torch.rot90(x, k=-2, dims=(2, 3)),  # Inverse of 180° (i.e. -180°)
             lambda x: torch.rot90(x, k=-3, dims=(2, 3)),  # Inverse of 270° (i.e. -270°)
-            #
-            #
-            #
+            # lambda x: torch.flip(x, dims=(2,)),    # Same as forward
+            # lambda x: torch.flip(x, dims=(3,)),    # Same as forward
+            # lambda x: torch.flip(x, dims=(2, 3)),  # Same as forward
         ]
 
         ###################################################################################################################################################

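These lists implement rotation-only test-time augmentation: each forward transform is paired with an inverse so predictions can be mapped back to the original orientation before merging, and the flip variants remain commented out, now with their definitions written out. One detail to note: the forward transforms rotate on dims `(3, 4)` while the inverses use `(2, 3)`, which only lines up if the model output carries one fewer leading dimension than the input. A self-contained sketch that assumes matching input and output shapes:

```python
import torch

def tta_predict(model, x):
    """Average logits over in-plane rotations (sketch; assumes output shape matches input)."""
    preds = []
    for k in (0, 1, 2, 3):
        y = model(torch.rot90(x, k=k, dims=(3, 4)))      # rotate the input volume in-plane
        preds.append(torch.rot90(y, k=-k, dims=(3, 4)))  # un-rotate the prediction
    return torch.stack(preds).mean(dim=0)

model = torch.nn.Identity()            # toy stand-in for the segmentation network
x = torch.randn(1, 1, 8, 16, 16)       # (B, C, D, H, W)
print(tta_predict(model, x).shape)     # torch.Size([1, 1, 8, 16, 16])
```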
{octopi-1.0 → octopi-1.1}/octopi/pytorch/trainer.py
@@ -101,6 +101,9 @@ class ModelTrainer:
                     device=self.device
                 )
 
+                del val_inputs
+                torch.cuda.empty_cache()
+
                 # Compute the loss for this batch
                 loss = self.loss_function(val_outputs, val_labels)  # Assuming self.loss_function is defined
                 val_loss += loss.item()  # Accumulate the loss
@@ -112,6 +115,9 @@ class ModelTrainer:
                 # Compute metrics
                 self.metrics_function(y_pred=metric_val_outputs, y=metric_val_labels)
 
+                del val_labels, val_outputs, metric_val_outputs, metric_val_labels
+                torch.cuda.empty_cache()
+
                 # # Contains recall, precision, and f1 for each class
                 metric_values = self.metrics_function.aggregate(reduction='mean_batch')
@@ -435,4 +441,4 @@ class ModelTrainer:
             best_metric = 'avg_f1'
 
         return best_metric
-
+

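The trainer edits follow one rule: drop the reference to each validation tensor as soon as the loss or metrics have consumed it, then call `torch.cuda.empty_cache()` so the freed blocks leave PyTorch's caching allocator. `empty_cache()` releases nothing that is still referenced, which is why the `del` must come first. A compact sketch of a validation loop with the same hygiene:

```python
import torch

@torch.no_grad()
def validate(model, loader, loss_fn, device="cuda"):
    model.eval()
    total, n = 0.0, 0
    for inputs, labels in loader:
        outputs = model(inputs.to(device))
        del inputs                      # inputs are no longer needed once outputs exist
        torch.cuda.empty_cache()        # return unreferenced cached blocks to the driver
        total += loss_fn(outputs, labels.to(device)).item()
        n += 1
        del outputs, labels             # drop per-batch tensors before the next iteration
        torch.cuda.empty_cache()
    return total / max(n, 1)
```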
{octopi-1.0 → octopi-1.1}/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "octopi"
-version = "1.0"
+version = "1.1"
 description = "Model architecture exploration for cryoET particle picking"
 authors = ["Jonathan Schwartz", "Kevin Zhao"]
 license = "MIT"
@@ -41,3 +41,8 @@ octopi-slurm = "octopi.main:cli_slurm_main"
 [build-system]
 requires = ["poetry-core>=1.0.0"]
 build-backend = "poetry.core.masonry.api"
+
+[tool.poetry.urls]
+"Homepage" = "https://github.com/chanzuckerberg/octopi"
+"Documentation" = "https://chanzuckerberg.github.io/octopi/"
+"Issues" = "https://github.com/chanzuckerberg/octopi/issues"

octopi-1.0/PKG-INFO
DELETED
@@ -1,209 +0,0 @@
-Metadata-Version: 2.3
-Name: octopi
-Version: 1.0
-Summary: Model architecture exploration for cryoET particle picking
-License: MIT
-Author: Jonathan Schwartz
-Requires-Python: >=3.9,<4.0
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Requires-Dist: copick
-Requires-Dist: ipywidgets
-Requires-Dist: kaleido
-Requires-Dist: matplotlib
-Requires-Dist: mlflow (==2.17.0)
-Requires-Dist: monai-weekly (==1.5.dev2448)
-Requires-Dist: mrcfile
-Requires-Dist: multiprocess
-Requires-Dist: nibabel
-Requires-Dist: optuna (==4.0.0)
-Requires-Dist: optuna-integration[botorch,pytorch-lightning]
-Requires-Dist: pandas
-Requires-Dist: plotly
-Requires-Dist: python-dotenv
-Requires-Dist: pytorch-lightning (==2.4.0)
-Requires-Dist: requests (>=2.25.1,<3.0.0)
-Requires-Dist: seaborn
-Requires-Dist: torch-ema
-Requires-Dist: tqdm
-Description-Content-Type: text/markdown
-
-# OCTOPI 🐙🐙🐙
-**O**bject dete**CT**ion **O**f **P**rote**I**ns. A deep learning framework for Cryo-ET 3D particle picking with autonomous model exploration capabilities.
-
-## 🚀 Introduction
-
-octopi addresses a critical bottleneck in cryo-electron tomography (cryo-ET) research: the efficient identification and extraction of proteins within complex cellular environments. As advances in cryo-ET enable the collection of thousands of tomograms, the need for automated, accurate particle picking has become increasingly urgent.
-
-Our deep learning-based pipeline streamlines the training and execution of 3D autoencoder models specifically designed for cryo-ET particle picking. Built on [copick](https://github.com/copick/copick), a storage-agnostic API, octopi seamlessly accesses tomograms and segmentations across local and remote environments.
-
-## 🧩 Features
-
-octopi offers a modular, deep learning-driven pipeline for:
-* Training and evaluating custom 3D U-Net models for particle segmentation.
-* Automatically exploring model architectures using Bayesian optimization via Optuna.
-* Performing inference for both semantic segmentation and particle localization.
-
-octopi empowers researchers to navigate the dense, intricate landscapes of cryo-ET datasets with unprecedented precision and efficiency without manual trial and error.
-
-## Getting Started
-### Installation
-
-*Octopi* is available on PyPI.
-```
-pip install octopi
-```
-
-## 📚 Usage
-
-octopi provides a clean, scriptable command-line interface. Run the following command to view all available subcommands:
-```
-octopi --help
-```
-Each subcommand supports its own --help flag for detailed usage. To see practical examples of how to interface directly with the octopi API, explore the notebooks/ folder.
-
-If you're running octopi on an HPC cluster, several SLURM-compatible submission commands are available. You can view them by running:
-```
-octopi-slurm --help
-```
-This provides utilities for submitting training, inference, and localization jobs in SLURM-based environments.
-
-### 📥 Data Import & Preprocessing
-
-To train or run inference with octopi, your tomograms must be organized inside a CoPick project. octopi supports two primary methods for data ingestion, both of which include optional Fourier cropping to reduce resolution and accelerate downstream processing.
-
-If your tomograms are already processed and stored locally in .mrc format (e.g., from Warp, IMOD, or AreTomo), you can import them into a new or existing CoPick project using:
-
-```
-octopi import-mrc-volumes \
-    --input-folder /path/to/mrc/files --config /path/to/config.json \
-    --target-tomo-type denoised --input-voxel-size --output-voxel-size 10
-```
-
-octopi also can process tomograms that are hosted on the data portal. Users can download tomograms onto their own remote machine especially if they would like to downsample the tomograms to a lower resolution for speed and memory. You can download and process the tomograms using:
-```
-octopi download-dataportal \
-    --config /path/to/config.json --datasetID 10445 --overlay-path path/to/saved/zarrs \
-    --input-voxel-size 5 --output-voxel-size 10 \
-    --dataportal-name wbp --target-tomotype wbp
-```
-
-### 📁 Training Labels Preparation
-
-Use `octopi create-targets` to create semantic masks for proteins of interest using annotation metadata. In this example lets generate picks segmentations for dataset 10439 from the CZ cryoET Dataportal (only need to run this step once).
-```
-octopi create-targets \
-    --config config.json \
-    --target apoferritin --target beta-galactosidase,slabpick,1 \
-    --target ribosome,pytom,0 --target virus-like-particle,pytom,0 \
-    --seg-target membrane \
-    --tomo-alg wbp --voxel-size 10 \
-    --target-session-id 1 --target-segmentation-name remotetargets \
-    --target-user-id train-octopi
-```
-
-### 🧠 Training a single 3D U-Net model
-Train a 3D U-Net model on the prepared datasets using the prepared target segmentations. We can use tomograms derived from multiple copick projects.
-```
-octopi train-model \
-    --config experiment,config1.json \
-    --config simulation,config2.json \
-    --voxel-size 10 --tomo-alg wbp --Nclass 8 \
-    --tomo-batch-size 50 --num-epochs 100 --val-interval 10 \
-    --target-info remotetargets,train-octopi,1
-```
-Outputs will include model weights (.pth), logs, and training metrics.
-
-### 🔍 Model exploration with Optuna
-
-octopi🐙 supports automatic neural architecture search using Optuna, enabling efficient discovery of optimal 3D U-Net configurations through Bayesian optimization. This allows users to maximize segmentation accuracy without manual tuning.
-
-To launch a model exploration job:
-```
-octopi model-explore \
-    --config experiment,/mnt/dataportal/ml_challenge/config.json \
-    --config simulation,/mnt/dataportal/synthetic_ml_challenge/config.json \
-    --voxel-size 10 --tomo-alg wbp --Nclass 8 \
-    --model-save-path train_results
-```
-Each trial evaluates a different architecture and logs:
-• Segmentation performance metrics
-• Model weights and configs
-• Training curves and validation loss
-
-🔬 Trials are automatically tracked with MLflow and saved under the specified `--model-save-path`.
-
-#### Optuna Dashboard
-
-To quickly asses the exploration results and observe which trials results the best architectures, Optuna provides a dashboard that summarizes all the information on a dashboard. The instrucutions to access the dashboard are available here - https://optuna-dashboard.readthedocs.io/en/latest/getting-started.html, it is recommended to use either VS-Code extension or CLI.
-
-#### 📊 MLflow experiment tracking
-
-To use CZI cloud MLflow tracker, add a `.env` in the root directory like below. You can get a CZI MLflow access token from [here](https://mlflow.cw.use4-prod.si.czi.technology/api/2.0/mlflow/users/access-token) (note that a new token will be generated everytime you open this site).
-```
-MLFLOW_TRACKING_USERNAME = <Your_CZ_email>
-MLFLOW_TRACKING_PASSWORD = <Your_mlflow_access_token>
-```
-
-octopi supports MLflow for logging and visualizing model training and hyperparameter search results, including:
-• Training loss/validation metrics over time
-• Model hyperparameters and architecture details
-• Trial comparison (e.g., best performing model)
-
-You can use either a local MLflow instance, a remote (HPC) instance, or the CZI cloud server:
-
-#### 🧪 Local MLflow Dashboard
-
-To inspect results locally: `mlflow ui` and open http://localhost:5000 in your browser.
-
-#### 🖥️ HPC Cluster MLflow Access (Remote via SSH tunnel)
-
-If running octopi on a remote cluster (e.g., Biohub Bruno), forward the MLflow port.
-On your local machine:
-`ssh -L 5000:localhost:5000 remote_username@remote_host` (in the case of Bruno the remote would be `login01.czbiohub.org`).
-
-Then on the remote terminal (login node): ` mlflow ui --host 0.0.0.0 --port 5000` to launch the MLFlow dashboard on a local borwser.
-
-#### ☁️ CZI coreweave cluser
-
-For the CZI coreweave cluser, MLflow is already hosted. Go to the CZI [mlflow server](https://mlflow.cw.use4-prod.si.czi.technology/).
-
-🔐 A .env file is required to authenticate (see Getting Started section).
-📁 Be sure to register your project name in MLflow before launching runs.
-
-### 🔮 Segmentation
-Generate segmentation prediction masks for tomograms in a given copick project.
-```
-octopi inference \
-    --config config.json \
-    --seg-info predict,unet,1 \
-    --model-config train_results/best_model_config.yaml \
-    --model-weights train_results/best_model.pth \
-    --voxel-size 10 --tomo-alg wbp --tomo-batch-size 25
-```
-Output masks will be saved to the corresponding copick project under the `seg-info` input.
-
-### 📍 Localization
-Convert the segmentation masks into particle coordinates.
-```
-octopi localize \
-    --config config.json \
-    --pick-session-id 1 --pick-user-id unet \
-    --seg-info predict,unet,1
-```
-
-## Contributing
-
-This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.
-
-## Reporting Security Issues
-
-Please note: If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.
-
-
-
octopi-1.0/README.md
DELETED
@@ -1,173 +0,0 @@
-# OCTOPI 🐙🐙🐙
-**O**bject dete**CT**ion **O**f **P**rote**I**ns. A deep learning framework for Cryo-ET 3D particle picking with autonomous model exploration capabilities.
-
-## 🚀 Introduction
-
-octopi addresses a critical bottleneck in cryo-electron tomography (cryo-ET) research: the efficient identification and extraction of proteins within complex cellular environments. As advances in cryo-ET enable the collection of thousands of tomograms, the need for automated, accurate particle picking has become increasingly urgent.
-
-Our deep learning-based pipeline streamlines the training and execution of 3D autoencoder models specifically designed for cryo-ET particle picking. Built on [copick](https://github.com/copick/copick), a storage-agnostic API, octopi seamlessly accesses tomograms and segmentations across local and remote environments.
-
-## 🧩 Features
-
-octopi offers a modular, deep learning-driven pipeline for:
-* Training and evaluating custom 3D U-Net models for particle segmentation.
-* Automatically exploring model architectures using Bayesian optimization via Optuna.
-* Performing inference for both semantic segmentation and particle localization.
-
-octopi empowers researchers to navigate the dense, intricate landscapes of cryo-ET datasets with unprecedented precision and efficiency without manual trial and error.
-
-## Getting Started
-### Installation
-
-*Octopi* is available on PyPI.
-```
-pip install octopi
-```
-
-## 📚 Usage
-
-octopi provides a clean, scriptable command-line interface. Run the following command to view all available subcommands:
-```
-octopi --help
-```
-Each subcommand supports its own --help flag for detailed usage. To see practical examples of how to interface directly with the octopi API, explore the notebooks/ folder.
-
-If you're running octopi on an HPC cluster, several SLURM-compatible submission commands are available. You can view them by running:
-```
-octopi-slurm --help
-```
-This provides utilities for submitting training, inference, and localization jobs in SLURM-based environments.
-
-### 📥 Data Import & Preprocessing
-
-To train or run inference with octopi, your tomograms must be organized inside a CoPick project. octopi supports two primary methods for data ingestion, both of which include optional Fourier cropping to reduce resolution and accelerate downstream processing.
-
-If your tomograms are already processed and stored locally in .mrc format (e.g., from Warp, IMOD, or AreTomo), you can import them into a new or existing CoPick project using:
-
-```
-octopi import-mrc-volumes \
-    --input-folder /path/to/mrc/files --config /path/to/config.json \
-    --target-tomo-type denoised --input-voxel-size --output-voxel-size 10
-```
-
-octopi also can process tomograms that are hosted on the data portal. Users can download tomograms onto their own remote machine especially if they would like to downsample the tomograms to a lower resolution for speed and memory. You can download and process the tomograms using:
-```
-octopi download-dataportal \
-    --config /path/to/config.json --datasetID 10445 --overlay-path path/to/saved/zarrs \
-    --input-voxel-size 5 --output-voxel-size 10 \
-    --dataportal-name wbp --target-tomotype wbp
-```
-
-### 📁 Training Labels Preparation
-
-Use `octopi create-targets` to create semantic masks for proteins of interest using annotation metadata. In this example lets generate picks segmentations for dataset 10439 from the CZ cryoET Dataportal (only need to run this step once).
-```
-octopi create-targets \
-    --config config.json \
-    --target apoferritin --target beta-galactosidase,slabpick,1 \
-    --target ribosome,pytom,0 --target virus-like-particle,pytom,0 \
-    --seg-target membrane \
-    --tomo-alg wbp --voxel-size 10 \
-    --target-session-id 1 --target-segmentation-name remotetargets \
-    --target-user-id train-octopi
-```
-
-### 🧠 Training a single 3D U-Net model
-Train a 3D U-Net model on the prepared datasets using the prepared target segmentations. We can use tomograms derived from multiple copick projects.
-```
-octopi train-model \
-    --config experiment,config1.json \
-    --config simulation,config2.json \
-    --voxel-size 10 --tomo-alg wbp --Nclass 8 \
-    --tomo-batch-size 50 --num-epochs 100 --val-interval 10 \
-    --target-info remotetargets,train-octopi,1
-```
-Outputs will include model weights (.pth), logs, and training metrics.
-
-### 🔍 Model exploration with Optuna
-
-octopi🐙 supports automatic neural architecture search using Optuna, enabling efficient discovery of optimal 3D U-Net configurations through Bayesian optimization. This allows users to maximize segmentation accuracy without manual tuning.
-
-To launch a model exploration job:
-```
-octopi model-explore \
-    --config experiment,/mnt/dataportal/ml_challenge/config.json \
-    --config simulation,/mnt/dataportal/synthetic_ml_challenge/config.json \
-    --voxel-size 10 --tomo-alg wbp --Nclass 8 \
-    --model-save-path train_results
-```
-Each trial evaluates a different architecture and logs:
-• Segmentation performance metrics
-• Model weights and configs
-• Training curves and validation loss
-
-🔬 Trials are automatically tracked with MLflow and saved under the specified `--model-save-path`.
-
-#### Optuna Dashboard
-
-To quickly asses the exploration results and observe which trials results the best architectures, Optuna provides a dashboard that summarizes all the information on a dashboard. The instrucutions to access the dashboard are available here - https://optuna-dashboard.readthedocs.io/en/latest/getting-started.html, it is recommended to use either VS-Code extension or CLI.
-
-#### 📊 MLflow experiment tracking
-
-To use CZI cloud MLflow tracker, add a `.env` in the root directory like below. You can get a CZI MLflow access token from [here](https://mlflow.cw.use4-prod.si.czi.technology/api/2.0/mlflow/users/access-token) (note that a new token will be generated everytime you open this site).
-```
-MLFLOW_TRACKING_USERNAME = <Your_CZ_email>
-MLFLOW_TRACKING_PASSWORD = <Your_mlflow_access_token>
-```
-
-octopi supports MLflow for logging and visualizing model training and hyperparameter search results, including:
-• Training loss/validation metrics over time
-• Model hyperparameters and architecture details
-• Trial comparison (e.g., best performing model)
-
-You can use either a local MLflow instance, a remote (HPC) instance, or the CZI cloud server:
-
-#### 🧪 Local MLflow Dashboard
-
-To inspect results locally: `mlflow ui` and open http://localhost:5000 in your browser.
-
-#### 🖥️ HPC Cluster MLflow Access (Remote via SSH tunnel)
-
-If running octopi on a remote cluster (e.g., Biohub Bruno), forward the MLflow port.
-On your local machine:
-`ssh -L 5000:localhost:5000 remote_username@remote_host` (in the case of Bruno the remote would be `login01.czbiohub.org`).
-
-Then on the remote terminal (login node): ` mlflow ui --host 0.0.0.0 --port 5000` to launch the MLFlow dashboard on a local borwser.
-
-#### ☁️ CZI coreweave cluser
-
-For the CZI coreweave cluser, MLflow is already hosted. Go to the CZI [mlflow server](https://mlflow.cw.use4-prod.si.czi.technology/).
-
-🔐 A .env file is required to authenticate (see Getting Started section).
-📁 Be sure to register your project name in MLflow before launching runs.
-
-### 🔮 Segmentation
-Generate segmentation prediction masks for tomograms in a given copick project.
-```
-octopi inference \
-    --config config.json \
-    --seg-info predict,unet,1 \
-    --model-config train_results/best_model_config.yaml \
-    --model-weights train_results/best_model.pth \
-    --voxel-size 10 --tomo-alg wbp --tomo-batch-size 25
-```
-Output masks will be saved to the corresponding copick project under the `seg-info` input.
-
-### 📍 Localization
-Convert the segmentation masks into particle coordinates.
-```
-octopi localize \
-    --config config.json \
-    --pick-session-id 1 --pick-user-id unet \
-    --seg-info predict,unet,1
-```
-
-## Contributing
-
-This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.
-
-## Reporting Security Issues
-
-Please note: If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.
-
-