PyPI - supremo-lite - Versions diffs - 0.5.4__tar.gz → 0.5.5__tar.gz - Mend

supremo-lite 0.5.4tar.gz → 0.5.5tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: supremo_lite
-Version: 0.5.4
+Version: 0.5.5
 Summary: A lightweight memory first, model agnostic version of SuPreMo
 License: MIT
 License-File: LICENSE
@@ -42,13 +42,10 @@ For the latest features and bug fixes:
 ```bash
 # Install directly latest release
-pip install supremo_lite
+pip install supremo-lite
 # Or install a specific version/tag
 pip install git+https://github.com/gladstone-institutes/supremo_lite.git@v0.5.0
-# Or install from a specific branch
-pip install git+https://github.com/gladstone-institutes/supremo_lite.git@main
 ```
 ### Dependencies
@@ -60,7 +57,7 @@ Required dependencies will be installed automatically:
 Optional dependencies:
 - `torch` - For PyTorch tensor support (automatically detected)
-- [https://github.com/gladstone-institutes/brisket](brisket) - Cython powered faster 1 hot encoding for DNA sequences (automatically detected)
+- [brisket](https://github.com/gladstone-institutes/brisket) - Cython powered faster 1 hot encoding for DNA sequences (automatically detected)
 ## Quick Start
@@ -214,3 +211,4 @@ Interested in contributing? Check out the contributing guidelines. Please note t
 ## Credits
 `supremo_lite` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/README.md RENAMED

@@ -20,13 +20,10 @@ For the latest features and bug fixes:
 ```bash
 # Install directly latest release
-pip install supremo_lite
+pip install supremo-lite
 # Or install a specific version/tag
 pip install git+https://github.com/gladstone-institutes/supremo_lite.git@v0.5.0
-# Or install from a specific branch
-pip install git+https://github.com/gladstone-institutes/supremo_lite.git@main
 ```
 ### Dependencies
@@ -38,7 +35,7 @@ Required dependencies will be installed automatically:
 Optional dependencies:
 - `torch` - For PyTorch tensor support (automatically detected)
-- [https://github.com/gladstone-institutes/brisket](brisket) - Cython powered faster 1 hot encoding for DNA sequences (automatically detected)
+- [brisket](https://github.com/gladstone-institutes/brisket) - Cython powered faster 1 hot encoding for DNA sequences (automatically detected)
 ## Quick Start
@@ -191,4 +188,4 @@ Interested in contributing? Check out the contributing guidelines. Please note t
 ## Credits
-`supremo_lite` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).
+`supremo_lite` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "supremo_lite"
-version = "0.5.4"
+version = "0.5.5"
 description = "A lightweight memory first, model agnostic version of SuPreMo"
 authors = ["Natalie Gill", "Sean Whalen"]
 license = "MIT"

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/__init__.py RENAMED

@@ -52,7 +52,7 @@ from .prediction_alignment import align_predictions_by_coordinate
 # This allows users who don't have PyTorch to still use the main package
 # Version
-__version__ = "0.5.4"
+__version__ = "0.5.5"
 # Package metadata
 __description__ = (
     "A module for generating personalized genome sequences and in-silico mutagenesis"

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/mock_models/testmodel_2d.py RENAMED

@@ -126,9 +126,13 @@ if TORCH_AVAILABLE:
             )
             # Crop bins from all edges to focus loss function
-            y_hat = y_hat[
-                :, :, self.crop_bins : -self.crop_bins, self.crop_bins : -self.crop_bins
-            ]
+            if self.crop_bins > 0:
+                y_hat = y_hat[
+                    :,
+                    :,
+                    self.crop_bins : -self.crop_bins,
+                    self.crop_bins : -self.crop_bins,
+                ]
             # Return full contact matrix
             return y_hat
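
Note on the `crop_bins > 0` guard above: in Python, `-0` equals `0`, so an unguarded slice `x[0:-0]` is `x[0:0]` and selects nothing. The sketch below shows this with a plain list, which follows the same slice rules as the tensor in the hunk. The helper name `crop_edges` and the sample data are illustrative assumptions, not part of supremo_lite.

```python
# Illustrative stand-in for the cropping step; plain lists follow the same
# start:stop slice rules as the tensor sliced in the diff.
def crop_edges(row, crop_bins):
    """Drop crop_bins items from each end; leave the row intact when crop_bins is 0."""
    if crop_bins > 0:
        return row[crop_bins:-crop_bins]
    return row


row = [10, 20, 30, 40, 50]

# Unguarded form used before 0.5.5: -0 is 0, so this is row[0:0].
unguarded = row[0:-0]
print(unguarded)            # empty list

# Guarded form: zero cropping returns the data unchanged.
print(crop_edges(row, 0))
print(crop_edges(row, 1))
```

With the guard in place, a model configured with zero cropping returns the full contact matrix instead of an empty one.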

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/mutagenesis.py RENAMED

@@ -146,7 +146,8 @@ def get_sm_sequences(chrom, start, end, reference_fasta, encoder=None):
     # Create a DataFrame for the metadata
     metadata_df = pd.DataFrame(
-        metadata, columns=["chrom", "window_start", "window_end", "variant_pos0", "ref", "alt"]
+        metadata,
+        columns=["chrom", "window_start", "window_end", "variant_pos0", "ref", "alt"],
     )
     return ref_1h, alt_seqs_stacked, metadata_df
@@ -239,9 +240,7 @@ def get_sm_subsequences(
             )
     elif not has_bed:
         # Neither approach was specified
-        raise ValueError(
-            "Must provide either (anchor + anchor_radius) or bed_regions."
-        )
+        raise ValueError("Must provide either (anchor + anchor_radius) or bed_regions.")
     alt_seqs = []
     metadata = []
@@ -331,7 +330,11 @@ def get_sm_subsequences(
                 # Adjust window to stay within chromosome bounds
                 chrom_obj = reference_fasta[chrom]
-                chrom_len = len(chrom_obj) if hasattr(chrom_obj, '__len__') else len(chrom_obj.seq)
+                chrom_len = (
+                    len(chrom_obj)
+                    if hasattr(chrom_obj, "__len__")
+                    else len(chrom_obj.seq)
+                )
                 if window_start < 0:
                     window_start = 0
                     window_end = min(seq_len, chrom_len)
@@ -377,13 +380,17 @@ def get_sm_subsequences(
                         # Create a clone and substitute the base
                         if TORCH_AVAILABLE and isinstance(region_1h, torch.Tensor):
                             alt_1h = region_1h.clone()
-                            alt_1h[:, i] = torch.tensor(nt_to_1h[alt], dtype=alt_1h.dtype)
+                            alt_1h[:, i] = torch.tensor(
+                                nt_to_1h[alt], dtype=alt_1h.dtype
+                            )
                         else:
                             alt_1h = region_1h.copy()
                             alt_1h[:, i] = nt_to_1h[alt]
                         alt_seqs.append(alt_1h)
-                        metadata.append([chrom, window_start, window_end, i, ref_nt, alt])
+                        metadata.append(
+                            [chrom, window_start, window_end, i, ref_nt, alt]
+                        )
         # If no regions were processed, create empty ref_1h
         if ref_1h is None:
@@ -408,7 +415,8 @@ def get_sm_subsequences(
     # Create a DataFrame for the metadata
     metadata_df = pd.DataFrame(
-        metadata, columns=["chrom", "window_start", "window_end", "variant_pos0", "ref", "alt"]
+        metadata,
+        columns=["chrom", "window_start", "window_end", "variant_pos0", "ref", "alt"],
     )
     return ref_1h, alt_seqs_stacked, metadata_df

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/personalize.py RENAMED

@@ -42,7 +42,7 @@ IUPAC_CODES = {
     "D": "[AGT]",
     "H": "[ACT]",
     "V": "[ACG]",
-    "N": "[ACGT]"
+    "N": "[ACGT]",
 }
@@ -2811,6 +2811,7 @@ def get_pam_disrupting_alt_sequences(
         ...     ref, vcf, seq_len=50, max_pam_distance=10, n_chunks=5):
         ...     predictions = model.predict(alt_seqs, ref_seqs)
     """
     # Helper function to find PAM sites in a sequence
     def _find_pam_sites(sequence, pam_pattern):
         """Find all PAM site positions in a sequence using IUPAC codes.
@@ -2830,7 +2831,7 @@ def get_pam_disrupting_alt_sequences(
                 pat_base = pat_upper[j]
                 # Sequence 'N' (padding or unknown) matches any pattern base
-                if seq_base == 'N':
+                if seq_base == "N":
                     continue  # Always matches
                 # Get allowed bases for this pattern position
@@ -3003,9 +3004,7 @@ def get_pam_disrupting_alt_sequences(
         ref_allele = var.get("ref", "")
         alt_allele = var.get("alt", "")
         is_indel = (
-            len(ref_allele) != len(alt_allele)
-            or ref_allele == "-"
-            or alt_allele == "-"
+            len(ref_allele) != len(alt_allele) or ref_allele == "-" or alt_allele == "-"
         )
         truly_disrupted_pam_sites = []
@@ -3053,8 +3052,12 @@ def get_pam_disrupting_alt_sequences(
         # For each disrupted PAM site, create a metadata entry
         for pam_site_pos in truly_disrupted_pam_sites:
             # Extract PAM sequences
-            ref_pam_seq = ref_window_seq[pam_site_pos : pam_site_pos + len(pam_sequence)]
-            alt_pam_seq = modified_window[pam_site_pos : pam_site_pos + len(pam_sequence)]
+            ref_pam_seq = ref_window_seq[
+                pam_site_pos : pam_site_pos + len(pam_sequence)
+            ]
+            alt_pam_seq = modified_window[
+                pam_site_pos : pam_site_pos + len(pam_sequence)
+            ]
             # Calculate distance from variant to PAM
             pam_distance = abs(pam_site_pos - variant_pos_in_window)
@@ -3063,12 +3066,14 @@ def get_pam_disrupting_alt_sequences(
             pam_disrupting_variants_list.append(var)
             # Store PAM-specific metadata
-            pam_metadata_list.append({
-                'pam_site_pos': pam_site_pos,
-                'pam_ref_sequence': ref_pam_seq,
-                'pam_alt_sequence': alt_pam_seq,
-                'pam_distance': pam_distance
-            })
+            pam_metadata_list.append(
+                {
+                    "pam_site_pos": pam_site_pos,
+                    "pam_ref_sequence": ref_pam_seq,
+                    "pam_alt_sequence": alt_pam_seq,
+                    "pam_distance": pam_distance,
+                }
+            )
     # If no PAM-disrupting variants found, yield empty results
     if not pam_disrupting_variants_list:
@@ -3076,7 +3081,9 @@ def get_pam_disrupting_alt_sequences(
         return
     # Create DataFrame with filtered PAM-disrupting variants
-    filtered_variants_df = pd.DataFrame(pam_disrupting_variants_list).reset_index(drop=True)
+    filtered_variants_df = pd.DataFrame(pam_disrupting_variants_list).reset_index(
+        drop=True
+    )
     pam_metadata_df = pd.DataFrame(pam_metadata_list)
     # Call get_alt_ref_sequences with the filtered variants
@@ -3087,12 +3094,13 @@ def get_pam_disrupting_alt_sequences(
         encode,
         n_chunks,
         encoder,
-        auto_map_chromosomes
+        auto_map_chromosomes,
     ):
         # Merge PAM-specific metadata with base metadata
         # Both should have the same number of rows since we created one entry per PAM site
-        enriched_metadata = pd.concat([base_metadata.reset_index(drop=True),
-                                       pam_metadata_df], axis=1)
+        enriched_metadata = pd.concat(
+            [base_metadata.reset_index(drop=True), pam_metadata_df], axis=1
+        )
         # Yield the chunk with enriched metadata
         yield (alt_seqs, ref_seqs, enriched_metadata)

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/prediction_alignment.py RENAMED

@@ -40,19 +40,49 @@ class VariantPosition:
     svlen: int  # Length of structural variant (base pairs, signed for DEL/INS)
     variant_type: str  # Type of variant ('SNV', 'INS', 'DEL', 'DUP', 'INV', 'BND')
-    def get_bin_positions(self, bin_size: int) -> Tuple[int, int, int]:
+    def get_bin_positions(
+        self, bin_size: int, window_start: int, crop_length: int
+    ) -> Tuple[int, int, int]:
         """
-        Convert base pair positions to bin indices.
+        Convert base pair positions to bin indices relative to window.
         Args:
             bin_size: Number of base pairs per prediction bin
+            window_start: Start position of the sequence window (0-based genomic coord).
+            crop_length: Number of base pairs cropped from each edge by the model.
+                        This accounts for edge bases removed before prediction.
         Returns:
-            Tuple of (ref_bin, alt_start_bin, alt_end_bin)
+            Tuple of (ref_bin, alt_start_bin, alt_end_bin) as bin indices
+            relative to the prediction vector. For centered masking, these
+            represent the center and extent of the masked region.
+        Notes:
+            - Positions are calculated relative to window_start, not absolute genomic coords
+            - crop_length accounts for edge bases removed before prediction
+            - Masked bins are centered on the variant position
         """
-        ref_bin = int(np.ceil(self.ref_pos / bin_size))
-        alt_start_bin = int(np.ceil(self.alt_pos / bin_size))
-        alt_end_bin = int(np.ceil((self.alt_pos + abs(self.svlen)) / bin_size))
+        # Calculate positions relative to window
+        rel_ref_pos = self.ref_pos - window_start
+        rel_alt_pos = self.alt_pos - window_start
+        # Account for cropping (bases removed from start of window before prediction)
+        rel_ref_pos -= crop_length
+        rel_alt_pos -= crop_length
+        # Convert to bin indices using floor division (not ceil!)
+        ref_bin_center = rel_ref_pos // bin_size
+        alt_bin_center = rel_alt_pos // bin_size
+        # Calculate number of bins to mask
+        svlen_bins = int(np.ceil(abs(self.svlen) / bin_size))
+        half_bins = svlen_bins // 2
+        # Center the masked region on the variant
+        ref_bin = ref_bin_center - half_bins
+        alt_start_bin = alt_bin_center - half_bins
+        alt_end_bin = alt_bin_center + (svlen_bins - half_bins)
         return ref_bin, alt_start_bin, alt_end_bin
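
Note on the `get_bin_positions` change above: the old arithmetic divided absolute genomic coordinates by `bin_size`, so for any window that does not start at 0 the resulting indices can fall far outside the prediction vector. The sketch below restates both versions as free functions so they can be compared side by side. The function names and the example coordinates are assumptions made for illustration, and `math.ceil` stands in for `np.ceil` to keep the sketch dependency-free.

```python
import math


def bin_positions_old(ref_pos, alt_pos, svlen, bin_size):
    """0.5.4 arithmetic: ceil of absolute positions divided by bin_size."""
    ref_bin = math.ceil(ref_pos / bin_size)
    alt_start_bin = math.ceil(alt_pos / bin_size)
    alt_end_bin = math.ceil((alt_pos + abs(svlen)) / bin_size)
    return ref_bin, alt_start_bin, alt_end_bin


def bin_positions_new(ref_pos, alt_pos, svlen, bin_size, window_start, crop_length):
    """0.5.5 arithmetic: window-relative, crop-adjusted, floor-divided, centered."""
    rel_ref_pos = ref_pos - window_start - crop_length
    rel_alt_pos = alt_pos - window_start - crop_length
    ref_bin_center = rel_ref_pos // bin_size
    alt_bin_center = rel_alt_pos // bin_size
    svlen_bins = math.ceil(abs(svlen) / bin_size)
    half_bins = svlen_bins // 2
    return (
        ref_bin_center - half_bins,
        alt_bin_center - half_bins,
        alt_bin_center + (svlen_bins - half_bins),
    )


# Hypothetical 100 bp insertion 500 bp into a window starting at 1,000,000.
old = bin_positions_old(1_000_500, 1_000_500, 100, bin_size=128)
new = bin_positions_new(1_000_500, 1_000_500, 100, 128, window_start=1_000_000, crop_length=0)
cropped = bin_positions_new(1_000_500, 1_000_500, 100, 128, window_start=1_000_000, crop_length=256)

print(old)      # (7817, 7817, 7818): beyond an 896-bin vector
print(new)      # (3, 3, 4)
print(cropped)  # (1, 1, 2)
```

In this example the new indices land inside the prediction vector, and a non-zero `crop_length` shifts them toward the start by `crop_length // bin_size` bins.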
@@ -71,24 +101,27 @@ class PredictionAligner1D:
     Args:
         target_size: Expected number of bins in the prediction output
         bin_size: Number of base pairs per prediction bin (model-specific)
+        crop_length: Number of base pairs cropped from each edge by the model
     Example:
-        >>> aligner = PredictionAligner1D(target_size=896, bin_size=128)
+        >>> aligner = PredictionAligner1D(target_size=896, bin_size=128, crop_length=0)
         >>> ref_aligned, alt_aligned = aligner.align_predictions(
         ...     ref_pred, alt_pred, 'INS', variant_position
         ... )
     """
-    def __init__(self, target_size: int, bin_size: int):
+    def __init__(self, target_size: int, bin_size: int, crop_length: int):
         """
         Initialize the 1D prediction aligner.
         Args:
             target_size: Expected number of bins in prediction (e.g., 896 for Enformer)
             bin_size: Base pairs per bin (e.g., 128 for Enformer)
+            crop_length: Number of base pairs cropped from each edge by the model
         """
         self.target_size = target_size
         self.bin_size = bin_size
+        self.crop_length = crop_length
     def align_predictions(
         self,
@@ -96,6 +129,7 @@ class PredictionAligner1D:
         alt_pred: Union[np.ndarray, "torch.Tensor"],
         svtype: str,
         var_pos: VariantPosition,
+        window_start: int = 0,
     ) -> Tuple[Union[np.ndarray, "torch.Tensor"], Union[np.ndarray, "torch.Tensor"]]:
         """
         Main entry point for 1D prediction alignment.
@@ -105,6 +139,8 @@ class PredictionAligner1D:
             alt_pred: Alternate prediction vector (length N)
             svtype: Variant type ('DEL', 'DUP', 'INS', 'INV', 'SNV')
             var_pos: Variant position information
+            window_start: Start position of sequence window (0-based genomic coord).
+                         Required for correct bin calculation. Defaults to 0.
         Returns:
             Tuple of (aligned_ref, aligned_alt) vectors with NaN masking applied
@@ -120,10 +156,12 @@ class PredictionAligner1D:
         if svtype_normalized in ["DEL", "DUP", "INS"]:
             return self._align_indel_predictions(
-                ref_pred, alt_pred, svtype_normalized, var_pos
+                ref_pred, alt_pred, svtype_normalized, var_pos, window_start
             )
         elif svtype_normalized == "INV":
-            return self._align_inversion_predictions(ref_pred, alt_pred, var_pos)
+            return self._align_inversion_predictions(
+                ref_pred, alt_pred, var_pos, window_start
+            )
         elif svtype_normalized in ["SNV", "MNV"]:
             # SNVs don't change coordinates, direct alignment
             is_torch = TORCH_AVAILABLE and torch.is_tensor(ref_pred)
@@ -140,18 +178,22 @@ class PredictionAligner1D:
         alt_pred: Union[np.ndarray, "torch.Tensor"],
         svtype: str,
         var_pos: VariantPosition,
+        window_start: int = 0,
     ) -> Tuple[Union[np.ndarray, "torch.Tensor"], Union[np.ndarray, "torch.Tensor"]]:
         """
         Align predictions for insertions, deletions, and duplications.
         Strategy:
         1. For DEL: Swap REF/ALT (deletion removes from REF)
-        2. Insert NaN bins in shorter sequence
+        2. Insert NaN bins in shorter sequence (centered on variant)
         3. Crop edges to maintain target size
         4. For DEL: Swap back
         This ensures that positions present in one sequence but not the other
         are marked with NaN, enabling fair comparison of overlapping regions.
+        Args:
+            window_start: Start position of sequence window (0-based genomic coord)
         """
         is_torch = TORCH_AVAILABLE and torch.is_tensor(ref_pred)
@@ -170,8 +212,10 @@ class PredictionAligner1D:
                 var_pos.alt_pos, var_pos.ref_pos, var_pos.svlen, svtype
             )
-        # Get bin positions
-        ref_bin, alt_start_bin, alt_end_bin = var_pos.get_bin_positions(self.bin_size)
+        # Get bin positions (window-relative, centered)
+        ref_bin, alt_start_bin, alt_end_bin = var_pos.get_bin_positions(
+            self.bin_size, window_start, self.crop_length
+        )
         bins_to_add = alt_end_bin - alt_start_bin
         # Insert NaN bins in REF where variant exists in ALT
@@ -248,6 +292,7 @@ class PredictionAligner1D:
         ref_pred: Union[np.ndarray, "torch.Tensor"],
         alt_pred: Union[np.ndarray, "torch.Tensor"],
         var_pos: VariantPosition,
+        window_start: int = 0,
     ) -> Tuple[Union[np.ndarray, "torch.Tensor"], Union[np.ndarray, "torch.Tensor"]]:
         """
         Align predictions for inversions.
@@ -259,6 +304,9 @@ class PredictionAligner1D:
         For strand-aware models, inversions can significantly affect predictions
         because regulatory elements now appear on the opposite strand. We mask
         the inverted region to focus comparison on unaffected flanking sequences.
+        Args:
+            window_start: Start position of sequence window (0-based genomic coord)
         """
         is_torch = TORCH_AVAILABLE and torch.is_tensor(ref_pred)
@@ -270,7 +318,9 @@ class PredictionAligner1D:
             ref_np = ref_pred.copy()
             alt_np = alt_pred.copy()
-        var_start, _, var_end = var_pos.get_bin_positions(self.bin_size)
+        var_start, _, var_end = var_pos.get_bin_positions(
+            self.bin_size, window_start, self.crop_length
+        )
         # Mask inverted region in both REF and ALT
         ref_np[var_start : var_end + 1] = np.nan
@@ -373,19 +423,23 @@ class PredictionAligner2D:
         target_size: Expected matrix dimension (NxN)
         bin_size: Number of base pairs per matrix bin (model-specific)
         diag_offset: Number of diagonal bins to mask (model-specific)
+        crop_length: Number of base pairs cropped from each edge by the model
     Example:
         >>> aligner = PredictionAligner2D(
         ...     target_size=448,
         ...     bin_size=2048,
-        ...     diag_offset=2
+        ...     diag_offset=2,
+        ...     crop_length=0
         ... )
         >>> ref_aligned, alt_aligned = aligner.align_predictions(
         ...     ref_matrix, alt_matrix, 'DEL', variant_position
         ... )
     """
-    def __init__(self, target_size: int, bin_size: int, diag_offset: int):
+    def __init__(
+        self, target_size: int, bin_size: int, diag_offset: int, crop_length: int
+    ):
         """
         Initialize the 2D prediction aligner.
@@ -393,10 +447,12 @@ class PredictionAligner2D:
             target_size: Matrix dimension (e.g., 448 for Akita)
             bin_size: Base pairs per bin (e.g., 2048 for Akita)
             diag_offset: Diagonal masking offset (e.g., 2 for Akita)
+            crop_length: Number of base pairs cropped from each edge by the model
         """
         self.target_size = target_size
         self.bin_size = bin_size
         self.diag_offset = diag_offset
+        self.crop_length = crop_length
     def align_predictions(
         self,
@@ -404,6 +460,7 @@ class PredictionAligner2D:
         alt_pred: Union[np.ndarray, "torch.Tensor"],
         svtype: str,
         var_pos: VariantPosition,
+        window_start: int = 0,
     ) -> Tuple[Union[np.ndarray, "torch.Tensor"], Union[np.ndarray, "torch.Tensor"]]:
         """
         Main entry point for 2D matrix alignment.
@@ -413,6 +470,8 @@ class PredictionAligner2D:
             alt_pred: Alternate prediction matrix (NxN)
             svtype: Variant type ('DEL', 'DUP', 'INS', 'INV', 'SNV')
             var_pos: Variant position information
+            window_start: Start position of sequence window (0-based genomic coord).
+                         Required for correct bin calculation. Defaults to 0.
         Returns:
             Tuple of (aligned_ref, aligned_alt) matrices with NaN masking applied
@@ -428,10 +487,12 @@ class PredictionAligner2D:
         if svtype_normalized in ["DEL", "DUP", "INS"]:
             return self._align_indel_matrices(
-                ref_pred, alt_pred, svtype_normalized, var_pos
+                ref_pred, alt_pred, svtype_normalized, var_pos, window_start
             )
         elif svtype_normalized == "INV":
-            return self._align_inversion_matrices(ref_pred, alt_pred, var_pos)
+            return self._align_inversion_matrices(
+                ref_pred, alt_pred, var_pos, window_start
+            )
         elif svtype_normalized in ["SNV", "MNV"]:
             # SNVs don't change coordinates, direct alignment
             is_torch = TORCH_AVAILABLE and torch.is_tensor(ref_pred)
@@ -448,15 +509,19 @@ class PredictionAligner2D:
         alt_pred: Union[np.ndarray, "torch.Tensor"],
         svtype: str,
         var_pos: VariantPosition,
+        window_start: int = 0,
     ) -> Tuple[Union[np.ndarray, "torch.Tensor"], Union[np.ndarray, "torch.Tensor"]]:
         """
         Align matrices for insertions, deletions, and duplications.
         Strategy:
         1. For DEL: Swap REF/ALT (deletion removes from REF)
-        2. Insert NaN bins (rows AND columns) in shorter matrix
+        2. Insert NaN bins (rows AND columns) in shorter matrix (centered on variant)
         3. Crop edges to maintain target size
         4. For DEL: Swap back
+        Args:
+            window_start: Start position of sequence window (0-based genomic coord)
         """
         is_torch = TORCH_AVAILABLE and torch.is_tensor(ref_pred)
@@ -475,8 +540,10 @@ class PredictionAligner2D:
                 var_pos.alt_pos, var_pos.ref_pos, var_pos.svlen, svtype
             )
-        # Get bin positions
-        ref_bin, alt_start_bin, alt_end_bin = var_pos.get_bin_positions(self.bin_size)
+        # Get bin positions (window-relative, centered)
+        ref_bin, alt_start_bin, alt_end_bin = var_pos.get_bin_positions(
+            self.bin_size, window_start, self.crop_length
+        )
         bins_to_add = alt_end_bin - alt_start_bin
         # Insert NaN bins in REF where variant exists in ALT
@@ -541,6 +608,7 @@ class PredictionAligner2D:
         ref_pred: Union[np.ndarray, "torch.Tensor"],
         alt_pred: Union[np.ndarray, "torch.Tensor"],
         var_pos: VariantPosition,
+        window_start: int = 0,
     ) -> Tuple[Union[np.ndarray, "torch.Tensor"], Union[np.ndarray, "torch.Tensor"]]:
         """
         Align matrices for inversions.
@@ -556,6 +624,9 @@ class PredictionAligner2D:
         The same NaN pattern is mirrored to ALT so both matrices have identical
         masked regions, enabling fair comparison of the unaffected areas.
+        Args:
+            window_start: Start position of sequence window (0-based genomic coord)
         """
         is_torch = TORCH_AVAILABLE and torch.is_tensor(ref_pred)
@@ -567,7 +638,9 @@ class PredictionAligner2D:
             ref_np = ref_pred.copy()
             alt_np = alt_pred.copy()
-        var_start, _, var_end = var_pos.get_bin_positions(self.bin_size)
+        var_start, _, var_end = var_pos.get_bin_positions(
+            self.bin_size, window_start, self.crop_length
+        )
         # Mask inverted region in REF (cross-pattern: rows + columns)
         ref_np[var_start : var_end + 1, :] = np.nan
@@ -802,6 +875,7 @@ def align_predictions_by_coordinate(
     metadata_row: dict,
     bin_size: int,
     prediction_type: str,
+    crop_length: int,
     matrix_size: Optional[int] = None,
     diag_offset: int = 0,
 ) -> Tuple[Union[np.ndarray, "torch.Tensor"], Union[np.ndarray, "torch.Tensor"]]:
@@ -812,7 +886,7 @@ def align_predictions_by_coordinate(
     vectors (e.g., chromatin accessibility, TF binding) and 2D matrices (e.g., Hi-C contact maps),
     routing to the appropriate alignment strategy based on variant type.
-    IMPORTANT: Model-specific parameters (bin_size, matrix_size) must be explicitly
+    IMPORTANT: Model-specific parameters (bin_size, crop_length, matrix_size) must be explicitly
     provided by the user. There are no defaults because these vary across different models.
     Args:
@@ -824,10 +898,13 @@ def align_predictions_by_coordinate(
             - 'variant_pos0': Variant position (0-based, absolute genomic coordinate)
             - 'svlen': Length of structural variant (optional, for symbolic alleles)
         bin_size: Number of base pairs per prediction bin (REQUIRED, model-specific)
-            Examples: 2048 for Akita
+            Examples: 2048 for Akita, 128 for Enformer
         prediction_type: Type of predictions ("1D" or "2D")
             - "1D": Vector predictions (chromatin accessibility, TF binding, etc.)
             - "2D": Matrix predictions (Hi-C contact maps, Micro-C, etc.)
+        crop_length: Number of base pairs cropped from each edge by the model (REQUIRED)
+            This accounts for edge bases removed before prediction.
+            Examples: 0 for models without cropping
         matrix_size: Size of contact matrix (REQUIRED for 2D type)
             Examples: 448 for Akita
         diag_offset: Number of diagonal bins to mask (default: 0 for no masking)
@@ -849,7 +926,8 @@ def align_predictions_by_coordinate(
         ...     metadata_row={'variant_type': 'INS', 'window_start': 0,
         ...                   'variant_pos0': 500, 'svlen': 100},
         ...     bin_size=128,
-        ...     prediction_type="1D"
+        ...     prediction_type="1D",
+        ...     crop_length=0
         ... )
     Example (2D contact maps with diagonal masking):
@@ -860,6 +938,7 @@ def align_predictions_by_coordinate(
         ...                   'variant_pos0': 50000, 'svlen': -2048},
         ...     bin_size=2048,
         ...     prediction_type="2D",
+        ...     crop_length=0,
         ...     matrix_size=448,
         ...     diag_offset=2  # Optional: use 0 if no diagonal masking
         ... )
@@ -872,6 +951,7 @@ def align_predictions_by_coordinate(
         ...                   'variant_pos0': 1000, 'svlen': 500},
         ...     bin_size=1000,
         ...     prediction_type="2D",
+        ...     crop_length=0,
         ...     matrix_size=512
         ...     # diag_offset defaults to 0 (no masking)
         ... )
@@ -937,7 +1017,9 @@ def align_predictions_by_coordinate(
         # Handle multi-target predictions [n_targets, n_bins]
         if ndim > 1:
             target_size = ref_preds.shape[-1]  # Number of bins
-            aligner = PredictionAligner1D(target_size=target_size, bin_size=bin_size)
+            aligner = PredictionAligner1D(
+                target_size=target_size, bin_size=bin_size, crop_length=crop_length
+            )
             # Align each target separately
             n_targets = ref_preds.shape[0]
@@ -948,7 +1030,7 @@ def align_predictions_by_coordinate(
                 ref_target = ref_preds[target_idx]
                 alt_target = alt_preds[target_idx]
                 ref_aligned, alt_aligned = aligner.align_predictions(
-                    ref_target, alt_target, variant_type, var_pos
+                    ref_target, alt_target, variant_type, var_pos, window_start
                 )
                 ref_aligned_list.append(ref_aligned)
                 alt_aligned_list.append(alt_aligned)
@@ -965,9 +1047,11 @@ def align_predictions_by_coordinate(
         else:
             # Single target prediction [n_bins]
             target_size = len(ref_preds)
-            aligner = PredictionAligner1D(target_size=target_size, bin_size=bin_size)
+            aligner = PredictionAligner1D(
+                target_size=target_size, bin_size=bin_size, crop_length=crop_length
+            )
             return aligner.align_predictions(
-                ref_preds, alt_preds, variant_type, var_pos
+                ref_preds, alt_preds, variant_type, var_pos, window_start
             )
     else:  # 2D
         # Check if predictions are 1D (flattened upper triangular) or 2D (full matrix)
@@ -989,10 +1073,13 @@ def align_predictions_by_coordinate(
             # Align matrices
             aligner = PredictionAligner2D(
-                target_size=matrix_size, bin_size=bin_size, diag_offset=diag_offset
+                target_size=matrix_size,
+                bin_size=bin_size,
+                diag_offset=diag_offset,
+                crop_length=crop_length,
             )
             aligned_ref_matrix, aligned_alt_matrix = aligner.align_predictions(
-                ref_matrix, alt_matrix, variant_type, var_pos
+                ref_matrix, alt_matrix, variant_type, var_pos, window_start
             )
             # Convert back to flattened format
@@ -1007,8 +1094,11 @@ def align_predictions_by_coordinate(
         else:
             # Already 2D matrices
             aligner = PredictionAligner2D(
-                target_size=matrix_size, bin_size=bin_size, diag_offset=diag_offset
+                target_size=matrix_size,
+                bin_size=bin_size,
+                diag_offset=diag_offset,
+                crop_length=crop_length,
             )
             return aligner.align_predictions(
-                ref_preds, alt_preds, variant_type, var_pos
+                ref_preds, alt_preds, variant_type, var_pos, window_start
             )
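
Note on the `align_predictions_by_coordinate` signature change above: `crop_length` is inserted as a required parameter ahead of `matrix_size`, so callers written against 0.5.4 need updating. The sketch below uses two stand-in functions that mirror only the parameters visible in the hunk (the real function takes further leading arguments not shown there) to illustrate two likely failure modes. The names `align_old` and `align_new` and their return values are assumptions for illustration only.

```python
# Stand-ins mirroring only the parameters visible in the hunk above.
def align_old(metadata_row, bin_size, prediction_type, matrix_size=None, diag_offset=0):
    return {"matrix_size": matrix_size, "diag_offset": diag_offset}


def align_new(metadata_row, bin_size, prediction_type, crop_length,
              matrix_size=None, diag_offset=0):
    return {"crop_length": crop_length, "matrix_size": matrix_size,
            "diag_offset": diag_offset}


row = {"variant_type": "DEL", "window_start": 0, "variant_pos0": 50000, "svlen": -2048}

# A 0.5.4-style call that passed matrix_size positionally.
legacy_args = (row, 2048, "2D", 448)
before = align_old(*legacy_args)
after = align_new(*legacy_args)   # 448 now binds to crop_length, not matrix_size
print(before)
print(after)

# A 0.5.4-style keyword call fails loudly because crop_length is missing.
try:
    align_new(row, 2048, "2D", matrix_size=448)
    missing_raises = False
except TypeError:
    missing_raises = True
print(missing_raises)

# Passing every model-specific value by keyword avoids both problems.
fixed = align_new(row, 2048, "2D", crop_length=0, matrix_size=448, diag_offset=2)
print(fixed)
```

Keyword arguments, as used in the updated docstring examples, keep such calls robust to further parameter insertions.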

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/LICENSE RENAMED

File without changes

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/chromosome_utils.py RENAMED

File without changes

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/core.py RENAMED

File without changes

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/mock_models/__init__.py RENAMED

File without changes

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/mock_models/testmodel_1d.py RENAMED

File without changes

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/sequence_utils.py RENAMED

File without changes

{supremo_lite-0.5.4 → supremo_lite-0.5.5}/src/supremo_lite/variant_utils.py RENAMED

File without changes
