PyPI - pyseqalignment - Versions diffs - 0.1.2__tar.gz → 0.1.4__tar.gz - Mend


Files changed (67)

{pyseqalignment-0.1.2/src/pyseqalignment.egg-info → pyseqalignment-0.1.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pyseqalignment
-Version: 0.1.2
+Version: 0.1.4
 Summary: pySeqAlign -- sequence alignment with Prolog-style distance functions and ILP learning
 Author-email: Andreas Karwath <a.karwath@bham.ac.uk>
 License-Expression: MIT
@@ -56,7 +56,9 @@ pySeqAlign provides Smith-Waterman (local) and Needleman-Wunsch (global) sequenc
 ## Features
 - **Smith-Waterman** local alignment with k-best non-overlapping results
-- **Needleman-Wunsch** global alignment
+- **Needleman-Wunsch** global alignment (linear and **affine** gap costs)
+- **Multiple sequence alignment** -- progressive MSA over a neighbor-joining guide tree, parameterised by any scoring function (`pyseqalign.msa`)
+- **Relational sequence logos** -- information-content logos of aligned logical atoms, with per-column least-general generalisation, after Karwath & Kersting (ILP 2006) (`pyseqalign.logo`)
 - **Prolog-based distance functions** via SWI-Prolog integration (optional)
 - **Substitution matrices** -- BLOSUM (50, 60, 62, 70, 80, 90, 100) and PAM (50, 150, 200, 250) bundled; any NCBI-format matrix loadable from file or downloaded at runtime
 - **Nienhuys-Cheng distance** for recursive structural comparison of logical atoms
@@ -309,19 +311,62 @@ For reference, other notable systems in the field include:
 - [Metagol](https://github.com/metagol/metagol) -- Meta-Interpretive Learning
 - [DeepStochLog](https://github.com/ML-KULeuven/deepstochlog) -- Neural-symbolic ILP combining logic and neural networks
+## Multiple alignment & relational logos
+pySeqAlign can align *and* summarise sequences of structured logical atoms,
+reproducing Karwath & Kersting, *Relational Sequence Alignments and Logos*
+(ILP 2006) — with no learning involved:
+```python
+from pyseqalign.msa import progressive_msa
+from pyseqalign.logo import relational_logo
+from pyseqalign.scoring.distance import AtomDistance
+# atoms as structured tuples: id -> (predicate, *args); 0 = gap
+atom_store = {1: ('h', 'a', 'r', 'm'), 2: ('h', 'a', 'r', 'l'), 3: ('s', 'p', 'm')}
+seqs = {'d1': [1, 2, 3], 'd2': [1, 3, 2], 'd3': [2, 3]}
+scoring = AtomDistance(atom_store=atom_store, gap_score=-0.5)   # Nienhuys-Cheng
+msa = progressive_msa(seqs, scoring, gap_open=-1.0, gap_extend=-0.1)
+rows = list(msa.aligned_sequences.values())
+relational_logo(rows, atom_store, 'logo.png', title='example fold')
+```
+`progressive_msa` accepts **any** scoring function, so a reward matrix learned
+by [pyREAL](https://github.com/athro/pyREAL)'s boosting can drive the alignment
+in place of the fixed distance. Runnable reproductions of the paper's SCOP and
+balloon logos are in [`examples/`](examples/).
 ## Fast C++ aligner (optional)
 The pure-Python aligners are fine for typical use. For heavy workloads (e.g.
 boosting that re-aligns thousands of sequence pairs per iteration), an optional
-**C++ affine-gap Needleman-Wunsch** kernel is provided. It is NOT built by default
-(the core stays pure Python); build it once with a C++ compiler + pybind11:
+**C++ affine-gap Needleman-Wunsch** kernel is provided.
+**It is compiled automatically at install time** (best effort). The package is
+distributed as an sdist, so `pip install pyseqalignment` builds from source and
+tries to compile the accelerator for your Python/ABI/platform using a C++
+compiler + pybind11 (pulled in as a build dependency). On macOS/Linux with a
+compiler present this "just works"; if no compiler is available the install
+still succeeds and the library falls back to the pure-Python aligner. Check with:
+```python
+from pyseqalign.accel import cpp_available
+print(cpp_available())   # True if the accelerator compiled at install
+```
+If you installed without a compiler and later want the accelerator, install one
+(Xcode Command Line Tools on macOS, `build-essential` on Debian/Ubuntu) and
+either reinstall (`pip install --force-reinstall --no-binary :all: pyseqalignment`)
+or build it in place once:
 ```bash
 pip install pybind11
 src/pyseqalign/cpp/build_cpp_aligner.sh          # or: PY=$(which python) src/.../build_cpp_aligner.sh
 ```
-This compiles the extension into the `pyseqalign` package. Then:
+Either way it compiles the extension into the `pyseqalign` package. Then:
 ```python
 from pyseqalign.accel import cpp_available, load

{pyseqalignment-0.1.2 → pyseqalignment-0.1.4}/README.md RENAMED

@@ -18,7 +18,9 @@ pySeqAlign provides Smith-Waterman (local) and Needleman-Wunsch (global) sequenc
 ## Features
 - **Smith-Waterman** local alignment with k-best non-overlapping results
-- **Needleman-Wunsch** global alignment
+- **Needleman-Wunsch** global alignment (linear and **affine** gap costs)
+- **Multiple sequence alignment** -- progressive MSA over a neighbor-joining guide tree, parameterised by any scoring function (`pyseqalign.msa`)
+- **Relational sequence logos** -- information-content logos of aligned logical atoms, with per-column least-general generalisation, after Karwath & Kersting (ILP 2006) (`pyseqalign.logo`)
 - **Prolog-based distance functions** via SWI-Prolog integration (optional)
 - **Substitution matrices** -- BLOSUM (50, 60, 62, 70, 80, 90, 100) and PAM (50, 150, 200, 250) bundled; any NCBI-format matrix loadable from file or downloaded at runtime
 - **Nienhuys-Cheng distance** for recursive structural comparison of logical atoms
@@ -271,19 +273,62 @@ For reference, other notable systems in the field include:
 - [Metagol](https://github.com/metagol/metagol) -- Meta-Interpretive Learning
 - [DeepStochLog](https://github.com/ML-KULeuven/deepstochlog) -- Neural-symbolic ILP combining logic and neural networks
+## Multiple alignment & relational logos
+pySeqAlign can align *and* summarise sequences of structured logical atoms,
+reproducing Karwath & Kersting, *Relational Sequence Alignments and Logos*
+(ILP 2006) — with no learning involved:
+```python
+from pyseqalign.msa import progressive_msa
+from pyseqalign.logo import relational_logo
+from pyseqalign.scoring.distance import AtomDistance
+# atoms as structured tuples: id -> (predicate, *args); 0 = gap
+atom_store = {1: ('h', 'a', 'r', 'm'), 2: ('h', 'a', 'r', 'l'), 3: ('s', 'p', 'm')}
+seqs = {'d1': [1, 2, 3], 'd2': [1, 3, 2], 'd3': [2, 3]}
+scoring = AtomDistance(atom_store=atom_store, gap_score=-0.5)   # Nienhuys-Cheng
+msa = progressive_msa(seqs, scoring, gap_open=-1.0, gap_extend=-0.1)
+rows = list(msa.aligned_sequences.values())
+relational_logo(rows, atom_store, 'logo.png', title='example fold')
+```
+`progressive_msa` accepts **any** scoring function, so a reward matrix learned
+by [pyREAL](https://github.com/athro/pyREAL)'s boosting can drive the alignment
+in place of the fixed distance. Runnable reproductions of the paper's SCOP and
+balloon logos are in [`examples/`](examples/).
 ## Fast C++ aligner (optional)
 The pure-Python aligners are fine for typical use. For heavy workloads (e.g.
 boosting that re-aligns thousands of sequence pairs per iteration), an optional
-**C++ affine-gap Needleman-Wunsch** kernel is provided. It is NOT built by default
-(the core stays pure Python); build it once with a C++ compiler + pybind11:
+**C++ affine-gap Needleman-Wunsch** kernel is provided.
+**It is compiled automatically at install time** (best effort). The package is
+distributed as an sdist, so `pip install pyseqalignment` builds from source and
+tries to compile the accelerator for your Python/ABI/platform using a C++
+compiler + pybind11 (pulled in as a build dependency). On macOS/Linux with a
+compiler present this "just works"; if no compiler is available the install
+still succeeds and the library falls back to the pure-Python aligner. Check with:
+```python
+from pyseqalign.accel import cpp_available
+print(cpp_available())   # True if the accelerator compiled at install
+```
+If you installed without a compiler and later want the accelerator, install one
+(Xcode Command Line Tools on macOS, `build-essential` on Debian/Ubuntu) and
+either reinstall (`pip install --force-reinstall --no-binary :all: pyseqalignment`)
+or build it in place once:
 ```bash
 pip install pybind11
 src/pyseqalign/cpp/build_cpp_aligner.sh          # or: PY=$(which python) src/.../build_cpp_aligner.sh
 ```
-This compiles the extension into the `pyseqalign` package. Then:
+Either way it compiles the extension into the `pyseqalign` package. Then:
 ```python
 from pyseqalign.accel import cpp_available, load

{pyseqalignment-0.1.2 → pyseqalignment-0.1.4}/pyproject.toml RENAMED

@@ -1,12 +1,14 @@
 [build-system]
-requires = ["setuptools>=68.0"]
+# pybind11 is a build-time dep so the OPTIONAL C++ aligner can be compiled at
+# install time (best-effort; see setup.py). It is not a runtime dependency.
+requires = ["setuptools>=68.0", "pybind11>=2.10"]
 build-backend = "setuptools.build_meta"
 [project]
 # PyPI distribution name (the import package is `pyseqalign`; the name
 # `pyseqalign` was blocked by PyPI's similarity guard vs. an existing project).
 name = "pyseqalignment"
-version = "0.1.2"
+version = "0.1.4"
 description = "pySeqAlign -- sequence alignment with Prolog-style distance functions and ILP learning"
 readme = "README.md"
 license = "MIT"
@@ -78,6 +80,11 @@ line-length = 100
 [tool.ruff.lint]
 select = ["E", "F", "W", "I", "N", "UP"]
+ignore = [
+    "N803",  # uppercase argument names (matrix math convention: M, Ix, Iy)
+    "N806",  # uppercase local variables in functions (same reason)
+    "E741",  # ambiguous variable name 'l' (used in tree (m, l, r) unpacking)
+]
 [tool.mypy]
 python_version = "3.10"

pyseqalignment-0.1.4/setup.py ADDED

@@ -0,0 +1,58 @@
+"""Best-effort build of the optional C++ affine-gap aligner.
+pySeqAlign's core is pure Python. The C++ aligner (``pyseqalign.cpp_aligner``)
+is an OPTIONAL accelerator (~100-270x faster, identical results). We try to
+compile it at install time on any platform that has a C++ compiler + pybind11;
+if that fails (no compiler, no pybind11, unsupported platform, ...) the install
+STILL SUCCEEDS and the library transparently falls back to the pure-Python
+aligner -- see ``pyseqalign.accel``.
+This is why the project publishes an sdist (not a pure-Python wheel): pip builds
+from source on the target machine, giving every install a chance to compile the
+accelerator locally for its own Python/ABI/platform.
+"""
+from __future__ import annotations
+import sys
+from setuptools import Extension, setup
+from setuptools.command.build_ext import build_ext
+# Only build on POSIX (macOS/Linux); the hand-tuned flags below are GCC/Clang.
+_CPP = 'src/pyseqalign/cpp/cpp_aligner.cpp'
+ext_modules: list[Extension] = []
+if sys.platform != 'win32':
+    try:
+        import pybind11
+        ext_modules = [
+            Extension(
+                'pyseqalign.cpp_aligner',
+                [_CPP],
+                include_dirs=[pybind11.get_include()],
+                language='c++',
+                optional=True,  # setuptools won't fail the build if this ext won't compile
+                extra_compile_args=['-O3', '-std=c++14'],
+            )
+        ]
+    except Exception as exc:  # pybind11 missing -> skip the accelerator
+        print(f'pyseqalign: skipping optional C++ aligner ({exc}); pure-Python fallback.')
+class BestEffortBuildExt(build_ext):
+    """Compile the accelerator if possible; never break the install if not."""
+    def run(self) -> None:
+        try:
+            super().run()
+        except Exception as exc:  # pragma: no cover - depends on build env
+            print(f'pyseqalign: optional C++ aligner not built ({exc}); pure-Python fallback.')
+    def build_extension(self, ext) -> None:
+        try:
+            super().build_extension(ext)
+        except Exception as exc:  # pragma: no cover - depends on build env
+            print(f'pyseqalign: optional C++ aligner not built ({exc}); pure-Python fallback.')
+setup(ext_modules=ext_modules, cmdclass={'build_ext': BestEffortBuildExt})
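
The fallback that this `setup.py` relies on happens at import time: if the compiled extension is absent, the caller has to notice and use the pure-Python path instead. The sketch below illustrates that general pattern only. It is not the code in `pyseqalign.accel`; the helper names `load_optional` and `pick_backend` are hypothetical, and the module names are chosen so the example runs without the package installed.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def load_optional(module_name: str) -> ModuleType | None:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # A missing compiled extension surfaces as ImportError.
        return None


def pick_backend(module_name: str) -> str:
    """Name the backend a caller would end up with (hypothetical helper)."""
    return 'compiled' if load_optional(module_name) is not None else 'pure-python'


# 'math' always imports; the second name is deliberately nonexistent.
print(pick_backend('math'))
print(pick_backend('no_such_accelerator_module_xyz'))
```

With the real package, the module name to probe would presumably be `pyseqalign.cpp_aligner`, the extension name declared in the `Extension(...)` call above.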

{pyseqalignment-0.1.2 → pyseqalignment-0.1.4}/src/pyseqalign/__init__.py RENAMED

@@ -1,14 +1,21 @@
 """pySeqAlign -- Sequence alignment with Prolog-style distance functions and ILP learning."""
-from pyseqalign.core.alignment import AlignmentResult, LocalAlignmentResult
+from pyseqalign.core.alignment import (
+    AffineAlignmentResult,
+    AlignmentResult,
+    LocalAlignmentResult,
+)
 from pyseqalign.core.needleman_wunsch import NeedlemanWunsch
+from pyseqalign.core.nw_affine import NeedlemanWunschAffine
 from pyseqalign.core.smith_waterman import SmithWaterman
-__version__ = "0.1.0"
+__version__ = "0.1.4"
 __all__ = [
     "SmithWaterman",
     "NeedlemanWunsch",
+    "NeedlemanWunschAffine",
     "AlignmentResult",
+    "AffineAlignmentResult",
     "LocalAlignmentResult",
 ]

{pyseqalignment-0.1.2 → pyseqalignment-0.1.4}/src/pyseqalign/core/__init__.py RENAMED

@@ -1,12 +1,19 @@
 """Core alignment algorithms."""
-from pyseqalign.core.alignment import AlignmentResult, LocalAlignmentResult
+from pyseqalign.core.alignment import (
+    AffineAlignmentResult,
+    AlignmentResult,
+    LocalAlignmentResult,
+)
 from pyseqalign.core.needleman_wunsch import NeedlemanWunsch
+from pyseqalign.core.nw_affine import NeedlemanWunschAffine
 from pyseqalign.core.smith_waterman import SmithWaterman
 __all__ = [
     "SmithWaterman",
     "NeedlemanWunsch",
+    "NeedlemanWunschAffine",
     "AlignmentResult",
+    "AffineAlignmentResult",
     "LocalAlignmentResult",
 ]

{pyseqalignment-0.1.2 → pyseqalignment-0.1.4}/src/pyseqalign/core/alignment.py RENAMED

@@ -22,6 +22,19 @@ class AlignmentResult:
     length: int
+@dataclass
+class AffineAlignmentResult(AlignmentResult):
+    """Extended result from affine-gap alignment.
+    Attributes:
+        gap_opens: Number of gap-open events in both sequences combined.
+        gap_extensions: Number of gap-extension events.
+    """
+    gap_opens: int = 0
+    gap_extensions: int = 0
 @dataclass
 class LocalAlignmentResult:
     """Result of a single local (Smith-Waterman) alignment.

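The new `AffineAlignmentResult` subclasses a dataclass and gives both added fields a default. That matters because dataclass fields without defaults cannot follow fields with defaults, so defaulted fields appended in a subclass keep the inherited constructor signature valid. The sketch below shows the pattern with stand-in classes. `BaseResult` and `AffineResult` are hypothetical names, and the types of `query` and `target` are assumed from how `nw_affine.py` constructs the result.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class BaseResult:  # hypothetical stand-in for AlignmentResult
    query: list[int]
    target: list[int]
    score: float
    length: int


@dataclass
class AffineResult(BaseResult):  # hypothetical stand-in for AffineAlignmentResult
    # Defaults let existing four-argument call sites keep working.
    gap_opens: int = 0
    gap_extensions: int = 0


# Constructing without the gap statistics falls back to the defaults.
plain = AffineResult(query=[1, 0], target=[1, 2], score=-1.5, length=2)
# Inherited fields come first, subclass fields after them.
field_names = [f.name for f in fields(AffineResult)]
print(plain.gap_opens, field_names)
```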
pyseqalignment-0.1.4/src/pyseqalign/core/nw_affine.py ADDED

@@ -0,0 +1,202 @@
+"""Needleman-Wunsch global alignment with affine gap penalties.
+Translated from the legacy C++ AlignerAffine::_align() implementation.
+Uses three DP matrices (M, Ix, Iy) to distinguish gap-open from gap-extend.
+"""
+from __future__ import annotations
+import numpy as np
+from pyseqalign.core.alignment import AffineAlignmentResult
+from pyseqalign.scoring.protocols import ScoringFunction
+# Matrix indices.
+_M = 0  # match/mismatch
+_IX = 1  # gap in target (consuming query element)
+_IY = 2  # gap in query (consuming target element)
+class NeedlemanWunschAffine:
+    """Needleman-Wunsch with affine gap penalties.
+    Recurrences (similarity mode):
+      M[i][j]  = score(q[i], t[j]) + max(M[i-1][j-1], Ix[i-1][j-1], Iy[i-1][j-1])
+      Ix[i][j] = max(M[i-1][j] + gap_open, Ix[i-1][j] + gap_extend, Iy[i-1][j] + gap_open)
+      Iy[i][j] = max(M[i][j-1] + gap_open, Iy[i][j-1] + gap_extend, Ix[i][j-1] + gap_open)
+    Args:
+        scoring: Scoring function (element ID 0 = gap).
+        gap_open: Cost for opening a new gap (should be negative for penalties).
+        gap_extend: Cost for extending an existing gap (should be negative,
+            typically less severe than gap_open).
+    """
+    def __init__(
+        self,
+        scoring: ScoringFunction,
+        gap_open: float = -2.5,
+        gap_extend: float = -0.25,
+    ) -> None:
+        self.scoring = scoring
+        self.gap_open = gap_open
+        self.gap_extend = gap_extend
+    def align(self, seq1: list[int], seq2: list[int]) -> AffineAlignmentResult:
+        """Compute the optimal global alignment with affine gap penalties.
+        Args:
+            seq1: Query sequence (list of integer element IDs).
+            seq2: Target sequence.
+        Returns:
+            An ``AffineAlignmentResult`` with aligned sequences and gap statistics.
+        """
+        n = len(seq1)
+        m = len(seq2)
+        NEG_INF = -np.inf
+        # F[k, i, j] for k in {M=0, Ix=1, Iy=2}
+        F = np.full((3, n + 1, m + 1), NEG_INF, dtype=np.float64)
+        # Traceback: B[k, i, j, :] = (from_k, from_i, from_j)
+        B = np.full((3, n + 1, m + 1, 3), -1, dtype=np.int32)
+        F[_M, 0, 0] = 0.0
+        d = self.gap_open
+        e = self.gap_extend
+        # --- Border initialization: gaps along query (Ix column) ---
+        for i0 in range(n):
+            i = i0 + 1
+            if i > 1:
+                F[_IX, i, 0] = F[_IX, i - 1, 0] + e
+            else:
+                F[_IX, i, 0] = d
+            B[_IX, i, 0] = [_IX, i - 1, 0]
+            # M and Iy are -inf along this border (already set).
+        # --- Border initialization: gaps along target (Iy row) ---
+        for j0 in range(m):
+            j = j0 + 1
+            if j > 1:
+                F[_IY, 0, j] = F[_IY, 0, j - 1] + e
+            else:
+                F[_IY, 0, j] = d
+            B[_IY, 0, j] = [_IY, 0, j - 1]
+            # M and Ix are -inf along this border (already set).
+        # --- Main DP fill ---
+        for i0 in range(n):
+            i = i0 + 1
+            for j0 in range(m):
+                j = j0 + 1
+                # Match/mismatch: diagonal transition.
+                s = self.scoring.score(seq1[i - 1], seq2[j - 1])
+                candidates_m = (
+                    F[_M, i - 1, j - 1] + s,
+                    F[_IX, i - 1, j - 1] + s,
+                    F[_IY, i - 1, j - 1] + s,
+                )
+                best_k = _argmax3(candidates_m)
+                F[_M, i, j] = candidates_m[best_k]
+                B[_M, i, j] = [best_k, i - 1, j - 1]
+                # Ix: gap in target (consume query[i], skip target).
+                candidates_ix = (
+                    F[_M, i - 1, j] + d,  # new gap
+                    F[_IX, i - 1, j] + e,  # extend gap
+                    F[_IY, i - 1, j] + d,  # new gap
+                )
+                best_k = _argmax3(candidates_ix)
+                F[_IX, i, j] = candidates_ix[best_k]
+                B[_IX, i, j] = [best_k, i - 1, j]
+                # Iy: gap in query (skip query, consume target[j]).
+                candidates_iy = (
+                    F[_M, i, j - 1] + d,  # new gap
+                    F[_IY, i, j - 1] + e,  # extend gap
+                    F[_IX, i, j - 1] + d,  # new gap
+                )
+                best_k = _argmax3(candidates_iy)
+                F[_IY, i, j] = candidates_iy[best_k]
+                B[_IY, i, j] = [best_k, i, j - 1]
+        # --- Find best endpoint ---
+        end_scores = (F[_M, n, m], F[_IX, n, m], F[_IY, n, m])
+        best_end = _argmax3(end_scores)
+        score = end_scores[best_end]
+        # --- Traceback ---
+        align1, align2, gap_opens, gap_extensions = self._traceback(B, seq1, seq2, best_end, n, m)
+        return AffineAlignmentResult(
+            query=align1,
+            target=align2,
+            score=float(score),
+            length=len(align1),
+            gap_opens=gap_opens,
+            gap_extensions=gap_extensions,
+        )
+    @staticmethod
+    def _traceback(
+        B: np.ndarray,
+        seq1: list[int],
+        seq2: list[int],
+        start_k: int,
+        start_i: int,
+        start_j: int,
+    ) -> tuple[list[int], list[int], int, int]:
+        """Walk the traceback matrix to produce aligned sequences."""
+        align1: list[int] = []
+        align2: list[int] = []
+        gap_opens = 0
+        gap_extensions = 0
+        k, i, j = start_k, start_i, start_j
+        prev_k = -1
+        while i > 0 or j > 0:
+            from_k, from_i, from_j = int(B[k, i, j, 0]), int(B[k, i, j, 1]), int(B[k, i, j, 2])
+            if from_i < 0 or from_j < 0:
+                # Reached uninitialised border — shouldn't happen.
+                break
+            if k == _M:
+                # Diagonal: match/mismatch.
+                align1.append(seq1[i - 1])
+                align2.append(seq2[j - 1])
+            elif k == _IX:
+                # Gap in target.
+                align1.append(seq1[i - 1])
+                align2.append(0)
+                if prev_k != _IX:
+                    gap_opens += 1
+                else:
+                    gap_extensions += 1
+            else:  # _IY
+                # Gap in query.
+                align1.append(0)
+                align2.append(seq2[j - 1])
+                if prev_k != _IY:
+                    gap_opens += 1
+                else:
+                    gap_extensions += 1
+            prev_k = k
+            k, i, j = from_k, from_i, from_j
+        align1.reverse()
+        align2.reverse()
+        return align1, align2, gap_opens, gap_extensions
+def _argmax3(vals: tuple[float, float, float]) -> int:
+    """Return index of maximum among exactly three values."""
+    if vals[0] >= vals[1]:
+        return 0 if vals[0] >= vals[2] else 2
+    return 1 if vals[1] >= vals[2] else 2
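
The three-matrix recurrences in the class docstring above can be checked by hand on small inputs. The sketch below is a score-only restatement of those recurrences in plain Python lists, with the same border initialisation (first gap costs `gap_open`, each further gap costs `gap_extend`). It is an illustration under stated assumptions, not the package's implementation: it has no traceback, and the function name `affine_nw_score` and the toy `match_score` function are hypothetical.

```python
from __future__ import annotations

from typing import Callable

NEG_INF = float('-inf')


def affine_nw_score(
    seq1: list[int],
    seq2: list[int],
    score: Callable[[int, int], float],
    gap_open: float = -2.5,
    gap_extend: float = -0.25,
) -> float:
    """Optimal global alignment score under affine gap costs (no traceback)."""
    n, m = len(seq1), len(seq2)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # ends in match/mismatch
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with gap in target
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with gap in query
    M[0][0] = 0.0
    # Borders: one gap-open followed by gap-extends.
    for i in range(1, n + 1):
        Ix[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Iy[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(seq1[i - 1], seq2[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + gap_open,
                           Ix[i - 1][j] + gap_extend,
                           Iy[i - 1][j] + gap_open)
            Iy[i][j] = max(M[i][j - 1] + gap_open,
                           Iy[i][j - 1] + gap_extend,
                           Ix[i][j - 1] + gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


def match_score(a: int, b: int) -> float:
    """Toy similarity: +1 for identical IDs, -1 otherwise."""
    return 1.0 if a == b else -1.0


print(affine_nw_score([1, 2, 3, 4], [1, 4], match_score))
```

For `[1, 2, 3, 4]` against `[1, 4]` the best alignment matches `1` and `4` and spans `2, 3` with a single gap, so the score is 1 + (-2.5) + (-0.25) + 1 = -0.75. Splitting the same two gap positions into two separate gaps would pay `gap_open` twice, which is the behaviour affine costs are meant to discourage.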

pyseqalignment-0.1.4/src/pyseqalign/logo/__init__.py ADDED

@@ -0,0 +1,17 @@
+"""Relational sequence logos — position-specific profiles of logical atoms."""
+from pyseqalign.logo.probability import FreqDist, LidstoneProbDist, MLEProbDist
+from pyseqalign.logo.profile import PositionProfile, RelationalProfile
+from pyseqalign.logo.render import column_ic, lgg_atoms, relational_logo, term_str
+__all__ = [
+    'FreqDist',
+    'MLEProbDist',
+    'LidstoneProbDist',
+    'PositionProfile',
+    'RelationalProfile',
+    'relational_logo',
+    'column_ic',
+    'lgg_atoms',
+    'term_str',
+]
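
The README describes the logos as "information-content" logos, and this module exports a `column_ic` helper whose body is not shown in this diff. As a rough guide to the idea only, the sketch below computes the classic sequence-logo quantity for one alignment column: the maximum possible entropy minus the observed Shannon entropy, in bits. Whether `column_ic` uses this exact formula, a smoothed estimate, or a different alphabet size is an assumption that this diff does not confirm. The name `column_information` is hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def column_information(column: list[str], alphabet_size: int) -> float:
    """log2(alphabet_size) minus the Shannon entropy (bits) of the column."""
    if not column:
        raise ValueError('column must contain at least one symbol')
    counts = Counter(column)
    total = sum(counts.values())
    # Maximum-likelihood frequencies; no smoothing is applied in this sketch.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


conserved = column_information(['h', 'h', 'h', 'h'], alphabet_size=4)
uniform = column_information(['h', 's', 'p', 'm'], alphabet_size=4)
print(conserved, uniform)
```

A fully conserved column carries the maximum of log2(4) = 2 bits, while a column where all four symbols are equally frequent carries none.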
