pyseqalignment 0.1.1__tar.gz → 0.1.3__tar.gz

Files changed (55)
  1. {pyseqalignment-0.1.1/src/pyseqalignment.egg-info → pyseqalignment-0.1.3}/PKG-INFO +47 -1
  2. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/README.md +46 -0
  3. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/pyproject.toml +5 -2
  4. pyseqalignment-0.1.3/setup.py +58 -0
  5. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/__init__.py +1 -1
  6. pyseqalignment-0.1.3/src/pyseqalign/accel.py +33 -0
  7. pyseqalignment-0.1.3/src/pyseqalign/cpp/build_cpp_aligner.sh +12 -0
  8. pyseqalignment-0.1.3/src/pyseqalign/cpp/cpp_aligner.cpp +140 -0
  9. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3/src/pyseqalignment.egg-info}/PKG-INFO +47 -1
  10. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalignment.egg-info/SOURCES.txt +4 -0
  11. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/LICENSE +0 -0
  12. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/setup.cfg +0 -0
  13. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/core/__init__.py +0 -0
  14. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/core/alignment.py +0 -0
  15. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/core/needleman_wunsch.py +0 -0
  16. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/core/smith_waterman.py +0 -0
  17. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/learning/__init__.py +0 -0
  18. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/learning/aleph.py +0 -0
  19. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/learning/aleph_files/__init__.py +0 -0
  20. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/learning/aleph_files/aleph_swi_ak.pl +0 -0
  21. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/learning/base.py +0 -0
  22. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/learning/popper.py +0 -0
  23. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/learning/task_builder.py +0 -0
  24. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/prolog/__init__.py +0 -0
  25. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/prolog/engine.py +0 -0
  26. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/prolog/knowledge/__init__.py +0 -0
  27. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/prolog/knowledge/amino_acids.pl +0 -0
  28. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/prolog/knowledge/blosum50.pl +0 -0
  29. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/prolog/knowledge/defaults.pl +0 -0
  30. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/prolog/knowledge/distances.pl +0 -0
  31. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/__init__.py +0 -0
  32. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/distance.py +0 -0
  33. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrices.py +0 -0
  34. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/BLOSUM100 +0 -0
  35. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/BLOSUM50 +0 -0
  36. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/BLOSUM60 +0 -0
  37. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/BLOSUM62 +0 -0
  38. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/BLOSUM70 +0 -0
  39. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/BLOSUM80 +0 -0
  40. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/BLOSUM90 +0 -0
  41. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/PAM150 +0 -0
  42. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/PAM200 +0 -0
  43. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/PAM250 +0 -0
  44. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/PAM50 +0 -0
  45. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/scoring/matrix_data/__init__.py +0 -0
  46. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/utils/__init__.py +0 -0
  47. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalign/utils/helpers.py +0 -0
  48. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalignment.egg-info/dependency_links.txt +0 -0
  49. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalignment.egg-info/requires.txt +0 -0
  50. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/src/pyseqalignment.egg-info/top_level.txt +0 -0
  51. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/tests/test_learning.py +0 -0
  52. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/tests/test_needleman_wunsch.py +0 -0
  53. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/tests/test_scoring.py +0 -0
  54. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/tests/test_smith_waterman.py +0 -0
  55. {pyseqalignment-0.1.1 → pyseqalignment-0.1.3}/tests/test_utils.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pyseqalignment
- Version: 0.1.1
+ Version: 0.1.3
  Summary: pySeqAlign -- sequence alignment with Prolog-style distance functions and ILP learning
  Author-email: Andreas Karwath <a.karwath@bham.ac.uk>
  License-Expression: MIT
@@ -64,6 +64,7 @@ pySeqAlign provides Smith-Waterman (local) and Needleman-Wunsch (global) sequenc
  - **Aleph** backend -- classic ILP system (Srinivasan, 2001)
  - **Popper** backend -- modern ILP via learning from failures (Cropper & Morel, 2021)
  - **Pure Python** core -- no C extension required (unlike the legacy version)
+ - **Optional fast C++ aligner** -- a pybind11 affine-gap Needleman-Wunsch kernel (`pyseqalign.accel`) for large all-pairs / iterative workloads; identical results to the Python aligner, ~100-270x faster
 
  ## Installation
 
@@ -308,6 +309,51 @@ For reference, other notable systems in the field include:
  - [Metagol](https://github.com/metagol/metagol) -- Meta-Interpretive Learning
  - [DeepStochLog](https://github.com/ML-KULeuven/deepstochlog) -- Neural-symbolic ILP combining logic and neural networks
 
+ ## Fast C++ aligner (optional)
+
+ The pure-Python aligners are fine for typical use. For heavy workloads (e.g.
+ boosting that re-aligns thousands of sequence pairs per iteration), an optional
+ **C++ affine-gap Needleman-Wunsch** kernel is provided.
+
+ **It is compiled automatically at install time** (best effort). The package is
+ distributed as an sdist, so `pip install pyseqalignment` builds from source and
+ tries to compile the accelerator for your Python/ABI/platform using a C++
+ compiler + pybind11 (pulled in as a build dependency). On macOS/Linux with a
+ compiler present this "just works"; if no compiler is available the install
+ still succeeds and the library falls back to the pure-Python aligner. Check with:
+
+ ```python
+ from pyseqalign.accel import cpp_available
+ print(cpp_available()) # True if the accelerator compiled at install
+ ```
+
+ If you installed without a compiler and later want the accelerator, install one
+ (Xcode Command Line Tools on macOS, `build-essential` on Debian/Ubuntu) and
+ either reinstall (`pip install --force-reinstall --no-binary :all: pyseqalignment`)
+ or build it in place once:
+
+ ```bash
+ pip install pybind11
+ src/pyseqalign/cpp/build_cpp_aligner.sh # or: PY=$(which python) src/.../build_cpp_aligner.sh
+ ```
+
+ Either way it compiles the extension into the `pyseqalign` package. Then:
+
+ ```python
+ from pyseqalign.accel import cpp_available, load
+ assert cpp_available()
+ cpp = load() # the compiled module
+ al = cpp.CppAligner(num_ids, gap_open, gap_extend)
+ al.set_matrix(flat_score_matrix) # (num_ids+1)^2 row-major scores
+ r = al.align(query_ids, target_ids) # r.score, r.query, r.target, r.gap_opens, ...
+ ```
+
+ The kernel implements the same 3-matrix (M/Ix/Iy) affine recurrence as a numeric
+ aligner over a dense score matrix, so its results are interchangeable with a
+ pure-Python affine Needleman-Wunsch (validated bit-for-bit). If the extension is
+ not built, `cpp_available()` is `False` and `load()` raises a helpful error --
+ callers should fall back to the Python aligner.
+
  ## Background
 
  **pySeqAlign** is a modern, pure-Python reimplementation that revives the name of one of its own ancestors. It succeeds two legacy libraries behind the ILP 2006 / ICDM 2008 work: the original **pyAlign** (SWIG-wrapped C with YAP Prolog bindings for alignment) and the original **pySeqAlign** (which held the Aleph ILP framework for learning rules from alignment examples). This version is pure Python, with optional SWI-Prolog integration via [Janus](https://www.swi-prolog.org/packs/list?p=janus) (the modern Python-Prolog bridge, replacing the older pyswip).
@@ -26,6 +26,7 @@ pySeqAlign provides Smith-Waterman (local) and Needleman-Wunsch (global) sequenc
  - **Aleph** backend -- classic ILP system (Srinivasan, 2001)
  - **Popper** backend -- modern ILP via learning from failures (Cropper & Morel, 2021)
  - **Pure Python** core -- no C extension required (unlike the legacy version)
+ - **Optional fast C++ aligner** -- a pybind11 affine-gap Needleman-Wunsch kernel (`pyseqalign.accel`) for large all-pairs / iterative workloads; identical results to the Python aligner, ~100-270x faster
 
  ## Installation
 
@@ -270,6 +271,51 @@ For reference, other notable systems in the field include:
  - [Metagol](https://github.com/metagol/metagol) -- Meta-Interpretive Learning
  - [DeepStochLog](https://github.com/ML-KULeuven/deepstochlog) -- Neural-symbolic ILP combining logic and neural networks
 
+ ## Fast C++ aligner (optional)
+
+ The pure-Python aligners are fine for typical use. For heavy workloads (e.g.
+ boosting that re-aligns thousands of sequence pairs per iteration), an optional
+ **C++ affine-gap Needleman-Wunsch** kernel is provided.
+
+ **It is compiled automatically at install time** (best effort). The package is
+ distributed as an sdist, so `pip install pyseqalignment` builds from source and
+ tries to compile the accelerator for your Python/ABI/platform using a C++
+ compiler + pybind11 (pulled in as a build dependency). On macOS/Linux with a
+ compiler present this "just works"; if no compiler is available the install
+ still succeeds and the library falls back to the pure-Python aligner. Check with:
+
+ ```python
+ from pyseqalign.accel import cpp_available
+ print(cpp_available()) # True if the accelerator compiled at install
+ ```
+
+ If you installed without a compiler and later want the accelerator, install one
+ (Xcode Command Line Tools on macOS, `build-essential` on Debian/Ubuntu) and
+ either reinstall (`pip install --force-reinstall --no-binary :all: pyseqalignment`)
+ or build it in place once:
+
+ ```bash
+ pip install pybind11
+ src/pyseqalign/cpp/build_cpp_aligner.sh # or: PY=$(which python) src/.../build_cpp_aligner.sh
+ ```
+
+ Either way it compiles the extension into the `pyseqalign` package. Then:
+
+ ```python
+ from pyseqalign.accel import cpp_available, load
+ assert cpp_available()
+ cpp = load() # the compiled module
+ al = cpp.CppAligner(num_ids, gap_open, gap_extend)
+ al.set_matrix(flat_score_matrix) # (num_ids+1)^2 row-major scores
+ r = al.align(query_ids, target_ids) # r.score, r.query, r.target, r.gap_opens, ...
+ ```
+
+ The kernel implements the same 3-matrix (M/Ix/Iy) affine recurrence as a numeric
+ aligner over a dense score matrix, so its results are interchangeable with a
+ pure-Python affine Needleman-Wunsch (validated bit-for-bit). If the extension is
+ not built, `cpp_available()` is `False` and `load()` raises a helpful error --
+ callers should fall back to the Python aligner.
+
  ## Background
 
  **pySeqAlign** is a modern, pure-Python reimplementation that revives the name of one of its own ancestors. It succeeds two legacy libraries behind the ILP 2006 / ICDM 2008 work: the original **pyAlign** (SWIG-wrapped C with YAP Prolog bindings for alignment) and the original **pySeqAlign** (which held the Aleph ILP framework for learning rules from alignment examples). This version is pure Python, with optional SWI-Prolog integration via [Janus](https://www.swi-prolog.org/packs/list?p=janus) (the modern Python-Prolog bridge, replacing the older pyswip).
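The README hunk above passes `set_matrix` a flat list described as "(num_ids+1)^2 row-major scores", with id 0 reserved for the gap. A minimal sketch of building such a list from pairwise scores might look like this. The helper name `flatten_scores` and the example scores are hypothetical and not part of the package; only the layout (`a * (num_ids + 1) + b`) is taken from the diff.

```python
from typing import Dict, List, Tuple


def flatten_scores(
    num_ids: int,
    pair_scores: Dict[Tuple[int, int], float],
    default: float = 0.0,
) -> List[float]:
    """Build a (num_ids + 1)^2 row-major score list (hypothetical helper).

    Index 0 is left for the gap id, so valid ids run from 0 to num_ids.
    """
    side = num_ids + 1
    flat = [default] * (side * side)
    for (a, b), value in pair_scores.items():
        if not (0 <= a < side and 0 <= b < side):
            raise ValueError(f"id pair {(a, b)} is outside 0..{num_ids}")
        # Row-major layout: row = first id, column = second id.
        flat[a * side + b] = float(value)
    return flat


# Illustrative scores for two ids: match = 2.0, mismatch = -1.0.
scores = {(1, 1): 2.0, (2, 2): 2.0, (1, 2): -1.0, (2, 1): -1.0}
flat = flatten_scores(2, scores)
```

The resulting `flat` has nine entries and could be handed to `set_matrix` on an aligner constructed with `num_ids=2`, assuming the extension is built.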
@@ -1,12 +1,14 @@
  [build-system]
- requires = ["setuptools>=68.0"]
+ # pybind11 is a build-time dep so the OPTIONAL C++ aligner can be compiled at
+ # install time (best-effort; see setup.py). It is not a runtime dependency.
+ requires = ["setuptools>=68.0", "pybind11>=2.10"]
  build-backend = "setuptools.build_meta"
 
  [project]
  # PyPI distribution name (the import package is `pyseqalign`; the name
  # `pyseqalign` was blocked by PyPI's similarity guard vs. an existing project).
  name = "pyseqalignment"
- version = "0.1.1"
+ version = "0.1.3"
  description = "pySeqAlign -- sequence alignment with Prolog-style distance functions and ILP learning"
  readme = "README.md"
  license = "MIT"
@@ -66,6 +68,7 @@ where = ["src"]
  "pyseqalign.prolog.knowledge" = ["*.pl"]
  "pyseqalign.learning.aleph_files" = ["*.pl"]
  "pyseqalign.scoring.matrix_data" = ["*"]
+ "pyseqalign" = ["cpp/*.cpp", "cpp/*.sh"] # optional C++ aligner source (build manually)
 
  [tool.pytest.ini_options]
  testpaths = ["tests"]
@@ -0,0 +1,58 @@
+ """Best-effort build of the optional C++ affine-gap aligner.
+
+ pySeqAlign's core is pure Python. The C++ aligner (``pyseqalign.cpp_aligner``)
+ is an OPTIONAL accelerator (~100-270x faster, identical results). We try to
+ compile it at install time on any platform that has a C++ compiler + pybind11;
+ if that fails (no compiler, no pybind11, unsupported platform, ...) the install
+ STILL SUCCEEDS and the library transparently falls back to the pure-Python
+ aligner -- see ``pyseqalign.accel``.
+
+ This is why the project publishes an sdist (not a pure-Python wheel): pip builds
+ from source on the target machine, giving every install a chance to compile the
+ accelerator locally for its own Python/ABI/platform.
+ """
+
+ from __future__ import annotations
+
+ import sys
+
+ from setuptools import Extension, setup
+ from setuptools.command.build_ext import build_ext
+
+ # Only build on POSIX (macOS/Linux); the hand-tuned flags below are GCC/Clang.
+ _CPP = 'src/pyseqalign/cpp/cpp_aligner.cpp'
+ ext_modules: list[Extension] = []
+ if sys.platform != 'win32':
+     try:
+         import pybind11
+         ext_modules = [
+             Extension(
+                 'pyseqalign.cpp_aligner',
+                 [_CPP],
+                 include_dirs=[pybind11.get_include()],
+                 language='c++',
+                 optional=True, # setuptools won't fail the build if this ext won't compile
+                 extra_compile_args=['-O3', '-std=c++14'],
+             )
+         ]
+     except Exception as exc: # pybind11 missing -> skip the accelerator
+         print(f'pyseqalign: skipping optional C++ aligner ({exc}); pure-Python fallback.')
+
+
+ class BestEffortBuildExt(build_ext):
+     """Compile the accelerator if possible; never break the install if not."""
+
+     def run(self) -> None:
+         try:
+             super().run()
+         except Exception as exc: # pragma: no cover - depends on build env
+             print(f'pyseqalign: optional C++ aligner not built ({exc}); pure-Python fallback.')
+
+     def build_extension(self, ext) -> None:
+         try:
+             super().build_extension(ext)
+         except Exception as exc: # pragma: no cover - depends on build env
+             print(f'pyseqalign: optional C++ aligner not built ({exc}); pure-Python fallback.')
+
+
+ setup(ext_modules=ext_modules, cmdclass={'build_ext': BestEffortBuildExt})
@@ -4,7 +4,7 @@ from pyseqalign.core.alignment import AlignmentResult, LocalAlignmentResult
  from pyseqalign.core.needleman_wunsch import NeedlemanWunsch
  from pyseqalign.core.smith_waterman import SmithWaterman
 
- __version__ = "0.1.0"
+ __version__ = "0.1.3"
 
  __all__ = [
      "SmithWaterman",
@@ -0,0 +1,33 @@
+ """Optional fast C++ affine-gap aligner (pybind11).
+
+ pySeqAlign's core is pure Python; this accelerated aligner is OPTIONAL. Build it
+ with ``src/pyseqalign/cpp/build_cpp_aligner.sh`` (needs a C++ compiler + pybind11).
+
+ It implements the same 3-matrix affine Needleman-Wunsch recurrence as a numeric
+ kernel over a dense score matrix, so results are interchangeable with a
+ pure-Python affine aligner. Exposes ``CppAligner(num_ids, gap_open, gap_extend)``
+ with ``set_matrix(flat_row_major)`` and ``align(q, t) -> AlignResult``
+ (``.score/.query/.target/.gap_opens/.gap_extensions/.length``).
+ """
+ from __future__ import annotations
+
+
+ def cpp_available() -> bool:
+     """True if the compiled extension is built and importable."""
+     try:
+         from . import cpp_aligner # noqa: F401
+         return True
+     except Exception:
+         return False
+
+
+ def load():
+     """Return the compiled module (raises a helpful error if not built)."""
+     try:
+         from . import cpp_aligner
+         return cpp_aligner
+     except ImportError as e: # pragma: no cover
+         raise ImportError(
+             'pySeqAlign C++ aligner not built. Run '
+             'src/pyseqalign/cpp/build_cpp_aligner.sh (needs a C++ compiler + pybind11).'
+         ) from e
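The `cpp_available()` / `load()` pair in `accel.py` is an instance of the usual optional-import pattern: try the compiled module, and let the caller fall back when it is missing. A generic, self-contained sketch of that pattern, using only the standard library, could look like the following. The names `optional_import` and `pick_backend` are illustrative assumptions and do not exist in the package.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except Exception:
        # Mirrors the broad catch in cpp_available(): a broken binary can
        # fail with more than ImportError, so any failure means "not there".
        return None


def pick_backend(accelerated: str, fallback: str) -> ModuleType:
    """Prefer the accelerated module; otherwise import the fallback."""
    module = optional_import(accelerated)
    if module is not None:
        return module
    return importlib.import_module(fallback)


# Stand-in module names for illustration only.
backend = pick_backend("no_such_accelerator_module_xyz", "json")
```

With this shape the calling code never has to catch an import failure itself; it simply works with whichever backend was returned.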
@@ -0,0 +1,12 @@
+ #!/bin/bash
+ # Build the optional fast C++ affine-gap aligner (pybind11) for the active Python.
+ # Usage: PY=/path/to/python ./build_cpp_aligner.sh (default: python3)
+ set -e
+ PY="${PY:-python3}"
+ HERE="$(cd "$(dirname "$0")" && pwd)"
+ PYINC=$("$PY" -c "import sysconfig;print(sysconfig.get_path('include'))")
+ PBINC=$("$PY" -c "import pybind11;print(pybind11.get_include())")
+ SUF=$("$PY" -c "import sysconfig;print(sysconfig.get_config_var('EXT_SUFFIX'))")
+ clang++ -O3 -std=c++14 -shared -undefined dynamic_lookup -fPIC \
+     -I"$PYINC" -I"$PBINC" "$HERE/cpp_aligner.cpp" -o "$HERE/../cpp_aligner$SUF"
+ echo "built $HERE/../cpp_aligner$SUF"
@@ -0,0 +1,140 @@
+ // Fast C++ Needleman-Wunsch affine-gap aligner for pyREAL boosting.
+ //
+ // Mirrors pyreal.core.nw_affine.NeedlemanWunschAffine EXACTLY (3-matrix M/Ix/Iy
+ // DP, same border init, same argmax tie-breaking, same traceback + gap counting)
+ // so the Python and C++ backends are interchangeable. Adapted from the pyAlign2
+ // AlignerAffine structure, but YAP-free: scores come from a flat distance matrix
+ // (the boosting reward matrix), set once per reward update; align() is then a
+ // pure numeric kernel callable thousands of times without recomputing scores.
+ //
+ // Build: see build_cpp_aligner.sh
+ #include <pybind11/pybind11.h>
+ #include <pybind11/stl.h>
+ #include <vector>
+ #include <limits>
+ #include <algorithm>
+ #include <stdexcept>
+
+ namespace py = pybind11;
+
+ static const int M = 0, IX = 1, IY = 2;
+
+ struct AlignResult {
+     double score = 0.0;
+     std::vector<int> query; // aligned query (0 = gap)
+     std::vector<int> target; // aligned target (0 = gap)
+     int gap_opens = 0;
+     int gap_extensions = 0;
+     int length = 0;
+ };
+
+ class CppAligner {
+ public:
+     // num_ids = number of distinct atom ids (gap id 0 excluded). The score
+     // matrix is (num_ids+1) x (num_ids+1), row-major, indexed by atom id.
+     CppAligner(int num_ids, double gap_open, double gap_extend)
+         : n1_(num_ids + 1), gap_open_(gap_open), gap_extend_(gap_extend),
+           mat_((size_t)(num_ids + 1) * (num_ids + 1), 0.0) {}
+
+     void set_matrix(const std::vector<double>& flat) {
+         if (flat.size() != mat_.size())
+             throw std::runtime_error("score matrix size mismatch");
+         mat_ = flat;
+     }
+
+     inline double score(int a, int b) const { return mat_[(size_t)a * n1_ + b]; }
+
+     AlignResult align(const std::vector<int>& q, const std::vector<int>& t) const {
+         const int n = (int)q.size(), m = (int)t.size();
+         const double NEG = -std::numeric_limits<double>::infinity();
+         const double d = gap_open_, e = gap_extend_;
+
+         // F[k][i][j], B stores (from_k, from_i, from_j)
+         std::vector<double> F((size_t)3 * (n + 1) * (m + 1), NEG);
+         std::vector<int> B((size_t)3 * (n + 1) * (m + 1) * 3, -1);
+         auto Fi = [&](int k, int i, int j) -> double& {
+             return F[((size_t)k * (n + 1) + i) * (m + 1) + j];
+         };
+         auto Bi = [&](int k, int i, int j, int c) -> int& {
+             return B[(((size_t)k * (n + 1) + i) * (m + 1) + j) * 3 + c];
+         };
+         auto setB = [&](int k, int i, int j, int fk, int fi, int fj) {
+             Bi(k, i, j, 0) = fk; Bi(k, i, j, 1) = fi; Bi(k, i, j, 2) = fj;
+         };
+
+         Fi(M, 0, 0) = 0.0;
+         for (int i = 1; i <= n; ++i) {
+             Fi(IX, i, 0) = (i > 1) ? Fi(IX, i - 1, 0) + e : d;
+             setB(IX, i, 0, IX, i - 1, 0);
+         }
+         for (int j = 1; j <= m; ++j) {
+             Fi(IY, 0, j) = (j > 1) ? Fi(IY, 0, j - 1) + e : d;
+             setB(IY, 0, j, IY, 0, j - 1);
+         }
+
+         for (int i = 1; i <= n; ++i) {
+             for (int j = 1; j <= m; ++j) {
+                 double s = score(q[i - 1], t[j - 1]);
+                 double cm[3] = {Fi(M, i - 1, j - 1) + s, Fi(IX, i - 1, j - 1) + s, Fi(IY, i - 1, j - 1) + s};
+                 int bk = argmax3(cm);
+                 Fi(M, i, j) = cm[bk]; setB(M, i, j, bk, i - 1, j - 1);
+
+                 double cx[3] = {Fi(M, i - 1, j) + d, Fi(IX, i - 1, j) + e, Fi(IY, i - 1, j) + d};
+                 bk = argmax3(cx);
+                 Fi(IX, i, j) = cx[bk]; setB(IX, i, j, bk, i - 1, j);
+
+                 double cy[3] = {Fi(M, i, j - 1) + d, Fi(IY, i, j - 1) + e, Fi(IX, i, j - 1) + d};
+                 bk = argmax3(cy);
+                 Fi(IY, i, j) = cy[bk]; setB(IY, i, j, bk, i, j - 1);
+             }
+         }
+
+         double ends[3] = {Fi(M, n, m), Fi(IX, n, m), Fi(IY, n, m)};
+         int best = argmax3(ends);
+
+         AlignResult r;
+         r.score = ends[best];
+         int k = best, i = n, j = m, prev_k = -1;
+         while (i > 0 || j > 0) {
+             int fk = Bi(k, i, j, 0), fi = Bi(k, i, j, 1), fj = Bi(k, i, j, 2);
+             if (fi < 0 || fj < 0) break;
+             if (k == M) { r.query.push_back(q[i - 1]); r.target.push_back(t[j - 1]); }
+             else if (k == IX) {
+                 r.query.push_back(q[i - 1]); r.target.push_back(0);
+                 if (prev_k != IX) r.gap_opens++; else r.gap_extensions++;
+             } else {
+                 r.query.push_back(0); r.target.push_back(t[j - 1]);
+                 if (prev_k != IY) r.gap_opens++; else r.gap_extensions++;
+             }
+             prev_k = k; k = fk; i = fi; j = fj;
+         }
+         std::reverse(r.query.begin(), r.query.end());
+         std::reverse(r.target.begin(), r.target.end());
+         r.length = (int)r.query.size();
+         return r;
+     }
+
+ private:
+     int n1_;
+     double gap_open_, gap_extend_;
+     std::vector<double> mat_;
+
+     static inline int argmax3(const double v[3]) {
+         if (v[0] >= v[1]) return (v[0] >= v[2]) ? 0 : 2;
+         return (v[1] >= v[2]) ? 1 : 2;
+     }
+ };
+
+ PYBIND11_MODULE(cpp_aligner, mod) {
+     py::class_<AlignResult>(mod, "AlignResult")
+         .def_readonly("score", &AlignResult::score)
+         .def_readonly("query", &AlignResult::query)
+         .def_readonly("target", &AlignResult::target)
+         .def_readonly("gap_opens", &AlignResult::gap_opens)
+         .def_readonly("gap_extensions", &AlignResult::gap_extensions)
+         .def_readonly("length", &AlignResult::length);
+     py::class_<CppAligner>(mod, "CppAligner")
+         .def(py::init<int, double, double>())
+         .def("set_matrix", &CppAligner::set_matrix)
+         .def("align", &CppAligner::align);
+ }
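For readers who want to check the recurrence without compiling anything, the forward pass of the kernel above can be restated in a few lines of Python. This is a score-only sketch written for this note, not the package's own Python aligner: it omits the traceback and gap counting, and the function name `affine_nw_score` is an assumption. As in the C++ code, `gap_open` and `gap_extend` are added to the running score, so penalties would be passed as negative numbers, and the first gap position costs `gap_open` alone.

```python
import math
from typing import List, Sequence

NEG = -math.inf


def affine_nw_score(
    q: Sequence[int],
    t: Sequence[int],
    flat: Sequence[float],
    num_ids: int,
    gap_open: float,
    gap_extend: float,
) -> float:
    """Score-only 3-state (M/Ix/Iy) global alignment over a flat score matrix."""
    side = num_ids + 1
    n, m = len(q), len(t)
    d, e = gap_open, gap_extend

    def table() -> List[List[float]]:
        return [[NEG] * (m + 1) for _ in range(n + 1)]

    M, IX, IY = table(), table(), table()

    # Border initialisation: only the gap states are reachable on the edges.
    M[0][0] = 0.0
    for i in range(1, n + 1):
        IX[i][0] = d if i == 1 else IX[i - 1][0] + e
    for j in range(1, m + 1):
        IY[0][j] = d if j == 1 else IY[0][j - 1] + e

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = flat[q[i - 1] * side + t[j - 1]]
            # M: both symbols consumed; any state may precede it.
            M[i][j] = max(M[i - 1][j - 1], IX[i - 1][j - 1], IY[i - 1][j - 1]) + s
            # IX: query symbol against a gap; extending IX costs e, else d.
            IX[i][j] = max(M[i - 1][j] + d, IX[i - 1][j] + e, IY[i - 1][j] + d)
            # IY: target symbol against a gap; extending IY costs e, else d.
            IY[i][j] = max(M[i][j - 1] + d, IY[i][j - 1] + e, IX[i][j - 1] + d)

    return max(M[n][m], IX[n][m], IY[n][m])


# Two ids, match = 2.0, mismatch = -1.0; row and column 0 belong to the gap id.
flat = [0.0, 0.0, 0.0,
        0.0, 2.0, -1.0,
        0.0, -1.0, 2.0]
full_match = affine_nw_score([1, 2], [1, 2], flat, 2, -3.0, -1.0)
one_gap = affine_nw_score([1, 2], [1], flat, 2, -3.0, -1.0)
```

Here `full_match` is two matches (2.0 + 2.0) and `one_gap` is one match followed by a single opened gap (2.0 - 3.0), which makes the sketch easy to compare by hand against the C++ recurrence.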
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pyseqalignment
- Version: 0.1.1
+ Version: 0.1.3
  Summary: pySeqAlign -- sequence alignment with Prolog-style distance functions and ILP learning
  Author-email: Andreas Karwath <a.karwath@bham.ac.uk>
  License-Expression: MIT
@@ -64,6 +64,7 @@ pySeqAlign provides Smith-Waterman (local) and Needleman-Wunsch (global) sequenc
  - **Aleph** backend -- classic ILP system (Srinivasan, 2001)
  - **Popper** backend -- modern ILP via learning from failures (Cropper & Morel, 2021)
  - **Pure Python** core -- no C extension required (unlike the legacy version)
+ - **Optional fast C++ aligner** -- a pybind11 affine-gap Needleman-Wunsch kernel (`pyseqalign.accel`) for large all-pairs / iterative workloads; identical results to the Python aligner, ~100-270x faster
 
  ## Installation
 
@@ -308,6 +309,51 @@ For reference, other notable systems in the field include:
  - [Metagol](https://github.com/metagol/metagol) -- Meta-Interpretive Learning
  - [DeepStochLog](https://github.com/ML-KULeuven/deepstochlog) -- Neural-symbolic ILP combining logic and neural networks
 
+ ## Fast C++ aligner (optional)
+
+ The pure-Python aligners are fine for typical use. For heavy workloads (e.g.
+ boosting that re-aligns thousands of sequence pairs per iteration), an optional
+ **C++ affine-gap Needleman-Wunsch** kernel is provided.
+
+ **It is compiled automatically at install time** (best effort). The package is
+ distributed as an sdist, so `pip install pyseqalignment` builds from source and
+ tries to compile the accelerator for your Python/ABI/platform using a C++
+ compiler + pybind11 (pulled in as a build dependency). On macOS/Linux with a
+ compiler present this "just works"; if no compiler is available the install
+ still succeeds and the library falls back to the pure-Python aligner. Check with:
+
+ ```python
+ from pyseqalign.accel import cpp_available
+ print(cpp_available()) # True if the accelerator compiled at install
+ ```
+
+ If you installed without a compiler and later want the accelerator, install one
+ (Xcode Command Line Tools on macOS, `build-essential` on Debian/Ubuntu) and
+ either reinstall (`pip install --force-reinstall --no-binary :all: pyseqalignment`)
+ or build it in place once:
+
+ ```bash
+ pip install pybind11
+ src/pyseqalign/cpp/build_cpp_aligner.sh # or: PY=$(which python) src/.../build_cpp_aligner.sh
+ ```
+
+ Either way it compiles the extension into the `pyseqalign` package. Then:
+
+ ```python
+ from pyseqalign.accel import cpp_available, load
+ assert cpp_available()
+ cpp = load() # the compiled module
+ al = cpp.CppAligner(num_ids, gap_open, gap_extend)
+ al.set_matrix(flat_score_matrix) # (num_ids+1)^2 row-major scores
+ r = al.align(query_ids, target_ids) # r.score, r.query, r.target, r.gap_opens, ...
+ ```
+
+ The kernel implements the same 3-matrix (M/Ix/Iy) affine recurrence as a numeric
+ aligner over a dense score matrix, so its results are interchangeable with a
+ pure-Python affine Needleman-Wunsch (validated bit-for-bit). If the extension is
+ not built, `cpp_available()` is `False` and `load()` raises a helpful error --
+ callers should fall back to the Python aligner.
+
  ## Background
 
  **pySeqAlign** is a modern, pure-Python reimplementation that revives the name of one of its own ancestors. It succeeds two legacy libraries behind the ILP 2006 / ICDM 2008 work: the original **pyAlign** (SWIG-wrapped C with YAP Prolog bindings for alignment) and the original **pySeqAlign** (which held the Aleph ILP framework for learning rules from alignment examples). This version is pure Python, with optional SWI-Prolog integration via [Janus](https://www.swi-prolog.org/packs/list?p=janus) (the modern Python-Prolog bridge, replacing the older pyswip).
@@ -1,11 +1,15 @@
  LICENSE
  README.md
  pyproject.toml
+ setup.py
  src/pyseqalign/__init__.py
+ src/pyseqalign/accel.py
  src/pyseqalign/core/__init__.py
  src/pyseqalign/core/alignment.py
  src/pyseqalign/core/needleman_wunsch.py
  src/pyseqalign/core/smith_waterman.py
+ src/pyseqalign/cpp/build_cpp_aligner.sh
+ src/pyseqalign/cpp/cpp_aligner.cpp
  src/pyseqalign/learning/__init__.py
  src/pyseqalign/learning/aleph.py
  src/pyseqalign/learning/base.py
File without changes