PyPI - late-interaction-kernels - Versions diffs - 0.0.1__tar.gz - Mend

late-interaction-kernels 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (87)

late_interaction_kernels-0.0.1/.github/ISSUE_TEMPLATE/bug_report.yml ADDED

@@ -0,0 +1,54 @@
+name: Bug report
+description: A kernel crashes, returns wrong numbers, or is slower than expected.
+labels: ["bug"]
+body:
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: One sentence — what went wrong?
+    validations:
+      required: true
+  - type: textarea
+    id: env
+    attributes:
+      label: Environment
+      description: Output of the commands below.
+      placeholder: |
+        python -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda)"
+        python -c "import triton; print('triton', triton.__version__)"
+        python -c "import late_interaction_kernels as lik; print('lik', lik.__version__)"
+        python -c "import pylate; print('pylate', pylate.__version__)"  # if PyLate-related
+        nvidia-smi | head -5
+      render: shell
+    validations:
+      required: true
+  - type: textarea
+    id: repro
+    attributes:
+      label: Minimal reproducer
+      description: |
+        < 30 lines of runnable Python. For PyLate issues, include `patch_pylate()`
+        and one loss / scoring call. For perf issues, adapt one of `benchmarks/bench_*.py`.
+      render: python
+    validations:
+      required: true
+  - type: textarea
+    id: shape
+    attributes:
+      label: Shape / dtype
+      description: "(Nq, Nd, Lq, Ld, d), dtype, mask usage, backward method."
+      placeholder: "Nq=32, Nd=32, Lq=32, Ld=300, d=128, fp16, d_mask=True, auto"
+  - type: textarea
+    id: observed
+    attributes:
+      label: Observed vs expected
+      description: |
+        Tracebacks, numerical diffs, or perf numbers. For PyLate issues,
+        include the parity delta vs `LIK_DISABLE=1` (kill-switch path).
+    validations:
+      required: true
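The Environment field asks reporters to run several separate commands. Below is a minimal sketch that gathers the same package versions in one run. It assumes the installed distribution names are `torch`, `triton`, `late-interaction-kernels` and `pylate`; the template itself imports the modules instead. A missing package is reported rather than treated as an error. `collect_versions` is a hypothetical helper, not part of the library.

```python
"""Hypothetical helper: collect the package versions the bug-report template asks for."""
from importlib.metadata import PackageNotFoundError, version

# Distribution names are an assumption; the template imports the modules instead.
PACKAGES = ("torch", "triton", "late-interaction-kernels", "pylate")


def collect_versions(packages=PACKAGES):
    """Return {distribution: version or None} without importing the packages."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # pylate is optional; a None entry is still useful in a report.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name:26} {ver or 'not installed'}")
```

This does not replace `torch.version.cuda` or `nvidia-smi`, which the template still needs.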

late_interaction_kernels-0.0.1/.github/ISSUE_TEMPLATE/config.yml ADDED

@@ -0,0 +1,11 @@
+blank_issues_enabled: false
+contact_links:
+  - name: API reference
+    url: https://github.com/hcompai/late-interaction-kernels/blob/main/README.md#api
+    about: Public surface — patch_pylate(), MaxSimScorer, retrieve, maxsim_varlen, etc.
+  - name: Supported models
+    url: https://github.com/hcompai/late-interaction-kernels/blob/main/docs/supported_models.md
+    about: Which ColBERT / ColPali / ModernColBERT / LateOn-Code / mxbai-edge models we accelerate today.
+  - name: Packed / varlen training
+    url: https://github.com/hcompai/late-interaction-kernels/blob/main/docs/packed_training.md
+    about: Cookbook for wiring maxsim_varlen into a heterogeneous-length training loop.
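The packed-training cookbook wires `maxsim_varlen` to `cu_seqlens`-indexed flat buffers. This sketch assumes the convention used by FlashAttention's varlen API: an offset array of length `batch + 1` that starts at 0, so sequence `i` occupies rows `cu_seqlens[i]:cu_seqlens[i + 1]`. Under that assumption, the offsets can be built from per-sequence lengths roughly as follows. `build_cu_seqlens` and `unpack` are hypothetical names, not library API.

```python
from itertools import accumulate


def build_cu_seqlens(lengths):
    """Hypothetical helper: cumulative offsets [0, l0, l0+l1, ...] for a packed buffer."""
    if any(n < 0 for n in lengths):
        raise ValueError("sequence lengths must be non-negative")
    return [0, *accumulate(lengths)]


def unpack(flat, cu_seqlens):
    """Hypothetical helper: split a flat list of token rows back into sequences."""
    return [flat[start:end] for start, end in zip(cu_seqlens, cu_seqlens[1:])]


lengths = [3, 1, 2]
cu = build_cu_seqlens(lengths)  # [0, 3, 4, 6]
tokens = ["a0", "a1", "a2", "b0", "c0", "c1"]
print(unpack(tokens, cu))  # [['a0', 'a1', 'a2'], ['b0'], ['c0', 'c1']]
```

In a training loop the offsets would be an integer tensor on the GPU, but the indexing rule is the same.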

late_interaction_kernels-0.0.1/.github/ISSUE_TEMPLATE/feature_request.yml ADDED

@@ -0,0 +1,25 @@
+name: Feature request
+description: Propose a new kernel variant, API change, or integration.
+labels: ["enhancement"]
+body:
+  - type: textarea
+    id: motivation
+    attributes:
+      label: Motivation
+      description: What problem does this solve? Who's hitting it?
+    validations:
+      required: true
+  - type: textarea
+    id: proposal
+    attributes:
+      label: Proposal
+      description: Sketch API / behavior. Pseudo-code is great.
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: Why not solve this outside the library?

late_interaction_kernels-0.0.1/.github/pull_request_template.md ADDED

@@ -0,0 +1,16 @@
+## Description
+<!-- One or two sentences. Link any related issue. -->
+## Approach
+<!-- Brief notes on the implementation. -->
+## Test plan
+<!-- How did you verify correctness? Shapes, GPU, numbers. -->
+- [ ] `ruff check . && ruff format --check .` and `pytest -q` pass
+- [ ] Parity vs `reference.maxsim_reference` holds for new numerical paths
+- [ ] Benchmarks included for performance-motivated changes
+- [ ] `CHANGELOG.md` and README updated for public API changes
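A parity check for the test plan has to tolerate the rounding of low-precision inputs. This is a minimal sketch under several assumptions:

- It uses the standard ColBERT MaxSim definition: for each query token, take the maximum dot product over unmasked document tokens, then sum over query tokens.
- A mask value of True marks a real token.
- The tolerances are illustrative, not the project's thresholds.

`maxsim_ref` and `check_parity` are hypothetical stand-ins for `reference.maxsim_reference` and the real tests.

```python
import math


def maxsim_ref(q, d, d_mask=None):
    """Hypothetical reference: score one query (Lq rows) against one document (Ld rows).

    Assumption for this sketch: d_mask[j] is True for real tokens, False for padding.
    """
    keep = [j for j in range(len(d)) if d_mask is None or d_mask[j]]
    if not keep:
        raise ValueError("document has no unmasked tokens")
    total = 0.0
    for qi in q:
        # Best unmasked document token for this query token.
        total += max(sum(a * b for a, b in zip(qi, d[j])) for j in keep)
    return total


# Illustrative (abs_tol, rel_tol) pairs, not the project's thresholds.
TOLERANCES = {"fp32": (1e-5, 1e-5), "fp16": (1e-2, 1e-3), "bf16": (5e-2, 1e-2)}


def check_parity(candidate, reference, dtype="fp32"):
    """Raise AssertionError if any score differs from the reference beyond tolerance."""
    if len(candidate) != len(reference):
        raise AssertionError(f"length mismatch: {len(candidate)} vs {len(reference)}")
    abs_tol, rel_tol = TOLERANCES[dtype]
    for i, (c, r) in enumerate(zip(candidate, reference)):
        if not math.isclose(c, r, rel_tol=rel_tol, abs_tol=abs_tol):
            raise AssertionError(f"score {i}: got {c}, expected {r}")


q = [[1.0, 0.0], [0.0, 1.0]]
d = [[0.6, 0.8], [1.0, 0.0], [9.0, 9.0]]  # last row is padding
ref = maxsim_ref(q, d, d_mask=[True, True, False])  # 1.0 + 0.8
check_parity([ref + 1e-4], [ref], dtype="fp16")  # within the fp16 tolerance
```

Without the mask, the padding row would win every max and inflate the score. A `d_mask=True` reproducer in the bug template exercises exactly this case.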

late_interaction_kernels-0.0.1/.github/workflows/ci.yml ADDED

@@ -0,0 +1,79 @@
+name: CI
+permissions:
+  contents: read
+on:
+  pull_request:
+    paths:
+      - "late_interaction_kernels/**"
+      - "tests/**"
+      - "pyproject.toml"
+      - "uv.lock"
+      - ".github/workflows/ci.yml"
+  push:
+    branches: [main]
+  workflow_dispatch:
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || '' }}
+  cancel-in-progress: true
+jobs:
+  lint:
+    name: Lint (ruff)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+          prune-cache: false
+      - name: Install dependencies
+        run: uv sync --frozen --extra dev
+      - name: Run ruff linter
+        run: uv run ruff check
+      - name: Run ruff formatter check
+        run: uv run ruff format --check
+  typecheck:
+    name: Type Check (ty)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+          prune-cache: false
+      - name: Install dependencies
+        run: uv sync --frozen --extra dev
+      - name: Run ty type checker
+        run: uv run ty check --output-format github
+  cpu-smoke:
+    name: CPU smoke (py${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+          prune-cache: false
+      - name: Install dependencies
+        run: uv sync --frozen --extra dev --python ${{ matrix.python-version }}
+      - name: Import check
+        run: |
+          uv run --python ${{ matrix.python-version }} python -c "import late_interaction_kernels as lik; print(lik.__version__)"
+          uv run --python ${{ matrix.python-version }} python -c "from late_interaction_kernels.reference import maxsim_reference, maxsim_reference_soft, maxsim_reference_varlen; print('reference ok')"
+      - name: Run CPU-safe tests (CUDA tests auto-skip)
+        run: uv run --python ${{ matrix.python-version }} pytest -q
+  # GPU tests live on SkyPilot — run manually or via a scheduled dispatch.
+  # See scripts/sky_test.yaml.
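The `concurrency.group` expression in `ci.yml` decides which runs cancel each other. GitHub expressions have no ternary operator, so `cond && a || b` is the usual idiom. Here it evaluates to `github.sha` on `main` and to `''` on every other ref. The rough Python model below shows the resulting key; `concurrency_group` is purely illustrative.

```python
def concurrency_group(workflow, ref, sha):
    """Illustrative model of the CI key '<workflow>-<ref>-<sha on main, else empty>'."""
    # Mirrors `github.ref == 'refs/heads/main' && github.sha || ''`.
    # The idiom is safe here because a commit SHA is never an empty (falsy) string.
    suffix = sha if ref == "refs/heads/main" else ""
    return f"{workflow}-{ref}-{suffix}"


# Two pushes to the same PR share a key, so the newer run cancels the older one.
assert concurrency_group("CI", "refs/pull/7/merge", "aaa") == concurrency_group(
    "CI", "refs/pull/7/merge", "bbb"
)
# Every push to main gets its own key, so main runs never cancel each other.
assert concurrency_group("CI", "refs/heads/main", "aaa") != concurrency_group(
    "CI", "refs/heads/main", "bbb"
)
```

Combined with `cancel-in-progress: true`, superseded PR runs are dropped while every commit on `main` still gets a complete CI result.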

late_interaction_kernels-0.0.1/.github/workflows/publish.yml ADDED

@@ -0,0 +1,54 @@
+name: Publish Python Package
+on:
+  release:
+    types: [published]
+permissions:
+  contents: read
+jobs:
+  release-build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.x"
+      - name: Build release distributions
+        run: |
+          python -m pip install build
+          python -m build
+      - name: Upload distributions
+        uses: actions/upload-artifact@v6
+        with:
+          name: release-dists
+          path: dist/
+  pypi-publish:
+    runs-on: ubuntu-latest
+    needs:
+      - release-build
+    permissions:
+      # IMPORTANT: this permission is mandatory for trusted publishing
+      id-token: write
+    environment:
+      name: pypi
+      url: https://pypi.org/project/late-interaction-kernels/
+    steps:
+      - name: Retrieve release distributions
+        uses: actions/download-artifact@v7
+        with:
+          name: release-dists
+          path: dist/
+      - name: Publish release distributions to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
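`publish.yml` runs on any published release, so nothing in the workflow checks that three values agree: the release tag (`vX.Y.Z`, per CONTRIBUTING.md), the package version, and the CHANGELOG heading. A small pre-release check could catch a mismatch before the upload. This is a hypothetical helper. It assumes headings of the form `## [X.Y.Z] - YYYY-MM-DD`, as in the current CHANGELOG.md.

```python
import re

# Assumed heading format, matching "## [0.0.1] - 2026-05-02".
HEADING = re.compile(r"^## \[(\d+\.\d+\.\d+)\] - \d{4}-\d{2}-\d{2}$", re.MULTILINE)


def check_release(tag, package_version, changelog_text):
    """Hypothetical check: return a list of problems (empty means consistent)."""
    problems = []
    if not re.fullmatch(r"v\d+\.\d+\.\d+", tag):
        problems.append(f"tag {tag!r} is not of the form vX.Y.Z")
    elif tag[1:] != package_version:
        problems.append(f"tag {tag} does not match package version {package_version}")
    if package_version not in HEADING.findall(changelog_text):
        problems.append(f"CHANGELOG.md has no '## [{package_version}]' heading")
    return problems


changelog = "# Changelog\n## [0.0.1] - 2026-05-02\n### Added\n"
print(check_release("v0.0.1", "0.0.1", changelog))  # []
```

In practice `package_version` would come from `late_interaction_kernels.__version__`, the same attribute the bug template prints.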

late_interaction_kernels-0.0.1/.gitignore ADDED

@@ -0,0 +1,223 @@
+# Custom
+.DS_Store
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[codz]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#   Usually these files are written by a python script from a template
+#   before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py.cover
+*.lcov
+.hypothesis/
+.pytest_cache/
+cover/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+.pybuilder/
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+# Pipfile.lock
+# UV
+#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+# uv.lock
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+# poetry.lock
+# poetry.toml
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#   pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
+#   https://pdm-project.org/en/latest/usage/project/#working-with-version-control
+# pdm.lock
+# pdm.toml
+.pdm-python
+.pdm-build/
+# pixi
+#   Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
+# pixi.lock
+#   Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
+#   in the .venv directory. It is recommended not to include this directory in version control.
+.pixi/*
+!.pixi/config.toml
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+# Celery stuff
+celerybeat-schedule*
+celerybeat.pid
+# Redis
+*.rdb
+*.aof
+*.pid
+# RabbitMQ
+mnesia/
+rabbitmq/
+rabbitmq-data/
+# ActiveMQ
+activemq-data/
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+# pytype static type analyzer
+.pytype/
+# Cython debug symbols
+cython_debug/
+# PyCharm
+#   JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#   be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#   and can be added to the global gitignore or merged into this file.  For a more nuclear
+#   option (not recommended) you can uncomment the following to ignore the entire idea folder.
+# .idea/
+# Abstra
+#   Abstra is an AI-powered process automation framework.
+#   Ignore directories containing user credentials, local state, and settings.
+#   Learn more at https://abstra.io/docs
+.abstra/
+# Visual Studio Code
+#   Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
+#   that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
+#   and can be added to the global gitignore or merged into this file. However, if you prefer,
+#   you could uncomment the following to ignore the entire vscode folder
+# .vscode/
+# Temporary file for partial code execution
+tempCodeRunnerFile.py
+# Ruff stuff:
+.ruff_cache/
+# PyPI configuration file
+.pypirc
+# Marimo
+marimo/_static/
+marimo/_lsp/
+__marimo__/
+# Streamlit
+.streamlit/secrets.toml

late_interaction_kernels-0.0.1/CHANGELOG.md ADDED

@@ -0,0 +1,51 @@
+# Changelog
+All notable changes to this project will be documented here.
+Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the
+project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.0.1] - 2026-05-02
+Fused Triton kernels for late-interaction (MaxSim) scoring, with a high-level
+PyTorch API and PyLate drop-in.
+### Added
+- **Core MaxSim kernels** — `maxsim` (autograd-aware) and `maxsim_inference`
+  with fused L2-normalize, mask handling, and a `unified` / `csr` / `atomic`
+  backward selector (`set_backward_method`, default `auto`).
+- **Ragged / packed batches** — `maxsim_varlen` over `cu_seqlens`-indexed
+  flat buffers, autograd-aware on both `Q` and `D`.
+- **Pair-list scoring** — `maxsim_inference_scatter` scores arbitrary
+  `(query_index, doc_index)` pairs from packed batches and returns
+  `[num_pairs]` directly (vLLM-style reranker scheduling).
+- **Fused D-side head** — `maxsim_from_hidden` (inference) and
+  `maxsim_from_hidden_train` (closed-form backward) apply
+  projection + L2-normalize + MaxSim in a single pass over raw
+  `[Nd, Ld, d_model]` hidden states.
+- **PLAID / ColBERTv2** — `plaid_approx_score` (approximate scoring) and
+  `maxsim_residual` / `maxsim_residual_varlen` (exact rerank with on-the-fly
+  2/4/8-bit residual decompression + L2-normalize + MaxSim, forward-only on
+  varlen).
+- **FP8 inference** — `maxsim_inference_fp8` with per-tensor / per-token
+  e4m3 inputs, fp32 accumulator, and a score-tie fallback harness.
+- **High-level API** — `MaxSimScorer(nn.Module)` and `retrieve(Q, D, top_k)`,
+  both with transparent pure-PyTorch CPU fallback so training and retrieval
+  code is unit-testable on macOS / Windows / CPU-only CI.
+- **PyLate drop-in** — `patch_pylate` / `unpatch_pylate` patch
+  `colbert_scores` and `colbert_kd_scores` across `Contrastive`,
+  `CachedContrastive`, and `Distillation`. `LIK_DISABLE=1` is the
+  process-wide kill switch.
+- **Experimental kernels** — `late_interaction_kernels.experimental` ships
+  `soft_maxsim`, `smooth_maxsim`, `maxsim_xtr`, and `maxsim_matryoshka`.
+- **FP8 helpers** — `late_interaction_kernels.fp8` exposes per-tensor /
+  per-token quantize / dequantize utilities.
+- Per-GPU autotune (Hopper / Ampere / Ada / generic) with shared-memory
+  pruning; warp specialization on Triton ≥ 3.2 with transparent fallback.
+- Pure-PyTorch reference (`late_interaction_kernels.reference`) used as
+  ground truth in tests and as the CPU fallback path.
+- Test suite covering forward / backward parity, varlen, soft/smooth,
+  edge cases, PyLate compatibility, CPU fallback, and `gradcheck` on the
+  high-level API.
+- Benchmarks for every kernel, plus end-to-end PyLate / LateOn training
+  and retrieval scripts under `benchmarks/` and `scripts/`.
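The pair-list entry describes scoring arbitrary `(query_index, doc_index)` pairs out of packed batches and returning one score per pair. A small pure-Python sketch pins down those semantics. It assumes both sides are packed with `cu_seqlens`-style offsets and uses the standard sum-of-max MaxSim score. The function name and argument order are illustrative, not the signature of `maxsim_inference_scatter`.

```python
def score_pairs(q_flat, q_cu, d_flat, d_cu, pairs):
    """Hypothetical helper: one MaxSim score per (query_index, doc_index) pair.

    q_flat / d_flat are packed token rows; q_cu / d_cu are cumulative offsets,
    so query i occupies q_flat[q_cu[i]:q_cu[i + 1]] (same for documents).
    """
    scores = []
    for qi, di in pairs:
        q_tokens = q_flat[q_cu[qi] : q_cu[qi + 1]]
        d_tokens = d_flat[d_cu[di] : d_cu[di + 1]]
        score = 0.0
        for q in q_tokens:
            # Best-matching document token for this query token.
            score += max(sum(a * b for a, b in zip(q, d)) for d in d_tokens)
        scores.append(score)
    return scores  # len(pairs) entries, i.e. shape [num_pairs]


# Two queries (1 and 2 tokens) and two documents (2 and 1 tokens), dim = 2.
q_flat = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
q_cu = [0, 1, 3]
d_flat = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
d_cu = [0, 2, 3]
print(score_pairs(q_flat, q_cu, d_flat, d_cu, [(0, 0), (1, 1), (1, 0)]))  # [1.0, 2.0, 1.5]
```

Scoring only the listed pairs avoids computing a full `Nq x Nd` score matrix when a reranker needs just a sparse set of candidates. That is presumably the point of the vLLM-style scheduling mentioned above.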

late_interaction_kernels-0.0.1/CONTRIBUTING.md ADDED

@@ -0,0 +1,62 @@
+# Contributing
+## Reporting issues
+Use the **Bug report** or **Feature request** templates under
+[Issues → New issue](https://github.com/hcompai/late-interaction-kernels/issues/new/choose).
+## Autotune for a new GPU
+If performance is poor on a GPU that has no autotune shortlist yet:
+1. Run the benchmark for the shape you care about
+   (`benchmarks/bench_forward.py`, `benchmarks/bench_inference_edge.py`,
+   `benchmarks/bench_backward_method.py`).
+2. Add a shortlist in `late_interaction_kernels/_autotune.py` keyed on
+   the device-name prefix.
+3. Re-run the benchmark and include before / after in the PR.
+## New kernel variant
+For a new reduction flavor (e.g. top-K, soft variants), keep it in a
+separate module under `late_interaction_kernels/` and follow the
+existing split:
+- internal `_forward` returning `(scores, argmax)` without autograd;
+- `torch.autograd.Function` wrapper that saves minimal state;
+- pure-PyTorch reference in `late_interaction_kernels/reference.py`;
+- parity tests in `tests/`.
+Research kernels with no production user yet land under
+`late_interaction_kernels/experimental/`.
+## Development setup
+```bash
+git clone https://github.com/hcompai/late-interaction-kernels
+cd late-interaction-kernels
+pip install -e ".[dev,pylate]"
+ruff check . && ruff format --check .
+pytest -q
+```
+## Style
+- Python 3.9+; type hints on public APIs.
+- Comments explain *why*, not *what*. Don't narrate trivial code.
+- Match the existing docstring tone — short, concrete, no marketing.
+## Publishing a release
+1. Ensure `main` is green and `CHANGELOG.md` has the `Unreleased` block filled in.
+2. On GitHub: **Releases → Draft a new release**, tag `vX.Y.Z` off `main`.
+3. Paste the matching `CHANGELOG.md` section as the release body, then **Publish**.
+The [`publish.yml`](.github/workflows/publish.yml) workflow builds and uploads
+to PyPI automatically via OIDC trusted publishing. No token needed.
+## License
+By contributing you agree your work is licensed under Apache 2.0
+(see [`LICENSE`](LICENSE)).
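CONTRIBUTING.md asks for a shortlist in `_autotune.py` keyed on the device-name prefix. The sketch below shows one way such a lookup could work, with a generic fallback. It assumes device names as reported by `torch.cuda.get_device_name()`, such as `NVIDIA H100 80GB HBM3`. The prefixes and config fields (`BLOCK_Q`, `BLOCK_D`, `num_warps`, `num_stages`) are hypothetical placeholders, not the contents of `_autotune.py`.

```python
# Hypothetical placeholder configs; real shortlists come from benchmarking.
GENERIC = [{"BLOCK_Q": 32, "BLOCK_D": 64, "num_warps": 4, "num_stages": 2}]

SHORTLISTS = {
    "NVIDIA H100": [{"BLOCK_Q": 64, "BLOCK_D": 128, "num_warps": 8, "num_stages": 3}],
    "NVIDIA A100": [{"BLOCK_Q": 64, "BLOCK_D": 64, "num_warps": 4, "num_stages": 3}],
    "NVIDIA L4": [{"BLOCK_Q": 32, "BLOCK_D": 64, "num_warps": 4, "num_stages": 2}],
}


def shortlist_for(device_name):
    """Pick the shortlist whose key is the longest prefix of device_name."""
    matches = [key for key in SHORTLISTS if device_name.startswith(key)]
    if not matches:
        return GENERIC
    return SHORTLISTS[max(matches, key=len)]


print(shortlist_for("NVIDIA H100 80GB HBM3")[0]["num_warps"])  # 8
print(shortlist_for("NVIDIA GeForce RTX 3090") is GENERIC)  # True
```

Plain prefix matching has a pitfall: `NVIDIA L4` also matches `NVIDIA L40S`. Longest-prefix selection only helps once the longer key exists, so a new entry is worth checking against neighbouring device names.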
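The kernel-variant split in CONTRIBUTING.md has `_forward` return `(scores, argmax)`, and the autograd wrapper saves minimal state. This works because MaxSim's backward only needs to know which document token won each max. Below is a pure-Python sketch for a single query-document pair, assuming the standard sum-of-max score. In the library the backward half would live in a `torch.autograd.Function`. `maxsim_forward` and `maxsim_backward` are hypothetical names.

```python
def maxsim_forward(q, d):
    """Hypothetical forward: (score, argmax), argmax[i] = doc token chosen for query token i."""
    score, argmax = 0.0, []
    for qi in q:
        dots = [sum(a * b for a, b in zip(qi, dj)) for dj in d]
        j = max(range(len(d)), key=dots.__getitem__)  # first index wins ties
        argmax.append(j)
        score += dots[j]
    return score, argmax


def maxsim_backward(q, d, argmax, grad_score=1.0):
    """Hypothetical backward: gradients w.r.t. q and d from the saved argmax only."""
    grad_q = [[0.0] * len(row) for row in q]
    grad_d = [[0.0] * len(row) for row in d]
    for i, j in enumerate(argmax):
        for k in range(len(q[i])):
            # The score contains the term q[i] . d[j]; differentiate it on each side.
            grad_q[i][k] = grad_score * d[j][k]
            grad_d[j][k] += grad_score * q[i][k]
    return grad_q, grad_d


q = [[1.0, 2.0], [0.5, -1.0]]
d = [[0.3, 0.1], [1.0, 1.0], [-2.0, 0.5]]
score, argmax = maxsim_forward(q, d)
grad_q, grad_d = maxsim_backward(q, d, argmax)
print(argmax)  # [1, 0]
print(grad_d[2])  # [0.0, 0.0]: a token that never wins gets no gradient
```

Saving `argmax` (one integer per query token) is far cheaper than saving the full `Lq x Ld` similarity matrix. This is the kind of minimal state the wrapper is meant to keep.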