PyPI - interpreto - Versions diffs - 0.4.20__tar.gz → 0.5.0.dev1__tar.gz - Mend

interpreto 0.4.20__tar.gz → 0.5.0.dev1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (138)

{interpreto-0.4.20 → interpreto-0.5.0.dev1}/.pre-commit-config.yaml RENAMED

@@ -24,7 +24,7 @@ repos:
         exclude: LICENSE
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.14.14
+    rev: v0.15.10
     hooks:
       - id: ruff-format
       - id: ruff

interpreto-0.5.0.dev1/AGENTS.md ADDED

@@ -0,0 +1,214 @@
+# AGENTS.md
+## Goal
+`interpreto` is a modular interpretability toolkit for transformer models. The repository aims to provide:
+- an easy-to-use public API for attribution and concept-based explanations,
+- detailed documentation with concrete examples,
+- precise internal representations for tensors, targets, and activations,
+- reusable building blocks that can be combined without rewriting the whole pipeline.
+The main product surface is:
+- attribution methods for classification and generation,
+- concept discovery and concept interpretation workflows,
+- evaluation metrics,
+- HTML visualizations,
+- docs and notebooks showing real usage.
+## Repository Map
+- `interpreto/__init__.py`
+  - Curated public API. If a feature is meant to be user-facing, it usually belongs here too.
+- `interpreto/model_wrapping/`
+  - Bridges raw Hugging Face models to Interpreto internals.
+  - `inference_wrapper.py`: shared batching, device handling, logits/gradient access, padding helpers.
+  - `classification_inference_wrapper.py`: targeted scoring for classification tasks.
+  - `generation_inference_wrapper.py`: targeted scoring for generation tasks.
+  - `model_with_split_points.py`: `nnsight`-based model splitting and activation extraction for concept methods.
+  - `llm_interface.py`: abstraction layer for LLM-based concept labeling.
+- `interpreto/attributions/`
+  - Attribution framework.
+  - `base.py`: shared explainers, normalization, output dataclasses, classification/generation glue.
+  - `methods/`: LIME, KernelShap, Occlusion, Sobol, Saliency, Integrated Gradients, SmoothGrad, etc.
+  - `perturbations/`: perturbation generators used by attribution methods.
+  - `aggregations/`: score aggregation logic.
+  - `metrics/`: insertion/deletion evaluation.
+- `interpreto/concepts/`
+  - Concept-based interpretability framework.
+  - `base.py`: base concept explainer interfaces.
+  - `methods/`: neurons-as-concepts, overcomplete/SAE methods, sklearn-based methods, Cockatiel.
+  - `interpretations/`: `TopKInputs`, `LLMLabels`, and related interpretation utilities.
+  - `metrics/`: reconstruction, sparsity, stability, and ConSim.
+- `interpreto/commons/`
+  - Shared utilities such as granularity handling, generator helpers, and distances.
+- `interpreto/typing.py`
+  - Central typing aliases and protocols. This file expresses the intended normalized internal shapes and interfaces.
+- `interpreto/visualizations/`
+  - HTML/CSS/JS renderers for attribution and concept outputs.
+  - Visualizations should consume normalized outputs, not recompute model logic.
+- `interpreto/_vendor/overcomplete/`
+  - Vendored dependency for concept learning backends. Avoid touching it unless the change really belongs there.
+- `tests/`
+  - Pytest suite. Reuse fixtures from `tests/conftest.py` whenever possible.
+- `docs/`
+  - MkDocs source, API pages, and notebooks.
+- `site/`
+  - Generated documentation output. Prefer editing `docs/`, not `site/`.
+## Key Dependencies
+- `torch`
+  - Core tensor and model execution backend.
+- `transformers`
+  - Main model/tokenizer interface and public compatibility target.
+- `nnsight`
+  - Used by `ModelWithSplitPoints` for split points and activation capture.
+- `jaxtyping` and `beartype`
+  - Preferred tools for explicit tensor typing and shape contracts.
+- `scikit-learn`, `scipy`, `einops`, `matplotlib`, `nltk`
+  - Supporting libraries for methods, metrics, preprocessing, and visualization.
+- `bitsandbytes`
+  - Compatibility with quantized transformer loading.
+- `mkdocs` stack
+  - Documentation build system.
+## How The Pieces Interact
+### Attribution pipeline
+User inputs can arrive in several formats: strings, tokenized mappings, tensors, or iterables of those. The code should normalize them early, then keep core computations on one internal format.
+Typical flow:
+1. User input and targets enter an attribution explainer from `interpreto.attributions`.
+2. The explainer normalizes inputs/targets in `attributions/base.py`.
+3. A perturbator or gradient path generates the computation stream.
+4. A task-specific inference wrapper computes targeted logits or gradients.
+5. An aggregator converts raw scores into final attribution values.
+6. The result is packaged as `AttributionOutput`.
+7. Metrics and visualizations consume `AttributionOutput`.
+Important style point: attribution code is intentionally generator-friendly. Many paths are designed to work sample by sample or batch by batch instead of materializing everything eagerly. Preserve that when making changes, especially for generation and prompt construction logic.
+### Concept pipeline
+Typical flow:
+1. `ModelWithSplitPoints` wraps a transformer model and exposes split points.
+2. `get_activations()` extracts latent activations at a chosen granularity.
+3. A concept explainer from `interpreto.concepts.methods` fits or applies a concept model on those activations.
+4. Interpretation methods such as `TopKInputs` or `LLMLabels` map concept dimensions to human-readable descriptions.
+5. Metrics and visualizations operate on the resulting concept-space artifacts.
+`ModelWithSplitPoints` is the bridge between the transformer world and concept methods. Most concept changes should respect that layering instead of bypassing it.
+### Granularity and normalization
+Granularity is a core abstraction shared across attribution and concept code. The code often accepts flexible user inputs, but should converge quickly toward:
+- normalized `TensorMapping`-style model inputs,
+- normalized target tensors,
+- normalized activation tensors,
+- normalized output dataclasses.
+This repository prefers a flexible public API and a stricter internal core.
+## Repository Vibe
+- Keep the public API easy to use.
+  - Users may provide several input formats.
+  - Internal computations should still be normalized into a single clear format as early as possible.
+- Prefer precise typing.
+  - `jaxtyping` is valuable here because tensor shapes matter a lot for readability and debugging.
+  - Be pragmatic at boundaries with `transformers` and `nnsight`; do not make the code worse just to force shape annotations through awkward external APIs.
+- Documentation matters.
+  - Detailed docstrings, examples, file-level comments, and inline comments are a feature of the repository, not noise.
+  - When adding or changing logic, explain the shape conventions and the intent, especially around generators, token alignment, split points, and concept encoding.
+- The repository is modular.
+  - Prefer plug-and-play building blocks over special-purpose monoliths.
+  - Reuse wrappers, perturbators, aggregators, metrics, and visualization outputs rather than duplicating logic.
+- Prefer one place for validation.
+  - Do not add repeated guardrails in every layer if the check already belongs at the public boundary or is already enforced by typing/contracts.
+  - Re-check only if a lower-level function can be called independently or if the invariant genuinely changes.
+- Smaller changes are usually better.
+  - Do not refactor by default.
+  - If a minimal patch would conflict with the method/class/repository design, then do the slightly larger coherent refactor instead of adding a local hack.
+- Keep implementations efficient but simple.
+  - Prefer straightforward Torch code.
+  - If a much faster version would add a lot of complexity, it is often better to land the clean version first and leave a focused `TODO`.
+- In attribution code, preserve the generator-based pipeline mindset.
+  - The repository often processes attribution sample by sample, while trying to construct good prompts and avoid unnecessary materialization.
+## Coding Expectations
+- Write docstrings and the important inline comments at the same time as the code change, or before.
+- Prefer file-level comments when the whole module has a specific role or subtle invariant.
+- Keep internal data formats explicit.
+- If adding a new public class or function, check whether it should be re-exported in a package `__init__.py` and documented in `docs/`.
+- Use the existing module boundaries.
+  - New attribution methods usually belong in `interpreto/attributions/methods/`.
+  - New perturbation logic belongs in `interpreto/attributions/perturbations/`.
+  - New concept methods belong in `interpreto/concepts/methods/`.
+  - New interpretation strategies should use the existing concept explainer interfaces.
+## Tests
+Testing style in this repository is usually a mix of:
+- method-level tests for specific algorithmic behavior,
+- class-level tests for API and integration behavior,
+- sanity checks for end-to-end invariants.
+Guidelines:
+- For a new feature, test-driven development is preferred when practical.
+- Keep tests reviewable. Do not add large numbers of nearly identical tests.
+- Be very clear in test comments/docstrings about what the test is proving.
+- Reuse `tests/conftest.py`, `tests/fixtures/`, and existing helpers before inventing new scaffolding.
+- Prefer `hf-internal-testing/*` tiny models over large custom placeholders or long fake model definitions.
+- Do not test the same invariant in many places unless it protects distinct call paths.
+## Change Workflow For Agents
+1. Think first.
+   - Understand which layer should change.
+   - Prefer the smallest coherent modification.
+   - If the design tradeoff is uncertain, it is better to ask for an opinion than to guess.
+2. Add or update tests.
+   - For new features or bug fixes, start from the behavior you want to lock in.
+   - Reuse fixtures and tiny test models whenever possible.
+3. Implement the change.
+   - Keep the code aligned with existing abstractions.
+   - Avoid clever one-off tricks that only satisfy the immediate patch.
+4. Update documentation if needed.
+   - Public API changes usually need docstring and docs updates.
+   - Example-driven documentation is part of the repository style.
+5. Verify with targeted commands first.
+Useful commands:
+- `make install-dev`
+- `make lint`
+- `make fast-test`
+- `make test-cpu`
+- `python -m pytest -n auto -c pyproject.toml -v path/to/test_file.py`
+## Practical Do / Don't
+Do:
+- Normalize flexible user inputs into one internal format early.
+- Use `jaxtyping` where it improves shape clarity.
+- Preserve generator-based or streaming-friendly flows.
+- Add comments where tensor shapes, batching, or prompt construction are non-obvious.
+- Favor small coherent patches.
+Don't:
+- Add redundant guardrails in every layer.
+- Materialize huge intermediate lists if the existing pipeline is intentionally iterable/generator-based.
+- Refactor broadly without a concrete design reason.
+- Fight external library APIs just to satisfy an idealized typing style.
+- Edit generated docs in `site/` when the real source lives in `docs/`.
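
The added AGENTS.md asks for two things in attribution code: normalize flexible user inputs early, and keep the pipeline generator-friendly so nothing is materialized before it is needed. A minimal sketch of that shape in plain Python follows. It does not use the interpreto API: every name in it (`normalize_inputs`, `occlusions`, `toy_score`, `explain`) is hypothetical, the "model" is a word counter, and the perturbation is a simple occlusion stand-in.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def normalize_inputs(inputs: str | Iterable[str]) -> Iterator[list[str]]:
    """Accept one string or an iterable of strings; yield one token list per sample."""
    if isinstance(inputs, str):
        inputs = [inputs]
    for text in inputs:
        # Whitespace split stands in for a real tokenizer (assumption).
        yield text.split()


def occlusions(tokens: list[str]) -> Iterator[list[str]]:
    """Lazily yield one perturbed copy per position, with that token masked."""
    for i in range(len(tokens)):
        yield tokens[:i] + ["[MASK]"] + tokens[i + 1:]


def toy_score(tokens: list[str]) -> float:
    """Stand-in for a targeted model call: counts occurrences of 'good'."""
    return float(tokens.count("good"))


def explain(inputs: str | Iterable[str]) -> Iterator[tuple[list[str], list[float]]]:
    """Yield (tokens, attributions) one sample at a time."""
    for tokens in normalize_inputs(inputs):
        base = toy_score(tokens)
        # Attribution of a token = score drop when that token is masked.
        yield tokens, [base - toy_score(p) for p in occlusions(tokens)]


results = list(explain(["a good movie", "bad"]))
single = list(explain("good"))
```

Only the public entry point (`explain`) deals with the two input formats; everything downstream sees token lists. Because each stage is a generator, a caller can stop after the first sample without paying for the rest.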

{interpreto-0.4.20 → interpreto-0.5.0.dev1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: interpreto
-Version: 0.4.20
+Version: 0.5.0.dev1
 Summary: Interpretability toolbox for LLMs
 Author: FOR Team
 Author-email: fanny.jourdan@irt-saintexupery.com
@@ -57,7 +57,7 @@ License-File: LICENSE
 Requires-Dist: transformers>=4.22.0
 Requires-Dist: nltk
 Requires-Dist: torch>=2.0
-Requires-Dist: nnsight<0.6.0,>=0.5.1
+Requires-Dist: nnsight<0.8.0,>=0.7.0
 Requires-Dist: jaxtyping<=0.2.36
 Requires-Dist: beartype
 Requires-Dist: mknotebooks
@@ -120,6 +120,7 @@ Dynamic: license-file
 <p align="center">
   <a href="https://for-sight-ai.github.io/interpreto/"><strong>📚 Explore Interpreto docs &gt;&gt;</strong></a><br />
   <a href="https://for-sight-ai.github.io/interpreto-demo/"><strong>🖼️ Checkout our explanation gallery &gt;&gt;</strong></a>
+  <a href="https://arxiv.org/abs/2512.09730"><strong>📜 Read our paper &gt;&gt;</strong></a>
 </p>
 ## 🚀 Quick Start
@@ -163,41 +164,51 @@ They all work seamlessly for both classification (`...ForSequenceClassification`
 Concept-based explanations aim to provide high-level interpretations of latent model representations.
-Interpreto generalizes these methods through four core steps:
+We propose both supervised (probes and CAVs) and unsupervised (dictionary learning) approaches.
+Interpreto generalizes these methods through four core steps, the two first are common between both approaches:
 1. Split a model in two and obtain a dataset of activations
-2. Concept Discovery (e.g., from latent embeddings)
-3. Concept Interpretation (mapping discovered concepts to human-understandable elements)
-4. Concept-to-Output Attribution (assessing concept relevance to model outputs)
+2. Learn concepts (e.g., from latent embeddings)
+3. Interpret concepts (mapping discovered concepts to human-understandable elements)
+4. Estimate concepts importance (assessing concept relevance to model outputs)
 **1. Split a model in two and obtain a dataset of activations:** (mainly via [`nnsight`](https://github.com/ndif-team/nnsight)):
 Choose any layer in any HuggingFace language model with our `ModelWithSplitPoints` based on `nnsight`. Then pass a dataset through it to obtain a dataset of activations.
-**2. Dictionary Learning for Concept Discovery** (mainly via [`overcomplete`](https://github.com/KempnerInstitute/overcomplete)):
+**2. (supervised) Train probe** with the [`ProbeExplainer`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/)
+We differentiate two families of probes:
+- Linear probes: [`LinearRegressionProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.LinearRegressionProbe), [`LogisticRegressionProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.LogisticRegressionProbe), [`LinearSVMProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.LinearSVMProbe), [`MeansDiffProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.MeansDiffProbe)
+- Centroid-based probes: [`CosineCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.CosineCentroidProbe), [`DotProductCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.DotProductCentroidProbe), [`SqL2CentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.SqL2CentroidProbe), [`SVDDCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.SVDDCentroidProbe), [`DiagonalMahalanobisCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.DiagonalMahalanobisCentroidProbe)
+Both can be tuned with `bias_calibrator` and `normalization` parameters.
+**2. (unsupervised) Dictionary Learning for Concept Discovery** (mainly via [`overcomplete`](https://github.com/KempnerInstitute/overcomplete)):
 - Interpret neurons directly via [`NeuronsAsConcepts`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/neurons_as_concepts/)
 - [`NMF`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.NMFConcepts), [`Semi-NMF`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.SemiNMFConcepts), [`ConvexNMF`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.ConvexNMFConcepts)
 - [`ICA`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.ICAConcepts), [`SVD`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.SVDConcepts), [`PCA`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.PCAConcepts), [`KMeans`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.KMeansConcepts)
 - SAE variants: [`Vanilla SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.VanillaSAEConcepts), [`TopK SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.TopKSAEConcepts), [`JumpReLU SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.JumpReLUSAEConcepts), [`BatchTopK SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.BatchTopKSAEConcepts)
-**3. Available Concept Interpretation Techniques:**
+**3. (unsupervised) Available Concept Interpretation Techniques:**
 - Top-k tokens from tokenizer vocabulary via [`TopKInputs`](https://for-sight-ai.github.io/interpreto/api/concepts/concepts_interpretations/#interpreto.concepts.interpretations.TopKInputs) and `use_vocab=True`
 - Top-k tokens/words/sentences/samples from specific datasets via [`TopKInputs`](https://for-sight-ai.github.io/interpreto/api/concepts/concepts_interpretations/#interpreto.concepts.interpretations.TopKInputs)
 - Label concepts via LLMs with [`LLMLabels`](https://for-sight-ai.github.io/interpreto/api/concepts/concepts_interpretations/#interpreto.concepts.interpretations.LLMLabels) ([Bills et al. 2023](https://openai.com/index/language-models-can-explain-neurons-in-language-models/))
+- Input-to-concept attribution from dataset examples ([Concept Attributions](https://for-sight-ai.github.io/interpreto/api/concepts/interpretations/concept_attributions/)) ([Jourdan et al. 2023](https://aclanthology.org/2023.findings-acl.317/))
 <details><summary>Concept Interpretation Techniques Added in the future:</summary>
-- Input-to-concept attribution from dataset examples ([Jourdan et al. 2023](https://aclanthology.org/2023.findings-acl.317/))
-- Theme prediction via LLMs from top-k tokens/sentences
 - Aligning concepts with human labels ([Sajjad et al. 2022](https://aclanthology.org/2022.naacl-main.225/))
 - Word cloud visualizations of concepts ([Dalvi et al. 2022](https://arxiv.org/abs/2205.07237))
 - VocabProj & TokenChange ([Gur-Arieh et al. 2025](https://arxiv.org/abs/2501.08319))
 </details>
-**4. Concept-to-Output Attribution:**
+**4. (unsupervised) Concept-to-Output Attribution:**
 Estimate the contribution of each concept to the model output.
@@ -207,7 +218,6 @@ Can be obtained with any concept-based explainer via [`MethodConcepts.concept_ou
 Thanks to this generalization encompassing all concept-based methods and our highly flexible architecture, we can easily obtain a large number of concept-based methods:
-- CAV and TCAV: [Kim et al. 2018, Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)](http://proceedings.mlr.press/v80/kim18d.html)
 - ConceptSHAP: [Yeh et al. 2020, On Completeness-aware Concept-Based Explanations in Deep Neural Networks](https://proceedings.neurips.cc/paper/2020/hash/ecb287ff763c169694f682af52c1f309-Abstract.html)
 - COCKATIEL: [Jourdan et al. 2023, COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP](https://aclanthology.org/2023.findings-acl.317/)
 - Yun et al. 2021, [Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors](https://arxiv.org/abs/2103.15949)
@@ -264,7 +274,7 @@ Interpreto 🪄 is a project of the [FOR](https://www.irt-saintexupery.com/fr/fo
 ## 🗞️ Citation
-If you use Interpreto 🪄 as part of your workflow in a scientific publication, please consider citing 🗞️ our paper:
+If you use Interpreto 🪄 as part of your workflow in a scientific publication, please consider citing 🗞️ [our paper](https://arxiv.org/abs/2512.09730):
 ```bibtex
 @article{poche2025interpreto,

{interpreto-0.4.20 → interpreto-0.5.0.dev1}/README.md RENAMED

@@ -15,6 +15,7 @@
 <p align="center">
   <a href="https://for-sight-ai.github.io/interpreto/"><strong>📚 Explore Interpreto docs &gt;&gt;</strong></a><br />
   <a href="https://for-sight-ai.github.io/interpreto-demo/"><strong>🖼️ Checkout our explanation gallery &gt;&gt;</strong></a>
+  <a href="https://arxiv.org/abs/2512.09730"><strong>📜 Read our paper &gt;&gt;</strong></a>
 </p>
 ## 🚀 Quick Start
@@ -58,41 +59,51 @@ They all work seamlessly for both classification (`...ForSequenceClassification`
 Concept-based explanations aim to provide high-level interpretations of latent model representations.
-Interpreto generalizes these methods through four core steps:
+We propose both supervised (probes and CAVs) and unsupervised (dictionary learning) approaches.
+Interpreto generalizes these methods through four core steps, the two first are common between both approaches:
 1. Split a model in two and obtain a dataset of activations
-2. Concept Discovery (e.g., from latent embeddings)
-3. Concept Interpretation (mapping discovered concepts to human-understandable elements)
-4. Concept-to-Output Attribution (assessing concept relevance to model outputs)
+2. Learn concepts (e.g., from latent embeddings)
+3. Interpret concepts (mapping discovered concepts to human-understandable elements)
+4. Estimate concepts importance (assessing concept relevance to model outputs)
 **1. Split a model in two and obtain a dataset of activations:** (mainly via [`nnsight`](https://github.com/ndif-team/nnsight)):
 Choose any layer in any HuggingFace language model with our `ModelWithSplitPoints` based on `nnsight`. Then pass a dataset through it to obtain a dataset of activations.
-**2. Dictionary Learning for Concept Discovery** (mainly via [`overcomplete`](https://github.com/KempnerInstitute/overcomplete)):
+**2. (supervised) Train probe** with the [`ProbeExplainer`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/)
+We differentiate two families of probes:
+- Linear probes: [`LinearRegressionProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.LinearRegressionProbe), [`LogisticRegressionProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.LogisticRegressionProbe), [`LinearSVMProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.LinearSVMProbe), [`MeansDiffProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.MeansDiffProbe)
+- Centroid-based probes: [`CosineCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.CosineCentroidProbe), [`DotProductCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.DotProductCentroidProbe), [`SqL2CentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.SqL2CentroidProbe), [`SVDDCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.SVDDCentroidProbe), [`DiagonalMahalanobisCentroidProbe`](https://for-sight-ai.github.io/interpreto/api/concepts/probes/#interpreto.concepts.probes.DiagonalMahalanobisCentroidProbe)
+Both can be tuned with `bias_calibrator` and `normalization` parameters.
+**2. (unsupervised) Dictionary Learning for Concept Discovery** (mainly via [`overcomplete`](https://github.com/KempnerInstitute/overcomplete)):
 - Interpret neurons directly via [`NeuronsAsConcepts`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/neurons_as_concepts/)
 - [`NMF`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.NMFConcepts), [`Semi-NMF`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.SemiNMFConcepts), [`ConvexNMF`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.ConvexNMFConcepts)
 - [`ICA`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.ICAConcepts), [`SVD`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.SVDConcepts), [`PCA`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.PCAConcepts), [`KMeans`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/optim/#interpreto.concepts.KMeansConcepts)
 - SAE variants: [`Vanilla SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.VanillaSAEConcepts), [`TopK SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.TopKSAEConcepts), [`JumpReLU SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.JumpReLUSAEConcepts), [`BatchTopK SAE`](https://for-sight-ai.github.io/interpreto/api/concepts/methods/sae/#interpreto.concepts.BatchTopKSAEConcepts)
-**3. Available Concept Interpretation Techniques:**
+**3. (unsupervised) Available Concept Interpretation Techniques:**
 - Top-k tokens from tokenizer vocabulary via [`TopKInputs`](https://for-sight-ai.github.io/interpreto/api/concepts/concepts_interpretations/#interpreto.concepts.interpretations.TopKInputs) and `use_vocab=True`
 - Top-k tokens/words/sentences/samples from specific datasets via [`TopKInputs`](https://for-sight-ai.github.io/interpreto/api/concepts/concepts_interpretations/#interpreto.concepts.interpretations.TopKInputs)
 - Label concepts via LLMs with [`LLMLabels`](https://for-sight-ai.github.io/interpreto/api/concepts/concepts_interpretations/#interpreto.concepts.interpretations.LLMLabels) ([Bills et al. 2023](https://openai.com/index/language-models-can-explain-neurons-in-language-models/))
+- Input-to-concept attribution from dataset examples ([Concept Attributions](https://for-sight-ai.github.io/interpreto/api/concepts/interpretations/concept_attributions/)) ([Jourdan et al. 2023](https://aclanthology.org/2023.findings-acl.317/))
 <details><summary>Concept Interpretation Techniques Added in the future:</summary>
-- Input-to-concept attribution from dataset examples ([Jourdan et al. 2023](https://aclanthology.org/2023.findings-acl.317/))
-- Theme prediction via LLMs from top-k tokens/sentences
 - Aligning concepts with human labels ([Sajjad et al. 2022](https://aclanthology.org/2022.naacl-main.225/))
 - Word cloud visualizations of concepts ([Dalvi et al. 2022](https://arxiv.org/abs/2205.07237))
 - VocabProj & TokenChange ([Gur-Arieh et al. 2025](https://arxiv.org/abs/2501.08319))
 </details>
-**4. Concept-to-Output Attribution:**
+**4. (unsupervised) Concept-to-Output Attribution:**
 Estimate the contribution of each concept to the model output.
@@ -102,7 +113,6 @@ Can be obtained with any concept-based explainer via [`MethodConcepts.concept_ou
 Thanks to this generalization encompassing all concept-based methods and our highly flexible architecture, we can easily obtain a large number of concept-based methods:
-- CAV and TCAV: [Kim et al. 2018, Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)](http://proceedings.mlr.press/v80/kim18d.html)
 - ConceptSHAP: [Yeh et al. 2020, On Completeness-aware Concept-Based Explanations in Deep Neural Networks](https://proceedings.neurips.cc/paper/2020/hash/ecb287ff763c169694f682af52c1f309-Abstract.html)
 - COCKATIEL: [Jourdan et al. 2023, COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP](https://aclanthology.org/2023.findings-acl.317/)
 - Yun et al. 2021, [Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors](https://arxiv.org/abs/2103.15949)
@@ -159,7 +169,7 @@ Interpreto 🪄 is a project of the [FOR](https://www.irt-saintexupery.com/fr/fo
 ## 🗞️ Citation
-If you use Interpreto 🪄 as part of your workflow in a scientific publication, please consider citing 🗞️ our paper:
+If you use Interpreto 🪄 as part of your workflow in a scientific publication, please consider citing 🗞️ [our paper](https://arxiv.org/abs/2512.09730):
 ```bibtex
 @article{poche2025interpreto,

{interpreto-0.4.20 → interpreto-0.5.0.dev1}/interpreto/__init__.py RENAMED

@@ -42,7 +42,7 @@ from .attributions import (
 from .commons import (
     Granularity,
 )
-from .model_wrapping import ModelWithSplitPoints
+from .model_wrapping import ModelWithSplitPoints, SplitSequenceClassification
 from .visualizations import (
     AttributionVisualization,
     plot_attributions,
@@ -71,6 +71,7 @@ __all__ = [
     "SquareGrad",
     "Saliency",
     "SmoothGrad",
+    "SplitSequenceClassification",
     "Sobol",
     "VarGrad",
     "get_version",

interpreto-0.5.0.dev1/interpreto/_vendor/overcomplete/base.pyi ADDED

@@ -0,0 +1,16 @@
+from abc import ABC
+import torch
+from torch import nn
+class BaseDictionaryLearning(ABC, nn.Module):
+    nb_concepts: int
+    device: str | torch.device
+    fitted: bool
+    def __init__(self, nb_concepts: int, device: str | torch.device = "cpu") -> None: ...
+    def encode(self, x: torch.Tensor) -> torch.Tensor: ...
+    def decode(self, z: torch.Tensor) -> torch.Tensor: ...
+    def fit(self, x: torch.Tensor) -> None: ...
+    def get_dictionary(self) -> torch.Tensor: ...
+    def to(self, device: str | torch.device) -> "BaseDictionaryLearning": ...  # type: ignore[override]

{interpreto-0.4.20 → interpreto-0.5.0.dev1}/interpreto/attributions/aggregations/base.py RENAMED

@@ -41,10 +41,9 @@ def cast_input_to_dtype(func):
     Ensure mask and results are on the device specified in the aggregator
     """
-    def wrapper(self, results: torch.Tensor, mask, *args, **kwargs) -> torch.Tensor:
-        # TODO : eventually add device alignment as well
-        if mask is not None and mask.dtype != self.dtype:
-            mask = mask.to(self.dtype)
+    def wrapper(self, results: torch.Tensor, mask: torch.Tensor | None, *args, **kwargs) -> torch.Tensor:
+        if mask is not None:
+            mask = mask.to(device=results.device, dtype=self.dtype)
         return func(self, results.to(self.dtype), mask, *args, **kwargs)
     return wrapper
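
The hunk above replaces a dtype-only cast of `mask` with a single `.to(device=..., dtype=...)` call, so the mask now follows `results` onto its device as well. The sketch below reproduces the new wrapper as a standalone snippet and exercises it on CPU. `ToyAggregator` is a hypothetical class written for this illustration, not an interpreto aggregator, and `functools.wraps` and the `__future__` import are additions that the diff does not show.

```python
from __future__ import annotations

import functools

import torch


def cast_input_to_dtype(func):
    """Cast `results` and `mask` to `self.dtype`; move `mask` to the device of `results`."""

    @functools.wraps(func)
    def wrapper(self, results: torch.Tensor, mask: torch.Tensor | None, *args, **kwargs) -> torch.Tensor:
        if mask is not None:
            # One call aligns both device and dtype with what the aggregator expects.
            mask = mask.to(device=results.device, dtype=self.dtype)
        return func(self, results.to(self.dtype), mask, *args, **kwargs)

    return wrapper


class ToyAggregator:
    """Hypothetical aggregator: mask-weighted mean of scores per feature."""

    def __init__(self, dtype: torch.dtype = torch.float32):
        self.dtype = dtype

    @cast_input_to_dtype
    def aggregate(self, results: torch.Tensor, mask: torch.Tensor | None) -> torch.Tensor:
        if mask is None:
            return results.mean()
        # results: (p,), mask: (p, l) -> one value per feature, shape (l,).
        # The matmul needs matching dtypes, which the decorator guarantees.
        return mask.T @ results / mask.sum(dim=0).clamp(min=1)


results = torch.tensor([1.0, 3.0], dtype=torch.float64)  # float64 scores
mask = torch.tensor([[1, 0], [1, 1]])                    # int64 mask
out = ToyAggregator(dtype=torch.float32).aggregate(results, mask)
```

Without the cast, the integer mask and the float scores could not be multiplied together. With it, `out` comes back in the aggregator's own dtype regardless of what the caller passed in.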
