PyPI - interpkit - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

interpkit 0.2.0.tar.gz → 0.3.0.tar.gz


Files changed (78)

{interpkit-0.2.0 → interpkit-0.3.0}/PKG-INFO RENAMED

@@ -1,12 +1,12 @@
 Metadata-Version: 2.4
 Name: interpkit
-Version: 0.2.0
+Version: 0.3.0
 Summary: Mech interp for any HuggingFace model.
 Author: Davide Zani
 License-Expression: MIT
-Project-URL: Homepage, https://github.com/z4nix/MechKit
-Project-URL: Repository, https://github.com/z4nix/MechKit
-Project-URL: Issues, https://github.com/z4nix/MechKit/issues
+Project-URL: Homepage, https://github.com/z4nix/interpkit
+Project-URL: Repository, https://github.com/z4nix/interpkit
+Project-URL: Issues, https://github.com/z4nix/interpkit/issues
 Keywords: mechanistic-interpretability,pytorch,transformers,mech-interp,interpretability
 Classifier: Development Status :: 3 - Alpha
 Classifier: Intended Audience :: Science/Research
@@ -23,6 +23,7 @@ Requires-Dist: torch>=2.1
 Requires-Dist: transformers>=4.36
 Requires-Dist: safetensors>=0.4
 Requires-Dist: rich>=13.0
+Requires-Dist: rich-gradient>=0.3
 Requires-Dist: typer>=0.9
 Requires-Dist: Pillow>=10.0
 Requires-Dist: matplotlib>=3.8
@@ -34,20 +35,20 @@ Requires-Dist: scikit-learn>=1.3; extra == "probe"
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0; extra == "dev"
 Requires-Dist: pytest-timeout>=2.2; extra == "dev"
+Requires-Dist: pytest-cov>=5.0; extra == "dev"
 Requires-Dist: scikit-learn>=1.3; extra == "dev"
 Requires-Dist: torchvision>=0.16; extra == "dev"
+Requires-Dist: ruff>=0.4; extra == "dev"
+Requires-Dist: mypy>=1.8; extra == "dev"
+Provides-Extra: docs
+Requires-Dist: mkdocs>=1.5; extra == "docs"
+Requires-Dist: mkdocs-material>=9.5; extra == "docs"
+Requires-Dist: mkdocstrings[python]>=0.24; extra == "docs"
 Dynamic: license-file
-```
-IIIII         tt                          KK  KK iii tt
- III  nn nnn  tt      eee  rr rr  pp pp   KK KK      tt
- III  nnn  nn tttt  ee   e rrr  r ppp  pp KKKK   iii tttt
- III  nn   nn tt    eeeee  rr     pppppp  KK KK  iii tt
-IIIII nn   nn  tttt  eeeee rr     pp      KK  KK iii  tttt
-                                  pp
-```
-> Mech interp for any HuggingFace model.
+<p align="center">
+  <img src="assets/logo.svg" alt="InterpKit" width="680">
+</p>
 [![PyPI version](https://img.shields.io/pypi/v/interpkit.svg)](https://pypi.org/project/interpkit/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -75,8 +76,8 @@ pip install interpkit[probe]
 Or install from source for development:
 ```bash
-git clone https://github.com/davidezani/InterpKit.git
-cd InterpKit
+git clone https://github.com/z4nix/interpkit.git
+cd interpkit
 pip install -e ".[dev]"
 ```
@@ -117,7 +118,7 @@ model = interpkit.load("bert-base-uncased")
 | Operation | What it does | Works on |
 |-----------|-------------|----------|
 | **`scan`** | One-command model overview: runs DLA, lens, attention, attribution and surfaces key findings | LMs |
-| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution | LMs |
+| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution; optionally decompose through an SAE into per-feature attributions | LMs |
 | `inspect` | Module tree with types, param counts, shapes | Any model |
 | `patch` | Activation patching at a module, head, or position | Any model |
 | `trace` | Causal tracing — module-level or position-aware (Meng et al.) heatmap | Any model |
@@ -172,6 +173,16 @@ model.dla("The capital of France is", token="Paris")
 # Save a bar chart
 model.dla("The capital of France is", save="dla.png")
+# Feature-level DLA — decompose a component through an SAE
+# to see which individual features drive the prediction
+model.dla(
+    "The capital of France is",
+    sae="jbloom/GPT2-Small-SAEs-Reformatted",
+    sae_at="transformer.h.11.attn",
+)
+# result["feature_contributions"]["features"]
+#   — per-feature logit attributions at the specified component
 ```
 ## Causal Tracing
@@ -342,14 +353,22 @@ interpkit.diff(base, finetuned, "The capital of France is")
 ## SAE Features
-Decompose activations into interpretable features using pre-trained Sparse Autoencoders from HuggingFace:
+Decompose activations into interpretable features using pre-trained Sparse Autoencoders:
 ```python
+# From HuggingFace
 model.features(
     "The capital of France is",
     at="transformer.h.8",
     sae="jbloom/GPT2-Small-SAEs-Reformatted",
 )
+# From a local file (.safetensors or .pt)
+model.features(
+    "The capital of France is",
+    at="transformer.h.8",
+    sae="/path/to/sae_weights.safetensors",
+)
 ```
 No SAELens dependency — weights are loaded directly via `safetensors`.
@@ -408,6 +427,8 @@ interpkit ablate gpt2 "The capital of France is" --at transformer.h.8.mlp
 interpkit decompose gpt2 "The capital of France is"
 interpkit diff gpt2 my-finetuned-gpt2 "The capital of France is" --save diff.png
 interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae jbloom/GPT2-Small-SAEs-Reformatted
+interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae ./my_sae.safetensors
+interpkit dla gpt2 "The capital of France is" --sae jbloom/GPT2-Small-SAEs-Reformatted --sae-at transformer.h.11.attn
 # Interactive HTML output
 interpkit attention gpt2 "hello world" --html attention.html

{interpkit-0.2.0 → interpkit-0.3.0}/README.md RENAMED

@@ -1,13 +1,6 @@
-```
-IIIII         tt                          KK  KK iii tt
- III  nn nnn  tt      eee  rr rr  pp pp   KK KK      tt
- III  nnn  nn tttt  ee   e rrr  r ppp  pp KKKK   iii tttt
- III  nn   nn tt    eeeee  rr     pppppp  KK KK  iii tt
-IIIII nn   nn  tttt  eeeee rr     pp      KK  KK iii  tttt
-                                  pp
-```
-> Mech interp for any HuggingFace model.
+<p align="center">
+  <img src="assets/logo.svg" alt="InterpKit" width="680">
+</p>
 [![PyPI version](https://img.shields.io/pypi/v/interpkit.svg)](https://pypi.org/project/interpkit/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -35,8 +28,8 @@ pip install interpkit[probe]
 Or install from source for development:
 ```bash
-git clone https://github.com/davidezani/InterpKit.git
-cd InterpKit
+git clone https://github.com/z4nix/interpkit.git
+cd interpkit
 pip install -e ".[dev]"
 ```
@@ -77,7 +70,7 @@ model = interpkit.load("bert-base-uncased")
 | Operation | What it does | Works on |
 |-----------|-------------|----------|
 | **`scan`** | One-command model overview: runs DLA, lens, attention, attribution and surfaces key findings | LMs |
-| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution | LMs |
+| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution; optionally decompose through an SAE into per-feature attributions | LMs |
 | `inspect` | Module tree with types, param counts, shapes | Any model |
 | `patch` | Activation patching at a module, head, or position | Any model |
 | `trace` | Causal tracing — module-level or position-aware (Meng et al.) heatmap | Any model |
@@ -132,6 +125,16 @@ model.dla("The capital of France is", token="Paris")
 # Save a bar chart
 model.dla("The capital of France is", save="dla.png")
+# Feature-level DLA — decompose a component through an SAE
+# to see which individual features drive the prediction
+model.dla(
+    "The capital of France is",
+    sae="jbloom/GPT2-Small-SAEs-Reformatted",
+    sae_at="transformer.h.11.attn",
+)
+# result["feature_contributions"]["features"]
+#   — per-feature logit attributions at the specified component
 ```
 ## Causal Tracing
@@ -302,14 +305,22 @@ interpkit.diff(base, finetuned, "The capital of France is")
 ## SAE Features
-Decompose activations into interpretable features using pre-trained Sparse Autoencoders from HuggingFace:
+Decompose activations into interpretable features using pre-trained Sparse Autoencoders:
 ```python
+# From HuggingFace
 model.features(
     "The capital of France is",
     at="transformer.h.8",
     sae="jbloom/GPT2-Small-SAEs-Reformatted",
 )
+# From a local file (.safetensors or .pt)
+model.features(
+    "The capital of France is",
+    at="transformer.h.8",
+    sae="/path/to/sae_weights.safetensors",
+)
 ```
 No SAELens dependency — weights are loaded directly via `safetensors`.
@@ -368,6 +379,8 @@ interpkit ablate gpt2 "The capital of France is" --at transformer.h.8.mlp
 interpkit decompose gpt2 "The capital of France is"
 interpkit diff gpt2 my-finetuned-gpt2 "The capital of France is" --save diff.png
 interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae jbloom/GPT2-Small-SAEs-Reformatted
+interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae ./my_sae.safetensors
+interpkit dla gpt2 "The capital of France is" --sae jbloom/GPT2-Small-SAEs-Reformatted --sae-at transformer.h.11.attn
 # Interactive HTML output
 interpkit attention gpt2 "hello world" --html attention.html
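
With 0.3.0 the `sae` argument accepts either a HuggingFace repository id or a local `.safetensors` / `.pt` file. The diff does not show how interpkit tells the two apart, so the following is only a guess at one reasonable rule: treat the value as a local file when it ends in one of the two documented extensions, and as a Hub id otherwise. The function name `classify_sae_source` and the constant `LOCAL_SUFFIXES` are hypothetical.

```python
from pathlib import Path

# Extensions the 0.3.0 README documents for local SAE weights.
LOCAL_SUFFIXES = {".safetensors", ".pt"}


def classify_sae_source(sae: str) -> str:
    """Return "local" for a weights file path, "hub" for anything else.

    Assumption: the file extension alone decides; interpkit's real
    resolution logic may differ (for example, by checking that the path exists).
    """
    suffix = Path(sae).expanduser().suffix.lower()
    return "local" if suffix in LOCAL_SUFFIXES else "hub"


print(classify_sae_source("jbloom/GPT2-Small-SAEs-Reformatted"))  # hub
print(classify_sae_source("./my_sae.safetensors"))                # local
print(classify_sae_source("/path/to/sae_weights.safetensors"))    # local
```

Deciding on the extension keeps the check free of filesystem access, at the cost of misclassifying a Hub repository whose name happens to end in `.pt` or `.safetensors`.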
