PyPI - interpkit - Versions diffs - 0.2.0__tar.gz → 0.4.0__tar.gz - Mend

interpkit 0.2.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (84)

{interpkit-0.2.0/interpkit.egg-info → interpkit-0.4.0}/PKG-INFO RENAMED

@@ -1,12 +1,12 @@
 Metadata-Version: 2.4
 Name: interpkit
-Version: 0.2.0
+Version: 0.4.0
 Summary: Mech interp for any HuggingFace model.
 Author: Davide Zani
 License-Expression: MIT
-Project-URL: Homepage, https://github.com/z4nix/MechKit
-Project-URL: Repository, https://github.com/z4nix/MechKit
-Project-URL: Issues, https://github.com/z4nix/MechKit/issues
+Project-URL: Homepage, https://github.com/z4nix/interpkit
+Project-URL: Repository, https://github.com/z4nix/interpkit
+Project-URL: Issues, https://github.com/z4nix/interpkit/issues
 Keywords: mechanistic-interpretability,pytorch,transformers,mech-interp,interpretability
 Classifier: Development Status :: 3 - Alpha
 Classifier: Intended Audience :: Science/Research
@@ -23,6 +23,7 @@ Requires-Dist: torch>=2.1
 Requires-Dist: transformers>=4.36
 Requires-Dist: safetensors>=0.4
 Requires-Dist: rich>=13.0
+Requires-Dist: rich-gradient>=0.3
 Requires-Dist: typer>=0.9
 Requires-Dist: Pillow>=10.0
 Requires-Dist: matplotlib>=3.8
@@ -34,20 +35,20 @@ Requires-Dist: scikit-learn>=1.3; extra == "probe"
 Provides-Extra: dev
 Requires-Dist: pytest>=7.0; extra == "dev"
 Requires-Dist: pytest-timeout>=2.2; extra == "dev"
+Requires-Dist: pytest-cov>=5.0; extra == "dev"
 Requires-Dist: scikit-learn>=1.3; extra == "dev"
 Requires-Dist: torchvision>=0.16; extra == "dev"
+Requires-Dist: ruff>=0.4; extra == "dev"
+Requires-Dist: mypy>=1.8; extra == "dev"
+Provides-Extra: docs
+Requires-Dist: mkdocs>=1.5; extra == "docs"
+Requires-Dist: mkdocs-material>=9.5; extra == "docs"
+Requires-Dist: mkdocstrings[python]>=0.24; extra == "docs"
 Dynamic: license-file
-```
-IIIII         tt                          KK  KK iii tt
- III  nn nnn  tt      eee  rr rr  pp pp   KK KK      tt
- III  nnn  nn tttt  ee   e rrr  r ppp  pp KKKK   iii tttt
- III  nn   nn tt    eeeee  rr     pppppp  KK KK  iii tt
-IIIII nn   nn  tttt  eeeee rr     pp      KK  KK iii  tttt
-                                  pp
-```
-> Mech interp for any HuggingFace model.
+<p align="center">
+  <img src="assets/logo.svg" alt="InterpKit" width="680">
+</p>
 [![PyPI version](https://img.shields.io/pypi/v/interpkit.svg)](https://pypi.org/project/interpkit/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -75,8 +76,8 @@ pip install interpkit[probe]
 Or install from source for development:
 ```bash
-git clone https://github.com/davidezani/InterpKit.git
-cd InterpKit
+git clone https://github.com/z4nix/interpkit.git
+cd interpkit
 pip install -e ".[dev]"
 ```
@@ -110,6 +111,25 @@ model = interpkit.load("google/vit-base-patch16-224")
 model = interpkit.load("bert-base-uncased")
 ```
+### Chat models
+Instruction-tuned models work too — interpkit applies the tokenizer's chat template automatically.
+```python
+chat = interpkit.load("HuggingFaceTB/SmolLM2-360M-Instruct")
+result = chat.chat("Write a haiku about cats.", max_new_tokens=64)
+print(result["response"])
+# Run any other op on the templated prompt
+chat.dla(result["prompt"])
+# Or pass a message list directly to any op
+chat.dla([{"role": "user", "content": "Capital of France?"}])
+```
+See [examples/10_chat_models.ipynb](examples/10_chat_models.ipynb) for a full walkthrough including chat-style steering.
 ---
 ## Operations
@@ -117,7 +137,8 @@ model = interpkit.load("bert-base-uncased")
 | Operation | What it does | Works on |
 |-----------|-------------|----------|
 | **`scan`** | One-command model overview: runs DLA, lens, attention, attribution and surfaces key findings | LMs |
-| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution | LMs |
+| **`chat`** | Send a message through the tokenizer's chat template and generate a reply | Chat / instruct LMs |
+| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution; optionally decompose through an SAE into per-feature attributions | LMs |
 | `inspect` | Module tree with types, param counts, shapes | Any model |
 | `patch` | Activation patching at a module, head, or position | Any model |
 | `trace` | Causal tracing — module-level or position-aware (Meng et al.) heatmap | Any model |
@@ -172,6 +193,16 @@ model.dla("The capital of France is", token="Paris")
 # Save a bar chart
 model.dla("The capital of France is", save="dla.png")
+# Feature-level DLA — decompose a component through an SAE
+# to see which individual features drive the prediction
+model.dla(
+    "The capital of France is",
+    sae="jbloom/GPT2-Small-SAEs-Reformatted",
+    sae_at="transformer.h.11.attn",
+)
+# result["feature_contributions"]["features"]
+#   — per-feature logit attributions at the specified component
 ```
 ## Causal Tracing
@@ -317,10 +348,12 @@ results = model.dla_batch(["The capital of France is", "The CEO of Apple is"])
 ## Steering
 ```python
-vector = model.steer_vector("Love", "Hate", at="transformer.h.8")
+vector = model.steer_vector(" love", " hate", at="transformer.h.8")
 model.steer("The weather today is", vector=vector, at="transformer.h.8", scale=2.0)
 ```
+> Note the leading spaces. BPE tokenizers (GPT-2, Llama, ...) treat `" love"` and `"love"` as different tokens, and the leading-space variant is the one the model actually sees in normal text. interpkit prints a warning if you forget.
 ## Linear Probe
 ```python
@@ -342,14 +375,22 @@ interpkit.diff(base, finetuned, "The capital of France is")
 ## SAE Features
-Decompose activations into interpretable features using pre-trained Sparse Autoencoders from HuggingFace:
+Decompose activations into interpretable features using pre-trained Sparse Autoencoders:
 ```python
+# From HuggingFace
 model.features(
     "The capital of France is",
     at="transformer.h.8",
     sae="jbloom/GPT2-Small-SAEs-Reformatted",
 )
+# From a local file (.safetensors or .pt)
+model.features(
+    "The capital of France is",
+    at="transformer.h.8",
+    sae="/path/to/sae_weights.safetensors",
+)
 ```
 No SAELens dependency — weights are loaded directly via `safetensors`.
@@ -403,11 +444,17 @@ interpkit lens gpt2 "The capital of France is"
 interpkit lens gpt2 "The capital of France is" --position -1
 interpkit attention gpt2 "The capital of France is" --layer 8 --save attention.png
 interpkit attribute gpt2 "The capital of France is"
-interpkit steer gpt2 "The weather is" --positive Love --negative Hate --at transformer.h.8
+interpkit steer gpt2 "The weather is" --positive " love" --negative " hate" --at transformer.h.8
 interpkit ablate gpt2 "The capital of France is" --at transformer.h.8.mlp
 interpkit decompose gpt2 "The capital of France is"
 interpkit diff gpt2 my-finetuned-gpt2 "The capital of France is" --save diff.png
 interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae jbloom/GPT2-Small-SAEs-Reformatted
+interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae ./my_sae.safetensors
+interpkit dla gpt2 "The capital of France is" --sae jbloom/GPT2-Small-SAEs-Reformatted --sae-at transformer.h.11.attn
+# Chat / instruct models — applies the tokenizer's chat template automatically
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Write a haiku about cats." --max-new-tokens 64
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "What is 2+2?" --system "You are terse." --show-prompt
 # Interactive HTML output
 interpkit attention gpt2 "hello world" --html attention.html
@@ -418,7 +465,17 @@ interpkit attribute gpt2 "The capital of France is" --html attribution.html
 interpkit attribute microsoft/resnet-50 cat.jpg --target 281
 ```
-Run `interpkit` with no arguments for a full command reference.
+Run `interpkit` with no arguments for a full command reference, or
+`interpkit --extensive` for a beginner-friendly walkthrough of every command.
+If the `interpkit` console script isn't on your `PATH` (e.g. fresh
+environments, sandboxed installs, or running from a checkout without
+re-installing), every command also works as `python -m interpkit ...`:
+```bash
+python -m interpkit scan gpt2 "The capital of France is"
+python -m interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Hello!"
+```
 ---
@@ -480,6 +537,7 @@ See the [`examples/`](examples/) directory for Jupyter notebooks:
 | `07_vision_models` | ResNet/ViT attribution, ablation, activations |
 | `08_dla_and_circuits` | DLA, head activations, residual decomposition, OV/QK analysis, composition, circuit discovery |
 | `09_scan_and_batch` | Auto-scan, batch operations, dataset workflows |
+| `10_chat_models` | Chat-template handling, `model.chat()`, message-list inputs, chat-style steering |
 ---
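The new `chat` operation and the message-list inputs both depend on the tokenizer's chat template, which turns a list of `{"role", "content"}` dicts into a single prompt string. The sketch below illustrates that step with a ChatML-style layout (`<|im_start|>` / `<|im_end|>` markers). The layout and the helpers `as_messages` and `render_chatml` are assumptions for illustration, not interpkit code. Each model ships its own template, and in practice you would call Hugging Face's `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` rather than hand-roll one.

```python
from typing import Dict, List, Optional, Union

Message = Dict[str, str]


def as_messages(prompt: Union[str, List[Message]], system: Optional[str] = None) -> List[Message]:
    """Normalize a plain string or a ready-made message list into a message list."""
    if isinstance(prompt, str):
        messages = [{"role": "user", "content": prompt}]
    else:
        messages = list(prompt)
    if system is not None:
        # Prepend an optional system turn.
        messages.insert(0, {"role": "system", "content": system})
    return messages


def render_chatml(messages: List[Message], add_generation_prompt: bool = True) -> str:
    """Render messages in a ChatML-style layout (illustrative assumption;
    every model defines its own template)."""
    parts = []
    for msg in messages:
        if msg["role"] not in {"system", "user", "assistant"}:
            raise ValueError(f"unknown role: {msg['role']!r}")
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so that generation continues as the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml(as_messages("What is 2+2?", system="You are terse.")))
```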

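Direct Logit Attribution, as described in the `dla` row above, relies on the residual stream being a sum of component outputs (embeddings, attention heads, MLPs). If you ignore the final LayerNorm, the target token's logit is a dot product with that token's unembedding column. Because a dot product is linear, the logit splits exactly into one term per component. The minimal sketch below checks that identity on toy vectors. The component names, vectors, and unembedding column are made up. Real DLA implementations usually also handle the final LayerNorm, for example by dividing each component by the cached normalization scale, and this sketch omits that step.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def direct_logit_attribution(components: Dict[str, Vector], unembed_col: Vector) -> Dict[str, float]:
    """Project each component's residual-stream write onto the target token's
    unembedding direction. The final LayerNorm is deliberately ignored."""
    return {name: dot(vec, unembed_col) for name, vec in components.items()}


# Toy 3-dimensional residual stream with three writers (illustrative numbers).
components = {
    "embed": [0.5, 0.0, 1.0],
    "L0.head3": [1.0, -1.0, 0.0],
    "L0.mlp": [0.0, 2.0, 0.5],
}
w_u_target = [2.0, 1.0, -1.0]  # hypothetical unembedding column for the target token

contribs = direct_logit_attribution(components, w_u_target)
residual = [sum(vals) for vals in zip(*components.values())]  # final residual stream
logit = dot(residual, w_u_target)

if __name__ == "__main__":
    for name, value in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name:10s} {value:+.2f}")
    print(f"sum of contributions = {sum(contribs.values()):+.2f}, logit = {logit:+.2f}")
```

The per-component terms sum to the full logit, which is what lets a DLA bar chart rank heads and MLPs by how much each one pushes the target token up or down.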
interpkit-0.2.0/PKG-INFO → interpkit-0.4.0/README.md RENAMED

@@ -1,53 +1,6 @@
-Metadata-Version: 2.4
-Name: interpkit
-Version: 0.2.0
-Summary: Mech interp for any HuggingFace model.
-Author: Davide Zani
-License-Expression: MIT
-Project-URL: Homepage, https://github.com/z4nix/MechKit
-Project-URL: Repository, https://github.com/z4nix/MechKit
-Project-URL: Issues, https://github.com/z4nix/MechKit/issues
-Keywords: mechanistic-interpretability,pytorch,transformers,mech-interp,interpretability
-Classifier: Development Status :: 3 - Alpha
-Classifier: Intended Audience :: Science/Research
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
-Requires-Python: >=3.10
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: torch>=2.1
-Requires-Dist: transformers>=4.36
-Requires-Dist: safetensors>=0.4
-Requires-Dist: rich>=13.0
-Requires-Dist: typer>=0.9
-Requires-Dist: Pillow>=10.0
-Requires-Dist: matplotlib>=3.8
-Requires-Dist: huggingface-hub>=0.20
-Provides-Extra: vision
-Requires-Dist: torchvision>=0.16; extra == "vision"
-Provides-Extra: probe
-Requires-Dist: scikit-learn>=1.3; extra == "probe"
-Provides-Extra: dev
-Requires-Dist: pytest>=7.0; extra == "dev"
-Requires-Dist: pytest-timeout>=2.2; extra == "dev"
-Requires-Dist: scikit-learn>=1.3; extra == "dev"
-Requires-Dist: torchvision>=0.16; extra == "dev"
-Dynamic: license-file
-```
-IIIII         tt                          KK  KK iii tt
- III  nn nnn  tt      eee  rr rr  pp pp   KK KK      tt
- III  nnn  nn tttt  ee   e rrr  r ppp  pp KKKK   iii tttt
- III  nn   nn tt    eeeee  rr     pppppp  KK KK  iii tt
-IIIII nn   nn  tttt  eeeee rr     pp      KK  KK iii  tttt
-                                  pp
-```
-> Mech interp for any HuggingFace model.
+<p align="center">
+  <img src="assets/logo.svg" alt="InterpKit" width="680">
+</p>
 [![PyPI version](https://img.shields.io/pypi/v/interpkit.svg)](https://pypi.org/project/interpkit/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -75,8 +28,8 @@ pip install interpkit[probe]
 Or install from source for development:
 ```bash
-git clone https://github.com/davidezani/InterpKit.git
-cd InterpKit
+git clone https://github.com/z4nix/interpkit.git
+cd interpkit
 pip install -e ".[dev]"
 ```
@@ -110,6 +63,25 @@ model = interpkit.load("google/vit-base-patch16-224")
 model = interpkit.load("bert-base-uncased")
 ```
+### Chat models
+Instruction-tuned models work too — interpkit applies the tokenizer's chat template automatically.
+```python
+chat = interpkit.load("HuggingFaceTB/SmolLM2-360M-Instruct")
+result = chat.chat("Write a haiku about cats.", max_new_tokens=64)
+print(result["response"])
+# Run any other op on the templated prompt
+chat.dla(result["prompt"])
+# Or pass a message list directly to any op
+chat.dla([{"role": "user", "content": "Capital of France?"}])
+```
+See [examples/10_chat_models.ipynb](examples/10_chat_models.ipynb) for a full walkthrough including chat-style steering.
 ---
 ## Operations
@@ -117,7 +89,8 @@ model = interpkit.load("bert-base-uncased")
 | Operation | What it does | Works on |
 |-----------|-------------|----------|
 | **`scan`** | One-command model overview: runs DLA, lens, attention, attribution and surfaces key findings | LMs |
-| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution | LMs |
+| **`chat`** | Send a message through the tokenizer's chat template and generate a reply | Chat / instruct LMs |
+| **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution; optionally decompose through an SAE into per-feature attributions | LMs |
 | `inspect` | Module tree with types, param counts, shapes | Any model |
 | `patch` | Activation patching at a module, head, or position | Any model |
 | `trace` | Causal tracing — module-level or position-aware (Meng et al.) heatmap | Any model |
@@ -172,6 +145,16 @@ model.dla("The capital of France is", token="Paris")
 # Save a bar chart
 model.dla("The capital of France is", save="dla.png")
+# Feature-level DLA — decompose a component through an SAE
+# to see which individual features drive the prediction
+model.dla(
+    "The capital of France is",
+    sae="jbloom/GPT2-Small-SAEs-Reformatted",
+    sae_at="transformer.h.11.attn",
+)
+# result["feature_contributions"]["features"]
+#   — per-feature logit attributions at the specified component
 ```
 ## Causal Tracing
@@ -317,10 +300,12 @@ results = model.dla_batch(["The capital of France is", "The CEO of Apple is"])
 ## Steering
 ```python
-vector = model.steer_vector("Love", "Hate", at="transformer.h.8")
+vector = model.steer_vector(" love", " hate", at="transformer.h.8")
 model.steer("The weather today is", vector=vector, at="transformer.h.8", scale=2.0)
 ```
+> Note the leading spaces. BPE tokenizers (GPT-2, Llama, ...) treat `" love"` and `"love"` as different tokens, and the leading-space variant is the one the model actually sees in normal text. interpkit prints a warning if you forget.
 ## Linear Probe
 ```python
@@ -342,14 +327,22 @@ interpkit.diff(base, finetuned, "The capital of France is")
 ## SAE Features
-Decompose activations into interpretable features using pre-trained Sparse Autoencoders from HuggingFace:
+Decompose activations into interpretable features using pre-trained Sparse Autoencoders:
 ```python
+# From HuggingFace
 model.features(
     "The capital of France is",
     at="transformer.h.8",
     sae="jbloom/GPT2-Small-SAEs-Reformatted",
 )
+# From a local file (.safetensors or .pt)
+model.features(
+    "The capital of France is",
+    at="transformer.h.8",
+    sae="/path/to/sae_weights.safetensors",
+)
 ```
 No SAELens dependency — weights are loaded directly via `safetensors`.
@@ -403,11 +396,17 @@ interpkit lens gpt2 "The capital of France is"
 interpkit lens gpt2 "The capital of France is" --position -1
 interpkit attention gpt2 "The capital of France is" --layer 8 --save attention.png
 interpkit attribute gpt2 "The capital of France is"
-interpkit steer gpt2 "The weather is" --positive Love --negative Hate --at transformer.h.8
+interpkit steer gpt2 "The weather is" --positive " love" --negative " hate" --at transformer.h.8
 interpkit ablate gpt2 "The capital of France is" --at transformer.h.8.mlp
 interpkit decompose gpt2 "The capital of France is"
 interpkit diff gpt2 my-finetuned-gpt2 "The capital of France is" --save diff.png
 interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae jbloom/GPT2-Small-SAEs-Reformatted
+interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae ./my_sae.safetensors
+interpkit dla gpt2 "The capital of France is" --sae jbloom/GPT2-Small-SAEs-Reformatted --sae-at transformer.h.11.attn
+# Chat / instruct models — applies the tokenizer's chat template automatically
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Write a haiku about cats." --max-new-tokens 64
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "What is 2+2?" --system "You are terse." --show-prompt
 # Interactive HTML output
 interpkit attention gpt2 "hello world" --html attention.html
@@ -418,7 +417,17 @@ interpkit attribute gpt2 "The capital of France is" --html attribution.html
 interpkit attribute microsoft/resnet-50 cat.jpg --target 281
 ```
-Run `interpkit` with no arguments for a full command reference.
+Run `interpkit` with no arguments for a full command reference, or
+`interpkit --extensive` for a beginner-friendly walkthrough of every command.
+If the `interpkit` console script isn't on your `PATH` (e.g. fresh
+environments, sandboxed installs, or running from a checkout without
+re-installing), every command also works as `python -m interpkit ...`:
+```bash
+python -m interpkit scan gpt2 "The capital of France is"
+python -m interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Hello!"
+```
 ---
@@ -480,6 +489,7 @@ See the [`examples/`](examples/) directory for Jupyter notebooks:
 | `07_vision_models` | ResNet/ViT attribution, ablation, activations |
 | `08_dla_and_circuits` | DLA, head activations, residual decomposition, OV/QK analysis, composition, circuit discovery |
 | `09_scan_and_batch` | Auto-scan, batch operations, dataset workflows |
+| `10_chat_models` | Chat-template handling, `model.chat()`, message-list inputs, chat-style steering |
 ---
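The feature-level DLA option (`dla(..., sae=..., sae_at=...)`) applies the same linear argument one level deeper. Suppose an SAE reconstructs a component's output as a sum of decoder rows weighted by feature activations, plus a bias. Then each feature's logit contribution is its activation times the dot product of its decoder row with the target unembedding column. The sketch below assumes a standard ReLU SAE, `f = ReLU((x - b_dec) W_enc + b_enc)` and `x_hat = f W_dec + b_dec`, with toy weights. It reports the reconstruction error `x - x_hat` separately, because real SAEs do not reconstruct exactly. It illustrates the idea and is not interpkit's implementation. All names in it are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # row-major


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sae_encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """f_j = ReLU(sum_i (x_i - b_dec_i) * W_enc[i][j] + b_enc_j); W_enc is d_model x n_features."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [
        max(0.0, sum(centered[i] * w_enc[i][j] for i in range(len(x))) + b_enc[j])
        for j in range(len(b_enc))
    ]


def sae_decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """x_hat = sum_j f_j * W_dec[j] + b_dec; W_dec is n_features x d_model."""
    out = list(b_dec)
    for fj, row in zip(f, w_dec):
        for i, w in enumerate(row):
            out[i] += fj * w
    return out


def feature_logit_attribution(x, w_enc, b_enc, w_dec, b_dec, unembed_col) -> Dict[str, object]:
    f = sae_encode(x, w_enc, b_enc, b_dec)
    x_hat = sae_decode(f, w_dec, b_dec)
    # Only active (non-zero) features can contribute to the logit.
    per_feature = {j: fj * dot(w_dec[j], unembed_col) for j, fj in enumerate(f) if fj > 0}
    bias = dot(b_dec, unembed_col)
    error = dot([a - b for a, b in zip(x, x_hat)], unembed_col)
    return {"features": per_feature, "bias": bias, "error": error}


# Toy SAE: d_model = 2, n_features = 4 (illustrative values only).
w_enc = [[1.0, 0.0, 1.0, -1.0], [0.0, 1.0, -1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [1.0, 1.0]]
b_dec = [0.0, 0.0]
x = [2.0, 1.0]            # component output at the hooked module
w_u_target = [1.0, 2.0]   # hypothetical unembedding column

result = feature_logit_attribution(x, w_enc, b_enc, w_dec, b_dec, w_u_target)

if __name__ == "__main__":
    print(result)
```

Feature contributions, the bias term, and the error term together add back up to the component's plain DLA value, `dot(x, w_u_target)`.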

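The steering changes (`steer_vector(" love", " hate", ...)` plus the leading-space note) follow the activation-addition recipe. You read the activation at a module for a positive and a negative prompt, subtract them to get a direction, and then add `scale` times that direction to the same module's output while running a new prompt. In PyTorch that last step is typically done with `module.register_forward_hook`, whose hook may return a replacement output. The sketch below shows only the vector arithmetic on toy activations. It also includes a hypothetical `warn_if_no_leading_space` check in the spirit of the warning the README mentions. The helper names and warning text are assumptions, not interpkit's actual behavior.

```python
import warnings
from typing import List

Vector = List[float]


def warn_if_no_leading_space(text: str) -> bool:
    """Hypothetical check: BPE tokenizers (e.g. GPT-2's) encode " love" and
    "love" as different tokens, so warn when the leading space is missing."""
    if text and not text[0].isspace():
        suggestion = " " + text
        warnings.warn(f"{text!r} has no leading space; did you mean {suggestion!r}?", stacklevel=2)
        return True
    return False


def steering_vector(pos_act: Vector, neg_act: Vector) -> Vector:
    """Direction = activation(positive) - activation(negative) at the same module."""
    return [p - n for p, n in zip(pos_act, neg_act)]


def apply_steering(hidden: Vector, vector: Vector, scale: float) -> Vector:
    """Add scale * vector to a module output, as a forward hook would."""
    return [h + scale * v for h, v in zip(hidden, vector)]


if __name__ == "__main__":
    warn_if_no_leading_space("love")               # emits a UserWarning
    v = steering_vector([1.0, 0.5, -0.25], [0.0, 1.0, 0.25])
    print(apply_steering([0.25, 0.25, 0.25], v, scale=2.0))
```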
interpkit-0.4.0/interpkit/__main__.py ADDED

@@ -0,0 +1,19 @@
+"""Entry point so ``python -m interpkit`` invokes the Typer CLI.
+Mirrors the ``[project.scripts] interpkit = "interpkit.cli.main:app"``
+console script declared in :file:`pyproject.toml`, so users without the
+console script on their ``$PATH`` (e.g. just-installed in a fresh
+environment, vendored copies, ad-hoc subprocess invocations) can still
+reach every CLI command via ``python -m interpkit ...``.
+"""
+from interpkit.cli.main import app
+def main() -> None:
+    """Invoke the Typer app — separate function makes patching easier in tests."""
+    app()
+if __name__ == "__main__":
+    main()
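The added `__main__.py` works because `python -m <package>` imports the package and then runs `<package>/__main__.py` with `__name__` set to `"__main__"`, so the `if __name__ == "__main__":` guard fires. The self-contained sketch below demonstrates the mechanism without interpkit or Typer. It builds a throwaway package (`demo_pkg`, a hypothetical name) in a temporary directory, gives it a `__main__.py` shaped like the one above, and runs it through `sys.executable -m` in a subprocess. The `print` call stands in for `app()`.

```python
import os
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path
from typing import List

MAIN_SRC = textwrap.dedent(
    '''
    """Entry point so ``python -m demo_pkg`` runs the CLI."""
    import sys

    def main() -> None:
        # Stand-in for a Typer ``app()`` call: echo the forwarded CLI arguments.
        print("demo_pkg called with", sys.argv[1:])

    if __name__ == "__main__":
        main()
    '''
)


def run_as_module(args: List[str]) -> str:
    """Create a temporary package and invoke it via ``python -m demo_pkg``."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "__main__.py").write_text(MAIN_SRC)
        # Make the temp dir importable regardless of the working-directory rules.
        env = {**os.environ, "PYTHONPATH": tmp}
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg", *args],
            cwd=tmp,
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_as_module(["scan", "gpt2"]))
```

Arguments after the module name are passed through in `sys.argv[1:]`, which is why every CLI command and option remains reachable through the `python -m interpkit ...` form.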
