PyPI - interpkit - Versions diffs - 0.3.0__tar.gz → 0.4.0__tar.gz - Mend

interpkit 0.3.0.tar.gz → 0.4.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (83)

{interpkit-0.3.0 → interpkit-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: interpkit
-Version: 0.3.0
+Version: 0.4.0
 Summary: Mech interp for any HuggingFace model.
 Author: Davide Zani
 License-Expression: MIT
@@ -111,6 +111,25 @@ model = interpkit.load("google/vit-base-patch16-224")
 model = interpkit.load("bert-base-uncased")
 ```
+### Chat models
+Instruction-tuned models work too — interpkit applies the tokenizer's chat template automatically.
+```python
+chat = interpkit.load("HuggingFaceTB/SmolLM2-360M-Instruct")
+result = chat.chat("Write a haiku about cats.", max_new_tokens=64)
+print(result["response"])
+# Run any other op on the templated prompt
+chat.dla(result["prompt"])
+# Or pass a message list directly to any op
+chat.dla([{"role": "user", "content": "Capital of France?"}])
+```
+See [examples/10_chat_models.ipynb](examples/10_chat_models.ipynb) for a full walkthrough including chat-style steering.
 ---
 ## Operations
@@ -118,6 +137,7 @@ model = interpkit.load("bert-base-uncased")
 | Operation | What it does | Works on |
 |-----------|-------------|----------|
 | **`scan`** | One-command model overview: runs DLA, lens, attention, attribution and surfaces key findings | LMs |
+| **`chat`** | Send a message through the tokenizer's chat template and generate a reply | Chat / instruct LMs |
 | **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution; optionally decompose through an SAE into per-feature attributions | LMs |
 | `inspect` | Module tree with types, param counts, shapes | Any model |
 | `patch` | Activation patching at a module, head, or position | Any model |
@@ -328,10 +348,12 @@ results = model.dla_batch(["The capital of France is", "The CEO of Apple is"])
 ## Steering
 ```python
-vector = model.steer_vector("Love", "Hate", at="transformer.h.8")
+vector = model.steer_vector(" love", " hate", at="transformer.h.8")
 model.steer("The weather today is", vector=vector, at="transformer.h.8", scale=2.0)
 ```
+> Note the leading spaces. BPE tokenizers (GPT-2, Llama, ...) treat `" love"` and `"love"` as different tokens, and the leading-space variant is the one the model actually sees in normal text. interpkit prints a warning if you forget.
 ## Linear Probe
 ```python
@@ -422,7 +444,7 @@ interpkit lens gpt2 "The capital of France is"
 interpkit lens gpt2 "The capital of France is" --position -1
 interpkit attention gpt2 "The capital of France is" --layer 8 --save attention.png
 interpkit attribute gpt2 "The capital of France is"
-interpkit steer gpt2 "The weather is" --positive Love --negative Hate --at transformer.h.8
+interpkit steer gpt2 "The weather is" --positive " love" --negative " hate" --at transformer.h.8
 interpkit ablate gpt2 "The capital of France is" --at transformer.h.8.mlp
 interpkit decompose gpt2 "The capital of France is"
 interpkit diff gpt2 my-finetuned-gpt2 "The capital of France is" --save diff.png
@@ -430,6 +452,10 @@ interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae jb
 interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae ./my_sae.safetensors
 interpkit dla gpt2 "The capital of France is" --sae jbloom/GPT2-Small-SAEs-Reformatted --sae-at transformer.h.11.attn
+# Chat / instruct models — applies the tokenizer's chat template automatically
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Write a haiku about cats." --max-new-tokens 64
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "What is 2+2?" --system "You are terse." --show-prompt
 # Interactive HTML output
 interpkit attention gpt2 "hello world" --html attention.html
 interpkit trace gpt2 --clean "...Paris..." --corrupted "...Rome..." --html trace.html
@@ -439,7 +465,17 @@ interpkit attribute gpt2 "The capital of France is" --html attribution.html
 interpkit attribute microsoft/resnet-50 cat.jpg --target 281
 ```
-Run `interpkit` with no arguments for a full command reference.
+Run `interpkit` with no arguments for a full command reference, or
+`interpkit --extensive` for a beginner-friendly walkthrough of every command.
+If the `interpkit` console script isn't on your `PATH` (e.g. fresh
+environments, sandboxed installs, or running from a checkout without
+re-installing), every command also works as `python -m interpkit ...`:
+```bash
+python -m interpkit scan gpt2 "The capital of France is"
+python -m interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Hello!"
+```
 ---
@@ -501,6 +537,7 @@ See the [`examples/`](examples/) directory for Jupyter notebooks:
 | `07_vision_models` | ResNet/ViT attribution, ablation, activations |
 | `08_dla_and_circuits` | DLA, head activations, residual decomposition, OV/QK analysis, composition, circuit discovery |
 | `09_scan_and_batch` | Auto-scan, batch operations, dataset workflows |
+| `10_chat_models` | Chat-template handling, `model.chat()`, message-list inputs, chat-style steering |
 ---
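
The README text in this file says interpkit applies the tokenizer's chat template automatically. As a rough illustration of what such a template does, the sketch below renders a message list in a ChatML-style layout. The helper name `render_chatml` and the exact marker strings are assumptions made for illustration; real templates ship with each tokenizer, differ between model families, and are applied through Hugging Face's `apply_chat_template`, not through this code.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt.

    Hypothetical helper for illustration only; real chat templates are
    stored with the tokenizer and differ between models.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Capital of France?"},
]
prompt = render_chatml(messages)
print(prompt)
```

The trailing open assistant turn is what `add_generation_prompt=True` contributes: without it, the rendered text ends after the user message and the model has no cue to start a reply.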

{interpkit-0.3.0 → interpkit-0.4.0}/README.md RENAMED

@@ -63,6 +63,25 @@ model = interpkit.load("google/vit-base-patch16-224")
 model = interpkit.load("bert-base-uncased")
 ```
+### Chat models
+Instruction-tuned models work too — interpkit applies the tokenizer's chat template automatically.
+```python
+chat = interpkit.load("HuggingFaceTB/SmolLM2-360M-Instruct")
+result = chat.chat("Write a haiku about cats.", max_new_tokens=64)
+print(result["response"])
+# Run any other op on the templated prompt
+chat.dla(result["prompt"])
+# Or pass a message list directly to any op
+chat.dla([{"role": "user", "content": "Capital of France?"}])
+```
+See [examples/10_chat_models.ipynb](examples/10_chat_models.ipynb) for a full walkthrough including chat-style steering.
 ---
 ## Operations
@@ -70,6 +89,7 @@ model = interpkit.load("bert-base-uncased")
 | Operation | What it does | Works on |
 |-----------|-------------|----------|
 | **`scan`** | One-command model overview: runs DLA, lens, attention, attribution and surfaces key findings | LMs |
+| **`chat`** | Send a message through the tokenizer's chat template and generate a reply | Chat / instruct LMs |
 | **`dla`** | Direct Logit Attribution — decompose output logits by head and MLP contribution; optionally decompose through an SAE into per-feature attributions | LMs |
 | `inspect` | Module tree with types, param counts, shapes | Any model |
 | `patch` | Activation patching at a module, head, or position | Any model |
@@ -280,10 +300,12 @@ results = model.dla_batch(["The capital of France is", "The CEO of Apple is"])
 ## Steering
 ```python
-vector = model.steer_vector("Love", "Hate", at="transformer.h.8")
+vector = model.steer_vector(" love", " hate", at="transformer.h.8")
 model.steer("The weather today is", vector=vector, at="transformer.h.8", scale=2.0)
 ```
+> Note the leading spaces. BPE tokenizers (GPT-2, Llama, ...) treat `" love"` and `"love"` as different tokens, and the leading-space variant is the one the model actually sees in normal text. interpkit prints a warning if you forget.
 ## Linear Probe
 ```python
@@ -374,7 +396,7 @@ interpkit lens gpt2 "The capital of France is"
 interpkit lens gpt2 "The capital of France is" --position -1
 interpkit attention gpt2 "The capital of France is" --layer 8 --save attention.png
 interpkit attribute gpt2 "The capital of France is"
-interpkit steer gpt2 "The weather is" --positive Love --negative Hate --at transformer.h.8
+interpkit steer gpt2 "The weather is" --positive " love" --negative " hate" --at transformer.h.8
 interpkit ablate gpt2 "The capital of France is" --at transformer.h.8.mlp
 interpkit decompose gpt2 "The capital of France is"
 interpkit diff gpt2 my-finetuned-gpt2 "The capital of France is" --save diff.png
@@ -382,6 +404,10 @@ interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae jb
 interpkit features gpt2 "The capital of France is" --at transformer.h.8 --sae ./my_sae.safetensors
 interpkit dla gpt2 "The capital of France is" --sae jbloom/GPT2-Small-SAEs-Reformatted --sae-at transformer.h.11.attn
+# Chat / instruct models — applies the tokenizer's chat template automatically
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Write a haiku about cats." --max-new-tokens 64
+interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "What is 2+2?" --system "You are terse." --show-prompt
 # Interactive HTML output
 interpkit attention gpt2 "hello world" --html attention.html
 interpkit trace gpt2 --clean "...Paris..." --corrupted "...Rome..." --html trace.html
@@ -391,7 +417,17 @@ interpkit attribute gpt2 "The capital of France is" --html attribution.html
 interpkit attribute microsoft/resnet-50 cat.jpg --target 281
 ```
-Run `interpkit` with no arguments for a full command reference.
+Run `interpkit` with no arguments for a full command reference, or
+`interpkit --extensive` for a beginner-friendly walkthrough of every command.
+If the `interpkit` console script isn't on your `PATH` (e.g. fresh
+environments, sandboxed installs, or running from a checkout without
+re-installing), every command also works as `python -m interpkit ...`:
+```bash
+python -m interpkit scan gpt2 "The capital of France is"
+python -m interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct "Hello!"
+```
 ---
@@ -453,6 +489,7 @@ See the [`examples/`](examples/) directory for Jupyter notebooks:
 | `07_vision_models` | ResNet/ViT attribution, ablation, activations |
 | `08_dla_and_circuits` | DLA, head activations, residual decomposition, OV/QK analysis, composition, circuit discovery |
 | `09_scan_and_batch` | Auto-scan, batch operations, dataset workflows |
+| `10_chat_models` | Chat-template handling, `model.chat()`, message-list inputs, chat-style steering |
 ---
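
The steering snippet in this file builds a vector from a contrasting pair and applies it with `scale=2.0`. The CLI help text in this release describes that vector as the mean difference between activations for the two concepts, with a scaled copy added back to the activations. The sketch below shows only that arithmetic in plain Python, on made-up 3-dimensional activations. The helper names (`mean_vector`, `steering_vector`, `apply_steering`) are hypothetical and this is not interpkit's implementation.

```python
def mean_vector(rows):
    """Element-wise mean of equally sized activation vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def steering_vector(positive, negative):
    """Mean positive activation minus mean negative activation."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(activation, vector, scale):
    """Add a scaled copy of the steering vector to one activation vector."""
    return [a + scale * v for a, v in zip(activation, vector)]


# Made-up activations, one row per token position (illustrative values only).
positive = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

vector = steering_vector(positive, negative)
steered = apply_steering([0.5, 0.5, 0.5], vector, scale=2.0)
print(vector)   # [2.0, -2.0, 0.0]
print(steered)  # [4.5, -3.5, 0.5]
```

Dimensions where the two concepts agree (the third one here) cancel out of the vector, so steering leaves them unchanged.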

interpkit-0.4.0/interpkit/__main__.py ADDED

@@ -0,0 +1,19 @@
+"""Entry point so ``python -m interpkit`` invokes the Typer CLI.
+Mirrors the ``[project.scripts] interpkit = "interpkit.cli.main:app"``
+console script declared in :file:`pyproject.toml`, so users without the
+console script on their ``$PATH`` (e.g. just-installed in a fresh
+environment, vendored copies, ad-hoc subprocess invocations) can still
+reach every CLI command via ``python -m interpkit ...``.
+"""
+from interpkit.cli.main import app
+def main() -> None:
+    """Invoke the Typer app — separate function makes patching easier in tests."""
+    app()
+if __name__ == "__main__":
+    main()
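
The new `__main__.py` relies on a standard Python mechanism: `python -m <package>` executes that package's `__main__.py` with `__name__` set to `"__main__"`. The sketch below reproduces the same shape in a throwaway package so it can be tried without installing anything. The package name `toypkg` and its output text are hypothetical and unrelated to interpkit.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Build a throwaway package whose __main__.py has the same shape as the
# file above: a main() function plus the __name__ guard.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "toypkg"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "__main__.py").write_text(
        "import sys\n"
        "\n"
        "def main() -> None:\n"
        "    print('toypkg called with', sys.argv[1:])\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    )
    # Make the temporary directory importable, then run the package with -m.
    env = {**os.environ, "PYTHONPATH": tmp}
    completed = subprocess.run(
        [sys.executable, "-m", "toypkg", "scan", "gpt2"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )

output = completed.stdout.strip()
print(output)
```

Keeping the call inside a separate `main()` function, as the docstring in the diff notes, lets a test replace that function instead of having to execute the module as a script.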

{interpkit-0.3.0 → interpkit-0.4.0}/interpkit/cli/main.py RENAMED

@@ -6,6 +6,7 @@ import json as _json
 from importlib.metadata import version as _pkg_version
 import typer
+import typer.rich_utils as _ru
 from rich.console import Console
 from rich.panel import Panel
 from rich.table import Table
@@ -14,6 +15,18 @@ from rich_gradient import Text as GradientText
 from interpkit.core.theme import ACCENT, ACCENT_DIM, BRAND_COLORS
+_ru.STYLE_OPTION = f"bold {ACCENT}"
+_ru.STYLE_SWITCH = f"bold {ACCENT}"
+_ru.STYLE_METAVAR = f"bold {ACCENT}"
+_ru.STYLE_USAGE = ACCENT
+_ru.STYLE_USAGE_COMMAND = "bold"
+_ru.STYLE_COMMANDS_TABLE_FIRST_COLUMN = f"bold {ACCENT}"
+_ru.STYLE_OPTIONS_PANEL_BORDER = ACCENT_DIM
+_ru.STYLE_COMMANDS_PANEL_BORDER = ACCENT_DIM
+_ru.STYLE_REQUIRED_SHORT = ACCENT
+_ru.STYLE_REQUIRED_LONG = ACCENT_DIM
+_ru.STYLE_NEGATIVE_OPTION = f"bold {ACCENT}"
 app = typer.Typer(
     name="interpkit",
     help="Mech interp for any HuggingFace model.",
@@ -110,6 +123,27 @@ def _show_extensive_help() -> None:
         padding=(0, 2),
     ))
+    console.print()
+    console.print(Panel(
+        f"[bold {ACCENT}]chat[/bold {ACCENT}]  "
+        "[dim]interpkit chat HuggingFaceTB/SmolLM2-360M-Instruct 'Write a haiku.'[/dim]\n\n"
+        "Send a message to an instruction-tuned chat model and print its reply. The message is"
+        " routed through the tokenizer's chat template (e.g. ChatML, Llama-2 Inst, Qwen, Gemma)"
+        " with [dim]add_generation_prompt=True[/dim] before generation, so any HF chat model that"
+        " ships a template just works.\n\n"
+        "  Errors clearly when the model has no chat template (i.e. a base/non-instruct model) —"
+        " in that case load an instruct variant or call any other command with a plain string.\n\n"
+        "  [bold]Key options:[/bold]\n"
+        "    [bold green]--system 'be brief'[/bold green]  Optional system prompt prepended to the conversation.\n"
+        "    [bold green]--max-new-tokens N[/bold green]  Generation budget (default 128).\n"
+        "    [bold green]--sample / --no-sample[/bold green]  Sampling vs greedy decoding (default greedy).\n"
+        "    [bold green]--temperature / --top-p[/bold green]  Standard sampling controls (used when --sample).\n"
+        "    [bold green]--show-prompt[/bold green]  Print the chat-templated prompt before generating.",
+        title="chat",
+        border_style=ACCENT_DIM,
+        padding=(0, 2),
+    ))
     # ── Core Operations ───────────────────────────────────────────
     console.print()
     console.print(Rule("[bold]Core Operations[/bold]", style=ACCENT))
@@ -261,7 +295,7 @@ def _show_extensive_help() -> None:
         ),
         (
             "steer",
-            "interpkit steer gpt2 'The sky is' --positive Love --negative Hate --at transformer.h.8",
+            "interpkit steer gpt2 'The sky is' --positive ' love' --negative ' hate' --at transformer.h.8",
             "Activation steering. Computes a 'steering vector' as the mean-difference between"
             " activations for contrasting concepts ([bold green]--positive[/bold green] vs"
             " [bold green]--negative[/bold green]), then adds a scaled copy of it to the activations"
@@ -422,6 +456,7 @@ def main(
     quick_start = _cmd_table([
         ("scan", "One-command overview \u2014 DLA, lens, attention, attribution"),
         ("report", "Generate an interactive HTML report"),
+        ("chat", "Send a message to a chat / instruct model"),
     ])
     core_ops = _cmd_table([
@@ -469,6 +504,10 @@ def main(
     console.print()
     console.print("  [dim]\u25b8[/dim] Most commands accept [bold green]--save[/bold green] and [bold green]--html[/bold green] for exports.")
     console.print(f"  [dim]\u25b8[/dim] Run [bold {ACCENT}]interpkit <command> --help[/bold {ACCENT}] for detailed usage.")
+    console.print(
+        f"  [dim]\u25b8[/dim] No console script on PATH? [bold {ACCENT}]python -m interpkit[/bold {ACCENT}]"
+        " works the same everywhere."
+    )
     console.print(f"  [dim]\u25b8[/dim] New here? Try [bold {ACCENT}]interpkit --extensive[/bold {ACCENT}] for a plain-English walkthrough.")
     console.print()
@@ -781,7 +820,8 @@ def features(
     model_name: str = typer.Argument(..., help="HuggingFace model ID (e.g. gpt2)"),
     input_data: str | None = typer.Argument(None, help="Input text (omit when using --positive-file / --negative-file)"),
     at: str = typer.Option(..., "--at", help="Module name to decompose (e.g. transformer.h.8)"),
-    sae: str = typer.Option(..., "--sae", help="SAE source: HuggingFace repo ID or local file path (.safetensors / .pt)"),
+    sae: str = typer.Option(..., "--sae", help="SAE source: HuggingFace repo ID, local file path (.safetensors / .pt), or 'org/repo/subfolder' shorthand"),
+    sae_subfolder: str | None = typer.Option(None, "--sae-subfolder", help="Subfolder inside the SAE repo (e.g. 'blocks.8.hook_resid_pre'). Equivalent to appending it to --sae."),
     top_k: int = typer.Option(20, "--top-k", help="Number of top features to display"),
     positive_file: str | None = typer.Option(None, "--positive-file", help="Text file with positive examples for contrastive analysis, one per line"),
     negative_file: str | None = typer.Option(None, "--negative-file", help="Text file with negative examples for contrastive analysis, one per line"),
@@ -800,13 +840,19 @@ def features(
         pos_inputs = read_examples_file(positive_file)
         neg_inputs = read_examples_file(negative_file)
         m = _load_model(model_name, device=device, dtype=dtype, device_map=device_map)
-        result = m.contrastive_features(pos_inputs, neg_inputs, at=at, sae=sae, top_k=top_k)
+        result = m.contrastive_features(
+            pos_inputs, neg_inputs, at=at, sae=sae, top_k=top_k,
+            sae_subfolder=sae_subfolder,
+        )
     else:
         if input_data is None:
             raise typer.BadParameter("Provide input text or use --positive-file / --negative-file for contrastive mode")
         m = _load_model(model_name, device=device, dtype=dtype, device_map=device_map)
         with console.status("  Decomposing features..."):
-            result = m.features(input_data, at=at, sae=sae, top_k=top_k)
+            result = m.features(
+                input_data, at=at, sae=sae, top_k=top_k,
+                sae_subfolder=sae_subfolder,
+            )
     if _output_format == "json":
         _json_dump(result)
@@ -847,8 +893,9 @@ def dla(
     top_k: int = typer.Option(10, "--top-k", help="Number of top/bottom contributors to show"),
     save: str | None = typer.Option(None, "--save", help="Save bar chart to file (e.g. dla.png)"),
     html_path: str | None = typer.Option(None, "--html", help="Save interactive HTML to file"),
-    sae: str | None = typer.Option(None, "--sae", help="SAE source: HuggingFace repo ID or local file path (.safetensors / .pt)"),
+    sae: str | None = typer.Option(None, "--sae", help="SAE source: HuggingFace repo ID, local file path (.safetensors / .pt), or 'org/repo/subfolder' shorthand"),
     sae_at: str | None = typer.Option(None, "--sae-at", help="Module to decompose through the SAE (e.g. transformer.h.11.attn)"),
+    sae_subfolder: str | None = typer.Option(None, "--sae-subfolder", help="Subfolder inside the SAE repo (e.g. 'blocks.8.hook_resid_pre'). Equivalent to appending it to --sae."),
     device: str | None = typer.Option(None, help="Device"),
     dtype: str | None = typer.Option(None, "--dtype", help="Model dtype: float16, bfloat16, float32, auto"),
     device_map: str | None = typer.Option(None, "--device-map", help="HF device_map (e.g. 'auto')"),
@@ -865,7 +912,7 @@ def dla(
         result = m.dla(
             input_data, token=parsed_token, position=position,
             top_k=top_k, save=save, html=html_path,
-            sae=sae, sae_at=sae_at,
+            sae=sae, sae_at=sae_at, sae_subfolder=sae_subfolder,
         )
     if _output_format == "json":
         _json_dump(result)
@@ -959,5 +1006,62 @@ def report(
         _json_dump(result)
+# ══════════════════════════════════════════════════════════════════
+# chat
+# ══════════════════════════════════════════════════════════════════
+@app.command()
+def chat(
+    model_name: str = typer.Argument(..., help="HuggingFace chat/instruct model ID (e.g. HuggingFaceTB/SmolLM2-360M-Instruct)"),
+    message: str = typer.Argument(..., help="User message to send"),
+    system: str | None = typer.Option(None, "--system", help="Optional system prompt"),
+    max_new_tokens: int = typer.Option(128, "--max-new-tokens", help="Max generation length"),
+    sample: bool = typer.Option(False, "--sample/--no-sample", help="Sample (True) or use greedy decoding (False, default)"),
+    temperature: float = typer.Option(1.0, "--temperature", help="Sampling temperature (used when --sample)"),
+    top_p: float = typer.Option(1.0, "--top-p", help="Nucleus sampling cutoff (used when --sample)"),
+    show_prompt: bool = typer.Option(False, "--show-prompt", help="Print the chat-templated prompt before generating"),
+    device: str | None = typer.Option(None, help="Device"),
+    dtype: str | None = typer.Option(None, "--dtype", help="Model dtype: float16, bfloat16, float32, auto"),
+    device_map: str | None = typer.Option(None, "--device-map", help="HF device_map (e.g. 'auto')"),
+) -> None:
+    """Send a chat message and print the model's response.
+    Routes the message through the tokenizer's chat template
+    (``apply_chat_template`` with ``add_generation_prompt=True``) and
+    calls ``model.generate``.  Errors clearly when the loaded model has
+    no chat template (i.e. is a base/non-instruct model).
+    """
+    m = _load_model(model_name, device=device, dtype=dtype, device_map=device_map)
+    with console.status("  Generating response..."):
+        result = m.chat(
+            message,
+            system=system,
+            max_new_tokens=max_new_tokens,
+            do_sample=sample,
+            temperature=temperature,
+            top_p=top_p,
+        )
+    if show_prompt:
+        console.print(Panel(
+            result["prompt"],
+            title="[bold]Prompt[/bold]",
+            border_style=ACCENT_DIM,
+            padding=(0, 1),
+        ))
+    console.print()
+    console.print(Panel(
+        result["response"],
+        title=f"[bold]{model_name}[/bold]",
+        border_style=ACCENT,
+        padding=(0, 2),
+    ))
+    if _output_format == "json":
+        _json_dump({k: v for k, v in result.items() if k not in {"input_ids", "output_ids"}})
 if __name__ == "__main__":
     app()
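
The `chat` command above drops `input_ids` and `output_ids` from the result before emitting JSON. The diff does not say why. One plausible reason, stated here as an assumption, is that those fields hold tensor-like objects that the standard `json` module cannot serialize. The sketch below shows the effect of the same filtering expression on a made-up result dict, with a hypothetical `FakeTensor` class standing in for the real values.

```python
import json


class FakeTensor:
    """Stand-in for a tensor; json cannot serialize arbitrary objects."""


# Made-up result using the keys that the chat command reads and filters.
result = {
    "prompt": "<templated prompt>",
    "response": "Paris.",
    "input_ids": FakeTensor(),
    "output_ids": FakeTensor(),
}

try:
    json.dumps(result)
    raw_serializable = True
except TypeError:
    # Raised because FakeTensor is not a JSON-serializable type.
    raw_serializable = False

# Same filtering expression as in the command body.
printable = {k: v for k, v in result.items() if k not in {"input_ids", "output_ids"}}
encoded = json.dumps(printable, indent=2)
print(encoded)
```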
