PyPI - sae-lens - Versions diffs - 6.3.0__tar.gz → 6.25.1__tar.gz - Mend

sae-lens 6.3.0tar.gz → 6.25.1tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of sae-lens might be problematic.

Files changed (45)

{sae_lens-6.3.0 → sae_lens-6.25.1}/PKG-INFO RENAMED

@@ -1,8 +1,9 @@
-Metadata-Version: 2.3
+Metadata-Version: 2.4
 Name: sae-lens
-Version: 6.3.0
+Version: 6.25.1
 Summary: Training and Analyzing Sparse Autoencoders (SAEs)
 License: MIT
+License-File: LICENSE
 Keywords: deep-learning,sparse-autoencoders,mechanistic-interpretability,PyTorch
 Author: Joseph Bloom
 Requires-Python: >=3.10,<4.0
@@ -12,41 +13,36 @@ Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
 Provides-Extra: mamba
-Requires-Dist: automated-interpretability (>=0.0.5,<1.0.0)
 Requires-Dist: babe (>=0.0.7,<0.0.8)
-Requires-Dist: datasets (>=2.17.1,<3.0.0)
+Requires-Dist: datasets (>=3.1.0)
 Requires-Dist: mamba-lens (>=0.0.4,<0.0.5) ; extra == "mamba"
-Requires-Dist: matplotlib (>=3.8.3,<4.0.0)
-Requires-Dist: matplotlib-inline (>=0.1.6,<0.2.0)
 Requires-Dist: nltk (>=3.8.1,<4.0.0)
-Requires-Dist: plotly (>=5.19.0,<6.0.0)
-Requires-Dist: plotly-express (>=0.4.1,<0.5.0)
-Requires-Dist: pytest-profiling (>=1.7.0,<2.0.0)
-Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
+Requires-Dist: plotly (>=5.19.0)
+Requires-Dist: plotly-express (>=0.4.1)
+Requires-Dist: python-dotenv (>=1.0.1)
 Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
-Requires-Dist: pyzmq (==26.0.0)
-Requires-Dist: safetensors (>=0.4.2,<0.5.0)
+Requires-Dist: safetensors (>=0.4.2,<1.0.0)
 Requires-Dist: simple-parsing (>=0.1.6,<0.2.0)
-Requires-Dist: transformer-lens (>=2.0.0,<3.0.0)
+Requires-Dist: tenacity (>=9.0.0)
+Requires-Dist: transformer-lens (>=2.16.1,<3.0.0)
 Requires-Dist: transformers (>=4.38.1,<5.0.0)
-Requires-Dist: typer (>=0.12.3,<0.13.0)
 Requires-Dist: typing-extensions (>=4.10.0,<5.0.0)
-Requires-Dist: zstandard (>=0.22.0,<0.23.0)
-Project-URL: Homepage, https://jbloomaus.github.io/SAELens
-Project-URL: Repository, https://github.com/jbloomAus/SAELens
+Project-URL: Homepage, https://decoderesearch.github.io/SAELens
+Project-URL: Repository, https://github.com/decoderesearch/SAELens
 Description-Content-Type: text/markdown
-<img width="1308" alt="Screenshot 2024-03-21 at 3 08 28 pm" src="https://github.com/jbloomAus/mats_sae_training/assets/69127271/209012ec-a779-4036-b4be-7b7739ea87f6">
+<img width="1308" height="532" alt="saes_pic" src="https://github.com/user-attachments/assets/2a5d752f-b261-4ee4-ad5d-ebf282321371" />
 # SAE Lens
 [![PyPI](https://img.shields.io/pypi/v/sae-lens?color=blue)](https://pypi.org/project/sae-lens/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![build](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml)
-[![Deploy Docs](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml)
-[![codecov](https://codecov.io/gh/jbloomAus/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/jbloomAus/SAELens)
+[![build](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml)
+[![Deploy Docs](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml)
+[![codecov](https://codecov.io/gh/decoderesearch/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/decoderesearch/SAELens)
 SAELens exists to help researchers:
@@ -54,7 +50,7 @@ SAELens exists to help researchers:
 - Analyse sparse autoencoders / research mechanistic interpretability.
 - Generate insights which make it easier to create safe and aligned AI systems.
-Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for information on how to:
+Please refer to the [documentation](https://decoderesearch.github.io/SAELens/) for information on how to:
 - Download and Analyse pre-trained sparse autoencoders.
 - Train your own sparse autoencoders.
@@ -62,25 +58,25 @@ Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for in
 SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to [safeguard humanity from risks posed by artificial intelligence](https://80000hours.org/problem-profiles/artificial-intelligence/).
-This library is maintained by [Joseph Bloom](https://www.jbloomaus.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).
+This library is maintained by [Joseph Bloom](https://www.decoderesearch.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).
 ## Loading Pre-trained SAEs.
-Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://jbloomaus.github.io/SAELens/sae_table/) in the readme for a list of all SAEs.
+Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://decoderesearch.github.io/SAELens/pretrained_saes/) for a list of all SAEs.
 ## Migrating to SAELens v6
-The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://jbloomaus.github.io/SAELens/latest/migrating/) for more details.
+The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://decoderesearch.github.io/SAELens/latest/migrating/) for more details.
 ## Tutorials
-- [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
+- [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
 - [Loading and Analysing Pre-Trained Sparse Autoencoders](tutorials/basic_loading_and_analysing.ipynb)
-  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
+  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
 - [Understanding SAE Features with the Logit Lens](tutorials/logits_lens_with_features.ipynb)
-  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
+  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
 - [Training a Sparse Autoencoder](tutorials/training_a_sparse_autoencoder.ipynb)
-  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
+  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
 ## Join the Slack!
@@ -95,7 +91,7 @@ Please cite the package as follows:
    title = {SAELens},
    author = {Bloom, Joseph and Tigges, Curt and Duong, Anthony and Chanin, David},
    year = {2024},
-   howpublished = {\url{https://github.com/jbloomAus/SAELens}},
+   howpublished = {\url{https://github.com/decoderesearch/SAELens}},
 }
 ```
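Note on the PKG-INFO hunk above: the `Requires-Dist` changes are easier to review as three groups (removed, added, changed). The sketch below copies only the entries that differ between the two versions in that hunk; the helper name `diff_requirements` is hypothetical and is not part of sae-lens.

```python
# Requires-Dist entries that differ between the two PKG-INFO versions,
# copied from the hunk above. Unchanged entries are left out for brevity.
OLD = {
    "automated-interpretability": ">=0.0.5,<1.0.0",
    "datasets": ">=2.17.1,<3.0.0",
    "matplotlib": ">=3.8.3,<4.0.0",
    "matplotlib-inline": ">=0.1.6,<0.2.0",
    "plotly": ">=5.19.0,<6.0.0",
    "plotly-express": ">=0.4.1,<0.5.0",
    "pytest-profiling": ">=1.7.0,<2.0.0",
    "python-dotenv": ">=1.0.1,<2.0.0",
    "pyzmq": "==26.0.0",
    "safetensors": ">=0.4.2,<0.5.0",
    "transformer-lens": ">=2.0.0,<3.0.0",
    "typer": ">=0.12.3,<0.13.0",
    "zstandard": ">=0.22.0,<0.23.0",
}
NEW = {
    "datasets": ">=3.1.0",
    "plotly": ">=5.19.0",
    "plotly-express": ">=0.4.1",
    "python-dotenv": ">=1.0.1",
    "safetensors": ">=0.4.2,<1.0.0",
    "tenacity": ">=9.0.0",
    "transformer-lens": ">=2.16.1,<3.0.0",
}


def diff_requirements(old: dict[str, str], new: dict[str, str]):
    """Hypothetical helper: split requirement changes into three groups."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return removed, added, changed


removed, added, changed = diff_requirements(OLD, NEW)
print("removed:", removed)
print("added:  ", added)
for name, (before, after) in changed.items():
    print(f"changed: {name}: {before} -> {after}")
```

Read this way, the release drops seven runtime dependencies, adds `tenacity`, and loosens or raises the bounds on six others.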

{sae_lens-6.3.0 → sae_lens-6.25.1}/README.md RENAMED

@@ -1,12 +1,12 @@
-<img width="1308" alt="Screenshot 2024-03-21 at 3 08 28 pm" src="https://github.com/jbloomAus/mats_sae_training/assets/69127271/209012ec-a779-4036-b4be-7b7739ea87f6">
+<img width="1308" height="532" alt="saes_pic" src="https://github.com/user-attachments/assets/2a5d752f-b261-4ee4-ad5d-ebf282321371" />
 # SAE Lens
 [![PyPI](https://img.shields.io/pypi/v/sae-lens?color=blue)](https://pypi.org/project/sae-lens/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![build](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml)
-[![Deploy Docs](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml)
-[![codecov](https://codecov.io/gh/jbloomAus/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/jbloomAus/SAELens)
+[![build](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml)
+[![Deploy Docs](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml)
+[![codecov](https://codecov.io/gh/decoderesearch/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/decoderesearch/SAELens)
 SAELens exists to help researchers:
@@ -14,7 +14,7 @@ SAELens exists to help researchers:
 - Analyse sparse autoencoders / research mechanistic interpretability.
 - Generate insights which make it easier to create safe and aligned AI systems.
-Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for information on how to:
+Please refer to the [documentation](https://decoderesearch.github.io/SAELens/) for information on how to:
 - Download and Analyse pre-trained sparse autoencoders.
 - Train your own sparse autoencoders.
@@ -22,25 +22,25 @@ Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for in
 SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to [safeguard humanity from risks posed by artificial intelligence](https://80000hours.org/problem-profiles/artificial-intelligence/).
-This library is maintained by [Joseph Bloom](https://www.jbloomaus.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).
+This library is maintained by [Joseph Bloom](https://www.decoderesearch.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).
 ## Loading Pre-trained SAEs.
-Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://jbloomaus.github.io/SAELens/sae_table/) in the readme for a list of all SAEs.
+Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://decoderesearch.github.io/SAELens/pretrained_saes/) for a list of all SAEs.
 ## Migrating to SAELens v6
-The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://jbloomaus.github.io/SAELens/latest/migrating/) for more details.
+The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://decoderesearch.github.io/SAELens/latest/migrating/) for more details.
 ## Tutorials
-- [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
+- [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
 - [Loading and Analysing Pre-Trained Sparse Autoencoders](tutorials/basic_loading_and_analysing.ipynb)
-  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
+  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
 - [Understanding SAE Features with the Logit Lens](tutorials/logits_lens_with_features.ipynb)
-  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
+  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
 - [Training a Sparse Autoencoder](tutorials/training_a_sparse_autoencoder.ipynb)
-  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
+  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
 ## Join the Slack!
@@ -55,6 +55,6 @@ Please cite the package as follows:
    title = {SAELens},
    author = {Bloom, Joseph and Tigges, Curt and Duong, Anthony and Chanin, David},
    year = {2024},
-   howpublished = {\url{https://github.com/jbloomAus/SAELens}},
+   howpublished = {\url{https://github.com/decoderesearch/SAELens}},
 }
 ```

{sae_lens-6.3.0 → sae_lens-6.25.1}/pyproject.toml RENAMED

@@ -1,13 +1,13 @@
 [tool.poetry]
 name = "sae-lens"
-version = "6.3.0"
+version = "6.25.1"
 description = "Training and Analyzing Sparse Autoencoders (SAEs)"
 authors = ["Joseph Bloom"]
 readme = "README.md"
 packages = [{ include = "sae_lens" }]
 include = ["pretrained_saes.yaml"]
-repository = "https://github.com/jbloomAus/SAELens"
-homepage = "https://jbloomaus.github.io/SAELens"
+repository = "https://github.com/decoderesearch/SAELens"
+homepage = "https://decoderesearch.github.io/SAELens"
 license = "MIT"
 keywords = [
     "deep-learning",
@@ -19,26 +19,20 @@ classifiers = ["Topic :: Scientific/Engineering :: Artificial Intelligence"]
 [tool.poetry.dependencies]
 python = "^3.10"
-transformer-lens = "^2.0.0"
+transformer-lens = "^2.16.1"
 transformers = "^4.38.1"
-plotly = "^5.19.0"
-plotly-express = "^0.4.1"
-matplotlib = "^3.8.3"
-matplotlib-inline = "^0.1.6"
-datasets = "^2.17.1"
+plotly = ">=5.19.0"
+plotly-express = ">=0.4.1"
+datasets = ">=3.1.0"
 babe = "^0.0.7"
 nltk = "^3.8.1"
-safetensors = "^0.4.2"
-typer = "^0.12.3"
+safetensors = ">=0.4.2,<1.0.0"
 mamba-lens = { version = "^0.0.4", optional = true }
-pyzmq = "26.0.0"
-automated-interpretability = ">=0.0.5,<1.0.0"
-python-dotenv = "^1.0.1"
+python-dotenv = ">=1.0.1"
 pyyaml = "^6.0.1"
-pytest-profiling = "^1.7.0"
-zstandard = "^0.22.0"
 typing-extensions = "^4.10.0"
 simple-parsing = "^0.1.6"
+tenacity = ">=9.0.0"
 [tool.poetry.group.dev.dependencies]
 pytest = "^8.0.2"
@@ -53,6 +47,7 @@ docstr-coverage = "^2.3.2"
 mkdocs = "^1.6.1"
 mkdocs-material = "^9.5.34"
 mkdocs-autorefs = "^1.4.2"
+mkdocs-redirects = "^1.2.1"
 mkdocs-section-index = "^0.3.9"
 mkdocstrings = "^0.25.2"
 mkdocstrings-python = "^1.10.9"
@@ -61,6 +56,7 @@ ruff = "^0.7.4"
 eai-sparsify = "^1.1.1"
 mike = "^2.0.0"
 trio = "^0.30.0"
+dictionary-learning = "^0.1.0"
 [tool.poetry.extras]
 mamba = ["mamba-lens"]
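Note on the pyproject.toml hunk above: several dependencies move from Poetry caret constraints (`^x.y.z`) to plain lower bounds (`>=x.y.z`). A caret constraint implies an upper bound, which is why the old PKG-INFO listed ranges such as `>=0.4.2,<0.5.0`. The sketch below expands a caret constraint under the assumption that the version has exactly three numeric components; `caret_to_range` is a hypothetical helper, not a Poetry API.

```python
def caret_to_range(spec: str) -> str:
    """Expand a three-component caret constraint into an explicit range.

    Assumption: the upper bound bumps the leftmost non-zero component,
    which matches the ranges shown in the PKG-INFO hunk of this diff.
    """
    if not spec.startswith("^"):
        raise ValueError(f"not a caret constraint: {spec!r}")
    version = spec[1:]
    parts = [int(p) for p in version.split(".")]
    if len(parts) != 3:
        raise ValueError("this sketch only handles MAJOR.MINOR.PATCH")
    for i, part in enumerate(parts):
        if part != 0:
            # Bump this component and zero out everything to its right.
            upper = parts[:i] + [part + 1] + [0] * (2 - i)
            break
    else:
        # All components are zero.
        upper = [0, 0, 1]
    upper_text = ".".join(str(p) for p in upper)
    return f">={version},<{upper_text}"


for spec in ["^2.16.1", "^0.4.2", "^0.0.7", "^1.0.1"]:
    print(spec, "->", caret_to_range(spec))
```

For example, the old `python-dotenv = "^1.0.1"` was capped below 2.0.0, while the new `">=1.0.1"` has no upper bound.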

{sae_lens-6.3.0 → sae_lens-6.25.1}/sae_lens/__init__.py RENAMED

@@ -1,5 +1,5 @@
 # ruff: noqa: E402
-__version__ = "6.3.0"
+__version__ = "6.25.1"
 import logging
@@ -15,19 +15,31 @@ from sae_lens.saes import (
     GatedTrainingSAEConfig,
     JumpReLUSAE,
     JumpReLUSAEConfig,
+    JumpReLUSkipTranscoder,
+    JumpReLUSkipTranscoderConfig,
     JumpReLUTrainingSAE,
     JumpReLUTrainingSAEConfig,
+    JumpReLUTranscoder,
+    JumpReLUTranscoderConfig,
+    MatryoshkaBatchTopKTrainingSAE,
+    MatryoshkaBatchTopKTrainingSAEConfig,
     SAEConfig,
+    SkipTranscoder,
+    SkipTranscoderConfig,
     StandardSAE,
     StandardSAEConfig,
     StandardTrainingSAE,
     StandardTrainingSAEConfig,
+    TemporalSAE,
+    TemporalSAEConfig,
     TopKSAE,
     TopKSAEConfig,
     TopKTrainingSAE,
     TopKTrainingSAEConfig,
     TrainingSAE,
     TrainingSAEConfig,
+    Transcoder,
+    TranscoderConfig,
 )
 from .analysis.hooked_sae_transformer import HookedSAETransformer
@@ -89,6 +101,18 @@ __all__ = [
     "LoggingConfig",
     "BatchTopKTrainingSAE",
     "BatchTopKTrainingSAEConfig",
+    "Transcoder",
+    "TranscoderConfig",
+    "SkipTranscoder",
+    "SkipTranscoderConfig",
+    "JumpReLUTranscoder",
+    "JumpReLUTranscoderConfig",
+    "JumpReLUSkipTranscoder",
+    "JumpReLUSkipTranscoderConfig",
+    "MatryoshkaBatchTopKTrainingSAE",
+    "MatryoshkaBatchTopKTrainingSAEConfig",
+    "TemporalSAE",
+    "TemporalSAEConfig",
 ]
@@ -103,3 +127,15 @@ register_sae_training_class("jumprelu", JumpReLUTrainingSAE, JumpReLUTrainingSAE
 register_sae_training_class(
     "batchtopk", BatchTopKTrainingSAE, BatchTopKTrainingSAEConfig
 )
+register_sae_training_class(
+    "matryoshka_batchtopk",
+    MatryoshkaBatchTopKTrainingSAE,
+    MatryoshkaBatchTopKTrainingSAEConfig,
+)
+register_sae_class("transcoder", Transcoder, TranscoderConfig)
+register_sae_class("skip_transcoder", SkipTranscoder, SkipTranscoderConfig)
+register_sae_class("jumprelu_transcoder", JumpReLUTranscoder, JumpReLUTranscoderConfig)
+register_sae_class(
+    "jumprelu_skip_transcoder", JumpReLUSkipTranscoder, JumpReLUSkipTranscoderConfig
+)
+register_sae_class("temporal", TemporalSAE, TemporalSAEConfig)
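Note on the `__init__.py` hunk above: the new architectures are exposed by registering a name together with a class and its config class. The internals of `register_sae_class` are not shown in this diff, so the following is only a minimal sketch of the general registry pattern. Every name in it (`_REGISTRY`, `register_class`, `get_class`, `ToyTranscoder`, `ToyTranscoderConfig`) is hypothetical, and the duplicate-name check is a choice made for this sketch rather than confirmed sae-lens behaviour.

```python
from dataclasses import dataclass

# Hypothetical registry: architecture name -> (model class, config class).
_REGISTRY: dict[str, tuple[type, type]] = {}


def register_class(architecture: str, cls: type, config_cls: type) -> None:
    """Record which classes implement a named architecture."""
    if architecture in _REGISTRY:
        # Design choice of this sketch, not confirmed library behaviour.
        raise ValueError(f"{architecture!r} is already registered")
    _REGISTRY[architecture] = (cls, config_cls)


def get_class(architecture: str) -> tuple[type, type]:
    """Look up the classes registered under an architecture name."""
    return _REGISTRY[architecture]


@dataclass
class ToyTranscoderConfig:
    d_in: int
    d_out: int


class ToyTranscoder:
    def __init__(self, cfg: ToyTranscoderConfig):
        self.cfg = cfg


register_class("toy_transcoder", ToyTranscoder, ToyTranscoderConfig)

# A loader can now build a model from nothing more than a name and kwargs.
model_cls, config_cls = get_class("toy_transcoder")
model = model_cls(config_cls(d_in=4, d_out=8))
print(type(model).__name__, model.cfg)
```

A registry of this kind lets a saved config name its architecture as a string, so loading code does not need to import each class explicitly.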

{sae_lens-6.3.0 → sae_lens-6.25.1}/sae_lens/analysis/hooked_sae_transformer.py RENAMED

@@ -3,15 +3,15 @@ from contextlib import contextmanager
 from typing import Any, Callable
 import torch
-from jaxtyping import Float
 from transformer_lens.ActivationCache import ActivationCache
+from transformer_lens.components.mlps.can_be_used_as_mlp import CanBeUsedAsMLP
 from transformer_lens.hook_points import HookPoint  # Hooking utilities
 from transformer_lens.HookedTransformer import HookedTransformer
 from sae_lens.saes.sae import SAE
-SingleLoss = Float[torch.Tensor, ""]  # Type alias for a single element tensor
-LossPerToken = Float[torch.Tensor, "batch pos-1"]
+SingleLoss = torch.Tensor  # Type alias for a single element tensor
+LossPerToken = torch.Tensor
 Loss = SingleLoss | LossPerToken
@@ -50,6 +50,13 @@ def set_deep_attr(obj: Any, path: str, value: Any):
     setattr(obj, parts[-1], value)
+def add_hook_in_to_mlp(mlp: CanBeUsedAsMLP):
+    # Temporary hack to add a `mlp.hook_in` hook to mimic what's in circuit-tracer
+    mlp.hook_in = HookPoint()
+    original_forward = mlp.forward
+    mlp.forward = lambda x: original_forward(mlp.hook_in(x))  # type: ignore
 class HookedSAETransformer(HookedTransformer):
     def __init__(
         self,
@@ -66,6 +73,11 @@ class HookedSAETransformer(HookedTransformer):
             **model_kwargs: Keyword arguments for HookedTransformer initialization
         """
         super().__init__(*model_args, **model_kwargs)
+        for block in self.blocks:
+            add_hook_in_to_mlp(block.mlp)  # type: ignore
+        self.setup()
         self.acts_to_saes: dict[str, SAE] = {}  # type: ignore
     def add_sae(self, sae: SAE[Any], use_error_term: bool | None = None):
@@ -158,12 +170,7 @@ class HookedSAETransformer(HookedTransformer):
         reset_saes_end: bool = True,
         use_error_term: bool | None = None,
         **model_kwargs: Any,
-    ) -> (
-        None
-        | Float[torch.Tensor, "batch pos d_vocab"]
-        | Loss
-        | tuple[Float[torch.Tensor, "batch pos d_vocab"], Loss]
-    ):
+    ) -> None | torch.Tensor | Loss | tuple[torch.Tensor, Loss]:
         """Wrapper around HookedTransformer forward pass.
         Runs the model with the given SAEs attached for one forward pass, then removes them. By default, will reset all SAEs to original state after.
@@ -190,10 +197,7 @@ class HookedSAETransformer(HookedTransformer):
         remove_batch_dim: bool = False,
         **kwargs: Any,
     ) -> tuple[
-        None
-        | Float[torch.Tensor, "batch pos d_vocab"]
-        | Loss
-        | tuple[Float[torch.Tensor, "batch pos d_vocab"], Loss],
+        None | torch.Tensor | Loss | tuple[torch.Tensor, Loss],
         ActivationCache | dict[str, torch.Tensor],
     ]:
         """Wrapper around 'run_with_cache' in HookedTransformer.

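Note on the `hooked_sae_transformer.py` hunks above: `add_hook_in_to_mlp` attaches a hook point to each MLP and replaces `forward` with a wrapper that sends the input through that hook first. The sketch below reproduces the same wrapping idea with plain Python objects so it runs without torch or transformer_lens. `ToyHookPoint` and `ToyMLP` are hypothetical stand-ins, not the real `HookPoint` or MLP classes.

```python
class ToyHookPoint:
    """Stand-in for a hook point: applies registered functions in order."""

    def __init__(self):
        self.fns = []

    def add_hook(self, fn):
        self.fns.append(fn)

    def __call__(self, x):
        for fn in self.fns:
            x = fn(x)
        return x


class ToyMLP:
    """Stand-in for an MLP block: doubles every input value."""

    def forward(self, x):
        return [2 * v for v in x]


def add_hook_in(mlp):
    # Same shape as the wrapper in the diff: keep the original bound
    # method, then route the input through mlp.hook_in before calling it.
    mlp.hook_in = ToyHookPoint()
    original_forward = mlp.forward
    mlp.forward = lambda x: original_forward(mlp.hook_in(x))


mlp = ToyMLP()
add_hook_in(mlp)

seen = []


def record(x):
    # Observe the MLP input without changing it.
    seen.append(list(x))
    return x


mlp.hook_in.add_hook(record)
first = mlp.forward([1, 2, 3])

# A second hook that edits the input before the MLP sees it.
mlp.hook_in.add_hook(lambda x: [v + 1 for v in x])
second = mlp.forward([1, 2, 3])
print(first, second, seen)
```

With an empty hook list the wrapped `forward` behaves exactly like the original, so the change is invisible until something attaches to `hook_in`.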
sae_lens-6.25.1/sae_lens/analysis/neuronpedia_integration.py ADDED

@@ -0,0 +1,163 @@
+import json
+import urllib.parse
+import webbrowser
+from typing import Any
+import requests
+from dotenv import load_dotenv
+from sae_lens import SAE, logger
+NEURONPEDIA_DOMAIN = "https://neuronpedia.org"
+# Constants for replacing NaNs and Infs in outputs
+POSITIVE_INF_REPLACEMENT = 9999
+NEGATIVE_INF_REPLACEMENT = -9999
+NAN_REPLACEMENT = 0
+OTHER_INVALID_REPLACEMENT = -99999
+# Pick up OPENAI_API_KEY from environment variable
+load_dotenv()
+def NanAndInfReplacer(value: str):
+    """Replace NaNs and Infs in outputs."""
+    replacements = {
+        "-Infinity": NEGATIVE_INF_REPLACEMENT,
+        "Infinity": POSITIVE_INF_REPLACEMENT,
+        "NaN": NAN_REPLACEMENT,
+    }
+    if value in replacements:
+        replaced_value = replacements[value]
+        return float(replaced_value)
+    return NAN_REPLACEMENT
+def open_neuronpedia_feature_dashboard(sae: SAE[Any], index: int):
+    sae_id = sae.cfg.metadata.neuronpedia_id
+    if sae_id is None:
+        logger.warning(
+            "SAE does not have a Neuronpedia ID. Either dashboards for this SAE do not exist (yet) on Neuronpedia, or the SAE was not loaded via the from_pretrained method"
+        )
+    else:
+        url = f"{NEURONPEDIA_DOMAIN}/{sae_id}/{index}"
+        webbrowser.open(url)
+def get_neuronpedia_quick_list(
+    sae: SAE[Any],
+    features: list[int],
+    name: str = "temporary_list",
+):
+    sae_id = sae.cfg.metadata.neuronpedia_id
+    if sae_id is None:
+        logger.warning(
+            "SAE does not have a Neuronpedia ID. Either dashboards for this SAE do not exist (yet) on Neuronpedia, or the SAE was not loaded via the from_pretrained method"
+        )
+    assert sae_id is not None
+    url = NEURONPEDIA_DOMAIN + "/quick-list/"
+    name = urllib.parse.quote(name)
+    url = url + "?name=" + name
+    list_feature = [
+        {
+            "modelId": sae.cfg.metadata.model_name,
+            "layer": sae_id.split("/")[1],
+            "index": str(feature),
+        }
+        for feature in features
+    ]
+    url = url + "&features=" + urllib.parse.quote(json.dumps(list_feature))
+    webbrowser.open(url)
+    return url
+def get_neuronpedia_feature(
+    feature: int, layer: int, model: str = "gpt2-small", dataset: str = "res-jb"
+) -> dict[str, Any]:
+    """Fetch a feature from Neuronpedia API."""
+    url = f"{NEURONPEDIA_DOMAIN}/api/feature/{model}/{layer}-{dataset}/{feature}"
+    result = requests.get(url).json()
+    result["index"] = int(result["index"])
+    return result
+class NeuronpediaActivation:
+    """Represents an activation from Neuronpedia."""
+    def __init__(self, id: str, tokens: list[str], act_values: list[float]):
+        self.id = id
+        self.tokens = tokens
+        self.act_values = act_values
+class NeuronpediaFeature:
+    """Represents a feature from Neuronpedia."""
+    def __init__(
+        self,
+        modelId: str,
+        layer: int,
+        dataset: str,
+        feature: int,
+        description: str = "",
+        activations: list[NeuronpediaActivation] | None = None,
+        autointerp_explanation: str = "",
+        autointerp_explanation_score: float = 0.0,
+    ):
+        self.modelId = modelId
+        self.layer = layer
+        self.dataset = dataset
+        self.feature = feature
+        self.description = description
+        self.activations = activations
+        self.autointerp_explanation = autointerp_explanation
+        self.autointerp_explanation_score = autointerp_explanation_score
+    def has_activating_text(self) -> bool:
+        """Check if the feature has activating text."""
+        if self.activations is None:
+            return False
+        return any(max(activation.act_values) > 0 for activation in self.activations)
+def make_neuronpedia_list_with_features(
+    api_key: str,
+    list_name: str,
+    features: list[NeuronpediaFeature],
+    list_description: str | None = None,
+    open_browser: bool = True,
+):
+    url = NEURONPEDIA_DOMAIN + "/api/list/new-with-features"
+    # make POST json request with body
+    body = {
+        "name": list_name,
+        "description": list_description,
+        "features": [
+            {
+                "modelId": feature.modelId,
+                "layer": f"{feature.layer}-{feature.dataset}",
+                "index": feature.feature,
+                "description": feature.description,
+            }
+            for feature in features
+        ],
+    }
+    response = requests.post(url, json=body, headers={"x-api-key": api_key})
+    result = response.json()
+    if "url" in result and open_browser:
+        webbrowser.open(result["url"])
+        return result["url"]
+    raise Exception("Error in creating list: " + result["message"])
+def test_key(api_key: str):
+    """Test the validity of the Neuronpedia API key."""
+    url = f"{NEURONPEDIA_DOMAIN}/api/test"
+    body = {"apiKey": api_key}
+    response = requests.post(url, json=body)
+    if response.status_code != 200:
+        raise Exception("Neuronpedia API key is not valid.")
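Note on `NanAndInfReplacer` in the file above: it takes one of the strings `"NaN"`, `"Infinity"` or `"-Infinity"` and returns a finite number, which is the shape of callable that the standard library's `json.loads` accepts as `parse_constant`. The diff does not show where the function is called, so using it that way is an assumption. The sketch below uses a locally defined equivalent (`replace_non_finite`, a hypothetical name) with the same replacement constants.

```python
import json

# Same replacement values as the constants in the added file.
POSITIVE_INF_REPLACEMENT = 9999
NEGATIVE_INF_REPLACEMENT = -9999
NAN_REPLACEMENT = 0


def replace_non_finite(value: str) -> float:
    """Map the JSON constants NaN / Infinity / -Infinity to finite floats."""
    replacements = {
        "-Infinity": NEGATIVE_INF_REPLACEMENT,
        "Infinity": POSITIVE_INF_REPLACEMENT,
        "NaN": NAN_REPLACEMENT,
    }
    return float(replacements.get(value, NAN_REPLACEMENT))


# json.loads calls parse_constant once for each non-finite literal it meets.
raw = '{"values": [1.5, NaN, Infinity, -Infinity]}'
parsed = json.loads(raw, parse_constant=replace_non_finite)
print(parsed)
```

Replacing these values at parse time keeps downstream code from having to special-case non-finite floats, at the cost of sentinel numbers that could be mistaken for real activations.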

{sae_lens-6.3.0 → sae_lens-6.25.1}/sae_lens/cache_activations_runner.py RENAMED

@@ -9,15 +9,14 @@ import torch
 from datasets import Array2D, Dataset, Features, Sequence, Value
 from datasets.fingerprint import generate_fingerprint
 from huggingface_hub import HfApi
-from jaxtyping import Float, Int
-from tqdm import tqdm
+from tqdm.auto import tqdm
 from transformer_lens.HookedTransformer import HookedRootModule
 from sae_lens import logger
 from sae_lens.config import CacheActivationsRunnerConfig
-from sae_lens.constants import DTYPE_MAP
 from sae_lens.load_model import load_model
 from sae_lens.training.activations_store import ActivationsStore
+from sae_lens.util import str_to_dtype
 def _mk_activations_store(
@@ -82,7 +81,7 @@ class CacheActivationsRunner:
             )
             for hook_name in [self.cfg.hook_name]
         }
-        features_dict["token_ids"] = Sequence(
+        features_dict["token_ids"] = Sequence(  # type: ignore
             Value(dtype="int32"), length=self.context_size
         )
         self.features = Features(features_dict)
@@ -98,7 +97,7 @@ class CacheActivationsRunner:
         bytes_per_token = (
             self.cfg.d_in * self.cfg.dtype.itemsize
             if isinstance(self.cfg.dtype, torch.dtype)
-            else DTYPE_MAP[self.cfg.dtype].itemsize
+            else str_to_dtype(self.cfg.dtype).itemsize
         )
         total_training_tokens = self.cfg.n_seq_in_dataset * self.context_size
         total_disk_space_gb = total_training_tokens * bytes_per_token / 10**9
@@ -318,8 +317,8 @@ class CacheActivationsRunner:
     def _create_shard(
         self,
         buffer: tuple[
-            Float[torch.Tensor, "(bs context_size) d_in"],
-            Int[torch.Tensor, "(bs context_size)"] | None,
+            torch.Tensor,  # shape: (bs context_size) d_in
+            torch.Tensor | None,  # shape: (bs context_size) or None
         ],
     ) -> Dataset:
         hook_names = [self.cfg.hook_name]
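Note on the disk-space hunk above: the estimate is `n_seq_in_dataset * context_size * d_in * itemsize / 10**9`, and the change only swaps how a dtype string is turned into a dtype (`str_to_dtype` instead of `DTYPE_MAP`). The sketch below redoes the arithmetic without torch. The `ITEMSIZE_BYTES` table and the example numbers are illustrative assumptions, not values taken from sae-lens.

```python
# Bytes per element for a few common dtype names (illustrative table,
# standing in for the itemsize of the resolved torch dtype).
ITEMSIZE_BYTES = {"float16": 2, "bfloat16": 2, "float32": 4, "float64": 8}


def estimate_disk_space_gb(
    d_in: int, dtype: str, n_seq_in_dataset: int, context_size: int
) -> float:
    """Mirror the estimate in the hunk above, using decimal gigabytes."""
    bytes_per_token = d_in * ITEMSIZE_BYTES[dtype]
    total_training_tokens = n_seq_in_dataset * context_size
    return total_training_tokens * bytes_per_token / 10**9


# Hypothetical run: 1000 sequences of 128 tokens, 768-dim float32 activations.
size_gb = estimate_disk_space_gb(
    d_in=768, dtype="float32", n_seq_in_dataset=1000, context_size=128
)
print(f"{size_gb:.6f} GB")
```

The estimate covers a single hook's activations only and ignores the `token_ids` column and any dataset file overhead.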
