sae-lens 6.17.0__tar.gz → 6.20.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. {sae_lens-6.17.0 → sae_lens-6.20.1}/PKG-INFO +16 -16
  2. {sae_lens-6.17.0 → sae_lens-6.20.1}/README.md +13 -13
  3. {sae_lens-6.17.0 → sae_lens-6.20.1}/pyproject.toml +3 -3
  4. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/__init__.py +6 -1
  5. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/config.py +37 -2
  6. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/loading/pretrained_sae_loaders.py +188 -0
  7. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/loading/pretrained_saes_directory.py +5 -3
  8. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/pretrained_saes.yaml +51 -1
  9. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/__init__.py +3 -0
  10. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/sae.py +4 -12
  11. sae_lens-6.20.1/sae_lens/saes/temporal_sae.py +372 -0
  12. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/activations_store.py +1 -1
  13. {sae_lens-6.17.0 → sae_lens-6.20.1}/LICENSE +0 -0
  14. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/analysis/__init__.py +0 -0
  15. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/analysis/hooked_sae_transformer.py +0 -0
  16. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/analysis/neuronpedia_integration.py +0 -0
  17. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/cache_activations_runner.py +0 -0
  18. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/constants.py +0 -0
  19. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/evals.py +0 -0
  20. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/llm_sae_training_runner.py +0 -0
  21. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/load_model.py +0 -0
  22. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/loading/__init__.py +0 -0
  23. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/pretokenize_runner.py +0 -0
  24. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/registry.py +0 -0
  25. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/batchtopk_sae.py +0 -0
  26. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/gated_sae.py +0 -0
  27. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/jumprelu_sae.py +0 -0
  28. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/matryoshka_batchtopk_sae.py +0 -0
  29. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/standard_sae.py +0 -0
  30. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/topk_sae.py +0 -0
  31. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/transcoder.py +0 -0
  32. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/tokenization_and_batching.py +0 -0
  33. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/__init__.py +0 -0
  34. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/activation_scaler.py +0 -0
  35. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/mixing_buffer.py +0 -0
  36. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/optim.py +0 -0
  37. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/sae_trainer.py +0 -0
  38. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/types.py +0 -0
  39. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/upload_saes_to_huggingface.py +0 -0
  40. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/tutorial/tsea.py +0 -0
  41. {sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/util.py +0 -0
{sae_lens-6.17.0 → sae_lens-6.20.1}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: sae-lens
- Version: 6.17.0
+ Version: 6.20.1
  Summary: Training and Analyzing Sparse Autoencoders (SAEs)
  License: MIT
  License-File: LICENSE
@@ -30,19 +30,19 @@ Requires-Dist: tenacity (>=9.0.0)
  Requires-Dist: transformer-lens (>=2.16.1,<3.0.0)
  Requires-Dist: transformers (>=4.38.1,<5.0.0)
  Requires-Dist: typing-extensions (>=4.10.0,<5.0.0)
- Project-URL: Homepage, https://jbloomaus.github.io/SAELens
- Project-URL: Repository, https://github.com/jbloomAus/SAELens
+ Project-URL: Homepage, https://decoderesearch.github.io/SAELens
+ Project-URL: Repository, https://github.com/decoderesearch/SAELens
  Description-Content-Type: text/markdown

- <img width="1308" alt="Screenshot 2024-03-21 at 3 08 28 pm" src="https://github.com/jbloomAus/mats_sae_training/assets/69127271/209012ec-a779-4036-b4be-7b7739ea87f6">
+ <img width="1308" height="532" alt="saes_pic" src="https://github.com/user-attachments/assets/2a5d752f-b261-4ee4-ad5d-ebf282321371" />

  # SAE Lens

  [![PyPI](https://img.shields.io/pypi/v/sae-lens?color=blue)](https://pypi.org/project/sae-lens/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![build](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml)
- [![Deploy Docs](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml)
- [![codecov](https://codecov.io/gh/jbloomAus/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/jbloomAus/SAELens)
+ [![build](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml)
+ [![Deploy Docs](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml)
+ [![codecov](https://codecov.io/gh/decoderesearch/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/decoderesearch/SAELens)

  SAELens exists to help researchers:

@@ -50,7 +50,7 @@ SAELens exists to help researchers:
  - Analyse sparse autoencoders / research mechanistic interpretability.
  - Generate insights which make it easier to create safe and aligned AI systems.

- Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for information on how to:
+ Please refer to the [documentation](https://decoderesearch.github.io/SAELens/) for information on how to:

  - Download and Analyse pre-trained sparse autoencoders.
  - Train your own sparse autoencoders.
@@ -58,25 +58,25 @@ Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for in

  SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to [safeguard humanity from risks posed by artificial intelligence](https://80000hours.org/problem-profiles/artificial-intelligence/).

- This library is maintained by [Joseph Bloom](https://www.jbloomaus.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).
+ This library is maintained by [Joseph Bloom](https://www.decoderesearch.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).

  ## Loading Pre-trained SAEs.

- Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://jbloomaus.github.io/SAELens/sae_table/) in the readme for a list of all SAEs.
+ Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://decoderesearch.github.io/SAELens/sae_table/) in the readme for a list of all SAEs.

  ## Migrating to SAELens v6

- The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://jbloomaus.github.io/SAELens/latest/migrating/) for more details.
+ The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://decoderesearch.github.io/SAELens/latest/migrating/) for more details.

  ## Tutorials

- - [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
+ - [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
  - [Loading and Analysing Pre-Trained Sparse Autoencoders](tutorials/basic_loading_and_analysing.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
  - [Understanding SAE Features with the Logit Lens](tutorials/logits_lens_with_features.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
  - [Training a Sparse Autoencoder](tutorials/training_a_sparse_autoencoder.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)

  ## Join the Slack!

@@ -91,7 +91,7 @@ Please cite the package as follows:
  title = {SAELens},
  author = {Bloom, Joseph and Tigges, Curt and Duong, Anthony and Chanin, David},
  year = {2024},
- howpublished = {\url{https://github.com/jbloomAus/SAELens}},
+ howpublished = {\url{https://github.com/decoderesearch/SAELens}},
  }
  ```
{sae_lens-6.17.0 → sae_lens-6.20.1}/README.md

@@ -1,12 +1,12 @@
- <img width="1308" alt="Screenshot 2024-03-21 at 3 08 28 pm" src="https://github.com/jbloomAus/mats_sae_training/assets/69127271/209012ec-a779-4036-b4be-7b7739ea87f6">
+ <img width="1308" height="532" alt="saes_pic" src="https://github.com/user-attachments/assets/2a5d752f-b261-4ee4-ad5d-ebf282321371" />

  # SAE Lens

  [![PyPI](https://img.shields.io/pypi/v/sae-lens?color=blue)](https://pypi.org/project/sae-lens/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![build](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/build.yml)
- [![Deploy Docs](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/jbloomAus/SAELens/actions/workflows/deploy_docs.yml)
- [![codecov](https://codecov.io/gh/jbloomAus/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/jbloomAus/SAELens)
+ [![build](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/build.yml)
+ [![Deploy Docs](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml/badge.svg)](https://github.com/decoderesearch/SAELens/actions/workflows/deploy_docs.yml)
+ [![codecov](https://codecov.io/gh/decoderesearch/SAELens/graph/badge.svg?token=N83NGH8CGE)](https://codecov.io/gh/decoderesearch/SAELens)

  SAELens exists to help researchers:

@@ -14,7 +14,7 @@ SAELens exists to help researchers:
  - Analyse sparse autoencoders / research mechanistic interpretability.
  - Generate insights which make it easier to create safe and aligned AI systems.

- Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for information on how to:
+ Please refer to the [documentation](https://decoderesearch.github.io/SAELens/) for information on how to:

  - Download and Analyse pre-trained sparse autoencoders.
  - Train your own sparse autoencoders.
@@ -22,25 +22,25 @@ Please refer to the [documentation](https://jbloomaus.github.io/SAELens/) for in

  SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to [safeguard humanity from risks posed by artificial intelligence](https://80000hours.org/problem-profiles/artificial-intelligence/).

- This library is maintained by [Joseph Bloom](https://www.jbloomaus.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).
+ This library is maintained by [Joseph Bloom](https://www.decoderesearch.com/), [Curt Tigges](https://curttigges.com/), [Anthony Duong](https://github.com/anthonyduong9) and [David Chanin](https://github.com/chanind).

  ## Loading Pre-trained SAEs.

- Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://jbloomaus.github.io/SAELens/sae_table/) in the readme for a list of all SAEs.
+ Pre-trained SAEs for various models can be imported via SAE Lens. See this [page](https://decoderesearch.github.io/SAELens/sae_table/) in the readme for a list of all SAEs.

  ## Migrating to SAELens v6

- The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://jbloomaus.github.io/SAELens/latest/migrating/) for more details.
+ The new v6 update is a major refactor to SAELens and changes the way training code is structured. Check out the [migration guide](https://decoderesearch.github.io/SAELens/latest/migrating/) for more details.

  ## Tutorials

- - [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
+ - [SAE Lens + Neuronpedia](tutorials/tutorial_2_0.ipynb)[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/tutorial_2_0.ipynb)
  - [Loading and Analysing Pre-Trained Sparse Autoencoders](tutorials/basic_loading_and_analysing.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/basic_loading_and_analysing.ipynb)
  - [Understanding SAE Features with the Logit Lens](tutorials/logits_lens_with_features.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/logits_lens_with_features.ipynb)
  - [Training a Sparse Autoencoder](tutorials/training_a_sparse_autoencoder.ipynb)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/jbloomAus/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/decoderesearch/SAELens/blob/main/tutorials/training_a_sparse_autoencoder.ipynb)

  ## Join the Slack!

@@ -55,6 +55,6 @@ Please cite the package as follows:
  title = {SAELens},
  author = {Bloom, Joseph and Tigges, Curt and Duong, Anthony and Chanin, David},
  year = {2024},
- howpublished = {\url{https://github.com/jbloomAus/SAELens}},
+ howpublished = {\url{https://github.com/decoderesearch/SAELens}},
  }
  ```
{sae_lens-6.17.0 → sae_lens-6.20.1}/pyproject.toml

@@ -1,13 +1,13 @@
  [tool.poetry]
  name = "sae-lens"
- version = "6.17.0"
+ version = "6.20.1"
  description = "Training and Analyzing Sparse Autoencoders (SAEs)"
  authors = ["Joseph Bloom"]
  readme = "README.md"
  packages = [{ include = "sae_lens" }]
  include = ["pretrained_saes.yaml"]
- repository = "https://github.com/jbloomAus/SAELens"
- homepage = "https://jbloomaus.github.io/SAELens"
+ repository = "https://github.com/decoderesearch/SAELens"
+ homepage = "https://decoderesearch.github.io/SAELens"
  license = "MIT"
  keywords = [
      "deep-learning",
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/__init__.py

@@ -1,5 +1,5 @@
  # ruff: noqa: E402
- __version__ = "6.17.0"
+ __version__ = "6.20.1"

  import logging

@@ -28,6 +28,8 @@ from sae_lens.saes import (
      StandardSAEConfig,
      StandardTrainingSAE,
      StandardTrainingSAEConfig,
+     TemporalSAE,
+     TemporalSAEConfig,
      TopKSAE,
      TopKSAEConfig,
      TopKTrainingSAE,
@@ -105,6 +107,8 @@ __all__ = [
      "JumpReLUTranscoderConfig",
      "MatryoshkaBatchTopKTrainingSAE",
      "MatryoshkaBatchTopKTrainingSAEConfig",
+     "TemporalSAE",
+     "TemporalSAEConfig",
  ]

@@ -127,3 +131,4 @@ register_sae_training_class(
  register_sae_class("transcoder", Transcoder, TranscoderConfig)
  register_sae_class("skip_transcoder", SkipTranscoder, SkipTranscoderConfig)
  register_sae_class("jumprelu_transcoder", JumpReLUTranscoder, JumpReLUTranscoderConfig)
+ register_sae_class("temporal", TemporalSAE, TemporalSAEConfig)
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/config.py

@@ -18,6 +18,7 @@ from datasets import (

  from sae_lens import __version__, logger
  from sae_lens.constants import DTYPE_MAP
+ from sae_lens.registry import get_sae_training_class
  from sae_lens.saes.sae import TrainingSAEConfig

  if TYPE_CHECKING:
@@ -387,8 +388,11 @@ class LanguageModelSAERunnerConfig(Generic[T_TRAINING_SAE_CONFIG]):
          return self.sae.to_dict()

      def to_dict(self) -> dict[str, Any]:
-         # Make a shallow copy of config's dictionary
-         d = dict(self.__dict__)
+         """
+         Convert the config to a dictionary.
+         """
+
+         d = asdict(self)

          d["logger"] = asdict(self.logger)
          d["sae"] = self.sae.to_dict()
@@ -398,6 +402,37 @@ class LanguageModelSAERunnerConfig(Generic[T_TRAINING_SAE_CONFIG]):
          d["act_store_device"] = str(self.act_store_device)
          return d

+     @classmethod
+     def from_dict(cls, cfg_dict: dict[str, Any]) -> "LanguageModelSAERunnerConfig[Any]":
+         """
+         Load a LanguageModelSAERunnerConfig from a dictionary given by `to_dict`.
+
+         Args:
+             cfg_dict (dict[str, Any]): The dictionary to load the config from.
+
+         Returns:
+             LanguageModelSAERunnerConfig: The loaded config.
+         """
+         if "sae" not in cfg_dict:
+             raise ValueError("sae field is required in the config dictionary")
+         if "architecture" not in cfg_dict["sae"]:
+             raise ValueError("architecture field is required in the sae dictionary")
+         if "logger" not in cfg_dict:
+             raise ValueError("logger field is required in the config dictionary")
+         sae_config_class = get_sae_training_class(cfg_dict["sae"]["architecture"])[1]
+         sae_cfg = sae_config_class.from_dict(cfg_dict["sae"])
+         logger_cfg = LoggingConfig(**cfg_dict["logger"])
+         updated_cfg_dict: dict[str, Any] = {
+             **cfg_dict,
+             "sae": sae_cfg,
+             "logger": logger_cfg,
+         }
+         output = cls(**updated_cfg_dict)
+         # the post_init always appends to checkpoint path, so we need to set it explicitly here.
+         if "checkpoint_path" in cfg_dict:
+             output.checkpoint_path = cfg_dict["checkpoint_path"]
+         return output
+
      def to_sae_trainer_config(self) -> "SAETrainerConfig":
          return SAETrainerConfig(
              n_checkpoints=self.n_checkpoints,
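
The new `from_dict` is the inverse of `to_dict`: it resolves the SAE training config class through the registry, rebuilds the nested `sae` and `logger` configs, and restores `checkpoint_path` verbatim since `__post_init__` would otherwise append to it. A round-trip sketch, assuming an existing config instance `cfg`:

```python
# Hypothetical round trip through the new serialization pair.
cfg_dict = cfg.to_dict()  # now a deep copy via asdict(), safe to mutate
restored = LanguageModelSAERunnerConfig.from_dict(cfg_dict)

assert restored.checkpoint_path == cfg_dict["checkpoint_path"]
assert restored.sae.to_dict() == cfg_dict["sae"]
```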
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/loading/pretrained_sae_loaders.py

@@ -523,6 +523,82 @@ def gemma_2_sae_huggingface_loader(
      return cfg_dict, state_dict, log_sparsity


+ def get_goodfire_config_from_hf(
+     repo_id: str,
+     folder_name: str,  # noqa: ARG001
+     device: str,
+     force_download: bool = False,  # noqa: ARG001
+     cfg_overrides: dict[str, Any] | None = None,
+ ) -> dict[str, Any]:
+     cfg_dict = None
+     if repo_id == "Goodfire/Llama-3.3-70B-Instruct-SAE-l50":
+         if folder_name != "Llama-3.3-70B-Instruct-SAE-l50.pt":
+             raise ValueError(f"Unsupported Goodfire SAE: {repo_id}/{folder_name}")
+         cfg_dict = {
+             "architecture": "standard",
+             "d_in": 8192,
+             "d_sae": 65536,
+             "model_name": "meta-llama/Llama-3.3-70B-Instruct",
+             "hook_name": "blocks.50.hook_resid_post",
+             "hook_head_index": None,
+             "dataset_path": "lmsys/lmsys-chat-1m",
+             "apply_b_dec_to_input": False,
+         }
+     elif repo_id == "Goodfire/Llama-3.1-8B-Instruct-SAE-l19":
+         if folder_name != "Llama-3.1-8B-Instruct-SAE-l19.pth":
+             raise ValueError(f"Unsupported Goodfire SAE: {repo_id}/{folder_name}")
+         cfg_dict = {
+             "architecture": "standard",
+             "d_in": 4096,
+             "d_sae": 65536,
+             "model_name": "meta-llama/Llama-3.1-8B-Instruct",
+             "hook_name": "blocks.19.hook_resid_post",
+             "hook_head_index": None,
+             "dataset_path": "lmsys/lmsys-chat-1m",
+             "apply_b_dec_to_input": False,
+         }
+     if cfg_dict is None:
+         raise ValueError(f"Unsupported Goodfire SAE: {repo_id}/{folder_name}")
+     if device is not None:
+         cfg_dict["device"] = device
+     if cfg_overrides is not None:
+         cfg_dict.update(cfg_overrides)
+     return cfg_dict
+
+
+ def get_goodfire_huggingface_loader(
+     repo_id: str,
+     folder_name: str,
+     device: str = "cpu",
+     force_download: bool = False,
+     cfg_overrides: dict[str, Any] | None = None,
+ ) -> tuple[dict[str, Any], dict[str, torch.Tensor], torch.Tensor | None]:
+     cfg_dict = get_goodfire_config_from_hf(
+         repo_id,
+         folder_name,
+         device,
+         force_download,
+         cfg_overrides,
+     )
+
+     # Download the SAE weights
+     sae_path = hf_hub_download(
+         repo_id=repo_id,
+         filename=folder_name,
+         force_download=force_download,
+     )
+     raw_state_dict = torch.load(sae_path, map_location=device)
+
+     state_dict = {
+         "W_enc": raw_state_dict["encoder_linear.weight"].T,
+         "W_dec": raw_state_dict["decoder_linear.weight"].T,
+         "b_enc": raw_state_dict["encoder_linear.bias"],
+         "b_dec": raw_state_dict["decoder_linear.bias"],
+     }
+
+     return cfg_dict, state_dict, None
+
+
  def get_llama_scope_config_from_hf(
      repo_id: str,
      folder_name: str,
@@ -1475,6 +1551,114 @@ def get_mntss_clt_layer_config_from_hf(
      }


+ def get_temporal_sae_config_from_hf(
+     repo_id: str,
+     folder_name: str,
+     device: str,
+     force_download: bool = False,
+     cfg_overrides: dict[str, Any] | None = None,
+ ) -> dict[str, Any]:
+     """Get TemporalSAE config without loading weights."""
+     # Download config file
+     conf_path = hf_hub_download(
+         repo_id=repo_id,
+         filename=f"{folder_name}/conf.yaml",
+         force_download=force_download,
+     )
+
+     # Load and parse config
+     with open(conf_path) as f:
+         yaml_config = yaml.safe_load(f)
+
+     # Extract parameters
+     d_in = yaml_config["llm"]["dimin"]
+     exp_factor = yaml_config["sae"]["exp_factor"]
+     d_sae = int(d_in * exp_factor)
+
+     # extract layer from folder_name eg : "layer_12/temporal"
+     layer = re.search(r"layer_(\d+)", folder_name)
+     if layer is None:
+         raise ValueError(f"Could not find layer in folder_name: {folder_name}")
+     layer = int(layer.group(1))
+
+     # Build config dict
+     cfg_dict = {
+         "architecture": "temporal",
+         "hook_name": f"blocks.{layer}.hook_resid_post",
+         "d_in": d_in,
+         "d_sae": d_sae,
+         "n_heads": yaml_config["sae"]["n_heads"],
+         "n_attn_layers": yaml_config["sae"]["n_attn_layers"],
+         "bottleneck_factor": yaml_config["sae"]["bottleneck_factor"],
+         "sae_diff_type": yaml_config["sae"]["sae_diff_type"],
+         "kval_topk": yaml_config["sae"]["kval_topk"],
+         "tied_weights": yaml_config["sae"]["tied_weights"],
+         "dtype": yaml_config["data"]["dtype"],
+         "device": device,
+         "normalize_activations": "constant_scalar_rescale",
+         "activation_normalization_factor": yaml_config["sae"]["scaling_factor"],
+         "apply_b_dec_to_input": True,
+     }
+
+     if cfg_overrides:
+         cfg_dict.update(cfg_overrides)
+
+     return cfg_dict
+
+
+ def temporal_sae_huggingface_loader(
+     repo_id: str,
+     folder_name: str,
+     device: str = "cpu",
+     force_download: bool = False,
+     cfg_overrides: dict[str, Any] | None = None,
+ ) -> tuple[dict[str, Any], dict[str, torch.Tensor], torch.Tensor | None]:
+     """
+     Load TemporalSAE from canrager/temporalSAEs format (safetensors version).
+
+     Expects folder_name to contain:
+     - conf.yaml (configuration)
+     - latest_ckpt.safetensors (model weights)
+     """
+
+     cfg_dict = get_temporal_sae_config_from_hf(
+         repo_id=repo_id,
+         folder_name=folder_name,
+         device=device,
+         force_download=force_download,
+         cfg_overrides=cfg_overrides,
+     )
+
+     # Download checkpoint (safetensors format)
+     ckpt_path = hf_hub_download(
+         repo_id=repo_id,
+         filename=f"{folder_name}/latest_ckpt.safetensors",
+         force_download=force_download,
+     )
+
+     # Load checkpoint from safetensors
+     state_dict_raw = load_file(ckpt_path, device=device)
+
+     # Convert to SAELens naming convention
+     # TemporalSAE uses: D (decoder), E (encoder), b (bias), attn_layers.*
+     state_dict = {}
+
+     # Copy attention layers as-is
+     for key, value in state_dict_raw.items():
+         if key.startswith("attn_layers."):
+             state_dict[key] = value.to(device)
+
+     # Main parameters
+     state_dict["W_dec"] = state_dict_raw["D"].to(device)
+     state_dict["b_dec"] = state_dict_raw["b"].to(device)
+
+     # Handle tied/untied weights
+     if "E" in state_dict_raw:
+         state_dict["W_enc"] = state_dict_raw["E"].to(device)
+
+     return cfg_dict, state_dict, None
+
+
  NAMED_PRETRAINED_SAE_LOADERS: dict[str, PretrainedSaeHuggingfaceLoader] = {
      "sae_lens": sae_lens_huggingface_loader,
      "connor_rob_hook_z": connor_rob_hook_z_huggingface_loader,
@@ -1487,6 +1671,8 @@ NAMED_PRETRAINED_SAE_LOADERS: dict[str, PretrainedSaeHuggingfaceLoader] = {
      "gemma_2_transcoder": gemma_2_transcoder_huggingface_loader,
      "mwhanna_transcoder": mwhanna_transcoder_huggingface_loader,
      "mntss_clt_layer_transcoder": mntss_clt_layer_huggingface_loader,
+     "temporal": temporal_sae_huggingface_loader,
+     "goodfire": get_goodfire_huggingface_loader,
  }

@@ -1502,4 +1688,6 @@ NAMED_PRETRAINED_SAE_CONFIG_GETTERS: dict[str, PretrainedSaeConfigHuggingfaceLoa
      "gemma_2_transcoder": get_gemma_2_transcoder_config_from_hf,
      "mwhanna_transcoder": get_mwhanna_transcoder_config_from_hf,
      "mntss_clt_layer_transcoder": get_mntss_clt_layer_config_from_hf,
+     "temporal": get_temporal_sae_config_from_hf,
+     "goodfire": get_goodfire_config_from_hf,
  }
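
Both new entries follow the existing loader contract: a loader takes `(repo_id, folder_name, device, force_download, cfg_overrides)` and returns a `(cfg_dict, state_dict, log_sparsity)` tuple. A sketch of invoking the Goodfire loader directly (this downloads the checkpoint from the Hugging Face Hub):

```python
from sae_lens.loading.pretrained_sae_loaders import get_goodfire_huggingface_loader

cfg_dict, state_dict, log_sparsity = get_goodfire_huggingface_loader(
    repo_id="Goodfire/Llama-3.1-8B-Instruct-SAE-l19",
    folder_name="Llama-3.1-8B-Instruct-SAE-l19.pth",
    device="cpu",
)
assert cfg_dict["architecture"] == "standard"
assert state_dict["W_enc"].shape == (4096, 65536)  # (d_in, d_sae) after the transpose above
assert log_sparsity is None  # Goodfire checkpoints ship no sparsity tensor
```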
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/loading/pretrained_saes_directory.py

@@ -1,6 +1,6 @@
  from dataclasses import dataclass
  from functools import cache
- from importlib import resources
+ from importlib.resources import files
  from typing import Any

  import yaml
@@ -24,7 +24,8 @@ def get_pretrained_saes_directory() -> dict[str, PretrainedSAELookup]:
      package = "sae_lens"
      # Access the file within the package using importlib.resources
      directory: dict[str, PretrainedSAELookup] = {}
-     with resources.open_text(package, "pretrained_saes.yaml") as file:
+     yaml_file = files(package).joinpath("pretrained_saes.yaml")
+     with yaml_file.open("r") as file:
          # Load the YAML file content
          data = yaml.safe_load(file)
          for release, value in data.items():
@@ -68,7 +69,8 @@ def get_norm_scaling_factor(release: str, sae_id: str) -> float | None:
          float | None: The norm_scaling_factor if it exists, None otherwise.
      """
      package = "sae_lens"
-     with resources.open_text(package, "pretrained_saes.yaml") as file:
+     yaml_file = files(package).joinpath("pretrained_saes.yaml")
+     with yaml_file.open("r") as file:
          data = yaml.safe_load(file)
          if release in data:
              for sae_info in data[release]["saes"]:
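
`importlib.resources.open_text` has been deprecated since Python 3.11; the `files()` API used above is the supported replacement. Usage of the directory itself is unchanged; a sketch (assuming the `repo_id` field on the `PretrainedSAELookup` dataclass defined in this module) using a release added to `pretrained_saes.yaml` below:

```python
from sae_lens.loading.pretrained_saes_directory import get_pretrained_saes_directory

directory = get_pretrained_saes_directory()
lookup = directory["temporal-sae-gemma-2-2b"]  # registered in the yaml diff below
print(lookup.repo_id)  # -> "canrager/temporalSAEs"
```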
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/pretrained_saes.yaml

@@ -1,3 +1,35 @@
+ temporal-sae-gemma-2-2b:
+   conversion_func: temporal
+   model: gemma-2-2b
+   repo_id: canrager/temporalSAEs
+   config_overrides:
+     model_name: gemma-2-2b
+     hook_name: blocks.12.hook_resid_post
+     dataset_path: monology/pile-uncopyrighted
+   saes:
+     - id: blocks.12.hook_resid_post
+       l0: 192
+       norm_scaling_factor: 0.00666666667
+       path: gemma-2-2B/layer_12/temporal
+       neuronpedia: gemma-2-2b/12-temporal-res
+ temporal-sae-llama-3.1-8b:
+   conversion_func: temporal
+   model: meta-llama/Llama-3.1-8B
+   repo_id: canrager/temporalSAEs
+   config_overrides:
+     model_name: meta-llama/Llama-3.1-8B
+     dataset_path: monology/pile-uncopyrighted
+   saes:
+     - id: blocks.15.hook_resid_post
+       l0: 256
+       norm_scaling_factor: 0.029
+       path: llama-3.1-8B/layer_15/temporal
+       neuronpedia: llama3.1-8b/15-temporal-res
+     - id: blocks.26.hook_resid_post
+       l0: 256
+       norm_scaling_factor: 0.029
+       path: llama-3.1-8B/layer_26/temporal
+       neuronpedia: llama3.1-8b/26-temporal-res
  deepseek-r1-distill-llama-8b-qresearch:
    conversion_func: deepseek_r1
    model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
@@ -14882,4 +14914,22 @@ qwen2.5-7b-instruct-andyrdt:
        neuronpedia: qwen2.5-7b-it/23-resid-post-aa
      - id: resid_post_layer_27_trainer_1
        path: resid_post_layer_27/trainer_1
-       neuronpedia: qwen2.5-7b-it/27-resid-post-aa
+       neuronpedia: qwen2.5-7b-it/27-resid-post-aa
+
+ goodfire-llama-3.3-70b-instruct:
+   conversion_func: goodfire
+   model: meta-llama/Llama-3.3-70B-Instruct
+   repo_id: Goodfire/Llama-3.3-70B-Instruct-SAE-l50
+   saes:
+     - id: layer_50
+       path: Llama-3.3-70B-Instruct-SAE-l50.pt
+       l0: 121
+
+ goodfire-llama-3.1-8b-instruct:
+   conversion_func: goodfire
+   model: meta-llama/Llama-3.1-8B-Instruct
+   repo_id: Goodfire/Llama-3.1-8B-Instruct-SAE-l19
+   saes:
+     - id: layer_19
+       path: Llama-3.1-8B-Instruct-SAE-l19.pth
+       l0: 91
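
With these yaml entries in place, the new releases load through the standard entry point. A sketch, assuming the SAELens v6 `SAE.from_pretrained` signature (which returns the SAE directly):

```python
from sae_lens import SAE

sae = SAE.from_pretrained(
    release="temporal-sae-gemma-2-2b",
    sae_id="blocks.12.hook_resid_post",
    device="cpu",
)
print(sae.cfg.architecture())  # -> "temporal"
```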
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/__init__.py

@@ -25,6 +25,7 @@ from .standard_sae import (
      StandardTrainingSAE,
      StandardTrainingSAEConfig,
  )
+ from .temporal_sae import TemporalSAE, TemporalSAEConfig
  from .topk_sae import (
      TopKSAE,
      TopKSAEConfig,
@@ -71,4 +72,6 @@ __all__ = [
      "JumpReLUTranscoderConfig",
      "MatryoshkaBatchTopKTrainingSAE",
      "MatryoshkaBatchTopKTrainingSAEConfig",
+     "TemporalSAE",
+     "TemporalSAEConfig",
  ]
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/saes/sae.py

@@ -155,9 +155,9 @@ class SAEConfig(ABC):
      dtype: str = "float32"
      device: str = "cpu"
      apply_b_dec_to_input: bool = True
-     normalize_activations: Literal[
-         "none", "expected_average_only_in", "constant_norm_rescale", "layer_norm"
-     ] = "none"  # none, expected_average_only_in (Anthropic April Update), constant_norm_rescale (Anthropic Feb Update)
+     normalize_activations: Literal["none", "expected_average_only_in", "layer_norm"] = (
+         "none"  # none, expected_average_only_in (Anthropic April Update)
+     )
      reshape_activations: Literal["none", "hook_z"] = "none"
      metadata: SAEMetadata = field(default_factory=SAEMetadata)

@@ -309,6 +309,7 @@ class SAE(HookedRootModule, Generic[T_SAE_CONFIG], ABC):

              self.run_time_activation_norm_fn_in = run_time_activation_norm_fn_in
              self.run_time_activation_norm_fn_out = run_time_activation_norm_fn_out
+
          elif self.cfg.normalize_activations == "layer_norm":
              # we need to scale the norm of the input and store the scaling factor
              def run_time_activation_ln_in(
@@ -452,23 +453,14 @@ class SAE(HookedRootModule, Generic[T_SAE_CONFIG], ABC):
      def process_sae_in(
          self, sae_in: Float[torch.Tensor, "... d_in"]
      ) -> Float[torch.Tensor, "... d_in"]:
-         # print(f"Input shape to process_sae_in: {sae_in.shape}")
-         # print(f"self.cfg.hook_name: {self.cfg.hook_name}")
-         # print(f"self.b_dec shape: {self.b_dec.shape}")
-         # print(f"Hook z reshaping mode: {getattr(self, 'hook_z_reshaping_mode', False)}")
-
          sae_in = sae_in.to(self.dtype)
-
-         # print(f"Shape before reshape_fn_in: {sae_in.shape}")
          sae_in = self.reshape_fn_in(sae_in)
-         # print(f"Shape after reshape_fn_in: {sae_in.shape}")

          sae_in = self.hook_sae_input(sae_in)
          sae_in = self.run_time_activation_norm_fn_in(sae_in)

          # Here's where the error happens
          bias_term = self.b_dec * self.cfg.apply_b_dec_to_input
-         # print(f"Bias term shape: {bias_term.shape}")

          return sae_in - bias_term
sae_lens-6.20.1/sae_lens/saes/temporal_sae.py (new file)

@@ -0,0 +1,372 @@
+ """TemporalSAE: A Sparse Autoencoder with temporal attention mechanism.
+
+ TemporalSAE decomposes activations into:
+ 1. Predicted codes (from attention over context)
+ 2. Novel codes (sparse features of the residual)
+
+ See: https://arxiv.org/abs/2410.04185
+ """
+
+ import math
+ from dataclasses import dataclass
+ from typing import Literal
+
+ import torch
+ import torch.nn.functional as F
+ from jaxtyping import Float
+ from torch import nn
+ from typing_extensions import override
+
+ from sae_lens import logger
+ from sae_lens.saes.sae import SAE, SAEConfig
+
+
+ def get_attention(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
+     """Compute causal attention weights."""
+     L, S = query.size(-2), key.size(-2)
+     scale_factor = 1 / math.sqrt(query.size(-1))
+     attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
+     temp_mask = torch.ones(L, S, dtype=torch.bool, device=query.device).tril(diagonal=0)
+     attn_bias.masked_fill_(temp_mask.logical_not(), float("-inf"))
+     attn_bias.to(query.dtype)
+
+     attn_weight = query @ key.transpose(-2, -1) * scale_factor
+     attn_weight += attn_bias
+     return torch.softmax(attn_weight, dim=-1)
+
+
+ class ManualAttention(nn.Module):
+     """Manual attention implementation for TemporalSAE."""
+
+     def __init__(
+         self,
+         dimin: int,
+         n_heads: int = 4,
+         bottleneck_factor: int = 64,
+         bias_k: bool = True,
+         bias_q: bool = True,
+         bias_v: bool = True,
+         bias_o: bool = True,
+     ):
+         super().__init__()
+         assert dimin % (bottleneck_factor * n_heads) == 0
+
+         self.n_heads = n_heads
+         self.n_embds = dimin // bottleneck_factor
+         self.dimin = dimin
+
+         # Key, query, value projections
+         self.k_ctx = nn.Linear(dimin, self.n_embds, bias=bias_k)
+         self.q_target = nn.Linear(dimin, self.n_embds, bias=bias_q)
+         self.v_ctx = nn.Linear(dimin, dimin, bias=bias_v)
+         self.c_proj = nn.Linear(dimin, dimin, bias=bias_o)
+
+         # Normalize to match scale with representations
+         with torch.no_grad():
+             scaling = 1 / math.sqrt(self.n_embds // self.n_heads)
+             self.k_ctx.weight.copy_(
+                 scaling
+                 * self.k_ctx.weight
+                 / (1e-6 + torch.linalg.norm(self.k_ctx.weight, dim=1, keepdim=True))
+             )
+             self.q_target.weight.copy_(
+                 scaling
+                 * self.q_target.weight
+                 / (1e-6 + torch.linalg.norm(self.q_target.weight, dim=1, keepdim=True))
+             )
+
+             scaling = 1 / math.sqrt(self.dimin // self.n_heads)
+             self.v_ctx.weight.copy_(
+                 scaling
+                 * self.v_ctx.weight
+                 / (1e-6 + torch.linalg.norm(self.v_ctx.weight, dim=1, keepdim=True))
+             )
+
+             scaling = 1 / math.sqrt(self.dimin)
+             self.c_proj.weight.copy_(
+                 scaling
+                 * self.c_proj.weight
+                 / (1e-6 + torch.linalg.norm(self.c_proj.weight, dim=1, keepdim=True))
+             )
+
+     def forward(
+         self, x_ctx: torch.Tensor, x_target: torch.Tensor, get_attn_map: bool = False
+     ) -> tuple[torch.Tensor, torch.Tensor | None]:
+         """Compute projective attention output."""
+         k = self.k_ctx(x_ctx)
+         v = self.v_ctx(x_ctx)
+         q = self.q_target(x_target)
+
+         # Split into heads
+         B, T, _ = x_ctx.size()
+         k = k.view(B, T, self.n_heads, self.n_embds // self.n_heads).transpose(1, 2)
+         q = q.view(B, T, self.n_heads, self.n_embds // self.n_heads).transpose(1, 2)
+         v = v.view(B, T, self.n_heads, self.dimin // self.n_heads).transpose(1, 2)
+
+         # Attention map (optional)
+         attn_map = None
+         if get_attn_map:
+             attn_map = get_attention(query=q, key=k)
+
+         # Scaled dot-product attention
+         attn_output = torch.nn.functional.scaled_dot_product_attention(
+             q, k, v, attn_mask=None, dropout_p=0, is_causal=True
+         )
+
+         # Reshape and project
+         d_target = self.c_proj(
+             attn_output.transpose(1, 2).contiguous().view(B, T, self.dimin)
+         )
+
+         return d_target, attn_map
+
+
+ @dataclass
+ class TemporalSAEConfig(SAEConfig):
+     """Configuration for TemporalSAE inference.
+
+     Args:
+         d_in: Input dimension (dimensionality of the activations being encoded)
+         d_sae: SAE latent dimension (number of features)
+         n_heads: Number of attention heads in temporal attention
+         n_attn_layers: Number of attention layers
+         bottleneck_factor: Bottleneck factor for attention dimension
+         sae_diff_type: Type of SAE for novel codes ('relu' or 'topk')
+         kval_topk: K value for top-k sparsity (if sae_diff_type='topk')
+         tied_weights: Whether to tie encoder and decoder weights
+         activation_normalization_factor: Scalar factor for rescaling activations (used with normalize_activations='constant_scalar_rescale')
+     """
+
+     n_heads: int = 8
+     n_attn_layers: int = 1
+     bottleneck_factor: int = 64
+     sae_diff_type: Literal["relu", "topk"] = "topk"
+     kval_topk: int | None = None
+     tied_weights: bool = True
+     activation_normalization_factor: float = 1.0
+
+     def __post_init__(self):
+         # Call parent's __post_init__ first, but allow constant_scalar_rescale
+         if self.normalize_activations not in [
+             "none",
+             "expected_average_only_in",
+             "constant_norm_rescale",
+             "constant_scalar_rescale",  # Temporal SAEs support this
+             "layer_norm",
+         ]:
+             raise ValueError(
+                 f"normalize_activations must be none, expected_average_only_in, layer_norm, constant_norm_rescale, or constant_scalar_rescale. Got {self.normalize_activations}"
+             )
+
+     @override
+     @classmethod
+     def architecture(cls) -> str:
+         return "temporal"
+
+
+ class TemporalSAE(SAE[TemporalSAEConfig]):
+     """TemporalSAE: Sparse Autoencoder with temporal attention.
+
+     This SAE decomposes each activation x_t into:
+     - x_pred: Information aggregated from context {x_0, ..., x_{t-1}}
+     - x_novel: Novel information at position t (encoded sparsely)
+
+     The forward pass:
+     1. Uses attention layers to predict x_t from context
+     2. Encodes the residual (novel part) with a sparse SAE
+     3. Combines both for reconstruction
+     """
+
+     # Custom parameters (in addition to W_enc, W_dec, b_dec from base)
+     attn_layers: nn.ModuleList  # Attention layers
+     eps: float
+     lam: float
+
+     def __init__(self, cfg: TemporalSAEConfig, use_error_term: bool = False):
+         # Call parent init first
+         super().__init__(cfg, use_error_term)
+
+         # Initialize attention layers after parent init and move to correct device
+         self.attn_layers = nn.ModuleList(
+             [
+                 ManualAttention(
+                     dimin=cfg.d_sae,
+                     n_heads=cfg.n_heads,
+                     bottleneck_factor=cfg.bottleneck_factor,
+                     bias_k=True,
+                     bias_q=True,
+                     bias_v=True,
+                     bias_o=True,
+                 ).to(device=self.device, dtype=self.dtype)
+                 for _ in range(cfg.n_attn_layers)
+             ]
+         )
+
+         self.eps = 1e-6
+         self.lam = 1 / (4 * self.cfg.d_in)
+
+     @override
+     def _setup_activation_normalization(self):
+         """Set up activation normalization functions for TemporalSAE.
+
+         Overrides the base implementation to handle constant_scalar_rescale
+         using the temporal-specific activation_normalization_factor.
+         """
+         if self.cfg.normalize_activations == "constant_scalar_rescale":
+             # Handle constant scalar rescaling for temporal SAEs
+             def run_time_activation_norm_fn_in(x: torch.Tensor) -> torch.Tensor:
+                 return x * self.cfg.activation_normalization_factor
+
+             def run_time_activation_norm_fn_out(x: torch.Tensor) -> torch.Tensor:
+                 return x / self.cfg.activation_normalization_factor
+
+             self.run_time_activation_norm_fn_in = run_time_activation_norm_fn_in
+             self.run_time_activation_norm_fn_out = run_time_activation_norm_fn_out
+         else:
+             # Delegate to parent for all other normalization types
+             super()._setup_activation_normalization()
+
+     @override
+     def initialize_weights(self) -> None:
+         """Initialize TemporalSAE weights."""
+         # Initialize D (decoder) and b (bias)
+         self.W_dec = nn.Parameter(
+             torch.randn(
+                 (self.cfg.d_sae, self.cfg.d_in), dtype=self.dtype, device=self.device
+             )
+         )
+         self.b_dec = nn.Parameter(
+             torch.zeros((self.cfg.d_in), dtype=self.dtype, device=self.device)
+         )
+
+         # Initialize E (encoder) if not tied
+         if not self.cfg.tied_weights:
+             self.W_enc = nn.Parameter(
+                 torch.randn(
+                     (self.cfg.d_in, self.cfg.d_sae),
+                     dtype=self.dtype,
+                     device=self.device,
+                 )
+             )
+
+     def encode_with_predictions(
+         self, x: Float[torch.Tensor, "... d_in"]
+     ) -> tuple[Float[torch.Tensor, "... d_sae"], Float[torch.Tensor, "... d_sae"]]:
+         """Encode input to novel codes only.
+
+         Returns only the sparse novel codes (not predicted codes).
+         This is the main feature representation for TemporalSAE.
+         """
+         # Process input through SAELens preprocessing
+         x = self.process_sae_in(x)
+
+         B, L, _ = x.shape
+
+         if self.cfg.tied_weights:  # noqa: SIM108
+             W_enc = self.W_dec.T
+         else:
+             W_enc = self.W_enc
+
+         # Compute predicted codes using attention
+         x_residual = x
+         z_pred = torch.zeros((B, L, self.cfg.d_sae), device=x.device, dtype=x.dtype)
+
+         for attn_layer in self.attn_layers:
+             # Encode input to latent space
+             z_input = F.relu(torch.matmul(x_residual * self.lam, W_enc))
+
+             # Shift context (causal masking)
+             z_ctx = torch.cat(
+                 (torch.zeros_like(z_input[:, :1, :]), z_input[:, :-1, :].clone()), dim=1
+             )
+
+             # Apply attention to get predicted codes
+             z_pred_, _ = attn_layer(z_ctx, z_input, get_attn_map=False)
+             z_pred_ = F.relu(z_pred_)
+
+             # Project predicted codes back to input space
+             Dz_pred_ = torch.matmul(z_pred_, self.W_dec)
+             Dz_norm_ = Dz_pred_.norm(dim=-1, keepdim=True) + self.eps
+
+             # Compute projection scale
+             proj_scale = (Dz_pred_ * x_residual).sum(
+                 dim=-1, keepdim=True
+             ) / Dz_norm_.pow(2)
+
+             # Accumulate predicted codes
+             z_pred = z_pred + (z_pred_ * proj_scale)
+
+             # Remove prediction from residual
+             x_residual = x_residual - proj_scale * Dz_pred_
+
+         # Encode residual (novel part) with sparse SAE
+         z_novel = F.relu(torch.matmul(x_residual * self.lam, W_enc))
+         if self.cfg.sae_diff_type == "topk":
+             kval = self.cfg.kval_topk
+             if kval is not None:
+                 _, topk_indices = torch.topk(z_novel, kval, dim=-1)
+                 mask = torch.zeros_like(z_novel)
+                 mask.scatter_(-1, topk_indices, 1)
+                 z_novel = z_novel * mask
+
+         # Return only novel codes (these are the interpretable features)
+         return z_novel, z_pred
+
+     def encode(
+         self, x: Float[torch.Tensor, "... d_in"]
+     ) -> Float[torch.Tensor, "... d_sae"]:
+         return self.encode_with_predictions(x)[0]
+
+     def decode(
+         self, feature_acts: Float[torch.Tensor, "... d_sae"]
+     ) -> Float[torch.Tensor, "... d_in"]:
+         """Decode novel codes to reconstruction.
+
+         Note: This only decodes the novel codes. For full reconstruction,
+         use forward() which includes predicted codes.
+         """
+         # Decode novel codes
+         sae_out = torch.matmul(feature_acts, self.W_dec)
+         sae_out = sae_out + self.b_dec
+
+         # Apply hook
+         sae_out = self.hook_sae_recons(sae_out)
+
+         # Apply output activation normalization (reverses input normalization)
+         sae_out = self.run_time_activation_norm_fn_out(sae_out)
+
+         # Add bias (already removed in process_sae_in)
+         logger.warning(
+             "NOTE this only decodes x_novel. The x_pred is missing, so we're not reconstructing the full x."
+         )
+         return sae_out
+
+     @override
+     def forward(
+         self, x: Float[torch.Tensor, "... d_in"]
+     ) -> Float[torch.Tensor, "... d_in"]:
+         """Full forward pass through TemporalSAE.
+
+         Returns complete reconstruction (predicted + novel).
+         """
+         # Encode
+         z_novel, z_pred = self.encode_with_predictions(x)
+
+         # Decode the sum of predicted and novel codes.
+         x_recons = torch.matmul(z_novel + z_pred, self.W_dec) + self.b_dec
+
+         # Apply output activation normalization (reverses input normalization)
+         x_recons = self.run_time_activation_norm_fn_out(x_recons)
+
+         return self.hook_sae_output(x_recons)
+
+     @override
+     def fold_W_dec_norm(self) -> None:
+         raise NotImplementedError("Folding W_dec_norm is not supported for TemporalSAE")
+
+     @override
+     @torch.no_grad()
+     def fold_activation_norm_scaling_factor(self, scaling_factor: float) -> None:
+         raise NotImplementedError(
+             "Folding activation norm scaling factor is not supported for TemporalSAE"
+         )
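
A usage sketch for the new architecture (shapes and dimensions below are hypothetical). Unlike the other SAE classes, TemporalSAE's attention layers attend over the sequence dimension, so inputs must be batched `(batch, seq, d_in)` sequences rather than flat token activations. Note also that `decode()` reconstructs only the novel part, with a warning; `forward()` decodes `z_novel + z_pred`:

```python
import torch

from sae_lens import TemporalSAE, TemporalSAEConfig

# d_sae must be divisible by bottleneck_factor * n_heads (512 with the defaults).
cfg = TemporalSAEConfig(d_in=64, d_sae=4096, kval_topk=16)  # hypothetical dims
sae = TemporalSAE(cfg)

x = torch.randn(2, 128, 64)  # (batch, seq, d_in)
z_novel, z_pred = sae.encode_with_predictions(x)
recons = sae(x)  # full reconstruction from z_novel + z_pred
assert recons.shape == x.shape
assert (z_novel != 0).sum(dim=-1).max() <= 16  # top-k sparsity on novel codes
```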
{sae_lens-6.17.0 → sae_lens-6.20.1}/sae_lens/training/activations_store.py

@@ -319,7 +319,7 @@ class ActivationsStore:
              )
          else:
              warnings.warn(
-                 "Dataset is not tokenized. Pre-tokenizing will improve performance and allows for more control over special tokens. See https://jbloomaus.github.io/SAELens/training_saes/#pretokenizing-datasets for more info."
+                 "Dataset is not tokenized. Pre-tokenizing will improve performance and allows for more control over special tokens. See https://decoderesearch.github.io/SAELens/training_saes/#pretokenizing-datasets for more info."
              )

          self.iterable_sequences = self._iterate_tokenized_sequences()