@synsci/cli-darwin-x64 1.1.49
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,346 @@
---
name: transformer-lens-interpretability
description: Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when reverse-engineering model algorithms, studying attention patterns, or performing activation patching experiments.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Mechanistic Interpretability, TransformerLens, Activation Patching, Circuit Analysis]
dependencies: [transformer-lens>=2.0.0, torch>=2.0.0]
---

# TransformerLens: Mechanistic Interpretability for Transformers

TransformerLens is the de facto standard library for mechanistic interpretability research on GPT-style language models. Created by Neel Nanda and maintained by Bryce Meyer, it provides clean interfaces for inspecting and manipulating model internals via HookPoints on every activation.

**GitHub**: [TransformerLensOrg/TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) (2,900+ stars)
## When to Use TransformerLens

**Use TransformerLens when you need to:**
- Reverse-engineer algorithms learned during training
- Perform activation patching / causal tracing experiments
- Study attention patterns and information flow
- Analyze circuits (e.g., induction heads, the IOI circuit)
- Cache and inspect intermediate activations
- Apply direct logit attribution

**Consider alternatives when:**
- You need to work with non-transformer architectures → use **nnsight** or **pyvene**
- You want to train or analyze Sparse Autoencoders → use **SAELens**
- You need remote execution on massive models → use **nnsight** with NDIF
- You want higher-level causal intervention abstractions → use **pyvene**

## Installation

```bash
pip install transformer-lens
```

For the development version:

```bash
pip install git+https://github.com/TransformerLensOrg/TransformerLens
```
## Core Concepts

### HookedTransformer

The main class, which wraps transformer models with HookPoints on every activation:

```python
from transformer_lens import HookedTransformer

# Load a model
model = HookedTransformer.from_pretrained("gpt2-small")

# For gated models (LLaMA, Mistral)
import os
os.environ["HF_TOKEN"] = "your_token"
model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")
```

### Supported Models (50+)

| Family | Models |
|--------|--------|
| GPT-2 | gpt2, gpt2-medium, gpt2-large, gpt2-xl |
| LLaMA | llama-7b, llama-13b, llama-2-7b, llama-2-13b |
| EleutherAI | pythia-70m to pythia-12b, gpt-neo, gpt-j-6b |
| Mistral | mistral-7b, mixtral-8x7b |
| Others | phi, qwen, opt, gemma |

### Activation Caching

Run the model and cache all intermediate activations:

```python
# Get all activations
tokens = model.to_tokens("The Eiffel Tower is in")
logits, cache = model.run_with_cache(tokens)

# Access specific activations
residual = cache["resid_post", 5]   # Layer 5 residual stream
attn_pattern = cache["pattern", 3]  # Layer 3 attention pattern
mlp_out = cache["mlp_out", 7]       # Layer 7 MLP output

# Filter which activations to cache (saves memory)
logits, cache = model.run_with_cache(
    tokens,
    names_filter=lambda name: "resid_post" in name
)
```
### ActivationCache Keys

| Key Pattern | Shape | Description |
|-------------|-------|-------------|
| `resid_pre, layer` | [batch, pos, d_model] | Residual stream before attention |
| `resid_mid, layer` | [batch, pos, d_model] | Residual stream after attention |
| `resid_post, layer` | [batch, pos, d_model] | Residual stream after MLP |
| `attn_out, layer` | [batch, pos, d_model] | Attention output |
| `mlp_out, layer` | [batch, pos, d_model] | MLP output |
| `pattern, layer` | [batch, head, q_pos, k_pos] | Attention pattern (post-softmax) |
| `q, layer` | [batch, pos, head, d_head] | Query vectors |
| `k, layer` | [batch, pos, head, d_head] | Key vectors |
| `v, layer` | [batch, pos, head, d_head] | Value vectors |
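These short keys are shorthand for full hook names such as `blocks.5.hook_resid_post`. TransformerLens exposes this mapping as `transformer_lens.utils.get_act_name`; the pure-Python sketch below mirrors that resolution for the keys in the table (the name templates are inferred from the hook names used in the workflows, so treat them as assumptions):

```python
def get_act_name(key: str, layer: int) -> str:
    """Resolve a short cache key plus layer index to a full hook name (sketch)."""
    # Keys that live directly on the transformer block
    block_level = {"resid_pre", "resid_mid", "resid_post", "attn_out", "mlp_out"}
    # Keys that live inside the attention submodule
    attn_level = {"pattern", "q", "k", "v", "z"}
    if key in block_level:
        return f"blocks.{layer}.hook_{key}"
    if key in attn_level:
        return f"blocks.{layer}.attn.hook_{key}"
    raise KeyError(f"unknown cache key: {key}")

# cache["resid_post", 5] and cache["blocks.5.hook_resid_post"] address the same tensor
print(get_act_name("resid_post", 5))  # blocks.5.hook_resid_post
print(get_act_name("pattern", 3))     # blocks.3.attn.hook_pattern
```

Either form works as an `ActivationCache` index; the tuple form is just less typing.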
## Workflow 1: Activation Patching (Causal Tracing)

Identify which activations causally affect model output by patching clean activations into corrupted runs.

### Step-by-Step

```python
from transformer_lens import HookedTransformer
import torch

model = HookedTransformer.from_pretrained("gpt2-small")

# 1. Define clean and corrupted prompts
# NB: the prompts should tokenize to the same length so positions align
clean_prompt = "The Eiffel Tower is in the city of"
corrupted_prompt = "The Colosseum is in the city of"

clean_tokens = model.to_tokens(clean_prompt)
corrupted_tokens = model.to_tokens(corrupted_prompt)

# 2. Get clean activations
_, clean_cache = model.run_with_cache(clean_tokens)

# 3. Define a metric (e.g., logit difference)
paris_token = model.to_single_token(" Paris")
rome_token = model.to_single_token(" Rome")

def metric(logits):
    return logits[0, -1, paris_token] - logits[0, -1, rome_token]

# 4. Patch each position and layer
results = torch.zeros(model.cfg.n_layers, clean_tokens.shape[1])

for layer in range(model.cfg.n_layers):
    for pos in range(clean_tokens.shape[1]):
        def patch_hook(activation, hook, pos=pos):  # bind pos at definition time
            activation[0, pos] = clean_cache[hook.name][0, pos]
            return activation

        patched_logits = model.run_with_hooks(
            corrupted_tokens,
            fwd_hooks=[(f"blocks.{layer}.hook_resid_post", patch_hook)]
        )
        results[layer, pos] = metric(patched_logits)

# 5. Visualize results (layer x position heatmap)
```

### Checklist
- [ ] Define clean and corrupted inputs that differ minimally
- [ ] Choose a metric that captures the behavior difference
- [ ] Cache clean activations
- [ ] Systematically patch each (layer, position) combination
- [ ] Visualize results as a heatmap
- [ ] Identify causal hotspots
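Before plotting the step-4 grid as a heatmap, it is common to normalize each patched metric against the clean and corrupted baselines, so every cell reads as "fraction of clean behavior restored". A minimal sketch (the helper name is illustrative, not part of the TransformerLens API):

```python
def normalize_patch_effect(patched: float, clean: float, corrupted: float) -> float:
    """Scale a patched metric: 0.0 = corrupted baseline, 1.0 = clean baseline."""
    return (patched - corrupted) / (clean - corrupted)

# Corrupted run scores 1.0, clean run 3.0, one patched run 2.5:
print(normalize_patch_effect(2.5, clean=3.0, corrupted=1.0))  # 0.75
```

Applying this cell-wise to `results` before plotting makes heatmaps comparable across prompts and metrics.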
## Workflow 2: Circuit Analysis (Indirect Object Identification)

Replicate the IOI circuit discovery from "Interpretability in the Wild".

### Step-by-Step

```python
from transformer_lens import HookedTransformer
import torch

model = HookedTransformer.from_pretrained("gpt2-small")

# IOI task: "When John and Mary went to the store, Mary gave a bottle to"
# Model should predict " John" (the indirect object)

prompt = "When John and Mary went to the store, Mary gave a bottle to"
tokens = model.to_tokens(prompt)

# 1. Get baseline logits
logits, cache = model.run_with_cache(tokens)

john_token = model.to_single_token(" John")
mary_token = model.to_single_token(" Mary")

# 2. Compute logit difference (IO - S)
logit_diff = logits[0, -1, john_token] - logits[0, -1, mary_token]
print(f"Logit difference: {logit_diff.item():.3f}")

# 3. Direct logit attribution by head
def get_head_contribution(layer, head):
    # Project head output to logits
    head_out = cache["z", layer][0, :, head, :]  # [pos, d_head]
    W_O = model.W_O[layer, head]  # [d_head, d_model]
    W_U = model.W_U  # [d_model, vocab]

    # Head contribution to logits at final position
    contribution = head_out[-1] @ W_O @ W_U
    return contribution[john_token] - contribution[mary_token]

# 4. Map all heads
head_contributions = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    for head in range(model.cfg.n_heads):
        head_contributions[layer, head] = get_head_contribution(layer, head)

# 5. Identify top contributing heads (name movers, backup name movers)
```

### Checklist
- [ ] Set up task with clear IO/S tokens
- [ ] Compute baseline logit difference
- [ ] Decompose by attention head contributions
- [ ] Identify key circuit components (name movers, S-inhibition, induction)
- [ ] Validate with ablation experiments

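Because attention output is linear in each head's `z`, the per-head values computed by `get_head_contribution` sum exactly to the attention layers' combined direct effect on the logits (biases and the final LayerNorm aside). A self-contained check of that additivity on random tensors (the shapes are arbitrary; this is the underlying linear algebra, not a model run):

```python
import torch

n_heads, d_head, d_model, vocab = 4, 8, 32, 50
z = torch.randn(n_heads, d_head)             # per-head outputs at the final position
W_O = torch.randn(n_heads, d_head, d_model)  # per-head output projections
W_U = torch.randn(d_model, vocab)            # unembedding

# Per-head direct logit attribution
per_head_logits = torch.stack([z[h] @ W_O[h] @ W_U for h in range(n_heads)])

# The layer's attention output is the sum over heads,
# so per-head logit contributions must sum to the total
attn_out = sum(z[h] @ W_O[h] for h in range(n_heads))
total_logits = attn_out @ W_U
assert torch.allclose(per_head_logits.sum(0), total_logits, atol=1e-4)
```

This is why zeroing a single head's entry in a decomposition like this predicts the first-order effect of ablating that head.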
## Workflow 3: Induction Head Detection

Find induction heads, which implement the [A][B]...[A] → [B] pattern.

```python
from transformer_lens import HookedTransformer
import torch

model = HookedTransformer.from_pretrained("gpt2-small")

# Create repeated sequence: [A][B][A] should predict [B]
repeated_tokens = torch.tensor([[1000, 2000, 1000]])  # Arbitrary token ids

_, cache = model.run_with_cache(repeated_tokens)

# Induction heads attend from the final [A] back to the first [B]
# Check attention from position 2 to position 1
induction_scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)

for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer][0]  # [head, q_pos, k_pos]
    # Attention from pos 2 to pos 1
    induction_scores[layer] = pattern[:, 2, 1]

# Heads with high scores are induction heads
top_heads = torch.topk(induction_scores.flatten(), k=5)
```

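The 3-token toy generalizes: repeat a random block of length `n` twice, and an induction head attends from each second-half query at position `i` back to position `i - (n - 1)`, i.e. an off-diagonal stripe of the attention pattern. A sketch of extracting that stripe with `torch.diagonal` (the pattern here is random dummy data, just to show the indexing):

```python
import torch

n = 5  # length of the repeated block
seq_len = 2 * n
n_heads = 8

# Dummy attention pattern [head, q_pos, k_pos], row-normalized like softmax output
pattern = torch.rand(n_heads, seq_len, seq_len)
pattern = pattern / pattern.sum(-1, keepdim=True)

# Induction stripe: entries pattern[:, i, i - (n - 1)]
stripe = pattern.diagonal(offset=-(n - 1), dim1=-2, dim2=-1)  # [head, n + 1]
induction_score = stripe[:, 1:].mean(-1)  # average over second-half queries only
```

With a real cached pattern in place of the random one, heads whose `induction_score` is far above the uniform baseline `1 / seq_len` are induction-head candidates.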
## Common Issues & Solutions

### Issue: Hooks persist after debugging
```python
# WRONG: Old hooks remain active
model.run_with_hooks(tokens, fwd_hooks=[...])  # Debug, add new hooks
model.run_with_hooks(tokens, fwd_hooks=[...])  # Old hooks still there!

# RIGHT: Always reset hooks
model.reset_hooks()
model.run_with_hooks(tokens, fwd_hooks=[...])
```

### Issue: Tokenization gotchas
```python
# WRONG: Assuming consistent tokenization
model.to_tokens("Tim")   # Single token
model.to_tokens("Neel")  # Becomes "Ne" + "el" (two tokens!)

# RIGHT: Check tokenization explicitly
tokens = model.to_tokens("Neel", prepend_bos=False)
print(model.to_str_tokens(tokens))  # ['Ne', 'el']
```

### Issue: LayerNorm ignored in analysis
```python
# WRONG: Ignoring LayerNorm
pre_activation = residual @ model.W_in[layer]

# RIGHT: Apply the block's LayerNorm first
ln_out = model.blocks[layer].ln2(residual)
pre_activation = ln_out @ model.W_in[layer]
```

### Issue: Memory explosion with large models
```python
# Use selective caching
logits, cache = model.run_with_cache(
    tokens,
    names_filter=lambda n: "resid_post" in n or "pattern" in n,
    device="cpu"  # Cache on CPU
)
```

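A back-of-the-envelope calculation shows why unfiltered caching explodes: each cached hook stores roughly `batch × seq × d_model` floats per layer, and a layer exposes many hooks. A rough estimator (gpt2-small has 12 layers and `d_model` 768; the ~10 hooks-per-layer factor is a loose assumption and varies by architecture):

```python
def estimate_cache_gb(n_layers, d_model, seq_len, batch=1,
                      hooks_per_layer=10, bytes_per_float=4):
    # Very rough upper bound; real size depends on which hooks actually fire
    n_floats = n_layers * hooks_per_layer * batch * seq_len * d_model
    return n_floats * bytes_per_float / 1e9

small = estimate_cache_gb(n_layers=12, d_model=768, seq_len=1024)   # gpt2-small
large = estimate_cache_gb(n_layers=32, d_model=4096, seq_len=1024)  # a 7B-class model
print(f"gpt2-small: {small:.2f} GB, 7B-class: {large:.2f} GB")
```

Even this crude estimate shows a 7B-class model needs several GB per 1k-token sequence, which is why `names_filter` and `device="cpu"` matter.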
## Key Classes Reference

| Class | Purpose |
|-------|---------|
| `HookedTransformer` | Main model wrapper with hooks |
| `ActivationCache` | Dictionary-like cache of activations |
| `HookedTransformerConfig` | Model configuration |
| `FactoredMatrix` | Efficient factored matrix operations |

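The point of `FactoredMatrix` can be seen in plain torch: a low-rank product like `W_V @ W_O` is a full `[d_model, d_model]` matrix, but keeping the two factors and re-associating products avoids ever materializing it. A sketch of the idea (pure torch; not the actual `FactoredMatrix` API):

```python
import torch

d_model, d_head = 64, 8
W_V = torch.randn(d_model, d_head)
W_O = torch.randn(d_head, d_model)
v = torch.randn(d_model)

dense = (W_V @ W_O) @ v     # materializes a [d_model, d_model] matrix first
factored = W_V @ (W_O @ v)  # only the small factors are ever touched

assert torch.allclose(dense, factored, atol=1e-3)
# The product's rank is at most d_head, so spectral analysis stays cheap on the factors
```
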
## Integration with SAELens

TransformerLens integrates with SAELens for Sparse Autoencoder analysis:

```python
from transformer_lens import HookedTransformer
from sae_lens import SAE

model = HookedTransformer.from_pretrained("gpt2-small")
sae = SAE.from_pretrained("gpt2-small-res-jb", "blocks.8.hook_resid_pre")

# Run with SAE
tokens = model.to_tokens("Hello world")
_, cache = model.run_with_cache(tokens)
sae_acts = sae.encode(cache["resid_pre", 8])
```

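Conceptually, `sae.encode` projects residual-stream vectors into a much wider, sparsely active feature basis. A toy sketch of the kind of computation involved (the ReLU-linear form, the weight names, and all shapes here are illustrative assumptions, not SAELens internals):

```python
import torch

torch.manual_seed(0)
d_model, d_sae = 16, 64  # overcomplete: far more features than dimensions
W_enc = torch.randn(d_model, d_sae) * 0.1
b_enc = torch.zeros(d_sae)

resid = torch.randn(3, d_model)  # stand-in for cached residual vectors [pos, d_model]
feature_acts = torch.relu(resid @ W_enc + b_enc)  # [pos, d_sae], non-negative
```

Nonzero entries of `feature_acts` are the "active features" that SAE-based interpretability then labels and studies.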
## Reference Documentation

For detailed API documentation, tutorials, and advanced usage, see the `references/` folder:

| File | Contents |
|------|----------|
| [references/README.md](references/README.md) | Overview and quick start guide |
| [references/api.md](references/api.md) | Complete API reference for HookedTransformer, ActivationCache, HookPoints |
| [references/tutorials.md](references/tutorials.md) | Step-by-step tutorials for activation patching, circuit analysis, logit lens |

## External Resources

### Tutorials
- [Main Demo Notebook](https://transformerlensorg.github.io/TransformerLens/generated/demos/Main_Demo.html)
- [Activation Patching Demo](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Activation_Patching_in_TL_Demo.ipynb)
- [ARENA Mech Interp Course](https://arena-foundation.github.io/ARENA/) - 200+ hours of tutorials

### Papers
- [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html)
- [In-context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
- [Interpretability in the Wild (IOI)](https://arxiv.org/abs/2211.00593)

### Official Documentation
- [Official Docs](https://transformerlensorg.github.io/TransformerLens/)
- [Model Properties Table](https://transformerlensorg.github.io/TransformerLens/generated/model_properties_table.html)
- [Neel Nanda's Glossary](https://www.neelnanda.io/mechanistic-interpretability/glossary)

## Version Notes

- **v2.0**: Removed HookedSAE (moved to SAELens)
- **v3.0 (alpha)**: TransformerBridge for loading any nn.Module

---

# TransformerLens Reference Documentation

This directory contains comprehensive reference materials for TransformerLens.

## Contents

- [api.md](api.md) - Complete API reference for HookedTransformer, ActivationCache, and HookPoints
- [tutorials.md](tutorials.md) - Step-by-step tutorials for common interpretability workflows
- [papers.md](papers.md) - Key research papers and foundational concepts

## Quick Links

- **Official Documentation**: https://transformerlensorg.github.io/TransformerLens/
- **GitHub Repository**: https://github.com/TransformerLensOrg/TransformerLens
- **Model Properties Table**: https://transformerlensorg.github.io/TransformerLens/generated/model_properties_table.html

## Installation

```bash
pip install transformer-lens
```

## Basic Usage

```python
from transformer_lens import HookedTransformer

# Load model
model = HookedTransformer.from_pretrained("gpt2-small")

# Run with activation caching
tokens = model.to_tokens("Hello world")
logits, cache = model.run_with_cache(tokens)

# Access activations
residual = cache["resid_post", 5]  # Layer 5 residual stream
attention = cache["pattern", 3]    # Layer 3 attention patterns
```

## Key Concepts

### HookPoints
Every activation in the transformer has a HookPoint wrapper, enabling:
- Reading activations via `run_with_cache()`
- Modifying activations via `run_with_hooks()`

### Activation Cache
The `ActivationCache` stores all intermediate activations, with helper methods for:
- Residual stream decomposition
- Logit attribution
- Layer-wise analysis

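Residual stream decomposition and logit attribution both rest on the fact that the final residual is an exact sum of the embedding and every layer's attention and MLP outputs, so any linear readout of the residual decomposes the same way. A toy check of that additivity (random tensors stand in for real cached activations):

```python
import torch

n_layers, d_model, vocab = 4, 16, 100
embed = torch.randn(d_model)
attn_outs = torch.randn(n_layers, d_model)
mlp_outs = torch.randn(n_layers, d_model)
W_U = torch.randn(d_model, vocab)

# The residual stream accumulates every component additively
resid_final = embed + attn_outs.sum(0) + mlp_outs.sum(0)

# So per-component logit attributions sum to the full logits (pre-LayerNorm)
components = torch.cat([embed[None], attn_outs, mlp_outs])  # [1 + 2*n_layers, d_model]
per_component_logits = components @ W_U
assert torch.allclose(per_component_logits.sum(0), resid_final @ W_U, atol=1e-3)
```
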
### Supported Models (50+)
GPT-2, LLaMA, Mistral, Pythia, GPT-Neo, OPT, Gemma, Phi, and more.