@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
package/bin/skills/nnsight/SKILL.md
@@ -0,0 +1,436 @@
---
name: nnsight-remote-interpretability
description: Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [nnsight, NDIF, Remote Execution, Mechanistic Interpretability, Model Internals]
dependencies: [nnsight>=0.5.0, torch>=2.0.0]
---

# nnsight: Transparent Access to Neural Network Internals

nnsight (/ɛn.saɪt/) enables researchers to interpret and manipulate the internals of any PyTorch model, with the unique capability of running the same code locally on small models or remotely on massive models (70B+) via NDIF.

**GitHub**: [ndif-team/nnsight](https://github.com/ndif-team/nnsight) (730+ stars)
**Paper**: [NNsight and NDIF: Democratizing Access to Foundation Model Internals](https://arxiv.org/abs/2407.14561) (ICLR 2025)

## Key Value Proposition

**Write once, run anywhere**: The same interpretability code works on GPT-2 locally or Llama-3.1-405B remotely. Just toggle `remote=True`.

```python
# Local execution (small model)
with model.trace("Hello world"):
    hidden = model.transformer.h[5].output[0].save()

# Remote execution (massive model) - same code!
with model.trace("Hello world", remote=True):
    hidden = model.model.layers[40].output[0].save()
```

## When to Use nnsight

**Use nnsight when you need to:**
- Run interpretability experiments on models too large for local GPUs (70B, 405B)
- Work with any PyTorch architecture (transformers, Mamba, custom models)
- Perform multi-token generation interventions
- Share activations between different prompts
- Access full model internals without reimplementation

**Consider alternatives when:**
- You want a consistent API across models → Use **TransformerLens**
- You need declarative, shareable interventions → Use **pyvene**
- You're training SAEs → Use **SAELens**
- You only work with small models locally → **TransformerLens** may be simpler

## Installation

```bash
# Basic installation
pip install nnsight

# For vLLM support
pip install "nnsight[vllm]"
```

For remote NDIF execution, sign up at [login.ndif.us](https://login.ndif.us) for an API key.
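Once you have a key, nnsight's remote-execution tutorials register it through the library's global `CONFIG` object. The snippet below follows that documented pattern; the exact call name may vary between nnsight versions, and `"YOUR_API_KEY"` is a placeholder.

```python
from nnsight import CONFIG

# Register the NDIF API key once; subsequent remote traces pick it up.
CONFIG.set_default_api_key("YOUR_API_KEY")
```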
## Core Concepts

### LanguageModel Wrapper

```python
from nnsight import LanguageModel

# Load model (uses HuggingFace under the hood)
model = LanguageModel("openai-community/gpt2", device_map="auto")

# For larger models
model = LanguageModel("meta-llama/Llama-3.1-8B", device_map="auto")
```

### Tracing Context

The `trace` context manager enables deferred execution: operations are collected into a computation graph.

```python
from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto")

with model.trace("The Eiffel Tower is in") as tracer:
    # Access any module's output
    hidden_states = model.transformer.h[5].output[0].save()

    # Access attention patterns
    attn = model.transformer.h[5].attn.attn_dropout.input[0][0].save()

    # Modify activations
    model.transformer.h[8].output[0][:] = 0  # Zero out layer 8

    # Get final output
    logits = model.output.save()

# After context exits, access saved values
print(hidden_states.shape)  # [batch, seq, hidden]
```

### Proxy Objects

Inside `trace`, module accesses return Proxy objects that record operations:

```python
with model.trace("Hello"):
    # These are all Proxy objects - operations are deferred
    h5_out = model.transformer.h[5].output[0]  # Proxy
    h5_mean = h5_out.mean(dim=-1)              # Proxy
    h5_saved = h5_mean.save()                  # Save for later access
```
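The deferred-execution idea behind these proxies can be illustrated without nnsight at all. The toy `Proxy` class below is a hypothetical sketch, not nnsight's implementation: it records operations instead of running them, and only materializes values when the "graph" is executed.

```python
class Proxy:
    """Toy stand-in for a deferred value: records operations, computes later."""

    def __init__(self, compute):
        self.compute = compute  # zero-argument function producing the value

    def mean(self):
        # Record a mean over the eventual value instead of computing it now
        return Proxy(lambda: sum(self.compute()) / len(self.compute()))

    def save(self, saved):
        # Mark this node so its value is kept after graph execution
        saved.append(self)
        return self


saved = []
hidden = Proxy(lambda: [1.0, 2.0, 3.0])  # stands in for a module output
hidden.mean().save(saved)                # nothing is computed yet

# "Context exit": execute the recorded graph and materialize saved values
results = [p.compute() for p in saved]
print(results)  # [2.0]
```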
|
111
|
+
## Workflow 1: Activation Analysis
|
|
112
|
+
|
|
113
|
+
### Step-by-Step
|
|
114
|
+
|
|
115
|
+
```python
|
|
116
|
+
from nnsight import LanguageModel
|
|
117
|
+
import torch
|
|
118
|
+
|
|
119
|
+
model = LanguageModel("gpt2", device_map="auto")
|
|
120
|
+
|
|
121
|
+
prompt = "The capital of France is"
|
|
122
|
+
|
|
123
|
+
with model.trace(prompt) as tracer:
|
|
124
|
+
# 1. Collect activations from multiple layers
|
|
125
|
+
layer_outputs = []
|
|
126
|
+
for i in range(12): # GPT-2 has 12 layers
|
|
127
|
+
layer_out = model.transformer.h[i].output[0].save()
|
|
128
|
+
layer_outputs.append(layer_out)
|
|
129
|
+
|
|
130
|
+
# 2. Get attention patterns
|
|
131
|
+
attn_patterns = []
|
|
132
|
+
for i in range(12):
|
|
133
|
+
# Access attention weights (after softmax)
|
|
134
|
+
attn = model.transformer.h[i].attn.attn_dropout.input[0][0].save()
|
|
135
|
+
attn_patterns.append(attn)
|
|
136
|
+
|
|
137
|
+
# 3. Get final logits
|
|
138
|
+
logits = model.output.save()
|
|
139
|
+
|
|
140
|
+
# 4. Analyze outside context
|
|
141
|
+
for i, layer_out in enumerate(layer_outputs):
|
|
142
|
+
print(f"Layer {i} output shape: {layer_out.shape}")
|
|
143
|
+
print(f"Layer {i} norm: {layer_out.norm().item():.3f}")
|
|
144
|
+
|
|
145
|
+
# 5. Find top predictions
|
|
146
|
+
probs = torch.softmax(logits[0, -1], dim=-1)
|
|
147
|
+
top_tokens = probs.topk(5)
|
|
148
|
+
for token, prob in zip(top_tokens.indices, top_tokens.values):
|
|
149
|
+
print(f"{model.tokenizer.decode(token)}: {prob.item():.3f}")
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
### Checklist
|
|
153
|
+
- [ ] Load model with LanguageModel wrapper
|
|
154
|
+
- [ ] Use trace context for operations
|
|
155
|
+
- [ ] Call `.save()` on values you need after context
|
|
156
|
+
- [ ] Access saved values outside context
|
|
157
|
+
- [ ] Use `.shape`, `.norm()`, etc. for analysis
|
|
158
|
+
|
|
159
|
+
## Workflow 2: Activation Patching

### Step-by-Step

```python
from nnsight import LanguageModel
import torch

model = LanguageModel("gpt2", device_map="auto")

# NOTE: for in-place patching, the two prompts should tokenize to the
# same number of tokens so the activation shapes match.
clean_prompt = "The Eiffel Tower is in"
corrupted_prompt = "The Colosseum is in"

# 1. Get clean activations
with model.trace(clean_prompt) as tracer:
    clean_hidden = model.transformer.h[8].output[0].save()

# 2. Patch clean into corrupted run
with model.trace(corrupted_prompt) as tracer:
    # Replace layer 8 output with clean activations
    model.transformer.h[8].output[0][:] = clean_hidden

    patched_logits = model.output.save()

# 3. Compare predictions
paris_token = model.tokenizer.encode(" Paris")[0]
rome_token = model.tokenizer.encode(" Rome")[0]

patched_probs = torch.softmax(patched_logits[0, -1], dim=-1)
print(f"Paris prob: {patched_probs[paris_token].item():.3f}")
print(f"Rome prob: {patched_probs[rome_token].item():.3f}")
```
### Systematic Patching Sweep

```python
def patch_layer_position(layer, position, clean_cache, corrupted_prompt):
    """Patch a single layer/position from clean into the corrupted run."""
    with model.trace(corrupted_prompt) as tracer:
        # Get current activation
        current = model.transformer.h[layer].output[0]

        # Patch only the specific position
        current[:, position, :] = clean_cache[layer][:, position, :]

        logits = model.output.save()

    return logits

# Sweep over all layers and positions
# (clean_cache maps layer index -> saved clean activations;
#  seq_len and compute_metric come from your experiment setup)
results = torch.zeros(12, seq_len)
for layer in range(12):
    for pos in range(seq_len):
        logits = patch_layer_position(layer, pos, clean_cache, corrupted_prompt)
        results[layer, pos] = compute_metric(logits)
```
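The sweep above leaves `compute_metric` to your experiment setup. A common choice is the logit difference between the correct and distractor answers at the final position; a minimal sketch (token arguments are made explicit here for illustration, whereas the sweep would close over them):

```python
import torch

def compute_metric(logits, target_token=0, distractor_token=1):
    """Logit difference at the last position; higher means the patch
    restored more of the clean behavior."""
    last = logits[0, -1]
    return (last[target_token] - last[distractor_token]).item()

# Stand-in logits shaped [batch, seq, vocab]
logits = torch.zeros(1, 4, 10)
logits[0, -1, 0] = 2.0
logits[0, -1, 1] = 0.5
print(compute_metric(logits, target_token=0, distractor_token=1))  # 1.5
```

For the Paris/Rome example, `target_token` and `distractor_token` would be the encoded " Paris" and " Rome" ids.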
## Workflow 3: Remote Execution with NDIF

Run the same experiments on massive models without local GPUs.

### Step-by-Step

```python
from nnsight import LanguageModel

# 1. Load large model (will run remotely)
model = LanguageModel("meta-llama/Llama-3.1-70B")

# 2. Same code, just add remote=True
with model.trace("The meaning of life is", remote=True) as tracer:
    # Access internals of 70B model!
    layer_40_out = model.model.layers[40].output[0].save()
    logits = model.output.save()

# 3. Results returned from NDIF
print(f"Layer 40 shape: {layer_40_out.shape}")

# 4. Generation with interventions
with model.trace(remote=True) as tracer:
    with tracer.invoke("What is 2+2?"):
        # Intervene during generation
        model.model.layers[20].output[0][:, -1, :] *= 1.5

output = model.generate(max_new_tokens=50)
```
### NDIF Setup

1. Sign up at [login.ndif.us](https://login.ndif.us)
2. Get API key
3. Set environment variable or pass to nnsight:

```python
import os
os.environ["NDIF_API_KEY"] = "your_key"

# Or configure directly
from nnsight import CONFIG
CONFIG.API_KEY = "your_key"
```

### Available Models on NDIF

- Llama-3.1-8B, 70B, 405B
- DeepSeek-R1 models
- Various open-weight models (check [ndif.us](https://ndif.us) for current list)
## Workflow 4: Cross-Prompt Activation Sharing

Share activations between different inputs in a single trace.

```python
from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto")

with model.trace() as tracer:
    # First prompt
    with tracer.invoke("The cat sat on the"):
        cat_hidden = model.transformer.h[6].output[0].save()

    # Second prompt - inject cat's activations
    with tracer.invoke("The dog ran through the"):
        # Replace with cat's activations at layer 6
        model.transformer.h[6].output[0][:] = cat_hidden
        dog_with_cat = model.output.save()

# The dog prompt now has cat's internal representations
```
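To quantify how much the injection shifted the model's next-token prediction, you can compare the patched and baseline distributions; a minimal sketch with stand-in logits (the `next_token_kl` helper is illustrative, not an nnsight API):

```python
import torch
import torch.nn.functional as F

def next_token_kl(logits_a, logits_b):
    """KL(p_a || p_b) between next-token distributions at the last position."""
    log_p = F.log_softmax(logits_a[0, -1], dim=-1)
    log_q = F.log_softmax(logits_b[0, -1], dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum().item()

# Stand-in logits shaped [batch, seq, vocab]
baseline = torch.zeros(1, 5, 100)
patched = torch.zeros(1, 5, 100)
patched[0, -1, 3] = 4.0  # the injection boosted one token

print(next_token_kl(patched, baseline))  # > 0: distributions diverged
```

In the workflow above, the baseline would come from running the dog prompt once without the injection.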
## Workflow 5: Gradient-Based Analysis

Access gradients during the backward pass.

```python
from nnsight import LanguageModel
import torch

model = LanguageModel("gpt2", device_map="auto")

with model.trace("The quick brown fox") as tracer:
    # Save activations and enable gradient retention
    hidden = model.transformer.h[5].output[0].save()
    hidden.retain_grad()

    logits = model.output

    # Compute loss on a specific token
    target_token = model.tokenizer.encode(" jumps")[0]
    loss = -logits[0, -1, target_token]

    # Backward pass
    loss.backward()

# Access gradients
grad = hidden.grad
print(f"Gradient shape: {grad.shape}")
print(f"Gradient norm: {grad.norm().item():.3f}")
```

**Note**: Gradient access is not supported for vLLM or remote execution.
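Once the saved activation and its gradient are available outside the trace, a common follow-up is gradient-times-activation attribution per token; a minimal sketch with stand-in tensors (the `grad_x_act` helper is illustrative, not an nnsight API):

```python
import torch

def grad_x_act(hidden, grad):
    """Per-token attribution: gradient * activation summed over the hidden dim."""
    return (hidden * grad).sum(dim=-1)

# Stand-in tensors shaped [batch, seq, hidden]
hidden = torch.ones(1, 4, 8)
grad = torch.full((1, 4, 8), 0.5)

scores = grad_x_act(hidden, grad)
print(scores.shape)  # torch.Size([1, 4])
```

Tokens with large positive scores contributed most toward increasing the target logit.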
## Common Issues & Solutions

### Issue: Module path differs between models
```python
# GPT-2 structure
model.transformer.h[5].output[0]

# LLaMA structure
model.model.layers[5].output[0]

# Solution: Check model structure
print(model._model)  # See actual module names
```
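Rather than eyeballing the printout, you can enumerate candidate layer stacks programmatically; a minimal sketch in plain PyTorch (run it on the unwrapped module, e.g. `model._model`; the helper name is illustrative):

```python
import torch.nn as nn

def find_block_lists(module):
    """Return paths of every nn.ModuleList - usually the transformer layer stacks."""
    return [name for name, m in module.named_modules()
            if isinstance(m, nn.ModuleList)]

# Toy model with a GPT-2-like layout
toy = nn.Module()
toy.transformer = nn.Module()
toy.transformer.h = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])

print(find_block_lists(toy))  # ['transformer.h']
```

On GPT-2 this surfaces `transformer.h`; on LLaMA-family models, `model.layers`.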
### Issue: Forgetting to save
```python
# WRONG: Value not accessible outside trace
with model.trace("Hello"):
    hidden = model.transformer.h[5].output[0]  # Not saved!

print(hidden)  # Error or wrong value

# RIGHT: Call .save()
with model.trace("Hello"):
    hidden = model.transformer.h[5].output[0].save()

print(hidden)  # Works!
```

### Issue: Remote timeout
```python
# For long operations, increase the timeout
with model.trace("prompt", remote=True, timeout=300) as tracer:
    ...  # Long operation
```

### Issue: Memory with many saved activations
```python
# Only save what you need
with model.trace("prompt"):
    # Don't save everything
    for i in range(len(model.transformer.h)):
        model.transformer.h[i].output[0].save()  # Memory heavy!

    # Better: save specific layers
    key_layers = [0, 5, 11]
    for i in key_layers:
        model.transformer.h[i].output[0].save()
```

### Issue: vLLM gradient limitation
```python
# vLLM doesn't support gradients
# Use standard execution for gradient analysis
model = LanguageModel("gpt2", device_map="auto")  # Not vLLM
```
## Key API Reference

| Method/Property | Purpose |
|-----------------|---------|
| `model.trace(prompt, remote=False)` | Start tracing context |
| `proxy.save()` | Save value for access after trace |
| `proxy[:]` | Slice/index proxy (assignment patches) |
| `tracer.invoke(prompt)` | Add prompt within trace |
| `model.generate(...)` | Generate with interventions |
| `model.output` | Final model output logits |
| `model._model` | Underlying HuggingFace model |
## Comparison with Other Tools

| Feature | nnsight | TransformerLens | pyvene |
|---------|---------|-----------------|--------|
| Any architecture | Yes | Transformers only | Yes |
| Remote execution | Yes (NDIF) | No | No |
| Consistent API | No | Yes | Yes |
| Deferred execution | Yes | No | No |
| HuggingFace native | Yes | Reimplemented | Yes |
| Shareable configs | No | No | Yes |
## Reference Documentation

For detailed API documentation, tutorials, and advanced usage, see the `references/` folder:

| File | Contents |
|------|----------|
| [references/README.md](references/README.md) | Overview and quick start guide |
| [references/api.md](references/api.md) | Complete API reference for LanguageModel, tracing, proxy objects |
| [references/tutorials.md](references/tutorials.md) | Step-by-step tutorials for local and remote interpretability |

## External Resources

### Tutorials
- [Getting Started](https://nnsight.net/start/)
- [Features Overview](https://nnsight.net/features/)
- [Remote Execution](https://nnsight.net/notebooks/features/remote_execution/)
- [Applied Tutorials](https://nnsight.net/applied_tutorials/)

### Official Documentation
- [Official Docs](https://nnsight.net/documentation/)
- [NDIF Info](https://ndif.us/)
- [Community Forum](https://discuss.ndif.us/)

### Papers
- [NNsight and NDIF Paper](https://arxiv.org/abs/2407.14561) - Fiotto-Kaufman et al. (ICLR 2025)

## Architecture Support

nnsight works with any PyTorch model:

- **Transformers**: GPT-2, LLaMA, Mistral, etc.
- **State Space Models**: Mamba
- **Vision Models**: ViT, CLIP
- **Custom architectures**: Any nn.Module

The key is knowing the module structure to access the right components.
---
# nnsight Reference Documentation

This directory contains comprehensive reference materials for nnsight.

## Contents

- [api.md](api.md) - Complete API reference for LanguageModel, tracing, and proxy objects
- [tutorials.md](tutorials.md) - Step-by-step tutorials for local and remote interpretability

## Quick Links

- **Official Documentation**: https://nnsight.net/
- **GitHub Repository**: https://github.com/ndif-team/nnsight
- **NDIF (Remote Execution)**: https://ndif.us/
- **Community Forum**: https://discuss.ndif.us/
- **Paper**: https://arxiv.org/abs/2407.14561 (ICLR 2025)
## Installation

```bash
# Basic installation
pip install nnsight

# For vLLM support
pip install "nnsight[vllm]"
```

## Basic Usage

```python
from nnsight import LanguageModel

# Load model
model = LanguageModel("openai-community/gpt2", device_map="auto")

# Trace and access internals
with model.trace("The Eiffel Tower is in") as tracer:
    # Access layer output
    hidden = model.transformer.h[5].output[0].save()

    # Modify activations
    model.transformer.h[8].output[0][:] *= 0.5

    # Get final output
    logits = model.output.save()

# Access saved values outside context
print(hidden.shape)
```
## Key Concepts

### Tracing
The `trace()` context enables deferred execution - operations are recorded and executed together.

### Proxy Objects
Inside trace, module accesses return Proxies. Call `.save()` to retrieve values after execution.

### Remote Execution (NDIF)
Run the same code on massive models (70B+) without local GPUs:

```python
# Same code, just add remote=True
with model.trace("Hello", remote=True):
    hidden = model.model.layers[40].output[0].save()
```

## NDIF Setup

1. Sign up at https://login.ndif.us/
2. Get API key
3. Set environment variable: `export NDIF_API_KEY=your_key`

## Available Remote Models

- Llama-3.1-8B, 70B, 405B
- DeepSeek-R1 models
- More at https://ndif.us/