@synsci/cli-darwin-arm64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,469 @@
---
name: implementing-llms-litgpt
description: Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Model Architecture, LitGPT, Lightning AI, LLM Implementation, LoRA, QLoRA, Fine-Tuning, Llama, Gemma, Phi, Mistral, Educational]
dependencies: [litgpt, torch, transformers]
---

# LitGPT - Clean LLM Implementations

## Quick start

LitGPT provides 20+ pretrained LLM implementations with clean, readable code and production-ready training workflows.

**Installation**:
```bash
pip install 'litgpt[extra]'
```

**Load and use any model**:
```python
from litgpt import LLM

# Load a pretrained model
llm = LLM.load("microsoft/phi-2")

# Generate text
result = llm.generate(
    "What is the capital of France?",
    max_new_tokens=50,
    temperature=0.7
)
print(result)
```

**List available models**:
```bash
litgpt download list
```

## Common workflows

### Workflow 1: Fine-tune on a custom dataset

Copy this checklist:

```
Fine-Tuning Setup:
- [ ] Step 1: Download pretrained model
- [ ] Step 2: Prepare dataset
- [ ] Step 3: Configure training
- [ ] Step 4: Run fine-tuning
```

**Step 1: Download a pretrained model**

```bash
# Download Llama 3 8B
litgpt download meta-llama/Meta-Llama-3-8B

# Download Phi-2 (smaller, faster)
litgpt download microsoft/phi-2

# Download Gemma 2B
litgpt download google/gemma-2b
```

Models are saved to the `checkpoints/` directory.

**Step 2: Prepare the dataset**

LitGPT supports multiple dataset formats.

**Alpaca format** (instruction-response):
```json
[
  {
    "instruction": "What is the capital of France?",
    "input": "",
    "output": "The capital of France is Paris."
  },
  {
    "instruction": "Translate to Spanish: Hello, how are you?",
    "input": "",
    "output": "Hola, ¿cómo estás?"
  }
]
```

Save it as `data/my_dataset.json`.
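Before launching a training run, it can be worth sanity-checking the file with a few lines of stdlib Python. This is a hypothetical helper for illustration, not part of LitGPT:

```python
REQUIRED_KEYS = {"instruction", "input", "output"}

def validate_alpaca(records):
    """Check that every record has exactly the Alpaca keys and string values."""
    for i, rec in enumerate(records):
        if set(rec) != REQUIRED_KEYS:
            raise ValueError(f"record {i}: keys {sorted(rec)} != {sorted(REQUIRED_KEYS)}")
        for key, value in rec.items():
            if not isinstance(value, str):
                raise TypeError(f"record {i}: {key!r} must be a string")
    return len(records)

data = [
    {"instruction": "What is the capital of France?", "input": "",
     "output": "The capital of France is Paris."},
]
print(validate_alpaca(data))  # → 1
```

Run it over `json.load(open("data/my_dataset.json"))` before Step 3; a malformed record fails fast here instead of mid-training.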
**Step 3: Configure training**

```bash
# Full fine-tuning (requires a 40GB+ GPU for 7B-class models)
litgpt finetune \
  meta-llama/Meta-Llama-3-8B \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --train.max_steps 1000 \
  --train.learning_rate 2e-5 \
  --train.micro_batch_size 1 \
  --train.global_batch_size 16

# LoRA fine-tuning (efficient, fits a 16GB GPU)
litgpt finetune_lora \
  microsoft/phi-2 \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.05 \
  --train.max_steps 1000 \
  --train.learning_rate 1e-4
```

**Step 4: Run fine-tuning**

Training saves checkpoints to `out/finetune/` automatically.

Monitor training:
```bash
# View logs
tail -f out/finetune/logs.txt

# TensorBoard (if using --train.logger_name tensorboard)
tensorboard --logdir out/finetune/lightning_logs
```
### Workflow 2: LoRA fine-tuning on a single GPU

This is the most memory-efficient option.

```
LoRA Training:
- [ ] Step 1: Choose base model
- [ ] Step 2: Configure LoRA parameters
- [ ] Step 3: Train with LoRA
- [ ] Step 4: Merge LoRA weights (optional)
```

**Step 1: Choose a base model**

For limited GPU memory (12-16GB):
- **Phi-2** (2.7B) - best quality/size tradeoff
- **Llama 3.2 1B** - smallest, fastest
- **Gemma 2B** - good reasoning

**Step 2: Configure LoRA parameters**

```bash
litgpt finetune_lora \
  microsoft/phi-2 \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.05 \
  --lora_query true \
  --lora_key false \
  --lora_value true \
  --lora_projection true \
  --lora_mlp false \
  --lora_head false
```

Flag guide:
- `--lora_r`: LoRA rank (8-64; higher = more capacity)
- `--lora_alpha`: LoRA scaling factor (typically 2×r)
- `--lora_dropout`: dropout on the LoRA layers to prevent overfitting
- `--lora_query`, `--lora_value`, `--lora_projection`: apply LoRA to the query, value, and output projections
- `--lora_key`, `--lora_mlp`, `--lora_head`: usually not needed

LoRA rank guide:
- `r=8`: lightweight, 2-4MB adapters
- `r=16`: standard, good quality
- `r=32`: high capacity, use for complex tasks
- `r=64`: maximum quality, 4× larger adapters
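The scaling behind the rank guide can be sketched directly: each adapted weight matrix gains two low-rank factors totalling r·(d_in + d_out) parameters, so trainable parameter count (and adapter file size) grows linearly with r. The dimensions below are assumed Phi-2-like round numbers for illustration, not values read from a LitGPT config:

```python
def lora_params(r, d_in, d_out):
    # A LoRA adapter factors the weight update as B @ A, with
    # A of shape (r, d_in) and B of shape (d_out, r).
    return r * (d_in + d_out)

# Assumed Phi-2-like dimensions (illustration only)
d_model = 2560
n_layers = 32

for r in (8, 16, 32, 64):
    # query + value + output projection per layer, each d_model x d_model
    per_layer = 3 * lora_params(r, d_model, d_model)
    total = n_layers * per_layer
    print(f"r={r}: {total / 1e6:.1f}M trainable parameters")
```

Doubling r doubles the adapter, which is why `r=64` adapters come out 4× larger than `r=16` ones.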
**Step 3: Train with LoRA**

```bash
litgpt finetune_lora \
  microsoft/phi-2 \
  --data JSON \
  --data.json_path data/my_dataset.json \
  --lora_r 16 \
  --train.epochs 3 \
  --train.learning_rate 1e-4 \
  --train.micro_batch_size 4 \
  --train.global_batch_size 32 \
  --out_dir out/phi2-lora

# Memory usage: ~8-12GB for Phi-2 with LoRA
```

**Step 4: Merge LoRA weights** (optional)

Merge the LoRA adapters into the base model for deployment:

```bash
litgpt merge_lora \
  out/phi2-lora/final \
  --out_dir out/phi2-merged
```

Then use the merged model:
```python
from litgpt import LLM

llm = LLM.load("out/phi2-merged")
```
|
|
207
|
+
|
|
208
|
+
### Workflow 3: Pretrain from scratch
|
|
209
|
+
|
|
210
|
+
Train new model on your domain data.
|
|
211
|
+
|
|
212
|
+
```
|
|
213
|
+
Pretraining:
|
|
214
|
+
- [ ] Step 1: Prepare pretraining dataset
|
|
215
|
+
- [ ] Step 2: Configure model architecture
|
|
216
|
+
- [ ] Step 3: Set up multi-GPU training
|
|
217
|
+
- [ ] Step 4: Launch pretraining
|
|
218
|
+
```

**Step 1: Prepare pretraining dataset**

LitGPT expects tokenized data. Use `prepare_dataset.py`:

```bash
python scripts/prepare_dataset.py \
  --source_path data/my_corpus.txt \
  --checkpoint_dir checkpoints/tokenizer \
  --destination_path data/pretrain \
  --split train,val
```

**Step 2: Configure model architecture**

Edit a config file or use an existing one:

```yaml
# config/pythia-160m.yaml
model_name: pythia-160m
block_size: 2048
vocab_size: 50304
n_layer: 12
n_head: 12
n_embd: 768
rotary_percentage: 0.25
parallel_residual: true
bias: true
```
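
As a sanity check on the config, the implied parameter count can be estimated from `n_layer`, `n_embd`, and `vocab_size` (a rough GPT-NeoX-style formula that ignores biases and layernorms; Pythia uses an untied output head):

```python
vocab_size, n_layer, n_embd = 50304, 12, 768

embed = vocab_size * n_embd               # input token embedding
head = vocab_size * n_embd                # untied output projection
per_layer = 12 * n_embd ** 2              # attention (4*d^2) + MLP (8*d^2)
total = embed + head + n_layer * per_layer
print(f"~{total / 1e6:.0f}M parameters")  # close to the "160m" in the model name
```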

**Step 3: Set up multi-GPU training**

```bash
# Single GPU
litgpt pretrain \
  --config config/pythia-160m.yaml \
  --data.data_dir data/pretrain \
  --train.max_tokens 10_000_000_000

# Multi-GPU with FSDP
litgpt pretrain \
  --config config/pythia-1b.yaml \
  --data.data_dir data/pretrain \
  --devices 8 \
  --train.max_tokens 100_000_000_000
```

**Step 4: Launch pretraining**

For large-scale pretraining on a cluster:

```bash
# Using SLURM
sbatch --nodes=8 --gpus-per-node=8 \
  pretrain_script.sh

# pretrain_script.sh content:
litgpt pretrain \
  --config config/pythia-1b.yaml \
  --data.data_dir /shared/data/pretrain \
  --devices 8 \
  --num_nodes 8 \
  --train.global_batch_size 512 \
  --train.max_tokens 300_000_000_000
```
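
With a global batch of 512 sequences and a 2048-token `block_size`, the length of the 300B-token run in optimizer steps is easy to estimate (a sketch; the real count depends on the final block size and any sequence packing):

```python
global_batch_size = 512
block_size = 2048
max_tokens = 300_000_000_000

tokens_per_step = global_batch_size * block_size  # ~1.05M tokens per step
steps = max_tokens / tokens_per_step
print(f"{tokens_per_step:,} tokens/step -> ~{steps:,.0f} optimizer steps")
```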

### Workflow 4: Convert and deploy model

Export LitGPT models for production.

```
Model Deployment:
- [ ] Step 1: Test inference locally
- [ ] Step 2: Quantize model (optional)
- [ ] Step 3: Convert to GGUF (for llama.cpp)
- [ ] Step 4: Deploy with API
```

**Step 1: Test inference locally**

```python
from litgpt import LLM

llm = LLM.load("out/phi2-lora/final")

# Single generation
print(llm.generate("What is machine learning?"))

# Streaming
for token in llm.generate("Explain quantum computing", stream=True):
    print(token, end="", flush=True)

# Batch inference
prompts = ["Hello", "Goodbye", "Thank you"]
results = [llm.generate(p) for p in prompts]
```

**Step 2: Quantize model** (optional)

Reduce model size with minimal quality loss:

```bash
# 4-bit NF4 quantization (~75% size reduction vs fp16)
litgpt convert_lit_checkpoint \
  out/phi2-lora/final \
  --dtype bfloat16 \
  --quantize bnb.nf4

# 4-bit NF4 with double quantization (slightly smaller still)
litgpt convert_lit_checkpoint \
  out/phi2-lora/final \
  --quantize bnb.nf4-dq
```
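
The size reduction is essentially the bit-width ratio; for a concrete model it can be sanity-checked (a sketch assuming Phi-2's 2.7B parameters; real checkpoints carry some overhead from unquantized layers and NF4 quantization constants):

```python
params = 2.7e9                  # Phi-2 parameter count
fp16_gb = params * 2 / 1e9      # 16-bit: 2 bytes per parameter
nf4_gb = params * 0.5 / 1e9     # 4-bit: 0.5 bytes per parameter
print(f"fp16 ~{fp16_gb:.1f} GB -> nf4 ~{nf4_gb:.2f} GB "
      f"({1 - nf4_gb / fp16_gb:.0%} smaller)")
```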

**Step 3: Convert to GGUF** (for llama.cpp)

```bash
python scripts/convert_lit_checkpoint.py \
  --checkpoint_path out/phi2-lora/final \
  --output_path models/phi2.gguf \
  --model_name microsoft/phi-2
```

**Step 4: Deploy with API**

```python
from fastapi import FastAPI
from litgpt import LLM

app = FastAPI()
llm = LLM.load("out/phi2-lora/final")

@app.post("/generate")
def generate(prompt: str, max_tokens: int = 100):
    result = llm.generate(
        prompt,
        max_new_tokens=max_tokens,
        temperature=0.7
    )
    return {"response": result}

# Run: uvicorn api:app --host 0.0.0.0 --port 8000
```
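
A quick way to exercise the endpoint once the server is up (a sketch assuming the server above is running on localhost:8000; note that with a bare `prompt: str` in the signature, FastAPI reads it from the query string rather than a JSON body):

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Bare str/int parameters become query parameters, so URL-encode them
query = urllib.parse.urlencode({"prompt": "What is machine learning?",
                                "max_tokens": 50})
url = f"http://localhost:8000/generate?{query}"
try:
    req = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(json.loads(resp.read())["response"])
except urllib.error.URLError:
    print("server not reachable at localhost:8000")
```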

## When to use vs alternatives

**Use LitGPT when:**
- You want to understand LLM architectures (clean, readable code)
- You need production-ready training recipes
- You're doing education or research
- You're prototyping new model ideas
- You're already in the Lightning ecosystem

**Use alternatives instead:**
- **Axolotl/TRL**: More fine-tuning features, YAML configs
- **Megatron-Core**: Maximum performance for >70B models
- **HuggingFace Transformers**: Broadest model support
- **vLLM**: Inference only (no training)

## Common issues

**Issue: Out of memory during fine-tuning**

Use LoRA instead of full fine-tuning:
```bash
# Instead of litgpt finetune (requires 40GB+)
litgpt finetune_lora  # Only needs 12-16GB
```

Or reduce memory pressure with gradient accumulation:
```bash
litgpt finetune_lora \
  ... \
  --train.gradient_accumulation_iters 4  # Accumulate gradients over 4 micro-batches
```

**Issue: Training too slow**

Flash Attention is built in and enabled automatically on compatible hardware (Ampere or newer GPUs: A100, RTX 30/40 series); no configuration is needed.

Use a smaller micro-batch and accumulate gradients:
```bash
--train.micro_batch_size 1 \
--train.global_batch_size 32 \
--train.gradient_accumulation_iters 32  # Effective batch = 32
```
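
The three batch settings are linked by a simple identity, so only two are free choices (`devices` is 1 on a single GPU):

```python
micro_batch_size = 1
gradient_accumulation_iters = 32
devices = 1

# Samples accumulated per optimizer step (the effective/global batch)
global_batch_size = micro_batch_size * gradient_accumulation_iters * devices
print(global_batch_size)  # 32
```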

**Issue: Model not loading**

Check the model name:
```bash
# List all available models
litgpt download list

# Download if not present
litgpt download meta-llama/Meta-Llama-3-8B
```

Verify the checkpoints directory:
```bash
ls checkpoints/
# Should see: meta-llama/Meta-Llama-3-8B/
```

**Issue: LoRA adapters too large**

Reduce the LoRA rank:
```bash
--lora_r 8  # Instead of 16 or 32
```

Or apply LoRA to fewer layers:
```bash
--lora_query true \
--lora_value true \
--lora_projection false \  # Disable this
--lora_mlp false           # And this
```

## Advanced topics

**Supported architectures**: See [references/supported-models.md](references/supported-models.md) for the complete list of 20+ model families with sizes and capabilities.

**Training recipes**: See [references/training-recipes.md](references/training-recipes.md) for proven hyperparameter configurations for pretraining and fine-tuning.

**FSDP configuration**: See [references/distributed-training.md](references/distributed-training.md) for multi-GPU training with Fully Sharded Data Parallel.

**Custom architectures**: See [references/custom-models.md](references/custom-models.md) for implementing new model architectures in the LitGPT style.

## Hardware requirements

- **GPU**: NVIDIA (CUDA 11.8+), AMD (ROCm), Apple Silicon (MPS)
- **Memory**:
  - Inference (Phi-2): 6GB
  - LoRA fine-tuning (7B): 16GB
  - Full fine-tuning (7B): 40GB+
  - Pretraining (1B): 24GB
- **Storage**: 5-50GB per model (depending on size)

## Resources

- GitHub: https://github.com/Lightning-AI/litgpt
- Docs: https://lightning.ai/docs/litgpt
- Tutorials: https://lightning.ai/docs/litgpt/tutorials
- Model zoo: 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral, Mixtral, Falcon, etc.)