@synsci/cli-darwin-x64 1.1.49
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,393 @@

# BigCode Evaluation Harness - Benchmark Guide

Comprehensive guide to all benchmarks supported by BigCode Evaluation Harness.

## Code Generation with Unit Tests

These benchmarks test functional correctness by executing generated code against unit tests.
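
As a rough illustration of what "executing generated code against unit tests" means, the sketch below runs a candidate solution and then its assertions in a shared namespace. It is a minimal, in-process approximation with hypothetical names (`passes_tests`, `good`, `bad`); it does not sandbox anything and should not be read as the harness's own execution code, which is only enabled with `--allow_code_execution`.

```python
def passes_tests(candidate_src: str, test_src: str) -> bool:
    """Return True if the tests run against the candidate without raising.

    Illustrative only: exec() on untrusted model output is unsafe outside
    an isolated environment.
    """
    namespace = {}
    try:
        exec(candidate_src, namespace)  # define the generated function(s)
        exec(test_src, namespace)       # run the assertions against them
    except Exception:
        return False
    return True


# Hypothetical candidates and tests, not taken from any benchmark.
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

print(passes_tests(good, tests))  # True
print(passes_tests(bad, tests))   # False
```

A problem counts as solved by a sample only when every assertion passes, which is the per-sample signal that pass@k aggregates.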

### HumanEval

**Overview**: 164 handwritten Python programming problems created by OpenAI.

**Dataset**: `openai_humaneval` on HuggingFace
**Metric**: pass@k (k=1, 10, 100)
**Problems**: Function completion with docstrings

**Example problem structure**:
```python
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """Check if in given list of numbers, are any two numbers closer to each other than given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
```

**Usage**:
```bash
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks humaneval \
  --temperature 0.2 \
  --n_samples 200 \
  --batch_size 50 \
  --allow_code_execution
```

**Recommended settings**:
- `temperature`: 0.8 for pass@k at larger k with large `n_samples`, 0.2 for pass@1 (low-temperature sampling, which is close to but not the same as greedy decoding)
- `n_samples`: 200 for accurate pass@k estimation
- `max_length_generation`: 512 (sufficient for most problems)
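
To make the link between `n_samples` and pass@k concrete, here is a small sketch of the commonly used unbiased estimator, `1 - C(n-c, k) / C(n, k)`, where `n` is the number of samples per problem and `c` the number that pass. The function name `pass_at_k` is illustrative and is not claimed to match the harness's internal code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 200 samples of which 50 pass, pass@1 is simply the pass rate.
print(pass_at_k(200, 50, 1))   # 0.25
print(pass_at_k(200, 50, 10))  # higher, since any of 10 tries may succeed
```

The benchmark score is the mean of this value over all problems. Generating many more samples than k (for example 200 for k up to 100) lowers the variance of the estimate.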

### HumanEval+

**Overview**: Extended HumanEval with 80× more test cases per problem.

**Dataset**: `evalplus/humanevalplus` on HuggingFace
**Why use it**: Catches solutions that pass original tests but fail on edge cases

**Usage**:
```bash
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks humanevalplus \
  --temperature 0.2 \
  --n_samples 200 \
  --allow_code_execution
```

**Note**: Execution takes longer due to additional tests. Timeout may need adjustment.
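
The value of extra tests can be seen with the `has_close_elements` problem from the HumanEval section. The sketch below uses a hypothetical buggy candidate that only compares neighbouring elements: it passes both docstring examples but fails on an additional input. The extra input is invented for illustration and is not taken from the HumanEval+ test suite.

```python
from typing import List


def has_close_elements_buggy(numbers: List[float], threshold: float) -> bool:
    """Hypothetical buggy candidate: only compares neighbours in input order."""
    return any(abs(a - b) < threshold for a, b in zip(numbers, numbers[1:]))


def has_close_elements_reference(numbers: List[float], threshold: float) -> bool:
    """Straightforward reference: compares every pair."""
    return any(
        abs(a - b) < threshold
        for i, a in enumerate(numbers)
        for b in numbers[i + 1:]
    )


# The two examples from the problem docstring.
original_tests = [
    ([1.0, 2.0, 3.0], 0.5, False),
    ([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3, True),
]
# An invented edge case: the close pair is not adjacent.
extra_test = ([1.0, 5.0, 1.1], 0.3, True)

for nums, thr, expected in original_tests:
    assert has_close_elements_buggy(nums, thr) == expected  # bug goes unnoticed

nums, thr, expected = extra_test
print(has_close_elements_buggy(nums, thr) == expected)      # False: bug exposed
print(has_close_elements_reference(nums, thr) == expected)  # True
```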

### MBPP (Mostly Basic Python Problems)

**Overview**: Around 1,000 crowd-sourced Python problems (974 in the released dataset) designed for entry-level programmers.

**Dataset**: `mbpp` on HuggingFace
**Test split**: 500 problems (task IDs 11-510)
**Metric**: pass@k

**Problem structure**:
- Task description in English
- 3 automated test cases per problem
- Code solution (ground truth)
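
The three parts of an MBPP problem can be pictured as a small record. The sketch below uses an invented problem and assumes the field names `text`, `code`, and `test_list`; treat both the record and the prompt layout in `build_prompt` as illustrative assumptions, not as a statement of how the harness formats its prompts.

```python
# Hypothetical record shaped like an MBPP entry (field names are assumptions).
record = {
    "text": "Write a function to return the smaller of two numbers.",
    "code": "def min_of_two(a, b):\n    return a if a < b else b\n",
    "test_list": [
        "assert min_of_two(1, 2) == 1",
        "assert min_of_two(5, 3) == 3",
        "assert min_of_two(-1, -1) == -1",
    ],
}


def build_prompt(rec: dict) -> str:
    """One possible prompt: description plus one test, wrapped in a docstring."""
    return f'"""\n{rec["text"]}\n{rec["test_list"][0]}\n"""\n'


def solution_passes(rec: dict) -> bool:
    """Check that the ground-truth solution satisfies all of its tests."""
    namespace = {}
    try:
        exec(rec["code"], namespace)
        for test in rec["test_list"]:
            exec(test, namespace)
    except Exception:
        return False
    return True


print(build_prompt(record))
print(solution_passes(record))  # True
```

Showing one test case in the prompt tells the model the expected function name and signature, which the English description alone often leaves open.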

**Usage**:
```bash
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks mbpp \
  --temperature 0.2 \
  --n_samples 200 \
  --allow_code_execution
```

### MBPP+

**Overview**: 399 curated MBPP problems with 35× more test cases.

**Dataset**: `evalplus/mbppplus` on HuggingFace

**Usage**:
```bash
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks mbppplus \
  --allow_code_execution
```

### MultiPL-E (18 Languages)

**Overview**: HumanEval and MBPP translated to 18 programming languages.

**Languages**: Python, JavaScript, Java, C++, Go, Rust, TypeScript, C#, PHP, Ruby, Swift, Kotlin, Scala, Perl, Julia, Lua, R, Racket

**Task naming**: `multiple-{lang}` where lang is file extension:
- `multiple-py` (Python)
- `multiple-js` (JavaScript)
- `multiple-java` (Java)
- `multiple-cpp` (C++)
- `multiple-go` (Go)
- `multiple-rs` (Rust)
- `multiple-ts` (TypeScript)
- `multiple-cs` (C#)
- `multiple-php` (PHP)
- `multiple-rb` (Ruby)
- `multiple-swift` (Swift)
- `multiple-kt` (Kotlin)
- `multiple-scala` (Scala)
- `multiple-pl` (Perl)
- `multiple-jl` (Julia)
- `multiple-lua` (Lua)
- `multiple-r` (R)
- `multiple-rkt` (Racket)
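
Because the task names follow the `multiple-{lang}` pattern, the comma-separated value for `--tasks` can be built from a list of file extensions. The helper name `multiple_tasks` below is hypothetical and exists only to illustrate the naming rule.

```python
from typing import Iterable


def multiple_tasks(extensions: Iterable[str]) -> str:
    """Join file extensions into a comma-separated MultiPL-E task string."""
    return ",".join(f"multiple-{ext}" for ext in extensions)


print(multiple_tasks(["js", "java", "cpp"]))
# multiple-js,multiple-java,multiple-cpp
```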

**Usage with Docker** (recommended for safe execution):
```bash
# Step 1: Generate on host
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks multiple-js,multiple-java,multiple-cpp \
  --generation_only \
  --save_generations \
  --save_generations_path generations.json

# Step 2: Evaluate in Docker
docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
# Tag the pulled image so the short name used by `docker run` resolves locally
docker tag ghcr.io/bigcode-project/evaluation-harness-multiple evaluation-harness-multiple
docker run -v "$(pwd)/generations.json:/app/generations.json:ro" \
  -it evaluation-harness-multiple python3 main.py \
  --tasks multiple-js,multiple-java,multiple-cpp \
  --load_generations_path /app/generations.json \
  --allow_code_execution
```

### APPS

**Overview**: 10,000 Python problems across three difficulty levels.

**Difficulty levels**:
- Introductory: Basic programming
- Interview: Technical interview level
- Competition: Competitive programming

**Tasks**:
- `apps-introductory`
- `apps-interview`
- `apps-competition`

**Usage**:
```bash
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks apps-introductory \
  --max_length_generation 1024 \
  --allow_code_execution
```

### DS-1000

**Overview**: 1,000 data science problems across 7 Python libraries.

**Libraries**: NumPy, Pandas, SciPy, Scikit-learn, PyTorch, TensorFlow, Matplotlib

**Requirements**:
- Python 3.7.10 specifically
- `pip install -e ".[ds1000]"`
- PyTorch 1.12.1
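
Since the requirements above pin exact versions, it can help to check the environment before starting a long run. The following is a minimal sketch with hypothetical names (`check_ds1000_env`, `REQUIRED_PYTHON`, `REQUIRED_TORCH`); it only compares version values passed in by the caller and is not part of the harness.

```python
from typing import List, Optional, Sequence

REQUIRED_PYTHON = (3, 7, 10)  # from the requirements list above
REQUIRED_TORCH = "1.12.1"     # from the requirements list above


def check_ds1000_env(python_version: Sequence[int],
                     torch_version: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; empty means the pins match."""
    problems = []
    if tuple(python_version[:3]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in python_version[:3])
        problems.append(f"Python {found} found, 3.7.10 expected")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif torch_version.split("+")[0] != REQUIRED_TORCH:
        # Drop local build suffixes such as "+cu113" before comparing.
        problems.append(f"PyTorch {torch_version} found, {REQUIRED_TORCH} expected")
    return problems


print(check_ds1000_env((3, 7, 10), "1.12.1+cu113"))  # []
print(check_ds1000_env((3, 10, 4), "2.1.0"))
```

In a real environment the arguments would come from `sys.version_info` and `torch.__version__`.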
|
|
178
|
+
|
|
179
|
+
**Usage**:
|
|
180
|
+
```bash
|
|
181
|
+
accelerate launch main.py \
|
|
182
|
+
--model bigcode/starcoder2-7b \
|
|
183
|
+
--tasks ds1000-all-completion \
|
|
184
|
+
--allow_code_execution
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
### Mercury
|
|
188
|
+
|
|
189
|
+
**Overview**: 1,889 tasks for evaluating computational efficiency of generated code.
|
|
190
|
+
|
|
191
|
+
**Requirements**: `pip install lctk sortedcontainers`
|
|
192
|
+
|
|
193
|
+
**Metric**: Beyond@k (efficiency-based)
|
|
194
|
+
|
|
195
|
+
**Usage**:
|
|
196
|
+
```bash
|
|
197
|
+
accelerate launch main.py \
|
|
198
|
+
--model bigcode/starcoder2-7b \
|
|
199
|
+
--tasks mercury \
|
|
200
|
+
--allow_code_execution
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
## Code Generation Without Unit Tests
|
|
204
|
+
|
|
205
|
+
These benchmarks use text-based metrics (BLEU, Exact Match).

### SantaCoder-FIM (Fill-in-the-Middle)

**Overview**: 4,792 fill-in-the-middle tasks for Python, JavaScript, Java.

**Metric**: Exact Match
**Use case**: Evaluating FIM/infilling capabilities

**Tasks**:
- `santacoder_fim`
- `starcoder_fim`

**Usage**:
```bash
accelerate launch main.py \
    --model bigcode/starcoder2-7b \
    --tasks santacoder_fim \
    --n_samples 1 \
    --batch_size 1
```
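
Exact Match counts an infill as correct only when it equals the reference text. A minimal sketch of the metric, assuming leading and trailing whitespace is ignored (the harness's own normalization may differ); `exact_match` is a hypothetical name used for illustration.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference after stripping."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    # Assumption: surrounding whitespace does not count as a mismatch.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

Unlike BLEU, this gives no partial credit: an infill that differs by a single token scores zero for that example.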

### CoNaLa

**Overview**: Natural language to Python code generation.

**Metric**: BLEU score
**Setting**: Two-shot

**Usage**:
```bash
accelerate launch main.py \
    --model bigcode/starcoder2-7b \
    --tasks conala \
    --do_sample False \
    --n_samples 1
```

### Concode

**Overview**: Natural language to Java code generation.

**Metric**: BLEU score

**Usage**:
```bash
accelerate launch main.py \
    --model bigcode/starcoder2-7b \
    --tasks concode \
    --do_sample False \
    --n_samples 1
```

## Instruction-Tuned Model Evaluation

### InstructHumanEval

**Overview**: HumanEval reformatted for instruction-following models.

**Usage**:
```bash
accelerate launch main.py \
    --model codellama/CodeLlama-7b-Instruct-hf \
    --tasks instruct-humaneval \
    --instruction_tokens "<s>[INST],</s>,[/INST]" \
    --allow_code_execution
```

### HumanEvalPack

**Overview**: Extends HumanEval to 3 scenarios across 6 languages.

**Scenarios**:
- **Synthesize**: Generate code from docstring
- **Fix**: Fix buggy code
- **Explain**: Generate docstring from code

**Languages**: Python, JavaScript, Java, Go, C++, Rust

**Tasks**:
- `humanevalsynthesize-{lang}`
- `humanevalfix-{lang}`
- `humanevalexplain-{lang}`

**Usage**:
```bash
accelerate launch main.py \
    --model codellama/CodeLlama-7b-Instruct-hf \
    --tasks humanevalsynthesize-python,humanevalfix-python \
    --prompt instruct \
    --allow_code_execution
```
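
The `{lang}` placeholder in the task names is filled in per language, and several tasks are passed to `--tasks` as one comma-separated value. A minimal sketch of that expansion; `expand_tasks` is a hypothetical helper, and only the `python` suffix is confirmed by the usage example, so any other language code passed in is an assumption to verify against the harness's task list.

```python
from typing import Iterable, List


def expand_tasks(templates: Iterable[str], langs: Iterable[str]) -> str:
    """Fill `{lang}` in each template and join the names for `--tasks`."""
    langs = list(langs)  # allow generators to be reused across templates
    names: List[str] = [
        template.format(lang=lang) for template in templates for lang in langs
    ]
    return ",".join(names)


if __name__ == "__main__":
    # Reproduces the --tasks value from the usage example.
    print(expand_tasks(["humanevalsynthesize-{lang}", "humanevalfix-{lang}"], ["python"]))
```

The same helper applies to any other task family that uses a `{lang}` suffix.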

## Math and Reasoning

### PAL (Program-Aided Language Models)

**Overview**: Solve math problems by generating Python code.

**Datasets**: GSM8K, GSM-HARD

**Tasks**:
- `pal-gsm8k-greedy`: Greedy decoding
- `pal-gsm8k-majority_voting`: k=40 majority voting
- `pal-gsmhard-greedy`
- `pal-gsmhard-majority_voting`

**Usage**:
```bash
accelerate launch main.py \
    --model bigcode/starcoder2-7b \
    --tasks pal-gsm8k-greedy \
    --max_length_generation 2048 \
    --do_sample False \
    --allow_code_execution
```

**Note**: Set `--max_length_generation` to at least 2048. The 8-shot prompt alone takes about 1,500 tokens, and the limit has to cover the prompt plus the generated program.
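
The `majority_voting` tasks sample several programs per problem (k=40 above) and keep the most frequent answer. A minimal sketch of that voting step, assuming each generated program has already been executed and failed runs are recorded as `None`; the function name and the tie-breaking rule are illustrative and not taken from the harness.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[float]]) -> Optional[float]:
    """Return the most frequent non-None answer, or None if every run failed."""
    valid = [answer for answer in answers if answer is not None]
    if not valid:
        return None
    # Counter.most_common orders ties by first occurrence in `valid`.
    return Counter(valid).most_common(1)[0][0]
```

For example, `majority_vote([18.0, 18.0, None, 20.0])` returns `18.0`, since two of the three successful runs agree.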

## Documentation Generation

### CodeXGLUE Code-to-Text

**Overview**: Generate documentation from code.

**Languages**: Python, Go, Ruby, Java, JavaScript, PHP

**Tasks**: `codexglue_code_to_text-{lang}`

**Usage**:
```bash
accelerate launch main.py \
    --model bigcode/starcoder2-7b \
    --tasks codexglue_code_to_text-python \
    --do_sample False \
    --n_samples 1 \
    --batch_size 1
```

## Classification Tasks

### Java Complexity Prediction

**Task**: `java-complexity`

### Code Equivalence Detection

**Task**: `java-clone-detection`

### C Defect Prediction

**Task**: `c-defect-detection`

## Benchmark Selection Guide

| Goal | Recommended Benchmarks |
|------|------------------------|
| Quick sanity check | HumanEval (n_samples=20) |
| Standard evaluation | HumanEval + MBPP |
| Rigorous evaluation | HumanEval+ + MBPP+ |
| Multi-language | MultiPL-E |
| Instruction models | InstructHumanEval, HumanEvalPack |
| FIM/Infilling | SantaCoder-FIM, StarCoder-FIM |
| Data science | DS-1000 |
| Competition-level | APPS |
| Efficiency | Mercury |
| Math reasoning | PAL-GSM8K |

## pass@k Calculation

pass@k estimates the probability that at least one of k generated samples passes all tests:

```
pass@k = E[1 - C(n-c, k) / C(n, k)]
```

Where:
- n = total samples generated per problem
- c = number of those samples that pass all tests
- k = number of samples drawn when scoring (k <= n)
- C(a, b) = binomial coefficient "a choose b"; the expectation E is taken over problems
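
The formula can be evaluated directly with exact integer binomials. A minimal sketch using only the standard library; `pass_at_k` and `mean_pass_at_k` are hypothetical names, and this is an illustration of the formula rather than the harness's own implementation.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k for one problem: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs, one per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("at least one problem is required")
    return sum(scores) / len(scores)
```

With n = 20 samples and c = 5 passing, `pass_at_k(20, 5, 1)` is 1 - 15/20 = 0.25, which matches the plain pass rate c/n for k = 1.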

**Recommended n_samples by k**:
- pass@1: n >= 20
- pass@10: n >= 100
- pass@100: n >= 200

**Temperature recommendations**:
- pass@1: temperature = 0.2 (near-greedy)
- pass@10, pass@100: temperature = 0.8 (more diverse sampling)
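
The two recommendation lists can be combined into one lookup keyed by the target k. A minimal sketch that turns a k value into command-line arguments; `RECOMMENDED` and `sampling_args` are hypothetical names, the sample counts are the minimums listed above, and `--temperature` is assumed to be the flag that pairs with the `--n_samples` flag shown in the usage examples.

```python
from typing import Dict, List, Tuple

# k -> (minimum n_samples, temperature), taken from the recommendations above.
RECOMMENDED: Dict[int, Tuple[int, float]] = {
    1: (20, 0.2),
    10: (100, 0.8),
    100: (200, 0.8),
}


def sampling_args(k: int) -> List[str]:
    """Return CLI arguments for the recommended sampling setup at a given k."""
    if k not in RECOMMENDED:
        raise ValueError(f"no recommendation recorded for pass@{k}")
    n_samples, temperature = RECOMMENDED[k]
    # Assumption: the flag is spelled --temperature; check the harness's --help.
    return ["--n_samples", str(n_samples), "--temperature", str(temperature)]
```

For example, `sampling_args(10)` yields `["--n_samples", "100", "--temperature", "0.8"]`, which can be appended to any of the `accelerate launch main.py` commands in this document.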