@synsci/cli-darwin-x64 1.1.49
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,405 @@
---
name: evaluating-code-models
description: Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Evaluation, Code Generation, HumanEval, MBPP, MultiPL-E, Pass@k, BigCode, Benchmarking, Code Models]
dependencies: [bigcode-evaluation-harness, transformers>=4.25.1, accelerate>=0.13.2, datasets>=2.6.1]
---

# BigCode Evaluation Harness - Code Model Benchmarking

## Quick Start

BigCode Evaluation Harness evaluates code generation models across 15+ benchmarks including HumanEval, MBPP, and MultiPL-E (18 languages).

**Installation**:
```bash
git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git
cd bigcode-evaluation-harness
pip install -e .
accelerate config
```

**Evaluate on HumanEval**:
```bash
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks humaneval \
  --max_length_generation 512 \
  --temperature 0.2 \
  --n_samples 20 \
  --batch_size 10 \
  --allow_code_execution \
  --save_generations
```

**View available tasks**:
```bash
python -c "from bigcode_eval.tasks import ALL_TASKS; print(ALL_TASKS)"
```

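The full task list is long, so it can help to narrow it to one benchmark family. The sketch below is a minimal illustration: `filter_tasks` is a hypothetical helper, and `sample_tasks` is a stand-in for `ALL_TASKS` that contains only task names used in this guide.

```python
def filter_tasks(all_tasks, prefix):
    """Return the task names that start with `prefix`, sorted alphabetically."""
    return sorted(name for name in all_tasks if name.startswith(prefix))


# Stand-in for bigcode_eval.tasks.ALL_TASKS, limited to names used in this guide.
sample_tasks = [
    "humaneval", "mbpp", "instruct-humaneval",
    "multiple-py", "multiple-js", "multiple-java", "multiple-cpp",
    "humanevalsynthesize-python", "humanevalsynthesize-js",
]

# List only the MultiPL-E tasks.
print(filter_tasks(sample_tasks, "multiple-"))
```

With the harness installed, passing the real `ALL_TASKS` in place of `sample_tasks` should work the same way, assuming it is an iterable of task-name strings.
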
## Common Workflows

### Workflow 1: Standard Code Benchmark Evaluation

Evaluate a model on the core code benchmarks (HumanEval, MBPP, HumanEval+).

**Checklist**:
```
Code Benchmark Evaluation:
- [ ] Step 1: Choose benchmark suite
- [ ] Step 2: Configure model and generation
- [ ] Step 3: Run evaluation with code execution
- [ ] Step 4: Analyze pass@k results
```

**Step 1: Choose benchmark suite**

**Python code generation** (most common):
- **HumanEval**: 164 handwritten problems, function completion
- **HumanEval+**: Same 164 problems with 80× more tests (stricter)
- **MBPP**: 500 crowd-sourced problems, entry-level difficulty
- **MBPP+**: 399 curated problems with 35× more tests

**Multi-language** (18 languages):
- **MultiPL-E**: HumanEval/MBPP translated to C++, Java, JavaScript, Go, Rust, etc.

**Advanced**:
- **APPS**: 10,000 problems (introductory/interview/competition)
- **DS-1000**: 1,000 data science problems across 7 libraries

**Step 2: Configure model and generation**

```bash
# Standard HuggingFace model
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks humaneval \
  --max_length_generation 512 \
  --temperature 0.2 \
  --do_sample True \
  --n_samples 200 \
  --batch_size 50 \
  --allow_code_execution

# Quantized model (4-bit)
accelerate launch main.py \
  --model codellama/CodeLlama-34b-hf \
  --tasks humaneval \
  --load_in_4bit \
  --max_length_generation 512 \
  --allow_code_execution

# Custom/private model
accelerate launch main.py \
  --model /path/to/my-code-model \
  --tasks humaneval \
  --trust_remote_code \
  --use_auth_token \
  --allow_code_execution
```

**Step 3: Run evaluation**

```bash
# Full evaluation with pass@k estimation (k=1,10,100)
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks humaneval \
  --temperature 0.8 \
  --n_samples 200 \
  --batch_size 50 \
  --allow_code_execution \
  --save_generations \
  --metric_output_path results/starcoder2-humaneval.json
```

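Before launching a run of this size, it helps to estimate how much generation it implies. The sketch below is a rough sizing aid, not part of the harness: `generation_workload` is a hypothetical helper, and it assumes each generation round produces `batch_size` samples for one problem and that rounds repeat until `n_samples` is reached.

```python
import math


def generation_workload(num_problems, n_samples, batch_size):
    """Estimate requested completions and generation rounds for one benchmark run."""
    # Assumption: a batch larger than n_samples gains nothing, so cap it.
    batch_size = min(batch_size, n_samples)
    rounds_per_problem = math.ceil(n_samples / batch_size)
    return {
        "requested_completions": num_problems * n_samples,
        "rounds_per_problem": rounds_per_problem,
        "total_rounds": num_problems * rounds_per_problem,
    }


# HumanEval has 164 problems; the command above uses n_samples=200, batch_size=50.
print(generation_workload(164, 200, 50))
```

Under these assumptions the HumanEval run requests 32,800 completions, which is why a smaller `--n_samples` is a common choice when only pass@1 is needed.
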
**Step 4: Analyze results**

Results are written to `results/starcoder2-humaneval.json`:
```json
{
  "humaneval": {
    "pass@1": 0.354,
    "pass@10": 0.521,
    "pass@100": 0.689
  },
  "config": {
    "model": "bigcode/starcoder2-7b",
    "temperature": 0.8,
    "n_samples": 200
  }
}
```

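To make the pass@k numbers concrete: they are commonly computed with the unbiased estimator introduced alongside HumanEval, pass@k = 1 - C(n-c, k) / C(n, k), where `n` is the number of samples per problem and `c` the number that pass the tests. The following is a minimal sketch of that formula; the function names are illustrative, and the harness's own implementation may differ in details.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over problems, given the correct-sample count per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# One problem with 5 of 10 samples correct.
print(pass_at_k(10, 5, 1))
# Three problems with 0, 5, and 10 correct samples out of 10 each.
print(mean_pass_at_k([0, 5, 10], n=10, k=1))
```

Because the estimator needs `k <= n`, reporting pass@100 requires at least 100 samples per problem, which is consistent with `--n_samples 200` in Step 3.
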
### Workflow 2: Multi-Language Evaluation (MultiPL-E)

Evaluate code generation across 18 programming languages.

**Checklist**:
```
Multi-Language Evaluation:
- [ ] Step 1: Generate solutions (host machine)
- [ ] Step 2: Run evaluation in Docker (safe execution)
- [ ] Step 3: Compare across languages
```

**Step 1: Generate solutions on host**

```bash
# Generate without execution (safe)
accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks multiple-py,multiple-js,multiple-java,multiple-cpp \
  --max_length_generation 650 \
  --temperature 0.8 \
  --n_samples 50 \
  --batch_size 50 \
  --generation_only \
  --save_generations \
  --save_generations_path generations_multi.json
```

**Step 2: Evaluate in Docker container**

```bash
# Pull the MultiPL-E Docker image and give it the short local name used below
docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
docker tag ghcr.io/bigcode-project/evaluation-harness-multiple evaluation-harness-multiple

# Run evaluation inside container (generations are mounted read-only)
docker run -v "$(pwd)/generations_multi.json:/app/generations.json:ro" \
  -it evaluation-harness-multiple python3 main.py \
  --model bigcode/starcoder2-7b \
  --tasks multiple-py,multiple-js,multiple-java,multiple-cpp \
  --load_generations_path /app/generations.json \
  --allow_code_execution \
  --n_samples 50
```

**Supported languages**: Python, JavaScript, Java, C++, Go, Rust, TypeScript, C#, PHP, Ruby, Swift, Kotlin, Scala, Perl, Julia, Lua, R, Racket

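For the "compare across languages" step in the checklist, one option is to rank the per-task scores from a metrics file. The sketch below is a minimal illustration: `rank_languages` is a hypothetical helper, it assumes the metrics file follows the same shape as the HumanEval example in Workflow 1 (one entry per task plus a `config` entry), and the scores are placeholder numbers, not real results.

```python
def rank_languages(metrics, metric="pass@1"):
    """Return (language, score) pairs for MultiPL-E tasks, best score first."""
    prefix = "multiple-"
    rows = [
        (task[len(prefix):], scores[metric])
        for task, scores in metrics.items()
        if task.startswith(prefix)  # skips "config" and non-MultiPL-E tasks
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder values for illustration only; these are not measured results.
placeholder_metrics = {
    "multiple-java": {"pass@1": 0.2},
    "multiple-py": {"pass@1": 0.4},
    "multiple-cpp": {"pass@1": 0.1},
    "multiple-js": {"pass@1": 0.3},
    "config": {"model": "bigcode/starcoder2-7b"},
}

for lang, score in rank_languages(placeholder_metrics):
    print(f"{lang:<6}{score:.3f}")
```

In practice the dictionary would come from `json.load` on the file passed to `--metric_output_path`.
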
### Workflow 3: Instruction-Tuned Model Evaluation

Evaluate chat/instruction models with proper formatting.

**Checklist**:
```
Instruction Model Evaluation:
- [ ] Step 1: Use instruction-tuned tasks
- [ ] Step 2: Configure instruction tokens
- [ ] Step 3: Run evaluation
```

**Step 1: Choose instruction tasks**

- **instruct-humaneval**: HumanEval with instruction prompts
- **humanevalsynthesize-{lang}**: HumanEvalPack synthesis tasks

|
|
200
|
+
**Step 2: Configure instruction tokens**
|
|
201
|
+
|
|
202
|
+
```bash
|
|
203
|
+
# For models with chat templates (e.g., CodeLlama-Instruct)
|
|
204
|
+
accelerate launch main.py \
|
|
205
|
+
--model codellama/CodeLlama-7b-Instruct-hf \
|
|
206
|
+
--tasks instruct-humaneval \
|
|
207
|
+
--instruction_tokens "<s>[INST],</s>,[/INST]" \
|
|
208
|
+
--max_length_generation 512 \
|
|
209
|
+
--allow_code_execution
|
|
210
|
+
```
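
The `--instruction_tokens` value is a comma-separated triple. A minimal sketch of how such a triple could wrap a prompt follows. The ordering (user token, instruction, end token, assistant token) and the helper names `parse_instruction_tokens` and `wrap_prompt` are assumptions for illustration, not the harness's own code.

```python
def parse_instruction_tokens(spec):
    """Split a 'user,end,assistant' string into its three parts."""
    parts = spec.split(",")
    if len(parts) != 3:
        raise ValueError("expected exactly three comma-separated tokens")
    return tuple(parts)


def wrap_prompt(instruction, spec):
    """Assumed layout: user token, instruction, end token, assistant token."""
    user_token, end_token, assistant_token = parse_instruction_tokens(spec)
    return f"{user_token}{instruction}{end_token}{assistant_token}"


# Print one wrapped prompt to inspect it by eye
print(wrap_prompt("Write a function that reverses a string.", "<s>[INST],</s>,[/INST]"))
```

Printing the wrapped prompt for one example is a quick way to spot a token string that does not match the model's chat format.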

**Step 3: HumanEvalPack for instruction models**

```bash
# Test code synthesis in Python and JavaScript (HumanEvalPack covers 6 languages)
accelerate launch main.py \
  --model codellama/CodeLlama-7b-Instruct-hf \
  --tasks humanevalsynthesize-python,humanevalsynthesize-js \
  --prompt instruct \
  --max_length_generation 512 \
  --allow_code_execution
```

### Workflow 4: Compare Multiple Models

Run the same benchmark suite across several models and compare the results.

**Step 1: Create evaluation script**

```bash
#!/bin/bash
# eval_models.sh

MODELS=(
  "bigcode/starcoder2-7b"
  "codellama/CodeLlama-7b-hf"
  "deepseek-ai/deepseek-coder-6.7b-base"
)
TASKS="humaneval,mbpp"

# Make sure the output directory exists before writing metrics
mkdir -p results

for model in "${MODELS[@]}"; do
  # Replace "/" so the model ID is usable as a file name
  model_name=$(echo "$model" | tr '/' '-')
  echo "Evaluating $model"

  accelerate launch main.py \
    --model "$model" \
    --tasks "$TASKS" \
    --temperature 0.2 \
    --n_samples 20 \
    --batch_size 20 \
    --allow_code_execution \
    --metric_output_path "results/${model_name}.json"
done
```

**Step 2: Generate comparison table**

```python
import json

import pandas as pd  # DataFrame.to_markdown() also requires the `tabulate` package

# File stems written by eval_models.sh ("/" in the model ID replaced with "-")
models = ["bigcode-starcoder2-7b", "codellama-CodeLlama-7b-hf", "deepseek-ai-deepseek-coder-6.7b-base"]
results = []

for model in models:
    with open(f"results/{model}.json") as f:
        data = json.load(f)
    results.append({
        "Model": model,
        "HumanEval pass@1": f"{data['humaneval']['pass@1']:.3f}",
        "MBPP pass@1": f"{data['mbpp']['pass@1']:.3f}"
    })

df = pd.DataFrame(results)
print(df.to_markdown(index=False))
```
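
If the model list changes often, the rows can also be built from whatever files are present in `results/`. The sketch below uses only the standard library and assumes each metrics file maps task names to dictionaries containing `pass@1`, as the script above does. The helper name `collect_pass_at_1` is hypothetical.

```python
import json
from pathlib import Path


def collect_pass_at_1(results_dir, tasks=("humaneval", "mbpp")):
    """Return one row per metrics file: {"Model": file stem, task: pass@1 or None}."""
    rows = []
    for path in sorted(Path(results_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        row = {"Model": path.stem}
        for task in tasks:
            # A task may be missing if its run failed; record None instead of raising.
            row[task] = data.get(task, {}).get("pass@1")
        rows.append(row)
    return rows
```

Recording `None` for a missing task keeps a single failed run from blocking the whole comparison; the rows can be passed to `pd.DataFrame` unchanged.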

## When to Use vs Alternatives

**Use BigCode Evaluation Harness when:**
- Evaluating **code generation** models specifically
- Need **multi-language** evaluation (18 languages via MultiPL-E)
- Testing **functional correctness** with unit tests (pass@k)
- Benchmarking for **BigCode/HuggingFace leaderboards**
- Evaluating **fill-in-the-middle** (FIM) capabilities

**Use alternatives instead:**
- **lm-evaluation-harness**: General LLM benchmarks (MMLU, GSM8K, HellaSwag)
- **EvalPlus**: Stricter HumanEval+/MBPP+ with more test cases
- **SWE-bench**: Real-world GitHub issue resolution
- **LiveCodeBench**: Contamination-free, continuously updated problems
- **CodeXGLUE**: Code understanding tasks (clone detection, defect prediction)

## Supported Benchmarks

| Benchmark | Problems | Languages | Metric | Use Case |
|-----------|----------|-----------|--------|----------|
| HumanEval | 164 | Python | pass@k | Standard code completion |
| HumanEval+ | 164 | Python | pass@k | Stricter evaluation (80× tests) |
| MBPP | 500 | Python | pass@k | Entry-level problems |
| MBPP+ | 399 | Python | pass@k | Stricter evaluation (35× tests) |
| MultiPL-E | 164×18 | 18 languages | pass@k | Multi-language evaluation |
| APPS | 10,000 | Python | pass@k | Competition-level |
| DS-1000 | 1,000 | Python | pass@k | Data science (pandas, numpy, etc.) |
| HumanEvalPack | 164×3×6 | 6 languages | pass@k | Synthesis/fix/explain |
| Mercury | 1,889 | Python | Efficiency | Computational efficiency |

## Common Issues

**Issue: Different results than reported in papers**

Check these factors:
```bash
# 1. Verify n_samples (papers commonly use 200 samples for low-variance pass@k estimates)
--n_samples 200

# 2. Check temperature (0.2 is commonly used for pass@1, 0.8 for pass@10 and pass@100)
--temperature 0.8

# 3. Verify task name matches exactly
--tasks humaneval  # Not "human_eval" or "HumanEval"

# 4. Check max_length_generation
--max_length_generation 512  # Increase for longer problems
```
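
The reason `n_samples` matters is that pass@k is usually estimated from `n` samples per problem, of which `c` pass the tests. A minimal sketch of the widely used unbiased estimator, `1 - C(n-c, k) / C(n, k)`, follows. The function name `pass_at_k` is illustrative, and the harness's own implementation may differ in detail.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=200, c=50, k=1))  # 0.25
```

The benchmark score is the mean of this value over all problems. With a small `n`, each per-problem estimate is coarse, which is one common source of gaps between local runs and published numbers.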

**Issue: CUDA out of memory**

```bash
# Use quantization
--load_in_8bit
# OR
--load_in_4bit

# Reduce batch size
--batch_size 1

# Set memory limit
--max_memory_per_gpu "20GiB"
```

**Issue: Code execution hangs or times out**

Use Docker for safe execution:
```bash
# Generate on host (no execution)
--generation_only --save_generations

# Evaluate in Docker
docker run ... --allow_code_execution --load_generations_path ...
```

**Issue: Low scores on instruction models**

Ensure proper instruction formatting:
```bash
# Use instruction-specific tasks
--tasks instruct-humaneval

# Set instruction tokens for your model
--instruction_tokens "<s>[INST],</s>,[/INST]"
```

**Issue: MultiPL-E language failures**

Use the dedicated Docker image:
```bash
docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
```

## Command Reference

| Argument | Default | Description |
|----------|---------|-------------|
| `--model` | - | HuggingFace model ID or local path |
| `--tasks` | - | Comma-separated task names |
| `--n_samples` | 1 | Samples per problem (200 for pass@k) |
| `--temperature` | 0.2 | Sampling temperature |
| `--max_length_generation` | 512 | Max tokens (prompt + generation) |
| `--batch_size` | 1 | Batch size per GPU |
| `--allow_code_execution` | False | Enable execution of generated code (required for tasks that run unit tests) |
| `--generation_only` | False | Generate without evaluation |
| `--load_generations_path` | - | Load pre-generated solutions |
| `--save_generations` | False | Save generated code |
| `--metric_output_path` | evaluation_results.json | Output file for metrics |
| `--load_in_8bit` | False | 8-bit quantization |
| `--load_in_4bit` | False | 4-bit quantization |
| `--trust_remote_code` | False | Allow custom model code |
| `--precision` | fp32 | Model precision (fp32/fp16/bf16) |

## Hardware Requirements

| Model Size | VRAM (fp16) | VRAM (4-bit) | Time (HumanEval, n=200) |
|------------|-------------|--------------|-------------------------|
| 7B | 14GB | 6GB | ~30 min (A100) |
| 13B | 26GB | 10GB | ~1 hour (A100) |
| 34B | 68GB | 20GB | ~2 hours (A100) |
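
The fp16 column follows from simple arithmetic: parameter count times bytes per parameter. The sketch below covers weights only. Activations, the KV cache, and quantization overhead are not modeled, which is likely why the 4-bit figures in the table are higher than the raw weight size. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


for size in (7, 13, 34):
    print(
        f"{size}B: fp16 ~{weight_memory_gb(size, 16):.0f} GB, "
        f"4-bit ~{weight_memory_gb(size, 4):.1f} GB (weights only)"
    )
```

Treat the result as a lower bound when choosing a GPU, and leave headroom for batch size and sequence length.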

## Resources

- **GitHub**: https://github.com/bigcode-project/bigcode-evaluation-harness
- **Documentation**: https://github.com/bigcode-project/bigcode-evaluation-harness/tree/main/docs
- **BigCode Leaderboard**: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
- **HumanEval Dataset**: https://huggingface.co/datasets/openai/openai_humaneval
- **MultiPL-E**: https://github.com/nuprl/MultiPL-E