@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
@@ -0,0 +1,317 @@
---
name: whisper
description: OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
version: 1.0.0
author: Synthetic Sciences
license: MIT
tags: [Whisper, Speech Recognition, ASR, Multimodal, Multilingual, OpenAI, Speech-To-Text, Transcription, Translation, Audio Processing]
dependencies: [openai-whisper, transformers, torch]
---

# Whisper - Robust Speech Recognition

OpenAI's multilingual speech recognition model.

## When to use Whisper

**Use when:**
- Speech-to-text transcription (99 languages)
- Podcast/video transcription
- Meeting notes automation
- Translation to English
- Noisy audio transcription
- Multilingual audio processing

**Metrics:**
- **72,900+ GitHub stars**
- 99 languages supported
- Trained on 680,000 hours of audio
- MIT License

**Use alternatives instead:**
- **AssemblyAI**: Managed API, speaker diarization
- **Deepgram**: Real-time streaming ASR
- **Google Speech-to-Text**: Cloud-based

## Quick start

### Installation

```bash
# Requires Python 3.8-3.11
pip install -U openai-whisper

# Requires ffmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Windows: choco install ffmpeg
```

### Basic transcription

```python
import whisper

# Load model
model = whisper.load_model("base")

# Transcribe
result = model.transcribe("audio.mp3")

# Print text
print(result["text"])

# Access segments
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")
```
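
The segment dicts shown above are plain data, so they are easy to post-process. As one example, here is a small helper (hypothetical, not part of the whisper API; it assumes only the `start`/`end`/`text` keys shown above) that renders segments as SRT subtitle text:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Convert whisper-style segment dicts into SRT subtitle text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Works with fake segments too (no model needed):
demo = [{"start": 0.0, "end": 2.5, "text": " Hello there."}]
print(segments_to_srt(demo))
```

For subtitle files you will usually prefer the CLI's built-in `--output_format srt`; a helper like this is only worth it when you need to filter or edit segments programmatically first.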

## Model sizes

```python
# Available models
models = ["tiny", "base", "small", "medium", "large", "turbo"]

# Load specific model
model = whisper.load_model("turbo")  # Fast, with quality close to large
```

| Model | Parameters | English-only | Multilingual | Speed | VRAM |
|-------|------------|--------------|--------------|-------|------|
| tiny | 39M | ✓ | ✓ | ~32x | ~1 GB |
| base | 74M | ✓ | ✓ | ~16x | ~1 GB |
| small | 244M | ✓ | ✓ | ~6x | ~2 GB |
| medium | 769M | ✓ | ✓ | ~2x | ~5 GB |
| large | 1550M | ✗ | ✓ | 1x | ~10 GB |
| turbo | 809M | ✗ | ✓ | ~8x | ~6 GB |

**Recommendation**: Use `turbo` for the best speed/quality trade-off, `base` for prototyping.
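
The VRAM column doubles as a selection rule. A hedged sketch (the numbers below are the approximate figures from the table above, not hard limits; actual usage varies with batch size and precision) that picks the largest model fitting a given GPU:

```python
# Approximate VRAM requirements from the table above (GB); treat as rough guides.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "turbo": 6, "large": 10}

def pick_model(available_gb: float,
               prefer: tuple = ("turbo", "large", "medium", "small", "base", "tiny")) -> str:
    """Return the first preferred model whose approximate VRAM need fits."""
    for name in prefer:
        if VRAM_GB[name] <= available_gb:
            return name
    return "tiny"  # smallest fallback; may still be tight under 1 GB

print(pick_model(8))  # turbo fits in 8 GB
print(pick_model(3))  # small is the largest fit under 3 GB
```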

## Transcription options

### Language specification

```python
# Auto-detect language
result = model.transcribe("audio.mp3")

# Specify language (skips detection, slightly faster)
result = model.transcribe("audio.mp3", language="en")

# Supported: en, es, fr, de, it, pt, ru, ja, ko, zh, and 89 more
```

### Task selection

```python
# Transcription (default)
result = model.transcribe("audio.mp3", task="transcribe")

# Translation to English
result = model.transcribe("spanish.mp3", task="translate")
# Input: Spanish audio → Output: English text
```

### Initial prompt

```python
# Improve accuracy with context
result = model.transcribe(
    "audio.mp3",
    initial_prompt="This is a technical podcast about machine learning and AI."
)

# Helps with:
# - Technical terms
# - Proper nouns
# - Domain-specific vocabulary
```
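
Since `initial_prompt` is just a string the decoder conditions on, a common pattern is to assemble it from a vocabulary list. A minimal sketch (`build_initial_prompt` is a hypothetical helper, not a whisper API; the structure of the string is entirely up to you):

```python
def build_initial_prompt(topic: str, terms: list) -> str:
    """Assemble a conditioning prompt from a topic sentence and domain terms.
    Whisper only sees the final string; spelling the terms out nudges the
    decoder toward those exact spellings."""
    return f"{topic} Vocabulary: {', '.join(terms)}."

prompt = build_initial_prompt(
    "This is a technical podcast about machine learning.",
    ["PyTorch", "LoRA", "RLHF", "quantization"],
)
print(prompt)
```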

### Timestamps

```python
# Word-level timestamps
result = model.transcribe("audio.mp3", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f"{word['word']} ({word['start']:.2f}s - {word['end']:.2f}s)")
```

### Temperature fallback

```python
# Retry decoding at progressively higher temperatures when the output
# fails whisper's quality checks (low average logprob or repetitive text)
result = model.transcribe(
    "audio.mp3",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
)
```
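
Internally, whisper decodes at the first temperature and only moves to the next when quality heuristics fail; the library defaults are `compression_ratio_threshold=2.4` (repetitive text compresses well) and `logprob_threshold=-1.0` (low confidence). A simplified, library-independent sketch of that retry rule:

```python
def needs_retry(avg_logprob: float, compression_ratio: float,
                logprob_threshold: float = -1.0,
                compression_ratio_threshold: float = 2.4) -> bool:
    """Mirror whisper's fallback heuristic: retry if the text looks
    repetitive (high compression ratio) or low-confidence (low logprob)."""
    return (compression_ratio > compression_ratio_threshold
            or avg_logprob < logprob_threshold)

def decode_with_fallback(decode, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Call `decode(t)` -> (text, avg_logprob, compression_ratio) at rising
    temperatures until a result passes the checks; keep the last attempt."""
    result = None
    for t in temperatures:
        result = decode(t)
        _, avg_logprob, compression_ratio = result
        if not needs_retry(avg_logprob, compression_ratio):
            break
    return result

# Toy decoder: greedy (t=0.0) output is degenerate, sampling at t=0.2 is fine.
fake = {0.0: ("la la la la", -0.3, 3.1), 0.2: ("hello world", -0.4, 1.4)}
print(decode_with_fallback(lambda t: fake.get(t, ("", -2.0, 1.0)))[0])  # → hello world
```

This is a sketch of the control flow only; the real implementation applies the checks per 30-second window and also consults a `no_speech_threshold`.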

## Command line usage

```bash
# Basic transcription
whisper audio.mp3

# Specify model
whisper audio.mp3 --model turbo

# Output formats
whisper audio.mp3 --output_format txt   # Plain text
whisper audio.mp3 --output_format srt   # Subtitles
whisper audio.mp3 --output_format vtt   # WebVTT
whisper audio.mp3 --output_format json  # JSON with timestamps

# Language
whisper audio.mp3 --language Spanish

# Translation to English
whisper spanish.mp3 --task translate
```

## Batch processing

```python
import whisper

model = whisper.load_model("base")
audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]

for audio_file in audio_files:
    print(f"Transcribing {audio_file}...")
    result = model.transcribe(audio_file)

    # Save transcript next to the audio file
    output_file = audio_file.replace(".mp3", ".txt")
    with open(output_file, "w") as f:
        f.write(result["text"])
```

## Real-time transcription

```python
# For streaming-style use, the faster-whisper reimplementation (CTranslate2)
# is a better fit: pip install faster-whisper

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")

# `segments` is a generator: transcription runs lazily as you iterate,
# so text becomes available segment by segment
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

## GPU acceleration

```python
import whisper

# Automatically uses GPU if available
model = whisper.load_model("turbo")

# Force CPU
model = whisper.load_model("turbo", device="cpu")

# Force GPU
model = whisper.load_model("turbo", device="cuda")

# Typically 10-20× faster on GPU
```
|
|
223
|
+
|
|
224
|
+
## Integration with other tools

### Subtitle generation

```bash
# Generate SRT subtitles
whisper video.mp4 --output_format srt --language English

# Output: video.srt
```

### With LangChain

```python
from langchain.document_loaders import WhisperTranscriptionLoader

loader = WhisperTranscriptionLoader(file_path="audio.mp3")
docs = loader.load()

# Use transcription in RAG
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
```

### Extract audio from video

```bash
# Use ffmpeg to extract audio
ffmpeg -i video.mp4 -vn -acodec pcm_s16le audio.wav

# Then transcribe
whisper audio.wav
```

## Best practices

1. **Use turbo model** - Best speed/quality for English
2. **Specify language** - Faster than auto-detect
3. **Add initial prompt** - Improves technical terms
4. **Use GPU** - 10-20× faster
5. **Batch process** - More efficient
6. **Convert to WAV** - Better compatibility
7. **Split long audio** - <30 min chunks
8. **Check language support** - Quality varies by language
9. **Use faster-whisper** - 4× faster than openai-whisper
10. **Monitor VRAM** - Scale model size to hardware

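Practice 7 (splitting long audio) can be done with ffmpeg's segment muxer before transcribing each chunk. A minimal sketch of building the command; the `ffmpeg_chunk_cmd` helper and the 25-minute default are illustrative, not part of Whisper:

```python
def ffmpeg_chunk_cmd(path: str, chunk_minutes: int = 25) -> list[str]:
    """Build an ffmpeg command that splits audio into fixed-length chunks."""
    return [
        "ffmpeg", "-i", path,
        "-f", "segment",                          # segment muxer
        "-segment_time", str(chunk_minutes * 60), # chunk length in seconds
        "-c", "copy",                             # no re-encoding
        "chunk_%03d.mp3",
    ]

print(" ".join(ffmpeg_chunk_cmd("long_audio.mp3")))
```

Run the resulting command with `subprocess.run(cmd, check=True)`, then transcribe each `chunk_*.mp3` as in the batch-processing example.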
## Performance

| Model | Real-time factor (CPU) | Real-time factor (GPU) |
|-------|------------------------|------------------------|
| tiny  | ~0.32                  | ~0.01                  |
| base  | ~0.16                  | ~0.01                  |
| turbo | ~0.08                  | ~0.01                  |
| large | ~1.0                   | ~0.05                  |

*Real-time factor: 0.1 = 10× faster than real-time*

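Real-time factor is just processing time divided by audio duration, so it is easy to measure on your own hardware by timing `model.transcribe` with `time.perf_counter`. A small sketch (the helper name is ours):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1 means faster than real time; e.g. 0.1 = 10x faster."""
    return processing_seconds / audio_seconds

# Example: 60 s of audio transcribed in 6 s -> RTF 0.1 (10x real time)
print(real_time_factor(6.0, 60.0))
```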
## Language support

Top-supported languages:

- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Russian (ru)
- Japanese (ja)
- Korean (ko)
- Chinese (zh)

Full list: 99 languages total

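The codes in parentheses are the ISO 639-1 values accepted by `--language`. A minimal name-to-code lookup covering only the subset listed above (recent openai-whisper versions also ship the full table as `whisper.tokenizer.LANGUAGES`):

```python
# Name -> ISO 639-1 code for the top-supported languages listed above.
# This subset is for illustration only; see whisper.tokenizer.LANGUAGES
# for the full 99-language mapping.
TOP_LANGUAGES = {
    "English": "en", "Spanish": "es", "French": "fr", "German": "de",
    "Italian": "it", "Portuguese": "pt", "Russian": "ru",
    "Japanese": "ja", "Korean": "ko", "Chinese": "zh",
}

def language_code(name: str) -> str:
    return TOP_LANGUAGES[name]

print(language_code("Spanish"))  # es
```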
## Limitations

1. **Hallucinations** - May repeat or invent text
2. **Long-form accuracy** - Degrades on >30 min audio
3. **Speaker identification** - No diarization
4. **Accents** - Quality varies
5. **Background noise** - Can affect accuracy
6. **Real-time latency** - Not suitable for live captioning

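Hallucinations (limitation 1) often show up as segments in `result["segments"]` with a high `no_speech_prob` or a high `compression_ratio` (very repetitive text). A heuristic post-filter sketch; the 0.6 / 2.4 thresholds mirror Whisper's own decoding fallbacks, but applying them as a post-filter is our assumption:

```python
def filter_segments(segments, max_no_speech=0.6, max_compression=2.4):
    """Drop segments that look like silence-driven hallucinations.

    Thresholds are heuristics; tune them on your own audio.
    """
    kept = []
    for seg in segments:
        if seg.get("no_speech_prob", 0.0) > max_no_speech:
            continue  # likely silence transcribed as text
        if seg.get("compression_ratio", 0.0) > max_compression:
            continue  # highly repetitive text -> likely a repeat loop
        kept.append(seg)
    return kept

segments = [
    {"text": "Hello world.", "no_speech_prob": 0.02, "compression_ratio": 1.2},
    {"text": "Thanks for watching!", "no_speech_prob": 0.95, "compression_ratio": 1.1},
]
print([s["text"] for s in filter_segments(segments)])  # ['Hello world.']
```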
## Resources

- **GitHub**: https://github.com/openai/whisper ⭐ 72,900+
- **Paper**: https://arxiv.org/abs/2212.04356
- **Model Card**: https://github.com/openai/whisper/blob/main/model-card.md
- **Colab**: Available in repo
- **License**: MIT

@@ -0,0 +1,189 @@

# Whisper Language Support Guide

Complete guide to Whisper's multilingual capabilities.

## Supported languages (99 total)

### Top-tier support (WER < 10%)

- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Dutch (nl)
- Polish (pl)
- Russian (ru)
- Japanese (ja)
- Korean (ko)
- Chinese (zh)

### Good support (WER 10-20%)

- Arabic (ar)
- Turkish (tr)
- Vietnamese (vi)
- Swedish (sv)
- Finnish (fi)
- Czech (cs)
- Romanian (ro)
- Hungarian (hu)
- Danish (da)
- Norwegian (no)
- Thai (th)
- Hebrew (he)
- Greek (el)
- Indonesian (id)
- Malay (ms)

### Full list (99 languages)

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Moldavian, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Pushto, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba

## Usage examples

### Auto-detect language

```python
import whisper

model = whisper.load_model("turbo")

# Auto-detect language
result = model.transcribe("audio.mp3")

print(f"Detected language: {result['language']}")
print(f"Text: {result['text']}")
```

### Specify language (faster)

```python
# Specify language for faster transcription
result = model.transcribe("audio.mp3", language="es")  # Spanish
result = model.transcribe("audio.mp3", language="fr")  # French
result = model.transcribe("audio.mp3", language="ja")  # Japanese
```

### Translation to English

```python
# Translate any language to English
result = model.transcribe(
    "spanish_audio.mp3",
    task="translate"  # Translates to English
)

print(f"Original language: {result['language']}")
print(f"English translation: {result['text']}")
```

## Language-specific tips

### Chinese

```python
# Chinese works well with larger models
model = whisper.load_model("large")

result = model.transcribe(
    "chinese_audio.mp3",
    language="zh",
    # "This is a discussion about technology" - context helps
    initial_prompt="这是一段关于技术的讨论"
)
```

### Japanese

```python
# Japanese benefits from an initial prompt
result = model.transcribe(
    "japanese_audio.mp3",
    language="ja",
    # "This is a recording of a technical meeting"
    initial_prompt="これは技術的な会議の録音です"
)
```

### Arabic

```python
# Arabic: use the large model for best results
model = whisper.load_model("large")

result = model.transcribe(
    "arabic_audio.mp3",
    language="ar"
)
```

## Model size recommendations

| Language Tier | Recommended Model | WER |
|---------------|-------------------|-----|
| Top-tier (en, es, fr, de) | base/turbo | < 10% |
| Good (ar, tr, vi) | medium/large | 10-20% |
| Lower-resource | large | 20-30% |

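The tier table above can be turned into a small helper. The tier sets below cover only the codes named in this guide, and the chosen checkpoints are one reasonable reading of the table:

```python
# Language tiers as listed earlier in this guide (illustrative subset).
TOP = {"en", "es", "fr", "de", "it", "pt", "nl", "pl", "ru", "ja", "ko", "zh"}
GOOD = {"ar", "tr", "vi", "sv", "fi", "cs", "ro", "hu", "da", "no",
        "th", "he", "el", "id", "ms"}

def recommend_model(lang: str) -> str:
    """Map a language code to a model size per the tier table."""
    if lang in TOP:
        return "turbo"   # top-tier: base/turbo suffices
    if lang in GOOD:
        return "medium"  # good support: medium/large
    return "large"       # lower-resource: use large

print(recommend_model("de"))  # turbo
print(recommend_model("ar"))  # medium
print(recommend_model("yo"))  # large
```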
## Performance by language

### English

- **tiny**: WER ~15%
- **base**: WER ~8%
- **small**: WER ~5%
- **medium**: WER ~4%
- **large**: WER ~3%
- **turbo**: WER ~3.5%

### Spanish

- **tiny**: WER ~20%
- **base**: WER ~12%
- **medium**: WER ~6%
- **large**: WER ~4%

### Chinese

- **small**: WER ~15%
- **medium**: WER ~8%
- **large**: WER ~5%

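WER (word error rate) is the word-level edit distance between hypothesis and reference, divided by the reference length. A self-contained sketch for checking a model's output against a reference transcript:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # 1 substitution / 3 words
```

In practice, normalize case and punctuation before computing WER, as the published Whisper figures do.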
## Best practices

1. **Use English-only models** - Better for small models (tiny/base)
2. **Specify language** - Faster than auto-detect
3. **Add initial prompt** - Improves accuracy for technical terms
4. **Use larger models** - For low-resource languages
5. **Test on a sample** - Quality varies by accent/dialect
6. **Consider audio quality** - Clear audio gives better results
7. **Check language codes** - Use ISO 639-1 codes (2 letters)

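Practice 1 refers to the `.en` checkpoints (`tiny.en`, `base.en`, `small.en`, `medium.en`), which exist for the smaller sizes only. A small helper sketch for picking one; the function itself is illustrative:

```python
def pick_checkpoint(size: str, english_only: bool) -> str:
    """Prefer the English-only variant where one exists.

    Per the Whisper repo, .en variants exist for tiny/base/small/medium;
    large and turbo are multilingual only.
    """
    en_variants = {"tiny", "base", "small", "medium"}
    if english_only and size in en_variants:
        return f"{size}.en"
    return size

print(pick_checkpoint("base", True))   # base.en
print(pick_checkpoint("large", True))  # large (no .en variant)
```

Pass the result straight to `whisper.load_model(...)`.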
## Language detection

```python
# Detect language only (no transcription)
import whisper

model = whisper.load_model("base")

# Load audio
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Make log-Mel spectrogram
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect language
_, probs = model.detect_language(mel)
detected_language = max(probs, key=probs.get)

print(f"Detected language: {detected_language}")
print(f"Confidence: {probs[detected_language]:.2%}")
```

## Resources

- **Paper**: https://arxiv.org/abs/2212.04356
- **GitHub**: https://github.com/openai/whisper
- **Model Card**: https://github.com/openai/whisper/blob/main/model-card.md

package/bin/synsc
ADDED

Binary file