@synsci/cli-darwin-x64 1.1.49
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/skills/accelerate/SKILL.md +332 -0
- package/bin/skills/accelerate/references/custom-plugins.md +453 -0
- package/bin/skills/accelerate/references/megatron-integration.md +489 -0
- package/bin/skills/accelerate/references/performance.md +525 -0
- package/bin/skills/audiocraft/SKILL.md +564 -0
- package/bin/skills/audiocraft/references/advanced-usage.md +666 -0
- package/bin/skills/audiocraft/references/troubleshooting.md +504 -0
- package/bin/skills/autogpt/SKILL.md +403 -0
- package/bin/skills/autogpt/references/advanced-usage.md +535 -0
- package/bin/skills/autogpt/references/troubleshooting.md +420 -0
- package/bin/skills/awq/SKILL.md +310 -0
- package/bin/skills/awq/references/advanced-usage.md +324 -0
- package/bin/skills/awq/references/troubleshooting.md +344 -0
- package/bin/skills/axolotl/SKILL.md +158 -0
- package/bin/skills/axolotl/references/api.md +5548 -0
- package/bin/skills/axolotl/references/dataset-formats.md +1029 -0
- package/bin/skills/axolotl/references/index.md +15 -0
- package/bin/skills/axolotl/references/other.md +3563 -0
- package/bin/skills/bigcode-evaluation-harness/SKILL.md +405 -0
- package/bin/skills/bigcode-evaluation-harness/references/benchmarks.md +393 -0
- package/bin/skills/bigcode-evaluation-harness/references/custom-tasks.md +424 -0
- package/bin/skills/bigcode-evaluation-harness/references/issues.md +394 -0
- package/bin/skills/bitsandbytes/SKILL.md +411 -0
- package/bin/skills/bitsandbytes/references/memory-optimization.md +521 -0
- package/bin/skills/bitsandbytes/references/qlora-training.md +521 -0
- package/bin/skills/bitsandbytes/references/quantization-formats.md +447 -0
- package/bin/skills/blip-2/SKILL.md +564 -0
- package/bin/skills/blip-2/references/advanced-usage.md +680 -0
- package/bin/skills/blip-2/references/troubleshooting.md +526 -0
- package/bin/skills/chroma/SKILL.md +406 -0
- package/bin/skills/chroma/references/integration.md +38 -0
- package/bin/skills/clip/SKILL.md +253 -0
- package/bin/skills/clip/references/applications.md +207 -0
- package/bin/skills/constitutional-ai/SKILL.md +290 -0
- package/bin/skills/crewai/SKILL.md +498 -0
- package/bin/skills/crewai/references/flows.md +438 -0
- package/bin/skills/crewai/references/tools.md +429 -0
- package/bin/skills/crewai/references/troubleshooting.md +480 -0
- package/bin/skills/deepspeed/SKILL.md +141 -0
- package/bin/skills/deepspeed/references/08.md +17 -0
- package/bin/skills/deepspeed/references/09.md +173 -0
- package/bin/skills/deepspeed/references/2020.md +378 -0
- package/bin/skills/deepspeed/references/2023.md +279 -0
- package/bin/skills/deepspeed/references/assets.md +179 -0
- package/bin/skills/deepspeed/references/index.md +35 -0
- package/bin/skills/deepspeed/references/mii.md +118 -0
- package/bin/skills/deepspeed/references/other.md +1191 -0
- package/bin/skills/deepspeed/references/tutorials.md +6554 -0
- package/bin/skills/dspy/SKILL.md +590 -0
- package/bin/skills/dspy/references/examples.md +663 -0
- package/bin/skills/dspy/references/modules.md +475 -0
- package/bin/skills/dspy/references/optimizers.md +566 -0
- package/bin/skills/faiss/SKILL.md +221 -0
- package/bin/skills/faiss/references/index_types.md +280 -0
- package/bin/skills/flash-attention/SKILL.md +367 -0
- package/bin/skills/flash-attention/references/benchmarks.md +215 -0
- package/bin/skills/flash-attention/references/transformers-integration.md +293 -0
- package/bin/skills/gguf/SKILL.md +427 -0
- package/bin/skills/gguf/references/advanced-usage.md +504 -0
- package/bin/skills/gguf/references/troubleshooting.md +442 -0
- package/bin/skills/gptq/SKILL.md +450 -0
- package/bin/skills/gptq/references/calibration.md +337 -0
- package/bin/skills/gptq/references/integration.md +129 -0
- package/bin/skills/gptq/references/troubleshooting.md +95 -0
- package/bin/skills/grpo-rl-training/README.md +97 -0
- package/bin/skills/grpo-rl-training/SKILL.md +572 -0
- package/bin/skills/grpo-rl-training/examples/reward_functions_library.py +393 -0
- package/bin/skills/grpo-rl-training/templates/basic_grpo_training.py +228 -0
- package/bin/skills/guidance/SKILL.md +572 -0
- package/bin/skills/guidance/references/backends.md +554 -0
- package/bin/skills/guidance/references/constraints.md +674 -0
- package/bin/skills/guidance/references/examples.md +767 -0
- package/bin/skills/hqq/SKILL.md +445 -0
- package/bin/skills/hqq/references/advanced-usage.md +528 -0
- package/bin/skills/hqq/references/troubleshooting.md +503 -0
- package/bin/skills/hugging-face-cli/SKILL.md +191 -0
- package/bin/skills/hugging-face-cli/references/commands.md +954 -0
- package/bin/skills/hugging-face-cli/references/examples.md +374 -0
- package/bin/skills/hugging-face-datasets/SKILL.md +547 -0
- package/bin/skills/hugging-face-datasets/examples/diverse_training_examples.json +239 -0
- package/bin/skills/hugging-face-datasets/examples/system_prompt_template.txt +196 -0
- package/bin/skills/hugging-face-datasets/examples/training_examples.json +176 -0
- package/bin/skills/hugging-face-datasets/scripts/dataset_manager.py +522 -0
- package/bin/skills/hugging-face-datasets/scripts/sql_manager.py +844 -0
- package/bin/skills/hugging-face-datasets/templates/chat.json +55 -0
- package/bin/skills/hugging-face-datasets/templates/classification.json +62 -0
- package/bin/skills/hugging-face-datasets/templates/completion.json +51 -0
- package/bin/skills/hugging-face-datasets/templates/custom.json +75 -0
- package/bin/skills/hugging-face-datasets/templates/qa.json +54 -0
- package/bin/skills/hugging-face-datasets/templates/tabular.json +81 -0
- package/bin/skills/hugging-face-evaluation/SKILL.md +656 -0
- package/bin/skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md +382 -0
- package/bin/skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py +141 -0
- package/bin/skills/hugging-face-evaluation/examples/example_readme_tables.md +135 -0
- package/bin/skills/hugging-face-evaluation/examples/metric_mapping.json +50 -0
- package/bin/skills/hugging-face-evaluation/requirements.txt +20 -0
- package/bin/skills/hugging-face-evaluation/scripts/evaluation_manager.py +1374 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_eval_uv.py +104 -0
- package/bin/skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py +317 -0
- package/bin/skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py +303 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_eval_job.py +98 -0
- package/bin/skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py +331 -0
- package/bin/skills/hugging-face-evaluation/scripts/test_extraction.py +206 -0
- package/bin/skills/hugging-face-jobs/SKILL.md +1041 -0
- package/bin/skills/hugging-face-jobs/index.html +216 -0
- package/bin/skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bin/skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bin/skills/hugging-face-jobs/references/token_usage.md +546 -0
- package/bin/skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bin/skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bin/skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bin/skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bin/skills/hugging-face-model-trainer/SKILL.md +711 -0
- package/bin/skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bin/skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bin/skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bin/skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bin/skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bin/skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bin/skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bin/skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bin/skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bin/skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bin/skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bin/skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bin/skills/hugging-face-paper-publisher/SKILL.md +627 -0
- package/bin/skills/hugging-face-paper-publisher/examples/example_usage.md +327 -0
- package/bin/skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bin/skills/hugging-face-paper-publisher/scripts/paper_manager.py +508 -0
- package/bin/skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bin/skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bin/skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bin/skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bin/skills/hugging-face-tool-builder/SKILL.md +115 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.py +57 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.sh +40 -0
- package/bin/skills/hugging-face-tool-builder/references/baseline_hf_api.tsx +57 -0
- package/bin/skills/hugging-face-tool-builder/references/find_models_by_paper.sh +230 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_enrich_models.sh +96 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh +188 -0
- package/bin/skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh +171 -0
- package/bin/skills/hugging-face-trackio/SKILL.md +65 -0
- package/bin/skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bin/skills/hugging-face-trackio/references/retrieving_metrics.md +223 -0
- package/bin/skills/huggingface-tokenizers/SKILL.md +516 -0
- package/bin/skills/huggingface-tokenizers/references/algorithms.md +653 -0
- package/bin/skills/huggingface-tokenizers/references/integration.md +637 -0
- package/bin/skills/huggingface-tokenizers/references/pipeline.md +723 -0
- package/bin/skills/huggingface-tokenizers/references/training.md +565 -0
- package/bin/skills/instructor/SKILL.md +740 -0
- package/bin/skills/instructor/references/examples.md +107 -0
- package/bin/skills/instructor/references/providers.md +70 -0
- package/bin/skills/instructor/references/validation.md +606 -0
- package/bin/skills/knowledge-distillation/SKILL.md +458 -0
- package/bin/skills/knowledge-distillation/references/minillm.md +334 -0
- package/bin/skills/lambda-labs/SKILL.md +545 -0
- package/bin/skills/lambda-labs/references/advanced-usage.md +611 -0
- package/bin/skills/lambda-labs/references/troubleshooting.md +530 -0
- package/bin/skills/langchain/SKILL.md +480 -0
- package/bin/skills/langchain/references/agents.md +499 -0
- package/bin/skills/langchain/references/integration.md +562 -0
- package/bin/skills/langchain/references/rag.md +600 -0
- package/bin/skills/langsmith/SKILL.md +422 -0
- package/bin/skills/langsmith/references/advanced-usage.md +548 -0
- package/bin/skills/langsmith/references/troubleshooting.md +537 -0
- package/bin/skills/litgpt/SKILL.md +469 -0
- package/bin/skills/litgpt/references/custom-models.md +568 -0
- package/bin/skills/litgpt/references/distributed-training.md +451 -0
- package/bin/skills/litgpt/references/supported-models.md +336 -0
- package/bin/skills/litgpt/references/training-recipes.md +619 -0
- package/bin/skills/llama-cpp/SKILL.md +258 -0
- package/bin/skills/llama-cpp/references/optimization.md +89 -0
- package/bin/skills/llama-cpp/references/quantization.md +213 -0
- package/bin/skills/llama-cpp/references/server.md +125 -0
- package/bin/skills/llama-factory/SKILL.md +80 -0
- package/bin/skills/llama-factory/references/_images.md +23 -0
- package/bin/skills/llama-factory/references/advanced.md +1055 -0
- package/bin/skills/llama-factory/references/getting_started.md +349 -0
- package/bin/skills/llama-factory/references/index.md +19 -0
- package/bin/skills/llama-factory/references/other.md +31 -0
- package/bin/skills/llamaguard/SKILL.md +337 -0
- package/bin/skills/llamaindex/SKILL.md +569 -0
- package/bin/skills/llamaindex/references/agents.md +83 -0
- package/bin/skills/llamaindex/references/data_connectors.md +108 -0
- package/bin/skills/llamaindex/references/query_engines.md +406 -0
- package/bin/skills/llava/SKILL.md +304 -0
- package/bin/skills/llava/references/training.md +197 -0
- package/bin/skills/lm-evaluation-harness/SKILL.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/api-evaluation.md +490 -0
- package/bin/skills/lm-evaluation-harness/references/benchmark-guide.md +488 -0
- package/bin/skills/lm-evaluation-harness/references/custom-tasks.md +602 -0
- package/bin/skills/lm-evaluation-harness/references/distributed-eval.md +519 -0
- package/bin/skills/long-context/SKILL.md +536 -0
- package/bin/skills/long-context/references/extension_methods.md +468 -0
- package/bin/skills/long-context/references/fine_tuning.md +611 -0
- package/bin/skills/long-context/references/rope.md +402 -0
- package/bin/skills/mamba/SKILL.md +260 -0
- package/bin/skills/mamba/references/architecture-details.md +206 -0
- package/bin/skills/mamba/references/benchmarks.md +255 -0
- package/bin/skills/mamba/references/training-guide.md +388 -0
- package/bin/skills/megatron-core/SKILL.md +366 -0
- package/bin/skills/megatron-core/references/benchmarks.md +249 -0
- package/bin/skills/megatron-core/references/parallelism-guide.md +404 -0
- package/bin/skills/megatron-core/references/production-examples.md +473 -0
- package/bin/skills/megatron-core/references/training-recipes.md +547 -0
- package/bin/skills/miles/SKILL.md +315 -0
- package/bin/skills/miles/references/api-reference.md +141 -0
- package/bin/skills/miles/references/troubleshooting.md +352 -0
- package/bin/skills/mlflow/SKILL.md +704 -0
- package/bin/skills/mlflow/references/deployment.md +744 -0
- package/bin/skills/mlflow/references/model-registry.md +770 -0
- package/bin/skills/mlflow/references/tracking.md +680 -0
- package/bin/skills/modal/SKILL.md +341 -0
- package/bin/skills/modal/references/advanced-usage.md +503 -0
- package/bin/skills/modal/references/troubleshooting.md +494 -0
- package/bin/skills/model-merging/SKILL.md +539 -0
- package/bin/skills/model-merging/references/evaluation.md +462 -0
- package/bin/skills/model-merging/references/examples.md +428 -0
- package/bin/skills/model-merging/references/methods.md +352 -0
- package/bin/skills/model-pruning/SKILL.md +495 -0
- package/bin/skills/model-pruning/references/wanda.md +347 -0
- package/bin/skills/moe-training/SKILL.md +526 -0
- package/bin/skills/moe-training/references/architectures.md +432 -0
- package/bin/skills/moe-training/references/inference.md +348 -0
- package/bin/skills/moe-training/references/training.md +425 -0
- package/bin/skills/nanogpt/SKILL.md +290 -0
- package/bin/skills/nanogpt/references/architecture.md +382 -0
- package/bin/skills/nanogpt/references/data.md +476 -0
- package/bin/skills/nanogpt/references/training.md +564 -0
- package/bin/skills/nemo-curator/SKILL.md +383 -0
- package/bin/skills/nemo-curator/references/deduplication.md +87 -0
- package/bin/skills/nemo-curator/references/filtering.md +102 -0
- package/bin/skills/nemo-evaluator/SKILL.md +494 -0
- package/bin/skills/nemo-evaluator/references/adapter-system.md +340 -0
- package/bin/skills/nemo-evaluator/references/configuration.md +447 -0
- package/bin/skills/nemo-evaluator/references/custom-benchmarks.md +315 -0
- package/bin/skills/nemo-evaluator/references/execution-backends.md +361 -0
- package/bin/skills/nemo-guardrails/SKILL.md +297 -0
- package/bin/skills/nnsight/SKILL.md +436 -0
- package/bin/skills/nnsight/references/README.md +78 -0
- package/bin/skills/nnsight/references/api.md +344 -0
- package/bin/skills/nnsight/references/tutorials.md +300 -0
- package/bin/skills/openrlhf/SKILL.md +249 -0
- package/bin/skills/openrlhf/references/algorithm-comparison.md +404 -0
- package/bin/skills/openrlhf/references/custom-rewards.md +530 -0
- package/bin/skills/openrlhf/references/hybrid-engine.md +287 -0
- package/bin/skills/openrlhf/references/multi-node-training.md +454 -0
- package/bin/skills/outlines/SKILL.md +652 -0
- package/bin/skills/outlines/references/backends.md +615 -0
- package/bin/skills/outlines/references/examples.md +773 -0
- package/bin/skills/outlines/references/json_generation.md +652 -0
- package/bin/skills/peft/SKILL.md +431 -0
- package/bin/skills/peft/references/advanced-usage.md +514 -0
- package/bin/skills/peft/references/troubleshooting.md +480 -0
- package/bin/skills/phoenix/SKILL.md +475 -0
- package/bin/skills/phoenix/references/advanced-usage.md +619 -0
- package/bin/skills/phoenix/references/troubleshooting.md +538 -0
- package/bin/skills/pinecone/SKILL.md +358 -0
- package/bin/skills/pinecone/references/deployment.md +181 -0
- package/bin/skills/pytorch-fsdp/SKILL.md +126 -0
- package/bin/skills/pytorch-fsdp/references/index.md +7 -0
- package/bin/skills/pytorch-fsdp/references/other.md +4249 -0
- package/bin/skills/pytorch-lightning/SKILL.md +346 -0
- package/bin/skills/pytorch-lightning/references/callbacks.md +436 -0
- package/bin/skills/pytorch-lightning/references/distributed.md +490 -0
- package/bin/skills/pytorch-lightning/references/hyperparameter-tuning.md +556 -0
- package/bin/skills/pyvene/SKILL.md +473 -0
- package/bin/skills/pyvene/references/README.md +73 -0
- package/bin/skills/pyvene/references/api.md +383 -0
- package/bin/skills/pyvene/references/tutorials.md +376 -0
- package/bin/skills/qdrant/SKILL.md +493 -0
- package/bin/skills/qdrant/references/advanced-usage.md +648 -0
- package/bin/skills/qdrant/references/troubleshooting.md +631 -0
- package/bin/skills/ray-data/SKILL.md +326 -0
- package/bin/skills/ray-data/references/integration.md +82 -0
- package/bin/skills/ray-data/references/transformations.md +83 -0
- package/bin/skills/ray-train/SKILL.md +406 -0
- package/bin/skills/ray-train/references/multi-node.md +628 -0
- package/bin/skills/rwkv/SKILL.md +260 -0
- package/bin/skills/rwkv/references/architecture-details.md +344 -0
- package/bin/skills/rwkv/references/rwkv7.md +386 -0
- package/bin/skills/rwkv/references/state-management.md +369 -0
- package/bin/skills/saelens/SKILL.md +386 -0
- package/bin/skills/saelens/references/README.md +70 -0
- package/bin/skills/saelens/references/api.md +333 -0
- package/bin/skills/saelens/references/tutorials.md +318 -0
- package/bin/skills/segment-anything/SKILL.md +500 -0
- package/bin/skills/segment-anything/references/advanced-usage.md +589 -0
- package/bin/skills/segment-anything/references/troubleshooting.md +484 -0
- package/bin/skills/sentence-transformers/SKILL.md +255 -0
- package/bin/skills/sentence-transformers/references/models.md +123 -0
- package/bin/skills/sentencepiece/SKILL.md +235 -0
- package/bin/skills/sentencepiece/references/algorithms.md +200 -0
- package/bin/skills/sentencepiece/references/training.md +304 -0
- package/bin/skills/sglang/SKILL.md +442 -0
- package/bin/skills/sglang/references/deployment.md +490 -0
- package/bin/skills/sglang/references/radix-attention.md +413 -0
- package/bin/skills/sglang/references/structured-generation.md +541 -0
- package/bin/skills/simpo/SKILL.md +219 -0
- package/bin/skills/simpo/references/datasets.md +478 -0
- package/bin/skills/simpo/references/hyperparameters.md +452 -0
- package/bin/skills/simpo/references/loss-functions.md +350 -0
- package/bin/skills/skypilot/SKILL.md +509 -0
- package/bin/skills/skypilot/references/advanced-usage.md +491 -0
- package/bin/skills/skypilot/references/troubleshooting.md +570 -0
- package/bin/skills/slime/SKILL.md +464 -0
- package/bin/skills/slime/references/api-reference.md +392 -0
- package/bin/skills/slime/references/troubleshooting.md +386 -0
- package/bin/skills/speculative-decoding/SKILL.md +467 -0
- package/bin/skills/speculative-decoding/references/lookahead.md +309 -0
- package/bin/skills/speculative-decoding/references/medusa.md +350 -0
- package/bin/skills/stable-diffusion/SKILL.md +519 -0
- package/bin/skills/stable-diffusion/references/advanced-usage.md +716 -0
- package/bin/skills/stable-diffusion/references/troubleshooting.md +555 -0
- package/bin/skills/tensorboard/SKILL.md +629 -0
- package/bin/skills/tensorboard/references/integrations.md +638 -0
- package/bin/skills/tensorboard/references/profiling.md +545 -0
- package/bin/skills/tensorboard/references/visualization.md +620 -0
- package/bin/skills/tensorrt-llm/SKILL.md +187 -0
- package/bin/skills/tensorrt-llm/references/multi-gpu.md +298 -0
- package/bin/skills/tensorrt-llm/references/optimization.md +242 -0
- package/bin/skills/tensorrt-llm/references/serving.md +470 -0
- package/bin/skills/tinker/SKILL.md +362 -0
- package/bin/skills/tinker/references/api-reference.md +168 -0
- package/bin/skills/tinker/references/getting-started.md +157 -0
- package/bin/skills/tinker/references/loss-functions.md +163 -0
- package/bin/skills/tinker/references/models-and-lora.md +139 -0
- package/bin/skills/tinker/references/recipes.md +280 -0
- package/bin/skills/tinker/references/reinforcement-learning.md +212 -0
- package/bin/skills/tinker/references/rendering.md +243 -0
- package/bin/skills/tinker/references/supervised-learning.md +232 -0
- package/bin/skills/tinker-training-cost/SKILL.md +187 -0
- package/bin/skills/tinker-training-cost/scripts/calculate_cost.py +123 -0
- package/bin/skills/torchforge/SKILL.md +433 -0
- package/bin/skills/torchforge/references/api-reference.md +327 -0
- package/bin/skills/torchforge/references/troubleshooting.md +409 -0
- package/bin/skills/torchtitan/SKILL.md +358 -0
- package/bin/skills/torchtitan/references/checkpoint.md +181 -0
- package/bin/skills/torchtitan/references/custom-models.md +258 -0
- package/bin/skills/torchtitan/references/float8.md +133 -0
- package/bin/skills/torchtitan/references/fsdp.md +126 -0
- package/bin/skills/transformer-lens/SKILL.md +346 -0
- package/bin/skills/transformer-lens/references/README.md +54 -0
- package/bin/skills/transformer-lens/references/api.md +362 -0
- package/bin/skills/transformer-lens/references/tutorials.md +339 -0
- package/bin/skills/trl-fine-tuning/SKILL.md +455 -0
- package/bin/skills/trl-fine-tuning/references/dpo-variants.md +227 -0
- package/bin/skills/trl-fine-tuning/references/online-rl.md +82 -0
- package/bin/skills/trl-fine-tuning/references/reward-modeling.md +122 -0
- package/bin/skills/trl-fine-tuning/references/sft-training.md +168 -0
- package/bin/skills/unsloth/SKILL.md +80 -0
- package/bin/skills/unsloth/references/index.md +7 -0
- package/bin/skills/unsloth/references/llms-full.md +16799 -0
- package/bin/skills/unsloth/references/llms-txt.md +12044 -0
- package/bin/skills/unsloth/references/llms.md +82 -0
- package/bin/skills/verl/SKILL.md +391 -0
- package/bin/skills/verl/references/api-reference.md +301 -0
- package/bin/skills/verl/references/troubleshooting.md +391 -0
- package/bin/skills/vllm/SKILL.md +364 -0
- package/bin/skills/vllm/references/optimization.md +226 -0
- package/bin/skills/vllm/references/quantization.md +284 -0
- package/bin/skills/vllm/references/server-deployment.md +255 -0
- package/bin/skills/vllm/references/troubleshooting.md +447 -0
- package/bin/skills/weights-and-biases/SKILL.md +590 -0
- package/bin/skills/weights-and-biases/references/artifacts.md +584 -0
- package/bin/skills/weights-and-biases/references/integrations.md +700 -0
- package/bin/skills/weights-and-biases/references/sweeps.md +847 -0
- package/bin/skills/whisper/SKILL.md +317 -0
- package/bin/skills/whisper/references/languages.md +189 -0
- package/bin/synsc +0 -0
- package/package.json +10 -0
package/bin/skills/sentencepiece/references/algorithms.md
@@ -0,0 +1,200 @@

# Tokenization Algorithms

BPE vs Unigram comparison and subword regularization.

## BPE (Byte-Pair Encoding)

### Algorithm

1. Initialize the vocabulary with individual characters
2. Count the frequency of each adjacent token pair
3. Merge the most frequent pair into a new token
4. Repeat until the target vocabulary size is reached

### Example

**Corpus**:
```
low: 5
lower: 2
newest: 6
widest: 3
```

**Iteration 1**:
- Most frequent pair: 'e' + 's' (9 occurrences)
- Merge → 'es'
- Vocabulary: [chars] + ['es']

**Iteration 2**:
- Most frequent pair: 'es' + 't' (9 occurrences)
- Merge → 'est'
- Vocabulary: [chars] + ['es', 'est']

**Result**: `newest` → `new|est`, `widest` → `wid|est`
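The merge loop above can be sketched in plain Python. This is a toy illustration of the algorithm on the example corpus, not SentencePiece's optimized implementation:

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Learn BPE merges from a {word: count} corpus (toy sketch)."""
    # Represent each word as a tuple of symbols (initially characters).
    words = {tuple(w): c for w, c in corpus.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for sym, count in words.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word.
        merged = {}
        for sym, count in words.items():
            out, i = [], 0
            while i < len(sym):
                if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            merged[tuple(out)] = count
        words = merged
    return merges, words

merges, words = bpe_merges({'low': 5, 'lower': 2, 'newest': 6, 'widest': 3}, 2)
print(merges)  # [('e', 's'), ('es', 't')]
```

Note that 'e'+'s' and 's'+'t' tie at 9 in the first iteration; like this sketch, real implementations break ties by a fixed rule, so the exact merge order can differ between tools.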

### Implementation

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input='corpus.txt',
    model_prefix='m',
    model_type='bpe',
    vocab_size=16000
)
```

### Advantages

- Simple algorithm
- Fast training
- Good compression ratio

### Disadvantages

- Deterministic (no sampling)
- May split common words unexpectedly

## Unigram

### Algorithm

1. Start with a large seed vocabulary (e.g. all frequent substrings)
2. Compute the probability of each token under a unigram language model
3. Remove the tokens whose removal least increases the loss
4. Repeat until the target vocabulary size is reached

### Probabilistic tokenization

Given a vocabulary with probabilities:
```
P('low') = 0.02
P('est') = 0.03
P('l')   = 0.01
P('o')   = 0.015
...
```

Tokenize "lowest":
```
Option 1: ['low', 'est']
P = 0.02 × 0.03 = 0.0006 ← highest

Option 2: ['l', 'o', 'w', 'est']
P = 0.01 × 0.015 × 0.01 × 0.03 = 0.000000045

Choose option 1 (highest probability)
```
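The highest-probability segmentation above can be found with a short Viterbi pass over all vocabulary matches. This is a toy sketch using the example's probabilities; real Unigram models work in log space over a much larger vocabulary:

```python
import math

def best_segmentation(text, vocab):
    """Viterbi search for the max-probability segmentation (toy sketch)."""
    n = len(text)
    # best[i] = (log prob, segmentation) of the prefix text[:i]
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(i):
            piece = text[j:i]
            if piece in vocab:
                score = best[j][0] + math.log(vocab[piece])
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [piece])
    return best[n][1]

vocab = {'low': 0.02, 'est': 0.03, 'l': 0.01, 'o': 0.015,
         'w': 0.01, 'e': 0.01, 's': 0.01, 't': 0.01}
print(best_segmentation('lowest', vocab))  # ['low', 'est']
```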

### Implementation

```python
spm.SentencePieceTrainer.train(
    input='corpus.txt',
    model_prefix='m',
    model_type='unigram',
    vocab_size=8000
)
```

### Advantages

- Probabilistic (can sample alternative segmentations)
- Better for morphologically rich languages
- Supports subword regularization

### Disadvantages

- Slower training
- More complex algorithm

## Comparison

| Feature | BPE | Unigram |
|---------|-----|---------|
| Training speed | Fast | Slow |
| Tokenization | Deterministic | Probabilistic |
| Sampling | No | Yes |
| Typical vocab size | 16k-32k | 8k-32k |
| Used by | GPT-2, RoBERTa, mBART | T5, ALBERT, XLNet |

## Subword regularization

Sample different tokenizations during training for robustness.

### Enable sampling

```python
sp = spm.SentencePieceProcessor(model_file='m.model')

# Sample different tokenizations
for _ in range(5):
    pieces = sp.encode('tokenization', out_type=str, enable_sampling=True, alpha=0.1)
    print(pieces)

# Output (different each time), e.g.:
# ['▁token', 'ization']
# ['▁tok', 'en', 'ization']
# ['▁token', 'iz', 'ation']
# ['▁to', 'ken', 'ization']
# ['▁token', 'ization']
```

### Parameters

- `alpha`: smoothing (inverse temperature) for sampling. Candidates are drawn with probability proportional to P(x)^alpha, so:
  - large `alpha` (e.g. ≥ 1.0) → sharply peaked, close to the deterministic Viterbi tokenization
  - `alpha` ≈ 0.1-0.5 → typical range for training-time regularization
  - `alpha` → 0 → near-uniform sampling over candidates (maximum variation)
  - pass `enable_sampling=False` for fully deterministic output

### Benefits

1. **Robustness**: Model learns multiple tokenizations
2. **Data augmentation**: More diverse training data
3. **Better generalization**: Less overfitting to a specific tokenization

### Use case

```python
# Training loop with regularization
for batch in dataloader:
    # Sample a different tokenization of each example every epoch
    tokens = sp.encode(batch['text'], enable_sampling=True, alpha=0.1)
    # Train model...
```

**Used by**: mT5, XLM-RoBERTa
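The effect of `alpha` can be seen with a toy sampler over a fixed set of candidate segmentations, drawing each with probability proportional to P(x)^alpha (the candidates and probabilities below are illustrative, not from a trained model):

```python
import random
from collections import Counter

def sample_segmentation(candidates, alpha, rng):
    """Sample one segmentation with probability proportional to P(x)**alpha."""
    weights = [p ** alpha for _, p in candidates]
    total = sum(weights)
    r = rng.random() * total
    for (seg, _), w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return seg
    return candidates[-1][0]

# Toy candidate segmentations of "lowest" with illustrative probabilities
candidates = [
    (('low', 'est'), 6e-4),
    (('low', 'e', 'st'), 4e-6),
    (('l', 'o', 'w', 'est'), 4.5e-8),
]

rng = random.Random(0)
for alpha in (1.0, 0.1):
    counts = Counter(sample_segmentation(candidates, alpha, rng)
                     for _ in range(1000))
    print(alpha, counts)
```

With `alpha=1.0` almost every draw is the best segmentation; with `alpha=0.1` the flattened distribution spreads draws across all three candidates, which is the behavior subword regularization relies on.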

## NBest encoding

Get multiple tokenization candidates.

```python
sp = spm.SentencePieceProcessor(model_file='m.model')

# Get the top-5 tokenizations
nbest = sp.nbest_encode('tokenization', nbest_size=5, out_type=str)

for pieces in nbest:
    print(pieces)

# Output, e.g.:
# ['▁token', 'ization']
# ['▁tok', 'en', 'ization']
# ['▁token', 'iz', 'ation']
```

Recent SentencePiece releases also expose `sample_encode_and_score` when per-candidate scores are needed.

### Use cases

1. **Ensemble tokenization**: Average predictions over multiple tokenizations
2. **Uncertainty estimation**: Check how much the candidates vary
3. **Debugging**: Understand tokenizer behavior

## Best practices

1. **Use Unigram for multilingual models** - Better for diverse languages
2. **Use BPE for speed** - Faster training and inference
3. **Enable subword regularization during training** - Improves model robustness
4. **Tune `alpha`** - Values around 0.1-0.5 are common; smaller values give more variation
5. **Use deterministic mode for inference** - Consistent results
package/bin/skills/sentencepiece/references/training.md
@@ -0,0 +1,304 @@

# SentencePiece Training Guide

Complete guide to training SentencePiece models.

## Training workflow

### Step 1: Prepare corpus

```bash
# Plain text file, one sentence per line (recommended)
cat corpus.txt
# Hello world
# This is a test
# SentencePiece is language-independent

# No pre-tokenization or normalization is needed:
# SentencePiece trains directly on raw text
```

### Step 2: Train model

**Command-line**:
```bash
spm_train \
  --input=corpus.txt \
  --model_prefix=m \
  --vocab_size=8000 \
  --model_type=unigram \
  --character_coverage=0.9995
```

**Python API**:
```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input='corpus.txt',
    model_prefix='m',
    vocab_size=8000,
    model_type='unigram'
)
```

**Output**: `m.model` (binary model), `m.vocab` (text vocabulary)
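The `.vocab` file is plain text with one `piece<TAB>score` pair per line (for unigram models the score is a log probability). A small sketch of reading it; the sample lines below are illustrative, not from a real model:

```python
def load_vocab(lines):
    """Parse SentencePiece .vocab lines into a {piece: score} dict."""
    vocab = {}
    for line in lines:
        piece, score = line.rstrip('\n').split('\t')
        vocab[piece] = float(score)
    return vocab

# Illustrative .vocab contents (real files begin with the special tokens)
sample = ['<unk>\t0', '<s>\t0', '</s>\t0', '\u2581the\t-3.1', 'ing\t-4.2']
vocab = load_vocab(sample)
print(vocab['ing'])  # -4.2
```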
|
44
|
+
|
|
45
|
+
### Step 3: Load and use
|
|
46
|
+
|
|
47
|
+
```python
|
|
48
|
+
sp = spm.SentencePieceProcessor(model_file='m.model')
|
|
49
|
+
pieces = sp.encode('Test sentence', out_type=str)
|
|
50
|
+
```

## Training parameters

### Core parameters

```python
spm.SentencePieceTrainer.train(
    # Required
    input='corpus.txt',            # Input corpus
    model_prefix='output',         # Output prefix
    vocab_size=8000,               # Target vocabulary size

    # Algorithm
    model_type='unigram',          # 'unigram', 'bpe', 'char', 'word'

    # Coverage
    character_coverage=0.9995,     # 0.9995 for most, 1.0 for CJK

    # Normalization
    normalization_rule_name='nmt_nfkc',  # 'nmt_nfkc', 'nfkc', 'identity'

    # Performance
    num_threads=16,                # Training threads
    input_sentence_size=10000000   # Max sentences to load
)
```

### Special tokens

```python
spm.SentencePieceTrainer.train(
    input='corpus.txt',
    model_prefix='m',
    vocab_size=32000,

    # Control symbols (special tokens for model control; must not
    # duplicate the unk/bos/eos/pad pieces defined below)
    control_symbols=['<sep>', '<cls>'],

    # User-defined symbols (never split)
    user_defined_symbols=['[MASK]', '[SEP]', '[CLS]'],

    # Special token pieces
    unk_piece='<unk>',
    bos_piece='<s>',
    eos_piece='</s>',
    pad_piece='<pad>',

    # Special token IDs
    unk_id=0,
    bos_id=1,
    eos_id=2,
    pad_id=3
)
```

### Advanced options

```python
spm.SentencePieceTrainer.train(
    input='corpus.txt',
    model_prefix='m',
    vocab_size=32000,

    # Byte fallback (handle unknown chars)
    byte_fallback=True,

    # Digit handling
    split_digits=True,            # Split digits individually

    # Script splitting
    split_by_unicode_script=True, # Split by Unicode script
    split_by_whitespace=True,     # Split by whitespace

    # Length constraints
    max_sentencepiece_length=16,  # Max token length

    # Training size
    input_sentence_size=10000000, # Max sentences
    shuffle_input_sentence=True,  # Shuffle training data

    # Seed vocabulary
    seed_sentencepiece_size=1000000  # Initial candidate vocab size
)
```

## Training from Python iterator

```python
import sentencepiece as spm
from datasets import load_dataset

# Load dataset
dataset = load_dataset('wikitext', 'wikitext-103-raw-v1', split='train')

# Create iterator over non-empty lines
def corpus_iterator():
    for example in dataset:
        if example['text'].strip():
            yield example['text']

# Train from iterator
spm.SentencePieceTrainer.train(
    sentence_iterator=corpus_iterator(),
    model_prefix='wiki',
    vocab_size=32000,
    model_type='unigram'
)
```

## Model types

### BPE

```python
spm.SentencePieceTrainer.train(
    input='corpus.txt',
    model_prefix='m',
    model_type='bpe',
    vocab_size=16000
)
```

**Training time**: ~10-15 min for 1GB corpus

### Unigram (recommended)

```python
spm.SentencePieceTrainer.train(
    input='corpus.txt',
    model_prefix='m',
    model_type='unigram',
    vocab_size=8000
)
```

**Training time**: ~30-40 min for 1GB corpus

## Character coverage

### English/European (0.9995)

```python
spm.SentencePieceTrainer.train(
    input='en_corpus.txt',
    model_prefix='en',
    character_coverage=0.9995  # Cover 99.95% of chars
)
```

Covers: a-z, A-Z, punctuation, common accents

### CJK (1.0)

```python
spm.SentencePieceTrainer.train(
    input='zh_corpus.txt',
    model_prefix='zh',
    character_coverage=1.0  # Cover ALL characters
)
```

Required for: Chinese, Japanese, Korean

### Multilingual (0.9995-1.0)

```python
spm.SentencePieceTrainer.train(
    input='multilingual_corpus.txt',
    model_prefix='multi',
    character_coverage=0.9995  # Balance coverage/size
)
```
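
The meaning of `character_coverage` can be made concrete without SentencePiece at all: count character frequencies and see how many distinct characters are needed to reach the target fraction of all occurrences; everything past that point is dropped to `<unk>` (or byte pieces). A minimal sketch with a hypothetical helper:

```python
from collections import Counter

def chars_needed(text, coverage=0.9995):
    """Return (chars kept to reach the coverage fraction, distinct chars)."""
    counts = Counter(text)
    total = sum(counts.values())
    covered, kept = 0, 0
    for _, c in counts.most_common():
        covered += c
        kept += 1
        if covered / total >= coverage:
            break
    return kept, len(counts)

sample = 'hello world ' * 100 + 'ß'  # one rare character

# At 0.99 coverage the single 'ß' falls outside the kept set
kept_99, distinct = chars_needed(sample, coverage=0.99)

# At 0.9995 coverage every character must be kept
kept_9995, _ = chars_needed(sample, coverage=0.9995)
```

Running a check like this on your own corpus shows whether 0.9995 silently discards characters you care about, which is why CJK corpora need 1.0.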

## Vocabulary size selection

| Task | Vocab Size | Rationale |
|------|------------|-----------|
| English monolingual | 16k-32k | Standard |
| Multilingual | 32k-250k | More languages |
| CJK | 32k-100k | More characters |
| Code | 16k-32k | Similar to English |

## Normalization rules

### nmt_nfkc (recommended)

```python
normalization_rule_name='nmt_nfkc'
```

- NFKC Unicode normalization
- Whitespace handling
- **Recommended for most tasks**

### identity (no normalization)

```python
normalization_rule_name='identity'
```

- Preserves input exactly
- Use for code and case-sensitive tasks

### nfkc (standard Unicode)

```python
normalization_rule_name='nfkc'
```

- Standard Unicode normalization
- Less aggressive than nmt_nfkc

## Performance optimization

### Multi-threading

```python
spm.SentencePieceTrainer.train(
    input='large_corpus.txt',
    model_prefix='m',
    num_threads=32  # Use all cores
)
```

**Speedup**: ~4-8× with 16+ cores

### Sampling input

```python
spm.SentencePieceTrainer.train(
    input='huge_corpus.txt',
    model_prefix='m',
    input_sentence_size=10000000,  # Sample 10M sentences
    shuffle_input_sentence=True
)
```

**For very large corpora** (>10GB)

### Extremely large corpus

```python
spm.SentencePieceTrainer.train(
    input='massive_corpus.txt',
    model_prefix='m',
    train_extremely_large_corpus=True,  # Enable for >10GB
    input_sentence_size=100000000
)
```

## Best practices

1. **Use Unigram for most tasks** - Better for multilingual
2. **Set character_coverage=1.0 for CJK** - Required for full coverage
3. **Use nmt_nfkc normalization** - Works well for most cases
4. **Add user_defined_symbols for special tokens** - BERT-style tokens
5. **Enable byte_fallback for robustness** - Handles emojis/rare chars
6. **Start with vocab_size=32000** - Good default for most tasks
7. **Use multi-threading** - Speeds up training significantly