@bgicli/bgicli 2.2.8 → 2.2.10

Files changed (113)
  1. package/data/skills/anthropic-algorithmic-art/SKILL.md +405 -0
  2. package/data/skills/anthropic-canvas-design/SKILL.md +130 -0
  3. package/data/skills/anthropic-claude-api/SKILL.md +243 -0
  4. package/data/skills/anthropic-doc-coauthoring/SKILL.md +375 -0
  5. package/data/skills/anthropic-docx/SKILL.md +590 -0
  6. package/data/skills/anthropic-frontend-design/SKILL.md +42 -0
  7. package/data/skills/anthropic-internal-comms/SKILL.md +32 -0
  8. package/data/skills/anthropic-mcp-builder/SKILL.md +236 -0
  9. package/data/skills/anthropic-pdf/SKILL.md +314 -0
  10. package/data/skills/anthropic-pptx/SKILL.md +232 -0
  11. package/data/skills/anthropic-skill-creator/SKILL.md +485 -0
  12. package/data/skills/anthropic-webapp-testing/SKILL.md +96 -0
  13. package/data/skills/anthropic-xlsx/SKILL.md +292 -0
  14. package/data/skills/arxiv-database/SKILL.md +362 -0
  15. package/data/skills/astropy/SKILL.md +329 -0
  16. package/data/skills/ctx-advanced-evaluation/SKILL.md +402 -0
  17. package/data/skills/ctx-bdi-mental-states/SKILL.md +311 -0
  18. package/data/skills/ctx-context-compression/SKILL.md +272 -0
  19. package/data/skills/ctx-context-degradation/SKILL.md +206 -0
  20. package/data/skills/ctx-context-fundamentals/SKILL.md +201 -0
  21. package/data/skills/ctx-context-optimization/SKILL.md +195 -0
  22. package/data/skills/ctx-evaluation/SKILL.md +251 -0
  23. package/data/skills/ctx-filesystem-context/SKILL.md +287 -0
  24. package/data/skills/ctx-hosted-agents/SKILL.md +260 -0
  25. package/data/skills/ctx-memory-systems/SKILL.md +225 -0
  26. package/data/skills/ctx-multi-agent-patterns/SKILL.md +257 -0
  27. package/data/skills/ctx-project-development/SKILL.md +291 -0
  28. package/data/skills/ctx-tool-design/SKILL.md +271 -0
  29. package/data/skills/dhdna-profiler/SKILL.md +162 -0
  30. package/data/skills/generate-image/SKILL.md +183 -0
  31. package/data/skills/geomaster/SKILL.md +365 -0
  32. package/data/skills/get-available-resources/SKILL.md +275 -0
  33. package/data/skills/hamelsmu-build-review-interface/SKILL.md +96 -0
  34. package/data/skills/hamelsmu-error-analysis/SKILL.md +164 -0
  35. package/data/skills/hamelsmu-eval-audit/SKILL.md +183 -0
  36. package/data/skills/hamelsmu-evaluate-rag/SKILL.md +177 -0
  37. package/data/skills/hamelsmu-generate-synthetic-data/SKILL.md +131 -0
  38. package/data/skills/hamelsmu-validate-evaluator/SKILL.md +212 -0
  39. package/data/skills/hamelsmu-write-judge-prompt/SKILL.md +144 -0
  40. package/data/skills/hf-cli/SKILL.md +174 -0
  41. package/data/skills/hf-mcp/SKILL.md +178 -0
  42. package/data/skills/hugging-face-dataset-viewer/SKILL.md +121 -0
  43. package/data/skills/hugging-face-datasets/SKILL.md +542 -0
  44. package/data/skills/hugging-face-evaluation/SKILL.md +651 -0
  45. package/data/skills/hugging-face-jobs/SKILL.md +1042 -0
  46. package/data/skills/hugging-face-model-trainer/SKILL.md +717 -0
  47. package/data/skills/hugging-face-paper-pages/SKILL.md +239 -0
  48. package/data/skills/hugging-face-paper-publisher/SKILL.md +624 -0
  49. package/data/skills/hugging-face-tool-builder/SKILL.md +110 -0
  50. package/data/skills/hugging-face-trackio/SKILL.md +115 -0
  51. package/data/skills/hugging-face-vision-trainer/SKILL.md +593 -0
  52. package/data/skills/huggingface-gradio/SKILL.md +245 -0
  53. package/data/skills/matlab/SKILL.md +376 -0
  54. package/data/skills/modal/SKILL.md +381 -0
  55. package/data/skills/openai-cloudflare-deploy/SKILL.md +224 -0
  56. package/data/skills/openai-develop-web-game/SKILL.md +149 -0
  57. package/data/skills/openai-doc/SKILL.md +80 -0
  58. package/data/skills/openai-figma/SKILL.md +42 -0
  59. package/data/skills/openai-figma-implement-design/SKILL.md +264 -0
  60. package/data/skills/openai-gh-address-comments/SKILL.md +25 -0
  61. package/data/skills/openai-gh-fix-ci/SKILL.md +69 -0
  62. package/data/skills/openai-imagegen/SKILL.md +174 -0
  63. package/data/skills/openai-jupyter-notebook/SKILL.md +107 -0
  64. package/data/skills/openai-linear/SKILL.md +87 -0
  65. package/data/skills/openai-netlify-deploy/SKILL.md +247 -0
  66. package/data/skills/openai-notion-knowledge-capture/SKILL.md +56 -0
  67. package/data/skills/openai-notion-meeting-intelligence/SKILL.md +60 -0
  68. package/data/skills/openai-notion-research-documentation/SKILL.md +59 -0
  69. package/data/skills/openai-notion-spec-to-implementation/SKILL.md +58 -0
  70. package/data/skills/openai-openai-docs/SKILL.md +69 -0
  71. package/data/skills/openai-pdf/SKILL.md +67 -0
  72. package/data/skills/openai-playwright/SKILL.md +147 -0
  73. package/data/skills/openai-render-deploy/SKILL.md +479 -0
  74. package/data/skills/openai-screenshot/SKILL.md +267 -0
  75. package/data/skills/openai-security-best-practices/SKILL.md +86 -0
  76. package/data/skills/openai-security-ownership-map/SKILL.md +206 -0
  77. package/data/skills/openai-security-threat-model/SKILL.md +81 -0
  78. package/data/skills/openai-sentry/SKILL.md +123 -0
  79. package/data/skills/openai-sora/SKILL.md +178 -0
  80. package/data/skills/openai-speech/SKILL.md +144 -0
  81. package/data/skills/openai-spreadsheet/SKILL.md +145 -0
  82. package/data/skills/openai-transcribe/SKILL.md +81 -0
  83. package/data/skills/openai-vercel-deploy/SKILL.md +77 -0
  84. package/data/skills/openai-yeet/SKILL.md +28 -0
  85. package/data/skills/pennylane/SKILL.md +224 -0
  86. package/data/skills/polars-bio/SKILL.md +374 -0
  87. package/data/skills/primekg/SKILL.md +97 -0
  88. package/data/skills/pymatgen/SKILL.md +689 -0
  89. package/data/skills/qiskit/SKILL.md +273 -0
  90. package/data/skills/qutip/SKILL.md +316 -0
  91. package/data/skills/recursive-decomposition/SKILL.md +185 -0
  92. package/data/skills/rowan/SKILL.md +427 -0
  93. package/data/skills/scholar-evaluation/SKILL.md +298 -0
  94. package/data/skills/sentry-create-alert/SKILL.md +210 -0
  95. package/data/skills/sentry-fix-issues/SKILL.md +126 -0
  96. package/data/skills/sentry-pr-code-review/SKILL.md +105 -0
  97. package/data/skills/sentry-python-sdk/SKILL.md +317 -0
  98. package/data/skills/sentry-setup-ai-monitoring/SKILL.md +217 -0
  99. package/data/skills/stable-baselines3/SKILL.md +297 -0
  100. package/data/skills/sympy/SKILL.md +498 -0
  101. package/data/skills/trailofbits-ask-questions-if-underspecified/SKILL.md +85 -0
  102. package/data/skills/trailofbits-audit-context-building/SKILL.md +302 -0
  103. package/data/skills/trailofbits-differential-review/SKILL.md +220 -0
  104. package/data/skills/trailofbits-insecure-defaults/SKILL.md +117 -0
  105. package/data/skills/trailofbits-modern-python/SKILL.md +333 -0
  106. package/data/skills/trailofbits-property-based-testing/SKILL.md +123 -0
  107. package/data/skills/trailofbits-semgrep-rule-creator/SKILL.md +172 -0
  108. package/data/skills/trailofbits-sharp-edges/SKILL.md +292 -0
  109. package/data/skills/trailofbits-variant-analysis/SKILL.md +142 -0
  110. package/data/skills/transformers.js/SKILL.md +637 -0
  111. package/data/skills/writing/SKILL.md +419 -0
  112. package/dist/bgi.js +66 -2
  113. package/package.json +1 -1
@@ -0,0 +1,717 @@
1
+ ---
2
+ name: hugging-face-model-trainer
3
+ description: This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
4
+ license: Complete terms in LICENSE.txt
5
+ ---
6
+
7
+ # TRL Training on Hugging Face Jobs
8
+
9
+ ## Overview
10
+
11
+ Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
12
+
13
+ **TRL provides multiple training methods:**
14
+ - **SFT** (Supervised Fine-Tuning) - Standard instruction tuning
15
+ - **DPO** (Direct Preference Optimization) - Alignment from preference data
16
+ - **GRPO** (Group Relative Policy Optimization) - Online RL training
17
+ - **Reward Modeling** - Train reward models for RLHF
18
+
19
+ **For detailed TRL method documentation:**
20
+ ```python
21
+ hf_doc_search("your query", product="trl")
22
+ hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer") # SFT
23
+ hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer") # DPO
24
+ # etc.
25
+ ```
26
+
27
+ **See also:** `references/training_methods.md` for method overviews and selection guidance
28
+
29
+ ## When to Use This Skill
30
+
31
+ Use this skill when users want to:
32
+ - Fine-tune language models on cloud GPUs without local infrastructure
33
+ - Train with TRL methods (SFT, DPO, GRPO, etc.)
34
+ - Run training jobs on Hugging Face Jobs infrastructure
35
+ - Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
36
+ - Ensure trained models are permanently saved to the Hub
37
+ - Use modern workflows with optimized defaults
38
+
39
+ ### When to Use Unsloth
40
+
41
+ Use **Unsloth** (`references/unsloth.md`) instead of standard TRL when:
42
+ - **Limited GPU memory** - Unsloth uses ~60% less VRAM
43
+ - **Speed matters** - Unsloth is ~2x faster
44
+ - Training **large models (>13B)** - memory efficiency is critical
45
+ - Training **Vision-Language Models (VLMs)** - Unsloth has `FastVisionModel` support
46
+
47
+ See `references/unsloth.md` for complete Unsloth documentation and `scripts/unsloth_sft_example.py` for a production-ready training script.
48
+
49
+ ## Key Directives
50
+
51
+ When assisting with training jobs:
52
+
53
+ 1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs("uv", {...})`, NOT bash `trl-jobs` commands. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`. If user asks to "train a model", "fine-tune", or similar requests, you MUST create the training script AND submit the job immediately using `hf_jobs()`.
54
+
55
+ 2. **Always include Trackio** - Every training script should include Trackio for real-time monitoring. Use example scripts in `scripts/` as templates.
56
+
57
+ 3. **Provide job details after submission** - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
58
+
59
+ 4. **Use example scripts as templates** - Reference `scripts/train_sft_example.py`, `scripts/train_dpo_example.py`, etc. as starting points.
60
+
61
+ ## Local Script Execution
62
+
63
+ Repository scripts use PEP 723 inline dependencies. Run them with `uv run`:
64
+ ```bash
65
+ uv run scripts/estimate_cost.py --help
66
+ uv run scripts/dataset_inspector.py --help
67
+ ```
68
+
69
+ ## Prerequisites Checklist
70
+
71
+ Before starting any training job, verify:
72
+
73
+ ### ✅ **Account & Authentication**
74
+ - Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require paid plan)
75
+ - Authenticated login: Check with `hf_whoami()`
76
+ - **HF_TOKEN for Hub Push** ⚠️ CRITICAL - Training environment is ephemeral, must push to Hub or ALL training results are lost
77
+ - Token must have write permissions
78
+ - **MUST pass `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config** to make token available (the `$HF_TOKEN` syntax
79
+ references your actual token value)
80
+
81
+ ### ✅ **Dataset Requirements**
82
+ - Dataset must exist on Hub or be loadable via `datasets.load_dataset()`
83
+ - Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
84
+ - **ALWAYS validate unknown datasets** before GPU training to prevent format failures (see Dataset Validation section below)
85
+ - Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)
86
+
87
+ ### ⚠️ **Critical Settings**
88
+ - **Timeout must exceed expected training time** - Default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. Job fails and loses all progress if timeout is exceeded.
89
+ - **Hub push must be enabled** - Config: `push_to_hub=True`, `hub_model_id="username/model-name"`; Job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
90
+
91
+ ## Asynchronous Job Guidelines
92
+
93
+ **⚠️ IMPORTANT: Training jobs run asynchronously and can take hours**
94
+
95
+ ### Action Required
96
+
97
+ **When user requests training:**
98
+ 1. **Create the training script** with Trackio included (use `scripts/train_sft_example.py` as template)
99
+ 2. **Submit immediately** using `hf_jobs()` MCP tool with script content inline - don't save to file unless user requests
100
+ 3. **Report submission** with job ID, monitoring URL, and estimated time
101
+ 4. **Wait for user** to request status checks - don't poll automatically
102
+
103
+ ### Ground Rules
104
+ - **Jobs run in background** - Submission returns immediately; training continues independently
105
+ - **Initial logs delayed** - Can take 30-60 seconds for logs to appear
106
+ - **User checks status** - Wait for user to request status updates
107
+ - **Avoid polling** - Check logs only on user request; provide monitoring links instead
108
+
109
+ ### After Submission
110
+
111
+ **Provide to user:**
112
+ - ✅ Job ID and monitoring URL
113
+ - ✅ Expected completion time
114
+ - ✅ Trackio dashboard URL
115
+ - ✅ Note that user can request status checks later
116
+
117
+ **Example Response:**
118
+ ```
119
+ ✅ Job submitted successfully!
120
+
121
+ Job ID: abc123xyz
122
+ Monitor: https://huggingface.co/jobs/username/abc123xyz
123
+
124
+ Expected time: ~2 hours
125
+ Estimated cost: ~$10
126
+
127
+ The job is running in the background. Ask me to check status/logs when ready!
128
+ ```
129
+
130
+ ## Quick Start: Four Approaches
131
+
132
+ **💡 Tip for Demos:** For quick demos on smaller GPUs (t4-small), omit `eval_dataset` and `eval_strategy` to save ~40% memory. You'll still see training loss and learning progress.
133
+
134
+ ### Sequence Length Configuration
135
+
136
+ **TRL config classes use `max_length` (not `max_seq_length`)** to control tokenized sequence length:
137
+
138
+ ```python
139
+ # ✅ CORRECT - If you need to set sequence length
140
+ SFTConfig(max_length=512) # Truncate sequences to 512 tokens
141
+ DPOConfig(max_length=2048) # Longer context (2048 tokens)
142
+
143
+ # ❌ WRONG - This parameter doesn't exist
144
+ SFTConfig(max_seq_length=512) # TypeError!
145
+ ```
146
+
147
+ **Default behavior:** `max_length=1024` (truncates from right). This works well for most training.
148
+
149
+ **When to override:**
150
+ - **Longer context**: Set higher (e.g., `max_length=2048`)
151
+ - **Memory constraints**: Set lower (e.g., `max_length=512`)
152
+ - **Vision models**: Set `max_length=None` (prevents cutting image tokens)
153
+
154
+ **Usually you don't need to set this parameter at all** - the examples below use the sensible default.
155
+
156
+ ### Approach 1: UV Scripts (Recommended—Default Choice)
157
+
158
+ UV scripts use PEP 723 inline dependencies for clean, self-contained training. **This is the primary approach for Claude Code.**
159
+
160
+ ```python
161
+ hf_jobs("uv", {
162
+ "script": """
163
+ # /// script
164
+ # dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
165
+ # ///
166
+
167
+ from datasets import load_dataset
168
+ from peft import LoraConfig
169
+ from trl import SFTTrainer, SFTConfig
170
+ import trackio
171
+
172
+ dataset = load_dataset("trl-lib/Capybara", split="train")
173
+
174
+ # Create train/eval split for monitoring
175
+ dataset_split = dataset.train_test_split(test_size=0.1, seed=42)
176
+
177
+ trainer = SFTTrainer(
178
+ model="Qwen/Qwen2.5-0.5B",
179
+ train_dataset=dataset_split["train"],
180
+ eval_dataset=dataset_split["test"],
181
+ peft_config=LoraConfig(r=16, lora_alpha=32),
182
+ args=SFTConfig(
183
+ output_dir="my-model",
184
+ push_to_hub=True,
185
+ hub_model_id="username/my-model",
186
+ num_train_epochs=3,
187
+ eval_strategy="steps",
188
+ eval_steps=50,
189
+ report_to="trackio",
190
+ project="meaningful_project_name", # project name that groups related training runs (trackio)
191
+ run_name="meaningful_run_name", # descriptive name for the specific training run (trackio)
192
+ )
193
+ )
194
+
195
+ trainer.train()
196
+ trainer.push_to_hub()
197
+ """,
198
+ "flavor": "a10g-large",
199
+ "timeout": "2h",
200
+ "secrets": {"HF_TOKEN": "$HF_TOKEN"}
201
+ })
202
+ ```
203
+
204
+ **Benefits:** Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control
205
+ **When to use:** Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring `hf_jobs()`
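The inline `script` string above begins with a PEP 723 metadata block. As a minimal sketch, the helper below prepends that block to a script body before it is passed as `script`. The name `with_pep723_header` is hypothetical and is not part of TRL or the Jobs tooling; it only formats the comment header shown in the example.

```python
import json


def with_pep723_header(code: str, dependencies: list[str]) -> str:
    """Prepend a PEP 723 inline-metadata block to a script body.

    Hypothetical helper for illustration; not part of any Hugging Face package.
    """
    # json.dumps renders a list of plain ASCII strings in a form that is also valid TOML.
    deps = json.dumps(dependencies)
    header = f"# /// script\n# dependencies = {deps}\n# ///\n"
    return header + "\n" + code.lstrip("\n")


script = with_pep723_header(
    "from trl import SFTTrainer\n",
    ["trl>=0.12.0", "peft>=0.7.0", "trackio"],
)
print(script)
```

Keeping the dependency list in one place makes it harder to forget `trackio` when a script is assembled programmatically.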
206
+
207
+ #### Working with Scripts
208
+
209
+ ⚠️ **Important:** The `script` parameter accepts either inline code (as shown above) OR a URL. **Local file paths do NOT work.**
210
+
211
+ **Why local paths don't work:**
212
+ Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:
213
+ - Inline code (recommended for custom training)
214
+ - Publicly accessible URLs
215
+ - Private repo URLs (with HF_TOKEN)
216
+
217
+ **Common mistakes:**
218
+ ```python
219
+ # ❌ These will all fail
220
+ hf_jobs("uv", {"script": "train.py"})
221
+ hf_jobs("uv", {"script": "./scripts/train.py"})
222
+ hf_jobs("uv", {"script": "/path/to/train.py"})
223
+ ```
224
+
225
+ **Correct approaches:**
226
+ ```python
227
+ # ✅ Inline code (recommended)
228
+ hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})
229
+
230
+ # ✅ From Hugging Face Hub
231
+ hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})
232
+
233
+ # ✅ From GitHub
234
+ hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})
235
+
236
+ # ✅ From Gist
237
+ hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
238
+ ```
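The distinction between inline code, URLs, and local paths can be checked before submission. The following is a heuristic sketch only: `classify_script_param` is a hypothetical helper, and the actual tool may apply different rules. It assumes that multi-line content is inline code and that any single-line value that is not an http(s) URL is a local path.

```python
from urllib.parse import urlparse


def classify_script_param(script: str) -> str:
    """Return 'inline', 'url', or 'local-path' for a candidate `script` value.

    Heuristic sketch; a one-line inline script would be misread as a local path.
    """
    text = script.strip()
    # Multi-line content is treated as inline code.
    if "\n" in text:
        return "inline"
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "local-path"


for candidate in (
    "./scripts/train.py",
    "https://huggingface.co/user/repo/resolve/main/train.py",
):
    print(candidate, "->", classify_script_param(candidate))
```

A value classified as `local-path` should be uploaded to the Hub or inlined before the job is submitted.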
239
+
240
+ **To use local scripts:** Upload to HF Hub first:
241
+ ```bash
242
+ hf repos create my-training-scripts --type model
243
+ hf upload my-training-scripts ./train.py train.py
244
+ # Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py
245
+ ```
246
+
247
+ ### Approach 2: TRL Maintained Scripts (Official Examples)
248
+
249
+ TRL provides battle-tested scripts for all methods. Can be run from URLs:
250
+
251
+ ```python
252
+ hf_jobs("uv", {
253
+ "script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py",
254
+ "script_args": [
255
+ "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
256
+ "--dataset_name", "trl-lib/Capybara",
257
+ "--output_dir", "my-model",
258
+ "--push_to_hub",
259
+ "--hub_model_id", "username/my-model"
260
+ ],
261
+ "flavor": "a10g-large",
262
+ "timeout": "2h",
263
+ "secrets": {"HF_TOKEN": "$HF_TOKEN"}
264
+ })
265
+ ```
266
+
267
+ **Benefits:** No code to write, maintained by TRL team, production-tested
268
+ **When to use:** Standard TRL training, quick experiments, don't need custom code
269
+ **Available:** Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts
270
+
271
+ ### Finding More UV Scripts on Hub
272
+
273
+ The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:
274
+
275
+ ```python
276
+ # Discover available UV script collections
277
+ dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})
278
+
279
+ # Explore a specific collection
280
+ hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
281
+ ```
282
+
283
+ **Popular collections:** ocr, classification, synthetic-data, vllm, dataset-creation
284
+
285
+ ### Approach 3: HF Jobs CLI (Direct Terminal Commands)
286
+
287
+ When the `hf_jobs()` MCP tool is unavailable, use the `hf jobs` CLI directly.
288
+
289
+ **⚠️ CRITICAL: CLI Syntax Rules**
290
+
291
+ ```bash
292
+ # ✅ CORRECT syntax - flags BEFORE script URL
293
+ hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"
294
+
295
+ # ❌ WRONG - "run uv" instead of "uv run"
296
+ hf jobs run uv "https://example.com/train.py" --flavor a10g-large
297
+
298
+ # ❌ WRONG - flags AFTER script URL (will be ignored!)
299
+ hf jobs uv run "https://example.com/train.py" --flavor a10g-large
300
+
301
+ # ❌ WRONG - "--secret" instead of "--secrets" (plural)
302
+ hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
303
+ ```
304
+
305
+ **Key syntax rules:**
306
+ 1. Command order is `hf jobs uv run` (NOT `hf jobs run uv`)
307
+ 2. All flags (`--flavor`, `--timeout`, `--secrets`) must come BEFORE the script URL
308
+ 3. Use `--secrets` (plural), not `--secret`
309
+ 4. Script URL must be the last positional argument
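When the CLI command is assembled from Python (for example before handing it to `subprocess.run`), the ordering rules above can be enforced by construction. This is a sketch under the rules listed in this section; `build_uv_run_command` is a hypothetical helper name.

```python
import shlex


def build_uv_run_command(
    script_url: str,
    flavor: str,
    timeout: str,
    secret_name: str = "HF_TOKEN",
) -> list[str]:
    """Build an `hf jobs uv run` argument list with every flag before the script URL."""
    cmd = ["hf", "jobs", "uv", "run"]
    cmd += ["--flavor", flavor, "--timeout", timeout, "--secrets", secret_name]
    cmd.append(script_url)  # the URL is the last positional argument (rule 4)
    return cmd


cmd = build_uv_run_command(
    "https://huggingface.co/user/repo/resolve/main/train.py",
    flavor="a10g-large",
    timeout="2h",
)
print(shlex.join(cmd))
```

Building the list in a fixed order avoids the "flags after the URL" mistake shown earlier.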
310
+
311
+ **Complete CLI example:**
312
+ ```bash
313
+ hf jobs uv run \
314
+ --flavor a10g-large \
315
+ --timeout 2h \
316
+ --secrets HF_TOKEN \
317
+ "https://huggingface.co/user/repo/resolve/main/train.py"
318
+ ```
319
+
320
+ **Check job status via CLI:**
321
+ ```bash
322
+ hf jobs ps # List all jobs
323
+ hf jobs logs <job-id> # View logs
324
+ hf jobs inspect <job-id> # Job details
325
+ hf jobs cancel <job-id> # Cancel a job
326
+ ```
327
+
328
+ ### Approach 4: TRL Jobs Package (Simplified Training)
329
+
330
+ The `trl-jobs` package provides optimized defaults and one-liner training.
331
+
332
+ ```bash
333
+ uvx trl-jobs sft \
334
+ --model_name Qwen/Qwen2.5-0.5B \
335
+ --dataset_name trl-lib/Capybara
336
+
337
+ ```
338
+
339
+ **Benefits:** Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands
340
+ **When to use:** User working in terminal directly (not Claude Code context), quick local experimentation
341
+ **Repository:** https://github.com/huggingface/trl-jobs
342
+
343
+ ⚠️ **In Claude Code context, prefer using `hf_jobs()` MCP tool (Approach 1) when available.**
344
+
345
+ ## Hardware Selection
346
+
347
+ | Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
348
+ |------------|---------------------|------------------|----------|
349
+ | <1B params | `t4-small` | ~$0.75 | Demos, quick tests only without eval steps |
350
+ | 1-3B params | `t4-medium`, `l4x1` | ~$1.50-2.50 | Development |
351
+ | 3-7B params | `a10g-small`, `a10g-large` | ~$3.50-5.00 | Production training |
352
+ | 7-13B params | `a10g-large`, `a100-large` | ~$5-10 | Large models (use LoRA) |
353
+ | 13B+ params | `a100-large`, `a10g-largex2` | ~$10-20 | Very large (use LoRA) |
354
+
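The table can be expressed as a small lookup for use when generating job configs. This is a sketch: `recommend_flavors` is a hypothetical helper, the tiers are copied from the table above, and the handling of sizes that fall exactly on a boundary is an assumption, since the table's ranges overlap at their endpoints.

```python
# Tiers copied from the table above: (upper bound in billions of parameters, flavors).
HARDWARE_TIERS = [
    (1.0, ["t4-small"]),
    (3.0, ["t4-medium", "l4x1"]),
    (7.0, ["a10g-small", "a10g-large"]),
    (13.0, ["a10g-large", "a100-large"]),
    (float("inf"), ["a100-large", "a10g-largex2"]),
]


def recommend_flavors(params_billion: float) -> list[str]:
    """Return the table's recommended flavors for a model of the given size."""
    if params_billion < 1.0:
        return HARDWARE_TIERS[0][1]
    for upper, flavors in HARDWARE_TIERS[1:]:
        # Assumption: a size exactly on a boundary (e.g. 7B) stays in the smaller tier.
        if params_billion <= upper:
            return flavors
    raise ValueError(f"invalid model size: {params_billion!r}")


print(recommend_flavors(0.5))
print(recommend_flavors(7))
```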
355
+ **GPU Flavors:** cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8
356
+
357
+ **Guidelines:**
358
+ - Use **LoRA/PEFT** for models >7B to reduce memory
359
+ - Multi-GPU automatically handled by TRL/Accelerate
360
+ - Start with smaller hardware for testing
361
+
362
+ **See:** `references/hardware_guide.md` for detailed specifications
363
+
364
+ ## Critical: Saving Results to Hub
365
+
366
+ **⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB**
367
+
368
+ The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, **ALL TRAINING IS LOST**.
369
+
370
+ ### Required Configuration
371
+
372
+ **In training script/config:**
373
+ ```python
374
+ SFTConfig(
375
+ push_to_hub=True,
376
+ hub_model_id="username/model-name", # MUST specify
377
+ hub_strategy="every_save", # Optional: push checkpoints
378
+ )
379
+ ```
380
+
381
+ **In job submission:**
382
+ ```python
383
+ {
384
+ "secrets": {"HF_TOKEN": "$HF_TOKEN"} # Enables authentication
385
+ }
386
+ ```
387
+
388
+ ### Verification Checklist
389
+
390
+ Before submitting:
391
+ - [ ] `push_to_hub=True` set in config
392
+ - [ ] `hub_model_id` includes username/repo-name
393
+ - [ ] `secrets` parameter includes HF_TOKEN
394
+ - [ ] User has write access to target repo
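The first three checklist items can be verified mechanically before submission. The sketch below assumes the training arguments and job config are available as plain dictionaries; `check_submission` is a hypothetical helper. Write access to the target repo cannot be checked this way and still has to be confirmed separately.

```python
def check_submission(training_args: dict, job_config: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if training_args.get("push_to_hub") is not True:
        problems.append("push_to_hub must be True")
    hub_model_id = training_args.get("hub_model_id") or ""
    # Expect exactly "username/repo-name" with both parts non-empty.
    parts = hub_model_id.split("/")
    if len(parts) != 2 or not all(parts):
        problems.append("hub_model_id must look like 'username/repo-name'")
    if "HF_TOKEN" not in job_config.get("secrets", {}):
        problems.append("secrets must include HF_TOKEN")
    return problems


print(check_submission(
    {"push_to_hub": True, "hub_model_id": "username/model-name"},
    {"secrets": {"HF_TOKEN": "$HF_TOKEN"}},
))
```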
395
+
396
+ **See:** `references/hub_saving.md` for detailed troubleshooting
397
+
398
+ ## Timeout Management
399
+
400
+ **⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING**
401
+
402
+ ### Setting Timeouts
403
+
404
+ ```python
405
+ {
406
+ "timeout": "2h" # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
407
+ }
408
+ ```
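To compare a timeout against an estimated training time, it helps to normalize the listed formats to seconds. This sketch covers only the formats named above ("90m", "2h", "1.5h", or an integer number of seconds). `timeout_to_seconds` is a hypothetical helper, and the Jobs service itself may accept other formats.

```python
from typing import Union


def timeout_to_seconds(timeout: Union[int, str]) -> int:
    """Convert '90m', '2h', '1.5h', or an integer number of seconds to seconds."""
    if isinstance(timeout, int):
        return timeout
    units = {"m": 60, "h": 3600}
    if not timeout or timeout[-1] not in units:
        raise ValueError(f"unsupported timeout format: {timeout!r}")
    value, unit = timeout[:-1], timeout[-1]
    return round(float(value) * units[unit])


for example in ("90m", "2h", "1.5h", 1800):
    print(example, "->", timeout_to_seconds(example))
```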
409
+
410
+ ### Timeout Guidelines
411
+
412
+ | Scenario | Recommended | Notes |
413
+ |----------|-------------|-------|
414
+ | Quick demo (50-100 examples) | 10-30 min | Verify setup |
415
+ | Development training | 1-2 hours | Small datasets |
416
+ | Production (3-7B model) | 4-6 hours | Full datasets |
417
+ | Large model with LoRA | 3-6 hours | Depends on dataset |
418
+
419
+ **Always add 20-30% buffer** for model/dataset loading, checkpoint saving, Hub push operations, and network delays.
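As a worked example of the buffer rule, the sketch below turns an expected training time into a timeout string in minutes. `timeout_with_buffer` is a hypothetical helper; the buffer is passed as a whole-number percentage so the arithmetic stays exact for integer inputs.

```python
import math


def timeout_with_buffer(expected_minutes: int, buffer_percent: int = 30) -> str:
    """Return a timeout string in minutes covering the expected time plus a buffer."""
    if buffer_percent < 0:
        raise ValueError("buffer_percent must not be negative")
    total = math.ceil(expected_minutes * (100 + buffer_percent) / 100)
    return f"{total}m"


print(timeout_with_buffer(120))      # 2 hours expected, 30% buffer
print(timeout_with_buffer(45, 20))   # 45 minutes expected, 20% buffer
```

With a 2-hour estimate and a 30% buffer this yields 156 minutes, so a `"timeout"` of about `"3h"` leaves comfortable room.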
420
+
421
+ **On timeout:** Job killed immediately, all unsaved progress lost, must restart from beginning
422
+
423
+ ## Cost Estimation
424
+
425
+ **Offer to estimate cost when planning jobs with known parameters.** Use `scripts/estimate_cost.py`:
426
+
427
+ ```bash
428
+ uv run scripts/estimate_cost.py \
429
+ --model meta-llama/Llama-2-7b-hf \
430
+ --dataset trl-lib/Capybara \
431
+ --hardware a10g-large \
432
+ --dataset-size 16000 \
433
+ --epochs 3
434
+ ```
435
+
436
+ Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.
437
+
438
+ **When to offer:** User planning a job, asks about cost/time, choosing hardware, job will run >1 hour or cost >$5
439
+
440
+ ## Example Training Scripts
441
+
442
+ **Production-ready templates with all best practices:**
443
+
444
+ Load these scripts as correctly configured starting points:
445
+
446
+ - **`scripts/train_sft_example.py`** - Complete SFT training with Trackio, LoRA, checkpoints
447
+ - **`scripts/train_dpo_example.py`** - DPO training for preference learning
448
+ - **`scripts/train_grpo_example.py`** - GRPO training for online RL
449
+
450
+ These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to `hf_jobs()` or use as templates for custom scripts.
451
+
452
+ ## Monitoring and Tracking
453
+
454
+ **Trackio** provides real-time metrics visualization. See `references/trackio_guide.md` for complete setup guide.
455
+
456
+ **Key points:**
457
+ - Add `trackio` to dependencies
458
+ - Configure trainer with `report_to="trackio"` and `run_name="meaningful_name"`
459
+
460
+ ### Trackio Configuration Defaults
461
+
462
+ **Use sensible defaults unless user specifies otherwise.** When generating training scripts with Trackio:
463
+
464
+ **Default Configuration:**
465
+ - **Space ID**: `{username}/trackio` (use "trackio" as default space name)
466
+ - **Run naming**: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
467
+ - **Config**: Keep minimal - only include hyperparameters and model/dataset info
468
+ - **Project Name**: Use a Project Name to associate runs with a particular Project
469
+
470
+ **User overrides:** If user requests specific trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of defaults.
471
+
472
+
473
+ This is useful for managing multiple jobs with the same configuration or keeping training scripts portable.
474
+
475
+ See `references/trackio_guide.md` for complete documentation including grouping runs for experiments.
476
+
477
+ ### Check Job Status
478
+
479
+ ```python
480
+ # List all jobs
481
+ hf_jobs("ps")
482
+
483
+ # Inspect specific job
484
+ hf_jobs("inspect", {"job_id": "your-job-id"})
485
+
486
+ # View logs
487
+ hf_jobs("logs", {"job_id": "your-job-id"})
488
+ ```
489
+
490
+ **Remember:** Wait for user to request status checks. Avoid polling repeatedly.
491
+
492
+ ## Dataset Validation
493
+
494
+ **Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.**
495
+
496
+ ### Why Validate
497
+
498
+ - 50%+ of training failures are due to dataset format issues
499
+ - DPO especially strict: requires exact column names (`prompt`, `chosen`, `rejected`)
500
+ - Failed GPU jobs waste $1-10 and 30-60 minutes
501
+ - Validation on CPU costs ~$0.01 and takes <1 minute
502
+
503
+ ### When to Validate
504
+
505
+ **ALWAYS validate for:**
506
+ - Unknown or custom datasets
507
+ - DPO training (CRITICAL - 90% of datasets need mapping)
508
+ - Any dataset not explicitly TRL-compatible
509
+
510
+ **Skip validation for known TRL datasets:**
511
+ - `trl-lib/ultrachat_200k`, `trl-lib/Capybara`, `HuggingFaceH4/ultrachat_200k`, etc.
512
+
513
+ ### Usage
514
+
515
+ ```python
516
+ hf_jobs("uv", {
517
+ "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
518
+ "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
519
+ })
520
+ ```
521
+
522
+ The script is fast, and will usually complete synchronously.
523
+
524
+ ### Reading Results
525
+
526
+ The output shows compatibility for each training method:
527
+
528
+ - **`✓ READY`** - Dataset is compatible, use directly
529
+ - **`✗ NEEDS MAPPING`** - Compatible but needs preprocessing (mapping code provided)
530
+ - **`✗ INCOMPATIBLE`** - Cannot be used for this method
531
+
532
+ When mapping is needed, the output includes a **"MAPPING CODE"** section with copy-paste ready Python code.
533
+
534
+ ### Example Workflow
535
+
536
+ ```python
537
+ # 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
538
+ hf_jobs("uv", {
539
+ "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
540
+ "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
541
+ })
542
+
543
+ # 2. Check output markers:
544
+ # ✓ READY → proceed with training
545
+ # ✗ NEEDS MAPPING → apply mapping code below
546
+ # ✗ INCOMPATIBLE → choose different method/dataset
547
+
548
+ # 3. If mapping needed, apply before training:
549
+ def format_for_dpo(example):
550
+ return {
551
+ 'prompt': example['instruction'],
552
+ 'chosen': example['chosen_response'],
553
+ 'rejected': example['rejected_response'],
554
+ }
555
+ dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)
556
+
557
+ # 4. Launch training job with confidence
558
+ ```
559
+
560
+ ### Common Scenario: DPO Format Mismatch
561
+
562
+ Most DPO datasets use non-standard column names. Example:
563
+
564
+ ```
565
+ Dataset has: instruction, chosen_response, rejected_response
566
+ DPO expects: prompt, chosen, rejected
567
+ ```
568
+
569
+ The validator detects this and provides exact mapping code to fix it.
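The mismatch above can also be detected locally from the column names alone. This is a minimal sketch in plain Python, independent of the inspector script: `missing_dpo_columns` and `rename_columns` are hypothetical helpers, and the source column names are the ones from the example, not a general convention.

```python
DPO_REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")


def missing_dpo_columns(column_names: list[str]) -> list[str]:
    """Return the DPO columns that are absent, in the order DPO expects them."""
    present = set(column_names)
    return [name for name in DPO_REQUIRED_COLUMNS if name not in present]


def rename_columns(example: dict, mapping: dict[str, str]) -> dict:
    """Build a new example keeping only the mapped columns under their new names."""
    return {new: example[old] for old, new in mapping.items()}


columns = ["instruction", "chosen_response", "rejected_response"]
print(missing_dpo_columns(columns))

mapping = {
    "instruction": "prompt",
    "chosen_response": "chosen",
    "rejected_response": "rejected",
}
row = {"instruction": "2+2?", "chosen_response": "4", "rejected_response": "5"}
print(rename_columns(row, mapping))
```

A function like `rename_columns` could be applied per example with `dataset.map`, in the same way as `format_for_dpo` in the workflow above.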
570
+
571
+ ## Converting Models to GGUF
572
+
573
+ After training, convert models to **GGUF format** for use with llama.cpp, Ollama, LM Studio, and other local inference tools.
574
+
575
+ **What is GGUF:**
576
+ - Optimized for CPU/GPU inference with llama.cpp
577
+ - Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
578
+ - Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
579
+ - Typically 2-8GB for 7B models (vs 14GB unquantized)
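The size figures follow from simple arithmetic: parameters times bits per weight, divided by 8 bits per byte. The sketch below is an approximation for the weights only. `approx_weights_size_gb` is a hypothetical helper, and real GGUF files are somewhat larger than the nominal figure because quantized formats also store scaling data and metadata.

```python
def approx_weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in GB (10^9 bytes) of the model weights alone."""
    return params_billion * bits_per_weight / 8


for bits in (16, 8, 5, 4):
    size = approx_weights_size_gb(7, bits)
    print(f"7B model at {bits}-bit: about {size:.1f} GB")
```

For a 7B model this gives 14 GB at 16-bit and 3.5 GB at 4-bit, in line with the range quoted above.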
+
+ **When to convert:**
+ - Running models locally with Ollama or LM Studio
+ - Reducing model size with quantization
+ - Deploying to edge devices
+ - Sharing models for local-first use
+
+ **See:** `references/gguf_conversion.md` for complete conversion guide, including production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.
+
+ **Quick conversion:**
+ ```python
+ hf_jobs("uv", {
+     "script": "<see references/gguf_conversion.md for complete script>",
+     "flavor": "a10g-large",
+     "timeout": "45m",
+     "secrets": {"HF_TOKEN": "$HF_TOKEN"},
+     "env": {
+         "ADAPTER_MODEL": "username/my-finetuned-model",
+         "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
+         "OUTPUT_REPO": "username/my-model-gguf"
+     }
+ })
+ ```
+
+ ## Common Training Patterns
+
+ See `references/training_patterns.md` for detailed examples including:
+ - Quick demo (5-10 minutes)
+ - Production with checkpoints
+ - Multi-GPU training
+ - DPO training (preference learning)
+ - GRPO training (online RL)
+
+ ## Common Failure Modes
+
+ ### Out of Memory (OOM)
+
+ **Fix (try in order):**
+ 1. Reduce batch size: set `per_device_train_batch_size=1` and raise `gradient_accumulation_steps` (e.g. `8`) to compensate. Effective batch size is `per_device_train_batch_size` x `gradient_accumulation_steps` x number of GPUs; for best performance keep it close to 128.
+ 2. Enable: `gradient_checkpointing=True`
+ 3. Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large, etc.
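The batch-size arithmetic in step 1 can be checked with a few lines. This is a minimal sketch with hypothetical helper names; the target of 128 is the figure from the text above, not a universal rule.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * grad_accum * num_gpus


def accumulation_steps_for(target: int, per_device: int, num_gpus: int = 1) -> int:
    """Smallest accumulation count that reaches the target effective batch size."""
    return math.ceil(target / (per_device * num_gpus))


print(effective_batch_size(per_device=1, grad_accum=8))
print(accumulation_steps_for(128, per_device=1))
print(accumulation_steps_for(128, per_device=4, num_gpus=2))
```

With `per_device_train_batch_size=1` on one GPU, eight accumulation steps give an effective batch size of 8, so reaching 128 would take 128 steps.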
+
+ ### Dataset Misformatted
+
+ **Fix:**
+ 1. Validate first with dataset inspector:
+ ```bash
+ uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
+   --dataset name --split train
+ ```
+ 2. Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
+ 3. Apply mapping code from inspector output if needed
+
+ ### Job Timeout
+
+ **Fix:**
+ 1. Check logs for actual runtime: `hf_jobs("logs", {"job_id": "..."})`
+ 2. Increase timeout with buffer: `"timeout": "3h"` (add 30% to estimated time)
+ 3. Or reduce training: lower `num_train_epochs`, use a smaller dataset, or set `max_steps`
+ 4. Save checkpoints: `save_strategy="steps"`, `save_steps=500`, `hub_strategy="every_save"`
+
+ **Note:** The default 30-minute timeout is insufficient for real training. Set at least 1-2 hours.
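The 30% buffer from step 2 is easy to compute ahead of time. The helper below is a hypothetical sketch; it assumes minute strings such as `"156m"` are accepted for `timeout`, as the `"45m"` value used elsewhere in this skill suggests.

```python
import math


def timeout_with_buffer(estimated_minutes: int, buffer_percent: int = 30) -> str:
    """Return a minute-based timeout string with the buffer added, rounded up."""
    total = math.ceil(estimated_minutes * (100 + buffer_percent) / 100)
    return f"{total}m"


# A run estimated at 2 hours gets a timeout of 2h36m.
print(timeout_with_buffer(120))
```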
+
+ ### Hub Push Failures
+
+ **Fix:**
+ 1. Add to job: `"secrets": {"HF_TOKEN": "$HF_TOKEN"}`
+ 2. Add to config: `push_to_hub=True`, `hub_model_id="username/model-name"`
+ 3. Verify auth: `mcp__huggingface__hf_whoami()`
+ 4. Check that the token has write permissions and that the target repo exists or can be created under your namespace (set `hub_private_repo=True` if it should be created as private)
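The first two checks can be run against the job payload before submission. The function below is a hypothetical pre-flight helper, not part of any Hugging Face API; it assumes the job is a plain dict shaped like the `hf_jobs` payloads in this skill and that the training arguments are available as a dict.

```python
def hub_push_problems(job: dict, training_args: dict) -> list:
    """List the Hub-push settings from the steps above that are missing."""
    problems = []
    if "HF_TOKEN" not in job.get("secrets", {}):
        problems.append("job is missing the HF_TOKEN secret")
    if not training_args.get("push_to_hub"):
        problems.append("push_to_hub is not True")
    if not training_args.get("hub_model_id"):
        problems.append("hub_model_id is not set")
    return problems


job = {"flavor": "a10g-large", "timeout": "3h"}  # secrets forgotten on purpose
args = {"push_to_hub": True, "hub_model_id": "username/model-name"}
for problem in hub_push_problems(job, args):
    print("WARNING:", problem)
```

Token permissions and repo access still have to be verified against the Hub itself; this only catches omissions in the configuration.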
+
+ ### Missing Dependencies
+
+ **Fix:**
+ Add the missing package to the script's PEP 723 header:
+ ```python
+ # /// script
+ # dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
+ # ///
+ ```
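Before submitting a script, it can help to confirm that a package really is declared in the header. The sketch below is a deliberately simplified reader with hypothetical helper names: it only handles a single-line `dependencies = [...]` entry like the one above and is not a full TOML or PEP 723 parser.

```python
import re


def read_pep723_block(script_text: str) -> list:
    """Return the lines between '# /// script' and '# ///' without the '# ' prefix."""
    lines = script_text.splitlines()
    if "# /// script" not in lines:
        return []
    block = []
    for line in lines[lines.index("# /// script") + 1:]:
        if line == "# ///":
            return block
        if not line.startswith("#"):
            break  # a non-comment line means the block was never closed
        block.append(line[2:] if line.startswith("# ") else "")
    return []


def declared_dependencies(script_text: str) -> list:
    """Extract quoted requirement strings from a single-line dependencies entry."""
    for line in read_pep723_block(script_text):
        if line.startswith("dependencies"):
            return re.findall(r'"([^"]+)"', line)
    return []


def is_declared(package: str, script_text: str) -> bool:
    """True if the package name appears among the declared requirements."""
    names = [re.split(r"[<>=!~\[ ]", dep, maxsplit=1)[0]
             for dep in declared_dependencies(script_text)]
    return package in names


SCRIPT = '''# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///
import trl
'''

print(declared_dependencies(SCRIPT))
print(is_declared("datasets", SCRIPT))
```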
+
+ ## Troubleshooting
+
+ **Common issues:**
+ - Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
+ - Model not saved to Hub → Check `push_to_hub=True`, `hub_model_id`, and the `HF_TOKEN` secret
+ - Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
+ - Dataset format error → Validate with dataset inspector (see Dataset Validation section)
+ - Import/module errors → Add PEP 723 header with dependencies, verify format
+ - Authentication errors → Check `mcp__huggingface__hf_whoami()`, token permissions, secrets parameter
+
+ **See:** `references/troubleshooting.md` for complete troubleshooting guide
+
+ ## Resources
+
+ ### References (In This Skill)
+ - `references/training_methods.md` - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
+ - `references/training_patterns.md` - Common training patterns and examples
+ - `references/unsloth.md` - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
+ - `references/gguf_conversion.md` - Complete GGUF conversion guide
+ - `references/trackio_guide.md` - Trackio monitoring setup
+ - `references/hardware_guide.md` - Hardware specs and selection
+ - `references/hub_saving.md` - Hub authentication troubleshooting
+ - `references/troubleshooting.md` - Common issues and solutions
+ - `references/local_training_macos.md` - Local training on macOS
+
+ ### Scripts (In This Skill)
+ - `scripts/train_sft_example.py` - Production SFT template
+ - `scripts/train_dpo_example.py` - Production DPO template
+ - `scripts/train_grpo_example.py` - Production GRPO template
+ - `scripts/unsloth_sft_example.py` - Unsloth text LLM training template (faster, less VRAM)
+ - `scripts/estimate_cost.py` - Estimate time and cost (offer when appropriate)
+ - `scripts/convert_to_gguf.py` - Complete GGUF conversion script
+
+ ### External Scripts
+ - [Dataset Inspector](https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py) - Validate dataset format before training (use via `uv run` or `hf_jobs`)
+
+ ### External Links
+ - [TRL Documentation](https://huggingface.co/docs/trl)
+ - [TRL Jobs Training Guide](https://huggingface.co/docs/trl/en/jobs_training)
+ - [TRL Jobs Package](https://github.com/huggingface/trl-jobs)
+ - [HF Jobs Documentation](https://huggingface.co/docs/huggingface_hub/guides/jobs)
+ - [TRL Example Scripts](https://github.com/huggingface/trl/tree/main/examples/scripts)
+ - [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/)
+ - [UV Scripts Organization](https://huggingface.co/uv-scripts)
+
+ ## Key Takeaways
+
+ 1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless the user requests it
+ 2. **Jobs are asynchronous** - Don't wait/poll; let the user check when ready
+ 3. **Always set timeout** - The default 30 min is insufficient; minimum 1-2 hours recommended
+ 4. **Always enable Hub push** - Environment is ephemeral; without push, all results are lost
+ 5. **Include Trackio** - Use example scripts as templates for real-time monitoring
+ 6. **Offer cost estimation** - When parameters are known, use `scripts/estimate_cost.py`
+ 7. **Use UV scripts (Approach 1)** - Default to `hf_jobs("uv", {...})` with inline scripts; use TRL-maintained scripts for standard training; avoid bash `trl-jobs` commands in Claude Code
+ 8. **Use hf_doc_fetch/hf_doc_search** for latest TRL documentation
+ 9. **Validate dataset format** before training with dataset inspector (see Dataset Validation section)
+ 10. **Choose appropriate hardware** for model size; use LoRA for models >7B