@bgicli/bgicli 2.2.8 → 2.2.9

Files changed (113)
  1. package/data/skills/anthropic-algorithmic-art/SKILL.md +405 -0
  2. package/data/skills/anthropic-canvas-design/SKILL.md +130 -0
  3. package/data/skills/anthropic-claude-api/SKILL.md +243 -0
  4. package/data/skills/anthropic-doc-coauthoring/SKILL.md +375 -0
  5. package/data/skills/anthropic-docx/SKILL.md +590 -0
  6. package/data/skills/anthropic-frontend-design/SKILL.md +42 -0
  7. package/data/skills/anthropic-internal-comms/SKILL.md +32 -0
  8. package/data/skills/anthropic-mcp-builder/SKILL.md +236 -0
  9. package/data/skills/anthropic-pdf/SKILL.md +314 -0
  10. package/data/skills/anthropic-pptx/SKILL.md +232 -0
  11. package/data/skills/anthropic-skill-creator/SKILL.md +485 -0
  12. package/data/skills/anthropic-webapp-testing/SKILL.md +96 -0
  13. package/data/skills/anthropic-xlsx/SKILL.md +292 -0
  14. package/data/skills/arxiv-database/SKILL.md +362 -0
  15. package/data/skills/astropy/SKILL.md +329 -0
  16. package/data/skills/ctx-advanced-evaluation/SKILL.md +402 -0
  17. package/data/skills/ctx-bdi-mental-states/SKILL.md +311 -0
  18. package/data/skills/ctx-context-compression/SKILL.md +272 -0
  19. package/data/skills/ctx-context-degradation/SKILL.md +206 -0
  20. package/data/skills/ctx-context-fundamentals/SKILL.md +201 -0
  21. package/data/skills/ctx-context-optimization/SKILL.md +195 -0
  22. package/data/skills/ctx-evaluation/SKILL.md +251 -0
  23. package/data/skills/ctx-filesystem-context/SKILL.md +287 -0
  24. package/data/skills/ctx-hosted-agents/SKILL.md +260 -0
  25. package/data/skills/ctx-memory-systems/SKILL.md +225 -0
  26. package/data/skills/ctx-multi-agent-patterns/SKILL.md +257 -0
  27. package/data/skills/ctx-project-development/SKILL.md +291 -0
  28. package/data/skills/ctx-tool-design/SKILL.md +271 -0
  29. package/data/skills/dhdna-profiler/SKILL.md +162 -0
  30. package/data/skills/generate-image/SKILL.md +183 -0
  31. package/data/skills/geomaster/SKILL.md +365 -0
  32. package/data/skills/get-available-resources/SKILL.md +275 -0
  33. package/data/skills/hamelsmu-build-review-interface/SKILL.md +96 -0
  34. package/data/skills/hamelsmu-error-analysis/SKILL.md +164 -0
  35. package/data/skills/hamelsmu-eval-audit/SKILL.md +183 -0
  36. package/data/skills/hamelsmu-evaluate-rag/SKILL.md +177 -0
  37. package/data/skills/hamelsmu-generate-synthetic-data/SKILL.md +131 -0
  38. package/data/skills/hamelsmu-validate-evaluator/SKILL.md +212 -0
  39. package/data/skills/hamelsmu-write-judge-prompt/SKILL.md +144 -0
  40. package/data/skills/hf-cli/SKILL.md +174 -0
  41. package/data/skills/hf-mcp/SKILL.md +178 -0
  42. package/data/skills/hugging-face-dataset-viewer/SKILL.md +121 -0
  43. package/data/skills/hugging-face-datasets/SKILL.md +542 -0
  44. package/data/skills/hugging-face-evaluation/SKILL.md +651 -0
  45. package/data/skills/hugging-face-jobs/SKILL.md +1042 -0
  46. package/data/skills/hugging-face-model-trainer/SKILL.md +717 -0
  47. package/data/skills/hugging-face-paper-pages/SKILL.md +239 -0
  48. package/data/skills/hugging-face-paper-publisher/SKILL.md +624 -0
  49. package/data/skills/hugging-face-tool-builder/SKILL.md +110 -0
  50. package/data/skills/hugging-face-trackio/SKILL.md +115 -0
  51. package/data/skills/hugging-face-vision-trainer/SKILL.md +593 -0
  52. package/data/skills/huggingface-gradio/SKILL.md +245 -0
  53. package/data/skills/matlab/SKILL.md +376 -0
  54. package/data/skills/modal/SKILL.md +381 -0
  55. package/data/skills/openai-cloudflare-deploy/SKILL.md +224 -0
  56. package/data/skills/openai-develop-web-game/SKILL.md +149 -0
  57. package/data/skills/openai-doc/SKILL.md +80 -0
  58. package/data/skills/openai-figma/SKILL.md +42 -0
  59. package/data/skills/openai-figma-implement-design/SKILL.md +264 -0
  60. package/data/skills/openai-gh-address-comments/SKILL.md +25 -0
  61. package/data/skills/openai-gh-fix-ci/SKILL.md +69 -0
  62. package/data/skills/openai-imagegen/SKILL.md +174 -0
  63. package/data/skills/openai-jupyter-notebook/SKILL.md +107 -0
  64. package/data/skills/openai-linear/SKILL.md +87 -0
  65. package/data/skills/openai-netlify-deploy/SKILL.md +247 -0
  66. package/data/skills/openai-notion-knowledge-capture/SKILL.md +56 -0
  67. package/data/skills/openai-notion-meeting-intelligence/SKILL.md +60 -0
  68. package/data/skills/openai-notion-research-documentation/SKILL.md +59 -0
  69. package/data/skills/openai-notion-spec-to-implementation/SKILL.md +58 -0
  70. package/data/skills/openai-openai-docs/SKILL.md +69 -0
  71. package/data/skills/openai-pdf/SKILL.md +67 -0
  72. package/data/skills/openai-playwright/SKILL.md +147 -0
  73. package/data/skills/openai-render-deploy/SKILL.md +479 -0
  74. package/data/skills/openai-screenshot/SKILL.md +267 -0
  75. package/data/skills/openai-security-best-practices/SKILL.md +86 -0
  76. package/data/skills/openai-security-ownership-map/SKILL.md +206 -0
  77. package/data/skills/openai-security-threat-model/SKILL.md +81 -0
  78. package/data/skills/openai-sentry/SKILL.md +123 -0
  79. package/data/skills/openai-sora/SKILL.md +178 -0
  80. package/data/skills/openai-speech/SKILL.md +144 -0
  81. package/data/skills/openai-spreadsheet/SKILL.md +145 -0
  82. package/data/skills/openai-transcribe/SKILL.md +81 -0
  83. package/data/skills/openai-vercel-deploy/SKILL.md +77 -0
  84. package/data/skills/openai-yeet/SKILL.md +28 -0
  85. package/data/skills/pennylane/SKILL.md +224 -0
  86. package/data/skills/polars-bio/SKILL.md +374 -0
  87. package/data/skills/primekg/SKILL.md +97 -0
  88. package/data/skills/pymatgen/SKILL.md +689 -0
  89. package/data/skills/qiskit/SKILL.md +273 -0
  90. package/data/skills/qutip/SKILL.md +316 -0
  91. package/data/skills/recursive-decomposition/SKILL.md +185 -0
  92. package/data/skills/rowan/SKILL.md +427 -0
  93. package/data/skills/scholar-evaluation/SKILL.md +298 -0
  94. package/data/skills/sentry-create-alert/SKILL.md +210 -0
  95. package/data/skills/sentry-fix-issues/SKILL.md +126 -0
  96. package/data/skills/sentry-pr-code-review/SKILL.md +105 -0
  97. package/data/skills/sentry-python-sdk/SKILL.md +317 -0
  98. package/data/skills/sentry-setup-ai-monitoring/SKILL.md +217 -0
  99. package/data/skills/stable-baselines3/SKILL.md +297 -0
  100. package/data/skills/sympy/SKILL.md +498 -0
  101. package/data/skills/trailofbits-ask-questions-if-underspecified/SKILL.md +85 -0
  102. package/data/skills/trailofbits-audit-context-building/SKILL.md +302 -0
  103. package/data/skills/trailofbits-differential-review/SKILL.md +220 -0
  104. package/data/skills/trailofbits-insecure-defaults/SKILL.md +117 -0
  105. package/data/skills/trailofbits-modern-python/SKILL.md +333 -0
  106. package/data/skills/trailofbits-property-based-testing/SKILL.md +123 -0
  107. package/data/skills/trailofbits-semgrep-rule-creator/SKILL.md +172 -0
  108. package/data/skills/trailofbits-sharp-edges/SKILL.md +292 -0
  109. package/data/skills/trailofbits-variant-analysis/SKILL.md +142 -0
  110. package/data/skills/transformers.js/SKILL.md +637 -0
  111. package/data/skills/writing/SKILL.md +419 -0
  112. package/dist/bgi.js +2 -2
  113. package/package.json +1 -1
@@ -0,0 +1,593 @@
1
+ ---
2
+ name: hugging-face-vision-trainer
3
+ description: Trains and fine-tunes vision models for object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3 — plus any Transformers classifier), and SAM/SAM2 segmentation using Hugging Face Transformers on Hugging Face Jobs cloud GPUs. Covers COCO-format dataset preparation, Albumentations augmentation, mAP/mAR evaluation, accuracy metrics, SAM segmentation with bbox/point prompts, DiceCE loss, hardware selection, cost estimation, Trackio monitoring, and Hub persistence. Use when users mention training object detection, image classification, SAM, SAM2, segmentation, image matting, DETR, D-FINE, RT-DETR, ViT, timm, MobileNet, ResNet, bounding box models, or fine-tuning vision models on Hugging Face Jobs.
4
+ ---
5
+
6
+ # Vision Model Training on Hugging Face Jobs
7
+
8
+ Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup is required, and results are automatically saved to the Hugging Face Hub.
9
+
10
+ ## When to Use This Skill
11
+
12
+ Use this skill when users want to:
13
+ - Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or locally
14
+ - Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or locally
15
+ - Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts
16
+ - Train bounding-box detectors on custom datasets
17
+ - Train image classifiers on custom datasets
18
+ - Train segmentation models on custom mask datasets with prompts
19
+ - Run vision training jobs on Hugging Face Jobs infrastructure
20
+ - Ensure trained vision models are permanently saved to the Hub
21
+
22
+ ## Related Skills
23
+
24
+ - **`hugging-face-jobs`** — General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. **Refer to the Jobs skill for any non-training-specific Jobs questions** (e.g., "how do secrets work?", "what hardware is available?", "how do I pass tokens?").
25
+ - **`hugging-face-model-trainer`** — TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.
26
+
27
+ ## Local Script Execution
28
+
29
+ Helper scripts use PEP 723 inline dependencies. Run them with `uv run`:
30
+ ```bash
31
+ uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
32
+ uv run scripts/estimate_cost.py --help
33
+ ```
34
+
35
+ ## Prerequisites Checklist
36
+
37
+ Before starting any training job, verify:
38
+
39
+ ### Account & Authentication
40
+ - Hugging Face account with a [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require a paid plan)
41
+ - Authenticated login: Check with `hf_whoami()` (tool) or `hf auth whoami` (terminal)
42
+ - Token has **write** permissions
43
+ - **MUST pass token in job secrets**: see directives #1 and #2 below for syntax (MCP tool vs Python API)
44
+
45
+ ### Dataset Requirements — Object Detection
46
+ - Dataset must exist on Hub
47
+ - Annotations must use the `objects` column with `bbox`, `category` (and optionally `area`) sub-fields
48
+ - Bboxes can be in **xywh (COCO)** or **xyxy (Pascal VOC)** format — auto-detected and converted
49
+ - Categories can be **integers or strings** — strings are auto-remapped to integer IDs
50
+ - `image_id` column is **optional** — generated automatically if missing
51
+ - **ALWAYS validate unknown datasets** before GPU training (see Dataset Validation section)
52
+
53
+ ### Dataset Requirements — Image Classification
54
+ - Dataset must exist on Hub
55
+ - Must have an **`image` column** (PIL images) and a **`label` column** (integer class IDs or strings)
56
+ - The label column can be `ClassLabel` type (with names) or plain integers/strings — strings are auto-remapped
57
+ - Common column names auto-detected: `label`, `labels`, `class`, `fine_label`
58
+ - **ALWAYS validate unknown datasets** before GPU training (see Dataset Validation section)
59
+
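The label handling listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `scripts/image_classification_training.py`: the helper names `detect_label_column` and `remap_string_labels` are hypothetical, and both the candidate order and the sorted-name ID assignment are assumed conventions.

```python
# Candidate names come from the checklist above; the preference order is an assumption.
LABEL_CANDIDATES = ("label", "labels", "class", "fine_label")


def detect_label_column(column_names):
    """Return the first known label column that is present, or None (hypothetical helper)."""
    for name in LABEL_CANDIDATES:
        if name in column_names:
            return name
    return None


def remap_string_labels(labels):
    """Map string labels to integer IDs. Sorting the class names is an assumed convention."""
    label2id = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [label2id[name] for name in labels], label2id


column = detect_label_column(["image", "fine_label", "coarse_label"])
ids, label2id = remap_string_labels(["cat", "dog", "cat", "bird"])
print(column, ids, label2id)
```

Running a check like this on a few rows before submitting a job can surface a missing or misnamed label column early.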
60
+ ### Dataset Requirements — SAM/SAM2 Segmentation
61
+ - Dataset must exist on Hub
62
+ - Must have an **`image` column** (PIL images) and a **`mask` column** (binary ground-truth segmentation mask)
63
+ - Must have a **prompt** — either:
64
+ - A **`prompt` column** with JSON containing `{"bbox": [x0,y0,x1,y1]}` or `{"point": [x,y]}`
65
+ - OR a dedicated **`bbox`** column with `[x0,y0,x1,y1]` values
66
+ - OR a dedicated **`point`** column with `[x,y]` or `[[x,y],...]` values
67
+ - Bboxes should be in **xyxy** format (absolute pixel coordinates)
68
+ - Example dataset: `merve/MicroMat-mini` (image matting with bbox prompts)
69
+ - **ALWAYS validate unknown datasets** before GPU training (see Dataset Validation section)
70
+
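The prompt formats above can be checked with a small parser. This is a sketch only: `parse_prompt` is a hypothetical helper, not a function from `scripts/sam_segmentation_training.py`, and rejecting boxes whose corners are out of order is an assumed validation policy.

```python
import json


def parse_prompt(raw):
    """Parse a prompt into ("bbox", [x0, y0, x1, y1]) or ("point", [x, y]).

    Accepts a JSON string or an already-decoded dict.
    """
    prompt = json.loads(raw) if isinstance(raw, str) else raw
    if "bbox" in prompt:
        x0, y0, x1, y1 = prompt["bbox"]
        # xyxy boxes must have a positive width and height (assumed check).
        if x1 <= x0 or y1 <= y0:
            raise ValueError(f"bbox is not in xyxy order: {prompt['bbox']}")
        return "bbox", [x0, y0, x1, y1]
    if "point" in prompt:
        x, y = prompt["point"]
        return "point", [x, y]
    raise ValueError("prompt must contain a 'bbox' or 'point' key")


kind, coords = parse_prompt('{"bbox": [10, 20, 110, 220]}')
print(kind, coords)
```

A box stored as `[x, y, width, height]` often fails the corner-order check, which is one quick way to spot xywh data in a column that should be xyxy.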
71
+ ### Critical Settings
72
+ - **Timeout must exceed expected training time**: the default 30 min is TOO SHORT. See directive #5 for recommended values.
73
+ - **Hub push must be enabled** — `push_to_hub=True`, `hub_model_id="username/model-name"`, token in `secrets`
74
+
75
+ ## Dataset Validation
76
+
77
+ **Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.**
78
+
79
+ **ALWAYS validate for** unknown/custom datasets or any dataset you haven't trained with before. **Skip for** `cppe-5` (the default in the training script).
80
+
81
+ ### Running the Inspector
82
+
83
+ **Option 1: Via HF Jobs (recommended — avoids local SSL/dependency issues):**
84
+ ```python
85
+ hf_jobs("uv", {
86
+ "script": "path/to/dataset_inspector.py",
87
+ "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
88
+ })
89
+ ```
90
+
91
+ **Option 2: Locally:**
92
+ ```bash
93
+ uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
94
+ ```
95
+
96
+ **Option 3: Via `HfApi().run_uv_job()` (if hf_jobs MCP unavailable):**
97
+ ```python
98
+ from huggingface_hub import HfApi
99
+ api = HfApi()
100
+ api.run_uv_job(
101
+ script="scripts/dataset_inspector.py",
102
+ script_args=["--dataset", "username/dataset-name", "--split", "train"],
103
+ flavor="cpu-basic",
104
+ timeout=300,
105
+ )
106
+ ```
107
+
108
+ ### Reading Results
109
+
110
+ - **`✓ READY`** — Dataset is compatible, use directly
111
+ - **`✗ NEEDS FORMATTING`** — Needs preprocessing (mapping code provided in output)
112
+
113
+ ## Automatic Bbox Preprocessing
114
+
115
+ The object detection training script (`scripts/object_detection_training.py`) automatically handles bbox format detection (xyxy→xywh conversion), bbox sanitization, `image_id` generation, string category→integer remapping, and dataset truncation. **No manual preprocessing needed** — just ensure the dataset has `objects.bbox` and `objects.category` columns.
116
+
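For intuition, the xyxy→xywh conversion and a simple form of sanitization look roughly like this. It is a sketch, not the logic in `scripts/object_detection_training.py`: the function names are hypothetical, and clipping to the image bounds and then dropping empty boxes is an assumed sanitization policy.

```python
def xyxy_to_xywh(box):
    """Convert [x_min, y_min, x_max, y_max] (Pascal VOC) to [x, y, width, height] (COCO)."""
    x0, y0, x1, y1 = box
    return [x0, y0, x1 - x0, y1 - y0]


def sanitize_xywh(box, image_width, image_height):
    """Clip a COCO box to the image; return None if nothing remains (assumed policy)."""
    x, y, w, h = box
    x0, y0 = max(0.0, x), max(0.0, y)
    x1, y1 = min(float(image_width), x + w), min(float(image_height), y + h)
    if x1 <= x0 or y1 <= y0:
        return None
    return [x0, y0, x1 - x0, y1 - y0]


coco_box = xyxy_to_xywh([50, 40, 150, 240])
print(coco_box)  # [50, 40, 100, 200]
print(sanitize_xywh([600, 400, 100, 100], image_width=640, image_height=480))
```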
117
+ ## Training workflow
118
+
119
+ Copy this checklist and track progress:
120
+
121
+ ```
122
+ Training Progress:
123
+ - [ ] Step 1: Verify prerequisites (account, token, dataset)
124
+ - [ ] Step 2: Validate dataset format (run dataset_inspector.py)
125
+ - [ ] Step 3: Ask user about dataset size and validation split
126
+ - [ ] Step 4: Prepare training script (OD: scripts/object_detection_training.py, IC: scripts/image_classification_training.py, SAM: scripts/sam_segmentation_training.py)
127
+ - [ ] Step 5: Save script locally, submit job, and report details
128
+ ```
129
+
130
+ **Step 1: Verify prerequisites**
131
+
132
+ Follow the Prerequisites Checklist above.
133
+
134
+ **Step 2: Validate dataset**
135
+
136
+ Run the dataset inspector BEFORE spending GPU time. See "Dataset Validation" section above.
137
+
138
+ **Step 3: Ask user preferences**
139
+
140
+ ALWAYS use the AskUserQuestion tool with option-style format:
141
+
142
+ ```python
143
+ AskUserQuestion({
144
+ "questions": [
145
+ {
146
+ "question": "Do you want to run a quick test with a subset of the data first?",
147
+ "header": "Dataset Size",
148
+ "options": [
149
+ {"label": "Quick test run (10% of data)", "description": "Faster, cheaper (~30-60 min, ~$2-5) to validate setup"},
150
+ {"label": "Full dataset (Recommended)", "description": "Complete training for best model quality"}
151
+ ],
152
+ "multiSelect": false
153
+ },
154
+ {
155
+ "question": "Do you want to create a validation split from the training data?",
156
+ "header": "Split data",
157
+ "options": [
158
+ {"label": "Yes (Recommended)", "description": "Automatically split 15% of training data for validation"},
159
+ {"label": "No", "description": "Use existing validation split from dataset"}
160
+ ],
161
+ "multiSelect": false
162
+ },
163
+ {
164
+ "question": "Which GPU hardware do you want to use?",
165
+ "header": "Hardware Flavor",
166
+ "options": [
167
+ {"label": "t4-small ($0.40/hr)", "description": "1x T4, 16 GB VRAM — sufficient for all OD models under 100M params"},
168
+ {"label": "l4x1 ($0.80/hr)", "description": "1x L4, 24 GB VRAM — more headroom for large images or batch sizes"},
169
+ {"label": "a10g-large ($1.50/hr)", "description": "1x A10G, 24 GB VRAM — faster training, more CPU/RAM"},
170
+ {"label": "a100-large ($2.50/hr)", "description": "1x A100, 80 GB VRAM — fastest, for very large datasets or image sizes"}
171
+ ],
172
+ "multiSelect": false
173
+ }
174
+ ]
175
+ })
176
+ ```
177
+
178
+ **Step 4: Prepare training script**
179
+
180
+ For object detection, use [scripts/object_detection_training.py](scripts/object_detection_training.py) as the production-ready template. For image classification, use [scripts/image_classification_training.py](scripts/image_classification_training.py). For SAM/SAM2 segmentation, use [scripts/sam_segmentation_training.py](scripts/sam_segmentation_training.py). All scripts use `HfArgumentParser` — all configuration is passed via CLI arguments in `script_args`, NOT by editing Python variables. For timm model details, see [references/timm_trainer.md](references/timm_trainer.md). For SAM2 training details, see [references/finetune_sam2_trainer.md](references/finetune_sam2_trainer.md).
181
+
182
+ **Step 5: Save script, submit job, and report**
183
+
184
+ 1. **Save the script locally** to `submitted_jobs/` in the workspace root (create if needed) with a descriptive name like `training_<dataset>_<YYYYMMDD_HHMMSS>.py`. Tell the user the path.
185
+ 2. **Submit** using `hf_jobs` MCP tool (preferred) or `HfApi().run_uv_job()` — see directive #1 for both methods. Pass all config via `script_args`.
186
+ 3. **Report** the job ID (from `.id` attribute), monitoring URL, Trackio dashboard (`https://huggingface.co/spaces/{username}/trackio`), expected time, and estimated cost.
187
+ 4. **Wait for user** to request status checks — don't poll automatically. Training jobs run asynchronously and can take hours.
188
+
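The naming convention in item 1 can be produced with a small helper. This is a sketch: `submitted_script_path` is a hypothetical name, and replacing `/` and other unusual characters in the dataset ID with `_` is an assumption made to keep the filename valid.

```python
import re
from datetime import datetime
from pathlib import Path


def submitted_script_path(dataset_id, workspace_root=".", now=None):
    """Build submitted_jobs/training_<dataset>_<YYYYMMDD_HHMMSS>.py (hypothetical helper)."""
    now = now or datetime.now()
    # "username/dataset-name" -> "username_dataset-name" (assumed sanitization rule)
    safe_dataset = re.sub(r"[^A-Za-z0-9_.-]+", "_", dataset_id)
    out_dir = Path(workspace_root) / "submitted_jobs"
    return out_dir / f"training_{safe_dataset}_{now:%Y%m%d_%H%M%S}.py"


path = submitted_script_path("username/dataset-name", now=datetime(2025, 1, 31, 14, 5, 9))
print(path.name)  # training_username_dataset-name_20250131_140509.py
```

The helper only computes the path. Call `path.parent.mkdir(parents=True, exist_ok=True)` before writing the script so that `submitted_jobs/` is created if needed.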
189
+ ## Critical directives
190
+
191
+ These rules prevent common failures. Follow them exactly.
192
+
193
+ ### 1. Job submission: `hf_jobs` MCP tool vs Python API
194
+
195
+ **`hf_jobs()` is an MCP tool, NOT a Python function.** Do NOT try to import it from `huggingface_hub`. Call it as a tool:
196
+
197
+ ```
198
+ hf_jobs("uv", {"script": training_script_content, "flavor": "a10g-large", "timeout": "4h", "secrets": {"HF_TOKEN": "$HF_TOKEN"}})
199
+ ```
200
+
201
+ **If `hf_jobs` MCP tool is unavailable**, use the Python API directly:
202
+
203
+ ```python
204
+ from huggingface_hub import HfApi, get_token
205
+ api = HfApi()
206
+ job_info = api.run_uv_job(
207
+ script="path/to/training_script.py", # file PATH, NOT content
208
+ script_args=["--dataset_name", "cppe-5", ...],
209
+ flavor="a10g-large",
210
+ timeout=14400, # seconds (4 hours)
211
+ env={"PYTHONUNBUFFERED": "1"},
212
+ secrets={"HF_TOKEN": get_token()}, # MUST use get_token(), NOT "$HF_TOKEN"
213
+ )
214
+ print(f"Job ID: {job_info.id}")
215
+ ```
216
+
217
+ **Critical differences between the two methods:**
218
+
219
+ | | `hf_jobs` MCP tool | `HfApi().run_uv_job()` |
220
+ |---|---|---|
221
+ | `script` param | Python code string or URL (NOT local paths) | File path to `.py` file (NOT content) |
222
+ | Token in secrets | `"$HF_TOKEN"` (auto-replaced) | `get_token()` (actual token value) |
223
+ | Timeout format | String (`"4h"`) | Seconds (`14400`) |
224
+
225
+ **Rules for both methods:**
226
+ - The training script MUST include PEP 723 inline metadata with dependencies
227
+ - Do NOT use `image` or `command` parameters (those belong to `run_job()`, not `run_uv_job()`)
228
+
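Because the two methods take the timeout in different formats, a small converter avoids mismatches such as `"4h"` in one place and `7200` in another. This is a sketch: `timeout_to_seconds` is a hypothetical helper, and the set of unit suffixes (`s`, `m`, `h`, `d`) is an assumption beyond the `"4h"` form shown in the table.

```python
import re

# Seconds per unit suffix; the supported suffix set is an assumption.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timeout_to_seconds(timeout):
    """Convert a string such as "4h" or "90m" to whole seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd])", timeout.strip())
    if match is None:
        raise ValueError(f"unrecognized timeout: {timeout!r}")
    value, unit = match.groups()
    return int(float(value) * _UNIT_SECONDS[unit])


print(timeout_to_seconds("4h"))  # 14400
```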
229
+ ### 2. Authentication via job secrets + explicit hub_token injection
230
+
231
+ **Job config** MUST include the token in secrets — syntax depends on submission method (see table above).
232
+
233
+ **Training script requirement:** The Transformers `Trainer` calls `create_repo(token=self.args.hub_token)` during `__init__()` when `push_to_hub=True`. The training script MUST inject `HF_TOKEN` into `training_args.hub_token` AFTER parsing args but BEFORE creating the `Trainer`. The template `scripts/object_detection_training.py` already includes this:
234
+
235
+ ```python
236
+ hf_token = os.environ.get("HF_TOKEN")
237
+ if training_args.push_to_hub and not training_args.hub_token:
238
+ if hf_token:
239
+ training_args.hub_token = hf_token
240
+ ```
241
+
242
+ If you write a custom script, you MUST include this token injection before the `Trainer(...)` call.
243
+
244
+ - Do NOT call `login()` in custom scripts unless replicating the full pattern from `scripts/object_detection_training.py`
245
+ - Do NOT rely on implicit token resolution (`hub_token=None`) — unreliable in Jobs
246
+ - See the `hugging-face-jobs` skill → *Token Usage Guide* for full details
247
+
248
+ ### 3. JobInfo attribute
249
+
250
+ Access the job identifier using `.id` (NOT `.job_id` or `.name` — these don't exist):
251
+
252
+ ```python
253
+ job_info = api.run_uv_job(...) # or hf_jobs("uv", {...})
254
+ job_id = job_info.id # Correct -- returns string like "687fb701029421ae5549d998"
255
+ ```
256
+
257
+ ### 4. Required training flags and HfArgumentParser boolean syntax
258
+
259
+ `scripts/object_detection_training.py` uses `HfArgumentParser` — all config is passed via `script_args`. Boolean arguments have two syntaxes:
260
+
261
+ - **`bool` fields** (e.g., `push_to_hub`, `do_train`): Use as bare flags (`--push_to_hub`) or negate with `--no_` prefix (`--no_remove_unused_columns`)
262
+ - **`Optional[bool]` fields** (e.g., `greater_is_better`): MUST pass explicit value (`--greater_is_better True`). Bare `--greater_is_better` causes `error: expected one argument`
263
+
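The two boolean syntaxes can be encoded once in a helper that builds `script_args` from a plain dict. This is a sketch: `build_script_args` is hypothetical, the caller must list which fields are `Optional[bool]`, and the `--no_` form is only confirmed above for the fields this skill uses (for example `remove_unused_columns`), so treat it as an assumption for any other field.

```python
def build_script_args(config, optional_bool_fields=("greater_is_better",)):
    """Turn {"name": value} into a CLI argument list (hypothetical helper)."""
    args = []
    for name, value in config.items():
        if name in optional_bool_fields:
            args += [f"--{name}", str(value)]  # Optional[bool]: explicit "True"/"False"
        elif value is True:
            args.append(f"--{name}")           # bool: bare flag
        elif value is False:
            args.append(f"--no_{name}")        # bool: negated with the --no_ prefix
        else:
            args += [f"--{name}", str(value)]  # any other field: flag followed by value
    return args


print(build_script_args({
    "push_to_hub": True,
    "remove_unused_columns": False,
    "greater_is_better": True,
    "hub_model_id": "username/model-name",
}))
```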
264
+ Required flags for object detection:
265
+
266
+ ```
267
+ --no_remove_unused_columns # MUST: preserves image column for pixel_values
268
+ --no_eval_do_concat_batches # MUST: images have different numbers of target boxes
269
+ --push_to_hub # MUST: environment is ephemeral
270
+ --hub_model_id username/model-name
271
+ --metric_for_best_model eval_map
272
+ --greater_is_better True # MUST pass "True" explicitly (Optional[bool])
273
+ --do_train
274
+ --do_eval
275
+ ```
276
+
277
+ Required flags for image classification:
278
+
279
+ ```
280
+ --no_remove_unused_columns # MUST: preserves image column for pixel_values
281
+ --push_to_hub # MUST: environment is ephemeral
282
+ --hub_model_id username/model-name
283
+ --metric_for_best_model eval_accuracy
284
+ --greater_is_better True # MUST pass "True" explicitly (Optional[bool])
285
+ --do_train
286
+ --do_eval
287
+ ```
288
+
289
+ Required flags for SAM/SAM2 segmentation:
290
+
291
+ ```
292
+ --remove_unused_columns False # MUST: preserves input_boxes/input_points
293
+ --push_to_hub # MUST: environment is ephemeral
294
+ --hub_model_id username/model-name
295
+ --do_train
296
+ --prompt_type bbox # or "point"
297
+ --dataloader_pin_memory False # MUST: avoids pin_memory issues with custom collator
298
+ ```
299
+
300
+ ### 5. Timeout management
301
+
302
+ The default 30 min is TOO SHORT for object detection. Set at least 2-4 hours for full training runs (1 hour can suffice for a quick test, per the table below), and add a 30% buffer for model loading, preprocessing, and Hub push.
303
+
304
+ | Scenario | Timeout |
305
+ |----------|---------|
306
+ | Quick test (100-200 images, 5-10 epochs) | 1h |
307
+ | Development (500-1K images, 15-20 epochs) | 2-3h |
308
+ | Production (1K-5K images, 30 epochs) | 4-6h |
309
+ | Large dataset (5K+ images) | 6-12h |
310
+
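The 30% buffer rule is simple arithmetic. The sketch below uses integer math so the result is exact; `timeout_with_buffer` is a hypothetical helper, and the training-time estimate is something you supply.

```python
def timeout_with_buffer(estimated_training_seconds, buffer_percent=30):
    """Return the estimate plus a percentage buffer, rounded up to a whole second."""
    total_hundredths = estimated_training_seconds * (100 + buffer_percent)
    return -(-total_hundredths // 100)  # ceiling division on integers


# A 3-hour training estimate becomes a 3.9-hour timeout.
print(timeout_with_buffer(3 * 3600))  # 14040
```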
311
+ ### 6. Trackio monitoring
312
+
313
+ Trackio is **always enabled** in the object detection training script: it calls `trackio.init()` and `trackio.finish()` automatically, so there is no need to pass `--report_to trackio`. The project name is taken from `--output_dir` and the run name from `--run_name`. For image classification, add `--report_to trackio` to `script_args` (it is parsed into `TrainingArguments`).
314
+
315
+ Dashboard at: `https://huggingface.co/spaces/{username}/trackio`
316
+
317
+ ## Model & hardware selection
318
+
319
+ ### Recommended object detection models
320
+
321
+ | Model | Params | Use case |
322
+ |-------|--------|----------|
323
+ | `ustc-community/dfine-small-coco` | 10.4M | Best starting point — fast, cheap, SOTA quality |
324
+ | `PekingU/rtdetr_v2_r18vd` | 20.2M | Lightweight real-time detector |
325
+ | `ustc-community/dfine-large-coco` | 31.4M | Higher accuracy, still efficient |
326
+ | `PekingU/rtdetr_v2_r50vd` | 43M | Strong real-time baseline |
327
+ | `ustc-community/dfine-xlarge-obj365` | 63.5M | Best accuracy (pretrained on Objects365) |
328
+ | `PekingU/rtdetr_v2_r101vd` | 76M | Largest RT-DETR v2 variant |
329
+
330
+ Start with `ustc-community/dfine-small-coco` for fast iteration. Move to D-FINE Large or RT-DETR v2 R50 for better accuracy.
331
+
332
+ ### Recommended image classification models
333
+
334
+ All `timm/` models work out of the box via `AutoModelForImageClassification` (loaded as `TimmWrapperForImageClassification`). See [references/timm_trainer.md](references/timm_trainer.md) for details.
335
+
336
+ | Model | Params | Use case |
337
+ |-------|--------|----------|
338
+ | `timm/mobilenetv3_small_100.lamb_in1k` | 2.5M | Ultra-lightweight — mobile/edge, fastest training |
339
+ | `timm/mobilevit_s.cvnets_in1k` | 5.6M | Mobile transformer — good accuracy/speed trade-off |
340
+ | `timm/resnet50.a1_in1k` | 25.6M | Strong CNN baseline — reliable, well-studied |
341
+ | `timm/vit_base_patch16_dinov3.lvd1689m` | 86.6M | Best accuracy — DINOv3 self-supervised ViT |
342
+
343
+ Start with `timm/mobilenetv3_small_100.lamb_in1k` for fast iteration. Move to `timm/resnet50.a1_in1k` or `timm/vit_base_patch16_dinov3.lvd1689m` for better accuracy.
344
+
345
+ ### Recommended SAM/SAM2 segmentation models
346
+
347
+ | Model | Params | Use case |
348
+ |-------|--------|----------|
349
+ | `facebook/sam2.1-hiera-tiny` | 38.9M | Fastest SAM2 — good for quick experiments |
350
+ | `facebook/sam2.1-hiera-small` | 46.0M | Best starting point — good quality/speed balance |
351
+ | `facebook/sam2.1-hiera-base-plus` | 80.8M | Higher capacity for complex segmentation |
352
+ | `facebook/sam2.1-hiera-large` | 224.4M | Best SAM2 accuracy — requires more VRAM |
353
+ | `facebook/sam-vit-base` | 93.7M | Original SAM — ViT-B backbone |
354
+ | `facebook/sam-vit-large` | 312.3M | Original SAM — ViT-L backbone |
355
+ | `facebook/sam-vit-huge` | 641.1M | Original SAM — ViT-H, best SAM v1 accuracy |
356
+
357
+ Start with `facebook/sam2.1-hiera-small` for fast iteration. SAM2 models are generally more efficient than SAM v1 at similar quality. Only the mask decoder is trained by default (vision and prompt encoders are frozen).
358
+
359
+ ### Hardware recommendation
360
+
361
+ All recommended OD and IC models are under 100M params, so **`t4-small` (16 GB VRAM, $0.40/hr) is sufficient for all of them.** Image classification models are generally smaller and faster than object detection models; `t4-small` handles even ViT-Base comfortably. For SAM2 models up to `hiera-base-plus`, `t4-small` is sufficient since only the mask decoder is trained. For `sam2.1-hiera-large` or SAM v1 models, use `l4x1` or `a10g-large`. If you hit OOM, reduce the batch size first; upgrade hardware only if that does not resolve it. Common upgrade path: `t4-small` → `l4x1` ($0.80/hr, 24 GB) → `a10g-large` ($1.50/hr, 24 GB).
362
+
363
+ For full hardware flavor list: refer to the `hugging-face-jobs` skill. For cost estimation: run `scripts/estimate_cost.py`.
364
+
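For a rough figure without running the script, cost is the hourly rate times the hours used. This sketch is not `scripts/estimate_cost.py`: `estimate_cost_usd` is a hypothetical helper, and the rates are copied from this skill, so they may not match current pricing.

```python
# Hourly rates as quoted in this skill; they may not match current pricing.
HOURLY_RATE_USD = {
    "t4-small": 0.40,
    "l4x1": 0.80,
    "a10g-large": 1.50,
    "a100-large": 2.50,
}


def estimate_cost_usd(flavor, hours):
    """Return an approximate job cost in USD for a flavor and duration (hypothetical helper)."""
    return round(HOURLY_RATE_USD[flavor] * hours, 2)


print(estimate_cost_usd("t4-small", 4))    # 1.6
print(estimate_cost_usd("a10g-large", 6))  # 9.0
```

Using the timeout as `hours` gives an upper bound, since a job that finishes early runs for less than its timeout.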
365
+ ## Quick start — Object Detection
366
+
367
+ The `script_args` below are the same for both submission methods. See directive #1 for the critical differences between them.
368
+
369
+ ```python
370
+ OD_SCRIPT_ARGS = [
371
+ "--model_name_or_path", "ustc-community/dfine-small-coco",
372
+ "--dataset_name", "cppe-5",
373
+ "--image_square_size", "640",
374
+ "--output_dir", "dfine_finetuned",
375
+ "--num_train_epochs", "30",
376
+ "--per_device_train_batch_size", "8",
377
+ "--learning_rate", "5e-5",
378
+ "--eval_strategy", "epoch",
379
+ "--save_strategy", "epoch",
380
+ "--save_total_limit", "2",
381
+ "--load_best_model_at_end",
382
+ "--metric_for_best_model", "eval_map",
383
+ "--greater_is_better", "True",
384
+ "--no_remove_unused_columns",
385
+ "--no_eval_do_concat_batches",
386
+ "--push_to_hub",
387
+ "--hub_model_id", "username/model-name",
388
+ "--do_train",
389
+ "--do_eval",
390
+ ]
391
+ ```
392
+
393
+ ```python
394
+ from huggingface_hub import HfApi, get_token
395
+ api = HfApi()
396
+ job_info = api.run_uv_job(
397
+ script="scripts/object_detection_training.py",
398
+ script_args=OD_SCRIPT_ARGS,
399
+ flavor="t4-small",
400
+ timeout=14400,
401
+ env={"PYTHONUNBUFFERED": "1"},
402
+ secrets={"HF_TOKEN": get_token()},
403
+ )
404
+ print(f"Job ID: {job_info.id}")
405
+ ```
406
+
407
+ ### Key OD `script_args`
408
+
409
+ - `--model_name_or_path` — recommended: `"ustc-community/dfine-small-coco"` (see model table above)
410
+ - `--dataset_name` — the Hub dataset ID
411
+ - `--image_square_size` — 480 (fast iteration) or 800 (better accuracy)
412
+ - `--hub_model_id` — `"username/model-name"` for Hub persistence
413
+ - `--num_train_epochs` — 30 typical for convergence
414
+ - `--train_val_split` — fraction to split for validation (default 0.15), set if dataset lacks a validation split
415
+ - `--max_train_samples` — truncate training set (useful for quick test runs, e.g. `"785"` for ~10% of a 7.8K dataset)
416
+ - `--max_eval_samples` — truncate evaluation set
417
+
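To derive `--max_train_samples` for a quick test run, compute the subset size from the training set size. This is a sketch: `quick_test_args` is a hypothetical helper, and 7,850 is an illustrative dataset size.

```python
def quick_test_args(total_train_samples, percent=10):
    """Return script_args that truncate training to a percentage of the data."""
    n = max(1, total_train_samples * percent // 100)  # integer math, at least 1 sample
    return ["--max_train_samples", str(n)]


print(quick_test_args(7850))  # ['--max_train_samples', '785']
```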
418
+ ## Quick start — Image Classification
419
+
420
+ ```python
421
+ IC_SCRIPT_ARGS = [
422
+ "--model_name_or_path", "timm/mobilenetv3_small_100.lamb_in1k",
423
+ "--dataset_name", "ethz/food101",
424
+ "--output_dir", "food101_classifier",
425
+ "--num_train_epochs", "5",
426
+ "--per_device_train_batch_size", "32",
427
+ "--per_device_eval_batch_size", "32",
428
+ "--learning_rate", "5e-5",
429
+ "--eval_strategy", "epoch",
430
+ "--save_strategy", "epoch",
431
+ "--save_total_limit", "2",
432
+ "--load_best_model_at_end",
433
+ "--metric_for_best_model", "eval_accuracy",
434
+ "--greater_is_better", "True",
435
+ "--no_remove_unused_columns",
436
+ "--push_to_hub",
437
+ "--hub_model_id", "username/food101-classifier",
438
+ "--do_train",
439
+ "--do_eval",
440
+ ]
441
+ ```
442
+
443
+ ```python
444
+ from huggingface_hub import HfApi, get_token
445
+ api = HfApi()
446
+ job_info = api.run_uv_job(
447
+ script="scripts/image_classification_training.py",
448
+ script_args=IC_SCRIPT_ARGS,
449
+ flavor="t4-small",
450
+ timeout=7200,
451
+ env={"PYTHONUNBUFFERED": "1"},
452
+ secrets={"HF_TOKEN": get_token()},
453
+ )
454
+ print(f"Job ID: {job_info.id}")
455
+ ```
456
+
457
+ ### Key IC `script_args`
458
+
459
+ - `--model_name_or_path` — any `timm/` model or Transformers classification model (see model table above)
460
+ - `--dataset_name` — the Hub dataset ID
461
+ - `--image_column_name` — column containing PIL images (default: `"image"`)
462
+ - `--label_column_name` — column containing class labels (default: `"label"`)
463
+ - `--hub_model_id` — `"username/model-name"` for Hub persistence
464
+ - `--num_train_epochs` — 3-5 typical for classification (fewer than OD)
465
+ - `--per_device_train_batch_size` — 16-64 (classification models use less memory than OD)
466
+ - `--train_val_split` — fraction to split for validation (default 0.15), set if dataset lacks a validation split
467
+ - `--max_train_samples` / `--max_eval_samples` — truncate for quick tests
468
+
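The flags listed above can also be assembled from a dictionary, which makes it easier to swap in a truncated configuration for a quick test. This is a minimal sketch: `build_script_args` is a hypothetical helper (not part of the skill or of `huggingface_hub`), and the sample counts and output directory are illustrative values only.

```python
from typing import Mapping, Union

ArgValue = Union[str, int, float, bool]


def build_script_args(options: Mapping[str, ArgValue]) -> list[str]:
    """Flatten {"flag_name": value} into ["--flag_name", "value", ...].

    Convention assumed for this sketch: True becomes a bare flag,
    False is skipped entirely, everything else is stringified.
    """
    args: list[str] = []
    for name, value in options.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            if value:
                args.append(flag)
            continue
        args.extend([flag, str(value)])
    return args


# Illustrative quick-test configuration using the truncation flags above.
quick_test_args = build_script_args({
    "model_name_or_path": "timm/mobilenetv3_small_100.lamb_in1k",
    "dataset_name": "ethz/food101",
    "output_dir": "food101_smoke_test",
    "num_train_epochs": 1,
    "max_train_samples": 200,
    "max_eval_samples": 50,
    "do_train": True,
    "do_eval": True,
    "push_to_hub": False,
})
```

The resulting list can be passed as `script_args` in the same way as `IC_SCRIPT_ARGS`.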
469
+ ## Quick start — SAM/SAM2 Segmentation
470
+
+ ```python
+ SAM_SCRIPT_ARGS = [
+     "--model_name_or_path", "facebook/sam2.1-hiera-small",
+     "--dataset_name", "merve/MicroMat-mini",
+     "--prompt_type", "bbox",
+     "--prompt_column_name", "prompt",
+     "--output_dir", "sam2-finetuned",
+     "--num_train_epochs", "30",
+     "--per_device_train_batch_size", "4",
+     "--learning_rate", "1e-5",
+     "--logging_steps", "1",
+     "--save_strategy", "epoch",
+     "--save_total_limit", "2",
+     "--remove_unused_columns", "False",
+     "--dataloader_pin_memory", "False",
+     "--push_to_hub",
+     "--hub_model_id", "username/sam2-finetuned",
+     "--do_train",
+     "--report_to", "trackio",
+ ]
+ ```
492
+
+ ```python
+ from huggingface_hub import HfApi, get_token
+ api = HfApi()
+ job_info = api.run_uv_job(
+     script="scripts/sam_segmentation_training.py",
+     script_args=SAM_SCRIPT_ARGS,
+     flavor="t4-small",
+     timeout=7200,
+     env={"PYTHONUNBUFFERED": "1"},
+     secrets={"HF_TOKEN": get_token()},
+ )
+ print(f"Job ID: {job_info.id}")
+ ```
506
+
507
+ ### Key SAM `script_args`
508
+
509
+ - `--model_name_or_path` — SAM or SAM2 model (see model table above); auto-detects SAM vs SAM2
510
+ - `--dataset_name` — the Hub dataset ID (e.g., `"merve/MicroMat-mini"`)
511
+ - `--prompt_type` — `"bbox"` or `"point"` — type of prompt in the dataset
512
+ - `--prompt_column_name` — column with JSON-encoded prompts (default: `"prompt"`)
513
+ - `--bbox_column_name` — dedicated bbox column (alternative to JSON prompt column)
514
+ - `--point_column_name` — dedicated point column (alternative to JSON prompt column)
515
+ - `--mask_column_name` — column with ground-truth masks (default: `"mask"`)
516
+ - `--hub_model_id` — `"username/model-name"` for Hub persistence
517
+ - `--num_train_epochs` — 20-30 typical for SAM fine-tuning
518
+ - `--per_device_train_batch_size` — 2-4 (SAM models use significant memory)
519
+ - `--freeze_vision_encoder` / `--freeze_prompt_encoder` — freeze encoder weights (default: both frozen, only mask decoder trains)
520
+ - `--train_val_split` — fraction to split for validation (default 0.1)
521
+
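To make the "JSON-encoded prompts" option concrete, here is a sketch of decoding one prompt cell. The schema is an assumption made for illustration (`{"bbox": [x0, y0, x1, y1]}` or `{"point": [x, y]}`); the training script's actual expected layout is not shown in this document, so check `scripts/dataset_inspector.py` output before relying on it. `parse_prompt` is a hypothetical helper.

```python
import json


def parse_prompt(raw: str, prompt_type: str) -> list[float]:
    """Decode one JSON prompt string into a flat list of floats.

    Assumed schema (not confirmed by the skill): the JSON object has a key
    equal to prompt_type holding 4 numbers for "bbox" or 2 for "point".
    """
    if prompt_type not in ("bbox", "point"):
        raise ValueError(f"unsupported prompt_type: {prompt_type!r}")
    prompt = json.loads(raw)
    if prompt_type not in prompt:
        raise ValueError(f"prompt has no {prompt_type!r} key")
    values = prompt[prompt_type]
    expected = 4 if prompt_type == "bbox" else 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return [float(v) for v in values]


bbox = parse_prompt('{"bbox": [10, 20, 110, 220]}', "bbox")
point = parse_prompt('{"point": [64, 48]}', "point")
```

If the dataset instead stores prompts in dedicated columns, `--bbox_column_name` or `--point_column_name` would apply and no JSON decoding is needed.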
522
+ ## Checking job status
523
+
524
+ **MCP tool (if available):**
+ ```
+ hf_jobs("ps")                                   # List all jobs
+ hf_jobs("logs", {"job_id": "your-job-id"})      # View logs
+ hf_jobs("inspect", {"job_id": "your-job-id"})   # Job details
+ ```
530
+
531
+ **Python API fallback:**
+ ```python
+ from huggingface_hub import HfApi
+ api = HfApi()
+ api.list_jobs()                                         # List all jobs
+ for line in api.fetch_job_logs(job_id="your-job-id"):   # View logs
+     print(line)
+ api.inspect_job(job_id="your-job-id")                   # Job details
+ ```
539
+
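Status checks are often wrapped in a polling loop. The sketch below keeps the loop independent of any client by taking a callable that returns the current stage; `wait_for_job` and the set of terminal stage names are assumptions for illustration, so adjust them to whatever stages your `huggingface_hub` version reports.

```python
import time
from typing import Callable

# Assumed names of stages after which a job will not change state again.
TERMINAL_STAGES = {"COMPLETED", "ERROR", "CANCELED", "DELETED"}


def wait_for_job(
    get_stage: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 240,
) -> str:
    """Poll get_stage() until a terminal stage appears or max_polls is reached.

    Returns the last stage observed, which may be non-terminal on give-up.
    """
    stage = "UNKNOWN"
    for attempt in range(max_polls):
        stage = get_stage()
        if stage in TERMINAL_STAGES:
            break
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return stage


# Stand-in for a real status lookup so the sketch runs offline.
_fake_stages = iter(["RUNNING", "RUNNING", "COMPLETED"])
final_stage = wait_for_job(lambda: next(_fake_stages), poll_seconds=0)
```

With a real job, the callable would typically read the stage from the job details call shown above, for example a lambda that returns the `status.stage` field of the inspected job, assuming your client version exposes it under that name.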
540
+ ## Common failure modes
541
+
542
+ ### OOM (CUDA out of memory)
543
+ Reduce `per_device_train_batch_size` (try 4, then 2), reduce `IMAGE_SIZE`, or upgrade hardware.
544
+
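When the batch size has to shrink, the effective batch size can be kept roughly constant with gradient accumulation. The arithmetic is sketched below; `accumulation_steps_for` is a hypothetical helper, and it assumes the training script forwards the standard Transformers `--gradient_accumulation_steps` argument, which this document does not confirm.

```python
def accumulation_steps_for(
    target_effective_batch: int,
    per_device_batch: int,
    num_devices: int = 1,
) -> int:
    """Smallest number of accumulation steps that reaches the target batch.

    effective batch = per_device_batch * num_devices * accumulation steps
    """
    if per_device_batch < 1 or num_devices < 1:
        raise ValueError("batch size and device count must be positive")
    per_step = per_device_batch * num_devices
    # Ceiling division without importing math.
    return max(1, -(-target_effective_batch // per_step))


# Dropping from batch size 32 to 4, then to 2, on a single GPU.
steps_at_4 = accumulation_steps_for(32, 4)
steps_at_2 = accumulation_steps_for(32, 2)
```

Accumulation trades memory for wall-clock time, so a job that needed it may also need a longer timeout.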
545
+ ### Dataset format errors
546
+ Run `scripts/dataset_inspector.py` first. The training script auto-detects xyxy vs xywh, converts string categories to integer IDs, and adds `image_id` if missing. Ensure `objects.bbox` contains 4-value coordinate lists in absolute pixels and `objects.category` contains either integer IDs or string labels.
547
+
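For intuition about what xyxy versus xywh detection involves, here is one possible heuristic. It is an illustrative sketch, not the logic used by the training script; `guess_bbox_format` and `xywh_to_xyxy` are hypothetical names, and boxes are assumed to be in absolute pixels.

```python
from typing import Sequence

Box = Sequence[float]


def xywh_to_xyxy(box: Box) -> list[float]:
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def guess_bbox_format(
    boxes: Sequence[Box], image_width: float, image_height: float
) -> str:
    """Return "xywh", "xyxy", or "ambiguous" for a set of 4-value boxes."""
    for x0, y0, a, b in boxes:
        # A valid xyxy box needs max corner > min corner on both axes.
        if a <= x0 or b <= y0:
            return "xywh"
    for x0, y0, a, b in boxes:
        # Read as xywh, this box would extend past the image, so it is xyxy.
        if x0 + a > image_width or y0 + b > image_height:
            return "xyxy"
    return "ambiguous"


fmt_a = guess_bbox_format([[50, 60, 20, 30]], 100, 100)
fmt_b = guess_bbox_format([[30, 30, 90, 90]], 100, 100)
fmt_c = guess_bbox_format([[10, 10, 20, 20]], 100, 100)
```

Small boxes near the top-left corner satisfy both readings, which is why checking many boxes across the dataset is more reliable than checking one.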
548
+ ### Hub push failures (401)
549
+ Verify that (1) the job secrets include the token (see directive #2), (2) the script sets `training_args.hub_token` before creating the `Trainer`, (3) `push_to_hub=True` is set, (4) `hub_model_id` is correct, and (5) the token has write permissions.
550
+
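Some of these checks can be run locally before submitting the job. The sketch below covers only what is visible in the submission itself (secrets, `--push_to_hub`, and the shape of `--hub_model_id`); it cannot confirm write permissions or that the script sets `hub_token`. `check_hub_push_config` and the ID pattern are hypothetical, written for illustration.

```python
import re
from typing import Mapping, Optional, Sequence

# Loose "namespace/name" shape check; an assumption, not the Hub's full rules.
MODEL_ID_PATTERN = re.compile(r"^[\w.-]+/[\w.-]+$")


def check_hub_push_config(
    script_args: Sequence[str],
    secrets: Mapping[str, Optional[str]],
) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems: list[str] = []
    if not secrets.get("HF_TOKEN"):
        problems.append("secrets has no non-empty HF_TOKEN")
    if "--push_to_hub" not in script_args:
        problems.append("--push_to_hub is not set")
    if "--hub_model_id" in script_args:
        position = list(script_args).index("--hub_model_id") + 1
        model_id = script_args[position] if position < len(script_args) else ""
        if not MODEL_ID_PATTERN.match(model_id):
            problems.append(
                f"--hub_model_id {model_id!r} is not of the form 'username/model-name'"
            )
    else:
        problems.append("--hub_model_id is missing")
    return problems


ok = check_hub_push_config(
    ["--push_to_hub", "--hub_model_id", "username/food101-classifier"],
    {"HF_TOKEN": "hf_placeholder"},
)
bad = check_hub_push_config(["--hub_model_id", "food101"], {"HF_TOKEN": None})
```

An empty list means only that the submission looks well-formed; a 401 can still occur if the token lacks write access.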
551
+ ### Job timeout
552
+ Increase the timeout (see the directive #5 table), reduce the number of epochs or the dataset size, or checkpoint to the Hub with `hub_strategy="every_save"` so that saved checkpoints survive a timeout.
553
+
554
+ ### KeyError: 'test' (missing test split)
555
+ The object detection training script handles a missing `test` split by falling back to the `validation` split. Ensure you're using the latest `scripts/object_detection_training.py`.
556
+
557
+ ### Single-class dataset: "iteration over a 0-d tensor"
558
+ `torchmetrics.detection.MeanAveragePrecision` returns scalar (0-d) tensors for per-class metrics when the dataset has only one class, so code that iterates over those tensors fails. The template `scripts/object_detection_training.py` handles this by calling `.unsqueeze(0)` on these tensors. Ensure you're using the latest template.
559
+
560
+ ### Poor detection performance (mAP < 0.15)
561
+ Increase epochs (30-50), ensure 500+ images, check per-class mAP for imbalanced classes, try different learning rates (1e-5 to 1e-4), increase image size.
562
+
563
+ For comprehensive troubleshooting: see [references/reliability_principles.md](references/reliability_principles.md)
564
+
565
+ ## Reference files
566
+
567
+ - [scripts/object_detection_training.py](scripts/object_detection_training.py) — Production-ready object detection training script
568
+ - [scripts/image_classification_training.py](scripts/image_classification_training.py) — Production-ready image classification training script (supports timm models)
569
+ - [scripts/sam_segmentation_training.py](scripts/sam_segmentation_training.py) — Production-ready SAM/SAM2 segmentation training script (bbox & point prompts)
570
+ - [scripts/dataset_inspector.py](scripts/dataset_inspector.py) — Validate dataset format for OD, classification, and SAM segmentation
571
+ - [scripts/estimate_cost.py](scripts/estimate_cost.py) — Estimate training costs for any vision model (includes SAM/SAM2)
572
+ - [references/object_detection_training_notebook.md](references/object_detection_training_notebook.md) — Object detection training workflow, augmentation strategies, and training patterns
573
+ - [references/image_classification_training_notebook.md](references/image_classification_training_notebook.md) — Image classification training workflow with ViT, preprocessing, and evaluation
574
+ - [references/finetune_sam2_trainer.md](references/finetune_sam2_trainer.md) — SAM2 fine-tuning walkthrough with MicroMat dataset, DiceCE loss, and Trainer integration
575
+ - [references/timm_trainer.md](references/timm_trainer.md) — Using timm models with HF Trainer (TimmWrapper, transforms, full example)
576
+ - [references/hub_saving.md](references/hub_saving.md) — Detailed Hub persistence guide and verification checklist
577
+ - [references/reliability_principles.md](references/reliability_principles.md) — Failure prevention principles from production experience
578
+
579
+ ## External links
580
+
581
+ - [Transformers Object Detection Guide](https://huggingface.co/docs/transformers/tasks/object_detection)
582
+ - [Transformers Image Classification Guide](https://huggingface.co/docs/transformers/tasks/image_classification)
583
+ - [DETR Model Documentation](https://huggingface.co/docs/transformers/model_doc/detr)
584
+ - [ViT Model Documentation](https://huggingface.co/docs/transformers/model_doc/vit)
585
+ - [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs) — Main Jobs documentation
586
+ - [HF Jobs Configuration](https://huggingface.co/docs/hub/en/jobs-configuration) — Hardware, secrets, timeouts, namespaces
587
+ - [HF Jobs CLI Reference](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-jobs) — Command line interface
588
+ - [Object Detection Models](https://huggingface.co/models?pipeline_tag=object-detection)
589
+ - [Image Classification Models](https://huggingface.co/models?pipeline_tag=image-classification)
590
+ - [SAM2 Model Documentation](https://huggingface.co/docs/transformers/model_doc/sam2)
591
+ - [SAM Model Documentation](https://huggingface.co/docs/transformers/model_doc/sam)
592
+ - [Object Detection Datasets](https://huggingface.co/datasets?task_categories=task_categories:object-detection)
593
+ - [Image Classification Datasets](https://huggingface.co/datasets?task_categories=task_categories:image-classification)