opencode-skills-antigravity 1.0.39 → 1.0.41
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bundled-skills/.antigravity-install-manifest.json +10 -1
- package/bundled-skills/docs/integrations/jetski-cortex.md +3 -3
- package/bundled-skills/docs/integrations/jetski-gemini-loader/README.md +1 -1
- package/bundled-skills/docs/maintainers/repo-growth-seo.md +3 -3
- package/bundled-skills/docs/maintainers/security-findings-triage-2026-03-29-refresh.csv +34 -0
- package/bundled-skills/docs/maintainers/security-findings-triage-2026-03-29-refresh.md +2 -0
- package/bundled-skills/docs/maintainers/skills-update-guide.md +1 -1
- package/bundled-skills/docs/sources/sources.md +2 -2
- package/bundled-skills/docs/users/bundles.md +1 -1
- package/bundled-skills/docs/users/claude-code-skills.md +1 -1
- package/bundled-skills/docs/users/gemini-cli-skills.md +1 -1
- package/bundled-skills/docs/users/getting-started.md +1 -1
- package/bundled-skills/docs/users/kiro-integration.md +1 -1
- package/bundled-skills/docs/users/usage.md +4 -4
- package/bundled-skills/docs/users/visual-guide.md +4 -4
- package/bundled-skills/hugging-face-cli/SKILL.md +192 -195
- package/bundled-skills/hugging-face-community-evals/SKILL.md +213 -0
- package/bundled-skills/hugging-face-community-evals/examples/.env.example +3 -0
- package/bundled-skills/hugging-face-community-evals/examples/USAGE_EXAMPLES.md +101 -0
- package/bundled-skills/hugging-face-community-evals/scripts/inspect_eval_uv.py +104 -0
- package/bundled-skills/hugging-face-community-evals/scripts/inspect_vllm_uv.py +306 -0
- package/bundled-skills/hugging-face-community-evals/scripts/lighteval_vllm_uv.py +297 -0
- package/bundled-skills/hugging-face-dataset-viewer/SKILL.md +120 -120
- package/bundled-skills/hugging-face-gradio/SKILL.md +304 -0
- package/bundled-skills/hugging-face-gradio/examples.md +613 -0
- package/bundled-skills/hugging-face-jobs/SKILL.md +25 -18
- package/bundled-skills/hugging-face-jobs/index.html +216 -0
- package/bundled-skills/hugging-face-jobs/references/hardware_guide.md +336 -0
- package/bundled-skills/hugging-face-jobs/references/hub_saving.md +352 -0
- package/bundled-skills/hugging-face-jobs/references/token_usage.md +570 -0
- package/bundled-skills/hugging-face-jobs/references/troubleshooting.md +475 -0
- package/bundled-skills/hugging-face-jobs/scripts/cot-self-instruct.py +718 -0
- package/bundled-skills/hugging-face-jobs/scripts/finepdfs-stats.py +546 -0
- package/bundled-skills/hugging-face-jobs/scripts/generate-responses.py +587 -0
- package/bundled-skills/hugging-face-model-trainer/SKILL.md +11 -12
- package/bundled-skills/hugging-face-model-trainer/references/gguf_conversion.md +296 -0
- package/bundled-skills/hugging-face-model-trainer/references/hardware_guide.md +283 -0
- package/bundled-skills/hugging-face-model-trainer/references/hub_saving.md +364 -0
- package/bundled-skills/hugging-face-model-trainer/references/local_training_macos.md +231 -0
- package/bundled-skills/hugging-face-model-trainer/references/reliability_principles.md +371 -0
- package/bundled-skills/hugging-face-model-trainer/references/trackio_guide.md +189 -0
- package/bundled-skills/hugging-face-model-trainer/references/training_methods.md +150 -0
- package/bundled-skills/hugging-face-model-trainer/references/training_patterns.md +203 -0
- package/bundled-skills/hugging-face-model-trainer/references/troubleshooting.md +282 -0
- package/bundled-skills/hugging-face-model-trainer/references/unsloth.md +313 -0
- package/bundled-skills/hugging-face-model-trainer/scripts/convert_to_gguf.py +424 -0
- package/bundled-skills/hugging-face-model-trainer/scripts/dataset_inspector.py +417 -0
- package/bundled-skills/hugging-face-model-trainer/scripts/estimate_cost.py +150 -0
- package/bundled-skills/hugging-face-model-trainer/scripts/train_dpo_example.py +106 -0
- package/bundled-skills/hugging-face-model-trainer/scripts/train_grpo_example.py +89 -0
- package/bundled-skills/hugging-face-model-trainer/scripts/train_sft_example.py +122 -0
- package/bundled-skills/hugging-face-model-trainer/scripts/unsloth_sft_example.py +512 -0
- package/bundled-skills/hugging-face-paper-publisher/SKILL.md +11 -4
- package/bundled-skills/hugging-face-paper-publisher/examples/example_usage.md +326 -0
- package/bundled-skills/hugging-face-paper-publisher/references/quick_reference.md +216 -0
- package/bundled-skills/hugging-face-paper-publisher/scripts/paper_manager.py +606 -0
- package/bundled-skills/hugging-face-paper-publisher/templates/arxiv.md +299 -0
- package/bundled-skills/hugging-face-paper-publisher/templates/ml-report.md +358 -0
- package/bundled-skills/hugging-face-paper-publisher/templates/modern.md +319 -0
- package/bundled-skills/hugging-face-paper-publisher/templates/standard.md +201 -0
- package/bundled-skills/hugging-face-papers/SKILL.md +241 -0
- package/bundled-skills/hugging-face-trackio/.claude-plugin/plugin.json +19 -0
- package/bundled-skills/hugging-face-trackio/SKILL.md +117 -0
- package/bundled-skills/hugging-face-trackio/references/alerts.md +196 -0
- package/bundled-skills/hugging-face-trackio/references/logging_metrics.md +206 -0
- package/bundled-skills/hugging-face-trackio/references/retrieving_metrics.md +251 -0
- package/bundled-skills/hugging-face-vision-trainer/SKILL.md +595 -0
- package/bundled-skills/hugging-face-vision-trainer/references/finetune_sam2_trainer.md +254 -0
- package/bundled-skills/hugging-face-vision-trainer/references/hub_saving.md +618 -0
- package/bundled-skills/hugging-face-vision-trainer/references/image_classification_training_notebook.md +279 -0
- package/bundled-skills/hugging-face-vision-trainer/references/object_detection_training_notebook.md +700 -0
- package/bundled-skills/hugging-face-vision-trainer/references/reliability_principles.md +310 -0
- package/bundled-skills/hugging-face-vision-trainer/references/timm_trainer.md +91 -0
- package/bundled-skills/hugging-face-vision-trainer/scripts/dataset_inspector.py +814 -0
- package/bundled-skills/hugging-face-vision-trainer/scripts/estimate_cost.py +217 -0
- package/bundled-skills/hugging-face-vision-trainer/scripts/image_classification_training.py +383 -0
- package/bundled-skills/hugging-face-vision-trainer/scripts/object_detection_training.py +710 -0
- package/bundled-skills/hugging-face-vision-trainer/scripts/sam_segmentation_training.py +382 -0
- package/bundled-skills/jq/SKILL.md +273 -0
- package/bundled-skills/odoo-edi-connector/SKILL.md +32 -10
- package/bundled-skills/odoo-woocommerce-bridge/SKILL.md +9 -5
- package/bundled-skills/tmux/SKILL.md +370 -0
- package/bundled-skills/transformers-js/SKILL.md +639 -0
- package/bundled-skills/transformers-js/references/CACHE.md +339 -0
- package/bundled-skills/transformers-js/references/CONFIGURATION.md +390 -0
- package/bundled-skills/transformers-js/references/EXAMPLES.md +605 -0
- package/bundled-skills/transformers-js/references/MODEL_ARCHITECTURES.md +167 -0
- package/bundled-skills/transformers-js/references/PIPELINE_OPTIONS.md +545 -0
- package/bundled-skills/transformers-js/references/TEXT_GENERATION.md +315 -0
- package/bundled-skills/viboscope/SKILL.md +64 -0
- package/package.json +1 -1
|
@@ -0,0 +1,310 @@
|
|
|
1
|
+
# Reliability Principles for Training Jobs
|
|
2
|
+
|
|
3
|
+
## Contents
|
|
4
|
+
- Principle 1: Always Verify Before Use
|
|
5
|
+
- Principle 2: Prioritize Reliability Over Performance
|
|
6
|
+
- Principle 3: Create Atomic, Self-Contained Scripts
|
|
7
|
+
- Principle 4: Provide Clear Error Context
|
|
8
|
+
- Principle 5: Test the Happy Path on Known-Good Inputs
|
|
9
|
+
- Summary: The Reliability Checklist (pre-flight, script quality, job config)
|
|
10
|
+
- When Principles Conflict
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
These principles are derived from real production failures and successful fixes. Following them prevents common failure modes and ensures reliable job execution.
|
|
15
|
+
|
|
16
|
+
## Principle 1: Always Verify Before Use
|
|
17
|
+
|
|
18
|
+
**Rule:** Never assume repos, datasets, or resources exist. Verify with tools first.
|
|
19
|
+
|
|
20
|
+
### What It Prevents
|
|
21
|
+
|
|
22
|
+
- **Non-existent datasets** - Jobs fail immediately when dataset doesn't exist
|
|
23
|
+
- **Wrong names** - e.g. requesting "argilla-dpo-mix-7k" when the actual dataset is "ultrafeedback_binarized"
|
|
24
|
+
- **Incorrect paths** - Old or moved repos, renamed files
|
|
25
|
+
- **Missing dependencies** - Undocumented requirements
|
|
26
|
+
|
|
27
|
+
### How to Apply
|
|
28
|
+
|
|
29
|
+
**Before submitting ANY job:**
|
|
30
|
+
|
|
31
|
+
```python
|
|
32
|
+
# Verify dataset exists
|
|
33
|
+
dataset_search({"query": "dataset-name", "author": "author-name", "limit": 5})
|
|
34
|
+
hub_repo_details(["author/dataset-name"], repo_type="dataset")
|
|
35
|
+
|
|
36
|
+
# Verify model exists
|
|
37
|
+
hub_repo_details(["org/model-name"], repo_type="model")
|
|
38
|
+
|
|
39
|
+
# Check script/file paths (for URL-based scripts)
|
|
40
|
+
# Verify before using: https://github.com/user/repo/blob/main/script.py
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
**Examples that would have caught errors:**
|
|
44
|
+
|
|
45
|
+
```python
|
|
46
|
+
# ❌ WRONG: Assumed dataset exists
|
|
47
|
+
hf_jobs("uv", {
|
|
48
|
+
"script": """...""",
|
|
49
|
+
"env": {"DATASET": "trl-lib/argilla-dpo-mix-7k"} # Doesn't exist!
|
|
50
|
+
})
|
|
51
|
+
|
|
52
|
+
# ✅ CORRECT: Verify first
|
|
53
|
+
dataset_search({"query": "argilla dpo", "author": "trl-lib"})
|
|
54
|
+
# Would show: "trl-lib/ultrafeedback_binarized" is the correct name
|
|
55
|
+
|
|
56
|
+
hub_repo_details(["trl-lib/ultrafeedback_binarized"], repo_type="dataset")
|
|
57
|
+
# Confirms it exists before using
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
### Implementation Checklist
|
|
61
|
+
|
|
62
|
+
- [ ] Check dataset exists before training
|
|
63
|
+
- [ ] Test script URLs are valid before submitting
|
|
64
|
+
- [ ] Check for recent updates/renames of resources
|
|
65
|
+
- [ ] Verify the dataset format matches what the training script expects
|
|
66
|
+
|
|
67
|
+
**Time cost:** 5-10 seconds
|
|
68
|
+
**Time saved:** Hours of failed job time + debugging
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Principle 2: Prioritize Reliability Over Performance
|
|
73
|
+
|
|
74
|
+
**Rule:** Default to what is most likely to succeed, not what is theoretically fastest.
|
|
75
|
+
|
|
76
|
+
### What It Prevents
|
|
77
|
+
|
|
78
|
+
- **Hardware incompatibilities** - Features that fail on certain GPUs
|
|
79
|
+
- **Unstable optimizations** - Speed-ups that cause crashes
|
|
80
|
+
- **Complex configurations** - More failure points
|
|
81
|
+
- **Build system issues** - Unreliable compilation methods
|
|
82
|
+
|
|
83
|
+
### How to Apply
|
|
84
|
+
|
|
85
|
+
**Choose reliability:**
|
|
86
|
+
|
|
87
|
+
```python
|
|
88
|
+
# ❌ RISKY: Aggressive optimization that may fail
|
|
89
|
+
TrainingArguments(
|
|
90
|
+
torch_compile=True, # Can fail on T4, A10G GPUs
|
|
91
|
+
optim="adamw_bnb_8bit", # Requires specific setup
|
|
92
|
+
dataloader_num_workers=8, # May cause OOM on small instances
|
|
93
|
+
...
|
|
94
|
+
)
|
|
95
|
+
|
|
96
|
+
# ✅ SAFE: Proven defaults
|
|
97
|
+
TrainingArguments(
|
|
98
|
+
# torch_compile=True, # Commented with note: "Enable on H100 for 20% speedup"
|
|
99
|
+
optim="adamw_torch", # Standard, always works
|
|
100
|
+
fp16=True, # Stable and fast on T4/A10G
|
|
101
|
+
dataloader_num_workers=4, # Conservative, reliable
|
|
102
|
+
...
|
|
103
|
+
)
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
### Real-World Example
|
|
107
|
+
|
|
108
|
+
**The `torch.compile` failure:**
|
|
109
|
+
- Added for "20% speedup" on H100
|
|
110
|
+
- **Failed fatally on T4-medium** with cryptic error
|
|
111
|
+
- Misdiagnosed as dataset issue (cost hours)
|
|
112
|
+
- **Fix:** Disable by default, add as optional comment
|
|
113
|
+
|
|
114
|
+
**Result:** Reliability > 20% performance gain
|
|
115
|
+
|
|
116
|
+
### Implementation Checklist
|
|
117
|
+
|
|
118
|
+
- [ ] Use proven, standard configurations by default
|
|
119
|
+
- [ ] Comment out performance optimizations with hardware notes
|
|
120
|
+
- [ ] Use stable build systems (CMake > make)
|
|
121
|
+
- [ ] Test on target hardware before production
|
|
122
|
+
- [ ] Document known incompatibilities
|
|
123
|
+
- [ ] Provide "safe" and "fast" variants when needed
|
|
124
|
+
|
|
125
|
+
**Performance loss:** 10-20% in best case
|
|
126
|
+
**Reliability gain:** 95%+ success rate vs 60-70%
|
|
127
|
+
|
|
128
|
+
---
|
|
129
|
+
|
|
130
|
+
## Principle 3: Create Atomic, Self-Contained Scripts
|
|
131
|
+
|
|
132
|
+
**Rule:** Scripts should work as complete, independent units. Don't remove parts to "simplify."
|
|
133
|
+
|
|
134
|
+
### What It Prevents
|
|
135
|
+
|
|
136
|
+
- **Missing dependencies** - Removed "unnecessary" packages that are actually required
|
|
137
|
+
- **Incomplete processes** - Skipped steps that seem redundant
|
|
138
|
+
- **Environment assumptions** - Scripts that need pre-setup
|
|
139
|
+
- **Partial failures** - Some parts work, others fail silently
|
|
140
|
+
|
|
141
|
+
### How to Apply
|
|
142
|
+
|
|
143
|
+
**Complete dependency specifications:**
|
|
144
|
+
|
|
145
|
+
```python
|
|
146
|
+
# ❌ INCOMPLETE: "Simplified" by removing dependencies
|
|
147
|
+
# /// script
|
|
148
|
+
# dependencies = [
|
|
149
|
+
# "transformers",
|
|
150
|
+
# "torch",
|
|
151
|
+
# "datasets",
|
|
152
|
+
# ]
|
|
153
|
+
# ///
|
|
154
|
+
|
|
155
|
+
# ✅ COMPLETE: All dependencies explicit
|
|
156
|
+
# /// script
|
|
157
|
+
# dependencies = [
|
|
158
|
+
# "transformers>=5.2.0",
|
|
159
|
+
# "accelerate>=1.1.0",
|
|
160
|
+
# "albumentations>=1.4.16", # Required for augmentation + bbox handling
|
|
161
|
+
# "timm", # Required for vision backbones
|
|
162
|
+
# "datasets>=4.0",
|
|
163
|
+
# "torchmetrics", # Required for mAP/mAR computation
|
|
164
|
+
# "pycocotools", # Required for COCO evaluation
|
|
165
|
+
# "trackio", # Required for metrics monitoring
|
|
166
|
+
# "huggingface_hub",
|
|
167
|
+
# ]
|
|
168
|
+
# ///
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
### Real-World Example
|
|
172
|
+
|
|
173
|
+
**The `albumentations` failure:**
|
|
174
|
+
- Original script had it: augmentations and bbox clipping worked fine
|
|
175
|
+
- "Simplified" version removed it: "not strictly needed for training"
|
|
176
|
+
- **Training crashed on bbox augmentation** — no fallback for COCO-format bbox handling
|
|
177
|
+
- Hard to debug: error appeared in data loading, not in augmentation setup
|
|
178
|
+
- **Fix:** Restore all original dependencies
|
|
179
|
+
|
|
180
|
+
**Result:** Don't remove dependencies without thorough testing
|
|
181
|
+
|
|
182
|
+
### Implementation Checklist
|
|
183
|
+
|
|
184
|
+
- [ ] All dependencies in PEP 723 header with version pins
|
|
185
|
+
- [ ] All system packages installed by script
|
|
186
|
+
- [ ] No assumptions about pre-existing environment
|
|
187
|
+
- [ ] No "optional" steps that are actually required
|
|
188
|
+
- [ ] Test scripts in clean environment
|
|
189
|
+
- [ ] Document why each dependency is needed
|
|
190
|
+
|
|
191
|
+
**Complexity:** Slightly longer scripts
|
|
192
|
+
**Reliability:** Scripts "just work" every time
|
|
193
|
+
|
|
194
|
+
---
|
|
195
|
+
|
|
196
|
+
## Principle 4: Provide Clear Error Context
|
|
197
|
+
|
|
198
|
+
**Rule:** When things fail, make it obvious what went wrong and how to fix it.
|
|
199
|
+
|
|
200
|
+
### How to Apply
|
|
201
|
+
|
|
202
|
+
**Wrap subprocess calls:**
|
|
203
|
+
|
|
204
|
+
```python
|
|
205
|
+
# ❌ UNCLEAR: Silent failure
|
|
206
|
+
subprocess.run([...], check=True, capture_output=True)
|
|
207
|
+
|
|
208
|
+
# ✅ CLEAR: Shows what failed
|
|
209
|
+
try:
|
|
210
|
+
result = subprocess.run(
|
|
211
|
+
[...],
|
|
212
|
+
check=True,
|
|
213
|
+
capture_output=True,
|
|
214
|
+
text=True
|
|
215
|
+
)
|
|
216
|
+
print(result.stdout)
|
|
217
|
+
if result.stderr:
|
|
218
|
+
print("Warnings:", result.stderr)
|
|
219
|
+
except subprocess.CalledProcessError as e:
|
|
220
|
+
print(f"❌ Command failed!")
|
|
221
|
+
print("STDOUT:", e.stdout)
|
|
222
|
+
print("STDERR:", e.stderr)
|
|
223
|
+
raise
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
**Validate inputs:**
|
|
227
|
+
|
|
228
|
+
```python
|
|
229
|
+
# ❌ UNCLEAR: Fails later with cryptic error
|
|
230
|
+
model = load_model(MODEL_NAME)
|
|
231
|
+
|
|
232
|
+
# ✅ CLEAR: Fails fast with clear message
|
|
233
|
+
if not MODEL_NAME:
|
|
234
|
+
raise ValueError("MODEL_NAME environment variable not set!")
|
|
235
|
+
|
|
236
|
+
print(f"Loading model: {MODEL_NAME}")
|
|
237
|
+
try:
|
|
238
|
+
model = load_model(MODEL_NAME)
|
|
239
|
+
print(f"✅ Model loaded successfully")
|
|
240
|
+
except Exception as e:
|
|
241
|
+
print(f"❌ Failed to load model: {MODEL_NAME}")
|
|
242
|
+
print(f"Error: {e}")
|
|
243
|
+
print("Hint: Check that model exists on Hub")
|
|
244
|
+
raise
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
### Implementation Checklist
|
|
248
|
+
|
|
249
|
+
- [ ] Wrap external calls with try/except
|
|
250
|
+
- [ ] Print stdout/stderr on failure
|
|
251
|
+
- [ ] Validate environment variables early
|
|
252
|
+
- [ ] Add progress indicators (✅, ❌, 🔄)
|
|
253
|
+
- [ ] Include hints for common failures
|
|
254
|
+
- [ ] Log configuration at start
|
|
255
|
+
|
|
256
|
+
---
|
|
257
|
+
|
|
258
|
+
## Principle 5: Test the Happy Path on Known-Good Inputs
|
|
259
|
+
|
|
260
|
+
**Rule:** Before using new code in production, test with inputs you know work.
|
|
261
|
+
|
|
262
|
+
## Summary: The Reliability Checklist
|
|
263
|
+
|
|
264
|
+
Before submitting ANY job:
|
|
265
|
+
|
|
266
|
+
### Pre-Flight Checks
|
|
267
|
+
- [ ] **Verified** all repos/datasets exist (hub_repo_details)
|
|
268
|
+
- [ ] **Tested** with known-good inputs if new code
|
|
269
|
+
- [ ] **Using** proven hardware/configuration
|
|
270
|
+
- [ ] **Included** all dependencies in PEP 723 header
|
|
271
|
+
- [ ] **Installed** system requirements (build tools, etc.)
|
|
272
|
+
- [ ] **Set** appropriate timeout (not default 30m)
|
|
273
|
+
- [ ] **Configured** Hub push with HF_TOKEN (login() + hub_token)
|
|
274
|
+
- [ ] **Added** clear error handling
|
|
275
|
+
|
|
276
|
+
### Script Quality
|
|
277
|
+
- [ ] Self-contained (no external setup needed)
|
|
278
|
+
- [ ] Complete dependencies listed
|
|
279
|
+
- [ ] Build tools installed by script
|
|
280
|
+
- [ ] Progress indicators included
|
|
281
|
+
- [ ] Error messages are clear
|
|
282
|
+
- [ ] Configuration logged at start
|
|
283
|
+
|
|
284
|
+
### Job Configuration
|
|
285
|
+
- [ ] Timeout > expected runtime + 30% buffer
|
|
286
|
+
- [ ] Hardware appropriate for model size
|
|
287
|
+
- [ ] Secrets include HF_TOKEN (see SKILL.md directive #2 for syntax)
|
|
288
|
+
- [ ] Script calls `login(token=hf_token)` and sets `training_args.hub_token = hf_token` BEFORE `Trainer()` init
|
|
289
|
+
- [ ] Environment variables set correctly
|
|
290
|
+
- [ ] Cost estimated and acceptable
|
|
291
|
+
|
|
292
|
+
**Following these principles transforms job success rate from ~60-70% to ~95%+**
|
|
293
|
+
|
|
294
|
+
---
|
|
295
|
+
|
|
296
|
+
## When Principles Conflict
|
|
297
|
+
|
|
298
|
+
Sometimes reliability and performance conflict. Here's how to choose:
|
|
299
|
+
|
|
300
|
+
| Scenario | Choose | Rationale |
|
|
301
|
+
|----------|--------|-----------|
|
|
302
|
+
| Demo/test | Reliability | Fast failure is worse than slow success |
|
|
303
|
+
| Production (first run) | Reliability | Prove it works before optimizing |
|
|
304
|
+
| Production (proven) | Performance | Safe to optimize after validation |
|
|
305
|
+
| Time-critical | Reliability | Failures cause more delay than slow runs |
|
|
306
|
+
| Cost-critical | Balanced | Test with small model, then optimize |
|
|
307
|
+
|
|
308
|
+
**General rule:** Reliability first, optimize second.
|
|
309
|
+
|
|
310
|
+
---
|
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
# Using timm models with Hugging Face Trainer
|
|
2
|
+
|
|
3
|
+
Transformers has first-class support for timm models via the `TimmWrapper` classes. You can load any timm model and use it directly with the `Trainer` API for image classification. Here's how it works:
|
|
4
|
+
|
|
5
|
+
## Loading a timm model
|
|
6
|
+
|
|
7
|
+
The `TimmWrapperForImageClassification` class (in `transformers/src/transformers/models/timm_wrapper/modeling_timm_wrapper.py`) wraps timm models so they're fully compatible with the Trainer API. You can load them via the `Auto` classes:
|
|
8
|
+
|
|
9
|
+
```python
|
|
10
|
+
from transformers import AutoModelForImageClassification, AutoImageProcessor, Trainer, TrainingArguments
|
|
11
|
+
|
|
12
|
+
# Load a timm model for image classification
|
|
13
|
+
checkpoint = "timm/resnet50.a1_in1k"
|
|
14
|
+
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
|
|
15
|
+
model = AutoModelForImageClassification.from_pretrained(
|
|
16
|
+
checkpoint,
|
|
17
|
+
num_labels=10, # set to your number of classes
|
|
18
|
+
ignore_mismatched_sizes=True, # needed when changing num_labels from pretrained
|
|
19
|
+
)
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
## Key details
|
|
23
|
+
|
|
24
|
+
1. **Image processor**: The `TimmWrapperImageProcessor` automatically resolves the correct transforms from timm's config. It exposes both `val_transforms` and `train_transforms` (with augmentations), as noted in the code:
|
|
25
|
+
|
|
26
|
+
```64:65:transformers/src/transformers/models/timm_wrapper/image_processing_timm_wrapper.py
|
|
27
|
+
# useful for training, see examples/pytorch/image-classification/run_image_classification.py
|
|
28
|
+
self.train_transforms = timm.data.create_transform(**self.data_config, is_training=True)
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
2. **Loss computation is built-in**: `TimmWrapperForImageClassification.forward()` accepts a `labels` argument and computes cross-entropy loss automatically, which is exactly what Trainer expects:
|
|
32
|
+
|
|
33
|
+
```374:376:transformers/src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
|
|
34
|
+
loss = None
|
|
35
|
+
if labels is not None:
|
|
36
|
+
loss = self.loss_function(labels, logits, self.config)
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
3. **Returns `ImageClassifierOutput`**: The output format is the standard transformers output, so Trainer handles it seamlessly.
|
|
40
|
+
|
|
41
|
+
## Full training example
|
|
42
|
+
|
|
43
|
+
```python
|
|
44
|
+
from transformers import AutoModelForImageClassification, AutoImageProcessor, Trainer, TrainingArguments
|
|
45
|
+
from datasets import load_dataset
|
|
46
|
+
|
|
47
|
+
# Load dataset
|
|
48
|
+
dataset = load_dataset("food101", split="train[:5000]")
|
|
49
|
+
dataset = dataset.train_test_split(test_size=0.2)
|
|
50
|
+
|
|
51
|
+
# Load timm model + processor
|
|
52
|
+
checkpoint = "timm/resnet50.a1_in1k"
|
|
53
|
+
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
|
|
54
|
+
model = AutoModelForImageClassification.from_pretrained(
|
|
55
|
+
checkpoint,
|
|
56
|
+
num_labels=101,
|
|
57
|
+
ignore_mismatched_sizes=True,
|
|
58
|
+
)
|
|
59
|
+
|
|
60
|
+
# Preprocessing
|
|
61
|
+
def transform(batch):
|
|
62
|
+
batch["pixel_values"] = [image_processor(img)["pixel_values"][0] for img in batch["image"]]
|
|
63
|
+
batch["labels"] = batch["label"]
|
|
64
|
+
return batch
|
|
65
|
+
|
|
66
|
+
dataset["train"].set_transform(transform)
|
|
67
|
+
dataset["test"].set_transform(transform)
|
|
68
|
+
|
|
69
|
+
# Train
|
|
70
|
+
training_args = TrainingArguments(
|
|
71
|
+
output_dir="./timm-finetuned",
|
|
72
|
+
num_train_epochs=3,
|
|
73
|
+
per_device_train_batch_size=16,
|
|
74
|
+
per_device_eval_batch_size=16,
|
|
75
|
+
eval_strategy="epoch",
|
|
76
|
+
save_strategy="epoch",
|
|
77
|
+
logging_steps=50,
|
|
78
|
+
remove_unused_columns=False,
|
|
79
|
+
)
|
|
80
|
+
|
|
81
|
+
trainer = Trainer(
|
|
82
|
+
model=model,
|
|
83
|
+
args=training_args,
|
|
84
|
+
train_dataset=dataset["train"],
|
|
85
|
+
eval_dataset=dataset["test"],
|
|
86
|
+
)
|
|
87
|
+
|
|
88
|
+
trainer.train()
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
Any timm checkpoint on the Hub (prefixed with `timm/`) works out of the box (ResNet, EfficientNet, ViT, ConvNeXt, etc.). The wrapper handles all the translation between timm's interface and what Trainer expects.
|