npm - @wentorai/research-plugins - Versions diffs - 1.0.0 → 1.2.0 - Mend

@wentorai/research-plugins 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (415)

package/skills/domains/ai-ml/autonomous-agents-papers-guide/SKILL.md ADDED

@@ -0,0 +1,178 @@
+---
+name: autonomous-agents-papers-guide
+description: "Daily-updated collection of autonomous AI agent papers"
+metadata:
+  openclaw:
+    emoji: "🤖"
+    category: "domains"
+    subcategory: "ai-ml"
+    keywords: ["autonomous agents", "AI agents", "LLM agents", "agent papers", "planning", "tool use"]
+    source: "https://github.com/tmgthb/Autonomous-Agents"
+---
+# Autonomous Agents Papers Guide
+## Overview
+A daily-updated collection of research papers on autonomous AI agents — systems that use LLMs for planning, reasoning, tool use, and multi-step task execution. Covers the full agent stack from foundational prompting techniques (ReAct, Chain-of-Thought) to multi-agent systems, memory architectures, and real-world deployments. Organized chronologically with category tags for easy navigation.
+## Agent Taxonomy
+```
+Autonomous Agents
+├── Planning & Reasoning
+│   ├── Chain-of-Thought (CoT, ToT, GoT)
+│   ├── ReAct (Reasoning + Acting)
+│   ├── Reflexion (Self-reflection)
+│   └── LATS (Language Agent Tree Search)
+├── Tool Use & Actions
+│   ├── Function calling
+│   ├── Code execution
+│   ├── Web browsing
+│   └── API interaction
+├── Memory Systems
+│   ├── Short-term (context window)
+│   ├── Long-term (vector stores)
+│   ├── Episodic (experience replay)
+│   └── Procedural (learned strategies)
+├── Multi-Agent Systems
+│   ├── Debate/discussion (ChatDev, MetaGPT)
+│   ├── Hierarchical (manager/worker)
+│   ├── Collaborative (shared goals)
+│   └── Competitive (adversarial)
+└── Applications
+    ├── Software engineering (SWE-agent, Devin)
+    ├── Scientific research (AI Scientist)
+    ├── Web automation (WebArena)
+    └── Game playing (Voyager)
+```
+## Landmark Papers
+| Paper | Year | Key Contribution |
+|-------|------|-----------------|
+| **ReAct** | 2023 | Interleaving reasoning and acting |
+| **Toolformer** | 2023 | Self-taught tool use |
+| **Voyager** | 2023 | Lifelong learning agent in Minecraft |
+| **AutoGPT** | 2023 | Autonomous goal-directed agent |
+| **MetaGPT** | 2023 | Multi-agent software company |
+| **Reflexion** | 2023 | Verbal self-reflection for learning |
+| **SWE-agent** | 2024 | Autonomous software engineering |
+| **AI Scientist** | 2024 | Autonomous research paper generation |
+| **Claude Computer Use** | 2024 | GUI agent via screenshots |
+| **OpenHands** | 2024 | Open platform for AI agents |
+## Paper Tracking
+```python
+import arxiv
+from datetime import datetime, timedelta, timezone
+def find_agent_papers(days=7, max_results=30):
+    """Find recent autonomous agent papers, deduplicated across queries."""
+    # Quote multi-word phrases so arXiv matches them as phrases in the abstract
+    queries = [
+        'abs:"autonomous agent" AND abs:"large language model"',
+        'abs:"LLM agent" AND (abs:planning OR abs:"tool use")',
+        'abs:"multi-agent" AND abs:LLM',
+    ]
+    # Client.results() replaces the deprecated Search.results()
+    client = arxiv.Client()
+    # arXiv timestamps are timezone-aware (UTC), so use an aware cutoff
+    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
+    seen = set()
+    papers = []
+    for query in queries:
+        search = arxiv.Search(
+            query=query,
+            max_results=max_results,
+            sort_by=arxiv.SortCriterion.SubmittedDate,
+        )
+        for r in client.results(search):
+            if r.entry_id not in seen and r.published > cutoff:
+                seen.add(r.entry_id)
+                papers.append({
+                    "title": r.title,
+                    "url": r.entry_id,
+                    "date": r.published.strftime("%Y-%m-%d"),
+                    "categories": r.categories,
+                })
+    # ISO dates sort correctly as strings; newest first
+    papers.sort(key=lambda x: x["date"], reverse=True)
+    return papers
+for p in find_agent_papers(days=14):
+    print(f"[{p['date']}] {p['title']}")
+```
+## Agent Benchmarks
+```python
+benchmarks = {
+    "SWE-bench": {
+        "task": "Resolve real GitHub issues",
+        "metric": "% resolved",
+        "top_score": "49% (Claude 3.5 + SWE-agent)",
+    },
+    "WebArena": {
+        "task": "Complete web tasks in realistic sites",
+        "metric": "Task success rate",
+        "top_score": "35.8%",
+    },
+    "GAIA": {
+        "task": "General AI assistant tasks",
+        "metric": "Accuracy across levels",
+        "top_score": "Level 1: 75%, Level 3: 30%",
+    },
+    "AgentBench": {
+        "task": "8 diverse agent environments",
+        "metric": "Overall score",
+    },
+    "ToolBench": {
+        "task": "API tool selection and chaining",
+        "metric": "Pass rate",
+    },
+}
+for name, info in benchmarks.items():
+    print(f"\n{name}: {info['task']}")
+    print(f"  Metric: {info['metric']}")
+    if "top_score" in info:
+        print(f"  SOTA: {info['top_score']}")
+```
+## Reading Roadmap
+```markdown
+### Foundations
+1. "Chain-of-Thought Prompting" (Wei et al., 2022)
+2. "ReAct: Synergizing Reasoning and Acting" (Yao et al., 2023)
+3. "Toolformer" (Schick et al., 2023)
+### Planning & Memory
+4. "Tree of Thoughts" (Yao et al., 2023)
+5. "Reflexion" (Shinn et al., 2023)
+6. "Generative Agents" (Park et al., 2023)
+### Multi-Agent
+7. "MetaGPT" (Hong et al., 2023)
+8. "AutoGen" (Wu et al., 2023)
+9. "ChatDev" (Qian et al., 2023)
+### Applications
+10. "SWE-agent" (Yang et al., 2024)
+11. "The AI Scientist" (Lu et al., 2024)
+```
+## Use Cases
+1. **Literature survey**: Track the fast-moving agent research field
+2. **System design**: Learn from agent architecture patterns
+3. **Benchmark comparison**: Compare agent frameworks
+4. **Research direction**: Identify open problems in agent AI
+5. **Course material**: Teach LLM-based agent systems
+## References
+- [Autonomous-Agents GitHub](https://github.com/tmgthb/Autonomous-Agents)
+- [LLM-Agent-Paper-List](https://github.com/WooooDyy/LLM-Agent-Paper-List)
+- [Agent Survey](https://arxiv.org/abs/2308.11432)
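
Reviewer note on the ReAct entry in the file above (not part of the package contents): the table describes ReAct as interleaving reasoning and acting. A minimal sketch of that loop might look like the following. The `scripted_policy` function is a hypothetical stand-in for an LLM call, and the `calculator` tool, the transcript format, and all names are assumptions made for illustration, not code from the ReAct paper or from this package.

```python
from typing import Callable, Dict, List, Optional, Tuple


def calculator(expression: str) -> str:
    """Toy tool: evaluate a plain arithmetic expression."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "error: unsupported characters"
    # Input is restricted to digits and arithmetic symbols above.
    return str(eval(expression, {"__builtins__": {}}))


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for an LLM: returns (thought, action, argument)."""
    observations = [t for t in transcript if t.startswith("Observation:")]
    if not observations:
        return ("I should compute the product with a tool.", "calculator", "12 * 7")
    last = observations[-1].split(": ", 1)[1]
    return (f"The tool returned {last}, so I can answer.", "finish", last)


def react_loop(
    question: str,
    policy: Callable[[List[str]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate thought -> action -> observation until the policy finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, argument = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{argument}]")
        if action == "finish":
            return argument, transcript
        tool = tools.get(action)
        observation = tool(argument) if tool else f"error: unknown tool {action}"
        transcript.append(f"Observation: {observation}")
    return None, transcript  # step budget exhausted without an answer


answer, trace = react_loop("What is 12 * 7?", scripted_policy, TOOLS)
print("\n".join(trace))
print("Answer:", answer)
```

In a real agent the policy would be a model call that reads the transcript as its prompt; the step budget (`max_steps`) is one simple way to bound runaway loops.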

package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md ADDED

@@ -0,0 +1,239 @@
+---
+name: dl-transformer-finetune
+description: "Build transformer fine-tuning plans for classification and generation"
+metadata:
+  openclaw:
+    emoji: "🎯"
+    category: "domains"
+    subcategory: "ai-ml"
+    keywords: ["transformer", "fine-tuning", "BERT", "LoRA", "PEFT", "transfer learning", "NLP"]
+    source: "https://github.com/huggingface/peft"
+---
+# Transformer Fine-Tuning Guide
+## Overview
+Fine-tuning pretrained transformers is the dominant paradigm in modern NLP and increasingly in vision, audio, and multimodal research. The core idea is simple: take a model pretrained on massive data, then adapt it to your specific task with a comparatively small labeled dataset. But the practical details -- which layers to freeze, which optimizer and learning rate to use, how to handle catastrophic forgetting, when to use parameter-efficient methods -- determine whether fine-tuning succeeds or fails.
+This guide covers the full spectrum of fine-tuning approaches: full fine-tuning for maximum performance, parameter-efficient fine-tuning (PEFT) for resource-constrained settings, and the decision framework for choosing between them. The patterns are drawn from hundreds of published papers and the Hugging Face ecosystem that supports them.
+Whether you are fine-tuning BERT for text classification in a domain-specific corpus, adapting a large language model with LoRA for instruction following, or building a multi-task model for your research pipeline, this guide provides the recipes you need.
+## Full Fine-Tuning
+### Text Classification with BERT
+```python
+from transformers import (
+    AutoModelForSequenceClassification,
+    AutoTokenizer,
+    TrainingArguments,
+    Trainer,
+)
+from datasets import load_dataset
+import numpy as np
+from sklearn.metrics import accuracy_score, f1_score
+# Load model and tokenizer
+model_name = "bert-base-uncased"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(
+    model_name, num_labels=3
+)
+# Prepare dataset
+dataset = load_dataset("multi_nli")
+def tokenize_function(examples):
+    return tokenizer(
+        examples["premise"],
+        examples["hypothesis"],
+        truncation=True,
+        max_length=128,
+        padding="max_length",
+    )
+tokenized = dataset.map(tokenize_function, batched=True)
+# Metrics
+def compute_metrics(eval_pred):
+    logits, labels = eval_pred
+    preds = np.argmax(logits, axis=-1)
+    return {
+        "accuracy": accuracy_score(labels, preds),
+        "f1_macro": f1_score(labels, preds, average="macro"),
+    }
+# Training arguments (research-grade defaults)
+training_args = TrainingArguments(
+    output_dir="./results",
+    num_train_epochs=3,
+    per_device_train_batch_size=32,
+    per_device_eval_batch_size=64,
+    learning_rate=2e-5,                  # Standard for BERT fine-tuning
+    weight_decay=0.01,
+    warmup_ratio=0.06,                   # 6% warmup
+    evaluation_strategy="epoch",         # Renamed to eval_strategy in newer transformers releases
+    save_strategy="epoch",
+    load_best_model_at_end=True,
+    metric_for_best_model="f1_macro",
+    fp16=True,
+    dataloader_num_workers=4,
+    seed=42,
+    report_to="wandb",
+)
+trainer = Trainer(
+    model=model,
+    args=training_args,
+    train_dataset=tokenized["train"],
+    eval_dataset=tokenized["validation_matched"],
+    compute_metrics=compute_metrics,
+)
+trainer.train()
+```
+### Learning Rate Selection Guide
+| Model Size | Recommended LR | Warmup | Weight Decay |
+|-----------|----------------|--------|--------------|
+| BERT-base (110M) | 2e-5 to 5e-5 | 6-10% | 0.01 |
+| BERT-large (340M) | 1e-5 to 3e-5 | 6-10% | 0.01 |
+| RoBERTa-large (355M) | 1e-5 to 2e-5 | 6% | 0.01 |
+| T5-base (220M) | 3e-4 to 1e-3 | 0-5% | 0.01 |
+| LLaMA-7B (full FT) | 1e-5 to 2e-5 | 3% | 0.0 |
+| LLaMA-7B (LoRA) | 1e-4 to 3e-4 | 3% | 0.0 |
+## Parameter-Efficient Fine-Tuning (PEFT)
+### LoRA (Low-Rank Adaptation)
+LoRA freezes the pretrained weights and injects trainable low-rank decomposition matrices into selected layers. It typically trains only 0.1-1% of parameters and is often reported to reach 95-100% of full fine-tuning performance.
+```python
+import torch  # needed for torch.bfloat16 below
+from peft import LoraConfig, get_peft_model, TaskType
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load base model
+model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-2-7b-hf",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+# Configure LoRA
+lora_config = LoraConfig(
+    task_type=TaskType.CAUSAL_LM,
+    r=16,                          # Rank (8-64 typical)
+    lora_alpha=32,                 # Scaling factor (usually 2*r)
+    lora_dropout=0.05,
+    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
+    bias="none",
+)
+model = get_peft_model(model, lora_config)
+model.print_trainable_parameters()
+# Approximate output for this config (r=16, four target modules):
+# trainable params: 16,777,216 || all params: 6,755,192,832 || trainable%: ~0.25
+```
+### QLoRA (Quantized LoRA)
+```python
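+import torch
+from peft import get_peft_model, prepare_model_for_kbit_training
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+# 4-bit quantization config
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_use_double_quant=True,
+)
+model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-2-7b-hf",
+    quantization_config=bnb_config,
+    device_map="auto",
+)
+# Standard PEFT preparation step before training a k-bit quantized model
+model = prepare_model_for_kbit_training(model)
+# Apply LoRA on top of quantized model (reuses lora_config from the LoRA example)
+model = get_peft_model(model, lora_config)
+# Now fits on a single 24GB GPU!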
+```
+## PEFT Method Comparison
+| Method | Trainable % | Memory | Performance | Best For |
+|--------|------------|--------|-------------|----------|
+| Full fine-tuning | 100% | High | Best | Sufficient compute + data |
+| LoRA | 0.1-1% | Low | 95-100% | Most scenarios |
+| QLoRA | 0.1-1% | Very low | 93-98% | Consumer GPUs |
+| Prefix tuning | ~0.1% | Low | 90-95% | Generation tasks |
+| Adapter layers | 1-5% | Medium | 95-99% | Multi-task |
+| Prompt tuning | <0.01% | Minimal | 85-95% | Large models, many tasks |
+## Avoiding Catastrophic Forgetting
+```python
+# Strategy 1: Gradual unfreezing (Howard & Ruder, 2018)
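+def gradual_unfreeze(model, epoch, total_layers=12):
+    """Unfreeze one more encoder layer per epoch, from top to bottom."""
+    first_trainable = total_layers - min(epoch + 1, total_layers)
+    for name, param in model.named_parameters():
+        if ".layer." in name:
+            # BERT-style names look like "bert.encoder.layer.11.attention..."
+            layer_idx = int(name.split(".layer.")[1].split(".")[0])
+            param.requires_grad = layer_idx >= first_trainable
+        else:
+            # Pooler and task head always train; embeddings unfreeze last
+            param.requires_grad = "embeddings" not in name or first_trainable == 0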
+# Strategy 2: Discriminative learning rates
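+def get_layer_lrs(model, base_lr=2e-5, decay_factor=0.95):
+    """Build optimizer parameter groups with lower learning rates for earlier layers."""
+    params = []
+    num_layers = 12  # BERT-base
+    for i in range(num_layers):
+        lr = base_lr * (decay_factor ** (num_layers - i - 1))
+        layer_params = [p for n, p in model.named_parameters()
+                       if f"layer.{i}." in n]
+        params.append({"params": layer_params, "lr": lr})
+    # Parameters outside the encoder layers need groups too, or the optimizer
+    # never updates them: embeddings get the lowest rate, the head gets base_lr
+    embed = [p for n, p in model.named_parameters() if "embeddings" in n]
+    rest = [p for n, p in model.named_parameters()
+            if "embeddings" not in n and ".layer." not in n]
+    params.append({"params": embed, "lr": base_lr * decay_factor ** num_layers})
+    params.append({"params": rest, "lr": base_lr})
+    return params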
+# Strategy 3: EWC (Elastic Weight Consolidation)
+# Add a penalty term that keeps important weights close to pretrained values
+```
+## Fine-Tuning Checklist for Papers
+```
+Before fine-tuning:
+[ ] Report exact pretrained model name and version
+[ ] Document dataset size, splits, and preprocessing
+[ ] Specify hardware (GPU model, count, precision)
+[ ] Set random seeds (Python, NumPy, PyTorch, CUDA)
+During fine-tuning:
+[ ] Use validation set for hyperparameter selection
+[ ] Log training curves (loss, metrics per epoch)
+[ ] Monitor for overfitting (val loss divergence)
+[ ] Try at least 3 learning rates from the recommended range
+Reporting:
+[ ] Report mean and std across 3-5 random seeds
+[ ] Include training time and compute cost
+[ ] Compare against published baselines using same evaluation
+[ ] Release model weights or LoRA adapters for reproducibility
+```
+## Best Practices
+- **Start with the recommended learning rate** for your model size, then sweep 3-5 values.
+- **Use LoRA first** unless you have strong evidence that full fine-tuning is needed.
+- **Always evaluate on a held-out test set** that was not used for any hyperparameter decisions.
+- **Freeze embeddings** when fine-tuning for classification -- they rarely need updating.
+- **Use gradient accumulation** to simulate larger batch sizes on limited hardware.
+- **Save the tokenizer alongside the model** to ensure reproducibility.
+## References
+- [LoRA paper](https://arxiv.org/abs/2106.09685) -- Hu et al., 2021
+- [QLoRA paper](https://arxiv.org/abs/2305.14314) -- Dettmers et al., 2023
+- [PEFT library](https://github.com/huggingface/peft) -- Hugging Face parameter-efficient fine-tuning
+- [ULMFiT](https://arxiv.org/abs/1801.06146) -- Howard & Ruder, 2018 (gradual unfreezing, discriminative LR)
+- [Hugging Face Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) -- Training API documentation
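
Reviewer note on the LoRA section in the file above (not part of the package contents): the trainable-parameter count follows directly from the rank and the shapes of the adapted matrices, so it can be checked by hand. The sketch below works through that arithmetic. The hidden size (4096), layer count (32), and base parameter count are assumed figures for a Llama-2-7B-style model with square attention projections; they do not come from the diff, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed architecture figures for a Llama-2-7B-style model (not from the diff).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
BASE_PARAMS = 6_738_415_616


def lora_pair_params(d_in: int, d_out: int, rank: int) -> int:
    """One LoRA pair adds A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def summarize(rank: int, target_modules: List[str]) -> Tuple[int, int, float]:
    """Return (trainable params, total params, trainable percent)."""
    per_module = lora_pair_params(HIDDEN_SIZE, HIDDEN_SIZE, rank)
    trainable = per_module * len(target_modules) * NUM_LAYERS
    total = BASE_PARAMS + trainable
    return trainable, total, 100.0 * trainable / total


configs = [
    (8, ["q_proj", "v_proj"]),
    (16, ["q_proj", "k_proj", "v_proj", "o_proj"]),
]
for rank, modules in configs:
    trainable, total, pct = summarize(rank, modules)
    print(f"r={rank}, {len(modules)} modules: "
          f"trainable={trainable:,} total={total:,} ({pct:.4f}%)")
```

Under these assumptions, `r=16` on all four attention projections gives 16,777,216 trainable parameters (about 0.25% of the total), while `r=8` on `q_proj` and `v_proj` alone gives 4,194,304 (about 0.06%). Doubling the rank or the number of target modules doubles the adapter size.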

package/skills/domains/ai-ml/domain-adaptation-papers-guide/SKILL.md ADDED

@@ -0,0 +1,173 @@
+---
+name: domain-adaptation-papers-guide
+description: "Comprehensive collection of domain adaptation research papers"
+metadata:
+  openclaw:
+    emoji: "🔄"
+    category: "domains"
+    subcategory: "ai-ml"
+    keywords: ["domain adaptation", "transfer learning", "distribution shift", "domain gap", "UDA", "domain generalization"]
+    source: "https://github.com/zhaoxin94/awesome-domain-adaptation"
+---
+# Domain Adaptation Papers Guide
+## Overview
+Domain adaptation addresses the problem of training models on one data distribution (source domain) and deploying them on a different distribution (target domain). This curated collection covers the full spectrum — from unsupervised domain adaptation (UDA) and domain generalization to partial, open-set, and source-free adaptation. Organized by methodology and application area with regularly updated paper lists.
+## Taxonomy of Methods
+```
+Domain Adaptation
+├── Unsupervised DA (UDA)
+│   ├── Discrepancy-based (MMD, CORAL, CDD)
+│   ├── Adversarial-based (DANN, ADDA, CDAN)
+│   ├── Reconstruction-based (DRCN, DSN)
+│   └── Self-training (SHOT, CBST)
+├── Semi-supervised DA
+├── Source-free DA (no source data at adaptation time)
+├── Partial DA (target has subset of source classes)
+├── Open-set DA (target has unknown classes)
+├── Universal DA (no prior on label set relationship)
+├── Multi-source DA
+├── Domain Generalization (no target data at all)
+└── Test-time Adaptation (adapt at inference)
+```
+## Key Methods by Era
+### Classical Methods
+| Method | Year | Approach | Key Idea |
+|--------|------|----------|----------|
+| **TCA** | 2011 | Kernel | Transfer Component Analysis |
+| **GFK** | 2012 | Subspace | Geodesic Flow Kernel |
+| **SA** | 2013 | Subspace | Subspace Alignment |
+| **DAN** | 2015 | MMD | Deep Adaptation Networks |
+| **DANN** | 2016 | Adversarial | Domain-Adversarial Neural Networks |
+| **CORAL** | 2016 | Statistics | Correlation Alignment |
+| **ADDA** | 2017 | Adversarial | Adversarial Discriminative DA |
+### Modern Methods
+| Method | Year | Approach | Key Idea |
+|--------|------|----------|----------|
+| **CDAN** | 2018 | Adversarial | Conditional adversarial + entropy |
+| **MCD** | 2018 | Discrepancy | Maximum Classifier Discrepancy |
+| **SHOT** | 2020 | Source-free | Self-supervised pseudo-labeling |
+| **TENT** | 2021 | Test-time | Entropy minimization at test time |
+| **DAFormer** | 2022 | Transformer | DA for semantic segmentation |
+| **PADCLIP** | 2023 | Vision-language | CLIP-based domain adaptation |
+## Paper Tracking
+```python
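+import arxiv
+from datetime import datetime, timedelta, timezone
+def find_da_papers(subtopic="unsupervised", days=30):
+    """Print domain adaptation papers submitted to arXiv in the last `days` days."""
+    # Quote phrases and prefix every term so each one is matched in the abstract
+    queries = {
+        "unsupervised": 'abs:"unsupervised domain adaptation"',
+        "source_free": 'abs:"source-free domain adaptation"',
+        "generalization": 'abs:"domain generalization"',
+        "test_time": 'abs:"test-time adaptation" OR abs:"test-time training"',
+    }
+    search = arxiv.Search(
+        query=queries.get(subtopic, queries["unsupervised"]),
+        max_results=30,
+        sort_by=arxiv.SortCriterion.SubmittedDate,
+    )
+    # arXiv timestamps are timezone-aware (UTC), so use an aware cutoff
+    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
+    # Client.results() replaces the deprecated Search.results()
+    for result in arxiv.Client().results(search):
+        if result.published < cutoff:
+            break  # results arrive newest first, so the rest are older
+        print(f"[{result.published.strftime('%Y-%m-%d')}] "
+              f"{result.title}")
+        print(f"  {result.entry_id}")
+find_da_papers("source_free")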
+```
+## Benchmark Datasets
+```python
+# Standard DA benchmarks
+benchmarks = {
+    "Office-31": {
+        "domains": ["Amazon", "DSLR", "Webcam"],
+        "classes": 31,
+        "task": "Object recognition",
+    },
+    "Office-Home": {
+        "domains": ["Art", "Clipart", "Product", "Real World"],
+        "classes": 65,
+        "task": "Object recognition",
+    },
+    "VisDA-2017": {
+        "domains": ["Synthetic", "Real"],
+        "classes": 12,
+        "task": "Large-scale sim-to-real",
+    },
+    "DomainNet": {
+        "domains": ["Clipart", "Infograph", "Painting",
+                     "Quickdraw", "Real", "Sketch"],
+        "classes": 345,
+        "task": "Large-scale multi-domain",
+    },
+    "PACS": {
+        "domains": ["Photo", "Art", "Cartoon", "Sketch"],
+        "classes": 7,
+        "task": "Domain generalization",
+    },
+}
+for name, info in benchmarks.items():
+    print(f"\n{name}: {info['classes']} classes, "
+          f"{len(info['domains'])} domains")
+    print(f"  Domains: {', '.join(info['domains'])}")
+```
+## Application Areas
+| Application | Source → Target Example |
+|-------------|----------------------|
+| **Medical imaging** | Hospital A → Hospital B scanners |
+| **Autonomous driving** | Simulation → Real world |
+| **Remote sensing** | Region A → Region B satellite |
+| **NLP** | News text → Social media |
+| **Speech** | Studio → Noisy environments |
+| **Robotics** | Sim → Real manipulation |
+## Reading Roadmap
+```markdown
+### Beginner Path
+1. "A Survey on Transfer Learning" (Pan & Yang, 2010)
+2. "Adapting Visual Category Models to New Domains" (Saenko et al., 2010)
+3. "Deep Domain Confusion" (Tzeng et al., 2014)
+4. DANN paper (Ganin et al., 2016)
+### Intermediate Path
+5. CDAN (Long et al., 2018)
+6. MCD (Saito et al., 2018)
+7. "Moment Matching for Multi-Source DA" (Peng et al., 2019)
+### Advanced Path
+8. SHOT (Liang et al., 2020) — source-free
+9. TENT (Wang et al., 2021) — test-time
+10. "Neural Unsupervised Domain Adaptation in NLP: A Survey" (Ramponi & Plank, 2020)
+```
+## Use Cases
+1. **Literature survey**: Map the DA research landscape
+2. **Method selection**: Choose appropriate DA technique for your task
+3. **Benchmark comparison**: Compare methods on standard datasets
+4. **Research gaps**: Identify under-explored DA settings
+5. **Course material**: Teach transfer learning and DA
+## References
+- [awesome-domain-adaptation](https://github.com/zhaoxin94/awesome-domain-adaptation)
+- [Transfer Learning Library](https://github.com/thuml/Transfer-Learning-Library)
+- [DomainBed](https://github.com/facebookresearch/DomainBed)
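
Reviewer note on the TENT entry in the file above (not part of the package contents): the table summarizes TENT as entropy minimization at test time. The sketch below computes only the quantity being minimized, the mean Shannon entropy of the softmax predictions on an unlabeled target batch. It omits the gradient step entirely, and the logits and function names are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits: List[float]) -> float:
    """Shannon entropy (in nats) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def batch_entropy(batch_logits: List[List[float]]) -> float:
    """Mean entropy over a batch: the objective a test-time method would minimize."""
    return sum(prediction_entropy(row) for row in batch_logits) / len(batch_logits)


# Hypothetical logits: confident predictions vs. near-uniform ones.
confident_batch = [[8.0, 0.5, 0.1], [0.2, 7.5, 0.3]]
uncertain_batch = [[1.0, 1.1, 0.9], [0.5, 0.4, 0.6]]

print(f"confident batch entropy: {batch_entropy(confident_batch):.4f}")
print(f"uncertain batch entropy: {batch_entropy(uncertain_batch):.4f}")
print(f"maximum for 3 classes:   {math.log(3):.4f}")
```

Because the objective needs no labels, it can be evaluated on target data at inference time; a shifted domain tends to push predictions toward the uncertain end, which is the signal an entropy-minimizing method acts on.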