npm - @rfxlamia/skillkit - Versions diffs - 1.0.0 → 1.2.0 - Mend

@rfxlamia/skillkit 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (269)

package/skills/skills/diverse-content-gen/references/troubleshooting.md ADDED

@@ -0,0 +1,426 @@
+# VS Troubleshooting Guide
+**Purpose:** Solutions to common Verbalized Sampling (VS) issues and error patterns
+**Load when:** VS execution fails, outputs unsatisfactory, or errors encountered
+---
+## Issue 1: JSON Parsing Failures
+### Symptoms
+- LLM returns explanation text before/after JSON
+- Invalid JSON structure
+- Missing quotes or brackets
+### Root Cause
+Model not following "ONLY JSON" instruction strictly
+### Solutions
+#### Solution A: Emphasize JSON-Only Output
+```
+[Add to VS prompt, in bold/caps]
+**CRITICAL: Give ONLY the JSON object with no explanations, no extra text before or after.**
+Expected format EXACTLY:
+{"responses": [{"text": "...", "probability": 0.XX}, ...]}
+```
+#### Solution B: Structured Output Mode
+```
+# If model supports structured output
+Use structured output API with schema:
+{
+  "type": "object",
+  "properties": {
+    "responses": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "text": {"type": "string"},
+          "probability": {"type": "number", "minimum": 0, "maximum": 1}
+        },
+        "required": ["text", "probability"]
+      }
+    }
+  },
+  "required": ["responses"]
+}
+```
+#### Solution C: Regex Extraction
+```python
+# Fallback: Extract JSON from mixed text
+import re
+import json
+def extract_json(text):
+    # Find JSON object in text
+    match = re.search(r'\{[\s\S]*"responses"[\s\S]*\}', text)
+    if match:
+        return json.loads(match.group(0))
+    raise ValueError("No valid JSON found")
+```
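The regex in Solution C is greedy, so it can over-capture when the text after the JSON contains another closing brace. A minimal alternative sketch, assuming only the standard library's `json.JSONDecoder.raw_decode`, tries each `{` as a candidate start and accepts the first object that has a `"responses"` key. The name `extract_json_robust` is hypothetical and not part of the package.

```python
import json

def extract_json_robust(text):
    """Return the first JSON object in `text` that has a "responses" key."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value at `start` and ignores what follows
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "responses" in obj:
            return obj
    raise ValueError("No valid JSON found")

# Trailing text with braces would break a greedy regex match
mixed = 'Sure! {"responses": [{"text": "hi", "probability": 0.1}]} Hope {this} helps.'
print(extract_json_robust(mixed)["responses"][0]["text"])
```

The scan is quadratic in the worst case, which should be acceptable for typical LLM responses but is worth keeping in mind for very long outputs.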
+### Prevention
+Always include explicit "ONLY JSON" instruction in every VS prompt
+---
+## Issue 2: Probabilities Don't Make Sense
+### Symptoms
+- All probabilities are identical (e.g., all 0.20)
+- Probabilities sum to unexpected values (2.5, 10.0, etc.)
+- Negative probabilities or values > 1.0
+### Root Cause
+Model estimating probabilities imperfectly (expected behavior)
+### Understanding
+**Important:** Probabilities in VS are **estimates**, not ground truth.
+✅ **What to trust:**
+- Relative ordering (higher p = more typical)
+- General magnitude (0.08 vs 0.01 = significant difference)
+❌ **What NOT to expect:**
+- Perfect calibration
+- Probabilities summing to exactly 1.0
+- Absolute precision
+### Solutions
+#### Solution A: Focus on Relative Ranking
+```python
+# Sort by probability, ignore absolute values
+candidates.sort(key=lambda x: x["probability"], reverse=True)
+# Present: highest p = most typical, lowest p = most creative
+```
+#### Solution B: Normalize If Needed
+```python
+# Only if required for downstream use
+total = sum(c["probability"] for c in candidates)
+if total > 0:  # skip normalization when every estimate is 0 to avoid dividing by zero
+    for c in candidates:
+        c["probability_normalized"] = c["probability"] / total
+```
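The symptoms for this issue include negative values and values above 1.0. One possible pre-processing step, sketched here under the assumption that out-of-range estimates should be clamped rather than discarded, cleans the values before ranking or normalizing. `clean_probabilities` is a hypothetical helper name.

```python
def clean_probabilities(candidates):
    """Clamp each probability into [0.0, 1.0] and sort most-typical first."""
    cleaned = []
    for c in candidates:
        # A missing probability is treated as 0.0 (an assumption of this sketch)
        p = float(c.get("probability", 0.0))
        p = min(1.0, max(0.0, p))
        cleaned.append({**c, "probability": p})
    cleaned.sort(key=lambda c: c["probability"], reverse=True)
    return cleaned

raw = [
    {"text": "a", "probability": -0.2},
    {"text": "b", "probability": 1.7},
    {"text": "c", "probability": 0.3},
]
print([(c["text"], c["probability"]) for c in clean_probabilities(raw)])
```

Clamping is monotone, so the relative ordering that the guide recommends trusting is preserved, apart from ties it may introduce at 0.0 or 1.0.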
+#### Solution C: Don't Show Probabilities to User
+```
+# Simple presentation (hide probabilities)
+Here are 5 diverse options:
+1. [Option 1 text]
+2. [Option 2 text]
+...
+```
+### Prevention
+Set expectation: probabilities are guidance, not precise measurements
+---
+## Issue 3: Outputs Still Too Similar
+### Symptoms
+- VS outputs lack diversity despite using technique
+- All variations sound alike
+- No meaningful angle/style differences
+### Root Causes
+1. Task itself has limited diversity space
+2. Parameters not tuned for diversity
+3. Model constraints (smaller models struggle more)
+### Solutions
+#### Solution A: Lower Probability Threshold
+```
+# Add to VS prompt
+Randomly sample from the distribution, with probability of each response below 0.01.
+```
+**Effect:** Samples more from tail (creative outputs)
+#### Solution B: Increase k Value
+```
+# Generate more candidates
+k = 10  # Instead of 5
+```
+**Effect:** Wider exploration of possibility space
+#### Solution C: Add Explicit Diversity Instruction
+```
+IMPORTANT: Ensure responses cover DIFFERENT:
+- Tones: (humorous, professional, inspirational, casual)
+- Perspectives: (beginner, expert, skeptic, enthusiast)
+- Formats: (question, statement, story, instruction)
+Avoid generating similar responses.
+```
+#### Solution D: Use VS-CoT
+```
+# Add reasoning step (see advanced-techniques.md)
+Before generating, think through different angles...
+```
+**Effect:** Model consciously diversifies
+#### Solution E: Check Task Viability
+```
+# Some tasks genuinely have limited diversity
+Example: "Generate the capital of France" → Only 1 valid answer
+Ask: Does this task have multiple valid approaches?
+If NO → VS may not be appropriate
+```
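Before tuning parameters, it can help to quantify whether outputs really are "too similar". A rough sketch, assuming character-level similarity from the standard library's `difflib` is an acceptable proxy (it does not capture semantic similarity); `mean_pairwise_similarity` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from itertools import combinations

def mean_pairwise_similarity(texts):
    """Average character-level similarity ratio over all pairs (0.0 to 1.0)."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

similar = ["Try our new latte today", "Try our new latte now", "Try the new latte today"]
varied = ["Try our new latte today", "Clouds are edible, apparently", "Mondays fear this cup"]

# A higher score means the candidates overlap more
print(mean_pairwise_similarity(similar) > mean_pairwise_similarity(varied))
```

Any cutoff for "too similar" would need to be calibrated on your own content; the guide does not specify one.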
+### Prevention
+Start with k=5, threshold=0.10. Adjust if diversity insufficient.
+---
+## Issue 4: Quality Drop on Complex Tasks
+### Symptoms
+- VS outputs diverse but lower quality
+- Errors, incoherence, or off-topic responses
+- User prefers single-shot standard prompting quality
+### Root Cause
+Diversity-quality tradeoff, especially with high k or low threshold
+### Solutions
+#### Solution A: Use VS-Multi
+```
+# See advanced-techniques.md
+Round 1: VS (diversity)
+Round 2: User selects favorites
+Round 3: Refine selected (quality)
+```
+**Effect:** Best of both worlds
+#### Solution B: Add Quality Constraints
+```
+Requirements for ALL responses:
+- Professional tone maintained
+- Grammatically correct
+- On-topic and relevant
+- No clichés or filler
+Then generate {k} responses meeting these standards.
+```
+#### Solution C: Reduce k or Raise Threshold
+```
+# More conservative parameters
+k = 3  # Instead of 10
+threshold = 0.10  # Instead of 0.01
+```
+**Effect:** Less aggressive diversity, better quality
+#### Solution D: Use VS-CoT for Coherence
+```
+# Reasoning step helps with complex tasks
+# See advanced-techniques.md
+```
+### Prevention
+For production content, default to VS-Multi workflow
+---
+## Issue 5: Model Not Following VS Format
+### Symptoms
+- Returns single response instead of k responses
+- Generates list without probabilities
+- Ignores JSON format entirely
+### Root Causes
+1. Smaller/weaker models struggle with complex instructions
+2. Prompt too complex for model capabilities
+### Solutions
+#### Solution A: Simplify Prompt
+```
+# Minimal VS prompt
+Generate 5 different responses to: {request}
+Return JSON:
+{"responses": [
+  {"text": "response 1", "probability": 0.X},
+  {"text": "response 2", "probability": 0.X},
+  ...
+]}
+ONLY JSON, no extra text.
+```
+#### Solution B: Use Stronger Model
+```
+# Check model compatibility (see research-findings.md)
+Recommended: GPT-4.1+, Claude 4+, Gemini 2.5+
+Avoid: Models < 70B parameters
+```
+#### Solution C: Fallback to Standard + Repetition
+```
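+# If VS fails, alternative (pseudocode):
+# standard_prompt_with_variation is a placeholder for your own single-response call
+candidates = []
+for _ in range(k):
+    response = standard_prompt_with_variation(request)
+    candidates.append(response)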
+```
+**Effect:** Less research-backed but more reliable for weak models
+### Prevention
+Use frontier models (GPT-4.1+, Claude 4+, Gemini 2.5+) for best VS results
+---
+## Issue 6: VS Taking Too Long / Expensive
+### Symptoms
+- High latency for VS calls
+- Token costs exceeding budget
+- User impatient with wait times
+### Root Causes
+- High k value (10-20) requires long generation
+- Multiple calls for large N
+- Using expensive models
+### Solutions
+#### Solution A: Reduce k
+```
+k = 3  # Quick mode, still gets diversity benefit
+```
+**Effect:** ~40% faster, 60% token cost reduction
+#### Solution B: Batch Optimization
+```
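+# If generating for multiple prompts:
+# Execute VS calls in parallel (if API allows)
+import asyncio
+async def run_all(vs_calls):
+    # vs_calls: a list of awaitables, one per VS request
+    return await asyncio.gather(*vs_calls)
+results = asyncio.run(run_all(vs_calls))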
+```
+#### Solution C: Use Smaller Model for Initial Pass
+```
+Round 1: VS with smaller/faster model (GPT-4.1-mini)
+Round 2: Refine selected with flagship model
+```
+**Effect:** Cost reduction while maintaining quality
+#### Solution D: Cache Results
+```
+# For repeated similar prompts (execute_vs stands in for your own VS call):
+cache = {}
+def cached_vs(prompt):
+    if prompt not in cache:
+        cache[prompt] = execute_vs(prompt)
+    return cache[prompt]
+```
+### Prevention
+Set expectations: VS trades tokens/time for diversity gains
+---
+## Issue 7: Probabilities All Very Low
+### Symptoms
+- All probabilities < 0.05
+- No clear high-probability responses
+### Root Cause
+This is actually **expected with a low probability threshold**
+### Understanding
+```
+# If you use threshold=0.01
+Randomly sample with probability < 0.01
+Result: All responses will be < 0.01 (tail sampling)
+This is working as intended!
+```
+### Solutions
+#### Solution A: Remove/Raise Threshold
+```
+# For more typical outputs:
+threshold = 0.10  # Or remove threshold entirely
+```
+#### Solution B: Interpret Correctly
+```
+# Low probabilities = creative/unique outputs
+# This is a feature, not a bug!
+Present to user:
+"Here are 5 creative (low-probability) options..."
+```
+### Prevention
+Understand that low threshold → low probabilities (by design)
+---
+## Debugging Checklist
+**When VS doesn't work as expected:**
+1. [ ] Check prompt formatting (exact template used?)
+2. [ ] Verify model supports complex instructions (frontier model?)
+3. [ ] Review parameters (k, threshold, temperature sensible?)
+4. [ ] Test with simpler request (does basic VS work?)
+5. [ ] Check JSON parsing (is response valid JSON?)
+6. [ ] Verify task suitability (does it have diversity potential?)
+7. [ ] Review quality requirements (diversity-quality tradeoff?)
+---
+## Error Message Quick Reference
+| Error | Likely Cause | Quick Fix |
+|-------|--------------|-----------|
+| "Invalid JSON" | Model didn't follow format | Emphasize "ONLY JSON" OR use regex extraction |
+| "Missing 'probability' field" | Model skipped probabilities | Simplify prompt OR use stronger model |
+| "All outputs identical" | Not using VS correctly | Check prompt has VS template |
+| "Probabilities sum to 10" | Misunderstanding (not an error) | Focus on relative ranking, not absolute |
+| "Quality too low" | High diversity, low quality | Use VS-Multi OR add quality constraints |
+| "Too slow" | High k or multiple calls | Reduce k OR use smaller model |
+---
+## When to Give Up on VS
+**VS may not be suitable if:**
+1. Task has single correct answer (factual QA)
+2. Model too weak to follow instructions (< 70B params)
+3. User explicitly wants deterministic output
+4. Real-time latency critical (< 1 second response needed)
+5. Quality degradation unacceptable
+**Alternative:** Use standard prompting with explicit variation instructions
+---
+**For advanced VS techniques:** See `advanced-techniques.md`
+**For research-backed insights:** See `research-findings.md`

package/skills/skills/diverse-content-gen/references/vs-core-technique.md ADDED

@@ -0,0 +1,240 @@
+# VS Core Technique
+**Purpose:** Core Verbalized Sampling concepts, prompt templates, and execution workflow
+**Load when:** Agent needs to execute VS for the first time or needs template reference
+---
+## Why VS Works: The Theory
+### The Mode Collapse Problem
+Aligned LLMs suffer from **typicality bias** - they favor more typical, familiar text because:
+- Human annotators prefer fluent, predictable content
+- RLHF training amplifies this bias
+- Result: **50-70% diversity reduction** vs. base models
+### The VS Solution
+**Different prompts collapse to different modes:**
+| Prompt Type | Example | Collapses To |
+|------------|---------|--------------|
+| **Instance** | "Tell me a joke" | Single most typical joke |
+| **List** | "Tell 5 jokes" | Uniform distribution of related items |
+| **Distribution (VS)** | "Tell 5 jokes with probabilities" | Base model's learned distribution |
+**Key insight:** By asking for a probability distribution, VS recovers the diverse pre-training distribution that alignment compressed.
+---
+## VS Prompt Template (Production-Ready)
+### Standard Template
+**Use this exact format for JSON-parseable output:**
+```
+Generate {k} responses to the following user request. Each response should be approximately {target_words} words.
+Return the responses in JSON format with the key: "responses" (list of dicts). Each dictionary must include:
+• text: the response string only (no explanation or extra text)
+• probability: the estimated probability from 0.0 to 1.0 of this response given the input prompt (relative to the full distribution)
+[OPTIONAL: Randomly sample the responses from the distribution, with the probability of each response below {threshold}.]
+Give ONLY the JSON object, with no explanations or extra text.
+USER REQUEST:
+{user_original_request}
+```
+### Template Variables
+**Required:**
+- `{k}`: Number of candidates (typically 5)
+- `{target_words}`: Expected length (e.g., "50", "200", "500")
+- `{user_original_request}`: The actual user query
+**Optional:**
+- `{threshold}`: Probability threshold (0.01, 0.05, 0.10) - include bracketed line only if tuning for more diversity
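Filling the template can be sketched as a small function. This is one possible rendering, not the package's implementation: `build_vs_prompt`, `VS_TEMPLATE`, and `THRESHOLD_LINE` are hypothetical names, and the default `target_words=50` is an assumption. The optional bracketed line is emitted only when a threshold is supplied.

```python
VS_TEMPLATE = (
    "Generate {k} responses to the following user request. "
    "Each response should be approximately {target_words} words.\n"
    'Return the responses in JSON format with the key: "responses" (list of dicts). '
    "Each dictionary must include:\n"
    "• text: the response string only (no explanation or extra text)\n"
    "• probability: the estimated probability from 0.0 to 1.0 of this response "
    "given the input prompt (relative to the full distribution)\n"
    "{threshold_line}"
    "Give ONLY the JSON object, with no explanations or extra text.\n"
    "USER REQUEST:\n"
    "{user_original_request}"
)

THRESHOLD_LINE = (
    "Randomly sample the responses from the distribution, "
    "with the probability of each response below {threshold}.\n"
)

def build_vs_prompt(user_original_request, k=5, target_words=50, threshold=None):
    """Fill the VS template; the threshold line is included only when given."""
    threshold_line = "" if threshold is None else THRESHOLD_LINE.format(threshold=threshold)
    return VS_TEMPLATE.format(
        k=k,
        target_words=target_words,
        threshold_line=threshold_line,
        user_original_request=user_original_request,
    )

print(build_vs_prompt("Write a caption for a new latte", k=5, target_words=20, threshold=0.10))
```

A float threshold of `0.10` renders as `0.1`; pass the string `"0.10"` if the exact spelling matters. Because the user request is passed as a format argument rather than spliced into the template, braces inside the request are left untouched.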
+### Concrete Example
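+**User request:** "Write 10 social media captions for a coffee shop's new caramel cloud latte"
+**Agent formats VS prompt (one call with k=5; N=10 needs ⌈10 / 5⌉ = 2 such calls, and the request is rephrased to ask for a single caption per response):**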
+```
+Generate 5 responses to the following user request. Each response should be approximately 20 words.
+Return the responses in JSON format with the key: "responses" (list of dicts). Each dictionary must include:
+• text: the response string only (no explanation or extra text)
+• probability: the estimated probability from 0.0 to 1.0 of this response given the input prompt (relative to the full distribution)
+Randomly sample the responses from the distribution, with the probability of each response below 0.10.
+Give ONLY the JSON object, with no explanations or extra text.
+USER REQUEST:
+Write a social media caption for a coffee shop's new caramel cloud latte
+```
+**Expected output:**
+```json
+{
+  "responses": [
+    {"text": "Sip on cloud nine ☁️ Our new Caramel Cloud Latte is here to make your mornings magical ✨", "probability": 0.08},
+    {"text": "Warning: Dangerously smooth. The Caramel Cloud Latte has arrived and it's causing serious caffeine crushes 💛", "probability": 0.06},
+    {"text": "Fluffy. Creamy. Caramel-y. The Caramel Cloud Latte is basically a hug in a cup 🤗", "probability": 0.05},
+    {"text": "Plot twist: clouds ARE edible. Try our new Caramel Cloud Latte and taste the sky ☁️☕", "probability": 0.04},
+    {"text": "New latte just dropped and it's lighter than air. Introducing: Caramel Cloud Latte 🌤️", "probability": 0.03}
+  ]
+}
+```
+---
+## Execution Workflow
+### Step 1: Parameter Planning
+**Before executing VS, determine:**
+**1.1 Content Parameters**
+- Content type: (blog post, caption, story, campaign idea, etc.)
+- Target word count: (20 words for captions, 500 for blog posts, etc.)
+- Total outputs needed: N (user wants 10 captions? 5 blog posts?)
+**1.2 VS Parameters**
+| Parameter | Default | Notes |
+|-----------|---------|-------|
+| k (candidates per call) | 5 | Use 3 for quick, 10 for deep exploration |
+| Temperature | 0.7-1.0 | Can combine with VS for extra boost |
+| Probability threshold | 0.10 (optional) | Lower = more creative tail sampling |
+**1.3 Calculate Calls Needed**
+```
+Number of LLM calls = ⌈N / k⌉
+```
+Example: User wants 15 ideas → k=5 → Need 3 calls
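The ceiling formula can be checked with integer arithmetic. A minimal sketch; `calls_needed` is a hypothetical helper name, not part of the package.

```python
def calls_needed(n_outputs, k=5):
    """Number of LLM calls required to collect n_outputs candidates, k per call."""
    if n_outputs < 0 or k < 1:
        raise ValueError("n_outputs must be >= 0 and k must be >= 1")
    # Integer ceiling division: equivalent to ceil(n_outputs / k) without floats
    return (n_outputs + k - 1) // k

print(calls_needed(15, k=5))  # 3, matching the example above
print(calls_needed(10, k=5))  # 2
print(calls_needed(11, k=5))  # 3
```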
+### Step 2: Execute VS Prompt
+1. **Format the prompt** using template with variables filled in
+2. **Make LLM call** (use regular message, no special tools)
+3. **Parse JSON response** - extract responses array
+4. **Repeat if needed** for additional candidates (when N > k)
+5. **Collect all candidates** from multiple calls into single pool
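The five steps above can be sketched as a loop that keeps calling the model until the pool is large enough. This is an illustration under assumptions: `call_llm` is a caller-supplied function that takes a prompt string and returns the raw JSON string, `max_calls` is a safety cap added here, and `fake_llm` is a stand-in so the example runs without a real API.

```python
import json

def collect_candidates(call_llm, prompt, n_outputs, max_calls=10):
    """Call the LLM repeatedly until `n_outputs` candidates are pooled."""
    pool = []
    calls = 0
    while len(pool) < n_outputs and calls < max_calls:
        calls += 1
        data = json.loads(call_llm(prompt))
        pool.extend(data["responses"])
    # Trim any surplus from the final call
    return pool[:n_outputs]

def fake_llm(prompt):
    """Stand-in for a real LLM client: always returns two canned candidates."""
    return json.dumps({"responses": [
        {"text": "idea A", "probability": 0.08},
        {"text": "idea B", "probability": 0.04},
    ]})

pool = collect_candidates(fake_llm, "any prompt", n_outputs=5)
print(len(pool))
```

Looping on pool size rather than on a precomputed call count also covers the case where a call returns fewer than k responses.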
+### Step 3: Parse & Validate Output
+**After receiving VS response:**
+```python
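+# Sketch of agent-side parsing; llm_output is the JSON string returned by the LLM
+import json
+def parse_vs_response(llm_output):
+    data = json.loads(llm_output)
+    candidates = data["responses"]
+    # Validate structure; raise rather than assert, since asserts vanish under `python -O`
+    for item in candidates:
+        if "text" not in item or "probability" not in item:
+            raise ValueError("Each response needs 'text' and 'probability'")
+        if not 0.0 <= item["probability"] <= 1.0:
+            raise ValueError("probability must be between 0.0 and 1.0")
+    return candidates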
+```
+**Handle errors:**
+- If JSON parsing fails → See `troubleshooting.md`
+- If structure invalid → Retry with emphasis on "ONLY JSON"
+### Step 4: Present Results to User
+**Three presentation options:**
+**Option A: Ranked by Probability**
+```
+Here are 5 diverse caption ideas (ordered by probability):
+1. [p=0.08] Sip on cloud nine ☁️ Our new Caramel Cloud Latte...
+2. [p=0.06] Warning: Dangerously smooth. The Caramel Cloud Latte...
+3. [p=0.05] Fluffy. Creamy. Caramel-y. The Caramel Cloud Latte...
+```
+**Option B: Grouped by Tiers**
+```
+HIGH PROBABILITY (typical, safer):
+• Sip on cloud nine ☁️ Our new Caramel Cloud Latte...
+MEDIUM PROBABILITY (balanced):
+• Warning: Dangerously smooth. The Caramel Cloud Latte...
+• Fluffy. Creamy. Caramel-y. The Caramel Cloud Latte...
+LOW PROBABILITY (creative, unique):
+• Plot twist: clouds ARE edible. Try our new Caramel Cloud...
+```
+**Option C: Simple List (Hide Probabilities)**
+```
+Here are 5 diverse caption ideas:
+1. Sip on cloud nine ☁️ Our new Caramel Cloud Latte...
+2. Warning: Dangerously smooth. The Caramel Cloud Latte...
+3. Fluffy. Creamy. Caramel-y. The Caramel Cloud Latte...
+```
+**Default:** Use Option C for cleaner user experience, unless user asks for probability insights.
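Options A and C differ only in whether the probability is shown, so a single formatter can cover both. A minimal sketch; `format_results` is a hypothetical name, and sorting Option C by probability as well is an assumption of this example.

```python
def format_results(candidates, show_probabilities=False):
    """Render candidates as a numbered list, most typical first."""
    ranked = sorted(candidates, key=lambda c: c["probability"], reverse=True)
    lines = []
    for i, c in enumerate(ranked, start=1):
        prefix = f"[p={c['probability']:.2f}] " if show_probabilities else ""
        lines.append(f"{i}. {prefix}{c['text']}")
    return "\n".join(lines)

sample = [
    {"text": "Fluffy. Creamy. Caramel-y.", "probability": 0.05},
    {"text": "Sip on cloud nine", "probability": 0.08},
]
print(format_results(sample, show_probabilities=True))  # Option A
print(format_results(sample))                           # Option C (default)
```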
+---
+## Parameter Selection Guide
+### Quick Decision Matrix
+| Scenario | k | Threshold | Temperature |
+|----------|---|-----------|-------------|
+| **Quick ideation** | 3 | None | 0.7 |
+| **Standard brainstorming** | 5 | 0.10 | 0.8 |
+| **Deep exploration** | 10 | 0.01 | 1.0 |
+| **Production content** | 5 | None | 0.8 |
+### When to Adjust Parameters
+**If outputs too similar:**
+- ✅ Lower threshold (0.10 → 0.01)
+- ✅ Increase k (5 → 10)
+- ✅ Add explicit diversity instruction to prompt
+**If outputs too wild/low quality:**
+- ✅ Raise threshold (0.01 → 0.10)
+- ✅ Reduce k (10 → 5)
+- ✅ Add quality constraints to prompt
+---
+## Quality Control Checklist
+**Before presenting VS results to user, verify:**
+- [ ] **Diversity achieved:** Outputs cover genuinely different angles/styles/approaches
+- [ ] **Quality maintained:** Each output meets baseline quality standards
+- [ ] **User intent matched:** Outputs address the original request accurately
+- [ ] **Formatting correct:** Clean presentation, no JSON artifacts in user-facing text
+- [ ] **Probabilities sensible:** If shown, probabilities are reasonable (don't need to sum to 1.0)
+---
+## Next Steps
+**After mastering core VS:**
+- **Task-specific workflows:** Load `task-workflows.md` for pre-configured templates
+- **Advanced techniques:** Load `advanced-techniques.md` for VS-CoT, VS-Multi, refinement
+- **Tool integration:** Load `tool-integration.md` for file operations, batch processing
+- **Troubleshooting:** Load `troubleshooting.md` if encountering issues