npm - @xaviele/ag-kit - Versions diffs - 1.0.0 - Mend

@xaviele/ag-kit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (235) hide show

package/template/.agent/skills/claudekit-ai-multimodal/SKILL.md ADDED Viewed

@@ -0,0 +1,353 @@
+---
+name: "ai-multimodal"
+description: "Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens."
+version: 1.0.0
+category: build
+---
+# AI Multimodal Processing Skill
+Process audio, images, videos, documents, and generate images using Google Gemini's multimodal API. Unified interface for all multimedia content understanding and generation.
+## Core Capabilities
+### Audio Processing
+- Transcription with timestamps (up to 9.5 hours)
+- Audio summarization and analysis
+- Speech understanding and speaker identification
+- Music and environmental sound analysis
+- Text-to-speech generation with controllable voice
+### Image Understanding
+- Image captioning and description
+- Object detection with bounding boxes (2.0+)
+- Pixel-level segmentation (2.5+)
+- Visual question answering
+- Multi-image comparison (up to 3,600 images)
+- OCR and text extraction
+### Video Analysis
+- Scene detection and summarization
+- Video Q&A with temporal understanding
+- Transcription with visual descriptions
+- YouTube URL support
+- Long video processing (up to 6 hours)
+- Frame-level analysis
+### Document Extraction
+- Native PDF vision processing (up to 1,000 pages)
+- Table and form extraction
+- Chart and diagram analysis
+- Multi-page document understanding
+- Structured data output (JSON schema)
+- Format conversion (PDF to HTML/JSON)
+### Image Generation
+- Text-to-image generation
+- Image editing and modification
+- Multi-image composition (up to 3 images)
+- Iterative refinement
+- Multiple aspect ratios (1:1, 16:9, 9:16, 4:3, 3:4)
+- Controllable style and quality
+## Capability Matrix
+| Task | Audio | Image | Video | Document | Generation |
+|------|:-----:|:-----:|:-----:|:--------:|:----------:|
+| Transcription | ✓ | - | ✓ | - | - |
+| Summarization | ✓ | ✓ | ✓ | ✓ | - |
+| Q&A | ✓ | ✓ | ✓ | ✓ | - |
+| Object Detection | - | ✓ | ✓ | - | - |
+| Text Extraction | - | ✓ | - | ✓ | - |
+| Structured Output | ✓ | ✓ | ✓ | ✓ | - |
+| Creation | TTS | - | - | - | ✓ |
+| Timestamps | ✓ | - | ✓ | - | - |
+| Segmentation | - | ✓ | - | - | - |
+## Model Selection Guide
+### Gemini 2.5 Series (Recommended)
+- **gemini-2.5-pro**: Highest quality, all features, 1M-2M context
+- **gemini-2.5-flash**: Best balance, all features, 1M-2M context
+- **gemini-2.5-flash-lite**: Lightweight, segmentation support
+- **gemini-2.5-flash-image**: Image generation only
+### Gemini 2.0 Series
+- **gemini-2.0-flash**: Fast processing, object detection
+- **gemini-2.0-flash-lite**: Lightweight option
+### Feature Requirements
+- **Segmentation**: Requires 2.5+ models
+- **Object Detection**: Requires 2.0+ models
+- **Multi-video**: Requires 2.5+ models
+- **Image Generation**: Requires flash-image model
+### Context Windows
+- **2M tokens**: ~6 hours video (low-res) or ~2 hours (default)
+- **1M tokens**: ~3 hours video (low-res) or ~1 hour (default)
+- **Audio**: 32 tokens/second (1 min = 1,920 tokens)
+- **PDF**: 258 tokens/page (fixed)
+- **Image**: 258-1,548 tokens based on size
+## Quick Start
+### Prerequisites
+**API Key Setup**: Supports both Google AI Studio and Vertex AI.
+The skill checks for `GEMINI_API_KEY` in this order:
+1. Process environment: `export GEMINI_API_KEY="your-key"`
+2. Project root: `.env`
+3. `.claude/.env`
+4. `.claude/skills/.env`
+5. `.claude/skills/ai-multimodal/.env`
+**Get API key**: https://aistudio.google.com/apikey
+**For Vertex AI**:
+```bash
+export GEMINI_USE_VERTEX=true
+export VERTEX_PROJECT_ID=your-gcp-project-id
+export VERTEX_LOCATION=us-central1  # Optional
+```
+**Install SDK**:
+```bash
+pip install google-genai python-dotenv pillow
+```
+### Common Patterns
+**Transcribe Audio**:
+```bash
+python scripts/gemini_batch_process.py \
+  --files audio.mp3 \
+  --task transcribe \
+  --model gemini-2.5-flash
+```
+**Analyze Image**:
+```bash
+python scripts/gemini_batch_process.py \
+  --files image.jpg \
+  --task analyze \
+  --prompt "Describe this image" \
+  --output docs/assets/<output-name>.md \
+  --model gemini-2.5-flash
+```
+**Process Video**:
+```bash
+python scripts/gemini_batch_process.py \
+  --files video.mp4 \
+  --task analyze \
+  --prompt "Summarize key points with timestamps" \
+  --output docs/assets/<output-name>.md \
+  --model gemini-2.5-flash
+```
+**Extract from PDF**:
+```bash
+python scripts/gemini_batch_process.py \
+  --files document.pdf \
+  --task extract \
+  --prompt "Extract table data as JSON" \
+  --output docs/assets/<output-name>.md \
+  --format json
+```
+**Generate Image**:
+```bash
+python scripts/gemini_batch_process.py \
+  --task generate \
+  --prompt "A futuristic city at sunset" \
+  --output docs/assets/<output-file-name> \
+  --model gemini-2.5-flash-image \
+  --aspect-ratio 16:9
+```
+**Optimize Media**:
+```bash
+# Prepare large video for processing
+python scripts/media_optimizer.py \
+  --input large-video.mp4 \
+  --output docs/assets/<output-file-name> \
+  --target-size 100MB
+# Batch optimize multiple files
+python scripts/media_optimizer.py \
+  --input-dir ./videos \
+  --output-dir docs/assets/optimized \
+  --quality 85
+```
+**Convert Documents to Markdown**:
+```bash
+# Convert to PDF
+python scripts/document_converter.py \
+  --input document.docx \
+  --output docs/assets/document.md
+# Extract pages
+python scripts/document_converter.py \
+  --input large.pdf \
+  --output docs/assets/chapter1.md \
+  --pages 1-20
+```
+## Supported Formats
+### Audio
+- WAV, MP3, AAC, FLAC, OGG Vorbis, AIFF
+- Max 9.5 hours per request
+- Auto-downsampled to 16 Kbps mono
+### Images
+- PNG, JPEG, WEBP, HEIC, HEIF
+- Max 3,600 images per request
+- Resolution: ≤384px = 258 tokens, larger = tiled
+### Video
+- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP
+- Max 6 hours (low-res) or 2 hours (default)
+- YouTube URLs supported (public only)
+### Documents
+- PDF only for vision processing
+- Max 1,000 pages
+- TXT, HTML, Markdown supported (text-only)
+### Size Limits
+- **Inline**: <20MB total request
+- **File API**: 2GB per file, 20GB project quota
+- **Retention**: 48 hours auto-delete
+## Reference Navigation
+For detailed implementation guidance, see:
+### Audio Processing
+- `references/audio-processing.md` - Transcription, analysis, TTS
+  - Timestamp handling and segment analysis
+  - Multi-speaker identification
+  - Non-speech audio analysis
+  - Text-to-speech generation
+### Image Understanding
+- `references/vision-understanding.md` - Captioning, detection, OCR
+  - Object detection and localization
+  - Pixel-level segmentation
+  - Visual question answering
+  - Multi-image comparison
+### Video Analysis
+- `references/video-analysis.md` - Scene detection, temporal understanding
+  - YouTube URL processing
+  - Timestamp-based queries
+  - Video clipping and FPS control
+  - Long video optimization
+### Document Extraction
+- `references/document-extraction.md` - PDF processing, structured output
+  - Table and form extraction
+  - Chart and diagram analysis
+  - JSON schema validation
+  - Multi-page handling
+### Image Generation
+- `references/image-generation.md` - Text-to-image, editing
+  - Prompt engineering strategies
+  - Image editing and composition
+  - Aspect ratio selection
+  - Safety settings
+## Cost Optimization
+### Token Costs
+**Input Pricing**:
+- Gemini 2.5 Flash: $1.00/1M input, $0.10/1M output
+- Gemini 2.5 Pro: $3.00/1M input, $12.00/1M output
+- Gemini 1.5 Flash: $0.70/1M input, $0.175/1M output
+**Token Rates**:
+- Audio: 32 tokens/second (1 min = 1,920 tokens)
+- Video: ~300 tokens/second (default) or ~100 (low-res)
+- PDF: 258 tokens/page (fixed)
+- Image: 258-1,548 tokens based on size
+**TTS Pricing**:
+- Flash TTS: $10/1M tokens
+- Pro TTS: $20/1M tokens
+### Best Practices
+1. Use `gemini-2.5-flash` for most tasks (best price/performance)
+2. Use File API for files >20MB or repeated queries
+3. Optimize media before upload (see `media_optimizer.py`)
+4. Process specific segments instead of full videos
+5. Use lower FPS for static content
+6. Implement context caching for repeated queries
+7. Batch process multiple files in parallel
+## Rate Limits
+**Free Tier**:
+- 10-15 RPM (requests per minute)
+- 1M-4M TPM (tokens per minute)
+- 1,500 RPD (requests per day)
+**YouTube Limits**:
+- Free tier: 8 hours/day
+- Paid tier: No length limits
+- Public videos only
+**Storage Limits**:
+- 20GB per project
+- 2GB per file
+- 48-hour retention
+## Error Handling
+Common errors and solutions:
+- **400**: Invalid format/size - validate before upload
+- **401**: Invalid API key - check configuration
+- **403**: Permission denied - verify API key restrictions
+- **404**: File not found - ensure file uploaded and active
+- **429**: Rate limit exceeded - implement exponential backoff
+- **500**: Server error - retry with backoff
+## Scripts Overview
+All scripts support unified API key detection and error handling:
+**gemini_batch_process.py**: Batch process multiple media files
+- Supports all modalities (audio, image, video, PDF)
+- Progress tracking and error recovery
+- Output formats: JSON, Markdown, CSV
+- Rate limiting and retry logic
+- Dry-run mode
+**media_optimizer.py**: Prepare media for Gemini API
+- Compress videos/audio for size limits
+- Resize images appropriately
+- Split long videos into chunks
+- Format conversion
+- Quality vs size optimization
+**document_converter.py**: Convert documents to PDF
+- Convert DOCX, XLSX, PPTX to PDF
+- Extract page ranges
+- Optimize PDFs for Gemini
+- Extract images from PDFs
+- Batch conversion support
+Run any script with `--help` for detailed usage.
+## Resources
+- [Audio API Docs](https://ai.google.dev/gemini-api/docs/audio)
+- [Image API Docs](https://ai.google.dev/gemini-api/docs/image-understanding)
+- [Video API Docs](https://ai.google.dev/gemini-api/docs/video-understanding)
+- [Document API Docs](https://ai.google.dev/gemini-api/docs/document-processing)
+- [Image Gen Docs](https://ai.google.dev/gemini-api/docs/image-generation)
+- [Get API Key](https://aistudio.google.com/apikey)
+- [Pricing](https://ai.google.dev/pricing)

package/template/.agent/skills/clean-code/SKILL.md ADDED Viewed

@@ -0,0 +1,201 @@
+---
+name: clean-code
+description: Pragmatic coding standards - concise, direct, no over-engineering, no unnecessary comments
+allowed-tools: Read, Write, Edit
+version: 2.0
+priority: CRITICAL
+---
+# Clean Code - Pragmatic AI Coding Standards
+> **CRITICAL SKILL** - Be **concise, direct, and solution-focused**.
+---
+## Core Principles
+| Principle | Rule |
+|-----------|------|
+| **SRP** | Single Responsibility - each function/class does ONE thing |
+| **DRY** | Don't Repeat Yourself - extract duplicates, reuse |
+| **KISS** | Keep It Simple - simplest solution that works |
+| **YAGNI** | You Aren't Gonna Need It - don't build unused features |
+| **Boy Scout** | Leave code cleaner than you found it |
+---
+## Naming Rules
+| Element | Convention |
+|---------|------------|
+| **Variables** | Reveal intent: `userCount` not `n` |
+| **Functions** | Verb + noun: `getUserById()` not `user()` |
+| **Booleans** | Question form: `isActive`, `hasPermission`, `canEdit` |
+| **Constants** | SCREAMING_SNAKE: `MAX_RETRY_COUNT` |
+> **Rule:** If you need a comment to explain a name, rename it.
+---
+## Function Rules
+| Rule | Description |
+|------|-------------|
+| **Small** | Max 20 lines, ideally 5-10 |
+| **One Thing** | Does one thing, does it well |
+| **One Level** | One level of abstraction per function |
+| **Few Args** | Max 3 arguments, prefer 0-2 |
+| **No Side Effects** | Don't mutate inputs unexpectedly |
+---
+## Code Structure
+| Pattern | Apply |
+|---------|-------|
+| **Guard Clauses** | Early returns for edge cases |
+| **Flat > Nested** | Avoid deep nesting (max 2 levels) |
+| **Composition** | Small functions composed together |
+| **Colocation** | Keep related code close |
+---
+## AI Coding Style
+| Situation | Action |
+|-----------|--------|
+| User asks for feature | Write it directly |
+| User reports bug | Fix it, don't explain |
+| No clear requirement | Ask, don't assume |
+---
+## Anti-Patterns (DON'T)
+| ❌ Pattern | ✅ Fix |
+|-----------|-------|
+| Comment every line | Delete obvious comments |
+| Helper for one-liner | Inline the code |
+| Factory for 2 objects | Direct instantiation |
+| utils.ts with 1 function | Put code where used |
+| "First we import..." | Just write code |
+| Deep nesting | Guard clauses |
+| Magic numbers | Named constants |
+| God functions | Split by responsibility |
+---
+## 🔴 Before Editing ANY File (THINK FIRST!)
+**Before changing a file, ask yourself:**
+| Question | Why |
+|----------|-----|
+| **What imports this file?** | They might break |
+| **What does this file import?** | Interface changes |
+| **What tests cover this?** | Tests might fail |
+| **Is this a shared component?** | Multiple places affected |
+**Quick Check:**
+```
+File to edit: UserService.ts
+└── Who imports this? → UserController.ts, AuthController.ts
+└── Do they need changes too? → Check function signatures
+```
+> 🔴 **Rule:** Edit the file + all dependent files in the SAME task.
+> 🔴 **Never leave broken imports or missing updates.**
+---
+## Summary
+| Do | Don't |
+|----|-------|
+| Write code directly | Write tutorials |
+| Let code self-document | Add obvious comments |
+| Fix bugs immediately | Explain the fix first |
+| Inline small things | Create unnecessary files |
+| Name things clearly | Use abbreviations |
+| Keep functions small | Write 100+ line functions |
+> **Remember: The user wants working code, not a programming lesson.**
+---
+## 🔴 Self-Check Before Completing (MANDATORY)
+**Before saying "task complete", verify:**
+| Check | Question |
+|-------|----------|
+| ✅ **Goal met?** | Did I do exactly what user asked? |
+| ✅ **Files edited?** | Did I modify all necessary files? |
+| ✅ **Code works?** | Did I test/verify the change? |
+| ✅ **No errors?** | Lint and TypeScript pass? |
+| ✅ **Nothing forgotten?** | Any edge cases missed? |
+> 🔴 **Rule:** If ANY check fails, fix it before completing.
+---
+## Verification Scripts (MANDATORY)
+> 🔴 **CRITICAL:** Each agent runs ONLY their own skill's scripts after completing work.
+### Agent → Script Mapping
+| Agent | Script | Command |
+|-------|--------|---------|
+| **frontend-specialist** | UX Audit | `python .agent/skills/frontend-design/scripts/ux_audit.py .` |
+| **frontend-specialist** | A11y Check | `python .agent/skills/frontend-design/scripts/accessibility_checker.py .` |
+| **backend-specialist** | API Validator | `python .agent/skills/api-patterns/scripts/api_validator.py .` |
+| **mobile-developer** | Mobile Audit | `python .agent/skills/mobile-design/scripts/mobile_audit.py .` |
+| **database-architect** | Schema Validate | `python .agent/skills/database-design/scripts/schema_validator.py .` |
+| **security-auditor** | Security Scan | `python .agent/skills/vulnerability-scanner/scripts/security_scan.py .` |
+| **seo-specialist** | SEO Check | `python .agent/skills/seo-fundamentals/scripts/seo_checker.py .` |
+| **seo-specialist** | GEO Check | `python .agent/skills/geo-fundamentals/scripts/geo_checker.py .` |
+| **performance-optimizer** | Lighthouse | `python .agent/skills/performance-profiling/scripts/lighthouse_audit.py <url>` |
+| **test-engineer** | Test Runner | `python .agent/skills/testing-patterns/scripts/test_runner.py .` |
+| **test-engineer** | Playwright | `python .agent/skills/webapp-testing/scripts/playwright_runner.py <url>` |
+| **Any agent** | Lint Check | `python .agent/skills/lint-and-validate/scripts/lint_runner.py .` |
+| **Any agent** | Type Coverage | `python .agent/skills/lint-and-validate/scripts/type_coverage.py .` |
+| **Any agent** | i18n Check | `python .agent/skills/i18n-localization/scripts/i18n_checker.py .` |
+> ❌ **WRONG:** `test-engineer` running `ux_audit.py`
+> ✅ **CORRECT:** `frontend-specialist` running `ux_audit.py`
+---
+### 🔴 Script Output Handling (READ → SUMMARIZE → ASK)
+**When running a validation script, you MUST:**
+1. **Run the script** and capture ALL output
+2. **Parse the output** - identify errors, warnings, and passes
+3. **Summarize to user** in this format:
+```markdown
+## Script Results: [script_name.py]
+### ❌ Errors Found (X items)
+- [File:Line] Error description 1
+- [File:Line] Error description 2
+### ⚠️ Warnings (Y items)
+- [File:Line] Warning description
+### ✅ Passed (Z items)
+- Check 1 passed
+- Check 2 passed
+**Should I fix the X errors?**
+```
+4. **Wait for user confirmation** before fixing
+5. **After fixing** → Re-run script to confirm
+> 🔴 **VIOLATION:** Running script and ignoring output = FAILED task.
+> 🔴 **VIOLATION:** Auto-fixing without asking = Not allowed.
+> 🔴 **Rule:** Always READ output → SUMMARIZE → ASK → then fix.

package/template/.agent/skills/code-review-checklist/SKILL.md ADDED Viewed

@@ -0,0 +1,109 @@
+---
+name: code-review-checklist
+description: Code review guidelines covering code quality, security, and best practices.
+allowed-tools: Read, Glob, Grep
+---
+# Code Review Checklist
+## Quick Review Checklist
+### Correctness
+- [ ] Code does what it's supposed to do
+- [ ] Edge cases handled
+- [ ] Error handling in place
+- [ ] No obvious bugs
+### Security
+- [ ] Input validated and sanitized
+- [ ] No SQL/NoSQL injection vulnerabilities
+- [ ] No XSS or CSRF vulnerabilities
+- [ ] No hardcoded secrets or sensitive credentials
+- [ ] **AI-Specific:** Protection against Prompt Injection (if applicable)
+- [ ] **AI-Specific:** Outputs are sanitized before being used in critical sinks
+### Performance
+- [ ] No N+1 queries
+- [ ] No unnecessary loops
+- [ ] Appropriate caching
+- [ ] Bundle size impact considered
+### Code Quality
+- [ ] Clear naming
+- [ ] DRY - no duplicate code
+- [ ] SOLID principles followed
+- [ ] Appropriate abstraction level
+### Testing
+- [ ] Unit tests for new code
+- [ ] Edge cases tested
+- [ ] Tests readable and maintainable
+### Documentation
+- [ ] Complex logic commented
+- [ ] Public APIs documented
+- [ ] README updated if needed
+## AI & LLM Review Patterns (2025)
+### Logic & Hallucinations
+- [ ] **Chain of Thought:** Does the logic follow a verifiable path?
+- [ ] **Edge Cases:** Did the AI account for empty states, timeouts, and partial failures?
+- [ ] **External State:** Is the code making safe assumptions about file systems or networks?
+### Prompt Engineering Review
+```markdown
+// ❌ Vague prompt in code
+const response = await ai.generate(userInput);
+// ✅ Structured & Safe prompt
+const response = await ai.generate({
+  system: "You are a specialized parser...",
+  input: sanitize(userInput),
+  schema: ResponseSchema
+});
+```
+## Anti-Patterns to Flag
+```typescript
+// ❌ Magic numbers
+if (status === 3) { ... }
+// ✅ Named constants
+if (status === Status.ACTIVE) { ... }
+// ❌ Deep nesting
+if (a) { if (b) { if (c) { ... } } }
+// ✅ Early returns
+if (!a) return;
+if (!b) return;
+if (!c) return;
+// do work
+// ❌ Long functions (100+ lines)
+// ✅ Small, focused functions
+// ❌ any type
+const data: any = ...
+// ✅ Proper types
+const data: UserData = ...
+```
+## Review Comments Guide
+```
+// Blocking issues use 🔴
+🔴 BLOCKING: SQL injection vulnerability here
+// Important suggestions use 🟡
+🟡 SUGGESTION: Consider using useMemo for performance
+// Minor nits use 🟢
+🟢 NIT: Prefer const over let for immutable variable
+// Questions use ❓
+❓ QUESTION: What happens if user is null here?
+```

package/template/.agent/skills/database-design/SKILL.md ADDED Viewed

@@ -0,0 +1,52 @@
+---
+name: database-design
+description: Database design principles and decision-making. Schema design, indexing strategy, ORM selection, serverless databases.
+allowed-tools: Read, Write, Edit, Glob, Grep
+---
+# Database Design
+> **Learn to THINK, not copy SQL patterns.**
+## 🎯 Selective Reading Rule
+**Read ONLY files relevant to the request!** Check the content map, find what you need.
+| File | Description | When to Read |
+|------|-------------|--------------|
+| `database-selection.md` | PostgreSQL vs Neon vs Turso vs SQLite | Choosing database |
+| `orm-selection.md` | Drizzle vs Prisma vs Kysely | Choosing ORM |
+| `schema-design.md` | Normalization, PKs, relationships | Designing schema |
+| `indexing.md` | Index types, composite indexes | Performance tuning |
+| `optimization.md` | N+1, EXPLAIN ANALYZE | Query optimization |
+| `migrations.md` | Safe migrations, serverless DBs | Schema changes |
+---
+## ⚠️ Core Principle
+- ASK user for database preferences when unclear
+- Choose database/ORM based on CONTEXT
+- Don't default to PostgreSQL for everything
+---
+## Decision Checklist
+Before designing schema:
+- [ ] Asked user about database preference?
+- [ ] Chosen database for THIS context?
+- [ ] Considered deployment environment?
+- [ ] Planned index strategy?
+- [ ] Defined relationship types?
+---
+## Anti-Patterns
+❌ Default to PostgreSQL for simple apps (SQLite may suffice)
+❌ Skip indexing
+❌ Use SELECT * in production
+❌ Store JSON when structured data is better
+❌ Ignore N+1 queries