npm - gsd-trae - Versions diffs - 1.0.1 → 1.0.2 - Mend

gsd-trae 1.0.1 → 1.0.2

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their public registries.

Files changed (761)

package/refs/vbenchmark/README.md DELETED

@@ -1,354 +0,0 @@
-<p align="center">
-  <h1 align="center">🚀 VibeCodingBench</h1>
-  <p align="center">
-    <strong>The benchmark that measures what AI coding agents actually do in production</strong>
-  </p>
-  <p align="center">
-    <a href="#why-vibecodingbench">Why</a> •
-    <a href="#quick-start">Quick Start</a> •
-    <a href="#task-categories">Tasks</a> •
-    <a href="#evaluation">Evaluation</a> •
-    <a href="#leaderboard">Leaderboard</a> •
-    <a href="#contributing">Contributing</a>
-  </p>
-  <p align="center">
-    <img src="https://img.shields.io/badge/tasks-180-blue" alt="Tasks">
-    <img src="https://img.shields.io/badge/models-15-green" alt="Models">
-    <img src="https://img.shields.io/badge/languages-10-orange" alt="Languages">
-    <img src="https://img.shields.io/badge/version-1.0.0-brightgreen" alt="Version">
-  </p>
-</p>
----
-## Why VibeCodingBench?
-**Existing benchmarks are disconnected from reality.** See our [full thesis](docs/THESIS.md) for detailed analysis.
-| Benchmark | Focus | Real-World Signal | Limitation |
-|-----------|-------|-------------------|------------|
-| HumanEval | Algorithmic puzzles | ❌ Low | Not production code |
-| SWE-bench | Bug fixes in 12 repos | ⚠️ Medium | [63% suspicious patches](https://runloop.ai/blog/swe-bench-deep-dive-unmasking-the-limitations-of-a-popular-benchmark) |
-| SWE-bench Pro | Multi-file tasks | ⚠️ Medium | [70% → 23% performance drop](https://scale.com/leaderboard/swe_bench_pro_public) |
-| **VibeCodingBench** | Full-stack features | ✅ **High** | Production-aligned tasks |
-### The Evidence
-**Developer Time Distribution** ([Sonar Research](https://www.sonarsource.com/blog/how-much-time-do-developers-spend-actually-writing-code/)):
-- Writing new code: 32% | Code maintenance: 19% | Testing: 12%
-- Developers code only **52 minutes/day** on average
-**The Boilerplate Burden** ([GitHub Octoverse 2025](https://github.blog/news-insights/octoverse/)):
-- 2.4M repos use Notebooks (+75% YoY)
-- 1.9M repos use Dockerfiles (+120% YoY)
-- Developers need help with **repetitive patterns**: auth, CRUD, integrations
-**SWE-EVO Exposes the Gap** ([arxiv:2512.18470](https://arxiv.org/abs/2512.18470)):
-- Best models: 65% on simple fixes → **only 21% on code evolution**
-- "Current AI agents struggle with comprehensive planning and execution"
-**Quality Beyond Pass Rate** ([Qodo 2025](https://www.qodo.ai/reports/state-of-ai-code-quality/)):
-- "Claude Sonnet 4 averaged **2.11 issues per passing task**"
-- Pass rate alone hides production risks
-**Developer Frustration** ([Stack Overflow 2025](https://survey.stackoverflow.co/2025/)):
-- 66% cite "AI solutions almost right, but not quite" as top frustration
-- 45% say "debugging AI code is more time-consuming"
-## Quick Start
-### From Source
-```bash
-git clone https://github.com/alt-research/vibe-coding-benchmark-public.git
-cd coding-model-benchmark
-npm install
-npm run build
-# List tasks
-node packages/cli/dist/index.js list
-# Run a task with mock agent
-node packages/cli/dist/index.js run saas-core/auth/supabase-oauth --agent mock
-# Run with real agent (requires API key)
-export ANTHROPIC_API_KEY=your_key
-node packages/cli/dist/index.js run saas-core/auth/supabase-oauth --agent claude
-# Run full evaluation across agents
-node packages/cli/dist/index.js eval --agents claude,glm,minimax
-# Watch live execution
-node packages/cli/dist/index.js run <task-id> --agent claude --live
-```
-## Task Categories
-| Category | Weight | Tasks | Languages | Examples |
-|----------|--------|-------|-----------|----------|
-| **SaaS Core** | 25% | 20 | TS, Go, Python, Java, Rust | `supabase-oauth`, `jwt-refresh-tokens`, `rbac-permissions` |
-| **Glue Code** | 20% | 20 | Python, Go, TS, Java, Rust | `csv-normalizer`, `kafka-producer`, `cdc-pipeline` |
-| **AI Integration** | 20% | 20 | Python, TS, Go | `pdf-qa`, `research-agent`, `semantic-search` |
-| **Frontend** | 15% | 20 | React, Vue, Svelte, RN | `landing-page`, `data-grid`, `collaborative-editor` |
-| **API Integrations** | 10% | 20 | TS, Go, Python, Java | `checkout-session`, `twilio-sms`, `saml-sso` |
-| **Code Evolution** | 10% | 20 | TS, Python, Go, Kotlin | `flask-to-fastapi`, `java-to-kotlin`, `secrets-rotation` |
-**Total: 180 tasks** across **10 languages** (TypeScript, Python, Go, Java, Kotlin, Rust, C#, React, Vue, Svelte)
-### Language Distribution
-Based on [GitHub Octoverse 2025](https://github.blog/news-insights/octoverse/) and [Stack Overflow Developer Survey 2025](https://survey.stackoverflow.co/2025/):
-| Language | % of Tasks | Rationale |
-|----------|------------|-----------|
-| TypeScript/JavaScript | 40% | #1 on GitHub, dominant in web dev |
-| Python | 25% | #2 on GitHub, AI/ML leader |
-| Go | 15% | Rising for cloud-native, microservices |
-| Java/Kotlin | 10% | Enterprise, Android development |
-| Rust | 5% | Systems programming, performance-critical |
-| C# | 5% | Enterprise, game development |
-### Task Structure
-Each task is a self-contained directory:
-```
-tasks/saas-core/auth/supabase-oauth/
-├── task.yaml           # Metadata, constraints
-├── PROMPT.md           # Instructions for the agent
-├── tests/              # Evaluation tests
-│   └── auth.test.ts    # Playwright E2E tests
-├── docker-compose.yaml # Services (DB, mock APIs)
-└── golden/             # Reference implementation (optional)
-```
-**Hot-reload support**: Add new tasks while the benchmark is running!
-## Evaluation
-### Multi-Dimensional Scoring
-We measure what senior engineers care about:
-| Dimension | Weight | Method | Why It Matters |
-|-----------|--------|--------|----------------|
-| **Functional** | 40% | Playwright E2E, Pass@k | Does it work? |
-| **Visual** | 20% | Pixel diff vs reference | Does it look right? |
-| **Quality** | 20% | ESLint + Semgrep + complexity | Is it maintainable? |
-| **Cost** | 10% | Tokens used, context pollution | Is it efficient? |
-| **Speed** | 10% | Wall-clock time, step count | Is it fast? |
-### Security Gate
-Any **Critical/High** vulnerability = **automatic fail**. We use Semgrep with OWASP rules.
-### The Scoring Formula
-```
-Final = (Functional × 0.4) + (Visual × 0.2) + (Quality × 0.2)
-        - (Cost Penalty) - (Speed Penalty)
-Security Fail → Final = 0
-```
-## Supported Agents
-| Agent | Model | Status | Config | Pricing (Input/Output per MTok) |
-|-------|-------|--------|--------|--------------------------------|
-| Claude | Haiku 4.5 | ✅ Supported | `ANTHROPIC_API_KEY` | $1.00 / $5.00 |
-| Claude | Opus 4.5 | ✅ Supported | `ANTHROPIC_API_KEY` | $5.00 / $25.00 |
-| Qwen | Qwen3-Max | ✅ Supported | `QWEN_API_KEY` | $1.20 / $6.00 |
-| GLM | GLM-4.7 | ✅ Supported | `GLM_API_KEY` | $0.60 / $2.20 |
-| MiniMax | M2.1 | ✅ Supported | `MINIMAX_API_KEY` | $0.30 / $1.20 |
-| OpenAI | GPT-5.2 | ✅ Supported | `OPENAI_API_KEY` | $1.75 / $14.00 |
-| DeepSeek | Chat-v3 | ✅ Supported | `DEEPSEEK_API_KEY` | $0.40 / $1.60 |
-| Gemini | 3-Flash Preview | ✅ Supported | `GOOGLE_API_KEY` | $0.50 / $3.00 |
-## Leaderboard
-```
-📈 LEADERBOARD (2026-01-27) - 180 tasks evaluated, 15 models
-╔══════╤══════════════════════╤═══════╤═══════════╤════════════╤════════════╤══════════════╤═════════════╗
-║ Rank │ Model                │ Final │ Pass Rate │ Total Cost │ Total Time │ Avg Time/Task│ Total Tokens║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #1   │ Claude Opus 4.5      │ 89.2% │ 100.0%    │ $12.31     │ 2h 12m     │ 44s          │ 648K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #2   │ Claude Haiku 4.5     │ 89.0% │ 99.4%     │ $3.03      │ 1h 5m      │ 22s          │ 798K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #3   │ Grok 4 Fast          │ 88.8% │ 98.9%     │ $0.21      │ 1h 57m     │ 70s          │ 520K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #4   │ OpenAI GPT-5.2       │ 88.8% │ 98.3%     │ $5.01      │ 1h 24m     │ 28s          │ 485K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #5   │ Qwen3 Max            │ 88.6% │ 100.0%    │ $5.42      │ 2h 15m     │ 45s          │ 949K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #6   │ Claude Sonnet 4.5    │ 88.6% │ 98.3%     │ $6.98      │ 2h 6m      │ 42s          │ 612K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #7   │ GLM 4-Plus           │ 88.2% │ 98.9%     │ $0.93      │ 4h 49m     │ 96s          │ 794K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #8   │ DeepSeek v3.2        │ 88.2% │ 98.3%     │ $0.50      │ 4h 29m     │ 90s          │ 543K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #9   │ Grok 4               │ 88.0% │ 97.8%     │ $5.47      │ 2h 5m      │ 75s          │ 480K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #10  │ MiniMax M2.1         │ 87.4% │ 99.4%     │ $2.40      │ 8h 15m     │ 165s         │ 2.78M       ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #11  │ Grok 4.1 Fast        │ 86.8% │ 97.2%     │ $0.24      │ 2h 27m     │ 89s          │ 580K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #12  │ Gemini 3 Pro Preview │ 85.8% │ 95.0%     │ $10.34     │ 1h 36m     │ 32s          │ 738K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #13  │ GLM-4.7              │ 83.9% │ 85.6%     │ $0.73      │ 2h 50m     │ 57s          │ 623K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #14  │ GLM 4.7 Flash        │ 83.8% │ 92.8%     │ $1.11      │ 2h 15m     │ 45s          │ 650K        ║
-╟──────┼──────────────────────┼───────┼───────────┼────────────┼────────────┼──────────────┼─────────────╢
-║ #15  │ Gemini 3 Flash       │ 83.4% │ 92.2%     │ $0.86      │ 1h 23m     │ 28s          │ 384K        ║
-╚══════╧══════════════════════╧═══════╧═══════════╧════════════╧════════════╧══════════════╧═════════════╝
-```
-### Pricing (OpenRouter 2026-01-27)
-| Model | Input $/M | Output $/M |
-|-------|-----------|------------|
-| Claude Opus 4.5 | $5.00 | $25.00 |
-| Claude Sonnet 4.5 | $3.00 | $15.00 |
-| Claude Haiku 4.5 | $1.00 | $5.00 |
-| Qwen3 Max | $1.20 | $6.00 |
-| OpenAI GPT-5.2 | $1.75 | $14.00 |
-| Grok 4 | $3.00 | $15.00 |
-| Grok 4 Fast | $0.20 | $0.50 |
-| Grok 4.1 Fast | $0.20 | $0.50 |
-| GLM 4-Plus/4.7 | $0.40 | $1.50 |
-| GLM 4.7 Flash | $0.07 | $0.40 |
-| DeepSeek v3.2 | $0.30 | $1.20 |
-| Gemini 3 Flash | $0.50 | $3.00 |
-| Gemini 3 Pro | $2.00 | $12.00 |
-| MiniMax M2.1 | $0.27 | $1.12 |
-### Detailed Metrics
-| Model | Functional | Quality | Cost/Task | Tokens/Task |
-|-------|------------|---------|-----------|-------------|
-| Claude Opus 4.5 | 85.0% | 80.0% | $0.0684 | 3,599 |
-| Claude Haiku 4.5 | 84.5% | 79.6% | $0.0168 | 4,435 |
-| Grok 4 Fast | 84.1% | 80.0% | $0.0012 | 2,889 |
-| Qwen3 Max | 85.0% | 80.0% | $0.0301 | 5,273 |
-| OpenAI GPT-5.2 | 83.6% | 79.6% | $0.0278 | 2,694 |
-| Claude Sonnet 4.5 | 83.6% | 80.0% | $0.0388 | 3,400 |
-| GLM 4-Plus | 84.1% | 80.0% | $0.0052 | 4,412 |
-| DeepSeek v3.2 | 83.6% | 80.0% | $0.0028 | 3,015 |
-| Grok 4 | 83.6% | 80.0% | $0.0304 | 2,667 |
-| MiniMax M2.1 | 84.5% | 80.0% | $0.0133 | 15,436 |
-| Grok 4.1 Fast | 82.6% | 78.7% | $0.0013 | 3,222 |
-| Gemini 3 Pro Preview | 80.8% | 77.3% | $0.0574 | 4,099 |
-| GLM-4.7 | 72.7% | 79.6% | $0.0041 | 3,464 |
-| GLM 4.7 Flash | 78.9% | 79.6% | $0.0062 | 3,611 |
-| Gemini 3 Flash | 78.4% | 75.1% | $0.0048 | 2,133 |
-**Live Dashboard**: https://vibecoding.llmbench.xyz
-## Contributing
-We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for details.
-### Adding a New Task
-1. **Create task directory**:
-   ```bash
-   mkdir -p tasks/<category>/<subcategory>/<task-name>
-   ```
-2. **Add task.yaml**:
-   ```yaml
-   name: My New Task
-   category: saas-core
-   difficulty: medium
-   stack: nextjs-supabase
-   tags: [typescript, auth]
-   ```
-3. **Write PROMPT.md** with clear requirements
-4. **Add tests** (Playwright for web, pytest for Python)
-5. **Submit PR** using the template
-## Architecture
-```
-vibecodingbench/
-├── packages/
-│   ├── cli/              # CLI tool
-│   ├── evaluator/        # Scoring engine
-│   └── leaderboard/      # Web dashboard
-├── tasks/                # 120 benchmark tasks
-│   ├── saas-core/        # 20 tasks
-│   ├── glue-code/        # 20 tasks
-│   ├── ai-integration/   # 20 tasks
-│   ├── frontend/         # 20 tasks
-│   ├── api-integrations/ # 20 tasks
-│   └── code-evolution/   # 20 tasks
-├── templates/            # Starter codebases
-│   ├── nextjs-supabase/
-│   ├── fastapi-postgres/
-│   ├── go-fiber/
-│   └── rust-axum/
-└── docker/               # Base images
-```
-## Deployment
-### Self-Hosted (Docker)
-```bash
-# Build and run production stack
-./scripts/deploy.sh docker
-# Or in background
-./scripts/deploy.sh docker --detach
-# Services available at:
-# - Dashboard: http://localhost:3000
-# - API: http://localhost:3001
-```
-### Fly.io
-```bash
-cd packages/leaderboard
-fly launch --config fly.toml
-fly deploy
-```
-## Environment Setup
-```bash
-# Required
-export ANTHROPIC_API_KEY=...            # Claude (Anthropic)
-export OPENAI_API_KEY=...               # OpenAI
-export GOOGLE_API_KEY=...               # Gemini (Google AI)
-# Optional
-export GLM_API_KEY=...                  # GLM (Zhipu AI)
-export MINIMAX_API_KEY=...              # MiniMax
-export QWEN_API_KEY=...                 # Qwen (Alibaba DashScope)
-export DEEPSEEK_API_KEY=...             # DeepSeek
-```
-## Citation
-If you use VibeCodingBench in your research, please cite:
-```bibtex
-@software{vibecodingbench2025,
-  title = {VibeCodingBench: A Benchmark for AI Coding Agents on Real-World Developer Tasks},
-  year = {2025},
-  url = {https://github.com/alt-research/vibe-coding-benchmark-public}
-}
-```
----
-<p align="center">
-  <sub>Built with ❤️ by the open-source community</sub>
-</p>
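
Note on the removed README above: its scoring formula weights three dimensions, subtracts cost and speed penalties, and zeroes the result when the security gate fails. The arithmetic can be sketched in Python as below. This is a sketch, not the project's evaluator: the function name `final_score`, the 0-to-1 score scale, and the clamp at zero are assumptions for illustration, and because the README does not say how the two penalties are computed, they are passed in as plain numbers.

```python
def final_score(functional, visual, quality,
                cost_penalty=0.0, speed_penalty=0.0,
                security_fail=False):
    """Combine dimension scores (assumed to lie in [0, 1]) per the README formula."""
    if security_fail:
        # Security gate: a Critical/High finding forces Final = 0.
        return 0.0
    # Weights taken from the README: Functional 0.4, Visual 0.2, Quality 0.2.
    weighted = functional * 0.4 + visual * 0.2 + quality * 0.2
    # Clamping at zero is an assumption; the README does not specify it.
    return max(0.0, weighted - cost_penalty - speed_penalty)


if __name__ == "__main__":
    print(final_score(0.85, 0.9, 0.8, cost_penalty=0.02, speed_penalty=0.01))
    print(final_score(1.0, 1.0, 1.0, security_fail=True))
```

As written, the three weights sum to 0.8, so perfect scores with no penalties yield 0.8 under this reading of the formula.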

package/refs/vbenchmark/docker-compose.prod.yaml DELETED

@@ -1,35 +0,0 @@
-version: '3.8'
-services:
-  leaderboard:
-    build:
-      context: ./packages/leaderboard
-      dockerfile: Dockerfile
-    ports:
-      - "3001:3001"
-    environment:
-      - NODE_ENV=production
-      - PORT=3001
-      - DATABASE_URL=${DATABASE_URL:-}
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "wget", "-qO-", "http://localhost:3001/health"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-  dashboard:
-    build:
-      context: ./packages/dashboard
-      dockerfile: Dockerfile
-    ports:
-      - "3000:3000"
-    environment:
-      - VITE_API_URL=http://leaderboard:3001
-    depends_on:
-      - leaderboard
-    restart: unless-stopped
-networks:
-  default:
-    name: vibecodingbench

package/refs/vbenchmark/docker-compose.yaml DELETED

@@ -1,53 +0,0 @@
-version: '3.8'
-services:
-  postgres:
-    image: postgres:16-alpine
-    container_name: benchmark-postgres
-    environment:
-      POSTGRES_USER: benchmark
-      POSTGRES_PASSWORD: benchmark123
-      POSTGRES_DB: vibecodingbench
-    ports:
-      - "5432:5432"
-    volumes:
-      - postgres_data:/var/lib/postgresql/data
-    healthcheck:
-      test: ["CMD-SHELL", "pg_isready -U benchmark -d vibecodingbench"]
-      interval: 5s
-      timeout: 5s
-      retries: 5
-  leaderboard:
-    build:
-      context: ./packages/leaderboard
-      dockerfile: Dockerfile
-    ports:
-      - "3001:3001"
-    environment:
-      - NODE_ENV=production
-      - PORT=3001
-      - DATABASE_URL=postgresql://benchmark:benchmark123@postgres:5432/vibecodingbench
-    depends_on:
-      postgres:
-        condition: service_healthy
-    restart: unless-stopped
-  dashboard:
-    build:
-      context: ./packages/dashboard
-      dockerfile: Dockerfile
-    ports:
-      - "3000:3000"
-    environment:
-      - VITE_API_URL=http://leaderboard:3001
-    depends_on:
-      - leaderboard
-    restart: unless-stopped
-volumes:
-  postgres_data:
-networks:
-  default:
-    name: vibecodingbench
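
Note on the removed compose file above: the `DATABASE_URL` given to the `leaderboard` service has to agree with the `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` values set on the `postgres` service, and its host is the Compose service name `postgres`. A small Python sketch, using only the standard library, shows how the URL breaks down into those parts. The helper name `parse_database_url` is hypothetical and is not part of the package.

```python
from urllib.parse import urlsplit


def parse_database_url(url):
    """Split a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,   # the Compose service name, resolved on the shared network
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# URL copied from the removed docker-compose.yaml.
settings = parse_database_url(
    "postgresql://benchmark:benchmark123@postgres:5432/vibecodingbench"
)

if __name__ == "__main__":
    for key, value in settings.items():
        print(f"{key}: {value}")
```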

package/refs/vbenchmark/docs/TASK_EXPANSION_PLAN.md DELETED

@@ -1,211 +0,0 @@
-# VibeCodingBench Task Expansion Plan
-## Overview
-Expanding from 18 tasks to 120 tasks (20 per category) with multi-language support.
-## Language Distribution
-Based on GitHub Octoverse 2025 and Stack Overflow 2025:
-- **TypeScript/JavaScript**: 40% (most used on GitHub)
-- **Python**: 25% (dominant in AI/data)
-- **Go**: 15% (cloud-native, microservices)
-- **Java/Kotlin**: 10% (enterprise)
-- **Rust**: 5% (systems, performance)
-- **C#**: 5% (enterprise, game dev)
----
-## Category 1: saas-core (20 tasks)
-### Existing (6):
-1. auth/supabase-oauth (TypeScript)
-2. auth/mfa-totp (TypeScript)
-3. crud/dashboard-table (TypeScript)
-4. settings/user-preferences (TypeScript)
-5. realtime/websocket-chat (TypeScript)
-6. security/rate-limiter (TypeScript)
-### New (14):
-7. auth/jwt-refresh-tokens (Go) - Implement JWT with refresh token rotation
-8. auth/magic-link-email (Python/FastAPI) - Passwordless email authentication
-9. auth/rbac-permissions (Java/Spring) - Role-based access control system
-10. auth/session-management (Rust/Actix) - Secure session handling with Redis
-11. billing/stripe-subscriptions (TypeScript) - Subscription management with Stripe
-12. billing/usage-metering (Go) - Track and bill based on API usage
-13. billing/invoice-generation (Python) - Generate PDF invoices with line items
-14. multi-tenant/org-isolation (TypeScript) - Database-per-tenant isolation
-15. multi-tenant/subdomain-routing (Go) - Route requests by subdomain
-16. notifications/email-queue (Python) - Async email notification system
-17. notifications/push-notifications (TypeScript) - Web push with service workers
-18. notifications/in-app-alerts (Java/Spring) - Real-time in-app notifications
-19. audit/activity-logging (Go) - Comprehensive audit trail system
-20. search/full-text-search (TypeScript) - Elasticsearch integration for search
----
-## Category 2: glue-code (20 tasks)
-### Existing (3):
-1. data-transform/excel-to-json (Python)
-2. api-sync/rest-to-graphql (TypeScript)
-3. caching/redis-cache (TypeScript)
-### New (17):
-4. data-transform/csv-normalizer (Python) - Clean and normalize CSV data
-5. data-transform/json-to-xml (Go) - Bidirectional JSON/XML conversion
-6. data-transform/protobuf-converter (Rust) - Protocol buffer serialization
-7. data-transform/avro-schema-evolution (Java) - Handle Avro schema changes
-8. etl/database-sync (Python) - Sync data between PostgreSQL and MongoDB
-9. etl/s3-to-warehouse (Go) - Load S3 files into data warehouse
-10. etl/cdc-pipeline (TypeScript) - Change data capture with Debezium
-11. queue/rabbitmq-consumer (Python) - Reliable message queue processing
-12. queue/kafka-producer (Go) - High-throughput Kafka event publishing
-13. queue/sqs-batch-processor (TypeScript) - AWS SQS batch processing
-14. scheduler/cron-job-manager (Go) - Distributed cron job scheduling
-15. scheduler/delayed-tasks (Python) - Celery-based delayed task execution
-16. file-processing/image-resizer (Rust) - High-performance image processing
-17. file-processing/pdf-merger (Python) - Merge and manipulate PDFs
-18. file-processing/video-transcoder (Go) - FFmpeg-based video processing
-19. migration/database-versioning (TypeScript) - Schema migration system
-20. migration/data-backfill (Python) - Backfill data with progress tracking
----
-## Category 3: ai-integration (20 tasks)
-### Existing (2):
-1. structured-output/invoice-parser (Python)
-2. rag-chatbot/pdf-qa (Python)
-### New (18):
-3. structured-output/resume-parser (Python) - Extract structured data from resumes
-4. structured-output/receipt-scanner (TypeScript) - OCR + LLM receipt extraction
-5. structured-output/contract-analyzer (Python) - Legal document analysis
-6. rag-chatbot/code-assistant (TypeScript) - Codebase Q&A with RAG
-7. rag-chatbot/support-bot (Python) - Customer support with knowledge base
-8. rag-chatbot/doc-search (Go) - Multi-document semantic search
-9. agents/web-scraper-agent (Python) - Autonomous web data extraction
-10. agents/research-agent (TypeScript) - Multi-step research automation
-11. agents/code-review-agent (Python) - Automated PR review with LLM
-12. function-calling/api-orchestrator (TypeScript) - LLM-driven API calls
-13. function-calling/database-query (Python) - Natural language to SQL
-14. function-calling/calendar-assistant (TypeScript) - Schedule management agent
-15. embeddings/semantic-search (Python) - Vector similarity search
-16. embeddings/recommendation-engine (Go) - Content recommendations
-17. embeddings/duplicate-detection (Python) - Find similar documents
-18. fine-tuning/classification-model (Python) - Fine-tune for text classification
-19. multimodal/image-captioning (Python) - Generate image descriptions
-20. multimodal/chart-interpreter (TypeScript) - Extract data from chart images
----
-## Category 4: frontend (20 tasks)
-### Existing (3):
-1. figma-to-code/pricing-card (TypeScript)
-2. visualization/chart-dashboard (TypeScript)
-3. components/form-builder (TypeScript)
-### New (17):
-4. figma-to-code/landing-page (TypeScript/React) - Full landing page from design
-5. figma-to-code/dashboard-layout (TypeScript/Vue) - Admin dashboard UI
-6. figma-to-code/mobile-app-screen (TypeScript/React Native) - Mobile UI
-7. components/data-grid (TypeScript/React) - Advanced data grid with virtual scroll
-8. components/rich-text-editor (TypeScript/Vue) - WYSIWYG editor with plugins
-9. components/file-uploader (TypeScript/React) - Drag-drop with preview
-10. components/date-range-picker (TypeScript/Svelte) - Complex date selection
-11. visualization/realtime-charts (TypeScript/React) - Live updating charts
-12. visualization/map-dashboard (TypeScript/Vue) - Geographic data viz
-13. visualization/gantt-chart (TypeScript/React) - Project timeline view
-14. state-management/shopping-cart (TypeScript/React) - Complex cart with Redux
-15. state-management/collaborative-editor (TypeScript/Vue) - Real-time collab
-16. accessibility/screen-reader-nav (TypeScript/React) - WCAG compliant nav
-17. accessibility/keyboard-shortcuts (TypeScript/Vue) - Full keyboard support
-18. performance/infinite-scroll (TypeScript/React) - Virtualized infinite list
-19. performance/image-lazy-load (TypeScript/Svelte) - Optimized image loading
-20. animation/page-transitions (TypeScript/React) - Smooth route animations
----
-## Category 5: api-integrations (20 tasks)
-### Existing (2):
-1. stripe/payment-webhook (TypeScript)
-2. email/transactional (TypeScript)
-### New (18):
-3. stripe/checkout-session (Go) - Create Stripe checkout flows
-4. stripe/subscription-portal (TypeScript) - Customer billing portal
-5. payment/paypal-integration (Python) - PayPal payments and refunds
-6. payment/crypto-payments (TypeScript) - Accept cryptocurrency
-7. storage/s3-presigned-urls (Go) - Secure file uploads to S3
-8. storage/cloudinary-upload (TypeScript) - Image upload and transform
-9. storage/gcs-streaming (Python) - Stream large files to GCS
-10. auth-provider/oauth2-github (Go) - GitHub OAuth integration
-11. auth-provider/saml-sso (Java/Spring) - Enterprise SAML SSO
-12. auth-provider/okta-integration (TypeScript) - Okta user management
-13. communication/twilio-sms (Python) - SMS notifications
-14. communication/slack-bot (TypeScript) - Slack app with slash commands
-15. communication/discord-webhook (Go) - Discord notifications
-16. maps/google-maps-geocoding (TypeScript) - Address to coordinates
-17. maps/mapbox-directions (Python) - Route calculation
-18. analytics/segment-tracking (TypeScript) - Event tracking pipeline
-19. analytics/mixpanel-events (Go) - User behavior analytics
-20. social/twitter-api (Python) - Tweet posting and monitoring
----
-## Category 6: code-evolution (20 tasks)
-### Existing (2):
-1. legacy-migration/express-to-fastify (TypeScript)
-2. refactoring/class-to-hooks (TypeScript)
-### New (18):
-3. legacy-migration/callback-to-async (TypeScript) - Callback hell to async/await
-4. legacy-migration/jquery-to-react (TypeScript) - jQuery app to React
-5. legacy-migration/flask-to-fastapi (Python) - Flask to FastAPI migration
-6. legacy-migration/java-to-kotlin (Kotlin) - Java codebase to Kotlin
-7. legacy-migration/rest-to-grpc (Go) - REST API to gRPC
-8. refactoring/monolith-to-modules (TypeScript) - Extract modules from monolith
-9. refactoring/orm-migration (Python) - SQLAlchemy to async SQLModel
-10. refactoring/dependency-injection (Java/Spring) - Add DI to legacy code
-11. refactoring/error-handling (Go) - Standardize error handling
-12. testing/add-unit-tests (TypeScript) - Add tests to untested code
-13. testing/e2e-playwright (TypeScript) - Add E2E tests with Playwright
-14. testing/pytest-fixtures (Python) - Refactor tests with fixtures
-15. performance/query-optimization (Python) - Optimize slow DB queries
-16. performance/memory-leak-fix (TypeScript) - Fix memory leaks in Node.js
-17. performance/async-refactor (Python) - Sync to async for I/O bound
-18. security/sql-injection-fix (Python) - Fix SQL injection vulnerabilities
-19. security/xss-prevention (TypeScript) - Add XSS protection
-20. security/secrets-rotation (Go) - Implement secrets rotation
----
-## Implementation Priority
-### Phase 1 (High Priority - Common Tasks)
-- All auth tasks
-- All billing tasks
-- All payment integrations
-- RAG and agent tasks
-### Phase 2 (Medium Priority - Enterprise)
-- Multi-tenant tasks
-- SAML/SSO integrations
-- Audit logging
-- Migration tasks
-### Phase 3 (Lower Priority - Specialized)
-- Multimodal AI tasks
-- Advanced visualizations
-- Performance optimization tasks
----
-## Sources
-- [GitHub Octoverse 2025](https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/)
-- [Stack Overflow Developer Survey 2025](https://survey.stackoverflow.co/2025/)
-- [HackerRank Real-World Coding Challenges 2025](https://www.hackerrank.com/writing/design-real-world-coding-challenges-junior-backend-developer-screening-2025)
-- [LangChain State of Agent Engineering](https://www.langchain.com/state-of-agent-engineering)
-- [WorkOS Multi-Tenant Architecture Guide](https://workos.com/blog/developers-guide-saas-multi-tenant-architecture)
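
Note on the removed plan above: it gives a target language distribution in percentages and a target of 120 tasks. Assuming the percentages are meant to apply directly to that total (the plan does not say so explicitly), the implied per-language task counts can be worked out as in the Python sketch below. The names `LANGUAGE_SHARES`, `TOTAL_TASKS`, and `tasks_per_language` are hypothetical.

```python
# Percentages from the plan's "Language Distribution" section.
LANGUAGE_SHARES = {
    "TypeScript/JavaScript": 40,
    "Python": 25,
    "Go": 15,
    "Java/Kotlin": 10,
    "Rust": 5,
    "C#": 5,
}
TOTAL_TASKS = 120  # 6 categories x 20 tasks each, per the plan


def tasks_per_language(shares, total):
    """Convert integer percentage shares into task counts."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Floor division is exact for these inputs; other totals could leave a
    # remainder that would need to be distributed separately.
    return {lang: total * pct // 100 for lang, pct in shares.items()}


counts = tasks_per_language(LANGUAGE_SHARES, TOTAL_TASKS)

if __name__ == "__main__":
    for lang, n in counts.items():
        print(f"{lang}: {n}")
```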