@framers/agentos-skills-registry 0.12.0 → 0.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@framers/agentos-skills-registry",
3
- "version": "0.12.0",
3
+ "version": "0.14.0",
4
4
  "files": [
5
5
  "dist",
6
6
  "registry",
@@ -1,23 +1,153 @@
1
1
  ---
2
2
  name: multimodal-rag
3
- description: Index and search across text, images, audio, video, and PDFs
4
- version: 1.0.0
5
- tags: [rag, multimodal, image, audio, video, pdf, search]
6
- tools_required: [vision-pipeline]
3
+ version: '2.0.0'
4
+ description: Index and search across text, images, audio, video, and PDFs via the multimodal RAG pipeline and HTTP API.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: productivity
8
+ tags: [rag, multimodal, image, audio, video, pdf, search, indexing, memory]
9
+ requires_secrets: []
10
+ requires_tools: [vision-pipeline]
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F50D"
7
14
  ---
8
15
 
9
16
  # Multimodal RAG
10
17
 
11
- Index and retrieve content across all modalities -- text, images, audio, video, and PDFs. Images are described via vision AI, audio is transcribed, video frames are extracted, and PDFs are parsed. All content is embedded and searchable.
18
+ Use this skill when the user wants to index, search, or retrieve content across multiple modalities -- text, images, audio, video, and documents (PDF, DOCX, TXT, Markdown, CSV, JSON, XML). All non-text content is converted to a text representation (vision description, STT transcript, document parse) before embedding, so every modality is searchable with the same text query.
19
+
20
+ ## Architecture
21
+
22
+ ```
23
+ Image --> Vision LLM --> description --> embed --> vector store
24
+ Audio --> STT --> transcript --> embed --> vector store
25
+ Video --> ffmpeg (frames + audio) --> vision + STT --> vector store
26
+ PDF --> text extraction + chunking --> embed --> vector store
27
+ ```
28
+
29
+ When cognitive memory is enabled via `MultimodalMemoryBridge`, ingested content also creates memory traces so agents can recall multimodal content during conversation without an explicit search.
12
30
 
13
31
  ## Capabilities
14
- - **Image indexing**: Vision LLM describes -> embed -> search
15
- - **Audio indexing**: STT transcribes -> embed -> search
16
- - **Video indexing**: Frame extraction + audio transcription
17
- - **PDF indexing**: Text + embedded image extraction
18
- - **Cross-modal search**: Query returns results from all modalities
19
-
20
- ## Example
21
- "Find images related to quantum computing"
22
- "Search my audio recordings for mentions of the project deadline"
23
- "What does the diagram in page 3 of the PDF show?"
32
+
33
+ - **Image indexing**: Vision LLM describes the image; the description is embedded and searchable.
34
+ - **Audio indexing**: STT transcribes the audio; the transcript is chunked and searchable.
35
+ - **Video indexing**: Frame extraction (vision) + audio transcription (STT), both indexed.
36
+ - **Document indexing**: Text extracted and indexed from PDF, DOCX, TXT, Markdown, CSV, JSON, and XML.
37
+ - **Cross-modal search**: A single text query returns results from all modalities, ranked by relevance.
38
+ - **Query-by-image**: Upload an image to find similar indexed content.
39
+ - **Query-by-audio**: Upload audio to find related indexed content via transcript matching.
40
+
41
+ ## HTTP API Routes
42
+
43
+ All routes are mounted under `/api/agentos/rag/multimodal`. Ingestion routes accept `multipart/form-data`.
44
+
45
+ ### Ingest
46
+
47
+ | Method | Path | Field | Description |
48
+ |--------|------|-------|-------------|
49
+ | POST | `/images/ingest` | `image` | Ingest an image (max 15 MB). Vision LLM generates description. |
50
+ | POST | `/audio/ingest` | `audio` | Ingest audio (max 25 MB). STT generates transcript. |
51
+ | POST | `/documents/ingest` | `document` | Ingest a document (max 30 MB). Text extracted and chunked. |
52
+
53
+ Common form fields for all ingest routes:
54
+
55
+ | Field | Type | Description |
56
+ |-------|------|-------------|
57
+ | `collectionId` | string | Target collection (default: auto) |
58
+ | `assetId` | string | Optional custom ID for the asset |
59
+ | `category` | string | `conversation_memory`, `knowledge_base`, `user_notes`, `system`, `custom` |
60
+ | `tags` | string | Comma-separated or JSON array of tags |
61
+ | `metadata` | string | JSON object with arbitrary metadata |
62
+ | `storePayload` | boolean | Whether to store the raw binary (for later download) |
63
+ | `sourceUrl` | string | Original URL of the content |
64
+ | `textRepresentation` | string | Override auto-generated description/transcript |
65
+ | `userId` | string | Owner user ID |
66
+ | `agentId` | string | Owner agent ID |
67
+
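+ As a sketch, an image ingest call using `fetch` and `FormData` might look like this (the localhost base URL and `imageBytes` are placeholders; routes and field names follow the tables above):
+
+ ```typescript
+ // Sketch: ingest an image via the documented multipart route.
+ const form = new FormData();
+ form.append('image', new Blob([imageBytes], { type: 'image/png' }), 'photo.png');
+ form.append('collectionId', 'knowledge-base');
+ form.append('tags', 'product,catalog');
+ form.append('storePayload', 'true');
+
+ const res = await fetch('http://localhost:3000/api/agentos/rag/multimodal/images/ingest', {
+   method: 'POST',
+   body: form,
+ });
+ console.log(await res.json());
+ ```
+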
68
+ ### Query
69
+
70
+ | Method | Path | Body / Field | Description |
71
+ |--------|------|-------------|-------------|
72
+ | POST | `/query` | JSON body | Text query across all modalities |
73
+ | POST | `/images/query` | `image` field | Query by uploading an image |
74
+ | POST | `/audio/query` | `audio` field | Query by uploading audio |
75
+
76
+ Text query body:
77
+
78
+ ```json
79
+ {
80
+ "query": "quantum computing diagrams",
81
+ "modalities": ["image", "audio", "document"],
82
+ "collectionIds": ["knowledge-base"],
83
+ "topK": 10,
84
+ "includeMetadata": true
85
+ }
86
+ ```
87
+
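+ Posting that body with `fetch` might look like this (host is a placeholder; the ranked `results` field in the response is an assumption, not a documented contract):
+
+ ```typescript
+ // Sketch: cross-modal text query over the documented route.
+ const res = await fetch('http://localhost:3000/api/agentos/rag/multimodal/query', {
+   method: 'POST',
+   headers: { 'Content-Type': 'application/json' },
+   body: JSON.stringify({ query: 'quantum computing diagrams', topK: 10 }),
+ });
+ const { results } = await res.json(); // assumed response shape
+ ```
+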
88
+ Image/audio query form fields:
89
+
90
+ | Field | Type | Description |
91
+ |-------|------|-------------|
92
+ | `modalities` | string | Comma-separated: `image`, `audio`, `document` |
93
+ | `collectionIds` | string | Comma-separated collection IDs to search |
94
+ | `topK` | number | Max results (default: 5) |
95
+ | `includeMetadata` | boolean | Include stored metadata in results |
96
+ | `retrievalMode` | string | `auto` (default), `text`, `native`, `hybrid` |
97
+
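+ For example, a query-by-image call (see Retrieval Modes below for the `retrievalMode` values; the host and `queryBytes` are placeholders):
+
+ ```typescript
+ // Sketch: query by image with hybrid retrieval.
+ const form = new FormData();
+ form.append('image', new Blob([queryBytes], { type: 'image/jpeg' }), 'query.jpg');
+ form.append('modalities', 'image,document');
+ form.append('topK', '10');
+ form.append('retrievalMode', 'hybrid');
+
+ const res = await fetch('http://localhost:3000/api/agentos/rag/multimodal/images/query', {
+   method: 'POST',
+   body: form,
+ });
+ ```
+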
98
+ ### Asset Management
99
+
100
+ | Method | Path | Description |
101
+ |--------|------|-------------|
102
+ | GET | `/assets/:assetId` | Get asset metadata |
103
+ | GET | `/assets/:assetId/content` | Download raw binary (if `storePayload` was true) |
104
+ | DELETE | `/assets/:assetId` | Delete asset and its embeddings |
105
+
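+ A minimal sketch of the asset lifecycle (host and `assetId` are placeholders):
+
+ ```typescript
+ const base = 'http://localhost:3000/api/agentos/rag/multimodal';
+ const meta = await (await fetch(`${base}/assets/${assetId}`)).json();          // metadata
+ const blob = await (await fetch(`${base}/assets/${assetId}/content`)).blob();  // raw binary
+ await fetch(`${base}/assets/${assetId}`, { method: 'DELETE' });                // remove
+ ```
+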
106
+ ## Retrieval Modes
107
+
108
+ - **`auto`** (default) — Text-first retrieval with native augmentation when available.
109
+ - **`text`** — Derive a caption/transcript and query the standard text pipeline only.
110
+ - **`native`** — Use modality-native embeddings (e.g. CLIP for images) when available.
111
+ - **`hybrid`** — Combine text and native retrieval, merge and re-rank results.
112
+
113
+ ## Programmatic Usage
114
+
115
+ ```typescript
116
+ import { MultimodalMemoryBridge } from 'agentos/rag/multimodal';
117
+
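+ // NOTE: `bridge` (a MultimodalMemoryBridge instance) and the `indexer` used
+ // below are assumed to be constructed elsewhere; setup is omitted in this sketch.
+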
118
+ // Ingest an image
119
+ await bridge.ingestImage(imageBuffer, { source: 'upload', tags: ['product'] });
120
+
121
+ // Ingest audio
122
+ await bridge.ingestAudio(audioBuffer, { language: 'en' });
123
+
124
+ // Ingest video (requires ffmpeg)
125
+ await bridge.ingestVideo(videoBuffer, { extractFrames: true });
126
+
127
+ // Ingest PDF
128
+ await bridge.ingestPDF(pdfBuffer, { extractImages: true });
129
+
130
+ // Cross-modal search
131
+ const results = await indexer.search('quantum computing', {
132
+ topK: 10,
133
+ modalities: ['image', 'text', 'audio'],
134
+ });
135
+ ```
136
+
137
+ ## Examples
138
+
139
+ - "Index this product photo so I can find it by description later."
140
+ - "Ingest all the PDFs in this folder into my knowledge base."
141
+ - "Search my audio recordings for mentions of the quarterly budget."
142
+ - "Find images related to the network architecture diagram."
143
+ - "What does the chart on page 5 of the annual report show?"
144
+ - "Upload this meeting recording and make it searchable."
145
+
146
+ ## Constraints
147
+
148
+ - Image uploads are capped at 15 MB, audio at 25 MB, documents at 30 MB.
149
+ - Supported audio formats: MP3, MP4, M4A, WAV, WebM, OGG (Whisper-compatible).
150
+ - Supported document formats: PDF, DOCX, TXT, Markdown, CSV, JSON, XML.
151
+ - Video ingestion requires ffmpeg installed on the system.
152
+ - A vision LLM and an STT provider must be configured for image and audio indexing, respectively.
153
+ - Cross-modal search ranks by cosine similarity of embedded text representations; it does not perform true multimodal embedding fusion unless `retrievalMode: 'native'` is used with a CLIP-like model.
@@ -0,0 +1,210 @@
1
+ ---
2
+ name: video-generation
3
+ version: '1.0.0'
4
+ description: Video generation, analysis, and scene detection — text-to-video, image-to-video, structured scene descriptions with RAG indexing, and general-purpose visual change detection.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: media
8
+ tags: [video, generation, analysis, scene-detection, RAG, multimodal, runway, replicate, fal]
9
+ requires_secrets: []
10
+ requires_tools: []
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F3AC"
14
+ ---
15
+
16
+ # Video Generation, Analysis & Scene Detection
17
+
18
+ Use this skill when the user wants to create AI-generated videos, analyse existing video content for structured scene descriptions, or detect visual changes in live/recorded frame streams.
19
+
20
+ This skill covers three complementary APIs:
21
+
22
+ 1. **generateVideo()** — Text-to-video and image-to-video generation
23
+ 2. **analyzeVideo()** — Structured video analysis with scene descriptions, transcription, and optional RAG indexing
24
+ 3. **detectScenes()** — Real-time or batch scene boundary detection from frame streams
25
+
26
+ ## Video Generation
27
+
28
+ ### Text-to-Video
29
+
30
+ Generate a video from a text prompt. The system auto-detects the best available provider from environment variables in priority order: `RUNWAY_API_KEY` (highest quality), `REPLICATE_API_TOKEN` (widest model variety), `FAL_API_KEY` (fast serverless GPU).
31
+
32
+ ```typescript
33
+ import { generateVideo } from 'agentos';
34
+
35
+ const result = await generateVideo({
36
+ prompt: 'A drone flying over a misty forest at sunrise, cinematic 4K',
37
+ durationSec: 5,
38
+ aspectRatio: '16:9',
39
+ });
40
+ console.log(result.videos[0].url);
41
+ ```
42
+
43
+ ### Image-to-Video
44
+
45
+ Animate a still image by providing it as a Buffer via `opts.image`. The prompt describes the desired motion rather than the scene itself.
46
+
47
+ ```typescript
48
+ import { generateVideo } from 'agentos';
49
+ import { readFileSync } from 'fs';
50
+
51
+ const result = await generateVideo({
52
+ prompt: 'Camera slowly zooms out, gentle wind moves the leaves',
53
+ image: readFileSync('landscape.png'),
54
+ provider: 'runway',
55
+ });
56
+ ```
57
+
58
+ ### Provider Selection
59
+
60
+ | Provider | Best For | Env Var |
61
+ |----------|----------|---------|
62
+ | **Runway** | Highest quality, cinematic output, image-to-video | `RUNWAY_API_KEY` |
63
+ | **Replicate** | Widest model variety (Kling, HunyuanVideo, MiniMax), open-source models | `REPLICATE_API_TOKEN` |
64
+ | **Fal** | Fast serverless GPU, cost-effective, Kling/CogVideo | `FAL_API_KEY` |
65
+
66
+ When multiple provider API keys are set, the system wraps the primary in a `FallbackVideoProxy` so a transient failure on one provider automatically retries on the next.
67
+
68
+ To force a specific provider:
69
+
70
+ ```typescript
71
+ const result = await generateVideo({
72
+ prompt: 'A cat playing piano',
73
+ provider: 'replicate',
74
+ model: 'klingai/kling-v1',
75
+ apiKey: 'your-replicate-token',
76
+ });
77
+ ```
78
+
79
+ ### Prompt Tips for Video
80
+
81
+ - **Be specific about motion**: "camera pans left to right", "person walks toward camera", "time-lapse of clouds moving"
82
+ - **Specify style early**: "cinematic 4K", "hand-drawn animation", "vintage film grain"
83
+ - **Keep prompts concise**: Video models respond best to clear, focused descriptions (1-3 sentences)
84
+ - **Use negative prompts** to avoid unwanted artifacts: `negativePrompt: 'blurry, distorted faces, watermark'`
85
+
86
+ ### Image-to-Video Motion Strength
87
+
88
+ When doing image-to-video, the prompt controls how much the image changes:
89
+
90
+ - **Gentle motion**: "subtle camera drift", "soft wind blowing through hair" — minimal departure from source
91
+ - **Moderate motion**: "person turns head and smiles", "camera orbits subject" — clear movement while preserving subject
92
+ - **Strong motion**: "explosion of confetti", "character runs toward camera" — significant scene change
93
+
94
+ Motion-strength interpretation varies by provider. Runway tends to be conservative (good for preserving the source image), while Replicate/Fal models may be more aggressive. Start with gentle prompts and increase intensity as needed.
95
+
96
+ ## Video Analysis
97
+
98
+ ### Structured Scene Analysis
99
+
100
+ Analyse a video to extract structured scene descriptions, detected objects, on-screen text, and optional audio transcription.
101
+
102
+ ```typescript
103
+ import { analyzeVideo } from 'agentos';
104
+
105
+ const analysis = await analyzeVideo({
106
+ videoUrl: 'https://example.com/product-demo.mp4',
107
+ prompt: 'Identify all products shown and their key features',
108
+ transcribeAudio: true,
109
+ descriptionDetail: 'detailed',
110
+ });
111
+
112
+ console.log(analysis.description);
113
+ for (const scene of analysis.scenes ?? []) {
114
+ console.log(`[${scene.startSec}s - ${scene.endSec}s] ${scene.description}`);
115
+ }
116
+ ```
117
+
118
+ ### RAG Integration
119
+
120
+ Enable `indexForRAG: true` to automatically index scene descriptions and transcripts into the vector store for later retrieval. This is especially useful for building searchable video libraries.
121
+
122
+ ```typescript
123
+ const analysis = await analyzeVideo({
124
+ videoBuffer: videoData,
125
+ indexForRAG: true,
126
+ descriptionDetail: 'detailed',
127
+ transcribeAudio: true,
128
+ });
129
+
130
+ // Scene descriptions and transcripts are now searchable via RAG
131
+ console.log(`Indexed ${analysis.ragChunkIds?.length ?? 0} chunks`);
132
+ ```
133
+
134
+ Each scene description becomes a separate vector chunk with metadata including timestamps, scene index, and cut type. This enables queries like "find the part where the presenter shows the pricing slide" to return precise timestamp ranges.
135
+
136
+ ### Analysis Options
137
+
138
+ | Option | Default | Description |
139
+ |--------|---------|-------------|
140
+ | `sceneThreshold` | `0.3` | Scene change sensitivity (0-1, lower = more scenes) |
141
+ | `transcribeAudio` | `true` | Transcribe audio via configured STT provider |
142
+ | `descriptionDetail` | `'detailed'` | `'brief'`, `'detailed'`, or `'exhaustive'` |
143
+ | `maxScenes` | `100` | Cap on detected scenes (prevents runaway on long videos) |
144
+ | `indexForRAG` | `false` | Index results into RAG vector store |
145
+
146
+ ## Scene Detection
147
+
148
+ ### Live Stream / Batch Detection
149
+
150
+ Use `detectScenes()` for real-time visual change detection on frame streams. It returns an AsyncGenerator that yields `SceneBoundary` objects as visual discontinuities are detected.
151
+
152
+ ```typescript
153
+ import { detectScenes } from 'agentos';
154
+
155
+ // From a pre-recorded video (frames extracted via ffmpeg)
156
+ for await (const boundary of detectScenes({ frames: extractedFrameStream })) {
157
+ console.log(`Scene ${boundary.index} at ${boundary.startTimeSec}s`);
158
+ console.log(` Type: ${boundary.cutType}, Confidence: ${boundary.confidence}`);
159
+ }
160
+ ```
161
+
162
+ ### Use Cases
163
+
164
+ - **Webcam / security camera**: Detect motion or scene changes in real-time surveillance feeds
165
+ - **Screen recording**: Identify slide transitions in presentations, page changes in demos
166
+ - **Video editing**: Automatically segment raw footage at cut points
167
+ - **Content moderation**: Flag rapid scene changes that may indicate problematic content
168
+
169
+ ### Configuration
170
+
171
+ ```typescript
172
+ for await (const boundary of detectScenes({
173
+ frames: webcamStream,
174
+ hardCutThreshold: 0.4, // Less sensitive to hard cuts
175
+ gradualThreshold: 0.15, // Standard sensitivity for dissolves/fades
176
+ minSceneDurationSec: 2.0, // Suppress very short scenes
177
+ methods: ['histogram'], // Fast histogram-only detection
178
+ })) {
179
+ handleSceneChange(boundary);
180
+ }
181
+ ```
182
+
183
+ ### Cut Type Classification
184
+
185
+ The detector classifies each scene boundary:
186
+
187
+ | Cut Type | Description |
188
+ |----------|-------------|
189
+ | `hard-cut` | Abrupt frame-to-frame change (most common) |
190
+ | `dissolve` | Cross-dissolve / superimposition transition |
191
+ | `fade` | Fade from/to black or white |
192
+ | `gradual` | Other gradual visual change |
193
+
194
+ ## Prerequisites
195
+
196
+ - At least one video provider API key for generation (`RUNWAY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`)
197
+ - **ffmpeg** on PATH for video analysis (frame extraction and audio demuxing)
198
+ - A vision-capable LLM (`OPENAI_API_KEY` or equivalent) for scene description
199
+ - An STT provider for audio transcription (when `transcribeAudio` is enabled)
200
+
201
+ Scene detection (`detectScenes()`) has zero external dependencies — it works purely on RGB pixel buffers.
202
+
203
+ ## Examples
204
+
205
+ - "Generate a 5-second cinematic video of a sunset over the ocean"
206
+ - "Turn this product photo into a video with a slow camera orbit"
207
+ - "Analyse this tutorial video and index it for search"
208
+ - "Detect scene changes in this security camera feed"
209
+ - "Extract structured scenes from this presentation recording"
210
+ - "Create a video from this image with gentle parallax motion"
@@ -1,22 +1,82 @@
1
1
  ---
2
2
  name: vision-ocr
3
- description: Extract text from images using OCR and vision AI
4
- version: 1.0.0
3
+ version: '1.1.0'
4
+ description: Extract text from images using OCR and vision AI with the performOCR() high-level API or the full VisionPipeline.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: vision
5
8
  tags: [vision, ocr, text-extraction, document, handwriting]
6
- tools_required: [vision-pipeline]
9
+ requires_secrets: []
10
+ requires_tools: [vision-pipeline]
7
11
  ---
8
12
 
9
13
  # Vision & OCR
10
14
 
11
- Extract text from images, documents, and handwritten notes using a progressive 3-tier pipeline: local OCR (PaddleOCR) -> local vision models (TrOCR, Florence-2) -> cloud vision (GPT-4o, Claude).
15
+ Extract text from images, documents, and handwritten notes using a progressive 3-tier pipeline: local OCR (PaddleOCR / Tesseract) -> local vision models (TrOCR, Florence-2) -> cloud vision LLM (GPT-4o, Claude, Gemini).
16
+
17
+ ## High-Level API: `performOCR()`
18
+
19
+ For one-shot text extraction, use the top-level `performOCR()` function. It handles input resolution, pipeline lifecycle, and cleanup automatically.
20
+
21
+ ```typescript
22
+ import { performOCR } from '@framers/agentos';
23
+
24
+ const result = await performOCR({
25
+ image: '/path/to/receipt.png', // file path, URL, base64, or Buffer
26
+ strategy: 'progressive', // 'progressive' | 'local-only' | 'cloud-only'
27
+ confidenceThreshold: 0.7, // min confidence before escalating tier
28
+ });
29
+
30
+ console.log(result.text); // extracted text
31
+ console.log(result.confidence); // 0–1 score
32
+ console.log(result.tier); // 'ocr' | 'handwriting' | 'document-ai' | 'cloud-vision'
33
+ console.log(result.provider); // 'paddle' | 'tesseract' | 'openai' | etc.
34
+ console.log(result.regions); // bounding boxes (when available)
35
+ ```
36
+
37
+ ## When to use `performOCR()` vs `VisionPipeline`
38
+
39
+ | Use case | Recommendation |
40
+ |----------|---------------|
41
+ | One-shot text extraction from a single image | `performOCR()` — simplest API |
42
+ | Batch processing many images | `VisionPipeline` — create once, reuse, dispose when done |
43
+ | Need CLIP embeddings or document layout | `VisionPipeline` — richer result shape |
44
+ | Quick scripts and integrations | `performOCR()` — zero boilerplate |
45
+
46
+ ## Progressive Tier System
47
+
48
+ The pipeline tries the cheapest/fastest tier first and only escalates when confidence is below threshold:
49
+
50
+ 1. **Tier 1 — Local OCR** (PaddleOCR or Tesseract.js): Fast, free, offline. Handles printed text in documents, receipts, screenshots.
51
+ 2. **Tier 2 — Local Vision Models** (TrOCR / Florence-2): Still offline. Handles handwritten notes, complex document layouts with tables and figures.
52
+ 3. **Tier 3 — Cloud Vision LLM** (GPT-4o / Claude / Gemini): Best quality. Handles photographs, diagrams, mixed content, anything the local tiers can't confidently read.
53
+
54
+ ## Strategy Selection
55
+
56
+ - **`'progressive'`** (default): Start local, escalate only if needed. Best cost/quality balance for most use cases.
57
+ - **`'local-only'`**: Never call cloud APIs. Use for air-gapped environments, privacy-sensitive data (medical records, financial docs), or when no API keys are available.
58
+ - **`'cloud-only'`**: Skip local tiers entirely, send straight to a cloud vision LLM. Use when you need the highest quality output and cost is not a concern.
59
+
60
+ ## Input Formats
61
+
62
+ `performOCR()` accepts four input types:
63
+
64
+ - **File path**: `'/tmp/scan.png'` — reads from disk
65
+ - **URL**: `'https://example.com/receipt.jpg'` — fetches via HTTP
66
+ - **Base64 string**: Raw base64 or `data:image/png;base64,...` data URIs — decoded in-memory
67
+ - **Buffer**: Raw image bytes — passed directly to the pipeline
12
68
 
13
69
  ## Capabilities
14
- - **Printed text OCR**: Extract text from documents, receipts, screenshots
70
+
71
+ - **Printed text OCR**: Extract text from documents, receipts, screenshots, PDFs
15
72
  - **Handwriting recognition**: Read handwritten notes and forms via TrOCR
16
- - **Document layout**: Understand tables, figures, headings via Florence-2
17
- - **Image embeddings**: Generate CLIP vectors for semantic image search
73
+ - **Document layout understanding**: Parse tables, figures, headings via Florence-2
74
+ - **Bounding box regions**: Spatial text locations for overlay rendering
75
+ - **Image embeddings**: Generate CLIP vectors for semantic image search (via `VisionPipeline` only)
76
+
77
+ ## Examples
18
78
 
19
- ## Example
20
- "Read the text from this receipt"
21
- "What does this handwritten note say?"
22
- "Extract the table data from this PDF page"
79
+ - "Read the text from this receipt"
80
+ - "What does this handwritten note say?"
81
+ - "Extract the table data from this PDF page"
82
+ - "OCR this screenshot and return the error message"
package/registry.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "version": "1.0.0",
3
- "updated": "2026-03-27T01:15:04.856Z",
3
+ "updated": "2026-03-27T07:00:16.599Z",
4
4
  "categories": {
5
5
  "curated": [
6
6
  "1password",
@@ -60,6 +60,7 @@
60
60
  "topicality",
61
61
  "trello",
62
62
  "twitter-bot",
63
+ "video-generation",
63
64
  "vision-ocr",
64
65
  "voice-conversation",
65
66
  "vosk",
@@ -84,7 +85,7 @@
84
85
  "namespace": "wunderland",
85
86
  "verified": true,
86
87
  "source": "curated",
87
- "verifiedAt": "2026-03-27T01:15:04.856Z",
88
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
88
89
  "keywords": [
89
90
  "1password",
90
91
  "passwords",
@@ -129,7 +130,7 @@
129
130
  "namespace": "wunderland",
130
131
  "verified": true,
131
132
  "source": "curated",
132
- "verifiedAt": "2026-03-27T01:15:04.856Z",
133
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
133
134
  "keywords": [
134
135
  "accounts",
135
136
  "credentials",
@@ -165,7 +166,7 @@
165
166
  "namespace": "wunderland",
166
167
  "verified": true,
167
168
  "source": "curated",
168
- "verifiedAt": "2026-03-27T01:15:04.856Z",
169
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
169
170
  "keywords": [
170
171
  "agent",
171
172
  "config",
@@ -186,7 +187,7 @@
186
187
  "namespace": "wunderland",
187
188
  "verified": true,
188
189
  "source": "curated",
189
- "verifiedAt": "2026-03-27T01:15:04.856Z",
190
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
190
191
  "keywords": [
191
192
  "voice",
192
193
  "tts",
@@ -217,7 +218,7 @@
217
218
  "namespace": "wunderland",
218
219
  "verified": true,
219
220
  "source": "curated",
220
- "verifiedAt": "2026-03-27T01:15:04.856Z",
221
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
221
222
  "keywords": [
222
223
  "apple-notes",
223
224
  "macos",
@@ -251,7 +252,7 @@
251
252
  "namespace": "wunderland",
252
253
  "verified": true,
253
254
  "source": "curated",
254
- "verifiedAt": "2026-03-27T01:15:04.856Z",
255
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
255
256
  "keywords": [
256
257
  "apple-reminders",
257
258
  "macos",
@@ -286,7 +287,7 @@
286
287
  "namespace": "wunderland",
287
288
  "verified": true,
288
289
  "source": "curated",
289
- "verifiedAt": "2026-03-27T01:15:04.856Z",
290
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
290
291
  "keywords": [
291
292
  "blog",
292
293
  "publishing",
@@ -330,7 +331,7 @@
330
331
  "namespace": "wunderland",
331
332
  "verified": true,
332
333
  "source": "curated",
333
- "verifiedAt": "2026-03-27T01:15:04.856Z",
334
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
334
335
  "keywords": [
335
336
  "bluesky",
336
337
  "social-media",
@@ -370,7 +371,7 @@
370
371
  "namespace": "wunderland",
371
372
  "verified": true,
372
373
  "source": "curated",
373
- "verifiedAt": "2026-03-27T01:15:04.856Z",
374
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
374
375
  "keywords": [
375
376
  "cloud",
376
377
  "devops",
@@ -397,7 +398,7 @@
397
398
  "namespace": "wunderland",
398
399
  "verified": true,
399
400
  "source": "curated",
400
- "verifiedAt": "2026-03-27T01:15:04.856Z",
401
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
401
402
  "keywords": [
402
403
  "guardrails",
403
404
  "code-safety",
@@ -425,7 +426,7 @@
425
426
  "namespace": "wunderland",
426
427
  "verified": true,
427
428
  "source": "curated",
428
- "verifiedAt": "2026-03-27T01:15:04.856Z",
429
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
429
430
  "keywords": [
430
431
  "coding",
431
432
  "programming",
@@ -452,7 +453,7 @@
452
453
  "namespace": "wunderland",
453
454
  "verified": true,
454
455
  "source": "curated",
455
- "verifiedAt": "2026-03-27T01:15:04.856Z",
456
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
456
457
  "keywords": [
457
458
  "content",
458
459
  "writing",
@@ -482,7 +483,7 @@
482
483
  "namespace": "wunderland",
483
484
  "verified": true,
484
485
  "source": "curated",
485
- "verifiedAt": "2026-03-27T01:15:04.856Z",
486
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
486
487
  "keywords": [
487
488
  "research",
488
489
  "investigation",
@@ -520,7 +521,7 @@
520
521
  "namespace": "wunderland",
521
522
  "verified": true,
522
523
  "source": "curated",
523
- "verifiedAt": "2026-03-27T01:15:04.856Z",
524
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
524
525
  "keywords": [
525
526
  "voice",
526
527
  "diarization",
@@ -545,7 +546,7 @@
545
546
  "namespace": "wunderland",
546
547
  "verified": true,
547
548
  "source": "curated",
548
- "verifiedAt": "2026-03-27T01:15:04.856Z",
549
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
549
550
  "keywords": [
550
551
  "discord",
551
552
  "messaging",
@@ -573,7 +574,7 @@
573
574
  "namespace": "wunderland",
574
575
  "verified": true,
575
576
  "source": "curated",
576
- "verifiedAt": "2026-03-27T01:15:04.856Z",
577
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
577
578
  "metadata": {
578
579
  "primaryEnv": "INTERNAL_API_SECRET",
579
580
  "emoji": "📧",
@@ -614,7 +615,7 @@
614
615
  "namespace": "wunderland",
615
616
  "verified": true,
616
617
  "source": "curated",
617
- "verifiedAt": "2026-03-27T01:15:04.856Z",
618
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
618
619
  "keywords": [
619
620
  "emergent",
620
621
  "tools",
@@ -642,7 +643,7 @@
642
643
  "namespace": "wunderland",
643
644
  "verified": true,
644
645
  "source": "curated",
645
- "verifiedAt": "2026-03-27T01:15:04.856Z",
646
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
646
647
  "keywords": [
647
648
  "voice",
648
649
  "endpointing",
@@ -668,7 +669,7 @@
668
669
  "namespace": "wunderland",
669
670
  "verified": true,
670
671
  "source": "curated",
671
- "verifiedAt": "2026-03-27T01:15:04.856Z",
672
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
672
673
  "keywords": [
673
674
  "facebook",
674
675
  "social-media",
@@ -707,7 +708,7 @@
707
708
  "namespace": "wunderland",
708
709
  "verified": true,
709
710
  "source": "curated",
710
- "verifiedAt": "2026-03-27T01:15:04.856Z",
711
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
711
712
  "keywords": [
712
713
  "git",
713
714
  "version-control",
@@ -761,7 +762,7 @@
761
762
  "namespace": "wunderland",
762
763
  "verified": true,
763
764
  "source": "curated",
764
- "verifiedAt": "2026-03-27T01:15:04.856Z",
765
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
765
766
  "keywords": [
766
767
  "github",
767
768
  "git",
@@ -820,7 +821,7 @@
820
821
  "namespace": "wunderland",
821
822
  "verified": true,
822
823
  "source": "curated",
823
- "verifiedAt": "2026-03-27T01:15:04.856Z",
824
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
824
825
  "keywords": [
825
826
  "voice",
826
827
  "stt",
@@ -849,7 +850,7 @@
849
850
  "namespace": "wunderland",
850
851
  "verified": true,
851
852
  "source": "curated",
852
- "verifiedAt": "2026-03-27T01:15:04.856Z",
853
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
853
854
  "keywords": [
854
855
  "voice",
855
856
  "tts",
@@ -878,7 +879,7 @@
878
879
  "namespace": "wunderland",
879
880
  "verified": true,
880
881
  "source": "curated",
881
- "verifiedAt": "2026-03-27T01:15:04.856Z",
882
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
882
883
  "keywords": [
883
884
  "guardrails",
884
885
  "hallucination",
@@ -906,7 +907,7 @@
906
907
  "namespace": "wunderland",
907
908
  "verified": true,
908
909
  "source": "curated",
909
- "verifiedAt": "2026-03-27T01:15:04.856Z",
910
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
910
911
  "keywords": [
911
912
  "monitoring",
912
913
  "health",
@@ -939,7 +940,7 @@
939
940
  "namespace": "wunderland",
940
941
  "verified": true,
941
942
  "source": "curated",
942
- "verifiedAt": "2026-03-27T01:15:04.856Z",
943
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
943
944
  "keywords": [
944
945
  "image",
945
946
  "editing",
@@ -953,25 +954,26 @@
953
954
  "id": "com.framers.skill.image-gen",
954
955
  "name": "image-gen",
955
956
  "displayName": "image-gen",
956
- "version": "1.0.0",
957
+ "version": "2.0.0",
957
958
  "path": "registry/curated/image-gen",
958
- "description": "Generate images from text prompts using AI image generation APIs like DALL-E, Stable Diffusion, or Midjourney.",
959
+ "description": "Generate, edit, upscale, and variate images using the AgentOS multi-provider image pipeline with automatic fallback.",
959
960
  "category": "creative",
960
961
  "namespace": "wunderland",
961
962
  "verified": true,
962
963
  "source": "curated",
963
- "verifiedAt": "2026-03-27T01:15:04.856Z",
964
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
964
965
  "keywords": [
965
966
  "image-generation",
966
967
  "ai-art",
967
968
  "dall-e",
968
969
  "stable-diffusion",
970
+ "flux",
971
+ "replicate",
972
+ "stability",
973
+ "fal",
969
974
  "creative",
970
975
  "visual"
971
976
  ],
972
- "requiredSecrets": [
973
- "openai.api_key"
974
- ],
975
977
  "requiredTools": [
976
978
  "generate_image"
977
979
  ],
@@ -992,7 +994,7 @@
992
994
  "namespace": "wunderland",
993
995
  "verified": true,
994
996
  "source": "curated",
995
- "verifiedAt": "2026-03-27T01:15:04.856Z",
997
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
996
998
  "keywords": [
997
999
  "instagram",
998
1000
  "social-media",
@@ -1030,7 +1032,7 @@
1030
1032
  "namespace": "wunderland",
1031
1033
  "verified": true,
1032
1034
  "source": "curated",
1033
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1035
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1034
1036
  "keywords": [
1035
1037
  "linkedin",
1036
1038
  "social-media",
@@ -1068,7 +1070,7 @@
1068
1070
  "namespace": "wunderland",
1069
1071
  "verified": true,
1070
1072
  "source": "curated",
1071
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1073
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1072
1074
  "keywords": [
1073
1075
  "mastodon",
1074
1076
  "fediverse",
@@ -1108,7 +1110,7 @@
1108
1110
  "namespace": "wunderland",
1109
1111
  "verified": true,
1110
1112
  "source": "curated",
1111
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1113
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1112
1114
  "keywords": [
1113
1115
  "memory",
1114
1116
  "cognitive",
@@ -1132,7 +1134,7 @@
1132
1134
  "namespace": "wunderland",
1133
1135
  "verified": true,
1134
1136
  "source": "curated",
1135
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1137
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1136
1138
  "keywords": [
1137
1139
  "guardrails",
1138
1140
  "safety",
@@ -1155,14 +1157,14 @@
1155
1157
  "id": "com.framers.skill.multimodal-rag",
1156
1158
  "name": "multimodal-rag",
1157
1159
  "displayName": "multimodal-rag",
1158
- "version": "1.0.0",
1160
+ "version": "2.0.0",
1159
1161
  "path": "registry/curated/multimodal-rag",
1160
- "description": "Index and search across text, images, audio, video, and PDFs",
1161
- "category": "uncategorized",
1162
+ "description": "Index and search across text, images, audio, video, and PDFs via the multimodal RAG pipeline and HTTP API.",
1163
+ "category": "productivity",
1162
1164
  "namespace": "wunderland",
1163
1165
  "verified": true,
1164
1166
  "source": "curated",
1165
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1167
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1166
1168
  "keywords": [
1167
1169
  "rag",
1168
1170
  "multimodal",
@@ -1170,8 +1172,16 @@
1170
1172
  "audio",
1171
1173
  "video",
1172
1174
  "pdf",
1173
- "search"
1174
- ]
1175
+ "search",
1176
+ "indexing",
1177
+ "memory"
1178
+ ],
1179
+ "requiredTools": [
1180
+ "vision-pipeline"
1181
+ ],
1182
+ "metadata": {
1183
+ "emoji": "🔍"
1184
+ }
1175
1185
  },
1176
1186
  {
1177
1187
  "id": "com.framers.skill.notion",
@@ -1184,7 +1194,7 @@
1184
1194
  "namespace": "wunderland",
1185
1195
  "verified": true,
1186
1196
  "source": "curated",
1187
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1197
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1188
1198
  "keywords": [
1189
1199
  "notion",
1190
1200
  "wiki",
@@ -1213,7 +1223,7 @@
1213
1223
  "namespace": "wunderland",
1214
1224
  "verified": true,
1215
1225
  "source": "curated",
1216
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1226
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1217
1227
  "keywords": [
1218
1228
  "obsidian",
1219
1229
  "markdown",
@@ -1241,7 +1251,7 @@
1241
1251
  "namespace": "wunderland",
1242
1252
  "verified": true,
1243
1253
  "source": "curated",
1244
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1254
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1245
1255
  "keywords": [
1246
1256
  "voice",
1247
1257
  "wake-word",
@@ -1269,7 +1279,7 @@
1269
1279
  "namespace": "wunderland",
1270
1280
  "verified": true,
1271
1281
  "source": "curated",
1272
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1282
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1273
1283
  "keywords": [
1274
1284
  "pii",
1275
1285
  "privacy",
@@ -1300,7 +1310,7 @@
1300
1310
  "namespace": "wunderland",
1301
1311
  "verified": true,
1302
1312
  "source": "curated",
1303
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1313
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1304
1314
  "keywords": [
1305
1315
  "pinterest",
1306
1316
  "social-media",
@@ -1337,7 +1347,7 @@
1337
1347
  "namespace": "wunderland",
1338
1348
  "verified": true,
1339
1349
  "source": "curated",
1340
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1350
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1341
1351
  "keywords": [
1342
1352
  "voice",
1343
1353
  "tts",
@@ -1364,7 +1374,7 @@
1364
1374
  "namespace": "wunderland",
1365
1375
  "verified": true,
1366
1376
  "source": "curated",
1367
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1377
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1368
1378
  "keywords": [
1369
1379
  "voice",
1370
1380
  "wake-word",
@@ -1395,7 +1405,7 @@
1395
1405
  "namespace": "wunderland",
1396
1406
  "verified": true,
1397
1407
  "source": "curated",
1398
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1408
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1399
1409
  "keywords": [
1400
1410
  "reddit",
1401
1411
  "social-media",
@@ -1436,7 +1446,7 @@
1436
1446
  "namespace": "wunderland",
1437
1447
  "verified": true,
1438
1448
  "source": "curated",
1439
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1449
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1440
1450
  "keywords": [
1441
1451
  "seo",
1442
1452
  "link-building",
@@ -1471,7 +1481,7 @@
1471
1481
  "namespace": "wunderland",
1472
1482
  "verified": true,
1473
1483
  "source": "curated",
1474
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1484
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1475
1485
  "keywords": [
1476
1486
  "deploy",
1477
1487
  "cloud",
@@ -1507,7 +1517,7 @@
1507
1517
  "namespace": "wunderland",
1508
1518
  "verified": true,
1509
1519
  "source": "curated",
1510
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1520
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1511
1521
  "keywords": [
1512
1522
  "slack",
1513
1523
  "messaging",
@@ -1539,7 +1549,7 @@
1539
1549
  "namespace": "wunderland",
1540
1550
  "verified": true,
1541
1551
  "source": "curated",
1542
- "verifiedAt": "2026-03-27T01:15:04.856Z"
1552
+ "verifiedAt": "2026-03-27T07:00:16.599Z"
1543
1553
  },
1544
1554
  {
1545
1555
  "id": "com.framers.skill.spotify-player",
@@ -1552,7 +1562,7 @@
1552
1562
  "namespace": "wunderland",
1553
1563
  "verified": true,
1554
1564
  "source": "curated",
1555
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1565
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1556
1566
  "keywords": [
1557
1567
  "spotify",
1558
1568
  "music",
@@ -1587,7 +1597,7 @@
1587
1597
  "namespace": "wunderland",
1588
1598
  "verified": true,
1589
1599
  "source": "curated",
1590
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1600
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1591
1601
  "keywords": [
1592
1602
  "voice",
1593
1603
  "stt",
@@ -1617,7 +1627,7 @@
1617
1627
  "namespace": "wunderland",
1618
1628
  "verified": true,
1619
1629
  "source": "curated",
1620
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1630
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1621
1631
  "keywords": [
1622
1632
  "voice",
1623
1633
  "stt",
@@ -1648,7 +1658,7 @@
1648
1658
  "namespace": "wunderland",
1649
1659
  "verified": true,
1650
1660
  "source": "curated",
1651
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1661
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1652
1662
  "keywords": [
1653
1663
  "voice",
1654
1664
  "tts",
@@ -1678,7 +1688,7 @@
1678
1688
  "namespace": "wunderland",
1679
1689
  "verified": true,
1680
1690
  "source": "curated",
1681
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1691
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1682
1692
  "keywords": [
1683
1693
  "voice",
1684
1694
  "tts",
@@ -1707,7 +1717,7 @@
1707
1717
  "namespace": "wunderland",
1708
1718
  "verified": true,
1709
1719
  "source": "curated",
1710
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1720
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1711
1721
  "keywords": [
1712
1722
  "structured-output",
1713
1723
  "json",
@@ -1727,7 +1737,7 @@
1727
1737
  "namespace": "wunderland",
1728
1738
  "verified": true,
1729
1739
  "source": "curated",
1730
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1740
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1731
1741
  "keywords": [
1732
1742
  "summarization",
1733
1743
  "text-processing",
@@ -1753,7 +1763,7 @@
1753
1763
  "namespace": "wunderland",
1754
1764
  "verified": true,
1755
1765
  "source": "curated",
1756
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1766
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1757
1767
  "keywords": [
1758
1768
  "threads",
1759
1769
  "social-media",
@@ -1789,7 +1799,7 @@
1789
1799
  "namespace": "wunderland",
1790
1800
  "verified": true,
1791
1801
  "source": "curated",
1792
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1802
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1793
1803
  "keywords": [
1794
1804
  "tiktok",
1795
1805
  "video",
@@ -1826,7 +1836,7 @@
1826
1836
  "namespace": "wunderland",
1827
1837
  "verified": true,
1828
1838
  "source": "curated",
1829
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1839
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1830
1840
  "keywords": [
1831
1841
  "guardrails",
1832
1842
  "topics",
@@ -1853,7 +1863,7 @@
1853
1863
  "namespace": "wunderland",
1854
1864
  "verified": true,
1855
1865
  "source": "curated",
1856
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1866
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1857
1867
  "keywords": [
1858
1868
  "trello",
1859
1869
  "kanban",
@@ -1886,7 +1896,7 @@
1886
1896
  "namespace": "wunderland",
1887
1897
  "verified": true,
1888
1898
  "source": "curated",
1889
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1899
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1890
1900
  "keywords": [
1891
1901
  "twitter",
1892
1902
  "social-media",
@@ -1918,24 +1928,54 @@
1918
1928
  "primaryEnv": "TWITTER_BEARER_TOKEN"
1919
1929
  }
1920
1930
  },
1931
+ {
1932
+ "id": "com.framers.skill.video-generation",
1933
+ "name": "video-generation",
1934
+ "displayName": "video-generation",
1935
+ "version": "1.0.0",
1936
+ "path": "registry/curated/video-generation",
1937
+ "description": "Video generation, analysis, and scene detection — text-to-video, image-to-video, structured scene descriptions with RAG indexing, and general-purpose visual change detection.",
1938
+ "category": "media",
1939
+ "namespace": "wunderland",
1940
+ "verified": true,
1941
+ "source": "curated",
1942
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1943
+ "keywords": [
1944
+ "video",
1945
+ "generation",
1946
+ "analysis",
1947
+ "scene-detection",
1948
+ "RAG",
1949
+ "multimodal",
1950
+ "runway",
1951
+ "replicate",
1952
+ "fal"
1953
+ ],
1954
+ "metadata": {
1955
+ "emoji": "🎬"
1956
+ }
1957
+ },
1921
1958
  {
1922
1959
  "id": "com.framers.skill.vision-ocr",
1923
1960
  "name": "vision-ocr",
1924
1961
  "displayName": "vision-ocr",
1925
- "version": "1.0.0",
1962
+ "version": "1.1.0",
1926
1963
  "path": "registry/curated/vision-ocr",
1927
- "description": "Extract text from images using OCR and vision AI",
1928
- "category": "uncategorized",
1964
+ "description": "Extract text from images using OCR and vision AI with the performOCR() high-level API or the full VisionPipeline.",
1965
+ "category": "vision",
1929
1966
  "namespace": "wunderland",
1930
1967
  "verified": true,
1931
1968
  "source": "curated",
1932
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1969
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1933
1970
  "keywords": [
1934
1971
  "vision",
1935
1972
  "ocr",
1936
1973
  "text-extraction",
1937
1974
  "document",
1938
1975
  "handwriting"
1976
+ ],
1977
+ "requiredTools": [
1978
+ "vision-pipeline"
1939
1979
  ]
1940
1980
  },
1941
1981
  {
@@ -1949,7 +1989,7 @@
1949
1989
  "namespace": "wunderland",
1950
1990
  "verified": true,
1951
1991
  "source": "curated",
1952
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1992
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1953
1993
  "keywords": [
1954
1994
  "voice",
1955
1995
  "speech",
@@ -1976,7 +2016,7 @@
1976
2016
  "namespace": "wunderland",
1977
2017
  "verified": true,
1978
2018
  "source": "curated",
1979
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2019
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1980
2020
  "keywords": [
1981
2021
  "voice",
1982
2022
  "stt",
@@ -2002,7 +2042,7 @@
2002
2042
  "namespace": "wunderland",
2003
2043
  "verified": true,
2004
2044
  "source": "curated",
2005
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2045
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2006
2046
  "keywords": [
2007
2047
  "weather",
2008
2048
  "forecast",
@@ -2028,7 +2068,7 @@
2028
2068
  "namespace": "wunderland",
2029
2069
  "verified": true,
2030
2070
  "source": "curated",
2031
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2071
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2032
2072
  "keywords": [
2033
2073
  "scraping",
2034
2074
  "browser",
@@ -2065,7 +2105,7 @@
2065
2105
  "namespace": "wunderland",
2066
2106
  "verified": true,
2067
2107
  "source": "curated",
2068
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2108
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2069
2109
  "keywords": [
2070
2110
  "search",
2071
2111
  "web",
@@ -2092,7 +2132,7 @@
2092
2132
  "namespace": "wunderland",
2093
2133
  "verified": true,
2094
2134
  "source": "curated",
2095
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2135
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2096
2136
  "keywords": [
2097
2137
  "transcription",
2098
2138
  "whisper",
@@ -2155,7 +2195,7 @@
2155
2195
  "namespace": "wunderland",
2156
2196
  "verified": true,
2157
2197
  "source": "curated",
2158
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2198
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2159
2199
  "keywords": [
2160
2200
  "youtube",
2161
2201
  "video",
@@ -2188,8 +2228,8 @@
2188
2228
  "community": []
2189
2229
  },
2190
2230
  "stats": {
2191
- "totalSkills": 65,
2192
- "curatedCount": 65,
2231
+ "totalSkills": 66,
2232
+ "curatedCount": 66,
2193
2233
  "communityCount": 0
2194
2234
  }
2195
2235
  }
@@ -1,90 +0,0 @@
1
- ---
2
- name: video-ingestion
3
- version: '1.0.0'
4
- description: Video processing for RAG — extract frames via vision pipeline + audio via STT, index into knowledge base.
5
- author: Wunderland
6
- namespace: wunderland
7
- category: productivity
8
- tags: [video, ffmpeg, frames, transcription, multimodal, RAG]
9
- requires_secrets: []
10
- requires_tools: []
11
- metadata:
12
- agentos:
13
- emoji: "\U0001F3AC"
14
- ---
15
-
16
- # Video Ingestion for Multimodal RAG
17
-
18
- Use this skill when the user wants to index video content into the agent's knowledge base so it can be searched and recalled during conversation.
19
-
20
- Video ingestion works through the `MultimodalMemoryBridge`, which orchestrates two parallel extraction pipelines and feeds the results into both the RAG vector store and (optionally) cognitive memory.
21
-
22
- ## How It Works
23
-
24
- 1. **Frame extraction** — ffmpeg samples frames at a configurable interval (default: 1 frame every 5 seconds). Each frame is passed to a vision-capable LLM (e.g. GPT-4o) which generates a text description. That description is embedded and indexed into the vector store with `modality: 'image'` metadata.
25
-
26
- 2. **Audio extraction** — ffmpeg demuxes the audio track and pipes it to the configured STT provider (e.g. Whisper). The resulting transcript is chunked, embedded, and indexed with `modality: 'audio'` metadata.
27
-
28
- 3. **Memory traces** — When cognitive memory is enabled, the bridge encodes both visual descriptions and audio transcript chunks as memory traces so the agent can recall video content during future conversations.
29
-
30
- ## When to Ingest Video vs. Just Extract Audio
31
-
32
- - **Ingest full video** when visual content matters: tutorials, screen recordings, product demos, surveillance, presentations with slides, anything where "what is shown" conveys information the transcript alone misses.
33
- - **Extract audio only** when the video is essentially a podcast, voice memo, meeting recording, or phone call where the visual track adds no information. Audio-only ingestion is faster, cheaper (no vision LLM calls), and produces smaller index footprints.
34
-
35
- If you are unsure, prefer full video ingestion. The frame extraction is lightweight and the vision descriptions are short — the marginal cost is small compared to the value of not losing visual context.
36
-
37
- ## Prerequisites
38
-
39
- - **ffmpeg** must be installed and on the system PATH. The bridge shells out to `ffmpeg` for frame and audio extraction. Without it, video ingestion will fail with a clear error.
40
- - A **vision-capable LLM** must be configured (OPENAI_API_KEY or equivalent) for frame description.
41
- - An **STT provider** must be configured for audio transcription.
42
-
43
- ## Usage
44
-
45
- Video ingestion is triggered through the `MultimodalMemoryBridge.ingestVideo()` method. When using the HTTP API, POST the video file to:
46
-
47
- ```
48
- POST /api/agentos/rag/multimodal/documents/ingest
49
- Content-Type: multipart/form-data
50
- ```
51
-
52
- with the video file in the `document` field. The system auto-detects video MIME types and routes to the video pipeline.
53
-
54
- Programmatic usage:
55
-
56
- ```typescript
57
- import { MultimodalMemoryBridge } from 'agentos/rag/multimodal';
58
-
59
- await bridge.ingestVideo(videoBuffer, {
60
- source: 'user-upload',
61
- fileName: 'meeting-2024-03-15.mp4',
62
- extractFrames: true, // default true
63
- frameIntervalSeconds: 10, // sample 1 frame every 10s (default 5)
64
- language: 'en', // STT language hint
65
- });
66
- ```
67
-
68
- ## Configuration Options
69
-
70
- | Option | Default | Description |
71
- |--------|---------|-------------|
72
- | `extractFrames` | `true` | Set `false` for audio-only ingestion |
73
- | `frameIntervalSeconds` | `5` | Seconds between sampled frames |
74
- | `language` | auto-detect | BCP-47 language code for STT |
75
- | `collection` | `'multimodal'` | Target vector store collection |
76
-
77
- ## Examples
78
-
79
- - "Ingest this tutorial video so I can search it later."
80
- - "Extract the audio from this meeting recording and add it to my knowledge base."
81
- - "Index this product demo video — I need to reference the UI screenshots shown at 2:30."
82
- - "Process all MP4 files in this folder and make them searchable."
83
-
84
- ## Constraints
85
-
86
- - ffmpeg must be installed. The system does not bundle or auto-install it.
87
- - Long videos (>1 hour) produce many frames; consider increasing `frameIntervalSeconds` to 15-30 for very long content.
88
- - Vision LLM calls are billed per frame. A 1-hour video at the default 5-second interval generates ~720 frames.
89
- - Supported container formats: MP4, MKV, WebM, AVI, MOV (anything ffmpeg can demux).
90
- - Video ingestion is not real-time; expect processing time proportional to video length.