@framers/agentos-skills-registry 0.12.0 → 0.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@framers/agentos-skills-registry",
3
- "version": "0.12.0",
3
+ "version": "0.14.0",
4
4
  "files": [
5
5
  "dist",
6
6
  "registry",
@@ -1,23 +1,153 @@
1
1
  ---
2
2
  name: multimodal-rag
3
- description: Index and search across text, images, audio, video, and PDFs
4
- version: 1.0.0
5
- tags: [rag, multimodal, image, audio, video, pdf, search]
6
- tools_required: [vision-pipeline]
3
+ version: '2.0.0'
4
+ description: Index and search across text, images, audio, video, and PDFs via the multimodal RAG pipeline and HTTP API.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: productivity
8
+ tags: [rag, multimodal, image, audio, video, pdf, search, indexing, memory]
9
+ requires_secrets: []
10
+ requires_tools: [vision-pipeline]
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F50D"
7
14
  ---
8
15
 
9
16
  # Multimodal RAG
10
17
 
11
- Index and retrieve content across all modalities -- text, images, audio, video, and PDFs. Images are described via vision AI, audio is transcribed, video frames are extracted, and PDFs are parsed. All content is embedded and searchable.
18
+ Use this skill when the user wants to index, search, or retrieve content across multiple modalities -- text, images, audio, video, and documents (PDF, DOCX, TXT, Markdown, CSV, JSON, XML). All non-text content is converted to a text representation (vision description, STT transcript, document parse) before embedding, so every modality is searchable with the same text query.
19
+
20
+ ## Architecture
21
+
22
+ ```
23
+ Image --> Vision LLM --> description --> embed --> vector store
24
+ Audio --> STT --> transcript --> embed --> vector store
25
+ Video --> ffmpeg (frames + audio) --> vision + STT --> vector store
26
+ PDF --> text extraction + chunking --> embed --> vector store
27
+ ```
28
+
29
+ When cognitive memory is enabled via `MultimodalMemoryBridge`, ingested content also creates memory traces so agents can recall multimodal content during conversation without an explicit search.
12
30
 
13
31
  ## Capabilities
14
- - **Image indexing**: Vision LLM describes -> embed -> search
15
- - **Audio indexing**: STT transcribes -> embed -> search
16
- - **Video indexing**: Frame extraction + audio transcription
17
- - **PDF indexing**: Text + embedded image extraction
18
- - **Cross-modal search**: Query returns results from all modalities
19
-
20
- ## Example
21
- "Find images related to quantum computing"
22
- "Search my audio recordings for mentions of the project deadline"
23
- "What does the diagram in page 3 of the PDF show?"
32
+
33
+ - **Image indexing**: Vision LLM describes the image; the description is embedded and searchable.
34
+ - **Audio indexing**: STT transcribes the audio; the transcript is chunked and searchable.
35
+ - **Video indexing**: Frame extraction (vision) + audio transcription (STT), both indexed.
36
+ - **Document indexing**: Text extracted and indexed from PDF, DOCX, TXT, Markdown, CSV, JSON, and XML.
37
+ - **Cross-modal search**: A single text query returns results from all modalities, ranked by relevance.
38
+ - **Query-by-image**: Upload an image to find similar indexed content.
39
+ - **Query-by-audio**: Upload audio to find related indexed content via transcript matching.
40
+
41
+ ## HTTP API Routes
42
+
43
+ All routes are mounted under `/api/agentos/rag/multimodal`. Ingestion routes accept `multipart/form-data`.
44
+
45
+ ### Ingest
46
+
47
+ | Method | Path | Field | Description |
48
+ |--------|------|-------|-------------|
49
+ | POST | `/images/ingest` | `image` | Ingest an image (max 15 MB). Vision LLM generates description. |
50
+ | POST | `/audio/ingest` | `audio` | Ingest audio (max 25 MB). STT generates transcript. |
51
+ | POST | `/documents/ingest` | `document` | Ingest a document (max 30 MB). Text extracted and chunked. |
52
+
53
+ Common form fields for all ingest routes:
54
+
55
+ | Field | Type | Description |
56
+ |-------|------|-------------|
57
+ | `collectionId` | string | Target collection (default: auto) |
58
+ | `assetId` | string | Optional custom ID for the asset |
59
+ | `category` | string | `conversation_memory`, `knowledge_base`, `user_notes`, `system`, `custom` |
60
+ | `tags` | string | Comma-separated or JSON array of tags |
61
+ | `metadata` | string | JSON object with arbitrary metadata |
62
+ | `storePayload` | boolean | Whether to store the raw binary (for later download) |
63
+ | `sourceUrl` | string | Original URL of the content |
64
+ | `textRepresentation` | string | Override auto-generated description/transcript |
65
+ | `userId` | string | Owner user ID |
66
+ | `agentId` | string | Owner agent ID |
67
+
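+ As a sketch, an image ingest call using `fetch` and `FormData` might look like this (the localhost base URL and `imageBytes` are placeholders; routes and field names follow the tables above):
+
+ ```typescript
+ // Sketch: ingest an image via the documented multipart route.
+ const form = new FormData();
+ form.append('image', new Blob([imageBytes], { type: 'image/png' }), 'photo.png');
+ form.append('collectionId', 'knowledge-base');
+ form.append('tags', 'product,catalog');
+ form.append('storePayload', 'true');
+
+ const res = await fetch('http://localhost:3000/api/agentos/rag/multimodal/images/ingest', {
+   method: 'POST',
+   body: form,
+ });
+ console.log(await res.json());
+ ```
+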
68
+ ### Query
69
+
70
+ | Method | Path | Body / Field | Description |
71
+ |--------|------|-------------|-------------|
72
+ | POST | `/query` | JSON body | Text query across all modalities |
73
+ | POST | `/images/query` | `image` field | Query by uploading an image |
74
+ | POST | `/audio/query` | `audio` field | Query by uploading audio |
75
+
76
+ Text query body:
77
+
78
+ ```json
79
+ {
80
+ "query": "quantum computing diagrams",
81
+ "modalities": ["image", "audio", "document"],
82
+ "collectionIds": ["knowledge-base"],
83
+ "topK": 10,
84
+ "includeMetadata": true
85
+ }
86
+ ```
87
+
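+ Posting that body with `fetch` might look like this (host is a placeholder; the ranked `results` field in the response is an assumption, not a documented contract):
+
+ ```typescript
+ // Sketch: cross-modal text query over the documented route.
+ const res = await fetch('http://localhost:3000/api/agentos/rag/multimodal/query', {
+   method: 'POST',
+   headers: { 'Content-Type': 'application/json' },
+   body: JSON.stringify({ query: 'quantum computing diagrams', topK: 10 }),
+ });
+ const { results } = await res.json(); // assumed response shape
+ ```
+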
88
+ Image/audio query form fields:
89
+
90
+ | Field | Type | Description |
91
+ |-------|------|-------------|
92
+ | `modalities` | string | Comma-separated: `image`, `audio`, `document` |
93
+ | `collectionIds` | string | Comma-separated collection IDs to search |
94
+ | `topK` | number | Max results (default: 5) |
95
+ | `includeMetadata` | boolean | Include stored metadata in results |
96
+ | `retrievalMode` | string | `auto` (default), `text`, `native`, `hybrid` |
97
+
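+ For example, a query-by-image call (see Retrieval Modes below for the `retrievalMode` values; the host and `queryBytes` are placeholders):
+
+ ```typescript
+ // Sketch: query by image with hybrid retrieval.
+ const form = new FormData();
+ form.append('image', new Blob([queryBytes], { type: 'image/jpeg' }), 'query.jpg');
+ form.append('modalities', 'image,document');
+ form.append('topK', '10');
+ form.append('retrievalMode', 'hybrid');
+
+ const res = await fetch('http://localhost:3000/api/agentos/rag/multimodal/images/query', {
+   method: 'POST',
+   body: form,
+ });
+ ```
+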
98
+ ### Asset Management
99
+
100
+ | Method | Path | Description |
101
+ |--------|------|-------------|
102
+ | GET | `/assets/:assetId` | Get asset metadata |
103
+ | GET | `/assets/:assetId/content` | Download raw binary (if `storePayload` was true) |
104
+ | DELETE | `/assets/:assetId` | Delete asset and its embeddings |
105
+
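+ A minimal sketch of the asset lifecycle (host and `assetId` are placeholders):
+
+ ```typescript
+ const base = 'http://localhost:3000/api/agentos/rag/multimodal';
+ const meta = await (await fetch(`${base}/assets/${assetId}`)).json();          // metadata
+ const blob = await (await fetch(`${base}/assets/${assetId}/content`)).blob();  // raw binary
+ await fetch(`${base}/assets/${assetId}`, { method: 'DELETE' });                // remove
+ ```
+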
106
+ ## Retrieval Modes
107
+
108
+ - **`auto`** (default) — Text-first retrieval with native augmentation when available.
109
+ - **`text`** — Derive a caption/transcript and query the standard text pipeline only.
110
+ - **`native`** — Use modality-native embeddings (e.g. CLIP for images) when available.
111
+ - **`hybrid`** — Combine text and native retrieval, merge and re-rank results.
112
+
113
+ ## Programmatic Usage
114
+
115
+ ```typescript
116
+ import { MultimodalMemoryBridge } from 'agentos/rag/multimodal';
117
+
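+ // NOTE: `bridge` (a MultimodalMemoryBridge instance) and the `indexer` used
+ // below are assumed to be constructed elsewhere; setup is omitted in this sketch.
+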
118
+ // Ingest an image
119
+ await bridge.ingestImage(imageBuffer, { source: 'upload', tags: ['product'] });
120
+
121
+ // Ingest audio
122
+ await bridge.ingestAudio(audioBuffer, { language: 'en' });
123
+
124
+ // Ingest video (requires ffmpeg)
125
+ await bridge.ingestVideo(videoBuffer, { extractFrames: true });
126
+
127
+ // Ingest PDF
128
+ await bridge.ingestPDF(pdfBuffer, { extractImages: true });
129
+
130
+ // Cross-modal search
131
+ const results = await indexer.search('quantum computing', {
132
+ topK: 10,
133
+ modalities: ['image', 'text', 'audio'],
134
+ });
135
+ ```
136
+
137
+ ## Examples
138
+
139
+ - "Index this product photo so I can find it by description later."
140
+ - "Ingest all the PDFs in this folder into my knowledge base."
141
+ - "Search my audio recordings for mentions of the quarterly budget."
142
+ - "Find images related to the network architecture diagram."
143
+ - "What does the chart on page 5 of the annual report show?"
144
+ - "Upload this meeting recording and make it searchable."
145
+
146
+ ## Constraints
147
+
148
+ - Image uploads are capped at 15 MB, audio at 25 MB, documents at 30 MB.
149
+ - Supported audio formats: MP3, MP4, M4A, WAV, WebM, OGG (Whisper-compatible).
150
+ - Supported document formats: PDF, DOCX, TXT, Markdown, CSV, JSON, XML.
151
+ - Video ingestion requires ffmpeg installed on the system.
152
+ - A vision LLM and an STT provider must be configured for image and audio indexing, respectively.
153
+ - Cross-modal search ranks by cosine similarity of embedded text representations; it does not perform true multimodal embedding fusion unless `retrievalMode: 'native'` is used with a CLIP-like model.
@@ -0,0 +1,210 @@
1
+ ---
2
+ name: video-generation
3
+ version: '1.0.0'
4
+ description: Video generation, analysis, and scene detection — text-to-video, image-to-video, structured scene descriptions with RAG indexing, and general-purpose visual change detection.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: media
8
+ tags: [video, generation, analysis, scene-detection, RAG, multimodal, runway, replicate, fal]
9
+ requires_secrets: []
10
+ requires_tools: []
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F3AC"
14
+ ---
15
+
16
+ # Video Generation, Analysis & Scene Detection
17
+
18
+ Use this skill when the user wants to create AI-generated videos, analyse existing video content for structured scene descriptions, or detect visual changes in live/recorded frame streams.
19
+
20
+ This skill covers three complementary APIs:
21
+
22
+ 1. **generateVideo()** — Text-to-video and image-to-video generation
23
+ 2. **analyzeVideo()** — Structured video analysis with scene descriptions, transcription, and optional RAG indexing
24
+ 3. **detectScenes()** — Real-time or batch scene boundary detection from frame streams
25
+
26
+ ## Video Generation
27
+
28
+ ### Text-to-Video
29
+
30
+ Generate a video from a text prompt. The system auto-detects the best available provider from environment variables in priority order: `RUNWAY_API_KEY` (highest quality), `REPLICATE_API_TOKEN` (widest model variety), `FAL_API_KEY` (fast serverless GPU).
31
+
32
+ ```typescript
33
+ import { generateVideo } from 'agentos';
34
+
35
+ const result = await generateVideo({
36
+ prompt: 'A drone flying over a misty forest at sunrise, cinematic 4K',
37
+ durationSec: 5,
38
+ aspectRatio: '16:9',
39
+ });
40
+ console.log(result.videos[0].url);
41
+ ```
42
+
43
+ ### Image-to-Video
44
+
45
+ Animate a still image by providing it as a Buffer via `opts.image`. The prompt describes the desired motion rather than the scene itself.
46
+
47
+ ```typescript
48
+ import { generateVideo } from 'agentos';
49
+ import { readFileSync } from 'fs';
50
+
51
+ const result = await generateVideo({
52
+ prompt: 'Camera slowly zooms out, gentle wind moves the leaves',
53
+ image: readFileSync('landscape.png'),
54
+ provider: 'runway',
55
+ });
56
+ ```
57
+
58
+ ### Provider Selection
59
+
60
+ | Provider | Best For | Env Var |
61
+ |----------|----------|---------|
62
+ | **Runway** | Highest quality, cinematic output, image-to-video | `RUNWAY_API_KEY` |
63
+ | **Replicate** | Widest model variety (Kling, HunyuanVideo, MiniMax), open-source models | `REPLICATE_API_TOKEN` |
64
+ | **Fal** | Fast serverless GPU, cost-effective, Kling/CogVideo | `FAL_API_KEY` |
65
+
66
+ When multiple provider API keys are set, the system wraps the primary in a `FallbackVideoProxy` so a transient failure on one provider automatically retries on the next.
67
+
68
+ To force a specific provider:
69
+
70
+ ```typescript
71
+ const result = await generateVideo({
72
+ prompt: 'A cat playing piano',
73
+ provider: 'replicate',
74
+ model: 'klingai/kling-v1',
75
+ apiKey: 'your-replicate-token',
76
+ });
77
+ ```
78
+
79
+ ### Prompt Tips for Video
80
+
81
+ - **Be specific about motion**: "camera pans left to right", "person walks toward camera", "time-lapse of clouds moving"
82
+ - **Specify style early**: "cinematic 4K", "hand-drawn animation", "vintage film grain"
83
+ - **Keep prompts concise**: Video models respond best to clear, focused descriptions (1-3 sentences)
84
+ - **Use negative prompts** to avoid unwanted artifacts: `negativePrompt: 'blurry, distorted faces, watermark'`
85
+
86
+ ### Image-to-Video Motion Strength
87
+
88
+ When doing image-to-video, the prompt controls how much the image changes:
89
+
90
+ - **Gentle motion**: "subtle camera drift", "soft wind blowing through hair" — minimal departure from source
91
+ - **Moderate motion**: "person turns head and smiles", "camera orbits subject" — clear movement while preserving subject
92
+ - **Strong motion**: "explosion of confetti", "character runs toward camera" — significant scene change
93
+
94
+ Motion-strength interpretation varies by provider. Runway tends to be conservative (good for preserving the source image), while Replicate/Fal models may be more aggressive. Start with gentle prompts and increase intensity as needed.
95
+
96
+ ## Video Analysis
97
+
98
+ ### Structured Scene Analysis
99
+
100
+ Analyse a video to extract structured scene descriptions, detected objects, on-screen text, and optional audio transcription.
101
+
102
+ ```typescript
103
+ import { analyzeVideo } from 'agentos';
104
+
105
+ const analysis = await analyzeVideo({
106
+ videoUrl: 'https://example.com/product-demo.mp4',
107
+ prompt: 'Identify all products shown and their key features',
108
+ transcribeAudio: true,
109
+ descriptionDetail: 'detailed',
110
+ });
111
+
112
+ console.log(analysis.description);
113
+ for (const scene of analysis.scenes ?? []) {
114
+ console.log(`[${scene.startSec}s - ${scene.endSec}s] ${scene.description}`);
115
+ }
116
+ ```
117
+
118
+ ### RAG Integration
119
+
120
+ Enable `indexForRAG: true` to automatically index scene descriptions and transcripts into the vector store for later retrieval. This is especially useful for building searchable video libraries.
121
+
122
+ ```typescript
123
+ const analysis = await analyzeVideo({
124
+ videoBuffer: videoData,
125
+ indexForRAG: true,
126
+ descriptionDetail: 'detailed',
127
+ transcribeAudio: true,
128
+ });
129
+
130
+ // Scene descriptions and transcripts are now searchable via RAG
131
+ console.log(`Indexed ${analysis.ragChunkIds?.length ?? 0} chunks`);
132
+ ```
133
+
134
+ Each scene description becomes a separate vector chunk with metadata including timestamps, scene index, and cut type. This enables queries like "find the part where the presenter shows the pricing slide" to return precise timestamp ranges.
135
+
136
+ ### Analysis Options
137
+
138
+ | Option | Default | Description |
139
+ |--------|---------|-------------|
140
+ | `sceneThreshold` | `0.3` | Scene change sensitivity (0-1, lower = more scenes) |
141
+ | `transcribeAudio` | `true` | Transcribe audio via configured STT provider |
142
+ | `descriptionDetail` | `'detailed'` | `'brief'`, `'detailed'`, or `'exhaustive'` |
143
+ | `maxScenes` | `100` | Cap on detected scenes (prevents runaway on long videos) |
144
+ | `indexForRAG` | `false` | Index results into RAG vector store |
145
+
146
+ ## Scene Detection
147
+
148
+ ### Live Stream / Batch Detection
149
+
150
+ Use `detectScenes()` for real-time visual change detection on frame streams. It returns an AsyncGenerator that yields `SceneBoundary` objects as visual discontinuities are detected.
151
+
152
+ ```typescript
153
+ import { detectScenes } from 'agentos';
154
+
155
+ // From a pre-recorded video (frames extracted via ffmpeg)
156
+ for await (const boundary of detectScenes({ frames: extractedFrameStream })) {
157
+ console.log(`Scene ${boundary.index} at ${boundary.startTimeSec}s`);
158
+ console.log(` Type: ${boundary.cutType}, Confidence: ${boundary.confidence}`);
159
+ }
160
+ ```
161
+
162
+ ### Use Cases
163
+
164
+ - **Webcam / security camera**: Detect motion or scene changes in real-time surveillance feeds
165
+ - **Screen recording**: Identify slide transitions in presentations, page changes in demos
166
+ - **Video editing**: Automatically segment raw footage at cut points
167
+ - **Content moderation**: Flag rapid scene changes that may indicate problematic content
168
+
169
+ ### Configuration
170
+
171
+ ```typescript
172
+ for await (const boundary of detectScenes({
173
+ frames: webcamStream,
174
+ hardCutThreshold: 0.4, // Less sensitive to hard cuts
175
+ gradualThreshold: 0.15, // Standard sensitivity for dissolves/fades
176
+ minSceneDurationSec: 2.0, // Suppress very short scenes
177
+ methods: ['histogram'], // Fast histogram-only detection
178
+ })) {
179
+ handleSceneChange(boundary);
180
+ }
181
+ ```
182
+
183
+ ### Cut Type Classification
184
+
185
+ The detector classifies each scene boundary:
186
+
187
+ | Cut Type | Description |
188
+ |----------|-------------|
189
+ | `hard-cut` | Abrupt frame-to-frame change (most common) |
190
+ | `dissolve` | Cross-dissolve / superimposition transition |
191
+ | `fade` | Fade from/to black or white |
192
+ | `gradual` | Other gradual visual change |
193
+
194
+ ## Prerequisites
195
+
196
+ - At least one video provider API key for generation (`RUNWAY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`)
197
+ - **ffmpeg** on PATH for video analysis (frame extraction and audio demuxing)
198
+ - A vision-capable LLM (`OPENAI_API_KEY` or equivalent) for scene description
199
+ - An STT provider for audio transcription (when `transcribeAudio` is enabled)
200
+
201
+ Scene detection (`detectScenes()`) has zero external dependencies — it works purely on RGB pixel buffers.
202
+
203
+ ## Examples
204
+
205
+ - "Generate a 5-second cinematic video of a sunset over the ocean"
206
+ - "Turn this product photo into a video with a slow camera orbit"
207
+ - "Analyse this tutorial video and index it for search"
208
+ - "Detect scene changes in this security camera feed"
209
+ - "Extract structured scenes from this presentation recording"
210
+ - "Create a video from this image with gentle parallax motion"
@@ -1,22 +1,82 @@
1
1
  ---
2
2
  name: vision-ocr
3
- description: Extract text from images using OCR and vision AI
4
- version: 1.0.0
3
+ version: '1.1.0'
4
+ description: Extract text from images using OCR and vision AI with the performOCR() high-level API or the full VisionPipeline.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: vision
5
8
  tags: [vision, ocr, text-extraction, document, handwriting]
6
- tools_required: [vision-pipeline]
9
+ requires_secrets: []
10
+ requires_tools: [vision-pipeline]
7
11
  ---
8
12
 
9
13
  # Vision & OCR
10
14
 
11
- Extract text from images, documents, and handwritten notes using a progressive 3-tier pipeline: local OCR (PaddleOCR) -> local vision models (TrOCR, Florence-2) -> cloud vision (GPT-4o, Claude).
15
+ Extract text from images, documents, and handwritten notes using a progressive 3-tier pipeline: local OCR (PaddleOCR / Tesseract) -> local vision models (TrOCR, Florence-2) -> cloud vision LLM (GPT-4o, Claude, Gemini).
16
+
17
+ ## High-Level API: `performOCR()`
18
+
19
+ For one-shot text extraction, use the top-level `performOCR()` function. It handles input resolution, pipeline lifecycle, and cleanup automatically.
20
+
21
+ ```typescript
22
+ import { performOCR } from '@framers/agentos';
23
+
24
+ const result = await performOCR({
25
+ image: '/path/to/receipt.png', // file path, URL, base64, or Buffer
26
+ strategy: 'progressive', // 'progressive' | 'local-only' | 'cloud-only'
27
+ confidenceThreshold: 0.7, // min confidence before escalating tier
28
+ });
29
+
30
+ console.log(result.text); // extracted text
31
+ console.log(result.confidence); // 0–1 score
32
+ console.log(result.tier); // 'ocr' | 'handwriting' | 'document-ai' | 'cloud-vision'
33
+ console.log(result.provider); // 'paddle' | 'tesseract' | 'openai' | etc.
34
+ console.log(result.regions); // bounding boxes (when available)
35
+ ```
36
+
37
+ ## When to use `performOCR()` vs `VisionPipeline`
38
+
39
+ | Use case | Recommendation |
40
+ |----------|---------------|
41
+ | One-shot text extraction from a single image | `performOCR()` — simplest API |
42
+ | Batch processing many images | `VisionPipeline` — create once, reuse, dispose when done |
43
+ | Need CLIP embeddings or document layout | `VisionPipeline` — richer result shape |
44
+ | Quick scripts and integrations | `performOCR()` — zero boilerplate |
45
+
46
+ ## Progressive Tier System
47
+
48
+ The pipeline tries the cheapest/fastest tier first and only escalates when confidence is below threshold:
49
+
50
+ 1. **Tier 1 — Local OCR** (PaddleOCR or Tesseract.js): Fast, free, offline. Handles printed text in documents, receipts, screenshots.
51
+ 2. **Tier 2 — Local Vision Models** (TrOCR / Florence-2): Still offline. Handles handwritten notes, complex document layouts with tables and figures.
52
+ 3. **Tier 3 — Cloud Vision LLM** (GPT-4o / Claude / Gemini): Best quality. Handles photographs, diagrams, mixed content, anything the local tiers can't confidently read.
53
+
54
+ ## Strategy Selection
55
+
56
+ - **`'progressive'`** (default): Start local, escalate only if needed. Best cost/quality balance for most use cases.
57
+ - **`'local-only'`**: Never call cloud APIs. Use for air-gapped environments, privacy-sensitive data (medical records, financial docs), or when no API keys are available.
58
+ - **`'cloud-only'`**: Skip local tiers entirely, send straight to a cloud vision LLM. Use when you need the highest quality output and cost is not a concern.
59
+
60
+ ## Input Formats
61
+
62
+ `performOCR()` accepts four input types:
63
+
64
+ - **File path**: `'/tmp/scan.png'` — reads from disk
65
+ - **URL**: `'https://example.com/receipt.jpg'` — fetches via HTTP
66
+ - **Base64 string**: Raw base64 or `data:image/png;base64,...` data URIs — decoded in-memory
67
+ - **Buffer**: Raw image bytes — passed directly to the pipeline
12
68
 
13
69
  ## Capabilities
14
- - **Printed text OCR**: Extract text from documents, receipts, screenshots
70
+
71
+ - **Printed text OCR**: Extract text from documents, receipts, screenshots, PDFs
15
72
  - **Handwriting recognition**: Read handwritten notes and forms via TrOCR
16
- - **Document layout**: Understand tables, figures, headings via Florence-2
17
- - **Image embeddings**: Generate CLIP vectors for semantic image search
73
+ - **Document layout understanding**: Parse tables, figures, headings via Florence-2
74
+ - **Bounding box regions**: Spatial text locations for overlay rendering
75
+ - **Image embeddings**: Generate CLIP vectors for semantic image search (via `VisionPipeline` only)
76
+
77
+ ## Examples
18
78
 
19
- ## Example
20
- "Read the text from this receipt"
21
- "What does this handwritten note say?"
22
- "Extract the table data from this PDF page"
79
+ - "Read the text from this receipt"
80
+ - "What does this handwritten note say?"
81
+ - "Extract the table data from this PDF page"
82
+ - "OCR this screenshot and return the error message"
package/registry.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "version": "1.0.0",
3
- "updated": "2026-03-27T01:15:04.856Z",
3
+ "updated": "2026-03-27T07:00:16.599Z",
4
4
  "categories": {
5
5
  "curated": [
6
6
  "1password",
@@ -60,6 +60,7 @@
60
60
  "topicality",
61
61
  "trello",
62
62
  "twitter-bot",
63
+ "video-generation",
63
64
  "vision-ocr",
64
65
  "voice-conversation",
65
66
  "vosk",
@@ -84,7 +85,7 @@
84
85
  "namespace": "wunderland",
85
86
  "verified": true,
86
87
  "source": "curated",
87
- "verifiedAt": "2026-03-27T01:15:04.856Z",
88
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
88
89
  "keywords": [
89
90
  "1password",
90
91
  "passwords",
@@ -129,7 +130,7 @@
129
130
  "namespace": "wunderland",
130
131
  "verified": true,
131
132
  "source": "curated",
132
- "verifiedAt": "2026-03-27T01:15:04.856Z",
133
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
133
134
  "keywords": [
134
135
  "accounts",
135
136
  "credentials",
@@ -165,7 +166,7 @@
165
166
  "namespace": "wunderland",
166
167
  "verified": true,
167
168
  "source": "curated",
168
- "verifiedAt": "2026-03-27T01:15:04.856Z",
169
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
169
170
  "keywords": [
170
171
  "agent",
171
172
  "config",
@@ -186,7 +187,7 @@
186
187
  "namespace": "wunderland",
187
188
  "verified": true,
188
189
  "source": "curated",
189
- "verifiedAt": "2026-03-27T01:15:04.856Z",
190
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
190
191
  "keywords": [
191
192
  "voice",
192
193
  "tts",
@@ -217,7 +218,7 @@
217
218
  "namespace": "wunderland",
218
219
  "verified": true,
219
220
  "source": "curated",
220
- "verifiedAt": "2026-03-27T01:15:04.856Z",
221
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
221
222
  "keywords": [
222
223
  "apple-notes",
223
224
  "macos",
@@ -251,7 +252,7 @@
251
252
  "namespace": "wunderland",
252
253
  "verified": true,
253
254
  "source": "curated",
254
- "verifiedAt": "2026-03-27T01:15:04.856Z",
255
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
255
256
  "keywords": [
256
257
  "apple-reminders",
257
258
  "macos",
@@ -286,7 +287,7 @@
286
287
  "namespace": "wunderland",
287
288
  "verified": true,
288
289
  "source": "curated",
289
- "verifiedAt": "2026-03-27T01:15:04.856Z",
290
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
290
291
  "keywords": [
291
292
  "blog",
292
293
  "publishing",
@@ -330,7 +331,7 @@
330
331
  "namespace": "wunderland",
331
332
  "verified": true,
332
333
  "source": "curated",
333
- "verifiedAt": "2026-03-27T01:15:04.856Z",
334
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
334
335
  "keywords": [
335
336
  "bluesky",
336
337
  "social-media",
@@ -370,7 +371,7 @@
370
371
  "namespace": "wunderland",
371
372
  "verified": true,
372
373
  "source": "curated",
373
- "verifiedAt": "2026-03-27T01:15:04.856Z",
374
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
374
375
  "keywords": [
375
376
  "cloud",
376
377
  "devops",
@@ -397,7 +398,7 @@
397
398
  "namespace": "wunderland",
398
399
  "verified": true,
399
400
  "source": "curated",
400
- "verifiedAt": "2026-03-27T01:15:04.856Z",
401
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
401
402
  "keywords": [
402
403
  "guardrails",
403
404
  "code-safety",
@@ -425,7 +426,7 @@
425
426
  "namespace": "wunderland",
426
427
  "verified": true,
427
428
  "source": "curated",
428
- "verifiedAt": "2026-03-27T01:15:04.856Z",
429
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
429
430
  "keywords": [
430
431
  "coding",
431
432
  "programming",
@@ -452,7 +453,7 @@
452
453
  "namespace": "wunderland",
453
454
  "verified": true,
454
455
  "source": "curated",
455
- "verifiedAt": "2026-03-27T01:15:04.856Z",
456
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
456
457
  "keywords": [
457
458
  "content",
458
459
  "writing",
@@ -482,7 +483,7 @@
482
483
  "namespace": "wunderland",
483
484
  "verified": true,
484
485
  "source": "curated",
485
- "verifiedAt": "2026-03-27T01:15:04.856Z",
486
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
486
487
  "keywords": [
487
488
  "research",
488
489
  "investigation",
@@ -520,7 +521,7 @@
520
521
  "namespace": "wunderland",
521
522
  "verified": true,
522
523
  "source": "curated",
523
- "verifiedAt": "2026-03-27T01:15:04.856Z",
524
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
524
525
  "keywords": [
525
526
  "voice",
526
527
  "diarization",
@@ -545,7 +546,7 @@
545
546
  "namespace": "wunderland",
546
547
  "verified": true,
547
548
  "source": "curated",
548
- "verifiedAt": "2026-03-27T01:15:04.856Z",
549
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
549
550
  "keywords": [
550
551
  "discord",
551
552
  "messaging",
@@ -573,7 +574,7 @@
573
574
  "namespace": "wunderland",
574
575
  "verified": true,
575
576
  "source": "curated",
576
- "verifiedAt": "2026-03-27T01:15:04.856Z",
577
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
577
578
  "metadata": {
578
579
  "primaryEnv": "INTERNAL_API_SECRET",
579
580
  "emoji": "📧",
@@ -614,7 +615,7 @@
614
615
  "namespace": "wunderland",
615
616
  "verified": true,
616
617
  "source": "curated",
617
- "verifiedAt": "2026-03-27T01:15:04.856Z",
618
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
618
619
  "keywords": [
619
620
  "emergent",
620
621
  "tools",
@@ -642,7 +643,7 @@
642
643
  "namespace": "wunderland",
643
644
  "verified": true,
644
645
  "source": "curated",
645
- "verifiedAt": "2026-03-27T01:15:04.856Z",
646
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
646
647
  "keywords": [
647
648
  "voice",
648
649
  "endpointing",
@@ -668,7 +669,7 @@
668
669
  "namespace": "wunderland",
669
670
  "verified": true,
670
671
  "source": "curated",
671
- "verifiedAt": "2026-03-27T01:15:04.856Z",
672
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
672
673
  "keywords": [
673
674
  "facebook",
674
675
  "social-media",
@@ -707,7 +708,7 @@
707
708
  "namespace": "wunderland",
708
709
  "verified": true,
709
710
  "source": "curated",
710
- "verifiedAt": "2026-03-27T01:15:04.856Z",
711
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
711
712
  "keywords": [
712
713
  "git",
713
714
  "version-control",
@@ -761,7 +762,7 @@
761
762
  "namespace": "wunderland",
762
763
  "verified": true,
763
764
  "source": "curated",
764
- "verifiedAt": "2026-03-27T01:15:04.856Z",
765
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
765
766
  "keywords": [
766
767
  "github",
767
768
  "git",
@@ -820,7 +821,7 @@
820
821
  "namespace": "wunderland",
821
822
  "verified": true,
822
823
  "source": "curated",
823
- "verifiedAt": "2026-03-27T01:15:04.856Z",
824
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
824
825
  "keywords": [
825
826
  "voice",
826
827
  "stt",
@@ -849,7 +850,7 @@
849
850
  "namespace": "wunderland",
850
851
  "verified": true,
851
852
  "source": "curated",
852
- "verifiedAt": "2026-03-27T01:15:04.856Z",
853
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
853
854
  "keywords": [
854
855
  "voice",
855
856
  "tts",
@@ -878,7 +879,7 @@
878
879
  "namespace": "wunderland",
879
880
  "verified": true,
880
881
  "source": "curated",
881
- "verifiedAt": "2026-03-27T01:15:04.856Z",
882
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
882
883
  "keywords": [
883
884
  "guardrails",
884
885
  "hallucination",
@@ -906,7 +907,7 @@
906
907
  "namespace": "wunderland",
907
908
  "verified": true,
908
909
  "source": "curated",
909
- "verifiedAt": "2026-03-27T01:15:04.856Z",
910
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
910
911
  "keywords": [
911
912
  "monitoring",
912
913
  "health",
@@ -939,7 +940,7 @@
939
940
  "namespace": "wunderland",
940
941
  "verified": true,
941
942
  "source": "curated",
942
- "verifiedAt": "2026-03-27T01:15:04.856Z",
943
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
943
944
  "keywords": [
944
945
  "image",
945
946
  "editing",
@@ -953,25 +954,26 @@
953
954
  "id": "com.framers.skill.image-gen",
954
955
  "name": "image-gen",
955
956
  "displayName": "image-gen",
956
- "version": "1.0.0",
957
+ "version": "2.0.0",
957
958
  "path": "registry/curated/image-gen",
958
- "description": "Generate images from text prompts using AI image generation APIs like DALL-E, Stable Diffusion, or Midjourney.",
959
+ "description": "Generate, edit, upscale, and variate images using the AgentOS multi-provider image pipeline with automatic fallback.",
959
960
  "category": "creative",
960
961
  "namespace": "wunderland",
961
962
  "verified": true,
962
963
  "source": "curated",
963
- "verifiedAt": "2026-03-27T01:15:04.856Z",
964
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
964
965
  "keywords": [
965
966
  "image-generation",
966
967
  "ai-art",
967
968
  "dall-e",
968
969
  "stable-diffusion",
970
+ "flux",
971
+ "replicate",
972
+ "stability",
973
+ "fal",
969
974
  "creative",
970
975
  "visual"
971
976
  ],
972
- "requiredSecrets": [
973
- "openai.api_key"
974
- ],
975
977
  "requiredTools": [
976
978
  "generate_image"
977
979
  ],
@@ -992,7 +994,7 @@
992
994
  "namespace": "wunderland",
993
995
  "verified": true,
994
996
  "source": "curated",
995
- "verifiedAt": "2026-03-27T01:15:04.856Z",
997
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
996
998
  "keywords": [
997
999
  "instagram",
998
1000
  "social-media",
@@ -1030,7 +1032,7 @@
1030
1032
  "namespace": "wunderland",
1031
1033
  "verified": true,
1032
1034
  "source": "curated",
1033
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1035
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1034
1036
  "keywords": [
1035
1037
  "linkedin",
1036
1038
  "social-media",
@@ -1068,7 +1070,7 @@
1068
1070
  "namespace": "wunderland",
1069
1071
  "verified": true,
1070
1072
  "source": "curated",
1071
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1073
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1072
1074
  "keywords": [
1073
1075
  "mastodon",
1074
1076
  "fediverse",
@@ -1108,7 +1110,7 @@
1108
1110
  "namespace": "wunderland",
1109
1111
  "verified": true,
1110
1112
  "source": "curated",
1111
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1113
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1112
1114
  "keywords": [
1113
1115
  "memory",
1114
1116
  "cognitive",
@@ -1132,7 +1134,7 @@
1132
1134
  "namespace": "wunderland",
1133
1135
  "verified": true,
1134
1136
  "source": "curated",
1135
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1137
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1136
1138
  "keywords": [
1137
1139
  "guardrails",
1138
1140
  "safety",
@@ -1155,14 +1157,14 @@
1155
1157
  "id": "com.framers.skill.multimodal-rag",
1156
1158
  "name": "multimodal-rag",
1157
1159
  "displayName": "multimodal-rag",
1158
- "version": "1.0.0",
1160
+ "version": "2.0.0",
1159
1161
  "path": "registry/curated/multimodal-rag",
1160
- "description": "Index and search across text, images, audio, video, and PDFs",
1161
- "category": "uncategorized",
1162
+ "description": "Index and search across text, images, audio, video, and PDFs via the multimodal RAG pipeline and HTTP API.",
1163
+ "category": "productivity",
1162
1164
  "namespace": "wunderland",
1163
1165
  "verified": true,
1164
1166
  "source": "curated",
1165
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1167
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1166
1168
  "keywords": [
1167
1169
  "rag",
1168
1170
  "multimodal",
@@ -1170,8 +1172,16 @@
1170
1172
  "audio",
1171
1173
  "video",
1172
1174
  "pdf",
1173
- "search"
1174
- ]
1175
+ "search",
1176
+ "indexing",
1177
+ "memory"
1178
+ ],
1179
+ "requiredTools": [
1180
+ "vision-pipeline"
1181
+ ],
1182
+ "metadata": {
1183
+ "emoji": "🔍"
1184
+ }
1175
1185
  },
1176
1186
  {
1177
1187
  "id": "com.framers.skill.notion",
@@ -1184,7 +1194,7 @@
1184
1194
  "namespace": "wunderland",
1185
1195
  "verified": true,
1186
1196
  "source": "curated",
1187
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1197
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1188
1198
  "keywords": [
1189
1199
  "notion",
1190
1200
  "wiki",
@@ -1213,7 +1223,7 @@
1213
1223
  "namespace": "wunderland",
1214
1224
  "verified": true,
1215
1225
  "source": "curated",
1216
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1226
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1217
1227
  "keywords": [
1218
1228
  "obsidian",
1219
1229
  "markdown",
@@ -1241,7 +1251,7 @@
1241
1251
  "namespace": "wunderland",
1242
1252
  "verified": true,
1243
1253
  "source": "curated",
1244
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1254
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1245
1255
  "keywords": [
1246
1256
  "voice",
1247
1257
  "wake-word",
@@ -1269,7 +1279,7 @@
1269
1279
  "namespace": "wunderland",
1270
1280
  "verified": true,
1271
1281
  "source": "curated",
1272
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1282
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1273
1283
  "keywords": [
1274
1284
  "pii",
1275
1285
  "privacy",
@@ -1300,7 +1310,7 @@
1300
1310
  "namespace": "wunderland",
1301
1311
  "verified": true,
1302
1312
  "source": "curated",
1303
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1313
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1304
1314
  "keywords": [
1305
1315
  "pinterest",
1306
1316
  "social-media",
@@ -1337,7 +1347,7 @@
1337
1347
  "namespace": "wunderland",
1338
1348
  "verified": true,
1339
1349
  "source": "curated",
1340
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1350
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1341
1351
  "keywords": [
1342
1352
  "voice",
1343
1353
  "tts",
@@ -1364,7 +1374,7 @@
1364
1374
  "namespace": "wunderland",
1365
1375
  "verified": true,
1366
1376
  "source": "curated",
1367
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1377
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1368
1378
  "keywords": [
1369
1379
  "voice",
1370
1380
  "wake-word",
@@ -1395,7 +1405,7 @@
1395
1405
  "namespace": "wunderland",
1396
1406
  "verified": true,
1397
1407
  "source": "curated",
1398
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1408
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1399
1409
  "keywords": [
1400
1410
  "reddit",
1401
1411
  "social-media",
@@ -1436,7 +1446,7 @@
1436
1446
  "namespace": "wunderland",
1437
1447
  "verified": true,
1438
1448
  "source": "curated",
1439
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1449
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1440
1450
  "keywords": [
1441
1451
  "seo",
1442
1452
  "link-building",
@@ -1471,7 +1481,7 @@
1471
1481
  "namespace": "wunderland",
1472
1482
  "verified": true,
1473
1483
  "source": "curated",
1474
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1484
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1475
1485
  "keywords": [
1476
1486
  "deploy",
1477
1487
  "cloud",
@@ -1507,7 +1517,7 @@
1507
1517
  "namespace": "wunderland",
1508
1518
  "verified": true,
1509
1519
  "source": "curated",
1510
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1520
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1511
1521
  "keywords": [
1512
1522
  "slack",
1513
1523
  "messaging",
@@ -1539,7 +1549,7 @@
1539
1549
  "namespace": "wunderland",
1540
1550
  "verified": true,
1541
1551
  "source": "curated",
1542
- "verifiedAt": "2026-03-27T01:15:04.856Z"
1552
+ "verifiedAt": "2026-03-27T07:00:16.599Z"
1543
1553
  },
1544
1554
  {
1545
1555
  "id": "com.framers.skill.spotify-player",
@@ -1552,7 +1562,7 @@
1552
1562
  "namespace": "wunderland",
1553
1563
  "verified": true,
1554
1564
  "source": "curated",
1555
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1565
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1556
1566
  "keywords": [
1557
1567
  "spotify",
1558
1568
  "music",
@@ -1587,7 +1597,7 @@
1587
1597
  "namespace": "wunderland",
1588
1598
  "verified": true,
1589
1599
  "source": "curated",
1590
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1600
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1591
1601
  "keywords": [
1592
1602
  "voice",
1593
1603
  "stt",
@@ -1617,7 +1627,7 @@
1617
1627
  "namespace": "wunderland",
1618
1628
  "verified": true,
1619
1629
  "source": "curated",
1620
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1630
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1621
1631
  "keywords": [
1622
1632
  "voice",
1623
1633
  "stt",
@@ -1648,7 +1658,7 @@
1648
1658
  "namespace": "wunderland",
1649
1659
  "verified": true,
1650
1660
  "source": "curated",
1651
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1661
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1652
1662
  "keywords": [
1653
1663
  "voice",
1654
1664
  "tts",
@@ -1678,7 +1688,7 @@
1678
1688
  "namespace": "wunderland",
1679
1689
  "verified": true,
1680
1690
  "source": "curated",
1681
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1691
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1682
1692
  "keywords": [
1683
1693
  "voice",
1684
1694
  "tts",
@@ -1707,7 +1717,7 @@
1707
1717
  "namespace": "wunderland",
1708
1718
  "verified": true,
1709
1719
  "source": "curated",
1710
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1720
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1711
1721
  "keywords": [
1712
1722
  "structured-output",
1713
1723
  "json",
@@ -1727,7 +1737,7 @@
1727
1737
  "namespace": "wunderland",
1728
1738
  "verified": true,
1729
1739
  "source": "curated",
1730
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1740
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1731
1741
  "keywords": [
1732
1742
  "summarization",
1733
1743
  "text-processing",
@@ -1753,7 +1763,7 @@
1753
1763
  "namespace": "wunderland",
1754
1764
  "verified": true,
1755
1765
  "source": "curated",
1756
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1766
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1757
1767
  "keywords": [
1758
1768
  "threads",
1759
1769
  "social-media",
@@ -1789,7 +1799,7 @@
1789
1799
  "namespace": "wunderland",
1790
1800
  "verified": true,
1791
1801
  "source": "curated",
1792
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1802
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1793
1803
  "keywords": [
1794
1804
  "tiktok",
1795
1805
  "video",
@@ -1826,7 +1836,7 @@
1826
1836
  "namespace": "wunderland",
1827
1837
  "verified": true,
1828
1838
  "source": "curated",
1829
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1839
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1830
1840
  "keywords": [
1831
1841
  "guardrails",
1832
1842
  "topics",
@@ -1853,7 +1863,7 @@
1853
1863
  "namespace": "wunderland",
1854
1864
  "verified": true,
1855
1865
  "source": "curated",
1856
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1866
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1857
1867
  "keywords": [
1858
1868
  "trello",
1859
1869
  "kanban",
@@ -1886,7 +1896,7 @@
1886
1896
  "namespace": "wunderland",
1887
1897
  "verified": true,
1888
1898
  "source": "curated",
1889
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1899
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1890
1900
  "keywords": [
1891
1901
  "twitter",
1892
1902
  "social-media",
@@ -1918,24 +1928,54 @@
1918
1928
  "primaryEnv": "TWITTER_BEARER_TOKEN"
1919
1929
  }
1920
1930
  },
1931
+ {
1932
+ "id": "com.framers.skill.video-generation",
1933
+ "name": "video-generation",
1934
+ "displayName": "video-generation",
1935
+ "version": "1.0.0",
1936
+ "path": "registry/curated/video-generation",
1937
+ "description": "Video generation, analysis, and scene detection — text-to-video, image-to-video, structured scene descriptions with RAG indexing, and general-purpose visual change detection.",
1938
+ "category": "media",
1939
+ "namespace": "wunderland",
1940
+ "verified": true,
1941
+ "source": "curated",
1942
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1943
+ "keywords": [
1944
+ "video",
1945
+ "generation",
1946
+ "analysis",
1947
+ "scene-detection",
1948
+ "RAG",
1949
+ "multimodal",
1950
+ "runway",
1951
+ "replicate",
1952
+ "fal"
1953
+ ],
1954
+ "metadata": {
1955
+ "emoji": "🎬"
1956
+ }
1957
+ },
1921
1958
  {
1922
1959
  "id": "com.framers.skill.vision-ocr",
1923
1960
  "name": "vision-ocr",
1924
1961
  "displayName": "vision-ocr",
1925
- "version": "1.0.0",
1962
+ "version": "1.1.0",
1926
1963
  "path": "registry/curated/vision-ocr",
1927
- "description": "Extract text from images using OCR and vision AI",
1928
- "category": "uncategorized",
1964
+ "description": "Extract text from images using OCR and vision AI with the performOCR() high-level API or the full VisionPipeline.",
1965
+ "category": "vision",
1929
1966
  "namespace": "wunderland",
1930
1967
  "verified": true,
1931
1968
  "source": "curated",
1932
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1969
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1933
1970
  "keywords": [
1934
1971
  "vision",
1935
1972
  "ocr",
1936
1973
  "text-extraction",
1937
1974
  "document",
1938
1975
  "handwriting"
1976
+ ],
1977
+ "requiredTools": [
1978
+ "vision-pipeline"
1939
1979
  ]
1940
1980
  },
1941
1981
  {
@@ -1949,7 +1989,7 @@
1949
1989
  "namespace": "wunderland",
1950
1990
  "verified": true,
1951
1991
  "source": "curated",
1952
- "verifiedAt": "2026-03-27T01:15:04.856Z",
1992
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1953
1993
  "keywords": [
1954
1994
  "voice",
1955
1995
  "speech",
@@ -1976,7 +2016,7 @@
1976
2016
  "namespace": "wunderland",
1977
2017
  "verified": true,
1978
2018
  "source": "curated",
1979
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2019
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
1980
2020
  "keywords": [
1981
2021
  "voice",
1982
2022
  "stt",
@@ -2002,7 +2042,7 @@
2002
2042
  "namespace": "wunderland",
2003
2043
  "verified": true,
2004
2044
  "source": "curated",
2005
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2045
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2006
2046
  "keywords": [
2007
2047
  "weather",
2008
2048
  "forecast",
@@ -2028,7 +2068,7 @@
2028
2068
  "namespace": "wunderland",
2029
2069
  "verified": true,
2030
2070
  "source": "curated",
2031
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2071
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2032
2072
  "keywords": [
2033
2073
  "scraping",
2034
2074
  "browser",
@@ -2065,7 +2105,7 @@
2065
2105
  "namespace": "wunderland",
2066
2106
  "verified": true,
2067
2107
  "source": "curated",
2068
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2108
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2069
2109
  "keywords": [
2070
2110
  "search",
2071
2111
  "web",
@@ -2092,7 +2132,7 @@
2092
2132
  "namespace": "wunderland",
2093
2133
  "verified": true,
2094
2134
  "source": "curated",
2095
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2135
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2096
2136
  "keywords": [
2097
2137
  "transcription",
2098
2138
  "whisper",
@@ -2155,7 +2195,7 @@
2155
2195
  "namespace": "wunderland",
2156
2196
  "verified": true,
2157
2197
  "source": "curated",
2158
- "verifiedAt": "2026-03-27T01:15:04.856Z",
2198
+ "verifiedAt": "2026-03-27T07:00:16.599Z",
2159
2199
  "keywords": [
2160
2200
  "youtube",
2161
2201
  "video",
@@ -2188,8 +2228,8 @@
2188
2228
  "community": []
2189
2229
  },
2190
2230
  "stats": {
2191
- "totalSkills": 65,
2192
- "curatedCount": 65,
2231
+ "totalSkills": 66,
2232
+ "curatedCount": 66,
2193
2233
  "communityCount": 0
2194
2234
  }
2195
2235
  }
@@ -1,90 +0,0 @@
1
- ---
2
- name: video-ingestion
3
- version: '1.0.0'
4
- description: Video processing for RAG — extract frames via vision pipeline + audio via STT, index into knowledge base.
5
- author: Wunderland
6
- namespace: wunderland
7
- category: productivity
8
- tags: [video, ffmpeg, frames, transcription, multimodal, RAG]
9
- requires_secrets: []
10
- requires_tools: []
11
- metadata:
12
- agentos:
13
- emoji: "\U0001F3AC"
14
- ---
15
-
16
- # Video Ingestion for Multimodal RAG
17
-
18
- Use this skill when the user wants to index video content into the agent's knowledge base so it can be searched and recalled during conversation.
19
-
20
- Video ingestion works through the `MultimodalMemoryBridge`, which orchestrates two parallel extraction pipelines and feeds the results into both the RAG vector store and (optionally) cognitive memory.
21
-
22
- ## How It Works
23
-
24
- 1. **Frame extraction** — ffmpeg samples frames at a configurable interval (default: 1 frame every 5 seconds). Each frame is passed to a vision-capable LLM (e.g. GPT-4o) which generates a text description. That description is embedded and indexed into the vector store with `modality: 'image'` metadata.
25
-
26
- 2. **Audio extraction** — ffmpeg demuxes the audio track and pipes it to the configured STT provider (e.g. Whisper). The resulting transcript is chunked, embedded, and indexed with `modality: 'audio'` metadata.
27
-
28
- 3. **Memory traces** — When cognitive memory is enabled, the bridge encodes both visual descriptions and audio transcript chunks as memory traces so the agent can recall video content during future conversations.
29
-
30
- ## When to Ingest Video vs. Just Extract Audio
31
-
32
- - **Ingest full video** when visual content matters: tutorials, screen recordings, product demos, surveillance, presentations with slides, anything where "what is shown" conveys information the transcript alone misses.
33
- - **Extract audio only** when the video is essentially a podcast, voice memo, meeting recording, or phone call where the visual track adds no information. Audio-only ingestion is faster, cheaper (no vision LLM calls), and produces smaller index footprints.
34
-
35
- If you are unsure, prefer full video ingestion. The frame extraction is lightweight and the vision descriptions are short — the marginal cost is small compared to the value of not losing visual context.
36
-
37
- ## Prerequisites
38
-
39
- - **ffmpeg** must be installed and on the system PATH. The bridge shells out to `ffmpeg` for frame and audio extraction. Without it, video ingestion will fail with a clear error.
40
- - A **vision-capable LLM** must be configured (OPENAI_API_KEY or equivalent) for frame description.
41
- - An **STT provider** must be configured for audio transcription.
42
-
43
- ## Usage
44
-
45
- Video ingestion is triggered through the `MultimodalMemoryBridge.ingestVideo()` method. When using the HTTP API, POST the video file to:
46
-
47
- ```
48
- POST /api/agentos/rag/multimodal/documents/ingest
49
- Content-Type: multipart/form-data
50
- ```
51
-
52
- with the video file in the `document` field. The system auto-detects video MIME types and routes to the video pipeline.
53
-
54
- Programmatic usage:
55
-
56
- ```typescript
57
- import { MultimodalMemoryBridge } from 'agentos/rag/multimodal';
58
-
59
- await bridge.ingestVideo(videoBuffer, {
60
- source: 'user-upload',
61
- fileName: 'meeting-2024-03-15.mp4',
62
- extractFrames: true, // default true
63
- frameIntervalSeconds: 10, // sample 1 frame every 10s (default 5)
64
- language: 'en', // STT language hint
65
- });
66
- ```
67
-
68
- ## Configuration Options
69
-
70
- | Option | Default | Description |
71
- |--------|---------|-------------|
72
- | `extractFrames` | `true` | Set `false` for audio-only ingestion |
73
- | `frameIntervalSeconds` | `5` | Seconds between sampled frames |
74
- | `language` | auto-detect | BCP-47 language code for STT |
75
- | `collection` | `'multimodal'` | Target vector store collection |
76
-
77
- ## Examples
78
-
79
- - "Ingest this tutorial video so I can search it later."
80
- - "Extract the audio from this meeting recording and add it to my knowledge base."
81
- - "Index this product demo video — I need to reference the UI screenshots shown at 2:30."
82
- - "Process all MP4 files in this folder and make them searchable."
83
-
84
- ## Constraints
85
-
86
- - ffmpeg must be installed. The system does not bundle or auto-install it.
87
- - Long videos (>1 hour) produce many frames; consider increasing `frameIntervalSeconds` to 15-30 for very long content.
88
- - Vision LLM calls are billed per frame. A 1-hour video at the default 5-second interval generates ~720 frames.
89
- - Supported container formats: MP4, MKV, WebM, AVI, MOV (anything ffmpeg can demux).
90
- - Video ingestion is not real-time; expect processing time proportional to video length.