@framers/agentos-skills 0.2.1 → 0.4.0
- package/CONTRIBUTING.md +231 -0
- package/README.md +99 -58
- package/package.json +22 -34
- package/registry/community/.gitkeep +0 -0
- package/registry/curated/1password/SKILL.md +53 -0
- package/registry/curated/account-manager/SKILL.md +60 -0
- package/registry/curated/agent-config/SKILL.md +22 -0
- package/registry/curated/amazon-polly/SKILL.md +74 -0
- package/registry/curated/apple-notes/SKILL.md +45 -0
- package/registry/curated/apple-reminders/SKILL.md +46 -0
- package/registry/curated/audio-generation/SKILL.md +231 -0
- package/registry/curated/blog-publisher/SKILL.md +110 -0
- package/registry/curated/bluesky-bot/SKILL.md +93 -0
- package/registry/curated/cli-tools/SKILL.md +137 -0
- package/registry/curated/cloud-ops/SKILL.md +124 -0
- package/registry/curated/code-safety/SKILL.md +42 -0
- package/registry/curated/coding-agent/SKILL.md +40 -0
- package/registry/curated/company-research/SKILL.md +46 -0
- package/registry/curated/content-creator/SKILL.md +53 -0
- package/registry/curated/deep-research/SKILL.md +56 -0
- package/registry/curated/diarization/SKILL.md +83 -0
- package/registry/curated/discord-helper/SKILL.md +43 -0
- package/registry/curated/document-export/SKILL.md +54 -0
- package/registry/curated/email-intelligence/SKILL.md +41 -0
- package/registry/curated/emergent-tools/SKILL.md +225 -0
- package/registry/curated/endpoint-semantic/SKILL.md +72 -0
- package/registry/curated/facebook-bot/SKILL.md +94 -0
- package/registry/curated/git/SKILL.md +49 -0
- package/registry/curated/github/SKILL.md +142 -0
- package/registry/curated/google-cloud-stt/SKILL.md +71 -0
- package/registry/curated/google-cloud-tts/SKILL.md +71 -0
- package/registry/curated/grounding-guard/SKILL.md +38 -0
- package/registry/curated/healthcheck/SKILL.md +43 -0
- package/registry/curated/image-editing/SKILL.md +25 -0
- package/registry/curated/image-gen/SKILL.md +141 -0
- package/registry/curated/instagram-bot/SKILL.md +60 -0
- package/registry/curated/interactive-widgets/SKILL.md +85 -0
- package/registry/curated/linkedin-bot/SKILL.md +86 -0
- package/registry/curated/mastodon-bot/SKILL.md +104 -0
- package/registry/curated/memory-manager/SKILL.md +127 -0
- package/registry/curated/ml-content-classifier/SKILL.md +38 -0
- package/registry/curated/movie-lookup/SKILL.md +48 -0
- package/registry/curated/multimodal-rag/SKILL.md +153 -0
- package/registry/curated/notion/SKILL.md +43 -0
- package/registry/curated/obsidian/SKILL.md +42 -0
- package/registry/curated/openwakeword/SKILL.md +75 -0
- package/registry/curated/pii-redaction/SKILL.md +56 -0
- package/registry/curated/pinterest-bot/SKILL.md +45 -0
- package/registry/curated/piper/SKILL.md +72 -0
- package/registry/curated/porcupine/SKILL.md +74 -0
- package/registry/curated/reddit-bot/SKILL.md +74 -0
- package/registry/curated/seo-campaign/SKILL.md +51 -0
- package/registry/curated/site-deploy/SKILL.md +119 -0
- package/registry/curated/slack-helper/SKILL.md +43 -0
- package/registry/curated/social-broadcast/SKILL.md +145 -0
- package/registry/curated/spotify-player/SKILL.md +45 -0
- package/registry/curated/streaming-stt-deepgram/SKILL.md +84 -0
- package/registry/curated/streaming-stt-whisper/SKILL.md +82 -0
- package/registry/curated/streaming-tts-elevenlabs/SKILL.md +84 -0
- package/registry/curated/streaming-tts-openai/SKILL.md +83 -0
- package/registry/curated/structured-output/SKILL.md +22 -0
- package/registry/curated/summarize/SKILL.md +40 -0
- package/registry/curated/threads-bot/SKILL.md +82 -0
- package/registry/curated/tiktok-bot/SKILL.md +104 -0
- package/registry/curated/topicality/SKILL.md +37 -0
- package/registry/curated/trello/SKILL.md +44 -0
- package/registry/curated/twitter-bot/SKILL.md +63 -0
- package/registry/curated/video-generation/SKILL.md +225 -0
- package/registry/curated/vision-ocr/SKILL.md +82 -0
- package/registry/curated/voice-conversation/SKILL.md +65 -0
- package/registry/curated/vosk/SKILL.md +74 -0
- package/registry/curated/weather/SKILL.md +37 -0
- package/registry/curated/web-scraper/SKILL.md +60 -0
- package/registry/curated/web-search/SKILL.md +49 -0
- package/registry/curated/whisper-transcribe/SKILL.md +58 -0
- package/registry/curated/youtube-bot/SKILL.md +104 -0
- package/registry.json +2446 -0
- package/scripts/update-registry.mjs +126 -0
- package/scripts/validate-skill.mjs +304 -0
- package/types.d.ts +160 -0
- package/dist/SkillLoader.d.ts +0 -50
- package/dist/SkillLoader.d.ts.map +0 -1
- package/dist/SkillLoader.js +0 -291
- package/dist/SkillLoader.js.map +0 -1
- package/dist/SkillRegistry.d.ts +0 -135
- package/dist/SkillRegistry.d.ts.map +0 -1
- package/dist/SkillRegistry.js +0 -455
- package/dist/SkillRegistry.js.map +0 -1
- package/dist/index.d.ts +0 -13
- package/dist/index.d.ts.map +0 -1
- package/dist/index.js +0 -13
- package/dist/index.js.map +0 -1
- package/dist/paths.d.ts +0 -35
- package/dist/paths.d.ts.map +0 -1
- package/dist/paths.js +0 -71
- package/dist/paths.js.map +0 -1
- package/dist/types.d.ts +0 -231
- package/dist/types.d.ts.map +0 -1
- package/dist/types.js +0 -21
- package/dist/types.js.map +0 -1
|
@@ -0,0 +1,44 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: trello
|
|
3
|
+
version: '1.0.0'
|
|
4
|
+
description: Manage Trello boards, lists, cards, checklists, and team workflows via the Trello API.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: productivity
|
|
8
|
+
tags: [trello, kanban, project-management, boards, tasks, workflow]
|
|
9
|
+
requires_secrets: [trello.api_key, trello.token]
|
|
10
|
+
requires_tools: []
|
|
11
|
+
metadata:
|
|
12
|
+
agentos:
|
|
13
|
+
emoji: "\U0001F4CB"
|
|
14
|
+
primaryEnv: TRELLO_API_KEY
|
|
15
|
+
secondaryEnvs: [TRELLO_TOKEN]
|
|
16
|
+
homepage: https://developer.atlassian.com/cloud/trello
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
# Trello Board Management
|
|
20
|
+
|
|
21
|
+
You can manage Trello boards, lists, and cards to organize projects and track workflows. Use the Trello REST API with API key and token authentication to perform board operations programmatically.
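A minimal sketch of what a key-and-token authenticated request might look like. It assumes Trello's public REST base URL `https://api.trello.com/1` and the `POST /cards` endpoint with `idList`, `name`, `desc`, and `due` parameters. The `buildCreateCardRequest` helper and its types are hypothetical and not part of this package; the helper only builds the request and does not send it.

```typescript
// Hypothetical helper: builds (but does not send) a Trello "create card" request.
interface TrelloCredentials {
  apiKey: string; // value of the trello.api_key secret
  token: string; // value of the trello.token secret
}

interface NewCard {
  idList: string; // ID of the list that receives the card
  name: string;
  desc?: string;
  due?: string; // ISO 8601 date string
}

interface HttpRequest {
  method: 'POST';
  url: string;
}

function buildCreateCardRequest(creds: TrelloCredentials, card: NewCard): HttpRequest {
  const params: Array<[string, string | undefined]> = [
    ['idList', card.idList],
    ['name', card.name],
    ['desc', card.desc],
    ['due', card.due],
    ['key', creds.apiKey],
    ['token', creds.token],
  ];
  // Skip optional fields that were not provided, then URL-encode the rest.
  const query = params
    .filter((p): p is [string, string] => p[1] !== undefined)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join('&');
  return { method: 'POST', url: `https://api.trello.com/1/cards?${query}` };
}
```

Keeping request construction separate from sending makes it easy to log or batch requests before any network call is made.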
|
|
22
|
+
|
|
23
|
+
When managing cards, always provide complete information: title, description, due dates, labels, and assigned members. Move cards between lists to reflect workflow progress (e.g., To Do -> In Progress -> Done). Create checklists on cards for multi-step tasks and update individual checklist items as they are completed. Use labels consistently with the board's established color/naming conventions.
|
|
24
|
+
|
|
25
|
+
For board operations, create new boards with predefined list structures that match common workflows (Backlog, To Do, In Progress, Review, Done). When querying boards, filter cards by list, label, member, or due date to provide focused views. Archive completed cards periodically to keep boards clean, but never delete cards without explicit user confirmation.
|
|
26
|
+
|
|
27
|
+
When organizing work, use card descriptions for detailed specifications, attach relevant files and links, and add comments for status updates and discussions. Support Power-Up integrations where applicable (Calendar, Custom Fields). Batch related operations together to minimize API calls and provide atomic updates.
|
|
28
|
+
|
|
29
|
+
## Examples
|
|
30
|
+
|
|
31
|
+
- "Create a new card 'Implement login page' in the To Do list with a checklist of subtasks"
|
|
32
|
+
- "Move all cards labeled 'urgent' to the top of the In Progress list"
|
|
33
|
+
- "Show me all cards assigned to me that are due this week"
|
|
34
|
+
- "Archive all cards in the Done list that were completed more than 30 days ago"
|
|
35
|
+
- "Add a comment to card #42: 'Blocked by API dependency -- see PR #15'"
|
|
36
|
+
|
|
37
|
+
## Constraints
|
|
38
|
+
|
|
39
|
+
- API rate limits: 100 requests per 10-second window per token.
|
|
40
|
+
- Each board is limited to 5,000 cards (including archived).
|
|
41
|
+
- Attachments are limited to 250 per card and 10 MB per file.
|
|
42
|
+
- Trello Automations (Butler rules) cannot be executed via the API; only manual operations are supported.
|
|
43
|
+
- Board templates and Power-Up configurations require additional API access.
|
|
44
|
+
- Webhook creation requires a publicly accessible callback URL.
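One way to stay under the rate limit listed above is a client-side sliding-window limiter. The sketch below is illustrative only: `SlidingWindowLimiter` is a hypothetical class, not something shipped with this skill, and the clock is passed in explicitly so the logic can be exercised without real timers.

```typescript
// Sliding-window limiter sketch: at most `limit` calls per `windowMs` milliseconds.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 if a request may be sent at `now` (and records it),
  // otherwise the number of milliseconds to wait before retrying.
  tryAcquire(now: number): number {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have left the window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return 0;
    }
    return this.timestamps[0] + this.windowMs - now;
  }
}

// Limit taken from the constraint above: 100 requests per 10-second window per token.
const trelloLimiter = new SlidingWindowLimiter(100, 10000);
```

A caller would typically invoke `trelloLimiter.tryAcquire(Date.now())` before each request and sleep for the returned delay when it is non-zero.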
|
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: twitter-bot
|
|
3
|
+
version: '1.0.0'
|
|
4
|
+
description: Automated Twitter/X engagement — personality-driven reply bot, thread creation, trending engagement, and analytics tracking.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: social-automation
|
|
8
|
+
tags: [twitter, social-media, engagement, reply-bot, threads, trending, automation]
|
|
9
|
+
requires_secrets: [twitter.bearerToken, twitter.apiKey, twitter.apiSecret, twitter.accessToken, twitter.accessSecret]
|
|
10
|
+
requires_tools: [twitterPost, twitterReply, twitterSearch, twitterTrending, twitterLike, twitterRetweet, twitterThread, twitterAnalytics]
|
|
11
|
+
metadata:
|
|
12
|
+
agentos:
|
|
13
|
+
emoji: "\U0001F426"
|
|
14
|
+
primaryEnv: TWITTER_BEARER_TOKEN
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
# Twitter Bot
|
|
18
|
+
|
|
19
|
+
You are an autonomous Twitter/X engagement agent. You can post tweets, reply to conversations, create threads, engage with trending topics, and track your performance analytics.
|
|
20
|
+
|
|
21
|
+
## Core Capabilities
|
|
22
|
+
|
|
23
|
+
- **Post tweets** with text, images, polls, and other media
|
|
24
|
+
- **Reply to tweets** in your agent's personality and voice
|
|
25
|
+
- **Quote tweet** with your commentary
|
|
26
|
+
- **Create threads** — multi-tweet story arcs or analysis
|
|
27
|
+
- **Engage** — like and retweet content aligned with your interests
|
|
28
|
+
- **Search** — find conversations and topics relevant to your expertise
|
|
29
|
+
- **Track trending** — discover what's hot and join relevant conversations
|
|
30
|
+
- **Analytics** — monitor engagement metrics on your posts
|
|
31
|
+
|
|
32
|
+
## Engagement Strategy
|
|
33
|
+
|
|
34
|
+
1. **Search for relevant topics** using your interests and expertise keywords
|
|
35
|
+
2. **Evaluate tweets** — only engage with content that aligns with your persona
|
|
36
|
+
3. **Reply thoughtfully** — add genuine value, don't just agree or promote
|
|
37
|
+
4. **Create original threads** on topics you have deep knowledge about
|
|
38
|
+
5. **Monitor trending topics** and join conversations where you can contribute
|
|
39
|
+
6. **Track analytics** to understand what resonates with your audience
|
|
40
|
+
|
|
41
|
+
## Personality Guidelines
|
|
42
|
+
|
|
43
|
+
- Stay in character — your HEXACO traits should influence your tone and approach
|
|
44
|
+
- High Openness agents: explore diverse topics, share novel perspectives
|
|
45
|
+
- High Agreeableness agents: be supportive, amplify others
|
|
46
|
+
- Low Agreeableness agents: challenge ideas constructively, debate
|
|
47
|
+
- High Conscientiousness agents: fact-check, provide sources, be thorough
|
|
48
|
+
|
|
49
|
+
## Rate Limits & Safety
|
|
50
|
+
|
|
51
|
+
- Respect Twitter API rate limits (300 tweets/3h, 1000 DMs/24h)
|
|
52
|
+
- Don't spam — minimum 5-minute gap between engagement bursts
|
|
53
|
+
- Avoid controversial or harmful content per your security tier
|
|
54
|
+
- Don't engage with bots or low-quality content
|
|
55
|
+
- Vary your engagement patterns to appear natural
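The minimum gap between engagement bursts can be enforced with a small gate like the one sketched here. `BurstGate` and `MIN_BURST_GAP_MS` are hypothetical names used for illustration; the five-minute value comes from the rule above.

```typescript
// Five-minute minimum gap between engagement bursts, per the rule above.
const MIN_BURST_GAP_MS = 5 * 60 * 1000;

// Hypothetical gate that tracks when the last burst started.
class BurstGate {
  private lastBurstAt: number | null = null;

  // True when no burst has run yet or the minimum gap has elapsed.
  canStartBurst(now: number): boolean {
    return this.lastBurstAt === null || now - this.lastBurstAt >= MIN_BURST_GAP_MS;
  }

  // Call once a burst actually begins.
  recordBurst(now: number): void {
    this.lastBurstAt = now;
  }
}
```

This covers only burst spacing; the platform-level rate limits would still need their own accounting.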
|
|
56
|
+
|
|
57
|
+
## Workflow
|
|
58
|
+
|
|
59
|
+
1. **Discover** → Search trends and relevant conversations
|
|
60
|
+
2. **Evaluate** → Score each opportunity for relevance and engagement potential
|
|
61
|
+
3. **Engage** → Reply, like, or retweet based on evaluation
|
|
62
|
+
4. **Create** → Post original content and threads on schedule
|
|
63
|
+
5. **Analyze** → Review performance and adjust strategy
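The Evaluate and Engage steps can be pictured as a scoring function followed by a threshold decision. The sketch below is one possible shape, not the skill's actual scoring logic: the `Opportunity` type, the 0.7/0.3 weighting, and the 0.6/0.3 cut-offs are all assumptions chosen for illustration.

```typescript
// Hypothetical shape of a tweet that surfaced during the Discover step.
interface Opportunity {
  text: string;
  likes: number;
  retweets: number;
}

// Fraction of interest keywords that appear in the tweet text (0..1).
function relevance(text: string, interests: string[]): number {
  if (interests.length === 0) return 0;
  const lower = text.toLowerCase();
  const hits = interests.filter((k) => lower.indexOf(k.toLowerCase()) !== -1).length;
  return hits / interests.length;
}

// Log-scaled engagement squashed into 0..1; about 1000 interactions saturates it.
function engagementPotential(o: Opportunity): number {
  const log10 = Math.log(1 + o.likes + o.retweets) / Math.LN10;
  return Math.min(1, log10 / 3);
}

// Assumed weighting: relevance matters more than raw popularity.
function score(o: Opportunity, interests: string[]): number {
  return 0.7 * relevance(o.text, interests) + 0.3 * engagementPotential(o);
}

type Action = 'reply' | 'like' | 'skip';

// Assumed cut-offs: reply to strong matches, like moderate ones, skip the rest.
function chooseAction(s: number): Action {
  if (s >= 0.6) return 'reply';
  if (s >= 0.3) return 'like';
  return 'skip';
}
```

The Analyze step could then feed observed performance back into the weights and cut-offs.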
|
|
@@ -0,0 +1,225 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: video-generation
|
|
3
|
+
version: '1.0.0'
|
|
4
|
+
description: Video generation, analysis, and scene detection — text-to-video, image-to-video, structured scene descriptions with RAG indexing, and general-purpose visual change detection.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: media
|
|
8
|
+
tags: [video, generation, analysis, scene-detection, RAG, multimodal, runway, replicate, fal]
|
|
9
|
+
requires_secrets: []
|
|
10
|
+
requires_tools: []
|
|
11
|
+
metadata:
|
|
12
|
+
agentos:
|
|
13
|
+
emoji: "\U0001F3AC"
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
# Video Generation, Analysis & Scene Detection
|
|
17
|
+
|
|
18
|
+
Use this skill when the user wants to create AI-generated videos, analyse existing video content for structured scene descriptions, or detect visual changes in live/recorded frame streams.
|
|
19
|
+
|
|
20
|
+
This skill covers three complementary APIs:
|
|
21
|
+
|
|
22
|
+
1. **generateVideo()** — Text-to-video and image-to-video generation
|
|
23
|
+
2. **analyzeVideo()** — Structured video analysis with scene descriptions, transcription, and optional RAG indexing
|
|
24
|
+
3. **detectScenes()** — Real-time or batch scene boundary detection from frame streams
|
|
25
|
+
|
|
26
|
+
## Video Generation
|
|
27
|
+
|
|
28
|
+
### Text-to-Video
|
|
29
|
+
|
|
30
|
+
Generate a video from a text prompt. The system auto-detects the best available provider from environment variables in priority order: `RUNWAY_API_KEY` (highest quality), `REPLICATE_API_TOKEN` (widest model variety), `FAL_API_KEY` (fast serverless GPU).
|
|
31
|
+
|
|
32
|
+
```typescript
|
|
33
|
+
import { generateVideo } from 'agentos';
|
|
34
|
+
|
|
35
|
+
const result = await generateVideo({
|
|
36
|
+
prompt: 'A drone flying over a misty forest at sunrise, cinematic 4K',
|
|
37
|
+
durationSec: 5,
|
|
38
|
+
aspectRatio: '16:9',
|
|
39
|
+
});
|
|
40
|
+
console.log(result.videos[0].url);
|
|
41
|
+
```
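The provider priority order described in this section can be illustrated with a small lookup. This is a sketch of the idea, not the library's internal detection code: `detectProviders` is a hypothetical helper, and the environment is passed in as a plain object rather than read from the process.

```typescript
type VideoProvider = 'runway' | 'replicate' | 'fal';

// Priority order as documented: Runway, then Replicate, then Fal.
const PROVIDER_ENV_VARS: Array<[VideoProvider, string]> = [
  ['runway', 'RUNWAY_API_KEY'],
  ['replicate', 'REPLICATE_API_TOKEN'],
  ['fal', 'FAL_API_KEY'],
];

// Returns every provider whose key is set and non-empty, highest priority first.
function detectProviders(env: Record<string, string | undefined>): VideoProvider[] {
  return PROVIDER_ENV_VARS
    .filter(([, name]) => (env[name] ?? '') !== '')
    .map(([provider]) => provider);
}
```

The first element of the returned list would be the auto-selected provider, and the remainder are candidates for fallback.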
|
|
42
|
+
|
|
43
|
+
### Image-to-Video
|
|
44
|
+
|
|
45
|
+
Animate a still image by providing it as a Buffer via `opts.image`. The prompt describes the desired motion rather than the scene itself.
|
|
46
|
+
|
|
47
|
+
```typescript
|
|
48
|
+
import { generateVideo } from 'agentos';
|
|
49
|
+
import { readFileSync } from 'fs';
|
|
50
|
+
|
|
51
|
+
const result = await generateVideo({
|
|
52
|
+
prompt: 'Camera slowly zooms out, gentle wind moves the leaves',
|
|
53
|
+
image: readFileSync('landscape.png'),
|
|
54
|
+
provider: 'runway',
|
|
55
|
+
});
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
### Provider Selection
|
|
59
|
+
|
|
60
|
+
| Provider | Best For | Env Var |
|
|
61
|
+
|----------|----------|---------|
|
|
62
|
+
| **Runway** | Highest quality, cinematic output, image-to-video | `RUNWAY_API_KEY` |
|
|
63
|
+
| **Replicate** | Widest model variety (Kling, HunyuanVideo, MiniMax), open-source models | `REPLICATE_API_TOKEN` |
|
|
64
|
+
| **Fal** | Fast serverless GPU, cost-effective, Kling/CogVideo | `FAL_API_KEY` |
|
|
65
|
+
|
|
66
|
+
When multiple provider API keys are set, the system wraps the primary in a `FallbackVideoProxy` so a transient failure on one provider automatically retries on the next.
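The fallback behaviour can be sketched as a loop over providers that returns the first success. The real `FallbackVideoProxy` interface may differ; `VideoBackend` and `generateWithFallback` below are hypothetical names, and the sketch treats every error as retryable, which a real implementation would probably refine.

```typescript
// Hypothetical provider shape; the real FallbackVideoProxy interface may differ.
interface VideoBackend {
  name: string;
  generate(prompt: string): Promise<string>; // resolves to a video URL
}

// Tries each backend in order and returns the first successful result.
async function generateWithFallback(backends: VideoBackend[], prompt: string): Promise<string> {
  const errors: string[] = [];
  for (const backend of backends) {
    try {
      return await backend.generate(prompt);
    } catch (err) {
      // Remember why this backend failed, then move on to the next one.
      errors.push(`${backend.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All video providers failed (${errors.join('; ')})`);
}
```

Collecting the per-provider errors keeps the final failure message useful when every provider is down.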
|
|
67
|
+
|
|
68
|
+
Use `providerPreferences` to reorder, block, or weight providers during auto-selection:
|
|
69
|
+
|
|
70
|
+
```typescript
|
|
71
|
+
const result = await generateVideo({
|
|
72
|
+
prompt: 'A cat playing piano',
|
|
73
|
+
timeoutMs: 180_000,
|
|
74
|
+
onProgress: (event) => console.log(event.status, event.message),
|
|
75
|
+
providerPreferences: {
|
|
76
|
+
preferred: ['runway', 'replicate'],
|
|
77
|
+
weights: { runway: 3, replicate: 1 },
|
|
78
|
+
blocked: ['fal'],
|
|
79
|
+
},
|
|
80
|
+
});
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
To force a specific provider:
|
|
84
|
+
|
|
85
|
+
```typescript
|
|
86
|
+
const result = await generateVideo({
|
|
87
|
+
prompt: 'A cat playing piano',
|
|
88
|
+
provider: 'replicate',
|
|
89
|
+
model: 'klingai/kling-v1',
|
|
90
|
+
apiKey: 'your-replicate-token',
|
|
91
|
+
});
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
### Prompt Tips for Video
|
|
95
|
+
|
|
96
|
+
- **Be specific about motion**: "camera pans left to right", "person walks toward camera", "time-lapse of clouds moving"
|
|
97
|
+
- **Specify style early**: "cinematic 4K", "hand-drawn animation", "vintage film grain"
|
|
98
|
+
- **Keep prompts concise**: Video models respond best to clear, focused descriptions (1-3 sentences)
|
|
99
|
+
- **Use negative prompts** to avoid unwanted artifacts: `negativePrompt: 'blurry, distorted faces, watermark'`
|
|
100
|
+
|
|
101
|
+
### Image-to-Video Motion Strength
|
|
102
|
+
|
|
103
|
+
When doing image-to-video, the prompt controls how much the image changes:
|
|
104
|
+
|
|
105
|
+
- **Gentle motion**: "subtle camera drift", "soft wind blowing through hair" — minimal departure from source
|
|
106
|
+
- **Moderate motion**: "person turns head and smiles", "camera orbits subject" — clear movement while preserving subject
|
|
107
|
+
- **Strong motion**: "explosion of confetti", "character runs toward camera" — significant scene change
|
|
108
|
+
|
|
109
|
+
Providers interpret motion strength differently. Runway tends to be conservative (good for preserving the source image), while Replicate/Fal models may be more aggressive. Start with gentle prompts and increase intensity as needed.
|
|
110
|
+
|
|
111
|
+
## Video Analysis
|
|
112
|
+
|
|
113
|
+
### Structured Scene Analysis
|
|
114
|
+
|
|
115
|
+
Analyse a video to extract structured scene descriptions, detected objects, on-screen text, and optional audio transcription.
|
|
116
|
+
|
|
117
|
+
```typescript
|
|
118
|
+
import { analyzeVideo } from 'agentos';
|
|
119
|
+
|
|
120
|
+
const analysis = await analyzeVideo({
|
|
121
|
+
videoUrl: 'https://example.com/product-demo.mp4',
|
|
122
|
+
prompt: 'Identify all products shown and their key features',
|
|
123
|
+
transcribeAudio: true,
|
|
124
|
+
descriptionDetail: 'detailed',
|
|
125
|
+
});
|
|
126
|
+
|
|
127
|
+
console.log(analysis.description);
|
|
128
|
+
for (const scene of analysis.scenes ?? []) {
|
|
129
|
+
console.log(`[${scene.startSec}s - ${scene.endSec}s] ${scene.description}`);
|
|
130
|
+
}
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
### RAG Integration
|
|
134
|
+
|
|
135
|
+
Enable `indexForRAG: true` to automatically index scene descriptions and transcripts into the vector store for later retrieval. This is especially useful for building searchable video libraries.
|
|
136
|
+
|
|
137
|
+
```typescript
|
|
138
|
+
const analysis = await analyzeVideo({
|
|
139
|
+
videoBuffer: videoData,
|
|
140
|
+
indexForRAG: true,
|
|
141
|
+
descriptionDetail: 'detailed',
|
|
142
|
+
transcribeAudio: true,
|
|
143
|
+
});
|
|
144
|
+
|
|
145
|
+
// Scene descriptions and transcripts are now searchable via RAG
|
|
146
|
+
console.log(`Indexed ${analysis.ragChunkIds?.length ?? 0} chunks`);
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
Each scene description becomes a separate vector chunk with metadata including timestamps, scene index, and cut type. This enables queries like "find the part where the presenter shows the pricing slide" to return precise timestamp ranges.
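As a rough picture of that mapping, the sketch below turns scenes into chunk records that carry the metadata named above. The `Scene` and `RagChunk` shapes and the `scenesToChunks` helper are assumptions made for illustration; the actual types in the library may use different field names.

```typescript
// Hypothetical scene shape, loosely following the fields used in the examples above.
interface Scene {
  index: number;
  startSec: number;
  endSec: number;
  description: string;
  cutType: string;
}

// Hypothetical chunk record as it might be handed to a vector store.
interface RagChunk {
  id: string;
  text: string;
  metadata: { sceneIndex: number; startSec: number; endSec: number; cutType: string };
}

// One chunk per scene, so a search hit maps back to a precise timestamp range.
function scenesToChunks(videoId: string, scenes: Scene[]): RagChunk[] {
  return scenes.map((scene) => ({
    id: `${videoId}#scene-${scene.index}`,
    text: scene.description,
    metadata: {
      sceneIndex: scene.index,
      startSec: scene.startSec,
      endSec: scene.endSec,
      cutType: scene.cutType,
    },
  }));
}
```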
|
|
150
|
+
|
|
151
|
+
### Analysis Options
|
|
152
|
+
|
|
153
|
+
| Option | Default | Description |
|
|
154
|
+
|--------|---------|-------------|
|
|
155
|
+
| `sceneThreshold` | `0.3` | Scene change sensitivity (0-1, lower = more scenes) |
|
|
156
|
+
| `transcribeAudio` | `true` | Transcribe audio via configured STT provider |
|
|
157
|
+
| `descriptionDetail` | `'detailed'` | `'brief'`, `'detailed'`, or `'exhaustive'` |
|
|
158
|
+
| `maxScenes` | `100` | Cap on detected scenes (prevents runaway on long videos) |
|
|
159
|
+
| `indexForRAG` | `false` | Index results into RAG vector store |
|
|
160
|
+
|
|
161
|
+
## Scene Detection
|
|
162
|
+
|
|
163
|
+
### Live Stream / Batch Detection
|
|
164
|
+
|
|
165
|
+
Use `detectScenes()` for real-time visual change detection on frame streams. It returns an `AsyncGenerator` that yields `SceneBoundary` objects as visual discontinuities are detected.
|
|
166
|
+
|
|
167
|
+
```typescript
|
|
168
|
+
import { detectScenes } from 'agentos';
|
|
169
|
+
|
|
170
|
+
// From a pre-recorded video (frames extracted via ffmpeg)
|
|
171
|
+
for await (const boundary of detectScenes({ frames: extractedFrameStream })) {
|
|
172
|
+
console.log(`Scene ${boundary.index} at ${boundary.startTimeSec}s`);
|
|
173
|
+
console.log(` Type: ${boundary.cutType}, Confidence: ${boundary.confidence}`);
|
|
174
|
+
}
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
### Use Cases
|
|
178
|
+
|
|
179
|
+
- **Webcam / security camera**: Detect motion or scene changes in real-time surveillance feeds
|
|
180
|
+
- **Screen recording**: Identify slide transitions in presentations, page changes in demos
|
|
181
|
+
- **Video editing**: Automatically segment raw footage at cut points
|
|
182
|
+
- **Content moderation**: Flag rapid scene changes that may indicate problematic content
|
|
183
|
+
|
|
184
|
+
### Configuration
|
|
185
|
+
|
|
186
|
+
```typescript
|
|
187
|
+
for await (const boundary of detectScenes({
|
|
188
|
+
frames: webcamStream,
|
|
189
|
+
hardCutThreshold: 0.4, // Less sensitive to hard cuts
|
|
190
|
+
gradualThreshold: 0.15, // Standard sensitivity for dissolves/fades
|
|
191
|
+
minSceneDurationSec: 2.0, // Suppress very short scenes
|
|
192
|
+
methods: ['histogram'], // Fast histogram-only detection
|
|
193
|
+
})) {
|
|
194
|
+
handleSceneChange(boundary);
|
|
195
|
+
}
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
### Cut Type Classification
|
|
199
|
+
|
|
200
|
+
The detector classifies each scene boundary:
|
|
201
|
+
|
|
202
|
+
| Cut Type | Description |
|
|
203
|
+
|----------|-------------|
|
|
204
|
+
| `hard-cut` | Abrupt frame-to-frame change (most common) |
|
|
205
|
+
| `dissolve` | Cross-dissolve / superimposition transition |
|
|
206
|
+
| `fade` | Fade from/to black or white |
|
|
207
|
+
| `gradual` | Other gradual visual change |
|
|
208
|
+
|
|
209
|
+
## Prerequisites
|
|
210
|
+
|
|
211
|
+
- At least one video provider API key for generation (`RUNWAY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`)
|
|
212
|
+
- **ffmpeg** on PATH for video analysis (frame extraction and audio demuxing)
|
|
213
|
+
- A vision-capable LLM (`OPENAI_API_KEY` or equivalent) for scene description
|
|
214
|
+
- An STT provider for audio transcription (when `transcribeAudio` is enabled)
|
|
215
|
+
|
|
216
|
+
Scene detection (`detectScenes()`) has zero external dependencies — it works purely on RGB pixel buffers.
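To show why pixel buffers alone are enough, here is a sketch of histogram-based change detection on packed RGB frames. It is not the detector's actual algorithm: the bin count, the L1 distance, and the helper names are assumptions, and the threshold is simply whatever the caller passes in.

```typescript
const BINS = 16; // histogram bins per colour channel (assumed value)

// Builds a normalised per-channel histogram from a packed RGB buffer (3 bytes per pixel).
function rgbHistogram(rgb: Uint8Array): number[] {
  const hist: number[] = [];
  for (let i = 0; i < BINS * 3; i++) hist.push(0);
  const pixels = Math.floor(rgb.length / 3);
  if (pixels === 0) return hist;
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 3; c++) {
      const bin = Math.floor((rgb[p * 3 + c] * BINS) / 256);
      hist[c * BINS + bin] += 1 / pixels;
    }
  }
  return hist;
}

// Distance in [0, 1]: 0 for identical colour distributions, 1 for disjoint ones.
function histogramDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / 6; // each of the 3 channel histograms contributes at most 2
}

// Flags a hard cut when consecutive frames differ by more than `threshold`.
function isHardCut(prev: Uint8Array, next: Uint8Array, threshold: number): boolean {
  return histogramDistance(rgbHistogram(prev), rgbHistogram(next)) > threshold;
}
```

Histograms ignore where pixels are, so this style of check is cheap but can miss cuts between frames with similar overall colour.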
|
|
217
|
+
|
|
218
|
+
## Examples
|
|
219
|
+
|
|
220
|
+
- "Generate a 5-second cinematic video of a sunset over the ocean"
|
|
221
|
+
- "Turn this product photo into a video with a slow camera orbit"
|
|
222
|
+
- "Analyse this tutorial video and index it for search"
|
|
223
|
+
- "Detect scene changes in this security camera feed"
|
|
224
|
+
- "Extract structured scenes from this presentation recording"
|
|
225
|
+
- "Create a video from this image with gentle parallax motion"
|
|
@@ -0,0 +1,82 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: vision-ocr
|
|
3
|
+
version: '1.1.0'
|
|
4
|
+
description: Extract text from images using OCR and vision AI with the performOCR() high-level API or the full VisionPipeline.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: vision
|
|
8
|
+
tags: [vision, ocr, text-extraction, document, handwriting]
|
|
9
|
+
requires_secrets: []
|
|
10
|
+
requires_tools: [vision-pipeline]
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Vision & OCR
|
|
14
|
+
|
|
15
|
+
Extract text from images, documents, and handwritten notes using a progressive 3-tier pipeline: local OCR (PaddleOCR / Tesseract) -> local vision models (TrOCR, Florence-2) -> cloud vision LLM (GPT-4o, Claude, Gemini).
|
|
16
|
+
|
|
17
|
+
## High-Level API: `performOCR()`
|
|
18
|
+
|
|
19
|
+
For one-shot text extraction, use the top-level `performOCR()` function. It handles input resolution, pipeline lifecycle, and cleanup automatically.
|
|
20
|
+
|
|
21
|
+
```typescript
|
|
22
|
+
import { performOCR } from '@framers/agentos';
|
|
23
|
+
|
|
24
|
+
const result = await performOCR({
|
|
25
|
+
image: '/path/to/receipt.png', // file path, URL, base64, or Buffer
|
|
26
|
+
strategy: 'progressive', // 'progressive' | 'local-only' | 'cloud-only'
|
|
27
|
+
confidenceThreshold: 0.7, // min confidence before escalating tier
|
|
28
|
+
});
|
|
29
|
+
|
|
30
|
+
console.log(result.text); // extracted text
|
|
31
|
+
console.log(result.confidence); // 0–1 score
|
|
32
|
+
console.log(result.tier); // 'ocr' | 'handwriting' | 'document-ai' | 'cloud-vision'
|
|
33
|
+
console.log(result.provider); // 'paddle' | 'tesseract' | 'openai' | etc.
|
|
34
|
+
console.log(result.regions); // bounding boxes (when available)
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
## When to use `performOCR()` vs `VisionPipeline`
|
|
38
|
+
|
|
39
|
+
| Use case | Recommendation |
|
|
40
|
+
|----------|---------------|
|
|
41
|
+
| One-shot text extraction from a single image | `performOCR()` — simplest API |
|
|
42
|
+
| Batch processing many images | `VisionPipeline` — create once, reuse, dispose when done |
|
|
43
|
+
| Need CLIP embeddings or document layout | `VisionPipeline` — richer result shape |
|
|
44
|
+
| Quick scripts and integrations | `performOCR()` — zero boilerplate |
|
|
45
|
+
|
|
46
|
+
## Progressive Tier System
|
|
47
|
+
|
|
48
|
+
The pipeline tries the cheapest/fastest tier first and only escalates when confidence falls below the configured threshold:
|
|
49
|
+
|
|
50
|
+
1. **Tier 1 — Local OCR** (PaddleOCR or Tesseract.js): Fast, free, offline. Handles printed text in documents, receipts, screenshots.
|
|
51
|
+
2. **Tier 2 — Local Vision Models** (TrOCR / Florence-2): Still offline. Handles handwritten notes, complex document layouts with tables and figures.
|
|
52
|
+
3. **Tier 3 — Cloud Vision LLM** (GPT-4o / Claude / Gemini): Best quality. Handles photographs, diagrams, mixed content, anything the local tiers can't confidently read.
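The escalation logic across the three tiers can be sketched as a loop that stops as soon as a tier is confident enough. This is a simplified, synchronous illustration: `Tier`, `TierResult`, and `runProgressive` are hypothetical names, and returning the highest-confidence result when no tier reaches the threshold is an assumption about the desired behaviour.

```typescript
interface TierResult {
  text: string;
  confidence: number; // 0..1
  tier: string;
}

// Hypothetical tier shape; real tiers would be asynchronous.
interface Tier {
  name: string;
  run: (image: Uint8Array) => TierResult;
}

// Runs tiers cheapest-first and stops once confidence reaches the threshold.
// If no tier is confident enough, the best result seen so far is returned.
function runProgressive(tiers: Tier[], image: Uint8Array, confidenceThreshold: number): TierResult {
  if (tiers.length === 0) throw new Error('at least one tier is required');
  let best = tiers[0].run(image);
  for (let i = 1; i < tiers.length && best.confidence < confidenceThreshold; i++) {
    const result = tiers[i].run(image);
    if (result.confidence > best.confidence) best = result;
  }
  return best;
}
```

With this shape, a `'local-only'` strategy amounts to passing only the local tiers, and `'cloud-only'` to passing only the cloud tier.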
|
|
53
|
+
|
|
54
|
+
## Strategy Selection
|
|
55
|
+
|
|
56
|
+
- **`'progressive'`** (default): Start local, escalate only if needed. Best cost/quality balance for most use cases.
|
|
57
|
+
- **`'local-only'`**: Never call cloud APIs. Use for air-gapped environments, privacy-sensitive data (medical records, financial docs), or when no API keys are available.
|
|
58
|
+
- **`'cloud-only'`**: Skip local tiers entirely, send straight to a cloud vision LLM. Use when you need the highest quality output and cost is not a concern.
|
|
59
|
+
|
|
60
|
+
## Input Formats
|
|
61
|
+
|
|
62
|
+
`performOCR()` accepts four input types:
|
|
63
|
+
|
|
64
|
+
- **File path**: `'/tmp/scan.png'` — reads from disk
|
|
65
|
+
- **URL**: `'https://example.com/receipt.jpg'` — fetches via HTTP
|
|
66
|
+
- **Base64 string**: Raw base64 or `data:image/png;base64,...` data URIs — decoded in-memory
|
|
67
|
+
- **Buffer**: Raw image bytes — passed directly to the pipeline
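Telling the four input types apart could work roughly as follows. This is a heuristic sketch rather than the library's input resolver: `classifyOcrInput` is a hypothetical helper, and the raw-base64 check in particular (length and character set) is an assumption that could misclassify unusual file paths.

```typescript
type OcrInputKind = 'buffer' | 'url' | 'data-uri' | 'base64' | 'file-path';

// Hypothetical classifier for the input types listed above.
function classifyOcrInput(input: string | Uint8Array): OcrInputKind {
  // Node's Buffer is a Uint8Array subclass, so raw bytes land here.
  if (typeof input !== 'string') return 'buffer';
  if (/^https?:\/\//i.test(input)) return 'url';
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(input)) return 'data-uri';
  // Heuristic: long strings made only of base64 characters are treated as raw base64.
  if (input.length >= 64 && input.length % 4 === 0 && /^[A-Za-z0-9+/]+={0,2}$/.test(input)) {
    return 'base64';
  }
  return 'file-path';
}
```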
|
|
68
|
+
|
|
69
|
+
## Capabilities
|
|
70
|
+
|
|
71
|
+
- **Printed text OCR**: Extract text from documents, receipts, screenshots, PDFs
|
|
72
|
+
- **Handwriting recognition**: Read handwritten notes and forms via TrOCR
|
|
73
|
+
- **Document layout understanding**: Parse tables, figures, headings via Florence-2
|
|
74
|
+
- **Bounding box regions**: Spatial text locations for overlay rendering
|
|
75
|
+
- **Image embeddings**: Generate CLIP vectors for semantic image search (via `VisionPipeline` only)
|
|
76
|
+
|
|
77
|
+
## Examples
|
|
78
|
+
|
|
79
|
+
- "Read the text from this receipt"
|
|
80
|
+
- "What does this handwritten note say?"
|
|
81
|
+
- "Extract the table data from this PDF page"
|
|
82
|
+
- "OCR this screenshot and return the error message"
|
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: voice-conversation
|
|
3
|
+
version: '1.0.0'
|
|
4
|
+
description: Run provider-agnostic live voice conversations with VAD, silence boundaries, wake-word gating, STT, and TTS through the AgentOS speech runtime.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: communication
|
|
8
|
+
tags: [voice, speech, conversation, stt, tts, vad, wake-word, whisper, elevenlabs]
|
|
9
|
+
requires_secrets: []
|
|
10
|
+
requires_tools: []
|
|
11
|
+
metadata:
|
|
12
|
+
agentos:
|
|
13
|
+
emoji: "\U0001F3A4"
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
# Live Voice Conversations
|
|
17
|
+
|
|
18
|
+
Use this skill when the user wants an agent to listen, transcribe, respond with speech, or switch between speech providers without changing the rest of the workflow.
|
|
19
|
+
|
|
20
|
+
Prefer the unified AgentOS speech runtime over provider-specific one-offs. Treat STT, TTS, VAD, and wake-word detection as separate capabilities that can be swapped independently.
|
|
21
|
+
|
|
22
|
+
## Workflow
|
|
23
|
+
|
|
24
|
+
1. Pick providers for:
|
|
25
|
+
- STT
|
|
26
|
+
- TTS
|
|
27
|
+
- optional wake word
|
|
28
|
+
- optional telephony
|
|
29
|
+
2. Start a speech session in one of three modes:
|
|
30
|
+
- `manual` for push-to-talk or file transcription
|
|
31
|
+
- `vad` for continuous listen-until-silence loops
|
|
32
|
+
- `wake-word` when the user wants hands-free activation
|
|
33
|
+
3. Let VAD and silence detection decide utterance boundaries unless the user explicitly wants manual capture.
|
|
34
|
+
4. Transcribe the utterance, generate the response, then synthesize speech with the selected TTS provider.
|
|
35
|
+
5. Support interruption and provider switching without changing the higher-level agent behavior.
|
|
36
|
+
|
|
37
|
+
## Provider Rules
|
|
38
|
+
|
|
39
|
+
- Prefer `openai-whisper` for simple hosted transcription.
|
|
40
|
+
- Prefer `openai-tts` for a default hosted voice path when one API key should cover both LLM and speech.
|
|
41
|
+
- Prefer `elevenlabs` when voice quality or cloning matters more than simplicity.
|
|
42
|
+
- Prefer local providers such as `whisper-local` or `piper` when the user wants offline or lower-cost operation.
|
|
43
|
+
- Treat wake-word detection as optional. Default to VAD + silence detection unless the user asked for hands-free wake-up.
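The preferences above can be condensed into a small selection function. The provider IDs are the ones named in the rules; the `VoiceNeeds` shape and the `pickProviders` helper are hypothetical and only illustrate the decision order.

```typescript
// Hypothetical summary of what the user asked for.
interface VoiceNeeds {
  offline: boolean; // user wants offline or lower-cost operation
  premiumVoice: boolean; // voice quality or cloning matters more than simplicity
}

interface ProviderChoice {
  stt: string;
  tts: string;
}

// Applies the provider rules in order: local first when offline is requested,
// otherwise hosted defaults with ElevenLabs when voice quality matters.
function pickProviders(needs: VoiceNeeds): ProviderChoice {
  if (needs.offline) return { stt: 'whisper-local', tts: 'piper' };
  return { stt: 'openai-whisper', tts: needs.premiumVoice ? 'elevenlabs' : 'openai-tts' };
}
```

Because STT and TTS are chosen independently, swapping one does not affect the other.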
|
|
44
|
+
|
|
45
|
+
## Voice UX Rules
|
|
46
|
+
|
|
47
|
+
- Do not keep listening forever without a boundary policy.
|
|
48
|
+
- Use significant pauses as a hint; use utterance-end silence as the final cutoff.
|
|
49
|
+
- When speech playback is active, be ready for barge-in and interruption if the user starts speaking again.
|
|
50
|
+
- Surface which provider combination is active so the user knows what is handling STT and TTS.
|
|
51
|
+
- When provider credentials are missing, degrade to whichever speech providers are configured instead of failing the whole interaction.
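The boundary policy (pause as a hint, utterance-end silence as the cutoff) can be sketched as a tracker fed with per-frame VAD decisions. `UtteranceTracker` and `BoundaryPolicy` are hypothetical names, and both durations are parameters so they can be tuned per environment rather than hardcoded.

```typescript
interface BoundaryPolicy {
  pauseMs: number; // silence long enough to count as a significant pause (hint only)
  utteranceEndMs: number; // silence long enough to end the utterance (final cutoff)
}

type BoundaryEvent = 'none' | 'pause-hint' | 'utterance-end';

// Hypothetical tracker: feed it one VAD decision per audio frame.
class UtteranceTracker {
  private silentMs = 0;
  private heardSpeech = false;
  private hinted = false;
  private readonly policy: BoundaryPolicy;

  constructor(policy: BoundaryPolicy) {
    this.policy = policy;
  }

  push(isSpeech: boolean, frameMs: number): BoundaryEvent {
    if (isSpeech) {
      this.heardSpeech = true;
      this.silentMs = 0;
      this.hinted = false;
      return 'none';
    }
    if (!this.heardSpeech) return 'none'; // ignore silence before any speech
    this.silentMs += frameMs;
    if (this.silentMs >= this.policy.utteranceEndMs) {
      // Reset so the next utterance starts from a clean state.
      this.heardSpeech = false;
      this.silentMs = 0;
      this.hinted = false;
      return 'utterance-end';
    }
    if (!this.hinted && this.silentMs >= this.policy.pauseMs) {
      this.hinted = true;
      return 'pause-hint';
    }
    return 'none';
  }
}
```

The same speech branch is where barge-in handling would hook in: speech arriving while playback is active is the signal to interrupt.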
|
|
52
|
+
|
|
53
|
+
## Examples
|
|
54
|
+
|
|
55
|
+
- "Start a live voice session with Whisper for STT and ElevenLabs for TTS."
|
|
56
|
+
- "Use OpenAI for both speech recognition and speech output."
|
|
57
|
+
- "Run voice chat locally with VAD and no wake word."
|
|
58
|
+
- "Switch the TTS provider but keep the same voice conversation flow."
|
|
59
|
+
|
|
60
|
+
## Constraints
|
|
61
|
+
|
|
62
|
+
- Hosted speech providers need API keys.
|
|
63
|
+
- Wake-word support is optional and may depend on a plugin or local model.
|
|
64
|
+
- VAD and silence thresholds should be tuned for the environment; do not hardcode one value for every context.
|
|
65
|
+
- Telephony call control is separate from local microphone capture, but both should share the same speech provider abstractions.
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: vosk
|
|
3
|
+
version: '1.0.0'
|
|
4
|
+
description: Fully offline speech-to-text via the Vosk library — streaming recognition, 16 kHz PCM, no network required after model download.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: voice
|
|
8
|
+
tags: [voice, stt, speech-to-text, vosk, offline, local, privacy]
|
|
9
|
+
requires_secrets: []
|
|
10
|
+
requires_tools: []
|
|
11
|
+
metadata:
|
|
12
|
+
agentos:
|
|
13
|
+
emoji: "\U0001F3A4"
|
|
14
|
+
homepage: https://alphacephei.com/vosk/
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
# Vosk Offline STT
|
|
18
|
+
|
|
19
|
+
Use this skill when the agent must operate without internet connectivity, or when user privacy requirements prohibit sending audio to external APIs. Vosk provides fully offline speech recognition after the model is downloaded once.
|
|
20
|
+
|
|
21
|
+
Prefer this over cloud STT providers when operating in air-gapped environments, on-premise deployments, or when the user explicitly requests zero-cloud voice processing.
|
|
22
|
+
|
|
23
|
+
## Setup
|
|
24
|
+
|
|
25
|
+
Download a Vosk model and place it at `~/.agentos/models/vosk/` (default), or set `modelPath` in `providerOptions`.
|
|
26
|
+
|
|
27
|
+
```sh
|
|
28
|
+
# Example: download the small English model
|
|
29
|
+
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
|
|
30
|
+
unzip vosk-model-small-en-us-0.15.zip -d ~/.agentos/models/vosk/
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
## Configuration
|
|
34
|
+
|
|
35
|
+
```json
|
|
36
|
+
{
|
|
37
|
+
"voice": {
|
|
38
|
+
"stt": "vosk"
|
|
39
|
+
}
|
|
40
|
+
}
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
With a custom model path:
|
|
44
|
+
|
|
45
|
+
```json
|
|
46
|
+
{
|
|
47
|
+
"voice": {
|
|
48
|
+
"stt": "vosk",
|
|
49
|
+
"providerOptions": {
|
|
50
|
+
"modelPath": "/opt/models/vosk-model-en"
|
|
51
|
+
}
|
|
52
|
+
}
|
|
53
|
+
}
|
|
54
|
+
```
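
The two configurations above imply a simple lookup: use `providerOptions.modelPath` when set, otherwise fall back to the default directory. The helper below is a sketch of that rule only; the `VoiceConfig` type and the function name are hypothetical, and the real provider may resolve paths differently.

```typescript
// Hypothetical shape mirroring the "voice" object in the JSON examples.
interface VoiceConfig {
  stt: string;
  providerOptions?: { modelPath?: string };
}

// homeDir is passed in so the helper stays free of platform imports.
function resolveVoskModelPath(voice: VoiceConfig, homeDir: string): string {
  const custom = voice.providerOptions?.modelPath;
  if (custom !== undefined && custom !== "") {
    return custom;
  }
  // Default location documented in the Setup section.
  const base = homeDir.replace(/\/+$/, "");
  return `${base}/.agentos/models/vosk/`;
}
```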
|
|
55
|
+
|
|
56
|
+
## Provider Rules
|
|
57
|
+
|
|
58
|
+
- Audio input must be 16 kHz mono LINEAR16 PCM. Resample or convert other formats before streaming.
|
|
59
|
+
- Recognition accuracy generally improves with model size. Use `vosk-model-en-us-0.22` for best English accuracy; use small models on constrained hardware.
|
|
60
|
+
- No API key required. The only requirement is a pre-downloaded model directory.
|
|
61
|
+
- Streaming recognition is natively supported by the Vosk recognizer.
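
The 16 kHz LINEAR16 requirement usually means converting captured audio first. The sketch below resamples mono float samples to 16 kHz with linear interpolation and converts them to signed 16-bit integers. It is a minimal illustration, not the provider's own code: the function name is hypothetical, the input is assumed to be mono in the range -1 to 1, and no low-pass filter is applied, so a dedicated resampler is a better choice for production downsampling.

```typescript
// Converts mono float samples (-1..1) at sourceRate to 16 kHz signed 16-bit PCM.
function toLinear16At16k(samples: Float32Array, sourceRate: number): Int16Array {
  if (sourceRate <= 0) {
    throw new RangeError("sourceRate must be positive");
  }
  const targetRate = 16000;
  const outLength = Math.floor((samples.length * targetRate) / sourceRate);
  const out = new Int16Array(outLength);
  const step = sourceRate / targetRate; // source samples per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, samples.length - 1);
    const frac = pos - i0;
    const a = samples[i0] ?? 0;
    const b = samples[i1] ?? 0;
    // Linear interpolation between the two neighbouring source samples.
    const value = a + (b - a) * frac;
    // Clamp before scaling so out-of-range input cannot overflow int16.
    const clamped = Math.max(-1, Math.min(1, value));
    out[i] = Math.round(clamped * 32767);
  }
  return out;
}
```

For example, 48 samples captured at 48 kHz become 16 samples at 16 kHz.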
|
|
62
|
+
|
|
63
|
+
## Examples
|
|
64
|
+
|
|
65
|
+
- "Use offline Vosk STT for this air-gapped deployment."
|
|
66
|
+
- "Transcribe my voice locally without sending audio to the cloud."
|
|
67
|
+
- "Configure Vosk with the large English model for best accuracy."
|
|
68
|
+
|
|
69
|
+
## Constraints
|
|
70
|
+
|
|
71
|
+
- Requires the Vosk npm package and a pre-downloaded model directory.
|
|
72
|
+
- Accuracy is lower than cloud providers, especially for accented speech and domain-specific vocabulary.
|
|
73
|
+
- Audio must be 16 kHz mono LINEAR16 PCM. Other sample rates or formats require conversion.
|
|
74
|
+
- Model download size ranges from ~40 MB (small) to ~1.8 GB (large en-us).
|
|
@@ -0,0 +1,37 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: weather
|
|
3
|
+
version: '1.0.0'
|
|
4
|
+
description: Look up current weather conditions, forecasts, and severe weather alerts for any location worldwide.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: information
|
|
8
|
+
tags: [weather, forecast, climate, location]
|
|
9
|
+
requires_secrets: []
|
|
10
|
+
requires_tools: [web-search]
|
|
11
|
+
metadata:
|
|
12
|
+
agentos:
|
|
13
|
+
emoji: "\u2600\uFE0F"
|
|
14
|
+
homepage: https://openweathermap.org
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
# Weather Lookup
|
|
18
|
+
|
|
19
|
+
You can retrieve current weather conditions, multi-day forecasts, and severe weather alerts for any location the user specifies. Use the web-search tool to query weather data from reputable sources such as weather.gov, OpenWeatherMap, or AccuWeather.
|
|
20
|
+
|
|
21
|
+
When the user asks about weather, always clarify the location if it is ambiguous. Provide temperatures in both Fahrenheit and Celsius unless the user specifies a preference. Include relevant details like humidity, wind speed, precipitation chance, and UV index when available.
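
The dual-unit rule above comes down to one conversion, F = C × 9/5 + 32, plus a formatting choice. A minimal sketch follows, with hypothetical function and type names:

```typescript
type UnitPreference = "both" | "celsius" | "fahrenheit";

function celsiusToFahrenheit(celsius: number): number {
  return (celsius * 9) / 5 + 32;
}

// Formats a temperature given in Celsius; shows both units unless the user chose one.
function formatTemperature(celsius: number, pref: UnitPreference = "both"): string {
  const c = `${Math.round(celsius)}°C`;
  const f = `${Math.round(celsiusToFahrenheit(celsius))}°F`;
  if (pref === "celsius") return c;
  if (pref === "fahrenheit") return f;
  return `${f} / ${c}`;
}
```

With no stated preference, 20 °C would be reported as `68°F / 20°C`.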
|
|
22
|
+
|
|
23
|
+
For forecasts, present information in a concise, scannable format. Highlight any severe weather warnings or advisories prominently at the top of your response. If the user asks about historical weather or climate averages, note that your data is limited to what is available through web search results.
|
|
24
|
+
|
|
25
|
+
## Examples
|
|
26
|
+
|
|
27
|
+
- "What's the weather in San Francisco right now?"
|
|
28
|
+
- "Give me a 5-day forecast for Tokyo"
|
|
29
|
+
- "Are there any severe weather alerts in the Midwest?"
|
|
30
|
+
- "What's the temperature in Berlin in Celsius?"
|
|
31
|
+
|
|
32
|
+
## Constraints
|
|
33
|
+
|
|
34
|
+
- Weather data accuracy depends on the web search results available at query time.
|
|
35
|
+
- Historical weather data may be limited or unavailable.
|
|
36
|
+
- Hyper-local micro-climate data (e.g., specific street-level conditions) is not reliably available.
|
|
37
|
+
- Always attribute the data source when possible.
|
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: web-scraper
|
|
3
|
+
version: '1.0.0'
|
|
4
|
+
description: Autonomous web scraping — navigate sites, extract structured data, handle pagination, anti-detection, and proxy rotation.
|
|
5
|
+
author: Wunderland
|
|
6
|
+
namespace: wunderland
|
|
7
|
+
category: automation
|
|
8
|
+
tags: [scraping, browser, extraction, data, pagination, proxy, automation]
|
|
9
|
+
requires_secrets: []
|
|
10
|
+
requires_tools: [browserNavigate, browserClick, browserFill, browserExtract, browserScreenshot, browserSnapshot, browserScroll, browserWait, browserEvaluate, browserSession]
|
|
11
|
+
metadata:
|
|
12
|
+
agentos:
|
|
13
|
+
emoji: "\U0001F578"
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
# Web Scraper
|
|
17
|
+
|
|
18
|
+
You are an autonomous web scraping agent. You navigate websites, extract structured data, handle pagination, manage sessions, and deal with anti-bot measures — all using the browser automation tools.
|
|
19
|
+
|
|
20
|
+
## Core Capabilities
|
|
21
|
+
|
|
22
|
+
- **Navigate** to any URL and fully render JavaScript-driven pages
|
|
23
|
+
- **Snapshot** pages to understand structure and find interactive elements
|
|
24
|
+
- **Extract** text, HTML, or attributes from DOM selectors
|
|
25
|
+
- **Paginate** — click through pages, infinite scroll, load more buttons
|
|
26
|
+
- **Handle auth** — log in, manage sessions, restore cookies
|
|
27
|
+
- **Anti-detection** — rotate proxies, manage fingerprints
|
|
28
|
+
|
|
29
|
+
## Scraping Workflow
|
|
30
|
+
|
|
31
|
+
1. **Navigate** to the target URL
|
|
32
|
+
2. **Snapshot** the page to understand its structure
|
|
33
|
+
3. **Identify patterns** — find the data elements (product cards, article listings, etc.)
|
|
34
|
+
4. **Extract data** — pull text/attributes from identified selectors
|
|
35
|
+
5. **Paginate** — navigate to next page and repeat
|
|
36
|
+
6. **Handle errors** — retry on failures, screenshot for debugging
|
|
37
|
+
|
|
38
|
+
## Best Practices
|
|
39
|
+
|
|
40
|
+
- **Respect robots.txt** — check before scraping
|
|
41
|
+
- **Rate limit requests** — don't overwhelm servers (minimum 1-2 second delays)
|
|
42
|
+
- **Use sessions** — save and restore login state to avoid re-authentication
|
|
43
|
+
- **Handle dynamic content** — wait for elements to load before extracting
|
|
44
|
+
- **Validate data** — check extracted data for completeness
|
|
45
|
+
- **Take screenshots** on errors for debugging
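
The rate-limiting and retry guidance above can be expressed as two small delay calculators. This is a sketch under assumptions: the function names, the exponential backoff policy, and the 30-second cap are illustrative choices, and actually waiting (for example with the `browserWait` tool) is left to the caller.

```typescript
// Delay between requests: 1-2 seconds, randomized so requests are not evenly spaced.
// `rand` is injectable to keep the function deterministic in tests.
function politeDelayMs(rand: () => number = Math.random): number {
  const minMs = 1000;
  const maxMs = 2000;
  return Math.round(minMs + rand() * (maxMs - minMs));
}

// Delay before retry number `attempt` (0-based): exponential backoff with a cap.
function retryDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```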
|
|
46
|
+
|
|
47
|
+
## Anti-Detection
|
|
48
|
+
|
|
49
|
+
- Rotate user agents and viewport sizes
|
|
50
|
+
- Use proxy rotation when available
|
|
51
|
+
- Add random delays between actions
|
|
52
|
+
- Avoid scraping too fast from a single IP
|
|
53
|
+
- Handle CAPTCHAs when they appear
|
|
54
|
+
|
|
55
|
+
## Data Output
|
|
56
|
+
|
|
57
|
+
Structure extracted data consistently:
|
|
58
|
+
- Return arrays of objects with consistent field names
|
|
59
|
+
- Include metadata (source URL, timestamp, page number)
|
|
60
|
+
- Handle missing fields gracefully (null, not undefined)
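
One way to apply these output rules is a normalizer that forces every record to carry the same field names, fills gaps with `null`, and attaches metadata. The types and function name below are hypothetical, and treating blank strings as missing is an added assumption.

```typescript
interface ScrapeMeta {
  sourceUrl: string;
  scrapedAt: string; // ISO 8601 timestamp
  page: number;
}

interface ScrapedItem {
  fields: Record<string, string | null>;
  meta: ScrapeMeta;
}

// Builds a record with exactly `fieldNames` as keys; missing values become null.
function normalizeItem(
  raw: Record<string, string | undefined>,
  fieldNames: string[],
  meta: ScrapeMeta,
): ScrapedItem {
  const fields: Record<string, string | null> = {};
  for (const name of fieldNames) {
    const value = raw[name];
    // Assumption: a blank string is treated the same as a missing field.
    fields[name] = value === undefined || value.trim() === "" ? null : value.trim();
  }
  return { fields, meta };
}
```

Because missing values are `null` rather than `undefined`, they survive `JSON.stringify` and every serialized record keeps the same keys.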
|