@framers/agentos-skills 0.3.0 → 0.4.1

Files changed (100)
  1. package/CONTRIBUTING.md +231 -0
  2. package/README.md +93 -58
  3. package/package.json +19 -31
  4. package/registry/community/.gitkeep +0 -0
  5. package/registry/curated/1password/SKILL.md +53 -0
  6. package/registry/curated/account-manager/SKILL.md +60 -0
  7. package/registry/curated/agent-config/SKILL.md +22 -0
  8. package/registry/curated/amazon-polly/SKILL.md +74 -0
  9. package/registry/curated/apple-notes/SKILL.md +45 -0
  10. package/registry/curated/apple-reminders/SKILL.md +46 -0
  11. package/registry/curated/audio-generation/SKILL.md +231 -0
  12. package/registry/curated/blog-publisher/SKILL.md +110 -0
  13. package/registry/curated/bluesky-bot/SKILL.md +93 -0
  14. package/registry/curated/cli-tools/SKILL.md +137 -0
  15. package/registry/curated/cloud-ops/SKILL.md +124 -0
  16. package/registry/curated/code-safety/SKILL.md +42 -0
  17. package/registry/curated/coding-agent/SKILL.md +40 -0
  18. package/registry/curated/company-research/SKILL.md +46 -0
  19. package/registry/curated/content-creator/SKILL.md +53 -0
  20. package/registry/curated/deep-research/SKILL.md +56 -0
  21. package/registry/curated/diarization/SKILL.md +83 -0
  22. package/registry/curated/discord-helper/SKILL.md +43 -0
  23. package/registry/curated/document-export/SKILL.md +54 -0
  24. package/registry/curated/email-intelligence/SKILL.md +41 -0
  25. package/registry/curated/emergent-tools/SKILL.md +225 -0
  26. package/registry/curated/endpoint-semantic/SKILL.md +72 -0
  27. package/registry/curated/facebook-bot/SKILL.md +94 -0
  28. package/registry/curated/git/SKILL.md +49 -0
  29. package/registry/curated/github/SKILL.md +142 -0
  30. package/registry/curated/google-cloud-stt/SKILL.md +71 -0
  31. package/registry/curated/google-cloud-tts/SKILL.md +71 -0
  32. package/registry/curated/grounding-guard/SKILL.md +38 -0
  33. package/registry/curated/healthcheck/SKILL.md +43 -0
  34. package/registry/curated/image-editing/SKILL.md +25 -0
  35. package/registry/curated/image-gen/SKILL.md +141 -0
  36. package/registry/curated/instagram-bot/SKILL.md +60 -0
  37. package/registry/curated/interactive-widgets/SKILL.md +85 -0
  38. package/registry/curated/linkedin-bot/SKILL.md +86 -0
  39. package/registry/curated/mastodon-bot/SKILL.md +104 -0
  40. package/registry/curated/memory-manager/SKILL.md +127 -0
  41. package/registry/curated/ml-content-classifier/SKILL.md +38 -0
  42. package/registry/curated/movie-lookup/SKILL.md +48 -0
  43. package/registry/curated/multimodal-rag/SKILL.md +153 -0
  44. package/registry/curated/notion/SKILL.md +43 -0
  45. package/registry/curated/obsidian/SKILL.md +42 -0
  46. package/registry/curated/openwakeword/SKILL.md +75 -0
  47. package/registry/curated/pii-redaction/SKILL.md +56 -0
  48. package/registry/curated/pinterest-bot/SKILL.md +45 -0
  49. package/registry/curated/piper/SKILL.md +72 -0
  50. package/registry/curated/porcupine/SKILL.md +74 -0
  51. package/registry/curated/reddit-bot/SKILL.md +74 -0
  52. package/registry/curated/seo-campaign/SKILL.md +51 -0
  53. package/registry/curated/site-deploy/SKILL.md +119 -0
  54. package/registry/curated/slack-helper/SKILL.md +43 -0
  55. package/registry/curated/social-broadcast/SKILL.md +145 -0
  56. package/registry/curated/spotify-player/SKILL.md +45 -0
  57. package/registry/curated/streaming-stt-deepgram/SKILL.md +84 -0
  58. package/registry/curated/streaming-stt-whisper/SKILL.md +82 -0
  59. package/registry/curated/streaming-tts-elevenlabs/SKILL.md +84 -0
  60. package/registry/curated/streaming-tts-openai/SKILL.md +83 -0
  61. package/registry/curated/structured-output/SKILL.md +22 -0
  62. package/registry/curated/summarize/SKILL.md +40 -0
  63. package/registry/curated/threads-bot/SKILL.md +82 -0
  64. package/registry/curated/tiktok-bot/SKILL.md +104 -0
  65. package/registry/curated/topicality/SKILL.md +37 -0
  66. package/registry/curated/trello/SKILL.md +44 -0
  67. package/registry/curated/twitter-bot/SKILL.md +63 -0
  68. package/registry/curated/video-generation/SKILL.md +225 -0
  69. package/registry/curated/vision-ocr/SKILL.md +82 -0
  70. package/registry/curated/voice-conversation/SKILL.md +65 -0
  71. package/registry/curated/vosk/SKILL.md +74 -0
  72. package/registry/curated/weather/SKILL.md +37 -0
  73. package/registry/curated/web-scraper/SKILL.md +60 -0
  74. package/registry/curated/web-search/SKILL.md +49 -0
  75. package/registry/curated/whisper-transcribe/SKILL.md +58 -0
  76. package/registry/curated/youtube-bot/SKILL.md +104 -0
  77. package/registry.json +2446 -0
  78. package/scripts/update-registry.mjs +126 -0
  79. package/scripts/validate-skill.mjs +304 -0
  80. package/types.d.ts +160 -0
  81. package/dist/SkillLoader.d.ts +0 -50
  82. package/dist/SkillLoader.d.ts.map +0 -1
  83. package/dist/SkillLoader.js +0 -291
  84. package/dist/SkillLoader.js.map +0 -1
  85. package/dist/SkillRegistry.d.ts +0 -135
  86. package/dist/SkillRegistry.d.ts.map +0 -1
  87. package/dist/SkillRegistry.js +0 -455
  88. package/dist/SkillRegistry.js.map +0 -1
  89. package/dist/index.d.ts +0 -13
  90. package/dist/index.d.ts.map +0 -1
  91. package/dist/index.js +0 -13
  92. package/dist/index.js.map +0 -1
  93. package/dist/paths.d.ts +0 -35
  94. package/dist/paths.d.ts.map +0 -1
  95. package/dist/paths.js +0 -71
  96. package/dist/paths.js.map +0 -1
  97. package/dist/types.d.ts +0 -231
  98. package/dist/types.d.ts.map +0 -1
  99. package/dist/types.js +0 -21
  100. package/dist/types.js.map +0 -1
@@ -0,0 +1,44 @@
1
+ ---
2
+ name: trello
3
+ version: '1.0.0'
4
+ description: Manage Trello boards, lists, cards, checklists, and team workflows via the Trello API.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: productivity
8
+ tags: [trello, kanban, project-management, boards, tasks, workflow]
9
+ requires_secrets: [trello.api_key, trello.token]
10
+ requires_tools: []
11
+ metadata:
12
+   agentos:
13
+     emoji: "\U0001F4CB"
14
+     primaryEnv: TRELLO_API_KEY
15
+     secondaryEnvs: [TRELLO_TOKEN]
16
+     homepage: https://developer.atlassian.com/cloud/trello
17
+ ---
18
+
19
+ # Trello Board Management
20
+
21
+ You can manage Trello boards, lists, and cards to organize projects and track workflows. Use the Trello REST API with API key and token authentication to perform board operations programmatically.
22
+
23
+ When managing cards, always provide complete information: title, description, due dates, labels, and assigned members. Move cards between lists to reflect workflow progress (e.g., To Do -> In Progress -> Done). Create checklists on cards for multi-step tasks and update individual checklist items as they are completed. Use labels consistently with the board's established color/naming conventions.
24
+
25
+ For board operations, create new boards with predefined list structures that match common workflows (Backlog, To Do, In Progress, Review, Done). When querying boards, filter cards by list, label, member, or due date to provide focused views. Archive completed cards periodically to keep boards clean, but never delete cards without explicit user confirmation.
26
+
27
+ When organizing work, use card descriptions for detailed specifications, attach relevant files and links, and add comments for status updates and discussions. Support Power-Up integrations where applicable (Calendar, Custom Fields). Group related operations together to minimize API calls and keep related updates consistent.
28
+
29
+ ## Examples
30
+
31
+ - "Create a new card 'Implement login page' in the To Do list with a checklist of subtasks"
32
+ - "Move all cards labeled 'urgent' to the top of the In Progress list"
33
+ - "Show me all cards assigned to me that are due this week"
34
+ - "Archive all cards in the Done list that were completed more than 30 days ago"
35
+ - "Add a comment to card #42: 'Blocked by API dependency -- see PR #15'"
36
+
37
+ ## Constraints
38
+
39
+ - API rate limits: 100 requests per 10-second window per token.
40
+ - Each board is limited to 5,000 cards (including archived).
41
+ - Attachments are limited to 250 per card and 10 MB per file.
42
+ - Trello Automations (Butler rules) cannot be run via the API; only direct API operations are available.
43
+ - Board templates and Power-Up configurations require additional API access.
44
+ - Webhook creation requires a publicly accessible callback URL.
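The rate-limit constraint above (100 requests per 10-second window per token) can be respected with a small client-side throttle. The following is a minimal sketch, not part of the skill or of any Trello SDK: `SlidingWindowLimiter` and `reserve` are hypothetical names, and the caller is assumed to pass in the current time in milliseconds.

```typescript
// Sliding-window limiter: allows at most `limit` calls per `windowMs`.
// Hypothetical helper for illustration; not a Trello or AgentOS API.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns 0 if a request may be sent at `now`, otherwise the ms to wait. */
  reserve(now: number): number {
    // Forget requests that have already left the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return 0;
    }
    // The next slot opens when the oldest recorded request leaves the window.
    const oldest = Math.min(...this.timestamps);
    return oldest + this.windowMs - now;
  }
}

// Limits taken from the Constraints list above.
const trelloLimiter = new SlidingWindowLimiter(100, 10_000);
```

A caller would wait for the returned number of milliseconds before retrying. Passing `now` explicitly keeps the limiter deterministic, which makes it easy to test.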
@@ -0,0 +1,63 @@
1
+ ---
2
+ name: twitter-bot
3
+ version: '1.0.0'
4
+ description: Automated Twitter/X engagement — personality-driven reply bot, thread creation, trending engagement, and analytics tracking.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: social-automation
8
+ tags: [twitter, social-media, engagement, reply-bot, threads, trending, automation]
9
+ requires_secrets: [twitter.bearerToken, twitter.apiKey, twitter.apiSecret, twitter.accessToken, twitter.accessSecret]
10
+ requires_tools: [twitterPost, twitterReply, twitterSearch, twitterTrending, twitterLike, twitterRetweet, twitterThread, twitterAnalytics]
11
+ metadata:
12
+   agentos:
13
+     emoji: "\U0001F426"
14
+     primaryEnv: TWITTER_BEARER_TOKEN
15
+ ---
16
+
17
+ # Twitter Bot
18
+
19
+ You are an autonomous Twitter/X engagement agent. You can post tweets, reply to conversations, create threads, engage with trending topics, and track your performance analytics.
20
+
21
+ ## Core Capabilities
22
+
23
+ - **Post tweets** with text, images, polls, and other media
24
+ - **Reply to tweets** in your agent's personality and voice
25
+ - **Quote tweet** with your commentary
26
+ - **Create threads** — multi-tweet story arcs or analysis
27
+ - **Engage** — like and retweet content aligned with your interests
28
+ - **Search** — find conversations and topics relevant to your expertise
29
+ - **Track trending** — discover what's hot and join relevant conversations
30
+ - **Analytics** — monitor engagement metrics on your posts
31
+
32
+ ## Engagement Strategy
33
+
34
+ 1. **Search for relevant topics** using your interests and expertise keywords
35
+ 2. **Evaluate tweets** — only engage with content that aligns with your persona
36
+ 3. **Reply thoughtfully** — add genuine value, don't just agree or promote
37
+ 4. **Create original threads** on topics you have deep knowledge about
38
+ 5. **Monitor trending topics** and join conversations where you can contribute
39
+ 6. **Track analytics** to understand what resonates with your audience
40
+
41
+ ## Personality Guidelines
42
+
43
+ - Stay in character — your HEXACO traits should influence your tone and approach
44
+ - High Openness agents: explore diverse topics, share novel perspectives
45
+ - High Agreeableness agents: be supportive, amplify others
46
+ - Low Agreeableness agents: challenge ideas constructively, debate
47
+ - High Conscientiousness agents: fact-check, provide sources, be thorough
48
+
49
+ ## Rate Limits & Safety
50
+
51
+ - Respect Twitter API rate limits (300 tweets/3h, 1000 DMs/24h)
52
+ - Don't spam — minimum 5-minute gap between engagement bursts
53
+ - Avoid controversial or harmful content per your security tier
54
+ - Don't engage with bots or low-quality content
55
+ - Vary your engagement patterns to appear natural
56
+
57
+ ## Workflow
58
+
59
+ 1. **Discover** → Search trends and relevant conversations
60
+ 2. **Evaluate** → Score each opportunity for relevance and engagement potential
61
+ 3. **Engage** → Reply, like, or retweet based on evaluation
62
+ 4. **Create** → Post original content and threads on schedule
63
+ 5. **Analyze** → Review performance and adjust strategy
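The "minimum 5-minute gap between engagement bursts" rule above can be enforced with a simple cooldown gate. This is a sketch under stated assumptions rather than the skill's own mechanism: `BurstCooldown` and `tryStart` are hypothetical names, and timestamps are plain milliseconds supplied by the caller.

```typescript
// Five minutes, as stated in the Rate Limits & Safety list above.
const MIN_BURST_GAP_MS = 5 * 60 * 1000;

// Hypothetical helper: refuses to start a new burst until the gap has elapsed.
class BurstCooldown {
  private lastBurstAt: number | null = null;
  private readonly minGapMs: number;

  constructor(minGapMs: number = MIN_BURST_GAP_MS) {
    this.minGapMs = minGapMs;
  }

  /** Returns true and records the burst if enough time has passed since the last one. */
  tryStart(now: number): boolean {
    if (this.lastBurstAt !== null && now - this.lastBurstAt < this.minGapMs) {
      return false;
    }
    this.lastBurstAt = now;
    return true;
  }
}
```

An agent loop would call `tryStart(Date.now())` before the Engage step and skip the burst when it returns `false`.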
@@ -0,0 +1,225 @@
1
+ ---
2
+ name: video-generation
3
+ version: '1.0.0'
4
+ description: Video generation, analysis, and scene detection — text-to-video, image-to-video, structured scene descriptions with RAG indexing, and general-purpose visual change detection.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: media
8
+ tags: [video, generation, analysis, scene-detection, RAG, multimodal, runway, replicate, fal]
9
+ requires_secrets: []
10
+ requires_tools: []
11
+ metadata:
12
+   agentos:
13
+     emoji: "\U0001F3AC"
14
+ ---
15
+
16
+ # Video Generation, Analysis & Scene Detection
17
+
18
+ Use this skill when the user wants to create AI-generated videos, analyse existing video content for structured scene descriptions, or detect visual changes in live/recorded frame streams.
19
+
20
+ This skill covers three complementary APIs:
21
+
22
+ 1. **generateVideo()** — Text-to-video and image-to-video generation
23
+ 2. **analyzeVideo()** — Structured video analysis with scene descriptions, transcription, and optional RAG indexing
24
+ 3. **detectScenes()** — Real-time or batch scene boundary detection from frame streams
25
+
26
+ ## Video Generation
27
+
28
+ ### Text-to-Video
29
+
30
+ Generate a video from a text prompt. The system auto-detects the best available provider from environment variables in priority order: `RUNWAY_API_KEY` (highest quality), `REPLICATE_API_TOKEN` (widest model variety), `FAL_API_KEY` (fast serverless GPU).
31
+
32
+ ```typescript
33
+ import { generateVideo } from '@framers/agentos';
34
+
35
+ const result = await generateVideo({
36
+   prompt: 'A drone flying over a misty forest at sunrise, cinematic 4K',
37
+   durationSec: 5,
38
+   aspectRatio: '16:9',
39
+ });
40
+ console.log(result.videos[0].url);
41
+ ```
42
+
43
+ ### Image-to-Video
44
+
45
+ Animate a still image by providing it as a Buffer via `opts.image`. The prompt describes the desired motion rather than the scene itself.
46
+
47
+ ```typescript
48
+ import { generateVideo } from '@framers/agentos';
49
+ import { readFileSync } from 'fs';
50
+
51
+ const result = await generateVideo({
52
+   prompt: 'Camera slowly zooms out, gentle wind moves the leaves',
53
+   image: readFileSync('landscape.png'),
54
+   provider: 'runway',
55
+ });
56
+ ```
57
+
58
+ ### Provider Selection
59
+
60
+ | Provider | Best For | Env Var |
61
+ |----------|----------|---------|
62
+ | **Runway** | Highest quality, cinematic output, image-to-video | `RUNWAY_API_KEY` |
63
+ | **Replicate** | Widest model variety (Kling, HunyuanVideo, MiniMax), open-source models | `REPLICATE_API_TOKEN` |
64
+ | **Fal** | Fast serverless GPU, cost-effective, Kling/CogVideo | `FAL_API_KEY` |
65
+
66
+ When multiple provider API keys are set, the system wraps the primary in a `FallbackVideoProxy` so a transient failure on one provider automatically retries on the next.
67
+
68
+ Use `providerPreferences` to reorder, block, or weight providers during auto-selection:
69
+
70
+ ```typescript
71
+ const result = await generateVideo({
72
+   prompt: 'A cat playing piano',
73
+   timeoutMs: 180_000,
74
+   onProgress: (event) => console.log(event.status, event.message),
75
+   providerPreferences: {
76
+     preferred: ['runway', 'replicate'],
77
+     weights: { runway: 3, replicate: 1 },
78
+     blocked: ['fal'],
79
+   },
80
+ });
81
+ ```
82
+
83
+ To force a specific provider:
84
+
85
+ ```typescript
86
+ const result = await generateVideo({
87
+   prompt: 'A cat playing piano',
88
+   provider: 'replicate',
89
+   model: 'klingai/kling-v1',
90
+   apiKey: 'your-replicate-token',
91
+ });
92
+ ```
93
+
94
+ ### Prompt Tips for Video
95
+
96
+ - **Be specific about motion**: "camera pans left to right", "person walks toward camera", "time-lapse of clouds moving"
97
+ - **Specify style early**: "cinematic 4K", "hand-drawn animation", "vintage film grain"
98
+ - **Keep prompts concise**: Video models respond best to clear, focused descriptions (1-3 sentences)
99
+ - **Use negative prompts** to avoid unwanted artifacts: `negativePrompt: 'blurry, distorted faces, watermark'`
100
+
101
+ ### Image-to-Video Motion Strength
102
+
103
+ When doing image-to-video, the prompt controls how much the image changes:
104
+
105
+ - **Gentle motion**: "subtle camera drift", "soft wind blowing through hair" — minimal departure from source
106
+ - **Moderate motion**: "person turns head and smiles", "camera orbits subject" — clear movement while preserving subject
107
+ - **Strong motion**: "explosion of confetti", "character runs toward camera" — significant scene change
108
+
109
+ Interpretation of motion strength varies by provider. Runway tends to be conservative (good for preserving the source image), while Replicate/Fal models may be more aggressive. Start with gentle prompts and increase intensity as needed.
110
+
111
+ ## Video Analysis
112
+
113
+ ### Structured Scene Analysis
114
+
115
+ Analyse a video to extract structured scene descriptions, detected objects, on-screen text, and optional audio transcription.
116
+
117
+ ```typescript
118
+ import { analyzeVideo } from '@framers/agentos';
119
+
120
+ const analysis = await analyzeVideo({
121
+   videoUrl: 'https://example.com/product-demo.mp4',
122
+   prompt: 'Identify all products shown and their key features',
123
+   transcribeAudio: true,
124
+   descriptionDetail: 'detailed',
125
+ });
126
+
127
+ console.log(analysis.description);
128
+ for (const scene of analysis.scenes ?? []) {
129
+   console.log(`[${scene.startSec}s - ${scene.endSec}s] ${scene.description}`);
130
+ }
131
+ ```
132
+
133
+ ### RAG Integration
134
+
135
+ Enable `indexForRAG: true` to automatically index scene descriptions and transcripts into the vector store for later retrieval. This is especially useful for building searchable video libraries.
136
+
137
+ ```typescript
138
+ const analysis = await analyzeVideo({
139
+   videoBuffer: videoData,
140
+   indexForRAG: true,
141
+   descriptionDetail: 'detailed',
142
+   transcribeAudio: true,
143
+ });
144
+
145
+ // Scene descriptions and transcripts are now searchable via RAG
146
+ console.log(`Indexed ${analysis.ragChunkIds?.length ?? 0} chunks`);
147
+ ```
148
+
149
+ Each scene description becomes a separate vector chunk with metadata including timestamps, scene index, and cut type. This enables queries like "find the part where the presenter shows the pricing slide" to return precise timestamp ranges.
150
+
151
+ ### Analysis Options
152
+
153
+ | Option | Default | Description |
154
+ |--------|---------|-------------|
155
+ | `sceneThreshold` | `0.3` | Scene change sensitivity (0-1, lower = more scenes) |
156
+ | `transcribeAudio` | `true` | Transcribe audio via configured STT provider |
157
+ | `descriptionDetail` | `'detailed'` | `'brief'`, `'detailed'`, or `'exhaustive'` |
158
+ | `maxScenes` | `100` | Cap on detected scenes (prevents runaway on long videos) |
159
+ | `indexForRAG` | `false` | Index results into RAG vector store |
160
+
161
+ ## Scene Detection
162
+
163
+ ### Live Stream / Batch Detection
164
+
165
+ Use `detectScenes()` for real-time visual change detection on frame streams. It returns an `AsyncGenerator` that yields `SceneBoundary` objects as visual discontinuities are detected.
166
+
167
+ ```typescript
168
+ import { detectScenes } from '@framers/agentos';
169
+
170
+ // From a pre-recorded video (frames extracted via ffmpeg)
171
+ for await (const boundary of detectScenes({ frames: extractedFrameStream })) {
172
+   console.log(`Scene ${boundary.index} at ${boundary.startTimeSec}s`);
173
+   console.log(` Type: ${boundary.cutType}, Confidence: ${boundary.confidence}`);
174
+ }
175
+ ```
176
+
177
+ ### Use Cases
178
+
179
+ - **Webcam / security camera**: Detect motion or scene changes in real-time surveillance feeds
180
+ - **Screen recording**: Identify slide transitions in presentations, page changes in demos
181
+ - **Video editing**: Automatically segment raw footage at cut points
182
+ - **Content moderation**: Flag rapid scene changes that may indicate problematic content
183
+
184
+ ### Configuration
185
+
186
+ ```typescript
187
+ for await (const boundary of detectScenes({
188
+   frames: webcamStream,
189
+   hardCutThreshold: 0.4, // Less sensitive to hard cuts
190
+   gradualThreshold: 0.15, // Standard sensitivity for dissolves/fades
191
+   minSceneDurationSec: 2.0, // Suppress very short scenes
192
+   methods: ['histogram'], // Fast histogram-only detection
193
+ })) {
194
+   handleSceneChange(boundary);
195
+ }
196
+ ```
197
+
198
+ ### Cut Type Classification
199
+
200
+ The detector classifies each scene boundary:
201
+
202
+ | Cut Type | Description |
203
+ |----------|-------------|
204
+ | `hard-cut` | Abrupt frame-to-frame change (most common) |
205
+ | `dissolve` | Cross-dissolve / superimposition transition |
206
+ | `fade` | Fade from/to black or white |
207
+ | `gradual` | Other gradual visual change |
208
+
209
+ ## Prerequisites
210
+
211
+ - At least one video provider API key for generation (`RUNWAY_API_KEY`, `REPLICATE_API_TOKEN`, or `FAL_API_KEY`)
212
+ - **ffmpeg** on PATH for video analysis (frame extraction and audio demuxing)
213
+ - A vision-capable LLM (`OPENAI_API_KEY` or equivalent) for scene description
214
+ - An STT provider for audio transcription (when `transcribeAudio` is enabled)
215
+
216
+ Scene detection (`detectScenes()`) has zero external dependencies — it works purely on RGB pixel buffers.
217
+
218
+ ## Examples
219
+
220
+ - "Generate a 5-second cinematic video of a sunset over the ocean"
221
+ - "Turn this product photo into a video with a slow camera orbit"
222
+ - "Analyse this tutorial video and index it for search"
223
+ - "Detect scene changes in this security camera feed"
224
+ - "Extract structured scenes from this presentation recording"
225
+ - "Create a video from this image with gentle parallax motion"
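The auto-detection order described under Text-to-Video can be illustrated as follows. This is a sketch under stated assumptions, not the AgentOS implementation: `availableProviders` and `VideoProviderId` are hypothetical names, and only the environment variable names, their priority order, and the idea of a blocked list come from the skill text.

```typescript
type VideoProviderId = 'runway' | 'replicate' | 'fal';

// Priority order and env var names as listed in the skill text above.
const PROVIDER_ENV: ReadonlyArray<[VideoProviderId, string]> = [
  ['runway', 'RUNWAY_API_KEY'],
  ['replicate', 'REPLICATE_API_TOKEN'],
  ['fal', 'FAL_API_KEY'],
];

/**
 * Returns the providers that have a key configured, in priority order,
 * skipping any that the caller has blocked. The first entry would be the
 * primary and the rest the fallback chain.
 */
function availableProviders(
  env: Record<string, string | undefined>,
  blocked: VideoProviderId[] = [],
): VideoProviderId[] {
  return PROVIDER_ENV
    .filter(([id, key]) => Boolean(env[key]) && !blocked.includes(id))
    .map(([id]) => id);
}
```

In a Node.js process the `env` argument would typically be `process.env`; taking it as a parameter keeps the function pure.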
@@ -0,0 +1,82 @@
1
+ ---
2
+ name: vision-ocr
3
+ version: '1.1.0'
4
+ description: Extract text from images using OCR and vision AI with the performOCR() high-level API or the full VisionPipeline.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: vision
8
+ tags: [vision, ocr, text-extraction, document, handwriting]
9
+ requires_secrets: []
10
+ requires_tools: [vision-pipeline]
11
+ ---
12
+
13
+ # Vision & OCR
14
+
15
+ Extract text from images, documents, and handwritten notes using a progressive 3-tier pipeline: local OCR (PaddleOCR / Tesseract) -> local vision models (TrOCR, Florence-2) -> cloud vision LLM (GPT-4o, Claude, Gemini).
16
+
17
+ ## High-Level API: `performOCR()`
18
+
19
+ For one-shot text extraction, use the top-level `performOCR()` function. It handles input resolution, pipeline lifecycle, and cleanup automatically.
20
+
21
+ ```typescript
22
+ import { performOCR } from '@framers/agentos';
23
+
24
+ const result = await performOCR({
25
+   image: '/path/to/receipt.png', // file path, URL, base64, or Buffer
26
+   strategy: 'progressive', // 'progressive' | 'local-only' | 'cloud-only'
27
+   confidenceThreshold: 0.7, // min confidence before escalating tier
28
+ });
29
+
30
+ console.log(result.text); // extracted text
31
+ console.log(result.confidence); // 0–1 score
32
+ console.log(result.tier); // 'ocr' | 'handwriting' | 'document-ai' | 'cloud-vision'
33
+ console.log(result.provider); // 'paddle' | 'tesseract' | 'openai' | etc.
34
+ console.log(result.regions); // bounding boxes (when available)
35
+ ```
36
+
37
+ ## When to use `performOCR()` vs `VisionPipeline`
38
+
39
+ | Use case | Recommendation |
40
+ |----------|---------------|
41
+ | One-shot text extraction from a single image | `performOCR()` — simplest API |
42
+ | Batch processing many images | `VisionPipeline` — create once, reuse, dispose when done |
43
+ | Need CLIP embeddings or document layout | `VisionPipeline` — richer result shape |
44
+ | Quick scripts and integrations | `performOCR()` — zero boilerplate |
45
+
46
+ ## Progressive Tier System
47
+
48
+ The pipeline tries the cheapest/fastest tier first and only escalates when confidence is below threshold:
49
+
50
+ 1. **Tier 1 — Local OCR** (PaddleOCR or Tesseract.js): Fast, free, offline. Handles printed text in documents, receipts, screenshots.
51
+ 2. **Tier 2 — Local Vision Models** (TrOCR / Florence-2): Still offline. Handles handwritten notes, complex document layouts with tables and figures.
52
+ 3. **Tier 3 — Cloud Vision LLM** (GPT-4o / Claude / Gemini): Best quality. Handles photographs, diagrams, mixed content, anything the local tiers can't confidently read.
53
+
54
+ ## Strategy Selection
55
+
56
+ - **`'progressive'`** (default): Start local, escalate only if needed. Best cost/quality balance for most use cases.
57
+ - **`'local-only'`**: Never call cloud APIs. Use for air-gapped environments, privacy-sensitive data (medical records, financial docs), or when no API keys are available.
58
+ - **`'cloud-only'`**: Skip local tiers entirely, send straight to a cloud vision LLM. Use when you need the highest quality output and cost is not a concern.
59
+
60
+ ## Input Formats
61
+
62
+ `performOCR()` accepts four input types:
63
+
64
+ - **File path**: `'/tmp/scan.png'` — reads from disk
65
+ - **URL**: `'https://example.com/receipt.jpg'` — fetches via HTTP
66
+ - **Base64 string**: Raw base64 or `data:image/png;base64,...` data URIs — decoded in-memory
67
+ - **Buffer**: Raw image bytes — passed directly to the pipeline
68
+
69
+ ## Capabilities
70
+
71
+ - **Printed text OCR**: Extract text from documents, receipts, screenshots, PDFs
72
+ - **Handwriting recognition**: Read handwritten notes and forms via TrOCR
73
+ - **Document layout understanding**: Parse tables, figures, headings via Florence-2
74
+ - **Bounding box regions**: Spatial text locations for overlay rendering
75
+ - **Image embeddings**: Generate CLIP vectors for semantic image search (via `VisionPipeline` only)
76
+
77
+ ## Examples
78
+
79
+ - "Read the text from this receipt"
80
+ - "What does this handwritten note say?"
81
+ - "Extract the table data from this PDF page"
82
+ - "OCR this screenshot and return the error message"
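The escalation rule in the Progressive Tier System section (try the cheapest tier first, move on only while confidence is below the threshold) can be sketched as a loop over tier functions. The names `runProgressive`, `OcrTierFn`, and `TieredOcrResult` are hypothetical and this is not the `VisionPipeline` implementation; returning the most confident attempt when no tier reaches the threshold is also an assumption.

```typescript
// One OCR attempt: extracted text plus a 0-1 confidence score.
interface OcrAttempt {
  text: string;
  confidence: number;
}

// A tier is any async function that tries to read the image (hypothetical shape).
type OcrTierFn = (image: Uint8Array) => Promise<OcrAttempt>;

interface TieredOcrResult extends OcrAttempt {
  tierIndex: number; // 0-based index of the tier that produced the result
}

async function runProgressive(
  image: Uint8Array,
  tiers: OcrTierFn[],
  confidenceThreshold: number = 0.7,
): Promise<TieredOcrResult> {
  let best: TieredOcrResult | null = null;
  let tierIndex = 0;
  for (const tier of tiers) {
    const attempt = await tier(image);
    // Stop escalating as soon as a tier is confident enough.
    if (attempt.confidence >= confidenceThreshold) {
      return { ...attempt, tierIndex };
    }
    if (best === null || attempt.confidence > best.confidence) {
      best = { ...attempt, tierIndex };
    }
    tierIndex++;
  }
  if (best === null) {
    throw new Error('at least one tier is required');
  }
  // No tier reached the threshold: fall back to the most confident attempt.
  return best;
}
```

Tiers that come after the first confident result are never invoked, which is what keeps cloud calls off the common path.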
@@ -0,0 +1,65 @@
1
+ ---
2
+ name: voice-conversation
3
+ version: '1.0.0'
4
+ description: Run provider-agnostic live voice conversations with VAD, silence boundaries, wake-word gating, STT, and TTS through the AgentOS speech runtime.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: communication
8
+ tags: [voice, speech, conversation, stt, tts, vad, wake-word, whisper, elevenlabs]
9
+ requires_secrets: []
10
+ requires_tools: []
11
+ metadata:
12
+   agentos:
13
+     emoji: "\U0001F3A4"
14
+ ---
15
+
16
+ # Live Voice Conversations
17
+
18
+ Use this skill when the user wants an agent to listen, transcribe, respond with speech, or switch between speech providers without changing the rest of the workflow.
19
+
20
+ Prefer the unified AgentOS speech runtime over provider-specific one-offs. Treat STT, TTS, VAD, and wake-word detection as separate capabilities that can be swapped independently.
21
+
22
+ ## Workflow
23
+
24
+ 1. Pick providers for:
25
+    - STT
26
+    - TTS
27
+    - optional wake word
28
+    - optional telephony
29
+ 2. Start a speech session in one of three modes:
30
+    - `manual` for push-to-talk or file transcription
31
+    - `vad` for continuous listen-until-silence loops
32
+    - `wake-word` when the user wants hands-free activation
33
+ 3. Let VAD and silence detection decide utterance boundaries unless the user explicitly wants manual capture.
34
+ 4. Transcribe the utterance, generate the response, then synthesize speech with the selected TTS provider.
35
+ 5. Support interruption and provider switching without changing the higher-level agent behavior.
36
+
37
+ ## Provider Rules
38
+
39
+ - Prefer `openai-whisper` for simple hosted transcription.
40
+ - Prefer `openai-tts` for a default hosted voice path when one API key should cover both LLM and speech.
41
+ - Prefer `elevenlabs` when voice quality or cloning matters more than simplicity.
42
+ - Prefer local providers such as `whisper-local` or `piper` when the user wants offline or lower-cost operation.
43
+ - Treat wake-word detection as optional. Default to VAD + silence detection unless the user asked for hands-free wake-up.
44
+
45
+ ## Voice UX Rules
46
+
47
+ - Do not keep listening forever without a boundary policy.
48
+ - Use significant pauses as a hint; use utterance-end silence as the final cutoff.
49
+ - When speech playback is active, be ready for barge-in and interruption if the user starts speaking again.
50
+ - Surface which provider combination is active so the user knows what is handling STT and TTS.
51
+ - When provider credentials are missing, degrade to whichever speech providers are configured instead of failing the whole interaction.
52
+
53
+ ## Examples
54
+
55
+ - "Start a live voice session with Whisper for STT and ElevenLabs for TTS."
56
+ - "Use OpenAI for both speech recognition and speech output."
57
+ - "Run voice chat locally with VAD and no wake word."
58
+ - "Switch the TTS provider but keep the same voice conversation flow."
59
+
60
+ ## Constraints
61
+
62
+ - Hosted speech providers need API keys.
63
+ - Wake-word support is optional and may depend on a plugin or local model.
64
+ - VAD and silence thresholds should be tuned for the environment; do not hardcode one value for every context.
65
+ - Telephony call control is separate from local microphone capture, but both should share the same speech provider abstractions.
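The boundary policy described under Voice UX Rules (utterance-end silence as the final cutoff) can be sketched as a scan over per-frame energy values. This is an illustrative simplification, not the AgentOS speech runtime: `EnergyFrame` and `findUtteranceEnd` are hypothetical names, and a real VAD would use a trained model or smoothed energy rather than a single fixed threshold.

```typescript
// One analysis frame: its energy level and how long it lasts.
interface EnergyFrame {
  energy: number;
  durationMs: number;
}

/**
 * Returns the index of the frame at which the utterance is considered
 * finished, or null if the speaker has not been silent for long enough yet.
 * Silence before any speech is ignored so the session does not end early.
 */
function findUtteranceEnd(
  frames: EnergyFrame[],
  energyThreshold: number,
  endSilenceMs: number,
): number | null {
  let heardSpeech = false;
  let silenceMs = 0;
  let index = 0;
  for (const frame of frames) {
    if (frame.energy >= energyThreshold) {
      // Speech resets the silence counter.
      heardSpeech = true;
      silenceMs = 0;
    } else if (heardSpeech) {
      silenceMs += frame.durationMs;
      if (silenceMs >= endSilenceMs) {
        return index;
      }
    }
    index++;
  }
  return null;
}
```

Both `energyThreshold` and `endSilenceMs` are parameters rather than constants, in line with the constraint that thresholds should be tuned per environment.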
@@ -0,0 +1,74 @@
1
+ ---
2
+ name: vosk
3
+ version: '1.0.0'
4
+ description: Fully offline speech-to-text via the Vosk library — streaming recognition, 16 kHz PCM, no network required after model download.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: voice
8
+ tags: [voice, stt, speech-to-text, vosk, offline, local, privacy]
9
+ requires_secrets: []
10
+ requires_tools: []
11
+ metadata:
12
+   agentos:
13
+     emoji: "\U0001F3A4"
14
+     homepage: https://alphacephei.com/vosk/
15
+ ---
16
+
17
+ # Vosk Offline STT
18
+
19
+ Use this skill when the agent must operate without internet connectivity, or when user privacy requirements prohibit sending audio to external APIs. Vosk provides fully offline speech recognition after the model is downloaded once.
20
+
21
+ Prefer this over cloud STT providers when operating in air-gapped environments, on-premise deployments, or when the user explicitly requests zero-cloud voice processing.
22
+
23
+ ## Setup
24
+
25
+ Download a Vosk model and place it at `~/.agentos/models/vosk/` (default), or set `modelPath` in `providerOptions`.
26
+
27
+ ```sh
28
+ # Example: download the small English model
29
+ wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
30
+ unzip vosk-model-small-en-us-0.15.zip -d ~/.agentos/models/vosk/
31
+ ```
32
+
33
+ ## Configuration
34
+
35
+ ```json
36
+ {
37
+   "voice": {
38
+     "stt": "vosk"
39
+   }
40
+ }
41
+ ```
42
+
43
+ With a custom model path:
44
+
45
+ ```json
46
+ {
47
+   "voice": {
48
+     "stt": "vosk",
49
+     "providerOptions": {
50
+       "modelPath": "/opt/models/vosk-model-en"
51
+     }
52
+   }
53
+ }
54
+ ```
55
+
56
+ ## Provider Rules
57
+
58
+ - Audio input must be 16 kHz mono LINEAR16 PCM. Resample other formats before streaming.
59
+ - Recognition accuracy generally improves with model size. Use `vosk-model-en-us-0.22` for best English accuracy; use small models on constrained hardware.
60
+ - No API key required. The only requirement is a pre-downloaded model directory.
61
+ - Streaming recognition is natively supported by the Vosk recognizer.
62
+
63
+ ## Examples
64
+
65
+ - "Use offline Vosk STT for this air-gapped deployment."
66
+ - "Transcribe my voice locally without sending audio to the cloud."
67
+ - "Configure Vosk with the large English model for best accuracy."
68
+
69
+ ## Constraints
70
+
71
+ - Requires the Vosk npm package and a pre-downloaded model directory.
72
+ - Accuracy is lower than cloud providers, especially for accented speech and domain-specific vocabulary.
73
+ - Audio must be 16 kHz mono LINEAR16 PCM. Other sample rates or formats require conversion.
74
+ - Model download size ranges from ~40 MB (small) to ~1.8 GB (large en-us).
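Audio captured as floating-point samples at another rate has to be converted before it reaches the recognizer. The sketch below assumes mono input at an integer multiple of 16 kHz (for example 48 kHz) and uses block averaging as a crude low-pass filter; a production pipeline would more likely rely on ffmpeg or a proper resampler. `downsampleByAveraging` and `floatToInt16` are hypothetical helper names, not Vosk APIs.

```typescript
/**
 * Reduces the sample rate by an integer factor by averaging each block of
 * `factor` samples (e.g. factor 3 turns 48 kHz into 16 kHz).
 * Trailing samples that do not fill a whole block are dropped.
 */
function downsampleByAveraging(samples: Float32Array, factor: number): Float32Array {
  if (!Number.isInteger(factor) || factor < 1) {
    throw new Error('factor must be a positive integer');
  }
  const out = new Float32Array(Math.floor(samples.length / factor));
  return out.map((_, i) => {
    const block = samples.subarray(i * factor, (i + 1) * factor);
    return block.reduce((sum, value) => sum + value, 0) / factor;
  });
}

/** Converts samples in [-1, 1] to signed 16-bit PCM (LINEAR16), clamping out-of-range values. */
function floatToInt16(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, (value) => {
    const clamped = Math.max(-1, Math.min(1, value));
    // Negative and positive ranges differ by one step in two's complement.
    return clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  });
}
```

Chaining the two, `floatToInt16(downsampleByAveraging(input48k, 3))`, yields 16 kHz 16-bit samples whose underlying bytes can then be streamed to the recognizer.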
@@ -0,0 +1,37 @@
1
+ ---
2
+ name: weather
3
+ version: '1.0.0'
4
+ description: Look up current weather conditions, forecasts, and severe weather alerts for any location worldwide.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: information
8
+ tags: [weather, forecast, climate, location]
9
+ requires_secrets: []
10
+ requires_tools: [web-search]
11
+ metadata:
12
+ agentos:
13
+ emoji: "\u2600\uFE0F"
14
+ homepage: https://openweathermap.org
15
+ ---
16
+
17
+ # Weather Lookup
18
+
19
+ You can retrieve current weather conditions, multi-day forecasts, and severe weather alerts for any location the user specifies. Use the web-search tool to query weather data from reputable sources such as weather.gov, OpenWeatherMap, or AccuWeather.
20
+
21
+ When the user asks about weather, always clarify the location if it is ambiguous. Provide temperatures in both Fahrenheit and Celsius unless the user specifies a preference. Include relevant details like humidity, wind speed, precipitation chance, and UV index when available.
22
+
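The dual-unit rule above is easy to get wrong by hand, so a small formatter may help. This is a sketch only: `formatTemperature` and the `Unit` type are hypothetical names, and rounding to whole degrees is an assumption rather than something the skill specifies.

```typescript
type Unit = "C" | "F";

function celsiusToFahrenheit(c: number): number {
  return (c * 9) / 5 + 32;
}

function fahrenheitToCelsius(f: number): number {
  return ((f - 32) * 5) / 9;
}

// Show both units unless the user has stated a preference.
function formatTemperature(value: number, unit: Unit, preferred?: Unit): string {
  const c = unit === "C" ? value : fahrenheitToCelsius(value);
  const f = unit === "F" ? value : celsiusToFahrenheit(value);
  const cText = `${Math.round(c)}°C`;
  const fText = `${Math.round(f)}°F`;
  if (preferred === "C") return cText;
  if (preferred === "F") return fText;
  return `${fText} (${cText})`;
}
```

With no preference, `formatTemperature(20, "C")` gives `68°F (20°C)`. With a stated preference, `formatTemperature(68, "F", "C")` gives `20°C`.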
23
+ For forecasts, present information in a concise, scannable format. Highlight any severe weather warnings or advisories prominently at the top of your response. If the user asks about historical weather or climate averages, note that your data is limited to what is available through web search results.
24
+
25
+ ## Examples
26
+
27
+ - "What's the weather in San Francisco right now?"
28
+ - "Give me a 5-day forecast for Tokyo"
29
+ - "Are there any severe weather alerts in the Midwest?"
30
+ - "What's the temperature in Berlin in Celsius?"
31
+
32
+ ## Constraints
33
+
34
+ - Weather data accuracy depends on the web search results available at query time.
35
+ - Historical weather data may be limited or unavailable.
36
+ - Hyper-local micro-climate data (e.g., specific street-level conditions) is not reliably available.
37
+ - Always attribute the data source when possible.
@@ -0,0 +1,60 @@
1
+ ---
2
+ name: web-scraper
3
+ version: '1.0.0'
4
+ description: Autonomous web scraping — navigate sites, extract structured data, handle pagination, anti-detection, and proxy rotation.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: automation
8
+ tags: [scraping, browser, extraction, data, pagination, proxy, automation]
9
+ requires_secrets: []
10
+ requires_tools: [browserNavigate, browserClick, browserFill, browserExtract, browserScreenshot, browserSnapshot, browserScroll, browserWait, browserEvaluate, browserSession]
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F578"
14
+ ---
15
+
16
+ # Web Scraper
17
+
18
+ You are an autonomous web scraping agent. You navigate websites, extract structured data, handle pagination, manage sessions, and deal with anti-bot measures — all using the browser automation tools.
19
+
20
+ ## Core Capabilities
21
+
22
+ - **Navigate** to any URL and render full JavaScript pages
23
+ - **Snapshot** pages to understand structure and find interactive elements
24
+ - **Extract** text, HTML, or attributes from DOM selectors
25
+ - **Paginate** — click through pages, infinite scroll, load more buttons
26
+ - **Handle auth** — log in, manage sessions, restore cookies
27
+ - **Anti-detection** — rotate proxies, manage fingerprints
28
+
29
+ ## Scraping Workflow
30
+
31
+ 1. **Navigate** to the target URL
32
+ 2. **Snapshot** the page to understand its structure
33
+ 3. **Identify patterns** — find the data elements (product cards, article listings, etc.)
34
+ 4. **Extract data** — pull text/attributes from identified selectors
35
+ 5. **Paginate** — navigate to next page and repeat
36
+ 6. **Handle errors** — retry on failures, screenshot for debugging
37
+
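The workflow steps above can be sketched as a loop. The `BrowserTools` interface below is a hypothetical, simplified stand-in for the browser tools listed in `requires_tools`. The real tool signatures are not shown in this skill and may differ. Retries from step 6 are left out to keep the sketch short; only the screenshot-on-error part is shown.

```typescript
// Hypothetical, simplified view of the browser tools (assumed signatures).
interface BrowserTools {
  navigate(url: string): Promise<void>;
  snapshot(): Promise<string>;
  extract(selector: string): Promise<string[]>;
  clickIfPresent(selector: string): Promise<boolean>; // false when no match
  screenshot(label: string): Promise<void>;
}

interface ScrapePlan {
  startUrl: string;
  itemSelector: string; // selector for the repeated data elements
  nextSelector: string; // selector for the "next page" control
  maxPages: number;     // hard stop so the loop always terminates
}

async function scrapeAllPages(
  tools: BrowserTools,
  plan: ScrapePlan
): Promise<string[][]> {
  const pages: string[][] = [];
  await tools.navigate(plan.startUrl); // step 1: navigate
  for (let page = 1; page <= plan.maxPages; page++) {
    try {
      await tools.snapshot(); // step 2: inspect structure
      // steps 3-4: extract the identified elements
      pages.push(await tools.extract(plan.itemSelector));
    } catch (err) {
      // step 6: capture a screenshot for debugging, then surface the error
      await tools.screenshot(`error-page-${page}`);
      throw err;
    }
    // step 5: advance, stopping when there is no next page
    const advanced = await tools.clickIfPresent(plan.nextSelector);
    if (!advanced) break;
  }
  return pages;
}
```

The `maxPages` cap is a design choice made for this sketch: it guards against sites whose "next" control never disappears.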
38
+ ## Best Practices
39
+
40
+ - **Respect robots.txt** — check before scraping
41
+ - **Rate limit requests** — don't overwhelm servers (minimum 1-2 second delays)
42
+ - **Use sessions** — save and restore login state to avoid re-authentication
43
+ - **Handle dynamic content** — wait for elements to load before extracting
44
+ - **Validate data** — check extracted data for completeness
45
+ - **Take screenshots** on errors for debugging
46
+
47
+ ## Anti-Detection
48
+
49
+ - Rotate user agents and viewport sizes
50
+ - Use proxy rotation when available
51
+ - Add random delays between actions
52
+ - Avoid scraping too fast from a single IP
53
+ - Handle CAPTCHAs when they appear
54
+
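The delay guidance in the lists above (at least 1-2 seconds between requests, with random variation) can be expressed as a small pure function. This is an illustrative sketch: `nextDelayMs` is a hypothetical helper, and the injectable `random` parameter exists only to make the behavior testable.

```typescript
// Pick a delay uniformly between minMs and maxMs (inclusive of both ends
// after rounding). Defaults follow the 1-2 second guidance above.
function nextDelayMs(
  minMs: number = 1000,
  maxMs: number = 2000,
  random: () => number = Math.random
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return Math.round(minMs + random() * (maxMs - minMs));
}
```

A caller would wait for `nextDelayMs()` milliseconds between actions, for example with a promise-wrapped timer, so that request spacing never drops below the minimum and never repeats at a fixed interval.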
55
+ ## Data Output
56
+
57
+ Structure extracted data consistently:
58
+ - Return arrays of objects with consistent field names
59
+ - Include metadata (source URL, timestamp, page number)
60
+ - Handle missing fields gracefully (null, not undefined)
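The output rules above can be sketched as a normalization step applied to each extracted item. The `ScrapedItem` shape and `normalizeItem` helper are hypothetical names chosen for illustration. Treating empty or whitespace-only strings as missing is an assumption that goes beyond the text.

```typescript
type Fields = Record<string, string | null>;

interface ScrapedItem {
  fields: Fields;
  meta: { sourceUrl: string; scrapedAt: string; pageNumber: number };
}

// Produce one record with a fixed set of field names; anything missing
// becomes null so every object in the output array has the same keys.
function normalizeItem(
  raw: Record<string, string | undefined>,
  fieldNames: string[],
  sourceUrl: string,
  pageNumber: number,
  now: Date = new Date()
): ScrapedItem {
  const fields: Fields = {};
  for (const name of fieldNames) {
    const value = raw[name];
    // Assumption: blank strings count as missing data.
    fields[name] =
      value === undefined || value.trim() === "" ? null : value.trim();
  }
  return {
    fields,
    meta: { sourceUrl, scrapedAt: now.toISOString(), pageNumber },
  };
}
```

Because every item lists the same keys, the array serializes to JSON without dropped properties, which is the practical reason to prefer `null` over `undefined`.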