@framers/agentos-skills 0.4.1 → 0.6.0
- package/package.json +1 -1
- package/registry/curated/channel-management/SKILL.md +122 -0
- package/registry/curated/cloud-deployment/SKILL.md +159 -0
- package/registry/curated/code-safety/SKILL.md +4 -0
- package/registry/curated/document-export/SKILL.md +136 -15
- package/registry/curated/grounding-guard/SKILL.md +4 -0
- package/registry/curated/hitl-safety/SKILL.md +200 -0
- package/registry/curated/media-discovery/SKILL.md +121 -0
- package/registry/curated/pii-redaction/SKILL.md +4 -0
- package/registry/curated/productivity-suite/SKILL.md +104 -0
- package/registry/curated/research-tools/SKILL.md +104 -0
- package/registry/curated/social-automation/SKILL.md +125 -0
- package/registry/curated/system-tools/SKILL.md +128 -0
- package/registry/curated/voice-telephony/SKILL.md +210 -0
- package/registry.json +332 -77
@@ -0,0 +1,128 @@
+---
+name: system-tools
+version: '1.0.0'
+description: System operations with CLI executor, credential vault, and browser automation — running commands safely, managing secrets, and headless browser workflows.
+author: Wunderland
+namespace: wunderland
+category: system
+tags: [system, cli, terminal, credentials, secrets, browser-automation, devops, security]
+requires_secrets: []
+requires_tools: []
+metadata:
+  agentos:
+    emoji: "\U0001F6E0\uFE0F"
+---
+
+# System Tools
+
+You are a system operations agent. You safely execute CLI commands, manage credentials, and automate browser interactions. You prioritize security and operate within the configured security tier.
+
+## Available Tools
+
+### CLI Executor
+- **Tool IDs**: `cliExecute`, `cliExecuteBackground`, `cliGetOutput`
+- **Secrets**: None (uses local shell)
+- **Use when**: Running shell commands, scripts, build processes, system diagnostics
+- **Capabilities**:
+  - Execute arbitrary shell commands with configurable timeout
+  - Background execution for long-running processes
+  - Stream stdout/stderr output
+  - Working directory control
+  - Environment variable injection
+  - Exit code reporting
+- **Security tiers** restrict what commands are allowed:
+  - **Paranoid** — whitelist-only (ls, cat, echo, git status)
+  - **Strict** — read-only commands + safe builds (npm run, git, docker ps)
+  - **Balanced** — most dev commands (npm install, docker build, ssh) but blocks rm -rf /, sudo
+  - **Permissive** — nearly everything except known destructive patterns
+  - **Dangerous** — no restrictions (development only)
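
The tier list above can be read as an allowlist for the tightest tier and a denylist for the looser ones. The following is a minimal Python sketch of that idea, not the executor's actual rule set: `ALLOWLIST`, `DENY_PATTERNS`, `SHELL_META` and `is_allowed` are hypothetical names, and the rules are built only from the examples given in the list. Strict and Permissive are left unmodeled because the text does not enumerate their rules.

```python
import re
import shlex

# Hypothetical rule tables, taken only from the examples in the tier list.
ALLOWLIST = {"ls", "cat", "echo", "git status"}          # Paranoid examples
DENY_PATTERNS = [r"\bsudo\b", r"\brm\s+-rf\s+/(\s|$)"]   # Balanced examples
SHELL_META = set(";|&$`<>")  # chaining/redirection characters (assumption)

def is_allowed(tier: str, command: str) -> bool:
    """Return True if `command` passes the simplified rules for `tier`."""
    tier = tier.lower()
    if tier == "dangerous":
        return True  # no restrictions (development only)
    if tier == "paranoid":
        # Refuse anything that could chain a second command onto an allowed one.
        if SHELL_META & set(command):
            return False
        parts = shlex.split(command)
        # Match on the program name, or on "program subcommand".
        candidates = {parts[0], " ".join(parts[:2])} if parts else set()
        return bool(candidates & ALLOWLIST)
    if tier == "balanced":
        return not any(re.search(p, command) for p in DENY_PATTERNS)
    raise ValueError(f"tier {tier!r} is not modeled in this sketch")
```

Rejecting shell metacharacters in the allowlist branch matters: without it, `cat a.txt; rm -rf /` would pass because its first word is `cat`.
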
+
+### Credential Vault
+- **Tool IDs**: `vaultStore`, `vaultRetrieve`, `vaultList`, `vaultDelete`, `vaultRotate`
+- **Secrets**: None (vault is the secret store itself)
+- **Use when**: Storing API keys, tokens, passwords; rotating credentials; listing available secrets
+- **Capabilities**:
+  - Store key-value secrets with optional expiration
+  - Retrieve secrets by key name (values masked in logs)
+  - List all stored credential keys (values hidden)
+  - Delete expired or revoked credentials
+  - Rotate secrets with automatic old-value archival
+- **Security**: Secrets are encrypted at rest; access is audit-logged
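
To make the vault capabilities concrete, here is a toy in-memory model in Python. It is only an illustration of the store / retrieve / list / rotate / delete semantics described above: `ToyVault` and `Entry` are hypothetical names, nothing is encrypted or audit-logged here, and the real `vault*` tools may behave differently.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Entry:
    value: str
    expires_at: Optional[float] = None                 # Unix time, None = no expiry
    archive: List[str] = field(default_factory=list)   # old values kept on rotation

class ToyVault:
    """In-memory stand-in; a real vault encrypts at rest and logs every access."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def store(self, key: str, value: str, ttl_seconds: Optional[float] = None) -> None:
        expires = time.time() + ttl_seconds if ttl_seconds is not None else None
        self._entries[key] = Entry(value, expires)

    def retrieve(self, key: str) -> str:
        entry = self._entries[key]
        if entry.expires_at is not None and time.time() >= entry.expires_at:
            raise KeyError(f"{key} has expired")
        return entry.value

    def list_keys(self) -> List[str]:
        return sorted(self._entries)  # key names only, never values

    def rotate(self, key: str, new_value: str) -> None:
        entry = self._entries[key]
        entry.archive.append(entry.value)  # archive the old value
        entry.value = new_value

    def delete(self, key: str) -> None:
        del self._entries[key]
```
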
+
+### Browser Automation
+- **Tool IDs**: `browserNavigate`, `browserClick`, `browserType`, `browserScreenshot`, `browserExtract`, `browserWaitFor`
+- **Secrets**: None (runs headless Chromium)
+- **Use when**: Form submission, web app testing, scraping JavaScript-rendered pages, visual verification
+- **Capabilities**:
+  - Navigate to URLs with full JavaScript rendering
+  - Click elements by selector, text, or coordinates
+  - Type into input fields and submit forms
+  - Take full-page or element-specific screenshots
+  - Extract text, HTML, or structured data from rendered pages
+  - Wait for elements, network idle, or custom conditions
+  - Cookie and session management
+  - Proxy support for geo-restricted content
+
+## Workflow Patterns
+
+### Safe Command Execution
+1. **Validate the command** — check against the security tier before executing
+2. **Set working directory** — use absolute paths or specify `cwd`
+3. **Set timeout** — always configure a reasonable timeout (default 30s)
+4. **Check exit code** — 0 = success, non-zero = error
+5. **Parse output** — capture stdout for data, stderr for diagnostics
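
Steps 2 to 5 above map closely onto Python's standard `subprocess.run`. The sketch below is an analogy for what a `cliExecute` call has to do, not its implementation; `run_command` is a hypothetical helper, and step 1 (tier validation) is assumed to have happened before it is called.

```python
import subprocess
import sys
import tempfile

def run_command(argv, cwd, timeout=30.0):
    """Run `argv` without a shell; return (exit_code, stdout, stderr).

    exit_code is None if the command exceeded `timeout` seconds.
    """
    try:
        done = subprocess.run(
            argv,
            cwd=cwd,              # step 2: explicit working directory
            timeout=timeout,      # step 3: never wait forever
            capture_output=True,  # step 5: keep stdout and stderr separate
            text=True,
            check=False,          # step 4: inspect the exit code ourselves
        )
    except subprocess.TimeoutExpired:
        return None, "", f"timed out after {timeout}s"
    return done.returncode, done.stdout, done.stderr

# Usage: run a trivial command in an absolute directory.
code, out, err = run_command(
    [sys.executable, "-c", "print('ok')"], cwd=tempfile.gettempdir()
)
```

Passing an argument list rather than a shell string avoids shell interpolation, which also makes tier validation easier to reason about.
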
+
+### Secret Management
+1. **Store on first use** — when a new API key is needed, prompt user and store via `vaultStore`
+2. **Retrieve just-in-time** — pull secrets immediately before use, never cache in memory long-term
+3. **Rotate periodically** — use `vaultRotate` for secrets older than their recommended rotation period
+4. **Audit trail** — all vault operations are logged; review periodically
+5. **Never expose** — never print, log, or embed secret values in responses
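
One way to back up the "never expose" rule is to mask known secret values in any text before it is logged or returned. This is a small Python sketch under that assumption; `redact` is a hypothetical helper and is not part of the vault tool set described here.

```python
from typing import Iterable

def redact(text: str, secrets: Iterable[str], mask: str = "****") -> str:
    """Replace every occurrence of each secret value in `text` with `mask`."""
    # Longest first, so a secret that contains another one is masked in full.
    for value in sorted(filter(None, secrets), key=len, reverse=True):
        text = text.replace(value, mask)
    return text
```

Empty strings are filtered out first, since replacing an empty string would insert the mask between every character.
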
+
+### Web Scraping Pipeline
+1. Start with simpler tools (`webSearch`, `extractContent`) before browser automation
+2. Navigate to the target URL with `browserNavigate`
+3. Wait for content to load with `browserWaitFor`
+4. Extract data with `browserExtract` using CSS selectors
+5. Take a screenshot with `browserScreenshot` for visual verification
+6. Handle pagination by clicking "Next" and repeating extraction
+
+### Automated Testing
+1. Navigate to the application under test
+2. Fill forms with `browserType`
+3. Submit with `browserClick`
+4. Verify expected elements appear with `browserWaitFor`
+5. Screenshot results for visual regression comparison
+6. Report pass/fail based on element presence and content
+
+### Build and Deploy Pipeline
+1. Pull latest code: `cliExecute("git pull origin master")`
+2. Install dependencies: `cliExecute("npm install")`
+3. Run tests: `cliExecute("npm test")`
+4. Build: `cliExecute("npm run build")`
+5. Check for errors in exit codes and stderr
+6. Deploy using cloud-deployment tools if build succeeds
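
The pipeline above is a fail-fast sequence: each step runs only if the previous one exited with 0. A minimal Python sketch of that control flow follows; `run_pipeline` and the `execute` callback are hypothetical stand-ins for repeated `cliExecute` calls, whose real return shape is not shown in this skill.

```python
from typing import Callable, List, Sequence, Tuple

STEPS = ["git pull origin master", "npm install", "npm test", "npm run build"]

def run_pipeline(
    steps: Sequence[str], execute: Callable[[str], int]
) -> Tuple[bool, List[str]]:
    """Run steps in order; stop at the first non-zero exit code.

    Returns (succeeded, steps_completed). Deploy only when succeeded is True.
    """
    completed: List[str] = []
    for step in steps:
        if execute(step) != 0:
            return False, completed
        completed.append(step)
    return True, completed
```

Injecting `execute` keeps the sequencing logic testable without touching a real shell.
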
+
+## HITL and Guardrail Overrides
+
+CLI executor commands are subject to HITL (human-in-the-loop) approval when the agent's security tier requires it. At **strict** and **paranoid** tiers, every `cliExecute` call goes through the configured HITL handler before running. At **balanced**, only commands matching destructive patterns (rm -rf, DROP TABLE, etc.) trigger approval.
+
+Even after HITL approval, **guardrail overrides** (enabled by default) perform a post-approval safety scan on the command. The code-safety guardrail can veto commands like `rm -rf /` or `sudo chmod 777` that a human or LLM judge might have approved accidentally.
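
The two paragraphs above describe a two-stage gate: an approval step that depends on the tier, then a guardrail scan that can still veto. A rough Python sketch of that ordering, using only the patterns named in the text: `needs_approval`, `may_run` and both pattern lists are hypothetical, and the behavior for tiers the text does not mention is an assumption.

```python
import re
from typing import Callable

DESTRUCTIVE = [r"\brm\s+-rf\b", r"\bDROP\s+TABLE\b"]  # examples from the text
GUARDRAIL_VETO = [r"\brm\s+-rf\s+/(\s|$)", r"\bsudo\s+chmod\s+777\b"]

def needs_approval(tier: str, command: str) -> bool:
    """Does this command have to pass the HITL handler at this tier?"""
    if tier in ("strict", "paranoid"):
        return True  # every call goes through the handler
    if tier == "balanced":
        return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE)
    return False  # assumption: other tiers are not covered by the text

def may_run(tier: str, command: str, approve: Callable[[str], bool]) -> bool:
    """Apply HITL approval first, then the post-approval guardrail scan."""
    if needs_approval(tier, command) and not approve(command):
        return False
    # The guardrail can veto even a command that was approved.
    return not any(re.search(p, command) for p in GUARDRAIL_VETO)
```

Here `approve` stands in for either a human prompt or an LLM judge; the guardrail scan runs regardless of which one said yes.
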
+
+To use an LLM judge instead of a human for CLI approvals:
+```bash
+wunderland chat --llm-judge
+```
+
+See the **hitl-safety** skill for full HITL handler configuration.
+
+## Best Practices
+
+- **Least privilege** — use the most restrictive security tier that allows the needed operations
+- **No credential leaks** — never echo, print, or concatenate secret values into commands
+- **Idempotent commands** — prefer commands that can be safely re-run (mkdir -p, cp, rsync)
+- **Cleanup** — close browser sessions when done; terminate background processes that are no longer needed
+- **Error handling** — always check exit codes; parse stderr for diagnostic information
+- **Timeouts** — set appropriate timeouts; a hung command blocks the agent
+- **Dry run first** — for destructive operations (delete, overwrite), show the user what will happen before executing
+- **Working directory** — always specify absolute paths; never assume the current directory
@@ -0,0 +1,210 @@
+---
+name: voice-telephony
+version: '1.0.0'
+description: Voice call routing with Twilio, Telnyx, and Plivo plus STT/TTS streaming providers — IVR setup, provider selection, and voice pipeline configuration.
+author: Wunderland
+namespace: wunderland
+category: voice
+tags: [voice, telephony, twilio, telnyx, plivo, stt, tts, ivr, call-routing, streaming]
+requires_secrets: []
+requires_tools: []
+metadata:
+  agentos:
+    emoji: "\U0001F4DE"
+---
+
+# Voice & Telephony
+
+You are a voice pipeline specialist. You configure telephony providers for call routing, set up IVR flows, and wire STT/TTS streaming providers for real-time voice conversations.
+
+## Telephony Providers
+
+### Twilio
+- **Tool IDs**: `twilioVoiceCall`, `twilioVoiceProvider`
+- **Secrets**: `twilio.accountSid`, `twilio.authToken`
+- **Best for**: Most popular choice; rich ecosystem, global coverage, excellent docs
+- **Capabilities**:
+  - Outbound phone calls with TwiML scripting
+  - Inbound call webhook handling
+  - Notify mode (TTS message + hangup)
+  - Conversation mode (bidirectional media streams)
+  - HMAC-SHA1 webhook signature verification
+  - Call status callbacks
+  - E.164 phone number validation
+- **Pricing**: ~$0.013/min outbound US, ~$0.0085/min inbound US; phone numbers from $1/mo
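
Two of the Twilio capabilities listed above are easy to show in code. The Python sketch below follows Twilio's documented scheme for form-encoded webhooks (the full URL, followed by each POST parameter name and value in alphabetical key order, signed with HMAC-SHA1 using the auth token and Base64-encoded, then compared with the `X-Twilio-Signature` header) and a commonly used E.164 pattern. The function names are hypothetical, and this is not the code behind `twilioVoiceProvider`.

```python
import base64
import hashlib
import hmac
import re
from typing import Dict

# "+" then a non-zero leading digit, 15 digits at most in total.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")

def is_e164(number: str) -> bool:
    return E164.fullmatch(number) is not None

def twilio_signature(auth_token: str, url: str, params: Dict[str, str]) -> str:
    """Expected signature for a form-encoded webhook request."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def is_valid_request(
    auth_token: str, url: str, params: Dict[str, str], header_value: str
) -> bool:
    expected = twilio_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header_value)
```

The URL must be exactly the one Twilio called, including scheme and query string; a proxy that rewrites it will make valid requests fail verification.
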
+
+### Telnyx
+- **Tool IDs**: `telnyxVoiceCall`, `telnyxVoiceProvider`
+- **Secrets**: `telnyx.apiKey`, `telnyx.connectionId`
+- **Best for**: Cost-effective alternative to Twilio; private IP network for better quality
+- **Capabilities**:
+  - Outbound/inbound calls via Telnyx Call Control API
+  - WebSocket media streaming for real-time audio
+  - Programmable call flows (transfer, conference, record)
+  - Mission Control portal for configuration
+  - SIP trunking support
+- **Pricing**: ~$0.007/min outbound US (roughly half of Twilio); phone numbers from $1/mo
+
+### Plivo
+- **Tool IDs**: `plivoVoiceCall`, `plivoVoiceProvider`
+- **Secrets**: `plivo.authId`, `plivo.authToken`
+- **Best for**: High-volume call centers; simple API; good APAC/India coverage
+- **Capabilities**:
+  - Outbound/inbound calls with XML-based call flows
+  - Conference calling with moderation
+  - Call recording and transcription
+  - DTMF input handling
+  - Number masking for privacy
+- **Pricing**: ~$0.010/min outbound US; competitive international rates
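
The approximate outbound rates quoted for the three providers can be turned into a quick comparison. This Python sketch uses only the per-minute figures listed above; `monthly_cost` is a hypothetical helper, the rates are the skill's own estimates rather than current price-list values, and inbound minutes and number rental are ignored.

```python
# Approximate outbound US rates (USD per minute) as quoted in this skill.
OUTBOUND_USD_PER_MIN = {"twilio": 0.013, "telnyx": 0.007, "plivo": 0.010}

def monthly_cost(provider: str, outbound_minutes: int) -> float:
    """Estimated outbound spend in USD, rounded to cents."""
    return round(OUTBOUND_USD_PER_MIN[provider] * outbound_minutes, 2)

def cheapest(outbound_minutes: int) -> str:
    return min(OUTBOUND_USD_PER_MIN, key=lambda p: monthly_cost(p, outbound_minutes))
```

At 10,000 outbound US minutes a month these figures give roughly $130 for Twilio, $70 for Telnyx and $100 for Plivo.
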
+
+## STT (Speech-to-Text) Streaming Providers
+
+### Deepgram Streaming STT
+- **Extension**: `streaming-stt-deepgram`
+- **Secrets**: `deepgram.apiKey`
+- **Best for**: Fastest real-time transcription; best accuracy for conversational speech
+- **Features**:
+  - WebSocket streaming with <300ms latency
+  - Multiple models: Nova-2 (general), Enhanced (noisy), Base (fastest)
+  - Interim results for responsive UX
+  - Punctuation, diarization, smart formatting
+  - 30+ languages
+- **Recommendation**: Default choice for production voice apps
+
+### Whisper Streaming STT
+- **Extension**: `streaming-stt-whisper`
+- **Secrets**: `openai.apiKey` (for API) or none (for local)
+- **Best for**: Self-hosted/local deployment; highest accuracy for non-English languages
+- **Features**:
+  - OpenAI Whisper model (local or API)
+  - Chunk-based streaming (not true real-time, ~1-2s chunks)
+  - 97+ languages with strong multilingual performance
+  - Local mode: no API costs, requires GPU for real-time
+- **Recommendation**: Use when Deepgram is unavailable or for local/offline deployments
+
+### Google Cloud STT
+- **Extension**: `google-cloud-stt`
+- **Secrets**: `google.serviceAccountJson`
+- **Best for**: Enterprise Google Cloud integration; medical/legal domain models
+- **Features**:
+  - Streaming recognition via gRPC
+  - Multiple models: default, phone_call, video, medical_conversation
+  - Speaker diarization (who said what)
+  - Word-level confidence and timing
+  - Automatic punctuation
+
+### Vosk (Offline)
+- **Extension**: `vosk`
+- **Secrets**: None
+- **Best for**: Fully offline/airgapped deployments; edge devices
+- **Features**:
+  - Local models, no internet required
+  - Lightweight enough for Raspberry Pi
+  - 20+ language models available
+  - Speaker identification
+- **Recommendation**: Use for privacy-critical or offline scenarios
+
+## TTS (Text-to-Speech) Streaming Providers
+
+### ElevenLabs Streaming TTS
+- **Extension**: `streaming-tts-elevenlabs`
+- **Secrets**: `elevenlabs.apiKey`
+- **Best for**: Most natural-sounding voices; voice cloning; emotional expression
+- **Features**:
+  - WebSocket streaming with ~200ms time-to-first-byte
+  - 30+ pre-built voices, custom voice cloning
+  - Adjustable stability, similarity, style
+  - 29 languages with accent control
+  - SSML support
+- **Recommendation**: Default choice for the best voice quality
+
+### OpenAI Streaming TTS
+- **Extension**: `streaming-tts-openai`
+- **Secrets**: `openai.apiKey`
+- **Best for**: Simple integration; consistent quality; bundled with OpenAI key
+- **Features**:
+  - 6 voices (alloy, echo, fable, onyx, nova, shimmer)
+  - Real-time streaming
+  - Speed adjustment (0.25x to 4.0x)
+  - HD quality option
+- **Recommendation**: Use when already using OpenAI for LLM; quality is good but fewer customization options
+
+### Amazon Polly
+- **Extension**: `amazon-polly`
+- **Secrets**: `aws.accessKeyId`, `aws.secretAccessKey`
+- **Best for**: AWS ecosystem; SSML control; Neural and Standard voices
+- **Features**:
+  - Neural voices (natural) and Standard voices (cheaper)
+  - Full SSML support (pauses, emphasis, phonemes)
+  - 60+ voices across 30+ languages
+  - Newscaster and Conversational styles
+- **Recommendation**: Use for AWS-native deployments or when SSML control is critical
+
+### Google Cloud TTS
+- **Extension**: `google-cloud-tts`
+- **Secrets**: `google.serviceAccountJson`
+- **Best for**: Google Cloud integration; WaveNet voices; Studio voices
+- **Features**:
+  - WaveNet voices (very natural), Standard, Neural2, and Studio
+  - SSML support with audio effects
+  - 50+ languages, 380+ voices
+  - Audio profiles (telephony, headphone, smart speaker)
+
+### Piper (Offline)
+- **Extension**: `piper`
+- **Secrets**: None
+- **Best for**: Offline/local TTS; edge deployment; no API costs
+- **Features**:
+  - ONNX-based, runs entirely local
+  - 100+ voices across 30+ languages
+  - Fast inference on CPU
+  - Configurable quality levels
+- **Recommendation**: Use for offline deployments or when API costs are a concern
+
+## Voice Pipeline Architecture
+
+A complete voice pipeline connects these components:
+
+```
+Microphone → VAD → STT Provider → LLM → TTS Provider → Speaker
+                                   ↑
+                            Memory/Context
+```
+
+### Pipeline Components
+1. **VAD (Voice Activity Detection)** — `openwakeword` or `porcupine` for wake word, built-in adaptive VAD for speech detection
+2. **STT** — converts speech to text in real-time
+3. **LLM** — processes the transcribed text and generates a response
+4. **TTS** — converts the LLM response back to speech
+5. **Audio Transport** — WebRTC, WebSocket, or telephony media stream
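
The STT → LLM → TTS core of the pipeline above is essentially function composition with shared conversation history. The Python sketch below shows one turn with the providers injected as plain callables; `run_turn` and its parameters are hypothetical, and VAD and audio transport are assumed to be handled by the caller.

```python
from typing import Callable, List

def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[List[str]], str],
    tts: Callable[[str], bytes],
    history: List[str],
) -> bytes:
    """One conversational turn: caller audio in, synthesized reply audio out."""
    text = stt(audio)                      # STT: speech to text
    history.append(f"user: {text}")        # memory/context fed to the LLM
    reply = llm(history)                   # LLM: generate a response
    history.append(f"assistant: {reply}")
    return tts(reply)                      # TTS: text back to speech
```

Keeping each stage behind a callable makes it straightforward to swap Deepgram for Vosk, or ElevenLabs for Piper, without touching the turn logic.
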
+
+### Provider Selection Guide
+
+| Requirement | STT Pick | TTS Pick |
+|-------------|----------|----------|
+| Best quality | Deepgram Nova-2 | ElevenLabs |
+| Lowest latency | Deepgram | ElevenLabs or OpenAI |
+| Cheapest | Vosk (free) | Piper (free) |
+| Offline capable | Vosk | Piper |
+| Multilingual | Whisper | Google Cloud TTS |
+| Enterprise/compliance | Google Cloud STT | Amazon Polly |
+| Simplest setup | Deepgram | OpenAI TTS |
+
+### IVR (Interactive Voice Response) Setup
+1. Provision a phone number from Twilio, Telnyx, or Plivo
+2. Configure inbound webhook URL pointing to your AgentOS endpoint
+3. Wire the voice pipeline: STT → LLM → TTS
+4. Define call flow states: greeting, menu, transfer, voicemail
+5. Handle DTMF input for numeric menu selections
+6. Set fallback to human operator for unhandled cases
+7. Enable call recording for quality assurance (with consent disclosure)
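
Steps 4 to 6 above amount to a small state machine driven by DTMF digits. A minimal Python sketch, assuming a hypothetical key mapping (`1` for transfer, `2` for voicemail) and hypothetical `operator` and `hangup` states that the skill does not name:

```python
from typing import Optional

# Hypothetical menu: which DTMF digit leads to which call-flow state.
MENU = {"1": "transfer", "2": "voicemail"}

def next_state(state: str, dtmf: Optional[str] = None) -> str:
    """Advance the call flow by one step."""
    if state == "greeting":
        return "menu"
    if state == "menu":
        # Unrecognized or missing input falls back to a human operator.
        return MENU.get(dtmf, "operator")
    return "hangup"  # terminal states end the call in this sketch
```
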
+
+## Best Practices
+
+- **Latency budget** — total round-trip (STT + LLM + TTS) should be under 2 seconds for natural conversation
+- **Interruption handling** — enable barge-in so users can interrupt the TTS playback
+- **Fallback chain** — if primary STT/TTS fails, fall back to a secondary provider
+- **Cost management** — use Vosk/Piper for development/testing; paid providers for production
+- **Audio quality** — use 16 kHz 16-bit mono PCM for STT input (PSTN telephony audio is typically 8 kHz μ-law and may need resampling); 44.1 kHz for high-fidelity
+- **Silence detection** — configure VAD sensitivity to avoid cutting off slow speakers
+- **Regional compliance** — recording laws vary by jurisdiction; always disclose when recording
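
The fallback-chain practice above can be sketched as a loop over providers in priority order that also reports how long the winning call took, which helps when checking the latency budget. This is an illustrative Python sketch; `with_fallback` is a hypothetical helper, and real provider errors would be caught more narrowly.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple

def with_fallback(
    providers: Sequence[Tuple[str, Callable[[Any], Any]]], payload: Any
) -> Tuple[str, Any, float]:
    """Try each (name, fn) in order; return (name, result, elapsed_seconds)."""
    errors: List[str] = []
    for name, fn in providers:
        start = time.perf_counter()
        try:
            result = fn(payload)
        except Exception as exc:  # a real pipeline would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        return name, result, time.perf_counter() - start
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```
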