@framers/agentos-skills-registry 0.8.0 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -6,7 +6,7 @@
6
6
 
7
7
  # @framers/agentos-skills-registry
8
8
 
9
- Curated skills registry for [AgentOS](https://github.com/framersai/agentos) — 40 SKILL.md prompt modules, typed catalog, and lazy-loading factories.
9
+ Curated catalog of 60+ AgentOS skills with query helpers and lazy-loading factories.
10
10
 
11
11
  [![npm](https://img.shields.io/npm/v/@framers/agentos-skills-registry?logo=npm&color=cb3837)](https://www.npmjs.com/package/@framers/agentos-skills-registry)
12
12
 
@@ -14,15 +14,36 @@ Curated skills registry for [AgentOS](https://github.com/framersai/agentos) —
14
14
  npm install @framers/agentos-skills-registry
15
15
  ```
16
16
 
17
- ## What's Inside
17
+ ## What This Package Is
18
18
 
19
- This is the **single package** for AgentOS skills. It contains:
19
+ This is the **skills catalog**: the data layer that ships 60+ curated SKILL.md
20
+ prompt modules and provides typed query helpers, search functions, and lazy-loading
21
+ factories for consuming them.
20
22
 
21
- **40 curated SKILL.md files** — prompt modules spanning social automation, developer tooling, productivity, research, voice, and more
22
- - **registry.json** — machine-readable index of all skills with metadata
23
- **Static catalog** (`SKILLS_CATALOG`) — typed array with query helpers
24
- - **Registry factories** — `createCuratedSkillRegistry()`, `createCuratedSkillSnapshot()` (requires `@framers/agentos`)
25
- - **Validation script** — `npm run validate` to lint SKILL.md files
23
+ It is **not** the skills engine. The engine lives in
24
+ [`@framers/agentos`](https://www.npmjs.com/package/@framers/agentos)
25
+ (exported from `@framers/agentos/skills`), which provides the runtime
26
+ `SkillRegistry`, `SkillSnapshot` builder, frontmatter parser, and eligibility
27
+ resolver.
28
+
29
+ ### Architecture: catalog vs. engine
30
+
31
+ ```
32
+ @framers/agentos ← the skills ENGINE (runtime)
33
+ └── /skills SkillRegistry, SkillSnapshot, parser, eligibility
34
+
35
+            ▲  lazy import()
36
+
37
+ @framers/agentos-skills-registry ← THIS package (CATALOG)
38
+ ├── registry/curated/*/SKILL.md 60+ bundled prompt modules
39
+ ├── registry.json machine-readable index of all skills
40
+ ├── catalog.ts SKILLS_CATALOG array + query helpers (zero deps)
41
+ └── index.ts factory functions that lazy-import the engine
42
+ ```
43
+
44
+ **Dependency direction:** this catalog package depends on `@framers/agentos`
45
+ (optional peer dep), never the other way around. The engine knows nothing about
46
+ the catalog — it just provides the parsing and registry machinery.
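The zero-dependency catalog layer described above boils down to a typed array plus plain array operations. A minimal sketch, with hypothetical entry shapes and helper names (`CatalogSkill`, `findByCategory`, `searchByTag`); the real `SKILLS_CATALOG` type in this package may differ:

```typescript
// Hypothetical shape of a catalog entry; the real SKILLS_CATALOG
// entries in this package may carry more fields.
interface CatalogSkill {
  name: string;
  category: string;
  tags: string[];
}

const catalog: CatalogSkill[] = [
  { name: 'weather', category: 'research', tags: ['weather', 'forecast'] },
  { name: 'amazon-polly', category: 'voice', tags: ['voice', 'tts'] },
  { name: 'diarization', category: 'voice', tags: ['voice', 'diarization'] },
];

// Query helpers are plain array filters: no engine, no peer deps.
function findByCategory(skills: CatalogSkill[], category: string): CatalogSkill[] {
  return skills.filter((s) => s.category === category);
}

function searchByTag(skills: CatalogSkill[], tag: string): CatalogSkill[] {
  return skills.filter((s) => s.tags.includes(tag));
}
```

This is why the catalog can be consumed from a browser UI without pulling in any runtime machinery.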
26
47
 
27
48
  ## Quick Start
28
49
 
@@ -67,7 +88,7 @@ Access the JSON index directly:
67
88
  import { getSkillsCatalog } from '@framers/agentos-skills-registry';
68
89
 
69
90
  const catalog = await getSkillsCatalog();
70
- console.log(catalog.skills.curated.length); // 40
91
+ console.log(catalog.skills.curated.length); // 60+
71
92
  console.log(catalog.version); // '1.0.0'
72
93
  ```
73
94
 
@@ -80,7 +101,8 @@ console.log(registry.skills.curated[0].name); // 'weather'
80
101
 
81
102
  ### 3. Dynamically load skills into an agent (requires @framers/agentos)
82
103
 
83
- The factory functions lazy-load `@framers/agentos` via dynamic `import()`:
104
+ The factory functions lazy-import `@framers/agentos/skills` (the engine) via
105
+ dynamic `import()` — resolved only when you call them, cached after first use:
84
106
 
85
107
  ```bash
86
108
  npm install @framers/agentos-skills-registry @framers/agentos
@@ -149,23 +171,26 @@ const valid = skillNames.filter((name) => {
149
171
  const snapshot = await createCuratedSkillSnapshot({ skills: valid });
150
172
  ```
151
173
 
152
- When `skills` is a string array, the registry only loads those specific `SKILL.md`
174
+ When `skills` is a string array, the catalog only loads those specific `SKILL.md`
153
175
  files before building the snapshot. It does not walk the full curated bundle first.
154
176
  Loaded skills also include parsed `metadata` so consumers do not need to decode
155
177
  the `metadata.agentos` block manually.
156
178
 
157
- ## Two Import Paths
179
+ ## Sub-exports
158
180
 
159
- | Import | Peer deps | Use case |
160
- |--------|-----------|----------|
161
- | `@framers/agentos-skills-registry/catalog` | None | UI browsing, search, filtering |
162
- | `@framers/agentos-skills-registry` | `@framers/agentos` (optional) | Runtime loading, snapshots, factories |
181
+ | Export path | Peer deps | Use case |
182
+ |-------------|-----------|----------|
183
+ | `@framers/agentos-skills-registry` | `@framers/agentos` (optional) | Full SDK: catalog + factory functions + schema types |
184
+ | `@framers/agentos-skills-registry/catalog` | None | Lightweight: `SKILLS_CATALOG`, query helpers (search, filter, browse) |
185
+ | `@framers/agentos-skills-registry/registry.json` | None | Raw JSON index of all skills |
186
+ | `@framers/agentos-skills-registry/workspace-discovery` | None | Discover SKILL.md files in workspace directories |
187
+ | `@framers/agentos-skills-registry/types` | None | TypeScript declarations for registry.json schema |
163
188
 
164
- The `@framers/agentos` dependency is loaded **lazily** at runtime and cached after first resolution. If it's not installed and you call a factory function, you get a clear error with install instructions.
189
+ The `@framers/agentos` dependency is loaded **lazily** at runtime and cached after first resolution. If it is not installed and you call a factory function, you get a clear error with install instructions. The catalog query helpers work without it.
165
190
 
166
- ## Included Skills (40)
191
+ ## Included Skills (60+)
167
192
 
168
- The catalog now includes both foundational utility skills and social automation modules, including:
193
+ The catalog includes both foundational utility skills and social automation modules:
169
194
 
170
195
  - Information and research: `web-search`, `weather`, `summarize`, `deep-research`
171
196
  - Developer tools: `github`, `coding-agent`, `git`
@@ -203,26 +228,6 @@ import type {
203
228
  // SkillInstallSpec — install instructions for skill dependencies
204
229
  ```
205
230
 
206
- ## Exports
207
-
208
- | Export path | Contents |
209
- |-------------|----------|
210
- | `.` | Full SDK: catalog helpers + factory functions + schema types |
211
- | `./catalog` | Lightweight: `SKILLS_CATALOG`, query helpers (zero peer deps) |
212
- | `./registry.json` | Raw JSON index of all skills |
213
- | `./types` | TypeScript declarations for registry.json schema |
214
-
215
- ## Relationship to Other Packages
216
-
217
- ```
218
- @framers/agentos-skills-registry ← This package (data + SDK)
219
- ├── registry/curated/*/SKILL.md (bundled prompt modules)
220
- ├── registry.json (machine-readable index)
221
- ├── catalog.ts (typed queries: search, filter, browse)
222
- └── index.ts (factories: lazy-load @framers/agentos)
223
- └── @framers/agentos (optional peer: live SkillRegistry + snapshots)
224
- ```
225
-
226
231
  ## Contributing
227
232
 
228
233
  See [CONTRIBUTING.md](./CONTRIBUTING.md) for how to submit new skills.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@framers/agentos-skills-registry",
3
- "version": "0.8.0",
3
+ "version": "0.9.0",
4
4
  "files": [
5
5
  "dist",
6
6
  "registry",
@@ -10,7 +10,7 @@
10
10
  "CONTRIBUTING.md",
11
11
  "README.md"
12
12
  ],
13
- "description": "Curated skills registry for AgentOS — SKILL.md prompt modules, typed catalog, and lazy-loading factories",
13
+ "description": "Curated skills catalog for AgentOS — 60+ bundled SKILL.md prompt modules with query helpers and lazy-loading factories",
14
14
  "type": "module",
15
15
  "sideEffects": false,
16
16
  "main": "./dist/index.js",
@@ -0,0 +1,22 @@
1
+ ---
2
+ name: agent-config
3
+ description: Export and import agent configurations for sharing and backup
4
+ version: 1.0.0
5
+ tags: [agent, config, export, import, yaml, json]
6
+ tools_required: []
7
+ ---
8
+
9
+ # Agent Configuration
10
+
11
+ Export agent configurations as portable YAML/JSON files for sharing, backup, and migration. Import configurations to recreate agents.
12
+
13
+ ## Capabilities
14
+ - **Export**: Save agent config with instructions, tools, personality, guardrails
15
+ - **Import**: Recreate agent from exported config
16
+ - **Secret redaction**: API keys automatically redacted on export
17
+ - **Round-trip**: Export -> edit -> import workflow
18
+
19
+ ## Example
20
+ "Export my research agent to a file"
21
+ "Import this agent configuration"
22
+ "Share my agent config with the team"
@@ -0,0 +1,74 @@
1
+ ---
2
+ name: amazon-polly
3
+ version: '1.0.0'
4
+ description: Neural text-to-speech via Amazon Polly — high-quality voices, MP3 output, voice listing, default Joanna (en-US Neural).
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: voice
8
+ tags: [voice, tts, text-to-speech, amazon, aws, polly, neural]
9
+ requires_secrets: [aws.accessKeyId, aws.secretAccessKey]
10
+ requires_tools: []
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F50A"
14
+ primaryEnv: AWS_ACCESS_KEY_ID
15
+ homepage: https://docs.aws.amazon.com/polly/
16
+ ---
17
+
18
+ # Amazon Polly TTS
19
+
20
+ Use this skill when the agent is deployed on AWS infrastructure or when the user prefers Amazon Polly's Neural engine voices. Polly provides high-quality neural TTS with MP3 output and a wide range of voices across languages and regions.
21
+
22
+ Prefer this over OpenAI or ElevenLabs TTS when the user already has AWS credentials configured, or when cost efficiency at high volume is a priority (Polly pricing is per-character rather than per-request).
23
+
24
+ ## Setup
25
+
26
+ Set the following in the environment or agent secrets store:
27
+
28
+ | Variable | Description |
29
+ |---------------------------|--------------------------|
30
+ | `AWS_ACCESS_KEY_ID` | IAM access key ID |
31
+ | `AWS_SECRET_ACCESS_KEY` | IAM secret access key |
32
+ | `AWS_REGION` | AWS region (default `us-east-1`) |
33
+
34
+ ## Configuration
35
+
36
+ ```json
37
+ {
38
+ "voice": {
39
+ "tts": "amazon-polly"
40
+ }
41
+ }
42
+ ```
43
+
44
+ With a specific voice:
45
+
46
+ ```json
47
+ {
48
+ "voice": {
49
+ "tts": "amazon-polly",
50
+ "providerOptions": {
51
+ "voice": "Matthew"
52
+ }
53
+ }
54
+ }
55
+ ```
56
+
57
+ ## Provider Rules
58
+
59
+ - Default voice is `Joanna` (en-US, Neural engine).
60
+ - Always use the Neural engine for production — Standard voices are lower quality.
61
+ - Call `listAvailableVoices()` to enumerate all voices available in the configured region.
62
+ - Audio is returned as MP3. Playback requires an MP3-capable audio pipeline.
63
+
64
+ ## Examples
65
+
66
+ - "Use Amazon Polly with the Matthew voice for this agent."
67
+ - "List available Polly voices in us-east-1."
68
+ - "Synthesize this message using Amazon Polly."
69
+
70
+ ## Constraints
71
+
72
+ - Requires `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` with `polly:SynthesizeSpeech` and `polly:DescribeVoices` IAM permissions.
73
+ - Neural engine voices are region-specific. Not all voices are available in every AWS region.
74
+ - API costs apply per character synthesized (Neural engine pricing).
@@ -0,0 +1,83 @@
1
+ ---
2
+ name: diarization
3
+ version: '1.0.0'
4
+ description: Speaker diarization — identifies and tracks who is speaking at each moment in an audio stream, using provider-delegated labels or local offline clustering.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: voice
8
+ tags: [voice, diarization, speaker-identification, multi-speaker, offline, deepgram]
9
+ requires_secrets: []
10
+ requires_tools: []
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F465"
14
+ homepage: https://docs.wunderland.sh/guides/voice
15
+ ---
16
+
17
+ # Speaker Diarization
18
+
19
+ Use this skill when the agent needs to track who is speaking across a multi-speaker audio stream. Supports two modes: provider-delegated (extracts speaker labels from Deepgram word-level results) and local clustering (spectral-centroid agglomerative clustering, fully offline).
20
+
21
+ Enable this when transcribing meetings, interviews, podcast recordings, or any scenario where knowing which speaker said what is important for downstream tasks (summaries, action items, CRM updates).
22
+
23
+ ## Setup
24
+
25
+ No API key required for local mode. For provider-delegated mode, enable diarization on the STT provider:
26
+
27
+ ```json
28
+ {
29
+ "voice": {
30
+ "stt": "deepgram",
31
+ "diarization": "provider",
32
+ "providerOptions": { "diarize": true }
33
+ }
34
+ }
35
+ ```
36
+
37
+ For fully offline local clustering:
38
+
39
+ ```json
40
+ {
41
+ "voice": {
42
+ "diarization": "local"
43
+ }
44
+ }
45
+ ```
46
+
47
+ ## Speaker Enrollment
48
+
49
+ Pre-register known speakers so the engine labels them by name instead of `Speaker_0`, `Speaker_1`:
50
+
51
+ ```ts
52
+ await session.enrollSpeaker('Alice', aliceVoiceprintFloat32Array);
53
+ await session.enrollSpeaker('Bob', bobVoiceprintFloat32Array);
54
+ ```
55
+
56
+ ## Provider Rules
57
+
58
+ - Prefer provider-delegated mode with Deepgram when speaker accuracy is critical and `DEEPGRAM_API_KEY` is available. Word-level speaker labels are more reliable than local clustering.
59
+ - Use local mode when privacy, offline operation, or zero additional API cost is required.
60
+ - The local backend extracts 16-dimensional spectral feature vectors per 1.5 s window (0.5 s overlap) — suitable for clear audio. Replace with an ONNX x-vector model for noisy environments.
61
+ - Enroll known speakers when participant names are known in advance to get named labels instead of generic `Speaker_N` identifiers.
62
+
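The local backend's windowing described above (1.5 s windows with 0.5 s overlap) implies a 1.0 s hop between window starts. A small sketch of that math; the function name is illustrative, not part of the engine's API:

```typescript
// Windowing math for the local diarization backend as described:
// 1.5 s analysis windows, 0.5 s overlap, so a 1.0 s hop.
const WINDOW_S = 1.5;
const OVERLAP_S = 0.5;
const HOP_S = WINDOW_S - OVERLAP_S; // 1.0 s between window starts

function windowStarts(durationS: number): number[] {
  const starts: number[] = [];
  // Only emit windows that fit entirely inside the audio.
  for (let t = 0; t + WINDOW_S <= durationS; t += HOP_S) {
    starts.push(Number(t.toFixed(3)));
  }
  return starts;
}

console.log(windowStarts(5)); // windows start at 0, 1, 2, 3 s (last ends at 4.5 s)
```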
63
+ ## Events
64
+
65
+ | Event | Description |
66
+ |-----------------------|----------------------------------------------------------|
67
+ | `speaker_identified` | Active speaker label has changed |
68
+ | `segment_ready` | A labelled audio or transcript segment is ready |
69
+ | `error` | Unrecoverable diarization error |
70
+ | `close` | Session fully terminated |
71
+
72
+ ## Examples
73
+
74
+ - "Transcribe this meeting and label each speaker's turns."
75
+ - "Use local diarization for a private interview recording."
76
+ - "Enable Deepgram diarization and label Alice and Bob by voice."
77
+
78
+ ## Constraints
79
+
80
+ - Local mode accuracy depends on audio clarity and spectral separation between speakers.
81
+ - Provider mode requires `diarize: true` support on the active STT provider (currently Deepgram).
82
+ - Speaker enrollment voiceprints must be Float32Arrays computed from clean reference audio.
83
+ - The built-in feature extractor is intentionally lightweight; replace `extractSimpleEmbedding()` with an ONNX x-vector model for production-quality voiceprints.
@@ -0,0 +1,97 @@
1
+ ---
2
+ name: emergent-tools
3
+ version: '1.0.0'
4
+ description: Forge new tools at runtime via LLM — sandboxed execution, LLM-as-judge validation, composable tool building, and full audit trail.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: productivity
8
+ tags: [emergent, tools, forge, sandbox, dynamic, runtime, LLM-judge]
9
+ requires_secrets: []
10
+ requires_tools: [forge_tool]
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F527"
14
+ ---
15
+
16
+ # Emergent Tools
17
+
18
+ You have access to the EmergentCapabilityEngine — a system that lets you create brand-new tools at runtime when no existing tool satisfies the user's request. This is a powerful capability; use it wisely.
19
+
20
+ ## When to Forge vs. Use Existing Tools
21
+
22
+ Before forging a new tool, always check whether an existing tool can fulfill the request:
23
+
24
+ 1. **Search first** — Use `discover_capabilities` to scan the tool registry. If a tool already exists that handles the task (even partially), prefer it.
25
+ 2. **Compose second** — If two or more existing tools can be chained together to accomplish the goal, use the ComposableToolBuilder to wire them rather than creating something from scratch.
26
+ 3. **Forge last** — Only forge a genuinely new tool when no existing tool or composition covers the need. Common forge-worthy scenarios:
27
+ - A domain-specific data transformation not covered by general utilities
28
+ - A custom API integration the user needs on the fly
29
+ - A specialized validation or formatting pipeline
30
+ - A one-off computation that would be awkward to express as a prompt
31
+
32
+ ## The Forging Process
33
+
34
+ When you decide to forge a tool, the pipeline works as follows:
35
+
36
+ 1. **Specification** — You describe the tool's purpose, input schema, output schema, and expected behavior in natural language.
37
+ 2. **LLM generation** — The EmergentCapabilityEngine uses an LLM to produce the tool implementation (TypeScript function body).
38
+ 3. **Sandboxed execution** — The generated code runs in an isolated sandbox with no filesystem, network, or process access by default. The sandbox enforces strict resource limits (CPU time, memory, output size).
39
+ 4. **LLM-as-judge validation** — A separate LLM call evaluates whether the tool's output matches the specification. The judge scores correctness, safety, and completeness.
40
+ 5. **Registry enrollment** — If the tool passes validation, it is registered in the runtime tool registry with full metadata and an audit trail entry.
41
+
42
+ ## Using ForgeToolMetaTool
43
+
44
+ The `forge_tool` meta-tool is your interface to the EmergentCapabilityEngine. Invoke it with:
45
+
46
+ - **name** — A clear, snake_case identifier for the new tool (e.g., `csv_to_markdown_table`)
47
+ - **description** — What the tool does, written as if for another agent reading a tool list
48
+ - **input_schema** — JSON Schema describing the expected input
49
+ - **output_schema** — JSON Schema describing the expected output
50
+ - **examples** — At least one input/output example pair to guide generation and validation
51
+ - **constraints** — Optional safety constraints (e.g., "must not make network calls", "output must be valid JSON")
52
+
53
+ The more precise your specification, the higher the first-pass success rate.
54
+
55
+ ## ComposableToolBuilder
56
+
57
+ For compositions of existing tools, use the ComposableToolBuilder pattern:
58
+
59
+ - **pipeline(tools[])** — Chain tools sequentially, piping each output as the next input
60
+ - **parallel(tools[])** — Run tools concurrently and merge their outputs
61
+ - **conditional(predicate, ifTool, elseTool)** — Branch based on a runtime condition
62
+ - **transform(tool, mapFn)** — Wrap a tool with an output transformation
63
+
64
+ Composed tools are registered just like forged tools, with full provenance tracking showing which base tools were combined.
65
+
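The combinator semantics above can be sketched as plain async function composition. This is a minimal illustration of the pattern, not the actual ComposableToolBuilder API:

```typescript
// A tool is modeled as an async function from input to output.
type Tool<I, O> = (input: I) => Promise<O>;

// pipeline: each tool's output feeds the next tool's input.
function pipeline<A, B, C>(first: Tool<A, B>, second: Tool<B, C>): Tool<A, C> {
  return async (input) => second(await first(input));
}

// parallel: run tools concurrently on the same input, collect all outputs.
function parallel<I, O>(tools: Tool<I, O>[]): Tool<I, O[]> {
  return (input) => Promise.all(tools.map((t) => t(input)));
}

// conditional: branch on a runtime predicate.
function conditional<I, O>(
  predicate: (input: I) => boolean,
  ifTool: Tool<I, O>,
  elseTool: Tool<I, O>,
): Tool<I, O> {
  return (input) => (predicate(input) ? ifTool(input) : elseTool(input));
}
```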
66
+ ## EmergentJudge Quality Thresholds
67
+
68
+ The LLM-as-judge system uses three thresholds:
69
+
70
+ - **Correctness** (>= 0.8) — Does the output match the specification and examples?
71
+ - **Safety** (>= 0.9) — Does the tool avoid side effects, data leaks, or dangerous operations?
72
+ - **Completeness** (>= 0.7) — Does the tool handle edge cases and produce well-structured output?
73
+
74
+ If any threshold is not met, the forge attempt fails with a detailed explanation. You can revise the specification and retry. Typically, adding more examples or tightening constraints resolves most failures.
75
+
76
+ ## Audit Trail
77
+
78
+ Every forged tool carries an audit record containing:
79
+
80
+ - The original specification
81
+ - The generated source code (hash-pinned)
82
+ - Judge scores and rationale
83
+ - Timestamp and session context
84
+ - Parent tool references (for compositions)
85
+
86
+ This trail is immutable. If a user asks "how was this tool made?", you can retrieve and explain its provenance.
87
+
88
+ ## Best Practices
89
+
90
+ 1. **Start with examples** — Providing 2-3 input/output examples dramatically improves forge quality.
91
+ 2. **Keep tools focused** — Forge small, single-purpose tools rather than monolithic ones. Compose them later if needed.
92
+ 3. **Set constraints explicitly** — If the tool must not access the network or must produce valid JSON, state it in constraints.
93
+ 4. **Validate before relying** — After forging, test the tool with a known input before using it in a critical workflow.
94
+ 5. **Reuse forged tools** — Forged tools persist in the session registry. Check before forging a duplicate.
95
+ 6. **Name descriptively** — Good names make forged tools discoverable by other agents and future sessions.
96
+ 7. **Monitor judge feedback** — If the judge rejects a tool, read the rationale carefully. It usually pinpoints exactly what to fix.
97
+ 8. **Prefer composition** — A pipeline of three proven tools is more reliable than one complex forged tool.
@@ -0,0 +1,72 @@
1
+ ---
2
+ name: endpoint-semantic
3
+ version: '1.0.0'
4
+ description: Semantic endpoint detection — uses an LLM to classify whether the user's utterance is a complete thought, reducing false turn boundaries on mid-sentence pauses.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: voice
8
+ tags: [voice, endpointing, turn-detection, semantic, llm, vad, silence]
9
+ requires_secrets: []
10
+ requires_tools: []
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F4AC"
14
+ homepage: https://docs.wunderland.sh/guides/voice
15
+ ---
16
+
17
+ # Semantic Endpoint Detector
18
+
19
+ Use this skill when the agent's default silence-based turn detection is causing false positives — triggering agent responses mid-sentence when the user pauses to think. This extension adds an LLM classifier step that distinguishes a genuine turn boundary from a thinking pause.
20
+
21
+ Prefer this over pure VAD/silence endpointing in conversational contexts where users speak with frequent mid-thought pauses, or when the user repeatedly complains that the agent "interrupts" them.
22
+
23
+ ## How It Works
24
+
25
+ 1. If the final transcript ends with `.`, `?`, or `!`, the turn ends immediately (punctuation path).
26
+ 2. Short acknowledgement phrases (`"uh huh"`, `"yeah"`, `"right"`) are classified as backchannels and suppressed.
27
+ 3. On silence without terminal punctuation, after `minSilenceBeforeCheckMs` (default 500 ms), an LLM is queried: "Is this utterance a complete thought?" Results are LRU-cached.
28
+ - `COMPLETE` → turn ends, reason `semantic_model`.
29
+ - `INCOMPLETE` → waiting continues; eventual silence timeout acts as final fallback.
30
+ - `TIMEOUT` → falls back to silence timeout.
31
+
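The decision order above can be sketched as a small classifier front-end. Names here are illustrative, and the real extension queries an LLM where `'ask_model'` is returned below:

```typescript
type TurnDecision = 'end_punctuation' | 'suppress_backchannel' | 'ask_model';

// Assumed backchannel list; the real extension's phrase set may differ.
const BACKCHANNELS = new Set(['uh huh', 'yeah', 'right', 'mm-hmm', 'okay']);

function decide(transcript: string): TurnDecision {
  const text = transcript.trim().toLowerCase();
  // 1. Terminal punctuation ends the turn immediately (fast path).
  if (/[.?!]$/.test(text)) return 'end_punctuation';
  // 2. Short acknowledgements are backchannels; suppress them.
  if (BACKCHANNELS.has(text)) return 'suppress_backchannel';
  // 3. Otherwise, after minSilenceBeforeCheckMs, ask the LLM classifier.
  return 'ask_model';
}
```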
32
+ ## Configuration
33
+
34
+ ```json
35
+ {
36
+ "voice": {
37
+ "endpointing": "semantic",
38
+ "endpointingOptions": {
39
+ "model": "gpt-4o-mini",
40
+ "timeoutMs": 500,
41
+ "minSilenceBeforeCheckMs": 500,
42
+ "silenceTimeoutMs": 2000
43
+ }
44
+ }
45
+ }
46
+ ```
47
+
48
+ ## Provider Rules
49
+
50
+ - Use `gpt-4o-mini` (or equivalent cheap small model) for the classifier — latency matters more than quality for this binary decision.
51
+ - Keep `timeoutMs` under 600 ms to avoid adding noticeable lag to the turn boundary.
52
+ - Increase `silenceTimeoutMs` for users who speak slowly or pause frequently.
53
+ - Reduce `minSilenceBeforeCheckMs` to 300 ms for faster-paced conversations.
54
+
55
+ ## Events
56
+
57
+ | Event | Description |
58
+ |------------------------|--------------------------------------------------------------------------|
59
+ | `turn_complete` | User turn ended; `reason` is `punctuation`, `semantic_model`, or `silence_timeout` |
60
+ | `backchannel_detected` | A backchannel phrase was recognised; accumulation suppressed |
61
+
62
+ ## Examples
63
+
64
+ - "Use semantic endpoint detection to avoid cutting me off mid-thought."
65
+ - "Enable smarter turn detection for this conversational voice session."
66
+ - "Configure the endpoint detector to wait longer before deciding I'm done speaking."
67
+
68
+ ## Constraints
69
+
70
+ - Requires an LLM provider to be configured in the runtime for the classifier calls.
71
+ - LLM calls add latency at turn boundaries. Use a small, fast model to minimize this.
72
+ - The LRU cache (keyed on first 100 characters) reduces repeated LLM calls for identical short utterances.
@@ -0,0 +1,71 @@
1
+ ---
2
+ name: google-cloud-stt
3
+ version: '1.0.0'
4
+ description: Batch speech-to-text via Google Cloud Speech-to-Text API — LINEAR16 PCM, configurable language, word-level confidence scores.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: voice
8
+ tags: [voice, stt, speech-to-text, google, cloud, batch]
9
+ requires_secrets: [google.cloudSttCredentials]
10
+ requires_tools: []
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F3A4"
14
+ primaryEnv: GOOGLE_CLOUD_STT_CREDENTIALS
15
+ homepage: https://cloud.google.com/speech-to-text/docs
16
+ ---
17
+
18
+ # Google Cloud STT
19
+
20
+ Use this skill when Google Cloud Speech-to-Text is the preferred provider, particularly when the user's Google Cloud project is already configured or when multi-language support is required without switching providers.
21
+
22
+ Prefer this over Deepgram when the agent is already operating within a Google Cloud environment, or when the user specifically requests Google as the STT backend.
23
+
24
+ ## Setup
25
+
26
+ Set `GOOGLE_CLOUD_STT_CREDENTIALS` in the environment or agent secrets store. Accepts either:
27
+ - An absolute path to a service-account JSON key file
28
+ - A raw JSON string with the service-account credentials
29
+
30
+ ## Configuration
31
+
32
+ ```json
33
+ {
34
+ "voice": {
35
+ "stt": "google-cloud-stt"
36
+ }
37
+ }
38
+ ```
39
+
40
+ With language override:
41
+
42
+ ```json
43
+ {
44
+ "voice": {
45
+ "stt": "google-cloud-stt",
46
+ "providerOptions": {
47
+ "language": "fr-FR"
48
+ }
49
+ }
50
+ }
51
+ ```
52
+
53
+ ## Provider Rules
54
+
55
+ - Language codes follow BCP-47 format (e.g. `en-US`, `fr-FR`, `ja-JP`). Default is `en-US`.
56
+ - Audio input must be LINEAR16 PCM format. Convert other formats before sending.
57
+ - Confidence scores and word-level alternatives are included in the transcription result.
58
+ - Credentials are resolved at initialization; if the path is invalid or JSON is malformed, initialization will throw immediately.
59
+
60
+ ## Examples
61
+
62
+ - "Transcribe this audio using Google Cloud Speech-to-Text."
63
+ - "Use Google STT with French language recognition."
64
+ - "Set up Google Cloud STT with my service account credentials."
65
+
66
+ ## Constraints
67
+
68
+ - Requires a Google Cloud service account with the `Speech-to-Text` API enabled.
69
+ - Credentials must be either a path to a valid JSON key file or inline JSON string.
70
+ - This is a batch (non-streaming) provider. For real-time streaming, use Deepgram or Whisper instead.
71
+ - API costs apply per 15-second audio block.
@@ -0,0 +1,71 @@
1
+ ---
2
+ name: google-cloud-tts
3
+ version: '1.0.0'
4
+ description: Text-to-speech synthesis via Google Cloud Text-to-Speech API — MP3 output, configurable language and voice, voice listing.
5
+ author: Wunderland
6
+ namespace: wunderland
7
+ category: voice
8
+ tags: [voice, tts, text-to-speech, google, cloud, neural]
9
+ requires_secrets: [google.cloudTtsCredentials]
10
+ requires_tools: []
11
+ metadata:
12
+ agentos:
13
+ emoji: "\U0001F50A"
14
+ primaryEnv: GOOGLE_CLOUD_TTS_CREDENTIALS
15
+ homepage: https://cloud.google.com/text-to-speech/docs
16
+ ---
17
+
18
+ # Google Cloud TTS
19
+
20
+ Use this skill when Google Cloud Text-to-Speech is the preferred synthesis provider, particularly within a Google Cloud environment or when the user needs one of Google's Neural2 or WaveNet voices not available through other providers.
21
+
22
+ Prefer this over OpenAI TTS when the user already uses Google Cloud, needs a specific non-English language voice, or prefers Google's voice catalog.
23
+
24
+ ## Setup
25
+
26
+ Set `GOOGLE_CLOUD_TTS_CREDENTIALS` in the environment or agent secrets store. Accepts either:
27
+ - An absolute path to a service-account JSON key file
28
+ - A raw JSON string with the service-account credentials
29
+
30
+ ## Configuration
31
+
32
+ ```json
33
+ {
34
+ "voice": {
35
+ "tts": "google-cloud-tts"
36
+ }
37
+ }
38
+ ```
39
+
40
+ With voice and language options:
41
+
42
+ ```json
43
+ {
44
+ "voice": {
45
+ "tts": "google-cloud-tts",
46
+ "providerOptions": {
47
+ "languageCode": "en-GB",
48
+ "voice": "en-GB-Neural2-A"
49
+ }
50
+ }
51
+ }
52
+ ```
53
+
54
+ ## Provider Rules
55
+
56
+ - Output is MP3 (audio/mpeg). Playback requires an MP3-capable audio pipeline.
57
+ - Language codes follow BCP-47 (e.g. `en-US`, `en-GB`, `de-DE`).
58
+ - Call `listAvailableVoices()` to enumerate all voices available on the account; useful for letting the user pick a voice.
59
+ - Neural2 and WaveNet voices require the respective API feature to be enabled on the Google Cloud project.
60
+
61
+ ## Examples
62
+
63
+ - "Use Google Cloud TTS with a British English Neural2 voice."
64
+ - "List the available voices on my Google Cloud TTS account."
65
+ - "Synthesize this response using Google Cloud Text-to-Speech."
66
+
67
+ ## Constraints
68
+
69
+ - Requires a Google Cloud service account with the `Text-to-Speech` API enabled.
70
+ - Credentials must be either a path to a valid JSON key file or inline JSON string.
71
+ - API costs apply per character synthesized.
@@ -0,0 +1,25 @@
1
+ ---
2
+ name: image-editing
3
+ description: Edit, transform, and enhance images using AI models
4
+ version: 1.0.0
5
+ tags: [image, editing, img2img, inpainting, upscaling, variation]
6
+ tools_required: [editImage, upscaleImage, variateImage]
7
+ ---
8
+
9
+ # Image Editing
10
+
11
+ Edit images with img2img transformation, inpainting (fill masked regions), outpainting (extend beyond borders), upscaling (2x/4x super resolution), and creating variations.
12
+
13
+ ## Capabilities
14
+ - **img2img**: Transform an image based on a text prompt
15
+ - **Inpainting**: Fill in masked regions of an image
16
+ - **Upscaling**: Increase image resolution by 2x or 4x
17
+ - **Variations**: Create variations of an existing image
18
+
19
+ ## Providers
20
+ OpenAI, Stability AI, Local Stable Diffusion (A1111/ComfyUI), Replicate
21
+
22
+ ## Example
23
+ "Take this photo and make it look like a watercolor painting"
24
+ "Remove the background from this product photo"
25
+ "Upscale this image to 4x resolution"