@framers/agentos-skills-registry 0.8.0 → 0.10.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +44 -39
- package/package.json +2 -2
- package/registry/curated/agent-config/SKILL.md +22 -0
- package/registry/curated/amazon-polly/SKILL.md +74 -0
- package/registry/curated/diarization/SKILL.md +83 -0
- package/registry/curated/emergent-tools/SKILL.md +97 -0
- package/registry/curated/endpoint-semantic/SKILL.md +72 -0
- package/registry/curated/git/SKILL.md +10 -0
- package/registry/curated/github/SKILL.md +125 -37
- package/registry/curated/google-cloud-stt/SKILL.md +71 -0
- package/registry/curated/google-cloud-tts/SKILL.md +71 -0
- package/registry/curated/image-editing/SKILL.md +25 -0
- package/registry/curated/multimodal-rag/SKILL.md +23 -0
- package/registry/curated/openwakeword/SKILL.md +75 -0
- package/registry/curated/piper/SKILL.md +72 -0
- package/registry/curated/porcupine/SKILL.md +74 -0
- package/registry/curated/streaming-stt-deepgram/SKILL.md +84 -0
- package/registry/curated/streaming-stt-whisper/SKILL.md +82 -0
- package/registry/curated/streaming-tts-elevenlabs/SKILL.md +84 -0
- package/registry/curated/streaming-tts-openai/SKILL.md +83 -0
- package/registry/curated/structured-output/SKILL.md +22 -0
- package/registry/curated/vision-ocr/SKILL.md +22 -0
- package/registry/curated/vosk/SKILL.md +74 -0
- package/registry.json +779 -93
package/README.md
CHANGED

@@ -6,7 +6,7 @@
 
 # @framers/agentos-skills-registry
 
-Curated
+Curated catalog of 60+ AgentOS skills with query helpers and lazy-loading factories.
 
 [](https://www.npmjs.com/package/@framers/agentos-skills-registry)
 
@@ -14,15 +14,36 @@ Curated skills registry for [AgentOS](https://github.com/framersai/agentos) —
 npm install @framers/agentos-skills-registry
 ```
 
-## What
+## What This Package Is
 
-This is the **
+This is the **skills catalog** — the data layer that ships 60+ curated SKILL.md
+prompt modules and provides typed query helpers, search functions, and lazy-loading
+factories for consuming them.
 
-
-
-
-
-
+It is **not** the skills engine. The engine lives in
+[`@framers/agentos-skills`](https://www.npmjs.com/package/@framers/agentos)
+(exported from `@framers/agentos/skills`), which provides the runtime
+`SkillRegistry`, `SkillSnapshot` builder, frontmatter parser, and eligibility
+resolver.
+
+### Architecture: catalog vs. engine
+
+```
+@framers/agentos                     ← the skills ENGINE (runtime)
+  └── /skills                        SkillRegistry, SkillSnapshot, parser, eligibility
+        ▲
+        │  lazy import()
+        │
+@framers/agentos-skills-registry     ← THIS package (CATALOG)
+  ├── registry/curated/*/SKILL.md    60+ bundled prompt modules
+  ├── registry.json                  machine-readable index of all skills
+  ├── catalog.ts                     SKILLS_CATALOG array + query helpers (zero deps)
+  └── index.ts                       factory functions that lazy-import the engine
+```
+
+**Dependency direction:** this catalog package depends on `@framers/agentos`
+(optional peer dep), never the other way around. The engine knows nothing about
+the catalog — it just provides the parsing and registry machinery.
 
 ## Quick Start
 
@@ -67,7 +88,7 @@ Access the JSON index directly:
 import { getSkillsCatalog } from '@framers/agentos-skills-registry';
 
 const catalog = await getSkillsCatalog();
-console.log(catalog.skills.curated.length); //
+console.log(catalog.skills.curated.length); // 60+
 console.log(catalog.version); // '1.0.0'
 ```
 
@@ -80,7 +101,8 @@ console.log(registry.skills.curated[0].name); // 'weather'
 
 ### 3. Dynamically load skills into an agent (requires @framers/agentos)
 
-The factory functions lazy-
+The factory functions lazy-import `@framers/agentos/skills` (the engine) via
+dynamic `import()` — resolved only when you call them, cached after first use:
 
 ```bash
 npm install @framers/agentos-skills-registry @framers/agentos
@@ -149,23 +171,26 @@ const valid = skillNames.filter((name) => {
 const snapshot = await createCuratedSkillSnapshot({ skills: valid });
 ```
 
-When `skills` is a string array, the
+When `skills` is a string array, the catalog only loads those specific `SKILL.md`
 files before building the snapshot. It does not walk the full curated bundle first.
 Loaded skills also include parsed `metadata` so consumers do not need to decode
 the `metadata.agentos` block manually.
 
-##
+## Sub-exports
 
-
-
-| `@framers/agentos-skills-registry
-| `@framers/agentos-skills-registry` |
+| Export path | Peer deps | Use case |
+|-------------|-----------|----------|
+| `@framers/agentos-skills-registry` | `@framers/agentos` (optional) | Full SDK: catalog + factory functions + schema types |
+| `@framers/agentos-skills-registry/catalog` | None | Lightweight: `SKILLS_CATALOG`, query helpers (search, filter, browse) |
+| `@framers/agentos-skills-registry/registry.json` | None | Raw JSON index of all skills |
+| `@framers/agentos-skills-registry/workspace-discovery` | None | Discover SKILL.md files in workspace directories |
+| `@framers/agentos-skills-registry/types` | None | TypeScript declarations for registry.json schema |
 
-The `@framers/agentos` dependency is loaded **lazily** at runtime and cached after first resolution. If it
+The `@framers/agentos` dependency is loaded **lazily** at runtime and cached after first resolution. If it is not installed and you call a factory function, you get a clear error with install instructions. The catalog query helpers work without it.
 
-## Included Skills (
+## Included Skills (60+)
 
-The catalog
+The catalog includes both foundational utility skills and social automation modules:
 
 - Information and research: `web-search`, `weather`, `summarize`, `deep-research`
 - Developer tools: `github`, `coding-agent`, `git`
@@ -203,26 +228,6 @@ import type {
 // SkillInstallSpec — install instructions for skill dependencies
 ```
 
-## Exports
-
-| Export path | Contents |
-|-------------|----------|
-| `.` | Full SDK: catalog helpers + factory functions + schema types |
-| `./catalog` | Lightweight: `SKILLS_CATALOG`, query helpers (zero peer deps) |
-| `./registry.json` | Raw JSON index of all skills |
-| `./types` | TypeScript declarations for registry.json schema |
-
-## Relationship to Other Packages
-
-```
-@framers/agentos-skills-registry     ← This package (data + SDK)
-├── registry/curated/*/SKILL.md      (bundled prompt modules)
-├── registry.json                    (machine-readable index)
-├── catalog.ts                       (typed queries: search, filter, browse)
-└── index.ts                         (factories: lazy-load @framers/agentos)
-    └── @framers/agentos             (optional peer: live SkillRegistry + snapshots)
-```
-
 ## Contributing
 
 See [CONTRIBUTING.md](./CONTRIBUTING.md) for how to submit new skills.
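The "resolved only when you call them, cached after first use" behavior described in the README changes can be sketched as a memoized dynamic import. Everything below (`Engine`, `loadEngine`, `parseFrontmatter`) is an illustrative stand-in, not the package's actual API:

```typescript
// Sketch of the "lazy import, cached after first resolution" pattern the
// README describes for the factory functions. `Engine` and `loadEngine` are
// hypothetical names, not the real @framers/agentos surface.
type Engine = { parseFrontmatter: (src: string) => Record<string, unknown> };

let enginePromise: Promise<Engine> | undefined;

async function loadEngine(): Promise<Engine> {
  if (!enginePromise) {
    // Real code would do something like:
    //   enginePromise = import('@framers/agentos/skills')
    enginePromise = Promise.resolve({ parseFrontmatter: () => ({}) });
  }
  return enginePromise; // every later call reuses the same promise
}
```

Under this pattern, a missing optional peer dependency surfaces as a rejected import on first call — which is presumably where the "clear error with install instructions" comes from.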
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@framers/agentos-skills-registry",
-  "version": "0.
+  "version": "0.10.0",
   "files": [
     "dist",
     "registry",
@@ -10,7 +10,7 @@
     "CONTRIBUTING.md",
     "README.md"
   ],
-  "description": "Curated skills
+  "description": "Curated skills catalog for AgentOS — 60+ bundled SKILL.md prompt modules with query helpers and lazy-loading factories",
   "type": "module",
   "sideEffects": false,
   "main": "./dist/index.js",
package/registry/curated/agent-config/SKILL.md
ADDED

---
name: agent-config
description: Export and import agent configurations for sharing and backup
version: 1.0.0
tags: [agent, config, export, import, yaml, json]
tools_required: []
---

# Agent Configuration

Export agent configurations as portable YAML/JSON files for sharing, backup, and migration. Import configurations to recreate agents.

## Capabilities
- **Export**: Save agent config with instructions, tools, personality, guardrails
- **Import**: Recreate agent from exported config
- **Secret redaction**: API keys automatically redacted on export
- **Round-trip**: Export -> edit -> import workflow

## Example
"Export my research agent to a file"
"Import this agent configuration"
"Share my agent config with the team"
package/registry/curated/amazon-polly/SKILL.md
ADDED

---
name: amazon-polly
version: '1.0.0'
description: Neural text-to-speech via Amazon Polly — high-quality voices, MP3 output, voice listing, default Joanna (en-US Neural).
author: Wunderland
namespace: wunderland
category: voice
tags: [voice, tts, text-to-speech, amazon, aws, polly, neural]
requires_secrets: [aws.accessKeyId, aws.secretAccessKey]
requires_tools: []
metadata:
  agentos:
    emoji: "\U0001F50A"
    primaryEnv: AWS_ACCESS_KEY_ID
    homepage: https://docs.aws.amazon.com/polly/
---

# Amazon Polly TTS

Use this skill when the agent is deployed on AWS infrastructure or when the user prefers Amazon Polly's Neural engine voices. Polly provides high-quality neural TTS with MP3 output and a wide range of voices across languages and regions.

Prefer this over OpenAI or ElevenLabs TTS when the user already has AWS credentials configured, or when cost efficiency at high volume is a priority (Polly pricing is per-character rather than per-request).

## Setup

Set the following in the environment or agent secrets store:

| Variable | Description |
|---------------------------|--------------------------|
| `AWS_ACCESS_KEY_ID` | IAM access key ID |
| `AWS_SECRET_ACCESS_KEY` | IAM secret access key |
| `AWS_REGION` | AWS region (default `us-east-1`) |

## Configuration

```json
{
  "voice": {
    "tts": "amazon-polly"
  }
}
```

With a specific voice:

```json
{
  "voice": {
    "tts": "amazon-polly",
    "providerOptions": {
      "voice": "Matthew"
    }
  }
}
```

## Provider Rules

- Default voice is `Joanna` (en-US, Neural engine).
- Always use the Neural engine for production — Standard voices are lower quality.
- Call `listAvailableVoices()` to enumerate all voices available in the configured region.
- Audio is returned as MP3. Playback requires an MP3-capable audio pipeline.

## Examples

- "Use Amazon Polly with the Matthew voice for this agent."
- "List available Polly voices in us-east-1."
- "Synthesize this message using Amazon Polly."

## Constraints

- Requires `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` with `polly:SynthesizeSpeech` and `polly:DescribeVoices` IAM permissions.
- Neural engine voices are region-specific. Not all voices are available in every AWS region.
- API costs apply per character synthesized (Neural engine pricing).
package/registry/curated/diarization/SKILL.md
ADDED

---
name: diarization
version: '1.0.0'
description: Speaker diarization — identifies and tracks who is speaking at each moment in an audio stream, using provider-delegated labels or local offline clustering.
author: Wunderland
namespace: wunderland
category: voice
tags: [voice, diarization, speaker-identification, multi-speaker, offline, deepgram]
requires_secrets: []
requires_tools: []
metadata:
  agentos:
    emoji: "\U0001F465"
    homepage: https://docs.wunderland.sh/guides/voice
---

# Speaker Diarization

Use this skill when the agent needs to track who is speaking across a multi-speaker audio stream. Supports two modes: provider-delegated (extracts speaker labels from Deepgram word-level results) and local clustering (spectral-centroid agglomerative clustering, fully offline).

Enable this when transcribing meetings, interviews, podcast recordings, or any scenario where knowing which speaker said what is important for downstream tasks (summaries, action items, CRM updates).

## Setup

No API key required for local mode. For provider-delegated mode, enable diarization on the STT provider:

```json
{
  "voice": {
    "stt": "deepgram",
    "diarization": "provider",
    "providerOptions": { "diarize": true }
  }
}
```

For fully offline local clustering:

```json
{
  "voice": {
    "diarization": "local"
  }
}
```

## Speaker Enrollment

Pre-register known speakers so the engine labels them by name instead of `Speaker_0`, `Speaker_1`:

```ts
await session.enrollSpeaker('Alice', aliceVoiceprintFloat32Array);
await session.enrollSpeaker('Bob', bobVoiceprintFloat32Array);
```

## Provider Rules

- Prefer provider-delegated mode with Deepgram when speaker accuracy is critical and `DEEPGRAM_API_KEY` is available. Word-level speaker labels are more reliable than local clustering.
- Use local mode when privacy, offline operation, or zero additional API cost is required.
- The local backend extracts 16-dimensional spectral feature vectors per 1.5 s window (0.5 s overlap) — suitable for clear audio. Replace with an ONNX x-vector model for noisy environments.
- Enroll known speakers when participant names are known in advance to get named labels instead of generic `Speaker_N` identifiers.

## Events

| Event | Description |
|-----------------------|----------------------------------------------------------|
| `speaker_identified` | Active speaker label has changed |
| `segment_ready` | A labelled audio or transcript segment is ready |
| `error` | Unrecoverable diarization error |
| `close` | Session fully terminated |

## Examples

- "Transcribe this meeting and label each speaker's turns."
- "Use local diarization for a private interview recording."
- "Enable Deepgram diarization and label Alice and Bob by voice."

## Constraints

- Local mode accuracy depends on audio clarity and spectral separation between speakers.
- Provider mode requires `diarize: true` support on the active STT provider (currently Deepgram).
- Speaker enrollment voiceprints must be Float32Arrays computed from clean reference audio.
- The built-in feature extractor is intentionally lightweight; replace `extractSimpleEmbedding()` with an ONNX x-vector model for production-quality voiceprints.
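The enrollment flow in the diarization skill stores Float32Array voiceprints and promotes generic `Speaker_N` labels to names when a segment matches an enrolled speaker. A minimal sketch of that matching step, using cosine similarity — the function names and the 0.75 threshold are assumptions for illustration, not the engine's internals:

```typescript
// Illustrative sketch: label a segment embedding by nearest enrolled
// voiceprint (cosine similarity), falling back to a generic label.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function labelSpeaker(
  embedding: Float32Array,
  enrolled: Map<string, Float32Array>,
  threshold = 0.75, // assumed cutoff; below it, keep the generic label
): string {
  let best = "Speaker_0";
  let bestScore = threshold;
  for (const [name, voiceprint] of enrolled) {
    const score = cosine(embedding, voiceprint);
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best;
}
```

A production version would use x-vector embeddings (as the skill's constraints note) rather than the lightweight spectral features of the built-in extractor.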
package/registry/curated/emergent-tools/SKILL.md
ADDED

---
name: emergent-tools
version: '1.0.0'
description: Forge new tools at runtime via LLM — sandboxed execution, LLM-as-judge validation, composable tool building, and full audit trail.
author: Wunderland
namespace: wunderland
category: productivity
tags: [emergent, tools, forge, sandbox, dynamic, runtime, LLM-judge]
requires_secrets: []
requires_tools: [forge_tool]
metadata:
  agentos:
    emoji: "\U0001F527"
---

# Emergent Tools

You have access to the EmergentCapabilityEngine — a system that lets you create brand-new tools at runtime when no existing tool satisfies the user's request. This is a powerful capability; use it wisely.

## When to Forge vs. Use Existing Tools

Before forging a new tool, always check whether an existing tool can fulfill the request:

1. **Search first** — Use `discover_capabilities` to scan the tool registry. If a tool already exists that handles the task (even partially), prefer it.
2. **Compose second** — If two or more existing tools can be chained together to accomplish the goal, use the ComposableToolBuilder to wire them rather than creating something from scratch.
3. **Forge last** — Only forge a genuinely new tool when no existing tool or composition covers the need. Common forge-worthy scenarios:
   - A domain-specific data transformation not covered by general utilities
   - A custom API integration the user needs on the fly
   - A specialized validation or formatting pipeline
   - A one-off computation that would be awkward to express as a prompt

## The Forging Process

When you decide to forge a tool, the pipeline works as follows:

1. **Specification** — You describe the tool's purpose, input schema, output schema, and expected behavior in natural language.
2. **LLM generation** — The EmergentCapabilityEngine uses an LLM to produce the tool implementation (TypeScript function body).
3. **Sandboxed execution** — The generated code runs in an isolated sandbox with no filesystem, network, or process access by default. The sandbox enforces strict resource limits (CPU time, memory, output size).
4. **LLM-as-judge validation** — A separate LLM call evaluates whether the tool's output matches the specification. The judge scores correctness, safety, and completeness.
5. **Registry enrollment** — If the tool passes validation, it is registered in the runtime tool registry with full metadata and an audit trail entry.

## Using ForgeToolMetaTool

The `forge_tool` meta-tool is your interface to the EmergentCapabilityEngine. Invoke it with:

- **name** — A clear, snake_case identifier for the new tool (e.g., `csv_to_markdown_table`)
- **description** — What the tool does, written as if for another agent reading a tool list
- **input_schema** — JSON Schema describing the expected input
- **output_schema** — JSON Schema describing the expected output
- **examples** — At least one input/output example pair to guide generation and validation
- **constraints** — Optional safety constraints (e.g., "must not make network calls", "output must be valid JSON")

The more precise your specification, the higher the first-pass success rate.

## ComposableToolBuilder

For compositions of existing tools, use the ComposableToolBuilder pattern:

- **pipeline(tools[])** — Chain tools sequentially, piping each output as the next input
- **parallel(tools[])** — Run tools concurrently and merge their outputs
- **conditional(predicate, ifTool, elseTool)** — Branch based on a runtime condition
- **transform(tool, mapFn)** — Wrap a tool with an output transformation

Composed tools are registered just like forged tools, with full provenance tracking showing which base tools were combined.

## EmergentJudge Quality Thresholds

The LLM-as-judge system uses three thresholds:

- **Correctness** (>= 0.8) — Does the output match the specification and examples?
- **Safety** (>= 0.9) — Does the tool avoid side effects, data leaks, or dangerous operations?
- **Completeness** (>= 0.7) — Does the tool handle edge cases and produce well-structured output?

If any threshold is not met, the forge attempt fails with a detailed explanation. You can revise the specification and retry. Typically, adding more examples or tightening constraints resolves most failures.

## Audit Trail

Every forged tool carries an audit record containing:

- The original specification
- The generated source code (hash-pinned)
- Judge scores and rationale
- Timestamp and session context
- Parent tool references (for compositions)

This trail is immutable. If a user asks "how was this tool made?", you can retrieve and explain its provenance.

## Best Practices

1. **Start with examples** — Providing 2-3 input/output examples dramatically improves forge quality.
2. **Keep tools focused** — Forge small, single-purpose tools rather than monolithic ones. Compose them later if needed.
3. **Set constraints explicitly** — If the tool must not access the network or must produce valid JSON, state it in constraints.
4. **Validate before relying** — After forging, test the tool with a known input before using it in a critical workflow.
5. **Reuse forged tools** — Forged tools persist in the session registry. Check before forging a duplicate.
6. **Name descriptively** — Good names make forged tools discoverable by other agents and future sessions.
7. **Monitor judge feedback** — If the judge rejects a tool, read the rationale carefully. It usually pinpoints exactly what to fix.
8. **Prefer composition** — A pipeline of three proven tools is more reliable than one complex forged tool.
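The `pipeline(tools[])` combinator described in the emergent-tools skill — chain tools sequentially, piping each output as the next input — can be sketched generically. This is a minimal illustration of the chaining semantics under an assumed `Tool` signature, not the actual ComposableToolBuilder implementation:

```typescript
// Illustrative sketch of sequential tool composition: the output of each
// async tool becomes the input of the next. Types are assumptions.
type Tool = (input: unknown) => Promise<unknown>;

function pipeline(tools: Tool[]): Tool {
  return async (input) => {
    let value = input;
    for (const tool of tools) {
      value = await tool(value); // pipe forward
    }
    return value;
  };
}

// Example composition of two trivial "tools":
const trim: Tool = async (s) => String(s).trim();
const upper: Tool = async (s) => String(s).toUpperCase();
const composed = pipeline([trim, upper]);
```

A composed tool built this way is itself a `Tool`, which is what lets compositions be registered and re-composed just like forged tools.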
package/registry/curated/endpoint-semantic/SKILL.md
ADDED

---
name: endpoint-semantic
version: '1.0.0'
description: Semantic endpoint detection — uses an LLM to classify whether the user's utterance is a complete thought, reducing false turn boundaries on mid-sentence pauses.
author: Wunderland
namespace: wunderland
category: voice
tags: [voice, endpointing, turn-detection, semantic, llm, vad, silence]
requires_secrets: []
requires_tools: []
metadata:
  agentos:
    emoji: "\U0001F4AC"
    homepage: https://docs.wunderland.sh/guides/voice
---

# Semantic Endpoint Detector

Use this skill when the agent's default silence-based turn detection is causing false positives — triggering agent responses mid-sentence when the user pauses to think. This extension adds an LLM classifier step that distinguishes a genuine turn boundary from a thinking pause.

Prefer this over pure VAD/silence endpointing in conversational contexts where users speak with frequent mid-thought pauses, or when the user repeatedly complains that the agent "interrupts" them.

## How It Works

1. If the final transcript ends with `.`, `?`, or `!`, the turn ends immediately (punctuation path).
2. Short acknowledgement phrases (`"uh huh"`, `"yeah"`, `"right"`) are classified as backchannels and suppressed.
3. On silence without terminal punctuation, after `minSilenceBeforeCheckMs` (default 500 ms), an LLM is queried: "Is this utterance a complete thought?" Results are LRU-cached.
   - `COMPLETE` → turn ends, reason `semantic_model`.
   - `INCOMPLETE` → waiting continues; eventual silence timeout acts as final fallback.
   - `TIMEOUT` → falls back to silence timeout.

## Configuration

```json
{
  "voice": {
    "endpointing": "semantic",
    "endpointingOptions": {
      "model": "gpt-4o-mini",
      "timeoutMs": 500,
      "minSilenceBeforeCheckMs": 500,
      "silenceTimeoutMs": 2000
    }
  }
}
```

## Provider Rules

- Use `gpt-4o-mini` (or equivalent cheap small model) for the classifier — latency matters more than quality for this binary decision.
- Keep `timeoutMs` under 600 ms to avoid adding noticeable lag to the turn boundary.
- Increase `silenceTimeoutMs` for users who speak slowly or pause frequently.
- Reduce `minSilenceBeforeCheckMs` to 300 ms for faster-paced conversations.

## Events

| Event | Description |
|------------------------|--------------------------------------------------------------------------|
| `turn_complete` | User turn ended; `reason` is `punctuation`, `semantic_model`, or `silence_timeout` |
| `backchannel_detected` | A backchannel phrase was recognised; accumulation suppressed |

## Examples

- "Use semantic endpoint detection to avoid cutting me off mid-thought."
- "Enable smarter turn detection for this conversational voice session."
- "Configure the endpoint detector to wait longer before deciding I'm done speaking."

## Constraints

- Requires an LLM provider to be configured in the runtime for the classifier calls.
- LLM calls add latency at turn boundaries. Use a small, fast model to minimize this.
- The LRU cache (keyed on first 100 characters) reduces repeated LLM calls for identical short utterances.
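The three-step flow in the endpoint-semantic skill (punctuation fast path, backchannel suppression, cached LLM classification) can be sketched as a single decision function. The classifier here is a stub standing in for the LLM call, the function names are hypothetical, and the cache is a plain Map rather than a true LRU:

```typescript
// Illustrative sketch of the semantic endpointing decision flow. `classify`
// stands in for the small-model LLM call; cache keying mirrors the documented
// "first 100 characters" behavior (simplified: no eviction).
const BACKCHANNELS = new Set(["uh huh", "yeah", "right"]);
const verdictCache = new Map<string, "COMPLETE" | "INCOMPLETE">();

async function classify(text: string): Promise<"COMPLETE" | "INCOMPLETE"> {
  // Stand-in heuristic; real code queries the configured model.
  return text.split(" ").length > 3 ? "COMPLETE" : "INCOMPLETE";
}

async function shouldEndTurn(
  transcript: string,
): Promise<{ end: boolean; reason: string }> {
  const text = transcript.trim();
  if (/[.?!]$/.test(text)) return { end: true, reason: "punctuation" };
  if (BACKCHANNELS.has(text.toLowerCase())) {
    return { end: false, reason: "backchannel" };
  }
  const key = text.slice(0, 100); // cache key: first 100 characters
  const verdict = verdictCache.get(key) ?? (await classify(text));
  verdictCache.set(key, verdict);
  return verdict === "COMPLETE"
    ? { end: true, reason: "semantic_model" }
    : { end: false, reason: "await_silence_timeout" }; // runtime falls back to silence_timeout
}
```

In the real extension this function would run only after `minSilenceBeforeCheckMs` of silence, with the `TIMEOUT` branch deferring to the silence timeout.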
package/registry/curated/git/SKILL.md
CHANGED

@@ -37,3 +37,13 @@ Use `git` to inspect history, create branches, commit changes, and resolve confl
 - Create a branch: `git checkout -b my-branch`
 - Stage + commit: `git add -A && git commit -m "message"`
 - Rebase: `git rebase -i origin/main`
+
+## GitHub Integration
+
+After making local commits with git, use the GitHub tools to push your work upstream and open pull requests:
+
+- **Create a remote branch** — Use `github_branch_create` to create a branch on the remote from a given SHA. Get the SHA from your local HEAD with `git rev-parse HEAD`, then create the matching remote branch.
+- **Open a pull request** — Use `github_pr_create` to open a PR from your feature branch to the default branch, with a clear title and description summarizing the changes.
+- **Full GitHub API operations** — The `github` skill provides 26 tools covering PR review, merge, issue triage, release management, Actions CI/CD, and more. Reference it for anything beyond local git operations.
+
+**Division of responsibility:** Use `git` for local operations (staging, committing, branching, rebasing, diffing, log inspection) and the `github_*` tools for remote API operations (creating PRs, reviewing code, managing issues, triggering workflows). The two complement each other — git handles your local working copy while the GitHub tools interact with the hosted platform.