@mastra/mcp-docs-server 0.0.3 → 0.0.4-alpha.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (74) hide show
  1. package/.docs/organized/changelogs/%40mastra%2Fastra.md +20 -20
  2. package/.docs/organized/changelogs/%40mastra%2Fchroma.md +20 -20
  3. package/.docs/organized/changelogs/%40mastra%2Fclient-js.md +22 -22
  4. package/.docs/organized/changelogs/%40mastra%2Fcomposio.md +19 -19
  5. package/.docs/organized/changelogs/%40mastra%2Fcore.md +17 -17
  6. package/.docs/organized/changelogs/%40mastra%2Fdeployer-cloudflare.md +25 -25
  7. package/.docs/organized/changelogs/%40mastra%2Fdeployer-netlify.md +25 -25
  8. package/.docs/organized/changelogs/%40mastra%2Fdeployer-vercel.md +25 -25
  9. package/.docs/organized/changelogs/%40mastra%2Fdeployer.md +23 -23
  10. package/.docs/organized/changelogs/%40mastra%2Fevals.md +19 -19
  11. package/.docs/organized/changelogs/%40mastra%2Ffirecrawl.md +21 -21
  12. package/.docs/organized/changelogs/%40mastra%2Fgithub.md +19 -19
  13. package/.docs/organized/changelogs/%40mastra%2Floggers.md +19 -19
  14. package/.docs/organized/changelogs/%40mastra%2Fmcp-docs-server.md +18 -0
  15. package/.docs/organized/changelogs/%40mastra%2Fmcp.md +19 -19
  16. package/.docs/organized/changelogs/%40mastra%2Fmemory.md +19 -19
  17. package/.docs/organized/changelogs/%40mastra%2Fpg.md +19 -19
  18. package/.docs/organized/changelogs/%40mastra%2Fpinecone.md +19 -19
  19. package/.docs/organized/changelogs/%40mastra%2Fplayground-ui.md +26 -26
  20. package/.docs/organized/changelogs/%40mastra%2Fqdrant.md +20 -20
  21. package/.docs/organized/changelogs/%40mastra%2Frag.md +19 -19
  22. package/.docs/organized/changelogs/%40mastra%2Fragie.md +19 -19
  23. package/.docs/organized/changelogs/%40mastra%2Fspeech-azure.md +19 -19
  24. package/.docs/organized/changelogs/%40mastra%2Fspeech-deepgram.md +19 -19
  25. package/.docs/organized/changelogs/%40mastra%2Fspeech-elevenlabs.md +19 -19
  26. package/.docs/organized/changelogs/%40mastra%2Fspeech-google.md +19 -19
  27. package/.docs/organized/changelogs/%40mastra%2Fspeech-ibm.md +19 -19
  28. package/.docs/organized/changelogs/%40mastra%2Fspeech-murf.md +19 -19
  29. package/.docs/organized/changelogs/%40mastra%2Fspeech-openai.md +19 -19
  30. package/.docs/organized/changelogs/%40mastra%2Fspeech-playai.md +19 -19
  31. package/.docs/organized/changelogs/%40mastra%2Fspeech-replicate.md +19 -19
  32. package/.docs/organized/changelogs/%40mastra%2Fspeech-speechify.md +19 -19
  33. package/.docs/organized/changelogs/%40mastra%2Fstabilityai.md +19 -19
  34. package/.docs/organized/changelogs/%40mastra%2Fturbopuffer.md +18 -0
  35. package/.docs/organized/changelogs/%40mastra%2Fupstash.md +20 -20
  36. package/.docs/organized/changelogs/%40mastra%2Fvectorize.md +19 -19
  37. package/.docs/organized/changelogs/%40mastra%2Fvoice-deepgram.md +19 -19
  38. package/.docs/organized/changelogs/%40mastra%2Fvoice-elevenlabs.md +19 -19
  39. package/.docs/organized/changelogs/%40mastra%2Fvoice-google.md +19 -19
  40. package/.docs/organized/changelogs/%40mastra%2Fvoice-murf.md +19 -19
  41. package/.docs/organized/changelogs/%40mastra%2Fvoice-openai-realtime.md +18 -0
  42. package/.docs/organized/changelogs/%40mastra%2Fvoice-openai.md +19 -19
  43. package/.docs/organized/changelogs/%40mastra%2Fvoice-playai.md +19 -19
  44. package/.docs/organized/changelogs/%40mastra%2Fvoice-sarvam.md +19 -0
  45. package/.docs/organized/changelogs/%40mastra%2Fvoice-speechify.md +19 -19
  46. package/.docs/organized/changelogs/create-mastra.md +16 -16
  47. package/.docs/organized/changelogs/mastra.md +29 -29
  48. package/.docs/organized/code-examples/ai-sdk-useChat.md +2 -1
  49. package/.docs/raw/agents/02-adding-tools.mdx +6 -0
  50. package/.docs/raw/agents/02a-mcp-guide.mdx +192 -0
  51. package/.docs/raw/agents/03-adding-voice.mdx +8 -8
  52. package/.docs/raw/evals/00-overview.mdx +2 -2
  53. package/.docs/raw/evals/03-running-in-ci.mdx +7 -4
  54. package/.docs/raw/getting-started/mcp-docs-server.mdx +5 -2
  55. package/.docs/raw/guides/04-research-assistant.mdx +273 -0
  56. package/.docs/raw/local-dev/mastra-dev.mdx +2 -2
  57. package/.docs/raw/rag/overview.mdx +3 -3
  58. package/.docs/raw/rag/retrieval.mdx +7 -4
  59. package/.docs/raw/rag/vector-databases.mdx +107 -40
  60. package/.docs/raw/reference/client-js/workflows.mdx +1 -0
  61. package/.docs/raw/reference/rag/libsql.mdx +3 -3
  62. package/.docs/raw/reference/tools/client.mdx +1 -1
  63. package/.docs/raw/reference/tools/vector-query-tool.mdx +1 -1
  64. package/.docs/raw/reference/voice/sarvam.mdx +260 -0
  65. package/.docs/raw/reference/workflows/snapshots.mdx +204 -0
  66. package/.docs/raw/voice/overview.mdx +135 -0
  67. package/.docs/raw/voice/speech-to-text.mdx +45 -0
  68. package/.docs/raw/voice/text-to-speech.mdx +52 -0
  69. package/.docs/raw/voice/voice-to-voice.mdx +310 -0
  70. package/.docs/raw/workflows/dynamic-workflows.mdx +4 -0
  71. package/.docs/raw/workflows/steps.mdx +12 -2
  72. package/.docs/raw/workflows/suspend-and-resume.mdx +71 -1
  73. package/.docs/raw/workflows/variables.mdx +23 -3
  74. package/package.json +2 -2
@@ -0,0 +1,260 @@
1
+ ---
2
+ title: "Reference: Sarvam Voice | Voice Providers | Mastra Docs"
3
+ description: "Documentation for the Sarvam class, providing text-to-speech and speech-to-text capabilities."
4
+ ---
5
+
6
+ # Sarvam
7
+
8
+ The SarvamVoice class in Mastra provides text-to-speech and speech-to-text capabilities using Sarvam AI models.
9
+
10
+ ## Usage Example
11
+
12
+ ```typescript
13
+ import { SarvamVoice } from "@mastra/voice-sarvam";
14
+
15
+ // Initialize with default configuration using environment variables
16
+ const voice = new SarvamVoice();
17
+
18
+ // Or initialize with specific configuration
19
+ const voiceWithConfig = new SarvamVoice({
20
+ speechModel: {
21
+ model: "bulbul:v1",
22
+ apiKey: process.env.SARVAM_API_KEY!,
23
+ language: "en-IN",
24
+ properties: {
25
+ pitch: 0,
26
+ pace: 1.65,
27
+ loudness: 1.5,
28
+ speech_sample_rate: 8000,
29
+ enable_preprocessing: false,
30
+ eng_interpolation_wt: 123,
31
+ },
32
+ },
33
+ listeningModel: {
34
+ model: "saarika:v2",
35
+ apiKey: process.env.SARVAM_API_KEY!,
36
+ languageCode: "en-IN",
37
+ filetype: "wav",
38
+ },
39
+ speaker: "meera", // Default voice
40
+ });
41
+
42
+
43
+ // Convert text to speech
44
+ const audioStream = await voice.speak("Hello, how can I help you?");
45
+
46
+
47
+ // Convert speech to text
48
+ const text = await voice.listen(audioStream, {
49
+ filetype: "wav",
50
+ });
51
+ ```
52
+
53
+ ### Sarvam API Docs
54
+
55
+ https://docs.sarvam.ai/api-reference-docs/endpoints/text-to-speech
56
+
57
+ ## Configuration
58
+
59
+ ### Constructor Options
60
+
61
+ <PropertiesTable
62
+ content={[
63
+ {
64
+ name: "speechModel",
65
+ type: "SarvamVoiceConfig",
66
+ description: "Configuration for text-to-speech synthesis.",
67
+ isOptional: true,
68
+ defaultValue: "{ model: 'bulbul:v1', language: 'en-IN' }",
69
+ },
70
+ {
71
+ name: "speaker",
72
+ type: "SarvamVoiceId",
73
+ description:
74
+ "The speaker to be used for the output audio. If not provided, Meera will be used as default. Available options: meera, pavithra, maitreyi, arvind, amol, amartya, diya, neel, misha, vian, arjun, maya",
75
+ isOptional: true,
76
+ defaultValue: "'meera'",
77
+ },
78
+ {
79
+ name: "listeningModel",
80
+ type: "SarvamListenOptions",
81
+ description: "Configuration for speech-to-text recognition.",
82
+ isOptional: true,
83
+ defaultValue: "{ model: 'saarika:v2', languageCode: 'unknown' }",
84
+ },
85
+ ]}
86
+ />
87
+
88
+ ### SarvamVoiceConfig
89
+
90
+ <PropertiesTable
91
+ content={[
92
+ {
93
+ name: "apiKey",
94
+ type: "string",
95
+ description:
96
+ "Sarvam API key. Falls back to SARVAM_API_KEY environment variable.",
97
+ isOptional: true,
98
+ },
99
+ {
100
+ name: "model",
101
+ type: "SarvamTTSModel",
102
+ description: "Specifies the model to use for text-to-speech conversion.",
103
+ isOptional: true,
104
+ defaultValue: "'bulbul:v1'",
105
+ },
106
+ {
107
+ name: "language",
108
+ type: "SarvamTTSLanguage",
109
+ description:
110
+ "Target language for speech synthesis. Available options: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, en-IN, gu-IN",
111
+ isOptional: false,
112
+ defaultValue: "'en-IN'",
113
+ },
114
+ {
115
+ name: "properties",
116
+ type: "object",
117
+ description: "Additional voice properties for customization.",
118
+ isOptional: true,
119
+ },
120
+ {
121
+ name: "properties.pitch",
122
+ type: "number",
123
+ description:
124
+ "Controls the pitch of the audio. Lower values result in a deeper voice, while higher values make it sharper. The suitable range is between -0.75 and 0.75.",
125
+ isOptional: true,
126
+ },
127
+ {
128
+ name: "properties.pace",
129
+ type: "number",
130
+ description:
131
+ "Controls the speed of the audio. Lower values result in slower speech, while higher values make it faster. The suitable range is between 0.5 and 2.0. Default is 1.0. Required range: 0.3 <= x <= 3",
132
+ isOptional: true,
133
+ },
134
+ {
135
+ name: "properties.loudness",
136
+ type: "number",
137
+ description:
138
+ "Controls the loudness of the audio. Lower values result in quieter audio, while higher values make it louder. The suitable range is between 0.3 and 3.0. Required range: 0 <= x <= 3",
139
+ isOptional: true,
140
+ },
141
+ {
142
+ name: "properties.speech_sample_rate",
143
+ type: "8000 | 16000 | 22050",
144
+ description: "Audio sample rate in Hz.",
145
+ isOptional: true,
146
+ },
147
+ {
148
+ name: "properties.enable_preprocessing",
149
+ type: "boolean",
150
+ description:
151
+ "Controls whether normalization of English words and numeric entities (e.g., numbers, dates) is performed. Set to true for better handling of mixed-language text. Default is false.",
152
+ isOptional: true,
153
+ },
154
+ {
155
+ name: "properties.eng_interpolation_wt",
156
+ type: "number",
157
+ description: "Weight for interpolating with English speaker at encoder.",
158
+ isOptional: true,
159
+ },
160
+ ]}
161
+ />
162
+
163
+ ### SarvamListenOptions
164
+
165
+ <PropertiesTable
166
+ content={[
167
+ {
168
+ name: "apiKey",
169
+ type: "string",
170
+ description:
171
+ "Sarvam API key. Falls back to SARVAM_API_KEY environment variable.",
172
+ isOptional: true,
173
+ },
174
+ {
175
+ name: "model",
176
+ type: "SarvamSTTModel",
177
+ description:
178
+ "Specifies the model to use for speech-to-text conversion. Note: the default model is saarika:v2. Available options: saarika:v1, saarika:v2, saarika:flash",
179
+ isOptional: true,
180
+ defaultValue: "'saarika:v2'",
181
+ },
182
+ {
183
+ name: "languageCode",
184
+ type: "SarvamSTTLanguage",
185
+ description:
186
+ "Specifies the language of the input audio. This parameter is required to ensure accurate transcription. For the saarika:v1 model, this parameter is mandatory. For the saarika:v2 model, it is optional. unknown: Use this when the language is not known; the API will detect it automatically. Note that the saarika:v1 model does not support the unknown language code. Available options: unknown, hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, en-IN, gu-IN",
187
+ isOptional: true,
188
+ defaultValue: "'unknown'",
189
+ },
190
+ {
191
+ name: "filetype",
192
+ type: "'mp3' | 'wav'",
193
+ description: "Audio format of the input stream.",
194
+ isOptional: true,
195
+ },
196
+ ]}
197
+ />
198
+
199
+ ## Methods
200
+
201
+ ### speak()
202
+
203
+ Converts text to speech using Sarvam's text-to-speech models.
204
+
205
+ <PropertiesTable
206
+ content={[
207
+ {
208
+ name: "input",
209
+ type: "string | NodeJS.ReadableStream",
210
+ description: "Text or text stream to convert to speech.",
211
+ isOptional: false,
212
+ },
213
+ {
214
+ name: "options.speaker",
215
+ type: "SarvamVoiceId",
216
+ description: "Voice ID to use for speech synthesis.",
217
+ isOptional: true,
218
+ defaultValue: "Constructor's speaker value",
219
+ },
220
+ ]}
221
+ />
222
+
223
+ Returns: `Promise<NodeJS.ReadableStream>`
224
+
225
+ ### listen()
226
+
227
+ Transcribes audio using Sarvam's speech recognition models.
228
+
229
+ <PropertiesTable
230
+ content={[
231
+ {
232
+ name: "input",
233
+ type: "NodeJS.ReadableStream",
234
+ description: "Audio stream to transcribe.",
235
+ isOptional: false,
236
+ },
237
+ {
238
+ name: "options",
239
+ type: "SarvamListenOptions",
240
+ description: "Configuration options for speech recognition.",
241
+ isOptional: true,
242
+ },
243
+ ]}
244
+ />
245
+
246
+ Returns: `Promise<string>`
247
+
248
+ ### getSpeakers()
249
+
250
+ Returns an array of available voice options.
251
+
252
+ Returns: `Promise<Array<{voiceId: SarvamVoiceId}>>`
253
+
254
+ ## Notes
255
+
256
+ - API key can be provided via constructor options or the `SARVAM_API_KEY` environment variable
257
+ - If no API key is provided, the constructor will throw an error
258
+ - The service communicates with the Sarvam AI API at `https://api.sarvam.ai`
259
+ - Audio is returned as a stream containing binary audio data
260
+ - Speech recognition supports mp3 and wav audio formats
@@ -0,0 +1,204 @@
1
+ ---
2
+ title: "Reference: Snapshots | Workflow State Persistence | Mastra Docs"
3
+ description: "Technical reference on snapshots in Mastra - the serialized workflow state that enables suspend and resume functionality"
4
+ ---
5
+
6
+ # Snapshots
7
+
8
+ In Mastra, a snapshot is a serializable representation of a workflow's complete execution state at a specific point in time. Snapshots capture all the information needed to resume a workflow from exactly where it left off, including:
9
+
10
+ - The current state of each step in the workflow
11
+ - The outputs of completed steps
12
+ - The execution path taken through the workflow
13
+ - Any suspended steps and their metadata
14
+ - The remaining retry attempts for each step
15
+ - Additional contextual data needed to resume execution
16
+
17
+ Snapshots are automatically created and managed by Mastra whenever a workflow is suspended, and are persisted to the configured storage system.
18
+
19
+ ## The Role of Snapshots in Suspend and Resume
20
+
21
+ Snapshots are the key mechanism enabling Mastra's suspend and resume capabilities. When a workflow step calls `await suspend()`:
22
+
23
+ 1. The workflow execution is paused at that exact point
24
+ 2. The current state of the workflow is captured as a snapshot
25
+ 3. The snapshot is persisted to storage
26
+ 4. The workflow step is marked as "suspended" with a status of `'suspended'`
27
+ 5. Later, when `resume()` is called on the suspended step, the snapshot is retrieved
28
+ 6. The workflow execution resumes from exactly where it left off
29
+
30
+ This mechanism provides a powerful way to implement human-in-the-loop workflows, handle rate limiting, wait for external resources, and implement complex branching workflows that may need to pause for extended periods.
31
+
32
+ ## Snapshot Anatomy
33
+
34
+ A Mastra workflow snapshot consists of several key components:
35
+
36
+ ```typescript
37
+ export interface WorkflowRunState {
38
+ // Core state info
39
+ value: Record<string, string>; // Current state machine value
40
+ context: { // Workflow context
41
+ steps: Record<string, { // Step execution results
42
+ status: 'success' | 'failed' | 'suspended' | 'waiting' | 'skipped';
43
+ payload?: any; // Step-specific data
44
+ error?: string; // Error info if failed
45
+ }>;
46
+ triggerData: Record<string, any>; // Initial trigger data
47
+ attempts: Record<string, number>; // Remaining retry attempts
48
+ inputData: Record<string, any>; // Initial input data
49
+ };
50
+
51
+ activePaths: Array<{ // Currently active execution paths
52
+ stepPath: string[];
53
+ stepId: string;
54
+ status: string;
55
+ }>;
56
+
57
+ // Metadata
58
+ runId: string; // Unique run identifier
59
+ timestamp: number; // Time snapshot was created
60
+
61
+ // For nested workflows and suspended steps
62
+ childStates?: Record<string, WorkflowRunState>; // Child workflow states
63
+ suspendedSteps?: Record<string, string>; // Mapping of suspended steps
64
+ }
65
+ ```
66
+
67
+ ## How Snapshots Are Saved and Retrieved
68
+
69
+ Mastra persists snapshots to the configured storage system. By default, snapshots are saved to a LibSQL database, but can be configured to use other storage providers like Upstash.
70
+ The snapshots are stored in the `workflow_snapshots` table and identified uniquely by the `run_id` for the associated run when using libsql.
71
+ Utilizing a persistence layer allows for the snapshots to be persisted across workflow runs, allowing for advanced human-in-the-loop functionality.
72
+
73
+ Read more about [libsql storage](../storage/libsql.mdx) and [upstash storage](../storage/upstash.mdx) here.
74
+
75
+ ### Saving Snapshots
76
+
77
+ When a workflow is suspended, Mastra automatically persists the workflow snapshot with these steps:
78
+
79
+ 1. The `suspend()` function in a step execution triggers the snapshot process
80
+ 2. The `WorkflowInstance.suspend()` method records the suspended machine
81
+ 3. `persistWorkflowSnapshot()` is called to save the current state
82
+ 4. The snapshot is serialized and stored in the configured database in the `workflow_snapshots` table
83
+ 5. The storage record includes the workflow name, run ID, and the serialized snapshot
84
+
85
+
86
+ ### Retrieving Snapshots
87
+
88
+ When a workflow is resumed, Mastra retrieves the persisted snapshot with these steps:
89
+
90
+ 1. The `resume()` method is called with a specific step ID
91
+ 2. The snapshot is loaded from storage using `loadWorkflowSnapshot()`
92
+ 3. The snapshot is parsed and prepared for resumption
93
+ 4. The workflow execution is recreated with the snapshot state
94
+ 5. The suspended step is resumed, and execution continues
95
+
96
+ ## Storage Options for Snapshots
97
+
98
+ Mastra provides multiple storage options for persisting snapshots.
99
+
100
+ A `storage` instance is configured on the `Mastra` class, and is used to setup a snapshot persistence layer for all workflows registered on the `Mastra` instance.
101
+ This means that storage is shared across all workflows registered with the same `Mastra` instance.
102
+
103
+ ### LibSQL (Default)
104
+
105
+ The default storage option is LibSQL, a SQLite-compatible database:
106
+
107
+ ```typescript
108
+ import { Mastra } from '@mastra/core/mastra';
109
+ import { DefaultStorage } from '@mastra/core/storage/libsql';
110
+
111
+ const mastra = new Mastra({
112
+ storage: new DefaultStorage({
113
+ config: {
114
+ url: "file:storage.db", // Local file-based database
115
+ // For production:
116
+ // url: process.env.DATABASE_URL,
117
+ // authToken: process.env.DATABASE_AUTH_TOKEN,
118
+ }
119
+ }),
120
+ workflows: {
121
+ weatherWorkflow,
122
+ travelWorkflow,
123
+ }
124
+ });
125
+ ```
126
+
127
+ ### Upstash (Redis-Compatible)
128
+
129
+ For serverless environments:
130
+
131
+ ```typescript
132
+ import { Mastra } from '@mastra/core/mastra';
133
+ import { UpstashStore } from "@mastra/upstash";
134
+
135
+ const mastra = new Mastra({
136
+ storage: new UpstashStore({
137
+ url: process.env.UPSTASH_URL,
138
+ token: process.env.UPSTASH_TOKEN,
139
+ }),
140
+ workflows: {
141
+ weatherWorkflow,
142
+ travelWorkflow,
143
+ }
144
+ });
145
+ ```
146
+
147
+ ## Best Practices for Working with Snapshots
148
+
149
+ 1. **Ensure Serializability**: Any data that needs to be included in the snapshot must be serializable (convertible to JSON).
150
+
151
+ 2. **Minimize Snapshot Size**: Avoid storing large data objects directly in the workflow context. Instead, store references to them (like IDs) and retrieve the data when needed.
152
+
153
+ 3. **Handle Resume Context Carefully**: When resuming a workflow, carefully consider what context to provide. This will be merged with the existing snapshot data.
154
+
155
+ 4. **Set Up Proper Monitoring**: Implement monitoring for suspended workflows, especially long-running ones, to ensure they are properly resumed.
156
+
157
+ 5. **Consider Storage Scaling**: For applications with many suspended workflows, ensure your storage solution is appropriately scaled.
158
+
159
+ ## Advanced Snapshot Patterns
160
+
161
+ ### Custom Snapshot Metadata
162
+
163
+ When suspending a workflow, you can include custom metadata that can help when resuming:
164
+
165
+ ```typescript
166
+ await suspend({
167
+ reason: "Waiting for customer approval",
168
+ requiredApprovers: ["manager", "finance"],
169
+ requestedBy: currentUser,
170
+ urgency: "high",
171
+ expires: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000)
172
+ });
173
+ ```
174
+
175
+ This metadata is stored with the snapshot and available when resuming.
176
+
177
+ ### Conditional Resumption
178
+
179
+ You can implement conditional logic based on the suspend payload when resuming:
180
+
181
+ ```typescript
182
+ run.watch(async ({ context, activePaths }) => {
183
+ for (const path of activePaths) {
184
+ const approvalStep = context.steps?.approval;
185
+ if (approvalStep?.status === 'suspended') {
186
+ const payload = approvalStep.suspendPayload;
187
+
188
+ if (payload.urgency === "high" && currentUser.role === "manager") {
189
+ await resume({
190
+ stepId: 'approval',
191
+ context: { approved: true, approver: currentUser.id },
192
+ });
193
+ }
194
+ }
195
+ }
196
+ });
197
+ ```
198
+
199
+ ## Related
200
+
201
+ - [Suspend Function Reference](./suspend.mdx)
202
+ - [Resume Function Reference](./resume.mdx)
203
+ - [Watch Function Reference](./watch.mdx)
204
+ - [Suspend and Resume Guide](../../workflows/suspend-and-resume.mdx)
@@ -0,0 +1,135 @@
1
+ ---
2
+ title: Voice in Mastra | Mastra Docs
3
+ description: Overview of voice capabilities in Mastra, including text-to-speech, speech-to-text, and real-time voice-to-voice interactions.
4
+ ---
5
+
6
+ # Voice in Mastra
7
+
8
+ Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time voice-to-voice capabilities in your applications.
9
+
10
+ ## Key Features
11
+
12
+ - Standardized API across different voice providers
13
+ - Support for multiple voice services
14
+ - Voice-to-voice interactions using events for continuous audio streaming
15
+ - Composable voice providers for mixing TTS and STT services
16
+
17
+ ## Adding Voice to Agents
18
+
19
+ To learn how to integrate voice capabilities into your agents, check out the [Adding Voice to Agents](../agents/03-adding-voice.mdx) documentation. This section covers how to use both single and multiple voice providers, as well as real-time interactions.
20
+
21
+
22
+ ## Example of Using a Single Voice Provider
23
+
24
+ ```typescript
25
+ import { OpenAIVoice } from "@mastra/voice-openai";
26
+
27
+ // Initialize OpenAI voice for TTS
28
+ const voice = new OpenAIVoice({
29
+ speechModel: {
30
+ name: "tts-1-hd", // Specify the TTS model
31
+ apiKey: process.env.OPENAI_API_KEY, // Your OpenAI API key
32
+ },
33
+ });
34
+
35
+ // Convert text to speech
36
+ const audioStream = await voice.speak("Hello! How can I assist you today?", {
37
+ speaker: "default", // Optional: specify a speaker
38
+ });
39
+
40
+ // Play the audio response
41
+ playAudio(audioStream);
42
+ ```
43
+
44
+ ## Example of Using Multiple Voice Providers
45
+ This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).
46
+
47
+ Start by creating instances of the voice providers with any necessary configuration.
48
+
49
+ ```typescript
50
+ import { OpenAIVoice } from "@mastra/voice-openai";
51
+ import { PlayAIVoice } from "@mastra/voice-playai";
52
+ import { CompositeVoice } from "@mastra/core/voice";
53
+
54
+ // Initialize OpenAI voice for STT
55
+ const listeningProvider = new OpenAIVoice({
56
+ listeningModel: {
57
+ name: "whisper-1",
58
+ apiKey: process.env.OPENAI_API_KEY,
59
+ },
60
+ });
61
+
62
+ // Initialize PlayAI voice for TTS
63
+ const speakingProvider = new PlayAIVoice({
64
+ speechModel: {
65
+ name: "playai-voice",
66
+ apiKey: process.env.PLAYAI_API_KEY,
67
+ },
68
+ });
69
+
70
+ // Combine the providers using CompositeVoice
71
+ const voice = new CompositeVoice({
72
+ listeningProvider,
73
+ speakingProvider,
74
+ });
75
+
76
+ // Implement voice interactions using the combined voice provider
77
+ const audioStream = getMicrophoneStream(); // Assume this function gets audio input
78
+ const transcript = await voice.listen(audioStream);
79
+
80
+ // Log the transcribed text
81
+ console.log("Transcribed text:", transcript);
82
+
83
+ // Convert text to speech
84
+ const responseAudio = await voice.speak(`You said: ${transcript}`, {
85
+ speaker: "default", // Optional: specify a speaker
86
+ });
87
+
88
+ // Play the audio response
89
+ playAudio(responseAudio);
90
+ ```
91
+
92
+ ## Real-time Capabilities
93
+
94
+ Many voice providers support real-time speech-to-speech interactions through WebSocket connections, enabling:
95
+
96
+ - Live voice conversations with AI
97
+ - Streaming transcription
98
+ - Real-time text-to-speech synthesis
99
+ - Tool usage during conversations
100
+
101
+
102
+ ## Voice Configuration
103
+
104
+ Voice providers can be configured with different models and options:
105
+
106
+ ```typescript
107
+ const voice = new OpenAIVoice({
108
+ speechModel: {
109
+ name: "tts-1-hd",
110
+ apiKey: process.env.OPENAI_API_KEY
111
+ },
112
+ listeningModel: {
113
+ name: "whisper-1"
114
+ },
115
+ speaker: "alloy"
116
+ });
117
+ ```
118
+
119
+ ## Available Voice Providers
120
+
121
+ Mastra supports a variety of voice providers, including:
122
+
123
+ - OpenAI
124
+ - PlayAI
125
+ - Murf
126
+ - ElevenLabs
127
+ - [More](https://github.com/mastra-ai/mastra/tree/main/voice)
128
+
129
+ ## More Resources
130
+
131
+ - [CompositeVoice](../reference/voice/composite-voice.mdx)
132
+ - [MastraVoice](../reference/voice/mastra-voice.mdx)
133
+ - [OpenAI Voice](../reference/voice/openai.mdx)
134
+ - [PlayAI Voice](../reference/voice/playai.mdx)
135
+ - [Voice Examples](../../examples/voice/)
@@ -0,0 +1,45 @@
1
+ ---
2
+ title: Speech-to-Text (STT) in Mastra | Mastra Docs
3
+ description: Overview of Speech-to-Text capabilities in Mastra, including configuration, usage, and integration with voice providers.
4
+ ---
5
+
6
+ # Speech-to-Text (STT)
7
+
8
+ Speech-to-Text (STT) in Mastra provides a standardized interface for converting audio input into text across multiple service providers. This section covers STT configuration and usage. Check out the [Adding Voice to Agents](../agents/03-adding-voice.mdx) documentation to learn how to use STT in an agent.
9
+
10
+ ## Speech Configuration
11
+
12
+ To use STT in Mastra, you need to provide a `listeningModel` configuration when initializing the voice provider. This configuration includes parameters such as:
13
+
14
+ - **`name`**: The specific STT model to use.
15
+ - **`apiKey`**: Your API key for authentication.
16
+ - **Provider-specific options**: Additional options that may be required or supported by the specific voice provider.
17
+
18
+ **Note**: All of these parameters are optional. You can use the default settings provided by the voice provider, which will depend on the specific provider you are using.
19
+
20
+ ### Example Configuration
21
+
22
+ ```typescript
23
+ const voice = new OpenAIVoice({
24
+ listeningModel: {
25
+ name: "whisper-1",
26
+ apiKey: process.env.OPENAI_API_KEY,
27
+ },
28
+ });
29
+
30
+ // If using default settings the configuration can be simplified to:
31
+ const voice = new OpenAIVoice();
32
+ ```
33
+
34
+ ## Using the Listen Method
35
+
36
+ The primary method for STT is the `listen()` method, which converts spoken audio into text. Here's how to use it:
37
+
38
+ ```typescript
39
+ const audioStream = getMicrophoneStream(); // Assume this function gets audio input
40
+ const transcript = await voice.listen(audioStream, {
41
+ filetype: "m4a", // Optional: specify the audio file type
42
+ });
43
+ ```
44
+
45
+ **Note**: If you are using a voice-to-voice provider, such as `OpenAIRealtimeVoice`, the `listen()` method will emit a "writing" event instead of returning a transcript directly.
@@ -0,0 +1,52 @@
1
+ ---
2
+ title: Text-to-Speech (TTS) in Mastra | Mastra Docs
3
+ description: Overview of Text-to-Speech capabilities in Mastra, including configuration, usage, and integration with voice providers.
4
+ ---
5
+
6
+ # Text-to-Speech (TTS)
7
+
8
+ Text-to-Speech (TTS) in Mastra offers a unified API for synthesizing spoken audio from text using various provider services. This section explains TTS configuration options and implementation methods. For integrating TTS capabilities with agents, refer to the [Adding Voice to Agents](../agents/03-adding-voice.mdx) documentation.
9
+
10
+ ## Speech Configuration
11
+
12
+ To use TTS in Mastra, you need to provide a `speechModel` configuration when initializing the voice provider. This configuration includes parameters such as:
13
+
14
+ - **`name`**: The specific TTS model to use.
15
+ - **`apiKey`**: Your API key for authentication.
16
+ - **Provider-specific options**: Additional options that may be required or supported by the specific voice provider.
17
+
18
+ The **`speaker`** option is specified separately and allows you to select different voices for speech synthesis.
19
+
20
+ **Note**: All of these parameters are optional. You can use the default settings provided by the voice provider, which will depend on the specific provider you are using.
21
+
22
+ ### Example Configuration
23
+
24
+ ```typescript
25
+ const voice = new OpenAIVoice({
26
+ speechModel: {
27
+ name: "tts-1-hd",
28
+ apiKey: process.env.OPENAI_API_KEY
29
+ },
30
+ speaker: "alloy",
31
+ });
32
+
33
+ // If using default settings the configuration can be simplified to:
34
+ const voice = new OpenAIVoice();
35
+ ```
36
+
37
+ ## Using the Speak Method
38
+
39
+ The primary method for TTS is the `speak()` method, which converts text to speech. This method can accept options that allow you to specify the speaker and other provider-specific options. Here's how to use it:
40
+
41
+ ```typescript
42
+ const readableStream = await voice.speak("Hello, world!", {
43
+ speaker: "default", // Optional: specify a speaker
44
+ properties: {
45
+ speed: 1.0, // Optional: adjust speech speed
46
+ pitch: "default", // Optional: specify pitch if supported
47
+ },
48
+ });
49
+ ```
50
+
51
+ **Note**: If you are using a voice-to-voice provider, such as `OpenAIRealtimeVoice`, the `speak()` method will emit a "speaking" event instead of returning a readable stream.
52
+