phonic 0.4.0 → 0.6.0

package/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2024 Phonic, Inc.
+ Copyright (c) 2025 Phonic, Inc.
  
  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
  
package/README.md CHANGED
@@ -7,7 +7,7 @@ Node.js library for the Phonic API.
  - [Usage](#usage)
  - [Get voices](#get-voices)
  - [Get voice by id](#get-voice-by-id)
- - [Text-to-speech via WebSocket](#text-to-speech-via-websocket)
+ - [Speech-to-speech via WebSocket](#speech-to-speech-via-websocket)
  
  ## Installation
  
@@ -19,7 +19,7 @@ npm i phonic
  
  Grab an API key from [Phonic settings](https://phonic.co/settings) and pass it to the Phonic constructor.
  
- ```js
+ ```ts
  import { Phonic } from "phonic";
  
  const phonic = new Phonic("ph_...");
@@ -29,7 +29,7 @@ const phonic = new Phonic("ph_...");
  
  ### Get voices
  
- ```js
+ ```ts
  const { data, error } = await phonic.voices.list({ model: "shasta" });
  
  if (error === null) {
@@ -40,7 +40,7 @@ if (error === null) {
  
  ### Get voice by id
  
- ```js
+ ```ts
  const { data, error } = await phonic.voices.get("meredith");
  
  if (error === null) {
@@ -48,16 +48,12 @@ if (error === null) {
  }
  ```
  
- ### Text-to-speech via WebSocket
+ ### Speech-to-speech via WebSocket
  
  Open a WebSocket connection:
  
- ```js
- const { data, error } = await phonic.tts.websocket({
-   model: "shasta",
-   output_format: "mulaw_8000",
-   voice_id: "meredith",
- });
+ ```ts
+ const { data, error } = await phonic.sts.websocket();
  
  if (error !== null) {
    throw new Error(error.message);
@@ -67,69 +63,64 @@ if (error !== null) {
  const { phonicWebSocket } = data;
  ```
  
- Process audio chunks that Phonic sends back to you, by sending them to Twilio, for example:
-
- ```js
- phonicWebSocket.onMessage((message) => {
-   if (message.type === "audio_chunk") {
-     ws.send(
-       JSON.stringify({
-         event: "media",
-         streamSid: "...",
-         media: {
-           payload: message.audio,
-         },
-       }),
-     );
-   }
- });
- ```
-
- Send text chunks to Phonic for audio generation as you receive them from LLM:
-
- ```js
- const stream = await openai.chat.completions.create(...);
+ Send config params for the conversation:
  
- for await (const chunk of stream) {
-   const text = chunk.choices[0]?.delta?.content || "";
+ ```ts
+ phonicWebSocket.config({
+   input_format: "mulaw_8000",
  
-   if (text) {
-     phonicWebSocket.generate({ text });
-   }
- }
+   // Optional fields
+   system_prompt: "You are a helpful assistant.",
+   welcome_message: "Hello, how can I help you?",
+   voice_id: "meredith",
+   output_format: "mulaw_8000"
+ });
  ```
  
- Tell Phonic to finish generating audio for all text chunks you've sent:
+ Stream input (user) audio chunks:
  
- ```js
- phonicWebSocket.flush();
+ ```ts
+ phonicWebSocket.audioChunk({
+   audio: "...", // base64 encoded audio chunk
+ });
  ```
  
- You can also tell Phonic to stop sending audio chunks back, e.g. if the user interrupts the conversation:
+ Process messages that Phonic sends back to you:
  
- ```js
- phonicWebSocket.stop();
+ ```ts
+ phonicWebSocket.onMessage((message) => {
+   switch (message.type) {
+     case "input_text": {
+       console.log(`User: ${message.text}`);
+       break;
+     }
+
+     case "audio_chunk": {
+       // Send the audio chunk to Twilio, for example:
+       twilioWebSocket.send(
+         JSON.stringify({
+           event: "media",
+           streamSid: "...",
+           media: {
+             payload: message.audio,
+           },
+         }),
+       );
+       break;
+     }
+   }
+ });
  ```
  
- To close the WebSocket connection:
+ To end the conversation, close the WebSocket:
  
- ```js
+ ```ts
  phonicWebSocket.close();
  ```
  
- To know when the last audio chunk has been received:
-
- ```js
- phonicWebSocket.onMessage((message) => {
-   if (message.type === "flushed") {
-     console.log("Last audio chunk received");
-   }
- });
- ```
-
  You can also listen for close and error events:
  
- ```js
+ ```ts
  phonicWebSocket.onClose((event) => {
    console.log(
      `Phonic WebSocket closed with code ${event.code} and reason "${event.reason}"`,
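The `audio_chunk` branch added to the README can also be written as a small pure function, which makes it easier to test without a live connection. This is only a sketch: `StsMessage` is a simplified local mirror of the message shapes in this diff, and `toTwilioMediaFrame` and the `streamSid` value are hypothetical names that do not exist in the `phonic` package.

```typescript
// Simplified local mirror of the 0.6.0 message shapes shown in this diff.
// The real "error" variant carries more fields than are modelled here.
type StsMessage =
  | { type: "input_text"; text: string }
  | { type: "audio_chunk"; text: string; audio: string }
  | { type: "error" };

// Hypothetical helper: build the Twilio "media" frame for an audio chunk,
// or return null for messages that carry no audio.
function toTwilioMediaFrame(
  message: StsMessage,
  streamSid: string,
): string | null {
  if (message.type !== "audio_chunk") {
    return null;
  }
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: message.audio },
  });
}

// Example call with placeholder values.
const frame = toTwilioMediaFrame(
  { type: "audio_chunk", text: "Hello", audio: "AAAA" },
  "stream-sid-placeholder",
);
console.log(frame);
```

Inside `phonicWebSocket.onMessage`, the returned string (when not `null`) would be what gets passed to `twilioWebSocket.send`.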
package/dist/index.d.mts CHANGED
@@ -18,24 +18,13 @@ type DataOrError<T> = Promise<{
      error: ErrorResponse;
  }>;
  
- type PhonicWebSocketParams = {
-     model?: string;
-     output_format?: string;
-     voice_id?: string;
- };
- type PhonicWebSocketResponseMessage = {
-     type: "config";
-     model: string;
-     output_format: string;
-     voice_id: string;
+ type PhonicSTSWebSocketResponseMessage = {
+     type: "input_text";
+     text: string;
  } | {
      type: "audio_chunk";
-     audio: string;
      text: string;
- } | {
-     type: "flush_confirm";
- } | {
-     type: "stop_confirm";
+     audio: string;
  } | {
      type: "error";
      error: {
@@ -43,18 +32,18 @@ type PhonicWebSocketResponseMessage = {
          code?: string;
      };
      paramErrors?: {
-         model?: string;
-         output_format?: string;
+         system_prompt?: string;
+         welcome_message?: string;
          voice_id?: string;
-         text?: string;
-         speed?: string;
+         input_format?: string;
+         output_format?: string;
      };
  };
- type OnMessageCallback = (message: PhonicWebSocketResponseMessage) => void;
+ type OnMessageCallback = (message: PhonicSTSWebSocketResponseMessage) => void;
  type OnCloseCallback = (event: WebSocket.CloseEvent) => void;
  type OnErrorCallback = (event: WebSocket.ErrorEvent) => void;
  
- declare class PhonicWebSocket {
+ declare class PhonicSTSWebSocket {
      private readonly ws;
      private onMessageCallback;
      private onCloseCallback;
@@ -63,20 +52,24 @@ declare class PhonicWebSocket {
      onMessage(callback: OnMessageCallback): void;
      onClose(callback: OnCloseCallback): void;
      onError(callback: OnErrorCallback): void;
-     generate(message: {
-         text: string;
-         speed?: number;
+     config(message: {
+         system_prompt?: string;
+         welcome_message?: string;
+         voice_id?: string;
+         input_format?: "pcm_44100" | "mulaw_8000";
+         output_format?: "pcm_44100" | "mulaw_8000";
+     }): void;
+     audioChunk(message: {
+         audio: string;
      }): void;
-     flush(): void;
-     stop(): void;
      close(): void;
  }
  
- declare class TextToSpeech {
+ declare class SpeechToSpeech {
      private readonly phonic;
      constructor(phonic: Phonic);
-     websocket(params?: PhonicWebSocketParams): DataOrError<{
-         phonicWebSocket: PhonicWebSocket;
+     websocket(): DataOrError<{
+         phonicWebSocket: PhonicSTSWebSocket;
      }>;
  }
  
@@ -105,7 +98,7 @@ declare class Phonic {
      readonly baseUrl: string;
      private readonly headers;
      readonly voices: Voices;
-     readonly tts: TextToSpeech;
+     readonly sts: SpeechToSpeech;
      constructor(apiKey: string, config?: PhonicConfig);
      fetchRequest<T>(path: string, options: FetchOptions): DataOrError<T>;
      get<T>(path: string): Promise<{
@@ -117,4 +110,4 @@ declare class Phonic {
      }>;
  }
  
- export { Phonic, PhonicWebSocket };
+ export { Phonic, PhonicSTSWebSocket };
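Since the `config`, `flush_confirm` and `stop_confirm` variants are gone and `input_text` is new, existing `onMessage` handlers need to be revisited. One way to catch a missed variant at compile time is an exhaustive `switch`, sketched below. The union here is a local copy of the declaration in this diff; fields of `error` that fall outside the diff hunks are left out, and `describeMessage` is a hypothetical helper, not part of the package.

```typescript
// Local copy of the union declared in index.d.mts for 0.6.0.
// Fields of `error` that are not visible in the diff hunks are omitted.
type PhonicSTSWebSocketResponseMessage =
  | { type: "input_text"; text: string }
  | { type: "audio_chunk"; text: string; audio: string }
  | {
      type: "error";
      error: { code?: string };
      paramErrors?: {
        system_prompt?: string;
        welcome_message?: string;
        voice_id?: string;
        input_format?: string;
        output_format?: string;
      };
    };

// Hypothetical helper: summarise a message. The `never` assignment in the
// default branch stops compiling if a variant is left unhandled.
function describeMessage(message: PhonicSTSWebSocketResponseMessage): string {
  switch (message.type) {
    case "input_text":
      return `user said: ${message.text}`;
    case "audio_chunk":
      return `audio for: ${message.text}`;
    case "error": {
      const badParams = Object.keys(message.paramErrors ?? {});
      const code = message.error.code ?? "unknown";
      return `error ${code} (params: ${badParams.join(", ") || "none"})`;
    }
    default: {
      const unreachable: never = message;
      return unreachable;
    }
  }
}

console.log(describeMessage({ type: "input_text", text: "hi" }));
```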
package/dist/index.d.ts CHANGED
@@ -18,24 +18,13 @@ type DataOrError<T> = Promise<{
      error: ErrorResponse;
  }>;
  
- type PhonicWebSocketParams = {
-     model?: string;
-     output_format?: string;
-     voice_id?: string;
- };
- type PhonicWebSocketResponseMessage = {
-     type: "config";
-     model: string;
-     output_format: string;
-     voice_id: string;
+ type PhonicSTSWebSocketResponseMessage = {
+     type: "input_text";
+     text: string;
  } | {
      type: "audio_chunk";
-     audio: string;
      text: string;
- } | {
-     type: "flush_confirm";
- } | {
-     type: "stop_confirm";
+     audio: string;
  } | {
      type: "error";
      error: {
@@ -43,18 +32,18 @@ type PhonicWebSocketResponseMessage = {
          code?: string;
      };
      paramErrors?: {
-         model?: string;
-         output_format?: string;
+         system_prompt?: string;
+         welcome_message?: string;
          voice_id?: string;
-         text?: string;
-         speed?: string;
+         input_format?: string;
+         output_format?: string;
      };
  };
- type OnMessageCallback = (message: PhonicWebSocketResponseMessage) => void;
+ type OnMessageCallback = (message: PhonicSTSWebSocketResponseMessage) => void;
  type OnCloseCallback = (event: WebSocket.CloseEvent) => void;
  type OnErrorCallback = (event: WebSocket.ErrorEvent) => void;
  
- declare class PhonicWebSocket {
+ declare class PhonicSTSWebSocket {
      private readonly ws;
      private onMessageCallback;
      private onCloseCallback;
@@ -63,20 +52,24 @@ declare class PhonicWebSocket {
      onMessage(callback: OnMessageCallback): void;
      onClose(callback: OnCloseCallback): void;
      onError(callback: OnErrorCallback): void;
-     generate(message: {
-         text: string;
-         speed?: number;
+     config(message: {
+         system_prompt?: string;
+         welcome_message?: string;
+         voice_id?: string;
+         input_format?: "pcm_44100" | "mulaw_8000";
+         output_format?: "pcm_44100" | "mulaw_8000";
+     }): void;
+     audioChunk(message: {
+         audio: string;
      }): void;
-     flush(): void;
-     stop(): void;
      close(): void;
  }
  
- declare class TextToSpeech {
+ declare class SpeechToSpeech {
      private readonly phonic;
      constructor(phonic: Phonic);
-     websocket(params?: PhonicWebSocketParams): DataOrError<{
-         phonicWebSocket: PhonicWebSocket;
+     websocket(): DataOrError<{
+         phonicWebSocket: PhonicSTSWebSocket;
      }>;
  }
  
@@ -105,7 +98,7 @@ declare class Phonic {
      readonly baseUrl: string;
      private readonly headers;
      readonly voices: Voices;
-     readonly tts: TextToSpeech;
+     readonly sts: SpeechToSpeech;
      constructor(apiKey: string, config?: PhonicConfig);
      fetchRequest<T>(path: string, options: FetchOptions): DataOrError<T>;
      get<T>(path: string): Promise<{
@@ -117,4 +110,4 @@ declare class Phonic {
      }>;
  }
  
- export { Phonic, PhonicWebSocket };
+ export { Phonic, PhonicSTSWebSocket };
package/dist/index.js CHANGED
@@ -35,13 +35,13 @@ __export(index_exports, {
  module.exports = __toCommonJS(index_exports);
  
  // package.json
- var version = "0.4.0";
+ var version = "0.6.0";
  
- // src/tts/index.ts
+ // src/sts/index.ts
  var import_ws = __toESM(require("ws"));
  
- // src/tts/websocket.ts
- var PhonicWebSocket = class {
+ // src/sts/websocket.ts
+ var PhonicSTSWebSocket = class {
    constructor(ws) {
      this.ws = ws;
      this.ws.onmessage = (event) => {
@@ -51,7 +51,9 @@ var PhonicWebSocket = class {
        if (typeof event.data !== "string") {
          throw new Error("Received non-string message");
        }
-       const dataObj = JSON.parse(event.data);
+       const dataObj = JSON.parse(
+         event.data
+       );
        this.onMessageCallback(dataObj);
      };
      this.ws.onclose = (event) => {
@@ -67,9 +69,10 @@ var PhonicWebSocket = class {
        this.onErrorCallback(event);
      };
      this.onMessage = this.onMessage.bind(this);
-     this.generate = this.generate.bind(this);
-     this.flush = this.flush.bind(this);
-     this.stop = this.stop.bind(this);
+     this.onClose = this.onClose.bind(this);
+     this.onError = this.onError.bind(this);
+     this.config = this.config.bind(this);
+     this.audioChunk = this.audioChunk.bind(this);
      this.close = this.close.bind(this);
    }
    onMessageCallback = null;
@@ -84,41 +87,42 @@ var PhonicWebSocket = class {
    onError(callback) {
      this.onErrorCallback = callback;
    }
-   generate(message) {
+   config(message) {
      this.ws.send(
        JSON.stringify({
-         type: "generate",
+         type: "config",
          ...message
        })
      );
    }
-   flush() {
-     this.ws.send(JSON.stringify({ type: "flush" }));
-   }
-   stop() {
-     this.ws.send(JSON.stringify({ type: "stop" }));
+   audioChunk(message) {
+     this.ws.send(
+       JSON.stringify({
+         type: "audio_chunk",
+         ...message
+       })
+     );
    }
    close() {
      this.ws.close();
    }
  };
  
- // src/tts/index.ts
- var TextToSpeech = class {
+ // src/sts/index.ts
+ var SpeechToSpeech = class {
    constructor(phonic) {
      this.phonic = phonic;
    }
-   async websocket(params) {
+   async websocket() {
      return new Promise((resolve) => {
        const wsBaseUrl = this.phonic.baseUrl.replace(/^http/, "ws");
-       const queryString = new URLSearchParams(params).toString();
-       const ws = new import_ws.default(`${wsBaseUrl}/v1/tts/ws?${queryString}`, {
+       const ws = new import_ws.default(`${wsBaseUrl}/v1/sts/ws`, {
          headers: {
            Authorization: `Bearer ${this.phonic.apiKey}`
          }
        });
        ws.onopen = () => {
-         const phonicWebSocket = new PhonicWebSocket(ws);
+         const phonicWebSocket = new PhonicSTSWebSocket(ws);
          resolve({ data: { phonicWebSocket }, error: null });
        };
        ws.onerror = (error) => {
@@ -178,7 +182,7 @@ var Phonic = class {
    baseUrl;
    headers;
    voices = new Voices(this);
-   tts = new TextToSpeech(this);
+   sts = new SpeechToSpeech(this);
    async fetchRequest(path, options) {
      try {
        const response = await fetch(`${this.baseUrl}/v1${path}`, {
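Two behaviours in this file changed together: the connection URL no longer carries a query string (0.4.0 sent `model`, `output_format` and `voice_id` that way), and settings now travel in a `config` frame sent after the socket opens. The sketch below restates both as standalone functions so they can be checked in isolation. `toStsWebSocketUrl` and `buildFrame` are hypothetical names, and the example base URL is a placeholder rather than the SDK's default.

```typescript
// Mirrors the URL derivation in SpeechToSpeech.websocket():
// swap the http(s) scheme for ws(s) and append the STS path.
function toStsWebSocketUrl(baseUrl: string): string {
  return `${baseUrl.replace(/^http/, "ws")}/v1/sts/ws`;
}

// Mirrors config() and audioChunk(): a JSON text frame whose "type" field
// comes first, followed by the caller's fields.
function buildFrame(
  type: "config" | "audio_chunk",
  message: Record<string, unknown>,
): string {
  return JSON.stringify({ type, ...message });
}

// Placeholder values, for illustration only.
console.log(toStsWebSocketUrl("https://api.example.test"));
console.log(buildFrame("config", { input_format: "mulaw_8000" }));
console.log(buildFrame("audio_chunk", { audio: "AAAA" }));
```

Because the spread follows `type`, a `type` key inside `message` would override the frame type; the parameter types in the declaration files do not include such a key.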
package/dist/index.mjs CHANGED
@@ -1,11 +1,11 @@
  // package.json
- var version = "0.4.0";
+ var version = "0.6.0";
  
- // src/tts/index.ts
+ // src/sts/index.ts
  import WebSocket from "ws";
  
- // src/tts/websocket.ts
- var PhonicWebSocket = class {
+ // src/sts/websocket.ts
+ var PhonicSTSWebSocket = class {
    constructor(ws) {
      this.ws = ws;
      this.ws.onmessage = (event) => {
@@ -15,7 +15,9 @@ var PhonicWebSocket = class {
        if (typeof event.data !== "string") {
          throw new Error("Received non-string message");
        }
-       const dataObj = JSON.parse(event.data);
+       const dataObj = JSON.parse(
+         event.data
+       );
        this.onMessageCallback(dataObj);
      };
      this.ws.onclose = (event) => {
@@ -31,9 +33,10 @@ var PhonicWebSocket = class {
        this.onErrorCallback(event);
      };
      this.onMessage = this.onMessage.bind(this);
-     this.generate = this.generate.bind(this);
-     this.flush = this.flush.bind(this);
-     this.stop = this.stop.bind(this);
+     this.onClose = this.onClose.bind(this);
+     this.onError = this.onError.bind(this);
+     this.config = this.config.bind(this);
+     this.audioChunk = this.audioChunk.bind(this);
      this.close = this.close.bind(this);
    }
    onMessageCallback = null;
@@ -48,41 +51,42 @@ var PhonicWebSocket = class {
    onError(callback) {
      this.onErrorCallback = callback;
    }
-   generate(message) {
+   config(message) {
      this.ws.send(
        JSON.stringify({
-         type: "generate",
+         type: "config",
          ...message
        })
      );
    }
-   flush() {
-     this.ws.send(JSON.stringify({ type: "flush" }));
-   }
-   stop() {
-     this.ws.send(JSON.stringify({ type: "stop" }));
+   audioChunk(message) {
+     this.ws.send(
+       JSON.stringify({
+         type: "audio_chunk",
+         ...message
+       })
+     );
    }
    close() {
      this.ws.close();
    }
  };
  
- // src/tts/index.ts
- var TextToSpeech = class {
+ // src/sts/index.ts
+ var SpeechToSpeech = class {
    constructor(phonic) {
      this.phonic = phonic;
    }
-   async websocket(params) {
+   async websocket() {
      return new Promise((resolve) => {
        const wsBaseUrl = this.phonic.baseUrl.replace(/^http/, "ws");
-       const queryString = new URLSearchParams(params).toString();
-       const ws = new WebSocket(`${wsBaseUrl}/v1/tts/ws?${queryString}`, {
+       const ws = new WebSocket(`${wsBaseUrl}/v1/sts/ws`, {
          headers: {
            Authorization: `Bearer ${this.phonic.apiKey}`
          }
        });
        ws.onopen = () => {
-         const phonicWebSocket = new PhonicWebSocket(ws);
+         const phonicWebSocket = new PhonicSTSWebSocket(ws);
          resolve({ data: { phonicWebSocket }, error: null });
        };
        ws.onerror = (error) => {
@@ -142,7 +146,7 @@ var Phonic = class {
    baseUrl;
    headers;
    voices = new Voices(this);
-   tts = new TextToSpeech(this);
+   sts = new SpeechToSpeech(this);
    async fetchRequest(path, options) {
      try {
        const response = await fetch(`${this.baseUrl}/v1${path}`, {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "phonic",
-   "version": "0.4.0",
+   "version": "0.6.0",
    "description": "Phonic Node.js SDK",
    "scripts": {
      "build": "tsup",
@@ -33,13 +33,13 @@
      "url": "https://github.com/Phonic-Co/phonic-node/issues"
    },
    "dependencies": {
-     "ws": "8.18.0"
+     "ws": "8.18.1"
    },
    "devDependencies": {
      "@biomejs/biome": "1.9.4",
-     "@changesets/changelog-github": "0.5.0",
-     "@changesets/cli": "2.27.12",
-     "@types/bun": "1.2.2",
+     "@changesets/changelog-github": "0.5.1",
+     "@changesets/cli": "2.28.1",
+     "@types/bun": "1.2.3",
      "tsup": "8.3.6",
      "typescript": "5.7.3",
      "zod": "3.24.2"
@@ -51,8 +51,7 @@
    },
    "keywords": [
      "phonic",
-     "text-to-speech",
-     "tts",
+     "speech-to-speech",
      "javascript",
      "typescript",
      "ai",