@anam-ai/js-sdk 1.9.0 → 2.0.0

Files changed (2)
  1. package/README.md +23 -453
  2. package/package.json +2 -5
package/README.md CHANGED
@@ -6,15 +6,23 @@ This is the official JavaScript SDK for integrating Anam AI realtime digital per
  
  The Anam AI JavaScript SDK is designed to help developers integrate Anam AI's digital personas into their JavaScript applications. The SDK provides a set of APIs and utilities to make it easier to create, manage, and interact with digital personas in a realtime environment.
  
+ ## Documentation
+
+ Full documentation is available at [docs.anam.ai](https://docs.anam.ai).
+
+ ## Examples
+
+ Check out our [example projects](https://github.com/anam-org/anam-examples) for implementation samples.
+
  ## Prerequisites
  
  ### An Anam AI account
  
- Public account creation is currently unavailable. If you are a design partner your account will be created for you by our team.
+ To create a free account head to the [Anam Lab](https://lab.anam.ai) and sign up.
  
  ### An Anam API key
  
- Public API keys are not yet available. If you are a design partner an API key will be shared with you during onboarding.
+ To use the SDK you first need an API key. Follow the instructions [here](https://docs.anam.ai/guides/get-started/api-key) to create one.
  
  # Getting Started
  
@@ -64,11 +72,22 @@ When deploying to production it is important not to publicly expose your API key
  
  ```typescript
  const response = await fetch(`https://api.anam.ai/v1/auth/session-token`, {
-   method: 'GET',
+   method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
+   body: JSON.stringify({
+     personaConfig: {
+       name: 'Test Assistant',
+       personaPreset: 'eva',
+       brainType: 'ANAM_GPT_4O_MINI_V1',
+       personality: 'You are a helpful and friendly AI assistant.',
+       systemPrompt:
+         'You are an AI assistant focused on helping with technical questions.',
+       fillerPhrases: ['Let me think about that...'],
+     },
+   }),
  });
  const data = await response.json();
  const sessionToken = data.sessionToken;
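For illustration, a minimal server-side endpoint performing this exchange might look like the sketch below. Express, the `/api/session-token` route, and the `ANAM_API_KEY` environment variable are assumptions for the sketch, not part of the SDK:

```typescript
import express from 'express';

const app = express();

// Illustrative only: the API key stays in server-side configuration and is
// exchanged for a short-lived session token that the browser may hold.
app.post('/api/session-token', async (_req, res) => {
  const response = await fetch('https://api.anam.ai/v1/auth/session-token', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.ANAM_API_KEY}`,
    },
    body: JSON.stringify({
      personaConfig: {
        name: 'Test Assistant',
        personaPreset: 'eva',
        brainType: 'ANAM_GPT_4O_MINI_V1',
        personality: 'You are a helpful and friendly AI assistant.',
        systemPrompt:
          'You are an AI assistant focused on helping with technical questions.',
      },
    }),
  });
  const data = await response.json();
  res.json({ sessionToken: data.sessionToken });
});

app.listen(3000);
```

The browser then calls this route and passes the returned token to `createClient`.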
@@ -79,458 +98,9 @@ Once you have a session token you can use the `createClient` method of the Anam
  ```typescript
  import { createClient } from '@anam-ai/js-sdk';
  
- const anamClient = createClient('your-session-token', {
-   personaId: 'chosen-persona-id',
- });
+ const anamClient = createClient('your-session-token');
  ```
  
  Regardless of whether you initialise the client using an API key or session token, the client exposes the same set of available methods for streaming.
  
  [See here](#starting-a-session-in-production-environments) for an example sequence diagram of starting a session in production environments.
-
- ## Using the Talk command to control persona output
-
- Sometimes during a persona session you may wish to force a response from the persona, for example when the user interacts with an element on the page, or when you have disabled the Anam LLM in order to use your own. To do this you can use the `talk` method of the Anam client.
-
- ```typescript
- anamClient.talk('Content to say');
- ```
-
- The `talk` method will send a `talk` command to the persona, which will respond by speaking the provided content.
-
- ## Streaming Talk input
-
- You may want to stream a particular message to the persona in multiple chunks, for example when you are streaming output from a custom LLM and want to reduce latency by sending the message in chunks. To do this you can use the `createTalkMessageStream` method to create a `TalkMessageStream` instance and the `streamMessageChunk` method to send the message in chunks.
-
- This approach can be useful to reduce latency when streaming output from a custom LLM; however, it does introduce some complexity due to the need to handle interrupts and end of speech.
-
- Example usage:
-
- ```typescript
- const talkMessageStream = anamClient.createTalkMessageStream();
- const chunks = ['He', 'l', 'lo', ', how are you?'];
-
- for (const chunk of chunks) {
-   if (talkMessageStream.isActive()) {
-     talkMessageStream.streamMessageChunk(
-       chunk,
-       chunk === chunks[chunks.length - 1],
-     );
-   }
- }
- ```
-
- If a sentence is not interrupted, you must signal the end of the speech yourself by calling `streamMessageChunk` with the `endOfSpeech` parameter set to `true` or by calling `talkMessageStream.endMessage()`. If a `TalkMessageStream` is already closed, either due to an interrupt or end of speech, you do not need to signal end of speech.
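For example, a minimal variant of the snippet above that keeps the `endOfSpeech` flag false for every chunk and closes the turn explicitly:

```typescript
const talkMessageStream = anamClient.createTalkMessageStream();

for (const chunk of ['He', 'l', 'lo', ', how are you?']) {
  if (talkMessageStream.isActive()) {
    // No per-chunk end-of-speech flag...
    talkMessageStream.streamMessageChunk(chunk, false);
  }
}

// ...the end of the turn is signalled explicitly instead.
if (talkMessageStream.isActive()) {
  talkMessageStream.endMessage();
}
```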
-
- **Important note**: One `TalkMessageStream` represents one "turn" in the conversation. Once that turn is over, the object can no longer be used and you must create a new `TalkMessageStream` using `anamClient.createTalkMessageStream()`.
-
- There are two ways a "turn" can end and a `TalkMessageStream` be closed:
-
- 1. End of speech: `streamMessageChunk` is called with the `endOfSpeech` parameter set to `true`, or `endMessage` is called.
- 2. Interrupted by user speech: the user speaks during the stream, causing an `AnamEvent.TALK_STREAM_INTERRUPTED` event to fire and the `TalkMessageStream` object to be closed automatically.
-
- In both of these cases the `TalkMessageStream` object is closed and no longer usable. Calling `streamMessageChunk` or `endMessage` on a closed `TalkMessageStream` raises an error. To handle this you can check the `isActive()` method of the `TalkMessageStream` object or listen for the `AnamEvent.TALK_STREAM_INTERRUPTED` event, as in the sketch below.
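A minimal sketch of recovering from an interrupt by starting a fresh turn:

```typescript
import { AnamEvent } from '@anam-ai/js-sdk/dist/module/types';

let talkMessageStream = anamClient.createTalkMessageStream();

anamClient.addListener(AnamEvent.TALK_STREAM_INTERRUPTED, () => {
  // The SDK has already closed the interrupted stream; all that is left
  // to do is create a new TalkMessageStream for the next turn.
  talkMessageStream = anamClient.createTalkMessageStream();
});
```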
-
- #### Correlation IDs
-
- The `streamMessageChunk` method accepts an optional `correlationId` parameter. **This should be unique for every `TalkMessageStream` instance.** When a talk stream is interrupted by user speech, the interrupted stream's `correlationId` will be sent in the `AnamEvent.TALK_STREAM_INTERRUPTED` event.
-
- Currently only one `TalkMessageStream` can be active at a time, so this is not strictly necessary.
-
- ## Controlling the input audio
-
- ### Audio input state
-
- By default the Anam client starts capturing input audio from the user's microphone when a session starts and stops capturing the audio when the session ends. For certain use cases, however, you may wish to control the input audio state programmatically.
-
- To get the current input audio state:
-
- ```typescript
- const audioState: InputAudioState = anamClient.getInputAudioState();
- // { isMuted: false } or { isMuted: true }
- ```
-
- To mute the input audio:
-
- ```typescript
- const audioState: InputAudioState = anamClient.muteInputAudio();
- // { isMuted: true }
- ```
-
- **Note**: If you mute the input audio before starting a stream the session will start with microphone input disabled.
-
- To unmute the input audio:
-
- ```typescript
- const audioState: InputAudioState = anamClient.unmuteInputAudio();
- // { isMuted: false }
- ```
-
- ### Using custom input streams
-
- If you wish to control the microphone input audio capture yourself, you can instead pass your own `MediaStream` object when starting a stream.
-
- ```typescript
- anamClient.streamToVideoAndAudioElements(
-   'video-element-id',
-   'audio-element-id',
-   userProvidedMediaStream,
- );
- ```
-
- The `userProvidedMediaStream` object must be an instance of `MediaStream`, and the user input audio should be the first audio track returned from the `MediaStream.getAudioTracks()` method.
-
- **Note**: This is the default behaviour if you are using `navigator.mediaDevices.getUserMedia()`.
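For example, a microphone stream captured with `getUserMedia` can be passed straight through. A minimal sketch; the element IDs and audio constraints are placeholders:

```typescript
// Capture microphone input ourselves, then hand the stream to the SDK.
const userProvidedMediaStream = await navigator.mediaDevices.getUserMedia({
  audio: { echoCancellation: true, noiseSuppression: true },
});

anamClient.streamToVideoAndAudioElements(
  'video-element-id',
  'audio-element-id',
  userProvidedMediaStream,
);
```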
-
- ## Additional Configuration
-
- ### Disable brains
-
- You can turn off the Anam LLM by passing the `disableBrains` config option to the client during initialisation. If this option is set to `true` then the persona will wait to receive `talk` commands and will not respond to voice input from the user.
-
- ```typescript
- import { createClient } from '@anam-ai/js-sdk';
-
- const anamClient = createClient('your-session-token', {
-   personaId: 'chosen-persona-id',
-   disableBrains: true,
- });
- ```
-
- ### Disable filler phrases
-
- To turn off the use of filler phrases by the persona you can set the `disableFillerPhrases` option to `true`.
-
- ```typescript
- import { createClient } from '@anam-ai/js-sdk';
-
- const anamClient = createClient('your-session-token', {
-   personaId: 'chosen-persona-id',
-   disableFillerPhrases: true,
- });
- ```
-
- > **Note**: the option `disableFillerPhrases` has no effect if `disableBrains` is set to `true`.
-
- ### Updating client config after initialisation
-
- If you have already initialised the Anam client but wish to update the persona configuration, you can use the `setPersonaConfig` method.
-
- ```typescript
- import { createClient } from '@anam-ai/js-sdk';
-
- const anamClient = createClient('your-session-token', {
-   personaId: 'chosen-persona-id',
- });
-
- anamClient.setPersonaConfig({
-   personaId: 'chosen-persona-id',
-   disableFillerPhrases: true,
- });
- ```
-
- To check the currently set config use the `getPersonaConfig` method.
-
- ```typescript
- const config = anamClient.getPersonaConfig();
- ```
-
- ## Session Options
-
- Session options are a set of optional parameters that can be adjusted to tweak the behaviour of a session, such as controlling the sensitivity of the Anam voice detection. Session options can be passed to the Anam client during initialisation.
-
- ```typescript
- const anamClient = createClient(
-   'your-session-token',
-   {
-     personaId: 'chosen-persona-id',
-   },
-   {
-     voiceDetection: { endOfSpeechSensitivity: 0.7 },
-   },
- );
- ```
-
- ### Available Session Options
-
- | Option | Sub-option | Type | Description | Status |
- | --- | --- | --- | --- | --- |
- | `voiceDetection` | `endOfSpeechSensitivity` | number | Adjusts the sensitivity of the end-of-speech detection. | Coming soon |
-
- ## Listening to Events
-
- After initialising the Anam client you can register any event listeners using the `addListener` method.
-
- ```typescript
- import { AnamEvent } from '@anam-ai/js-sdk/dist/module/types';
-
- anamClient.addListener(AnamEvent.CONNECTION_ESTABLISHED, () => {
-   console.log('Connection Established');
- });
-
- anamClient.addListener(AnamEvent.MESSAGE_HISTORY_UPDATED, (messages) => {
-   console.log('Updated Messages: ', messages);
- });
- ```
-
- ### Event Types
-
- | Event Name | Description |
- | --- | --- |
- | `CONNECTION_ESTABLISHED` | Called when the direct connection between the browser and the Anam Engine has been established. |
- | `CONNECTION_CLOSED` | Called when the direct connection between the browser and the Anam Engine has been closed. |
- | `VIDEO_PLAY_STARTED` | When streaming directly to a video element this event is fired when the first frames start playing. Useful for removing any loading indicators during connection. |
- | `MESSAGE_HISTORY_UPDATED` | Called with the message history transcription of the current session each time the user or the persona finishes speaking. |
- | `MESSAGE_STREAM_EVENT_RECEIVED` | For persona speech, this stream is updated with each transcribed speech chunk as the persona is speaking. For user speech, this stream is updated with a complete transcription of the user's sentence once they finish speaking. |
- | `INPUT_AUDIO_STREAM_STARTED` | Called with the user's input audio stream when microphone input has been initialised. |
- | `TALK_STREAM_INTERRUPTED` | Called when the user interrupts the current `TalkMessageStream` by speaking. The interrupted stream's `correlationId` will be sent in the event. The `TalkMessageStream` object is automatically closed by the SDK, but this event is provided to allow for any additional actions. |
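As a concrete example, the `VIDEO_PLAY_STARTED` event can drive a loading indicator. A minimal sketch; the `loading-indicator` element ID is a placeholder:

```typescript
import { AnamEvent } from '@anam-ai/js-sdk/dist/module/types';

// Hide a spinner once the persona's first video frames are playing.
anamClient.addListener(AnamEvent.VIDEO_PLAY_STARTED, () => {
  const spinner = document.getElementById('loading-indicator');
  if (spinner) spinner.style.display = 'none';
});
```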
-
- # Personas
-
- Available personas are managed via the [Anam API](https://api.anam.ai/api).
-
- > **Note**: The examples below are shown using bash curl syntax. For the best experience we recommend trying queries directly from our [interactive Swagger documentation](https://api.anam.ai/api). To use the interactive Swagger documentation you will first need to authenticate by clicking the Authorize button in the top right and pasting your API key into the displayed box.
-
- ### Listing Available Personas
-
- To list all personas available for your account use the `/v1/personas` endpoint.
-
- ```bash
- # Example Request
- curl -X GET "https://api.anam.ai/v1/personas" -H "Authorization: Bearer your-api-key"
-
- # Example Response
- {
-   "data": [
-     {
-       "id": "773a8ca8-efd8-4449-9305-8b8bc1591475",
-       "name": "Leo",
-       "description": "Leo is the virtual receptionist of the Sunset Hotel.",
-       "personaPreset": "leo_desk",
-       "isDefaultPersona": true,
-       "createdAt": "2021-01-01T00:00:00Z",
-       "updatedAt": "2021-01-02T00:00:00Z"
-     }
-   ],
-   "meta": {
-     "total": 1,
-     "lastPage": 0,
-     "currentPage": 1,
-     "perPage": 10,
-     "prev": 0,
-     "next": 0
-   }
- }
- ```
-
- Every Anam account has access to a pre-defined list of 'default personas', identifiable by the `isDefaultPersona` attribute. Currently there is one default persona, 'Leo', the virtual receptionist of the Sunset Hotel.
-
- > **Quick start**: Use the persona id `773a8ca8-efd8-4449-9305-8b8bc1591475` when initialising the SDK if you wish to try out Leo.
-
- To show more detail about a specific persona you can use the `/v1/personas/{id}` endpoint.
-
- ```bash
- # Example Request
- curl -X GET "https://api.anam.ai/v1/personas/773a8ca8-efd8-4449-9305-8b8bc1591475" -H "Authorization: Bearer your-api-key"
-
- # Example Response
- {
-   "id": "773a8ca8-efd8-4449-9305-8b8bc1591475",
-   "name": "Leo",
-   "description": "Leo is the virtual receptionist of the Sunset Hotel.",
-   "personaPreset": "leo_desk",
-   "brain": {
-     "id": "3c4525f0-698d-4e8d-b619-8c97a23780512",
-     "personality": "You are role-playing as a text chatbot hotel receptionist at The Sunset Hotel. Your name is Leo.",
-     "systemPrompt": "You are role-playing as a text chatbot hotel receptionist at The Sunset Hotel...",
-     "fillerPhrases": ["One moment please.", "Let me check that for you."],
-     "createdAt": "2021-01-01T00:00:00Z",
-     "updatedAt": "2021-01-02T00:00:00Z"
-   }
- }
- ```
-
- ### Creating Custom Personas
-
- You can create your own custom personas using the `/v1/personas` endpoint via a `POST` request, which defines the following properties:
-
- | Persona parameter | Description |
- | --- | --- |
- | `name` | The name for the persona. This is used as a human-readable identifier for the persona. |
- | `description` | A brief description of the persona. This is optional and helps provide context about the persona's role. Not used by calls to the LLM. |
- | `personaPreset` | Defines the face and voice of the persona from a list of available presets. |
- | `brain` | Configuration for the persona's LLM 'brain', including the system prompt, personality, and filler phrases. |
-
- | Brain parameter | Description |
- | --- | --- |
- | `systemPrompt` | The prompt used for initialising LLM interactions, setting the context for the persona's behaviour. |
- | `personality` | A short description of the persona's character traits, which influences the choice of filler phrases. |
- | `fillerPhrases` | Phrases used to enhance interaction response times, providing immediate feedback before a full reply. |
-
- Example usage:
-
- ```bash
- # Example Request
- curl -X POST "https://api.anam.ai/v1/personas" -H "Content-Type: application/json" -H "Authorization: Bearer your-api-key" -d '{
-   "name": "Leo",
-   "description": "Leo is the virtual receptionist of the Sunset Hotel.",
-   "personaPreset": "leo_desk",
-   "brain": {
-     "systemPrompt": "You are Leo, a virtual receptionist...",
-     "personality": "You are role-playing as a text chatbot hotel receptionist at The Sunset Hotel. Your name is Leo.",
-     "fillerPhrases": ["One moment please.", "Let me check that for you."]
-   }
- }'
-
- # Example Response
- {
-   "id": "new_persona_id",
-   "name": "Leo",
-   "description": "Leo is the virtual receptionist of the Sunset Hotel.",
-   "personaPreset": "leo_desk",
-   "brain": {
-     "id": "new_brain_id",
-     "personality": "helpful and friendly",
-     "systemPrompt": "You are Leo, a virtual receptionist...",
-     "fillerPhrases": ["One moment please...", "Let me check that for you..."],
-     "createdAt": "2021-01-01T00:00:00Z",
-     "updatedAt": "2021-01-02T00:00:00Z"
-   }
- }
- ```
-
- # Frequently Asked Questions
-
- ### What personas are currently available?
-
- There are 6 default personas available in various backgrounds:
-
- - Leo (leo_desk, leo_windowdesk, leo_windowsofacorner)
- - Alister (alister_desk, alister_windowdesk, alister_windowsofa)
- - Astrid (astrid_desk, astrid_windowdesk, astrid_windowsofacorner)
- - Cara (cara_desk, cara_windowdesk, cara_windowsofa)
- - Evelyn (evelyn_desk)
- - Pablo (pablo_desk, pablo_windowdesk, pablo_windowsofa)
-
- You can list available personas using the `/v1/personas` endpoint or view them in the Anam Lab.
-
- ### How is usage time calculated and billed?
-
- Usage time starts when you call the `stream()` method and ends when that specific stream closes. The time is tracked in seconds and billed per minute. The starter plan includes:
-
- - 60 free minutes per month
- - $0.18/minute for overages
- - Up to 3 concurrent conversations
- - Access to 6 personas and multiple backgrounds
- - Usage is billed retrospectively at the end of each month
-
- ### What causes latency and how can I optimize it?
-
- Latency can come from several sources:
-
- - Connection setup time (usually 1-2 seconds, but can be up to 5 seconds)
- - LLM processing time
- - TTS generation
- - Network conditions
-
- To optimize latency:
-
- - Use shorter initial sentences in responses
- - Take advantage of phrase caching (repeated phrases will be faster)
- - Consider using our turnkey solution instead of a custom LLM for the lowest latency
- - Use the streaming API for custom LLM implementations
-
- ### How do I handle multilingual conversations?
-
- Current language support:
-
- - Speech recognition currently struggles outside of English and will often translate non-English speech to English; language selection is coming soon to fix this
- - TTS supports multiple languages but voice quality may vary
- - System prompts can be set to specific languages
- - Language handling is primarily controlled via the system prompt
- - Auto-language detection is planned for future releases
-
- ### Can I interrupt the persona while it's speaking?
-
- Yes, you can interrupt the persona in two ways:
-
- 1. Send a new `talk()` command, which will override the current speech
- 2. When using streaming, user speech will automatically interrupt the current stream
-
- Note: Currently there isn't a way to completely silence the persona mid-speech, but sending a short punctuation mark (like "." or "!") through the talk command can achieve a similar effect, as shown below.
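For instance, using the `talk` method exactly as documented above:

```typescript
// Effectively silences the persona by overriding the current speech
// with a single punctuation mark.
anamClient.talk('.');
```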
-
- ### How do I integrate my own LLM?
-
- To use your own LLM:
-
- 1. Initialize the client with `disableBrains: true`
- 2. Handle speech-to-text events via `MESSAGE_HISTORY_UPDATED`
- 3. Process the text through your LLM
- 4. Send responses using the `talk()` method (or, for better latency, try `createTalkMessageStream()`)
-
- ```javascript
- const anamClient = createClient(sessionToken, {
-   personaId: 'your-persona-id',
-   disableBrains: true,
- });
-
- anamClient.addListener(AnamEvent.MESSAGE_HISTORY_UPDATED, (messages) => {
-   // Process with your LLM
-   // Send response with anamClient.talk()
- });
- ```
-
- ### What are the browser compatibility requirements?
-
- The SDK requires:
-
- - A modern browser with WebRTC support
- - Microphone permissions for audio input
- - Autoplay capabilities for video/audio
- - WebAssembly support
-
- Safari/iOS notes:
-
- - Requires explicit user interaction for audio playback
- - May have additional security policy requirements
- - The WebKit engine has specific autoplay restrictions
-
- ### How do I monitor current usage?
-
- Usage tracking options:
-
- - Available in the Anam Lab
- - API endpoint for usage stats coming soon
- - Session logs available on request
-
- ### What's the difference between development and production setup?
-
- Development:
-
- ```javascript
- const client = unsafe_createClientWithApiKey('your-api-key', {
-   personaId: 'chosen-persona-id',
- });
- ```
-
- Production:
-
- 1. Exchange your API key for a session token server-side
- 2. Pass the session token to the client
-
- ```javascript
- const client = createClient('session-token', {
-   personaId: 'chosen-persona-id',
- });
- ```
-
- ### How do I handle connection issues?
-
- Common issues and solutions:
-
- - For "403 Forbidden" errors, verify your API key/session token
- - If video doesn't appear, check that element IDs match exactly
- - Connection timeouts may require retry logic (see the sketch after this list)
- - Session tokens expire and need to be refreshed
- - Monitor `CONNECTION_CLOSED` events for network issues
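To illustrate the retry point, a hedged sketch of a simple backoff wrapper. The attempt count and delays are arbitrary, and `startSession` stands in for your own async wrapper around whichever stream call you use:

```typescript
// Illustrative only: retry a session start with linear backoff.
async function startWithRetry(
  startSession: () => Promise<void>,
  maxAttempts = 3,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await startSession();
      return;
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      // Wait 1s, then 2s, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, attempt * 1000));
    }
  }
}
```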
-
- ### What features are coming soon?
-
- Near-term roadmap includes:
-
- - One-shot model for custom persona creation
- - Improved streaming support for custom LLMs
- - Usage dashboard and analytics
- - Enhanced multilingual support
- - Function calling capabilities
- - Additional persona options
-
- # Sequence Diagrams
-
- ## Starting a session in production environments
-
- ![Example sequence diagram](resources/media/start-session.png)
-
- ## Interaction loop
-
- ![Example interaction loop](resources/media/interaction-loop.png)
-
- ## Interaction loop with custom LLM usage
-
- ![Example interaction loop for custom LLM diagram](resources/media/custom-llm-interaction.png)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@anam-ai/js-sdk",
-   "version": "1.9.0",
+   "version": "2.0.0",
    "description": "Client side JavaScript SDK for Anam AI",
    "author": "Anam AI",
    "main": "dist/main/index.js",
@@ -24,14 +24,13 @@
    "contributors": [
      "Anam AI"
    ],
-   "license": "ISC",
+   "license": "MIT",
    "bugs": {
      "url": "https://github.com/anam-org/javascript-sdk/issues"
    },
    "homepage": "https://github.com/anam-org/javascript-sdk#readme",
    "scripts": {
      "clean": "rimraf dist",
-     "docs": "typedoc",
      "build": "run-s clean format build:*",
      "build:main": "tsc -p tsconfig.json",
      "build:module": "tsc -p tsconfig.module.json",
@@ -59,8 +58,6 @@
      "pretty-quick": "^4.0.0",
      "rimraf": "^5.0.7",
      "ts-loader": "^9.5.1",
-     "typedoc": "^0.25.13",
-     "typedoc-material-theme": "^1.0.2",
      "typescript": "^5.4.5",
      "typescript-eslint": "^7.9.0",
      "webpack": "^5.91.0",