@rajnandan1/atticus 1.0.0

package/README.md ADDED
@@ -0,0 +1,315 @@
+ # Atticus
+
+ A framework-agnostic voice agent library for voice-controlled UI interactions, powered by OpenAI's Realtime API.
+
+ ## Features
+
+ - 🎙️ Real-time voice conversations with AI
+ - 🖱️ UI-aware interactions - let users control your app with voice
+ - ⚡ Auto-executes UI actions (click, type, scroll, etc.)
+ - 🌍 Multi-language support (40+ languages)
+ - 📦 Framework-agnostic - works with React, Vue, Svelte, vanilla JS, etc.
+ - 🔧 Simple event-based API
+ - 🎯 DOM compression for efficient context via d2snap
+
+ ## Installation
+
+ ```bash
+ npm install @rajnandan1/atticus
+ ```
+
+ ## Quick Start
+
+ ```typescript
+ import { Atticus } from "@rajnandan1/atticus";
+
+ // Get a client secret from your backend (which calls OpenAI's API);
+ // see "Getting a Client Secret" for a sample fetchClientSecret().
+ const clientSecret = await fetchClientSecret();
+
+ const agent = new Atticus({
+   clientSecret,
+   voice: "shimmer", // Optional: alloy, ash, ballad, coral, echo, sage, shimmer, verse
+   language: "en", // Optional: supports 40+ languages
+   agent: {
+     name: "Assistant",
+     instructions: "You are a helpful assistant.",
+   },
+ });
+
+ // Listen to events
+ agent.on("connected", () => console.log("Connected!"));
+ agent.on("message", (msg) => console.log("Message:", msg));
+ agent.on("error", (err) => console.error("Error:", err));
+
+ // Connect and start talking
+ await agent.connect();
+
+ // Disconnect when done
+ agent.disconnect();
+ ```
+
+ ## UI-Aware Mode
+
+ Enable UI awareness to let users control your interface with voice. Actions are **automatically executed** by default:
+
+ ```typescript
+ const agent = new Atticus({
+   clientSecret,
+   agent: {
+     name: "UI Assistant",
+     instructions: "Help users fill out the form on this page.",
+   },
+   ui: {
+     enabled: true,
+     rootElement: document.getElementById("app")!,
+     autoUpdate: true, // Auto-refresh DOM context
+   },
+ });
+
+ // Actions are auto-executed! Just listen for logging/feedback
+ agent.on("action", (action) => {
+   console.log("Action executed:", action.outputText);
+   console.log("Code:", action.outputCode);
+ });
+
+ await agent.connect();
+
+ // Now say: "Fill the name field with John Doe"
+ // The library will automatically execute the action!
+ ```
+
+ ### Manual Action Execution
+
+ To handle actions yourself, set `doNotExecuteActions: true` and call `executeAction()` from your `action` handler:
+
+ ```typescript
+ const agent = new Atticus({
+   clientSecret,
+   agent: { name: "Assistant", instructions: "..." },
+   doNotExecuteActions: true, // Disable auto-execution
+   ui: { enabled: true, rootElement: document.body },
+ });
+
+ agent.on("action", async (action) => {
+   // Validate or modify the action before executing it
+   if (action.actionType === "click") {
+     const result = await agent.executeAction(action);
+     console.log("Result:", result);
+   }
+ });
+ ```
+
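The validation step in the handler above could be factored into a small policy function. The following is a minimal sketch, not part of Atticus: `UIActionLike`, `SAFE_ACTIONS`, and `decideAction` are hypothetical names, and the choice of which action types count as safe is an assumption made for illustration.

```typescript
// Hypothetical shape: only the fields this README shows on an action.
interface UIActionLike {
  actionType: string;
  outputText?: string;
  outputCode?: string;
}

type Decision = "execute" | "confirm" | "reject";

// Action types treated as safe to run without asking the user.
// This list is an assumption for the sketch; pick your own.
const SAFE_ACTIONS = new Set([
  "click", "type", "scroll", "focus", "select", "hover", "read",
]);

function decideAction(action: UIActionLike): Decision {
  if (SAFE_ACTIONS.has(action.actionType)) {
    return "execute";
  }
  // Navigation leaves the current page, so ask the user first.
  if (action.actionType === "navigate") {
    return "confirm";
  }
  // Anything not listed in the README's action table is refused.
  return "reject";
}
```

Inside the `action` handler, one option is to call `agent.executeAction(action)` only when `decideAction(action)` returns `"execute"`, and to show a confirmation prompt for `"confirm"`.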
+ ## Configuration
+
+ ```typescript
+ interface AtticusConfig {
+   // Required: OpenAI client secret (ephemeral key)
+   clientSecret: string;
+
+   // Required: Agent configuration
+   agent: {
+     name: string;
+     instructions: string;
+   };
+
+   // Optional: Voice for the agent (default: 'alloy')
+   // Options: 'alloy', 'ash', 'ballad', 'coral', 'echo', 'sage', 'shimmer', 'verse'
+   voice?: AtticusVoice;
+
+   // Optional: Language code (default: 'en')
+   // Supports: en, es, fr, de, it, pt, ru, ja, ko, zh, hi, ar, and 30+ more
+   language?: string;
+
+   // Optional: OpenAI model (default: 'gpt-4o-realtime-preview')
+   model?: string;
+
+   // Optional: Auto-greet on connect (default: true)
+   autoGreet?: boolean;
+
+   // Optional: Greeting message (default: language-specific greeting)
+   greetingMessage?: string;
+
+   // Optional: Debug logging (default: false)
+   debug?: boolean;
+
+   // Optional: Disable auto-execution of UI actions (default: false)
+   doNotExecuteActions?: boolean;
+
+   // Optional: UI awareness configuration
+   ui?: {
+     enabled: boolean;
+     rootElement: Element;
+     autoUpdate?: boolean;
+     autoUpdateInterval?: number; // ms, default: 5000
+     d2SnapOptions?: {
+       maxTokens?: number; // default: 4096
+       assignUniqueIDs?: boolean; // default: true
+     };
+   };
+ }
+ ```
+
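To make the documented defaults easier to see in one place, here is a rough sketch that fills them in for a partial configuration. It only illustrates the defaults listed in the interface comments above and is not the library's own code; `OptionsSketch`, `ResolvedOptions`, and `resolveOptions` are hypothetical names.

```typescript
// Hypothetical subset of the optional fields documented above.
interface OptionsSketch {
  voice?: string;
  language?: string;
  model?: string;
  autoGreet?: boolean;
  debug?: boolean;
  doNotExecuteActions?: boolean;
  ui?: {
    autoUpdateInterval?: number;
    d2SnapOptions?: { maxTokens?: number; assignUniqueIDs?: boolean };
  };
}

interface ResolvedOptions {
  voice: string;
  language: string;
  model: string;
  autoGreet: boolean;
  debug: boolean;
  doNotExecuteActions: boolean;
  autoUpdateInterval: number;
  maxTokens: number;
  assignUniqueIDs: boolean;
}

// `??` (not `||`) keeps explicit `false` and `0` values set by the caller.
function resolveOptions(config: OptionsSketch = {}): ResolvedOptions {
  return {
    voice: config.voice ?? "alloy",
    language: config.language ?? "en",
    model: config.model ?? "gpt-4o-realtime-preview",
    autoGreet: config.autoGreet ?? true,
    debug: config.debug ?? false,
    doNotExecuteActions: config.doNotExecuteActions ?? false,
    autoUpdateInterval: config.ui?.autoUpdateInterval ?? 5000,
    maxTokens: config.ui?.d2SnapOptions?.maxTokens ?? 4096,
    assignUniqueIDs: config.ui?.d2SnapOptions?.assignUniqueIDs ?? true,
  };
}
```

For example, `resolveOptions({ autoGreet: false })` keeps `autoGreet` off while every other field takes its documented default.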
+ ## Voice Options
+
+ | Voice     | Description                 |
+ | --------- | --------------------------- |
+ | `alloy`   | Neutral, balanced (default) |
+ | `ash`     | Soft, gentle                |
+ | `ballad`  | Warm, expressive            |
+ | `coral`   | Clear, friendly             |
+ | `echo`    | Smooth, conversational      |
+ | `sage`    | Calm, wise                  |
+ | `shimmer` | Bright, energetic           |
+ | `verse`   | Articulate, professional    |
+
+ ## Supported Languages
+
+ Atticus supports 40+ languages with native greetings. Set the `language` option:
+
+ ```typescript
+ const agent = new Atticus({
+   clientSecret,
+   language: "hi", // Hindi - will greet with "नमस्ते!"
+   agent: { name: "Assistant", instructions: "..." },
+ });
+ ```
+
+ | Code | Language   | Code | Language | Code | Language  |
+ | ---- | ---------- | ---- | -------- | ---- | --------- |
+ | `en` | English    | `ja` | Japanese | `pl` | Polish    |
+ | `hi` | Hindi      | `ko` | Korean   | `nl` | Dutch     |
+ | `es` | Spanish    | `zh` | Chinese  | `sv` | Swedish   |
+ | `fr` | French     | `ar` | Arabic   | `da` | Danish    |
+ | `de` | German     | `bn` | Bengali  | `no` | Norwegian |
+ | `it` | Italian    | `ta` | Tamil    | `fi` | Finnish   |
+ | `pt` | Portuguese | `te` | Telugu   | `tr` | Turkish   |
+ | `ru` | Russian    | `th` | Thai     | `uk` | Ukrainian |
+
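If the language should follow the user's browser locale, a tag such as `pt-BR` first has to be reduced to one of the two-letter codes in the table above. A minimal sketch of that mapping follows; `SUPPORTED_LANGUAGES` and `pickLanguage` are hypothetical helpers, and the set contains only the codes listed in the table, not the full list the library may accept.

```typescript
// Codes copied from the table above (the library may accept more).
const SUPPORTED_LANGUAGES = new Set([
  "en", "hi", "es", "fr", "de", "it", "pt", "ru",
  "ja", "ko", "zh", "ar", "bn", "ta", "te", "th",
  "pl", "nl", "sv", "da", "no", "fi", "tr", "uk",
]);

// Reduce a locale tag ("pt-BR", "zh_CN", "HI") to its primary subtag and
// fall back when that subtag is not in the set.
function pickLanguage(localeTag: string, fallback = "en"): string {
  const [primary = ""] = localeTag.trim().toLowerCase().split(/[-_]/);
  return SUPPORTED_LANGUAGES.has(primary) ? primary : fallback;
}
```

In a browser this could be used as `language: pickLanguage(navigator.language)` when constructing the agent.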
+ ## Events
+
+ | Event                     | Payload             | Description                                                   |
+ | ------------------------- | ------------------- | ------------------------------------------------------------- |
+ | `connected`               | -                   | Successfully connected                                        |
+ | `disconnected`            | -                   | Disconnected                                                  |
+ | `error`                   | `string`            | Error occurred                                                |
+ | `statusChange`            | `AtticusStatus`     | Connection status changed                                     |
+ | `conversationStateChange` | `ConversationState` | Conversation state changed                                    |
+ | `message`                 | `Message`           | New message received                                          |
+ | `historyChange`           | `Message[]`         | Conversation history updated                                  |
+ | `stateChange`             | `AtticusState`      | Any state changed                                             |
+ | `agentStart`              | -                   | Agent started speaking                                        |
+ | `agentEnd`                | -                   | Agent stopped speaking                                        |
+ | `userAudio`               | -                   | User audio detected                                           |
+ | `action`                  | `UIAction`          | UI action executed (or requested if doNotExecuteActions=true) |
+
+ ## UI Action Types
+
+ When UI mode is enabled, the agent can perform these actions:
+
+ | Action     | Description                | Example Code                                          |
+ | ---------- | -------------------------- | ----------------------------------------------------- |
+ | `click`    | Click elements             | `document.getElementById('btn').click()`              |
+ | `type`     | Enter text                 | `document.getElementById('input').value = 'Hello'`    |
+ | `scroll`   | Scroll page/elements       | `window.scrollTo(0, 500)`                             |
+ | `focus`    | Focus form elements        | `document.getElementById('field').focus()`            |
+ | `select`   | Select dropdown options    | `document.getElementById('select').value = 'option1'` |
+ | `hover`    | Hover over elements        | -                                                     |
+ | `navigate` | Navigate pages             | `window.location.href = '/page'`                      |
+ | `read`     | Read information (no code) | -                                                     |
+
+ ## API
+
+ ### Methods
+
+ - `connect()` - Connect to the voice agent
+ - `disconnect()` - Disconnect from the voice agent
+ - `toggle()` - Toggle connection state
+ - `interrupt()` - Interrupt the AI while it is speaking
+ - `sendMessage(text)` - Send a text message
+ - `updateDOM(element | html)` - Manually update DOM context
+ - `refreshDOM()` - Refresh DOM from root element
+ - `startAutoUpdate()` - Start auto-updating DOM
+ - `stopAutoUpdate()` - Stop auto-updating DOM
+ - `executeAction(action)` - Manually execute a UI action
+ - `getState()` - Get complete state object
+ - `destroy()` - Clean up resources
+
+ ### Properties
+
+ - `status` - Connection status (`idle` | `connecting` | `connected` | `error`)
+ - `conversationState` - Conversation state (`idle` | `ai_speaking` | `user_turn` | `user_speaking`)
+ - `error` - Error message (if any)
+ - `history` - Conversation history
+ - `isConnected` - Whether the agent is connected
+ - `isAiSpeaking` - Whether the AI is speaking
+ - `isUserSpeaking` - Whether the user is speaking
+ - `language` - Configured language
+ - `currentDOM` - Current DOM context
+ - `isUIEnabled` - Whether UI mode is enabled
+
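The `status` and `conversationState` values listed above can be combined into a single label for a status indicator. The sketch below assumes only the string values shown in the property list; `describeState`, the two type aliases, and the label wording are hypothetical and not part of the library.

```typescript
// Local aliases for the values listed under Properties.
type Status = "idle" | "connecting" | "connected" | "error";
type ConversationState = "idle" | "ai_speaking" | "user_turn" | "user_speaking";

// Labels shown while connected, keyed by conversation state.
const CONVERSATION_LABELS: Record<ConversationState, string> = {
  idle: "Connected",
  ai_speaking: "Assistant is speaking",
  user_turn: "Your turn",
  user_speaking: "Listening",
};

// The connection status takes priority; the conversation state only
// matters once the agent is connected.
function describeState(status: Status, conversation: ConversationState): string {
  if (status === "idle") return "Not connected";
  if (status === "connecting") return "Connecting";
  if (status === "error") return "Connection error";
  return CONVERSATION_LABELS[conversation];
}
```

One possible use is inside a `stateChange` handler, passing `agent.status` and `agent.conversationState` and writing the result into a status element.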
+ ## Getting a Client Secret
+
+ The client secret (ephemeral key) must be obtained from your backend, so that your OpenAI API key never reaches the browser. Here's an example:
+
+ ### Backend (Node.js/Express)
+
+ ```typescript
+ import express from "express";
+
+ const app = express();
+
+ // Uses the global fetch available in Node.js 18+
+ app.post("/api/session", async (req, res) => {
+   const response = await fetch(
+     "https://api.openai.com/v1/realtime/client_secrets",
+     {
+       method: "POST",
+       headers: {
+         Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
+         "Content-Type": "application/json",
+       },
+       body: JSON.stringify({
+         session: {
+           type: "realtime",
+           model: "gpt-4o-realtime-preview",
+         },
+       }),
+     }
+   );
+
+   // Surface upstream failures instead of returning an undefined secret
+   if (!response.ok) {
+     res.status(502).json({ error: "Failed to create client secret" });
+     return;
+   }
+
+   const data = await response.json();
+   res.json({ clientSecret: data.client_secret.value });
+ });
+ ```
+
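The backend example reads `data.client_secret.value` directly, which throws if the response has a different shape. A more defensive reading could look like the sketch below. `extractClientSecret` is a hypothetical helper, and both response shapes it accepts are assumptions: the nested `client_secret.value` comes from the example above, and the top-level `value` is handled as a fallback because the exact shape may depend on the endpoint and API version. Check the OpenAI API reference for the shape your endpoint returns.

```typescript
// Hypothetical helper: pull the ephemeral key out of a parsed JSON body.
// Accepts { client_secret: { value } } (as in the example above) and,
// as an assumed fallback, a top-level { value }.
function extractClientSecret(body: unknown): string {
  if (typeof body !== "object" || body === null) {
    throw new Error("Unexpected response: not a JSON object");
  }
  const record = body as {
    value?: unknown;
    client_secret?: { value?: unknown } | null;
  };
  const candidate = record.client_secret?.value ?? record.value;
  if (typeof candidate !== "string" || candidate.length === 0) {
    throw new Error("Unexpected response: no client secret found");
  }
  return candidate;
}
```

In the route handler this would replace the direct property access, for example `res.json({ clientSecret: extractClientSecret(data) })` wrapped in a `try`/`catch` that returns an error status.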
+ ### Frontend
+
+ ```typescript
+ async function fetchClientSecret(): Promise<string> {
+   const response = await fetch("/api/session", { method: "POST" });
+   if (!response.ok) {
+     throw new Error(`Failed to fetch client secret: ${response.status}`);
+   }
+   const data = await response.json();
+   return data.clientSecret;
+ }
+
+ const clientSecret = await fetchClientSecret();
+ const agent = new Atticus({ clientSecret, /* ...other options */ });
+ ```
+
+ ## Running the Demo
+
+ ```bash
+ # Clone the repo
+ git clone https://github.com/aspect-labs/atticus.git
+ cd atticus
+
+ # Install dependencies
+ npm install
+
+ # Start dev server (builds + serves demo)
+ npm run dev
+
+ # Open http://localhost:3000/demo/
+ ```
+
+ ## License
+
+ MIT