esap-aiui-react 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +1380 -0
- package/assets/architecture-diagram.png +0 -0
- package/dist/AIUIProvider.d.ts +58 -0
- package/dist/AIUIProvider.d.ts.map +1 -0
- package/dist/auuichat.d.ts +57 -0
- package/dist/auuichat.d.ts.map +1 -0
- package/dist/index.d.ts +6 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.esm.js +2 -0
- package/dist/index.esm.js.map +1 -0
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -0
- package/dist/types.d.ts +91 -0
- package/dist/types.d.ts.map +1 -0
- package/package.json +68 -0
package/README.md
ADDED
# AIUI React SDK

[npm](https://www.npmjs.com/package/@espai/aiui-react-sdk)
[License: MIT](https://opensource.org/licenses/MIT)
[TypeScript](https://www.typescriptlang.org/)
[PRs Welcome](https://github.com/espai/aiui-react-sdk/pulls)

**Zero-configuration voice and chat control for React applications through autonomous semantic UI discovery.**

AIUI React SDK enables natural language interaction with any React application through both voice commands and text chat, without manual UI annotation or intent mapping. The framework employs real-time DOM observation and semantic element discovery to automatically understand your application's interface, allowing users to control your app through conversational voice or text-based commands.

## Overview

Traditional voice control solutions require extensive manual configuration, predefined intent schemas, or explicit UI element annotation. AIUI eliminates this overhead through a semantic discovery architecture that automatically maps UI elements to their contextual meaning, enabling immediate voice interaction with zero setup.

### Core Innovation

The SDK implements a **hybrid discovery engine** combining MutationObserver-based DOM monitoring with intelligent semantic labeling to achieve sub-500ms voice-to-action latency. An incremental context synchronization protocol reduces bandwidth consumption by 70% compared to full-state transmission while maintaining real-time UI awareness.

**Key differentiators:**

- **Zero-configuration deployment** — Works with existing React applications without code modification
- **Framework-agnostic compatibility** — Supports Material-UI, Ant Design, Chakra UI, and native HTML
- **Semantic element discovery** — Automatic identification of interactive elements via ARIA labels and heuristic analysis
- **Privacy-preserving architecture** — Client-side filtering with configurable redaction patterns
- **Multi-backend AI support** — Compatible with OpenAI GPT-4, Anthropic Claude, Google Gemini, and local models

## Architecture



### Protocol Design

The SDK implements a multi-channel WebSocket architecture to optimize for both latency and bandwidth:

| Channel | Transport | Purpose | Update Frequency | Latency Requirement |
|---------|-----------|---------|------------------|---------------------|
| `/context` | JSON over WebSocket | UI state synchronization | Event-driven (~1/sec) | Non-critical |
| `/audio` | Binary PCM over WebSocket | Voice I/O streams | Continuous (16 kHz) | <500 ms, critical |
| `/chat` | JSON over WebSocket | Text-based messaging | On-demand | <200 ms preferred |

This separation prevents JSON parsing overhead from blocking time-sensitive audio transmission while enabling efficient differential UI updates and real-time text chat.
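
All three channels share the configured base URL. As an illustrative sketch (the helper below is an assumption about how the endpoints relate, not part of the SDK's public API), the channel URLs can be derived from `serverUrl` like so:

```typescript
// Illustrative only: derives the three channel endpoints described above
// from a configured serverUrl. The SDK does this internally; the function
// name and shape here are assumptions, not public API.
interface ChannelUrls {
  context: string;
  audio: string;
  chat: string;
}

function channelUrls(serverUrl: string): ChannelUrls {
  const base = serverUrl.replace(/\/+$/, ''); // tolerate a trailing slash
  return {
    context: `${base}/context`,
    audio: `${base}/audio`,
    chat: `${base}/chat`
  };
}
```

For example, `channelUrls('wss://aiui.yourdomain.com').audio` yields `wss://aiui.yourdomain.com/audio`.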

## Installation

```bash
npm install @espai/aiui-react-sdk
```

### Prerequisites

**Required browser APIs:**

- AudioWorklet API (Chrome 66+, Firefox 76+, Safari 14.5+)
- WebSocket API
- MediaDevices API (microphone access)

**Required static assets:**

The SDK requires two AudioWorklet processor files in your public directory for audio I/O:

#### 1. Create `public/player-processor.js`

```javascript
// Plays back PCM Float32 chunks received from the main thread.
class PlayerProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.queue = [];   // pending Float32Array chunks
    this.offset = 0;   // read position within queue[0]
    this.port.onmessage = e => this.queue.push(e.data);
  }

  process(_, outputs) {
    const out = outputs[0][0];
    let idx = 0;

    while (idx < out.length) {
      if (!this.queue.length) {
        out.fill(0, idx); // underrun: pad with silence
        break;
      }
      const buf = this.queue[0];
      const copy = Math.min(buf.length - this.offset, out.length - idx);
      out.set(buf.subarray(this.offset, this.offset + copy), idx);

      idx += copy;
      this.offset += copy;

      if (this.offset >= buf.length) {
        this.queue.shift();
        this.offset = 0;
      }
    }
    return true;
  }
}

registerProcessor('player-processor', PlayerProcessor);
```

#### 2. Create `public/worklet-processor.js`

```javascript
// Captures microphone input, downsamples it to 16 kHz, and emits
// 20 ms Int16 PCM packets to the main thread.
class MicProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.dstRate = 16_000;      // target sample rate (Hz)
    this.frameMs = 20;          // packet duration (ms)
    this.srcRate = sampleRate;  // AudioWorklet global: the context's rate
    this.ratio = this.srcRate / this.dstRate;
    this.samplesPerPacket = Math.round(this.dstRate * this.frameMs / 1_000); // 320
    this.packet = new Int16Array(this.samplesPerPacket);
    this.pIndex = 0;
    this.acc = 0;  // fractional resampling accumulator
    this.seq = 0;  // packet sequence counter
  }

  process(inputs) {
    const input = inputs[0];
    if (!input || !input[0]?.length) return true;

    const ch = input[0];
    for (let i = 0; i < ch.length; i++) {
      this.acc += 1;
      if (this.acc >= this.ratio) {
        // Clamp to [-1, 1] and convert to signed 16-bit.
        const s = Math.max(-1, Math.min(1, ch[i]));
        this.packet[this.pIndex++] = s < 0 ? s * 32768 : s * 32767;
        this.acc -= this.ratio;

        if (this.pIndex === this.packet.length) {
          // Transfer the buffer to avoid a copy, then allocate a fresh one.
          this.port.postMessage(this.packet.buffer, [this.packet.buffer]);
          this.packet = new Int16Array(this.samplesPerPacket);
          this.pIndex = 0;
          this.seq++;
        }
      }
    }
    return true;
  }
}

registerProcessor("mic-processor", MicProcessor);
```

**Project structure:**

```
your-application/
├── public/
│   ├── player-processor.js    # Audio playback processor
│   ├── worklet-processor.js   # Microphone input processor
│   └── index.html
├── src/
│   └── App.tsx
└── package.json
```

## Quick Start

### Basic Integration

```tsx
import { AIUIProvider } from '@espai/aiui-react-sdk';
import type { AIUIConfig } from '@espai/aiui-react-sdk';

const config: AIUIConfig = {
  applicationId: 'production-app-v1',
  serverUrl: 'wss://aiui.yourdomain.com',
  apiKey: process.env.AIUI_API_KEY,
  pages: [
    {
      route: '/',
      title: 'Home',
      safeActions: ['click', 'set_value'],
    },
    {
      route: '/dashboard',
      title: 'Dashboard',
      safeActions: ['click', 'set_value', 'select_from_dropdown'],
      dangerousActions: ['delete']
    }
  ],
  safetyRules: {
    requireConfirmation: ['delete', 'submit_payment'],
    blockedSelectors: ['.admin-only', '[data-sensitive]'],
    allowedDomains: ['yourdomain.com']
  },
  privacy: {
    exposePasswords: false,
    exposeCreditCards: false,
    redactPatterns: ['ssn', 'social-security']
  }
};

function App() {
  return (
    <AIUIProvider config={config}>
      <YourApplication />
    </AIUIProvider>
  );
}
```

### Voice Control Component

```tsx
import { useAIUI } from '@espai/aiui-react-sdk';

function VoiceController() {
  const {
    isConnected,
    isListening,
    startListening,
    stopListening
  } = useAIUI();

  return (
    <div className="voice-control">
      <div className="status">
        {isConnected ? (
          <span className="connected">Connected</span>
        ) : (
          <span className="disconnected">Disconnected</span>
        )}
      </div>

      <button
        onClick={isListening ? stopListening : startListening}
        disabled={!isConnected}
      >
        {isListening ? 'Stop Listening' : 'Start Voice Control'}
      </button>
    </div>
  );
}
```

### Chat Interface Component

```tsx
import { useAIUI } from '@espai/aiui-react-sdk';
import { useState } from 'react';

function ChatController() {
  const {
    isChatConnected,
    chatMessages,
    connectChat,
    sendChatMessage
  } = useAIUI();

  const [input, setInput] = useState('');

  const handleSend = async () => {
    if (!input.trim()) return;

    await sendChatMessage(input);
    setInput('');
  };

  return (
    <div className="chat-interface">
      <div className="chat-header">
        <span>AI Assistant</span>
        <span className={isChatConnected ? 'connected' : 'disconnected'}>
          {isChatConnected ? 'Connected' : 'Disconnected'}
        </span>
        {!isChatConnected && (
          <button onClick={connectChat}>Connect Chat</button>
        )}
      </div>

      <div className="chat-messages">
        {chatMessages.map((msg, idx) => (
          <div key={idx} className={`message ${msg.role}`}>
            <div className="content">{msg.content}</div>
            <div className="timestamp">
              {new Date(msg.timestamp).toLocaleTimeString()}
            </div>
          </div>
        ))}
      </div>

      <div className="chat-input">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && handleSend()}
          placeholder="Type a command or question..."
          disabled={!isChatConnected}
        />
        <button
          onClick={handleSend}
          disabled={!isChatConnected || !input.trim()}
        >
          Send
        </button>
      </div>
    </div>
  );
}
```

### Natural Language Interaction

Once integrated, users can control your application through either voice commands or text chat:

**Voice Commands:**

```
User: "Click the submit button"
→ SDK locates and clicks the submit button

User: "Fill the email field with contact@example.com"
→ SDK identifies the email input and sets its value

User: "Select Engineering and Design from the department dropdown"
→ SDK handles the multi-select interaction

User: "Navigate to the dashboard page"
→ SDK triggers navigation to /dashboard
```

**Chat Commands:**

```
User types: "Click the submit button"
Assistant: "Clicking the submit button now."
→ SDK executes the action and confirms

User types: "What options are available in the status dropdown?"
Assistant: "The status dropdown has: Active, Pending, Completed, Archived"
→ SDK analyzes the UI context and responds

User types: "Fill out the form with my default information"
Assistant: "I've filled in your name, email, and phone number."
→ SDK executes multiple form actions

User types: "Show me all the buttons on this page"
Assistant: "I found 5 buttons: Submit, Cancel, Save Draft, Delete, and Export"
→ SDK provides context awareness without taking action
```

## Configuration

### AIUIConfig Interface

```typescript
interface AIUIConfig {
  applicationId: string;        // Unique application identifier
  serverUrl: string;            // WebSocket server URL (wss://)
  apiKey?: string;              // Optional authentication key
  pages: MinimalPageConfig[];   // Page-level configurations
  safetyRules?: SafetyRules;    // Security constraints
  privacy?: PrivacyConfig;      // Privacy settings
  onNavigate?: (route: string) => void | Promise<void>; // Navigation handler
}
```

### Page Configuration

```typescript
interface MinimalPageConfig {
  route: string;                // Page route pattern
  title?: string;               // Human-readable page title
  safeActions?: string[];       // Permitted action types
  dangerousActions?: string[];  // Actions requiring confirmation
}
```

**Example configuration:**

```typescript
{
  route: '/users/:id/edit',
  title: 'Edit User Profile',
  safeActions: ['click', 'set_value', 'select_from_dropdown'],
  dangerousActions: ['delete', 'deactivate_account']
}
```

### Safety Rules

```typescript
interface SafetyRules {
  requireConfirmation?: string[];  // Actions requiring user confirmation
  blockedSelectors?: string[];     // CSS selectors to exclude from discovery
  allowedDomains?: string[];       // Whitelist for external navigation
}
```

**Implementation example:**

```typescript
safetyRules: {
  requireConfirmation: [
    'delete',
    'submit_payment',
    'transfer_funds',
    'deactivate_account'
  ],
  blockedSelectors: [
    '.admin-controls',
    '[data-role="administrative"]',
    '#danger-zone'
  ],
  allowedDomains: [
    'yourdomain.com',
    'api.yourdomain.com',
    'cdn.yourdomain.com'
  ]
}
```

### Privacy Configuration

```typescript
interface PrivacyConfig {
  redactPatterns?: string[];    // Custom patterns to filter from context
  exposePasswords?: boolean;    // Include password field values (default: false)
  exposeCreditCards?: boolean;  // Include credit card inputs (default: false)
}
```

**Privacy implementation:**

```typescript
privacy: {
  exposePasswords: false,
  exposeCreditCards: false,
  redactPatterns: [
    'ssn',
    'social-security',
    'tax-id',
    'employee-id',
    'patient-id'
  ]
}
```

The SDK automatically filters sensitive information before transmission. Elements matching privacy patterns are labeled generically (e.g., "Password Input Field") without exposing their values.
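
A minimal sketch of how such pattern-based filtering can work. This helper is illustrative only: it assumes patterns are matched case-insensitively against an element's name and aria-label, which may differ from the SDK's actual matching rules:

```typescript
// Illustrative redaction check: an element is considered sensitive if any
// configured pattern appears (case-insensitively) in its identifying
// attributes. Sensitive elements keep their labels but drop their values.
interface DiscoveredElement {
  name?: string;
  ariaLabel?: string;
  value?: string;
}

function redactIfSensitive(
  el: DiscoveredElement,
  redactPatterns: string[]
): DiscoveredElement {
  const haystack = `${el.name ?? ''} ${el.ariaLabel ?? ''}`.toLowerCase();
  const sensitive = redactPatterns.some(p => haystack.includes(p.toLowerCase()));
  return sensitive ? { ...el, value: undefined } : el;
}
```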

## API Reference

### useAIUI Hook

The primary interface for interacting with the AIUI system.

```typescript
interface AIUIContextValue {
  // Connection state
  isConnected: boolean;        // Context channel connection status
  isListening: boolean;        // Microphone active status
  isChatConnected: boolean;    // Chat channel connection status
  currentPage: string | null;  // Current route

  // Voice control methods
  connect: () => void;                  // Establish context & audio channels
  disconnect: () => void;               // Close all connections
  startListening: () => Promise<void>;  // Start microphone capture
  stopListening: () => void;            // Stop microphone capture

  // Chat interface methods
  connectChat: () => void;                              // Establish chat channel
  sendChatMessage: (message: string) => Promise<void>;  // Send text message
  chatMessages: ChatMessage[];                          // Message history

  // Programmatic control
  executeAction: (action: string, params: any) => Promise<any>;
  getComponentValue: (selector: string) => any;
  registerComponent: (componentId: string, element: HTMLElement) => void;
  unregisterComponent: (componentId: string) => void;

  // Configuration
  config: AIUIConfig;
}

interface ChatMessage {
  role: 'user' | 'assistant';  // Message sender
  content: string;             // Message text
  timestamp: string;           // ISO 8601 timestamp
}
```

### Programmatic Action Execution

Execute UI actions programmatically without voice input:

```typescript
import { useAIUI } from '@espai/aiui-react-sdk';

function DataTable() {
  const { executeAction } = useAIUI();

  const handleBulkDelete = async (itemIds: string[]) => {
    for (const id of itemIds) {
      try {
        await executeAction('click', {
          semantic: `delete button for item ${id}`
        });

        // Wait for the confirmation dialog to appear
        await new Promise(resolve => setTimeout(resolve, 500));

        await executeAction('click', {
          semantic: 'confirm delete'
        });
      } catch (error) {
        console.error(`Failed to delete item ${id}:`, error);
      }
    }
  };

  // `selectedIds` is assumed to come from surrounding selection state.
  return (
    <button onClick={() => handleBulkDelete(selectedIds)}>
      Delete Selected
    </button>
  );
}
```

### Supported Actions

| Action | Target Elements | Parameters | Description |
|--------|----------------|------------|-------------|
| `click` | button, a, [role="button"] | `{ semantic: string }` | Dispatches native click event |
| `set_value` | input, textarea | `{ semantic: string, value: string }` | Sets input value with React compatibility |
| `select_from_dropdown` | select, custom dropdowns | `{ semantic: string, values: string[] }` | Handles single/multi-select |
| `toggle` | input[type="checkbox"] | `{ semantic: string }` | Toggles checkbox state |
| `navigate` | N/A | `{ route: string }` | Triggers application navigation |
| `get_value` | input, textarea, select | `{ semantic: string }` | Retrieves current element value |

## Advanced Usage

### Dual-Mode Interface (Voice + Chat)

Combine both voice and chat interfaces for flexible user interaction:

```tsx
import { useAIUI } from '@espai/aiui-react-sdk';
import { useState } from 'react';

function AIAssistant() {
  const {
    isConnected,
    isListening,
    isChatConnected,
    chatMessages,
    startListening,
    stopListening,
    connectChat,
    sendChatMessage
  } = useAIUI();

  const [input, setInput] = useState('');
  const [mode, setMode] = useState<'voice' | 'chat'>('chat');

  const handleSendChat = async () => {
    if (!input.trim()) return;
    await sendChatMessage(input);
    setInput('');
  };

  return (
    <div className="ai-assistant">
      {/* Mode selector */}
      <div className="mode-selector">
        <button
          className={mode === 'voice' ? 'active' : ''}
          onClick={() => setMode('voice')}
        >
          Voice Mode
        </button>
        <button
          className={mode === 'chat' ? 'active' : ''}
          onClick={() => {
            setMode('chat');
            if (!isChatConnected) connectChat();
          }}
        >
          Chat Mode
        </button>
      </div>

      {/* Voice mode interface */}
      {mode === 'voice' && (
        <div className="voice-mode">
          <div className="status">
            {isConnected ? '🟢 Connected' : '🔴 Disconnected'}
          </div>
          <button
            onClick={isListening ? stopListening : startListening}
            disabled={!isConnected}
            className={isListening ? 'listening' : ''}
          >
            {isListening ? '🎤 Listening...' : '🎙️ Start Voice Control'}
          </button>
          <p className="hint">
            Try: "Click the submit button" or "Fill email with test@example.com"
          </p>
        </div>
      )}

      {/* Chat mode interface */}
      {mode === 'chat' && (
        <div className="chat-mode">
          <div className="chat-header">
            <span>AI Assistant</span>
            <span className={isChatConnected ? 'connected' : 'disconnected'}>
              {isChatConnected ? '🟢' : '🔴'}
            </span>
          </div>

          <div className="chat-messages">
            {chatMessages.length === 0 ? (
              <div className="welcome-message">
                <p>👋 Hi! I can help you navigate and control this application.</p>
                <p>Try asking me to:</p>
                <ul>
                  <li>Click buttons or links</li>
                  <li>Fill out forms</li>
                  <li>Select from dropdowns</li>
                  <li>Navigate to different pages</li>
                  <li>Get information about what's on the page</li>
                </ul>
              </div>
            ) : (
              chatMessages.map((msg, idx) => (
                <div key={idx} className={`message ${msg.role}`}>
                  <div className="avatar">
                    {msg.role === 'user' ? '👤' : '🤖'}
                  </div>
                  <div className="content">
                    <div className="text">{msg.content}</div>
                    <div className="timestamp">
                      {new Date(msg.timestamp).toLocaleTimeString()}
                    </div>
                  </div>
                </div>
              ))
            )}
          </div>

          <div className="chat-input">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyDown={(e) => e.key === 'Enter' && handleSendChat()}
              placeholder="Type a command or question..."
              disabled={!isChatConnected}
            />
            <button
              onClick={handleSendChat}
              disabled={!isChatConnected || !input.trim()}
            >
              Send
            </button>
          </div>
        </div>
      )}
    </div>
  );
}
```

### Custom Multi-Select Components

The SDK automatically detects and handles complex multi-select implementations:

```tsx
// Material-UI Autocomplete
<Autocomplete
  multiple
  options={categories}
  renderInput={(params) => (
    <TextField
      {...params}
      label="Categories"
      placeholder="Select categories"
      aria-label="Category Selection" // Used for semantic matching
    />
  )}
/>

// Voice command: "Select Engineering and Design from categories"
```

For custom implementations using `data-select-field`:

```tsx
<div className="custom-multiselect">
  <input
    data-select-field="departments"
    data-select-options="Engineering|||Marketing|||Sales|||Design"
    placeholder="Select departments..."
    aria-label="Department Selection"
  />
</div>

// Voice command: "Select Engineering, Marketing, and Design from departments"
```

The `data-select-options` attribute defines available options using `|||` as a delimiter.
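
Consuming the attribute is a straightforward split on that delimiter. The helper below is a sketch of the idea, not the SDK's internal parser:

```typescript
// Splits a data-select-options attribute value into its option list.
// Entries are trimmed, and empty entries from stray delimiters are dropped.
function parseSelectOptions(attr: string): string[] {
  return attr
    .split('|||')
    .map(s => s.trim())
    .filter(s => s.length > 0);
}

// parseSelectOptions('Engineering|||Marketing|||Sales|||Design')
// → ['Engineering', 'Marketing', 'Sales', 'Design']
```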

### Form Automation

```tsx
<form className="user-registration">
  <input
    type="text"
    name="fullName"
    placeholder="Full Name"
    aria-label="Full Name Input"
  />

  <input
    type="email"
    name="email"
    placeholder="Email Address"
    aria-label="Email Address Input"
  />

  <input
    type="tel"
    name="phone"
    placeholder="Phone Number"
    aria-label="Phone Number Input"
  />

  <select name="country" aria-label="Country Selection">
    <option value="">Select Country</option>
    <option value="US">United States</option>
    <option value="UK">United Kingdom</option>
    <option value="CA">Canada</option>
  </select>

  <button type="submit">Create Account</button>
</form>
```

**Voice automation sequence:**

```
1. "Set full name to John Smith"
2. "Fill email with john.smith@example.com"
3. "Set phone number to 555-123-4567"
4. "Select United States from country"
5. "Click create account"
```

### Programmatic Navigation

```typescript
import { useAIUI } from '@espai/aiui-react-sdk';

function NavigationHandler() {
  const { executeAction } = useAIUI();

  const navigateToCheckout = async () => {
    await executeAction('navigate', {
      route: '/checkout'
    });
  };

  const performWorkflow = async () => {
    // Navigate to products
    await executeAction('navigate', { route: '/products' });

    // Wait for the page to load
    await new Promise(resolve => setTimeout(resolve, 1000));

    // Add item to cart
    await executeAction('click', { semantic: 'add to cart' });

    // Navigate to cart
    await executeAction('navigate', { route: '/cart' });

    // Proceed to checkout
    await executeAction('click', { semantic: 'checkout button' });
  };

  return (
    <button onClick={performWorkflow}>
      Quick Purchase Flow
    </button>
  );
}
```

## Server Implementation

The SDK communicates with a backend server implementing the AIUI protocol. The server processes voice input, interprets user intent through an LLM, and sends action commands back to the client.

### Protocol Specification

#### Context Channel (`/context`)

**Client → Server: UI State Update**

```json
{
  "type": "context_update",
  "context": {
    "timestamp": "2025-02-02T10:30:00Z",
    "page": {
      "route": "/dashboard",
      "title": "Analytics Dashboard"
    },
    "elements": [
      {
        "selector": "button:nth-of-type(1)",
        "semantic": "Export Report Button",
        "type": "button",
        "actions": ["click"],
        "attributes": {
          "id": "export-btn",
          "aria-label": "Export Report"
        }
      },
      {
        "selector": "input[name='dateRange']",
        "semantic": "Date Range Input",
        "type": "input",
        "actions": ["set_value"],
        "attributes": {
          "placeholder": "Select date range",
          "aria-label": "Date Range Selection"
        }
      }
    ],
    "viewport": {
      "width": 1920,
      "height": 1080
    }
  },
  "trigger": "navigation"
}
```

**Client → Server: Incremental Update**

```json
{
  "type": "context_append",
  "elements": [
    {
      "selector": "div.modal button:nth-of-type(1)",
      "semantic": "Confirm Action",
      "type": "button",
      "actions": ["click"]
    }
  ]
}
```
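
One way a client can compute the payload of such an incremental update is to key elements by selector and send only those absent from the previously transmitted context. The sketch below illustrates the idea under assumed element shapes; it is not the SDK's actual diffing logic:

```typescript
// Shape assumed from the context_update/context_append messages above.
interface ContextElement {
  selector: string;
  semantic: string;
  type: string;
  actions: string[];
}

// Returns the elements present in `current` but absent from `previous`,
// keyed by selector — the candidate body of a context_append message.
function appendedElements(
  previous: ContextElement[],
  current: ContextElement[]
): ContextElement[] {
  const seen = new Set(previous.map(e => e.selector));
  return current.filter(e => !seen.has(e.selector));
}
```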

**Server → Client: Action Command**

```json
{
  "type": "action",
  "action": "click",
  "params": {
    "semantic": "Export Report Button"
  },
  "timestamp": "2025-02-02T10:30:05Z"
}
```

#### Audio Channel (`/audio`)

**Binary PCM Stream Format:**

| Direction | Sample Rate | Bit Depth | Channels | Encoding | Frame Size |
|-----------|-------------|-----------|----------|----------|------------|
| Client → Server | 16 kHz | 16-bit | Mono | Int16 PCM | 20 ms (320 samples) |
| Server → Client | 24 kHz | 16-bit | Mono | Int16 PCM | Variable |
|
|
877
|
+
|
|
878
|
+
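
As a rough sketch of the client → server framing (not SDK internals), a Float32 capture buffer from an AudioWorklet can be clamped to 16-bit integers and split into 20 ms frames:

```javascript
// Sketch: convert Float32 samples in [-1, 1] to Int16 PCM as the audio
// channel expects. Not the SDK's internal implementation.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp to [-1, 1]
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;               // scale to Int16 range
  }
  return out;
}

// Split a PCM buffer into 320-sample (20 ms at 16 kHz) frames. A trailing
// partial frame is dropped here; a real client would carry it forward.
function frame20ms(pcm, frameSize = 320) {
  const frames = [];
  for (let i = 0; i + frameSize <= pcm.length; i += frameSize) {
    frames.push(pcm.subarray(i, i + frameSize));
  }
  return frames;
}
```

Each frame's underlying bytes can then be sent directly over the `/audio` socket as a binary message.
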

#### Chat Channel (`/chat`)

**Client → Server: Text Message**

```json
{
  "type": "chat_message",
  "content": "Click the submit button",
  "timestamp": "2025-02-02T10:30:00Z"
}
```

**Server → Client: AI Response**

```json
{
  "type": "chat_message",
  "role": "assistant",
  "content": "I've clicked the submit button for you.",
  "timestamp": "2025-02-02T10:30:02Z"
}
```

**Server → Client: Typing Indicator**

```json
{
  "type": "chat_typing",
  "typing": true
}
```

**Server → Client: Connection Status**

```json
{
  "type": "chat_connected"
}
```

```json
{
  "type": "chat_disconnected",
  "reason": "Server shutdown"
}
```
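
The chat message shapes above can be exercised with a couple of small client-side helpers. This is an illustrative sketch, not SDK code; the socket wiring in the trailing comment assumes placeholder `serverUrl`/`applicationId` values:

```javascript
// Build a client → server chat frame. `now` is injectable for testing.
function makeChatMessage(content, now = () => new Date().toISOString()) {
  return JSON.stringify({ type: 'chat_message', content, timestamp: now() });
}

// Dispatch a server → client frame to the matching handler; unknown
// message types are ignored.
function handleServerFrame(raw, handlers) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case 'chat_message': handlers.onMessage?.(msg); break;
    case 'chat_typing': handlers.onTyping?.(msg.typing); break;
    case 'chat_connected': handlers.onConnected?.(); break;
    case 'chat_disconnected': handlers.onDisconnected?.(msg.reason); break;
  }
  return msg;
}

// Wiring sketch (browser):
// const ws = new WebSocket(`${serverUrl}/chat?applicationId=${appId}&apiKey=${key}`);
// ws.onmessage = (e) => handleServerFrame(e.data, { onTyping: setTyping, onMessage: append });
// ws.send(makeChatMessage('Click the submit button'));
```
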

### Reference Server Implementation

```javascript
const http = require('http');
const WebSocket = require('ws');
const OpenAI = require('openai');

const openai = new OpenAI();

// validateApiKey, speechToText, and textToSpeech are deployment-specific
// (key store, STT/TTS provider of your choice) and are left to the integrator.

// One WebSocket.Server per endpoint, multiplexed over a single HTTP server.
// (Constructing three servers with { port: 8080 } would throw EADDRINUSE.)
const contextServer = new WebSocket.Server({ noServer: true });
const audioServer = new WebSocket.Server({ noServer: true });
const chatServer = new WebSocket.Server({ noServer: true });

const endpoints = {
  '/context': contextServer,
  '/audio': audioServer,
  '/chat': chatServer,
};

const httpServer = http.createServer();
httpServer.on('upgrade', (req, socket, head) => {
  const { pathname } = new URL(req.url, 'ws://localhost');
  const wss = endpoints[pathname];
  if (!wss) {
    socket.destroy();
    return;
  }
  wss.handleUpgrade(req, socket, head, (ws) => wss.emit('connection', ws, req));
});
httpServer.listen(8080);

// Session management
const sessions = new Map();

// Context channel handler
contextServer.on('connection', (ws, req) => {
  const url = new URL(req.url, 'ws://localhost');
  const applicationId = url.searchParams.get('applicationId');
  const apiKey = url.searchParams.get('apiKey');

  // Validate API key
  if (!validateApiKey(apiKey)) {
    ws.close(1008, 'Invalid API key');
    return;
  }

  console.log(`Context connected: ${applicationId}`);

  // Store session
  if (!sessions.has(applicationId)) {
    sessions.set(applicationId, {});
  }
  sessions.get(applicationId).contextWs = ws;

  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString());

    if (message.type === 'context_update' || message.type === 'context_append') {
      // Store UI context for LLM
      const session = sessions.get(applicationId);
      session.uiContext = message.context || message.elements;

      console.log(`UI context updated: ${session.uiContext.elements?.length || 0} elements`);
    }
  });

  ws.on('close', () => {
    console.log(`Context disconnected: ${applicationId}`);
  });
});

// Audio channel handler (for voice mode)
audioServer.on('connection', (ws, req) => {
  const url = new URL(req.url, 'ws://localhost');
  const applicationId = url.searchParams.get('applicationId');

  console.log(`Audio connected: ${applicationId}`);

  const session = sessions.get(applicationId);
  if (session) {
    session.audioWs = ws;
  }

  let audioBuffer = [];

  ws.on('message', async (data) => {
    if (data instanceof Buffer) {
      // Accumulate audio chunks
      audioBuffer.push(data);

      // Process when sufficient audio has accumulated (e.g., 1 second)
      if (audioBuffer.length >= 50) { // 50 * 20ms = 1 second
        const audioData = Buffer.concat(audioBuffer);
        audioBuffer = [];

        // Send to Speech-to-Text service
        const transcript = await speechToText(audioData);

        if (transcript && session?.uiContext) {
          // Process with LLM
          const action = await processWithLLM(transcript, session.uiContext, 'voice');

          // Send action command via context channel when one is required
          if (action.action) {
            session.contextWs?.send(JSON.stringify({
              type: 'action',
              action: action.type,
              params: action.params,
              timestamp: new Date().toISOString()
            }));
          }

          // Generate TTS response from the LLM's user-facing message
          const audioResponse = await textToSpeech(action.message);

          // Send audio response
          ws.send(audioResponse);
        }
      }
    }
  });

  ws.on('close', () => {
    console.log(`Audio disconnected: ${applicationId}`);
  });
});

// Chat channel handler (for text mode)
chatServer.on('connection', (ws, req) => {
  const url = new URL(req.url, 'ws://localhost');
  const applicationId = url.searchParams.get('applicationId');

  console.log(`Chat connected: ${applicationId}`);

  const session = sessions.get(applicationId);
  if (session) {
    session.chatWs = ws;
  }

  // Confirm connection
  ws.send(JSON.stringify({ type: 'chat_connected' }));

  ws.on('message', async (data) => {
    const message = JSON.parse(data.toString());

    if (message.type === 'chat_message') {
      const userMessage = message.content;
      console.log(`Chat message from ${applicationId}: ${userMessage}`);

      // Send typing indicator
      ws.send(JSON.stringify({ type: 'chat_typing', typing: true }));

      if (session?.uiContext) {
        // Process with LLM
        const response = await processWithLLM(userMessage, session.uiContext, 'chat');

        // Stop typing indicator
        ws.send(JSON.stringify({ type: 'chat_typing', typing: false }));

        // If an action is required, send it to the context channel
        if (response.action) {
          session.contextWs?.send(JSON.stringify({
            type: 'action',
            action: response.type,
            params: response.params,
            timestamp: new Date().toISOString()
          }));
        }

        // Send chat response
        ws.send(JSON.stringify({
          type: 'chat_message',
          role: 'assistant',
          content: response.message,
          timestamp: new Date().toISOString()
        }));
      } else {
        ws.send(JSON.stringify({
          type: 'chat_message',
          role: 'assistant',
          content: 'I don\'t have access to the UI context yet. Please ensure the page is loaded.',
          timestamp: new Date().toISOString()
        }));
      }
    }
  });

  ws.on('close', () => {
    console.log(`Chat disconnected: ${applicationId}`);
    if (session) {
      session.chatWs = null;
    }
  });
});

async function processWithLLM(userInput, uiContext, mode = 'chat') {
  const prompt = `
You are controlling a web application via ${mode}.

Current UI Context:
${JSON.stringify(uiContext, null, 2)}

User input: "${userInput}"

Determine if an action is needed and respond in JSON format:
{
  "action": true/false,
  "type": "click" | "set_value" | "select_from_dropdown" | "navigate",
  "params": { "semantic": "element description", "value": "..." },
  "message": "Response to user (what you did or information about the UI)"
}

If the user is just asking for information about the UI, set action to false and provide the information in message.
`;

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
    response_format: { type: "json_object" }
  });

  return JSON.parse(response.choices[0].message.content);
}
```

### Production Deployment Considerations

**Security:**
- Implement API key validation on WebSocket connection
- Use WSS (WebSocket Secure) in production
- Rate limit context updates per client
- Sanitize all client-provided data before LLM processing
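
The rate-limiting point can be sketched as a per-client token bucket; the capacity and refill rate below are example values, not SDK defaults:

```javascript
// Illustrative token bucket: allow bursts up to `capacity` updates, refilled
// at `refillPerSecond`. Keep one instance per applicationId.
class TokenBucket {
  constructor(capacity = 10, refillPerSecond = 2) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  // Returns true if the update may proceed, false if it should be dropped.
  allow(now = Date.now()) {
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Checking `bucket.allow()` at the top of the context channel's message handler keeps a chatty client from flooding the LLM pipeline.
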

**Performance:**
- Cache UI context to minimize LLM token usage
- Implement connection pooling for concurrent clients
- Use streaming STT/TTS for reduced latency
- Deploy geographically distributed WebSocket servers

**Reliability:**
- Implement heartbeat/ping-pong for connection health
- Add automatic reconnection with exponential backoff
- Log all action executions for audit trail
- Monitor action success rates and latency metrics
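
The reconnection delay for exponential backoff can be computed as below; the base delay, cap, and full-jitter policy are example choices, not SDK defaults:

```javascript
// Illustrative backoff schedule: exponential growth capped at `capMs`, with
// "full jitter" (a uniform draw in [0, exp)) to avoid reconnection stampedes.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000, random = Math.random) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exp);
}

// Usage sketch: on socket close, schedule the next attempt and reset
// `attempt` to 0 once a connection succeeds.
// setTimeout(connect, backoffDelayMs(attempt++));
```
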

## Performance Characteristics

### Benchmarks

Test environment: Chrome 120, Intel Core i7-12700H, 100 Mbps network, 30ms RTT

| Operation | Mean | Std Dev | P50 | P95 | P99 |
|-----------|------|---------|-----|-----|-----|
| Element Discovery (1000 elements) | 12ms | 3ms | 11ms | 18ms | 24ms |
| Context Update (Full) | 45ms | 8ms | 43ms | 62ms | 78ms |
| Context Update (Delta) | 18ms | 4ms | 17ms | 26ms | 31ms |
| Semantic Match | 2ms | 0.5ms | 2ms | 3ms | 4ms |
| Action Execution (click) | 15ms | 5ms | 14ms | 24ms | 32ms |
| Voice → Action (End-to-End) | 440ms | 95ms | 380ms | 620ms | 780ms |

### Optimization Guidelines

**Minimize Discovery Overhead:**
- Use `aria-label` attributes for explicit semantic labeling
- Avoid deeply nested DOM structures where possible
- Limit dynamic DOM mutations during active voice sessions

**Reduce Context Size:**
- Configure `blockedSelectors` to exclude non-interactive regions
- Use page-specific `safeActions` to filter action types
- Implement privacy patterns to redact verbose text content

**Improve Action Reliability:**
- Ensure unique semantic labels for critical actions
- Use stable selectors (data attributes preferred over CSS classes)
- Add proper ARIA labels to custom components
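
As an illustration of the context-size options above, a configuration might look like the following. The selectors, routes, and exact option shape here are hypothetical; verify the real option names against the `AIUIProvider` typings:

```javascript
// Hypothetical sketch of discovery-trimming options (not verified against
// the shipped AIUIProvider API).
const aiuiConfig = {
  // Exclude non-interactive regions from discovery entirely.
  blockedSelectors: ['footer', '.ad-slot', '[data-aiui-ignore]'],
  // Restrict which action types are exposed per route.
  safeActions: {
    '/dashboard': ['click', 'set_value'],
    '/settings': ['click'],
  },
};
```
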

## Browser Compatibility

| Browser | Version | Status | Notes |
|---------|---------|--------|-------|
| Chrome | 66+ | ✓ Full Support | Recommended platform |
| Edge | 79+ | ✓ Full Support | Chromium-based |
| Firefox | 76+ | ✓ Full Support | |
| Safari | 14.5+ | ✓ Full Support | Requires webkit prefix for AudioContext |
| Mobile Chrome | 66+ | ✓ Full Support | Microphone permissions required |
| Mobile Safari | 14.5+ | ✓ Full Support | Requires user gesture for AudioContext |

**Minimum Requirements:**
- ES2020 language features
- WebSocket API
- Web Audio API with AudioWorklet support
- MediaDevices getUserMedia API
- MutationObserver API
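
The requirements above (including the Safari `webkit` prefix) can be verified with a pre-flight check along these lines — a sketch, not SDK code; `scope` is injectable so it can be exercised outside a browser:

```javascript
// Return the names of any missing minimum requirements; an empty array
// means voice mode can be attempted.
function missingCapabilities(scope = globalThis) {
  const missing = [];
  if (typeof scope.WebSocket !== 'function') missing.push('WebSocket');
  const AudioCtx = scope.AudioContext || scope.webkitAudioContext; // Safari prefix
  if (typeof AudioCtx !== 'function') missing.push('AudioContext');
  else if (!('audioWorklet' in AudioCtx.prototype)) missing.push('AudioWorklet');
  if (!scope.navigator?.mediaDevices?.getUserMedia) missing.push('getUserMedia');
  if (typeof scope.MutationObserver !== 'function') missing.push('MutationObserver');
  return missing;
}
```
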

## Troubleshooting

### Common Issues

**AudioWorklet Files Not Found**

```
Error: Failed to load worklet-processor.js
```

**Solution:** Ensure `player-processor.js` and `worklet-processor.js` exist in the `public/` directory and are accessible at the `/player-processor.js` and `/worklet-processor.js` URLs.

**WebSocket Connection Failed**

```
WebSocket error: Connection refused
```

**Solution:** Verify the server is running and `serverUrl` uses the correct protocol (`wss://` for production, `ws://` for local development).

**No Elements Discovered**

```
Warning: 0 interactive elements found
```

**Solution:** Add `aria-label` attributes to interactive elements, or verify that elements match the discovery selectors (`button`, `input`, `select`, `a[href]`, `[role="button"]`).

**Action Ambiguity**

```
Found 3 matches for "Delete"
```

**Solution:** Use index notation in voice commands ("Delete item number 2") or ensure unique semantic labels for elements with identical text.

### Debug Mode

Enable verbose logging for development:

```typescript
// Client-side (browser console)
localStorage.setItem('AIUI_DEBUG', 'true');
```

```bash
# Server-side (environment variable)
export LOG_LEVEL=DEBUG
node server.js
```

**Inspect Discovered Elements:**

```typescript
const { executeAction } = useAIUI();

// Retrieve internal context for debugging
await executeAction('get_value', { semantic: '_debug_context' });
```

**Monitor WebSocket Traffic:**

1. Open Chrome DevTools → Network tab
2. Filter by "WS" (WebSocket)
3. Select connection → Messages tab
4. Inspect JSON payloads and binary frames

## Contributing

We welcome contributions from the community. Please review our contribution guidelines before submitting pull requests.

### Development Setup

```bash
# Clone repository
git clone https://github.com/espai/aiui-react-sdk.git
cd aiui-react-sdk

# Install dependencies
npm install

# Run type checking
npm run typecheck

# Build package
npm run build

# Watch mode for development
npm run dev
```

### Code Standards

- TypeScript strict mode required
- ESLint configuration enforced
- Minimum 80% test coverage for new features
- Conventional Commits specification for commit messages

### Pull Request Process

1. Fork the repository and create a feature branch
2. Implement changes with appropriate test coverage
3. Ensure all tests pass and linting succeeds
4. Update documentation for API changes
5. Submit pull request with detailed description

## Roadmap

### Version 1.1.0 (Q2 2025)

- Vue.js framework adapter
- Safari performance optimizations
- Multi-language support (Spanish, Mandarin, French)
- Enhanced debugging tools with visual element highlighting

### Version 2.0.0 (Q4 2025)

- Multimodal interaction (vision + voice)
- Context compression using learned embeddings
- Offline mode with service worker caching
- WebAssembly-based audio processing for reduced latency

### Research Track

- Reinforcement learning for adaptive discovery
- Visual grounding for spatial element disambiguation
- Federated learning for privacy-preserving model improvement

## Citation

If you use AIUI in academic research, please cite:

```bibtex
@software{aiui2025,
  title={AIUI: Autonomous Voice-Controlled UI Framework with Zero-Configuration Semantic Discovery},
  author={Atik, Md Mahabube Alahi},
  year={2025},
  url={https://www.npmjs.com/package/@espai/aiui-react-sdk},
  version={1.0.21}
}
```

## License

MIT License - see [LICENSE](LICENSE) file for details.

Copyright (c) 2025 AIUI Project Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

## Support

**Community Support:**
- GitHub Issues: [Report bugs and request features](https://github.com/espai/aiui-react-sdk/issues)
- GitHub Discussions: [Community Q&A and ideas](https://github.com/espai/aiui-react-sdk/discussions)
- Stack Overflow: Tag questions with `aiui-react`

**Commercial Support:**
- Enterprise integration assistance
- Custom feature development
- On-premise deployment support
- SLA-backed support contracts

**Contact:**
- Email: support@espai.dev
- Documentation: https://docs.espai.dev/aiui
- Website: https://espai.dev

## Acknowledgments

AIUI builds upon foundational work from the open-source community:

- **React Team** — Context API and Hooks architecture
- **W3C Web Audio Community Group** — AudioWorklet specification
- **ARIA Working Group** — Accessibility semantic standards
- **WebSocket Protocol (RFC 6455)** — Real-time bidirectional communication

Special thanks to early adopters who provided production feedback and contributed to the framework's evolution.

---

**Built with precision for voice-first web experiences.**