react-ai-avatar 0.1.2 → 0.1.3
- package/README.md +186 -142
- package/package.json +1 -1
package/README.md
CHANGED
@@ -1,14 +1,33 @@
 <p align="center">
-<img src="./assets/
+<img src="./assets/logo.svg" alt="react-ai-avatar" width="116" height="116" />
 </p>

-
+<h1 align="center">react-ai-avatar</h1>

->
+<p align="center">
+<strong>A face for your AI.</strong><br/>
+A presentational React avatar for realtime LLM voice & text UIs — you bring the connection,<br/>
+it brings the face that visibly <strong>listens, thinks and speaks</strong>.
+</p>
+
+<p align="center">
+<a href="https://www.npmjs.com/package/react-ai-avatar"><img alt="npm version" src="https://img.shields.io/npm/v/react-ai-avatar?color=0d9488"></a>
+<a href="https://www.npmjs.com/package/react-ai-avatar"><img alt="npm downloads" src="https://img.shields.io/npm/dm/react-ai-avatar?color=0d9488"></a>
+<img alt="minzipped size" src="https://img.shields.io/bundlephobia/minzip/react-ai-avatar?color=0d9488">
+<img alt="types included" src="https://img.shields.io/npm/types/react-ai-avatar?color=0d9488">
+<a href="./LICENSE"><img alt="MIT license" src="https://img.shields.io/npm/l/react-ai-avatar?color=0d9488"></a>
+</p>
+
+<p align="center">
+<a href="https://www.npmjs.com/package/react-ai-avatar"><b>npm</b></a> ·
+<a href="https://react-ai-avatar-site.vercel.app/#/docs"><b>Documentation</b></a> ·
+<a href="https://react-ai-avatar-site.vercel.app/"><b>Live demos</b></a> ·
+<a href="#quickstart"><b>Quickstart</b></a>
+</p>

-
+---

-
+**react-ai-avatar** handles exactly one step of your voice/chat pipeline: turning audio amplitude and conversation-state changes into a face that visibly reacts. It is **completely LLM-agnostic** — it doesn't know about Gemini, OpenAI or ElevenLabs. You pass two live things — a `state` and (optionally) a WebAudio `AnalyserNode` — and it does the rest. Your host app keeps the microphone, the WebSocket and the AI provider; **none of those dependencies enter your bundle**. One thing, done well, embeddable in a few lines, no backend, MIT.

 ```tsx
 import { RealtimeAvatar } from 'react-ai-avatar';
@@ -18,38 +37,17 @@ import 'react-ai-avatar/style.css';
 <RealtimeAvatar state="speaking" />
 ```

-
-
-
-
-## Features
-
-- 👄 **Audio-reactive mouth** — analyzes amplitude and frequency bands in real time. This is deliberately *not* phoneme-perfect "lip-sync": an `AnalyserNode` gives energy, not phonemes, and for flat avatars amplitude is what looks right.
-- 🦺 **Graceful degradation** — `analyser={null}` while `state="speaking"`? The mouth animates with a synthetic speech-like pattern instead of freezing. Perfect for demos and non-WebRTC apps.
-- ⌨️ **Text-streaming LLMs too** — no audio? Drive the mouth from *token cadence* with `createSpeechActivity()`. A text-only assistant (OpenAI-style `/chat/completions` or `/responses` with `stream: true`) gets a face that visibly tracks the stream — busy while tokens arrive, settling on pauses.
-- 🧠 **A visible `thinking` state** — pulsing thought bubble + upward gaze. Your users *see* the LLM thinking, not just a color change.
-- 🛠️ **A `working` state for tool use** — the fifth state, for agentic UIs. While the LLM runs a tool, the face goes amber and the state pill reads `Working: {tool}` (pass the tool name via the `tool` prop). Your users see *which* tool is running, not just a spinner.
-- 🎨 **Own-design avatar catalog** — `geometric`, `memoji`, `pixelart`, `doodle`: four MIT, CC0-safe SVG presets. No third-party assets, no attribution headaches.
-- 🎲 **DiceBear avatars (`dicebear`)** — generate deterministic [DiceBear](https://www.dicebear.com) avatars client-side, from a curated **CC0-only** style set (still no attribution). Animated with an audio-reactive bounce.
-- 🔌 **Bring your own SVG (`byos`)** — any SVG implementing the small layer contract gets the full animation runtime for free. Your avatar, your license.
-- ♿ **Production quality** — SSR-safe (Next.js friendly), honors `prefers-reduced-motion`, announces state changes via `aria-live`.
-- 🧊 **Optional 3D (VRM)** — `variant="vrm"` renders VRoid/VRM models with visemes and gaze tracking. The three.js stack is an *optional* peer dependency, lazy-loaded only if you use it.
+<p align="center">
+<img src="./assets/banner.png" alt="The react-ai-avatar catalog reacting to conversation state" width="100%" />
+</p>

-##
+## Quickstart

 ```bash
 npm install react-ai-avatar motion
 ```

-`react`, `react-dom` and `motion` are peer dependencies.
-
-```bash
-npm install three @react-three/fiber @react-three/drei @pixiv/three-vrm
-```
-
-## Quick start
-
-The only prop you *have* to pass is `state` — you resolve it in your app, the avatar never infers it. Everything else has a default, so this already works:
+`react`, `react-dom` and `motion` are peer dependencies. The only prop you *have* to pass is `state` — you resolve it in your app, the avatar never infers it:

 ```tsx
 import { RealtimeAvatar } from 'react-ai-avatar';
@@ -63,11 +61,7 @@ export default function App() {
 }
 ```

-With no `analyser`, `speaking` falls back to a synthetic speech-like mouth — great for getting something on screen before the audio pipeline exists. Pass an `AnalyserNode` to make the mouth react to real audio (see [
-
-### Customizing further
-
-Every default is overridable. Opt into as much as you need:
+With no `analyser`, `speaking` falls back to a synthetic speech-like mouth — great for getting something on screen before the audio pipeline exists. Pass an `AnalyserNode` to make the mouth react to real audio (see [Driving the mouth](#driving-the-mouth)). Every default is overridable:

 ```tsx
 <RealtimeAvatar
@@ -80,6 +74,43 @@ Every default is overridable. Opt into as much as you need:
 />
 ```

+For the optional 3D (VRM) variant, also install `three @react-three/fiber @react-three/drei @pixiv/three-vrm`; for `glb`, the same minus `@pixiv/three-vrm`; for `dicebear`, `@dicebear/core @dicebear/collection`. All are **optional** peer dependencies, lazy-loaded only if you use that variant.
+
+## Table of contents
+
+- [Philosophy](#philosophy)
+- [Features](#features)
+- [The avatar catalog](#the-avatar-catalog)
+- [Driving the mouth](#driving-the-mouth)
+  - [Audio: getting an `AnalyserNode`](#audio-getting-an-analysernode)
+  - [Text-streaming LLMs (no audio)](#text-streaming-llms-no-audio)
+- [Bring your own SVG (`byos`)](#bring-your-own-svg-byos)
+- [3D avatars (VRM and GLB)](#3d-avatars-vrm-and-glb)
+- [DiceBear avatars (`dicebear`)](#dicebear-avatars-dicebear)
+- [API reference](#api-reference)
+- [Building blocks](#building-blocks)
+- [Positioning](#positioning)
+- [Examples](#examples)
+- [Contributing](#contributing)
+- [License](#license)
+
+## Philosophy
+
+One thing, done well, embeddable in a few lines, no backend, MIT. The library handles exactly one step of your voice pipeline: turning audio amplitude + state changes into a face that visibly **listens, thinks and speaks**. Your host app keeps the microphone, the WebSocket and the AI provider — none of those dependencies enter your bundle.
+
+## Features
+
+- 👄 **Audio-reactive mouth** — analyzes amplitude and frequency bands in real time. This is deliberately *not* phoneme-perfect "lip-sync": an `AnalyserNode` gives energy, not phonemes, and for flat avatars amplitude is what looks right.
+- 🦺 **Graceful degradation** — `analyser={null}` while `state="speaking"`? The mouth animates with a synthetic speech-like pattern instead of freezing. Perfect for demos and non-WebRTC apps.
+- ⌨️ **Text-streaming LLMs too** — no audio? Drive the mouth from *token cadence* with `createSpeechActivity()`. A text-only assistant (OpenAI-style `/chat/completions` or `/responses` with `stream: true`) gets a face that visibly tracks the stream — busy while tokens arrive, settling on pauses.
+- 🧠 **A visible `thinking` state** — pulsing thought bubble + upward gaze. Your users *see* the LLM thinking, not just a color change.
+- 🛠️ **A `working` state for tool use** — the fifth state, for agentic UIs. While the LLM runs a tool, the face goes amber and the state pill reads `Working: {tool}` (pass the tool name via the `tool` prop). Your users see *which* tool is running, not just a spinner.
+- 🎨 **Own-design avatar catalog** — `geometric`, `memoji`, `pixelart`, `doodle`: four MIT, CC0-safe SVG presets. No third-party assets, no attribution headaches.
+- 🎲 **DiceBear avatars (`dicebear`)** — generate deterministic [DiceBear](https://www.dicebear.com) avatars client-side, from a curated **CC0-only** style set (still no attribution). Animated with an audio-reactive bounce.
+- 🔌 **Bring your own SVG (`byos`)** — any SVG implementing the small layer contract gets the full animation runtime for free. Your avatar, your license.
+- ♿ **Production quality** — SSR-safe (Next.js friendly), honors `prefers-reduced-motion`, announces state changes via `aria-live`.
+- 🧊 **Optional 3D (VRM/GLB)** — `variant="vrm"` / `variant="glb"` render VRoid/VRM and ARKit-rigged glTF models with visemes and gaze tracking. The three.js stack is an *optional* peer dependency, lazy-loaded only if you use it.
+
 ## The avatar catalog

 | variant | style | notes |
@@ -95,106 +126,11 @@ Every default is overridable. Opt into as much as you need:

 All built-in presets are original designs licensed MIT — nothing inside this package requires attribution.

-##
-
-Generate [DiceBear](https://www.dicebear.com) avatars client-side — deterministic per `seed`, no network call. The packages are **optional** peer dependencies, lazy-loaded only when this variant renders:
-
-```bash
-npm install @dicebear/core @dicebear/collection
-```
-
-```tsx
-<RealtimeAvatar
-  state={aiState}
-  analyser={analyser}
-  variant="dicebear"
-  dicebearCollection="open-peeps" // curated CC0 style id
-  dicebearSeed="ada-lovelace" // same seed + style => same face
-/>
-```
-
-**Licensing:** DiceBear ships ~30 styles under mixed licenses. This library's catalog (`DICEBEAR_STYLES`) is curated to **CC0 1.0** styles that have a face — `pixel-art`(+`-neutral`), `lorelei`(+`-neutral`), `notionists`(+`-neutral`), `open-peeps`, `thumbs` — so it keeps the same no-attribution promise as the built-in presets. You *can* pass any other DiceBear style id to `dicebearCollection`, but then its license (e.g. CC BY 4.0 for `adventurer`, or "free for personal and commercial use" for `bottts`) is your responsibility — same deal as `byos`.
-
-**Animation:** DiceBear SVGs have no `#rra-*` hooks, but their *option API* lets us pick which mouth/eyes variant to render. So every curated style actually **talks**: it pre-generates a few frames of the same avatar (same seed ⇒ identical hair/skin/etc.) with closed / mid / open mouths — plus a blink frame where the style allows — and swaps which frame is shown per audio frame, with a subtle bounce on top. Real articulation via the supported API, no fragile path hacks. The per-style variant choices live in the exported `DICEBEAR_RIGS` map. (A non-rigged style id you pass yourself — e.g. a faceless abstract DiceBear style — falls back to a pure audio-reactive bounce.) State color and the thinking bubble still come from the surrounding `<RealtimeAvatar />` chrome.
-
-## 3D GLB + ARKit (`glb`)
+## Driving the mouth

-
+The mouth has three possible drivers, in precedence order: an explicit `speechActivity` source, then `streamingText`, then the audio `analyser`. Pick whichever matches your pipeline — voice apps use the analyser, text-only LLMs use the streaming-text paths.
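As a reading aid for the precedence rule in the added paragraph above, here is a minimal sketch of how such a resolution could look. It is not the library's source: `pickMouthDriver`, `MouthInputs` and the `"synthetic"` label are hypothetical names, the `object` types are stand-ins for `SpeechActivitySource` and `AnalyserNode`, and treating any defined `streamingText` string (even an empty one) as "set" is an assumption.

```typescript
// Hypothetical sketch: only the three prop names come from the README.
type MouthDriver = "speechActivity" | "streamingText" | "analyser" | "synthetic";

interface MouthInputs {
  speechActivity?: object | null; // stand-in for a SpeechActivitySource
  streamingText?: string;         // accumulated assistant text
  analyser?: object | null;       // stand-in for a WebAudio AnalyserNode
}

// Walk the documented precedence order and report which driver would win.
function pickMouthDriver(inputs: MouthInputs): MouthDriver {
  if (inputs.speechActivity != null) return "speechActivity";
  if (inputs.streamingText !== undefined) return "streamingText"; // assumption: "" still counts
  if (inputs.analyser != null) return "analyser";
  return "synthetic"; // no driver: the README's synthetic speech-like fallback
}
```

Under these assumptions, passing both `streamingText` and `analyser` resolves to `"streamingText"`, which matches the README's statement that `streamingText` takes precedence over `analyser`.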

-
-npm install three @react-three/fiber @react-three/drei
-```
-
-```tsx
-<RealtimeAvatar
-  state={aiState}
-  analyser={analyser}
-  variant="glb"
-  glbUrl="/models/rocketbox.glb" // CORS-enabled .glb with ARKit morph targets
-/>
-```
-
-**Recommended example asset — Microsoft Rocketbox (MIT).** [Rocketbox](https://github.com/microsoft/Microsoft-Rocketbox) ships 115 rigged avatars with an ARKit-compatible blendshape variant, under the **MIT license** — the cleanest fit for this library's no-attribution-headaches philosophy. Rocketbox distributes `.fbx`, so convert one avatar to `.glb` once (offline, via [FBX2GLTF](https://github.com/facebookincubator/FBX2glTF) or Blender's glTF 2.0 export, keeping the blendshapes) and drop it in `public/models/`. Keep the MIT notice alongside it. [Ready Player Me](https://docs.readyplayer.me/ready-player-me/api-reference/avatars/morph-targets/apple-arkit) avatars (`?morphTargets=ARKit`) also work out of the box.
-
-## Bring your own SVG (`byos`)
-
-Any SVG exposing these stable hooks is animated by the runtime — same blink, gaze, mouth and thinking behavior as the built-in presets:
-
-| hook | part | the runtime drives |
-|---|---|---|
-| `#rra-ring` | state ring | `stroke` = `stateColors[state]` |
-| `#rra-mouth` | mouth | ellipse: `ry`/`rx` · rect: `height` |
-| `.rra-pupil` (×2) | pupils | circle: `cx`/`cy` · rect: `x`/`y` (mouse tracking, thinking gaze) |
-| `.rra-lid` (×2) | eyelids | `height` (blink; 0 = open) |
-| `#rra-think` | thought bubble | `opacity` + dots pulsing while `thinking` |
-
-Optional data attributes: `data-base-x`/`data-base-y` (pupil rest position), `data-max-height` (closed lid height), `data-quantize` (snap motion to a grid — that's how the pixel-art preset stays chunky).
-
-```tsx
-<RealtimeAvatar state={aiState} analyser={analyser} variant="byos">
-  <MyOwnSvgAvatar /> {/* exposes the #rra-* hooks; its license is your business */}
-</RealtimeAvatar>
-```
-
-## API reference
-
-### `<RealtimeAvatar />`
-
-- `state` (`'idle' | 'listening' | 'thinking' | 'speaking' | 'working'`) — required. You resolve it; it is never inferred. `working` is the tool-use state for agentic UIs (amber).
-- `tool` (`string`) — optional. The name of the tool currently running. While `state="working"`, the state pill reads `Working: {tool}` instead of the generic label.
-- `analyser` (`AnalyserNode | null`) — optional. Drives the mouth from audio. Omitted or `null`, speaking falls back to the synthetic pattern.
-- `streamingText` (`string`) — optional. Declarative mouth driver: pass the accumulated assistant text (e.g. from `useChat`) and the avatar diffs its growth to drive the mouth. Takes precedence over `analyser`. See [Text-streaming LLMs](#text-streaming-llms-no-audio).
-- `speechActivity` (`SpeechActivitySource`) — optional. Imperative token-rate mouth driver, from `createSpeechActivity()`. Takes precedence over both `streamingText` and `analyser` when set.
-- `size` (`number`) — px, default `280`.
-- `variant` — see catalog above. Default `'geometric'`.
-- `children` — your SVG, for `variant="byos"`.
-- `vrmUrl` (`string`) — CORS-enabled `.vrm` URL, for `variant="vrm"`.
-- `glbUrl` (`string`) — CORS-enabled `.glb` URL with ARKit blendshapes, for `variant="glb"`.
-- `dicebearCollection` (`string`) — DiceBear style id (curated CC0 set), for `variant="dicebear"`.
-- `dicebearSeed` (`string`) — deterministic DiceBear seed, for `variant="dicebear"`.
-- `subtitle` / `thought` (`string`) — optional movie-style caption and a thought bubble. Pass raw text or markdown: both are flattened to spoken prose and rolled to a trailing window internally, so a long streamed reply never overflows or shows raw `**`/tables. For a long assistant reply, keep the full markdown in your chat transcript and pass the same text here for the short live caption.
-- `showGlow` / `showStatePill` / `showThought` / `showSubtitle` (`boolean`) — HUD satellites, each `true` by default. Set any to `false` to hide it individually: the reactive glow, the state pill, the thought bubble, and the subtitle respectively. The built-in subtitle/thought float `absolute` around the face (needs open canvas); inside a constrained card, set `showSubtitle={false}` / `showThought={false}` and render `<AvatarCaption>` / `<AvatarThought>` in your own layout slot instead.
-- `maxMouthOpening`, `mouseTrackingIntensity`, `blinkIntervalMin/Max`, `blinkDuration` — animation tuning.
-- `stateColors`, `stateLabels` — theming; labels are announced via `aria-live`.
-- `customization` — preset colors and accessories (skin, hair, clothing, glasses, headphones…).
-
-### Building blocks
-
-Everything the runtime uses is exported, so you can compose your own:
-
-- `ContractAvatar` — wraps any contract-compliant SVG with the runtime.
-- `useAvatarRuntime(containerRef, options)` — the animation runtime itself.
-- `createMouthEngine(source)` / `useAudioMouth(...)` — the source→mouth analysis (amplitude + A/E/O shapes), procedural fallback included. `source` is an `AnalyserNode`, a `SpeechActivitySource`, or `null`.
-- `createSpeechActivity(options?)` — the token-rate mouth driver for text streams (`push` / `end` / `reset` / `sample`).
-- `useStreamingTextActivity(text)` — declarative wrapper: diffs accumulated streaming text into a `SpeechActivitySource` for you (what the `streamingText` prop uses).
-- `useReducedMotion()` — SSR-safe `prefers-reduced-motion` hook.
-- `GeometricAvatar`, `MemojiAvatar`, `PixelArtAvatar`, `DoodleAvatar` — the raw presets.
-- `SquirrelAvatar` — a full branded character (red-squirrel dev face) built on the `#rra-*` contract; the worked `byos` example, shipped so the demos render it from one source. See [`examples/08-character-avatar-squirrel.tsx`](examples/08-character-avatar-squirrel.tsx).
-- `AudioVisualizer` — Siri-style waveform telemetry strip.
-- `AvatarCaption` / `AvatarThought` — host-placed caption + thought widgets. In-flow (not `absolute`), so they fit your own layout slot without overflow; both flatten markdown to spoken prose and roll a trailing window.
-- `toPlainText(md)` / `tailWindow(text, { maxChars })` — the pure text helpers behind those widgets, for building your own caption.
-
-## Getting an `AnalyserNode`
+### Audio: getting an `AnalyserNode`

 The standard recipe for base64 PCM streams (what Gemini Live / OpenAI Realtime return):

@@ -214,13 +150,13 @@ function playAudioChunk(pcmData: Float32Array) {
 }
 ```

-
+### Text-streaming LLMs (no audio)

 Not every assistant speaks. For a text-only LLM that streams tokens — OpenAI-style `/chat/completions` or `/responses` with `stream: true`, or local servers like Ollama / LM Studio / vLLM — there's no `AnalyserNode` to read. Instead, drive the mouth from **token cadence**: the rhythm of arriving text becomes the same 0..1 energy signal the audio path produces. The mouth is busy while the model emits text and settles shut on pauses or when the stream ends. The library still never fetches anything — you own the stream, it owns the face.
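To make the "token cadence becomes a 0..1 energy signal" idea concrete, the sketch below shows one possible way to turn chunk arrivals into a decaying energy value. It is an illustration only and makes no claim about how `createSpeechActivity()` works internally: `createCadenceMeter`, `halfLifeMs` and `charsForFull` are hypothetical names, and the exponential decay is an assumed model. Timestamps are passed in explicitly so the behaviour is deterministic.

```typescript
// Hypothetical cadence meter: NOT the library's createSpeechActivity().
interface CadenceMeter {
  push(chunk: string, nowMs: number): void; // call when a text chunk arrives
  sample(nowMs: number): number;            // current energy in the range 0..1
}

function createCadenceMeter(halfLifeMs = 250, charsForFull = 12): CadenceMeter {
  let energy = 0;
  let lastMs = 0;

  // Exponentially decay the stored energy up to the given timestamp.
  const decayTo = (nowMs: number): void => {
    if (nowMs <= lastMs) return; // ignore non-advancing clocks
    energy *= Math.pow(0.5, (nowMs - lastMs) / halfLifeMs);
    lastMs = nowMs;
  };

  return {
    push(chunk, nowMs) {
      decayTo(nowMs);
      // Longer chunks add more energy; clamp so the signal never exceeds 1.
      energy = Math.min(1, energy + chunk.length / charsForFull);
    },
    sample(nowMs) {
      decayTo(nowMs);
      return energy;
    },
  };
}
```

With this model the value rises while chunks keep arriving and halves every `halfLifeMs` once the stream pauses, which mirrors the "busy while tokens arrive, settling on pauses" behaviour described above.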

 There are two ways to feed it, matching the two ways React apps consume streams.

-
+#### Declarative — `streamingText` (the easy path)

 If you use a streaming chat hook — the [Vercel AI SDK](https://sdk.vercel.ai)'s `useChat` is the de-facto standard — you never see raw chunks: you get the **accumulated** assistant message (it grows each render) plus a `status`. Both map straight onto the avatar. Pass the text, the avatar diffs its growth internally and drives the mouth. No refs, no reader loop:

@@ -245,7 +181,7 @@ function ChatAvatar() {

 That's the whole integration. `streamingText` takes precedence over `analyser`; the ambient glow reacts to it too. Works with every variant — flat presets, DiceBear, VRM and GLB.

-
+#### Imperative — `createSpeechActivity()` (you own the reader loop)

 Hand-rolling `fetch` or driving the OpenAI SDK's `for await` yourself? Then you *do* have the raw chunks — feed their cadence directly with a `SpeechActivitySource`:

@@ -291,6 +227,112 @@ function TextAvatar() {

 > [`examples/03-streaming-text-imperative.tsx`](examples/03-streaming-text-imperative.tsx) shows this end-to-end against an OpenAI-compatible endpoint. The browser only ever talks to your own `/api/chat`; a tiny reference relay that proxies to the provider (so the key never reaches the client) lives in [`examples/server/proxy.ts`](examples/server/proxy.ts).

+## Bring your own SVG (`byos`)
+
+Any SVG exposing these stable hooks is animated by the runtime — same blink, gaze, mouth and thinking behavior as the built-in presets:
+
+| hook | part | the runtime drives |
+|---|---|---|
+| `#rra-ring` | state ring | `stroke` = `stateColors[state]` |
+| `#rra-mouth` | mouth | ellipse: `ry`/`rx` · rect: `height` |
+| `.rra-pupil` (×2) | pupils | circle: `cx`/`cy` · rect: `x`/`y` (mouse tracking, thinking gaze) |
+| `.rra-lid` (×2) | eyelids | `height` (blink; 0 = open) |
+| `#rra-think` | thought bubble | `opacity` + dots pulsing while `thinking` |
+
+Optional data attributes: `data-base-x`/`data-base-y` (pupil rest position), `data-max-height` (closed lid height), `data-quantize` (snap motion to a grid — that's how the pixel-art preset stays chunky).
+
+```tsx
+<RealtimeAvatar state={aiState} analyser={analyser} variant="byos">
+  <MyOwnSvgAvatar /> {/* exposes the #rra-* hooks; its license is your business */}
+</RealtimeAvatar>
+```
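For readers who want to see the layer contract from the table above in one place, here is a hedged sketch: a small sample SVG string carrying the five hooks, plus a helper that reports which hooks a given markup string lacks. `missingHooks` and `sampleSvg` are hypothetical and not part of the package, the coordinates are made up, and treating every hook as mandatory is an assumption (the README only says that SVGs exposing them get animated).

```typescript
// Hypothetical contract check: hook names come from the README table, the rest is illustrative.
const REQUIRED_IDS = ["rra-ring", "rra-mouth", "rra-think"];
const REQUIRED_PAIRS = ["rra-pupil", "rra-lid"]; // the table lists these as (x2)

function countMatches(text: string, pattern: RegExp): number {
  return (text.match(pattern) ?? []).length;
}

// Returns a list of hooks that the SVG markup does not expose.
function missingHooks(svg: string): string[] {
  const missing: string[] = [];
  for (const id of REQUIRED_IDS) {
    if (!svg.includes(`id="${id}"`)) missing.push(`#${id}`);
  }
  for (const cls of REQUIRED_PAIRS) {
    const found = countMatches(svg, new RegExp(`class="[^"]*\\b${cls}\\b[^"]*"`, "g"));
    if (found < 2) missing.push(`.${cls} (x2)`);
  }
  return missing;
}

// Made-up face: element types follow the "runtime drives" column of the table.
const sampleSvg = `
<svg viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg">
  <circle id="rra-ring" cx="50" cy="50" r="46" fill="none" stroke="#888" />
  <circle class="rra-pupil" cx="38" cy="42" r="3" data-base-x="38" data-base-y="42" />
  <circle class="rra-pupil" cx="62" cy="42" r="3" data-base-x="62" data-base-y="42" />
  <rect class="rra-lid" x="32" y="36" width="12" height="0" data-max-height="12" />
  <rect class="rra-lid" x="56" y="36" width="12" height="0" data-max-height="12" />
  <ellipse id="rra-mouth" cx="50" cy="66" rx="10" ry="2" />
  <g id="rra-think" opacity="0"><circle cx="80" cy="18" r="3" /></g>
</svg>`;
```

A string check like this is only a quick lint for hand-written markup. It does not verify that the hooked elements are of a type the runtime can drive.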
+
+## 3D avatars (VRM and GLB)
+
+Both 3D variants share the same mouth engine as the flat presets, so the model talks, blinks and follows the cursor. The three.js stack is an **optional** peer dependency, lazy-loaded only when one of these variants renders — it never enters your bundle otherwise.
+
+**`vrm`** — render VRoid/VRM models with visemes and gaze tracking:
+
+```bash
+npm install three @react-three/fiber @react-three/drei @pixiv/three-vrm
+```
+
+```tsx
+<RealtimeAvatar state={aiState} analyser={analyser} variant="vrm" vrmUrl="/models/avatar.vrm" />
+```
+
+**`glb`** — render any `.glb` that exposes the **52 [ARKit blendshapes](https://arkit-face-blendshapes.com/)** (the standard `jawOpen`, `mouthFunnel`, `eyeBlinkLeft`, … morph targets). Same deal as `vrm`, minus `@pixiv/three-vrm`:
+
+```bash
+npm install three @react-three/fiber @react-three/drei
+```
+
+```tsx
+<RealtimeAvatar state={aiState} analyser={analyser} variant="glb" glbUrl="/models/rocketbox.glb" />
+```
+
+**Recommended example asset — Microsoft Rocketbox (MIT).** [Rocketbox](https://github.com/microsoft/Microsoft-Rocketbox) ships 115 rigged avatars with an ARKit-compatible blendshape variant, under the **MIT license** — the cleanest fit for this library's no-attribution-headaches philosophy. Rocketbox distributes `.fbx`, so convert one avatar to `.glb` once (offline, via [FBX2GLTF](https://github.com/facebookincubator/FBX2glTF) or Blender's glTF 2.0 export, keeping the blendshapes) and drop it in `public/models/`. Keep the MIT notice alongside it. [Ready Player Me](https://docs.readyplayer.me/ready-player-me/api-reference/avatars/morph-targets/apple-arkit) avatars (`?morphTargets=ARKit`) also work out of the box.
+
+## DiceBear avatars (`dicebear`)
+
+Generate [DiceBear](https://www.dicebear.com) avatars client-side — deterministic per `seed`, no network call. The packages are **optional** peer dependencies, lazy-loaded only when this variant renders:
+
+```bash
+npm install @dicebear/core @dicebear/collection
+```
+
+```tsx
+<RealtimeAvatar
+  state={aiState}
+  analyser={analyser}
+  variant="dicebear"
+  dicebearCollection="open-peeps" // curated CC0 style id
+  dicebearSeed="ada-lovelace" // same seed + style => same face
+/>
+```
+
+**Licensing:** DiceBear ships ~30 styles under mixed licenses. This library's catalog (`DICEBEAR_STYLES`) is curated to **CC0 1.0** styles that have a face — `pixel-art`(+`-neutral`), `lorelei`(+`-neutral`), `notionists`(+`-neutral`), `open-peeps`, `thumbs` — so it keeps the same no-attribution promise as the built-in presets. You *can* pass any other DiceBear style id to `dicebearCollection`, but then its license (e.g. CC BY 4.0 for `adventurer`, or "free for personal and commercial use" for `bottts`) is your responsibility — same deal as `byos`.
+
+**Animation:** DiceBear SVGs have no `#rra-*` hooks, but their *option API* lets us pick which mouth/eyes variant to render. So every curated style actually **talks**: it pre-generates a few frames of the same avatar (same seed ⇒ identical hair/skin/etc.) with closed / mid / open mouths — plus a blink frame where the style allows — and swaps which frame is shown per audio frame, with a subtle bounce on top. Real articulation via the supported API, no fragile path hacks. The per-style variant choices live in the exported `DICEBEAR_RIGS` map. (A non-rigged style id you pass yourself — e.g. a faceless abstract DiceBear style — falls back to a pure audio-reactive bounce.) State color and the thinking bubble still come from the surrounding `<RealtimeAvatar />` chrome.
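The frame-swapping approach in the Animation paragraph can be pictured with a small sketch. This is an assumption-laden illustration rather than the package's code: `pickMouthFrame`, the two thresholds and the placeholder frame strings are all hypothetical, and the real per-style choices live in `DICEBEAR_RIGS`, which is not reproduced here.

```typescript
// Hypothetical frame picker for pre-generated closed / mid / open mouth frames.
type MouthFrame = "closed" | "mid" | "open";

// Map a 0..1 mouth energy to one of three frames. Thresholds are illustrative.
function pickMouthFrame(energy: number, midAt = 0.15, openAt = 0.5): MouthFrame {
  const clamped = Math.min(1, Math.max(0, energy));
  if (clamped >= openAt) return "open";
  if (clamped >= midAt) return "mid";
  return "closed";
}

// Placeholder strings standing in for SVG frames generated once per seed and style.
const pregeneratedFrames: Record<MouthFrame, string> = {
  closed: "<svg><!-- closed-mouth frame --></svg>",
  mid: "<svg><!-- mid-mouth frame --></svg>",
  open: "<svg><!-- open-mouth frame --></svg>",
};

// Per animation tick: look up the frame to show for the current energy.
function frameFor(energy: number): string {
  return pregeneratedFrames[pickMouthFrame(energy)];
}
```

Because the frames are generated up front from the same seed, swapping between them only changes the mouth, which is the property the paragraph relies on.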
+
+## API reference
+
+### `<RealtimeAvatar />`
+
+- `state` (`'idle' | 'listening' | 'thinking' | 'speaking' | 'working'`) — required. You resolve it; it is never inferred. `working` is the tool-use state for agentic UIs (amber).
+- `tool` (`string`) — optional. The name of the tool currently running. While `state="working"`, the state pill reads `Working: {tool}` instead of the generic label.
+- `analyser` (`AnalyserNode | null`) — optional. Drives the mouth from audio. Omitted or `null`, speaking falls back to the synthetic pattern.
+- `streamingText` (`string`) — optional. Declarative mouth driver: pass the accumulated assistant text (e.g. from `useChat`) and the avatar diffs its growth to drive the mouth. Takes precedence over `analyser`. See [Text-streaming LLMs](#text-streaming-llms-no-audio).
+- `speechActivity` (`SpeechActivitySource`) — optional. Imperative token-rate mouth driver, from `createSpeechActivity()`. Takes precedence over both `streamingText` and `analyser` when set.
+- `size` (`number`) — px, default `280`.
+- `variant` — see catalog above. Default `'geometric'`.
+- `children` — your SVG, for `variant="byos"`.
+- `vrmUrl` (`string`) — CORS-enabled `.vrm` URL, for `variant="vrm"`.
+- `glbUrl` (`string`) — CORS-enabled `.glb` URL with ARKit blendshapes, for `variant="glb"`.
+- `dicebearCollection` (`string`) — DiceBear style id (curated CC0 set), for `variant="dicebear"`.
+- `dicebearSeed` (`string`) — deterministic DiceBear seed, for `variant="dicebear"`.
+- `subtitle` / `thought` (`string`) — optional movie-style caption and a thought bubble. Pass raw text or markdown: both are flattened to spoken prose and rolled to a trailing window internally, so a long streamed reply never overflows or shows raw `**`/tables. For a long assistant reply, keep the full markdown in your chat transcript and pass the same text here for the short live caption.
+- `showGlow` / `showStatePill` / `showThought` / `showSubtitle` (`boolean`) — HUD satellites, each `true` by default. Set any to `false` to hide it individually: the reactive glow, the state pill, the thought bubble, and the subtitle respectively. The built-in subtitle/thought float `absolute` around the face (needs open canvas); inside a constrained card, set `showSubtitle={false}` / `showThought={false}` and render `<AvatarCaption>` / `<AvatarThought>` in your own layout slot instead.
+- `maxMouthOpening`, `mouseTrackingIntensity`, `blinkIntervalMin/Max`, `blinkDuration` — animation tuning.
+- `stateColors`, `stateLabels` — theming; labels are announced via `aria-live`. Both cover all five states including `working`.
+- `customization` — preset colors and accessories (skin, hair, clothing, glasses, headphones…).
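Since the `state` prop is resolved by the host app and never inferred, a short sketch of what that resolution might look like may help. The five state names and the `tool` prop come from the list above. `SessionSnapshot`, its fields, `resolveAvatarState` and the priority order are hypothetical and would depend on your own pipeline.

```typescript
// The five documented values of the `state` prop.
type AvatarState = "idle" | "listening" | "thinking" | "speaking" | "working";

// Hypothetical host-side flags: your app would derive these from its own
// microphone, socket and provider events.
interface SessionSnapshot {
  micOpen: boolean;           // the user is currently being captured
  awaitingReply: boolean;     // request sent, nothing streamed back yet
  replyStreaming: boolean;    // audio or text is arriving from the model
  runningTool: string | null; // name of the tool being executed, if any
}

// Collapse the flags into the props the avatar expects (assumed priority order).
function resolveAvatarState(s: SessionSnapshot): { state: AvatarState; tool?: string } {
  if (s.runningTool !== null) return { state: "working", tool: s.runningTool };
  if (s.replyStreaming) return { state: "speaking" };
  if (s.awaitingReply) return { state: "thinking" };
  if (s.micOpen) return { state: "listening" };
  return { state: "idle" };
}
```

The returned object could then be spread onto the component, for example `<RealtimeAvatar {...resolveAvatarState(snapshot)} />`, with `snapshot` being whatever state container your app already keeps.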
+
+### Building blocks
+
+Everything the runtime uses is exported, so you can compose your own:
+
+- `ContractAvatar` — wraps any contract-compliant SVG with the runtime.
+- `useAvatarRuntime(containerRef, options)` — the animation runtime itself.
+- `createMouthEngine(source)` / `useAudioMouth(...)` — the source→mouth analysis (amplitude + A/E/O shapes), procedural fallback included. `source` is an `AnalyserNode`, a `SpeechActivitySource`, or `null`.
+- `createSpeechActivity(options?)` — the token-rate mouth driver for text streams (`push` / `end` / `reset` / `sample`).
+- `useStreamingTextActivity(text)` — declarative wrapper: diffs accumulated streaming text into a `SpeechActivitySource` for you (what the `streamingText` prop uses).
+- `useReducedMotion()` — SSR-safe `prefers-reduced-motion` hook.
+- `GeometricAvatar`, `MemojiAvatar`, `PixelArtAvatar`, `DoodleAvatar` — the raw presets.
+- `SquirrelAvatar` — a full branded character (red-squirrel dev face) built on the `#rra-*` contract; the worked `byos` example, shipped so the demos render it from one source. See [`examples/08-character-avatar-squirrel.tsx`](examples/08-character-avatar-squirrel.tsx).
+- `AudioVisualizer` — Siri-style waveform telemetry strip.
+- `AvatarCaption` / `AvatarThought` — host-placed caption + thought widgets. In-flow (not `absolute`), so they fit your own layout slot without overflow; both flatten markdown to spoken prose and roll a trailing window.
+- `toPlainText(md)` / `tailWindow(text, { maxChars })` — the pure text helpers behind those widgets, for building your own caption.
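The "roll a trailing window" behaviour mentioned for the caption widgets can be approximated as follows. This is a stand-alone sketch of the general idea, not the exported `tailWindow`: `rollTail`, its word-boundary trimming and the `...` prefix are assumptions made for illustration.

```typescript
// Hypothetical trailing-window helper: keeps roughly the last `maxChars`
// characters of a growing caption, cut at a word boundary.
function rollTail(text: string, maxChars: number): string {
  const flat = text.replace(/\s+/g, " ").trim(); // collapse whitespace and newlines
  if (flat.length <= maxChars) return flat;

  const start = flat.length - maxChars;
  const tail = flat.slice(start);

  // If the cut landed mid-word, drop the partial word at the front.
  const cutAtBoundary = flat[start - 1] === " ";
  const firstSpace = tail.indexOf(" ");
  const clean = cutAtBoundary || firstSpace === -1 ? tail : tail.slice(firstSpace + 1);

  return "..." + clean;
}
```

Called on every render with the accumulated reply, a helper like this keeps a live caption at a bounded length while the full text stays in the chat transcript.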
+
 ## Positioning

 The closest reference is [TalkingHead](https://github.com/met4citizen/TalkingHead) (3D, realistic lip-sync, Ready Player Me/Mixamo rigs). This library makes the opposite bet:
@@ -303,10 +345,13 @@ The closest reference is [TalkingHead](https://github.com/met4citizen/TalkingHea
 | Makes visible | the voice | the *thinking* |
 | Setup | avatar platform + Blender + rig | `npm i` + one component |

-##
+## Examples
+
+Copy-pasteable, single-file integration examples — including a reference relay server for real voice/text providers — live in [`examples/`](examples/). One file per integration pattern (quickstart, `useChat`, imperative streaming, audio analyser, the avatar catalog, `byos`, Gemini Live voice, a branded character). The runnable, hosted versions (client-side mock, no API key) live on the [docs site](https://react-ai-avatar-site.vercel.app/).
+
+## Contributing

-This repo is the library only — no app or backend. The runnable, hosted demos
-live on the project's docs site (built separately, client-side mock, no API key).
+This repo is the library only — no app or backend. The runnable, hosted demos live on the project's docs site (built separately, client-side mock, no API key).

 ```bash
 npm install
@@ -315,8 +360,7 @@ npm run lint # tsc --noEmit
 npm run build:lib # builds the publishable package into dist/lib
 ```

-
-real voice/text providers — live in [`examples/`](examples/).
+Issues and pull requests are welcome — bug fixes, new presets that follow the `#rra-*` layer contract, and integration examples especially. Keep the library presentational and provider-agnostic: it never fetches, and the audio/three.js peers stay optional and lazy-loaded.

 ## License

package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "react-ai-avatar",
-"version": "0.1.
+"version": "0.1.3",
 "description": "A presentational React avatar for realtime LLM voice UIs — you bring the connection, it brings the face.",
 "license": "MIT",
 "author": "Ariel A. <ariel.a.deibe@gmail.com>",