react-ai-avatar 0.1.1 → 0.1.3

Files changed (2)
  1. package/README.md +191 -140
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,10 +1,33 @@
1
- # react-ai-avatar
1
+ <p align="center">
2
+ <img src="./assets/logo.svg" alt="react-ai-avatar" width="116" height="116" />
3
+ </p>
2
4
 
3
- > A presentational React avatar for realtime LLM voice UIs — **you bring the connection, it brings the face.**
5
+ <h1 align="center">react-ai-avatar</h1>
4
6
 
5
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+ <p align="center">
8
+ <strong>A face for your AI.</strong><br/>
9
+ A presentational React avatar for realtime LLM voice &amp; text UIs — you bring the connection,<br/>
10
+ it brings the face that visibly <strong>listens, thinks and speaks</strong>.
11
+ </p>
6
12
 
7
- A lightweight, MIT-licensed React library that renders an animated avatar reacting to your AI's conversation state and audio. It is **completely LLM-agnostic**: it doesn't know about Gemini, OpenAI or ElevenLabs. You pass two live things — a `state` and (optionally) a WebAudio `AnalyserNode` — and it does the rest.
13
+ <p align="center">
14
+ <a href="https://www.npmjs.com/package/react-ai-avatar"><img alt="npm version" src="https://img.shields.io/npm/v/react-ai-avatar?color=0d9488"></a>
15
+ <a href="https://www.npmjs.com/package/react-ai-avatar"><img alt="npm downloads" src="https://img.shields.io/npm/dm/react-ai-avatar?color=0d9488"></a>
16
+ <img alt="minzipped size" src="https://img.shields.io/bundlephobia/minzip/react-ai-avatar?color=0d9488">
17
+ <img alt="types included" src="https://img.shields.io/npm/types/react-ai-avatar?color=0d9488">
18
+ <a href="./LICENSE"><img alt="MIT license" src="https://img.shields.io/npm/l/react-ai-avatar?color=0d9488"></a>
19
+ </p>
20
+
21
+ <p align="center">
22
+ <a href="https://www.npmjs.com/package/react-ai-avatar"><b>npm</b></a> &nbsp;·&nbsp;
23
+ <a href="https://react-ai-avatar-site.vercel.app/#/docs"><b>Documentation</b></a> &nbsp;·&nbsp;
24
+ <a href="https://react-ai-avatar-site.vercel.app/"><b>Live demos</b></a> &nbsp;·&nbsp;
25
+ <a href="#quickstart"><b>Quickstart</b></a>
26
+ </p>
27
+
28
+ ---
29
+
30
+ **react-ai-avatar** handles exactly one step of your voice/chat pipeline: turning audio amplitude and conversation-state changes into a face that visibly reacts. It is **completely LLM-agnostic** — it doesn't know about Gemini, OpenAI or ElevenLabs. You pass two live things — a `state` and (optionally) a WebAudio `AnalyserNode` — and it does the rest. Your host app keeps the microphone, the WebSocket and the AI provider; **none of those dependencies enter your bundle**. One thing, done well, embeddable in a few lines, no backend, MIT.
8
31
 
9
32
  ```tsx
10
33
  import { RealtimeAvatar } from 'react-ai-avatar';
@@ -14,37 +37,17 @@ import 'react-ai-avatar/style.css';
14
37
  <RealtimeAvatar state="speaking" />
15
38
  ```
16
39
 
17
- ## Philosophy
18
-
19
- One thing, done well, embeddable in a few lines, no backend, MIT. The library handles exactly one step of your voice pipeline: turning audio amplitude + state changes into a face that visibly **listens, thinks and speaks**. Your host app keeps the microphone, the WebSocket and the AI provider — none of those dependencies enter your bundle.
40
+ <p align="center">
41
+ <img src="./assets/banner.png" alt="The react-ai-avatar catalog reacting to conversation state" width="100%" />
42
+ </p>
20
43
 
21
- ## Features
22
-
23
- - 👄 **Audio-reactive mouth** — analyzes amplitude and frequency bands in real time. This is deliberately *not* phoneme-perfect "lip-sync": an `AnalyserNode` gives energy, not phonemes, and for flat avatars amplitude is what looks right.
24
- - 🦺 **Graceful degradation** — `analyser={null}` while `state="speaking"`? The mouth animates with a synthetic speech-like pattern instead of freezing. Perfect for demos and non-WebRTC apps.
25
- - ⌨️ **Text-streaming LLMs too** — no audio? Drive the mouth from *token cadence* with `createSpeechActivity()`. A text-only assistant (OpenAI-style `/chat/completions` or `/responses` with `stream: true`) gets a face that visibly tracks the stream — busy while tokens arrive, settling on pauses.
26
- - 🧠 **A visible `thinking` state** — pulsing thought bubble + upward gaze. Your users *see* the LLM thinking, not just a color change.
27
- - 🎨 **Own-design avatar catalog** — `geometric`, `memoji`, `pixelart`, `doodle`: four MIT, CC0-safe SVG presets. No third-party assets, no attribution headaches.
28
- - 🎲 **DiceBear avatars (`dicebear`)** — generate deterministic [DiceBear](https://www.dicebear.com) avatars client-side, from a curated **CC0-only** style set (still no attribution). Animated with an audio-reactive bounce.
29
- - 🔌 **Bring your own SVG (`byos`)** — any SVG implementing the small layer contract gets the full animation runtime for free. Your avatar, your license.
30
- - ♿ **Production quality** — SSR-safe (Next.js friendly), honors `prefers-reduced-motion`, announces state changes via `aria-live`.
31
- - 🧊 **Optional 3D (VRM)** — `variant="vrm"` renders VRoid/VRM models with visemes and gaze tracking. The three.js stack is an *optional* peer dependency, lazy-loaded only if you use it.
32
-
33
- ## Installation
44
+ ## Quickstart
34
45
 
35
46
  ```bash
36
47
  npm install react-ai-avatar motion
37
48
  ```
38
49
 
39
- `react`, `react-dom` and `motion` are peer dependencies. For the optional VRM variant, also install:
40
-
41
- ```bash
42
- npm install three @react-three/fiber @react-three/drei @pixiv/three-vrm
43
- ```
44
-
45
- ## Quick start
46
-
47
- The only prop you *have* to pass is `state` — you resolve it in your app, the avatar never infers it. Everything else has a default, so this already works:
50
+ `react`, `react-dom` and `motion` are peer dependencies. The only prop you *have* to pass is `state` — you resolve it in your app, the avatar never infers it:
48
51
 
49
52
  ```tsx
50
53
  import { RealtimeAvatar } from 'react-ai-avatar';
@@ -52,17 +55,13 @@ import 'react-ai-avatar/style.css';
52
55
 
53
56
  export default function App() {
54
57
  // You resolve this in your app (Gemini, OpenAI Realtime, WebRTC, anything)
55
- const aiState = 'speaking'; // 'idle' | 'listening' | 'thinking' | 'speaking'
58
+ const aiState = 'speaking'; // 'idle' | 'listening' | 'thinking' | 'speaking' | 'working'
56
59
 
57
60
  return <RealtimeAvatar state={aiState} />;
58
61
  }
59
62
  ```
60
63
 
61
- With no `analyser`, `speaking` falls back to a synthetic speech-like mouth — great for getting something on screen before the audio pipeline exists. Pass an `AnalyserNode` to make the mouth react to real audio (see [Getting an `AnalyserNode`](#getting-an-analysernode)).
62
-
63
- ### Customizing further
64
-
65
- Every default is overridable. Opt into as much as you need:
64
+ With no `analyser`, `speaking` falls back to a synthetic speech-like mouth — great for getting something on screen before the audio pipeline exists. Pass an `AnalyserNode` to make the mouth react to real audio (see [Driving the mouth](#driving-the-mouth)). Every default is overridable:
66
65
 
67
66
  ```tsx
68
67
  <RealtimeAvatar
@@ -71,10 +70,47 @@ Every default is overridable. Opt into as much as you need:
71
70
  size={300} // default 280
72
71
  variant="geometric" // 'geometric' | 'memoji' | 'pixelart' | 'doodle' | 'dicebear' | 'vrm' | 'glb' | 'byos'
73
72
  customization={{ skinColor: '#f5c7a9', hairColor: '#2c2c2c', glasses: true, headphones: true }}
74
- stateColors={{ idle: '#4b5563', listening: '#3b82f6', thinking: '#8b5cf6', speaking: '#10b981' }}
73
+ stateColors={{ idle: '#4b5563', listening: '#3b82f6', thinking: '#8b5cf6', speaking: '#10b981', working: '#f59e0b' }}
75
74
  />
76
75
  ```
77
76
 
77
+ For the optional 3D (VRM) variant, also install `three @react-three/fiber @react-three/drei @pixiv/three-vrm`; for `glb`, the same minus `@pixiv/three-vrm`; for `dicebear`, `@dicebear/core @dicebear/collection`. All are **optional** peer dependencies, lazy-loaded only if you use that variant.
78
+
79
+ ## Table of contents
80
+
81
+ - [Philosophy](#philosophy)
82
+ - [Features](#features)
83
+ - [The avatar catalog](#the-avatar-catalog)
84
+ - [Driving the mouth](#driving-the-mouth)
85
+ - [Audio: getting an `AnalyserNode`](#audio-getting-an-analysernode)
86
+ - [Text-streaming LLMs (no audio)](#text-streaming-llms-no-audio)
87
+ - [Bring your own SVG (`byos`)](#bring-your-own-svg-byos)
88
+ - [3D avatars (VRM and GLB)](#3d-avatars-vrm-and-glb)
89
+ - [DiceBear avatars (`dicebear`)](#dicebear-avatars-dicebear)
90
+ - [API reference](#api-reference)
91
+ - [Building blocks](#building-blocks)
92
+ - [Positioning](#positioning)
93
+ - [Examples](#examples)
94
+ - [Contributing](#contributing)
95
+ - [License](#license)
96
+
97
+ ## Philosophy
98
+
99
+ One thing, done well, embeddable in a few lines, no backend, MIT. The library handles exactly one step of your voice pipeline: turning audio amplitude + state changes into a face that visibly **listens, thinks and speaks**. Your host app keeps the microphone, the WebSocket and the AI provider — none of those dependencies enter your bundle.
100
+
101
+ ## Features
102
+
103
+ - 👄 **Audio-reactive mouth** — analyzes amplitude and frequency bands in real time. This is deliberately *not* phoneme-perfect "lip-sync": an `AnalyserNode` gives energy, not phonemes, and for flat avatars amplitude is what looks right.
104
+ - 🦺 **Graceful degradation** — `analyser={null}` while `state="speaking"`? The mouth animates with a synthetic speech-like pattern instead of freezing. Perfect for demos and non-WebRTC apps.
105
+ - ⌨️ **Text-streaming LLMs too** — no audio? Drive the mouth from *token cadence* with `createSpeechActivity()`. A text-only assistant (OpenAI-style `/chat/completions` or `/responses` with `stream: true`) gets a face that visibly tracks the stream — busy while tokens arrive, settling on pauses.
106
+ - 🧠 **A visible `thinking` state** — pulsing thought bubble + upward gaze. Your users *see* the LLM thinking, not just a color change.
107
+ - 🛠️ **A `working` state for tool use** — the fifth state, for agentic UIs. While the LLM runs a tool, the face goes amber and the state pill reads `Working: {tool}` (pass the tool name via the `tool` prop). Your users see *which* tool is running, not just a spinner.
108
+ - 🎨 **Own-design avatar catalog** — `geometric`, `memoji`, `pixelart`, `doodle`: four MIT, CC0-safe SVG presets. No third-party assets, no attribution headaches.
109
+ - 🎲 **DiceBear avatars (`dicebear`)** — generate deterministic [DiceBear](https://www.dicebear.com) avatars client-side, from a curated **CC0-only** style set (still no attribution). Animated with an audio-reactive bounce.
110
+ - 🔌 **Bring your own SVG (`byos`)** — any SVG implementing the small layer contract gets the full animation runtime for free. Your avatar, your license.
111
+ - ♿ **Production quality** — SSR-safe (Next.js friendly), honors `prefers-reduced-motion`, announces state changes via `aria-live`.
112
+ - 🧊 **Optional 3D (VRM/GLB)** — `variant="vrm"` / `variant="glb"` render VRoid/VRM and ARKit-rigged glTF models with visemes and gaze tracking. The three.js stack is an *optional* peer dependency, lazy-loaded only if you use it.
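
Because the avatar never infers `state`, the host app has to map its own events onto the five states. A minimal sketch of that mapping follows. The `AgentEvent` shape and the `resolveAvatarProps` helper are hypothetical and not part of the library; only the five state names and the `tool` prop come from this README.

```typescript
// The five states named in this README.
type AvatarState = 'idle' | 'listening' | 'thinking' | 'speaking' | 'working';

// Hypothetical host-app events; your provider's events will differ.
type AgentEvent =
  | { kind: 'user-audio-start' }
  | { kind: 'user-audio-end' }
  | { kind: 'tool-call'; name: string }
  | { kind: 'assistant-output-start' }
  | { kind: 'assistant-output-end' };

interface AvatarProps {
  state: AvatarState;
  tool?: string; // only meaningful while state is 'working'
}

// Hypothetical helper: translate one host event into avatar props.
function resolveAvatarProps(event: AgentEvent): AvatarProps {
  switch (event.kind) {
    case 'user-audio-start':
      return { state: 'listening' };
    case 'user-audio-end':
      return { state: 'thinking' };
    case 'tool-call':
      return { state: 'working', tool: event.name };
    case 'assistant-output-start':
      return { state: 'speaking' };
    case 'assistant-output-end':
      return { state: 'idle' };
  }
}
```

The result can be spread onto the component, for example `<RealtimeAvatar {...resolveAvatarProps(event)} />`.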
113
+
78
114
  ## The avatar catalog
79
115
 
80
116
  | variant | style | notes |
@@ -90,104 +126,11 @@ Every default is overridable. Opt into as much as you need:
90
126
 
91
127
  All built-in presets are original designs licensed MIT — nothing inside this package requires attribution.
92
128
 
93
- ## DiceBear avatars (`dicebear`)
94
-
95
- Generate [DiceBear](https://www.dicebear.com) avatars client-side — deterministic per `seed`, no network call. The packages are **optional** peer dependencies, lazy-loaded only when this variant renders:
96
-
97
- ```bash
98
- npm install @dicebear/core @dicebear/collection
99
- ```
100
-
101
- ```tsx
102
- <RealtimeAvatar
103
- state={aiState}
104
- analyser={analyser}
105
- variant="dicebear"
106
- dicebearCollection="open-peeps" // curated CC0 style id
107
- dicebearSeed="ada-lovelace" // same seed + style => same face
108
- />
109
- ```
110
-
111
- **Licensing:** DiceBear ships ~30 styles under mixed licenses. This library's catalog (`DICEBEAR_STYLES`) is curated to **CC0 1.0** styles that have a face — `pixel-art`(+`-neutral`), `lorelei`(+`-neutral`), `notionists`(+`-neutral`), `open-peeps`, `thumbs` — so it keeps the same no-attribution promise as the built-in presets. You *can* pass any other DiceBear style id to `dicebearCollection`, but then its license (e.g. CC BY 4.0 for `adventurer`, or "free for personal and commercial use" for `bottts`) is your responsibility — same deal as `byos`.
112
-
113
- **Animation:** DiceBear SVGs have no `#rra-*` hooks, but their *option API* lets us pick which mouth/eyes variant to render. So every curated style actually **talks**: it pre-generates a few frames of the same avatar (same seed ⇒ identical hair/skin/etc.) with closed / mid / open mouths — plus a blink frame where the style allows — and swaps which frame is shown per audio frame, with a subtle bounce on top. Real articulation via the supported API, no fragile path hacks. The per-style variant choices live in the exported `DICEBEAR_RIGS` map. (A non-rigged style id you pass yourself — e.g. a faceless abstract DiceBear style — falls back to a pure audio-reactive bounce.) State color and the thinking bubble still come from the surrounding `<RealtimeAvatar />` chrome.
114
-
115
- ## 3D GLB + ARKit (`glb`)
116
-
117
- Render any `.glb` that exposes the **52 [ARKit blendshapes](https://arkit-face-blendshapes.com/)** (the standard `jawOpen`, `mouthFunnel`, `eyeBlinkLeft`, … morph targets). Same shared mouth engine as the flat presets drives them, so the model talks, blinks and follows the cursor. The three.js stack is optional and lazy-loaded — same deal as `vrm`, minus `@pixiv/three-vrm`:
118
-
119
- ```bash
120
- npm install three @react-three/fiber @react-three/drei
121
- ```
122
-
123
- ```tsx
124
- <RealtimeAvatar
125
- state={aiState}
126
- analyser={analyser}
127
- variant="glb"
128
- glbUrl="/models/rocketbox.glb" // CORS-enabled .glb with ARKit morph targets
129
- />
130
- ```
131
-
132
- **Recommended example asset — Microsoft Rocketbox (MIT).** [Rocketbox](https://github.com/microsoft/Microsoft-Rocketbox) ships 115 rigged avatars with an ARKit-compatible blendshape variant, under the **MIT license** — the cleanest fit for this library's no-attribution-headaches philosophy. Rocketbox distributes `.fbx`, so convert one avatar to `.glb` once (offline, via [FBX2GLTF](https://github.com/facebookincubator/FBX2glTF) or Blender's glTF 2.0 export, keeping the blendshapes) and drop it in `public/models/`. Keep the MIT notice alongside it. [Ready Player Me](https://docs.readyplayer.me/ready-player-me/api-reference/avatars/morph-targets/apple-arkit) avatars (`?morphTargets=ARKit`) also work out of the box.
133
-
134
- ## Bring your own SVG (`byos`)
135
-
136
- Any SVG exposing these stable hooks is animated by the runtime — same blink, gaze, mouth and thinking behavior as the built-in presets:
137
-
138
- | hook | part | the runtime drives |
139
- |---|---|---|
140
- | `#rra-ring` | state ring | `stroke` = `stateColors[state]` |
141
- | `#rra-mouth` | mouth | ellipse: `ry`/`rx` · rect: `height` |
142
- | `.rra-pupil` (×2) | pupils | circle: `cx`/`cy` · rect: `x`/`y` (mouse tracking, thinking gaze) |
143
- | `.rra-lid` (×2) | eyelids | `height` (blink; 0 = open) |
144
- | `#rra-think` | thought bubble | `opacity` + dots pulsing while `thinking` |
145
-
146
- Optional data attributes: `data-base-x`/`data-base-y` (pupil rest position), `data-max-height` (closed lid height), `data-quantize` (snap motion to a grid — that's how the pixel-art preset stays chunky).
147
-
148
- ```tsx
149
- <RealtimeAvatar state={aiState} analyser={analyser} variant="byos">
150
- <MyOwnSvgAvatar /> {/* exposes the #rra-* hooks; its license is your business */}
151
- </RealtimeAvatar>
152
- ```
153
-
154
- ## API reference
155
-
156
- ### `<RealtimeAvatar />`
157
-
158
- - `state` (`'idle' | 'listening' | 'thinking' | 'speaking'`) — required. You resolve it; it is never inferred.
159
- - `analyser` (`AnalyserNode | null`) — optional. Drives the mouth from audio. Omitted or `null`, speaking falls back to the synthetic pattern.
160
- - `streamingText` (`string`) — optional. Declarative mouth driver: pass the accumulated assistant text (e.g. from `useChat`) and the avatar diffs its growth to drive the mouth. Takes precedence over `analyser`. See [Text-streaming LLMs](#text-streaming-llms-no-audio).
161
- - `speechActivity` (`SpeechActivitySource`) — optional. Imperative token-rate mouth driver, from `createSpeechActivity()`. Takes precedence over both `streamingText` and `analyser` when set.
162
- - `size` (`number`) — px, default `280`.
163
- - `variant` — see catalog above. Default `'geometric'`.
164
- - `children` — your SVG, for `variant="byos"`.
165
- - `vrmUrl` (`string`) — CORS-enabled `.vrm` URL, for `variant="vrm"`.
166
- - `glbUrl` (`string`) — CORS-enabled `.glb` URL with ARKit blendshapes, for `variant="glb"`.
167
- - `dicebearCollection` (`string`) — DiceBear style id (curated CC0 set), for `variant="dicebear"`.
168
- - `dicebearSeed` (`string`) — deterministic DiceBear seed, for `variant="dicebear"`.
169
- - `subtitle` / `thought` (`string`) — optional movie-style caption and a thought bubble. Pass raw text or markdown: both are flattened to spoken prose and rolled to a trailing window internally, so a long streamed reply never overflows or shows raw `**`/tables. For a long assistant reply, keep the full markdown in your chat transcript and pass the same text here for the short live caption.
170
- - `showGlow` / `showStatePill` / `showThought` / `showSubtitle` (`boolean`) — HUD satellites, each `true` by default. Set any to `false` to hide it individually: the reactive glow, the state pill, the thought bubble, and the subtitle respectively. The built-in subtitle/thought float `absolute` around the face (needs open canvas); inside a constrained card, set `showSubtitle={false}` / `showThought={false}` and render `<AvatarCaption>` / `<AvatarThought>` in your own layout slot instead.
171
- - `maxMouthOpening`, `mouseTrackingIntensity`, `blinkIntervalMin/Max`, `blinkDuration` — animation tuning.
172
- - `stateColors`, `stateLabels` — theming; labels are announced via `aria-live`.
173
- - `customization` — preset colors and accessories (skin, hair, clothing, glasses, headphones…).
174
-
175
- ### Building blocks
129
+ ## Driving the mouth
176
130
 
177
- Everything the runtime uses is exported, so you can compose your own:
131
+ The mouth has three possible drivers, in precedence order: an explicit `speechActivity` source, then `streamingText`, then the audio `analyser`. Pick whichever matches your pipeline — voice apps use the analyser, text-only LLMs use the streaming-text paths.
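
The precedence order can be pictured as a small selection function. This is an illustrative sketch only: `pickMouthDriver` and `MouthInputs` are hypothetical names, the prop values are typed loosely, and the library's internal logic may differ. Only the ordering and the synthetic fallback are taken from this README.

```typescript
// Hypothetical labels for whichever driver ends up controlling the mouth.
type MouthDriver = 'speechActivity' | 'streamingText' | 'analyser' | 'synthetic';

// Loose stand-ins for the three props; real types come from the library.
interface MouthInputs {
  speechActivity?: object | null;
  streamingText?: string;
  analyser?: object | null;
}

function pickMouthDriver(inputs: MouthInputs): MouthDriver {
  if (inputs.speechActivity != null) return 'speechActivity'; // explicit source wins
  if (inputs.streamingText !== undefined) return 'streamingText';
  if (inputs.analyser != null) return 'analyser';
  return 'synthetic'; // nothing passed: synthetic speech-like fallback
}
```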
178
132
 
179
- - `ContractAvatar` wraps any contract-compliant SVG with the runtime.
180
- - `useAvatarRuntime(containerRef, options)` — the animation runtime itself.
181
- - `createMouthEngine(source)` / `useAudioMouth(...)` — the source→mouth analysis (amplitude + A/E/O shapes), procedural fallback included. `source` is an `AnalyserNode`, a `SpeechActivitySource`, or `null`.
182
- - `createSpeechActivity(options?)` — the token-rate mouth driver for text streams (`push` / `end` / `reset` / `sample`).
183
- - `useStreamingTextActivity(text)` — declarative wrapper: diffs accumulated streaming text into a `SpeechActivitySource` for you (what the `streamingText` prop uses).
184
- - `useReducedMotion()` — SSR-safe `prefers-reduced-motion` hook.
185
- - `GeometricAvatar`, `MemojiAvatar`, `PixelArtAvatar`, `DoodleAvatar` — the raw presets.
186
- - `AudioVisualizer` — Siri-style waveform telemetry strip.
187
- - `AvatarCaption` / `AvatarThought` — host-placed caption + thought widgets. In-flow (not `absolute`), so they fit your own layout slot without overflow; both flatten markdown to spoken prose and roll a trailing window.
188
- - `toPlainText(md)` / `tailWindow(text, { maxChars })` — the pure text helpers behind those widgets, for building your own caption.
189
-
190
- ## Getting an `AnalyserNode`
133
+ ### Audio: getting an `AnalyserNode`
191
134
 
192
135
  The standard recipe for base64 PCM streams (what Gemini Live / OpenAI Realtime return):
193
136
 
@@ -207,13 +150,13 @@ function playAudioChunk(pcmData: Float32Array) {
207
150
  }
208
151
  ```
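
Once an `AnalyserNode` exists, an amplitude value can be read from it on each frame. The sketch below assumes time-domain bytes from the standard `getByteTimeDomainData` call. `rmsAmplitude` is a hypothetical helper for illustration and is not necessarily how the library computes its mouth energy.

```typescript
// Bytes from AnalyserNode.getByteTimeDomainData are centered on 128 (silence).
function rmsAmplitude(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const centered = (samples[i] - 128) / 128; // map 0..255 to roughly -1..1
    sumSquares += centered * centered;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Browser usage (not executed here):
//   const buf = new Uint8Array(analyser.fftSize);
//   analyser.getByteTimeDomainData(buf);
//   const energy = rmsAmplitude(buf);
```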
209
152
 
210
- ## Text-streaming LLMs (no audio)
153
+ ### Text-streaming LLMs (no audio)
211
154
 
212
155
  Not every assistant speaks. For a text-only LLM that streams tokens — OpenAI-style `/chat/completions` or `/responses` with `stream: true`, or local servers like Ollama / LM Studio / vLLM — there's no `AnalyserNode` to read. Instead, drive the mouth from **token cadence**: the rhythm of arriving text becomes the same 0..1 energy signal the audio path produces. The mouth is busy while the model emits text and settles shut on pauses or when the stream ends. The library still never fetches anything — you own the stream, it owns the face.
213
156
 
214
157
  There are two ways to feed it, matching the two ways React apps consume streams.
215
158
 
216
- ### Declarative — `streamingText` (the easy path)
159
+ #### Declarative — `streamingText` (the easy path)
217
160
 
218
161
  If you use a streaming chat hook — the [Vercel AI SDK](https://sdk.vercel.ai)'s `useChat` is the de-facto standard — you never see raw chunks: you get the **accumulated** assistant message (it grows each render) plus a `status`. Both map straight onto the avatar. Pass the text, the avatar diffs its growth internally and drives the mouth. No refs, no reader loop:
219
162
 
@@ -238,7 +181,7 @@ function ChatAvatar() {
238
181
 
239
182
  That's the whole integration. `streamingText` takes precedence over `analyser`; the ambient glow reacts to it too. Works with every variant — flat presets, DiceBear, VRM and GLB.
240
183
 
241
- ### Imperative — `createSpeechActivity()` (you own the reader loop)
184
+ #### Imperative — `createSpeechActivity()` (you own the reader loop)
242
185
 
243
186
  Hand-rolling `fetch` or driving the OpenAI SDK's `for await` yourself? Then you *do* have the raw chunks — feed their cadence directly with a `SpeechActivitySource`:
244
187
 
@@ -284,6 +227,112 @@ function TextAvatar() {
284
227
 
285
228
  > [`examples/03-streaming-text-imperative.tsx`](examples/03-streaming-text-imperative.tsx) shows this end-to-end against an OpenAI-compatible endpoint. The browser only ever talks to your own `/api/chat`; a tiny reference relay that proxies to the provider (so the key never reaches the client) lives in [`examples/server/proxy.ts`](examples/server/proxy.ts).
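
To make the idea of token cadence concrete, here is one way arriving chunks could be turned into a decaying 0..1 energy value. `CadenceMeter` is a hypothetical illustration and not the library's `createSpeechActivity()`; the half-life and the per-chunk increments are assumed values.

```typescript
// Hypothetical illustration of "token cadence -> 0..1 energy".
class CadenceMeter {
  private energy = 0;
  private lastMs = 0;
  private halfLifeMs: number;

  // halfLifeMs: how quickly the mouth settles once tokens stop (assumed default).
  constructor(halfLifeMs: number = 200) {
    this.halfLifeMs = halfLifeMs;
  }

  // Exponentially decay the stored energy up to `nowMs`.
  private decayTo(nowMs: number): void {
    const elapsed = Math.max(0, nowMs - this.lastMs);
    this.energy *= Math.pow(0.5, elapsed / this.halfLifeMs);
    this.lastMs = nowMs;
  }

  // Call once per received chunk; longer chunks add more energy, capped at 1.
  push(chunk: string, nowMs: number): void {
    this.decayTo(nowMs);
    this.energy = Math.min(1, this.energy + 0.25 + chunk.length * 0.05);
  }

  // Read the current energy, for example once per animation frame.
  sample(nowMs: number): number {
    this.decayTo(nowMs);
    return this.energy;
  }
}
```

Taking timestamps as arguments keeps the sketch deterministic; in a browser, `performance.now()` would supply them.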
286
229
 
230
+ ## Bring your own SVG (`byos`)
231
+
232
+ Any SVG exposing these stable hooks is animated by the runtime — same blink, gaze, mouth and thinking behavior as the built-in presets:
233
+
234
+ | hook | part | the runtime drives |
235
+ |---|---|---|
236
+ | `#rra-ring` | state ring | `stroke` = `stateColors[state]` |
237
+ | `#rra-mouth` | mouth | ellipse: `ry`/`rx` · rect: `height` |
238
+ | `.rra-pupil` (×2) | pupils | circle: `cx`/`cy` · rect: `x`/`y` (mouse tracking, thinking gaze) |
239
+ | `.rra-lid` (×2) | eyelids | `height` (blink; 0 = open) |
240
+ | `#rra-think` | thought bubble | `opacity` + dots pulsing while `thinking` |
241
+
242
+ Optional data attributes: `data-base-x`/`data-base-y` (pupil rest position), `data-max-height` (closed lid height), `data-quantize` (snap motion to a grid — that's how the pixel-art preset stays chunky).
243
+
244
+ ```tsx
245
+ <RealtimeAvatar state={aiState} analyser={analyser} variant="byos">
246
+ <MyOwnSvgAvatar /> {/* exposes the #rra-* hooks; its license is your business */}
247
+ </RealtimeAvatar>
248
+ ```
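
As a rough illustration of the contract table above, the sketch below pairs a minimal face with a check that the hooks are present. The SVG markup, `exampleSvg` and `missingHooks` are hypothetical and not shipped by the library, and the check uses plain string matching, so it only recognizes attributes written exactly as `id="..."` and `class="..."`.

```typescript
// Hooks from the contract table; pupils and lids are expected twice.
const REQUIRED_IDS = ['rra-ring', 'rra-mouth', 'rra-think'];
const REQUIRED_PAIRS = ['rra-pupil', 'rra-lid'];

// Hypothetical minimal face exposing every hook.
const exampleSvg = `
<svg viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg">
  <circle id="rra-ring" cx="50" cy="50" r="48" fill="none" stroke="#4b5563" />
  <circle class="rra-pupil" cx="35" cy="40" r="4" />
  <circle class="rra-pupil" cx="65" cy="40" r="4" />
  <rect class="rra-lid" x="28" y="32" width="14" height="0" />
  <rect class="rra-lid" x="58" y="32" width="14" height="0" />
  <ellipse id="rra-mouth" cx="50" cy="68" rx="10" ry="2" />
  <g id="rra-think" opacity="0"><circle cx="82" cy="18" r="3" /></g>
</svg>`;

// Returns the hooks that the given SVG source does not expose.
function missingHooks(svg: string): string[] {
  const missing: string[] = [];
  for (const id of REQUIRED_IDS) {
    if (!svg.includes(`id="${id}"`)) missing.push(`#${id}`);
  }
  for (const cls of REQUIRED_PAIRS) {
    const count = svg.split(`class="${cls}"`).length - 1;
    if (count < 2) missing.push(`.${cls} (x2)`);
  }
  return missing;
}
```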
249
+
250
+ ## 3D avatars (VRM and GLB)
251
+
252
+ Both 3D variants share the same mouth engine as the flat presets, so the model talks, blinks and follows the cursor. The three.js stack is an **optional** peer dependency, lazy-loaded only when one of these variants renders — it never enters your bundle otherwise.
253
+
254
+ **`vrm`** — render VRoid/VRM models with visemes and gaze tracking:
255
+
256
+ ```bash
257
+ npm install three @react-three/fiber @react-three/drei @pixiv/three-vrm
258
+ ```
259
+
260
+ ```tsx
261
+ <RealtimeAvatar state={aiState} analyser={analyser} variant="vrm" vrmUrl="/models/avatar.vrm" />
262
+ ```
263
+
264
+ **`glb`** — render any `.glb` that exposes the **52 [ARKit blendshapes](https://arkit-face-blendshapes.com/)** (the standard `jawOpen`, `mouthFunnel`, `eyeBlinkLeft`, … morph targets). Same deal as `vrm`, minus `@pixiv/three-vrm`:
265
+
266
+ ```bash
267
+ npm install three @react-three/fiber @react-three/drei
268
+ ```
269
+
270
+ ```tsx
271
+ <RealtimeAvatar state={aiState} analyser={analyser} variant="glb" glbUrl="/models/rocketbox.glb" />
272
+ ```
273
+
274
+ **Recommended example asset — Microsoft Rocketbox (MIT).** [Rocketbox](https://github.com/microsoft/Microsoft-Rocketbox) ships 115 rigged avatars with an ARKit-compatible blendshape variant, under the **MIT license** — the cleanest fit for this library's no-attribution-headaches philosophy. Rocketbox distributes `.fbx`, so convert one avatar to `.glb` once (offline, via [FBX2GLTF](https://github.com/facebookincubator/FBX2glTF) or Blender's glTF 2.0 export, keeping the blendshapes) and drop it in `public/models/`. Keep the MIT notice alongside it. [Ready Player Me](https://docs.readyplayer.me/ready-player-me/api-reference/avatars/morph-targets/apple-arkit) avatars (`?morphTargets=ARKit`) also work out of the box.
275
+
276
+ ## DiceBear avatars (`dicebear`)
277
+
278
+ Generate [DiceBear](https://www.dicebear.com) avatars client-side — deterministic per `seed`, no network call. The packages are **optional** peer dependencies, lazy-loaded only when this variant renders:
279
+
280
+ ```bash
281
+ npm install @dicebear/core @dicebear/collection
282
+ ```
283
+
284
+ ```tsx
285
+ <RealtimeAvatar
286
+ state={aiState}
287
+ analyser={analyser}
288
+ variant="dicebear"
289
+ dicebearCollection="open-peeps" // curated CC0 style id
290
+ dicebearSeed="ada-lovelace" // same seed + style => same face
291
+ />
292
+ ```
293
+
294
+ **Licensing:** DiceBear ships ~30 styles under mixed licenses. This library's catalog (`DICEBEAR_STYLES`) is curated to **CC0 1.0** styles that have a face — `pixel-art`(+`-neutral`), `lorelei`(+`-neutral`), `notionists`(+`-neutral`), `open-peeps`, `thumbs` — so it keeps the same no-attribution promise as the built-in presets. You *can* pass any other DiceBear style id to `dicebearCollection`, but then its license (e.g. CC BY 4.0 for `adventurer`, or "free for personal and commercial use" for `bottts`) is your responsibility — same deal as `byos`.
295
+
296
+ **Animation:** DiceBear SVGs have no `#rra-*` hooks, but their *option API* lets us pick which mouth/eyes variant to render. So every curated style actually **talks**: it pre-generates a few frames of the same avatar (same seed ⇒ identical hair/skin/etc.) with closed / mid / open mouths — plus a blink frame where the style allows — and swaps which frame is shown per audio frame, with a subtle bounce on top. Real articulation via the supported API, no fragile path hacks. The per-style variant choices live in the exported `DICEBEAR_RIGS` map. (A non-rigged style id you pass yourself — e.g. a faceless abstract DiceBear style — falls back to a pure audio-reactive bounce.) State color and the thinking bubble still come from the surrounding `<RealtimeAvatar />` chrome.
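
The frame swap described above boils down to picking one of a few pre-generated frames from the current mouth energy. A minimal sketch, assuming a 0..1 energy value: `pickMouthFrame` and its thresholds are hypothetical, and the library's actual selection logic and `DICEBEAR_RIGS` contents are not shown here.

```typescript
type MouthFrame = 'closed' | 'mid' | 'open';

// Thresholds are assumed values, chosen only for illustration.
function pickMouthFrame(energy: number): MouthFrame {
  if (energy < 0.15) return 'closed';
  if (energy < 0.5) return 'mid';
  return 'open';
}
```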
297
+
298
+ ## API reference
299
+
300
+ ### `<RealtimeAvatar />`
301
+
302
+ - `state` (`'idle' | 'listening' | 'thinking' | 'speaking' | 'working'`) — required. You resolve it; it is never inferred. `working` is the tool-use state for agentic UIs (amber).
303
+ - `tool` (`string`) — optional. The name of the tool currently running. While `state="working"`, the state pill reads `Working: {tool}` instead of the generic label.
304
+ - `analyser` (`AnalyserNode | null`) — optional. Drives the mouth from audio. Omitted or `null`, speaking falls back to the synthetic pattern.
305
+ - `streamingText` (`string`) — optional. Declarative mouth driver: pass the accumulated assistant text (e.g. from `useChat`) and the avatar diffs its growth to drive the mouth. Takes precedence over `analyser`. See [Text-streaming LLMs](#text-streaming-llms-no-audio).
306
+ - `speechActivity` (`SpeechActivitySource`) — optional. Imperative token-rate mouth driver, from `createSpeechActivity()`. Takes precedence over both `streamingText` and `analyser` when set.
307
+ - `size` (`number`) — px, default `280`.
308
+ - `variant` — see catalog above. Default `'geometric'`.
309
+ - `children` — your SVG, for `variant="byos"`.
310
+ - `vrmUrl` (`string`) — CORS-enabled `.vrm` URL, for `variant="vrm"`.
311
+ - `glbUrl` (`string`) — CORS-enabled `.glb` URL with ARKit blendshapes, for `variant="glb"`.
312
+ - `dicebearCollection` (`string`) — DiceBear style id (curated CC0 set), for `variant="dicebear"`.
313
+ - `dicebearSeed` (`string`) — deterministic DiceBear seed, for `variant="dicebear"`.
314
+ - `subtitle` / `thought` (`string`) — optional movie-style caption and a thought bubble. Pass raw text or markdown: both are flattened to spoken prose and rolled to a trailing window internally, so a long streamed reply never overflows or shows raw `**`/tables. For a long assistant reply, keep the full markdown in your chat transcript and pass the same text here for the short live caption.
315
+ - `showGlow` / `showStatePill` / `showThought` / `showSubtitle` (`boolean`) — HUD satellites, each `true` by default. Set any to `false` to hide it individually: the reactive glow, the state pill, the thought bubble, and the subtitle respectively. The built-in subtitle/thought float `absolute` around the face (needs open canvas); inside a constrained card, set `showSubtitle={false}` / `showThought={false}` and render `<AvatarCaption>` / `<AvatarThought>` in your own layout slot instead.
316
+ - `maxMouthOpening`, `mouseTrackingIntensity`, `blinkIntervalMin/Max`, `blinkDuration` — animation tuning.
317
+ - `stateColors`, `stateLabels` — theming; labels are announced via `aria-live`. Both cover all five states including `working`.
318
+ - `customization` — preset colors and accessories (skin, hair, clothing, glasses, headphones…).
319
+
320
+ ### Building blocks
321
+
322
+ Everything the runtime uses is exported, so you can compose your own:
323
+
324
+ - `ContractAvatar` — wraps any contract-compliant SVG with the runtime.
325
+ - `useAvatarRuntime(containerRef, options)` — the animation runtime itself.
326
+ - `createMouthEngine(source)` / `useAudioMouth(...)` — the source→mouth analysis (amplitude + A/E/O shapes), procedural fallback included. `source` is an `AnalyserNode`, a `SpeechActivitySource`, or `null`.
327
+ - `createSpeechActivity(options?)` — the token-rate mouth driver for text streams (`push` / `end` / `reset` / `sample`).
328
+ - `useStreamingTextActivity(text)` — declarative wrapper: diffs accumulated streaming text into a `SpeechActivitySource` for you (what the `streamingText` prop uses).
329
+ - `useReducedMotion()` — SSR-safe `prefers-reduced-motion` hook.
330
+ - `GeometricAvatar`, `MemojiAvatar`, `PixelArtAvatar`, `DoodleAvatar` — the raw presets.
331
+ - `SquirrelAvatar` — a full branded character (red-squirrel dev face) built on the `#rra-*` contract; the worked `byos` example, shipped so the demos render it from one source. See [`examples/08-character-avatar-squirrel.tsx`](examples/08-character-avatar-squirrel.tsx).
332
+ - `AudioVisualizer` — Siri-style waveform telemetry strip.
333
+ - `AvatarCaption` / `AvatarThought` — host-placed caption + thought widgets. In-flow (not `absolute`), so they fit your own layout slot without overflow; both flatten markdown to spoken prose and roll a trailing window.
334
+ - `toPlainText(md)` / `tailWindow(text, { maxChars })` — the pure text helpers behind those widgets, for building your own caption.
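
For a sense of what rolling a trailing window means for a live caption, here is a hypothetical stand-alone version. `trailingWindow` is not the library's `tailWindow`, and its word-boundary handling is an assumption made for this sketch.

```typescript
// Keep at most `maxChars` characters from the end of `text`,
// dropping a leading partial word when the cut lands mid-word.
function trailingWindow(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const start = text.length - maxChars;
  const tail = text.slice(start);
  if (text[start - 1] === ' ') return tail; // window already starts on a word boundary
  const firstSpace = tail.indexOf(' ');
  return firstSpace >= 0 ? tail.slice(firstSpace + 1) : tail;
}
```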
335
+
287
336
  ## Positioning
288
337
 
289
338
  The closest reference is [TalkingHead](https://github.com/met4citizen/TalkingHead) (3D, realistic lip-sync, Ready Player Me/Mixamo rigs). This library makes the opposite bet:
@@ -296,10 +345,13 @@ The closest reference is [TalkingHead](https://github.com/met4citizen/TalkingHea
296
345
  | Makes visible | the voice | the *thinking* |
297
346
  | Setup | avatar platform + Blender + rig | `npm i` + one component |
298
347
 
299
- ## Development
348
+ ## Examples
349
+
350
+ Copy-pasteable, single-file integration examples — including a reference relay server for real voice/text providers — live in [`examples/`](examples/). One file per integration pattern (quickstart, `useChat`, imperative streaming, audio analyser, the avatar catalog, `byos`, Gemini Live voice, a branded character). The runnable, hosted versions (client-side mock, no API key) live on the [docs site](https://react-ai-avatar-site.vercel.app/).
351
+
352
+ ## Contributing
300
353
 
301
- This repo is the library only — no app or backend. The runnable, hosted demos
302
- live on the project's docs site (built separately, client-side mock, no API key).
354
+ This repo is the library only — no app or backend. The runnable, hosted demos live on the project's docs site (built separately, client-side mock, no API key).
303
355
 
304
356
  ```bash
305
357
  npm install
@@ -308,8 +360,7 @@ npm run lint # tsc --noEmit
308
360
  npm run build:lib # builds the publishable package into dist/lib
309
361
  ```
310
362
 
311
- Copy-pasteable integration examples including a reference relay server for
312
- real voice/text providers — live in [`examples/`](examples/).
363
+ Issues and pull requests are welcome — bug fixes, new presets that follow the `#rra-*` layer contract, and integration examples especially. Keep the library presentational and provider-agnostic: it never fetches, and the audio/three.js peers stay optional and lazy-loaded.
313
364
 
314
365
  ## License
315
366
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "react-ai-avatar",
3
- "version": "0.1.1",
3
+ "version": "0.1.3",
4
4
  "description": "A presentational React avatar for realtime LLM voice UIs — you bring the connection, it brings the face.",
5
5
  "license": "MIT",
6
6
  "author": "Ariel A. <ariel.a.deibe@gmail.com>",