react-native-litert-lm 0.2.1 โ 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +331 -150
- package/android/build.gradle +1 -1
- package/android/src/main/java/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLM.kt +140 -37
- package/app.plugin.js +33 -0
- package/cpp/HybridLiteRTLM.cpp +577 -378
- package/cpp/HybridLiteRTLM.hpp +66 -23
- package/cpp/IOSDownloadHelper.h +24 -0
- package/cpp/cpp-adapter.cpp +10 -2
- package/cpp/include/litert_lm_engine.h +502 -0
- package/ios/IOSDownloadHelper.mm +129 -0
- package/ios/LiteRTLMAutolinking.mm +30 -0
- package/lib/hooks.d.ts +33 -3
- package/lib/hooks.js +54 -23
- package/lib/index.d.ts +4 -1
- package/lib/index.js +6 -6
- package/lib/memoryTracker.d.ts +128 -0
- package/lib/memoryTracker.js +155 -0
- package/lib/modelFactory.d.ts +21 -2
- package/lib/modelFactory.js +78 -11
- package/lib/specs/LiteRTLM.nitro.d.ts +19 -0
- package/nitrogen/generated/android/LiteRTLMOnLoad.cpp +28 -18
- package/nitrogen/generated/android/LiteRTLMOnLoad.hpp +13 -4
- package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.cpp +39 -36
- package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.hpp +20 -22
- package/nitrogen/generated/android/c++/JMemoryUsage.hpp +69 -0
- package/nitrogen/generated/android/kotlin/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLMSpec.kt +19 -18
- package/nitrogen/generated/android/kotlin/com/margelo/nitro/dev/litert/litertlm/MemoryUsage.kt +47 -0
- package/nitrogen/generated/shared/c++/HybridLiteRTLMSpec.cpp +1 -0
- package/nitrogen/generated/shared/c++/HybridLiteRTLMSpec.hpp +4 -0
- package/nitrogen/generated/shared/c++/MemoryUsage.hpp +95 -0
- package/package.json +12 -5
- package/react-native-litert-lm.podspec +20 -7
- package/scripts/build-ios-engine.sh +283 -0
- package/scripts/download-ios-frameworks.sh +72 -0
- package/scripts/postinstall.js +116 -0
- package/scripts/stubs/cxx_bridge_stubs.cc +224 -0
- package/scripts/stubs/gemma_model_constraint_provider.cc +46 -0
- package/scripts/stubs/llguidance_stubs.c +101 -0
- package/src/hooks.ts +107 -41
- package/src/index.ts +13 -6
- package/src/memoryTracker.ts +268 -0
- package/src/modelFactory.ts +107 -11
- package/src/specs/LiteRTLM.nitro.ts +21 -0
package/README.md
CHANGED
|
@@ -1,21 +1,19 @@
|
|
|
1
1
|
# react-native-litert-lm
|
|
2
2
|
|
|
3
|
-
High-performance LLM inference for React Native powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro
|
|
3
|
+
High-performance on-device LLM inference for React Native, powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro Modules](https://github.com/mrousavy/nitro). Optimized for **Gemma 3n** and other on-device language models.
|
|
4
4
|
|
|
5
5
|
## Features
|
|
6
6
|
|
|
7
|
-
- ๐ **Native Performance**
|
|
8
|
-
- ๐ง **Gemma 3n Ready**
|
|
9
|
-
- โก **GPU Acceleration**
|
|
10
|
-
-
|
|
11
|
-
-
|
|
12
|
-
-
|
|
13
|
-
-
|
|
14
|
-
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
> โ ๏ธ **Early Preview**: This library is under active development. Android is functional with enough RAM, iOS implementation pending LiteRT-LM iOS release. Please report any issues on the [GitHub issues](https://github.com/hung-yueh/react-native-litert-lm/issues).
|
|
7
|
+
- ๐ **Native Performance** โ Kotlin (Android) / C++ (iOS) via Nitro Modules JSI bindings
|
|
8
|
+
- ๐ง **Gemma 3n Ready** โ First-class support for Gemma 3n E2B/E4B models
|
|
9
|
+
- โก **GPU Acceleration** โ GPU delegate (Android), Metal/MPS (iOS)
|
|
10
|
+
- ๐ **Streaming Support** โ Token-by-token generation callbacks
|
|
11
|
+
- ๐ฑ **Cross-Platform** โ Android API 26+ / iOS 15.0+
|
|
12
|
+
- ๐ผ๏ธ **Multimodal** โ Image and audio input support (Android)
|
|
13
|
+
- ๐งต **Async API** โ Non-blocking inference on background threads
|
|
14
|
+
- ๐ **Real Memory Tracking** โ OS-level memory metrics (RSS, native heap, available memory) via native APIs
|
|
15
|
+
- ๐งฎ **Zero-Copy Buffers** โ Memory snapshots stored in native ArrayBuffers via Nitro Modules
|
|
16
|
+
- ๐ฅ **Automatic Model Download** โ Downloads models from URL with progress tracking and local caching
|
|
19
17
|
|
|
20
18
|
## Installation
|
|
21
19
|
|
|
@@ -42,65 +40,88 @@ Then create a development build:
|
|
|
42
40
|
|
|
43
41
|
```bash
|
|
44
42
|
npx expo prebuild
|
|
45
|
-
npx expo run:android
|
|
43
|
+
npx expo run:android # Android
|
|
44
|
+
npx expo run:ios # iOS
|
|
46
45
|
```
|
|
47
46
|
|
|
48
|
-
> **Note**: Only ARM devices are supported
|
|
47
|
+
> **Note**: Only ARM devices/simulators are supported. x86_64 Android emulators are not supported.
|
|
49
48
|
|
|
50
49
|
### Bare React Native
|
|
51
50
|
|
|
52
51
|
```bash
|
|
52
|
+
# Android
|
|
53
53
|
cd android && ./gradlew clean
|
|
54
|
-
|
|
54
|
+
|
|
55
|
+
# iOS
|
|
56
|
+
cd ios && pod install
|
|
55
57
|
```
|
|
56
58
|
|
|
57
59
|
## Example App
|
|
58
60
|
|
|
59
|
-
The
|
|
61
|
+
The `example/` directory contains a fully functional test app with a dark-themed diagnostic UI that demonstrates:
|
|
62
|
+
|
|
63
|
+
- Model downloading with progress tracking
|
|
64
|
+
- Text inference (blocking and streaming)
|
|
65
|
+
- Multi-turn conversation with context retention
|
|
66
|
+
- Performance benchmarking (tokens/sec, latency)
|
|
67
|
+
- Real-time memory tracking
|
|
68
|
+
- Quick chat interface
|
|
69
|
+
|
|
70
|
+
### Running the Example
|
|
71
|
+
|
|
72
|
+
1. **Build the library** (compiles TypeScript to `lib/`):
|
|
60
73
|
|
|
61
|
-
|
|
74
|
+
```bash
|
|
75
|
+
npm run build
|
|
76
|
+
```
|
|
62
77
|
|
|
63
|
-
|
|
78
|
+
2. **Install example dependencies:**
|
|
64
79
|
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
80
|
+
```bash
|
|
81
|
+
cd example
|
|
82
|
+
npm install
|
|
83
|
+
```
|
|
68
84
|
|
|
69
|
-
|
|
85
|
+
3. **Create a development build and run:**
|
|
70
86
|
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
87
|
+
```bash
|
|
88
|
+
npx expo prebuild --clean
|
|
89
|
+
npx expo run:android # Android
|
|
90
|
+
npx expo run:ios # iOS (requires XCFramework โ see "Building the iOS Engine" below)
|
|
91
|
+
```
|
|
74
92
|
|
|
75
|
-
|
|
76
|
-
```bash
|
|
77
|
-
npx expo run:android
|
|
78
|
-
```
|
|
93
|
+
> **Note:** If you change native code (C++/Kotlin/Obj-C++), you must run `npx expo prebuild --clean` again before rebuilding.
|
|
79
94
|
|
|
80
95
|
## Model Management
|
|
81
96
|
|
|
82
|
-
LiteRT-LM models (like Gemma 3n) are large files (
|
|
97
|
+
LiteRT-LM models (like Gemma 3n) are large files (3 GB+) and cannot be bundled into your app binary. They are downloaded at runtime.
|
|
83
98
|
|
|
84
99
|
### Automatic Downloading
|
|
85
100
|
|
|
86
|
-
The library
|
|
101
|
+
The library handles downloading automatically when you pass a URL to `loadModel` or `useModel`. Downloads include:
|
|
102
|
+
|
|
103
|
+
- **Progress tracking** โ real-time download percentage via callbacks
|
|
104
|
+
- **Local caching** โ downloaded models are cached and reused across app launches
|
|
105
|
+
- **Android**: app-local temp directory
|
|
106
|
+
- **iOS**: `Library/Caches/litert_models/` (survives app relaunch; reclaimable by iOS under storage pressure)
|
|
107
|
+
- **HTTPS enforcement** โ only secure URLs are accepted
|
|
87
108
|
|
|
88
109
|
### Manual Downloading (Optional)
|
|
89
110
|
|
|
90
|
-
If you prefer to manage downloads
|
|
111
|
+
If you prefer to manage downloads yourself (e.g., using `expo-file-system`), download the `.litertlm` file to a local path and pass that path to the library:
|
|
91
112
|
|
|
92
113
|
```typescript
|
|
93
|
-
import
|
|
94
|
-
// or import * as FileSystem from 'expo-file-system';
|
|
114
|
+
import * as FileSystem from "expo-file-system";
|
|
95
115
|
|
|
96
116
|
const MODEL_URL =
|
|
97
117
|
"https://huggingface.co/litert-community/gemma-3n-2b-it/resolve/main/model.litertlm";
|
|
98
|
-
const localPath = `${FileSystem.
|
|
118
|
+
const localPath = `${FileSystem.documentDirectory}gemma-3n.litertlm`;
|
|
99
119
|
|
|
100
120
|
async function downloadModel() {
|
|
101
|
-
|
|
121
|
+
const info = await FileSystem.getInfoAsync(localPath);
|
|
122
|
+
if (info.exists) return localPath;
|
|
102
123
|
|
|
103
|
-
|
|
124
|
+
await FileSystem.downloadAsync(MODEL_URL, localPath);
|
|
104
125
|
return localPath;
|
|
105
126
|
}
|
|
106
127
|
```
|
|
@@ -109,26 +130,27 @@ async function downloadModel() {
|
|
|
109
130
|
|
|
110
131
|
### React Hook (Recommended)
|
|
111
132
|
|
|
112
|
-
The `useModel` hook manages the model lifecycle
|
|
133
|
+
The `useModel` hook manages the full model lifecycle: downloading, loading, inference, and cleanup.
|
|
113
134
|
|
|
114
135
|
```typescript
|
|
115
136
|
import { useModel, GEMMA_3N_E2B_IT_INT4 } from "react-native-litert-lm";
|
|
137
|
+
import { Platform } from "react-native";
|
|
116
138
|
|
|
117
139
|
function App() {
|
|
118
140
|
const {
|
|
119
141
|
model,
|
|
120
142
|
isReady,
|
|
121
143
|
downloadProgress,
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
);
|
|
144
|
+
error,
|
|
145
|
+
load, // Manually trigger load
|
|
146
|
+
deleteModel, // Delete cached model file
|
|
147
|
+
memorySummary, // Auto-updated memory stats (if tracking enabled)
|
|
148
|
+
} = useModel(GEMMA_3N_E2B_IT_INT4, {
|
|
149
|
+
backend: Platform.OS === 'ios' ? 'gpu' : 'cpu',
|
|
150
|
+
autoLoad: true, // Default: true. Set false to load manually via load().
|
|
151
|
+
systemPrompt: "You are a helpful assistant.",
|
|
152
|
+
enableMemoryTracking: true,
|
|
153
|
+
});
|
|
132
154
|
|
|
133
155
|
if (!isReady) {
|
|
134
156
|
return <Text>Loading... {Math.round(downloadProgress * 100)}%</Text>;
|
|
@@ -156,7 +178,7 @@ await llm.loadModel("https://example.com/model.litertlm", {
|
|
|
156
178
|
systemPrompt: "You are a helpful assistant.",
|
|
157
179
|
});
|
|
158
180
|
|
|
159
|
-
// Generate response
|
|
181
|
+
// Generate a response
|
|
160
182
|
const response = await llm.sendMessage("What is the capital of France?");
|
|
161
183
|
console.log(response);
|
|
162
184
|
|
|
@@ -173,15 +195,16 @@ llm.sendMessageAsync("Tell me a story", (token, done) => {
|
|
|
173
195
|
});
|
|
174
196
|
```
|
|
175
197
|
|
|
176
|
-
### Multimodal (Image/Audio)
|
|
198
|
+
### Multimodal (Image / Audio)
|
|
199
|
+
|
|
200
|
+
> **Note**: Multimodal is fully supported on Android. iOS has the code paths implemented but vision/audio executors may not be available in the current XCFramework build โ use `checkMultimodalSupport()` to verify at runtime.
|
|
177
201
|
|
|
178
202
|
```typescript
|
|
179
203
|
import { checkMultimodalSupport } from "react-native-litert-lm";
|
|
180
204
|
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
console.warn(error); // iOS not yet supported
|
|
205
|
+
const warning = checkMultimodalSupport();
|
|
206
|
+
if (warning) {
|
|
207
|
+
console.warn(warning); // Experimental on iOS
|
|
185
208
|
} else {
|
|
186
209
|
// Image input (for vision models like Gemma 3n)
|
|
187
210
|
// Images >1024px are automatically resized to prevent OOM
|
|
@@ -190,7 +213,7 @@ if (error) {
|
|
|
190
213
|
"/path/to/image.jpg",
|
|
191
214
|
);
|
|
192
215
|
|
|
193
|
-
// Audio input
|
|
216
|
+
// Audio input
|
|
194
217
|
const transcription = await llm.sendMessageWithAudio(
|
|
195
218
|
"Transcribe this audio",
|
|
196
219
|
"/path/to/audio.wav",
|
|
@@ -198,183 +221,341 @@ if (error) {
|
|
|
198
221
|
}
|
|
199
222
|
```
|
|
200
223
|
|
|
201
|
-
###
|
|
224
|
+
### Performance Stats
|
|
202
225
|
|
|
203
226
|
```typescript
|
|
204
227
|
const stats = llm.getStats();
|
|
205
228
|
console.log(`Generated ${stats.completionTokens} tokens`);
|
|
206
229
|
console.log(`Speed: ${stats.tokensPerSecond.toFixed(1)} tokens/sec`);
|
|
230
|
+
console.log(`Time to first token: ${stats.timeToFirstToken.toFixed(0)} ms`);
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
### Memory Tracking
|
|
234
|
+
|
|
235
|
+
The library provides real OS-level memory data โ no estimation. It reads directly from `mach_task_basic_info` (iOS) and `Debug.getNativeHeapAllocatedSize()` + `/proc/self/status` (Android).
|
|
236
|
+
|
|
237
|
+
#### Direct Memory Query
|
|
238
|
+
|
|
239
|
+
```typescript
|
|
240
|
+
const usage = llm.getMemoryUsage();
|
|
241
|
+
console.log(
|
|
242
|
+
`Native heap: ${(usage.nativeHeapBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
243
|
+
);
|
|
244
|
+
console.log(`RSS: ${(usage.residentBytes / 1024 / 1024).toFixed(1)} MB`);
|
|
245
|
+
console.log(
|
|
246
|
+
`Available: ${(usage.availableMemoryBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
247
|
+
);
|
|
248
|
+
console.log(`Low memory: ${usage.isLowMemory}`);
|
|
249
|
+
```
|
|
250
|
+
|
|
251
|
+
#### Automatic Tracking with Native Buffers
|
|
252
|
+
|
|
253
|
+
Enable memory tracking to automatically record snapshots in a native-backed `ArrayBuffer` after every inference call:
|
|
254
|
+
|
|
255
|
+
```typescript
|
|
256
|
+
const llm = createLLM({
|
|
257
|
+
enableMemoryTracking: true,
|
|
258
|
+
maxMemorySnapshots: 256,
|
|
259
|
+
});
|
|
260
|
+
|
|
261
|
+
await llm.loadModel("/path/to/model.litertlm", { backend: "cpu" });
|
|
262
|
+
await llm.sendMessage("Hello!");
|
|
263
|
+
|
|
264
|
+
const summary = llm.memoryTracker!.getSummary();
|
|
265
|
+
console.log(
|
|
266
|
+
`Peak RSS: ${(summary.peakResidentBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
267
|
+
);
|
|
268
|
+
console.log(
|
|
269
|
+
`RSS Delta: ${(summary.residentDeltaBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
270
|
+
);
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
#### Using `useModel` with Memory Tracking
|
|
274
|
+
|
|
275
|
+
```typescript
|
|
276
|
+
const { model, isReady, memorySummary } = useModel(modelUrl, {
|
|
277
|
+
enableMemoryTracking: true,
|
|
278
|
+
maxMemorySnapshots: 100,
|
|
279
|
+
});
|
|
280
|
+
|
|
281
|
+
// memorySummary auto-updates after each inference call
|
|
282
|
+
if (memorySummary) {
|
|
283
|
+
console.log(`Current RSS: ${memorySummary.currentResidentBytes}`);
|
|
284
|
+
console.log(`Peak RSS: ${memorySummary.peakResidentBytes}`);
|
|
285
|
+
}
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
#### Standalone Memory Tracker
|
|
289
|
+
|
|
290
|
+
```typescript
|
|
291
|
+
import {
|
|
292
|
+
createMemoryTracker,
|
|
293
|
+
createNativeBuffer,
|
|
294
|
+
} from "react-native-litert-lm";
|
|
295
|
+
|
|
296
|
+
const tracker = createMemoryTracker(100);
|
|
297
|
+
|
|
298
|
+
tracker.record({
|
|
299
|
+
timestamp: Date.now(),
|
|
300
|
+
nativeHeapBytes: 50_000_000,
|
|
301
|
+
residentBytes: 200_000_000,
|
|
302
|
+
availableMemoryBytes: 4_000_000_000,
|
|
303
|
+
});
|
|
304
|
+
|
|
305
|
+
// Access the underlying native buffer (zero-copy transfer to native code)
|
|
306
|
+
const buffer = tracker.getNativeBuffer();
|
|
207
307
|
```
|
|
208
308
|
|
|
209
309
|
## Supported Models
|
|
210
310
|
|
|
211
|
-
Download `.litertlm` models automatically using the exported constants or from [HuggingFace](https://huggingface.co/litert-community):
|
|
311
|
+
Download `.litertlm` models automatically using the exported URL constants, or manually from [HuggingFace](https://huggingface.co/litert-community):
|
|
312
|
+
|
|
313
|
+
| Constant | Model | Size | Min RAM |
|
|
314
|
+
| :--------------------- | :------------------------------------- | :---- | :------ |
|
|
315
|
+
| `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~3 GB | 4 GB+ |
|
|
212
316
|
|
|
213
|
-
|
|
214
|
-
| :--------------------- | :------------------------------------- | :--- | :------------- |
|
|
215
|
-
| `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~3GB | 4GB+ |
|
|
317
|
+
**Other compatible models** (download manually from HuggingFace):
|
|
216
318
|
|
|
217
|
-
|
|
|
218
|
-
| ------------- |
|
|
219
|
-
| Gemma 3n E4B | ~
|
|
220
|
-
| Gemma 3 1B | ~
|
|
221
|
-
| Phi-4 Mini | ~
|
|
222
|
-
| Qwen 2.5 1.5B | ~1.
|
|
319
|
+
| Model | Size | Min RAM | Notes |
|
|
320
|
+
| ------------- | ------- | ------- | --------------------- |
|
|
321
|
+
| Gemma 3n E4B | ~4 GB | 8 GB+ | Higher quality |
|
|
322
|
+
| Gemma 3 1B | ~1 GB | 4 GB+ | Smallest, fastest |
|
|
323
|
+
| Phi-4 Mini | ~2 GB | 4 GB+ | Microsoft's small LLM |
|
|
324
|
+
| Qwen 2.5 1.5B | ~1.5 GB | 4 GB+ | Multilingual |
|
|
223
325
|
|
|
224
326
|
## API Reference
|
|
225
327
|
|
|
226
|
-
### `createLLM(): LiteRTLM`
|
|
328
|
+
### `createLLM(options?): LiteRTLM`
|
|
227
329
|
|
|
228
330
|
Creates a new LLM inference engine instance.
|
|
229
331
|
|
|
332
|
+
- `options.enableMemoryTracking` โ enable automatic memory snapshot recording
|
|
333
|
+
- `options.maxMemorySnapshots` โ max number of snapshots to retain (default: 256)
|
|
334
|
+
|
|
230
335
|
### `loadModel(path, config?): Promise<void>`
|
|
231
336
|
|
|
232
|
-
|
|
233
|
-
- `config.systemPrompt` - System prompt to guide model behavior (e.g., "You are a helpful assistant.")
|
|
234
|
-
- `config.backend` - `'cpu'` | `'gpu'` | `'npu'` (default: `'gpu'`)
|
|
235
|
-
- `config.temperature` - Sampling temperature (default: 0.7)
|
|
236
|
-
- `config.topK` - Top-K sampling (default: 40)
|
|
237
|
-
- `config.maxTokens` - Max generation length (default: 1024)
|
|
337
|
+
Loads a model from a local path or HTTPS URL.
|
|
238
338
|
|
|
239
|
-
|
|
339
|
+
| Parameter | Type | Default | Description |
|
|
340
|
+
| --------------------- | -------- | ------- | ----------------------------------------- |
|
|
341
|
+
| `path` | `string` | โ | Absolute path to `.litertlm` or HTTPS URL |
|
|
342
|
+
| `config.backend` | `string` | `'gpu'` | `'cpu'`, `'gpu'`, or `'npu'` |
|
|
343
|
+
| `config.systemPrompt` | `string` | โ | System prompt for the model |
|
|
344
|
+
| `config.temperature` | `number` | `0.7` | Sampling temperature |
|
|
345
|
+
| `config.topK` | `number` | `40` | Top-K sampling |
|
|
346
|
+
| `config.topP` | `number` | `0.95` | Top-P (nucleus) sampling |
|
|
347
|
+
| `config.maxTokens` | `number` | `1024` | Maximum generation length |
|
|
240
348
|
|
|
241
349
|
#### Backend Options
|
|
242
350
|
|
|
243
|
-
| Backend |
|
|
244
|
-
| ------- |
|
|
245
|
-
| `'cpu'` | CPU inference
|
|
246
|
-
| `'gpu'` | GPU
|
|
247
|
-
| `'npu'` | NPU/Neural Engine | Fastest | Requires supported hardware
|
|
351
|
+
| Backend | Engine | Speed | Notes |
|
|
352
|
+
| ------- | ------------------- | ------- | ---------------------------------------------- |
|
|
353
|
+
| `'cpu'` | CPU inference | Slowest | Always available, lower RAM requirement |
|
|
354
|
+
| `'gpu'` | GPU / Metal | Fast | Recommended default |
|
|
355
|
+
| `'npu'` | NPU / Neural Engine | Fastest | Requires supported hardware; falls back to GPU |
|
|
248
356
|
|
|
249
|
-
>
|
|
357
|
+
> **iOS**: `'gpu'` uses Metal/MPS and is the recommended backend. The engine automatically tries multiple backend combinations if the primary one fails.
|
|
250
358
|
|
|
251
359
|
### `sendMessage(message): Promise<string>`
|
|
252
360
|
|
|
253
|
-
|
|
361
|
+
Runs inference synchronously on a background thread. Returns the complete response.
|
|
254
362
|
|
|
255
363
|
### `sendMessageAsync(message, callback)`
|
|
256
364
|
|
|
257
|
-
Streaming generation. Callback
|
|
365
|
+
Streaming generation. Callback signature: `(token: string, isDone: boolean) => void`.
|
|
258
366
|
|
|
259
367
|
### `sendMessageWithImage(message, imagePath): Promise<string>`
|
|
260
368
|
|
|
261
|
-
Send a message with an image
|
|
369
|
+
Send a message with an image (Android only; for vision models like Gemma 3n).
|
|
262
370
|
|
|
263
371
|
### `sendMessageWithAudio(message, audioPath): Promise<string>`
|
|
264
372
|
|
|
265
|
-
Send a message with
|
|
373
|
+
Send a message with audio (Android only).
|
|
374
|
+
|
|
375
|
+
### `getStats(): GenerationStats`
|
|
376
|
+
|
|
377
|
+
Returns performance metrics from the last inference call.
|
|
378
|
+
|
|
379
|
+
```typescript
|
|
380
|
+
interface GenerationStats {
|
|
381
|
+
tokensPerSecond: number;
|
|
382
|
+
totalTime: number; // seconds
|
|
383
|
+
timeToFirstToken: number; // seconds
|
|
384
|
+
promptTokens: number;
|
|
385
|
+
completionTokens: number;
|
|
386
|
+
prefillSpeed: number; // tokens/sec
|
|
387
|
+
}
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
### `getMemoryUsage(): MemoryUsage`
|
|
391
|
+
|
|
392
|
+
Returns real OS-level memory usage.
|
|
393
|
+
|
|
394
|
+
```typescript
|
|
395
|
+
interface MemoryUsage {
|
|
396
|
+
nativeHeapBytes: number;
|
|
397
|
+
residentBytes: number;
|
|
398
|
+
availableMemoryBytes: number;
|
|
399
|
+
isLowMemory: boolean;
|
|
400
|
+
}
|
|
401
|
+
```
|
|
266
402
|
|
|
267
403
|
### `getHistory(): Message[]`
|
|
268
404
|
|
|
269
|
-
|
|
405
|
+
Returns the conversation history.
|
|
270
406
|
|
|
271
407
|
### `resetConversation()`
|
|
272
408
|
|
|
273
|
-
|
|
409
|
+
Clears conversation context and starts a fresh session.
|
|
274
410
|
|
|
275
411
|
### `close()`
|
|
276
412
|
|
|
277
|
-
|
|
413
|
+
Releases all native resources. Call when the model is no longer needed.
|
|
278
414
|
|
|
279
415
|
### `deleteModel(fileName): Promise<void>`
|
|
280
416
|
|
|
281
|
-
Deletes a model file from the app's
|
|
417
|
+
Deletes a cached model file from the app's local storage.
|
|
282
418
|
|
|
283
|
-
###
|
|
419
|
+
### Utility Functions
|
|
284
420
|
|
|
285
|
-
|
|
421
|
+
```typescript
|
|
422
|
+
import {
|
|
423
|
+
checkBackendSupport,
|
|
424
|
+
checkMultimodalSupport,
|
|
425
|
+
getRecommendedBackend,
|
|
426
|
+
applyGemmaTemplate,
|
|
427
|
+
applyPhiTemplate,
|
|
428
|
+
applyLlamaTemplate,
|
|
429
|
+
} from "react-native-litert-lm";
|
|
286
430
|
|
|
287
|
-
|
|
431
|
+
// Check if a backend is supported
|
|
432
|
+
const warning = checkBackendSupport("npu"); // string | undefined
|
|
433
|
+
const mmError = checkMultimodalSupport(); // string | undefined
|
|
434
|
+
const backend = getRecommendedBackend(); // 'gpu' | 'cpu'
|
|
288
435
|
|
|
289
|
-
|
|
436
|
+
// Manual prompt formatting (advanced)
|
|
437
|
+
const prompt = applyGemmaTemplate(
|
|
438
|
+
[{ role: "user", content: "Hello!" }],
|
|
439
|
+
"You are helpful.",
|
|
440
|
+
);
|
|
441
|
+
```
|
|
290
442
|
|
|
291
|
-
|
|
292
|
-
import { checkBackendSupport } from "react-native-litert-lm";
|
|
443
|
+
## Requirements
|
|
293
444
|
|
|
294
|
-
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
|
|
298
|
-
|
|
445
|
+
| Dependency | Version |
|
|
446
|
+
| -------------------------- | ------------- |
|
|
447
|
+
| React Native | 0.76+ |
|
|
448
|
+
| react-native-nitro-modules | 0.35.0+ |
|
|
449
|
+
| Android API | 26+ (ARM64) |
|
|
450
|
+
| iOS | 15.0+ (ARM64) |
|
|
451
|
+
| LiteRT-LM Android SDK | 0.9.0-alpha01 |
|
|
452
|
+
| LiteRT-LM iOS Engine | v0.9.0 |
|
|
299
453
|
|
|
300
|
-
|
|
454
|
+
## Platform Support
|
|
301
455
|
|
|
302
|
-
|
|
456
|
+
| Platform | Status | Architecture | Backends |
|
|
457
|
+
| -------- | -------- | ------------ | ---------------- |
|
|
458
|
+
| Android | โ
Ready | arm64-v8a | CPU, GPU, NPU |
|
|
459
|
+
| iOS | โ
Ready | arm64 | CPU, GPU (Metal) |
|
|
303
460
|
|
|
304
|
-
|
|
305
|
-
import { checkMultimodalSupport } from "react-native-litert-lm";
|
|
461
|
+
### iOS Feature Matrix
|
|
306
462
|
|
|
307
|
-
|
|
308
|
-
|
|
309
|
-
|
|
310
|
-
|
|
311
|
-
|
|
463
|
+
| Feature | Status | Notes |
|
|
464
|
+
| ---------------------------- | ------ | ----------------------------------------------------- |
|
|
465
|
+
| Text inference (blocking) | โ
| Via LiteRT-LM C API |
|
|
466
|
+
| Text inference (streaming) | โ
| Token-by-token callbacks |
|
|
467
|
+
| GPU inference (Metal/MPS) | โ
| Recommended backend |
|
|
468
|
+
| Model download with progress | โ
| NSURLSession, cached in `Caches/` |
|
|
469
|
+
| Memory tracking | โ
| `mach_task_basic_info` |
|
|
470
|
+
| Multi-turn conversation | โ
| Context retained across turns |
|
|
471
|
+
| Multimodal (image/audio) | ๐งช | Code paths exist; vision/audio executors experimental |
|
|
472
|
+
| Constrained decoding | โ | Requires llguidance Rust runtime |
|
|
473
|
+
| Function calling | โ | Requires Rust CXX bridge runtime |
|
|
312
474
|
|
|
313
|
-
|
|
475
|
+
## Building the iOS Engine
|
|
314
476
|
|
|
315
|
-
|
|
477
|
+
The iOS build uses a **Bazel-to-XCFramework pipeline** that compiles the LiteRT-LM C engine and all transitive dependencies into a static library (~83 MB).
|
|
316
478
|
|
|
317
|
-
|
|
318
|
-
|
|
319
|
-
|
|
320
|
-
|
|
321
|
-
applyLlamaTemplate,
|
|
322
|
-
ChatMessage,
|
|
323
|
-
} from "react-native-litert-lm";
|
|
479
|
+
### Prerequisites
|
|
480
|
+
|
|
481
|
+
- **Bazel 7.6.1+** (via [Bazelisk](https://github.com/bazelbuild/bazelisk) recommended)
|
|
482
|
+
- **Xcode command line tools** (`xcode-select --install`)
|
|
324
483
|
|
|
325
|
-
|
|
326
|
-
{ role: "user", content: "Hello!" },
|
|
327
|
-
{ role: "model", content: "Hi there!" },
|
|
328
|
-
{ role: "user", content: "Tell me a joke" },
|
|
329
|
-
];
|
|
484
|
+
### Build
|
|
330
485
|
|
|
331
|
-
|
|
332
|
-
|
|
486
|
+
```bash
|
|
487
|
+
./scripts/build-ios-engine.sh
|
|
488
|
+
```
|
|
489
|
+
|
|
490
|
+
This will:
|
|
333
491
|
|
|
334
|
-
|
|
335
|
-
|
|
492
|
+
1. Clone/checkout LiteRT-LM `v0.9.0` source into `.litert-lm-build/`
|
|
493
|
+
2. Build `//c:engine` for `ios_arm64` and `ios_sim_arm64` via Bazel
|
|
494
|
+
3. Collect all transitive `.o` files (engine, protobuf, re2, sentencepiece, etc.)
|
|
495
|
+
4. Compile C/C++ stubs for unavailable Rust dependencies
|
|
496
|
+
5. Patch `PromptTemplate` to use a simplified template engine (no Rust MinijinjaTemplate)
|
|
497
|
+
6. Merge ~1,900 object files into a static library via `libtool`
|
|
498
|
+
7. Package into `ios/Frameworks/LiteRTLM.xcframework`
|
|
336
499
|
|
|
337
|
-
|
|
338
|
-
|
|
500
|
+
### Output
|
|
501
|
+
|
|
502
|
+
```
|
|
503
|
+
ios/Frameworks/LiteRTLM.xcframework/
|
|
504
|
+
โโโ Info.plist
|
|
505
|
+
โโโ ios-arm64/LiteRTLM.framework/ # Device
|
|
506
|
+
โ โโโ LiteRTLM # ~81 MB static library
|
|
507
|
+
โ โโโ Headers/litert_lm_engine.h
|
|
508
|
+
โโโ ios-arm64-simulator/LiteRTLM.framework/ # Simulator
|
|
509
|
+
โโโ LiteRTLM # ~83 MB static library
|
|
510
|
+
โโโ Headers/litert_lm_engine.h
|
|
339
511
|
```
|
|
340
512
|
|
|
341
|
-
|
|
513
|
+
### FFI Stubs
|
|
342
514
|
|
|
343
|
-
-
|
|
344
|
-
- react-native-nitro-modules 0.33.2+
|
|
345
|
-
- Android API 26+ (ARM64 only)
|
|
346
|
-
- **LiteRT-LM Android SDK**: `0.9.0-alpha01` (bundled automatically)
|
|
347
|
-
- iOS 15.0+ (coming soon)
|
|
515
|
+
Certain LiteRT-LM features depend on Rust libraries (llguidance, CXX bridge, MinijinjaTemplate) that are not available in the iOS Bazel build. These are replaced with stubs:
|
|
348
516
|
|
|
349
|
-
|
|
517
|
+
| Stub File | Location | Purpose |
|
|
518
|
+
| ------------------------------------ | ---------------- | ---------------------------------------- |
|
|
519
|
+
| `cxx_bridge_stubs.cc` | `scripts/stubs/` | CXX bridge runtime + Rust FFI type stubs |
|
|
520
|
+
| `llguidance_stubs.c` | `scripts/stubs/` | llguidance constrained decoding C API |
|
|
521
|
+
| `gemma_model_constraint_provider.cc` | `scripts/stubs/` | Gemma constraint provider factory |
|
|
522
|
+
|
|
523
|
+
Additionally, `PromptTemplate` is patched at build time to use a simplified C++ template formatter instead of the Rust MinijinjaTemplate, which avoids all Rust FFI calls during conversation setup.
|
|
350
524
|
|
|
351
|
-
|
|
352
|
-
| -------- | -------- | ------------ |
|
|
353
|
-
| Android | โ
Ready | arm64-v8a |
|
|
354
|
-
| iOS | ๐ง Stub | - |
|
|
525
|
+
> **Text inference works fully without these Rust components.** Only constrained decoding, function calling parsers, and advanced Jinja2 template features are affected.
|
|
355
526
|
|
|
356
527
|
## Architecture
|
|
357
528
|
|
|
358
|
-
|
|
529
|
+
```
|
|
530
|
+
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
|
|
531
|
+
โ React Native (TypeScript) โ
|
|
532
|
+
โ useModel() / createLLM() / sendMessage() โ
|
|
533
|
+
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
|
|
534
|
+
โ Nitro Modules JSI Bridge โ
|
|
535
|
+
โโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโค
|
|
536
|
+
โ Android (Kotlin) โ iOS (C++) โ
|
|
537
|
+
โ HybridLiteRTLM.kt โ HybridLiteRTLM.cpp โ
|
|
538
|
+
โ litertlm-android โ LiteRTLM C API โ
|
|
539
|
+
โ AAR (GPU delegate) โ XCFramework (Metal) โ
|
|
540
|
+
โโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโ
|
|
541
|
+
```
|
|
359
542
|
|
|
360
|
-
- **Android**:
|
|
361
|
-
- **iOS**:
|
|
543
|
+
- **Android**: Kotlin (`HybridLiteRTLM.kt`) interfacing with the `litertlm-android` AAR.
|
|
544
|
+
- **iOS**: C++ (`HybridLiteRTLM.cpp`) interfacing with the LiteRT-LM C API via a prebuilt `LiteRTLM.xcframework`. Platform-specific code (model downloading, file management) is in Objective-C++ (`ios/IOSDownloadHelper.mm`).
|
|
362
545
|
|
|
363
|
-
> **
|
|
546
|
+
> **For contributors**: Changes to `cpp/HybridLiteRTLM.cpp` do not affect Android. Feature changes must be applied to both the Kotlin and C++ implementations.
|
|
364
547
|
|
|
365
548
|
## License
|
|
366
549
|
|
|
367
550
|
The code in this repository is licensed under the **[MIT License](LICENSE)**.
|
|
368
551
|
|
|
369
|
-
### โ ๏ธ
|
|
370
|
-
|
|
371
|
-
This library acts as an execution engine for On-Device Large Language Models (LLMs). The AI models themselves are **not** distributed with this package and are **not** covered by the MIT license.
|
|
552
|
+
### โ ๏ธ AI Model Disclaimer
|
|
372
553
|
|
|
373
|
-
|
|
554
|
+
This library is an execution engine for on-device LLMs. The AI models themselves are **not** distributed with this package and have their own licenses:
|
|
374
555
|
|
|
375
556
|
- **Gemma (Google)**: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
|
|
376
557
|
- **Llama 3 (Meta)**: [Llama 3.2 Community License](https://www.llama.com/llama3/license/)
|
|
377
|
-
- **Qwen (Alibaba)**: [Apache 2.0
|
|
558
|
+
- **Qwen (Alibaba)**: [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE)
|
|
378
559
|
- **Phi (Microsoft)**: [MIT License](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/LICENSE)
|
|
379
560
|
|
|
380
|
-
|
|
561
|
+
By downloading and using these models, you agree to their respective licenses and acceptable use policies. The author of `react-native-litert-lm` takes no responsibility for model outputs or applications built with them.
|