react-native-litert-lm 0.2.2 โ 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +270 -186
- package/android/build.gradle +1 -1
- package/android/src/main/java/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLM.kt +93 -37
- package/app.plugin.js +33 -0
- package/cpp/HybridLiteRTLM.cpp +571 -451
- package/cpp/HybridLiteRTLM.hpp +54 -23
- package/cpp/IOSDownloadHelper.h +24 -0
- package/cpp/cpp-adapter.cpp +2 -2
- package/cpp/include/litert_lm_engine.h +502 -0
- package/ios/IOSDownloadHelper.mm +129 -0
- package/ios/LiteRTLMAutolinking.mm +30 -0
- package/lib/hooks.d.ts +9 -4
- package/lib/hooks.js +34 -20
- package/lib/index.d.ts +1 -0
- package/lib/index.js +2 -5
- package/lib/memoryTracker.d.ts +1 -1
- package/lib/memoryTracker.js +1 -1
- package/lib/modelFactory.d.ts +11 -5
- package/lib/modelFactory.js +9 -4
- package/nitrogen/generated/android/LiteRTLMOnLoad.cpp +11 -4
- package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.cpp +31 -37
- package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.hpp +19 -22
- package/nitrogen/generated/android/kotlin/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLMSpec.kt +15 -18
- package/package.json +12 -5
- package/react-native-litert-lm.podspec +20 -7
- package/scripts/build-ios-engine.sh +283 -0
- package/scripts/download-ios-frameworks.sh +72 -0
- package/scripts/postinstall.js +116 -0
- package/scripts/stubs/cxx_bridge_stubs.cc +224 -0
- package/scripts/stubs/gemma_model_constraint_provider.cc +46 -0
- package/scripts/stubs/llguidance_stubs.c +101 -0
- package/src/hooks.ts +62 -39
- package/src/index.ts +4 -7
- package/src/memoryTracker.ts +1 -1
- package/src/modelFactory.ts +30 -5
package/README.md
CHANGED
|
@@ -1,23 +1,19 @@
|
|
|
1
1
|
# react-native-litert-lm
|
|
2
2
|
|
|
3
|
-
High-performance LLM inference for React Native powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro
|
|
3
|
+
High-performance on-device LLM inference for React Native, powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro Modules](https://github.com/mrousavy/nitro). Optimized for **Gemma 3n** and other on-device language models.
|
|
4
4
|
|
|
5
5
|
## Features
|
|
6
6
|
|
|
7
|
-
- ๐ **Native Performance**
|
|
8
|
-
- ๐ง **Gemma 3n Ready**
|
|
9
|
-
- โก **GPU Acceleration**
|
|
10
|
-
-
|
|
11
|
-
-
|
|
12
|
-
-
|
|
13
|
-
-
|
|
14
|
-
-
|
|
15
|
-
-
|
|
16
|
-
-
|
|
17
|
-
|
|
18
|
-
## Status
|
|
19
|
-
|
|
20
|
-
> โ ๏ธ **Early Preview**: This library is under active development. Android is functional with enough RAM, iOS implementation pending LiteRT-LM iOS release. Please report any issues on the [GitHub issues](https://github.com/hung-yueh/react-native-litert-lm/issues).
|
|
7
|
+
- ๐ **Native Performance** โ Kotlin (Android) / C++ (iOS) via Nitro Modules JSI bindings
|
|
8
|
+
- ๐ง **Gemma 3n Ready** โ First-class support for Gemma 3n E2B/E4B models
|
|
9
|
+
- โก **GPU Acceleration** โ GPU delegate (Android), Metal/MPS (iOS)
|
|
10
|
+
- ๐ **Streaming Support** โ Token-by-token generation callbacks
|
|
11
|
+
- ๐ฑ **Cross-Platform** โ Android API 26+ / iOS 15.0+
|
|
12
|
+
- ๐ผ๏ธ **Multimodal** โ Image and audio input support (Android)
|
|
13
|
+
- ๐งต **Async API** โ Non-blocking inference on background threads
|
|
14
|
+
- ๐ **Real Memory Tracking** โ OS-level memory metrics (RSS, native heap, available memory) via native APIs
|
|
15
|
+
- ๐งฎ **Zero-Copy Buffers** โ Memory snapshots stored in native ArrayBuffers via Nitro Modules
|
|
16
|
+
- ๐ฅ **Automatic Model Download** โ Downloads models from URL with progress tracking and local caching
|
|
21
17
|
|
|
22
18
|
## Installation
|
|
23
19
|
|
|
@@ -44,69 +40,88 @@ Then create a development build:
|
|
|
44
40
|
|
|
45
41
|
```bash
|
|
46
42
|
npx expo prebuild
|
|
47
|
-
npx expo run:android
|
|
43
|
+
npx expo run:android # Android
|
|
44
|
+
npx expo run:ios # iOS
|
|
48
45
|
```
|
|
49
46
|
|
|
50
|
-
> **Note**: Only ARM devices are supported
|
|
47
|
+
> **Note**: Only ARM devices/simulators are supported. x86_64 Android emulators are not supported.
|
|
51
48
|
|
|
52
49
|
### Bare React Native
|
|
53
50
|
|
|
54
51
|
```bash
|
|
52
|
+
# Android
|
|
55
53
|
cd android && ./gradlew clean
|
|
56
|
-
|
|
54
|
+
|
|
55
|
+
# iOS
|
|
56
|
+
cd ios && pod install
|
|
57
57
|
```
|
|
58
58
|
|
|
59
59
|
## Example App
|
|
60
60
|
|
|
61
|
-
The
|
|
61
|
+
The `example/` directory contains a fully functional test app with a dark-themed diagnostic UI that demonstrates:
|
|
62
|
+
|
|
63
|
+
- Model downloading with progress tracking
|
|
64
|
+
- Text inference (blocking and streaming)
|
|
65
|
+
- Multi-turn conversation with context retention
|
|
66
|
+
- Performance benchmarking (tokens/sec, latency)
|
|
67
|
+
- Real-time memory tracking
|
|
68
|
+
- Quick chat interface
|
|
62
69
|
|
|
63
|
-
|
|
70
|
+
### Running the Example
|
|
64
71
|
|
|
65
|
-
1.
|
|
72
|
+
1. **Build the library** (compiles TypeScript to `lib/`):
|
|
66
73
|
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
74
|
+
```bash
|
|
75
|
+
npm run build
|
|
76
|
+
```
|
|
70
77
|
|
|
71
|
-
2.
|
|
78
|
+
2. **Install example dependencies:**
|
|
72
79
|
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
80
|
+
```bash
|
|
81
|
+
cd example
|
|
82
|
+
npm install
|
|
83
|
+
```
|
|
77
84
|
|
|
78
|
-
3.
|
|
79
|
-
```bash
|
|
80
|
-
npx expo prebuild --clean
|
|
81
|
-
npx expo run:android
|
|
82
|
-
```
|
|
85
|
+
3. **Create a development build and run:**
|
|
83
86
|
|
|
84
|
-
|
|
87
|
+
```bash
|
|
88
|
+
npx expo prebuild --clean
|
|
89
|
+
npx expo run:android # Android
|
|
90
|
+
npx expo run:ios # iOS (requires XCFramework โ see "Building the iOS Engine" below)
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
> **Note:** If you change native code (C++/Kotlin/Obj-C++), you must run `npx expo prebuild --clean` again before rebuilding.
|
|
85
94
|
|
|
86
95
|
## Model Management
|
|
87
96
|
|
|
88
|
-
LiteRT-LM models (like Gemma 3n) are large files (
|
|
97
|
+
LiteRT-LM models (like Gemma 3n) are large files (3 GB+) and cannot be bundled into your app binary. They are downloaded at runtime.
|
|
89
98
|
|
|
90
99
|
### Automatic Downloading
|
|
91
100
|
|
|
92
|
-
The library
|
|
101
|
+
The library handles downloading automatically when you pass a URL to `loadModel` or `useModel`. Downloads include:
|
|
102
|
+
|
|
103
|
+
- **Progress tracking** โ real-time download percentage via callbacks
|
|
104
|
+
- **Local caching** โ downloaded models are cached and reused across app launches
|
|
105
|
+
- **Android**: app-local temp directory
|
|
106
|
+
- **iOS**: `Library/Caches/litert_models/` (survives app relaunch; reclaimable by iOS under storage pressure)
|
|
107
|
+
- **HTTPS enforcement** โ only secure URLs are accepted
|
|
93
108
|
|
|
94
109
|
### Manual Downloading (Optional)
|
|
95
110
|
|
|
96
|
-
If you prefer to manage downloads
|
|
111
|
+
If you prefer to manage downloads yourself (e.g., using `expo-file-system`), download the `.litertlm` file to a local path and pass that path to the library:
|
|
97
112
|
|
|
98
113
|
```typescript
|
|
99
|
-
import
|
|
100
|
-
// or import * as FileSystem from 'expo-file-system';
|
|
114
|
+
import * as FileSystem from "expo-file-system";
|
|
101
115
|
|
|
102
116
|
const MODEL_URL =
|
|
103
117
|
"https://huggingface.co/litert-community/gemma-3n-2b-it/resolve/main/model.litertlm";
|
|
104
|
-
const localPath = `${FileSystem.
|
|
118
|
+
const localPath = `${FileSystem.documentDirectory}gemma-3n.litertlm`;
|
|
105
119
|
|
|
106
120
|
async function downloadModel() {
|
|
107
|
-
|
|
121
|
+
const info = await FileSystem.getInfoAsync(localPath);
|
|
122
|
+
if (info.exists) return localPath;
|
|
108
123
|
|
|
109
|
-
|
|
124
|
+
await FileSystem.downloadAsync(MODEL_URL, localPath);
|
|
110
125
|
return localPath;
|
|
111
126
|
}
|
|
112
127
|
```
|
|
@@ -115,26 +130,27 @@ async function downloadModel() {
|
|
|
115
130
|
|
|
116
131
|
### React Hook (Recommended)
|
|
117
132
|
|
|
118
|
-
The `useModel` hook manages the model lifecycle
|
|
133
|
+
The `useModel` hook manages the full model lifecycle: downloading, loading, inference, and cleanup.
|
|
119
134
|
|
|
120
135
|
```typescript
|
|
121
136
|
import { useModel, GEMMA_3N_E2B_IT_INT4 } from "react-native-litert-lm";
|
|
137
|
+
import { Platform } from "react-native";
|
|
122
138
|
|
|
123
139
|
function App() {
|
|
124
140
|
const {
|
|
125
141
|
model,
|
|
126
142
|
isReady,
|
|
127
143
|
downloadProgress,
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
);
|
|
144
|
+
error,
|
|
145
|
+
load, // Manually trigger load
|
|
146
|
+
deleteModel, // Delete cached model file
|
|
147
|
+
memorySummary, // Auto-updated memory stats (if tracking enabled)
|
|
148
|
+
} = useModel(GEMMA_3N_E2B_IT_INT4, {
|
|
149
|
+
backend: Platform.OS === 'ios' ? 'gpu' : 'cpu',
|
|
150
|
+
autoLoad: true, // Default: true. Set false to load manually via load().
|
|
151
|
+
systemPrompt: "You are a helpful assistant.",
|
|
152
|
+
enableMemoryTracking: true,
|
|
153
|
+
});
|
|
138
154
|
|
|
139
155
|
if (!isReady) {
|
|
140
156
|
return <Text>Loading... {Math.round(downloadProgress * 100)}%</Text>;
|
|
@@ -162,7 +178,7 @@ await llm.loadModel("https://example.com/model.litertlm", {
|
|
|
162
178
|
systemPrompt: "You are a helpful assistant.",
|
|
163
179
|
});
|
|
164
180
|
|
|
165
|
-
// Generate response
|
|
181
|
+
// Generate a response
|
|
166
182
|
const response = await llm.sendMessage("What is the capital of France?");
|
|
167
183
|
console.log(response);
|
|
168
184
|
|
|
@@ -179,15 +195,16 @@ llm.sendMessageAsync("Tell me a story", (token, done) => {
|
|
|
179
195
|
});
|
|
180
196
|
```
|
|
181
197
|
|
|
182
|
-
### Multimodal (Image/Audio)
|
|
198
|
+
### Multimodal (Image / Audio)
|
|
199
|
+
|
|
200
|
+
> **Note**: Multimodal is fully supported on Android. iOS has the code paths implemented but vision/audio executors may not be available in the current XCFramework build โ use `checkMultimodalSupport()` to verify at runtime.
|
|
183
201
|
|
|
184
202
|
```typescript
|
|
185
203
|
import { checkMultimodalSupport } from "react-native-litert-lm";
|
|
186
204
|
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
console.warn(error); // iOS not yet supported
|
|
205
|
+
const warning = checkMultimodalSupport();
|
|
206
|
+
if (warning) {
|
|
207
|
+
console.warn(warning); // Experimental on iOS
|
|
191
208
|
} else {
|
|
192
209
|
// Image input (for vision models like Gemma 3n)
|
|
193
210
|
// Images >1024px are automatically resized to prevent OOM
|
|
@@ -196,7 +213,7 @@ if (error) {
|
|
|
196
213
|
"/path/to/image.jpg",
|
|
197
214
|
);
|
|
198
215
|
|
|
199
|
-
// Audio input
|
|
216
|
+
// Audio input
|
|
200
217
|
const transcription = await llm.sendMessageWithAudio(
|
|
201
218
|
"Transcribe this audio",
|
|
202
219
|
"/path/to/audio.wav",
|
|
@@ -204,58 +221,59 @@ if (error) {
|
|
|
204
221
|
}
|
|
205
222
|
```
|
|
206
223
|
|
|
207
|
-
###
|
|
224
|
+
### Performance Stats
|
|
208
225
|
|
|
209
226
|
```typescript
|
|
210
227
|
const stats = llm.getStats();
|
|
211
228
|
console.log(`Generated ${stats.completionTokens} tokens`);
|
|
212
229
|
console.log(`Speed: ${stats.tokensPerSecond.toFixed(1)} tokens/sec`);
|
|
230
|
+
console.log(`Time to first token: ${stats.timeToFirstToken.toFixed(0)} ms`);
|
|
213
231
|
```
|
|
214
232
|
|
|
215
233
|
### Memory Tracking
|
|
216
234
|
|
|
217
|
-
The library provides real OS-level memory
|
|
235
|
+
The library provides real OS-level memory data โ no estimation. It reads directly from `mach_task_basic_info` (iOS) and `Debug.getNativeHeapAllocatedSize()` + `/proc/self/status` (Android).
|
|
218
236
|
|
|
219
237
|
#### Direct Memory Query
|
|
220
238
|
|
|
221
239
|
```typescript
|
|
222
|
-
// Get a single real-time snapshot from native APIs
|
|
223
240
|
const usage = llm.getMemoryUsage();
|
|
224
|
-
console.log(
|
|
241
|
+
console.log(
|
|
242
|
+
`Native heap: ${(usage.nativeHeapBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
243
|
+
);
|
|
225
244
|
console.log(`RSS: ${(usage.residentBytes / 1024 / 1024).toFixed(1)} MB`);
|
|
226
|
-
console.log(
|
|
245
|
+
console.log(
|
|
246
|
+
`Available: ${(usage.availableMemoryBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
247
|
+
);
|
|
227
248
|
console.log(`Low memory: ${usage.isLowMemory}`);
|
|
228
249
|
```
|
|
229
250
|
|
|
230
251
|
#### Automatic Tracking with Native Buffers
|
|
231
252
|
|
|
232
|
-
Enable memory tracking to automatically record snapshots in a native-backed `ArrayBuffer`
|
|
253
|
+
Enable memory tracking to automatically record snapshots in a native-backed `ArrayBuffer` after every inference call:
|
|
233
254
|
|
|
234
255
|
```typescript
|
|
235
|
-
import { createLLM } from 'react-native-litert-lm';
|
|
236
|
-
|
|
237
256
|
const llm = createLLM({
|
|
238
257
|
enableMemoryTracking: true,
|
|
239
|
-
maxMemorySnapshots: 256,
|
|
258
|
+
maxMemorySnapshots: 256,
|
|
240
259
|
});
|
|
241
260
|
|
|
242
|
-
await llm.loadModel(
|
|
243
|
-
await llm.sendMessage(
|
|
261
|
+
await llm.loadModel("/path/to/model.litertlm", { backend: "cpu" });
|
|
262
|
+
await llm.sendMessage("Hello!");
|
|
244
263
|
|
|
245
|
-
// Review tracked data
|
|
246
264
|
const summary = llm.memoryTracker!.getSummary();
|
|
247
|
-
console.log(
|
|
248
|
-
|
|
249
|
-
|
|
250
|
-
console.log(
|
|
265
|
+
console.log(
|
|
266
|
+
`Peak RSS: ${(summary.peakResidentBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
267
|
+
);
|
|
268
|
+
console.log(
|
|
269
|
+
`RSS Delta: ${(summary.residentDeltaBytes / 1024 / 1024).toFixed(1)} MB`,
|
|
270
|
+
);
|
|
251
271
|
```
|
|
252
272
|
|
|
253
|
-
#### Using
|
|
273
|
+
#### Using `useModel` with Memory Tracking
|
|
254
274
|
|
|
255
275
|
```typescript
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
const { model, isReady, memorySummary, memoryTracker } = useModel(modelUrl, {
|
|
276
|
+
const { model, isReady, memorySummary } = useModel(modelUrl, {
|
|
259
277
|
enableMemoryTracking: true,
|
|
260
278
|
maxMemorySnapshots: 100,
|
|
261
279
|
});
|
|
@@ -270,12 +288,13 @@ if (memorySummary) {
|
|
|
270
288
|
#### Standalone Memory Tracker
|
|
271
289
|
|
|
272
290
|
```typescript
|
|
273
|
-
import {
|
|
291
|
+
import {
|
|
292
|
+
createMemoryTracker,
|
|
293
|
+
createNativeBuffer,
|
|
294
|
+
} from "react-native-litert-lm";
|
|
274
295
|
|
|
275
|
-
// Create a tracker backed by a native ArrayBuffer
|
|
276
296
|
const tracker = createMemoryTracker(100);
|
|
277
297
|
|
|
278
|
-
// Manually record snapshots
|
|
279
298
|
tracker.record({
|
|
280
299
|
timestamp: Date.now(),
|
|
281
300
|
nativeHeapBytes: 50_000_000,
|
|
@@ -283,195 +302,260 @@ tracker.record({
|
|
|
283
302
|
availableMemoryBytes: 4_000_000_000,
|
|
284
303
|
});
|
|
285
304
|
|
|
286
|
-
// Access the underlying native buffer (
|
|
305
|
+
// Access the underlying native buffer (zero-copy transfer to native code)
|
|
287
306
|
const buffer = tracker.getNativeBuffer();
|
|
288
|
-
|
|
289
|
-
// Create a standalone native buffer for custom use
|
|
290
|
-
const customBuffer = createNativeBuffer(1024);
|
|
291
307
|
```
|
|
292
308
|
|
|
293
309
|
## Supported Models
|
|
294
310
|
|
|
295
|
-
Download `.litertlm` models automatically using the exported constants or from [HuggingFace](https://huggingface.co/litert-community):
|
|
311
|
+
Download `.litertlm` models automatically using the exported URL constants, or manually from [HuggingFace](https://huggingface.co/litert-community):
|
|
296
312
|
|
|
297
|
-
|
|
|
298
|
-
| :--------------------- | :------------------------------------- |
|
|
299
|
-
| `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~
|
|
313
|
+
| Constant | Model | Size | Min RAM |
|
|
314
|
+
| :--------------------- | :------------------------------------- | :---- | :------ |
|
|
315
|
+
| `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~3 GB | 4 GB+ |
|
|
300
316
|
|
|
301
|
-
|
|
302
|
-
|
|
303
|
-
|
|
|
304
|
-
|
|
|
305
|
-
|
|
|
306
|
-
|
|
|
317
|
+
**Other compatible models** (download manually from HuggingFace):
|
|
318
|
+
|
|
319
|
+
| Model | Size | Min RAM | Notes |
|
|
320
|
+
| ------------- | ------- | ------- | --------------------- |
|
|
321
|
+
| Gemma 3n E4B | ~4 GB | 8 GB+ | Higher quality |
|
|
322
|
+
| Gemma 3 1B | ~1 GB | 4 GB+ | Smallest, fastest |
|
|
323
|
+
| Phi-4 Mini | ~2 GB | 4 GB+ | Microsoft's small LLM |
|
|
324
|
+
| Qwen 2.5 1.5B | ~1.5 GB | 4 GB+ | Multilingual |
|
|
307
325
|
|
|
308
326
|
## API Reference
|
|
309
327
|
|
|
310
|
-
### `createLLM(): LiteRTLM`
|
|
328
|
+
### `createLLM(options?): LiteRTLM`
|
|
311
329
|
|
|
312
330
|
Creates a new LLM inference engine instance.
|
|
313
331
|
|
|
332
|
+
- `options.enableMemoryTracking` โ enable automatic memory snapshot recording
|
|
333
|
+
- `options.maxMemorySnapshots` โ max number of snapshots to retain (default: 256)
|
|
334
|
+
|
|
314
335
|
### `loadModel(path, config?): Promise<void>`
|
|
315
336
|
|
|
316
|
-
|
|
317
|
-
- `config.systemPrompt` - System prompt to guide model behavior (e.g., "You are a helpful assistant.")
|
|
318
|
-
- `config.backend` - `'cpu'` | `'gpu'` | `'npu'` (default: `'gpu'`)
|
|
319
|
-
- `config.temperature` - Sampling temperature (default: 0.7)
|
|
320
|
-
- `config.topK` - Top-K sampling (default: 40)
|
|
321
|
-
- `config.maxTokens` - Max generation length (default: 1024)
|
|
337
|
+
Loads a model from a local path or HTTPS URL.
|
|
322
338
|
|
|
323
|
-
|
|
339
|
+
| Parameter | Type | Default | Description |
|
|
340
|
+
| --------------------- | -------- | ------- | ----------------------------------------- |
|
|
341
|
+
| `path` | `string` | โ | Absolute path to `.litertlm` or HTTPS URL |
|
|
342
|
+
| `config.backend` | `string` | `'gpu'` | `'cpu'`, `'gpu'`, or `'npu'` |
|
|
343
|
+
| `config.systemPrompt` | `string` | โ | System prompt for the model |
|
|
344
|
+
| `config.temperature` | `number` | `0.7` | Sampling temperature |
|
|
345
|
+
| `config.topK` | `number` | `40` | Top-K sampling |
|
|
346
|
+
| `config.topP` | `number` | `0.95` | Top-P (nucleus) sampling |
|
|
347
|
+
| `config.maxTokens` | `number` | `1024` | Maximum generation length |
|
|
324
348
|
|
|
325
349
|
#### Backend Options
|
|
326
350
|
|
|
327
|
-
| Backend |
|
|
328
|
-
| ------- |
|
|
329
|
-
| `'cpu'` | CPU inference
|
|
330
|
-
| `'gpu'` | GPU
|
|
331
|
-
| `'npu'` | NPU/Neural Engine | Fastest | Requires supported hardware
|
|
351
|
+
| Backend | Engine | Speed | Notes |
|
|
352
|
+
| ------- | ------------------- | ------- | ---------------------------------------------- |
|
|
353
|
+
| `'cpu'` | CPU inference | Slowest | Always available, lower RAM requirement |
|
|
354
|
+
| `'gpu'` | GPU / Metal | Fast | Recommended default |
|
|
355
|
+
| `'npu'` | NPU / Neural Engine | Fastest | Requires supported hardware; falls back to GPU |
|
|
332
356
|
|
|
333
|
-
>
|
|
357
|
+
> **iOS**: `'gpu'` uses Metal/MPS and is the recommended backend. The engine automatically tries multiple backend combinations if the primary one fails.
|
|
334
358
|
|
|
335
359
|
### `sendMessage(message): Promise<string>`
|
|
336
360
|
|
|
337
|
-
|
|
361
|
+
Runs inference synchronously on a background thread. Returns the complete response.
|
|
338
362
|
|
|
339
363
|
### `sendMessageAsync(message, callback)`
|
|
340
364
|
|
|
341
|
-
Streaming generation. Callback
|
|
365
|
+
Streaming generation. Callback signature: `(token: string, isDone: boolean) => void`.
|
|
342
366
|
|
|
343
367
|
### `sendMessageWithImage(message, imagePath): Promise<string>`
|
|
344
368
|
|
|
345
|
-
Send a message with an image
|
|
369
|
+
Send a message with an image (Android only; for vision models like Gemma 3n).
|
|
346
370
|
|
|
347
371
|
### `sendMessageWithAudio(message, audioPath): Promise<string>`
|
|
348
372
|
|
|
349
|
-
Send a message with
|
|
373
|
+
Send a message with audio (Android only).
|
|
374
|
+
|
|
375
|
+
### `getStats(): GenerationStats`
|
|
376
|
+
|
|
377
|
+
Returns performance metrics from the last inference call.
|
|
378
|
+
|
|
379
|
+
```typescript
|
|
380
|
+
interface GenerationStats {
|
|
381
|
+
tokensPerSecond: number;
|
|
382
|
+
totalTime: number; // seconds
|
|
383
|
+
timeToFirstToken: number; // seconds
|
|
384
|
+
promptTokens: number;
|
|
385
|
+
completionTokens: number;
|
|
386
|
+
prefillSpeed: number; // tokens/sec
|
|
387
|
+
}
|
|
388
|
+
```
|
|
350
389
|
|
|
351
390
|
### `getMemoryUsage(): MemoryUsage`
|
|
352
391
|
|
|
353
|
-
Returns real OS-level memory usage
|
|
392
|
+
Returns real OS-level memory usage.
|
|
354
393
|
|
|
355
394
|
```typescript
|
|
356
395
|
interface MemoryUsage {
|
|
357
|
-
nativeHeapBytes: number;
|
|
358
|
-
residentBytes: number;
|
|
359
|
-
availableMemoryBytes: number;
|
|
360
|
-
isLowMemory: boolean;
|
|
396
|
+
nativeHeapBytes: number;
|
|
397
|
+
residentBytes: number;
|
|
398
|
+
availableMemoryBytes: number;
|
|
399
|
+
isLowMemory: boolean;
|
|
361
400
|
}
|
|
362
401
|
```
|
|
363
402
|
|
|
364
403
|
### `getHistory(): Message[]`
|
|
365
404
|
|
|
366
|
-
|
|
405
|
+
Returns the conversation history.
|
|
367
406
|
|
|
368
407
|
### `resetConversation()`
|
|
369
408
|
|
|
370
|
-
|
|
409
|
+
Clears conversation context and starts a fresh session.
|
|
371
410
|
|
|
372
411
|
### `close()`
|
|
373
412
|
|
|
374
|
-
|
|
413
|
+
Releases all native resources. Call when the model is no longer needed.
|
|
375
414
|
|
|
376
415
|
### `deleteModel(fileName): Promise<void>`
|
|
377
416
|
|
|
378
|
-
Deletes a model file from the app's
|
|
417
|
+
Deletes a cached model file from the app's local storage.
|
|
418
|
+
|
|
419
|
+
### Utility Functions
|
|
420
|
+
|
|
421
|
+
```typescript
|
|
422
|
+
import {
|
|
423
|
+
checkBackendSupport,
|
|
424
|
+
checkMultimodalSupport,
|
|
425
|
+
getRecommendedBackend,
|
|
426
|
+
applyGemmaTemplate,
|
|
427
|
+
applyPhiTemplate,
|
|
428
|
+
applyLlamaTemplate,
|
|
429
|
+
} from "react-native-litert-lm";
|
|
430
|
+
|
|
431
|
+
// Check if a backend is supported
|
|
432
|
+
const warning = checkBackendSupport("npu"); // string | undefined
|
|
433
|
+
const mmError = checkMultimodalSupport(); // string | undefined
|
|
434
|
+
const backend = getRecommendedBackend(); // 'gpu' | 'cpu'
|
|
379
435
|
|
|
380
|
-
|
|
436
|
+
// Manual prompt formatting (advanced)
|
|
437
|
+
const prompt = applyGemmaTemplate(
|
|
438
|
+
[{ role: "user", content: "Hello!" }],
|
|
439
|
+
"You are helpful.",
|
|
440
|
+
);
|
|
441
|
+
```
|
|
381
442
|
|
|
382
|
-
|
|
443
|
+
## Requirements
|
|
383
444
|
|
|
384
|
-
|
|
445
|
+
| Dependency | Version |
|
|
446
|
+
| -------------------------- | ------------- |
|
|
447
|
+
| React Native | 0.76+ |
|
|
448
|
+
| react-native-nitro-modules | 0.35.0+ |
|
|
449
|
+
| Android API | 26+ (ARM64) |
|
|
450
|
+
| iOS | 15.0+ (ARM64) |
|
|
451
|
+
| LiteRT-LM Android SDK | 0.9.0-alpha01 |
|
|
452
|
+
| LiteRT-LM iOS Engine | v0.9.0 |
|
|
385
453
|
|
|
386
|
-
|
|
454
|
+
## Platform Support
|
|
387
455
|
|
|
388
|
-
|
|
389
|
-
|
|
456
|
+
| Platform | Status | Architecture | Backends |
|
|
457
|
+
| -------- | -------- | ------------ | ---------------- |
|
|
458
|
+
| Android | โ
Ready | arm64-v8a | CPU, GPU, NPU |
|
|
459
|
+
| iOS | โ
Ready | arm64 | CPU, GPU (Metal) |
|
|
390
460
|
|
|
391
|
-
|
|
392
|
-
if (warning) {
|
|
393
|
-
console.warn(warning);
|
|
394
|
-
}
|
|
395
|
-
```
|
|
461
|
+
### iOS Feature Matrix
|
|
396
462
|
|
|
397
|
-
|
|
463
|
+
| Feature | Status | Notes |
|
|
464
|
+
| ---------------------------- | ------ | ----------------------------------------------------- |
|
|
465
|
+
| Text inference (blocking) | โ
| Via LiteRT-LM C API |
|
|
466
|
+
| Text inference (streaming) | โ
| Token-by-token callbacks |
|
|
467
|
+
| GPU inference (Metal/MPS) | โ
| Recommended backend |
|
|
468
|
+
| Model download with progress | โ
| NSURLSession, cached in `Caches/` |
|
|
469
|
+
| Memory tracking | โ
| `mach_task_basic_info` |
|
|
470
|
+
| Multi-turn conversation | โ
| Context retained across turns |
|
|
471
|
+
| Multimodal (image/audio) | ๐งช | Code paths exist; vision/audio executors experimental |
|
|
472
|
+
| Constrained decoding | โ | Requires llguidance Rust runtime |
|
|
473
|
+
| Function calling | โ | Requires Rust CXX bridge runtime |
|
|
398
474
|
|
|
399
|
-
|
|
475
|
+
## Building the iOS Engine
|
|
400
476
|
|
|
401
|
-
|
|
402
|
-
import { checkMultimodalSupport } from "react-native-litert-lm";
|
|
477
|
+
The iOS build uses a **Bazel-to-XCFramework pipeline** that compiles the LiteRT-LM C engine and all transitive dependencies into a static library (~83 MB).
|
|
403
478
|
|
|
404
|
-
|
|
405
|
-
if (error) {
|
|
406
|
-
console.warn(error); // iOS multimodal not yet supported
|
|
407
|
-
}
|
|
408
|
-
```
|
|
479
|
+
### Prerequisites
|
|
409
480
|
|
|
410
|
-
|
|
481
|
+
- **Bazel 7.6.1+** (via [Bazelisk](https://github.com/bazelbuild/bazelisk) recommended)
|
|
482
|
+
- **Xcode command line tools** (`xcode-select --install`)
|
|
411
483
|
|
|
412
|
-
|
|
484
|
+
### Build
|
|
413
485
|
|
|
414
|
-
```
|
|
415
|
-
|
|
416
|
-
|
|
417
|
-
applyPhiTemplate,
|
|
418
|
-
applyLlamaTemplate,
|
|
419
|
-
ChatMessage,
|
|
420
|
-
} from "react-native-litert-lm";
|
|
486
|
+
```bash
|
|
487
|
+
./scripts/build-ios-engine.sh
|
|
488
|
+
```
|
|
421
489
|
|
|
422
|
-
|
|
423
|
-
{ role: "user", content: "Hello!" },
|
|
424
|
-
{ role: "model", content: "Hi there!" },
|
|
425
|
-
{ role: "user", content: "Tell me a joke" },
|
|
426
|
-
];
|
|
490
|
+
This will:
|
|
427
491
|
|
|
428
|
-
|
|
429
|
-
|
|
492
|
+
1. Clone/checkout LiteRT-LM `v0.9.0` source into `.litert-lm-build/`
|
|
493
|
+
2. Build `//c:engine` for `ios_arm64` and `ios_sim_arm64` via Bazel
|
|
494
|
+
3. Collect all transitive `.o` files (engine, protobuf, re2, sentencepiece, etc.)
|
|
495
|
+
4. Compile C/C++ stubs for unavailable Rust dependencies
|
|
496
|
+
5. Patch `PromptTemplate` to use a simplified template engine (no Rust MinijinjaTemplate)
|
|
497
|
+
6. Merge ~1,900 object files into a static library via `libtool`
|
|
498
|
+
7. Package into `ios/Frameworks/LiteRTLM.xcframework`
|
|
430
499
|
|
|
431
|
-
|
|
432
|
-
const phiPrompt = applyPhiTemplate(history);
|
|
500
|
+
### Output
|
|
433
501
|
|
|
434
|
-
|
|
435
|
-
|
|
502
|
+
```
|
|
503
|
+
ios/Frameworks/LiteRTLM.xcframework/
|
|
504
|
+
โโโ Info.plist
|
|
505
|
+
โโโ ios-arm64/LiteRTLM.framework/ # Device
|
|
506
|
+
โ โโโ LiteRTLM # ~81 MB static library
|
|
507
|
+
โ โโโ Headers/litert_lm_engine.h
|
|
508
|
+
โโโ ios-arm64-simulator/LiteRTLM.framework/ # Simulator
|
|
509
|
+
โโโ LiteRTLM # ~83 MB static library
|
|
510
|
+
โโโ Headers/litert_lm_engine.h
|
|
436
511
|
```
|
|
437
512
|
|
|
438
|
-
|
|
513
|
+
### FFI Stubs
|
|
439
514
|
|
|
440
|
-
-
|
|
441
|
-
- react-native-nitro-modules **0.34.1+** (required for `createNativeArrayBuffer` and memory tracking)
|
|
442
|
-
- Android API 26+ (ARM64 only)
|
|
443
|
-
- **LiteRT-LM Android SDK**: `0.9.0-alpha01` (bundled automatically)
|
|
444
|
-
- iOS 15.0+ (coming soon)
|
|
515
|
+
Certain LiteRT-LM features depend on Rust libraries (llguidance, CXX bridge, MinijinjaTemplate) that are not available in the iOS Bazel build. These are replaced with stubs:
|
|
445
516
|
|
|
446
|
-
|
|
517
|
+
| Stub File | Location | Purpose |
|
|
518
|
+
| ------------------------------------ | ---------------- | ---------------------------------------- |
|
|
519
|
+
| `cxx_bridge_stubs.cc` | `scripts/stubs/` | CXX bridge runtime + Rust FFI type stubs |
|
|
520
|
+
| `llguidance_stubs.c` | `scripts/stubs/` | llguidance constrained decoding C API |
|
|
521
|
+
| `gemma_model_constraint_provider.cc` | `scripts/stubs/` | Gemma constraint provider factory |
|
|
522
|
+
|
|
523
|
+
Additionally, `PromptTemplate` is patched at build time to use a simplified C++ template formatter instead of the Rust MinijinjaTemplate, which avoids all Rust FFI calls during conversation setup.
|
|
447
524
|
|
|
448
|
-
|
|
449
|
-
| -------- | -------- | ------------ |
|
|
450
|
-
| Android | โ
Ready | arm64-v8a |
|
|
451
|
-
| iOS | ๐ง Stub | - |
|
|
525
|
+
> **Text inference works fully without these Rust components.** Only constrained decoding, function calling parsers, and advanced Jinja2 template features are affected.
|
|
452
526
|
|
|
453
527
|
## Architecture
|
|
454
528
|
|
|
455
|
-
|
|
529
|
+
```
|
|
530
|
+
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
|
|
531
|
+
โ React Native (TypeScript) โ
|
|
532
|
+
โ useModel() / createLLM() / sendMessage() โ
|
|
533
|
+
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
|
|
534
|
+
โ Nitro Modules JSI Bridge โ
|
|
535
|
+
โโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโค
|
|
536
|
+
โ Android (Kotlin) โ iOS (C++) โ
|
|
537
|
+
โ HybridLiteRTLM.kt โ HybridLiteRTLM.cpp โ
|
|
538
|
+
โ litertlm-android โ LiteRTLM C API โ
|
|
539
|
+
โ AAR (GPU delegate) โ XCFramework (Metal) โ
|
|
540
|
+
โโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโ
|
|
541
|
+
```
|
|
456
542
|
|
|
457
|
-
- **Android**:
|
|
458
|
-
- **iOS**:
|
|
543
|
+
- **Android**: Kotlin (`HybridLiteRTLM.kt`) interfacing with the `litertlm-android` AAR.
|
|
544
|
+
- **iOS**: C++ (`HybridLiteRTLM.cpp`) interfacing with the LiteRT-LM C API via a prebuilt `LiteRTLM.xcframework`. Platform-specific code (model downloading, file management) is in Objective-C++ (`ios/IOSDownloadHelper.mm`).
|
|
459
545
|
|
|
460
|
-
> **
|
|
546
|
+
> **For contributors**: Changes to `cpp/HybridLiteRTLM.cpp` do not affect Android. Feature changes must be applied to both the Kotlin and C++ implementations.
|
|
461
547
|
|
|
462
548
|
## License
|
|
463
549
|
|
|
464
550
|
The code in this repository is licensed under the **[MIT License](LICENSE)**.
|
|
465
551
|
|
|
466
|
-
### โ ๏ธ
|
|
467
|
-
|
|
468
|
-
This library acts as an execution engine for On-Device Large Language Models (LLMs). The AI models themselves are **not** distributed with this package and are **not** covered by the MIT license.
|
|
552
|
+
### โ ๏ธ AI Model Disclaimer
|
|
469
553
|
|
|
470
|
-
|
|
554
|
+
This library is an execution engine for on-device LLMs. The AI models themselves are **not** distributed with this package and have their own licenses:
|
|
471
555
|
|
|
472
556
|
- **Gemma (Google)**: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
|
|
473
557
|
- **Llama 3 (Meta)**: [Llama 3.2 Community License](https://www.llama.com/llama3/license/)
|
|
474
|
-
- **Qwen (Alibaba)**: [Apache 2.0
|
|
558
|
+
- **Qwen (Alibaba)**: [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE)
|
|
475
559
|
- **Phi (Microsoft)**: [MIT License](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/LICENSE)
|
|
476
560
|
|
|
477
|
-
|
|
561
|
+
By downloading and using these models, you agree to their respective licenses and acceptable use policies. The author of `react-native-litert-lm` takes no responsibility for model outputs or applications built with them.
|