react-native-litert-lm 0.2.2 โ†’ 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35) hide show
  1. package/README.md +270 -186
  2. package/android/build.gradle +1 -1
  3. package/android/src/main/java/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLM.kt +93 -37
  4. package/app.plugin.js +33 -0
  5. package/cpp/HybridLiteRTLM.cpp +571 -451
  6. package/cpp/HybridLiteRTLM.hpp +54 -23
  7. package/cpp/IOSDownloadHelper.h +24 -0
  8. package/cpp/cpp-adapter.cpp +2 -2
  9. package/cpp/include/litert_lm_engine.h +502 -0
  10. package/ios/IOSDownloadHelper.mm +129 -0
  11. package/ios/LiteRTLMAutolinking.mm +30 -0
  12. package/lib/hooks.d.ts +9 -4
  13. package/lib/hooks.js +34 -20
  14. package/lib/index.d.ts +1 -0
  15. package/lib/index.js +2 -5
  16. package/lib/memoryTracker.d.ts +1 -1
  17. package/lib/memoryTracker.js +1 -1
  18. package/lib/modelFactory.d.ts +11 -5
  19. package/lib/modelFactory.js +9 -4
  20. package/nitrogen/generated/android/LiteRTLMOnLoad.cpp +11 -4
  21. package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.cpp +31 -37
  22. package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.hpp +19 -22
  23. package/nitrogen/generated/android/kotlin/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLMSpec.kt +15 -18
  24. package/package.json +12 -5
  25. package/react-native-litert-lm.podspec +20 -7
  26. package/scripts/build-ios-engine.sh +283 -0
  27. package/scripts/download-ios-frameworks.sh +72 -0
  28. package/scripts/postinstall.js +116 -0
  29. package/scripts/stubs/cxx_bridge_stubs.cc +224 -0
  30. package/scripts/stubs/gemma_model_constraint_provider.cc +46 -0
  31. package/scripts/stubs/llguidance_stubs.c +101 -0
  32. package/src/hooks.ts +62 -39
  33. package/src/index.ts +4 -7
  34. package/src/memoryTracker.ts +1 -1
  35. package/src/modelFactory.ts +30 -5
package/README.md CHANGED
@@ -1,23 +1,19 @@
1
1
  # react-native-litert-lm
2
2
 
3
- High-performance LLM inference for React Native powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro Module](https://github.com/mrousavy/nitro). Optimized for **Gemma 3n** and other on-device language models.
3
+ High-performance on-device LLM inference for React Native, powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro Modules](https://github.com/mrousavy/nitro). Optimized for **Gemma 3n** and other on-device language models.
4
4
 
5
5
  ## Features
6
6
 
7
- - ๐Ÿš€ **Native Performance** - Kotlin (Android) / C++ (iOS) implementation via Nitro Modules
8
- - ๐Ÿง  **Gemma 3n Ready** - First-class support for Gemma 3n E2B/E4B models
9
- - โšก **GPU Acceleration** - GPU delegate (Android), Metal (iOS when available)
10
- - ๐Ÿ“ฆ **Bundled Tokenizer** - No separate tokenization library needed
11
- - ๐Ÿ”„ **Streaming Support** - Token-by-token generation callbacks
12
- - ๐Ÿ“ฑ **Cross-Platform** - Android API 26+
13
- - ๐Ÿ–ผ๏ธ **Multimodal** - Image and audio input support (Android Beta, iOS coming soon)
14
- - ๐Ÿงต **Async API** - Non-blocking inference to prevent UI freezes
15
- - ๐Ÿ“Š **Real Memory Tracking** - OS-level memory metrics (RSS, native heap, available memory) via native APIs
16
- - ๐Ÿงฎ **Zero-Copy Buffers** - Memory snapshots stored in native ArrayBuffers via `NitroModules.createNativeArrayBuffer()` (v0.34+)
17
-
18
- ## Status
19
-
20
- > โš ๏ธ **Early Preview**: This library is under active development. Android is functional with enough RAM, iOS implementation pending LiteRT-LM iOS release. Please report any issues on the [GitHub issues](https://github.com/hung-yueh/react-native-litert-lm/issues).
7
+ - ๐Ÿš€ **Native Performance** โ€” Kotlin (Android) / C++ (iOS) via Nitro Modules JSI bindings
8
+ - ๐Ÿง  **Gemma 3n Ready** โ€” First-class support for Gemma 3n E2B/E4B models
9
+ - โšก **GPU Acceleration** โ€” GPU delegate (Android), Metal/MPS (iOS)
10
+ - ๐Ÿ”„ **Streaming Support** โ€” Token-by-token generation callbacks
11
+ - ๐Ÿ“ฑ **Cross-Platform** โ€” Android API 26+ / iOS 15.0+
12
+ - ๐Ÿ–ผ๏ธ **Multimodal** โ€” Image and audio input support (Android)
13
+ - ๐Ÿงต **Async API** โ€” Non-blocking inference on background threads
14
+ - ๐Ÿ“Š **Real Memory Tracking** โ€” OS-level memory metrics (RSS, native heap, available memory) via native APIs
15
+ - ๐Ÿงฎ **Zero-Copy Buffers** โ€” Memory snapshots stored in native ArrayBuffers via Nitro Modules
16
+ - ๐Ÿ“ฅ **Automatic Model Download** โ€” Downloads models from URL with progress tracking and local caching
21
17
 
22
18
  ## Installation
23
19
 
@@ -44,69 +40,88 @@ Then create a development build:
44
40
 
45
41
  ```bash
46
42
  npx expo prebuild
47
- npx expo run:android
43
+ npx expo run:android # Android
44
+ npx expo run:ios # iOS
48
45
  ```
49
46
 
50
- > **Note**: Only ARM devices are supported (physical devices or ARM emulators). x86_64 emulators are not supported.
47
+ > **Note**: Only ARM devices/simulators are supported. x86_64 Android emulators are not supported.
51
48
 
52
49
  ### Bare React Native
53
50
 
54
51
  ```bash
52
+ # Android
55
53
  cd android && ./gradlew clean
56
- cd ios && pod install # iOS coming soon
54
+
55
+ # iOS
56
+ cd ios && pod install
57
57
  ```
58
58
 
59
59
  ## Example App
60
60
 
61
- The repository includes a fully functional example app in the `example/` directory with a dark-themed diagnostic UI that demonstrates model loading, inference, memory tracking, and performance stats.
61
+ The `example/` directory contains a fully functional test app with a dark-themed diagnostic UI that demonstrates:
62
+
63
+ - Model downloading with progress tracking
64
+ - Text inference (blocking and streaming)
65
+ - Multi-turn conversation with context retention
66
+ - Performance benchmarking (tokens/sec, latency)
67
+ - Real-time memory tracking
68
+ - Quick chat interface
62
69
 
63
- To run it:
70
+ ### Running the Example
64
71
 
65
- 1. **Build the library** (compiles TypeScript to `lib/`):
72
+ 1. **Build the library** (compiles TypeScript to `lib/`):
66
73
 
67
- ```bash
68
- npm run build
69
- ```
74
+ ```bash
75
+ npm run build
76
+ ```
70
77
 
71
- 2. **Navigate to the example directory and install dependencies:**
78
+ 2. **Install example dependencies:**
72
79
 
73
- ```bash
74
- cd example
75
- npm install
76
- ```
80
+ ```bash
81
+ cd example
82
+ npm install
83
+ ```
77
84
 
78
- 3. **Create a development build and run on Android:**
79
- ```bash
80
- npx expo prebuild --clean
81
- npx expo run:android
82
- ```
85
+ 3. **Create a development build and run:**
83
86
 
84
- > **Note:** If you change native code (C++/Kotlin), you must run `npx expo prebuild --clean` again.
87
+ ```bash
88
+ npx expo prebuild --clean
89
+ npx expo run:android # Android
90
+ npx expo run:ios # iOS (requires XCFramework โ€” see "Building the iOS Engine" below)
91
+ ```
92
+
93
+ > **Note:** If you change native code (C++/Kotlin/Obj-C++), you must run `npx expo prebuild --clean` again before rebuilding.
85
94
 
86
95
  ## Model Management
87
96
 
88
- LiteRT-LM models (like Gemma 3n) are large files (3GB+) and cannot be bundled directly into your app's binary. You must download them at runtime to a writable directory (e.g., `DocumentDirectory`).
97
+ LiteRT-LM models (like Gemma 3n) are large files (3 GB+) and cannot be bundled into your app binary. They are downloaded at runtime.
89
98
 
90
99
  ### Automatic Downloading
91
100
 
92
- The library supports automatic downloading when you pass a URL to `loadModel` or `useModel`.
101
+ The library handles downloading automatically when you pass a URL to `loadModel` or `useModel`. Downloads include:
102
+
103
+ - **Progress tracking** โ€” real-time download percentage via callbacks
104
+ - **Local caching** โ€” downloaded models are cached and reused across app launches
105
+ - **Android**: app-local temp directory
106
+ - **iOS**: `Library/Caches/litert_models/` (survives app relaunch; reclaimable by iOS under storage pressure)
107
+ - **HTTPS enforcement** โ€” only secure URLs are accepted
93
108
 
94
109
  ### Manual Downloading (Optional)
95
110
 
96
- If you prefer to manage downloads manually (e.g., using `rn-fetch-blob` or `expo-file-system`), you can download the file to a local path and pass that path to the library.
111
+ If you prefer to manage downloads yourself (e.g., using `expo-file-system`), download the `.litertlm` file to a local path and pass that path to the library:
97
112
 
98
113
  ```typescript
99
- import { FileSystem } from "react-native-file-access";
100
- // or import * as FileSystem from 'expo-file-system';
114
+ import * as FileSystem from "expo-file-system";
101
115
 
102
116
  const MODEL_URL =
103
117
  "https://huggingface.co/litert-community/gemma-3n-2b-it/resolve/main/model.litertlm";
104
- const localPath = `${FileSystem.DocumentDirectoryPath}/gemma-3n.litertlm`;
118
+ const localPath = `${FileSystem.documentDirectory}gemma-3n.litertlm`;
105
119
 
106
120
  async function downloadModel() {
107
- if (await FileSystem.exists(localPath)) return localPath;
121
+ const info = await FileSystem.getInfoAsync(localPath);
122
+ if (info.exists) return localPath;
108
123
 
109
- // Download logic here...
124
+ await FileSystem.downloadAsync(MODEL_URL, localPath);
110
125
  return localPath;
111
126
  }
112
127
  ```
@@ -115,26 +130,27 @@ async function downloadModel() {
115
130
 
116
131
  ### React Hook (Recommended)
117
132
 
118
- The `useModel` hook manages the model lifecycle, including downloading, loading, and unloading.
133
+ The `useModel` hook manages the full model lifecycle: downloading, loading, inference, and cleanup.
119
134
 
120
135
  ```typescript
121
136
  import { useModel, GEMMA_3N_E2B_IT_INT4 } from "react-native-litert-lm";
137
+ import { Platform } from "react-native";
122
138
 
123
139
  function App() {
124
140
  const {
125
141
  model,
126
142
  isReady,
127
143
  downloadProgress,
128
- load, // Manually trigger load
129
- deleteModel // Delete model file
130
- } = useModel(
131
- GEMMA_3N_E2B_IT_INT4,
132
- {
133
- backend: "cpu",
134
- autoLoad: true, // Default: true. Set false to load manually.
135
- systemPrompt: "You are a helpful assistant."
136
- }
137
- );
144
+ error,
145
+ load, // Manually trigger load
146
+ deleteModel, // Delete cached model file
147
+ memorySummary, // Auto-updated memory stats (if tracking enabled)
148
+ } = useModel(GEMMA_3N_E2B_IT_INT4, {
149
+ backend: Platform.OS === 'ios' ? 'gpu' : 'cpu',
150
+ autoLoad: true, // Default: true. Set false to load manually via load().
151
+ systemPrompt: "You are a helpful assistant.",
152
+ enableMemoryTracking: true,
153
+ });
138
154
 
139
155
  if (!isReady) {
140
156
  return <Text>Loading... {Math.round(downloadProgress * 100)}%</Text>;
@@ -162,7 +178,7 @@ await llm.loadModel("https://example.com/model.litertlm", {
162
178
  systemPrompt: "You are a helpful assistant.",
163
179
  });
164
180
 
165
- // Generate response (async)
181
+ // Generate a response
166
182
  const response = await llm.sendMessage("What is the capital of France?");
167
183
  console.log(response);
168
184
 
@@ -179,15 +195,16 @@ llm.sendMessageAsync("Tell me a story", (token, done) => {
179
195
  });
180
196
  ```
181
197
 
182
- ### Multimodal (Image/Audio)
198
+ ### Multimodal (Image / Audio)
199
+
200
+ > **Note**: Multimodal is fully supported on Android. iOS has the code paths implemented but vision/audio executors may not be available in the current XCFramework build โ€” use `checkMultimodalSupport()` to verify at runtime.
183
201
 
184
202
  ```typescript
185
203
  import { checkMultimodalSupport } from "react-native-litert-lm";
186
204
 
187
- // Check platform support first
188
- const error = checkMultimodalSupport();
189
- if (error) {
190
- console.warn(error); // iOS not yet supported
205
+ const warning = checkMultimodalSupport();
206
+ if (warning) {
207
+ console.warn(warning); // Experimental on iOS
191
208
  } else {
192
209
  // Image input (for vision models like Gemma 3n)
193
210
  // Images >1024px are automatically resized to prevent OOM
@@ -196,7 +213,7 @@ if (error) {
196
213
  "/path/to/image.jpg",
197
214
  );
198
215
 
199
- // Audio input (for audio models)
216
+ // Audio input
200
217
  const transcription = await llm.sendMessageWithAudio(
201
218
  "Transcribe this audio",
202
219
  "/path/to/audio.wav",
@@ -204,58 +221,59 @@ if (error) {
204
221
  }
205
222
  ```
206
223
 
207
- ### Check Performance
224
+ ### Performance Stats
208
225
 
209
226
  ```typescript
210
227
  const stats = llm.getStats();
211
228
  console.log(`Generated ${stats.completionTokens} tokens`);
212
229
  console.log(`Speed: ${stats.tokensPerSecond.toFixed(1)} tokens/sec`);
230
+ console.log(`Time to first token: ${stats.timeToFirstToken.toFixed(0)} ms`);
213
231
  ```
214
232
 
215
233
  ### Memory Tracking
216
234
 
217
- The library provides real OS-level memory usage data. You can query memory at any time, or enable automatic tracking to record snapshots after each inference call.
235
+ The library provides real OS-level memory data โ€” no estimation. It reads directly from `mach_task_basic_info` (iOS) and `Debug.getNativeHeapAllocatedSize()` + `/proc/self/status` (Android).
218
236
 
219
237
  #### Direct Memory Query
220
238
 
221
239
  ```typescript
222
- // Get a single real-time snapshot from native APIs
223
240
  const usage = llm.getMemoryUsage();
224
- console.log(`Native heap: ${(usage.nativeHeapBytes / 1024 / 1024).toFixed(1)} MB`);
241
+ console.log(
242
+ `Native heap: ${(usage.nativeHeapBytes / 1024 / 1024).toFixed(1)} MB`,
243
+ );
225
244
  console.log(`RSS: ${(usage.residentBytes / 1024 / 1024).toFixed(1)} MB`);
226
- console.log(`Available: ${(usage.availableMemoryBytes / 1024 / 1024).toFixed(1)} MB`);
245
+ console.log(
246
+ `Available: ${(usage.availableMemoryBytes / 1024 / 1024).toFixed(1)} MB`,
247
+ );
227
248
  console.log(`Low memory: ${usage.isLowMemory}`);
228
249
  ```
229
250
 
230
251
  #### Automatic Tracking with Native Buffers
231
252
 
232
- Enable memory tracking to automatically record snapshots in a native-backed `ArrayBuffer` (allocated via `NitroModules.createNativeArrayBuffer()`) after every inference call:
253
+ Enable memory tracking to automatically record snapshots in a native-backed `ArrayBuffer` after every inference call:
233
254
 
234
255
  ```typescript
235
- import { createLLM } from 'react-native-litert-lm';
236
-
237
256
  const llm = createLLM({
238
257
  enableMemoryTracking: true,
239
- maxMemorySnapshots: 256, // default
258
+ maxMemorySnapshots: 256,
240
259
  });
241
260
 
242
- await llm.loadModel('/path/to/model.litertlm', { backend: 'cpu' });
243
- await llm.sendMessage('Hello!');
261
+ await llm.loadModel("/path/to/model.litertlm", { backend: "cpu" });
262
+ await llm.sendMessage("Hello!");
244
263
 
245
- // Review tracked data
246
264
  const summary = llm.memoryTracker!.getSummary();
247
- console.log(`Peak RSS: ${(summary.peakResidentBytes / 1024 / 1024).toFixed(1)} MB`);
248
- console.log(`Peak Native Heap: ${(summary.peakNativeHeapBytes / 1024 / 1024).toFixed(1)} MB`);
249
- console.log(`RSS Delta: ${(summary.residentDeltaBytes / 1024 / 1024).toFixed(1)} MB`);
250
- console.log(`Snapshots: ${summary.snapshotCount}`);
265
+ console.log(
266
+ `Peak RSS: ${(summary.peakResidentBytes / 1024 / 1024).toFixed(1)} MB`,
267
+ );
268
+ console.log(
269
+ `RSS Delta: ${(summary.residentDeltaBytes / 1024 / 1024).toFixed(1)} MB`,
270
+ );
251
271
  ```
252
272
 
253
- #### Using the `useModel` Hook with Memory Tracking
273
+ #### Using `useModel` with Memory Tracking
254
274
 
255
275
  ```typescript
256
- import { useModel } from 'react-native-litert-lm';
257
-
258
- const { model, isReady, memorySummary, memoryTracker } = useModel(modelUrl, {
276
+ const { model, isReady, memorySummary } = useModel(modelUrl, {
259
277
  enableMemoryTracking: true,
260
278
  maxMemorySnapshots: 100,
261
279
  });
@@ -270,12 +288,13 @@ if (memorySummary) {
270
288
  #### Standalone Memory Tracker
271
289
 
272
290
  ```typescript
273
- import { createMemoryTracker, createNativeBuffer } from 'react-native-litert-lm';
291
+ import {
292
+ createMemoryTracker,
293
+ createNativeBuffer,
294
+ } from "react-native-litert-lm";
274
295
 
275
- // Create a tracker backed by a native ArrayBuffer
276
296
  const tracker = createMemoryTracker(100);
277
297
 
278
- // Manually record snapshots
279
298
  tracker.record({
280
299
  timestamp: Date.now(),
281
300
  nativeHeapBytes: 50_000_000,
@@ -283,195 +302,260 @@ tracker.record({
283
302
  availableMemoryBytes: 4_000_000_000,
284
303
  });
285
304
 
286
- // Access the underlying native buffer (for zero-copy transfer to native code)
305
+ // Access the underlying native buffer (zero-copy transfer to native code)
287
306
  const buffer = tracker.getNativeBuffer();
288
-
289
- // Create a standalone native buffer for custom use
290
- const customBuffer = createNativeBuffer(1024);
291
307
  ```
292
308
 
293
309
  ## Supported Models
294
310
 
295
- Download `.litertlm` models automatically using the exported constants or from [HuggingFace](https://huggingface.co/litert-community):
311
+ Download `.litertlm` models automatically using the exported URL constants, or manually from [HuggingFace](https://huggingface.co/litert-community):
296
312
 
297
- | Model Constant | Description | Size | Min Device RAM |
298
- | :--------------------- | :------------------------------------- | :--- | :------------- |
299
- | `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~3GB | 4GB+ |
313
+ | Constant | Model | Size | Min RAM |
314
+ | :--------------------- | :------------------------------------- | :---- | :------ |
315
+ | `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~3 GB | 4 GB+ |
300
316
 
301
- | Other Models | Size | Min Device RAM | Use Case |
302
- | ------------- | ------ | -------------- | --------------------- |
303
- | Gemma 3n E4B | ~4GB | 8GB+ | Higher quality |
304
- | Gemma 3 1B | ~1GB | 4GB+ | Smallest, fastest |
305
- | Phi-4 Mini | ~2GB | 4GB+ | Microsoft's small LLM |
306
- | Qwen 2.5 1.5B | ~1.5GB | 4GB+ | Multilingual |
317
+ **Other compatible models** (download manually from HuggingFace):
318
+
319
+ | Model | Size | Min RAM | Notes |
320
+ | ------------- | ------- | ------- | --------------------- |
321
+ | Gemma 3n E4B | ~4 GB | 8 GB+ | Higher quality |
322
+ | Gemma 3 1B | ~1 GB | 4 GB+ | Smallest, fastest |
323
+ | Phi-4 Mini | ~2 GB | 4 GB+ | Microsoft's small LLM |
324
+ | Qwen 2.5 1.5B | ~1.5 GB | 4 GB+ | Multilingual |
307
325
 
308
326
  ## API Reference
309
327
 
310
- ### `createLLM(): LiteRTLM`
328
+ ### `createLLM(options?): LiteRTLM`
311
329
 
312
330
  Creates a new LLM inference engine instance.
313
331
 
332
+ - `options.enableMemoryTracking` โ€” enable automatic memory snapshot recording
333
+ - `options.maxMemorySnapshots` โ€” max number of snapshots to retain (default: 256)
334
+
314
335
  ### `loadModel(path, config?): Promise<void>`
315
336
 
316
- - `path: string` - Absolute path to `.litertlm` file OR a public URL (http/https). If a URL is provided, the model will be downloaded automatically.
317
- - `config.systemPrompt` - System prompt to guide model behavior (e.g., "You are a helpful assistant.")
318
- - `config.backend` - `'cpu'` | `'gpu'` | `'npu'` (default: `'gpu'`)
319
- - `config.temperature` - Sampling temperature (default: 0.7)
320
- - `config.topK` - Top-K sampling (default: 40)
321
- - `config.maxTokens` - Max generation length (default: 1024)
337
+ Loads a model from a local path or HTTPS URL.
322
338
 
323
- > **Note**: Vision encoder is always set to GPU (required by Gemma 3n). Audio encoder is always set to CPU (optimal for audio).
339
+ | Parameter | Type | Default | Description |
340
+ | --------------------- | -------- | ------- | ----------------------------------------- |
341
+ | `path` | `string` | โ€” | Absolute path to `.litertlm` or HTTPS URL |
342
+ | `config.backend` | `string` | `'gpu'` | `'cpu'`, `'gpu'`, or `'npu'` |
343
+ | `config.systemPrompt` | `string` | โ€” | System prompt for the model |
344
+ | `config.temperature` | `number` | `0.7` | Sampling temperature |
345
+ | `config.topK` | `number` | `40` | Top-K sampling |
346
+ | `config.topP` | `number` | `0.95` | Top-P (nucleus) sampling |
347
+ | `config.maxTokens` | `number` | `1024` | Maximum generation length |
324
348
 
325
349
  #### Backend Options
326
350
 
327
- | Backend | Description | Speed | Compatibility |
328
- | ------- | ----------------- | ------- | ------------------------------------------ |
329
- | `'cpu'` | CPU inference | Slowest | Always available with less RAM requirement |
330
- | `'gpu'` | GPU acceleration | Fast | Recommended default |
331
- | `'npu'` | NPU/Neural Engine | Fastest | Requires supported hardware |
351
+ | Backend | Engine | Speed | Notes |
352
+ | ------- | ------------------- | ------- | ---------------------------------------------- |
353
+ | `'cpu'` | CPU inference | Slowest | Always available, lower RAM requirement |
354
+ | `'gpu'` | GPU / Metal | Fast | Recommended default |
355
+ | `'npu'` | NPU / Neural Engine | Fastest | Requires supported hardware; falls back to GPU |
332
356
 
333
- > โš ๏ธ **NPU Note**: NPU acceleration requires compatible hardware (Qualcomm Hexagon, MediaTek APU, etc.). If unavailable, LiteRT-LM automatically falls back to GPU.
357
+ > **iOS**: `'gpu'` uses Metal/MPS and is the recommended backend. The engine automatically tries multiple backend combinations if the primary one fails.
334
358
 
335
359
  ### `sendMessage(message): Promise<string>`
336
360
 
337
- Blocking generation (executed on background thread). Returns complete response.
361
+ Runs inference synchronously on a background thread. Returns the complete response.
338
362
 
339
363
  ### `sendMessageAsync(message, callback)`
340
364
 
341
- Streaming generation. Callback receives `(token, isDone)`.
365
+ Streaming generation. Callback signature: `(token: string, isDone: boolean) => void`.
342
366
 
343
367
  ### `sendMessageWithImage(message, imagePath): Promise<string>`
344
368
 
345
- Send a message with an image attachment (for vision models).
369
+ Send a message with an image (Android only; for vision models like Gemma 3n).
346
370
 
347
371
  ### `sendMessageWithAudio(message, audioPath): Promise<string>`
348
372
 
349
- Send a message with an audio attachment (for audio models).
373
+ Send a message with audio (Android only).
374
+
375
+ ### `getStats(): GenerationStats`
376
+
377
+ Returns performance metrics from the last inference call.
378
+
379
+ ```typescript
380
+ interface GenerationStats {
381
+ tokensPerSecond: number;
382
+ totalTime: number; // seconds
383
+ timeToFirstToken: number; // seconds
384
+ promptTokens: number;
385
+ completionTokens: number;
386
+ prefillSpeed: number; // tokens/sec
387
+ }
388
+ ```
350
389
 
351
390
  ### `getMemoryUsage(): MemoryUsage`
352
391
 
353
- Returns real OS-level memory usage statistics from native APIs. No estimation โ€” reads directly from `mach_task_basic_info` (iOS) / `Debug.getNativeHeapAllocatedSize()` + `/proc/self/status` (Android).
392
+ Returns real OS-level memory usage.
354
393
 
355
394
  ```typescript
356
395
  interface MemoryUsage {
357
- nativeHeapBytes: number; // Native heap allocated bytes
358
- residentBytes: number; // Process RSS in bytes
359
- availableMemoryBytes: number; // Available system memory in bytes
360
- isLowMemory: boolean; // Whether the system considers memory low
396
+ nativeHeapBytes: number;
397
+ residentBytes: number;
398
+ availableMemoryBytes: number;
399
+ isLowMemory: boolean;
361
400
  }
362
401
  ```
363
402
 
364
403
  ### `getHistory(): Message[]`
365
404
 
366
- Get conversation history.
405
+ Returns the conversation history.
367
406
 
368
407
  ### `resetConversation()`
369
408
 
370
- Clear context and start fresh.
409
+ Clears conversation context and starts a fresh session.
371
410
 
372
411
  ### `close()`
373
412
 
374
- Release all native resources.
413
+ Releases all native resources. Call when the model is no longer needed.
375
414
 
376
415
  ### `deleteModel(fileName): Promise<void>`
377
416
 
378
- Deletes a model file from the app's internal storage and cleans up the engine instance.
417
+ Deletes a cached model file from the app's local storage.
418
+
419
+ ### Utility Functions
420
+
421
+ ```typescript
422
+ import {
423
+ checkBackendSupport,
424
+ checkMultimodalSupport,
425
+ getRecommendedBackend,
426
+ applyGemmaTemplate,
427
+ applyPhiTemplate,
428
+ applyLlamaTemplate,
429
+ } from "react-native-litert-lm";
430
+
431
+ // Check if a backend is supported
432
+ const warning = checkBackendSupport("npu"); // string | undefined
433
+ const mmError = checkMultimodalSupport(); // string | undefined
434
+ const backend = getRecommendedBackend(); // 'gpu' | 'cpu'
379
435
 
380
- ### `getRecommendedBackend(): Backend`
436
+ // Manual prompt formatting (advanced)
437
+ const prompt = applyGemmaTemplate(
438
+ [{ role: "user", content: "Hello!" }],
439
+ "You are helpful.",
440
+ );
441
+ ```
381
442
 
382
- Returns the recommended backend for the current platform (usually `'gpu'`).
443
+ ## Requirements
383
444
 
384
- ### `checkBackendSupport(backend): string | undefined`
445
+ | Dependency | Version |
446
+ | -------------------------- | ------------- |
447
+ | React Native | 0.76+ |
448
+ | react-native-nitro-modules | 0.35.0+ |
449
+ | Android API | 26+ (ARM64) |
450
+ | iOS | 15.0+ (ARM64) |
451
+ | LiteRT-LM Android SDK | 0.9.0-alpha01 |
452
+ | LiteRT-LM iOS Engine | v0.9.0 |
385
453
 
386
- Returns a warning message if the specified backend may have issues on the current platform, or `undefined` if OK.
454
+ ## Platform Support
387
455
 
388
- ```typescript
389
- import { checkBackendSupport } from "react-native-litert-lm";
456
+ | Platform | Status | Architecture | Backends |
457
+ | -------- | -------- | ------------ | ---------------- |
458
+ | Android | โœ… Ready | arm64-v8a | CPU, GPU, NPU |
459
+ | iOS | โœ… Ready | arm64 | CPU, GPU (Metal) |
390
460
 
391
- const warning = checkBackendSupport("npu");
392
- if (warning) {
393
- console.warn(warning);
394
- }
395
- ```
461
+ ### iOS Feature Matrix
396
462
 
397
- ### `checkMultimodalSupport(): string | undefined`
463
+ | Feature | Status | Notes |
464
+ | ---------------------------- | ------ | ----------------------------------------------------- |
465
+ | Text inference (blocking) | โœ… | Via LiteRT-LM C API |
466
+ | Text inference (streaming) | โœ… | Token-by-token callbacks |
467
+ | GPU inference (Metal/MPS) | โœ… | Recommended backend |
468
+ | Model download with progress | โœ… | NSURLSession, cached in `Caches/` |
469
+ | Memory tracking | โœ… | `mach_task_basic_info` |
470
+ | Multi-turn conversation | โœ… | Context retained across turns |
471
+ | Multimodal (image/audio) | ๐Ÿงช | Code paths exist; vision/audio executors experimental |
472
+ | Constrained decoding | โŒ | Requires llguidance Rust runtime |
473
+ | Function calling | โŒ | Requires Rust CXX bridge runtime |
398
474
 
399
- Returns an error message if multimodal (image/audio) is not supported on the current platform, or `undefined` if OK.
475
+ ## Building the iOS Engine
400
476
 
401
- ```typescript
402
- import { checkMultimodalSupport } from "react-native-litert-lm";
477
+ The iOS build uses a **Bazel-to-XCFramework pipeline** that compiles the LiteRT-LM C engine and all transitive dependencies into a static library (~83 MB).
403
478
 
404
- const error = checkMultimodalSupport();
405
- if (error) {
406
- console.warn(error); // iOS multimodal not yet supported
407
- }
408
- ```
479
+ ### Prerequisites
409
480
 
410
- ### Prompt Templates
481
+ - **Bazel 7.6.1+** (via [Bazelisk](https://github.com/bazelbuild/bazelisk) recommended)
482
+ - **Xcode command line tools** (`xcode-select --install`)
411
483
 
412
- For advanced use cases where you need to manually format prompts:
484
+ ### Build
413
485
 
414
- ```typescript
415
- import {
416
- applyGemmaTemplate,
417
- applyPhiTemplate,
418
- applyLlamaTemplate,
419
- ChatMessage,
420
- } from "react-native-litert-lm";
486
+ ```bash
487
+ ./scripts/build-ios-engine.sh
488
+ ```
421
489
 
422
- const history: ChatMessage[] = [
423
- { role: "user", content: "Hello!" },
424
- { role: "model", content: "Hi there!" },
425
- { role: "user", content: "Tell me a joke" },
426
- ];
490
+ This will:
427
491
 
428
- // For Gemma models
429
- const gemmaPrompt = applyGemmaTemplate(history, "You are a comedian.");
492
+ 1. Clone/checkout LiteRT-LM `v0.9.0` source into `.litert-lm-build/`
493
+ 2. Build `//c:engine` for `ios_arm64` and `ios_sim_arm64` via Bazel
494
+ 3. Collect all transitive `.o` files (engine, protobuf, re2, sentencepiece, etc.)
495
+ 4. Compile C/C++ stubs for unavailable Rust dependencies
496
+ 5. Patch `PromptTemplate` to use a simplified template engine (no Rust MinijinjaTemplate)
497
+ 6. Merge ~1,900 object files into a static library via `libtool`
498
+ 7. Package into `ios/Frameworks/LiteRTLM.xcframework`
430
499
 
431
- // For Phi models
432
- const phiPrompt = applyPhiTemplate(history);
500
+ ### Output
433
501
 
434
- // For Llama models
435
- const llamaPrompt = applyLlamaTemplate(history, "You are helpful.");
502
+ ```
503
+ ios/Frameworks/LiteRTLM.xcframework/
504
+ โ”œโ”€โ”€ Info.plist
505
+ โ”œโ”€โ”€ ios-arm64/LiteRTLM.framework/ # Device
506
+ โ”‚ โ”œโ”€โ”€ LiteRTLM # ~81 MB static library
507
+ โ”‚ โ””โ”€โ”€ Headers/litert_lm_engine.h
508
+ โ””โ”€โ”€ ios-arm64-simulator/LiteRTLM.framework/ # Simulator
509
+ โ”œโ”€โ”€ LiteRTLM # ~83 MB static library
510
+ โ””โ”€โ”€ Headers/litert_lm_engine.h
436
511
  ```
437
512
 
438
- ## Requirements
513
+ ### FFI Stubs
439
514
 
440
- - React Native 0.76+
441
- - react-native-nitro-modules **0.34.1+** (required for `createNativeArrayBuffer` and memory tracking)
442
- - Android API 26+ (ARM64 only)
443
- - **LiteRT-LM Android SDK**: `0.9.0-alpha01` (bundled automatically)
444
- - iOS 15.0+ (coming soon)
515
+ Certain LiteRT-LM features depend on Rust libraries (llguidance, CXX bridge, MinijinjaTemplate) that are not available in the iOS Bazel build. These are replaced with stubs:
445
516
 
446
- ## Platform Support
517
+ | Stub File | Location | Purpose |
518
+ | ------------------------------------ | ---------------- | ---------------------------------------- |
519
+ | `cxx_bridge_stubs.cc` | `scripts/stubs/` | CXX bridge runtime + Rust FFI type stubs |
520
+ | `llguidance_stubs.c` | `scripts/stubs/` | llguidance constrained decoding C API |
521
+ | `gemma_model_constraint_provider.cc` | `scripts/stubs/` | Gemma constraint provider factory |
522
+
523
+ Additionally, `PromptTemplate` is patched at build time to use a simplified C++ template formatter instead of the Rust MinijinjaTemplate, which avoids all Rust FFI calls during conversation setup.
447
524
 
448
- | Platform | Status | Architecture |
449
- | -------- | -------- | ------------ |
450
- | Android | โœ… Ready | arm64-v8a |
451
- | iOS | ๐Ÿšง Stub | - |
525
+ > **Text inference works fully without these Rust components.** Only constrained decoding, function calling parsers, and advanced Jinja2 template features are affected.
452
526
 
453
527
  ## Architecture
454
528
 
455
- This library uses a split implementation strategy to maximize performance and compatibility:
529
+ ```
530
+ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
531
+ โ”‚ React Native (TypeScript) โ”‚
532
+ โ”‚ useModel() / createLLM() / sendMessage() โ”‚
533
+ โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
534
+ โ”‚ Nitro Modules JSI Bridge โ”‚
535
+ โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
536
+ โ”‚ Android (Kotlin) โ”‚ iOS (C++) โ”‚
537
+ โ”‚ HybridLiteRTLM.kt โ”‚ HybridLiteRTLM.cpp โ”‚
538
+ โ”‚ litertlm-android โ”‚ LiteRTLM C API โ”‚
539
+ โ”‚ AAR (GPU delegate) โ”‚ XCFramework (Metal) โ”‚
540
+ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
541
+ ```
456
542
 
457
- - **Android**: Uses **Kotlin** (`HybridLiteRTLM.kt`) to interface directly with the `litertlm-android` AAR.
458
- - **iOS**: Uses **C++** (`HybridLiteRTLM.cpp`) which will interface with the LiteRT-LM C++ headers (once released).
543
+ - **Android**: Kotlin (`HybridLiteRTLM.kt`) interfacing with the `litertlm-android` AAR.
544
+ - **iOS**: C++ (`HybridLiteRTLM.cpp`) interfacing with the LiteRT-LM C API via a prebuilt `LiteRTLM.xcframework`. Platform-specific code (model downloading, file management) is in Objective-C++ (`ios/IOSDownloadHelper.mm`).
459
545
 
460
- > **Note for Contributors**: Changes made to the C++ implementation (`cpp/`) **do not** affect Android. You must apply feature changes to both the Kotlin and C++ implementations.
546
+ > **For contributors**: Changes to `cpp/HybridLiteRTLM.cpp` do not affect Android. Feature changes must be applied to both the Kotlin and C++ implementations.
461
547
 
462
548
  ## License
463
549
 
464
550
  The code in this repository is licensed under the **[MIT License](LICENSE)**.
465
551
 
466
- ### โš ๏ธ Important AI Model Disclaimer
467
-
468
- This library acts as an execution engine for On-Device Large Language Models (LLMs). The AI models themselves are **not** distributed with this package and are **not** covered by the MIT license.
552
+ ### โš ๏ธ AI Model Disclaimer
469
553
 
470
- By downloading and running these models within your app, you agree to comply with their respective licenses and acceptable use policies:
554
+ This library is an execution engine for on-device LLMs. The AI models themselves are **not** distributed with this package and have their own licenses:
471
555
 
472
556
  - **Gemma (Google)**: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
473
557
  - **Llama 3 (Meta)**: [Llama 3.2 Community License](https://www.llama.com/llama3/license/)
474
- - **Qwen (Alibaba)**: [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE)
558
+ - **Qwen (Alibaba)**: [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE)
475
559
  - **Phi (Microsoft)**: [MIT License](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/LICENSE)
476
560
 
477
- _The author of `react-native-litert-lm` takes no responsibility for the outputs generated by these models or the applications built using them._
561
+ By downloading and using these models, you agree to their respective licenses and acceptable use policies. The author of `react-native-litert-lm` takes no responsibility for model outputs or applications built with them.