react-native-litert-lm 0.2.1 โ†’ 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43) hide show
  1. package/README.md +331 -150
  2. package/android/build.gradle +1 -1
  3. package/android/src/main/java/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLM.kt +140 -37
  4. package/app.plugin.js +33 -0
  5. package/cpp/HybridLiteRTLM.cpp +577 -378
  6. package/cpp/HybridLiteRTLM.hpp +66 -23
  7. package/cpp/IOSDownloadHelper.h +24 -0
  8. package/cpp/cpp-adapter.cpp +10 -2
  9. package/cpp/include/litert_lm_engine.h +502 -0
  10. package/ios/IOSDownloadHelper.mm +129 -0
  11. package/ios/LiteRTLMAutolinking.mm +30 -0
  12. package/lib/hooks.d.ts +33 -3
  13. package/lib/hooks.js +54 -23
  14. package/lib/index.d.ts +4 -1
  15. package/lib/index.js +6 -6
  16. package/lib/memoryTracker.d.ts +128 -0
  17. package/lib/memoryTracker.js +155 -0
  18. package/lib/modelFactory.d.ts +21 -2
  19. package/lib/modelFactory.js +78 -11
  20. package/lib/specs/LiteRTLM.nitro.d.ts +19 -0
  21. package/nitrogen/generated/android/LiteRTLMOnLoad.cpp +28 -18
  22. package/nitrogen/generated/android/LiteRTLMOnLoad.hpp +13 -4
  23. package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.cpp +39 -36
  24. package/nitrogen/generated/android/c++/JHybridLiteRTLMSpec.hpp +20 -22
  25. package/nitrogen/generated/android/c++/JMemoryUsage.hpp +69 -0
  26. package/nitrogen/generated/android/kotlin/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLMSpec.kt +19 -18
  27. package/nitrogen/generated/android/kotlin/com/margelo/nitro/dev/litert/litertlm/MemoryUsage.kt +47 -0
  28. package/nitrogen/generated/shared/c++/HybridLiteRTLMSpec.cpp +1 -0
  29. package/nitrogen/generated/shared/c++/HybridLiteRTLMSpec.hpp +4 -0
  30. package/nitrogen/generated/shared/c++/MemoryUsage.hpp +95 -0
  31. package/package.json +12 -5
  32. package/react-native-litert-lm.podspec +20 -7
  33. package/scripts/build-ios-engine.sh +283 -0
  34. package/scripts/download-ios-frameworks.sh +72 -0
  35. package/scripts/postinstall.js +116 -0
  36. package/scripts/stubs/cxx_bridge_stubs.cc +224 -0
  37. package/scripts/stubs/gemma_model_constraint_provider.cc +46 -0
  38. package/scripts/stubs/llguidance_stubs.c +101 -0
  39. package/src/hooks.ts +107 -41
  40. package/src/index.ts +13 -6
  41. package/src/memoryTracker.ts +268 -0
  42. package/src/modelFactory.ts +107 -11
  43. package/src/specs/LiteRTLM.nitro.ts +21 -0
package/README.md CHANGED
@@ -1,21 +1,19 @@
1
1
  # react-native-litert-lm
2
2
 
3
- High-performance LLM inference for React Native powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro Module](https://github.com/mrousavy/nitro). Optimized for **Gemma 3n** and other on-device language models.
3
+ High-performance on-device LLM inference for React Native, powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro Modules](https://github.com/mrousavy/nitro). Optimized for **Gemma 3n** and other on-device language models.
4
4
 
5
5
  ## Features
6
6
 
7
- - ๐Ÿš€ **Native Performance** - Kotlin (Android) / C++ (iOS) implementation via Nitro Modules
8
- - ๐Ÿง  **Gemma 3n Ready** - First-class support for Gemma 3n E2B/E4B models
9
- - โšก **GPU Acceleration** - GPU delegate (Android), Metal (iOS when available)
10
- - ๐Ÿ“ฆ **Bundled Tokenizer** - No separate tokenization library needed
11
- - ๐Ÿ”„ **Streaming Support** - Token-by-token generation callbacks
12
- - ๐Ÿ“ฑ **Cross-Platform** - Android API 26+
13
- - ๐Ÿ–ผ๏ธ **Multimodal** - Image and audio input support (Android Beta, iOS coming soon)
14
- - ๐Ÿงต **Async API** - Non-blocking inference to prevent UI freezes
15
-
16
- ## Status
17
-
18
- > โš ๏ธ **Early Preview**: This library is under active development. Android is functional with enough RAM, iOS implementation pending LiteRT-LM iOS release. Please report any issues on the [GitHub issues](https://github.com/hung-yueh/react-native-litert-lm/issues).
7
+ - ๐Ÿš€ **Native Performance** โ€” Kotlin (Android) / C++ (iOS) via Nitro Modules JSI bindings
8
+ - ๐Ÿง  **Gemma 3n Ready** โ€” First-class support for Gemma 3n E2B/E4B models
9
+ - โšก **GPU Acceleration** โ€” GPU delegate (Android), Metal/MPS (iOS)
10
+ - ๐Ÿ”„ **Streaming Support** โ€” Token-by-token generation callbacks
11
+ - ๐Ÿ“ฑ **Cross-Platform** โ€” Android API 26+ / iOS 15.0+
12
+ - ๐Ÿ–ผ๏ธ **Multimodal** โ€” Image and audio input support (Android)
13
+ - ๐Ÿงต **Async API** โ€” Non-blocking inference on background threads
14
+ - ๐Ÿ“Š **Real Memory Tracking** โ€” OS-level memory metrics (RSS, native heap, available memory) via native APIs
15
+ - ๐Ÿงฎ **Zero-Copy Buffers** โ€” Memory snapshots stored in native ArrayBuffers via Nitro Modules
16
+ - ๐Ÿ“ฅ **Automatic Model Download** โ€” Downloads models from URL with progress tracking and local caching
19
17
 
20
18
  ## Installation
21
19
 
@@ -42,65 +40,88 @@ Then create a development build:
42
40
 
43
41
  ```bash
44
42
  npx expo prebuild
45
- npx expo run:android
43
+ npx expo run:android # Android
44
+ npx expo run:ios # iOS
46
45
  ```
47
46
 
48
- > **Note**: Only ARM devices are supported (physical devices or ARM emulators). x86_64 emulators are not supported.
47
+ > **Note**: Only ARM devices/simulators are supported. x86_64 Android emulators are not supported.
49
48
 
50
49
  ### Bare React Native
51
50
 
52
51
  ```bash
52
+ # Android
53
53
  cd android && ./gradlew clean
54
- cd ios && pod install # iOS coming soon
54
+
55
+ # iOS
56
+ cd ios && pod install
55
57
  ```
56
58
 
57
59
  ## Example App
58
60
 
59
- The repository includes a fully functional example app in the `example/` directory.
61
+ The `example/` directory contains a fully functional test app with a dark-themed diagnostic UI that demonstrates:
62
+
63
+ - Model downloading with progress tracking
64
+ - Text inference (blocking and streaming)
65
+ - Multi-turn conversation with context retention
66
+ - Performance benchmarking (tokens/sec, latency)
67
+ - Real-time memory tracking
68
+ - Quick chat interface
69
+
70
+ ### Running the Example
71
+
72
+ 1. **Build the library** (compiles TypeScript to `lib/`):
60
73
 
61
- To run it:
74
+ ```bash
75
+ npm run build
76
+ ```
62
77
 
63
- 1. **Navigate to the example directory:**
78
+ 2. **Install example dependencies:**
64
79
 
65
- ```bash
66
- cd example
67
- ```
80
+ ```bash
81
+ cd example
82
+ npm install
83
+ ```
68
84
 
69
- 2. **Install dependencies:**
85
+ 3. **Create a development build and run:**
70
86
 
71
- ```bash
72
- npm install
73
- ```
87
+ ```bash
88
+ npx expo prebuild --clean
89
+ npx expo run:android # Android
90
+ npx expo run:ios # iOS (requires XCFramework โ€” see "Building the iOS Engine" below)
91
+ ```
74
92
 
75
- 3. **Run on Android:**
76
- ```bash
77
- npx expo run:android
78
- ```
93
+ > **Note:** If you change native code (C++/Kotlin/Obj-C++), you must run `npx expo prebuild --clean` again before rebuilding.
79
94
 
80
95
  ## Model Management
81
96
 
82
- LiteRT-LM models (like Gemma 3n) are large files (3GB+) and cannot be bundled directly into your app's binary. You must download them at runtime to a writable directory (e.g., `DocumentDirectory`).
97
+ LiteRT-LM models (like Gemma 3n) are large files (3 GB+) and cannot be bundled into your app binary. They are downloaded at runtime.
83
98
 
84
99
  ### Automatic Downloading
85
100
 
86
- The library supports automatic downloading when you pass a URL to `loadModel` or `useModel`.
101
+ The library handles downloading automatically when you pass a URL to `loadModel` or `useModel`. Downloads include:
102
+
103
+ - **Progress tracking** โ€” real-time download percentage via callbacks
104
+ - **Local caching** โ€” downloaded models are cached and reused across app launches
105
+ - **Android**: app-local temp directory
106
+ - **iOS**: `Library/Caches/litert_models/` (survives app relaunch; reclaimable by iOS under storage pressure)
107
+ - **HTTPS enforcement** โ€” only secure URLs are accepted
87
108
 
88
109
  ### Manual Downloading (Optional)
89
110
 
90
- If you prefer to manage downloads manually (e.g., using `rn-fetch-blob` or `expo-file-system`), you can download the file to a local path and pass that path to the library.
111
+ If you prefer to manage downloads yourself (e.g., using `expo-file-system`), download the `.litertlm` file to a local path and pass that path to the library:
91
112
 
92
113
  ```typescript
93
- import { FileSystem } from "react-native-file-access";
94
- // or import * as FileSystem from 'expo-file-system';
114
+ import * as FileSystem from "expo-file-system";
95
115
 
96
116
  const MODEL_URL =
97
117
  "https://huggingface.co/litert-community/gemma-3n-2b-it/resolve/main/model.litertlm";
98
- const localPath = `${FileSystem.DocumentDirectoryPath}/gemma-3n.litertlm`;
118
+ const localPath = `${FileSystem.documentDirectory}gemma-3n.litertlm`;
99
119
 
100
120
  async function downloadModel() {
101
- if (await FileSystem.exists(localPath)) return localPath;
121
+ const info = await FileSystem.getInfoAsync(localPath);
122
+ if (info.exists) return localPath;
102
123
 
103
- // Download logic here...
124
+ await FileSystem.downloadAsync(MODEL_URL, localPath);
104
125
  return localPath;
105
126
  }
106
127
  ```
@@ -109,26 +130,27 @@ async function downloadModel() {
109
130
 
110
131
  ### React Hook (Recommended)
111
132
 
112
- The `useModel` hook manages the model lifecycle, including downloading, loading, and unloading.
133
+ The `useModel` hook manages the full model lifecycle: downloading, loading, inference, and cleanup.
113
134
 
114
135
  ```typescript
115
136
  import { useModel, GEMMA_3N_E2B_IT_INT4 } from "react-native-litert-lm";
137
+ import { Platform } from "react-native";
116
138
 
117
139
  function App() {
118
140
  const {
119
141
  model,
120
142
  isReady,
121
143
  downloadProgress,
122
- load, // Manually trigger load
123
- deleteModel // Delete model file
124
- } = useModel(
125
- GEMMA_3N_E2B_IT_INT4,
126
- {
127
- backend: "cpu",
128
- autoLoad: true, // Default: true. Set false to load manually.
129
- systemPrompt: "You are a helpful assistant."
130
- }
131
- );
144
+ error,
145
+ load, // Manually trigger load
146
+ deleteModel, // Delete cached model file
147
+ memorySummary, // Auto-updated memory stats (if tracking enabled)
148
+ } = useModel(GEMMA_3N_E2B_IT_INT4, {
149
+ backend: Platform.OS === 'ios' ? 'gpu' : 'cpu',
150
+ autoLoad: true, // Default: true. Set false to load manually via load().
151
+ systemPrompt: "You are a helpful assistant.",
152
+ enableMemoryTracking: true,
153
+ });
132
154
 
133
155
  if (!isReady) {
134
156
  return <Text>Loading... {Math.round(downloadProgress * 100)}%</Text>;
@@ -156,7 +178,7 @@ await llm.loadModel("https://example.com/model.litertlm", {
156
178
  systemPrompt: "You are a helpful assistant.",
157
179
  });
158
180
 
159
- // Generate response (async)
181
+ // Generate a response
160
182
  const response = await llm.sendMessage("What is the capital of France?");
161
183
  console.log(response);
162
184
 
@@ -173,15 +195,16 @@ llm.sendMessageAsync("Tell me a story", (token, done) => {
173
195
  });
174
196
  ```
175
197
 
176
- ### Multimodal (Image/Audio)
198
+ ### Multimodal (Image / Audio)
199
+
200
+ > **Note**: Multimodal is fully supported on Android. iOS has the code paths implemented but vision/audio executors may not be available in the current XCFramework build โ€” use `checkMultimodalSupport()` to verify at runtime.
177
201
 
178
202
  ```typescript
179
203
  import { checkMultimodalSupport } from "react-native-litert-lm";
180
204
 
181
- // Check platform support first
182
- const error = checkMultimodalSupport();
183
- if (error) {
184
- console.warn(error); // iOS not yet supported
205
+ const warning = checkMultimodalSupport();
206
+ if (warning) {
207
+ console.warn(warning); // Experimental on iOS
185
208
  } else {
186
209
  // Image input (for vision models like Gemma 3n)
187
210
  // Images >1024px are automatically resized to prevent OOM
@@ -190,7 +213,7 @@ if (error) {
190
213
  "/path/to/image.jpg",
191
214
  );
192
215
 
193
- // Audio input (for audio models)
216
+ // Audio input
194
217
  const transcription = await llm.sendMessageWithAudio(
195
218
  "Transcribe this audio",
196
219
  "/path/to/audio.wav",
@@ -198,183 +221,341 @@ if (error) {
198
221
  }
199
222
  ```
200
223
 
201
- ### Check Performance
224
+ ### Performance Stats
202
225
 
203
226
  ```typescript
204
227
  const stats = llm.getStats();
205
228
  console.log(`Generated ${stats.completionTokens} tokens`);
206
229
  console.log(`Speed: ${stats.tokensPerSecond.toFixed(1)} tokens/sec`);
230
+ console.log(`Time to first token: ${stats.timeToFirstToken.toFixed(0)} ms`);
231
+ ```
232
+
233
+ ### Memory Tracking
234
+
235
+ The library provides real OS-level memory data โ€” no estimation. It reads directly from `mach_task_basic_info` (iOS) and `Debug.getNativeHeapAllocatedSize()` + `/proc/self/status` (Android).
236
+
237
+ #### Direct Memory Query
238
+
239
+ ```typescript
240
+ const usage = llm.getMemoryUsage();
241
+ console.log(
242
+ `Native heap: ${(usage.nativeHeapBytes / 1024 / 1024).toFixed(1)} MB`,
243
+ );
244
+ console.log(`RSS: ${(usage.residentBytes / 1024 / 1024).toFixed(1)} MB`);
245
+ console.log(
246
+ `Available: ${(usage.availableMemoryBytes / 1024 / 1024).toFixed(1)} MB`,
247
+ );
248
+ console.log(`Low memory: ${usage.isLowMemory}`);
249
+ ```
250
+
251
+ #### Automatic Tracking with Native Buffers
252
+
253
+ Enable memory tracking to automatically record snapshots in a native-backed `ArrayBuffer` after every inference call:
254
+
255
+ ```typescript
256
+ const llm = createLLM({
257
+ enableMemoryTracking: true,
258
+ maxMemorySnapshots: 256,
259
+ });
260
+
261
+ await llm.loadModel("/path/to/model.litertlm", { backend: "cpu" });
262
+ await llm.sendMessage("Hello!");
263
+
264
+ const summary = llm.memoryTracker!.getSummary();
265
+ console.log(
266
+ `Peak RSS: ${(summary.peakResidentBytes / 1024 / 1024).toFixed(1)} MB`,
267
+ );
268
+ console.log(
269
+ `RSS Delta: ${(summary.residentDeltaBytes / 1024 / 1024).toFixed(1)} MB`,
270
+ );
271
+ ```
272
+
273
+ #### Using `useModel` with Memory Tracking
274
+
275
+ ```typescript
276
+ const { model, isReady, memorySummary } = useModel(modelUrl, {
277
+ enableMemoryTracking: true,
278
+ maxMemorySnapshots: 100,
279
+ });
280
+
281
+ // memorySummary auto-updates after each inference call
282
+ if (memorySummary) {
283
+ console.log(`Current RSS: ${memorySummary.currentResidentBytes}`);
284
+ console.log(`Peak RSS: ${memorySummary.peakResidentBytes}`);
285
+ }
286
+ ```
287
+
288
+ #### Standalone Memory Tracker
289
+
290
+ ```typescript
291
+ import {
292
+ createMemoryTracker,
293
+ createNativeBuffer,
294
+ } from "react-native-litert-lm";
295
+
296
+ const tracker = createMemoryTracker(100);
297
+
298
+ tracker.record({
299
+ timestamp: Date.now(),
300
+ nativeHeapBytes: 50_000_000,
301
+ residentBytes: 200_000_000,
302
+ availableMemoryBytes: 4_000_000_000,
303
+ });
304
+
305
+ // Access the underlying native buffer (zero-copy transfer to native code)
306
+ const buffer = tracker.getNativeBuffer();
207
307
  ```
208
308
 
209
309
  ## Supported Models
210
310
 
211
- Download `.litertlm` models automatically using the exported constants or from [HuggingFace](https://huggingface.co/litert-community):
311
+ Download `.litertlm` models automatically using the exported URL constants, or manually from [HuggingFace](https://huggingface.co/litert-community):
312
+
313
+ | Constant | Model | Size | Min RAM |
314
+ | :--------------------- | :------------------------------------- | :---- | :------ |
315
+ | `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~3 GB | 4 GB+ |
212
316
 
213
- | Model Constant | Description | Size | Min Device RAM |
214
- | :--------------------- | :------------------------------------- | :--- | :------------- |
215
- | `GEMMA_3N_E2B_IT_INT4` | Gemma 3n E2B (Instruction Tuned, Int4) | ~3GB | 4GB+ |
317
+ **Other compatible models** (download manually from HuggingFace):
216
318
 
217
- | Other Models | Size | Min Device RAM | Use Case |
218
- | ------------- | ------ | -------------- | --------------------- |
219
- | Gemma 3n E4B | ~4GB | 8GB+ | Higher quality |
220
- | Gemma 3 1B | ~1GB | 4GB+ | Smallest, fastest |
221
- | Phi-4 Mini | ~2GB | 4GB+ | Microsoft's small LLM |
222
- | Qwen 2.5 1.5B | ~1.5GB | 4GB+ | Multilingual |
319
+ | Model | Size | Min RAM | Notes |
320
+ | ------------- | ------- | ------- | --------------------- |
321
+ | Gemma 3n E4B | ~4 GB | 8 GB+ | Higher quality |
322
+ | Gemma 3 1B | ~1 GB | 4 GB+ | Smallest, fastest |
323
+ | Phi-4 Mini | ~2 GB | 4 GB+ | Microsoft's small LLM |
324
+ | Qwen 2.5 1.5B | ~1.5 GB | 4 GB+ | Multilingual |
223
325
 
224
326
  ## API Reference
225
327
 
226
- ### `createLLM(): LiteRTLM`
328
+ ### `createLLM(options?): LiteRTLM`
227
329
 
228
330
  Creates a new LLM inference engine instance.
229
331
 
332
+ - `options.enableMemoryTracking` โ€” enable automatic memory snapshot recording
333
+ - `options.maxMemorySnapshots` โ€” max number of snapshots to retain (default: 256)
334
+
230
335
  ### `loadModel(path, config?): Promise<void>`
231
336
 
232
- - `path: string` - Absolute path to `.litertlm` file OR a public URL (http/https). If a URL is provided, the model will be downloaded automatically.
233
- - `config.systemPrompt` - System prompt to guide model behavior (e.g., "You are a helpful assistant.")
234
- - `config.backend` - `'cpu'` | `'gpu'` | `'npu'` (default: `'gpu'`)
235
- - `config.temperature` - Sampling temperature (default: 0.7)
236
- - `config.topK` - Top-K sampling (default: 40)
237
- - `config.maxTokens` - Max generation length (default: 1024)
337
+ Loads a model from a local path or HTTPS URL.
238
338
 
239
- > **Note**: Vision encoder is always set to GPU (required by Gemma 3n). Audio encoder is always set to CPU (optimal for audio).
339
+ | Parameter | Type | Default | Description |
340
+ | --------------------- | -------- | ------- | ----------------------------------------- |
341
+ | `path` | `string` | โ€” | Absolute path to `.litertlm` or HTTPS URL |
342
+ | `config.backend` | `string` | `'gpu'` | `'cpu'`, `'gpu'`, or `'npu'` |
343
+ | `config.systemPrompt` | `string` | โ€” | System prompt for the model |
344
+ | `config.temperature` | `number` | `0.7` | Sampling temperature |
345
+ | `config.topK` | `number` | `40` | Top-K sampling |
346
+ | `config.topP` | `number` | `0.95` | Top-P (nucleus) sampling |
347
+ | `config.maxTokens` | `number` | `1024` | Maximum generation length |
240
348
 
241
349
  #### Backend Options
242
350
 
243
- | Backend | Description | Speed | Compatibility |
244
- | ------- | ----------------- | ------- | ------------------------------------------ |
245
- | `'cpu'` | CPU inference | Slowest | Always available with less RAM requirement |
246
- | `'gpu'` | GPU acceleration | Fast | Recommended default |
247
- | `'npu'` | NPU/Neural Engine | Fastest | Requires supported hardware |
351
+ | Backend | Engine | Speed | Notes |
352
+ | ------- | ------------------- | ------- | ---------------------------------------------- |
353
+ | `'cpu'` | CPU inference | Slowest | Always available, lower RAM requirement |
354
+ | `'gpu'` | GPU / Metal | Fast | Recommended default |
355
+ | `'npu'` | NPU / Neural Engine | Fastest | Requires supported hardware; falls back to GPU |
248
356
 
249
- > โš ๏ธ **NPU Note**: NPU acceleration requires compatible hardware (Qualcomm Hexagon, MediaTek APU, etc.). If unavailable, LiteRT-LM automatically falls back to GPU.
357
+ > **iOS**: `'gpu'` uses Metal/MPS and is the recommended backend. The engine automatically tries multiple backend combinations if the primary one fails.
250
358
 
251
359
  ### `sendMessage(message): Promise<string>`
252
360
 
253
- Blocking generation (executed on background thread). Returns complete response.
361
+ Runs inference synchronously on a background thread. Returns the complete response.
254
362
 
255
363
  ### `sendMessageAsync(message, callback)`
256
364
 
257
- Streaming generation. Callback receives `(token, isDone)`.
365
+ Streaming generation. Callback signature: `(token: string, isDone: boolean) => void`.
258
366
 
259
367
  ### `sendMessageWithImage(message, imagePath): Promise<string>`
260
368
 
261
- Send a message with an image attachment (for vision models).
369
+ Send a message with an image (Android only; for vision models like Gemma 3n).
262
370
 
263
371
  ### `sendMessageWithAudio(message, audioPath): Promise<string>`
264
372
 
265
- Send a message with an audio attachment (for audio models).
373
+ Send a message with audio (Android only).
374
+
375
+ ### `getStats(): GenerationStats`
376
+
377
+ Returns performance metrics from the last inference call.
378
+
379
+ ```typescript
380
+ interface GenerationStats {
381
+ tokensPerSecond: number;
382
+ totalTime: number; // seconds
383
+ timeToFirstToken: number; // seconds
384
+ promptTokens: number;
385
+ completionTokens: number;
386
+ prefillSpeed: number; // tokens/sec
387
+ }
388
+ ```
389
+
390
+ ### `getMemoryUsage(): MemoryUsage`
391
+
392
+ Returns real OS-level memory usage.
393
+
394
+ ```typescript
395
+ interface MemoryUsage {
396
+ nativeHeapBytes: number;
397
+ residentBytes: number;
398
+ availableMemoryBytes: number;
399
+ isLowMemory: boolean;
400
+ }
401
+ ```
266
402
 
267
403
  ### `getHistory(): Message[]`
268
404
 
269
- Get conversation history.
405
+ Returns the conversation history.
270
406
 
271
407
  ### `resetConversation()`
272
408
 
273
- Clear context and start fresh.
409
+ Clears conversation context and starts a fresh session.
274
410
 
275
411
  ### `close()`
276
412
 
277
- Release all native resources.
413
+ Releases all native resources. Call when the model is no longer needed.
278
414
 
279
415
  ### `deleteModel(fileName): Promise<void>`
280
416
 
281
- Deletes a model file from the app's internal storage and cleans up the engine instance.
417
+ Deletes a cached model file from the app's local storage.
282
418
 
283
- ### `getRecommendedBackend(): Backend`
419
+ ### Utility Functions
284
420
 
285
- Returns the recommended backend for the current platform (usually `'gpu'`).
421
+ ```typescript
422
+ import {
423
+ checkBackendSupport,
424
+ checkMultimodalSupport,
425
+ getRecommendedBackend,
426
+ applyGemmaTemplate,
427
+ applyPhiTemplate,
428
+ applyLlamaTemplate,
429
+ } from "react-native-litert-lm";
286
430
 
287
- ### `checkBackendSupport(backend): string | undefined`
431
+ // Check if a backend is supported
432
+ const warning = checkBackendSupport("npu"); // string | undefined
433
+ const mmError = checkMultimodalSupport(); // string | undefined
434
+ const backend = getRecommendedBackend(); // 'gpu' | 'cpu'
288
435
 
289
- Returns a warning message if the specified backend may have issues on the current platform, or `undefined` if OK.
436
+ // Manual prompt formatting (advanced)
437
+ const prompt = applyGemmaTemplate(
438
+ [{ role: "user", content: "Hello!" }],
439
+ "You are helpful.",
440
+ );
441
+ ```
290
442
 
291
- ```typescript
292
- import { checkBackendSupport } from "react-native-litert-lm";
443
+ ## Requirements
293
444
 
294
- const warning = checkBackendSupport("npu");
295
- if (warning) {
296
- console.warn(warning);
297
- }
298
- ```
445
+ | Dependency | Version |
446
+ | -------------------------- | ------------- |
447
+ | React Native | 0.76+ |
448
+ | react-native-nitro-modules | 0.35.0+ |
449
+ | Android API | 26+ (ARM64) |
450
+ | iOS | 15.0+ (ARM64) |
451
+ | LiteRT-LM Android SDK | 0.9.0-alpha01 |
452
+ | LiteRT-LM iOS Engine | v0.9.0 |
299
453
 
300
- ### `checkMultimodalSupport(): string | undefined`
454
+ ## Platform Support
301
455
 
302
- Returns an error message if multimodal (image/audio) is not supported on the current platform, or `undefined` if OK.
456
+ | Platform | Status | Architecture | Backends |
457
+ | -------- | -------- | ------------ | ---------------- |
458
+ | Android | โœ… Ready | arm64-v8a | CPU, GPU, NPU |
459
+ | iOS | โœ… Ready | arm64 | CPU, GPU (Metal) |
303
460
 
304
- ```typescript
305
- import { checkMultimodalSupport } from "react-native-litert-lm";
461
+ ### iOS Feature Matrix
306
462
 
307
- const error = checkMultimodalSupport();
308
- if (error) {
309
- console.warn(error); // iOS multimodal not yet supported
310
- }
311
- ```
463
+ | Feature | Status | Notes |
464
+ | ---------------------------- | ------ | ----------------------------------------------------- |
465
+ | Text inference (blocking) | โœ… | Via LiteRT-LM C API |
466
+ | Text inference (streaming) | โœ… | Token-by-token callbacks |
467
+ | GPU inference (Metal/MPS) | โœ… | Recommended backend |
468
+ | Model download with progress | โœ… | NSURLSession, cached in `Caches/` |
469
+ | Memory tracking | โœ… | `mach_task_basic_info` |
470
+ | Multi-turn conversation | โœ… | Context retained across turns |
471
+ | Multimodal (image/audio) | ๐Ÿงช | Code paths exist; vision/audio executors experimental |
472
+ | Constrained decoding | โŒ | Requires llguidance Rust runtime |
473
+ | Function calling | โŒ | Requires Rust CXX bridge runtime |
312
474
 
313
- ### Prompt Templates
475
+ ## Building the iOS Engine
314
476
 
315
- For advanced use cases where you need to manually format prompts:
477
+ The iOS build uses a **Bazel-to-XCFramework pipeline** that compiles the LiteRT-LM C engine and all transitive dependencies into a static library (~83 MB).
316
478
 
317
- ```typescript
318
- import {
319
- applyGemmaTemplate,
320
- applyPhiTemplate,
321
- applyLlamaTemplate,
322
- ChatMessage,
323
- } from "react-native-litert-lm";
479
+ ### Prerequisites
480
+
481
+ - **Bazel 7.6.1+** (via [Bazelisk](https://github.com/bazelbuild/bazelisk) recommended)
482
+ - **Xcode command line tools** (`xcode-select --install`)
324
483
 
325
- const history: ChatMessage[] = [
326
- { role: "user", content: "Hello!" },
327
- { role: "model", content: "Hi there!" },
328
- { role: "user", content: "Tell me a joke" },
329
- ];
484
+ ### Build
330
485
 
331
- // For Gemma models
332
- const gemmaPrompt = applyGemmaTemplate(history, "You are a comedian.");
486
+ ```bash
487
+ ./scripts/build-ios-engine.sh
488
+ ```
489
+
490
+ This will:
333
491
 
334
- // For Phi models
335
- const phiPrompt = applyPhiTemplate(history);
492
+ 1. Clone/checkout LiteRT-LM `v0.9.0` source into `.litert-lm-build/`
493
+ 2. Build `//c:engine` for `ios_arm64` and `ios_sim_arm64` via Bazel
494
+ 3. Collect all transitive `.o` files (engine, protobuf, re2, sentencepiece, etc.)
495
+ 4. Compile C/C++ stubs for unavailable Rust dependencies
496
+ 5. Patch `PromptTemplate` to use a simplified template engine (no Rust MinijinjaTemplate)
497
+ 6. Merge ~1,900 object files into a static library via `libtool`
498
+ 7. Package into `ios/Frameworks/LiteRTLM.xcframework`
336
499
 
337
- // For Llama models
338
- const llamaPrompt = applyLlamaTemplate(history, "You are helpful.");
500
+ ### Output
501
+
502
+ ```
503
+ ios/Frameworks/LiteRTLM.xcframework/
504
+ โ”œโ”€โ”€ Info.plist
505
+ โ”œโ”€โ”€ ios-arm64/LiteRTLM.framework/ # Device
506
+ โ”‚ โ”œโ”€โ”€ LiteRTLM # ~81 MB static library
507
+ โ”‚ โ””โ”€โ”€ Headers/litert_lm_engine.h
508
+ โ””โ”€โ”€ ios-arm64-simulator/LiteRTLM.framework/ # Simulator
509
+ โ”œโ”€โ”€ LiteRTLM # ~83 MB static library
510
+ โ””โ”€โ”€ Headers/litert_lm_engine.h
339
511
  ```
340
512
 
341
- ## Requirements
513
+ ### FFI Stubs
342
514
 
343
- - React Native 0.76+
344
- - react-native-nitro-modules 0.33.2+
345
- - Android API 26+ (ARM64 only)
346
- - **LiteRT-LM Android SDK**: `0.9.0-alpha01` (bundled automatically)
347
- - iOS 15.0+ (coming soon)
515
+ Certain LiteRT-LM features depend on Rust libraries (llguidance, CXX bridge, MinijinjaTemplate) that are not available in the iOS Bazel build. These are replaced with stubs:
348
516
 
349
- ## Platform Support
517
+ | Stub File | Location | Purpose |
518
+ | ------------------------------------ | ---------------- | ---------------------------------------- |
519
+ | `cxx_bridge_stubs.cc` | `scripts/stubs/` | CXX bridge runtime + Rust FFI type stubs |
520
+ | `llguidance_stubs.c` | `scripts/stubs/` | llguidance constrained decoding C API |
521
+ | `gemma_model_constraint_provider.cc` | `scripts/stubs/` | Gemma constraint provider factory |
522
+
523
+ Additionally, `PromptTemplate` is patched at build time to use a simplified C++ template formatter instead of the Rust MinijinjaTemplate, which avoids all Rust FFI calls during conversation setup.
350
524
 
351
- | Platform | Status | Architecture |
352
- | -------- | -------- | ------------ |
353
- | Android | โœ… Ready | arm64-v8a |
354
- | iOS | ๐Ÿšง Stub | - |
525
+ > **Text inference works fully without these Rust components.** Only constrained decoding, function calling parsers, and advanced Jinja2 template features are affected.
355
526
 
356
527
  ## Architecture
357
528
 
358
- This library uses a split implementation strategy to maximize performance and compatibility:
529
+ ```
530
+ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
531
+ โ”‚ React Native (TypeScript) โ”‚
532
+ โ”‚ useModel() / createLLM() / sendMessage() โ”‚
533
+ โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
534
+ โ”‚ Nitro Modules JSI Bridge โ”‚
535
+ โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
536
+ โ”‚ Android (Kotlin) โ”‚ iOS (C++) โ”‚
537
+ โ”‚ HybridLiteRTLM.kt โ”‚ HybridLiteRTLM.cpp โ”‚
538
+ โ”‚ litertlm-android โ”‚ LiteRTLM C API โ”‚
539
+ โ”‚ AAR (GPU delegate) โ”‚ XCFramework (Metal) โ”‚
540
+ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
541
+ ```
359
542
 
360
- - **Android**: Uses **Kotlin** (`HybridLiteRTLM.kt`) to interface directly with the `litertlm-android` AAR.
361
- - **iOS**: Uses **C++** (`HybridLiteRTLM.cpp`) which will interface with the LiteRT-LM C++ headers (once released).
543
+ - **Android**: Kotlin (`HybridLiteRTLM.kt`) interfacing with the `litertlm-android` AAR.
544
+ - **iOS**: C++ (`HybridLiteRTLM.cpp`) interfacing with the LiteRT-LM C API via a prebuilt `LiteRTLM.xcframework`. Platform-specific code (model downloading, file management) is in Objective-C++ (`ios/IOSDownloadHelper.mm`).
362
545
 
363
- > **Note for Contributors**: Changes made to the C++ implementation (`cpp/`) **do not** affect Android. You must apply feature changes to both the Kotlin and C++ implementations.
546
+ > **For contributors**: Changes to `cpp/HybridLiteRTLM.cpp` do not affect Android. Feature changes must be applied to both the Kotlin and C++ implementations.
364
547
 
365
548
  ## License
366
549
 
367
550
  The code in this repository is licensed under the **[MIT License](LICENSE)**.
368
551
 
369
- ### โš ๏ธ Important AI Model Disclaimer
370
-
371
- This library acts as an execution engine for On-Device Large Language Models (LLMs). The AI models themselves are **not** distributed with this package and are **not** covered by the MIT license.
552
+ ### โš ๏ธ AI Model Disclaimer
372
553
 
373
- By downloading and running these models within your app, you agree to comply with their respective licenses and acceptable use policies:
554
+ This library is an execution engine for on-device LLMs. The AI models themselves are **not** distributed with this package and have their own licenses:
374
555
 
375
556
  - **Gemma (Google)**: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
376
557
  - **Llama 3 (Meta)**: [Llama 3.2 Community License](https://www.llama.com/llama3/license/)
377
- - **Qwen (Alibaba)**: [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE)
558
+ - **Qwen (Alibaba)**: [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE)
378
559
  - **Phi (Microsoft)**: [MIT License](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/LICENSE)
379
560
 
380
- _The author of `react-native-litert-lm` takes no responsibility for the outputs generated by these models or the applications built using them._
561
+ By downloading and using these models, you agree to their respective licenses and acceptable use policies. The author of `react-native-litert-lm` takes no responsibility for model outputs or applications built with them.