npm - react-native-litert-lm - Versions diffs - 0.1.0 - Mend

react-native-litert-lm 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (53)

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 hung-yueh
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,259 @@
+# react-native-litert-lm
+High-performance LLM inference for React Native powered by [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) and [Nitro Modules](https://github.com/mrousavy/nitro). Optimized for **Gemma 3n** and other on-device language models.
+## Features
+- 🚀 **Native Performance** - Kotlin (Android) / C++ (iOS) implementation via Nitro Modules
+- 🧠 **Gemma 3n Ready** - First-class support for Gemma 3n E2B/E4B models
+- ⚡ **GPU Acceleration** - GPU delegate (Android), Metal (iOS when available)
+- 📦 **Bundled Tokenizer** - No separate tokenization library needed
+- 🔄 **Streaming Support** - Token-by-token generation callbacks
+- 📱 **Cross-Platform** - Android API 26+ (iOS coming soon)
+- 🚧 **Multimodal** - Image and audio input (Coming Soon to Android)
+## Status
+> ⚠️ **Early Preview**: This library is under active development. Android is functional on devices with enough RAM; the iOS implementation is pending the LiteRT-LM iOS release. Please report any issues on the [GitHub repository](https://github.com/litert-community/react-native-litert-lm).
+## Installation
+```bash
+npm install react-native-litert-lm react-native-nitro-modules
+```
+### Expo
+Add to your `app.json`:
+```json
+{
+  "expo": {
+    "plugins": ["react-native-litert-lm"],
+    "android": {
+      "minSdkVersion": 26
+    }
+  }
+}
+```
+Then create a development build:
+```bash
+npx expo prebuild
+npx expo run:android
+```
+> **Note**: Only ARM devices are supported (physical devices or ARM emulators). x86_64 emulators are not supported.
+### Bare React Native
+```bash
+(cd android && ./gradlew clean)
+(cd ios && pod install)  # iOS coming soon
+```
+## Model Management
+LiteRT-LM models (like Gemma 3n) are large files (3GB+) and cannot be bundled directly into your app's binary. You must download them at runtime to a writable directory (e.g., `DocumentDirectory`).
+### Downloading Models
+We recommend using `react-native-file-access` or `expo-file-system` to download models.
+```typescript
+import { Dirs, FileSystem } from "react-native-file-access";
+// With Expo, use expo-file-system instead (its API differs: documentDirectory, getInfoAsync, downloadAsync).
+const MODEL_URL =
+  "https://huggingface.co/litert-community/gemma-3n-2b-it/resolve/main/model.litertlm";
+const localPath = `${Dirs.DocumentDir}/gemma-3n.litertlm`;
+async function downloadModel() {
+  // Skip the download if the model is already on disk
+  if (await FileSystem.exists(localPath)) return localPath;
+  // Write the response straight to disk; the file is too large to hold in memory
+  await FileSystem.fetch(MODEL_URL, { path: localPath });
+  return localPath;
+}
+```
+## Usage
+### Basic Generation
+```typescript
+import { createLLM } from "react-native-litert-lm";
+const llm = createLLM();
+// Load a Gemma 3n model
+llm.loadModel("/path/to/gemma-3n-e2b.litertlm", {
+  backend: "gpu",
+  temperature: 0.7,
+  maxTokens: 512,
+});
+// Generate response
+const response = llm.sendMessage("What is the capital of France?");
+console.log(response);
+// Clean up
+llm.close();
+```
+### Streaming Generation
+```typescript
+let story = "";
+llm.sendMessageAsync("Tell me a story", (token, done) => {
+  story += token; // process.stdout is not available in React Native
+  if (done) console.log(`${story}\n--- Done ---`);
+});
+```
+### Multimodal (Image/Audio)
+```typescript
+// Not yet implemented on Android in 0.1.0: both calls below currently throw.
+// Image input (for vision models)
+const response = llm.sendMessageWithImage(
+  "What's in this image?",
+  "/path/to/image.jpg",
+);
+// Audio input (for audio models)
+const transcription = llm.sendMessageWithAudio(
+  "Transcribe this audio",
+  "/path/to/audio.wav",
+);
+```
+### Check Performance
+```typescript
+const stats = llm.getStats();
+console.log(`Generated ${stats.completionTokens} tokens`);
+console.log(`Speed: ${stats.tokensPerSecond.toFixed(1)} tokens/sec`);
+```
+## Supported Models
+Download `.litertlm` models from [HuggingFace](https://huggingface.co/litert-community):
+| Model         | Size   | Min Device RAM | Use Case                  |
+| ------------- | ------ | -------------- | ------------------------- |
+| Gemma 3n E2B  | ~3GB   | 4GB+           | Efficient, fast responses |
+| Gemma 3n E4B  | ~4GB   | 8GB+           | Higher quality            |
+| Gemma 3 1B    | ~1GB   | 4GB+           | Smallest, fastest         |
+| Phi-4 Mini    | ~2GB   | 4GB+           | Microsoft's small LLM     |
+| Qwen 2.5 1.5B | ~1.5GB | 4GB+           | Multilingual              |
+## API Reference
+### `createLLM(): LiteRTLM`
+Creates a new LLM inference engine instance.
+### `loadModel(path, config?)`
+- `path: string` - Absolute path to `.litertlm` file
+- `config.backend` - `'cpu'` | `'gpu'` | `'npu'` (default: `'gpu'`)
+- `config.temperature` - Sampling temperature (default: 0.7)
+- `config.topK` - Top-K sampling (default: 40)
+- `config.topP` - Top-P (nucleus) sampling (default: 0.95)
+- `config.maxTokens` - Max generation length (default: 1024)
+> **Note**: Vision encoder is always set to GPU (required by Gemma 3n). Audio encoder is always set to CPU (optimal for audio).
+#### Backend Options
+| Backend | Description       | Speed   | Compatibility                              |
+| ------- | ----------------- | ------- | ------------------------------------------ |
+| `'cpu'` | CPU inference     | Slowest | Always available; lower RAM requirement    |
+| `'gpu'` | GPU acceleration  | Fast    | Recommended default                        |
+| `'npu'` | NPU/Neural Engine | Fastest | Requires supported hardware                |
+> ⚠️ **NPU Note**: NPU acceleration requires compatible hardware (Qualcomm Hexagon, MediaTek APU, etc.). If unavailable, LiteRT-LM automatically falls back to GPU.
+### `sendMessage(message): string`
+Blocking generation. Returns complete response.
+### `sendMessageAsync(message, callback)`
+Streaming generation. Callback receives `(token, isDone)`.
+### `sendMessageWithImage(message, imagePath): string`
+Send a message with an image attachment (for vision models).
+### `sendMessageWithAudio(message, audioPath): string`
+Send a message with an audio attachment (for audio models).
+### `getHistory(): Message[]`
+Get conversation history.
+### `resetConversation()`
+Clear context and start fresh.
+### `close()`
+Release all native resources.
+### `getRecommendedBackend(): Backend`
+Returns the recommended backend for the current platform (usually `'gpu'`).
+### `checkBackendSupport(backend): string | undefined`
+Returns a warning message if the specified backend may have issues on the current platform, or `undefined` if OK.
+```typescript
+import { checkBackendSupport } from "react-native-litert-lm";
+const warning = checkBackendSupport("npu");
+if (warning) {
+  console.warn(warning);
+}
+```
+## Requirements
+- React Native 0.76+
+- react-native-nitro-modules 0.33.2+
+- Android API 26+ (ARM64 only)
+- **LiteRT-LM Android SDK**: `0.9.0-alpha01` (bundled automatically)
+- iOS 15.0+ (coming soon)
+## Platform Support
+| Platform | Status   | Architecture |
+| -------- | -------- | ------------ |
+| Android  | ✅ Ready | arm64-v8a    |
+| iOS      | 🚧 Stub  | -            |
+## Architecture
+This library uses a split implementation strategy to maximize performance and compatibility:
+- **Android**: Uses **Kotlin** (`HybridLiteRTLM.kt`) to interface directly with the `litertlm-android` AAR.
+- **iOS**: Uses **C++** (`HybridLiteRTLM.cpp`) which will interface with the LiteRT-LM C++ headers (once released).
+> **Note for Contributors**: Changes made to the C++ implementation (`cpp/`) **do not** affect Android. You must apply feature changes to both the Kotlin and C++ implementations.
+## License
+The code in this repository is licensed under the **[MIT License](LICENSE)**.
+### ⚠️ Important AI Model Disclaimer
+This library acts as an execution engine for On-Device Large Language Models (LLMs). The AI models themselves are **not** distributed with this package and are **not** covered by the MIT license.
+By downloading and running these models within your app, you agree to comply with their respective licenses and acceptable use policies:
+- **Gemma (Google)**: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
+- **Llama 3 (Meta)**: [Llama 3.2 Community License](https://www.llama.com/llama3/license/)
+- **Qwen (Alibaba)**: [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE)
+- **Phi (Microsoft)**: [MIT License](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/LICENSE)
+_The author of `react-native-litert-lm` takes no responsibility for the outputs generated by these models or the applications built using them._
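
Reviewer note: the README documents `sendMessageAsync(message, callback)` as delivering `(token, isDone)` through a callback. The sketch below shows one way a caller might wrap that callback in a Promise. `StreamingLLM`, `streamToString`, and `fakeLLM` are hypothetical names introduced here for illustration; only the callback shape comes from the README. The Kotlin implementation in this diff reports failures as a final token starting with `Error:`, so this wrapper cannot tell a failure from ordinary output.

```typescript
// Hypothetical minimal interface: only the callback shape is taken from the README.
interface StreamingLLM {
  sendMessageAsync(
    message: string,
    onToken: (token: string, done: boolean) => void,
  ): void;
}

// Collects streamed tokens and resolves with the full text once `done` is true.
function streamToString(
  llm: StreamingLLM,
  message: string,
  onPartial?: (textSoFar: string) => void,
): Promise<string> {
  return new Promise((resolve, reject) => {
    let text = "";
    try {
      llm.sendMessageAsync(message, (token, done) => {
        text += token;
        if (onPartial) onPartial(text);
        if (done) resolve(text);
      });
    } catch (err) {
      // A synchronous throw (for example, no model loaded) rejects the Promise.
      reject(err);
    }
  });
}

// Stand-in for the native module so the sketch runs without a device.
const fakeLLM: StreamingLLM = {
  sendMessageAsync(_message, onToken) {
    ["Once", " upon", " a time"].forEach((t) => onToken(t, false));
    onToken("", true);
  },
};
```

In a component, `onPartial` could feed a state setter so the UI updates as tokens arrive.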

package/android/CMakeLists.txt ADDED

@@ -0,0 +1,32 @@
+cmake_minimum_required(VERSION 3.18.0)
+set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
+# Define the library name - must match what Nitrogen expects
+project(LiteRTLM)
+set(CMAKE_CXX_STANDARD 20)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+# Define the shared library (main entry point)
+add_library(
+    LiteRTLM
+    SHARED
+    ../cpp/cpp-adapter.cpp
+    # Additional sources are added by autolinking.cmake below
+)
+# Allow undefined symbols - they will be resolved at runtime when the app
+# loads the NitroModules shared library. This is required because we're
+# building a library that depends on NitroModules symbols which are only
+# available at runtime.
+set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--allow-shlib-undefined")
+# Include Nitrogen autolinking - this adds all generated sources and links
+include(${CMAKE_SOURCE_DIR}/../nitrogen/generated/android/LiteRTLM+autolinking.cmake)
+# Android system libraries
+target_link_libraries(
+    LiteRTLM
+    android
+    log
+)

package/android/build.gradle ADDED

@@ -0,0 +1,88 @@
+// Module-level build.gradle for react-native-litert-lm
+// Configures Android build with Kotlin HybridObject + C++ JNI glue
+plugins {
+    id 'com.android.library'
+    id 'org.jetbrains.kotlin.android'
+}
+// Apply Nitrogen autolinking
+apply from: '../nitrogen/generated/android/LiteRTLM+autolinking.gradle'
+android {
+    namespace "dev.litert.litertlm"
+    compileSdk 35
+    defaultConfig {
+        minSdk 26  // LiteRT-LM requires API 26+
+        externalNativeBuild {
+            cmake {
+                cppFlags "-O2 -fexceptions -frtti -std=c++20"
+                arguments "-DANDROID_STL=c++_shared"
+            }
+        }
+        ndk {
+            abiFilters 'arm64-v8a'
+        }
+    }
+    buildFeatures {
+        prefab true
+    }
+    externalNativeBuild {
+        cmake {
+            path "CMakeLists.txt"
+            version "3.22.1"
+        }
+    }
+    compileOptions {
+        sourceCompatibility JavaVersion.VERSION_17
+        targetCompatibility JavaVersion.VERSION_17
+    }
+    kotlinOptions {
+        jvmTarget = '17'
+    }
+    sourceSets {
+        main {
+            java.srcDirs += [
+                'src/main/java',
+                '../nitrogen/generated/android/kotlin'
+            ]
+        }
+    }
+    packaging {
+        jniLibs {
+            keepDebugSymbols.add("**/*.so")
+        }
+    }
+}
+repositories {
+    google()
+    mavenCentral()
+}
+dependencies {
+    // React Native
+    implementation 'com.facebook.react:react-android'
+    // Nitro Modules
+    implementation project(':react-native-nitro-modules')
+    // fbjni for HybridObject JNI bridge
+    implementation 'com.facebook.fbjni:fbjni:0.6.0'
+    // Kotlin coroutines for async operations
+    implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3'
+    implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3'
+    // LiteRT-LM Kotlin API
+    implementation 'com.google.ai.edge.litertlm:litertlm-android:0.9.0-alpha01'
+}

package/android/src/main/AndroidManifest.xml ADDED

@@ -0,0 +1,11 @@
+<?xml version="1.0" encoding="utf-8"?>
+<manifest xmlns:android="http://schemas.android.com/apk/res/android">
+    <application>
+        <!-- ContentProvider for initializing application context at startup -->
+        <provider
+            android:name="dev.litert.litertlm.LiteRTLMInitProvider"
+            android:authorities="${applicationId}.litertlm.init"
+            android:exported="false"
+            android:initOrder="100" />
+    </application>
+</manifest>

package/android/src/main/java/com/margelo/nitro/dev/litert/litertlm/HybridLiteRTLM.kt ADDED

@@ -0,0 +1,280 @@
+///
+/// HybridLiteRTLM.kt
+/// Kotlin implementation of LiteRTLM HybridObject using LiteRT-LM Android SDK.
+///
+package com.margelo.nitro.dev.litert.litertlm
+import android.util.Log
+import androidx.annotation.Keep
+import com.facebook.proguard.annotations.DoNotStrip
+import dev.litert.litertlm.LiteRTLMInitProvider
+import com.google.ai.edge.litertlm.Engine
+import com.google.ai.edge.litertlm.Conversation
+import com.google.ai.edge.litertlm.EngineConfig
+import com.google.ai.edge.litertlm.ConversationConfig
+import com.margelo.nitro.dev.litert.litertlm.Backend
+import com.margelo.nitro.dev.litert.litertlm.GenerationStats
+import com.margelo.nitro.dev.litert.litertlm.HybridLiteRTLMSpec
+import com.margelo.nitro.dev.litert.litertlm.LLMConfig
+import com.margelo.nitro.dev.litert.litertlm.Message
+import com.margelo.nitro.dev.litert.litertlm.Role
+// Alias to avoid confusion with our generated Message type
+typealias LiteRTMessage = com.google.ai.edge.litertlm.Message
+/**
+ * Kotlin implementation of LiteRTLM using the LiteRT-LM Android SDK.
+ * This class bridges between React Native (via Nitro) and the Google LiteRT-LM Engine.
+ */
+@DoNotStrip
+@Keep
+class HybridLiteRTLM : HybridLiteRTLMSpec() {
+    companion object {
+        private const val TAG = "HybridLiteRTLM"
+    }
+    // LiteRT-LM Engine and Conversation
+    private var engine: Engine? = null
+    private var conversation: Conversation? = null
+    // Conversation history for getHistory()
+    private val history = mutableListOf<Message>()
+    // Last generation stats
+    private var lastStats = GenerationStats(
+        promptTokens = 0.0,
+        completionTokens = 0.0,
+        totalTokens = 0.0,
+        timeToFirstToken = 0.0,
+        totalTime = 0.0,
+        tokensPerSecond = 0.0
+    )
+    // Configuration
+    private var backend: Backend = Backend.GPU
+    private var temperature: Double = 0.7
+    private var topK: Int = 40
+    private var topP: Double = 0.95
+    private var maxTokens: Int = 1024
+    override val memorySize: Long
+        get() = 10L * 1024L * 1024L // ~10MB estimate
+    // -------------------------------------------------------------------------
+    // loadModel - Initialize LiteRT-LM Engine and Conversation
+    // -------------------------------------------------------------------------
+    override fun loadModel(modelPath: String, config: LLMConfig?) {
+        Log.i(TAG, "loadModel: $modelPath")
+        // Clean up existing resources
+        close()
+        // Apply configuration
+        config?.let { cfg ->
+            cfg.backend?.let { backend = it }
+            cfg.temperature?.let { temperature = it }
+            cfg.topK?.let { topK = it.toInt() }
+            cfg.topP?.let { topP = it }
+            cfg.maxTokens?.let { maxTokens = it.toInt() }
+        }
+        try {
+            // Map our Backend enum to LiteRT-LM Backend enum
+            val lmBackend = when (backend) {
+                Backend.GPU -> com.google.ai.edge.litertlm.Backend.GPU
+                Backend.NPU -> {
+                    Log.i(TAG, "NPU backend requested - requires hardware support")
+                    com.google.ai.edge.litertlm.Backend.NPU
+                }
+                else -> com.google.ai.edge.litertlm.Backend.CPU
+            }
+            // Vision backend: hardcoded to GPU (required by Gemma 3n)
+            val lmVisionBackend = com.google.ai.edge.litertlm.Backend.GPU
+            // Audio backend: hardcoded to CPU (optimal for audio processing)
+            val lmAudioBackend = com.google.ai.edge.litertlm.Backend.CPU
+            Log.i(TAG, "Backend config: main=$lmBackend, vision=$lmVisionBackend (hardcoded), audio=$lmAudioBackend (hardcoded)")
+            // Get cache directory from application context
+            // LiteRT-LM needs this to store temporary compiled model files
+            val cacheDirectory = LiteRTLMInitProvider.applicationContext?.cacheDir?.absolutePath
+            Log.i(TAG, "Using cache directory: $cacheDirectory")
+            // Create Engine configuration
+            val engineConfig = EngineConfig(
+                modelPath = modelPath,
+                backend = lmBackend,
+                visionBackend = lmVisionBackend,
+                audioBackend = lmAudioBackend,
+                maxNumTokens = maxTokens,
+                cacheDir = cacheDirectory
+            )
+            // Create Engine (heavyweight - loads model)
+            engine = Engine(engineConfig).also { it.initialize() }
+            Log.i(TAG, "Engine created and initialized successfully")
+            // Create Conversation (lightweight - holds KV cache)
+            createNewConversation()
+            Log.i(TAG, "Conversation created successfully")
+        } catch (e: Exception) {
+            Log.e(TAG, "Failed to load model: ${e.message}", e)
+            throw RuntimeException("Failed to load model: ${e.message}", e)
+        }
+    }
+    // -------------------------------------------------------------------------
+    // sendMessage - Blocking text inference
+    // -------------------------------------------------------------------------
+    override fun sendMessage(message: String): String {
+        ensureLoaded()
+        // Add user message to history
+        history.add(Message(Role.USER, message))
+        // Log the prompt; the raw text is passed to the Conversation as-is
+        Log.i(TAG, "sendMessage: $message")
+        // Blocking inference
+        // LiteRT-LM expects a Message object, not String
+        val userMsg = LiteRTMessage.of(message)
+        val responseMsg = conversation!!.sendMessage(userMsg)
+        // Extract text from response Message
+        val response = responseMsg.contents
+            .filterIsInstance<com.google.ai.edge.litertlm.Content.Text>()
+            .joinToString("") { it.text }
+        // Add model response to history
+        history.add(Message(Role.MODEL, response))
+        // Update stats: token counts are rough estimates (~4 characters per token) and timing is left at 0,
+        // because the SDK does not return full stats for the sync call
+        lastStats = GenerationStats(
+            promptTokens = message.length / 4.0,
+            completionTokens = response.length / 4.0,
+            totalTokens = (message.length + response.length) / 4.0,
+            timeToFirstToken = 0.0,
+            totalTime = 0.0,
+            tokensPerSecond = 0.0
+        )
+        return response
+    }
+    // -------------------------------------------------------------------------
+    // sendMessageAsync - Streaming inference
+    // -------------------------------------------------------------------------
+    override fun sendMessageAsync(message: String, onToken: (String, Boolean) -> Unit) {
+        ensureLoaded()
+        // Add user message to history
+        history.add(Message(Role.USER, message))
+        Log.d(TAG, "sendMessageAsync: $message")
+        val fullResponseBuilder = StringBuilder()
+        // Define callback
+        val listener = object : com.google.ai.edge.litertlm.MessageCallback {
+            override fun onMessage(responseMsg: LiteRTMessage) {
+                val chunk = responseMsg.contents
+                    .filterIsInstance<com.google.ai.edge.litertlm.Content.Text>()
+                    .joinToString("") { it.text }
+                onToken(chunk, false)
+                if (chunk.isNotEmpty()) {
+                    fullResponseBuilder.append(chunk)
+                }
+            }
+            override fun onDone() {
+                onToken("", true)
+                val fullResponse = fullResponseBuilder.toString()
+                history.add(Message(Role.MODEL, fullResponse))
+                Log.d(TAG, "sendMessageAsync done. Length: ${fullResponse.length}")
+            }
+            override fun onError(t: Throwable) {
+                Log.e(TAG, "Async generation failed", t)
+                onToken("Error: ${t.message}", true)
+            }
+        }
+        try {
+            // Construct Message object
+            val userMsg = LiteRTMessage.of(message)
+            // LiteRT-LM async call - SDK handles threading
+            conversation!!.sendMessageAsync(userMsg, listener)
+        } catch (e: Exception) {
+            Log.e(TAG, "Failed to initiate async generation", e)
+            onToken("Error: ${e.message}", true)
+        }
+    }
+    // -------------------------------------------------------------------------
+    // Multimodal methods
+    // -------------------------------------------------------------------------
+    override fun sendMessageWithImage(message: String, imagePath: String): String {
+        // TODO: Implement image loading from path
+        throw RuntimeException("Multimodal (Image) not yet implemented in this wrapper")
+    }
+    override fun sendMessageWithAudio(message: String, audioPath: String): String {
+        // TODO: Implement audio loading from path
+        throw RuntimeException("Multimodal (Audio) not yet implemented in this wrapper")
+    }
+    // -------------------------------------------------------------------------
+    // Helpers
+    // -------------------------------------------------------------------------
+    override fun getHistory(): Array<Message> {
+        return history.toTypedArray()
+    }
+    override fun resetConversation() {
+        history.clear()
+        createNewConversation()
+    }
+    override fun isReady(): Boolean {
+        return isLoaded_
+    }
+    // Computed property for isReady: a model counts as loaded once an engine exists
+    private val isLoaded_: Boolean
+        get() = engine != null
+    override fun getStats(): GenerationStats {
+        return lastStats
+    }
+    override fun close() {
+        Log.d(TAG, "Closing resources")
+        try {
+            // Only drops the references; nothing is closed explicitly, so native resources
+            // are released whenever (and if) the SDK objects are finalized.
+            // TODO: call an explicit close method here if the SDK exposes one.
+            conversation = null
+            engine = null
+        } catch (e: Exception) {
+            Log.e(TAG, "Error closing resources", e)
+        }
+    }
+    private fun ensureLoaded() {
+        if (engine == null) {
+            throw RuntimeException("LiteRTLM: No model loaded. Call loadModel() first.")
+        }
+    }
+    private fun createNewConversation() {
+        ensureLoaded()
+        // Replaces any previous conversation; the old one is not explicitly disposed
+        conversation = engine!!.createConversation()
+    }
+}
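
Reviewer note: `sendMessage` in the Kotlin file above fills `lastStats` with a character-count estimate and leaves `totalTime` and `tokensPerSecond` at 0, so the README's "Check Performance" snippet would print a speed of 0.0. Until the native side reports timing, a caller could measure throughput on the JavaScript side. The sketch below assumes only `sendMessage` and the `completionTokens` field of `getStats()` from this diff; `TimedLLM`, `timedSend`, `stubLLM`, and the injectable `now` clock are hypothetical names introduced for illustration.

```typescript
// Hypothetical minimal interface covering the two calls used here.
interface TimedLLM {
  sendMessage(message: string): string;
  getStats(): { completionTokens: number };
}

interface TimedResult {
  response: string;
  elapsedMs: number;
  tokensPerSecond: number;
}

// Times a blocking sendMessage call and derives tokens/sec from the reported token count.
// `now` is injectable so the function can be exercised with a fake clock.
function timedSend(
  llm: TimedLLM,
  message: string,
  now: () => number = Date.now,
): TimedResult {
  const start = now();
  const response = llm.sendMessage(message);
  const elapsedMs = now() - start;
  const tokens = llm.getStats().completionTokens;
  // Guard against division by zero when the call returns within clock resolution.
  const tokensPerSecond = elapsedMs > 0 ? (tokens * 1000) / elapsedMs : 0;
  return { response, elapsedMs, tokensPerSecond };
}

// Stand-in for the native module so the sketch runs without a device.
const stubLLM: TimedLLM = {
  sendMessage: () => "Paris is the capital of France.",
  getStats: () => ({ completionTokens: 8 }),
};
```

Because `completionTokens` is itself an estimate in this version, the resulting figure is only a rough indication of throughput.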