@gmessier/nitro-speech 0.1.3 → 0.3.0

Files changed (72)

package/README.md CHANGED

@@ -13,14 +13,24 @@
 React Native Real-Time Speech Recognition Library, powered by [Nitro Modules](https://github.com/mrousavy/nitro).
+#### Compatibility:
+‼️ The newest versions of `@gmessier/nitro-speech` require [react-native-nitro-modules 0.35.0 or higher](https://github.com/mrousavy/nitro/releases/tag/v0.35.0).
+| Compatibility | Supported versions |
+|---|---|
+| `react-native-nitro-modules <= 0.34.*` | `@gmessier/nitro-speech <= 0.2.*` |
+| `react-native-nitro-modules >= 0.35.*` | `@gmessier/nitro-speech >= 0.3.*` |
 #### Key Features:
 - Built on Nitro Modules for low-overhead native bridging
+- Uses the newer `SpeechAnalyzer` and `SpeechTranscriber` APIs on iOS 26+ (with fallback to legacy `SFSpeechRecognition` on older versions)
 - Configurable Timer for silence (default: 8 sec)
   - Callback `onAutoFinishProgress` for progress bars, etc...
   - Method `addAutoFinishTime` for single timer update
   - Method `updateAutoFinishTime` for constant timer update
-- Optional Haptic Feedback on start and finish
+- Configurable Haptic Feedback on start and finish
+- Flexible `onVolumeChange` to display input volume in UI with built-in `useVoiceInputVolume` hook
 - Speech-quality configurations:
   - Result is grouped by speech segments into Batches.
   - Param `disableRepeatingFilter` for consecutive duplicate-word filtering.
@@ -38,6 +48,7 @@ React Native Real-Time Speech Recognition Library, powered by [Nitro Modules](ht
   - [Recommended: useRecognizer Hook](#recommended-userecognizer-hook)
   - [With React Navigation (important)](#with-react-navigation-important)
   - [Cross-component control: RecognizerRef](#cross-component-control-recognizerref)
+  - [Voice input volume](#voice-input-volume)
   - [Unsafe: RecognizerSession](#unsafe-recognizersession)
 - [API Reference](#api-reference)
 - [Requirements](#requirements)
@@ -107,6 +118,7 @@ Both permissions are required for speech recognition to work on iOS.
 | **Haptic feedback** | Optional haptics on recording start/stop | ✅ | ✅ |
 | **Background handling** | Auto-stop when app loses focus/goes to background | ✅ | Not Safe *(TODO)* |
 | **Permission handling** | Dedicated `onPermissionDenied` callback | ✅ | ✅ |
+| **Voice input volume** | Normalized voice input level for UI meters (`useVoiceInputVolume`) | ✅ | ✅ |
 | **Repeating word filter** | Removes consecutive duplicate words from artifacts | ✅ | ✅ |
 | **Locale support** | Configure speech recognizer for different languages | ✅ | ✅ |
 | **Contextual strings** | Domain-specific vocabulary for improved accuracy | ✅ | ✅ |
@@ -166,7 +178,7 @@ function MyComponent() {
         // iOS specific
         iosAddPunctuation: true,
         // Android specific
-        androidMaskOffensiveWords: false,
+        maskOffensiveWords: false,
         androidFormattingPreferQuality: false,
         androidUseWebSearchModel: false,
         androidDisableBatchHandling: false,
@@ -218,17 +230,58 @@ import { RecognizerRef } from '@gmessier/nitro-speech';
 RecognizerRef.startListening({ locale: 'en-US' });
 RecognizerRef.addAutoFinishTime(5000);
 RecognizerRef.updateAutoFinishTime(10000, true);
+RecognizerRef.getIsActive();
 RecognizerRef.stopListening();
 ```
 `RecognizerRef` exposes only method handlers and is safe for cross-component method access.
+### Voice input volume
+#### useVoiceInputVolume
+The built-in `useVoiceInputVolume` hook returns the normalized voice input level (`0..1`) for UI meters.
+⚠️ **Technical limitation**: this approach re-renders the component frequently.
+```typescript
+import { Text } from 'react-native';
+import { useVoiceInputVolume } from '@gmessier/nitro-speech';
+function VoiceMeter() {
+  const volume = useVoiceInputVolume();
+  return <Text>{volume.toFixed(2)}</Text>;
+}
+```
+#### Reanimated: useSharedValue, worklets, UI thread
+As a better alternative, you can store the volume in a Reanimated `SharedValue` and apply it only on the UI thread.
+This avoids re-renders, since the volume is kept on the UI thread.
+```typescript
+import { useSharedValue } from 'react-native-reanimated';
+import { useRecognizer } from '@gmessier/nitro-speech';
+
+function VoiceMeter() {
+  const sharedVolume = useSharedValue(0)
+  const {
+    // ...
+  } = useRecognizer(
+    {
+      // ...
+      onVolumeChange: (normVolume) => {
+        "worklet";
+        // Runs as a worklet, so the value is written without a React re-render.
+        sharedVolume.value = normVolume
+      },
+      // ...
+    }
+  );
+}
+```
 ### Unsafe: RecognizerSession
 `RecognizerSession` is the hybrid object. It gives direct access to callbacks and control methods, but it is unsafe to orchestrate the full session directly from it.
 ```typescript
-import { RecognizerSession } from '@gmessier/nitro-speech';
+import { RecognizerSession, unsafe_onVolumeChange } from '@gmessier/nitro-speech';
 // Set up callbacks
 RecognizerSession.onReadyForSpeech = () => {
@@ -255,6 +308,13 @@ RecognizerSession.onPermissionDenied = () => {
   console.log('Permission denied');
 };
+RecognizerSession.onVolumeChange = (volume) => {
+  console.log('new volume: ', volume);
+};
+// OR use unsafe_onVolumeChange to enable useVoiceInputVolume hook manually
+RecognizerSession.onVolumeChange = unsafe_onVolumeChange
 // Start listening
 RecognizerSession.startListening({
   locale: 'en-US',
@@ -305,6 +365,7 @@ The `RecognizerSession.dispose()` method is **NOT SAFE** and should rarely be us
 - `stopListening()` - Stop speech recognition
 - `addAutoFinishTime(additionalTimeMs?: number)` - Add time to the auto-finish timer (or reset to original if no parameter)
 - `updateAutoFinishTime(newTimeMs: number, withRefresh?: boolean)` - Update the auto-finish timer
+- `getIsActive()` - Returns `true` while speech recognition is active
 ### `RecognizerRef`
@@ -312,6 +373,11 @@ The `RecognizerSession.dispose()` method is **NOT SAFE** and should rarely be us
 - `stopListening()`
 - `addAutoFinishTime(additionalTimeMs?: number)`
 - `updateAutoFinishTime(newTimeMs: number, withRefresh?: boolean)`
+- `getIsActive()`
+### `useVoiceInputVolume`
+- `useVoiceInputVolume(): number`
 ### `RecognizerSession`
@@ -328,8 +394,9 @@ Configuration object for speech recognition.
 - `autoFinishRecognitionMs?: number` - Auto-stop timeout in milliseconds (default: `8000`)
 - `contextualStrings?: string[]` - Array of domain-specific words for better recognition
 - `disableRepeatingFilter?: boolean` - Disable filter that removes consecutive duplicate words (default: `false`)
-- `startHapticFeedbackStyle?: 'light' | 'medium' | 'heavy'` - Haptic feedback style when microphone starts recording (default: `null` / disabled)
-- `stopHapticFeedbackStyle?: 'light' | 'medium' | 'heavy'` - Haptic feedback style when microphone stops recording (default: `null` / disabled)
+- `startHapticFeedbackStyle?: 'light' | 'medium' | 'heavy' | 'none'` - Haptic feedback style when microphone starts recording (default: `"medium"`)
+- `stopHapticFeedbackStyle?: 'light' | 'medium' | 'heavy' | 'none'` - Haptic feedback style when microphone stops recording (default: `"medium"`)
+- `maskOffensiveWords?: boolean` - Mask offensive words with asterisks. (Android 13+, iOS 26+, default: `false`. iOS <26: always `false`)
 #### iOS-Specific Parameters
@@ -337,7 +404,6 @@ Configuration object for speech recognition.
 #### Android-Specific Parameters
-- `androidMaskOffensiveWords?: boolean` - Mask offensive words (Android 13+, default: `false`)
 - `androidFormattingPreferQuality?: boolean` - Prefer quality over latency (Android 13+, default: `false`)
 - `androidUseWebSearchModel?: boolean` - Use web search language model instead of free-form (default: `false`)
 - `androidDisableBatchHandling?: boolean` - Disable default batch handling (may add many empty batches, default: `false`)
@@ -361,8 +427,3 @@ cd android && ./gradlew :react-native-nitro-modules:preBuild
 ## License
 MIT
-## TODO
-- [ ] (Android) Timer till the auto finish is called
-- [ ] (Android) Cleanup when app loses the focus

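The README hunks above rename `androidMaskOffensiveWords` to `maskOffensiveWords` and change the haptic default from disabled to `"medium"`, with `'none'` as the new explicit opt-out. A minimal migration sketch follows, assuming a caller wants to keep the 0.1.x behaviour. The `ParamsV01` and `ParamsV03` interfaces and the `migrateParams` helper are hypothetical and cover only the fields touched by this diff; they are not part of the package.

```typescript
type HapticStyle = "light" | "medium" | "heavy";

// Hypothetical subset of the 0.1.x params (haptics disabled when unset).
interface ParamsV01 {
  locale?: string;
  androidMaskOffensiveWords?: boolean;
  startHapticFeedbackStyle?: HapticStyle | null;
  stopHapticFeedbackStyle?: HapticStyle | null;
}

// Hypothetical subset of the 0.3.x params (haptics default to "medium").
interface ParamsV03 {
  locale?: string;
  maskOffensiveWords?: boolean;
  startHapticFeedbackStyle: HapticStyle | "none";
  stopHapticFeedbackStyle: HapticStyle | "none";
}

function migrateParams(old: ParamsV01): ParamsV03 {
  const {
    androidMaskOffensiveWords,
    startHapticFeedbackStyle,
    stopHapticFeedbackStyle,
    ...rest
  } = old;
  const next: ParamsV03 = {
    ...rest,
    // An unset style used to mean "no haptics"; in 0.3.x that must be spelled "none".
    startHapticFeedbackStyle: startHapticFeedbackStyle ?? "none",
    stopHapticFeedbackStyle: stopHapticFeedbackStyle ?? "none",
  };
  // Carry the renamed flag over only when the caller actually set it.
  if (androidMaskOffensiveWords !== undefined) {
    next.maskOffensiveWords = androidMaskOffensiveWords;
  }
  return next;
}
```

Note that, per the updated parameter description, `maskOffensiveWords` also applies on iOS 26+, so a flag that was Android-only in 0.1.x may now affect iOS output as well.
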
package/android/src/main/java/com/margelo/nitro/nitrospeech/recognizer/HapticImpact.kt CHANGED

@@ -8,7 +8,7 @@ import android.os.VibratorManager
 import com.margelo.nitro.nitrospeech.HapticFeedbackStyle
 class HapticImpact(
-  private val style: HapticFeedbackStyle = HapticFeedbackStyle.MEDIUM,
+  private val style: HapticFeedbackStyle?
 ) {
   private data class LegacyOneShot(
     val durationMs: Long,
@@ -16,6 +16,10 @@ class HapticImpact(
   )
   fun trigger(context: Context) {
+    if (style == HapticFeedbackStyle.NONE) {
+      return
+    }
     val vibrator = getVibrator(context) ?: return
     if (!vibrator.hasVibrator()) return
@@ -25,7 +29,10 @@ class HapticImpact(
           HapticFeedbackStyle.LIGHT -> VibrationEffect.EFFECT_TICK
           HapticFeedbackStyle.MEDIUM -> VibrationEffect.EFFECT_CLICK
           HapticFeedbackStyle.HEAVY -> VibrationEffect.EFFECT_HEAVY_CLICK
+          null -> VibrationEffect.EFFECT_CLICK
+          else -> null
         }
+        if (effect == null) { return }
         vibrator.vibrate(VibrationEffect.createPredefined(effect))
         return
       }
@@ -34,7 +41,10 @@ class HapticImpact(
         HapticFeedbackStyle.LIGHT -> LegacyOneShot(durationMs = 12L, amplitude = 50)
         HapticFeedbackStyle.MEDIUM -> LegacyOneShot(durationMs = 18L, amplitude = 100)
         HapticFeedbackStyle.HEAVY -> LegacyOneShot(durationMs = 28L, amplitude = 180)
+        null -> LegacyOneShot(durationMs = 18L, amplitude = 100)
+        else -> null
       }
+      if (legacyOneShot == null) { return }
       if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
         vibrator.vibrate(
           VibrationEffect.createOneShot(

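The `HapticImpact.kt` hunks above distinguish an unset style (`null`, which now falls back to the medium effect) from an explicit `NONE` (which skips vibration). The sketch below models the legacy one-shot branch of that decision table in TypeScript, using the durations and amplitudes shown in the diff. The `resolveLegacyOneShot` name and the lowercase style strings are assumptions made for illustration only.

```typescript
type HapticFeedbackStyle = "light" | "medium" | "heavy" | "none";

interface LegacyOneShot {
  durationMs: number;
  amplitude: number;
}

// Returns null when no vibration should be triggered.
function resolveLegacyOneShot(
  style: HapticFeedbackStyle | null | undefined
): LegacyOneShot | null {
  switch (style) {
    case "none":
      return null; // explicit opt-out
    case "light":
      return { durationMs: 12, amplitude: 50 };
    case "heavy":
      return { durationMs: 28, amplitude: 180 };
    default:
      // "medium" and an unset style share the same effect.
      return { durationMs: 18, amplitude: 100 };
  }
}
```
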
package/android/src/main/java/com/margelo/nitro/nitrospeech/recognizer/HybridRecognizer.kt CHANGED

@@ -33,6 +33,11 @@ class HybridRecognizer: HybridRecognizerSpec() {
   override var onAutoFinishProgress: ((timeLeftMs: Double) -> Unit)? = null
   override var onError: ((error: String) -> Unit)? = null
   override var onPermissionDenied: (() -> Unit)? = null
+  override var onVolumeChange: ((normVolume: Double) -> Unit)? = null
+  override fun getIsActive(): Boolean {
+    return isActive
+  }
   @DoNotStrip
   @Keep
@@ -86,7 +91,7 @@ class HybridRecognizer: HybridRecognizerSpec() {
     mainHandler.postDelayed({
       val context = NitroModules.applicationContext
       val hapticImpact = config?.stopHapticFeedbackStyle
-      if (hapticImpact != null && context != null) {
+      if (context != null) {
         HapticImpact(hapticImpact).trigger(context)
       }
       cleanup()
@@ -129,6 +134,7 @@ class HybridRecognizer: HybridRecognizerSpec() {
         val recognitionListenerSession = RecognitionListenerSession(
           autoStopper,
           config,
+          onVolumeChange
         ) { result: ArrayList<String>?, errorMessage: String?, recordingStopped: Boolean ->
           onFinishRecognition(result, errorMessage, recordingStopped)
         }
@@ -140,10 +146,10 @@ class HybridRecognizer: HybridRecognizerSpec() {
         intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, languageModel)
         intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, config?.locale ?: "en-US")
         intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
-        // set many secs to avoid cutting early
+        // Set a lot of time to avoid cutting early
         intent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, 300000)
-        if (config?.androidMaskOffensiveWords != true && Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
+        if (config?.maskOffensiveWords != true && Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
           intent.putExtra(RecognizerIntent.EXTRA_MASK_OFFENSIVE_WORDS, false)
         }
@@ -163,10 +169,8 @@ class HybridRecognizer: HybridRecognizerSpec() {
         isActive = true
         val hapticImpact = config?.startHapticFeedbackStyle
-        if (hapticImpact != null) {
-          HapticImpact(hapticImpact).trigger(context)
-        }
+        HapticImpact(hapticImpact).trigger(context)
         mainHandler.postDelayed({
           if (isActive) {
             onReadyForSpeech?.invoke()
@@ -192,6 +196,8 @@ class HybridRecognizer: HybridRecognizerSpec() {
       speechRecognizer?.destroy()
       speechRecognizer = null
       isActive = false
+      // Reset voice meter in JS consumers after stop/error cleanup.
+      onVolumeChange?.invoke(0.0)
     } catch (e: Exception) {
       onFinishRecognition(
         null,

package/android/src/main/java/com/margelo/nitro/nitrospeech/recognizer/RecognitionListenerSession.kt CHANGED

@@ -5,17 +5,32 @@ import android.speech.RecognitionListener
 import android.speech.SpeechRecognizer
 import android.util.Log
 import com.margelo.nitro.nitrospeech.SpeechToTextParams
+import kotlin.math.max
+import kotlin.math.roundToInt
 class RecognitionListenerSession (
     private val autoStopper: AutoStopper?,
     private val config: SpeechToTextParams?,
+    private val onVolumeChange: ((normVolume: Double) -> Unit)?,
     private val onFinishRecognition: (result: ArrayList<String>?, errorMessage: String?, recordingStopped: Boolean) -> Unit,
 ) {
     companion object {
         private const val TAG = "HybridRecognizer"
+        private const val SPEECH_LEVEL_THRESHOLD = 0.08f
+        private const val FLOOR_RISE_ALPHA = 0.01f
+        private const val FLOOR_FALL_ALPHA = 0.20f
+        private const val PEAK_ATTACK_ALPHA = 0.25f
+        private const val PEAK_DECAY_ALPHA = 0.01f
+        private const val METER_ATTACK = 0.35f
+        private const val METER_RELEASE = 0.08f
+        private const val MIN_SPAN_DB = 6f
+        private const val PRECISION_SCALE = 1_000_000f
     }
     private var resultBatches: ArrayList<String>? = null
+    private var noiseFloorDb = Float.NaN
+    private var peakDb = Float.NaN
+    private var levelSmoothed = 0f
     fun createRecognitionListener(): RecognitionListener {
         resultBatches = null
@@ -23,7 +38,11 @@ class RecognitionListenerSession (
             override fun onReadyForSpeech(params: Bundle?) {}
             override fun onBeginningOfSpeech() {}
             override fun onRmsChanged(rmsdB: Float) {
-                autoStopper?.indicateRecordingActivity()
+                val normLevel = normalizeRmsDb(rmsdB)
+                onVolumeChange?.invoke(normLevel.toDouble())
+                if (normLevel > SPEECH_LEVEL_THRESHOLD) {
+                    autoStopper?.indicateRecordingActivity()
+                }
             }
             override fun onBufferReceived(buffer: ByteArray?) {}
             override fun onEndOfSpeech() {}
@@ -92,15 +111,62 @@ class RecognitionListenerSession (
         }
     }
-    // Filters out 2 or more repeating words in a row, like "and and"
+    // Filters out 2 or more consecutive duplicate words, like "and and"
     private fun repeatingFilter(text: String): String {
-        val words = text.split(Regex("\\s+")).toMutableList()
-        var joiner = words[0]
+        var words = text.split(Regex("\\s+")).filter { it.isNotBlank() }
+        if (words.isEmpty()) {
+            return ""
+        }
+        val joiner = StringBuilder()
+        // 10 is an arbitrary cutoff: only the last 10 words are treated as
+        // still unstable and re-filtered. Earlier words were handled earlier.
+        if (words.size >= 10) {
+            joiner.append(words.take(words.size - 9).joinToString(" "))
+            words = words.takeLast(10)
+        } else {
+            joiner.append(words.first())
+        }
         for (i in words.indices) {
             if (i == 0) continue
-            if (words[i] == words[i-1]) continue
-            joiner += " ${words[i]}"
+            // Always add number-containing strings.
+            if (Regex("\\d+").containsMatchIn(words[i])) {
+                joiner.append(" ").append(words[i])
+                continue
+            }
+            // Skip consecutive duplicate strings.
+            if (words[i] == words[i - 1]) continue
+            joiner.append(" ").append(words[i])
         }
-        return joiner
+        return joiner.toString()
+    }
+    private fun normalizeRmsDb(rmsdB: Float): Double {
+        if (!rmsdB.isFinite()) {
+            return 0.0
+        }
+        if (noiseFloorDb.isNaN()) {
+            noiseFloorDb = rmsdB
+        }
+        if (peakDb.isNaN()) {
+            peakDb = rmsdB + MIN_SPAN_DB
+        }
+        val floorAlpha = if (rmsdB < noiseFloorDb) FLOOR_FALL_ALPHA else FLOOR_RISE_ALPHA
+        noiseFloorDb += floorAlpha * (rmsdB - noiseFloorDb)
+        val peakAlpha = if (rmsdB > peakDb) PEAK_ATTACK_ALPHA else PEAK_DECAY_ALPHA
+        peakDb += peakAlpha * (rmsdB - peakDb)
+        val span = max(peakDb - noiseFloorDb, MIN_SPAN_DB)
+        val raw = ((rmsdB - noiseFloorDb) / span).coerceIn(0f, 1f)
+        val smoothingCoeff = if (raw > levelSmoothed) METER_ATTACK else METER_RELEASE
+        levelSmoothed += smoothingCoeff * (raw - levelSmoothed)
+        return ((levelSmoothed * PRECISION_SCALE).roundToInt() / PRECISION_SCALE).toDouble()
     }
   }
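
The two helpers changed in `RecognitionListenerSession.kt` can be easier to follow as a standalone model. The TypeScript sketch below ports `normalizeRmsDb` (adaptive noise floor and peak tracking, then attack/release smoothing) and the reworked `repeatingFilter`, with constants copied from the diff. The `createVolumeNormalizer` factory is a hypothetical wrapper for the per-session state. The port uses 64-bit numbers and JavaScript regex semantics, so low-order digits and some whitespace edge cases may differ from the Kotlin `Float` version.

```typescript
const FLOOR_RISE_ALPHA = 0.01;
const FLOOR_FALL_ALPHA = 0.2;
const PEAK_ATTACK_ALPHA = 0.25;
const PEAK_DECAY_ALPHA = 0.01;
const METER_ATTACK = 0.35;
const METER_RELEASE = 0.08;
const MIN_SPAN_DB = 6;
const PRECISION_SCALE = 1_000_000;

// Hypothetical factory: each call holds the state of one recognition session.
function createVolumeNormalizer(): (rmsDb: number) => number {
  let noiseFloorDb = Number.NaN;
  let peakDb = Number.NaN;
  let levelSmoothed = 0;

  return (rmsDb: number): number => {
    if (!Number.isFinite(rmsDb)) return 0;
    // Seed both trackers from the first finite sample.
    if (Number.isNaN(noiseFloorDb)) noiseFloorDb = rmsDb;
    if (Number.isNaN(peakDb)) peakDb = rmsDb + MIN_SPAN_DB;

    // The floor drops quickly and rises slowly; the peak does the opposite.
    const floorAlpha = rmsDb < noiseFloorDb ? FLOOR_FALL_ALPHA : FLOOR_RISE_ALPHA;
    noiseFloorDb += floorAlpha * (rmsDb - noiseFloorDb);
    const peakAlpha = rmsDb > peakDb ? PEAK_ATTACK_ALPHA : PEAK_DECAY_ALPHA;
    peakDb += peakAlpha * (rmsDb - peakDb);

    // Map the sample into 0..1 over a span of at least MIN_SPAN_DB.
    const span = Math.max(peakDb - noiseFloorDb, MIN_SPAN_DB);
    const raw = Math.min(1, Math.max(0, (rmsDb - noiseFloorDb) / span));

    // Fast attack, slow release keeps the meter from flickering.
    const coeff = raw > levelSmoothed ? METER_ATTACK : METER_RELEASE;
    levelSmoothed += coeff * (raw - levelSmoothed);
    return Math.round(levelSmoothed * PRECISION_SCALE) / PRECISION_SCALE;
  };
}

function repeatingFilter(text: string): string {
  const words = text.split(/\s+/).filter((w) => w.trim().length > 0);
  if (words.length === 0) return "";

  // Only the last 10 words are re-filtered; the prefix is copied through as-is.
  const long = words.length >= 10;
  const tail = long ? words.slice(-10) : words;
  const out = long ? words.slice(0, words.length - 9) : words.slice(0, 1);

  tail.forEach((word, i) => {
    if (i === 0) return; // already present in `out`
    // Words containing digits are always kept; other consecutive duplicates are dropped.
    if (/\d/.test(word) || word !== tail[i - 1]) out.push(word);
  });
  return out.join(" ");
}
```

In the diff, the normalized level also gates the auto-stop timer: `indicateRecordingActivity()` is now called only when the level exceeds `SPEECH_LEVEL_THRESHOLD` (`0.08`), whereas every `onRmsChanged` callback counted as activity before.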