PyPI - s2t - Versions diffs - 0.1.5__tar.gz → 0.1.6.post1.dev0__tar.gz - Mend

s2t 0.1.5 (tar.gz) → 0.1.6.post1.dev0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

{s2t-0.1.5/src/s2t.egg-info → s2t-0.1.6.post1.dev0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: s2t
-Version: 0.1.5
+Version: 0.1.6.post1.dev0
 Summary: Speech to Text (s2t): Record audio, run Whisper, export formats, and copy transcript to clipboard.
 Author: Maintainers
 License-Expression: LicenseRef-Proprietary
@@ -55,6 +55,8 @@ System requirements (Linux)
   - Translate to English: `-t` (long: `--translate`). You may still provide `--lang` as an input-language hint if you want.
   - List available models and exit: `-L` (long: `--list-models`)
   - Recording format: `-f flac|wav|mp3` (long: `--recording-format`), default `flac`. MP3 requires ffmpeg; if absent, it falls back to FLAC with a warning.
+  - Auto-split on silence: `--silence-sec 1.0` (default `1.0`; `0` disables). When continuous silence ≥ this many seconds is detected, the current chunk is ended automatically.
+  - Minimum chunk length for auto-split: `--min-chunk-sec 5.0` (default `5.0`). Prevents very short chunks and avoids splitting early in a sentence.
   - Prompt mode (spoken prompt): `-p` (long: `--prompt`). Speak your prompt first, then press SPACE to use it as prompt and continue with your main content. If you press ENTER instead of SPACE, no prompt is used; the spoken audio is transcribed as normal payload and the session ends.
   - Keep chunk files: `--keep-chunks` — by default, per‑chunk audio and per‑chunk Whisper outputs are deleted after the final merge.
   - Open transcript for editing: `-e` (long: `--edit`) — opens the generated `.txt` in your shell editor (`$VISUAL`/`$EDITOR`).
@@ -69,6 +71,11 @@ Outputs are written into a timestamped folder under the chosen output directory
 - Final outputs: `recording.flac/.wav` (and `recording.mp3` if requested and ffmpeg available), plus `recording.txt/.srt/.vtt/.tsv/.json`
 - Clipboard mirrors the combined `.txt` with blank lines between chunks.
+Auto-splitting details
+- SPACE always splits immediately; ENTER finishes the recording.
+- With `--silence-sec > 0`, chunks end automatically after detected continuous silence of that many seconds.
+- Auto-split only triggers once the current chunk has at least `--min-chunk-sec` seconds and after speech has been detected (to ignore leading silence). A short internal cooldown avoids duplicate splits.
 ## Makefile (optional)
 - Setup venv + dev deps: `make setup`
 - Lint/format/test: `make lint`, `make format`, `make test`; combined gate: `make check`
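The auto-splitting rules above reduce to a single check made after each block of audio is written. The following is a minimal sketch of that check, not the package's code. `should_auto_split` is a hypothetical helper, and the 16 kHz sample rate in the worked numbers is an assumption; the real rate is whatever the recorder is configured with. The 0.2 s cooldown default is taken from the `split_cooldown_sec` value in the `recorder.py` diff.

```python
def should_auto_split(
    silent_frames: int,
    chunk_frames: int,
    seen_speech: bool,
    secs_since_last_split: float,
    samplerate: int,
    silence_sec: float = 1.0,
    min_chunk_sec: float = 5.0,
    cooldown_sec: float = 0.2,
) -> bool:
    """Return True when the current chunk should be ended automatically.

    Hypothetical helper mirroring the documented rules:
    - silence_sec <= 0 disables auto-split entirely;
    - the trailing silence must last at least silence_sec seconds;
    - the chunk must already hold at least min_chunk_sec seconds of audio;
    - speech must have been seen in this chunk (leading silence is ignored);
    - a short cooldown suppresses duplicate splits.
    """
    if silence_sec <= 0.0:
        return False
    enough_silence = silent_frames >= int(samplerate * silence_sec)
    enough_length = chunk_frames >= int(samplerate * min_chunk_sec)
    cooldown_ok = secs_since_last_split >= cooldown_sec
    return enough_silence and enough_length and seen_speech and cooldown_ok


# Assumed 16 kHz: 1.0 s of silence = 16 000 frames, 5.0 s of chunk = 80 000 frames.
print(should_auto_split(16_000, 80_000, True, 10.0, 16_000))   # True
print(should_auto_split(16_000, 40_000, True, 10.0, 16_000))   # False: chunk shorter than 5 s
print(should_auto_split(16_000, 80_000, False, 10.0, 16_000))  # False: only silence so far
```

Writing the rules as one pure predicate makes each condition easy to test in isolation. The recorder evaluates the same four conditions inline, behind its `silence_sec > 0.0` guard.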

{s2t-0.1.5 → s2t-0.1.6.post1.dev0}/README.md RENAMED

@@ -27,6 +27,8 @@ System requirements (Linux)
   - Translate to English: `-t` (long: `--translate`). You may still provide `--lang` as an input-language hint if you want.
   - List available models and exit: `-L` (long: `--list-models`)
   - Recording format: `-f flac|wav|mp3` (long: `--recording-format`), default `flac`. MP3 requires ffmpeg; if absent, it falls back to FLAC with a warning.
+  - Auto-split on silence: `--silence-sec 1.0` (default `1.0`; `0` disables). When continuous silence ≥ this many seconds is detected, the current chunk is ended automatically.
+  - Minimum chunk length for auto-split: `--min-chunk-sec 5.0` (default `5.0`). Prevents very short chunks and avoids splitting early in a sentence.
   - Prompt mode (spoken prompt): `-p` (long: `--prompt`). Speak your prompt first, then press SPACE to use it as prompt and continue with your main content. If you press ENTER instead of SPACE, no prompt is used; the spoken audio is transcribed as normal payload and the session ends.
   - Keep chunk files: `--keep-chunks` — by default, per‑chunk audio and per‑chunk Whisper outputs are deleted after the final merge.
   - Open transcript for editing: `-e` (long: `--edit`) — opens the generated `.txt` in your shell editor (`$VISUAL`/`$EDITOR`).
@@ -41,6 +43,11 @@ Outputs are written into a timestamped folder under the chosen output directory
 - Final outputs: `recording.flac/.wav` (and `recording.mp3` if requested and ffmpeg available), plus `recording.txt/.srt/.vtt/.tsv/.json`
 - Clipboard mirrors the combined `.txt` with blank lines between chunks.
+Auto-splitting details
+- SPACE always splits immediately; ENTER finishes the recording.
+- With `--silence-sec > 0`, chunks end automatically after detected continuous silence of that many seconds.
+- Auto-split only triggers once the current chunk has at least `--min-chunk-sec` seconds and after speech has been detected (to ignore leading silence). A short internal cooldown avoids duplicate splits.
 ## Makefile (optional)
 - Setup venv + dev deps: `make setup`
 - Lint/format/test: `make lint`, `make format`, `make test`; combined gate: `make check`

{s2t-0.1.5 → s2t-0.1.6.post1.dev0}/src/s2t/cli.py RENAMED

@@ -213,6 +213,8 @@ def run_session(opts: SessionOptions) -> int:
         verbose=opts.verbose,
         pause_after_first_chunk=opts.prompt,
         resume_event=prompt_resume_event,
+        silence_sec=opts.silence_sec,
+        min_chunk_sec=opts.min_chunk_sec,
     )
     t0 = time.perf_counter()
     chunk_paths, chunk_frames, chunk_offsets = rec.run(tx_q)
@@ -441,6 +443,18 @@ def main(argv: list[str] | None = None) -> int:
         default=0,
         help="Debounce window for SPACE (ms). If >0, ignores rapid successive space presses",
     )
+    parser.add_argument(
+        "--silence-sec",
+        type=float,
+        default=1.0,
+        help="Auto-split when continuous silence >= this many seconds (0 disables)",
+    )
+    parser.add_argument(
+        "--min-chunk-sec",
+        type=float,
+        default=5.0,
+        help="Minimum duration a chunk must reach before auto-split can trigger",
+    )
     parser.add_argument(
         "--native-segmentation",
         action="store_true",
@@ -496,6 +510,8 @@ def main(argv: list[str] | None = None) -> int:
             verbose=args.verbose,
             edit=args.edit,
             debounce_ms=getattr(args, "debounce_ms", 0),
+            silence_sec=getattr(args, "silence_sec", 1.0),
+            min_chunk_sec=getattr(args, "min_chunk_sec", 5.0),
             profile=args.profile,
             keep_chunks=getattr(args, "keep_chunks", False),
             prompt=getattr(args, "prompt", False),
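Both new options are declared with `type=float`, and `Recorder.__init__` later clamps them with `max(0.0, float(...))`. As a result, a negative value is accepted on the command line and silently treated as `0`, which for `--silence-sec` disables auto-split. If you would rather reject such input up front, argparse can validate it at parse time. The following is a sketch of that variant using only the standard library. `non_negative_float` and `build_parser` are hypothetical names, not s2t code.

```python
import argparse
import math


def non_negative_float(text: str) -> float:
    """argparse `type=` callable that accepts finite floats >= 0 only."""
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {text!r}") from None
    if not math.isfinite(value) or value < 0.0:
        raise argparse.ArgumentTypeError(f"must be a finite number >= 0, got {text!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="s2t")
    parser.add_argument(
        "--silence-sec",
        type=non_negative_float,
        default=1.0,
        help="Auto-split when continuous silence >= this many seconds (0 disables)",
    )
    parser.add_argument(
        "--min-chunk-sec",
        type=non_negative_float,
        default=5.0,
        help="Minimum duration a chunk must reach before auto-split can trigger",
    )
    return parser


args = build_parser().parse_args(["--silence-sec", "0"])
print(args.silence_sec, args.min_chunk_sec)  # 0.0 5.0
```

With this variant, `--min-chunk-sec -1` makes argparse exit with a usage error (status 2) instead of being clamped later. The trade-off is that the validation then lives in the CLI layer only. Programmatic callers of `Recorder` would still rely on its own clamping.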

{s2t-0.1.5 → s2t-0.1.6.post1.dev0}/src/s2t/config.py RENAMED

@@ -18,6 +18,8 @@ class SessionOptions:
     verbose: bool
     edit: bool
     debounce_ms: int
+    silence_sec: float
+    min_chunk_sec: float
     profile: bool
     keep_chunks: bool
     prompt: bool

{s2t-0.1.5 → s2t-0.1.6.post1.dev0}/src/s2t/recorder.py RENAMED

@@ -9,6 +9,8 @@ import time
 from pathlib import Path
 from typing import Any, Protocol, cast, runtime_checkable
+import numpy as np
 class Recorder:
     def __init__(
@@ -21,6 +23,8 @@ class Recorder:
         verbose: bool = False,
         pause_after_first_chunk: bool = False,
         resume_event: threading.Event | None = None,
+        silence_sec: float = 1.0,
+        min_chunk_sec: float = 5.0,
     ) -> None:
         self.session_dir = session_dir
         self.samplerate = samplerate
@@ -31,6 +35,9 @@ class Recorder:
         self.pause_after_first_chunk = pause_after_first_chunk
         self.resume_event = resume_event
         self._paused = False
+        # Auto-split config
+        self.silence_sec = max(0.0, float(silence_sec))
+        self.min_chunk_sec = max(0.0, float(min_chunk_sec))
     def run(
         self,
@@ -209,50 +216,65 @@ class Recorder:
             fh = sf.SoundFile(
                 str(cur_path), mode="w", samplerate=self.samplerate, channels=self.channels
             )
+            # State for auto-split based on silence
+            silent_frames_run = 0
+            seen_non_silent = False
+            last_split_time = 0.0
+            # Internal thresholds
+            threshold_rms = 0.015  # conservative RMS threshold for float32 [-1,1]
+            split_cooldown_sec = 0.2
+            def _do_split() -> None:
+                nonlocal fh, frames_written, cur_path, chunk_index, offset_seconds_total
+                fh.flush()
+                fh.close()
+                if frames_written > 0:
+                    dur = frames_written / float(self.samplerate)
+                    chunk_paths.append(cur_path)
+                    chunk_frames.append(frames_written)
+                    chunk_offsets.append(offset_seconds_total)
+                    offset_seconds_total += dur
+                    if self.verbose:
+                        print(
+                            f"Saved chunk: {cur_path.name} ({dur:.2f}s)",
+                            file=sys.stderr,
+                        )
+                    tx_queue.put((chunk_index, cur_path, frames_written, chunk_offsets[-1]))
+                else:
+                    try:
+                        cur_path.unlink(missing_ok=True)
+                    except Exception:
+                        pass
+                frames_written = 0
+                chunk_index += 1
+                if (
+                    self.pause_after_first_chunk
+                    and chunk_index == 2
+                    and self.resume_event is not None
+                ):
+                    self._paused = True
+                    self.resume_event.wait()
+                    self._paused = False
+                cur_path = self.session_dir / f"chunk_{chunk_index:04d}{self.ext}"
+                fh = sf.SoundFile(
+                    str(cur_path),
+                    mode="w",
+                    samplerate=self.samplerate,
+                    channels=self.channels,
+                )
+                # Reset silence tracking after a split
+                return
             while True:
                 # First, handle any pending control commands so SPACE/ENTER are never blocked by frames backlog.
                 try:
                     while True:
                         cmd = ctrl_q.get_nowait()
                         if cmd == "split":
-                            fh.flush()
-                            fh.close()
-                            if frames_written > 0:
-                                dur = frames_written / float(self.samplerate)
-                                chunk_paths.append(cur_path)
-                                chunk_frames.append(frames_written)
-                                chunk_offsets.append(offset_seconds_total)
-                                offset_seconds_total += dur
-                                if self.verbose:
-                                    print(
-                                        f"Saved chunk: {cur_path.name} ({dur:.2f}s)",
-                                        file=sys.stderr,
-                                    )
-                                tx_queue.put(
-                                    (chunk_index, cur_path, frames_written, chunk_offsets[-1])
-                                )
-                            else:
-                                try:
-                                    cur_path.unlink(missing_ok=True)
-                                except Exception:
-                                    pass
-                            frames_written = 0
-                            chunk_index += 1
-                            if (
-                                self.pause_after_first_chunk
-                                and chunk_index == 2
-                                and self.resume_event is not None
-                            ):
-                                self._paused = True
-                                self.resume_event.wait()
-                                self._paused = False
-                            cur_path = self.session_dir / f"chunk_{chunk_index:04d}{self.ext}"
-                            fh = sf.SoundFile(
-                                str(cur_path),
-                                mode="w",
-                                samplerate=self.samplerate,
-                                channels=self.channels,
-                            )
+                            _do_split()
+                            # Reset silence tracking on manual split
+                            silent_frames_run = 0
+                            seen_non_silent = False
                         elif cmd == "finish":
                             fh.flush()
                             fh.close()
@@ -289,6 +311,48 @@ class Recorder:
                     data = payload
                     fh.write(data)
                     frames_written += len(data)
+                    # Auto-split based on silence if enabled
+                    if self.silence_sec > 0.0:
+                        try:
+                            arr = np.asarray(data, dtype=np.float32)
+                            if arr.ndim == 2 and arr.shape[1] > 1:
+                                # average channels
+                                arr_mono = arr.mean(axis=1)
+                            else:
+                                arr_mono = arr.reshape(-1)
+                            # compute RMS
+                            rms = (
+                                float(np.sqrt(np.mean(np.square(arr_mono))))
+                                if arr_mono.size
+                                else 0.0
+                            )
+                        except Exception:
+                            rms = 0.0
+                        if rms < threshold_rms:
+                            silent_frames_run += len(arr_mono)
+                        else:
+                            silent_frames_run = 0
+                            seen_non_silent = True
+                        # Conditions to auto-split
+                        enough_silence = silent_frames_run >= int(
+                            self.samplerate * self.silence_sec
+                        )
+                        enough_length = frames_written >= int(self.samplerate * self.min_chunk_sec)
+                        cooldown_ok = (time.perf_counter() - last_split_time) >= split_cooldown_sec
+                        if enough_silence and enough_length and seen_non_silent and cooldown_ok:
+                            if self.verbose:
+                                print(
+                                    f"[auto] split (≥{self.silence_sec:.2f}s silence)",
+                                    file=sys.stderr,
+                                )
+                            last_split_time = time.perf_counter()
+                            # Queue a split for the next control phase
+                            ctrl_q.put("split")
+                            # Reset silence tracking now to avoid cascaded triggers
+                            silent_frames_run = 0
+                            seen_non_silent = False
             tx_queue.put((-1, Path(), 0, 0.0))
         def cb(indata: Any, frames: int, time_info: Any, status: Any) -> None:
@@ -302,7 +366,12 @@ class Recorder:
         key_t.start()
         writer_t.start()
-        print("Recording… Press SPACE to split, Enter to finish.")
+        msg = "Recording… Press SPACE to split, Enter to finish."
+        if self.silence_sec > 0.0:
+            msg += (
+                f" Auto-split on ≥{self.silence_sec:.2f}s silence (min {self.min_chunk_sec:.2f}s)."
+            )
+        print(msg)
         print("—" * 60)
         print("")
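The per-block silence test in `recorder.py` works in four steps:

1. Downmix each written block to mono.
2. Take the block's RMS.
3. Compare the RMS against a fixed threshold of 0.015 (for float32 samples in [-1, 1]).
4. If the block is silent, extend the silence run by its frame count. If it is louder, reset the run and record that speech has been heard.

The following sketch isolates that bookkeeping so it can be exercised without a microphone. It is an illustration, not the package's code. `block_rms` and `SilenceTracker` are hypothetical names, and the synthetic blocks (assumed 16 kHz, 100 ms, stereo) stand in for what the audio callback would deliver.

```python
import numpy as np


def block_rms(block: np.ndarray) -> float:
    """RMS of one audio block; multi-channel blocks are averaged to mono first."""
    arr = np.asarray(block, dtype=np.float32)
    mono = arr.mean(axis=1) if arr.ndim == 2 and arr.shape[1] > 1 else arr.reshape(-1)
    return float(np.sqrt(np.mean(np.square(mono)))) if mono.size else 0.0


class SilenceTracker:
    """Counts consecutive silent frames and remembers whether speech was heard."""

    def __init__(self, samplerate: int, threshold_rms: float = 0.015) -> None:
        self.samplerate = samplerate
        self.threshold_rms = threshold_rms
        self.silent_frames_run = 0
        self.seen_non_silent = False

    def feed(self, block: np.ndarray) -> None:
        # Each block is classified as wholly silent or wholly non-silent,
        # so the run length is only as precise as the block size.
        n_frames = len(block)
        if block_rms(block) < self.threshold_rms:
            self.silent_frames_run += n_frames
        else:
            self.silent_frames_run = 0
            self.seen_non_silent = True

    @property
    def silence_seconds(self) -> float:
        return self.silent_frames_run / float(self.samplerate)

    def reset(self) -> None:
        """Clear state after every split, manual or automatic."""
        self.silent_frames_run = 0
        self.seen_non_silent = False


sr, block = 16_000, 1_600  # assumed: 16 kHz sample rate, 100 ms stereo blocks
quiet = np.zeros((block, 2), dtype=np.float32)
t = np.arange(block) / sr
tone = np.stack([0.2 * np.sin(2 * np.pi * 440 * t)] * 2, axis=1).astype(np.float32)

tracker = SilenceTracker(sr)
for b in [quiet] * 5 + [tone] * 3 + [quiet] * 10:
    tracker.feed(b)
print(tracker.seen_non_silent, round(tracker.silence_seconds, 2))  # True 1.0
```

The five leading quiet blocks add to the run, but `seen_non_silent` stays `False` until the tone arrives. This is how leading silence is kept from triggering a split. After the tone, ten quiet blocks accumulate exactly 1.0 s of silence, which matches the default `--silence-sec`.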

{s2t-0.1.5 → s2t-0.1.6.post1.dev0/src/s2t.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: s2t
-Version: 0.1.5
+Version: 0.1.6.post1.dev0
 Summary: Speech to Text (s2t): Record audio, run Whisper, export formats, and copy transcript to clipboard.
 Author: Maintainers
 License-Expression: LicenseRef-Proprietary
@@ -55,6 +55,8 @@ System requirements (Linux)
   - Translate to English: `-t` (long: `--translate`). You may still provide `--lang` as an input-language hint if you want.
   - List available models and exit: `-L` (long: `--list-models`)
   - Recording format: `-f flac|wav|mp3` (long: `--recording-format`), default `flac`. MP3 requires ffmpeg; if absent, it falls back to FLAC with a warning.
+  - Auto-split on silence: `--silence-sec 1.0` (default `1.0`; `0` disables). When continuous silence ≥ this many seconds is detected, the current chunk is ended automatically.
+  - Minimum chunk length for auto-split: `--min-chunk-sec 5.0` (default `5.0`). Prevents very short chunks and avoids splitting early in a sentence.
   - Prompt mode (spoken prompt): `-p` (long: `--prompt`). Speak your prompt first, then press SPACE to use it as prompt and continue with your main content. If you press ENTER instead of SPACE, no prompt is used; the spoken audio is transcribed as normal payload and the session ends.
   - Keep chunk files: `--keep-chunks` — by default, per‑chunk audio and per‑chunk Whisper outputs are deleted after the final merge.
   - Open transcript for editing: `-e` (long: `--edit`) — opens the generated `.txt` in your shell editor (`$VISUAL`/`$EDITOR`).
@@ -69,6 +71,11 @@ Outputs are written into a timestamped folder under the chosen output directory
 - Final outputs: `recording.flac/.wav` (and `recording.mp3` if requested and ffmpeg available), plus `recording.txt/.srt/.vtt/.tsv/.json`
 - Clipboard mirrors the combined `.txt` with blank lines between chunks.
+Auto-splitting details
+- SPACE always splits immediately; ENTER finishes the recording.
+- With `--silence-sec > 0`, chunks end automatically after detected continuous silence of that many seconds.
+- Auto-split only triggers once the current chunk has at least `--min-chunk-sec` seconds and after speech has been detected (to ignore leading silence). A short internal cooldown avoids duplicate splits.
 ## Makefile (optional)
 - Setup venv + dev deps: `make setup`
 - Lint/format/test: `make lint`, `make format`, `make test`; combined gate: `make check`