PyPI - transcribe-studio - Versions diffs - 0.2.0__tar.gz - Mend

transcribe-studio 0.2.0__tar.gz

This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (75)

transcribe_studio-0.2.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Mishkat Quantum Labs
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

transcribe_studio-0.2.0/PKG-INFO ADDED

@@ -0,0 +1,128 @@
+Metadata-Version: 2.4
+Name: transcribe-studio
+Version: 0.2.0
+Summary: Local classroom audio transcription with projects, WER evaluation, and pluggable transcript formats
+Author: Mishkat Quantum Labs
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/Mishkat-Quantum-Labs/transcribe-studio
+Project-URL: Repository, https://github.com/Mishkat-Quantum-Labs/transcribe-studio
+Project-URL: Issues, https://github.com/Mishkat-Quantum-Labs/transcribe-studio/issues
+Project-URL: Changelog, https://github.com/Mishkat-Quantum-Labs/transcribe-studio/blob/main/CHANGELOG.md
+Keywords: transcription,classroom,audio,wer,speech,evaluation
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Web Environment
+Classifier: Framework :: FastAPI
+Classifier: Intended Audience :: Science/Research
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
+Classifier: Topic :: Scientific/Engineering
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: fastapi>=0.115.0
+Requires-Dist: uvicorn[standard]>=0.32.0
+Requires-Dist: jinja2>=3.1.0
+Requires-Dist: python-multipart>=0.0.12
+Requires-Dist: aiofiles>=24.1.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
+Requires-Dist: build>=1.0; extra == "dev"
+Requires-Dist: twine>=5.0; extra == "dev"
+Dynamic: license-file
+# Transcribe Studio
+[![CI](https://github.com/Mishkat-Quantum-Labs/transcribe-studio/actions/workflows/ci.yml/badge.svg)](https://github.com/Mishkat-Quantum-Labs/transcribe-studio/actions/workflows/ci.yml)
+A local, browser-based tool for classroom audio transcription. Organize work by **project**, split audio into timed chunks, label speakers in free text, and evaluate human transcripts against LLM output (WER + semantic WER).
+Built for researchers and annotators who need millisecond timestamps and exportable data — without Label Studio complexity.
+## Features
+- **Projects** — group recordings by class, session, or study
+- **Waveform editor** — divide audio into chunks (by duration or count), overlap speakers at the same timestamp
+- **Chunk playback** — play one chunk at a time with **speed up/down** (0.25×–2×, keys `,` / `.`)
+- **Exports** — TXT, Markdown, JSON, CSV, SRT, WebVTT
+- **LLM evaluation** — paste or upload hypothesis transcripts; strict + semantic WER
+- **Pluggable formats** — timestamp/speaker lines, JSON segments, plain text (TOML-driven)
+## Quick start
+### With uv (recommended)
+```bash
+git clone https://github.com/Mishkat-Quantum-Labs/transcribe-studio.git
+cd transcribe-studio
+uv venv
+uv pip install -e ".[dev]"
+uv run transcribe-studio
+```
+Open **http://127.0.0.1:8082**
+### With pip
+```bash
+pip install transcribe-studio
+transcribe-studio
+```
+## Usage
+1. Create a **project** from the dashboard
+2. **Upload** an MP3/WAV/M4A/OGG/FLAC recording
+3. **Divide** the wave into chunks, then transcribe each segment
+4. Use **Evaluation** to compare your transcript against an LLM upload
+5. **Export** when done
+Data is stored under `~/.transcribe-studio/` (override with `TRANSCRIBE_STUDIO_DATA`).
+## Development
+```bash
+uv pip install -e ".[dev]"
+uv run pytest
+```
+## Publishing
+### PyPI via uv (recommended)
+```bash
+uv build
+uv publish   # uses UV_PUBLISH_TOKEN or prompts for PyPI credentials
+```
+### PyPI via pip/twine
+```bash
+pip install build twine
+python -m build
+twine upload dist/*
+```
+### GitHub release
+```bash
+git tag v0.2.0
+git push origin v0.2.0
+gh release create v0.2.0 dist/*
+```
+## Configuration
+Evaluation and transcript import settings ship inside the package:
+- `app/config/evaluation.toml`
+- `app/config/transcript_formats.toml`
+- `app/config/languages/en.toml`
+## License
+MIT — see [LICENSE](LICENSE).
+## Contributing
+Issues and PRs welcome at [github.com/Mishkat-Quantum-Labs/transcribe-studio](https://github.com/Mishkat-Quantum-Labs/transcribe-studio).

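The package description above says data lives under `~/.transcribe-studio/` unless `TRANSCRIBE_STUDIO_DATA` is set. A minimal sketch of that lookup follows; `resolve_data_dir` is a hypothetical helper written for illustration, and the package's own resolution logic is not shown in this part of the diff.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def resolve_data_dir(env: Mapping[str, str] | None = None) -> Path:
    """Return the data directory, honouring TRANSCRIBE_STUDIO_DATA if set.

    Hypothetical helper: the default path and variable name come from the
    README; the precedence rule (env var wins when non-empty) is an assumption.
    """
    env = os.environ if env is None else env
    override = env.get("TRANSCRIBE_STUDIO_DATA")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".transcribe-studio"


print(resolve_data_dir({"TRANSCRIBE_STUDIO_DATA": "/tmp/ts-data"}))
print(resolve_data_dir({}))
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.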
transcribe_studio-0.2.0/README.md ADDED

@@ -0,0 +1,95 @@
+# Transcribe Studio
+[![CI](https://github.com/Mishkat-Quantum-Labs/transcribe-studio/actions/workflows/ci.yml/badge.svg)](https://github.com/Mishkat-Quantum-Labs/transcribe-studio/actions/workflows/ci.yml)
+A local, browser-based tool for classroom audio transcription. Organize work by **project**, split audio into timed chunks, label speakers in free text, and evaluate human transcripts against LLM output (WER + semantic WER).
+Built for researchers and annotators who need millisecond timestamps and exportable data — without Label Studio complexity.
+## Features
+- **Projects** — group recordings by class, session, or study
+- **Waveform editor** — divide audio into chunks (by duration or count), overlap speakers at the same timestamp
+- **Chunk playback** — play one chunk at a time with **speed up/down** (0.25×–2×, keys `,` / `.`)
+- **Exports** — TXT, Markdown, JSON, CSV, SRT, WebVTT
+- **LLM evaluation** — paste or upload hypothesis transcripts; strict + semantic WER
+- **Pluggable formats** — timestamp/speaker lines, JSON segments, plain text (TOML-driven)
+## Quick start
+### With uv (recommended)
+```bash
+git clone https://github.com/Mishkat-Quantum-Labs/transcribe-studio.git
+cd transcribe-studio
+uv venv
+uv pip install -e ".[dev]"
+uv run transcribe-studio
+```
+Open **http://127.0.0.1:8082**
+### With pip
+```bash
+pip install transcribe-studio
+transcribe-studio
+```
+## Usage
+1. Create a **project** from the dashboard
+2. **Upload** an MP3/WAV/M4A/OGG/FLAC recording
+3. **Divide** the wave into chunks, then transcribe each segment
+4. Use **Evaluation** to compare your transcript against an LLM upload
+5. **Export** when done
+Data is stored under `~/.transcribe-studio/` (override with `TRANSCRIBE_STUDIO_DATA`).
+## Development
+```bash
+uv pip install -e ".[dev]"
+uv run pytest
+```
+## Publishing
+### PyPI via uv (recommended)
+```bash
+uv build
+uv publish   # uses UV_PUBLISH_TOKEN or prompts for PyPI credentials
+```
+### PyPI via pip/twine
+```bash
+pip install build twine
+python -m build
+twine upload dist/*
+```
+### GitHub release
+```bash
+git tag v0.2.0
+git push origin v0.2.0
+gh release create v0.2.0 dist/*
+```
+## Configuration
+Evaluation and transcript import settings ship inside the package:
+- `app/config/evaluation.toml`
+- `app/config/transcript_formats.toml`
+- `app/config/languages/en.toml`
+## License
+MIT — see [LICENSE](LICENSE).
+## Contributing
+Issues and PRs welcome at [github.com/Mishkat-Quantum-Labs/transcribe-studio](https://github.com/Mishkat-Quantum-Labs/transcribe-studio).

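The README refers to "strict" WER without defining it. For readers unfamiliar with the metric, here is a minimal sketch of the standard word error rate: word-level edit distance divided by the number of reference words. The function name `wer` and the `case_sensitive` flag are illustrative assumptions, and this is not the package's evaluation code, which does not appear in this part of the diff.

```python
def wer(reference: str, hypothesis: str, case_sensitive: bool = False) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    if not case_sensitive:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: empty reference scores 0 or 1.
        return 0.0 if not hyp else 1.0
    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# "going to" vs "gonna" costs one substitution plus one deletion: 2 / 5 words.
print(wer("we are going to start", "we are gonna start"))  # 0.4
```

A semantic variant would presumably normalize equivalent phrasings before this comparison, so that pairs such as "gonna" and "going to" are not penalized in full.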
transcribe_studio-0.2.0/app/__init__.py ADDED

File without changes

transcribe_studio-0.2.0/app/analytics.py ADDED

@@ -0,0 +1,131 @@
+"""Dashboard and recording-level transcription analytics."""
+from __future__ import annotations
+import re
+from typing import Any
+def _word_count(text: str) -> int:
+    return len(re.findall(r"\S+", text or ""))
+def _segment_duration(seg: dict) -> int:
+    return max(0, seg["end_ms"] - seg["start_ms"])
+def analyze_segments(segments: list[dict], duration_ms: int | None) -> dict[str, Any]:
+    total = len(segments)
+    transcribed = sum(1 for s in segments if (s.get("transcript") or "").strip())
+    labeled = sum(1 for s in segments if (s.get("speaker") or "").strip())
+    seg_ms = sum(_segment_duration(s) for s in segments)
+    words = sum(_word_count(s.get("transcript") or "") for s in segments)
+    speaker_stats: dict[str, dict[str, int]] = {}
+    for s in segments:
+        name = (s.get("speaker") or "").strip() or "Unlabeled"
+        if name not in speaker_stats:
+            speaker_stats[name] = {"segments": 0, "words": 0, "duration_ms": 0}
+        speaker_stats[name]["segments"] += 1
+        speaker_stats[name]["words"] += _word_count(s.get("transcript") or "")
+        speaker_stats[name]["duration_ms"] += _segment_duration(s)
+    speakers = [
+        {"name": k, **v}
+        for k, v in sorted(speaker_stats.items(), key=lambda x: -x[1]["duration_ms"])
+    ]
+    dur = duration_ms or 0
+    coverage_pct = min(100, round(seg_ms / dur * 100)) if dur else 0
+    transcript_pct = round(transcribed / total * 100) if total else 0
+    speaker_pct = round(labeled / total * 100) if total else 0
+    avg_seg_ms = round(seg_ms / total) if total else 0
+    return {
+        "segment_count": total,
+        "transcribed_segments": transcribed,
+        "speaker_labeled_segments": labeled,
+        "empty_segments": total - transcribed,
+        "total_words": words,
+        "segmented_duration_ms": seg_ms,
+        "coverage_pct": coverage_pct,
+        "transcript_pct": transcript_pct,
+        "speaker_label_pct": speaker_pct,
+        "avg_segment_ms": avg_seg_ms,
+        "speakers": speakers,
+    }
+def analyze_recording(rec: dict, segments: list[dict]) -> dict[str, Any]:
+    stats = analyze_segments(segments, rec.get("duration_ms"))
+    return {
+        "id": rec["id"],
+        "title": rec["title"],
+        "duration_ms": rec.get("duration_ms"),
+        "created_at": rec.get("created_at", "")[:10],
+        "notes": rec.get("notes") or "",
+        **stats,
+    }
+def dashboard_stats(conn) -> dict[str, Any]:
+    recordings = conn.execute(
+        "SELECT id, title, duration_ms, created_at FROM recordings ORDER BY id DESC"
+    ).fetchall()
+    total_segments = conn.execute("SELECT COUNT(*) FROM segments").fetchone()[0]
+    total_duration = conn.execute(
+        "SELECT COALESCE(SUM(duration_ms), 0) FROM recordings"
+    ).fetchone()[0]
+    all_segments = conn.execute(
+        "SELECT recording_id, start_ms, end_ms, speaker, transcript FROM segments"
+    ).fetchall()
+    seg_list = [dict(s) for s in all_segments]
+    transcribed = sum(1 for s in seg_list if (s.get("transcript") or "").strip())
+    words = sum(_word_count(s.get("transcript") or "") for s in seg_list)
+    segmented_ms = sum(_segment_duration(s) for s in seg_list)
+    speakers = {
+        (s.get("speaker") or "").strip() or "Unlabeled"
+        for s in seg_list
+        if (s.get("transcript") or "").strip() or (s.get("speaker") or "").strip()
+    }
+    recording_stats = []
+    for rec in recordings:
+        rec_segs = [s for s in seg_list if s["recording_id"] == rec["id"]]
+        recording_stats.append(analyze_recording(dict(rec), rec_segs))
+    overall_transcript_pct = (
+        round(transcribed / total_segments * 100) if total_segments else 0
+    )
+    overall_coverage_pct = (
+        min(100, round(segmented_ms / total_duration * 100)) if total_duration else 0
+    )
+    return {
+        "recording_count": len(recordings),
+        "segment_count": total_segments,
+        "total_duration_ms": total_duration,
+        "segmented_duration_ms": segmented_ms,
+        "transcribed_segments": transcribed,
+        "total_words": words,
+        "unique_speakers": len(speakers),
+        "transcript_pct": overall_transcript_pct,
+        "coverage_pct": overall_coverage_pct,
+        "recordings": recording_stats,
+    }
+def fmt_duration(ms: int | None) -> str:
+    if not ms:
+        return "—"
+    s = ms / 1000
+    h = int(s // 3600)
+    m = int((s % 3600) // 60)
+    sec = s % 60
+    if h:
+        return f"{h}h {m}m"
+    if m:
+        return f"{m}m {sec:.0f}s"
+    return f"{sec:.1f}s"

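A note on `coverage_pct` in `analyze_segments`: segment lengths are summed without merging overlaps, and the result is capped at 100. Since the README allows speakers to overlap at the same timestamp, overlapping audio is counted once per segment. The sketch below reproduces only that arithmetic on made-up sample data; the standalone `coverage_pct` function and the sample segments are illustrative, not part of the package.

```python
from __future__ import annotations


def coverage_pct(segments: list[dict], duration_ms: int | None) -> int:
    # Same arithmetic as analyze_segments: sum clamped segment lengths,
    # divide by the recording duration, and cap the percentage at 100.
    seg_ms = sum(max(0, s["end_ms"] - s["start_ms"]) for s in segments)
    dur = duration_ms or 0
    return min(100, round(seg_ms / dur * 100)) if dur else 0


# Two speakers overlapping between 2.0 s and 4.0 s (hypothetical data).
segments = [
    {"start_ms": 0, "end_ms": 4000, "speaker": "Teacher"},
    {"start_ms": 2000, "end_ms": 6000, "speaker": "Student"},
]

print(coverage_pct(segments, 10_000))  # 80: 8000 ms summed, though only 6000 ms are distinct
print(coverage_pct(segments, 6_000))   # 100: raw value of 133 is capped
print(coverage_pct(segments, None))    # 0: unknown duration
```

Read this way, the metric measures total annotated segment time relative to the recording length, not the fraction of the timeline that has at least one segment.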
transcribe_studio-0.2.0/app/config/evaluation.toml ADDED

@@ -0,0 +1,45 @@
+# Transcribe Studio - Evaluation Configuration
+# https://github.com/Mishkat-Quantum-Labs/transcribe-studio
+[evaluation]
+version = "1.0"
+default_language = "en"
+# Metrics to compute
+# Set enabled = false to skip a metric
+# weight is used for weighted averaging in overall score
+[evaluation.metrics]
+[evaluation.metrics.wer]
+enabled = true
+weight = 1.0
+case_sensitive = false
+description = "Word Error Rate - standard ASR metric"
+[evaluation.metrics.cer]
+enabled = false
+weight = 0.0
+description = "Character Error Rate - useful for character-level languages"
+[evaluation.metrics.semantic_score]
+enabled = true
+weight = 0.5
+description = "Semantic equivalence score - partial credit for meaning"
+# Text normalization settings
+# These apply before metric calculation
+[evaluation.normalization]
+lowercase = true
+trim_whitespace = true
+remove_punctuation = false
+normalize_quotes = true
+remove_special_chars = false
+# UI Settings
+[evaluation.ui]
+show_detailed_breakdown = true
+highlight_errors = true
+color_scheme = "auto"

transcribe_studio-0.2.0/app/config/languages/en.toml ADDED

@@ -0,0 +1,242 @@
+# English Language Configuration
+# Semantic equivalence rules for English
+[language]
+code = "en"
+name = "English"
+normalizer_class = "en"
+# ============================================================
+# SEMANTIC MATCHING RULES
+# ============================================================
+# These rules define phrases that are semantically equivalent
+# even when they differ in exact wording.
+#
+# Each rule has:
+# - variants: list of alternative phrasings
+# - canonical: the "standard" form to compare against
+# - weight: 0.0-1.0, confidence of equivalence
+#
+# Matching works bidirectionally:
+# "gonna" matches "going to" and vice versa
+# ============================================================
+[[semantic_matchers.group]]
+name = "contractions_informal"
+description = "Contractions and informal speech → formal forms"
+enabled = true
+[[semantic_matchers.group.rule]]
+variants = ["gonna", "gon na", "gunna", "gonna"]
+canonical = "going to"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["wanna", "wanner"]
+canonical = "want to"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["gotta", "got a"]
+canonical = "got to"
+weight = 0.9
+[[semantic_matchers.group.rule]]
+variants = ["kinda", "kind of"]
+canonical = "kind of"
+weight = 0.9
+[[semantic_matchers.group.rule]]
+variants = ["sorta", "sort of"]
+canonical = "sort of"
+weight = 0.9
+[[semantic_matchers.group.rule]]
+variants = ["outta", "out of"]
+canonical = "out of"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["lemme", "let me"]
+canonical = "let me"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["gimme", "give me"]
+canonical = "give me"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["dunno", "dont know", "do not know", "don't know"]
+canonical = "do not know"
+weight = 0.9
+[[semantic_matchers.group.rule]]
+variants = ["coulda", "could have", "could've"]
+canonical = "could have"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["woulda", "would have", "would've"]
+canonical = "would have"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["shoulda", "should have", "should've"]
+canonical = "should have"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["lotsa", "lots of"]
+canonical = "lots of"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["cause", "cos", "cuz"]
+canonical = "because"
+weight = 0.85
+[[semantic_matchers.group.rule]]
+variants = ["nvm", "nvr", "nevermind", "never mind"]
+canonical = "never mind"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["thru", "through"]
+canonical = "through"
+weight = 0.98
+[[semantic_matchers.group.rule]]
+variants = ["u", "you"]
+canonical = "you"
+weight = 0.8
+[[semantic_matchers.group.rule]]
+variants = ["ur", "you're", "your"]
+canonical = "your"
+weight = 0.7
+[[semantic_matchers.group.rule]]
+variants = ["ok", "okay", "ok"]
+canonical = "okay"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["yeah", "yes", "yea", "yah"]
+canonical = "yes"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["nope", "no", "nah"]
+canonical = "no"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["alright", "all right", "allright"]
+canonical = "all right"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["gonna", "goin", "goin to", "going"]
+canonical = "going"
+weight = 0.8
+[[semantic_matchers.group]]
+name = "repeated_sounds"
+description = "Stuttered/repeated sounds - common in spontaneous speech"
+enabled = true
+[[semantic_matchers.group.rule]]
+variants = ["um", "uh", "er", "erm"]
+canonical = ""
+weight = 0.5
+[[semantic_matchers.group]]
+name = "common_confusions"
+description = "Common ASR/LLM transcription confusions"
+enabled = true
+[[semantic_matchers.group.rule]]
+variants = ["i am", "i'm", "im"]
+canonical = "i am"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["you know", "yknow", "y'know"]
+canonical = "you know"
+weight = 0.9
+[[semantic_matchers.group.rule]]
+variants = ["like", "like like"]
+canonical = "like"
+weight = 0.7
+[[semantic_matchers.group]]
+name = "numbers"
+description = "Number word ↔ digit equivalence"
+enabled = true
+[[semantic_matchers.group.rule]]
+variants = ["for", "four"]
+canonical = "four"
+weight = 0.9
+[[semantic_matchers.group.rule]]
+variants = ["to", "two", "too"]
+canonical = "two"
+weight = 0.8
+[[semantic_matchers.group]]
+name = "contractions"
+description = "Standard English contractions"
+enabled = true
+[[semantic_matchers.group.rule]]
+variants = ["don't", "do not"]
+canonical = "do not"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["can't", "cannot"]
+canonical = "cannot"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["won't", "will not"]
+canonical = "will not"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["i've", "i have"]
+canonical = "i have"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["i'll", "i will"]
+canonical = "i will"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["it's", "it is", "its"]
+canonical = "it is"
+weight = 0.95
+[[semantic_matchers.group.rule]]
+variants = ["that's", "that is"]
+canonical = "that is"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["there's", "there is"]
+canonical = "there is"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["here's", "here is"]
+canonical = "here is"
+weight = 1.0
+[[semantic_matchers.group.rule]]
+variants = ["what's", "what is"]
+canonical = "what is"
+weight = 1.0