PyPI - subaligner - Versions diffs - 0.2.4__py3.7.egg → 0.3.0__py3.7.egg - Mend

subaligner 0.2.4__py3.7.egg → 0.3.0__py3.7.egg

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

EGG-INFO/PKG-INFO +14 -14
EGG-INFO/SOURCES.txt +1 -0
EGG-INFO/requires.txt +23 -13
EGG-INFO/scripts/subaligner +82 -28
EGG-INFO/scripts/subaligner_1pass +1 -1
EGG-INFO/scripts/subaligner_2pass +1 -1
EGG-INFO/scripts/subaligner_batch +1 -1
EGG-INFO/scripts/subaligner_convert +1 -1
subaligner/__init__.py +2 -0
subaligner/__main__.py +82 -28
subaligner/_version.py +1 -1
subaligner/exception.py +4 -0
subaligner/predictor.py +1 -1
subaligner/subaligner_1pass/__main__.py +1 -1
subaligner/subaligner_2pass/__main__.py +1 -1
subaligner/subaligner_batch/__main__.py +1 -1
subaligner/subaligner_convert/__main__.py +1 -1
subaligner/subtitle.py +15 -0
subaligner/trainer.py +2 -2
subaligner/transcriber.py +118 -0
subaligner/translator.py +65 -23

EGG-INFO/PKG-INFO CHANGED

@@ -1,12 +1,11 @@
 Metadata-Version: 2.1
 Name: subaligner
-Version: 0.2.4
+Version: 0.3.0
 Summary: Automatically synchronize and translate subtitles with pretrained deep neural networks, forced alignments and transformers.
 Home-page: https://subaligner.readthedocs.io/en/latest/
 Author: Xi Bai
 Author-email: xi.bai.ed@gmail.com
 License: MIT
-Platform: UNKNOWN
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: 3.8
@@ -19,6 +18,7 @@ Provides-Extra: dev
 Provides-Extra: docs
 Provides-Extra: stretch
 Provides-Extra: translation
+Provides-Extra: llm
 License-File: LICENSE
 <div align="center">
@@ -26,11 +26,12 @@ License-File: LICENSE
 </div>
 [![Build Status](https://github.com/baxtree/subaligner/actions/workflows/ci-pipeline.yml/badge.svg?branch=master)](https://github.com/baxtree/subaligner/actions/workflows/ci-pipeline.yml?query=branch%3Amaster) ![Codecov](https://img.shields.io/codecov/c/github/baxtree/subaligner)
-[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/) [![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/) [![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
+[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-3100/) [![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/) [![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/) [![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
 [![Documentation Status](https://readthedocs.org/projects/subaligner/badge/?version=latest)](https://subaligner.readthedocs.io/en/latest/?badge=latest)
 [![GitHub license](https://img.shields.io/github/license/baxtree/subaligner)](https://github.com/baxtree/subaligner/blob/master/LICENSE)
 [![PyPI](https://badge.fury.io/py/subaligner.svg)](https://badge.fury.io/py/subaligner)
-[![Docker](https://img.shields.io/docker/cloud/build/baxtree/subaligner?label=Docker&style=flat)](https://hub.docker.com/r/baxtree/subaligner/builds)
+[![Docker Build](https://img.shields.io/docker/cloud/build/baxtree/subaligner?label=Docker&style=flat)](https://hub.docker.com/r/baxtree/subaligner/builds)
+[![Docker Pulls](https://img.shields.io/docker/pulls/baxtree/subaligner)](https://hub.docker.com/r/baxtree/subaligner)
 [![Citation](https://zenodo.org/badge/228440472.svg)](https://doi.org/10.5281/zenodo.5603083)
 ## Supported Formats
@@ -56,9 +57,9 @@ $ pip install subaligner
 ## Installation with Optional Packages Supporting Additional Features
 ```
-# Install dependencies for enabling translation
+# Install dependencies for enabling translation and transcription
-$ pip install 'subaligner[translation]'
+$ pip install 'subaligner[llm]'
 ```
 ```
 # Install dependencies for enabling forced alignment
@@ -140,6 +141,10 @@ $ subaligner -m single -v https://example.com/video.mp4 -s https://example.com/s
 $ subaligner -m dual -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
 ```
 ```
+# Generate subtitles by transcribing audiovisual files
+$ subaligner -m transcribe -v video.mp4 -ml eng -mr whisper -mf small -o subtitle_aligned.srt
+```
+```
 # Alignment on segmented plain texts (double newlines as the delimiter)
 $ subaligner -m script -v test.mp4 -s subtitle.txt -o subtitle_aligned.srt
@@ -159,15 +164,11 @@ $ subaligner -m dual -v video.mkv -s embedded:stream_index=0 -o subtitle_aligned
 ```
 ```
 # Translative alignment with the ISO 639-3 language code pair (src,tgt)
-$ subaligner_1pass --languages
-$ subaligner_1pass -v video.mp4 -s subtitle.srt -t src,tgt
-$ subaligner_2pass --languages
-$ subaligner_2pass -v video.mp4 -s subtitle.srt -t src,tgt
 $ subaligner --languages
 $ subaligner -m single -v video.mp4 -s subtitle.srt -t src,tgt
 $ subaligner -m dual -v video.mp4 -s subtitle.srt -t src,tgt
 $ subaligner -m script -v test.mp4 -s subtitle.txt -o subtitle_aligned.srt -t src,tgt
+$ subaligner -m transcribe -v video.mp4 -ml eng -mr whisper -mf small -o subtitle_aligned.srt -t src,tgt
 ```
 ```
 # Shift subtitle manually by offset in seconds
@@ -236,10 +237,9 @@ This tool wouldn't be possible without the following packages:
 [pysrt](https://github.com/byroot/pysrt)
 [pysubs2](https://github.com/tkarabela/pysubs2)
 [aeneas](https://www.readbeyond.it/aeneas/)
-[transformers](https://huggingface.co/transformers/).
+[transformers](https://huggingface.co/transformers/)
+[openai-whisper](https://github.com/openai/whisper).
 Thanks to Alan Robinson and Nigel Megitt for their invaluable feedback.
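
The README hunk above adds a `transcribe` mode to the command line. As a rough illustration only, the sketch below assembles that invocation from Python. The helper name `build_transcribe_command` and its parameters are hypothetical and not part of subaligner; only the flags (`-m`, `-v`, `-ml`, `-mr`, `-mf`, `-o`, `-t`) come from the diff, and the `whisper`/`small` defaults mirror the argparse defaults shown in the script changes.

```python
import shlex


def build_transcribe_command(video, main_language, output,
                             recipe="whisper", flavour="small", translate=None):
    """Assemble argv for `subaligner -m transcribe` (hypothetical helper)."""
    argv = ["subaligner", "-m", "transcribe", "-v", video,
            "-ml", main_language, "-mr", recipe, "-mf", flavour, "-o", output]
    if translate is not None:
        # -t expects "src,tgt" as a pair of ISO 639-3 codes
        source, target = translate
        argv += ["-t", "{},{}".format(source, target)]
    return argv


command = build_transcribe_command("video.mp4", "eng", "subtitle_aligned.srt")
# Render the argv list as a shell-safe string for display
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list could be handed to `subprocess.run`, assuming subaligner 0.3.0 and its `llm` extra are installed.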

EGG-INFO/SOURCES.txt CHANGED

@@ -24,6 +24,7 @@ subaligner/predictor.py
 subaligner/singleton.py
 subaligner/subtitle.py
 subaligner/trainer.py
+subaligner/transcriber.py
 subaligner/translator.py
 subaligner/utils.py
 subaligner.egg-info/PKG-INFO

EGG-INFO/requires.txt CHANGED

@@ -6,19 +6,19 @@ tornado==5.1.0
 toolz==0.9.0
 toml==0.10.0
 termcolor==1.1.0
-tensorflow<2.8,>=1.15.5
+tensorflow<2.9,>=1.15.5
 tblib==1.3.2
 six~=1.15.0
 setuptools>=41.0.0
 scikit-learn~=0.24.2
-scipy~=1.5.4
+scipy<=1.8.1
 rsa==4.7
 requests-oauthlib==1.3.0
 requests~=2.25.1
 PyYAML>=4.2b1
 pytz==2018.4
 pystack-debugger==0.8.0
-pysubs2==0.2.4
+pysubs2<=1.4.2
 pysrt==1.1.1
 pyprof2calltree==1.4.3
 pydotplus==2.0.2
@@ -31,7 +31,7 @@ psutil==5.6.7
 pluggy==0.13.1
 pbr==4.0.2
 oauthlib==3.1.0
-numpy<1.23.0
+numpy<1.24.0
 numba>=0.50.0
 msgpack-python==0.5.6
 networkx>=2.5.1
@@ -48,13 +48,13 @@ isort==4.3.4
 idna==2.8
 hyperopt==0.2.4
 html5lib==1.0b9
-h5py~=3.1.0
+h5py<=3.6.0
 HeapDict==1.0.0
 graphviz==0.8.3
 google-pasta~=0.2
 google-auth-oauthlib==0.4.2
 google-auth==1.27.0
-filelock==3.0.12
+filelock<4.0.0
 distributed==1.13.0
 decorator==4.3.0
 dask<2022.1.0
@@ -81,7 +81,7 @@ typing-extensions<4.0.0
 types-setuptools==57.4.9
 types-requests==2.27.9
 mypy==0.931
-pex==2.1.34
+pex<=2.1.80
 radish-bdd~=0.13.3
 scikit-build==0.11.1
 line-profiler==3.1.0
@@ -92,8 +92,9 @@ tox~=3.23.0
 coverage==5.5
 mock==4.0.3
 aeneas~=1.7.3.0
-transformers~=4.5.1
-torch~=1.8.1
+openai-whisper==20230124
+transformers<4.27.0
+torch<1.13.0
 sentencepiece~=0.1.95
 pycountry~=20.7.3
 docutils~=0.17.0
@@ -107,8 +108,16 @@ sphinx==3.3.1
 [harmony]
 aeneas~=1.7.3.0
-transformers~=4.5.1
-torch~=1.8.1
+openai-whisper==20230124
+transformers<4.27.0
+torch<1.13.0
+sentencepiece~=0.1.95
+pycountry~=20.7.3
+[llm]
+openai-whisper==20230124
+transformers<4.27.0
+torch<1.13.0
 sentencepiece~=0.1.95
 pycountry~=20.7.3
@@ -116,7 +125,8 @@ pycountry~=20.7.3
 aeneas~=1.7.3.0
 [translation]
-transformers~=4.5.1
-torch~=1.8.1
+openai-whisper==20230124
+transformers<4.27.0
+torch<1.13.0
 sentencepiece~=0.1.95
 pycountry~=20.7.3

EGG-INFO/scripts/subaligner CHANGED

@@ -1,13 +1,17 @@
 #!python
 """
-usage: subaligner [-h] [-m {single,dual,script,shift}] [-v VIDEO_PATH] [-s SUBTITLE_PATH [SUBTITLE_PATH ...]] [-l MAX_LOGLOSS] [-so]
+usage: subaligner [-h] [-m {single,dual,script,shift,transcribe}] [-v VIDEO_PATH] [-s SUBTITLE_PATH [SUBTITLE_PATH ...]] [-l MAX_LOGLOSS] [-so]
                   [-sil {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}]
-                  [-fos] [-tod TRAINING_OUTPUT_DIRECTORY] [-o OUTPUT] [-t TRANSLATE] [-os OFFSET_SECONDS] [-lgs] [-d] [-q] [-ver]
+                  [-fos] [-tod TRAINING_OUTPUT_DIRECTORY] [-o OUTPUT] [-t TRANSLATE] [-os OFFSET_SECONDS]
+                  [-ml {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}]
+                  [-mr {whisper}] [-mf {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}] [-lgs] [-d] [-q] [-ver]
 Subaligner command line interface
 optional arguments:
   -h, --help            show this help message and exit
+  -s SUBTITLE_PATH [SUBTITLE_PATH ...], --subtitle_path SUBTITLE_PATH [SUBTITLE_PATH ...]
+                        File path or URL to the subtitle file (Extensions of supported subtitles: .ssa, .vtt, .srt, .txt, .smi, .ytt, .sub, .xml, .sbv, .ass, .sami, .scc, .tmp, .stl, .ttml, .dfxp) or selector for the embedded subtitle (e.g., embedded:page_num=888 or embedded:stream_index=0)
   -l MAX_LOGLOSS, --max_logloss MAX_LOGLOSS
                         Max global log loss for alignment
   -so, --stretch_on     Switch on stretch on subtitles)
@@ -23,18 +27,22 @@ optional arguments:
                         Source and target ISO 639-3 language codes separated by a comma (e.g., eng,zho)
   -os OFFSET_SECONDS, --offset_seconds OFFSET_SECONDS
                         Offset by which the subtitle will be shifted
+  -ml {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}, --main_language {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}
+                        Target video's main language as an ISO 639-3 language code [https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes]
+  -mr {whisper}, --llm_recipe {whisper}
+                        LLM recipe used for transcribing video files
+  -mf {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}, --llm_flavour {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}
+                        Flavour variation for a specific LLM recipe
   -lgs, --languages     Print out language codes used for stretch and translation
   -d, --debug           Print out debugging information
   -q, --quiet           Switch off logging information
   -ver, --version       show program's version number and exit
 required arguments:
-  -m {single,dual,script,shift}, --mode {single,dual,script,shift}
-                        Alignment mode: either single or dual
+  -m {single,dual,script,shift,transcribe}, --mode {single,dual,script,shift,transcribe}
+                        Alignment mode: single, dual, script, shift or transcribe
   -v VIDEO_PATH, --video_path VIDEO_PATH
                         File path or URL to the video file
-  -s SUBTITLE_PATH [SUBTITLE_PATH ...], --subtitle_path SUBTITLE_PATH [SUBTITLE_PATH ...]
-                        File path or URL to the subtitle file (Extensions of supported subtitles: .sami, .ssa, .vtt, .xml, .sub, .smi, .ass, .srt, .tmp, .dfxp, .stl, .ttml, .sbv, .txt, .ytt, .scc) or selector for the embedded subtitle (e.g., embedded:page_num=888 or embedded:stream_index=0)
 """
 import argparse
@@ -61,10 +69,10 @@ def main():
     required_args.add_argument(
         "-m",
         "--mode",
-        type=str,
+        type=str.lower,
         default="",
-        choices=["single", "dual", "script", "shift"],
-        help="Alignment mode: either single or dual",
+        choices=["single", "dual", "script", "shift", "transcribe"],
+        help="Alignment mode: single, dual, script, shift or transcribe",
     )
     required_args.add_argument(
         "-v",
@@ -74,7 +82,7 @@ def main():
         help="File path or URL to the video file",
     )
     from subaligner.subtitle import Subtitle
-    required_args.add_argument(
+    parser.add_argument(
         "-s",
         "--subtitle_path",
         type=str,
@@ -100,7 +108,7 @@ def main():
     parser.add_argument(
         "-sil",
         "--stretch_in_language",
-        type=str,
+        type=str.lower,
         choices=Utils.get_stretch_language_codes(),
         default="eng",
         help="Stretch the subtitle with the supported ISO 639-3 language code [https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes].\nNB: This will be ignored if neither -so nor --stretch_on is present",
@@ -137,6 +145,29 @@ def main():
         type=float,
         help="Offset by which the subtitle will be shifted"
     )
+    parser.add_argument(
+        "-ml",
+        "--main_language",
+        type=str.lower,
+        choices=Utils.get_stretch_language_codes(),
+        help="Target video's main language as an ISO 639-3 language code [https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes]",
+    )
+    parser.add_argument(
+        "-mr",
+        "--llm_recipe",
+        type=str.lower,
+        default="whisper",
+        choices=["whisper"],
+        help="LLM recipe used for transcribing video files"
+    )
+    parser.add_argument(
+        "-mf",
+        "--llm_flavour",
+        type=str.lower,
+        default="small",
+        choices=["tiny", "tiny.en", "small", "medium", "medium.en", "base", "base.en", "large-v1", "large-v2", "large"],
+        help="Flavour variation for a specific LLM recipe"
+    )
     parser.add_argument("-lgs", "--languages", action="store_true",
                         help="Print out language codes used for stretch and translation")
     parser.add_argument("-d", "--debug", action="store_true",
@@ -153,33 +184,45 @@ def main():
         print("ERROR: --mode was not passed in")
         parser.print_usage()
         sys.exit(21)
     FLAGS.subtitle_path = [path for paths in FLAGS.subtitle_path for path in paths]
-    if not FLAGS.subtitle_path:
+    if not FLAGS.subtitle_path and FLAGS.mode != "transcribe":
         print("ERROR: --subtitle_path was not passed in")
         parser.print_usage()
         sys.exit(21)
-    if FLAGS.mode != "shift":
+    elif FLAGS.mode == "transcribe":
+        FLAGS.subtitle_path = ["{}.srt".format(tempfile.mkstemp()[1])]
+    if FLAGS.mode in ["single", "dual", "script", "transcribe"]:
         for subtitle_path in FLAGS.subtitle_path:
             if FLAGS.video_path == "":
                 print("ERROR: --video_path was not passed in")
                 parser.print_usage()
                 sys.exit(21)
             if subtitle_path.lower().startswith("http") and FLAGS.output == "":
-                print("ERROR: --output was not passed in for alignment on a remote subtitle file")
+                print("ERROR: --output was not passed in but required by alignment on a remote subtitle file")
                 parser.print_usage()
                 sys.exit(21)
             if subtitle_path.lower().startswith("embedded:") and FLAGS.output == "":
-                print("ERROR: --output was not passed in for alignment on embedded subtitles")
+                print("ERROR: --output was not passed in but required by alignment on embedded subtitles")
                 parser.print_usage()
                 sys.exit(21)
             if FLAGS.mode == "script" and FLAGS.output == "":
-                print("ERROR: --output was not passed in for alignment on plain texts")
+                print("ERROR: --output was not passed in but required by alignment on plain texts")
                 parser.print_usage()
                 sys.exit(21)
-            if FLAGS.translate is not None:
+            if FLAGS.mode == "transcribe":
+                if FLAGS.output == "":
+                    print("ERROR: --output was not passed in but required by mode 'transcribe'")
+                    parser.print_usage()
+                    sys.exit(21)
+                if FLAGS.main_language is None:
+                    print("ERROR: --main_language was not passed in but required by mode 'transcribe'")
+                    parser.print_usage()
+                    sys.exit(21)
+            if FLAGS.translate is not None or FLAGS.mode == "transcribe":
                 if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-                    print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+                    print('ERROR: Alignment has been configured to use language models. Please install "subaligner[llm]" and run your command again.')
                     sys.exit(21)
             if FLAGS.stretch_on or FLAGS.mode == "script":
                 if "aeneas" not in {pkg.key for pkg in pkg_resources.working_set}:
@@ -190,13 +233,13 @@ def main():
             local_subtitle_path = subtitle_path
             exit_segfail = FLAGS.exit_segfail
             stretch = FLAGS.stretch_on
-            stretch_in_lang = FLAGS.stretch_in_language
+            stretch_in_lang = FLAGS.main_language or FLAGS.stretch_in_language
             from subaligner.logger import Logger
             Logger.VERBOSE = FLAGS.debug
             Logger.QUIET = FLAGS.quiet
             from subaligner.predictor import Predictor
-            from subaligner.exception import UnsupportedFormatException
+            from subaligner.exception import UnsupportedFormatException, TranscriptionException
             from subaligner.exception import TerminalException
             try:
@@ -230,6 +273,7 @@ def main():
                         parser.print_usage()
                         sys.exit(21)
+                voice_probabilities = None
                 predictor = Predictor()
                 if FLAGS.mode == "single":
                     aligned_subs, audio_file_path, voice_probabilities, frame_rate = predictor.predict_single_pass(
@@ -252,6 +296,11 @@ def main():
                         subtitle_file_path=local_subtitle_path,
                         stretch_in_lang=stretch_in_lang,
                     )
+                elif FLAGS.mode == "transcribe":
+                    from subaligner.transcriber import Transcriber
+                    transcriber = Transcriber(recipe=FLAGS.llm_recipe, flavour=FLAGS.llm_flavour)
+                    subtitle, frame_rate = transcriber.transcribe(local_video_path, stretch_in_lang)
+                    aligned_subs = subtitle.subs
                 else:
                     print("ERROR: Unknown mode {}".format(FLAGS.mode))
                     parser.print_usage()
@@ -267,6 +316,9 @@ def main():
                     aligned_subs = translator.translate(aligned_subs)
                     Subtitle.save_subs_as_target_format(aligned_subs, local_subtitle_path, aligned_subtitle_path,
                                                         frame_rate, "utf-8")
+                elif FLAGS.mode == "transcribe":
+                    Subtitle.save_subs_as_target_format(aligned_subs, local_subtitle_path, aligned_subtitle_path,
+                                                        frame_rate, "utf-8")
                 else:
                     Subtitle.save_subs_as_target_format(aligned_subs, local_subtitle_path, aligned_subtitle_path,
                                                         frame_rate)
@@ -277,35 +329,35 @@ def main():
                         print(
                             "ERROR: Alignment failed with a too high loss value: {}".format(log_loss)
                         )
-                        _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                        _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                         sys.exit(22)
                 print("Aligned subtitle saved to: {}".format(aligned_subtitle_path))
-            except UnsupportedFormatException as e:
+            except (UnsupportedFormatException, TranscriptionException) as e:
                 print(
                     "ERROR: {}\n{}".format(str(e), "".join(traceback.format_stack()) if FLAGS.debug else "")
                 )
                 traceback.print_tb(e.__traceback__)
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                 sys.exit(23)
             except TerminalException as e:
                 print(
                     "ERROR: {}\n{}".format(str(e), "".join(traceback.format_stack()) if FLAGS.debug else "")
                 )
                 traceback.print_tb(e.__traceback__)
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                 sys.exit(24)
             except Exception as e:
                 print(
                     "ERROR: {}\n{}".format(str(e), "".join(traceback.format_stack()) if FLAGS.debug else "")
                 )
                 traceback.print_tb(e.__traceback__)
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                 sys.exit(1)
             else:
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
         sys.exit(0)
-    else:
+    elif FLAGS.mode == "shift":
         if FLAGS.offset_seconds is None:
             print("ERROR: --offset_seconds was not passed in during subtitle shifting")
             sys.exit(21)
@@ -319,11 +371,13 @@ def main():
         sys.exit(0)
-def _remove_tmp_files(video_path, subtitle_path, local_video_path, local_subtitle_path):
+def _remove_tmp_files(video_path, subtitle_path, local_video_path, local_subtitle_path, mode):
     if video_path.lower().startswith("http") and os.path.exists(local_video_path):
         os.remove(local_video_path)
     if subtitle_path.lower().startswith("http") and os.path.exists(local_subtitle_path):
         os.remove(local_subtitle_path)
+    if mode == "transcribe" and os.path.exists(local_subtitle_path):
+        os.remove(local_subtitle_path)
 if __name__ == "__main__":

EGG-INFO/scripts/subaligner_1pass CHANGED

@@ -120,7 +120,7 @@ def main():
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     local_video_path = FLAGS.video_path

EGG-INFO/scripts/subaligner_2pass CHANGED

@@ -147,7 +147,7 @@ def main():
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     if FLAGS.stretch_on:
         if "aeneas" not in {pkg.key for pkg in pkg_resources.working_set}:

EGG-INFO/scripts/subaligner_batch CHANGED

@@ -173,7 +173,7 @@ Each file pair needs to share the same base filename, the part before the extens
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     video_file_paths = [os.path.abspath(os.path.join(path, p)) for path, _, files in

EGG-INFO/scripts/subaligner_convert CHANGED

@@ -99,7 +99,7 @@ def main():
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     local_subtitle_path = FLAGS.input_subtitle_path

subaligner/__init__.py CHANGED

@@ -1,5 +1,7 @@
+import os
 import multiprocessing as mp
 from ._version import __version__
 __all__ = ["__version__"]
 mp.set_start_method("spawn", force=True)
+os.environ["KMP_WARNINGS"] = "0"

subaligner/__main__.py CHANGED

@@ -1,13 +1,17 @@
 #!/usr/bin/env python
 """
-usage: subaligner [-h] [-m {single,dual,script,shift}] [-v VIDEO_PATH] [-s SUBTITLE_PATH [SUBTITLE_PATH ...]] [-l MAX_LOGLOSS] [-so]
+usage: subaligner [-h] [-m {single,dual,script,shift,transcribe}] [-v VIDEO_PATH] [-s SUBTITLE_PATH [SUBTITLE_PATH ...]] [-l MAX_LOGLOSS] [-so]
                   [-sil {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}]
-                  [-fos] [-tod TRAINING_OUTPUT_DIRECTORY] [-o OUTPUT] [-t TRANSLATE] [-os OFFSET_SECONDS] [-lgs] [-d] [-q] [-ver]
+                  [-fos] [-tod TRAINING_OUTPUT_DIRECTORY] [-o OUTPUT] [-t TRANSLATE] [-os OFFSET_SECONDS]
+                  [-ml {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}]
+                  [-mr {whisper}] [-mf {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}] [-lgs] [-d] [-q] [-ver]
 Subaligner command line interface
 optional arguments:
   -h, --help            show this help message and exit
+  -s SUBTITLE_PATH [SUBTITLE_PATH ...], --subtitle_path SUBTITLE_PATH [SUBTITLE_PATH ...]
+                        File path or URL to the subtitle file (Extensions of supported subtitles: .ssa, .vtt, .srt, .txt, .smi, .ytt, .sub, .xml, .sbv, .ass, .sami, .scc, .tmp, .stl, .ttml, .dfxp) or selector for the embedded subtitle (e.g., embedded:page_num=888 or embedded:stream_index=0)
   -l MAX_LOGLOSS, --max_logloss MAX_LOGLOSS
                         Max global log loss for alignment
   -so, --stretch_on     Switch on stretch on subtitles)
@@ -23,18 +27,22 @@ optional arguments:
                         Source and target ISO 639-3 language codes separated by a comma (e.g., eng,zho)
   -os OFFSET_SECONDS, --offset_seconds OFFSET_SECONDS
                         Offset by which the subtitle will be shifted
+  -ml {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}, --main_language {afr,amh,ara,arg,asm,aze,ben,bos,bul,cat,ces,cmn,cym,dan,deu,ell,eng,epo,est,eus,fas,fin,fra,gla,gle,glg,grc,grn,guj,heb,hin,hrv,hun,hye,ina,ind,isl,ita,jbo,jpn,kal,kan,kat,kir,kor,kur,lat,lav,lfn,lit,mal,mar,mkd,mlt,msa,mya,nah,nep,nld,nor,ori,orm,pan,pap,pol,por,ron,rus,sin,slk,slv,spa,sqi,srp,swa,swe,tam,tat,tel,tha,tsn,tur,ukr,urd,vie,yue,zho}
+                        Target video's main language as an ISO 639-3 language code [https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes]
+  -mr {whisper}, --llm_recipe {whisper}
+                        LLM recipe used for transcribing video files
+  -mf {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}, --llm_flavour {tiny,tiny.en,small,medium,medium.en,base,base.en,large-v1,large-v2,large}
+                        Flavour variation for a specific LLM recipe
   -lgs, --languages     Print out language codes used for stretch and translation
   -d, --debug           Print out debugging information
   -q, --quiet           Switch off logging information
   -ver, --version       show program's version number and exit
 required arguments:
-  -m {single,dual,script,shift}, --mode {single,dual,script,shift}
-                        Alignment mode: either single or dual
+  -m {single,dual,script,shift,transcribe}, --mode {single,dual,script,shift,transcribe}
+                        Alignment mode: single, dual, script, shift or transcribe
   -v VIDEO_PATH, --video_path VIDEO_PATH
                         File path or URL to the video file
-  -s SUBTITLE_PATH [SUBTITLE_PATH ...], --subtitle_path SUBTITLE_PATH [SUBTITLE_PATH ...]
-                        File path or URL to the subtitle file (Extensions of supported subtitles: .sami, .ssa, .vtt, .xml, .sub, .smi, .ass, .srt, .tmp, .dfxp, .stl, .ttml, .sbv, .txt, .ytt, .scc) or selector for the embedded subtitle (e.g., embedded:page_num=888 or embedded:stream_index=0)
 """
 import argparse
@@ -61,10 +69,10 @@ def main():
     required_args.add_argument(
         "-m",
         "--mode",
-        type=str,
+        type=str.lower,
         default="",
-        choices=["single", "dual", "script", "shift"],
-        help="Alignment mode: either single or dual",
+        choices=["single", "dual", "script", "shift", "transcribe"],
+        help="Alignment mode: single, dual, script, shift or transcribe",
     )
     required_args.add_argument(
         "-v",
@@ -74,7 +82,7 @@ def main():
         help="File path or URL to the video file",
     )
     from subaligner.subtitle import Subtitle
-    required_args.add_argument(
+    parser.add_argument(
         "-s",
         "--subtitle_path",
         type=str,
@@ -100,7 +108,7 @@ def main():
     parser.add_argument(
         "-sil",
         "--stretch_in_language",
-        type=str,
+        type=str.lower,
         choices=Utils.get_stretch_language_codes(),
         default="eng",
         help="Stretch the subtitle with the supported ISO 639-3 language code [https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes].\nNB: This will be ignored if neither -so nor --stretch_on is present",
@@ -137,6 +145,29 @@ def main():
         type=float,
         help="Offset by which the subtitle will be shifted"
     )
+    parser.add_argument(
+        "-ml",
+        "--main_language",
+        type=str.lower,
+        choices=Utils.get_stretch_language_codes(),
+        help="Target video's main language as an ISO 639-3 language code [https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes]",
+    )
+    parser.add_argument(
+        "-mr",
+        "--llm_recipe",
+        type=str.lower,
+        default="whisper",
+        choices=["whisper"],
+        help="LLM recipe used for transcribing video files"
+    )
+    parser.add_argument(
+        "-mf",
+        "--llm_flavour",
+        type=str.lower,
+        default="small",
+        choices=["tiny", "tiny.en", "small", "medium", "medium.en", "base", "base.en", "large-v1", "large-v2", "large"],
+        help="Flavour variation for a specific LLM recipe"
+    )
     parser.add_argument("-lgs", "--languages", action="store_true",
                         help="Print out language codes used for stretch and translation")
     parser.add_argument("-d", "--debug", action="store_true",
@@ -153,33 +184,45 @@ def main():
         print("ERROR: --mode was not passed in")
         parser.print_usage()
         sys.exit(21)
     FLAGS.subtitle_path = [path for paths in FLAGS.subtitle_path for path in paths]
-    if not FLAGS.subtitle_path:
+    if not FLAGS.subtitle_path and FLAGS.mode != "transcribe":
         print("ERROR: --subtitle_path was not passed in")
         parser.print_usage()
         sys.exit(21)
-    if FLAGS.mode != "shift":
+    elif FLAGS.mode == "transcribe":
+        FLAGS.subtitle_path = ["{}.srt".format(tempfile.mkstemp()[1])]
+    if FLAGS.mode in ["single", "dual", "script", "transcribe"]:
         for subtitle_path in FLAGS.subtitle_path:
             if FLAGS.video_path == "":
                 print("ERROR: --video_path was not passed in")
                 parser.print_usage()
                 sys.exit(21)
             if subtitle_path.lower().startswith("http") and FLAGS.output == "":
-                print("ERROR: --output was not passed in for alignment on a remote subtitle file")
+                print("ERROR: --output was not passed in but required by alignment on a remote subtitle file")
                 parser.print_usage()
                 sys.exit(21)
             if subtitle_path.lower().startswith("embedded:") and FLAGS.output == "":
-                print("ERROR: --output was not passed in for alignment on embedded subtitles")
+                print("ERROR: --output was not passed in but required by alignment on embedded subtitles")
                 parser.print_usage()
                 sys.exit(21)
             if FLAGS.mode == "script" and FLAGS.output == "":
-                print("ERROR: --output was not passed in for alignment on plain texts")
+                print("ERROR: --output was not passed in but required by alignment on plain texts")
                 parser.print_usage()
                 sys.exit(21)
-            if FLAGS.translate is not None:
+            if FLAGS.mode == "transcribe":
+                if FLAGS.output == "":
+                    print("ERROR: --output was not passed in but required by mode 'transcribe'")
+                    parser.print_usage()
+                    sys.exit(21)
+                if FLAGS.main_language is None:
+                    print("ERROR: --main_language was not passed in but required by mode 'transcribe'")
+                    parser.print_usage()
+                    sys.exit(21)
+            if FLAGS.translate is not None or FLAGS.mode == "transcribe":
                 if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-                    print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+                    print('ERROR: Alignment has been configured to use language models. Please install "subaligner[llm]" and run your command again.')
                     sys.exit(21)
             if FLAGS.stretch_on or FLAGS.mode == "script":
                 if "aeneas" not in {pkg.key for pkg in pkg_resources.working_set}:
@@ -190,13 +233,13 @@ def main():
             local_subtitle_path = subtitle_path
             exit_segfail = FLAGS.exit_segfail
             stretch = FLAGS.stretch_on
-            stretch_in_lang = FLAGS.stretch_in_language
+            stretch_in_lang = FLAGS.main_language or FLAGS.stretch_in_language
             from subaligner.logger import Logger
             Logger.VERBOSE = FLAGS.debug
             Logger.QUIET = FLAGS.quiet
             from subaligner.predictor import Predictor
-            from subaligner.exception import UnsupportedFormatException
+            from subaligner.exception import UnsupportedFormatException, TranscriptionException
             from subaligner.exception import TerminalException
             try:
@@ -230,6 +273,7 @@ def main():
                         parser.print_usage()
                         sys.exit(21)
+                voice_probabilities = None
                 predictor = Predictor()
                 if FLAGS.mode == "single":
                     aligned_subs, audio_file_path, voice_probabilities, frame_rate = predictor.predict_single_pass(
@@ -252,6 +296,11 @@ def main():
                         subtitle_file_path=local_subtitle_path,
                         stretch_in_lang=stretch_in_lang,
                     )
+                elif FLAGS.mode == "transcribe":
+                    from subaligner.transcriber import Transcriber
+                    transcriber = Transcriber(recipe=FLAGS.llm_recipe, flavour=FLAGS.llm_flavour)
+                    subtitle, frame_rate = transcriber.transcribe(local_video_path, stretch_in_lang)
+                    aligned_subs = subtitle.subs
                 else:
                     print("ERROR: Unknown mode {}".format(FLAGS.mode))
                     parser.print_usage()
@@ -267,6 +316,9 @@ def main():
                     aligned_subs = translator.translate(aligned_subs)
                     Subtitle.save_subs_as_target_format(aligned_subs, local_subtitle_path, aligned_subtitle_path,
                                                         frame_rate, "utf-8")
+                elif FLAGS.mode == "transcribe":
+                    Subtitle.save_subs_as_target_format(aligned_subs, local_subtitle_path, aligned_subtitle_path,
+                                                        frame_rate, "utf-8")
                 else:
                     Subtitle.save_subs_as_target_format(aligned_subs, local_subtitle_path, aligned_subtitle_path,
                                                         frame_rate)
@@ -277,35 +329,35 @@ def main():
                         print(
                             "ERROR: Alignment failed with a too high loss value: {}".format(log_loss)
                         )
-                        _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                        _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                         sys.exit(22)
                 print("Aligned subtitle saved to: {}".format(aligned_subtitle_path))
-            except UnsupportedFormatException as e:
+            except (UnsupportedFormatException, TranscriptionException) as e:
                 print(
                     "ERROR: {}\n{}".format(str(e), "".join(traceback.format_stack()) if FLAGS.debug else "")
                 )
                 traceback.print_tb(e.__traceback__)
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                 sys.exit(23)
             except TerminalException as e:
                 print(
                     "ERROR: {}\n{}".format(str(e), "".join(traceback.format_stack()) if FLAGS.debug else "")
                 )
                 traceback.print_tb(e.__traceback__)
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                 sys.exit(24)
             except Exception as e:
                 print(
                     "ERROR: {}\n{}".format(str(e), "".join(traceback.format_stack()) if FLAGS.debug else "")
                 )
                 traceback.print_tb(e.__traceback__)
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
                 sys.exit(1)
             else:
-                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path)
+                _remove_tmp_files(FLAGS.video_path, subtitle_path, local_video_path, local_subtitle_path, FLAGS.mode)
         sys.exit(0)
-    else:
+    elif FLAGS.mode == "shift":
         if FLAGS.offset_seconds is None:
             print("ERROR: --offset_seconds was not passed in during subtitle shifting")
             sys.exit(21)
@@ -319,11 +371,13 @@ def main():
         sys.exit(0)
-def _remove_tmp_files(video_path, subtitle_path, local_video_path, local_subtitle_path):
+def _remove_tmp_files(video_path, subtitle_path, local_video_path, local_subtitle_path, mode):
     if video_path.lower().startswith("http") and os.path.exists(local_video_path):
         os.remove(local_video_path)
     if subtitle_path.lower().startswith("http") and os.path.exists(local_subtitle_path):
         os.remove(local_subtitle_path)
+    if mode == "transcribe" and os.path.exists(local_subtitle_path):
+        os.remove(local_subtitle_path)
 if __name__ == "__main__":
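
Note on the hunks above: switching `type=str` to `type=str.lower` makes the `choices` check case-insensitive, because argparse converts the value before comparing it with `choices`. The new `transcribe` mode also changes which options are required. The following is a minimal sketch of both behaviours, not the package's code: `build_parser` and `missing_options` are hypothetical names, and the parser is trimmed to the options involved.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser covering only the options discussed here.
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument(
        "-m", "--mode",
        type=str.lower,  # value is lower-cased before it is checked against choices
        default="",
        choices=["single", "dual", "script", "shift", "transcribe"],
    )
    parser.add_argument("-s", "--subtitle_path", type=str, default="")
    parser.add_argument("-ml", "--main_language", type=str.lower)
    parser.add_argument("-o", "--output", type=str, default="")
    return parser


def missing_options(flags: argparse.Namespace) -> List[str]:
    """Return the options that are still required for the chosen mode."""
    missing = []
    if flags.mode == "transcribe":
        # Transcription writes a new subtitle, so it needs a destination and a language.
        if flags.output == "":
            missing.append("--output")
        if flags.main_language is None:
            missing.append("--main_language")
    elif not flags.subtitle_path:
        # Every other mode works on an existing subtitle.
        missing.append("--subtitle_path")
    return missing


if __name__ == "__main__":
    flags = build_parser().parse_args(["-m", "TRANSCRIBE", "-ml", "ENG"])
    print(flags.mode, flags.main_language, missing_options(flags))
```

In this sketch the subtitle path is optional at the parser level and enforced afterwards per mode, which mirrors the move of `--subtitle_path` from the required group to the general parser in the diff.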

subaligner/_version.py CHANGED

@@ -1,2 +1,2 @@
 """The semver for the current release."""
-__version__ = "0.2.4"
+__version__ = "0.3.0"

subaligner/exception.py CHANGED

@@ -8,3 +8,7 @@ class TerminalException(Exception):
 class NoFrameRateException(Exception):
     """ An exception raised due to frame rate not found."""
+class TranscriptionException(Exception):
+    """ An exception raised due to transcription failures."""

subaligner/predictor.py CHANGED

@@ -37,7 +37,7 @@ class Predictor(metaclass=Singleton):
     __SEGMENT_PREDICTION_TIMEOUT = 60  # Maximum waiting time in seconds when predicting each segment
     __THREAD_QUEUE_SIZE = 8
-    __THREAD_NUMBER = 4
+    __THREAD_NUMBER = 1  # Do not change
     def __init__(self, **kwargs) -> None:
         """Feature predictor initialiser.

subaligner/subaligner_1pass/__main__.py CHANGED

@@ -120,7 +120,7 @@ def main():
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     local_video_path = FLAGS.video_path

subaligner/subaligner_2pass/__main__.py CHANGED

@@ -147,7 +147,7 @@ def main():
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     if FLAGS.stretch_on:
         if "aeneas" not in {pkg.key for pkg in pkg_resources.working_set}:

subaligner/subaligner_batch/__main__.py CHANGED

@@ -173,7 +173,7 @@ Each file pair needs to share the same base filename, the part before the extens
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     video_file_paths = [os.path.abspath(os.path.join(path, p)) for path, _, files in

subaligner/subaligner_convert/__main__.py CHANGED

@@ -99,7 +99,7 @@ def main():
         sys.exit(21)
     if FLAGS.translate is not None:
         if "transformers" not in {pkg.key for pkg in pkg_resources.working_set}:
-            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[translation]" and run your command again.')
+            print('ERROR: Alignment has been configured to perform translation. Please install "subaligner[llm]" and run your command again.')
             sys.exit(21)
     local_subtitle_path = FLAGS.input_subtitle_path

subaligner/subtitle.py CHANGED

@@ -59,6 +59,8 @@ class Subtitle(object):
         if subtitle_format == "subrip":
             self.__subs = self.__load_subrip(subtitle_file_path)
+        elif subtitle_format == "subrip_raw":
+            self.__subs = pysrt.SubRipFile().from_string(subtitle_file_path)
         elif subtitle_format == "ttml":
             self.__subs = self.__convert_ttml_to_subs(subtitle_file_path)
         elif subtitle_format == "webvtt":
@@ -105,6 +107,19 @@ class Subtitle(object):
         return cls(cls.__secret, subtitle_file_path, "subrip")
+    @classmethod
+    def load_subrip_str(cls, subrip_raw: str) -> "Subtitle":
+        """Load a SubRip subtitle string.
+        Arguments:
+            subrip_str {string} -- The string representation of the SubRip content.
+        Returns:
+            Subtitle -- Subtitle object.
+        """
+        return cls(cls.__secret, subrip_raw, "subrip_raw")
     @classmethod
     def load_ttml(cls, subtitle_file_path: str) -> "Subtitle":
         """Load a TTML subtitle file.

subaligner/trainer.py CHANGED

@@ -315,8 +315,8 @@ class Trainer(object):
         train_data = [x for x in train_data if x is not None]
         labels = [x for x in labels if x is not None]
-        train_data = np.concatenate(train_data)
-        labels = np.concatenate(labels)
+        train_data: np.ndarray = np.concatenate(train_data)  # type: ignore
+        labels: np.ndarray = np.concatenate(labels)  # type: ignore
         self.__LOGGER.debug(
             "Data and labels extracted after {} seconds".format(
                 str(datetime.datetime.now() - extraction_start)

subaligner/transcriber.py ADDED

@@ -0,0 +1,118 @@
+import os
+import whisper
+from enum import Enum
+from typing import Tuple, Optional
+from pysrt import SubRipTime
+from whisper.tokenizer import LANGUAGES
+from .translator import Translator
+from .subtitle import Subtitle
+from .media_helper import MediaHelper
+from .logger import Logger
+from .exception import NoFrameRateException, TranscriptionException
+class Transcriber(object):
+    """Transcribe audiovisual content for subtitle generation.
+    """
+    def __init__(self, recipe: str = "whisper", flavour: str = "small") -> None:
+        """Initialiser for the transcribing process.
+        Arguments:
+            recipe {string} -- the LLM recipe used for transcribing video files (default: "whisper").
+            flavour {string} -- the flavour variation for a specific LLM recipe (default: "small").
+        Raises:
+            NotImplementedError -- Thrown when the LLM recipe is unknown.
+        """
+        if recipe not in [r.value for r in Recipe]:
+            raise NotImplementedError(f"Unknown recipe: {recipe}")
+        if recipe == Recipe.whisper.value:
+            if flavour not in [f.value for f in WhisperFlavour]:
+                raise NotImplementedError(f"Unknown {recipe} flavour: {flavour}")
+            self.__model = whisper.load_model(flavour)
+        self.recipe = recipe
+        self.flavour = flavour
+        self.__media_helper = MediaHelper()
+        self.__LOGGER = Logger().get_logger(__name__)
+    def transcribe(self, video_file_path: str, language_code: str) -> Tuple[Subtitle, Optional[float]]:
+        """Transcribe an audiovisual file and generate subtitles.
+        Arguments:
+            video_file_path {string} -- The input video file path.
+            language_code {string} -- An alpha 3 language code derived from ISO 639-3.
+        Raises:
+            TranscriptionException -- Thrown when transcription is failed.
+            NotImplementedError -- Thrown when the LLM recipe is not supported.
+        """
+        if self.recipe == "whisper":
+            lang = Translator.get_iso_639_alpha_2(language_code)
+            if lang not in LANGUAGES:
+                raise TranscriptionException(f'"{language_code}" is not supported by {self.recipe} ({self.flavour})')
+            audio_file_path = self.__media_helper.extract_audio(video_file_path, True, 16000)
+            try:
+                audio = whisper.load_audio(audio_file_path)
+                self.__LOGGER.debug("Start transcribing the audio...")
+                result = self.__model.transcribe(audio, task="transcribe", language=LANGUAGES[lang])
+                self.__LOGGER.info("Finished transcribing the audio")
+                srt_str = ""
+                for i, segment in enumerate(result["segments"], start=1):
+                    srt_str += f"{i}\n" \
+                               f"{self.__format_timestamp(segment['start'])} --> {self.__format_timestamp(segment['end'])}\n" \
+                               f"{segment['text'].strip().replace('-->', '->')}\n" \
+                               "\n"
+                subtitle = Subtitle.load_subrip_str(srt_str)
+                subtitle, frame_rate = self.__on_frame_timecodes(subtitle, video_file_path)
+                self.__LOGGER.debug("Generated the raw subtitle")
+                return subtitle, frame_rate
+            finally:
+                if os.path.exists(audio_file_path):
+                    os.remove(audio_file_path)
+        else:
+            raise NotImplementedError(f"{self.recipe} ({self.flavour}) is not supported")
+    @staticmethod
+    def __format_timestamp(seconds: float) -> str:
+        assert seconds >= 0, "non-negative timestamp expected"
+        milliseconds = round(seconds * 1000.0)
+        hours = milliseconds // 3_600_000
+        milliseconds -= hours * 3_600_000
+        minutes = milliseconds // 60_000
+        milliseconds -= minutes * 60_000
+        seconds = milliseconds // 1_000
+        milliseconds -= seconds * 1_000
+        hours_marker = f"{hours:02d}:"
+        return f"{hours_marker}{minutes:02d}:{seconds:02d},{milliseconds:03d}"
+    def __on_frame_timecodes(self, subtitle: Subtitle, video_file_path: str) -> Tuple[Subtitle, Optional[float]]:
+        frame_rate = None
+        try:
+            frame_rate = self.__media_helper.get_frame_rate(video_file_path)
+            frame_duration = 1.0 / frame_rate
+            for sub in subtitle.subs:
+                start_seconds = sub.start.hours * 3600 + sub.start.minutes * 60 + sub.start.seconds + sub.start.milliseconds / 1000.0
+                end_seconds = sub.end.hours * 3600 + sub.end.minutes * 60 + sub.end.seconds + sub.end.milliseconds / 1000.0
+                start_frames = int(start_seconds / frame_duration)
+                end_frames = int(end_seconds / frame_duration)
+                sub.start = SubRipTime(seconds=start_frames * frame_duration)
+                sub.end = SubRipTime(seconds=end_frames * frame_duration)
+        except NoFrameRateException:
+            self.__LOGGER.warning("Cannot detect the frame rate for %s" % video_file_path)
+        return subtitle, frame_rate
+class Recipe(str, Enum):
+    whisper = "whisper"
+class WhisperFlavour(str, Enum):
+    tiny = "tiny"
+    tiny_en = "tiny.en"
+    small = "small"
+    medium = "medium"
+    medium_en = "medium.en"
+    base = "base"
+    base_en = "base.en"
+    large_v1 = "large-v1"
+    large_v2 = "large-v2"
+    large = "large"
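
Note on the added file: the transcriber turns each Whisper segment into a SubRip cue by formatting second offsets as `HH:MM:SS,mmm` and then snapping cue boundaries to frame starts when a frame rate is available. The following standalone sketch illustrates those two steps under stated assumptions. `format_timestamp` and `snap_to_frame` are hypothetical free functions (the diff implements them as private methods operating on pysrt objects), and the sketch raises `ValueError` where the diff uses an `assert`.

```python
def format_timestamp(seconds: float) -> str:
    """Render a non-negative offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    if seconds < 0:
        raise ValueError("non-negative timestamp expected")
    # Work in whole milliseconds so that rounding carries into seconds and minutes.
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def snap_to_frame(seconds: float, frame_rate: float) -> float:
    """Move a time offset back to the start of the video frame that contains it."""
    frame_duration = 1.0 / frame_rate
    # int() truncates, so the result never lies after the original offset.
    return int(seconds / frame_duration) * frame_duration


if __name__ == "__main__":
    print(format_timestamp(3661.5))
    print(format_timestamp(snap_to_frame(1.27, 25.0)))
```

Truncating to the frame start keeps a cue from appearing later than the speech it belongs to, at the cost of showing it up to one frame early.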

subaligner/translator.py CHANGED

@@ -16,6 +16,7 @@ class Translator(metaclass=Singleton):
     __TENSOR_TYPE = "pt"
     __OPUS_MT = "Helsinki-NLP/opus-mt-{}-{}"
+    __OPUS_MT_TC_BIG = "Helsinki-NLP/opus-mt-tc-big-{}-{}"
     __OPUS_TATOEBA = "Helsinki-NLP/opus-tatoeba-{}-{}"
     __TRANSLATING_BATCH_SIZE = 10
     __LANGUAGE_CODE_MAPPER = {
@@ -128,8 +129,8 @@ class Translator(metaclass=Singleton):
         num_of_batches = math.ceil(len(src_texts) / Translator.__TRANSLATING_BATCH_SIZE)
         self.__LOGGER.info("Translating %s subtitle cue(s)..." % len(src_texts))
         for batch in tqdm(Translator.__batch(src_texts, Translator.__TRANSLATING_BATCH_SIZE), total=num_of_batches):
-            tokenizer = self.tokenizer(batch, return_tensors=Translator.__TENSOR_TYPE, padding=True)
-            translated = self.lang_model.generate(**tokenizer)
+            input_ids = self.tokenizer(batch, return_tensors=Translator.__TENSOR_TYPE, padding=True)
+            translated = self.lang_model.generate(**input_ids)
             translated_texts.extend([self.tokenizer.decode(t, skip_special_tokens=True) for t in translated])
         for index in range(len(new_subs)):
             new_subs[index].text = translated_texts[index]
@@ -140,59 +141,100 @@ class Translator(metaclass=Singleton):
         src_lang = Translator.normalise_single(src_lang)
         tgt_lang = Translator.normalise_single(tgt_lang)
         src_lang, tgt_lang = Translator.normalise_pair(src_lang, tgt_lang)
+        if self.__download_mt_model(src_lang, tgt_lang):
+            return
+        elif self.__download_mt_tc_big_model(src_lang, tgt_lang):
+            return
+        elif self.__download_tatoeba_model(src_lang, tgt_lang):
+            return
+        else:
+            message = 'Cannot find the MT model for source language "{}" and destination language "{}"'.format(src_lang, tgt_lang)
+            self.__LOGGER.error(message)
+            raise NotImplementedError(message)
+    def __download_mt_model(self, src_lang: str, tgt_lang: str) -> bool:
         try:
             mt_model_name = Translator.__OPUS_MT.format(Translator.get_iso_639_alpha_2(src_lang), Translator.get_iso_639_alpha_2(tgt_lang))
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
             self.__log_and_back_off(mt_model_name)
         try:
             mt_model_name = Translator.__OPUS_MT.format(src_lang, Translator.get_iso_639_alpha_2(tgt_lang))
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
             self.__log_and_back_off(mt_model_name)
         try:
             mt_model_name = Translator.__OPUS_MT.format(Translator.get_iso_639_alpha_2(src_lang), tgt_lang)
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
             self.__log_and_back_off(mt_model_name)
         try:
             mt_model_name = Translator.__OPUS_MT.format(src_lang, tgt_lang)
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
             self.__log_and_back_off(mt_model_name)
+        return False
+    def __download_mt_tc_big_model(self, src_lang: str, tgt_lang: str) -> bool:
+        try:
+            mt_tc_model_name = Translator.__OPUS_MT_TC_BIG.format(Translator.get_iso_639_alpha_2(src_lang), Translator.get_iso_639_alpha_2(tgt_lang))
+            self.__download(mt_tc_model_name)
+            return True
+        except OSError:
+            self.__log_and_back_off(mt_tc_model_name)
+        try:
+            mt_tc_model_name = Translator.__OPUS_MT_TC_BIG.format(src_lang, Translator.get_iso_639_alpha_2(tgt_lang))
+            self.__download(mt_tc_model_name)
+            return True
+        except OSError:
+            self.__log_and_back_off(mt_tc_model_name)
+        try:
+            mt_tc_model_name = Translator.__OPUS_MT_TC_BIG.format(Translator.get_iso_639_alpha_2(src_lang), tgt_lang)
+            self.__download(mt_tc_model_name)
+            return True
+        except OSError:
+            self.__log_and_back_off(mt_tc_model_name)
+        try:
+            mt_tc_model_name = Translator.__OPUS_MT_TC_BIG.format(src_lang, tgt_lang)
+            self.__download(mt_tc_model_name)
+            return True
+        except OSError:
+            self.__log_and_back_off(mt_tc_model_name)
+        return False
+    def __download_tatoeba_model(self, src_lang: str, tgt_lang: str) -> bool:
         try:
             mt_model_name = Translator.__OPUS_TATOEBA.format(Translator.get_iso_639_alpha_2(src_lang), Translator.get_iso_639_alpha_2(tgt_lang))
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
             self.__log_and_back_off(mt_model_name)
         try:
             mt_model_name = Translator.__OPUS_TATOEBA.format(src_lang, Translator.get_iso_639_alpha_2(tgt_lang))
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
             self.__log_and_back_off(mt_model_name)
         try:
             mt_model_name = Translator.__OPUS_TATOEBA.format(Translator.get_iso_639_alpha_2(src_lang), tgt_lang)
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
             self.__log_and_back_off(mt_model_name)
         try:
             mt_model_name = Translator.__OPUS_TATOEBA.format(src_lang, tgt_lang)
-            self.__download_mt_model(mt_model_name)
-            return
+            self.__download(mt_model_name)
+            return True
         except OSError:
-            self.__LOGGER.debug("Cannot download the MT model %s" % mt_model_name)
-            message = 'Cannot find the MT model for source language "{}" and destination language "{}"'.format(src_lang, tgt_lang)
-            self.__LOGGER.error(message)
-            raise NotImplementedError(message)
+            self.__log_and_back_off(mt_model_name)
+        return False
-    def __download_mt_model(self, mt_model_name: str) -> None:
+    def __download(self, mt_model_name: str) -> None:
         self.__LOGGER.debug("Trying to download the MT model %s" % mt_model_name)
         self.tokenizer = MarianTokenizer.from_pretrained(mt_model_name)
         self.lang_model = MarianMTModel.from_pretrained(mt_model_name)
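
Note on the refactoring above: model lookup now walks three model families in order (opus-mt, opus-mt-tc-big, opus-tatoeba), and within each family tries four combinations of two-letter and three-letter language codes, treating `OSError` as "not available, try the next one". The repeated try/except blocks can be read as a loop over candidate names. The sketch below shows that reading only; `candidate_names`, `first_loadable`, `resolve` and the injected `load` callable are hypothetical and stand in for the real download step, so no model is fetched.

```python
from typing import Callable, Iterable, List, Optional

# Name templates copied from the diff, in the order they are tried.
TEMPLATES = [
    "Helsinki-NLP/opus-mt-{}-{}",
    "Helsinki-NLP/opus-mt-tc-big-{}-{}",
    "Helsinki-NLP/opus-tatoeba-{}-{}",
]


def candidate_names(template: str, src3: str, tgt3: str, src2: str, tgt2: str) -> List[str]:
    """List the names tried within one family, in the same order as the diff."""
    return [
        template.format(src2, tgt2),
        template.format(src3, tgt2),
        template.format(src2, tgt3),
        template.format(src3, tgt3),
    ]


def first_loadable(names: Iterable[str], load: Callable[[str], object]) -> Optional[str]:
    """Return the first name that `load` accepts, or None if every attempt raises OSError."""
    for name in names:
        try:
            load(name)
            return name
        except OSError:
            continue  # back off and try the next candidate
    return None


def resolve(src3: str, tgt3: str, src2: str, tgt2: str, load: Callable[[str], object]) -> str:
    """Try each family in turn; raise NotImplementedError when nothing loads."""
    for template in TEMPLATES:
        name = first_loadable(candidate_names(template, src3, tgt3, src2, tgt2), load)
        if name is not None:
            return name
    raise NotImplementedError(
        'Cannot find the MT model for source language "{}" and destination language "{}"'.format(src3, tgt3)
    )
```

Passing the loader in as a parameter keeps the fallback order testable without network access; in the package itself the equivalent step loads a Marian tokenizer and model.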
