PyPI - batchalign - Versions diffs - 0.7.6a32__tar.gz → 0.7.7__tar.gz - Mend

batchalign 0.7.6a32.tar.gz → 0.7.7.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (113)

{batchalign-0.7.6a32/batchalign.egg-info → batchalign-0.7.7}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: batchalign
-Version: 0.7.6a32
+Version: 0.7.7
 Summary: Python Speech Language Sample Analysis
 Author: Brian MacWhinney, Houjun Liu
 Author-email: macw@cmu.edu, houjun@cmu.edu
@@ -59,6 +59,8 @@ The following instructions provide a quick start to installing Batchalign. For m
         1.  Install Python 3.11: [via this link](https://www.python.org/ftp/python/3.11.7/python-3.11.7-amd64.exe)
         2.  If later commands report `pip module not found`, [this page may help](https://github.com/TalkBank/batchalign2/wiki/Troubleshooting-Tips#get-pip-on-windows)
     -  your distribution's instructions for Linux
+For first-time users of Python, note that if you didn't install Python 3.11 (as we recommended above), it may be complex to change Python versions downstream and may cause additional problems. We recommend explicitly installing Python 3.11 by installing it explicitly via specifying a version number as we show above.
 ### Install and Update the Package
 You can get Batchalign from PyPi, and you can update the package in the same way:
@@ -75,6 +77,8 @@ Windows:
 py -m pip install -U batchalign
 ```
+Note that if your system reports `pip: command not found`, replace every use of `pip` in the instructions with `pip3` and try again.
 ### Rock and Roll
 There are two main ways of interacting with Batchalign. Batchalign can be used as a program to batch-process CHAT (hence the name), or as a Python LSA library.

{batchalign-0.7.6a32 → batchalign-0.7.7}/README.md RENAMED

@@ -21,6 +21,8 @@ The following instructions provide a quick start to installing Batchalign. For m
         1.  Install Python 3.11: [via this link](https://www.python.org/ftp/python/3.11.7/python-3.11.7-amd64.exe)
         2.  If later commands report `pip module not found`, [this page may help](https://github.com/TalkBank/batchalign2/wiki/Troubleshooting-Tips#get-pip-on-windows)
     -  your distribution's instructions for Linux
+For first-time users of Python, note that if you didn't install Python 3.11 (as we recommended above), it may be complex to change Python versions downstream and may cause additional problems. We recommend explicitly installing Python 3.11 by installing it explicitly via specifying a version number as we show above.
 ### Install and Update the Package
 You can get Batchalign from PyPi, and you can update the package in the same way:
@@ -37,6 +39,8 @@ Windows:
 py -m pip install -U batchalign
 ```
+Note that if your system reports `pip: command not found`, replace every use of `pip` in the instructions with `pip3` and try again.
 ### Rock and Roll
 There are two main ways of interacting with Batchalign. Batchalign can be used as a program to batch-process CHAT (hence the name), or as a Python LSA library.

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/chat/utils.py RENAMED

@@ -108,7 +108,7 @@ def annotation_clean(content, special=False):
     cleaned_word = re.sub(r"\x15\d+_\d+\x15", '', cleaned_word)
     if not special:
         cleaned_word = re.sub(r"&~\w+", '', cleaned_word)
-    cleaned_word = cleaned_word.replace("(","").replace(")","")
+    # cleaned_word = cleaned_word.replace("(","").replace(")","")
     cleaned_word = cleaned_word.replace("[","").replace("]","")
     cleaned_word = cleaned_word.replace("<","").replace(">","")
     cleaned_word = cleaned_word.replace("“","").replace("”","")
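
The hunk above comments out the parenthesis-stripping step in `annotation_clean`, so parentheses (used in CHAT for shortened forms such as `(be)cause`) now survive this cleaning pass. A minimal sketch of the before/after behaviour follows. `strip_brackets` and its `keep_parens` flag are hypothetical; they mirror only the lines visible in this hunk, not the full function.

```python
def strip_brackets(word: str, keep_parens: bool) -> str:
    """Hypothetical stand-in for the bracket-stripping tail of annotation_clean."""
    if not keep_parens:
        # 0.7.6a32 behaviour: parentheses were removed at this point.
        word = word.replace("(", "").replace(")", "")
    # Unchanged in 0.7.7: square and angle brackets are still removed.
    word = word.replace("[", "").replace("]", "")
    word = word.replace("<", "").replace(">", "")
    return word


old_result = strip_brackets("(be)cause", keep_parens=False)
new_result = strip_brackets("(be)cause", keep_parens=True)
print(old_result, new_result)
```

Later hunks in `eval.py` and `ud.py` remove the parentheses themselves, which suggests the stripping was moved downstream rather than dropped.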

batchalign-0.7.7/batchalign/models/utils.py ADDED

@@ -0,0 +1,199 @@
+import torch
+from transformers.models.whisper.generation_whisper import _dynamic_time_warping as _dynamic_time_warping
+from transformers.models.whisper.generation_whisper import _median_filter as _median_filter
+from dataclasses import dataclass
+import numpy as np
+def _extract_token_timestamps(
+        self, generate_outputs, alignment_heads, time_precision=0.02, num_frames=None, num_input_ids=None
+    ):
+        """
+        Calculates token-level timestamps using the encoder-decoder cross-attentions and dynamic time-warping (DTW) to
+        map each output token to a position in the input audio. If `num_frames` is specified, the encoder-decoder
+        cross-attentions will be cropped before applying DTW.
+        Returns:
+            tensor containing the timestamps in seconds for each predicted token
+        """
+        # Create a list with `decoder_layers` elements, each a tensor of shape
+        # (batch size, attention_heads, output length, input length).
+        cross_attentions = []
+        for i in range(self.config.decoder_layers):
+            cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2))
+        # Select specific cross-attention layers and heads. This is a tensor
+        # of shape (batch size, num selected, output length, input length).
+        weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
+        weights = weights.permute([1, 0, 2, 3])
+        weight_length = None
+        if "beam_indices" in generate_outputs:
+            # If beam search has been used, the output sequences may have been generated for more timesteps than their sequence_lengths
+            # since the beam search strategy chooses the most probable sequences at the end of the search.
+            # In that case, the cross_attentions weights are too long and we have to make sure that they have the right output_length
+            weight_length = (generate_outputs.beam_indices != -1).sum(-1).max()
+            weight_length = weight_length if num_input_ids is None else weight_length + num_input_ids
+            # beam search takes `decoder_input_ids` into account in the `beam_indices` length
+            # but forgot to shift the beam_indices by the number of `decoder_input_ids`
+            beam_indices = torch.zeros_like(generate_outputs.beam_indices[:, :weight_length], dtype=torch.float32)
+            # we actually shif the beam indices here
+            beam_indices[:, num_input_ids:] = generate_outputs.beam_indices[:, : weight_length - num_input_ids]
+            weights = weights[:, :, :weight_length]
+            # If beam index is still -1, it means that the associated token id is EOS
+            # We need to replace the index with 0 since index_select gives an error if any of the indexes is -1.
+            beam_indices = beam_indices.masked_fill(beam_indices == -1, 0)
+            # Select the cross attention from the right beam for each output sequences
+            weights = torch.stack(
+                [
+                    torch.index_select(weights[:, :, i, :], dim=0, index=beam_indices[:, i])
+                    for i in range(beam_indices.shape[1])
+                ],
+                dim=2,
+            )
+        # make sure timestamps are as long as weights
+        input_length = weight_length or cross_attentions[0].shape[2]
+        batch_size = generate_outputs.sequences.shape[0]
+        timestamps = torch.zeros(
+            (batch_size, input_length + 1), dtype=torch.float32, device=generate_outputs.sequences.device
+        )
+        if num_frames is not None:
+            # two cases:
+            # 1. num_frames is the same for each sample -> compute the DTW matrix for each sample in parallel
+            # 2. num_frames is different, compute the DTW matrix for each sample sequentially
+            # we're using np.unique because num_frames can be int/list/tuple
+            if isinstance(num_frames, int):
+                weights = weights[..., : num_frames // 2]
+            elif isinstance(num_frames, (list, tuple, np.ndarray)) and len(np.unique(num_frames)) == 1:
+                weights = weights[..., : num_frames[0] // 2]
+            elif isinstance(num_frames, (torch.Tensor)) and len(torch.unique(num_frames)) == 1:
+                weights = weights[..., : num_frames[0] // 2]
+            else:
+                # num_frames is of shape (batch_size,) whereas batch_size is truely batch_size*num_return_sequences
+                repeat_time = batch_size if isinstance(num_frames, int) else batch_size // len(num_frames)
+                num_frames = num_frames.cpu() if isinstance(num_frames, (torch.Tensor)) else num_frames
+                num_frames = np.repeat(num_frames, repeat_time)
+        if num_frames is None or isinstance(num_frames, int):
+            # Normalize and smoothen the weights.
+            std = torch.std(weights, dim=-2, keepdim=True, unbiased=False)
+            mean = torch.mean(weights, dim=-2, keepdim=True)
+            weights = (weights - mean) / std
+            weights = _median_filter(weights, self.config.median_filter_width)
+            # Average the different cross-attention heads.
+            weights = weights.mean(dim=1)
+        # Perform dynamic time warping on each element of the batch.
+        for batch_idx in range(batch_size):
+            if num_frames is not None and isinstance(num_frames, (tuple, list, np.ndarray, torch.Tensor)):
+                matrix = weights[batch_idx, ..., : num_frames[batch_idx] // 2]
+                # Normalize and smoothen the weights.
+                std = torch.std(matrix, dim=-2, keepdim=True, unbiased=False)
+                mean = torch.mean(matrix, dim=-2, keepdim=True)
+                matrix = (matrix - mean) / std
+                matrix = _median_filter(matrix, self.config.median_filter_width)
+                # Average the different cross-attention heads.
+                matrix = matrix.mean(dim=0)
+            else:
+                matrix = weights[batch_idx]
+            text_indices, time_indices = _dynamic_time_warping(-matrix.cpu().double().numpy())
+            jumps = np.pad(np.diff(text_indices), (1, 0), constant_values=1).astype(bool)
+            jump_times = time_indices[jumps] * time_precision
+            timestamps[batch_idx, 1:] = torch.tensor(jump_times)
+        return timestamps
+# def _extract_token_timestamps(self, generate_outputs, alignment_heads, time_precision=0.02, num_frames=None):
+#     """
+#     Calculates token-level timestamps using the encoder-decoder cross-attentions and dynamic time-warping (DTW) to
+#     map each output token to a position in the input audio. If `num_frames` is specified, the encoder-decoder
+#     cross-attentions will be cropped before applying DTW.
+#     Returns:
+#         tensor containing the timestamps in seconds for each predicted token
+#     """
+#     # Create a list with `decoder_layers` elements, each a tensor of shape
+#     # (batch size, attention_heads, output length, input length).
+#     cross_attentions = []
+#     for i in range(self.config.decoder_layers):
+#         cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2))
+#     # Select specific cross-attention layers and heads. This is a tensor
+#     # of shape (batch size, num selected, output length, input length).
+#     weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
+#     weights = weights.permute([1, 0, 2, 3])
+#     if num_frames is not None:
+#         weights = weights[..., : num_frames // 2]
+#     # Normalize and smoothen the weights.
+#     std, mean = torch.std_mean(weights, dim=-2, keepdim=True, unbiased=False)
+#     weights = (weights - mean) / std
+#     weights = _median_filter(weights, self.config.median_filter_width)
+#     # Average the different cross-attention heads.
+#     matrix = weights.mean(dim=1)
+#     timestamps = torch.zeros_like(generate_outputs.sequences, dtype=torch.float32)
+#     # Perform dynamic time warping on each element of the batch.
+#     for batch_idx in range(timestamps.shape[0]):
+#         text_indices, time_indices = _dynamic_time_warping(-matrix[batch_idx].float().cpu().numpy())
+#         jumps = np.pad(np.diff(text_indices), (1, 0), constant_values=1).astype(bool)
+#         jump_times = time_indices[jumps] * time_precision
+#         timestamps[batch_idx, 1:] = torch.tensor(jump_times)
+#     return timestamps
+@dataclass
+class ASRAudioFile:
+    file : str
+    tensor : torch.Tensor
+    rate : int
+    def chunk(self,begin_ms, end_ms):
+        """Get a chunk of the audio.
+        Parameters
+        ----------
+        begin_ms : int
+            Milliseconds of the start of the slice.
+        end_ms : int
+            Milliseconds of the end of the slice.
+        Returns
+        -------
+        torch.Tensor
+            The returned chunk to supply to the ASR engine.
+        """
+        data = self.tensor[int(round((begin_ms/1000)*self.rate)):
+                           int(round((end_ms/1000)*self.rate))]
+        return data
+    def all(self):
+        """Get the audio in its entirety
+        Notes
+        -----
+        like `chunk()` but all of the audio
+        """
+        return self.tensor
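
The last step of `_extract_token_timestamps` turns a DTW path into one timestamp per token: a "jump" is recorded wherever the text index changes, and the matching time index is scaled by `time_precision`. The sketch below reproduces that logic in plain Python instead of the `np.pad(np.diff(...))` expression used in the file. The function name `jump_times` and the sample path are illustrative assumptions.

```python
def jump_times(text_indices, time_indices, time_precision=0.02):
    """Return the time (in seconds) at which each token index first appears on the path."""
    times = []
    previous = None
    for text_idx, time_idx in zip(text_indices, time_indices):
        # The first step always counts as a jump; this plays the role of the
        # leading 1 that np.pad adds in the original expression.
        if previous is None or text_idx != previous:
            times.append(time_idx * time_precision)
        previous = text_idx
    return times


# A made-up DTW path: token 0 spans frames 0-1, token 1 spans 2-4, token 2 starts at 5.
path_text = [0, 0, 1, 1, 1, 2]
path_time = [0, 1, 2, 3, 4, 5]
token_times = jump_times(path_text, path_time)
print(token_times)
```

With the default `time_precision` of 0.02, each frame index corresponds to 20 ms of audio, so the three tokens start at roughly 0 s, 0.04 s and 0.1 s.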

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/analysis/eval.py RENAMED

@@ -3,6 +3,7 @@ eval.py
 Engines for transcript evaluation
 """
+import re
 from batchalign.document import *
 from batchalign.pipelines.base import *
 from batchalign.pipelines.asr.utils import *
@@ -22,11 +23,34 @@ class EvaluationEngine(BatchalignEngine):
         forms = [ j.text.lower() for i in doc.content for j in i.content if isinstance(i, Utterance)]
         gold_forms = [ j.text.lower() for i in gold.content for j in i.content if isinstance(i, Utterance)]
-        forms = [i for i in forms if i.strip() not in MOR_PUNCT+ENDING_PUNCT]
-        gold_forms = [i for i in gold_forms if i.strip() not in MOR_PUNCT+ENDING_PUNCT]
+        forms = [i.replace("-", "") for i in forms if i.strip() not in MOR_PUNCT+ENDING_PUNCT]
+        gold_forms = [i.replace("-", "") for i in gold_forms if i.strip() not in MOR_PUNCT+ENDING_PUNCT]
+        forms = [re.sub(r"\((.*)\)",r"", i) for i in forms]
+        gold_forms = [re.sub(r"\((.*)\)",r"", i) for i in gold_forms]
+        # if there are single letter frames, we combine them tofgether
+        # until the utterance is done or there isn't any left
+        forms_finished = []
+        single_sticky = ""
+        is_single = False
+        for i in forms:
+            if len(i) == 1:
+                single_sticky += i
+            else:
+                if single_sticky != "":
+                    forms_finished.append(single_sticky)
+                    single_sticky = ""
+                forms_finished.append(i)
+        if single_sticky != "":
+            forms_finished.append(single_sticky)
+            single_sticky = ""
         # dp!
-        alignment = align(forms, gold_forms, False)
+        alignment = align(forms_finished, gold_forms, False)
         # calculate each type of error
         sub = 0
@@ -39,14 +63,28 @@ class EvaluationEngine(BatchalignEngine):
         #     but if we have <extra.reference> <extra.reference> this is 2 insertions
         cleaned_alignment = []
+        # whether we had a "firstname" in reference document and hence are
+        # anticipating a payload for it (the actual name) in the next entry in the
+        # alignment
+        anticipating_payload = False
         for i in alignment:
             if isinstance(i, Extra):
-                if len(cleaned_alignment) > 0 and i.extra_type == ExtraType.REFERENCE and "name" in i.key and i.key[:4] != "name":
-                    cleaned_alignment.pop(-1)
+                if i.extra_type == ExtraType.REFERENCE and "name" in i.key and i.key[:4] != "name":
+                    if (isinstance(cleaned_alignment[-1], Extra) and
+                        cleaned_alignment[-1].extra_type ==  ExtraType.PAYLOAD and
+                        len(cleaned_alignment) > 0):
+                        cleaned_alignment.pop(-1)
+                    else:
+                        anticipating_payload = True
                     cleaned_alignment.append(Match(i.key, None, None))
                     continue
+                elif i.extra_type == ExtraType.PAYLOAD and anticipating_payload:
+                    anticipating_payload = False
+                    continue
                 if prev_error != None and prev_error != i.extra_type:
                     # this is a substitution: we have different "extra"s in
@@ -75,7 +113,7 @@ class EvaluationEngine(BatchalignEngine):
             cleaned_alignment.append(i)
         diff = []
-        for i in alignment:
+        for i in cleaned_alignment:
             if isinstance(i, Extra):
                 diff.append(f"{'+' if i.extra_type == ExtraType.REFERENCE else '-'} {i.key}")
             else:
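
The first `eval.py` hunk adds two preprocessing steps before alignment: forms lose hyphens and parenthesised material, and runs of single-character forms are merged into one token. The sketch below restates both steps as standalone functions. The names `normalize_form` and `merge_single_letters` are hypothetical; the regular expression and the merging loop follow the added lines.

```python
import re


def normalize_form(form: str) -> str:
    """Drop hyphens, then delete everything from the first '(' to the last ')'."""
    form = form.replace("-", "")
    # The pattern is greedy, as in the diff, so it spans multiple groups.
    return re.sub(r"\((.*)\)", "", form)


def merge_single_letters(forms):
    """Concatenate consecutive one-character forms into a single form."""
    merged = []
    run = ""
    for form in forms:
        if len(form) == 1:
            run += form
        else:
            if run:
                merged.append(run)
                run = ""
            merged.append(form)
    if run:
        merged.append(run)
    return merged


normalized = [normalize_form(f) for f in ["ice-cream", "(be)cause", "(a)b(c)d"]]
merged = merge_single_letters(["a", "b", "c", "news", "x"])
print(normalized, merged)
```

Two details are visible in the hunk itself: the greedy `.*` removes text between separate parenthesised groups (so `(a)b(c)d` reduces to `d`), and only `forms` is merged before `align` is called, while `gold_forms` is passed through unmerged.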

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/ud.py RENAMED

@@ -18,6 +18,7 @@ from stanza import DownloadMethod
 from torch import heaviside
 from stanza.pipeline.processor import ProcessorVariant, register_processor_variant
+from stanza.resources.common import download_resources_json, load_resources_json, get_language_resources
 # the loading bar
 from tqdm import tqdm
@@ -115,6 +116,7 @@ def handler(word, lang=None):
     target = target.replace('/100', '')
     target = target.replace('/r', '')
     target = target.replace('(', '')
+    target = target.replace("(","").replace(")","")
     # remove attachments
     if "|" in target:
@@ -217,9 +219,9 @@ def handler__NOUN(word, lang=None):
     type  = feats.get("PronType", "")
     apm = ""
-    if lang == "fr":
+    if lang == "fr" and number_str == "-Plur":
         from batchalign.pipelines.morphosyntax.fr.apm import is_apm_noun
-        apm = "apm" if is_apm_noun(word.text) else ""
+        apm = "Apm" if is_apm_noun(word.text) else ""
     if word.deprel == "obj" and case.strip() == "":
@@ -738,13 +740,17 @@ def morphoanalyze(doc: Document, retokenize:bool, status_hook:callable = None, *
     else:
         config["tokenize_postprocessor"] = lambda x:adlist_processor(x)
+    download_resources_json()
+    resources = load_resources_json()
+    mwt_exclusion = ["hr", "zh", "zh-hans", "zh-hant", "ja", "ko",
+                     "sl", "sr", "bg", "ru", "et", "hu",
+                     "eu", "el", "he", "af", "ga", "da", "ro"]
     if "zh" in lang:
         lang.pop(lang.index("zh"))
         lang.append("zh-hans")
-    elif not any([i in ["hr", "zh", "zh-hans", "zh-hant", "ja", "ko",
-                        "sl", "sr", "bg", "ru", "et", "hu",
-                        "eu", "el", "he", "af", "ga", "da", "ro"] for i in lang]):
+    elif not any(i in mwt_exclusion or "mwt" not in get_language_resources(resources, i) for i in lang):
         if "en" in lang:
             config["processors"]["mwt"] = "gum"
         else:
@@ -848,7 +854,7 @@ def morphoanalyze(doc: Document, retokenize:bool, status_hook:callable = None, *
         inputs.append(line_cut)
         try:
-            sents = nlp(line_cut.strip()).sentences
+            sents = nlp(line_cut.replace("(","").replace(")","").strip()).sentences
             if len(sents) == 0:
                 continue
@@ -958,6 +964,7 @@ def morphoanalyze(doc: Document, retokenize:bool, status_hook:callable = None, *
                 retokenized_ut = re.sub(r"⁎[⁎ ]*(.*?)[⁎ ]*⁎", r"⁎\1⁎ ", retokenized_ut)
                 retokenized_ut = re.sub(r"\[\*(.)\]", r"[* \1]", retokenized_ut)
                 retokenized_ut = re.sub(r" +", r" ", retokenized_ut)
+                retokenized_ut = re.sub(r"⁎ @", r"⁎@", retokenized_ut)
                 # pray to everyone that it works---this will simply crash and ignore
                 # the utterance if it didn't work, so we are doing this as a sanity
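
The `morphoanalyze` hunk changes when the MWT processor is configured: a language now disables it either by appearing in the hard-coded exclusion list or by having no `mwt` entry in Stanza's resources. The sketch below models that condition with a plain dictionary standing in for the loaded resources. `should_enable_mwt`, the `resources` layout, and the language code `xx` are assumptions for illustration, and `resources.get` is only a stand-in for `get_language_resources`.

```python
MWT_EXCLUSION = ["hr", "zh", "zh-hans", "zh-hant", "ja", "ko",
                 "sl", "sr", "bg", "ru", "et", "hu",
                 "eu", "el", "he", "af", "ga", "da", "ro"]


def should_enable_mwt(langs, resources):
    """Return True only if every language is allowed and ships an 'mwt' entry."""
    return not any(
        code in MWT_EXCLUSION or "mwt" not in resources.get(code, {})
        for code in langs
    )


# Hypothetical resource table: 'xx' has a tokenizer but no MWT model.
fake_resources = {
    "en": {"tokenize": {}, "mwt": {}},
    "fr": {"tokenize": {}, "mwt": {}},
    "ja": {"tokenize": {}},
    "xx": {"tokenize": {}},
}

results = {
    "en": should_enable_mwt(["en"], fake_resources),
    "en+fr": should_enable_mwt(["en", "fr"], fake_resources),
    "en+ja": should_enable_mwt(["en", "ja"], fake_resources),
    "xx": should_enable_mwt(["xx"], fake_resources),
}
print(results)
```

In the actual function this check sits in an `elif`, so it is only reached when `"zh"` is not in `lang`.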

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/utterance/ud_utterance.py RENAMED

@@ -84,6 +84,7 @@ def parse_tree(subtree):
                      for i in stack]
 def process_ut(ut, nlp):
     # remove punct
     if (ut.content[-1].type == TokenType.PUNCT or
         ut.content[-1].text in ENDING_PUNCT):
@@ -142,7 +143,7 @@ def process_ut(ut, nlp):
         if isinstance(i, Match):
             matches.append(i)
         elif i.extra_type == ExtraType.REFERENCE:
-            new_refs.append(ReferenceTarget(key=i.key, payload=i.payload))
+            new_refs.append(ReferenceTarget(key=i.key, payload=i.payload if i.payload else -1))
     # we now sort the references based on their orignial utterance order
     matches = matches + new_refs
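
The `process_ut` hunk replaces a missing payload with `-1` through a truthiness test, so any falsy payload is replaced, not just `None`. The sketch below contrasts that expression with an explicit `None` check. Both helper names are hypothetical, and the diff does not show whether `0` is ever a valid payload.

```python
def payload_truthy(payload):
    """Same expression as the added line: any falsy payload becomes -1."""
    return payload if payload else -1


def payload_none_only(payload):
    """Alternative for comparison: only a missing payload becomes -1."""
    return payload if payload is not None else -1


truthy_results = [payload_truthy(p) for p in (None, 0, 3)]
none_only_results = [payload_none_only(p) for p in (None, 0, 3)]
print(truthy_results, none_only_results)
```

The two variants differ only for a payload of `0`, which matters if payloads are zero-based indices.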

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/utils/utils.py RENAMED

@@ -29,6 +29,7 @@ def word_tokenize(str):
         return tmp.tokenize(str)
     except LookupError:
         nltk.download("punkt")
+        nltk.download("punkt_tab")
         return tmp.tokenize(str)
 def sent_tokenize(str):
@@ -49,6 +50,7 @@ def sent_tokenize(str):
         return ST(str)
     except LookupError:
         nltk.download("punkt")
+        nltk.download("punkt_tab")
         return ST(str)
 def detokenize(tokens):
@@ -69,6 +71,7 @@ def detokenize(tokens):
         return TreebankWordDetokenizer().detokenize(tokens)
     except LookupError:
         nltk.download("punkt")
+        nltk.download("punkt_tab")
         return TreebankWordDetokenizer().detokenize(tokens)
 def correct_timing(doc):
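
All three `utils.py` hunks extend the same pattern: try the NLTK call, and on `LookupError` download the missing data and retry once. Version 0.7.7 adds `punkt_tab` to the download step, presumably because newer NLTK releases look up that resource, although the diff does not state the reason. The sketch below shows the pattern with injected callables so it runs without NLTK or network access. `call_with_download_fallback` and the fake functions are illustrative assumptions.

```python
def call_with_download_fallback(action, downloads):
    """Run action(); on LookupError, run every download and retry once."""
    try:
        return action()
    except LookupError:
        for download in downloads:
            download()
        return action()


# Fake resource store standing in for the NLTK data directory.
installed = set()


def fake_tokenize():
    """Succeeds only once both resources are present."""
    if not {"punkt", "punkt_tab"} <= installed:
        raise LookupError("resource missing")
    return ["ok"]


result = call_with_download_fallback(
    fake_tokenize,
    [lambda: installed.add("punkt"), lambda: installed.add("punkt_tab")],
)
print(result)
```

If the retry still fails, the second `LookupError` propagates to the caller, as it would in the functions shown in the hunks.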

batchalign-0.7.7/batchalign/version ADDED

@@ -0,0 +1,3 @@
+0.7.7
+Janurary 3st, 2025
+releasing new full version

{batchalign-0.7.6a32 → batchalign-0.7.7/batchalign.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: batchalign
-Version: 0.7.6a32
+Version: 0.7.7
 Summary: Python Speech Language Sample Analysis
 Author: Brian MacWhinney, Houjun Liu
 Author-email: macw@cmu.edu, houjun@cmu.edu
@@ -59,6 +59,8 @@ The following instructions provide a quick start to installing Batchalign. For m
         1.  Install Python 3.11: [via this link](https://www.python.org/ftp/python/3.11.7/python-3.11.7-amd64.exe)
         2.  If later commands report `pip module not found`, [this page may help](https://github.com/TalkBank/batchalign2/wiki/Troubleshooting-Tips#get-pip-on-windows)
     -  your distribution's instructions for Linux
+For first-time users of Python, note that if you didn't install Python 3.11 (as we recommended above), it may be complex to change Python versions downstream and may cause additional problems. We recommend explicitly installing Python 3.11 by installing it explicitly via specifying a version number as we show above.
 ### Install and Update the Package
 You can get Batchalign from PyPi, and you can update the package in the same way:
@@ -75,6 +77,8 @@ Windows:
 py -m pip install -U batchalign
 ```
+Note that if your system reports `pip: command not found`, replace every use of `pip` in the instructions with `pip3` and try again.
 ### Rock and Roll
 There are two main ways of interacting with Batchalign. Batchalign can be used as a program to batch-process CHAT (hence the name), or as a Python LSA library.

batchalign-0.7.6a32/batchalign/models/utils.py DELETED

@@ -1,86 +0,0 @@
-import torch
-from transformers.models.whisper.generation_whisper import _dynamic_time_warping as _dynamic_time_warping
-from transformers.models.whisper.generation_whisper import _median_filter as _median_filter
-from dataclasses import dataclass
-import numpy as np
-def _extract_token_timestamps(self, generate_outputs, alignment_heads, time_precision=0.02, num_frames=None):
-    """
-    Calculates token-level timestamps using the encoder-decoder cross-attentions and dynamic time-warping (DTW) to
-    map each output token to a position in the input audio. If `num_frames` is specified, the encoder-decoder
-    cross-attentions will be cropped before applying DTW.
-    Returns:
-        tensor containing the timestamps in seconds for each predicted token
-    """
-    # Create a list with `decoder_layers` elements, each a tensor of shape
-    # (batch size, attention_heads, output length, input length).
-    cross_attentions = []
-    for i in range(self.config.decoder_layers):
-        cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2))
-    # Select specific cross-attention layers and heads. This is a tensor
-    # of shape (batch size, num selected, output length, input length).
-    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
-    weights = weights.permute([1, 0, 2, 3])
-    if num_frames is not None:
-        weights = weights[..., : num_frames // 2]
-    # Normalize and smoothen the weights.
-    std, mean = torch.std_mean(weights, dim=-2, keepdim=True, unbiased=False)
-    weights = (weights - mean) / std
-    weights = _median_filter(weights, self.config.median_filter_width)
-    # Average the different cross-attention heads.
-    matrix = weights.mean(dim=1)
-    timestamps = torch.zeros_like(generate_outputs.sequences, dtype=torch.float32)
-    # Perform dynamic time warping on each element of the batch.
-    for batch_idx in range(timestamps.shape[0]):
-        text_indices, time_indices = _dynamic_time_warping(-matrix[batch_idx].float().cpu().numpy())
-        jumps = np.pad(np.diff(text_indices), (1, 0), constant_values=1).astype(bool)
-        jump_times = time_indices[jumps] * time_precision
-        timestamps[batch_idx, 1:] = torch.tensor(jump_times)
-    return timestamps
-@dataclass
-class ASRAudioFile:
-    file : str
-    tensor : torch.Tensor
-    rate : int
-    def chunk(self,begin_ms, end_ms):
-        """Get a chunk of the audio.
-        Parameters
-        ----------
-        begin_ms : int
-            Milliseconds of the start of the slice.
-        end_ms : int
-            Milliseconds of the end of the slice.
-        Returns
-        -------
-        torch.Tensor
-            The returned chunk to supply to the ASR engine.
-        """
-        data = self.tensor[int(round((begin_ms/1000)*self.rate)):
-                           int(round((end_ms/1000)*self.rate))]
-        return data
-    def all(self):
-        """Get the audio in its entirety
-        Notes
-        -----
-        like `chunk()` but all of the audio
-        """
-        return self.tensor

batchalign-0.7.6a32/batchalign/version DELETED

@@ -1,3 +0,0 @@
-0.7.6-alpha.32
-November 26, 2024
-French APM

{batchalign-0.7.6a32 → batchalign-0.7.7}/LICENSE RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/MANIFEST.in RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/__main__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/cli/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/cli/cli.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/cli/dispatch.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/constants.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/document.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/errors.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/base.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/chat/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/chat/file.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/chat/generator.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/chat/lexer.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/chat/parser.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/textgrid/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/textgrid/file.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/textgrid/generator.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/formats/textgrid/parser.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/resolve.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/speaker/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/speaker/config.yaml RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/speaker/infer.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/speaker/utils.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/training/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/training/run.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/training/utils.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/utterance/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/utterance/dataset.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/utterance/execute.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/utterance/infer.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/utterance/prep.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/utterance/train.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/whisper/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/whisper/infer_asr.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/models/whisper/infer_fa.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/analysis/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/asr/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/asr/rev.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/asr/utils.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/asr/whisper.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/asr/whisperx.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/base.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/cleanup.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/disfluencies.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/parse_support.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/retrace.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/support/filled_pauses.eng RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/support/replacements.eng RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/cleanup/support/test.test RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/dispatch.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/fa/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/fa/whisper_fa.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/__init__.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/coref.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/en/irr.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/fr/apm.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/fr/apmn.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/fr/case.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/morphosyntax/ja/verbforms.py RENAMED

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/pipeline.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/speaker/__init__.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/speaker/nemo_speaker.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/utr/__init__.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/utr/rev_utr.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/utr/utils.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/utr/whisper_utr.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/pipelines/utterance/__init__.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/__init__.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/conftest.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/formats/chat/test_chat_file.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/formats/chat/test_chat_generator.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/formats/chat/test_chat_lexer.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/formats/chat/test_chat_parser.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/formats/chat/test_chat_utils.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/formats/textgrid/test_textgrid.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/analysis/test_eval.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/asr/test_asr_pipeline.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/asr/test_asr_utils.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/cleanup/test_disfluency.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/cleanup/test_parse_support.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/fa/test_fa_pipeline.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/fixures.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/test_pipeline.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/pipelines/test_pipeline_models.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/tests/test_document.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/utils/__init__.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/utils/config.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign/utils/dp.py RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign.egg-info/SOURCES.txt RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign.egg-info/dependency_links.txt RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign.egg-info/entry_points.txt RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign.egg-info/requires.txt RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/batchalign.egg-info/top_level.txt RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/setup.cfg RENAMED Viewed

File without changes

{batchalign-0.7.6a32 → batchalign-0.7.7}/setup.py RENAMED Viewed

File without changes
