PyPI - easyaligner - Versions diffs - 0.2.0__tar.gz → 0.2.3__tar.gz - Mend

easyaligner 0.2.0__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

easyaligner-0.2.3/PKG-INFO ADDED

@@ -0,0 +1,167 @@
+Metadata-Version: 2.4
+Name: easyaligner
+Version: 0.2.3
+Summary: Forced alignment pipeline designed for efficiency and ease of use.
+Author: Faton Rekathati
+Project-URL: Repository, https://github.com/kb-labb/easyaligner
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: transformers>=4.45.0
+Requires-Dist: torch!=2.9.*,>=2.7.0
+Requires-Dist: torchaudio!=2.9.*,>=2.7.0
+Requires-Dist: tqdm>=4.66.1
+Requires-Dist: soundfile>=0.12.1
+Requires-Dist: nltk>=3.8.2
+Requires-Dist: pyannote-audio>=3.3.1
+Requires-Dist: silero-vad~=6.0
+Requires-Dist: msgspec
+Requires-Dist: rapidfuzz
+Dynamic: license-file
+# Easier forced alignment with `easyaligner`
+<div align="center"><img width="1020" height="340" alt="image" src="https://github.com/user-attachments/assets/a3589539-5c85-4ac1-a4a7-d5e801207faa" /></div>
+`easyaligner` is a fast and memory-efficient forced alignment pipeline for speech and text. Given a text transcript, `easyaligner` helps identify where each word or phrase was spoken in the audio. The library supports alignment with both ground-truth transcripts and ASR-generated transcripts (`easyaligner` is the backend that powers alignment in [`easytranscriber`](https://github.com/kb-labb/easytranscriber)). Some notable features of `easyaligner` include:
+* **GPU-accelerated forced alignment**. Uses [PyTorch's forced alignment API](https://docs.pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html) with a GPU-based implementation of the Viterbi algorithm, enabling fast and memory-efficient forced alignment of long audio segments ([Pratap et al., 2024](https://jmlr.org/papers/volume25/23-1318/23-1318.pdf#page=8)).
+* **Flexible text normalization for improved alignment quality**. Users can supply custom regex-based text normalization functions to preprocess transcripts before alignment. A mapping from the original text to the normalized text is maintained internally. All of the applied normalizations and transformations are consequently **non-destructive and reversible after alignment**.
+* **Batch processing support for emission extraction**. `easyaligner` supports batched inference for wav2vec2-based models, keeping track of non-padded logits when doing alignment.
+Check out the [documentation](https://kb-labb.github.io/easyaligner/) for more details and tutorials!
+## Installation
+### With GPU support (recommended)
+```bash
+pip install easyaligner --extra-index-url https://download.pytorch.org/whl/cu128
+```
+> [!TIP]
+> Remove `--extra-index-url` if you want a CPU-only installation.
+### Using uv
+When you install with [uv](https://docs.astral.sh/uv/), it selects the appropriate PyTorch build automatically (CPU for macOS, CUDA for Linux/Windows/ARM):
+```bash
+uv pip install easyaligner
+```
+## Usage
+The example below downloads a short snippet from a LibriVox audiobook recording of [A Tale of Two Cities](https://librivox.org/a-tale-of-two-cities-by-charles-dickens-2/). The snippet is 57 seconds long and corresponds to the first paragraph of the first chapter. The text to align is supplied directly below and assigned to the `text` variable.
+```python
+from pathlib import Path
+from transformers import (
+    AutoModelForCTC,
+    Wav2Vec2Processor,
+)
+from huggingface_hub import snapshot_download
+from easyaligner.text import load_tokenizer
+from easyaligner.data.datamodel import SpeechSegment
+from easyaligner.pipelines import pipeline
+from easyaligner.text import text_normalizer
+from easyaligner.vad.pyannote import load_vad_model
+filepath_pattern = "tale-of-two-cities_align-en/taleoftwocities_01_dickens_64kb_align.mp3"
+# Download mp3 from Hugging Face Hub
+snapshot_download(
+    "Lauler/easytranscriber_tutorials",
+    repo_type="dataset",
+    local_dir="data/tutorials",
+    allow_patterns=filepath_pattern,
+)
+# File(s) to align
+filepath = Path("data/tutorials") / filepath_pattern
+audio_dir = filepath.parent
+audio_files = [filepath.name]
+text = """
+It was the best of times, it was the worst of times, it was the age of
+wisdom, it was the age of foolishness, it was the epoch of belief, it
+was the epoch of incredulity, it was the season of Light, it was the
+season of Darkness, it was the spring of hope, it was the winter of
+despair, we had everything before us, we had nothing before us, we were
+all going direct to Heaven, we were all going direct the other way--in
+short, the period was so far like the present period, that some of its
+noisiest authorities insisted on its being received, for good or for
+evil, in the superlative degree of comparison only.
+"""
+text = text.strip()
+# The alignments will be organized according to how the text is tokenized
+tokenizer = load_tokenizer(language="english")  # sentence tokenizer
+span_list = list(tokenizer.span_tokenize(text))  # start, end character indices for each sentence
+speeches = [[SpeechSegment(speech_id=0, text=text, text_spans=span_list, start=None, end=None)]]
+# Load models and run pipeline
+model_vad = load_vad_model()
+model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h").to("cuda").half()
+processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
+pipeline(
+    vad_model=model_vad,
+    emissions_model=model,
+    processor=processor,
+    audio_paths=audio_files,
+    audio_dir=audio_dir,
+    speeches=speeches,
+    alignment_strategy="speech",
+    text_normalizer_fn=text_normalizer,
+    tokenizer=tokenizer,
+    start_wildcard=True,
+    end_wildcard=True,
+    blank_id=processor.tokenizer.pad_token_id,
+    word_boundary="|",
+)
+```
+> [!TIP]
+> `easyaligner` lets you organize the output at any level of granularity (sentence, paragraph, or other). The example above uses an `nltk.tokenize.punkt.PunktTokenizer` to split the text into sentences. See the [text processing documentation](https://kb-labb.github.io/easyaligner/get-started/text_processing.html) for a more detailed explanation and a tutorial on implementing custom tokenizers.
+## Documentation
+Check out the documentation tutorials that cover common scenarios for forced alignment, and the API reference:
+* [https://kb-labb.github.io/easyaligner/](https://kb-labb.github.io/easyaligner/)
+* [Tutorial 1](https://kb-labb.github.io/easyaligner/get-started/tutorial01.html): Align text and audio when the transcript covers all of the spoken content in the audio.
+* [Tutorial 2](https://kb-labb.github.io/easyaligner/get-started/tutorial02.html): Transcript covers only part of the spoken content in the audio, but we know the relevant audio region in advance.
+* [Tutorial 3](https://kb-labb.github.io/easyaligner/get-started/tutorial03.html): Transcript covers only part of the spoken content in the audio, and we don't know the relevant audio region in advance.
+## Outputs
+By default, `easyaligner` saves the outputs of each stage of the pipeline (VAD, emission extraction, forced alignment) as JSON files in separate directories. The final aligned output can be found in `output/alignments`. The directory structure after running the full pipeline will look as follows:
+```
+output
+├── alignments
+├── emissions
+└── vad
+```
+In addition to the JSON files, the `output/emissions` directory contains the emissions for each JSON file in `.npy` format.
+All intermediate files can safely be deleted, assuming there is no need to re-run the pipeline from a specific intermediate stage.
+## Citation
+If you use `easyaligner` in your research, consider citing the following blog post:
+```
+@online{rekathati2026,
+  author = {Rekathati, Faton},
+  title = {Easyaligner: {Forced} Alignment of Text and Audio, Made Easy},
+  date = {2026-04-08},
+  url = {https://kb-labb.github.io/posts/2026-04-08-easyaligner/},
+  langid = {en}
+}
+```
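
The first feature bullet above refers to CTC forced alignment: given per-frame log-probabilities (emissions) from a CTC model such as wav2vec2, a Viterbi search finds the single most likely label path that spells out the transcript. The following is a minimal pure-Python sketch of that dynamic program, for illustration only; it is not the torchaudio or `easyaligner` implementation, and the names `ctc_forced_align_sketch` and `token_spans` are hypothetical. It assumes `log_probs` is a T × V nested list, `targets` is a non-empty list of token ids that excludes the blank id, and a 20 ms frame stride (typical for wav2vec2 models on 16 kHz audio).

```python
import math


def ctc_forced_align_sketch(log_probs, targets, blank=0):
    """Return the most likely per-frame label path that spells out `targets`.

    log_probs: T x V nested list of per-frame log-probabilities.
    targets: non-empty list of token ids, none equal to `blank`.
    """
    num_frames = len(log_probs)
    # Interleave blanks: [blank, t1, blank, t2, ..., tN, blank]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    num_states = len(ext)

    score = [[-math.inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    # A path starts either on the leading blank or on the first token.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed moves: stay in the same state, advance by one, or skip
            # the blank between two *different* tokens.
            candidates = [(score[t - 1][s], s)]
            if s >= 1:
                candidates.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append((score[t - 1][s - 2], s - 2))
            best, prev = max(candidates)
            score[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev

    # A complete path ends on the last token or on the trailing blank.
    last = num_states - 1
    s = last - 1 if score[-1][last - 1] > score[-1][last] else last
    if score[-1][s] == -math.inf:
        raise ValueError("not enough frames to align this transcript")

    # Backtrack from the final state to recover one label per frame.
    path = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


def token_spans(path, blank=0, frame_seconds=0.02):
    """Merge runs of identical non-blank labels into (token, start_s, end_s)."""
    spans = []
    t = 0
    while t < len(path):
        start = t
        while t + 1 < len(path) and path[t + 1] == path[start]:
            t += 1
        if path[start] != blank:
            spans.append((path[start], start * frame_seconds, (t + 1) * frame_seconds))
        t += 1
    return spans


# Toy emissions over the vocabulary {0: blank, 1: "a", 2: "b"} for 5 frames.
probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
path = ctc_forced_align_sketch([[math.log(p) for p in row] for row in probs], targets=[1, 2])
print(path)  # [1, 1, 0, 2, 2]
```

The PyTorch tutorial linked in the README performs the same search with `torchaudio.functional.forced_align`, which can run on the GPU and avoids the Python loops above. The `word_boundary="|"` argument in the usage example suggests that word timings are then derived by grouping token spans between `|` tokens, the word delimiter in the `facebook/wav2vec2-base-960h` vocabulary.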

easyaligner-0.2.3/README.md ADDED

@@ -0,0 +1,146 @@
+# Easier forced alignment with `easyaligner`
+<div align="center"><img width="1020" height="340" alt="image" src="https://github.com/user-attachments/assets/a3589539-5c85-4ac1-a4a7-d5e801207faa" /></div>
+`easyaligner` is a fast and memory-efficient forced alignment pipeline for speech and text. Given a text transcript, `easyaligner` helps identify where each word or phrase was spoken in the audio. The library supports alignment with both ground-truth transcripts and ASR-generated transcripts (`easyaligner` is the backend that powers alignment in [`easytranscriber`](https://github.com/kb-labb/easytranscriber)). Some notable features of `easyaligner` include:
+* **GPU-accelerated forced alignment**. Uses [PyTorch's forced alignment API](https://docs.pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html) with a GPU-based implementation of the Viterbi algorithm, enabling fast and memory-efficient forced alignment of long audio segments ([Pratap et al., 2024](https://jmlr.org/papers/volume25/23-1318/23-1318.pdf#page=8)).
+* **Flexible text normalization for improved alignment quality**. Users can supply custom regex-based text normalization functions to preprocess transcripts before alignment. A mapping from the original text to the normalized text is maintained internally. All of the applied normalizations and transformations are consequently **non-destructive and reversible after alignment**.
+* **Batch processing support for emission extraction**. `easyaligner` supports batched inference for wav2vec2-based models, keeping track of non-padded logits when doing alignment.
+Check out the [documentation](https://kb-labb.github.io/easyaligner/) for more details and tutorials!
+## Installation
+### With GPU support (recommended)
+```bash
+pip install easyaligner --extra-index-url https://download.pytorch.org/whl/cu128
+```
+> [!TIP]
+> Remove `--extra-index-url` if you want a CPU-only installation.
+### Using uv
+When you install with [uv](https://docs.astral.sh/uv/), it selects the appropriate PyTorch build automatically (CPU for macOS, CUDA for Linux/Windows/ARM):
+```bash
+uv pip install easyaligner
+```
+## Usage
+The example below downloads a short snippet from a LibriVox audiobook recording of [A Tale of Two Cities](https://librivox.org/a-tale-of-two-cities-by-charles-dickens-2/). The snippet is 57 seconds long and corresponds to the first paragraph of the first chapter. The text to align is supplied directly below and assigned to the `text` variable.
+```python
+from pathlib import Path
+from transformers import (
+    AutoModelForCTC,
+    Wav2Vec2Processor,
+)
+from huggingface_hub import snapshot_download
+from easyaligner.text import load_tokenizer
+from easyaligner.data.datamodel import SpeechSegment
+from easyaligner.pipelines import pipeline
+from easyaligner.text import text_normalizer
+from easyaligner.vad.pyannote import load_vad_model
+filepath_pattern = "tale-of-two-cities_align-en/taleoftwocities_01_dickens_64kb_align.mp3"
+# Download mp3 from Hugging Face Hub
+snapshot_download(
+    "Lauler/easytranscriber_tutorials",
+    repo_type="dataset",
+    local_dir="data/tutorials",
+    allow_patterns=filepath_pattern,
+)
+# File(s) to align
+filepath = Path("data/tutorials") / filepath_pattern
+audio_dir = filepath.parent
+audio_files = [filepath.name]
+text = """
+It was the best of times, it was the worst of times, it was the age of
+wisdom, it was the age of foolishness, it was the epoch of belief, it
+was the epoch of incredulity, it was the season of Light, it was the
+season of Darkness, it was the spring of hope, it was the winter of
+despair, we had everything before us, we had nothing before us, we were
+all going direct to Heaven, we were all going direct the other way--in
+short, the period was so far like the present period, that some of its
+noisiest authorities insisted on its being received, for good or for
+evil, in the superlative degree of comparison only.
+"""
+text = text.strip()
+# The alignments will be organized according to how the text is tokenized
+tokenizer = load_tokenizer(language="english")  # sentence tokenizer
+span_list = list(tokenizer.span_tokenize(text))  # start, end character indices for each sentence
+speeches = [[SpeechSegment(speech_id=0, text=text, text_spans=span_list, start=None, end=None)]]
+# Load models and run pipeline
+model_vad = load_vad_model()
+model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h").to("cuda").half()
+processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
+pipeline(
+    vad_model=model_vad,
+    emissions_model=model,
+    processor=processor,
+    audio_paths=audio_files,
+    audio_dir=audio_dir,
+    speeches=speeches,
+    alignment_strategy="speech",
+    text_normalizer_fn=text_normalizer,
+    tokenizer=tokenizer,
+    start_wildcard=True,
+    end_wildcard=True,
+    blank_id=processor.tokenizer.pad_token_id,
+    word_boundary="|",
+)
+```
+> [!TIP]
+> `easyaligner` lets you organize the output at any level of granularity (sentence, paragraph, or other). The example above uses an `nltk.tokenize.punkt.PunktTokenizer` to split the text into sentences. See the [text processing documentation](https://kb-labb.github.io/easyaligner/get-started/text_processing.html) for a more detailed explanation and a tutorial on implementing custom tokenizers.
+## Documentation
+Check out the documentation tutorials that cover common scenarios for forced alignment, and the API reference:
+* [https://kb-labb.github.io/easyaligner/](https://kb-labb.github.io/easyaligner/)
+* [Tutorial 1](https://kb-labb.github.io/easyaligner/get-started/tutorial01.html): Align text and audio when the transcript covers all of the spoken content in the audio.
+* [Tutorial 2](https://kb-labb.github.io/easyaligner/get-started/tutorial02.html): Transcript covers only part of the spoken content in the audio, but we know the relevant audio region in advance.
+* [Tutorial 3](https://kb-labb.github.io/easyaligner/get-started/tutorial03.html): Transcript covers only part of the spoken content in the audio, and we don't know the relevant audio region in advance.
+## Outputs
+By default, `easyaligner` saves the outputs of each stage of the pipeline (VAD, emission extraction, forced alignment) as JSON files in separate directories. The final aligned output can be found in `output/alignments`. The directory structure after running the full pipeline will look as follows:
+```
+output
+├── alignments
+├── emissions
+└── vad
+```
+In addition to the JSON files, the `output/emissions` directory contains the emissions for each JSON file in `.npy` format.
+All intermediate files can safely be deleted, assuming there is no need to re-run the pipeline from a specific intermediate stage.
+## Citation
+If you use `easyaligner` in your research, consider citing the following blog post:
+```
+@online{rekathati2026,
+  author = {Rekathati, Faton},
+  title = {Easyaligner: {Forced} Alignment of Text and Audio, Made Easy},
+  date = {2026-04-08},
+  url = {https://kb-labb.github.io/posts/2026-04-08-easyaligner/},
+  langid = {en}
+}
+```
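
The README's normalization bullet says that a mapping from the original text to the normalized text is kept, which is what makes normalization reversible after alignment. The sketch below illustrates that idea for one simple regex-based normalizer (lowercase, drop punctuation, single-space the words) by recording, for every normalized character, the index of the original character that produced it. The names `normalize_with_offsets` and `to_original_span` are hypothetical; this is not the package's `SpanMapNormalizer` or `text_normalizer`.

```python
import re

# Words, keeping inner apostrophes such as "don't".
WORD = re.compile(r"\w+(?:'\w+)*")


def normalize_with_offsets(text):
    """Lowercase, drop punctuation, and join words with single spaces.

    Returns (normalized, offsets), where offsets[i] is the index in `text`
    of the character that produced normalized[i].
    """
    out_chars, offsets = [], []
    prev_end = None
    for match in WORD.finditer(text):
        if prev_end is not None:
            out_chars.append(" ")
            offsets.append(prev_end)  # the space stands in for the gap after the previous word
        for i in range(match.start(), match.end()):
            for ch in text[i].lower():  # lower() can expand one character into several
                out_chars.append(ch)
                offsets.append(i)
        prev_end = match.end()
    return "".join(out_chars), offsets


def to_original_span(offsets, start, end):
    """Map a [start, end) span of the normalized text back to the original text."""
    return offsets[start], offsets[end - 1] + 1


text = "It was the best of times, it was the worst of times."
normalized, offsets = normalize_with_offsets(text)
i = normalized.index("times it")
print(normalized)  # it was the best of times it was the worst of times
print(text[slice(*to_original_span(offsets, i, i + len("times it")))])  # times, it
```

Because alignment runs on `normalized`, any character span found there can be carried back to the untouched original through `offsets`, restoring casing and punctuation.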

{easyaligner-0.2.0 → easyaligner-0.2.3}/pyproject.toml RENAMED

@@ -3,7 +3,7 @@ requires = ["setuptools>=67.0.0"]
 build-backend = "setuptools.build_meta"
 [project]
-version = "0.2.0"
+version = "0.2.3"
 name = "easyaligner"
 requires-python = ">= 3.10"
 description = "Forced alignment pipeline designed for efficiency and ease of use."
@@ -19,7 +19,8 @@ dependencies = [
   "nltk>=3.8.2",
   "pyannote-audio>=3.3.1",
   "silero-vad~=6.0",
-  "msgspec"
+  "msgspec",
+  "rapidfuzz"
 ]
 [project.urls]

{easyaligner-0.2.0 → easyaligner-0.2.3}/src/easyaligner/alignment/pytorch.py RENAMED

@@ -844,7 +844,7 @@ def get_segment_alignment(
         if token_cursor >= len(mapping):
             break  # No more tokens to process
-        if start_idx < mapping[token_cursor]["start_char"]:
+        if token_cursor == 0 and start_idx < mapping[token_cursor]["start_char"]:
             logger.warning(
                 "Segment indices start before the first token index. This may be due to "
                 "leading whitespace in the original text. Consider stripping leading/trailing "
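
The only change in `get_segment_alignment` narrows when the leading-whitespace warning fires: it is now emitted only while `token_cursor` is still at the first token. The snippet below isolates the two conditions from the diff as standalone predicates so they can be compared side by side; the function names and the toy `mapping` are hypothetical, and only the `start_char` key and the comparisons come from the diff.

```python
def warns_0_2_0(start_idx, token_cursor, mapping):
    # Condition in 0.2.0: any segment starting before the current token warned.
    return start_idx < mapping[token_cursor]["start_char"]


def warns_0_2_3(start_idx, token_cursor, mapping):
    # Condition in 0.2.3: only a segment starting before the *first* token warns.
    return token_cursor == 0 and start_idx < mapping[token_cursor]["start_char"]


# Toy token mapping for the text "  Hello world": tokens start at chars 2 and 8.
mapping = [{"start_char": 2}, {"start_char": 8}]

# Leading whitespace before the first token: both versions warn.
print(warns_0_2_0(0, 0, mapping), warns_0_2_3(0, 0, mapping))  # True True
# A later segment starting on the space before "world": only 0.2.0 warns.
print(warns_0_2_0(7, 1, mapping), warns_0_2_3(7, 1, mapping))  # True False
```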

{easyaligner-0.2.0 → easyaligner-0.2.3}/src/easyaligner/data/datamodel.py RENAMED

@@ -218,3 +218,32 @@ class AudioMetadata(msgspec.Struct):
     def to_dict(self):
         return {f: getattr(self, f) for f in self.__struct_fields__}
+class FuzzyMatch(msgspec.Struct):
+    """
+    Result of a fuzzy text match.
+    A `FuzzyMatch` contains the word indices, timestamps, and confidence score
+    of the best match found between a needle (ground truth text) and a haystack
+    (concatenated word texts from ASR output).
+    Attributes
+    ----------
+    start_index : int
+        Index of the first matching word in the haystack word list.
+    end_index : int
+        Index of the last matching word in the haystack word list (inclusive).
+    score : float
+        Fuzzy match score on a 0-100 scale, as returned by rapidfuzz.
+    start : float or None, optional
+        Start time of the match in seconds.
+    end : float or None, optional
+        End time of the match in seconds.
+    """
+    start_index: int
+    end_index: int
+    score: float
+    start: float | None = None
+    end: float | None = None
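
`FuzzyMatch` records where a ground-truth needle best matches a haystack of ASR word texts. The sketch below shows one way such a result could be produced; it is a hypothetical stand-in, not `easyaligner.text.fuzzy_match`. It uses a `dataclass` with the same fields in place of `msgspec.Struct`, scores candidates with the standard library's `difflib.SequenceMatcher` (scaled to 0-100) instead of rapidfuzz, and assumes each ASR word is a dict with `text`, `start`, and `end` keys.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class FuzzyMatchSketch:
    start_index: int  # first matching word in the haystack word list
    end_index: int  # last matching word (inclusive)
    score: float  # 0-100, higher is better
    start: float | None = None  # seconds
    end: float | None = None  # seconds


def fuzzy_match_sketch(needle: str, words: list[dict]) -> FuzzyMatchSketch | None:
    """Slide a window of len(needle.split()) words over the ASR output; keep the best."""
    target = " ".join(needle.lower().split())
    n = len(target.split())
    if n == 0:
        return None
    best = None
    for i in range(len(words) - n + 1):
        window = " ".join(w["text"] for w in words[i : i + n]).lower()
        score = 100 * SequenceMatcher(None, target, window).ratio()
        if best is None or score > best.score:
            best = FuzzyMatchSketch(
                start_index=i,
                end_index=i + n - 1,
                score=score,
                start=words[i]["start"],
                end=words[i + n - 1]["end"],
            )
    return best


words = [
    {"text": "it", "start": 0.0, "end": 0.2},
    {"text": "was", "start": 0.2, "end": 0.4},
    {"text": "the", "start": 0.4, "end": 0.5},
    {"text": "best", "start": 0.5, "end": 0.8},
    {"text": "of", "start": 0.8, "end": 0.9},
    {"text": "times", "start": 0.9, "end": 1.3},
]
match = fuzzy_match_sketch("The best of", words)
```

A real matcher would likely also try windows longer or shorter than the needle, since ASR output can split or merge words; rapidfuzz's `fuzz.partial_ratio_alignment`, for example, locates the best-matching substring directly on a character haystack, after which character offsets have to be mapped back to word indices.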

{easyaligner-0.2.0 → easyaligner-0.2.3}/src/easyaligner/pipelines.py RENAMED

@@ -44,7 +44,6 @@ def vad_pipeline_generator(
     chunk_size: int = 30,
     sample_rate: int = 16000,
     metadata: list[dict] | None = None,
-    batch_size: int = 1,
     num_workers: int = 1,
     prefetch_factor: int = 2,
     save_json: bool = True,
@@ -73,8 +72,6 @@ def vad_pipeline_generator(
         The sample rate to resample the audio to before running VAD.
     metadata : list[dict] or None, optional
         Optional list of additional file level metadata to include.
-    batch_size : int, default 1
-        The batch size for the DataLoader.
     num_workers : int, default 1
         The number of workers for the DataLoader.
     prefetch_factor : int, default 2
@@ -99,7 +96,7 @@ def vad_pipeline_generator(
     )
     vad_dataloader = torch.utils.data.DataLoader(
         vad_dataset,
-        batch_size=batch_size,
+        batch_size=1,
         shuffle=False,
         collate_fn=vad_collate_fn,
         num_workers=num_workers,
@@ -163,7 +160,6 @@ def vad_pipeline(
     chunk_size: int = 30,
     sample_rate: int = 16000,
     metadata: list[dict] | None = None,
-    batch_size: int = 1,
     num_workers: int = 1,
     prefetch_factor: int = 2,
     save_json: bool = True,
@@ -192,8 +188,6 @@ def vad_pipeline(
         The sample rate to resample the audio to before running VAD.
     metadata : list[dict] or None, optional
         Optional list of additional file level metadata to include.
-    batch_size : int, default 1
-        The batch size for the DataLoader.
     num_workers : int, default 1
         The number of workers for the DataLoader.
     prefetch_factor : int, default 2
@@ -222,7 +216,6 @@ def vad_pipeline(
         chunk_size=chunk_size,
         sample_rate=sample_rate,
         metadata=metadata,
-        batch_size=batch_size,
         num_workers=num_workers,
         prefetch_factor=prefetch_factor,
         save_json=save_json,
@@ -249,7 +242,6 @@ def emissions_pipeline_generator(
     sample_rate: int = 16000,
     chunk_size: int = 30,
     alignment_strategy: str = "speech",
-    batch_size_files: int = 1,
     num_workers_files: int = 1,
     prefetch_factor_files: int = 2,
     batch_size_features: int = 8,
@@ -287,8 +279,6 @@ def emissions_pipeline_generator(
         Strategy for aligning features to text. One of 'speech' or 'chunk'.
         If `speech`, audio is split into `chunk_size` sized chunks based on SpeechSegments.
         If `chunk`, audio is taken from existing VAD chunks.
-    batch_size_files : int, default 1
-        Batch size for the file DataLoader.
     num_workers_files : int, default 1
         Number of workers for the file DataLoader.
     prefetch_factor_files : int, default 2
@@ -333,7 +323,7 @@ def emissions_pipeline_generator(
     file_dataloader = torch.utils.data.DataLoader(
         file_dataset,
-        batch_size=batch_size_files,
+        batch_size=1,
         shuffle=False,
         collate_fn=audiofile_collate_fn,
         num_workers=num_workers_files,
@@ -372,7 +362,7 @@ def emissions_pipeline_generator(
         speech_ids = []
         for batch in feature_dataloader:
-            features = batch["features"].half().to(device)
+            features = batch["features"].to(device=device, dtype=model.dtype)
             with torch.inference_mode():
                 logits = model(features).logits
@@ -420,7 +410,6 @@ def emissions_pipeline(
     sample_rate: int = 16000,
     chunk_size: int = 30,
     alignment_strategy: str = "speech",
-    batch_size_files: int = 1,
     num_workers_files: int = 1,
     prefetch_factor_files: int = 2,
     batch_size_features: int = 8,
@@ -455,8 +444,6 @@ def emissions_pipeline(
         Strategy for aligning features to text. One of 'speech' or 'chunk'.
         If `speech`, audio is split into `chunk_size` sized chunks based on SpeechSegments.
         If `chunk`, audio is taken from existing VAD chunks.
-    batch_size_files : int, default 1
-        Batch size for the file DataLoader.
     num_workers_files : int, default 1
         Number of workers for the file DataLoader.
     prefetch_factor_files : int, default 2
@@ -495,7 +482,6 @@ def emissions_pipeline(
         sample_rate=sample_rate,
         chunk_size=chunk_size,
         alignment_strategy=alignment_strategy,
-        batch_size_files=batch_size_files,
         num_workers_files=num_workers_files,
         prefetch_factor_files=prefetch_factor_files,
         batch_size_features=batch_size_features,
@@ -773,7 +759,6 @@ def pipeline(
     word_boundary: str = "|",
     indent: int = 2,
     ndigits: int = 5,
-    batch_size_files: int = 1,
     num_workers_files: int = 2,
     prefetch_factor_files: int = 1,
     batch_size_features: int = 8,
@@ -839,8 +824,6 @@ def pipeline(
         Indentation level for saved JSON files. `None` to disable pretty formatting.
     ndigits : int, default 5
         Number of decimal digits to round the alignment times and scores to.
-    batch_size_files : int, default 1
-        Batch size for the file DataLoader.
     num_workers_files : int, default 2
         Number of workers for the file DataLoader.
     prefetch_factor_files : int, default 1
@@ -887,7 +870,6 @@ def pipeline(
         speeches=speeches,
         chunk_size=chunk_size,
         sample_rate=sample_rate,
-        batch_size=batch_size_files,
         num_workers=num_workers_files,
         prefetch_factor=prefetch_factor_files,
         save_json=save_json,
@@ -909,7 +891,6 @@ def pipeline(
         sample_rate=sample_rate,
         chunk_size=chunk_size,
         alignment_strategy=alignment_strategy,
-        batch_size_files=batch_size_files,
         num_workers_files=num_workers_files,
         prefetch_factor_files=prefetch_factor_files,
         batch_size_features=batch_size_features,
@@ -929,7 +910,7 @@ def pipeline(
     )
     json_dataloader = torch.utils.data.DataLoader(
         json_dataset,
-        batch_size=batch_size_files,
+        batch_size=1,
         shuffle=False,
         collate_fn=metadata_collate_fn,
         num_workers=num_workers_files,
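
In 0.2.3 the file-level DataLoaders are hard-coded to `batch_size=1`, and the `batch_size_files` (in `pipeline` and `emissions_pipeline`) and `batch_size` (in `vad_pipeline`) keyword arguments are removed from the signatures shown. Assuming these functions do not also accept `**kwargs`, code written for 0.2.0 that still passes those arguments will raise `TypeError`. The hypothetical helper below uses `inspect.signature` to drop, with a warning, keyword arguments that a callable does not accept; `pipeline_stub` only mimics a few parameters of the new signature and is not the real `pipeline`.

```python
import inspect
import warnings


def call_with_supported_kwargs(fn, *args, **kwargs):
    """Call fn, dropping (and warning about) keyword arguments it does not accept."""
    params = inspect.signature(fn).parameters
    accepts_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    if not accepts_var_kw:
        dropped = sorted(set(kwargs) - set(params))
        if dropped:
            warnings.warn(f"ignoring unsupported arguments: {dropped}")
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **kwargs)


# Hypothetical stand-in with a subset of the 0.2.3 `pipeline` parameters.
def pipeline_stub(num_workers_files=2, prefetch_factor_files=1, batch_size_features=8):
    return {"num_workers_files": num_workers_files, "batch_size_features": batch_size_features}


result = call_with_supported_kwargs(pipeline_stub, batch_size_files=4, batch_size_features=16)
print(result)  # {'num_workers_files': 2, 'batch_size_features': 16}
```

Silently dropping arguments can hide configuration changes, which is why the helper warns; deleting `batch_size_files=` from the call site is the cleaner fix once 0.2.0 no longer needs to be supported.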

easyaligner-0.2.3/src/easyaligner/text/__init__.py ADDED

@@ -0,0 +1,28 @@
+from easyaligner.text.match import (
+    FuzzyMatch,
+    build_haystack,
+    flatten_words,
+    fuzzy_match,
+    resolve_char_to_word,
+)
+from easyaligner.text.normalization import (
+    SpanMapNormalizer,
+    add_deletions_to_mapping,
+    merge_multitoken_expressions,
+    text_normalizer,
+)
+from easyaligner.text.tokenizer import load_tokenizer, paragraph_tokenizer
+__all__ = [
+    "FuzzyMatch",
+    "SpanMapNormalizer",
+    "add_deletions_to_mapping",
+    "build_haystack",
+    "flatten_words",
+    "fuzzy_match",
+    "load_tokenizer",
+    "paragraph_tokenizer",
+    "merge_multitoken_expressions",
+    "resolve_char_to_word",
+    "text_normalizer",
+]
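
The README's usage example relies only on the tokenizer's `span_tokenize(text)` method, which yields `(start, end)` character spans that become `SpeechSegment.text_spans`. Assuming an object exposing that method is sufficient (whether `pipeline`'s `tokenizer` argument accepts it should be checked against the API reference), a custom paragraph-level tokenizer could look like the sketch below. `BlankLineSpanTokenizer` is hypothetical and is not the package's `paragraph_tokenizer`. It trims whitespace from each span, since the alignment code warns about segments that start before the first token.

```python
import re


class BlankLineSpanTokenizer:
    """Split text into paragraphs separated by one or more blank lines."""

    _separator = re.compile(r"\n[ \t]*\n\s*")

    def span_tokenize(self, text):
        """Yield (start, end) character offsets of each non-empty paragraph."""
        start = 0
        for match in self._separator.finditer(text):
            yield from self._trimmed(text, start, match.start())
            start = match.end()
        yield from self._trimmed(text, start, len(text))

    @staticmethod
    def _trimmed(text, start, end):
        # Shrink [start, end) so it excludes leading and trailing whitespace.
        chunk = text[start:end]
        stripped = chunk.strip()
        if stripped:
            offset = start + (len(chunk) - len(chunk.lstrip()))
            yield offset, offset + len(stripped)


text = "First paragraph,\nsecond line.\n\n  Second paragraph.\n"
span_list = list(BlankLineSpanTokenizer().span_tokenize(text))
print(span_list)  # [(0, 29), (33, 50)]
```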
