DAJIN2 0.4.2__zip → 0.4.3__zip

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (68) hide show
  1. {DAJIN2-0.4.2/src/DAJIN2.egg-info → DAJIN2-0.4.3}/PKG-INFO +29 -19
  2. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/README.md +19 -9
  3. DAJIN2-0.4.3/requirements.txt +20 -0
  4. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/setup.py +1 -1
  5. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/label_merger.py +20 -16
  6. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/core.py +8 -8
  7. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/genome_fetcher.py +11 -3
  8. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/midsv_caller.py +3 -4
  9. DAJIN2-0.4.3/src/DAJIN2/core/report/__init__.py +3 -0
  10. DAJIN2-0.4.2/src/DAJIN2/core/report/report_bam.py → DAJIN2-0.4.3/src/DAJIN2/core/report/bam_exporter.py +64 -50
  11. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/main.py +1 -1
  12. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/io.py +6 -0
  13. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/sam_handler.py +0 -13
  14. {DAJIN2-0.4.2 → DAJIN2-0.4.3/src/DAJIN2.egg-info}/PKG-INFO +29 -19
  15. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2.egg-info/SOURCES.txt +3 -3
  16. DAJIN2-0.4.3/src/DAJIN2.egg-info/requires.txt +16 -0
  17. DAJIN2-0.4.2/requirements.txt +0 -20
  18. DAJIN2-0.4.2/src/DAJIN2/core/report/__init__.py +0 -3
  19. DAJIN2-0.4.2/src/DAJIN2.egg-info/requires.txt +0 -16
  20. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/LICENSE +0 -0
  21. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/MANIFEST.in +0 -0
  22. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/setup.cfg +0 -0
  23. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/__init__.py +0 -0
  24. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/__init__.py +0 -0
  25. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/classification/__init__.py +0 -0
  26. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/classification/allele_merger.py +0 -0
  27. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/classification/classifier.py +0 -0
  28. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/__init__.py +0 -0
  29. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/appender.py +0 -0
  30. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/clustering.py +0 -0
  31. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/kmer_generator.py +0 -0
  32. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/label_extractor.py +0 -0
  33. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/label_updator.py +0 -0
  34. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/score_handler.py +0 -0
  35. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/clustering/strand_bias_handler.py +0 -0
  36. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/consensus/__init__.py +0 -0
  37. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/consensus/clust_formatter.py +0 -0
  38. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/consensus/consensus.py +0 -0
  39. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/consensus/mutation_extractor.py +0 -0
  40. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/consensus/name_handler.py +0 -0
  41. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/consensus/similarity_searcher.py +0 -0
  42. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/__init__.py +0 -0
  43. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/cache_checker.py +0 -0
  44. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/directory_manager.py +0 -0
  45. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/homopolymer_handler.py +0 -0
  46. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/input_formatter.py +0 -0
  47. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/insertions_to_fasta.py +0 -0
  48. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/knockin_handler.py +0 -0
  49. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/mapping.py +0 -0
  50. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/preprocess/mutation_extractor.py +0 -0
  51. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/core/report/insertion_reflector.py +0 -0
  52. /DAJIN2-0.4.2/src/DAJIN2/core/report/report_mutation.py → /DAJIN2-0.4.3/src/DAJIN2/core/report/mutation_exporter.py +0 -0
  53. /DAJIN2-0.4.2/src/DAJIN2/core/report/report_files.py → /DAJIN2-0.4.3/src/DAJIN2/core/report/sequence_exporter.py +0 -0
  54. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/gui.py +0 -0
  55. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/static/css/style.css +0 -0
  56. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/template_igvjs.html +0 -0
  57. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/templates/index.html +0 -0
  58. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/config.py +0 -0
  59. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/cssplits_handler.py +0 -0
  60. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/dna_handler.py +0 -0
  61. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/fastx_handler.py +0 -0
  62. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/input_validator.py +0 -0
  63. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/multiprocess.py +0 -0
  64. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/utils/report_generator.py +0 -0
  65. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2/view.py +0 -0
  66. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2.egg-info/dependency_links.txt +0 -0
  67. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2.egg-info/entry_points.txt +0 -0
  68. {DAJIN2-0.4.2 → DAJIN2-0.4.3}/src/DAJIN2.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: DAJIN2
3
- Version: 0.4.2
3
+ Version: 0.4.3
4
4
  Summary: One-step genotyping tools for targeted long-read sequencing
5
5
  Home-page: https://github.com/akikuno/DAJIN2
6
6
  Author: Akihiro Kuno
@@ -14,22 +14,22 @@ Classifier: Intended Audience :: Science/Research
14
14
  Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
15
15
  Description-Content-Type: text/markdown
16
16
  License-File: LICENSE
17
- Requires-Dist: numpy>=1.20.0
18
- Requires-Dist: scipy>=1.6.0
17
+ Requires-Dist: numpy>=1.24.0
18
+ Requires-Dist: scipy>=1.10.0
19
19
  Requires-Dist: pandas>=1.0.0
20
- Requires-Dist: openpyxl>=3.0.0
21
- Requires-Dist: rapidfuzz>=3.0.0
22
- Requires-Dist: scikit-learn>=1.0.0
20
+ Requires-Dist: openpyxl>=3.1.0
21
+ Requires-Dist: rapidfuzz>=3.6.0
22
+ Requires-Dist: scikit-learn>=1.3.0
23
23
  Requires-Dist: mappy>=2.24
24
- Requires-Dist: pysam>=0.19.0
24
+ Requires-Dist: pysam>=0.21.0
25
25
  Requires-Dist: Flask>=2.2.0
26
26
  Requires-Dist: waitress>=2.1.0
27
27
  Requires-Dist: Jinja2>=3.1.0
28
- Requires-Dist: plotly>=5.0.0
28
+ Requires-Dist: plotly>=5.19.0
29
29
  Requires-Dist: kaleido>=0.2.0
30
30
  Requires-Dist: cstag>=1.0.0
31
- Requires-Dist: midsv>=0.10.1
32
- Requires-Dist: wslPath>=0.3.0
31
+ Requires-Dist: midsv>=0.11.0
32
+ Requires-Dist: wslPath>=0.4.1
33
33
 
34
34
  [![License](https://img.shields.io/badge/License-MIT-9cf.svg)](https://choosealicense.com/licenses/mit/)
35
35
  [![Test](https://img.shields.io/github/actions/workflow/status/akikuno/dajin2/pytest.yml?branch=main&label=Test&color=brightgreen)](https://github.com/akikuno/dajin2/actions)
@@ -78,6 +78,7 @@ conda activate env-dajin2
78
78
  > CONDA_SUBDIR=osx-64 conda create -n env-dajin2 -c conda-forge -c bioconda python=3.10 DAJIN2 -y
79
79
  > conda activate env-dajin2
80
80
  > conda config --env --set subdir osx-64
81
+ > python -c "import platform; print(platform.machine())" # Confirm that the output is 'x86_64', not 'arm64'
81
82
  > ```
82
83
 
83
84
  ### From [PyPI](https://pypi.org/project/DAJIN2/)
@@ -164,12 +165,17 @@ Options:
164
165
  #### Example
165
166
 
166
167
  ```bash
168
+ # Download example dataset
169
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_single.tar.gz
170
+ tar -xf example_single.tar.gz
171
+
172
+ # Run DAJIN2
167
173
  DAJIN2 \
168
- --control example/barcode01 \
169
- --sample example/barcode02 \
170
- --allele example/design.fa \
171
- --name IL6-knockin \
172
- --genome hg38 \
174
+ --control example_single/control \
175
+ --sample example_single/sample \
176
+ --allele example_single/stx2_deletion.fa \
177
+ --name stx2_deletion \
178
+ --genome mm39 \
173
179
  --threads 4
174
180
  ```
175
181
 
@@ -206,7 +212,6 @@ DAJIN2 \
206
212
 
207
213
  By using the `batch` subcommand, you can process multiple FASTQ files simultaneously.
208
214
  For this purpose, a CSV or Excel file consolidating the sample information is required.
209
- <!-- For a specific example, please refer to [this link](https://github.com/akikuno/DAJIN2/blob/main/examples/example-batch/batch.csv). -->
210
215
 
211
216
  > [!NOTE]
212
217
  > For guidance on how to compile sample information, please refer to [this document](https://docs.google.com/presentation/d/e/2PACX-1vSMEmXJPG2TNjfT66XZJRzqJd82aAqO5gJrdEzyhn15YBBr_Li-j5puOgVChYf3jA/embed?start=false&loop=false&delayms=3000).
@@ -224,13 +229,18 @@ options:
224
229
  #### Example
225
230
 
226
231
  ```bash
227
- DAJIN2 --file batch.csv --threads 4
232
+ # Download the example dataset
233
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_batch.tar.gz
234
+ tar -xf example_batch.tar.gz
235
+
236
+ # Run DAJIN2
237
+ DAJIN2 batch --file example_batch/batch.csv --threads 4
228
238
  ```
229
239
 
230
240
  <!-- ```bash
231
241
  # Download the example dataset
232
- wget https://github.com/akikuno/DAJIN2/raw/main/examples/example-batch.tar.gz
233
- tar -xf example-batch.tar.gz
242
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_batch.tar.gz
243
+ tar -xf example_batch.tar.gz
234
244
 
235
245
  # Run DAJIN2
236
246
  DAJIN2 batch --file example-batch/batch.csv --threads 3
@@ -45,6 +45,7 @@ conda activate env-dajin2
45
45
  > CONDA_SUBDIR=osx-64 conda create -n env-dajin2 -c conda-forge -c bioconda python=3.10 DAJIN2 -y
46
46
  > conda activate env-dajin2
47
47
  > conda config --env --set subdir osx-64
48
+ > python -c "import platform; print(platform.machine())" # Confirm that the output is 'x86_64', not 'arm64'
48
49
  > ```
49
50
 
50
51
  ### From [PyPI](https://pypi.org/project/DAJIN2/)
@@ -131,12 +132,17 @@ Options:
131
132
  #### Example
132
133
 
133
134
  ```bash
135
+ # Download example dataset
136
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_single.tar.gz
137
+ tar -xf example_single.tar.gz
138
+
139
+ # Run DAJIN2
134
140
  DAJIN2 \
135
- --control example/barcode01 \
136
- --sample example/barcode02 \
137
- --allele example/design.fa \
138
- --name IL6-knockin \
139
- --genome hg38 \
141
+ --control example_single/control \
142
+ --sample example_single/sample \
143
+ --allele example_single/stx2_deletion.fa \
144
+ --name stx2_deletion \
145
+ --genome mm39 \
140
146
  --threads 4
141
147
  ```
142
148
 
@@ -173,7 +179,6 @@ DAJIN2 \
173
179
 
174
180
  By using the `batch` subcommand, you can process multiple FASTQ files simultaneously.
175
181
  For this purpose, a CSV or Excel file consolidating the sample information is required.
176
- <!-- For a specific example, please refer to [this link](https://github.com/akikuno/DAJIN2/blob/main/examples/example-batch/batch.csv). -->
177
182
 
178
183
  > [!NOTE]
179
184
  > For guidance on how to compile sample information, please refer to [this document](https://docs.google.com/presentation/d/e/2PACX-1vSMEmXJPG2TNjfT66XZJRzqJd82aAqO5gJrdEzyhn15YBBr_Li-j5puOgVChYf3jA/embed?start=false&loop=false&delayms=3000).
@@ -191,13 +196,18 @@ options:
191
196
  #### Example
192
197
 
193
198
  ```bash
194
- DAJIN2 --file batch.csv --threads 4
199
+ # Download the example dataset
200
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_batch.tar.gz
201
+ tar -xf example_batch.tar.gz
202
+
203
+ # Run DAJIN2
204
+ DAJIN2 batch --file example_batch/batch.csv --threads 4
195
205
  ```
196
206
 
197
207
  <!-- ```bash
198
208
  # Download the example dataset
199
- wget https://github.com/akikuno/DAJIN2/raw/main/examples/example-batch.tar.gz
200
- tar -xf example-batch.tar.gz
209
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_batch.tar.gz
210
+ tar -xf example_batch.tar.gz
201
211
 
202
212
  # Run DAJIN2
203
213
  DAJIN2 batch --file example-batch/batch.csv --threads 3
@@ -0,0 +1,20 @@
1
+ numpy >= 1.24.0
2
+ scipy >= 1.10.0
3
+ pandas >= 1.0.0
4
+ openpyxl >= 3.1.0
5
+ rapidfuzz >=3.6.0
6
+ scikit-learn >= 1.3.0
7
+
8
+ mappy >= 2.24
9
+ pysam >= 0.21.0
10
+
11
+ Flask >= 2.2.0
12
+ waitress >= 2.1.0
13
+ Jinja2 >= 3.1.0
14
+
15
+ plotly >= 5.19.0
16
+ kaleido >= 0.2.0
17
+
18
+ cstag >= 1.0.0
19
+ midsv >= 0.11.0
20
+ wslPath >=0.4.1
@@ -9,7 +9,7 @@ with open("requirements.txt") as requirements_file:
9
9
 
10
10
  setuptools.setup(
11
11
  name="DAJIN2",
12
- version="0.4.2",
12
+ version="0.4.3",
13
13
  author="Akihiro Kuno",
14
14
  author_email="akuno@md.tsukuba.ac.jp",
15
15
  description="One-step genotyping tools for targeted long-read sequencing",
@@ -11,20 +11,6 @@ def calculate_label_percentages(labels: list[int]) -> dict[int, float]:
11
11
  return {label: (count / total_labels * 100) for label, count in label_counts.items()}
12
12
 
13
13
 
14
- def merge_mixed_cluster(labels_control: list[int], labels_sample: list[int], threshold: float = 0.5) -> list[int]:
15
- """Merge labels in sample if they appear more than 'threshold' percentage in control."""
16
- labels_merged = labels_sample.copy()
17
- label_percentages_control = calculate_label_percentages(labels_control)
18
- mixed_labels = {label for label, percent in label_percentages_control.items() if percent > threshold}
19
-
20
- new_label = max(labels_merged) + 1
21
- for i, label in enumerate(labels_sample):
22
- if label in mixed_labels:
23
- labels_merged[i] = new_label
24
-
25
- return labels_merged
26
-
27
-
28
14
  def map_clusters_to_previous(labels_sample: list[int], labels_previous: list[int]) -> dict[int, int]:
29
15
  """
30
16
  Determine which cluster in labels_previous corresponds to each cluster in labels_sample.
@@ -63,6 +49,8 @@ def merge_minor_cluster(
63
49
  minor_labels_percentage = {label for label, percent in label_percentages.items() if percent < threshold_percentage}
64
50
  minor_labels_readnumber = {label for label, num in Counter(labels_sample).items() if num <= threshold_readnumber}
65
51
  minor_labels = minor_labels_percentage | minor_labels_readnumber
52
+ if minor_labels == set():
53
+ return labels_sample
66
54
 
67
55
  correspondence = map_clusters_to_previous(labels_sample, labels_previous)
68
56
  update_required_labels = get_update_required_labels(correspondence)
@@ -70,7 +58,23 @@ def merge_minor_cluster(
70
58
  labels_merged = labels_sample.copy()
71
59
  for m in minor_labels:
72
60
  new_label = max(labels_merged) + 1
73
- labels_merged = [new_label if label in update_required_labels[correspondence[m]] else label for label in labels_merged]
61
+ labels_merged = [
62
+ new_label if label in update_required_labels[correspondence[m]] else label for label in labels_merged
63
+ ]
64
+
65
+ return labels_merged
66
+
67
+
68
+ def merge_mixed_cluster(labels_control: list[int], labels_sample: list[int], threshold: float = 0.5) -> list[int]:
69
+ """Merge labels in sample if they appear more than 'threshold' percentage in control."""
70
+ labels_merged = labels_sample.copy()
71
+ label_percentages_control = calculate_label_percentages(labels_control)
72
+ mixed_labels = {label for label, percent in label_percentages_control.items() if percent > threshold}
73
+
74
+ new_label = max(labels_merged) + 1
75
+ for i, label in enumerate(labels_sample):
76
+ if label in mixed_labels:
77
+ labels_merged[i] = new_label
74
78
 
75
79
  return labels_merged
76
80
 
@@ -82,7 +86,7 @@ def merge_minor_cluster(
82
86
 
83
87
  def merge_labels(labels_control: list[int], labels_sample: list[int], labels_previous: list[int]) -> list[int]:
84
88
  labels_merged = merge_minor_cluster(
85
- labels_sample, labels_previous, threshold_percentage=0.5, threshold_readnumber=10
89
+ labels_sample, labels_previous, threshold_percentage=0.5, threshold_readnumber=5
86
90
  )
87
91
  labels_merged = merge_mixed_cluster(labels_control, labels_merged)
88
92
  return labels_merged
@@ -70,8 +70,8 @@ def execute_control(arguments: dict):
70
70
  # Output BAM files
71
71
  ###########################################################
72
72
  logger.info(f"Output BAM files of {arguments['control']}...")
73
- report.report_bam.export_to_bam(
74
- ARGS.tempdir, ARGS.control_name, ARGS.genome_coordinates, ARGS.threads, is_control=True
73
+ report.bam_exporter.export_to_bam(
74
+ ARGS.tempdir, ARGS.control_name, ARGS.genome_coordinates, ARGS.threads, ARGS.uuid, is_control=True
75
75
  )
76
76
  ###########################################################
77
77
  # Finish call
@@ -204,15 +204,15 @@ def execute_sample(arguments: dict):
204
204
  # RESULT
205
205
  io.write_jsonl(RESULT_SAMPLE, Path(ARGS.tempdir, "result", f"{ARGS.sample_name}.jsonl"))
206
206
  # FASTA
207
- report.report_files.export_to_fasta(ARGS.tempdir, ARGS.sample_name, cons_sequence)
208
- report.report_files.export_reference_to_fasta(ARGS.tempdir, ARGS.sample_name)
207
+ report.sequence_exporter.export_to_fasta(ARGS.tempdir, ARGS.sample_name, cons_sequence)
208
+ report.sequence_exporter.export_reference_to_fasta(ARGS.tempdir, ARGS.sample_name)
209
209
  # HTML
210
- report.report_files.export_to_html(ARGS.tempdir, ARGS.sample_name, cons_percentage)
210
+ report.sequence_exporter.export_to_html(ARGS.tempdir, ARGS.sample_name, cons_percentage)
211
211
  # CSV (Allele Info)
212
- report.report_mutation.export_to_csv(ARGS.tempdir, ARGS.sample_name, ARGS.genome_coordinates, cons_percentage)
212
+ report.mutation_exporter.export_to_csv(ARGS.tempdir, ARGS.sample_name, ARGS.genome_coordinates, cons_percentage)
213
213
  # BAM
214
- report.report_bam.export_to_bam(
215
- ARGS.tempdir, ARGS.sample_name, ARGS.genome_coordinates, ARGS.threads, RESULT_SAMPLE
214
+ report.bam_exporter.export_to_bam(
215
+ ARGS.tempdir, ARGS.sample_name, ARGS.genome_coordinates, ARGS.threads, ARGS.uuid, RESULT_SAMPLE
216
216
  )
217
217
  for path_bam_igvjs in Path(ARGS.tempdir, "cache", ".igvjs").glob(f"{ARGS.control_name}_control.bam*"):
218
218
  shutil.copy(path_bam_igvjs, Path(ARGS.tempdir, "report", ".igvjs", ARGS.sample_name))
@@ -5,11 +5,19 @@ from urllib.request import urlopen
5
5
 
6
6
  def fetch_seq_coordinates(genome: str, blat_url: str, seq: str) -> dict:
7
7
  url = f"{blat_url}?db={genome}&type=BLAT&userSeq={seq}"
8
- response = urlopen(url).read().decode("utf8").split("\n")
9
- matches = [x for x in response if "100.0%" in x]
8
+ records = urlopen(url).read().decode("utf8").split("\n")
9
+ matches = []
10
+ for record in records:
11
+ if "100.0%" not in record:
12
+ continue
13
+ record_trim = [r for r in record.split(" ") if r]
14
+ if record_trim[-1] == str(len(seq)):
15
+ matches = record_trim
16
+
10
17
  if not matches:
11
18
  raise ValueError(f"{seq[:60]}... is not found in {genome}")
12
- chrom, strand, start, end, _ = matches[0].split()[-5:]
19
+
20
+ chrom, strand, start, end, _ = matches[-5:]
13
21
  return {"chrom": chrom, "strand": strand, "start": int(start), "end": int(end)}
14
22
 
15
23
 
@@ -8,8 +8,7 @@ from itertools import chain, groupby
8
8
 
9
9
  from collections import Counter
10
10
 
11
- from DAJIN2.utils import sam_handler
12
- from DAJIN2.utils import cssplits_handler
11
+ from DAJIN2.utils import io, sam_handler, cssplits_handler
13
12
 
14
13
 
15
14
  def has_inversion_in_splice(CIGAR: str) -> bool:
@@ -215,8 +214,8 @@ def generate_midsv(ARGS, is_control: bool = False, is_insertion: bool = False) -
215
214
  path_splice = Path(ARGS.tempdir, name, "sam", f"splice_{allele}.sam")
216
215
  path_output_midsv = Path(ARGS.tempdir, name, "midsv", f"{allele}.json")
217
216
 
218
- sam_ont = sam_handler.remove_overlapped_reads(list(sam_handler.read_sam(path_ont)))
219
- sam_splice = sam_handler.remove_overlapped_reads(list(sam_handler.read_sam(path_splice)))
217
+ sam_ont = sam_handler.remove_overlapped_reads(list(io.read_sam(path_ont)))
218
+ sam_splice = sam_handler.remove_overlapped_reads(list(io.read_sam(path_splice)))
220
219
  qname_of_map_ont = extract_qname_of_map_ont(sam_ont, sam_splice)
221
220
  sam_of_map_ont = filter_sam_by_preset(sam_ont, qname_of_map_ont, preset="map-ont")
222
221
  sam_of_splice = filter_sam_by_preset(sam_splice, qname_of_map_ont, preset="splice")
@@ -0,0 +1,3 @@
1
+ from DAJIN2.core.report import bam_exporter
2
+ from DAJIN2.core.report import sequence_exporter
3
+ from DAJIN2.core.report import mutation_exporter
@@ -1,17 +1,16 @@
1
1
  from __future__ import annotations
2
2
 
3
- import random
4
3
  from collections import defaultdict
5
4
  from itertools import groupby
6
5
  from pathlib import Path
7
6
 
8
- import midsv
9
7
  import pysam
10
8
 
11
- from DAJIN2.utils import sam_handler
9
+ from DAJIN2.utils import io, sam_handler
12
10
 
13
11
 
14
- def realign(sam: list[list[str]], GENOME_COODINATES: dict) -> list[str]:
12
+ def recalculate_sam_coodinates_to_reference(sam: list[list[str]], GENOME_COODINATES: dict) -> list[str]:
13
+ """Recalculate SAM genomic coordinates with the reference genome, not with the FASTA_ALLELE"""
15
14
  sam_headers = [s for s in sam if s[0].startswith("@")]
16
15
  sam_contents = [s for s in sam if not s[0].startswith("@")]
17
16
  for s in sam_headers:
@@ -29,31 +28,44 @@ def realign(sam: list[list[str]], GENOME_COODINATES: dict) -> list[str]:
29
28
  return sam_headers + sam_contents
30
29
 
31
30
 
31
+ def convert_pos_to_one_indexed(sam_lines: list[list[str]]) -> list[list[str]]:
32
+ """Convert SAM POS from 0-indexed to 1-indexed"""
33
+
34
+ def convert_line(line: list[str]) -> list[str]:
35
+ if not line[0].startswith("@") and line[3] == "0":
36
+ line[3] = "1"
37
+ return line
38
+
39
+ return [convert_line(line) for line in sam_lines]
40
+
41
+
32
42
  def group_by_name(sam_contents: list[str], clust_sample: list[dict]) -> dict[list]:
43
+ """Group alignments in map-ont.sam by allele name (NAME)"""
33
44
  sam_contents.sort()
34
- clust_sample_qname = sorted(clust_sample, key=lambda x: x["QNAME"])
35
- clust_sample_qname_set = set()
36
- for qnames in clust_sample_qname:
37
- qname = qnames["QNAME"]
38
- clust_sample_qname_set.add(qname)
45
+ clust_sample_sorted = sorted(clust_sample, key=lambda x: x["QNAME"])
46
+
47
+ qnames: set[str] = {c["QNAME"] for c in clust_sample_sorted}
48
+
39
49
  sam_groups = defaultdict(list)
40
- idx_left = 0
41
- idx_right = 0
42
- while idx_left < len(sam_contents) and idx_right < len(clust_sample_qname):
43
- read_left = sam_contents[idx_left][:-1]
44
- read_right = clust_sample_qname[idx_right]
45
- qname_left = read_left[0]
46
- qname_right = read_right["QNAME"]
47
- if qname_left not in clust_sample_qname_set:
48
- idx_left += 1
50
+ idx_sam_contents = 0
51
+ idx_clust_sample = 0
52
+ while idx_sam_contents < len(sam_contents) and idx_clust_sample < len(clust_sample_sorted):
53
+ alignments_sam = sam_contents[idx_sam_contents][:-1] # Discard CS tags to reduce file size
54
+ alignments_clsut_sample = clust_sample_sorted[idx_clust_sample]
55
+ qname_sam = alignments_sam[0]
56
+
57
+ if qname_sam not in qnames:
58
+ idx_sam_contents += 1
49
59
  continue
50
- if qname_left == qname_right:
51
- key = read_right["NAME"]
52
- sam_groups[key].append(read_left)
53
- idx_left += 1
60
+
61
+ if qname_sam == alignments_clsut_sample["QNAME"]:
62
+ key = alignments_clsut_sample["NAME"]
63
+ sam_groups[key].append(alignments_sam)
64
+ idx_sam_contents += 1
54
65
  else:
55
- idx_right += 1
56
- return sam_groups
66
+ idx_clust_sample += 1
67
+
68
+ return dict(sam_groups)
57
69
 
58
70
 
59
71
  ###############################################################################
@@ -67,13 +79,11 @@ def subset_qnames(RESULT_SAMPLE, readnum: int = 100) -> dict[set[str]]:
67
79
  group = list(group)
68
80
  qnames = [res["QNAME"] for res in group[:readnum]]
69
81
  qnames_by_name[name] = set(qnames)
70
- return qnames_by_name
82
+ return dict(qnames_by_name)
71
83
 
72
84
 
73
- def subset_reads(name, sam_content, qnames_by_name):
74
- qnames = qnames_by_name[name]
75
- sam_subset = [sam for sam in sam_content if sam[0] in qnames]
76
- return sam_subset
85
+ def subset_reads(sam_content: list[str], qnames: set[str]) -> list[str]:
86
+ return [sam for sam in sam_content if sam[0] in qnames]
77
87
 
78
88
 
79
89
  ###############################################################################
@@ -89,31 +99,34 @@ def write_sam_to_bam(sam: list[list[str]], path_sam: str | Path, path_bam: str |
89
99
 
90
100
 
91
101
  def update_sam(sam: list, GENOME_COODINATES: dict = {}) -> list:
92
- sam_update = sam.copy()
93
- sam_update = sam_handler.remove_overlapped_reads(sam_update)
94
- sam_update = sam_handler.remove_microhomology(sam_update)
95
- if "genome" in GENOME_COODINATES:
96
- sam_update = realign(sam_update, GENOME_COODINATES)
97
- return sam_update
102
+ sam_records = sam.copy()
103
+ sam_records = sam_handler.remove_overlapped_reads(sam_records)
104
+ sam_records = sam_handler.remove_microhomology(sam_records)
105
+ if GENOME_COODINATES["genome"]:
106
+ return recalculate_sam_coodinates_to_reference(sam_records, GENOME_COODINATES)
107
+ else:
108
+ return convert_pos_to_one_indexed(sam_records)
98
109
 
99
110
 
100
- def export_to_bam(TEMPDIR, NAME, GENOME_COODINATES, THREADS, RESULT_SAMPLE=None, is_control=False) -> None:
101
- randomnum = random.randint(100_000, 999_999)
111
+ def export_to_bam(TEMPDIR, NAME, GENOME_COODINATES, THREADS, UUID, RESULT_SAMPLE=None, is_control=False) -> None:
102
112
  path_sam_input = Path(TEMPDIR, NAME, "sam", "map-ont_control.sam")
103
- sam = list(midsv.read_sam(path_sam_input))
113
+ sam_records = list(io.read_sam(path_sam_input))
114
+
104
115
  # Update sam
105
- sam_update = update_sam(sam, GENOME_COODINATES)
116
+ sam_updated = update_sam(sam_records, GENOME_COODINATES)
117
+
106
118
  # Output SAM and BAM
107
- path_sam_output = Path(TEMPDIR, "report", "BAM", f"tmp{randomnum}_{NAME}_control.sam")
119
+ path_sam_output = Path(TEMPDIR, "report", "BAM", f"temp_{UUID}_{NAME}_control.sam")
108
120
  path_bam_output = Path(TEMPDIR, "report", "BAM", NAME, f"{NAME}.bam")
109
- write_sam_to_bam(sam_update, path_sam_output, path_bam_output, THREADS)
121
+ write_sam_to_bam(sam_updated, path_sam_output, path_bam_output, THREADS)
122
+
110
123
  # Prepare SAM headers and contents
111
- sam_headers = [s for s in sam_update if s[0].startswith("@")]
112
- sam_contents = [s for s in sam_update if not s[0].startswith("@")]
124
+ sam_headers = [s for s in sam_updated if s[0].startswith("@")]
125
+ sam_contents = [s for s in sam_updated if not s[0].startswith("@")]
113
126
  if is_control:
114
- qnames = set(list(set(s[0] for s in sam_contents[:10000]))[:100])
115
- sam_subset = [s for s in sam_update if s[0] in qnames]
116
- path_sam_output = Path(TEMPDIR, "report", "BAM", f"tmp{randomnum}_{NAME}_control_cache.sam")
127
+ qnames: set[str] = set(list(set(s[0] for s in sam_contents[:10000]))[:100])
128
+ sam_subset = [s for s in sam_updated if s[0] in qnames]
129
+ path_sam_output = Path(TEMPDIR, "report", "BAM", f"temp_{UUID}_{NAME}_control_cache.sam")
117
130
  path_bam_output = Path(TEMPDIR, "cache", ".igvjs", NAME, "control.bam")
118
131
  write_sam_to_bam(sam_headers + sam_subset, path_sam_output, path_bam_output, THREADS)
119
132
  else:
@@ -122,14 +135,15 @@ def export_to_bam(TEMPDIR, NAME, GENOME_COODINATES, THREADS, RESULT_SAMPLE=None,
122
135
  # Output SAM and BAM
123
136
  for name, sam_content in sam_groups.items():
124
137
  # BAM
125
- path_sam_output = Path(TEMPDIR, "report", "bam", f"tmp{randomnum}_{name}.sam")
138
+ path_sam_output = Path(TEMPDIR, "report", "BAM", f"temp_{UUID}_{name}.sam")
126
139
  path_bam_output = Path(TEMPDIR, "report", "BAM", NAME, f"{NAME}_{name}.bam")
127
140
  write_sam_to_bam(sam_headers + sam_content, path_sam_output, path_bam_output, THREADS)
128
141
  # igvjs
129
- sam_subset = subset_reads(name, sam_content, qnames_by_name)
130
- path_sam_output = Path(TEMPDIR, "report", "bam", f"tmp{randomnum}_{name}_subset.sam")
142
+ sam_subset = subset_reads(sam_content, qnames_by_name[name])
143
+ path_sam_output = Path(TEMPDIR, "report", "BAM", f"temp_{UUID}_{name}_subset.sam")
131
144
  path_bam_output = Path(TEMPDIR, "report", ".igvjs", NAME, f"{name}.bam")
132
145
  write_sam_to_bam(sam_headers + sam_subset, path_sam_output, path_bam_output, THREADS)
146
+
133
147
  # Remove temporary files
134
- sam_temp = Path(TEMPDIR, "report", "BAM").glob(f"tmp{randomnum}*.sam")
148
+ sam_temp = Path(TEMPDIR, "report", "BAM").glob(f"temp_{UUID}*.sam")
135
149
  [s.unlink() for s in sam_temp]
@@ -20,7 +20,7 @@ from DAJIN2.core import core
20
20
  from DAJIN2.utils import io, config, report_generator, input_validator, multiprocess
21
21
 
22
22
 
23
- DAJIN_VERSION = "0.4.2"
23
+ DAJIN_VERSION = "0.4.3"
24
24
 
25
25
 
26
26
  def generate_report(name: str) -> None:
@@ -19,6 +19,12 @@ from openpyxl import load_workbook, Workbook
19
19
  ###########################################################
20
20
 
21
21
 
22
+ def read_sam(path_of_sam: str | Path) -> Generator[list]:
23
+ with open(path_of_sam) as f:
24
+ for line in f:
25
+ yield line.strip().split("\t")
26
+
27
+
22
28
  def load_pickle(file_path: Path):
23
29
  with open(file_path, "rb") as f:
24
30
  return pickle.load(f)
@@ -2,8 +2,6 @@ from __future__ import annotations
2
2
 
3
3
  import re
4
4
 
5
- from pathlib import Path
6
- from typing import Generator
7
5
  from itertools import groupby
8
6
  from DAJIN2.utils.dna_handler import revcomp
9
7
 
@@ -25,17 +23,6 @@ def is_mapped(s: list[str]) -> bool:
25
23
  return not s[0].startswith("@") and s[9] != "*"
26
24
 
27
25
 
28
- ###########################################################
29
- # Read sam
30
- ###########################################################
31
-
32
-
33
- def read_sam(path_of_sam: str | Path) -> Generator[list]:
34
- with open(path_of_sam) as f:
35
- for line in f:
36
- yield line.strip().split("\t")
37
-
38
-
39
26
  ###########################################################
40
27
  # remove_overlapped_reads
41
28
  ###########################################################
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: DAJIN2
3
- Version: 0.4.2
3
+ Version: 0.4.3
4
4
  Summary: One-step genotyping tools for targeted long-read sequencing
5
5
  Home-page: https://github.com/akikuno/DAJIN2
6
6
  Author: Akihiro Kuno
@@ -14,22 +14,22 @@ Classifier: Intended Audience :: Science/Research
14
14
  Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
15
15
  Description-Content-Type: text/markdown
16
16
  License-File: LICENSE
17
- Requires-Dist: numpy>=1.20.0
18
- Requires-Dist: scipy>=1.6.0
17
+ Requires-Dist: numpy>=1.24.0
18
+ Requires-Dist: scipy>=1.10.0
19
19
  Requires-Dist: pandas>=1.0.0
20
- Requires-Dist: openpyxl>=3.0.0
21
- Requires-Dist: rapidfuzz>=3.0.0
22
- Requires-Dist: scikit-learn>=1.0.0
20
+ Requires-Dist: openpyxl>=3.1.0
21
+ Requires-Dist: rapidfuzz>=3.6.0
22
+ Requires-Dist: scikit-learn>=1.3.0
23
23
  Requires-Dist: mappy>=2.24
24
- Requires-Dist: pysam>=0.19.0
24
+ Requires-Dist: pysam>=0.21.0
25
25
  Requires-Dist: Flask>=2.2.0
26
26
  Requires-Dist: waitress>=2.1.0
27
27
  Requires-Dist: Jinja2>=3.1.0
28
- Requires-Dist: plotly>=5.0.0
28
+ Requires-Dist: plotly>=5.19.0
29
29
  Requires-Dist: kaleido>=0.2.0
30
30
  Requires-Dist: cstag>=1.0.0
31
- Requires-Dist: midsv>=0.10.1
32
- Requires-Dist: wslPath>=0.3.0
31
+ Requires-Dist: midsv>=0.11.0
32
+ Requires-Dist: wslPath>=0.4.1
33
33
 
34
34
  [![License](https://img.shields.io/badge/License-MIT-9cf.svg)](https://choosealicense.com/licenses/mit/)
35
35
  [![Test](https://img.shields.io/github/actions/workflow/status/akikuno/dajin2/pytest.yml?branch=main&label=Test&color=brightgreen)](https://github.com/akikuno/dajin2/actions)
@@ -78,6 +78,7 @@ conda activate env-dajin2
78
78
  > CONDA_SUBDIR=osx-64 conda create -n env-dajin2 -c conda-forge -c bioconda python=3.10 DAJIN2 -y
79
79
  > conda activate env-dajin2
80
80
  > conda config --env --set subdir osx-64
81
+ > python -c "import platform; print(platform.machine())" # Confirm that the output is 'x86_64', not 'arm64'
81
82
  > ```
82
83
 
83
84
  ### From [PyPI](https://pypi.org/project/DAJIN2/)
@@ -164,12 +165,17 @@ Options:
164
165
  #### Example
165
166
 
166
167
  ```bash
168
+ # Download example dataset
169
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_single.tar.gz
170
+ tar -xf example_single.tar.gz
171
+
172
+ # Run DAJIN2
167
173
  DAJIN2 \
168
- --control example/barcode01 \
169
- --sample example/barcode02 \
170
- --allele example/design.fa \
171
- --name IL6-knockin \
172
- --genome hg38 \
174
+ --control example_single/control \
175
+ --sample example_single/sample \
176
+ --allele example_single/stx2_deletion.fa \
177
+ --name stx2_deletion \
178
+ --genome mm39 \
173
179
  --threads 4
174
180
  ```
175
181
 
@@ -206,7 +212,6 @@ DAJIN2 \
206
212
 
207
213
  By using the `batch` subcommand, you can process multiple FASTQ files simultaneously.
208
214
  For this purpose, a CSV or Excel file consolidating the sample information is required.
209
- <!-- For a specific example, please refer to [this link](https://github.com/akikuno/DAJIN2/blob/main/examples/example-batch/batch.csv). -->
210
215
 
211
216
  > [!NOTE]
212
217
  > For guidance on how to compile sample information, please refer to [this document](https://docs.google.com/presentation/d/e/2PACX-1vSMEmXJPG2TNjfT66XZJRzqJd82aAqO5gJrdEzyhn15YBBr_Li-j5puOgVChYf3jA/embed?start=false&loop=false&delayms=3000).
@@ -224,13 +229,18 @@ options:
224
229
  #### Example
225
230
 
226
231
  ```bash
227
- DAJIN2 --file batch.csv --threads 4
232
+ # Download the example dataset
233
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_batch.tar.gz
234
+ tar -xf example_batch.tar.gz
235
+
236
+ # Run DAJIN2
237
+ DAJIN2 batch --file example_batch/batch.csv --threads 4
228
238
  ```
229
239
 
230
240
  <!-- ```bash
231
241
# Download the example dataset
232
- wget https://github.com/akikuno/DAJIN2/raw/main/examples/example-batch.tar.gz
233
- tar -xf example-batch.tar.gz
242
+ wget https://github.com/akikuno/DAJIN2/raw/main/examples/example_batch.tar.gz
243
+ tar -xf example_batch.tar.gz
234
244
 
235
245
  # Run DAJIN2
236
246
  DAJIN2 batch --file example-batch/batch.csv --threads 3
@@ -46,10 +46,10 @@ src/DAJIN2/core/preprocess/mapping.py
46
46
  src/DAJIN2/core/preprocess/midsv_caller.py
47
47
  src/DAJIN2/core/preprocess/mutation_extractor.py
48
48
  src/DAJIN2/core/report/__init__.py
49
+ src/DAJIN2/core/report/bam_exporter.py
49
50
  src/DAJIN2/core/report/insertion_reflector.py
50
- src/DAJIN2/core/report/report_bam.py
51
- src/DAJIN2/core/report/report_files.py
52
- src/DAJIN2/core/report/report_mutation.py
51
+ src/DAJIN2/core/report/mutation_exporter.py
52
+ src/DAJIN2/core/report/sequence_exporter.py
53
53
  src/DAJIN2/static/css/style.css
54
54
  src/DAJIN2/templates/index.html
55
55
  src/DAJIN2/utils/config.py
@@ -0,0 +1,16 @@
1
+ numpy>=1.24.0
2
+ scipy>=1.10.0
3
+ pandas>=1.0.0
4
+ openpyxl>=3.1.0
5
+ rapidfuzz>=3.6.0
6
+ scikit-learn>=1.3.0
7
+ mappy>=2.24
8
+ pysam>=0.21.0
9
+ Flask>=2.2.0
10
+ waitress>=2.1.0
11
+ Jinja2>=3.1.0
12
+ plotly>=5.19.0
13
+ kaleido>=0.2.0
14
+ cstag>=1.0.0
15
+ midsv>=0.11.0
16
+ wslPath>=0.4.1
@@ -1,20 +0,0 @@
1
- numpy >= 1.20.0
2
- scipy >= 1.6.0
3
- pandas >= 1.0.0
4
- openpyxl >= 3.0.0
5
- rapidfuzz >=3.0.0
6
- scikit-learn >= 1.0.0
7
-
8
- mappy >= 2.24
9
- pysam >= 0.19.0
10
-
11
- Flask >= 2.2.0
12
- waitress >= 2.1.0
13
- Jinja2 >= 3.1.0
14
-
15
- plotly >= 5.0.0
16
- kaleido >= 0.2.0
17
-
18
- cstag >= 1.0.0
19
- midsv >= 0.10.1
20
- wslPath >=0.3.0
@@ -1,3 +0,0 @@
1
- from DAJIN2.core.report import report_bam
2
- from DAJIN2.core.report import report_files
3
- from DAJIN2.core.report import report_mutation
@@ -1,16 +0,0 @@
1
- numpy>=1.20.0
2
- scipy>=1.6.0
3
- pandas>=1.0.0
4
- openpyxl>=3.0.0
5
- rapidfuzz>=3.0.0
6
- scikit-learn>=1.0.0
7
- mappy>=2.24
8
- pysam>=0.19.0
9
- Flask>=2.2.0
10
- waitress>=2.1.0
11
- Jinja2>=3.1.0
12
- plotly>=5.0.0
13
- kaleido>=0.2.0
14
- cstag>=1.0.0
15
- midsv>=0.10.1
16
- wslPath>=0.3.0
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes