PyPI - uht-tooling - Versions diffs - 0.1.3__tar.gz → 0.1.5__tar.gz - Mend

uht-tooling 0.1.3__tar.gz → 0.1.5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

{uht_tooling-0.1.3 → uht_tooling-0.1.5}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: uht-tooling
-Version: 0.1.3
+Version: 0.1.5
 Summary: Tooling for ultra-high throughput screening workflows.
 Author: Matt115A
 License: MIT
@@ -35,7 +35,7 @@ Automation helpers for ultra-high-throughput molecular biology workflows. The pa
 ### Quick install (recommended, easiest file maintainance)
 ```bash
-pip install "uht-tooling[gui]==0.1.3"
+pip install "uht-tooling[gui]==0.1.4"
 ```
@@ -189,9 +189,10 @@ If mutations fall within overlapping primer windows, design sequential reactions
     --fastq data/umi_hunter/*.fastq.gz \
     --output-dir results/umi_hunter/
   ```
-- Tunable parameters include `--umi-identity-threshold` and `--consensus-mutation-threshold`.
-- --umi-identity-threshold is a decimal between 0-1 and defines how similar two UMIs have to be to be considered grouped.
-- --consensus-mutation-threshold is the minimum group size to report a consensus sequence.
+- Tunable parameters include `--umi-identity-threshold`, `--consensus-mutation-threshold`, and `--min-cluster-size`.
+- `--umi-identity-threshold` (0–1) controls how similar two UMIs must be to fall into the same cluster.
+- `--consensus-mutation-threshold` (0–1) is the fraction of reads within a cluster that must agree on a base before it is written into the consensus sequence.
+- `--min-cluster-size` sets the minimum number of reads required in a cluster before a consensus is generated (smaller clusters remain listed in the raw UMI CSV but no consensus FASTA is produced).
 Please be aware, this toolkit will not scale well beyond around 50k reads/sample. See UMIC-seq pipelines for efficient UMI-gene dictionary generation.
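
The interaction between `--consensus-mutation-threshold` and `--min-cluster-size` described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the function name `cluster_consensus`, the assumption that reads are already aligned to the template and of equal length, and the fallback to the template base when agreement is below the threshold are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Optional


def cluster_consensus(
    reads: List[str],
    template: str,
    consensus_mutation_threshold: float = 0.7,
    min_cluster_size: int = 1,
) -> Optional[str]:
    """Hypothetical per-cluster consensus call (illustration only).

    Assumes every read is pre-aligned to `template` and has the same length.
    """
    # Clusters below the minimum size produce no consensus at all.
    if len(reads) < min_cluster_size:
        return None

    consensus = []
    for pos, ref_base in enumerate(template):
        counts = Counter(read[pos] for read in reads)
        base, support = counts.most_common(1)[0]
        # Write the majority base only if enough reads agree on it;
        # otherwise keep the template base (an assumption of this sketch).
        if support / len(reads) >= consensus_mutation_threshold:
            consensus.append(base)
        else:
            consensus.append(ref_base)
    return "".join(consensus)


reads = ["ACGT", "ACGA", "ACGA", "ACGA"]
print(cluster_consensus(reads, "ACGT"))                      # 3/4 reads agree on A at the last position
print(cluster_consensus(reads, "ACGT", min_cluster_size=5))  # cluster too small, no consensus
```

With the default threshold of 0.7, three of four reads agreeing is enough to write the variant base; raising the threshold to 0.8 in this sketch would leave the template base in place.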
@@ -221,7 +222,14 @@ Please be aware, this toolkit will not scale well beyond around 50k reads/sample
     --fastq data/ep-library-profile/*.fastq.gz \
     --output-dir results/ep-library-profile/
   ```
-- Output bundle includes per-sample directories and a master summary TSV.
+- Output bundle includes per-sample directories, a master summary TSV, and a `summary_panels` figure that visualises positional mutation rates, coverage, and amino-acid simulations.
+**How the mutation rate and AA expectations are derived**
+1. Reads are aligned to both the region of interest and the full plasmid. Mismatches in the region define the “target” rate; mismatches elsewhere provide the background.
+2. The per-base background rate is subtracted from the target rate to yield a net nucleotide mutation rate, and the standard deviation reflects binomial sampling and quality-score uncertainty.
+3. The net rate is multiplied by the CDS length to estimate λ_bp (mutations per copy). Monte Carlo simulations then flip random bases, translate the mutated CDS, and count amino-acid differences across 1,000 trials—these drives the AA mutation mean/variance that appear in the panel plot.
+4. If multiple Q-score thresholds are analysed, the CLI aggregates them via a precision-weighted consensus (1 / standard deviation weighting) after filtering out thresholds with insufficient coverage; the consensus value is written to `aa_mutation_consensus.txt` and plotted as a horizontal guide.
 ---
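
Steps 2 and 3 of the derivation added in this hunk can be sketched as below. This is an illustration under stated assumptions rather than the package's code: the function names are hypothetical, the net rate is clamped at zero, and each base is mutated independently with probability equal to the net rate (so the number of flips per trial has mean λ_bp). The README does not specify how the number of flipped bases is drawn.

```python
import itertools
import random
import statistics

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def translate(cds: str) -> str:
    """Translate an uppercase A/C/G/T coding sequence, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def simulate_aa_mutations(cds, target_rate, background_rate, trials=1000, seed=0):
    """Return (lambda_bp, mean AA changes, variance of AA changes)."""
    # Step 2: background-subtracted per-base rate (clamping at 0 is an assumption).
    net_rate = max(target_rate - background_rate, 0.0)
    # Step 3: expected nucleotide mutations per copy of the CDS.
    lambda_bp = net_rate * len(cds)

    rng = random.Random(seed)
    reference = translate(cds)
    counts = []
    for _ in range(trials):
        # Flip each base to a different base with probability net_rate.
        mutated = "".join(
            rng.choice([b for b in BASES if b != base]) if rng.random() < net_rate else base
            for base in cds
        )
        counts.append(sum(a != b for a, b in zip(reference, translate(mutated))))
    return lambda_bp, statistics.mean(counts), statistics.pvariance(counts)


lam, aa_mean, aa_var = simulate_aa_mutations("ATGGCCAAAGGT" * 25, 0.012, 0.002)
print(f"lambda_bp={lam:.2f}, AA mean={aa_mean:.2f}, AA variance={aa_var:.2f}")
```

The mean AA count is generally lower than λ_bp because some nucleotide changes are synonymous and multiple hits in one codon count as a single amino-acid difference.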
@@ -243,9 +251,9 @@ Key points:
 1. **Nextera XT** – forward/reverse primer inputs with CSV preview.
 2. **SLIM** – template/context FASTA text areas plus mutation list.
 3. **Gibson** – multi-mutation support using `+` syntax.
-4. **Mutation Caller** – upload FASTQ, template FASTA, and configuration CSV.
-5. **UMI Hunter** – long-read UMI clustering with configurable thresholds.
-6. **Profile Inserts** – probe CSV and multiple FASTQ uploads.
+4. **Mutation Caller** – upload FASTQ and template FASTA, then enter flanks and gene length bounds inline.
+5. **UMI Hunter** – long-read UMI clustering with flank entry, UMI length bounds, mutation threshold, and minimum cluster size.
+6. **Profile Inserts** – interactive probe table plus multiple FASTQ uploads with adjustable fuzzy-match ratio.
 7. **EP Library Profile** – FASTQ uploads plus plasmid and region FASTA inputs.
 ### Workflow tips

{uht_tooling-0.1.3 → uht_tooling-0.1.5}/README.md RENAMED

@@ -8,7 +8,7 @@ Automation helpers for ultra-high-throughput molecular biology workflows. The pa
 ### Quick install (recommended, easiest file maintainance)
 ```bash
-pip install "uht-tooling[gui]==0.1.3"
+pip install "uht-tooling[gui]==0.1.4"
 ```
@@ -162,9 +162,10 @@ If mutations fall within overlapping primer windows, design sequential reactions
     --fastq data/umi_hunter/*.fastq.gz \
     --output-dir results/umi_hunter/
   ```
-- Tunable parameters include `--umi-identity-threshold` and `--consensus-mutation-threshold`.
-- --umi-identity-threshold is a decimal between 0-1 and defines how similar two UMIs have to be to be considered grouped.
-- --consensus-mutation-threshold is the minimum group size to report a consensus sequence.
+- Tunable parameters include `--umi-identity-threshold`, `--consensus-mutation-threshold`, and `--min-cluster-size`.
+- `--umi-identity-threshold` (0–1) controls how similar two UMIs must be to fall into the same cluster.
+- `--consensus-mutation-threshold` (0–1) is the fraction of reads within a cluster that must agree on a base before it is written into the consensus sequence.
+- `--min-cluster-size` sets the minimum number of reads required in a cluster before a consensus is generated (smaller clusters remain listed in the raw UMI CSV but no consensus FASTA is produced).
 Please be aware, this toolkit will not scale well beyond around 50k reads/sample. See UMIC-seq pipelines for efficient UMI-gene dictionary generation.
@@ -194,7 +195,14 @@ Please be aware, this toolkit will not scale well beyond around 50k reads/sample
     --fastq data/ep-library-profile/*.fastq.gz \
     --output-dir results/ep-library-profile/
   ```
-- Output bundle includes per-sample directories and a master summary TSV.
+- Output bundle includes per-sample directories, a master summary TSV, and a `summary_panels` figure that visualises positional mutation rates, coverage, and amino-acid simulations.
+**How the mutation rate and AA expectations are derived**
+1. Reads are aligned to both the region of interest and the full plasmid. Mismatches in the region define the “target” rate; mismatches elsewhere provide the background.
+2. The per-base background rate is subtracted from the target rate to yield a net nucleotide mutation rate, and the standard deviation reflects binomial sampling and quality-score uncertainty.
+3. The net rate is multiplied by the CDS length to estimate λ_bp (mutations per copy). Monte Carlo simulations then flip random bases, translate the mutated CDS, and count amino-acid differences across 1,000 trials—these drives the AA mutation mean/variance that appear in the panel plot.
+4. If multiple Q-score thresholds are analysed, the CLI aggregates them via a precision-weighted consensus (1 / standard deviation weighting) after filtering out thresholds with insufficient coverage; the consensus value is written to `aa_mutation_consensus.txt` and plotted as a horizontal guide.
 ---
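
The aggregation in step 4 can be illustrated with a short sketch. The weighting follows the README wording (1 / standard deviation); the function name, the tuple layout, and the `min_coverage` cutoff are hypothetical, since the diff does not show how "insufficient coverage" is defined.

```python
from typing import List, Optional, Tuple


def weighted_consensus(
    estimates: List[Tuple[float, float, int]],
    min_coverage: int,
) -> Optional[float]:
    """Combine per-threshold estimates given as (mean, standard deviation, coverage).

    Illustration only: `min_coverage` is an assumed filter criterion.
    """
    # Drop Q-score thresholds with too little coverage or an unusable spread.
    kept = [(mean, sd) for mean, sd, cov in estimates if cov >= min_coverage and sd > 0]
    if not kept:
        return None
    # Weight each estimate by 1 / standard deviation, as the README describes.
    weights = [1.0 / sd for _, sd in kept]
    return sum(w * mean for w, (mean, _) in zip(weights, kept)) / sum(weights)


per_threshold = [
    (2.0, 0.5, 100),  # tight estimate, well covered
    (4.0, 1.0, 100),  # looser estimate, well covered
    (10.0, 0.1, 5),   # excluded: coverage below the cutoff
]
print(weighted_consensus(per_threshold, min_coverage=50))
```

In this example the two retained estimates get weights 2 and 1, so the consensus sits closer to the tighter estimate.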
@@ -216,9 +224,9 @@ Key points:
 1. **Nextera XT** – forward/reverse primer inputs with CSV preview.
 2. **SLIM** – template/context FASTA text areas plus mutation list.
 3. **Gibson** – multi-mutation support using `+` syntax.
-4. **Mutation Caller** – upload FASTQ, template FASTA, and configuration CSV.
-5. **UMI Hunter** – long-read UMI clustering with configurable thresholds.
-6. **Profile Inserts** – probe CSV and multiple FASTQ uploads.
+4. **Mutation Caller** – upload FASTQ and template FASTA, then enter flanks and gene length bounds inline.
+5. **UMI Hunter** – long-read UMI clustering with flank entry, UMI length bounds, mutation threshold, and minimum cluster size.
+6. **Profile Inserts** – interactive probe table plus multiple FASTQ uploads with adjustable fuzzy-match ratio.
 7. **EP Library Profile** – FASTQ uploads plus plasmid and region FASTA inputs.
 ### Workflow tips

{uht_tooling-0.1.3 → uht_tooling-0.1.5}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "uht-tooling"
-version = "0.1.3"
+version = "0.1.5"
 description = "Tooling for ultra-high throughput screening workflows."
 readme = "README.md"
 requires-python = ">=3.8"

{uht_tooling-0.1.3 → uht_tooling-0.1.5}/src/uht_tooling/cli.py RENAMED

@@ -233,6 +233,11 @@ def umi_hunter_command(
         max=1.0,
         help="Mutation threshold for consensus calling (default: 0.7).",
     ),
+    min_cluster_size: int = typer.Option(
+        1,
+        min=1,
+        help="Minimum number of reads required in a UMI cluster before a consensus is generated.",
+    ),
     log_path: Optional[Path] = typer.Option(
         None,
         dir_okay=False,
@@ -249,6 +254,7 @@ def umi_hunter_command(
         output_dir=output_dir,
         umi_identity_threshold=umi_identity_threshold,
         consensus_mutation_threshold=consensus_mutation_threshold,
+        min_cluster_size=min_cluster_size,
         log_path=log_path,
     )
     if not results:
@@ -256,7 +262,12 @@ def umi_hunter_command(
     else:
         typer.echo("UMI hunter outputs:")
         for entry in results:
-            typer.echo(f"  Sample {entry['sample']}: {entry['directory']}")
+            total_clusters = entry.get("clusters_total", entry.get("clusters", 0))
+            typer.echo(
+                f"  Sample {entry['sample']}: "
+                f"{entry.get('clusters', 0)} consensus clusters "
+                f"(from {total_clusters} total) → {entry['directory']}"
+            )
 @app.command("ep-library-profile", help="Profile mutation rates for ep-library sequencing data.")
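
The new summary line falls back to `clusters` when a result entry has no `clusters_total` key, and to 0 when neither is present. The sketch below isolates that formatting logic in a plain function so the behaviour is easy to see; `format_summary` and the sample entries are made up for illustration, and the real command prints via `typer.echo`.

```python
def format_summary(entry: dict) -> str:
    """Reproduce the per-sample summary string from the hunk above (illustrative wrapper)."""
    # Prefer the total cluster count; fall back to the consensus count, then 0.
    total_clusters = entry.get("clusters_total", entry.get("clusters", 0))
    return (
        f"  Sample {entry['sample']}: "
        f"{entry.get('clusters', 0)} consensus clusters "
        f"(from {total_clusters} total) → {entry['directory']}"
    )


# Hypothetical result entries, not real tool output.
print(format_summary({"sample": "s1", "clusters": 12, "clusters_total": 40, "directory": "out/s1"}))
print(format_summary({"sample": "s2", "clusters": 7, "directory": "out/s2"}))
```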
