uht-tooling 0.1.9__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/PKG-INFO +123 -5
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/README.md +122 -4
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/pyproject.toml +1 -1
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/cli.py +153 -4
- uht_tooling-0.3.0/src/uht_tooling/config.py +137 -0
- uht_tooling-0.3.0/src/uht_tooling/tools.py +143 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/gui.py +19 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/mut_rate.py +484 -124
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/mutation_caller.py +11 -2
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/umi_hunter.py +9 -4
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/PKG-INFO +123 -5
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/SOURCES.txt +2 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/setup.cfg +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/__init__.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/models/__init__.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/__init__.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/design_gibson.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/design_kld.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/design_slim.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/nextera_designer.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling/workflows/profile_inserts.py +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/dependency_links.txt +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/entry_points.txt +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/requires.txt +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/top_level.txt +0 -0
- {uht_tooling-0.1.9 → uht_tooling-0.3.0}/tests/test_design_kld.py +0 -0
````diff
--- uht_tooling-0.1.9/PKG-INFO
+++ uht_tooling-0.3.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: uht-tooling
-Version: 0.
+Version: 0.3.0
 Summary: Tooling for ultra-high throughput screening workflows.
 Author: Matt115A
 License-Expression: MIT
````
````diff
@@ -47,7 +47,22 @@ This installs the core workflows plus the optional GUI dependency (Gradio). Omit
 pip install uht-tooling
 ```
 
-
+### External Tools
+
+Some workflows require external bioinformatics tools:
+
+| Workflow | Required Tools |
+|----------|---------------|
+| mutation-caller | mafft |
+| umi-hunter | mafft |
+| ep-library-profile | minimap2, NanoFilt |
+
+Install via conda:
+```bash
+conda install -c bioconda mafft minimap2 nanofilt
+```
+
+The CLI and GUI will validate tool availability before running and provide clear error messages if tools are missing.
 
 ### Development install
 ```bash
````
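The External Tools hunk says the CLI and GUI check for these tools before running, but the diff does not show how. The following is only a minimal sketch of such a pre-flight check using the standard library's `shutil.which`. The `REQUIRED_TOOLS` mapping mirrors the External Tools table in that hunk; `missing_tools` and `check_tools` are hypothetical helpers, not uht-tooling's API, and the executable names are assumptions about what the conda packages put on `PATH`.

```python
import shutil

# Assumption: executable names as installed by the bioconda packages.
# Mapping copied from the README's "External Tools" table.
REQUIRED_TOOLS = {
    "mutation-caller": ["mafft"],
    "umi-hunter": ["mafft"],
    "ep-library-profile": ["minimap2", "NanoFilt"],
}


def missing_tools(workflow: str) -> list[str]:
    """Return the required executables for `workflow` that are not on PATH."""
    return [tool for tool in REQUIRED_TOOLS.get(workflow, []) if shutil.which(tool) is None]


def check_tools(workflow: str) -> None:
    """Hypothetical pre-flight check: raise a readable error if anything is missing."""
    missing = missing_tools(workflow)
    if missing:
        packages = " ".join(tool.lower() for tool in missing)
        raise RuntimeError(
            f"{workflow} requires {', '.join(missing)}; "
            f"try `conda install -c bioconda {packages}`"
        )
```

Workflows absent from the mapping (the design commands, for example) need no external tools, so the check passes for them trivially.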
````diff
@@ -95,10 +110,69 @@ Each command provides detailed help, including option descriptions and expected
 uht-tooling mutation-caller --help
 ```
 
+### Short Flags
+
+All commands support short flags for common options:
+
+```bash
+# Long form
+uht-tooling design-slim --gene-fasta gene.fa --context-fasta ctx.fa --mutations-csv mut.csv --output-dir out/
+
+# Short form
+uht-tooling design-slim -g gene.fa -c ctx.fa -m mut.csv -o out/
+```
+
+| Long Flag | Short | Commands |
+|-----------|-------|----------|
+| `--gene-fasta` | `-g` | design-slim, design-kld, design-gibson |
+| `--context-fasta` | `-c` | design-slim, design-kld, design-gibson |
+| `--mutations-csv` | `-m` | design-slim, design-kld, design-gibson |
+| `--output-dir` | `-o` | 7 commands |
+| `--log-path` | `-l` | 7 commands |
+| `--template-fasta` | `-t` | mutation-caller, umi-hunter |
+| `--fastq` | `-q` | 4 commands |
+| `--threshold` | `-T` | mutation-caller |
+| `--config-csv` | `-C` | umi-hunter |
+| `--binding-csv` | `-b` | nextera-primers |
+| `--probes-csv` | `-P` | profile-inserts |
+| `--region-fasta` | `-R` | ep-library-profile |
+| `--plasmid-fasta` | `-p` | ep-library-profile |
+| `--work-dir` | `-w` | ep-library-profile |
+| `--config` | `-K` | global (all commands) |
+
 You can pass multiple FASTQ paths using repeated `--fastq` options or glob patterns. Optional `--log-path` flags redirect logs if you prefer a location outside the default results directory.
 
 ---
 
+## Configuration File
+
+uht-tooling supports a YAML configuration file for default options.
+
+**Auto-discovery locations** (in order):
+1. `$UHT_TOOLING_CONFIG` environment variable
+2. `~/.uht-tooling.yaml`
+3. `~/.config/uht-tooling/config.yaml`
+4. `.uht-tooling.yaml` (current directory)
+
+Or specify explicitly: `uht-tooling --config my-config.yaml ...`
+
+**Example ~/.uht-tooling.yaml:**
+```yaml
+paths:
+  output_dir: ~/results/uht-tooling
+
+defaults:
+  mutation_caller:
+    threshold: 15
+  umi_hunter:
+    umi_identity_threshold: 0.85
+    min_cluster_size: 5
+```
+
+CLI options always take precedence over config values.
+
+---
+
 ## Workflow reference
 
 ### Nextera XT primer design
````
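The Configuration File hunk lists four auto-discovery locations in priority order. A minimal sketch of that lookup, assuming the first candidate that exists on disk wins and that an explicit `--config`/`-K` path skips discovery entirely. `find_config` is a hypothetical name; how the package actually resolves the file is not shown in this diff.

```python
import os
from pathlib import Path
from typing import Optional


def find_config(explicit: Optional[str] = None) -> Optional[Path]:
    """Return the config file to load, or None if none is found.

    Assumption: an explicit --config/-K path bypasses auto-discovery,
    and discovery returns the first existing candidate in README order.
    """
    if explicit:
        return Path(explicit).expanduser()
    candidates = []
    env_path = os.environ.get("UHT_TOOLING_CONFIG")  # 1. environment variable
    if env_path:
        candidates.append(Path(env_path).expanduser())
    candidates += [
        Path.home() / ".uht-tooling.yaml",                        # 2. home dotfile
        Path.home() / ".config" / "uht-tooling" / "config.yaml",  # 3. XDG-style path
        Path.cwd() / ".uht-tooling.yaml",                         # 4. current directory
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Checking `is_file()` rather than `exists()` avoids picking up a directory that happens to share a candidate's name.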
````diff
@@ -313,13 +387,57 @@ Please be aware, this toolkit will not scale well beyond around 50k reads/sample
 --fastq data/ep-library-profile/*.fastq.gz \
 --output-dir results/ep-library-profile/
 ```
-
+
+**Output structure**
+
+Each sample produces an organized output directory:
+
+```
+sample_name/
+├── KEY_FINDINGS.txt # Lay-user executive summary
+├── summary_panels.png/pdf # Main visualization
+├── aa_mutation_consensus.txt # Consensus estimate details
+├── run.log # Analysis log
+└── detailed/ # Technical outputs
+    ├── methodology_notes.txt # Documents which lambda drives what
+    ├── lambda_comparison.csv # Side-by-side lambda comparison
+    ├── gene_mismatch_rates.csv
+    ├── base_distribution.csv
+    ├── aa_substitutions.csv
+    ├── plasmid_coverage.csv
+    ├── aa_mutation_distribution.csv
+    ├── comprehensive_qc_data.csv
+    ├── simple_qc_data.csv
+    └── qc_plots/ # QC visualizations
+        ├── qc_plot_*.png
+        ├── comprehensive_qc_analysis.png
+        ├── error_analysis.png
+        └── qc_mutation_rate_vs_quality.png/csv
+```
+
+**Lambda estimates: which to use**
+
+The profiler calculates lambda (mutations per gene copy) via two methods:
+
+| Method | Formula | Error Quantified? | Used For |
+|--------|---------|-------------------|----------|
+| Simple | `(hit_rate - bg_rate) × seq_len` | No | KDE plot, Monte Carlo simulation |
+| Consensus | Precision-weighted average across Q-scores | Yes | Recommended for reporting |
+
+- **For publication/reporting**: Use the consensus value from `KEY_FINDINGS.txt` or `aa_mutation_consensus.txt`.
+- **For understanding distribution shape**: See the KDE plot in `summary_panels.png` (note: uses simple lambda).
+- **For detailed error analysis**: See `detailed/comprehensive_qc_data.csv`.
+
+The `KEY_FINDINGS.txt` file provides a plain-language summary including:
+- Expected AA mutations per gene copy
+- Poisson-based interpretation (% wild-type, % 1 mutation, % 2+ mutations)
+- Quality assessment (GOOD/ACCEPTABLE/LOW COVERAGE)
 
 **How the mutation rate and AA expectations are derived**
 
-1. Reads are aligned to both the region of interest and the full plasmid. Mismatches in the region define the
+1. Reads are aligned to both the region of interest and the full plasmid. Mismatches in the region define the "target" rate; mismatches elsewhere provide the background.
 2. The per-base background rate is subtracted from the target rate to yield a net nucleotide mutation rate, and the standard deviation reflects binomial sampling and quality-score uncertainty.
-3. The net rate is multiplied by the CDS length to estimate λ_bp (mutations per copy). Monte Carlo simulations then flip random bases, translate the mutated CDS, and count amino-acid differences across 1,000 trials—these
+3. The net rate is multiplied by the CDS length to estimate λ_bp (mutations per copy). Monte Carlo simulations then flip random bases, translate the mutated CDS, and count amino-acid differences across 1,000 trials—these drive the AA mutation mean/variance that appear in the panel plot.
 4. If multiple Q-score thresholds are analysed, the CLI aggregates them via a precision-weighted consensus (1 / standard deviation weighting) after filtering out thresholds with insufficient coverage; the consensus value is written to `aa_mutation_consensus.txt` and plotted as a horizontal guide.
 
 ---
````
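The "Simple" row of the lambda table is plain arithmetic: the background mismatch rate is subtracted from the target-region rate and the result is scaled by the sequence length. A worked sketch with hypothetical rates (0.4% mismatches in the target region, 0.1% background, a 900-bp CDS); the function name is illustrative only.

```python
def simple_lambda(hit_rate: float, bg_rate: float, seq_len: int) -> float:
    """Simple lambda: net per-base mismatch rate scaled to the sequence length.

    hit_rate: mismatches per base inside the region of interest
    bg_rate:  mismatches per base elsewhere on the plasmid (background)
    seq_len:  CDS length in bases
    """
    return (hit_rate - bg_rate) * seq_len


# Hypothetical inputs, not taken from any real run.
lam = simple_lambda(hit_rate=0.004, bg_rate=0.001, seq_len=900)
print(f"lambda_bp ~ {lam:.2f} mutations per copy")
```

Here the net rate is 0.003 per base, so a 900-bp gene carries about 2.7 nucleotide mutations per copy. The sketch does not clamp negative results (background above target); how the profiler treats that case is not documented in this diff.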
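`KEY_FINDINGS.txt` reports a Poisson-based breakdown of the library. Assuming the number of AA mutations per gene copy is Poisson-distributed with mean λ, the three reported percentages are P(0) = e^−λ, P(1) = λe^−λ and P(2+) = 1 − P(0) − P(1). A sketch of that calculation; `poisson_breakdown` is a hypothetical name, not a uht-tooling function.

```python
import math


def poisson_breakdown(lam: float) -> dict[str, float]:
    """Percent of copies with 0, 1 and 2+ mutations, assuming Poisson(lam)."""
    p0 = math.exp(-lam)  # P(k = 0): wild-type
    p1 = lam * p0        # P(k = 1): single mutant
    return {
        "wild_type_pct": 100 * p0,
        "one_mutation_pct": 100 * p1,
        "two_plus_pct": 100 * (1 - p0 - p1),
    }


for lam in (0.5, 1.0, 2.0):
    r = poisson_breakdown(lam)
    print(f"lambda={lam}: {r['wild_type_pct']:.1f}% WT, "
          f"{r['one_mutation_pct']:.1f}% 1 mut, {r['two_plus_pct']:.1f}% 2+")
```

At λ = 1 this gives about 36.8% wild-type, 36.8% single mutants and 26.4% with two or more mutations.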
````diff
--- uht_tooling-0.1.9/README.md
+++ uht_tooling-0.3.0/README.md
@@ -18,7 +18,22 @@ This installs the core workflows plus the optional GUI dependency (Gradio). Omit
 pip install uht-tooling
 ```
 
-
+### External Tools
+
+Some workflows require external bioinformatics tools:
+
+| Workflow | Required Tools |
+|----------|---------------|
+| mutation-caller | mafft |
+| umi-hunter | mafft |
+| ep-library-profile | minimap2, NanoFilt |
+
+Install via conda:
+```bash
+conda install -c bioconda mafft minimap2 nanofilt
+```
+
+The CLI and GUI will validate tool availability before running and provide clear error messages if tools are missing.
 
 ### Development install
 ```bash
````
````diff
@@ -66,10 +81,69 @@ Each command provides detailed help, including option descriptions and expected
 uht-tooling mutation-caller --help
 ```
 
+### Short Flags
+
+All commands support short flags for common options:
+
+```bash
+# Long form
+uht-tooling design-slim --gene-fasta gene.fa --context-fasta ctx.fa --mutations-csv mut.csv --output-dir out/
+
+# Short form
+uht-tooling design-slim -g gene.fa -c ctx.fa -m mut.csv -o out/
+```
+
+| Long Flag | Short | Commands |
+|-----------|-------|----------|
+| `--gene-fasta` | `-g` | design-slim, design-kld, design-gibson |
+| `--context-fasta` | `-c` | design-slim, design-kld, design-gibson |
+| `--mutations-csv` | `-m` | design-slim, design-kld, design-gibson |
+| `--output-dir` | `-o` | 7 commands |
+| `--log-path` | `-l` | 7 commands |
+| `--template-fasta` | `-t` | mutation-caller, umi-hunter |
+| `--fastq` | `-q` | 4 commands |
+| `--threshold` | `-T` | mutation-caller |
+| `--config-csv` | `-C` | umi-hunter |
+| `--binding-csv` | `-b` | nextera-primers |
+| `--probes-csv` | `-P` | profile-inserts |
+| `--region-fasta` | `-R` | ep-library-profile |
+| `--plasmid-fasta` | `-p` | ep-library-profile |
+| `--work-dir` | `-w` | ep-library-profile |
+| `--config` | `-K` | global (all commands) |
+
 You can pass multiple FASTQ paths using repeated `--fastq` options or glob patterns. Optional `--log-path` flags redirect logs if you prefer a location outside the default results directory.
 
 ---
 
+## Configuration File
+
+uht-tooling supports a YAML configuration file for default options.
+
+**Auto-discovery locations** (in order):
+1. `$UHT_TOOLING_CONFIG` environment variable
+2. `~/.uht-tooling.yaml`
+3. `~/.config/uht-tooling/config.yaml`
+4. `.uht-tooling.yaml` (current directory)
+
+Or specify explicitly: `uht-tooling --config my-config.yaml ...`
+
+**Example ~/.uht-tooling.yaml:**
+```yaml
+paths:
+  output_dir: ~/results/uht-tooling
+
+defaults:
+  mutation_caller:
+    threshold: 15
+  umi_hunter:
+    umi_identity_threshold: 0.85
+    min_cluster_size: 5
+```
+
+CLI options always take precedence over config values.
+
+---
+
 ## Workflow reference
 
 ### Nextera XT primer design
````
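The README states that CLI options always take precedence over config values. A minimal sketch of that resolution order (CLI flag, then the command's section under `defaults`, then a built-in default), assuming a CLI value of `None` means "flag not given" and that config sections are named after commands with hyphens turned into underscores, as the example YAML suggests. `resolve_option` is hypothetical, and the built-in defaults used in the checks are made up.

```python
from typing import Any, Optional


def resolve_option(
    name: str,
    cli_value: Optional[Any],
    config: dict,
    command: str,
    builtin_default: Any,
) -> Any:
    """CLI value > config `defaults.<command>.<name>` > built-in default.

    Assumptions: None means the flag was not given on the command line,
    and `mutation-caller` maps to the `mutation_caller` config section.
    """
    if cli_value is not None:
        return cli_value
    section = config.get("defaults", {}).get(command.replace("-", "_"), {})
    return section.get(name, builtin_default)


# Parsed form of the example ~/.uht-tooling.yaml from the README.
config = {
    "paths": {"output_dir": "~/results/uht-tooling"},
    "defaults": {
        "mutation_caller": {"threshold": 15},
        "umi_hunter": {"umi_identity_threshold": 0.85, "min_cluster_size": 5},
    },
}
```

In practice `config` would come from parsing the YAML file, for example with PyYAML's `yaml.safe_load`.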
````diff
@@ -284,13 +358,57 @@ Please be aware, this toolkit will not scale well beyond around 50k reads/sample
 --fastq data/ep-library-profile/*.fastq.gz \
 --output-dir results/ep-library-profile/
 ```
-
+
+**Output structure**
+
+Each sample produces an organized output directory:
+
+```
+sample_name/
+├── KEY_FINDINGS.txt # Lay-user executive summary
+├── summary_panels.png/pdf # Main visualization
+├── aa_mutation_consensus.txt # Consensus estimate details
+├── run.log # Analysis log
+└── detailed/ # Technical outputs
+    ├── methodology_notes.txt # Documents which lambda drives what
+    ├── lambda_comparison.csv # Side-by-side lambda comparison
+    ├── gene_mismatch_rates.csv
+    ├── base_distribution.csv
+    ├── aa_substitutions.csv
+    ├── plasmid_coverage.csv
+    ├── aa_mutation_distribution.csv
+    ├── comprehensive_qc_data.csv
+    ├── simple_qc_data.csv
+    └── qc_plots/ # QC visualizations
+        ├── qc_plot_*.png
+        ├── comprehensive_qc_analysis.png
+        ├── error_analysis.png
+        └── qc_mutation_rate_vs_quality.png/csv
+```
+
+**Lambda estimates: which to use**
+
+The profiler calculates lambda (mutations per gene copy) via two methods:
+
+| Method | Formula | Error Quantified? | Used For |
+|--------|---------|-------------------|----------|
+| Simple | `(hit_rate - bg_rate) × seq_len` | No | KDE plot, Monte Carlo simulation |
+| Consensus | Precision-weighted average across Q-scores | Yes | Recommended for reporting |
+
+- **For publication/reporting**: Use the consensus value from `KEY_FINDINGS.txt` or `aa_mutation_consensus.txt`.
+- **For understanding distribution shape**: See the KDE plot in `summary_panels.png` (note: uses simple lambda).
+- **For detailed error analysis**: See `detailed/comprehensive_qc_data.csv`.
+
+The `KEY_FINDINGS.txt` file provides a plain-language summary including:
+- Expected AA mutations per gene copy
+- Poisson-based interpretation (% wild-type, % 1 mutation, % 2+ mutations)
+- Quality assessment (GOOD/ACCEPTABLE/LOW COVERAGE)
 
 **How the mutation rate and AA expectations are derived**
 
-1. Reads are aligned to both the region of interest and the full plasmid. Mismatches in the region define the
+1. Reads are aligned to both the region of interest and the full plasmid. Mismatches in the region define the "target" rate; mismatches elsewhere provide the background.
 2. The per-base background rate is subtracted from the target rate to yield a net nucleotide mutation rate, and the standard deviation reflects binomial sampling and quality-score uncertainty.
-3. The net rate is multiplied by the CDS length to estimate λ_bp (mutations per copy). Monte Carlo simulations then flip random bases, translate the mutated CDS, and count amino-acid differences across 1,000 trials—these
+3. The net rate is multiplied by the CDS length to estimate λ_bp (mutations per copy). Monte Carlo simulations then flip random bases, translate the mutated CDS, and count amino-acid differences across 1,000 trials—these drive the AA mutation mean/variance that appear in the panel plot.
 4. If multiple Q-score thresholds are analysed, the CLI aggregates them via a precision-weighted consensus (1 / standard deviation weighting) after filtering out thresholds with insufficient coverage; the consensus value is written to `aa_mutation_consensus.txt` and plotted as a horizontal guide.
 
 ---
````
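Step 3 turns λ_bp into an amino-acid mean and variance by simulation. A simplified sketch under stated assumptions: the number of substitutions per trial is drawn from Poisson(λ_bp), positions are uniform and distinct, each substituted base becomes one of the other three with equal probability, and the standard genetic code applies. The profiler may use a different mutation spectrum; every name here is hypothetical. Python's `random` module has no Poisson sampler, so Knuth's multiplication method is used, which is adequate for small λ.

```python
import math
import random

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AA_TABLE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons; '*' marks a stop."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's Poisson sampler (assumption: lam is small, as for epPCR)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_aa_mutations(cds: str, lambda_bp: float, trials: int = 1000, seed: int = 0):
    """Mean and sample variance of AA differences (stops included) per copy."""
    rng = random.Random(seed)
    wild_type = translate(cds)
    counts = []
    for _ in range(trials):
        seq = list(cds)
        n_mut = min(poisson(lambda_bp, rng), len(seq))
        for pos in rng.sample(range(len(seq)), n_mut):
            seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
        mutant = translate("".join(seq))
        counts.append(sum(w != m for w, m in zip(wild_type, mutant)))
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    return mean, var
```

Because synonymous changes leave the protein unchanged, the simulated AA mean comes out below λ_bp; how far below depends on the codon composition of the CDS.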
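Step 4 describes the consensus as a weighted mean across Q-score thresholds with weights 1/σ, after dropping thresholds with insufficient coverage. A sketch of that aggregation; the `coverage` field and the `min_coverage=1000` cutoff are hypothetical, since the README does not say how coverage is judged, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ThresholdEstimate:
    q_score: int      # Q-score threshold used to filter bases
    aa_mean: float    # AA mutations per copy at this threshold
    aa_std: float     # standard deviation of that estimate
    coverage: int     # hypothetical support metric (e.g. bases counted)


def consensus(estimates: list[ThresholdEstimate], min_coverage: int = 1000) -> float:
    """Weighted mean with weights 1/std, following the README's description."""
    usable = [e for e in estimates if e.coverage >= min_coverage and e.aa_std > 0]
    if not usable:
        raise ValueError("no Q-score threshold has sufficient coverage")
    weights = [1.0 / e.aa_std for e in usable]
    return sum(w * e.aa_mean for w, e in zip(weights, usable)) / sum(weights)
```

With made-up inputs Q20 (1.2 ± 0.1), Q25 (1.0 ± 0.2) and a low-coverage Q30 that is filtered out, the weights are 10 and 5 and the consensus is 17/15 ≈ 1.13. Inverse-variance weighting (1/σ²) is the more common definition of a precision-weighted mean; the sketch follows the 1/σ rule as written.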