uht-tooling 0.2.0 (tar.gz) → 0.3.0 (tar.gz)

Files changed (26)

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: uht-tooling
-Version: 0.2.0
+Version: 0.3.0
 Summary: Tooling for ultra-high throughput screening workflows.
 Author: Matt115A
 License-Expression: MIT
@@ -47,7 +47,22 @@ This installs the core workflows plus the optional GUI dependency (Gradio). Omit
 pip install uht-tooling
 ```
-You will need a functioning version of mafft - you should install this separately and it should be accessible from your environment.
+### External Tools
+Some workflows require external bioinformatics tools:
+| Workflow | Required Tools |
+|----------|---------------|
+| mutation-caller | mafft |
+| umi-hunter | mafft |
+| ep-library-profile | minimap2, NanoFilt |
+Install via conda:
+```bash
+conda install -c bioconda mafft minimap2 nanofilt
+```
+The CLI and GUI will validate tool availability before running and provide clear error messages if tools are missing.
 ### Development install
 ```bash
@@ -95,10 +110,69 @@ Each command provides detailed help, including option descriptions and expected
 uht-tooling mutation-caller --help
 ```
+### Short Flags
+All commands support short flags for common options:
+```bash
+# Long form
+uht-tooling design-slim --gene-fasta gene.fa --context-fasta ctx.fa --mutations-csv mut.csv --output-dir out/
+# Short form
+uht-tooling design-slim -g gene.fa -c ctx.fa -m mut.csv -o out/
+```
+| Long Flag | Short | Commands |
+|-----------|-------|----------|
+| `--gene-fasta` | `-g` | design-slim, design-kld, design-gibson |
+| `--context-fasta` | `-c` | design-slim, design-kld, design-gibson |
+| `--mutations-csv` | `-m` | design-slim, design-kld, design-gibson |
+| `--output-dir` | `-o` | 7 commands |
+| `--log-path` | `-l` | 7 commands |
+| `--template-fasta` | `-t` | mutation-caller, umi-hunter |
+| `--fastq` | `-q` | 4 commands |
+| `--threshold` | `-T` | mutation-caller |
+| `--config-csv` | `-C` | umi-hunter |
+| `--binding-csv` | `-b` | nextera-primers |
+| `--probes-csv` | `-P` | profile-inserts |
+| `--region-fasta` | `-R` | ep-library-profile |
+| `--plasmid-fasta` | `-p` | ep-library-profile |
+| `--work-dir` | `-w` | ep-library-profile |
+| `--config` | `-K` | global (all commands) |
 You can pass multiple FASTQ paths using repeated `--fastq` options or glob patterns. Optional `--log-path` flags redirect logs if you prefer a location outside the default results directory.
 ---
+## Configuration File
+uht-tooling supports a YAML configuration file for default options.
+**Auto-discovery locations** (in order):
+1. `$UHT_TOOLING_CONFIG` environment variable
+2. `~/.uht-tooling.yaml`
+3. `~/.config/uht-tooling/config.yaml`
+4. `.uht-tooling.yaml` (current directory)
+Or specify explicitly: `uht-tooling --config my-config.yaml ...`
+**Example ~/.uht-tooling.yaml:**
+```yaml
+paths:
+  output_dir: ~/results/uht-tooling
+defaults:
+  mutation_caller:
+    threshold: 15
+  umi_hunter:
+    umi_identity_threshold: 0.85
+    min_cluster_size: 5
+```
+CLI options always take precedence over config values.
+---
 ## Workflow reference
 ### Nextera XT primer design

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/README.md RENAMED

@@ -18,7 +18,22 @@ This installs the core workflows plus the optional GUI dependency (Gradio). Omit
 pip install uht-tooling
 ```
-You will need a functioning version of mafft - you should install this separately and it should be accessible from your environment.
+### External Tools
+Some workflows require external bioinformatics tools:
+| Workflow | Required Tools |
+|----------|---------------|
+| mutation-caller | mafft |
+| umi-hunter | mafft |
+| ep-library-profile | minimap2, NanoFilt |
+Install via conda:
+```bash
+conda install -c bioconda mafft minimap2 nanofilt
+```
+The CLI and GUI will validate tool availability before running and provide clear error messages if tools are missing.
 ### Development install
 ```bash
@@ -66,10 +81,69 @@ Each command provides detailed help, including option descriptions and expected
 uht-tooling mutation-caller --help
 ```
+### Short Flags
+All commands support short flags for common options:
+```bash
+# Long form
+uht-tooling design-slim --gene-fasta gene.fa --context-fasta ctx.fa --mutations-csv mut.csv --output-dir out/
+# Short form
+uht-tooling design-slim -g gene.fa -c ctx.fa -m mut.csv -o out/
+```
+| Long Flag | Short | Commands |
+|-----------|-------|----------|
+| `--gene-fasta` | `-g` | design-slim, design-kld, design-gibson |
+| `--context-fasta` | `-c` | design-slim, design-kld, design-gibson |
+| `--mutations-csv` | `-m` | design-slim, design-kld, design-gibson |
+| `--output-dir` | `-o` | 7 commands |
+| `--log-path` | `-l` | 7 commands |
+| `--template-fasta` | `-t` | mutation-caller, umi-hunter |
+| `--fastq` | `-q` | 4 commands |
+| `--threshold` | `-T` | mutation-caller |
+| `--config-csv` | `-C` | umi-hunter |
+| `--binding-csv` | `-b` | nextera-primers |
+| `--probes-csv` | `-P` | profile-inserts |
+| `--region-fasta` | `-R` | ep-library-profile |
+| `--plasmid-fasta` | `-p` | ep-library-profile |
+| `--work-dir` | `-w` | ep-library-profile |
+| `--config` | `-K` | global (all commands) |
 You can pass multiple FASTQ paths using repeated `--fastq` options or glob patterns. Optional `--log-path` flags redirect logs if you prefer a location outside the default results directory.
 ---
+## Configuration File
+uht-tooling supports a YAML configuration file for default options.
+**Auto-discovery locations** (in order):
+1. `$UHT_TOOLING_CONFIG` environment variable
+2. `~/.uht-tooling.yaml`
+3. `~/.config/uht-tooling/config.yaml`
+4. `.uht-tooling.yaml` (current directory)
+Or specify explicitly: `uht-tooling --config my-config.yaml ...`
+**Example ~/.uht-tooling.yaml:**
+```yaml
+paths:
+  output_dir: ~/results/uht-tooling
+defaults:
+  mutation_caller:
+    threshold: 15
+  umi_hunter:
+    umi_identity_threshold: 0.85
+    min_cluster_size: 5
+```
+CLI options always take precedence over config values.
+---
 ## Workflow reference
 ### Nextera XT primer design
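
The auto-discovery order documented in the README hunk above can be sketched with the standard library alone. This is a minimal illustration, not the package's code: `discover_config` and the temporary file names are hypothetical, and the candidate list stands in for the home-directory and current-directory locations. It assumes, as `find_config_file` in this release appears to do, that an environment variable pointing at a missing file falls through to the default locations instead of raising an error.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Mapping, Optional


def discover_config(env: Mapping[str, str], candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first existing config path, checking the environment variable first."""
    env_path = env.get("UHT_TOOLING_CONFIG")
    if env_path and Path(env_path).exists():
        return Path(env_path)
    # An unset variable, or one pointing at a missing file, falls through to the defaults.
    for path in candidates:
        if path.exists():
            return path
    return None


# Throwaway files stand in for the real home and project locations.
with tempfile.TemporaryDirectory() as tmp:
    home_cfg = Path(tmp) / "home.yaml"
    home_cfg.write_text("paths: {}\n", encoding="utf-8")
    env_cfg = Path(tmp) / "env.yaml"
    env_cfg.write_text("paths: {}\n", encoding="utf-8")
    stale = Path(tmp) / "gone.yaml"  # never created

    from_env = discover_config({"UHT_TOOLING_CONFIG": str(env_cfg)}, [home_cfg])
    from_stale_env = discover_config({"UHT_TOOLING_CONFIG": str(stale)}, [home_cfg])
    from_defaults = discover_config({}, [home_cfg])
    nothing_found = discover_config({}, [Path(tmp) / "absent.yaml"])
```

Passing the environment as a mapping keeps the sketch testable without touching `os.environ`.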

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "uht-tooling"
-version = "0.2.0"
+version = "0.3.0"
 description = "Tooling for ultra-high throughput screening workflows."
 readme = "README.md"
 requires-python = ">=3.8"

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/src/uht_tooling/cli.py RENAMED

@@ -3,6 +3,8 @@ from typing import Optional
 import typer
+from uht_tooling.config import get_option, load_config
+from uht_tooling.tools import ToolNotFoundError, validate_workflow_tools
 from uht_tooling.workflows.design_gibson import run_design_gibson
 from uht_tooling.workflows.design_kld import run_design_kld
 from uht_tooling.workflows.design_slim import run_design_slim
@@ -28,29 +30,57 @@ from uht_tooling.workflows.gui import launch_gui
 app = typer.Typer(help="Command-line interface for the uht-tooling package.")
+@app.callback()
+def main_callback(
+    ctx: typer.Context,
+    config: Optional[Path] = typer.Option(
+        None,
+        "--config",
+        "-K",
+        exists=True,
+        readable=True,
+        help="Path to YAML configuration file for default options.",
+    ),
+):
+    """Global callback to load configuration file."""
+    ctx.ensure_object(dict)
+    ctx.obj["config"] = load_config(config)
 @app.command("design-slim", help="Design SLIM primers from user-specified FASTA/CSV inputs.")
 def design_slim_command(
-    gene_fasta: Path = typer.Option(..., exists=True, readable=True, help="Path to the gene FASTA file."),
+    ctx: typer.Context,
+    gene_fasta: Path = typer.Option(
+        ..., "--gene-fasta", "-g", exists=True, readable=True, help="Path to the gene FASTA file."
+    ),
     context_fasta: Path = typer.Option(
         ...,
+        "--context-fasta",
+        "-c",
         exists=True,
         readable=True,
         help="Path to the context FASTA file containing the plasmid or genomic sequence.",
     ),
     mutations_csv: Path = typer.Option(
         ...,
+        "--mutations-csv",
+        "-m",
         exists=True,
         readable=True,
         help="CSV file containing a 'mutations' column with the desired edits.",
     ),
     output_dir: Path = typer.Option(
         ...,
+        "--output-dir",
+        "-o",
         dir_okay=True,
         writable=True,
         help="Directory where results will be written.",
     ),
     log_path: Optional[Path] = typer.Option(
         None,
+        "--log-path",
+        "-l",
         dir_okay=False,
         writable=True,
         help="Optional path to write a dedicated log file for this run.",
@@ -69,27 +99,38 @@ def design_slim_command(
 @app.command("design-kld", help="Design KLD (inverse PCR) primers from user-specified FASTA/CSV inputs.")
 def design_kld_command(
-    gene_fasta: Path = typer.Option(..., exists=True, readable=True, help="Path to the gene FASTA file."),
+    ctx: typer.Context,
+    gene_fasta: Path = typer.Option(
+        ..., "--gene-fasta", "-g", exists=True, readable=True, help="Path to the gene FASTA file."
+    ),
     context_fasta: Path = typer.Option(
         ...,
+        "--context-fasta",
+        "-c",
         exists=True,
         readable=True,
         help="Path to the context FASTA file containing the plasmid or genomic sequence.",
     ),
     mutations_csv: Path = typer.Option(
         ...,
+        "--mutations-csv",
+        "-m",
         exists=True,
         readable=True,
         help="CSV file containing a 'mutations' column with the desired edits.",
     ),
     output_dir: Path = typer.Option(
         ...,
+        "--output-dir",
+        "-o",
         dir_okay=True,
         writable=True,
         help="Directory where results will be written.",
     ),
     log_path: Optional[Path] = typer.Option(
         None,
+        "--log-path",
+        "-l",
         dir_okay=False,
         writable=True,
         help="Optional path to write a dedicated log file for this run.",
@@ -108,26 +149,34 @@ def design_kld_command(
 @app.command("nextera-primers", help="Generate Nextera XT primers from binding region CSV input.")
 def nextera_primers_command(
+    ctx: typer.Context,
     binding_csv: Path = typer.Option(
         ...,
+        "--binding-csv",
+        "-b",
         exists=True,
         readable=True,
         help="CSV file with a 'binding_region' column; first row is i7, second row is i5.",
     ),
     output_csv: Path = typer.Option(
         ...,
+        "--output-csv",
+        "-o",
         dir_okay=False,
         writable=True,
         help="Path to write the generated primer CSV.",
     ),
     log_path: Optional[Path] = typer.Option(
         None,
+        "--log-path",
+        "-l",
         dir_okay=False,
         writable=True,
         help="Optional path to write a dedicated log file.",
     ),
     config: Optional[Path] = typer.Option(
         None,
+        "--nextera-config",
         exists=True,
         readable=True,
         help="Optional YAML file providing overrides for indexes/prefixes/suffixes.",
@@ -145,27 +194,38 @@ def nextera_primers_command(
 @app.command("design-gibson", help="Design Gibson assembly primers and assembly plans.")
 def design_gibson_command(
-    gene_fasta: Path = typer.Option(..., exists=True, readable=True, help="Path to the gene FASTA file."),
+    ctx: typer.Context,
+    gene_fasta: Path = typer.Option(
+        ..., "--gene-fasta", "-g", exists=True, readable=True, help="Path to the gene FASTA file."
+    ),
     context_fasta: Path = typer.Option(
         ...,
+        "--context-fasta",
+        "-c",
         exists=True,
         readable=True,
         help="Path to the circular context FASTA file.",
     ),
     mutations_csv: Path = typer.Option(
         ...,
+        "--mutations-csv",
+        "-m",
         exists=True,
         readable=True,
         help="CSV file with a 'mutations' column (use '+' to link sub-mutations).",
     ),
     output_dir: Path = typer.Option(
         ...,
+        "--output-dir",
+        "-o",
         dir_okay=True,
         writable=True,
         help="Directory where primer and assembly plan CSVs will be written.",
     ),
     log_path: Optional[Path] = typer.Option(
         None,
+        "--log-path",
+        "-l",
         dir_okay=False,
         writable=True,
         help="Optional path for a dedicated log file.",
@@ -189,41 +249,65 @@ def design_gibson_command(
     help="Identify amino-acid substitutions from long-read data without UMIs.",
 )
 def mutation_caller_command(
+    ctx: typer.Context,
     template_fasta: Path = typer.Option(
         ...,
+        "--template-fasta",
+        "-t",
         exists=True,
         readable=True,
         help="FASTA file containing the mutation caller template sequence.",
     ),
     flanks_csv: Path = typer.Option(
         ...,
+        "--flanks-csv",
+        "-f",
         exists=True,
         readable=True,
         help="CSV file describing gene flanks and min/max lengths.",
     ),
     fastq: list[str] = typer.Option(
         ...,
+        "--fastq",
+        "-q",
         help="One or more FASTQ(.gz) paths or glob patterns (provide multiple --fastq options as needed).",
     ),
     output_dir: Path = typer.Option(
         ...,
+        "--output-dir",
+        "-o",
         dir_okay=True,
         writable=True,
         help="Directory where per-sample outputs will be written.",
     ),
     threshold: int = typer.Option(
         10,
+        "--threshold",
+        "-T",
         min=1,
         help="Minimum AA substitution count to include in the frequent-substitution report.",
     ),
     log_path: Optional[Path] = typer.Option(
         None,
+        "--log-path",
+        "-l",
         dir_okay=False,
         writable=True,
         help="Optional path to write a dedicated log file.",
     ),
 ):
     """Identify and summarise amino-acid substitutions."""
+    # Validate required external tools
+    try:
+        validate_workflow_tools("mutation_caller")
+    except ToolNotFoundError as e:
+        typer.echo(f"Error: {e}", err=True)
+        raise typer.Exit(1)
+    # Apply config defaults
+    config = ctx.obj.get("config", {}) if ctx.obj else {}
+    threshold = get_option(config, "threshold", threshold, default=10, workflow="mutation_caller")
     fastq_files = expand_fastq_inputs_mutation(fastq)
     results = run_mutation_caller(
         template_fasta=template_fasta,
@@ -243,53 +327,89 @@ def mutation_caller_command(
 @app.command("umi-hunter", help="Cluster UMIs and produce consensus genes from long-read data.")
 def umi_hunter_command(
+    ctx: typer.Context,
     template_fasta: Path = typer.Option(
         ...,
+        "--template-fasta",
+        "-t",
         exists=True,
         readable=True,
         help="Template FASTA file for consensus generation.",
     ),
     config_csv: Path = typer.Option(
         ...,
+        "--config-csv",
+        "-C",
         exists=True,
         readable=True,
         help="CSV describing UMI/gene flanks and length bounds.",
     ),
     fastq: list[str] = typer.Option(
         ...,
+        "--fastq",
+        "-q",
         help="One or more FASTQ(.gz) paths or glob patterns (multiple --fastq options allowed).",
     ),
     output_dir: Path = typer.Option(
         ...,
+        "--output-dir",
+        "-o",
         dir_okay=True,
         writable=True,
         help="Directory where UMI hunter outputs will be stored.",
     ),
     umi_identity_threshold: float = typer.Option(
         0.9,
+        "--umi-identity-threshold",
+        "-u",
         min=0.0,
         max=1.0,
         help="UMI clustering identity threshold (default: 0.9).",
     ),
     consensus_mutation_threshold: float = typer.Option(
         0.7,
+        "--consensus-mutation-threshold",
+        "-M",
         min=0.0,
         max=1.0,
         help="Mutation threshold for consensus calling (default: 0.7).",
     ),
     min_cluster_size: int = typer.Option(
         1,
+        "--min-cluster-size",
+        "-s",
         min=1,
         help="Minimum number of reads required in a UMI cluster before a consensus is generated.",
     ),
     log_path: Optional[Path] = typer.Option(
         None,
+        "--log-path",
+        "-l",
         dir_okay=False,
         writable=True,
         help="Optional path to write a dedicated log file.",
     ),
 ):
     """Cluster UMIs and generate consensus sequences from long-read FASTQ data."""
+    # Validate required external tools
+    try:
+        validate_workflow_tools("umi_hunter")
+    except ToolNotFoundError as e:
+        typer.echo(f"Error: {e}", err=True)
+        raise typer.Exit(1)
+    # Apply config defaults
+    config = ctx.obj.get("config", {}) if ctx.obj else {}
+    umi_identity_threshold = get_option(
+        config, "umi_identity_threshold", umi_identity_threshold, default=0.9, workflow="umi_hunter"
+    )
+    consensus_mutation_threshold = get_option(
+        config, "consensus_mutation_threshold", consensus_mutation_threshold, default=0.7, workflow="umi_hunter"
+    )
+    min_cluster_size = get_option(
+        config, "min_cluster_size", min_cluster_size, default=1, workflow="umi_hunter"
+    )
     fastq_files = expand_fastq_inputs_umi(fastq)
     results = run_umi_hunter(
         template_fasta=template_fasta,
@@ -310,42 +430,60 @@ def umi_hunter_command(
             typer.echo(
                 f"  Sample {entry['sample']}: "
                 f"{entry.get('clusters', 0)} consensus clusters "
-                f"(from {total_clusters} total) → {entry['directory']}"
+                f"(from {total_clusters} total) -> {entry['directory']}"
             )
 @app.command("ep-library-profile", help="Profile mutation rates for ep-library sequencing data.")
 def ep_library_profile_command(
+    ctx: typer.Context,
     region_fasta: Path = typer.Option(
         ...,
+        "--region-fasta",
+        "-R",
         exists=True,
         readable=True,
         help="FASTA file describing the region of interest.",
     ),
     plasmid_fasta: Path = typer.Option(
         ...,
+        "--plasmid-fasta",
+        "-p",
         exists=True,
         readable=True,
         help="FASTA file with the full plasmid sequence.",
     ),
     fastq: list[str] = typer.Option(
         ...,
+        "--fastq",
+        "-q",
         help="One or more FASTQ(.gz) paths or glob patterns (multiple --fastq options allowed).",
     ),
     output_dir: Path = typer.Option(
         ...,
+        "--output-dir",
+        "-o",
         dir_okay=True,
         writable=True,
         help="Directory for per-sample outputs.",
     ),
     work_dir: Optional[Path] = typer.Option(
         None,
+        "--work-dir",
+        "-w",
         dir_okay=True,
         writable=True,
         help="Optional scratch directory for intermediate files (defaults to output/tmp).",
     ),
 ):
     """Quantify mutation rates for ep-library sequencing experiments."""
+    # Validate required external tools
+    try:
+        validate_workflow_tools("ep_library_profile")
+    except ToolNotFoundError as e:
+        typer.echo(f"Error: {e}", err=True)
+        raise typer.Exit(1)
     fastq_files = expand_fastq_inputs_ep(fastq)
     results = run_ep_library_profile(
         fastq_paths=fastq_files,
@@ -365,30 +503,41 @@ def ep_library_profile_command(
 @app.command("profile-inserts", help="Extract and profile inserts using probe pairs.")
 def profile_inserts_command(
+    ctx: typer.Context,
     probes_csv: Path = typer.Option(
         ...,
+        "--probes-csv",
+        "-P",
         exists=True,
         readable=True,
         help="CSV file containing upstream/downstream probes.",
     ),
     fastq: list[str] = typer.Option(
         ...,
+        "--fastq",
+        "-q",
         help="One or more FASTQ(.gz) paths or glob patterns (multiple --fastq options allowed).",
     ),
     output_dir: Path = typer.Option(
         ...,
+        "--output-dir",
+        "-o",
         dir_okay=True,
         writable=True,
         help="Directory for per-sample outputs.",
     ),
     min_ratio: int = typer.Option(
         80,
+        "--min-ratio",
+        "-r",
         min=0,
         max=100,
         help="Minimum fuzzy match ratio for probe detection (default: 80).",
     ),
     log_path: Optional[Path] = typer.Option(
         None,
+        "--log-path",
+        "-l",
         dir_okay=False,
         writable=True,
         help="Optional path to write a dedicated log file.",

uht_tooling-0.3.0/src/uht_tooling/config.py ADDED

@@ -0,0 +1,137 @@
+"""Global configuration file support."""
+import os
+from pathlib import Path
+from typing import Any, Dict, Optional
+try:
+    import yaml
+    HAVE_YAML = True
+except ImportError:
+    HAVE_YAML = False
+DEFAULT_CONFIG_PATHS = [
+    Path.home() / ".uht-tooling.yaml",
+    Path.home() / ".config" / "uht-tooling" / "config.yaml",
+    Path(".uht-tooling.yaml"),
+]
+def find_config_file() -> Optional[Path]:
+    """
+    Find a configuration file from environment variable or default locations.
+    Search order:
+    1. $UHT_TOOLING_CONFIG environment variable
+    2. ~/.uht-tooling.yaml
+    3. ~/.config/uht-tooling/config.yaml
+    4. .uht-tooling.yaml (current directory)
+    Returns:
+        Path to the config file if found, None otherwise.
+    """
+    # Check environment variable first
+    env_path = os.environ.get("UHT_TOOLING_CONFIG")
+    if env_path:
+        path = Path(env_path)
+        if path.exists():
+            return path
+    # Check default locations
+    for path in DEFAULT_CONFIG_PATHS:
+        if path.exists():
+            return path
+    return None
+def load_config(config_path: Optional[Path] = None) -> Dict[str, Any]:
+    """
+    Load YAML configuration, auto-discovering if path not provided.
+    Args:
+        config_path: Explicit path to config file. If None, auto-discover.
+    Returns:
+        Dictionary containing configuration. Empty dict if no config found
+        or if YAML is not available.
+    """
+    if not HAVE_YAML:
+        return {}
+    if config_path is None:
+        config_path = find_config_file()
+    if config_path is not None:
+        config_path = Path(config_path)
+    if config_path is None or not config_path.exists():
+        return {}
+    try:
+        with open(config_path, "r", encoding="utf-8") as f:
+            config = yaml.safe_load(f)
+            return config if isinstance(config, dict) else {}
+    except Exception:
+        return {}
+def get_option(
+    config: Dict[str, Any],
+    key: str,
+    cli_value: Any,
+    default: Any = None,
+    workflow: Optional[str] = None,
+) -> Any:
+    """
+    Get an option with precedence: CLI > workflow-specific config > global config > default.
+    Args:
+        config: Configuration dictionary from load_config().
+        key: The option key to look up.
+        cli_value: Value from CLI (takes precedence if not None).
+        default: Default value if not found anywhere.
+        workflow: Optional workflow name for workflow-specific defaults.
+    Returns:
+        The resolved option value.
+    """
+    # CLI value always takes precedence if explicitly provided
+    if cli_value is not None:
+        return cli_value
+    # Check workflow-specific defaults
+    if workflow:
+        workflow_defaults = config.get("defaults", {}).get(workflow, {})
+        if key in workflow_defaults:
+            return workflow_defaults[key]
+    # Check global paths config
+    paths_config = config.get("paths", {})
+    if key in paths_config:
+        value = paths_config[key]
+        # Expand ~ in paths
+        if isinstance(value, str):
+            return os.path.expanduser(value)
+        return value
+    # Check top-level config
+    if key in config:
+        return config[key]
+    return default
+def get_workflow_defaults(config: Dict[str, Any], workflow: str) -> Dict[str, Any]:
+    """
+    Get all default values for a specific workflow.
+    Args:
+        config: Configuration dictionary from load_config().
+        workflow: Workflow name (e.g., "mutation_caller", "umi_hunter").
+    Returns:
+        Dictionary of default values for the workflow.
+    """
+    return config.get("defaults", {}).get(workflow, {})

uht_tooling-0.3.0/src/uht_tooling/tools.py ADDED

@@ -0,0 +1,143 @@
+"""External tool validation utilities."""
+import shutil
+import subprocess
+from typing import Dict, List, Optional, Tuple
+class ToolNotFoundError(Exception):
+    """Raised when a required external tool is not found."""
+    pass
+def check_tool_available(tool_name: str) -> Tuple[bool, Optional[str]]:
+    """
+    Check if a tool is available on PATH.
+    Args:
+        tool_name: Name of the executable to check.
+    Returns:
+        Tuple of (available, version_or_error).
+        If available is True, version_or_error contains the version string.
+        If available is False, version_or_error contains the error message.
+    """
+    path = shutil.which(tool_name)
+    if not path:
+        return False, f"'{tool_name}' not found on PATH"
+    # Try to get version
+    try:
+        result = subprocess.run(
+            [tool_name, "--version"],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        version = (result.stdout or result.stderr).strip().split("\n")[0]
+        return True, version
+    except subprocess.TimeoutExpired:
+        return True, "version unknown (timeout)"
+    except Exception:
+        return True, "version unknown"
+def validate_tools(tools: List[str], raise_on_missing: bool = True) -> Dict[str, dict]:
+    """
+    Validate multiple tools, optionally raising ToolNotFoundError.
+    Args:
+        tools: List of tool names to validate.
+        raise_on_missing: If True, raise ToolNotFoundError for missing tools.
+    Returns:
+        Dictionary mapping tool names to their status:
+        {
+            "tool_name": {
+                "available": bool,
+                "version": str or None,
+                "error": str or None
+            }
+        }
+    Raises:
+        ToolNotFoundError: If raise_on_missing is True and any tool is missing.
+    """
+    results: Dict[str, dict] = {}
+    missing: List[str] = []
+    for tool in tools:
+        available, info = check_tool_available(tool)
+        if available:
+            results[tool] = {
+                "available": True,
+                "version": info,
+                "error": None,
+            }
+        else:
+            results[tool] = {
+                "available": False,
+                "version": None,
+                "error": info,
+            }
+            missing.append(tool)
+    if raise_on_missing and missing:
+        missing_str = ", ".join(missing)
+        raise ToolNotFoundError(
+            f"Missing required external tool(s): {missing_str}. "
+            f"Install via conda: conda install -c bioconda {' '.join(missing)}"
+        )
+    return results
+# Tool requirements per workflow
+WORKFLOW_TOOLS: Dict[str, List[str]] = {
+    "mutation_caller": ["mafft"],
+    "umi_hunter": ["mafft"],
+    "ep_library_profile": ["minimap2", "NanoFilt"],
+}
+def validate_workflow_tools(workflow: str, raise_on_missing: bool = True) -> Dict[str, dict]:
+    """
+    Validate tools required for a specific workflow.
+    Args:
+        workflow: Name of the workflow (e.g., "mutation_caller", "umi_hunter").
+        raise_on_missing: If True, raise ToolNotFoundError for missing tools.
+    Returns:
+        Dictionary mapping tool names to their status.
+    Raises:
+        ValueError: If workflow is not recognized.
+        ToolNotFoundError: If raise_on_missing is True and any tool is missing.
+    """
+    if workflow not in WORKFLOW_TOOLS:
+        # No external tools required for this workflow
+        return {}
+    tools = WORKFLOW_TOOLS[workflow]
+    return validate_tools(tools, raise_on_missing=raise_on_missing)
+def get_tool_requirements_message(workflow: str) -> str:
+    """
+    Get a human-readable message about tool requirements for a workflow.
+    Args:
+        workflow: Name of the workflow.
+    Returns:
+        A message describing required tools, or empty string if none required.
+    """
+    if workflow not in WORKFLOW_TOOLS:
+        return ""
+    tools = WORKFLOW_TOOLS[workflow]
+    return (
+        f"This workflow requires: {', '.join(tools)}. "
+        f"Install via: conda install -c bioconda {' '.join(tools)}"
+    )
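
The core of the availability check is a `shutil.which` lookup per tool. The sketch below isolates that step; `missing_tools` and `missing_for_workflow` are hypothetical helper names, and only the `WORKFLOW_TOOLS` mapping is copied from the diff. It also mirrors one detail of the added module: as written, `validate_workflow_tools` returns an empty dict for a workflow that is not in the mapping, where its docstring mentions a `ValueError`.

```python
import shutil
from typing import Dict, List

# Mapping copied from the diff; the helper names below are illustrative only.
WORKFLOW_TOOLS: Dict[str, List[str]] = {
    "mutation_caller": ["mafft"],
    "umi_hunter": ["mafft"],
    "ep_library_profile": ["minimap2", "NanoFilt"],
}


def missing_tools(tools: List[str]) -> List[str]:
    """Return the subset of tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_for_workflow(workflow: str) -> List[str]:
    """Treat workflows absent from the mapping as having no external requirements."""
    return missing_tools(WORKFLOW_TOOLS.get(workflow, []))


# design_slim is not in the mapping, so nothing is reported as missing.
unknown = missing_for_workflow("design_slim")
# A deliberately nonexistent executable name is always reported.
absent = missing_tools(["uht-tooling-no-such-tool-xyz"])
```

The result for the real workflows depends on the machine, so the sketch only demonstrates the two cases that behave the same everywhere.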

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/src/uht_tooling/workflows/gui.py RENAMED

@@ -21,6 +21,7 @@ except ImportError as exc:  # pragma: no cover - handled at runtime
         f"{exc}. Install optional GUI extras via 'pip install gradio pandas'."
     ) from exc
+from uht_tooling.tools import ToolNotFoundError, validate_workflow_tools
 from uht_tooling.workflows.design_gibson import run_design_gibson
 from uht_tooling.workflows.design_kld import run_design_kld
 from uht_tooling.workflows.design_slim import run_design_slim
@@ -291,6 +292,12 @@ def run_gui_mutation_caller(
     config_dir: Optional[Path] = None
     output_dir: Optional[Path] = None
     try:
+        # Validate required external tools
+        try:
+            validate_workflow_tools("mutation_caller")
+        except ToolNotFoundError as e:
+            return f"Missing required tools: {e}", None
         if not fastq_file or not template_file:
             raise ValueError("Upload a FASTQ(.gz) read file and the reference template FASTA.")
@@ -367,6 +374,12 @@ def run_gui_umi_hunter(
     config_dir: Optional[Path] = None
     output_dir: Optional[Path] = None
     try:
+        # Validate required external tools
+        try:
+            validate_workflow_tools("umi_hunter")
+        except ToolNotFoundError as e:
+            return f"Missing required tools: {e}", None
         if not fastq_file or not template_file:
             raise ValueError("Upload a FASTQ(.gz) read file and the template FASTA.")
@@ -536,6 +549,12 @@ def run_gui_ep_library_profile(
     plasmid_fasta: Optional[str],
 ) -> Tuple[str, Optional[str]]:
     try:
+        # Validate required external tools
+        try:
+            validate_workflow_tools("ep_library_profile")
+        except ToolNotFoundError as e:
+            return f"Missing required tools: {e}", None
         if not fastq_files or not region_fasta or not plasmid_fasta:
             raise ValueError("Upload FASTQ(.gz) files plus region-of-interest and plasmid FASTA files.")

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/src/uht_tooling/workflows/mut_rate.py RENAMED

@@ -16,6 +16,7 @@ import math
 import tempfile
 from pathlib import Path
 from typing import Dict, Iterable, List, Optional, Sequence, Tuple
+from tqdm import tqdm
 # Use a built-in Matplotlib style ("ggplot") for consistency
 plt.style.use("ggplot")
@@ -219,7 +220,12 @@ def compute_mismatch_stats_sam(sam_file, refs_dict):
     logging.info(f"Computing mismatch stats for {sam_file}")
     samfile = pysam.AlignmentFile(sam_file, "r")
-    for read in samfile.fetch():
+    # Count total aligned reads for progress bar
+    total_reads = sum(1 for _ in samfile.fetch())
+    samfile.close()
+    samfile = pysam.AlignmentFile(sam_file, "r")
+    for read in tqdm(samfile.fetch(), desc="Computing mismatch stats", total=total_reads, unit="read"):
         if read.is_unmapped or read.query_sequence is None:
             continue
         ref_name = samfile.get_reference_name(read.reference_id)
@@ -2709,7 +2715,7 @@ def run_ep_library_profile(
     master_summary_path.write_text(header + "\n", encoding="utf-8")
     sample_results: List[Dict[str, object]] = []
-    for fastq in fastq_paths:
+    for fastq in tqdm(fastq_paths, desc="Processing FASTQ files", unit="file"):
         result = process_single_fastq(
             fastq,
             region_fasta,

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/src/uht_tooling/workflows/mutation_caller.py RENAMED

@@ -17,6 +17,7 @@ from Bio.Align.Applications import MafftCommandline
 from Bio.Seq import Seq
 from Bio.SeqRecord import SeqRecord
 from scipy.stats import fisher_exact, gaussian_kde
+from tqdm import tqdm
 def reverse_complement(seq: str) -> str:
@@ -52,8 +53,16 @@ def extract_gene(seq: str, pattern: re.Pattern, gene_min: int, gene_max: int) ->
 def process_fastq(file_path: Path, pattern: re.Pattern, gene_min: int, gene_max: int) -> Dict[str, str]:
     gene_reads: Dict[str, str] = {}
+    # Count total reads for progress bar
+    total_reads = 0
+    with gzip.open(file_path, "rt") as handle:
+        for _ in handle:
+            total_reads += 1
+    total_reads = total_reads // 4
     with gzip.open(file_path, "rt") as handle:
-        while True:
+        for _ in tqdm(range(total_reads), desc=f"Processing {file_path.name}", unit="read"):
             header = handle.readline()
             if not header:
                 break
@@ -274,7 +283,7 @@ def run_mutation_caller(
         results: List[Dict[str, Path]] = []
-        for fastq in fastq_files:
+        for fastq in tqdm(fastq_files, desc="Processing samples", unit="sample"):
             if not fastq.exists():
                 logger.warning("FASTQ file %s not found; skipping.", fastq)
                 continue
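
Note on the `process_fastq` hunk above: the new code reads the gzipped FASTQ once to count lines, divides by four to get the read count, and then reads it again with that count as the progress-bar total. The pattern can be sketched with the standard library alone, as below. The helper names `count_fastq_reads` and `iter_fastq_records` are hypothetical and do not appear in the package; the sketch assumes well-formed four-line FASTQ records.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple


def count_fastq_reads(file_path: Path) -> int:
    """First pass: count lines in a gzipped FASTQ and convert to a read count."""
    with gzip.open(file_path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def iter_fastq_records(file_path: Path) -> Iterator[Tuple[str, str, str, str]]:
    """Second pass: yield (header, sequence, separator, quality) per record."""
    with gzip.open(file_path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            sequence = handle.readline()
            separator = handle.readline()
            quality = handle.readline()
            yield (
                header.rstrip("\n"),
                sequence.rstrip("\n"),
                separator.rstrip("\n"),
                quality.rstrip("\n"),
            )
```

A progress bar would then wrap the second pass, for example `tqdm(iter_fastq_records(path), total=count_fastq_reads(path), unit="read")`. The trade-off is that the file is decompressed twice, which buys an accurate total at the cost of extra I/O on large inputs.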

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/src/uht_tooling/workflows/umi_hunter.py RENAMED

@@ -68,11 +68,18 @@ def process_fastq(
     pattern_gene: re.Pattern,
     logger: logging.Logger,
 ) -> tuple[int, Dict[str, List[str]]]:
+    # Count total reads for progress bar
+    total_reads = 0
+    with gzip.open(file_path, "rt") as handle:
+        for _ in handle:
+            total_reads += 1
+    total_reads = total_reads // 4
     read_count = 0
     umi_info: Dict[str, List[str]] = {}
     extracted = 0
     with gzip.open(file_path, "rt") as handle:
-        while True:
+        for _ in tqdm(range(total_reads), desc=f"Processing {file_path.name}", unit="read"):
             header = handle.readline()
             if not header:
                 break
@@ -85,8 +92,6 @@ def process_fastq(
             if umi and gene:
                 umi_info.setdefault(umi, []).append(gene)
                 extracted += 1
-            if read_count % 100000 == 0:
-                logger.info("Processed %s reads so far in %s", read_count, file_path.name)
     logger.info(
         "Finished reading %s: total reads=%s, extracted pairs=%s",
         file_path,
@@ -129,7 +134,7 @@ def cluster_umis(
     logger.info("Clustering %s unique UMIs with threshold %.2f", len(umi_info), threshold)
     sorted_umis = sorted(umi_info.items(), key=lambda item: len(item[1]), reverse=True)
     clusters: List[dict] = []
-    for umi, gene_list in sorted_umis:
+    for umi, gene_list in tqdm(sorted_umis, desc="Clustering UMIs", unit="UMI"):
         count = len(gene_list)
         for cluster in clusters:
             if percent_identity(umi, cluster["rep"]) >= threshold:
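
Note on the `cluster_umis` hunk above: the visible context shows a greedy scheme in which UMIs are sorted by descending read count and each UMI is compared with the representative of every existing cluster. The rest of the function is not part of this diff, so the sketch below is an assumption about how such a loop might be completed. In particular, `percent_identity` here is a hypothetical position-wise stand-in (the package's own implementation is not shown), and the `members` and `count` keys are illustrative.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Hypothetical stand-in: matching positions divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def greedy_cluster(umi_info: Dict[str, List[str]], threshold: float) -> List[dict]:
    """Assign each UMI to the first cluster whose representative is similar enough."""
    # Most abundant UMIs come first, so they become cluster representatives.
    sorted_umis = sorted(umi_info.items(), key=lambda item: len(item[1]), reverse=True)
    clusters: List[dict] = []
    for umi, gene_list in sorted_umis:
        for cluster in clusters:
            if percent_identity(umi, cluster["rep"]) >= threshold:
                cluster["members"].append(umi)
                cluster["count"] += len(gene_list)
                break
        else:
            # No existing representative matched: start a new cluster.
            clusters.append({"rep": umi, "members": [umi], "count": len(gene_list)})
    return clusters
```

Because every UMI may be compared against every existing cluster, the loop is quadratic in the worst case, which is presumably why a progress bar over `sorted_umis` is useful here.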

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: uht-tooling
-Version: 0.2.0
+Version: 0.3.0
 Summary: Tooling for ultra-high throughput screening workflows.
 Author: Matt115A
 License-Expression: MIT
@@ -47,7 +47,22 @@ This installs the core workflows plus the optional GUI dependency (Gradio). Omit
 pip install uht-tooling
 ```
-You will need a functioning version of mafft - you should install this separately and it should be accessible from your environment.
+### External Tools
+Some workflows require external bioinformatics tools:
+| Workflow | Required Tools |
+|----------|---------------|
+| mutation-caller | mafft |
+| umi-hunter | mafft |
+| ep-library-profile | minimap2, NanoFilt |
+Install via conda:
+```bash
+conda install -c bioconda mafft minimap2 nanofilt
+```
+The CLI and GUI will validate tool availability before running and provide clear error messages if tools are missing.
 ### Development install
 ```bash
@@ -95,10 +110,69 @@ Each command provides detailed help, including option descriptions and expected
 uht-tooling mutation-caller --help
 ```
+### Short Flags
+All commands support short flags for common options:
+```bash
+# Long form
+uht-tooling design-slim --gene-fasta gene.fa --context-fasta ctx.fa --mutations-csv mut.csv --output-dir out/
+# Short form
+uht-tooling design-slim -g gene.fa -c ctx.fa -m mut.csv -o out/
+```
+| Long Flag | Short | Commands |
+|-----------|-------|----------|
+| `--gene-fasta` | `-g` | design-slim, design-kld, design-gibson |
+| `--context-fasta` | `-c` | design-slim, design-kld, design-gibson |
+| `--mutations-csv` | `-m` | design-slim, design-kld, design-gibson |
+| `--output-dir` | `-o` | 7 commands |
+| `--log-path` | `-l` | 7 commands |
+| `--template-fasta` | `-t` | mutation-caller, umi-hunter |
+| `--fastq` | `-q` | 4 commands |
+| `--threshold` | `-T` | mutation-caller |
+| `--config-csv` | `-C` | umi-hunter |
+| `--binding-csv` | `-b` | nextera-primers |
+| `--probes-csv` | `-P` | profile-inserts |
+| `--region-fasta` | `-R` | ep-library-profile |
+| `--plasmid-fasta` | `-p` | ep-library-profile |
+| `--work-dir` | `-w` | ep-library-profile |
+| `--config` | `-K` | global (all commands) |
 You can pass multiple FASTQ paths using repeated `--fastq` options or glob patterns. Optional `--log-path` flags redirect logs if you prefer a location outside the default results directory.
 ---
+## Configuration File
+uht-tooling supports a YAML configuration file for default options.
+**Auto-discovery locations** (in order):
+1. `$UHT_TOOLING_CONFIG` environment variable
+2. `~/.uht-tooling.yaml`
+3. `~/.config/uht-tooling/config.yaml`
+4. `.uht-tooling.yaml` (current directory)
+Or specify explicitly: `uht-tooling --config my-config.yaml ...`
+**Example ~/.uht-tooling.yaml:**
+```yaml
+paths:
+  output_dir: ~/results/uht-tooling
+defaults:
+  mutation_caller:
+    threshold: 15
+  umi_hunter:
+    umi_identity_threshold: 0.85
+    min_cluster_size: 5
+```
+CLI options always take precedence over config values.
+---
 ## Workflow reference
 ### Nextera XT primer design
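
Note on the README hunk above: the new "Configuration File" section lists four auto-discovery locations in order and states that CLI options always take precedence over config values. The contents of `config.py` are not shown in this part of the diff, so the following is only a sketch of that lookup order and precedence rule under stated assumptions. The function names are hypothetical, YAML parsing is left out, and treating `None` as "option not given on the command line" is an assumption.

```python
import os
from pathlib import Path
from typing import Any, Dict, List, Mapping, Optional


def candidate_config_paths(env: Mapping[str, str], home: Path, cwd: Path) -> List[Path]:
    """List config locations in the order given in the README."""
    paths: List[Path] = []
    explicit = env.get("UHT_TOOLING_CONFIG")
    if explicit:
        paths.append(Path(explicit))
    paths.append(home / ".uht-tooling.yaml")
    paths.append(home / ".config" / "uht-tooling" / "config.yaml")
    paths.append(cwd / ".uht-tooling.yaml")
    return paths


def discover_config(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
    cwd: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    cwd = Path.cwd() if cwd is None else cwd
    for path in candidate_config_paths(env, home, cwd):
        if path.is_file():
            return path
    return None


def merge_options(config_defaults: Dict[str, Any], cli_options: Dict[str, Any]) -> Dict[str, Any]:
    """CLI values override config defaults; None means the flag was not given."""
    merged = dict(config_defaults)
    merged.update({key: value for key, value in cli_options.items() if value is not None})
    return merged
```

With the README's example file, `merge_options({"threshold": 15}, {"threshold": None})` keeps the configured `15`, while an explicit `-T 20` on the command line would replace it.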

{uht_tooling-0.2.0 → uht_tooling-0.3.0}/src/uht_tooling.egg-info/SOURCES.txt RENAMED

@@ -2,6 +2,8 @@ README.md
 pyproject.toml
 src/uht_tooling/__init__.py
 src/uht_tooling/cli.py
+src/uht_tooling/config.py
+src/uht_tooling/tools.py
 src/uht_tooling.egg-info/PKG-INFO
 src/uht_tooling.egg-info/SOURCES.txt
 src/uht_tooling.egg-info/dependency_links.txt