pywombat 0.4.0__tar.gz → 1.0.0__tar.gz

@@ -0,0 +1,125 @@
+ # Changelog
+
+ All notable changes to PyWombat will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [1.0.0] - 2026-01-23
+
+ First stable release of PyWombat! 🎉
+
+ ### Added
+
+ #### Core Features
+
+ - **Fast TSV Processing**: Efficient processing of bcftools tabulated TSV files using Polars
+ - **Flexible Output Formats**: Support for TSV, compressed TSV (`.gz`), and Parquet formats
+ - **Streaming Mode**: Memory-efficient processing for large files
+ - **Pedigree Support**: Trio and family analysis with automatic parent genotype joining
+ - **Multiple Sample Formats**: Handles various genotype formats (GT:DP:GQ:AD and variants)
+
+ #### Filtering Capabilities
+
+ - **Quality Filters**: Configurable thresholds for depth (DP), genotype quality (GQ), and variant allele frequency (VAF)
+ - **Genotype-Specific VAF Filters**: Separate thresholds for heterozygous, homozygous alternate, and homozygous reference calls
+ - **Expression-Based Filtering**: Complex logical expressions with comparison operators (`=`, `!=`, `<`, `>`, `<=`, `>=`) and logical operators (`&`, `|`)
+ - **Parent Quality Filtering**: Optional quality filter application to parent genotypes
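
Reader's note on expression-based filtering (not part of the changelog): the operator set listed above can be evaluated with a very small interpreter. The sketch below is an illustration only, not PyWombat's parser. It assumes no parentheses, that `&` binds tighter than `|`, and that each row is a plain dict; the names `matches`, `_compare`, and `_coerce` and the column names in the usage note are hypothetical.

```python
import operator
import re

# Two-character operators come first so "<=" is not read as "<" then "=".
_COMPARE = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def _coerce(text):
    """Interpret a literal as a number when possible, else as a bare string."""
    try:
        return float(text)
    except ValueError:
        return text.strip("'\"")


def _compare(term, row):
    """Evaluate one 'column OP literal' comparison against a row dict."""
    match = _COMPARE.match(term)
    if match is None:
        raise ValueError(f"cannot parse comparison: {term!r}")
    column, op, literal = match.groups()
    expected = _coerce(literal)
    value = row[column]
    if isinstance(expected, float):
        value = float(value)  # compare numerically when the literal is numeric
    return _OPS[op](value, expected)


def matches(expression, row):
    """Evaluate an OR of AND-groups; parentheses are not handled in this sketch."""
    return any(
        all(_compare(term, row) for term in group.split("&"))
        for group in expression.split("|")
    )
```

For example, with `row = {"DP": 25, "IMPACT": "HIGH", "AF": 0.0004}`, the call `matches("IMPACT = HIGH & AF < 0.001", row)` evaluates both comparisons and combines them with a logical AND. A production implementation would more likely translate such expressions into vectorized Polars expressions than evaluate them row by row.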
+
+ #### De Novo Mutation Detection
+
+ - **Sex-Chromosome Aware Logic**: Proper handling of X and Y chromosomes in males
+ - **PAR Region Support**: Configurable pseudo-autosomal region (PAR) coordinates for GRCh37 and GRCh38
+ - **Hemizygous Variant Detection**: Specialized VAF thresholds for X chromosome in males (non-PAR) and Y chromosome
+ - **Homozygous VAF Thresholds**: Higher VAF requirements (≥85%) for homozygous variants
+ - **Parent Genotype Validation**: Ensures parents are homozygous reference with low VAF (<2%)
+ - **Missing Genotype Filtering**: Removes variants with partial/missing genotypes (`./.`, `0/.`, etc.)
+ - **Population Frequency Filtering**: Maximum allele frequency thresholds (gnomAD fafmax_faf95_max_genomes)
+ - **Quality Filter Support**: gnomAD genomes_filters PASS-only option
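
Reader's note on the de novo rules (not part of the changelog): the sex-chromosome, PAR, and VAF rules listed above can be combined roughly as follows. This is a sketch under stated assumptions, not PyWombat's implementation. The chrX PAR boundaries are the commonly cited values and should be checked against your reference, since the changelog says they are configurable. The 85% and 2% thresholds come from the changelog; `HET_MIN_VAF`, the reuse of the homozygous threshold for hemizygous calls, and all function names are hypothetical.

```python
# Commonly cited chrX PAR1/PAR2 boundaries (1-based, inclusive).
# Treat these as defaults to verify, not as PyWombat's shipped values.
PAR_X = {
    "GRCh37": ((60_001, 2_699_520), (154_931_044, 155_260_560)),
    "GRCh38": ((10_001, 2_781_479), (155_701_383, 156_030_895)),
}

HOM_MIN_VAF = 0.85     # changelog: homozygous calls need VAF >= 85%
PARENT_MAX_VAF = 0.02  # changelog: parents need VAF < 2%
HET_MIN_VAF = 0.25     # hypothetical placeholder, not taken from PyWombat


def normalize_chrom(chrom):
    """Map 'chrX' and 'X' to the same name."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def in_par(chrom, pos, build="GRCh38"):
    """True if a chrX position falls inside PAR1 or PAR2 for the given build."""
    if normalize_chrom(chrom).upper() != "X":
        return False
    return any(start <= pos <= end for start, end in PAR_X[build])


def is_hemizygous(chrom, pos, sex, build="GRCh38"):
    """Males are hemizygous on chrY and on chrX outside the PARs."""
    if sex != "male":
        return False
    name = normalize_chrom(chrom).upper()
    return name == "Y" or (name == "X" and not in_par(chrom, pos, build))


def is_de_novo_candidate(chrom, pos, sex, child_gt, child_vaf, parent_calls,
                         build="GRCh38"):
    """parent_calls is an iterable of (genotype, vaf) pairs, one per parent."""
    if not all(gt == "0/0" and vaf < PARENT_MAX_VAF for gt, vaf in parent_calls):
        return False
    alleles = child_gt.replace("|", "/").split("/")
    if "1" not in alleles:
        return False
    # Assumption: hemizygous calls reuse the homozygous VAF threshold.
    if alleles == ["1", "1"] or is_hemizygous(chrom, pos, sex, build):
        return child_vaf >= HOM_MIN_VAF
    return set(alleles) == {"0", "1"} and child_vaf >= HET_MIN_VAF
```

With these definitions, a heterozygous call at 50% VAF on non-PAR chrX in a male child is rejected, because the position is hemizygous and the stricter threshold applies. The same call on an autosome is accepted as long as both parents are `0/0` with VAF below 2%.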
+
+ #### User Experience
+
+ - **Debug Mode**: Inspect specific variants by chromosome:position for troubleshooting
+ - **Verbose Mode**: Detailed filtering step information with variant counts
+ - **Automatic Output Naming**: Intelligent output file naming based on input and filter config
+ - **Configuration Examples**: Two comprehensive example configurations with extensive documentation
+   - `rare_variants_high_impact.yml`: Ultra-rare, high-impact variant filtering
+   - `de_novo_mutations.yml`: De novo mutation detection with full documentation
+
+ #### Documentation
+
+ - **Comprehensive README**: Complete usage guide with examples for all features
+ - **Example Workflows**: Real-world usage scenarios (rare disease, autism trios, etc.)
+ - **Input Requirements**: Detailed bcftools command examples for generating input files
+ - **VEP Annotation Guide**: Complete workflow from VEP annotation to PyWombat processing
+ - **Examples Directory**: Dedicated directory with configuration files and detailed README
+ - **Troubleshooting Section**: Common issues and solutions
+
+ #### Installation Methods
+
+ - **uvx Support**: One-line execution without installation (`uvx pywombat`)
+ - **uv Development Mode**: Local installation for repeated use (`uv sync`, `uv run wombat`)
+
+ ### Changed
+
+ - Improved performance with streaming lazy operations
+ - Optimized parent genotype lookup (excludes 0/0 genotypes from storage)
+ - Enhanced error messages for better user experience
+ - Normalized chromosome names for PAR region matching (handles both 'X' and 'chrX')
+
+ ### Fixed
+
+ - Sex column reading from pedigree file
+ - Parent genotype column naming consistency (father_id/mother_id)
+ - Genotype filtering to catch all partial genotypes (`./.`, `0/.`, `1/.`)
+ - PAR region matching for different chromosome naming conventions
+ - Empty chunk handling in output to avoid blank lines
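
Reader's note on the partial-genotype fix (not part of the changelog): a check that catches `./.`, `0/.`, and `1/.` alike can test each allele separately rather than comparing against a fixed list of strings. The following is a minimal sketch; the function name `has_missing_allele` is hypothetical, and handling of phased (`|`) separators is an assumption, not something the changelog states.

```python
import re


def has_missing_allele(genotype):
    """True if any allele in a GT string is '.', for '/' or '|' separators."""
    return "." in re.split(r"[/|]", genotype)


# Keep only fully called genotypes from a list of GT strings.
calls = ["0/1", "./.", "0/.", "1/.", "1|1"]
complete = [gt for gt in calls if not has_missing_allele(gt)]
```

Splitting on the separator avoids the failure mode of an exact-match test such as `gt == "./."`, which lets half-called genotypes through.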
+
+ ### Performance Optimizations
+
+ - Delayed annotation expansion (filter before expanding `(null)` field)
+ - Vectorized filtering operations (no Python loops)
+ - Early genotype filtering (skip 0/0 before parent lookup)
+ - Optimized parent lookup (stores only non-reference genotypes)
+ - Streaming mode by default for memory efficiency
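
Reader's note on the parent lookup (not part of the changelog): the two lookup-related items above can be pictured with plain Python, even though PyWombat itself is described as using vectorized Polars operations. The sketch below swaps in a dictionary and a generator to show the idea only. The row layout, the function names, and the convention that a parent absent from the lookup is reported as `0/0` are all assumptions; note that this convention cannot tell a reference call from a missing one.

```python
def variant_key(row):
    """Identify a variant by chromosome, position, and alleles."""
    return (row["chrom"], row["pos"], row["ref"], row["alt"])


def build_parent_lookup(rows, parent_ids):
    """Index parent genotypes, skipping 0/0 so the table stays small."""
    lookup = {}
    for row in rows:
        if row["sample"] in parent_ids and row["gt"] != "0/0":
            lookup[(variant_key(row), row["sample"])] = row["gt"]
    return lookup


def annotate_child_calls(rows, child_id, father_id, mother_id, lookup):
    """Lazily yield non-reference child calls with parent genotypes attached."""
    for row in rows:
        if row["sample"] != child_id or row["gt"] == "0/0":
            continue  # early filter: reference calls never reach the lookup
        key = variant_key(row)
        yield {
            **row,
            "father_gt": lookup.get((key, father_id), "0/0"),
            "mother_gt": lookup.get((key, mother_id), "0/0"),
        }
```

Because `annotate_child_calls` is a generator, rows are produced one at a time, which mirrors the memory behaviour of a streaming pipeline. Since most genotypes at any given site are typically `0/0`, leaving them out of the dictionary is where most of the memory saving comes from.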
+
+ ### Removed
+
+ - **Progress bar options**: Removed `--progress`/`--no-progress` and `--chunk-size` options for simplicity
+ - **Chunked processing mode**: Simplified to use only efficient streaming mode
+
+ ## [0.5.0] - 2026-01-20
+
+ ### Added
+
+ - Initial de novo mutation detection implementation
+ - Pedigree file support
+ - Basic quality filtering
+ - Expression-based filtering
+
+ ### Known Issues
+
+ - Progress bar had reliability issues (removed in 1.0.0)
+ - Chunked processing was complex (simplified in 1.0.0)
+
+ ---
+
+ ## Release Notes
+
+ ### v1.0.0 - Production Ready
+
+ This release marks PyWombat as production-ready for:
+
+ - Rare disease gene discovery
+ - De novo mutation detection in autism and developmental disorders
+ - Trio and family-based variant analysis
+ - High-throughput variant filtering workflows
+
+ **Recommended for**: Research groups working with rare variants, de novo mutations, and family-based genomic studies.
+
+ **Breaking Changes**: The `--progress`/`--no-progress` and `--chunk-size` options were removed, so scripts that pass them must be updated. There are no other breaking changes from 0.5.0.
+
+ ---
+
+ [1.0.0]: https://github.com/bourgeron-lab/pywombat/releases/tag/v1.0.0
+ [0.5.0]: https://github.com/bourgeron-lab/pywombat/releases/tag/v0.5.0