pywombat 0.5.0__tar.gz → 1.0.0__tar.gz
- pywombat-1.0.0/CHANGELOG.md +125 -0
- pywombat-1.0.0/PKG-INFO +638 -0
- pywombat-1.0.0/README.md +616 -0
- pywombat-1.0.0/examples/README.md +219 -0
- pywombat-1.0.0/examples/de_novo_mutations.yml +159 -0
- pywombat-1.0.0/examples/rare_variants_high_impact.yml +84 -0
- {pywombat-0.5.0 → pywombat-1.0.0}/pyproject.toml +3 -3
- {pywombat-0.5.0 → pywombat-1.0.0}/src/pywombat/cli.py +891 -29
- {pywombat-0.5.0 → pywombat-1.0.0}/uv.lock +16 -2
- pywombat-0.5.0/PKG-INFO +0 -142
- pywombat-0.5.0/README.md +0 -121
- {pywombat-0.5.0 → pywombat-1.0.0}/.github/workflows/publish.yml +0 -0
- {pywombat-0.5.0 → pywombat-1.0.0}/.gitignore +0 -0
- {pywombat-0.5.0 → pywombat-1.0.0}/.python-version +0 -0
- {pywombat-0.5.0 → pywombat-1.0.0}/QUICKSTART.md +0 -0
- {pywombat-0.5.0 → pywombat-1.0.0}/src/pywombat/__init__.py +0 -0
@@ -0,0 +1,125 @@
# Changelog

All notable changes to PyWombat will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2026-01-23

First stable release of PyWombat! 🎉

### Added

#### Core Features

- **Fast TSV Processing**: Efficient processing of bcftools tabulated TSV files using Polars
- **Flexible Output Formats**: Support for TSV, compressed TSV (`.gz`), and Parquet formats
- **Streaming Mode**: Memory-efficient processing for large files
- **Pedigree Support**: Trio and family analysis with automatic parent genotype joining
- **Multiple Sample Formats**: Handles various genotype formats (GT:DP:GQ:AD and variants)
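
To make the `GT:DP:GQ:AD` sample format concrete, here is a minimal Python sketch that splits one such field and derives a variant allele fraction (VAF) from the allelic depths. `SampleCall` and `parse_sample_field` are hypothetical names chosen for illustration, not PyWombat's API, and the sketch assumes a biallelic site with exactly these four subfields, in this order, and no missing values.

```python
from dataclasses import dataclass


@dataclass
class SampleCall:
    """One sample's genotype call (hypothetical container, not PyWombat's API)."""

    gt: str         # genotype, normalized to unphased notation such as "0/1"
    dp: int         # read depth
    gq: int         # genotype quality
    ref_reads: int  # reads supporting the reference allele (first AD value)
    alt_reads: int  # reads supporting the alternate allele (second AD value)

    @property
    def vaf(self) -> float:
        """Alternate reads divided by ref + alt reads; 0.0 when no reads were counted."""
        total = self.ref_reads + self.alt_reads
        return self.alt_reads / total if total else 0.0


def parse_sample_field(field: str) -> SampleCall:
    """Parse a 'GT:DP:GQ:AD' string such as '0/1:30:99:18,12'."""
    gt, dp, gq, ad = field.split(":")
    # Only the first two AD values are used (biallelic assumption).
    ref_reads, alt_reads = (int(n) for n in ad.split(",")[:2])
    # Treat phased ("0|1") and unphased ("0/1") genotypes alike.
    return SampleCall(gt.replace("|", "/"), int(dp), int(gq), ref_reads, alt_reads)
```

A real parser would also have to handle missing values (`.`) and the other subfield orders that the entry above alludes to; this sketch raises `ValueError` on those.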

#### Filtering Capabilities

- **Quality Filters**: Configurable thresholds for depth (DP), genotype quality (GQ), and variant allele frequency (VAF)
- **Genotype-Specific VAF Filters**: Separate thresholds for heterozygous, homozygous alternate, and homozygous reference calls
- **Expression-Based Filtering**: Complex logical expressions with comparison operators (`=`, `!=`, `<`, `>`, `<=`, `>=`) and logical operators (`&`, `|`)
- **Parent Quality Filtering**: Optional quality filter application to parent genotypes
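
As an illustration of what expression-based filtering with these operators could look like, the following Python sketch evaluates a flat expression against one row represented as a dictionary. It is not PyWombat's parser: the function names are hypothetical, parentheses are not supported, column names are assumed to be plain word characters, and `&` is assumed to bind more tightly than `|`. The actual grammar may differ.

```python
import operator
import re

# Two-character operators come first so "<=" is not read as "<" followed by "=".
_COMPARISON = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def _coerce(value):
    """Return value as a float when possible, otherwise as a string."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


def _eval_comparison(text, row):
    """Evaluate one 'column OP literal' term against a row dictionary."""
    match = _COMPARISON.match(text)
    if match is None:
        raise ValueError(f"cannot parse comparison: {text!r}")
    column, op, literal = match.groups()
    literal = literal.strip("'\"")
    left, right = _coerce(row[column]), _coerce(literal)
    if type(left) is not type(right):
        # Mixed number/text: fall back to comparing both sides as strings.
        left, right = str(row[column]), literal
    return _OPS[op](left, right)


def matches(expression, row):
    """True if any '|' clause has all of its '&' terms satisfied."""
    return any(
        all(_eval_comparison(term, row) for term in clause.split("&"))
        for clause in expression.split("|")
    )
```

For example, with `row = {"IMPACT": "HIGH", "AF": 0.001, "DP": 25}`, the call `matches("IMPACT = HIGH & AF < 0.01", row)` evaluates both terms and combines them with a logical AND.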

#### De Novo Mutation Detection

- **Sex-Chromosome Aware Logic**: Proper handling of X and Y chromosomes in males
- **PAR Region Support**: Configurable pseudo-autosomal region (PAR) coordinates for GRCh37 and GRCh38
- **Hemizygous Variant Detection**: Specialized VAF thresholds for X chromosome in males (non-PAR) and Y chromosome
- **Homozygous VAF Thresholds**: Higher VAF requirements (≥85%) for homozygous variants
- **Parent Genotype Validation**: Ensures parents are homozygous reference with low VAF (<2%)
- **Missing Genotype Filtering**: Removes variants with partial/missing genotypes (`./.`, `0/.`, etc.)
- **Population Frequency Filtering**: Maximum allele frequency thresholds (gnomAD fafmax_faf95_max_genomes)
- **Quality Filter Support**: gnomAD genomes_filters PASS-only option
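
The genotype-level rules in this list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PyWombat's implementation: the function names, the `"male"`/`"female"` sex encoding, and the 25% heterozygous threshold are hypothetical, and hemizygous calls are assumed to use the same 85% threshold as homozygous ones. The 85% and 2% values come from the list above. The PAR coordinates are the commonly cited GRCh38 values and should be checked against the reference build in use.

```python
# GRCh38 pseudo-autosomal regions, 1-based inclusive (verify for your reference build).
PAR_GRCH38 = {
    "X": ((10_001, 2_781_479), (155_701_383, 156_030_895)),
    "Y": ((10_001, 2_781_479), (56_887_903, 57_217_415)),
}

HOM_MIN_VAF = 0.85     # from the changelog: homozygous calls need VAF >= 85%
PARENT_MAX_VAF = 0.02  # from the changelog: parents need VAF < 2%
HET_MIN_VAF = 0.25     # assumption: illustrative heterozygous threshold


def normalize_chrom(chrom):
    """Map 'chrX' and 'x' to 'X' so both naming conventions match."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return name.upper()


def in_par(chrom, pos):
    """True if the position lies in a pseudo-autosomal region of X or Y."""
    regions = PAR_GRCH38.get(normalize_chrom(chrom), ())
    return any(start <= pos <= end for start, end in regions)


def is_hemizygous(chrom, pos, sex):
    """Males carry a single copy of X and Y outside the PARs."""
    return sex == "male" and normalize_chrom(chrom) in PAR_GRCH38 and not in_par(chrom, pos)


def is_de_novo_candidate(chrom, pos, sex, child_gt, child_vaf, parents):
    """parents is an iterable of (genotype, vaf) pairs for the relevant parents."""
    if "." in child_gt:  # drops './.', '0/.', '1/.' and similar partial calls
        return False
    for gt, vaf in parents:
        if gt != "0/0" or vaf >= PARENT_MAX_VAF:
            return False
    if is_hemizygous(chrom, pos, sex) or child_gt == "1/1":
        return child_vaf >= HOM_MIN_VAF
    return child_gt in ("0/1", "1/0") and child_vaf >= HET_MIN_VAF
```

The sketch leaves it to the caller to decide which parents are relevant at a sex-chromosome site, and it omits the population-frequency and gnomAD filter steps entirely.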

#### User Experience

- **Debug Mode**: Inspect specific variants by chromosome:position for troubleshooting
- **Verbose Mode**: Detailed filtering step information with variant counts
- **Automatic Output Naming**: Intelligent output file naming based on input and filter config
- **Configuration Examples**: Two comprehensive example configurations with extensive documentation
  - `rare_variants_high_impact.yml`: Ultra-rare, high-impact variant filtering
  - `de_novo_mutations.yml`: De novo mutation detection with full documentation

#### Documentation

- **Comprehensive README**: Complete usage guide with examples for all features
- **Example Workflows**: Real-world usage scenarios (rare disease, autism trios, etc.)
- **Input Requirements**: Detailed bcftools command examples for generating input files
- **VEP Annotation Guide**: Complete workflow from VEP annotation to PyWombat processing
- **Examples Directory**: Dedicated directory with configuration files and detailed README
- **Troubleshooting Section**: Common issues and solutions

#### Installation Methods

- **uvx Support**: One-line execution without installation (`uvx pywombat`)
- **uv Development Mode**: Local installation for repeated use (`uv sync`, `uv run wombat`)

### Changed

- Improved performance with streaming lazy operations
- Optimized parent genotype lookup (excludes 0/0 genotypes from storage)
- Enhanced error messages for better user experience
- Normalized chromosome names for PAR region matching (handles both 'X' and 'chrX')

### Fixed

- Sex column reading from pedigree file
- Parent genotype column naming consistency (father_id/mother_id)
- Genotype filtering to catch all partial genotypes (`./.`, `0/.`, `1/.`)
- PAR region matching for different chromosome naming conventions
- Empty chunk handling in output to avoid blank lines

### Performance Optimizations

- Delayed annotation expansion (filter before expanding `(null)` field)
- Vectorized filtering operations (no Python loops)
- Early genotype filtering (skip 0/0 before parent lookup)
- Optimized parent lookup (stores only non-reference genotypes)
- Streaming mode by default for memory efficiency

### Removed

- **Progress bar options**: Removed `--progress`/`--no-progress` and `--chunk-size` options for simplicity
- **Chunked processing mode**: Simplified to use only efficient streaming mode

## [0.5.0] - 2026-01-20

### Added

- Initial de novo mutation detection implementation
- Pedigree file support
- Basic quality filtering
- Expression-based filtering

### Known Issues

- Progress bar had reliability issues (removed in 1.0.0)
- Chunked processing was complex (simplified in 1.0.0)

---

## Release Notes

### v1.0.0 - Production Ready

This release marks PyWombat as production-ready for:

- Rare disease gene discovery
- De novo mutation detection in autism and developmental disorders
- Trio and family-based variant analysis
- High-throughput variant filtering workflows

**Recommended for**: Research groups working with rare variants, de novo mutations, and family-based genomic studies.

**Breaking Changes**: None from 0.5.0, but removed progress bar options for cleaner interface.

---

[1.0.0]: https://github.com/bourgeron-lab/pywombat/releases/tag/v1.0.0
[0.5.0]: https://github.com/bourgeron-lab/pywombat/releases/tag/v0.5.0