pywombat 0.5.0__py3-none-any.whl → 1.0.0__py3-none-any.whl

@@ -1,142 +0,0 @@
- Metadata-Version: 2.4
- Name: pywombat
- Version: 0.5.0
- Summary: A CLI tool for processing and filtering bcftools tabulated TSV files with pedigree support
- Project-URL: Homepage, https://github.com/bourgeron-lab/pywombat
- Project-URL: Repository, https://github.com/bourgeron-lab/pywombat
- Project-URL: Issues, https://github.com/bourgeron-lab/pywombat/issues
- Author-email: Freddy Cliquet <fcliquet@pasteur.fr>
- License: MIT
- Keywords: bioinformatics,genomics,pedigree,variant-calling,vcf
- Classifier: Development Status :: 3 - Alpha
- Classifier: Intended Audience :: Science/Research
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
- Requires-Python: >=3.12
- Requires-Dist: click>=8.1.0
- Requires-Dist: polars>=0.19.0
- Requires-Dist: pyyaml>=6.0
- Description-Content-Type: text/markdown
-
- # PyWombat
-
- A CLI tool for processing bcftools tabulated TSV files.
-
- ## Installation
-
- This is a UV-managed Python package. To install:
-
- ```bash
- uv sync
- ```
-
- ## Usage
-
- The `wombat` command processes bcftools tabulated TSV files:
-
- ```bash
- # Format a bcftools TSV file and print to stdout
- wombat input.tsv
-
- # Format and save to output file (creates output.tsv by default)
- wombat input.tsv -o output
-
- # Format and save as parquet
- wombat input.tsv -o output -f parquet
- wombat input.tsv -o output --format parquet
-
- # Format with pedigree information to add parent genotypes
- wombat input.tsv --pedigree pedigree.tsv -o output
- ```
-
- ### What does `wombat` do?
-
- The `wombat` command processes bcftools tabulated TSV files by:
-
- 1. **Expanding the `(null)` column**: This column contains multiple fields in the format `NAME=value` separated by semicolons (e.g., `DP=30;AF=0.5;AC=2`). Each field is extracted into its own column.
-
- 2. **Preserving the `CSQ` column**: The CSQ (Consequence) column is preserved as-is and not melted, allowing VEP annotations to remain intact.
-
- 3. **Melting and splitting sample columns**: After the `(null)` column, there are typically sample columns with values in `GT:DP:GQ:AD` format. The tool:
-    - Extracts the sample name (the part before the first `:` character)
-    - Transforms the wide format into long format
-    - Creates a `sample` column with the sample names
-    - Splits the sample values into separate columns:
-      - `sample_gt`: Genotype (e.g., 0/1, 1/1)
-      - `sample_dp`: Read depth
-      - `sample_gq`: Genotype quality
-      - `sample_ad`: Allele depth (takes the second value from comma-separated list)
-      - `sample_vaf`: Variant allele frequency (calculated as sample_ad / sample_dp)
-
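The field-level steps in the removed README (expanding `NAME=value` pairs and splitting a `GT:DP:GQ:AD` value) can be illustrated with a minimal standard-library sketch. This is not pywombat's implementation, which depends on Polars. The function names `expand_info` and `split_sample` are hypothetical, and the handling of entries without `=` and of zero depth is an assumption. Well-formed values with no missing fields are assumed.

```python
def expand_info(info: str) -> dict:
    """Split 'NAME=value;NAME=value' into a {name: value} mapping."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        name, sep, value = item.partition("=")
        # Entries without '=' are stored as empty strings (an assumption).
        fields[name] = value if sep else ""
    return fields


def split_sample(value: str) -> dict:
    """Split a 'GT:DP:GQ:AD' value and derive sample_vaf = sample_ad / sample_dp."""
    gt, dp, gq, ad = value.split(":")
    depth = int(dp)
    alt_depth = int(ad.split(",")[1])  # second value of the comma-separated AD list
    vaf = alt_depth / depth if depth else None  # assumption: no VAF when depth is 0
    return {
        "sample_gt": gt,
        "sample_dp": depth,
        "sample_gq": int(gq),
        "sample_ad": alt_depth,
        "sample_vaf": vaf,
    }


print(expand_info("DP=30;AF=0.5;AC=2"))
print(split_sample("0/1:15:99:5,10"))
```
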
- ### Example
-
- **Input:**
-
- ```tsv
- CHROM POS REF ALT (null) Sample1:GT:Sample1:DP:Sample1:GQ:Sample1:AD Sample2:GT:Sample2:DP:Sample2:GQ:Sample2:AD
- chr1 100 A T DP=30;AF=0.5;AC=2 0/1:15:99:5,10 1/1:18:99:0,18
- ```
-
- **Output:**
-
- ```tsv
- CHROM POS REF ALT AC AF DP sample sample_gt sample_dp sample_gq sample_ad sample_vaf
- chr1 100 A T 2 0.5 30 Sample1 0/1 15 99 10 0.6667
- chr1 100 A T 2 0.5 30 Sample2 1/1 18 99 18 1.0
- ```
-
- Notes:
-
- - The `sample_ad` column contains the second value from the AD field (e.g., from `5,10` it extracts `10`)
- - The `sample_vaf` column is the variant allele frequency calculated as `sample_ad / sample_dp`
- - By default, output is in TSV format. Use `-f parquet` to output as Parquet files
- - The `-o` option specifies an output prefix (e.g., `-o output` creates `output.tsv` or `output.parquet`)
-
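The wide-to-long reshaping behind the example above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's Polars code: the helper `melt_samples` is hypothetical, every column after `(null)` is treated as a sample column (a `CSQ` column is not handled here), and the per-sample value is left unsplit in a `value` field.

```python
def melt_samples(header, row, info_column="(null)"):
    """Turn one wide row into one record per sample column."""
    info_idx = header.index(info_column)
    # Columns up to and including the info column are shared by every sample.
    shared = dict(zip(header[: info_idx + 1], row[: info_idx + 1]))
    long_rows = []
    for column, value in zip(header[info_idx + 1 :], row[info_idx + 1 :]):
        # The sample name is the part before the first ':' in the column header.
        sample = column.split(":", 1)[0]
        long_rows.append({**shared, "sample": sample, "value": value})
    return long_rows


header = [
    "CHROM", "POS", "REF", "ALT", "(null)",
    "Sample1:GT:Sample1:DP:Sample1:GQ:Sample1:AD",
    "Sample2:GT:Sample2:DP:Sample2:GQ:Sample2:AD",
]
row = ["chr1", "100", "A", "T", "DP=30;AF=0.5;AC=2", "0/1:15:99:5,10", "1/1:18:99:0,18"]

for record in melt_samples(header, row):
    print(record["sample"], record["value"])
```

Each input row yields as many records as there are sample columns, which is why the single variant in the example becomes two output rows.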
- ### Pedigree Support
-
- You can provide a pedigree file with the `--pedigree` option to add parent genotype information to the output. This enables trio analysis by including the father's and mother's genotypes for each sample.
-
- **Pedigree File Format:**
-
- The pedigree file should be a tab-separated file with the following columns:
-
- - `FID`: Family ID
- - `sample_id`: Sample identifier (matches the sample names in the VCF)
- - `FatherBarcode`: Father's sample identifier (use `0` or `-9` if unknown)
- - `MotherBarcode`: Mother's sample identifier (use `0` or `-9` if unknown)
- - `Sex`: Sex of the sample (optional)
- - `Pheno`: Phenotype information (optional)
-
- Example pedigree file:
-
- ```tsv
- FID sample_id FatherBarcode MotherBarcode Sex Pheno
- FAM1 Child1 Father1 Mother1 1 2
- FAM1 Father1 0 0 1 1
- FAM1 Mother1 0 0 2 1
- ```
-
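Reading a pedigree file in the format described above could look like the following sketch, which uses only the standard `csv` module. The function `read_pedigree` and the shape of its return value are hypothetical, and treating an empty parent field as unknown is an assumption beyond the documented `0` and `-9` markers.

```python
import csv
import io

# "0" and "-9" come from the README; the empty string is an added assumption.
UNKNOWN_PARENT = {"0", "-9", ""}


def read_pedigree(handle):
    """Map each sample_id to its father and mother (None when unknown)."""
    reader = csv.DictReader(handle, delimiter="\t")
    pedigree = {}
    for record in reader:
        father = record["FatherBarcode"]
        mother = record["MotherBarcode"]
        pedigree[record["sample_id"]] = {
            "father": None if father in UNKNOWN_PARENT else father,
            "mother": None if mother in UNKNOWN_PARENT else mother,
        }
    return pedigree


example = (
    "FID\tsample_id\tFatherBarcode\tMotherBarcode\tSex\tPheno\n"
    "FAM1\tChild1\tFather1\tMother1\t1\t2\n"
    "FAM1\tFather1\t0\t0\t1\t1\n"
    "FAM1\tMother1\t0\t0\t2\t1\n"
)
print(read_pedigree(io.StringIO(example)))
```
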
- **Output with Pedigree:**
-
- When using `--pedigree`, the output will include additional columns for each parent:
-
- - `father_gt`, `father_dp`, `father_gq`, `father_ad`, `father_vaf`: Father's genotype information
- - `mother_gt`, `mother_dp`, `mother_gq`, `mother_ad`, `mother_vaf`: Mother's genotype information
-
- These columns will contain the parent's genotype data for the same variant, allowing you to analyze inheritance patterns.
-
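One way to picture how the parent columns listed above could be filled is a lookup of each parent's row for the same variant. The sketch below is an assumption about the approach, not the package's code: `add_parent_columns` is a hypothetical helper, the trio rows are made-up illustration data, and a variant is identified by `CHROM`, `POS`, `REF`, and `ALT`.

```python
FIELDS = ("gt", "dp", "gq", "ad", "vaf")


def add_parent_columns(rows, pedigree):
    """Copy each parent's sample_* values into father_* / mother_* columns."""
    # Index the long-format rows by variant and sample for direct lookup.
    by_key = {
        (r["CHROM"], r["POS"], r["REF"], r["ALT"], r["sample"]): r for r in rows
    }
    result = []
    for r in rows:
        variant = (r["CHROM"], r["POS"], r["REF"], r["ALT"])
        parents = pedigree.get(r["sample"], {})
        enriched = dict(r)
        for role in ("father", "mother"):
            parent_row = by_key.get(variant + (parents.get(role),))
            for field in FIELDS:
                # Unknown or absent parents yield None in every parent column.
                enriched[f"{role}_{field}"] = (
                    parent_row[f"sample_{field}"] if parent_row else None
                )
        result.append(enriched)
    return result


def make_row(sample, gt, dp, gq, ad):
    """Build one long-format row for a made-up variant at chr1:100 A>T."""
    return {
        "CHROM": "chr1", "POS": 100, "REF": "A", "ALT": "T", "sample": sample,
        "sample_gt": gt, "sample_dp": dp, "sample_gq": gq,
        "sample_ad": ad, "sample_vaf": ad / dp,
    }


# Hypothetical trio, for illustration only.
rows = [
    make_row("Child1", "0/1", 20, 99, 10),
    make_row("Father1", "0/1", 30, 99, 15),
    make_row("Mother1", "0/0", 25, 99, 0),
]
pedigree = {"Child1": {"father": "Father1", "mother": "Mother1"}}

for enriched in add_parent_columns(rows, pedigree):
    print(enriched["sample"], enriched["father_gt"], enriched["mother_gt"])
```
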
- ## Development
-
- This project uses:
-
- - **UV** for package management
- - **Polars** for fast data processing
- - **Click** for CLI interface
-
- ## Testing
-
- Test files are available in the `tests/` directory:
-
- - `test.tabulated.tsv` - Real bcftools output
- - `test_small.tsv` - Small example for quick testing
@@ -1,6 +0,0 @@
- pywombat/__init__.py,sha256=iIPN9vJtsIUhl_DiKNnknxCamLinfayodLLFK8y-aJg,54
- pywombat/cli.py,sha256=0nBlwyRu1Q01a0EHcVyIYtKmgezCWA85pQtEXpnuzL4,44535
- pywombat-0.5.0.dist-info/METADATA,sha256=2Py8xwNxZBD18u4r-tJI_mQezMBg4td3ruWOm61MbdA,4982
- pywombat-0.5.0.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
- pywombat-0.5.0.dist-info/entry_points.txt,sha256=Vt7U2ypbiEgCBlEV71ZPk287H5_HKmPBT4iBu6duEcE,44
- pywombat-0.5.0.dist-info/RECORD,,