oxymetag 1.0.0__py3-none-any.whl

@@ -0,0 +1,235 @@
1
+ Metadata-Version: 2.1
2
+ Name: oxymetag
3
+ Version: 1.0.0
4
+ Summary: Oxygen metabolism profiling from metagenomic data
5
+ Home-page: https://github.com/cliffbueno/oxymetag
6
+ Author: Clifton P. Bueno de Mesquita
7
+ Author-email: cliff.buenodemesquita@colorado.edu
8
+ Classifier: Development Status :: 4 - Beta
9
+ Classifier: Intended Audience :: Science/Research
10
+ Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
11
+ Classifier: Operating System :: OS Independent
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.7
14
+ Classifier: Programming Language :: Python :: 3.8
15
+ Classifier: Programming Language :: Python :: 3.9
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
18
+ Requires-Python: >=3.7
19
+ Description-Content-Type: text/markdown
20
+ License-File: LICENSE
21
+ Requires-Dist: pandas >=1.3.0
22
+ Requires-Dist: numpy >=1.20.0
23
+
24
+ # OxyMetaG
25
+
26
+ Oxygen metabolism profiling from metagenomic data using Pfam domains. OxyMetaG predicts the percent relative abundance of aerobic bacteria in metagenomic reads based on the ratio of abundances of a set of 20 Pfams. Because of memory requirements, particularly for the step of extracting bacterial reads, we recommend running OxyMetaG on an HPC cluster or server rather than a laptop. If you already have bacterial reads, the "profile" and "predict" functions will run quickly on a laptop.
27
+
28
+ If you are working with modern metagenomes, we recommend first quality filtering the raw reads with your method of choice and standard practices, and then extracting bacterial reads with Kraken2 and KrakenTools, which is performed with the OxyMetaG extract function.
29
+
30
+ If you are working with ancient metagenomes, we recommend first quality filtering the raw reads with your method of choice and standard practices, and then extracting bacterial reads with a workflow optimized for ancient DNA, such as the one employed by De Sanctis et al. (2025).
31
+
32
+ ## Installation
33
+
34
+ First clone the repository.
35
+
36
+ ```bash
37
+ git clone https://github.com/cliffbueno/oxymetag.git
38
+ cd oxymetag
39
+ ```
40
+
41
+ ### Using Conda (Recommended)
42
+
43
+ ```bash
44
+ # Create environment with dependencies
45
+ conda env create -f environment.yml
46
+ conda activate oxymetag
47
+
48
+ # Install OxyMetaG
49
+ pip install oxymetag
50
+ ```
51
+
52
+ ### Using Pip
53
+
54
+ First install external dependencies:
55
+ - Kraken2
56
+ - DIAMOND
57
+ - KrakenTools
58
+ - R with mgcv and dplyr packages
59
+
60
+ Then install OxyMetaG:
61
+ ```bash
62
+ pip install oxymetag
63
+ ```
64
+
65
+ ## Quick Start
66
+
67
+ ### 1. Set up the standard Kraken2 database
68
+ ```bash
69
+ oxymetag setup
70
+ ```
71
+
72
+ ### 2. Extract bacterial reads
73
+ ```bash
74
+ oxymetag extract -i sample1_R1.fastq.gz sample1_R2.fastq.gz -o BactReads -t 48
75
+ ```
76
+
77
+ ### 3. Profile samples
78
+ ```bash
79
+ oxymetag profile -i BactReads -o diamond_output -t 8
80
+ ```
81
+
82
+ ### 4. Predict aerobe levels
83
+ ```bash
84
+ # For modern DNA
85
+ oxymetag predict -i diamond_output -o per_aerobe_predictions.tsv -m modern
86
+
87
+ # For ancient DNA
88
+ oxymetag predict -i diamond_output -o per_aerobe_predictions.tsv -m ancient
89
+
90
+ # Custom cutoffs
91
+ oxymetag predict -i diamond_output -o per_aerobe_predictions.tsv -m custom --idcut 50 --bitcut 30 --ecut 0.01
92
+ ```
93
+
94
+ ## Commands
95
+
96
+ ### oxymetag setup
97
+ **Function:** Sets up the standard Kraken2 database for taxonomic classification.
98
+
99
+ **What it does:** Downloads and builds the standard Kraken2 database containing bacterial, archaeal, and viral genomes. This database is used by the `extract` command to identify bacterial sequences from metagenomic samples.
100
+
101
+ **Time:** 2-4 hours depending on internet speed and system performance.
102
+
103
+ **Output:** Creates a `kraken2_db/` directory with the standard database.
104
+
105
+ Make sure you run `oxymetag setup` from the directory where you want the database to live, or plan to always specify the `--kraken-db` path when running `extract`. The database is quite large (~50-100 GB), so choose a location with sufficient storage.
106
+
107
+ ---
108
+
109
+ ### oxymetag extract
110
+ **Function:** Extracts bacterial reads from metagenomic samples using taxonomic classification.
111
+
112
+ **What it does:**
113
+ 1. Runs Kraken2 to classify all reads in your metagenomic samples
114
+ 2. Uses KrakenTools to extract only the reads classified as bacterial
115
+ 3. Outputs cleaned bacterial-only FASTQ files for downstream analysis
116
+
117
+ **Input:** Quality filtered metagenomic read FASTQ files (paired-end or merged)\
118
+ **Output:** Bacterial-only FASTQ files in `BactReads/` directory
119
+
120
+ **Arguments:**
121
+ - `-i, --input`: Input fastq.gz files (paired-end or merged)
122
+ - `-o, --output`: Output directory (default: BactReads)
123
+ - `-t, --threads`: Number of threads (default: 48)
124
+ - `--kraken-db`: Kraken2 database path (default: kraken2_db)
125
+
126
+ ---
127
+
128
+ ### oxymetag profile
129
+ **Function:** Profiles bacterial reads against oxygen metabolism protein domains.
130
+
131
+ **What it does:**
132
+ 1. Takes bacterial-only reads from the `extract` step
133
+ 2. Uses DIAMOND blastx to search against a curated database of 20 Pfam domains related to oxygen metabolism
134
+ 3. Identifies protein-coding sequences and their functional annotations
135
+ 4. Creates detailed hit tables for each sample
136
+
137
+ **Input:** Bacterial FASTQ files (uses R1 or merged reads only)\
138
+ **Output:** DIAMOND alignment files (TSV format) in `diamond_output/` directory
139
+
140
+ **Arguments:**
141
+ - `-i, --input`: Input directory with bacterial reads (default: BactReads)
142
+ - `-o, --output`: Output directory (default: diamond_output)
143
+ - `-t, --threads`: Number of threads (default: 4)
144
+ - `--diamond-db`: Custom DIAMOND database path (optional)
145
+
146
+ ---
147
+
148
+ ### oxymetag predict
149
+ **Function:** Predicts aerobe abundance from protein domain profiles using machine learning.
150
+
151
+ **What it does:**
152
+ 1. Processes DIAMOND output files with appropriate quality filters
153
+ 2. Normalizes protein domain counts by gene length (reads per kilobase)
154
+ 3. Calculates aerobic/anaerobic domain ratios for each sample
155
+ 4. Applies a trained GAM (Generalized Additive Model) to predict percentage of aerobes
156
+ 5. Outputs a table with the sampleID, # Pfams detected, and predicted % aerobic bacteria
157
+
158
+ **Input:** DIAMOND output directory from `profile` step\
159
+ **Output:** Tab-separated file with aerobe predictions for each sample
160
+
161
+ **Arguments:**
162
+ - `-i, --input`: Directory with DIAMOND output (default: diamond_output)
163
+ - `-o, --output`: Output file (default: per_aerobe_predictions.tsv)
164
+ - `-t, --threads`: Number of threads (default: 4)
165
+ - `-m, --mode`: Filtering mode - 'modern', 'ancient', or 'custom' (default: modern)
166
+ - `--idcut`: Custom identity cutoff (for custom mode)
167
+ - `--bitcut`: Custom bitscore cutoff (for custom mode)
168
+ - `--ecut`: Custom e-value cutoff (for custom mode)
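Steps 2 and 3 of `predict` (length normalization and the aerobic/anaerobic ratio) can be sketched roughly as follows. This is an illustration only: the packaged logic lives in `predict_oxygen.R`, and the function names, the Pfam labels, the length units, and the choice to sum reads per kilobase within each group before dividing are all assumptions, not the author's implementation.

```python
def reads_per_kilobase(hit_counts, domain_lengths):
    """Divide each Pfam's hit count by its length in kilobases.

    Assumption: lengths are given in the same units the tool uses
    (see pfam_lengths.tsv); the README does not state the units.
    """
    return {
        pfam: count / (domain_lengths[pfam] / 1000.0)
        for pfam, count in hit_counts.items()
    }


def aerobic_anaerobic_ratio(rpk, aerobic_pfams, anaerobic_pfams):
    """Ratio of summed aerobic RPK to summed anaerobic RPK (assumed form)."""
    aerobic = sum(rpk.get(pfam, 0.0) for pfam in aerobic_pfams)
    anaerobic = sum(rpk.get(pfam, 0.0) for pfam in anaerobic_pfams)
    if anaerobic == 0:
        return None  # ratio is undefined without anaerobic hits
    return aerobic / anaerobic


# Hypothetical labels and counts, for illustration only.
counts = {"aerobic_A": 10, "anaerobic_B": 5}
lengths = {"aerobic_A": 1000, "anaerobic_B": 500}
rpk = reads_per_kilobase(counts, lengths)
ratio = aerobic_anaerobic_ratio(rpk, ["aerobic_A"], ["anaerobic_B"])
```

The ratio is the quantity a trained model would then map to a percentage; the GAM itself is not reproduced here.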
169
+
170
+ ## Filtering Modes
171
+
172
+ OxyMetaG includes three pre-configured filtering modes optimized for different types of DNA:
173
+
174
+ ### Modern DNA (default)
175
+ **Best for:** Modern environmental metagenomes
176
+ - Identity ≥ 60%
177
+ - Bitscore ≥ 50
178
+ - E-value ≤ 0.001
179
+
180
+ ### Ancient DNA
181
+ **Best for:** Archaeological samples, paleogenomic data, degraded environmental DNA
182
+ - Identity ≥ 45% (accounts for DNA damage)
183
+ - Bitscore ≥ 25 (accommodates shorter fragments)
184
+ - E-value ≤ 0.1 (more permissive for low-quality data)
185
+
186
+ ### Custom
187
+ **Best for:** Specialized applications or when you want to optimize parameters
188
+ - Specify your own `--idcut`, `--bitcut`, and `--ecut` values
189
+ - Useful for method development or unusual sample types
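The three modes amount to different thresholds on the same three alignment statistics. A minimal sketch, assuming each DIAMOND hit exposes percent identity, bitscore, and e-value; the function and dictionary names are hypothetical, and only the cutoff values come from the lists above.

```python
# Cutoffs copied from the "Modern DNA" and "Ancient DNA" lists above.
MODE_CUTOFFS = {
    "modern": {"idcut": 60.0, "bitcut": 50.0, "ecut": 0.001},
    "ancient": {"idcut": 45.0, "bitcut": 25.0, "ecut": 0.1},
}


def passes_filter(pident, bitscore, evalue, mode="modern", custom=None):
    """Return True if a hit meets the identity, bitscore, and e-value cutoffs.

    For mode="custom", `custom` must be a dict with the same three keys,
    mirroring the --idcut, --bitcut, and --ecut options.
    """
    if mode == "custom":
        if custom is None:
            raise ValueError("custom mode needs idcut, bitcut, and ecut")
        cutoffs = custom
    else:
        cutoffs = MODE_CUTOFFS[mode]
    return (
        pident >= cutoffs["idcut"]
        and bitscore >= cutoffs["bitcut"]
        and evalue <= cutoffs["ecut"]
    )
```

For example, a hit with 50% identity, bitscore 30, and e-value 0.01 is kept in ancient mode but dropped in modern mode.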
190
+
191
+ ## Output
192
+
193
+ The final output (`per_aerobe_predictions.tsv`) contains:
194
+ - `SampleID`: Sample identifier extracted from filenames
195
+ - `ratio`: Aerobic/anaerobic domain ratio
196
+ - `aerobe_pfams`: Number of aerobic Pfam domains detected
197
+ - `anaerobe_pfams`: Number of anaerobic Pfam domains detected
198
+ - `Per_aerobe`: **Predicted percentage of aerobic bacteria (0-100%)**
199
+
200
+ ## Biological Interpretation
201
+
202
+ The `Per_aerobe` value represents the predicted percentage of aerobic bacteria in your sample based on functional gene content:
203
+
204
+ - **0-20%**: Predominantly anaerobic community (e.g., sediments, anoxic environments)
205
+ - **20-40%**: Mixed anaerobic community with some aerobic components
206
+ - **40-60%**: Balanced aerobic/anaerobic community
207
+ - **60-80%**: Predominantly aerobic community
208
+ - **80-100%**: Highly aerobic community (e.g., surface soils, oxic water)
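The ranges above can be applied to a `Per_aerobe` value with a small helper like the one below. The function name is hypothetical, and because the listed ranges share their endpoints, the sketch assumes each lower bound is inclusive and each upper bound exclusive, except that 100 falls in the top bin.

```python
def interpret_per_aerobe(per_aerobe):
    """Map a predicted percentage of aerobes to the README's categories."""
    if not 0 <= per_aerobe <= 100:
        raise ValueError("Per_aerobe must be between 0 and 100")
    if per_aerobe < 20:
        return "Predominantly anaerobic community"
    if per_aerobe < 40:
        return "Mixed anaerobic community with some aerobic components"
    if per_aerobe < 60:
        return "Balanced aerobic/anaerobic community"
    if per_aerobe < 80:
        return "Predominantly aerobic community"
    return "Highly aerobic community"
```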
209
+
210
+ ## Citation
211
+
212
+ If you use OxyMetaG in your research, please cite:
213
+
214
+ ```
215
+ Bueno de Mesquita, C.P., Stallard-Olivera, E., Fierer, N. (2025). Predicting the proportion of aerobic and anaerobic bacteria from metagenomic reads with OxyMetaG.
216
+ ```
217
+ If you use the extract function, also cite Kraken2 and KrakenTools:
218
+
219
+ ```
220
+ Lu, J., Rincon, N., Wood, D.E. et al. Metagenome analysis using the Kraken software suite. Nat Protoc 17, 2815–2839 (2022). https://doi.org/10.1038/s41596-022-00738-y
221
+ ```
222
+ If you use the profile function, also cite DIAMOND:
223
+
224
+ ```
225
+ Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176
226
+ ```
227
+
228
+ ## License
229
+
230
+ GPL-3.0 License
231
+
232
+ ## Support
233
+
234
+ For questions, bug reports, or feature requests, please open an issue on GitHub:
235
+ https://github.com/cliffbueno/oxymetag/issues
@@ -0,0 +1,18 @@
1
+ oxymetag/__init__.py,sha256=GM_D7cuLGpBCmwCp3PkT42Q3mnryV3BDvMWjMBqt4L8,441
2
+ oxymetag/cli.py,sha256=ITt3UNUFrdeIskjSaK-705oOcAPUlzq5acqlWCXVDqM,4478
3
+ oxymetag/core.py,sha256=AkuXmhhI2It-QFdfykJcZ-e_6i124yS9RMM-BAH6RlU,8565
4
+ oxymetag/utils.py,sha256=x-WVnR-yNawY13alF_8J9Ihkjpaeg15IlEpZXyn2JSU,2604
5
+ oxymetag/data/.DS_Store,sha256=1lFlJ5EFymdzGAUAaI30vcaaLHt3F1LwpG7xILf9jsM,6148
6
+ oxymetag/data/Oxygen_pfams.csv,sha256=f3_CFPy235BxbX-2Ami8dJpTLQDImM8e_m87QcJvYKo,673
7
+ oxymetag/data/oxygen_model.rds,sha256=8BMWnnIKCALapQnJKLpMnqTymFNAW46E-cQJGU2tJu0,8221
8
+ oxymetag/data/oxymetag_pfams.dmnd,sha256=sSrkriGi-x4Ybf-pQtRRCFh1Wjm4rbQeX9FL9buDbAE,1472431
9
+ oxymetag/data/pfam_headers_table.txt,sha256=wMg4WvlST6Zi3EzVFudjFHyREqNk8kHDI9Q6th7FdFY,255832
10
+ oxymetag/data/pfam_lengths.tsv,sha256=--0bGxDN2v_WiBo0rKFJMPeOPsOrbaNhyjPAoIF9E5A,366
11
+ oxymetag/scripts/predict_oxygen.R,sha256=72Eum7XFtJ-Be5vdIqY8FFIuPnEWTGYDMaZWv5OTPtQ,5549
12
+ tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
13
+ oxymetag-1.0.0.dist-info/LICENSE,sha256=OXLcl0T2SZ8Pmy2_dmlvKuetivmyPd5m1q-Gyd-zaYY,35149
14
+ oxymetag-1.0.0.dist-info/METADATA,sha256=eo8h97bwY2mGTgkTNpNjwomUpq6c_k8q4EEsyMiZDIk,8968
15
+ oxymetag-1.0.0.dist-info/WHEEL,sha256=iAkIy5fosb7FzIOwONchHf19Qu7_1wCWyFNR5gu9nU0,91
16
+ oxymetag-1.0.0.dist-info/entry_points.txt,sha256=-9xMAfrSPtFBEvQWRNVKROTM_3OjEik34mVEsYFwM2k,47
17
+ oxymetag-1.0.0.dist-info/top_level.txt,sha256=G7EHL5Fpxne8CH3w5IDIkrsRmMzaEfOhmTngNovoYi8,15
18
+ oxymetag-1.0.0.dist-info/RECORD,,
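Each RECORD entry above has the form `path,sha256=<digest>,size`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. A minimal sketch for checking an entry against file contents; the helper names are hypothetical.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Hash bytes the way a wheel RECORD file does."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sha256={encoded}"


def parse_record_line(line: str):
    """Split a RECORD line into (path, hash field, size or None)."""
    # Split from the right so commas inside the path are left alone.
    path, hash_field, size = line.rsplit(",", 2)
    return path, hash_field, int(size) if size else None


# tests/__init__.py is listed with size 0, so its hash is that of empty input.
path, expected, size = parse_record_line(
    "tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
)
matches = record_hash(b"") == expected
```

The RECORD file's own entry carries no hash or size, which is why its line ends in two commas.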
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (75.3.2)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ oxymetag = oxymetag.cli:main
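The `console_scripts` entry above is what makes `oxymetag` available as a shell command: the installer generates a launcher that imports `oxymetag.cli` and calls `main`. A rough sketch of that resolution for simple `name = module:attribute` lines; the helper names are hypothetical, and this is not how pip implements it internally.

```python
import importlib


def parse_entry_point(line: str):
    """Split 'name = module:attr' into (name, module, attr)."""
    name, _, target = (part.strip() for part in line.partition("="))
    module_name, _, attr = target.partition(":")
    return name, module_name, attr


def load_entry_point(line: str):
    """Import the module and return the object the entry point names."""
    _, module_name, attr = parse_entry_point(line)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


parsed = parse_entry_point("oxymetag = oxymetag.cli:main")

# Demonstrated with a standard-library target so the sketch runs
# without oxymetag installed.
dumps = load_entry_point("dumps = json:dumps")
```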
@@ -0,0 +1,2 @@
1
+ oxymetag
2
+ tests
tests/__init__.py ADDED
File without changes