oxymetag 1.0.0__py3-none-any.whl

Files changed (18)

oxymetag/__init__.py +18 -0
oxymetag/cli.py +107 -0
oxymetag/core.py +226 -0
oxymetag/data/.DS_Store +0 -0
oxymetag/data/Oxygen_pfams.csv +21 -0
oxymetag/data/oxygen_model.rds +0 -0
oxymetag/data/oxymetag_pfams.dmnd +0 -0
oxymetag/data/pfam_headers_table.txt +3870 -0
oxymetag/data/pfam_lengths.tsv +21 -0
oxymetag/scripts/predict_oxygen.R +160 -0
oxymetag/utils.py +73 -0
oxymetag-1.0.0.dist-info/LICENSE +674 -0
oxymetag-1.0.0.dist-info/METADATA +235 -0
oxymetag-1.0.0.dist-info/RECORD +18 -0
oxymetag-1.0.0.dist-info/WHEEL +5 -0
oxymetag-1.0.0.dist-info/entry_points.txt +2 -0
oxymetag-1.0.0.dist-info/top_level.txt +2 -0
tests/__init__.py +0 -0

oxymetag-1.0.0.dist-info/METADATA ADDED

@@ -0,0 +1,235 @@
+Metadata-Version: 2.1
+Name: oxymetag
+Version: 1.0.0
+Summary: Oxygen metabolism profiling from metagenomic data
+Home-page: https://github.com/cliffbueno/oxymetag
+Author: Clifton P. Bueno de Mesquita
+Author-email: cliff.buenodemesquita@colorado.edu
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.7
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+Requires-Python: >=3.7
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pandas >=1.3.0
+Requires-Dist: numpy >=1.20.0
+# OxyMetaG
+Oxygen metabolism profiling from metagenomic data using Pfam domains. OxyMetaG predicts the percent relative abundance of aerobic bacteria in metagenomic reads based on the ratio of abundances of a set of 20 Pfams. Because of memory requirements, particularly for the step that extracts bacterial reads, we recommend running OxyMetaG on an HPC cluster or server rather than a laptop. If you already have bacterial reads, the "profile" and "predict" functions will run quickly on a laptop.
+If you are working with modern metagenomes, we recommend first quality filtering the raw reads with your method of choice and standard practices, and then extracting bacterial reads with Kraken2 and KrakenTools, which is performed with the OxyMetaG extract function.
+If you are working with ancient metagenomes, we recommend first quality filtering the raw reads with your method of choice and standard practices, and then extracting bacterial reads with a workflow optimized for ancient DNA, such as the one employed by De Sanctis et al. (2025).
+## Installation
+First clone the repository.
+```bash
+git clone https://github.com/cliffbueno/oxymetag.git
+cd oxymetag
+```
+### Using Conda (Recommended)
+```bash
+# Create environment with dependencies
+conda env create -f environment.yml
+conda activate oxymetag
+# Install OxyMetaG
+pip install oxymetag
+```
+### Using Pip
+First install external dependencies:
+- Kraken2
+- DIAMOND
+- KrakenTools
+- R with mgcv and dplyr packages
+Then install OxyMetaG:
+```bash
+pip install oxymetag
+```
+## Quick Start
+### 1. Set up the standard Kraken2 database
+```bash
+oxymetag setup
+```
+### 2. Extract bacterial reads
+```bash
+oxymetag extract -i sample1_R1.fastq.gz sample1_R2.fastq.gz -o BactReads -t 48
+```
+### 3. Profile samples
+```bash
+oxymetag profile -i BactReads -o diamond_output -t 8
+```
+### 4. Predict aerobe levels
+```bash
+# For modern DNA
+oxymetag predict -i diamond_output -o per_aerobe_predictions.tsv -m modern
+# For ancient DNA
+oxymetag predict -i diamond_output -o per_aerobe_predictions.tsv -m ancient
+# Custom cutoffs
+oxymetag predict -i diamond_output -o per_aerobe_predictions.tsv -m custom --idcut 50 --bitcut 30 --ecut 0.01
+```
+## Commands
+### oxymetag setup
+**Function:** Sets up the standard Kraken2 database for taxonomic classification.
+**What it does:** Downloads and builds the standard Kraken2 database containing bacterial, archaeal, and viral genomes. This database is used by the `extract` command to identify bacterial sequences from metagenomic samples.
+**Time:** 2-4 hours depending on internet speed and system performance.
+**Output:** Creates a `kraken2_db/` directory with the standard database.
+Make sure you run `oxymetag setup` from the directory where you want the database to live, or plan to always specify the `--kraken-db` path when running `extract`. The database is quite large (~50-100 GB), so choose a location with sufficient storage.
+---
+### oxymetag extract
+**Function:** Extracts bacterial reads from metagenomic samples using taxonomic classification.
+**What it does:**
+1. Runs Kraken2 to classify all reads in your metagenomic samples
+2. Uses KrakenTools to extract only the reads classified as bacterial
+3. Outputs cleaned bacterial-only FASTQ files for downstream analysis
+**Input:** Quality filtered metagenomic read FASTQ files (paired-end or merged)\
+**Output:** Bacterial-only FASTQ files in `BactReads/` directory
+**Arguments:**
+- `-i, --input`: Input fastq.gz files (paired-end or merged)
+- `-o, --output`: Output directory (default: BactReads)
+- `-t, --threads`: Number of threads (default: 48)
+- `--kraken-db`: Kraken2 database path (default: kraken2_db)
+---
+### oxymetag profile
+**Function:** Profiles bacterial reads against oxygen metabolism protein domains.
+**What it does:**
+1. Takes bacterial-only reads from the `extract` step
+2. Uses DIAMOND blastx to search against a curated database of 20 Pfam domains related to oxygen metabolism
+3. Identifies protein-coding sequences and their functional annotations
+4. Creates detailed hit tables for each sample
+**Input:** Bacterial FASTQ files (uses R1 or merged reads only)\
+**Output:** DIAMOND alignment files (TSV format) in `diamond_output/` directory
+**Arguments:**
+- `-i, --input`: Input directory with bacterial reads (default: BactReads)
+- `-o, --output`: Output directory (default: diamond_output)
+- `-t, --threads`: Number of threads (default: 4)
+- `--diamond-db`: Custom DIAMOND database path (optional)
+---
+### oxymetag predict
+**Function:** Predicts aerobe abundance from protein domain profiles using machine learning.
+**What it does:**
+1. Processes DIAMOND output files with appropriate quality filters
+2. Normalizes protein domain counts by gene length (reads per kilobase)
+3. Calculates aerobic/anaerobic domain ratios for each sample
+4. Applies a trained GAM (Generalized Additive Model) to predict percentage of aerobes
+5. Outputs a table with the sample ID, the number of Pfams detected, and the predicted percentage of aerobic bacteria
+**Input:** DIAMOND output directory from `profile` step\
+**Output:** Tab-separated file with aerobe predictions for each sample
+**Arguments:**
+- `-i, --input`: Directory with DIAMOND output (default: diamond_output)
+- `-o, --output`: Output file (default: per_aerobe_predictions.tsv)
+- `-t, --threads`: Number of threads (default: 4)
+- `-m, --mode`: Filtering mode - 'modern', 'ancient', or 'custom' (default: modern)
+- `--idcut`: Custom identity cutoff (for custom mode)
+- `--bitcut`: Custom bitscore cutoff (for custom mode)
+- `--ecut`: Custom e-value cutoff (for custom mode)
+## Filtering Modes
+OxyMetaG includes three pre-configured filtering modes optimized for different types of DNA:
+### Modern DNA (default)
+**Best for:** Modern environmental metagenomes
+- Identity ≥ 60%
+- Bitscore ≥ 50
+- E-value ≤ 0.001
+### Ancient DNA
+**Best for:** Archaeological samples, paleogenomic data, degraded environmental DNA
+- Identity ≥ 45% (accounts for DNA damage)
+- Bitscore ≥ 25 (accommodates shorter fragments)
+- E-value ≤ 0.1 (more permissive for low-quality data)
+### Custom
+**Best for:** Specialized applications or when you want to optimize parameters
+- Specify your own `--idcut`, `--bitcut`, and `--ecut` values
+- Useful for method development or unusual sample types
+## Output
+The final output (`per_aerobe_predictions.tsv`) contains:
+- `SampleID`: Sample identifier extracted from filenames
+- `ratio`: Aerobic/anaerobic domain ratio
+- `aerobe_pfams`: Number of aerobic Pfam domains detected
+- `anaerobe_pfams`: Number of anaerobic Pfam domains detected
+- `Per_aerobe`: **Predicted percentage of aerobic bacteria (0-100%)**
+## Biological Interpretation
+The `Per_aerobe` value represents the predicted percentage of aerobic bacteria in your sample based on functional gene content:
+- **0-20%**: Predominantly anaerobic community (e.g., sediments, anoxic environments)
+- **20-40%**: Mixed anaerobic community with some aerobic components
+- **40-60%**: Balanced aerobic/anaerobic community
+- **60-80%**: Predominantly aerobic community
+- **80-100%**: Highly aerobic community (e.g., surface soils, oxic water)
+## Citation
+If you use OxyMetaG in your research, please cite:
+```
+Bueno de Mesquita, C.P., Stallard-Olivera, E., Fierer, N. (2025). Predicting the proportion of aerobic and anaerobic bacteria from metagenomic reads with OxyMetaG.
+```
+If you use the extract function, also cite Kraken2 and KrakenTools:
+```
+Lu, J., Rincon, N., Wood, D.E. et al. Metagenome analysis using the Kraken software suite. Nat Protoc 17, 2815–2839 (2022). https://doi.org/10.1038/s41596-022-00738-y
+```
+If you use the profile function, also cite DIAMOND:
+```
+Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176
+```
+## License
+GPL-3.0 License
+## Support
+For questions, bug reports, or feature requests, please open an issue on GitHub:
+https://github.com/cliffbueno/oxymetag/issues
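
The "Filtering Modes" section of the README above lists identity, bitscore, and e-value cutoffs for the `modern`, `ancient`, and `custom` modes. As a minimal sketch of how such cutoffs could be applied to DIAMOND hit tables, the snippet below assumes the files use DIAMOND's default 12-column tabular layout; the package's actual column layout and filtering code are not shown in this METADATA file, so `filter_hits`, `COLUMNS`, `CUTOFFS`, and the demo rows (`read1`, `pfamA`, and so on) are hypothetical names for illustration only.

```python
import csv
import io

# DIAMOND's default tabular column order (assumed here, not confirmed for OxyMetaG).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# (identity, bitscore, e-value) cutoffs copied from the README's Filtering Modes section.
CUTOFFS = {
    "modern": (60.0, 50.0, 0.001),
    "ancient": (45.0, 25.0, 0.1),
}


def filter_hits(lines, mode="modern", idcut=None, bitcut=None, ecut=None):
    """Return the hit rows that pass the cutoffs for the given mode."""
    if mode == "custom":
        if idcut is None or bitcut is None or ecut is None:
            raise ValueError("custom mode needs idcut, bitcut and ecut")
    else:
        idcut, bitcut, ecut = CUTOFFS[mode]
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    return [
        row for row in reader
        if float(row["pident"]) >= idcut
        and float(row["bitscore"]) >= bitcut
        and float(row["evalue"]) <= ecut
    ]


# Made-up hits: one strong, one marginal, one poor.
DEMO = (
    "read1\tpfamA\t75.0\t40\t10\t0\t1\t120\t1\t40\t1e-10\t80.0\n"
    "read2\tpfamB\t50.0\t30\t15\t0\t1\t90\t1\t30\t0.05\t30.0\n"
    "read3\tpfamA\t40.0\t25\t15\t0\t1\t75\t1\t25\t1.0\t20.0\n"
)

modern_ids = [row["qseqid"] for row in filter_hits(io.StringIO(DEMO), "modern")]
ancient_ids = [row["qseqid"] for row in filter_hits(io.StringIO(DEMO), "ancient")]
custom_ids = [row["qseqid"] for row in
              filter_hits(io.StringIO(DEMO), "custom", idcut=50, bitcut=30, ecut=0.01)]
print(modern_ids, ancient_ids, custom_ids)
```

In this toy input the marginal hit survives only the more permissive `ancient` cutoffs, which matches the README's stated intent for degraded DNA.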

oxymetag-1.0.0.dist-info/RECORD ADDED

@@ -0,0 +1,18 @@
+oxymetag/__init__.py,sha256=GM_D7cuLGpBCmwCp3PkT42Q3mnryV3BDvMWjMBqt4L8,441
+oxymetag/cli.py,sha256=ITt3UNUFrdeIskjSaK-705oOcAPUlzq5acqlWCXVDqM,4478
+oxymetag/core.py,sha256=AkuXmhhI2It-QFdfykJcZ-e_6i124yS9RMM-BAH6RlU,8565
+oxymetag/utils.py,sha256=x-WVnR-yNawY13alF_8J9Ihkjpaeg15IlEpZXyn2JSU,2604
+oxymetag/data/.DS_Store,sha256=1lFlJ5EFymdzGAUAaI30vcaaLHt3F1LwpG7xILf9jsM,6148
+oxymetag/data/Oxygen_pfams.csv,sha256=f3_CFPy235BxbX-2Ami8dJpTLQDImM8e_m87QcJvYKo,673
+oxymetag/data/oxygen_model.rds,sha256=8BMWnnIKCALapQnJKLpMnqTymFNAW46E-cQJGU2tJu0,8221
+oxymetag/data/oxymetag_pfams.dmnd,sha256=sSrkriGi-x4Ybf-pQtRRCFh1Wjm4rbQeX9FL9buDbAE,1472431
+oxymetag/data/pfam_headers_table.txt,sha256=wMg4WvlST6Zi3EzVFudjFHyREqNk8kHDI9Q6th7FdFY,255832
+oxymetag/data/pfam_lengths.tsv,sha256=--0bGxDN2v_WiBo0rKFJMPeOPsOrbaNhyjPAoIF9E5A,366
+oxymetag/scripts/predict_oxygen.R,sha256=72Eum7XFtJ-Be5vdIqY8FFIuPnEWTGYDMaZWv5OTPtQ,5549
+tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+oxymetag-1.0.0.dist-info/LICENSE,sha256=OXLcl0T2SZ8Pmy2_dmlvKuetivmyPd5m1q-Gyd-zaYY,35149
+oxymetag-1.0.0.dist-info/METADATA,sha256=eo8h97bwY2mGTgkTNpNjwomUpq6c_k8q4EEsyMiZDIk,8968
+oxymetag-1.0.0.dist-info/WHEEL,sha256=iAkIy5fosb7FzIOwONchHf19Qu7_1wCWyFNR5gu9nU0,91
+oxymetag-1.0.0.dist-info/entry_points.txt,sha256=-9xMAfrSPtFBEvQWRNVKROTM_3OjEik34mVEsYFwM2k,47
+oxymetag-1.0.0.dist-info/top_level.txt,sha256=G7EHL5Fpxne8CH3w5IDIkrsRmMzaEfOhmTngNovoYi8,15
+oxymetag-1.0.0.dist-info/RECORD,,
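
Each RECORD row above has the form `path,sha256=<digest>,size`. In the wheel format the digest is the SHA-256 hash encoded as URL-safe base64 with the trailing `=` padding removed. The sketch below shows one way to recompute and check such a row; `record_hash` and `check_record_line` are hypothetical helper names, and the only row checked is `tests/__init__.py`, whose listed size is 0 and whose contents are therefore known to be empty.

```python
import base64
import csv
import hashlib


def record_hash(data: bytes) -> str:
    """Return the RECORD-style hash field for the given file contents."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def check_record_line(line: str, data: bytes) -> bool:
    """Check one RECORD row (path, hash, size) against file contents."""
    path, hash_field, size = next(csv.reader([line]))
    return hash_field == record_hash(data) and int(size) == len(data)


# Row copied from the RECORD hunk; a zero-byte file has empty contents.
EMPTY_ROW = "tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
print(check_record_line(EMPTY_ROW, b""))
```

To check the other rows, read each file from the unpacked wheel in binary mode and pass its bytes as `data`. The final `RECORD,,` row lists no hash or size for RECORD itself, so it would need to be skipped.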

oxymetag-1.0.0.dist-info/WHEEL ADDED

@@ -0,0 +1,5 @@
+Wheel-Version: 1.0
+Generator: setuptools (75.3.2)
+Root-Is-Purelib: true
+Tag: py3-none-any
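
The WHEEL file above uses the same `Key: value` header layout as METADATA, so the standard library's email parser can read it. The sketch below parses the four lines shown in the hunk and splits the `Tag` value into its three parts; `WHEEL_TXT` is simply those lines pasted in as a string for illustration.

```python
from email.parser import Parser

# The four header lines shown in the WHEEL hunk.
WHEEL_TXT = (
    "Wheel-Version: 1.0\n"
    "Generator: setuptools (75.3.2)\n"
    "Root-Is-Purelib: true\n"
    "Tag: py3-none-any\n"
)

wheel = Parser().parsestr(WHEEL_TXT)

# A compatibility tag has the form <python tag>-<abi tag>-<platform tag>.
python_tag, abi_tag, platform_tag = wheel["Tag"].split("-")
is_purelib = wheel["Root-Is-Purelib"] == "true"
print(python_tag, abi_tag, platform_tag, is_purelib)
```

The `py3-none-any` tag together with `Root-Is-Purelib: true` marks this as a pure-Python wheel, which is consistent with the `OS Independent` classifier in METADATA.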

oxymetag-1.0.0.dist-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+oxymetag = oxymetag.cli:main
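
The entry point above is what makes the `oxymetag` command available after installation: the `console_scripts` group maps a command name to a `module:function` target, here `main` in `oxymetag.cli`. Since entry_points.txt is an INI-style file, it can be read with `configparser`, as in the sketch below. The helper name `console_scripts` is hypothetical, and the snippet only parses the text; it does not import or run the package.

```python
import configparser

# Contents of entry_points.txt as shown in the hunk.
ENTRY_POINTS_TXT = """\
[console_scripts]
oxymetag = oxymetag.cli:main
"""


def console_scripts(text):
    """Map each console script name to its (module, attribute) target."""
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep script names case-sensitive
    parser.read_string(text)
    scripts = {}
    for name, target in parser["console_scripts"].items():
        module, _, attr = target.partition(":")
        scripts[name] = (module, attr)
    return scripts


print(console_scripts(ENTRY_POINTS_TXT))
```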

oxymetag-1.0.0.dist-info/top_level.txt ADDED

@@ -0,0 +1,2 @@
+oxymetag
+tests

tests/__init__.py ADDED

File without changes