moducomp 0.7.8__tar.gz → 0.7.10__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: moducomp
- Version: 0.7.8
+ Version: 0.7.10
  Summary: moducomp: metabolic module completeness and complementarity for microbiomes.
  Keywords: bioinformatics,microbiome,metabolic,kegg,genomics
  Author-email: "Juan C. Villada" <jvillada@lbl.gov>
@@ -37,6 +37,7 @@ Project-URL: Repository, https://github.com/NeLLi-team/moducomp
  - Generation of complementarity reports highlighting modules completed through genome partnerships.
  - Tracks and reports the actual proteins that are responsible for the completion of the module in the combination of N genomes.
  - **Automatic resource monitoring** with timestamped logs tracking CPU usage, memory consumption, and runtime for reproducibility.
+ - **Consistent logging to stdout/stderr** with a per-command resource summary emitted at the end of each run.
 
  ## Installation (Recommended)
 
@@ -77,6 +78,8 @@ Small test data sets ship with `moducomp`. After installation you can confirm th
  moducomp test --ncpus 16 --calculate-complementarity 2 --eggnog-data-dir "$EGGNOG_DATA_DIR"
  ```
 
+ The test command runs in low-memory mode by default. If you have plenty of RAM and want full-memory mode, add `--fullmem` (or `--full-mem`).
+
  ### Developer install (Pixi)
 
  If you want to download the code and develop locally:
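The note on the `test` command and the option tables describe one boolean switch with two spellings per side (`--lowmem`/`--low-mem` versus `--fullmem`/`--full-mem`) and a per-command default (`fullmem` for `pipeline`, `lowmem` for `test`). The diff does not show which CLI framework moducomp uses. What follows is therefore only a sketch of the documented behavior, built with the standard-library `argparse` module. The names `add_memory_flags`, `build_parser` and the `moducomp-sketch` program name are hypothetical.

```python
import argparse


def add_memory_flags(parser: argparse.ArgumentParser, default_lowmem: bool) -> None:
    """Register one boolean switch spelled --lowmem/--low-mem and --fullmem/--full-mem."""
    group = parser.add_mutually_exclusive_group()  # passing both sides is an error
    group.add_argument("--lowmem", "--low-mem", dest="lowmem", action="store_true",
                       help="run eggNOG-mapper without --dbmem")
    group.add_argument("--fullmem", "--full-mem", dest="lowmem", action="store_false",
                       help="run eggNOG-mapper with --dbmem")
    # set_defaults also overrides the per-action defaults for dest="lowmem".
    parser.set_defaults(lowmem=default_lowmem)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="moducomp-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    # Defaults follow the option tables: pipeline -> fullmem, test -> lowmem.
    add_memory_flags(sub.add_parser("pipeline"), default_lowmem=False)
    add_memory_flags(sub.add_parser("test"), default_lowmem=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["test", "--full-mem"])
    print(args.command, "lowmem =", args.lowmem)  # test lowmem = False
```

All four spellings map onto a single `lowmem` destination, so downstream code needs only one `args.lowmem` check. The mutually exclusive group rejects `--lowmem --fullmem` on the same command line.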
@@ -105,6 +108,82 @@ You should see the command line help without errors.
 
  `moducomp` provides two main commands: `pipeline` and `analyze-ko-matrix`. You can run these commands using Pixi tasks defined in `pyproject.toml` or directly within the Pixi environment.
 
+ ### Pipeline overview
+
+ The diagram below shows the main stages executed by ModuComp.
+
+ ```mermaid
+ graph TD
+ A([Start run]) --> B[Initialize logging and resource monitoring]
+ B --> C{Input type}
+ C -->|pipeline| D[Validate genome directory]
+ C -->|analyze-ko-matrix| H[Load existing KO matrix]
+ D --> E[Prepare genomes: adapt headers or copy to tmp]
+ E --> F[Merge genomes into single FAA]
+ F --> G[Run eggNOG-mapper (if needed)]
+ G --> H[Create KO matrix (`kos_matrix.csv`)]
+ H --> I[Convert KO matrix to KPCT input]
+ I --> J[Run KPCT (parallel with fallback)]
+ J --> K[Create module completeness matrix]
+ K --> L{Complementarity requested?}
+ L -->|Yes| M[Generate complementarity report(s)]
+ L -->|No| N[Skip]
+ M --> O[Write outputs + logs]
+ N --> O
+ O --> P[Optional cleanup of `tmp/`]
+ P --> Q([Pipeline complete])
+ ```
+
+ ### CLI options and defaults
+
+ This section lists all CLI options implemented today, along with their default values.
+
+ #### `pipeline` command (positional args: `genomedir`, `savedir`)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--ncpus`, `-n` | `16` | Number of CPU cores to use for eggNOG-mapper and KPCT. |
+ | `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+ | `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers to `genome|protein_N`. |
+ | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+ | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `fullmem` | Run eggNOG-mapper without `--dbmem` to reduce RAM. |
+ | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+ | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+
+ #### `test` command (bundled test genomes)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--output-dir`, `-o` | `output_test_moducomp_<DATETIME>` | Output directory for test run. |
+ | `--ncpus`, `-n` | `2` | CPU cores for the test run. |
+ | `--calculate-complementarity`, `-c` | `2` | Complementarity size to compute (0 disables). |
+ | `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers before the test. |
+ | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after the test completes. |
+ | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `lowmem` | Low-memory mode is the default for tests. |
+ | `--verbose/--quiet` | `verbose` | Verbose output is the default for tests. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+ | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+
+ #### `analyze-ko-matrix` command (positional args: `kos_matrix`, `savedir`)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+ | `--kpct-outprefix` | `output_give_completeness` | Prefix for KPCT output files. |
+ | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+ | `--ncpus`, `-n` | `16` | CPU cores for KPCT parallel processing. |
+ | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+
+ #### `download-eggnog-data` command
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--eggnog-data-dir` | `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog` | Destination for eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+ | `--verbose/--quiet` | `verbose` | Stream downloader output to the console. |
+
  ### Performance and parallel processing
 
  `moducomp` includes **parallel processing capabilities** for the KPCT (KEGG Pathways Completeness Tool) analysis, which can significantly improve performance for large datasets:
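The overview diagram labels the KPCT step "parallel with fallback", but the diff does not show how moducomp implements it. One common pattern is to fan chunks out to a process pool and rerun them serially if the pool itself fails. The sketch below illustrates that pattern and is not the project's code. `map_with_fallback` and `score_chunk` are hypothetical names, and `score_chunk` only stands in for a KPCT call.

```python
import logging
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")
log = logging.getLogger("moducomp-sketch")


def map_with_fallback(func: Callable[[T], R], chunks: Sequence[T], ncpus: int) -> List[R]:
    """Apply func to every chunk, in parallel when possible, serially otherwise."""
    if ncpus > 1 and len(chunks) > 1:
        try:
            with ProcessPoolExecutor(max_workers=min(ncpus, len(chunks))) as pool:
                # pool.map preserves input order in its results.
                return list(pool.map(func, chunks))
        except Exception as exc:  # e.g. BrokenProcessPool or an unpicklable func
            log.warning("parallel run failed (%r); retrying serially", exc)
    # Serial path: also used when ncpus == 1. An error raised by func itself
    # is raised again here rather than being silently swallowed.
    return [func(chunk) for chunk in chunks]


def score_chunk(genomes: List[str]) -> int:
    """Hypothetical stand-in for running KPCT on one chunk of genomes."""
    return len(genomes)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    chunks = [["g1", "g2"], ["g3"], ["g4", "g5", "g6"]]
    print(map_with_fallback(score_chunk, chunks, ncpus=4))  # [2, 1, 3]
```

If a chunk fails on its own, rather than because of the pool, it is retried serially and fails again. A genuine error therefore still surfaces, after one wasted parallel attempt.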
@@ -125,7 +204,7 @@ You should see the command line help without errors.
 
  ### ⚠️ Important note 2
 
- `moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` when running the `pipeline` command**.
+ `moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` (`--low-mem`) when running the `pipeline` command**. The `test` command uses low-memory mode by default and can be switched to full memory with `--fullmem` (`--full-mem`).
 
  ### Notes on bundled test data
 
@@ -162,9 +241,9 @@ moducomp pipeline \
  --ncpus <number_of_cpus_to_use> \
  --calculate-complementarity <N> # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
  # Optional flags:
- # --lowmem # Optional: Use this if you have less than 64GB of RAM
+ # --lowmem/--fullmem # Optional: Use low-mem if you have less than 64GB of RAM (default is full mem)
  # --adapt-headers # If your FASTA headers need modification
- # --del-tmp # To delete temporary files
+ # --del-tmp/--keep-tmp # Delete or keep temporary files
  # --eggnog-data-dir /path # If EGGNOG_DATA_DIR is not set
  # --verbose # Enable verbose output with detailed progress information
  ```
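According to the `pipeline` table, `--lowmem` changes a single thing: eggNOG-mapper runs without `--dbmem`, the option that loads the annotation database into RAM. A hypothetical helper that assembles the eggNOG-mapper command line makes that toggle explicit. The flags used (`-i`, `-o`, `--output_dir`, `--data_dir`, `--cpu`, `--dbmem`) are standard eggNOG-mapper v2 options. However, the exact argument list moducomp passes is not shown in this diff, so everything other than the `--dbmem` toggle is an assumption.

```python
import shlex
from pathlib import Path
from typing import List


def emapper_command(faa: Path, out_prefix: str, out_dir: Path,
                    data_dir: Path, ncpus: int, lowmem: bool) -> List[str]:
    """Build an eggNOG-mapper argv; only --dbmem depends on the memory mode."""
    cmd = [
        "emapper.py",
        "-i", str(faa),
        "-o", out_prefix,
        "--output_dir", str(out_dir),
        "--data_dir", str(data_dir),
        "--cpu", str(ncpus),
    ]
    if not lowmem:
        # Full-memory mode: keep the eggNOG annotation DB in RAM for speed.
        cmd.append("--dbmem")
    return cmd


if __name__ == "__main__":
    cmd = emapper_command(Path("tmp/merged.faa"), "emapper", Path("tmp"),
                          Path("/data/eggnog"), ncpus=8, lowmem=True)
    print(shlex.join(cmd))
    # In a real run: subprocess.run(cmd, check=True)
```

Keeping the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems with paths that contain spaces.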
@@ -183,7 +262,7 @@ moducomp analyze-ko-matrix \
  --calculate-complementarity <N> # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
 
  # Optional flags:
- # --del-tmp false
+ # --keep-tmp # Keep temporary files
  # --verbose # Enable verbose output with detailed progress information
  ```
 
@@ -227,7 +306,7 @@ moducomp pipeline ./genomes ./output_lowmem --ncpus 8 --lowmem --calculate-compl
  - **`module_completeness.tsv`**: Module completeness scores for individual genomes and combinations
  - **`module_completeness_complementarity_Nmember.tsv`**: Complementarity reports (if requested)
  - **`logs/resource_usage_YYYYMMDD_HHMMSS.log`**: Resource monitoring log with CPU, memory, and runtime metrics for reproducibility
- - **`logs/moducomp.log`**: Detailed pipeline execution log
+ - **`logs/moducomp.log`**: Detailed pipeline execution log with a per-command resource summary at the end of the run
 
  ## Citation
  Villada, JC. & Schulz, F. (2025). Assessment of metabolic module completeness of genomes and metabolic complementarity in microbiomes with `moducomp` . `moducomp` (v0.5.1) Zenodo. https://doi.org/10.5281/zenodo.16116092
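This release adds a per-command resource summary at the end of each run, written to the console and to `logs/moducomp.log`. The diff does not include the mechanism itself. The following is a sketch that uses only the Python standard library, and it rests on assumptions. A context manager samples `resource.getrusage` before and after a command and logs one line. The name `resource_summary` and the log format are hypothetical. The `resource` module is available on Unix only, which fits the Linux target stated in the README.

```python
import logging
import resource
import sys
import time
from contextlib import contextmanager
from typing import Dict, Iterator

_WHO = (resource.RUSAGE_SELF, resource.RUSAGE_CHILDREN)


@contextmanager
def resource_summary(command: str, logger: logging.Logger) -> Iterator[Dict[str, float]]:
    """Log one summary line for `command` when the wrapped block exits."""
    stats: Dict[str, float] = {}
    wall_start = time.perf_counter()
    before = [resource.getrusage(who) for who in _WHO]
    try:
        yield stats
    finally:
        after = [resource.getrusage(who) for who in _WHO]
        stats["wall_s"] = time.perf_counter() - wall_start
        # CPU time used by this process plus waited-for child processes
        # (for example external tools started with subprocess.run).
        stats["cpu_s"] = sum(
            (a.ru_utime - b.ru_utime) + (a.ru_stime - b.ru_stime)
            for b, a in zip(before, after)
        )
        # ru_maxrss is a lifetime peak in kilobytes on Linux (bytes on macOS).
        stats["peak_rss_mb"] = max(a.ru_maxrss for a in after) / 1024
        logger.info("[%s] wall=%.1fs cpu=%.1fs peak_rss=%.0fMB", command,
                    stats["wall_s"], stats["cpu_s"], stats["peak_rss_mb"])


if __name__ == "__main__":
    # Send log records to stderr so stdout stays free for results.
    logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    with resource_summary("pipeline", logging.getLogger("moducomp-sketch")):
        sum(i * i for i in range(1_000_000))
```

Because the summary is emitted in a `finally` clause, a command that fails partway still reports what it consumed.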
@@ -12,6 +12,7 @@
  - Generation of complementarity reports highlighting modules completed through genome partnerships.
  - Tracks and reports the actual proteins that are responsible for the completion of the module in the combination of N genomes.
  - **Automatic resource monitoring** with timestamped logs tracking CPU usage, memory consumption, and runtime for reproducibility.
+ - **Consistent logging to stdout/stderr** with a per-command resource summary emitted at the end of each run.
 
  ## Installation (Recommended)
 
@@ -52,6 +53,8 @@ Small test data sets ship with `moducomp`. After installation you can confirm th
  moducomp test --ncpus 16 --calculate-complementarity 2 --eggnog-data-dir "$EGGNOG_DATA_DIR"
  ```
 
+ The test command runs in low-memory mode by default. If you have plenty of RAM and want full-memory mode, add `--fullmem` (or `--full-mem`).
+
  ### Developer install (Pixi)
 
  If you want to download the code and develop locally:
@@ -80,6 +83,82 @@ You should see the command line help without errors.
 
  `moducomp` provides two main commands: `pipeline` and `analyze-ko-matrix`. You can run these commands using Pixi tasks defined in `pyproject.toml` or directly within the Pixi environment.
 
+ ### Pipeline overview
+
+ The diagram below shows the main stages executed by ModuComp.
+
+ ```mermaid
+ graph TD
+ A([Start run]) --> B[Initialize logging and resource monitoring]
+ B --> C{Input type}
+ C -->|pipeline| D[Validate genome directory]
+ C -->|analyze-ko-matrix| H[Load existing KO matrix]
+ D --> E[Prepare genomes: adapt headers or copy to tmp]
+ E --> F[Merge genomes into single FAA]
+ F --> G[Run eggNOG-mapper (if needed)]
+ G --> H[Create KO matrix (`kos_matrix.csv`)]
+ H --> I[Convert KO matrix to KPCT input]
+ I --> J[Run KPCT (parallel with fallback)]
+ J --> K[Create module completeness matrix]
+ K --> L{Complementarity requested?}
+ L -->|Yes| M[Generate complementarity report(s)]
+ L -->|No| N[Skip]
+ M --> O[Write outputs + logs]
+ N --> O
+ O --> P[Optional cleanup of `tmp/`]
+ P --> Q([Pipeline complete])
+ ```
+
+ ### CLI options and defaults
+
+ This section lists all CLI options implemented today, along with their default values.
+
+ #### `pipeline` command (positional args: `genomedir`, `savedir`)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--ncpus`, `-n` | `16` | Number of CPU cores to use for eggNOG-mapper and KPCT. |
+ | `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+ | `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers to `genome|protein_N`. |
+ | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+ | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `fullmem` | Run eggNOG-mapper without `--dbmem` to reduce RAM. |
+ | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+ | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+
+ #### `test` command (bundled test genomes)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--output-dir`, `-o` | `output_test_moducomp_<DATETIME>` | Output directory for test run. |
+ | `--ncpus`, `-n` | `2` | CPU cores for the test run. |
+ | `--calculate-complementarity`, `-c` | `2` | Complementarity size to compute (0 disables). |
+ | `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers before the test. |
+ | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after the test completes. |
+ | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `lowmem` | Low-memory mode is the default for tests. |
+ | `--verbose/--quiet` | `verbose` | Verbose output is the default for tests. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+ | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+
+ #### `analyze-ko-matrix` command (positional args: `kos_matrix`, `savedir`)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+ | `--kpct-outprefix` | `output_give_completeness` | Prefix for KPCT output files. |
+ | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+ | `--ncpus`, `-n` | `16` | CPU cores for KPCT parallel processing. |
+ | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+
+ #### `download-eggnog-data` command
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--eggnog-data-dir` | `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog` | Destination for eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+ | `--verbose/--quiet` | `verbose` | Stream downloader output to the console. |
+
  ### Performance and parallel processing
 
  `moducomp` includes **parallel processing capabilities** for the KPCT (KEGG Pathways Completeness Tool) analysis, which can significantly improve performance for large datasets:
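The `download-eggnog-data` table gives its default destination in shell notation, `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog`. Several commands also note that `--eggnog-data-dir` sets `EGGNOG_DATA_DIR`. A small Python sketch of that resolution follows. Its precedence order is an assumption: an explicit flag first, then an existing `EGGNOG_DATA_DIR`, then the XDG default. The function names `default_eggnog_dir` and `resolve_eggnog_dir` are hypothetical.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional


def default_eggnog_dir(env: MutableMapping[str, str]) -> Path:
    """Python equivalent of ${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog."""
    # Like the shell's ":-", fall back when the variable is unset or empty.
    xdg = env.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / "moducomp" / "eggnog"


def resolve_eggnog_dir(cli_value: Optional[str], env: MutableMapping[str, str]) -> Path:
    """Pick the eggNOG data directory and export it as EGGNOG_DATA_DIR."""
    if cli_value:
        path = Path(cli_value).expanduser()
    elif env.get("EGGNOG_DATA_DIR"):
        path = Path(env["EGGNOG_DATA_DIR"]).expanduser()
    else:
        path = default_eggnog_dir(env)
    # Exporting makes the choice visible to child processes such as eggNOG-mapper.
    env["EGGNOG_DATA_DIR"] = str(path)
    return path


if __name__ == "__main__":
    print(resolve_eggnog_dir(None, os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, lets the same function be exercised with plain dictionaries.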
@@ -100,7 +179,7 @@ You should see the command line help without errors.
 
  ### ⚠️ Important note 2
 
- `moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` when running the `pipeline` command**.
+ `moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` (`--low-mem`) when running the `pipeline` command**. The `test` command uses low-memory mode by default and can be switched to full memory with `--fullmem` (`--full-mem`).
 
  ### Notes on bundled test data
 
@@ -137,9 +216,9 @@ moducomp pipeline \
  --ncpus <number_of_cpus_to_use> \
  --calculate-complementarity <N> # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
  # Optional flags:
- # --lowmem # Optional: Use this if you have less than 64GB of RAM
+ # --lowmem/--fullmem # Optional: Use low-mem if you have less than 64GB of RAM (default is full mem)
  # --adapt-headers # If your FASTA headers need modification
- # --del-tmp # To delete temporary files
+ # --del-tmp/--keep-tmp # Delete or keep temporary files
  # --eggnog-data-dir /path # If EGGNOG_DATA_DIR is not set
  # --verbose # Enable verbose output with detailed progress information
  ```
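According to the `pipeline` table, `--adapt-headers` rewrites FASTA headers to `genome|protein_N`. The overview diagram then merges all genomes into a single FAA, so a genome prefix is presumably what lets each annotated protein be traced back to its genome. A sketch of such a rewrite follows. Several details are assumptions: N counts from 1, the genome name comes from the file name, and the original header text is discarded. The names `adapt_headers` and `adapt_file` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator


def adapt_headers(lines: Iterable[str], genome: str) -> Iterator[str]:
    """Yield FASTA lines with each header replaced by '>{genome}|protein_{n}'."""
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1  # assumption: proteins are numbered from 1 in file order
            yield f">{genome}|protein_{n}\n"
        else:
            yield line  # sequence lines pass through unchanged


def adapt_file(src: Path, dst_dir: Path) -> Path:
    """Write an adapted copy of src (e.g. genomes/GCA_1.faa) into dst_dir."""
    genome = src.stem  # assumption: genome ID = file name without extension
    dst = dst_dir / src.name
    with src.open() as fin, dst.open("w") as fout:
        fout.writelines(adapt_headers(fin, genome))
    return dst
```

Under this scheme, a genome name that itself contains `|` would make the headers ambiguous, so such names would need sanitizing first.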
@@ -158,7 +237,7 @@ moducomp analyze-ko-matrix \
  --calculate-complementarity <N> # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
 
  # Optional flags:
- # --del-tmp false
+ # --keep-tmp # Keep temporary files
  # --verbose # Enable verbose output with detailed progress information
  ```
 
@@ -202,7 +281,7 @@ moducomp pipeline ./genomes ./output_lowmem --ncpus 8 --lowmem --calculate-compl
  - **`module_completeness.tsv`**: Module completeness scores for individual genomes and combinations
  - **`module_completeness_complementarity_Nmember.tsv`**: Complementarity reports (if requested)
  - **`logs/resource_usage_YYYYMMDD_HHMMSS.log`**: Resource monitoring log with CPU, memory, and runtime metrics for reproducibility
- - **`logs/moducomp.log`**: Detailed pipeline execution log
+ - **`logs/moducomp.log`**: Detailed pipeline execution log with a per-command resource summary at the end of the run
 
  ## Citation
  Villada, JC. & Schulz, F. (2025). Assessment of metabolic module completeness of genomes and metabolic complementarity in microbiomes with `moducomp` . `moducomp` (v0.5.1) Zenodo. https://doi.org/10.5281/zenodo.16116092
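The output list mentions `module_completeness_complementarity_Nmember.tsv`, and `--calculate-complementarity <N>` selects the group size (0 disables it). The following is a simplified illustration only. moducomp delegates module evaluation to KPCT, which handles KEGG module definitions that include alternative KOs. The sketch instead treats each module as a flat set of required KOs. It reports a module for a group of N genomes when the pooled KOs complete the module but no single member does. `complementary_modules` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def complementary_modules(
    genome_kos: Dict[str, Set[str]],
    modules: Dict[str, FrozenSet[str]],
    n: int,
) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (genome combination, module) pairs where the module is complete
    only when the KOs of all n genomes are pooled (simplified model)."""
    if n < 2:  # 0 disables complementarity; a single genome cannot complement
        return []
    hits = []
    for combo in combinations(sorted(genome_kos), n):
        pooled = set().union(*(genome_kos[g] for g in combo))
        for module, required in modules.items():
            complete_alone = any(required <= genome_kos[g] for g in combo)
            if required <= pooled and not complete_alone:
                hits.append((combo, module))
    return hits
```

Because every N-genome combination is examined, the work grows as C(G, N). For 100 genomes, N = 2 gives 4,950 combinations and N = 3 gives 161,700.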
@@ -2,7 +2,7 @@
  moducomp: metabolic module completeness and complementarity for microbiomes.
  """
 
- __version__ = "0.7.8"
+ __version__ = "0.7.10"
  __author__ = "Juan C. Villada"
  __email__ = "jvillada@lbl.gov"
  __title__ = "moducomp"