moducomp 0.7.11__tar.gz → 0.7.13__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: moducomp
- Version: 0.7.11
+ Version: 0.7.13
  Summary: moducomp: metabolic module completeness and complementarity for microbiomes.
  Keywords: bioinformatics,microbiome,metabolic,kegg,genomics
  Author-email: "Juan C. Villada" <jvillada@lbl.gov>
@@ -38,6 +38,7 @@ Project-URL: Repository, https://github.com/NeLLi-team/moducomp
  - Tracks and reports the actual proteins that are responsible for the completion of the module in the combination of N genomes.
  - **Automatic resource monitoring** with timestamped logs tracking CPU usage, memory consumption, and runtime for reproducibility.
  - **Consistent logging to stdout/stderr** with a per-command resource summary emitted at the end of each run.
+ - **Built-in validation (`moducomp validate`)** for scientific consistency checks across annotations, KO matrices, KPCT outputs, and complementarity reports.
 
  ## Installation (Recommended)
 
@@ -59,16 +60,16 @@ pixi global install \
 
  ## Setup data (required)
 
- `moducomp` needs the eggNOG-mapper database to run. Download it once:
+ `moducomp` needs the eggNOG-mapper database to run. The primary (recommended) way to download it is using the `download_eggnog_data.py` wrapper, which mirrors the upstream downloader behavior. For upstream details, see the eggNOG-mapper setup guide: [eggNOG-mapper database setup](https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.13#user-content-Setup).
 
  ```bash
  export EGGNOG_DATA_DIR="/path/to/eggnog-data"
- moducomp download-eggnog-data --eggnog-data-dir "$EGGNOG_DATA_DIR"
- # or the standalone script:
- # download_eggnog_data.py
+ download_eggnog_data.py --eggnog-data-dir "$EGGNOG_DATA_DIR"
+ # equivalent:
+ # moducomp download-eggnog-data --eggnog-data-dir "$EGGNOG_DATA_DIR"
  ```
 
- If `EGGNOG_DATA_DIR` is not set, `moducomp download-eggnog-data` defaults to `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog`.
+ If `EGGNOG_DATA_DIR` is not set, the downloader defaults to `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog`.
 
  ### Quick test
 
@@ -148,6 +149,9 @@ This section lists all CLI options implemented today, along with their default v
  | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
  | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `fullmem` | Run eggNOG-mapper without `--dbmem` to reduce RAM. |
  | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--validate/--no-validate` | `validate` | Run post-run validation checks. |
+ | `--validate-report/--no-validate-report` | `validate-report` | Write `validation_report.json` in the output directory. |
+ | `--validate-strict/--validate-lenient` | `lenient` | Treat validation warnings as failures when strict. |
  | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
  | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
 
@@ -162,6 +166,9 @@ This section lists all CLI options implemented today, along with their default v
  | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after the test completes. |
  | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `lowmem` | Low-memory mode is the default for tests. |
  | `--verbose/--quiet` | `verbose` | Verbose output is the default for tests. |
+ | `--validate/--no-validate` | `validate` | Run post-run validation checks. |
+ | `--validate-report/--no-validate-report` | `validate-report` | Write `validation_report.json` in the output directory. |
+ | `--validate-strict/--validate-lenient` | `lenient` | Treat validation warnings as failures when strict. |
  | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
  | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
 
@@ -174,6 +181,21 @@ This section lists all CLI options implemented today, along with their default v
  | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
  | `--ncpus`, `-n` | `16` | CPU cores for KPCT parallel processing. |
  | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--validate/--no-validate` | `validate` | Run post-run validation checks. |
+ | `--validate-report/--no-validate-report` | `validate-report` | Write `validation_report.json` in the output directory. |
+ | `--validate-strict/--validate-lenient` | `lenient` | Treat validation warnings as failures when strict. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+
+ #### `validate` command (positional args: `savedir`)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--mode` | `auto` | Validation mode: `auto`, `pipeline`, or `ko-matrix`. |
+ | `--calculate-complementarity`, `-c` | `auto-detect` | Expected complementarity size (0 disables). |
+ | `--kpct-outprefix` | `output_give_completeness` | KPCT output prefix used during analysis. |
+ | `--strict/--lenient` | `lenient` | Treat warnings as failures when strict. |
+ | `--report` | _none_ | Write JSON validation report to this path. |
+ | `--verbose/--quiet` | `false` | Enable verbose progress output. |
  | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
 
  #### `download-eggnog-data` command
@@ -198,6 +220,33 @@ This section lists all CLI options implemented today, along with their default v
  - For KPCT parallel processing, the system creates the same number of chunks as CPU cores specified
  - Example: `--ncpus 8` will use 8 cores and create 8 chunks for optimal parallel processing
 
+ ### Validation (QC)
+
+ Use the built-in validator to check scientific consistency across outputs after a run. The validator compares:
+ - KO sets and counts between eggNOG-mapper annotations and `kos_matrix.csv`
+ - KO sets between `kos_matrix.csv` and `ko_file_for_kpct.txt`
+ - KPCT contigs vs pathways outputs
+ - Module completeness ranges and combination naming
+ - Complementarity reports versus module completeness values
+ - Protein provenance fields (pipeline mode) or placeholders (KO-matrix mode)
+
+ Example:
+
+ ```bash
+ # Validation runs by default after pipeline/analyze/test.
+ # Use --no-validate to disable or --no-validate-report to skip JSON output.
+ # When validation reports errors (or warnings in strict mode), the command exits non-zero.
+
+ # Validate a pipeline run and write a JSON report
+ moducomp validate /path/to/output --mode pipeline --report /path/to/output/validation_report.json
+
+ # Validate KO-matrix mode outputs (non-default KPCT prefix)
+ moducomp validate /path/to/output --mode ko-matrix --kpct-outprefix my_prefix
+
+ # Treat warnings as failures
+ moducomp validate /path/to/output --strict
+ ```
+
  ### ⚠️ Important note 1
 
  **Prepare FAA files**: Ensure FAA headers are in the form `>genomeName|proteinId`, or use the `--adapt-headers` option to format your headers into `>fileName_prefix|protein_id_counter`.
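
Editor's sketch (not part of the package diff): one way to drive the validator from a Python script, using only the `moducomp validate` options documented in the hunks above (`--mode`, `--kpct-outprefix`, `--strict`, `--report`). The helper names `build_validate_command` and `run_validation` are hypothetical and do not exist in `moducomp`; the only behavior relied on is the documented non-zero exit status when validation reports errors (or warnings in strict mode).

```python
import subprocess
from typing import List, Optional


def build_validate_command(
    savedir: str,
    mode: str = "auto",
    kpct_outprefix: Optional[str] = None,
    strict: bool = False,
    report: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for `moducomp validate` (hypothetical helper)."""
    # The three modes are the ones listed in the README option table.
    if mode not in {"auto", "pipeline", "ko-matrix"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["moducomp", "validate", savedir, "--mode", mode]
    if kpct_outprefix is not None:
        cmd += ["--kpct-outprefix", kpct_outprefix]
    if strict:
        cmd.append("--strict")
    if report is not None:
        cmd += ["--report", report]
    return cmd


def run_validation(savedir: str, **options) -> bool:
    """Run the validator; True means exit status 0 (needs moducomp on PATH)."""
    completed = subprocess.run(build_validate_command(savedir, **options))
    return completed.returncode == 0
```

Building the argument list separately from running it keeps the command easy to log or inspect before anything is executed.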
@@ -211,7 +260,7 @@ This section lists all CLI options implemented today, along with their default v
  You can override the bundled data location with `MODUCOMP_DATA_DIR`.
  When working from source, the bundled test genomes live at `moducomp/data/test_genomes`.
 
- `download_eggnog_data.py` is provided by eggnog-mapper and is available in the Pixi environment (or via `pixi global` installs).
+ `download_eggnog_data.py` is exposed by `moducomp` as a convenience wrapper for the eggnog-mapper downloader and is available in the Pixi environment (including `pixi global` installs).
 
  Pixi task (supports passing a custom location):
 
@@ -298,15 +347,38 @@ moducomp analyze-ko-matrix ./ko_matrix.csv ./output_moderate --ncpus 16 --calcul
  moducomp pipeline ./genomes ./output_lowmem --ncpus 8 --lowmem --calculate-complementarity 2
  ```
 
- ## Output files
+ ## Expected outputs
+
+ The sections below describe the expected output files, naming conventions, and the column-level meaning of each file. These details are the same for `moducomp pipeline` and `moducomp test` (pipeline mode), and the subset noted for `moducomp analyze-ko-matrix` (KO-matrix mode).
+
+ **Naming conventions**
+
+ Genome identifiers are stored as `taxon_oid`. In pipeline mode, ModuComp expects protein headers in the format `genome_id|protein_id`. If you set `--adapt-headers`, ModuComp rewrites headers to `>genomeName|protein_N`, where `genomeName` is the FAA filename stem. Combination identifiers use `__` (double underscore), for example `GenomeA__GenomeB`, and `n_members` in `module_completeness.tsv` records the size of each combination.
+
+ **Pipeline mode outputs (`moducomp pipeline`, `moducomp test`)**
+
+ - `emapper_out.emapper.annotations`: Full eggNOG-mapper annotations. The `#query` column must match `genome_id|protein_id`. `KEGG_ko` entries are prefixed `ko:KXXXXX` and are converted to `KXXXXX` for downstream matrices.
+ - `kos_matrix.csv`: Genome × KO count matrix. Columns: `taxon_oid` followed by KO IDs (e.g., `K00001`). Values are integer protein counts per KO.
+ - `ko_file_for_kpct.txt`: KPCT input file. Each line starts with `taxon_oid` followed by the set of KO IDs present in that genome or combination. If `--calculate-complementarity` is `N>=2`, combinations up to `N` are included as `GenomeA__GenomeB`.
+ - `output_give_completeness_contigs.with_weights.tsv`: KPCT module results per genome/combination. Columns: `contig` (genome/combination ID), `module_accession`, `completeness` (0–100), `pathway_name`, `pathway_class`, `matching_ko` (KO weights), `missing_ko`.
+ - `output_give_completeness_pathways.with_weights.tsv`: Same rows and order as the contigs file, but without the `contig` column. This is provided for compatibility with legacy tools; prefer the contigs file when you need genome-level provenance.
+ - `module_completeness.tsv`: Pivoted module completeness matrix. Columns: `n_members`, `taxon_oid`, followed by KEGG module IDs (`M00001`, …). Values are numeric percentages in the range 0–100.
+ - `module_completeness_complementarity_Nmember.tsv`: Complementarity report for `N`-member combinations (only when `--calculate-complementarity N` is set). Columns: `taxon_oid_1..N`, `completeness_taxon_oid_1..N`, `module_id`, `module_name`, `pathway_class`, `matching_ko`, `proteins_taxon_oid_1..N`. Protein fields list contributing proteins per KO (from eggNOG-mapper) as `{'KXXXXX': 'genome|protein'}`.
+ - `logs/moducomp.log`: Detailed run log with structured progress messages and per-command resource summaries.
+ - `logs/resource_usage_YYYYMMDD_HHMMSS.log`: Resource monitoring log capturing wall time, CPU time, CPU utilization, peak RAM, and exit code for each monitored command.
+ - `tmp/` (only if `--keep-tmp`): Intermediate files such as `merged_genomes.faa`, `emapper_output/`, and KPCT chunk outputs.
+ - `validation_report.json` (default when validation is enabled): JSON report produced by the validator.
 
- `moducomp` generates several output files in the specified output directory:
+ **KO-matrix mode outputs (`moducomp analyze-ko-matrix`)**
 
- - **`kos_matrix.csv`**: Matrix of KO counts for each genome
- - **`module_completeness.tsv`**: Module completeness scores for individual genomes and combinations
- - **`module_completeness_complementarity_Nmember.tsv`**: Complementarity reports (if requested)
- - **`logs/resource_usage_YYYYMMDD_HHMMSS.log`**: Resource monitoring log with CPU, memory, and runtime metrics for reproducibility
- - **`logs/moducomp.log`**: Detailed pipeline execution log with a per-command resource summary at the end of the run
+ - `kos_matrix.csv`: A copy of the input KO matrix (same format as above).
+ - `ko_file_for_kpct.txt`: KPCT input generated from the KO matrix. If `--calculate-complementarity` is set, combination lines are added using `GenomeA__GenomeB` identifiers.
+ - `output_give_completeness_contigs.with_weights.tsv`: KPCT module results per genome/combination (same format as pipeline mode).
+ - `output_give_completeness_pathways.with_weights.tsv`: Same rows as the contigs file, without the `contig` column.
+ - `module_completeness.tsv`: Module completeness matrix (same format as pipeline mode).
+ - `module_completeness_complementarity_Nmember.tsv`: Complementarity report. Protein contribution columns are filled with `No protein data available for <genome>` because no eggNOG-mapper annotations are available in KO-matrix mode.
+ - `logs/moducomp.log` and `logs/resource_usage_YYYYMMDD_HHMMSS.log`: Standard run logs and resource summaries.
+ - `validation_report.json` (default when validation is enabled): JSON report produced by the validator.
 
  ## Citation
  Villada, JC. & Schulz, F. (2025). Assessment of metabolic module completeness of genomes and metabolic complementarity in microbiomes with `moducomp` . `moducomp` (v0.5.1) Zenodo. https://doi.org/10.5281/zenodo.16116092
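
Editor's sketch (not part of the package diff): the layout described for `module_completeness.tsv` (`n_members`, `taxon_oid`, then one column per KEGG module, values in 0–100, combinations joined with `__`) can be spot-checked with the standard library. This is only an illustration of the documented format, not the logic of `moducomp validate`; the function name `check_module_completeness` is hypothetical, and the check assumes tab-separated input and individual genome names that do not themselves contain `__`.

```python
import csv
from typing import List, TextIO


def check_module_completeness(handle: TextIO) -> List[str]:
    """Return human-readable problems found in a module_completeness.tsv stream."""
    problems: List[str] = []
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        taxon = row["taxon_oid"]
        # Combination IDs join member genomes with a double underscore.
        members = taxon.split("__")
        if int(row["n_members"]) != len(members):
            problems.append(
                f"{taxon}: n_members={row['n_members']} but ID has {len(members)} part(s)"
            )
        # Every remaining column is treated as a module completeness percentage.
        for column, value in row.items():
            if column in ("n_members", "taxon_oid"):
                continue
            if not 0.0 <= float(value) <= 100.0:
                problems.append(f"{taxon}: {column}={value} outside 0-100")
    return problems
```

A typical call would be `with open("module_completeness.tsv", newline="") as fh: print(check_module_completeness(fh))`, where an empty list means both checks passed.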
@@ -13,6 +13,7 @@
  - Tracks and reports the actual proteins that are responsible for the completion of the module in the combination of N genomes.
  - **Automatic resource monitoring** with timestamped logs tracking CPU usage, memory consumption, and runtime for reproducibility.
  - **Consistent logging to stdout/stderr** with a per-command resource summary emitted at the end of each run.
+ - **Built-in validation (`moducomp validate`)** for scientific consistency checks across annotations, KO matrices, KPCT outputs, and complementarity reports.
 
  ## Installation (Recommended)
 
@@ -34,16 +35,16 @@ pixi global install \
 
  ## Setup data (required)
 
- `moducomp` needs the eggNOG-mapper database to run. Download it once:
+ `moducomp` needs the eggNOG-mapper database to run. The primary (recommended) way to download it is using the `download_eggnog_data.py` wrapper, which mirrors the upstream downloader behavior. For upstream details, see the eggNOG-mapper setup guide: [eggNOG-mapper database setup](https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.13#user-content-Setup).
 
  ```bash
  export EGGNOG_DATA_DIR="/path/to/eggnog-data"
- moducomp download-eggnog-data --eggnog-data-dir "$EGGNOG_DATA_DIR"
- # or the standalone script:
- # download_eggnog_data.py
+ download_eggnog_data.py --eggnog-data-dir "$EGGNOG_DATA_DIR"
+ # equivalent:
+ # moducomp download-eggnog-data --eggnog-data-dir "$EGGNOG_DATA_DIR"
  ```
 
- If `EGGNOG_DATA_DIR` is not set, `moducomp download-eggnog-data` defaults to `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog`.
+ If `EGGNOG_DATA_DIR` is not set, the downloader defaults to `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog`.
 
  ### Quick test
 
@@ -123,6 +124,9 @@ This section lists all CLI options implemented today, along with their default v
  | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
  | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `fullmem` | Run eggNOG-mapper without `--dbmem` to reduce RAM. |
  | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--validate/--no-validate` | `validate` | Run post-run validation checks. |
+ | `--validate-report/--no-validate-report` | `validate-report` | Write `validation_report.json` in the output directory. |
+ | `--validate-strict/--validate-lenient` | `lenient` | Treat validation warnings as failures when strict. |
  | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
  | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
 
@@ -137,6 +141,9 @@ This section lists all CLI options implemented today, along with their default v
  | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after the test completes. |
  | `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `lowmem` | Low-memory mode is the default for tests. |
  | `--verbose/--quiet` | `verbose` | Verbose output is the default for tests. |
+ | `--validate/--no-validate` | `validate` | Run post-run validation checks. |
+ | `--validate-report/--no-validate-report` | `validate-report` | Write `validation_report.json` in the output directory. |
+ | `--validate-strict/--validate-lenient` | `lenient` | Treat validation warnings as failures when strict. |
  | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
  | `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
 
@@ -149,6 +156,21 @@ This section lists all CLI options implemented today, along with their default v
  | `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
  | `--ncpus`, `-n` | `16` | CPU cores for KPCT parallel processing. |
  | `--verbose/--quiet` | `false` | Enable verbose progress output. |
+ | `--validate/--no-validate` | `validate` | Run post-run validation checks. |
+ | `--validate-report/--no-validate-report` | `validate-report` | Write `validation_report.json` in the output directory. |
+ | `--validate-strict/--validate-lenient` | `lenient` | Treat validation warnings as failures when strict. |
+ | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+
+ #### `validate` command (positional args: `savedir`)
+
+ | Option | Default | Description |
+ | --- | --- | --- |
+ | `--mode` | `auto` | Validation mode: `auto`, `pipeline`, or `ko-matrix`. |
+ | `--calculate-complementarity`, `-c` | `auto-detect` | Expected complementarity size (0 disables). |
+ | `--kpct-outprefix` | `output_give_completeness` | KPCT output prefix used during analysis. |
+ | `--strict/--lenient` | `lenient` | Treat warnings as failures when strict. |
+ | `--report` | _none_ | Write JSON validation report to this path. |
+ | `--verbose/--quiet` | `false` | Enable verbose progress output. |
  | `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
 
  #### `download-eggnog-data` command
@@ -173,6 +195,33 @@ This section lists all CLI options implemented today, along with their default v
  - For KPCT parallel processing, the system creates the same number of chunks as CPU cores specified
  - Example: `--ncpus 8` will use 8 cores and create 8 chunks for optimal parallel processing
 
+ ### Validation (QC)
+
+ Use the built-in validator to check scientific consistency across outputs after a run. The validator compares:
+ - KO sets and counts between eggNOG-mapper annotations and `kos_matrix.csv`
+ - KO sets between `kos_matrix.csv` and `ko_file_for_kpct.txt`
+ - KPCT contigs vs pathways outputs
+ - Module completeness ranges and combination naming
+ - Complementarity reports versus module completeness values
+ - Protein provenance fields (pipeline mode) or placeholders (KO-matrix mode)
+
+ Example:
+
+ ```bash
+ # Validation runs by default after pipeline/analyze/test.
+ # Use --no-validate to disable or --no-validate-report to skip JSON output.
+ # When validation reports errors (or warnings in strict mode), the command exits non-zero.
+
+ # Validate a pipeline run and write a JSON report
+ moducomp validate /path/to/output --mode pipeline --report /path/to/output/validation_report.json
+
+ # Validate KO-matrix mode outputs (non-default KPCT prefix)
+ moducomp validate /path/to/output --mode ko-matrix --kpct-outprefix my_prefix
+
+ # Treat warnings as failures
+ moducomp validate /path/to/output --strict
+ ```
+
  ### ⚠️ Important note 1
 
  **Prepare FAA files**: Ensure FAA headers are in the form `>genomeName|proteinId`, or use the `--adapt-headers` option to format your headers into `>fileName_prefix|protein_id_counter`.
@@ -186,7 +235,7 @@ This section lists all CLI options implemented today, along with their default v
  You can override the bundled data location with `MODUCOMP_DATA_DIR`.
  When working from source, the bundled test genomes live at `moducomp/data/test_genomes`.
 
- `download_eggnog_data.py` is provided by eggnog-mapper and is available in the Pixi environment (or via `pixi global` installs).
+ `download_eggnog_data.py` is exposed by `moducomp` as a convenience wrapper for the eggnog-mapper downloader and is available in the Pixi environment (including `pixi global` installs).
 
  Pixi task (supports passing a custom location):
 
@@ -273,15 +322,38 @@ moducomp analyze-ko-matrix ./ko_matrix.csv ./output_moderate --ncpus 16 --calcul
  moducomp pipeline ./genomes ./output_lowmem --ncpus 8 --lowmem --calculate-complementarity 2
  ```
 
- ## Output files
+ ## Expected outputs
+
+ The sections below describe the expected output files, naming conventions, and the column-level meaning of each file. These details are the same for `moducomp pipeline` and `moducomp test` (pipeline mode), and the subset noted for `moducomp analyze-ko-matrix` (KO-matrix mode).
+
+ **Naming conventions**
+
+ Genome identifiers are stored as `taxon_oid`. In pipeline mode, ModuComp expects protein headers in the format `genome_id|protein_id`. If you set `--adapt-headers`, ModuComp rewrites headers to `>genomeName|protein_N`, where `genomeName` is the FAA filename stem. Combination identifiers use `__` (double underscore), for example `GenomeA__GenomeB`, and `n_members` in `module_completeness.tsv` records the size of each combination.
+
+ **Pipeline mode outputs (`moducomp pipeline`, `moducomp test`)**
+
+ - `emapper_out.emapper.annotations`: Full eggNOG-mapper annotations. The `#query` column must match `genome_id|protein_id`. `KEGG_ko` entries are prefixed `ko:KXXXXX` and are converted to `KXXXXX` for downstream matrices.
+ - `kos_matrix.csv`: Genome × KO count matrix. Columns: `taxon_oid` followed by KO IDs (e.g., `K00001`). Values are integer protein counts per KO.
+ - `ko_file_for_kpct.txt`: KPCT input file. Each line starts with `taxon_oid` followed by the set of KO IDs present in that genome or combination. If `--calculate-complementarity` is `N>=2`, combinations up to `N` are included as `GenomeA__GenomeB`.
+ - `output_give_completeness_contigs.with_weights.tsv`: KPCT module results per genome/combination. Columns: `contig` (genome/combination ID), `module_accession`, `completeness` (0–100), `pathway_name`, `pathway_class`, `matching_ko` (KO weights), `missing_ko`.
+ - `output_give_completeness_pathways.with_weights.tsv`: Same rows and order as the contigs file, but without the `contig` column. This is provided for compatibility with legacy tools; prefer the contigs file when you need genome-level provenance.
+ - `module_completeness.tsv`: Pivoted module completeness matrix. Columns: `n_members`, `taxon_oid`, followed by KEGG module IDs (`M00001`, …). Values are numeric percentages in the range 0–100.
+ - `module_completeness_complementarity_Nmember.tsv`: Complementarity report for `N`-member combinations (only when `--calculate-complementarity N` is set). Columns: `taxon_oid_1..N`, `completeness_taxon_oid_1..N`, `module_id`, `module_name`, `pathway_class`, `matching_ko`, `proteins_taxon_oid_1..N`. Protein fields list contributing proteins per KO (from eggNOG-mapper) as `{'KXXXXX': 'genome|protein'}`.
+ - `logs/moducomp.log`: Detailed run log with structured progress messages and per-command resource summaries.
+ - `logs/resource_usage_YYYYMMDD_HHMMSS.log`: Resource monitoring log capturing wall time, CPU time, CPU utilization, peak RAM, and exit code for each monitored command.
+ - `tmp/` (only if `--keep-tmp`): Intermediate files such as `merged_genomes.faa`, `emapper_output/`, and KPCT chunk outputs.
+ - `validation_report.json` (default when validation is enabled): JSON report produced by the validator.
 
- `moducomp` generates several output files in the specified output directory:
+ **KO-matrix mode outputs (`moducomp analyze-ko-matrix`)**
 
- - **`kos_matrix.csv`**: Matrix of KO counts for each genome
- - **`module_completeness.tsv`**: Module completeness scores for individual genomes and combinations
- - **`module_completeness_complementarity_Nmember.tsv`**: Complementarity reports (if requested)
- - **`logs/resource_usage_YYYYMMDD_HHMMSS.log`**: Resource monitoring log with CPU, memory, and runtime metrics for reproducibility
- - **`logs/moducomp.log`**: Detailed pipeline execution log with a per-command resource summary at the end of the run
+ - `kos_matrix.csv`: A copy of the input KO matrix (same format as above).
+ - `ko_file_for_kpct.txt`: KPCT input generated from the KO matrix. If `--calculate-complementarity` is set, combination lines are added using `GenomeA__GenomeB` identifiers.
+ - `output_give_completeness_contigs.with_weights.tsv`: KPCT module results per genome/combination (same format as pipeline mode).
+ - `output_give_completeness_pathways.with_weights.tsv`: Same rows as the contigs file, without the `contig` column.
+ - `module_completeness.tsv`: Module completeness matrix (same format as pipeline mode).
+ - `module_completeness_complementarity_Nmember.tsv`: Complementarity report. Protein contribution columns are filled with `No protein data available for <genome>` because no eggNOG-mapper annotations are available in KO-matrix mode.
+ - `logs/moducomp.log` and `logs/resource_usage_YYYYMMDD_HHMMSS.log`: Standard run logs and resource summaries.
+ - `validation_report.json` (default when validation is enabled): JSON report produced by the validator.
 
  ## Citation
  Villada, JC. & Schulz, F. (2025). Assessment of metabolic module completeness of genomes and metabolic complementarity in microbiomes with `moducomp` . `moducomp` (v0.5.1) Zenodo. https://doi.org/10.5281/zenodo.16116092
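
Editor's sketch (not part of the package diff): the README text above says `KEGG_ko` entries of the form `ko:KXXXXX` are converted to `KXXXXX` and tallied into a genome × KO count matrix keyed by the genome part of `genome_id|protein_id`. The code below illustrates that conversion under stated assumptions; it is not `moducomp`'s implementation. It assumes that a `KEGG_ko` field may hold several comma-separated entries and uses `-` when no KO is assigned, and that a protein with several KOs counts once toward each. The function names `parse_kegg_ko` and `count_kos` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def parse_kegg_ko(field: str) -> List[str]:
    """Turn 'ko:K00001,ko:K00002' into ['K00001', 'K00002']; '-' yields []."""
    if not field or field == "-":
        return []
    kos = []
    for item in field.split(","):
        item = item.strip()
        if item:
            # Drop the 'ko:' prefix when present, keep the bare KO ID otherwise.
            kos.append(item[3:] if item.startswith("ko:") else item)
    return kos


def count_kos(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Tally proteins per KO for each genome from (#query, KEGG_ko) pairs."""
    counts: Dict[str, Counter] = {}
    for query, kegg_ko in records:
        # Headers follow genome_id|protein_id; only the genome part is the key.
        genome_id, _protein_id = query.split("|", 1)
        counts.setdefault(genome_id, Counter()).update(parse_kegg_ko(kegg_ko))
    return counts
```

Each inner `Counter` corresponds to one row of a `kos_matrix.csv`-style table, with KO IDs as columns and protein counts as values.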
@@ -2,7 +2,7 @@
  moducomp: metabolic module completeness and complementarity for microbiomes.
  """
 
- __version__ = "0.7.11"
+ __version__ = "0.7.13"
  __author__ = "Juan C. Villada"
  __email__ = "jvillada@lbl.gov"
  __title__ = "moducomp"