PyPI - moducomp - Versions diffs - 0.7.8__tar.gz → 0.7.10__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

{moducomp-0.7.8 → moducomp-0.7.10}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: moducomp
-Version: 0.7.8
+Version: 0.7.10
 Summary: moducomp: metabolic module completeness and complementarity for microbiomes.
 Keywords: bioinformatics,microbiome,metabolic,kegg,genomics
 Author-email: "Juan C. Villada" <jvillada@lbl.gov>
@@ -37,6 +37,7 @@ Project-URL: Repository, https://github.com/NeLLi-team/moducomp
 - Generation of complementarity reports highlighting modules completed through genome partnerships.
 - Tracks and reports the actual proteins that are responsible for the completion of the module in the combination of N genomes.
 - **Automatic resource monitoring** with timestamped logs tracking CPU usage, memory consumption, and runtime for reproducibility.
+- **Consistent logging to stdout/stderr** with a per-command resource summary emitted at the end of each run.
 ## Installation (Recommended)
@@ -77,6 +78,8 @@ Small test data sets ship with `moducomp`. After installation you can confirm th
 moducomp test --ncpus 16 --calculate-complementarity 2 --eggnog-data-dir "$EGGNOG_DATA_DIR"
 ```
+The test command runs in low-memory mode by default. If you have plenty of RAM and want full-memory mode, add `--fullmem` (or `--full-mem`).
 ### Developer install (Pixi)
 If you want to download the code and develop locally:
@@ -105,6 +108,82 @@ You should see the command line help without errors.
 `moducomp` provides two main commands: `pipeline` and `analyze-ko-matrix`. You can run these commands using Pixi tasks defined in `pyproject.toml` or directly within the Pixi environment.
+### Pipeline overview
+The diagram below shows the main stages executed by ModuComp.
+```mermaid
+graph TD
+    A([Start run]) --> B[Initialize logging and resource monitoring]
+    B --> C{Input type}
+    C -->|pipeline| D[Validate genome directory]
+    C -->|analyze-ko-matrix| H[Load existing KO matrix]
+    D --> E[Prepare genomes: adapt headers or copy to tmp]
+    E --> F[Merge genomes into single FAA]
+    F --> G[Run eggNOG-mapper (if needed)]
+    G --> H[Create KO matrix (`kos_matrix.csv`)]
+    H --> I[Convert KO matrix to KPCT input]
+    I --> J[Run KPCT (parallel with fallback)]
+    J --> K[Create module completeness matrix]
+    K --> L{Complementarity requested?}
+    L -->|Yes| M[Generate complementarity report(s)]
+    L -->|No| N[Skip]
+    M --> O[Write outputs + logs]
+    N --> O
+    O --> P[Optional cleanup of `tmp/`]
+    P --> Q([Pipeline complete])
+```
+### CLI options and defaults
+This section lists all CLI options implemented today, along with their default values.
+#### `pipeline` command (positional args: `genomedir`, `savedir`)
+| Option | Default | Description |
+| --- | --- | --- |
+| `--ncpus`, `-n` | `16` | Number of CPU cores to use for eggNOG-mapper and KPCT. |
+| `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+| `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers to `genome|protein_N`. |
+| `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+| `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `fullmem` | Run eggNOG-mapper without `--dbmem` to reduce RAM. |
+| `--verbose/--quiet` | `false` | Enable verbose progress output. |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+| `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+#### `test` command (bundled test genomes)
+| Option | Default | Description |
+| --- | --- | --- |
+| `--output-dir`, `-o` | `output_test_moducomp_<DATETIME>` | Output directory for test run. |
+| `--ncpus`, `-n` | `2` | CPU cores for the test run. |
+| `--calculate-complementarity`, `-c` | `2` | Complementarity size to compute (0 disables). |
+| `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers before the test. |
+| `--del-tmp/--keep-tmp` | `true` | Delete temporary files after the test completes. |
+| `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `lowmem` | Low-memory mode is the default for tests. |
+| `--verbose/--quiet` | `verbose` | Verbose output is the default for tests. |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+| `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+#### `analyze-ko-matrix` command (positional args: `kos_matrix`, `savedir`)
+| Option | Default | Description |
+| --- | --- | --- |
+| `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+| `--kpct-outprefix` | `output_give_completeness` | Prefix for KPCT output files. |
+| `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+| `--ncpus`, `-n` | `16` | CPU cores for KPCT parallel processing. |
+| `--verbose/--quiet` | `false` | Enable verbose progress output. |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+#### `download-eggnog-data` command
+| Option | Default | Description |
+| --- | --- | --- |
+| `--eggnog-data-dir` | `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog` | Destination for eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+| `--verbose/--quiet` | `verbose` | Stream downloader output to the console. |
 ### Performance and parallel processing
 `moducomp` includes **parallel processing capabilities** for the KPCT (KEGG Pathways Completeness Tool) analysis, which can significantly improve performance for large datasets:
@@ -125,7 +204,7 @@ You should see the command line help without errors.
 ### ⚠️ Important note 2
-`moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` when running the `pipeline` command**.
+`moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` (`--low-mem`) when running the `pipeline` command**. The `test` command uses low-memory mode by default and can be switched to full memory with `--fullmem` (`--full-mem`).
 ### Notes on bundled test data
@@ -162,9 +241,9 @@ moducomp pipeline \
     --ncpus <number_of_cpus_to_use> \
     --calculate-complementarity <N>  # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
     # Optional flags:
-    # --lowmem                    # Optional: Use this if you have less than 64GB of RAM
+    # --lowmem/--fullmem          # Optional: Use low-mem if you have less than 64GB of RAM (default is full mem)
     # --adapt-headers             # If your FASTA headers need modification
-    # --del-tmp                   # To delete temporary files
+    # --del-tmp/--keep-tmp        # Delete or keep temporary files
     # --eggnog-data-dir /path     # If EGGNOG_DATA_DIR is not set
     # --verbose                   # Enable verbose output with detailed progress information
 ```
@@ -183,7 +262,7 @@ moducomp analyze-ko-matrix \
     --calculate-complementarity <N>  # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
     # Optional flags:
-    # --del-tmp false
+    # --keep-tmp                  # Keep temporary files
     # --verbose                   # Enable verbose output with detailed progress information
 ```
@@ -227,7 +306,7 @@ moducomp pipeline ./genomes ./output_lowmem --ncpus 8 --lowmem --calculate-compl
 - **`module_completeness.tsv`**: Module completeness scores for individual genomes and combinations
 - **`module_completeness_complementarity_Nmember.tsv`**: Complementarity reports (if requested)
 - **`logs/resource_usage_YYYYMMDD_HHMMSS.log`**: Resource monitoring log with CPU, memory, and runtime metrics for reproducibility
-- **`logs/moducomp.log`**: Detailed pipeline execution log
+- **`logs/moducomp.log`**: Detailed pipeline execution log with a per-command resource summary at the end of the run
 ## Citation
 Villada, JC. & Schulz, F. (2025). Assessment of metabolic module completeness of genomes and metabolic complementarity in microbiomes with `moducomp` . `moducomp` (v0.5.1) Zenodo. https://doi.org/10.5281/zenodo.16116092
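
The option tables added in this hunk describe paired on/off flags (`--lowmem/--fullmem`, `--del-tmp/--keep-tmp`) whose defaults differ between the `pipeline` and `test` commands. The sketch below illustrates that behaviour with the standard-library `argparse` module. It is not taken from the package: the diff does not show how `moducomp` builds its CLI, and `build_parser` and the `moducomp-sketch` program name are hypothetical.

```python
import argparse


def build_parser(lowmem_default: bool) -> argparse.ArgumentParser:
    """Build a parser whose paired on/off flags share one destination.

    Hypothetical helper for illustration; not part of moducomp.
    """
    parser = argparse.ArgumentParser(prog="moducomp-sketch")
    # Both spellings of each flag write to the same attribute.
    parser.add_argument("--lowmem", "--low-mem", dest="lowmem", action="store_true")
    parser.add_argument("--fullmem", "--full-mem", dest="lowmem", action="store_false")
    parser.add_argument("--del-tmp", dest="del_tmp", action="store_true")
    parser.add_argument("--keep-tmp", dest="del_tmp", action="store_false")
    # Defaults as listed in the tables: temporary files are deleted by default,
    # and the memory mode depends on the command.
    parser.set_defaults(lowmem=lowmem_default, del_tmp=True)
    return parser


pipeline_parser = build_parser(lowmem_default=False)  # pipeline: full memory by default
test_parser = build_parser(lowmem_default=True)       # test: low memory by default

print(pipeline_parser.parse_args([]).lowmem)
print(test_parser.parse_args(["--full-mem"]).lowmem)
print(pipeline_parser.parse_args(["--keep-tmp"]).del_tmp)
```

Sharing one `dest` between the two flags of a pair means the last flag given on the command line wins, which is the usual expectation for on/off switches.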

{moducomp-0.7.8 → moducomp-0.7.10}/README.md RENAMED

@@ -12,6 +12,7 @@
 - Generation of complementarity reports highlighting modules completed through genome partnerships.
 - Tracks and reports the actual proteins that are responsible for the completion of the module in the combination of N genomes.
 - **Automatic resource monitoring** with timestamped logs tracking CPU usage, memory consumption, and runtime for reproducibility.
+- **Consistent logging to stdout/stderr** with a per-command resource summary emitted at the end of each run.
 ## Installation (Recommended)
@@ -52,6 +53,8 @@ Small test data sets ship with `moducomp`. After installation you can confirm th
 moducomp test --ncpus 16 --calculate-complementarity 2 --eggnog-data-dir "$EGGNOG_DATA_DIR"
 ```
+The test command runs in low-memory mode by default. If you have plenty of RAM and want full-memory mode, add `--fullmem` (or `--full-mem`).
 ### Developer install (Pixi)
 If you want to download the code and develop locally:
@@ -80,6 +83,82 @@ You should see the command line help without errors.
 `moducomp` provides two main commands: `pipeline` and `analyze-ko-matrix`. You can run these commands using Pixi tasks defined in `pyproject.toml` or directly within the Pixi environment.
+### Pipeline overview
+The diagram below shows the main stages executed by ModuComp.
+```mermaid
+graph TD
+    A([Start run]) --> B[Initialize logging and resource monitoring]
+    B --> C{Input type}
+    C -->|pipeline| D[Validate genome directory]
+    C -->|analyze-ko-matrix| H[Load existing KO matrix]
+    D --> E[Prepare genomes: adapt headers or copy to tmp]
+    E --> F[Merge genomes into single FAA]
+    F --> G[Run eggNOG-mapper (if needed)]
+    G --> H[Create KO matrix (`kos_matrix.csv`)]
+    H --> I[Convert KO matrix to KPCT input]
+    I --> J[Run KPCT (parallel with fallback)]
+    J --> K[Create module completeness matrix]
+    K --> L{Complementarity requested?}
+    L -->|Yes| M[Generate complementarity report(s)]
+    L -->|No| N[Skip]
+    M --> O[Write outputs + logs]
+    N --> O
+    O --> P[Optional cleanup of `tmp/`]
+    P --> Q([Pipeline complete])
+```
+### CLI options and defaults
+This section lists all CLI options implemented today, along with their default values.
+#### `pipeline` command (positional args: `genomedir`, `savedir`)
+| Option | Default | Description |
+| --- | --- | --- |
+| `--ncpus`, `-n` | `16` | Number of CPU cores to use for eggNOG-mapper and KPCT. |
+| `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+| `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers to `genome|protein_N`. |
+| `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+| `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `fullmem` | Run eggNOG-mapper without `--dbmem` to reduce RAM. |
+| `--verbose/--quiet` | `false` | Enable verbose progress output. |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+| `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+#### `test` command (bundled test genomes)
+| Option | Default | Description |
+| --- | --- | --- |
+| `--output-dir`, `-o` | `output_test_moducomp_<DATETIME>` | Output directory for test run. |
+| `--ncpus`, `-n` | `2` | CPU cores for the test run. |
+| `--calculate-complementarity`, `-c` | `2` | Complementarity size to compute (0 disables). |
+| `--adapt-headers/--no-adapt-headers` | `false` | Adapt FASTA headers before the test. |
+| `--del-tmp/--keep-tmp` | `true` | Delete temporary files after the test completes. |
+| `--lowmem/--fullmem` (`--low-mem/--full-mem`) | `lowmem` | Low-memory mode is the default for tests. |
+| `--verbose/--quiet` | `verbose` | Verbose output is the default for tests. |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+| `--eggnog-data-dir` | `EGGNOG_DATA_DIR` | Path to eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+#### `analyze-ko-matrix` command (positional args: `kos_matrix`, `savedir`)
+| Option | Default | Description |
+| --- | --- | --- |
+| `--calculate-complementarity`, `-c` | `0` | Complementarity size to compute (0 disables). |
+| `--kpct-outprefix` | `output_give_completeness` | Prefix for KPCT output files. |
+| `--del-tmp/--keep-tmp` | `true` | Delete temporary files after completion. |
+| `--ncpus`, `-n` | `16` | CPU cores for KPCT parallel processing. |
+| `--verbose/--quiet` | `false` | Enable verbose progress output. |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+#### `download-eggnog-data` command
+| Option | Default | Description |
+| --- | --- | --- |
+| `--eggnog-data-dir` | `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog` | Destination for eggNOG-mapper data (sets `EGGNOG_DATA_DIR`). |
+| `--log-level`, `-l` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`. |
+| `--verbose/--quiet` | `verbose` | Stream downloader output to the console. |
 ### Performance and parallel processing
 `moducomp` includes **parallel processing capabilities** for the KPCT (KEGG Pathways Completeness Tool) analysis, which can significantly improve performance for large datasets:
@@ -100,7 +179,7 @@ You should see the command line help without errors.
 ### ⚠️ Important note 2
-`moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` when running the `pipeline` command**.
+`moducomp` is specifically designed for large scale analysis of microbiomes with hundreds of members, and works on Linux systems with at least **64GB of RAM**. Nevertheless, it can be run on **smaller systems with less RAM, using the flag `--lowmem` (`--low-mem`) when running the `pipeline` command**. The `test` command uses low-memory mode by default and can be switched to full memory with `--fullmem` (`--full-mem`).
 ### Notes on bundled test data
@@ -137,9 +216,9 @@ moducomp pipeline \
     --ncpus <number_of_cpus_to_use> \
     --calculate-complementarity <N>  # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
     # Optional flags:
-    # --lowmem                    # Optional: Use this if you have less than 64GB of RAM
+    # --lowmem/--fullmem          # Optional: Use low-mem if you have less than 64GB of RAM (default is full mem)
     # --adapt-headers             # If your FASTA headers need modification
-    # --del-tmp                   # To delete temporary files
+    # --del-tmp/--keep-tmp        # Delete or keep temporary files
     # --eggnog-data-dir /path     # If EGGNOG_DATA_DIR is not set
     # --verbose                   # Enable verbose output with detailed progress information
 ```
@@ -158,7 +237,7 @@ moducomp analyze-ko-matrix \
     --calculate-complementarity <N>  # 0 to disable, 2 for 2-member, 3 for 3-member complementarity.
     # Optional flags:
-    # --del-tmp false
+    # --keep-tmp                  # Keep temporary files
     # --verbose                   # Enable verbose output with detailed progress information
 ```
@@ -202,7 +281,7 @@ moducomp pipeline ./genomes ./output_lowmem --ncpus 8 --lowmem --calculate-compl
 - **`module_completeness.tsv`**: Module completeness scores for individual genomes and combinations
 - **`module_completeness_complementarity_Nmember.tsv`**: Complementarity reports (if requested)
 - **`logs/resource_usage_YYYYMMDD_HHMMSS.log`**: Resource monitoring log with CPU, memory, and runtime metrics for reproducibility
-- **`logs/moducomp.log`**: Detailed pipeline execution log
+- **`logs/moducomp.log`**: Detailed pipeline execution log with a per-command resource summary at the end of the run
 ## Citation
 Villada, JC. & Schulz, F. (2025). Assessment of metabolic module completeness of genomes and metabolic complementarity in microbiomes with `moducomp` . `moducomp` (v0.5.1) Zenodo. https://doi.org/10.5281/zenodo.16116092
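
The `download-eggnog-data` table in this hunk gives the default destination as the shell expression `${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog`. As a minimal sketch of what that expression resolves to, the Python below reproduces it. The function names are hypothetical, and the precedence in `resolve_eggnog_dir` (explicit option, then `EGGNOG_DATA_DIR`, then the XDG default) is an assumption for illustration; the diff does not show the package's actual lookup order.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_eggnog_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Mirror ${XDG_DATA_HOME:-~/.local/share}/moducomp/eggnog."""
    env = os.environ if env is None else env
    # The ':-' form falls back when the variable is unset OR empty.
    xdg = env.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / "moducomp" / "eggnog"


def resolve_eggnog_dir(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Assumed precedence: CLI option, then EGGNOG_DATA_DIR, then XDG default."""
    env = os.environ if env is None else env
    if cli_value:
        return Path(cli_value).expanduser()
    if env.get("EGGNOG_DATA_DIR"):
        return Path(env["EGGNOG_DATA_DIR"]).expanduser()
    return default_eggnog_dir(env)


print(resolve_eggnog_dir(env={"XDG_DATA_HOME": "/data"}))
```

Passing the environment as a mapping keeps the functions easy to exercise without touching the real process environment.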

{moducomp-0.7.8 → moducomp-0.7.10}/moducomp/__init__.py RENAMED

@@ -2,7 +2,7 @@
 moducomp: metabolic module completeness and complementarity for microbiomes.
 """
-__version__ = "0.7.8"
+__version__ = "0.7.10"
 __author__ = "Juan C. Villada"
 __email__ = "jvillada@lbl.gov"
 __title__ = "moducomp"
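
The bump from `0.7.8` to `0.7.10` is a case where comparing version strings as plain text gives the wrong order, because `"8"` sorts after `"1"`. The sketch below shows a numeric comparison; `version_key` is a hypothetical helper that assumes purely numeric, dot-separated release strings like the two in this diff. For full PEP 440 handling, `packaging.version.Version` is the usual choice.

```python
def version_key(version: str) -> tuple:
    """Split a dotted release string into a tuple of integers for ordering."""
    return tuple(int(part) for part in version.split("."))


old, new = "0.7.8", "0.7.10"

# Plain string comparison orders "0.7.8" after "0.7.10".
print(old < new)
# Comparing integer tuples orders the releases correctly.
print(version_key(old) < version_key(new))
```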
