speconsense 0.7.4.tar.gz → 0.7.5.tar.gz (PyPI)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (55)

{speconsense-0.7.4/speconsense.egg-info → speconsense-0.7.5}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: speconsense
-Version: 0.7.4
+Version: 0.7.5
 Summary: High-quality clustering and consensus generation for Oxford Nanopore amplicon reads
 Author-email: Josh Walker <joshowalker@yahoo.com>
 License: BSD-3-Clause
@@ -295,14 +295,14 @@ When using `speconsense-summarize` for post-processing, creates `__Summary__/` d
 |---------------|-------------|------------|-------------|
 | **Original** | Source `cluster_debug/` | `-c1`, `-c2`, `-c3` | Preserves speconsense clustering results |
 | **Summarization** | `__Summary__/`, `FASTQ Files/`, `variants/` | `-1.v1`, `-1.v2`, `-2.v1`, `.raw1` | Post-processing groups and variants |
-| **Full consensus** | `__Summary__/` | `-1.full` | IUPAC consensus from all pre-merge variants in a group |
+| **Full consensus** | `__Summary__/` | `-1.full` | IUPAC consensus from pre-merge components of surviving variants |
 ### Example Directory Structure
 ```
 __Summary__/
 ├── sample-1.v1-RiC45.fasta                  # Primary variant (group 1, merged)
 ├── sample-1.v2-RiC23.fasta                  # Additional variant (not merged)
-├── sample-1.full-RiC68.fasta                # Full IUPAC consensus for group 1 (all pre-merge variants)
+├── sample-1.full-RiC68.fasta                # Full IUPAC consensus for group 1 (surviving variants' components)
 ├── sample-2.v1-RiC30.fasta                  # Second organism group, primary variant
 ├── summary.fasta                            # All final consensus sequences (excludes .raw)
 ├── summary.txt                              # Statistics
@@ -829,7 +829,7 @@ For high-throughput workflows (e.g., 100K sequences/year), this prioritization e
 ```bash
 speconsense-summarize --enable-full-consensus
 ```
-- Generates a full IUPAC consensus sequence per variant group from all pre-merge variants
+- Generates a full IUPAC consensus sequence per variant group from pre-merge variants that contributed to surviving post-merge variants
 - Output named `{specimen}-{group}.full-RiC{reads}.fasta` in the `__Summary__/` directory
 - Uses majority voting across all variants in the group; **gaps never win** — at each alignment column, the most common non-gap base is chosen, with IUPAC codes for ties among bases
 - Useful when you want a single representative sequence that captures all variation within a group as IUPAC ambiguity codes
@@ -1073,7 +1073,7 @@ The complete speconsense-summarize workflow operates in this order:
 4. **Homopolymer-aware MSA-based variant merging** within each group, including **overlap merging** for different-length sequences (`--disable-merging`, `--merge-effort`, `--merge-position-count`, `--merge-indel-length`, `--min-merge-overlap`, `--merge-snp`, `--merge-min-size-ratio`, `--disable-homopolymer-equivalence`)
 5. **Selection size ratio filtering** to remove tiny post-merge variants (`--select-min-size-ratio`)
 6. **Variant selection** within each group (`--select-max-variants`, `--select-strategy`)
-7. **Full consensus generation** (optional) — IUPAC consensus from all pre-merge variants per group (`--enable-full-consensus`)
+7. **Full consensus generation** (optional) — IUPAC consensus from pre-merge components of surviving post-merge variants (`--enable-full-consensus`)
 8. **Output generation** with customizable header fields (`--fasta-fields`) and full traceability
 **Key architectural features**:

{speconsense-0.7.4 → speconsense-0.7.5}/README.md RENAMED

@@ -260,14 +260,14 @@ When using `speconsense-summarize` for post-processing, creates `__Summary__/` d
 |---------------|-------------|------------|-------------|
 | **Original** | Source `cluster_debug/` | `-c1`, `-c2`, `-c3` | Preserves speconsense clustering results |
 | **Summarization** | `__Summary__/`, `FASTQ Files/`, `variants/` | `-1.v1`, `-1.v2`, `-2.v1`, `.raw1` | Post-processing groups and variants |
-| **Full consensus** | `__Summary__/` | `-1.full` | IUPAC consensus from all pre-merge variants in a group |
+| **Full consensus** | `__Summary__/` | `-1.full` | IUPAC consensus from pre-merge components of surviving variants |
 ### Example Directory Structure
 ```
 __Summary__/
 ├── sample-1.v1-RiC45.fasta                  # Primary variant (group 1, merged)
 ├── sample-1.v2-RiC23.fasta                  # Additional variant (not merged)
-├── sample-1.full-RiC68.fasta                # Full IUPAC consensus for group 1 (all pre-merge variants)
+├── sample-1.full-RiC68.fasta                # Full IUPAC consensus for group 1 (surviving variants' components)
 ├── sample-2.v1-RiC30.fasta                  # Second organism group, primary variant
 ├── summary.fasta                            # All final consensus sequences (excludes .raw)
 ├── summary.txt                              # Statistics
@@ -794,7 +794,7 @@ For high-throughput workflows (e.g., 100K sequences/year), this prioritization e
 ```bash
 speconsense-summarize --enable-full-consensus
 ```
-- Generates a full IUPAC consensus sequence per variant group from all pre-merge variants
+- Generates a full IUPAC consensus sequence per variant group from pre-merge variants that contributed to surviving post-merge variants
 - Output named `{specimen}-{group}.full-RiC{reads}.fasta` in the `__Summary__/` directory
 - Uses majority voting across all variants in the group; **gaps never win** — at each alignment column, the most common non-gap base is chosen, with IUPAC codes for ties among bases
 - Useful when you want a single representative sequence that captures all variation within a group as IUPAC ambiguity codes
@@ -1038,7 +1038,7 @@ The complete speconsense-summarize workflow operates in this order:
 4. **Homopolymer-aware MSA-based variant merging** within each group, including **overlap merging** for different-length sequences (`--disable-merging`, `--merge-effort`, `--merge-position-count`, `--merge-indel-length`, `--min-merge-overlap`, `--merge-snp`, `--merge-min-size-ratio`, `--disable-homopolymer-equivalence`)
 5. **Selection size ratio filtering** to remove tiny post-merge variants (`--select-min-size-ratio`)
 6. **Variant selection** within each group (`--select-max-variants`, `--select-strategy`)
-7. **Full consensus generation** (optional) — IUPAC consensus from all pre-merge variants per group (`--enable-full-consensus`)
+7. **Full consensus generation** (optional) — IUPAC consensus from pre-merge components of surviving post-merge variants (`--enable-full-consensus`)
 8. **Output generation** with customizable header fields (`--fasta-fields`) and full traceability
 **Key architectural features**:
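
The README bullet above describes the `.full` consensus rule: per alignment column, the most common non-gap base wins, gaps never win, and ties among bases become IUPAC ambiguity codes. A minimal sketch of that rule follows. It is not the package's implementation: the function names `column_consensus` and `full_consensus` are hypothetical, votes are assumed to be unweighted (one vote per variant), input is assumed to be pre-aligned uppercase A/C/G/T with `-` for gaps, and dropping an all-gap column is also an assumption.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def column_consensus(column):
    """Consensus symbol for one alignment column (hypothetical helper)."""
    # Gaps are excluded from the vote entirely, so they can never win.
    counts = Counter(base for base in column if base != "-")
    if not counts:
        return ""  # assumption: a column that is all gaps contributes nothing
    top = max(counts.values())
    # Every base that shares the top count is folded into one IUPAC code.
    tied = frozenset(base for base, n in counts.items() if n == top)
    return IUPAC[tied]

def full_consensus(aligned):
    """Consensus over equal-length, pre-aligned sequences (hypothetical helper)."""
    return "".join(column_consensus(col) for col in zip(*aligned))

aligned = ["ATCG-A", "ATCGTA", "ATTG-A", "ACTG-G"]
print(full_consensus(aligned))  # → ATYGTA
```

In the example, column 3 is a 2-2 tie between C and T and becomes `Y`, while column 5 keeps its single `T` even though three of the four sequences have a gap there.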

{speconsense-0.7.4 → speconsense-0.7.5}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "speconsense"
-version = "0.7.4"
+version = "0.7.5"
 description = "High-quality clustering and consensus generation for Oxford Nanopore amplicon reads"
 readme = "README.md"
 requires-python = ">=3.8"

{speconsense-0.7.4 → speconsense-0.7.5}/speconsense/__init__.py RENAMED

@@ -5,7 +5,7 @@ A Python tool for experimental clustering and consensus generation as an alterna
 in the fungal DNA barcoding pipeline.
 """
-__version__ = "0.7.4"
+__version__ = "0.7.5"
 __author__ = "Josh Walker"
 __email__ = "joshowalker@yahoo.com"

{speconsense-0.7.4 → speconsense-0.7.5}/speconsense/summarize/cli.py RENAMED

@@ -392,19 +392,19 @@ def process_single_specimen(file_consensuses: List[ConsensusInfo],
             final_consensus.append(renamed_variant)
             group_naming.append((variant.sample_name, new_name))
-        # Generate full consensus from PRE-MERGE variants
+        # Generate full consensus from PRE-MERGE variants that contributed
+        # to surviving post-merge variants (after select-min-size-ratio)
         if getattr(args, 'enable_full_consensus', False):
-            pre_merge_variants = variant_groups[group_id]
-            # Apply size-ratio filter (same as merge pipeline)
-            if args.merge_min_size_ratio > 0 and len(pre_merge_variants) > 1:
-                largest_size = max(v.size for v in pre_merge_variants)
-                filtered = [v for v in pre_merge_variants
-                            if (v.size / largest_size) >= args.merge_min_size_ratio]
-                if len(filtered) < len(pre_merge_variants):
-                    filtered_count = len(pre_merge_variants) - len(filtered)
-                    logging.debug(f"Full consensus: filtered out {filtered_count} variants with size ratio < {args.merge_min_size_ratio} relative to largest (size={largest_size})")
-                    pre_merge_variants = filtered
+            # Collect original cluster names from surviving post-merge variants
+            surviving_originals = set()
+            for v in group_members:
+                if v.sample_name in all_merge_traceability:
+                    surviving_originals.update(all_merge_traceability[v.sample_name])
+                else:
+                    surviving_originals.add(v.sample_name)
+            pre_merge_variants = [v for v in variant_groups[group_id]
+                                  if v.sample_name in surviving_originals]
             specimen_base = selected_variants[0].sample_name.rsplit('-c', 1)[0]
             full_name = f"{specimen_base}-{group_idx + 1}.full"
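
The hunk above replaces the size-ratio filter with a traceability lookup: the original cluster names behind each surviving post-merge variant are collected, and only the pre-merge variants with those names feed the `.full` consensus. The sketch below isolates that selection step so it can be run standalone. `Variant` is a hypothetical stand-in for the package's `ConsensusInfo` with only the fields used here, and the function name, sample names, and sizes are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for ConsensusInfo; the real class has more fields.
Variant = namedtuple("Variant", ["sample_name", "size"])

def select_pre_merge_variants(group_members, pre_merge, merge_traceability):
    """Return the pre-merge variants that fed into a surviving post-merge variant."""
    surviving_originals = set()
    for v in group_members:
        # A merged variant maps to the names of its original clusters;
        # a variant that was never merged stands for itself.
        surviving_originals.update(
            merge_traceability.get(v.sample_name, [v.sample_name])
        )
    return [v for v in pre_merge if v.sample_name in surviving_originals]

# Three original clusters: c1 and c2 were merged into m1, c3 did not survive selection.
pre_merge = [Variant("s-c1", 60), Variant("s-c2", 35), Variant("s-c3", 5)]
survivors = [Variant("s-m1", 95)]
traceability = {"s-m1": ["s-c1", "s-c2"]}

selected = select_pre_merge_variants(survivors, pre_merge, traceability)
print([v.sample_name for v in selected])  # → ['s-c1', 's-c2']
```

Compared with the removed code, membership in the `.full` input now follows from whatever the merge and selection steps kept, rather than from a second, independent size threshold.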

{speconsense-0.7.4 → speconsense-0.7.5/speconsense.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: speconsense
-Version: 0.7.4
+Version: 0.7.5
 Summary: High-quality clustering and consensus generation for Oxford Nanopore amplicon reads
 Author-email: Josh Walker <joshowalker@yahoo.com>
 License: BSD-3-Clause
@@ -295,14 +295,14 @@ When using `speconsense-summarize` for post-processing, creates `__Summary__/` d
 |---------------|-------------|------------|-------------|
 | **Original** | Source `cluster_debug/` | `-c1`, `-c2`, `-c3` | Preserves speconsense clustering results |
 | **Summarization** | `__Summary__/`, `FASTQ Files/`, `variants/` | `-1.v1`, `-1.v2`, `-2.v1`, `.raw1` | Post-processing groups and variants |
-| **Full consensus** | `__Summary__/` | `-1.full` | IUPAC consensus from all pre-merge variants in a group |
+| **Full consensus** | `__Summary__/` | `-1.full` | IUPAC consensus from pre-merge components of surviving variants |
 ### Example Directory Structure
 ```
 __Summary__/
 ├── sample-1.v1-RiC45.fasta                  # Primary variant (group 1, merged)
 ├── sample-1.v2-RiC23.fasta                  # Additional variant (not merged)
-├── sample-1.full-RiC68.fasta                # Full IUPAC consensus for group 1 (all pre-merge variants)
+├── sample-1.full-RiC68.fasta                # Full IUPAC consensus for group 1 (surviving variants' components)
 ├── sample-2.v1-RiC30.fasta                  # Second organism group, primary variant
 ├── summary.fasta                            # All final consensus sequences (excludes .raw)
 ├── summary.txt                              # Statistics
@@ -829,7 +829,7 @@ For high-throughput workflows (e.g., 100K sequences/year), this prioritization e
 ```bash
 speconsense-summarize --enable-full-consensus
 ```
-- Generates a full IUPAC consensus sequence per variant group from all pre-merge variants
+- Generates a full IUPAC consensus sequence per variant group from pre-merge variants that contributed to surviving post-merge variants
 - Output named `{specimen}-{group}.full-RiC{reads}.fasta` in the `__Summary__/` directory
 - Uses majority voting across all variants in the group; **gaps never win** — at each alignment column, the most common non-gap base is chosen, with IUPAC codes for ties among bases
 - Useful when you want a single representative sequence that captures all variation within a group as IUPAC ambiguity codes
@@ -1073,7 +1073,7 @@ The complete speconsense-summarize workflow operates in this order:
 4. **Homopolymer-aware MSA-based variant merging** within each group, including **overlap merging** for different-length sequences (`--disable-merging`, `--merge-effort`, `--merge-position-count`, `--merge-indel-length`, `--min-merge-overlap`, `--merge-snp`, `--merge-min-size-ratio`, `--disable-homopolymer-equivalence`)
 5. **Selection size ratio filtering** to remove tiny post-merge variants (`--select-min-size-ratio`)
 6. **Variant selection** within each group (`--select-max-variants`, `--select-strategy`)
-7. **Full consensus generation** (optional) — IUPAC consensus from all pre-merge variants per group (`--enable-full-consensus`)
+7. **Full consensus generation** (optional) — IUPAC consensus from pre-merge components of surviving post-merge variants (`--enable-full-consensus`)
 8. **Output generation** with customizable header fields (`--fasta-fields`) and full traceability
 **Key architectural features**:

{speconsense-0.7.4 → speconsense-0.7.5}/tests/test_summarize.py RENAMED

@@ -521,7 +521,7 @@ class TestFullConsensus:
     def test_full_consensus_filters_small_variants(self):
-        """Integration test: merge_min_size_ratio filters small variants from full consensus."""
+        """Integration test: select_min_size_ratio filters small variants from full consensus."""
         temp_dir = tempfile.mkdtemp()
         source_dir = os.path.join(temp_dir, "clusters")
         summary_dir = os.path.join(temp_dir, "__Summary__")
@@ -529,7 +529,7 @@ class TestFullConsensus:
         try:
             # Two similar sequences (1 SNP at position 12: G vs A)
-            # Very different sizes so the small one is filtered by merge_min_size_ratio
+            # Very different sizes so the small one is filtered by select_min_size_ratio
             seq_large = "ATCGATCGATCGATCGATCGATCG"  # G at position 12
             seq_small = "ATCGATCGATCAATCGATCGATCG"  # A at position 12
@@ -542,7 +542,8 @@ class TestFullConsensus:
             with open(fasta_file, 'w') as f:
                 f.write(fasta_content)
-            # merge-min-size-ratio 0.1 filters 5/100=0.05 from full consensus
+            # select-min-size-ratio 0.1 filters 5/100=0.05 post-merge variant,
+            # so its pre-merge components are excluded from .full consensus
             result = subprocess.run(
                 [
                     "speconsense-summarize",
@@ -550,7 +551,7 @@ class TestFullConsensus:
                     "--summary-dir", summary_dir,
                     "--min-ric", "3",
                     "--enable-full-consensus",
-                    "--merge-min-size-ratio", "0.1",
+                    "--select-min-size-ratio", "0.1",
                     "--disable-merging",
                     "--min-merge-overlap", "0",
                 ],
@@ -574,8 +575,8 @@ class TestFullConsensus:
         finally:
             shutil.rmtree(temp_dir)
-    def test_full_consensus_no_filter_when_disabled(self):
-        """Integration test: merge_min_size_ratio=0 preserves all variants in full consensus."""
+    def test_full_consensus_no_filter_when_all_survive(self):
+        """Integration test: all post-merge variants surviving means all contribute to .full."""
         temp_dir = tempfile.mkdtemp()
         source_dir = os.path.join(temp_dir, "clusters")
         summary_dir = os.path.join(temp_dir, "__Summary__")
@@ -595,7 +596,7 @@ class TestFullConsensus:
             with open(fasta_file, 'w') as f:
                 f.write(fasta_content)
-            # merge-min-size-ratio 0 disables filtering — both contribute to .full
+            # No select-min-size-ratio — both variants survive, both contribute to .full
             result = subprocess.run(
                 [
                     "speconsense-summarize",
@@ -603,7 +604,6 @@ class TestFullConsensus:
                     "--summary-dir", summary_dir,
                     "--min-ric", "3",
                     "--enable-full-consensus",
-                    "--merge-min-size-ratio", "0",
                     "--disable-merging",
                     "--min-merge-overlap", "0",
                 ],
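
The test comments above rely on the arithmetic 5/100 = 0.05 < 0.1. A small sketch of that check follows, assuming `--select-min-size-ratio` compares each variant's size to the largest in its group in the same way the removed `merge_min_size_ratio` code did (`size / largest_size >= ratio`). The helper name `passes_size_ratio` is hypothetical.

```python
def passes_size_ratio(size, largest_size, min_ratio):
    """True if a variant is large enough relative to the largest one in its group."""
    # Assumed comparison, mirroring the expression removed from cli.py in this release.
    return (size / largest_size) >= min_ratio

sizes = [100, 5]  # the large and small variants from the tests above
largest = max(sizes)

# With a threshold of 0.1 the small variant (ratio 0.05) is dropped.
kept_filtered = [s for s in sizes if passes_size_ratio(s, largest, 0.1)]
# With a threshold of 0 every variant passes.
kept_all = [s for s in sizes if passes_size_ratio(s, largest, 0.0)]

print(kept_filtered)  # → [100]
print(kept_all)       # → [100, 5]
```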