levseq 1.2.6.tar.gz → 1.2.9.tar.gz (PyPI)


Files changed (40)

{levseq-1.2.6/levseq.egg-info → levseq-1.2.9}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: levseq
-Version: 1.2.6
+Version: 1.2.9
 Home-page: https://github.com/fhalab/levseq/
 Author: Yueming Long, Emreay Gursoy, Ariane Mora, Francesca-Zhoufan Li
 Author-email: ylong@caltech.edu
@@ -49,18 +49,18 @@ Requires-Dist: biopandas
 In directed evolution, sequencing every variant enhances data insight and creates datasets suitable for AI/ML methods. This method is presented as an extension of the original Every Variant Sequencer using Illumina technology. With this approach, sequence variants can be generated within a day at an extremely low cost.
-![Figure 1: LevSeq Workflow](manuscript/figures/LevSeq_Figure-1.png)
+![Figure 1: LevSeq Workflow](manuscript/figures/LevSeq_Figure-1.jpeg)
 Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore technology. This diagram illustrates the key steps in the process, from sample preparation to data analysis and visualization.
 - Data to reproduce the results and to test are available on zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
-- A dockerized website and database for labs to locally host and visualize their data: website is available [here](https://github.com/ArianeMora/LevSeq_vis/) and code to host locally at: https://github.com/fhalab/LevSeq_VDB/
+- A dockerized website and database for labs to locally host and visualize their data: website is available [here](https://levseqdb.streamlit.app/) and code to host locally [here](https://github.com/fhalab/LevSeq_db)
 ## Setup
 For setting up the experimental side of LevSeq we suggest the following preparations:
-- Order forward and reverse primers compatible with the desired plasmid, see methods section of [our paper](http://biorxiv.org/cgi/content/short/2024.09.04.611255v1?rss=1).
+- Order forward and reverse primers compatible with the desired plasmid, see methods section of [our paper](https://pubs.acs.org/doi/10.1021/acssynbio.4c00625).
 - Successfully install Oxford Nanopore's software (this is only for if you are doing basecalling/minION processing). [Link to installation guide](https://nanoporetech.com/).
 ## How to Use LevSeq
@@ -171,4 +171,18 @@ For more details or trouble shooting please look at our [computational_protocols
 #### Citing
-If you have found LevSeq useful, please cite out [paper](https://doi.org/10.1101/2024.09.04.611255).
+If you have found LevSeq useful, please cite our [paper](https://pubs.acs.org/doi/10.1021/acssynbio.4c00625).
+```bibtex
+@article{long2024levseq,
+  title={LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning},
+  author={Long, Yueming and Mora, Ariane and Li, Francesca-Zhoufan and Gürsoy, Emre and Johnston, Kadina E and Arnold, Frances H},
+  journal={ACS Synthetic Biology},
+  year={2024},
+  publisher={American Chemical Society}
+}
+```
+#### Contact
+Leave a feature request in the issues or reach us via [email](mailto:levseqdb@gmail.com).

{levseq-1.2.6 → levseq-1.2.9}/README.md RENAMED

@@ -2,18 +2,18 @@
 In directed evolution, sequencing every variant enhances data insight and creates datasets suitable for AI/ML methods. This method is presented as an extension of the original Every Variant Sequencer using Illumina technology. With this approach, sequence variants can be generated within a day at an extremely low cost.
-![Figure 1: LevSeq Workflow](manuscript/figures/LevSeq_Figure-1.png)
+![Figure 1: LevSeq Workflow](manuscript/figures/LevSeq_Figure-1.jpeg)
 Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore technology. This diagram illustrates the key steps in the process, from sample preparation to data analysis and visualization.
 - Data to reproduce the results and to test are available on zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
-- A dockerized website and database for labs to locally host and visualize their data: website is available [here](https://github.com/ArianeMora/LevSeq_vis/) and code to host locally at: https://github.com/fhalab/LevSeq_VDB/
+- A dockerized website and database for labs to locally host and visualize their data: website is available [here](https://levseqdb.streamlit.app/) and code to host locally [here](https://github.com/fhalab/LevSeq_db)
 ## Setup
 For setting up the experimental side of LevSeq we suggest the following preparations:
-- Order forward and reverse primers compatible with the desired plasmid, see methods section of [our paper](http://biorxiv.org/cgi/content/short/2024.09.04.611255v1?rss=1).
+- Order forward and reverse primers compatible with the desired plasmid, see methods section of [our paper](https://pubs.acs.org/doi/10.1021/acssynbio.4c00625).
 - Successfully install Oxford Nanopore's software (this is only for if you are doing basecalling/minION processing). [Link to installation guide](https://nanoporetech.com/).
 ## How to Use LevSeq
@@ -124,4 +124,18 @@ For more details or trouble shooting please look at our [computational_protocols
 #### Citing
-If you have found LevSeq useful, please cite out [paper](https://doi.org/10.1101/2024.09.04.611255).
+If you have found LevSeq useful, please cite our [paper](https://pubs.acs.org/doi/10.1021/acssynbio.4c00625).
+```bibtex
+@article{long2024levseq,
+  title={LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning},
+  author={Long, Yueming and Mora, Ariane and Li, Francesca-Zhoufan and Gürsoy, Emre and Johnston, Kadina E and Arnold, Frances H},
+  journal={ACS Synthetic Biology},
+  year={2024},
+  publisher={American Chemical Society}
+}
+```
+#### Contact
+Leave a feature request in the issues or reach us via [email](mailto:levseqdb@gmail.com).

{levseq-1.2.6 → levseq-1.2.9}/levseq/__init__.py RENAMED

@@ -18,7 +18,7 @@
 __title__ = 'levseq'
 __description__ = 'LevSeq nanopore sequencing'
 __url__ = 'https://github.com/fhalab/levseq/'
-__version__ = '1.2.6'
+__version__ = '1.2.9'
 __author__ = 'Yueming Long, Emreay Gursoy, Ariane Mora, Francesca-Zhoufan Li'
 __author_email__ = 'ylong@caltech.edu'
 __license__ = 'GPL3'

{levseq-1.2.6 → levseq-1.2.9}/levseq/run_levseq.py RENAMED

@@ -275,11 +275,11 @@ def create_df_v(variants_df):
     )
     # Fill in 'Deletion' in 'aa_variant' column
     df_variants_.loc[
-        df_variants_["nc_variant"] == "Deletion", "aa_variant"
-    ] = "Deletion"
+        df_variants_["nc_variant"] == "#DEL#", "aa_variant"
+    ] = "#DEL#"
     df_variants_.loc[
-        df_variants_["nc_variant"] == "Insertion", "aa_variant"
-    ] = "Insertion"
+        df_variants_["nc_variant"] == "#INS#", "aa_variant"
+    ] = "#INS#"
     # Compare aa_variant with translated refseq and generate Substitutions column
     df_variants_["Substitutions"] = df_variants_.apply(get_mutations, axis=1)
@@ -291,7 +291,7 @@ def create_df_v(variants_df):
     # Fill in Deletion into Substitutions Column, keep #N.A.# unchanged
     for i in df_variants_.index:
         if df_variants_["nc_variant"].iloc[i] == "Deletion":
-            df_variants_.Substitutions.iat[i] = df_variants_.Substitutions.iat[i].replace("", "-")
+            df_variants_.Substitutions.iat[i] = df_variants_.Substitutions.iat[i].replace("", "#DEL#")
         elif df_variants_["nc_variant"].iloc[i] == "#N.A.#":
             df_variants_.Substitutions.iat[i] = "#N.A.#"
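
Reviewer note on the `.replace("", "#DEL#")` call in the hunk above: in Python, `str.replace` with an empty search string inserts the replacement before every character and once at the end, so it yields a bare marker only when the original string is empty. The sketch below shows this with made-up values; `mark_deletion` is a hypothetical helper for illustration and is not part of levseq. Separately, the unchanged context line still compares `nc_variant` against `"Deletion"`, while `create_nc_variant` returns `"#DEL#"` in the following hunk; whether that branch is still reachable in 1.2.9 cannot be determined from this diff alone.

```python
# Behaviour of str.replace with an empty search string (plain Python, no levseq imports).
empty = ""
non_empty = "A12G"  # made-up substitution label, for illustration only

# With an empty pattern, the replacement is inserted at every position.
print(repr(empty.replace("", "#DEL#")))
print(repr(non_empty.replace("", "-")))


def mark_deletion(substitutions: str, marker: str = "#DEL#") -> str:
    """Hypothetical alternative: return the marker only when nothing else is recorded."""
    return marker if substitutions == "" else substitutions
```

If the `Substitutions` value is always empty for deletion rows, both forms give the same result; the explicit check only differs when the string is non-empty.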
@@ -363,9 +363,9 @@ def create_nc_variant(variant, refseq):
     elif variant == "#PARENT#":
         return refseq
     elif "DEL" in variant:
-        return "Deletion"
+        return "#DEL#"
     elif variant == '+':
-        return "Insertion"
+        return "#INS#"
     else:
         mutations = variant.split("_")
         nc_variant = list(refseq)
@@ -465,7 +465,7 @@ def process_ref_csv(cl_args, tqdm_fn=tqdm.tqdm):
             logging.info(f"Fasta file for {name} already exists. Skipping write.")
         barcode_path = filter_bc(cl_args, name_folder, i)
-        output_dir = Path(result_folder) / "basecalled_reads"
+        output_dir = Path(result_folder) / f"{cl_args['name']}_fastq"
         output_dir.mkdir(parents=True, exist_ok=True)
         if not cl_args["skip_demultiplexing"]:
@@ -491,17 +491,25 @@ def process_ref_csv(cl_args, tqdm_fn=tqdm.tqdm):
                 continue
     variant_df.to_csv(variant_csv_path, index=False)
-    return variant_df
+    return variant_df, ref_df
 # Main function to run LevSeq and ensure saving of intermediate results if an error occurs
 def run_LevSeq(cl_args, tqdm_fn=tqdm.tqdm):
     result_folder = create_result_folder(cl_args)
+    # Ref folder for saving ref csv file
+    ref_folder = os.path.join(result_folder, "ref")
+    os.makedirs(ref_folder, exist_ok=True)
     configure_logging(result_folder)
+    logging.info("Logging configured. Starting program.")
     variant_df = pd.DataFrame(columns=["barcode_plate", "name", "refseq", "variant"])
     try:
-        variant_df = process_ref_csv(cl_args, tqdm_fn)
+        variant_df, ref_df = process_ref_csv(cl_args, tqdm_fn)
+        ref_df_path = os.path.join(ref_folder, cl_args["name"]+".csv")
+        ref_df.to_csv(ref_df_path, index=False)
         if variant_df.empty:
             logging.warning("No data found during CSV processing. The CSV is empty.")
     except Exception as e:
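
Reviewer note on the changed return value: `process_ref_csv` now returns a pair, so any caller that still binds the result to a single name receives a tuple rather than a DataFrame. The sketch below mirrors the new call pattern and the `ref/<name>.csv` output path using only the standard library. The `csv` module stands in for pandas here, and `fake_process_ref_csv`, `save_reference_table`, and all column names are hypothetical, chosen only for illustration.

```python
import csv
import os


def fake_process_ref_csv():
    """Hypothetical stand-in that returns two tables, like the new signature."""
    variants = [("plate1", "A1", "A12G")]   # made-up variant rows
    refs = [("1", "plate1", "ATGC")]        # made-up reference rows
    return variants, refs


def save_reference_table(result_folder, run_name, rows, header):
    """Write the reference rows to <result_folder>/ref/<run_name>.csv."""
    ref_folder = os.path.join(result_folder, "ref")
    os.makedirs(ref_folder, exist_ok=True)  # no error if the folder already exists
    ref_path = os.path.join(ref_folder, run_name + ".csv")
    with open(ref_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return ref_path


# Callers must unpack both values after the signature change.
variant_rows, ref_rows = fake_process_ref_csv()
```

A call such as `save_reference_table(result_folder, "my_run", ref_rows, ["barcode_plate", "name", "refseq"])` would then produce the file under `ref/`.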

{levseq-1.2.6 → levseq-1.2.9}/levseq/utils.py RENAMED

@@ -214,8 +214,10 @@ def calculate_mutation_significance_across_well(seq_df):
     # Do multiple test correction to correct each of the pvalues
     for p in ['p_value', 'p(a)', 'p(t)', 'p(g)', 'p(c)', 'p(n)', 'p(i)']:
         # Do B.H which is the simplest possibly change to have alpha be a variable! ToDo :D
-        padjs = multipletests(seq_df[p].values, alpha=0.05, method='fdr_bh')
-        seq_df[f'{p} adj.'] = padjs[1]
+        padjs = seq_df[p].values * len(seq_df)
+        # The multiple test correction was sometimes returning 0 so we're updating to just do bonferroni
+        #multipletests(seq_df[p].values, alpha=0.05, method='fdr_bh')
+        seq_df[f'{p} adj.'] = padjs #padjs[1]
     return seq_df
 def alignment_from_cigar(cigar: str, alignment: str, ref: str, query_qualities: list):
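
Reviewer note on the correction change: multiplying each p-value by the number of rows is a Bonferroni adjustment. As written in the hunk above, the product is not capped, so adjusted values can exceed 1. The sketch below shows the same arithmetic in plain Python with an optional cap at 1.0, which is the usual convention. The function name and the `cap` parameter are assumptions for illustration and are not levseq API.

```python
def bonferroni(p_values, cap=True):
    """Bonferroni adjustment: multiply each p-value by the number of tests.

    With cap=False this mirrors the uncapped product shown in the diff;
    with cap=True the result is limited to 1.0.
    """
    n = len(p_values)
    adjusted = [p * n for p in p_values]
    if cap:
        adjusted = [min(1.0, p) for p in adjusted]
    return adjusted


raw = [0.5, 0.125, 0.25, 0.0078125]  # made-up p-values
print(bonferroni(raw, cap=False))
print(bonferroni(raw, cap=True))
```

Capping matters for any downstream step that treats the adjusted value as a probability, for example taking its logarithm when combining p-values.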
@@ -246,8 +248,8 @@ def alignment_from_cigar(cigar: str, alignment: str, ref: str, query_qualities:
             pos += op_len
             ref_pos += op_len
         elif op == 1:  # insertion to the reference
-            inserts[pos] = alignment[pos - 1:pos + op_len]
-            # new_seq += alignment[pos:pos + op_len]
+            inserts[ref_pos - 1] = alignment[pos - 1:pos + op_len]
+            new_seq = new_seq[:-1] + 'I'  # Set the previous position to be an insertion
             pos += op_len
         elif op == 2:  # deletion from the reference
             new_seq += '-' * op_len
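
Reviewer note on the insertion handling: the change keys each insertion by reference position instead of query position and overwrites the preceding base in the reference-length string with `I`. The sketch below is a simplified, standalone walk over CIGAR operations that follows the same idea. It assumes `(op, length)` tuples with 0 for match, 1 for insertion, and 2 for deletion, ignores quality scores and other operations, and skips insertions at the very start of the read; that guard is an addition of this sketch and is not in the diff.

```python
def apply_cigar(cigartuples, query):
    """Return (reference-length string, {ref_index: inserted bases})."""
    new_seq = ""
    inserts = {}
    pos = 0      # index into the query (read)
    ref_pos = 0  # index into the reference
    for op, op_len in cigartuples:
        if op == 0:    # match or mismatch: consumes query and reference
            new_seq += query[pos:pos + op_len]
            pos += op_len
            ref_pos += op_len
        elif op == 1:  # insertion: consumes query only
            if new_seq:  # guard added in this sketch for leading insertions
                # Store the anchor base plus the inserted bases at the anchor's reference index.
                inserts[ref_pos - 1] = query[pos - 1:pos + op_len]
                # Flag the anchor position as carrying an insertion.
                new_seq = new_seq[:-1] + "I"
            pos += op_len
        elif op == 2:  # deletion: consumes reference only
            new_seq += "-" * op_len
            ref_pos += op_len
    return new_seq, inserts


# Made-up read "ACXGT" against a 4-base reference: 2 matches, 1 inserted base, 2 matches.
print(apply_cigar([(0, 2), (1, 1), (0, 2)], "ACXGT"))
```

Because insertions do not advance `ref_pos`, the returned string stays the same length as the reference, which keeps per-position comparisons aligned.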
@@ -487,12 +489,15 @@ def get_variant_label_for_well(seq_df, threshold):
         label = '_'.join(label)
         # Only keep the frequency of the most frequent mutation
         probability = np.mean([x for x in non_refs['percent_most_freq_mutation'].values])
-        # Combine the values
-        chi2_statistic, combined_p_value = combine_pvalues([x for x in non_refs['p_value adj.'].values], method='fisher')
+        # Combine the values -> looks like fishers works maybe only if there are > 1
+        if len(non_refs) > 1:
+            chi2_statistic, combined_p_value = combine_pvalues([x for x in non_refs['p_value adj.'].values], method='fisher')
+        else:
+            combined_p_value = non_refs['p_value adj.'].values[0]
     else:
         label = '#PARENT#'
         probability = np.mean([1 - x for x in non_refs['freq_non_ref'].values])
         combined_p_value = float("nan")
     # Return also the mean mutation rate for the well
     mean_mutation_rate = np.mean([1 - x for x in non_refs['freq_non_ref'].values])
-    return label, probability, combined_p_value, mixed_well, mean_mutation_rate
+    return label, probability, combined_p_value, mixed_well, mean_mutation_rate
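
Reviewer note on the single-value branch: Fisher's method combines k p-values through the statistic X = -2 Σ ln(p), which follows a chi-square distribution with 2k degrees of freedom. For k = 1 the tail probability is exp(-X/2) = p, so returning the lone adjusted p-value directly agrees with the formula whenever 0 < p ≤ 1. The sketch below is a standard-library illustration of that identity using the closed-form tail for even degrees of freedom. It is not a reimplementation of the `combine_pvalues` call in the diff, and `fisher_combine` is a hypothetical name.

```python
import math


def fisher_combine(p_values):
    """Fisher's combined p-value for inputs in (0, 1].

    Uses the closed-form chi-square survival function for 2k degrees of freedom:
    sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )


print(fisher_combine([0.05]))        # one value: reduces to the input
print(fisher_combine([0.05, 0.05]))  # two values: smaller than either input
```

The identity holds only for p-values in (0, 1]; uncapped Bonferroni products above 1 fall outside that range.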

{levseq-1.2.6 → levseq-1.2.9}/levseq/variantcaller.py RENAMED

@@ -27,6 +27,7 @@ from Bio import SeqIO
 import re
 from tqdm import tqdm
 import warnings
+import math
 '''
 Script for variant calling
@@ -37,11 +38,12 @@ The variant caller starts from demultiplexed fastq files.
 3) Call variant with soft alignment
 '''
-# Set up logging with a default level of WARNING
-logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s')
 logger = logging.getLogger(__name__)
-# Suppress numpy warnings
-warnings.filterwarnings("ignore", category=RuntimeWarning)
+logger.setLevel(logging.WARNING)  # Set default level for this module
+# Use the logger in this file
+logger.warning("This is a warning message.")
+logger.info("This won't show unless logging is configured to INFO elsewhere.")
 class VariantCaller:
     """
@@ -212,7 +214,7 @@ class VariantCaller:
         self.variant_df['Variant'] = [self.variant_dict[b_id].get('Variant') for b_id in self.variant_df['ID'].values]
         self.variant_df['Mixed Well'] = [self.variant_dict[b_id].get('Mixed Well') for b_id in self.variant_df['ID'].values]
         self.variant_df['Average mutation frequency'] = [self.variant_dict[b_id].get('Average mutation frequency') for b_id in self.variant_df['ID'].values]
-        self.variant_df['P value'] = [self.variant_dict[b_id].get('P value') if self.variant_dict[b_id].get('P value') else 1.0 for b_id in self.variant_df['ID'].values]
+        self.variant_df['P value'] = [self.variant_dict[b_id].get('P value') for b_id in self.variant_df['ID'].values]
         self.variant_df['Alignment Count'] = [self.variant_dict[b_id].get('Alignment Count') for b_id in self.variant_df['ID'].values]
         self.variant_df['Average error rate'] = [self.variant_dict[b_id].get('Average error rate') for b_id in self.variant_df['ID'].values]
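
Reviewer note on the `P value` change: the removed expression replaced every falsy value with 1.0, which includes a legitimate p-value of exactly 0.0 as well as a missing entry (`dict.get` returns `None`). The new expression passes the stored value through unchanged. The sketch below contrasts the two on made-up inputs; both function names are hypothetical.

```python
import math


def old_p_value(stored):
    """Mirror of the removed expression: falsy values become 1.0."""
    return stored if stored else 1.0


def new_p_value(stored):
    """Mirror of the added expression: the stored value is returned as-is."""
    return stored


for value in (0.03, 0.0, None, float("nan")):
    print(repr(value), "->", repr(old_p_value(value)), "vs", repr(new_p_value(value)))
```

Note that NaN is truthy in Python, so it passed through unchanged under both versions; only 0.0 and missing values are treated differently.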

{levseq-1.2.6 → levseq-1.2.9/levseq.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: levseq
-Version: 1.2.6
+Version: 1.2.9
 Home-page: https://github.com/fhalab/levseq/
 Author: Yueming Long, Emreay Gursoy, Ariane Mora, Francesca-Zhoufan Li
 Author-email: ylong@caltech.edu
@@ -49,18 +49,18 @@ Requires-Dist: biopandas
 In directed evolution, sequencing every variant enhances data insight and creates datasets suitable for AI/ML methods. This method is presented as an extension of the original Every Variant Sequencer using Illumina technology. With this approach, sequence variants can be generated within a day at an extremely low cost.
-![Figure 1: LevSeq Workflow](manuscript/figures/LevSeq_Figure-1.png)
+![Figure 1: LevSeq Workflow](manuscript/figures/LevSeq_Figure-1.jpeg)
 Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore technology. This diagram illustrates the key steps in the process, from sample preparation to data analysis and visualization.
 - Data to reproduce the results and to test are available on zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
-- A dockerized website and database for labs to locally host and visualize their data: website is available [here](https://github.com/ArianeMora/LevSeq_vis/) and code to host locally at: https://github.com/fhalab/LevSeq_VDB/
+- A dockerized website and database for labs to locally host and visualize their data: website is available [here](https://levseqdb.streamlit.app/) and code to host locally [here](https://github.com/fhalab/LevSeq_db)
 ## Setup
 For setting up the experimental side of LevSeq we suggest the following preparations:
-- Order forward and reverse primers compatible with the desired plasmid, see methods section of [our paper](http://biorxiv.org/cgi/content/short/2024.09.04.611255v1?rss=1).
+- Order forward and reverse primers compatible with the desired plasmid, see methods section of [our paper](https://pubs.acs.org/doi/10.1021/acssynbio.4c00625).
 - Successfully install Oxford Nanopore's software (this is only for if you are doing basecalling/minION processing). [Link to installation guide](https://nanoporetech.com/).
 ## How to Use LevSeq
@@ -171,4 +171,18 @@ For more details or trouble shooting please look at our [computational_protocols
 #### Citing
-If you have found LevSeq useful, please cite out [paper](https://doi.org/10.1101/2024.09.04.611255).
+If you have found LevSeq useful, please cite our [paper](https://pubs.acs.org/doi/10.1021/acssynbio.4c00625).
+```bibtex
+@article{long2024levseq,
+  title={LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning},
+  author={Long, Yueming and Mora, Ariane and Li, Francesca-Zhoufan and Gürsoy, Emre and Johnston, Kadina E and Arnold, Frances H},
+  journal={ACS Synthetic Biology},
+  year={2024},
+  publisher={American Chemical Society}
+}
+```
+#### Contact
+Leave a feature request in the issues or reach us via [email](mailto:levseqdb@gmail.com).

{levseq-1.2.6 → levseq-1.2.9}/levseq.egg-info/SOURCES.txt RENAMED

@@ -30,6 +30,7 @@ levseq/barcoding/demultiplex-arm64
 levseq/barcoding/demultiplex-x86
 levseq/barcoding/minion_barcodes.fasta
 tests/test_demultiplex_docker.py
+tests/test_deploy.py
 tests/test_opligopools.py
 tests/test_seqfitvis.py
 tests/test_seqs.py

levseq-1.2.9/tests/test_deploy.py ADDED

@@ -0,0 +1,91 @@
+###############################################################################
+#                                                                             #
+#    This program is free software: you can redistribute it and/or modify     #
+#    it under the terms of the GNU General Public License as published by     #
+#    the Free Software Foundation, either version 3 of the License, or        #
+#    (at your option) any later version.                                      #
+#                                                                             #
+#    This program is distributed in the hope that it will be useful,          #
+#    but WITHOUT ANY WARRANTY; without even the implied warranty of           #
+#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the            #
+#    GNU General Public License for more details.                             #
+#                                                                             #
+#    You should have received a copy of the GNU General Public License        #
+#    along with this program. If not, see <http://www.gnu.org/licenses/>.     #
+#                                                                             #
+###############################################################################
+import shutil
+import tempfile
+import unittest
+import matplotlib.pyplot as plt
+from levseq import *
+from levseq.run_levseq import process_ref_csv
+u = SciUtil()
+import math
+class TestClass(unittest.TestCase):
+    @classmethod
+    def setup_class(self):
+        local = True
+        # Create a base object since it will be the same for all the tests
+        THIS_DIR = os.path.dirname(os.path.abspath(__file__))
+        self.data_dir = os.path.join(THIS_DIR, 'test_data/')
+        if local:
+            self.tmp_dir = os.path.join(THIS_DIR, 'test_data/tmp/')
+            if os.path.exists(self.tmp_dir):
+                shutil.rmtree(self.tmp_dir)
+            os.mkdir(self.tmp_dir)
+        else:
+            self.tmp_dir = tempfile.mkdtemp(prefix='test_data')
+    @classmethod
+    def teardown_class(self):
+        shutil.rmtree(self.tmp_dir)
+class TestDeploy(TestClass):
+    def test_deploy(self):
+        cmd_list = [
+            'docker',  # Needs to be installed as vina.
+            'run',
+            '--rm',
+            '-v',
+            f'{os.getcwd()}:/levseq_results',
+            'levseq',
+            'test_deploy',
+            'test_data/laragen_run/levseq-1.2.7/',
+            'test_data/laragen_run/20241116-LevSeq-Review-Validation-levseq_ref.csv'
+        ]
+        # ToDo: add in scoring function for ad4
+        cmd_return = subprocess.run(cmd_list, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        print(cmd_return.stdout, cmd_return)
+    def test_variant_calling(self):
+        # Take as input the demultiplexed fastq files and the reference csv file
+        cl_args = {'skip_demultiplexing': True, 'skip_variantcalling': False}
+        cl_args["name"] = 'test_deploy'
+        cl_args['path'] = 'test_data/laragen_run/levseq-1.2.7/'
+        cl_args["summary"] = 'test_data/laragen_run/20241116-LevSeq-Review-Validation-levseq_ref.csv'
+        variant_df, ref_df = process_ref_csv(cl_args)
+        # Now we want to check all the variants are the same as in the original case:
+        checked_variants_df = pd.read_csv('test_data/laragen_run/levseq-1.2.7/variants_gold_standard.csv')
+        checked_variants = checked_variants_df['Variant'].values
+        checked_sig = checked_variants_df['P adj. value'].values
+        i = 0
+        for variant, pval in variant_df[['Variant', 'P adj. value']].values:
+            print(variant, checked_variants[i])
+            if checked_variants[i]:
+                if variant:
+                    assert variant == checked_variants[i]
+            # if pval < 0.05:
+            #     assert checked_sig[i] < 0.05
+            # elif math.isnan(pval):
+            #     assert math.isnan(checked_sig[i])
+            # else:
+            #     assert checked_sig[i] >= 0.05
+            print(pval, checked_sig[i])
+            i += 1
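
Reviewer note on the comparison loop in `test_variant_calling`: the manual index and the commented-out significance checks could be expressed as two small helpers, sketched below with the standard library only. Both helper names and the `alpha` default are assumptions for illustration; the real test reads its values from pandas DataFrames, which this sketch replaces with plain lists.

```python
import math


def same_call(pval, gold, alpha=0.05):
    """True when both p-values fall on the same side of alpha, or both are NaN."""
    if math.isnan(pval) or math.isnan(gold):
        return math.isnan(pval) and math.isnan(gold)
    return (pval < alpha) == (gold < alpha)


def compare_variants(called, gold):
    """Return (index, called, expected) for every pair where both are set and differ."""
    mismatches = []
    for idx, (variant, expected) in enumerate(zip(called, gold)):
        if variant and expected and variant != expected:
            mismatches.append((idx, variant, expected))
    return mismatches


# Made-up labels: the second well disagrees with the gold standard.
print(compare_variants(["A1G", "#PARENT#", ""], ["A1G", "A2T", "X"]))
```

Collecting mismatches before asserting reports every disagreeing well in one run instead of stopping at the first. A float NaN is truthy in Python, so a truthiness check alone does not filter out missing values.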