PyPI - levseq - Versions diffs - 1.4.1__tar.gz → 1.4.2__tar.gz - Mend

levseq 1.4.1 tar.gz → 1.4.2 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

{levseq-1.4.1/levseq.egg-info → levseq-1.4.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: levseq
-Version: 1.4.1
+Version: 1.4.2
 Home-page: https://github.com/fhalab/levseq/
 Author: Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy
 Author-email: ylong@caltech.edu
@@ -69,7 +69,10 @@ Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore tech
    ```bash
    docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
    ```
+4. Connect function data to your sequence data
+   ```bash
+   docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"
+   ```
 ### Pip Installation (Mac/Linux only)
 **IMPORTANT**: On Mac M-series chips (M1-M4), gcc 13 and 14 are **REQUIRED**:
@@ -98,6 +101,18 @@ brew install gcc@13 gcc@14
    levseq my_experiment /path/to/data/ /path/to/ref.csv
    ```
+5. Combine function data:
+   ```bash
+   levseq my_experiment /path/to/data/ /path/to/ref.csv --fitness_files "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv" --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"
+   ```
+Note: function data is currently expected as LCMS CSV files with the following columns:
+- `Sample Vial Number`: the well the sample came from.
+- `Area`: the value that becomes the fitness value.
+- `Compound Name`: the compound name to filter on, passed with `--compound` (case sensitive).
+- Each file name must end in `_X.csv`, where `X` is the barcode number used to match the file to your plate. For example, if you ran LevSeq with barcode 33 for plate 2, the fitness file for plate 2 must end in `_33.csv`, e.g. `some_fitness_for_plate_2_33.csv`.
 ## Data and Visualization
 - **Test Data**: Sample data is available on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
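The file naming rule in the hunk above (the barcode is whatever follows the last underscore before `.csv`) can be sketched in a few lines of Python. `barcode_from_filename` mirrors the expression used in `combine_seq_func_data` in this release. `split_fitness_files` is a hypothetical helper, and its skipping of empty entries is an assumption of this sketch rather than something the package is shown to do.

```python
def barcode_from_filename(path: str) -> str:
    """Return the text after the last underscore and before '.csv'."""
    return path.split(".csv")[0].split("_")[-1]


def split_fitness_files(arg: str) -> dict:
    """Map each barcode to its file path (hypothetical helper).

    Empty entries, such as one left by a trailing comma, are skipped.
    That filtering is an addition in this sketch.
    """
    files = [f.strip() for f in arg.split(",") if f.strip()]
    return {barcode_from_filename(f): f for f in files}


mapping = split_fitness_files(
    "levseq_results/20250712_epPCR_Q06714_37.csv,"
    "levseq_results/20250712_epPCR_Q06714_39.csv,"
)
for barcode, path in mapping.items():
    print(barcode, path)
```

Because the split happens on the whole path, underscores in directory names do not matter as long as the file name itself ends in `_<barcode>.csv`.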

{levseq-1.4.1 → levseq-1.4.2}/README.md RENAMED

@@ -22,7 +22,10 @@ Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore tech
    ```bash
    docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
    ```
+4. Connect function data to your sequence data
+   ```bash
+   docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"
+   ```
 ### Pip Installation (Mac/Linux only)
 **IMPORTANT**: On Mac M-series chips (M1-M4), gcc 13 and 14 are **REQUIRED**:
@@ -51,6 +54,18 @@ brew install gcc@13 gcc@14
    levseq my_experiment /path/to/data/ /path/to/ref.csv
    ```
+5. Combine function data:
+   ```bash
+   levseq my_experiment /path/to/data/ /path/to/ref.csv --fitness_files "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv" --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"
+   ```
+Note: function data is currently expected as LCMS CSV files with the following columns:
+- `Sample Vial Number`: the well the sample came from.
+- `Area`: the value that becomes the fitness value.
+- `Compound Name`: the compound name to filter on, passed with `--compound` (case sensitive).
+- Each file name must end in `_X.csv`, where `X` is the barcode number used to match the file to your plate. For example, if you ran LevSeq with barcode 33 for plate 2, the fitness file for plate 2 must end in `_33.csv`, e.g. `some_fitness_for_plate_2_33.csv`.
 ## Data and Visualization
 - **Test Data**: Sample data is available on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
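The `--smiles` value in the Docker command above is a reaction SMILES, with reactants and products separated by `>>`. According to the `interface.py` hunk in this diff, the text after the last `>>` is stored as `smiles_string` and the full string as `reaction_smiles`. A minimal sketch of that split, where `split_reaction_smiles` is a hypothetical name:

```python
def split_reaction_smiles(reaction: str) -> tuple:
    """Return (reactant side, product side) of a reaction SMILES.

    If the string contains no '>>', both elements are the whole string,
    which matches how split('>>')[-1] behaves on such input.
    """
    parts = reaction.split(">>")
    return parts[0], parts[-1]


reaction = (
    "O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3"
    ">>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5"
)
reactants, product = split_reaction_smiles(reaction)
print(product)
```

On the command line the string should stay inside single quotes, as in the README example, so the shell does not interpret `>`, `(` or `)`.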

{levseq-1.4.1 → levseq-1.4.2}/levseq/__init__.py RENAMED

@@ -18,7 +18,7 @@
 __title__ = 'levseq'
 __description__ = 'LevSeq nanopore sequencing'
 __url__ = 'https://github.com/fhalab/levseq/'
-__version__ = '1.4.1'
+__version__ = '1.4.2'
 __author__ = 'Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy'
 __author_email__ = 'ylong@caltech.edu'
 __license__ = 'GPL3'

{levseq-1.4.1 → levseq-1.4.2}/levseq/interface.py RENAMED

@@ -21,6 +21,8 @@ Contain argument parsers used for command line interface and web interface
 import os
 import tqdm
 import argparse
+import pandas as pd
 # Import local packages
 from levseq.run_levseq import run_LevSeq
@@ -68,16 +70,73 @@ def build_cli_parser():
                                      help="Whether this experiment came from an oligopool, default is false.")
     optional_args_group.add_argument("--show_msa",
                                      default=False,
-                                     help="Skip showing msa")
+                                     help="Skip showing msa")
+    # if cl_args.get('fitness_files') and cl_args.get('smiles'):
+    optional_args_group.add_argument("--fitness_files",
+                                    default=None,
+                                    help="A comma separated list of fitness files (full path) with string quotation marks around them.")
+    optional_args_group.add_argument("--smiles",
+                                default=None,
+                                help="A smiles string of the reaction with quotation marks around.")
+    optional_args_group.add_argument("--compound",
+                            default=None,
+                            help="The compound in the fitness files (e.g. pDT or pdt - case sensitive).")
+    optional_args_group.add_argument("--variant_df",
+                        default=None,
+                        help="The variant dataframe to combine with fitness data.")
     return parser
+def combine_seq_func_data(cl_args):
+    # Also check if we have any fitness data
+    if cl_args.get('fitness_files') and cl_args.get('smiles') and cl_args.get('variant_df'):
+        variant_filename = cl_args.get('variant_df')
+        variant_df = pd.read_csv(variant_filename)
+        # Combine the fitness data with the plate data (note the barcode has to be the last _[barcode])
+        # The smiles has to be the reaction smiles
+        function_files = cl_args.get('fitness_files')
+        compound_name = cl_args.get('compound') if cl_args.get('compound') else 'pdt'
+        print(function_files, compound_name)
+        all_function_df = pd.DataFrame()
+        for function_file in function_files.split(','):
+            barcode = function_file.split('.csv')[0].split('_')[-1]
+            function_df = pd.read_csv(f'{function_file}')
+            function_df.columns = [c.replace('\n', ' ') for c in function_df.columns]
+            function_df['function_well'] = [x.split('-')[-1] if isinstance(x, str) else None for x in function_df['Sample Vial Number'].values]
+            function_df['function_barcode_plate'] = barcode
+            function_df = function_df[function_df['Compound Name'] == compound_name] # We only use pdt or Pdt
+            # Convert it to numeric
+            function_df['Area'] = pd.to_numeric(function_df['Area'], errors='coerce')
+            function_df['barcode_well'] = [f'{p}_{w}' for w, p in function_df[['function_well', 'function_barcode_plate']].values]
+            function_df['filename'] = function_file
+            print(function_df.head())
+            all_function_df = pd.concat([all_function_df, function_df])
+        # Join this with the variant_df barcode plate
+        variant_df['barcode_well'] = [f'{p}_{w}' for w, p in variant_df[['Well', 'barcode_plate']].values]
+        # Join the two
+        variant_df.set_index('barcode_well', inplace=True)
+        all_function_df.set_index('barcode_well', inplace=True)
+        variant_df = variant_df.join(all_function_df, how='left')
+        reaction_smiles = cl_args.get('smiles')
+        variant_df['smiles_string'] = reaction_smiles.split('>>')[-1]
+        variant_df['reaction_smiles'] = reaction_smiles
+        variant_df.columns = [c.lower().replace(' ', '_') for c in variant_df.columns]
+        variant_df.rename(columns={'area': 'fitness_value'}, inplace=True)
+        variant_df.to_csv(f'{variant_filename.replace(".csv", "_seqfunc.csv")}')
+        # levseq levseq_4.1 ref.csv fitness --fitness_files "20250712_epPCR_Q06714_37.csv,20250712_epPCR_Q06714_38.csv,20250712_epPCR_Q06714_39.csv,20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5'  --compound dPPi --variant_df visualization_partial.csv
+        return variant_df
 # Execute LevSeq
 def execute_LevSeq():
     # Build parser
     parser = build_cli_parser()
     # Parse the arguments
     CL_ARGS = vars(parser.parse_args())
+    if CL_ARGS.get('fitness_files') and CL_ARGS.get('smiles') and CL_ARGS.get('variant_df'):
+        print('Combining LevSeq')
+        return combine_seq_func_data(CL_ARGS)
     # Set up progress bar
     tqdm_fn = tqdm.tqdm
     # Run LevSeq
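The core of `combine_seq_func_data` above is a left join on a `barcode_well` key built as `<barcode>_<well>` on both sides. The package does this with pandas. The sketch below swaps in plain dictionaries to show only the key matching, and the row values are invented for illustration.

```python
def barcode_well(plate, well) -> str:
    """Build the join key; f-strings turn an int barcode into text."""
    return f"{plate}_{well}"


def left_join(variants: list, functions: list) -> list:
    """Keep every variant row and attach the matching Area, if any."""
    by_key = {
        barcode_well(f["function_barcode_plate"], f["function_well"]): f
        for f in functions
    }
    joined = []
    for v in variants:
        key = barcode_well(v["barcode_plate"], v["Well"])
        match = by_key.get(key)
        fitness = match["Area"] if match else None
        joined.append({**v, "barcode_well": key, "fitness_value": fitness})
    return joined


# Hypothetical rows: plate 33 has LCMS data for A1 only.
variants = [
    {"barcode_plate": 33, "Well": "A1"},
    {"barcode_plate": 33, "Well": "A2"},
]
functions = [
    {"function_barcode_plate": "33", "function_well": "A1", "Area": 1234.5},
]
for row in left_join(variants, functions):
    print(row)
```

One difference to keep in mind: a dictionary keeps a single row per key, whereas a pandas left join emits one output row per matching function row, so duplicate wells in an LCMS file would multiply rows in the real output.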

{levseq-1.4.1 → levseq-1.4.2}/levseq/run_levseq.py RENAMED

@@ -602,6 +602,7 @@ def process_ref_csv(cl_args, tqdm_fn=tqdm.tqdm):
                 continue
     variant_df.to_csv(variant_csv_path, index=False)
     return variant_df, ref_df

{levseq-1.4.1 → levseq-1.4.2/levseq.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: levseq
-Version: 1.4.1
+Version: 1.4.2
 Home-page: https://github.com/fhalab/levseq/
 Author: Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy
 Author-email: ylong@caltech.edu
@@ -69,7 +69,10 @@ Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore tech
    ```bash
    docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
    ```
+4. Connect function data to your sequence data
+   ```bash
+   docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"
+   ```
 ### Pip Installation (Mac/Linux only)
 **IMPORTANT**: On Mac M-series chips (M1-M4), gcc 13 and 14 are **REQUIRED**:
@@ -98,6 +101,18 @@ brew install gcc@13 gcc@14
    levseq my_experiment /path/to/data/ /path/to/ref.csv
    ```
+5. Combine function data:
+   ```bash
+   levseq my_experiment /path/to/data/ /path/to/ref.csv --fitness_files "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv" --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"
+   ```
+Note: function data is currently expected as LCMS CSV files with the following columns:
+- `Sample Vial Number`: the well the sample came from.
+- `Area`: the value that becomes the fitness value.
+- `Compound Name`: the compound name to filter on, passed with `--compound` (case sensitive).
+- Each file name must end in `_X.csv`, where `X` is the barcode number used to match the file to your plate. For example, if you ran LevSeq with barcode 33 for plate 2, the fitness file for plate 2 must end in `_33.csv`, e.g. `some_fitness_for_plate_2_33.csv`.
 ## Data and Visualization
 - **Test Data**: Sample data is available on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
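To make the LCMS column requirements above concrete, here is a rough standard-library sketch of the per-file steps that `interface.py` performs with pandas: normalise header newlines, keep rows whose `Compound Name` matches exactly, take the well from the text after the last `-` in `Sample Vial Number`, and coerce `Area` to a number. The sample rows, the `P1-A1` vial format, and the helper names are assumptions made for illustration.

```python
import csv
import io
import math


def to_number(text) -> float:
    """Coerce to float, using NaN for unparseable values."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def load_lcms_rows(handle, compound: str) -> list:
    """Return well/area pairs for one compound (case-sensitive match)."""
    rows = []
    for raw in csv.DictReader(handle):
        # Some exports put line breaks inside header cells.
        row = {k.replace("\n", " "): v for k, v in raw.items()}
        if row["Compound Name"] != compound:
            continue
        vial = row["Sample Vial Number"]
        well = vial.split("-")[-1] if vial else None
        rows.append({"well": well, "area": to_number(row["Area"])})
    return rows


# Hypothetical LCMS export with one mismatched name and one bad value.
sample = io.StringIO(
    "Sample Vial Number,Compound Name,Area\n"
    "P1-A1,dPPi,1234.5\n"
    "P1-A2,dppi,99\n"
    "P1-A3,dPPi,n.a.\n"
)
rows = load_lcms_rows(sample, "dPPi")
for row in rows:
    print(row)
```

The `dppi` row is dropped because the comparison is case sensitive, which is why the value passed to `--compound` has to match the LCMS file exactly.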