levseq 1.4.1__tar.gz → 1.4.2__tar.gz

Files changed (42)
  1. {levseq-1.4.1/levseq.egg-info → levseq-1.4.2}/PKG-INFO +17 -2
  2. {levseq-1.4.1 → levseq-1.4.2}/README.md +16 -1
  3. {levseq-1.4.1 → levseq-1.4.2}/levseq/__init__.py +1 -1
  4. {levseq-1.4.1 → levseq-1.4.2}/levseq/interface.py +60 -1
  5. {levseq-1.4.1 → levseq-1.4.2}/levseq/run_levseq.py +1 -0
  6. {levseq-1.4.1 → levseq-1.4.2/levseq.egg-info}/PKG-INFO +17 -2
  7. {levseq-1.4.1 → levseq-1.4.2}/LICENSE +0 -0
  8. {levseq-1.4.1 → levseq-1.4.2}/MANIFEST.in +0 -0
  9. {levseq-1.4.1 → levseq-1.4.2}/levseq/IO_processor.py +0 -0
  10. {levseq-1.4.1 → levseq-1.4.2}/levseq/barcoding/__init__.py +0 -0
  11. {levseq-1.4.1 → levseq-1.4.2}/levseq/barcoding/demultiplex +0 -0
  12. {levseq-1.4.1 → levseq-1.4.2}/levseq/barcoding/demultiplex-arm64 +0 -0
  13. {levseq-1.4.1 → levseq-1.4.2}/levseq/barcoding/demultiplex-x86 +0 -0
  14. {levseq-1.4.1 → levseq-1.4.2}/levseq/barcoding/minion_barcodes.fasta +0 -0
  15. {levseq-1.4.1 → levseq-1.4.2}/levseq/basecaller.py +0 -0
  16. {levseq-1.4.1 → levseq-1.4.2}/levseq/cmd.py +0 -0
  17. {levseq-1.4.1 → levseq-1.4.2}/levseq/coordinates.py +0 -0
  18. {levseq-1.4.1 → levseq-1.4.2}/levseq/filter_orientation.py +0 -0
  19. {levseq-1.4.1 → levseq-1.4.2}/levseq/globals.py +0 -0
  20. {levseq-1.4.1 → levseq-1.4.2}/levseq/parser.py +0 -0
  21. {levseq-1.4.1 → levseq-1.4.2}/levseq/screen.py +0 -0
  22. {levseq-1.4.1 → levseq-1.4.2}/levseq/seqfit.py +0 -0
  23. {levseq-1.4.1 → levseq-1.4.2}/levseq/simulation.py +0 -0
  24. {levseq-1.4.1 → levseq-1.4.2}/levseq/user.py +0 -0
  25. {levseq-1.4.1 → levseq-1.4.2}/levseq/utils.py +0 -0
  26. {levseq-1.4.1 → levseq-1.4.2}/levseq/variantcaller.py +0 -0
  27. {levseq-1.4.1 → levseq-1.4.2}/levseq/visualization.py +0 -0
  28. {levseq-1.4.1 → levseq-1.4.2}/levseq.egg-info/SOURCES.txt +0 -0
  29. {levseq-1.4.1 → levseq-1.4.2}/levseq.egg-info/dependency_links.txt +0 -0
  30. {levseq-1.4.1 → levseq-1.4.2}/levseq.egg-info/entry_points.txt +0 -0
  31. {levseq-1.4.1 → levseq-1.4.2}/levseq.egg-info/requires.txt +0 -0
  32. {levseq-1.4.1 → levseq-1.4.2}/levseq.egg-info/top_level.txt +0 -0
  33. {levseq-1.4.1 → levseq-1.4.2}/setup.cfg +0 -0
  34. {levseq-1.4.1 → levseq-1.4.2}/setup.py +0 -0
  35. {levseq-1.4.1 → levseq-1.4.2}/tests/test_copy_fastq.py +0 -0
  36. {levseq-1.4.1 → levseq-1.4.2}/tests/test_demultiplex_docker.py +0 -0
  37. {levseq-1.4.1 → levseq-1.4.2}/tests/test_deploy.py +0 -0
  38. {levseq-1.4.1 → levseq-1.4.2}/tests/test_opligopools.py +0 -0
  39. {levseq-1.4.1 → levseq-1.4.2}/tests/test_seqfitvis.py +0 -0
  40. {levseq-1.4.1 → levseq-1.4.2}/tests/test_seqs.py +0 -0
  41. {levseq-1.4.1 → levseq-1.4.2}/tests/test_statistics.py +0 -0
  42. {levseq-1.4.1 → levseq-1.4.2}/tests/test_variant_calling.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: levseq
- Version: 1.4.1
+ Version: 1.4.2
  Home-page: https://github.com/fhalab/levseq/
  Author: Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy
  Author-email: ylong@caltech.edu
@@ -69,7 +69,10 @@ Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore tech
  ```bash
  docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
  ```
-
+ 4. Connect function data to your sequence data
+ ```bash
+ docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"
+ ```
  ### Pip Installation (Mac/Linux only)

  **IMPORTANT**: On Mac M-series chips (M1-M4), gcc 13 and 14 are **REQUIRED**:
@@ -98,6 +101,18 @@ brew install gcc@13 gcc@14
  levseq my_experiment /path/to/data/ /path/to/ref.csv
  ```

+ 5. Combine function data:
+ ```bash
+ levseq my_experiment /path/to/data/ /path/to/ref.csv "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv," --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"
+ ```
+
+ Note for function data we currently expect a LCMS file e.g. with the columns:
+ - `Sample Vial Number` (corresponding to the well that the sample was from).
+ - `Area` (which becomes fitness value).
+ - `Compound Name` which is the name of the compound we filter for that is passed as a parameter.
+ - The last `_X.csv` needs to be the barcode number to match that sample to your plate e.g. if you ran LevSeq with barcode 33 for plate 2 you need to have `_33.csv` for the fitness file for plate 2. e.g. `some_fitnes_for_plate_2_33.csv`.
+
+
  ## Data and Visualization

  - **Test Data**: Sample data is available on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
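To make the `_X.csv` naming rule above concrete, the following is a small sketch of how a barcode and a well could be recovered from a fitness file name and a `Sample Vial Number` value. It mirrors the string splitting used by `combine_seq_func_data` in this release; the helper names and the `P2-A1` vial format are illustrative assumptions, not part of LevSeq.

```python
def barcode_from_filename(path):
    """Return the text between the last underscore and '.csv'."""
    return path.split(".csv")[0].split("_")[-1]


def well_from_vial(vial):
    """Return the part after the last '-', or None for non-string values."""
    return vial.split("-")[-1] if isinstance(vial, str) else None


def barcode_well_key(barcode, well):
    """Build the '<barcode>_<well>' key used to match fitness rows to variants."""
    return f"{barcode}_{well}"


# File name taken from the README example; the vial value is hypothetical.
barcode = barcode_from_filename("levseq_results/20250712_epPCR_Q06714_37.csv")
well = well_from_vial("P2-A1")
key = barcode_well_key(barcode, well)
```

Because only the last underscore-separated token is used, any prefix is allowed in the file name as long as it ends in `_<barcode>.csv`.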
@@ -22,7 +22,10 @@ Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore tech
  ```bash
  docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
  ```
-
+ 4. Connect function data to your sequence data
+ ```bash
+ docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"
+ ```
  ### Pip Installation (Mac/Linux only)

  **IMPORTANT**: On Mac M-series chips (M1-M4), gcc 13 and 14 are **REQUIRED**:
@@ -51,6 +54,18 @@ brew install gcc@13 gcc@14
  levseq my_experiment /path/to/data/ /path/to/ref.csv
  ```

+ 5. Combine function data:
+ ```bash
+ levseq my_experiment /path/to/data/ /path/to/ref.csv "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv," --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"
+ ```
+
+ Note for function data we currently expect a LCMS file e.g. with the columns:
+ - `Sample Vial Number` (corresponding to the well that the sample was from).
+ - `Area` (which becomes fitness value).
+ - `Compound Name` which is the name of the compound we filter for that is passed as a parameter.
+ - The last `_X.csv` needs to be the barcode number to match that sample to your plate e.g. if you ran LevSeq with barcode 33 for plate 2 you need to have `_33.csv` for the fitness file for plate 2. e.g. `some_fitnes_for_plate_2_33.csv`.
+
+
  ## Data and Visualization

  - **Test Data**: Sample data is available on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
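The LCMS column requirements listed in the README hunk above can be illustrated with a short sketch that reads such a file, keeps only the rows for one compound, and turns `Area` into a number. This uses the standard-library `csv` module as a stand-in for the pandas calls in the release; the sample rows, the `P1-A1` vial format, and the function names are made up for illustration.

```python
import csv
import io
import math

# Hypothetical LCMS export containing the three columns the README names.
SAMPLE_LCMS = """Sample Vial Number,Compound Name,Area
P1-A1,dPPi,1234.5
P1-A1,other,10.0
P1-A2,dPPi,n.a.
P1-A3,dppi,99.0
"""


def to_number(text):
    """Convert text to float; unparseable values become NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def load_fitness(handle, compound):
    """Keep rows whose 'Compound Name' equals `compound` (case sensitive)."""
    rows = []
    for row in csv.DictReader(handle):
        if row["Compound Name"] != compound:
            continue
        rows.append({
            "well": row["Sample Vial Number"].split("-")[-1],
            "fitness_value": to_number(row["Area"]),
        })
    return rows


fitness_rows = load_fitness(io.StringIO(SAMPLE_LCMS), "dPPi")
```

The `dppi` row is dropped because the comparison is case sensitive, and the `n.a.` area becomes NaN rather than raising, which is the same intent as `pd.to_numeric(..., errors='coerce')` in the new code.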
@@ -18,7 +18,7 @@
  __title__ = 'levseq'
  __description__ = 'LevSeq nanopore sequencing'
  __url__ = 'https://github.com/fhalab/levseq/'
- __version__ = '1.4.1'
+ __version__ = '1.4.2'
  __author__ = 'Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy'
  __author_email__ = 'ylong@caltech.edu'
  __license__ = 'GPL3'
@@ -21,6 +21,8 @@ Contain argument parsers used for command line interface and web interface
  import os
  import tqdm
  import argparse
+ import pandas as pd
+
  # Import local packages
  from levseq.run_levseq import run_LevSeq
@@ -68,16 +70,73 @@ def build_cli_parser():
                                       help="Whether this experiment came from an oligopool, default is false.")
      optional_args_group.add_argument("--show_msa",
                                       default=False,
-                                      help="Skip showing msa")
+                                      help="Skip showing msa")
+     # if cl_args.get('fitness_files') and cl_args.get('smiles'):
+     optional_args_group.add_argument("--fitness_files",
+                                      default=None,
+                                      help="A comma separated list of fitness files (full path) with string quotation marks around them.")
+     optional_args_group.add_argument("--smiles",
+                                      default=None,
+                                      help="A smiles string of the reaction with quotation marks around.")
+     optional_args_group.add_argument("--compound",
+                                      default=None,
+                                      help="The compound in the fitness files (e.g. pDT or pdt - case sensitive).")
+     optional_args_group.add_argument("--variant_df",
+                                      default=None,
+                                      help="The variant dataframe to combine with fitness data.")
      return parser


+ def combine_seq_func_data(cl_args):
+     # Also check if we have any fitness data
+     if cl_args.get('fitness_files') and cl_args.get('smiles') and cl_args.get('variant_df'):
+         variant_filename = cl_args.get('variant_df')
+         variant_df = pd.read_csv(variant_filename)
+         # Combine the fitness data with the plate data (note the barcode has to be the last _[barcode])
+         # The smiles has to be the reaction smiles
+         function_files = cl_args.get('fitness_files')
+         compound_name = cl_args.get('compound') if cl_args.get('compound') else 'pdt'
+         print(function_files, compound_name)
+         all_function_df = pd.DataFrame()
+         for function_file in function_files.split(','):
+             barcode = function_file.split('.csv')[0].split('_')[-1]
+             function_df = pd.read_csv(f'{function_file}')
+             function_df.columns = [c.replace('\n', ' ') for c in function_df.columns]
+             function_df['function_well'] = [x.split('-')[-1] if isinstance(x, str) else None for x in function_df['Sample Vial Number'].values]
+             function_df['function_barcode_plate'] = barcode
+             function_df = function_df[function_df['Compound Name'] == compound_name] # We only use pdt or Pdt
+             # Convert it to numeric
+             function_df['Area'] = pd.to_numeric(function_df['Area'], errors='coerce')
+
+             function_df['barcode_well'] = [f'{p}_{w}' for w, p in function_df[['function_well', 'function_barcode_plate']].values]
+             function_df['filename'] = function_file
+             print(function_df.head())
+             all_function_df = pd.concat([all_function_df, function_df])
+         # Join this with the variant_df barcode plate
+         variant_df['barcode_well'] = [f'{p}_{w}' for w, p in variant_df[['Well', 'barcode_plate']].values]
+         # Join the two
+         variant_df.set_index('barcode_well', inplace=True)
+         all_function_df.set_index('barcode_well', inplace=True)
+         variant_df = variant_df.join(all_function_df, how='left')
+         reaction_smiles = cl_args.get('smiles')
+         variant_df['smiles_string'] = reaction_smiles.split('>>')[-1]
+         variant_df['reaction_smiles'] = reaction_smiles
+         variant_df.columns = [c.lower().replace(' ', '_') for c in variant_df.columns]
+         variant_df.rename(columns={'area': 'fitness_value'}, inplace=True)
+         variant_df.to_csv(f'{variant_filename.replace(".csv", "_seqfunc.csv")}')
+
+         # levseq levseq_4.1 ref.csv fitness --fitness_files "20250712_epPCR_Q06714_37.csv,20250712_epPCR_Q06714_38.csv,20250712_epPCR_Q06714_39.csv,20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df visualization_partial.csv
+         return variant_df
+
  # Execute LevSeq
  def execute_LevSeq():
      # Build parser
      parser = build_cli_parser()
      # Parse the arguments
      CL_ARGS = vars(parser.parse_args())
+     if CL_ARGS.get('fitness_files') and CL_ARGS.get('smiles') and CL_ARGS.get('variant_df'):
+         print('Combining LevSeq')
+         return combine_seq_func_data(CL_ARGS)
      # Set up progres bar
      tqdm_fn = tqdm.tqdm
      # Run LevSeq
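The core of `combine_seq_func_data` is a left join of variant rows onto fitness rows using a `<barcode>_<well>` key, followed by taking the product side of the reaction SMILES. A plain-Python sketch of those two steps is shown here; it uses dictionaries instead of `DataFrame.join`, and the sample variant rows and helper names are assumptions made for illustration only.

```python
def left_join(variants, fitness_by_key):
    """Attach a fitness value to every variant row; unmatched rows get None."""
    joined = []
    for variant in variants:
        key = f"{variant['barcode_plate']}_{variant['Well']}"
        joined.append({**variant, "fitness_value": fitness_by_key.get(key)})
    return joined


def product_smiles(reaction_smiles):
    """Return the text after the last '>>' of a reaction SMILES string."""
    return reaction_smiles.split(">>")[-1]


# Hypothetical variant rows with the two columns the join relies on.
variants = [
    {"barcode_plate": 37, "Well": "A1", "Variant": "A59L"},
    {"barcode_plate": 37, "Well": "A2", "Variant": "#PARENT#"},
]
fitness_by_key = {"37_A1": 1234.5}

joined = left_join(variants, fitness_by_key)
reaction = ("O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3"
            ">>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5")
product = product_smiles(reaction)
```

As with `how='left'` in the release, every variant row survives the join and wells without a measurement simply carry an empty fitness value.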
@@ -602,6 +602,7 @@ def process_ref_csv(cl_args, tqdm_fn=tqdm.tqdm):
              continue

      variant_df.to_csv(variant_csv_path, index=False)
+
      return variant_df, ref_df
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: levseq
- Version: 1.4.1
+ Version: 1.4.2
  Home-page: https://github.com/fhalab/levseq/
  Author: Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gursoy
  Author-email: ylong@caltech.edu
@@ -69,7 +69,10 @@ Figure 1: Overview of the LevSeq variant sequencing workflow using Nanopore tech
  ```bash
  docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv
  ```
-
+ 4. Connect function data to your sequence data
+ ```bash
+ docker run --rm -v "/full/path/to/data:/levseq_results" yueminglong/levseq:levseq-1.4-arm64 my_experiment levseq_results/ levseq_results/ref.csv --fitness_files "levseq_results/20250712_epPCR_Q06714_37.csv,levseq_results/20250712_epPCR_Q06714_39.csv,levseq_results/20250712_epPCR_Q06714_40.csv" --smiles 'O=P(OC1=CC=CC=C1)(OC2=CC=CC=C2)OC3=CC=CC=C3>>O=P(O)(OC4=CC=CC=C4)OC5=CC=CC=C5' --compound dPPi --variant_df "levseq_results/visualization_partial.csv"
+ ```
  ### Pip Installation (Mac/Linux only)

  **IMPORTANT**: On Mac M-series chips (M1-M4), gcc 13 and 14 are **REQUIRED**:
@@ -98,6 +101,18 @@ brew install gcc@13 gcc@14
  levseq my_experiment /path/to/data/ /path/to/ref.csv
  ```

+ 5. Combine function data:
+ ```bash
+ levseq my_experiment /path/to/data/ /path/to/ref.csv "LCMS_file_{barcode1}.csv,LCMS_file_{barcode2}.csv," --smiles 'reaction_smiles_string' --compound "name_of_compound_in_LCMS_file" --variant_df "visualization_partial.csv"
+ ```
+
+ Note for function data we currently expect a LCMS file e.g. with the columns:
+ - `Sample Vial Number` (corresponding to the well that the sample was from).
+ - `Area` (which becomes fitness value).
+ - `Compound Name` which is the name of the compound we filter for that is passed as a parameter.
+ - The last `_X.csv` needs to be the barcode number to match that sample to your plate e.g. if you ran LevSeq with barcode 33 for plate 2 you need to have `_33.csv` for the fitness file for plate 2. e.g. `some_fitnes_for_plate_2_33.csv`.
+
+
  ## Data and Visualization

  - **Test Data**: Sample data is available on Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13694463.svg)](https://doi.org/10.5281/zenodo.13694463)
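The new `--fitness_files` option takes one comma-separated string, and the README example above ends its list with a trailing comma. The sketch below shows what `str.split(',')` does with such a value and one way a caller might tidy the list before use. The parser here is a reduced stand-in that declares only two of the options added in this release, and dropping empty entries is an assumption of this sketch, not behaviour confirmed for LevSeq.

```python
import argparse


def build_parser():
    """Minimal stand-in parser with two of the options added in 1.4.2."""
    parser = argparse.ArgumentParser(prog="fitness-args-sketch")
    parser.add_argument("--fitness_files", default=None)
    parser.add_argument("--compound", default=None)
    return parser


def split_file_list(raw):
    """Split a comma-separated list, dropping blank entries and whitespace."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


args = vars(build_parser().parse_args([
    "--fitness_files", "LCMS_file_33.csv,LCMS_file_34.csv,",
    "--compound", "dPPi",
]))

raw_parts = args["fitness_files"].split(",")   # includes a trailing '' entry
files = split_file_list(args["fitness_files"])  # blank entry removed
```

A plain split keeps the empty string produced by the trailing comma, so a loop that opens each entry as a file would be handed an empty path for the last one.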
File without changes