CartMFP 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,127 @@
Metadata-Version: 2.4
Name: CartMFP
Version: 0.1.0
Summary: CartMFP: tools for database construction and formula prediction
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy==2.3.5
Requires-Dist: pandas==2.3.3
Requires-Dist: pyarrow==21.0.0
Requires-Dist: scipy==1.16.3
Requires-Dist: psutil==7.0.0
Requires-Dist: npy-append-array==0.9.19

# CartMFP

CartMFP (Cartesian Molecular Formula Prediction) is a Python tool that performs molecular formula prediction using custom databases.
<br>

#### How does CartMFP work?

CartMFP consists of two steps:
1. Constructing a database (*space2cart.py*): a local database of elemental compositions is constructed. <br>
2. Molecular formula prediction (*cart2form.py*): molecular formulas are predicted from input masses. <br> <br>


# 1. Constructing a database

A database is constructed by enumerating all combinations of element counts within a given range.
The compositional space is described with the syntax Element[min,max]; when only one number is given, as in Element[max], the minimum is 0.
Any element of the periodic table whose monoisotopic mass is listed in the NIST database can be used.
The default elemental space is "H[200]C[75]N[50]O[50]P[10]S[10]",
which equates to 0-200 hydrogen, 0-75 carbon, 0-50 nitrogen, and so on.
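
To make the Element[min,max] syntax concrete, the sketch below parses a composition string into per-element (min, max) ranges. `parse_composition` is a hypothetical helper written for illustration, not part of CartMFP, and it assumes element symbols are one uppercase letter optionally followed by one lowercase letter.

```python
import re

# Element symbol followed by [max] or [min,max], e.g. "H[200]" or "N[0,50]".
# Hypothetical helper for illustration; CartMFP's own parser may differ.
_ELEMENT_RANGE = re.compile(r"([A-Z][a-z]?)\[(\d+)(?:,(\d+))?\]")


def parse_composition(composition: str) -> dict[str, tuple[int, int]]:
    """Return {element: (min_count, max_count)} for a composition string."""
    ranges: dict[str, tuple[int, int]] = {}
    position = 0
    for match in _ELEMENT_RANGE.finditer(composition):
        if match.start() != position:
            raise ValueError(f"Unparseable text at position {position}: {composition!r}")
        element, first, second = match.groups()
        if second is None:
            low, high = 0, int(first)  # Element[max]: the minimum defaults to 0
        else:
            low, high = int(first), int(second)
        if low > high:
            raise ValueError(f"{element}: min {low} exceeds max {high}")
        ranges[element] = (low, high)
        position = match.end()
    if position != len(composition):
        raise ValueError(f"Unparseable text at position {position}: {composition!r}")
    return ranges


print(parse_composition("H[200]C[75]N[50]O[50]P[10]S[10]"))
# {'H': (0, 200), 'C': (0, 75), 'N': (0, 50), 'O': (0, 50), 'P': (0, 10), 'S': (0, 10)}
```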

Apart from the maximum element counts, the elemental composition space is further limited by the maximum mass (`max_mass`) and by the allowed range of ring and double bond equivalents (RDBE, `min_rdbe` to `max_rdbe`).
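
For reference, RDBE is commonly computed from element valences as `RDBE = 1 + 1/2 * sum(n_i * (v_i - 2))`, where `n_i` is the count and `v_i` the valence of element i. The sketch below assumes common lowest valences (H 1, C 4, N 3, O 2, P 3, S 2, halogens 1); CartMFP may use different values, and `rdbe` and `within_rdbe` are hypothetical names.

```python
from fractions import Fraction

# Assumed valences (common lowest valence states); CartMFP's own values may differ.
VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1}


def rdbe(counts: dict[str, int]) -> Fraction:
    """Ring and double bond equivalents: 1 + 1/2 * sum(n_i * (v_i - 2))."""
    total = sum(n * (VALENCE[element] - 2) for element, n in counts.items())
    return 1 + Fraction(total, 2)  # exact arithmetic keeps half-integers visible


def within_rdbe(counts: dict[str, int], min_rdbe: float = -5, max_rdbe: float = 80) -> bool:
    """Keep a composition only if its RDBE lies in [min_rdbe, max_rdbe] (README defaults)."""
    return min_rdbe <= rdbe(counts) <= max_rdbe


print(rdbe({"C": 6, "H": 6}))           # 4   (benzene)
print(rdbe({"C": 6, "H": 12, "O": 6}))  # 1   (glucose)
print(rdbe({"C": 1, "H": 3}))           # 1/2 (methyl radical, non-integer)
```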

Base chemical constraints:

|Parameter | Default value | Description|
|-----------------|:-----------:|---------------|
|-composition| "H[200]C[75]N[50]O[50]P[10]S[10]" | Composition string describing the minimum and maximum element counts|
|-max_mass| 1000 | Maximum mass (Da)|
|-min_rdbe | -5 | Minimum RDBE |
|-max_rdbe| 80 | Maximum RDBE |
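
The enumeration and the base constraints can be illustrated with a small brute-force sketch over a tiny CHO space: every combination of counts is generated with `itertools.product`, then compositions above `max_mass` or outside the RDBE range are dropped. This is a generic illustration under assumed valences, not CartMFP's implementation; the masses are standard monoisotopic values and `enumerate_space` is a hypothetical name.

```python
import itertools
from fractions import Fraction

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO_MASS = {"H": 1.00782503223, "C": 12.0, "N": 14.00307400443, "O": 15.99491461957}
# Assumed valences for the RDBE calculation.
VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def enumerate_space(max_counts: dict[str, int], max_mass: float = 1000.0,
                    min_rdbe: float = -5, max_rdbe: float = 80):
    """Yield (counts, mass) for every non-empty composition passing the base constraints."""
    elements = list(max_counts)
    # Cartesian product of 0..max for every element.
    for combo in itertools.product(*(range(max_counts[e] + 1) for e in elements)):
        if not any(combo):
            continue  # skip the empty composition
        mass = sum(n * MONO_MASS[e] for e, n in zip(elements, combo))
        if mass > max_mass:
            continue
        rdbe = 1 + Fraction(sum(n * (VALENCE[e] - 2) for e, n in zip(elements, combo)), 2)
        if min_rdbe <= rdbe <= max_rdbe:
            yield dict(zip(elements, combo)), mass


hits = list(enumerate_space({"C": 2, "H": 6, "O": 1}, max_mass=50.0, min_rdbe=0))
for counts, mass in hits[:5]:
    print(counts, round(mass, 5))
```

For scale: the default space H[200]C[75]N[50]O[50]P[10]S[10] spans 201 × 76 × 51 × 51 × 11 × 11 ≈ 4.8 × 10⁹ combinations before filtering, so a pure-Python loop like this is only practical for toy spaces.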

Additional chemical constraints are applied by implementing some of Fiehn's Seven Golden Rules, which filter out unrealistic or impossible compositions.
This can drastically reduce the size of the composition space. The implemented rules are Rule #2 (LEWIS and SENIOR check), Rule #4 (hydrogen/carbon element ratio check), Rule #5 (heteroatom ratio check), and Rule #6 (element probability check).

|Parameter | Default value | Description|
|-----------------|:-----------:|---------------|
|-filt_7gr| True | Global toggle to apply or skip Seven Golden Rules filtering|
|-filt_LewisSenior| True | Golden Rule #2: filter out compositions with a non-integer DBE (based on maximum valence) |
|-filt_ratios | "HC[0.1,6]FC[0,6]ClC[0,2]BrC[0,2]NC[0,4]OC[0,3]PC[0,2]SC[0,3]SiC[0,1]" | Golden Rules #4 and #5: filter on element-to-carbon ratios, using the extended ranges (99.9% coverage) |
|-filt_NOPS| True | Golden Rule #6: element probability check (N, O, P, S) |
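
The `filt_ratios` string reuses the bracket syntax: each entry names a numerator element, then the denominator element (C), then the allowed [min,max] ratio. The sketch below is a hypothetical reading of that string together with the integer-DBE part of Rule #2 (not the full SENIOR rules). Skipping ratio checks when carbon is absent is an assumption, as are the valences and the helper names.

```python
import re

# Each entry is <numerator><denominator>[min,max], e.g. "HC[0.1,6]" means 0.1 <= H/C <= 6.
_RATIO = re.compile(r"([A-Z][a-z]?)([A-Z][a-z]?)\[([\d.]+),([\d.]+)\]")

# Assumed valences; for the integer-DBE test only their parity (odd/even) matters.
VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1, "Si": 4}

DEFAULT_RATIOS = "HC[0.1,6]FC[0,6]ClC[0,2]BrC[0,2]NC[0,4]OC[0,3]PC[0,2]SC[0,3]SiC[0,1]"


def parse_ratios(spec: str) -> dict[tuple[str, str], tuple[float, float]]:
    """Map (numerator, denominator) element pairs to their allowed ratio range."""
    return {(num, den): (float(low), float(high))
            for num, den, low, high in _RATIO.findall(spec)}


def passes_ratios(counts: dict[str, int], ratios: dict) -> bool:
    """Rules #4/#5 sketch: every element ratio must lie inside its range."""
    for (num, den), (low, high) in ratios.items():
        denominator = counts.get(den, 0)
        if denominator == 0:
            continue  # assumption: ratios are skipped when the denominator is absent
        if not low <= counts.get(num, 0) / denominator <= high:
            return False
    return True


def has_integer_dbe(counts: dict[str, int]) -> bool:
    """Rule #2 sketch (integer-DBE part only): sum(n_i * (v_i - 2)) must be even."""
    return sum(n * (VALENCE[e] - 2) for e, n in counts.items()) % 2 == 0


ratios = parse_ratios(DEFAULT_RATIOS)
print(passes_ratios({"C": 6, "H": 12, "O": 6}, ratios))  # True
print(passes_ratios({"C": 60}, ratios))                  # False: H/C = 0 < 0.1
print(has_integer_dbe({"C": 1, "H": 3}))                 # False: CH3 is a radical
```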

Additional arguments can be supplied to control performance and output paths:

|Parameter | Default value | Description|
|-----------------|:-----------:|---------------|
|-maxmem | 10e9 | Maximum amount of memory to use, in bytes (10e9 bytes = 10 GB) |
|-mass_blowup | 100000 | Multiplication factor used to convert float masses to integers|
|-write_mass | True | Construct a mass lookup table (faster formula prediction, but a larger database)|
|-Cartesian_output_folder | "Cart_Output" | Path to the output folder |
|-Cartesian_output_file |<depends on parameters> | Output database name |
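
A minimal sketch of what the default `-mass_blowup` of 100000 implies, assuming masses are scaled and rounded to the nearest integer (CartMFP might truncate instead): one integer unit corresponds to 10⁻⁵ Da, so masses can be stored and compared as integers. `to_int_mass` and `to_float_mass` are hypothetical names.

```python
MASS_BLOWUP = 100_000  # README default for -mass_blowup: 1 integer unit = 1e-5 Da


def to_int_mass(mass: float, blowup: int = MASS_BLOWUP) -> int:
    """Scale a float mass (Da) to an integer, rounding to the nearest unit (assumption)."""
    return round(mass * blowup)


def to_float_mass(int_mass: int, blowup: int = MASS_BLOWUP) -> float:
    """Convert a scaled integer mass back to Da."""
    return int_mass / blowup


glucose = 180.06338810  # monoisotopic mass of C6H12O6 (Da)
print(to_int_mass(glucose))  # 18006339
```

At 1000 Da, one integer unit is 10⁻⁵ / 1000 = 0.01 ppm, which is far finer than the default 5 ppm prediction tolerance.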

## Example use of space2cart (command line)

Construct the default database:
```
python "space2cart.py"
```

Construct a database that includes halogens:
```
python "space2cart.py" -composition "H[200]C[75]N[50]O[50]P[10]S[10]F[5]Cl[5]I[3]Br[3]"
```
space2cart can also be run from an IDE, such as Spyder.
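
The default database name used in the cart2form examples, "H[200]C[75]N[50]O[50]P[10]S[10]_b100000max1000rdbe-5_80_7gr_comp.npy", appears to encode the construction parameters (composition, `mass_blowup`, `max_mass`, `min_rdbe`/`max_rdbe`, and Seven Golden Rules filtering). The sketch below rebuilds that name; the pattern is inferred from this single example, `default_output_name` is hypothetical, and dropping the "_7gr" tag when filtering is off is a guess.

```python
def default_output_name(composition: str = "H[200]C[75]N[50]O[50]P[10]S[10]",
                        mass_blowup: int = 100000, max_mass: int = 1000,
                        min_rdbe: int = -5, max_rdbe: int = 80,
                        filt_7gr: bool = True) -> str:
    """Rebuild a database file name following the pattern seen in the README example.

    Inferred pattern, not taken from CartMFP's source.
    """
    name = f"{composition}_b{mass_blowup}max{max_mass}rdbe{min_rdbe}_{max_rdbe}"
    if filt_7gr:
        name += "_7gr"  # assumption: tag present only when golden-rule filtering is on
    return name + "_comp.npy"


print(default_output_name())
```

Under this reading, the file would be written inside the `-Cartesian_output_folder` (default "Cart_Output").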

# 2. Molecular formula prediction

After the composition database has been constructed with `space2cart.py`, molecular formula prediction can be done with `cart2form.py`.
To run cart2form, an input mass list must be supplied as a path to a .txt file or to a file in any other tabular format.
The file must either have a single column, or have a column titled "Mz" or "Mass".
Alternatively, cart2form can be imported as a module within a script and run on a single float mass or an iterable of masses with the function `predict_formula(input_file, composition_file)`.

#### Required inputs

|Parameter | Default value | Description|
|-----------------|:-----------:|---------------|
|-input_file | "test_mass_CASMI2022.txt"| Path to the list of masses|
|-composition_file | "H[200]C[75]N[50]O[50]P[10]S[10]_b100000max1000rdbe-5_80_7gr_comp.npy"| Path to the database composition file|


Optional arguments can be supplied to tune which masses are returned, including the ionization polarity, the adducts to consider, and the charge states to consider.
Other key arguments are the ppm tolerance on the mass error of returned compositions and the maximum number of compositions returned per mass.
Adducts use the following syntax: a sign (`+` for addition of, `-` for loss of), followed by the elemental composition, followed by the charge (`+` or `-`). For example, "+H+" adds H with a positive charge and "-H+" removes H with a positive charge.
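
One reading of this syntax that is consistent with the default lists (for example "-H+" in negative mode and "--" in positive mode) is that the trailing sign is the charge of the added or lost species, so an empty composition stands for an electron. That interpretation is an assumption, not documented behavior, and `parse_adduct` and `neutral_mass` are hypothetical helpers. Under it, the neutral mass of a singly charged ion follows directly from the adduct string:

```python
import re

ELECTRON_MASS = 0.000548579909  # Da
# Monoisotopic masses (Da) for the elements used in the default adducts.
MONO_MASS = {"H": 1.00782503223, "Na": 22.9897692820, "K": 38.9637064864, "Cl": 34.968852682}

_ADDUCT = re.compile(r"([+-])((?:[A-Z][a-z]?\d*)*)([+-])")
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_adduct(adduct: str) -> tuple[int, float, int]:
    """Return (sign, species_mass, ion_charge) for an adduct string such as "+H+".

    sign is +1 for addition and -1 for loss. The trailing character is read as the
    charge of the added/lost species (assumption), so "--" means loss of an electron.
    """
    match = _ADDUCT.fullmatch(adduct)
    if match is None:
        raise ValueError(f"Unrecognized adduct {adduct!r} (a trailing charge sign is required)")
    sign = 1 if match.group(1) == "+" else -1
    charge = 1 if match.group(3) == "+" else -1
    atoms = sum(MONO_MASS[el] * int(n or 1) for el, n in _ELEMENT.findall(match.group(2)))
    species_mass = atoms - charge * ELECTRON_MASS  # e.g. H+ = H atom minus one electron
    return sign, species_mass, sign * charge


def neutral_mass(mz: float, adduct: str) -> float:
    """Neutral monoisotopic mass M for a singly charged ion: m/z = M + sign * species_mass."""
    sign, species_mass, _ = parse_adduct(adduct)
    return mz - sign * species_mass


print(round(neutral_mass(181.07066, "+H+"), 5))  # 180.06338
```

As written, this sketch rejects "+K" from the README default list because it has no trailing charge sign.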

|Parameter | Default value | Description|
|-----------------|:-----------:|---------------|
|-mode | "pos" | Ionization mode. Options: " ", "positive", "negative"|
|-adducts|["+H+","+Na+","+K", "+-","+Cl-","-H+"] | Adducts to consider. Default positive adducts: "--","+H+","+Na+","+K"; default negative adducts: "+-","-H+","+Cl-" |
|-charges|[1] |Charge states to consider|
|-ppm| 5| Maximum mass error (ppm) of predicted compositions |
|-top_candidates | 20 | Maximum number of compositions returned per mass|
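
Taken together, `-ppm` and `-top_candidates` describe a tolerance window around each queried mass followed by ranking by mass error. The sketch below shows that search on a sorted list of candidate masses using binary search (`bisect`). It is a generic technique, not necessarily how CartMFP's lookup table works; `find_candidates` is hypothetical and the window is taken relative to the queried mass.

```python
import bisect


def find_candidates(target: float, sorted_masses: list[float], formulas: list[str],
                    ppm: float = 5.0, top_candidates: int = 20) -> list[tuple[str, float]]:
    """Return up to top_candidates (formula, ppm_error) pairs within +/- ppm of target."""
    tolerance = target * ppm * 1e-6  # window half-width in Da
    lo = bisect.bisect_left(sorted_masses, target - tolerance)
    hi = bisect.bisect_right(sorted_masses, target + tolerance)
    hits = [(formulas[i], (sorted_masses[i] - target) / target * 1e6) for i in range(lo, hi)]
    hits.sort(key=lambda hit: abs(hit[1]))  # smallest absolute mass error first
    return hits[:top_candidates]


# Monoisotopic masses (Da) of a few C/H/O formulas, sorted by mass.
masses = [180.04225874, 180.06338810, 180.07864425, 180.09977361]
formulas = ["C9H8O4", "C6H12O6", "C10H12O3", "C7H16O5"]
print(find_candidates(180.0634, masses, formulas, ppm=5.0))
```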


## Example use of cart2form

Molecular formula prediction from the command line:
```
python "cart2form.py" -input_file "test_mass_CASMI2022.txt" -composition_file "H[200]C[75]N[50]O[50]P[10]S[10]_b100000max1000rdbe-5_80_7gr_comp.npy"
```
Alternatively, cart2form.py can be imported as a module:
```
import cart2form

cart2form.predict_formula(
    input_file=124.56,  # float mass or iterable (array/list/DataFrame)
    composition_file="H[200]C[75]N[50]O[50]P[10]S[10]_b100000max1000rdbe-5_80_7gr_comp.npy",
)
```


#### Licensing

The pipeline is licensed under the standard MIT License. <br>
If you use this pipeline in your research, please cite the following papers:



#### Contact
- Hugo Kleikamp (Developer): hugo.kleikamp@uantwerpen.be<br>

@@ -0,0 +1,12 @@
README.md
pyproject.toml
CartMFP.egg-info/PKG-INFO
CartMFP.egg-info/SOURCES.txt
CartMFP.egg-info/dependency_links.txt
CartMFP.egg-info/entry_points.txt
CartMFP.egg-info/requires.txt
CartMFP.egg-info/top_level.txt
cartmfp/Construct_DB.py
cartmfp/Predict_formula.py
cartmfp/__init__.py
cartmfp/test_mass_CASMI2022.txt
@@ -0,0 +1,3 @@
[console_scripts]
construct-db = cartmfp.Construct_DB:main
predict-formula = cartmfp.Predict_formula:main
@@ -0,0 +1,6 @@
numpy==2.3.5
pandas==2.3.3
pyarrow==21.0.0
scipy==1.16.3
psutil==7.0.0
npy-append-array==0.9.19
@@ -0,0 +1 @@
cartmfp