@datagrok/hit-triage 1.1.6 → 1.1.8
- package/CHANGELOG.md +4 -0
- package/README_HD.md +37 -4
- package/README_HT.md +37 -4
- package/dist/package.js +1 -1
- package/dist/package.js.map +1 -1
- package/package.json +1 -1
- package/src/app/dialogs/functions-dialog.ts +26 -20
- package/src/app/hit-design-app.ts +5 -31
- package/src/app/hit-triage-app.ts +0 -11
- package/src/app/utils/calculate-single-cell.ts +19 -2
package/CHANGELOG.md
CHANGED
package/README_HD.md
CHANGED
@@ -100,7 +100,7 @@ This function will go through every molecule in the dataframe, convert them to c
 
 Datagrok scripts can also be used as compute functions. For example, you can create a js script that adds a new column to the dataframe. This script also needs to have `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as first two inputs:
 
-```
+```javascript
 //name: Demo script HT
 //description: Hello world script
 //language: javascript
@@ -115,9 +115,42 @@ res = df
 
 ```
 
+Or a python script that calculates the number of atoms in the molecule and multiplies it by a specified value. In case of python, you need to return the dataframe containing columns that you want to append:
+
+```python
+#name: HTPythonDemo
+#description: Calculates number of atoms in molecule in python and also multiplies it by specified value 'multiplier'
+#language: python
+#tags: HitTriageFunction
+#input: dataframe table [Data table]
+#input: column col {semType: Molecule}
+#input: int multiplier
+#output: dataframe result
+
+from rdkit import Chem
+import numpy as np
+# in python, column is passed as column name and dataframes are in pandas format.
+# first, get the column.
+molecules = table[col]
+length = len(molecules)
+# create array of same length
+resCol = np.full(length, None, dtype=object)
+for n in range(0, length):
+    if molecules[n] == "":
+        continue
+    try:
+        mol = Chem.MolFromMolBlock(molecules[n], sanitize = True) if ("M  END" in molecules[n]) else Chem.MolFromSmiles(molecules[n], sanitize = True)
+        if mol is None or mol.GetNumAtoms() == 0:
+            continue
+        resCol[n] = mol.GetNumAtoms() * multiplier
+    except:
+        continue
+result = pd.DataFrame({'Number of Atoms * mult': resCol})
+```
+
 Similarly, queries with same `HitTriageFunction` tag will be added to the compute functions list. The query needs to have at least one input, first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe, which should contain column `molecules` in order to join result with initial dataframe. `molecules` column will be used as key for joining tables. For example, you can create a query that looks for the molecule in Chembl database and returns the molregno number:
 
-```
+```sql
 --name: ChemblMolregNoBySmiles
 --friendlyName: Chembl Molregno by smiles
 --input: list<string> molecules
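Note on the added Python example: the script returns one value per input row, leaves `None` where a molecule is empty or cannot be parsed, and picks the parser by checking for the molblock terminator `M  END`. The sketch below reproduces only that control flow in plain Python, so it runs without RDKit or Datagrok. The names `compute_column`, `is_molblock`, and `toy_count_atoms` are hypothetical; the toy counter is not chemically correct and only stands in for `mol.GetNumAtoms()`.

```python
from typing import Callable, List, Optional

MOLBLOCK_END = "M  END"  # terminator line of an MDL molblock (two spaces)


def is_molblock(molecule: str) -> bool:
    """Return True if the string looks like a molblock rather than SMILES."""
    return MOLBLOCK_END in molecule


def compute_column(
    molecules: List[str],
    count_atoms: Callable[[str, bool], int],
    multiplier: int,
) -> List[Optional[int]]:
    """Build a result column aligned row by row with `molecules`."""
    result: List[Optional[int]] = [None] * len(molecules)
    for i, molecule in enumerate(molecules):
        if not molecule:
            continue  # empty cell: leave None
        try:
            n_atoms = count_atoms(molecule, is_molblock(molecule))
        except Exception:
            continue  # unparseable molecule: leave None
        if n_atoms == 0:
            continue
        result[i] = n_atoms * multiplier
    return result


def toy_count_atoms(molecule: str, molblock: bool) -> int:
    """Hypothetical stand-in for RDKit parsing; NOT chemically correct."""
    if molblock:
        raise ValueError("molblock parsing is out of scope for this toy")
    # Counts letters only, so e.g. "Cl" would wrongly count as two atoms.
    return sum(ch.isalpha() for ch in molecule)


column = compute_column(["CCO", "", "c1ccccc1"], toy_count_atoms, 2)
print(column)  # prints [6, None, 12]
```

Keeping the result the same length as the input is what lets the returned column line up with the original rows when it is appended.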
@@ -131,7 +164,7 @@ select molregno, molecules from compound_structures c
 
 Or a query that calculates fraction of sp3 hybridized carbons in the molecule using RDKit SQL cartridge:
 
-```
+```sql
 --name: SP3Fraction
 --friendlyName: SP3 fraction of carbons
 --input: list<string> molecules
@@ -146,7 +179,7 @@ where is_valid_smiles(Cast(molecules as cstring))
 
 Submit functions are used to save or submit the filtered and computed dataset. This could include saving to a private database or additional calculations. Submit functions are defined in the same way as compute functions, but they are tagged with `HitTriageSubmitFunction` tag. The function should accept only two inputs, `Dataframe` `df` and `String` `molecules`, which are the resulting dataframe and name of molecules column respectively. For example, we can create a function that saves the filtered and computed dataset to a database:
 
-```
+```typescript
 //name: Sample File Submit
 //tags: HitTriageSubmitFunction
 //input: dataframe df [dataframe]
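Note on the submit-function contract: the README text says a submit function receives exactly two inputs, the resulting dataframe and the name of its molecules column. As a rough illustration of that shape only, the sketch below writes rows to CSV with the molecules column first. It is plain Python, not the Datagrok API; `submit_to_csv`, the list-of-dicts stand-in for a dataframe, and the extra `out` argument are all assumptions made for the example.

```python
import csv
import io
from typing import Any, Dict, List, TextIO


def submit_to_csv(rows: List[Dict[str, Any]], molecules: str, out: TextIO) -> int:
    """Write `rows` as CSV to `out`, molecules column first; return row count."""
    if not rows:
        return 0
    if molecules not in rows[0]:
        raise KeyError(f"molecules column {molecules!r} not found")
    # Put the molecules column first, then the remaining columns in order.
    fieldnames = [molecules] + [name for name in rows[0] if name != molecules]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


buffer = io.StringIO()
written = submit_to_csv([{"id": 1, "smiles": "CCO"}], "smiles", buffer)
print(written)  # prints 1
```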
package/README_HT.md
CHANGED
@@ -44,7 +44,7 @@ The application will detect that the function/query requires an input parameter
 
 Compute functions are used to calculate molecular properties. For example, mass, solubility, mutagenicity, partial charges, toxicity risks, etc. By default, Hit design will include compute functions from `Chem` package, which are molecular descriptors, Structural alerts, Toxicity risks and Chemical properties. Users can add additional compute functions by tagging them with `HitDesignFunction` tag and writing them in normal datagrok style. The first two inputs of these functions should be `Dataframe` `table` and `Column` `molecule`, and rest can be any other input. Function should perform a certain task, modify the dataframe in desired way and return the modified dataframe. For example, we can create a function that retrieves the `Chembl` mol registration number by smiles string:
 
-```
+```typescript
 //name: Chembl molregno
 //tags: HitTriageFunction
 //input: dataframe table [Input data table] {caption: Table}
@@ -72,7 +72,7 @@ This function will go through every molecule in the dataframe, convert them to c
 
 Datagrok scripts can also be used as compute functions. For example, you can create a js script that adds a new column to the dataframe. This script also needs to have `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as first two inputs:
 
-```
+```javascript
 //name: Demo script HT
 //description: Hello world script
 //language: javascript
@@ -87,9 +87,42 @@ res = df
 
 ```
 
+Or a python script that calculates the number of atoms in the molecule and multiplies it by a specified value. In case of python, you need to return the dataframe containing columns that you want to append:
+
+```python
+#name: HTPythonDemo
+#description: Calculates number of atoms in molecule in python and also multiplies it by specified value 'multiplier'
+#language: python
+#tags: HitTriageFunction
+#input: dataframe table [Data table]
+#input: column col {semType: Molecule}
+#input: int multiplier
+#output: dataframe result
+
+from rdkit import Chem
+import numpy as np
+# in python, column is passed as column name and dataframes are in pandas format.
+# first, get the column.
+molecules = table[col]
+length = len(molecules)
+# create array of same length
+resCol = np.full(length, None, dtype=object)
+for n in range(0, length):
+    if molecules[n] == "":
+        continue
+    try:
+        mol = Chem.MolFromMolBlock(molecules[n], sanitize = True) if ("M  END" in molecules[n]) else Chem.MolFromSmiles(molecules[n], sanitize = True)
+        if mol is None or mol.GetNumAtoms() == 0:
+            continue
+        resCol[n] = mol.GetNumAtoms() * multiplier
+    except:
+        continue
+result = pd.DataFrame({'Number of Atoms * mult': resCol})
+```
+
 Similarly, queries with same `HitTriageFunction` tag will be added to the compute functions list. The query needs to have at least one input, first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe, which should contain column `molecules` in order to join result with initial dataframe. `molecules` column will be used as key for joining tables. For example, we can create a query that looks for the molecule in Chembl database and returns the molregno number:
 
-```
+```sql
 --name: ChemblMolregNoBySmilesDirect
 --friendlyName: Chembl Molregno by smiles direct
 --input: list<string> molecules
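Note on the query contract: the README text says the query result must contain a `molecules` column, which is used as the key for joining it back to the initial dataframe. The sketch below shows one way such a join could behave, using lists of dicts in place of dataframes. It assumes left-join semantics (rows without a match keep `None` in the new columns), that the first matching row wins on duplicates, and that the initial table's key column is also named `molecules`. None of this is confirmed by the source; `join_on_molecules` is a hypothetical helper and the values are made up.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_on_molecules(
    table: List[Row], query_result: List[Row], key: str = "molecules"
) -> List[Row]:
    """Left-join `query_result` onto `table` using `key` as the join column."""
    # Columns contributed by the query, excluding the join key itself.
    new_columns: List[str] = []
    for row in query_result:
        for name in row:
            if name != key and name not in new_columns:
                new_columns.append(name)

    # Index query rows by molecule; first row wins on duplicates (assumption).
    by_molecule: Dict[Any, Row] = {}
    for row in query_result:
        by_molecule.setdefault(row[key], row)

    joined: List[Row] = []
    for row in table:
        match = by_molecule.get(row[key], {})
        merged = dict(row)
        for name in new_columns:
            merged[name] = match.get(name)  # None when there is no match
        joined.append(merged)
    return joined


table = [{"molecules": "CCO", "id": 1}, {"molecules": "CCN", "id": 2}]
query_result = [{"molregno": 1, "molecules": "CCO"}]  # made-up value
joined = join_on_molecules(table, query_result)
print(joined[1]["molregno"])  # prints None
```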
@@ -103,7 +136,7 @@ select molregno, molecules from compound_structures c
 
 Or a query that calculates fraction of sp3 hybridized carbons in the molecule using RDKit SQL cartridge:
 
-```
+```sql
 --name: SP3Fraction
 --friendlyName: SP3 fraction of carbons
 --input: list<string> molecules