@datagrok/hit-triage 1.1.6 → 1.1.8

package/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
1
1
  # HitTriage changelog
2
2
 
3
+ ## 1.1.8 (2024-02-22)
4
+
5
+ Add python script support.
6
+
3
7
  ## 1.1.6 (2024-01-26)
4
8
 
5
9
  * Fixes to adding new functions to the campaign.
package/README_HD.md CHANGED
@@ -100,7 +100,7 @@ This function will go through every molecule in the dataframe, convert them to c
100
100
 
101
101
  Datagrok scripts can also be used as compute functions. For example, you can create a JavaScript script that adds a new column to the dataframe. The script also needs the `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as its first two inputs:
102
102
 
103
- ```
103
+ ```javascript
104
104
  //name: Demo script HT
105
105
  //description: Hello world script
106
106
  //language: javascript
@@ -115,9 +115,42 @@ res = df
115
115
 
116
116
  ```
117
117
 
118
+ Or a Python script that calculates the number of atoms in each molecule and multiplies it by a specified value. A Python script must return a dataframe containing the columns you want to append:
119
+
120
+ ```python
121
+ #name: HTPythonDemo
122
+ #description: Calculates the number of atoms in each molecule in Python and multiplies it by the specified value 'multiplier'
123
+ #language: python
124
+ #tags: HitTriageFunction
125
+ #input: dataframe table [Data table]
126
+ #input: column col {semType: Molecule}
127
+ #input: int multiplier
128
+ #output: dataframe result
129
+
130
+ from rdkit import Chem
131
+ import numpy as np
+ import pandas as pd  # needed for pd.DataFrame when building the result
132
+ # in python, column is passed as column name and dataframes are in pandas format.
133
+ # first, get the column.
134
+ molecules = table[col]
135
+ length = len(molecules)
136
+ # create array of same length
137
+ resCol = np.full(length, None, dtype=object)
138
+ for n in range(0, length):
139
+     if molecules[n] == "":
140
+         continue
141
+     try:
142
+         mol = Chem.MolFromMolBlock(molecules[n], sanitize = True) if ("M  END" in molecules[n]) else Chem.MolFromSmiles(molecules[n], sanitize = True)
143
+         if mol is None or mol.GetNumAtoms() == 0:
144
+             continue
145
+         resCol[n] = mol.GetNumAtoms() * multiplier
146
+     except Exception:  # skip molecules that RDKit fails to parse
147
+         continue
148
+ result = pd.DataFrame({'Number of Atoms * mult': resCol})
149
+ ```
150
+
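The row-alignment pattern used by the script above can be tried outside Datagrok. The following is a minimal sketch, not part of the package: `aligned_column` and the stand-in `fake_atom_count` are hypothetical names, and plain Python lists replace the pandas column and the RDKit call so that the snippet runs with the standard library alone. It shows how empty or unparseable entries keep a `None` placeholder, so the returned column has exactly one value per input row.

```python
from typing import Callable, List, Optional


def aligned_column(values: List[str],
                   descriptor: Callable[[str], Optional[int]],
                   multiplier: int) -> List[Optional[int]]:
    """Return one result per input row; rows that fail stay None."""
    result: List[Optional[int]] = [None] * len(values)
    for n, value in enumerate(values):
        if value == "":
            continue  # empty cell: keep the placeholder
        try:
            count = descriptor(value)
        except Exception:
            continue  # descriptor failed: keep the placeholder
        if count is None or count == 0:
            continue
        result[n] = count * multiplier
    return result


def fake_atom_count(molecule: str) -> int:
    """Stand-in for RDKit parsing (assumption): string length as the count."""
    if molecule == "bad":
        raise ValueError("cannot parse")
    return len(molecule)


column = aligned_column(["CCO", "", "bad", "C"], fake_atom_count, 2)
```

Because the output keeps the input length, it could be wrapped in a single-column dataframe and appended row by row, which appears to be what the script above relies on.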
118
151
  Similarly, queries with the same `HitTriageFunction` tag will be added to the compute functions list. The query needs at least one input, the first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe that contains a `molecules` column, which is used as the key for joining the result with the initial dataframe. For example, you can create a query that looks up the molecule in the Chembl database and returns the molregno number:
119
152
 
120
- ```
153
+ ```sql
121
154
  --name: ChemblMolregNoBySmiles
122
155
  --friendlyName: Chembl Molregno by smiles
123
156
  --input: list<string> molecules
@@ -131,7 +164,7 @@ select molregno, molecules from compound_structures c
131
164
 
132
165
  Or a query that calculates the fraction of sp3-hybridized carbons in the molecule using the RDKit SQL cartridge:
133
166
 
134
- ```
167
+ ```sql
135
168
  --name: SP3Fraction
136
169
  --friendlyName: SP3 fraction of carbons
137
170
  --input: list<string> molecules
@@ -146,7 +179,7 @@ where is_valid_smiles(Cast(molecules as cstring))
146
179
 
147
180
  Submit functions are used to save or submit the filtered and computed dataset. This could include saving to a private database or running additional calculations. Submit functions are defined in the same way as compute functions, but they are tagged with the `HitTriageSubmitFunction` tag. The function should accept exactly two inputs, `Dataframe` `df` and `String` `molecules`, which are the resulting dataframe and the name of the molecules column, respectively. For example, we can create a function that saves the filtered and computed dataset to a database:
148
181
 
149
- ```
182
+ ```typescript
150
183
  //name: Sample File Submit
151
184
  //tags: HitTriageSubmitFunction
152
185
  //input: dataframe df [dataframe]
package/README_HT.md CHANGED
@@ -44,7 +44,7 @@ The application will detect that the function/query requires an input parameter
44
44
 
45
45
  Compute functions are used to calculate molecular properties, for example mass, solubility, mutagenicity, partial charges, and toxicity risks. By default, Hit design will include compute functions from the `Chem` package: molecular descriptors, structural alerts, toxicity risks and chemical properties. Users can add compute functions by tagging them with the `HitDesignFunction` tag and writing them in the usual Datagrok style. The first two inputs of these functions should be `Dataframe` `table` and `Column` `molecule`; the rest can be any other inputs. The function should perform its task, modify the dataframe as needed, and return the modified dataframe. For example, we can create a function that retrieves the `Chembl` mol registration number by smiles string:
46
46
 
47
- ```
47
+ ```typescript
48
48
  //name: Chembl molregno
49
49
  //tags: HitTriageFunction
50
50
  //input: dataframe table [Input data table] {caption: Table}
@@ -72,7 +72,7 @@ This function will go through every molecule in the dataframe, convert them to c
72
72
 
73
73
  Datagrok scripts can also be used as compute functions. For example, you can create a JavaScript script that adds a new column to the dataframe. The script also needs the `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as its first two inputs:
74
74
 
75
- ```
75
+ ```javascript
76
76
  //name: Demo script HT
77
77
  //description: Hello world script
78
78
  //language: javascript
@@ -87,9 +87,42 @@ res = df
87
87
 
88
88
  ```
89
89
 
90
+ Or a Python script that calculates the number of atoms in each molecule and multiplies it by a specified value. A Python script must return a dataframe containing the columns you want to append:
91
+
92
+ ```python
93
+ #name: HTPythonDemo
94
+ #description: Calculates the number of atoms in each molecule in Python and multiplies it by the specified value 'multiplier'
95
+ #language: python
96
+ #tags: HitTriageFunction
97
+ #input: dataframe table [Data table]
98
+ #input: column col {semType: Molecule}
99
+ #input: int multiplier
100
+ #output: dataframe result
101
+
102
+ from rdkit import Chem
103
+ import numpy as np
+ import pandas as pd  # needed for pd.DataFrame when building the result
104
+ # in python, column is passed as column name and dataframes are in pandas format.
105
+ # first, get the column.
106
+ molecules = table[col]
107
+ length = len(molecules)
108
+ # create array of same length
109
+ resCol = np.full(length, None, dtype=object)
110
+ for n in range(0, length):
111
+     if molecules[n] == "":
112
+         continue
113
+     try:
114
+         mol = Chem.MolFromMolBlock(molecules[n], sanitize = True) if ("M  END" in molecules[n]) else Chem.MolFromSmiles(molecules[n], sanitize = True)
115
+         if mol is None or mol.GetNumAtoms() == 0:
116
+             continue
117
+         resCol[n] = mol.GetNumAtoms() * multiplier
118
+     except Exception:  # skip molecules that RDKit fails to parse
119
+         continue
120
+ result = pd.DataFrame({'Number of Atoms * mult': resCol})
121
+ ```
122
+
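The script above chooses an RDKit parser by looking for the molfile terminator in each string. As a hedged illustration, that check can be isolated into a small helper. `detect_notation` is a hypothetical name, and the sketch assumes that any string containing the `M  END` line (written with two spaces, as in the molfile format) is a molblock and that everything else should be treated as SMILES.

```python
MOLFILE_END = "M  END"  # molfile terminator: 'M', two spaces, 'END'


def detect_notation(molecule: str) -> str:
    """Classify a molecule string as 'empty', 'molblock' or 'smiles'."""
    if molecule.strip() == "":
        return "empty"
    if MOLFILE_END in molecule:
        return "molblock"
    return "smiles"  # assumption: anything that is not a molblock is SMILES


# A stripped-down molblock with no atoms, used only to exercise the check.
molblock = "\n  demo\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
kinds = [detect_notation(m) for m in ["CCO", molblock, ""]]
```

In the Datagrok script, `'molblock'` would correspond to `Chem.MolFromMolBlock` and `'smiles'` to `Chem.MolFromSmiles`. A terminator written with a single space would not match a standard molblock, so such input would fall through to the SMILES branch.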
90
123
  Similarly, queries with the same `HitTriageFunction` tag will be added to the compute functions list. The query needs at least one input, the first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe that contains a `molecules` column, which is used as the key for joining the result with the initial dataframe. For example, we can create a query that looks up the molecule in the Chembl database and returns the molregno number:
91
124
 
92
- ```
125
+ ```sql
93
126
  --name: ChemblMolregNoBySmilesDirect
94
127
  --friendlyName: Chembl Molregno by smiles direct
95
128
  --input: list<string> molecules
@@ -103,7 +136,7 @@ select molregno, molecules from compound_structures c
103
136
 
104
137
  Or a query that calculates the fraction of sp3-hybridized carbons in the molecule using the RDKit SQL cartridge:
105
138
 
106
- ```
139
+ ```sql
107
140
  --name: SP3Fraction
108
141
  --friendlyName: SP3 fraction of carbons
109
142
  --input: list<string> molecules