@datagrok/hit-triage 1.1.3 → 1.1.5

package/.eslintrc.json CHANGED
@@ -26,6 +26,7 @@
  120
  ],
  "require-jsdoc": "off",
+ "valid-jsdoc": "off",
  "spaced-comment": "off",
  "linebreak-style": "off",
  "curly": [
package/CHANGELOG.md CHANGED
@@ -1,5 +1,10 @@
  # HitTriage changelog

+ ## 1.1.4 (2024-01-17)
+
+ * Add ability to use queries directly as source functions.
+ * Fixed bugs with hit triage, including incorrect saving and addition of calculated properties.
+
  ## 1.1.3 (2023-12-27)

  Fix incompatibility with old api version
package/README.md CHANGED
@@ -1,26 +1,9 @@
  # HitTriage

- The HitTriage package is a powerful tool designed for molecule analysis and campaign management within the Datagrok environment. It consists of two applications: HitTriage and HitDesign. This README provides an overview of the package's functionalities, and subsequent readmes will dive deeper into each application's usage.
+ The **HitTriage** package is a powerful tool designed for molecule analysis and campaign management within the Datagrok environment. It consists of two applications: [HitTriage](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HT.md) and [HitDesign](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HD.md).

- ## Features
+ - The **HitTriage** application is designed for molecule analysis and filtering. It allows users to upload a dataset, filter it, calculate molecular properties, submit the results to any chosen function or query and share campaigns between users. More about HitTriage can be found [here](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HT.md).

- Common Workflow
+ - The **HitDesign** application is similar in terms of campaign management, but instead of uploading a dataset, users can sketch molecules, calculate molecular properties, filter and organize them in stages. More about HitDesign can be found [here](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HT.md).

- 1. **Template Creation:**
-
- Define a template specifying the data source for molecules, name, key, additional needed information and compute functions. This source can be a file upload or a query in any other package tagged with `HitTriageDataSource` tag.
- The Compute functions are collected from any package with a tag `HitTriageFunction`.
-
- ![template](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/template.png?raw=true)
-
- 2. **Campaign Building**:
-
- Create campaigns based on the template.
- Provide a campaign name, select the data source, provide additional information and initiate the campaign.
- During the campaign run, the specified compute functions are executed, and their results are appended to the dataframe. For example, you can compute molecular descriptors, toxicity risks, structural alerts and more.
-
- ![template](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/campaign.png?raw=true)
-
- After running a campaign, you can submit the dataframe to any chosen function or query. or
- save the campaign for later use. Saved campaigns can be reloaded and run again by any user on the platform usign the link or the campaigns table on the first page.

package/README_HD.md CHANGED
@@ -68,7 +68,7 @@ Hit design campaign consists of two views, a main design view and a tiles view.

  HitDesign allows users to define custom compute and submit functions, and these functions can be written in any Datagrok package that is installed in the environment.

- ** Compute functions **
+ ### Compute functions

  Compute functions are used to calculate molecular properties. For example, mass, solubility, mutagenicity, partial charges, toxicity risks, etc. By default, Hit design will include compute functions from `Chem` package, which are molecular descriptors, Structural alerts, Toxicity risks and Chemical properties. Users can add additional compute functions by tagging them with `HitDesignFunction` tag and writing them in normal datagrok style. The First two inputs of these functions should be `Dataframe` `table` and `Column` `molecule`, and rest can be any other input. Function should perform a certain task, modify the dataframe in desired way and return the modified dataframe. For example, we can create a function that retrieves the `Chembl` mol registration number by smiles string:

@@ -98,7 +98,51 @@ export async function chemblMolregno(table: DG.DataFrame, molecules: DG.Column):

  This function will go through every molecule in the dataframe, convert them to canonical smiles and call the query from Chembl database, that will retrieve the molregno number. The result will be added as a new column to the dataframe. If this function is defined in the `Chembl` package, after building and deploying it to stand, it will be automatically added to the compute functions list in HitDesign.

- ** Submit functions **
+ Datagrok scripts can also be used as compute functions. For example, you can create a js script that adds a new column to the dataframe. This script also needs to have `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as first two inputs:
+
+ ```
+ //name: Demo script HT
+ //description: Hello world script
+ //language: javascript
+ //input: dataframe df
+ //input: column col
+ //input: int a
+ //tags: HitTriageFunction
+ //output: dataframe res
+
+ df.columns.addNewInt('Some number col').init(() => a)
+ res = df
+
+ ```
+
+ Similarly, queries with same `HitTriageFunction` tag will be added to the compute functions list. The query needs to have at least one input, first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe, which should contain column `molecules` in order to join result with initial dataframe. `molecules` column will be used as key for joining tables. For example, you can create a query that looks for the molecule in Chembl database and returns the molregno number:
+
+ ```
+ --name: ChemblMolregNoBySmiles
+ --friendlyName: Chembl Molregno by smiles
+ --input: list<string> molecules
+ --tags: HitTriageFunction
+ --connection: Chembl
+ select molregno, molecules from compound_structures c
+ INNER JOIN unnest(@molecules) molecules
+ ON molecules.molecules
+ = c.canonical_smiles
+ ```
+
+ Or a query that calculates fraction of sp3 hybridized carbons in the molecule using RDKit SQL cartridge:
+
+ ```
+ --name: SP3Fraction
+ --friendlyName: SP3 fraction of carbons
+ --input: list<string> molecules
+ --tags: HitTriageFunction
+ --connection: Chembl
+ select molecules, mol_fractioncsp3(Cast(molecules as mol))
+ from unnest(@molecules) as molecules
+ where is_valid_smiles(Cast(molecules as cstring))
+ ```
+
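Editorial note (not part of the package diff): the README text above says a query result is joined back to the initial dataframe using the `molecules` column as the key. The diff does not show how HitTriage implements that join, so the following is only a minimal sketch of the idea in plain TypeScript, without the Datagrok API. The names `QueryRow` and `joinByMolecules` are hypothetical, and the left-join behaviour (unmatched molecules get `null`, first matching row wins) is an assumption.

```typescript
// Hypothetical row shape: one query result row, keyed by the `molecules` column.
type QueryRow = { molecules: string } & Record<string, string | number | null>;

// Left-join query rows onto the input molecule list, using `molecules` as the key.
// Assumption: molecules missing from the query result map to null,
// and when the result has duplicate keys the first row wins.
function joinByMolecules(
  molecules: string[],
  rows: QueryRow[],
  valueColumn: string,
): (string | number | null)[] {
  const byMolecule = new Map<string, QueryRow>();
  for (const row of rows) {
    if (!byMolecule.has(row.molecules)) byMolecule.set(row.molecules, row);
  }
  // Output order follows the input list, not the query result order.
  return molecules.map((m) => byMolecule.get(m)?.[valueColumn] ?? null);
}

// Usage with made-up values: the query returned rows in a different order
// and did not find the third molecule.
const input = ['CCO', 'c1ccccc1', 'not-in-db'];
const result: QueryRow[] = [
  { molecules: 'c1ccccc1', molregno: 2 },
  { molecules: 'CCO', molregno: 1 },
];
const molregnoColumn = joinByMolecules(input, result, 'molregno');
```

Under these assumptions `molregnoColumn` is `[1, 2, null]`: the values line up with the original row order even though the query returned them in a different order, which is why the result must carry the `molecules` column.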
+ ### Submit functions

  Submit functions are used to save or submit the filtered and computed dataset. This could include saving to a private database or additional calculations. Submit functions are defined in the same way as compute functions, but they are tagged with `HitTriageSubmitFunction` tag. The function should accept only two inputs, `Dataframe` `df` and `String` `molecules`, which are the resulting dataframe and name of molecules column respectively. For example, we can create a function that saves the filtered and computed dataset to a database:

package/README_HT.md CHANGED
@@ -10,27 +10,113 @@ Templates in HitTriage contain essential configurations for conducting a campaig

  - **Campaign Prefix** : A code used as a prefix for campaign names (e.g., TMP-1, TMP-2).

- - **Data Ingestion Settings** : Controls how molecular data is ingested into the template. Users can choose between file upload or a query option. For the query, HitTriage searches for functions in any Datagrok package tagged with `HitTriageDataSource`. For example:
-
- ```//name: Demo File Ingestion
- //input: int numberOfMolecules [Molecules count]
- //tags: HitTriageDataSource
- //output: dataframe result
- export async function demoFileIngest(numberOfMolecules: number): Promise<DG.DataFrame> {
- const df = grok.data.demo.molecules(numberOfMolecules);
- df.name = 'Variable Molecules number';
- return df;
- }
- ```
- The application will detect that the function requeires an input parameter and will prompt the user to provide it in campaigns form. The function must return a dataframe with a column containing molecules
+ - **Data Ingestion Settings** : Controls how molecular data is ingested into the template. Users can choose between file upload or a query option. For the query, HitTriage searches for functions in any Datagrok package tagged with `HitTriageDataSource` and queries with same tag. For example, a package can export the following function:
+
+ ```//name: Demo File Ingestion
+ //input: int numberOfMolecules [Molecules count]
+ //tags: HitTriageDataSource
+ //output: dataframe result
+ export async function demoFileIngest(numberOfMolecules: number): Promise<DG.DataFrame> {
+ const df = grok.data.demo.molecules(numberOfMolecules);
+ df.name = 'Variable Molecules number';
+ return df;
+ }
+ ```
+ Or users can write a query in the query editor, save it and share with others. Example for loading molecules from Chembl database:
+
+ ```--name: _someChemblStructure
+ --friendlyName: Load Some Chembl structures
+ --input: int numberOfMolecules = 1000
+ --tags: HitTriageDataSource
+ --connection: Chembl
+ select
+ canonical_smiles, molregno
+ from
+ compound_structures
+ limit @numberOfMolecules
+ ```
+
+ The application will detect that the function/query requires an input parameter and will prompt the user to provide it in campaigns form. The function/query must return a dataframe with a column containing molecules.

  - **Additional fields** : Users can configure additional fields for the template, which will be prompted for input during campaign creation. These fields include name, type, and whether they are required or not. For example, additional field for a campaign can be a target protein name, Head scientist name, deadline, etc.

- - **Compute functions** : HitTriage aggregates compute functions tagged with `HitTriageFunction` from Datagrok packages. Users can select from these functions to perform calculations (e.g., mass, solubility, mutagenicity, partial charges, toxicity risks, etc.) on the dataset.
+ - **Compute functions**
+
+ Compute functions are used to calculate molecular properties. For example, mass, solubility, mutagenicity, partial charges, toxicity risks, etc. By default, Hit design will include compute functions from `Chem` package, which are molecular descriptors, Structural alerts, Toxicity risks and Chemical properties. Users can add additional compute functions by tagging them with `HitDesignFunction` tag and writing them in normal datagrok style. The First two inputs of these functions should be `Dataframe` `table` and `Column` `molecule`, and rest can be any other input. Function should perform a certain task, modify the dataframe in desired way and return the modified dataframe. For example, we can create a function that retrieves the `Chembl` mol registration number by smiles string:
+
+ ```
+ //name: Chembl molregno
+ //tags: HitTriageFunction
+ //input: dataframe table [Input data table] {caption: Table}
+ //input: column molecules {caption: Molecules; semType: Molecule}
+ //output: dataframe result
+ export async function chemblMolregno(table: DG.DataFrame, molecules: DG.Column): Promise<DG.DataFrame> {
+ const name = table.columns.getUnusedName('CHEMBL molregno');
+ table.columns.addNewInt(name);
+ for (let i = 0; i < molecules.length; i++) {
+ const smile = molecules.get(i);
+ if (!smile) {
+ table.set(name, i, null);
+ continue;
+ }
+ const canonical = grok.chem.convert(smile, DG.chem.Notation.Unknown, DG.chem.Notation.Smiles);
+ const resDf: DG.DataFrame = await grok.data.query('Chembl:ChemblMolregNoBySmiles', {smiles: canonical});
+ const res: number = resDf.getCol('molregno').toList()[0];
+ table.set(name, i, res);
+ }
+ return table;
+ }
+ ```
+
+ This function will go through every molecule in the dataframe, convert them to canonical smiles and call the query from Chembl database, that will retrieve the molregno number. The result will be added as a new column to the dataframe. If this function is defined in the `Chembl` package, after building and deploying it to stand, it will be automatically added to the compute functions list in HitDesign.
+
+ Datagrok scripts can also be used as compute functions. For example, you can create a js script that adds a new column to the dataframe. This script also needs to have `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as first two inputs:
+
+ ```
+ //name: Demo script HT
+ //description: Hello world script
+ //language: javascript
+ //input: dataframe df
+ //input: column col
+ //input: int a
+ //tags: HitTriageFunction
+ //output: dataframe res
+
+ df.columns.addNewInt('Some number col').init(() => a)
+ res = df
+
+ ```
+
+ Similarly, queries with same `HitTriageFunction` tag will be added to the compute functions list. The query needs to have at least one input, first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe, which should contain column `molecules` in order to join result with initial dataframe. `molecules` column will be used as key for joining tables. For example, we can create a query that looks for the molecule in Chembl database and returns the molregno number:
+
+ ```
+ --name: ChemblMolregNoBySmilesDirect
+ --friendlyName: Chembl Molregno by smiles direct
+ --input: list<string> molecules
+ --tags: HitTriageFunction
+ --connection: Chembl
+ select molregno, molecules from compound_structures c
+ INNER JOIN unnest(@molecules) molecules
+ ON molecules.molecules
+ = c.canonical_smiles
+ ```
+
+ Or a query that calculates fraction of sp3 hybridized carbons in the molecule using RDKit SQL cartridge:
+
+ ```
+ --name: SP3Fraction
+ --friendlyName: SP3 fraction of carbons
+ --input: list<string> molecules
+ --tags: HitTriageFunction
+ --connection: Chembl
+ select molecules, mol_fractioncsp3(Cast(molecules as mol))
+ from unnest(@molecules) as molecules
+ where is_valid_smiles(Cast(molecules as cstring))
+ ```

  - **Submit function** : Users can define custom submit functions (tagged with `HitTriageSubmitFunction`) to further process or save the filtered and computed dataset. This could include saving to a private database or additional calculations.

- ![template](https://datagrok.ai/help/uploads/hittriage/files/images/template.png)
+ ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/template.png?raw=true)

  ## Campaigns

@@ -42,22 +128,22 @@ Campaigns are built based on templates and encompass the actual hit triage proce

  - **Functionality**: Once a campaign starts, you can add extra calculated columns, apply changes, filter, save or submit the campaign.

- ![template](https://datagrok.ai/help/uploads/hittriage/files/images/campaign.png)
+ ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/campaign.png?raw=true)

  ## Getting started

  Users can continue ongoing campaigns either directly by a link or by selecting it from the campaigns table.

- ![template](https://datagrok.ai/help/uploads/hittriage/files/images/HT_Continue_campaign.gif)
+ ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/HT_Continue_campaign.gif?raw=true)

  Users can create a new template by clicking on the `New Template` button in the `Templates` dropdown.

- ![template](https://datagrok.ai/help/uploads/hittriage/files/images/HT_create_template.gif)
+ ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/HT_create_template.gif?raw=true)

  Users can start a new campaign by choosing a template and filling out the required information.

- ![template](https://datagrok.ai/help/uploads/hittriage/files/images/HT_create_campaign.gif)
+ ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/HT_create_campaign.gif?raw=true)

- After the campaign starts, users can filter, modify or add viewers to the campaign and then save them. once saved, reloading the campaign will restore the saved state.
+ After the campaign starts, new calculated columns will be added. Users can filter, modify or add viewers to the campaign and then save them. Once saved, reloading the campaign will restore the saved state.

- ![template](https://datagrok.ai/help/uploads/hittriage/files/images/HT_save_campaign.gif)
+ ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/HT_save_campaign.gif?raw=true)
@@ -66,7 +66,7 @@ table.hit-triage-table {

  .hit-triage-compute-dialog-host {
  max-height: 550px;
- max-width: 500px;
+ max-width: 700px;
  }

  .hit-triage-compute-dialog-descriptors-group {