@datagrok/hit-triage 1.1.4 → 1.1.5

package/README.md CHANGED
@@ -1,26 +1,9 @@
1
1
  # HitTriage
2
2
 
3
- The HitTriage package is a powerful tool designed for molecule analysis and campaign management within the Datagrok environment. It consists of two applications: HitTriage and HitDesign. This README provides an overview of the package's functionalities, and subsequent readmes will dive deeper into each application's usage.
3
+ The **HitTriage** package is a powerful tool designed for molecule analysis and campaign management within the Datagrok environment. It consists of two applications: [HitTriage](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HT.md) and [HitDesign](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HD.md).
4
4
 
5
- ## Features
5
+ - The **HitTriage** application is designed for molecule analysis and filtering. It allows users to upload a dataset, filter it, calculate molecular properties, submit the results to any chosen function or query and share campaigns between users. More about HitTriage can be found [here](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HT.md).
6
6
 
7
- Common Workflow
7
+ - The **HitDesign** application is similar in terms of campaign management, but instead of uploading a dataset, users can sketch molecules, calculate molecular properties, filter and organize them in stages. More about HitDesign can be found [here](https://github.com/datagrok-ai/public/blob/master/packages/HitTriage/README_HD.md).
8
8
 
9
- 1. **Template Creation:**
10
-
11
- Define a template specifying the data source for molecules, name, key, additional needed information and compute functions. This source can be a file upload or a query in any other package tagged with `HitTriageDataSource` tag.
12
- The Compute functions are collected from any package with a tag `HitTriageFunction`.
13
-
14
- ![template](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/template.png?raw=true)
15
-
16
- 2. **Campaign Building**:
17
-
18
- Create campaigns based on the template.
19
- Provide a campaign name, select the data source, provide additional information and initiate the campaign.
20
- During the campaign run, the specified compute functions are executed, and their results are appended to the dataframe. For example, you can compute molecular descriptors, toxicity risks, structural alerts and more.
21
-
22
- ![template](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/campaign.png?raw=true)
23
-
24
- After running a campaign, you can submit the dataframe to any chosen function or query, or
25
- save the campaign for later use. Saved campaigns can be reloaded and run again by any user on the platform using the link or the campaigns table on the first page.
26
9
 
package/README_HD.md CHANGED
@@ -68,7 +68,7 @@ Hit design campaign consists of two views, a main design view and a tiles view.
68
68
 
69
69
  HitDesign allows users to define custom compute and submit functions, and these functions can be written in any Datagrok package that is installed in the environment.
70
70
 
71
- ** Compute functions **
71
+ ### Compute functions
72
72
 
73
73
  Compute functions are used to calculate molecular properties, such as mass, solubility, mutagenicity, partial charges, and toxicity risks. By default, HitDesign includes compute functions from the `Chem` package: molecular descriptors, structural alerts, toxicity risks, and chemical properties. Users can add more compute functions by tagging them with the `HitDesignFunction` tag and writing them in the usual Datagrok style. The first two inputs of such a function should be `Dataframe` `table` and `Column` `molecule`; the rest can be any other inputs. The function should perform its task, modify the dataframe as needed, and return the modified dataframe. For example, we can create a function that retrieves the `Chembl` mol registration number by SMILES string:
74
74
 
@@ -98,7 +98,51 @@ export async function chemblMolregno(table: DG.DataFrame, molecules: DG.Column):
98
98
 
99
99
  This function goes through every molecule in the dataframe, converts it to canonical SMILES, and calls a query against the Chembl database that retrieves the molregno number. The result is added to the dataframe as a new column. If this function is defined in the `Chembl` package, then after building and deploying the package to a stand, it is automatically added to the compute functions list in HitDesign.
100
100
 
101
- ** Submit functions **
101
+ Datagrok scripts can also be used as compute functions. For example, you can create a JavaScript script that adds a new column to the dataframe. The script also needs the `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as its first two inputs:
102
+
103
+ ```
104
+ //name: Demo script HT
105
+ //description: Hello world script
106
+ //language: javascript
107
+ //input: dataframe df
108
+ //input: column col
109
+ //input: int a
110
+ //tags: HitTriageFunction
111
+ //output: dataframe res
112
+
113
+ df.columns.addNewInt('Some number col').init(() => a)
114
+ res = df
115
+
116
+ ```
117
+
118
+ Similarly, queries with the same `HitTriageFunction` tag are added to the compute functions list. The query needs at least one input, the first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe containing a `molecules` column, which is used as the key for joining the result with the initial dataframe. For example, you can create a query that looks up the molecule in the Chembl database and returns the molregno number:
119
+
120
+ ```
121
+ --name: ChemblMolregNoBySmiles
122
+ --friendlyName: Chembl Molregno by smiles
123
+ --input: list<string> molecules
124
+ --tags: HitTriageFunction
125
+ --connection: Chembl
126
+ select molregno, molecules from compound_structures c
127
+ INNER JOIN unnest(@molecules) molecules
128
+ ON molecules.molecules
129
+ = c.canonical_smiles
130
+ ```
131
+
132
+ Or a query that calculates the fraction of sp3-hybridized carbons in the molecule using the RDKit SQL cartridge:
133
+
134
+ ```
135
+ --name: SP3Fraction
136
+ --friendlyName: SP3 fraction of carbons
137
+ --input: list<string> molecules
138
+ --tags: HitTriageFunction
139
+ --connection: Chembl
140
+ select molecules, mol_fractioncsp3(Cast(molecules as mol))
141
+ from unnest(@molecules) as molecules
142
+ where is_valid_smiles(Cast(molecules as cstring))
143
+ ```
144
+
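The join on the `molecules` column described above can be pictured with a small, framework-free TypeScript sketch. This is not the package's implementation and uses no Datagrok API. The `Row` type, the `joinQueryResult` helper, and the sample values (including the molregno `101`) are hypothetical. The sketch also assumes that molecules missing from the query result get `null` in the new columns.

```typescript
// Hypothetical row shape for illustration; not a Datagrok type.
type Row = Record<string, string | number | null>;

// Left-join a query result onto the initial table.
// The query result is keyed by its `molecules` column; the initial table
// is keyed by whatever column holds the molecules (`moleculeColumn`).
function joinQueryResult(table: Row[], moleculeColumn: string, result: Row[]): Row[] {
  const byMolecule = new Map<string, Row>();
  const extraColumns: string[] = [];
  for (const r of result) {
    const key = r['molecules'];
    // Index result rows by molecule string; the first match wins.
    if (typeof key === 'string' && !byMolecule.has(key)) byMolecule.set(key, r);
    // Remember every column the query produced besides the key.
    for (const name of Object.keys(r)) {
      if (name !== 'molecules' && extraColumns.indexOf(name) < 0) extraColumns.push(name);
    }
  }
  return table.map((row) => {
    const key = row[moleculeColumn];
    const match = typeof key === 'string' ? byMolecule.get(key) : undefined;
    const joined: Row = {...row};
    // Assumption: unmatched molecules get null in the added columns.
    for (const name of extraColumns) joined[name] = match ? (match[name] ?? null) : null;
    return joined;
  });
}

// Made-up sample data: two molecules, only one of which the query returned.
const table: Row[] = [{smiles: 'CCO'}, {smiles: 'c1ccccc1'}];
const queryResult: Row[] = [{molecules: 'CCO', molregno: 101}];
const joined = joinQueryResult(table, 'smiles', queryResult);
console.log(joined);
```

Under these assumptions, a molecule that the query does not return (for instance, one excluded by the `is_valid_smiles` filter in the query above) would end up with `null` in the added column rather than being dropped from the table.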
145
+ ### Submit functions
102
146
 
103
147
  Submit functions are used to save or submit the filtered and computed dataset. This could include saving to a private database or additional calculations. Submit functions are defined in the same way as compute functions, but they are tagged with `HitTriageSubmitFunction` tag. The function should accept only two inputs, `Dataframe` `df` and `String` `molecules`, which are the resulting dataframe and name of molecules column respectively. For example, we can create a function that saves the filtered and computed dataset to a database:
104
148
 
package/README_HT.md CHANGED
@@ -40,7 +40,79 @@ The application will detect that the function/query requires an input parameter
40
40
 
41
41
  - **Additional fields** : Users can configure additional fields for the template, which will be prompted for input during campaign creation. These fields include name, type, and whether they are required or not. For example, additional fields for a campaign can be a target protein name, head scientist name, deadline, etc.
42
42
 
43
- - **Compute functions** : HitTriage aggregates compute functions tagged with `HitTriageFunction` from Datagrok packages. Users can select from these functions to perform calculations (e.g., mass, solubility, mutagenicity, partial charges, toxicity risks, etc.) on the dataset.
43
+ - **Compute functions**
44
+
45
+ Compute functions are used to calculate molecular properties, such as mass, solubility, mutagenicity, partial charges, and toxicity risks. By default, HitTriage includes compute functions from the `Chem` package: molecular descriptors, structural alerts, toxicity risks, and chemical properties. Users can add more compute functions by tagging them with the `HitTriageFunction` tag and writing them in the usual Datagrok style. The first two inputs of such a function should be `Dataframe` `table` and `Column` `molecule`; the rest can be any other inputs. The function should perform its task, modify the dataframe as needed, and return the modified dataframe. For example, we can create a function that retrieves the `Chembl` mol registration number by SMILES string:
46
+
47
+ ```
48
+ //name: Chembl molregno
49
+ //tags: HitTriageFunction
50
+ //input: dataframe table [Input data table] {caption: Table}
51
+ //input: column molecules {caption: Molecules; semType: Molecule}
52
+ //output: dataframe result
53
+ export async function chemblMolregno(table: DG.DataFrame, molecules: DG.Column): Promise<DG.DataFrame> {
54
+ const name = table.columns.getUnusedName('CHEMBL molregno');
55
+ table.columns.addNewInt(name);
56
+ for (let i = 0; i < molecules.length; i++) {
57
+ const smile = molecules.get(i);
58
+ if (!smile) {
59
+ table.set(name, i, null);
60
+ continue;
61
+ }
62
+ const canonical = grok.chem.convert(smile, DG.chem.Notation.Unknown, DG.chem.Notation.Smiles);
63
+ const resDf: DG.DataFrame = await grok.data.query('Chembl:ChemblMolregNoBySmiles', {smiles: canonical});
64
+ const res: number = resDf.getCol('molregno').toList()[0];
65
+ table.set(name, i, res);
66
+ }
67
+ return table;
68
+ }
69
+ ```
70
+
71
+ This function goes through every molecule in the dataframe, converts it to canonical SMILES, and calls a query against the Chembl database that retrieves the molregno number. The result is added to the dataframe as a new column. If this function is defined in the `Chembl` package, then after building and deploying the package to a stand, it is automatically added to the compute functions list in HitTriage.
72
+
73
+ Datagrok scripts can also be used as compute functions. For example, you can create a JavaScript script that adds a new column to the dataframe. The script also needs the `HitTriageFunction` tag and should accept `Dataframe` `table` and `Column` `molecules` as its first two inputs:
74
+
75
+ ```
76
+ //name: Demo script HT
77
+ //description: Hello world script
78
+ //language: javascript
79
+ //input: dataframe df
80
+ //input: column col
81
+ //input: int a
82
+ //tags: HitTriageFunction
83
+ //output: dataframe res
84
+
85
+ df.columns.addNewInt('Some number col').init(() => a)
86
+ res = df
87
+
88
+ ```
89
+
90
+ Similarly, queries with the same `HitTriageFunction` tag are added to the compute functions list. The query needs at least one input, the first of which must be `list<string>`, representing the list of molecules. The query must return a dataframe containing a `molecules` column, which is used as the key for joining the result with the initial dataframe. For example, we can create a query that looks up the molecule in the Chembl database and returns the molregno number:
91
+
92
+ ```
93
+ --name: ChemblMolregNoBySmilesDirect
94
+ --friendlyName: Chembl Molregno by smiles direct
95
+ --input: list<string> molecules
96
+ --tags: HitTriageFunction
97
+ --connection: Chembl
98
+ select molregno, molecules from compound_structures c
99
+ INNER JOIN unnest(@molecules) molecules
100
+ ON molecules.molecules
101
+ = c.canonical_smiles
102
+ ```
103
+
104
+ Or a query that calculates the fraction of sp3-hybridized carbons in the molecule using the RDKit SQL cartridge:
105
+
106
+ ```
107
+ --name: SP3Fraction
108
+ --friendlyName: SP3 fraction of carbons
109
+ --input: list<string> molecules
110
+ --tags: HitTriageFunction
111
+ --connection: Chembl
112
+ select molecules, mol_fractioncsp3(Cast(molecules as mol))
113
+ from unnest(@molecules) as molecules
114
+ where is_valid_smiles(Cast(molecules as cstring))
115
+ ```
44
116
 
45
117
  - **Submit function** : Users can define custom submit functions (tagged with `HitTriageSubmitFunction`) to further process or save the filtered and computed dataset. This could include saving to a private database or additional calculations.
46
118
 
@@ -72,6 +144,6 @@ Users can start a new campaign by choosing a template and filling out the requir
72
144
 
73
145
  ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/HT_create_campaign.gif?raw=true)
74
146
 
75
- After the campaign starts, users can filter, modify or add viewers to the campaign and then save them. once saved, reloading the campaign will restore the saved state.
147
+ After the campaign starts, new calculated columns will be added. Users can filter, modify or add viewers to the campaign and then save them. Once saved, reloading the campaign will restore the saved state.
76
148
 
77
149
  ![hitDesignReadmeImg](https://github.com/datagrok-ai/public/blob/master/help/uploads/hittriage/HT_save_campaign.gif?raw=true)
@@ -66,7 +66,7 @@ table.hit-triage-table {
66
66
 
67
67
  .hit-triage-compute-dialog-host {
68
68
  max-height: 550px;
69
- max-width: 500px;
69
+ max-width: 700px;
70
70
  }
71
71
 
72
72
  .hit-triage-compute-dialog-descriptors-group {