@datagrok/bio 2.8.4 → 2.9.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +19 -9
- package/README.md +39 -20
- package/dist/package-test.js +1 -1
- package/dist/package-test.js.map +1 -1
- package/dist/package.js +1 -1
- package/dist/package.js.map +1 -1
- package/dockerfiles/Dockerfile +5 -4
- package/package.json +3 -3
- package/src/analysis/sequence-activity-cliffs.ts +8 -7
- package/src/analysis/sequence-similarity-viewer.ts +8 -8
- package/src/apps/web-logo-app.ts +26 -6
- package/src/calculations/monomerLevelMols.ts +6 -3
- package/src/package-test.ts +1 -0
- package/src/package-types.ts +0 -1
- package/src/package.ts +52 -10
- package/src/substructure-search/substructure-search.ts +84 -55
- package/src/tests/activity-cliffs-tests.ts +1 -1
- package/src/tests/converters-test.ts +1 -1
- package/src/tests/detectors-tests.ts +2 -2
- package/src/tests/msa-tests.ts +2 -3
- package/src/tests/renderers-test.ts +37 -3
- package/src/tests/scoring.ts +38 -0
- package/src/tests/splitters-test.ts +27 -1
- package/src/tests/units-handler-splitted-tests.ts +19 -12
- package/src/tests/units-handler-tests.ts +15 -15
- package/src/utils/cell-renderer.ts +31 -20
- package/src/utils/monomer-cell-renderer.ts +14 -14
- package/src/utils/save-as-fasta.ts +1 -1
- package/src/utils/split-to-monomers.ts +40 -6
- package/src/utils/ui-utils.ts +4 -4
- package/src/viewers/vd-regions-viewer.ts +88 -51
- package/src/viewers/web-logo-viewer.ts +307 -310
- package/src/widgets/composition-analysis-widget.ts +6 -2
package/CHANGELOG.md
CHANGED
```diff
@@ -1,16 +1,26 @@
 # Bio changelog
 
-## 2.9.0 (
-
-*Dependency: datagrok-api >= 1.13.3*
+## 2.9.0 (2023-08-30)
 
 ### Features
 
-*
+* WebLogo: add property `showPositionLabels`.
+* WebLogo: optimized with `splitterAsFastaSimple`.
+* WebLogo: disable `userEditable` for `fixWidth`.
+* VdRegionsViewer: optimized preventing rebuild on `positionWidth` changed and resize.
+* VdRegionsViewer: to fit WebLogo enclosed on `positionWidth` of value 0.
+* Introduced sequence identity and similarity scoring.
 
-### Bug fixes
+### Bug fixes
 
-* Fix vdRegionsViewer viewer package function name consistency
+* Fix vdRegionsViewer viewer package function name consistency.
+* GROK-13310: Bio | Tools: Fix Split to monomers for multiple runs.
+* GROK-12675: Bio | Tools: Fix the Composition dialog error on the selection column.
+* Allow characters '(', ')', ',', '-', '_' in monomer names for fasta splitter.
+* WebLogo: Fix horizontal alignment to the left while `fixWidth``.
+* WebLogo: Fix layout for `fixWidth`, `fitArea`, and normal modes.
+* VdRegionsViewer: Fix postponed rendering for tests.
+* MacromoleculeDifferenceCellRenderer: Fix to not use `UnitsHandler`.
 
 ## 2.8.2 (2023-08-01)
 
```
```diff
@@ -20,12 +30,12 @@ This release focuses on improving the monomer cell renderer.
 
 ### Features
 
-* Added sample datasets for natural and synthetic peptide sequences
-* Added sample dataset for cyclic sequences with HELM notation
+* Added sample datasets for natural and synthetic peptide sequences.
+* Added sample dataset for cyclic sequences with HELM notation.
 
 ### Bug fixes
 
-* GROK-13659: Bio | Tools: Fix MaxMonomerLength for Macromolecule cell renderer
+* GROK-13659: Bio | Tools: Fix MaxMonomerLength for Macromolecule cell renderer.
 
 ## 2.8.1 (2023-07-24)
 
```
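The 2.9.0 changelog allows the characters '(', ')', ',', '-' and '_' in monomer names for the fasta splitter. The sketch below shows one way such a tokenizer could work. It assumes that multi-character monomers are written in square brackets (as in `A[meI]C`) and that every other character is a single-letter monomer or a `-` gap. The function name `splitFastaMonomers` is hypothetical, and this is not the package's `splitterAsFastaSimple`.

```typescript
// Hypothetical sketch (not the Bio package's splitter): split a fasta-notation
// sequence into monomers, assuming multi-character monomer names are wrapped in
// square brackets, e.g. "A[meI]C". Inside brackets, letters, digits and the
// characters ( ) , - _ are accepted, mirroring the 2.9.0 changelog entry.
function splitFastaMonomers(seq: string): string[] {
  // Alternative 1: a bracketed name (captured without the brackets).
  // Alternative 2: any other single character (a residue or a '-' gap).
  const re = /\[([A-Za-z0-9(),\-_]+)\]|(.)/g;
  const monomers: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(seq)) !== null) {
    monomers.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return monomers;
}

console.log(splitFastaMonomers('A[meI]C-[D-Pro(2)_x,y]'));
```

For the made-up input `A[meI]C-[D-Pro(2)_x,y]` this returns the five monomers `A`, `meI`, `C`, `-` and `D-Pro(2)_x,y`. A bracket containing any other character (for example a space) does not match the first alternative, so the `[` falls through and is emitted as its own one-character token.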
package/README.md
CHANGED
```diff
@@ -1,14 +1,14 @@
 # Bio
 
-Bio is a bioinformatics support [package](https://datagrok.ai/help/develop
-[Datagrok](https://datagrok.ai) platform with an extensive toolset supporting SAR
+Bio is a bioinformatics support [package](https://datagrok.ai/help/develop/#packages) for the
+[Datagrok](https://datagrok.ai) platform with an extensive toolset supporting SAR analysis for small molecules
 and antibodies.
 
 ## Notations
 
 [@datagrok/bio](https://github.com/datagrok-ai/public/tree/master/packages/Bio) can ingest data in multiple file
 formats (such as fasta or csv) and multiple notations for natural and modified residues, aligned and non-aligned forms,
-nucleotide and amino acid sequences. The sequences are automatically detected and classified
+nucleotide and amino acid sequences. The sequences are automatically detected and classified while preserving their
 initial notation. Datagrok allows you to convert sequences between different notations as well.
 
 
```
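The README states that sequences are detected and classified automatically. One rough way to classify an alphabet is to check which known character set covers every residue in a column. The sketch below assumes single-character monomers and treats `-` as a gap. It uses the alphabet labels 'DNA', 'RNA' and 'PT' from the README's WebLogo section; the `'unknown'` label and the function name `detectAlphabet` are hypothetical. The package's own detector also has to handle the other notations the README mentions, so treat this only as an illustration.

```typescript
// Hypothetical sketch: guess the alphabet of a column of single-character
// sequences from the set of characters it uses. The labels 'DNA', 'RNA' and
// 'PT' follow the README; 'unknown' is an invented fallback, and this is not
// the package's actual detector.
type Alphabet = 'DNA' | 'RNA' | 'PT' | 'unknown';

const GAP = '-';
const ALPHABETS: [Alphabet, Set<string>][] = [
  ['DNA', new Set('ACGT'.split(''))],
  ['RNA', new Set('ACGU'.split(''))],
  ['PT', new Set('ACDEFGHIKLMNPQRSTVWY'.split(''))], // 20 natural amino acids
];

function detectAlphabet(seqs: string[]): Alphabet {
  // Collect every non-gap character used in the column (case-insensitive).
  const used = new Set<string>();
  for (const seq of seqs) {
    const upper = seq.toUpperCase();
    for (let i = 0; i < upper.length; i++)
      if (upper[i] !== GAP) used.add(upper[i]);
  }
  if (used.size === 0) return 'unknown';
  const chars = Array.from(used);
  // The first alphabet that covers all characters wins, so a column using only
  // A, C and G is reported as DNA even though it is also valid RNA or protein.
  for (const [name, letters] of ALPHABETS)
    if (chars.every((c) => letters.has(c))) return name;
  return 'unknown';
}
```

The check order matters because short nucleotide alphabets are subsets of the protein alphabet. A real detector would need additional evidence, such as sequence length or the share of ambiguous characters, to resolve such cases.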
```diff
@@ -21,7 +21,7 @@ See:
 ## Atomic-Level structures from sequences
 
 For linear sequences, the linear form (see the illustration below) of molecules is reproduced. This is useful
-for better visual inspection of sequence and duplex comparison. Structure at atomic level could be saved in available
+for better visual inspection of sequence and duplex comparison. Structure at the atomic level could be saved in available
 notations.
 
 
```
```diff
@@ -36,11 +36,10 @@ See:
 
 ## MSA
 
-For multiple-sequence alignment, Datagrok uses the “kalign” that relies on Wu-Manber string-matching algorithm
+For multiple-sequence alignment, Datagrok uses the “kalign” that relies on the Wu-Manber string-matching algorithm
 [Lassmann, Timo. _Kalign 3: multiple sequence alignment of large data sets._ **Bioinformatics** (2019).pdf](https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz795/30314127/btz795.pdf).
-“kalign“ is suited for sequences containing only natural monomers. Sequences of a particular column can be analyzed
-
-at the position of MSA result. User is also able to specify custom gap open, gap extend and terminal gap penalties for alignment.
+“kalign“ is suited for sequences containing only natural monomers. Sequences of a particular column can be analyzed using the MSA algorithm available at the top menu. Aligned sequences can be inspected for base composition
+at the position of the MSA result. User is also able to specify custom gap open, gap extend and terminal gap penalties for alignment.
 
 
```
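The README mentions custom gap open, gap extend and terminal gap penalties. To make their roles concrete, the sketch below scores two rows of a finished alignment with affine gap costs. An internal gap run costs `gapOpen` for its first position and `gapExtend` for each further position. Gaps before a row's first residue or after its last residue cost `terminalGap` per position. The match/mismatch scores, the positive-penalty convention and the function name `scoreAlignedPair` are assumptions. kalign applies its penalties while building the alignment, which this code does not attempt.

```typescript
// Hypothetical sketch: score a finished pairwise alignment (two rows of an MSA)
// to show what the three penalties charge. kalign applies its penalties inside
// the alignment algorithm itself; this function only illustrates their meaning.
interface GapPenalties {
  gapOpen: number;     // charged for the first position of an internal gap run
  gapExtend: number;   // charged for every further position of that run
  terminalGap: number; // charged per gap before the first / after the last residue
}

const GAP = '-';

function leadingGaps(s: string): number {
  let n = 0;
  while (n < s.length && s[n] === GAP) n++;
  return n;
}

function trailingGaps(s: string): number {
  let n = 0;
  while (n < s.length && s[s.length - 1 - n] === GAP) n++;
  return n;
}

function scoreAlignedPair(
  a: string, b: string, p: GapPenalties, match = 1, mismatch = -1,
): number {
  if (a.length !== b.length) throw new Error('aligned rows must have equal length');
  // Residue span [start, end) of each row; gaps outside it are terminal gaps.
  const spanA = [leadingGaps(a), a.length - trailingGaps(a)];
  const spanB = [leadingGaps(b), b.length - trailingGaps(b)];
  const gapCost = (i: number, span: number[], continuing: boolean): number =>
    (i < span[0] || i >= span[1]) ? p.terminalGap : (continuing ? p.gapExtend : p.gapOpen);

  let score = 0;
  let prevGapA = false;
  let prevGapB = false;
  for (let i = 0; i < a.length; i++) {
    const gapA = a[i] === GAP;
    const gapB = b[i] === GAP;
    if (gapA && gapB) continue; // gap-only column: ignored, gap runs carry over
    if (gapA) score -= gapCost(i, spanA, prevGapA);
    else if (gapB) score -= gapCost(i, spanB, prevGapB);
    else score += a[i] === b[i] ? match : mismatch;
    prevGapA = gapA;
    prevGapB = gapB;
  }
  return score;
}
```

With `{gapOpen: 3, gapExtend: 1, terminalGap: 0.5}`, the rows `AC-GT` / `ACTGT` score 1 (four matches minus one gap opening). The rows `--ACGT` / `TTACGT` score 3 (four matches minus two terminal gaps). A low terminal penalty like this stops overhanging ends from dominating the score.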
```diff
@@ -71,11 +70,11 @@ The most helpful feature for exploration analysis with WebLogo in Datagrok is it
 on a dataset. Mouse click on a particular residue in a specific position will select rows of the dataset
 with sequences containing that residue at that position.
 
-You must specify the tag
+You must specify the tag `semType` with the value `Macromolecule` and tag `alphabet` of choice ('PT', 'DNA', 'RNA')
 for the data column with multiple alignment sequences, it is mandatory to select the palette for monomers' colors.
 
 You can customize the look of the viewer with properties. Properties ```startPosition``` and ```endPosition```)
-allow to display multiple
+allow to display multiple alignments partially. If property ```startPosition``` (```endPosition```)
 is not specified, then the Logo will be plotted from the first (till the last) position of sequences.
 
 ### General
```
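In a sequence logo, the monomers stacked at each position are sized by how often they occur there. The sketch below computes those per-position frequencies for sequences that are already split and aligned. It supports optional `startPosition` and `endPosition` bounds that default to the first and last positions, as the README describes. The 0-based inclusive positions, the skipping of `-` gaps and the function name `positionFrequencies` are assumptions. The viewer's actual height scaling is not reproduced.

```typescript
// Hypothetical sketch: per-position monomer frequencies that a sequence logo
// draws, restricted to [startPosition, endPosition] (0-based and inclusive
// here; that convention is an assumption). Gaps are skipped, also an assumption.
const GAP = '-';

function positionFrequencies(
  seqs: string[][], startPosition?: number, endPosition?: number,
): Map<string, number>[] {
  const length = seqs.reduce((max, s) => Math.max(max, s.length), 0);
  // Unspecified bounds default to the first and last positions, as in the README.
  const start = startPosition ?? 0;
  const end = endPosition ?? length - 1;
  const result: Map<string, number>[] = [];
  for (let pos = start; pos <= end; pos++) {
    const counts = new Map<string, number>();
    let total = 0;
    for (const seq of seqs) {
      const monomer = seq[pos];
      if (monomer === undefined || monomer === GAP) continue;
      counts.set(monomer, (counts.get(monomer) ?? 0) + 1);
      total++;
    }
    // Convert counts to fractions of the non-gap monomers at this position.
    const freqs = new Map<string, number>();
    counts.forEach((count, monomer) => freqs.set(monomer, count / total));
    result.push(freqs);
  }
  return result;
}
```

Working on pre-split monomer arrays keeps the function independent of notation. Multi-character monomers count as single symbols, which is what a logo needs.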
```diff
@@ -110,8 +109,8 @@ is not specified, then the Logo will be plotted from the first (till the last) p
 See also:
 
 * [WebLogo](../../libraries/)
-* [Viewers](../../help/visualize/viewers.md)
-* [Table view](../../help/datagrok/table
+* [Viewers](../../help/visualize/viewers/viewers.md)
+* [Table view](../../help/datagrok/concepts/table.md)
 
 ## Sequence space
 
```
```diff
@@ -119,7 +118,7 @@ Datagrok allows visualizing multidimensional sequence space using a dimensionali
 Several distance-based dimensionality reduction algorithms are available, such as UMAP or t-SNE.
 The sequences are projected to 2D space closer if they correspond to similar structures, and farther
 otherwise. The tool for analyzing molecule collections is called 'Sequence space' and exists in
-the Bio package. Depending on the sequence type, different distance functions will be used, like [Levenstein](https://en.wikipedia.org/wiki/Levenshtein_distance) for DNA/RNA, [Needleman-Wunsch](https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm) for Proteins and [Hamming](https://en.wikipedia.org/wiki/Hamming_distance) for already aligned sequences. The process is conducted
+the Bio package. Depending on the sequence type, different distance functions will be used, like [Levenstein](https://en.wikipedia.org/wiki/Levenshtein_distance) for DNA/RNA, [Needleman-Wunsch](https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm) for Proteins and [Hamming](https://en.wikipedia.org/wiki/Hamming_distance) for already aligned sequences. The process is conducted by web-workers and is parallelized, which yields very fast and non-interrupting computing.
 
 To launch the analysis from the top menu, select Bio | Structure | Sequence space.
 
```
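The README lists Levenshtein distance for DNA/RNA and Hamming distance for already aligned sequences. Minimal reference implementations of both are sketched below. They assume single-character monomers and return unnormalized integer distances. The package's own distance functions, including the Needleman-Wunsch variant for proteins (not shown), may normalize or weight differently.

```typescript
// Sketch: Hamming distance counts mismatching positions and is only defined
// for sequences of equal length (e.g. rows of an alignment).
function hamming(a: string, b: string): number {
  if (a.length !== b.length) throw new Error('Hamming distance needs equal-length (aligned) sequences');
  let d = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
  return d;
}

// Sketch: Levenshtein distance (minimum number of single-character insertions,
// deletions and substitutions), using two rows of the dynamic-programming table.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j); // distance from '' to b[0..j)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // distance from a[0..i) to ''
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion from a
        curr[j - 1] + 1,    // insertion into a
        prev[j - 1] + cost, // substitution or match
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}
```

`levenshtein('kitten', 'sitting')` is 3 and `hamming('ACGT', 'ACGA')` is 1. Keeping only two table rows means memory grows with the length of the second sequence, not with the product of both lengths.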
```diff
@@ -127,15 +126,15 @@ To launch the analysis from the top menu, select Bio | Structure | Sequence spac
 
 See:
 
-* [sequenceSpace()](src/
+* [sequenceSpace()](src/analysis/sequence-space.ts)
 
 ## Sequence activity cliffs
 
 Activity cliffs tool finds pairs of sequences where small changes in the sequence yield significant
-changes in activity or any other numerical property.
-Similarity cutoff and similarity
+changes in activity or any other numerical property. Open the tool from the top menu by selecting.
+Similarity cutoff and similarity metrics are configurable. As in Sequence space, you can select
 from different dimensionality reduction algorithms.
-A custom scatter plot with cliffs will be added to the right side of the grid.
+A custom scatter plot with cliffs will be added to the right side of the grid. The user has the option to show only cliffs and also to inspect them and highlight differences between similar sequences.
 
 To launch the analysis from the top menu, select Bio | SAR | Sequence Activity Cliffs.
 
```
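The activity cliffs section describes pairs of similar sequences with very different activities, filtered by a similarity cutoff. One common way to rank such pairs is the Structure-Activity Landscape Index, SALI = |activity_i - activity_j| / (1 - similarity_ij). The sketch below uses that index together with a simple positional identity as the similarity measure. Both choices, and the names `findActivityCliffs` and `positionalSimilarity`, are assumptions for illustration and are not necessarily what the Bio package computes.

```typescript
// Hypothetical sketch: rank pairs of sequences whose similarity passes a cutoff
// by SALI = |activity_i - activity_j| / (1 - similarity). Similarity here is the
// fraction of identical characters at the same positions (an assumption).
interface Cliff { i: number; j: number; similarity: number; sali: number; }

function positionalSimilarity(a: string, b: string): number {
  const n = Math.max(a.length, b.length);
  if (n === 0) return 1;
  let same = 0;
  for (let k = 0; k < Math.min(a.length, b.length); k++) if (a[k] === b[k]) same++;
  return same / n;
}

function findActivityCliffs(seqs: string[], activity: number[], cutoff: number): Cliff[] {
  const cliffs: Cliff[] = [];
  for (let i = 0; i < seqs.length; i++) {
    for (let j = i + 1; j < seqs.length; j++) {
      const similarity = positionalSimilarity(seqs[i], seqs[j]);
      // Skip dissimilar pairs, and identical ones, where SALI would divide by zero.
      if (similarity < cutoff || similarity >= 1) continue;
      const sali = Math.abs(activity[i] - activity[j]) / (1 - similarity);
      cliffs.push({i, j, similarity, sali});
    }
  }
  // Steepest cliffs first.
  return cliffs.sort((x, y) => y.sali - x.sali);
}
```

The function compares all pairs, so its cost grows quadratically with the number of sequences. Large datasets usually call for a precomputed or approximate neighbour search instead.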
```diff
@@ -147,16 +146,36 @@ See:
 
 ## Similarity Search
 
-Similarity Search tool allows
+Similarity Search tool allows users to find sequences that are most similar to the target sequence. The tool can be accessed from the top menu of bio. It first constructs the distance matrix for all sequences and then uses it to find the most similar ones to the selection. Upon selecting similar sequences from the docked grid below, detailed differences will be shown in the context panel.
 
-To launch the
+To launch the search from the top menu, select Bio | Search | Similarity Search
 
 
 
 ## Diversity Search
 
-Diversity Search tool allows
+Diversity Search tool allows users to find sequences that are most diverse in the given dataset. The tool can be accessed from the top menu of bio. By default, the number of diverse sequences will be 10.
 
 To launch the search from the top menu, select Bio | Search | Diversity Search
 
 
+
+## Sequence scoring
+
+Sequence scoring allows users to calculate sequence identity and similarity scores, given the reference sequence and add the results as a column. Sequence scoring functionality can be found in the top menu: Bio → Calculate.
+
+### Identity
+
+The identity score represents a fraction of the identical monomers in corresponding positions.
+
+Identity scoring can be found in the top menu: **Bio → Calculate → Identity...**.
+
+
+
+### Similarity
+
+The similarity score represents the sum of fingerprint similarity of monomers in corresponding positions.
+
+Similarity scoring can be found in the top menu: **Bio → Calculate → Similarity...**.
+
+
```
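The new Sequence scoring section defines identity as the fraction of identical monomers in corresponding positions, and similarity as the sum of fingerprint similarities of monomers in corresponding positions. The sketch below follows those two definitions under stated assumptions. Sequences are already split into monomers, and identity is divided by the reference length. Each monomer's fingerprint is a hypothetical set of feature indices, and fingerprints are compared with the Tanimoto coefficient. The README does not say how the package builds monomer fingerprints, and the function names are illustrative only.

```typescript
// Hypothetical sketch of the two scores described in the README.
type Fingerprint = Set<number>; // assumed: set of "on" feature indices per monomer

// Identity: fraction of reference positions where the monomers are identical
// (dividing by the reference length is an assumption).
function identityScore(reference: string[], seq: string[]): number {
  if (reference.length === 0) return 0;
  let same = 0;
  for (let i = 0; i < reference.length; i++) if (seq[i] === reference[i]) same++;
  return same / reference.length;
}

// Tanimoto coefficient |A ∩ B| / |A ∪ B| between two fingerprints.
function tanimoto(a: Fingerprint, b: Fingerprint): number {
  let common = 0;
  a.forEach((bit) => { if (b.has(bit)) common++; });
  const union = a.size + b.size - common;
  return union === 0 ? 0 : common / union; // two empty fingerprints score 0 here
}

// Similarity: sum over corresponding positions of monomer fingerprint similarity.
// Monomers without a fingerprint (e.g. gaps) contribute nothing.
function similarityScore(
  reference: string[], seq: string[], fps: Map<string, Fingerprint>,
): number {
  let sum = 0;
  for (let i = 0; i < Math.min(reference.length, seq.length); i++) {
    const fa = fps.get(reference[i]);
    const fb = fps.get(seq[i]);
    if (fa === undefined || fb === undefined) continue;
    sum += tanimoto(fa, fb);
  }
  return sum;
}
```

For the reference `A C G T` and the sequence `A C G A`, identity is 0.75. If T and A share two of their four distinct fingerprint bits, similarity is 3.5: each identical monomer contributes 1 and the T/A pair contributes 0.5. Because similarity is a sum, it grows with sequence length unless it is normalized afterwards.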