@datagrok/bio 1.11.2 → 1.11.3
package/README.md
CHANGED
|
@@ -1,6 +1,66 @@
|
|
|
1
1
|
# Bio
|
|
2
2
|
|
|
3
|
-
Bio is a [package](https://datagrok.ai/help/develop/develop#packages) for the
|
|
3
|
+
Bio is a bioinformatics support [package](https://datagrok.ai/help/develop/develop#packages) for the
|
|
4
|
+
[Datagrok](https://datagrok.ai) platform with an extensive toolset supporting SAR analysis for small molecules
|
|
5
|
+
and antibodies.
|
|
6
|
+
|
|
7
|
+
# Notations
|
|
8
|
+
|
|
9
|
+
[@datagrok/bio](https://github.com/datagrok-ai/public/tree/master/packages/Bio) can ingest data in multiple file
|
|
10
|
+
formats (such as FASTA or CSV) and multiple notations for natural and modified residues, aligned and non-aligned forms,
|
|
11
|
+
nucleotide and amino acid sequences. The sequences are automatically detected and classified, while preserving their
|
|
12
|
+
initial notation. Datagrok allows you to convert sequences between different notations as well.
|
|
13
|
+
|
|
14
|
+

|
|
15
|
+
|
|
16
|
+
See:
|
|
17
|
+
|
|
18
|
+
* [detectMacromolecule()](../Bio/detectors.js)
|
|
19
|
+
* [class NotationConverter](../../libraries/bio/src/utils/notation-converter.ts)
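The automatic classification described above can be illustrated with a minimal sketch. It is not the logic of `detectMacromolecule()`: `guessAlphabet` and the `Alphabet` labels are hypothetical names, and the sketch assumes single-character monomers with `-` as the gap symbol.

```typescript
// Hypothetical labels for this sketch; 'PT' stands for protein, 'UN' for unknown.
type Alphabet = 'DNA' | 'RNA' | 'PT' | 'UN';

// Candidate alphabets, checked from the most specific to the most general.
const CANDIDATES: { name: Alphabet; symbols: string }[] = [
  { name: 'DNA', symbols: 'ACGT' },
  { name: 'RNA', symbols: 'ACGU' },
  { name: 'PT', symbols: 'ACDEFGHIKLMNPQRSTVWY' },
];

// Hypothetical helper, not part of @datagrok/bio.
function guessAlphabet(sequences: string[], gap: string = '-'): Alphabet {
  // Collect the distinct non-gap symbols used across all sequences.
  const seen: Record<string, boolean> = {};
  for (const seq of sequences) {
    for (const ch of seq.toUpperCase().split('')) {
      if (ch !== gap) seen[ch] = true;
    }
  }
  const symbols = Object.keys(seen);
  if (symbols.length === 0) return 'UN';
  // Return the first candidate alphabet that covers every observed symbol.
  for (const candidate of CANDIDATES) {
    if (symbols.every((ch) => candidate.symbols.indexOf(ch) >= 0)) return candidate.name;
  }
  return 'UN';
}
```

Because the candidates are checked in order, a sequence built only from `A`, `C` and `G` is reported as DNA even though it is also valid RNA or protein text; a real detector would need more evidence than the symbol set alone.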
|
|
20
|
+
|
|
21
|
+
# Atomic-Level structures from sequences
|
|
22
|
+
|
|
23
|
+
For linear sequences, the linear form (see the illustration below) of molecules is reproduced. This is useful
|
|
24
|
+
for better visual inspection of sequences and duplex comparison. The atomic-level structure can be saved in the available
|
|
25
|
+
notations.
|
|
26
|
+
|
|
27
|
+

|
|
28
|
+
|
|
29
|
+
You can easily run this feature for any sequence data using the Bio package and accessing it from the top menu.
|
|
30
|
+
|
|
31
|
+

|
|
32
|
+
|
|
33
|
+
See:
|
|
34
|
+
|
|
35
|
+
* [getMolfilesFromSeq()](./src/utils/atomic-works.ts)
|
|
36
|
+
|
|
37
|
+
# MSA
|
|
38
|
+
|
|
39
|
+
For multiple-sequence alignment, Datagrok uses “kalign”, which relies on the Wu-Manber string-matching algorithm
|
|
40
|
+
[Lassmann, Timo. _Kalign 3: multiple sequence alignment of large data sets._ **Bioinformatics** (2019).[pdf](
|
|
41
|
+
https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz795/30314127/btz795.pdf)].
|
|
42
|
+
“kalign” is suited for sequences containing only natural monomers. Sequences of a particular column can be analyzed
|
|
43
|
+
using the MSA algorithm available from the top menu. Aligned sequences can be inspected for base composition
|
|
44
|
+
at each position of the MSA result.
|
|
45
|
+
|
|
46
|
+

|
|
48
|
+
|
|
49
|
+
See:
|
|
50
|
+
|
|
51
|
+
* [runKalign()](src/utils/multiple-sequence-alignment.ts)
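Inspecting base composition per position of an MSA result can be sketched as a simple column count. This is an illustration only and does not call `runKalign()`: `positionComposition` is a hypothetical helper that assumes single-character monomers and equal-length aligned rows.

```typescript
// Hypothetical helper: counts how often each symbol (including the gap)
// occurs in every column of an alignment.
function positionComposition(aligned: string[]): Record<string, number>[] {
  const length = aligned.length > 0 ? aligned[0].length : 0;
  // An alignment is rectangular, so every row must have the same length.
  for (const seq of aligned) {
    if (seq.length !== length) throw new Error('Aligned sequences must have equal length');
  }
  const columns: Record<string, number>[] = [];
  for (let pos = 0; pos < length; pos++) {
    const counts: Record<string, number> = {};
    for (const seq of aligned) {
      const residue = seq.charAt(pos);
      counts[residue] = (counts[residue] || 0) + 1;
    }
    columns.push(counts);
  }
  return columns;
}

const composition = positionComposition(['AC-T', 'ACGT', 'TC-T']);
console.log(composition[0]); // counts for the first column: A twice, T once
```

Counts of this kind are the raw input for a sequence logo, where the share of each symbol at a position determines the height of its letter.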
|
|
52
|
+
|
|
53
|
+
TODO: MSA with PepSeA
|
|
54
|
+
|
|
55
|
+
# Splitting to monomers
|
|
56
|
+
|
|
57
|
+
Splitting to monomers allows splitting aligned sequences into separate monomers.
|
|
58
|
+
|
|
59
|
+

|
|
60
|
+
|
|
61
|
+
See:
|
|
62
|
+
|
|
63
|
+
* [splitAlignedSequences()](../../libraries/bio/src/utils/splitter.ts)
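As a rough sketch of the idea (not the implementation of `splitAlignedSequences()`), splitting can be seen as turning each alignment position into its own column of monomers. The helper names and the separator handling below are assumptions made for illustration.

```typescript
// Hypothetical helper: splits one sequence into monomers.
// An empty separator means single-character monomers (FASTA-like text).
function splitToMonomers(seq: string, separator: string = ''): string[] {
  return seq.split(separator);
}

// Hypothetical helper: builds one column of monomers per alignment position.
function splitAligned(sequences: string[], separator: string = ''): string[][] {
  const rows = sequences.map((seq) => splitToMonomers(seq, separator));
  const width = rows.reduce((max, row) => Math.max(max, row.length), 0);
  const columns: string[][] = [];
  for (let pos = 0; pos < width; pos++) {
    // Rows shorter than the widest one are padded with an empty monomer.
    columns.push(rows.map((row) => (pos < row.length ? row[pos] : '')));
  }
  return columns;
}

// Single-character monomers: one column per character.
console.log(splitAligned(['AC-T', 'AGGT']));
// Multi-character monomers joined by an assumed '/' separator.
console.log(splitAligned(['meI/hHis/Aca', 'meI/Aca'], '/'));
```

Returning columns rather than rows mirrors the way a table would gain one new column per position after the split.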
|
|
4
64
|
|
|
5
65
|
# Web Logo
|
|
6
66
|
|
|
@@ -28,19 +88,60 @@ is not specified, then the Logo will be plotted from the first (till the last) p
|
|
|
28
88
|
|
|
29
89
|
## Properties
|
|
30
90
|
|
|
31
|
-
| Property name | Default
|
|
32
|
-
|
|
33
|
-
| positionWidth | 16
|
|
34
|
-
| minHeight | 50
|
|
35
|
-
| maxHeight | 100
|
|
36
|
-
| considerNullSequence | false
|
|
37
|
-
| sequenceColumnName | null
|
|
38
|
-
| startPositionName | null
|
|
39
|
-
| endPositionName | null
|
|
91
|
+
| Property name | Default | Description |
|
|
92
|
+
|----------------------|----------|-------------------------------------------------------------------------------------------------------------------------|
|
|
93
|
+
| positionWidth | 16 | Width of one position stack [px] |
|
|
94
|
+
| minHeight | 50 | Minimum height of Logo [px] |
|
|
95
|
+
| maxHeight | 100 | Maximum height of Logo [px] |
|
|
96
|
+
| considerNullSequence | false | Whether the logo should consider null sequences in the data |
|
|
97
|
+
| sequenceColumnName | null | source of multiple alignment sequences (column name) |
|
|
98
|
+
| startPositionName | null | name of the first position to display Logo partially |
|
|
99
|
+
| endPositionName | null | name of the last position to display Logo partially |
|
|
100
|
+
| fixWidth | false | Plot takes full width required for sequence length |
|
|
101
|
+
| verticalAlignment | 'middle' | choices: ['top', 'middle', 'bottom'] |
|
|
102
|
+
| horizontalAlignment | 'center' | choices: ['left', 'center', 'right'] |
|
|
103
|
+
| fitArea | true | Whether the control should be scaled to fit the area available for the viewer |
|
|
104
|
+
| shrinkEmptyTail | true | Shrink the empty tails of sequences among the filtered sequences |
|
|
105
|
+
| skipEmptyPositions | false | Skip positions containing only gap symbols in all sequences |
|
|
106
|
+
| positionMarginState | 'auto' | choices: ['auto', 'enable', 'off'] Margin between positions. auto - enables margins for sequences of multichar monomers |
|
|
107
|
+
| positionMargin | 0 or 4 | 4 - for sequences of multichar monomers, 0 - single char |
|
|
108
|
+
| positionHeight | '100%' | choices: ['100%', 'Entropy'] The way to calculate overall monomers stack height at position |
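The interplay of `positionMarginState` and `positionMargin` in the table can be sketched as follows. The function is a hypothetical illustration of the documented behavior, not the viewer's code, and it assumes that 'enable' always applies the margin while 'off' always removes it.

```typescript
// Choices documented for positionMarginState.
type PositionMarginState = 'auto' | 'enable' | 'off';

// Hypothetical helper: resolves the effective margin between positions [px].
// The default of 4 follows the table's value for multichar monomers.
function resolvePositionMargin(
  state: PositionMarginState,
  monomers: string[],
  margin: number = 4,
): number {
  // A sequence counts as multichar if any monomer is longer than one symbol.
  const multichar = monomers.some((m) => m.length > 1);
  switch (state) {
    case 'enable':
      return margin; // assumption: margin is applied regardless of monomer length
    case 'off':
      return 0; // assumption: margin is never applied
    default:
      return multichar ? margin : 0; // 'auto'
  }
}

console.log(resolvePositionMargin('auto', ['A', 'C', 'G'])); // single-char monomers
console.log(resolvePositionMargin('auto', ['meI', 'hHis'])); // multichar monomers
```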
|
|
40
109
|
|
|
41
|
-

|
|
42
111
|
|
|
43
112
|
See also:
|
|
44
113
|
|
|
45
|
-
* [
|
|
46
|
-
* [
|
|
114
|
+
* [WebLogo](../../libraries/)
|
|
115
|
+
* [Viewers](../../help/visualize/viewers.md)
|
|
116
|
+
* [Table view](../../help/datagrok/table-view.md)
|
|
117
|
+
|
|
118
|
+
# Sequence space
|
|
119
|
+
|
|
120
|
+
Datagrok allows visualizing multidimensional sequence space using a dimensionality reduction approach.
|
|
121
|
+
Several distance-based dimensionality reduction algorithms are available, such as UMAP or t-SNE.
|
|
122
|
+
Sequences are projected into 2D space closer together if they correspond to similar structures, and farther apart
|
|
123
|
+
otherwise. The tool for analyzing molecule collections is called 'Sequence space' and is available in
|
|
124
|
+
the Bio package.
|
|
125
|
+
|
|
126
|
+
To launch the analysis from the top menu, select Bio | Sequence space.
|
|
127
|
+
|
|
128
|
+

|
|
129
|
+
|
|
130
|
+
See:
|
|
131
|
+
|
|
132
|
+
* [sequenceSpace()](src/utils/sequence-space.ts)
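Distance-based methods such as UMAP or t-SNE start from pairwise distances between items. The sketch below builds such a matrix for sequences; the choice of Levenshtein (edit) distance is an assumption made for illustration, since the metric used by `sequenceSpace()` is not stated here, and both helper names are hypothetical.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the processed prefix of a and the first j symbols of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: symmetric matrix of pairwise distances.
function distanceMatrix(sequences: string[]): number[][] {
  return sequences.map((a) => sequences.map((b) => levenshtein(a, b)));
}

console.log(distanceMatrix(['ACGT', 'ACGA', 'TTTT']));
```

A dimensionality reduction step would then place the first two sequences close together and the third far from both, which is the behavior described above. The full matrix grows quadratically with the number of sequences, so this form suits small collections only.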
|
|
133
|
+
|
|
134
|
+
# Sequence activity cliffs
|
|
135
|
+
|
|
136
|
+
The Activity cliffs tool finds pairs of sequences where small changes in the sequence yield significant
|
|
137
|
+
changes in activity or any other numerical property.
|
|
138
|
+
Similarity cutoff and similarity metric are configurable. As in Sequence space, you can select
|
|
139
|
+
from different dimensionality reduction algorithms.
|
|
140
|
+
|
|
141
|
+
To launch the analysis from the top menu, select Bio | Sequence Activity Cliffs.
|
|
142
|
+
|
|
143
|
+

|
|
144
|
+
|
|
145
|
+
See:
|
|
146
|
+
|
|
147
|
+
* [getActivityCliffs()](../../libraries/ml/src/viewers/activity-cliffs.ts)
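The notion of an activity cliff can be made concrete with a small sketch: report every pair of sequences whose similarity is at or above a cutoff while their activities differ strongly. This is not the code of `getActivityCliffs()`. The identity-based similarity, the `minDelta` threshold and all names below are assumptions made for illustration.

```typescript
interface Cliff {
  i: number; // index of the first sequence
  j: number; // index of the second sequence
  similarity: number; // fraction of identical positions
  activityDelta: number; // absolute difference in activity
}

// Fraction of identical positions; assumes aligned sequences of equal length.
function identity(a: string, b: string): number {
  if (a.length !== b.length) throw new Error('Sequences must have equal length');
  if (a.length === 0) return 1;
  let same = 0;
  for (let k = 0; k < a.length; k++) {
    if (a.charAt(k) === b.charAt(k)) same++;
  }
  return same / a.length;
}

// Hypothetical helper: scans all pairs and keeps similar ones with a large activity gap.
function findCliffs(
  seqs: string[],
  activity: number[],
  similarityCutoff: number,
  minDelta: number,
): Cliff[] {
  const cliffs: Cliff[] = [];
  for (let i = 0; i < seqs.length; i++) {
    for (let j = i + 1; j < seqs.length; j++) {
      const similarity = identity(seqs[i], seqs[j]);
      const activityDelta = Math.abs(activity[i] - activity[j]);
      if (similarity >= similarityCutoff && activityDelta >= minDelta) {
        cliffs.push({ i, j, similarity, activityDelta });
      }
    }
  }
  return cliffs;
}

const cliffs = findCliffs(
  ['ACDEFGHIKL', 'ACDEFGHIKV', 'WWWWWWWWWW'],
  [1.0, 9.0, 1.2],
  0.8, // similarity cutoff
  5, // minimum activity difference
);
console.log(cliffs); // only the first two sequences form a cliff
```

In the example the first two sequences differ in a single position but their activities differ by 8, so they are reported; the third sequence is dissimilar to both and is ignored.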
|