bioroebe 0.10.80 → 0.11.12


Potentially problematic release.


This version of bioroebe might be problematic.

Files changed (67)
  1. checksums.yaml +4 -4
  2. data/README.md +507 -310
  3. data/bioroebe.gemspec +3 -3
  4. data/doc/README.gen +506 -309
  5. data/doc/todo/bioroebe_todo.md +29 -40
  6. data/lib/bioroebe/aminoacids/display_aminoacid_table.rb +1 -0
  7. data/lib/bioroebe/base/colours_for_base/colours_for_base.rb +18 -8
  8. data/lib/bioroebe/base/commandline_application/commandline_arguments.rb +13 -11
  9. data/lib/bioroebe/base/commandline_application/misc.rb +18 -8
  10. data/lib/bioroebe/base/prototype/misc.rb +1 -1
  11. data/lib/bioroebe/codons/show_codon_tables.rb +6 -2
  12. data/lib/bioroebe/constants/aminoacids_and_proteins.rb +1 -0
  13. data/lib/bioroebe/constants/files_and_directories.rb +8 -1
  14. data/lib/bioroebe/count/count_amount_of_nucleotides.rb +3 -0
  15. data/lib/bioroebe/gui/gtk3/protein_to_DNA/protein_to_DNA.rb +18 -18
  16. data/lib/bioroebe/gui/shared_code/protein_to_DNA/protein_to_DNA_module.rb +14 -14
  17. data/lib/bioroebe/parsers/genbank_parser.rb +353 -24
  18. data/lib/bioroebe/python/README.md +1 -0
  19. data/lib/bioroebe/python/__pycache__/mymodule.cpython-39.pyc +0 -0
  20. data/lib/bioroebe/python/gui/gtk3/widget1.py +22 -0
  21. data/lib/bioroebe/python/mymodule.py +8 -0
  22. data/lib/bioroebe/python/protein_to_dna.py +30 -0
  23. data/lib/bioroebe/python/shell/shell.py +19 -0
  24. data/lib/bioroebe/python/to_rna.py +14 -0
  25. data/lib/bioroebe/python/toplevel_methods/to_camelcase.py +11 -0
  26. data/lib/bioroebe/sequence/nucleotide_module/nucleotide_module.rb +28 -25
  27. data/lib/bioroebe/sequence/sequence.rb +54 -2
  28. data/lib/bioroebe/shell/menu.rb +3336 -3304
  29. data/lib/bioroebe/shell/readline/readline.rb +1 -1
  30. data/lib/bioroebe/shell/shell.rb +11233 -28
  31. data/lib/bioroebe/siRNA/siRNA.rb +81 -1
  32. data/lib/bioroebe/string_matching/find_longest_substring.rb +3 -2
  33. data/lib/bioroebe/toplevel_methods/aminoacids_and_proteins.rb +31 -24
  34. data/lib/bioroebe/toplevel_methods/nucleotides.rb +22 -5
  35. data/lib/bioroebe/toplevel_methods/open_in_browser.rb +2 -0
  36. data/lib/bioroebe/toplevel_methods/to_camelcase.rb +5 -0
  37. data/lib/bioroebe/version/version.rb +2 -2
  38. data/lib/bioroebe/yaml/configuration/browser.yml +1 -1
  39. data/lib/bioroebe/yaml/restriction_enzymes/restriction_enzymes.yml +3 -3
  40. metadata +17 -36
  41. data/doc/setup.rb +0 -1655
  42. data/lib/bioroebe/genbank/genbank_parser.rb +0 -291
  43. data/lib/bioroebe/shell/add.rb +0 -108
  44. data/lib/bioroebe/shell/assign.rb +0 -360
  45. data/lib/bioroebe/shell/chop_and_cut.rb +0 -281
  46. data/lib/bioroebe/shell/constants.rb +0 -166
  47. data/lib/bioroebe/shell/download.rb +0 -335
  48. data/lib/bioroebe/shell/enable_and_disable.rb +0 -158
  49. data/lib/bioroebe/shell/enzymes.rb +0 -310
  50. data/lib/bioroebe/shell/fasta.rb +0 -345
  51. data/lib/bioroebe/shell/gtk.rb +0 -76
  52. data/lib/bioroebe/shell/history.rb +0 -132
  53. data/lib/bioroebe/shell/initialize.rb +0 -217
  54. data/lib/bioroebe/shell/loop.rb +0 -74
  55. data/lib/bioroebe/shell/misc.rb +0 -4341
  56. data/lib/bioroebe/shell/prompt.rb +0 -107
  57. data/lib/bioroebe/shell/random.rb +0 -289
  58. data/lib/bioroebe/shell/reset.rb +0 -335
  59. data/lib/bioroebe/shell/scan_and_parse.rb +0 -135
  60. data/lib/bioroebe/shell/search.rb +0 -337
  61. data/lib/bioroebe/shell/sequences.rb +0 -200
  62. data/lib/bioroebe/shell/show_report_and_display.rb +0 -2901
  63. data/lib/bioroebe/shell/startup.rb +0 -127
  64. data/lib/bioroebe/shell/taxonomy.rb +0 -14
  65. data/lib/bioroebe/shell/tk.rb +0 -23
  66. data/lib/bioroebe/shell/user_input.rb +0 -88
  67. data/lib/bioroebe/shell/xorg.rb +0 -45
data/doc/README.gen CHANGED
@@ -5,7 +5,7 @@ ADD_TIME_STAMP
5
5
 
6
6
  ## Bioroebe
7
7
 
8
- <img src="http://shevy.bplaced.net/BIOROEBE.png">
8
+ <img src="https://i.imgur.com/mAoP7AP.png">
9
9
  <img src="https://i.imgur.com/YqYxRBZ.png" style="margin: 4px; margin-left: 12px;"/>
10
10
  <img src="https://i.imgur.com/k7mMlg2.png" style="margin: 4px; margin-left: 12px;"/>
11
11
 
@@ -332,41 +332,6 @@ so I opted to go the yaml route. But if people want to use a hash
332
332
  instead, they can do so, too - see the <b>API</b> for codon tables
333
333
  lateron. Simply define your own constants and pass them to the
334
334
  appropriate methods.
335
-
336
- ## Support for other programming languages
337
-
338
- The main programming language for the bioroebe project is **ruby**.
339
- Ruby, from a language design point of view, is a great programming
340
- language - not necessarily all of ruby, but the subset that I use.
341
- It is very easy to quickly prototype ideas via ruby.
342
-
343
- However had, ruby is known to **not** be among the fastest programming
344
- languages about on this planet; so, it makes sense to use other
345
- languages too from this point of view. Additionally there are some
346
- software stacks in use in **other** programming languages, such as
347
- matplotlib and various more.
348
-
349
- Thus, it is important to **support other programming languages** as
350
- well, if there are useful libraries. The bioroebe project, after
351
- all, tries to be **practical**: it focuses on getting things done,
352
- no matter the language.
353
-
354
- This means that support for other programming languages can be
355
- found in this project as well, often using system() or similar
356
- functionality to tap into these other programming languages. Do
357
- not be surprised when that happens - the bioroebe project will
358
- also try to act as a **practical glue** towards functionality
359
- enabled via other projects. We want to get things done, no
360
- matter the programming language at hand!
361
-
362
- Whenever possible, though, the bioroebe project will try to be
363
- flexible in this regard, so ideally the same solution should
364
- work for many different programming languages.
365
-
366
- While Ruby is the primary language for this project, since as
367
- of 2021 I will try to officially support **java**, **jruby**
368
- and the **GraalVM**. This is on my TODO list, though - stay
369
- tuned for more updates in this regard.
370
335
 
371
336
  ## Readline support in the BioRoebe project
372
337
 
@@ -550,16 +515,16 @@ the DNA-to-Protein translation is somewhat simply kept as a
550
515
  Once you are inside a **running Bioshell**, you can do other **commands**
551
516
  such as this one here:
552
517
 
553
- random # ← This will generate a random DNA sequence.
518
+ random # ← This will generate a random DNA sequence. Each nucleotide has the same chance to be added.
554
519
 
555
520
  To **assign** a DNA sequence, do:
556
521
 
557
522
  assign ATAGGGCTTTT
558
523
 
559
- Note that since the year 2016, if you input a nucleotide sequence like
560
- the one above, without any other commands/words, then we will assume
524
+ Note that since the year <b>2016</b>, if you input a nucleotide sequence
525
+ like the one above, without any other commands/words, then we will assume
561
526
  that you did mean to do an assignment as-is anyway. The "assign" part
562
- then becomes superfluous.
527
+ then becomes superfluous and can be omitted.
563
528
 
564
529
  This is how this is simply done, by omitting the "assign" part of the
565
530
  above instruction altogether:
@@ -1070,18 +1035,18 @@ The text **banana** thus has the following suffixes:
1070
1035
 
1071
1036
  This subsection deals with some aspects of **HMMs**.
1072
1037
 
1073
- Why are HMMs useful in biology? They can be used to represent protein
1074
- families, for example (via pHMMs - profile hidden markov models).
1038
+ Why are HMMs useful in biology? They can be used to <b>represent protein
1039
+ families</b>, for example (via <b>pHMMs</b> - profile hidden Markov models).
1075
1040
 
1076
1041
  Furthermore, they can show some bias in the mutation rate that can be
1077
1042
  observed. Different genomes are known to have different hotspots where
1078
- mutations are more likely to happen. These are examples where a HMM
1079
- may be useful.
1043
+ mutations are more likely to happen, for various reasons. These are
1044
+ examples where an HMM may be useful.
1080
1045
 
1081
- HMMs are usually based on the Shannon model where you assign different
1046
+ HMMs are usually based on the <b>Shannon model</b> where you assign different
1082
1047
  probabilities to "change" events. An example that was mentioned back
1083
- in 1948 was the english alphabet - some letters, and combinations of
1084
- letters, are more commonly seen. Shannon gave the example of "E"
1048
+ in <b>1948</b> was the English alphabet - some letters, and combinations
1049
+ of letters, are more commonly seen. Shannon gave the example of "E"
1085
1050
  versus "W", as shown in the following graph (a **finite state
1086
1051
  graph**):
1087
1052
 
@@ -1095,40 +1060,47 @@ DNA sequence, a 10-mer would be equivalent to **10 base pairs**.
1095
1060
  The individual transition states are based on an assumption of
1096
1061
  "randomness", but ensuring that these are truly random is not
1097
1062
  necessarily trivial. Computers do not really 'generate' true
1098
- randomness, at the least not when they are working solo. You
1099
- can even 'predict' some randomness here or there - see vulnerabilities
1100
- such as Specter or similar variants where software can read from
1101
- areas of the memory that should be inaccessible to them. Some
1102
- of this is based on co-predictions. For distributed computers,
1103
- you may often use random noise or decay of atoms as 'a source
1104
- of randomness''. For any DNA nucleotide sequence, we would
1105
- assume that each base pair has a 25% chance to exist at any
1106
- given position, but this is not necessarily true, for various
1107
- reasons. An interesting thought is ... why is ATP so important?
1108
- Yes, due to it being 'the energy currency in a cell' but .. why
1109
- is this ATP aka adenine? Why not GTP, aka guanine or any of
1110
- the other two nucleotides? I can not answer the question; there may
1111
- be many reasons, including differential chemical storage power as
1112
- well as mere random chance event in evolution, but for whatever
1063
+ randomness, at least not when they are working solo, "on
1064
+ their own". You can even 'predict' some randomness here or there
1065
+ via various techniques - see vulnerabilities such as <b>Spectre</b>
1066
+ or similar variants where software can read from areas of the
1067
+ memory that should be inaccessible to them. Some of these attacks
1068
+ exploit branch prediction. For distributed computers, you may often use
1069
+ random noise or decay of atoms as 'a source of randomness'. For
1070
+ any DNA nucleotide sequence, we would assume that each base pair
1071
+ has a 25% chance to exist at any given position, but this is not
1072
+ necessarily true, again for various reasons.
1073
+
1074
+ An interesting thought is ... why is <b>ATP</b> so important?
1075
+ Yes, of course due to it being 'the energy currency in a cell' but ..
1076
+ why is this ATP, i.e. the adenine nucleotide? Why not GTP (guanine) or any of
1077
+ the other two nucleotides? (GTP is used too, but why? Why not
1078
+ CTP and TTP?) I cannot answer this question; there may
1079
+ be many reasons, including differential chemical storage power
1080
+ as well as a mere chance event in evolution, but whatever
1113
1081
  the reason, you will not find an exact 25% value
1114
1082
  for every given "slot" in DNA; this depends on the organism.
1115
1083
 
1116
1084
  From a practical point of view, how can we approach Hidden Markov
1117
- Models?
1085
+ Models and use them?
1118
1086
 
1119
- Let's take the following sequence:
1087
+ Let's take the following simple sequence:
1120
1088
 
1121
1089
  ACGTACGC
1122
1090
 
1123
1091
  From this sequence we can see that the <b>3-mer</b> "ACG"
1124
1092
  is followed by either a T, or a C. Have a look at the sequence
1125
- to see if you can identify the two ACG subsequences there.
1093
+ again to see if you can identify the two ACG subsequences
1094
+ there. You can see one at the start, and the other one
1095
+ following a bit later, which is why we come to the conclusion
1096
+ that either a T or a C will follow this <b>3-mer</b>.
1126
1097
 
1127
- The probability of either T or C, thus, is 0.5 (50%);
1128
- for A and G to follow there is 0% so the latter two can
1129
- be ignored.
1098
+ The probability of either T or C to occur on <b>that</b>
1099
+ position, thus, is 0.5 (50%); for A and G to follow there
1100
+ is 0% so the latter two can be ignored.
1130
1101
 
1131
- Thus, we could use a ruby Hash as follows:
1102
+ Thus, we could use a Ruby Hash such as the following to
1103
+ describe these probabilities:
1132
1104
 
1133
1105
  probabilities = {'T': 0.5, 'C': 0.5} # ignoring A and G here, but we could denote them via 0 as well
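The same counting can be done for every 3-mer of a sequence at once. The following is a minimal Ruby sketch, not part of the Bioroebe API: the method name `next_nucleotide_probabilities` is hypothetical, and it simply counts which nucleotide directly follows each k-mer and normalises those counts. Unlike the Hash above, it uses String keys rather than Symbols.

```ruby
# Hypothetical helper, not part of the Bioroebe API: for every k-mer in
# +sequence+, estimate the probability of each nucleotide that follows it.
def next_nucleotide_probabilities(sequence, k = 3)
  # counts['ACG']['T'] => how often "ACG" is directly followed by "T"
  counts = Hash.new { |hash, kmer| hash[kmer] = Hash.new(0) }
  # Slide a window of length k; the last window must still have a follower.
  (0...(sequence.size - k)).each do |start|
    counts[sequence[start, k]][sequence[start + k]] += 1
  end
  # Normalise the raw counts of each k-mer into probabilities.
  counts.to_h do |kmer, followers|
    total = followers.values.sum.to_f
    [kmer, followers.transform_values { |n| n / total }]
  end
end

probabilities = next_nucleotide_probabilities('ACGTACGC')
p probabilities['ACG'] # T and C, each with probability 0.5
```

For "ACGTACGC" this reproduces the reasoning above: "ACG" occurs twice, once followed by T and once by C.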
1134
1106
 
@@ -1214,34 +1186,6 @@ each edge.
1214
1186
  Parsimony assumes that substitutions are rare and that back-mutations
1215
1187
  do not occur.
1216
1188
 
1217
- ## Random stuff
1218
-
1219
- You can generate random DNA sequences in the shell:
1220
-
1221
- random dna 20
1222
- random dna 25
1223
- random dna 30
1224
-
1225
- This will generate random DNA sequences, with a length
1226
- of 20, 25, 30, respectively. This may not be very useful
1227
- but it was important that this functionality is made
1228
- available somewhere.
1229
-
1230
- You can also use some toplevel-methods to generate, e. g.
1231
- 20 random aminoacids:
1232
-
1233
- Bioroebe.random_aminoacid? 20 # => "UAVHYQQESWUYAOVESEIY"
1234
-
1235
- Note that there may exist other APIs within the Bioroebe project
1236
- that do the same as well.
1237
-
1238
- If you would like to use a ruby-gtk3 widget have a look
1239
- at **RandomSequence**, under **bioroebe/gtk3/random_sequence/**.
1240
- It works with aminoacids, DNA and RNA, and allows the user to
1241
- create random sequences. (If you need weighted randomness then
1242
- you currently have to use the commandline variant. Perhaps I may
1243
- add support into the GUI directly for this one day.)
1244
-
1245
1189
  ## Displaying the main sequence with delimiter characters
1246
1190
 
1247
1191
  From within the <b>bioshell</b>, you can use some alternative ways to
@@ -2711,18 +2655,6 @@ This may look as follows:
2711
2655
 
2712
2656
  <img src="https://i.imgur.com/gAZg8qG.png" style="margin: 1em; margin-left: 3em">
2713
2657
 
2714
- ## Obtaining a subsequence from a Bioroebe::Sequence object
2715
-
2716
- Say that you have the DNA sequence **ATGCATGCAAAA**.
2717
-
2718
- There are several ways how to obtain a subsequence from
2719
- this. One variant will be shown next, by making use of
2720
- the method called **.subseq()**.
2721
-
2722
- Example:
2723
-
2724
- seq = Bioroebe::Sequence.new("ATGCATGCAAAA"); seq.subseq(1,3) # => "ATG"
2725
-
2726
2658
  ## Bioroebe::Protein
2727
2659
 
2728
2660
  This class is a subclass of class **Bioroebe::Sequence**. The
@@ -2737,16 +2669,6 @@ functionality is also available in another method.
2737
2669
  For now keep this in mind; at some later point I may decide whether
2738
2670
  this class is to be kept or not.
2739
2671
 
2740
- ## Permanently disabling showing the startup-introduction of the Bioshell
2741
-
2742
- If you do not want to see the start-up intro, you can try
2743
- any of the following:
2744
-
2745
- bioshell --permanently-disable-startup-intro
2746
- bioshell --permanently-disable-startup-notice
2747
- bioshell --permanently-no-startup-intro
2748
- bioshell --permanently-no-startup-info
2749
-
2750
2672
  ## Decoding aminoacids
2751
2673
 
2752
2674
  Decoding aminoacids means to take the aminoacid at hand, ideally
@@ -3173,47 +3095,45 @@ can try to use:
3173
3095
  On class Bioroebe::Sequence. More customizability may be added
3174
3096
  to that method in this regard, if users need this.
3175
3097
 
3176
- ## The Hydropathy index
3098
+ ### Obtaining a subsequence from a Bioroebe::Sequence object
3177
3099
 
3178
- You can display the hydropathy index for aminoacids from within
3179
- the **bioshell**.
3100
+ Say that you have the DNA sequence **ATGCATGCAAAA**.
3180
3101
 
3181
- Simply issue:
3102
+ There are several ways how to obtain a subsequence from
3103
+ this. One variant will be shown next, by making use of
3104
+ the method called **.subseq()**.
3182
3105
 
3183
- hydropathy?
3106
+ Example:
3184
3107
 
3185
- ## Generate DNA
3108
+ seq = Bioroebe::Sequence.new("ATGCATGCAAAA"); seq.subseq(1,3) # => "ATG"
3186
3109
 
3187
- You can generate random DNA strings by issuing the following
3188
- code:
3110
+ You can also randomize the sequence, via .randomize().
3189
3111
 
3190
- x = Bioroebe.random_dna 50 # => "AGACATCCGGCTTGGATACCTCATAAGTCATATCAGCATCGTCGGACATT"
3112
+ Example:
3191
3113
 
3192
- As can be seen in the example above, after the #, a String will be
3193
- returned representing that nucleotide sequence.
3114
+ x = Bioroebe::Sequence.new; x.randomize
3194
3115
 
3195
- The number given to .random_dna() tells the method how many nucleotides
3196
- should be generated.
3116
+ This is similar to the corresponding method in BioRuby:
3197
3117
 
3198
- ## The GFF file format
3118
+ https://github.com/bioruby/bioruby/blob/master/lib/bio/sequence/common.rb#L243
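A minimal plain-Ruby sketch of the two ideas above, assuming the 1-based, inclusive coordinates implied by `seq.subseq(1,3) # => "ATG"`. The helpers `subseq` and `shuffle_sequence` are hypothetical and are not Bioroebe's implementation. The shuffle keeps the nucleotide composition and only changes the order, which may or may not match what `.randomize()` does.

```ruby
# Hypothetical helper: extract a subsequence with 1-based, inclusive
# coordinates, so subseq(dna, 1, 3) returns the first three nucleotides.
def subseq(sequence, from, to)
  raise ArgumentError, 'coordinates start at 1 and to must be >= from' if from < 1 || to < from

  sequence[(from - 1)..(to - 1)]
end

# Hypothetical helper: shuffle the nucleotides. The composition stays the
# same; only the order changes. Pass a seeded Random for reproducibility.
def shuffle_sequence(sequence, random: Random.new)
  sequence.chars.shuffle(random: random).join
end

dna = 'ATGCATGCAAAA'
puts subseq(dna, 1, 3)     # prints ATG
puts subseq(dna, 5, 8)     # prints ATGC
puts shuffle_sequence(dna) # some permutation of ATGCATGCAAAA
```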
3199
3119
 
3200
- From within the **bioshell** you can analyze .gff and .gff3 files,
3201
- such as by issuing the following command:
3120
+ ## The Hydropathy index
3202
3121
 
3203
- gff3? foobar.gff3
3122
+ You can display the hydropathy index for aminoacids from within
3123
+ the **bioshell**.
3204
3124
 
3205
- Evidently for this to work the file at hand has to exist.
3125
+ Simply issue:
3206
3126
 
3207
- ## Shuffling the DNA/RNA string in the bioshell
3127
+ hydropathy?
3208
3128
 
3209
- Via
3129
+ ## The GFF file format
3210
3130
 
3211
- shuffle
3131
+ From within the **bioshell** you can analyze .gff and .gff3 files,
3132
+ such as by issuing the following command:
3212
3133
 
3213
- you can randomly rearrange the main DNA/RNA string.
3134
+ gff3? foobar.gff3
3214
3135
 
3215
- This can be useful if you just wish to quickly "test" new
3216
- compositions of the same nucleotide.
3136
+ Evidently for this to work the file at hand has to exist.
3217
3137
 
3218
3138
  ## The NCBI Taxonomy database (the Taxonomy submodule of the Bioroebe project)
3219
3139
 
@@ -3350,47 +3270,6 @@ nucleotides by issuing:
3350
3270
 
3351
3271
  show_individual_weight_of_the_four_dna_nucleotides
3352
3272
 
3353
- ## Truncating output in the bioroebe-shell
3354
- ![alt text][cat1]
3355
- [cat1]: https://i.imgur.com/Qmd7R0p.png
3356
-
3357
- **DNA/RNA sequences** can become very long and then become
3358
- quite difficult to view, read and handle on the commandline.
3359
-
3360
- Normally the bioroebe shell will truncate output of DNA sequences
3361
- that are "too long". This is mostly done so that working with
3362
- very long sequences becomes a bit more convenient.
3363
-
3364
- Sometimes this can become an antifeature, though, so the user
3365
- must be able to toggle this at his or her own discretion.
3366
-
3367
- By default, the bioroebe-shell (bioshell) will always try
3368
- to truncate output, but you can toggle this behaviour by
3369
- issuing:
3370
-
3371
- do not truncate
3372
-
3373
- In theory, other "do not" actions are also supported, or will
3374
- be supported in the future; right now (Oct 2019) this is a bit
3375
- limited.
3376
-
3377
- From the toplevel, you can use this method:
3378
-
3379
- Bioroebe.do_not_truncate
3380
-
3381
- The above instruction will toggle the truncate behaviour
3382
- to not truncate, ever.
3383
-
3384
- If you need to do so within the bioshell, this is the way:
3385
-
3386
- no_truncate
3387
-
3388
- Or simply
3389
-
3390
- truncate
3391
-
3392
- This will toggle, like a switch.
3393
-
3394
3273
  ## Rosalind Challenges
3395
3274
  ![alt text][cat1]
3396
3275
  [cat1]: https://i.imgur.com/Qmd7R0p.png
@@ -3527,31 +3406,6 @@ investing more time into Rosalind. Let's focus on solving
3527
3406
  real, existing problems instead - at the least as far as
3528
3407
  the Bioroebe project is concerned.
3529
3408
 
3530
- ## Numbers as input in the bioshell
3531
- ![alt text][cat1]
3532
- [cat1]: https://i.imgur.com/Qmd7R0p.png
3533
-
3534
- You can input a number in the **BioShell** such as <b style="color: darkblue">3</b>.
3535
-
3536
- This will attempt to <b>display the first 3 nucleotides</b> of
3537
- the assigned **main sequence**. It will only work if you have
3538
- assigned a sequence prior to that, though.
3539
-
3540
- Examples:
3541
-
3542
- 3
3543
- 33
3544
- 15
3545
-
3546
- ## transeq
3547
- ![alt text][cat1]
3548
- [cat1]: https://i.imgur.com/Qmd7R0p.png
3549
-
3550
- You can convert a DNA sequence into an aminoacid sequence by
3551
- doing this:
3552
-
3553
- transeq
3554
-
3555
3409
  ## Align two different sequences
3556
3410
  ![alt text][cat1]
3557
3411
  [cat1]: https://i.imgur.com/Qmd7R0p.png
@@ -3863,22 +3717,6 @@ does not (yet?) have support for comparing two genomes to
3863
3717
  one another and generate a visual map indicating the findings
3864
3718
  there.
3865
3719
 
3866
- ## Do not create directories on startup of the shell
3867
-
3868
- By default the bioshell will try to create some directories
3869
- on startup. This may not always be desired by the user
3870
- though, so an option has to exist to disable this functionality.
3871
-
3872
- Internally the variable @internal_hash[:create_directories_on_startup_of_the_shell]
3873
- keeps track of whether directories on startup of the shell will
3874
- be created.
3875
-
3876
- To disable this behaviour on startup of the bioshell, try
3877
- something like this:
3878
-
3879
- bioshell --do-not-create-directories-on-startup
3880
- bioshell --do-not-create-directories
3881
-
3882
3720
  ## class Bioroebe::MoveFileToItsCorrectLocation
3883
3721
 
3884
3722
  This class will move a bio-file to its "correct" location, with respect
@@ -4047,39 +3885,6 @@ has". Genes in itself are not that well-defined, so they are not necessarily
4047
3885
  the primary means of complexity. Think of this more as an interactome,
4048
3886
  where RNAs play a major dynamic role as well.
4049
3887
 
4050
- ## Bioroebe::ProfilePattern
4051
-
4052
- This class can be used to generate nucleotide sequences that
4053
- are not quite "random". For example, to generate sequences
4054
- that may "simulate" a TATA box.
4055
-
4056
- The idea for this class is to be extended into allowing
4057
- HMMs (Hidden Markov Models) one day.
4058
-
4059
- Usage example:
4060
-
4061
- _ = Bioroebe::ProfilePattern.new(ARGV, :do_not_run_yet)
4062
- _.generate_sequence_based_on_this_profile
4063
-
4064
- Such a profile will encode the profile specifying the preferred sequence
4065
- letters for each position in a section of DNA. You have to provide
4066
- the Hash into the method generate_sequence_based_on_this_profile() -
4067
- or you use the default Hash, which is stored in the constant
4068
- called **PER_POSITION_HASH**.
4069
-
4070
- That profile should be a Hash, with keys pointing to A, T, C, G
4071
- and the values being an Array of likelihood chance there,
4072
- as a number, such as 140. These values are also called
4073
- **scores**. Each score contains a number for each position
4074
- that indicates how likely it is to find the given
4075
- nucleotide at that location.
4076
-
4077
- You can also use this class to generate a random DNA string,
4078
- similar to the method called
4079
- **Bioroebe.generate_random_dna_sequence()**. The difference
4080
- is that class ProfilePattern allows for a bit more fine-tuned
4081
- control. The class will likely be extended in the future too.
4082
-
4083
3888
  ## class Bioroebe::DisplayOpenReadingFrames
4084
3889
 
4085
3890
  **class Bioroebe::DisplayOpenReadingFrames**, created in **May 2020**,
@@ -4459,28 +4264,6 @@ the BioRoebe-Shell, then you can use either of the following:
4459
4264
 
4460
4265
  seq?
4461
4266
  seq_with_tab?
4462
-
4463
- ## Prompt (the shell prompt)
4464
-
4465
- You can set a <b>custom prompt</b>, via the keywords
4466
- "prompt" or "set_prompt".
4467
-
4468
- To display the <b>current working directory</b>, do:
4469
-
4470
- prompt pwd
4471
-
4472
- To revert to the old default again, do this:
4473
-
4474
- prompt REVERT
4475
- prompt revert
4476
- prompt DEFAULT
4477
- prompt default
4478
-
4479
- If you do not want to set any prompt, do:
4480
-
4481
- prompt none
4482
-
4483
-
4484
4267
 
4485
4268
  ## Leader and Trailer
4486
4269
 
@@ -5761,6 +5544,9 @@ like this:
5761
5544
 
5762
5545
  <img src="https://i.imgur.com/vr2kEBz.png" style="margin: 1em; margin-left: 3em">
5763
5546
 
5547
+ Since <b>July 2022</b>, invalid amino acids are automatically
5548
+ filtered out before the input is assigned.
5549
+
5764
5550
  ## Colourizing hydrophilic and hydrophobic aminoacids on the commandline
5765
5551
 
5766
5552
  Via class **Bioroebe::ColourizeHydrophilicAndHydrophobicAminoacids** you
@@ -5774,35 +5560,36 @@ Example output for this:
5774
5560
 
5775
5561
  This subsection contains some information about proteases.
5776
5562
 
5777
- trypsin:
5563
+ Trypsin:
5778
5564
  https://en.wikipedia.org/wiki/Trypsin
5779
- cuts at: Trypsin cuts peptide chains mainly at the carboxyl
5565
+ <b>cuts at</b>: Trypsin cuts peptide chains mainly at the carboxyl
5780
5566
  side of the amino acids lysine or arginine.
5781
5567
 
5782
- chymotrypsin:
5568
+ Chymotrypsin:
5783
5569
  https://en.wikipedia.org/wiki/Chymotrypsin
5784
- cuts at: Chymotrypsin preferentially cleaves peptide amide
5570
+ <b>cuts at</b>: Chymotrypsin preferentially cleaves peptide amide
5785
5571
  bonds where the side chain of the amino acid N-terminal
5786
- to the scissile amide bond is a large hydrophobic amino
5787
- acid (tyrosine, tryptophan, and phenylalanine).
5572
+ to the scissile amide bond is a <b>large hydrophobic</b> amino
5573
+ acid (specifically: tyrosine, tryptophan, and phenylalanine).
5574
+ Chymotrypsin will cleave proteins on the <b>carboxyl side</b>
5575
+ of aromatic or large hydrophobic amino acids.
5788
5576
 
5789
- thrombin:
5577
+ Thrombin:
5790
5578
  https://en.wikipedia.org/wiki/Thrombin
5791
- cuts at: Thrombin acts as a serine protease that converts
5579
+ <b>cuts at</b>: Thrombin acts as a serine protease that converts
5792
5580
  soluble fibrinogen into insoluble strands of fibrin. It
5793
5581
  catalyzes the hydrolysis of <b>Arg-Gly</b> bonds in
5794
5582
  particular peptide sequences only.
5795
5583
 
5796
- plasmin:
5584
+ Plasmin:
5797
5585
  https://en.wikipedia.org/wiki/Plasmin
5798
- cuts at: Plasmin is a serine protease.
5586
+ <b>cuts at</b>: Plasmin is a serine protease.
5799
5587
 
5800
- papain:
5588
+ Papain:
5801
5589
  https://en.wikipedia.org/wiki/Papain
5802
- cuts at: Papain prefers to cleave after an
5803
- arginine or lysine preceded by a hydrophobic
5804
- unit (Ala, Val, Leu, Ile, Phe, Trp, Tyr) and
5805
- not followed by a valine.
5590
+ <b>cuts at</b>: Papain prefers to cleave after an arginine or
5591
+ lysine preceded by a hydrophobic unit (Ala, Val, Leu, Ile,
5592
+ Phe, Trp, Tyr) and not followed by a valine.
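The trypsin and chymotrypsin rules described above can be approximated with a simple in-silico digest. This is a hypothetical sketch, not a Bioroebe class. It cuts after every K or R (trypsin) or after every F, W or Y (chymotrypsin), and it deliberately ignores known exceptions, such as the reduced cleavage commonly reported before proline.

```ruby
# Simplified cleavage rules (hypothetical constant): cut on the carboxyl
# side of the listed residues, ignoring all known exceptions.
CUT_AFTER = {
  trypsin:      %w[K R],  # lysine, arginine
  chymotrypsin: %w[F W Y] # large aromatic residues
}.freeze

# Split +protein+ into fragments after every residue the protease cuts at.
def digest(protein, protease)
  residues  = CUT_AFTER.fetch(protease)
  fragments = []
  current   = +''
  protein.each_char do |aa|
    current << aa
    next unless residues.include?(aa)

    fragments << current
    current = +''
  end
  fragments << current unless current.empty?
  fragments
end

p digest('MAKWVRGFLK', :trypsin)      # ["MAK", "WVR", "GFLK"]
p digest('MAKWVRGFLK', :chymotrypsin) # ["MAKW", "VRGF", "LK"]
```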
5806
5593
 
5807
5594
  Factor Xa:
5808
5595
 
@@ -5814,8 +5601,8 @@ Some proteins may permanently reside in the lumen of the
5814
5601
  Often such proteins will have a special signal sequence attached
5815
5602
  to their **C-terminal part**, such as **KDEL** (Lys-Asp-Glu-Leu).
5816
5603
 
5817
- KDEL is not the only signal that may be used, though. Some species
5818
- may use different signals, such as:
5604
+ <b>KDEL</b> is not the only signal that may be used, though. Some
5605
+ species may use different signals, such as:
5819
5606
 
5820
5607
  aminoacids | species
5821
5608
  -------------|------------------------------------------------------------
@@ -5825,8 +5612,9 @@ may use different signals, such as:
5825
5612
  ADEL | Schizosaccharomyces pombe (fission yeast)
5826
5613
  SDEL | Plasmodium falciparum
5827
5614
 
5828
- If you work with the bioshell then you can simply use this method
5829
- to query whether the given aminoacid sequence has a KDEL sequence:
5615
+ If you work with the <b>bioshell</b> then you can simply use this
5616
+ method to query whether the given aminoacid sequence has a KDEL
5617
+ sequence:
5830
5618
 
5831
5619
  KDEL?
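Outside the bioshell, a comparable check could look like the following sketch. The method `er_retention_signal` is hypothetical and is not how `KDEL?` is implemented. It only knows the three signals visible in this excerpt (KDEL, ADEL and SDEL) and tests whether the C-terminal end of the sequence matches one of them.

```ruby
# Only the ER retention signals mentioned in the text above; extend as needed.
ER_RETENTION_SIGNALS = %w[KDEL ADEL SDEL].freeze

# Return the matching C-terminal signal, or nil if there is none.
def er_retention_signal(protein)
  sequence = protein.upcase.strip
  ER_RETENTION_SIGNALS.find { |signal| sequence.end_with?(signal) }
end

p er_retention_signal('MKWVTFISLLFLFSSAYSKDEL') # "KDEL"
p er_retention_signal('MKWVTFISLLFLFSSAYS')     # nil
```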
5832
5620
 
@@ -7362,16 +7150,6 @@ This would notify the bioshell that only nucleotides from position
7362
7150
  51 to (including) position 3251 will be colourized, when doing another
7363
7151
  "ORF?" invocation.
7364
7152
 
7365
- ## Longest substring
7366
-
7367
- Within the Bioroebe::Shell you can determine the longest substring,
7368
- including gaps, like so:
7369
-
7370
- longest_substring? ATTATTGTT | ATTATTCTT
7371
-
7372
- Note that this will make use of the diff-lcs gem, which uses
7373
- the McIlroy-Hunt algorithm.
7374
-
7375
7153
  ## Restriction Enzymes
7376
7154
 
7377
7155
  This **subsection** will eventually be expanded to explain various things about
@@ -8730,6 +8508,22 @@ The images that can be generated via this may look as follows:
8730
8508
 
8731
8509
  <img src="https://i.imgur.com/fWwD1fj.png" style="margin: 1em; margin-left: 2em">
8732
8510
 
8511
+ Let's look at another example.
8512
+
8513
+ Say you input the following sequences there:
8514
+
8515
+ AGVV
8516
+ AGVV
8517
+ AGVV
8518
+ AGVV
8519
+ AGGV
8520
+ AGGV
8521
+ AGGV
8522
+
8523
+ The resulting image looks like this:
8524
+
8525
+ <img src="https://i.imgur.com/3wWApIQ.png" style="margin: 1em; margin-left: 2em">
8526
+
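An image like this is built from how often each residue occurs at each position. The following is a minimal sketch of that counting step. It assumes an ungapped alignment of equal-length sequences and uses a hypothetical method name, `position_frequencies`. It does not attempt to reproduce how Bioroebe scales the letters in the image. It requires Ruby 2.7+ for `Array#tally`.

```ruby
# Per-position residue frequencies of an ungapped alignment.
def position_frequencies(sequences)
  raise ArgumentError, 'no sequences given' if sequences.empty?

  length = sequences.first.size
  unless sequences.all? { |s| s.size == length }
    raise ArgumentError, 'all sequences must have the same length'
  end

  (0...length).map do |position|
    column = sequences.map { |s| s[position] }
    column.tally.transform_values { |n| n.fdiv(sequences.size) }
  end
end

sequences = %w[AGVV AGVV AGVV AGVV AGGV AGGV AGGV]
position_frequencies(sequences).each_with_index do |freqs, i|
  puts "position #{i + 1}: " + freqs.map { |aa, f| format('%s=%.2f', aa, f) }.join(' ')
end
```

For the seven sequences above, position 3 comes out as V=0.57 G=0.43 (4/7 and 3/7), while positions 1, 2 and 4 are fully conserved.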
8733
8527
  ## The Kozak Sequence
8734
8528
 
8735
8529
  The ribosome usually scans for a **AUG** codon. But there are
@@ -9180,6 +8974,409 @@ time being it is what it is. At a later point in time test cases
9180
8974
  may be added to check whether it performs correctly or whether it
9181
8975
  does not.
9182
8976
 
8977
+ The other rules, also published in 2004, are the Reynolds rules. Code
8978
+ support was added to the Bioroebe project in <b>June 2022</b>, but
8979
+ it has not been tested yet, so the implementation may be incorrect.
8980
+
8981
+ ## The Bioroebe::Shell interface
8982
+
8983
+ The following subsection specifically handles information
8984
+ pertaining to the <b>Bioroebe::Shell</b> interface of the
8985
+ <b>bioroebe project</b>. It is also called <b>bioshell</b>
8986
+ for short.
8987
+
8988
+ ### Numbers as input in the bioshell
8989
+ ![alt text][cat1]
8990
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
8991
+
8992
+ You can input a number in the **BioShell** such as <b style="color: darkblue">3</b>.
8993
+
8994
+ This will attempt to <b>display the first 3 nucleotides</b> of
8995
+ the assigned **main sequence**. It will only work if you have
8996
+ assigned a sequence prior to that, though.
8997
+
8998
+ Examples:
8999
+
9000
+ 3
9001
+ 33
9002
+ 15
9003
+
9004
+ ### transeq
9005
+ ![alt text][cat1]
9006
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
9007
+
9008
+ You can convert a DNA sequence into an aminoacid sequence by
9009
+ doing this:
9010
+
9011
+ transeq
9012
+
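Conceptually, transeq can be sketched with the standard genetic code (NCBI translation table 1). The following is a hypothetical helper, not the bioshell's implementation. It translates reading frame 1 only, maps U to T, uses "*" for stop codons and "X" for unknown codons, and drops an incomplete trailing codon. How the bioshell's transeq handles frames and stop codons is not described here.

```ruby
BASES = %w[T C A G].freeze
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

# Translate frame 1 of a DNA/RNA string into a one-letter protein string.
def translate(dna)
  sequence = dna.upcase.gsub(/\s+/, '').tr('U', 'T')
  sequence.scan(/.../).map { |codon| CODON_TABLE.fetch(codon, 'X') }.join
end

puts translate('ATGGCCATTGTAATGGGCCGCTGA') # prints MAIVMGR*
```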
9013
+ ### Shuffling the DNA/RNA string in the bioshell
9014
+ ![alt text][cat1]
9015
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
9016
+
9017
+ Via
9018
+
9019
+ shuffle
9020
+
9021
+ you can <b>randomly rearrange the main DNA/RNA string</b>
9022
+ that is used by the <b>Bioroebe::Shell</b>.
9023
+
9024
+ This can be useful if you just wish to quickly "test"
9025
+ new arrangements of the same nucleotides.
9026
+
9027
### Permanently disabling showing the startup-introduction of the Bioshell
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

If you do not want to see the start-up intro, you can try
any of the following:

    bioshell --permanently-disable-startup-intro
    bioshell --permanently-disable-startup-notice
    bioshell --permanently-no-startup-intro
    bioshell --permanently-no-startup-info

### Longest substring
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

Within the Bioroebe::Shell you can determine the longest common
substring of two sequences, including gaps (strictly speaking, the
longest common subsequence), like so:

    longest_substring? ATTATTGTT | ATTATTCTT

Note that this makes use of the diff-lcs gem, which uses
the McIlroy-Hunt algorithm.

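The following sketch shows how a longest common subsequence (gaps allowed) can be computed. Note the swap in technique: it uses the textbook dynamic-programming algorithm, not the McIlroy-Hunt algorithm used by diff-lcs. Both find a longest common subsequence. The method name is hypothetical.

```ruby
# Minimal sketch: longest common subsequence via classic O(m*n)
# dynamic programming (not McIlroy-Hunt, as used by diff-lcs).
def longest_common_subsequence(a, b)
  # table[i][j] = LCS length of the first i chars of a and first j of b
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  a.each_char.with_index(1) do |ca, i|
    b.each_char.with_index(1) do |cb, j|
      table[i][j] = if ca == cb
                      table[i - 1][j - 1] + 1
                    else
                      [table[i - 1][j], table[i][j - 1]].max
                    end
    end
  end
  # Walk back through the table to recover one LCS.
  result = +''
  i = a.size
  j = b.size
  while i > 0 && j > 0
    if a[i - 1] == b[j - 1]
      result.prepend(a[i - 1])
      i -= 1
      j -= 1
    elsif table[i - 1][j] >= table[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  result
end

puts longest_common_subsequence('ATTATTGTT', 'ATTATTCTT') # prints ATTATTTT
```

With the diff-lcs gem installed, `Diff::LCS.lcs('ATTATTGTT'.chars, 'ATTATTCTT'.chars).join` should yield the same result here, since only one common subsequence of length 8 exists for this pair.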
### Do not create directories on startup of the shell
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

By default the <b>bioshell</b> will try to create some directories
on startup. This may not always be desired by the user, though,
so an option exists to <b>disable</b> this functionality.

Internally, the variable @internal_hash[:create_directories_on_startup_of_the_shell]
keeps track of whether directories will be created on startup of
the shell.

To disable this behaviour on startup of the bioshell, use one of
the following:

    bioshell --do-not-create-directories-on-startup
    bioshell --do-not-create-directories

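Here is a hypothetical sketch of how such a startup flag could switch off directory creation. Only the Hash key and the two flags come from the description above. The method name `startup_settings` and the parsing logic are assumptions, not the actual Bioroebe::Shell code.

```ruby
# Hypothetical sketch: a startup flag switches off directory creation.
DISABLE_DIRECTORY_FLAGS = %w[
  --do-not-create-directories-on-startup
  --do-not-create-directories
].freeze

def startup_settings(argv)
  internal_hash = { create_directories_on_startup_of_the_shell: true } # default
  # Array#& keeps only the elements present in both arrays.
  unless (argv & DISABLE_DIRECTORY_FLAGS).empty?
    internal_hash[:create_directories_on_startup_of_the_shell] = false
  end
  internal_hash
end

settings = startup_settings(['--do-not-create-directories'])
puts settings[:create_directories_on_startup_of_the_shell] # prints false
```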
### Generating and assigning a random amount of nucleotides
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

Via:

    random 555

you can generate 555 random nucleotides (DNA, that is) and
assign them to the main sequence in use by the bioshell. This
is mostly a convenience feature for quickly debugging something.

### Determining the log directory for the Bioroebe::Shell component
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

Via:

    bioshell_log_dir?

you can determine the log directory of the bioshell
component. On my home system this defaults to
<b>/home/Temp/bioroebe/bioshell/</b>.

### Prompt (the shell prompt of the bioshell)
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

You can set a <b>custom prompt</b> in the bioshell via
the keywords "<b>prompt</b>" or "<b>set_prompt</b>".

To display the <b>current working directory</b> in the prompt, do:

    prompt pwd

To revert to the default prompt again, use any of these:

    prompt REVERT
    prompt revert
    prompt DEFAULT
    prompt default

If you do not want any prompt at all, do:

    prompt none

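A hypothetical sketch of the prompt keywords described above. Only the keywords themselves (`pwd`, `revert`/`default` in either case, `none`) come from the description. The default prompt string, the method name `resolve_prompt` and treating any other text as a custom prompt are assumptions.

```ruby
# Hypothetical sketch of the prompt keywords of the bioshell.
DEFAULT_PROMPT = 'bioshell> '

def resolve_prompt(argument)
  text = argument.to_s.strip
  case text.downcase
  when 'pwd'               then "#{Dir.pwd}> " # show the working directory
  when 'revert', 'default' then DEFAULT_PROMPT  # REVERT/DEFAULT also match
  when 'none'              then ''              # no prompt at all
  else "#{text} "                               # anything else: custom prompt
  end
end

p resolve_prompt('REVERT') # prints "bioshell> "
```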
### Random stuff - generating random DNA sequences in the bioshell
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

You can <b>generate random DNA sequences</b> in the
<b>bioshell</b> via:

    random dna 20
    random dna 25
    random dna 30
    # or simpler
    random 20
    random 25
    random 30

This will generate random DNA sequences with a length of
20, 25 and 30 nucleotides, respectively. This may not seem
very useful, but it was important to make this functionality
available somewhere: sometimes you do not care about the
sequence at all and just need a "filler" sequence, so
randomness has to be part of the Bioroebe project as well.

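A minimal sketch of uniform random DNA generation, roughly what `random dna 20` is described to do. The method name `random_dna_sequence` and the seeded `Random` instance are assumptions for illustration. Bioroebe's own generator may work differently.

```ruby
# Minimal sketch of uniform random DNA generation. Passing a seeded
# Random instance makes the output reproducible.
DNA_ALPHABET = %w[A T C G].freeze

def random_dna_sequence(length, rng: Random.new)
  Array.new(length) { DNA_ALPHABET.sample(random: rng) }.join
end

[20, 25, 30].each do |length|
  puts random_dna_sequence(length, rng: Random.new(length))
end
```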
You can also use some toplevel methods to generate, e. g.,
20 random aminoacids. Have a look at the following
<b>toplevel API</b>:

    Bioroebe.random_aminoacid? 20 # => "UAVHYQQESWUYAOVESEIY" (the output is random)

Note that other APIs within the Bioroebe project may do the
same.

If you would like to use a ruby-gtk3 widget, have a look
at **RandomSequence**, under **bioroebe/gtk3/random_sequence/**.
It works with aminoacids, DNA and RNA, and allows the user to
create random sequences. (If you need weighted randomness, you
currently have to use the commandline variant. Perhaps I will
add support for this to the GUI directly one day.)

### Deprecations within the Bioroebe::Shell
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

Over the years the Bioroebe::Shell has changed quite a bit.

This subsection lists a few of these changes, or rather, the
deprecations.

**raw_sequence**: removed completely in June 2022. It is
simpler to handle sequences via Bioroebe::Sequence instead.

<b>@internal_hash[:array_sequences]</b> was no longer in
use, so it was removed in July 2022.

### Chop off nucleotides within the Bioroebe::Shell
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

In the bioshell, you can use the following syntax to chop off
everything in front of a particular substring:

    chop_to ATG

This functionality was specifically added to find the first
ATG codon.

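A minimal sketch of the `chop_to` idea. It removes everything in front of the first occurrence of a motif and keeps the motif itself. Returning the sequence unchanged when the motif is absent is an assumption about how the shell handles that case.

```ruby
# Minimal sketch of "chop_to": keep the sequence from the first
# occurrence of the motif onwards.
def chop_to(sequence, motif)
  index = sequence.index(motif)
  index ? sequence[index..-1] : sequence
end

puts chop_to('CCGTATGAAATAG', 'ATG') # prints ATGAAATAG
```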
### Truncating output in the bioroebe-shell
![alt text][cat1]
[cat1]: https://i.imgur.com/Qmd7R0p.png

**DNA/RNA sequences** can become very long and then become
quite difficult to view, read and handle on the commandline.

Normally the bioroebe shell will truncate output of DNA sequences
that are "too long". This is mostly done so that working with
very long sequences becomes a bit more convenient.

Sometimes this can become an antifeature, though, so users
must be able to toggle it at their own discretion.

By default, the bioroebe-shell (bioshell) will always try
to truncate output, but you can toggle this behaviour by
issuing:

    do not truncate

In theory, other "do not" actions are also supported, or will
be supported in the future; right now (Oct 2019) this is a bit
limited.

From the toplevel, you can use this method:

    Bioroebe.do_not_truncate

The above instruction switches truncation off, so that output
is never truncated.

If you need to do so within the bioshell, this is the way:

    no_truncate

Or simply:

    truncate

This toggles truncation on or off, like a switch.

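A hypothetical sketch of truncated output with an on/off switch. The class name `SequenceDisplay`, the 60-character default limit and the message format are all assumptions. The bioshell's real cut-off and layout may differ.

```ruby
# Hypothetical sketch of truncated sequence output with a toggle.
class SequenceDisplay
  attr_reader :truncate

  def initialize(max_length: 60)
    @max_length = max_length
    @truncate = true # truncation is on by default, as in the bioshell
  end

  # Flip the setting, like the "truncate" command described above.
  def toggle_truncate
    @truncate = !@truncate
  end

  def render(sequence)
    return sequence if !@truncate || sequence.length <= @max_length

    "#{sequence[0, @max_length]}... (#{sequence.length} nucleotides in total)"
  end
end

display = SequenceDisplay.new(max_length: 5)
puts display.render('ATGCGTAA') # prints ATGCG... (8 nucleotides in total)
display.toggle_truncate
puts display.render('ATGCGTAA') # prints ATGCGTAA
```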
## Support for other programming languages

The main programming language for the bioroebe project is **ruby**.
Ruby, from a language design point of view, is a great programming
language; not necessarily all of ruby, but the subset that I use.
It is very easy to quickly prototype ideas in ruby.

However, ruby is known **not** to be among the fastest programming
languages, so from this point of view it makes sense to use other
languages too. Additionally, some useful software stacks are written
in **other** programming languages, such as matplotlib (Python),
among others.

Thus, it is important to **support other programming languages** as
well, if they offer useful libraries. The bioroebe project, after
all, tries to be **practical**: it focuses on getting things done,
no matter the language.

This means that support for other programming languages can be
found in this project as well, often using system() or similar
functionality to tap into these other programming languages. Do
not be surprised when that happens: the bioroebe project also
tries to act as a **practical glue** towards functionality
provided by other projects.

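A hypothetical sketch of this "glue" approach. It deliberately uses Ruby's `Open3.capture2` instead of plain `system()`, because `system()` does not return the program's output. The helper name `run_external` is an assumption. The example also assumes a `python3` executable may or may not be installed.

```ruby
require 'open3'

# Hypothetical helper: run a snippet in another language's interpreter
# and capture its standard output. Returns nil if the interpreter is
# missing or exits with an error.
def run_external(*command)
  stdout, status = Open3.capture2(*command)
  status.success? ? stdout.strip : nil
rescue Errno::ENOENT # the executable could not be found
  nil
end

# Example: let Python compute the reverse complement of a short DNA string.
python_code = "print('ATGC'[::-1].translate(str.maketrans('ATGC', 'TACG')))"
p run_external('python3', '-c', python_code) # "GCAT" if python3 is installed, else nil
```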
Whenever possible, though, the bioroebe project will try to be
flexible in this regard, so ideally the same solution should
work for many different programming languages.

While Ruby is the primary language for this project, as of 2021
I intend to officially support **java**, **jruby** and the
**GraalVM** as well. This is still on my TODO list, though, so
stay tuned for more updates in this regard. See also the
subsection <b>Support for Python</b>.

## Support for Python

In <b>June 2022</b> I decided to add support for Python to bioroebe.

While people can, and should, use <b>biopython</b> instead,
I simply wanted to see how much Python support I can add to
bioroebe. This may lag some years behind biopython, but I wanted
to extend Python support anyway, so there you go: it is simply
an additional option for the bioroebe project. <b>Ruby</b> will
remain the primary language for the project, though, at least
for now.

## Bioroebe::ProfilePattern

This class can be used to generate nucleotide sequences that
are not quite "random", for example sequences that "simulate"
a TATA box.

The idea is to extend this class one day to support HMMs
(Hidden Markov Models).

Usage example:

    _ = Bioroebe::ProfilePattern.new(ARGV, :do_not_run_yet)
    _.generate_sequence_based_on_this_profile

Such a profile specifies the preferred nucleotides for each
position in a section of DNA. You have to pass the Hash to the
method generate_sequence_based_on_this_profile(), or use the
default Hash, which is stored in the constant called
**PER_POSITION_HASH**.

The profile should be a Hash whose keys are A, T, C and G, and
whose values are Arrays of likelihood values, one number (such
as 140) per position. These values are also called **scores**:
each score indicates how likely it is to find the given
nucleotide at that position.

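The following minimal sketch shows how such a per-position score Hash can drive sequence generation. At each position, a nucleotide is drawn with a probability proportional to its score. `EXAMPLE_PROFILE` is made up for illustration and is not the `PER_POSITION_HASH` constant shipped with Bioroebe. The method names are hypothetical.

```ruby
# Minimal sketch of profile-based sequence generation.
# EXAMPLE_PROFILE is invented (a weak TATA-like bias), NOT PER_POSITION_HASH.
EXAMPLE_PROFILE = {
  'A' => [10, 150, 10, 150],
  'T' => [150, 10, 150, 10],
  'C' => [20, 20, 20, 20],
  'G' => [20, 20, 20, 20]
}.freeze

# scores: Hash of nucleotide => non-negative score for one position.
def weighted_pick(scores, rng)
  point = rng.rand(scores.values.sum) # uniform point in [0, total)
  scores.each do |nucleotide, score|
    return nucleotide if point < score
    point -= score
  end
end

def generate_from_profile(profile, rng: Random.new)
  length = profile.values.first.size
  (0...length).map do |position|
    column = profile.transform_values { |scores| scores[position] }
    weighted_pick(column, rng)
  end.join
end

puts generate_from_profile(EXAMPLE_PROFILE, rng: Random.new(1))
```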
You can also use this class to generate a random DNA string,
similar to the method called
**Bioroebe.generate_random_dna_sequence()**. The difference
is that class ProfilePattern allows for more fine-tuned
control. The class will likely be extended in the future, too.

## Generate DNA via Bioroebe.random_dna

You can generate random DNA strings with the following code:

    x = Bioroebe.random_dna 50 # => "AGACATCCGGCTTGGATACCTCATAAGTCATATCAGCATCGTCGGACATT"

As can be seen in the example above (after the #), the method
returns a String representing the random nucleotide sequence; in
this case it is 50 nucleotides long.

The number given to <b>.random_dna()</b> tells the method how many
nucleotides should be generated.

The method accepts an optional second argument. If it is a Hash,
the generated DNA will be based on the **probabilities** given in
that Hash.

Let's look at a specific example:

    Bioroebe.random_dna(50, { A: 10, T: 10, C: 10, G: 70}) # => "GGGGTGGGGAGGGTATGCGGAGGAAGGGCGGGAAGGGCGGGGGCTGGGCG"

As you can see in the Hash defined above, the likelihood of
incorporating a Guanine is much higher than that of an Adenine
(70 : 10). This is reflected in the generated DNA sequence,
which contains many more Guanines than Adenines.

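A minimal sketch of the weighted idea, in plain Ruby. Each nucleotide is chosen with probability weight / total. This illustrates the concept only. It is not `Bioroebe.random_dna` itself, and the method name `weighted_random_dna` is hypothetical.

```ruby
# Minimal sketch of weighted random DNA from relative weights.
def weighted_random_dna(length, weights, rng: Random.new)
  total = weights.values.sum
  raise ArgumentError, 'the weights must add up to more than 0' unless total.positive?

  Array.new(length) do
    point = rng.rand(total) # uniform point in [0, total)
    # Subtract weights until the point drops below zero: the nucleotide
    # at which that happens is chosen with probability weight / total.
    chosen = weights.find { |_nucleotide, weight| (point -= weight).negative? }
    (chosen || weights.to_a.last).first.to_s # fallback guards float rounding
  end.join
end

dna = weighted_random_dna(1000, { A: 10, T: 10, C: 10, G: 70 }, rng: Random.new(2022))
puts dna.count('G') # expected to be close to 700
```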
There is yet a third use case. If you pass a **String** rather
than a Hash as the second argument, that String will be used as
the basis for generating the DNA string.

Again, let's look at a specific example:

    Bioroebe.random_dna(10, 'ATCGATCGGG')

This template contains more G than A, T or C, so the generated
DNA sequence should likewise contain more Guanines.

More usage examples in this regard:

    Bioroebe.random_dna(20, 'ATGGGGGGGG') # => "TGAGGGGGGGGGTGGGAGGG"
    Bioroebe.random_dna(20, 'ATGGGGGGGG') # => "GGTAGGGGGGGGTAGGGGGG"

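A minimal sketch of the template idea. Each position is drawn uniformly from the characters of the template String, so a template with eight Gs out of ten characters yields roughly 80 % Guanine. The method name is hypothetical, and Bioroebe's handling of the String argument may differ in detail.

```ruby
# Minimal sketch of template-based random DNA.
def random_dna_from_template(length, template, rng: Random.new)
  pool = template.upcase.chars
  Array.new(length) { pool.sample(random: rng) }.join
end

puts random_dna_from_template(20, 'ATGGGGGGGG', rng: Random.new(1))
```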
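Note that this is similar to the .randomize() method of the bioruby
project. In bioruby, however, the Hash values are exact counts: the
example below yields a sequence of 10 nucleotides containing exactly
one a, two c, three g and four t, in random order.

    require 'bio'

    hash = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
    puts Bio::Sequence::NA.randomize(hash) # => "ggcttgttac" (for example)
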
## Parsing genbank (.gbk) files

You can use Bioroebe::GenbankParser to parse .gbk files, at
least if you want to obtain the raw sequence in FASTA format.

Example for this:

    require 'bioroebe/parsers/genbank_parser.rb'
    result = Bioroebe::GenbankParser.new('/home/Temp/bioroebe/ls_orchid.gbk')
    result.dataset? # This method call will return the FASTA sequence.

Note that this currently (<b>July 2022</b>) only grabs one entry. In
an upcoming rewrite, the parser will be able to parse all entries
and then present them to the user. Stay tuned in this regard.

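The following minimal sketch shows how the raw sequence can be pulled out of a GenBank record and turned into FASTA. It reads only the LOCUS name and the ORIGIN block. Like the current parser, it handles just the first record. It is not Bioroebe::GenbankParser. The method name and the 60-character line width are assumptions.

```ruby
# Minimal sketch: GenBank record -> FASTA (first record only).
def genbank_to_fasta(text, line_width: 60)
  name = text[/^LOCUS\s+(\S+)/, 1] || 'unknown'
  origin = text[%r{^ORIGIN[^\n]*\n(.*?)^//}m, 1] # sequence block up to "//"
  return nil unless origin

  sequence = origin.gsub(/[^A-Za-z]/, '').upcase # drop position numbers and spaces
  ">#{name}\n" + sequence.scan(/.{1,#{line_width}}/).join("\n")
end

record = <<~GENBANK
  LOCUS       TEST0001                  24 bp    DNA     linear   SYN
  ORIGIN
          1 atgaaacccg ggtttaaacc ctag
  //
GENBANK

puts genbank_to_fasta(record)
# prints:
# >TEST0001
# ATGAAACCCGGGTTTAAACCCTAG
```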
## Parsers in general

Since <b>July 2022</b>, the bioroebe project stores most parsers in
the parsers/ subdirectory.

Prior to that date, different parsers were stored in different
subdirectories; for example, the parser for genbank files was stored
in the genbank/ subdirectory. As I found this situation confusing, I
settled on the parsers/ subdirectory.

## Possibly useful links in regards to molecular biology and science in general