bioroebe 0.10.80 → 0.11.12


Potentially problematic release.



Files changed (67)
  1. checksums.yaml +4 -4
  2. data/README.md +507 -310
  3. data/bioroebe.gemspec +3 -3
  4. data/doc/README.gen +506 -309
  5. data/doc/todo/bioroebe_todo.md +29 -40
  6. data/lib/bioroebe/aminoacids/display_aminoacid_table.rb +1 -0
  7. data/lib/bioroebe/base/colours_for_base/colours_for_base.rb +18 -8
  8. data/lib/bioroebe/base/commandline_application/commandline_arguments.rb +13 -11
  9. data/lib/bioroebe/base/commandline_application/misc.rb +18 -8
  10. data/lib/bioroebe/base/prototype/misc.rb +1 -1
  11. data/lib/bioroebe/codons/show_codon_tables.rb +6 -2
  12. data/lib/bioroebe/constants/aminoacids_and_proteins.rb +1 -0
  13. data/lib/bioroebe/constants/files_and_directories.rb +8 -1
  14. data/lib/bioroebe/count/count_amount_of_nucleotides.rb +3 -0
  15. data/lib/bioroebe/gui/gtk3/protein_to_DNA/protein_to_DNA.rb +18 -18
  16. data/lib/bioroebe/gui/shared_code/protein_to_DNA/protein_to_DNA_module.rb +14 -14
  17. data/lib/bioroebe/parsers/genbank_parser.rb +353 -24
  18. data/lib/bioroebe/python/README.md +1 -0
  19. data/lib/bioroebe/python/__pycache__/mymodule.cpython-39.pyc +0 -0
  20. data/lib/bioroebe/python/gui/gtk3/widget1.py +22 -0
  21. data/lib/bioroebe/python/mymodule.py +8 -0
  22. data/lib/bioroebe/python/protein_to_dna.py +30 -0
  23. data/lib/bioroebe/python/shell/shell.py +19 -0
  24. data/lib/bioroebe/python/to_rna.py +14 -0
  25. data/lib/bioroebe/python/toplevel_methods/to_camelcase.py +11 -0
  26. data/lib/bioroebe/sequence/nucleotide_module/nucleotide_module.rb +28 -25
  27. data/lib/bioroebe/sequence/sequence.rb +54 -2
  28. data/lib/bioroebe/shell/menu.rb +3336 -3304
  29. data/lib/bioroebe/shell/readline/readline.rb +1 -1
  30. data/lib/bioroebe/shell/shell.rb +11233 -28
  31. data/lib/bioroebe/siRNA/siRNA.rb +81 -1
  32. data/lib/bioroebe/string_matching/find_longest_substring.rb +3 -2
  33. data/lib/bioroebe/toplevel_methods/aminoacids_and_proteins.rb +31 -24
  34. data/lib/bioroebe/toplevel_methods/nucleotides.rb +22 -5
  35. data/lib/bioroebe/toplevel_methods/open_in_browser.rb +2 -0
  36. data/lib/bioroebe/toplevel_methods/to_camelcase.rb +5 -0
  37. data/lib/bioroebe/version/version.rb +2 -2
  38. data/lib/bioroebe/yaml/configuration/browser.yml +1 -1
  39. data/lib/bioroebe/yaml/restriction_enzymes/restriction_enzymes.yml +3 -3
  40. metadata +17 -36
  41. data/doc/setup.rb +0 -1655
  42. data/lib/bioroebe/genbank/genbank_parser.rb +0 -291
  43. data/lib/bioroebe/shell/add.rb +0 -108
  44. data/lib/bioroebe/shell/assign.rb +0 -360
  45. data/lib/bioroebe/shell/chop_and_cut.rb +0 -281
  46. data/lib/bioroebe/shell/constants.rb +0 -166
  47. data/lib/bioroebe/shell/download.rb +0 -335
  48. data/lib/bioroebe/shell/enable_and_disable.rb +0 -158
  49. data/lib/bioroebe/shell/enzymes.rb +0 -310
  50. data/lib/bioroebe/shell/fasta.rb +0 -345
  51. data/lib/bioroebe/shell/gtk.rb +0 -76
  52. data/lib/bioroebe/shell/history.rb +0 -132
  53. data/lib/bioroebe/shell/initialize.rb +0 -217
  54. data/lib/bioroebe/shell/loop.rb +0 -74
  55. data/lib/bioroebe/shell/misc.rb +0 -4341
  56. data/lib/bioroebe/shell/prompt.rb +0 -107
  57. data/lib/bioroebe/shell/random.rb +0 -289
  58. data/lib/bioroebe/shell/reset.rb +0 -335
  59. data/lib/bioroebe/shell/scan_and_parse.rb +0 -135
  60. data/lib/bioroebe/shell/search.rb +0 -337
  61. data/lib/bioroebe/shell/sequences.rb +0 -200
  62. data/lib/bioroebe/shell/show_report_and_display.rb +0 -2901
  63. data/lib/bioroebe/shell/startup.rb +0 -127
  64. data/lib/bioroebe/shell/taxonomy.rb +0 -14
  65. data/lib/bioroebe/shell/tk.rb +0 -23
  66. data/lib/bioroebe/shell/user_input.rb +0 -88
  67. data/lib/bioroebe/shell/xorg.rb +0 -45
data/README.md CHANGED
@@ -2,13 +2,13 @@
2
2
  [![forthebadge](https://forthebadge.com/images/badges/made-with-ruby.svg)](https://www.ruby-lang.org/en/)
3
3
  [![Gem Version](https://badge.fury.io/rb/bioroebe.svg)](https://badge.fury.io/rb/bioroebe)
4
4
 
5
- This gem was <b>last updated</b> on the <span style="color: darkblue; font-weight: bold">24.06.2022</span> (dd.mm.yyyy notation), at <span style="color: steelblue; font-weight: bold">22:13:29</span> o'clock.
5
+ This gem was <b>last updated</b> on the <span style="color: darkblue; font-weight: bold">05.07.2022</span> (dd.mm.yyyy notation), at <span style="color: steelblue; font-weight: bold">16:47:23</span> o'clock.
6
6
 
7
7
  # The Bioroebe Project
8
8
 
9
9
  ## Bioroebe
10
10
 
11
- <img src="http://shevy.bplaced.net/BIOROEBE.png">
11
+ <img src="https://i.imgur.com/mAoP7AP.png">
12
12
  <img src="https://i.imgur.com/YqYxRBZ.png" style="margin: 4px; margin-left: 12px;"/>
13
13
  <img src="https://i.imgur.com/k7mMlg2.png" style="margin: 4px; margin-left: 12px;"/>
14
14
 
@@ -335,41 +335,6 @@ so I opted to go the yaml route. But if people want to use a hash
335
335
  instead, they can do so, too - see the <b>API</b> for codon tables
336
336
  later on. Simply define your own constants and pass them to the
337
337
  appropriate methods.
338
-
339
- ## Support for other programming languages
340
-
341
- The main programming language for the bioroebe project is **ruby**.
342
- Ruby, from a language design point of view, is a great programming
343
- language - not necessarily all of ruby, but the subset that I use.
344
- It is very easy to quickly prototype ideas via ruby.
345
-
346
- However had, ruby is known to **not** be among the fastest programming
347
- languages about on this planet; so, it makes sense to use other
348
- languages too from this point of view. Additionally there are some
349
- software stacks in use in **other** programming languages, such as
350
- matplotlib and various more.
351
-
352
- Thus, it is important to **support other programming languages** as
353
- well, if there are useful libraries. The bioroebe project, after
354
- all, tries to be **practical**: it focuses on getting things done,
355
- no matter the language.
356
-
357
- This means that support for other programming languages can be
358
- found in this project as well, often using system() or similar
359
- functionality to tap into these other programming languages. Do
360
- not be surprised when that happens - the bioroebe project will
361
- also try to act as a **practical glue** towards functionality
362
- enabled via other projects. We want to get things done, no
363
- matter the programming language at hand!
364
-
365
- Whenever possible, though, the bioroebe project will try to be
366
- flexible in this regard, so ideally the same solution should
367
- work for many different programming languages.
368
-
369
- While Ruby is the primary language for this project, since as
370
- of 2021 I will try to officially support **java**, **jruby**
371
- and the **GraalVM**. This is on my TODO list, though - stay
372
- tuned for more updates in this regard.
373
338
 
374
339
  ## Readline support in the BioRoebe project
375
340
 
@@ -553,16 +518,16 @@ the DNA-to-Protein translation is somewhat simply kept as a
553
518
  Once you are inside a **running Bioshell**, you can do other **commands**
554
519
  such as this one here:
555
520
 
556
- random # ← This will generate a random DNA sequence.
521
+ random # ← This will generate a random DNA sequence. Each nucleotide has the same chance to be added.
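As a rough illustration of "each nucleotide has the same chance", here is a minimal plain-Ruby sketch that draws every base uniformly from A, C, G and T. The method name `random_dna_sequence` is hypothetical and not taken from bioroebe; passing a seeded `Random` makes the output reproducible.

```ruby
# Hypothetical helper (not part of bioroebe): uniform random DNA.
DNA_BASES = %w[A C G T].freeze

def random_dna_sequence(length, rng: Random.new)
  # Array#sample picks one of the four bases with probability 1/4 each time.
  Array.new(length) { DNA_BASES.sample(random: rng) }.join
end

puts random_dna_sequence(20)
```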
557
522
 
558
523
  To **assign** a DNA sequence, do:
559
524
 
560
525
  assign ATAGGGCTTTT
561
526
 
562
- Note that since the year 2016, if you input a nucleotide sequence like
563
- the one above, without any other commands/words, then we will assume
527
+ Note that since the year <b>2016</b>, if you input a nucleotide sequence
528
+ like the one above, without any other commands/words, then we will assume
564
529
  that you did mean to do an assignment as-is anyway. The "assign" part
565
- then becomes superfluous.
530
+ then becomes superfluous and can be omitted.
566
531
 
567
532
  This is how this is simply done, by omitting the "assign" part of the
568
533
  above instruction altogether:
@@ -1073,18 +1038,18 @@ The text **banana** thus has the following suffixes:
1073
1038
 
1074
1039
  This subsection deals with some aspects of **HMMs**.
1075
1040
 
1076
- Why are HMMs useful in biology? They can be used to represent protein
1077
- families, for example (via pHMMs - profile hidden markov models).
1041
+ Why are HMMs useful in biology? They can be used to <b>represent protein
1042
+ families</b>, for example (via <b>pHMMs</b> - profile hidden Markov models).
1078
1043
 
1079
1044
  Furthermore, they can show some bias in the mutation rate that can be
1080
1045
  observed. Different genomes are known to have different hotspots where
1081
- mutations are more likely to happen. These are examples where a HMM
1082
- may be useful.
1046
+ mutations are more likely to happen, for various reasons. These are
1047
+ examples where an HMM may be useful.
1083
1048
 
1084
- HMMs are usually based on the Shannon model where you assign different
1049
+ HMMs are usually based on the <b>Shannon model</b> where you assign different
1085
1050
  probabilities to "change" events. An example that was mentioned back
1086
- in 1948 was the english alphabet - some letters, and combinations of
1087
- letters, are more commonly seen. Shannon gave the example of "E"
1051
+ in <b>1948</b> was the English alphabet - some letters, and combinations
1052
+ of letters, are more commonly seen. Shannon gave the example of "E"
1088
1053
  versus "W", as shown in the following graph (a **finite state
1089
1054
  graph**):
1090
1055
 
@@ -1098,40 +1063,47 @@ DNA sequence, a 10-mer would be equivalent to **10 base pairs**.
1098
1063
  The individual transition states are based on an assumption of
1099
1064
  "randomness", but ensuring that these are truly random is not
1100
1065
  necessarily trivial. Computers do not really 'generate' true
1101
- randomness, at the least not when they are working solo. You
1102
- can even 'predict' some randomness here or there - see vulnerabilities
1103
- such as Specter or similar variants where software can read from
1104
- areas of the memory that should be inaccessible to them. Some
1105
- of this is based on co-predictions. For distributed computers,
1106
- you may often use random noise or decay of atoms as 'a source
1107
- of randomness''. For any DNA nucleotide sequence, we would
1108
- assume that each base pair has a 25% chance to exist at any
1109
- given position, but this is not necessarily true, for various
1110
- reasons. An interesting thought is ... why is ATP so important?
1111
- Yes, due to it being 'the energy currency in a cell' but .. why
1112
- is this ATP aka adenine? Why not GTP, aka guanine or any of
1113
- the other two nucleotides? I can not answer the question; there may
1114
- be many reasons, including differential chemical storage power as
1115
- well as mere random chance event in evolution, but for whatever
1066
+ randomness, at least not when they are working solo, "on
1067
+ their own". You can even 'predict' some randomness here or there
1068
+ via various techniques - see vulnerabilities such as <b>Spectre</b>
1069
+ or similar variants where software can read from areas of the
1070
+ memory that should be inaccessible to them. Some of this is based
1071
+ on co-predictions. For distributed computers, you may often use
1072
+ random noise or decay of atoms as 'a source of randomness'. For
1073
+ any DNA nucleotide sequence, we would assume that each base pair
1074
+ has a 25% chance to exist at any given position, but this is not
1075
+ necessarily true, again for various reasons.
1076
+
1077
+ An interesting thought is ... why is <b>ATP</b> so important?
1078
+ Yes, of course due to it being 'the energy currency in a cell' but ..
1079
+ why is this ATP (adenosine triphosphate)? Why not GTP (guanosine triphosphate), or any of
1080
+ the other two nucleotides? (GTP is used too, but why? Why not
1081
+ CTP and TTP?) I cannot answer this question; there may
1082
+ be many reasons, including differential chemical storage power
1083
+ as well as mere random chance event in evolution, but for whatever
1116
1084
  the reason, you will not find a complete 25% percentage value
1117
1085
  for every given "slot" in DNA, depending on the organism.
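To see how far a concrete sequence departs from the 25% assumption, one can simply measure the relative frequency of each base. A small plain-Ruby sketch; the method name is hypothetical, and characters other than A, C, G and T are ignored when computing the shares.

```ruby
# Hypothetical helper: relative frequency of A, C, G and T in a DNA string.
def nucleotide_frequencies(sequence)
  counts = sequence.upcase.each_char.tally               # e.g. {"A"=>2, "T"=>5, ...}
  total  = %w[A C G T].sum { |base| counts.fetch(base, 0) }
  %w[A C G T].to_h do |base|
    share = total.zero? ? 0.0 : counts.fetch(base, 0).fdiv(total)
    [base, share.round(3)]
  end
end

p nucleotide_frequencies("ATAGGGCTTTT") # A ~ 0.182, C ~ 0.091, G ~ 0.273, T ~ 0.455
```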
1118
1086
 
1119
1087
  From a practical point of view, how can we approach Hidden Markov
1120
- Models?
1088
+ Models and use them?
1121
1089
 
1122
- Let's take the following sequence:
1090
+ Let's take the following simple sequence:
1123
1091
 
1124
1092
  ACGTACGC
1125
1093
 
1126
1094
  From this sequence we can see that the <b>3-mer</b> "ACG"
1127
1095
  is followed by either a T, or a C. Have a look at the sequence
1128
- to see if you can identify the two ACG subsequences there.
1096
+ again to see if you can identify the two ACG subsequences
1097
+ there. You can see one at the start, and the other one
1098
+ following a bit later, which is why we conclude
1099
+ that either a T or a C will follow this <b>3-mer</b>.
1129
1100
 
1130
- The probability of either T or C, thus, is 0.5 (50%);
1131
- for A and G to follow there is 0% so the latter two can
1132
- be ignored.
1101
+ The probability of either T or C to occur on <b>that</b>
1102
+ position, thus, is 0.5 (50%); for A and G to follow there
1103
+ is 0% so the latter two can be ignored.
1133
1104
 
1134
- Thus, we could use a ruby Hash as follows:
1105
+ Thus, we could use a ruby Hash as follows that should
1106
+ describe these probabilities:
1135
1107
 
1136
1108
  probabilities = {'T': 0.5, 'C': 0.5} # ignoring A and G here, but we could denote them via 0 as well
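Instead of writing such a Hash by hand, the probabilities can be derived from the sequence itself. The sketch below (plain Ruby, hypothetical method name) collects the nucleotide that directly follows every occurrence of a given k-mer, overlapping occurrences included, and turns the counts into probabilities. For ACGTACGC and the 3-mer ACG it yields the same values as above, with String keys instead of Symbol keys.

```ruby
# Hypothetical helper: probability of each nucleotide that directly
# follows +kmer+ in +sequence+ (overlapping occurrences are counted).
def next_nucleotide_probabilities(sequence, kmer)
  k = kmer.length
  followers = sequence.chars.each_cons(k + 1)
                      .select { |window| window.first(k).join == kmer }
                      .map(&:last)
  return {} if followers.empty?

  followers.tally.transform_values { |count| count.fdiv(followers.size) }
end

p next_nucleotide_probabilities("ACGTACGC", "ACG") # => {"T"=>0.5, "C"=>0.5}
```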
1137
1109
 
@@ -1217,34 +1189,6 @@ each edge.
1217
1189
  Parsimony assumes that substitutions are rare and that back-mutations
1218
1190
  do not occur.
1219
1191
 
1220
- ## Random stuff
1221
-
1222
- You can generate random DNA sequences in the shell:
1223
-
1224
- random dna 20
1225
- random dna 25
1226
- random dna 30
1227
-
1228
- This will generate random DNA sequences, with a length
1229
- of 20, 25, 30, respectively. This may not be very useful
1230
- but it was important that this functionality is made
1231
- available somewhere.
1232
-
1233
- You can also use some toplevel-methods to generate, e. g.
1234
- 20 random aminoacids:
1235
-
1236
- Bioroebe.random_aminoacid? 20 # => "UAVHYQQESWUYAOVESEIY"
1237
-
1238
- Note that there may exist other APIs within the Bioroebe project
1239
- that do the same as well.
1240
-
1241
- If you would like to use a ruby-gtk3 widget have a look
1242
- at **RandomSequence**, under **bioroebe/gtk3/random_sequence/**.
1243
- It works with aminoacids, DNA and RNA, and allows the user to
1244
- create random sequences. (If you need weighted randomness then
1245
- you currently have to use the commandline variant. Perhaps I may
1246
- add support into the GUI directly for this one day.)
1247
-
1248
1192
  ## Displaying the main sequence with delimiter characters
1249
1193
 
1250
1194
  From within the <b>bioshell</b>, you can use some alternative ways to
@@ -2714,18 +2658,6 @@ This may look as follows:
2714
2658
 
2715
2659
  <img src="https://i.imgur.com/gAZg8qG.png" style="margin: 1em; margin-left: 3em">
2716
2660
 
2717
- ## Obtaining a subsequence from a Bioroebe::Sequence object
2718
-
2719
- Say that you have the DNA sequence **ATGCATGCAAAA**.
2720
-
2721
- There are several ways how to obtain a subsequence from
2722
- this. One variant will be shown next, by making use of
2723
- the method called **.subseq()**.
2724
-
2725
- Example:
2726
-
2727
- seq = Bioroebe::Sequence.new("ATGCATGCAAAA"); seq.subseq(1,3) # => "ATG"
2728
-
2729
2661
  ## Bioroebe::Protein
2730
2662
 
2731
2663
  This class is a subclass of class **Bioroebe::Sequence**. The
@@ -2740,16 +2672,6 @@ functionality is also available in another method.
2740
2672
  For now keep this in mind; at some later point I may decide whether
2741
2673
  this class is to be kept or not.
2742
2674
 
2743
- ## Permanently disabling showing the startup-introduction of the Bioshell
2744
-
2745
- If you do not want to see the start-up intro, you can try
2746
- any of the following:
2747
-
2748
- bioshell --permanently-disable-startup-intro
2749
- bioshell --permanently-disable-startup-notice
2750
- bioshell --permanently-no-startup-intro
2751
- bioshell --permanently-no-startup-info
2752
-
2753
2675
  ## Decoding aminoacids
2754
2676
 
2755
2677
  Decoding aminoacids means to take the aminoacid at hand, ideally
@@ -3176,47 +3098,45 @@ can try to use:
3176
3098
  On class Bioroebe::Sequence. More customizability may be added
3177
3099
  to that method in this regard, if users need this.
3178
3100
 
3179
- ## The Hydropathy index
3101
+ ### Obtaining a subsequence from a Bioroebe::Sequence object
3180
3102
 
3181
- You can display the hydropathy index for aminoacids from within
3182
- the **bioshell**.
3103
+ Say that you have the DNA sequence **ATGCATGCAAAA**.
3183
3104
 
3184
- Simply issue:
3105
+ There are several ways how to obtain a subsequence from
3106
+ this. One variant will be shown next, by making use of
3107
+ the method called **.subseq()**.
3185
3108
 
3186
- hydropathy?
3109
+ Example:
3187
3110
 
3188
- ## Generate DNA
3111
+ seq = Bioroebe::Sequence.new("ATGCATGCAAAA"); seq.subseq(1,3) # => "ATG"
3189
3112
 
3190
- You can generate random DNA strings by issuing the following
3191
- code:
3113
+ You can also randomize the sequence, via .randomize().
3192
3114
 
3193
- x = Bioroebe.random_dna 50 # => "AGACATCCGGCTTGGATACCTCATAAGTCATATCAGCATCGTCGGACATT"
3115
+ Example:
3194
3116
 
3195
- As can be seen in the example above, after the #, a String will be
3196
- returned representing that nucleotide sequence.
3117
+ x = Bioroebe::Sequence.new; x.randomize
3197
3118
 
3198
- The number given to .random_dna() tells the method how many nucleotides
3199
- should be generated.
3119
+ This is similar to the method in BioRuby here:
3200
3120
 
3201
- ## The GFF file format
3121
+ https://github.com/bioruby/bioruby/blob/master/lib/bio/sequence/common.rb#L243
3202
3122
 
3203
- From within the **bioshell** you can analyze .gff and .gff3 files,
3204
- such as by issuing the following command:
3123
+ ## The Hydropathy index
3205
3124
 
3206
- gff3? foobar.gff3
3125
+ You can display the hydropathy index for aminoacids from within
3126
+ the **bioshell**.
3207
3127
 
3208
- Evidently for this to work the file at hand has to exist.
3128
+ Simply issue:
3209
3129
 
3210
- ## Shuffling the DNA/RNA string in the bioshell
3130
+ hydropathy?
3211
3131
 
3212
- Via
3132
+ ## The GFF file format
3213
3133
 
3214
- shuffle
3134
+ From within the **bioshell** you can analyze .gff and .gff3 files,
3135
+ such as by issuing the following command:
3215
3136
 
3216
- you can randomly rearrange the main DNA/RNA string.
3137
+ gff3? foobar.gff3
3217
3138
 
3218
- This can be useful if you just wish to quickly "test" new
3219
- compositions of the same nucleotide.
3139
+ Evidently for this to work the file at hand has to exist.
3220
3140
 
3221
3141
  ## The NCBI Taxonomy database (the Taxonomy submodule of the Bioroebe project)
3222
3142
 
@@ -3353,47 +3273,6 @@ nucleotides by issuing:
3353
3273
 
3354
3274
  show_individual_weight_of_the_four_dna_nucleotides
3355
3275
 
3356
- ## Truncating output in the bioroebe-shell
3357
- ![alt text][cat1]
3358
- [cat1]: https://i.imgur.com/Qmd7R0p.png
3359
-
3360
- **DNA/RNA sequences** can become very long and then become
3361
- quite difficult to view, read and handle on the commandline.
3362
-
3363
- Normally the bioroebe shell will truncate output of DNA sequences
3364
- that are "too long". This is mostly done so that working with
3365
- very long sequences becomes a bit more convenient.
3366
-
3367
- Sometimes this can become an antifeature, though, so the user
3368
- must be able to toggle this at his or her own discretion.
3369
-
3370
- By default, the bioroebe-shell (bioshell) will always try
3371
- to truncate output, but you can toggle this behaviour by
3372
- issuing:
3373
-
3374
- do not truncate
3375
-
3376
- In theory, other "do not" actions are also supported, or will
3377
- be supported in the future; right now (Oct 2019) this is a bit
3378
- limited.
3379
-
3380
- From the toplevel, you can use this method:
3381
-
3382
- Bioroebe.do_not_truncate
3383
-
3384
- The above instruction will toggle the truncate behaviour
3385
- to not truncate, ever.
3386
-
3387
- If you need to do so within the bioshell, this is the way:
3388
-
3389
- no_truncate
3390
-
3391
- Or simply
3392
-
3393
- truncate
3394
-
3395
- This will toggle, like a switch.
3396
-
3397
3276
  ## Rosalind Challenges
3398
3277
  ![alt text][cat1]
3399
3278
  [cat1]: https://i.imgur.com/Qmd7R0p.png
@@ -3530,31 +3409,6 @@ investing more time into Rosalind. Let's focus on solving
3530
3409
  real, existing problems instead - at the least as far as
3531
3410
  the Bioroebe project is concerned.
3532
3411
 
3533
- ## Numbers as input in the bioshell
3534
- ![alt text][cat1]
3535
- [cat1]: https://i.imgur.com/Qmd7R0p.png
3536
-
3537
- You can input a number in the **BioShell** such as <b style="color: darkblue">3</b>.
3538
-
3539
- This will attempt to <b>display the first 3 nucleotides</b> of
3540
- the assigned **main sequence**. It will only work if you have
3541
- assigned a sequence prior to that, though.
3542
-
3543
- Examples:
3544
-
3545
- 3
3546
- 33
3547
- 15
3548
-
3549
- ## transeq
3550
- ![alt text][cat1]
3551
- [cat1]: https://i.imgur.com/Qmd7R0p.png
3552
-
3553
- You can convert a DNA sequence into an aminoacid sequence by
3554
- doing this:
3555
-
3556
- transeq
3557
-
3558
3412
  ## Align two different sequences
3559
3413
  ![alt text][cat1]
3560
3414
  [cat1]: https://i.imgur.com/Qmd7R0p.png
@@ -3866,22 +3720,6 @@ does not (yet?) have support for comparing two genomes to
3866
3720
  one another and generate a visual map indicating the findings
3867
3721
  there.
3868
3722
 
3869
- ## Do not create directories on startup of the shell
3870
-
3871
- By default the bioshell will try to create some directories
3872
- on startup. This may not always be desired by the user
3873
- though, so an option has to exist to disable this functionality.
3874
-
3875
- Internally the variable @internal_hash[:create_directories_on_startup_of_the_shell]
3876
- keeps track of whether directories on startup of the shell will
3877
- be created.
3878
-
3879
- To disable this behaviour on startup of the bioshell, try
3880
- something like this:
3881
-
3882
- bioshell --do-not-create-directories-on-startup
3883
- bioshell --do-not-create-directories
3884
-
3885
3723
  ## class Bioroebe::MoveFileToItsCorrectLocation
3886
3724
 
3887
3725
  This class will move a bio-file to its "correct" location, with respect
@@ -4050,39 +3888,6 @@ has". Genes in itself are not that well-defined, so they are not necessarily
4050
3888
  the primary means of complexity. Think of this more as an interactome,
4051
3889
  where RNAs play a major dynamic role as well.
4052
3890
 
4053
- ## Bioroebe::ProfilePattern
4054
-
4055
- This class can be used to generate nucleotide sequences that
4056
- are not quite "random". For example, to generate sequences
4057
- that may "simulate" a TATA box.
4058
-
4059
- The idea for this class is to be extended into allowing
4060
- HMMs (Hidden Markov Models) one day.
4061
-
4062
- Usage example:
4063
-
4064
- _ = Bioroebe::ProfilePattern.new(ARGV, :do_not_run_yet)
4065
- _.generate_sequence_based_on_this_profile
4066
-
4067
- Such a profile will encode the profile specifying the preferred sequence
4068
- letters for each position in a section of DNA. You have to provide
4069
- the Hash into the method generate_sequence_based_on_this_profile() -
4070
- or you use the default Hash, which is stored in the constant
4071
- called **PER_POSITION_HASH**.
4072
-
4073
- That profile should be a Hash, with keys pointing to A, T, C, G
4074
- and the values being an Array of likelihood chance there,
4075
- as a number, such as 140. These values are also called
4076
- **scores**. Each score contains a number for each position
4077
- that indicates how likely it is to find the given
4078
- nucleotide at that location.
4079
-
4080
- You can also use this class to generate a random DNA string,
4081
- similar to the method called
4082
- **Bioroebe.generate_random_dna_sequence()**. The difference
4083
- is that class ProfilePattern allows for a bit more fine-tuned
4084
- control. The class will likely be extended in the future too.
4085
-
4086
3891
  ## class Bioroebe::DisplayOpenReadingFrames
4087
3892
 
4088
3893
  **class Bioroebe::DisplayOpenReadingFrames**, created in **May 2020**,
@@ -4462,28 +4267,6 @@ the BioRoebe-Shell, then you can use either of the following:
4462
4267
 
4463
4268
  seq?
4464
4269
  seq_with_tab?
4465
-
4466
- ## Prompt (the shell prompt9
4467
-
4468
- You can set a <b>custom prompt</b>, via the keywords
4469
- "prompt" or "set_prompt".
4470
-
4471
- To display the <b>current working directory</b>, do:
4472
-
4473
- prompt pwd
4474
-
4475
- To revert to the old default again, do this:
4476
-
4477
- prompt REVERT
4478
- prompt revert
4479
- prompt DEFAULT
4480
- prompt default
4481
-
4482
- If you do not want to set any prompt, do:
4483
-
4484
- prompt none
4485
-
4486
-
4487
4270
 
4488
4271
  ## Leader and Trailer
4489
4272
 
@@ -5764,6 +5547,9 @@ like this:
5764
5547
 
5765
5548
  <img src="https://i.imgur.com/vr2kEBz.png" style="margin: 1em; margin-left: 3em">
5766
5549
 
5550
+ Since <b>July 2022</b>, invalid amino acids are automatically
5551
+ filtered out before being assigned to the input.
5552
+
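What such filtering can look like is sketched below, assuming that only the 20 standard one-letter codes count as valid. The bioroebe implementation may accept a different alphabet (for example U or O), and the constant and method names here are hypothetical.

```ruby
# Assumption: only the 20 standard one-letter amino acid codes are valid.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical helper: upcase the input and drop every other character.
def filter_invalid_amino_acids(input)
  # A leading "^" negates the character set, so String#delete removes
  # everything that is *not* one of the listed letters.
  input.upcase.delete("^#{STANDARD_AMINO_ACIDS}")
end

p filter_invalid_amino_acids("mkv*X1L") # => "MKVL"
```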
5767
5553
  ## Colourizing hydrophilic and hydrophobic aminoacids on the commandline
5768
5554
 
5769
5555
  Via class **Bioroebe::ColourizeHydrophilicAndHydrophobicAminoacids** you
@@ -5777,35 +5563,36 @@ Example output for this:
5777
5563
 
5778
5564
  This subsection contains some information about proteases.
5779
5565
 
5780
- trypsin:
5566
+ Trypsin:
5781
5567
  https://en.wikipedia.org/wiki/Trypsin
5782
- cuts at: Trypsin cuts peptide chains mainly at the carboxyl
5568
+ <b>cuts at</b>: Trypsin cuts peptide chains mainly at the carboxyl
5783
5569
  side of the amino acids lysine or arginine.
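A naive in-silico trypsin digest fits into a single regular expression. This sketch cuts after every K or R and also applies the commonly cited exception that trypsin does not cleave when the next residue is proline. It is an approximation for illustration, not the bioroebe implementation, and the names are hypothetical.

```ruby
# Split where the previous residue is K or R (lookbehind) and the
# next residue is not P (negative lookahead).
TRYPSIN_SITE = /(?<=[KR])(?!P)/

# Hypothetical helper: naive trypsin digest of a one-letter protein sequence.
def trypsin_digest(protein)
  protein.upcase.split(TRYPSIN_SITE)
end

p trypsin_digest("AKDRGK") # => ["AK", "DR", "GK"]
p trypsin_digest("AKPDRG") # => ["AKPDR", "G"]
```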
5784
5570
 
5785
- chymotrypsin:
5571
+ Chymotrypsin:
5786
5572
  https://en.wikipedia.org/wiki/Chymotrypsin
5787
- cuts at: Chymotrypsin preferentially cleaves peptide amide
5573
+ <b>cuts at</b>: Chymotrypsin preferentially cleaves peptide amide
5788
5574
  bonds where the side chain of the amino acid N-terminal
5789
- to the scissile amide bond is a large hydrophobic amino
5790
- acid (tyrosine, tryptophan, and phenylalanine).
5575
+ to the scissile amide bond is <b>a large hydrophobic amino acid</b>
5576
+ (specifically: tyrosine, tryptophan, and phenylalanine).
5577
+ Chymotrypsin will cleave proteins on the <b>carboxyl side</b>
5578
+ of aromatic or large hydrophobic amino acids.
5791
5579
 
5792
- thrombin:
5580
+ Thrombin:
5793
5581
  https://en.wikipedia.org/wiki/Thrombin
5794
- cuts at: Thrombin acts as a serine protease that converts
5582
+ <b>cuts at</b>: Thrombin acts as a serine protease that converts
5795
5583
  soluble fibrinogen into insoluble strands of fibrin. It
5796
5584
  catalyzes the hydrolysis of <b>Arg-Gly</b> bonds in
5797
5585
  particular peptide sequences only.
5798
5586
 
5799
- plasmin:
5587
+ Plasmin:
5800
5588
  https://en.wikipedia.org/wiki/Plasmin
5801
- cuts at: Plasmin is a serine protease.
5589
+ <b>cuts at</b>: Plasmin is a serine protease.
5802
5590
 
5803
- papain:
5591
+ Papain:
5804
5592
  https://en.wikipedia.org/wiki/Papain
5805
- cuts at: Papain prefers to cleave after an
5806
- arginine or lysine preceded by a hydrophobic
5807
- unit (Ala, Val, Leu, Ile, Phe, Trp, Tyr) and
5808
- not followed by a valine.
5593
+ <b>cuts at</b>: Papain prefers to cleave after an arginine or
5594
+ lysine preceded by a hydrophobic unit (Ala, Val, Leu, Ile,
5595
+ Phe, Trp, Tyr) and not followed by a valine.
5809
5596
 
5810
5597
  factor Xa:
5811
5598
 
@@ -5817,8 +5604,8 @@ Some proteins may permanently reside in the lumen of the
5817
5604
  Often such proteins will have a special signal sequence attached
5818
5605
  to their **C-terminal part**, such as **KDEL** (Lys-Asp-Glu-Leu).
5819
5606
 
5820
- KDEL is not the only signal that may be used, though. Some species
5821
- may use different signals, such as:
5607
+ <b>KDEL</b> is not the only signal that may be used, though. Some
5608
+ species may use different signals, such as:
5822
5609
 
5823
5610
  aminoacids | species
5824
5611
  -------------|------------------------------------------------------------
@@ -5828,8 +5615,9 @@ may use different signals, such as:
5828
5615
  ADEL | Schizosaccharomyces pombe (fission yeast)
5829
5616
  SDEL | Plasmodium falciparum
5830
5617
 
5831
- If you work with the bioshell then you can simply use this method
5832
- to query whether the given aminoacid sequence has a KDEL sequence:
5618
+ If you work with the <b>bioshell</b> then you can simply use this
5619
+ method to query whether the given aminoacid sequence has a KDEL
5620
+ sequence:
5833
5621
 
5834
5622
  KDEL?
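Outside the bioshell, a check of this kind boils down to inspecting the C-terminal end of the sequence. A plain-Ruby sketch with a hypothetical method name, using KDEL by default and optionally accepting alternative signals such as ADEL or SDEL from the table above:

```ruby
# Hypothetical helper: does the protein end in an ER-retention signal?
# The signal must sit at the very C-terminus, i.e. at the end of the string.
def er_retention_signal?(protein, signals: ["KDEL"])
  sequence = protein.upcase.delete("*") # ignore stop symbols such as a trailing "*"
  signals.any? { |signal| sequence.end_with?(signal) }
end

p er_retention_signal?("MAGSKDEL")                         # => true
p er_retention_signal?("KDELMAGS")                         # => false
p er_retention_signal?("MAGADEL", signals: %w[KDEL ADEL])  # => true
```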
5835
5623
 
@@ -7365,16 +7153,6 @@ This would notify the bioshell that only nucleotides from position
7365
7153
  51 to (including) position 3251 will be colourized, when doing another
7366
7154
  "ORF?" invocation.
7367
7155
 
7368
- ## Longest substring
7369
-
7370
- Within the Bioroebe::Shell you can determine the longest substring,
7371
- including gaps, like s:'
7372
-
7373
- longest_substring? ATTATTGTT | ATTATTCTT'
7374
-
7375
- Note that this will make use of the diff-lcs gem, which uses
7376
- the McIlroy-Hunt algorithm.
7377
-
7378
7156
  ## Restriction Enzymes
7379
7157
 
7380
7158
  This **subsection** will eventually be expanded to explain various things about
@@ -8733,6 +8511,22 @@ The images that can be generated via this may look as follows:
8733
8511
 
8734
8512
  <img src="https://i.imgur.com/fWwD1fj.png" style="margin: 1em; margin-left: 2em">
8735
8513
 
8514
+ Let's look at another example.
8515
+
8516
+ Say you input the following sequences there:
8517
+
8518
+ AGVV
8519
+ AGVV
8520
+ AGVV
8521
+ AGVV
8522
+ AGGV
8523
+ AGGV
8524
+ AGGV
8525
+
8526
+ The resulting image that is generated is:
8527
+
8528
+ <img src="https://i.imgur.com/3wWApIQ.png" style="margin: 1em; margin-left: 2em">
8529
+
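Images of this kind summarize how often each residue occurs in each column of the input. The sketch below (plain Ruby, hypothetical method name) computes those per-position counts for the example above, assuming all input sequences have the same length; `Array#transpose` raises an error otherwise.

```ruby
# Hypothetical helper: per-column residue counts for equal-length sequences.
def position_counts(sequences)
  sequences.map(&:chars).transpose.map(&:tally)
end

input = %w[AGVV AGVV AGVV AGVV AGGV AGGV AGGV]
position_counts(input).each_with_index do |counts, index|
  puts "position #{index + 1}: #{counts}"
end
```

For this input, position 3 holds V four times and G three times, while the other three positions are fully conserved.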
8736
8530
  ## The Kozak Sequence
8737
8531
 
8738
8532
  The ribosome usually scans for a **AUG** codon. But there are
@@ -9183,6 +8977,409 @@ time being it is what it is. At a later point in time test cases
9183
8977
  may be added to check whether it performs correctly or whether it
9184
8978
  does not.
9185
8979
 
8980
+ The other rules, also published in 2004, are the Reynolds rules. Code
8981
+ support was added to the Bioroebe project in <b>June 2022</b>, but
8982
+ it has not been tested yet, so the implementation may be incorrect.
8983
+
8984
+ ## The Bioroebe::Shell interface
8985
+
8986
+ The following subsection specifically handles information
8987
+ pertaining to the <b>Bioroebe::Shell</b> interface of the
8988
+ <b>bioroebe project</b>. It is also called the <b>bioshell</b>,
8989
+ for short.
8990
+
8991
+ ### Numbers as input in the bioshell
8992
+ ![alt text][cat1]
8993
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
8994
+
8995
+ You can input a number in the **BioShell** such as <b style="color: darkblue">3</b>.
8996
+
8997
+ This will attempt to <b>display the first 3 nucleotides</b> of
8998
+ the assigned **main sequence**. It will only work if you have
8999
+ assigned a sequence prior to that, though.
9000
+
9001
+ Examples:
9002
+
9003
+ 3
9004
+ 33
9005
+ 15
9006
+
+ ### transeq
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ You can translate a DNA sequence into an aminoacid sequence via:
+
+ transeq
+
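As an illustration only, this sketch translates in reading frame 1 using the standard genetic code (NCBI translation table 1). It is not the bioshell's implementation: other reading frames, alternative codon tables and ambiguous bases are not handled, and the `translate` helper is hypothetical:

```ruby
# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# T, C, A, G order for the first, second and third position.
BASES = %w[T C A G].freeze
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

# Translate a DNA (or RNA) string in reading frame 1. Stop codons become '*',
# codons with unknown characters become 'X', a trailing partial codon is ignored.
def translate(sequence)
  sequence.upcase.tr('U', 'T').scan(/.../).map { |codon| CODON_TABLE.fetch(codon, 'X') }.join
end

puts translate('ATGGCCTAA') # => MA*
```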
+ ### Shuffling the DNA/RNA string in the bioshell
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ Via
+
+ shuffle
+
+ you can <b>randomly rearrange the main DNA/RNA string</b>
+ that is used by the <b>Bioroebe::Shell</b>.
+
+ This can be useful if you just wish to quickly "test"
+ new arrangements of the same nucleotides.
+
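Shuffling keeps the composition and changes only the order. A minimal sketch using Ruby's `Array#shuffle`; `shuffle_sequence` is a hypothetical helper, and the optional seeded `Random` only serves to make results reproducible:

```ruby
# Hypothetical helper: return a random permutation of the nucleotides in
# +sequence+. The counts of A, C, G and T/U stay identical; only the order
# changes. Pass a seeded Random for reproducible results.
def shuffle_sequence(sequence, random: Random.new)
  sequence.chars.shuffle(random: random).join
end

original = 'AAAATTTGGC'
shuffled = shuffle_sequence(original, random: Random.new(42))
puts shuffled
```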
+ ### Permanently disabling showing the startup-introduction of the Bioshell
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ If you do not want to see the startup introduction, you can use
+ any of the following options:
+
+ bioshell --permanently-disable-startup-intro
+ bioshell --permanently-disable-startup-notice
+ bioshell --permanently-no-startup-intro
+ bioshell --permanently-no-startup-info
+
+ ### Longest substring
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ Within the Bioroebe::Shell you can determine the longest common
+ substring of two sequences, allowing for gaps (in other words, the
+ longest common subsequence), like so:
+
+ longest_substring? ATTATTGTT | ATTATTCTT
+
+ Note that this makes use of the diff-lcs gem, which uses
+ the McIlroy-Hunt algorithm.
+
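The shell relies on diff-lcs for this. To show what is being computed without that dependency, here is the textbook dynamic-programming LCS instead. This is a different algorithm from McIlroy-Hunt. `longest_common_subsequence` is a hypothetical helper, and when several subsequences of equal length exist it may return a different one than diff-lcs does:

```ruby
# Classic dynamic-programming longest common subsequence (O(m * n) time and
# memory). A subsequence may skip characters, which is what allows gaps.
def longest_common_subsequence(a, b)
  table = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }
  a.each_char.with_index(1) do |char_a, i|
    b.each_char.with_index(1) do |char_b, j|
      table[i][j] =
        if char_a == char_b
          table[i - 1][j - 1] + 1
        else
          [table[i - 1][j], table[i][j - 1]].max
        end
    end
  end

  # Walk back from the bottom-right corner to recover one LCS.
  result = []
  i = a.length
  j = b.length
  while i.positive? && j.positive?
    if a[i - 1] == b[j - 1]
      result << a[i - 1]
      i -= 1
      j -= 1
    elsif table[i - 1][j] >= table[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  result.reverse.join
end

first, second = 'ATTATTGTT | ATTATTCTT'.split('|').map(&:strip)
puts longest_common_subsequence(first, second) # => ATTATTTT
```

For this input the two sequences differ only at position 7 (G versus C), so the result consists of the eight shared nucleotides.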
+ ### Do not create directories on startup of the shell
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ By default the <b>bioshell</b> will try to create some directories
+ on startup. Since this may not always be desired, there is an
+ option to <b>disable</b> this functionality.
+
+ Internally, the entry @internal_hash[:create_directories_on_startup_of_the_shell]
+ keeps track of whether directories will be created on startup of the shell.
+
+ To disable this behaviour, start the bioshell with either of these options:
+
+ bioshell --do-not-create-directories-on-startup
+ bioshell --do-not-create-directories
+
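A sketch of how such a switch could be derived from the commandline arguments. The helper names are hypothetical; the bioshell itself records the decision in @internal_hash[:create_directories_on_startup_of_the_shell] rather than recomputing it:

```ruby
require 'fileutils'

# The two option names documented for the bioshell; either one disables
# directory creation on startup.
NO_DIRECTORY_FLAGS = %w[
  --do-not-create-directories-on-startup
  --do-not-create-directories
].freeze

# Hypothetical helper: true unless one of the flags above was passed.
def create_directories_on_startup?(argv)
  (argv & NO_DIRECTORY_FLAGS).empty?
end

# Hypothetical startup step: create the given directories only when allowed.
# Returns the list of directories that were created (empty when disabled).
def maybe_create_directories(argv, directories)
  return [] unless create_directories_on_startup?(argv)

  directories.each { |dir| FileUtils.mkdir_p(dir) }
end
```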
+ ### Generating and assigning a random amount of nucleotides
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ Via:
+
+ random 555
+
+ you can generate 555 random nucleotides (DNA, that is) and
+ assign them to the main sequence in use by the bioshell. This
+ is mostly a convenience feature for when you want to debug
+ something quickly.
+
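A minimal sketch of generating a uniformly random sequence, assuming each nucleotide is equally likely. `random_sequence` is a hypothetical helper, not the bioshell's code:

```ruby
DNA_ALPHABET = 'ACGT'

# Hypothetical helper: build a uniformly random sequence of +length+
# characters drawn from +alphabet+. Pass a seeded Random for reproducibility.
def random_sequence(length, alphabet: DNA_ALPHABET, random: Random.new)
  Array.new(length) { alphabet[random.rand(alphabet.length)] }.join
end

main_sequence = random_sequence(555)
puts main_sequence.length # => 555
```

Passing a different alphabet, such as the 20 standard one-letter amino acid codes `'ACDEFGHIKLMNPQRSTVWY'`, produces random protein sequences with the same helper.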
+ ### Determining the log directory for the Bioroebe::Shell component
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ Via:
+
+ bioshell_log_dir?
+
+ you can determine the log directory of the bioshell
+ component. On my home system this defaults to
+ <b>/home/Temp/bioroebe/bioshell/</b>.
+
+ ### Prompt (the shell prompt of the bioshell)
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ You can set a <b>custom prompt</b> in the bioshell via
+ the keywords "<b>prompt</b>" or "<b>set_prompt</b>".
+
+ To display the <b>current working directory</b> in the prompt, do:
+
+ prompt pwd
+
+ To revert to the default prompt again, use any of these:
+
+ prompt REVERT
+ prompt revert
+ prompt DEFAULT
+ prompt default
+
+ If you do not want any prompt at all, do:
+
+ prompt none
+
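A sketch of how the documented keywords could map to a prompt string. `resolve_prompt`, the default prompt text and the `'> '` suffix are assumptions for illustration, as is treating any other argument as literal prompt text:

```ruby
# Assumed default prompt; the bioshell's real default is not documented here.
DEFAULT_PROMPT = 'bioshell> '

# Hypothetical helper: turn the argument of "prompt"/"set_prompt" into the
# prompt string that should be displayed.
def resolve_prompt(argument, current_directory: Dir.pwd)
  keyword = argument.to_s.strip
  case keyword
  when 'pwd' then "#{current_directory}> "
  when 'REVERT', 'revert', 'DEFAULT', 'default' then DEFAULT_PROMPT
  when 'none' then ''
  else "#{keyword} " # assumption: anything else is used as the prompt text
  end
end
```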
+ ### Random stuff - generating random DNA sequences in the bioshell
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ You can <b>generate random DNA sequences</b> in the
+ <b>bioshell</b> via:
+
+ random dna 20
+ random dna 25
+ random dna 30
+ # or simpler
+ random 20
+ random 25
+ random 30
+
+ This will generate random DNA sequences with a length
+ of 20, 25 and 30 nucleotides, respectively. This may not be very
+ useful on its own, but it is important that this functionality is
+ available somewhere: sometimes you do not care about the
+ sequence at all and just need a "filler" sequence, so randomness
+ has to be part of the Bioroebe project as well.
+
+ You can also use toplevel methods, for example to generate
+ 20 random aminoacids. Have a look at the following
+ <b>toplevel API</b>:
+
+ Bioroebe.random_aminoacid? 20 # => "UAVHYQQESWUYAOVESEIY"
+
+ Note that other APIs within the Bioroebe project may do
+ the same.
+
+ If you would like to use a ruby-gtk3 widget, have a look
+ at **RandomSequence**, under **bioroebe/gtk3/random_sequence/**.
+ It works with aminoacids, DNA and RNA, and allows the user to
+ create random sequences. (If you need weighted randomness, you
+ currently have to use the commandline variant. Perhaps I will
+ add support for this directly to the GUI one day.)
+
+ ### Deprecations within the Bioroebe::Shell
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ Over the years the Bioroebe::Shell has changed quite a bit.
+
+ This subsection lists a few of these changes, or rather,
+ the deprecations.
+
+ **raw_sequence**: removed completely in June 2022. It is
+ simpler to handle sequences via Bioroebe::Sequence
+ instead.
+
+ <b>@internal_hash[:array_sequences]</b> was no longer in
+ use, so it was removed in July 2022.
+
+ ### Chop off nucleotides within the Bioroebe::Shell
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ In the bioshell, you can use the following syntax to chop away
+ nucleotides until a particular substring is found:
+
+ chop_to ATG
+
+ This functionality was specifically added to find the first
+ ATG codon.
+
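A minimal sketch of the behaviour, assuming the motif itself is kept and the sequence is left unchanged when the motif is absent. `chop_to` here is a hypothetical stand-alone helper, not the bioshell command's implementation:

```ruby
# Hypothetical helper: drop everything before the first occurrence of
# +motif+ (the motif itself is kept). If the motif does not occur, the
# sequence is returned unchanged.
def chop_to(sequence, motif)
  index = sequence.index(motif)
  index ? sequence[index..] : sequence
end

puts chop_to('CCGTTATGAAACCC', 'ATG') # => ATGAAACCC
```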
+ ### Truncating output in the bioroebe-shell
+ ![alt text][cat1]
+ [cat1]: https://i.imgur.com/Qmd7R0p.png
+
+ **DNA/RNA sequences** can become very long, which makes them
+ difficult to view, read and handle on the commandline.
+
+ Normally the bioroebe shell will truncate the output of DNA sequences
+ that are "too long". This is mostly done so that working with
+ very long sequences becomes a bit more convenient.
+
+ Sometimes this can become an antifeature, though, so users
+ can toggle it at their own discretion.
+
+ By default, the bioroebe-shell (bioshell) will always try
+ to truncate output, but you can toggle this behaviour by
+ issuing:
+
+ do not truncate
+
+ In theory, other "do not" actions are also supported, or will
+ be supported in the future; right now (Oct 2019) this is a bit
+ limited.
+
+ From the toplevel, you can use this method:
+
+ Bioroebe.do_not_truncate
+
+ The above instruction disables truncation entirely.
+
+ Within the bioshell, you can do the same via:
+
+ no_truncate
+
+ Or simply:
+
+ truncate
+
+ This will toggle the behaviour, like a switch.
+
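A sketch of a toggleable truncation switch. The `OutputTruncator` class, its 60-character default limit and the trailing `...` marker are assumptions, not the bioshell's actual settings:

```ruby
# Hypothetical sketch of toggleable output truncation.
class OutputTruncator
  attr_reader :enabled

  def initialize(max_length: 60, enabled: true)
    @max_length = max_length
    @enabled = enabled
  end

  # Flip the setting, like the "truncate" command in the bioshell.
  def toggle
    @enabled = !@enabled
  end

  # Return +sequence+ cut to max_length characters plus "...", or unchanged
  # when truncation is disabled or the sequence is short enough.
  def render(sequence)
    return sequence if !@enabled || sequence.length <= @max_length

    "#{sequence[0, @max_length]}..."
  end
end
```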
+ ## Support for other programming languages
+
+ The main programming language for the bioroebe project is **ruby**.
+ Ruby, from a language design point of view, is a great programming
+ language - not necessarily all of ruby, but the subset that I use.
+ It is very easy to quickly prototype ideas via ruby.
+
+ However, ruby is known **not** to be among the fastest programming
+ languages around, so from this point of view it makes sense to use
+ other languages too. Additionally, some useful software stacks exist
+ in **other** programming languages, such as matplotlib (Python) and
+ many more.
+
+ Thus, it is important to **support other programming languages** as
+ well, if they offer useful libraries. The bioroebe project, after
+ all, tries to be **practical**: it focuses on getting things done,
+ no matter the language.
+
+ This means that support for other programming languages can be
+ found in this project as well, often using system() or similar
+ functionality to tap into these other programming languages. Do
+ not be surprised when that happens: the bioroebe project will
+ also try to act as a **practical glue** towards functionality
+ enabled via other projects. We want to get things done, no
+ matter the programming language at hand!
+
+ Whenever possible, though, the bioroebe project will try to be
+ flexible in this regard, so ideally the same solution should
+ work for many different programming languages.
+
+ Ruby is the primary language for this project, but as of 2021 I
+ will also try to officially support **java**, **jruby** and the
+ **GraalVM**. This is still on my TODO list, though, so stay tuned
+ for more updates in this regard. See also the subsection
+ <b>Support for Python</b>.
+
+ ## Support for Python
+
+ In <b>June 2022</b> I decided to add support for Python to bioroebe.
+
+ While people can - and should - use <b>biopython</b> instead,
+ I simply wanted to see how much Python support I can add to
+ bioroebe. This may lag behind biopython by some years, but I
+ wanted to extend Python support anyway, so there you go.
+ It is simply an additional option for the bioroebe project.
+ <b>Ruby</b> will remain the primary language for the project,
+ though, at least for now.
+
+ ## Bioroebe::ProfilePattern
+
+ This class can be used to generate nucleotide sequences that
+ are not quite "random", for example sequences that may
+ "simulate" a TATA box.
+
+ The idea is to extend this class one day so that it supports
+ HMMs (Hidden Markov Models).
+
+ Usage example:
+
+ _ = Bioroebe::ProfilePattern.new(ARGV, :do_not_run_yet)
+ _.generate_sequence_based_on_this_profile
+
+ Such a profile specifies the preferred nucleotide for each
+ position in a section of DNA. You can pass such a Hash to the
+ method generate_sequence_based_on_this_profile(), or use the
+ default Hash, which is stored in the constant called
+ **PER_POSITION_HASH**.
+
+ The profile should be a Hash whose keys are A, T, C and G and
+ whose values are Arrays of likelihoods, given as numbers such
+ as 140. These values are also called **scores**: each Array
+ holds one score per position, indicating how likely it is to
+ find the given nucleotide at that location.
+
+ You can also use this class to generate a random DNA string,
+ similar to the method called
+ **Bioroebe.generate_random_dna_sequence()**. The difference
+ is that class ProfilePattern allows for a bit more fine-tuned
+ control. The class will likely be extended in the future too.
+
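The internals of ProfilePattern are not shown here, so the following is only a sketch of per-position weighted sampling, assuming a profile Hash shaped as described above (nucleotide keys, one score per position). `EXAMPLE_PROFILE`, `weighted_pick` and `generate_from_profile` are hypothetical names, not the class's API:

```ruby
# Made-up example profile: one score per position for each nucleotide,
# chosen so that T, A, T, A are the preferred nucleotides ("TATA"-like).
EXAMPLE_PROFILE = {
  A: [10, 140, 10, 140],
  T: [140, 10, 140, 10],
  C: [5, 5, 5, 5],
  G: [5, 5, 5, 5]
}.freeze

# Pick one key from a Hash of non-negative weights, proportionally to its weight.
def weighted_pick(weights, random)
  total = weights.values.sum
  raise ArgumentError, 'weights must not all be zero' unless total.positive?

  threshold = random.rand * total
  cumulative = 0
  weights.each do |key, weight|
    cumulative += weight
    return key if threshold < cumulative
  end
  weights.reject { |_, weight| weight.zero? }.keys.last # float-rounding guard
end

# Generate one nucleotide per position, sampled by that position's scores.
def generate_from_profile(profile, random: Random.new)
  length = profile.values.first.length
  (0...length).map do |position|
    column = profile.transform_values { |scores| scores[position] }
    weighted_pick(column, random).to_s
  end.join
end

puts generate_from_profile(EXAMPLE_PROFILE)
```

With these made-up scores, T is most likely at positions 1 and 3 and A at positions 2 and 4, so the output will often, but not always, read TATA.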
+ ## Generate DNA via Bioroebe.random_dna
+
+ You can "generate" random DNA strings by making use of the
+ following code:
+
+ x = Bioroebe.random_dna 50 # => "AGACATCCGGCTTGGATACCTCATAAGTCATATCAGCATCGTCGGACATT"
+
+ As the comment after the # shows, the method returns a String
+ representing the nucleotide sequence; in this case it is 50
+ nucleotides long.
+
+ The number given to <b>.random_dna()</b> tells the method how many
+ nucleotides should be generated.
+
+ The method accepts an optional second argument. If it is a Hash,
+ the generated DNA will be based on the **probabilities** given in
+ that Hash.
+
+ Let's look at a specific example:
+
+ Bioroebe.random_dna(50, { A: 10, T: 10, C: 10, G: 70 }) # => "GGGGTGGGGAGGGTATGCGGAGGAAGGGCGGGAAGGGCGGGGGCTGGGCG"
+
+ As you can see, in the Hash defined above, the likelihood of
+ incorporating a Guanine is much higher than for Adenine
+ (70 : 10). This is reflected in the generated DNA
+ sequence which, as can be seen, contains many more
+ Guanines than Adenines.
+
+ There is yet a third use case. If you pass a **String**
+ as the second argument rather than a Hash, then that String will be
+ used as the basis for generating the DNA string at hand.
+
+ Again, let's look at a specific example:
+
+ Bioroebe.random_dna(10, 'ATCGATCGGG')
+
+ This String contains more G than A, T or C, so the generated DNA
+ sequence should likewise be enriched in G.
+
+ More usage examples in this regard:
+
+ Bioroebe.random_dna(20, 'ATGGGGGGGG') # => "TGAGGGGGGGGGTGGGAGGG"
+ Bioroebe.random_dna(20, 'ATGGGGGGGG') # => "GGTAGGGGGGGGTAGGGGGG"
+
+ Note that this is similar to the .randomize() method in the bioruby
+ project, although there the Hash values are treated as counts: the
+ length of the generated sequence is the sum of the values (10 in the
+ example below).
+
+ require 'bio'
+ hash = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
+ puts Bio::Sequence::NA.randomize(hash) # => "ggcttgttac" (for example)
+
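To make the three call styles concrete, here is a hypothetical re-implementation, not Bioroebe's code. It assumes Integer weights in the Hash and treats every character of a String template as one equally likely draw, so letter frequencies in the String act as probabilities:

```ruby
# Hypothetical sketch of random DNA generation. +template+ may be:
#   nil    -> A, C, G and T are equally likely
#   Hash   -> Integer weights per nucleotide, e.g. { A: 10, T: 10, C: 10, G: 70 }
#   String -> each character is one "ticket", so letter frequencies in the
#             String become the probabilities
def random_dna_sketch(length, template = nil, random: Random.new)
  pool =
    case template
    when nil    then 'ACGT'
    when Hash   then template.map { |base, weight| base.to_s * Integer(weight) }.join
    when String then template.upcase
    else raise ArgumentError, 'template must be nil, a Hash or a String'
    end
  raise ArgumentError, 'template yields no nucleotides' if pool.empty?

  # Drawing uniformly from the expanded pool is the same as drawing by weight.
  Array.new(length) { pool[random.rand(pool.length)] }.join
end

puts random_dna_sketch(50, { A: 10, T: 10, C: 10, G: 70 })
```

With the `{ A: 10, T: 10, C: 10, G: 70 }` weights, a 50-nucleotide result contains about 35 Guanines on average.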
+ ## Parsing genbank (.gbk) files
+
+ You can use Bioroebe::GenbankParser to parse .gbk files, at
+ least if you want to obtain the raw sequence in FASTA format.
+
+ Example for this:
+
+ require 'bioroebe/parsers/genbank_parser.rb'
+ result = Bioroebe::GenbankParser.new('/home/Temp/bioroebe/ls_orchid.gbk')
+ result.dataset? # This method call will return the FASTA sequence.
+
+ Note that this currently (<b>July 2022</b>) only grabs one entry. In
+ a future rewrite, the parser will be able to parse all entries and
+ present them to the user. Stay tuned in this regard.
+
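In the GenBank flat-file format, a record's sequence sits between the ORIGIN line and the `//` terminator, interleaved with position numbers and spaces. The following is a minimal, dependency-free sketch that exploits this, assuming well-formed input and using the LOCUS name as the FASTA header. Unlike the current parser it returns every entry, and it is not Bioroebe::GenbankParser:

```ruby
# Minimal sketch: convert GenBank flat-file text into FASTA, one entry per record.
def genbank_to_fasta(text)
  records = []
  name = nil
  chunks = []
  in_origin = false

  text.each_line do |line|
    if line.start_with?('LOCUS')
      name = line.split[1]                          # record name: 2nd LOCUS field
      chunks = []
    elsif line.start_with?('ORIGIN')
      in_origin = true                              # sequence lines follow
    elsif line.start_with?('//')
      records << ">#{name}\n#{chunks.join.upcase}"  # record terminator
      in_origin = false
    elsif in_origin
      chunks << line.gsub(/[^A-Za-z]/, '')          # drop position numbers and spaces
    end
  end

  records.join("\n")
end

# Usage with the file mentioned above (path as given there):
# puts genbank_to_fasta(File.read('/home/Temp/bioroebe/ls_orchid.gbk'))
```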
+ ## Parsers in general
+
+ Since <b>July 2022</b>, the bioroebe project stores most parsers in
+ the parsers/ subdirectory.
+
+ Prior to that date, parsers were stored in different subdirectories;
+ for example, the parser for genbank files was stored in the genbank/
+ subdirectory. As I found this situation confusing, I settled on
+ the parsers/ subdirectory.
 
  ## Possibly useful links in regards to molecular biology and science in general