RubyGems - bioroebe - Versions diffs - 0.10.80 → 0.11.12 - Mend

bioroebe 0.10.80 → 0.11.12

Potentially problematic release.

This version of bioroebe might be problematic.

Files changed (67)

checksums.yaml +4 -4
data/README.md +507 -310
data/bioroebe.gemspec +3 -3
data/doc/README.gen +506 -309
data/doc/todo/bioroebe_todo.md +29 -40
data/lib/bioroebe/aminoacids/display_aminoacid_table.rb +1 -0
data/lib/bioroebe/base/colours_for_base/colours_for_base.rb +18 -8
data/lib/bioroebe/base/commandline_application/commandline_arguments.rb +13 -11
data/lib/bioroebe/base/commandline_application/misc.rb +18 -8
data/lib/bioroebe/base/prototype/misc.rb +1 -1
data/lib/bioroebe/codons/show_codon_tables.rb +6 -2
data/lib/bioroebe/constants/aminoacids_and_proteins.rb +1 -0
data/lib/bioroebe/constants/files_and_directories.rb +8 -1
data/lib/bioroebe/count/count_amount_of_nucleotides.rb +3 -0
data/lib/bioroebe/gui/gtk3/protein_to_DNA/protein_to_DNA.rb +18 -18
data/lib/bioroebe/gui/shared_code/protein_to_DNA/protein_to_DNA_module.rb +14 -14
data/lib/bioroebe/parsers/genbank_parser.rb +353 -24
data/lib/bioroebe/python/README.md +1 -0
data/lib/bioroebe/python/__pycache__/mymodule.cpython-39.pyc +0 -0
data/lib/bioroebe/python/gui/gtk3/widget1.py +22 -0
data/lib/bioroebe/python/mymodule.py +8 -0
data/lib/bioroebe/python/protein_to_dna.py +30 -0
data/lib/bioroebe/python/shell/shell.py +19 -0
data/lib/bioroebe/python/to_rna.py +14 -0
data/lib/bioroebe/python/toplevel_methods/to_camelcase.py +11 -0
data/lib/bioroebe/sequence/nucleotide_module/nucleotide_module.rb +28 -25
data/lib/bioroebe/sequence/sequence.rb +54 -2
data/lib/bioroebe/shell/menu.rb +3336 -3304
data/lib/bioroebe/shell/readline/readline.rb +1 -1
data/lib/bioroebe/shell/shell.rb +11233 -28
data/lib/bioroebe/siRNA/siRNA.rb +81 -1
data/lib/bioroebe/string_matching/find_longest_substring.rb +3 -2
data/lib/bioroebe/toplevel_methods/aminoacids_and_proteins.rb +31 -24
data/lib/bioroebe/toplevel_methods/nucleotides.rb +22 -5
data/lib/bioroebe/toplevel_methods/open_in_browser.rb +2 -0
data/lib/bioroebe/toplevel_methods/to_camelcase.rb +5 -0
data/lib/bioroebe/version/version.rb +2 -2
data/lib/bioroebe/yaml/configuration/browser.yml +1 -1
data/lib/bioroebe/yaml/restriction_enzymes/restriction_enzymes.yml +3 -3
metadata +17 -36
data/doc/setup.rb +0 -1655
data/lib/bioroebe/genbank/genbank_parser.rb +0 -291
data/lib/bioroebe/shell/add.rb +0 -108
data/lib/bioroebe/shell/assign.rb +0 -360
data/lib/bioroebe/shell/chop_and_cut.rb +0 -281
data/lib/bioroebe/shell/constants.rb +0 -166
data/lib/bioroebe/shell/download.rb +0 -335
data/lib/bioroebe/shell/enable_and_disable.rb +0 -158
data/lib/bioroebe/shell/enzymes.rb +0 -310
data/lib/bioroebe/shell/fasta.rb +0 -345
data/lib/bioroebe/shell/gtk.rb +0 -76
data/lib/bioroebe/shell/history.rb +0 -132
data/lib/bioroebe/shell/initialize.rb +0 -217
data/lib/bioroebe/shell/loop.rb +0 -74
data/lib/bioroebe/shell/misc.rb +0 -4341
data/lib/bioroebe/shell/prompt.rb +0 -107
data/lib/bioroebe/shell/random.rb +0 -289
data/lib/bioroebe/shell/reset.rb +0 -335
data/lib/bioroebe/shell/scan_and_parse.rb +0 -135
data/lib/bioroebe/shell/search.rb +0 -337
data/lib/bioroebe/shell/sequences.rb +0 -200
data/lib/bioroebe/shell/show_report_and_display.rb +0 -2901
data/lib/bioroebe/shell/startup.rb +0 -127
data/lib/bioroebe/shell/taxonomy.rb +0 -14
data/lib/bioroebe/shell/tk.rb +0 -23
data/lib/bioroebe/shell/user_input.rb +0 -88
data/lib/bioroebe/shell/xorg.rb +0 -45

data/doc/README.gen CHANGED

@@ -5,7 +5,7 @@ ADD_TIME_STAMP
 ## Bioroebe
-<img src="http://shevy.bplaced.net/BIOROEBE.png">
+<img src="https://i.imgur.com/mAoP7AP.png">
 <img src="https://i.imgur.com/YqYxRBZ.png" style="margin: 4px; margin-left: 12px;"/>
 <img src="https://i.imgur.com/k7mMlg2.png" style="margin: 4px; margin-left: 12px;"/>
@@ -332,41 +332,6 @@ so I opted to go the yaml route. But if people want to use a hash
 instead, they can do so, too - see the <b>API</b> for codon tables
 lateron. Simply define your own constants and pass them to the
 appropriate methods.
-## Support for other programming languages
-The main programming language for the bioroebe project is **ruby**.
-Ruby, from a language design point of view, is a great programming
-language - not necessarily all of ruby, but the subset that I use.
-It is very easy to quickly prototype ideas via ruby.
-However had, ruby is known to **not** be among the fastest programming
-languages about on this planet; so, it makes sense to use other
-languages too from this point of view. Additionally there are some
-software stacks in use in **other** programming languages, such as
-matplotlib and various more.
-Thus, it is important to **support other programming languages** as
-well, if there are useful libraries. The bioroebe project, after
-all, tries to be **practical**: it focuses on getting things done,
-no matter the language.
-This means that support for other programming languages can be
-found in this project as well, often using system() or similar
-functionality to tap into these other programming languages. Do
-not be surprised when that happens - the bioroebe project will
-also try to act as a **practical glue** towards functionality
-enabled via other projects. We want to get things done, no
-matter the programming language at hand!
-Whenever possible, though, the bioroebe project will try to be
-flexible in this regard, so ideally the same solution should
-work for many different programming languages.
-While Ruby is the primary language for this project, since as
-of 2021 I will try to officially support **java**, **jruby**
-and the **GraalVM**. This is on my TODO list, though - stay
-tuned for more updates in this regard.
 ## Readline support in the BioRoebe project
@@ -550,16 +515,16 @@ the DNA-to-Protein translation is somewhat simply kept as a
 Once you are inside a **running Bioshell**, you can do other **commands**
 such as this one here:
-    random # ← This will generate a random DNA sequence.
+    random # ← This will generate a random DNA sequence. Each nucleotide has the same chance to be added.
 To **assign** a DNA sequence, do:
     assign ATAGGGCTTTT
-Note that since the year 2016, if you input a nucleotide sequence like
-the one above, without any other commands/words, then we will assume
+Note that since as of the year <b>2016</b>, if you input a nucleotide sequence
+like the one above, without any other commands/words, then we will assume
 that you did mean to do an assignment as-is anyway. The "assign" part
-then becomes superfluous.
+then becomes superfluous and can be omitted.
 This is how this is simply done, by omitting the "assign" part of the
 above instruction altogether:
@@ -1070,18 +1035,18 @@ The text **banana** thus has the following suffixes:
 This subsection deals with some aspects of **HMMs**.
-Why are HMMs useful in biology? They can be used to represent protein
-families, for example (via pHMMs - profile hidden markov models).
+Why are HMMs useful in biology? They can be used to <b>represent protein
+families</b>, for example (via <b>pHMMs</b> - profile hidden markov models).
 Furthermore, they can show some bias in the mutation rate that can be
 observed. Different genomes are known to have different hotspots where
-mutations are more likely to happen. These are examples where a HMM
-may be useful.
+mutations are more likely to happen, for various reasons. These are
+examples where a HMM may be useful.
-HMMs are usually based on the Shannon model where you assign different
+HMMs are usually based on the <b>Shannon model</b> where you assign different
 probabilities to "change" events. An example that was mentioned back
-in 1948 was the english alphabet - some letters, and combinations of
-letters, are more commonly seen. Shannon gave the example of "E"
+in <b>1948</b> was the english alphabet - some letters, and combinations
+of letters, are more commonly seen. Shannon gave the example of "E"
 versus "W", as shown in the following graph (a **finite state
 graph**):
@@ -1095,40 +1060,47 @@ DNA sequence, a 10-mer would be equivalent to **10 base pairs**.
 The individual transition states are based on an assumption of
 "randomness", but ensuring that these are truly random is not
 necessarily trivial. Computers do not really 'generate' true
-randomness, at the least not when they are working solo. You
-can even 'predict' some randomness here or there - see vulnerabilities
-such as Specter or similar variants where software can read from
-areas of the memory that should be inaccessible to them. Some
-of this is based on co-predictions. For distributed computers,
-you may often use random noise or decay of atoms as 'a source
-of randomness''. For any DNA nucleotide sequence, we would
-assume that each base pair has a 25% chance to exist at any
-given position, but this is not necessarily true, for various
-reasons. An interesting thought is ... why is ATP so important?
-Yes, due to it being 'the energy currency in a cell' but .. why
-is this ATP aka adenine? Why not GTP, aka guanine or any of
-the other two nucleotides? I can not answer the question; there may
-be many reasons, including differential chemical storage power as
-well as mere random chance event in evolution, but for whatever
+randomness, at the least not when they are working solo, "on
+their own". You can even 'predict' some randomness here or there
+via various techniques - see vulnerabilities such as <b>Specter</b>
+or similar variants where software can read from areas of the
+memory that should be inaccessible to them. Some of this is based
+on co-predictions. For distributed computers, you may often use
+random noise or decay of atoms as 'a source of randomness'. For
+any DNA nucleotide sequence, we would assume that each base pair
+has a 25% chance to exist at any given position, but this is not
+necessarily true, again for various reasons.
+An interesting thought is ... why is <b>ATP</b> so important?
+Yes, of course due to it being 'the energy currency in a cell' but ..
+why is this ATP, aka adenine? Why not GTP, aka guanine or any of
+the other two nucleotides? (GTP is used too, but why? Why not
+CTP and TTP?) I can not answer this question; there may
+be many reasons, including differential chemical storage power
+as well as mere random chance event in evolution, but for whatever
 the reason, you will not find a complete 25% percentage value
 for every given "slot" in DNA, depending on the organism.
 From a practical point of view, how can we approach Hidden Markov
-Models?
+Models and use them?
-Let's take the following sequence:
+Let's take the following simple sequence:
     ACGTACGC
 From this sequence we can see that the <b>3-mer</b> "ACG"
 is followed by either a T, or a C. Have a look at the sequence
-to see if you can identify the two ACG subsequences there.
+again to see if you can identify the two ACG subsequences
+there. You can see one at the start, and the other one
+following a bit later, hence why we come to the conclusion
+that either a T or a C will follow this <b>3-mer</b>.
-The probability of either T or C, thus, is 0.5 (50%);
-for A and G to follow there is 0% so the latter two can
-be ignored.
+The probability of either T or C to occur on <b>that</b>
+position, thus, is 0.5 (50%); for A and G to follow there
+is 0% so the latter two can be ignored.
-Thus, we could use a ruby Hash as follows:
+Thus, we could use a ruby Hash as follows that should
+describe these probabilities:
     probabilities = {'T': 0.5, 'C': 0.5} # ignoring A and G here, but we could denote them via 0 as well
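Reviewer note (not part of the diff): the hunk above derives the follow-up probabilities for the 3-mer "ACG" by hand. A minimal Ruby sketch of the same counting is shown below. The method name `followup_probabilities` is hypothetical and is not taken from the Bioroebe API; it also uses String keys, whereas the README's `{'T': 0.5, 'C': 0.5}` literal produces Symbol keys.

```ruby
# Hypothetical helper (an assumption, not part of Bioroebe): for every k-mer
# in +sequence+, count which nucleotide follows it and turn the counts into
# probabilities.
def followup_probabilities(sequence, k = 3)
  counts = Hash.new { |hash, kmer| hash[kmer] = Hash.new(0) }
  # Stop k characters before the end so that every k-mer has a successor.
  (0...(sequence.size - k)).each do |index|
    kmer      = sequence[index, k]
    successor = sequence[index + k]
    counts[kmer][successor] += 1
  end
  # Normalise the raw counts per k-mer so that they sum to 1.0.
  counts.transform_values do |successors|
    total = successors.values.sum.to_f
    successors.transform_values { |count| count / total }
  end
end

probabilities = followup_probabilities('ACGTACGC')
p probabilities['ACG'] # "T" and "C" each map to 0.5; A and G are absent
```

Nucleotides that never follow a k-mer are simply missing from the inner Hash, which matches the README's choice to ignore A and G rather than store them with 0.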
@@ -1214,34 +1186,6 @@ each edge.
 Parsimony assumes that substitutions are rare and that back-mutations
 do not occur.
-## Random stuff
-You can generate random DNA sequences in the shell:
-    random dna 20
-    random dna 25
-    random dna 30
-This will generate random DNA sequences, with a length
-of 20, 25, 30, respectively. This may not be very useful
-but it was important that this functionality is made
-available somewhere.
-You can also use some toplevel-methods to generate, e. g.
-20 random aminoacids:
-    Bioroebe.random_aminoacid? 20 # => "UAVHYQQESWUYAOVESEIY"
-Note that there may exist other APIs within the Bioroebe project
-that do the same as well.
-If you would like to use a ruby-gtk3 widget have a look
-at **RandomSequence**, under **bioroebe/gtk3/random_sequence/**.
-It works with aminoacids, DNA and RNA, and allows the user to
-create random sequences. (If you need weighted randomness then
-you currently have to use the commandline variant. Perhaps I may
-add support into the GUI directly for this one day.)
 ## Displaying the main sequence with delimiter characters
 From within the <b>bioshell</b>, you can use some alternative ways to
@@ -2711,18 +2655,6 @@ This may look as follows:
 <img src="https://i.imgur.com/gAZg8qG.png" style="margin: 1em; margin-left: 3em">
-## Obtaining a subsequence from a Bioroebe::Sequence object
-Say that you have the DNA sequence **ATGCATGCAAAA**.
-There are several ways how to obtain a subsequence from
-this. One variant will be shown next, by making use of
-the method called **.subseq()**.
-Example:
-    seq = Bioroebe::Sequence.new("ATGCATGCAAAA"); seq.subseq(1,3) # => "ATG"
 ## Bioroebe::Protein
 This class is a subclass of class **Bioroebe::Sequence**. The
@@ -2737,16 +2669,6 @@ functionality is also available in another method.
 For now keep this in mind; at some later point I may decide whether
 this class is to be kept or not.
-## Permanently disabling showing the startup-introduction of the Bioshell
-If you do not want to see the start-up intro, you can try
-any of the following:
-    bioshell --permanently-disable-startup-intro
-    bioshell --permanently-disable-startup-notice
-    bioshell --permanently-no-startup-intro
-    bioshell --permanently-no-startup-info
 ## Decoding aminoacids
 Decoding aminoacids means to take the aminoacid at hand, ideally
@@ -3173,47 +3095,45 @@ can try to use:
 On class Bioroebe::Sequence. More customizability may be added
 to that method in this regard, if users need this.
-## The Hydropathy index
+### Obtaining a subsequence from a Bioroebe::Sequence object
-You can display the hydropathy index for aminoacids from within
-the **bioshell**.
+Say that you have the DNA sequence **ATGCATGCAAAA**.
-Simply issue:
+There are several ways how to obtain a subsequence from
+this. One variant will be shown next, by making use of
+the method called **.subseq()**.
-    hydropathy?
+Example:
-## Generate DNA
+    seq = Bioroebe::Sequence.new("ATGCATGCAAAA"); seq.subseq(1,3) # => "ATG"
-You can generate random DNA strings by issuing the following
-code:
+You can also randomize the sequence, via .randomize().
-    x = Bioroebe.random_dna 50 # => "AGACATCCGGCTTGGATACCTCATAAGTCATATCAGCATCGTCGGACATT"
+Example:
-As can be seen in the example above, after the #, a String will be
-returned representing that nucleotide sequence.
+    x = Bioroebe::Sequence.new; x.randomize
-The number given to .random_dna() tells the method how many nucleotides
-should be generated.
+This is similar to the method in Bioruby here:
-## The GFF file format
+https://github.com/bioruby/bioruby/blob/master/lib/bio/sequence/common.rb#L243
-From within the **bioshell** you can analyze .gff and .gff3 files,
-such as by issuing the following command:
+## The Hydropathy index
-    gff3? foobar.gff3
+You can display the hydropathy index for aminoacids from within
+the **bioshell**.
-Evidently for this to work the file at hand has to exist.
+Simply issue:
-## Shuffling the DNA/RNA string in the bioshell
+    hydropathy?
-Via
+## The GFF file format
-    shuffle
+From within the **bioshell** you can analyze .gff and .gff3 files,
+such as by issuing the following command:
-you can randomly rearrange the main DNA/RNA string.
+    gff3? foobar.gff3
-This can be useful if you just wish to quickly "test" new
-compositions of the same nucleotide.
+Evidently for this to work the file at hand has to exist.
 ## The NCBI Taxonomy database (the Taxonomy submodule of the Bioroebe project)
@@ -3350,47 +3270,6 @@ nucleotides by issuing:
     show_individual_weight_of_the_four_dna_nucleotides
-## Truncating output in the bioroebe-shell
-![alt text][cat1]
-[cat1]: https://i.imgur.com/Qmd7R0p.png
-**DNA/RNA sequences** can become very long and then become
-quite difficult to view, read and handle on the commandline.
-Normally the bioroebe shell will truncate output of DNA sequences
-that are "too long". This is mostly done so that working with
-very long sequences becomes a bit more convenient.
-Sometimes this can become an antifeature, though, so the user
-must be able to toggle this at his or her own discretion.
-By default, the bioroebe-shell (bioshell) will always try
-to truncate output, but you can toggle this behaviour by
-issuing:
-    do not truncate
-In theory, other "do not" actions are also supported, or will
-be supported in the future; right now (Oct 2019) this is a bit
-limited.
-From the toplevel, you can use this method:
-    Bioroebe.do_not_truncate
-The above instruction will toggle the truncate behaviour
-to not truncate, ever.
-If you need to do so within the bioshell, this is the way:
-    no_truncate
-Or simply
-    truncate
-This will toggle, like a switch.
 ## Rosalind Challenges
 ![alt text][cat1]
 [cat1]: https://i.imgur.com/Qmd7R0p.png
@@ -3527,31 +3406,6 @@ investing more time into Rosalind. Let's focus on solving
 real, existing problems instead - at the least as far as
 the Bioroebe project is concerned.
-## Numbers as input in the bioshell
-![alt text][cat1]
-[cat1]: https://i.imgur.com/Qmd7R0p.png
-You can input a number in the **BioShell** such as <b style="color: darkblue">3</b>.
-This will attempt to <b>display the first 3 nucleotides</b> of
-the assigned **main sequence**. It will only work if you have
-assigned a sequence prior to that, though.
-Examples:
-    3
-    33
-    15
-## transeq
-![alt text][cat1]
-[cat1]: https://i.imgur.com/Qmd7R0p.png
-You can convert a DNA sequence into an aminoacid sequence by
-doing this:
-    transeq
 ## Align two different sequences
 ![alt text][cat1]
 [cat1]: https://i.imgur.com/Qmd7R0p.png
@@ -3863,22 +3717,6 @@ does not (yet?) have support for comparing two genomes to
 one another and generate a visual map indicating the findings
 there.
-## Do not create directories on startup of the shell
-By default the bioshell will try to create some directories
-on startup. This may not always be desired by the user
-though, so an option has to exist to disable this functionality.
-Internally the variable @internal_hash[:create_directories_on_startup_of_the_shell]
-keeps track of whether directories on startup of the shell will
-be created.
-To disable this behaviour on startup of the bioshell, try
-something like this:
-    bioshell --do-not-create-directories-on-startup
-    bioshell --do-not-create-directories
 ## class Bioroebe::MoveFileToItsCorrectLocation
 This class will move a bio-file to its "correct" location, with respect
@@ -4047,39 +3885,6 @@ has". Genes in itself are not that well-defined, so they are not necessarily
 the primary means of complexity. Think of this more as an interactome,
 where RNAs play a major dynamic role as well.
-## Bioroebe::ProfilePattern
-This class can be used to generate nucleotide sequences that
-are not quite "random". For example, to generate sequences
-that may "simulate" a TATA box.
-The idea for this class is to be extended into allowing
-HMMs (Hidden Markov Models) one day.
-Usage example:
-    _ = Bioroebe::ProfilePattern.new(ARGV, :do_not_run_yet)
-    _.generate_sequence_based_on_this_profile
-Such a profile will encode the profile specifying the preferred sequence
-letters for each position in a section of DNA. You have to provide
-the Hash into the method generate_sequence_based_on_this_profile() -
-or you use the default Hash, which is stored in the constant
-called **PER_POSITION_HASH**.
-That profile should be a Hash, with keys pointing to A, T, C, G
-and the values being an Array of likelihood chance there,
-as a number, such as 140. These values are also called
-**scores**. Each score contains a number for each position
-that indicates how likely it is to find the given
-nucleotide at that location.
-You can also use this class to generate a random DNA string,
-similar to the method called
-**Bioroebe.generate_random_dna_sequence()**. The difference
-is that class ProfilePattern allows for a bit more fine-tuned
-control. The class will likely be extended in the future too.
 ## class Bioroebe::DisplayOpenReadingFrames
 **class Bioroebe::DisplayOpenReadingFrames**, created in **May 2020**,
@@ -4459,28 +4264,6 @@ the BioRoebe-Shell, then you can use either of the following:
     seq?
     seq_with_tab?
-## Prompt (the shell prompt9
-You can set a <b>custom prompt</b>, via the keywords
-"prompt" or "set_prompt".
-To display the <b>current working directory</b>, do:
-    prompt pwd
-To revert to the old default again, do this:
-    prompt REVERT
-    prompt revert
-    prompt DEFAULT
-    prompt default
-If you do not want to set any prompt, do:
-    prompt none
 ## Leader and Trailer
@@ -5761,6 +5544,9 @@ like this:
 <img src="https://i.imgur.com/vr2kEBz.png" style="margin: 1em; margin-left: 3em">
+Since as of <b>July 2022</b> invalid amino acids will be automatically
+filtered away before being assigned to the input.
 ## Colourizing hydrophilic and hydrophobic aminoacids on the commandline
 Via class **Bioroebe::ColourizeHydrophilicAndHydrophobicAminoacids** you
@@ -5774,35 +5560,36 @@ Example output for this:
 This subsection contains some information about proteases.
-trypsin:
+Trypsin:
 https://en.wikipedia.org/wiki/Trypsin
-cuts at: Trypsin cuts peptide chains mainly at the carboxyl
+<b>cuts at</b>: Trypsin cuts peptide chains mainly at the carboxyl
 side of the amino acids lysine or arginine.
-chymotrypsin:
+Chymotrypsin:
 https://en.wikipedia.org/wiki/Chymotrypsin
-cuts at: Chymotrypsin preferentially cleaves peptide amide
+<b>cuts at</b>: Chymotrypsin preferentially cleaves peptide amide
 bonds where the side chain of the amino acid N-terminal
-to the scissile amide bond is a large hydrophobic amino
-acid (tyrosine, tryptophan, and phenylalanine).
+to the scissile amide bond is <b>a large hydrophobic amino</b>
+acid (specifically: tyrosine, tryptophan, and phenylalanine).
+Chymotrypsin will cleave proteins on the <b>carboxyl side</b>
+of aromatic or large hydrophobic amino acids.
-thrombin:
+Thrombin:
 https://en.wikipedia.org/wiki/Thrombin
-cuts at: Thrombin acts as a serine protease that converts
+<b>cuts at</b>: Thrombin acts as a serine protease that converts
 soluble fibrinogen into insoluble strands of fibrin. It
 catalyzes the hydrolysis of <b>Arg-Gly</b> bonds in
 particular peptide sequences only.
-plasmin:
+Plasmin:
 https://en.wikipedia.org/wiki/Plasmin
-cuts at: Plasmin is a serine protease.
+<b>cuts at</b>: Plasmin is a serine protease.
-papain:
+Papain:
 https://en.wikipedia.org/wiki/Papain
-cuts at: Papain prefers to cleave after an
-arginine or lysine preceded by a hydrophobic
-unit (Ala, Val, Leu, Ile, Phe, Trp, Tyr) and
-not followed by a valine.
+<b>cuts at</b>: Papain prefers to cleave after an arginine or
+lysine preceded by a hydrophobic unit (Ala, Val, Leu, Ile,
+Phe, Trp, Tyr) and not followed by a valine.
 factor Xa:
@@ -5814,8 +5601,8 @@ Some proteins may permanently reside in the lumen of the
 Often such proteins will have a special signal sequence attached
 to their **C-terminal part**, such as **KDEL** (Lys-Asp-Glu-Leu).
-KDEL is not the only signal that may be used, though. Some species
-may use different signals, such as:
+<b>KDEL</b> is not the only signal that may be used, though. Some
+species may use different signals, such as:
  aminoacids  | species
 -------------|------------------------------------------------------------
@@ -5825,8 +5612,9 @@ may use different signals, such as:
   ADEL       | Schizosaccharomyces pombe (fission yeast)
   SDEL       | Plasmodium falciparum
-If you work with the bioshell then you can simply use this method
-to query whether the given aminoacid sequence has a KDEL sequence:
+If you work with the <b>bioshell</b> then you can simply use this
+method to query whether the given aminoacid sequence has a KDEL
+sequence:
     KDEL?
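Reviewer note (not part of the diff): the check behind `KDEL?` can be pictured roughly as follows. This is a sketch under stated assumptions, not the bioshell implementation; the constant `ER_RETENTION_SIGNALS` and the method `er_retention_signal` are hypothetical names, and only the signals visible in this hunk (KDEL, ADEL, SDEL) are listed.

```ruby
# Hypothetical names, not taken from Bioroebe. The README table contains
# further signals that are not visible in this hunk; they are omitted here.
ER_RETENTION_SIGNALS = %w[KDEL ADEL SDEL].freeze

# Returns the matching retention signal if the aminoacid sequence ends with
# one of the known four-letter signals at its C-terminus, otherwise nil.
def er_retention_signal(aminoacid_sequence)
  tail = aminoacid_sequence.to_s.strip.upcase[-4, 4] # nil if shorter than 4
  ER_RETENTION_SIGNALS.find { |signal| signal == tail }
end

p er_retention_signal('MKTAYIAKDEL') # the sequence ends with KDEL
p er_retention_signal('MKTAYIAK')    # no retention signal at the C-terminus
```

Only the last four residues are compared, because the README describes the signal as being attached to the C-terminal part of the protein.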
@@ -7362,16 +7150,6 @@ This would notify the bioshell that only nucleotides from position
 51 to (including) position 3251 will be colourized, when doing another
 "ORF?" invocation.
-## Longest substring
-Within the Bioroebe::Shell you can determine the longest substring,
-  including gaps, like s:'
-    longest_substring? ATTATTGTT | ATTATTCTT'
-Note that this will make use of the diff-lcs gem, which uses
-the McIlroy-Hunt algorithm.
 ## Restriction Enzymes
 This **subsection** will eventually be expanded to explain various things about
@@ -8730,6 +8508,22 @@ The images that can be generated via this may look as follows:
 <img src="https://i.imgur.com/fWwD1fj.png" style="margin: 1em; margin-left: 2em">
+Let's look at another example.
+Say you input the following sequences there:
+    AGVV
+    AGVV
+    AGVV
+    AGVV
+    AGGV
+    AGGV
+    AGGV
+The resulting image that is generated is:
+<img src="https://i.imgur.com/3wWApIQ.png" style="margin: 1em; margin-left: 2em">
 ## The Kozak Sequence
 The ribosome usually scans for a **AUG** codon. But there are
@@ -9180,6 +8974,409 @@ time being it is what it is. At a later point in time test cases
 may be added to check whether it performs correctly or whether it
 does not.
+The other rules, also published in 2004, are the Reynolds rules. Code
+support was added to the Bioroebe project in <b>June 2022</b>, but
+it was not tested yet, so the implementation may be incorrect.
+## The Bioroebe::Shell interface
+The following subsection specifically handles information
+pertaining to the <b>Bioroebe::Shell</b> interface of the
+<b>bioroebe project</b>. It is also called <b>bioshell</b>,
+to simplify spelling it.
+### Numbers as input in the bioshell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+You can input a number in the **BioShell** such as <b style="color: darkblue">3</b>.
+This will attempt to <b>display the first 3 nucleotides</b> of
+the assigned **main sequence**. It will only work if you have
+assigned a sequence prior to that, though.
+Examples:
+    3
+    33
+    15
+### transeq
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+You can convert a DNA sequence into an aminoacid sequence by
+doing this:
+    transeq
+### Shuffling the DNA/RNA string in the bioshell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+Via
+    shuffle
+you can <b>randomly rearrange the main DNA/RNA string</b>
+that is used by the <b>Bioroebe::Shell</b>.
+This can be useful if you just wish to quickly "test"
+new compositions of the same nucleotide.
+### Permanently disabling showing the startup-introduction of the Bioshell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+If you do not want to see the start-up intro, you can try
+any of the following:
+    bioshell --permanently-disable-startup-intro
+    bioshell --permanently-disable-startup-notice
+    bioshell --permanently-no-startup-intro
+    bioshell --permanently-no-startup-info
+### Longest substring
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+Within the Bioroebe::Shell you can determine the longest substring,
+  including gaps, like s:'
+    longest_substring? ATTATTGTT | ATTATTCTT'
+Note that this will make use of the diff-lcs gem, which uses
+the McIlroy-Hunt algorithm.
+### Do not create directories on startup of the shell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+By default the <b>bioshell</b> will try to create some directories
+on startup. This may not always be desired by the user, though,
+so an option has to exist to <b>disable</b> this functionality.
+Internally the variable @internal_hash[:create_directories_on_startup_of_the_shell]
+keeps track of whether directories on startup of the shell will
+be created.
+To disable this behaviour on startup of the bioshell, try
+something like this:
+    bioshell --do-not-create-directories-on-startup
+    bioshell --do-not-create-directories
+### Generating and assigning a random amount of nucleotides
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+Via:
+    random 555
+you can "generate" 555 random nucleotides (DNA that is) and
+assign it to the main sequence in use by the bioshell. This
+is mostly a convenience feature, if you want to debug something
+quickly.
+### Determining the log directory for the Bioroebe::Shell component
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+Via:
+    bioshell_log_dir?
+you can determine the log-directory output for the bioshell
+component. On my home system this will default to
+<b>/home/Temp/bioroebe/bioshell/</b>.
+### Prompt (the shell prompt of the bioshell)
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+You can set a <b>custom prompt</b> in the bioshell, via
+the keywords "<b>prompt</b>" or "<b>set_prompt</b>".
+To display the <b>current working directory</b>, do:
+    prompt pwd
+To revert to the old default again, do this:
+    prompt REVERT
+    prompt revert
+    prompt DEFAULT
+    prompt default
+If you do not want to set any prompt, do:
+    prompt none
+### Random stuff - generating random DNA sequences in the bioshell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+You can <b>generate random DNA sequences</b> in the
+<b>bioshell</b> via:
+    random dna 20
+    random dna 25
+    random dna 30
+    # or simpler
+    random 20
+    random 25
+    random 30
+This will generate random DNA sequences, with a length
+of 20, 25, 30, respectively. This may not be very useful
+but it was important that this functionality is made
+available somewhere. Sometimes you may not even care
+about the sequence and just use the a "filler" sequence,
+so randomness has to be part of the Bioroebe project
+as well.
+You can also use some toplevel-methods to generate, e. g.
+20 random aminoacids. Have a look at the following
+<b>toplevel API</b>:
+    Bioroebe.random_aminoacid? 20 # => "UAVHYQQESWUYAOVESEIY"
+Note that there may exist other APIs within the Bioroebe project
+that do the same as well.
+If you would like to use a ruby-gtk3 widget have a look
+at **RandomSequence**, under **bioroebe/gtk3/random_sequence/**.
+It works with aminoacids, DNA and RNA, and allows the user to
+create random sequences. (If you need weighted randomness then
+you currently have to use the commandline variant. Perhaps I may
+add support into the GUI directly for this one day.)
+### Deprecations within the Bioroebe::Shell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+Over the years the Bioroebe::Shell changed quite a bit.
+This subsection here will list a few of these changes
+or rather, the deprecations.
+**raw_sequence**: removed in June 2022 completely. It is
+simpler to handle sequences via Bioroebe::Sequence
+instead.
+<b>@internal_hash[:array_sequences]</b> was no longer in
+use, so it was removed in July 2022.
+### Chop off nucleotides within the Bioroebe::Shell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+You can use the following syntax to chop away until you find
+a particular substring, in the bioshell:
+    chop_to ATG
+This functionality was specifically added to find the first
+ATG codon.
+### Truncating output in the bioroebe-shell
+![alt text][cat1]
+[cat1]: https://i.imgur.com/Qmd7R0p.png
+**DNA/RNA sequences** can become very long and then become
+quite difficult to view, read and handle on the commandline.
+Normally the bioroebe shell will truncate output of DNA sequences
+that are "too long". This is mostly done so that working with
+very long sequences becomes a bit more convenient.
+Sometimes this can become an antifeature, though, so the user
+must be able to toggle this at his or her own discretion.
+By default, the bioroebe-shell (bioshell) will always try
+to truncate output, but you can toggle this behaviour by
+issuing:
+    do not truncate
+In theory, other "do not" actions are also supported, or will
+be supported in the future; right now (Oct 2019) this is a bit
+limited.
+From the toplevel, you can use this method:
+    Bioroebe.do_not_truncate
+The above instruction disables truncation
+entirely.
+If you need to do so within the bioshell, this is the way:
+    no_truncate
+Or simply
+    truncate
+This will toggle, like a switch.
+## Support for other programming languages
+The main programming language for the bioroebe project is **ruby**.
+Ruby, from a language design point of view, is a great programming
+language - not necessarily all of ruby, but the subset that I use.
+It is very easy to quickly prototype ideas via ruby.
+However, ruby is known to **not** be among the fastest programming
+languages; so, it makes sense to use other
+languages too from this point of view. Additionally there are some
+software stacks in use in **other** programming languages, such as
+matplotlib and many others.
+Thus, it is important to **support other programming languages** as
+well, if there are useful libraries. The bioroebe project, after
+all, tries to be **practical**: it focuses on getting things done,
+no matter the language.
+This means that support for other programming languages can be
+found in this project as well, often using system() or similar
+functionality to tap into these other programming languages. Do
+not be surprised when that happens - the bioroebe project will
+also try to act as a **practical glue** towards functionality
+enabled via other projects. We want to get things done, no
+matter the programming language at hand!
+Whenever possible, though, the bioroebe project will try to be
+flexible in this regard, so ideally the same solution should
+work for many different programming languages.
+While Ruby is the primary language for this project, as
+of 2021 I will try to officially support **java**, **jruby**
+and the **GraalVM**. This is on my TODO list, though - stay
+tuned for more updates in this regard. See also the
+subsection <b>Support for Python</b>.
+## Support for Python
+In <b>June 2022</b> I decided to add support for Python to bioroebe.
+While people can - and should - easily use <b>biopython</b> instead,
+I simply wanted to see how much python-support I can add to
+bioroebe. This may lag behind some years compared to biopython,
+but I wanted to extend python support as well, so there you go.
+It is simply an additional option for the bioroebe project.
+<b>Ruby</b> will remain the primary language for the project,
+though, at least for now.
+## Bioroebe::ProfilePattern
+This class can be used to generate nucleotide sequences that
+are not quite "random". For example, to generate sequences
+that may "simulate" a TATA box.
+The idea for this class is to be extended into allowing
+HMMs (Hidden Markov Models) one day.
+Usage example:
+    _ = Bioroebe::ProfilePattern.new(ARGV, :do_not_run_yet)
+    _.generate_sequence_based_on_this_profile
+Such a profile specifies the preferred sequence
+letters for each position in a section of DNA. You can pass
+a Hash to the method generate_sequence_based_on_this_profile() -
+or use the default Hash, which is stored in the constant
+called **PER_POSITION_HASH**.
+That profile should be a Hash, with the keys A, T, C and G,
+and each value being an Array of numbers, such as 140, one
+per position. These numbers are also called
+**scores**. Each score indicates how likely it is to find
+the given nucleotide at that position.
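To make the idea of per-position scores more concrete, here is a minimal standalone sketch. It does not use Bioroebe::ProfilePattern; the constant `EXAMPLE_PROFILE`, its values and the method `sequence_from_profile` are all made up for this illustration, and the real PER_POSITION_HASH may look different.

```ruby
# Made-up example profile; this is NOT the PER_POSITION_HASH of bioroebe.
# Each Array holds one score per position (here: 4 positions).
EXAMPLE_PROFILE = {
  'A' => [ 10, 180,  10, 180],
  'T' => [180,  10, 180,  10],
  'C' => [  5,   5,   5,   5],
  'G' => [  5,   5,   5,   5]
}

# Hypothetical helper: draws one nucleotide per position, with a
# probability proportional to the score of that nucleotide there.
def sequence_from_profile(profile, rng = Random.new)
  length = profile.values.first.size
  (0...length).map { |position|
    total = profile.values.sum { |scores| scores[position] }
    pick  = rng.rand(total) # an Integer in the range 0...total
    # Walk through the nucleotides until the running score exceeds pick.
    chosen, _scores = profile.find { |_nucleotide, scores|
      (pick -= scores[position]) < 0
    }
    chosen
  }.join
end

puts sequence_from_profile(EXAMPLE_PROFILE) # most often "TATA"
```

With the scores above the result will usually resemble a short TATA-like motif, but any nucleotide with a score above zero can still show up at a given position.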
+You can also use this class to generate a random DNA string,
+similar to the method called
+**Bioroebe.generate_random_dna_sequence()**. The difference
+is that class ProfilePattern allows for a bit more fine-tuned
+control. The class will likely be extended in the future too.
+## Generate DNA via Bioroebe.random_dna
+You can "generate" random DNA strings by making use of the
+following code:
+    x = Bioroebe.random_dna 50 # => "AGACATCCGGCTTGGATACCTCATAAGTCATATCAGCATCGTCGGACATT"
+As the comment after the # in the example above shows, a String
+representing the nucleotide sequence will be returned. In the case
+above it is 50 nucleotides in length.
+The number given to <b>.random_dna()</b> tells the method how many
+nucleotides should be generated.
+The method accepts a second argument, which should be a Hash.
+If it is a hash then the generated DNA will be based on the
+**probabilities** given to that Hash.
+Let's look at a specific example here:
+    Bioroebe.random_dna(50, { A: 10, T: 10, C: 10, G: 70}) # => "GGGGTGGGGAGGGTATGCGGAGGAAGGGCGGGAAGGGCGGGGGCTGGGCG"
+As you can see, in the Hash defined above, the likelihood for
+incorporating a Guanine is much higher than for Adenine
+(70 : 10). This will be reflected in the generated DNA
+sequence which, as can be seen, contains many more
+Guanines than Adenines.
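How such a weighted choice can work is sketched below in plain Ruby. This is not the implementation of Bioroebe.random_dna; the method name `weighted_random_dna` is hypothetical, and it assumes that the values in the Hash are relative weights rather than percentages that must add up to 100.

```ruby
# Hypothetical helper for illustration; not the Bioroebe implementation.
# Assumption: the Hash values are relative weights.
def weighted_random_dna(length, weights = { A: 25, T: 25, C: 25, G: 25 }, rng = Random.new)
  total = weights.values.sum
  Array.new(length) {
    pick = rng.rand(total) # a number in the range 0...total
    # Subtract the weights one by one; the nucleotide that pushes
    # pick below zero is the one that gets chosen.
    nucleotide, _weight = weights.find { |_nucleotide, weight| (pick -= weight) < 0 }
    nucleotide.to_s
  }.join
end

dna = weighted_random_dna(50, { A: 10, T: 10, C: 10, G: 70 })
puts dna
puts "G: #{dna.count('G')}, A: #{dna.count('A')}"
```

A nucleotide with a weight of 0 is never chosen, which makes it easy to check the behaviour by hand.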
+There is yet a third use case for the above. If you pass a **String**
+as the second argument rather than a Hash, then that String will be
+used as basis for generating the DNA string at hand.
+Again, let's look at a specific example here:
+    Bioroebe.random_dna(10, 'ATCGATCGGG')
+Here the String contains more G than A, T or C, so the new DNA
+sequence should contain more G than the other nucleotides as well.
+More usage examples in this regard:
+    Bioroebe.random_dna(20, 'ATGGGGGGGG') # => "TGAGGGGGGGGGTGGGAGGG"
+    Bioroebe.random_dna(20, 'ATGGGGGGGG') # => "GGTAGGGGGGGGTAGGGGGG"
+Note that this is similar to the .randomize() method in the bioruby
+project:
+    hash = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
+    puts Bio::Sequence::NA.randomize(hash) # => "ggcttgttac" (for example)
+## Parsing genbank (.gbk) files
+You could use Bioroebe::GenbankParser to parse .gbk files, at
+least if you want to obtain the raw sequence, in FASTA format.
+Example for this:
+    require 'bioroebe/parsers/genbank_parser.rb'
+    result = Bioroebe::GenbankParser.new('/home/Temp/bioroebe/ls_orchid.gbk')
+    result.dataset? # This method call will return the FASTA sequence.
+Note that this currently (<b>July 2022</b>) only grabs one entry.
+After an upcoming rewrite the parser will be able to parse
+all entries, and then present them to the user. Stay tuned in this
+regard.
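For readers who want to see what the conversion from a genbank record to FASTA involves, here is a small standalone sketch. It does not use Bioroebe::GenbankParser and is not how that class works internally; the method `genbank_to_fasta` and the sample record are made up for this illustration. It relies only on the usual genbank layout: a LOCUS line, an ORIGIN block with numbered sequence lines, and `//` as the end-of-record marker.

```ruby
# Hypothetical helper for illustration; not Bioroebe::GenbankParser.
# Returns an Array with one FASTA-formatted String per genbank record.
def genbank_to_fasta(text)
  records   = []
  header    = nil
  sequence  = String.new
  in_origin = false
  text.each_line do |line|
    if line.start_with?('LOCUS')
      header = line.split[1] # the locus name follows the LOCUS keyword
    elsif line.start_with?('ORIGIN')
      in_origin = true
    elsif line.start_with?('//') # end of the current record
      records << ">#{header}\n#{sequence.upcase}"
      header    = nil
      sequence  = String.new
      in_origin = false
    elsif in_origin
      # Drop the position numbers and the spaces, keep only the letters.
      sequence << line.gsub(/[^a-zA-Z]/, '')
    end
  end
  records
end

# Made-up record, for illustration only.
example = <<~GBK
  LOCUS       EXAMPLE1     20 bp    DNA     linear   PLN 01-JAN-2000
  DEFINITION  Made-up record for illustration.
  ORIGIN
          1 cgtaacaagg tttccgtagg
  //
GBK

puts genbank_to_fasta(example)
# >EXAMPLE1
# CGTAACAAGGTTTCCGTAGG
```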
+## Parsers in general
+The bioroebe project stores most parsers in the parsers/ subdirectory
+as of <b>July 2022</b>.
+Prior to that date different parsers were stored in different subdirectories,
+such as the parser for genbank-files being stored in the genbank/
+subdirectory. As I found this situation confusing, I settled for
+the parsers/ subdirectory instead.
 ## Possibly useful links in regards to molecular biology and science in general