bioroebe 0.10.80 → 0.12.24
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +3946 -2817
- data/bin/bioroebe +13 -2
- data/bin/bioroebe_hash +7 -0
- data/bin/codon_to_aminoacid +6 -4
- data/bin/compacter +7 -0
- data/bin/plain_palindrome +7 -0
- data/bioroebe.gemspec +3 -3
- data/doc/README.gen +3918 -2793
- data/doc/quality_control/commandline_applications.md +3 -3
- data/doc/statistics/statistics.md +7 -7
- data/doc/todo/bioroebe_GUI_todo.md +19 -14
- data/doc/todo/bioroebe_java_todo.md +22 -0
- data/doc/todo/bioroebe_todo.md +2075 -2620
- data/lib/bioroebe/C++/DNA.cpp +69 -0
- data/lib/bioroebe/C++/RNA.cpp +58 -0
- data/lib/bioroebe/C++/sequence.cpp +35 -0
- data/lib/bioroebe/abstract/README.md +1 -0
- data/lib/bioroebe/abstract/features.rb +29 -0
- data/lib/bioroebe/aminoacids/aminoacid_substitution.rb +1 -9
- data/lib/bioroebe/aminoacids/codon_percentage.rb +1 -9
- data/lib/bioroebe/aminoacids/deduce_aminoacid_sequence.rb +1 -9
- data/lib/bioroebe/aminoacids/display_aminoacid_table.rb +1 -0
- data/lib/bioroebe/aminoacids/show_hydrophobicity.rb +1 -6
- data/lib/bioroebe/base/base_module/base_module.rb +36 -0
- data/lib/bioroebe/base/colours_for_base/colours_for_base.rb +18 -8
- data/lib/bioroebe/base/commandline_application/commandline_application.rb +13 -9
- data/lib/bioroebe/base/commandline_application/commandline_arguments.rb +24 -19
- data/lib/bioroebe/base/commandline_application/misc.rb +66 -49
- data/lib/bioroebe/base/commandline_application/opn.rb +8 -8
- data/lib/bioroebe/base/commandline_application/reset.rb +5 -3
- data/lib/bioroebe/base/internal_hash_module/internal_hash_module.rb +42 -0
- data/lib/bioroebe/base/misc.rb +35 -0
- data/lib/bioroebe/base/prototype/misc.rb +15 -9
- data/lib/bioroebe/base/prototype/reset.rb +10 -0
- data/lib/bioroebe/cleave_and_digest/digestion.rb +10 -2
- data/lib/bioroebe/cleave_and_digest/trypsin.rb +104 -50
- data/lib/bioroebe/codon_tables/frequencies/parse_frequency_table.rb +2 -10
- data/lib/bioroebe/codons/codons.rb +1 -1
- data/lib/bioroebe/codons/convert_this_codon_to_that_aminoacid.rb +208 -59
- data/lib/bioroebe/codons/possible_codons_for_this_aminoacid.rb +1 -9
- data/lib/bioroebe/codons/show_codon_tables.rb +8 -3
- data/lib/bioroebe/codons/show_codon_usage.rb +15 -4
- data/lib/bioroebe/colours/rev.rb +4 -1
- data/lib/bioroebe/constants/aminoacids_and_proteins.rb +1 -0
- data/lib/bioroebe/constants/database_constants.rb +1 -1
- data/lib/bioroebe/constants/files_and_directories.rb +31 -4
- data/lib/bioroebe/constants/misc.rb +20 -0
- data/lib/bioroebe/constants/nucleotides.rb +7 -0
- data/lib/bioroebe/conversions/dna_to_aminoacid_sequence.rb +109 -39
- data/lib/bioroebe/count/count_amount_of_aminoacids.rb +3 -2
- data/lib/bioroebe/count/count_amount_of_nucleotides.rb +3 -0
- data/lib/bioroebe/cpp +1 -0
- data/lib/bioroebe/crystal/README.md +2 -0
- data/lib/bioroebe/crystal/to_rna.cr +19 -0
- data/lib/bioroebe/data/README.md +11 -8
- data/lib/bioroebe/data/electron_microscopy/pos_example.pos +396 -0
- data/lib/bioroebe/data/electron_microscopy/test_particles.star +36 -0
- data/lib/bioroebe/data/fasta/human/Homo_sapiens_hemoglobin_subunit_alpha_HBB_mRNA.fasta +9 -0
- data/lib/bioroebe/data/fasta/human/Homo_sapiens_hemoglobin_subunit_beta_HBB_mRNA.fasta +8 -0
- data/lib/bioroebe/data/fasta/human/README.md +2 -0
- data/lib/bioroebe/dotplots/advanced_dotplot.rb +1 -1
- data/lib/bioroebe/electron_microscopy/coordinate_analyzer.rb +15 -18
- data/lib/bioroebe/{fasta_and_fastq/parse_fasta/run.rb → electron_microscopy/electron_microscopy_module.rb} +16 -8
- data/lib/bioroebe/electron_microscopy/fix_pos_file.rb +1 -9
- data/lib/bioroebe/electron_microscopy/flipy.rb +83 -0
- data/lib/bioroebe/electron_microscopy/parse_coordinates.rb +2 -10
- data/lib/bioroebe/electron_microscopy/read_file_xmd.rb +1 -9
- data/lib/bioroebe/electron_microscopy/simple_star_file_generator.rb +4 -9
- data/lib/bioroebe/enzymes/has_this_restriction_enzyme.rb +10 -3
- data/lib/bioroebe/enzymes/restriction_enzyme.rb +23 -1
- data/lib/bioroebe/enzymes/restriction_enzymes/statistics.rb +65 -0
- data/lib/bioroebe/fasta_and_fastq/autocorrect_the_name_of_this_fasta_file.rb +1 -9
- data/lib/bioroebe/fasta_and_fastq/compact_fasta_file/compact_fasta_file.rb +7 -9
- data/lib/bioroebe/fasta_and_fastq/fasta_defline/fasta_defline.rb +1 -5
- data/lib/bioroebe/fasta_and_fastq/fasta_to_yaml/fasta_to_yaml.rb +81 -0
- data/lib/bioroebe/fasta_and_fastq/parse_fasta/parse_fasta.rb +1518 -7
- data/lib/bioroebe/fasta_and_fastq/return_fasta_subsection_of_this_file.rb +11 -2
- data/lib/bioroebe/fasta_and_fastq/show_fasta_headers.rb +27 -12
- data/lib/bioroebe/fasta_and_fastq/simplify_fasta_header/simplify_fasta_header.rb +1 -5
- data/lib/bioroebe/fasta_and_fastq/split_this_fasta_file_into_chromosomes/constants.rb +0 -5
- data/lib/bioroebe/genome/README.md +4 -0
- data/lib/bioroebe/genome/genome.rb +130 -0
- data/lib/bioroebe/genomes/genome_pattern.rb +3 -9
- data/lib/bioroebe/gui/gtk +1 -0
- data/lib/bioroebe/gui/gtk3/alignment/alignment.rb +106 -137
- data/lib/bioroebe/gui/gtk3/aminoacid_composition/aminoacid_composition.rb +27 -61
- data/lib/bioroebe/gui/gtk3/aminoacid_composition/customized_dialog.rb +1 -1
- data/lib/bioroebe/gui/gtk3/blosum_matrix_viewer/blosum_matrix_viewer.rb +1 -2
- data/lib/bioroebe/gui/gtk3/calculate_cell_numbers_of_bacteria/calculate_cell_numbers_of_bacteria.rb +1 -2
- data/lib/bioroebe/gui/gtk3/controller/controller.rb +46 -29
- data/lib/bioroebe/gui/gtk3/dna_to_aminoacid_widget/dna_to_aminoacid_widget.rb +77 -52
- data/lib/bioroebe/gui/gtk3/dna_to_reverse_complement_widget/dna_to_reverse_complement_widget.rb +1 -2
- data/lib/bioroebe/gui/gtk3/fasta_table_widget/fasta_table_widget.rb +100 -23
- data/lib/bioroebe/gui/gtk3/format_converter/format_converter.rb +1 -2
- data/lib/bioroebe/gui/gtk3/gene/gene.rb +1 -2
- data/lib/bioroebe/gui/gtk3/hamming_distance/hamming_distance.rb +43 -30
- data/lib/bioroebe/gui/gtk3/levensthein_distance/levensthein_distance.rb +1 -2
- data/lib/bioroebe/gui/gtk3/nucleotide_analyser/nucleotide_analyser.rb +120 -73
- data/lib/bioroebe/gui/gtk3/primer_design_widget/primer_design_widget.rb +1 -2
- data/lib/bioroebe/gui/gtk3/protein_to_DNA/protein_to_DNA.rb +19 -20
- data/lib/bioroebe/gui/gtk3/random_sequence/random_sequence.rb +20 -13
- data/lib/bioroebe/gui/gtk3/restriction_enzymes/restriction_enzymes.rb +1 -2
- data/lib/bioroebe/gui/gtk3/show_codon_table/misc.rb +97 -22
- data/lib/bioroebe/gui/gtk3/show_codon_table/show_codon_table.rb +3 -73
- data/lib/bioroebe/gui/gtk3/show_codon_usage/show_codon_usage.rb +1 -2
- data/lib/bioroebe/gui/gtk3/sizeseq/sizeseq.rb +1 -2
- data/lib/bioroebe/gui/gtk3/three_to_one/three_to_one.rb +1 -2
- data/lib/bioroebe/gui/gtk3/www_finder/www_finder.rb +1 -2
- data/lib/bioroebe/gui/javafx/bioroebe/Bioroebe.class +0 -0
- data/lib/bioroebe/gui/javafx/bioroebe/Bioroebe.java +104 -0
- data/lib/bioroebe/gui/javafx/bioroebe.jar +0 -0
- data/lib/bioroebe/gui/javafx/bioroebe.mf +1 -0
- data/lib/bioroebe/gui/javafx/module-info.class +0 -0
- data/lib/bioroebe/gui/javafx/module-info.java +5 -0
- data/lib/bioroebe/gui/jruby/alignment/alignment.rb +165 -0
- data/lib/bioroebe/gui/jruby/aminoacid_composition/aminoacid_composition.rb +166 -0
- data/lib/bioroebe/gui/libui/alignment/alignment.rb +3 -1
- data/lib/bioroebe/gui/libui/controller/controller.rb +116 -0
- data/lib/bioroebe/gui/libui/random_sequence/random_sequence.rb +18 -2
- data/lib/bioroebe/gui/libui/show_codon_table/show_codon_table.rb +2 -0
- data/lib/bioroebe/gui/libui/three_to_one/three_to_one.rb +8 -6
- data/lib/bioroebe/gui/shared_code/alignment/alignment_module.rb +102 -0
- data/lib/bioroebe/gui/shared_code/aminoacid_composition/aminoacid_composition_module.rb +94 -0
- data/lib/bioroebe/gui/shared_code/levensthein_distance/levensthein_distance_module.rb +18 -16
- data/lib/bioroebe/gui/shared_code/protein_to_DNA/protein_to_DNA_module.rb +14 -14
- data/lib/bioroebe/gui/swing/three_to_one/ThreeToOne$1.class +0 -0
- data/lib/bioroebe/gui/swing/three_to_one/ThreeToOne$CloseListener.class +0 -0
- data/lib/bioroebe/gui/swing/three_to_one/ThreeToOne.class +0 -0
- data/lib/bioroebe/gui/swing/three_to_one/ThreeToOne.java +141 -0
- data/lib/bioroebe/images/FORWARD_PRIMER.png +0 -0
- data/lib/bioroebe/images/REVERSE_PRIMER.png +0 -0
- data/lib/bioroebe/images/images.html +29845 -0
- data/lib/bioroebe/java/README.md +5 -0
- data/lib/bioroebe/java/bioroebe/AllInOne.java +1 -0
- data/lib/bioroebe/java/bioroebe/Base.class +0 -0
- data/lib/bioroebe/java/bioroebe/Base.java +39 -5
- data/lib/bioroebe/java/bioroebe/IsPalindrome.java +23 -5
- data/lib/bioroebe/java/bioroebe/SanitizeNucleotideSequence.java +0 -0
- data/lib/bioroebe/java/bioroebe/Sequence.java +28 -3
- data/lib/bioroebe/java/bioroebe/ToCamelcase.class +0 -0
- data/lib/bioroebe/java/bioroebe/ToCamelcase.java +16 -4
- data/lib/bioroebe/java/bioroebe/ToRNA.java +43 -0
- data/lib/bioroebe/java/bioroebe/ToplevelMethods.java +6 -0
- data/lib/bioroebe/java/bioroebe/{BisulfiteTreatment.class → src/BisulfiteTreatment.class} +0 -0
- data/lib/bioroebe/java/bioroebe/{Codons.class → src/Codons.class} +0 -0
- data/lib/bioroebe/java/bioroebe/src/Codons.java +35 -0
- data/lib/bioroebe/java/bioroebe/src/Commandline.class +0 -0
- data/lib/bioroebe/java/bioroebe/src/Commandline.java +101 -0
- data/lib/bioroebe/java/bioroebe/{Esystem.class → src/Esystem.class} +0 -0
- data/lib/bioroebe/java/bioroebe/{Esystem.java → src/Esystem.java} +6 -1
- data/lib/bioroebe/java/bioroebe/{GenerateRandomDnaSequence.class → src/GenerateRandomDnaSequence.class} +0 -0
- data/lib/bioroebe/java/bioroebe/{GenerateRandomDnaSequence.java → src/GenerateRandomDnaSequence.java} +8 -2
- data/lib/bioroebe/java/bioroebe/src/PartnerNucleotide.class +0 -0
- data/lib/bioroebe/java/bioroebe/src/PartnerNucleotide.java +56 -0
- data/lib/bioroebe/java/bioroebe/{RemoveFile.java → src/RemoveFile.java} +10 -4
- data/lib/bioroebe/java/bioroebe/{RemoveNumbers.class → src/RemoveNumbers.class} +0 -0
- data/lib/bioroebe/java/bioroebe/{RemoveNumbers.java → src/RemoveNumbers.java} +1 -0
- data/lib/bioroebe/java/bioroebe/src/toplevel_methods/BaseComposition.class +0 -0
- data/lib/bioroebe/java/bioroebe/src/toplevel_methods/BaseComposition.java +75 -0
- data/lib/bioroebe/misc/ruler.rb +11 -2
- data/lib/bioroebe/nucleotides/most_likely_nucleotide_sequence_for_this_aminoacid_sequence.rb +1 -9
- data/lib/bioroebe/nucleotides/sanitize_nucleotide_sequence.rb +59 -18
- data/lib/bioroebe/nucleotides/show_nucleotide_sequence.rb +7 -7
- data/lib/bioroebe/parsers/genbank_parser.rb +347 -26
- data/lib/bioroebe/parsers/gff.rb +1 -9
- data/lib/bioroebe/patterns/scan_for_repeat.rb +1 -5
- data/lib/bioroebe/pdb/fetch_fasta_sequence_from_pdb.rb +1 -9
- data/lib/bioroebe/pdb/parse_mmCIF_file.rb +1 -9
- data/lib/bioroebe/pdb/parse_pdb_file.rb +4 -10
- data/lib/bioroebe/project/project.rb +1 -1
- data/lib/bioroebe/python/README.md +1 -0
- data/lib/bioroebe/python/__pycache__/mymodule.cpython-39.pyc +0 -0
- data/lib/bioroebe/python/gui/gtk3/all_in_one.css +4 -0
- data/lib/bioroebe/python/gui/gtk3/all_in_one.py +59 -0
- data/lib/bioroebe/python/gui/gtk3/widget1.py +20 -0
- data/lib/bioroebe/python/gui/tkinter/all_in_one.py +91 -0
- data/lib/bioroebe/python/mymodule.py +8 -0
- data/lib/bioroebe/python/protein_to_dna.py +33 -0
- data/lib/bioroebe/python/shell/shell.py +19 -0
- data/lib/bioroebe/python/to_rna.py +14 -0
- data/lib/bioroebe/python/toplevel_methods/convert_dna_to_aminoacid_sequence.py +137 -0
- data/lib/bioroebe/python/toplevel_methods/esystem.py +12 -0
- data/lib/bioroebe/python/toplevel_methods/open_in_browser.py +20 -0
- data/lib/bioroebe/python/toplevel_methods/palindromes.py +52 -0
- data/lib/bioroebe/python/toplevel_methods/rds.py +13 -0
- data/lib/bioroebe/python/toplevel_methods/shuffleseq.py +23 -0
- data/lib/bioroebe/python/toplevel_methods/three_delimiter.py +37 -0
- data/lib/bioroebe/python/toplevel_methods/time_and_date.py +43 -0
- data/lib/bioroebe/python/toplevel_methods/to_camelcase.py +21 -0
- data/lib/bioroebe/requires/require_cleave_and_digest.rb +3 -1
- data/lib/bioroebe/requires/require_the_bioroebe_project.rb +3 -1
- data/lib/bioroebe/sequence/alignment.rb +14 -4
- data/lib/bioroebe/sequence/dna.rb +1 -0
- data/lib/bioroebe/sequence/nucleotide_module/nucleotide_module.rb +28 -25
- data/lib/bioroebe/sequence/protein.rb +105 -3
- data/lib/bioroebe/sequence/rna.rb +220 -0
- data/lib/bioroebe/sequence/sequence.rb +128 -40
- data/lib/bioroebe/shell/menu.rb +3815 -3696
- data/lib/bioroebe/shell/misc.rb +9019 -3133
- data/lib/bioroebe/shell/readline/readline.rb +1 -1
- data/lib/bioroebe/shell/shell.rb +1137 -28
- data/lib/bioroebe/siRNA/siRNA.rb +81 -1
- data/lib/bioroebe/string_matching/find_longest_substring.rb +3 -2
- data/lib/bioroebe/string_matching/hamming_distance.rb +1 -9
- data/lib/bioroebe/taxonomy/class_methods.rb +3 -8
- data/lib/bioroebe/taxonomy/constants.rb +4 -3
- data/lib/bioroebe/taxonomy/edit.rb +2 -1
- data/lib/bioroebe/taxonomy/help/help.rb +10 -10
- data/lib/bioroebe/taxonomy/help/helpline.rb +2 -2
- data/lib/bioroebe/taxonomy/info/check_available.rb +15 -9
- data/lib/bioroebe/taxonomy/info/info.rb +18 -11
- data/lib/bioroebe/taxonomy/info/is_dna.rb +46 -36
- data/lib/bioroebe/taxonomy/interactive.rb +140 -104
- data/lib/bioroebe/taxonomy/menu.rb +27 -18
- data/lib/bioroebe/taxonomy/parse_fasta.rb +3 -1
- data/lib/bioroebe/taxonomy/shared.rb +1 -0
- data/lib/bioroebe/taxonomy/taxonomy.rb +1 -0
- data/lib/bioroebe/toplevel_methods/aminoacids_and_proteins.rb +31 -24
- data/lib/bioroebe/toplevel_methods/colourize_related_methods.rb +164 -0
- data/lib/bioroebe/toplevel_methods/databases.rb +1 -1
- data/lib/bioroebe/toplevel_methods/digest.rb +18 -8
- data/lib/bioroebe/toplevel_methods/fasta_and_fastq.rb +107 -63
- data/lib/bioroebe/toplevel_methods/file_and_directory_related_actions.rb +14 -2
- data/lib/bioroebe/toplevel_methods/frequencies.rb +8 -1
- data/lib/bioroebe/toplevel_methods/misc.rb +175 -11
- data/lib/bioroebe/toplevel_methods/nucleotides.rb +118 -46
- data/lib/bioroebe/toplevel_methods/open_in_browser.rb +2 -0
- data/lib/bioroebe/toplevel_methods/palindromes.rb +75 -47
- data/lib/bioroebe/toplevel_methods/taxonomy.rb +3 -3
- data/lib/bioroebe/toplevel_methods/to_camelcase.rb +5 -0
- data/lib/bioroebe/utility_scripts/align_open_reading_frames.rb +1 -9
- data/lib/bioroebe/utility_scripts/check_for_mismatches/check_for_mismatches.rb +1 -9
- data/lib/bioroebe/utility_scripts/compacter/compacter.rb +251 -0
- data/lib/bioroebe/utility_scripts/compseq/compseq.rb +1 -9
- data/lib/bioroebe/utility_scripts/consensus_sequence.rb +6 -6
- data/lib/bioroebe/utility_scripts/create_batch_entrez_file.rb +1 -9
- data/lib/bioroebe/utility_scripts/dot_alignment.rb +1 -9
- data/lib/bioroebe/utility_scripts/move_file_to_its_correct_location.rb +1 -4
- data/lib/bioroebe/utility_scripts/parse_taxonomy.rb +2 -2
- data/lib/bioroebe/utility_scripts/permutations.rb +36 -9
- data/lib/bioroebe/utility_scripts/showorf/constants.rb +0 -5
- data/lib/bioroebe/utility_scripts/showorf/reset.rb +1 -4
- data/lib/bioroebe/version/version.rb +2 -2
- data/lib/bioroebe/www/embeddable_interface.rb +121 -58
- data/lib/bioroebe/www/sinatra/sinatra.rb +186 -71
- data/lib/bioroebe/yaml/aminoacids/amino_acids_long_name_to_one_letter.yml +2 -2
- data/lib/bioroebe/yaml/aminoacids/weight_of_common_proteins.yml +17 -17
- data/lib/bioroebe/yaml/configuration/browser.yml +1 -1
- data/lib/bioroebe/yaml/configuration/temp_dir.yml +1 -1
- data/lib/bioroebe/yaml/consensus_sequences/consensus_sequences.yml +1 -0
- data/lib/bioroebe/yaml/genomes/README.md +3 -4
- data/lib/bioroebe/yaml/nucleotides/nucleotides.yml +5 -0
- data/lib/bioroebe/yaml/restriction_enzymes/restriction_enzymes.yml +57 -57
- data/spec/README.md +6 -0
- data/spec/project_wide_specification/classes.md +5 -0
- metadata +107 -70
- data/doc/setup.rb +0 -1655
- data/lib/bioroebe/fasta_and_fastq/parse_fasta/constants.rb +0 -50
- data/lib/bioroebe/fasta_and_fastq/parse_fasta/initialize.rb +0 -86
- data/lib/bioroebe/fasta_and_fastq/parse_fasta/menu.rb +0 -117
- data/lib/bioroebe/fasta_and_fastq/parse_fasta/misc.rb +0 -981
- data/lib/bioroebe/fasta_and_fastq/parse_fasta/report.rb +0 -156
- data/lib/bioroebe/fasta_and_fastq/parse_fasta/reset.rb +0 -128
- data/lib/bioroebe/genbank/genbank_parser.rb +0 -291
- data/lib/bioroebe/java/bioroebe/AllInOne.class +0 -0
- data/lib/bioroebe/java/bioroebe/Cat.class +0 -0
- data/lib/bioroebe/java/bioroebe/Codons.java +0 -22
- data/lib/bioroebe/java/bioroebe/IsPalindrome.class +0 -0
- data/lib/bioroebe/java/bioroebe/PartnerNucleotide.class +0 -0
- data/lib/bioroebe/java/bioroebe/PartnerNucleotide.java +0 -19
- data/lib/bioroebe/java/bioroebe/SanitizeNucleotideSequence.class +0 -0
- data/lib/bioroebe/java/bioroebe/ToplevelMethods.class +0 -0
- data/lib/bioroebe/java/bioroebe.jar +0 -0
- data/lib/bioroebe/shell/add.rb +0 -108
- data/lib/bioroebe/shell/assign.rb +0 -360
- data/lib/bioroebe/shell/chop_and_cut.rb +0 -281
- data/lib/bioroebe/shell/constants.rb +0 -166
- data/lib/bioroebe/shell/download.rb +0 -335
- data/lib/bioroebe/shell/enable_and_disable.rb +0 -158
- data/lib/bioroebe/shell/enzymes.rb +0 -310
- data/lib/bioroebe/shell/fasta.rb +0 -345
- data/lib/bioroebe/shell/gtk.rb +0 -76
- data/lib/bioroebe/shell/history.rb +0 -132
- data/lib/bioroebe/shell/initialize.rb +0 -217
- data/lib/bioroebe/shell/loop.rb +0 -74
- data/lib/bioroebe/shell/prompt.rb +0 -107
- data/lib/bioroebe/shell/random.rb +0 -289
- data/lib/bioroebe/shell/reset.rb +0 -335
- data/lib/bioroebe/shell/scan_and_parse.rb +0 -135
- data/lib/bioroebe/shell/search.rb +0 -337
- data/lib/bioroebe/shell/sequences.rb +0 -200
- data/lib/bioroebe/shell/show_report_and_display.rb +0 -2901
- data/lib/bioroebe/shell/startup.rb +0 -127
- data/lib/bioroebe/shell/taxonomy.rb +0 -14
- data/lib/bioroebe/shell/tk.rb +0 -23
- data/lib/bioroebe/shell/user_input.rb +0 -88
- data/lib/bioroebe/shell/xorg.rb +0 -45
- data/lib/bioroebe/utility_scripts/compacter.rb +0 -131
- /data/lib/bioroebe/java/bioroebe/{BisulfiteTreatment.java → src/BisulfiteTreatment.java} +0 -0
- /data/lib/bioroebe/java/bioroebe/{RemoveFile.class → src/RemoveFile.class} +0 -0
data/doc/todo/bioroebe_todo.md
CHANGED
@@ -1,2823 +1,2278 @@
-
-(
-
-
-
-
-
-
-
+--------------------------------------------------------------------------------
+(2) → Integrate http://nc2.neb.com/NEBcutter2/cutshow.php?name=ffe1d68e-
+in particular the visual part.
+--------------------------------------------------------------------------------
+(3) → add support for:
+codon_of? this_aminoacid
+class CodonOfThisAminoacid
+^^^^
+--------------------------------------------------------------------------------
+(4) → Bioroebe::RestrictionEnzymes::Statistics.show
+^^^ improve these
+and then add it to the documentation.
+--------------------------------------------------------------------------------
+(5) → use glimmer + nebula for widgets
+^^^
+improve the nucleotide sequence analyser
+--------------------------------------------------------------------------------
+(6) → add to sinatra: a standalone server to query BAM files (and
+the corresponding reference). The server will return the
+content of a BAM file in the selected folder when the
+server is started up. The server used is sinatra.
+--------------------------------------------------------------------------------
+(7) → add the possibility to show what the effects of enzymes
+are
+AND inhibitors of enzymes. perhaps bioroebe can be
+used in systems biology one day
+--------------------------------------------------------------------------------
+(8) → Bioroebe::Sequence.new('AGCTTAGCGTACAGCTACGACGTAGTCTGACGA').cut_with? :AluI
+^^^ support this API and document it too
+--------------------------------------------------------------------------------
+(9) → integrate electron microscopy slowly and also add documentation
 about this AS YOU GO!!!!!
 ^^^ yup add more of it
-
-(
+--------------------------------------------------------------------------------
+(10) → Add save session support
 to reload our last activity completely ...
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-^^^ this is "retention in lumen of ER"
-find this too
-
-BUT!
-
-we must verify it
-
-^^ yep this is also called KDEL
-https://en.wikipedia.org/wiki/KDEL_(amino_acid_sequence)
--------------------------------------------------------------------------------
-(6) → Add "orthologs". this shall show us the top 25 orthologs or
+hmmm..
+This has to be well designed...
+Perhaps before we do so, we will add some
+class that analyses what we have
+call it:
+class AnalyseLocalDataset
+And it is called when the bioshell is
+called. Can be enabled and disabled.
+AND document it then.
+The idea is to provide additional information
+upon startup of the bioroebe shell.
+This is in preparation for save-session support.
+--------------------------------------------------------------------------------
+(11) → Lys-Asp-Glu-Leu
+if i.include?('-') and Bioroebe.is_in_the_three_letter_code?(i)
+end
+- Lys-Asp-Glu-Leu-COO-
+Lys-Asp-Glu-Leu
+^^^ this is "retention in lumen of ER"
+find this too
+BUT!
+we must verify it
+^^ yep this is also called KDEL
+https://en.wikipedia.org/wiki/KDEL_(amino_acid_sequence)
+--------------------------------------------------------------------------------
+(12) → Add "orthologs". this shall show us the top 25 orthologs or
 something. In the bioshell? Hmm. Not sure yet.
-
-(
-
-
-
-
-
-
-
-
-
-
-
--------------------------------------------------------------------------------
-(8) → analyse the SARS genome in bioroebe
+--------------------------------------------------------------------------------
+(13) → clone the functionality of this:
+http://www.kazusa.or.jp/codon/cgi-bin/countcodon.cgi
+http://www.kazusa.or.jp/codon/countcodon.html
+In other words, create a class that can generate such an output.
+^^^ This is now done.
+Then add this to a GUI as well as the www output.
+^^^ This still has to be done, though. We will use a ruby-gtk3
+widget first. And sinatra output too.
+AND document it as well
+--------------------------------------------------------------------------------
+(14) → analyse the SARS genome in bioroebe
 possibly also graphically
-
-
-
-(9) → In bioroebe, generate that .ps thingy graphical thing from the
+Are there new GUIs that we could combine? Hmmm.
+--------------------------------------------------------------------------------
+(15) → In bioroebe, generate that .ps thingy graphical thing from the
 Vienna RNA tutorial. Hmmm.
-
-
-
-
-human
+https://www.tbi.univie.ac.at/RNA/tutorial/
+--------------------------------------------------------------------------------
+(16) → get the insulin sequence from NCBI
+human
 then apply trypsin onto it
 and try it like this:
-
-
-
-
-
-
-
-
-
-
--------------------------------------------------------------------------------
-(1) → in bioroebe: UAG?
+trypsin --insulin
+^^^
+also document it then. well .....
+Also add:
+insulin?
+^^^ to show it
+Hmm. Perhaps also auto-download or something.
+--------------------------------------------------------------------------------
+(17) → in bioroebe: UAG?
 ^^^ show all stop codons with that in the bioshell
 all UAG sequences... hmm. and TAG?
 Finish that.
-
-(
+--------------------------------------------------------------------------------
+(18) → The position of a symbol in a string is the total number of
 symbols found to its left, including itself (e.g., the positions
 of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5,
 6, 15, 17, and 18). The symbol at position i
 of s is denoted by s[i].
-
-
-
-
-(1) → http://bioruby.org/rdoc/Bio/Blast.html
+^^^ add a solution there, a toplevel API
+!!!!!
+--------------------------------------------------------------------------------
+(19) → http://bioruby.org/rdoc/Bio/Blast.html
 ^^^ add support for BLAST
-
-(
+--------------------------------------------------------------------------------
+(20) → add: parse_pdb()
 With this we shall just show some info about a given
 .pdb file at hand.
 Also make it commandline based too + bioshell variant
 here, and a sinatra interface once this all works.
 Don't forget to document it!!!!!
 ^^^ and google a bit how others do that
-
-(
+--------------------------------------------------------------------------------
+(21) → pdb 1a6m
 ^^^ download this when that is used in the bioshell; we also have
-
-
+to use the download directory for this, so make sure that
+we do.
 ^^^ And then, also document this clearly.
-
-(
-^^^ slowly port this ... find out differences
-
-
-
-(
-
-
-
-(5) → Scan for leucine zipper!
-
+--------------------------------------------------------------------------------
+(22) → show_string
+^^^ slowly port this ... find out differences
+then unify into one method. right now we used
+two or something.
+--------------------------------------------------------------------------------
+(23) → Try to see if we can integrate this into our GUI:
+https://cdn.snapgene.com/assets/7.6.11/assets/images/snapgene/homepage/homepage-hero.png
+--------------------------------------------------------------------------------
+(24) → Scan for leucine zipper!
 This is ~25% implemented. We need to double-check what
 exactly a leucine zipper is.
-
-(
+--------------------------------------------------------------------------------
+(25) → Extend the sinatra-interface for the Rosalind task,
 perhaps add a sub-link to show which parts are solved
 as-is. Hmm. I am not continuing on this though.
-^^^^
-well - make rosalind anew again or something.
-
-
-(7) - Add a blast interface; both via the web-interface, GUI,
+^^^^
+well - make rosalind anew again or something.
+--------------------------------------------------------------------------------
+(26) → Add a blast interface; both via the web-interface, GUI,
 and also from the commandline.
-
-(
+--------------------------------------------------------------------------------
+(27) → Write a tutorial about primer design.
 also make sure that the GUI has support for this.
-
-(
+--------------------------------------------------------------------------------
+(28) → In the documentation examples, show some examples for how to work
 with different organisms.
-
-(
-
-
-
-(
-
-
-
-
-
-
-but also as shortcut via the commandline such as:
+--------------------------------------------------------------------------------
+(29) → In the bioshell, if "stop?" is issued, then the colouring isn't
+correct. It currently does not show any result. This has to
+be fixed.
+--------------------------------------------------------------------------------
+(30) → https://www.rubydoc.info/gems/biomart
+^^^ integrate biomart
+p biomart.list_datasets
+p biomart.datasets?
+--------------------------------------------------------------------------------
+(31) → Add trypsin and trypsinogen sequences, both as FASTA
+but also as shortcut via the commandline such as:
 show_orf :trypsine
 show_orf :trypsin
-
-
-(
-
-
-
-
-
-
-
-
-
-
--------------------------------------------------------------------------------
-(14) → MG1655
+Or something like this; and document it as well.
+--------------------------------------------------------------------------------
+(32) → 1..60
+setdna 57
+append stop
+1..60
+Next showing the nucleotides 1370 to 1462 (including 1370 and 1462).
+The length of the fragment will be 93 nucleotides.
+5' - ATGTGCAGTCAGGTGAATTTATTGAAAAATTTGAGGCTCCTGGTGGTGCAAATCAAAGAACTGCTCCTCAGTGGATGTTGCCTTTACTTCTAG - 3'
+^^^ here, when colourizing: if the last codon is a STOP codon,
+then we colourize it too.
+--------------------------------------------------------------------------------
+(33) → MG1655
 ^^^ input this to download the sequence. Also show it to the user.
-
-(
-
-(
+--------------------------------------------------------------------------------
+(34) → extend virus-information into the bioroebe project.
+--------------------------------------------------------------------------------
+(35) → Add a way to analyse the chemical structure of all
 aminoacids. We wish to show the chemical formula.
-
 E. g. if we input:
-
 "phenylalanin"
|
192
|
-
|
193
|
-
|
194
|
-
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
199
|
-
|
200
|
-
|
201
|
-
|
202
|
-
|
203
|
-
|
204
|
-
|
205
|
-
|
206
|
-
|
207
|
-
|
208
|
-
|
209
|
-
|
210
|
-
|
211
|
-
|
212
|
-
|
213
|
-
|
214
|
-
-------------------------------------------------------------------------------
|
215
|
-
(21) → rewrite the whole project anew
|
186
|
+
Then the C9N should be shown, of its -R part.
|
187
|
+
^^^ H wird aber rausgelöscht und O ebenso.
|
188
|
+
wtf?
|
189
|
+
I don't understand why it removes H and 0 so perhaps
|
190
|
+
dont remove that part. But still show the -R.
|
191
|
+
--------------------------------------------------------------------------------
|
192
|
+
(36) → FIX THE COLOURIZATION BUG; THIS ONE TRIGGERED THE WHOLE
|
193
|
+
REWRITE AFTER ALL!
|
194
|
+
--------------------------------------------------------------------------------
|
195
|
+
(37) → FIX THE TAXONOMY STUFF AS WELL
|
196
|
+
^^^^^^ AND DOCUMENT THIS STUFF.
|
197
|
+
--------------------------------------------------------------------------------
|
198
|
+
(38) → Do note that z will then be a String, not a sequence object anymore.
|
199
|
+
(This may be subject to change in the future, but for now, as of
|
200
|
+
**February 2020**, it is that way.)
|
201
|
+
^^^^
|
202
|
+
--------------------------------------------------------------------------------
|
203
|
+
(39) → ^^^ colours are appended. That should not be the case!
|
204
|
+
ADD SOMETHING NEW ... some todo entries
|
205
|
+
and some python tool
|
206
|
+
--------------------------------------------------------------------------------
|
207
|
+
(40) → rewrite the whole project anew
|
216
208
|
- improve the documentation
|
217
|
-
|
218
|
-
|
219
|
-
|
220
|
-
|
221
|
-
|
222
|
-
|
223
|
-
(
|
249
|
-
^^^^^^^^^^^^^^^
|
250
|
-
|
251
|
-
-
|
252
|
-
efetch "https://www.ncbi.nlm.nih.gov/gene/744779"
|
253
|
-
^^^ test this. again
|
254
|
-
|
255
|
-
-------------------------------------------------------------------------------
|
256
|
-
(25) fix tk-levensthein
|
257
|
-
-------------------------------------------------------------------------------
|
258
|
-
(26) → rewrite the whole project anew
|
209
|
+
- focus on class Protein first and add
|
210
|
+
all_dna_combinations or something like
|
211
|
+
that, as well as:
|
212
|
+
.backtrans
|
213
|
+
.reverse_translate
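Below is a sketch of `all_dna_combinations`, a hypothetical name taken from this entry. It assumes the NCBI standard genetic code (translation table 1) and works in three steps:

1. Build the inverse codon table.
2. Look up the synonymous codons for each residue of a one-letter protein string.
3. Enumerate every DNA sequence that encodes the protein.

The number of results grows multiplicatively with the synonymous codons per residue, so this is only practical for short peptides:

```ruby
BASES = %w[T C A G].freeze
# NCBI standard genetic code (table 1); codons enumerated in TCAG order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

# Inverse table: one-letter amino acid (or '*' for stop) => synonymous codons.
INVERSE_TABLE = CODON_TABLE.group_by { |_codon, aa| aa }
                           .transform_values { |pairs| pairs.map(&:first) }
                           .freeze

# Hypothetical back-translation: every DNA sequence that encodes +protein+.
# Raises KeyError for letters that are not in the standard code.
def all_dna_combinations(protein)
  codon_lists = protein.upcase.chars.map { |aa| INVERSE_TABLE.fetch(aa) }
  return [''] if codon_lists.empty?

  first, *rest = codon_lists
  first.product(*rest).map(&:join)
end

p all_dna_combinations('MF')
```

`.backtrans` / `.reverse_translate` on a Protein object could then simply delegate to such a method.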
|
214
|
+
--------------------------------------------------------------------------------
|
215
|
+
(41) →
|
216
|
+
Reduced alphabets for proteins | [not implemented yet]
|
217
|
+
^^^ check this as well
|
218
|
+
require 'bioroebe/base/commandline_application/aminoacids.rb'
|
219
|
+
^^^ verify whether we really need this in the commandline
|
220
|
+
application part. Or perhaps it should be part of
|
221
|
+
Base.
|
222
|
+
- in bioroebe, test sequel for taxonomy...
|
223
|
+
- LEARN FUCKING JAVA; combine it with bioroebe though. work through
|
224
|
+
bioroebe and, as you go, also port that stuff to Java.
|
225
|
+
Add restriction thingy complete in java AND bioroebe
|
226
|
+
show how many will be cut.
|
227
|
+
Improve this also in bioroebe at the same time
|
228
|
+
add a GUI in swing too, also in ruby-gtk.
|
229
|
+
document publish IMPROVE
|
230
|
+
for now just show how many segments we will
|
231
|
+
generate.
|
232
|
+
First focus on bioroebe.
|
233
|
+
^^^^^^^^^^^^^^^
|
234
|
+
-
|
235
|
+
efetch "https://www.ncbi.nlm.nih.gov/gene/744779"
|
236
|
+
^^^ test this. again
|
237
|
+
--------------------------------------------------------------------------------
|
238
|
+
(42) → fix tk-levensthein
|
239
|
+
--------------------------------------------------------------------------------
|
240
|
+
(43) → rewrite the whole project anew
|
259
241
|
- improve the documentation
|
260
|
-
|
261
|
-
|
262
|
-
|
263
|
-
|
264
|
-
|
265
|
-
|
266
|
-
|
267
|
-
(
|
268
|
-
|
269
|
-
|
270
|
-
|
271
|
-
|
272
|
-
|
273
|
-
(28) → SINATRA STUFF:
|
242
|
+
- rework the WHOLE tutorial as well
|
243
|
+
- focus on class Protein first and add
|
244
|
+
all_dna_combinations or something like
|
245
|
+
that
|
246
|
+
.backtrans
|
247
|
+
.reverse_translate
|
248
|
+
--------------------------------------------------------------------------------
|
249
|
+
(44) → analyze /Depot/Temp/Bioroebe/1CEZ.pdb
|
250
|
+
^^^
|
251
|
+
support this. It already works half-way; we started writing a PDB parser.
|
252
|
+
this should work in general, for .fasta files as well.
|
253
|
+
--------------------------------------------------------------------------------
|
254
|
+
(45) → SINATRA STUFF:
|
274
255
|
FIX AND EXTEND SINATRA IN BIOROEBE.
|
275
256
|
extend it too.
|
276
|
-
|
277
257
|
http://localhost:4567/random_aminoacids
|
278
258
|
^^^ add a form there
|
279
|
-
|
280
259
|
add emboss show orf or so
|
281
260
|
and special-display on sinatra kaa
|
282
261
|
where the nucleotide sequence has numbers
|
283
262
|
^^^
|
284
|
-
|
285
|
-
(
|
286
|
-
|
287
|
-
|
288
|
-
See:
|
263
|
+
--------------------------------------------------------------------------------
|
264
|
+
(46) → pick any virus and begin to amass tons of data; and then when done
|
265
|
+
also connect this into a GUI for use therein.
|
266
|
+
See:
|
289
267
|
https://raw.githubusercontent.com/labsquare/fastQt/master/screenshot.gif
|
290
|
-
^^^^^^^
|
306
|
-
(
|
346
|
-
(
|
367
|
-
then:
|
368
|
-
|
369
|
-
Bioroebe.digest_this_dna(:lambda_genome, with: :EcoRI)
|
370
|
-
Bioroebe.digest_this_dna("/root/Bioroebe/fasta/NC_001416.1_Enterobacteria_phage_lambda_complete_genome.fasta", with: :EcoRI)
|
371
|
-
|
372
|
-
|
373
|
-
^^^ test this API and document it as well.
|
374
|
-
^^^ and say how many fragments will be created in this CIRCULAR
|
375
|
-
DNA.
|
376
|
-
^^^ this now works kind of ... but it must be better
|
377
|
-
documented and we must test this with more data.
|
378
|
-
-------------------------------------------------------------------------------
|
379
|
-
(8) → add the bioroebe logo to sinatra, but as appropriate size,
|
380
|
-
via base64. perhaps width 50 or so. need to determine
|
381
|
-
which size fits here.
|
382
|
-
-------------------------------------------------------------------------------
|
383
|
-
(9) → Integrate http://nc2.neb.com/NEBcutter2/cutshow.php?name=ffe1d68e-
|
384
|
-
|
385
|
-
in particular the visual part.
|
386
|
-
-------------------------------------------------------------------------------
|
387
|
-
(10) → https://international.neb.com/products/r0196-ncii#Product%20Information
|
388
|
-
^^^ autogenerate such an image, aka restriction cutting enzyme
|
389
|
-
to indicate the target sequence.
|
390
|
-
-------------------------------------------------------------------------------
|
391
|
-
(6) → how to do codon optimization in E. coli? bioroebe must support this!
|
392
|
-
|
393
|
-
we must first get a display which codon is very commonly used in
|
394
|
-
E. coli, from some remote site ... a Japanese site, I think.
|
395
|
-
|
396
|
-
then, we analyse all possibilities.
|
397
|
-
|
398
|
-
and then we look which codons may be improvable - display
|
399
|
-
them on the commandline
|
400
|
-
|
401
|
-
class: OptimizeCodons.new(of_this_sequence)
|
402
|
-
-------------------------------------------------------------------------------
|
403
|
-
(7) → Molecular size of "Ubiquitin"? "8.5 kDa".
|
268
|
+
^^^^^^^
|
269
|
+
begin with a circovirus
|
270
|
+
^^^^^^^
|
271
|
+
DOCUMENT THIS AS YOU GO
|
272
|
+
research about circovirus too
|
273
|
+
https://www.ncbi.nlm.nih.gov/nuccore/NC_038391.1
|
274
|
+
--------------------------------------------------------------------------------
|
275
|
+
(47) → Fix:
|
276
|
+
require 'bioroebe/toplevel_methods/open_reading_frames.rb'
|
277
|
+
Something is wrong; it returns regions that contain
|
278
|
+
a stop codon, which can not be true.
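For comparison, here is a minimal forward-strand ORF scan in which a region can never contain an in-frame stop codon. An ORF opens at the first ATG in a frame and closes at the next stop codon in the same frame. This is an assumed definition: nested ATGs are not reported separately and the reverse strand is ignored. It is not necessarily the definition used in `open_reading_frames.rb`:

```ruby
# Assumed ORF definition: ATG ... first in-frame stop codon, forward strand only.
ORF_STOP_CODONS = %w[TAA TAG TGA].freeze

# Returns [start, end] pairs (0-based, end exclusive, stop codon included).
# No region contains an in-frame stop codon before its last codon, because
# the scan closes the ORF at the first stop codon it meets.
def forward_orfs(dna, min_codons: 2)
  seq = dna.upcase
  orfs = []
  (0..2).each do |frame|
    start = nil
    (frame..seq.length - 3).step(3) do |i|
      codon = seq[i, 3]
      if start.nil?
        start = i if codon == 'ATG'
      elsif ORF_STOP_CODONS.include?(codon)
        # The codon count includes both the start and the stop codon.
        orfs << [start, i + 3] if (i + 3 - start) / 3 >= min_codons
        start = nil
      end
    end
  end
  orfs.sort
end

p forward_orfs('ATGTTTTAAATGCCCTGACC')
```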
|
279
|
+
--------------------------------------------------------------------------------
|
280
|
+
(48) → Fix: extend glycovirology parts
|
281
|
+
seek stuff in viral genomes
|
282
|
+
--------------------------------------------------------------------------------
|
283
|
+
(49) →
|
284
|
+
seq = Bio::Sequence::NA.new("atgcatgcaaaaaaa")
|
285
|
+
puts seq
|
286
|
+
puts seq.complement
|
287
|
+
puts seq.subseq(3,8)
|
288
|
+
puts seq.subseq(3,8).complement # won't work
|
289
|
+
p seq.gc_percent
|
290
|
+
p (100 - seq.gc_percent) # at_percent
|
291
|
+
p seq.composition
|
292
|
+
puts seq.translate
|
293
|
+
puts seq.translate(2)
|
294
|
+
puts seq.translate(1,11)
|
295
|
+
puts seq.translate.codes
|
296
|
+
puts seq.translate.names
|
297
|
+
puts seq.translate.composition
|
298
|
+
puts seq.translate.molecular_weight
|
299
|
+
puts seq.complement.translate
|
300
|
+
^^^ make sure this works
|
301
|
+
seq = Bioroebe::Sequence.new("atgcatgcaaaaaaa")
|
302
|
+
puts seq
|
303
|
+
puts seq.complement
|
304
|
+
--------------------------------------------------------------------------------
|
305
|
+
(50) → In BioRoebe:
|
306
|
+
Add a table showing how compatible bioroebe is compared to the other
|
307
|
+
bio-projects, starting with biophp.
|
308
|
+
Also show how complete the implementation is in each,
|
309
|
+
including Bio (ruby-bio) the main ruby project here.
|
310
|
+
And add a table which functionality is implemented
|
311
|
+
in Java already.
|
312
|
+
--------------------------------------------------------------------------------
|
313
|
+
(51) →
|
314
|
+
********************************************************************************
|
315
|
+
What happens when we treat the lambda genome with EcoRI?
|
316
|
+
********************************************************************************
|
317
|
+
This yields 3 chromosomal fragments.
|
318
|
+
^^^ test this
|
319
|
+
So, first download the lambda genome.
|
320
|
+
download lambda
|
321
|
+
download lambda_genome
|
322
|
+
^^^
|
323
|
+
then:
|
324
|
+
Bioroebe.digest_this_dna(:lambda_genome, with: :EcoRI)
|
325
|
+
Bioroebe.digest_this_dna("/root/Bioroebe/fasta/NC_001416.1_Enterobacteria_phage_lambda_complete_genome.fasta", with: :EcoRI)
|
326
|
+
^^^ test this API and document it as well.
|
327
|
+
^^^ and say how many fragments will be created in this CIRCULAR
|
328
|
+
DNA.
|
329
|
+
^^^ this now works kind of ... but it must be better
|
330
|
+
documented and we must test this with more data.
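Below is a sketch of the fragment count this entry asks for. It assumes a complete digest and looks only for the EcoRI site G^AATTC on the top strand. The counting rule is simple: n cut sites give n + 1 fragments in linear DNA, but only n fragments in circular DNA. `cut_positions` and `fragment_lengths` are illustrative names, not the `Bioroebe.digest_this_dna` API:

```ruby
# EcoRI recognises GAATTC and cuts between G and A (G^AATTC).
ECORI_SITE = 'GAATTC'
ECORI_CUT_OFFSET = 1

# 0-based cut positions on the top strand. For circular DNA, a site that
# spans the end/start junction is not detected by this simple scan.
def cut_positions(dna, site = ECORI_SITE, offset = ECORI_CUT_OFFSET)
  seq = dna.upcase
  positions = []
  index = seq.index(site)
  while index
    positions << index + offset
    index = seq.index(site, index + 1)
  end
  positions
end

def fragment_lengths(dna, circular: false)
  cuts = cut_positions(dna)
  return [dna.length] if cuts.empty?

  if circular
    # n cuts in a circle give n fragments; the last one wraps around the origin.
    between = cuts.each_cons(2).map { |a, b| b - a }
    between + [dna.length - cuts.last + cuts.first]
  else
    # n cuts in a linear molecule give n + 1 fragments.
    ([0] + cuts + [dna.length]).each_cons(2).map { |a, b| b - a }
  end
end

toy = 'AAGAATTCAAAAGAATTCAA'
p fragment_lengths(toy)
p fragment_lengths(toy, circular: true)
```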
|
331
|
+
--------------------------------------------------------------------------------
|
332
|
+
(52) → https://international.neb.com/products/r0196-ncii#Product%20Information
|
333
|
+
^^^ autogenerate such an image, aka restriction cutting enzyme
|
334
|
+
to indicate the target sequence.
|
335
|
+
--------------------------------------------------------------------------------
|
336
|
+
(53) → how to do codon optimization in E. coli? bioroebe must support this!
|
337
|
+
we must first get a display which codon is very commonly used in
|
338
|
+
E. coli, from some remote site ... a Japanese site, I think.
|
339
|
+
then, we analyse all possibilities.
|
340
|
+
and then we look which codons may be improvable - display
|
341
|
+
them on the commandline
|
342
|
+
class: OptimizeCodons.new(of_this_sequence)
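Below is a rough sketch of what `OptimizeCodons` could report. It assumes the caller supplies a reference coding sequence (for example, highly expressed E. coli genes) and counts codon usage from it, instead of fetching a usage table from a remote site. `improvable_codons` is a hypothetical name. For each codon, it suggests the most frequent synonymous codon in the reference if that one is used more often:

```ruby
BASES = %w[T C A G].freeze
# NCBI standard genetic code (table 1); codons enumerated in TCAG order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

def codons(dna)
  dna.upcase.scan(/.../) # reading frame 1; a trailing partial codon is dropped
end

# Hypothetical stand-in for OptimizeCodons.new(of_this_sequence).
def improvable_codons(sequence, reference)
  usage = Hash.new(0)
  codons(reference).each { |c| usage[c] += 1 }

  suggestions = []
  codons(sequence).each_with_index do |codon, i|
    aa = CODON_TABLE[codon]
    next if aa.nil? # skip codons containing N or other non-ACGT letters

    synonyms = CODON_TABLE.select { |_c, a| a == aa }.keys
    best = synonyms.max_by { |c| usage[c] }
    next unless usage[best] > usage[codon]

    suggestions << { position: i * 3, codon: codon, aminoacid: aa, suggestion: best }
  end
  suggestions
end

p improvable_codons('TTAATG', 'CTGCTGCTGTTA')
```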
|
343
|
+
--------------------------------------------------------------------------------
|
344
|
+
(54) → Molecular size of "Ubiquitin"? "8.5 kDa".
|
404
345
|
^^^ this should be calculated automatically
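Calculating it automatically is a sum over residues: average residue masses plus one water molecule for the free termini. The sketch below assumes standard average residue masses. With the human ubiquitin sequence (76 residues) it gives about 8565 Da, i.e. roughly 8.6 kDa:

```ruby
# Average residue masses in Da (free amino acid minus H2O).
AVERAGE_RESIDUE_MASS = {
  'G' => 57.0519,  'A' => 71.0788,  'S' => 87.0782,  'P' => 97.1167,
  'V' => 99.1326,  'T' => 101.1051, 'C' => 103.1388, 'L' => 113.1594,
  'I' => 113.1594, 'N' => 114.1038, 'D' => 115.0886, 'Q' => 128.1307,
  'K' => 128.1741, 'E' => 129.1155, 'M' => 131.1926, 'H' => 137.1411,
  'F' => 147.1766, 'R' => 156.1875, 'Y' => 163.1760, 'W' => 186.2132
}.freeze
WATER_MASS = 18.01528

# Average molecular weight in Da; raises KeyError on non-standard letters.
def molecular_weight(protein)
  residues = protein.upcase.delete('^A-Z').chars
  return 0.0 if residues.empty?

  residues.sum { |aa| AVERAGE_RESIDUE_MASS.fetch(aa) } + WATER_MASS
end

UBIQUITIN = 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG'
printf("%.2f kDa\n", molecular_weight(UBIQUITIN) / 1000)
```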
|
405
|
-
|
406
|
-
(
|
407
|
-
|
408
|
-
(
|
346
|
+
--------------------------------------------------------------------------------
|
347
|
+
(55) → taxonomy !!!!!!!!!!!!!!!!!!
|
348
|
+
--------------------------------------------------------------------------------
|
349
|
+
(56) → Given a list of gene names that I would like to get chromosome/position
|
409
350
|
information for (in mm10). Is there some service online where I can
|
410
351
|
paste this list? ^^^ enable this
|
411
|
-
|
412
|
-
(
|
413
|
-
|
414
|
-
This works quite ok, but right now the approach is to store
|
415
|
-
this in a .yml file which is not ideal.
|
416
|
-
|
417
|
-
Thus, we have to add two things:
|
418
|
-
- The ability to store this into a SQL database
|
419
|
-
- The ability to batch-download all of these codons,
|
420
|
-
which first requires that we have a way to obtain all
|
421
|
-
taxonomic ids.
|
422
|
-
-------------------------------------------------------------------------------
|
423
|
-
(11) → Add a way in bioroebe to store a gene into a yaml file
|
424
|
-
or so, and to also load it up again. Perhaps simplify
|
425
|
-
this automatically. Need some ways to describe that.
|
426
|
-
-------------------------------------------------------------------------------
|
427
|
-
(12) → Make bioroebe very useful from the www, no matter if via sinatra
|
352
|
+
--------------------------------------------------------------------------------
|
353
|
+
(57) → Make bioroebe very useful from the www, no matter if via sinatra
|
428
354
|
or rails. It should be a tool-set project on the www as well.
|
429
|
-
|
430
|
-
(
|
431
|
-
Fasta file. For example, let's consider the file cor6_6.gb
|
432
|
-
which is included in the Biopython unit tests under the
|
355
|
+
--------------------------------------------------------------------------------
|
356
|
+
(58) → Suppose you have a GenBank file which you want to turn into a
|
357
|
+
Fasta file. For example, let's consider the file cor6_6.gb
|
358
|
+
which is included in the Biopython unit tests under the
|
433
359
|
GenBank directory.
|
434
|
-
|
435
|
-
|
436
|
-
|
437
|
-
|
438
|
-
|
439
|
-
|
440
|
-
|
441
|
-
|
442
|
-
THEN THIS CAN BE REMOVED!!!!!!!
|
443
|
-
|
444
|
-
-------------------------------------------------------------------------------
|
445
|
-
(14) → We need a table in which we can collect and compare the strong promoters
|
360
|
+
need to check that this is equivalent, think about the API
|
361
|
+
document it and then remove this entry.
|
362
|
+
^^^ also build a GUI for this.
|
363
|
+
call it format-converter or so
|
364
|
+
the GUI works somewhat but needs to be polished up.
|
365
|
+
THEN THIS CAN BE REMOVED!!!!!!!
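Until that API is settled, here is a minimal standalone sketch of the conversion. It assumes well-formed GenBank flat files: header lines, an `ORIGIN` line, numbered sequence lines and a `//` terminator. Only the first line of a multi-line DEFINITION is kept. `genbank_to_fasta` is a hypothetical name; Biopython's `SeqIO.convert` is the full-featured equivalent on the Python side:

```ruby
# Minimal GenBank -> FASTA conversion; records are separated by "//" lines.
def genbank_to_fasta(genbank_text, line_width: 60)
  records = genbank_text.split(%r{^//\s*$}).map(&:strip).reject(&:empty?)
  fasta_entries = records.map do |record|
    accession  = record[/^ACCESSION\s+(\S+)/, 1] || 'unknown'
    definition = record[/^DEFINITION\s+(.+)$/, 1].to_s.strip # first line only
    origin     = record.split(/^ORIGIN.*$/, 2)[1].to_s
    sequence   = origin.gsub(/[^A-Za-z]/, '').upcase # drop numbers and spaces
    body       = sequence.scan(/.{1,#{line_width}}/).join("\n")
    ">#{accession} #{definition}\n#{body}"
  end
  fasta_entries.join("\n")
end

record = <<~GENBANK
  LOCUS       TEST                      12 bp    DNA
  DEFINITION  Test record.
  ACCESSION   X00001
  ORIGIN
          1 acgtacgtac gt
  //
GENBANK
puts genbank_to_fasta(record)
```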
|
366
|
+
--------------------------------------------------------------------------------
|
367
|
+
(59) → We need a table in which we can collect and compare the strong promoters
|
446
368
|
of different organisms.
|
458
|
-
(16) → also add 30-33 to aminoacids hmmm difficult.
|
459
|
-
-------------------------------------------------------------------------------
|
460
|
-
(17) → http://bioinformatics.oxfordjournals.org/content/18/8/1135
|
369
|
+
strong_promoters.yml
|
370
|
+
--------------------------------------------------------------------------------
|
371
|
+
(60) → add:
|
372
|
+
start position of exons
|
373
|
+
and show the sequence based on that file
|
374
|
+
Normally there's a "gene" entry for each gene, so:
|
375
|
+
awk 'BEGIN{FS="\t"; OFS="\t"}{if($3 == "gene") print $1, $4, $5}' foo.gtf
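The same filter can be written in Ruby so that it also covers the "start position of exons" request: pass `feature: 'exon'` instead of the default `'gene'`. The method name and the script name in the usage comment are hypothetical:

```ruby
# Ruby counterpart of:
#   awk 'BEGIN{FS="\t"; OFS="\t"}{if($3 == "gene") print $1, $4, $5}' foo.gtf
# GTF columns: 1 seqname, 2 source, 3 feature, 4 start, 5 end (1-based, inclusive).
def feature_coordinates(gtf_lines, feature: 'gene')
  gtf_lines.each_with_object([]) do |line, rows|
    next if line.start_with?('#') # skip header/comment lines
    fields = line.chomp.split("\t")
    next unless fields[2] == feature
    rows << [fields[0], Integer(fields[3]), Integer(fields[4])]
  end
end

# Usage (hypothetical script name): ruby gtf_coordinates.rb foo.gtf [feature]
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  feature_coordinates(File.foreach(ARGV[0]), feature: ARGV[1] || 'gene').each do |row|
    puts row.join("\t")
  end
end
```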
|
376
|
+
--------------------------------------------------------------------------------
|
377
|
+
(61) → also add 30-33 to aminoacids hmmm difficult.
|
378
|
+
--------------------------------------------------------------------------------
|
379
|
+
(62) → http://bioinformatics.oxfordjournals.org/content/18/8/1135
|
461
380
|
"TFBS: Computational framework for transcription factor
|
462
|
-
binding site analysis"
|
463
|
-
|
464
|
-
|
465
|
-
|
466
|
-
|
467
|
-
|
468
|
-
(18) → They include trypsin, chymotrypsin, thrombin, plasmin, papain and factor Xa.
|
381
|
+
binding site analysis"
|
382
|
+
study the above and see if it can be included
|
383
|
+
into bioroebe
|
384
|
+
http://tfbs.genereg.net/
|
385
|
+
--------------------------------------------------------------------------------
|
386
|
+
(63) → They include trypsin, chymotrypsin, thrombin, plasmin, papain and factor Xa.
|
469
387
|
^^^ provide means to identify where they cut,
|
470
|
-
|
471
|
-
|
472
|
-
|
388
|
+
and show this then by simulating a digest.
|
389
|
+
return an array with the starting aminoacids.
|
390
|
+
also document this on bioroebe todo
|
473
391
|
this is done via digestion/digestions
|
474
392
|
but it's not quite perfect yet.
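For trypsin, the usual simplified rule fits in one regular expression: cleave C-terminal to K or R, but not when the next residue is P. The sketch below returns the fragments and their 0-based start positions. The method names are hypothetical, and the other proteases listed above would each need their own rule:

```ruby
# Simplified trypsin rule: cut after K or R unless the next residue is P.
TRYPSIN_SITE = /(?<=[KR])(?!P)/

def trypsin_digest(protein)
  protein.upcase.split(TRYPSIN_SITE)
end

# 0-based start position of each fragment within the original protein.
def fragment_start_positions(fragments)
  position = 0
  fragments.map do |fragment|
    start = position
    position += fragment.length
    start
  end
end

fragments = trypsin_digest('MKWVRPAKRG')
p fragments
p fragment_start_positions(fragments)
```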
|
475
|
-
|
476
|
-
(
|
477
|
-
|
393
|
+
--------------------------------------------------------------------------------
|
394
|
+
(64) → a) add a commandline way to generate a random protein
|
395
|
+
with a specified length and then display it on the
|
478
396
|
commandline [DONE] !!!
|
508
|
-
what does that mean? upcase as method? hmmm.
|
509
|
-
|
510
|
-
..........................................................................
|
511
|
-
(1) → http://www.biomart.org/other/user-docs.pdf
|
512
|
-
^^^ work through this
|
513
|
-
^^^ integrate the old .cgi part and improve as you go
|
514
|
-
..........................................................................
|
515
|
-
(1) → Access geninfo numbers easily.
|
397
|
+
bioroebe --random-aminoacids=33
|
398
|
+
bioroebe --n-aminoacids=33
|
399
|
+
sinatra:
|
400
|
+
random_aminoacids/33 [Implemented! 23.09.2019]
|
401
|
+
also added the gtk-GUI code here; needs to be
|
402
|
+
documented briefly, then this part is completely
|
403
|
+
done. Continue with random_aminoacids; in particular
|
404
|
+
add a gtk_entry that specifies, no, a spin button
|
405
|
+
that states how many no... an entry or so
|
406
|
+
to state how many aminoacids to generate
|
407
|
+
randomly
|
408
|
+
b) add a way to generate a cDNA sequence from such a
|
409
|
+
protein and view all possible sequences from that
|
410
|
+
sequence. ^^^
|
411
|
+
Enable this BOTH from the commandline AND from the
|
412
|
+
interactive variant and from sinatra! Hmmmm.
|
413
|
+
--------------------------------------------------------------------------------
|
414
|
+
(65) → add an option to design a
|
415
|
+
degenerate primer
|
416
|
+
--------------------------------------------------------------------------------
|
417
|
+
(66) → Add upcase to sequences and ensure that it works; also document it
|
418
|
+
internally and in the .pdf tutorial
|
419
|
+
what does that mean? upcase as method? hmmm.
|
420
|
+
--------------------------------------------------------------------------------
|
421
|
+
(67) → http://www.biomart.org/other/user-docs.pdf
|
422
|
+
^^^ work through this
|
423
|
+
^^^ integrate the old .cgi part and improve as you go
|
424
|
+
--------------------------------------------------------------------------------
|
425
|
+
(68) → Access geninfo numbers easily.
|
516
426
|
Search for them and download them.
|
534
|
-
(
|
535
|
-
|
536
|
-
(5) Continue with biojava in bioroebe.
|
537
|
-
|
538
|
-
→ We need to make some table that tells us what is implemented
|
427
|
+
--------------------------------------------------------------------------------
|
428
|
+
(69) → Add all of bioruby into bioroebe:
|
429
|
+
continuous project
|
430
|
+
https://github.com/biopython/biopython
|
431
|
+
https://github.com/bioruby/bioruby/tree/master/lib/bio
|
432
|
+
--------------------------------------------------------------------------------
|
433
|
+
(70) → https://github.com/bioruby/bioruby/issues/134
|
434
|
+
^^^ check this, for restriction enzymes
|
435
|
+
http://rebase.neb.com/rebase/enz/MboII.html
|
436
|
+
Bio::RestrictionEnzyme.cut(seq, 'MboII').primary rescue [seq]
|
437
|
+
=> ["agaagattaggatt", "gatgat"]
|
438
|
+
> seq = seq.reverse_complement
|
439
|
+
> Bio::RestrictionEnzyme.cut(seq, 'MboII').primary rescue [seq]
|
440
|
+
=> ["atcatcaatcctaatcttct"]
|
441
|
+
--------------------------------------------------------------------------------
|
442
|
+
(71) → Document how an ORF is defined for the bioroebe project.
|
443
|
+
--------------------------------------------------------------------------------
|
444
|
+
(72) → Continue with biojava in bioroebe.
|
445
|
+
→ We need to make some table that tells us what is implemented
|
539
446
|
in java.
|
540
|
-
|
447
|
+
→ Make it possible to randomly generate aminoacids, and then,
|
541
448
|
based on that, design degenerate DNA that matches it) ←
|
542
449
|
this also must work standalone, and be documented.
|
543
|
-
|
544
|
-
|
545
|
-
|
546
|
-
|
547
|
-
|
548
|
-
|
549
|
-
|
550
|
-
..........................................................................
|
551
|
-
(1) → The codon tables:
|
552
|
-
→ In January we added a codon-table GUI to ruby-gtk3.
|
553
|
-
|
450
|
+
document on bioroebe.cgi as well
|
451
|
+
We can generate degenerate primers now:
|
452
|
+
dprimer M-T-T-Y-Y-T-A-A-A-STOP
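Below is a sketch of how such a primer can be derived, assuming the NCBI standard genetic code. It collects all codons for each amino acid and merges them position by position into a single IUPAC code. This single-code merge is over-inclusive for Leu, Ser, Arg and STOP. For example, STOP becomes TRR, which also matches TGG (Trp); that is why codon tables often list two compressed codes for these residues. `degenerate_primer` is a hypothetical name, not the `dprimer` implementation:

```ruby
BASES = %w[T C A G].freeze
# NCBI standard genetic code (table 1); codons enumerated in TCAG order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

# IUPAC nucleotide codes keyed by the alphabetically sorted bases they stand for.
IUPAC = {
  'A' => 'A', 'C' => 'C', 'G' => 'G', 'T' => 'T',
  'AG' => 'R', 'CT' => 'Y', 'CG' => 'S', 'AT' => 'W', 'GT' => 'K', 'AC' => 'M',
  'CGT' => 'B', 'AGT' => 'D', 'ACT' => 'H', 'ACG' => 'V', 'ACGT' => 'N'
}.freeze

def degenerate_codon(aminoacid)
  codons = CODON_TABLE.select { |_codon, aa| aa == aminoacid }.keys
  raise ArgumentError, "unknown amino acid: #{aminoacid}" if codons.empty?

  (0..2).map { |i| IUPAC.fetch(codons.map { |c| c[i] }.uniq.sort.join) }.join
end

# Input in the style of the entry above: M-T-T-Y-Y-T-A-A-A-STOP
def degenerate_primer(spec)
  spec.split('-').map do |token|
    aa = token.upcase
    degenerate_codon(aa == 'STOP' ? '*' : aa)
  end.join
end

puts degenerate_primer('M-T-T-Y-Y-T-A-A-A-STOP')
```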
|
453
|
+
--------------------------------------------------------------------------------
|
454
|
+
(73) → The codon tables:
|
455
|
+
→ In January we added a codon-table GUI to ruby-gtk3.
|
554
456
|
also enable an inverse table.
|
572
|
-
add this as well. Then document it.
|
573
|
-
|
574
|
-
^^^ document this better too
|
457
|
+
Ala/A  GCT, GCC, GCA, GCG (GCN);  Leu/L  TTA, TTG, CTT, CTC, CTA, CTG (YTR, CTN)
|
458
|
+
Arg/R  CGT, CGC, CGA, CGG, AGA, AGG (CGN, MGR);  Lys/K  AAA, AAG (AAR)
|
459
|
+
Asn/N  AAT, AAC (AAY);  Met/M  ATG
|
460
|
+
Asp/D  GAT, GAC (GAY);  Phe/F  TTT, TTC (TTY)
|
461
|
+
Cys/C  TGT, TGC (TGY);  Pro/P  CCT, CCC, CCA, CCG (CCN)
|
462
|
+
Gln/Q  CAA, CAG (CAR);  Ser/S  TCT, TCC, TCA, TCG, AGT, AGC (TCN, AGY)
|
463
|
+
Glu/E  GAA, GAG (GAR);  Thr/T  ACT, ACC, ACA, ACG (ACN)
|
464
|
+
Gly/G  GGT, GGC, GGA, GGG (GGN);  Trp/W  TGG
|
465
|
+
His/H  CAT, CAC (CAY);  Tyr/Y  TAT, TAC (TAY)
|
466
|
+
Ile/I  ATT, ATC, ATA (ATH);  Val/V  GTT, GTC, GTA, GTG (GTN)
|
467
|
+
START  ATG;  STOP  TAA, TGA, TAG (TAR, TRA)
|
468
|
+
I think this is already stored in:
|
469
|
+
inverse_rna_codon_table.yml
|
470
|
+
table = Bio::CodonTable[1]
|
471
|
+
^^^^ this is quite a useful feature of bioruby. We need to
|
472
|
+
add this as well. Then document it.
|
473
|
+
^^^ document this better too
|
575
474
|
that we can now display all the different codon tables.
|
637
|
-
^^^
|
650
|
-
→
|
651
|
-
|
652
|
-
|
653
|
-
|
654
|
-
|
655
|
-
|
656
|
-
|
657
|
-
|
658
|
-
download 1fat
|
659
|
-
^^^ notify the user about this
|
660
|
-
but put it into the dir of bioshell
|
661
|
-
|
662
|
-
→ add:
|
663
|
-
|
664
|
-
set_dna :insulin
|
665
|
-
set_dna insulin
|
666
|
-
|
667
|
-
This shall allow us to use the sequence of human insulin
|
668
|
-
here. Also document this. Shall just make things more
|
669
|
-
convenient for us.
|
670
|
-
|
671
|
-
http://www.ncbi.nlm.nih.gov/gene/3630
|
672
|
-
|
673
|
-
insulin = 'ncbi_gene: 3630'
|
674
|
-
→ becomes: http://www.ncbi.nlm.nih.gov/gene/3630
|
675
|
-
|
676
|
-
wtf ... better to learn how NCBI works
|
677
|
-
-------------------------------------------------------------------------------
|
678
|
-
- Add a sequence table into bioroebe for GFP, YFP etc
|
679
|
-
and make this show in both the interactive bioshell and
|
680
|
-
also the main README.md
|
681
|
-
-------------------------------------------------------------------------------
|
682
|
-
- stop_frame1?
|
683
|
-
^^^ add support for this
|
684
|
-
and stop_frame2?
|
685
|
-
etc.
|
686
|
-
to show stop-codons in this colour
|
687
|
-
THEN UPLOAD!
|
688
|
-
^^^ this works now but is not documented
|
689
|
-
|
690
|
-
|
691
|
-
-------------------------------------------------------------------------------
|
692
|
-
|
693
|
-
- chop to first ATG
|
694
|
-
|
695
|
-
chop :ATG
|
696
|
-
|
697
|
-
^^^^ enable this, to chop towards the first ATG
|
698
|
-
sequence in the string
|
699
|
-
|
700
|
-
-------------------------------------------------------------------------------
|
701
|
-
→ http://www.biophp.org/stats/describe_data/demo.php?show=formula
|
702
|
-
|
703
|
-
^^^ should also add documentation like this, also via www interface
|
704
|
-
-------------------------------------------------------------------------------
|
705
|
-
→ add mouse chromosome URL, also in the bioshell
|
706
|
-
and the main README, to be of help for the
|
707
|
-
user. add a mouse subsection.
|
708
|
-
..........................................................................
|
709
|
-
→ fix the taxonomy stuff...
|
710
|
-
..........................................................................
|
711
|
-
(1) → add 2nd_orf
|
712
|
-
→ this shall scan for the 2nd orf
|
713
|
-
→ and third ORF as well, then, and document it.
|
714
|
-
..........................................................................
|
715
|
-
(2) → Add a "cutter-range example" in restriction enzymes +
|
475
|
+
This now sorta works semi-ok.
|
476
|
+
--------------------------------------------------------------------------------
|
477
|
+
(74) → In the bioroebe-shell, enable input such as:
|
478
|
+
NC_000011.10
|
479
|
+
This shall quickly download this sequence into the
|
480
|
+
local file, and also rename it properly.
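Below is a sketch of that shortcut using NCBI's E-utilities `efetch` endpoint (db=nuccore, FASTA as plain text). The regular expression for RefSeq-style accessions, the helper names, and naming the file after the accession are all assumptions:

```ruby
require 'net/http'
require 'uri'

EFETCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
# RefSeq-style accession such as NC_000011.10: prefix, underscore, digits, optional version.
REFSEQ_ACCESSION = /\A[A-Z]{2}_\d+(?:\.\d+)?\z/

def refseq_accession?(input)
  REFSEQ_ACCESSION.match?(input.strip)
end

def efetch_uri(accession, rettype: 'fasta')
  uri = URI(EFETCH_URL)
  uri.query = URI.encode_www_form(db: 'nuccore', id: accession, rettype: rettype, retmode: 'text')
  uri
end

# Downloads the record and stores it as <accession>.fasta; returns the path.
def download_accession(accession, directory: Dir.pwd)
  path = File.join(directory, "#{accession}.fasta")
  File.write(path, Net::HTTP.get(efetch_uri(accession)))
  path
end

# In the shell: download_accession(user_input) if refseq_accession?(user_input)
puts efetch_uri('NC_000011.10')
```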
|
481
|
+
--------------------------------------------------------------------------------
|
482
|
+
(75) → clone all of bioruby
|
483
|
+
--------------------------------------------------------------------------------
|
484
|
+
(76) → read through bioinformatics books and include material from them!!!
|
485
|
+
^^^^^ add more pictures ... possibly also of the GUIs.
|
486
|
+
Also work through biopython and port everything important
|
487
|
+
to bioroebe.
|
488
|
+
--------------------------------------------------------------------------------
|
489
|
+
(77) → Add: DetectMotif
|
490
|
+
This class shall be used for detecting subsequences.
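Below is a minimal sketch of such a class; the interface is an assumption. It reports every 0-based start position of a motif and includes overlapping matches, which matters for repetitive DNA:

```ruby
# Hypothetical sketch of the planned DetectMotif class.
class DetectMotif
  attr_reader :motif

  def initialize(motif)
    @motif = motif.to_s.upcase
    raise ArgumentError, 'motif must not be empty' if @motif.empty?
  end

  # All 0-based start positions of the motif, overlapping matches included.
  def positions_in(sequence)
    seq = sequence.upcase
    positions = []
    index = seq.index(motif)
    while index
      positions << index
      index = seq.index(motif, index + 1)
    end
    positions
  end
end

p DetectMotif.new('ata').positions_in('ATATATA') # → [0, 2, 4]
```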
|
491
|
+
--------------------------------------------------------------------------------
|
492
|
+
(78) → Add new functionality
|
493
|
+
--------------------------------------------------------------------------------
|
494
|
+
(79) → more documentation!!!
|
495
|
+
--------------------------------------------------------------------------------
|
496
|
+
(80) → Rewrite bioroebe completely - add some tests, too or so, to
|
497
|
+
test this. ^^^
|
498
|
+
That way we learn how to write tests.
|
499
|
+
AND ... we will actually start with the taxonomy project
|
500
|
+
so that it finally works again.
|
501
|
+
continue work on bioroebe
|
502
|
+
MAKE BIOROEBE EPIC because this is what I will make money with.
|
503
|
+
CONTINUE THE BIOROEBE PORT !
|
504
|
+
^^^^
|
505
|
+
require 'bioroebe/constants/remote_urls.rb'
|
506
|
+
^^^
|
507
|
+
ncbi taxonomy database: move this into this file.
|
508
|
+
# require 'bioroebe/constants/aminoacid_families.rb'
|
509
|
+
^^^ also ... fix this here.
|
510
|
+
also continue bioroebe port...
|
511
|
+
hmm. and perhaps add something else, like the option to have
|
512
|
+
multiple genes and multiple proteins
|
513
|
+
and define workspaces.
|
514
|
+
but start with taxonomy first.
|
515
|
+
^^^
|
516
|
+
Also during that rewrite, make sure that the quality of the
|
517
|
+
documentation improves. That way the whole project serves
|
518
|
+
as advertisement too.
|
519
|
+
wait with rewrite though......
|
520
|
+
clone bioroebe in java, STEP BY STEP, as means of learning java too,
|
521
|
+
for preparation at the TU Wien.
|
522
|
+
^^^ but first read some Java tutorial, because I don't know much about
|
523
|
+
java
|
524
|
+
CPK: international colour scheme
|
525
|
+
add document in bioroebe + registration
|
526
|
+
https://proteopedia.org/wiki/index.php/CPK
|
527
|
+
^^^
|
528
|
+
extend bioroebe... also rosalind
|
529
|
+
^^^
|
530
|
+
improve bioroebe so that it is super
|
531
|
+
and add C++
|
532
|
+
extend bioroebe sinatra interface
|
533
|
+
also add a footer to show which entries are available or so
|
534
|
+
→ in bioroebe, make the postgresql database work again ...
|
535
|
+
--------------------------------------------------------------------------------
|
536
|
+
(81) → ^^^ improve this whole project a lot
|
537
|
+
before uploading then send email
|
538
|
+
→ add:
|
539
|
+
set_dna :insulin
|
540
|
+
set_dna insulin
|
541
|
+
This shall allow us to use the sequence of human insulin
|
542
|
+
here. Also document this. Shall just make things more
|
543
|
+
convenient for us.
|
544
|
+
http://www.ncbi.nlm.nih.gov/gene/3630
|
545
|
+
insulin = 'ncbi_gene: 3630'
|
546
|
+
→ becomes: http://www.ncbi.nlm.nih.gov/gene/3630
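A minimal sketch of that lookup follows. It assumes a small hand-maintained symbol table, and the method name is hypothetical. Only the insulin entry (NCBI Gene ID 3630) and the `'ncbi_gene: 3630'` notation come from this todo item:

```ruby
# Hypothetical symbol table; extend it as more shortcuts are needed.
NCBI_GENE_IDS = { insulin: 3630 }.freeze
NCBI_GENE_URL = 'http://www.ncbi.nlm.nih.gov/gene/%d'

# Accepts :insulin, 'insulin' or a reference string such as 'ncbi_gene: 3630'.
def ncbi_gene_url(name_or_reference)
  text = name_or_reference.to_s.strip
  id = if (match = text.match(/\Ancbi_gene:\s*(\d+)\z/))
         match[1].to_i
       else
         NCBI_GENE_IDS.fetch(text.downcase.to_sym) do
           raise ArgumentError, "no NCBI gene id known for #{name_or_reference.inspect}"
         end
       end
  format(NCBI_GENE_URL, id)
end

puts ncbi_gene_url(:insulin)
```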
|
547
|
+
wtf ... better to learn how NCBI works
|
548
|
+
--------------------------------------------------------------------------------
|
549
|
+
(82) → Add a sequence table into bioroebe for GFP, YFP etc
|
550
|
+
and make this show in both the interactive bioshell and
|
551
|
+
also the main README.md
|
552
|
+
--------------------------------------------------------------------------------
|
553
|
+
(83) → http://www.biophp.org/stats/describe_data/demo.php?show=formula
|
554
|
+
^^^ should also add documentation like this, also via www interface
|
555
|
+
--------------------------------------------------------------------------------
|
556
|
+
(84) → Add a "cutter-range example" in restriction enzymes +
|
716
557
|
table + examples + tutorial
|
717
|
-
|
718
|
-
one example each in this overview.
|
719
|
-
|
558
|
+
one example each in this overview.
|
720
559
|
Also, add in the documentation where this
|
721
560
|
can be found.
|
722
|
-
|
723
|
-
(
|
724
|
-
But we want to do this on the dna-sequence rather
|
725
|
-
than the aminoacid sequence.
|
726
|
-
This works but the display is not ideal.
|
727
|
-
..........................................................................
|
728
|
-
(4) → Add some codon-usage analyzer. What shall it show? It
|
561
|
+
--------------------------------------------------------------------------------
|
562
|
+
(85) → Add some codon-usage analyzer. What shall it show? It
|
729
563
|
should show how many codons are used, frequencies etc...
|
730
564
|
by an organism, and compare that to other data.
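Below is a sketch of the counting half of such an analyzer, assuming the NCBI standard genetic code and reading frame 1. For each codon it reports three values: the raw count, the frequency per 1000 codons, and the codon's fraction among all codons for the same amino acid. The fraction is what differs between organisms. `codon_usage` is a hypothetical name; comparing against other organisms' data would build on this output:

```ruby
BASES = %w[T C A G].freeze
# NCBI standard genetic code (table 1); codons enumerated in TCAG order.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO_ACIDS.chars).to_h.freeze

def codon_usage(dna)
  # Reading frame 1; codons with non-ACGT letters are ignored.
  codons = dna.upcase.scan(/.../).select { |c| CODON_TABLE.key?(c) }
  counts = Hash.new(0)
  codons.each { |c| counts[c] += 1 }

  per_aminoacid = Hash.new(0)
  counts.each { |codon, n| per_aminoacid[CODON_TABLE[codon]] += n }

  counts.each_with_object({}) do |(codon, n), usage|
    aa = CODON_TABLE[codon]
    usage[codon] = { aminoacid: aa,
                     count: n,
                     per_thousand: 1000.0 * n / codons.size,
                     fraction: n.fdiv(per_aminoacid[aa]) }
  end
end

codon_usage('ATGCTGCTGTTATAA').each { |codon, stats| puts "#{codon} #{stats}" }
```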
|
731
|
-
|
732
|
-
(
|
733
|
-
|
565
|
+
--------------------------------------------------------------------------------
|
566
|
+
(86) → Implement a GPCR interface.
|
734
567
|
This is for "G-protein coupled receptors."
|
735
568
|
Denote which variants exist and so forth. Document it as well.
|
736
|
-
|
737
|
-
(
|
738
|
-
|
569
|
+
--------------------------------------------------------------------------------
|
570
|
+
(87) → alu?
|
739
571
|
Will read from the file `/Programs/Ruby/2.3.0/lib/ruby/site_ruby/2.3.0/bioroebe/yaml/alu_elements.yml`.
|
740
572
|
Bioroebe::ParseFasta: This sequence is assumed to be a protein.
|
741
573
|
This sequence has 1317 aminoacids.
|
742
|
-
|
743
574
|
We have identified a total of 1 entries in this fasta dataset.
|
744
575
|
The ALU sequence in humans may be:
|
766
|
-
and then display nice thingies to the user.
|
767
|
-
|
768
|
-
http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=3030
|
769
|
-
http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VEZ
|
770
|
-
|
771
|
-
in 3EML 2VTP 2VEZ
|
772
|
-
do
|
773
|
-
..........................................................................
|
774
|
-
(1) → Fully integrate electron microscopy then remove the old entry.
|
576
|
+
GC ...
|
577
|
+
^^^ but that is not correct ... hmmm.
|
578
|
+
(3) → The .pdb file that used to be distributed via bioroebe was way
|
579
|
+
too large. Perhaps add a way to download it instead if needed
|
580
|
+
e. g.:
|
581
|
+
common_downloads:
|
582
|
+
^^^ add this and document it or something like that
|
583
|
+
And perhaps add a small protein as an example how to
|
584
|
+
work with .pdb files instead.
|
585
|
+
--------------------------------------------------------------------------------
|
586
|
+
(88) → Extend bioroebe to allow downloading
|
587
|
+
PDB files
|
588
|
+
id 3030
|
589
|
+
and then display nice thingies to the user.
|
590
|
+
http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=3030
|
591
|
+
http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VEZ
|
592
|
+
in 3EML 2VTP 2VEZ
|
593
|
+
do
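This is a sketch of the download step. It assumes the RCSB file server pattern `https://files.rcsb.org/download/<ID>.pdb` instead of the older pdb.org URLs above. `pdb_download_url` and `download_pdb_file` are hypothetical names. Displaying the structure is left out.

```ruby
require "net/http"
require "uri"

# Hypothetical helpers, not part of bioroebe.
# A classic PDB ID has four characters: a digit 1-9 followed by three alphanumerics.
PDB_ID = /\A[1-9][A-Za-z0-9]{3}\z/

def pdb_download_url(pdb_id)
  id = pdb_id.to_s
  raise ArgumentError, "not a PDB ID: #{id.inspect}" unless PDB_ID.match?(id)
  URI("https://files.rcsb.org/download/#{id.upcase}.pdb")
end

# Downloads the .pdb file into the given directory and returns the local path.
def download_pdb_file(pdb_id, into: Dir.pwd)
  uri = pdb_download_url(pdb_id)
  response = Net::HTTP.get_response(uri)
  raise "download failed (HTTP #{response.code})" unless response.is_a?(Net::HTTPSuccess)
  path = File.join(into, File.basename(uri.path))
  File.write(path, response.body)
  path
end

# %w(3EML 2VTP 2VEZ).each { |id| download_pdb_file(id) }
```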
|
594
|
+
--------------------------------------------------------------------------------
|
595
|
+
(89) → Fully integrate electron microscopy, then remove the old entry.
|
775
596
|
Test it though.
|
776
597
|
Hmm... but ... we will first polish the main bioroebe
|
777
598
|
gem AND the taxonomy gem and THEN AFTERWARDS
|
778
599
|
integrate electron microscopy.
|
600
|
+
--------------------------------------------------------------------------------
|
601
|
+
(90) → ORF Finder:
|
782
602
|
We must add an ORF finder for the bioroebe project,
|
783
603
|
similar to the NCBI ORF Finder.
|
784
|
-
|
785
604
|
This works partially... start_stop works but we do not
|
786
605
|
yet find all subsequences.
|
789
|
-
(1) → must change determine whether we have protein or nucleotide or
|
606
|
+
--------------------------------------------------------------------------------
|
607
|
+
(91) → We must determine whether we have a protein or a nucleotide sequence, and do
|
790
608
|
so via a toplevel method!
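Here is a possible toplevel check, as a sketch. The method name is an assumption, and so is the heuristic:

- only A/C/G/T/N means DNA;
- only A/C/G/U/N means RNA;
- anything else means protein.

Short peptides that use only those letters would be misclassified.

```ruby
# Hypothetical toplevel helper; bioroebe may name this differently.
# Heuristic only: a peptide such as "GATTACA" would be reported as DNA.
def sequence_type(sequence)
  residues = sequence.upcase.gsub(/[^A-Z*]/, "")
  return :unknown if residues.empty?
  if residues.match?(/\A[ACGTN]+\z/)
    :dna
  elsif residues.match?(/\A[ACGUN]+\z/)
    :rna
  else
    :protein
  end
end

p sequence_type("ATGCGT")           # DNA letters only
p sequence_type("AUGCGU")           # contains U, no T
p sequence_type("DTLCIGYHANNSTDTV") # amino acid letters
```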
|
609
|
+
--------------------------------------------------------------------------------
|
610
|
+
(92) → there is a talens module.
|
611
|
+
we have to improve on it for a while
|
612
|
+
better documentation
|
613
|
+
more testing
|
614
|
+
then we can get rid of this entry here
|
615
|
+
--------------------------------------------------------------------------------
|
616
|
+
(93) → 33.44
|
799
617
|
Next showing the nucleotides 33 to 44 (including 33 and 44).
|
811
|
-
@type=:dna>
|
812
|
-
|
813
|
-
BIO SHELL> aaseq?
|
618
|
+
The length of the fragment will be 12 nucleotides.
|
619
|
+
5' - 2;70;130;180 - 3'
|
620
|
+
^^^ there is some problem; we somehow embed the colour codes,
|
621
|
+
which should not happen.
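Two small helpers may be relevant here, shown as a sketch with hypothetical names:

- a 1-based inclusive slice (33..44 gives 12 nucleotides);
- removal of ANSI colour escape sequences (`ESC[...m`) before a coloured string is reused or measured.

```ruby
# Hypothetical helpers, not bioroebe's API.
# 1-based, inclusive slice: positions 33..44 map to Ruby indices 32..43.
def nucleotides_between(sequence, from, to)
  sequence[(from - 1)..(to - 1)]
end

# Matches ANSI colour (SGR) codes such as "\e[1;31m" or "\e[0m".
ANSI_COLOUR_CODE = /\e\[[0-9;]*m/

def strip_colour_codes(text)
  text.gsub(ANSI_COLOUR_CODE, "")
end

sequence = "ACGT" * 15 # 60 nucleotides of made-up example input
fragment = nucleotides_between(sequence, 33, 44)
puts "The length of the fragment will be #{fragment.size} nucleotides."
coloured = "\e[1;31m#{fragment}\e[0m"
puts "5' - #{strip_colour_codes(coloured)} - 3'"
```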
|
622
|
+
--------------------------------------------------------------------------------
|
623
|
+
(94) → set_aa DTLCIGYHAN NSTDTVDTVL EKNVTVTHSV NLLEDKHNGK LCKLRGVAPL HLGKCNIAGW ILGNPECESL STASSWSYIV ETSNSDNGTC YPGDFINYEE LREQLSSVSS FERFEIFPKT SSWPNHDNKG VTAACPHAGA KSFYKNLIWL VKKGNSYPKL NQSYINDKGK EVLVLWGIHH PSTTADQQSL YQNADAYVFV GTSRYSKKFK PEIATRPKVR DQEGRMNYYW TLVEPGDKIT FEATGNLVVP RYAFMERNAG SGIIISDTPV HDCNTTCQTP EGAINTSLPF QNIHPITIGK CPKYVKSTKL RLATGLRNVP SIQSRGLFGA IAGFIEGGWT GMVDGWYGYH HQNEQGSGYA ADLKSTQNAI DKITNKVNSV IKMNTQFTAV GKEFNHLEKR IENLNKKVDD GFLDIWTYNA ELLVLLENER TLDYHDSNVK NLYEKVRNQL KNNAKEIGNG CFEFYHKCDN TCMESVKNGT YDYPKYSEEA KLNREKIDGV KLESTRIYHH HHHH
|
624
|
+
^^^ enable copy/pasting,
|
625
|
+
then reverse_sequence
|
626
|
+
dna_sequence?
|
627
|
+
@type=:dna>
|
628
|
+
BIO SHELL> aaseq?
|
814
629
|
DTLCIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDKHNGKLCKLRGVAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETSNSDNGTCYPGDFINYEELREQLSSVSSFERFEIFPKTSSWPNHDNKGVTAACPHAGAKSFYKNLIWLVKKGNSYPKLNQSYINDKGKEVLVLWGIHHPSTTADQQSLYQNADAYVFVGTSRYSKKFKPEIATRPKVRDQEGRMNYYWTLVEPGDKITFEATGNLVVPRYAFMERNAGSGIIISDTPVHDCNTTCQTPEGAINTSLPFQNIHPITIGKCPKYVKSTKLRLATGLRNVPSIQSRGLFGAIAGFIEGGWTGMVDGWYGYHHQNEQGSGYAADLKSTQNAIDKITNKVNSVIKMNTQFTAVGKEFNHLEKRIENLNKKVDDGFLDIWTYNAELLVLLENERTLDYHDSNVKNLYEKVRNQLKNNAKEIGNGCFEFYHKCDNTCMESVKNGTYDYPKYSEEAKLNREKIDGVKLESTRIYHHHHHH
|
820
|
-
(1) → add this functionality:
|
821
|
-
|
630
|
+
BIO SHELL> aasize?
|
631
|
+
This sequence has 50 aminoacids.
|
632
|
+
^^^ this is not correct.
|
633
|
+
--------------------------------------------------------------------------------
|
634
|
+
(95) → add this functionality:
|
822
635
|
meting temper
|
823
636
|
melting temper
|
824
637
|
melting_temperature?
|
850
|
-
Note that NCBI Blast and several other sites also already
|
851
|
-
have very good algorithms in this regards, so the prime
|
852
|
-
use case for BioRoebe is to explain a bit the algorithms
|
853
|
-
and also provide a commandline-way to calculate them,
|
854
|
-
using ruby. The latter may be useful and rather easy for
|
855
|
-
scripted use.
|
856
|
-
..........................................................................
|
857
|
-
(1) → show insulin
|
638
|
+
^^^ for short primers
|
639
|
+
→ 4°C for each G/C base pair
|
640
|
+
→ 2°C for each A/T base pair
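The rule of thumb above, often called the Wallace rule, can be written as a minimal sketch. The method name is hypothetical, and the result is only a rough estimate for short primers.

```ruby
# Hypothetical helper implementing the rule of thumb above:
# Tm = 4 °C per G/C base plus 2 °C per A/T base (rough estimate, short primers only).
def wallace_melting_temperature(primer)
  bases = primer.upcase
  gc = bases.count("GC")
  at = bases.count("AT")
  4 * gc + 2 * at
end

puts wallace_melting_temperature("ATGCATGCATGC") # 6 G/C and 6 A/T bases
```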
|
641
|
+
Also add an explanation as can be seen here:
|
642
|
+
http://comments.gmane.org/gmane.comp.lang.ruby.bio/1182
|
643
|
+
"I discovered the above discussion by accident, about two
|
644
|
+
years later. :)
|
645
|
+
I am not using email discussions or usenet groups/newsgroups
|
646
|
+
in general, largely because I have never been able to
|
647
|
+
keep up to date with them and deal with emails properly;
|
648
|
+
I am more of a casual email user myself, having grown up in a
|
649
|
+
www-world where phpBB really made it convenient to
|
650
|
+
communicate with other people. So I probably came a bit after the
|
651
|
+
era when people mainly used emails.
|
652
|
+
At any rate, when I noticed it, I noted on my todo list
|
653
|
+
that I will improve the melting-temperature calculation
|
654
|
+
of BioRoebe.
|
655
|
+
Note that NCBI Blast and several other sites also already
|
656
|
+
have very good algorithms in this regard, so the prime
|
657
|
+
use case for BioRoebe is to explain the algorithms a bit
|
658
|
+
and also to provide a commandline way to calculate them,
|
659
|
+
using ruby. The latter may be useful and rather easy for
|
660
|
+
scripted use.
|
661
|
+
--------------------------------------------------------------------------------
|
662
|
+
(96) → show insulin
|
858
663
|
^^^ to show the insulin structure
|
664
|
+
How to find it? No idea yet...
|
665
|
+
but we should make these structures available somewhere.
|
666
|
+
--------------------------------------------------------------------------------
|
667
|
+
(97) → Todo: find family of enzymes, based on sequence structure
|
863
668
|
alone.
|
867
|
-
^^^ this website is quite interesting; try to use components
|
868
|
-
from it.
|
869
|
-
-------------------------------------------------------------------------------
|
870
|
-
(1) → Add some option to show the aminoacid sequence, at the least
|
871
|
-
store it; and optionally show it.
|
872
|
-
|
873
|
-
possibly always report how many aminoacids are
|
874
|
-
part of that file; and optionally also show
|
875
|
-
the whole sequence.
|
876
|
-
-------------------------------------------------------------------------------
|
877
|
-
(1) → WORK THROUGH the PROTOCOL AT BOKU. THEN WORK THROUGH THE VARIOUST
|
669
|
+
--------------------------------------------------------------------------------
|
670
|
+
(98) → WORK THROUGH the PROTOCOL AT BOKU. THEN WORK THROUGH THE VARIOUS
|
878
671
|
TIDBITS AT UNI WIEN STARTING WITH HEIKO.
|
879
|
-
|
672
|
+
^^^ this is where we are now.
|
880
673
|
we are at the beginning of 1b ... hmmmm, so first work through the one at
|
881
674
|
BOKU. Then delete this.
|
675
|
+
--------------------------------------------------------------------------------
|
676
|
+
(99) → Begin tk-bindings for bioroebe, following the gtk stuff.
|
677
|
+
--------------------------------------------------------------------------------
|
678
|
+
(100) → frame_value = position_of_the_stop_codon - position_of_the_start_codon
|
886
679
|
^^^ continue on this ...
|
680
|
+
--------------------------------------------------------------------------------
|
681
|
+
(101) → improve both the gtk-apps parts, and the sinatra web-interface,
|
889
682
|
and other GUI-like elements. The idea is to make this software
|
890
683
|
more useful for people around the world, which should help
|
891
684
|
increase its adoption rate.
|
895
|
-
http://www.ncbi.nlm.nih.gov/nuccore/NM_007315.3?report=fasta&log$=seqview&format=text
|
685
|
+
--------------------------------------------------------------------------------
|
686
|
+
(102) → Look to integrate this:
|
687
|
+
http://www.ncbi.nlm.nih.gov/nuccore/NM_007315.3?report=fasta&log$=seqview&format=text
|
896
688
|
^^^
|
900
|
-
See: http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html
|
901
|
-
-------------------------------------------------------------------------------
|
902
|
-
(2) → set_dna_sequence alu
|
903
|
-
|
904
|
-
^^^ fetch random alu
|
905
|
-
|
906
|
-
^^^ alu sequence
|
907
|
-
Ok we started this now adding more details, but we
|
908
|
-
need to become better at searching for this
|
909
|
-
sequence.
|
910
|
-
-------------------------------------------------------------------------------
|
911
|
-
(3) → We need to make available the ... thingy magick
|
689
|
+
--------------------------------------------------------------------------------
|
690
|
+
(103) → We need to make available the ... thingy magick
|
912
691
|
EMBOSS functionality that may seem useful,
|
913
692
|
but also feel free to extend these parts for
|
914
693
|
bioroebe as necessary.
|
917
|
-
This will take more time, so first we finish with the
|
694
|
+
--------------------------------------------------------------------------------
|
695
|
+
(104) → integrate electron_microscopy fully
|
696
|
+
This will take more time, so first we finish with the
|
918
697
|
taxonomy module instead.
|
698
|
+
--------------------------------------------------------------------------------
|
699
|
+
(105) → Improve support for BLAST up until
|
922
700
|
middle of 2015 so that I am better prepared
|
923
701
|
for work-related stuff. In order for this
|
924
|
-
to succed, we first have to understand
|
702
|
+
to succeed, we first have to understand
|
925
703
|
BLAST very well.
|
926
|
-
|
927
704
|
So, work on the BLAST tutorial on the bioinf page:
|
931
|
-
(3) → integrate a "codon usage database", whatever this means.
|
705
|
+
bl bioinf; rf bioinf
|
706
|
+
--------------------------------------------------------------------------------
|
707
|
+
(106) → integrate a "codon usage database", whatever this means.
|
932
708
|
It is a cool database anyway. Then document this.
|
933
709
|
First, create a codon-usage analyser on a per-FASTA
|
934
710
|
site basis. Meaning we download a fasta sequence
|
935
711
|
and calculate the codon usage from there.
|
939
|
-
(4) → Input sequence:
|
940
|
-
|
712
|
+
^^^ and add some GUI to this. hmmm
|
713
|
+
--------------------------------------------------------------------------------
|
714
|
+
(107) → Input sequence:
|
941
715
|
MFLMVSPTAYHQNKDECFLP
|
942
716
|
TAYHQNKDECMVSPTAYHQN
|
943
717
|
KDECFLPTAYHQMVSPTAYH
|
944
718
|
QNKDECFLPTAYHQ
|
945
719
|
Reverse Translated sequence
|
946
|
-
|
947
720
|
ATG TTY YTNATGGTNWSNCCNACNGCNTAYCAYCARAAYAARGAYGARTGYTTYYTNCCN
|
948
721
|
ACNGCNTAYCAYCARAAYAARGAYGARTGYATGGTNWSNCCNACNGCNTAYCAYCARAAY
|
949
722
|
AARGAYGARTGYTTYYTNCCNACNGCNTAYCAYCARATGGTNWSNCCNACNGCNTAYCAY
|
950
723
|
CARAAYAARGAYGARTGYTTYYTNCCNACNGCNTAYCAYCAR
|
951
|
-
|
952
724
|
^^^ we should also show this on the commandline AND the
|
725
|
+
www ... hmmm.
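This sketches the reverse translation shown above, with one IUPAC-ambiguous codon per amino acid. The codons for the residues in the example are taken from its output. Two of them, `YTN` for L and `WSN` for S, are compact but also cover some non-synonymous codons. The entries for G, I, R and W are filled in from the standard genetic code and are an assumption. `back_translate` is a hypothetical name.

```ruby
# Hypothetical helper; one ambiguous (IUPAC) codon per amino acid.
BACK_TRANSLATION = {
  "A" => "GCN", "C" => "TGY", "D" => "GAY", "E" => "GAR", "F" => "TTY",
  "G" => "GGN", "H" => "CAY", "I" => "ATH", "K" => "AAR", "L" => "YTN",
  "M" => "ATG", "N" => "AAY", "P" => "CCN", "Q" => "CAR", "R" => "MGN",
  "S" => "WSN", "T" => "ACN", "V" => "GTN", "W" => "TGG", "Y" => "TAY"
}.freeze

def back_translate(protein)
  protein.upcase.delete("^A-Z").each_char.map { |aa|
    BACK_TRANSLATION.fetch(aa) { raise ArgumentError, "unknown amino acid: #{aa}" }
  }.join
end

puts back_translate("MFLMVSPTAYHQNKDECFLP")
```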
|
726
|
+
--------------------------------------------------------------------------------
|
727
|
+
(108) → enable a graphical layer so that we can find out which
|
956
728
|
transcription factor activates which gene(s). This
|
957
729
|
should show e. g. a transcription factor highlighting
|
958
730
|
a target genetic area.
|
731
|
+
--------------------------------------------------------------------------------
|
732
|
+
(109) → We should add more screenshots, make them available on imgur
|
961
733
|
as well, after storing them locally. Start with the more
|
962
734
|
important functionality.
|
982
|
-
ne day this will work again *shake fist*
|
983
|
-
-------------------------------------------------------------------------------
|
984
|
-
(5) → re1 = Bio::RestrictionEnzyme::DoubleStranded.new(enzyme1)
|
985
|
-
|
986
|
-
^^^ add this? hmmmm
|
987
|
-
^^^ from here.
|
988
|
-
-------------------------------------------------------------------------------
|
989
|
-
(1) → Colourize exon/intron boundaries.
|
990
|
-
-------------------------------------------------------------------------------
|
991
|
-
(2) → In bioroebe: enhance phylogeny stuff and perhaps automatically
|
735
|
+
--------------------------------------------------------------------------------
|
736
|
+
(110) → clone serial cloner or whatever the name was, that GUI,
|
737
|
+
so that we can offer the same functionality.
|
738
|
+
--------------------------------------------------------------------------------
|
739
|
+
(111) →
|
740
|
+
# * searching for PubMed IDs given a query string:
|
741
|
+
# * Bio::PubMed#esearch (recommended)
|
742
|
+
# * Bio::PubMed#search (only retrieves top 20 hits; will be deprecated)
|
743
|
+
^^^ implement this
|
744
|
+
--------------------------------------------------------------------------------
|
745
|
+
(112) → Be able to solve exercise 16 in bioroebe.
|
746
|
+
--------------------------------------------------------------------------------
|
747
|
+
(113) → re1 = Bio::RestrictionEnzyme::DoubleStranded.new(enzyme1)
|
748
|
+
^^^ add this? hmmmm
|
749
|
+
^^^ from here.
|
750
|
+
--------------------------------------------------------------------------------
|
751
|
+
(114) → Colourize exon/intron boundaries.
|
752
|
+
--------------------------------------------------------------------------------
|
753
|
+
(115) → In bioroebe: enhance phylogeny stuff and perhaps automatically
|
992
754
|
generate pictures here.
|
755
|
+
--------------------------------------------------------------------------------
|
756
|
+
(116) → In sinatra: add a backtranseq entry point, perhaps
|
995
757
|
alias it as well.
|
1010
|
-
and also in the interactive bioshell
|
1011
|
-
|
1012
|
-
https://github.com/rubygems/rubygems/blob/master/lib/rubygems/text.rb
|
1013
|
-
^^^ actually move that part into bioroebe itself...
|
1014
|
-
|
1015
|
-
-------------------------------------------------------------------------------
|
1016
|
-
(1) → add _source to all APIs in sinatra there. Ensure that this works
|
758
|
+
^^ sync this to ruby-gtk3? hmm
|
759
|
+
bioroebe --protein-to-dna
|
760
|
+
^^^ this shall start the GTK3 variant
|
761
|
+
--------------------------------------------------------------------------------
|
762
|
+
(117) → require 'rubygems/text'
|
763
|
+
include Gem::Text
|
764
|
+
levenshtein_distance 'shevy', 'chevy' # => 1
|
765
|
+
^^^ add some class that outputs
|
766
|
+
the Levenshtein distance on the commandline
|
767
|
+
and also in the interactive bioshell
|
768
|
+
https://github.com/rubygems/rubygems/blob/master/lib/rubygems/text.rb
|
769
|
+
^^^ actually move that part into bioroebe itself...
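If that part moves into bioroebe, a self-contained version could look like this sketch. It is the classic dynamic-programming edit distance, keeping only two rows of the table in memory. It does not depend on `Gem::Text`.

```ruby
# Hypothetical standalone replacement for Gem::Text#levenshtein_distance.
# Counts the minimum number of insertions, deletions and substitutions.
def levenshtein_distance(a, b)
  previous = (0..b.size).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1,        # deletion
                  current[j - 1] + 1,     # insertion
                  previous[j - 1] + cost  # substitution (or match)
                 ].min
    end
    previous = current
  end
  previous.last
end

puts levenshtein_distance("shevy", "chevy")
```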
|
770
|
+
--------------------------------------------------------------------------------
|
771
|
+
(118) → add _source to all APIs in sinatra there. Ensure that this works
|
1017
772
|
too. The user should be able to view the source code.
|
1018
773
|
^^^ it has been added for 2 methods so far in sinatra; we need
|
1024
|
-
also offer this functionality, through commandline, GUI
|
774
|
+
to add it for the remaining ones too. Then we can remove
|
775
|
+
this entry point.
|
776
|
+
--------------------------------------------------------------------------------
|
777
|
+
(119) → Check out ExPASy
|
778
|
+
PeptideCutter
|
779
|
+
also offer this functionality, through commandline, GUI
|
1025
780
|
and sinatra.
|
1026
|
-
|
1027
|
-
We now have added trypsin but we should add more here; and
|
781
|
+
https://web.expasy.org/peptide_cutter/
|
782
|
+
We now have added trypsin but we should add more here; and
|
1028
783
|
still have to add support for sinatra here.
|
784
|
+
--------------------------------------------------------------------------------
|
785
|
+
(120) → melting temperature subsection
|
1032
786
|
hmmm .... molecular weight calculation works now ... but
|
1033
787
|
... is it correct for a ssDNA string? hmm...
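One way to sanity-check the ssDNA number is the commonly cited formula for the anhydrous molecular weight of a single-stranded DNA oligo. It sums per-nucleotide weights (A 313.21, T 304.2, C 289.18, G 329.21 g/mol) and subtracts 61.96. Treat the constants as an assumption to verify against whatever source bioroebe uses. The method name is hypothetical.

```ruby
# Hypothetical helper. Anhydrous molecular weight of a linear ssDNA oligo,
# using commonly cited per-nucleotide values in g/mol (verify before relying on them).
SSDNA_WEIGHTS = { "A" => 313.21, "T" => 304.2, "C" => 289.18, "G" => 329.21 }.freeze

def ssdna_molecular_weight(sequence)
  bases = sequence.upcase.delete("^ACGT")
  return 0.0 if bases.empty?
  total = bases.each_char.sum { |base| SSDNA_WEIGHTS[base] }
  (total - 61.96).round(2)
end

puts ssdna_molecular_weight("ATGC")
```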
|
1044
|
-
degenerate_primer
|
1045
|
-
|
1046
|
-
^^^ epxnad that subsection
|
1047
|
-
more explanations and examples
|
1048
|
-
|
1049
|
-
-------------------------------------------------------------------------------
|
1050
|
-
(1) → Copy the functionality of plotorf:
|
1051
|
-
|
788
|
+
--------------------------------------------------------------------------------
|
789
|
+
(121) → Degenerate Primers
|
790
|
+
You can try to determine the degenerate primers via the Shell
|
791
|
+
component. Issue the following instructions:
|
792
|
+
degenerate_primer
|
793
|
+
^^^ expand that subsection with
|
794
|
+
more explanations and examples.
|
795
|
+
--------------------------------------------------------------------------------
|
796
|
+
(122) → Copy the functionality of plotorf:
|
1052
797
|
See:
|
1053
|
-
|
1054
|
-
http://www.bioinformatics.nl/cgi-bin/emboss/plotorf
|
1055
|
-
|
798
|
+
http://www.bioinformatics.nl/cgi-bin/emboss/plotorf
|
1056
799
|
Also extend the EMBOSS info on the main homepage.
|
1057
800
|
For plotorf we also need to be able to generate images.
|
1058
801
|
We also need a simpler toplevel API here, something like
|
1059
802
|
Bioroebe.return_all_open_reading_frames(of_this_sequence, use_these_as_start_codons = start_codons?, use_these_as_stop_codons = stop_codons?)
|
1060
803
|
^^^
|
1061
804
|
Bioroebe.return_all_ORFS
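This sketches what the proposed toplevel method could return. It assumes:

- the forward strand only;
- ATG as the default start codon;
- TAA/TAG/TGA as stop codons.

Every start codon is reported, so nested ORFs that share a stop codon show up too. That is one way to also find the subsequences mentioned under the ORF finder entry. Positions are 1-based and include the stop codon. The body is an illustration, not bioroebe's implementation.

```ruby
# Hypothetical implementation of the proposed toplevel API (forward strand only).
# Returns [start, stop, sequence] triples, 1-based and inclusive of the stop codon.
def return_all_open_reading_frames(sequence,
                                   start_codons = %w(ATG),
                                   stop_codons = %w(TAA TAG TGA))
  dna = sequence.upcase
  orfs = []
  (0..dna.size - 3).each do |start|
    next unless start_codons.include?(dna[start, 3])
    # Walk codon by codon in this reading frame until a stop codon appears.
    (start..dna.size - 3).step(3) do |pos|
      next unless stop_codons.include?(dna[pos, 3])
      orfs << [start + 1, pos + 3, dna[start...(pos + 3)]]
      break
    end
  end
  orfs
end

return_all_open_reading_frames("ATGAAAATGTAA").each { |orf| p orf }
```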
|
1068
|
-
See the following example:
|
1069
|
-
|
1070
|
-
BIO SHELL> highlight AAA
|
1071
|
-
5' - GTAACTGTTAAACTGTCAGGCAGGCGCTCAGGTGTACGTTTGATGCTCAGTAGTATTCCATTCTCGCGAGGGTCACGATACCCAAGATCTCCATGGCTTTCTGTTAGACGCAGCCGTGGACGACTAGAGCGTTTTTTTTTGGAAAGTATATGACCAGCACTCTACATCCTAACTAGAAGGTCTTCTAGGCGTACCAATATTAACGAATAGTGAGTGGTTACCCGTACCCGTCATGACGTCTATCATTAATT - 3'
|
1072
|
-
BIO SHELL>
|
805
|
+
--------------------------------------------------------------------------------
|
806
|
+
(123) → Start nucleotide position is at: 142
|
807
|
+
See the following example:
|
808
|
+
BIO SHELL> highlight AAA
|
809
|
+
5' - GTAACTGTTAAACTGTCAGGCAGGCGCTCAGGTGTACGTTTGATGCTCAGTAGTATTCCATTCTCGCGAGGGTCACGATACCCAAGATCTCCATGGCTTTCTGTTAGACGCAGCCGTGGACGACTAGAGCGTTTTTTTTTGGAAAGTATATGACCAGCACTCTACATCCTAACTAGAAGGTCTTCTAGGCGTACCAATATTAACGAATAGTGAGTGGTTACCCGTACCCGTCATGACGTCTATCATTAATT - 3'
|
810
|
+
BIO SHELL>
|
1073
811
|
^^^ this does not work; nothing is highlighted.
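This sketches what `highlight AAA` could do, with hypothetical helper names. It wraps every non-overlapping match in ANSI colour codes. It also reports the 1-based start positions, including overlapping ones, so `AAAA` yields two positions but only the first `AAA` is coloured.

```ruby
# Hypothetical helpers for the bioshell "highlight" command.
HIGHLIGHT_ON  = "\e[1;31m" # bold red
HIGHLIGHT_OFF = "\e[0m"

# Wraps every non-overlapping occurrence of the motif in colour codes.
def highlight(sequence, motif)
  sequence.gsub(motif) { |match| "#{HIGHLIGHT_ON}#{match}#{HIGHLIGHT_OFF}" }
end

# 1-based start positions of every match, overlapping ones included.
def motif_positions(sequence, motif)
  positions = []
  index = sequence.index(motif)
  while index
    positions << index + 1
    index = sequence.index(motif, index + 1)
  end
  positions
end

sequence = "GTAACTGTTAAACTGTCAGG"
puts "5' - #{highlight(sequence, "AAA")} - 3'"
p motif_positions(sequence, "AAA")
```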
|
1082
|
-
-------------------------------------------------------------------------------
|
1083
|
-
(3) → integrate the bioroebe_tutorial.cgi into the .md file completely.
|
1084
|
-
|
1085
|
-
-------------------------------------------------------------------------------
|
1086
|
-
(4) → Integrate everything from the biopython tutorial, if it makes
|
812
|
+
--------------------------------------------------------------------------------
|
813
|
+
(124) → Add a myristoylation signal:
|
814
|
+
Met-Gly-Xaa-Xaa-YXaa-Ser/Thr-Lys-Lys
|
815
|
+
^^^ but check this first.
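This is a provisional sketch of a scan for this motif. It assumes the motif must sit at the very N-terminus and that `YXaa` is meant as a third `Xaa` (any residue). Since the motif still has to be checked, the pattern and the names are hypothetical.

```ruby
# Hypothetical, provisional pattern for Met-Gly-Xaa-Xaa-Xaa-Ser/Thr-Lys-Lys
# at the N-terminus of a protein sequence (one-letter codes).
MYRISTOYLATION_SIGNAL = /\AMG...[ST]KK/

def myristoylation_signal?(protein)
  protein.upcase.match?(MYRISTOYLATION_SIGNAL)
end

p myristoylation_signal?("MGAAASKKLLE")
p myristoylation_signal?("MAGAASKKLLE")
```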
|
816
|
+
--------------------------------------------------------------------------------
|
817
|
+
(125) → integrate the bioroebe_tutorial.cgi into the .md file completely.
|
818
|
+
--------------------------------------------------------------------------------
|
819
|
+
(126) → Integrate everything from the biopython tutorial, if it makes
|
1087
820
|
sense.
|
1090
|
-
(5) → Improve the codon-optimizer in Bioroebe, including the
|
821
|
+
--------------------------------------------------------------------------------
|
822
|
+
(127) → Improve the codon-optimizer in Bioroebe, including the
|
1091
823
|
documentation. We need to make this really useful.
|
1104
|
-
^^^^^^^^^^
|
1105
|
-
It has been 5 years ...
|
1106
|
-
|
1107
|
-
^^^ taxonomy/colours/colours wird integriert
|
824
|
+
--------------------------------------------------------------------------------
|
825
|
+
(128) →
|
826
|
+
5'- TACACGGCACAT -3'
|
827
|
+
3'- ATGTGCCGTGTA -5'
|
828
|
+
Imperfect DNA mirror repeats (IMRs) are less than 100% symmetrical.
|
829
|
+
^^^ integrate mirror repeats creation
|
830
|
+
and searching for them. Hmmm.
|
831
|
+
--------------------------------------------------------------------------------
|
832
|
+
(129) → continue porting bioroebe/taxonomy
|
833
|
+
^^^^^^^^^^
|
834
|
+
It has been 5 years ...
|
835
|
+
^^^ taxonomy/colours/colours is being integrated
|
1108
836
|
^^^ that is the next step, so that
|
1112
|
-
(8) → find out which bacteria all contain the needle complex; find out
|
837
|
+
we no longer need it.
|
838
|
+
--------------------------------------------------------------------------------
|
839
|
+
(130) → find out which bacteria contain the needle complex; find out
|
1113
840
|
the sequence for the needle complex as well and study it;
|
1114
841
|
find the positions of the genes responsible.
|
1117
|
-
(9) → Add trypsin_digest, also in the shell, but possibly
|
842
|
+
--------------------------------------------------------------------------------
|
843
|
+
(131) → Add trypsin_digest, also in the shell, but possibly
|
1118
844
|
on the toplevel as well (if the input is a protein sequence).
|
1119
|
-
|
1120
845
|
Also, more generally in the shell, add this:
|
1121
|
-
|
1122
|
-
digest trypsin
|
1123
|
-
|
846
|
+
digest trypsin
|
1124
847
|
^^^ applied to the aminoacid sequence
|
1194
|
-
^^^ we need an analyze-mode as well.
|
1195
|
-
|
1196
|
-
..........................................................................
|
1197
|
-
(20) → ^^^^ add the ability to
|
1198
|
-
show a ruler AND highlighting as well
|
1199
|
-
^^^ then document it.
|
1200
|
-
..........................................................................
|
1201
|
-
(21) → https://github.com/bioperl/bioperl-live
|
1202
|
-
Look what we can take from ^^^.
|
1203
|
-
|
1204
|
-
https://github.com/bioperl/bioperl-live/tree/master/examples
|
1205
|
-
|
1206
|
-
..........................................................................
|
1207
|
-
(23) → continue biojava, and bioroebe a bit
|
1208
|
-
|
1209
|
-
Ideally we should have biojava o a working point.
|
1210
|
-
..........................................................................
|
1211
|
-
(24) → Clone all of Emboss. :)
|
1212
|
-
..........................................................................
|
1213
|
-
(25) → clone the functionality found at https://web.expasy.org/protparam/
|
1214
|
-
|
1215
|
-
https://web.expasy.org/cgi-bin/protparam/protparam
|
1216
|
-
^^^ this is halfway done...
|
1217
|
-
|
1218
|
-
^^^^ also add pI calculation:
|
1219
|
-
|
1220
|
-
Theoretical pI: 5.78
|
1221
|
-
|
1222
|
-
-------------------------------------------------------------------------------
|
1223
|
-
(27) → NP_417539.1
|
1224
|
-
|
848
|
+
"This is the result when digesting with trypsin."
|
849
|
+
And document it; but do not cleave if a proline
|
850
|
+
follows!!!
|
851
|
+
^^^ document this in the .md file too.
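The rule above can be written as a minimal sketch: split after every K or R unless the next residue is P. The regex does this with a look-behind plus a negative look-ahead. The method name is hypothetical, and this simple rule ignores further exceptions that dedicated tools may apply.

```ruby
# Hypothetical helper: split after every K or R that is not followed by P.
TRYPSIN_CLEAVAGE_SITE = /(?<=[KR])(?!P)/

def trypsin_digest(protein)
  protein.upcase.split(TRYPSIN_CLEAVAGE_SITE)
end

puts "This is the result when digesting with trypsin:"
trypsin_digest("AAKGGRPCCRDD").each { |peptide| puts "  #{peptide}" }
```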
|
852
|
+
--------------------------------------------------------------------------------
|
853
|
+
(132) → add codon usage in bioroebe
|
854
|
+
--------------------------------------------------------------------------------
|
855
|
+
(133) → Clone the following functionality.
|
856
|
+
http://www.bioinformatics.nl/cgi-bin/emboss/help/sirna
|
857
|
+
--------------------------------------------------------------------------------
|
858
|
+
(134) → Improve the "find and scan" subsection. We must be able to find
|
859
|
+
subsequences; check for "matches" as well, including the bioshell.
|
860
|
+
--------------------------------------------------------------------------------
|
861
|
+
(135) → Clone the CLUSTAL format alignment.
|
862
|
+
--------------------------------------------------------------------------------
|
863
|
+
(136) → We need to be able to load a whole genome into bioroebe,
|
864
|
+
and then be able to manipulate it.
|
865
|
+
^^^ perhaps test this with some example
|
866
|
+
data or so...
|
867
|
+
--------------------------------------------------------------------------------
|
868
|
+
(137) → Restriction enzymes:
|
869
|
+
Add a subsection about restriction enzymes including
|
870
|
+
examples, and also explain how to use this in bioroebe.
|
871
|
+
Minute by minute...
|
872
|
+
AND add links to useful resources.
|
873
|
+
^^^ start it... have to expand on it
|
874
|
+
Also, improve the part about restriction enzymes in
|
875
|
+
general, so that we can reproduce and verify the
|
876
|
+
information there.
|
877
|
+
--------------------------------------------------------------------------------
|
878
|
+
(138) → clone pepinfo
|
879
|
+
The program "pepinfo" plots various amino acid properties in
|
880
|
+
parallel for an input protein sequence.
|
881
|
+
The types of plot available are:
|
882
|
+
i. Hydrophobicity plots using the method of Kyte & Doolittle, the
|
883
|
+
optimal matching hydrophobicity scale (OHM) of Sweet & Eisenberg,
|
884
|
+
or consensus parameters (Eisenberg et al).
|
885
|
+
ii. Histogram of the presence of residues with the physico-chemical
|
886
|
+
properties: Tiny, Small, Aliphatic, Aromatic, Non-polar, Polar,
|
887
|
+
Charged, Positive, Negative.
|
888
|
+
The data are also written out to an output file.
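This sketches the first plot type, the Kyte & Doolittle hydropathy profile, as a sliding-window average that could then be plotted. The default window of 9 residues and the method name are assumptions.

```ruby
# Kyte & Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
  "A" => 1.8, "R" => -4.5, "N" => -3.5, "D" => -3.5, "C" => 2.5,
  "Q" => -3.5, "E" => -3.5, "G" => -0.4, "H" => -3.2, "I" => 4.5,
  "L" => 3.8, "K" => -3.9, "M" => 1.9, "F" => 2.8, "P" => -1.6,
  "S" => -0.8, "T" => -0.7, "W" => -0.9, "Y" => -1.3, "V" => 4.2
}.freeze

# Hypothetical helper: average hydropathy for each window of the sequence.
def hydropathy_profile(protein, window: 9)
  values = protein.upcase.each_char.map { |aa| KYTE_DOOLITTLE.fetch(aa) }
  values.each_cons(window).map { |slice| (slice.sum / window).round(2) }
end

p hydropathy_profile("IIIRRR", window: 3)
```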
|
889
|
+
--------------------------------------------------------------------------------
|
890
|
+
(139) → gff?
|
891
|
+
There are 6 .gff3 files in the current directory.
|
892
|
+
We will simply pass the first entry there into class Bioroebe::Parser::GFF.
|
893
|
+
The accession id is `NZ_CP011602.1`.
|
894
|
+
Bioroebe::Parser::GFF: We are instructed to split into standalone files, but we
|
895
|
+
Bioroebe::Parser::GFF: can not do so, as there is not more than one accession id
|
896
|
+
Bioroebe::Parser::GFF: in this file.
|
897
|
+
^^^ we need an analyze-mode as well.
|
898
|
+
--------------------------------------------------------------------------------
|
899
|
+
(140) → ^^^^ add the ability to
|
900
|
+
show a ruler AND highlighting as well
|
901
|
+
^^^ then document it.
|
902
|
+
--------------------------------------------------------------------------------
|
903
|
+
(141) → https://github.com/bioperl/bioperl-live
|
904
|
+
Look what we can take from ^^^.
|
905
|
+
https://github.com/bioperl/bioperl-live/tree/master/examples
|
906
|
+
--------------------------------------------------------------------------------
|
907
|
+
(142) → continue biojava, and bioroebe a bit
|
908
|
+
Ideally we should have biojava at a working point.
|
909
|
+
--------------------------------------------------------------------------------
|
910
|
+
(143) → clone the functionality found at https://web.expasy.org/protparam/
|
911
|
+
https://web.expasy.org/cgi-bin/protparam/protparam
|
912
|
+
^^^ this is halfway done...
|
913
|
+
^^^^ also add pI calculation:
|
914
|
+
Theoretical pI: 5.78
|
915
|
+
--------------------------------------------------------------------------------
|
916
|
+
(144) → NP_417539.1
|
1225
917
|
https://www.ncbi.nlm.nih.gov/protein/NP_417539.1
|
1226
918
|
https://www.ncbi.nlm.nih.gov/protein/NP_417539.1?report=fasta
|
1227
|
-
|
1228
|
-
^^^ if the input is exactly like the above, on the first line,
|
919
|
+
^^^ if the input is exactly like the above, on the first line,
|
1229
920
|
download the sequence.
|
1243
|
-
and try to help
|
1244
|
-
and extend bioruby at the same time.
|
1245
|
-
-------------------------------------------------------------------------------
|
1246
|
-
(31) → The taxonomy-submodule should work one day, and be properly
|
1247
|
-
documented as well. Perhaps integrate the parts of Taxonomy
|
1248
|
-
that can be included into the toplevel domain.
|
1249
|
-
-------------------------------------------------------------------------------
|
1250
|
-
(32) → Enable:
|
1251
|
-
|
1252
|
-
Bioroebe.set_genetic_code()
|
1253
|
-
Bioroebe.set_genetic_code(to: 'Vertebrate Mitochondrial')
|
1254
|
-
|
921
|
+
--------------------------------------------------------------------------------
|
922
|
+
(145) → http://www.biostars.org/
|
923
|
+
^^^ regularly work through this
|
924
|
+
and try to help
|
925
|
+
and extend bioruby at the same time.
|
926
|
+
--------------------------------------------------------------------------------
|
927
|
+
(146) → The taxonomy-submodule should work one day, and be properly
|
928
|
+
documented as well. Perhaps integrate the parts of Taxonomy
|
929
|
+
that can be included into the toplevel domain.
|
930
|
+
--------------------------------------------------------------------------------
|
931
|
+
(147) → Enable:
|
932
|
+
Bioroebe.set_genetic_code()
|
933
|
+
Bioroebe.set_genetic_code(to: 'Vertebrate Mitochondrial')
|
1255
934
|
^^^ enable this
|
1256
|
-
|
1257
|
-
|
1258
|
-
|
1259
|
-
|
1260
|
-
|
1261
|
-
Seq('MAIVMGRWKGAR*')
|
1262
|
-
|
935
|
+
Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
|
936
|
+
coding_dna.translate(table=2)
|
937
|
+
coding_dna.translate(table="Vertebrate Mitochondrial")
|
938
|
+
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
|
939
|
+
Seq('MAIVMGRWKGAR*')
|
1263
940
|
^^^ enable this as well; extent documentation too.
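This sketches how set_genetic_code-style table selection could behave. The names are hypothetical, and only NCBI tables 1 (Standard) and 2 (Vertebrate Mitochondrial) are filled in. The amino-acid strings use NCBI's codon ordering: base 1, then base 2, then base 3, each over "TCAG". With the Biopython example sequence above, this reproduces both translations.

```ruby
# Hypothetical sketch of genetic-code selection. Amino-acid strings follow
# the NCBI layout: codons ordered by base 1, 2, 3 over "TCAG".
BASE_ORDER = 'TCAG'.freeze
GENETIC_CODES = {
  1 => 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG', # Standard
  2 => 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG'  # Vertebrate Mitochondrial
}.freeze
GENETIC_CODE_NAMES = { 'Standard' => 1, 'Vertebrate Mitochondrial' => 2 }.freeze

# Accepts a table number (2) or an NCBI table name ("Vertebrate Mitochondrial").
def translate(dna, table: 1)
  id = table.is_a?(String) ? GENETIC_CODE_NAMES.fetch(table) : table
  code = GENETIC_CODES.fetch(id)
  dna.upcase.scan(/.../).map { |codon|
    i, j, k = codon.chars.map { |base| BASE_ORDER.index(base) }
    code[16 * i + 4 * j + k]
  }.join
end
```

Storing each table as one 64-character string keeps the data identical to NCBI's published layout, so further tables can be pasted in directly.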
- (57) → Add:
- http://nar.oxfordjournals.org/content/35/suppl_2/W71.long
- -------------------------------------------------------------------------------
- (58) → Now, you may want to translate the nucleotides up to
- the first in frame stop codon, and then stop (as
- happens in nature):
- coding_dna.translate()
- Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
- >>> coding_dna.translate(to_stop=True)
- Seq('MAIVMGR', IUPACProtein())
- ^^^^ support this hmmm.
- Then continue from here:
- https://people.duke.edu/~ccc14/pcfb/biopython/BiopythonSequences.html
- -------------------------------------------------------------------------------
- (59) → Add:
- set_dna :Ubiquitin
- set_dna :ubiquitin
- ^^^ we want to obtain the ubiquitin sequence
- -------------------------------------------------------------------------------
- (59) → Telomeres
- Telomeres are listed from 5' to 3'.
- Example for the human telomeres would be:
- 5'-TTAGGG-3'
- ^^^ is this correct?
- add:
- doc_telomeres
- ^^^ add this to print the human telomere sequence
- -------------------------------------------------------------------------------
- (60) → ORF_positions?
+ --------------------------------------------------------------------------------
+ (148) → We have found a restriction enzyme called NheI.
+ The sequence this 6-cutter relates to is: `5' - GCTAGC - 3'`
+ This restriction enzyme will produce a blunt overhang.
+ ^^^ nope, this is wrong: NheI cuts G^CTAGC and leaves a 4-base 5' overhang (CTAG), i.e. sticky ends.
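To stop the enzyme report from calling NheI blunt, the end type can be derived from the cut position. This sketch assumes a palindromic site and a cut offset counted on the top strand; the method name is hypothetical. NheI would be called as `overhang('GCTAGC', 1)`.

```ruby
# Hypothetical helper: classify the ends a palindromic site produces.
# For a palindrome of length L cut after base c on the top strand, the
# bottom strand is cut at L - c (counted along the top strand).
def overhang(site, cut)
  bottom = site.length - cut
  if cut == bottom
    [:blunt, 0]
  elsif cut < bottom
    [:five_prime, bottom - cut]   # e.g. NheI G^CTAGC -> CTAG overhang
  else
    [:three_prime, cut - bottom]  # e.g. PstI CTGCA^G -> TGCA overhang
  end
end
```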
+ --------------------------------------------------------------------------------
+ (149) → Sau3A?
+ ^^^ enable this restriction site
+ --------------------------------------------------------------------------------
+ (150) → Add matplotlib support.
+ try_to_use_matplotlib
+ --------------------------------------------------------------------------------
+ (151) → https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/RESTfulAPIs.html
+ --------------------------------------------------------------------------------
+ (152) → The following input:
+ downcase; orf?; seq?
+ leads to strange display. Something is wrong here, must be checked.
+ --------------------------------------------------------------------------------
+ (153) → Continue with rosalind problems.
+ These challenges can be found here:
+ http://rosalind.info/problems/sign/
+ Also integrate these rosalind-quizzes into bioroebe
+ when possible.
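For the SIGN problem linked above ("Enumerating Oriented Gene Orderings"), here is a short sketch that lists all signed permutations of 1..n. There are n! * 2^n of them. The method name is made up.

```ruby
# Sketch: every ordering of 1..n combined with every choice of signs.
def signed_permutations(n)
  (1..n).to_a.permutation.flat_map do |perm|
    [1, -1].repeated_permutation(n).map do |signs|
      perm.zip(signs).map { |value, sign| value * sign }
    end
  end
end
```

To print in the Rosalind answer format: `perms = signed_permutations(2); puts perms.size; perms.each { |p| puts p.join(' ') }`.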
+ --------------------------------------------------------------------------------
+ (154) → https://web.expasy.org/cgi-bin/peptide_mass/peptide-mass.pl
+ ^^^ make the above usable in sinatra as well
+ --------------------------------------------------------------------------------
+ (155) → Integrate a way to search for commonly known promoters:
+ promoters?
+ ^^^ this functionality
+ ^^^ this has to be expanded
+ and ...
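A minimal starting point for `promoters?` is to scan for a few textbook consensus motifs: the E. coli sigma70 -35 box TTGACA, the -10 box TATAAT, and a eukaryotic TATA box written here as TATAAA. The motif table and method name are assumptions. Exact-match scanning will miss most real promoters, because they deviate from the consensus.

```ruby
# Hypothetical motif table; real promoters rarely match consensus exactly.
PROMOTER_MOTIFS = {
  'E. coli -35 box' => 'TTGACA',
  'E. coli -10 box (Pribnow)' => 'TATAAT',
  'TATA box' => 'TATAAA'
}.freeze

# Returns { motif name => [1-based positions] } for motifs that occur.
def find_promoter_motifs(sequence)
  seq = sequence.upcase
  PROMOTER_MOTIFS.each_with_object({}) do |(name, motif), hits|
    # Zero-width lookahead so overlapping occurrences are all reported.
    positions = seq.enum_for(:scan, /(?=#{motif})/).map { Regexp.last_match.begin(0) + 1 }
    hits[name] = positions unless positions.empty?
  end
end
```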
+ --------------------------------------------------------------------------------
+ (156) → Integrate:
+ http://biotools.nubic.northwestern.edu/OligoCalc.html
+ --------------------------------------------------------------------------------
+ (157) → Extend the Java part of BioRoebe systematically.
+ What should come next? Let's make a list.
+ → remove_numbers [DONE]
+ --------------------------------------------------------------------------------
+ (158) → Study gnuplot; one day we have to draw graphs.
+ --------------------------------------------------------------------------------
+ (159) → Add a genome browser, both an ASCII one without a GUI and one
+ with a GUI, in ruby-gtk.
+ --------------------------------------------------------------------------------
+ (160) → Clone the functionality of:
+ http://www.biophp.org/minitools/restriction_digest/demo.php
+ --------------------------------------------------------------------------------
+ (161) → Add the loxP sequence to readme [DONE] and explain this
+ better on the main readme; and perhaps also assign
+ the sequence via the bioshell.
+ --------------------------------------------------------------------------------
+ (162) → 33. Cephalodiscidae Mitochondrial UAA-Tyr Code (transl_table=33)
+    AAs = FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSKVVVVAAAADDEEGGGG
+ Starts = ---M-------*-------M---------------M---------------M------------
+  Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
+  Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
+  Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
+ ^^^ add a parser, and document it, that can take this input
+ and output the corresponding code, in a valid .yml file.
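A sketch of such a parser, assuming the NCBI block layout shown above. It reads the "AAs" line and the three "Base" lines, ignores "Starts", and returns a codon-to-amino-acid hash that can be dumped to YAML. The method name and the output file name are hypothetical.

```ruby
require 'yaml'

# Hypothetical parser for NCBI genetic-code blocks ("AAs = ...",
# "Base1/Base2/Base3 = ..."). The Starts line is ignored in this sketch.
# Returns a codon => one-letter amino acid hash ('*' = stop).
def parse_ncbi_code_table(text)
  fields = {}
  text.each_line do |line|
    key, value = line.split('=', 2).map(&:strip)
    fields[key] = value if value && %w[AAs Base1 Base2 Base3].include?(key)
  end
  aas = fields.fetch('AAs')
  (0...aas.length).each_with_object({}) do |i, table|
    codon = fields.fetch('Base1')[i] + fields.fetch('Base2')[i] + fields.fetch('Base3')[i]
    table[codon] = aas[i]
  end
end

# Usage (file name is an assumption):
# File.write('transl_table_33.yml', parse_ncbi_code_table(text).to_yaml)
```

For table 33 this maps TAA to Y and AGG to K, which is what makes the code differ from the standard table.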
+ --------------------------------------------------------------------------------
+ (163) → Add to bioroebe the ability to add cloning vectors
+ and molecular_weight calculation
+ for this
+ and also to show the sequence of a vector
+ and how to add them to bioroebe
+ also download sequence data for a vector - this
+ should probably be some interactive table or
+ something of the sort here.
+ ^^^ this works a tiny bit now, must be documented still
+ via seq2? and so forth
+ "pBR322 contains the genes for resistance to ampicillin and
+ tetracycline, and can be amplified with chloramphenicol.
+ The molecular weight is 2.83 x 10^6 daltons."
+ ^^^ we also need a way to find out what resistance genes
+ are carried there.
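To sanity-check the pBR322 figure, use the common rule of thumb of about 650 Da per base pair. This is an approximation; exact values depend on composition and counter-ions. With it, the 4361 bp of pBR322 give 2,834,650 Da, or about 2.83 x 10^6 Da. The method name is hypothetical.

```ruby
# Rough molecular weight of double-stranded DNA (rule of thumb, ~650 Da/bp).
AVERAGE_BP_MASS_DA = 650

def dsdna_molecular_weight(length_bp)
  length_bp * AVERAGE_BP_MASS_DA
end
```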
+ --------------------------------------------------------------------------------
+ (164) → In the lambda genome sequence there are 10 EcoB and
+ 5 EcoK sites.
+ ^^^ verify this too, as an example as well
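A sketch for checking the EcoB/EcoK counts. EcoKI and EcoBI are type I enzymes with bipartite, non-palindromic sites, so both strands must be scanned. The recognition sequences used here are EcoKI AAC(N6)GTGC and EcoBI TGA(N8)TGCT, as I understand them from REBASE; double-check them before trusting any counts. The method names are hypothetical.

```ruby
# Type I recognition sites as regexes (zero-width lookahead for overlaps).
TYPE_I_SITES = {
  'EcoKI' => /(?=AAC[ACGT]{6}GTGC)/,
  'EcoBI' => /(?=TGA[ACGT]{8}TGCT)/
}.freeze

def reverse_complement(seq)
  seq.reverse.tr('ACGT', 'TGCA')
end

# Returns { 'EcoKI' => count, 'EcoBI' => count } over both strands.
def count_type_i_sites(sequence)
  seq = sequence.upcase
  rc = reverse_complement(seq)
  TYPE_I_SITES.transform_values { |regex| seq.scan(regex).size + rc.scan(regex).size }
end
```

Run it on the lambda genome (48,502 bp, e.g. RefSeq NC_001416) and compare the result against the 10 EcoB and 5 EcoK sites claimed above.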
+ --------------------------------------------------------------------------------
+ (165) → show restriction sites, composable and compatible with
+ serial clone ... hmm
+ --------------------------------------------------------------------------------
+ (166) → enable:
+ BIOROEBE_USE_COLOURS:
+ can be 0 or 1
+ what is this?
+ --------------------------------------------------------------------------------
+ (167) → Burrows-Wheeler transform (BWT)
+ ^^^ add some method here
+ Bioroebe.burrows_wheeler_transform
+ ^^^ if no '$' char is in the input, then append it
+ then, output the array of a SORTED BWT transform,
+ from the given input string.
+ document this properly as well
+ also test this against my paper-result
+ with input being: "GATAG$".
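Here is a sketch of `Bioroebe.burrows_wheeler_transform` following the note, written as a plain method; the exact return shape is an assumption. It appends '$' if missing, sorts all rotations, and returns them together with the last column. For "GATAG$", the sorted rotations are $GATAG, AG$GAT, ATAG$G, G$GATA, GATAG$, TAG$GA, and the BWT string is GTGA$A, which can be compared with the paper result.

```ruby
# Sketch of the behaviour described in the note above. Returns the sorted
# rotation matrix and the BWT string (last column of that matrix).
def burrows_wheeler_transform(input)
  text = input.include?('$') ? input : "#{input}$" # '$' marks the end
  rotations = (0...text.length).map { |i| text[i..-1] + text[0...i] }
  sorted = rotations.sort # '$' sorts before A, C, G, T in ASCII
  [sorted, sorted.map { |rotation| rotation[-1] }.join]
end
```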
+ --------------------------------------------------------------------------------
+ (168) → Enable working with several genes... hmm and store that somewhere.
+ Something like a per-project workspace thingy.
+ --------------------------------------------------------------------------------
+ (169) → Add:
+ http://nar.oxfordjournals.org/content/35/suppl_2/W71.long
+ --------------------------------------------------------------------------------
+ (170) → Now, you may want to translate the nucleotides up to
+ the first in frame stop codon, and then stop (as
+ happens in nature):
+ coding_dna.translate()
+ Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
+ >>> coding_dna.translate(to_stop=True)
+ Seq('MAIVMGR', IUPACProtein())
+ ^^^^ support this hmmm.
+ Then continue from here:
+ https://people.duke.edu/~ccc14/pcfb/biopython/BiopythonSequences.html
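A sketch of a `to_stop` option modelled on Biopython's `translate(to_stop=True)`. It translates codon by codon with the NCBI standard table and stops at the first in-frame stop codon. The method names are hypothetical. With the example sequence above, it yields MAIVMGR*KGAR* by default and MAIVMGR with `to_stop: true`, matching the quoted Biopython output.

```ruby
# NCBI standard code (table 1); codons ordered by base 1, 2, 3 over "TCAG".
STANDARD_CODE = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze

def translate_codon(codon)
  i, j, k = codon.upcase.chars.map { |base| 'TCAG'.index(base) }
  STANDARD_CODE[16 * i + 4 * j + k]
end

def translate(dna, to_stop: false)
  protein = String.new
  dna.scan(/.../).each do |codon|
    aa = translate_codon(codon)
    break if to_stop && aa == '*' # stop before the first in-frame stop codon
    protein << aa
  end
  protein
end
```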
+ --------------------------------------------------------------------------------
+ (171) → Add:
+ set_dna :Ubiquitin
+ set_dna :ubiquitin
+ ^^^ we want to obtain the ubiquitin sequence
+ --------------------------------------------------------------------------------
+ (172) → Telomeres
+ Telomeres are listed from 5' to 3'.
+ Example for the human telomeres would be:
+ 5'-TTAGGG-3'
+ ^^^ is this correct?
+ add:
+ doc_telomeres
+ ^^^ add this to print the human telomere sequence
+ --------------------------------------------------------------------------------
+ (173) → ORF_positions?
  ^^^ change this a bit, to actually show the positions
- NameError (uninitialized constant Bioroebe::Blosum)
- irb(main):003:0> module LibSSW
- irb(main):004:1> BLOSUM50 = [
- irb(main):005:2*
- ^^^ also enable Bioroebe::Blosum.matrix?,
- Bioroebe::Blosum.matrix?
- #^^^ add this
+ of the various ORFs with the start-position.
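A sketch of what ORF_positions? could report: every forward-strand ORF (ATG up to the first in-frame stop) with its frame and 1-based start and end. Assumptions: the method name is hypothetical, nested ATGs inside an ORF are not listed separately, and the reverse strand is skipped. ORFfinder (item 176) would be the reference to compare against.

```ruby
STOP_CODONS = %w[TAA TAG TGA].freeze

# Returns [{ frame:, start:, stop: }] with 1-based, inclusive coordinates;
# stop is the last base of the stop codon.
def orf_positions(sequence, min_codons: 1)
  seq = sequence.upcase
  orfs = []
  (0..2).each do |frame|
    start = nil
    (frame..seq.length - 3).step(3) do |pos|
      codon = seq[pos, 3]
      if start.nil? && codon == 'ATG'
        start = pos
      elsif start && STOP_CODONS.include?(codon)
        codons = (pos - start) / 3 # length without the stop codon
        orfs << { frame: frame + 1, start: start + 1, stop: pos + 3 } if codons >= min_codons
        start = nil
      end
    end
  end
  orfs.sort_by { |orf| orf[:start] }
end
```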
+ --------------------------------------------------------------------------------
+ (174) → add:
+ setgene2
+ add_dna2
+ dna2
+ dna? <--- this one is not a setter but a query.
+ --------------------------------------------------------------------------------
+ (175) → improve the Tm calculation. It must be better, and must have more
+ documentation, and a small tutorial.
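A possible baseline for the Tm work, using the two "basic" textbook formulas. As far as I know, these are the basic ones OligoCalc (item 156) describes: the Wallace rule for oligos shorter than 14 nt, and 64.9 + 41 * (G+C - 16.4) / N for longer ones. Neither formula corrects for salt or primer concentration. The method name is hypothetical.

```ruby
# Wallace rule (< 14 nt):  Tm = 2 * (A+T) + 4 * (G+C)
# Longer oligos:           Tm = 64.9 + 41 * (G+C - 16.4) / N
def melting_temperature(oligo)
  seq = oligo.upcase
  at = seq.count('AT') # counts A and T characters
  gc = seq.count('GC') # counts G and C characters
  if seq.length < 14
    2 * at + 4 * gc
  else
    (64.9 + 41.0 * (gc - 16.4) / seq.length).round(1)
  end
end
```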
+ --------------------------------------------------------------------------------
+ (176) → Compare bioroebe to:
+ https://www.ncbi.nlm.nih.gov/orffinder
+ and check whether both return the same results
+ also possibly add a web-gui
+ --------------------------------------------------------------------------------
+ (177) → Find out ratios from:
+ Doolittle RF. 1989. Redundancies in protein sequences. I
+ http://onlinelibrary.wiley.com/doi/10.1110/ps.9.6.1203/pdf
+ ^^^ that table perhaps
+ (1) require 'bioroebe'
+ NameError (uninitialized constant Bioroebe::Blosum)
+ irb(main):003:0> module LibSSW
+ irb(main):004:1> BLOSUM50 = [
+ irb(main):005:2*
+ ^^^ also enable Bioroebe::Blosum.matrix?,
+ Bioroebe::Blosum.matrix?
+ #^^^ add this
  Bioroebe::Blosum[50]
  ^^^ add this, and show an error if the file does not exist.
- class Cell
- ^^^ simulate a cell
+ .show_matrix
+ ^^^ and so forth, also:
+ Bioroebe::Blosum[50] as an API.
+ and document it in general.
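A sketch of the `Bioroebe::Blosum[50]` API with the requested error for a missing file. Everything here is an assumption: the data directory, the file naming `blosum50.yml`, the YAML layout (a nested hash such as `matrix['A']['R']`), and the `show_matrix` printout.

```ruby
require 'yaml'

# Hypothetical Bioroebe::Blosum[50] API; layout and paths are assumptions.
module Bioroebe
  module Blosum
    @data_dir = 'data/blosum' # assumed location, override via data_dir=

    class << self
      attr_accessor :data_dir

      # Load blosum<number>.yml; fail loudly if the file does not exist.
      def [](number)
        path = File.join(data_dir, "blosum#{number}.yml")
        unless File.exist?(path)
          raise ArgumentError, "No BLOSUM#{number} matrix file found at #{path}"
        end
        YAML.safe_load(File.read(path))
      end

      # Print the matrix as a simple aligned table.
      def show_matrix(number)
        matrix = self[number]
        columns = matrix.keys
        puts "   #{columns.map { |c| c.rjust(3) }.join}"
        matrix.each do |row, scores|
          puts "#{row.ljust(3)}#{columns.map { |c| scores[c].to_s.rjust(3) }.join}"
        end
      end
    end
  end
end
```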
+ --------------------------------------------------------------------------------
+ (178) → http://www.biomart.org/other/user-docs.pdf
+ ^^^ work through this
+ --------------------------------------------------------------------------------
+ (179) → add:
+ class Cell
+ ^^^ simulate a cell
  Hmmm. Needs specific components ... and needs a better plan.
- Cytosines: 255 | 25.50 %
- Thymine: 228 | 22.80 %
- ^^^ created balanced composition
+ --------------------------------------------------------------------------------
+ (180) → class Protein:
+ add glycosylation pattern
+ .glycosylated? yes no
+ + glycosylated?
+ need to somehow add the modification type
+ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5358406/
+ --------------------------------------------------------------------------------
+ (181) → In the BioShell we must be able to do probes - complementary
+ to amino acids.
+ --------------------------------------------------------------------------------
+ (182) → Add www-related functionality to bioroebe; eventually make use
+ of rails, but start with sinatra possibly. In the long run,
+ make it flexible to work with as many different frameworks
+ as possible, though.
+ --------------------------------------------------------------------------------
+ (183) → Show cleavage sites, for example for a lambda-DNA digest
+ with BglII.
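For the digest display, this sketch computes fragment lengths of a linear molecule from the top-strand cut positions. BglII recognises A^GATCT, so its cut offset is 1. The method name is hypothetical, and the sketch ignores sticky-end detail and circular molecules.

```ruby
# Fragment lengths of a linear digest; cut_offset = bases after site start.
def digest_fragments(sequence, site, cut_offset)
  seq = sequence.upcase
  cuts = seq.enum_for(:scan, /(?=#{site})/).map { Regexp.last_match.begin(0) + cut_offset }
  bounds = [0, *cuts, seq.length]
  bounds.each_cons(2).map { |from, to| to - from }
end
```

Usage for the lambda example: `digest_fragments(lambda_sequence, 'AGATCT', 1)`, where `lambda_sequence` holds the genome read from a FASTA file.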
+ --------------------------------------------------------------------------------
+ (184) → dnaanalyze
In the DNA string `TCCGTCGCAACACATCGCCTCAACAAACCGACCGGGATATGCAATACCGGAATCCGATCCTTTAGAAGCTGCATTCCAAACGCTTGCAATAACACCCACTCGACTATTCAGCATTGGCAAAGGGTACGAATTCGACGAAGGGAGGGTGCTATATTTTCCAAGTTGCTCGCCGATTGATACGGAGCCTGTGGAAAGATTTCGCGGCTCTAGTCTTTAGCTTTGATGTCACCCCTGAGTAGTAACCCGGCGTGGTAGCTTTCATTAGACTTCTCGGAGAGAGTATTAAGCAAAGGTGGAGGTCCCAGGGGTCCAGTGAGCTGTATCGCACTAAAAGCATGCCTACGGGCAATGCTATTTTGCTCACAGGAACTTTGGGGGAGCCACAAACTCTCGAAGCCGGATTGTTGTGGCGGCTAACTTTCCAAAGGCGACCATTCATGGTCTGAATGGGCCCTCACCAGAAGAACGTTTTCGACGGGCATTCTTCCCCGGGGTTTCGAAGGCAAGGGTCAGCACGGCGCGGAAAAGTACGCGACGCATACCGGACTAGTCATGCAACTCCCTCGGAACTGGCGATTCCCACCCAAGAGACGCACGCTGATCATTGCCCATGCCGACTGGAGATGCTGAATTTGGTATGCGGGTCTGTTGCCAGCGCTGACATTATCGGACATTGTGGGGAGAACCGTGTGATTGATTGAGCTGGCGCATTTGTCCGCATGCTCTCCTCATGTGGACACCTTCGCAGGTTCTTTCCGCGGCCACAGTGTCGGGATCTACCCCTGGTGCGTCGCCGCGAGTACAGGTGGGGTTTCGCGCATGAGAACCAATGTTGCACGCCTCAAAACATGGCTGTAACATATTAGCGCCAATAAAAATTTTTGGCAACAAAGAAACAAGGCCAACCGAAGTGCTAAGCCGCGATCATGAAGGGGCGATGCCAGAATGGGAGTCTGCCTTTCCTGTGTGGACGTGAGATTGTACCTAGACAGAGAACGCC` we found these Nucleotides:
+ ================================================================================
+ Adenines: 244 | 24.40 %
+ Guanines: 273 | 27.30 %
+ Cytosines: 255 | 25.50 %
+ Thymine: 228 | 22.80 %
+ ^^^ created balanced composition
  "Enter the percentage"
  "interactive_string"
  ^^^ here we ask user input
  otherwise, we assume it to be 25%
  Ok 25% works...
- first in a yaml file; also documented, then also add
+ The other part does not yet work. I am so lazy...
+ ^^^ add a GUI part too. The GUI has been added;
+ we need to make it so that an input sequence
+ can be assigned, and dnaanalyse --GUI should
+ start it too. ALSO document it once this works.
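A sketch of the per-base part of the dnaanalyze report: an absolute count and a percentage for A, G, C and T, in the same order as the sample output above. The method name is hypothetical. Non-ACGT characters count toward the total but are not reported.

```ruby
# Returns { 'A' => [count, percent], ... } with percent rounded to 2 places.
def nucleotide_composition(sequence)
  seq = sequence.upcase
  return {} if seq.empty?

  %w[A G C T].map { |base|
    count = seq.count(base)
    [base, [count, (100.0 * count / seq.length).round(2)]]
  }.to_h
end
```

Each entry can then be printed in the report style with `format('%s: %d | %.2f %%', name, count, percent)`.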
+ --------------------------------------------------------------------------------
+ (185) → go through the individual components slowly and improve them,
+ step by step, including the documentation. Then eventually
+ remove this todo-entry here.
+ --------------------------------------------------------------------------------
+ (186) → Add a consensus sequence for:
+ Asn-X-Ser/Thr consensus
+ first in a yaml file; also documented, then also add
  a way to scan for these, something like:
- perhaps add the restriction enzyme
- part into a standalone file
- so that it can be used by both the .cgi and
- well rdoc...
- -------------------------------------------------------------------------------
- - Add more protein-specific thingies to bioroebe.
- -------------------------------------------------------------------------------
- - Push the bioshell forward and work through std_biology.rb.
- Perhaps we can move some of it out into a class
- or so.
- The whole thing should also be linked with Webmin (biomin), so that
- we can use the Bioshell elegantly via the www as well!
- -------------------------------------------------------------------------------
- - ^^^ when we find restriction enzyme sites in a DNA
+ N-Glycosylation?
+ NGlycosylation?
+ NGlyc
+ /N-?Glyc/i
+ ^^^ use that regex
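A sketch of the scan itself, assuming the usual sequon definition N-X-S/T where X is any residue except proline. The method name is hypothetical, and a sequon only marks a potential site. `.glycosylated?` from item 180 could then be written as `!n_glycosylation_sites(seq).empty?`.

```ruby
# Asn, then any residue except Pro, then Ser or Thr. The lookahead also
# reports overlapping sequons.
NGLYC_SEQUON = /(?=N[^P][ST])/

# Returns the 1-based positions of the Asn in each sequon.
def n_glycosylation_sites(protein)
  protein.upcase.enum_for(:scan, NGLYC_SEQUON).map { Regexp.last_match.begin(0) + 1 }
end
```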
+ --------------------------------------------------------------------------------
+ (187) → require 'bio'
+ # creating a Bio::Sequence::NA object containing ambiguous alphabets
+ ambiguous_seq = Bio::Sequence::NA.new("atgcyrwskmbdhvn")
+ # show the contents and class of the DNA sequence object
+ p ambiguous_seq # => "atgcyrwskmbdhvn"
+ p ambiguous_seq.class # => Bio::Sequence::NA
+ # convert the sequence to a Regexp object
+ p ambiguous_seq.to_re # => /atgc[tc][ag][at][gc][tg][ac][tgc][atg][atc][agc][atgc]/
+ p ambiguous_seq.to_re.class # => Regexp
+ ^^^ add .to_re to this. It must generate a Regexp object.
+ - Also add some restriction enzyme examples
+ on the bioroebe readme... and bio.cgi
+ perhaps add the restriction enzyme
+ part into a standalone file
+ so that it can be used by both the .cgi and
+ well rdoc...
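A sketch of an IUPAC-aware `.to_re`, modelled on the BioRuby example above. Each ambiguity code becomes a character class, written in the same order BioRuby prints them, so the output matches the quoted regexp. The method name and the lowercase-only convention are assumptions.

```ruby
# IUPAC nucleotide codes mapped to lowercase character classes.
IUPAC_CLASSES = {
  'a' => 'a', 't' => 't', 'g' => 'g', 'c' => 'c',
  'y' => '[tc]', 'r' => '[ag]', 'w' => '[at]', 's' => '[gc]',
  'k' => '[tg]', 'm' => '[ac]', 'b' => '[tgc]', 'd' => '[atg]',
  'h' => '[atc]', 'v' => '[agc]', 'n' => '[atgc]'
}.freeze

# Builds a Regexp; unknown characters raise KeyError.
def to_re(sequence)
  Regexp.new(sequence.downcase.chars.map { |base| IUPAC_CLASSES.fetch(base) }.join)
end
```

The regexp is lowercase, so sequences should be downcased before matching.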
+ --------------------------------------------------------------------------------
+ (188) → Add more protein-specific thingies to bioroebe.
+ --------------------------------------------------------------------------------
+ (189) → Push the bioshell forward and work through std_biology.rb.
+ Perhaps we can move some of it out into a class
+ or so.
+ The whole thing should also be linked with Webmin (biomin), so that
+ we can use the Bioshell elegantly via the www as well!
+ --------------------------------------------------------------------------------
+ (190) → ^^^ when we find restriction enzyme sites in a DNA
  string, colourize them RED.
- x = 'Goldman, JM, Melo JV 2003 NEJM 349:1451 14534339
- Lewis GD 1993 Cancer Immunol Immun other 37: 255 8102322
- McShane LM 2009 Clin Canc Res 15: 1898 19276274
- Fox JL 2007 Nature Biotech 25: 489 17483821
- Bodin L 2005 Blood 106: 135 15790782'
- require 'rubygems'
- require 'bio'
- my_file = File.new(ARGV[0])
- refs = my_file.readlines
- ids = []
- refs.each do |line|
+ also set it to
+ set_restriction_size()
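A sketch of the red colouring using ANSI escape codes: "\e[31m" switches to red and "\e[0m" resets. The method name is hypothetical. `set_restriction_size()` is not covered here, and overlapping sites are only coloured once.

```ruby
RED   = "\e[31m".freeze
RESET = "\e[0m".freeze

# Wrap every occurrence of the given site(s) in red.
def colourize_sites(sequence, sites)
  pattern = Regexp.union(Array(sites)) # escapes the literal site strings
  sequence.gsub(pattern) { |site| "#{RED}#{site}#{RESET}" }
end
```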
+ --------------------------------------------------------------------------------
+ (191) → also ... while learning C++ we extend the project here...
+ Useful C++ things will be combined.
+ --------------------------------------------------------------------------------
+ (192) → As of April 2003, there were 176,890 total taxa represented.
+ ^^^ we need a way to also output how many entries we
+ have there.
+ --------------------------------------------------------------------------------
+ (193) → Replace bioruby with bioroebe completely!
+ In order for this to work, we first need to find out
+ what bioruby is able to do. :P
+ --------------------------------------------------------------------------------
+ (194) → append 33
+ # ^^^ in the bioshell
+ Only numbers were given: Adding 33 random nucleotides to the main string next.
+ Traceback (most recent call last):
+         10: from /usr/bin/bioshell:26:in `<main>'
+          9: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/misc.rb:4121:in `shell'
+          8: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/misc.rb:4121:in `new'
+          7: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/initialize.rb:168:in `initialize'
+          6: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/loop.rb:18:in `enter_main_loop'
+          5: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/loop.rb:18:in `loop'
+          4: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/loop.rb:30:in `block in enter_main_loop'
+          3: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/loop.rb:30:in `each'
+          2: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/loop.rb:31:in `block (2 levels) in enter_main_loop'
+          1: from /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/menu.rb:3565:in `menu'
+ /home/Programs/Ruby/2.7.1/lib/ruby/site_ruby/2.7.0/bioroebe/shell/misc.rb:3979:in `append': undefined method `return_sequence_that_is_cut_via_restriction_enzyme' for Bioroebe:Module (NoMethodError)
+ Did you mean?  return_random_codon_sequence_for_this_aminoacid_sequence
+ ^^^^^ BUG!
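The traceback shows `append` calling `Bioroebe.return_sequence_that_is_cut_via_restriction_enzyme`, which does not exist. However that call gets fixed, "append 33" still needs a random-nucleotide generator. Here is a sketch with a hypothetical name; pass a seeded Random for reproducible output.

```ruby
NUCLEOTIDES = %w[A C G T].freeze

# n random nucleotides as a String.
def random_nucleotides(n, random: Random.new)
  Array.new(n) { NUCLEOTIDES.sample(random: random) }.join
end
```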
+ --------------------------------------------------------------------------------
+ (195) → > rest?
+ We found these restriction sites within the sequence `TTCAGAACTCAACGCCTGGTTGGCCGTCCAGTAAGCTGACTAAGTAAGTCTATGCCCGCGATAACCAGGATACAGATATCGTGAAACCTGGTTTATCTCCTTCTATAAGAGTCTGCACATCTAGC`:
+ AccII → CGCG ( 1 times found)
+ AluI → AGCT ( 1 times found)
+ BfaI → CTAG ( 1 times found)
+ BshI → GGCC ( 1 times found)
+ Bsh1236I → CGCG ( 1 times found)
+ BshFI → GGCC ( 1 times found)
+ BstFNI → CGCG ( 1 times found)
+ BstUI → CGCG ( 1 times found)
+ BsuRI -> GGCC ( 1 times found)
+ CviRI -> TGCA ( 1 times found)
+ Eco32I -> GATATC ( 1 times found)
+ EcoRV -> GATATC ( 1 times found)
+ FnuDII -> CGCG ( 1 times found)
+ HaeIII -> GGCC ( 1 times found)
+ HpyCH4V -> TGCA ( 1 times found)
+ MaeI -> CTAG ( 1 times found)
+ MvnI -> CGCG ( 1 times found)
+ PalI -> GGCC ( 1 times found)
+ SelI -> CGCG ( 1 times found)
+ ThaI -> CGCG ( 1 times found)
+ XspI → CTAG ( 1 times found)
+ ^^^^ also show the position
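For the "also show the position" request, this sketch reports 1-based start positions per enzyme, counting overlapping matches. The enzyme table is a small hypothetical sample taken from the list above, and the method name is made up.

```ruby
SITES = {
  'AluI' => 'AGCT', 'EcoRV' => 'GATATC', 'HaeIII' => 'GGCC', 'MaeI' => 'CTAG'
}.freeze

# Returns { enzyme => [1-based start positions] } for enzymes that cut.
def restriction_site_positions(sequence, sites = SITES)
  seq = sequence.upcase
  sites.each_with_object({}) do |(enzyme, site), found|
    positions = seq.enum_for(:scan, /(?=#{site})/).map { Regexp.last_match.begin(0) + 1 }
    found[enzyme] = positions unless positions.empty?
  end
end
```

The output could then be shown as `EcoRV → GATATC (1 times found, at 1)` or similar.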
+ --------------------------------------------------------------------------------
+ (196) → PMID entries are:
+ x = 'Goldman, JM, Melo JV 2003 NEJM 349:1451 14534339
+ Lewis GD 1993 Cancer Immunol Immun other 37: 255 8102322
+ McShane LM 2009 Clin Canc Res 15: 1898 19276274
+ Fox JL 2007 Nature Biotech 25: 489 17483821
+ Bodin L 2005 Blood 106: 135 15790782'
+ require 'rubygems'
+ require 'bio'
+ my_file = File.new(ARGV[0])
+ refs = my_file.readlines
+ ids = []
+ refs.each do |line|
  pmid = line.strip().split("\t")
  ids.push(pmid[2])
- ids.each { |id|
+ end
+ ids.each { |id|
  entry = Bio::PubMed.query(id)
  medline = Bio::MEDLINE.new(entry)
  reference = medline.reference
  puts reference.endnote
- - Clone, in bioroebe, the get_ORF functionality.
- But I do not know how to locate ORIs.
- -------------------------------------------------------------------------------
- ^^^ also integrate git into bioroebe.
- -------------------------------------------------------------------------------
- WE MUST IMPROVE THIS MASSIVELY.
- THEN UPLOAD IT AND USE IT AS THE BASIS FOR APPLICATIONS.
- -------------------------------------------------------------------------------
- Study MetaCyc
- ^^^ study metabolic pathways.
- http://metacyc.org/
- → Create KuroMetaCyc, in analogy to MetaCyc.
- -------------------------------------------------------------------------------
- Welcome to BioShell May 2012. Type "help" to get some help.
- Hello and welcome to the Bio Shell Version, last updated: May 2012
- BIO SHELL> IPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQYSEQELLDCDRRSYGCNGGYPWSALQLVAQYGI
- BIO SHELL> HYRNTYPYEGVQRYCRSREKGPYAAKTDGVRQVQPYNQGALLYSIANQPVSVVLQAAGKDFQLYRGGIFVGPCGNKVDHA
- BIO SHELL> VAAVGYGPNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN
- BIO SHELL>
- BIO SHELL> input?
- BIO SHELL> pdb
- BIO SHELL>
- ^^^ (1)
- Add a pdb submodule
- When we type this, we then ask:
- "Please input your FASTA format now:"
- -------------------------------------------------------------------------------
- http://biopython.org/DIST/docs/cookbook/Restriction.html#mozTocId101269
- ^^^ support this also:
- >>> from Bio import Restriction
- >>> dir()
- ['Restriction', '__builtins__', '__doc__', '__name__']
- >>> Restriction.EcoRI
- EcoRI
- >>> Restriction.EcoRI.site
- 'GAATTC'
- >>>
- and document it somewhere; perhaps in a new .cgi page.
- The above will return the exact site, without verbosity.
- Restriction.EcoRI
- Restriction.EcoRI.site
- result = Bioroebe.restriction_enzyme 'EcoRI.site'
+ }
|
1272
|
+
- Clone, in bioroebe, the get_ORF functionality.
|
1273
|
+
- in bioroebe: Find out how to count where a restriction
|
1274
|
+
enzyme was found, and add it to a table, also WHEN it
|
1275
|
+
was found and by WHOM.
|
1276
|
+
We should make a good reference table there, so that
|
1277
|
+
people can reproduce where the information is kept
|
1278
|
+
or obtained from.
|
1279
|
+
- Enable:
|
1280
|
+
download ecoli
|
1281
|
+
first fix a bug
|
1282
|
+
and then make it so that we can download the ecoli
|
1283
|
+
sequence from that file (the yaml file).
|
1284
|
+
- Rewrite Bioroebe (pending since September 2019 ...)
|
1285
|
+
Start with the functionality that shall decode something,
|
1286
|
+
and put this back ... as a first class citizen with
|
1287
|
+
descriptions of how to work with it.
|
1288
|
+
- improve quality
|
1289
|
+
- also fix the taxonomy project as you go...
|
1290
|
+
- Add C++ wrapper stuff, starting with qt-widgets specifically
|
1291
|
+
but also extend the base part of bioroebe as far as C++ is
|
1292
|
+
concerned. Perhaps at a later time also add/embed mruby
|
1293
|
+
into the project.
|
1294
|
+
- Need to overlay 2 exceptionally large DNA data sets and
|
1295
|
+
analyze the overlap.
|
1296
|
+
Want to determine the frequency of one event within the DNA
|
1297
|
+
reads of the other data set. Prefer to discuss the details
|
1298
|
+
of the data sets one-on-one.
|
1299
|
+
^^^ we need to compare two DNA data sets and analyze the overlap.
|
1300
|
+
- Make sure that the sinatra interface has the parts of
|
1301
|
+
emboss available, too. List the progress here.
|
1302
|
+
- interactivefasta interactive_fasta
|
1303
|
+
^^^ enable this... but I am not sure what is meant with
|
1304
|
+
that. :\
|
1305
|
+
--------------------------------------------------------------------------------
|
1306
|
+
(197) → In a database search, the measured masses are compared with the peptide masses
|
1307
|
+
of all proteins or genes in a database (NCBI, UniProt). For this purpose, DNA
|
1308
|
+
sequences are translated into protein sequences and cut in silico with the
|
1309
|
+
protease that was used for the digest.
|
1310
|
+
^^^ enable digestions
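The in-silico digestion step described above can be sketched in Ruby. This is a minimal sketch under stated assumptions. It applies the simple trypsin rule (cleave after K or R unless the next residue is P), ignores missed cleavages and modifications, and uses standard monoisotopic residue masses. `trypsin_digest` and `peptide_mass` are hypothetical names, not existing Bioroebe methods.

```ruby
# Minimal in-silico tryptic digest plus monoisotopic peptide masses.
# Assumption: classic trypsin rule (cleave after K/R, but not before P).
# Hypothetical helpers, not part of the Bioroebe API.

# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
  'G' => 57.02146, 'A' => 71.03711, 'S' => 87.03203, 'P' => 97.05276,
  'V' => 99.06841, 'T' => 101.04768, 'C' => 103.00919, 'L' => 113.08406,
  'I' => 113.08406, 'N' => 114.04293, 'D' => 115.02694, 'Q' => 128.05858,
  'K' => 128.09496, 'E' => 129.04259, 'M' => 131.04049, 'H' => 137.05891,
  'F' => 147.06841, 'R' => 156.10111, 'Y' => 163.06333, 'W' => 186.07931
}.freeze
WATER = 18.01056 # H2O added for the free N- and C-terminus

# Split at every position that follows K/R and is not followed by P.
def trypsin_digest(protein)
  protein.upcase.split(/(?<=[KR])(?!P)/)
end

# Sum of residue masses plus one water molecule.
def peptide_mass(peptide)
  peptide.each_char.sum { |aa| RESIDUE_MASS.fetch(aa) } + WATER
end

trypsin_digest('MKWVTFISLLRPAK').each do |pep|
  printf("%-14s %10.4f Da\n", pep, peptide_mass(pep))
end
```

A database search would then compare these theoretical masses with the measured ones within some mass tolerance.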
|
1311
|
+
--------------------------------------------------------------------------------
|
1312
|
+
(198) → Complexity of libraries:
|
1313
|
+
How many independent clones are necessary to represent a genome (plant,
|
1314
|
+
animal/fungus) or how many such clones have to be screened to have a realistic
|
1315
|
+
chance of finding the gene of interest?
|
1316
|
+
This can be calculated by the formula:
|
1317
|
+
P = 1 - (1 - F/G)^N
|
1318
|
+
N = ln(1 - P) / ln(1 - F/G)
|
1319
|
+
P.... Probability that a certain insert is present in a library consisting of N clones
|
1320
|
+
F.... average insert size (kb)
|
1321
|
+
G... Genome size (kb) of the organism from which a library should be constructed
|
1322
|
+
N.... Number of clones in the library
|
1323
|
+
^^^^ add this formula + documentation
|
1324
|
+
Example: 16 kb average insert size in a replacement vector
|
1325
|
+
Genome sizes:
|
1326
|
+
Yeast:
|
1327
|
+
16 Mb = 16 000 kb (1000 clones with 16 kb = 1 genome equivalent)
|
1328
|
+
F/G=0.001.
|
1329
|
+
ln(1-F/G)= - 0.0010005
|
1330
|
+
95%: ln(1 - 0.95) = ln(0.05) = -2.99573
|
1331
|
+
N = 2.99573/0.0010005 ≈ 3000 clones
|
1332
|
+
Wheat: 16 000 Mb
|
1333
|
+
N ≈ 3 × 10^6 clones
|
1334
|
+
How many plaques can be screened on one 9 cm petri dish or filter?
|
1335
|
+
Plaques close to confluency: about 10^4 pfu per plate.
|
1336
|
+
While in case of yeast the whole library can be screened on 1 plate,
|
1337
|
+
300 plates would be needed for wheat - which is impracticable!
|
1338
|
+
(BAC, "bacterial artificial chromosome", libraries with 150-300 kb inserts are used instead!)
|
1339
|
+
Most plants have more manageable genome sizes (e.g. tomato, about 800 Mb): 15 filters
|
1340
|
+
have to be hybridized.
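The formula can be checked numerically with a short Ruby sketch. The method names are illustrative and not part of Bioroebe. As stated above, it assumes roughly 10^4 plaques per 9 cm plate. With a 16 kb insert size it gives about 1 plate for yeast, 15 for tomato and 300 for wheat, matching the estimates above.

```ruby
# N = ln(1 - P) / ln(1 - F/G): clones needed so that a given sequence is
# present with probability P. F = average insert size, G = genome size (kb).
def clones_needed(insert_kb, genome_kb, probability = 0.95)
  unless probability > 0 && probability < 1
    raise ArgumentError, 'probability must be between 0 and 1'
  end
  (Math.log(1 - probability) / Math.log(1 - insert_kb.to_f / genome_kb)).ceil
end

# P = 1 - (1 - F/G)^N: probability that a sequence is in a library of N clones.
def probability_present(insert_kb, genome_kb, n_clones)
  1 - (1 - insert_kb.to_f / genome_kb)**n_clones
end

PLAQUES_PER_PLATE = 1.0e4 # assumption taken from the text above

{ 'yeast' => 16_000, 'tomato' => 800_000, 'wheat' => 16_000_000 }.each do |name, genome_kb|
  n = clones_needed(16, genome_kb)
  plates = (n / PLAQUES_PER_PLATE).ceil
  puts "#{name}: #{n} clones, about #{plates} plate(s)"
end
```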
|
1341
|
+
--------------------------------------------------------------------------------
|
1342
|
+
(199) → BIO SHELL> BglI?
|
1343
|
+
We have found a restriction enzyme called BglI.
|
1344
|
+
The sequence this 11-cutter relates to is: `5' - GCCNNNNNGGC - 3'`
|
1345
|
+
It will specifically cut between: 5' - GCCNNNN|NGGC - 3'
|
1346
|
+
3' - CGGN|NNNNCCG - 5'
|
1347
|
+
^
|
1348
|
+
also add a line to say
|
1349
|
+
"This is a blunt-end cutter."
|
1350
|
+
or
|
1351
|
+
"This enzyme creates a staggered cut."
|
1352
|
+
|
1353
|
+
http://biopython.org/DIST/docs/api/Bio.Restriction.Restriction.Blunt-class.html
|
1354
|
+
^^^ also add something like this... as a query thingy.
|
1355
|
+
catalyse(cls, dna, linear=True)
|
1356
|
+
List the sequence fragments after cutting dna with enzyme.
|
1357
|
+
catalyze(cls, dna, linear=True)
|
1358
|
+
List the sequence fragments after cutting dna with enzyme.
|
1359
|
+
is_blunt(cls)
|
1360
|
+
Return if the enzyme produces blunt ends.
|
1361
|
+
is_5overhang(cls)
|
1362
|
+
Return if the enzyme produces 5' overhanging ends.
|
1363
|
+
is_3overhang(cls)
|
1364
|
+
Return if the enzyme produces 3' overhanging ends.
|
1365
|
+
overhang(cls)
|
1366
|
+
Return the type of the enzyme's overhang as string.
|
1367
|
+
compatible_end(cls, batch=None)
|
1368
|
+
List all enzymes that produce compatible ends for the enzyme.
|
1369
|
+
http://biopython.org/DIST/docs/api/Bio.Restriction.Restriction.Blunt-class.html
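The "blunt or staggered" message could be derived from the cut position alone. Below is a minimal sketch, assuming a palindromic recognition site written with `^` at the top-strand cut position (e.g. `G^AATTC`, `GCCNNNN^NGGC`). `overhang_type` and `describe_cut` are hypothetical helpers, not Bioroebe or Biopython API. Non-palindromic enzymes would need the bottom-strand cut position given explicitly.

```ruby
# For a palindromic site, the bottom strand is cut at the mirrored position,
# measured from the same (left) end of the top strand.
def overhang_type(site_with_cut)
  top_cut = site_with_cut.index('^')
  raise ArgumentError, 'site must contain ^ at the cut position' if top_cut.nil?
  length = site_with_cut.sub('^', '').length
  bottom_cut = length - top_cut
  if top_cut == bottom_cut
    [:blunt, 0]
  elsif top_cut < bottom_cut
    [:five_prime, bottom_cut - top_cut]
  else
    [:three_prime, top_cut - bottom_cut]
  end
end

def describe_cut(site_with_cut)
  kind, length = overhang_type(site_with_cut)
  return 'This is a blunt-end cutter.' if kind == :blunt
  end_name = kind == :five_prime ? "5'" : "3'"
  "This enzyme creates a staggered cut (#{length}-nt #{end_name} overhang)."
end

puts describe_cut('GCCNNNN^NGGC') # BglI
```

For BglI this reports a 3-nt 3' overhang, consistent with the two cut positions drawn above.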
|
1370
|
+
--------------------------------------------------------------------------------
|
1371
|
+
(200) → https://www.reddit.com/r/bioinformatics/comments/5o3kn8/bioinformatics_contest_2017_jan_23rd29th_solve_as/
|
1372
|
+
--------------------------------------------------------------------------------
|
1373
|
+
(201) → Finish integrating all of biophp into bioroebe.
|
1374
|
+
http://www.biophp.org/
|
1375
|
+
--------------------------------------------------------------------------------
|
1376
|
+
(202) → locate oriC here:
|
1377
|
+
ttcgttaagtaacttcactgcccgtagtgtaccggcattcgctagcaagagtctttctg
|
1378
|
+
ggcaagcttcacttgtgatcgcggcctgtgcccccggaatgaaacaaccacgtccctgct
|
1379
|
+
aacaacgacgggaaaagggaagtgatccgtcggcagacccagactagtgcccttctccgg
|
1380
|
+
cttccaacaccaacgagtcggaccgaattgagcactcgaatgcacggcgctttttgccgg
|
1381
|
+
ccgaaacggcgcctccgcattgatcgacgcacggcctcttttggctacagcgcatggctt
|
1382
|
+
tacactcggcatgcatttccagtgctaatcaaacagaattccttgtaaagtccttcaacc
|
1383
|
+
gtgacagactatcgctaaggagcctttccagtcgtgcctgcaatcactcgcgaaatcaac
|
1384
|
+
aaatctacatctaagcacgctcgtggttcggagtcccgccctcatgtggaccatagccgg
|
1385
|
+
ttcgcccgagtcctaggcacgatcagaggacctatctttcgccactcaactcttctgagt
|
1386
|
+
gaaacaatatcgaccgaaaccttgctcggttttgtccacaacaacgtcaggcccataagc
|
1387
|
+
agacgacattagtccgctgttgtcgcgggcgtcccatagccgtacgatgtcccgtcgga
|
1388
|
+
ori?
|
1389
|
+
^^^ this shall give us all ORIs in a sequence.
|
1390
|
+
The DnaA protein binds to the DnaA boxes in an ori.
|
1391
|
+
'9cut'
|
1392
|
+
'8cut'
|
1393
|
+
'7cut'
|
1394
|
+
^^^ these give us slices
|
1395
|
+
But I do not know how to locate ORIs.
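Two common heuristics for bacterial genomes may help here. This is a sketch, not an established Bioroebe feature. First, the cumulative GC skew (running count of G minus C) often reaches its minimum near oriC. Second, DnaA boxes are short (about 9 bp) words, so unusually frequent 9-mers near that minimum are candidates, which is what the '9cut' slices hint at. Both method names are hypothetical.

```ruby
# 1) Cumulative GC skew. Returns every prefix length (number of nucleotides
#    read so far) at which the running (#G - #C) value is minimal.
def skew_minima(genome)
  skew = 0
  minimum = 0
  positions = [0]
  genome.upcase.each_char.with_index(1) do |base, i|
    skew += 1 if base == 'G'
    skew -= 1 if base == 'C'
    if skew < minimum
      minimum = skew
      positions = [i]
    elsif skew == minimum
      positions << i
    end
  end
  positions
end

# 2) Most frequent k-mers in a window (k = 9 for DnaA-box-sized words).
def most_frequent_kmers(text, k)
  seq = text.upcase
  counts = Hash.new(0)
  (0..seq.length - k).each { |i| counts[seq[i, k]] += 1 }
  return [] if counts.empty?
  max = counts.values.max
  counts.select { |_, count| count == max }.keys
end

p skew_minima('TAAAGACTGCCGAGAGGCCAACACGAGTGCTAGAACGAGGGGCGTAAACGCGGGTCCGAT')
p most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4)
```

In practice one could take a window of a few hundred bp around the skew minimum and count 9-mers there, also counting their reverse complements.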
|
1396
|
+
--------------------------------------------------------------------------------
|
1397
|
+
(203) → ^^^ also integrate git into bioroebe.
|
1398
|
+
--------------------------------------------------------------------------------
|
1399
|
+
(204) → WE MUST IMPROVE THIS ENORMOUSLY.
|
1400
|
+
THEN UPLOAD IT AND USE IT AS THE BASIS FOR APPLICATIONS.
|
1401
|
+
--------------------------------------------------------------------------------
|
1402
|
+
(205) → Study MetaCyc
|
1403
|
+
^^^ study metabolic pathways.
|
1404
|
+
http://metacyc.org/
|
1405
|
+
→ Create KuroMetaCyc, analogous to MetaCyc (metabolic pathways).
|
1406
|
+
--------------------------------------------------------------------------------
|
1407
|
+
(206) → Welcome to BioShell May 2012. Type "help" to get some help.
|
1408
|
+
Hello and welcome to the Bio Shell Version, last updated: May 2012
|
1409
|
+
BIO SHELL> IPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQYSEQELLDCDRRSYGCNGGYPWSALQLVAQYGI
|
1410
|
+
BIO SHELL> HYRNTYPYEGVQRYCRSREKGPYAAKTDGVRQVQPYNQGALLYSIANQPVSVVLQAAGKDFQLYRGGIFVGPCGNKVDHA
|
1411
|
+
BIO SHELL> VAAVGYGPNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN
|
1412
|
+
BIO SHELL>
|
1413
|
+
BIO SHELL> input?
|
1414
|
+
BIO SHELL> pdb
|
1415
|
+
BIO SHELL>
|
1416
|
+
^^^ (1)
|
1417
|
+
Add a pdb submodule
|
1418
|
+
When we type this, we then ask:
|
1419
|
+
"Please input your sequence in FASTA format now:"
|
1420
|
+
--------------------------------------------------------------------------------
|
1421
|
+
(207) → http://biopython.org/DIST/docs/cookbook/Restriction.html#mozTocId101269
|
1422
|
+
^^^ support this also:
|
1423
|
+
>>> from Bio import Restriction
|
1424
|
+
>>> dir()
|
1425
|
+
['Restriction', '__builtins__', '__doc__', '__name__']
|
1426
|
+
>>> Restriction.EcoRI
|
1427
|
+
EcoRI
|
1428
|
+
>>> Restriction.EcoRI.site
|
1429
|
+
'GAATTC'
|
1430
|
+
>>>
|
1431
|
+
and document it somewhere; perhaps in a new .cgi page.
|
1432
|
+
The above will return the exact site, without verbosity.
|
1433
|
+
Restriction.EcoRI
|
1434
|
+
Restriction.EcoRI.site
|
1435
|
+
result = Bioroebe.restriction_enzyme 'EcoRI.site'
|
1922
1436
|
# => "GAATTC"
|
1923
|
-
|
1924
|
-
|
1925
|
-
|
1926
|
-
|
1927
|
-
|
1928
|
-
|
1929
|
-
- Clone the following functionality from Bio:
|
1930
|
-
|
1931
|
-
require 'bio'
|
1932
|
-
quality_threshold = 60
|
1933
|
-
Bio::FlatFile.open('sample.fastq').each {|entry|
|
1437
|
+
^^^ this already works partially, but not yet
|
1438
|
+
well enough.
|
1439
|
+
- Clone the following functionality from Bio:
|
1440
|
+
require 'bio'
|
1441
|
+
quality_threshold = 60
|
1442
|
+
Bio::FlatFile.open('sample.fastq').each {|entry|
|
1934
1443
|
hq_seq = entry.mask(quality_threshold)
|
1935
1444
|
puts hq_seq.output_fasta(entry.entry_id)
|
1936
|
-
|
1937
|
-
|
1938
|
-
|
1939
|
-
|
1940
|
-
|
1941
|
-
|
1942
|
-
|
1943
|
-
|
1944
|
-
|
1945
|
-
|
1946
|
-
|
1947
|
-
|
1948
|
-
|
1949
|
-
|
1950
|
-
|
1951
|
-
|
1952
|
-
|
1953
|
-
|
1954
|
-
|
1955
|
-
|
1956
|
-
|
1957
|
-
|
1958
|
-
|
1959
|
-
|
1960
|
-
|
1961
|
-
|
1962
|
-
|
1963
|
-
|
1964
|
-
|
1965
|
-
|
1966
|
-
|
1967
|
-
|
1968
|
-
|
1969
|
-
|
1970
|
-
|
1971
|
-
|
1972
|
-
|
1973
|
-
|
1974
|
-
|
1975
|
-
|
1976
|
-
|
1977
|
-
|
1978
|
-
|
1979
|
-
|
1980
|
-
|
1981
|
-
|
1982
|
-
|
1983
|
-
|
1984
|
-
|
1985
|
-
|
1986
|
-
|
1987
|
-
|
1988
|
-
|
1989
|
-
|
1990
|
-
|
1991
|
-
- create virus(:which_one, :amount) # Note the difference to the below
|
1992
|
-
- create hydra(:amount)
|
1993
|
-
- create bread
|
1994
|
-
..........................................................................
|
1995
|
-
→ both
|
1996
|
-
^ should work, does not work right now.
|
1997
|
-
..........................................................................
|
1998
|
-
→ Taxonomy is now integrated into bioroebe. This is good but we need more
|
1999
|
-
documentation, some more tests, a rethinking of the layout and the
|
2000
|
-
structures, and a fixing of the query-part of the database.
|
2001
|
-
|
2002
|
-
Also, make sure that it does the main functions.
|
2003
|
-
|
2004
|
-
rewrite for taxonomy in bioroebe
|
2005
|
-
and while doing this, also continue with the
|
2006
|
-
protocol in bioinformatics
|
2007
|
-
so that we can finish both related problems
|
2008
|
-
at about the same time \o/
|
2009
|
-
AND document these related problems too
|
2010
|
-
Integrate this some other day...
|
2011
|
-
..........................................................................
|
2012
|
-
- http://www.restrictionmapper.org/cgi-bin/sitefind3.pl
|
2013
|
-
|
2014
|
-
^^^ This functionality should be integrated, so that
|
1445
|
+
}
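For reference, the masking step in the BioRuby snippet above can be approximated without BioRuby. This is a dependency-free sketch and not a re-implementation of BioRuby's `mask`. It assumes Phred+33 quality encoding (Sanger / Illumina 1.8+) and records that occupy exactly four lines. `mask_low_quality` is a hypothetical name.

```ruby
# Replace every base whose Phred score is below the threshold with mask_char.
# Returns [[entry_id, masked_sequence], ...].
def mask_low_quality(fastq_text, threshold, mask_char = 'n')
  records = []
  fastq_text.lines.map(&:chomp).each_slice(4) do |header, seq, _plus, qual|
    next if header.nil? || seq.nil? || qual.nil?
    masked = seq.chars.each_with_index.map { |base, i|
      phred = qual[i].ord - 33 # Phred+33 decoding
      phred < threshold ? mask_char : base
    }.join
    records << [header.delete_prefix('@'), masked]
  end
  records
end

# Usage: ruby mask_fastq.rb sample.fastq  (prints FASTA to stdout)
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  mask_low_quality(File.read(ARGV[0]), 20).each do |id, seq|
    puts ">#{id}\n#{seq}"
  end
end
```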
|
1446
|
+
- Document the workflow of all scripts in a reproducible
|
1447
|
+
manner, e. g. so that others can use it...
|
1448
|
+
→ it must also allow for different tables to be
|
1449
|
+
used! check this... so that we can search in
|
1450
|
+
standard ORF but also in different ORFs
|
1451
|
+
and report the length, at least of the longest ORF
|
1452
|
+
(start + stop)... so that the result is also
|
1453
|
+
correct
|
1454
|
+
- Before we start to use rails for bioroebe, let's polish the GUI
|
1455
|
+
components more.
|
1456
|
+
But we really should use rails for this project too, at the
|
1457
|
+
least optionally.
|
1458
|
+
Ideally every small class has a tiny widget that can be
|
1459
|
+
interconnected. Perhaps do this via sinatra first and think
|
1460
|
+
of ways to generalize it.
|
1461
|
+
We should probably be systematic about this and go through
|
1462
|
+
each class, then write a GUI for it:
|
1463
|
+
^^^ first GUI
|
1464
|
+
^^^ then rails
|
1465
|
+
Ok, first task - write a lot of GUIs.
|
1466
|
+
gui/hamming_distance.rb - works ok-ish but it needs an in-widget
|
1467
|
+
notification, that is, we need to somehow colourize this thing.
|
1468
|
+
- fix batch generation of .star files
|
1469
|
+
- add script that will generate those weird
|
1470
|
+
files ... from the tch shell script
|
1471
|
+
- in mohitstar --help, also add examples
|
1472
|
+
--examples
|
1473
|
+
we also need the Z coordinates in the .star file
|
1474
|
+
test.xmd
|
1475
|
+
/home/kumar/Desktop/test_exit/InputMicrographs/
|
1476
|
+
--------------------------------------------------------------------------------
|
1477
|
+
(208) → BioTodo - GENESIS, science fiction.
|
1478
|
+
- create virus(:which_one, :amount) # Note the difference to the below
|
1479
|
+
- create hydra(:amount)
|
1480
|
+
- create bread
|
1481
|
+
--------------------------------------------------------------------------------
|
1482
|
+
(209) → both
|
1483
|
+
^ should work, does not work right now.
|
1484
|
+
--------------------------------------------------------------------------------
|
1485
|
+
(210) → Taxonomy is now integrated into bioroebe. This is good but we need more
|
1486
|
+
documentation, some more tests, a rethinking of the layout and the
|
1487
|
+
structures, and a fixing of the query-part of the database.
|
1488
|
+
Also, make sure that it does the main functions.
|
1489
|
+
rewrite for taxonomy in bioroebe
|
1490
|
+
and while doing this, also continue with the
|
1491
|
+
protocol in bioinformatics
|
1492
|
+
so that we can finish both related problems
|
1493
|
+
at about the same time \o/
|
1494
|
+
AND document these related problems too
|
1495
|
+
Integrate this some other day...
|
1496
|
+
--------------------------------------------------------------------------------
|
1497
|
+
(211) → http://www.restrictionmapper.org/cgi-bin/sitefind3.pl
|
1498
|
+
^^^ This functionality should be integrated, so that
|
2015
1499
|
ALL restriction enzymes are tried out, starting from
|
2016
1500
|
a given sequence.
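Trying every enzyme against one sequence could start from a sketch like the one below. The enzyme table holds example entries only, IUPAC letters such as N are expanded into character classes, and only the given strand is scanned, which is sufficient for palindromic sites. `find_all_sites` and `site_regex` are hypothetical names.

```ruby
# IUPAC nucleotide codes expanded into regex character classes.
IUPAC = {
  'A' => 'A', 'C' => 'C', 'G' => 'G', 'T' => 'T',
  'R' => '[AG]', 'Y' => '[CT]', 'S' => '[CG]', 'W' => '[AT]',
  'K' => '[GT]', 'M' => '[AC]', 'B' => '[CGT]', 'D' => '[AGT]',
  'H' => '[ACT]', 'V' => '[ACG]', 'N' => '[ACGT]'
}.freeze

# Example entries only; a real table would be much larger.
ENZYMES = {
  'EcoRI' => 'GAATTC', 'BamHI' => 'GGATCC',
  'HindIII' => 'AAGCTT', 'BglI' => 'GCCNNNNNGGC'
}.freeze

# Zero-width lookahead so that overlapping sites are all reported.
def site_regex(site)
  Regexp.new('(?=' + site.upcase.chars.map { |c| IUPAC.fetch(c) }.join + ')')
end

# Returns { enzyme_name => [1-based start positions] } for enzymes that cut.
def find_all_sites(sequence, enzymes = ENZYMES)
  seq = sequence.upcase
  enzymes.each_with_object({}) do |(name, site), hits|
    positions = []
    seq.scan(site_regex(site)) { positions << Regexp.last_match.begin(0) + 1 }
    hits[name] = positions unless positions.empty?
  end
end

p find_all_sites('ttGAATTCaaGGATCC')
```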
|
2017
|
-
|
2018
|
-
→
|
2019
|
-
|
2020
|
-
|
2021
|
-
|
2022
|
-
|
2023
|
-
|
2024
|
-
|
2025
|
-
|
2026
|
-
|
2027
|
-
|
2028
|
-
|
2029
|
-
|
2030
|
-
|
2031
|
-
|
2032
|
-
|
2033
|
-
|
2034
|
-
|
2035
|
-
|
2036
|
-
|
2037
|
-
|
2038
|
-
|
2039
|
-
|
2040
|
-
|
2041
|
-
|
2042
|
-
|
2043
|
-
→
|
2044
|
-
|
2045
|
-
|
2046
|
-
|
2047
|
-
|
2048
|
-
|
2049
|
-
|
2050
|
-
|
2051
|
-
|
2052
|
-
|
2053
|
-
|
2054
|
-
|
2055
|
-
|
2056
|
-
|
2057
|
-
|
2058
|
-
|
2059
|
-
|
2060
|
-
|
2061
|
-
|
2062
|
-
|
2063
|
-
|
2064
|
-
|
2065
|
-
|
2066
|
-
|
2067
|
-
|
2068
|
-
|
2069
|
-
|
2070
|
-
|
2071
|
-
|
2072
|
-
|
2073
|
-
|
2074
|
-
|
2075
|
-
|
2076
|
-
|
2077
|
-
|
2078
|
-
|
2079
|
-
|
2080
|
-
|
2081
|
-
|
2082
|
-
|
2083
|
-
|
2084
|
-
|
2085
|
-
|
2086
|
-
|
2087
|
-
from /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/bioshell.rb:52:in `shell'
|
2088
|
-
from /System/Executables/bioshell:6:in `<main>'
|
2089
|
-
|
2090
|
-
^^^ also fix this stupid bug.
|
2091
|
-
perhaps redo the whole restriction enzyme stuff.
|
2092
|
-
|
2093
|
-
→ Taxonomy components:
|
2094
|
-
|
2095
|
-
(1) We want to run a single query instead of many small
|
1501
|
+
--------------------------------------------------------------------------------
|
1502
|
+
(212) → A search is essentially substring search across a database of strings
|
1503
|
+
(albeit with a smaller alphabet). Some common use cases: one,
|
1504
|
+
scientists will search for certain genes that they've used in engineered
|
1505
|
+
plasmids. Two, since multiple codons can translate to the same amino
|
1506
|
+
acid, a process called "codon optimization" might replace codons with
|
1507
|
+
equivalent ones that work better given a certain kind of organism -
|
1508
|
+
your CAA (glutamine) might become CAG (still glutamine) instead. You
|
1509
|
+
might want to search by amino acid ("glutamine") so that you find
|
1510
|
+
it regardless of whether CAA or CAG was used.
|
1511
|
+
^^^^ yeah enable this too.
|
1512
|
+
Perhaps start with a "codon optimizer". This shall, for a given
|
1513
|
+
organism, replace the given codons with the "optimal" ones.
|
1514
|
+
Then also add this to the sinatra interface.
|
1515
|
+
Bioroebe::DetermineOptimalCodons
|
1516
|
+
^^^ this is currently incomplete.
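Searching by amino acid, so that CAA and CAG both match glutamine, can be sketched by back-translating the query into a regex over synonymous codons. This sketch assumes the standard genetic code (NCBI table 1) and reports hits in every reading frame. The method names are hypothetical. A per-organism codon optimizer would additionally need a codon-usage table, which is not shown.

```ruby
# Standard genetic code (NCBI table 1), bases in TCAG order.
BASES = %w[T C A G].freeze
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AMINO.chars).to_h.freeze

# "QK" becomes (?=((?:CAA|CAG)(?:AAA|AAG))); the lookahead allows overlaps.
def protein_query_regex(peptide)
  parts = peptide.upcase.chars.map do |aa|
    codons = CODON_TABLE.select { |_, encoded| encoded == aa }.keys
    raise ArgumentError, "unknown amino acid: #{aa}" if codons.empty?
    "(?:#{codons.join('|')})"
  end
  Regexp.new("(?=(#{parts.join}))")
end

# 0-based start positions in any frame; keep pos % 3 == 0 for frame 1 only.
def find_peptide_in_dna(dna, peptide)
  positions = []
  dna.upcase.tr('U', 'T').scan(protein_query_regex(peptide)) do
    positions << Regexp.last_match.begin(0)
  end
  positions
end

p find_peptide_in_dna('ATGCAAAAACAGAAA', 'QK')
```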
|
1517
|
+
--------------------------------------------------------------------------------
|
1518
|
+
(213) → Redo restriction enzymes completely.
|
1519
|
+
And polish this a LOT.
|
1520
|
+
This may take some days. But we want this to be REALLY good and
|
1521
|
+
lasting for a long time.
|
1522
|
+
Need to keep on working at that!
|
1523
|
+
--------------------------------------------------------------------------------
|
1524
|
+
(214) → Add: average_aminoacid_weight?
|
1525
|
+
→ === Course number (LV-Nummer) 300214, UE exercise course III B: Sequence analysis in molecular biology
|
1526
|
+
→ Pubmed
|
1527
|
+
→ Finding sequences
|
1528
|
+
→ Sequence homology search (Blast, FASTA)
|
1529
|
+
→ Pairwise sequence alignment
|
1530
|
+
→ DNA analysis, translation
|
1531
|
+
→ Gene finding in genomes
|
1532
|
+
^^^ find genes
|
1533
|
+
→ Plasmid Cloning
|
1534
|
+
→ Primer for PCR
|
1535
|
+
→ Protein analysis
|
1536
|
+
→ PROSITE
|
1537
|
+
→ EMBOSS
|
1538
|
+
^^^ go through this again carefully.
|
1539
|
+
→ write a lot of C++ in bioroebe, optimized as much as possible,
|
1540
|
+
and then, at some later point, provide Ruby bindings for it
|
1541
|
+
→ add the ability to compare two FASTA files
|
1542
|
+
can probably do so via the two sequences.
|
1543
|
+
→ integrate "grundlagen der bioinfo" slides
|
1544
|
+
at the same time as we study for it.
|
1545
|
+
ntSeq.each_entry do |e|
|
1546
|
+
# pep = Bio::Sequence.new(e.naseq.reverse_complement!.translate)
|
1547
|
+
pep = Bio::Sequence.new(e.naseq.translate)
|
1548
|
+
if strand == 0
|
1549
|
+
pep = Bio::Sequence.new(e.naseq.reverse_complement!.translate)
|
1550
|
+
end
|
1551
|
+
puts pep.output_fasta(e.definition,20)
|
1552
|
+
end
|
1553
|
+
^^^ we must show this in n characters per line:
|
1554
|
+
see https://raw.githubusercontent.com/zorino/bioruby-scripts/master/transeq.rb
|
1555
|
+
→ We must be able to align not only nucleotides but also aminoacids.
|
1556
|
+
But where is the alignment comparer? perhaps hamming distance?
|
1557
|
+
hmm we have to see.
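Hamming distance only compares equal-length sequences and cannot handle gaps, so it is not a general alignment. A global alignment works for any alphabet, amino acids included. Below is a minimal Needleman-Wunsch sketch with a linear gap penalty and illustrative scores (match +1, mismatch -1, gap -1). Real protein alignment normally uses a substitution matrix such as BLOSUM62 and affine gap penalties. The method name is hypothetical.

```ruby
# Returns [aligned_a, aligned_b, score] for one optimal global alignment.
def needleman_wunsch(a, b, match: 1, mismatch: -1, gap: -1)
  pair = ->(x, y) { x == y ? match : mismatch }
  n = a.length
  m = b.length
  score = Array.new(n + 1) { Array.new(m + 1, 0) }
  (1..n).each { |i| score[i][0] = i * gap }
  (1..m).each { |j| score[0][j] = j * gap }
  (1..n).each do |i|
    (1..m).each do |j|
      score[i][j] = [score[i - 1][j - 1] + pair.(a[i - 1], b[j - 1]),
                     score[i - 1][j] + gap,
                     score[i][j - 1] + gap].max
    end
  end
  # Trace back from the bottom-right cell.
  top = String.new
  bottom = String.new
  i = n
  j = m
  while i > 0 || j > 0
    if i > 0 && j > 0 && score[i][j] == score[i - 1][j - 1] + pair.(a[i - 1], b[j - 1])
      top << a[i - 1]
      bottom << b[j - 1]
      i -= 1
      j -= 1
    elsif i > 0 && score[i][j] == score[i - 1][j] + gap
      top << a[i - 1]
      bottom << '-'
      i -= 1
    else
      top << '-'
      bottom << b[j - 1]
      j -= 1
    end
  end
  [top.reverse, bottom.reverse, score[n][m]]
end

top, bottom, total = needleman_wunsch('HEAGAWGHEE', 'PAWHEAE')
puts top, bottom, "score: #{total}"
```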
|
1558
|
+
--------------------------------------------------------------------------------
|
1559
|
+
(215) → /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/menu.rb:311:in `menu': undefined method `upcase' for ["EcoRI"]:Array (NoMethodError)
|
1560
|
+
from /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/user_input.rb:31:in `block in enter_main_loop'
|
1561
|
+
from /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/user_input.rb:12:in `loop'
|
1562
|
+
from /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/user_input.rb:12:in `enter_main_loop'
|
1563
|
+
from /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/initialize.rb:42:in `initialize'
|
1564
|
+
from /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/bioshell.rb:52:in `new'
|
1565
|
+
from /Programs/Ruby/2.3.1/lib/ruby/site_ruby/2.3.0/bioroebe/bioshell/bioshell.rb:52:in `shell'
|
1566
|
+
from /System/Executables/bioshell:6:in `<main>'
|
1567
|
+
^^^ also fix this stupid bug.
|
1568
|
+
perhaps redo the whole restriction enzyme stuff.
|
1569
|
+
→ Taxonomy components:
|
1570
|
+
We want to run a single query instead of many small
|
2096
1571
|
queries.
|
2097
|
-
|
2098
|
-
|
2099
|
-
|
2100
|
-
(2) Send email, also enable disable_email notification
|
1572
|
+
Taxonomy component
|
1573
|
+
(2) Send email, also enable disable_email notification
|
2101
1574
|
this should be simple if we use the email part
|
2102
1575
|
of cyberweb-project.
|
2103
1576
|
We send an email when everything has finished.
|
2104
|
-
|
2105
|
-
|
2106
|
-
|
2107
|
-
|
2108
|
-
|
2109
|
-
|
2110
|
-
|
2111
|
-
|
2112
|
-
|
2113
|
-
|
2114
|
-
|
2115
|
-
|
2116
|
-
|
2117
|
-
|
2118
|
-
|
2119
|
-
|
2120
|
-
-
|
2121
|
-
|
2122
|
-
|
2123
|
-
|
2124
|
-
|
2125
|
-
|
2126
|
-
|
2127
|
-
|
2128
|
-
|
2129
|
-
|
2130
|
-
|
2131
|
-
|
2132
|
-
|
2133
|
-
|
2134
|
-
|
2135
|
-
|
2136
|
-
|
2137
|
-
|
2138
|
-
|
2139
|
-
|
2140
|
-
|
2141
|
-
|
2142
|
-
|
2143
|
-
|
2144
|
-
|
2145
|
-
|
2146
|
-
|
2147
|
-
|
2148
|
-
|
2149
|
-
|
2150
|
-
|
2151
|
-
|
2152
|
-
|
2153
|
-
|
2154
|
-
keywords = ARGV.join(' ')
|
2155
|
-
options = {
|
2156
|
-
'retmax' => 1
|
2157
|
-
}
|
2158
|
-
entries = Bio::PubMed.esearch(keywords, options)
|
2159
|
-
Bio::PubMed.efetch(entries).each do |entry|
|
2160
|
-
medline = Bio::MEDLINE.new(entry)
|
2161
|
-
reference = medline.reference
|
2162
|
-
puts reference.bibtex
|
2163
|
-
end
|
2164
|
-
^^^ enable BioPubMed access, similar to bioruby,
|
2165
|
-
then document it as well.
|
2166
|
-
|
2167
|
-
- Learn from:
|
2168
|
-
|
2169
|
-
http://www.snapgene.com/products/snapgene_viewer/
|
2170
|
-
|
2171
|
-
-------------------------------------------------------------------------------
|
2172
|
-
(1) → We should support GFP tagging, i.e. what the
|
1577
|
+
data = 'Last update... job is now finished,
|
1578
|
+
at this date.'
|
1579
|
+
SendEmail.new to: Roebe.email?, data
|
1580
|
+
--------------------------------------------------------------------------------
|
1581
|
+
(216) → Document which parts of emboss have already been copied.
|
1582
|
+
→ EMBOSS.md
|
1583
|
+
--------------------------------------------------------------------------------
|
1584
|
+
(217) → Trametes_versicolor_FP-101664_SS1_pyranose_2-oxidase_partial_mRNA_XM_008046051.1.fasta
|
1585
|
+
Bioroebe::Shell: Now loading from `Trametes_versicolor_FP-101664_SS1_pyranose_2-oxidase_partial_mRNA_XM_008046051.1.fasta`.
|
1586
|
+
Bioroebe::ParseFasta: Will read from the file `Trametes_versicolor_FP-101664_SS1_pyranose_2-oxidase_partial_mRNA_XM_008046051.1.fasta`.
|
1587
|
+
This sequence is assumed to be DNA or RNA.
|
1588
|
+
The GC content of "XM_008046051.1 Trametes versicolor FP-101664 SS1 pyranose 2-oxidase partial mRNA" is:
|
1589
|
+
60.41667 %
|
1590
|
+
This sequence has 1872 nucleotides.
|
1591
|
+
We have identified a total of 1 entry in this fasta dataset.
|
1592
|
+
Setting DNA sequence to (1872 nucleotides):
|
1593
|
+
5'- ATGTCTACCAGCTCGAGCGACCCGTTCTTCAACTTCACGAAGTCGAGCTTCAGGAGCGCGGCGGCGCAGAAGGCCTCGGCGACTTCTCTGCCGCCGCTGCCTGGTCCCGACAAGAAAGTCCCTGGAATGGACATCAAGTACGACGTTGTCATAGTAGGCTCCGGACCGATTGGATGCACCTATGCCCGTGAGCTCGTCGAAGCCGGTTACAAGGTCGCGATGTTCGACATCGGAGAGATCGACTCCGGCCTGAAGATCGGTGCCCACAAGAAGAACACTGTCGAATACCAGAAGAACATTGACAAGTTTGTGAACGTCATTCAGGGTCAATTGATGTCTGTTTCCGTTCCCGTCAATACCCTCGTGATCGATACGCTCAGCCCGACGTCTTGGCAAGCTTCATCGTTCTTCGTCCGTAACGGCTCGAACCCAGAGCAGGACCCGCTTCGTAACCTCAGTGGTCAGGCGGTCACACGCGTCGTCGGGGGAATGTCCACGCATTGGACTTGCGCGACACCCCGCTTTGACCGCGAGCAGCGCCCACTGCTCGTGAAGGATGACACGGACGCCGACGACGCCGAGTGGGACCGGCTGTACACCAAGGCCGAGTCGTACTTCAAGACCGGGACGGACCAGTTCAAGGAGTCGATCCGCCACAACCTCGTGCTCAACAAGCTCGCGGAGGAATACAAAGGTCAGCGCGACTTCCAGCAGATCCCGCTGGCGGCAACGCGCCGCAGTCCGACCTTCGTCGAGTGGAGCTCGGCGAACACCGTGTTCGACCTCCAGAACAGGCCGAACACGGACGCGCCGAATGAGCGCTTCAACCTCTTCCCCGCGGTCGCATGTGAGCGCGTCGTGCGCAACACGTCGAACTCCGAGATCGAGAGTCTGCACATCCACGACCTCATCTCAGGCGACCGCTTCGAAATCAAGGCAGACGTGTTTGTTCTCACAGCCGGGGCGGTCCACAACGCGCAGCTTCTCGTGAACTCTGGCTTTGGACAGCTGGGCCGGCCGGACCCCGCGAACCCGCCGCGGTTGCTGCCTTCCCTGGGAAGCTACATCACCGAGCAGTCGCTCGTCTTCTGCCAGACCGTGATGAGCACCGAGCTCATCGACAGCGTCAAGTCCGACATGATCATCAGGGGCAACCCTGGTGATCCGGGGTATAGCGTCACGTACACGCCGGGCGCGTCGACCAACAAGCACCCGGACTGGTGGAACGAGAAGGTGAAGAACCACATGATGCAGCACCAGGAGGACCCGCTCCCGATCCCGTTCGAGGACCCCGAGCCCCAGGTCACCACCCTGTTCCAGCCGTCGCACCCGTGGCACACCCAGATCCACCGCGACGCTTTCAGTTACGGTGCGGTGCAGCAAAGCATCGACTCGCGTCTCATCGTCGACTGGCGCTTTTTCGGAAGGACGGAGCCCAAGGAGGAAAACAAGCTCTGGTTCTCGGACAAGATCACCGACACGTACAACATGCCGCAGCCGACGTTCGACTTCCGCTTCCCGGCAGGCCGCACGAGCAAGGAGGCGGAGGACATGATGACCGACATGTGCGTCATGTCGGCGAAGATTGGTGGCTTCCTGCCCGGCTCTCTCCCGCAATTCATGGAGCCCGGTCTTGTCCTTCACCTCGGTGGTACGCACCGCATGGGCTTCGATGAGCAGGAGGACAAGTGCTGCGTCAACACGGACTCCCGCGTGTTCGGCTTTAAGAACCTTTTCCTCGGCGGCTGCGGCAACATTCCCACCGCGTACGGCGCGAACCCGACGCTCACCGCAATGTCGCTCGCGATCAAGAGTTGCGAGTACATCAAGAACAACTTCACACCGAGCCCTTTCACAGATCAGGCTCAGTGA - 3'
|
1594
|
+
BIO SHELL> gc?
|
1595
|
+
Traceback (most recent call last):
|
1596
|
+
12: from /System/Index/bin/bioshell:27:in `<main>'
|
1597
|
+
11: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/shell.rb:109:in `shell'
|
1598
|
+
10: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/shell.rb:109:in `new'
|
1599
|
+
9: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/initialize.rb:152:in `initialize'
|
1600
|
+
8: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/user_input.rb:18:in `enter_main_loop'
|
1601
|
+
7: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/user_input.rb:18:in `loop'
|
1602
|
+
6: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/user_input.rb:41:in `block in enter_main_loop'
|
1603
|
+
5: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/menu.rb:997:in `menu'
|
1604
|
+
4: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/shell.rb:2605:in `calculcate_gc_content'
|
1605
|
+
3: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/shell/shell.rb:2605:in `new'
|
1606
|
+
2: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/calculate/calculate_gc_content.rb:41:in `initialize'
|
1607
|
+
1: from /Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/calculate/calculate_gc_content.rb:71:in `set_data'
|
1608
|
+
/Programs/Ruby/2.6.4/lib/ruby/site_ruby/2.6.0/bioroebe/calculate/calculate_gc_content.rb:71:in `exist?': can't convert Bioroebe::Sequence to String (Bioroebe::Sequence#to_str gives Bioroebe::Sequence) (TypeError)
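The TypeError above comes from Ruby's implicit-conversion check. When an object is passed where a String is expected (here to `File.exist?`) and it defines `to_str`, that method must return a real String. The message says it returned the `Bioroebe::Sequence` itself. The sketch below shows both remedies using a hypothetical `SequenceSketch` class and `sequence_from` helper, not Bioroebe's own code.

```ruby
# Hypothetical stand-in for a sequence wrapper class (NOT Bioroebe's code).
class SequenceSketch
  def initialize(sequence)
    @sequence = sequence.to_s.dup
  end

  def to_s
    @sequence.dup
  end

  # Implicit conversion must return a String, never self; otherwise
  # File.exist?(obj) raises "can't convert ... (X#to_str gives X)".
  alias to_str to_s
end

# Hypothetical defensive variant of a set_data method: only treat the input
# as a file path when it is a plain String naming an existing file.
def sequence_from(input)
  if input.is_a?(String) && File.exist?(input)
    File.read(input)
  else
    input.to_s
  end
end

p sequence_from(SequenceSketch.new('ACGT'))
```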
|
1609
|
+
- require 'bio'
|
1610
|
+
keywords = ARGV.join(' ')
|
1611
|
+
options = {
|
1612
|
+
'retmax' => 1
|
1613
|
+
}
|
1614
|
+
entries = Bio::PubMed.esearch(keywords, options)
|
1615
|
+
Bio::PubMed.efetch(entries).each do |entry|
|
1616
|
+
medline = Bio::MEDLINE.new(entry)
|
1617
|
+
reference = medline.reference
|
1618
|
+
puts reference.bibtex
|
1619
|
+
end
|
1620
|
+
^^^ enable BioPubMed access, similar to bioruby,
|
1621
|
+
then document it as well.
|
1622
|
+
- Learn from:
|
1623
|
+
http://www.snapgene.com/products/snapgene_viewer/
|
1624
|
+
--------------------------------------------------------------------------------
|
1625
|
+
(218) → We should support GFP tagging, i.e. what the
|
2173
1626
|
protein construct should look like, and so on.
|
2174
1627
|
This works partially...
|
2175
1628
|
GFP? shows the sequence.
|
2176
|
-
|
2177
|
-
|
2178
|
-
|
2179
|
-
|
2180
|
-
|
2181
|
-
|
2182
|
-
|
2183
|
-
|
2184
|
-
|
2185
|
-
|
2186
|
-
|
2187
|
-
|
2188
|
-
|
2189
|
-
|
2190
|
-
|
2191
|
-
|
2192
|
-
|
2193
|
-
|
2194
|
-
|
2195
|
-
|
2196
|
-
|
2197
|
-
|
2198
|
-
|
2199
|
-
|
2200
|
-
|
2201
|
-
|
2202
|
-
|
2203
|
-
|
2204
|
-
|
2205
|
-
|
2206
|
-
|
2207
|
-
- fasta_download AAC76198.2
|
2208
|
-
|
2209
|
-
^^^ enable the above in the bioshell, and perhaps also outside
|
1629
|
+
assign :GFP
|
1630
|
+
inserts the sequence as the main DNA sequence.
|
1631
|
+
What is missing? Hmmmm... perhaps more
|
1632
|
+
documentation.
|
1633
|
+
--------------------------------------------------------------------------------
|
1634
|
+
(219) → in bioroebe, create subsequences for siRNA, then scan for
|
1635
|
+
submatcher + report where these are. Should be fast too.
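One way to approach this is to cut the target into all overlapping windows and look them up in a hash index of the other sequence. Each lookup is then constant time on average instead of a rescan per window. This is a sketch. The 21 nt window length is an assumption (a common siRNA length), and the method names are hypothetical. An off-target search on DNA would also scan the reverse complement.

```ruby
# All overlapping windows of a given length as [start, kmer] pairs (0-based).
def windows(sequence, length = 21)
  seq = sequence.upcase.tr('U', 'T')
  (0..seq.length - length).map { |i| [i, seq[i, length]] }
end

# Hash index: kmer => list of 0-based start positions.
def index_kmers(sequence, length)
  index = Hash.new { |h, k| h[k] = [] }
  seq = sequence.upcase.tr('U', 'T')
  (0..seq.length - length).each { |i| index[seq[i, length]] << i }
  index
end

# { target_window_start => [positions in other] } for windows that occur.
def find_window_hits(target, other, length = 21)
  index = index_kmers(other, length)
  windows(target, length).each_with_object({}) do |(pos, kmer), hits|
    hits[pos] = index[kmer] if index.key?(kmer)
  end
end

p find_window_hits('AACCGGTT', 'GGAACCGG', 4)
```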
|
1636
|
+
--------------------------------------------------------------------------------
|
1637
|
+
(220) → Reverse complement now works quite well, also via the sinatra
|
1638
|
+
interface. We still should have a way to show 5' and
|
1639
|
+
3', both on the commandline, and via sinatra.
|
1640
|
+
Perhaps via --fancy commandline flag or so.
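A labelled 5'/3' display could look like the following sketch. `fancy_strands` is a hypothetical helper, not an existing Bioroebe flag or method. It prints the input strand, the base-paired complementary strand underneath it, and the reverse complement read 5' to 3'.

```ruby
def reverse_complement(dna)
  dna.upcase.tr('ACGTN', 'TGCAN').reverse
end

def fancy_strands(dna)
  seq = dna.upcase
  complement = seq.tr('ACGTN', 'TGCAN')
  [
    "5'-#{seq}-3'",                    # input strand
    "3'-#{complement}-5'",             # complementary strand, base-paired
    "5'-#{reverse_complement(seq)}-3'" # reverse complement, read 5' to 3'
  ].join("\n")
end

puts fancy_strands('GAATTCA')
```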
|
1641
|
+
--------------------------------------------------------------------------------
|
1642
|
+
(221) → Cn3D files?
|
1643
|
+
^^^ add support for these; research what they are, too.
|
1644
|
+
--------------------------------------------------------------------------------
|
1645
|
+
(222) → Consider adding graphviz, perhaps to the taxonomy project
|
1646
|
+
where we make graphs towards different nodes or so...
|
1647
|
+
--------------------------------------------------------------------------------
|
1648
|
+
(223) → in parse fasta
|
1649
|
+
@colourize_sequence = false
|
1650
|
+
^^^ change this later on...
|
1651
|
+
perhaps create a toplevel method
|
1652
|
+
this method now exists, but we still have to make
|
1653
|
+
the check better whether it is a protein or a DNA/RNA
|
1654
|
+
add a toplevel method for this.
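A stricter protein-versus-DNA/RNA check could start from the alphabet alone. The sketch below is a heuristic, and `sequence_type` is a hypothetical name. Whitespace and digits are ignored. Note that a short peptide made only of A, C, G and T residues (Ala, Cys, Gly, Thr) cannot be told apart from DNA this way, and it is classified as `:dna`.

```ruby
def sequence_type(sequence)
  seq = sequence.upcase.gsub(/[\s\d]/, '')
  return :unknown if seq.empty?
  return :dna if seq.match?(/\A[ACGTN]+\z/)
  return :rna if seq.match?(/\A[ACGUN]+\z/)
  # One-letter amino-acid codes incl. B, Z, X, U (Sec), O (Pyl) and stop (*).
  return :protein if seq.match?(/\A[ACDEFGHIKLMNPQRSTVWYBZXUO*]+\z/)
  :unknown
end

p sequence_type("acgt nnn\n")
```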
|
1655
|
+
--------------------------------------------------------------------------------
|
1656
|
+
(224) → clone the BLAST identity matcher functionality for amino acids into
|
1657
|
+
Bioroebe.
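The identity figure can be sketched for two already aligned sequences. The helper is hypothetical and modelled on the "Identities = x/y (z%)" line that BLAST prints. It counts identical, non-gap columns over the alignment length.

```ruby
# Returns [identical_columns, alignment_length, percent].
def identities(aligned_a, aligned_b)
  unless aligned_a.length == aligned_b.length
    raise ArgumentError, 'aligned sequences must have equal length'
  end
  return [0, 0, 0] if aligned_a.empty?
  pairs = aligned_a.upcase.chars.zip(aligned_b.upcase.chars)
  matches = pairs.count { |x, y| x == y && x != '-' }
  [matches, aligned_a.length, (100.0 * matches / aligned_a.length).round]
end

matches, length, percent = identities('MKV-L', 'MRVAL')
puts "Identities = #{matches}/#{length} (#{percent}%)"
```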
|
1658
|
+
- fasta_download AAC76198.2
|
1659
|
+
^^^ enable the above in the bioshell, and perhaps also outside
|
2210
1660
|
of the bioshell.
|
2211
|
-
|
2212
|
-
|
2213
|
-
|
2214
|
-
|
2215
|
-
|
2216
|
-
|
2217
|
-
|
2218
|
-
-------------------------------------------------------------------------------
|
2219
|
-
- Be able to mark exon/intron boundaries.
|
2220
|
-
|
2221
|
-
- Add "taxid?" to tell us the name of the organism. This works now.
|
2222
|
-
|
2223
|
-
^^^ should also work with a local database. ← we integrate this
|
1661
|
+
http://www.ncbi.nlm.nih.gov/protein/145693187?report=fasta
|
1662
|
+
^^^ shall use something such as the above
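`fasta_download` could build on NCBI's E-utilities `efetch` endpoint, which returns FASTA text when called with `rettype=fasta` and `retmode=text`. This is a hedged sketch and the method names are hypothetical. Error handling is minimal. NCBI's rate limits and the optional `tool`, `email` and `api_key` parameters are not handled here.

```ruby
require 'uri'
require 'net/http'

EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'.freeze

# Build the efetch URL for one accession (db: 'protein' or 'nuccore').
def efetch_fasta_uri(accession, db: 'protein')
  uri = URI(EFETCH)
  uri.query = URI.encode_www_form(db: db, id: accession,
                                  rettype: 'fasta', retmode: 'text')
  uri
end

# Download the FASTA text (performs a network request).
def fasta_download(accession, db: 'protein')
  response = Net::HTTP.get_response(efetch_fasta_uri(accession, db: db))
  raise "download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

puts efetch_fasta_uri('AAC76198.2')
```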
|
1663
|
+
--------------------------------------------------------------------------------
|
1664
|
+
(225) → Be able to mark exon/intron boundaries.
|
1665
|
+
- Add "taxid?" to tell us the name of the organism. This works now.
|
1666
|
+
^^^ should also work with a local database. ← we integrate this
|
2224
1667
|
at a later point.
|
2225
|
-
|
2226
|
-
|
2227
|
-
|
2228
|
-
|
2229
|
-
|
2230
|
-
|
2231
|
-
|
2232
|
-
|
2233
|
-
|
2234
|
-
|
2235
|
-
|
2236
|
-
|
2237
|
-
|
2238
|
-
|
2239
|
-
|
2240
|
-
|
2241
|
-
|
2242
|
-
|
2243
|
-
|
2244
|
-
|
2245
|
-
|
2246
|
-
|
2247
|
-
|
2248
|
-
|
2249
|
-
|
2250
|
-
|
2251
|
-
|
2252
|
-
|
2253
|
-
|
2254
|
-
|
2255
|
-
|
2256
|
-
|
2257
|
-
|
2258
|
-
|
2259
|
-
|
2260
|
-
|
2261
|
-
|
2262
|
-
|
2263
|
-
|
2264
|
-
|
2265
|
-
|
2266
|
-
|
2267
|
-
|
2268
|
-
|
2269
|
-
|
2270
|
-
|
2271
|
-
|
2272
|
-
|
2273
|
-
|
2274
|
-
|
2275
|
-
|
2276
|
-
|
2277
|
-
|
2278
|
-
|
2279
|
-
|
2280
|
-
|
2281
|
-
|
2282
|
-
|
2283
|
-
|
2284
|
-
Let's use a table for now to show which variants we have enabled:
|
2285
|
-
|
2286
|
-
DNA to protein | Bioroebe.to_aa
|
2287
|
-
Protein to DNA | [IMPLEMENTED FULLY] → http://www.biophp.org/minitools/protein_to_dna/
|
2288
|
-
| GTK GUI bindings now exist in a simple manner.
|
2289
|
-
Restriction digest of DNA | http://www.biophp.org/minitools/restriction_digest/demo.php
|
2290
|
-
Find Palindromic Sequences | [IMPLEMENTED FULLY]
|
2291
|
-
Sequence manipulation and data |
|
2292
|
-
Melting Temperature (Tm) Calculator |
|
2293
|
-
PCR Amplification |
|
2294
|
-
Microsatellite Repeats Finder |
|
2295
|
-
Alignment of DNA/Protein sequences | http://www.biophp.org/minitools/seq_alignment/demo.php
|
2296
|
-
|
2297
|
-
Microarray analysis: adaptive quantification |
|
2298
|
-
Protein sequence information | http://www.biophp.org/minitools/protein_properties/demo.php
|
2299
|
-
Reduced alphabets for proteins | http://www.biophp.org/minitools/reduce_protein_alphabet/ (started; implemented the first one for now)
|
2300
|
-
Chaos Game Representation |
|
2301
|
-
GC-, AT-, KETO- and oligo-skews generator |
|
2302
|
-
Oligonucleotide Frequency | [IMPLEMENTED PARTIALLY]
|
2303
|
-
^^^ we need a way to obtain a hash with the frequencies;
|
1668
|
+
- We have identified a total of 24 entries in this fasta dataset.
|
1669
|
+
There is a total of 4038 letters/nucleotides stored in total.
|
1670
|
+
Setting DNA sequence to (1582 nucleotides):
|
1671
|
+
^^^^ hmm should enable:
|
1672
|
+
@seq1
|
1673
|
+
@seq2
|
1674
|
+
and so forth
|
1675
|
+
- SUMOylation
|
1676
|
+
"small ubiquitin modifier"
|
1677
|
+
chemistry
|
1678
|
+
SUMO proteins are small
|
1679
|
+
100 aa / 12 kDa.
|
1680
|
+
Similar structural fold to ubiquitin
|
1681
|
+
Most SUMO-modified proteins contain the tetrapeptide consensus motif Ψ-K-x-D/E, where:
|
1682
|
+
Ψ (psi) is a hydrophobic residue
|
1683
|
+
K is the lysine conjugated to SUMO
|
1684
|
+
x is any aa
|
1685
|
+
D or E is an acidic residue
|
1686
|
+
SomethingHydrophobic-K-x-D/E
|
1687
|
+
Prediction programmes, e.g. SUMOplot http://www.abgent.com/sumoplot
|
1688
|
+
>MYB44 LENGTH=305
|
1689
|
+
MADRIKGPWSPEEDEQLRRLVVKYGPRNWTVISKSIPGRSGKSCRLRWCNQLSPQVEHRPFSAEEDETIARAHAQFGNKWATI
|
1690
|
+
ARLLNGRTDNAVKNHWNSTLKRKCGGYDHRGYDGSEDHRPVKRSVSAGSPPVVTGLYMSPGSPTGSDVSDSSTIPILPSVELF
|
1691
|
+
KPVPRPGAVVLPLPIETSSSSDDPPTSLSLSLPGADVSEESNRSHESTNINNTTSSRHNHNNTVSFMPFSGGFRGAIEEMGKS
|
1692
|
+
FPGNGGEFMAVVQEMIKAEVRSYMTEMQRNNGGGFVGGFIDNGMIPMSQIGVGRIE
|
1693
|
+
^^^
|
1694
|
+
study sumoplot ...
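A minimal Ruby sketch of a plain consensus-motif scan for the Ψ-K-x-D/E pattern above, which could be run over a sequence such as MYB44 before comparing the hits with SUMOplot. The hydrophobic set `AILMFVW` and the helper name `sumo_sites` are assumptions for illustration; SUMOplot applies its own scoring, so this is not a reimplementation of it.

```ruby
# Assumed set of hydrophobic residues standing in for the Ψ position.
HYDROPHOBIC = 'AILMFVW'

# Zero-width lookahead, so overlapping motifs are all reported.
SUMO_MOTIF = /(?=[#{HYDROPHOBIC}]K.[DE])/

# Returns the 1-based positions of the lysines (K) that sit inside a
# Ψ-K-x-D/E consensus motif. Hypothetical helper, not part of Bioroebe.
def sumo_sites(protein)
  positions = []
  protein.upcase.scan(SUMO_MOTIF) do
    # begin(0) is the 0-based index of the Ψ residue; the lysine follows it.
    positions << Regexp.last_match.begin(0) + 2
  end
  positions
end

p sumo_sites('VKAEIKSD') # => [2, 6]
```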
|
1695
|
+
--------------------------------------------------------------------------------
|
1696
|
+
(226) → http://a-little-book-of-r-for-bioinformatics.readthedocs.io/en/latest/src/chapter7.html
|
1697
|
+
--------------------------------------------------------------------------------
|
1698
|
+
(227) → http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc22
|
1699
|
+
^^^ continue here; "You can also specify the table using the
|
1700
|
+
NCBI table number which is shorter, and often included in
|
1701
|
+
the feature annotation of GenBank files:"
|
1702
|
+
^^^ work through this and see if it is good.
|
1703
|
+
--------------------------------------------------------------------------------
|
1704
|
+
(228) → Clone ALL of biophp, if it is useful.
|
1705
|
+
Then state so too, then get rid of this entry here.
|
1706
|
+
But remember, we must also be able to do so via a webinterface!
|
1707
|
+
Oligos now work. Hmm.
|
1708
|
+
http://www.biophp.org/
|
1709
|
+
Let's use a table for now to show which variants we have enabled:
|
1710
|
+
DNA to protein | Bioroebe.to_aa
|
1711
|
+
Protein to DNA | [IMPLEMENTED FULLY] → http://www.biophp.org/minitools/protein_to_dna/
|
1712
|
+
| GTK GUI bindings now exist in a simple manner.
|
1713
|
+
Restriction digest of DNA | http://www.biophp.org/minitools/restriction_digest/demo.php
|
1714
|
+
Find Palindromic Sequences | [IMPLEMENTED FULLY]
|
1715
|
+
Sequence manipulation and data |
|
1716
|
+
Melting Temperature (Tm) Calculator |
|
1717
|
+
PCR Amplification |
|
1718
|
+
Microsatellite Repeats Finder |
|
1719
|
+
Alignment of DNA/Protein sequences | http://www.biophp.org/minitools/seq_alignment/demo.php
|
1720
|
+
Microarray analysis: adaptive quantification |
|
1721
|
+
Protein sequence information | http://www.biophp.org/minitools/protein_properties/demo.php
|
1722
|
+
Reduced alphabets for proteins | http://www.biophp.org/minitools/reduce_protein_alphabet/ (started; implemented the first one for now)
|
1723
|
+
Chaos Game Representation |
|
1724
|
+
GC-, AT-, KETO- and oligo-skews generator |
|
1725
|
+
Oligonucleotide Frequency | [IMPLEMENTED PARTIALLY]
|
1726
|
+
^^^ we need a way to obtain a hash with the frequencies;
|
2304
1727
|
we also need to extend this to 3 or 4 etc... oligos
|
2305
|
-
|
2306
|
-
|
2307
|
-
|
2308
|
-
|
2309
|
-
|
2310
|
-
|
2311
|
-
|
2312
|
-
|
2313
|
-
|
2314
|
-
|
2315
|
-
|
2316
|
-
We should also put this part into the doc/ subsection
|
2317
|
-
to keep track of what is missing and what is not.
|
2318
|
-
|
2319
|
-
-------------------------------------------------------------------------------
|
2320
|
-
(1) → sizeseq
|
2321
|
-
|
2322
|
-
^^^ clone this functionality and describe it in detail.
|
1728
|
+
Oligonucleotides for distance among sequences |
|
1729
|
+
Random sequences |
|
1730
|
+
Useful formulas
|
1731
|
+
rf biophp
|
1732
|
+
Palindromic sequences finder
|
1733
|
+
^^^ enable this next.
|
1734
|
+
We should also put this part into the doc/ subsection
|
1735
|
+
to keep track of what is missing and what is not.
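A possible shape for the missing frequency hash from the "Oligonucleotide Frequency" row, extended to arbitrary oligo lengths (2, 3, 4, ...). This is a sketch under the assumption that overlapping k-mers are wanted and that non-nucleotide characters are simply dropped; `oligo_frequencies` and `oligo_relative_frequencies` are hypothetical names, not existing Bioroebe methods.

```ruby
# Counts overlapping oligonucleotides (k-mers) of length k.
# Hypothetical helper, not an existing Bioroebe method.
def oligo_frequencies(sequence, k = 2)
  raise ArgumentError, 'k must be positive' unless k.positive?

  seq = sequence.upcase.delete('^ACGTU') # drop everything except nucleotides
  counts = Hash.new(0)
  (0..seq.length - k).each { |i| counts[seq[i, k]] += 1 }
  counts
end

# The same counts expressed as fractions of all k-mers (they sum to 1.0).
def oligo_relative_frequencies(sequence, k = 2)
  counts = oligo_frequencies(sequence, k)
  total = counts.values.sum.to_f
  counts.transform_values { |n| n / total }
end

# Dinucleotides and trinucleotides of the same sequence:
pp oligo_frequencies('ATATG', 2)
pp oligo_frequencies('ATATG', 3)
```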
|
1736
|
+
--------------------------------------------------------------------------------
|
1737
|
+
(229) → sizeseq
|
1738
|
+
^^^ clone this functionality and describe it in detail.
|
2323
1739
|
also for the www. Hmmm. Need to add this for the
|
2324
1740
|
www.
|
2325
|
-
|
2326
1741
|
http://www.bioinformatics.nl/cgi-bin/emboss/sizeseq
|
2327
|
-
|
2328
|
-
GTTGTTGCAAGATACAATCTGGTGTGTACTAGA
|
2329
|
-
AGCTAACTCCAGACCGATACAT
|
2330
|
-
CGGACTCGGCC
|
2331
|
-
AATACCAGCGTAGGCTGTGAGCTCGCGGCTGACAAAC
|
2332
|
-
GGAAACGTTTCCTATGTCGGGATTC
|
2333
|
-
|
2334
|
-
|
2335
|
-
|
2336
|
-
|
2337
|
-
|
2338
|
-
|
2339
|
-
|
2340
|
-
|
2341
|
-
|
2342
|
-
|
2343
|
-
|
2344
|
-
|
2345
|
-
|
2346
|
-
|
2347
|
-
|
2348
|
-
|
2349
|
-
|
2350
|
-
|
2351
|
-
|
2352
|
-
|
2353
|
-
|
2354
|
-
|
2355
|
-
|
2356
|
-
|
2357
|
-
|
2358
|
-
|
2359
|
-
|
2360
|
-
|
2361
|
-
|
2362
|
-
|
2363
|
-
|
2364
|
-
|
2365
|
-
|
2366
|
-
|
2367
|
-
|
2368
|
-
|
2369
|
-
|
2370
|
-
|
2371
|
-
|
2372
|
-
|
2373
|
-
|
2374
|
-
|
2375
|
-
|
2376
|
-
|
2377
|
-
|
2378
|
-
will be moved into the project.
|
2379
|
-
|
2380
|
-
Also tell how to start or get this GUI stuff to run, then add
|
2381
|
-
components that can be a part of bioroebe into it.
|
2382
|
-
|
2383
|
-
^^^ we should push this before asking for a job in the summer
|
1742
|
+
^^^ hmm implemented that I think.
|
1743
|
+
GTTGTTGCAAGATACAATCTGGTGTGTACTAGA
|
1744
|
+
AGCTAACTCCAGACCGATACAT
|
1745
|
+
CGGACTCGGCC
|
1746
|
+
AATACCAGCGTAGGCTGTGAGCTCGCGGCTGACAAAC
|
1747
|
+
GGAAACGTTTCCTATGTCGGGATTC
|
1748
|
+
Output file outseq
|
1749
|
+
>four
|
1750
|
+
ATGC
|
1751
|
+
>two
|
1752
|
+
GTTGTTGCAAGATACAATCTGGTGTGTACTAGACCGATACATCGGACTCGGCCAATACCA
|
1753
|
+
GCGTAGGCTGTGAGCTCGCGGCTGACAAACGGAAACGTTTCCTATGTCGGGAT
|
1754
|
+
>one
|
1755
|
+
GTTGTTGCAAGATACAATCTGGTGTGTACTAGAAGCTAACTCCAGACCGATACATCGGAC
|
1756
|
+
TCGGCCAATACCAGCGTAGGCTGTGAGCTCGCGGCTGACAAACGGAAACGTTTCCTATGT
|
1757
|
+
CGGGATTC
|
1758
|
+
>three
|
1759
|
+
GTTGTTGCAAGATACAATCTGGTGTGTACTAGAAGCTAACTCCAGACGTTGTTGCAAGAT
|
1760
|
+
ACAATCTGGTGTGTACTAGACGATACATCGGGTTGTTGCAAGATACAATCTGGTGTGTAC
|
1761
|
+
TAGAACTCGGCCAATACCAGCGTAGGCTGTGAGCTCGCGGCTGACAAACGGAAACGTTTC
|
1762
|
+
CTATGTCGGGATTC
|
1763
|
+
foobar.fasta
|
1764
|
+
^^^ demonstrate via foobar.fasta
|
1765
|
+
ALSO ADD A GUI; sizeseq.rb was added in February 2021.
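A minimal sketch of the sizeseq behaviour seen in the example output above (records ordered by sequence length, shortest first). The FASTA reader is deliberately naive, and `parse_fasta`, `sizeseq` and the `descending:` keyword are hypothetical; the keyword is only modelled on EMBOSS's option for reversing the order.

```ruby
# Naive FASTA reader: returns [[header, sequence], ...].
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line.start_with?('>')
      records << [line[1..-1].strip, +'']
    elsif records.last
      records.last[1] << line # multi-line sequences are concatenated
    end
  end
  records
end

# Sorts records by sequence length, shortest first; descending: true reverses.
# The index is part of the sort key, so equal lengths keep their input order.
def sizeseq(records, descending: false)
  sorted = records.each_with_index
                  .sort_by { |(_, seq), i| [seq.length, i] }
                  .map(&:first)
  descending ? sorted.reverse : sorted
end

records = parse_fasta(">one\nATGCATGC\nAA\n>two\nAT\n>three\nATG\n")
p sizeseq(records).map(&:first) # => ["two", "three", "one"]
```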
|
1766
|
+
--------------------------------------------------------------------------------
|
1767
|
+
(230) → In the sinatra-web-interface for Bioroebe:
|
1768
|
+
continue quiz in rosalind !!!
|
1769
|
+
also, at to_dna: default to RNA
|
1770
|
+
And improve the general quality.
|
1771
|
+
→ also add the ability to change the codon table
|
1772
|
+
via URL through the sinatra interface
|
1773
|
+
Example:
|
1774
|
+
codon_table/1,2,3
|
1775
|
+
view it, change it, document it
|
1776
|
+
also add:
|
1777
|
+
view_codon_table
|
1778
|
+
^^^ shall display it on-line
|
1779
|
+
and give a formatted-view
|
1780
|
+
output-view numbering
|
1781
|
+
Something like:
|
1782
|
+
→ formatted_view
|
1783
|
+
111^^^^ in ncbi format
|
1784
|
+
and document all of this.
|
1785
|
+
--------------------------------------------------------------------------------
|
1786
|
+
(231) →
|
1787
|
+
--------------------------------------------------------------------------------
|
1788
|
+
(232) → Add the ruby-GUI stuff; probably the old biology/ subsection
|
1789
|
+
will be moved into the project.
|
1790
|
+
Also tell how to start or get this GUI stuff to run, then add
|
1791
|
+
components that can be a part of bioroebe into it.
|
1792
|
+
^^^ we should push this before asking for a job in the summer
|
2384
1793
|
months.
|
2385
|
-
|
2386
|
-
|
2387
|
-
|
2388
|
-
|
2389
|
-
|
2390
|
-
|
2391
|
-
|
2392
|
-
|
2393
|
-
|
2394
|
-
|
2395
|
-
|
2396
|
-
|
2397
|
-
|
2398
|
-
|
2399
|
-
|
2400
|
-
|
2401
|
-
|
2402
|
-
|
2403
|
-
|
2404
|
-
|
2405
|
-
---
|
2406
|
-
|
2407
|
-
^^^^ also add this commandline tool
|
2408
|
-
|
2409
|
-
|
2410
|
-
bin/swalin
|
2411
|
-
|
2412
|
-
should have same output
|
2413
|
-
- Gene finding in genomes
|
2414
|
-
|
2415
|
-
^^^ find genes;
|
1794
|
+
- http://www.biophp.org/minitools/seq_alignment/demo.php
|
1795
|
+
^^^ implement smith waterman alignment
|
1796
|
+
swalign AAGGGGAGGACGATGCGGATGTTC AGGGAGGACGATGCGG
|
1797
|
+
--------------------------------------------------------------------------------
|
1798
|
+
(233) → Query: cmdline (16 nt)
|
1799
|
+
Ref : cmdline (24 nt)
|
1800
|
+
Query: 1 A-GGGAGGACGATGCGG 16
|
1801
|
+
| |||||||||||||||
|
1802
|
+
Ref : 2 AGGGGAGGACGATGCGG 18
|
1803
|
+
Score: 31
|
1804
|
+
Matches: 16 (94.1%)
|
1805
|
+
Mismatches: 1
|
1806
|
+
CIGAR: 1M1D15M
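A score-only Smith-Waterman sketch for the swalign example. The scoring values (match +2, mismatch -1, linear gap -1) are an assumption, picked because they reproduce the `Score: 31` above (16 matches × 2 minus one gap); the traceback needed for the alignment display and the CIGAR string is left out. `smith_waterman_score` is a hypothetical name.

```ruby
# Score-only Smith-Waterman local alignment with a linear gap penalty.
# Hypothetical helper; the default scores are an assumption.
def smith_waterman_score(query, ref, match: 2, mismatch: -1, gap: -1)
  # h[i][j] = best score of a local alignment ending at query[i-1], ref[j-1]
  h = Array.new(query.length + 1) { Array.new(ref.length + 1, 0) }
  best = 0
  (1..query.length).each do |i|
    (1..ref.length).each do |j|
      diag = h[i - 1][j - 1] + (query[i - 1] == ref[j - 1] ? match : mismatch)
      up   = h[i - 1][j] + gap # query base aligned to a gap in the reference
      left = h[i][j - 1] + gap # reference base aligned to a gap in the query
      h[i][j] = [0, diag, up, left].max # 0 lets a new local alignment start
      best = h[i][j] if h[i][j] > best
    end
  end
  best
end

puts smith_waterman_score('AGGGAGGACGATGCGG', 'AAGGGGAGGACGATGCGGATGTTC') # prints 31
```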
|
1807
|
+
--------------------------------------------------------------------------------
|
1808
|
+
(234) → ^^^^ also add this commandline tool
|
1809
|
+
bin/swalin
|
1810
|
+
should have same output
|
1811
|
+
- Gene finding in genomes
|
1812
|
+
^^^ find genes;
|
2416
1813
|
and add: return all genes from the ORFs
|
2417
|
-
|
2418
|
-
|
2419
|
-
|
2420
|
-
|
2421
|
-
|
2422
|
-
|
2423
|
-
|
2424
|
-
|
2425
|
-
|
2426
|
-
|
2427
|
-
|
2428
|
-
|
2429
|
-
|
2430
|
-
|
2431
|
-
|
2432
|
-
|
2433
|
-
|
2434
|
-
|
2435
|
-
|
2436
|
-
|
2437
|
-
|
2438
|
-
|
2439
|
-
|
2440
|
-
|
2441
|
-
|
2442
|
-
|
2443
|
-
|
2444
|
-
|
2445
|
-
|
2446
|
-
|
2447
|
-
|
2448
|
-
|
2449
|
-
|
2450
|
-
|
2451
|
-
|
2452
|
-
|
2453
|
-
|
2454
|
-
|
2455
|
-
|
2456
|
-
|
2457
|
-
|
2458
|
-
|
2459
|
-
|
2460
|
-
|
2461
|
-
|
2462
|
-
|
2463
|
-
|
2464
|
-
|
2465
|
-
|
2466
|
-
|
2467
|
-
|
2468
|
-
|
2469
|
-
^^^^ enable this
|
2470
|
-
|
2471
|
-
|
2472
|
-
|
2473
|
-
-------------------------------------------------------------------------------
|
2474
|
-
- Identifying amino acid cleavage sites (Sigcleave)
|
2475
|
-
|
2476
|
-
For amino acid sequences we may be interested to know whether
|
2477
|
-
the amino acid sequence contains a cleavable signal sequence
|
2478
|
-
for directing the transport of the protein within the cell.
|
2479
|
-
|
2480
|
-
SigCleave is a program (originally part of the EGCG molecular
|
2481
|
-
biology package) to predict signal sequences, and to identify
|
2482
|
-
the cleavage site based on the "von Heijne" algorithm.
|
2483
|
-
|
2484
|
-
The threshold setting controls the score reporting. If no
|
2485
|
-
value for threshold is passed in by the user, the code
|
2486
|
-
defaults to a reporting value of 3.5.
|
2487
|
-
|
2488
|
-
SigCleave will only return score/position pairs which meet
|
2489
|
-
the threshold limit.
|
2490
|
-
|
2491
|
-
There are 2 accessor methods for this object.
|
2492
|
-
signals() will return a Perl hash containing the
|
2493
|
-
sigcleave scores keyed by amino acid position.
|
2494
|
-
pretty_print() returns a formatted string similar
|
2495
|
-
to the output of the original sigcleave utility.
|
2496
|
-
|
2497
|
-
The syntax for using Sigcleave is as follows:
|
2498
|
-
|
2499
|
-
# create a Seq object, for example:
|
2500
|
-
$seqobj = Bio::Seq->new(-seq => "AALLHHHHHHGGGGPPRTTTTTVVVVVVVVVVVVVVV");
|
2501
|
-
|
2502
|
-
use Bio::Tools::Sigcleave;
|
2503
|
-
$sigcleave_object = new Bio::Tools::Sigcleave
|
1814
|
+
- Enable blast.
|
1815
|
+
- fasta http://www.ncbi.nlm.nih.gov/nuccore/145337669?report=fasta
|
1816
|
+
Input `fasta http://www.ncbi.nlm.nih.gov/nuccore/145337669?report=fasta` not found.
|
1817
|
+
^^^ this should work.
|
1818
|
+
fasta_header? http://www.ncbi.nlm.nih.gov/nuccore/145337669?report=fasta
|
1819
|
+
^^^ and this hmmm.
|
1820
|
+
- http://www.biophp.org/minitools/random_seqs/demo.php
|
1821
|
+
^^^ clone this thingy
|
1822
|
+
composition?
|
1823
|
+
^^^ also calculate the percentage.... hmm
|
1824
|
+
one part has been cloned finally BUT it has to be
|
1825
|
+
described... and we may have to show this
|
1826
|
+
in CGI too hmmmmmmm.
|
1827
|
+
the first one has been done, the second one not quite yet
|
1828
|
+
and we lack documentation!
|
1829
|
+
Also add this to sinatra yay!
|
1830
|
+
Well, we have cloned quite a bit so far. Need to finish
|
1831
|
+
this up eventually.
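For the "composition? ... also calculate the percentage" note, one way to report per-base percentages. Assumption: characters outside the alphabet (for example N) are ignored in the denominator; `composition_percentages` is a hypothetical helper.

```ruby
# Percentage of each base among the counted bases, rounded to 2 decimals.
# Hypothetical helper; letters outside `alphabet` (e.g. N) are ignored.
def composition_percentages(sequence, alphabet = %w[A C G T])
  seq = sequence.upcase
  counts = alphabet.map { |base| [base, seq.count(base)] }.to_h
  total = counts.values.sum
  counts.transform_values do |n|
    total.zero? ? 0.0 : (100.0 * n / total).round(2)
  end
end

pp composition_percentages('AACGN')
```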
|
1832
|
+
- How do I write Sequences in Fasta format?
|
1833
|
+
FASTA format is a fairly standard bioinformatics output that is convenient and easy to read. BioRuby's Sequence class has a to_fasta method for formatting sequence in FASTA format.
|
1834
|
+
Printing any Bio::Sequence sequence object in FASTA format.
|
1835
|
+
#!/usr/bin/env ruby
|
1836
|
+
require 'bio'
|
1837
|
+
# Generates a sample 100bp sequence.
|
1838
|
+
seq1 = Bio::Sequence::NA.new("aatgacccgt" * 10)
|
1839
|
+
# Naming this sequence as "testseq" and print in FASTA format
|
1840
|
+
# (folded by 60 chars per line).
|
1841
|
+
puts seq1.to_fasta("testseq", 60)
|
1842
|
+
^^^^ enable this
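A plain-Ruby counterpart to the BioRuby snippet above that does not need the bio gem, sketched under the assumption that Bioroebe only needs a header plus fixed-width folding. `to_fasta` here is a hypothetical top-level helper, not BioRuby's method.

```ruby
# Formats one record as FASTA, folding the sequence every `width` characters.
# Hypothetical helper modelled on the BioRuby call above.
def to_fasta(header, sequence, width = 60)
  residues = sequence.gsub(/\s+/, '') # strip any embedded whitespace first
  lines = [">#{header}"] + residues.scan(/.{1,#{width}}/)
  lines.join("\n") + "\n"
end

# A 100 bp test sequence, as in the BioRuby example:
puts to_fasta('testseq', 'aatgacccgt' * 10, 60)
```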
|
1843
|
+
--------------------------------------------------------------------------------
|
1844
|
+
(235) → Identifying amino acid cleavage sites (Sigcleave)
|
1845
|
+
For amino acid sequences we may be interested to know whether
|
1846
|
+
the amino acid sequence contains a cleavable signal sequence
|
1847
|
+
for directing the transport of the protein within the cell.
|
1848
|
+
SigCleave is a program (originally part of the EGCG molecular
|
1849
|
+
biology package) to predict signal sequences, and to identify
|
1850
|
+
the cleavage site based on the "von Heijne" algorithm.
|
1851
|
+
The threshold setting controls the score reporting. If no
|
1852
|
+
value for threshold is passed in by the user, the code
|
1853
|
+
defaults to a reporting value of 3.5.
|
1854
|
+
SigCleave will only return score/position pairs which meet
|
1855
|
+
the threshold limit.
|
1856
|
+
There are 2 accessor methods for this object.
|
1857
|
+
signals() will return a Perl hash containing the
|
1858
|
+
sigcleave scores keyed by amino acid position.
|
1859
|
+
pretty_print() returns a formatted string similar
|
1860
|
+
to the output of the original sigcleave utility.
|
1861
|
+
The syntax for using Sigcleave is as follows:
|
1862
|
+
# create a Seq object, for example:
|
1863
|
+
$seqobj = Bio::Seq->new(-seq => "AALLHHHHHHGGGGPPRTTTTTVVVVVVVVVVVVVVV");
|
1864
|
+
use Bio::Tools::Sigcleave;
|
1865
|
+
$sigcleave_object = new Bio::Tools::Sigcleave
|
2504
1866
|
( -seq => $seqobj,
|
2505
|
-
|
2506
|
-
|
1867
|
+
-threshold => 3.5,
|
1868
|
+
-description => 'test sigcleave protein seq',
|
2507
1869
|
);
|
2508
|
-
|
2509
|
-
|
2510
|
-
|
2511
|
-
|
2512
|
-
|
2513
|
-
|
2514
|
-
|
2515
|
-
|
2516
|
-
-
|
2517
|
-
|
2518
|
-
|
2519
|
-
|
2520
|
-
|
2521
|
-
|
2522
|
-
|
2523
|
-
|
2524
|
-
|
2525
|
-
|
2526
|
-
|
2527
|
-
|
2528
|
-
|
2529
|
-
|
2530
|
-
|
2531
|
-
|
2532
|
-
|
2533
|
-
|
2534
|
-
|
2535
|
-
|
2536
|
-
-
|
2537
|
-
|
2538
|
-
|
2539
|
-
|
2540
|
-
|
2541
|
-
|
2542
|
-
|
2543
|
-
|
2544
|
-
|
2545
|
-
|
2546
|
-
|
2547
|
-
|
2548
|
-
|
2549
|
-
|
2550
|
-
|
2551
|
-
|
2552
|
-
|
2553
|
-
|
2554
|
-
|
2555
|
-
|
2556
|
-
|
2557
|
-
|
2558
|
-
|
2559
|
-
|
2560
|
-
|
2561
|
-
|
2562
|
-
|
2563
|
-
|
2564
|
-
|
2565
|
-
|
2566
|
-
|
2567
|
-
|
2568
|
-
|
2569
|
-
|
2570
|
-
|
2571
|
-
|
2572
|
-
|
2573
|
-
|
2574
|
-
|
2575
|
-
|
2576
|
-
|
2577
|
-
|
2578
|
-
|
2579
|
-
|
2580
|
-
|
2581
|
-
|
2582
|
-
|
2583
|
-
|
2584
|
-
|
2585
|
-
|
2586
|
-
|
2587
|
-
|
2588
|
-
|
2589
|
-
|
2590
|
-
|
2591
|
-
|
2592
|
-
|
2593
|
-
|
2594
|
-
|
2595
|
-
|
2596
|
-
|
2597
|
-
|
2598
|
-
|
2599
|
-
|
2600
|
-
|
2601
|
-
|
2602
|
-
|
2603
|
-
|
2604
|
-
|
2605
|
-
|
2606
|
-
|
2607
|
-
|
2608
|
-
|
2609
|
-
|
2610
|
-
|
2611
|
-
|
2612
|
-
|
2613
|
-
|
2614
|
-
|
2615
|
-
|
2616
|
-
|
2617
|
-
|
2618
|
-
|
2619
|
-
|
2620
|
-
|
2621
|
-
|
2622
|
-
|
2623
|
-
|
2624
|
-
|
2625
|
-
|
2626
|
-
|
2627
|
-
|
2628
|
-
|
2629
|
-
|
2630
|
-
|
2631
|
-
|
2632
|
-
|
2633
|
-
|
2634
|
-
|
2635
|
-
|
2636
|
-
|
2637
|
-
|
2638
|
-
|
2639
|
-
|
2640
|
-
|
2641
|
-
|
2642
|
-
|
2643
|
-
|
2644
|
-
|
2645
|
-
|
2646
|
-
|
2647
|
-
|
2648
|
-
|
2649
|
-
|
2650
|
-
|
2651
|
-
|
2652
|
-
|
2653
|
-
|
2654
|
-
|
2655
|
-
|
2656
|
-
|
2657
|
-
F1 1 M Q L L R C F S I F S V I A S V L A Q E L T T I C E Q I P S P T L E 34
|
2658
|
-
|
2659
|
-
^^^ when we do f1
|
2660
|
-
the aminoacid sequence position is on the next
|
2661
|
-
line. this is bad.
|
2662
|
-
|
2663
|
-
|
2664
|
-
ff1 "ATGCAGTTACTTCGCTGATTTTCTGTTATTGCTTTTTCAATATTTTCTGTTATTGCTTCAGTTTTAGCACAGGAACTGACAACTATATGCGAGCAAATCCCCTCACCAACTTTAG
|
2665
|
-
ATGCAGTTACTTCGCTTTCTGTTATTGCTTCAGTTTTAGCACAGGAACTGACAACTATATGCGAGCAAATCCCCTCACCAACTTTAG"
|
2666
|
-
|
2667
|
-
this works semi ok...
|
2668
|
-
we probably have to rewrite the whole thing
|
2669
|
-
|
2670
|
-
BEFORE we add ANY COLOURS.
|
2671
|
-
OH WELL.
|
2672
|
-
|
2673
|
-
-------------------------------------------------------------------------------
|
2674
|
-
(100) → Add a primer-design widget
|
2675
|
-
|
2676
|
-
The idea is to be able to manipulate forward and
|
2677
|
-
reverse primer areas.
|
2678
|
-
|
2679
|
-
AND research how to do this ...
|
2680
|
-
We now have a ruby-gtk3 widget for this. It's not yet
|
2681
|
-
perfect but it is a start.
|
2682
|
-
|
2683
|
-
|
2684
|
-
https://www.bioinformatics.nl/molbi/SCLResources/sequence_notation.htm
|
2685
|
-
^^^ and check what is useful there. perhaps also add
|
2686
|
-
nicer visual cues to pretty it up a bit.
|
2687
|
-
-------------------------------------------------------------------------------
|
2688
|
-
(1) → Compare bioroebe to:
|
2689
|
-
|
2690
|
-
https://www.ncbi.nlm.nih.gov/orffinder
|
2691
|
-
|
2692
|
-
whether both return the same results; also possibly add a web GUI.
|
2693
|
-
→ it must also allow for different tables to be used!
|
2694
|
-
check this... so that we can search in standard ORF
|
2695
|
-
but also in different ORFs
|
2696
|
-
and report the length, at least the start + stop of the longest ORF... so that the result also matches
|
2697
|
-
...........................................................................
|
2698
|
-
test reverse complement in bioroebe
|
2699
|
-
^^^
|
2700
|
-
new_WWW/
|
2701
|
-
^^^ this should eventually become the new web-related interface.
|
2702
|
-
Ah well. Perhaps not ... ruby-cgi is soooooo annoying ...
|
2703
|
-
...........................................................................
|
2704
|
-
(154) → the blosum-viewer should be supported in the cgi part
|
1870
|
+
%raw_results = $sigcleave_object->signals;
|
1871
|
+
$formatted_output = $sigcleave_object->pretty_print;
|
1872
|
+
Please see Bio::Tools::Sigcleave for details.
|
1873
|
+
^^^ add this
|
1874
|
+
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Sigcleave.html
|
1875
|
+
- enable drawing of images like the following:
|
1876
|
+
http://nar.oxfordjournals.org/content/43/D1/D227/F1.large.jpg
|
1877
|
+
https://www.researchgate.net/profile/Matt_Oates/publication/268790596/figure/fig1/AS:295477619773440@1447458762108/Summary-of-all-genome-updates-and-additions-at-the-level-of-taxonomic-Class-since-the.png
|
1878
|
+
- add reverse showorf
|
1879
|
+
like emboss
|
1880
|
+
document this
|
1881
|
+
then upload
|
1882
|
+
also enable in bioshell
|
1883
|
+
r1
|
1884
|
+
r2
|
1885
|
+
r3
|
1886
|
+
like f1
|
1887
|
+
f2 f3
|
1888
|
+
^^^ there is a bug here. Try again later.
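A sketch of what r1, r2 and r3 could compute: the reverse complement, translated at offsets 0, 1 and 2 with the NCBI standard code (table 1). Only complete codons are translated and codons containing other letters become X; these choices and the helper names are assumptions, not EMBOSS showorf's exact rules.

```ruby
# NCBI standard genetic code (table 1); codon index uses the base order TCAG.
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

def reverse_complement(dna)
  dna.upcase.tr('ACGT', 'TGCA').reverse
end

# Translates complete codons only; codons with other letters become 'X'.
def translate(dna)
  codons = dna.upcase.tr('U', 'T').scan(/.../)
  residues = codons.map do |codon|
    idx = codon.chars.map { |base| BASES.index(base) }
    idx.include?(nil) ? 'X' : AMINO_ACIDS[idx[0] * 16 + idx[1] * 4 + idx[2]]
  end
  residues.join
end

# r1, r2, r3: the reverse complement read from offsets 0, 1 and 2.
def reverse_frames(dna)
  rc = reverse_complement(dna)
  (0..2).map { |offset| ["r#{offset + 1}", translate(rc[offset..-1] || '')] }.to_h
end

pp reverse_frames('TTACAT')
```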
|
1889
|
+
- we have to add expasy...
|
1890
|
+
functionality to the cmdline too.
|
1891
|
+
Which one specifically? Let's see...
|
1892
|
+
https://www.expasy.org/
|
1893
|
+
--------------------------------------------------------------------------------
|
1894
|
+
(236) → https://biopython.org/wiki/Category%3ACookbook
|
1895
|
+
^^^ clone that
|
1896
|
+
--------------------------------------------------------------------------------
|
1897
|
+
(237) → include covid genome, and begin to analyse it in bioroebe
|
1898
|
+
"Das Genom von SARS-CoV-2 sei doppelt so groß wie jenes
|
1899
|
+
von Influenzaviren, daher scheinen letztere viermal
|
1900
|
+
so schnell zu mutieren, schrieb Moshiri."
|
1901
|
+
--------------------------------------------------------------------------------
|
1902
|
+
(238) → Look at the GUIs that are part of the BioRoebe project.
|
1903
|
+
Polish these parts, at least one widget, then
|
1904
|
+
make a screenshot, as the first one.
|
1905
|
+
Then upload the image + new release and documentation!
|
1906
|
+
document also that more images will be added to this
|
1907
|
+
in the coming weeks and months. Once done, move this to the
|
1908
|
+
bottom and regularly improve on this part of the bioroebe
|
1909
|
+
project.
|
1910
|
+
^^^ also add java gui to it in the long run.
|
1911
|
+
Hmmm. And then, also consider transitioning into gtk3,
|
1912
|
+
and make more screenshots.
|
1913
|
+
--------------------------------------------------------------------------------
|
1914
|
+
(239) → https://www.ebi.ac.uk/Tools/seqstats/emboss_pepstats/
|
1915
|
+
http://www.ebi.ac.uk/Tools/services/web/toolresult.ebi?jobId=emboss_pepstats-I20160208-020243-0564-53154194-oy
|
1916
|
+
^^^^ clone the pepstat functionality
|
1917
|
+
printAA RLAVQYAPLSGCHSTIREDVHNLHFCRARKE*
|
1918
|
+
- Improve on temperature content and how it is calculated
|
1919
|
+
someone googled for it in 2014 so build on it
|
1920
|
+
--------------------------------------------------------------------------------
|
1921
|
+
(240) → pfasta /Depot/Temp/bioroebe/NM_000539.3_Homo_sapiens_rhodopsin_RHO.fasta
|
1922
|
+
Will read from the file `/Depot/Temp/bioroebe/NM_000539.3_Homo_sapiens_rhodopsin_RHO.fasta`.
|
1923
|
+
Bioroebe::ParseFasta: This sequence is assumed to be a protein.
|
1924
|
+
This sequence has 2768 aminoacids.
|
1925
|
+
We have identified a total of 1 entries in this fasta dataset.
|
1926
|
+
Bioroebe::BioShell: We will now assign this data to @_.
|
1927
|
+
Now assigning aminoacid sequence to:
|
1928
|
+
AGAGTCATCCAGCTGGAGCCCTGAGTGGCTGAGCTCAGGCCTTCGCAG
|
1929
|
+
AGAGTCATCCAGCTGGAGCCCTGAGTGGCTGAGCTCAGGCCTTCGCAG
|
1930
|
+
--------------------------------------------------------------------------------
|
1931
|
+
(241) → Formats
|
1932
|
+
BioPerl's SeqIO system understands a lot of formats and can interconvert
|
1933
|
+
all of them. Here is a current listing of formats, as of version 1.6.
|
1934
|
+
^^^ must implement this too
|
1935
|
+
Name | Description | File extension
|
1936
|
+
abi | ABI tracefile | ab[i1]
|
1937
|
+
ace | Ace database | ace
|
1938
|
+
agave | AGAVE XML |
|
1939
|
+
alf | ALF tracefile | alf
|
1940
|
+
asciitree | write-only, to visualize features |
|
1941
|
+
bsml | BSML, using XML::DOM | bsm,bsml
|
1942
|
+
bsml_sax | BSML, using XML::SAX |
|
1943
|
+
chadoxml | CHADO sequence format |
|
1944
|
+
chaos | CHAOS sequence format |
|
1945
|
+
chaosxml | Chaos XML |
|
1946
|
+
ctf | CTF tracefile | ctf
|
1947
|
+
embl | EMBL database | embl,ebl,emb,dat
|
1948
|
+
entrezgene | Entrez Gene ASN1 |
|
1949
|
+
excel | Excel |
|
1950
|
+
exp | Staden EXP format | exp
|
1951
|
+
fasta | FASTA | fasta,fast,seq,fa,fsa,nt,aa
|
1952
|
+
fastq | quality score data in FASTA-like format | fastq
|
1953
|
+
flybase_chadoxml | variant of Chado XML |
|
1954
|
+
game | GAME XML |
|
1955
|
+
gcg | GCG | gcg
|
1956
|
+
genbank | GenBank | gb
|
1957
|
+
interpro | InterProScan XML |
|
1958
|
+
kegg | KEGG |
|
1959
|
+
largefasta | Large files, fasta format |
|
1960
|
+
lasergene | Lasergene format |
|
1961
|
+
locuslink | LocusLink |
|
1962
|
+
metafasta | |
|
1963
|
+
phd | Phred | phd,phred
|
1964
|
+
pir | PIR database | pir
|
1965
|
+
pln | PLN tracefile | pln
|
1966
|
+
qual | Phred |
|
1967
|
+
raw | plain text | txt
|
1968
|
+
scf | Standard Chromatogram Format | scf
|
1969
|
+
seqxml | SeqXML sequence format | xml
|
1970
|
+
strider | DNA Strider format |
|
1971
|
+
swiss | SwissProt | swiss,sp
|
1972
|
+
tab | tab-delimited |
|
1973
|
+
table | Table |
|
1974
|
+
tigr | TIGR XML |
|
1975
|
+
tigrxml | TIGR Coordset XML |
|
1976
|
+
tinyseq | NCBI TinySeq XML |
|
1977
|
+
ztr | ZTR tracefile | ztr
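As a first step toward this list, a sketch of extension-based format guessing built from the rows of the table that name an extension. It is an assumption that extensions are trustworthy; real SeqIO-style detection would also inspect file content, and `xml` in particular is shared by several formats. `guess_format` is a hypothetical name.

```ruby
# File extension => SeqIO format name, from the table rows that list an
# extension. ab[i1] is expanded to both abi and ab1.
EXTENSION_TO_FORMAT = {
  'abi' => 'abi', 'ab1' => 'abi', 'ace' => 'ace', 'alf' => 'alf',
  'bsm' => 'bsml', 'bsml' => 'bsml', 'ctf' => 'ctf',
  'embl' => 'embl', 'ebl' => 'embl', 'emb' => 'embl', 'dat' => 'embl',
  'exp' => 'exp',
  'fasta' => 'fasta', 'fast' => 'fasta', 'seq' => 'fasta', 'fa' => 'fasta',
  'fsa' => 'fasta', 'nt' => 'fasta', 'aa' => 'fasta',
  'fastq' => 'fastq', 'gcg' => 'gcg', 'gb' => 'genbank',
  'phd' => 'phd', 'phred' => 'phd', 'pir' => 'pir', 'pln' => 'pln',
  'txt' => 'raw', 'scf' => 'scf', 'xml' => 'seqxml',
  'swiss' => 'swiss', 'sp' => 'swiss', 'ztr' => 'ztr'
}.freeze

# Guesses the format from the file extension; returns nil when unknown.
def guess_format(path)
  EXTENSION_TO_FORMAT[File.extname(path).delete_prefix('.').downcase]
end

p guess_format('NM_000539.3_Homo_sapiens_rhodopsin_RHO.fasta') # => "fasta"
```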
|
1978
|
+
--------------------------------------------------------------------------------
|
1979
|
+
(242) → Look at f1 display:
|
1980
|
+
10 20 30 40 50 60 70 80 90 100
|
1981
|
+
--------------------------------------------------------------------------------
|
1982
|
+
(243) → 1 ATGCAGTTACTTCGCTGTTTTTCAATATTTTCTGTTATTGCTTCAGTTTTAGCACAGGAACTGACAACTATATGCGAGCAAATCCCCTCACCAACTTTAG 100
|
1983
|
+
F1 1 M Q L L R C F S I F S V I A S V L A Q E L T T I C E Q I P S P T L E 34
|
1984
|
+
^^^ when we do f1
|
1985
|
+
the aminoacid sequence position is on the next
|
1986
|
+
line. this is bad.
|
1987
|
+
ff1 "ATGCAGTTACTTCGCTGATTTTCTGTTATTGCTTTTTCAATATTTTCTGTTATTGCTTCAGTTTTAGCACAGGAACTGACAACTATATGCGAGCAAATCCCCTCACCAACTTTAG
|
1988
|
+
ATGCAGTTACTTCGCTTTCTGTTATTGCTTCAGTTTTAGCACAGGAACTGACAACTATATGCGAGCAAATCCCCTCACCAACTTTAG"
|
1989
|
+
this works semi ok...
|
1990
|
+
we probably have to rewrite the whole thing
|
1991
|
+
BEFORE we add ANY COLOURS.
|
1992
|
+
OH WELL.
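One way to keep the position number from spilling onto the next line: pad the amino acid row to the width of the DNA row, placing each residue under the first base of its codon, so both trailing counts end in the same column. This is a sketch; the label width and the `frame1_display` helper are assumptions, and wrapping of long sequences is not handled.

```ruby
# NCBI standard genetic code (table 1); codon index uses the base order TCAG.
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

def codon_to_aa(codon)
  idx = codon.chars.map { |base| BASES.index(base) }
  idx.include?(nil) ? 'X' : AMINO_ACIDS[idx[0] * 16 + idx[1] * 4 + idx[2]]
end

# Returns the DNA row and the frame-1 protein row with equal widths, so the
# trailing counts line up. Each residue sits under the first base of its codon.
# Hypothetical helper; the label width is an assumption.
def frame1_display(dna, label_width: 4)
  dna = dna.upcase
  residues = dna.scan(/.../).map { |codon| codon_to_aa(codon) }
  protein = residues.map { |aa| aa.ljust(3) }.join.ljust(dna.length)
  [
    '1'.ljust(label_width) + dna + " #{dna.length}",
    'F1'.ljust(label_width) + protein + " #{residues.length}"
  ]
end

puts frame1_display('ATGCAGTTACTTCGCTGT')
```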
|
1993
|
+
--------------------------------------------------------------------------------
|
1994
|
+
(244) → Add a primer-design widget
|
1995
|
+
The idea is to be able to manipulate forward and
|
1996
|
+
reverse primer areas.
|
1997
|
+
AND research how to do this ...
|
1998
|
+
We now have a ruby-gtk3 widget for this. It's not yet
|
1999
|
+
perfect but it is a start.
|
2000
|
+
https://www.bioinformatics.nl/molbi/SCLResources/sequence_notation.htm
|
2001
|
+
^^^ and check what is useful there. perhaps also add
|
2002
|
+
nicer visual cues to pretty it up a bit.
|
2003
|
+
--------------------------------------------------------------------------------
|
2004
|
+
(245) → Compare bioroebe to:
|
2005
|
+
https://www.ncbi.nlm.nih.gov/orffinder
|
2006
|
+
whether both return the same results; also possibly add a web GUI.
|
2007
|
+
→ it must also allow for different tables to be used!
|
2008
|
+
check this... so that we can search in standard ORF
|
2009
|
+
but also in different ORFs
|
2010
|
+
and report the length, at least the start + stop of the longest ORF... so that the result also matches
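A sketch of an ORF search that takes the stop codons as a parameter, so that a different genetic code can be passed in (for example the vertebrate mitochondrial stops TAA, TAG, AGA, AGG). Assumptions: forward strand only, ATG as the sole start codon, ORFs without a stop codon are ignored, and positions are 1-based with the stop codon included. NCBI ORFfinder has more options (alternative starts, minimal length, nested ORFs), so results should be compared rather than assumed identical. `find_orfs` and `longest_orf` are hypothetical names.

```ruby
STANDARD_STOPS = %w[TAA TAG TGA].freeze

# Finds ATG...stop ORFs on the forward strand in all three frames.
# Positions are 1-based and inclusive; the stop codon is included.
def find_orfs(dna, stops: STANDARD_STOPS)
  dna = dna.upcase
  orfs = []
  3.times do |frame|
    start = nil
    frame.step(dna.length - 3, 3) do |i|
      codon = dna[i, 3]
      if start.nil?
        start = i if codon == 'ATG'
      elsif stops.include?(codon)
        orfs << { frame: frame + 1, start: start + 1, stop: i + 3, length: i + 3 - start }
        start = nil
      end
    end
  end
  orfs
end

# Longest ORF (by nucleotide length) or nil when none is found.
def longest_orf(dna, stops: STANDARD_STOPS)
  find_orfs(dna, stops: stops).max_by { |orf| orf[:length] }
end

pp longest_orf('ATGTGAAAATAA')
pp longest_orf('ATGTGAAAATAA', stops: %w[TAA TAG AGA AGG])
```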
|
2011
|
+
--------------------------------------------------------------------------------
|
2012
|
+
(246) → test reverse complement in bioroebe
|
2013
|
+
^^^
|
2014
|
+
new_WWW/
|
2015
|
+
^^^ this should eventually become the new web-related interface.
|
2016
|
+
Ah well. Perhaps not ... ruby-cgi is soooooo annoying ...
|
2017
|
+
--------------------------------------------------------------------------------
|
2018
|
+
(247) → the blosum-viewer should be supported in the cgi part
|
2705
2019
|
and sinatra part as well.
|
2706
2020
|
This now works for sinatra. Need to enable this for
|
2707
2021
|
the cgi-part too eventually.
|
2708
|
-
|
2709
|
-
(
|
2710
|
-
|
2711
|
-
|
2712
|
-
|
2713
|
-
|
2714
|
-
|
2715
|
-
|
2716
|
-
|
2717
|
-
|
2718
|
-
|
2719
|
-
|
2720
|
-
(
|
2721
|
-
|
2722
|
-
|
2723
|
-
|
2724
|
-
(
|
2725
|
-
|
2726
|
-
|
2727
|
-
|
2728
|
-
(
|
2729
|
-
|
2730
|
-
|
2731
|
-
|
2732
|
-
|
2733
|
-
|
2734
|
-
|
2735
|
-
|
2736
|
-
|
2737
|
-
|
2738
|
-
|
2739
|
-
|
2740
|
-
|
2741
|
-
|
2742
|
-
|
2743
|
-
|
2744
|
-
|
2745
|
-
|
2746
|
-
|
2747
|
-
|
2748
|
-
|
2749
|
-
|
2750
|
-
|
2751
|
-
|
2752
|
-
|
2753
|
-
|
2754
|
-
|
2755
|
-
|
2756
|
-
|
2757
|
-
(
|
2758
|
-
|
2759
|
-
|
2760
|
-
|
2761
|
-
|
2762
|
-
|
2763
|
-
|
2764
|
-
|
2765
|
-
|
2766
|
-
|
2767
|
-
|
2768
|
-
|
2769
|
-
|
2770
|
-
|
2771
|
-
|
2772
|
-
|
2773
|
-
|
2774
|
-
|
2775
|
-
|
2776
|
-
|
2777
|
-
|
2778
|
-
|
2779
|
-
|
2780
|
-
-------------------------------------------------------------------------------
|
2781
|
-
(90) - integrate calculation of the Instability index (II)
|
2782
|
-
|
2783
|
-
The instability index provides an estimate of the
|
2784
|
-
stability of your protein in a test tube. Statistical
|
2785
|
-
analysis of 12 unstable and 32 stable proteins has
|
2786
|
-
revealed [7] that there are certain dipeptides, the
|
2787
|
-
occurrence of which is significantly different in the
|
2788
|
-
unstable proteins compared with those in the stable
|
2789
|
-
ones. The authors of this method have assigned a
|
2790
|
-
weight value of instability to each of the 400
|
2791
|
-
different dipeptides (DIWV). Using these weight
|
2792
|
-
values it is possible to compute an instability
|
2793
|
-
index (II) which is defined as:
|
2794
|
-
|
2795
|
-
|
2796
|
-
II = \frac{10}{L} \sum_{i=1}^{L-1} DIWV(x[i]x[i+1])
|
2801
|
-
|
2802
|
-
|
2803
|
-
where: L is the length of sequence
|
2804
|
-
|
2805
|
-
DIWV(x[i]x[i+1]) is the instability weight value for the dipeptide starting in position i.
|
2806
|
-
|
2807
|
-
A protein whose instability index is smaller than 40
|
2808
|
-
is predicted as stable, a value above 40 predicts
|
2809
|
-
that the protein may be unstable.
|
2810
|
-
|
2022
|
+
--------------------------------------------------------------------------------
|
2023
|
+
(248) → port the sinatra stuff together in bioroebe
|
2024
|
+
create a dir: web_api
|
2025
|
+
^^^ also make params? usable in both sinatra and cgi page
|
2026
|
+
well ...............
|
2027
|
+
this is quite tough.
|
2028
|
+
Hmmmmmm...
|
2029
|
+
perhaps as a middle-step,
|
2030
|
+
add tons of HtmlTemplate[]
|
2031
|
+
and replace the ad-hoc code otherwise...
|
2032
|
+
^^^ yeah, finish the HtmlTemplate stuff.
|
2033
|
+
--------------------------------------------------------------------------------
|
2034
|
+
(249) → https://i.imgur.com/ptcSn12.png
|
2035
|
+
^^^ enable such an overview; this shows mass computation, e.g.
|
2036
|
+
peptide mass and such
|
2037
|
+
--------------------------------------------------------------------------------
|
2038
|
+
(250) → Bioroebe.sanitize_nucleotide_sequence
|
2039
|
+
^^^ port this into java. The code has been written for this already,
|
2040
|
+
but we currently fail to link it.
|
2041
|
+
--------------------------------------------------------------------------------
|
2042
|
+
(251) → Batch-create the .exe files on windows for libui, once
|
2043
|
+
the first has been added. And then test it too
|
2044
|
+
AND document it. This should be done with the controller
|
2045
|
+
eventually. Once this works, we can remove this entry
|
2046
|
+
here.
|
2047
|
+
--------------------------------------------------------------------------------
|
2048
|
+
(252) → port more libui stuff in bioroebe. We have two widgets ported so far;
|
2049
|
+
add more such entries.
|
2050
|
+
--------------------------------------------------------------------------------
|
2051
|
+
(253) → after libui has been ported, explore how gosu works on windows.
|
2052
|
+
if possible add things to a gosu-specific UI as well, but
|
2053
|
+
we may need a common, unified GUI base for that.
|
2054
|
+
--------------------------------------------------------------------------------
|
2055
|
+
(254) → (86)
|
2056
|
+
add libui bindings AND once done make sure the controller works in
|
2057
|
+
libui as well. Embed the various things into it.
|
2058
|
+
Tab: a set of named tabs for placing items in
|
2059
|
+
^^^ use this perhaps also in bioroebe hmmm
|
2060
|
+
yeah.
|
2061
|
+
--------------------------------------------------------------------------------
|
2062
|
+
(255) → https://github.com/cnjinhao/nana/wiki/User-Works-using-Nana
|
2063
|
+
^^^ port the "DNA hybrid"
|
2064
|
+
https://camo.githubusercontent.com/4c27d554ca4d698d288628f21255f917c2c577e35d7e11dd67e21880d56b6b0a/687474703a2f2f6e616e6170726f2e6f72672f696d616765732f73637265656e73686f74732f746864795f7365715f6578706c2e706e67
|
2065
|
+
--------------------------------------------------------------------------------
|
2066
|
+
(256) → Bioroebe::Cell
|
2067
|
+
^^^ think about what to do with it. If we don't need it then perhaps
|
2068
|
+
we should just remove it. Think about this more in 2022, before
|
2069
|
+
deciding what to do.
|
2070
|
+
--------------------------------------------------------------------------------
|
2071
|
+
(257) → Add emboss cpgplot functionality.
|
2072
|
+
https://www.bioinformatics.nl/cgi-bin/emboss/cpgplot
|
2073
|
+
--------------------------------------------------------------------------------
|
2074
|
+
(258) → integrate calculation of the Instability index (II)
|
2075
|
+
The instability index provides an estimate of the
|
2076
|
+
stability of your protein in a test tube. Statistical
|
2077
|
+
analysis of 12 unstable and 32 stable proteins has
|
2078
|
+
revealed [7] that there are certain dipeptides, the
|
2079
|
+
occurrence of which is significantly different in the
|
2080
|
+
unstable proteins compared with those in the stable
|
2081
|
+
ones. The authors of this method have assigned a
|
2082
|
+
weight value of instability to each of the 400
|
2083
|
+
different dipeptides (DIWV). Using these weight
|
2084
|
+
values it is possible to compute an instability
|
2085
|
+
index (II) which is defined as:
|
2086
|
+
$$ II = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}(x[i]\,x[i+1]) $$
|
2089
|
+
where L is the length of the sequence and
|
2090
|
+
DIWV(x[i]x[i+1]) is the instability weight value for the dipeptide starting in position i.
|
2091
|
+
A protein whose instability index is smaller than 40
|
2092
|
+
is predicted as stable, a value above 40 predicts
|
2093
|
+
that the protein may be unstable.
|
2811
2094
|
# MEKVQYLTRSAIRRASTIEMPQQARQKLQNLFINFCLILICLLLICIIVMLL
|
2812
|
-
|
2813
|
-
|
2814
|
-
|
2815
|
-
|
2816
|
-
|
2817
|
-
|
2818
|
-
-------------------------------------------------------------------------------
|
2819
|
-
(1) → We have now added a method to show all hydrophobic amino acids, via the
|
2095
|
+
(sequence length: 52 amino acids)
|
2096
|
+
The instability index (II) is computed to be 65.43
|
2097
|
+
This classifies the protein as unstable.
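The formula above could be sketched in Ruby as follows. This is only a sketch: the published 400-entry DIWV table is not reproduced here and must be supplied by the caller. The weights in the usage line are made-up toy numbers, and both method names are hypothetical.

```ruby
# II = (10 / L) * sum over i = 1..L-1 of DIWV(x[i] x[i+1])
# `diwv` must map every dipeptide (e.g. "MK") to its instability weight.
def instability_index(sequence, diwv)
  residues = sequence.upcase.chars
  length = residues.length
  raise ArgumentError, 'need at least two residues' if length < 2

  # Sum the weights of all overlapping dipeptides (positions 1..L-1).
  total = residues.each_cons(2).sum do |first, second|
    diwv.fetch(first + second) { |key| raise KeyError, "no DIWV weight for #{key}" }
  end
  (10.0 / length) * total
end

# Rule from the text: an II below 40 predicts a stable protein.
def predicted_stable?(instability_index)
  instability_index < 40
end

# Usage with TOY weights (not real DIWV values):
toy_weights = { 'MK' => 1.0, 'KV' => 2.0 }
ii = instability_index('MKV', toy_weights) # (10 / 3) * (1.0 + 2.0) = 10.0
puts "II = #{ii.round(2)}, stable: #{predicted_stable?(ii)}"
```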
|
2098
|
+
--------------------------------------------------------------------------------
|
2099
|
+
(259) → We have now added a method to show all hydrophobic amino acids, via the
|
2820
2100
|
method .hydrophobic_amino_acids?. This works and has been documented
|
2821
2101
|
in May 2022. However, we also still need a way to PREDICT
|
2822
2102
|
hydrophobic segments in a polypeptide sequence.
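One common approach to predicting hydrophobic segments is a Kyte-Doolittle sliding-window hydropathy scan. This is a swapped-in, well-known technique, not something Bioroebe already does. The window of 19 residues and the threshold of 1.6 are frequently used defaults for membrane-spanning stretches, but treat them as tunable assumptions. `hydrophobic_segments` is a hypothetical name.

```ruby
# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
  'A' => 1.8,  'R' => -4.5, 'N' => -3.5, 'D' => -3.5, 'C' => 2.5,
  'Q' => -3.5, 'E' => -3.5, 'G' => -0.4, 'H' => -3.2, 'I' => 4.5,
  'L' => 3.8,  'K' => -3.9, 'M' => 1.9,  'F' => 2.8,  'P' => -1.6,
  'S' => -0.8, 'T' => -0.7, 'W' => -0.9, 'Y' => -1.3, 'V' => 4.2
}.freeze

# Returns [start, stop] pairs (0-based, inclusive) of stretches where the
# mean hydropathy of a sliding window exceeds the threshold.
# Unknown letters (e.g. X) raise a KeyError via fetch.
def hydrophobic_segments(sequence, window: 19, threshold: 1.6)
  values = sequence.upcase.chars.map { |aa| KYTE_DOOLITTLE.fetch(aa) }
  return [] if values.length < window

  segments = []
  values.each_cons(window).with_index do |slice, start|
    next unless slice.sum / window > threshold

    stop = start + window - 1
    if segments.any? && start <= segments.last[1] + 1
      segments.last[1] = stop # overlapping window: extend the current segment
    else
      segments << [start, stop]
    end
  end
  segments
end

p hydrophobic_segments('MEKVQYLTRSAIRRASTIEMPQQARQKLQNLFINFCLILICLLLICIIVMLL')
```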
|
2823
|
-
|
2103
|
+
--------------------------------------------------------------------------------
|
2104
|
+
(260) → <img src="https://i.imgur.com/tkB8MTJ.png" style="margin: 1em">
|
2105
|
+
--------------------------------------------------------------------------------
|
2106
|
+
(261) → https://www.studocu.com/en-us/document/queens-college-cuny/biochemistry-laboratory/bioinformatics-exercise/13329106
|
2107
|
+
^^^ enable this via a method
|
2108
|
+
and add a screenshot
|
2109
|
+
we want to colourize an existing string
|
2110
|
+
Also use javascript for this on the www; otherwise use ruby.
|
2111
|
+
hmmm. so first the ruby variant + sinatra demo app
|
2112
|
+
^^^ this works now, see the
|
2113
|
+
image at:
|
2114
|
+
https://i.imgur.com/tkB8MTJ.png
|
2115
|
+
However, we should add a sinatra demo app too,
|
2116
|
+
demonstrate it there, and then document it as-is.
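For the web and sinatra variant, a minimal Ruby sketch can colourize an existing string by wrapping residues in HTML `<span>` elements. The hydrophobic set used here is a placeholder assumption and may differ from the set behind `.hydrophobic_amino_acids?`. The method name and CSS class are hypothetical.

```ruby
# Placeholder set of hydrophobic residues (assumption, adjust as needed).
HYDROPHOBIC_RESIDUES = %w[A C F I L M V W].freeze

# Minimal HTML escaping so arbitrary input cannot inject markup.
HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

# Wrap each hydrophobic residue in a <span>; a stylesheet can then give
# the CSS class a colour.
def colourize_hydrophobic_html(sequence, css_class: 'hydrophobic')
  sequence.each_char.map do |char|
    escaped = HTML_ESCAPES.fetch(char, char)
    if HYDROPHOBIC_RESIDUES.include?(char.upcase)
      %(<span class="#{css_class}">#{escaped}</span>)
    else
      escaped
    end
  end.join
end

puts colourize_hydrophobic_html('MKVL')
```

Keeping the colour itself in CSS means the same markup works in a sinatra template and in a static page.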
|
2117
|
+
--------------------------------------------------------------------------------
|
2118
|
+
(262) →
|
2119
|
+
make sure we have a good fasta-showing widget
|
2120
|
+
show how many nucleotides there are
|
2121
|
+
AND add support to modify this as-is
|
2122
|
+
^^^^
|
2123
|
+
The fasta showing widget in ruby-gtk3 is quite nice, but we
|
2124
|
+
need to make it more convenient to work with.
|
2125
|
+
Perhaps add a context menu that can be customized?
|
2126
|
+
hmm. and perhaps open the sequence in the editor
|
2127
|
+
or something ... perhaps also keybindings by default
|
2128
|
+
and a help option somewhere to explain all of this.
|
2129
|
+
--------------------------------------------------------------------------------
|
2130
|
+
(263) → Add a way in bioroebe to store a gene into a yaml file
|
2131
|
+
or so, and to also load it up again. Perhaps simplify
|
2132
|
+
this automatically. Need some ways to describe that.
|
2133
|
+
FastaToYaml
|
2134
|
+
^^^ perhaps?
|
2135
|
+
^^^ describe the why too
|
2136
|
+
This class now exists. We have to add more features to it
|
2137
|
+
eventually, though.
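A minimal sketch of the store-and-reload round trip with Ruby's standard `yaml` library follows. The record layout (name, species, sequence) is an assumption for illustration. It is not the format FastaToYaml actually writes, and both method names are hypothetical.

```ruby
require 'yaml'

# Hypothetical gene record: name, species and sequence.
def gene_to_yaml(name:, sequence:, species: nil)
  {
    'name'     => name,
    'species'  => species,
    # Upper-case and drop everything that is not a letter (spaces, newlines).
    'sequence' => sequence.upcase.delete('^A-Z')
  }.to_yaml
end

# safe_load only builds plain Ruby objects (Hash, String, nil, ...).
def gene_from_yaml(yaml_text)
  YAML.safe_load(yaml_text)
end

# File.write('example_gene.yml', gene_to_yaml(name: 'example_gene', sequence: 'atg aaa taa'))
# gene = gene_from_yaml(File.read('example_gene.yml'))
```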
|
2138
|
+
--------------------------------------------------------------------------------
|
2139
|
+
(264) → https://pubchem.ncbi.nlm.nih.gov/compound/16131099#section=Top
|
2140
|
+
^^^ this website is quite interesting; try to use components
|
2141
|
+
from it.
|
2142
|
+
--------------------------------------------------------------------------------
|
2143
|
+
(265) → Add some option to show the aminoacid sequence, at the least
|
2144
|
+
store it; and optionally show it.
|
2145
|
+
possibly always report how many aminoacids are
|
2146
|
+
part of that file; and optionally also show
|
2147
|
+
the whole sequence.
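Always reporting the residue count, and only optionally printing the sequence, could look like the minimal Ruby sketch below. It assumes a plain FASTA file with unique headers. Both method names are hypothetical.

```ruby
# Minimal FASTA parser: returns { header => sequence } in file order.
# Assumption: headers are unique (a repeated header overwrites the earlier record).
def parse_fasta(text)
  records = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    if line.start_with?('>')
      current = line[1..-1].strip
      records[current] = +''
    elsif current
      records[current] << line
    end
  end
  records
end

# Always report the residue count; print the sequence only on request.
def report_residues(fasta_text, show_sequence: false)
  parse_fasta(fasta_text).each do |header, sequence|
    puts "#{header}: #{sequence.length} residues"
    puts sequence if show_sequence
  end
end

# report_residues(File.read('proteins.fasta'), show_sequence: true)
```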
|
2148
|
+
--------------------------------------------------------------------------------
|
2149
|
+
(266) → http://insilico.ehu.es/
|
2150
|
+
^^^ check if we have all of this incorporated
|
2151
|
+
--------------------------------------------------------------------------------
|
2152
|
+
(267) → Integrate these nice GUI parts:
|
2153
|
+
https://dev.to/kojix2/introduction-to-gr-rb-data-visualization-with-ruby-2c39
|
2154
|
+
--------------------------------------------------------------------------------
|
2155
|
+
(268) → AND THEN test on windows as well.
|
2156
|
+
^^^^^^^^^^^^^^
|
2157
|
+
--------------------------------------------------------------------------------
|
2158
|
+
(269) → add mouse chromosome URL, also in the bioshell
|
2159
|
+
and the main README, to be of help for the
|
2160
|
+
user. add a mouse subsection.
|
2161
|
+
--------------------------------------------------------------------------------
|
2162
|
+
(270) → fix the taxonomy stuff...
|
2163
|
+
--------------------------------------------------------------------------------
|
2164
|
+
(271) → set_dna_sequence alu
|
2165
|
+
^^^ fetch random alu
|
2166
|
+
^^^ alu sequence
|
2167
|
+
Ok, we have started on this now and added more details, but we
|
2168
|
+
need to become better at searching for this
|
2169
|
+
sequence.
|
2170
|
+
--------------------------------------------------------------------------------
|
2171
|
+
(272) → draw things based on GR
|
2172
|
+
--------------------------------------------------------------------------------
|
2173
|
+
(273) → https://mycocosm.jgi.doe.gov/help/screenshots/browser_viewer.png
|
2174
|
+
^^^ offer the same functionality
|
2175
|
+
--------------------------------------------------------------------------------
|
2176
|
+
(274) → https://genome.cshlp.org/content/12/10/1611/F3.expansion.html
|
2177
|
+
^^^ enable this: we must obtain a sequence and then store it in genbank format
|
2178
|
+
so, first fetch; then store as-is.
|
2179
|
+
--------------------------------------------------------------------------------
|
2180
|
+
(275) → be able to generate nice graphics
|
2181
|
+
https://genome.cshlp.org/content/12/10/1611/F1.large.jpg
|
2182
|
+
--------------------------------------------------------------------------------
|
2183
|
+
(276) → add an rmagick wrapper, perhaps via imageparadise or something
|
2184
|
+
the idea is that we can make fancy drawings and generate
|
2185
|
+
an image for the end user to see
|
2186
|
+
--------------------------------------------------------------------------------
|
2187
|
+
(277) → https://bioperl.org/howtos/Beginners_HOWTO.html#item13
|
2188
|
+
extend the sequence object and document it
|
2189
|
+
also add:
|
2190
|
+
class Genome
|
2191
|
+
and:
|
2192
|
+
def is_circular?
|
2193
|
+
@internal_hash[:is_circular]
|
2194
|
+
end; alias circular? is_circular? # === circular?
|
2195
|
+
def species?
|
2196
|
+
@internal_hash[:species] # return the species here
|
2197
|
+
end
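A runnable version of the Genome note above is sketched below. The constructor arguments and the `sequence?` reader are assumptions added so that the object can be built and tested. Only `is_circular?`, `circular?` and `species?` come from the note.

```ruby
# Hypothetical Genome class; the keys in @internal_hash are assumptions.
class Genome
  def initialize(sequence, is_circular: false, species: nil)
    @internal_hash = {
      sequence:    sequence.upcase,
      is_circular: is_circular,
      species:     species
    }
  end

  def sequence?
    @internal_hash[:sequence]
  end

  def is_circular?
    @internal_hash[:is_circular]
  end; alias circular? is_circular? # === circular?

  def species? # return the species here
    @internal_hash[:species]
  end
end

genome = Genome.new('atgc', is_circular: true, species: 'Escherichia coli')
puts genome.circular?
```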
|
2198
|
+
--------------------------------------------------------------------------------
|
2199
|
+
(278) → http://lib.ysu.am/open_books/312400.pdf
|
2200
|
+
clone:
|
2201
|
+
Primer.pl
|
2202
|
+
This program was written to support the required informatics for a sequencing
|
2203
|
+
lab. The desire was to quickly generate primer pair candidates for use in STS
|
2204
|
+
mapping. We use Bioperl modules to fetch the sequences from GenBank.
|
2205
|
+
#! /usr/bin/perl
|
2206
|
+
#
|
2207
|
+
# primers.pl
|
2208
|
+
#
|
2209
|
+
# Reads a list of
|
2210
|
+
% primers.pl AC013798
|
2211
|
+
AC013798
|
2212
|
+
Left Right Length Penalty
|
2213
|
+
CCTCCTGGACAACCTGTGTT TGAAGTCAGGGGACATAGGG 280 0.0823
|
2214
|
+
CCTCCTGGACAACCTGTGTT AGGCCAGTAGACTGGGTGTG 298 0.1758
|
2215
|
+
CCTCCTGGACAACCTGTGTT GGTGTGAAGTCAGGGGACAT 284 0.1852
|
2216
|
+
TTCCCGCATCTCTTAGCAGT AGGCCAGTAGACTGGGTGTG 209 0.1962
|
2217
|
+
CTTCCCGCATCTCTTAGCAG GACACTAGTGGCAAGGAGGC 226 0.2362
|
2218
|
+
Most of the primers.pl program is extremely simple. The real guts and power
|
2219
|
+
of the program lie in the classes and the methods we call. The next section
|
2220
|
+
examines the Primer3 module, which is similar to many Bioperl modules
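Primer3 ranks candidates with its own penalty score, which is not reproduced here. As a first, much simpler building block, the minimal Ruby sketch below computes the GC content of each primer pair copied from the primers.pl output above. `gc_percent` is a hypothetical name.

```ruby
# GC content in percent: share of G and C among all bases of the primer.
def gc_percent(primer)
  bases = primer.upcase
  100.0 * bases.count('GC') / bases.length
end

# Left/right primer pairs copied from the primers.pl output above.
PRIMER_PAIRS = [
  %w[CCTCCTGGACAACCTGTGTT TGAAGTCAGGGGACATAGGG],
  %w[CCTCCTGGACAACCTGTGTT AGGCCAGTAGACTGGGTGTG],
  %w[CCTCCTGGACAACCTGTGTT GGTGTGAAGTCAGGGGACAT],
  %w[TTCCCGCATCTCTTAGCAGT AGGCCAGTAGACTGGGTGTG],
  %w[CTTCCCGCATCTCTTAGCAG GACACTAGTGGCAAGGAGGC]
].freeze

PRIMER_PAIRS.each do |left, right|
  printf("%s %5.1f%%   %s %5.1f%%\n", left, gc_percent(left), right, gc_percent(right))
end
```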
|
2221
|
+
--------------------------------------------------------------------------------
|
2222
|
+
(279) → Clone all of Emboss. :)
|
2223
|
+
→ Clone and document the getorf functionality properly.
|
2224
|
+
See: http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html
|
2225
|
+
http://emboss.sourceforge.net
|
2227
|
+
--------------------------------------------------------------------------------
|
2228
|
+
(280) → Add useful formulas for bioshell.
|
2229
|
+
--------------------------------------------------------------------------------
|
2230
|
+
(281) → Polish the GUI sets:
|
2231
|
+
https://i.imgur.com/djElIMh.png
|
2232
|
+
--------------------------------------------------------------------------------
|
2233
|
+
(282) → The taxonomy part should be fully integrated, without it
|
2234
|
+
being a standalone part anymore.
|
2235
|
+
continue on the taxonomy stuff.
|
2236
|
+
One day this will work again *shake fist*
|
2237
|
+
--------------------------------------------------------------------------------
|
2238
|
+
(283) → Show the frequency of codons in different tables
|
2239
|
+
This works quite ok, but right now the approach is to store
|
2240
|
+
this in a .yml file which is not ideal.
|
2241
|
+
Thus, we have to add two things:
|
2242
|
+
- The ability to store this into a SQL database
|
2243
|
+
- The ability to batch-download all of these codons,
|
2244
|
+
which first requires that we have a way to obtain all
|
2245
|
+
taxonomic ids.
|
2246
|
+
Document where this can be found.
|
2247
|
+
IMPROVE THIS ALL!!!!!!!
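Whichever backend is chosen (.yml or SQL), the data to store is essentially a codon-to-count map. The minimal Ruby sketch below computes that map for an in-frame coding sequence, plus frequencies per thousand codons. Both method names are hypothetical, and the sketch assumes the input starts in frame.

```ruby
# Count codons in an in-frame coding sequence (RNA 'U' is read as 'T').
def codon_counts(cds)
  sequence = cds.upcase.delete('^ACGTU').tr('U', 'T')
  sequence.scan(/.../).tally # an incomplete trailing codon is ignored
end

# Frequency per thousand codons, a common way to present codon usage.
def codons_per_thousand(counts)
  total = counts.values.sum.to_f
  counts.transform_values { |n| 1000.0 * n / total }
end

counts = codon_counts('ATGAAAAAATAAG')
p codons_per_thousand(counts)
```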
|
2248
|
+
--------------------------------------------------------------------------------
|
2249
|
+
(284) → improve docu + tests for melting temperature analysis again
|
2250
|
+
+ usage example + GUI + web-use
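As a starting point for a documented usage example, the minimal Ruby sketch below implements the Wallace rule, Tm = 2 * (A + T) + 4 * (G + C). It is only a rough estimate intended for short oligonucleotides. It is not necessarily the method Bioroebe's melting temperature code uses, and `wallace_tm` is a hypothetical name.

```ruby
# Wallace rule: Tm (degrees Celsius) = 2 * (A + T) + 4 * (G + C).
# Rough estimate only, meant for short oligonucleotides.
def wallace_tm(oligo)
  bases = oligo.upcase
  raise ArgumentError, "only A, C, G, T allowed: #{oligo}" if bases.match?(/[^ACGT]/)

  2 * bases.count('AT') + 4 * bases.count('GC')
end

puts wallace_tm('ATGC')
```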
|
2251
|
+
--------------------------------------------------------------------------------
|
2252
|
+
(285) → https://biopython.org/DIST/docs/tutorial/Tutorial.html#sec15
|
2253
|
+
^^^ work through the above, also integrate it + write docs
|
2254
|
+
https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta
|
2255
|
+
--------------------------------------------------------------------------------
|
2256
|
+
(286) → work a bit more on tk!!!
|
2257
|
+
in particular to start it from the bioshell as-is.
|
2258
|
+
^^^ this is mostly done for quick
|
2259
|
+
demonstration purposes
|
2260
|
+
- also add another ruby-tk widget hmm
|
2261
|
+
a new one...
|
2262
|
+
^^^^^ and fix the remaining ones
|
2263
|
+
hamming_distance [PARTIALLY IMPLEMENTED; ~80%]
|
2264
|
+
protein_to_DNA
|
2265
|
+
^^^^ improve both while improving tk_paradise docu as well.
|
2266
|
+
--------------------------------------------------------------------------------
|
2267
|
+
(287) → add 2nd_orf
|
2268
|
+
→ this shall scan for the 2nd orf
|
2269
|
+
→ and third ORF as well, then, and document it.
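The minimal Ruby sketch below returns all ORFs in start order, so the 2nd and 3rd ORF are simply `orfs[1]` and `orfs[2]`. It assumes an ORF runs from an ATG to the next in-frame stop codon and scans only the three forward frames. That is simpler than EMBOSS getorf, which also covers the reverse strand and other modes. `forward_orfs` is a hypothetical name.

```ruby
STOP_CODONS = %w[TAA TAG TGA].freeze

# Scan the three forward reading frames for ATG ... stop ORFs.
# Within one ORF, further in-frame ATGs are ignored (the first ATG wins).
# Results are sorted by start, so orfs[1] is the 2nd ORF, orfs[2] the 3rd.
def forward_orfs(dna)
  seq = dna.upcase
  orfs = []
  (0..2).each do |frame|
    start = nil
    frame.step(seq.length - 3, 3) do |i|
      codon = seq[i, 3]
      if start.nil? && codon == 'ATG'
        start = i
      elsif start && STOP_CODONS.include?(codon)
        orfs << { frame: frame, start: start, stop: i + 3, sequence: seq[start...(i + 3)] }
        start = nil
      end
    end
  end
  orfs.sort_by { |orf| orf[:start] }
end

second_orf = forward_orfs('ATGAAATAGCCATGTTTTGA')[1]
p second_orf
```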
|
2270
|
+
--------------------------------------------------------------------------------
|
2271
|
+
(288) → https://github.com/pjotrp/bigbio
|
2272
|
+
^^^^ include use cases from that readme
|
2273
|
+
--------------------------------------------------------------------------------
|
2274
|
+
(289) → bioinformatics bioroebe:
|
2275
|
+
cut_via(:trypsin)
|
2276
|
+
^^^^ show the digest as array
|
2277
|
+
then upload after documenting this
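Showing the digest as an array could be based on the usual textbook trypsin rule: cut on the C-terminal side of K or R, but not when the next residue is P. The minimal Ruby sketch below implements only that rule. `trypsin_digest` is a hypothetical name standing in for whatever `cut_via(:trypsin)` ends up doing.

```ruby
# Trypsin rule: cut after K or R, unless the next residue is P.
def trypsin_digest(protein)
  # (?<=[KR]) marks the split point right after K/R; (?!P) vetoes it before a proline.
  protein.upcase.split(/(?<=[KR])(?!P)/)
end

p trypsin_digest('AKRPGKA') # => ["AK", "RPGK", "A"]
```

Missed cleavages and other proteases would need extra options on top of this.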
|
2278
|
+
------------------------------------------------------------------------
|