bio-ucsc-api 0.2.1 → 0.3.0
- data/COPYING.ja +2 -0
- data/README.rdoc +59 -44
- data/VERSION +1 -1
- data/bio-ucsc-api.gemspec +7 -2
- data/lib/bio-ucsc.rb +4 -4
- data/lib/bio-ucsc/genomic-interval-bin.rb +13 -0
- data/lib/bio-ucsc/table_class_detector.rb +206 -67
- data/samples/bed2refseq.rb +12 -8
- data/samples/hg19-2bit-retrieve.rb +0 -2
- data/samples/hg19-sample.rb +22 -38
- data/samples/num-gene-exon.rb +10 -6
- data/samples/symbol2summary.rb +4 -5
- data/spec/chromosome_specific_tables_spec.rb +36 -0
- data/spec/find_by_and_spec.rb +18 -0
- data/spec/genomic-interval-bin_spec.rb +23 -0
- data/spec/named_scope_spec.rb +81 -0
- metadata +32 -23
data/COPYING.ja
CHANGED
data/README.rdoc
CHANGED
@@ -9,22 +9,24 @@ http://rubyucscapi.userecho.com/.
 == Features
 
 * Supporting all organisms in the UCSC genome database.
-* Using ActiveRecord as an O/R mapping framework. Basically, each tables can access using ActiveRecord
+* Using ActiveRecord as an O/R mapping framework. Basically, each tables can access using ActiveRecord method convention.
 * Using the Bin index system to improve query performance. This is one of the reason why you use Ruby UCSC API instead of submitting SQL queries directly.
 * Supporting genomic sequence query using locally downloaded "2bit" files. Genomic sequences are not stored in UCSC's official MySQL database.
 * Automatic conversion of "1-based full-closed intervals" to internal "0-based left-closed right-open intervals" (see also bioruby-genomic-interval)
-* Supporting non-official MySql hosts (e.g. local servers)
+* Supporting non-official full/partial mirror MySql hosts (e.g. local servers)
 * Using Rspec for the testing framework
+* Written in pure Ruby and supporting multiple Ruby interpreter implementations including Ruby1.8, Ruby1.9, and JRuby1.6
 * Designed as a BioRuby plugin
-* Current version does not support
+* Current version does not support table-linked bigWIG/bigBED/BAM files.
 
 == Supported databases (genome assemblies)
+
 [human] Hg19, Hg18
 [mammals] chimp (PanTro3), orangutan (PonAbe2), rhesus (RheMac2), marmoset (CalJac3), mouse (Mm9), rat (Rn4), guinea pig (CavPor3), rabbit (OryCun2), cat (FelCat4), panda (AilMel1), dog (CanFam2), horse (EquCab2), pig (SusScr2), sheep (OviAri1), cow (BosTau4), elephant (LoxAfr3), opossum (MonDom5), platypus (OrnAna1)
 [vertebrates] chicken (GalGal3), zebra finch (TaeGut1), lizard (AnoCar2), X. tropicalis (XenTro2), zebrafish (DanRer7), tetraodon (TetNig2), fugu (Fr2), stickleback (GasAcu1), medaka (OryLat2), lamprey (PetMar1)
 [deuterostomes] lancelet (BraFlo1), sea squirt (Ci2), sea urchin (StrPur2)
 [insects] D.melanogaster (Dm3), D.simulans (DroSim1), D.sechellia (DroSec1), D.yakuba (DroYak2), D.erecta (DroEre1), D.ananassae (DroAna2), D.pseudoobscura (Dp3), D.persimilis (DroPer1), D.virilis (DroVir2), D.mojavensis (DroMoj2), D.grimshawi (DroGri1), Anopheles mosquito (AnoGam1), honey bee (ApiMel2)
-[nematodes] C.elegans (Ce6), C.brenneri (CaePb3), C.briggsae (Cb3), C.remanei (CaeRem3), C.japonica (
+[nematodes] C.elegans (Ce6), C.brenneri (CaePb3), C.briggsae (Cb3), C.remanei (CaeRem3), C.japonica (CaeJap1), P.pacificus (PriPac1)
 [others] sea hare (AplCal1), yeast (SacCer2)
 [genome assembly independent] Go, HgFixed, Proteome, UniProt, VisiGene
 
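The Features list promises automatic conversion from 1-based fully closed intervals to 0-based half-open ones. The arithmetic is small enough to sketch in plain Ruby. `Region` below is a hypothetical stand-in, not Bio::GenomicInterval (whose implementation is not part of this diff), although its method names echo the ones the samples call (`parse`, `zero_based`, `zero_start`, `zero_end`).

```ruby
# Hypothetical stand-in for Bio::GenomicInterval; illustrates the coordinate
# conventions only. chr_start/chr_end are 1-based and fully closed.
Region = Struct.new(:chrom, :chr_start, :chr_end) do
  # Parse "chr1:1-11,000"; commas are treated as digit-group separators.
  def self.parse(text)
    m = /\A(\w+):([\d,]+)-([\d,]+)\z/.match(text.strip)
    raise ArgumentError, "cannot parse interval: #{text}" unless m
    new(m[1], Integer(m[2].delete(","), 10), Integer(m[3].delete(","), 10))
  end

  # Build from 0-based half-open database columns such as chromStart/chromEnd.
  def self.zero_based(chrom, zstart, zend)
    new(chrom, zstart + 1, zend)
  end

  # 0-based half-open start: one less than the 1-based start.
  def zero_start
    chr_start - 1
  end

  # 0-based half-open end: numerically equal to the 1-based closed end.
  def zero_end
    chr_end
  end

  def to_s
    "#{chrom}:#{chr_start}-#{chr_end}"
  end
end

gi = Region.parse("chr1:1-11,000")
puts "#{gi.chrom}\t#{gi.zero_start}\t#{gi.zero_end}" # chr1, 0, 11000 (tab-separated)
puts Region.zero_based("chr1", 0, 11_000).to_s       # chr1:1-11000
```

Because the half-open end equals the closed end, the length of an interval is simply `zero_end - zero_start`.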
@@ -34,24 +36,27 @@ This package is based on the followings:
 * original ruby-ucsc-api: https://github.com/jandot/ruby-ucsc-api
 * ruby-ensembl-api: https://github.com/jandot/ruby-ensembl-api
 
-
-
-* active_record http://api.rubyonrails.org/classes/ActiveRecord/Base.html
-* bioruby-genomic-interval https://github.com/misshie/bioruby-genomic-interval
-
-Requirement:
+Supported Ruby interpreter implementations:
 
 * Ruby version 1.9.2 or later
 * Ruby version 1.8.7 or later
-*
+* JRuby version 1.6.3 or later - Appropiate Java heap size may have to be specified to invoke JRuby, especially when you use Bio::Ucsc::Reference. Try "jruby -J-Xmx3g your_script.rb" to keep 3G byte heap.
+
+Major dependent gems:
+
+* active_record - http://api.rubyonrails.org/classes/ActiveRecord/Base.html
+* bioruby-genomic-interval - https://github.com/misshie/bioruby-genomic-interval
+* mysql (MySQL/Ruby MySQL API module) - http://www.tmtm.org/mysql/ruby/README.html
 
 See also:
 
-* Strozzi F, Aerts J: A Ruby API to query the Ensembl database for genomic features.
-Bioinformatics 2011, 27:1013-1014.
+* Strozzi F, Aerts J: A Ruby API to query the Ensembl database for genomic features. Bioinformatics 2011, 27:1013-1014.
 * UCSCBin library - https://github.com/misshie/UCSCBin
 
 == Change Log
+* *NEW* (v.0.3.0): Now genomic interval queries are expressed using the named scope "with_interval". Table#find_(all_)by_interval is now deprecated. Sorry for an inconstant API. However, this change enable combination queries using genomic intervals and any fields.
+* *NEW* (v.0.3.0): Bio::GenomicInterval#bin_all and Bio::GenomicInterval#bin return the bin index for the given interval.
+* *NEW* (v.0.3.0): Supporting JRuby 1.6.3 or later. Appropiate Java heap size may have to be specified to invoke JRuby, especially when you use Bio::Ucsc::Reference. Try "jruby -J-Xmx3g your_script.rb" to keep 3G byte heap.
 * *NEW* (v.0.2.1): New genome assemblies are supported: [chimp] PanTro3, [orangutan] PonAbe2, [rhesus] RheMac2, [marmoset] CalJac3, [rat] Rn4, [guinea pig] CavPor3, [rabbit] OryCun2, [cat] FelCat4, [panda] AilMel1, [Dog] CanFam2, [horse] EquCab2, [pig] SusScr2, [sheep] OviAri1, [cow] BosTau4, [elephant] LoxAfr3, [opossum] MonDom5, [platypus] OrnAna1, [chicken] GalGal3, [zebra finch] TaeGut1, [lizard] AnoCar2, [X. tropicalis] XenTro2, [zebrafish] DanRer7, [tetraodon] TetNig2, [fugu] Fr2, [stickleback] GasAcu1, [medaka] OryLat2, [lamprey] PerMar1, [lancelet] BraFlo1, [sea squirt] Ci2, [sea urchin] StrPur2, [D.simulans] DroSim1, [D.sechellia] DroSec1, [D.yakuba] DroYak2, [D.electa] DroEre1, [D.ananassae] DroAna2, [D.pseudoobscura] Dp3, [D.persimilis] DroPer1, [D. virilis] DroVir2, [D.mojavensis] DroMoj2, [D.grimshawi] DroGri1, [Anopheles mosquito] AnoGam1, [honey bee] ApiMel2, [C.brenneri] CaePb3, [C.briggsae] Cb3, [C.remanei] CaeRem3, [P.pacificus] PriPac1, [sea hare] AplCal1, [yeast] SacCer2
 * *NEW* (v.0.2.1): Supporting Ruby 1.8.7 or later
 * *NEW* Adding to human Hg19 and Hg18, the following genome assemblies are supported: [mouse] Mm9, [fruitfly] Dm3, [C. elegans] Ce6, [genome assembly independent] Go, HgFixed, Proteome, UniProt, VisiGene
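The change log says Bio::GenomicInterval#bin and #bin_all now return bin indexes, and the class templates further down pass `gi.bin_all` into `bin in (:bins)`. The standard UCSC binning scheme that these methods presumably follow can be sketched in plain Ruby as below. The function names are hypothetical, and the extended scheme UCSC uses for coordinates beyond 512 Mb is deliberately left out.

```ruby
# Standard UCSC binning scheme (five levels, coordinates below 2**29 = 512 Mb).
# Offsets run from the finest level (128 kb bins) to the single top-level bin.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # finest bins span 2**17 bases
BIN_NEXT_SHIFT = 3   # each coarser level merges 8 bins

# Smallest bin wholly containing the 0-based half-open range [zstart, zend).
# This is the value a feature row stores in its "bin" column.
def bin_from_range(zstart, zend)
  start_bin = zstart >> BIN_FIRST_SHIFT
  end_bin = (zend - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.each do |offset|
    return offset + start_bin if start_bin == end_bin
    start_bin >>= BIN_NEXT_SHIFT
    end_bin >>= BIN_NEXT_SHIFT
  end
  raise ArgumentError, "range is outside the standard binning scheme"
end

# Every bin that can hold a feature overlapping [zstart, zend): at each level,
# all bins from the one containing zstart to the one containing zend - 1.
def bins_overlapping(zstart, zend)
  start_bin = zstart >> BIN_FIRST_SHIFT
  end_bin = (zend - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    level = ((offset + start_bin)..(offset + end_bin)).to_a
    start_bin >>= BIN_NEXT_SHIFT
    end_bin >>= BIN_NEXT_SHIFT
    level
  end
end

p bin_from_range(0, 11_000)    # 585
p bins_overlapping(0, 200_000) # [585, 586, 73, 9, 1, 0]
```

A row's stored bin is always among the candidates returned for any query range the row overlaps, so `bin in (...)` can only narrow the search; the coordinate conditions still decide the exact match.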
@@ -78,64 +83,74 @@ You may need to be root or use "sudo". "--no-ri" and "--no-rdoc" options are rec
 * Before using a database, establish a connection to the database. For example, "Bio::Ucsc::Hg19::DBConnection.connect".
 * A table in a database is represented as a class in the database module. For example, the snp132 table in the hg19 database is referred by "Bio::Ucsc::Hg19::Snp132".
 * Queries to a field (column) in a table are represented by class methods of the table class. For example, finding the first record (row) of the snp132 table in the hg19 database is "Bio::Ucsc::Hg19::Snp132.first".
-* Queries using genomic intervals are supported by
+* Queries using genomic intervals are supported by the named scope ".with_intervals" and ".with_intervals_excl (omitting pertially included annotations)" method of the table class. The method accepts a Bio::GenomicInterval object containing a genomic interval such as "chr1:1233-5678". If a table to query has the "bin" column, the bin index system is automatically used to speed-up the query.
 * Fields in a retrieved record can be acccessed by using instance methods of a record object. For example, the name field of a table record stored in the "result" variable is "result.name".
 
 === Sample Codes
+At first, you have to declare the API and establish the connection to a database.
 require 'bio-ucsc'
 
-include Bio
-DBConnection.connect
+include Bio # To short-cut the class path
+Ucsc::Hg19::DBConnection.connect
 
-
-
-#
-
-
+In the first reference of a table class, the followings does not work:
+include Bio::Ucsc::Hg19
+Snp131.first # The Ruby interpreter searchs Snp131 at the top-level
+But the following line works because the API will fail to prefetch the table and define the appropriate class dynamically. "include Bio" or "include Bio::Ucsc" will work.
+Ucsc::Hg19::Snp131 # This line works
 
-
-
-
-
+Table search using genomic intervals:
+gi = GenomicInterval.parse("chr1:1-11,000")
+Ucsc::Hg19::Snp131.with_interval(gi).find(:all).each do |e|
+i = GenomicInterval.zero_based(e.chrom, e.chromStart, e.chromEnd)
+puts "#{i.chrom}\t#{i.chr_start}\t#{e.name}\t#{e[:class]}" # "e.class" does not work
 end
 
-gi =
-
-
+gi = GenomicInterval.parse("chr17:7,579,614-7,579,700")
+Ucsc::Hg19::Snp131.with_interval(gi).find(:all)
+
+Ucsc::Hg19::Snp131.with_interval_excl(gi).find(:all)
 
-
+Ucsc::Hg19::Snp132.with_interval(gi).select(:name).find_all_by_class_and_strand("in-del", "+")
 
-
-
+Ucsc::Hg19::Snp131.find_by_name("rs56289060")
+
+Sometimes, queries using raw SQLs provide elegant solutions.
 sql << 'SQL'
 SELECT name,chrom,chromStart,chromEnd,observed
 FROM snp131
 WHERE name="rs56289060"
 SQL
-p Snp131.find_by_sql(sql)
+p Ucsc::Hg19::Snp131.find_by_sql(sql)
 
-
-hg19ref =
-gi =
+retrieve reference sequence from a locally-stored 2bit file. The "hg19.2bit" file can be downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit
+hg19ref = Ucsc::Reference.load("hg19.2bit")
+gi = GenomicInterval.parse("chr1:9,500-10,999")
 hg19ref.find_by_interval(gi)
 
-
-DBConnection.db_host = '
-DBConnection.db_username = 'genome'
-DBConnection.db_password = ''
-DBConnection.connect
+Connetcting to non-official or local full/partial mirror MySQL servers
+Ucsc::Hg18::DBConnection.db_host = 'localhost'
+Ucsc::Hg18::DBConnection.db_username = 'genome'
+Ucsc::Hg18::DBConnection.db_password = ''
+Ucsc::Hg18::DBConnection.connect
+
+Ucsc::Hg18::DBConnection.default # reset to connect UCSC's public MySQL sever
+Ucsc::Hg18::DBConnection.connect
 
-
-
+And see also sample scripts in the samples directory.
+* num-gene-exon.rb - calculation of total number of genes and exons using genomic interval
+* symbol2summary.rb - getting summary descriptions using gene symbol
+* hg19-2bit-retrieve - outputting reference sequence in FASTA format
+* bed2refseq - getting unique gene symbols in the genomic intervals in a BED file.
 
 === Notes of Exceptions in Table Support
 * Table names starting with a number: Because Ruby class names cannot start with number, use the table class name starting with "T" (T for Table). Thus, the "2micron_est" table is supported by the "T2micron_est" class.
 * Table names starting with uppercase character: Classes for "HInv" and "NIAGene" tables are "HInv" and "NIAGene", respectively
-*
+* Accessing chromosome-specific tables: For example, the 'rmsk' table in hg18 is actually separated into 'chr1_rmsk', 'chr2_rmsk'... There is two way to access to them. (1) Accessing separated tables directly. There is no difference from other regular tables. However, you have to manage each separated tables. (2) Use abstract table classes (e.g., 'Rmsk') and their class methods ".find_by_interval' or '.find_all_by_interval'. These methods look for correspondent separated tables automatically. However, you cannot combine with other 'find_by_[field]' methods. Moreover, if you have to perform single- or multi-chromosomal search, you have to access separated tables individually and integrate results by yourself. Fortunately, recent databases, including hg19, seem not to use chromosome-specific tables.
 * For honey bee ApiMel2 database, Group*_chainDm2 and Group*_chainDm2Link tables are accessible using find(_all)_by_interval class methods of the ChainDm2 and ChainDm2Link classes.
 * Special field (column) names: Field names such as 'attribute', 'valid', 'validate', 'class', 'method', 'methods', and 'type' cannot be accessed using instance methods. This restriction is because of the collision of method names that are internally used by ActiveRecord. Instead, use hash to access the field like "result[:type]".
 
-=== details in "
+=== details in "with_interval"
 * When a table class is referred first time, the API prefetches the table to get a list of fields and dynamically defines a class using following algorithm.
 * If chrom/chromStart/chromEnd fields exist (BED table), the API uses them for interval queries.
 * When tName/tStart/tEnd fields exist (PSL table), the API uses them for interval queries.
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.3.0
data/bio-ucsc-api.gemspec
CHANGED
@@ -5,11 +5,11 @@
 
 Gem::Specification.new do |s|
 s.name = %q{bio-ucsc-api}
-s.version = "0.
+s.version = "0.3.0"
 
 s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
 s.authors = [%q{Hiroyuki Mishima}, %q{Jan Aerts}]
-s.date = %q{2011-08-
+s.date = %q{2011-08-23}
 s.description = %q{Ruby UCSC API: accessing the UCSC Genome Database using Ruby}
 s.email = %q{missy@be.to}
 s.extra_rdoc_files = [
@@ -294,6 +294,7 @@ Gem::Specification.new do |s|
 "lib/bio-ucsc/gasacu1/intronest.rb",
 "lib/bio-ucsc/gasacu1/mrna.rb",
 "lib/bio-ucsc/gasacu1/rmsk.rb",
+"lib/bio-ucsc/genomic-interval-bin.rb",
 "lib/bio-ucsc/go.rb",
 "lib/bio-ucsc/go/db_connection.rb",
 "lib/bio-ucsc/hg18.rb",
@@ -546,6 +547,7 @@ Gem::Specification.new do |s|
 "spec/cavpor3_spec.rb",
 "spec/cb3_spec.rb",
 "spec/ce6_spec.rb",
+"spec/chromosome_specific_tables_spec.rb",
 "spec/ci2_spec.rb",
 "spec/danrer7_spec.rb",
 "spec/dm3_spec.rb",
@@ -561,9 +563,11 @@ Gem::Specification.new do |s|
 "spec/droyak2_spec.rb",
 "spec/equcab2_spec.rb",
 "spec/felcat4_spec.rb",
+"spec/find_by_and_spec.rb",
 "spec/fr2_spec.rb",
 "spec/galgal3_spec.rb",
 "spec/gasacu1_spec.rb",
+"spec/genomic-interval-bin_spec.rb",
 "spec/go_spec.rb",
 "spec/hg18/acembly_spec.rb",
 "spec/hg18/acemblyclass_spec.rb",
@@ -5449,6 +5453,7 @@ Gem::Specification.new do |s|
 "spec/loxafr3_spec.rb",
 "spec/mm9_spec.rb",
 "spec/mondom5_spec.rb",
+"spec/named_scope_spec.rb",
 "spec/ornana1_spec.rb",
 "spec/orycun2_spec.rb",
 "spec/orylat2_spec.rb",
data/lib/bio-ucsc.rb
CHANGED
@@ -4,13 +4,13 @@
 # MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
 # License:: Ruby licence (Ryby's / GPLv2 dual)
 
-base = File.dirname(__FILE__)
-require "#{base}/
-require "
+base = "#{File.dirname(__FILE__)}/bio-ucsc"
+require "#{base}/ucsc_bin"
+require "#{base}/genomic-interval-bin"
 
 module Bio
 module Ucsc
-VERSION = "0.
+VERSION = "0.3.0"
 base = "#{File.dirname(__FILE__)}/bio-ucsc"
 
 # mammmals #####################################
@@ -22,6 +22,15 @@ module Bio
 ['attribute', 'valid', 'validate', 'class', 'method', 'methods', 'type']
 UPPERCASED_TABLE_PREFIX =
 ['HInv', 'NIAGene']
+COMMON_CLASS_METHODS = %!
+def self.find_by_interval(interval, opt = {:partial => true})
+find_first_or_all_by_interval(interval, :first, opt)
+end
+
+def self.find_all_by_interval(interval, opt = {:partial => true})
+find_first_or_all_by_interval(interval, :all, opt)
+end
+!
 
 def const_missing(sym)
 module_eval generic(sym)
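The UPPERCASED_TABLE_PREFIX constant in this hunk, together with the README notes on tables whose names start with a digit or an uppercase letter, implies a table/class name mapping roughly like the one sketched below. The helper names are hypothetical: the gem's own uphead/downhead methods are called in the class templates but not shown in this diff, so their details may differ.

```ruby
# Hypothetical re-implementation of the naming rules described in the README.
UPPERCASED_TABLE_PREFIX = %w[HInv NIAGene].freeze

# MySQL table name -> Ruby class name.
def table_to_class_name(table)
  return "T#{table}" if table =~ /\A\d/ # Ruby constants cannot start with a digit
  return table if UPPERCASED_TABLE_PREFIX.any? { |pre| table.start_with?(pre) }
  table[0, 1].upcase + table[1..-1]    # snp132 -> Snp132, refGene -> RefGene
end

# Ruby class name -> MySQL table name (inverse of the rules above).
def class_to_table_name(klass)
  name = klass.to_s
  return name[1..-1] if name =~ /\AT\d/
  return name if UPPERCASED_TABLE_PREFIX.any? { |pre| name.start_with?(pre) }
  name[0, 1].downcase + name[1..-1]
end

%w[snp132 refGene 2micron_est HInv].each do |t|
  puts "#{t} -> #{table_to_class_name(t)}"
end
```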
@@ -81,13 +90,31 @@ module Bio
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
-
-
-
-
-
-
-
+#{COMMON_CLASS_METHODS}
+
+where = <<-SQL
+tName = :chrom
+AND bin in (:bins)
+AND ((tStart BETWEEN :zstart AND :zend)
+OR (tEnd BETWEEN :zstart AND :zend)
+OR (tStart <= :zstart AND tEnd >= :zend))
+SQL
+scope(:with_interval,
+Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+where = <<-SQL
+tName = :chrom
+AND bin in (:bins)
+AND ((tStart BETWEEN :zstart AND :zend)
+AND (tEnd BETWEEN :zstart AND :zend))
+SQL
+scope(:with_interval_excl,
+Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
 
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
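The class templates in these hunks are strings: `const_missing` builds Ruby source (with `#{COMMON_CLASS_METHODS}` spliced in) and hands it to `module_eval`, so a table class exists only after its first reference. The self-contained sketch below reproduces that pattern without ActiveRecord or a database; `Demo`, `Base` and `describe` are hypothetical names. It also illustrates why the README warns that `include Bio::Ucsc::Hg19` followed by a bare `Snp131` fails: the missing constant is reported to `Object.const_missing`, not to the module's own hook.

```ruby
module Demo
  module Hg19
    # Stand-in for DBConnection: just remembers a table name per class.
    class Base
      def self.set_table_name(name)
        @table_name = name
      end

      def self.table_name
        @table_name
      end
    end

    # Shared class-method source; single-quoted heredoc, so the inner
    # interpolation survives until the generated class is evaluated.
    COMMON_CLASS_METHODS = <<-'RUBY'
      def self.describe
        "table #{table_name}"
      end
    RUBY

    # Invoked only for Demo::Hg19::<Name> constants that are not yet defined.
    def self.const_missing(sym)
      table = sym.to_s.sub(/\A./) { |c| c.downcase } # Snp131 -> snp131
      module_eval <<-RUBY
        class #{sym} < Base
          set_table_name "#{table}"
          #{COMMON_CLASS_METHODS}
        end
      RUBY
      const_get(sym)
    end
  end
end

puts Demo::Hg19::Snp131.describe # class generated on first reference

# The README's pitfall: after include, a bare name is resolved from Object,
# whose const_missing is Ruby's default, so Hg19's hook never runs.
include Demo::Hg19
pitfall_error = nil
begin
  Snp132.describe
rescue NameError => e
  pitfall_error = e.class
end
puts "bare Snp132 raised #{pitfall_error}"
```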
@@ -100,13 +127,14 @@ AND bin in (:bins)
 AND ((tStart BETWEEN :zstart AND :zend)
 OR (tEnd BETWEEN :zstart AND :zend)
 OR (tStart <= :zstart AND tEnd >= :zend))
-
+SQL
 else
-
+where = <<-SQL
 tName = :chrom
+AND bin in (:bins)
 AND ((tStart BETWEEN :zstart AND :zend)
 AND (tEnd BETWEEN :zstart AND :zend))
-
+SQL
 end
 cond = {
 :chrom => interval.chrom,
@@ -125,15 +153,28 @@ AND (tEnd BETWEEN :zstart AND :zend))
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
-
-def self.find_by_interval(interval, opt = {:partial => true})
-find_first_or_all_by_interval(interval, :first, opt)
-end
-
-def self.find_all_by_interval(interval, opt = {:partial => true})
-find_first_or_all_by_interval(interval, :all, opt)
-end
+#{COMMON_CLASS_METHODS}
 
+where = <<-SQL
+tName = :chrom
+AND ((tStart BETWEEN :zstart AND :zend)
+OR (tEnd BETWEEN :zstart AND :zend)
+OR (tStart <= :zstart AND tEnd >= :zend))
+SQL
+scope(:with_interval,
+Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+where = <<-SQL
+tName = :chrom
+AND ((tStart BETWEEN :zstart AND :zend)
+AND (tEnd BETWEEN :zstart AND :zend))
+SQL
+scope(:with_interval_excl,
+Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
 zend = interval.zero_end
@@ -174,15 +215,32 @@ AND (tEnd BETWEEN :zstart AND :zend))
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
+#{COMMON_CLASS_METHODS}
 
-
-
-
-
-
-
-
-
+where = <<-SQL
+chrom = :chrom
+AND bin in (:bins)
+AND ((chromStart BETWEEN :zstart AND :zend)
+OR (chromEnd BETWEEN :zstart AND :zend)
+OR (chromStart <= :zstart AND chromEnd >= :zend))
+SQL
+scope(:with_interval,
+Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+where = <<-SQL
+chrom = :chrom
+AND bin in (:bins)
+AND ((chromStart BETWEEN :zstart AND :zend)
+AND (chromEnd BETWEEN :zstart AND :zend))
+SQL
+scope(:with_interval_excl,
+Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
 zend = interval.zero_end
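The WHERE clauses repeated through these templates (here with chrom/chromStart/chromEnd) decide what with_interval and with_interval_excl return. Translated into plain Ruby predicates (hypothetical helper names; SQL BETWEEN is inclusive at both ends), they can be checked without a database. One consequence worth knowing: as written, a feature whose 0-based end equals the query's 0-based start, so that it only abuts the query and shares no base with it, still satisfies the partial condition. `strict_overlap?` is included for comparison.

```ruby
# SQL BETWEEN is inclusive at both ends.
def between?(x, lo, hi)
  x >= lo && x <= hi
end

# with_interval: start or end inside the query, or the feature spans the query.
def partial_match?(s, e, zs, ze)
  between?(s, zs, ze) || between?(e, zs, ze) || (s <= zs && e >= ze)
end

# with_interval_excl: both ends inside the query.
def contained_match?(s, e, zs, ze)
  between?(s, zs, ze) && between?(e, zs, ze)
end

# Strict overlap of half-open [s, e) and [zs, ze): at least one shared base.
def strict_overlap?(s, e, zs, ze)
  s < ze && e > zs
end

query = [10, 30] # 0-based half-open query
{ "inside" => [12, 18], "straddles" => [5, 20], "spans" => [0, 100],
  "abuts" => [0, 10], "disjoint" => [40, 50] }.each do |label, (s, e)|
  puts format("%-9s partial=%-5s excl=%-5s strict=%s", label,
              partial_match?(s, e, *query), contained_match?(s, e, *query),
              strict_overlap?(s, e, *query))
end
```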
@@ -220,14 +278,27 @@ AND (chromEnd BETWEEN :zstart AND :zend))
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
-
-
-
-
-
-
-
-
+#{COMMON_CLASS_METHODS}
+
+where = <<-SQL
+chrom = :chrom
+AND ((chromStart BETWEEN :zstart AND :zend)
+OR (chromEnd BETWEEN :zstart AND :zend)
+OR (chromStart <= :zstart AND chromEnd >= :zend))
+SQL
+scope(:with_interval,
+Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+where = <<-SQL
+chrom = :chrom
+AND ((chromStart BETWEEN :zstart AND :zend)
+AND (chromEnd BETWEEN :zstart AND :zend))
+SQL
+scope(:with_interval_excl,
+Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
 
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
@@ -269,14 +340,32 @@ AND (chromEnd BETWEEN :zstart AND :zend))
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
-
-
-
-
-
-
-
-
+#{COMMON_CLASS_METHODS}
+
+where = <<-SQL
+chrom = :chrom
+AND bin in (:bins)
+AND ((txStart BETWEEN :zstart AND :zend)
+OR (txEnd BETWEEN :zstart AND :zend)
+OR (txStart <= :zstart AND txEnd >= :zend))
+SQL
+scope(:with_interval,
+Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+
+where = <<-SQL
+chrom = :chrom
+AND bin in (:bins)
+AND ((txStart BETWEEN :zstart AND :zend)
+AND (txEnd BETWEEN :zstart AND :zend))
+SQL
+scope(:with_interval_excl,
+Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
 
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
@@ -314,14 +403,33 @@ AND (txEnd BETWEEN :zstart AND :zend))
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
-
-
-
-
-
-
-
-
+#{COMMON_CLASS_METHODS}
+
+def self.with_interval(gi, opt = {:partial => true})
+if opt[:partial] == true
+where = <<-SQL
+chrom = :chrom
+AND ((txStart BETWEEN :zstart AND :zend)
+OR (txEnd BETWEEN :zstart AND :zend)
+OR (txStart <= :zstart AND txEnd >= :zend))
+SQL
+else
+where = <<-SQL
+chrom = :chrom
+AND ((txStart BETWEEN :zstart AND :zend)
+AND (txEnd BETWEEN :zstart AND :zend))
+SQL
+end
+values = {
+:chrom => gi.chrom,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,
+}
+
+with_scope(:find => {:conditions => [where, values]}) do
+yield
+end
+end # def self.with_interval
 
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
@@ -363,14 +471,31 @@ AND ((txStart BETWEEN :zstart AND :zend)
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
+#{COMMON_CLASS_METHODS}
 
-
-
-
-
-
-
-
+where = <<-SQL
+genoName = :chrom
+AND bin in (:bins)
+AND ((genoStart BETWEEN :zstart AND :zend)
+OR (genoEnd BETWEEN :zstart AND :zend)
+OR (genoStart <= :zstart AND genoEnd >= :zend))
+SQL
+scope(:with_interval,
+Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+where = <<-SQL
+genoName = :chrom
+AND bin in (:bins)
+AND ((genoStart BETWEEN :zstart AND :zend)
+AND (genoEnd BETWEEN :zstart AND :zend))
+SQL
+scope(:with_interval_excl,
+Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
 
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
@@ -382,14 +507,14 @@ AND bin in (:bins)
 AND ((genoStart BETWEEN :zstart AND :zend)
 OR (genoEnd BETWEEN :zstart AND :zend)
 OR (genoStart <= :zstart AND genoEnd >= :zend))
-
+SQL
 else
 where = <<-SQL
 genoName = :chrom
 AND bin in (:bins)
 AND ((genoStart BETWEEN :zstart AND :zend)
 AND (genoEnd BETWEEN :zstart AND :zend))
-
+SQL
 end
 cond = {
 :chrom => interval.chrom,
@@ -408,14 +533,28 @@ AND (genoEnd BETWEEN :zstart AND :zend))
 class #{uphead(sym)} < DBConnection
 set_table_name "#{downhead(sym)}"
 #{delete_reserved_methods}
-
-
-
-
-
-
-
-
+#{COMMON_CLASS_METHODS}
+
+where = <<-SQL
+genoName = :chrom
+AND ((genoStart BETWEEN :zstart AND :zend)
+OR (genoEnd BETWEEN :zstart AND :zend)
+OR (genoStart <= :zstart AND genoEnd >= :zend))
+SQL
+scope(:with_interval,
+Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
+:bins => gi.bin_all,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
+where = <<-SQL
+genoName = :chrom
+AND ((genoStart BETWEEN :zstart AND :zend)
+AND (genoEnd BETWEEN :zstart AND :zend))
+SQL
+scope(:with_interval_excl,
+Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
+:zstart => gi.zero_start,
+:zend => gi.zero_end,}]}})
 
 def self.find_first_or_all_by_interval(interval, first_all, opt)
 zstart = interval.zero_start
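The README describes hg18 tables such as rmsk that are split per chromosome (chr1_rmsk, chr2_rmsk, ...), which is why one query cannot span several chromosomes and results must be merged by the caller; the genoName/genoStart/genoEnd templates above follow the RepeatMasker column layout. A hypothetical sketch of the routing step, grouping interval queries by the split table each one needs:

```ruby
# Hypothetical helpers; table naming follows the README's "chr1_rmsk" example.
Query = Struct.new(:chrom, :zstart, :zend)

# Name of the per-chromosome table holding +base_table+ rows for +chrom+.
def split_table_name(chrom, base_table)
  "#{chrom}_#{base_table}"
end

# Group queries by the split table they must be sent to; each group would be
# run against its own table and the results concatenated by the caller.
def group_by_split_table(queries, base_table)
  queries.group_by { |q| split_table_name(q.chrom, base_table) }
end

queries = [Query.new("chr1", 0, 10_000),
           Query.new("chr2", 500, 900),
           Query.new("chr1", 50_000, 60_000)]

group_by_split_table(queries, "rmsk").each do |table, qs|
  puts "#{table}: #{qs.size} interval(s)"
end
```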
data/samples/bed2refseq.rb
CHANGED
@@ -1,19 +1,23 @@
 #!/usr/local/bin/ruby-1.9
 
-require 'bio-ucsc'
-
+#require 'bio-ucsc'
+require '../lib/bio-ucsc'
 
-
+include Bio
+
+Ucsc::Hg19::DBConnection.connect
 
 genes = Array.new
 ARGF.each_line do |row|
 row.chomp!
 next if row.empty?
 chr, chr_start, chr_end = row.split("\t")
-
-
-
-
+gi = GenomicInterval.zero_based(chr,
+Integer(chr_start),
+Integer(chr_end),)
+
+results = Ucsc::Hg19::RefGene.with_interval(gi).select(:name2).find(:all)
+genes.concat(results.map{|e|e.name2})
 end
 
-genes.uniq
+genes.uniq.each{|e|puts e} if genes
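bed2refseq.rb now passes each BED row to GenomicInterval.zero_based because BED coordinates are 0-based half-open. The bookkeeping around the query can be sketched without a database. `lookup` is a hypothetical stand-in for `Ucsc::Hg19::RefGene.with_interval(gi).select(:name2)`, and the gene symbols are placeholders.

```ruby
# Convert one BED row (0-based, half-open) to a 1-based closed region string.
def bed_row_to_region(row)
  chrom, zstart, zend = row.chomp.split("\t")
  zstart = Integer(zstart, 10)
  zend = Integer(zend, 10)
  raise ArgumentError, "end before start: #{row.inspect}" if zend < zstart
  "#{chrom}:#{zstart + 1}-#{zend}"
end

# Collect unique gene symbols over all non-empty rows, in first-seen order.
# +lookup+ maps a region string to an array of symbols (hypothetical stand-in).
def unique_genes(rows, lookup)
  rows.map(&:chomp).reject(&:empty?).flat_map { |row| lookup.call(bed_row_to_region(row)) }.uniq
end

lookup = lambda { |region| region.start_with?("chr1:") ? %w[GENE_A GENE_B] : %w[GENE_A] }
puts unique_genes(["chr1\t0\t100\n", "\n", "chr2\t10\t20\n"], lookup)
```

In the sample itself, the trailing `if genes` guard is always true, since an empty Array is truthy in Ruby; it is harmless because `each` on an empty array prints nothing.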
data/samples/hg19-2bit-retrieve.rb
CHANGED
@@ -20,13 +20,11 @@ class Hg19Ref
 
 def run(interval)
 DBConnection.connect
-#ReferenceSequence.load(HG19_2BIT_FILE) # v0.1
 ref = Bio::Ucsc::Reference.load(HG19_2BIT_FILE) # v0.2 and later
 
 itv = Bio::GenomicInterval.parse(interval)
 
 puts itv.to_s
-# puts NKF.nkf("-wf50-0", ReferenceSequence.find_by_interval(itv)) # v0.1
 puts NKF.nkf("-wf50-0", ref.find_by_interval(itv)) # v0.2 and later
 end
 end
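This script folds the retrieved sequence with `NKF.nkf("-wf50-0", ...)`, apparently to get 50-column lines. To avoid depending on NKF for that, a plain-Ruby fold does the same job. The sketch assumes the sequence is a String and writes a standard FASTA record (a '>' header line followed by sequence lines); `to_fasta` is a hypothetical helper.

```ruby
# Format a sequence as one FASTA record, wrapping at +width+ columns.
# The default width of 50 mirrors the fold width the sample passes to NKF.
def to_fasta(header, seq, width = 50)
  raise ArgumentError, "width must be positive" unless width > 0
  body = seq.delete("\r\n").scan(/.{1,#{width}}/)
  ([">#{header}"] + body).join("\n") + "\n"
end

print to_fasta("chr1:9500-10999", "ACGT" * 30)
```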
data/samples/hg19-sample.rb
CHANGED
@@ -11,14 +11,14 @@
 require File.dirname(__FILE__) + '/../lib/bio-ucsc'
 require 'nkf'
 
-include Bio
+include Bio
 
-Hg19::DBConnection.connect
+Ucsc::Hg19::DBConnection.connect
 
-
-[
-
-
+gi_a =
+[GenomicInterval.parse("chr1:1-200,000"),
+GenomicInterval.parse("chr2:1-200,000"),
+GenomicInterval.parse("chr3:1-300,000"),
 ]
 
 puts
@@ -26,8 +26,10 @@ puts "Queries in Slice objects using 1-based [start,end] closed intervals"
 puts "Results in 0-based [start,end) half-open intervals"
 puts
 
-
-
+puts "test 1 (hg19/RefGene) --- Bio::Ucsc::Hg19::RefGene.with_interval"
+
+results = gi_a.map{|gi| Ucsc::Hg19::RefGene.with_interval(gi).find(:all)}
+
 puts "0-based interval\t1-based interval\tGene Symbol"
 results.flatten.each do |e|
 i = Bio::GenomicInterval.zero_based(e.chrom, e.txStart, e.txEnd)
@@ -35,26 +37,27 @@ results.flatten.each do |e|
 print "#{i.chrom}:#{i.chr_start}-#{i.chr_end}\t#{e.name2}\n"
 end
 
-
-#
+#################################################################################
 
-
-[
-
-
+gi_b =
+[GenomicInterval.parse("chr1:1-11,000"),
+GenomicInterval.parse("chr2:1-11,000"),
+GenomicInterval.parse("chr3:1-12,000"),
 ]
 
 puts
-puts "test 2 (hg19/Snp131) --- Bio::Ucsc::Hg19::Snp131.
+puts "test 2 (hg19/Snp131) --- Bio::Ucsc::Hg19::Snp131.with_interval"
 puts "0-based interval\t1-based interval\tdbSNP rs ID\tClass"
-
+
+results = gi_b.map{|gi|Ucsc::Hg19::Snp131.with_interval(gi).find(:all)}
+
 results.flatten.each do |e|
 i = Bio::GenomicInterval.zero_based(e.chrom, e.chromStart, e.chromEnd)
 print "#{e.chrom}:#{e.chromStart}-#{e.chromEnd}\t"
 print "#{i.chrom}:#{i.chr_start}-#{i.chr_end}\t#{e.name}\t#{e[:class]}\n"
 end
 
-
+###############################################################################
 #
 
 names = %w(rs56289060 rs62636508 rs28888107)
@@ -62,26 +65,7 @@ names = %w(rs56289060 rs62636508 rs28888107)
 puts
 puts "test 3 (hg19/Snp131) ---Bio::Ucsc::Hg19::Snp131.find_by_name"
 names.each do |n|
-  r =
-  i =
+  r = Ucsc::Hg19::Snp131.find_by_name(n)
+  i = GenomicInterval.zero_based(r.chrom, r.chromStart, r.chromEnd)
   puts "Query: #{n}\t#{i.chrom}\t#{i.chr_start}\t#{i.chr_end}\t#{r[:class]}"
 end
-
-#
-#
-
-results = GbCdnaInfo.find([1,2,3,4,5], :include => :description)
-results.each{|e| puts "#{e.acc}\t#{e.description.name}"}
-
-p GbCdnaInfo.find_by_acc("AA411542", :include => :description)
-
-results = KgXref.find_all_by_geneSymbol("TP53")
-results.each{|e| puts "#{e.mRNA}\t#{e.description}"}
-
-#
-#
-
-puts
-puts NKF.nkf("-wF72", RefSeqSummary.find_by_mrnaAcc("NM_000546").summary)
-puts
-puts NKF.nkf("-wF72", RefSeqSummary.find_by_mrnaAcc("NR_029476").summary)
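The sample queries with 1-based closed intervals such as `chr1:1-200,000`, and it prints results in 0-based half-open coordinates. The sketch below shows the conversion in plain Ruby, assuming a `chrom:start-end` string with optional thousands separators.

`Region`, `parse_region`, `to_zero_based` and `from_zero_based` are hypothetical helpers for illustration. They are not the bio-genomic-interval API.

```ruby
# Hypothetical helpers (not the bio-genomic-interval API) illustrating
# 1-based closed <-> 0-based half-open coordinate conversion.
Region = Struct.new(:chrom, :chr_start, :chr_end) # 1-based, closed [start, end]

# Parse "chr1:1-200,000" (commas allowed) into a 1-based closed Region.
def parse_region(text)
  m = /\A(\w+):([\d,]+)-([\d,]+)\z/.match(text)
  raise ArgumentError, "cannot parse #{text.inspect}" unless m
  Region.new(m[1], m[2].delete(",").to_i, m[3].delete(",").to_i)
end

# 1-based closed [s, e] -> 0-based half-open [s - 1, e): only the start moves.
def to_zero_based(region)
  [region.chrom, region.chr_start - 1, region.chr_end]
end

# 0-based half-open [s, e) -> 1-based closed [s + 1, e].
def from_zero_based(chrom, start0, end0)
  Region.new(chrom, start0 + 1, end0)
end

r = parse_region("chr1:1-200,000")
p to_zero_based(r) # => ["chr1", 0, 200000]
```

Only the start coordinate shifts, so the length (`chr_end - chr_start + 1` in 1-based form, `end - start` in 0-based form) is the same either way.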
data/samples/num-gene-exon.rb
CHANGED
@@ -9,15 +9,14 @@
 # number of genes, and maximum number of exons.
 #
 
-require 'bio-ucsc'
-
+#require 'bio-ucsc'
+require '../lib/bio-ucsc'
 
 interval = Bio::GenomicInterval.parse(ARGV[0])
 
-DBConnection.connect
+Bio::Ucsc::Hg18::DBConnection.connect
 
-genes = RefGene.
-
+genes = Bio::Ucsc::Hg18::RefGene.with_interval(interval).find(:all).map{|e|e.name2}.uniq
 puts "Included genes:"
 puts genes
 puts "Number of genes:"
@@ -25,7 +24,12 @@ puts genes.size
 
 total_exons = 0
 genes.each do |gene|
-  total_exons +=
+  total_exons +=
+    Bio::Ucsc::Hg18::RefGene.
+    with_interval(interval).
+    find_all_by_name2(gene).
+    map{|e|e.exonCount}.
+    max
 end
 
 puts "Number of exons (maximum):"
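The updated script lists the genes in the interval with one RefGene query. It then issues one more query per gene to find that gene's largest `exonCount`. As an alternative design, both numbers can be derived in memory from the single result set.

The sketch assumes each row responds to `name2` and `exonCount`, as the RefGene records in the script do. `Row` and `gene_exon_summary` are hypothetical names, and the rows are toy values rather than real RefGene data.

```ruby
# Stand-in for RefGene rows: only the two columns the script reads.
# The values used below are toy data, not real RefGene records.
Row = Struct.new(:name2, :exonCount)

# Returns [unique gene symbols, sum over genes of the largest exonCount
# among that gene's transcripts], i.e. the two figures the script prints.
def gene_exon_summary(rows)
  by_gene = rows.group_by(&:name2)                          # gene symbol => its transcripts
  total = by_gene.values.sum { |txs| txs.map(&:exonCount).max }
  [by_gene.keys, total]
end

rows = [Row.new("GENE_A", 3), Row.new("GENE_A", 5), Row.new("GENE_B", 2)]
genes, total = gene_exon_summary(rows)
puts genes.size # 2
puts total      # 7
```

With the gem, `rows` would be the result of the script's own `Bio::Ucsc::Hg18::RefGene.with_interval(interval).find(:all)`. The per-gene queries then become unnecessary.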
data/samples/symbol2summary.rb
CHANGED
@@ -12,13 +12,12 @@ require File.dirname(__FILE__) + '/../lib/bio-ucsc'
 require 'nkf'
 
 class Sym2Sum
-  include Bio::Ucsc::Hg19
 
   def run(genesym)
-    DBConnection.connect
-    known_gene = KgXref.find_by_geneSymbol(genesym)
-    ref_gene = RefGene.find_by_name2(genesym)
-    summary = RefSeqSummary.find_by_mrnaAcc(ref_gene.name).summary
+    Bio::Ucsc::Hg19::DBConnection.connect
+    known_gene = Bio::Ucsc::Hg19::KgXref.find_by_geneSymbol(genesym)
+    ref_gene = Bio::Ucsc::Hg19::RefGene.find_by_name2(genesym)
+    summary = Bio::Ucsc::Hg19::RefSeqSummary.find_by_mrnaAcc(ref_gene.name).summary
 
     puts "---"
     puts "Gene symbol: #{genesym}" if known_gene
data/spec/chromosome_specific_tables_spec.rb
ADDED
@@ -0,0 +1,36 @@
+require "bio-ucsc"
+require "pp"
+
+describe "Bio::Ucsc::Hg18" do
+
+  before(:all) do
+    Bio::Ucsc::Hg18::DBConnection.connect
+  end
+
+  describe "Rmsk (separated Rmsk table)" do
+    describe ".find_by_interval" do
+      context "given 'chr1:10,000-20,000'" do
+        it 'returns a first hit record' do
+          gi = Bio::GenomicInterval.parse("chr1:10,000-20,000")
+          results = Bio::Ucsc::Hg18::Rmsk.find_by_interval(gi)
+          pp results
+          results.should be_true
+        end
+      end
+    end
+  end
+
+  describe "Chr1_rmsk (separated Rmsk table)" do
+    describe ".with_interval" do
+      context "given 'chr1:10,000-20,000'" do
+        it 'returns a first hit record' do
+          gi = Bio::GenomicInterval.parse("chr1:10,000-20,000")
+          results = Bio::Ucsc::Hg18::Chr1_rmsk.with_interval(gi).find(:first)
+          pp results
+          results.should be_true
+        end
+      end
+    end
+  end
+
+end
data/spec/find_by_and_spec.rb
ADDED
@@ -0,0 +1,18 @@
+require "bio-ucsc"
+
+describe "Bio::Ucsc::Hg19::Snp132" do
+
+  describe ".find_all_by_bin_and_strand" do
+    context "given 'chr17:7,579,614-7,579,700' and '+'" do
+      it 'returns records' do
+        Bio::Ucsc::Hg19::DBConnection.connect
+        gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+        results =
+          Bio::Ucsc::Hg19::Snp132.find_all_by_chrom_and_bin_and_class("chr17",
+                                                                 gi.bin_all,
+                                                                 "in-del")
+        results.should be_kind_of(Array)
+      end
+    end
+  end
+end
data/spec/genomic-interval-bin_spec.rb
ADDED
@@ -0,0 +1,23 @@
+require "bio-ucsc"
+
+describe "Bio::GenomicInterval" do
+
+  describe ".bin_all" do
+    context "given 'chr17:7,579,614-7,579,700'" do
+      it 'returns all bins to search ([642, 80, 9, 1, 0])' do
+        gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+        gi.bin_all.should == [642, 80, 9, 1, 0]
+      end
+    end
+  end
+
+  describe ".bin" do
+    context "given 'chr17:7,579,614-7,579,700'" do
+      it 'returns the smallest bin to search (642)' do
+        gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+        gi.bin.should == 642
+      end
+    end
+  end
+
+end
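The expected values in this spec, `bin_all == [642, 80, 9, 1, 0]` and `bin == 642` for chr17:7,579,614-7,579,700, match the UCSC "standard" binning scheme. That scheme has five levels of 128 kb, 1 Mb, 8 Mb, 64 Mb and 512 Mb bins, with bin-number offsets 585, 73, 9, 1 and 0.

The sketch below implements that scheme over 0-based half-open coordinates. It is not the gem's genomic-interval-bin.rb, and it ignores the extended scheme UCSC uses for coordinates beyond 512 Mb.

```ruby
# Sketch of the UCSC "standard" binning scheme (coordinates < 512 Mb).
# Coordinates are 0-based, half-open [start, finish).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0].freeze # [585, 73, 9, 1, 0]
BIN_FIRST_SHIFT = 17 # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT  = 3  # each level up covers 8x more sequence

# Smallest single bin that fully contains [start, finish).
def bin_from_range(start, finish)
  start_bin = start >> BIN_FIRST_SHIFT
  end_bin = (finish - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.each do |offset|
    return offset + start_bin if start_bin == end_bin
    start_bin >>= BIN_NEXT_SHIFT
    end_bin >>= BIN_NEXT_SHIFT
  end
  raise ArgumentError, "#{start}-#{finish} is outside the standard binning scheme"
end

# Every bin that may hold a feature overlapping [start, finish), smallest bins first.
def bins_overlapping(start, finish)
  start_bin = start >> BIN_FIRST_SHIFT
  end_bin = (finish - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    bins = ((offset + start_bin)..(offset + end_bin)).to_a
    start_bin >>= BIN_NEXT_SHIFT
    end_bin >>= BIN_NEXT_SHIFT
    bins
  end
end

# chr17:7,579,614-7,579,700 (1-based closed) is [7_579_613, 7_579_700) 0-based.
p bin_from_range(7_579_613, 7_579_700)   # => 642
p bins_overlapping(7_579_613, 7_579_700) # => [642, 80, 9, 1, 0]
```

A query restricted to `bin IN (...)` over such a list only has to scan rows from a few bins before the exact coordinate test. find_by_and_spec.rb relies on this when it passes `gi.bin_all` to a dynamic finder; ActiveRecord turns an Array argument into an `IN` condition.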
data/spec/named_scope_spec.rb
ADDED
@@ -0,0 +1,81 @@
+require "bio-ucsc"
+require "pp"
+
+describe "Bio::Ucsc::Hg19" do
+
+  before(:all) do
+    Bio::Ucsc::Hg19::DBConnection.connect
+  end
+
+  describe "Snp132 (BED table)" do
+    describe ".with_interval" do
+      context "given 'chr17:7,579,614-7,579,700' and class='in-del'" do
+        it 'returns records' do
+          gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+          results = Bio::Ucsc::Hg19::Snp132.
+            with_interval(gi).
+            find_all_by_class("in-del")
+          pp results
+          results.should be_kind_of(Array)
+        end
+      end
+
+      context "given 'chr17:7,579,614-7,579,700'/non-partial and class='in-del'" do
+        it 'returns records' do
+          gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+          results = Bio::Ucsc::Hg19::Snp132.
+            with_interval_excl(gi).
+            find_all_by_class("in-del")
+          pp results
+          results.should be_kind_of(Array)
+        end
+      end
+    end
+  end # describe "Snp132"
+
+  describe "Rmsk (Rmsk table)" do
+    describe ".with_interval" do
+      context "given 'chr1:10,000-200,00' and repClass='LINE'" do
+        it 'returns hit records' do
+          gi = Bio::GenomicInterval.parse("chr1:10,000-20,000")
+          results = Bio::Ucsc::Hg19::Rmsk.
+            with_interval(gi).
+            find_all_by_repClass "LINE"
+          pp results
+          results.should be_kind_of(Array)
+        end
+      end
+    end
+  end # describe "Rmsk"
+
+  describe "RefGene (genePred table)" do
+    describe ".with_interval" do
+      context "given 'chr1:10,000-100,000' and strand='+'" do
+        it 'returns hit records' do
+          gi = Bio::GenomicInterval.parse("chr1:10,000-100,000")
+          results = Bio::Ucsc::Hg19::RefGene.
+            with_interval(gi).
+            find_all_by_strand "+"
+          pp results
+          results.should be_kind_of(Array)
+        end
+      end
+    end
+  end # describe "RefGene"
+
+  describe "ChainAilMel1 (PSL table)" do
+    describe ".with_interval" do
+      context "given 'chr1:10,000-50,000' and qStrand='+'" do
+        it 'returns hit records' do
+          gi = Bio::GenomicInterval.parse("chr1:10,000-50,000")
+          results = Bio::Ucsc::Hg19::ChainAilMel1.
+            with_interval(gi).
+            find_all_by_qStrand "+"
+          pp results
+          results.should be_kind_of(Array)
+        end
+      end
+    end
+  end # "ChainAilMel1 (PSL table)"
+
+end # describe "Bio::Ucsc::Hg19"
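This spec exercises both `with_interval` and `with_interval_excl`, and it labels the latter "non-partial". The sketch below shows two candidate filters on 0-based half-open coordinates. It assumes `with_interval` keeps any feature that overlaps the query, while `with_interval_excl` keeps only features lying entirely inside it.

That reading is inferred from the spec's wording, not from the gem's source. `Feature`, `overlaps?` and `contained?` are hypothetical names, and the coordinates are toy values.

```ruby
# Assumed semantics (inferred from the spec wording), 0-based half-open intervals.
Feature = Struct.new(:name, :chrom_start, :chrom_end)

# Overlap ("with_interval"-style): shares at least one base with [q_start, q_end).
def overlaps?(f, q_start, q_end)
  f.chrom_start < q_end && q_start < f.chrom_end
end

# Containment ("with_interval_excl"-style): lies entirely inside [q_start, q_end).
def contained?(f, q_start, q_end)
  q_start <= f.chrom_start && f.chrom_end <= q_end
end

q_start, q_end = 100, 200
features = [
  Feature.new("inside",   120, 150),
  Feature.new("straddle",  90, 110),
  Feature.new("outside",  200, 250), # starts at q_end: no shared base
]
p features.select { |f| overlaps?(f, q_start, q_end) }.map(&:name)  # => ["inside", "straddle"]
p features.select { |f| contained?(f, q_start, q_end) }.map(&:name) # => ["inside"]
```

With half-open coordinates, a feature that merely starts where the query ends shares no base with it. That is why both comparisons in `overlaps?` are strict.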
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-ucsc-api
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
   prerelease:
 platform: ruby
 authors:
@@ -10,11 +10,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-08-
+date: 2011-08-23 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord
-  requirement: &
+  requirement: &174367420 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -22,10 +22,10 @@ dependencies:
         version: 3.0.7
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *174367420
 - !ruby/object:Gem::Dependency
   name: activesupport
-  requirement: &
+  requirement: &174366940 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -33,10 +33,10 @@ dependencies:
         version: 3.0.7
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *174366940
 - !ruby/object:Gem::Dependency
   name: mysql
-  requirement: &
+  requirement: &174366460 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -44,10 +44,10 @@ dependencies:
         version: 2.8.1
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *174366460
 - !ruby/object:Gem::Dependency
   name: bio-genomic-interval
-  requirement: &
+  requirement: &174365980 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -55,10 +55,10 @@ dependencies:
         version: 0.1.2
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *174365980
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &
+  requirement: &174365500 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -66,10 +66,10 @@ dependencies:
         version: 2.5.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *174365500
 - !ruby/object:Gem::Dependency
   name: bundler
-  requirement: &
+  requirement: &174365020 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -77,10 +77,10 @@ dependencies:
         version: 1.0.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *174365020
 - !ruby/object:Gem::Dependency
   name: jeweler
-  requirement: &
+  requirement: &174327460 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -88,10 +88,10 @@ dependencies:
         version: 1.5.2
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *174327460
 - !ruby/object:Gem::Dependency
   name: rcov
-  requirement: &
+  requirement: &174326840 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -99,10 +99,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *174326840
 - !ruby/object:Gem::Dependency
   name: bio
-  requirement: &
+  requirement: &174326240 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -110,10 +110,10 @@ dependencies:
         version: 1.4.1
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *174326240
 - !ruby/object:Gem::Dependency
   name: rdoc
-  requirement: &
+  requirement: &174325620 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -121,7 +121,7 @@ dependencies:
         version: 3.9.1
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *174325620
 description: ! 'Ruby UCSC API: accessing the UCSC Genome Database using Ruby'
 email: missy@be.to
 executables: []
@@ -407,6 +407,7 @@ files:
 - lib/bio-ucsc/gasacu1/intronest.rb
 - lib/bio-ucsc/gasacu1/mrna.rb
 - lib/bio-ucsc/gasacu1/rmsk.rb
+- lib/bio-ucsc/genomic-interval-bin.rb
 - lib/bio-ucsc/go.rb
 - lib/bio-ucsc/go/db_connection.rb
 - lib/bio-ucsc/hg18.rb
@@ -652,6 +653,7 @@ files:
 - spec/cavpor3_spec.rb
 - spec/cb3_spec.rb
 - spec/ce6_spec.rb
+- spec/chromosome_specific_tables_spec.rb
 - spec/ci2_spec.rb
 - spec/danrer7_spec.rb
 - spec/dm3_spec.rb
@@ -667,9 +669,11 @@ files:
 - spec/droyak2_spec.rb
 - spec/equcab2_spec.rb
 - spec/felcat4_spec.rb
+- spec/find_by_and_spec.rb
 - spec/fr2_spec.rb
 - spec/galgal3_spec.rb
 - spec/gasacu1_spec.rb
+- spec/genomic-interval-bin_spec.rb
 - spec/go_spec.rb
 - spec/hg18/acembly_spec.rb
 - spec/hg18/acemblyclass_spec.rb
@@ -5555,6 +5559,7 @@ files:
 - spec/loxafr3_spec.rb
 - spec/mm9_spec.rb
 - spec/mondom5_spec.rb
+- spec/named_scope_spec.rb
 - spec/ornana1_spec.rb
 - spec/orycun2_spec.rb
 - spec/orylat2_spec.rb
@@ -5591,7 +5596,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash:
+      hash: 2327058743331415747
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
@@ -5620,6 +5625,7 @@ test_files:
 - spec/cavpor3_spec.rb
 - spec/cb3_spec.rb
 - spec/ce6_spec.rb
+- spec/chromosome_specific_tables_spec.rb
 - spec/ci2_spec.rb
 - spec/danrer7_spec.rb
 - spec/dm3_spec.rb
@@ -5635,9 +5641,11 @@ test_files:
 - spec/droyak2_spec.rb
 - spec/equcab2_spec.rb
 - spec/felcat4_spec.rb
+- spec/find_by_and_spec.rb
 - spec/fr2_spec.rb
 - spec/galgal3_spec.rb
 - spec/gasacu1_spec.rb
+- spec/genomic-interval-bin_spec.rb
 - spec/go_spec.rb
 - spec/hg18/acembly_spec.rb
 - spec/hg18/acemblyclass_spec.rb
@@ -10523,6 +10531,7 @@ test_files:
 - spec/loxafr3_spec.rb
 - spec/mm9_spec.rb
 - spec/mondom5_spec.rb
+- spec/named_scope_spec.rb
 - spec/ornana1_spec.rb
 - spec/orycun2_spec.rb
 - spec/orylat2_spec.rb