bio-ucsc-api 0.2.1 → 0.3.0

data/COPYING.ja CHANGED
@@ -1,3 +1,5 @@
1
+ # -*- coding: iso-2022-jp -*-
2
+
1
3
  This program is free software. You can redistribute it under the GPL (the GNU General
2
4
  Public License) version 2 or under the conditions shown below.
3
5
  For details on the GPL, see the GPL file.
@@ -9,22 +9,24 @@ http://rubyucscapi.userecho.com/.
9
9
  == Features
10
10
 
11
11
  * Supporting all organisms in the UCSC genome database.
12
- * Using ActiveRecord as an O/R mapping framework. Basically, each tables can access using ActiveRecord methods.
12
+ * Using ActiveRecord as an O/R mapping framework. Basically, each table can be accessed using the ActiveRecord method conventions.
13
13
  * Using the Bin index system to improve query performance. This is one of the reasons to use the Ruby UCSC API instead of submitting SQL queries directly.
14
14
  * Supporting genomic sequence query using locally downloaded "2bit" files. Genomic sequences are not stored in UCSC's official MySQL database.
15
15
  * Automatic conversion of "1-based full-closed intervals" to internal "0-based left-closed right-open intervals" (see also bioruby-genomic-interval)
16
- * Supporting non-official MySql hosts (e.g. local servers)
16
+ * Supporting non-official full/partial mirror MySQL hosts (e.g. local servers)
17
17
  * Using Rspec for the testing framework
18
+ * Written in pure Ruby and supporting multiple Ruby interpreter implementations including Ruby1.8, Ruby1.9, and JRuby1.6
18
19
  * Designed as a BioRuby plugin
19
- * Current version does not support tables linking to bigWIG or bigBED files.
20
+ * Current version does not support table-linked bigWIG/bigBED/BAM files.
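The interval conversion named in the Features list can be illustrated with a minimal sketch. The helper names `to_zero_based` and `to_one_based` are hypothetical and are not part of bio-ucsc-api or bioruby-genomic-interval; they only show the arithmetic between "1-based full-closed" and "0-based left-closed right-open" coordinates.

```ruby
# Hypothetical helpers (assumptions for illustration, not the gem's API).

# 1-based closed [start1, end1] -> 0-based half-open [start1 - 1, end1)
def to_zero_based(start1, end1)
  [start1 - 1, end1]
end

# 0-based half-open [zstart, zend) -> 1-based closed [zstart + 1, zend]
def to_one_based(zstart, zend)
  [zstart + 1, zend]
end

zstart, zend = to_zero_based(1233, 5678)
puts "chr1:1233-5678 is stored as [#{zstart}, #{zend})"
# In half-open coordinates the length is simply end minus start.
puts "length in bases: #{zend - zstart}"
```

Only the start coordinate shifts by one; the end coordinate keeps the same number in both systems.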
20
21
 
21
22
  == Supported databases (genome assemblies)
23
+
22
24
  [human] Hg19, Hg18
23
25
  [mammals] chimp (PanTro3), orangutan (PonAbe2), rhesus (RheMac2), marmoset (CalJac3), mouse (Mm9), rat (Rn4), guinea pig (CavPor3), rabbit (OryCun2), cat (FelCat4), panda (AilMel1), dog (CanFam2), horse (EquCab2), pig (SusScr2), sheep (OviAri1), cow (BosTau4), elephant (LoxAfr3), opossum (MonDom5), platypus (OrnAna1)
24
26
  [vertebrates] chicken (GalGal3), zebra finch (TaeGut1), lizard (AnoCar2), X. tropicalis (XenTro2), zebrafish (DanRer7), tetraodon (TetNig2), fugu (Fr2), stickleback (GasAcu1), medaka (OryLat2), lamprey (PetMar1)
25
27
  [deuterostomes] lancelet (BraFlo1), sea squirt (Ci2), sea urchin (StrPur2)
26
28
  [insects] D.melanogaster (Dm3), D.simulans (DroSim1), D.sechellia (DroSec1), D.yakuba (DroYak2), D.erecta (DroEre1), D.ananassae (DroAna2), D.pseudoobscura (Dp3), D.persimilis (DroPer1), D.virilis (DroVir2), D.mojavensis (DroMoj2), D.grimshawi (DroGri1), Anopheles mosquito (AnoGam1), honey bee (ApiMel2)
27
- [nematodes] C.elegans (Ce6), C.brenneri (CaePb3), C.briggsae (Cb3), C.remanei (CaeRem3), C.japonica (CarJap1), P.pacificus (PriPac1)
29
+ [nematodes] C.elegans (Ce6), C.brenneri (CaePb3), C.briggsae (Cb3), C.remanei (CaeRem3), C.japonica (CaeJap1), P.pacificus (PriPac1)
28
30
  [others] sea hare (AplCal1), yeast (SacCer2)
29
31
  [genome assembly independent] Go, HgFixed, Proteome, UniProt, VisiGene
30
32
 
@@ -34,24 +36,27 @@ This package is based on the followings:
34
36
  * original ruby-ucsc-api: https://github.com/jandot/ruby-ucsc-api
35
37
  * ruby-ensembl-api: https://github.com/jandot/ruby-ensembl-api
36
38
 
37
- Major dependent gems:
38
-
39
- * active_record http://api.rubyonrails.org/classes/ActiveRecord/Base.html
40
- * bioruby-genomic-interval https://github.com/misshie/bioruby-genomic-interval
41
-
42
- Requirement:
39
+ Supported Ruby interpreter implementations:
43
40
 
44
41
  * Ruby version 1.9.2 or later
45
42
  * Ruby version 1.8.7 or later
46
- * (To-Do: JRuby support)
43
+ * JRuby version 1.6.3 or later - An appropriate Java heap size may have to be specified when invoking JRuby, especially when you use Bio::Ucsc::Reference. Try "jruby -J-Xmx3g your_script.rb" to allow a 3 GB heap.
44
+
45
+ Major dependent gems:
46
+
47
+ * active_record - http://api.rubyonrails.org/classes/ActiveRecord/Base.html
48
+ * bioruby-genomic-interval - https://github.com/misshie/bioruby-genomic-interval
49
+ * mysql (MySQL/Ruby MySQL API module) - http://www.tmtm.org/mysql/ruby/README.html
47
50
 
48
51
  See also:
49
52
 
50
- * Strozzi F, Aerts J: A Ruby API to query the Ensembl database for genomic features.
51
- Bioinformatics 2011, 27:1013-1014.
53
+ * Strozzi F, Aerts J: A Ruby API to query the Ensembl database for genomic features. Bioinformatics 2011, 27:1013-1014.
52
54
  * UCSCBin library - https://github.com/misshie/UCSCBin
53
55
 
54
56
  == Change Log
57
+ * *NEW* (v.0.3.0): Genomic interval queries are now expressed using the named scope "with_interval". Table#find_(all_)by_interval is now deprecated. Sorry for the inconsistent API. However, this change enables queries that combine genomic intervals with any other fields.
58
+ * *NEW* (v.0.3.0): Bio::GenomicInterval#bin_all and Bio::GenomicInterval#bin return the bin index for the given interval.
59
+ * *NEW* (v.0.3.0): Supporting JRuby 1.6.3 or later. An appropriate Java heap size may have to be specified when invoking JRuby, especially when you use Bio::Ucsc::Reference. Try "jruby -J-Xmx3g your_script.rb" to allow a 3 GB heap.
55
60
  * *NEW* (v.0.2.1): New genome assemblies are supported: [chimp] PanTro3, [orangutan] PonAbe2, [rhesus] RheMac2, [marmoset] CalJac3, [rat] Rn4, [guinea pig] CavPor3, [rabbit] OryCun2, [cat] FelCat4, [panda] AilMel1, [dog] CanFam2, [horse] EquCab2, [pig] SusScr2, [sheep] OviAri1, [cow] BosTau4, [elephant] LoxAfr3, [opossum] MonDom5, [platypus] OrnAna1, [chicken] GalGal3, [zebra finch] TaeGut1, [lizard] AnoCar2, [X. tropicalis] XenTro2, [zebrafish] DanRer7, [tetraodon] TetNig2, [fugu] Fr2, [stickleback] GasAcu1, [medaka] OryLat2, [lamprey] PetMar1, [lancelet] BraFlo1, [sea squirt] Ci2, [sea urchin] StrPur2, [D.simulans] DroSim1, [D.sechellia] DroSec1, [D.yakuba] DroYak2, [D.erecta] DroEre1, [D.ananassae] DroAna2, [D.pseudoobscura] Dp3, [D.persimilis] DroPer1, [D.virilis] DroVir2, [D.mojavensis] DroMoj2, [D.grimshawi] DroGri1, [Anopheles mosquito] AnoGam1, [honey bee] ApiMel2, [C.brenneri] CaePb3, [C.briggsae] Cb3, [C.remanei] CaeRem3, [P.pacificus] PriPac1, [sea hare] AplCal1, [yeast] SacCer2
56
61
  * *NEW* (v.0.2.1): Supporting Ruby 1.8.7 or later
57
62
  * *NEW* Adding to human Hg19 and Hg18, the following genome assemblies are supported: [mouse] Mm9, [fruitfly] Dm3, [C. elegans] Ce6, [genome assembly independent] Go, HgFixed, Proteome, UniProt, VisiGene
@@ -78,64 +83,74 @@ You may need to be root or use "sudo". "--no-ri" and "--no-rdoc" options are rec
78
83
  * Before using a database, establish a connection to the database. For example, "Bio::Ucsc::Hg19::DBConnection.connect".
79
84
  * A table in a database is represented as a class in the database module. For example, the snp132 table in the hg19 database is referred by "Bio::Ucsc::Hg19::Snp132".
80
85
  * Queries to a field (column) in a table are represented by class methods of the table class. For example, finding the first record (row) of the snp132 table in the hg19 database is "Bio::Ucsc::Hg19::Snp132.first".
81
- * Queries using genomic intervals are supported by .find_by_intervals (returns the first hit record) and .find_all_by (returns all the hit records) class methods. Each method accepts a Bio::GenomicInterval object containing a genomic interval such as "chr1:1233-5678". If a table to query has the "bin"column, the bin index system is automatically used to speed-up the query.
86
+ * Queries using genomic intervals are supported by the named scopes ".with_interval" and ".with_interval_excl" (the latter omits partially included annotations) of the table class. Each scope accepts a Bio::GenomicInterval object containing a genomic interval such as "chr1:1233-5678". If the table being queried has a "bin" column, the bin index system is automatically used to speed up the query.
82
87
  * Fields in a retrieved record can be accessed by using instance methods of a record object. For example, the name field of a table record stored in the "result" variable is "result.name".
83
88
 
84
89
  === Sample Codes
90
+ First, load the API and establish a connection to a database.
85
91
  require 'bio-ucsc'
86
92
 
87
- include Bio::Ucsc::Hg19
88
- DBConnection.connect
93
+ include Bio # To short-cut the class path
94
+ Ucsc::Hg19::DBConnection.connect
89
95
 
90
- # When using a table-class first time, refer the class using full-path.
91
- # If not, the API will fail to prefetch the table and define the appropriate class.
92
- # After that, you can refer the table class with a short name enabled by
93
- # top level "include" function
94
- Bio::Ucsc::Hg19::Snp131 # This line just refer the table class
96
+ On the first reference to a table class, the following does not work:
97
+ include Bio::Ucsc::Hg19
98
+ Snp131.first # The Ruby interpreter searches for Snp131 at the top level
99
+ The short name fails because the API cannot prefetch the table and define the appropriate class dynamically. The following fully qualified reference works; "include Bio" or "include Bio::Ucsc" is fine.
100
+ Ucsc::Hg19::Snp131 # This line works
95
101
 
96
- gi = Bio::GenomicInterval.parse("chr1:1-11,000")
97
- Snp131.find_all_by_interval(gi).each do |e|
98
- i = Bio::GenomicInterval.zero_based(e.chrom, e.chromStart, e.chromEnd)
99
- puts "#{i.chrom}\t#{i.chr_start}\t#{e.name}\t#{e[:class]}"
102
+ Table search using genomic intervals:
103
+ gi = GenomicInterval.parse("chr1:1-11,000")
104
+ Ucsc::Hg19::Snp131.with_interval(gi).find(:all).each do |e|
105
+ i = GenomicInterval.zero_based(e.chrom, e.chromStart, e.chromEnd)
106
+ puts "#{i.chrom}\t#{i.chr_start}\t#{e.name}\t#{e[:class]}" # "e.class" does not work
100
107
  end
101
108
 
102
- gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
103
- p Snp131.find_all_by_interval(gi)
104
- p Snp131.find_all_by_interval(gi, partial:false)
109
+ gi = GenomicInterval.parse("chr17:7,579,614-7,579,700")
110
+ Ucsc::Hg19::Snp131.with_interval(gi).find(:all)
111
+
112
+ Ucsc::Hg19::Snp131.with_interval_excl(gi).find(:all)
105
113
 
106
- p Snp131.find_by_name("rs56289060")
114
+ Ucsc::Hg19::Snp132.with_interval(gi).select(:name).find_all_by_class_and_strand("in-del", "+")
107
115
 
108
- # Sometimes, queries using raw SQL provide elegant solutions.
109
- #
116
+ Ucsc::Hg19::Snp131.find_by_name("rs56289060")
117
+
118
+ Sometimes, queries using raw SQL provide elegant solutions.
110
119
  sql = <<'SQL'
111
120
  SELECT name,chrom,chromStart,chromEnd,observed
112
121
  FROM snp131
113
122
  WHERE name="rs56289060"
114
123
  SQL
115
- p Snp131.find_by_sql(sql)
124
+ p Ucsc::Hg19::Snp131.find_by_sql(sql)
116
125
 
117
- # retrieve reference sequence from a locally-stored 2bit file
118
- hg19ref = Bio::Ucsc::Reference.load("hg19.2bit")
119
- gi = Bio::GenomicInterval.parse("chr1:9,500-10,999")
126
+ Retrieve a reference sequence from a locally stored 2bit file. The "hg19.2bit" file can be downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit
127
+ hg19ref = Ucsc::Reference.load("hg19.2bit")
128
+ gi = GenomicInterval.parse("chr1:9,500-10,999")
120
129
  hg19ref.find_by_interval(gi)
121
130
 
122
- # Connetcting to non-official or local mirror MySQL servers
123
- DBConnection.db_host = 'foo.example.edu'
124
- DBConnection.db_username = 'genome'
125
- DBConnection.db_password = ''
126
- DBConnection.connect
131
+ Connecting to non-official or local full/partial mirror MySQL servers
132
+ Ucsc::Hg18::DBConnection.db_host = 'localhost'
133
+ Ucsc::Hg18::DBConnection.db_username = 'genome'
134
+ Ucsc::Hg18::DBConnection.db_password = ''
135
+ Ucsc::Hg18::DBConnection.connect
136
+
137
+ Ucsc::Hg18::DBConnection.default # reset to connect to UCSC's public MySQL server
138
+ Ucsc::Hg18::DBConnection.connect
127
139
 
128
- DBConnection.default # reset to connect UCSC's public MySQL sever
129
- DBConnection.connect
140
+ See also the sample scripts in the samples directory.
141
+ * num-gene-exon.rb - calculation of total number of genes and exons using genomic interval
142
+ * symbol2summary.rb - getting summary descriptions using gene symbol
143
+ * hg19-2bit-retrieve - outputting reference sequence in FASTA format
144
+ * bed2refseq - getting unique gene symbols in the genomic intervals in a BED file.
130
145
 
131
146
  === Notes of Exceptions in Table Support
132
147
  * Table names starting with a number: Because Ruby class names cannot start with a number, use the table class name starting with "T" (T for Table). Thus, the "2micron_est" table is supported by the "T2micron_est" class.
133
148
  * Table names starting with an uppercase character: Classes for "HInv" and "NIAGene" tables are "HInv" and "NIAGene", respectively
134
- * Tables separated into each chromosome, like 'chr1_rmsk', 'chr2_rmsk'... are supported by a representative class ('Rmsk'). When using the find(_all)_by_interval class method, the API invoke required separated tables automatically.
149
+ * Accessing chromosome-specific tables: For example, the 'rmsk' table in hg18 is actually separated into 'chr1_rmsk', 'chr2_rmsk', and so on. There are two ways to access them. (1) Access the separated tables directly. They behave like any other regular table, but you have to manage each separated table yourself. (2) Use abstract table classes (e.g., 'Rmsk') and their class methods '.find_by_interval' or '.find_all_by_interval'. These methods look up the corresponding separated tables automatically. However, they cannot be combined with other 'find_by_[field]' methods. Moreover, if you have to perform a single- or multi-chromosome search, you have to access the separated tables individually and integrate the results yourself. Fortunately, recent databases, including hg19, do not seem to use chromosome-specific tables.
135
150
  * For honey bee ApiMel2 database, Group*_chainDm2 and Group*_chainDm2Link tables are accessible using find(_all)_by_interval class methods of the ChainDm2 and ChainDm2Link classes.
136
151
  * Special field (column) names: Field names such as 'attribute', 'valid', 'validate', 'class', 'method', 'methods', and 'type' cannot be accessed using instance methods. This restriction is because of the collision of method names that are internally used by ActiveRecord. Instead, use hash to access the field like "result[:type]".
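The naming rules in the notes above can be sketched as a small mapping from table names to class names. `table_class_name` is a hypothetical helper written for illustration under the rules stated in this README; it is not the gem's own implementation (the library code uses its own `uphead`/`downhead` helpers, whose bodies are not shown in this diff).

```ruby
# Hypothetical helper (an assumption for illustration only).
def table_class_name(table)
  # A Ruby constant cannot start with a digit, so prefix "T" (T for Table).
  return "T#{table}" if table =~ /\A\d/
  # Otherwise upcase only the first character and keep the rest as-is,
  # so tables that already start with an uppercase letter are unchanged.
  table[0, 1].upcase + table[1..-1]
end

%w[snp132 refGene 2micron_est HInv NIAGene].each do |t|
  puts "#{t} -> #{table_class_name(t)}"
end
```

For the reserved field names listed above, hash-style access such as "result[:type]" remains the documented way to read the value.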
137
152
 
138
- === details in "find_(all_)by_intervals"
153
+ === details in "with_interval"
139
154
  * When a table class is referred to for the first time, the API prefetches the table to get a list of fields and dynamically defines a class using the following algorithm.
140
155
  * If chrom/chromStart/chromEnd fields exist (BED table), the API uses them for interval queries.
141
156
  * When tName/tStart/tEnd fields exist (PSL table), the API uses them for interval queries.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.2.1
1
+ 0.3.0
@@ -5,11 +5,11 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = %q{bio-ucsc-api}
8
- s.version = "0.2.1"
8
+ s.version = "0.3.0"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = [%q{Hiroyuki Mishima}, %q{Jan Aerts}]
12
- s.date = %q{2011-08-14}
12
+ s.date = %q{2011-08-23}
13
13
  s.description = %q{Ruby UCSC API: accessing the UCSC Genome Database using Ruby}
14
14
  s.email = %q{missy@be.to}
15
15
  s.extra_rdoc_files = [
@@ -294,6 +294,7 @@ Gem::Specification.new do |s|
294
294
  "lib/bio-ucsc/gasacu1/intronest.rb",
295
295
  "lib/bio-ucsc/gasacu1/mrna.rb",
296
296
  "lib/bio-ucsc/gasacu1/rmsk.rb",
297
+ "lib/bio-ucsc/genomic-interval-bin.rb",
297
298
  "lib/bio-ucsc/go.rb",
298
299
  "lib/bio-ucsc/go/db_connection.rb",
299
300
  "lib/bio-ucsc/hg18.rb",
@@ -546,6 +547,7 @@ Gem::Specification.new do |s|
546
547
  "spec/cavpor3_spec.rb",
547
548
  "spec/cb3_spec.rb",
548
549
  "spec/ce6_spec.rb",
550
+ "spec/chromosome_specific_tables_spec.rb",
549
551
  "spec/ci2_spec.rb",
550
552
  "spec/danrer7_spec.rb",
551
553
  "spec/dm3_spec.rb",
@@ -561,9 +563,11 @@ Gem::Specification.new do |s|
561
563
  "spec/droyak2_spec.rb",
562
564
  "spec/equcab2_spec.rb",
563
565
  "spec/felcat4_spec.rb",
566
+ "spec/find_by_and_spec.rb",
564
567
  "spec/fr2_spec.rb",
565
568
  "spec/galgal3_spec.rb",
566
569
  "spec/gasacu1_spec.rb",
570
+ "spec/genomic-interval-bin_spec.rb",
567
571
  "spec/go_spec.rb",
568
572
  "spec/hg18/acembly_spec.rb",
569
573
  "spec/hg18/acemblyclass_spec.rb",
@@ -5449,6 +5453,7 @@ Gem::Specification.new do |s|
5449
5453
  "spec/loxafr3_spec.rb",
5450
5454
  "spec/mm9_spec.rb",
5451
5455
  "spec/mondom5_spec.rb",
5456
+ "spec/named_scope_spec.rb",
5452
5457
  "spec/ornana1_spec.rb",
5453
5458
  "spec/orycun2_spec.rb",
5454
5459
  "spec/orylat2_spec.rb",
@@ -4,13 +4,13 @@
4
4
  # MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
5
  # License:: Ruby licence (Ruby's / GPLv2 dual)
6
6
 
7
- base = File.dirname(__FILE__)
8
- require "#{base}/bio-ucsc/ucsc_bin"
9
- require "bio-genomic-interval"
7
+ base = "#{File.dirname(__FILE__)}/bio-ucsc"
8
+ require "#{base}/ucsc_bin"
9
+ require "#{base}/genomic-interval-bin"
10
10
 
11
11
  module Bio
12
12
  module Ucsc
13
- VERSION = "0.2.1"
13
+ VERSION = "0.3.0"
14
14
  base = "#{File.dirname(__FILE__)}/bio-ucsc"
15
15
 
16
16
  # mammals #####################################
@@ -0,0 +1,13 @@
1
+ require "bio-genomic-interval"
2
+
3
+ module Bio
4
+ class GenomicInterval
5
+ def bin
6
+ Bio::Ucsc::UcscBin.bin(self.zero_start, self.zero_end)
7
+ end
8
+
9
+ def bin_all
10
+ Bio::Ucsc::UcscBin.bin_all(self.zero_start, self.zero_end)
11
+ end
12
+ end
13
+ end
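For readers unfamiliar with what `bin` and `bin_all` return, here is a minimal sketch of the standard UCSC binning scheme (five levels, smallest bins of 128 kb). It is written from the published scheme and is an assumption about what `Bio::Ucsc::UcscBin` computes, not a copy of its code; the function names `bin_from_range` and `bins_overlapping` are hypothetical.

```ruby
# Standard UCSC binning constants (assumed; not copied from the gem).
BIN_OFFSETS     = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17 # smallest bins cover 2**17 = 128 kb
BIN_NEXT_SHIFT  = 3  # each level up is 8 times larger

# Smallest bin that fully contains the 0-based half-open range [zstart, zend).
def bin_from_range(zstart, zend)
  start_bin = zstart >> BIN_FIRST_SHIFT
  end_bin   = (zend - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.each do |offset|
    return offset + start_bin if start_bin == end_bin
    start_bin >>= BIN_NEXT_SHIFT
    end_bin   >>= BIN_NEXT_SHIFT
  end
  raise ArgumentError, "range #{zstart}-#{zend} is out of bounds"
end

# Every bin, at every level, that may hold features overlapping the range.
def bins_overlapping(zstart, zend)
  start_bin = zstart >> BIN_FIRST_SHIFT
  end_bin   = (zend - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    bins = ((offset + start_bin)..(offset + end_bin)).to_a
    start_bin >>= BIN_NEXT_SHIFT
    end_bin   >>= BIN_NEXT_SHIFT
    bins
  end
end

puts bin_from_range(0, 1000)
puts bins_overlapping(0, 1000).inspect
```

A query restricted to `bin in (...)` with the list from the second function lets MySQL skip most rows before the start/end comparison is evaluated, which is where the speed-up comes from.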
@@ -22,6 +22,15 @@ module Bio
22
22
  ['attribute', 'valid', 'validate', 'class', 'method', 'methods', 'type']
23
23
  UPPERCASED_TABLE_PREFIX =
24
24
  ['HInv', 'NIAGene']
25
+ COMMON_CLASS_METHODS = %!
26
+ def self.find_by_interval(interval, opt = {:partial => true})
27
+ find_first_or_all_by_interval(interval, :first, opt)
28
+ end
29
+
30
+ def self.find_all_by_interval(interval, opt = {:partial => true})
31
+ find_first_or_all_by_interval(interval, :all, opt)
32
+ end
33
+ !
25
34
 
26
35
  def const_missing(sym)
27
36
  module_eval generic(sym)
@@ -81,13 +90,31 @@ module Bio
81
90
  class #{uphead(sym)} < DBConnection
82
91
  set_table_name "#{downhead(sym)}"
83
92
  #{delete_reserved_methods}
84
- def self.find_by_interval(interval, opt = {:partial => true})
85
- find_first_or_all_by_interval(interval, :first, opt)
86
- end
87
-
88
- def self.find_all_by_interval(interval, opt = {:partial => true})
89
- find_first_or_all_by_interval(interval, :all, opt)
90
- end
93
+ #{COMMON_CLASS_METHODS}
94
+
95
+ where = <<-SQL
96
+ tName = :chrom
97
+ AND bin in (:bins)
98
+ AND ((tStart BETWEEN :zstart AND :zend)
99
+ OR (tEnd BETWEEN :zstart AND :zend)
100
+ OR (tStart <= :zstart AND tEnd >= :zend))
101
+ SQL
102
+ scope(:with_interval,
103
+ Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
104
+ :bins => gi.bin_all,
105
+ :zstart => gi.zero_start,
106
+ :zend => gi.zero_end,}]}})
107
+ where = <<-SQL
108
+ tName = :chrom
109
+ AND bin in (:bins)
110
+ AND ((tStart BETWEEN :zstart AND :zend)
111
+ AND (tEnd BETWEEN :zstart AND :zend))
112
+ SQL
113
+ scope(:with_interval_excl,
114
+ Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
115
+ :bins => gi.bin_all,
116
+ :zstart => gi.zero_start,
117
+ :zend => gi.zero_end,}]}})
91
118
 
92
119
  def self.find_first_or_all_by_interval(interval, first_all, opt)
93
120
  zstart = interval.zero_start
@@ -100,13 +127,14 @@ AND bin in (:bins)
100
127
  AND ((tStart BETWEEN :zstart AND :zend)
101
128
  OR (tEnd BETWEEN :zstart AND :zend)
102
129
  OR (tStart <= :zstart AND tEnd >= :zend))
103
- SQL
130
+ SQL
104
131
  else
105
- where = <<-SQL
132
+ where = <<-SQL
106
133
  tName = :chrom
134
+ AND bin in (:bins)
107
135
  AND ((tStart BETWEEN :zstart AND :zend)
108
136
  AND (tEnd BETWEEN :zstart AND :zend))
109
- SQL
137
+ SQL
110
138
  end
111
139
  cond = {
112
140
  :chrom => interval.chrom,
@@ -125,15 +153,28 @@ AND (tEnd BETWEEN :zstart AND :zend))
125
153
  class #{uphead(sym)} < DBConnection
126
154
  set_table_name "#{downhead(sym)}"
127
155
  #{delete_reserved_methods}
128
-
129
- def self.find_by_interval(interval, opt = {:partial => true})
130
- find_first_or_all_by_interval(interval, :first, opt)
131
- end
132
-
133
- def self.find_all_by_interval(interval, opt = {:partial => true})
134
- find_first_or_all_by_interval(interval, :all, opt)
135
- end
156
+ #{COMMON_CLASS_METHODS}
136
157
 
158
+ where = <<-SQL
159
+ tName = :chrom
160
+ AND ((tStart BETWEEN :zstart AND :zend)
161
+ OR (tEnd BETWEEN :zstart AND :zend)
162
+ OR (tStart <= :zstart AND tEnd >= :zend))
163
+ SQL
164
+ scope(:with_interval,
165
+ Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
166
+ :zstart => gi.zero_start,
167
+ :zend => gi.zero_end,}]}})
168
+ where = <<-SQL
169
+ tName = :chrom
170
+ AND ((tStart BETWEEN :zstart AND :zend)
171
+ AND (tEnd BETWEEN :zstart AND :zend))
172
+ SQL
173
+ scope(:with_interval_excl,
174
+ Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
175
+ :zstart => gi.zero_start,
176
+ :zend => gi.zero_end,}]}})
177
+
137
178
  def self.find_first_or_all_by_interval(interval, first_all, opt)
138
179
  zstart = interval.zero_start
139
180
  zend = interval.zero_end
@@ -174,15 +215,32 @@ AND (tEnd BETWEEN :zstart AND :zend))
174
215
  class #{uphead(sym)} < DBConnection
175
216
  set_table_name "#{downhead(sym)}"
176
217
  #{delete_reserved_methods}
218
+ #{COMMON_CLASS_METHODS}
177
219
 
178
- def self.find_by_interval(interval, opt = {:partial => true})
179
- find_first_or_all_by_interval(interval, :first, opt)
180
- end
181
-
182
- def self.find_all_by_interval(interval, opt = {:partial => true})
183
- find_first_or_all_by_interval(interval, :all, opt)
184
- end
185
-
220
+ where = <<-SQL
221
+ chrom = :chrom
222
+ AND bin in (:bins)
223
+ AND ((chromStart BETWEEN :zstart AND :zend)
224
+ OR (chromEnd BETWEEN :zstart AND :zend)
225
+ OR (chromStart <= :zstart AND chromEnd >= :zend))
226
+ SQL
227
+ scope(:with_interval,
228
+ Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
229
+ :bins => gi.bin_all,
230
+ :zstart => gi.zero_start,
231
+ :zend => gi.zero_end,}]}})
232
+ where = <<-SQL
233
+ chrom = :chrom
234
+ AND bin in (:bins)
235
+ AND ((chromStart BETWEEN :zstart AND :zend)
236
+ AND (chromEnd BETWEEN :zstart AND :zend))
237
+ SQL
238
+ scope(:with_interval_excl,
239
+ Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
240
+ :bins => gi.bin_all,
241
+ :zstart => gi.zero_start,
242
+ :zend => gi.zero_end,}]}})
243
+
186
244
  def self.find_first_or_all_by_interval(interval, first_all, opt)
187
245
  zstart = interval.zero_start
188
246
  zend = interval.zero_end
@@ -220,14 +278,27 @@ AND (chromEnd BETWEEN :zstart AND :zend))
220
278
  class #{uphead(sym)} < DBConnection
221
279
  set_table_name "#{downhead(sym)}"
222
280
  #{delete_reserved_methods}
223
-
224
- def self.find_by_interval(interval, opt = {:partial => true})
225
- find_first_or_all_by_interval(interval, :first, opt)
226
- end
227
-
228
- def self.find_all_by_interval(interval, opt = {:partial => true})
229
- find_first_or_all_by_interval(interval, :all, opt)
230
- end
281
+ #{COMMON_CLASS_METHODS}
282
+
283
+ where = <<-SQL
284
+ chrom = :chrom
285
+ AND ((chromStart BETWEEN :zstart AND :zend)
286
+ OR (chromEnd BETWEEN :zstart AND :zend)
287
+ OR (chromStart <= :zstart AND chromEnd >= :zend))
288
+ SQL
289
+ scope(:with_interval,
290
+ Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
291
+ :zstart => gi.zero_start,
292
+ :zend => gi.zero_end,}]}})
293
+ where = <<-SQL
294
+ chrom = :chrom
295
+ AND ((chromStart BETWEEN :zstart AND :zend)
296
+ AND (chromEnd BETWEEN :zstart AND :zend))
297
+ SQL
298
+ scope(:with_interval_excl,
299
+ Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
300
+ :zstart => gi.zero_start,
301
+ :zend => gi.zero_end,}]}})
231
302
 
232
303
  def self.find_first_or_all_by_interval(interval, first_all, opt)
233
304
  zstart = interval.zero_start
@@ -269,14 +340,32 @@ AND (chromEnd BETWEEN :zstart AND :zend))
269
340
  class #{uphead(sym)} < DBConnection
270
341
  set_table_name "#{downhead(sym)}"
271
342
  #{delete_reserved_methods}
272
-
273
- def self.find_by_interval(interval, opt = {:partial => true})
274
- find_first_or_all_by_interval(interval, :first, opt)
275
- end
276
-
277
- def self.find_all_by_interval(interval, opt = {:partial => true})
278
- find_first_or_all_by_interval(interval, :all, opt)
279
- end
343
+ #{COMMON_CLASS_METHODS}
344
+
345
+ where = <<-SQL
346
+ chrom = :chrom
347
+ AND bin in (:bins)
348
+ AND ((txStart BETWEEN :zstart AND :zend)
349
+ OR (txEnd BETWEEN :zstart AND :zend)
350
+ OR (txStart <= :zstart AND txEnd >= :zend))
351
+ SQL
352
+ scope(:with_interval,
353
+ Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
354
+ :bins => gi.bin_all,
355
+ :zstart => gi.zero_start,
356
+ :zend => gi.zero_end,}]}})
357
+
358
+ where = <<-SQL
359
+ chrom = :chrom
360
+ AND bin in (:bins)
361
+ AND ((txStart BETWEEN :zstart AND :zend)
362
+ AND (txEnd BETWEEN :zstart AND :zend))
363
+ SQL
364
+ scope(:with_interval_excl,
365
+ Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
366
+ :bins => gi.bin_all,
367
+ :zstart => gi.zero_start,
368
+ :zend => gi.zero_end,}]}})
280
369
 
281
370
  def self.find_first_or_all_by_interval(interval, first_all, opt)
282
371
  zstart = interval.zero_start
@@ -314,14 +403,33 @@ AND (txEnd BETWEEN :zstart AND :zend))
314
403
  class #{uphead(sym)} < DBConnection
315
404
  set_table_name "#{downhead(sym)}"
316
405
  #{delete_reserved_methods}
317
-
318
- def self.find_by_interval(interval, opt = {:partial => true})
319
- find_first_or_all_by_interval(interval, :first, opt)
320
- end
321
-
322
- def self.find_all_by_interval(interval, opt = {:partial => true})
323
- find_first_or_all_by_interval(interval, :all, opt)
324
- end
406
+ #{COMMON_CLASS_METHODS}
407
+
408
+ def self.with_interval(gi, opt = {:partial => true})
409
+ if opt[:partial] == true
410
+ where = <<-SQL
411
+ chrom = :chrom
412
+ AND ((txStart BETWEEN :zstart AND :zend)
413
+ OR (txEnd BETWEEN :zstart AND :zend)
414
+ OR (txStart <= :zstart AND txEnd >= :zend))
415
+ SQL
416
+ else
417
+ where = <<-SQL
418
+ chrom = :chrom
419
+ AND ((txStart BETWEEN :zstart AND :zend)
420
+ AND (txEnd BETWEEN :zstart AND :zend))
421
+ SQL
422
+ end
423
+ values = {
424
+ :chrom => gi.chrom,
425
+ :zstart => gi.zero_start,
426
+ :zend => gi.zero_end,
427
+ }
428
+
429
+ with_scope(:find => {:conditions => [where, values]}) do
430
+ yield
431
+ end
432
+ end # def self.with_interval
325
433
 
326
434
  def self.find_first_or_all_by_interval(interval, first_all, opt)
327
435
  zstart = interval.zero_start
@@ -363,14 +471,31 @@ AND ((txStart BETWEEN :zstart AND :zend)
363
471
  class #{uphead(sym)} < DBConnection
364
472
  set_table_name "#{downhead(sym)}"
365
473
  #{delete_reserved_methods}
474
+ #{COMMON_CLASS_METHODS}
366
475
 
367
- def self.find_by_interval(interval, opt = {:partial => true})
368
- find_first_or_all_by_interval(interval, :first, opt)
369
- end
370
-
371
- def self.find_all_by_interval(interval, opt = {:partial => true})
372
- find_first_or_all_by_interval(interval, :all, opt)
373
- end
476
+ where = <<-SQL
477
+ genoName = :chrom
478
+ AND bin in (:bins)
479
+ AND ((genoStart BETWEEN :zstart AND :zend)
480
+ OR (genoEnd BETWEEN :zstart AND :zend)
481
+ OR (genoStart <= :zstart AND genoEnd >= :zend))
482
+ SQL
483
+ scope(:with_interval,
484
+ Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
485
+ :bins => gi.bin_all,
486
+ :zstart => gi.zero_start,
487
+ :zend => gi.zero_end,}]}})
488
+ where = <<-SQL
489
+ genoName = :chrom
490
+ AND bin in (:bins)
491
+ AND ((genoStart BETWEEN :zstart AND :zend)
492
+ AND (genoEnd BETWEEN :zstart AND :zend))
493
+ SQL
494
+ scope(:with_interval_excl,
495
+ Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
496
+ :bins => gi.bin_all,
497
+ :zstart => gi.zero_start,
498
+ :zend => gi.zero_end,}]}})
374
499
 
375
500
  def self.find_first_or_all_by_interval(interval, first_all, opt)
376
501
  zstart = interval.zero_start
@@ -382,14 +507,14 @@ AND bin in (:bins)
382
507
  AND ((genoStart BETWEEN :zstart AND :zend)
383
508
  OR (genoEnd BETWEEN :zstart AND :zend)
384
509
  OR (genoStart <= :zstart AND genoEnd >= :zend))
385
- SQL
510
+ SQL
386
511
  else
387
512
  where = <<-SQL
388
513
  genoName = :chrom
389
514
  AND bin in (:bins)
390
515
  AND ((genoStart BETWEEN :zstart AND :zend)
391
516
  AND (genoEnd BETWEEN :zstart AND :zend))
392
- SQL
517
+ SQL
393
518
  end
394
519
  cond = {
395
520
  :chrom => interval.chrom,
@@ -408,14 +533,28 @@ AND (genoEnd BETWEEN :zstart AND :zend))
408
533
  class #{uphead(sym)} < DBConnection
409
534
  set_table_name "#{downhead(sym)}"
410
535
  #{delete_reserved_methods}
411
-
412
- def self.find_by_interval(interval, opt = {:partial => true})
413
- find_first_or_all_by_interval(interval, :first, opt)
414
- end
415
-
416
- def self.find_all_by_interval(interval, opt = {:partial => true})
417
- find_first_or_all_by_interval(interval, :all, opt)
418
- end
536
+ #{COMMON_CLASS_METHODS}
537
+
538
+ where = <<-SQL
539
+ genoName = :chrom
540
+ AND ((genoStart BETWEEN :zstart AND :zend)
541
+ OR (genoEnd BETWEEN :zstart AND :zend)
542
+ OR (genoStart <= :zstart AND genoEnd >= :zend))
543
+ SQL
544
+ scope(:with_interval,
545
+ Proc.new{|gi|{:conditions => [where, {:chrom => gi.chrom,
546
+ :bins => gi.bin_all,
547
+ :zstart => gi.zero_start,
548
+ :zend => gi.zero_end,}]}})
549
+ where = <<-SQL
550
+ genoName = :chrom
551
+ AND ((genoStart BETWEEN :zstart AND :zend)
552
+ AND (genoEnd BETWEEN :zstart AND :zend))
553
+ SQL
554
+ scope(:with_interval_excl,
555
+ Proc.new{|gi|{:conditions => [where,{:chrom => gi.chrom,
556
+ :zstart => gi.zero_start,
557
+ :zend => gi.zero_end,}]}})
419
558
 
420
559
  def self.find_first_or_all_by_interval(interval, first_all, opt)
421
560
  zstart = interval.zero_start
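The two WHERE clauses used by the `with_interval` and `with_interval_excl` scopes above differ only in how the start and end tests are combined. The sketch below mirrors them as plain Ruby predicates so the difference can be checked without a database; `overlaps?` and `contained?` are hypothetical names, and the bin filter is left out because it only narrows the candidate rows.

```ruby
# Mirrors the with_interval condition: the feature starts inside the query,
# ends inside the query, or spans the whole query. SQL BETWEEN is inclusive,
# and so is Comparable#between?.
def overlaps?(f_start, f_end, zstart, zend)
  f_start.between?(zstart, zend) ||
    f_end.between?(zstart, zend) ||
    (f_start <= zstart && f_end >= zend)
end

# Mirrors the with_interval_excl condition: both ends lie inside the query.
def contained?(f_start, f_end, zstart, zend)
  f_start.between?(zstart, zend) && f_end.between?(zstart, zend)
end

query = [1000, 2000]
features = { "inside" => [1200, 1500], "partial" => [900, 1100],
             "spanning" => [500, 3000], "outside" => [3000, 4000] }

features.each do |name, (s, e)|
  puts "#{name}: overlaps=#{overlaps?(s, e, *query)} contained=#{contained?(s, e, *query)}"
end
```

Because the comparisons are inclusive, a feature that merely touches the query boundary (for example, one ending exactly at `zstart`) also satisfies the first predicate as written here.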
@@ -1,19 +1,23 @@
1
1
  #!/usr/local/bin/ruby-1.9
2
2
 
3
- require 'bio-ucsc'
4
- include Bio::Ucsc::Hg19
3
+ #require 'bio-ucsc'
4
+ require '../lib/bio-ucsc'
5
5
 
6
- DBConnection.connect
6
+ include Bio
7
+
8
+ Ucsc::Hg19::DBConnection.connect
7
9
 
8
10
  genes = Array.new
9
11
  ARGF.each_line do |row|
10
12
  row.chomp!
11
13
  next if row.empty?
12
14
  chr, chr_start, chr_end = row.split("\t")
13
- interval = Bio::GenomicInterval.zero_based(chr,
14
- Integer(chr_start),
15
- Integer(chr_end))
16
- genes.concat(RefGene.find_all_by_interval(interval).map{|e|e.name2})
15
+ gi = GenomicInterval.zero_based(chr,
16
+ Integer(chr_start),
17
+ Integer(chr_end),)
18
+
19
+ results = Ucsc::Hg19::RefGene.with_interval(gi).select(:name2).find(:all)
20
+ genes.concat(results.map{|e|e.name2})
17
21
  end
18
22
 
19
- genes.uniq!.each{|e|puts e}
23
+ genes.uniq.each{|e|puts e} if genes
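The switch from `uniq!` to `uniq` in the last line above is worth a short illustration. `Array#uniq!` returns `nil` when the array contains no duplicates, so chaining `.each` onto it can raise `NoMethodError`, whereas `Array#uniq` always returns an array. The gene symbols below are sample data chosen for the sketch.

```ruby
no_dups   = ["TP53", "BRCA1"]
with_dups = ["TP53", "TP53", "BRCA1"]

# uniq! modifies in place and returns nil if nothing was removed.
bang_without_dups = no_dups.dup.uniq!
bang_with_dups    = with_dups.dup.uniq!

# uniq never returns nil; it returns a new array either way.
plain_without_dups = no_dups.uniq

puts bang_without_dups.inspect
puts bang_with_dups.inspect
puts plain_without_dups.inspect
```

With `uniq`, the trailing `if genes` guard is not strictly needed, since `genes` is always an Array at that point in the script.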
@@ -20,13 +20,11 @@ class Hg19Ref
20
20
 
21
21
  def run(interval)
22
22
  DBConnection.connect
23
- #ReferenceSequence.load(HG19_2BIT_FILE) # v0.1
24
23
  ref = Bio::Ucsc::Reference.load(HG19_2BIT_FILE) # v0.2 and later
25
24
 
26
25
  itv = Bio::GenomicInterval.parse(interval)
27
26
 
28
27
  puts itv.to_s
29
- # puts NKF.nkf("-wf50-0", ReferenceSequence.find_by_interval(itv)) # v0.1
30
28
  puts NKF.nkf("-wf50-0", ref.find_by_interval(itv)) # v0.2 and later
31
29
  end
32
30
  end
@@ -11,14 +11,14 @@
  require File.dirname(__FILE__) + '/../lib/bio-ucsc'
  require 'nkf'

- include Bio::Ucsc
+ include Bio

- Hg19::DBConnection.connect
+ Ucsc::Hg19::DBConnection.connect

- itvs_a =
-   [Bio::GenomicInterval.parse("chr1:1-200,000"),
-    Bio::GenomicInterval.parse("chr2:1-200,000"),
-    Bio::GenomicInterval.parse("chr3:1-300,000"),
+ gi_a =
+   [GenomicInterval.parse("chr1:1-200,000"),
+    GenomicInterval.parse("chr2:1-200,000"),
+    GenomicInterval.parse("chr3:1-300,000"),
    ]

  puts
@@ -26,8 +26,10 @@ puts "Queries in Slice objects using 1-based [start,end] closed intervals"
  puts "Results in 0-based [start,end) half-open intervals"
  puts

- ::puts "test 1 (hg19/RefGene) --- Bio::Ucsc::Hg19::RefGene.find_by_interval"
- results = itvs_a.map{|i|Hg19::RefGene.find_by_interval(i)}
+ puts "test 1 (hg19/RefGene) --- Bio::Ucsc::Hg19::RefGene.with_interval"
+
+ results = gi_a.map{|gi| Ucsc::Hg19::RefGene.with_interval(gi).find(:all)}
+
  puts "0-based interval\t1-based interval\tGene Symbol"
  results.flatten.each do |e|
    i = Bio::GenomicInterval.zero_based(e.chrom, e.txStart, e.txEnd)
@@ -35,26 +37,27 @@ results.flatten.each do |e|
    print "#{i.chrom}:#{i.chr_start}-#{i.chr_end}\t#{e.name2}\n"
  end

- #
- #
+ #################################################################################

- itvs_b =
-   [Bio::GenomicInterval.parse("chr1:1-11,000"),
-    Bio::GenomicInterval.parse("chr2:1-11,000"),
-    Bio::GenomicInterval.parse("chr3:1-12,000"),
+ gi_b =
+   [GenomicInterval.parse("chr1:1-11,000"),
+    GenomicInterval.parse("chr2:1-11,000"),
+    GenomicInterval.parse("chr3:1-12,000"),
    ]

  puts
- puts "test 2 (hg19/Snp131) --- Bio::Ucsc::Hg19::Snp131.find_by_interval"
+ puts "test 2 (hg19/Snp131) --- Bio::Ucsc::Hg19::Snp131.with_interval"
  puts "0-based interval\t1-based interval\tdbSNP rs ID\tClass"
- results = itvs_b.map{|i|Hg19::Snp131.find_by_interval(i)}
+
+ results = gi_b.map{|gi|Ucsc::Hg19::Snp131.with_interval(gi).find(:all)}
+
  results.flatten.each do |e|
    i = Bio::GenomicInterval.zero_based(e.chrom, e.chromStart, e.chromEnd)
    print "#{e.chrom}:#{e.chromStart}-#{e.chromEnd}\t"
    print "#{i.chrom}:#{i.chr_start}-#{i.chr_end}\t#{e.name}\t#{e[:class]}\n"
  end

- #
+ ###############################################################################
  #

  names = %w(rs56289060 rs62636508 rs28888107)
@@ -62,26 +65,7 @@ names = %w(rs56289060 rs62636508 rs28888107)
  puts
  puts "test 3 (hg19/Snp131) ---Bio::Ucsc::Hg19::Snp131.find_by_name"
  names.each do |n|
-   r = Hg19::Snp131.find_by_name(n)
-   i = Bio::GenomicInterval.zero_based(r.chrom, r.chromStart, r.chromEnd)
+   r = Ucsc::Hg19::Snp131.find_by_name(n)
+   i = GenomicInterval.zero_based(r.chrom, r.chromStart, r.chromEnd)
    puts "Query: #{n}\t#{i.chrom}\t#{i.chr_start}\t#{i.chr_end}\t#{r[:class]}"
  end
-
- #
- #
-
- results = GbCdnaInfo.find([1,2,3,4,5], :include => :description)
- results.each{|e| puts "#{e.acc}\t#{e.description.name}"}
-
- p GbCdnaInfo.find_by_acc("AA411542", :include => :description)
-
- results = KgXref.find_all_by_geneSymbol("TP53")
- results.each{|e| puts "#{e.mRNA}\t#{e.description}"}
-
- #
- #
-
- puts
- puts NKF.nkf("-wF72", RefSeqSummary.find_by_mrnaAcc("NM_000546").summary)
- puts
- puts NKF.nkf("-wF72", RefSeqSummary.find_by_mrnaAcc("NR_029476").summary)
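Note on the sample above: it prints queries as 1-based closed intervals and results as 0-based half-open intervals. A minimal sketch of that conversion in plain Ruby follows. The helper names (`parse_one_based`, `one_based_to_zero_based`, `zero_based_to_one_based`) are hypothetical and are not part of bio-ucsc-api or bio-genomic-interval; the sketch only assumes the "chrN:start-end" notation with optional thousands separators used in the sample.

```ruby
# Hypothetical helpers for illustration; not the gem's API.

# Parse "chr1:1-200,000" into [chrom, start, end] (1-based, closed).
def parse_one_based(str)
  chrom, range = str.split(":")
  chr_start, chr_end = range.delete(",").split("-").map { |v| Integer(v, 10) }
  [chrom, chr_start, chr_end]
end

# 1-based closed [start, end] -> 0-based half-open [start - 1, end).
def one_based_to_zero_based(chr_start, chr_end)
  [chr_start - 1, chr_end]
end

# 0-based half-open [zstart, zend) -> 1-based closed [zstart + 1, zend].
def zero_based_to_one_based(zero_start, zero_end)
  [zero_start + 1, zero_end]
end
```

Only the start coordinate shifts; the end coordinate is numerically the same in both systems, and the interval length is `zero_end - zero_start`.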
@@ -9,15 +9,14 @@
  # number of genes, and maximum number of exons.
  #

- require 'bio-ucsc'
- include Bio::Ucsc::Hg18
+ #require 'bio-ucsc'
+ require '../lib/bio-ucsc'

  interval = Bio::GenomicInterval.parse(ARGV[0])

- DBConnection.connect
+ Bio::Ucsc::Hg18::DBConnection.connect

- genes = RefGene.find_all_by_interval(interval).map{|e|e.name2}.uniq
-
+ genes = Bio::Ucsc::Hg18::RefGene.with_interval(interval).find(:all).map{|e|e.name2}.uniq
  puts "Included genes:"
  puts genes
  puts "Number of genes:"
@@ -25,7 +24,12 @@ puts genes.size

  total_exons = 0
  genes.each do |gene|
-   total_exons += RefGene.find_all_by_name2(gene).map{|e|e.exonCount}.max
+   total_exons +=
+     Bio::Ucsc::Hg18::RefGene.
+     with_interval(interval).
+     find_all_by_name2(gene).
+     map{|e|e.exonCount}.
+     max
  end

  puts "Number of exons (maximum):"
@@ -12,13 +12,12 @@ require File.dirname(__FILE__) + '/../lib/bio-ucsc'
  require 'nkf'

  class Sym2Sum
-   include Bio::Ucsc::Hg19

    def run(genesym)
-     DBConnection.connect
-     known_gene = KgXref.find_by_geneSymbol(genesym)
-     ref_gene = RefGene.find_by_name2(genesym)
-     summary = RefSeqSummary.find_by_mrnaAcc(ref_gene.name).summary
+     Bio::Ucsc::Hg19::DBConnection.connect
+     known_gene = Bio::Ucsc::Hg19::KgXref.find_by_geneSymbol(genesym)
+     ref_gene = Bio::Ucsc::Hg19::RefGene.find_by_name2(genesym)
+     summary = Bio::Ucsc::Hg19::RefSeqSummary.find_by_mrnaAcc(ref_gene.name).summary

      puts "---"
      puts "Gene symbol: #{genesym}" if known_gene
@@ -0,0 +1,36 @@
+ require "bio-ucsc"
+ require "pp"
+
+ describe "Bio::Ucsc::Hg18" do
+
+   before(:all) do
+     Bio::Ucsc::Hg18::DBConnection.connect
+   end
+
+   describe "Rmsk (separated Rmsk table)" do
+     describe ".find_by_interval" do
+       context "given 'chr1:10,000-20,000'" do
+         it 'returns a first hit record' do
+           gi = Bio::GenomicInterval.parse("chr1:10,000-20,000")
+           results = Bio::Ucsc::Hg18::Rmsk.find_by_interval(gi)
+           pp results
+           results.should be_true
+         end
+       end
+     end
+   end
+
+   describe "Chr1_rmsk (separated Rmsk table)" do
+     describe ".with_interval" do
+       context "given 'chr1:10,000-20,000'" do
+         it 'returns a first hit record' do
+           gi = Bio::GenomicInterval.parse("chr1:10,000-20,000")
+           results = Bio::Ucsc::Hg18::Chr1_rmsk.with_interval(gi).find(:first)
+           pp results
+           results.should be_true
+         end
+       end
+     end
+   end
+
+ end
@@ -0,0 +1,18 @@
+ require "bio-ucsc"
+
+ describe "Bio::Ucsc::Hg19::Snp132" do
+
+   describe ".find_all_by_bin_and_strand" do
+     context "given 'chr17:7,579,614-7,579,700' and '+'" do
+       it 'returns records' do
+         Bio::Ucsc::Hg19::DBConnection.connect
+         gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+         results =
+           Bio::Ucsc::Hg19::Snp132.find_all_by_chrom_and_bin_and_class("chr17",
+                                                                      gi.bin_all,
+                                                                      "in-del")
+         results.should be_kind_of(Array)
+       end
+     end
+   end
+ end
@@ -0,0 +1,23 @@
+ require "bio-ucsc"
+
+ describe "Bio::GenomicInterval" do
+
+   describe ".bin_all" do
+     context "given 'chr17:7,579,614-7,579,700'" do
+       it 'returns all bins to search ([642, 80, 9, 1, 0])' do
+         gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+         gi.bin_all.should == [642, 80, 9, 1, 0]
+       end
+     end
+   end
+
+   describe ".bin" do
+     context "given 'chr17:7,579,614-7,579,700'" do
+       it 'returns the smallest bin to search (642)' do
+         gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+         gi.bin.should == 642
+       end
+     end
+   end
+
+ end
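Note on the spec above: the expected values ([642, 80, 9, 1, 0] and 642) are consistent with the standard UCSC binning scheme, in which the smallest bins cover 128 kb and each higher level is 8 times wider. The sketch below is a standalone illustration of that scheme, assuming 0-based half-open coordinates; the names `bins_for_range` and `smallest_bin` are hypothetical, and the gem's own `bin_all` and `bin` in lib/bio-ucsc/genomic-interval-bin.rb may be implemented differently.

```ruby
# Sketch of the standard UCSC binning scheme (illustrative only;
# not copied from bio-ucsc-api).
BIN_OFFSETS     = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # the smallest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT  = 3  # each level up is 2**3 = 8 times wider

# Every bin, finest level first, that may hold a feature overlapping
# the 0-based half-open interval [zero_start, zero_end).
def bins_for_range(zero_start, zero_end)
  start_bin = zero_start >> BIN_FIRST_SHIFT
  end_bin   = (zero_end - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    bins = ((offset + start_bin)..(offset + end_bin)).to_a
    start_bin >>= BIN_NEXT_SHIFT
    end_bin   >>= BIN_NEXT_SHIFT
    bins
  end
end

# The single finest bin that fully contains [zero_start, zero_end).
def smallest_bin(zero_start, zero_end)
  start_bin = zero_start >> BIN_FIRST_SHIFT
  end_bin   = (zero_end - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.each do |offset|
    return offset + start_bin if start_bin == end_bin
    start_bin >>= BIN_NEXT_SHIFT
    end_bin   >>= BIN_NEXT_SHIFT
  end
  raise RangeError, "interval does not fit the standard binning scheme"
end
```

For chr17:7,579,614-7,579,700 the 0-based coordinates are 7,579,613 and 7,579,700; both ends fall in 128 kb bin 57, so the finest bin is 585 + 57 = 642.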
@@ -0,0 +1,81 @@
+ require "bio-ucsc"
+ require "pp"
+
+ describe "Bio::Ucsc::Hg19" do
+
+   before(:all) do
+     Bio::Ucsc::Hg19::DBConnection.connect
+   end
+
+   describe "Snp132 (BED table)" do
+     describe ".with_interval" do
+       context "given 'chr17:7,579,614-7,579,700' and class='in-del'" do
+         it 'returns records' do
+           gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+           results = Bio::Ucsc::Hg19::Snp132.
+             with_interval(gi).
+             find_all_by_class("in-del")
+           pp results
+           results.should be_kind_of(Array)
+         end
+       end
+
+       context "given 'chr17:7,579,614-7,579,700'/non-partial and class='in-del'" do
+         it 'returns records' do
+           gi = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
+           results = Bio::Ucsc::Hg19::Snp132.
+             with_interval_excl(gi).
+             find_all_by_class("in-del")
+           pp results
+           results.should be_kind_of(Array)
+         end
+       end
+     end
+   end # describe "Snp132"
+
+   describe "Rmsk (Rmsk table)" do
+     describe ".with_interval" do
+       context "given 'chr1:10,000-200,00' and repClass='LINE'" do
+         it 'returns hit records' do
+           gi = Bio::GenomicInterval.parse("chr1:10,000-20,000")
+           results = Bio::Ucsc::Hg19::Rmsk.
+             with_interval(gi).
+             find_all_by_repClass "LINE"
+           pp results
+           results.should be_kind_of(Array)
+         end
+       end
+     end
+   end # describe "Rmsk"
+
+   describe "RefGene (genePred table)" do
+     describe ".with_interval" do
+       context "given 'chr1:10,000-100,000' and strand='+'" do
+         it 'returns hit records' do
+           gi = Bio::GenomicInterval.parse("chr1:10,000-100,000")
+           results = Bio::Ucsc::Hg19::RefGene.
+             with_interval(gi).
+             find_all_by_strand "+"
+           pp results
+           results.should be_kind_of(Array)
+         end
+       end
+     end
+   end # describe "RefGene"
+
+   describe "ChainAilMel1 (PSL table)" do
+     describe ".with_interval" do
+       context "given 'chr1:10,000-50,000' and qStrand='+'" do
+         it 'returns hit records' do
+           gi = Bio::GenomicInterval.parse("chr1:10,000-50,000")
+           results = Bio::Ucsc::Hg19::ChainAilMel1.
+             with_interval(gi).
+             find_all_by_qStrand "+"
+           pp results
+           results.should be_kind_of(Array)
+         end
+       end
+     end
+   end # "ChainAilMel1 (PSL table)"
+
+ end # describe "Bio::Ucsc::Hg19"
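Note on the spec above: it exercises two scopes, `with_interval` and `with_interval_excl`, and labels the second case "non-partial". The diff does not show how either scope builds its SQL condition, so the following is only a sketch of the two interval predicates such a pair would plausibly correspond to, under the assumption that "non-partial" means features lying entirely inside the query. The method names `overlaps?` and `contained?` are hypothetical and not part of bio-ucsc-api.

```ruby
# Hypothetical predicates for illustration; not the gem's API.
# All coordinates are 0-based, half-open: [start, end).

# True when the feature shares at least one base with the query
# (partial overlaps count).
def overlaps?(f_start, f_end, q_start, q_end)
  f_start < q_end && f_end > q_start
end

# True only when the feature lies entirely inside the query
# (partial overlaps are excluded).
def contained?(f_start, f_end, q_start, q_end)
  q_start <= f_start && f_end <= q_end
end
```

With half-open coordinates, two intervals that merely touch (one ends where the other starts) do not overlap, which is why the comparisons in `overlaps?` are strict.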
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: bio-ucsc-api
  version: !ruby/object:Gem::Version
-   version: 0.2.1
+   version: 0.3.0
  prerelease:
  platform: ruby
  authors:
@@ -10,11 +10,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2011-08-14 00:00:00.000000000Z
+ date: 2011-08-23 00:00:00.000000000Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activerecord
-   requirement: &130996320 !ruby/object:Gem::Requirement
+   requirement: &174367420 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -22,10 +22,10 @@ dependencies:
          version: 3.0.7
    type: :runtime
    prerelease: false
-   version_requirements: *130996320
+   version_requirements: *174367420
  - !ruby/object:Gem::Dependency
    name: activesupport
-   requirement: &130995840 !ruby/object:Gem::Requirement
+   requirement: &174366940 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -33,10 +33,10 @@ dependencies:
          version: 3.0.7
    type: :runtime
    prerelease: false
-   version_requirements: *130995840
+   version_requirements: *174366940
  - !ruby/object:Gem::Dependency
    name: mysql
-   requirement: &130995360 !ruby/object:Gem::Requirement
+   requirement: &174366460 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -44,10 +44,10 @@ dependencies:
          version: 2.8.1
    type: :runtime
    prerelease: false
-   version_requirements: *130995360
+   version_requirements: *174366460
  - !ruby/object:Gem::Dependency
    name: bio-genomic-interval
-   requirement: &130994840 !ruby/object:Gem::Requirement
+   requirement: &174365980 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -55,10 +55,10 @@ dependencies:
          version: 0.1.2
    type: :runtime
    prerelease: false
-   version_requirements: *130994840
+   version_requirements: *174365980
  - !ruby/object:Gem::Dependency
    name: rspec
-   requirement: &130994360 !ruby/object:Gem::Requirement
+   requirement: &174365500 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -66,10 +66,10 @@ dependencies:
          version: 2.5.0
    type: :development
    prerelease: false
-   version_requirements: *130994360
+   version_requirements: *174365500
  - !ruby/object:Gem::Dependency
    name: bundler
-   requirement: &130993880 !ruby/object:Gem::Requirement
+   requirement: &174365020 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -77,10 +77,10 @@ dependencies:
          version: 1.0.0
    type: :development
    prerelease: false
-   version_requirements: *130993880
+   version_requirements: *174365020
  - !ruby/object:Gem::Dependency
    name: jeweler
-   requirement: &130993400 !ruby/object:Gem::Requirement
+   requirement: &174327460 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -88,10 +88,10 @@ dependencies:
          version: 1.5.2
    type: :development
    prerelease: false
-   version_requirements: *130993400
+   version_requirements: *174327460
  - !ruby/object:Gem::Dependency
    name: rcov
-   requirement: &130992920 !ruby/object:Gem::Requirement
+   requirement: &174326840 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -99,10 +99,10 @@ dependencies:
          version: '0'
    type: :development
    prerelease: false
-   version_requirements: *130992920
+   version_requirements: *174326840
  - !ruby/object:Gem::Dependency
    name: bio
-   requirement: &130992440 !ruby/object:Gem::Requirement
+   requirement: &174326240 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -110,10 +110,10 @@ dependencies:
          version: 1.4.1
    type: :development
    prerelease: false
-   version_requirements: *130992440
+   version_requirements: *174326240
  - !ruby/object:Gem::Dependency
    name: rdoc
-   requirement: &130991960 !ruby/object:Gem::Requirement
+   requirement: &174325620 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ! '>='
@@ -121,7 +121,7 @@ dependencies:
          version: 3.9.1
    type: :development
    prerelease: false
-   version_requirements: *130991960
+   version_requirements: *174325620
  description: ! 'Ruby UCSC API: accessing the UCSC Genome Database using Ruby'
  email: missy@be.to
  executables: []
@@ -407,6 +407,7 @@ files:
  - lib/bio-ucsc/gasacu1/intronest.rb
  - lib/bio-ucsc/gasacu1/mrna.rb
  - lib/bio-ucsc/gasacu1/rmsk.rb
+ - lib/bio-ucsc/genomic-interval-bin.rb
  - lib/bio-ucsc/go.rb
  - lib/bio-ucsc/go/db_connection.rb
  - lib/bio-ucsc/hg18.rb
@@ -652,6 +653,7 @@ files:
  - spec/cavpor3_spec.rb
  - spec/cb3_spec.rb
  - spec/ce6_spec.rb
+ - spec/chromosome_specific_tables_spec.rb
  - spec/ci2_spec.rb
  - spec/danrer7_spec.rb
  - spec/dm3_spec.rb
@@ -667,9 +669,11 @@ files:
  - spec/droyak2_spec.rb
  - spec/equcab2_spec.rb
  - spec/felcat4_spec.rb
+ - spec/find_by_and_spec.rb
  - spec/fr2_spec.rb
  - spec/galgal3_spec.rb
  - spec/gasacu1_spec.rb
+ - spec/genomic-interval-bin_spec.rb
  - spec/go_spec.rb
  - spec/hg18/acembly_spec.rb
  - spec/hg18/acemblyclass_spec.rb
@@ -5555,6 +5559,7 @@ files:
  - spec/loxafr3_spec.rb
  - spec/mm9_spec.rb
  - spec/mondom5_spec.rb
+ - spec/named_scope_spec.rb
  - spec/ornana1_spec.rb
  - spec/orycun2_spec.rb
  - spec/orylat2_spec.rb
@@ -5591,7 +5596,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
        version: '0'
        segments:
        - 0
-       hash: 3670200921648831687
+       hash: 2327058743331415747
  required_rubygems_version: !ruby/object:Gem::Requirement
    none: false
    requirements:
@@ -5620,6 +5625,7 @@ test_files:
  - spec/cavpor3_spec.rb
  - spec/cb3_spec.rb
  - spec/ce6_spec.rb
+ - spec/chromosome_specific_tables_spec.rb
  - spec/ci2_spec.rb
  - spec/danrer7_spec.rb
  - spec/dm3_spec.rb
@@ -5635,9 +5641,11 @@ test_files:
  - spec/droyak2_spec.rb
  - spec/equcab2_spec.rb
  - spec/felcat4_spec.rb
+ - spec/find_by_and_spec.rb
  - spec/fr2_spec.rb
  - spec/galgal3_spec.rb
  - spec/gasacu1_spec.rb
+ - spec/genomic-interval-bin_spec.rb
  - spec/go_spec.rb
  - spec/hg18/acembly_spec.rb
  - spec/hg18/acemblyclass_spec.rb
@@ -10523,6 +10531,7 @@ test_files:
  - spec/loxafr3_spec.rb
  - spec/mm9_spec.rb
  - spec/mondom5_spec.rb
+ - spec/named_scope_spec.rb
  - spec/ornana1_spec.rb
  - spec/orycun2_spec.rb
  - spec/orylat2_spec.rb