bio-ucsc-api 0.0.5 → 0.1.0

Sign up to get free protection for your applications and to get access to all the features.
Files changed (52) hide show
  1. data/README.rdoc +36 -16
  2. data/Rakefile +3 -3
  3. data/VERSION +1 -1
  4. data/bio-ucsc-api.gemspec +9 -7
  5. data/lib/bio-ucsc/hg18.rb +2 -4
  6. data/lib/bio-ucsc/hg18/activerecord.rb +1 -1
  7. data/lib/bio-ucsc/hg18/all_bacends.rb +0 -1
  8. data/lib/bio-ucsc/hg18/cnpiafrate2.rb +0 -1
  9. data/lib/bio-ucsc/hg18/cnplocke.rb +0 -1
  10. data/lib/bio-ucsc/hg18/cnpredon.rb +0 -1
  11. data/lib/bio-ucsc/hg18/cnpsebat2.rb +0 -1
  12. data/lib/bio-ucsc/hg18/cnpsharp2.rb +0 -1
  13. data/lib/bio-ucsc/hg18/cnptuzun.rb +0 -1
  14. data/lib/bio-ucsc/hg18/cytoband.rb +0 -1
  15. data/lib/bio-ucsc/hg18/db_connection.rb +1 -1
  16. data/lib/bio-ucsc/hg18/delconrad2.rb +0 -1
  17. data/lib/bio-ucsc/hg18/delhinds2.rb +0 -1
  18. data/lib/bio-ucsc/hg18/delmccarroll.rb +0 -1
  19. data/lib/bio-ucsc/hg18/dgv.rb +0 -1
  20. data/lib/bio-ucsc/hg18/ensgene.rb +0 -1
  21. data/lib/bio-ucsc/hg18/exaptedrepeats.rb +0 -1
  22. data/lib/bio-ucsc/hg18/hgcentral_wikitrack.rb +0 -1
  23. data/lib/bio-ucsc/hg18/kgprotmap2.rb +0 -1
  24. data/lib/bio-ucsc/hg18/kgtargetali.rb +0 -1
  25. data/lib/bio-ucsc/hg18/kiddeichlerdiscabc10.rb +0 -1
  26. data/lib/bio-ucsc/hg18/kiddeichlerdiscabc11.rb +0 -1
  27. data/lib/bio-ucsc/hg18/kiddeichlerdiscabc12.rb +0 -1
  28. data/lib/bio-ucsc/hg18/kiddeichlerdiscabc13.rb +0 -1
  29. data/lib/bio-ucsc/hg18/kiddeichlerdiscabc14.rb +0 -1
  30. data/lib/bio-ucsc/hg18/kiddeichlerdiscabc8.rb +0 -1
  31. data/lib/bio-ucsc/hg18/kiddeichlerdiscabc9.rb +0 -1
  32. data/lib/bio-ucsc/hg18/kiddeichlerdiscg248.rb +0 -1
  33. data/lib/bio-ucsc/hg18/reference.rb +4 -163
  34. data/lib/bio-ucsc/hg18/reference_sequence.rb +171 -0
  35. data/lib/bio-ucsc/hg18/refgene.rb +0 -1
  36. data/lib/bio-ucsc/hg18/snp130.rb +0 -1
  37. data/lib/bio-ucsc/hg19.rb +2 -1
  38. data/lib/bio-ucsc/hg19/activerecord.rb +1 -1
  39. data/lib/bio-ucsc/hg19/cytoband.rb +0 -1
  40. data/lib/bio-ucsc/hg19/cytobandideo.rb +0 -1
  41. data/lib/bio-ucsc/hg19/db_connection.rb +1 -1
  42. data/lib/bio-ucsc/hg19/dgv.rb +0 -1
  43. data/lib/bio-ucsc/hg19/ensgene.rb +0 -1
  44. data/lib/bio-ucsc/hg19/reference.rb +4 -163
  45. data/lib/bio-ucsc/hg19/reference_sequence.rb +171 -0
  46. data/lib/bio-ucsc/hg19/refgene.rb +0 -1
  47. data/lib/bio-ucsc/hg19/snp131.rb +0 -1
  48. data/lib/bio-ucsc/hg19/wgrna.rb +0 -1
  49. data/samples/hg19-2bit-retrieve.rb +2 -2
  50. data/spec/hg18/{reference_spec.rb → reference_sequence_spec.rb} +26 -26
  51. data/spec/hg19/{reference_spec.rb → reference_sequence_spec.rb} +26 -26
  52. metadata +11 -10
@@ -1,37 +1,40 @@
1
1
  = bio-ucsc-api
2
2
 
3
- A BioRuby plugin: Ruby API for the UCSC database.
3
+ The Ruby UCSC API: accessing the UCSC Genome Database using Ruby.
4
4
 
5
5
  This package is based on the followings:
6
- * ruby-ucsc-api: https://github.com/jandot/ruby-ucsc-api
6
+ * original ruby-ucsc-api: https://github.com/jandot/ruby-ucsc-api
7
7
  * ruby-ensembl-api: https://github.com/jandot/ruby-ensembl-api
8
8
 
9
9
  Major dependent gems:
10
10
  * active_record http://api.rubyonrails.org/classes/ActiveRecord/Base.html
11
11
  * bioruby-genomic-interval https://github.com/misshie/bioruby-genomic-interval
12
12
 
13
+ Requirement:
14
+ * Ruby version 1.9.2 or later
15
+
13
16
  See also a paper:
14
17
 
15
18
  Strozzi F, Aerts J: A Ruby API to query the Ensembl database for genomic features.
16
19
  Bioinformatics 2011, 27:1013-1014.
17
20
 
18
- NOTE: THIS PACKAGE IS STILL *EXPERIMENTAL* VERSION.
19
- Your comments, suggestions and requests are welcome.
20
- Email to the author (missy at be.to) or create an "Issue" at
21
- http://github.com/misshie/bioruby-ucsc-api/issues
21
+ Your comments, suggestions and requests are welcome. Documentation and
22
+ feedback are available at the UserEcho site at
23
+ http://rubyucscapi.userecho.com/.
22
24
 
23
25
  == Features
24
26
 
25
27
  * Designed as a BioRuby plugin
26
28
  * Using ActiveRecord as an O/R mapping framework
27
29
  * Using the Bin index system to improve query performance (see https://github.com/misshie/UCSCBin )
28
- * Automatic conversion of "1-based full-closed intervals" to internal "0-based half-closed intervals" (see also bioruby-genomic-interval)
30
+ * Automatic conversion of "1-based full-closed intervals" to internal "0-based left-closed right-open intervals" (see also bioruby-genomic-interval)
29
31
  * Supporting non-official MySql hosts (e.g. local servers)
30
32
  * Using Rspec for the testing framework
31
- * Support hg19 and hg18 databases. So far, tables containing only single records of the fileName columns are not supported.
33
+ * Support hg19 and hg18 databases. Current version does not support tables containing only single records of the fileName columns linking to bigWIG or bigBED files.
32
34
 
33
35
  == Change Log
34
-
36
+ * *MODIFIED* (v0.1.0): The name of this library is now "Ruby UCSC API". The RubyGem name and the GitHub account and the library name are not changed.
37
+ * *MODIFIED* (v0.1.0): Bio::Ucsc::[Hg18|Hg19]::Reference is replaced by Bio::Ucsc::[Hg18|Hg19]::ReferenceSequence.
35
38
  * *UPDATE* (v0.0.5): Almost all hg18 tables are supported.
36
39
  * *UPDATE* (v0.0.5): find_by_interval and find_all_by_interval class methods accept the "partial" option. Default is true. When "partial: false" is opted, return value will be only fully-included (non-partially-included) records.
37
40
  * *UPDATE* (v0.0.4): Almost all hg19 tables are supported. "filename" tables in ENCODE dataset are omitted. Each of them contains only single record of a path to the raw data file. Definitions of table relations are incomplete.
@@ -49,13 +52,13 @@ You may need to be root or use "sudo". "--no-ri" and "--no-rdoc" options are rec
49
52
 
50
53
  require 'bio-ucsc'
51
54
 
52
- include Bio::Ucsc::Hg19
55
+ include Bio::Ucsc::Hg19
53
56
  DBConnection.connect
54
57
 
55
58
  itv = Bio::GenomicInterval.parse("chr1:1-11,000")
56
59
  Snp131.find_all_by_interval(itv).each do |e|
57
60
  i = Bio::GenomicInterval.zero_based(e.chrom, e.chromStart, e.chromEnd)
58
- puts "#{i.chrom}\t#{i.chromStart}\t#{e.name}\t#{e[:class]}"
61
+ puts "#{i.chrom}\t#{i.chr_start}\t#{e.name}\t#{e[:class]}"
59
62
  end
60
63
 
61
64
  itv = Bio::GenomicInterval.parse("chr17:7,579,614-7,579,700")
@@ -64,15 +67,32 @@ You may need to be root or use "sudo". "--no-ri" and "--no-rdoc" options are rec
64
67
 
65
68
  p Snp131.find_by_name("rs56289060")
66
69
 
70
+ # Sometimes, queries using raw SQL provide elegant solutions.
71
+ #
72
+ sql << 'SQL'
73
+ SELECT name,chrom,chromStart,chromEnd,observed
74
+ FROM snp131
75
+ WHERE name="rs56289060"
76
+ SQL
77
+ p Snp131.find_by_sql(sql)
78
+
67
79
  # retrieve reference sequence from a locally-stored 2bit file
68
- Reference.load("hg19.2bit")
80
+ ReferenceSequence.load("hg19.2bit")
69
81
  itv = Bio::GenomicInterval.parse("chr1:9,500-10,999")
70
- p Reference.find_by_interval(itv)
82
+ p ReferenceSequence.find_by_interval(itv)
71
83
 
72
- == Copyright
73
- Copyright:: (c) 2011 MISHIMA, Hiroyuki (missy at be.to / hmishima at nagasaki-u.ac.jp)
84
+ # Connetcting to non-official or local mirror MySQL servers
85
+ DBConnection.db_host = 'foo.example.edu'
86
+ DBConnection.db_username = 'genome'
87
+ DBConnection.db_password = ''
88
+ DBConnection.connect
89
+
90
+ DBConnection.default # reset to connect UCSC's public MySQL sever
91
+ DBConnection.connect
74
92
 
75
- Copyright:: (c) 2008-2010 Francesco Strozzi, Jan Aerts
93
+ == Copyright
94
+ Copyright:: (c) 2011 MISHIMA, Hiroyuki (missy at be.to / hmishima at nagasaki-u.ac.jp / @mishimahryk at Twitter)
95
+ Copyright:: (c) 2010 Jan Aerts
76
96
 
77
97
  License:: Ruby license (Ruby's / GPLv2 dual). See COPYING and COPYING.ja for further details.
78
98
 
data/Rakefile CHANGED
@@ -16,10 +16,10 @@ Jeweler::Tasks.new do |gem|
16
16
  gem.name = "bio-ucsc-api"
17
17
  gem.homepage = "http://github.com/misshie/bioruby-ucsc-api"
18
18
  gem.license = "Ruby (Ruby's/GPLv2 dual)"
19
- gem.summary = %Q{A Bioruby plugin: an API for UCSC Genome Browser (experimental)}
20
- gem.description = %Q{A Bioruby plugin: an API for UCSC Genome Browser database (experimental)}
19
+ gem.summary = %Q{The Ruby UCSC API: accessing the UCSC Genome Database using Ruby}
20
+ gem.description = %Q{Ruby UCSC API: accessing the UCSC Genome Database using Ruby}
21
21
  gem.email = "missy@be.to"
22
- gem.authors = ["Hiroyuki Mishima", "Francesco Strozzi", "Jan Aerts"]
22
+ gem.authors = ["Hiroyuki Mishima", "Jan Aerts"]
23
23
  # Include your dependencies below. Runtime dependencies are required when using your gem,
24
24
  # and development dependencies are only needed for development
25
25
  # (ie running rake tasks, tests, etc)
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.0.5
1
+ 0.1.0
@@ -5,12 +5,12 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = %q{bio-ucsc-api}
8
- s.version = "0.0.5"
8
+ s.version = "0.1.0"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
- s.authors = ["Hiroyuki Mishima", "Francesco Strozzi", "Jan Aerts"]
12
- s.date = %q{2011-06-19}
13
- s.description = %q{A Bioruby plugin: an API for UCSC Genome Browser database (experimental)}
11
+ s.authors = ["Hiroyuki Mishima", "Jan Aerts"]
12
+ s.date = %q{2011-07-07}
13
+ s.description = %q{Ruby UCSC API: accessing the UCSC Genome Database using Ruby}
14
14
  s.email = %q{missy@be.to}
15
15
  s.extra_rdoc_files = [
16
16
  "README.rdoc"
@@ -1009,6 +1009,7 @@ Gem::Specification.new do |s|
1009
1009
  "lib/bio-ucsc/hg18/rdmrraw.rb",
1010
1010
  "lib/bio-ucsc/hg18/recombrate.rb",
1011
1011
  "lib/bio-ucsc/hg18/reference.rb",
1012
+ "lib/bio-ucsc/hg18/reference_sequence.rb",
1012
1013
  "lib/bio-ucsc/hg18/refflat.rb",
1013
1014
  "lib/bio-ucsc/hg18/refgene.rb",
1014
1015
  "lib/bio-ucsc/hg18/reflink.rb",
@@ -3651,6 +3652,7 @@ Gem::Specification.new do |s|
3651
3652
  "lib/bio-ucsc/hg19/polyapredict.rb",
3652
3653
  "lib/bio-ucsc/hg19/recombrate.rb",
3653
3654
  "lib/bio-ucsc/hg19/reference.rb",
3655
+ "lib/bio-ucsc/hg19/reference_sequence.rb",
3654
3656
  "lib/bio-ucsc/hg19/refflat.rb",
3655
3657
  "lib/bio-ucsc/hg19/refgene.rb",
3656
3658
  "lib/bio-ucsc/hg19/reflink.rb",
@@ -4939,7 +4941,7 @@ Gem::Specification.new do |s|
4939
4941
  s.licenses = ["Ruby (Ruby's/GPLv2 dual)"]
4940
4942
  s.require_paths = ["lib"]
4941
4943
  s.rubygems_version = %q{1.7.2}
4942
- s.summary = %q{A Bioruby plugin: an API for UCSC Genome Browser (experimental)}
4944
+ s.summary = %q{The Ruby UCSC API: accessing the UCSC Genome Database using Ruby}
4943
4945
  s.test_files = [
4944
4946
  "spec/hg18/acembly_spec.rb",
4945
4947
  "spec/hg18/acemblyclass_spec.rb",
@@ -5921,7 +5923,7 @@ Gem::Specification.new do |s|
5921
5923
  "spec/hg18/rdmr_spec.rb",
5922
5924
  "spec/hg18/rdmrraw_spec.rb",
5923
5925
  "spec/hg18/recombrate_spec.rb",
5924
- "spec/hg18/reference_spec.rb",
5926
+ "spec/hg18/reference_sequence_spec.rb",
5925
5927
  "spec/hg18/refflat_spec.rb",
5926
5928
  "spec/hg18/refgene_spec.rb",
5927
5929
  "spec/hg18/reflink_spec.rb",
@@ -8560,7 +8562,7 @@ Gem::Specification.new do |s|
8560
8562
  "spec/hg19/polyadb_spec.rb",
8561
8563
  "spec/hg19/polyapredict_spec.rb",
8562
8564
  "spec/hg19/recombrate_spec.rb",
8563
- "spec/hg19/reference_spec.rb",
8565
+ "spec/hg19/reference_sequence_spec.rb",
8564
8566
  "spec/hg19/refflat_spec.rb",
8565
8567
  "spec/hg19/refgene_spec.rb",
8566
8568
  "spec/hg19/reflink_spec.rb",
@@ -2,8 +2,6 @@
2
2
  # = hg18/ucsc.rb
3
3
  # Copyright:: Cioyrught (C) 2011
4
4
  # MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright:: Copyright (C) 2008
6
- # Jan Aerts <jan.aerts@gmail.com>
7
5
  # License:: Ruby's
8
6
 
9
7
  base = "#{File.dirname(__FILE__)}/hg18"
@@ -15,7 +13,8 @@ module Bio
15
13
  module Hg18
16
14
  base = "#{File.dirname(__FILE__)}/hg18"
17
15
  # Reference sequence retrieval via the 2bit file
18
- autoload :Reference, "#{base}/reference"
16
+ autoload :Reference, "#{base}/reference" # OBSOLETE
17
+ autoload :ReferenceSequence, "#{base}/reference_sequence"
19
18
 
20
19
  # group: Mapping and Sequencing Tracks ----------
21
20
  #
@@ -3867,4 +3866,3 @@ module Bio
3867
3866
  end
3868
3867
  end
3869
3868
  end
3870
-
@@ -3,7 +3,7 @@
3
3
  #
4
4
  # Copyright::
5
5
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
6
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
+ # Copyright (C) 2010 Jan Aerts <jan.aerts@gmail.com>
7
7
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
8
8
  #
9
9
  # = DESCRIPTION
@@ -2,7 +2,6 @@
2
2
  # = hg18/cnpiafrate2.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/cnpiafrate2.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/cnplocke.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/cnpredon.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/cnpsebat2.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/cnpsharp2.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/cnptuzun.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/wikitrack.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser The Wiki track displays
@@ -3,7 +3,7 @@
3
3
  #
4
4
  # Copyright:: Cioyrught (C) 2011
5
5
  # MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
6
- # Copyright:: Copyright (C) 2008
6
+ # Copyright:: Copyright (C) 2010
7
7
  # Jan Aerts <jan.aerts@gmail.com>
8
8
  # License:: Ruby licence (Ryby's / GPLv2 dual)
9
9
 
@@ -2,7 +2,6 @@
2
2
  # = hg18/DelConrad2.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/DelHinds2.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/DelMccarroll.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/dgv.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/ensgene.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/exaptedrepeats.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/wikitrack.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser The Wiki track displays
@@ -2,7 +2,6 @@
2
2
  # = hg18/kgprotmap2.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kgtargetali.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscabc10.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscabc11.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscabc12.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscabc13.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscabc14.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscabc8.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscabc9.rb
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,7 +2,6 @@
2
2
  # = hg18/kiddeichlerdiscg248
3
3
  # Copyright::
4
4
  # Copyright (C) 2011 MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
5
- # Copyright (C) 2008 Jan Aerts <jan.aerts@gmail.com>
6
5
  # License:: The Ruby licence (Ryby's / GPLv2 dual)
7
6
  #
8
7
  # = Table desfription in UCSC Table Browser
@@ -2,170 +2,11 @@
2
2
  # = reference.rb
3
3
  # handle UCSC's 2bit file (locally stored) to retrieve the reference sequence
4
4
  #
5
+ # In version 0.1.0, this file is OBSOLETE.
6
+ # Use Ucsc::Hg18::ReferenceSequence instead.
7
+ #
5
8
  # Copyright:: Cioyrught (C) 2011
6
9
  # MISHIMA, Hiroyuki <missy at be.to / hmishima at nagasaki-u.ac.jp>
7
10
  # License:: Ruby license (Ryby's / GPLv2 dual)
8
11
 
9
- # require 'bio'
10
-
11
- module Bio
12
- module Ucsc
13
- module Hg18
14
- TwoBitHeader =
15
- Struct.new(:signature, :version, :sequence_count, :reserved)
16
- TwoBitRecord =
17
- Struct.new(:dna_size,
18
- :n_block_intervals, :mask_block_intervals,
19
- :reserved, :packed_dna_offset)
20
-
21
- class ByteQueue
22
- def initialize(str)
23
- @str = str
24
- @index = 0
25
- end
26
-
27
- attr_accessor :index
28
-
29
- def next(n)
30
- result = @str[@index, n]
31
- @index += n
32
- result
33
- end
34
- end # class ByteQueue
35
-
36
- class Reference
37
- BINCODE = {0b00 => "T", 0b01 => "C", 0b10 => "A", 0b11 => "G"}
38
-
39
- cattr_reader :filename, :header, :offsets
40
-
41
- def self.load(filename)
42
- two_bit = nil
43
- open(filename, 'rb') {|f| two_bit = f.read}
44
- @@tbq = ByteQueue.new(two_bit)
45
- @@filename = filename
46
-
47
- twobit_header = TwoBitHeader.new
48
- twobit_header.signature = @@tbq.next(4).unpack('L').first
49
- twobit_header.version = @@tbq.next(4).unpack('L').first
50
- twobit_header.sequence_count = @@tbq.next(4).unpack('L').first
51
- twobit_header.reserved = @@tbq.next(4).unpack('L').first
52
- @@header = twobit_header
53
-
54
- @@offsets = Hash.new
55
- @@header.sequence_count.times do
56
- name_length = @@tbq.next(1).unpack('C').first
57
- @@offsets[@@tbq.next(name_length).unpack('a*').first] =
58
- @@tbq.next(4).unpack('L').first
59
- end
60
- @@records = Hash.new
61
- end
62
-
63
- def self.records(chrom)
64
- return @@records[chrom] if @@records[chrom]
65
-
66
- @@tbq.index = @@offsets[chrom]
67
- @@records[chrom] = TwoBitRecord.new
68
- @@records[chrom].dna_size = @@tbq.next(4).unpack('L').first
69
-
70
- n_block_count = @@tbq.next(4).unpack('L').first
71
- n_block_starts = Array.new
72
- n_block_count.times do
73
- n_block_starts << @@tbq.next(4).unpack('L').first
74
- end
75
- n_block_sizes = Array.new
76
- n_block_count.times do
77
- n_block_sizes << @@tbq.next(4).unpack('L').first
78
- end
79
- @@records[chrom].n_block_intervals = Array.new
80
- n_block_count.times do |idx|
81
- @@records[chrom].n_block_intervals <<
82
- Bio::GenomicInterval.zero_based(chrom,
83
- n_block_starts[idx],
84
- n_block_starts[idx]+n_block_sizes[idx])
85
- end
86
-
87
- mask_block_count = @@tbq.next(4).unpack('L').first
88
- mask_block_starts = Array.new
89
- mask_block_count.times do
90
- mask_block_starts << @@tbq.next(4).unpack('L').first
91
- end
92
- mask_block_sizes = Array.new
93
- mask_block_count.times do
94
- mask_block_sizes << @@tbq.next(4).unpack('L').first
95
- end
96
- @@records[chrom].mask_block_intervals = Array.new
97
- mask_block_count.times do |idx|
98
- @@records[chrom].mask_block_intervals <<
99
- Bio::GenomicInterval.zero_based(chrom,
100
- mask_block_starts[idx],
101
- mask_block_starts[idx]+mask_block_sizes[idx])
102
- end
103
-
104
- @@records[chrom].reserved = @@tbq.next(4).unpack('L').first
105
- @@records[chrom].packed_dna_offset = @@tbq.index
106
-
107
- @@records[chrom]
108
- end
109
-
110
- def self.find_by_interval(interval)
111
- seq = self.find_by_interval_raw(interval)
112
- @@records[interval.chrom].n_block_intervals.map do |nb|
113
- if interval.overlapped?(nb)
114
- case interval.compare(nb)
115
- when :equal,:contained_by
116
- seq = 'N' * interval.overlap(nb)
117
- when :contains
118
- left_len = nb.chr_start - interval.chr_start + 1
119
- right_len = interval.chr_end - nb.chr_end + 1
120
- seq[0, left_len] = 'N' * left_len
121
- seq[-right_len, right_len] = 'N' * right_len
122
- when :left_overlapped
123
- left_len = nb.chr_end - interval.chr_start + 1
124
- seq[0, left_len] = 'N' * left_len
125
- when :right_overlapped
126
- right_len = interval.chr_end - nb.chr_start + 1
127
- seq[-right_len, right_len] = 'N' * right_len
128
- when :right_adjacent, :right_off
129
- # expecting that N-blocks are sorted
130
- # return Bio::Sequence::NA.new(seq)
131
- seq
132
- end
133
- end
134
- end
135
- #Bio::Sequence::NA.new(seq)
136
- seq
137
- end
138
-
139
- def self.find_by_interval_raw(interval)
140
- byte_count, byte_mod = interval.zero_start.divmod 4
141
- chrom_top = self.records(interval.chrom).packed_dna_offset
142
- div_start, mod_start = interval.zero_start.divmod 4
143
- div_end, mod_end = interval.zero_end.divmod 4
144
- div_len, mod_len = interval.length.divmod 4
145
-
146
- byte_length = div_end - div_start + 1
147
- @@tbq.index = chrom_top + div_start
148
- bytes = @@tbq.next(byte_length).unpack('C*')
149
- seq = Bio::Ucsc::Hg18::Reference.bytes_to_nucleotides(bytes)
150
- seq[mod_start..(-1-(4-mod_end))]
151
- end
152
-
153
- def self.bytes_to_nucleotides(bytes)
154
- results = ""
155
- bytes.each do |byte|
156
- results << Bio::Ucsc::Hg18::Reference.byte_to_nucleotides(byte)
157
- end
158
- results
159
- end
160
-
161
- def self.byte_to_nucleotides(byte)
162
- BINCODE[byte >> 6] +
163
- BINCODE[(byte >> 4) & 0b11] +
164
- BINCODE[(byte >> 2) & 0b11] +
165
- BINCODE[byte & 0b11]
166
- end
167
- end # class Reference
168
-
169
- end # module Hg18
170
- end # module Ucsc
171
- end # module Bio
12
+ raise "Bio::Ucsc::Hg18::Reference is OBSOLETE. Use Bio::Ucsc::Hg18::ReferenceSequence instead."