bio-locus 0.0.2 → 0.0.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 33548bcc8a3474a7e1d3ebbec6c4cfe2472d4af9
- data.tar.gz: e7c7f93a5638a79f2052142d472dbad3dc57b334
+ metadata.gz: eef39503225998bf18b16997dd0cea5926ef6db2
+ data.tar.gz: db40d63c48275bc3df6d5575931097fa30448dd5
  SHA512:
- metadata.gz: 04b8576748a2f324c7e4de0224c1b7409e28da8e2e2697e5bf0da60a55db0a8994a6185649d4c3b68930e7d617732e92edd8aed54cba4646cea14ded86f09be6
- data.tar.gz: 2b17b04ef00b04a37a1d272ee4a10137fc50392a108850671f13d01a777755674413c6bf21e26843a700168d57484cb0fd150c0a054adf012d4c2577f973e6a8
+ metadata.gz: fe9de938d89f49f69bb1aabcd1d870e04fba1ec4ee32191f9b2c5753a3d0b7a39bf2715ae93378007563f1865f98a7e860674dd08bed81efd4efcf6e33e3f698
+ data.tar.gz: ef30a47864ea175ac621a9c12cee0d60fcd572ca0ce10302203e2d41fe4e84358bc5534dc6d759367e1dae978593434a0dde0e1e06b099d3b13cd4340f3cde3d
data/Gemfile CHANGED
@@ -1,14 +1,14 @@
  source "http://rubygems.org"
- # Add dependencies required to use your gem here.
- # Example:
- # gem "activesupport", ">= 2.3.5"
-
- # Add dependencies to develop your gem here.
- # Include everything needed to run rake, tests, features, etc.
  group :development do
  gem "cucumber"
  gem "jeweler"
  gem "bundler"
+ gem "rspec"
+ gem "tokyocabinet"
+ gem "localmemcache"
+ gem "moneta"
  end
- gem "localmemcache"
- gem "moneta"
+ # The following are optional (Ruby serialize is the default)
+ # gem "tokyocabinet"
+ # gem "localmemcache"
+ # gem "moneta"
data/README.md CHANGED
@@ -6,8 +6,22 @@ Bio-locus is a tool for fast querying of genome locations. Many file
  formats in bioinformatics contain records that start with a chromosome
  name and a position for a SNP, or a start-end position for indels.
 
- This tool essentially allows your to store this information in a Hash
- or database:
+ This tool essentially allows you to store this chr+pos or chr+pos+alt
+ information in a fast database.
+
+ Why would you use bio-locus?
+
+ 1. Fast comparison of VCF files and other formats that use chr+pos
+ 2. Fast comparison of VCF files and other formats that use chr+pos+alt
+ 3. See what positions match an EVS or GoNL database
+ 4. Compare locations from databases such as the TCGA and COSMIC
+ 5. Comparison of overlap or difference
+
+ In principle any of the Moneta supported backends can be used,
+ including LocalMemCache, RubySerialize and TokyoCabinet. The default
+ is RubySerialize because it works out of the box.
+
+ Usage:
 
  ```sh
  bio-locus --store < one.vcf
@@ -19,7 +33,7 @@ listed alt alleles. To find positions in another dataset which match
  those in the database:
 
  ```sh
- bio-locus --match < two.vcf
+ bio-locus --match < two.vcf > matched.vcf
  ```
 
  The point is that this is a two-step process, first create the
@@ -29,40 +43,46 @@ with the --delete switch.
  To match with alt use
 
  ```sh
- bio-locus --match --include-alt < two.vcf
+ bio-locus --match --alt only < two.vcf > matched.vcf
  ```
 
- Why would you use bio-locus?
+ So, with bio-locus you can
 
- * To reduce the size of large SNP databases before storage/querying
- * To gain performance
- * To filter on chr+pos (default)
- * To filter on chr+pos+field (where field can be a VCF ALT)
+ * reduce the size of large SNP databases before storage/querying
+ * gain performance
+ * filter on chr+pos (default)
+ * filter on chr+pos+field (where field can be a VCF ALT)
 
  Use cases are
 
- * To filter for annotated variants
+ * To filter for annotated variants (including INDELS)
  * To remove common variants from a set
 
  In short a more targeted approach allowing you to work with less data. This
- tool is decently fast. For example, looking for 130 positions in 20 million SNPs
- in GoNL takes 0.11s to store and 1.5 minutes to match on my laptop:
+ tool is decently fast. For example, looking for 130 positions in 20 million
+ SNPs in GoNL takes 0.11s to store and 1.5 minutes to match on my laptop (using
+ localmemcache):
 
  ```sh
- cat my_130_variants.vcf | ./bin/bio-locus --store
+ cat my_130_variants.vcf | ./bin/bio-locus --store --storage :localmemcache
  Stored 130 positions out of 130 in locus.db
  real 0m0.119s
  user 0m0.108s
  sys 0m0.012s
 
- cat gonl.*.vcf |./bin/bio-locus --match
+ cat gonl.*.vcf |./bin/bio-locus --match --storage :localmemcache
  Matched 3 out of 20736323 lines in locus.db!
  real 1m34.577s
  user 1m33.602s
  sys 0m1.868s
  ```
 
- Note: for the storage the [moneta](https://github.com/minad/moneta) gem is used, currently with localmemcache.
+ Note: for the storage here the
+ [moneta](https://github.com/minad/moneta) gem is used, currently with
+ localmemcache. The default mode for bio-locus is Ruby serialization,
+ and :tokyocabinet is also supported. The larger your data becomes, the
+ more likely it is that you need :tokyocabinet because the others are
+ more RAM oriented.
 
  Note: the ALT field is split into components for matching, so A,C
  becomes two chr+pos records, one for A and one for C.
@@ -82,18 +102,25 @@ of options available through
  bio-locus --help
  ```
 
105
+ The most important one is the handling of ALT. Both with --store and
106
+ --match ALT (chr+pos+alt) can be matched in conjuction with POS
107
+ (chr+pos). When using --alt only, only ALT is matched. When using
108
+ --alt include, both ALT and POS are matched. When using --alt exclude,
109
+ only POS is matched.
110
+
111
+
  ### Deleting keys
 
- To delete entries use
+ To delete entries from the database use
 
  ```sh
  bio-locus --delete < two.vcf
  ```
 
- To match with alt use
+ To delete those that match with alt use
 
  ```sh
- bio-locus --delete --include-alt < two.vcf
+ bio-locus --delete --alt only < two.vcf
  ```
 
  You may need to run both with and without alt, depending on your needs!
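The ALT handling described above can be sketched in Ruby. This is a minimal sketch modeled on the `Keys::each_key` logic in this release, not the gem's API. `locus_keys` is a hypothetical helper, and it assumes VCF input with CHROM, POS and ALT in tab-separated columns 0, 1 and 4. A POS key is `chr\tpos`, and each comma-separated ALT allele gives a `chr\tpos\talt` key.

```ruby
# Sketch of bio-locus style key generation for one VCF data line.
# Assumption: tab-separated columns 0, 1 and 4 hold CHROM, POS and ALT.
# locus_keys is a hypothetical helper, not part of the bio-locus API.
def locus_keys(line, alt_mode = :include)
  use_alt = [:include, :only].include?(alt_mode)    # emit chr+pos+alt keys
  use_pos = [:include, :exclude].include?(alt_mode) # emit the chr+pos key

  chr, pos, _id, _ref, alt = line.chomp.split("\t", 6)
  return [] if chr.nil? || chr.empty? || !pos.to_s.match?(/\A\d+\z/)

  alts = use_pos ? [''] : []               # '' stands for "position only"
  alts += alt.split(',') if use_alt && alt # "A,C" gives one key per allele
  alts.map { |nuc| nuc.empty? ? "#{chr}\t#{pos}" : "#{chr}\t#{pos}\t#{nuc}" }
end

line = "1\t12345\t.\tG\tA,C\t50\tPASS\t."
p locus_keys(line, :include) # => ["1\t12345", "1\t12345\tA", "1\t12345\tC"]
p locus_keys(line, :only)    # => ["1\t12345\tA", "1\t12345\tC"]
p locus_keys(line, :exclude) # => ["1\t12345"]
```

This also explains why you may need runs both with and without alt. A store built with `--alt only` contains no plain `chr\tpos` keys, so an `--alt exclude` match against it finds nothing.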
@@ -113,29 +140,64 @@ can be done with
  bio-locus --store --eval-alt 'field[2].split(/\//)[1]'
  ```
 
+ Actually, if the --in-format is 'snv', this is exactly what is used.
+
  ### COSMIC
 
  COSMIC is pretty large, so it can be useful to cut the database down to the
  variants that you have. The locus information is combined
  in the before last column as chr:start-end, e.g.,
- 19:58861911-58861911. This will work:
+ 19:58861911-58861911. This may work for COSMICv68:
 
  ```sh
  bio-locus -i --match --eval-chr='field[13] =~ /^([^:]+)/ ; $1' --eval-pos='field[13] =~ /:(\d+)-/ ; $1 ' < CosmicMutantExportIncFus_v68.tsv
  ```
 
+ You may also use the --in-format cosmic switch for supported COSMIC
+ versions.
+
  Note the -i switch is needed to skip records that lack position
- information.
+ information or are non-SNV.
+
+ ## GoNL INDEL example
 
- ## Usage
+ Here is an example of filtering out all INDELs that also exist in a
+ different dataset, in this case
+ [GoNL](http://www.genoomvannederland.nl/) which provides a database of
+ population INDELs in VCF format. First we use
+ [bio-vcf](https://github.com/pjotrp/bioruby-vcf) to create a
+ subset of common INDELS:
 
- ```ruby
- require 'bio-locus'
+ ```sh
+ cat gonl.*.snps_indels.r5.vcf |bio-vcf --filter 'r.info.set=="INDEL" and r.info.af>0.05' > gonl_indel0.05.vcf
+ ```
+
+ Create a locus database from this VCF
+
+ ```sh
+ bio-locus --store --db gonl_indel0.05.db --alt only < gonl_indel0.05.vcf
+ Stored 480639 positions out of 480639 in gonl_indel0.05.db (0 duplicate hits)
+ ```
+
+ Next, we take our datafile and filter for INDELs that are
+ in the population set
+
+ ```sh
+ bio-locus --match --db gonl_indel0.05.db --alt only < varscan2_indel_nfreq30_tfreq30.vcf > /dev/null
+ Matched 635 (unique 75) lines out of 1005 (header 18, unique 174) in gonl_indel0.05.db!
+ ```
+ This says that 75 unique INDELs were population matches. We have 635 hits
+ because there are multiple samples in this VCF.
+
+ This is not what we want in our file, so now we take our datafile and
+ filter for INDELs that are *not* in the population set
+
+ ```sh
+ bio-locus --match -v --db gonl_indel0.05.db --alt only < varscan2_indel_nfreq30_tfreq30.vcf > unique_indels.vcf
+ Matched 370 (unique 99) lines out of 1005 (header 18, unique 174) in gonl_indel0.05.db!
  ```
+ So now we have 99 INDELs for this dataset which are not common INDELs.
 
- The API doc is online. For more code examples see the test files in
- the source tree.
-
  ## Project home page
 
  Information on the source tree, documentation, examples, issues and
data/Rakefile CHANGED
@@ -25,21 +25,27 @@ Jeweler::Tasks.new do |gem|
  end
  Jeweler::RubygemsDotOrgTasks.new
 
- # require 'rspec/core'
- # require 'rspec/core/rake_task'
- # RSpec::Core::RakeTask.new(:spec) do |spec|
- # spec.pattern = FileList['spec/**/*_spec.rb']
- # end
+ require 'rspec/core'
+ require 'rspec/core/rake_task'
+ RSpec::Core::RakeTask.new(:spec) do |spec|
+ spec.pattern = FileList['spec/**/*_spec.rb']
+ end
 
  # RSpec::Core::RakeTask.new(:rcov) do |spec|
  # spec.pattern = 'spec/**/*_spec.rb'
  # spec.rcov = true
  # end
 
- require 'cucumber/rake/task'
- Cucumber::Rake::Task.new(:features)
+ # require 'cucumber/rake/task'
+ # Cucumber::Rake::Task.new(:features)
 
  task :default => :spec
+ task :test => [:spec]
+
+ RSpec::Core::RakeTask.new(:rcov) do |spec|
+ spec.pattern = 'spec/**/*_spec.rb'
+ spec.rcov = true
+ end
 
  require 'rdoc/task'
  Rake::RDocTask.new do |rdoc|
data/VERSION CHANGED
@@ -1 +1 @@
- 0.0.2
+ 0.0.6
@@ -16,13 +16,16 @@ end
  require 'bio-locus'
  require 'optparse'
 
- options = {task: nil, db: 'locus.db', show_help: false, header: 1}
+ options = {task: nil, db: 'locus.db', show_help: false, header: 1, in_format: :vcf, alt: :include, storage: :serialize}
  opts = OptionParser.new do |o|
  o.banner = "Usage: #{File.basename($0)} [options] filename\ne.g. #{File.basename($0)} test.txt"
 
  o.on("--store", 'Create or add to a cache file') do
  options[:task] = :store
- options[:include_alt] = true # always include alt
+ end
+
+ o.on("--storage [:serialize,:tokyocabinet,:localmemcache]", [:serialize,:tokyocabinet,:localmemcache], 'Persistent cache type (default :serialize)') do |t|
+ options[:storage] = t
  end
 
  o.on("--delete", 'Remove matches from a cache file') do
@@ -33,40 +36,60 @@ opts = OptionParser.new do |o|
  options[:task] = :match
  end
 
- o.on("--include-alt", 'Include chr+pos+ALT VCF field to filter') do
- options[:include_alt] = true
- end
+ # o.on("--include-alt", 'Include chr+pos+ALT VCF field to filter') do
+ # options[:include_alt] = true
+ # end
 
- o.on("--exclude-alt", 'Override adding chr+pos+ALT field to store') do
- options[:exclude_alt] = true
+ o.on('--alt [include,exclude,only]', [:include,:exclude,:only],
+ 'Include, exclude, only ALT (default include)') do |par|
+ options[:alt] = par.to_sym
  end
 
+ # o.on("--only-alt", 'Only look for chr+pos+ALT field in filter') do
+ # options[:only_alt] = true
+ # end
+
+ # o.on("--exclude-alt", 'Override adding chr+pos+ALT field to store') do
+ # options[:exclude_alt] = true
+ # end
 
  o.on("--db filename",String,"Use db file") do | fn |
  options[:db] = fn
  end
 
- o.on("--eval-chr expr",String,"Evaluate record to retrieve chr name") do | expr |
+ o.on('--in-format [vcf,tab,cosmic,snv]', [:vcf,:tab,:cosmic,:snv], 'Input format (default vcf)') do |par|
+ options[:in_format] = par.to_sym
+ end
+
+ o.on("--eval-chr expr",String,"Evaluate record to retrieve chr name (default field[0])") do | expr |
  options[:eval_chr] = expr
  end
 
- o.on("--eval-pos expr",String,"Evaluate record to retrieve position") do | expr |
+ o.on("--eval-pos expr",String,"Evaluate record to retrieve position (default field[1])") do | expr |
  options[:eval_pos] = expr
  end
 
- o.on("--eval-alt expr",String,"Evaluate record to retrieve alt list") do | expr |
+ o.on("--eval-alt expr",String,"Evaluate record to retrieve alt list (default field[4])") do | expr |
  options[:eval_alt] = expr
  end
-
+
  o.on("--header num", "Header lines (default 1)") do |l|
  options[:header] = l.to_i
  end
+
+ o.on("-v", "--invert-match", "Invert the sense of matching, to select non-matching lines") do
+ options[:invert_match] = true
+ end
 
+ o.on("--header num", "Header lines (default 1)") do |l|
+ options[:header] = l.to_i
+ end
+
  o.on("-q", "--quiet", "Run quietly") do |q|
  options[:quiet] = true
  end
 
- o.on("-v", "--verbose", "Run verbosely") do |v|
+ o.on("--verbose", "Run verbosely") do |v|
  options[:verbose] = true
  end
 
@@ -78,6 +101,10 @@ opts = OptionParser.new do |o|
  options[:ignore_errors] = true
  end
 
+ o.on("--once", "Only one copy stored/matched") do |q|
+ options[:once] = true
+ end
+
 
  o.separator ""
  o.on_tail('-h', '--help', 'display this help and exit') do
@@ -107,7 +134,6 @@ end
  case options[:task]
  when :store then
  require 'bio-locus/store'
- options[:include_alt]=false if options[:exclude_alt]
  BioLocus::Store.run(options)
  when :match ,:delete then
  require 'bio-locus/match'
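The `--alt` and `--in-format` switches above pass an Array of Symbols to Ruby's OptionParser. OptionParser accepts only those keywords and yields the matching Symbol. The argument is declared optional (`[include,exclude,only]`), so a bare `--alt` with no value yields `nil`, and `par.to_sym` would then raise NoMethodError.

Here is a minimal sketch of the same switch that falls back to the documented default instead. `parse_alt` is a hypothetical helper, not part of bio-locus.

```ruby
require 'optparse'

# Sketch of an enumerated, optional --alt switch with a fallback default.
# parse_alt is a hypothetical helper, not part of bio-locus.
def parse_alt(argv)
  options = { alt: :include } # documented default
  OptionParser.new do |o|
    # The Array restricts the value to these keywords; OptionParser yields
    # the matching Symbol, or nil when the optional value is left out.
    o.on('--alt [include,exclude,only]', [:include, :exclude, :only],
         'Include, exclude, only ALT (default include)') do |par|
      options[:alt] = par || :include
    end
  end.parse(argv)
  options[:alt]
end

p parse_alt(%w[--alt only]) # => :only
p parse_alt(%w[--alt])      # => :include
p parse_alt([])             # => :include
```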
@@ -8,5 +8,6 @@
  #
  # In this file only require other files. Avoid other source code.
 
+ require 'bio-locus/dbmapper'
  require 'bio-locus/locus.rb'
 
@@ -0,0 +1,106 @@
+ module BioLocus
+
+ class SerializeMapper
+ def initialize dbname
+ @dbname = dbname
+ @h = {}
+ if File.exist?(@dbname)
+ @h = Marshal.load(File.read(@dbname))
+ end
+ end
+
+ def [] key
+ @h[key]
+ end
+
+ def []= key, value
+ @h[key] = value
+ end
+
+ def close
+ File.open(@dbname, 'w') {|f| f.write(Marshal.dump(@h)) }
+ end
+ end
+
+ class MonetaMapper
+ def initialize storage, dbname
+ begin
+ require 'moneta'
+ rescue LoadError
+ $stderr.print "Error: Missing moneta. Install with command 'gem install moneta'\n"
+ exit 1
+ end
+ @store = Moneta.new(storage, file: dbname)
+ end
+
+ def [] key
+ @store[key]
+ end
+
+ def []= key, value
+ @store[key] = value
+ end
+
+ def close
+ @store.close
+ end
+ end
+
+ class TokyoCabinetMapper
+ def initialize dbname
+ begin
+ require 'tokyocabinet'
+ rescue LoadError
+ $stderr.print "Error: Missing tokyocabinet. Install with command 'gem install tokyocabinet'\n"
+ exit 1
+ end
+ @hdb = TokyoCabinet::HDB::new
+ if File.exist?(dbname)
+ if !@hdb.open(dbname, TokyoCabinet::HDB::OREADER)
+ ecode = @hdb.ecode
+ raise sprintf("open error: %s\n", @hdb.errmsg(ecode))
+ end
+ else
+ if !@hdb.open(dbname, TokyoCabinet::HDB::OWRITER | TokyoCabinet::HDB::OCREAT)
+ ecode = @hdb.ecode
+ raise sprintf("open error: %s\n", @hdb.errmsg(ecode))
+ end
+ end
+ end
+
+ def [] key
+ @hdb.get(key)
+ end
+
+ def []= key, value
+ if !@hdb.put(key,value)
+ ecode = @hdb.ecode
+ raise sprintf("put error: %s\n", @hdb.errmsg(ecode))
+ end
+ end
+
+ def close
+ if !@hdb.close
+ ecode = @hdb.ecode
+ raise sprintf("close error: %s\n", @hdb.errmsg(ecode))
+ end
+ end
+ end
+
+ module DbMapper
+ def DbMapper::factory options
+ dbname = options[:db]
+ if File.exist?(dbname)
+ $stderr.print "Database #{dbname} exists!\n"
+ end
+ case options[:storage]
+ when :tokyocabinet
+ TokyoCabinetMapper.new(dbname)
+ when :localmemcache
+ MonetaMapper.new(:LocalMemCache,dbname)
+ else
+ SerializeMapper.new(dbname)
+ end
+ end
+ end
+ end
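The `Match.run` code later in this diff calls `store.delete(key)` when `--delete` is given. None of the three mapper classes above (`SerializeMapper`, `MonetaMapper`, `TokyoCabinetMapper`) defines `delete`, so a `--delete` run would be expected to raise NoMethodError at the first matching key.

Here is a minimal sketch of a Marshal-backed store in the style of `SerializeMapper` that adds `delete`. `MarshalStore` is a hypothetical name. Reading and writing in binary mode is a choice made here because `Marshal` output is binary.

```ruby
require 'tmpdir'

# Sketch of a Marshal-backed key store in the style of SerializeMapper,
# with a #delete method so that a --delete run could remove keys.
# MarshalStore is a hypothetical name, not a class in bio-locus.
class MarshalStore
  def initialize(dbname)
    @dbname = dbname
    # Marshal data is binary, so read it back in binary mode
    @h = File.exist?(dbname) ? Marshal.load(File.binread(dbname)) : {}
  end

  def [](key)
    @h[key]
  end

  def []=(key, value)
    @h[key] = value
  end

  # Remove a key; returns the old value, or nil if the key was absent
  def delete(key)
    @h.delete(key)
  end

  # Write the whole Hash back to disk in one go
  def close
    File.binwrite(@dbname, Marshal.dump(@h))
  end
end

Dir.mktmpdir do |dir|
  fn = File.join(dir, 'locus.db')
  store = MarshalStore.new(fn)
  store["1\t12345"] = true
  store["1\t12345\tA"] = true
  store.close

  store = MarshalStore.new(fn) # reopen: the keys survived the round trip
  store.delete("1\t12345\tA")
  p store["1\t12345"]          # => true
  p store["1\t12345\tA"]       # => nil
  store.close
end
```

The whole Hash is loaded at start and rewritten on `close`, which is the same trade-off behind the README's advice to move to :tokyocabinet as data grows.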
@@ -1,41 +1,70 @@
 
  module BioLocus
  module Keys
+ @@in_list = {}
+
  def Keys::each_key(line,options)
+ use_alt = (options[:alt] == :include or options[:alt] == :only)
+ use_pos = (options[:alt] == :include or options[:alt] == :exclude)
+
  if line =~ /^[[:alnum:]]+/
  fields = nil
- # The default layout (VCF) may or may not work
+ # The default layout (VCF) may or may not work. Critically
+ # chr,pos and alt are expected in positions 0,1,4 respectively.
  chr,pos,id,no_use,alt,rest = line.split(/\t/,6)[0..-1]
- # Override parsing with
- if options[:eval_chr]
- fields ||= line.split(/\t/)
- field = fields
- chr = eval(options[:eval_chr])
- end
- if options[:eval_pos]
- fields ||= line.split(/\t/)
- field = fields
- pos = eval(options[:eval_pos])
- end
- if options[:eval_alt]
- fields ||= line.split(/\t/)
+ if options[:in_format] or options[:eval_chr] or options[:eval_pos] or options[:eval_alt]
+ fields = line.split(/\t/)
  field = fields
- alt = eval(options[:eval_alt])
+ case options[:in_format]
+ when :tab then
+ # chr,pos,ref,alt
+ alt = field[3].strip.split(/,/)[0] if field[3]
+ when :snv then
+ alt = field[2].split(/\//)[1] if field[2]
+ when :cosmic then
+ # COSMIC tsv files, either in field 17 (COSMICv70)
+ locus_field = field[17]
+ locus_field = field[13] if locus_field !~ /:/
+ if field[15] !~ /delet/i and locus_field =~ /:/
+ chr = /^([^:]+)/.match(locus_field)[1]
+ a = /:(\d+)-(\d+)/.match(locus_field)
+ pos = a[1] if a[1]==a[2]
+ end
+ end
+ # Override parsing with
+ if options[:eval_chr]
+ chr = eval(options[:eval_chr])
+ end
+ if options[:eval_pos]
+ pos = eval(options[:eval_pos])
+ end
+ if options[:eval_alt]
+ alt = eval(options[:eval_alt])
+ end
  end
- p [chr,pos] if options[:debug]
+ # p [:debug,chr,pos,alt] if options[:debug]
 
  # If we have a position emit it
  if pos =~ /^\d+$/ and chr and chr != ''
- alts = [''] # position only
- alts += alt.split(/,/) if options[:include_alt]
+ alts = if use_pos
+ ['']
+ else
+ []
+ end
+ alts += alt.split(/,/) if use_alt and alt
  alts.each do | nuc |
  key = chr+"\t"+pos
  key += "\t"+nuc if nuc != ''
+ if options[:once]
+ # check we haven't already sent this out in this run
+ return if @@in_list[key]
+ @@in_list[key] = true
+ end
  yield key
  end
  else
  if options[:ignore_errors]
- $stderr.print "WARNING, skipping: ",line if not options[:quiet]
+ $stderr.print "WARNING, <#{chr}:#{pos}> skipping: ",line if not options[:quiet]
  else
  p line
  p fields
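The `:cosmic` branch above reads a `chr:start-end` locus (field 17, falling back to field 13). It only sets a position when start equals end, so multi-base events are skipped. In that code `a[1]` is evaluated even when `/:(\d+)-(\d+)/` does not match, which would raise NoMethodError on a malformed locus such as `19:`.

Here is a minimal sketch of the parsing step that returns nil instead. `cosmic_locus` is a hypothetical helper, and the field-15 deletion check is left out.

```ruby
# Sketch of extracting chr and pos from a COSMIC-style "chr:start-end" locus.
# Only single-base loci (start == end) give a position, as in the :cosmic
# branch of Keys::each_key; malformed input returns nil instead of raising.
# cosmic_locus is a hypothetical helper, not part of bio-locus.
def cosmic_locus(locus_field)
  m = /\A([^:]+):(\d+)-(\d+)/.match(locus_field.to_s)
  return nil unless m

  chr, start, stop = m.captures
  return nil unless start == stop # skip multi-base events
  [chr, start]
end

p cosmic_locus('19:58861911-58861911') # => ["19", "58861911"]
p cosmic_locus('19:58861911-58861915') # => nil
p cosmic_locus('19:')                  # => nil
```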
@@ -1,42 +1,58 @@
  module BioLocus
-
- require 'moneta'
-
  module Match
  def Match.run(options)
  do_delete = (options[:task] == :delete)
- store = Moneta.new(:LocalMemCache, file: options[:db])
+ invert_match = options[:invert_match]
+ store = DbMapper.factory(options)
  lines = 0
+ header_lines = 0
  count = 0
  in_header = true
- uniq = {}
+ uniq_match = {}
+ uniq_no_match = {}
  STDIN.each_line do | line |
  if in_header and line =~ /^#/
  # Retain comments in header (for VCF)
  print line
+ header_lines += 1
  next
  else
  in_header = false
  end
- lines += 1
+ if line =~ /^#/
+ header_lines += 1
+ else
+ lines += 1
+ end
  $stderr.print '.' if (lines % 1_000_000) == 0 if not options[:quiet]
  Keys::each_key(line,options) do | key |
- if store[key]
+ has_match = lambda {
+ if invert_match
+ not store[key]
+ else
+ store[key]
+ end
+ }
+ if has_match.call
+ # We have a match
+ $stderr.print "Matched <#{key}>\n" if options[:debug]
  count += 1
  if do_delete
  store.delete(key)
  else
  print line
- uniq[key] ||= true
+ uniq_match[key] ||= true
  end
+ else
+ uniq_no_match[key] ||= true
  end
  end
  end
  store.close
  if do_delete
- $stderr.print "\nDeleted #{count} keys representing #{lines} in #{options[:db]}!\n" if not options[:quiet]
+ $stderr.print "\nDeleted #{count} keys in #{options[:db]} reading #{lines} lines !\n" if not options[:quiet]
  else
- $stderr.print "\nMatched #{count} (unique #{uniq.keys.size}) lines out of #{lines} in #{options[:db]}!\n" if not options[:quiet]
+ $stderr.print "\nMatched #{count} (unique #{uniq_match.keys.size}) lines out of #{lines} (header #{header_lines}, unique #{uniq_no_match.keys.size+uniq_match.keys.size}) in #{options[:db]}!\n" if not options[:quiet]
  end
  end
  end
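`Match.run` prints a data line whenever one of its keys is found in the store. With `-v`/`--invert-match`, it prints the line whenever one of its keys is not found. The README's GoNL example uses this to split population INDELs from unique ones. Because the check runs per key, a line that yields several keys can be printed more than once.

Here is a minimal sketch of that filter, with an in-memory Set standing in for the on-disk database. `filter_lines` is hypothetical and assumes `--alt only` keys (`chr\tpos\talt`).

```ruby
require 'set'

# Sketch of the match / invert-match filter, with an in-memory Set standing
# in for the on-disk database. Assumes --alt only keys ("chr\tpos\talt").
# filter_lines is a hypothetical helper, not part of bio-locus.
def filter_lines(lines, stored, invert: false)
  lines.each_with_object([]) do |line, out|
    chr, pos, _id, _ref, alt = line.chomp.split("\t", 6)
    alt.to_s.split(',').each do |nuc|
      hit = stored.include?("#{chr}\t#{pos}\t#{nuc}")
      # As in Match.run, the line is emitted once per qualifying key
      out << line if invert ? !hit : hit
    end
  end
end

stored = Set["1\t100\tAT"]
lines  = ["1\t100\t.\tA\tAT", "1\t200\t.\tC\tCG"]
p filter_lines(lines, stored)               # => ["1\t100\t.\tA\tAT"]
p filter_lines(lines, stored, invert: true) # => ["1\t200\t.\tC\tCG"]
```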
@@ -1,14 +1,20 @@
  module BioLocus
 
- require 'moneta'
-
  module Store
  def Store.run(options)
- store = Moneta.new(:LocalMemCache, file: options[:db])
+ invert_match = options[:invert_match]
+ store = DbMapper.factory(options)
  count = count_new = count_dup = 0
  STDIN.each_line do | line |
  Keys::each_key(line,options) do | key |
- if not store[key]
+ has_match = lambda {
+ if invert_match
+ not store[key]
+ else
+ store[key]
+ end
+ }
+ if not has_match.call
  count_new += 1
  store[key] = true
  else
@@ -23,7 +29,7 @@ module BioLocus
  end
  end
  store.close
- $stderr.print "Stored #{count_new} positions out of #{count} in #{options[:db]} (#{count_dup} hits)\n" if !options[:quiet]
+ $stderr.print "Stored #{count_new} positions out of #{count} in #{options[:db]} (#{count_dup} duplicate hits)\n" if !options[:quiet]
  end
  end
  end
@@ -1,7 +1,37 @@
  require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
 
- describe "BioLocus" do
- it "fails" do
- fail "hey buddy, you should probably rename this file and start specing for real"
- end
+ describe "BioLocus with Serialize" do
+ fn = 'biolocus_serialize.db'
+ store = BioLocus::DbMapper.factory({storage: :serialize, db: fn})
+ store['test'] = 'yes'
+ store['test2'] = 'no'
+ a = store['test']
+ store['test'].should == 'yes'
+ store['test2'].should == 'no'
+ store.close
+ File.unlink(fn)
+ end
+
+ describe "BioLocus with Moneta" do
+ fn = 'biolocus_moneta_localmemcache.db'
+ store = BioLocus::MonetaMapper.new(:LocalMemCache,fn)
+ store['test'] = 'yes'
+ store['test2'] = 'no'
+ a = store['test']
+ store['test'].should == 'yes'
+ store['test2'].should == 'no'
+ store.close
+ File.unlink(fn)
+ end
+
+ describe "BioLocus with TokyoCabinet" do
+ fn = 'biolocus_tokyocabinet.db'
+ store = BioLocus::TokyoCabinetMapper.new(fn)
+ store['test'] = 'yes'
+ store['test2'] = 'no'
+ a = store['test']
+ store['test'].should == 'yes'
+ store['test2'].should == 'no'
+ store.close
+ File.unlink(fn)
  end
metadata CHANGED
@@ -1,23 +1,23 @@
  --- !ruby/object:Gem::Specification
  name: bio-locus
  version: !ruby/object:Gem::Version
- version: 0.0.2
+ version: 0.0.6
  platform: ruby
  authors:
  - Pjotr Prins
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-06-05 00:00:00.000000000 Z
+ date: 2014-10-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
- name: localmemcache
+ name: cucumber
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- type: :runtime
+ type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
@@ -25,13 +25,13 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
- name: moneta
+ name: jeweler
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- type: :runtime
+ type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
@@ -39,7 +39,7 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
- name: cucumber
+ name: bundler
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -53,7 +53,7 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
- name: jeweler
+ name: rspec
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -67,7 +67,35 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
- name: bundler
+ name: tokyocabinet
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ - !ruby/object:Gem::Dependency
+ name: localmemcache
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ - !ruby/object:Gem::Dependency
+ name: moneta
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -103,6 +131,7 @@ files:
  - features/step_definitions/bio-locus_steps.rb
  - features/support/env.rb
  - lib/bio-locus.rb
+ - lib/bio-locus/dbmapper.rb
  - lib/bio-locus/locus.rb
  - lib/bio-locus/match.rb
  - lib/bio-locus/store.rb