RubyGems - bio-vcf - Versions diffs - 0.8.1 → 0.8.2 - Mend

bio-vcf 0.8.1 → 0.8.2

Files changed (23)

checksums.yaml +4 -4
data/.travis.yml +5 -0
data/Gemfile +4 -4
data/Gemfile.lock +4 -4
data/README.md +32 -11
data/VERSION +1 -1
data/bin/bio-vcf +18 -9
data/bio-vcf.gemspec +25 -17
data/features/cli.feature +11 -1
data/features/step_definitions/cli-feature.rb +1 -1
data/features/step_definitions/vcf_header.rb +48 -0
data/features/vcf_header.feature +35 -0
data/lib/bio-vcf.rb +1 -0
data/lib/bio-vcf/vcfheader.rb +88 -4
data/lib/bio-vcf/vcfheader_line.rb +483 -0
data/lib/bio-vcf/vcfsample.rb +10 -1
data/ragel/gen_vcfheaderline_parser.rb +483 -0
data/ragel/gen_vcfheaderline_parser.rl +122 -0
data/ragel/generate.sh +5 -0
data/template/vcf2json_full_header.erb +23 -0
data/template/vcf2json_use_meta.erb +41 -0
data/test/data/regression/vcf2json_full_header.ref +261 -0
metadata +20 -11

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 90a933c33c683c1f0886a202fa5a9ee5ed2ad8ff
-  data.tar.gz: 1f769a89fcb3e3b44e22864ddf729ea3ac040260
+  metadata.gz: 515319faec0710075f13a0265a4027130ec5f10a
+  data.tar.gz: aed2ff09861568291363ca21944567ad36987813
 SHA512:
-  metadata.gz: 308d93ca1bcb142fa9cd4be63d929edb0ad92b7ac0da0d4f2d51f4b363de6ef0b87ac3d2688af8c36257317e203f69355fdc3348c6a330adb7d997af7ab6714d
-  data.tar.gz: d7d328a13d90b209a6068f9d3f09d56e8a00262ccf7fba9d9d67ffe4935993b2ccbb2ddc2f9a4831dd8928258a9c5d468f050d9edfbb95737616f4bfaf184bb0
+  metadata.gz: 94ff3bfda4357fc187a89c9a55116ceefe15fc2b8fa28af45e92afcad452c8d2bd65e5eae17dd2c40b046f89288c640ba5d4b40b8efb711781caed766e48f518
+  data.tar.gz: 3d810db35d1ad862aad6f4ec81d695c6d7d74d46336d4e5563e925da267d04521387994d794ff7d8384cf10d8c94701e0e2af9380ddc0b4505e00edbbedb7c3e

data/.travis.yml CHANGED

@@ -3,6 +3,11 @@ rvm:
 #  - 1.9.3 <- No longer working
   - 2.0.0
   - 2.1.0
+branches:
+  only:
+    - master
 #  - jruby-head
 #  - jruby-19mode # JRuby in 1.9 mode
 #  - 1.8.7

data/Gemfile CHANGED

@@ -7,9 +7,9 @@ source "http://rubygems.org"
 # Include everything needed to run rake, tests, features, etc.
 group :development do
   # gem "minitest"
-  gem "rspec"
-  gem "cucumber"
-  gem "jeweler", "~> 2.0.1" # , "~> 1.8.4", :git => "https://github.com/technicalpickles/jeweler.git"
-  gem "regressiontest", "~> 0.0.3"
+  gem "rspec", ">= 2.14.0"
+  gem "cucumber", ">= 1.3.11"
+  gem "jeweler", ">= 2.0.1" # , "~> 1.8.4", :git => "https://github.com/technicalpickles/jeweler.git"
+  gem "regressiontest", ">= 0.0.3"
 end
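
Note on the constraint change above: the move from `~>` to `>=` widens what Bundler may resolve for the development gems. The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to compare the old and new jeweler constraints; the list of candidate versions is made up for illustration and does not reflect actual jeweler releases.

```ruby
require "rubygems"

old_req = Gem::Requirement.new("~> 2.0.1") # jeweler constraint in 0.8.1
new_req = Gem::Requirement.new(">= 2.0.1") # jeweler constraint in 0.8.2

# Illustrative version numbers only
%w[2.0.1 2.0.9 2.1.0 3.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-6s ~> 2.0.1: %-5s >= 2.0.1: %s",
              v,
              old_req.satisfied_by?(version),
              new_req.satisfied_by?(version))
end
```

The pessimistic operator `~> 2.0.1` accepts only 2.0.x releases from 2.0.1 upward, whereas `>= 2.0.1` also accepts later minor and major releases.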

data/Gemfile.lock CHANGED

@@ -75,7 +75,7 @@ PLATFORMS
   ruby
 DEPENDENCIES
-  cucumber
-  jeweler (~> 2.0.1)
-  regressiontest (~> 0.0.3)
-  rspec
+  cucumber (>= 1.3.11)
+  jeweler (>= 2.0.1)
+  regressiontest (>= 0.0.3)
+  rspec (>= 2.14.0)

data/README.md CHANGED

@@ -5,7 +5,9 @@
 A new generation VCF parser. Bio-vcf is not only fast for genome-wide
 (WGS) data, it also comes with a really nice filtering, evaluation and
 rewrite language and it can output any type of textual data, including
-RDF and JSON. Why would you use bio-vcf over other parsers?
+VCF header and contents in RDF and JSON.
+So, why would you use bio-vcf over other parsers? Because
 1. Bio-vcf is fast and scales on multi-core computers
 2. Bio-vcf has an expressive filtering and evaluation language
@@ -16,14 +18,14 @@ RDF and JSON. Why would you use bio-vcf over other parsers?
 7. Bio-vcf allows for genotype processing
 8. Bio-vcf has support for set analysis
 9. Bio-vcf has sane error handling
-10. Bio-vcf can output tabular data, HTML, LaTeX, RDF, JSON and JSON-LD and even other VCFs using (erb) templates
+10. Bio-vcf can convert *any* VCF to *any* output, including tabular data, HTML, LaTeX, RDF, JSON and JSON-LD and even other VCFs by using (erb) templates
 Bio-vcf has better performance than other tools
 because of lazy parsing, multi-threading, and useful combinations of
 (fancy) command line filtering. For example on an 2 core machine
-bio-vcf is typically 50% faster than JVM based SnpSift. On an 8 core machine
-bio-vcf is at least 3x faster than SnpSift. Parsing a 1 Gb ESP
-VCF with 8 cores with bio-vcf takes
+bio-vcf is typically 50% faster than JVM based SnpSift. Adding
+cores, bio-vcf just does better. The more complicated the filters,
+the larger the gain.
 ```sh
   time ./bin/bio-vcf -iv --num-threads 8 --filter 'r.info.cp>0.3' < ESP6500SI_V2_SSA137.vcf > test1.vcf
@@ -52,8 +54,8 @@ a 16 core machine takes
   sys     0m5.039s
 ```
-which shows decent core utilisation (10x). We are running
-gzip compressed VCF files of 30+ Gb with similar performance gains.
+which shows decent core utilisation (10x). Running
+gzip compressed VCF files of 30+ Gb has similar performance gains.
 Use zcat to
 pipe such gzipped (vcf.gz) files into bio-vcf, e.g.
@@ -64,10 +66,10 @@ pipe such gzipped (vcf.gz) files into bio-vcf, e.g.
     --eval '[r.chrom,r.pos,r.pos+1]' > test.bed
 ```
-bio-vcf comes with a sensible parser definition language (it is 100%
-Ruby), as well as primitives for set analysis. Few
+bio-vcf comes with a sensible parser definition language (interestingly it is 100%
+Ruby), an embedded Ragel parser for INFO and FORMAT header definitions, as well as primitives for set analysis. Few
 assumptions are made about the actual contents of the VCF file (field
-names are resolved on the fly), so bio-vcf should practically work with
+names are resolved on the fly), so bio-vcf should work with
 all VCF files.
 To fetch all entries where all samples have depth larger than 20 use
@@ -679,7 +681,7 @@ Also check out [bio-table](https://github.com/pjotrp/bioruby-table) to convert t
 ## Templates
-To have more output options blastxmlparser can use an [ERB
+To have more output options bio-vcf can use an [ERB
 template](http://www.stuartellis.eu/articles/erb/) for every match. This is a
 very flexible option that can output textual formats such as JSON, YAML, HTML
 and RDF. Examples are provided in
@@ -785,6 +787,12 @@ can be
 ]
 ```
+with
+```sh
+  bio-vcf --template template/vcf2json.erb < dbsnp.vcf
+```
 may generate something like
 ```Javascript
@@ -816,6 +824,19 @@ from the last BODY element. To make it valid JSON that needs to be
 removed. A future version may add a parameter to the BODY element or a
 global rewrite function for this purpose. YAML and RDF have no such issue.
+### Using full VCF header (meta) info
+To get and put the full information from the header, simple use
+vcf.meta.to_json.  See ./template/vcf2json_full_header.erb for an
+example. This meta information can also be used to output info fields
+and sample values on the fly! For an example, see the template at
+[./template/vcf2json_use_meta.erb](https://github.com/pjotrp/bioruby-vcf/tree/master/template/vcf2json_use_meta.erb)
+and the generated output at
+[./test/data/regression/vcf2json_use_meta.ref](https://github.com/pjotrp/bioruby-vcf/tree/master/test/data/regression/vcf2json_use_meta.ref).
+This way, it is possible to write templates that can convert the content of
+*any* VCF file without prior knowledge to JSON, RDF, etc.
 ## Statistics
 Simple statistics are available for REF>ALT changes:
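
Note on the new README section "Using full VCF header (meta) info": the idea of driving a template from the header metadata can be sketched with Ruby's standard `erb` and `json` libraries. This is not the shipped `vcf2json_use_meta.erb` template. The `meta` hash stands in for what `vcf.meta` returns, and the `sample` hash and the template layout are assumptions made for illustration.

```ruby
require "erb"
require "json"

# Hypothetical stand-in for the hash returned by vcf.meta
meta = {
  "fileformat" => "VCFv4.1",
  "INFO"   => { "PM" => { "ID" => "PM", "Number" => "0", "Type" => "Flag" } },
  "FORMAT" => { "DP" => { "ID" => "DP", "Number" => "1", "Type" => "Integer" } }
}

# Hypothetical sample values for one record, keyed by FORMAT ID
sample = { "DP" => "42" }

# The template emits the whole header, then looks up one sample value per
# FORMAT ID declared in the header, so no field names are hard-coded.
template = ERB.new(<<~TEMPLATE)
  {
    "header": <%= meta.to_json %>,
    "samples": <%= meta["FORMAT"].keys.map { |id| [id, sample[id]] }.to_h.to_json %>
  }
TEMPLATE

json_text = template.result_with_hash(meta: meta, sample: sample)
puts json_text
```

Because the field list comes from `meta["FORMAT"]` rather than from the template text, the same template would work for a VCF file with different FORMAT definitions.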

data/VERSION CHANGED

@@ -1 +1 @@
-0.8.1
+0.8.2

data/bin/bio-vcf CHANGED

@@ -200,7 +200,7 @@ end
 include BioVcf
-# Parse the header section of a VCF file
+# Parse the header section of a VCF file (chomping STDIN)
 def parse_header line, samples, options
   header = VcfHeader.new
   header.add(line)
@@ -374,22 +374,31 @@ begin
     end
   } # end output
-  print template.header(binding) if template
   # ---- Main loop
   STDIN.each_line do | line |
     line_number += 1
     # ---- In this section header information is handled
+    # ---- Skip embedded headers down the line...
     next if header_output_completed and line =~ /^#/
-    if line =~ /^##fileformat=/ or line =~ /^#CHR/
+    # ---- Parse the header lines (chomps from STDIN)
+    #      and returns header info and the current line
+    if line =~ /^#/
       header,line = parse_header(line,samples,options)
     end
-    next if line =~ /^##/ # empty file
-    header_output_completed = true
-    if not options[:efilter_samples] and options[:ifilter_samples]
-      # Create exclude set as a complement of include set
-      options[:efilter_samples] = header.column_names[9..-1].fill{|i|i.to_s}-options[:ifilter_samples]
+    # p [line_number,line]
+    # ---- After the header continue processing
+    if not header_output_completed
+      # one-time post-header processing
+      if not options[:efilter_samples] and options[:ifilter_samples]
+        # Create exclude set as a complement of include set
+        options[:efilter_samples] = header.column_names[9..-1].fill{|i|i.to_s}-options[:ifilter_samples]
+      end
+      print template.header(binding) if template
+      header_output_completed = true
     end
     # ---- In this section the VCF variant lines are parsed
     lines << line
     if NUM_THREADS == 1
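
Note on the main-loop change above: the template header is now printed inside a one-time post-header step instead of before the loop, so the parsed header is available when it runs. The control flow can be sketched as below. This is a simplification: the real `parse_header` consumes header lines from STDIN itself and returns the current line, while this sketch collects header lines one per iteration, and the input data and the `post_header_runs` counter are made up for illustration.

```ruby
require "stringio"

# Hypothetical input: a header, two records, and a stray embedded header line
input = StringIO.new(<<~VCF)
  ##fileformat=VCFv4.1
  #CHROM\tPOS\tID
  1\t100\t.
  ##embedded=header
  1\t200\t.
VCF

header_lines = []
records = []
header_output_completed = false
post_header_runs = 0

input.each_line do |line|
  # Skip headers embedded further down once the real header is done
  next if header_output_completed && line.start_with?("#")

  # Collect header lines (the gem delegates this to parse_header)
  if line.start_with?("#")
    header_lines << line.chomp
    next
  end

  # One-time post-header processing, e.g. printing the template header
  unless header_output_completed
    post_header_runs += 1
    header_output_completed = true
  end

  records << line.chomp
end

puts "header lines: #{header_lines.size}, records: #{records.size}"
```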

data/bio-vcf.gemspec CHANGED

@@ -2,16 +2,14 @@
 # DO NOT EDIT THIS FILE DIRECTLY
 # Instead, edit Jeweler::Tasks in Rakefile, and run 'rake gemspec'
 # -*- encoding: utf-8 -*-
-# stub: bio-vcf 0.8.1 ruby lib
 Gem::Specification.new do |s|
   s.name = "bio-vcf"
-  s.version = "0.8.1"
+  s.version = "0.8.2"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
-  s.require_paths = ["lib"]
   s.authors = ["Pjotr Prins"]
-  s.date = "2014-11-26"
+  s.date = "2014-12-28"
   s.description = "Smart lazy multi-threaded parser for VCF format with useful filtering and output rewriting"
   s.email = "pjotr.public01@thebird.nl"
   s.executables = ["bio-vcf"]
@@ -40,7 +38,9 @@ Gem::Specification.new do |s|
     "features/step_definitions/multisample.rb",
     "features/step_definitions/sfilter.rb",
     "features/step_definitions/somaticsniper.rb",
+    "features/step_definitions/vcf_header.rb",
     "features/support/env.rb",
+    "features/vcf_header.feature",
     "lib/bio-vcf.rb",
     "lib/bio-vcf/bedfilter.rb",
     "lib/bio-vcf/template.rb",
@@ -49,13 +49,19 @@ Gem::Specification.new do |s|
     "lib/bio-vcf/vcf.rb",
     "lib/bio-vcf/vcfgenotypefield.rb",
     "lib/bio-vcf/vcfheader.rb",
+    "lib/bio-vcf/vcfheader_line.rb",
     "lib/bio-vcf/vcfline.rb",
     "lib/bio-vcf/vcfrdf.rb",
     "lib/bio-vcf/vcfrecord.rb",
     "lib/bio-vcf/vcfsample.rb",
     "lib/bio-vcf/vcfstatistics.rb",
+    "ragel/gen_vcfheaderline_parser.rb",
+    "ragel/gen_vcfheaderline_parser.rl",
+    "ragel/generate.sh",
     "template/gatk_vcf2rdf.erb",
     "template/vcf2json.erb",
+    "template/vcf2json_full_header.erb",
+    "template/vcf2json_use_meta.erb",
     "template/vcf2rdf.erb",
     "template/vcf2rdf_header.erb",
     "test/data/input/dbsnp.vcf",
@@ -71,33 +77,35 @@ Gem::Specification.new do |s|
     "test/data/regression/thread4.ref",
     "test/data/regression/thread4_4.ref",
     "test/data/regression/thread4_4_failed_filter-stderr.ref",
+    "test/data/regression/vcf2json_full_header.ref",
     "test/performance/metrics.md"
   ]
   s.homepage = "http://github.com/pjotrp/bioruby-vcf"
   s.licenses = ["MIT"]
+  s.require_paths = ["lib"]
   s.required_ruby_version = Gem::Requirement.new(">= 2.0.0")
-  s.rubygems_version = "2.2.2"
+  s.rubygems_version = "2.0.3"
   s.summary = "Fast multi-threaded VCF parser"
   if s.respond_to? :specification_version then
     s.specification_version = 4
     if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
-      s.add_development_dependency(%q<rspec>, [">= 0"])
-      s.add_development_dependency(%q<cucumber>, [">= 0"])
-      s.add_development_dependency(%q<jeweler>, ["~> 2.0.1"])
-      s.add_development_dependency(%q<regressiontest>, ["~> 0.0.3"])
+      s.add_development_dependency(%q<rspec>, [">= 2.14.0"])
+      s.add_development_dependency(%q<cucumber>, [">= 1.3.11"])
+      s.add_development_dependency(%q<jeweler>, [">= 2.0.1"])
+      s.add_development_dependency(%q<regressiontest>, [">= 0.0.3"])
     else
-      s.add_dependency(%q<rspec>, [">= 0"])
-      s.add_dependency(%q<cucumber>, [">= 0"])
-      s.add_dependency(%q<jeweler>, ["~> 2.0.1"])
-      s.add_dependency(%q<regressiontest>, ["~> 0.0.3"])
+      s.add_dependency(%q<rspec>, [">= 2.14.0"])
+      s.add_dependency(%q<cucumber>, [">= 1.3.11"])
+      s.add_dependency(%q<jeweler>, [">= 2.0.1"])
+      s.add_dependency(%q<regressiontest>, [">= 0.0.3"])
     end
   else
-    s.add_dependency(%q<rspec>, [">= 0"])
-    s.add_dependency(%q<cucumber>, [">= 0"])
-    s.add_dependency(%q<jeweler>, ["~> 2.0.1"])
-    s.add_dependency(%q<regressiontest>, ["~> 0.0.3"])
+    s.add_dependency(%q<rspec>, [">= 2.14.0"])
+    s.add_dependency(%q<cucumber>, [">= 1.3.11"])
+    s.add_dependency(%q<jeweler>, [">= 2.0.1"])
+    s.add_dependency(%q<regressiontest>, [">= 0.0.3"])
   end
 end

data/features/cli.feature CHANGED

@@ -43,14 +43,24 @@ Feature: Command-line interface (CLI)
     When I execute "./bin/bio-vcf -i --sfilter 's.dp>10' --seval 's.dp'"
     Then I expect the named output to match the named output "sfilter_seval_s.dp"
   Scenario: Rewrite an info field
     Given I have input file(s) named "test/data/input/multisample.vcf"
     When I execute "./bin/bio-vcf --rewrite rec.info[\'sample\']=\'XXXXX\'"
     Then I expect the named output to match the named output "rewrite.info.sample"
+  Scenario: Test JSON output with header meta data
+    Given I have input file(s) named "test/data/input/multisample.vcf"
+    When I execute "./bin/bio-vcf --template template/vcf2json_full_header.erb"
+    Then I expect the named output to match the named output "vcf2json_full_header"
+  Scenario: Test JSON output with header meta data and query samples
+    Given I have input file(s) named "test/data/input/multisample.vcf"
+    When I execute "./bin/bio-vcf --template template/vcf2json_use_meta.erb"
+    Then I expect the named output to match the named output "vcf2json_use_meta"
   Scenario: Test deadlock on failed filter with threads
     Given I have input file(s) named "test/data/input/multisample.vcf"
     When I execute "./bin/bio-vcf --num-threads 4 --thread-lines 4 --filter 't.info.dp>2'"
     Then I expect an error and the named output to match the named output "thread4_4_failed_filter" in under 30 seconds

data/features/step_definitions/cli-feature.rb CHANGED

@@ -8,7 +8,7 @@ When /^I execute "(.*?)"$/ do |arg1|
 end
 Then(/^I expect the named output to match the named output "(.*?)"$/) do |arg1|
-  RegressionTest::CliExec::exec(@cmd,arg1,ignore: '##BioVcf=').should be_true
+  RegressionTest::CliExec::exec(@cmd,arg1,ignore: '(##BioVcf|date|"version":)').should be_true
 end
 Then(/^I expect an error and the named output to match the named output "(.*?)" in under (\d+) seconds$/) do |arg1,arg2|

data/features/step_definitions/vcf_header.rb ADDED

@@ -0,0 +1,48 @@
+Given(/^the VCF header lines$/) do |string|
+  header = VcfHeader.new
+  header.add string
+  @vcf = header
+end
+When(/^I parse the VCF header$/) do
+end
+Then(/^I expect vcf\.columns to be \[CHROM','POS','ID','REF','ALT','QUAL','FILTER','INFO','FORMAT','NORMAL','TUMOR'\]$/) do
+  expect(@vcf.column_names).to eq ['CHROM','POS','ID','REF','ALT','QUAL','FILTER','INFO','FORMAT','NORMAL','TUMOR']
+end
+Then(/^I expect vcf\.fileformat to be "(.*?)"$/) do |arg1|
+  expect(@vcf.fileformat).to eq arg1
+end
+Then(/^I expect vcf\.fileDate to be "(.*?)"$/) do |arg1|
+  expect(@vcf.fileDate).to eq arg1
+end
+Then(/^I expect vcf.field\['fileDate'\] to be "(.*?)"$/) do |arg1|
+  expect(@vcf.field['fileDate']).to eq arg1
+end
+Then(/^I expect vcf\.phasing to be "(.*?)"$/) do |arg1|
+  expect(@vcf.phasing).to eq arg1
+end
+Then(/^I expect vcf\.reference to be "(.*?)"$/) do |arg1|
+  expect(@vcf.reference).to eq arg1
+end
+Then(/^I expect vcf\.format\['(\w+)'\] to be (\{[^}]+\})/) do |arg1,arg2|
+  expect(@vcf.format[arg1].to_s).to eq arg2
+end
+Then(/^I expect vcf\.info\['(\w+)'\] to be (\{[^}]+\})/) do |arg1,arg2|
+  expect(@vcf.info[arg1].to_s).to eq arg2
+end
+Then(/^I expect vcf\.meta to contain all header meta information$/) do
+  m = @vcf.meta
+  expect(m['fileformat']).to eq "VCFv4.1"
+  expect(m['FORMAT']['DP']['Number']).to eq "1"
+  expect(m.size).to be 6
+end

data/features/vcf_header.feature ADDED

@@ -0,0 +1,35 @@
+@meta
+Feature: Parsing VCF meta information from the header
+  Take a header and parse that information as defined by the VCF standard.
+  Scenario: When parsing a header line
+    Given the VCF header lines
+    """
+##fileformat=VCFv4.1
+##fileDate=20140121
+##phasing=none
+##reference=file:///data/GENOMES/human_GATK_GRCh37/GRCh37_gatk.fasta
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
+##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
+##INFO=<ID=PM,Number=0,Type=Flag,Description="Variant is Precious(Clinical,Pubmed Cited)">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NORMAL	TUMOR
+    """
+    When I parse the VCF header
+    Then I expect vcf.columns to be [CHROM','POS','ID','REF','ALT','QUAL','FILTER','INFO','FORMAT','NORMAL','TUMOR']
+    And I expect vcf.fileformat to be "VCFv4.1"
+    And I expect vcf.fileDate to be "20140121"
+    And I expect vcf.field['fileDate'] to be "20140121"
+    And I expect vcf.phasing to be "none"
+    And I expect vcf.reference to be "file:///data/GENOMES/human_GATK_GRCh37/GRCh37_gatk.fasta"
+    And I expect vcf.format['GT'] to be {"ID"=>"GT", "Number"=>"1", "Type"=>"String", "Description"=>"Genotype"}
+    And I expect vcf.format['DP'] to be {"ID"=>"DP", "Number"=>"1", "Type"=>"Integer", "Description"=>"Total read depth"}
+    And I expect vcf.format['DP4'] to be {"ID"=>"DP4", "Number"=>"4", "Type"=>"Integer", "Description"=>"# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases"}
+    And I expect vcf.info['PM'] to be {"ID"=>"PM", "Number"=>"0", "Type"=>"Flag", "Description"=>"Variant is Precious(Clinical,Pubmed Cited)"}'
+    And I expect vcf.meta to contain all header meta information
+  Scenario: When parsing the header of somatic_sniper.vcf
+    Do something

data/lib/bio-vcf.rb CHANGED

@@ -11,6 +11,7 @@
 require 'bio-vcf/utils'
 require 'bio-vcf/vcf'
 require 'bio-vcf/vcfsample'
+require 'bio-vcf/vcfheader_line'
 require 'bio-vcf/vcfheader'
 require 'bio-vcf/vcfline'
 require 'bio-vcf/vcfgenotypefield'

data/lib/bio-vcf/vcfheader.rb CHANGED

@@ -1,3 +1,14 @@
+# This module parses the VCF header. A header consists of lines
+# containing fields. Most fields are of 'key=value' type and appear
+# only once.  These can be retrieved with the find_field method.
+#
+# INFO and FORMAT fields are special as they appear multiple times
+# and contain multiple key values (identified by an ID field).
+# To retrieve these call 'info' and 'format' functions respectively,
+# which return a hash on the contained ID.
+#
+# For the INFO and FORMAT fields a Ragel parser is used, mostly to
+# deal with embedded quoted fields.
 module BioVcf
@@ -13,21 +24,27 @@ module BioVcf
       end
       nil
     end
+    def VcfHeaderParser.parse_field(line)
+      BioVcf::VcfHeaderParser::RagelKeyValues.run_lexer(line, debug: false)
+    end
   end
   class VcfHeader
-    attr_reader :lines
+    attr_reader :lines, :field
     def initialize
       @lines = []
+      @field = {}
     end
+    # Add a new field to the header
     def add line
-      @lines << line.strip
+      @lines += line.split(/\n/)
     end
-    # Add a key value list to the header
+    # Push a special key value list to the header
     def tag h
       h2 = h.dup
       [:show_help,:skip_header,:verbose,:quiet,:debug].each { |key| h2.delete(key) }
@@ -82,6 +99,73 @@ module BioVcf
       @sample_index = index
       index
     end
-  end
+    # Look for a line in the header with the field name and return the
+    # value, otherwise return nil
+    def find_field name
+      return field[name] if field[name]
+      @lines.each do | line |
+        value = line.scan(/###{name}=(.*)/)
+        if value[0]
+          v = value[0][0]
+          field[name] = v
+          return v
+        end
+      end
+      nil
+    end
+    # Look for all the lines that match the field name and return
+    # a hash of hashes. An empty hash is returned when there are
+    # no matches.
+    def find_fields name
+      res = {}
+      @lines.each do | line |
+        value = line.scan(/###{name}=<(.*)>/)
+        if value[0]
+          str = value[0][0]
+          # p str
+          v = VcfHeaderParser.parse_field(line)
+          id = v['ID']
+          res[id] = v
+        end
+      end
+      # p res
+      res
+    end
+    def format
+      find_fields('FORMAT')
+    end
+    def info
+      find_fields('INFO')
+    end
+    def meta
+      res = { 'INFO' => {}, 'FORMAT' => {} }
+      @lines.each do | line |
+        value = line.scan(/##(.*?)=(.*)/)
+        if value[0]
+          k,v = value[0]
+          if k != 'FORMAT' and k != 'INFO'
+            # p [k,v]
+            res[k] = v
+          end
+        end
+      end
+      res['INFO'] = info
+      res['FORMAT'] = format
+      # p [:res, res]
+      res
+    end
+    def method_missing(m, *args, &block)
+      name = m.to_s
+      value = find_field(name)
+      return value if value
+      raise "Unknown VCF header query '#{name}'"
+    end
+  end
 end
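
Note on the `find_field` and `method_missing` additions above: together they let callers write `header.fileformat` for any simple `##key=value` header line, with the result cached after the first lookup. A minimal stand-alone sketch of that pattern follows. `TinyHeader` is a hypothetical class, not the gem's `VcfHeader`. Unlike the diff, it escapes the field name in the regex, defines `respond_to_missing?`, and falls back to `super` (a `NoMethodError`) rather than raising "Unknown VCF header query".

```ruby
# Hypothetical stand-in for illustration; not the gem's VcfHeader class
class TinyHeader
  def initialize(text)
    @lines = text.split(/\n/)
    @field = {} # cache of already resolved key=value fields
  end

  # Return the value of the first '##name=value' line, or nil
  def find_field(name)
    return @field[name] if @field[name]
    @lines.each do |line|
      m = line.match(/\A###{Regexp.escape(name)}=(.*)/)
      return @field[name] = m[1] if m
    end
    nil
  end

  def respond_to_missing?(name, include_private = false)
    !find_field(name.to_s).nil? || super
  end

  # Treat unknown method names as header field queries
  def method_missing(name, *args, &block)
    value = find_field(name.to_s)
    return value if value
    super
  end
end

header = TinyHeader.new("##fileformat=VCFv4.1\n##fileDate=20140121\n##phasing=none")
puts header.fileformat
puts header.fileDate
```

One consequence of this design, in the sketch and in the diff alike, is that a misspelled method name on the header object surfaces as a failed header query at run time.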