RubyGems - pikelet - Versions diffs - 0.0.2 → 0.1.0 - Mend

pikelet 0.0.2 → 0.1.0

Files changed (8)

checksums.yaml +4 -4
data/README.md +198 -77
data/lib/pikelet/field_definition.rb +16 -7
data/lib/pikelet/record_definition.rb +4 -4
data/lib/pikelet/version.rb +1 -1
data/spec/pikelet/field_definition_spec.rb +86 -0
data/spec/pikelet_spec.rb +35 -7
metadata +4 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: ba77b8ff8d09025491cbe13b0cd6f18f735a3b84
-  data.tar.gz: 4949556c8adee37a5d0370d02a2bd1c1a3a1f544
+  metadata.gz: b08f706b9dfcb52105cbcbafdb725cca95db84fc
+  data.tar.gz: 0a4362f589115d5b7fc082322b9d419f68e62c0a
 SHA512:
-  metadata.gz: 4c0afce709436d65d47d6234abbf8cc94dfbe61870a5c171bfa8b2314488cdfb3ce907a901767a3f3f95ad74cc8f251223c554eaa0a081aba663c7d2b8607c9e
-  data.tar.gz: 60432f02988c999c97d33c73418523997f58f2a0133707435a8fe61d749f5683a76d0c5dfb986fc446b0a76b39e2a130dc8536b10258272607edb567cba58580
+  metadata.gz: db9ac7654c8a926801bde9f537b36990b7d9560477eae190116b1ab24fe6915b0d44e0abe48357a25497bba512c3d555e6c4c6d3c4a5a260d750963183350843
+  data.tar.gz: 44e690c9a0913d61bc7a77df2ec460e4ee38479425c166fd6c24ab657699650ee511927829638bd723309487551e03932b1b21b52fc35052619803f5bf66f5d9

data/README.md CHANGED

@@ -1,8 +1,24 @@
 # Pikelet
-A pikelet is a type of small pancake popular in Australia and New Zealand.
-Also, a simple flat-file database parser capable of dealing with
-files containing heterogeneous records.
+[![Gem Version][gem-badge]][gem]
+[![Build status][build-badge]][build]
+[![Coverage Status][coverage-badge]][coverage]
+A [pikelet][pikelet-recipe] is a small, delicious pancake popular in Australia
+and New Zealand. Also, the stage name of Australian musician
+[Evelyn Morris][pikelet-musician]. Also, a simple flat-file database parser
+capable of dealing with files containing heterogeneous records. Somehow you've
+wound up at the GitHub page for the last one.
+The reason I built Pikelet was to handle "HOT" files as described in the
+[IATA BSP Data Interchange Specifications handbook][dish]. These are
+essentially flat-file databases comprised of a number of different fixed-width
+record types. Each record type has a different structure, though some types
+share common fields, and all types have a type signature.
+However, Pikelet will also handle more typical flat-file databases comprised
+of homogeneous records. Additionally, it will work equally well with CSV
+files as it will with fixed-width records.
 ## Installation
@@ -20,35 +36,60 @@ Or install it yourself as:
 ## Usage
-### Homogeneous records, fixed-width fields
+### The simple case: homogeneous records
-    require "pikelet"
+Let's say our file is a simple list of first and last names with each field
+being 10 characters in width, padded with spaces (vertical pipes used to
+indicate field boundaries).
-    data = <<-FLATFILE.gsub(/^\s*/, "")
-      Nicolaus  Copernicus
-      Tycho     Brahe
-    FLATFILE
+    |Nicolaus  |Copernicus|
+    |Tycho     |Brahe     |
+We can describe this format using Pikelet as follows:
     definition = Pikelet.define do
-      first_name  0...10
-      last_name  10...20
+      first_name   0...10
+      last_name   10...20
     end
-    definition.parse(data.split(/[\r\n]+/)).to_a
+Each field is described with a field name and a range describing the field
+boundaries. You can use either the end-inclusive (`..`) or end-exclusive
+(`...`) form of range literals. I prefer the exclusive form for this.
-    # => [#<struct first_name="Nicolaus", last_name="Copernicus">,
-    #  #<struct first_name="Tycho", last_name="Brahe">]
+Parsing the data is as simple as this:
-### Heterogeneous records, fixed-width fields
+    definition.parse(data)
-    require "pikelet"
+`data` is assumed to be an enumerable object yielding successive lines from
+your file. For instance, you could do something like this:
-    data = <<-FLATFILE.gsub(/^\s*/, "")
-      NAMENicolaus  Copernicus
-      ADDR123 South Street    Nowhereville        45678Y    Someplace           Someland
-    FLATFILE
+    records = definition.parse(IO.readlines(filepath))
-    definition = Pikelet.define do
+or this:
+    records = File.open(filepath, 'r') do |f|
+      definition.parse(f)
+    end
+`parse` returns an enumerator, which you can either iterate over, or convert
+to an array, or whatever else you people do with enumerators. In any case,
+what you'll end up with is a series of `Structs` like this:
+    #<struct first_name="Nicolaus", last_name="Copernicus">,
+    #<struct first_name="Tycho", last_name="Brahe">
+### A more complex case: heterogeneous records
+Now let's say we're given a file consisting of names and addresses. Each
+record contains a 4-character type signature - 'NAME' for names, 'ADDR' for
+addresses:
+    |NAME|Nicolaus  |Copernicus|
+    |ADDR|123 South Street    |Nowhereville        |45678Y    |Someplace           |
+We can describe it as follows:
+    Pikelet.define do
       type_signature 0...4
       record "NAME" do
@@ -61,35 +102,39 @@ Or install it yourself as:
         city           24...44
         postal_code    44...54
         state          54...74
-        country        74...94
       end
     end
-    definition.parse(data.split(/[\r\n]+/)).to_a
+Note that the type signature is described as a field like any other, but it
+must have the name `type_signature`.
-    # => [#<struct
-    #   type_signature="NAME",
-    #   first_name="Nicolaus",
-    #   last_name="Copernicus">,
-    #  #<struct
-    #   type_signature="ADDR",
-    #   street_address="123 South Street",
-    #   city="Nowhereville",
-    #   postal_code="45678Y",
-    #   state="Someplace",
-    #   country="Someland">]
+Each record type is described using `record` statements, which take the
+record's type signature as a parameter and a block describing its fields.
-### CSV files
+When we parse the data, we end up with this:
-    require "pikelet"
-    require "csv"
+    #<struct
+      type_signature="NAME",
+      first_name="Nicolaus",
+      last_name="Copernicus">,
+    #<struct
+      type_signature="ADDR",
+      street_address="123 South Street",
+      city="Nowhereville",
+      postal_code="45678Y",
+      state="Someplace">
-    data = <<-CSV.gsub(/^\s*/, "")
-      NAME,Nicolaus,Copernicus
-      ADDR,123 South Street,Nowhereville,45678Y,Someplace,Someland
-    CSV
+### Handling CSV files
-    definition = Pikelet.define do
+What happens if we were given the data in the previous example in CSV form?
+    NAME,Nicolaus,Copernicus
+    ADDR,123 South Street,Nowhereville,45678Y,Someplace
+In this case instead of describing fields with a boundary range, we just
+give it a simple (zero-based) index, like so:
+    Pikelet.define do
       type_signature 0
       record "NAME" do
@@ -102,63 +147,139 @@ Or install it yourself as:
         city           2
         postal_code    3
         state          4
-        country        5
       end
     end
-    definition.parse(CSV.parse(data)).to_a
+This yields the same results as above.
-    # => [#<struct
-    #   type_signature="NAME",
-    #   first_name="Nicolaus",
-    #   last_name="Copernicus">,
-    #  #<struct
-    #   type_signature="ADDR",
-    #   street_address="123 South Street",
-    #   city="Nowhereville",
-    #   postal_code="45678Y",
-    #   state="Someplace",
-    #   country="Someland">]
+Note that this ability to handle CSV was not planned - it just sprang
+fully-formed from the implementation. One of those pleasant little surprises
+that happens sometimes. If only I had a use for it.
 ### Inheritance
-    require "pikelet"
+Now we go back to our original example, starting with a simple list of names,
+but this time some of the records include a nickname:
-    data = <<-FLATFILE.gsub(/^\s*/, "")
-      SIMPLENicolaus  Copernicus
-      FANCY Tycho     Brahe     Tykester
-    FLATFILE
+    |PLAIN|Nicolaus  |Copernicus|
+    |FANCY|Tycho     |Brahe     |Tykester  |
-    definition = Pikelet.define do
-      type_signature 0...6
+The first and last name fields have the same boundaries in each case, but the
+"FANCY" records have an additional field. We can describe this by nesting the
+definition for FANCY records inside the definition for the PLAIN records:
+    Pikelet.define do
+      type_signature 0...5
-      record "SIMPLE" do
-        first_name  6...16
-        last_name  16...26
+      record "PLAIN" do
+        first_name  5...15
+        last_name  15...25
         record "FANCY" do
-          nickname 26...36
+          nickname 25...35
         end
       end
     end
-    definition.parse(data.split(/[\r\n]+/)).to_a
+Note that the outer definition is really just a record definition in disguise.
+You might have already figured this out if you were paying attention.
-    # => [#<struct
-    #   type_signature="SIMPLE",
-    #   first_name="Nicolaus",
-    #   last_name="Copernicus">,
-    #  #<struct
-    #   type_signature="FANCY",
-    #   first_name="Tycho",
-    #   last_name="Brahe",
-    #   nickname="Tykester">]
+Anyway, this is what we get when we parse it.
+    #<struct
+      type_signature="PLAIN",
+      first_name="Nicolaus",
+      last_name="Copernicus">,
+    #<struct
+      type_signature="FANCY",
+      first_name="Tycho",
+      last_name="Brahe",
+      nickname="Tykester">
+### Custom field parsing
+Field definitions can accept a block. If provided, the field value is yielded
+to the block. This is useful for parsing numeric fields (say).
+    Pikelet.define do
+      a_number(0...4) { |value| value.to_i }
+    end
+You can also use shorthand syntax:
+    Pikelet.define do
+      a_number 0...4, &:to_i
+    end
+### A stupid trick
+The `field` statement actually accepts multiple ranges/indices and will
+simply glue the sections described together. Consider the following data:
+    |BFH|00000001|01|LONZZZ  203TEST1101022359GB000001        |
+    |BCH|00000002|02|0111101007F110107                        |
+    |BOH|00000003|03|91200001101031                       GBP2|
+    |BKT|00000004|06|      000001                    011X ZZZ |
+In this format the first three characters are a 'message identifier', the next
+8 characters are a sequence number and the next 2 are a 'numeric qualifier'.
+The message identifier and numeric qualifier together form the type signature.
+We can describe this as follows (let's not bother describing all the
+different record types):
+    Pikelet.define do
+      type_signature  0... 3, 11...13
+      sequence        3...11, &:to_i
+      payload        13.. -1
+    end
+Which will yield:
+    #<struct
+      type_signature="BFH01",
+      sequence=1,
+      payload="LONZZZ  203TEST1101022359GB000001">,
+    #<struct
+      type_signature="BCH02",
+      sequence=2,
+      payload="0111101007F110107">,
+    #<struct
+      type_signature="BOH03",
+      sequence=3,
+      payload="91200001101031                       GBP2">,
+    #<struct
+      type_signature="BKT06",
+      sequence=4,
+      payload="000001                    011X ZZZ">
+In case you were wondering, no I didn't make that format up. That is what a
+[BSP HOT file][dish] actually looks like, except there's a hell of a lot more
+of it and many, many more record types.
+## Thoughts/plans
+* With some work, Pikelet could produce flat file records as easily as it
+  consumes them.
+* I had a crack at supporting lazy enumeration, and it kinda works. Sometimes.
+  If the moon is in the right quarter. I'd like to get it working properly.
 ## Contributing
-1. Fork it ( http://github.com/johncarney/pikelet/fork )
+1. Fork it ([http://github.com/johncarney/pikelet/fork][fork])
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+[pikelet-recipe]:   http://www.taste.com.au/recipes/5757/pikelets
+[pikelet-musician]: http://en.wikipedia.org/wiki/Evelyn_Morris
+[dish]:             http://www.iata.org/publications/Pages/bspdish.aspx
+[overpunch]:        https://github.com/johncarney/overpunch
+[gem-badge]:        https://badge.fury.io/rb/pikelet.svg
+[gem]:              http://badge.fury.io/rb/pikelet
+[build-badge]:      https://travis-ci.org/johncarney/pikelet.svg?branch=master
+[build]:            https://travis-ci.org/johncarney/pikelet
+[coverage-badge]:   https://img.shields.io/coveralls/johncarney/pikelet.svg
+[coverage]:         https://coveralls.io/r/johncarney/pikelet?branch=master
+[fork]:             http://github.com/johncarney/pikelet/fork
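
Note on the README above: the range, index, and multi-range forms it describes all come down to the same slicing step that appears in `FieldDefinition#parse` in the next file. The sketch below reproduces only that step in plain Ruby, so it runs without the gem. `extract` is a hypothetical helper name, not part of Pikelet, and the sample line is the first HOT record from the README with the pipes removed.

```ruby
# Sketch only: mimics the slicing expression from FieldDefinition#parse.
# `extract` is a hypothetical helper, not a Pikelet method.
def extract(source, *indices)
  # String#[] accepts a Range and Array#[] accepts an Integer, so the same
  # expression serves fixed-width lines and CSV rows alike.
  indices.map { |index| source[index] }.join
end

fixed_width = "BFH0000000101LONZZZ  203TEST1101022359GB000001        "
csv_row     = ["NAME", "Nicolaus", "Copernicus"]

type_signature = extract(fixed_width, 0...3, 11...13) # two ranges glued together
sequence       = extract(fixed_width, 3...11).to_i    # numeric conversion
payload        = extract(fixed_width, 13..-1).strip   # rest of the line, trimmed
first_name     = extract(csv_row, 1)                  # zero-based CSV index

puts type_signature
puts sequence
puts payload
puts first_name
```

This is likely why CSV support needed no extra code: the slicing never checks whether it was handed a String or an Array.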

data/lib/pikelet/field_definition.rb CHANGED

@@ -2,22 +2,31 @@ require "overpunch"
 module Pikelet
   class FieldDefinition
-    attr_reader :indices, :type
+    attr_reader :indices, :parser
-    def initialize(indices, type: nil)
+    def initialize(indices, type: nil, &parser)
       @indices = indices
-      @type = type
+      if block_given?
+        @parser = parser
+      else
+        @parser = parser_from_type(type)
+      end
     end
     def parse(text)
-      value = indices.map { |index| text[index] }.join
+      @parser.call(indices.map { |index| text[index] }.join)
+    end
+    private
+    def parser_from_type(type)
       case type
       when :integer
-        value.to_i
+        :to_i.to_proc
       when :overpunch
-        Overpunch.parse(value)
+        Proc.new { |value| Overpunch.parse(value) }
       else
-        value.strip
+        :strip.to_proc
       end
     end
   end
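
Note on the change above: a block passed to the constructor now replaces the type-based parser entirely, and the `type:` option only selects a default when no block is given. The stand-in class below is a minimal sketch of that selection logic. `SketchFieldDefinition` is a hypothetical name, and the `:overpunch` branch is left out so the sketch runs without the overpunch gem.

```ruby
# Stand-in for the FieldDefinition shown in this diff (hypothetical name),
# minus the :overpunch branch so no external gem is required.
class SketchFieldDefinition
  attr_reader :indices, :parser

  def initialize(indices, type: nil, &parser)
    @indices = indices
    # An explicit block wins; otherwise fall back to the type-based default.
    @parser = parser || parser_from_type(type)
  end

  def parse(text)
    parser.call(indices.map { |index| text[index] }.join)
  end

  private

  def parser_from_type(type)
    case type
    when :integer then :to_i.to_proc
    else :strip.to_proc
    end
  end
end

text = "The quick brown fox"

plain    = SketchFieldDefinition.new([3...16]).parse(text)
reversed = SketchFieldDefinition.new([4...9]) { |v| v.reverse }.parse(text)
upcased  = SketchFieldDefinition.new([4...9], &:upcase).parse(text)
number   = SketchFieldDefinition.new([2...5], type: :integer).parse("xx326xx")
```

One consequence worth keeping in mind: a custom block receives the raw joined text, so the default `strip` is not applied unless the block does it itself.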

data/lib/pikelet/record_definition.rb CHANGED

@@ -10,9 +10,9 @@ module Pikelet
       end
     end
-    def field(name, *indices, type: nil)
+    def field(name, *indices, type: nil, &block)
       @record_class = nil
-      field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type)
+      field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type, &block)
     end
     def record(type_signature, &block)
@@ -23,8 +23,8 @@ module Pikelet
       record_class.new(*field_definitions.values.map { |field| field.parse(data) })
     end
-    def method_missing(method, *args, **options)
-      field(method, *args, **options)
+    def method_missing(method, *args, **options, &block)
+      field(method, *args, **options, &block)
     end
     def record_class
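
Note on the change above: forwarding `&block` through `method_missing` is what lets the README write `a_number(0...4) { |value| value.to_i }` or `sequence 3...11, &:to_i`. The sketch below shows the forwarding in isolation. `SketchRecordDefinition` is a hypothetical class, and evaluating the definition block with `instance_eval` is an assumption, since that part of Pikelet is not shown in this diff.

```ruby
# Hypothetical miniature of the DSL; it only records what each field was given.
class SketchRecordDefinition
  attr_reader :fields

  def initialize(&block)
    @fields = {}
    # Assumption: the definition block runs in the context of the definition.
    instance_eval(&block) if block
  end

  def field(name, *indices, &parser)
    @fields[name] = { indices: indices, parser: parser }
  end

  # Any unknown method name is treated as a field name. Without `&block`
  # here, a parser block attached to the call would be silently dropped.
  def method_missing(method, *args, &block)
    field(method, *args, &block)
  end
end

definition = SketchRecordDefinition.new do
  sequence 3...11, &:to_i
  a_number(0...4) { |value| value.to_i }
end

sequence_parser = definition.fields[:sequence][:parser]
number_parser   = definition.fields[:a_number][:parser]
```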

data/lib/pikelet/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Pikelet
-  VERSION = "0.0.2"
+  VERSION = "0.1.0"
 end

data/spec/pikelet/field_definition_spec.rb ADDED

@@ -0,0 +1,86 @@
+require "spec_helper"
+require "pikelet"
+require "csv"
+describe Pikelet::FieldDefinition do
+  let(:data)        { "The quick brown fox" }
+  let(:type)        { nil }
+  let(:definition)  { Pikelet::FieldDefinition.new(indices, type: type) }
+  subject(:value) { definition.parse(data) }
+  describe "for a fixed-width field" do
+    let(:indices) { [ 4...9 ] }
+    it "extracts the field content from the data" do
+      expect(value).to eq "quick"
+    end
+  end
+  describe "given whitespace" do
+    let(:indices) { [ 3...16 ] }
+    it "strips leading and trailing whitespace" do
+      expect(value).to eq "quick brown"
+    end
+  end
+  describe "with multiple indices" do
+    let(:indices) { [ 0...4, 16...19 ] }
+    it "joins the sections together" do
+      expect(value).to eq "The fox"
+    end
+  end
+  describe "given a CSV row" do
+    let(:data)    { CSV.parse("The,quick,brown,fox").first }
+    let(:indices) { [ 2 ] }
+    it "extracts the field" do
+      expect(value).to eq "brown"
+    end
+  end
+  describe "for integer fields" do
+    let(:data)    { "xx326xx" }
+    let(:indices) { [ 2...5] }
+    let(:type)    { :integer }
+    it "converts the value to an integer" do
+      expect(value).to eq 326
+    end
+  end
+  describe "for overpunch fields" do
+    let(:data)    { "xx67Kxx" }
+    let(:indices) { [ 2...5] }
+    let(:type)    { :overpunch }
+    it "converts the value to an integer" do
+      expect(value).to eq -672
+    end
+  end
+  describe "given a parser block" do
+    let(:indices) { [ 4...9] }
+    let(:definition) do
+      Pikelet::FieldDefinition.new(indices) { |value| value.reverse }
+    end
+    it "yields the value to the parser" do
+      expect(value).to eq "kciuq"
+    end
+  end
+  describe "given a symbol for the parser block" do
+    let(:indices) { [ 4...9] }
+    let(:definition) do
+      Pikelet::FieldDefinition.new(indices, &:upcase)
+    end
+    it "invokes the named method on the value" do
+      expect(value).to eq "QUICK"
+    end
+  end
+end

data/spec/pikelet_spec.rb CHANGED

@@ -13,7 +13,7 @@ describe Pikelet do
   subject { records }
-  describe "a simple flat file" do
+  describe "for a simple flat file" do
     let(:definition) do
       Pikelet.define do
         name   0... 4
@@ -34,7 +34,7 @@ describe Pikelet do
     its(:last)  { is_expected.to match_hash(name: "Sue",  number: "087654321") }
   end
-  describe "a file with heterogeneous records" do
+  describe "for a file with heterogeneous records" do
     let(:definition) do
       Pikelet.define do
         type_signature 0...1
@@ -65,7 +65,7 @@ describe Pikelet do
     its(:last)  { is_expected.to match_hash(name: "Sue",  number: "087654321", type_signature: "B") }
   end
-  describe "a CSV file" do
+  describe "for a CSV file" do
     let(:definition) do
       Pikelet.define do
         name   0
@@ -114,10 +114,10 @@ describe Pikelet do
     its(:last)  { is_expected.to match_hash(name: "Sue",  number: "087654321", type_signature: "FANCY") }
   end
-  describe "integer fields" do
+  describe "given integer fields" do
     let(:definition) do
       Pikelet.define do
-        value 0...4, type: :integer
+        value 0...4, &:to_i
       end
     end
@@ -132,10 +132,10 @@ describe Pikelet do
     its(:value) { is_expected.to eq 5637 }
   end
-  describe "overpunch fields" do
+  describe "given overpunch fields" do
     let(:definition) do
       Pikelet.define do
-        value 0...4, type: :overpunch
+        value(0...4) { |value| Overpunch.parse(value) }
       end
     end
@@ -149,4 +149,32 @@ describe Pikelet do
     its(:value) { is_expected.to eq -5631 }
   end
+  describe "given a block when parsing" do
+    let(:collected_records) { [] }
+    let(:definition) do
+      Pikelet.define do
+        name   0... 4
+        number 4...13
+      end
+    end
+    let(:data) do
+      <<-FILE.gsub(/^\s*/, "").split(/[\r\n]+/)
+        John012345678
+        Sue 087654321
+      FILE
+    end
+    before do
+      definition.parse(data) do |record|
+        collected_records << record.to_h
+      end
+    end
+    it 'yields each record to the block' do
+      expect(collected_records).to contain_exactly(*records.map(&:to_h))
+    end
+  end
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pikelet
 version: !ruby/object:Gem::Version
-  version: 0.0.2
+  version: 0.1.0
 platform: ruby
 authors:
 - John Carney
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-07-24 00:00:00.000000000 Z
+date: 2014-07-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: overpunch
@@ -131,6 +131,7 @@ files:
 - lib/pikelet/record_definition.rb
 - lib/pikelet/version.rb
 - pikelet.gemspec
+- spec/pikelet/field_definition_spec.rb
 - spec/pikelet_spec.rb
 - spec/spec_helper.rb
 homepage: ''
@@ -158,5 +159,6 @@ signing_key:
 specification_version: 4
 summary: A simple flat-file database parser.
 test_files:
+- spec/pikelet/field_definition_spec.rb
 - spec/pikelet_spec.rb
 - spec/spec_helper.rb