RubyGems - pikelet - Versions diffs - 0.0.2 → 0.1.0 - Mend

pikelet 0.0.2 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

checksums.yaml +4 -4
data/README.md +198 -77
data/lib/pikelet/field_definition.rb +16 -7
data/lib/pikelet/record_definition.rb +4 -4
data/lib/pikelet/version.rb +1 -1
data/spec/pikelet/field_definition_spec.rb +86 -0
data/spec/pikelet_spec.rb +35 -7
metadata +4 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: ba77b8ff8d09025491cbe13b0cd6f18f735a3b84
-  data.tar.gz: 4949556c8adee37a5d0370d02a2bd1c1a3a1f544
+  metadata.gz: b08f706b9dfcb52105cbcbafdb725cca95db84fc
+  data.tar.gz: 0a4362f589115d5b7fc082322b9d419f68e62c0a
 SHA512:
-  metadata.gz: 4c0afce709436d65d47d6234abbf8cc94dfbe61870a5c171bfa8b2314488cdfb3ce907a901767a3f3f95ad74cc8f251223c554eaa0a081aba663c7d2b8607c9e
-  data.tar.gz: 60432f02988c999c97d33c73418523997f58f2a0133707435a8fe61d749f5683a76d0c5dfb986fc446b0a76b39e2a130dc8536b10258272607edb567cba58580
+  metadata.gz: db9ac7654c8a926801bde9f537b36990b7d9560477eae190116b1ab24fe6915b0d44e0abe48357a25497bba512c3d555e6c4c6d3c4a5a260d750963183350843
+  data.tar.gz: 44e690c9a0913d61bc7a77df2ec460e4ee38479425c166fd6c24ab657699650ee511927829638bd723309487551e03932b1b21b52fc35052619803f5bf66f5d9

data/README.md CHANGED

@@ -1,8 +1,24 @@
 # Pikelet
-A pikelet is a type of small pancake popular in Australia and New Zealand.
-Also, a simple flat-file database parser capable of dealing with
-files containing heterogeneous records.
+[![Gem Version][gem-badge]][gem]
+[![Build status][build-badge]][build]
+[![Coverage Status][coverage-badge]][coverage]
+A [pikelet][pikelet-recipe] is a small, delicious pancake popular in Australia
+and New Zealand. Also, the stage name of Australian musician
+[Evelyn Morris][pikelet-musician]. Also, a simple flat-file database parser
+capable of dealing with files containing heterogeneous records. Somehow you've
+wound up at the github page for the last one.
+The reason I built Pikelet was to handle "HOT" files as described in the
+[IATA BSP Data Interchange Specifications handbook][dish]. These are
+essentially flat-file databases comprised of a number of different fixed-width
+record types. Each record type has a different structure, though some types
+share common fields, and all types have a type signature.
+However, Pikelet will also handle more typical flat-file databases comprised
+of homogeneous records. Additionally, it works as well with CSV files as it
+does with fixed-width records.
 ## Installation
@@ -20,35 +36,60 @@ Or install it yourself as:
 ## Usage
-### Homogeneous records, fixed-width fields
+### The simple case: homogeneous records
-    require "pikelet"
+Let's say our file is a simple list of first and last names with each field
+being 10 characters in width, padded with spaces (vertical pipes used to
+indicate field boundaries).
-    data = <<-FLATFILE.gsub(/^\s*/, "")
-      Nicolaus  Copernicus
-      Tycho     Brahe
-    FLATFILE
+    |Nicolaus  |Copernicus|
+    |Tycho     |Brahe     |
+We can describe this format using Pikelet as follows:
     definition = Pikelet.define do
-      first_name  0...10
-      last_name  10...20
+      first_name   0...10
+      last_name   10...20
     end
-    definition.parse(data.split(/[\r\n]+/)).to_a
+Each field is described with a field name and a range describing the field
+boundaries. You can use either the end-inclusive (`..`) or end-exclusive
+(`...`) form of range literals. I prefer the exclusive form for this.
-    # => [#<struct first_name="Nicolaus", last_name="Copernicus">,
-    #  #<struct first_name="Tycho", last_name="Brahe">]
+Parsing the data is as simple as this:
-### Heterogeneous records, fixed-width fields
+    definition.parse(data)
-    require "pikelet"
+`data` is assumed to be an enumerable object yielding successive lines from
+your file. For instance, you could do something like this:
-    data = <<-FLATFILE.gsub(/^\s*/, "")
-      NAMENicolaus  Copernicus
-      ADDR123 South Street    Nowhereville        45678Y    Someplace           Someland
-    FLATFILE
+    records = definition.parse(IO.readlines(filepath))
-    definition = Pikelet.define do
+or this:
+    records = File.open(filepath, 'r') do |f|
+      definition.parse(f)
+    end
+`parse` returns an enumerator, which you can either iterate over, or convert
+to an array, or whatever else you people do with enumerators. In any case,
+what you'll end up with is a series of `Structs` like this:
+    #<struct first_name="Nicolaus", last_name="Copernicus">,
+    #<struct first_name="Tycho", last_name="Brahe">
+### A more complex case: heterogeneous records
+Now let's say we're given a file consisting of names and addresses, where each
+record begins with a 4-character type signature - 'NAME' for names, 'ADDR' for
+addresses:
+    |NAME|Nicolaus  |Copernicus|
+    |ADDR|123 South Street    |Nowhereville        |45678Y    |Someplace           |
+We can describe it as follows:
+    Pikelet.define do
       type_signature 0...4
       record "NAME" do
@@ -61,35 +102,39 @@ Or install it yourself as:
         city           24...44
         postal_code    44...54
         state          54...74
-        country        74...94
       end
     end
-    definition.parse(data.split(/[\r\n]+/)).to_a
+Note that the type signature is described as a field like any other, but it
+must have the name `type_signature`.
-    # => [#<struct
-    #   type_signature="NAME",
-    #   first_name="Nicolaus",
-    #   last_name="Copernicus">,
-    #  #<struct
-    #   type_signature="ADDR",
-    #   street_address="123 South Street",
-    #   city="Nowhereville",
-    #   postal_code="45678Y",
-    #   state="Someplace",
-    #   country="Someland">]
+Each record type is described using `record` statements, which take the
+record's type signature as a parameter and a block describing its fields.
-### CSV files
+When we parse the data, we end up with this:
-    require "pikelet"
-    require "csv"
+    #<struct
+      type_signature="NAME",
+      first_name="Nicolaus",
+      last_name="Copernicus">,
+    #<struct
+      type_signature="ADDR",
+      street_address="123 South Street",
+      city="Nowhereville",
+      postal_code="45678Y",
+      state="Someplace">
-    data = <<-CSV.gsub(/^\s*/, "")
-      NAME,Nicolaus,Copernicus
-      ADDR,123 South Street,Nowhereville,45678Y,Someplace,Someland
-    CSV
+### Handling CSV files
-    definition = Pikelet.define do
+What if we were given the data in the previous example in CSV form?
+    NAME,Nicolaus,Copernicus
+    ADDR,123 South Street,Nowhereville,45678Y,Someplace
+In this case instead of describing fields with a boundary range, we just
+give it a simple (zero-based) index, like so:
+    Pikelet.define do
       type_signature 0
       record "NAME" do
@@ -102,63 +147,139 @@ Or install it yourself as:
         city           2
         postal_code    3
         state          4
-        country        5
       end
     end
-    definition.parse(CSV.parse(data)).to_a
+This yields the same results as above.
-    # => [#<struct
-    #   type_signature="NAME",
-    #   first_name="Nicolaus",
-    #   last_name="Copernicus">,
-    #  #<struct
-    #   type_signature="ADDR",
-    #   street_address="123 South Street",
-    #   city="Nowhereville",
-    #   postal_code="45678Y",
-    #   state="Someplace",
-    #   country="Someland">]
+Note that this ability to handle CSV was not planned - it just sprang
+fully-formed from the implementation. One of those pleasant little surprises
+that happens sometimes. If only I had a use for it.
 ### Inheritance
-    require "pikelet"
+Now we go back to our original example, starting with a simple list of names,
+but this time some of the records include a nickname:
-    data = <<-FLATFILE.gsub(/^\s*/, "")
-      SIMPLENicolaus  Copernicus
-      FANCY Tycho     Brahe     Tykester
-    FLATFILE
+    |PLAIN|Nicolaus  |Copernicus|
+    |FANCY|Tycho     |Brahe     |Tykester  |
-    definition = Pikelet.define do
-      type_signature 0...6
+The first and last name fields have the same boundaries in each case, but the
+"FANCY" records have an additional field. We can describe this by nesting the
+definition for FANCY records inside the definition for the PLAIN records:
+    Pikelet.define do
+      type_signature 0...5
-      record "SIMPLE" do
-        first_name  6...16
-        last_name  16...26
+      record "PLAIN" do
+        first_name  5...15
+        last_name  15...25
         record "FANCY" do
-          nickname 26...36
+          nickname 25...35
         end
       end
     end
-    definition.parse(data.split(/[\r\n]+/)).to_a
+Note that the outer definition is really just a record definition in disguise.
+You might have already figured this out if you were paying attention.
-    # => [#<struct
-    #   type_signature="SIMPLE",
-    #   first_name="Nicolaus",
-    #   last_name="Copernicus">,
-    #  #<struct
-    #   type_signature="FANCY",
-    #   first_name="Tycho",
-    #   last_name="Brahe",
-    #   nickname="Tykester">]
+Anyway, this is what we get when we parse it.
+    #<struct
+      type_signature="PLAIN",
+      first_name="Nicolaus",
+      last_name="Copernicus">,
+    #<struct
+      type_signature="FANCY",
+      first_name="Tycho",
+      last_name="Brahe",
+      nickname="Tykester">
+### Custom field parsing
+Field definitions can accept a block. If provided, the field value is yielded
+to the block. This is useful for parsing numeric fields (say).
+    Pikelet.define do
+      a_number(0...4) { |value| value.to_i }
+    end
+You can also use shorthand syntax:
+    Pikelet.define do
+      a_number 0...4, &:to_i
+    end
+### A stupid trick
+The `field` statement will actually accept multiple ranges/indices and will
+simply glue the sections described together. Consider the following data:
+    |BFH|00000001|01|LONZZZ  203TEST1101022359GB000001        |
+    |BCH|00000002|02|0111101007F110107                        |
+    |BOH|00000003|03|91200001101031                       GBP2|
+    |BKT|00000004|06|      000001                    011X ZZZ |
+In this format the first three characters are a 'message identifier', the next
+8 characters are a sequence number and the next 2 are a 'numeric qualifier'.
+The message identifier and numeric qualifier together form the type signature.
+We can describe this as follows (let's not bother describing all the
+different record types):
+    Pikelet.define do
+      type_signature  0... 3, 11...13
+      sequence        3...11, &:to_i
+      payload        13.. -1
+    end
+Which will yield:
+    #<struct
+      type_signature="BFH01",
+      sequence=1,
+      payload="LONZZZ  203TEST1101022359GB000001">,
+    #<struct
+      type_signature="BCH02",
+      sequence=2,
+      payload="0111101007F110107">,
+    #<struct
+      type_signature="BOH03",
+      sequence=3,
+      payload="91200001101031                       GBP2">,
+    #<struct
+      type_signature="BKT06",
+      sequence=4,
+      payload="000001                    011X ZZZ">
+In case you were wondering, no, I didn't make that format up. That is what a
+[BSP HOT file][dish] actually looks like, except there's a hell of a lot more
+of it and many, many more record types.
+## Thoughts/plans
+* With some work, Pikelet could produce flat file records as easily as it
+  consumes them.
+* I had a crack at supporting lazy enumeration, and it kinda works. Sometimes.
+  If the moon is in the right quarter. I'd like to get it working properly.
 ## Contributing
-1. Fork it ( http://github.com/johncarney/pikelet/fork )
+1. Fork it ([http://github.com/johncarney/pikelet/fork][fork])
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+[pikelet-recipe]:   http://www.taste.com.au/recipes/5757/pikelets
+[pikelet-musician]: http://en.wikipedia.org/wiki/Evelyn_Morris
+[dish]:             http://www.iata.org/publications/Pages/bspdish.aspx
+[overpunch]:        https://github.com/johncarney/overpunch
+[gem-badge]:        https://badge.fury.io/rb/pikelet.svg
+[gem]:              http://badge.fury.io/rb/pikelet
+[build-badge]:      https://travis-ci.org/johncarney/pikelet.svg?branch=master
+[build]:            https://travis-ci.org/johncarney/pikelet
+[coverage-badge]:   https://img.shields.io/coveralls/johncarney/pikelet.svg
+[coverage]:         https://coveralls.io/r/johncarney/pikelet?branch=master
+[fork]:             http://github.com/johncarney/pikelet/fork
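
Reviewer note on the README's "stupid trick": the multi-range field can be illustrated without the gem. The following is a minimal sketch, assuming the behaviour the README describes (slice each range, join the pieces, strip whitespace). FIELDS, Record and parse_line are hypothetical names for this illustration only and are not part of pikelet.

```ruby
# Standalone sketch: pikelet is not required here.
# FIELDS, Record and parse_line are hypothetical names.
FIELDS = {
  type_signature: [0...3, 11...13], # two slices glued together
  sequence:       [3...11],
  payload:        [13..-1]          # everything to the end of the line
}.freeze

Record = Struct.new(*FIELDS.keys)

def parse_line(line)
  values = FIELDS.values.map do |ranges|
    # Slice each range out of the line, join the pieces, trim padding.
    ranges.map { |range| line[range] }.join.strip
  end
  Record.new(*values)
end

line   = "BFH0000000101LONZZZ  203TEST1101022359GB000001        "
record = parse_line(line)
```

Unlike the README example, this sketch leaves `sequence` as a string; converting it with `to_i` is what the block form of a field definition is for.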

data/lib/pikelet/field_definition.rb CHANGED

@@ -2,22 +2,31 @@ require "overpunch"
 module Pikelet
   class FieldDefinition
-    attr_reader :indices, :type
+    attr_reader :indices, :parser
-    def initialize(indices, type: nil)
+    def initialize(indices, type: nil, &parser)
       @indices = indices
-      @type = type
+      if block_given?
+        @parser = parser
+      else
+        @parser = parser_from_type(type)
+      end
     end
     def parse(text)
-      value = indices.map { |index| text[index] }.join
+      @parser.call(indices.map { |index| text[index] }.join)
+    end
+    private
+    def parser_from_type(type)
       case type
       when :integer
-        value.to_i
+        :to_i.to_proc
       when :overpunch
-        Overpunch.parse(value)
+        Proc.new { |value| Overpunch.parse(value) }
       else
-        value.strip
+        :strip.to_proc
       end
     end
   end
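
Reviewer note: the change above makes every field carry a callable parser, with an explicit block taking precedence over the `type:` option. A minimal standalone sketch of that selection logic follows. FieldSketch is a hypothetical stand-in rather than the gem's class, and the `:overpunch` branch is left out so that no external gem is needed.

```ruby
# Standalone sketch; FieldSketch is hypothetical and omits :overpunch.
class FieldSketch
  def initialize(indices, type: nil, &parser)
    @indices = indices
    # An explicit block wins; otherwise pick a parser from the type.
    @parser = parser || default_parser(type)
  end

  def parse(text)
    # text[index] works for a String with a Range
    # and for an Array (e.g. a CSV row) with an Integer.
    @parser.call(@indices.map { |index| text[index] }.join)
  end

  private

  def default_parser(type)
    type == :integer ? :to_i.to_proc : :strip.to_proc
  end
end

text     = "The quick brown fox"
stripped = FieldSketch.new([3...16]).parse(text)               # default parser
upcased  = FieldSketch.new([4...9], &:upcase).parse(text)      # symbol as block
number   = FieldSketch.new([2...5], type: :integer).parse("xx326xx")
from_row = FieldSketch.new([2]).parse(%w[The quick brown fox]) # array row
```

Because the parser is just an object responding to `call`, a symbol passed with `&` and a literal block are handled identically.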

data/lib/pikelet/record_definition.rb CHANGED

@@ -10,9 +10,9 @@ module Pikelet
       end
     end
-    def field(name, *indices, type: nil)
+    def field(name, *indices, type: nil, &block)
       @record_class = nil
-      field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type)
+      field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type, &block)
     end
     def record(type_signature, &block)
@@ -23,8 +23,8 @@ module Pikelet
       record_class.new(*field_definitions.values.map { |field| field.parse(data) })
     end
-    def method_missing(method, *args, **options)
-      field(method, *args, **options)
+    def method_missing(method, *args, **options, &block)
+      field(method, *args, **options, &block)
     end
     def record_class
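
Reviewer note: the `&block` additions matter because a block is not part of `*args` or `**options`; unless `method_missing` forwards it explicitly, it is silently dropped. A minimal sketch of the forwarding pattern follows. DslSketch is a hypothetical class that only mimics the shape of the DSL, not the gem's implementation.

```ruby
# Standalone sketch; DslSketch is hypothetical.
class DslSketch
  attr_reader :fields

  def initialize(&block)
    @fields = {}
    instance_eval(&block) if block
  end

  def field(name, *indices, **options, &parser)
    @fields[name] = { indices: indices, options: options, parser: parser }
  end

  # Any unknown method name is treated as a field name. The block has to be
  # captured and passed on here, or field never sees it.
  def method_missing(name, *args, **options, &block)
    field(name, *args, **options, &block)
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

dsl = DslSketch.new do
  a_number 0...4, &:to_i
  label(4...8) { |value| value.strip }
end
```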

data/lib/pikelet/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Pikelet
-  VERSION = "0.0.2"
+  VERSION = "0.1.0"
 end

data/spec/pikelet/field_definition_spec.rb ADDED

@@ -0,0 +1,86 @@
+require "spec_helper"
+require "pikelet"
+require "csv"
+describe Pikelet::FieldDefinition do
+  let(:data)        { "The quick brown fox" }
+  let(:type)        { nil }
+  let(:definition)  { Pikelet::FieldDefinition.new(indices, type: type) }
+  subject(:value) { definition.parse(data) }
+  describe "for a fixed-width field" do
+    let(:indices) { [ 4...9 ] }
+    it "extracts the field content from the data" do
+      expect(value).to eq "quick"
+    end
+  end
+  describe "given whitespace" do
+    let(:indices) { [ 3...16 ] }
+    it "strips leading and trailing whitespace" do
+      expect(value).to eq "quick brown"
+    end
+  end
+  describe "with multiple indices" do
+    let(:indices) { [ 0...4, 16...19 ] }
+    it "joins the sections together" do
+      expect(value).to eq "The fox"
+    end
+  end
+  describe "given a CSV row" do
+    let(:data)    { CSV.parse("The,quick,brown,fox").first }
+    let(:indices) { [ 2 ] }
+    it "extracts the field" do
+      expect(value).to eq "brown"
+    end
+  end
+  describe "for integer fields" do
+    let(:data)    { "xx326xx" }
+    let(:indices) { [ 2...5] }
+    let(:type)    { :integer }
+    it "converts the value to an integer" do
+      expect(value).to eq 326
+    end
+  end
+  describe "for overpunch fields" do
+    let(:data)    { "xx67Kxx" }
+    let(:indices) { [ 2...5] }
+    let(:type)    { :overpunch }
+    it "converts the value to an integer" do
+      expect(value).to eq(-672)
+    end
+  end
+  describe "given a parser block" do
+    let(:indices) { [ 4...9] }
+    let(:definition) do
+      Pikelet::FieldDefinition.new(indices) { |value| value.reverse }
+    end
+    it "yields the value to the parser" do
+      expect(value).to eq "kciuq"
+    end
+  end
+  describe "given a symbol for the parser block" do
+    let(:indices) { [ 4...9] }
+    let(:definition) do
+      Pikelet::FieldDefinition.new(indices, &:upcase)
+    end
+    it "invokes the named method on the value" do
+      expect(value).to eq "QUICK"
+    end
+  end
+end
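
Reviewer note on the overpunch spec: the expectation that "67K" parses to -672 follows from the usual signed-overpunch convention, in which the last character encodes both the final digit and the sign. The sketch below assumes the common mapping (`{` and A-I for positive 0-9, `}` and J-R for negative 0-9). decode_overpunch is a hypothetical helper and is not the overpunch gem's API.

```ruby
# Standalone sketch; decode_overpunch is hypothetical.
# Assumes a non-empty string using the common signed-overpunch mapping.
POSITIVE = "{ABCDEFGHI".freeze # position in the string = last digit, sign +
NEGATIVE = "}JKLMNOPQR".freeze # position in the string = last digit, sign -

def decode_overpunch(text)
  head = text[0...-1]
  last = text[-1]
  if (digit = POSITIVE.index(last))
    sign = 1
  elsif (digit = NEGATIVE.index(last))
    sign = -1
  else
    return text.to_i # no overpunch character: plain digits
  end
  sign * "#{head}#{digit}".to_i
end

negative = decode_overpunch("67K")
positive = decode_overpunch("12C")
plain    = decode_overpunch("326")
```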

data/spec/pikelet_spec.rb CHANGED

@@ -13,7 +13,7 @@ describe Pikelet do
   subject { records }
-  describe "a simple flat file" do
+  describe "for a simple flat file" do
     let(:definition) do
       Pikelet.define do
         name   0... 4
@@ -34,7 +34,7 @@ describe Pikelet do
     its(:last)  { is_expected.to match_hash(name: "Sue",  number: "087654321") }
   end
-  describe "a file with heterogeneous records" do
+  describe "for a file with heterogeneous records" do
     let(:definition) do
       Pikelet.define do
         type_signature 0...1
@@ -65,7 +65,7 @@ describe Pikelet do
     its(:last)  { is_expected.to match_hash(name: "Sue",  number: "087654321", type_signature: "B") }
   end
-  describe "a CSV file" do
+  describe "for a CSV file" do
     let(:definition) do
       Pikelet.define do
         name   0
@@ -114,10 +114,10 @@ describe Pikelet do
     its(:last)  { is_expected.to match_hash(name: "Sue",  number: "087654321", type_signature: "FANCY") }
   end
-  describe "integer fields" do
+  describe "given integer fields" do
     let(:definition) do
       Pikelet.define do
-        value 0...4, type: :integer
+        value 0...4, &:to_i
       end
     end
@@ -132,10 +132,10 @@ describe Pikelet do
     its(:value) { is_expected.to eq 5637 }
   end
-  describe "overpunch fields" do
+  describe "given overpunch fields" do
     let(:definition) do
       Pikelet.define do
-        value 0...4, type: :overpunch
+        value(0...4) { |value| Overpunch.parse(value) }
       end
     end
@@ -149,4 +149,32 @@ describe Pikelet do
     its(:value) { is_expected.to eq -5631 }
   end
+  describe "given a block when parsing" do
+    let(:collected_records) { [] }
+    let(:definition) do
+      Pikelet.define do
+        name   0... 4
+        number 4...13
+      end
+    end
+    let(:data) do
+      <<-FILE.gsub(/^\s*/, "").split(/[\r\n]+/)
+        John012345678
+        Sue 087654321
+      FILE
+    end
+    before do
+      definition.parse(data) do |record|
+        collected_records << record.to_h
+      end
+    end
+    it 'yields each record to the block' do
+      expect(collected_records).to contain_exactly(*records.map(&:to_h))
+    end
+  end
 end
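
Reviewer note: the new spec exercises `parse` with a block, while the README describes `parse` without a block as returning an enumerator. How the gem implements this is not visible in this diff. One common Ruby idiom that gives both behaviours is sketched below, with NameParser as a hypothetical class using the same field boundaries as the spec.

```ruby
# Standalone sketch; NameParser is hypothetical.
class NameParser
  Record = Struct.new(:name, :number)

  def parse(lines)
    # Without a block, hand back an enumerator over the same method.
    return enum_for(:parse, lines) unless block_given?

    lines.each do |line|
      yield Record.new(line[0...4].strip, line[4...13].strip)
    end
  end
end

lines     = ["John012345678", "Sue 087654321"]
collected = []
NameParser.new.parse(lines) { |record| collected << record.to_h }
as_array  = NameParser.new.parse(lines).map(&:to_h)
```

With this idiom the block form and the enumerator form walk the input the same way, which is the property the spec checks with `contain_exactly`.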

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pikelet
 version: !ruby/object:Gem::Version
-  version: 0.0.2
+  version: 0.1.0
 platform: ruby
 authors:
 - John Carney
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-07-24 00:00:00.000000000 Z
+date: 2014-07-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: overpunch
@@ -131,6 +131,7 @@ files:
 - lib/pikelet/record_definition.rb
 - lib/pikelet/version.rb
 - pikelet.gemspec
+- spec/pikelet/field_definition_spec.rb
 - spec/pikelet_spec.rb
 - spec/spec_helper.rb
 homepage: ''
@@ -158,5 +159,6 @@ signing_key:
 specification_version: 4
 summary: A simple flat-file database parser.
 test_files:
+- spec/pikelet/field_definition_spec.rb
 - spec/pikelet_spec.rb
 - spec/spec_helper.rb