pikelet 0.0.2 → 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: ba77b8ff8d09025491cbe13b0cd6f18f735a3b84
- data.tar.gz: 4949556c8adee37a5d0370d02a2bd1c1a3a1f544
+ metadata.gz: b08f706b9dfcb52105cbcbafdb725cca95db84fc
+ data.tar.gz: 0a4362f589115d5b7fc082322b9d419f68e62c0a
  SHA512:
- metadata.gz: 4c0afce709436d65d47d6234abbf8cc94dfbe61870a5c171bfa8b2314488cdfb3ce907a901767a3f3f95ad74cc8f251223c554eaa0a081aba663c7d2b8607c9e
- data.tar.gz: 60432f02988c999c97d33c73418523997f58f2a0133707435a8fe61d749f5683a76d0c5dfb986fc446b0a76b39e2a130dc8536b10258272607edb567cba58580
+ metadata.gz: db9ac7654c8a926801bde9f537b36990b7d9560477eae190116b1ab24fe6915b0d44e0abe48357a25497bba512c3d555e6c4c6d3c4a5a260d750963183350843
+ data.tar.gz: 44e690c9a0913d61bc7a77df2ec460e4ee38479425c166fd6c24ab657699650ee511927829638bd723309487551e03932b1b21b52fc35052619803f5bf66f5d9
data/README.md CHANGED
@@ -1,8 +1,24 @@
  # Pikelet

- A pikelet is a type of small pancake popular in Australia and New Zealand.
- Also, a simple flat-file database parser capable of dealing with
- files containing heterogeneous records.
+ [![Gem Version][gem-badge]][gem]
+ [![Build status][build-badge]][build]
+ [![Coverage Status][coverage-badge]][coverage]
+
+ A [pikelet][pikelet-recipe] is a small, delicious pancake popular in Australia
+ and New Zealand. Also, the stage name of Australian musician
+ [Evelyn Morris][pikelet-musician]. Also, a simple flat-file database parser
+ capable of dealing with files containing heterogeneous records. Somehow you've
+ wound up at the github page for the last one.
+
+ The reason I built Pikelet was to handle "HOT" files as described in the
+ [IATA BSP Data Interchange Specifications handbook][dish]. These are
+ essentially flat-file databases comprised of a number of different fixed-width
+ record types. Each record type has a different structure, though some types
+ share common fields, and all types have a type signature.
+
+ However, Pikelet will also handle more typical flat-file databases comprised
+ of homogeneous records. Additionally, it will work equally as well with CSV
+ files as it will with fixed-width records.

  ## Installation

@@ -20,35 +36,60 @@ Or install it yourself as:

  ## Usage

- ### Homogeneous records, fixed-width fields
+ ### The simple case: homogeneous records

- require "pikelet"
+ Let's say our file is a simple list of first and last names with each field
+ being 10 characters in width, padded with spaces (vertical pipes used to
+ indicate field boundaries).

- data = <<-FLATFILE.gsub(/^\s*/, "")
- Nicolaus Copernicus
- Tycho Brahe
- FLATFILE
+ |Nicolaus |Copernicus|
+ |Tycho |Brahe |
+
+ We can describe this format using Pikelet as follows:

  definition = Pikelet.define do
- first_name 0...10
- last_name 10...20
+ first_name 0...10
+ last_name 10...20
  end

- definition.parse(data.split(/[\r\n]+/)).to_a
+ Each field is described with a field name and a range describing the field
+ boundaries. You can use either the end-inclusive (`..`) or end-exclusive
+ (`...`) form of range literals. I prefer the exclusive form for this.

- # => [#<struct first_name="Nicolaus", last_name="Copernicus">,
- # #<struct first_name="Tycho", last_name="Brahe">]
+ Parsing the data is simple as this:

- ### Heterogeneous records, fixed-width fields
+ definition.parse(data)

- require "pikelet"
+ `data` is assumed to be an enumerable object yielding successive lines from
+ your file. For instance, you could do something like this:

- data = <<-FLATFILE.gsub(/^\s*/, "")
- NAMENicolaus Copernicus
- ADDR123 South Street Nowhereville 45678Y Someplace Someland
- FLATFILE
+ records = definition.parse(IO.readlines(filepath))

- definition = Pikelet.define do
+ or this:
+
+ records = File(filepath, 'r').do |f|
+ definition.parse(f)
+ end
+
+ `parse` returns an enumerator, which you can either iterate over, or convert
+ to an array, or whatever else you people do with enumerators. In any case,
+ what you'll end up with is a series of `Structs` like this:
+
+ #<struct first_name="Nicolaus", last_name="Copernicus">,
+ #<struct first_name="Tycho", last_name="Brahe">
+
+ ### A more complex case: heterogeneous records
+
+ Now let's say we're given a file consisting of names and addresses, each
+ record contains a 4-character type signature - 'NAME' for names, 'ADDR' for
+ addresses:
+
+ |NAME|Nicolaus |Copernicus|
+ |ADDR|123 South Street |Nowhereville |45678Y |Someplace |
+
+ We can describe it as follows:
+
+ Pikelet.define do
  type_signature 0...4

  record "NAME" do
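The simple case above slices each line with ranges and strips the space padding. Below is a minimal plain-Ruby sketch of that idea, independent of the Pikelet gem; `NameRecord`, `FIELDS` and `parse_name` are hypothetical names. It mixes an end-exclusive and an end-inclusive range to show that `0...10` and `10..19` each cover ten characters, and it reads lines both with `File.open` in block form and with `File.foreach`. Note that the README's `File(filepath, 'r').do |f|` snippet would not run as written: Ruby has no `File()` conversion method, and the block form is written `File.open(filepath, "r") do |f| ... end`.

```ruby
require "tempfile"

# Hypothetical struct for the two 10-character name fields (not Pikelet's API).
NameRecord = Struct.new(:first_name, :last_name)

# End-exclusive 0...10 and end-inclusive 10..19 each select 10 characters.
FIELDS = { first_name: 0...10, last_name: 10..19 }.freeze

# Slices one fixed-width line and strips the space padding from each field.
def parse_name(line)
  NameRecord.new(*FIELDS.values.map { |range| line[range].to_s.strip })
end

results = Tempfile.create("names") do |file|
  file.write("Nicolaus  Copernicus\n", "Tycho     Brahe     \n")
  file.flush

  # Block form of File.open: the file is closed when the block returns.
  from_open = File.open(file.path, "r") { |f| f.map { |line| parse_name(line) } }

  # File.foreach yields one line at a time instead of reading the whole file.
  from_foreach = File.foreach(file.path).map { |line| parse_name(line) }

  [from_open, from_foreach]
end

p results.first
```

The records are built inside the `File.open` block on purpose: consuming them after the block returns would mean reading from a closed file.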
@@ -61,35 +102,39 @@ Or install it yourself as:
  city 24...44
  postal_code 44...54
  state 54...74
- country 74...94
  end
  end

- definition.parse(data.split(/[\r\n]+/)).to_a
+ Note that the type signature is described as a field like any other, but it
+ must have the name `type_signature`.

- # => [#<struct
- # type_signature="NAME",
- # first_name="Nicolaus",
- # last_name="Copernicus">,
- # #<struct
- # type_signature="ADDR",
- # street_address="123 South Street",
- # city="Nowhereville",
- # postal_code="45678Y",
- # state="Someplace",
- # country="Someland">]
+ Each record type is described using `record` statements, which take the
+ record's type signature as a parameter and a block describing its fields.

- ### CSV files
+ When we parse the data, we end up with this:

- require "pikelet"
- require "csv"
+ #<struct
+ type_signature="NAME",
+ first_name="Nicolaus",
+ last_name="Copernicus">,
+ #<struct
+ type_signature="ADDR",
+ street_address="123 South Street",
+ city="Nowhereville",
+ postal_code="45678Y",
+ state="Someplace">

- data = <<-CSV.gsub(/^\s*/, "")
- NAME,Nicolaus,Copernicus
- ADDR,123 South Street,Nowhereville,45678Y,Someplace,Someland
- CSV
+ ### Handling CSV files

- definition = Pikelet.define do
+ What happens if we were given the data in the previous example in CSV form?
+
+ NAME,Nicolaus,Copernicus
+ ADDR,123 South Street,Nowhereville,45678Y,Someplace
+
+ In this case instead of describing fields with a boundary range, we just
+ give it a simple (zero-based) index, like so:
+
+ Pikelet.define do
  type_signature 0

  record "NAME" do
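The README notes that CSV support "sprang fully-formed from the implementation". A likely reason is visible in the `FieldDefinition#parse` hunk later in this diff (`indices.map { |index| text[index] }`): Ruby's `String#[]` and `Array#[]` both accept an Integer or a Range, so the same lookup works on a fixed-width line and on an already-split CSV row. The sketch below illustrates type-signature dispatch on both shapes in plain Ruby. `Layout`, `FIXED`, `CSV_COLUMNS` and `parse_record` are hypothetical; the NAME offsets and the `street_address` range are assumptions (the README hunks do not show them); and the results are Hashes rather than Structs.

```ruby
# Hypothetical layouts (not Pikelet's API). `signature` locates the type
# signature; `records` maps each signature to its field positions.
Layout = Struct.new(:signature, :records)

# Fixed-width positions; the NAME offsets assume 10-character name fields.
FIXED = Layout.new(0...4, {
  "NAME" => { first_name: 4...14, last_name: 14...24 },
  "ADDR" => { street_address: 4...24, city: 24...44 }
})

# The same record types expressed as zero-based CSV column indices.
CSV_COLUMNS = Layout.new(0, {
  "NAME" => { first_name: 1, last_name: 2 },
  "ADDR" => { street_address: 1, city: 2 }
})

# `row` may be a String (fixed-width line) or an Array (CSV row): both
# String#[] and Array#[] accept an Integer or a Range.
def parse_record(layout, row)
  type = row[layout.signature].to_s.strip
  fields = layout.records.fetch(type) do
    raise ArgumentError, "unknown record type #{type.inspect}"
  end
  { type_signature: type }.merge(fields.transform_values { |pos| row[pos].to_s.strip })
end

line = "NAME" + "Nicolaus".ljust(10) + "Copernicus"
p parse_record(FIXED, line)

# A naive split stands in for a real CSV parser (no quoted commas here).
row = "ADDR,123 South Street,Nowhereville".split(",")
p parse_record(CSV_COLUMNS, row)
```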
@@ -102,63 +147,139 @@ Or install it yourself as:
  city 2
  postal_code 3
  state 4
- country 5
  end
  end

- definition.parse(CSV.parse(data)).to_a
+ This yields the same results as above.

- # => [#<struct
- # type_signature="NAME",
- # first_name="Nicolaus",
- # last_name="Copernicus">,
- # #<struct
- # type_signature="ADDR",
- # street_address="123 South Street",
- # city="Nowhereville",
- # postal_code="45678Y",
- # state="Someplace",
- # country="Someland">]
+ Note that this ability to handle CSV was not planned - it just sprang
+ fully-formed from the implementation. One of those pleasant little surprises
+ that happens sometimes. If only I had a use for it.

  ### Inheritance

- require "pikelet"
+ Now we go back to our original example, starting with a simple list of names,
+ but this time some of the records include a nickname:

- data = <<-FLATFILE.gsub(/^\s*/, "")
- SIMPLENicolaus Copernicus
- FANCY Tycho Brahe Tykester
- FLATFILE
+ |PLAIN|Nicolaus |Copernicus|
+ |FANCY|Tycho |Brahe |Tykester |

- definition = Pikelet.define do
- type_signature 0...6
+ The first and last name fields have the same boundaries in each case, but the
+ "FANCY" records have an additional field. We can describe this by nesting the
+ definition for FANCY records inside the definition for the PLAIN records:
+
+ Pikelet.define do
+ type_signature 0...5

- record "SIMPLE" do
- first_name 6...16
- last_name 16...26
+ record "PLAIN" do
+ first_name 5...15
+ last_name 15...25

  record "FANCY" do
- nickname 26...36
+ nickname 25...35
  end
  end
  end

- definition.parse(data.split(/[\r\n]+/)).to_a
+ Note that the outer definition is really just a record definition in disguise,
+ you might have already figured this out if you were paying attention.

- # => [#<struct
- # type_signature="SIMPLE",
- # first_name="Nicolaus",
- # last_name="Copernicus">,
- # #<struct
- # type_signature="FANCY",
- # first_name="Tycho",
- # last_name="Brahe",
- # nickname="Tykester">]
+ Anyway, this is what we get when we parse it.

+ #<struct
+ type_signature="SIMPLE",
+ first_name="Nicolaus",
+ last_name="Copernicus">,
+ #<struct
+ type_signature="FANCY",
+ first_name="Tycho",
+ last_name="Brahe",
+ nickname="Tykester">
+
+ ### Custom field parsing
+
+ Field definitions can accept a block. If provided, the field value is yielded
+ to the block. This is useful for parsing numeric fields (say).
+
+ Pikelet.define do
+ a_number(0...4) { |value| value.to_i }
+ end
+
+ You can also use shorthand syntax:
+
+ Pikelet.define do
+ a_number 0...4, &:to_i
+ end
+
+ ### A stupid trick
+
+ The `field` statement will actually accepts multiple ranges/indices and will
+ simply glue the sections described together. Consider the following data:
+
+ |BFH|00000001|01|LONZZZ 203TEST1101022359GB000001 |
+ |BCH|00000002|02|0111101007F110107 |
+ |BOH|00000003|03|91200001101031 GBP2|
+ |BKT|00000004|06| 000001 011X ZZZ |
+
+ In this format the first three characters are a 'message identifier', the next
+ 8 characters are a sequence number and the next 2 are a 'numeric qualifier'.
+ The message identifier and numeric qualifier together form the type signature.
+
+ We can describe this as follows (let's not bother describing all the
+ different record types):
+
+ Pikelet.define do
+ type_signature 0... 3, 11...13
+ sequence 3...11, &:to_i
+ payload 13.. -1
+ end
+
+ Which will yield:
+
+ #<struct
+ type_signature="BFH01",
+ sequence=1,
+ payload="LONZZZ 203TEST1101022359GB000001">,
+ #<struct
+ type_signature="BCH02",
+ sequence=2,
+ payload="0111101007F110107">,
+ #<struct
+ type_signature="BOH03",
+ sequence=3,
+ payload="91200001101031 GBP2">,
+ #<struct
+ type_signature="BKT06",
+ sequence=4,
+ payload="000001 011X ZZZ">
+
+ In case you were wondering, no I didn't make that format up. That is what a
+ [BSP HOT file][dish] actually looks like, except there's a hell of a lot more
+ of it and many, many more record types.
+
+ ## Thoughts/plans
+
+ * With some work, Pikelet could produce flat file records as easily as it
+ consumes them.
+ * I had a crack at supporting lazy enumeration, and it kinda works. Sometimes.
+ If the moon is in the right quarter. I'd like to get it working properly.

  ## Contributing

- 1. Fork it ( http://github.com/johncarney/pikelet/fork )
+ 1. Fork it ([http://github.com/johncarney/pikelet/fork][fork])
  2. Create your feature branch (`git checkout -b my-new-feature`)
  3. Commit your changes (`git commit -am 'Add some feature'`)
  4. Push to the branch (`git push origin my-new-feature`)
  5. Create new Pull Request
+
+ [pikelet-recipe]: http://www.taste.com.au/recipes/5757/pikelets
+ [pikelet-musician]: http://en.wikipedia.org/wiki/Evelyn_Morris
+ [dish]: http://www.iata.org/publications/Pages/bspdish.aspx
+ [overpunch]: https://github.com/johncarney/overpunch
+ [gem-badge]: https://badge.fury.io/rb/pikelet.svg
+ [gem]: http://badge.fury.io/rb/pikelet
+ [build-badge]: https://travis-ci.org/johncarney/pikelet.svg?branch=master
+ [build]: https://travis-ci.org/johncarney/pikelet
+ [coverage-badge]: https://img.shields.io/coveralls/johncarney/pikelet.svg
+ [coverage]: https://coveralls.io/r/johncarney/pikelet?branch=master
+ [fork]: http://github.com/johncarney/pikelet/fork
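The README's inheritance example nests the FANCY definition inside PLAIN so that FANCY records reuse `first_name` and `last_name`. One way such nesting could work is sketched below, assuming each nested definition starts from a copy of its parent's fields; `RecordDef` and its methods are hypothetical, and this is not Pikelet's implementation. In line with the README's remark, the outer definition is just another record definition that happens to know only the type signature. With the PLAIN data, the parsed signature comes out as `"PLAIN"`; the published README output still shows `type_signature="SIMPLE"`, which appears to be carried over from the 0.0.2 example.

```ruby
# Hypothetical sketch of nested record definitions (not Pikelet's code):
# a nested definition starts from a copy of its parent's fields.
class RecordDef
  attr_reader :fields, :children

  def initialize(parent_fields = {}, &block)
    @fields = parent_fields.dup # copy, so a child never mutates its parent
    @children = {}
    instance_eval(&block) if block
  end

  def field(name, range)
    @fields[name] = range
  end

  def record(signature, &block)
    @children[signature] = RecordDef.new(@fields, &block)
  end

  # Depth-first search through nested definitions for a signature.
  def lookup(signature)
    return children[signature] if children.key?(signature)

    children.each_value do |child|
      found = child.lookup(signature)
      return found if found
    end
    nil
  end

  # Reads the signature with the outer definition, then parses with the match.
  def parse(line)
    signature = line[fields.fetch(:type_signature)].to_s.strip
    definition = lookup(signature)
    raise ArgumentError, "unknown record type #{signature.inspect}" unless definition

    definition.fields.transform_values { |range| line[range].to_s.strip }
  end
end

layout = RecordDef.new do
  field :type_signature, 0...5
  record "PLAIN" do
    field :first_name, 5...15
    field :last_name, 15...25
    record("FANCY") { field :nickname, 25...35 }
  end
end

p layout.parse("PLAIN" + "Nicolaus".ljust(10) + "Copernicus")
p layout.parse("FANCY" + "Tycho".ljust(10) + "Brahe".ljust(10) + "Tykester")
```

Copying fields at the moment `record` is called means a child only inherits fields declared before it, which matches the README's layout where FANCY is nested after the name fields.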
@@ -2,22 +2,31 @@ require "overpunch"

  module Pikelet
  class FieldDefinition
- attr_reader :indices, :type
+ attr_reader :indices, :parser

- def initialize(indices, type: nil)
+ def initialize(indices, type: nil, &parser)
  @indices = indices
- @type = type
+ if block_given?
+ @parser = parser
+ else
+ @parser = parser_from_type(type)
+ end
  end

  def parse(text)
- value = indices.map { |index| text[index] }.join
+ @parser.call(indices.map { |index| text[index] }.join)
+ end
+
+ private
+
+ def parser_from_type(type)
  case type
  when :integer
- value.to_i
+ :to_i.to_proc
  when :overpunch
- Overpunch.parse(value)
+ Proc.new { |value| Overpunch.parse(value) }
  else
- value.strip
+ :strip.to_proc
  end
  end
  end
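The `FieldDefinition` hunk above slices the text at every index, joins the pieces, and hands the result to a parser proc: the block given to the field if there is one, otherwise `:to_i.to_proc`, an overpunch proc, or `:strip.to_proc` depending on `type:`. The standalone sketch below mirrors that shape; the `SimpleField` name, passing indices as a splat rather than an array, and leaving out `:overpunch` are assumptions. It applies the fields to the BFH record from the README's "stupid trick" section, assuming the pipes there are only visual separators.

```ruby
# Standalone sketch modelled on the FieldDefinition hunk above. The class
# name is hypothetical, indices are a splat rather than an array, and the
# :overpunch type is left out to avoid depending on the overpunch gem.
class SimpleField
  def initialize(*indices, type: nil, &parser)
    @indices = indices
    # An explicit block wins; otherwise derive a parser from `type`.
    @parser = parser || parser_from_type(type)
  end

  # Extracts every slice, glues them together, then applies the parser.
  def parse(text)
    @parser.call(@indices.map { |index| text[index] }.join)
  end

  private

  def parser_from_type(type)
    case type
    when :integer then :to_i.to_proc
    else :strip.to_proc
    end
  end
end

# The README's BFH record with the illustrative pipes removed.
line = "BFH" + "00000001" + "01" + "LONZZZ 203TEST1101022359GB000001 "

type_signature = SimpleField.new(0...3, 11...13)
sequence       = SimpleField.new(3...11, &:to_i)
payload        = SimpleField.new(13..-1)

p type_signature.parse(line)
p sequence.parse(line)
p payload.parse(line)
```

Because a custom block replaces the default parser entirely, a field given `&:to_i` is not stripped first; `String#to_i` already ignores leading whitespace, so that is harmless here.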
@@ -10,9 +10,9 @@ module Pikelet
  end
  end

- def field(name, *indices, type: nil)
+ def field(name, *indices, type: nil, &block)
  @record_class = nil
- field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type)
+ field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type, &block)
  end

  def record(type_signature, &block)
@@ -23,8 +23,8 @@ module Pikelet
  record_class.new(*field_definitions.values.map { |field| field.parse(data) })
  end

- def method_missing(method, *args, **options)
- field(method, *args, **options)
+ def method_missing(method, *args, **options, &block)
+ field(method, *args, **options, &block)
  end

  def record_class
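Both `method_missing` and `field` changes above exist to carry a block through. In Ruby, a block given to one method is not passed on to the methods it calls unless it is captured and forwarded with `&block`, so in 0.0.2 a declaration like `a_number(0...4) { |value| value.to_i }` would have lost its block before reaching `field`. The minimal DSL below shows the forwarding; `FieldCollector` is hypothetical and not Pikelet's code.

```ruby
# Hypothetical DSL sketch (not Pikelet's code): unknown method calls become
# field declarations, and an attached block becomes the field's parser.
class FieldCollector
  Field = Struct.new(:indices, :parser)

  attr_reader :fields

  def initialize(&definition)
    @fields = {}
    instance_eval(&definition)
  end

  def field(name, *indices, &parser)
    @fields[name] = Field.new(indices, parser || :strip.to_proc)
  end

  # The explicit &block is what carries `a_number(4...8) { ... }` through to
  # #field; Ruby does not pass a block on to nested calls by itself.
  def method_missing(name, *args, &block)
    return super if args.empty? # no indices: keep Ruby's usual NameError
    field(name, *args, &block)
  end

  def parse(text)
    @fields.transform_values do |f|
      f.parser.call(f.indices.map { |index| text[index] }.join)
    end
  end
end

collector = FieldCollector.new do
  label 0...4
  a_number(4...8) { |value| value.to_i }
  code 8...11, &:upcase
end

p collector.parse("ABCD0042xyz")
```

Because `method_missing` only fires for undefined names, a field whose name matches an existing method (for example the private `Kernel#format` or `Kernel#test`) would call that method instead of declaring a field; a production DSL would also typically define `respond_to_missing?`.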
@@ -1,3 +1,3 @@
  module Pikelet
- VERSION = "0.0.2"
+ VERSION = "0.1.0"
  end
@@ -0,0 +1,86 @@
+ require "spec_helper"
+ require "pikelet"
+ require "csv"
+
+ describe Pikelet::FieldDefinition do
+ let(:data) { "The quick brown fox" }
+ let(:type) { nil }
+ let(:definition) { Pikelet::FieldDefinition.new(indices, type: type) }
+
+ subject(:value) { definition.parse(data) }
+
+ describe "for a fixed-width field" do
+ let(:indices) { [ 4...9 ] }
+
+ it "extracts the field content from the data" do
+ expect(value).to eq "quick"
+ end
+ end
+
+ describe "given whitespace" do
+ let(:indices) { [ 3...16 ] }
+
+ it "strips leading and trailing whitespace" do
+ expect(value).to eq "quick brown"
+ end
+ end
+
+ describe "with multiple indices" do
+ let(:indices) { [ 0...4, 16...19 ] }
+
+ it "joins the sections together" do
+ expect(value).to eq "The fox"
+ end
+ end
+
+ describe "given a CSV row" do
+ let(:data) { CSV.parse("The,quick,brown,fox").first }
+ let(:indices) { [ 2 ] }
+
+ it "extracts the field" do
+ expect(value).to eq "brown"
+ end
+ end
+
+ describe "for integer fields" do
+ let(:data) { "xx326xx" }
+ let(:indices) { [ 2...5] }
+ let(:type) { :integer }
+
+ it "converts the value to an integer" do
+ expect(value).to eq 326
+ end
+ end
+
+ describe "for overpunch fields" do
+ let(:data) { "xx67Kxx" }
+ let(:indices) { [ 2...5] }
+ let(:type) { :overpunch }
+
+ it "converts the value to an integer" do
+ expect(value).to eq -672
+ end
+ end
+
+ describe "given a parser block" do
+ let(:indices) { [ 4...9] }
+ let(:definition) do
+ Pikelet::FieldDefinition.new(indices) { |value| value.reverse }
+ end
+
+ it "yields the value to the parser" do
+ expect(value).to eq "kciuq"
+ end
+ end
+
+ describe "given a symbol for the parser block" do
+ let(:indices) { [ 4...9] }
+ let(:definition) do
+ Pikelet::FieldDefinition.new(indices, &:upcase)
+ end
+
+ it "invokes the named method on the value" do
+ expect(value).to eq "QUICK"
+ end
+ end
+ end
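The overpunch spec above expects `"67K"` to parse as -672. In signed overpunch (zoned decimal) notation, the last character carries both the final digit and the sign: `{` and `A` to `I` stand for +0 to +9, and `}` and `J` to `R` for -0 to -9. The decoder below is written from that convention as an illustration; it is not the `overpunch` gem's code, and treating a plain trailing digit as unsigned is an assumption.

```ruby
# Illustrative signed-overpunch decoder written from the encoding convention,
# not from the overpunch gem. The last character carries the final digit and
# the sign: "{" and A-I are +0..+9, "}" and J-R are -0..-9.
OVERPUNCH_POSITIVE = "{ABCDEFGHI"
OVERPUNCH_NEGATIVE = "}JKLMNOPQR"

def decode_overpunch(text)
  text = text.strip
  raise ArgumentError, "empty field" if text.empty?

  digits = text[0...-1]
  last = text[-1]

  if (digit = OVERPUNCH_POSITIVE.index(last))
    sign = 1
  elsif (digit = OVERPUNCH_NEGATIVE.index(last))
    sign = -1
  elsif last.match?(/\d/)
    sign = 1 # a plain trailing digit is treated as unsigned (assumption)
    digit = last.to_i
  else
    raise ArgumentError, "invalid overpunch character #{last.inspect}"
  end
  raise ArgumentError, "non-digit in #{text.inspect}" unless digits.match?(/\A\d*\z/)

  sign * "#{digits}#{digit}".to_i
end

p decode_overpunch("67K")
```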
data/spec/pikelet_spec.rb CHANGED
@@ -13,7 +13,7 @@ describe Pikelet do

  subject { records }

- describe "a simple flat file" do
+ describe "for a simple flat file" do
  let(:definition) do
  Pikelet.define do
  name 0... 4
@@ -34,7 +34,7 @@ describe Pikelet do
  its(:last) { is_expected.to match_hash(name: "Sue", number: "087654321") }
  end

- describe "a file with heterogeneous records" do
+ describe "for a file with heterogeneous records" do
  let(:definition) do
  Pikelet.define do
  type_signature 0...1
@@ -65,7 +65,7 @@ describe Pikelet do
  its(:last) { is_expected.to match_hash(name: "Sue", number: "087654321", type_signature: "B") }
  end

- describe "a CSV file" do
+ describe "for a CSV file" do
  let(:definition) do
  Pikelet.define do
  name 0
@@ -114,10 +114,10 @@ describe Pikelet do
  its(:last) { is_expected.to match_hash(name: "Sue", number: "087654321", type_signature: "FANCY") }
  end

- describe "integer fields" do
+ describe "given integer fields" do
  let(:definition) do
  Pikelet.define do
- value 0...4, type: :integer
+ value 0...4, &:to_i
  end
  end

@@ -132,10 +132,10 @@ describe Pikelet do
  its(:value) { is_expected.to eq 5637 }
  end

- describe "overpunch fields" do
+ describe "given overpunch fields" do
  let(:definition) do
  Pikelet.define do
- value 0...4, type: :overpunch
+ value(0...4) { |value| Overpunch.parse(value) }
  end
  end

@@ -149,4 +149,32 @@ describe Pikelet do

  its(:value) { is_expected.to eq -5631 }
  end
+
+ describe "given a block when parsing" do
+ let(:collected_records) { [] }
+
+ let(:definition) do
+ Pikelet.define do
+ name 0... 4
+ number 4...13
+ end
+ end
+
+ let(:data) do
+ <<-FILE.gsub(/^\s*/, "").split(/[\r\n]+/)
+ John012345678
+ Sue 087654321
+ FILE
+ end
+
+ before do
+ definition.parse(data) do |record|
+ collected_records << record.to_h
+ end
+ end
+
+ it 'yields each record to the block' do
+ expect(collected_records).to contain_exactly(*records.map(&:to_h))
+ end
+ end
  end
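The new "given a block when parsing" spec expects `parse` to yield each record to a block, while the README says that without a block it returns an enumerator. A common Ruby idiom that covers both is `return enum_for(...) unless block_given?`. The hypothetical `LineParser` below (not Pikelet's code) uses it, and also shows `Enumerable#lazy`, the standard way to stop a chain like `map` from trying to consume an endless source, which may be relevant to the README's note about lazy enumeration.

```ruby
# Hypothetical parser (not Pikelet's code) showing the common Ruby idiom of
# yielding to a block when one is given and returning an Enumerator otherwise.
class LineParser
  Record = Struct.new(:name, :number)

  def parse(lines)
    # No block: return an Enumerator that calls this method again on demand.
    return enum_for(:parse, lines) unless block_given?

    lines.each do |line|
      yield Record.new(line[0...4].strip, line[4...13].strip)
    end
  end
end

parser = LineParser.new
data = ["John012345678", "Sue 087654321"]

# Block form, as in the "given a block when parsing" spec.
collected = []
parser.parse(data) { |record| collected << record.to_h }

# Enumerator form, as the README describes.
records = parser.parse(data).to_a

# Lazy form: with an endless source, only two records are ever built.
first_two = parser.parse(data.cycle).lazy.map(&:name).first(2)

p collected
p first_two
```

Without `.lazy`, the `map` over `data.cycle` would never return, because an eager `map` tries to build the whole (infinite) result array first.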
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: pikelet
  version: !ruby/object:Gem::Version
- version: 0.0.2
+ version: 0.1.0
  platform: ruby
  authors:
  - John Carney
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-07-24 00:00:00.000000000 Z
+ date: 2014-07-25 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: overpunch
@@ -131,6 +131,7 @@ files:
  - lib/pikelet/record_definition.rb
  - lib/pikelet/version.rb
  - pikelet.gemspec
+ - spec/pikelet/field_definition_spec.rb
  - spec/pikelet_spec.rb
  - spec/spec_helper.rb
  homepage: ''
@@ -158,5 +159,6 @@ signing_key:
  specification_version: 4
  summary: A simple flat-file database parser.
  test_files:
+ - spec/pikelet/field_definition_spec.rb
  - spec/pikelet_spec.rb
  - spec/spec_helper.rb