pikelet 0.0.2 → 0.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ba77b8ff8d09025491cbe13b0cd6f18f735a3b84
4
- data.tar.gz: 4949556c8adee37a5d0370d02a2bd1c1a3a1f544
3
+ metadata.gz: b08f706b9dfcb52105cbcbafdb725cca95db84fc
4
+ data.tar.gz: 0a4362f589115d5b7fc082322b9d419f68e62c0a
5
5
  SHA512:
6
- metadata.gz: 4c0afce709436d65d47d6234abbf8cc94dfbe61870a5c171bfa8b2314488cdfb3ce907a901767a3f3f95ad74cc8f251223c554eaa0a081aba663c7d2b8607c9e
7
- data.tar.gz: 60432f02988c999c97d33c73418523997f58f2a0133707435a8fe61d749f5683a76d0c5dfb986fc446b0a76b39e2a130dc8536b10258272607edb567cba58580
6
+ metadata.gz: db9ac7654c8a926801bde9f537b36990b7d9560477eae190116b1ab24fe6915b0d44e0abe48357a25497bba512c3d555e6c4c6d3c4a5a260d750963183350843
7
+ data.tar.gz: 44e690c9a0913d61bc7a77df2ec460e4ee38479425c166fd6c24ab657699650ee511927829638bd723309487551e03932b1b21b52fc35052619803f5bf66f5d9
data/README.md CHANGED
@@ -1,8 +1,24 @@
1
1
  # Pikelet
2
2
 
3
- A pikelet is a type of small pancake popular in Australia and New Zealand.
4
- Also, a simple flat-file database parser capable of dealing with
5
- files containing heterogeneous records.
3
+ [![Gem Version][gem-badge]][gem]
4
+ [![Build status][build-badge]][build]
5
+ [![Coverage Status][coverage-badge]][coverage]
6
+
7
+ A [pikelet][pikelet-recipe] is a small, delicious pancake popular in Australia
8
+ and New Zealand. Also, the stage name of Australian musician
9
+ [Evelyn Morris][pikelet-musician]. Also, a simple flat-file database parser
10
+ capable of dealing with files containing heterogeneous records. Somehow you've
11
+ wound up at the GitHub page for the last one.
12
+
13
+ The reason I built Pikelet was to handle "HOT" files as described in the
14
+ [IATA BSP Data Interchange Specifications handbook][dish]. These are
15
+ essentially flat-file databases comprised of a number of different fixed-width
16
+ record types. Each record type has a different structure, though some types
17
+ share common fields, and all types have a type signature.
18
+
19
+ However, Pikelet will also handle more typical flat-file databases comprised
20
+ of homogeneous records. Additionally, it will work as well with CSV
21
+ files as it will with fixed-width records.
6
22
 
7
23
  ## Installation
8
24
 
@@ -20,35 +36,60 @@ Or install it yourself as:
20
36
 
21
37
  ## Usage
22
38
 
23
- ### Homogeneous records, fixed-width fields
39
+ ### The simple case: homogeneous records
24
40
 
25
- require "pikelet"
41
+ Let's say our file is a simple list of first and last names with each field
42
+ being 10 characters in width, padded with spaces (vertical pipes used to
43
+ indicate field boundaries).
26
44
 
27
- data = <<-FLATFILE.gsub(/^\s*/, "")
28
- Nicolaus Copernicus
29
- Tycho Brahe
30
- FLATFILE
45
+ |Nicolaus |Copernicus|
46
+ |Tycho |Brahe |
47
+
48
+ We can describe this format using Pikelet as follows:
31
49
 
32
50
  definition = Pikelet.define do
33
- first_name 0...10
34
- last_name 10...20
51
+ first_name 0...10
52
+ last_name 10...20
35
53
  end
36
54
 
37
- definition.parse(data.split(/[\r\n]+/)).to_a
55
+ Each field is described with a field name and a range describing the field
56
+ boundaries. You can use either the end-inclusive (`..`) or end-exclusive
57
+ (`...`) form of range literals. I prefer the exclusive form for this.
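To make the two range forms concrete, here is a minimal plain-Ruby sketch. It does not use Pikelet itself, and the `line` variable is made up for illustration. Both forms select the same ten characters:

```ruby
# A 20-character fixed-width line holding two 10-character fields.
line = "Nicolaus  Copernicus"

exclusive = line[0...10] # indices 0 through 9; the end index is excluded
inclusive = line[0..9]   # indices 0 through 9; the end index is included

# With the exclusive form, one field's end is the next field's start.
last_name = line[10...20]
```

The exclusive form tends to be easier to keep consistent, since adjacent fields share a boundary number.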
38
58
 
39
- # => [#<struct first_name="Nicolaus", last_name="Copernicus">,
40
- # #<struct first_name="Tycho", last_name="Brahe">]
59
+ Parsing the data is as simple as this:
41
60
 
42
- ### Heterogeneous records, fixed-width fields
61
+ definition.parse(data)
43
62
 
44
- require "pikelet"
63
+ `data` is assumed to be an enumerable object yielding successive lines from
64
+ your file. For instance, you could do something like this:
45
65
 
46
- data = <<-FLATFILE.gsub(/^\s*/, "")
47
- NAMENicolaus Copernicus
48
- ADDR123 South Street Nowhereville 45678Y Someplace Someland
49
- FLATFILE
66
+ records = definition.parse(IO.readlines(filepath))
50
67
 
51
- definition = Pikelet.define do
68
+ or this:
69
+
70
+ records = File.open(filepath, 'r') do |f|
71
+ definition.parse(f)
72
+ end
73
+
74
+ `parse` returns an enumerator, which you can either iterate over, or convert
75
+ to an array, or whatever else you people do with enumerators. In any case,
76
+ what you'll end up with is a series of `Structs` like this:
77
+
78
+ #<struct first_name="Nicolaus", last_name="Copernicus">,
79
+ #<struct first_name="Tycho", last_name="Brahe">
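For readers who want to see the general idea without installing the gem, the following is a rough plain-Ruby sketch of slicing lines into Structs and handing them back through an enumerator. It is not Pikelet's implementation; `field_ranges`, `name_record`, and `parse` are hypothetical names used only here.

```ruby
# Hypothetical stand-in for a definition: field names mapped to ranges.
field_ranges = { first_name: 0...10, last_name: 10...20 }
name_record = Struct.new(*field_ranges.keys)

# Returns an enumerator that yields one Struct per input line.
parse = lambda do |lines|
  Enumerator.new do |yielder|
    lines.each do |line|
      values = field_ranges.values.map { |range| line[range].to_s.strip }
      yielder << name_record.new(*values)
    end
  end
end

lines = ["Nicolaus  Copernicus", "Tycho     Brahe     "]
records = parse.call(lines).to_a
```

Because the result is an enumerator, callers can use `each`, `map`, or `to_a` on it as they see fit.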
80
+
81
+ ### A more complex case: heterogeneous records
82
+
83
+ Now let's say we're given a file consisting of names and addresses, where each
84
+ record contains a 4-character type signature - 'NAME' for names, 'ADDR' for
85
+ addresses:
86
+
87
+ |NAME|Nicolaus |Copernicus|
88
+ |ADDR|123 South Street |Nowhereville |45678Y |Someplace |
89
+
90
+ We can describe it as follows:
91
+
92
+ Pikelet.define do
52
93
  type_signature 0...4
53
94
 
54
95
  record "NAME" do
@@ -61,35 +102,39 @@ Or install it yourself as:
61
102
  city 24...44
62
103
  postal_code 44...54
63
104
  state 54...74
64
- country 74...94
65
105
  end
66
106
  end
67
107
 
68
- definition.parse(data.split(/[\r\n]+/)).to_a
108
+ Note that the type signature is described as a field like any other, but it
109
+ must have the name `type_signature`.
69
110
 
70
- # => [#<struct
71
- # type_signature="NAME",
72
- # first_name="Nicolaus",
73
- # last_name="Copernicus">,
74
- # #<struct
75
- # type_signature="ADDR",
76
- # street_address="123 South Street",
77
- # city="Nowhereville",
78
- # postal_code="45678Y",
79
- # state="Someplace",
80
- # country="Someland">]
111
+ Each record type is described using `record` statements, which take the
112
+ record's type signature as a parameter and a block describing its fields.
81
113
 
82
- ### CSV files
114
+ When we parse the data, we end up with this:
83
115
 
84
- require "pikelet"
85
- require "csv"
116
+ #<struct
117
+ type_signature="NAME",
118
+ first_name="Nicolaus",
119
+ last_name="Copernicus">,
120
+ #<struct
121
+ type_signature="ADDR",
122
+ street_address="123 South Street",
123
+ city="Nowhereville",
124
+ postal_code="45678Y",
125
+ state="Someplace">
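The dispatch on the type signature can be sketched in plain Ruby as follows. This is only an illustration of the technique, not how Pikelet is written. The `NAME` field ranges and the `street_address` range are assumptions, since that part of the definition is not visible in this diff; `layouts`, `structs`, and `parse_line` are hypothetical names.

```ruby
# Hypothetical layouts keyed by type signature (NAME ranges are assumed).
layouts = {
  "NAME" => { type_signature: 0...4, first_name: 4...14, last_name: 14...24 },
  "ADDR" => { type_signature: 0...4, street_address: 4...24, city: 24...44,
              postal_code: 44...54, state: 54...74 }
}

# One anonymous Struct class per record type.
structs = layouts.transform_values { |fields| Struct.new(*fields.keys) }

# Read the signature first, then slice the line using the matching layout.
parse_line = lambda do |line|
  signature = line[0...4]
  fields = layouts.fetch(signature)
  structs.fetch(signature).new(*fields.values.map { |range| line[range].to_s.strip })
end

name_line = "NAME" + "Nicolaus".ljust(10) + "Copernicus".ljust(10)
addr_line = "ADDR" + "123 South Street".ljust(20) + "Nowhereville".ljust(20) +
            "45678Y".ljust(10) + "Someplace".ljust(20)

name = parse_line.call(name_line)
addr = parse_line.call(addr_line)
```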
86
126
 
87
- data = <<-CSV.gsub(/^\s*/, "")
88
- NAME,Nicolaus,Copernicus
89
- ADDR,123 South Street,Nowhereville,45678Y,Someplace,Someland
90
- CSV
127
+ ### Handling CSV files
91
128
 
92
- definition = Pikelet.define do
129
+ What if we were given the data from the previous example in CSV form?
130
+
131
+ NAME,Nicolaus,Copernicus
132
+ ADDR,123 South Street,Nowhereville,45678Y,Someplace
133
+
134
+ In this case, instead of describing fields with a boundary range, we just
135
+ give it a simple (zero-based) index, like so:
136
+
137
+ Pikelet.define do
93
138
  type_signature 0
94
139
 
95
140
  record "NAME" do
@@ -102,63 +147,139 @@ Or install it yourself as:
102
147
  city 2
103
148
  postal_code 3
104
149
  state 4
105
- country 5
106
150
  end
107
151
  end
108
152
 
109
- definition.parse(CSV.parse(data)).to_a
153
+ This yields the same results as above.
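A likely reason this works is that Ruby's `String#[]` accepts a range while `Array#[]` accepts an integer, so the same `data[index]` lookup serves both cases. The sketch below illustrates that in plain Ruby; `extract` is a hypothetical helper, and a naive `split(",")` stands in for a real CSV parser.

```ruby
# Look up each index in the data, glue the pieces together, trim whitespace.
extract = ->(data, *indices) { indices.map { |index| data[index] }.join.strip }

fixed_width = "NAMENicolaus  Copernicus"
csv_row = "NAME,Nicolaus,Copernicus".split(",") # assumption: no quoted commas

from_fixed = extract.call(fixed_width, 4...14) # String sliced by a range
from_csv = extract.call(csv_row, 1)            # Array indexed by position
```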
110
154
 
111
- # => [#<struct
112
- # type_signature="NAME",
113
- # first_name="Nicolaus",
114
- # last_name="Copernicus">,
115
- # #<struct
116
- # type_signature="ADDR",
117
- # street_address="123 South Street",
118
- # city="Nowhereville",
119
- # postal_code="45678Y",
120
- # state="Someplace",
121
- # country="Someland">]
155
+ Note that this ability to handle CSV was not planned - it just sprang
156
+ fully-formed from the implementation. One of those pleasant little surprises
157
+ that happens sometimes. If only I had a use for it.
122
158
 
123
159
  ### Inheritance
124
160
 
125
- require "pikelet"
161
+ Now we go back to our original example, starting with a simple list of names,
162
+ but this time some of the records include a nickname:
126
163
 
127
- data = <<-FLATFILE.gsub(/^\s*/, "")
128
- SIMPLENicolaus Copernicus
129
- FANCY Tycho Brahe Tykester
130
- FLATFILE
164
+ |PLAIN|Nicolaus |Copernicus|
165
+ |FANCY|Tycho |Brahe |Tykester |
131
166
 
132
- definition = Pikelet.define do
133
- type_signature 0...6
167
+ The first and last name fields have the same boundaries in each case, but the
168
+ "FANCY" records have an additional field. We can describe this by nesting the
169
+ definition for FANCY records inside the definition for the PLAIN records:
170
+
171
+ Pikelet.define do
172
+ type_signature 0...5
134
173
 
135
- record "SIMPLE" do
136
- first_name 6...16
137
- last_name 16...26
174
+ record "PLAIN" do
175
+ first_name 5...15
176
+ last_name 15...25
138
177
 
139
178
  record "FANCY" do
140
- nickname 26...36
179
+ nickname 25...35
141
180
  end
142
181
  end
143
182
  end
144
183
 
145
- definition.parse(data.split(/[\r\n]+/)).to_a
184
+ Note that the outer definition is really just a record definition in disguise;
185
+ you might have already figured this out if you were paying attention.
146
186
 
147
- # => [#<struct
148
- # type_signature="SIMPLE",
149
- # first_name="Nicolaus",
150
- # last_name="Copernicus">,
151
- # #<struct
152
- # type_signature="FANCY",
153
- # first_name="Tycho",
154
- # last_name="Brahe",
155
- # nickname="Tykester">]
187
+ Anyway, this is what we get when we parse it.
156
188
 
189
+ #<struct
190
+ type_signature="PLAIN",
191
+ first_name="Nicolaus",
192
+ last_name="Copernicus">,
193
+ #<struct
194
+ type_signature="FANCY",
195
+ first_name="Tycho",
196
+ last_name="Brahe",
197
+ nickname="Tykester">
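One way to picture the nesting is as a child layout that starts from a copy of its parent's fields and adds its own. The plain-Ruby sketch below flattens the idea into hash merging; it is an illustration under that assumption, not Pikelet's internal mechanism, and all names in it are hypothetical.

```ruby
# The parent layout; the nested record inherits every field from it.
plain_fields = { type_signature: 0...5, first_name: 5...15, last_name: 15...25 }
fancy_fields = plain_fields.merge(nickname: 25...35)
layouts = { "PLAIN" => plain_fields, "FANCY" => fancy_fields }

# Pick the layout by signature and slice each field out of the line.
parse_line = lambda do |line|
  fields = layouts.fetch(line[0...5])
  fields.transform_values { |range| line[range].to_s.strip }
end

plain = parse_line.call("PLAIN" + "Nicolaus".ljust(10) + "Copernicus")
fancy = parse_line.call("FANCY" + "Tycho".ljust(10) + "Brahe".ljust(10) + "Tykester".ljust(10))
```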
198
+
199
+ ### Custom field parsing
200
+
201
+ Field definitions can accept a block. If provided, the field value is yielded
202
+ to the block. This is useful for parsing numeric fields (say).
203
+
204
+ Pikelet.define do
205
+ a_number(0...4) { |value| value.to_i }
206
+ end
207
+
208
+ You can also use shorthand syntax:
209
+
210
+ Pikelet.define do
211
+ a_number 0...4, &:to_i
212
+ end
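The two spellings are equivalent because `&:to_i` converts the symbol into a block that calls `to_i` on its argument. Here is a small plain-Ruby sketch of a field extractor that takes an optional block; `extract` is a hypothetical helper. It falls back to stripping whitespace when no block is given, which matches the default shown in the `FieldDefinition` change in this diff.

```ruby
# Slice the raw text, then hand it to the block if one was supplied.
extract = lambda do |text, range, &parser|
  raw = text[range]
  parser ? parser.call(raw) : raw.strip
end

with_block = extract.call("0042", 0...4) { |value| value.to_i }
with_symbol = extract.call("0042", 0...4, &:to_i)
default = extract.call(" ab ", 0...4)
```

Note that in this sketch the block receives the raw, unstripped text, so a custom parser has to cope with any padding itself.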
213
+
214
+ ### A stupid trick
215
+
216
+ The `field` statement actually accepts multiple ranges/indices and will
217
+ simply glue the sections described together. Consider the following data:
218
+
219
+ |BFH|00000001|01|LONZZZ 203TEST1101022359GB000001 |
220
+ |BCH|00000002|02|0111101007F110107 |
221
+ |BOH|00000003|03|91200001101031 GBP2|
222
+ |BKT|00000004|06| 000001 011X ZZZ |
223
+
224
+ In this format the first three characters are a 'message identifier', the next
225
+ 8 characters are a sequence number and the next 2 are a 'numeric qualifier'.
226
+ The message identifier and numeric qualifier together form the type signature.
227
+
228
+ We can describe this as follows (let's not bother describing all the
229
+ different record types):
230
+
231
+ Pikelet.define do
232
+ type_signature 0... 3, 11...13
233
+ sequence 3...11, &:to_i
234
+ payload 13.. -1
235
+ end
236
+
237
+ Which will yield:
238
+
239
+ #<struct
240
+ type_signature="BFH01",
241
+ sequence=1,
242
+ payload="LONZZZ 203TEST1101022359GB000001">,
243
+ #<struct
244
+ type_signature="BCH02",
245
+ sequence=2,
246
+ payload="0111101007F110107">,
247
+ #<struct
248
+ type_signature="BOH03",
249
+ sequence=3,
250
+ payload="91200001101031 GBP2">,
251
+ #<struct
252
+ type_signature="BKT06",
253
+ sequence=4,
254
+ payload="000001 011X ZZZ">
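The gluing step amounts to slicing each range and joining the pieces. The plain-Ruby sketch below shows this on the first sample record; `glue` is a hypothetical helper, and the spacing inside the sample payload is approximated.

```ruby
# Message identifier (3), sequence number (8), numeric qualifier (2), payload.
line = "BFH0000000101LONZZZ 203TEST1101022359GB000001"

# Slice every requested range and concatenate the results in order.
glue = ->(text, *indices) { indices.map { |index| text[index] }.join }

type_signature = glue.call(line, 0...3, 11...13)
sequence = glue.call(line, 3...11).to_i
payload = glue.call(line, 13..-1).strip
```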
255
+
256
+ In case you were wondering, no, I didn't make that format up. That is what a
257
+ [BSP HOT file][dish] actually looks like, except there's a hell of a lot more
258
+ of it and many, many more record types.
259
+
260
+ ## Thoughts/plans
261
+
262
+ * With some work, Pikelet could produce flat file records as easily as it
263
+ consumes them.
264
+ * I had a crack at supporting lazy enumeration, and it kinda works. Sometimes.
265
+ If the moon is in the right quarter. I'd like to get it working properly.
157
266
 
158
267
  ## Contributing
159
268
 
160
- 1. Fork it ( http://github.com/johncarney/pikelet/fork )
269
+ 1. Fork it ([http://github.com/johncarney/pikelet/fork][fork])
161
270
  2. Create your feature branch (`git checkout -b my-new-feature`)
162
271
  3. Commit your changes (`git commit -am 'Add some feature'`)
163
272
  4. Push to the branch (`git push origin my-new-feature`)
164
273
  5. Create new Pull Request
274
+
275
+ [pikelet-recipe]: http://www.taste.com.au/recipes/5757/pikelets
276
+ [pikelet-musician]: http://en.wikipedia.org/wiki/Evelyn_Morris
277
+ [dish]: http://www.iata.org/publications/Pages/bspdish.aspx
278
+ [overpunch]: https://github.com/johncarney/overpunch
279
+ [gem-badge]: https://badge.fury.io/rb/pikelet.svg
280
+ [gem]: http://badge.fury.io/rb/pikelet
281
+ [build-badge]: https://travis-ci.org/johncarney/pikelet.svg?branch=master
282
+ [build]: https://travis-ci.org/johncarney/pikelet
283
+ [coverage-badge]: https://img.shields.io/coveralls/johncarney/pikelet.svg
284
+ [coverage]: https://coveralls.io/r/johncarney/pikelet?branch=master
285
+ [fork]: http://github.com/johncarney/pikelet/fork
@@ -2,22 +2,31 @@ require "overpunch"
2
2
 
3
3
  module Pikelet
4
4
  class FieldDefinition
5
- attr_reader :indices, :type
5
+ attr_reader :indices, :parser
6
6
 
7
- def initialize(indices, type: nil)
7
+ def initialize(indices, type: nil, &parser)
8
8
  @indices = indices
9
- @type = type
9
+ if block_given?
10
+ @parser = parser
11
+ else
12
+ @parser = parser_from_type(type)
13
+ end
10
14
  end
11
15
 
12
16
  def parse(text)
13
- value = indices.map { |index| text[index] }.join
17
+ @parser.call(indices.map { |index| text[index] }.join)
18
+ end
19
+
20
+ private
21
+
22
+ def parser_from_type(type)
14
23
  case type
15
24
  when :integer
16
- value.to_i
25
+ :to_i.to_proc
17
26
  when :overpunch
18
- Overpunch.parse(value)
27
+ Proc.new { |value| Overpunch.parse(value) }
19
28
  else
20
- value.strip
29
+ :strip.to_proc
21
30
  end
22
31
  end
23
32
  end
data/lib/pikelet/record_definition.rb CHANGED
@@ -10,9 +10,9 @@ module Pikelet
10
10
  end
11
11
  end
12
12
 
13
- def field(name, *indices, type: nil)
13
+ def field(name, *indices, type: nil, &block)
14
14
  @record_class = nil
15
- field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type)
15
+ field_definitions[name] = Pikelet::FieldDefinition.new(indices, type: type, &block)
16
16
  end
17
17
 
18
18
  def record(type_signature, &block)
@@ -23,8 +23,8 @@ module Pikelet
23
23
  record_class.new(*field_definitions.values.map { |field| field.parse(data) })
24
24
  end
25
25
 
26
- def method_missing(method, *args, **options)
27
- field(method, *args, **options)
26
+ def method_missing(method, *args, **options, &block)
27
+ field(method, *args, **options, &block)
28
28
  end
29
29
 
30
30
  def record_class
data/lib/pikelet/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Pikelet
2
- VERSION = "0.0.2"
2
+ VERSION = "0.1.0"
3
3
  end
@@ -0,0 +1,86 @@
1
+ require "spec_helper"
2
+ require "pikelet"
3
+ require "csv"
4
+
5
+ describe Pikelet::FieldDefinition do
6
+ let(:data) { "The quick brown fox" }
7
+ let(:type) { nil }
8
+ let(:definition) { Pikelet::FieldDefinition.new(indices, type: type) }
9
+
10
+ subject(:value) { definition.parse(data) }
11
+
12
+ describe "for a fixed-width field" do
13
+ let(:indices) { [ 4...9 ] }
14
+
15
+ it "extracts the field content from the data" do
16
+ expect(value).to eq "quick"
17
+ end
18
+ end
19
+
20
+ describe "given whitespace" do
21
+ let(:indices) { [ 3...16 ] }
22
+
23
+ it "strips leading and trailing whitespace" do
24
+ expect(value).to eq "quick brown"
25
+ end
26
+ end
27
+
28
+ describe "with multiple indices" do
29
+ let(:indices) { [ 0...4, 16...19 ] }
30
+
31
+ it "joins the sections together" do
32
+ expect(value).to eq "The fox"
33
+ end
34
+ end
35
+
36
+ describe "given a CSV row" do
37
+ let(:data) { CSV.parse("The,quick,brown,fox").first }
38
+ let(:indices) { [ 2 ] }
39
+
40
+ it "extracts the field" do
41
+ expect(value).to eq "brown"
42
+ end
43
+ end
44
+
45
+ describe "for integer fields" do
46
+ let(:data) { "xx326xx" }
47
+ let(:indices) { [ 2...5] }
48
+ let(:type) { :integer }
49
+
50
+ it "converts the value to an integer" do
51
+ expect(value).to eq 326
52
+ end
53
+ end
54
+
55
+ describe "for overpunch fields" do
56
+ let(:data) { "xx67Kxx" }
57
+ let(:indices) { [ 2...5] }
58
+ let(:type) { :overpunch }
59
+
60
+ it "converts the value to an integer" do
61
+ expect(value).to eq -672
62
+ end
63
+ end
64
+
65
+ describe "given a parser block" do
66
+ let(:indices) { [ 4...9] }
67
+ let(:definition) do
68
+ Pikelet::FieldDefinition.new(indices) { |value| value.reverse }
69
+ end
70
+
71
+ it "yields the value to the parser" do
72
+ expect(value).to eq "kciuq"
73
+ end
74
+ end
75
+
76
+ describe "given a symbol for the parser block" do
77
+ let(:indices) { [ 4...9] }
78
+ let(:definition) do
79
+ Pikelet::FieldDefinition.new(indices, &:upcase)
80
+ end
81
+
82
+ it "invokes the named method on the value" do
83
+ expect(value).to eq "QUICK"
84
+ end
85
+ end
86
+ end
data/spec/pikelet_spec.rb CHANGED
@@ -13,7 +13,7 @@ describe Pikelet do
13
13
 
14
14
  subject { records }
15
15
 
16
- describe "a simple flat file" do
16
+ describe "for a simple flat file" do
17
17
  let(:definition) do
18
18
  Pikelet.define do
19
19
  name 0... 4
@@ -34,7 +34,7 @@ describe Pikelet do
34
34
  its(:last) { is_expected.to match_hash(name: "Sue", number: "087654321") }
35
35
  end
36
36
 
37
- describe "a file with heterogeneous records" do
37
+ describe "for a file with heterogeneous records" do
38
38
  let(:definition) do
39
39
  Pikelet.define do
40
40
  type_signature 0...1
@@ -65,7 +65,7 @@ describe Pikelet do
65
65
  its(:last) { is_expected.to match_hash(name: "Sue", number: "087654321", type_signature: "B") }
66
66
  end
67
67
 
68
- describe "a CSV file" do
68
+ describe "for a CSV file" do
69
69
  let(:definition) do
70
70
  Pikelet.define do
71
71
  name 0
@@ -114,10 +114,10 @@ describe Pikelet do
114
114
  its(:last) { is_expected.to match_hash(name: "Sue", number: "087654321", type_signature: "FANCY") }
115
115
  end
116
116
 
117
- describe "integer fields" do
117
+ describe "given integer fields" do
118
118
  let(:definition) do
119
119
  Pikelet.define do
120
- value 0...4, type: :integer
120
+ value 0...4, &:to_i
121
121
  end
122
122
  end
123
123
 
@@ -132,10 +132,10 @@ describe Pikelet do
132
132
  its(:value) { is_expected.to eq 5637 }
133
133
  end
134
134
 
135
- describe "overpunch fields" do
135
+ describe "given overpunch fields" do
136
136
  let(:definition) do
137
137
  Pikelet.define do
138
- value 0...4, type: :overpunch
138
+ value(0...4) { |value| Overpunch.parse(value) }
139
139
  end
140
140
  end
141
141
 
@@ -149,4 +149,32 @@ describe Pikelet do
149
149
 
150
150
  its(:value) { is_expected.to eq -5631 }
151
151
  end
152
+
153
+ describe "given a block when parsing" do
154
+ let(:collected_records) { [] }
155
+
156
+ let(:definition) do
157
+ Pikelet.define do
158
+ name 0... 4
159
+ number 4...13
160
+ end
161
+ end
162
+
163
+ let(:data) do
164
+ <<-FILE.gsub(/^\s*/, "").split(/[\r\n]+/)
165
+ John012345678
166
+ Sue 087654321
167
+ FILE
168
+ end
169
+
170
+ before do
171
+ definition.parse(data) do |record|
172
+ collected_records << record.to_h
173
+ end
174
+ end
175
+
176
+ it 'yields each record to the block' do
177
+ expect(collected_records).to contain_exactly(*records.map(&:to_h))
178
+ end
179
+ end
152
180
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pikelet
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.2
4
+ version: 0.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - John Carney
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-07-24 00:00:00.000000000 Z
11
+ date: 2014-07-25 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: overpunch
@@ -131,6 +131,7 @@ files:
131
131
  - lib/pikelet/record_definition.rb
132
132
  - lib/pikelet/version.rb
133
133
  - pikelet.gemspec
134
+ - spec/pikelet/field_definition_spec.rb
134
135
  - spec/pikelet_spec.rb
135
136
  - spec/spec_helper.rb
136
137
  homepage: ''
@@ -158,5 +159,6 @@ signing_key:
158
159
  specification_version: 4
159
160
  summary: A simple flat-file database parser.
160
161
  test_files:
162
+ - spec/pikelet/field_definition_spec.rb
161
163
  - spec/pikelet_spec.rb
162
164
  - spec/spec_helper.rb