honey_format 0.12.0 → 0.13.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 6d236d3a0eeee26825fc307504a6e65c66dbd30c5cbc8b4d190f5af391e73fc0
-   data.tar.gz: e5989adcda923101ccff44d34b15adffe7db4cbd1cabdfad3229dace94076c6b
+   metadata.gz: 0c4d5fc0d404dc7820a766b51307451f8900497495e51b42414498c569dbf782
+   data.tar.gz: 96faa29782adeb254a1a8c3e9a4a2a2f19bd88d261bce4a96675ad9f234d6ae4
  SHA512:
-   metadata.gz: 5fa617674f689a27707c15a6a9d307181a8bda8662da9f40e5a3a689821c1269ca34762c4609511a7265cf75c04e551c1e1de8d5e2d0825826ff6a422aa77398
-   data.tar.gz: de36bd4aa3bd05e3e9f99d703b0ffe4674f894edaa85aecc3b51114f434dcdf9d6ea3b23c9b3b2b0c0c5319d63849c49532d46bcccb044ebc4190c77c9212eeb
+   metadata.gz: 6dd3cd150abd94f14b9f4416b2dc5d42756b1901f963144e8850cafe344ce657443f188d32d4cc008437a3d42430cda9b02a0d25f741f0417ede3b838cc848a8
+   data.tar.gz: 0ec2641fc7346b44376460a8587a70248f7f61d249d3277e6a9a50f7d2f2f8b7ad8225f4f1bb4e1d797ee6adc85242026acb71ec9c8526499f403b86bd608d6c
data/CHANGELOG.md CHANGED
@@ -1,3 +1,19 @@
+ # HEAD
+
+ # v0.13.0
+
+ :warning: This release contains some backwards compatible changes.
+
+ * Extract `Matrix` super class from `CSV`
+ * Add `Header#empty?` and `Rows#empty?`
+ * Value converters [[#PR15](https://github.com/buren/honey_format/pull/15)]
+   + Convert column value to number, date, etc..
+   + Additional converters in [[#PR20](https://github.com/buren/honey_format/pull/20)]
+ * Add support for CSV row delimiter and quote character [[#PR15](https://github.com/buren/honey_format/pull/15)]
+ * :warning: `CSV#header` now returns an instance of `Header` instead of an array of the original header columns [[#PR15](https://github.com/buren/honey_format/pull/15)]
+ * Add `--[no-]rows-only` CLI option
+ * Rename `--[no-]only-header` CLI option to `--[no-]header-only`
+
  # v0.12.0

  * Add `--[no-]only-header` option to CLI
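Callers that must work on both sides of the `CSV#header` change noted in the changelog above could wrap the call in a small compatibility helper. The following is only a sketch: `original_header_columns` is a hypothetical name that is not part of honey_format, and the two structs are stand-ins that mimic the return shapes (a plain array in 0.12.0, an object responding to `#original` in 0.13.0, as the README in this diff shows) so the example runs without the gem.

```ruby
# Hypothetical helper (not part of honey_format): returns the original header
# columns whether `csv.header` is an Array (0.12.0 behaviour) or an object
# responding to #original (0.13.0 behaviour).
def original_header_columns(csv)
  header = csv.header
  header.respond_to?(:original) ? header.original : header
end

# Stand-ins that only mimic the two return shapes; they are assumptions made
# for illustration, not the gem's classes.
FakeHeader = Struct.new(:original)
FakeCSV = Struct.new(:header)

old_style = FakeCSV.new(%w[Id Username])
new_style = FakeCSV.new(FakeHeader.new(%w[Id Username]))

p original_header_columns(old_style)
p original_header_columns(new_style)
```

Duck-typing on `#original` avoids checking the gem version directly, which keeps the helper independent of how the gem is loaded.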
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)

- Copyright (c) 2015 Jacob Burenstam Linder
+ Copyright (c) 2018 Jacob Burenstam Linder

  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,28 +1,40 @@
- # HoneyFormat [![Build Status](https://travis-ci.org/buren/honey_format.svg)](https://travis-ci.org/buren/honey_format) [![Code Climate](https://codeclimate.com/github/buren/honey_format/badges/gpa.svg)](https://codeclimate.com/github/buren/honey_format) [![Inline docs](http://inch-ci.org/github/buren/honey_format.svg)](http://inch-ci.org/github/buren/honey_format)
+ # HoneyFormat [![Build Status](https://travis-ci.org/buren/honey_format.svg)](https://travis-ci.org/buren/honey_format) [![Code Climate](https://codeclimate.com/github/buren/honey_format/badges/gpa.svg)](https://codeclimate.com/github/buren/honey_format) [![Inline docs](http://inch-ci.org/github/buren/honey_format.svg)](https://www.rubydoc.info/gems/honey_format/)

- Convert CSV to an array of objects with with ease.
+ > Makes working with CSVs as smooth as honey.
+
+ Proper objects for CSV headers and rows, convert column values, filter columns and rows, small(-ish) perfomance overhead, no dependencies other than Ruby stdlib.

  ## Features

  - Proper objects for CSV header and rows
- - Convert column values with custom row builder
+ - Convert column values
+ - Pass your own custom row builder
  - Convert header column names
- - Customize what columns and rows are included in CSV output
- - [CLI](#cli)
+ - Filter what columns and rows are included in CSV output
+ - [CLI](#cli) - Simple command line interface
+ - Only ~5-10% overhead from using Ruby CSV, see [benchmarks](#benchmark)
  - Has no dependencies other than Ruby stdlib
  - Supports Ruby >= 2.3

- ## Examples
+ Read the [usage section](#usage), [RubyDoc](https://www.rubydoc.info/gems/honey_format/) or [examples/ directory](https://github.com/buren/honey_format/tree/master/examples) for how to use this gem.
+
+ ## Quick use

- See [examples/](https://github.com/buren/honey_format/tree/master/examples) for more examples.

  ```ruby
- csv_string = "Id,Username\n1,buren"
- csv = HoneyFormat::CSV.new(csv_string)
- csv.header # => ["Id", "Username"]
+ csv_string = <<-CSV
+ Id,Username,Email
+ 1,buren,buren@example.com
+ 2,jacob,jacob@example.com
+ CSV
+ csv = HoneyFormat::CSV.new(csv_string, type_map: { id: :integer })
+ csv.columns # => [:id, :username]
  user = csv.rows # => [#<struct id="1", username="buren">]
- user.id # => "1"
+ user.id # => 1
  user.username # => "buren"
+
+ csv.to_csv(columns: [:id, :username]) { |row| row.id < 2 }
+ # => "id,username\n1,buren\n"
  ```

  ## Installation
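The `type_map:` option introduced in the quick-use example above maps a column name to a named converter. The idea can be sketched with nothing but Ruby's stdlib `CSV`. Everything here is an assumption made for illustration: `CONVERTERS` and `parse_with_type_map` are hypothetical names, and the gem's actual defaults (see `ValueConverter::DEFAULT_CONVERTERS`) and internals may differ.

```ruby
require 'csv'

# Hypothetical converter table; not the gem's real list of converters.
CONVERTERS = {
  integer: ->(v) { Integer(v) },
  upcased: ->(v) { v.upcase }
}.freeze

# Parse CSV text into hashes keyed by downcased header symbols, converting
# only the columns named in type_map. Columns without a known converter
# are passed through unchanged.
def parse_with_type_map(csv_string, type_map = {})
  CSV.parse(csv_string, headers: true).map do |row|
    row.to_h.each_with_object({}) do |(name, value), out|
      key = name.strip.downcase.to_sym
      converter = CONVERTERS.fetch(type_map[key]) { ->(v) { v } }
      out[key] = converter.call(value)
    end
  end
end

rows = parse_with_type_map("Id,Username\n1,buren", { id: :integer })
p rows.first
```

Keeping converters in a lookup table keyed by symbol is what lets a caller write `{ id: :integer }` instead of passing lambdas around.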
@@ -45,21 +57,73 @@ $ gem install honey_format

  ## Usage

- By default assumes a header in the CSV file.
+ By default assumes a header in the CSV file

  ```ruby
  csv_string = "Id,Username\n1,buren"
  csv = HoneyFormat::CSV.new(csv_string)
- csv.header # => ["Id", "Username"]
- csv.columns # => [:id, :username]

+ # Header
+ header = csv.header
+ header.original # => ["Id", "Username"]
+ header.columns # => [:id, :username]
+
+
+ # Rows
  rows = csv.rows # => [#<struct id="1", username="buren">]
  user = rows.first
  user.id # => "1"
  user.username # => "buren"
  ```

- Minimal custom row builder
+ Set delimiter & quote character
+ ```ruby
+ csv_string = "name;id|'John Doe';42"
+ csv = HoneyFormat::CSV.new(
+   csv_string,
+   delimiter: ';',
+   row_delimiter: '|',
+   quote_character: "'",
+ )
+ ```
+
+ __Type converters__
+
+ > Type converters are great if you want to convert column values, like numbers and dates.
+
+ There are a few default type converters
+ ```ruby
+ csv_string = "Id,Username\n1,buren"
+ type_map = { id: :integer }
+ csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
+ csv.rows.first.id # => 1
+ ```
+
+ Add your own converter
+ ```ruby
+ HoneyFormat.configure do |config|
+   config.converter.register :upcased, proc { |v| v.upcase }
+ end
+
+ csv_string = "Id,Username\n1,buren"
+ type_map = { username: :upcased }
+ csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
+ csv.rows.first.username # => "BUREN"
+ ```
+
+ Access registered converters
+ ```ruby
+ decimal_converter = HoneyFormat.value_converter[:decimal]
+ decimal_converter.call('1.1') # => 1.1
+ ```
+
+ See [`ValueConverter::DEFAULT_CONVERTERS`](https://github.com/buren/honey_format/tree/master/lib/honey_format/value_converter.rb) for a complete list of the default ones.
+
+ __Row builder__
+
+ > Pass your own row builder if you want more control of the entire row or if you want to return your own row object.
+
+ Custom row builder
  ```ruby
  csv_string = "Id,Username\n1,buren"
  upcaser = ->(row) { row.tap { |r| r.username.upcase! } }
@@ -67,26 +131,38 @@ csv = HoneyFormat::CSV.new(csv_string, row_builder: upcaser)
  csv.rows # => [#<struct id="1", username="BUREN">]
  ```

- Complete custom row builder
+ As long as the row builder responds to `#call` you can pass anything you like
  ```ruby
  class Anonymizer
-   def self.call(row)
+   def call(row)
+     @cache ||= {}
      # Return an object you want to represent the row
      row.tap do |r|
-       r.name = '<anon>'
-       r.email = '<anon>'
-       r.ssn = '<anon>'
+       # given the same value make sure to return the same anonymized value every time
+       @cache[r.email] ||= "#{SecureRandom.hex(6)}@example.com"
+       r.email = @cache[r.email]
        r.payment_id = '<scrubbed>'
      end
    end
  end

- csv_string = "Id,Username\n1,buren"
- csv = HoneyFormat::CSV.new(csv_string, row_builder: Anonymizer)
- csv.rows # => [#<struct id="1", username="BUREN">]
+ csv_string = <<~CSV
+ Email,Payment ID
+ buren@example.com,123
+ buren@example.com,998
+ CSV
+ csv = HoneyFormat::CSV.new(csv_string, row_builder: Anonymizer.new)
+ csv.rows.to_csv(columns: [:email])
+ # => 8f6ed70a7f98@example.com
+ # 8f6ed70a7f98@example.com
+ # 0db96f350cea@example.com
  ```

- Output CSV
+ __Output CSV__
+
+ > Makes it super easy to output a subset of columns/rows.
+
+ Manipulate the rows before output
  ```ruby
  csv_string = "Id,Username\n1,buren"
  csv = HoneyFormat::CSV.new(csv_string)
@@ -94,35 +170,33 @@ csv.rows.each { |row| row.id = nil }
  csv.to_csv # => "id,username\n,buren\n"
  ```

- Output a subset of columns to CSV
+ Output a subset of columns
  ```ruby
  csv_string = "Id, Username, Country\n1,buren,Sweden"
  csv = HoneyFormat::CSV.new(csv_string)
  csv.to_csv(columns: [:id, :country]) # => "id,country\nburen,Sweden\n"
  ```

- Output a subset of rows to CSV
+ Output a subset of rows
  ```ruby
  csv_string = "Name, Country\nburen,Sweden\njacob,Denmark"
  csv = HoneyFormat::CSV.new(csv_string)
  csv.to_csv { |row| row.country == 'Sweden' } # => "name,country\nburen,Sweden\n"
  ```

- You can of course set the delimiter
- ```ruby
- HoneyFormat::CSV.new(csv_string, delimiter: ';')
- ```
+ __Headers__

- Validate CSV header
+ > By default generates method-like names for each header column, but also gives you full control: define them or convert them.
+
+ By default assumes a header in the CSV file.
  ```ruby
  csv_string = "Id,Username\n1,buren"
- # Invalid
- HoneyFormat::CSV.new(csv_string, valid_columns: [:something, :username])
- # => HoneyFormat::UnknownHeaderColumnError (column :id not in [:something, :username])
+ csv = HoneyFormat::CSV.new(csv_string)

- # Valid
- csv = HoneyFormat::CSV.new(csv_string, valid_columns: [:id, :username])
- csv.rows.first.username # => "buren"
+ # Header
+ header = csv.header
+ header.original # => ["Id", "Username"]
+ header.columns # => [:id, :username]
  ```

  Define header
@@ -132,32 +206,33 @@ csv = HoneyFormat::CSV.new(csv_string, header: ['Id', 'Username'])
  csv.rows.first.username # => "buren"
  ```

- If your header contains special chars and/or chars that can't be part of Ruby method names,
- things can get a little awkward..
+ Set default header converter
  ```ruby
- csv_string = "ÅÄÖ\nSwedish characters"
- user = HoneyFormat::CSV.new(csv_string).rows.first
- # Note that these chars aren't "downcased" in Ruby 2.3 and older versions of Ruby,
- # "ÅÄÖ".downcase # => "ÅÄÖ"
- user.ÅÄÖ # => "Swedish characters"
- # while on Ruby > 2.3
- user.åäö
+ HoneyFormat.configure do |config|
+   config.header_converter = proc { |v| v.downcase }
+ end

- csv_string = "First^Name\nJacob"
- user = HoneyFormat::CSV.new(csv_string).rows.first
- user.public_send(:"first^name") # => "Jacob"
- # or
- user['first^name'] # => "Jacob"
+ # you can get the default one with
+ header_converter = HoneyFormat.value_converter[:header_column]
+ header_converter.call('First name') # => "first_name"
+ ```
+
+ Use any value converter as the header converter
+ ```ruby
+ csv_string = "Id,Username\n1,buren"
+ csv = HoneyFormat::CSV.new(csv_string, header_converter: :upcase)
+ csv.columns # => [:ID, :USERNAME]
  ```

  Pass your own header converter
  ```ruby
  map = { 'First^Name' => :first_name }
- converter = ->(column) { map.fetch(column, column) }
+ converter = ->(column) { map.fetch(column, column.downcase) }

- csv_string = "First^Name\nJacob"
+ csv_string = "ID,First^Name\n1,Jacob"
  user = HoneyFormat::CSV.new(csv_string, header_converter: converter).rows.first
  user.first_name # => "Jacob"
+ user.id # => "1"
  ```

  Missing header values
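The header-converter hunk above combines two ideas: an explicit mapping for awkward column names and a fallback that normalizes everything else. A standalone sketch of that pattern follows. The `normalize` lambda is a hypothetical, ASCII-only stand-in and is not the gem's `:header_column` converter, which may handle more cases.

```ruby
# Hypothetical stand-in for a header converter (ASCII-only): trims, downcases
# and replaces runs of other characters with a single underscore.
normalize = lambda do |column|
  column.strip.downcase.gsub(/[^a-z0-9]+/, '_').gsub(/\A_+|_+\z/, '')
end

# Explicit mappings win; every other column falls back to the normalizer.
overrides = { 'First^Name' => 'first_name' }
converter = ->(column) { overrides.fetch(column) { normalize.call(column) } }

p converter.call('First name')
p converter.call('First^Name')
p converter.call(' ID ')
```

Using the block form of `Hash#fetch` means the normalizer only runs for columns without an explicit override.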
@@ -168,9 +243,30 @@ user = csv.rows.first
  user.column1 # => "val1"
  ```

- Errors
+ If your header contains special chars and/or chars that can't be part of Ruby method names,
+ things can get a little awkward..
+ ```ruby
+ csv_string = "ÅÄÖ\nSwedish characters"
+ user = HoneyFormat::CSV.new(csv_string).rows.first
+ # Note that these chars aren't "downcased" in Ruby 2.3 and older versions of Ruby,
+ # "ÅÄÖ".downcase # => "ÅÄÖ"
+ user.ÅÄÖ # => "Swedish characters"
+ # while on Ruby > 2.3
+ user.åäö
+
+ csv_string = "First^Name\nJacob"
+ user = HoneyFormat::CSV.new(csv_string).rows.first
+ user.public_send(:"first^name") # => "Jacob"
+ # or
+ user['first^name'] # => "Jacob"
+ ```
+
+ __Errors__
+
+ > When you need that some extra safety.
+
+ If you want to there are some errors you can rescue
  ```ruby
- # there are two error super classes
  begin
    HoneyFormat::CSV.new(csv_string)
  rescue HoneyFormat::HeaderError => e
@@ -184,13 +280,17 @@ end

  You can see all [available errors here](https://www.rubydoc.info/gems/honey_format/HoneyFormat/Errors).

- If you want to see more usage examples check out the `spec/` directory.
+ If you want to see more usage examples check out the [`examples/`](https://github.com/buren/honey_format/tree/master/examples) and [`spec/`](https://github.com/buren/honey_format/tree/master/spec) directories.

  ## CLI

+ > Perfect when you want to get something simple done quickly.
+
  ```
  Usage: honey_format [file.csv] [options]
  --csv=input.csv CSV file
+ --[no-]header-only Print only the header
+ --[no-]rows-only Print only the rows
  --columns=id,name Select columns.
  --output=output.csv CSV output (STDOUT otherwise)
  --delimiter=, CSV delimiter (default: ,)
@@ -198,30 +298,48 @@ Usage: honey_format [file.csv] [options]
  --version Show version
  ```

- ## Benchmark
-
- _Note_: This gem, adds some overhead to parsing a CSV string. I've included some benchmarks below, your mileage may vary..
-
- You can run the benchmarks yourself:
+ Output a subset of columns to a new file
+ ```
+ # input.csv
+ id,name,username
+ 1,jacob,buren
+ ```

  ```
- $ bin/benchmark file.csv
+ $ honey_format input.csv --columns=id,username > output.csv
  ```

+
+ ## Benchmark
+
+ _Note_: This gem, adds some overhead to parsing a CSV string, typically ~5-10%. I've included some benchmarks below, your mileage may vary..
+
  204KB (1k lines)

  ```
- stdlib CSV: 51.9 i/s
- HoneyFormat::CSV: 49.6 i/s - 1.05x slower
+ CSV no options: 51.0 i/s
+ CSV with header: 36.1 i/s - 1.41x slower
+ HoneyFormat::CSV: 48.7 i/s - 1.05x slower
  ```

  2MB (10k lines)

  ```
- stdlib CSV: 4.6 i/s
- HoneyFormat::CSV: 4.2 i/s - 1.08x slower
+ CSV no options: 5.1 i/s
+ CSV with header: 3.6 i/s - 1.42x slower
+ HoneyFormat::CSV: 4.9 i/s - 1.05x slower
  ```

+ You can run the benchmarks yourself
+ ```
+ Usage: bin/benchmark [file.csv] [options]
+ --csv=[file1.csv] CSV file(s)
+ --[no-]verbose Verbose output
+ --lines-multipliers=[1,2,10] Multiply the rows in the CSV file (default: 1)
+ --time=[30] Benchmark time (default: 30)
+ --warmup=[30] Benchmark warmup (default: 30)
+ -h, --help How to use
+ ```

  ## Development

data/bin/benchmark CHANGED
@@ -4,72 +4,43 @@ require 'honey_format'

  require 'benchmark/ips'
  require 'csv'
- require 'optparse'

- input_path = nil
- benchmark_time = 30
- benchmark_warmup = 5
- lines_multipliers = [1]
+ require 'honey_format/cli/benchmark_cli'

- OptionParser.new do |parser|
-   parser.banner = "Usage: bin/benchmark [file.csv] [options]"
-   parser.default_argv = ARGV
+ cli = HoneyFormat::BenchmarkCLI.new
+ Writer = cli.writer
+ options = cli.options

-   parser.on("--csv=file1.csv", String, "CSV file(s)") do |value|
-     input_path = value
-   end
-
-   parser.on("--lines-multipliers=[1,10,50]", Array, "Multiply the rows in the CSV file (default: 1)") do |value|
-     lines_multipliers = value.map do |v|
-       Integer(v).tap do |int|
-         unless int >= 1
-           raise(ArgumentError, '--lines-multiplier must be 1 or greater')
-         end
-       end
-     end
-   end
-
-   parser.on("--time=[30]", String, "Benchmark time (default: 30)") do |value|
-     benchmark_time = Integer(value)
-   end
-
-   parser.on("--warmup=[30]", String, "Benchmark warmup (default: 30)") do |value|
-     benchmark_warmup = Integer(value)
-   end
-
-   parser.on("-h", "--help", "How to use") do
-     puts parser
-     exit
-   end
+ input_path = options[:input_path]
+ benchmark_time = options[:benchmark_time]
+ benchmark_warmup = options[:benchmark_warmup]
+ lines_multipliers = options[:lines_multipliers]

-   # No argument, shows at tail. This will print an options summary.
-   parser.on_tail("-h", "--help", "Show this message") do
-     puts parser
-     exit
-   end
- end.parse!
+ original_csv = input_path ? File.read(input_path) : cli.fetch_default_benchmark_csv
+ original_csv_lines = original_csv.lines

- csv = File.read(input_path)
+ runtime_seconds = cli.expected_runtime_seconds(report_count: 3)
+ Writer.puts "Expected runtime: ~#{runtime_seconds} seconds.", verbose: true

- lines_multipliers.each do |lines_multiplier|
-   if lines_multiplier > 1
-     orignial_csv_lines = csv.lines
-     rows = orignial_csv_lines[1..-1] * lines_multiplier
-     csv = orignial_csv_lines.first + rows.join
-   end
+ lines_multipliers.each_with_index do |lines_multiplier, index|
+   rows = original_csv_lines[1..-1] * lines_multiplier
+   csv = original_csv_lines.first + rows.join

    line_count = csv.lines.length

-   puts "== [START] Benchmark for #{line_count} lines =="
+   Writer.puts "== Benchmark #{index + 1} of #{lines_multipliers.length} =="
+   Writer.puts "path #{cli.used_input_path}"
+   Writer.puts "lines #{line_count}"
+   Writer.puts "multiplier #{lines_multiplier}"
+
    Benchmark.ips do |x|
      x.time = benchmark_time
      x.warmup = benchmark_warmup

-     x.report('stdlib CSV no options') { CSV.parse(csv) }
-     x.report('stdlib CSV with header') { CSV.parse(csv, headers: true) }
-     x.report('HoneyFormat::CSV') { HoneyFormat::CSV.new(csv).rows }
+     x.report('CSV no options') { CSV.parse(csv) }
+     x.report('CSV with header') { CSV.parse(csv, headers: true) }
+     x.report('HoneyFormat::CSV') { HoneyFormat::CSV.new(csv).rows }

      x.compare!
    end
-   puts "== [END] Benchmark for #{line_count} lines =="
  end
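The rewritten loop above always rebuilds the CSV from the original lines, so each multiplier is applied to the unmodified input rather than to the result of the previous pass. That step can be sketched in isolation as follows. `multiply_rows` is a hypothetical name introduced for illustration, and the guard clause mirrors the validation that the removed option parser performed.

```ruby
# Hypothetical helper: repeat the data rows of a CSV string `multiplier`
# times while keeping a single header line.
def multiply_rows(csv_string, multiplier)
  raise ArgumentError, 'multiplier must be 1 or greater' unless multiplier >= 1

  lines = csv_string.lines
  # Array#* repeats the row lines; the header (first line) is kept once.
  lines.first + (lines[1..-1] * multiplier).join
end

csv = "id,name\n1,jacob\n2,buren\n"
bigger = multiply_rows(csv, 3)
puts bigger.lines.length
```

Because the function never mutates its input, calling it with multipliers such as 1, 2 and 10 in sequence yields 1x, 2x and 10x the original rows, not a compounding product.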