honey_format 0.12.0 → 0.13.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 6d236d3a0eeee26825fc307504a6e65c66dbd30c5cbc8b4d190f5af391e73fc0
- data.tar.gz: e5989adcda923101ccff44d34b15adffe7db4cbd1cabdfad3229dace94076c6b
+ metadata.gz: 0c4d5fc0d404dc7820a766b51307451f8900497495e51b42414498c569dbf782
+ data.tar.gz: 96faa29782adeb254a1a8c3e9a4a2a2f19bd88d261bce4a96675ad9f234d6ae4
  SHA512:
- metadata.gz: 5fa617674f689a27707c15a6a9d307181a8bda8662da9f40e5a3a689821c1269ca34762c4609511a7265cf75c04e551c1e1de8d5e2d0825826ff6a422aa77398
- data.tar.gz: de36bd4aa3bd05e3e9f99d703b0ffe4674f894edaa85aecc3b51114f434dcdf9d6ea3b23c9b3b2b0c0c5319d63849c49532d46bcccb044ebc4190c77c9212eeb
+ metadata.gz: 6dd3cd150abd94f14b9f4416b2dc5d42756b1901f963144e8850cafe344ce657443f188d32d4cc008437a3d42430cda9b02a0d25f741f0417ede3b838cc848a8
+ data.tar.gz: 0ec2641fc7346b44376460a8587a70248f7f61d249d3277e6a9a50f7d2f2f8b7ad8225f4f1bb4e1d797ee6adc85242026acb71ec9c8526499f403b86bd608d6c
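The SHA256 and SHA512 values above are hex digests of the two archives packed inside the gem. A minimal sketch of how such a digest could be compared using Ruby's stdlib `digest`, assuming the archive bytes are already in memory; `digest_matches?` is a hypothetical helper for illustration, not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compare the SHA256 hex digest of some bytes
# against an expected value such as one listed in checksums.yaml.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Demo with in-memory data instead of a real data.tar.gz.
sample = 'honey'
expected = Digest::SHA256.hexdigest(sample)

puts digest_matches?(sample, expected)       # prints "true"
puts digest_matches?(sample + '!', expected) # prints "false"
```

For a real archive the bytes would come from something like `File.binread`, and `Digest::SHA512` can be swapped in for the longer digests.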
data/CHANGELOG.md CHANGED
@@ -1,3 +1,19 @@
+ # HEAD
+
+ # v0.13.0
+
+ :warning: This release contains some backwards compatible changes.
+
+ * Extract `Matrix` super class from `CSV`
+ * Add `Header#empty?` and `Rows#empty?`
+ * Value converters [[#PR15](https://github.com/buren/honey_format/pull/15)]
+ + Convert column value to number, date, etc..
+ + Additional converters in [[#PR20](https://github.com/buren/honey_format/pull/20)]
+ * Add support for CSV row delimiter and quote character [[#PR15](https://github.com/buren/honey_format/pull/15)]
+ * :warning: `CSV#header` now returns an instance of `Header` instead of an array of the original header columns [[#PR15](https://github.com/buren/honey_format/pull/15)]
+ * Add `--[no-]rows-only` CLI option
+ * Rename `--[no-]only-header` CLI option to `--[no-]header-only`
+
  # v0.12.0
 
  * Add `--[no-]only-header` option to CLI
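The value converters listed under v0.13.0 pair a column name with a named converter. A minimal sketch of that idea in plain Ruby, assuming nothing about HoneyFormat's internals: `ConverterRegistry` and `convert_row` are hypothetical names used only for illustration, and only the converter names `:integer`, `:decimal` and `:upcased` are taken from the README in this diff.

```ruby
# Hypothetical stand-in for a converter registry; not HoneyFormat's own code.
class ConverterRegistry
  def initialize
    @converters = {
      integer: ->(v) { Integer(v) },
      decimal: ->(v) { Float(v) }
    }
  end

  # Register any object that responds to #call under a symbol name.
  def register(name, converter)
    @converters[name.to_sym] = converter
  end

  # Look up a converter, raising KeyError on unknown names.
  def [](name)
    @converters.fetch(name.to_sym)
  end
end

# Apply a type map (column => converter name) to one row hash.
# Columns without an entry in the type map are passed through unchanged.
def convert_row(row, type_map, registry)
  row.each_with_object({}) do |(column, value), out|
    name = type_map[column]
    out[column] = name ? registry[name].call(value) : value
  end
end

registry = ConverterRegistry.new
registry.register(:upcased, proc { |v| v.upcase })

row = { id: '1', username: 'buren' }
converted = convert_row(row, { id: :integer, username: :upcased }, registry)
p converted
```

Keeping converters as plain callables means a lambda, a proc or any object with `#call` can be registered the same way.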
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)
 
- Copyright (c) 2015 Jacob Burenstam Linder
+ Copyright (c) 2018 Jacob Burenstam Linder
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,28 +1,40 @@
- # HoneyFormat [![Build Status](https://travis-ci.org/buren/honey_format.svg)](https://travis-ci.org/buren/honey_format) [![Code Climate](https://codeclimate.com/github/buren/honey_format/badges/gpa.svg)](https://codeclimate.com/github/buren/honey_format) [![Inline docs](http://inch-ci.org/github/buren/honey_format.svg)](http://inch-ci.org/github/buren/honey_format)
+ # HoneyFormat [![Build Status](https://travis-ci.org/buren/honey_format.svg)](https://travis-ci.org/buren/honey_format) [![Code Climate](https://codeclimate.com/github/buren/honey_format/badges/gpa.svg)](https://codeclimate.com/github/buren/honey_format) [![Inline docs](http://inch-ci.org/github/buren/honey_format.svg)](https://www.rubydoc.info/gems/honey_format/)
 
- Convert CSV to an array of objects with with ease.
+ > Makes working with CSVs as smooth as honey.
+
+ Proper objects for CSV headers and rows, convert column values, filter columns and rows, small(-ish) perfomance overhead, no dependencies other than Ruby stdlib.
 
  ## Features
 
  - Proper objects for CSV header and rows
- - Convert column values with custom row builder
+ - Convert column values
+ - Pass your own custom row builder
  - Convert header column names
- - Customize what columns and rows are included in CSV output
- - [CLI](#cli)
+ - Filter what columns and rows are included in CSV output
+ - [CLI](#cli) - Simple command line interface
+ - Only ~5-10% overhead from using Ruby CSV, see [benchmarks](#benchmark)
  - Has no dependencies other than Ruby stdlib
  - Supports Ruby >= 2.3
 
- ## Examples
+ Read the [usage section](#usage), [RubyDoc](https://www.rubydoc.info/gems/honey_format/) or [examples/ directory](https://github.com/buren/honey_format/tree/master/examples) for how to use this gem.
+
+ ## Quick use
 
- See [examples/](https://github.com/buren/honey_format/tree/master/examples) for more examples.
 
  ```ruby
- csv_string = "Id,Username\n1,buren"
- csv = HoneyFormat::CSV.new(csv_string)
- csv.header # => ["Id", "Username"]
+ csv_string = <<-CSV
+ Id,Username,Email
+ 1,buren,buren@example.com
+ 2,jacob,jacob@example.com
+ CSV
+ csv = HoneyFormat::CSV.new(csv_string, type_map: { id: :integer })
+ csv.columns # => [:id, :username]
  user = csv.rows # => [#<struct id="1", username="buren">]
- user.id # => "1"
+ user.id # => 1
  user.username # => "buren"
+
+ csv.to_csv(columns: [:id, :username]) { |row| row.id < 2 }
+ # => "id,username\n1,buren\n"
  ```
 
  ## Installation
@@ -45,21 +57,73 @@ $ gem install honey_format
 
  ## Usage
 
- By default assumes a header in the CSV file.
+ By default assumes a header in the CSV file
 
  ```ruby
  csv_string = "Id,Username\n1,buren"
  csv = HoneyFormat::CSV.new(csv_string)
- csv.header # => ["Id", "Username"]
- csv.columns # => [:id, :username]
 
+ # Header
+ header = csv.header
+ header.original # => ["Id", "Username"]
+ header.columns # => [:id, :username]
+
+
+ # Rows
  rows = csv.rows # => [#<struct id="1", username="buren">]
  user = rows.first
  user.id # => "1"
  user.username # => "buren"
  ```
 
- Minimal custom row builder
+ Set delimiter & quote character
+ ```ruby
+ csv_string = "name;id|'John Doe';42"
+ csv = HoneyFormat::CSV.new(
+   csv_string,
+   delimiter: ';',
+   row_delimiter: '|',
+   quote_character: "'",
+ )
+ ```
+
+ __Type converters__
+
+ > Type converters are great if you want to convert column values, like numbers and dates.
+
+ There are a few default type converters
+ ```ruby
+ csv_string = "Id,Username\n1,buren"
+ type_map = { id: :integer }
+ csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
+ csv.rows.first.id # => 1
+ ```
+
+ Add your own converter
+ ```ruby
+ HoneyFormat.configure do |config|
+   config.converter.register :upcased, proc { |v| v.upcase }
+ end
+
+ csv_string = "Id,Username\n1,buren"
+ type_map = { username: :upcased }
+ csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
+ csv.rows.first.username # => "BUREN"
+ ```
+
+ Access registered converters
+ ```ruby
+ decimal_converter = HoneyFormat.value_converter[:decimal]
+ decimal_converter.call('1.1') # => 1.1
+ ```
+
+ See [`ValueConverter::DEFAULT_CONVERTERS`](https://github.com/buren/honey_format/tree/master/lib/honey_format/value_converter.rb) for a complete list of the default ones.
+
+ __Row builder__
+
+ > Pass your own row builder if you want more control of the entire row or if you want to return your own row object.
+
+ Custom row builder
  ```ruby
  csv_string = "Id,Username\n1,buren"
  upcaser = ->(row) { row.tap { |r| r.username.upcase! } }
@@ -67,26 +131,38 @@ csv = HoneyFormat::CSV.new(csv_string, row_builder: upcaser)
  csv.rows # => [#<struct id="1", username="BUREN">]
  ```
 
- Complete custom row builder
+ As long as the row builder responds to `#call` you can pass anything you like
  ```ruby
  class Anonymizer
-   def self.call(row)
+   def call(row)
+     @cache ||= {}
      # Return an object you want to represent the row
      row.tap do |r|
-       r.name = '<anon>'
-       r.email = '<anon>'
-       r.ssn = '<anon>'
+       # given the same value make sure to return the same anonymized value every time
+       @cache[r.email] ||= "#{SecureRandom.hex(6)}@example.com"
+       r.email = @cache[r.email]
        r.payment_id = '<scrubbed>'
      end
    end
  end
 
- csv_string = "Id,Username\n1,buren"
- csv = HoneyFormat::CSV.new(csv_string, row_builder: Anonymizer)
- csv.rows # => [#<struct id="1", username="BUREN">]
+ csv_string = <<~CSV
+ Email,Payment ID
+ buren@example.com,123
+ buren@example.com,998
+ CSV
+ csv = HoneyFormat::CSV.new(csv_string, row_builder: Anonymizer.new)
+ csv.rows.to_csv(columns: [:email])
+ # => 8f6ed70a7f98@example.com
+ # 8f6ed70a7f98@example.com
+ # 0db96f350cea@example.com
  ```
 
- Output CSV
+ __Output CSV__
+
+ > Makes it super easy to output a subset of columns/rows.
+
+ Manipulate the rows before output
  ```ruby
  csv_string = "Id,Username\n1,buren"
  csv = HoneyFormat::CSV.new(csv_string)
@@ -94,35 +170,33 @@ csv.rows.each { |row| row.id = nil }
  csv.to_csv # => "id,username\n,buren\n"
  ```
 
- Output a subset of columns to CSV
+ Output a subset of columns
  ```ruby
  csv_string = "Id, Username, Country\n1,buren,Sweden"
  csv = HoneyFormat::CSV.new(csv_string)
  csv.to_csv(columns: [:id, :country]) # => "id,country\nburen,Sweden\n"
  ```
 
- Output a subset of rows to CSV
+ Output a subset of rows
  ```ruby
  csv_string = "Name, Country\nburen,Sweden\njacob,Denmark"
  csv = HoneyFormat::CSV.new(csv_string)
  csv.to_csv { |row| row.country == 'Sweden' } # => "name,country\nburen,Sweden\n"
  ```
 
- You can of course set the delimiter
- ```ruby
- HoneyFormat::CSV.new(csv_string, delimiter: ';')
- ```
+ __Headers__
 
- Validate CSV header
+ > By default generates method-like names for each header column, but also gives you full control: define them or convert them.
+
+ By default assumes a header in the CSV file.
  ```ruby
  csv_string = "Id,Username\n1,buren"
- # Invalid
- HoneyFormat::CSV.new(csv_string, valid_columns: [:something, :username])
- # => HoneyFormat::UnknownHeaderColumnError (column :id not in [:something, :username])
+ csv = HoneyFormat::CSV.new(csv_string)
 
- # Valid
- csv = HoneyFormat::CSV.new(csv_string, valid_columns: [:id, :username])
- csv.rows.first.username # => "buren"
+ # Header
+ header = csv.header
+ header.original # => ["Id", "Username"]
+ header.columns # => [:id, :username]
  ```
 
  Define header
@@ -132,32 +206,33 @@ csv = HoneyFormat::CSV.new(csv_string, header: ['Id', 'Username'])
  csv.rows.first.username # => "buren"
  ```
 
- If your header contains special chars and/or chars that can't be part of Ruby method names,
- things can get a little awkward..
+ Set default header converter
  ```ruby
- csv_string = "ÅÄÖ\nSwedish characters"
- user = HoneyFormat::CSV.new(csv_string).rows.first
- # Note that these chars aren't "downcased" in Ruby 2.3 and older versions of Ruby,
- # "ÅÄÖ".downcase # => "ÅÄÖ"
- user.ÅÄÖ # => "Swedish characters"
- # while on Ruby > 2.3
- user.åäö
+ HoneyFormat.configure do |config|
+   config.header_converter = proc { |v| v.downcase }
+ end
 
- csv_string = "First^Name\nJacob"
- user = HoneyFormat::CSV.new(csv_string).rows.first
- user.public_send(:"first^name") # => "Jacob"
- # or
- user['first^name'] # => "Jacob"
+ # you can get the default one with
+ header_converter = HoneyFormat.value_converter[:header_column]
+ header_converter.call('First name') # => "first_name"
+ ```
+
+ Use any value converter as the header converter
+ ```ruby
+ csv_string = "Id,Username\n1,buren"
+ csv = HoneyFormat::CSV.new(csv_string, header_converter: :upcase)
+ csv.columns # => [:ID, :USERNAME]
  ```
 
  Pass your own header converter
  ```ruby
  map = { 'First^Name' => :first_name }
- converter = ->(column) { map.fetch(column, column) }
+ converter = ->(column) { map.fetch(column, column.downcase) }
 
- csv_string = "First^Name\nJacob"
+ csv_string = "ID,First^Name\n1,Jacob"
  user = HoneyFormat::CSV.new(csv_string, header_converter: converter).rows.first
  user.first_name # => "Jacob"
+ user.id # => "1"
  ```
 
  Missing header values
@@ -168,9 +243,30 @@ user = csv.rows.first
  user.column1 # => "val1"
  ```
 
- Errors
+ If your header contains special chars and/or chars that can't be part of Ruby method names,
+ things can get a little awkward..
+ ```ruby
+ csv_string = "ÅÄÖ\nSwedish characters"
+ user = HoneyFormat::CSV.new(csv_string).rows.first
+ # Note that these chars aren't "downcased" in Ruby 2.3 and older versions of Ruby,
+ # "ÅÄÖ".downcase # => "ÅÄÖ"
+ user.ÅÄÖ # => "Swedish characters"
+ # while on Ruby > 2.3
+ user.åäö
+
+ csv_string = "First^Name\nJacob"
+ user = HoneyFormat::CSV.new(csv_string).rows.first
+ user.public_send(:"first^name") # => "Jacob"
+ # or
+ user['first^name'] # => "Jacob"
+ ```
+
+ __Errors__
+
+ > When you need that some extra safety.
+
+ If you want to there are some errors you can rescue
  ```ruby
- # there are two error super classes
  begin
    HoneyFormat::CSV.new(csv_string)
  rescue HoneyFormat::HeaderError => e
@@ -184,13 +280,17 @@ end
 
  You can see all [available errors here](https://www.rubydoc.info/gems/honey_format/HoneyFormat/Errors).
 
- If you want to see more usage examples check out the `spec/` directory.
+ If you want to see more usage examples check out the [`examples/`](https://github.com/buren/honey_format/tree/master/examples) and [`spec/`](https://github.com/buren/honey_format/tree/master/spec) directories.
 
  ## CLI
 
+ > Perfect when you want to get something simple done quickly.
+
  ```
  Usage: honey_format [file.csv] [options]
  --csv=input.csv CSV file
+ --[no-]header-only Print only the header
+ --[no-]rows-only Print only the rows
  --columns=id,name Select columns.
  --output=output.csv CSV output (STDOUT otherwise)
  --delimiter=, CSV delimiter (default: ,)
@@ -198,30 +298,48 @@ Usage: honey_format [file.csv] [options]
  --version Show version
  ```
 
- ## Benchmark
-
- _Note_: This gem, adds some overhead to parsing a CSV string. I've included some benchmarks below, your mileage may vary..
-
- You can run the benchmarks yourself:
+ Output a subset of columns to a new file
+ ```
+ # input.csv
+ id,name,username
+ 1,jacob,buren
+ ```
 
  ```
- $ bin/benchmark file.csv
+ $ honey_format input.csv --columns=id,username > output.csv
  ```
 
+
+ ## Benchmark
+
+ _Note_: This gem, adds some overhead to parsing a CSV string, typically ~5-10%. I've included some benchmarks below, your mileage may vary..
+
  204KB (1k lines)
 
  ```
- stdlib CSV: 51.9 i/s
- HoneyFormat::CSV: 49.6 i/s - 1.05x slower
+ CSV no options: 51.0 i/s
+ CSV with header: 36.1 i/s - 1.41x slower
+ HoneyFormat::CSV: 48.7 i/s - 1.05x slower
  ```
 
  2MB (10k lines)
 
  ```
- stdlib CSV: 4.6 i/s
- HoneyFormat::CSV: 4.2 i/s - 1.08x slower
+ CSV no options: 5.1 i/s
+ CSV with header: 3.6 i/s - 1.42x slower
+ HoneyFormat::CSV: 4.9 i/s - 1.05x slower
  ```
 
+ You can run the benchmarks yourself
+ ```
+ Usage: bin/benchmark [file.csv] [options]
+ --csv=[file1.csv] CSV file(s)
+ --[no-]verbose Verbose output
+ --lines-multipliers=[1,2,10] Multiply the rows in the CSV file (default: 1)
+ --time=[30] Benchmark time (default: 30)
+ --warmup=[30] Benchmark warmup (default: 30)
+ -h, --help How to use
+ ```
 
  ## Development
 
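The updated README states that a row builder only has to respond to `#call`. The sketch below illustrates that contract without the gem, assuming a hypothetical `build_rows` stand-in for the place where a library would invoke the builder; `Row` is likewise an illustrative struct, not HoneyFormat's row class.

```ruby
require 'securerandom'

# Illustrative row object; the gem builds its own row structs.
Row = Struct.new(:email, :payment_id)

# Stateful builder: the same input email always maps to the same anonymized value.
class Anonymizer
  def initialize
    @cache = {}
  end

  def call(row)
    row.tap do |r|
      r.email = (@cache[r.email] ||= "#{SecureRandom.hex(6)}@example.com")
      r.payment_id = '<scrubbed>'
    end
  end
end

# Hypothetical stand-in for the code that would hand each row to the builder.
def build_rows(raw_rows, row_builder)
  raw_rows.map { |email, payment_id| row_builder.call(Row.new(email, payment_id)) }
end

raw = [['buren@example.com', '123'], ['buren@example.com', '998']]

# An instance with state and a stateless lambda both satisfy the same contract.
anonymized = build_rows(raw, Anonymizer.new)
upcased = build_rows(raw, ->(row) { row.tap { |r| r.email = r.email.upcase } })

puts anonymized.map(&:email).uniq.length # prints "1"
puts upcased.first.email                 # prints "BUREN@EXAMPLE.COM"
```

Passing an instance rather than the class is what lets the builder keep a per-run cache.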
data/bin/benchmark CHANGED
@@ -4,72 +4,43 @@ require 'honey_format'
 
  require 'benchmark/ips'
  require 'csv'
- require 'optparse'
 
- input_path = nil
- benchmark_time = 30
- benchmark_warmup = 5
- lines_multipliers = [1]
+ require 'honey_format/cli/benchmark_cli'
 
- OptionParser.new do |parser|
-   parser.banner = "Usage: bin/benchmark [file.csv] [options]"
-   parser.default_argv = ARGV
+ cli = HoneyFormat::BenchmarkCLI.new
+ Writer = cli.writer
+ options = cli.options
 
-   parser.on("--csv=file1.csv", String, "CSV file(s)") do |value|
-     input_path = value
-   end
-
-   parser.on("--lines-multipliers=[1,10,50]", Array, "Multiply the rows in the CSV file (default: 1)") do |value|
-     lines_multipliers = value.map do |v|
-       Integer(v).tap do |int|
-         unless int >= 1
-           raise(ArgumentError, '--lines-multiplier must be 1 or greater')
-         end
-       end
-     end
-   end
-
-   parser.on("--time=[30]", String, "Benchmark time (default: 30)") do |value|
-     benchmark_time = Integer(value)
-   end
-
-   parser.on("--warmup=[30]", String, "Benchmark warmup (default: 30)") do |value|
-     benchmark_warmup = Integer(value)
-   end
-
-   parser.on("-h", "--help", "How to use") do
-     puts parser
-     exit
-   end
+ input_path = options[:input_path]
+ benchmark_time = options[:benchmark_time]
+ benchmark_warmup = options[:benchmark_warmup]
+ lines_multipliers = options[:lines_multipliers]
 
-   # No argument, shows at tail. This will print an options summary.
-   parser.on_tail("-h", "--help", "Show this message") do
-     puts parser
-     exit
-   end
- end.parse!
+ original_csv = input_path ? File.read(input_path) : cli.fetch_default_benchmark_csv
+ original_csv_lines = original_csv.lines
 
- csv = File.read(input_path)
+ runtime_seconds = cli.expected_runtime_seconds(report_count: 3)
+ Writer.puts "Expected runtime: ~#{runtime_seconds} seconds.", verbose: true
 
- lines_multipliers.each do |lines_multiplier|
-   if lines_multiplier > 1
-     orignial_csv_lines = csv.lines
-     rows = orignial_csv_lines[1..-1] * lines_multiplier
-     csv = orignial_csv_lines.first + rows.join
-   end
+ lines_multipliers.each_with_index do |lines_multiplier, index|
+   rows = original_csv_lines[1..-1] * lines_multiplier
+   csv = original_csv_lines.first + rows.join
 
    line_count = csv.lines.length
 
-   puts "== [START] Benchmark for #{line_count} lines =="
+   Writer.puts "== Benchmark #{index + 1} of #{lines_multipliers.length} =="
+   Writer.puts "path #{cli.used_input_path}"
+   Writer.puts "lines #{line_count}"
+   Writer.puts "multiplier #{lines_multiplier}"
+
    Benchmark.ips do |x|
      x.time = benchmark_time
      x.warmup = benchmark_warmup
 
-     x.report('stdlib CSV no options') { CSV.parse(csv) }
-     x.report('stdlib CSV with header') { CSV.parse(csv, headers: true) }
-     x.report('HoneyFormat::CSV') { HoneyFormat::CSV.new(csv).rows }
+     x.report('CSV no options') { CSV.parse(csv) }
+     x.report('CSV with header') { CSV.parse(csv, headers: true) }
+     x.report('HoneyFormat::CSV') { HoneyFormat::CSV.new(csv).rows }
 
      x.compare!
    end
-   puts "== [END] Benchmark for #{line_count} lines =="
  end
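The new loop always derives each benchmark input from the original lines, so multipliers no longer compound across iterations as they did when `csv` was reassigned in the old version. A minimal stdlib sketch of that row multiplication step; `multiply_rows` is a hypothetical helper name, and the input is assumed to end with a newline.

```ruby
# Repeat the data rows of a CSV string while keeping a single header line,
# mirroring the multiplication step in the benchmark script.
def multiply_rows(csv_string, multiplier)
  raise ArgumentError, 'multiplier must be 1 or greater' unless multiplier >= 1

  lines = csv_string.lines
  header = lines.first
  rows = lines[1..-1] * multiplier
  header + rows.join
end

csv = "Id,Username\n1,buren\n2,jacob\n"
[1, 2, 10].each do |multiplier|
  line_count = multiply_rows(csv, multiplier).lines.length
  puts "multiplier #{multiplier}: #{line_count} lines"
end
# prints:
# multiplier 1: 3 lines
# multiplier 2: 5 lines
# multiplier 10: 21 lines
```

If the input lacks a trailing newline, the last row of one copy is joined to the first row of the next, so a real script may want to normalize line endings first.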