csvreader 1.2.4 → 1.2.5

data/README.md CHANGED
@@ -1,682 +1,682 @@
1
- # csvreader - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)
2
-
3
-
4
- * home :: [github.com/csvreader/csvreader](https://github.com/csvreader/csvreader)
5
- * bugs :: [github.com/csvreader/csvreader/issues](https://github.com/csvreader/csvreader/issues)
6
- * gem :: [rubygems.org/gems/csvreader](https://rubygems.org/gems/csvreader)
7
- * rdoc :: [rubydoc.info/gems/csvreader](http://rubydoc.info/gems/csvreader)
8
- * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
9
-
10
-
11
-
12
-
13
- ## What's News?
14
-
15
- **v1.2.2** Added auto-fix/correction/recovery
16
- for double quoted value with extra trailing value
17
- to the default parser (`ParserStd`) e.g. `"Freddy" Mercury`
18
- will get read "as is" and turned
19
- into an "unquoted" value with "literal" quotes e.g. `"Freddy" Mercury`.
20
-
21
-
22
- **v1.2.1** Added support for (optional) hashtag to the
23
- default parser (`ParserStd`) for
24
- supporting the [Humanitarian eXchange Language (HXL)](https://github.com/csvspecs/csv-hxl).
25
- Default is turned off (`false`). Use `Csv.human`
26
- or `Csv.hum` or `Csv.hxl` for a pre-defined parser with hashtags turned on.
27
-
28
-
29
- **v1.2** Added support for alternative (non-space) separators (e.g. `;|^:`)
30
- to the default parser (`ParserStd`).
31
-
32
-
33
- **v1.1.5** Added built-in support for (optional) alternative space
34
- character
35
- (e.g. `_-+•`)
36
- to the default parser (`ParserStd`) and the table parser (`ParserTable`).
37
- Turns `Man_Utd` into `Man Utd`, for example. Default is turned off (`nil`).
38
-
39
-
40
- **v1.1.4** Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
41
- e.g. `Csv.table.parse( txt )`.
42
-
43
-
44
- **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
45
- Now you can use both, that is, single (`‹...›` or `›...‹`)
46
- or double (`«...»` or `»...«`).
47
- Note: A quote only "kicks-in" if it's the first (non-whitespace)
48
- character of the value (otherwise it's just a "vanilla" literal character).
49
-
50
-
51
- **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
52
- Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
53
- like in ruby (or javascript or html or ...) :-).
54
- Note: A quote only "kicks-in" if it's the first (non-whitespace)
55
- character of the value (otherwise it's just a "vanilla" literal character)
56
- e.g. `48°51'24"N` needs no quote :-).
57
- With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
58
-
59
-
60
-
61
- **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
62
- [ARFF (attribute-relation file format)](https://github.com/csvspecs/csv-meta#attribute-relation-classic) -
63
- and support for (optional) directives (`@`) in header (that is, before any records)
64
- to default parser ("The Right Way").
65
- Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
66
- Now you can use either a front matter (`---`) block
67
- or directives (e.g. `@attribute`, `@relation`, etc.)
68
- for meta data, the first one "wins" - you CANNOT use both.
69
-
70
-
71
- **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
72
- e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
73
-
74
-
75
- **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
76
- in header (that is, before any records)
77
- to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](https://github.com/csvspecs/csv-meta#front-matter-in-yaml).
78
- Use `Csv.parser.meta` to get the parsed meta data block as a hash (or `nil` if none).
79
-
80
-
81
-
82
-
83
- ## Usage
84
-
85
-
86
- ``` ruby
87
- txt = <<TXT
88
- 1,2,3
89
- 4,5,6
90
- TXT
91
-
92
- records = Csv.parse( txt ) ## or CsvReader.parse
93
- pp records
94
- # => [["1","2","3"],
95
- # ["4","5","6"]]
96
-
97
- # -or-
98
-
99
- records = Csv.read( "values.csv" ) ## or CsvReader.read
100
- pp records
101
- # => [["1","2","3"],
102
- # ["4","5","6"]]
103
-
104
- # -or-
105
-
106
- Csv.foreach( "values.csv" ) do |rec| ## or CsvReader.foreach
107
- pp rec
108
- end
109
- # => ["1","2","3"]
110
- # => ["4","5","6"]
111
- ```
112
-
113
-
114
- ### What about type inference and data converters?
115
-
116
- Use the converters keyword option to (auto-)convert strings to nulls, booleans, integers, floats, dates, etc.
117
- Example:
118
-
119
- ``` ruby
120
- txt = <<TXT
121
- 1,2,3
122
- true,false,null
123
- TXT
124
-
125
- records = Csv.parse( txt, :converters => :all ) ## or CsvReader.parse
126
- pp records
127
- # => [[1,2,3],
128
- # [true,false,nil]]
129
- ```
130
-
131
-
132
- Built-in converters include:
133
-
134
- | Converter | Comments |
135
- |--------------|-------------------|
136
- | `:integer` | convert matching strings to integer |
137
- | `:float` | convert matching strings to float |
138
- | `:numeric` | shortcut for `[:integer, :float]` |
139
- | `:date` | convert matching strings to `Date` (year/month/day) |
140
- | `:date_time` | convert matching strings to `DateTime` |
141
- | `:null` | convert matching strings to null (`nil`) |
142
- | `:boolean` | convert matching strings to boolean (`true` or `false`) |
143
- | `:all` | shortcut for `[:null, :boolean, :date_time, :numeric]` |
144
-
145
-
146
- Or add your own converters. Example:
147
-
148
- ``` ruby
149
- Csv.parse( 'Ruby, 2020-03-01, 100', converters: [->(v) { Time.parse(v) rescue v }] )
150
- #=> [["Ruby", 2020-03-01 00:00:00 +0200, "100"]]
151
- ```
152
-
153
- A custom converter is a method that gets the value passed in
154
- and if successful returns a non-string type (e.g. integer, float, date, etc.)
155
- or a string (for further processing with all other converters in the "pipeline" configuration).
156
-
157
-
158
-
159
- ### What about Enumerable?
160
-
161
- Yes, every reader includes `Enumerable` and runs on `each`.
162
- Use `new` or `open` without a block
163
- to get the enumerator (iterator).
164
- Example:
165
-
166
-
167
- ``` ruby
168
- csv = Csv.new( "a,b,c" )
169
- it = csv.to_enum
170
- pp it.next
171
- # => ["a","b","c"]
172
-
173
- # -or-
174
-
175
- csv = Csv.open( "values.csv" )
176
- it = csv.to_enum
177
- pp it.next
178
- # => ["1","2","3"]
179
- pp it.next
180
- # => ["4","5","6"]
181
- ```
182
-
183
-
184
-
185
-
186
-
187
- ### What about headers?
188
-
189
- Use the `CsvHash`
190
- if the first line is a header (or if missing pass in the headers
191
- as an array) and you want your records as hashes instead of arrays of strings.
192
- Example:
193
-
194
- ``` ruby
195
- txt = <<TXT
196
- A,B,C
197
- 1,2,3
198
- 4,5,6
199
- TXT
200
-
201
- records = CsvHash.parse( txt ) ## or CsvHashReader.parse
202
- pp records
203
-
204
- # -or-
205
-
206
- txt2 = <<TXT
207
- 1,2,3
208
- 4,5,6
209
- TXT
210
-
211
- records = CsvHash.parse( txt2, headers: ["A","B","C"] ) ## or CsvHashReader.parse
212
- pp records
213
-
214
- # => [{"A": "1", "B": "2", "C": "3"},
215
- # {"A": "4", "B": "5", "C": "6"}]
216
-
217
- # -or-
218
-
219
- records = CsvHash.read( "hash.csv" ) ## or CsvHashReader.read
220
- pp records
221
- # => [{"A": "1", "B": "2", "C": "3"},
222
- # {"A": "4", "B": "5", "C": "6"}]
223
-
224
- # -or-
225
-
226
- CsvHash.foreach( "hash.csv" ) do |rec| ## or CsvHashReader.foreach
227
- pp rec
228
- end
229
- # => {"A": "1", "B": "2", "C": "3"}
230
- # => {"A": "4", "B": "5", "C": "6"}
231
- ```
232
-
233
-
234
- ### What about symbol keys for hashes?
235
-
236
- Yes, you can use the header_converters keyword option.
237
- Use `:symbol` for (auto-)converting header (strings) to symbols.
238
- Note: the symbol converter will also downcase all letters and
239
- remove all non-alphanumeric (e.g. `!?$%`) chars
240
- and replace spaces with underscores.
241
-
242
- Example:
243
-
244
- ``` ruby
245
- txt = <<TXT
246
- a,b,c
247
- 1,2,3
248
- true,false,null
249
- TXT
250
-
251
- records = CsvHash.parse( txt, :converters => :all, :header_converters => :symbol )
252
- pp records
253
- # => [{a: 1, b: 2, c: 3},
254
- # {a: true, b: false, c: nil}]
255
-
256
- # -or-
257
- options = { :converters => :all,
258
- :header_converters => :symbol }
259
-
260
- records = CsvHash.parse( txt, options )
261
- pp records
262
- # => [{a: 1, b: 2, c: 3},
263
- # {a: true, b: false, c: nil}]
264
- ```
265
-
266
- Built-in header converters include:
267
-
268
- | Converter | Comments |
269
- |--------------|---------------------|
270
- | `:downcase` | downcase strings |
271
- | `:symbol` | convert strings to symbols (and downcase and remove non-alphanumerics) |
272
-
273
-
274
-
275
- ### What about (typed) structs?
276
-
277
- See the [csvrecord library »](https://github.com/csvreader/csvrecord)
278
-
279
- Example from the csvrecord docs:
280
-
281
- Step 1: Define a (typed) struct for the comma-separated values (csv) records. Example:
282
-
283
- ```ruby
284
- require 'csvrecord'
285
-
286
- Beer = CsvRecord.define do
287
- field :brewery ## note: default type is :string
288
- field :city
289
- field :name
290
- field :abv, Float ## allows type specified as class (or use :float)
291
- end
292
- ```
293
-
294
- or in "classic" style:
295
-
296
- ```ruby
297
- class Beer < CsvRecord::Base
298
- field :brewery
299
- field :city
300
- field :name
301
- field :abv, Float
302
- end
303
- ```
304
-
305
-
306
- Step 2: Read in the comma-separated values (csv) datafile. Example:
307
-
308
- ```ruby
309
- beers = Beer.read( 'beer.csv' )
310
-
311
- puts "#{beers.size} beers:"
312
- pp beers
313
- ```
314
-
315
- pretty prints (pp):
316
-
317
- ```
318
- 6 beers:
319
- [#<Beer:0x302c760 @values=
320
- ["Andechser Klosterbrauerei", "Andechs", "Doppelbock Dunkel", 7.0]>,
321
- #<Beer:0x3026fe8 @values=
322
- ["Augustiner Br\u00E4u M\u00FCnchen", "M\u00FCnchen", "Edelstoff", 5.6]>,
323
- #<Beer:0x30257a0 @values=
324
- ["Bayerische Staatsbrauerei Weihenstephan", "Freising", "Hefe Weissbier", 5.4]>,
325
- ...
326
- ]
327
- ```
328
-
329
- Or loop over the records. Example:
330
-
331
- ``` ruby
332
- Beer.read( 'beer.csv' ).each do |rec|
333
- puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
334
- end
335
-
336
- # -or-
337
-
338
- Beer.foreach( 'beer.csv' ) do |rec|
339
- puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
340
- end
341
- ```
342
-
343
-
344
- printing:
345
-
346
- ```
347
- Doppelbock Dunkel (7.0%) by Andechser Klosterbrauerei, Andechs
348
- Edelstoff (5.6%) by Augustiner Bräu München, München
349
- Hefe Weissbier (5.4%) by Bayerische Staatsbrauerei Weihenstephan, Freising
350
- Rauchbier Märzen (5.1%) by Brauerei Spezial, Bamberg
351
- Münchner Dunkel (5.0%) by Hacker-Pschorr Bräu, München
352
- Hofbräu Oktoberfestbier (6.3%) by Staatliches Hofbräuhaus München, München
353
- ```
354
-
355
-
356
- ### What about tabular data packages with pre-defined types / schemas?
357
-
358
- See the [csvpack library »](https://github.com/csvreader/csvpack)
359
-
360
-
361
-
362
-
363
-
364
- ## Frequently Asked Questions (FAQ) and Answers
365
-
366
- ### Q: What's CSV the right way? What best practices can I use?
367
-
368
- Use best practices out-of-the-box with zero-configuration.
369
- Do you know how to skip blank lines or how to add `#` single-line comments?
370
- Or how to trim leading and trailing spaces? No worries. It's turned on by default.
371
-
372
- Yes, you can. Use
373
-
374
- ```
375
- #######
376
- # try with some comments
377
- # and blank lines even before header (first row)
378
-
379
- Brewery,City,Name,Abv
380
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
381
- Augustiner Bräu München,München,Edelstoff,5.6%
382
-
383
- Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
384
- Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
385
- Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
386
- Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
387
- ```
388
-
389
- instead of strict "classic"
390
- (no blank lines, no comments, no leading and trailing spaces, etc.):
391
-
392
- ```
393
- Brewery,City,Name,Abv
394
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
395
- Augustiner Bräu München,München,Edelstoff,5.6%
396
- Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
397
- Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
398
- Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
399
- Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
400
- ```
401
-
402
-
403
- Or use the ARFF (attribute-relation file format)-like alternative style
404
- with `%` for comments and `@`-directives
405
- for "meta data" in the header (before any records):
406
-
407
- ```
408
- %%%%%%%%%%%%%%%%%%
409
- % try with some comments
410
- % and blank lines even before @-directives in header
411
-
412
- @RELATION Beer
413
-
414
- @ATTRIBUTE Brewery
415
- @ATTRIBUTE City
416
- @ATTRIBUTE Name
417
- @ATTRIBUTE Abv
418
-
419
- @DATA
420
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
421
- Augustiner Bräu München,München,Edelstoff,5.6%
422
-
423
- Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
424
- Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
425
- Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
426
- Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
427
- ```
428
-
429
- Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
430
- inside comments (for easier backwards compatibility with old readers)
431
- for "meta data" in the header (before any records):
432
-
433
- ```
434
- ##########################
435
- # try with some comments
436
- # and blank lines even before @-directives in header
437
- #
438
- # @RELATION Beer
439
- #
440
- # @ATTRIBUTE Brewery
441
- # @ATTRIBUTE City
442
- # @ATTRIBUTE Name
443
- # @ATTRIBUTE Abv
444
-
445
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
446
- Augustiner Bräu München,München,Edelstoff,5.6%
447
-
448
- Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
449
- Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
450
- Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
451
- Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
452
- ```
453
-
454
-
455
-
456
- ### Q: How can I change the default format / dialect?
457
-
458
- The reader includes more than half a dozen pre-configured formats
459
- (dialects).
460
-
461
- Use strict if you do NOT want to trim leading and trailing spaces
462
- and if you do NOT want to skip blank lines. Example:
463
-
464
- ``` ruby
465
- txt = <<TXT
466
- 1, 2,3
467
- 4,5 ,6
468
-
469
- TXT
470
-
471
- records = Csv.strict.parse( txt )
472
- pp records
473
- # => [["1","•2","3"],
474
- # ["4","5•","6"],
475
- # [""]]
476
- ```
477
-
478
- More strict pre-configured variants include:
479
-
480
- `Csv.mysql` uses:
481
-
482
- ``` ruby
483
- ParserStrict.new( sep: "\t",
484
- quote: false,
485
- escape: true,
486
- null: "\\N" )
487
- ```
488
-
489
- `Csv.postgres` or `Csv.postgresql` uses:
490
-
491
- ``` ruby
492
- ParserStrict.new( doublequote: false,
493
- escape: true,
494
- null: "" )
495
- ```
496
-
497
- `Csv.postgres_text` or `Csv.postgresql_text` uses:
498
-
499
- ``` ruby
500
- ParserStrict.new( sep: "\t",
501
- quote: false,
502
- escape: true,
503
- null: "\\N" )
504
- ```
505
-
506
- and so on.
507
-
508
-
509
- ### Q: How can I change the separator to semicolon (`;`) or pipe (`|`) or tab (`\t`)?
510
-
511
- Pass in the `sep` keyword option
512
- to the parser. Example:
513
-
514
- ``` ruby
515
- Csv.parse( ..., sep: ';' )
516
- Csv.read( ..., sep: ';' )
517
- # ...
518
- Csv.parse( ..., sep: '|' )
519
- Csv.read( ..., sep: '|' )
520
- # and so on
521
- ```
522
-
523
- Note: If you use tab (`\t`) use the `TabReader`
524
- (or for your convenience the built-in `Csv.tab` alias)!
525
- If you use the "classic" one or more space or tab (`/[ \t]+/`) regex
526
- use the `TableReader`
527
- (or for your convenience the built-in `Csv.table` alias)!
528
-
529
-
530
- Note: The default ("The Right Way") parser does NOT allow space or tab
531
- as separator (because leading and trailing space always gets trimmed
532
- unless inside quotes, etc.). Use the `strict` parser if you want
533
- to make up your own format with space or tab as a separator
534
- or if you want that every space or tab counts (is significant).
535
-
536
-
537
-
538
- Aside: Why? Tab != CSV. Yes, tab is
539
- its own (even) simpler format
540
- (e.g. no escape rules, no newlines in values, etc.),
541
- see [`TabReader` »](https://github.com/csvreader/tabreader).
542
-
543
- ``` ruby
544
- Csv.tab.parse( ... ) # note: "classic" strict tab format
545
- Csv.tab.read( ... )
546
- # ...
547
-
548
- Csv.table.parse( ... ) # note: "classic" one or more space (or tab) table format
549
- Csv.table.read( ... )
550
- # ...
551
- ```
552
-
553
- If you want double quote escape rules, newlines in quoted values, etc. use
554
- the "strict" parser with the separator (`sep`) changed to tab (`\t`).
555
-
556
- ``` ruby
557
- Csv.strict.parse( ..., sep: "\t" ) # note: csv-like tab format with quotes
558
- Csv.strict.read( ..., sep: "\t" )
559
- # ...
560
- ```
561
-
562
-
563
-
564
-
565
- ### Q: How can I read records with fixed width fields (and no separator)?
566
-
567
- Pass in the `width` keyword option with the field widths / lengths
568
- to the "fixed" parser. Example:
569
-
570
- ``` ruby
571
- txt = <<TXT
572
- 12345678123456781234567890123456789012345678901212345678901234
573
- TXT
574
-
575
- Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
576
- # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
577
-
578
-
579
- txt = <<TXT
580
- John    Smith   john@example.com                1-888-555-6666
581
- Michele O'Reileymichele@example.com             1-333-321-8765
582
- TXT
583
-
584
- Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
585
- # => [["John", "Smith", "john@example.com", "1-888-555-6666"],
586
- # ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
587
-
588
- # and so on
589
- ```
590
-
591
- <!--
592
- Note: You can use for your convenience the built-in
593
- `Csv.fix` or `Csv.f` aliases / shortcuts.
594
- -->
595
-
596
-
597
- Note: You can use negative widths (e.g. `-2`, `-3`, and so on)
598
- to "skip" filler fields (e.g. `--`, `---`, and so on).
599
- Example:
600
-
601
- ``` ruby
602
- txt = <<TXT
603
- 12345678--12345678---12345678901234567890123456789012--12345678901234XXX
604
- TXT
605
-
606
- Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
607
- # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
608
- ```
609
-
610
-
611
-
612
-
613
-
614
- ### Q: What's broken in the standard library CSV reader?
615
-
616
- Two major design bugs and many, many minor ones.
617
-
618
- (1) The CSV class uses [`line.split(',')`](https://github.com/ruby/csv/blob/master/lib/csv.rb#L1255) with some kludges (†) with the claim it's faster.
619
- What?! The right way: CSV needs its own purpose-built parser. There's no other
620
- way you can handle all the (edge) cases with double quotes and escaped doubled up
621
- double quotes. Period.
622
-
623
- For example, the CSV class cannot handle leading or trailing spaces
624
- for double quoted values `1,•"2","3"•`.
625
- Nor can it handle double quotes inside values, and so on.
626
-
627
- (2) The CSV class returns `nil` for `,,` but an empty string (`""`)
628
- for `"","",""`. The right way: All values are always strings. Period.
629
-
630
- If you want to use `nil` you MUST configure a string (or strings)
631
- such as `NA`, `n/a`, `\N`, or similar that map to `nil`.
632
-
633
-
634
- (†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
635
-
636
- Appendix: Simple examples the standard csv library cannot read:
637
-
638
- Quoted values with leading or trailing spaces e.g.
639
-
640
- ```
641
- 1, "2","3" , "4" ,5
642
- ```
643
-
644
- =>
645
-
646
- ``` ruby
647
- ["1", "2", "3", "4", "5"]
648
- ```
649
-
650
- "Auto-fix" unambiguous quotes in "unquoted" values e.g.
651
-
652
- ```
653
- value with "quotes", another value
654
- ```
655
-
656
- =>
657
-
658
- ``` ruby
659
- ["value with \"quotes\"", "another value"]
660
- ```
661
-
662
- and some more.
663
-
664
-
665
-
666
-
667
- ## Alternatives
668
-
669
- See the Libraries & Tools section in the [Awesome CSV](https://github.com/csvspecs/awesome-csv#libraries--tools) page.
670
-
671
-
672
- ## License
673
-
674
- ![](https://publicdomainworks.github.io/buttons/zero88x31.png)
675
-
676
- The `csvreader` scripts are dedicated to the public domain.
677
- Use them as you please with no restrictions whatsoever.
678
-
679
- ## Questions? Comments?
680
-
681
- Send them along to the [wwwmake forum](http://groups.google.com/group/wwwmake).
682
- Thanks!
1
+ # csvreader - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)
2
+
3
+
4
+ * home :: [github.com/csvreader/csvreader](https://github.com/csvreader/csvreader)
5
+ * bugs :: [github.com/csvreader/csvreader/issues](https://github.com/csvreader/csvreader/issues)
6
+ * gem :: [rubygems.org/gems/csvreader](https://rubygems.org/gems/csvreader)
7
+ * rdoc :: [rubydoc.info/gems/csvreader](http://rubydoc.info/gems/csvreader)
8
+ * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
9
+
10
+
11
+
12
+
13
+ ## What's News?
14
+
15
+ **v1.2.2** Added auto-fix/correction/recovery
16
+ for double quoted value with extra trailing value
17
+ to the default parser (`ParserStd`) e.g. `"Freddy" Mercury`
18
+ will get read "as is" and turned
19
+ into an "unquoted" value with "literal" quotes e.g. `"Freddy" Mercury`.
20
+
21
+
22
+ **v1.2.1** Added support for (optional) hashtag to the
23
+ default parser (`ParserStd`) for
24
+ supporting the [Humanitarian eXchange Language (HXL)](https://github.com/csvspecs/csv-hxl).
25
+ Default is turned off (`false`). Use `Csv.human`
26
+ or `Csv.hum` or `Csv.hxl` for a pre-defined parser with hashtags turned on.
27
+
28
+
29
+ **v1.2** Added support for alternative (non-space) separators (e.g. `;|^:`)
30
+ to the default parser (`ParserStd`).
31
+
32
+
33
+ **v1.1.5** Added built-in support for (optional) alternative space
34
+ character
35
+ (e.g. `_-+•`)
36
+ to the default parser (`ParserStd`) and the table parser (`ParserTable`).
37
+ Turns `Man_Utd` into `Man Utd`, for example. Default is turned off (`nil`).
38
+
39
+
40
+ **v1.1.4** Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
41
+ e.g. `Csv.table.parse( txt )`.
42
+
43
+
44
+ **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
45
+ Now you can use both, that is, single (`‹...›` or `›...‹`)
46
+ or double (`«...»` or `»...«`).
47
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
48
+ character of the value (otherwise it's just a "vanilla" literal character).
49
+
50
+
51
+ **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
52
+ Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
53
+ like in ruby (or javascript or html or ...) :-).
54
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
55
+ character of the value (otherwise it's just a "vanilla" literal character)
56
+ e.g. `48°51'24"N` needs no quote :-).
57
+ With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
58
+
59
+
60
+
61
+ **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
62
+ [ARFF (attribute-relation file format)](https://github.com/csvspecs/csv-meta#attribute-relation-classic) -
63
+ and support for (optional) directives (`@`) in header (that is, before any records)
64
+ to default parser ("The Right Way").
65
+ Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
66
+ Now you can use either a front matter (`---`) block
67
+ or directives (e.g. `@attribute`, `@relation`, etc.)
68
+ for meta data, the first one "wins" - you CANNOT use both.
69
+
70
+
71
+ **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
72
+ e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
73
+
74
+
75
+ **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
76
+ in header (that is, before any records)
77
+ to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](https://github.com/csvspecs/csv-meta#front-matter-in-yaml).
78
+ Use `Csv.parser.meta` to get the parsed meta data block as a hash (or `nil` if none).
79
+
80
+
81
+
82
+
83
+ ## Usage
84
+
85
+
86
+ ``` ruby
87
+ txt = <<TXT
88
+ 1,2,3
89
+ 4,5,6
90
+ TXT
91
+
92
+ records = Csv.parse( txt ) ## or CsvReader.parse
93
+ pp records
94
+ # => [["1","2","3"],
95
+ # ["4","5","6"]]
96
+
97
+ # -or-
98
+
99
+ records = Csv.read( "values.csv" ) ## or CsvReader.read
100
+ pp records
101
+ # => [["1","2","3"],
102
+ # ["4","5","6"]]
103
+
104
+ # -or-
105
+
106
+ Csv.foreach( "values.csv" ) do |rec| ## or CsvReader.foreach
107
+ pp rec
108
+ end
109
+ # => ["1","2","3"]
110
+ # => ["4","5","6"]
111
+ ```
112
+
113
+
114
+ ### What about type inference and data converters?
115
+
116
+ Use the converters keyword option to (auto-)convert strings to nulls, booleans, integers, floats, dates, etc.
117
+ Example:
118
+
119
+ ``` ruby
120
+ txt = <<TXT
121
+ 1,2,3
122
+ true,false,null
123
+ TXT
124
+
125
+ records = Csv.parse( txt, :converters => :all ) ## or CsvReader.parse
126
+ pp records
127
+ # => [[1,2,3],
128
+ # [true,false,nil]]
129
+ ```
130
+
131
+
132
+ Built-in converters include:
133
+
134
+ | Converter | Comments |
135
+ |--------------|-------------------|
136
+ | `:integer` | convert matching strings to integer |
137
+ | `:float` | convert matching strings to float |
138
+ | `:numeric` | shortcut for `[:integer, :float]` |
139
+ | `:date` | convert matching strings to `Date` (year/month/day) |
140
+ | `:date_time` | convert matching strings to `DateTime` |
141
+ | `:null` | convert matching strings to null (`nil`) |
142
+ | `:boolean` | convert matching strings to boolean (`true` or `false`) |
143
+ | `:all` | shortcut for `[:null, :boolean, :date_time, :numeric]` |
144
+
145
+
146
+ Or add your own converters. Example:
147
+
148
+ ``` ruby
149
+ Csv.parse( 'Ruby, 2020-03-01, 100', converters: [->(v) { Time.parse(v) rescue v }] )
150
+ #=> [["Ruby", 2020-03-01 00:00:00 +0200, "100"]]
151
+ ```
152
+
153
+ A custom converter is a method that gets the value passed in
154
+ and if successful returns a non-string type (e.g. integer, float, date, etc.)
155
+ or a string (for further processing with all other converters in the "pipeline" configuration).
156
+
157
+
158
+
159
+ ### What about Enumerable?
160
+
161
+ Yes, every reader includes `Enumerable` and runs on `each`.
162
+ Use `new` or `open` without a block
163
+ to get the enumerator (iterator).
164
+ Example:
165
+
166
+
167
+ ``` ruby
168
+ csv = Csv.new( "a,b,c" )
169
+ it = csv.to_enum
170
+ pp it.next
171
+ # => ["a","b","c"]
172
+
173
+ # -or-
174
+
175
+ csv = Csv.open( "values.csv" )
176
+ it = csv.to_enum
177
+ pp it.next
178
+ # => ["1","2","3"]
179
+ pp it.next
180
+ # => ["4","5","6"]
181
+ ```
182
+
183
+
184
+
185
+
186
+
187
+ ### What about headers?
188
+
189
+ Use the `CsvHash`
190
+ if the first line is a header (or if missing pass in the headers
191
+ as an array) and you want your records as hashes instead of arrays of strings.
192
+ Example:
193
+
194
+ ``` ruby
195
+ txt = <<TXT
196
+ A,B,C
197
+ 1,2,3
198
+ 4,5,6
199
+ TXT
200
+
201
+ records = CsvHash.parse( txt ) ## or CsvHashReader.parse
202
+ pp records
203
+
204
+ # -or-
205
+
206
+ txt2 = <<TXT
207
+ 1,2,3
208
+ 4,5,6
209
+ TXT
210
+
211
+ records = CsvHash.parse( txt2, headers: ["A","B","C"] ) ## or CsvHashReader.parse
212
+ pp records
213
+
214
+ # => [{"A": "1", "B": "2", "C": "3"},
215
+ # {"A": "4", "B": "5", "C": "6"}]
216
+
217
+ # -or-
218
+
219
+ records = CsvHash.read( "hash.csv" ) ## or CsvHashReader.read
220
+ pp records
221
+ # => [{"A": "1", "B": "2", "C": "3"},
222
+ # {"A": "4", "B": "5", "C": "6"}]
223
+
224
+ # -or-
225
+
226
+ CsvHash.foreach( "hash.csv" ) do |rec| ## or CsvHashReader.foreach
227
+ pp rec
228
+ end
229
+ # => {"A": "1", "B": "2", "C": "3"}
230
+ # => {"A": "4", "B": "5", "C": "6"}
231
+ ```
232
+
233
+
234
+ ### What about symbol keys for hashes?
235
+
236
+ Yes, you can use the header_converters keyword option.
237
+ Use `:symbol` for (auto-)converting header (strings) to symbols.
238
+ Note: the symbol converter will also downcase all letters and
239
+ remove all non-alphanumeric (e.g. `!?$%`) chars
240
+ and replace spaces with underscores.
241
+
242
+ Example:
243
+
244
+ ``` ruby
245
+ txt = <<TXT
246
+ a,b,c
247
+ 1,2,3
248
+ true,false,null
249
+ TXT
250
+
251
+ records = CsvHash.parse( txt, :converters => :all, :header_converters => :symbol )
252
+ pp records
253
+ # => [{a: 1, b: 2, c: 3},
254
+ # {a: true, b: false, c: nil}]
255
+
256
+ # -or-
257
+ options = { :converters => :all,
258
+ :header_converters => :symbol }
259
+
260
+ records = CsvHash.parse( txt, options )
261
+ pp records
262
+ # => [{a: 1, b: 2, c: 3},
263
+ # {a: true, b: false, c: nil}]
264
+ ```
265
+
266
+ Built-in header converters include:
267
+
268
+ | Converter | Comments |
269
+ |--------------|---------------------|
270
+ | `:downcase` | downcase strings |
271
+ | `:symbol` | convert strings to symbols (and downcase and remove non-alphanumerics) |
272
+
273
+
274
+
275
+ ### What about (typed) structs?
276
+
277
+ See the [csvrecord library »](https://github.com/csvreader/csvrecord)
278
+
279
+ Example from the csvrecord docs:
280
+
281
+ Step 1: Define a (typed) struct for the comma-separated values (csv) records. Example:
282
+
283
+ ```ruby
284
+ require 'csvrecord'
285
+
286
+ Beer = CsvRecord.define do
287
+ field :brewery ## note: default type is :string
288
+ field :city
289
+ field :name
290
+ field :abv, Float ## allows type specified as class (or use :float)
291
+ end
292
+ ```
293
+
294
+ or in "classic" style:
295
+
296
+ ```ruby
297
+ class Beer < CsvRecord::Base
298
+ field :brewery
299
+ field :city
300
+ field :name
301
+ field :abv, Float
302
+ end
303
+ ```
304
+
305
+
306
+ Step 2: Read in the comma-separated values (csv) datafile. Example:
307
+
308
+ ```ruby
309
+ beers = Beer.read( 'beer.csv' )
310
+
311
+ puts "#{beers.size} beers:"
312
+ pp beers
313
+ ```
314
+
315
+ pretty prints (pp):
316
+
317
+ ```
318
+ 6 beers:
319
+ [#<Beer:0x302c760 @values=
320
+ ["Andechser Klosterbrauerei", "Andechs", "Doppelbock Dunkel", 7.0]>,
321
+ #<Beer:0x3026fe8 @values=
322
+ ["Augustiner Br\u00E4u M\u00FCnchen", "M\u00FCnchen", "Edelstoff", 5.6]>,
323
+ #<Beer:0x30257a0 @values=
324
+ ["Bayerische Staatsbrauerei Weihenstephan", "Freising", "Hefe Weissbier", 5.4]>,
325
+ ...
326
+ ]
327
+ ```
328
+
329
+ Or loop over the records. Example:
330
+
331
+ ``` ruby
332
+ Beer.read( 'beer.csv' ).each do |rec|
333
+ puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
334
+ end
335
+
336
+ # -or-
337
+
338
+ Beer.foreach( 'beer.csv' ) do |rec|
339
+ puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
340
+ end
341
+ ```
342
+
343
+
344
+ printing:
345
+
346
+ ```
347
+ Doppelbock Dunkel (7.0%) by Andechser Klosterbrauerei, Andechs
348
+ Edelstoff (5.6%) by Augustiner Bräu München, München
349
+ Hefe Weissbier (5.4%) by Bayerische Staatsbrauerei Weihenstephan, Freising
350
+ Rauchbier Märzen (5.1%) by Brauerei Spezial, Bamberg
351
+ Münchner Dunkel (5.0%) by Hacker-Pschorr Bräu, München
352
+ Hofbräu Oktoberfestbier (6.3%) by Staatliches Hofbräuhaus München, München
353
+ ```
354
+
355
+
356
+ ### What about tabular data packages with pre-defined types / schemas?
357
+
358
+ See the [csvpack library »](https://github.com/csvreader/csvpack)
359
+
360
+
361
+
362
+
363
+
364
+ ## Frequently Asked Questions (FAQ) and Answers
365
+
366
+ ### Q: What's CSV the right way? What best practices can I use?
367
+
368
+ Use best practices out-of-the-box with zero-configuration.
369
+ Do you know how to skip blank lines or how to add `#` single-line comments?
370
+ Or how to trim leading and trailing spaces? No worries. It's turned on by default.
371
+
372
+ That means you can write relaxed, human-friendly data such as
373
+
374
+ ```
375
+ #######
376
+ # try with some comments
377
+ # and blank lines even before header (first row)
378
+
379
+ Brewery,City,Name,Abv
380
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
381
+ Augustiner Bräu München,München,Edelstoff,5.6%
382
+
383
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
384
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
385
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
386
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
387
+ ```
388
+
389
+ instead of strict "classic"
390
+ (no blank lines, no comments, no leading and trailing spaces, etc.):
391
+
392
+ ```
393
+ Brewery,City,Name,Abv
394
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
395
+ Augustiner Bräu München,München,Edelstoff,5.6%
396
+ Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
397
+ Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
398
+ Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
399
+ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
400
+ ```
401
+
402
+
403
+ Or use the ARFF (attribute-relation file format)-like alternative style
404
+ with `%` for comments and `@`-directives
405
+ for "meta data" in the header (before any records):
406
+
407
+ ```
408
+ %%%%%%%%%%%%%%%%%%
409
+ % try with some comments
410
+ % and blank lines even before @-directives in header
411
+
412
+ @RELATION Beer
413
+
414
+ @ATTRIBUTE Brewery
415
+ @ATTRIBUTE City
416
+ @ATTRIBUTE Name
417
+ @ATTRIBUTE Abv
418
+
419
+ @DATA
420
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
421
+ Augustiner Bräu München,München,Edelstoff,5.6%
422
+
423
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
424
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
425
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
426
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
427
+ ```
428
+
429
+ Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
430
+ inside comments (for easier backwards compatibility with old readers)
431
+ for "meta data" in the header (before any records):
432
+
433
+ ```
434
+ ##########################
435
+ # try with some comments
436
+ # and blank lines even before @-directives in header
437
+ #
438
+ # @RELATION Beer
439
+ #
440
+ # @ATTRIBUTE Brewery
441
+ # @ATTRIBUTE City
442
+ # @ATTRIBUTE Name
443
+ # @ATTRIBUTE Abv
444
+
445
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
446
+ Augustiner Bräu München,München,Edelstoff,5.6%
447
+
448
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
449
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
450
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
451
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
452
+ ```
453
+
454
+
455
+
456
+ ### Q: How can I change the default format / dialect?
457
+
458
+ The reader includes more than half a dozen pre-configured formats
459
+ (dialects).
460
+
461
+ Use `Csv.strict` if you do NOT want to trim leading and trailing spaces
462
+ and if you do NOT want to skip blank lines. Example:
463
+
464
+ ``` ruby
465
+ txt = <<TXT
466
+ 1, 2,3
467
+ 4,5 ,6
468
+
469
+ TXT
470
+
471
+ records = Csv.strict.parse( txt )
472
+ pp records
473
+ # => [["1","•2","3"],
474
+ # ["4","5•","6"],
475
+ # [""]]
476
+ ```
477
+
478
+ More strict pre-configured variants include:
479
+
480
+ `Csv.mysql` uses:
481
+
482
+ ``` ruby
483
+ ParserStrict.new( sep: "\t",
484
+ quote: false,
485
+ escape: true,
486
+ null: "\\N" )
487
+ ```
488
+
489
+ `Csv.postgres` or `Csv.postgresql` uses:
490
+
491
+ ``` ruby
492
+ ParserStrict.new( doublequote: false,
493
+ escape: true,
494
+ null: "" )
495
+ ```
496
+
497
+ `Csv.postgres_text` or `Csv.postgresql_text` uses:
498
+
499
+ ``` ruby
500
+ ParserStrict.new( sep: "\t",
501
+ quote: false,
502
+ escape: true,
503
+ null: "\\N" )
504
+ ```
505
+
506
+ and so on.
507
+
508
+
509
+ ### Q: How can I change the separator to semicolon (`;`) or pipe (`|`) or tab (`\t`)?
510
+
511
+ Pass in the `sep` keyword option
512
+ to the parser. Example:
513
+
514
+ ``` ruby
515
+ Csv.parse( ..., sep: ';' )
516
+ Csv.read( ..., sep: ';' )
517
+ # ...
518
+ Csv.parse( ..., sep: '|' )
519
+ Csv.read( ..., sep: '|' )
520
+ # and so on
521
+ ```
522
+
523
+ Note: If you use tab (`\t`) use the `TabReader`
524
+ (or for your convenience the built-in `Csv.tab` alias)!
525
+ If you use the "classic" one or more space or tab (`/[ \t]+/`) regex
526
+ use the `TableReader`
527
+ (or for your convenience the built-in `Csv.table` alias)!
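
To make the "one or more space or tab" rule concrete, here is a minimal sketch in plain Ruby. It only illustrates the `/[ \t]+/` regex mentioned above and is not the `TableReader` implementation; the sample line is made up for illustration.

```ruby
# Illustration only: split one line on runs of spaces and/or tabs.
line = "1   2\t3  \t 4"

values = line.strip.split( /[ \t]+/ )
pp values
# => ["1", "2", "3", "4"]
```

Note that with this rule a value can never contain a space or tab, which is why it is a separate format from comma-separated values.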
528
+
529
+
530
+ Note: The default ("The Right Way") parser does NOT allow space or tab
531
+ as separator (because leading and trailing space always gets trimmed
532
+ unless inside quotes, etc.). Use the `strict` parser if you want
533
+ to make up your own format with space or tab as a separator
534
+ or if you want that every space or tab counts (is significant).
535
+
536
+
537
+
538
+ Aside: Why? Tab != CSV. Yes, tab is
539
+ its own (even) simpler format
540
+ (e.g. no escape rules, no newlines in values, etc.),
541
+ see [`TabReader` »](https://github.com/csvreader/tabreader).
542
+
543
+ ``` ruby
544
+ Csv.tab.parse( ... ) # note: "classic" strict tab format
545
+ Csv.tab.read( ... )
546
+ # ...
547
+
548
+ Csv.table.parse( ... ) # note: "classic" one or more space (or tab) table format
549
+ Csv.table.read( ... )
550
+ # ...
551
+ ```
552
+
553
+ If you want double quote escape rules, newlines in quoted values, etc. use
554
+ the "strict" parser with the separator (`sep`) changed to tab (`\t`).
555
+
556
+ ``` ruby
557
+ Csv.strict.parse( ..., sep: "\t" ) # note: csv-like tab format with quotes
558
+ Csv.strict.read( ..., sep: "\t" )
559
+ # ...
560
+ ```
561
+
562
+
563
+
564
+
565
+ ### Q: How can I read records with fixed width fields (and no separator)?
566
+
567
+ Pass in the `width` keyword option with the field widths / lengths
568
+ to the "fixed" parser. Example:
569
+
570
+ ``` ruby
571
+ txt = <<TXT
572
+ 12345678123456781234567890123456789012345678901212345678901234
573
+ TXT
574
+
575
+ Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
576
+ # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
577
+
578
+
579
+ txt = <<TXT
580
+ John    Smith   john@example.com                1-888-555-6666
581
+ Michele O'Reileymichele@example.com             1-333-321-8765
582
+ TXT
583
+
584
+ Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
585
+ # => [["John", "Smith", "john@example.com", "1-888-555-6666"],
586
+ # ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
587
+
588
+ # and so on
589
+ ```
590
+
591
+ <!--
592
+ Note: You can use for your convenience the built-in
593
+ `Csv.fix` or `Csv.f` aliases / shortcuts.
594
+ -->
595
+
596
+
597
+ Note: You can use negative widths (e.g. `-2`, `-3`, and so on)
598
+ to "skip" filler fields (e.g. `--`, `---`, and so on).
599
+ Example:
600
+
601
+ ``` ruby
602
+ txt = <<TXT
603
+ 12345678--12345678---12345678901234567890123456789012--12345678901234XXX
604
+ TXT
605
+
606
+ Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
607
+ # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
608
+ ```
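
As a rough illustration of the width rules above, the following plain Ruby sketch slices a line by position. The helper `split_fixed` is hypothetical and not part of csvreader; it assumes that positive widths take a field (with padding trimmed) and negative widths skip that many filler characters.

```ruby
# Hypothetical helper (not the csvreader API): split one line into
# fixed-width fields; a negative width skips that many filler characters.
def split_fixed( line, widths )
  values = []
  pos    = 0
  widths.each do |width|
    if width < 0
      pos += -width                            # skip filler, keep nothing
    else
      values << line[pos, width].to_s.strip    # take field, trim padding
      pos += width
    end
  end
  values                                       # anything past the last width is ignored
end

line = "12345678--12345678---12345678901234567890123456789012--12345678901234XXX"
pp split_fixed( line, [8,-2,8,-3,32,-2,14] )
# => ["12345678", "12345678", "12345678901234567890123456789012", "12345678901234"]
```

The trailing `XXX` is dropped because no width covers it.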
609
+
610
+
611
+
612
+
613
+
614
+ ### Q: What's broken in the standard library CSV reader?
615
+
616
+ Two major design bugs and many, many minor ones.
617
+
618
+ (1) The CSV class uses [`line.split(',')`](https://github.com/ruby/csv/blob/master/lib/csv.rb#L1255) with some kludges (†) with the claim it's faster.
619
+ What?! The right way: CSV needs its own purpose-built parser. There's no other
620
+ way you can handle all the (edge) cases with double quotes and escaped doubled up
621
+ double quotes. Period.
622
+
623
+ For example, the CSV class cannot handle leading or trailing spaces
624
+ for double quoted values `1,•"2","3"•`.
625
+ Nor can it handle double quotes inside values, and so on.
626
+
627
+ (2) The CSV class returns `nil` for `,,` but an empty string (`""`)
628
+ for `"","",""`. The right way: All values are always strings. Period.
629
+
630
+ If you want to use `nil` you MUST configure a string (or strings)
631
+ such as `NA`, `n/a`, `\N`, or similar that map to `nil`.
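
The idea of mapping configured strings to `nil` can be sketched in plain Ruby as follows. The names `NULL_VALUES` and `map_nulls` are made up for illustration and are not the csvreader API; the point is only that every value stays a string unless it matches one of the configured null markers.

```ruby
# Assumed list of null markers (hypothetical; configure to taste).
# Note: '\N' in single quotes is the two characters backslash + N.
NULL_VALUES = ['NA', 'n/a', '\N']

# Hypothetical helper: replace configured null markers with nil,
# leave all other values (including the empty string) untouched.
def map_nulls( record, nulls = NULL_VALUES )
  record.map { |value| nulls.include?( value ) ? nil : value }
end

pp map_nulls( ['1', 'NA', '', '\N'] )
# => ["1", nil, "", nil]
```

With this approach an empty string never silently turns into `nil`.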
632
+
633
+
634
+ (†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
635
+
636
+ Appendix: Simple examples the standard csv library cannot read:
637
+
638
+ Quoted values with leading or trailing spaces e.g.
639
+
640
+ ```
641
+ 1, "2","3" , "4" ,5
642
+ ```
643
+
644
+ =>
645
+
646
+ ``` ruby
647
+ ["1", "2", "3", "4", "5"]
648
+ ```
649
+
650
+ "Auto-fix" unambiguous quotes in "unquoted" values e.g.
651
+
652
+ ```
653
+ value with "quotes", another value
654
+ ```
655
+
656
+ =>
657
+
658
+ ``` ruby
659
+ ["value with \"quotes\"", "another value"]
660
+ ```
661
+
662
+ and some more.
663
+
664
+
665
+
666
+
667
+ ## Alternatives
668
+
669
+ See the Libraries & Tools section in the [Awesome CSV](https://github.com/csvspecs/awesome-csv#libraries--tools) page.
670
+
671
+
672
+ ## License
673
+
674
+ ![](https://publicdomainworks.github.io/buttons/zero88x31.png)
675
+
676
+ The `csvreader` scripts are dedicated to the public domain.
677
+ Use them as you please with no restrictions whatsoever.
678
+
679
+ ## Questions? Comments?
680
+
681
+ Send them along to the [wwwmake forum](http://groups.google.com/group/wwwmake).
682
+ Thanks!