csvreader 1.2.4 → 1.2.5

data/README.md CHANGED
@@ -1,682 +1,682 @@
1
- # csvreader - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)
2
-
3
-
4
- * home :: [github.com/csvreader/csvreader](https://github.com/csvreader/csvreader)
5
- * bugs :: [github.com/csvreader/csvreader/issues](https://github.com/csvreader/csvreader/issues)
6
- * gem :: [rubygems.org/gems/csvreader](https://rubygems.org/gems/csvreader)
7
- * rdoc :: [rubydoc.info/gems/csvreader](http://rubydoc.info/gems/csvreader)
8
- * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
9
-
10
-
11
-
12
-
13
- ## What's New?
14
-
15
- **v1.2.2** Added auto-fix/correction/recovery
16
- for double quoted value with extra trailing value
17
- to the default parser (`ParserStd`) e.g. `"Freddy" Mercury`
18
- will get read "as is" and turned
19
- into an "unquoted" value with "literal" quotes e.g. `"Freddy" Mercury`.
20
-
21
-
22
- **v1.2.1** Added support for (optional) hashtag
23
- to the default parser (`ParserStd`) for
24
- supporting the [Humanitarian eXchange Language (HXL)](https://github.com/csvspecs/csv-hxl).
25
- Default is turned off (`false`). Use `Csv.human`
26
- or `Csv.hum` or `Csv.hxl` for pre-defined with hashtag turned on.
27
-
28
-
29
- **v1.2** Added support for alternative (non-space) separators (e.g. `;|^:`)
30
- to the default parser (`ParserStd`).
31
-
32
-
33
- **v1.1.5** Added built-in support for (optional) alternative space
34
- character
35
- (e.g. `_-+•`)
36
- to the default parser (`ParserStd`) and the table parser (`ParserTable`).
37
- Turns `Man_Utd` into `Man Utd`, for example. Default is turned off (`nil`).
38
-
39
-
40
- **v1.1.4** Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
41
- e.g. `Csv.table.parse( txt )`.
42
-
43
-
44
- **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
45
- Now you can use both, that is, single (`‹...›` or `›...‹`)
46
- or double (`«...»` or `»...«`).
47
- Note: A quote only "kicks-in" if it's the first (non-whitespace)
48
- character of the value (otherwise it's just a "vanilla" literal character).
49
-
50
-
51
- **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
52
- Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
53
- like in ruby (or javascript or html or ...) :-).
54
- Note: A quote only "kicks-in" if it's the first (non-whitespace)
55
- character of the value (otherwise it's just a "vanilla" literal character)
56
- e.g. `48°51'24"N` needs no quote :-).
57
- With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
58
-
59
-
60
-
61
- **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
62
- [ARFF (attribute-relation file format)](https://github.com/csvspecs/csv-meta#attribute-relation-classic) -
63
- and support for (optional) directives (`@`) in header (that is, before any records)
64
- to default parser ("The Right Way").
65
- Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
66
- Now you can use either a front matter (`---`) block
67
- or directives (e.g. `@attribute`, `@relation`, etc.)
68
- for meta data, the first one "wins" - you CANNOT use both.
69
-
70
-
71
- **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
72
- e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
73
-
74
-
75
- **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
76
- in header (that is, before any records)
77
- to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](https://github.com/csvspecs/csv-meta#front-matter-in-yaml).
78
- Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil` if none).
79
-
80
-
81
-
82
-
83
- ## Usage
84
-
85
-
86
- ``` ruby
87
- txt = <<TXT
88
- 1,2,3
89
- 4,5,6
90
- TXT
91
-
92
- records = Csv.parse( txt ) ## or CsvReader.parse
93
- pp records
94
- # => [["1","2","3"],
95
- # ["4","5","6"]]
96
-
97
- # -or-
98
-
99
- records = Csv.read( "values.csv" ) ## or CsvReader.read
100
- pp records
101
- # => [["1","2","3"],
102
- # ["4","5","6"]]
103
-
104
- # -or-
105
-
106
- Csv.foreach( "values.csv" ) do |rec| ## or CsvReader.foreach
107
- pp rec
108
- end
109
- # => ["1","2","3"]
110
- # => ["4","5","6"]
111
- ```
112
-
113
-
114
- ### What about type inference and data converters?
115
-
116
- Use the converters keyword option to (auto-)convert strings to nulls, booleans, integers, floats, dates, etc.
117
- Example:
118
-
119
- ``` ruby
120
- txt = <<TXT
121
- 1,2,3
122
- true,false,null
123
- TXT
124
-
125
- records = Csv.parse( txt, :converters => :all ) ## or CsvReader.parse
126
- pp records
127
- # => [[1,2,3],
128
- # [true,false,nil]]
129
- ```
130
-
131
-
132
- Built-in converters include:
133
-
134
- | Converter | Comments |
135
- |--------------|-------------------|
136
- | `:integer` | convert matching strings to integer |
137
- | `:float` | convert matching strings to float |
138
- | `:numeric` | shortcut for `[:integer, :float]` |
139
- | `:date` | convert matching strings to `Date` (year/month/day) |
140
- | `:date_time` | convert matching strings to `DateTime` |
141
- | `:null` | convert matching strings to null (`nil`) |
142
- | `:boolean` | convert matching strings to boolean (`true` or `false`) |
143
- | `:all` | shortcut for `[:null, :boolean, :date_time, :numeric]` |
144
-
145
-
146
- Or add your own converters. Example:
147
-
148
- ``` ruby
149
- Csv.parse( 'Ruby, 2020-03-01, 100', converters: [->(v) { Time.parse(v) rescue v }] )
150
- #=> [["Ruby", 2020-03-01 00:00:00 +0200, "100"]]
151
- ```
152
-
153
- A custom converter is a method that gets the value passed in
154
- and if successful returns a non-string type (e.g. integer, float, date, etc.)
155
- or a string (for further processing with all other converters in the "pipeline" configuration).
156
-
157
-
158
-
159
- ### What about Enumerable?
160
-
161
- Yes, every reader includes `Enumerable` and runs on `each`.
162
- Use `new` or `open` without a block
163
- to get the enumerator (iterator).
164
- Example:
165
-
166
-
167
- ``` ruby
168
- csv = Csv.new( "a,b,c" )
169
- it = csv.to_enum
170
- pp it.next
171
- # => ["a","b","c"]
172
-
173
- # -or-
174
-
175
- csv = Csv.open( "values.csv" )
176
- it = csv.to_enum
177
- pp it.next
178
- # => ["1","2","3"]
179
- pp it.next
180
- # => ["4","5","6"]
181
- ```
182
-
183
-
184
-
185
-
186
-
187
- ### What about headers?
188
-
189
- Use the `CsvHash`
190
- if the first line is a header (or if missing pass in the headers
191
- as an array) and you want your records as hashes instead of arrays of strings.
192
- Example:
193
-
194
- ``` ruby
195
- txt = <<TXT
196
- A,B,C
197
- 1,2,3
198
- 4,5,6
199
- TXT
200
-
201
- records = CsvHash.parse( txt ) ## or CsvHashReader.parse
202
- pp records
203
-
204
- # -or-
205
-
206
- txt2 = <<TXT
207
- 1,2,3
208
- 4,5,6
209
- TXT
210
-
211
- records = CsvHash.parse( txt2, headers: ["A","B","C"] ) ## or CsvHashReader.parse
212
- pp records
213
-
214
- # => [{"A"=>"1", "B"=>"2", "C"=>"3"},
215
- # {"A"=>"4", "B"=>"5", "C"=>"6"}]
216
-
217
- # -or-
218
-
219
- records = CsvHash.read( "hash.csv" ) ## or CsvHashReader.read
220
- pp records
221
- # => [{"A"=>"1", "B"=>"2", "C"=>"3"},
222
- # {"A"=>"4", "B"=>"5", "C"=>"6"}]
223
-
224
- # -or-
225
-
226
- CsvHash.foreach( "hash.csv" ) do |rec| ## or CsvHashReader.foreach
227
- pp rec
228
- end
229
- # => {"A"=>"1", "B"=>"2", "C"=>"3"}
230
- # => {"A"=>"4", "B"=>"5", "C"=>"6"}
231
- ```
232
-
233
-
234
- ### What about symbol keys for hashes?
235
-
236
- Yes, you can use the header_converters keyword option.
237
- Use `:symbol` for (auto-)converting header (strings) to symbols.
238
- Note: the symbol converter will also downcase all letters and
239
- remove all non-alphanumeric (e.g. `!?$%`) chars
240
- and replace spaces with underscores.
241
-
242
- Example:
243
-
244
- ``` ruby
245
- txt = <<TXT
246
- a,b,c
247
- 1,2,3
248
- true,false,null
249
- TXT
250
-
251
- records = CsvHash.parse( txt, :converters => :all, :header_converters => :symbol )
252
- pp records
253
- # => [{a: 1, b: 2, c: 3},
254
- # {a: true, b: false, c: nil}]
255
-
256
- # -or-
257
- options = { :converters => :all,
258
- :header_converters => :symbol }
259
-
260
- records = CsvHash.parse( txt, options )
261
- pp records
262
- # => [{a: 1, b: 2, c: 3},
263
- # {a: true, b: false, c: nil}]
264
- ```
265
-
266
- Built-in header converters include:
267
-
268
- | Converter | Comments |
269
- |--------------|---------------------|
270
- | `:downcase` | downcase strings |
271
- | `:symbol` | convert strings to symbols (and downcase and remove non-alphanumerics) |
272
-
273
-
274
-
275
- ### What about (typed) structs?
276
-
277
- See the [csvrecord library »](https://github.com/csvreader/csvrecord)
278
-
279
- Example from the csvrecord docs:
280
-
281
- Step 1: Define a (typed) struct for the comma-separated values (csv) records. Example:
282
-
283
- ```ruby
284
- require 'csvrecord'
285
-
286
- Beer = CsvRecord.define do
287
- field :brewery ## note: default type is :string
288
- field :city
289
- field :name
290
- field :abv, Float ## allows type specified as class (or use :float)
291
- end
292
- ```
293
-
294
- or in "classic" style:
295
-
296
- ```ruby
297
- class Beer < CsvRecord::Base
298
- field :brewery
299
- field :city
300
- field :name
301
- field :abv, Float
302
- end
303
- ```
304
-
305
-
306
- Step 2: Read in the comma-separated values (csv) datafile. Example:
307
-
308
- ```ruby
309
- beers = Beer.read( 'beer.csv' )
310
-
311
- puts "#{beers.size} beers:"
312
- pp beers
313
- ```
314
-
315
- pretty prints (pp):
316
-
317
- ```
318
- 6 beers:
319
- [#<Beer:0x302c760 @values=
320
- ["Andechser Klosterbrauerei", "Andechs", "Doppelbock Dunkel", 7.0]>,
321
- #<Beer:0x3026fe8 @values=
322
- ["Augustiner Br\u00E4u M\u00FCnchen", "M\u00FCnchen", "Edelstoff", 5.6]>,
323
- #<Beer:0x30257a0 @values=
324
- ["Bayerische Staatsbrauerei Weihenstephan", "Freising", "Hefe Weissbier", 5.4]>,
325
- ...
326
- ]
327
- ```
328
-
329
- Or loop over the records. Example:
330
-
331
- ``` ruby
332
- Beer.read( 'beer.csv' ).each do |rec|
333
- puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
334
- end
335
-
336
- # -or-
337
-
338
- Beer.foreach( 'beer.csv' ) do |rec|
339
- puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
340
- end
341
- ```
342
-
343
-
344
- printing:
345
-
346
- ```
347
- Doppelbock Dunkel (7.0%) by Andechser Klosterbrauerei, Andechs
348
- Edelstoff (5.6%) by Augustiner Bräu München, München
349
- Hefe Weissbier (5.4%) by Bayerische Staatsbrauerei Weihenstephan, Freising
350
- Rauchbier Märzen (5.1%) by Brauerei Spezial, Bamberg
351
- Münchner Dunkel (5.0%) by Hacker-Pschorr Bräu, München
352
- Hofbräu Oktoberfestbier (6.3%) by Staatliches Hofbräuhaus München, München
353
- ```
354
-
355
-
356
- ### What about tabular data packages with pre-defined types / schemas?
357
-
358
- See the [csvpack library »](https://github.com/csvreader/csvpack)
359
-
360
-
361
-
362
-
363
-
364
- ## Frequently Asked Questions (FAQ) and Answers
365
-
366
- ### Q: What's CSV the right way? What best practices can I use?
367
-
368
- Use best practices out-of-the-box with zero-configuration.
369
- Do you know how to skip blank lines or how to add `#` single-line comments?
370
- Or how to trim leading and trailing spaces? No worries. It's turned on by default.
371
-
372
- Yes, you can. Use
373
-
374
- ```
375
- #######
376
- # try with some comments
377
- # and blank lines even before header (first row)
378
-
379
- Brewery,City,Name,Abv
380
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
381
- Augustiner Bräu München,München,Edelstoff,5.6%
382
-
383
- Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
384
- Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
385
- Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
386
- Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
387
- ```
388
-
389
- instead of strict "classic"
390
- (no blank lines, no comments, no leading and trailing spaces, etc.):
391
-
392
- ```
393
- Brewery,City,Name,Abv
394
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
395
- Augustiner Bräu München,München,Edelstoff,5.6%
396
- Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
397
- Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
398
- Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
399
- Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
400
- ```
401
-
402
-
403
- Or use the ARFF (attribute-relation file format)-like alternative style
404
- with `%` for comments and `@`-directives
405
- for "meta data" in the header (before any records):
406
-
407
- ```
408
- %%%%%%%%%%%%%%%%%%
409
- % try with some comments
410
- % and blank lines even before @-directives in header
411
-
412
- @RELATION Beer
413
-
414
- @ATTRIBUTE Brewery
415
- @ATTRIBUTE City
416
- @ATTRIBUTE Name
417
- @ATTRIBUTE Abv
418
-
419
- @DATA
420
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
421
- Augustiner Bräu München,München,Edelstoff,5.6%
422
-
423
- Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
424
- Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
425
- Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
426
- Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
427
- ```
428
-
429
- Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
430
- inside comments (for easier backwards compatibility with old readers)
431
- for "meta data" in the header (before any records):
432
-
433
- ```
434
- ##########################
435
- # try with some comments
436
- # and blank lines even before @-directives in header
437
- #
438
- # @RELATION Beer
439
- #
440
- # @ATTRIBUTE Brewery
441
- # @ATTRIBUTE City
442
- # @ATTRIBUTE Name
443
- # @ATTRIBUTE Abv
444
-
445
- Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
446
- Augustiner Bräu München,München,Edelstoff,5.6%
447
-
448
- Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
449
- Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
450
- Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
451
- Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
452
- ```
453
-
454
-
455
-
456
- ### Q: How can I change the default format / dialect?
457
-
458
- The reader includes more than half a dozen pre-configured formats /
459
- dialects.
460
-
461
- Use strict if you do NOT want to trim leading and trailing spaces
462
- and if you do NOT want to skip blank lines. Example:
463
-
464
- ``` ruby
465
- txt = <<TXT
466
- 1, 2,3
467
- 4,5 ,6
468
-
469
- TXT
470
-
471
- records = Csv.strict.parse( txt )
472
- pp records
473
- # => [["1","•2","3"],
474
- # ["4","5•","6"],
475
- # [""]]
476
- ```
477
-
478
- More strict pre-configured variants include:
479
-
480
- `Csv.mysql` uses:
481
-
482
- ``` ruby
483
- ParserStrict.new( sep: "\t",
484
- quote: false,
485
- escape: true,
486
- null: "\\N" )
487
- ```
488
-
489
- `Csv.postgres` or `Csv.postgresql` uses:
490
-
491
- ``` ruby
492
- ParserStrict.new( doublequote: false,
493
- escape: true,
494
- null: "" )
495
- ```
496
-
497
- `Csv.postgres_text` or `Csv.postgresql_text` uses:
498
-
499
- ``` ruby
500
- ParserStrict.new( sep: "\t",
501
- quote: false,
502
- escape: true,
503
- null: "\\N" )
504
- ```
505
-
506
- and so on.
507
-
508
-
509
- ### Q: How can I change the separator to semicolon (`;`) or pipe (`|`) or tab (`\t`)?
510
-
511
- Pass in the `sep` keyword option
512
- to the parser. Example:
513
-
514
- ``` ruby
515
- Csv.parse( ..., sep: ';' )
516
- Csv.read( ..., sep: ';' )
517
- # ...
518
- Csv.parse( ..., sep: '|' )
519
- Csv.read( ..., sep: '|' )
520
- # and so on
521
- ```
522
-
523
- Note: If you use tab (`\t`) use the `TabReader`
524
- (or for your convenience the built-in `Csv.tab` alias)!
525
- If you use the "classic" one or more space or tab (`/[ \t]+/`) regex
526
- use the `TableReader`
527
- (or for your convenience the built-in `Csv.table` alias)!
528
-
529
-
530
- Note: The default ("The Right Way") parser does NOT allow space or tab
531
- as separator (because leading and trailing space always gets trimmed
532
- unless inside quotes, etc.). Use the `strict` parser if you want
533
- to make up your own format with space or tab as a separator
534
- or if you want that every space or tab counts (is significant).
535
-
536
-
537
-
538
- Aside: Why? Tab != CSV. Yes, tab is
539
- its own (even) simpler format
540
- (e.g. no escape rules, no newlines in values, etc.),
541
- see [`TabReader` »](https://github.com/csvreader/tabreader).
542
-
543
- ``` ruby
544
- Csv.tab.parse( ... ) # note: "classic" strict tab format
545
- Csv.tab.read( ... )
546
- # ...
547
-
548
- Csv.table.parse( ... ) # note: "classic" one or more space (or tab) table format
549
- Csv.table.read( ... )
550
- # ...
551
- ```
552
-
553
- If you want double quote escape rules, newlines in quoted values, etc. use
554
- the "strict" parser with the separator (`sep`) changed to tab (`\t`).
555
-
556
- ``` ruby
557
- Csv.strict.parse( ..., sep: "\t" ) # note: csv-like tab format with quotes
558
- Csv.strict.read( ..., sep: "\t" )
559
- # ...
560
- ```
561
-
562
-
563
-
564
-
565
- ### Q: How can I read records with fixed width fields (and no separator)?
566
-
567
- Pass in the `width` keyword option with the field widths / lengths
568
- to the "fixed" parser. Example:
569
-
570
- ``` ruby
571
- txt = <<TXT
572
- 12345678123456781234567890123456789012345678901212345678901234
573
- TXT
574
-
575
- Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
576
- # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
577
-
578
-
579
- txt = <<TXT
580
- John    Smith   john@example.com                1-888-555-6666
581
- Michele O'Reileymichele@example.com             1-333-321-8765
582
- TXT
583
-
584
- Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
585
- # => [["John", "Smith", "john@example.com", "1-888-555-6666"],
586
- # ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
587
-
588
- # and so on
589
- ```
590
-
591
- <!--
592
- Note: You can use for your convenience the built-in
593
- `Csv.fix` or `Csv.f` aliases / shortcuts.
594
- -->
595
-
596
-
597
- Note: You can use negative widths (e.g. `-2`, `-3`, and so on)
598
- to "skip" filler fields (e.g. `--`, `---`, and so on).
599
- Example:
600
-
601
- ``` ruby
602
- txt = <<TXT
603
- 12345678--12345678---12345678901234567890123456789012--12345678901234XXX
604
- TXT
605
-
606
- Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
607
- # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
608
- ```
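The negative-width rule above can be sketched in plain Ruby without the gem. This is only an illustration of the idea, not how `ParserFixed` is implemented: the `split_fixed` helper is a hypothetical name, and trimming each value is an assumption based on the `John` / `Smith` output shown earlier.

```ruby
# Hypothetical helper (not part of csvreader): cut one line into fields
# by width; a negative width skips that many filler characters.
def split_fixed(line, widths)
  pos = 0
  values = []
  widths.each do |width|
    if width < 0
      pos += width.abs                        # skip filler, emit nothing
    else
      values << line[pos, width].to_s.strip   # take the field and trim padding
      pos += width
    end
  end
  values
end

line = '12345678--12345678---12345678901234567890123456789012--12345678901234XXX'
fields = split_fixed(line, [8, -2, 8, -3, 32, -2, 14])
pp fields
```

Anything past the last width (the trailing `XXX` here) is simply ignored in this sketch.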
609
-
610
-
611
-
612
-
613
-
614
- ### Q: What's broken in the standard library CSV reader?
615
-
616
- Two major design bugs and many, many minor ones.
617
-
618
- (1) The CSV class uses [`line.split(',')`](https://github.com/ruby/csv/blob/master/lib/csv.rb#L1255) with some kludges (†) with the claim it's faster.
619
- What?! The right way: CSV needs its own purpose-built parser. There's no other
620
- way you can handle all the (edge) cases with double quotes and escaped doubled up
621
- double quotes. Period.
622
-
623
- For example, the CSV class cannot handle leading or trailing spaces
624
- for double quoted values `1,•"2","3"•`.
625
- Nor can it handle double quotes inside values, and so on.
626
-
627
- (2) The CSV class returns `nil` for `,,` but an empty string (`""`)
628
- for `"","",""`. The right way: All values are always strings. Period.
629
-
630
- If you want to use `nil` you MUST configure a string (or strings)
631
- such as `NA`, `n/a`, `\N`, or similar that map to `nil`.
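A minimal sketch of the "all values are strings, `nil` only by configuration" rule, in plain Ruby. The `to_null` helper and its default list are hypothetical and only illustrate the mapping; they are not the gem's `:null` converter.

```ruby
# Hypothetical helper: only strings on the configured list become nil;
# everything else, including the empty string, stays a string.
def to_null(value, null_values = ['NA', 'n/a', '\N'])
  null_values.include?(value) ? nil : value
end

record = ['1', '', 'NA', '\N', 'n/a', 'Ruby']
converted = record.map { |v| to_null(v) }
pp converted
```

Note that `'\N'` in single quotes is the two characters backslash and `N`, matching the MySQL / PostgreSQL text-format null marker mentioned above.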
632
-
633
-
634
- (†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
635
-
636
- Appendix: Simple examples the standard csv library cannot read:
637
-
638
- Quoted values with leading or trailing spaces e.g.
639
-
640
- ```
641
- 1, "2","3" , "4" ,5
642
- ```
643
-
644
- =>
645
-
646
- ``` ruby
647
- ["1", "2", "3", "4", "5"]
648
- ```
649
-
650
- "Auto-fix" unambiguous quotes in "unquoted" values e.g.
651
-
652
- ```
653
- value with "quotes", another value
654
- ```
655
-
656
- =>
657
-
658
- ``` ruby
659
- ["value with \"quotes\"", "another value"]
660
- ```
661
-
662
- and some more.
663
-
664
-
665
-
666
-
667
- ## Alternatives
668
-
669
- See the Libraries & Tools section in the [Awesome CSV](https://github.com/csvspecs/awesome-csv#libraries--tools) page.
670
-
671
-
672
- ## License
673
-
674
- ![](https://publicdomainworks.github.io/buttons/zero88x31.png)
675
-
676
- The `csvreader` scripts are dedicated to the public domain.
677
- Use it as you please with no restrictions whatsoever.
678
-
679
- ## Questions? Comments?
680
-
681
- Send them along to the [wwwmake forum](http://groups.google.com/group/wwwmake).
682
- Thanks!
1
+ # csvreader - read tabular data in the comma-separated values (csv) format the right way (uses best practices out-of-the-box with zero-configuration)
2
+
3
+
4
+ * home :: [github.com/csvreader/csvreader](https://github.com/csvreader/csvreader)
5
+ * bugs :: [github.com/csvreader/csvreader/issues](https://github.com/csvreader/csvreader/issues)
6
+ * gem :: [rubygems.org/gems/csvreader](https://rubygems.org/gems/csvreader)
7
+ * rdoc :: [rubydoc.info/gems/csvreader](http://rubydoc.info/gems/csvreader)
8
+ * forum :: [wwwmake](http://groups.google.com/group/wwwmake)
9
+
10
+
11
+
12
+
13
+ ## What's New?
14
+
15
+ **v1.2.2** Added auto-fix/correction/recovery
16
+ for double quoted value with extra trailing value
17
+ to the default parser (`ParserStd`) e.g. `"Freddy" Mercury`
18
+ will get read "as is" and turned
19
+ into an "unquoted" value with "literal" quotes e.g. `"Freddy" Mercury`.
20
+
21
+
22
+ **v1.2.1** Added support for (optional) hashtag
23
+ to the default parser (`ParserStd`) for
24
+ supporting the [Humanitarian eXchange Language (HXL)](https://github.com/csvspecs/csv-hxl).
25
+ Default is turned off (`false`). Use `Csv.human`
26
+ or `Csv.hum` or `Csv.hxl` for pre-defined with hashtag turned on.
27
+
28
+
29
+ **v1.2** Added support for alternative (non-space) separators (e.g. `;|^:`)
30
+ to the default parser (`ParserStd`).
31
+
32
+
33
+ **v1.1.5** Added built-in support for (optional) alternative space
34
+ character
35
+ (e.g. `_-+•`)
36
+ to the default parser (`ParserStd`) and the table parser (`ParserTable`).
37
+ Turns `Man_Utd` into `Man Utd`, for example. Default is turned off (`nil`).
38
+
39
+
40
+ **v1.1.4** Added new "classic" table parser (see `ParserTable`) for supporting fields separated by (one or more) spaces
41
+ e.g. `Csv.table.parse( txt )`.
42
+
43
+
44
+ **v1.1.3**: Added built-in support for french single and double quotes / guillemets (`‹› «»`) to default parser ("The Right Way").
45
+ Now you can use both, that is, single (`‹...›` or `›...‹`)
46
+ or double (`«...»` or `»...«`).
47
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
48
+ character of the value (otherwise it's just a "vanilla" literal character).
49
+
50
+
51
+ **v1.1.2**: Added built-in support for single quotes (`'`) to default parser ("The Right Way").
52
+ Now you can use both, that is, single (`'...'`) or double quotes (`"..."`)
53
+ like in ruby (or javascript or html or ...) :-).
54
+ Note: A quote only "kicks-in" if it's the first (non-whitespace)
55
+ character of the value (otherwise it's just a "vanilla" literal character)
56
+ e.g. `48°51'24"N` needs no quote :-).
57
+ With the "strict" parser you will get a firework of "stray" quote errors / exceptions.
58
+
59
+
60
+
61
+ **v1.1.1**: Added built-in support for (optional) alternative comments (`%`) - used by
62
+ [ARFF (attribute-relation file format)](https://github.com/csvspecs/csv-meta#attribute-relation-classic) -
63
+ and support for (optional) directives (`@`) in header (that is, before any records)
64
+ to default parser ("The Right Way").
65
+ Now you can use either `#` or `%` for comments, the first one "wins" - you CANNOT use both.
66
+ Now you can use either a front matter (`---`) block
67
+ or directives (e.g. `@attribute`, `@relation`, etc.)
68
+ for meta data, the first one "wins" - you CANNOT use both.
69
+
70
+
71
+ **v1.1.0**: Added new fixed width field (fwf) parser (see `ParserFixed`) for supporting fields with fixed width (and no separator)
72
+ e.g. `Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] )`.
73
+
74
+
75
+ **v1.0.3**: Added built-in support for an (optional) front matter (`---`) meta data block
76
+ in header (that is, before any records)
77
+ to default parser ("The Right Way") - used by [CSVY (yaml front matter for csv file format)](https://github.com/csvspecs/csv-meta#front-matter-in-yaml).
78
+ Use `Csv.parser.meta` to get the parsed meta data block hash (or `nil` if none).
79
+
80
+
81
+
82
+
83
+ ## Usage
84
+
85
+
86
+ ``` ruby
87
+ txt = <<TXT
88
+ 1,2,3
89
+ 4,5,6
90
+ TXT
91
+
92
+ records = Csv.parse( txt ) ## or CsvReader.parse
93
+ pp records
94
+ # => [["1","2","3"],
95
+ # ["4","5","6"]]
96
+
97
+ # -or-
98
+
99
+ records = Csv.read( "values.csv" ) ## or CsvReader.read
100
+ pp records
101
+ # => [["1","2","3"],
102
+ # ["4","5","6"]]
103
+
104
+ # -or-
105
+
106
+ Csv.foreach( "values.csv" ) do |rec| ## or CsvReader.foreach
107
+ pp rec
108
+ end
109
+ # => ["1","2","3"]
110
+ # => ["4","5","6"]
111
+ ```
112
+
113
+
114
+ ### What about type inference and data converters?
115
+
116
+ Use the converters keyword option to (auto-)convert strings to nulls, booleans, integers, floats, dates, etc.
117
+ Example:
118
+
119
+ ``` ruby
120
+ txt = <<TXT
121
+ 1,2,3
122
+ true,false,null
123
+ TXT
124
+
125
+ records = Csv.parse( txt, :converters => :all ) ## or CsvReader.parse
126
+ pp records
127
+ # => [[1,2,3],
128
+ # [true,false,nil]]
129
+ ```
130
+
131
+
132
+ Built-in converters include:
133
+
134
+ | Converter | Comments |
135
+ |--------------|-------------------|
136
+ | `:integer` | convert matching strings to integer |
137
+ | `:float` | convert matching strings to float |
138
+ | `:numeric` | shortcut for `[:integer, :float]` |
139
+ | `:date` | convert matching strings to `Date` (year/month/day) |
140
+ | `:date_time` | convert matching strings to `DateTime` |
141
+ | `:null` | convert matching strings to null (`nil`) |
142
+ | `:boolean` | convert matching strings to boolean (`true` or `false`) |
143
+ | `:all` | shortcut for `[:null, :boolean, :date_time, :numeric]` |
144
+
145
+
146
+ Or add your own converters. Example:
147
+
148
+ ``` ruby
149
+ Csv.parse( 'Ruby, 2020-03-01, 100', converters: [->(v) { Time.parse(v) rescue v }] )
150
+ #=> [["Ruby", 2020-03-01 00:00:00 +0200, "100"]]
151
+ ```
152
+
153
+ A custom converter is a method that gets the value passed in
154
+ and if successful returns a non-string type (e.g. integer, float, date, etc.)
155
+ or a string (for further processing with all other converters in the "pipeline" configuration).
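The "pipeline" behaviour described above can be sketched in plain Ruby, independent of the gem. The three lambdas and the `convert` helper are hypothetical names for illustration; the real converters and their order may differ.

```ruby
require 'date'

# Hypothetical converters: each returns a converted (non-string) value
# on success, or the original string so the next converter can try.
to_integer = ->(v) { Integer(v, 10) rescue v }
to_float   = ->(v) { Float(v) rescue v }
to_date    = ->(v) { Date.strptime(v, '%Y-%m-%d') rescue v }

# Pass the value through the converters in order and stop as soon as
# one of them returns something other than a string.
def convert(value, converters)
  converters.each do |converter|
    value = converter.call(value)
    break unless value.is_a?(String)
  end
  value
end

pipeline = [to_integer, to_float, to_date]
row = ['Ruby', '2020-03-01', '100', '5.4'].map { |v| convert(v, pipeline) }
pp row
```

A value that no converter accepts (here `Ruby`) falls through the whole pipeline and stays a string.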
156
+
157
+
158
+
159
+ ### What about Enumerable?
160
+
161
+ Yes, every reader includes `Enumerable` and runs on `each`.
162
+ Use `new` or `open` without a block
163
+ to get the enumerator (iterator).
164
+ Example:
165
+
166
+
167
+ ``` ruby
168
+ csv = Csv.new( "a,b,c" )
169
+ it = csv.to_enum
170
+ pp it.next
171
+ # => ["a","b","c"]
172
+
173
+ # -or-
174
+
175
+ csv = Csv.open( "values.csv" )
176
+ it = csv.to_enum
177
+ pp it.next
178
+ # => ["1","2","3"]
179
+ pp it.next
180
+ # => ["4","5","6"]
181
+ ```
182
+
183
+
184
+
185
+
186
+
187
+ ### What about headers?
188
+
189
+ Use the `CsvHash`
190
+ if the first line is a header (or if missing pass in the headers
191
+ as an array) and you want your records as hashes instead of arrays of strings.
192
+ Example:
193
+
194
+ ``` ruby
195
+ txt = <<TXT
196
+ A,B,C
197
+ 1,2,3
198
+ 4,5,6
199
+ TXT
200
+
201
+ records = CsvHash.parse( txt ) ## or CsvHashReader.parse
202
+ pp records
203
+
204
+ # -or-
205
+
206
+ txt2 = <<TXT
207
+ 1,2,3
208
+ 4,5,6
209
+ TXT
210
+
211
+ records = CsvHash.parse( txt2, headers: ["A","B","C"] ) ## or CsvHashReader.parse
212
+ pp records
213
+
214
+ # => [{"A"=>"1", "B"=>"2", "C"=>"3"},
215
+ # {"A"=>"4", "B"=>"5", "C"=>"6"}]
216
+
217
+ # -or-
218
+
219
+ records = CsvHash.read( "hash.csv" ) ## or CsvHashReader.read
220
+ pp records
221
+ # => [{"A"=>"1", "B"=>"2", "C"=>"3"},
222
+ # {"A"=>"4", "B"=>"5", "C"=>"6"}]
223
+
224
+ # -or-
225
+
226
+ CsvHash.foreach( "hash.csv" ) do |rec| ## or CsvHashReader.foreach
227
+ pp rec
228
+ end
229
+ # => {"A"=>"1", "B"=>"2", "C"=>"3"}
230
+ # => {"A"=>"4", "B"=>"5", "C"=>"6"}
231
+ ```
232
+
233
+
234
+ ### What about symbol keys for hashes?
235
+
236
+ Yes, you can use the header_converters keyword option.
237
+ Use `:symbol` for (auto-)converting header (strings) to symbols.
238
+ Note: the symbol converter will also downcase all letters and
239
+ remove all non-alphanumeric (e.g. `!?$%`) chars
240
+ and replace spaces with underscores.
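As a rough illustration of the three steps in the note above (downcase, drop non-alphanumerics, spaces to underscores), here is a plain-Ruby sketch. The `header_to_symbol` helper is hypothetical, and the exact regex and step order inside the gem's `:symbol` converter are assumptions. This version only keeps ASCII letters, digits and underscores.

```ruby
# Hypothetical helper: approximates the described :symbol header conversion.
def header_to_symbol(header)
  header.downcase
        .gsub(/[^a-z0-9_ ]/, '')   # drop anything that is not a-z, 0-9, _ or space
        .strip                     # trim spaces left at the edges
        .gsub(/ +/, '_')           # one or more spaces become one underscore
        .to_sym
end

headers = ['Brewery Name!', 'City', 'ABV %'].map { |h| header_to_symbol(h) }
pp headers
```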
241
+
242
+ Example:
243
+
244
+ ``` ruby
245
+ txt = <<TXT
246
+ a,b,c
247
+ 1,2,3
248
+ true,false,null
249
+ TXT
250
+
251
+ records = CsvHash.parse( txt, :converters => :all, :header_converters => :symbol )
252
+ pp records
253
+ # => [{a: 1, b: 2, c: 3},
254
+ # {a: true, b: false, c: nil}]
255
+
256
+ # -or-
257
+ options = { :converters => :all,
258
+ :header_converters => :symbol }
259
+
260
+ records = CsvHash.parse( txt, options )
261
+ pp records
262
+ # => [{a: 1, b: 2, c: 3},
263
+ # {a: true, b: false, c: nil}]
264
+ ```
265
+
266
+ Built-in header converters include:
267
+
268
+ | Converter | Comments |
269
+ |--------------|---------------------|
270
+ | `:downcase` | downcase strings |
271
+ | `:symbol` | convert strings to symbols (and downcase and remove non-alphanumerics) |
272
+
273
+
274
+
275
+ ### What about (typed) structs?
276
+
277
+ See the [csvrecord library »](https://github.com/csvreader/csvrecord)
278
+
279
+ Example from the csvrecord docs:
280
+
281
+ Step 1: Define a (typed) struct for the comma-separated values (csv) records. Example:
282
+
283
+ ```ruby
284
+ require 'csvrecord'
285
+
286
+ Beer = CsvRecord.define do
287
+ field :brewery ## note: default type is :string
288
+ field :city
289
+ field :name
290
+ field :abv, Float ## allows type specified as class (or use :float)
291
+ end
292
+ ```
293
+
294
+ or in "classic" style:
295
+
296
+ ```ruby
297
+ class Beer < CsvRecord::Base
298
+ field :brewery
299
+ field :city
300
+ field :name
301
+ field :abv, Float
302
+ end
303
+ ```
304
+
305
+
306
+ Step 2: Read in the comma-separated values (csv) datafile. Example:
307
+
308
+ ```ruby
309
+ beers = Beer.read( 'beer.csv' )
310
+
311
+ puts "#{beers.size} beers:"
312
+ pp beers
313
+ ```
314
+
315
+ pretty prints (pp):
316
+
317
+ ```
318
+ 6 beers:
319
+ [#<Beer:0x302c760 @values=
320
+ ["Andechser Klosterbrauerei", "Andechs", "Doppelbock Dunkel", 7.0]>,
321
+ #<Beer:0x3026fe8 @values=
322
+ ["Augustiner Br\u00E4u M\u00FCnchen", "M\u00FCnchen", "Edelstoff", 5.6]>,
323
+ #<Beer:0x30257a0 @values=
324
+ ["Bayerische Staatsbrauerei Weihenstephan", "Freising", "Hefe Weissbier", 5.4]>,
325
+ ...
326
+ ]
327
+ ```
328
+
329
+ Or loop over the records. Example:
330
+
331
+ ``` ruby
332
+ Beer.read( 'beer.csv' ).each do |rec|
333
+ puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
334
+ end
335
+
336
+ # -or-
337
+
338
+ Beer.foreach( 'beer.csv' ) do |rec|
339
+ puts "#{rec.name} (#{rec.abv}%) by #{rec.brewery}, #{rec.city}"
340
+ end
341
+ ```
342
+
343
+
344
+ printing:
345
+
346
+ ```
347
+ Doppelbock Dunkel (7.0%) by Andechser Klosterbrauerei, Andechs
348
+ Edelstoff (5.6%) by Augustiner Bräu München, München
349
+ Hefe Weissbier (5.4%) by Bayerische Staatsbrauerei Weihenstephan, Freising
350
+ Rauchbier Märzen (5.1%) by Brauerei Spezial, Bamberg
351
+ Münchner Dunkel (5.0%) by Hacker-Pschorr Bräu, München
352
+ Hofbräu Oktoberfestbier (6.3%) by Staatliches Hofbräuhaus München, München
353
+ ```
354
+
355
+
356
+ ### What about tabular data packages with pre-defined types / schemas?
357
+
358
+ See the [csvpack library »](https://github.com/csvreader/csvpack)
359
+
360
+
361
+
362
+
363
+
364
+ ## Frequently Asked Questions (FAQ) and Answers
365
+
366
+ ### Q: What's CSV the right way? What best practices can I use?
367
+
368
+ Use best practices out-of-the-box with zero-configuration.
369
+ Do you know how to skip blank lines or how to add `#` single-line comments?
370
+ Or how to trim leading and trailing spaces? No worries. It's turned on by default.
371
+
372
+ That means you can write
373
+
374
+ ```
375
+ #######
376
+ # try with some comments
377
+ # and blank lines even before header (first row)
378
+
379
+ Brewery,City,Name,Abv
380
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
381
+ Augustiner Bräu München,München,Edelstoff,5.6%
382
+
383
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
384
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
385
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
386
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
387
+ ```
388
+
389
+ instead of strict "classic"
390
+ (no blank lines, no comments, no leading and trailing spaces, etc.):
391
+
392
+ ```
393
+ Brewery,City,Name,Abv
394
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
395
+ Augustiner Bräu München,München,Edelstoff,5.6%
396
+ Bayerische Staatsbrauerei Weihenstephan,Freising,Hefe Weissbier,5.4%
397
+ Brauerei Spezial,Bamberg,Rauchbier Märzen,5.1%
398
+ Hacker-Pschorr Bräu,München,Münchner Dunkel,5.0%
399
+ Staatliches Hofbräuhaus München,München,Hofbräu Oktoberfestbier,6.3%
400
+ ```
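
The relaxed rules above (blank lines, `#` comments, surrounding spaces) can be pictured as a filter that decides which lines count as records. The following is only a sketch in plain Ruby under that reading; the helper name `record_lines` is hypothetical, and it deliberately stops short of splitting values, because quoted values need a real parser.

```ruby
# Sketch only (not csvreader's implementation): pick the lines that count as records.
# Assumption: a comment is any line whose first non-space character is '#'.
def record_lines(txt)
  txt.each_line
     .map(&:strip)                              # trim leading and trailing spaces
     .reject { |line| line.empty? }             # skip blank lines
     .reject { |line| line.start_with?('#') }   # skip single-line comments
end

txt = <<TXT
#######
# try with some comments

Brewery,City,Name,Abv
  Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%

Augustiner Bräu München,München,Edelstoff,5.6%
TXT

pp record_lines(txt)
# => ["Brewery,City,Name,Abv",
#     "Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%",
#     "Augustiner Bräu München,München,Edelstoff,5.6%"]
```

Note that this trims whole lines only; trimming around each value (as in `Freising, Hefe Weissbier`) happens inside the parser.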
401
+
402
+
403
+ Or use the ARFF (attribute-relation file format)-like alternative style
404
+ with `%` for comments and `@`-directives
405
+ for "meta data" in the header (before any records):
406
+
407
+ ```
408
+ %%%%%%%%%%%%%%%%%%
409
+ % try with some comments
410
+ % and blank lines even before @-directives in header
411
+
412
+ @RELATION Beer
413
+
414
+ @ATTRIBUTE Brewery
415
+ @ATTRIBUTE City
416
+ @ATTRIBUTE Name
417
+ @ATTRIBUTE Abv
418
+
419
+ @DATA
420
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
421
+ Augustiner Bräu München,München,Edelstoff,5.6%
422
+
423
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
424
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
425
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
426
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
427
+ ```
428
+
429
+ Or use the ARFF (attribute-relation file format)-like alternative style with `@`-directives
430
+ inside comments (for easier backwards compatibility with old readers)
431
+ for "meta data" in the header (before any records):
432
+
433
+ ```
434
+ ##########################
435
+ # try with some comments
436
+ # and blank lines even before @-directives in header
437
+ #
438
+ # @RELATION Beer
439
+ #
440
+ # @ATTRIBUTE Brewery
441
+ # @ATTRIBUTE City
442
+ # @ATTRIBUTE Name
443
+ # @ATTRIBUTE Abv
444
+
445
+ Andechser Klosterbrauerei,Andechs,Doppelbock Dunkel,7%
446
+ Augustiner Bräu München,München,Edelstoff,5.6%
447
+
448
+ Bayerische Staatsbrauerei Weihenstephan, Freising, Hefe Weissbier, 5.4%
449
+ Brauerei Spezial, Bamberg, Rauchbier Märzen, 5.1%
450
+ Hacker-Pschorr Bräu, München, Münchner Dunkel, 5.0%
451
+ Staatliches Hofbräuhaus München, München, Hofbräu Oktoberfestbier, 6.3%
452
+ ```
453
+
454
+
455
+
456
+ ### Q: How can I change the default format / dialect?
457
+
458
+ The reader includes more than half a dozen pre-configured
459
+ formats (dialects).
460
+
461
+ Use strict if you do NOT want to trim leading and trailing spaces
462
+ and if you do NOT want to skip blank lines. Example:
463
+
464
+ ``` ruby
465
+ txt = <<TXT
466
+ 1, 2,3
467
+ 4,5 ,6
468
+
469
+ TXT
470
+
471
+ records = Csv.strict.parse( txt )
472
+ pp records
473
+ # => [["1","•2","3"],
474
+ # ["4","5•","6"],
475
+ # [""]]
476
+ ```
477
+
478
+ More strict pre-configured variants include:
479
+
480
+ `Csv.mysql` uses:
481
+
482
+ ``` ruby
483
+ ParserStrict.new( sep: "\t",
484
+ quote: false,
485
+ escape: true,
486
+ null: "\\N" )
487
+ ```
488
+
489
+ `Csv.postgres` or `Csv.postgresql` uses:
490
+
491
+ ``` ruby
492
+ ParserStrict.new( doublequote: false,
493
+ escape: true,
494
+ null: "" )
495
+ ```
496
+
497
+ `Csv.postgres_text` or `Csv.postgresql_text` uses:
498
+
499
+ ``` ruby
500
+ ParserStrict.new( sep: "\t",
501
+ quote: false,
502
+ escape: true,
503
+ null: "\\N" )
504
+ ```
505
+
506
+ and so on.
507
+
508
+
509
+ ### Q: How can I change the separator to semicolon (`;`) or pipe (`|`) or tab (`\t`)?
510
+
511
+ Pass in the `sep` keyword option
512
+ to the parser. Example:
513
+
514
+ ``` ruby
515
+ Csv.parse( ..., sep: ';' )
516
+ Csv.read( ..., sep: ';' )
517
+ # ...
518
+ Csv.parse( ..., sep: '|' )
519
+ Csv.read( ..., sep: '|' )
520
+ # and so on
521
+ ```
522
+
523
+ Note: If you use tab (`\t`) use the `TabReader`
524
+ (or for your convenience the built-in `Csv.tab` alias)!
525
+ If you use the "classic" one or more space or tab (`/[ \t]+/`) regex
526
+ use the `TableReader`
527
+ (or for your convenience the built-in `Csv.table` alias)!
528
+
529
+
530
+ Note: The default ("The Right Way") parser does NOT allow space or tab
531
+ as separator (because leading and trailing space always gets trimmed
532
+ unless inside quotes, etc.). Use the `strict` parser if you want
533
+ to make up your own format with space or tab as a separator
534
+ or if you want every space or tab to count (that is, to be significant).
535
+
536
+
537
+
538
+ Aside: Why? Tab != CSV. Yes, tab is
539
+ its own (even) simpler format
540
+ (e.g. no escape rules, no newlines in values, etc.),
541
+ see [`TabReader` »](https://github.com/csvreader/tabreader).
542
+
543
+ ``` ruby
544
+ Csv.tab.parse( ... ) # note: "classic" strict tab format
545
+ Csv.tab.read( ... )
546
+ # ...
547
+
548
+ Csv.table.parse( ... ) # note: "classic" one or more space (or tab) table format
549
+ Csv.table.read( ... )
550
+ # ...
551
+ ```
552
+
553
+ If you want double quote escape rules, newlines in quoted values, etc. use
554
+ the "strict" parser with the separator (`sep`) changed to tab (`\t`).
555
+
556
+ ``` ruby
557
+ Csv.strict.parse( ..., sep: "\t" ) # note: csv-like tab format with quotes
558
+ Csv.strict.read( ..., sep: "\t" )
559
+ # ...
560
+ ```
561
+
562
+
563
+
564
+
565
+ ### Q: How can I read records with fixed width fields (and no separator)?
566
+
567
+ Pass in the `width` keyword option with the field widths / lengths
568
+ to the "fixed" parser. Example:
569
+
570
+ ``` ruby
571
+ txt = <<TXT
572
+ 12345678123456781234567890123456789012345678901212345678901234
573
+ TXT
574
+
575
+ Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
576
+ # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
577
+
578
+
579
+ txt = <<TXT
580
+ John    Smith   john@example.com                1-888-555-6666
581
+ Michele O'Reileymichele@example.com             1-333-321-8765
582
+ TXT
583
+
584
+ Csv.fixed.parse( txt, width: [8,8,32,14] ) # or Csv.fix or Csv.f
585
+ # => [["John", "Smith", "john@example.com", "1-888-555-6666"],
586
+ # ["Michele", "O'Reiley", "michele@example.com", "1-333-321-8765"]]
587
+
588
+ # and so on
589
+ ```
590
+
591
+ <!--
592
+ Note: You can use for your convenience the built-in
593
+ `Csv.fix` or `Csv.f` aliases / shortcuts.
594
+ -->
595
+
596
+
597
+ Note: You can use negative widths (e.g. `-2`, `-3`, and so on)
598
+ to "skip" filler fields (e.g. `--`, `---`, and so on).
599
+ Example:
600
+
601
+ ``` ruby
602
+ txt = <<TXT
603
+ 12345678--12345678---12345678901234567890123456789012--12345678901234XXX
604
+ TXT
605
+
606
+ Csv.fixed.parse( txt, width: [8,-2,8,-3,32,-2,14] ) # or Csv.fix or Csv.f
607
+ # => [["12345678","12345678", "12345678901234567890123456789012", "12345678901234"]]
608
+ ```
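
How positive and negative widths interact can be sketched in a few lines of plain Ruby. This is an illustration of the idea, not the "fixed" parser itself: the helper name `split_fixed` is hypothetical, and stripping surrounding spaces from each field is an assumption based on the sample output above.

```ruby
# Sketch only (not csvreader's implementation): slice one line by field widths.
# A positive width yields a field; a negative width skips that many filler characters.
# Assumption: field values are stripped of surrounding spaces.
def split_fixed(line, widths)
  pos = 0
  widths.each_with_object([]) do |width, fields|
    fields << line[pos, width].to_s.strip if width.positive?
    pos += width.abs   # advance past the field or the filler
  end
end

line = '12345678--12345678---12345678901234567890123456789012--12345678901234XXX'
pp split_fixed(line, [8, -2, 8, -3, 32, -2, 14])
# => ["12345678", "12345678", "12345678901234567890123456789012", "12345678901234"]
```

Anything after the last width (here `XXX`) is simply never read.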
609
+
610
+
611
+
612
+
613
+
614
+ ### Q: What's broken in the standard library CSV reader?
615
+
616
+ Two major design bugs and many, many minor ones.
617
+
618
+ (1) The CSV class uses [`line.split(',')`](https://github.com/ruby/csv/blob/master/lib/csv.rb#L1255) with some kludges (†) with the claim it's faster.
619
+ What?! The right way: CSV needs its own purpose-built parser. There's no other
620
+ way you can handle all the (edge) cases with double quotes and escaped doubled up
621
+ double quotes. Period.
622
+
623
+ For example, the CSV class cannot handle leading or trailing spaces
624
+ for double quoted values `1,•"2","3"•`.
625
+ Nor can it handle double quotes inside values, and so on.
626
+
627
+ (2) The CSV class returns `nil` for `,,` but an empty string (`""`)
628
+ for `"","",""`. The right way: All values are always strings. Period.
629
+
630
+ If you want to use `nil` you MUST configure a string (or strings)
631
+ such as `NA`, `n/a`, `\N`, or similar that map to `nil`.
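
The "all values are strings unless a null string is configured" rule can be sketched in plain Ruby as follows. The helper `nullify` and its `nulls:` keyword are hypothetical names used for illustration; they are not csvreader's API.

```ruby
# Sketch only: map configured "null" strings to nil, keep everything else as a string.
# Note: '\N' in single quotes is a backslash followed by N (two characters).
def nullify(values, nulls: ['NA', 'n/a', '\N'])
  values.map { |value| nulls.include?(value) ? nil : value }
end

pp nullify(['1', '', 'NA', '\N', 'n/a'])
# => ["1", "", nil, nil, nil]
```

An empty value stays an empty string (`""`) here; it only becomes `nil` if `""` is added to the list of null strings.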
632
+
633
+
634
+ (†): kludge - a workaround or quick-and-dirty solution that is clumsy, inelegant, inefficient, difficult to extend and hard to maintain
635
+
636
+ Appendix: Simple examples the standard csv library cannot read:
637
+
638
+ Quoted values with leading or trailing spaces e.g.
639
+
640
+ ```
641
+ 1, "2","3" , "4" ,5
642
+ ```
643
+
644
+ =>
645
+
646
+ ``` ruby
647
+ ["1", "2", "3", "4", "5"]
648
+ ```
649
+
650
+ "Auto-fix" unambiguous quotes in "unquoted" values e.g.
651
+
652
+ ```
653
+ value with "quotes", another value
654
+ ```
655
+
656
+ =>
657
+
658
+ ``` ruby
659
+ ["value with \"quotes\"", "another value"]
660
+ ```
661
+
662
+ and some more.
663
+
664
+
665
+
666
+
667
+ ## Alternatives
668
+
669
+ See the Libraries & Tools section in the [Awesome CSV](https://github.com/csvspecs/awesome-csv#libraries--tools) page.
670
+
671
+
672
+ ## License
673
+
674
+ ![](https://publicdomainworks.github.io/buttons/zero88x31.png)
675
+
676
+ The `csvreader` scripts are dedicated to the public domain.
677
+ Use them as you please with no restrictions whatsoever.
678
+
679
+ ## Questions? Comments?
680
+
681
+ Send them along to the [wwwmake forum](http://groups.google.com/group/wwwmake).
682
+ Thanks!