csv 0.0.1

Files changed (3)
  1. checksums.yaml +7 -0
  2. data/lib/csv.rb +2381 -0
  3. metadata +73 -0
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 723a7a56a9be6c37293ee26e91afc2618bf558fb
4
+ data.tar.gz: 356ad9c2ebc55e05b2c07e7a0c5b65e47614a39b
5
+ SHA512:
6
+ metadata.gz: f10f07c53cf9cdda7587d21ae5bd92ff5fe2a5c6bd18cebdb68cb7b808bc95a1a5933bef69f068dbab6737da71c360737086792b436a5aa3dc2705f43335124e
7
+ data.tar.gz: e3b48413c43a7803ce93d1079b36f0fd679afce318dc6af6a638f628e02c345a51dac638e1bb80e01c0e891b559bdb6b2898e47a0be4c53649f383c35b3d1999
@@ -0,0 +1,2381 @@
1
+ # encoding: US-ASCII
2
+ # frozen_string_literal: true
3
+ # = csv.rb -- CSV Reading and Writing
4
+ #
5
+ # Created by James Edward Gray II on 2005-10-31.
6
+ # Copyright 2005 James Edward Gray II. You can redistribute or modify this code
7
+ # under the terms of Ruby's license.
8
+ #
9
+ # See CSV for documentation.
10
+ #
11
+ # == Description
12
+ #
13
+ # Welcome to the new and improved CSV.
14
+ #
15
+ # This version of the CSV library began its life as FasterCSV. FasterCSV was
16
+ # intended as a replacement for Ruby's then-standard CSV library. It was
17
+ # designed to address concerns users of that library had and it had three
18
+ # primary goals:
19
+ #
20
+ # 1. Be significantly faster than CSV while remaining a pure Ruby library.
21
+ # 2. Use a smaller and easier to maintain code base. (FasterCSV eventually
22
+ # grew larger, but was also considerably richer in features. The parsing
23
+ # core remains quite small.)
24
+ # 3. Improve on the CSV interface.
25
+ #
26
+ # Obviously, the last one is subjective. I did try to defer to the original
27
+ # interface whenever I didn't have a compelling reason to change it though, so
28
+ # hopefully this won't be too radically different.
29
+ #
30
+ # We must have met our goals because FasterCSV was renamed to CSV and replaced
31
+ # the original library as of Ruby 1.9. If you are migrating code from 1.8 or
32
+ # earlier, you may have to change your code to comply with the new interface.
33
+ #
34
+ # == What's Different From the Old CSV?
35
+ #
36
+ # I'm sure I'll miss something, but I'll try to mention most of the major
37
+ # differences I am aware of, to help others quickly get up to speed:
38
+ #
39
+ # === CSV Parsing
40
+ #
41
+ # * This parser is m17n aware. See CSV for full details.
42
+ # * This library has a stricter parser and will raise MalformedCSVError on
43
+ # problematic data.
44
+ # * This library has a less liberal idea of a line ending than CSV. What you
45
+ # set as the <tt>:row_sep</tt> is law. It can auto-detect your line endings
46
+ # though.
47
+ # * The old library returned empty lines as <tt>[nil]</tt>. This library calls
48
+ # them <tt>[]</tt>.
49
+ # * This library has a much faster parser.
50
+ #
51
+ # === Interface
52
+ #
53
+ # * CSV now uses Hash-style parameters to set options.
54
+ # * CSV no longer has generate_row() or parse_row().
55
+ # * The old CSV's Reader and Writer classes have been dropped.
56
+ # * CSV::open() is now more like Ruby's open().
57
+ # * CSV objects now support most standard IO methods.
58
+ # * CSV now has a new() method used to wrap objects like String and IO for
59
+ # reading and writing.
60
+ # * CSV::generate() is different from the old method.
61
+ # * CSV no longer supports partial reads. It works line-by-line.
62
+ # * CSV no longer allows the instance methods to override the separators for
63
+ # performance reasons. They must be set in the constructor.
64
+ #
65
+ # If you use this library and find yourself missing any functionality I have
66
+ # trimmed, please {let me know}[mailto:james@grayproductions.net].
67
+ #
68
+ # == Documentation
69
+ #
70
+ # See CSV for documentation.
71
+ #
72
+ # == What is CSV, really?
73
+ #
74
+ # CSV maintains a pretty strict definition of CSV taken directly from
75
+ # {the RFC}[http://www.ietf.org/rfc/rfc4180.txt]. I relax the rules in only one
76
+ # place and that is to make using this library easier. CSV will parse all valid
77
+ # CSV.
78
+ #
79
+ # What you don't want to do is feed CSV invalid data. Because of the way the
80
+ # CSV format works, it's common for a parser to need to read until the end of
81
+ # the file to be sure a field is invalid. This eats a lot of time and memory.
82
+ #
83
+ # Luckily, when working with invalid CSV, Ruby's built-in methods will almost
84
+ # always be superior in every way. For example, parsing non-quoted fields is as
85
+ # easy as:
86
+ #
87
+ # data.split(",")
88
+ #
89
+ # == Questions and/or Comments
90
+ #
91
+ # Feel free to email {James Edward Gray II}[mailto:james@grayproductions.net]
92
+ # with any questions.
93
+
94
+ require "forwardable"
95
+ require "English"
96
+ require "date"
97
+ require "stringio"
98
+
99
+ #
100
+ # This class provides a complete interface to CSV files and data. It offers
101
+ # tools to enable you to read and write to and from Strings or IO objects, as
102
+ # needed.
103
+ #
104
+ # == Reading
105
+ #
106
+ # === From a File
107
+ #
108
+ # ==== A Line at a Time
109
+ #
110
+ # CSV.foreach("path/to/file.csv") do |row|
111
+ # # use row here...
112
+ # end
113
+ #
114
+ # ==== All at Once
115
+ #
116
+ # arr_of_arrs = CSV.read("path/to/file.csv")
117
+ #
118
+ # === From a String
119
+ #
120
+ # ==== A Line at a Time
121
+ #
122
+ # CSV.parse("CSV,data,String") do |row|
123
+ # # use row here...
124
+ # end
125
+ #
126
+ # ==== All at Once
127
+ #
128
+ # arr_of_arrs = CSV.parse("CSV,data,String")
129
+ #
130
+ # == Writing
131
+ #
132
+ # === To a File
133
+ #
134
+ # CSV.open("path/to/file.csv", "wb") do |csv|
135
+ # csv << ["row", "of", "CSV", "data"]
136
+ # csv << ["another", "row"]
137
+ # # ...
138
+ # end
139
+ #
140
+ # === To a String
141
+ #
142
+ # csv_string = CSV.generate do |csv|
143
+ # csv << ["row", "of", "CSV", "data"]
144
+ # csv << ["another", "row"]
145
+ # # ...
146
+ # end
147
+ #
148
+ # == Convert a Single Line
149
+ #
150
+ # csv_string = ["CSV", "data"].to_csv # to CSV
151
+ # csv_array = "CSV,String".parse_csv # from CSV
152
+ #
153
+ # == Shortcut Interface
154
+ #
155
+ # CSV { |csv_out| csv_out << %w{my data here} } # to $stdout
156
+ # CSV(csv = "") { |csv_str| csv_str << %w{my data here} } # to a String
157
+ # CSV($stderr) { |csv_err| csv_err << %w{my data here} } # to $stderr
158
+ # CSV($stdin) { |csv_in| csv_in.each { |row| p row } } # from $stdin
159
+ #
160
+ # == Advanced Usage
161
+ #
162
+ # === Wrap an IO Object
163
+ #
164
+ # csv = CSV.new(io, options)
165
+ # # ... read (with gets() or each()) from and write (with <<) to csv here ...
166
+ #
167
+ # == CSV and Character Encodings (M17n or Multilingualization)
168
+ #
169
+ # This new CSV parser is m17n savvy. The parser works in the Encoding of the IO
170
+ # or String object being read from or written to. Your data is never transcoded
171
+ # (unless you ask Ruby to transcode it for you) and will literally be parsed in
172
+ # the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the
173
+ # Encoding of your data. This is accomplished by transcoding the parser itself
174
+ # into your Encoding.
175
+ #
176
+ # Some transcoding must take place, of course, to accomplish this multiencoding
177
+ # support. For example, <tt>:col_sep</tt>, <tt>:row_sep</tt>, and
178
+ # <tt>:quote_char</tt> must be transcoded to match your data. Hopefully this
179
+ # makes the entire process feel transparent, since CSV's defaults should just
180
+ # magically work for your data. However, you can set these values manually in
181
+ # the target Encoding to avoid the translation.
182
+ #
183
+ # It's also important to note that while all of CSV's core parser is now
184
+ # Encoding agnostic, some features are not. For example, the built-in
185
+ # converters will try to transcode data to UTF-8 before making conversions.
186
+ # Again, you can provide custom converters that are aware of your Encodings to
187
+ # avoid this translation. It's just too hard for me to support native
188
+ # conversions in all of Ruby's Encodings.
189
+ #
190
+ # Anyway, the practical side of this is simple: make sure IO and String objects
191
+ # passed into CSV have the proper Encoding set and everything should just work.
192
+ # CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(),
193
+ # CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
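+ #
+ # For example (a sketch; the file name and source Encoding are illustrative):
+ #
+ #   CSV.foreach("data.csv", encoding: "Shift_JIS:UTF-8") do |row|
+ #     # rows arrive as UTF-8 Strings after transcoding
+ #   end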
194
+ #
195
+ # One minor exception comes when generating CSV into a String with an Encoding
196
+ # that is not ASCII compatible. There's no existing data for CSV to use to
197
+ # prepare itself and thus you will probably need to manually specify the desired
198
+ # Encoding for most of those cases. It will try to guess using the fields in a
199
+ # row of output though, when using CSV::generate_line() or Array#to_csv().
200
+ #
201
+ # I try to point out any other Encoding issues in the documentation of methods
202
+ # as they come up.
203
+ #
204
+ # This has been tested to the best of my ability with all non-"dummy" Encodings
205
+ # Ruby ships with. However, it is brave new code and may have some bugs.
206
+ # Please feel free to {report}[mailto:james@grayproductions.net] any issues you
207
+ # find with it.
208
+ #
209
+ class CSV
210
+ # The version of the installed library.
211
+ VERSION = "2.4.8"
212
+
213
+ #
214
+ # A CSV::Row is part Array and part Hash. It retains an order for the fields
215
+ # and allows duplicates just as an Array would, but also allows you to access
216
+ # fields by name just as you could if they were in a Hash.
217
+ #
218
+ # All rows returned by CSV will be constructed from this class, if header row
219
+ # processing is activated.
220
+ #
221
+ class Row
222
+ #
223
+ # Construct a new CSV::Row from +headers+ and +fields+, which are expected
224
+ # to be Arrays. If one Array is shorter than the other, it will be padded
225
+ # with +nil+ objects.
226
+ #
227
+ # The optional +header_row+ parameter can be set to +true+ to indicate, via
228
+ # CSV::Row.header_row?() and CSV::Row.field_row?(), that this is a header
229
+ # row. Otherwise, the row is assumed to be a field row.
230
+ #
231
+ # A CSV::Row object supports the following Array methods through delegation:
232
+ #
233
+ # * empty?()
234
+ # * length()
235
+ # * size()
236
+ #
237
+ def initialize(headers, fields, header_row = false)
238
+ @header_row = header_row
239
+ headers.each { |h| h.freeze if h.is_a? String }
240
+
241
+ # handle extra headers or fields
242
+ @row = if headers.size >= fields.size
243
+ headers.zip(fields)
244
+ else
245
+ fields.zip(headers).map { |pair| pair.reverse! }
246
+ end
247
+ end
248
+
249
+ # Internal data format used to compare equality.
250
+ attr_reader :row
251
+ protected :row
252
+
253
+ ### Array Delegation ###
254
+
255
+ extend Forwardable
256
+ def_delegators :@row, :empty?, :length, :size
257
+
258
+ # Returns +true+ if this is a header row.
259
+ def header_row?
260
+ @header_row
261
+ end
262
+
263
+ # Returns +true+ if this is a field row.
264
+ def field_row?
265
+ not header_row?
266
+ end
267
+
268
+ # Returns the headers of this row.
269
+ def headers
270
+ @row.map { |pair| pair.first }
271
+ end
272
+
273
+ #
274
+ # :call-seq:
275
+ # field( header )
276
+ # field( header, offset )
277
+ # field( index )
278
+ #
279
+ # This method will return the field value by +header+ or +index+. If a field
280
+ # is not found, +nil+ is returned.
281
+ #
282
+ # When provided, +offset+ ensures that a header match occurs on or later
283
+ # than the +offset+ index. You can use this to find duplicate headers,
284
+ # without resorting to hard-coding exact indices.
285
+ #
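+ # For example, with duplicate headers (illustrative values):
+ #
+ #   row = CSV::Row.new(%w{name name}, %w{first second})
+ #   row.field("name")     # => "first"
+ #   row.field("name", 1)  # => "second"
+ #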
286
+ def field(header_or_index, minimum_index = 0)
287
+ # locate the pair
288
+ finder = (header_or_index.is_a?(Integer) || header_or_index.is_a?(Range)) ? :[] : :assoc
289
+ pair = @row[minimum_index..-1].send(finder, header_or_index)
290
+
291
+ # return the field if we have a pair
292
+ if pair.nil?
293
+ nil
294
+ else
295
+ header_or_index.is_a?(Range) ? pair.map(&:last) : pair.last
296
+ end
297
+ end
298
+ alias_method :[], :field
299
+
300
+ #
301
+ # :call-seq:
302
+ # fetch( header )
303
+ # fetch( header ) { |row| ... }
304
+ # fetch( header, default )
305
+ #
306
+ # This method will fetch the field value by +header+. It has the same
307
+ # behavior as Hash#fetch: if there is a field with the given +header+, its
308
+ # value is returned. Otherwise, if a block is given, it is yielded the
309
+ # +header+ and its result is returned; if a +default+ is given as the
310
+ # second argument, it is returned; otherwise a KeyError is raised.
311
+ #
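+ # For example (illustrative values):
+ #
+ #   row = CSV::Row.new(%w{name}, %w{Alice})
+ #   row.fetch("name")                  # => "Alice"
+ #   row.fetch("age", "unknown")        # => "unknown"
+ #   row.fetch("age") { |h| h.to_sym }  # => :age
+ #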
312
+ def fetch(header, *varargs)
313
+ raise ArgumentError, "Too many arguments" if varargs.length > 1
314
+ pair = @row.assoc(header)
315
+ if pair
316
+ pair.last
317
+ else
318
+ if block_given?
319
+ yield header
320
+ elsif varargs.empty?
321
+ raise KeyError, "key not found: #{header}"
322
+ else
323
+ varargs.first
324
+ end
325
+ end
326
+ end
327
+
328
+ # Returns +true+ if there is a field with the given +header+.
329
+ def has_key?(header)
330
+ !!@row.assoc(header)
331
+ end
332
+ alias_method :include?, :has_key?
333
+ alias_method :key?, :has_key?
334
+ alias_method :member?, :has_key?
335
+
336
+ #
337
+ # :call-seq:
338
+ # []=( header, value )
339
+ # []=( header, offset, value )
340
+ # []=( index, value )
341
+ #
342
+ # Looks up the field by the semantics described in CSV::Row.field() and
343
+ # assigns the +value+.
344
+ #
345
+ # Assigning past the end of the row with an index will set all pairs between
346
+ # to <tt>[nil, nil]</tt>. Assigning to an unused header appends the new
347
+ # pair.
348
+ #
349
+ def []=(*args)
350
+ value = args.pop
351
+
352
+ if args.first.is_a? Integer
353
+ if @row[args.first].nil? # extending past the end with index
354
+ @row[args.first] = [nil, value]
355
+ @row.map! { |pair| pair.nil? ? [nil, nil] : pair }
356
+ else # normal index assignment
357
+ @row[args.first][1] = value
358
+ end
359
+ else
360
+ index = index(*args)
361
+ if index.nil? # appending a field
362
+ self << [args.first, value]
363
+ else # normal header assignment
364
+ @row[index][1] = value
365
+ end
366
+ end
367
+ end
368
+
369
+ #
370
+ # :call-seq:
371
+ # <<( field )
372
+ # <<( header_and_field_array )
373
+ # <<( header_and_field_hash )
374
+ #
375
+ # If a two-element Array is provided, it is assumed to be a header and field
376
+ # and the pair is appended. A Hash works the same way with the key being
377
+ # the header and the value being the field. Anything else is assumed to be
378
+ # a lone field which is appended with a +nil+ header.
379
+ #
380
+ # This method returns the row for chaining.
381
+ #
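+ # For example (illustrative values):
+ #
+ #   row = CSV::Row.new([], [])
+ #   row << "bare field"             # appended with a +nil+ header
+ #   row << ["header", "value"]      # appended as a header/field pair
+ #   row << {"key" => "data"}        # each key/value pair is appended
+ #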
382
+ def <<(arg)
383
+ if arg.is_a?(Array) and arg.size == 2 # appending a header and name
384
+ @row << arg
385
+ elsif arg.is_a?(Hash) # append header and name pairs
386
+ arg.each { |pair| @row << pair }
387
+ else # append field value
388
+ @row << [nil, arg]
389
+ end
390
+
391
+ self # for chaining
392
+ end
393
+
394
+ #
395
+ # A shortcut for appending multiple fields. Equivalent to:
396
+ #
397
+ # args.each { |arg| csv_row << arg }
398
+ #
399
+ # This method returns the row for chaining.
400
+ #
401
+ def push(*args)
402
+ args.each { |arg| self << arg }
403
+
404
+ self # for chaining
405
+ end
406
+
407
+ #
408
+ # :call-seq:
409
+ # delete( header )
410
+ # delete( header, offset )
411
+ # delete( index )
412
+ #
413
+ # Used to remove a pair from the row by +header+ or +index+. The pair is
414
+ # located as described in CSV::Row.field(). The deleted pair is returned,
415
+ # or an empty Array if a pair could not be found.
416
+ #
417
+ def delete(header_or_index, minimum_index = 0)
418
+ if header_or_index.is_a? Integer # by index
419
+ @row.delete_at(header_or_index)
420
+ elsif i = index(header_or_index, minimum_index) # by header
421
+ @row.delete_at(i)
422
+ else
423
+ [ ]
424
+ end
425
+ end
426
+
427
+ #
428
+ # The provided +block+ is passed a header and field for each pair in the row
429
+ # and expected to return +true+ or +false+, depending on whether the pair
430
+ # should be deleted.
431
+ #
432
+ # This method returns the row for chaining.
433
+ #
434
+ # If no block is given, an Enumerator is returned.
435
+ #
436
+ def delete_if(&block)
437
+ block or return enum_for(__method__) { size }
438
+
439
+ @row.delete_if(&block)
440
+
441
+ self # for chaining
442
+ end
443
+
444
+ #
445
+ # This method accepts any number of arguments which can be headers, indices,
446
+ # Ranges of either, or two-element Arrays containing a header and offset.
447
+ # Each argument will be replaced with a field lookup as described in
448
+ # CSV::Row.field().
449
+ #
450
+ # If called with no arguments, all fields are returned.
451
+ #
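+ # For example (illustrative values):
+ #
+ #   row = CSV::Row.new(%w{a b c}, %w{1 2 3})
+ #   row.fields            # => ["1", "2", "3"]
+ #   row.fields("a", 2)    # => ["1", "3"]
+ #   row.fields("a".."b")  # => ["1", "2"]
+ #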
452
+ def fields(*headers_and_or_indices)
453
+ if headers_and_or_indices.empty? # return all fields--no arguments
454
+ @row.map { |pair| pair.last }
455
+ else # or work like values_at()
456
+ headers_and_or_indices.inject(Array.new) do |all, h_or_i|
457
+ all + if h_or_i.is_a? Range
458
+ index_begin = h_or_i.begin.is_a?(Integer) ? h_or_i.begin :
459
+ index(h_or_i.begin)
460
+ index_end = h_or_i.end.is_a?(Integer) ? h_or_i.end :
461
+ index(h_or_i.end)
462
+ new_range = h_or_i.exclude_end? ? (index_begin...index_end) :
463
+ (index_begin..index_end)
464
+ fields.values_at(new_range)
465
+ else
466
+ [field(*Array(h_or_i))]
467
+ end
468
+ end
469
+ end
470
+ end
471
+ alias_method :values_at, :fields
472
+
473
+ #
474
+ # :call-seq:
475
+ # index( header )
476
+ # index( header, offset )
477
+ #
478
+ # This method will return the index of a field with the provided +header+.
479
+ # The +offset+ can be used to locate duplicate header names, as described in
480
+ # CSV::Row.field().
481
+ #
482
+ def index(header, minimum_index = 0)
483
+ # find the pair
484
+ index = headers[minimum_index..-1].index(header)
485
+ # return the index at the right offset, if we found one
486
+ index.nil? ? nil : index + minimum_index
487
+ end
488
+
489
+ # Returns +true+ if +name+ is a header for this row, and +false+ otherwise.
490
+ def header?(name)
491
+ headers.include? name
492
+ end
493
+ alias_method :include?, :header?
494
+
495
+ #
496
+ # Returns +true+ if +data+ matches a field in this row, and +false+
497
+ # otherwise.
498
+ #
499
+ def field?(data)
500
+ fields.include? data
501
+ end
502
+
503
+ include Enumerable
504
+
505
+ #
506
+ # Yields each pair of the row as header and field tuples (much like
507
+ # iterating over a Hash). This method returns the row for chaining.
508
+ #
509
+ # If no block is given, an Enumerator is returned.
510
+ #
511
+ # Support for Enumerable.
512
+ #
513
+ def each(&block)
514
+ block or return enum_for(__method__) { size }
515
+
516
+ @row.each(&block)
517
+
518
+ self # for chaining
519
+ end
520
+
521
+ #
522
+ # Returns +true+ if this row contains the same headers and fields in the
523
+ # same order as +other+.
524
+ #
525
+ def ==(other)
526
+ return @row == other.row if other.is_a? CSV::Row
527
+ @row == other
528
+ end
529
+
530
+ #
531
+ # Collapses the row into a simple Hash. Be warned that this discards field
532
+ # order and clobbers duplicate fields.
533
+ #
534
+ def to_hash
535
+ # flatten just one level of the internal Array
536
+ Hash[*@row.inject(Array.new) { |ary, pair| ary.push(*pair) }]
537
+ end
538
+
539
+ #
540
+ # Returns the row as a CSV String. Headers are not used. Equivalent to:
541
+ #
542
+ # csv_row.fields.to_csv( options )
543
+ #
544
+ def to_csv(options = Hash.new)
545
+ fields.to_csv(options)
546
+ end
547
+ alias_method :to_s, :to_csv
548
+
549
+ # A summary of fields, by header, in an ASCII compatible String.
550
+ def inspect
551
+ str = ["#<", self.class.to_s]
552
+ each do |header, field|
553
+ str << " " << (header.is_a?(Symbol) ? header.to_s : header.inspect) <<
554
+ ":" << field.inspect
555
+ end
556
+ str << ">"
557
+ begin
558
+ str.join('')
559
+ rescue # any encoding error
560
+ str.map do |s|
561
+ e = Encoding::Converter.asciicompat_encoding(s.encoding)
562
+ e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
563
+ end.join('')
564
+ end
565
+ end
566
+ end
567
+
568
+ #
569
+ # A CSV::Table is a two-dimensional data structure for representing CSV
570
+ # documents. Tables allow you to work with the data by row or column,
571
+ # manipulate the data, and even convert the results back to CSV, if needed.
572
+ #
573
+ # All tables returned by CSV will be constructed from this class, if header
574
+ # row processing is activated.
575
+ #
576
+ class Table
577
+ #
578
+ # Construct a new CSV::Table from +array_of_rows+, which are expected
579
+ # to be CSV::Row objects. All rows are assumed to have the same headers.
580
+ #
581
+ # A CSV::Table object supports the following Array methods through
582
+ # delegation:
583
+ #
584
+ # * empty?()
585
+ # * length()
586
+ # * size()
587
+ #
588
+ def initialize(array_of_rows)
589
+ @table = array_of_rows
590
+ @mode = :col_or_row
591
+ end
592
+
593
+ # The current access mode for indexing and iteration.
594
+ attr_reader :mode
595
+
596
+ # Internal data format used to compare equality.
597
+ attr_reader :table
598
+ protected :table
599
+
600
+ ### Array Delegation ###
601
+
602
+ extend Forwardable
603
+ def_delegators :@table, :empty?, :length, :size
604
+
605
+ #
606
+ # Returns a duplicate table object, in column mode. This is handy for
607
+ # chaining in a single call without changing the table mode, but be aware
608
+ # that this method can consume a fair amount of memory for bigger data sets.
609
+ #
610
+ # This method returns the duplicate table for chaining. Don't chain
611
+ # destructive methods (like []=()) this way though, since you are working
612
+ # with a duplicate.
613
+ #
614
+ def by_col
615
+ self.class.new(@table.dup).by_col!
616
+ end
617
+
618
+ #
619
+ # Switches the mode of this table to column mode. All calls to indexing and
620
+ # iteration methods will work with columns until the mode is changed again.
621
+ #
622
+ # This method returns the table and is safe to chain.
623
+ #
624
+ def by_col!
625
+ @mode = :col
626
+
627
+ self
628
+ end
629
+
630
+ #
631
+ # Returns a duplicate table object, in mixed mode. This is handy for
632
+ # chaining in a single call without changing the table mode, but be aware
633
+ # that this method can consume a fair amount of memory for bigger data sets.
634
+ #
635
+ # This method returns the duplicate table for chaining. Don't chain
636
+ # destructive methods (like []=()) this way though, since you are working
637
+ # with a duplicate.
638
+ #
639
+ def by_col_or_row
640
+ self.class.new(@table.dup).by_col_or_row!
641
+ end
642
+
643
+ #
644
+ # Switches the mode of this table to mixed mode. All calls to indexing and
645
+ # iteration methods will use the default intelligent indexing system until
646
+ # the mode is changed again. In mixed mode an index is assumed to be a row
647
+ # reference while anything else is assumed to be column access by headers.
648
+ #
649
+ # This method returns the table and is safe to chain.
650
+ #
651
+ def by_col_or_row!
652
+ @mode = :col_or_row
653
+
654
+ self
655
+ end
656
+
657
+ #
658
+ # Returns a duplicate table object, in row mode. This is handy for chaining
659
+ # in a single call without changing the table mode, but be aware that this
660
+ # method can consume a fair amount of memory for bigger data sets.
661
+ #
662
+ # This method returns the duplicate table for chaining. Don't chain
663
+ # destructive methods (like []=()) this way though, since you are working
664
+ # with a duplicate.
665
+ #
666
+ def by_row
667
+ self.class.new(@table.dup).by_row!
668
+ end
669
+
670
+ #
671
+ # Switches the mode of this table to row mode. All calls to indexing and
672
+ # iteration methods will work with rows until the mode is changed again.
673
+ #
674
+ # This method returns the table and is safe to chain.
675
+ #
676
+ def by_row!
677
+ @mode = :row
678
+
679
+ self
680
+ end
681
+
682
+ #
683
+ # Returns the headers for the first row of this table (assumed to match all
684
+ # other rows). An empty Array is returned for empty tables.
685
+ #
686
+ def headers
687
+ if @table.empty?
688
+ Array.new
689
+ else
690
+ @table.first.headers
691
+ end
692
+ end
693
+
694
+ #
695
+ # In the default mixed mode, this method returns rows for index access and
696
+ # columns for header access. You can force the index association by first
697
+ # calling by_col!() or by_row!().
698
+ #
699
+ # Columns are returned as an Array of values. Altering that Array has no
700
+ # effect on the table.
701
+ #
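+ # For example (illustrative values):
+ #
+ #   table = CSV::Table.new([CSV::Row.new(%w{name age}, %w{Alice 30})])
+ #   table[0]       # => #<CSV::Row "name":"Alice" "age":"30">  (row by index)
+ #   table["name"]  # => ["Alice"]                              (column by header)
+ #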
702
+ def [](index_or_header)
703
+ if @mode == :row or # by index
704
+ (@mode == :col_or_row and (index_or_header.is_a?(Integer) or index_or_header.is_a?(Range)))
705
+ @table[index_or_header]
706
+ else # by header
707
+ @table.map { |row| row[index_or_header] }
708
+ end
709
+ end
710
+
711
+ #
712
+ # In the default mixed mode, this method assigns rows for index access and
713
+ # columns for header access. You can force the index association by first
714
+ # calling by_col!() or by_row!().
715
+ #
716
+ # Rows may be set to an Array of values (which will inherit the table's
717
+ # headers()) or a CSV::Row.
718
+ #
719
+ # Columns may be set to a single value, which is copied to each row of the
720
+ # column, or an Array of values. Arrays of values are assigned to rows top
721
+ # to bottom in row major order. Excess values are ignored and if the Array
722
+ # does not have a value for each row the extra rows will receive a +nil+.
723
+ #
724
+ # Assigning to an existing column or row clobbers the data. Assigning to
725
+ # new columns creates them at the right end of the table.
726
+ #
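+ # For example (illustrative; assumes a table with "name" and "age" headers):
+ #
+ #   table["age"]  = [30, 25]       # assign a column, top to bottom
+ #   table[0]      = ["Carol", 29]  # replace a row (inherits headers())
+ #   table["city"] = "unknown"      # new column with the same value in every row
+ #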
727
+ def []=(index_or_header, value)
728
+ if @mode == :row or # by index
729
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
730
+ if value.is_a? Array
731
+ @table[index_or_header] = Row.new(headers, value)
732
+ else
733
+ @table[index_or_header] = value
734
+ end
735
+ else # set column
736
+ if value.is_a? Array # multiple values
737
+ @table.each_with_index do |row, i|
738
+ if row.header_row?
739
+ row[index_or_header] = index_or_header
740
+ else
741
+ row[index_or_header] = value[i]
742
+ end
743
+ end
744
+ else # repeated value
745
+ @table.each do |row|
746
+ if row.header_row?
747
+ row[index_or_header] = index_or_header
748
+ else
749
+ row[index_or_header] = value
750
+ end
751
+ end
752
+ end
753
+ end
754
+ end
755
+
756
+ #
757
+ # The mixed mode default is to treat a list of indices as row access,
758
+ # returning the rows indicated. Anything else is considered columnar
759
+ # access. For columnar access, the return set has an Array for each row
760
+ # with the values indicated by the headers in each Array. You can force
761
+ # column or row mode using by_col!() or by_row!().
762
+ #
763
+ # You cannot mix column and row access.
764
+ #
765
+ def values_at(*indices_or_headers)
766
+ if @mode == :row or # by indices
767
+ ( @mode == :col_or_row and indices_or_headers.all? do |index|
768
+ index.is_a?(Integer) or
769
+ ( index.is_a?(Range) and
770
+ index.first.is_a?(Integer) and
771
+ index.last.is_a?(Integer) )
772
+ end )
773
+ @table.values_at(*indices_or_headers)
774
+ else # by headers
775
+ @table.map { |row| row.values_at(*indices_or_headers) }
776
+ end
777
+ end
778
+
779
+ #
780
+ # Adds a new row to the bottom end of this table. You can provide an Array,
781
+ # which will be converted to a CSV::Row (inheriting the table's headers()),
782
+ # or a CSV::Row.
783
+ #
784
+ # This method returns the table for chaining.
785
+ #
786
+ def <<(row_or_array)
787
+ if row_or_array.is_a? Array # append Array
788
+ @table << Row.new(headers, row_or_array)
789
+ else # append Row
790
+ @table << row_or_array
791
+ end
792
+
793
+ self # for chaining
794
+ end
795
+
796
+ #
797
+ # A shortcut for appending multiple rows. Equivalent to:
798
+ #
799
+ # rows.each { |row| self << row }
800
+ #
801
+ # This method returns the table for chaining.
802
+ #
803
+ def push(*rows)
804
+ rows.each { |row| self << row }
805
+
806
+ self # for chaining
807
+ end
808
+
809
+ #
810
+ # Removes and returns the indicated column or row. In the default mixed
811
+ # mode indices refer to rows and everything else is assumed to be a column
812
+ # header. Use by_col!() or by_row!() to force the lookup.
813
+ #
814
+ def delete(index_or_header)
815
+ if @mode == :row or # by index
816
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
817
+ @table.delete_at(index_or_header)
818
+ else # by header
819
+ @table.map { |row| row.delete(index_or_header).last }
820
+ end
821
+ end
822
+
823
+ #
824
+ # Removes any column or row for which the block returns +true+. In the
825
+ # default mixed mode or row mode, iteration is the standard row major
826
+ # walking of rows. In column mode, iteration will +yield+ two element
827
+ # tuples containing the column name and an Array of values for that column.
828
+ #
829
+ # This method returns the table for chaining.
830
+ #
831
+ # If no block is given, an Enumerator is returned.
832
+ #
833
+ def delete_if(&block)
834
+ block or return enum_for(__method__) { (@mode == :row or @mode == :col_or_row) ? size : headers.size }
835
+
836
+ if @mode == :row or @mode == :col_or_row # by index
837
+ @table.delete_if(&block)
838
+ else # by header
839
+ to_delete = Array.new
840
+ headers.each_with_index do |header, i|
841
+ to_delete << header if block[[header, self[header]]]
842
+ end
843
+ to_delete.map { |header| delete(header) }
844
+ end
845
+
846
+ self # for chaining
847
+ end
848
+
849
+ include Enumerable
850
+
851
+ #
852
+ # In the default mixed mode or row mode, iteration is the standard row major
853
+ # walking of rows. In column mode, iteration will +yield+ two element
854
+ # tuples containing the column name and an Array of values for that column.
855
+ #
856
+ # This method returns the table for chaining.
857
+ #
858
+ # If no block is given, an Enumerator is returned.
859
+ #
860
+ def each(&block)
861
+ block or return enum_for(__method__) { @mode == :col ? headers.size : size }
862
+
863
+ if @mode == :col
864
+ headers.each { |header| block[[header, self[header]]] }
865
+ else
866
+ @table.each(&block)
867
+ end
868
+
869
+ self # for chaining
870
+ end
871
+
872
+ # Returns +true+ if all rows of this table ==() +other+'s rows.
873
+ def ==(other)
874
+ @table == other.table
875
+ end
876
+
877
+ #
878
+ # Returns the table as an Array of Arrays. Headers will be the first row,
879
+ # then all of the field rows will follow.
880
+ #
881
+ def to_a
882
+ @table.inject([headers]) do |array, row|
883
+ if row.header_row?
884
+ array
885
+ else
886
+ array + [row.fields]
887
+ end
888
+ end
889
+ end
890
+
891
+ #
892
+ # Returns the table as a complete CSV String. Headers will be listed first,
893
+ # then all of the field rows.
894
+ #
895
+ # This method assumes you want the Table.headers(), unless you explicitly
896
+ # pass <tt>:write_headers => false</tt>.
897
+ #
898
+ def to_csv(options = Hash.new)
899
+ wh = options.fetch(:write_headers, true)
900
+ @table.inject(wh ? [headers.to_csv(options)] : [ ]) do |rows, row|
901
+ if row.header_row?
902
+ rows
903
+ else
904
+ rows + [row.fields.to_csv(options)]
905
+ end
906
+ end.join('')
907
+ end
908
+ alias_method :to_s, :to_csv
909
+
910
+ # Shows the mode and size of this table in a US-ASCII String.
911
+ def inspect
912
+ "#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>".encode("US-ASCII")
913
+ end
914
+ end
915
+
916
+ # The error raised when the parser encounters illegal CSV formatting.
917
+ class MalformedCSVError < RuntimeError; end
918
+
919
+ #
920
+ # A FieldInfo Struct contains details about a field's position in the data
921
+ # source it was read from. CSV will pass this Struct to some blocks that make
922
+ # decisions based on field structure. See CSV.convert_fields() for an
923
+ # example.
924
+ #
925
+ # <b><tt>index</tt></b>:: The zero-based index of the field in its row.
926
+ # <b><tt>line</tt></b>:: The line of the data source this row is from.
927
+ # <b><tt>header</tt></b>:: The header for the column, when available.
928
+ #
929
+ FieldInfo = Struct.new(:index, :line, :header)
930
+
931
+ # A Regexp used to find and convert some common Date formats.
932
+ DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
933
+ \d{4}-\d{2}-\d{2} )\z /x
934
+ # A Regexp used to find and convert some common DateTime formats.
935
+ DateTimeMatcher =
936
+ / \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
937
+ \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} )\z /x
938
+
939
+ # The encoding used by all converters.
940
+ ConverterEncoding = Encoding.find("UTF-8")
941
+
942
+ #
943
+ # This Hash holds the built-in converters of CSV that can be accessed by name.
944
+ # You can select Converters with CSV.convert() or through the +options+ Hash
945
+ # passed to CSV::new().
946
+ #
947
+ # <b><tt>:integer</tt></b>:: Converts any field Integer() accepts.
948
+ # <b><tt>:float</tt></b>:: Converts any field Float() accepts.
949
+ # <b><tt>:numeric</tt></b>:: A combination of <tt>:integer</tt>
950
+ # and <tt>:float</tt>.
951
+ # <b><tt>:date</tt></b>:: Converts any field Date::parse() accepts.
952
+ # <b><tt>:date_time</tt></b>:: Converts any field DateTime::parse() accepts.
953
+ # <b><tt>:all</tt></b>:: All built-in converters. A combination of
954
+ # <tt>:date_time</tt> and <tt>:numeric</tt>.
955
+ #
956
+ # All built-in converters transcode field data to UTF-8 before attempting a
957
+ # conversion. If your data cannot be transcoded to UTF-8 the conversion will
958
+ # fail and the field will remain unchanged.
959
+ #
960
+ # This Hash is intentionally left unfrozen and users should feel free to add
961
+ # values to it that can be accessed by all CSV objects.
962
+ #
963
+ # To add a combo field, the value should be an Array of names. Combo fields
964
+ # can be nested with other combo fields.
965
+ #
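+ # For example, a custom converter can be registered and then selected by name
+ # (the <tt>:upcase</tt> name below is purely illustrative):
+ #
+ #   CSV::Converters[:upcase] = lambda { |field| field.upcase }
+ #   CSV.parse_line("a,1", converters: [:upcase, :integer])  # => ["A", 1]
+ #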
966
+ Converters = {
967
+ integer: lambda { |f|
968
+ Integer(f.encode(ConverterEncoding)) rescue f
969
+ },
970
+ float: lambda { |f|
971
+ Float(f.encode(ConverterEncoding)) rescue f
972
+ },
973
+ numeric: [:integer, :float],
974
+ date: lambda { |f|
975
+ begin
976
+ e = f.encode(ConverterEncoding)
977
+ e =~ DateMatcher ? Date.parse(e) : f
978
+ rescue # encoding conversion or date parse errors
979
+ f
980
+ end
981
+ },
982
+ date_time: lambda { |f|
983
+ begin
984
+ e = f.encode(ConverterEncoding)
985
+ e =~ DateTimeMatcher ? DateTime.parse(e) : f
986
+ rescue # encoding conversion or date parse errors
987
+ f
988
+ end
989
+ },
990
+ all: [:date_time, :numeric],
991
+ }
992
+
993
+ #
994
+ # This Hash holds the built-in header converters of CSV that can be accessed
995
+ # by name. You can select HeaderConverters with CSV.header_convert() or
996
+ # through the +options+ Hash passed to CSV::new().
997
+ #
998
+ # <b><tt>:downcase</tt></b>:: Calls downcase() on the header String.
999
+ # <b><tt>:symbol</tt></b>:: Leading/trailing spaces are dropped, string is
1000
+ # downcased, remaining spaces are replaced with
1001
+ # underscores, non-word characters are dropped,
1002
+ # and finally to_sym() is called.
1003
+ #
1004
+ # All built-in header converters transcode header data to UTF-8 before
1005
+ # attempting a conversion. If your data cannot be transcoded to UTF-8 the
1006
+ # conversion will fail and the header will remain unchanged.
1007
+ #
1008
+ # This Hash is intentionally left unfrozen and users should feel free to add
1009
+ # values to it that can be accessed by all CSV objects.
1010
+ #
1011
+ # To add a combo field, the value should be an Array of names. Combo fields
1012
+ # can be nested with other combo fields.
1013
+ #
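+ # For example, the built-in <tt>:symbol</tt> converter normalizes headers like
+ # so (illustrative input):
+ #
+ #   CSV::HeaderConverters[:symbol].call(" First Name ")  # => :first_name
+ #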
1014
+ HeaderConverters = {
1015
+ downcase: lambda { |h| h.encode(ConverterEncoding).downcase },
1016
+ symbol: lambda { |h|
1017
+ h.encode(ConverterEncoding).downcase.strip.gsub(/\s+/, "_").
1018
+ gsub(/\W+/, "").to_sym
1019
+ }
1020
+ }
1021
+
1022
+ #
1023
+ # The options used when no overrides are given by calling code. They are:
1024
+ #
1025
+ # <b><tt>:col_sep</tt></b>:: <tt>","</tt>
1026
+ # <b><tt>:row_sep</tt></b>:: <tt>:auto</tt>
1027
+ # <b><tt>:quote_char</tt></b>:: <tt>'"'</tt>
1028
+ # <b><tt>:field_size_limit</tt></b>:: +nil+
1029
+ # <b><tt>:converters</tt></b>:: +nil+
1030
+ # <b><tt>:unconverted_fields</tt></b>:: +nil+
1031
+ # <b><tt>:headers</tt></b>:: +false+
1032
+ # <b><tt>:return_headers</tt></b>:: +false+
1033
+ # <b><tt>:header_converters</tt></b>:: +nil+
1034
+ # <b><tt>:skip_blanks</tt></b>:: +false+
1035
+ # <b><tt>:force_quotes</tt></b>:: +false+
1036
+ # <b><tt>:skip_lines</tt></b>:: +nil+
1037
+ # <b><tt>:liberal_parsing</tt></b>:: +false+
1038
+ #
1039
+ DEFAULT_OPTIONS = {
1040
+ col_sep: ",",
1041
+ row_sep: :auto,
1042
+ quote_char: '"',
1043
+ field_size_limit: nil,
1044
+ converters: nil,
1045
+ unconverted_fields: nil,
1046
+ headers: false,
1047
+ return_headers: false,
1048
+ header_converters: nil,
1049
+ skip_blanks: false,
1050
+ force_quotes: false,
1051
+ skip_lines: nil,
1052
+ liberal_parsing: false,
1053
+ }.freeze
1054
+
1055
+ #
1056
+ # This method will return a CSV instance, just like CSV::new(), but the
1057
+ # instance will be cached and returned for all future calls to this method for
1058
+ # the same +data+ object (tested by Object#object_id()) with the same
1059
+ # +options+.
1060
+ #
1061
+ # If a block is given, the instance is passed to the block and the return
1062
+ # value becomes the return value of the block.
1063
+ #
1064
+ def self.instance(data = $stdout, options = Hash.new)
1065
+ # create a _signature_ for this method call, data object and options
1066
+ sig = [data.object_id] +
1067
+ options.values_at(*DEFAULT_OPTIONS.keys.sort_by { |sym| sym.to_s })
1068
+
1069
+ # fetch or create the instance for this signature
1070
+ @@instances ||= Hash.new
1071
+ instance = (@@instances[sig] ||= new(data, options))
1072
+
1073
+ if block_given?
1074
+ yield instance # run block, if given, returning result
1075
+ else
1076
+ instance # or return the instance
1077
+ end
1078
+ end
1079
+
1080
+ #
1081
+ # :call-seq:
1082
+ # filter( options = Hash.new ) { |row| ... }
1083
+ # filter( input, options = Hash.new ) { |row| ... }
1084
+ # filter( input, output, options = Hash.new ) { |row| ... }
1085
+ #
1086
+ # This method is a convenience for building Unix-like filters for CSV data.
1087
+ # Each row is yielded to the provided block which can alter it as needed.
1088
+ # After the block returns, the row is appended to +output+ altered or not.
1089
+ #
1090
+ # The +input+ and +output+ arguments can be anything CSV::new() accepts
1091
+ # (generally String or IO objects). If not given, they default to
1092
+ # <tt>ARGF</tt> and <tt>$stdout</tt>.
1093
+ #
1094
+ # The +options+ parameter is also filtered down to CSV::new() after some
1095
+ # clever key parsing. Any key beginning with <tt>:in_</tt> or
1096
+ # <tt>:input_</tt> will have that leading identifier stripped and will only
1097
+ # be used in the +options+ Hash for the +input+ object. Keys starting with
1098
+ # <tt>:out_</tt> or <tt>:output_</tt> affect only +output+. All other keys
1099
+ # are assigned to both objects.
1100
+ #
1101
+ # The <tt>:output_row_sep</tt> +option+ defaults to
1102
+ # <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
1103
+ #
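+ # A minimal sketch (the data and separator below are illustrative; output is
+ # collected in a String):
+ #
+ #   out = String.new
+ #   CSV.filter("1,2,3\n", out, out_col_sep: ";") do |row|
+ #     row << "extra"
+ #   end
+ #   out  # => "1;2;3;extra\n"
+ #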
1104
+ def self.filter(*args)
1105
+ # parse options for input, output, or both
1106
+ in_options, out_options = Hash.new, {row_sep: $INPUT_RECORD_SEPARATOR}
1107
+ if args.last.is_a? Hash
1108
+ args.pop.each do |key, value|
1109
+ case key.to_s
1110
+ when /\Ain(?:put)?_(.+)\Z/
1111
+ in_options[$1.to_sym] = value
1112
+ when /\Aout(?:put)?_(.+)\Z/
1113
+ out_options[$1.to_sym] = value
1114
+ else
1115
+ in_options[key] = value
1116
+ out_options[key] = value
1117
+ end
1118
+ end
1119
+ end
1120
+ # build input and output wrappers
1121
+ input = new(args.shift || ARGF, in_options)
1122
+ output = new(args.shift || $stdout, out_options)
1123
+
1124
+ # read, yield, write
1125
+ input.each do |row|
1126
+ yield row
1127
+ output << row
1128
+ end
1129
+ end
1130
+
1131
+ #
1132
+ # This method is intended as the primary interface for reading CSV files. You
1133
+ # pass a +path+ and any +options+ you wish to set for the read. Each row of
1134
+ # file will be passed to the provided +block+ in turn.
1135
+ #
1136
+ # The +options+ parameter can be anything CSV::new() understands. This method
1137
+ # also understands an additional <tt>:encoding</tt> parameter that you can use
1138
+ # to specify the Encoding of the data in the file to be read. You must provide
1139
+ # this unless your data is in Encoding::default_external(). CSV will use this
1140
+ # to determine how to parse the data. You may provide a second Encoding to
1141
+ # have the data transcoded as it is read. For example,
1142
+ # <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
1143
+ # but transcode it to UTF-8 before CSV parses it.
1144
+ #
1145
+ def self.foreach(path, options = Hash.new, &block)
1146
+ return to_enum(__method__, path, options) unless block
1147
+ open(path, options) do |csv|
1148
+ csv.each(&block)
1149
+ end
1150
+ end
1151
+
1152
+ #
1153
+ # :call-seq:
1154
+ # generate( str, options = Hash.new ) { |csv| ... }
1155
+ # generate( options = Hash.new ) { |csv| ... }
1156
+ #
1157
+ # This method wraps a String you provide, or an empty default String, in a
1158
+ # CSV object which is passed to the provided block. You can use the block to
1159
+ # append CSV rows to the String and when the block exits, the final String
1160
+ # will be returned.
1161
+ #
1162
+ # Note that a passed String *is* modified by this method. Call dup() before
1163
+ # passing if you need a new String.
1164
+ #
1165
+ # The +options+ parameter can be anything CSV::new() understands. This method
1166
+ # understands an additional <tt>:encoding</tt> parameter when not passed a
1167
+ # String to set the base Encoding for the output. CSV needs this hint if you
1168
+ # plan to output non-ASCII compatible data.
1169
+ #
1170
+ def self.generate(*args)
1171
+ # add a default empty String, if none was given
1172
+ if args.first.is_a? String
1173
+ io = StringIO.new(args.shift)
1174
+ io.seek(0, IO::SEEK_END)
1175
+ args.unshift(io)
1176
+ else
1177
+ encoding = args[-1][:encoding] if args.last.is_a?(Hash)
1178
+ str = String.new
1179
+ str.force_encoding(encoding) if encoding
1180
+ args.unshift(str)
1181
+ end
1182
+ csv = new(*args) # wrap
1183
+ yield csv # yield for appending
1184
+ csv.string # return final String
1185
+ end
1186
+
1187
+ #
1188
+ # This method is a shortcut for converting a single row (Array) into a CSV
1189
+ # String.
1190
+ #
1191
+ # The +options+ parameter can be anything CSV::new() understands. This method
1192
+ # understands an additional <tt>:encoding</tt> parameter to set the base
1193
+ # Encoding for the output. This method will try to guess your Encoding from
1194
+ # the first non-+nil+ field in +row+, if possible, but you may need to use
1195
+ # this parameter as a backup plan.
1196
+ #
1197
+ # The <tt>:row_sep</tt> +option+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>
1198
+ # (<tt>$/</tt>) when calling this method.
1199
+ #
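+ # For example (illustrative values):
+ #
+ #   CSV.generate_line(["one", nil, "three"])  # => "one,,three\n"
+ #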
1200
+ def self.generate_line(row, options = Hash.new)
1201
+ options = {row_sep: $INPUT_RECORD_SEPARATOR}.merge(options)
1202
+ encoding = options.delete(:encoding)
1203
+ str = String.new
1204
+ if encoding
1205
+ str.force_encoding(encoding)
1206
+ elsif field = row.find { |f| not f.nil? }
1207
+ str.force_encoding(String(field).encoding)
1208
+ end
1209
+ (new(str, options) << row).string
1210
+ end
1211
+
1212
+ #
1213
+ # :call-seq:
1214
+ # open( filename, mode = "rb", options = Hash.new ) { |faster_csv| ... }
1215
+ # open( filename, options = Hash.new ) { |faster_csv| ... }
1216
+ # open( filename, mode = "rb", options = Hash.new )
1217
+ # open( filename, options = Hash.new )
1218
+ #
1219
+ # This method opens an IO object, and wraps that with CSV. This is intended
1220
+ # as the primary interface for writing a CSV file.
1221
+ #
1222
+ # You must pass a +filename+ and may optionally add a +mode+ for Ruby's
1223
+ # open(). You may also pass an optional Hash containing any +options+
1224
+ # CSV::new() understands as the final argument.
1225
+ #
1226
+ # This method works like Ruby's open() call, in that it will pass a CSV object
1227
+ # to a provided block and close it when the block terminates, or it will
1228
+ # return the CSV object when no block is provided. (*Note*: This is different
1229
+ # from the Ruby 1.8 CSV library which passed rows to the block. Use
1230
+ # CSV::foreach() for that behavior.)
1231
+ #
1232
+ # You must provide a +mode+ with an embedded Encoding designator unless your
1233
+ # data is in Encoding::default_external(). CSV will check the Encoding of the
1234
+ # underlying IO object (set by the +mode+ you pass) to determine how to parse
1235
+ # the data. You may provide a second Encoding to have the data transcoded as
1236
+ # it is read just as you can with a normal call to IO::open(). For example,
1237
+ # <tt>"rb:UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file but
1238
+ # transcode it to UTF-8 before CSV parses it.
1239
+ #
1240
+ # An opened CSV object will delegate to many IO methods for convenience. You
1241
+ # may call:
1242
+ #
1243
+ # * binmode()
1244
+ # * binmode?()
1245
+ # * close()
1246
+ # * close_read()
1247
+ # * close_write()
1248
+ # * closed?()
1249
+ # * eof()
1250
+ # * eof?()
1251
+ # * external_encoding()
1252
+ # * fcntl()
1253
+ # * fileno()
1254
+ # * flock()
1255
+ # * flush()
1256
+ # * fsync()
1257
+ # * internal_encoding()
1258
+ # * ioctl()
1259
+ # * isatty()
1260
+ # * path()
1261
+ # * pid()
1262
+ # * pos()
1263
+ # * pos=()
1264
+ # * reopen()
1265
+ # * seek()
1266
+ # * stat()
1267
+ # * sync()
1268
+ # * sync=()
1269
+ # * tell()
1270
+ # * to_i()
1271
+ # * to_io()
1272
+ # * truncate()
1273
+ # * tty?()
1274
+ #
1275
+ def self.open(*args)
1276
+ # find the +options+ Hash
1277
+ options = if args.last.is_a? Hash then args.pop else Hash.new end
1278
+ # wrap a File opened with the remaining +args+ with no newline
1279
+ # decorator
1280
+ file_opts = {universal_newline: false}.merge(options)
1281
+ begin
1282
+ f = File.open(*args, file_opts)
1283
+ rescue ArgumentError => e
1284
+ raise unless /needs binmode/ =~ e.message and args.size == 1
1285
+ args << "rb"
1286
+ file_opts = {encoding: Encoding.default_external}.merge(file_opts)
1287
+ retry
1288
+ end
1289
+ begin
1290
+ csv = new(f, options)
1291
+ rescue Exception
1292
+ f.close
1293
+ raise
1294
+ end
1295
+
1296
+ # handle blocks like Ruby's open(), not like the CSV library
1297
+ if block_given?
1298
+ begin
1299
+ yield csv
1300
+ ensure
1301
+ csv.close
1302
+ end
1303
+ else
1304
+ csv
1305
+ end
1306
+ end
1307
+
1308
+ #
1309
+ # :call-seq:
1310
+ # parse( str, options = Hash.new ) { |row| ... }
1311
+ # parse( str, options = Hash.new )
1312
+ #
1313
+ # This method can be used to easily parse CSV out of a String. You may either
1314
+ # provide a +block+ which will be called with each row of the String in turn,
1315
+ # or just use the returned Array of Arrays (when no +block+ is given).
1316
+ #
1317
+ # You pass your +str+ to read from, and an optional +options+ Hash containing
1318
+ # anything CSV::new() understands.
1319
+ #
1320
+ def self.parse(*args, &block)
1321
+ csv = new(*args)
1322
+ if block.nil? # slurp contents, if no block is given
1323
+ begin
1324
+ csv.read
1325
+ ensure
1326
+ csv.close
1327
+ end
1328
+ else # or pass each row to a provided block
1329
+ csv.each(&block)
1330
+ end
1331
+ end
1332
+
1333
+ #
1334
+ # This method is a shortcut for converting a single line of a CSV String into
1335
+ # an Array. Note that if +line+ contains multiple rows, anything beyond the
1336
+ # first row is ignored.
1337
+ #
1338
+ # The +options+ parameter can be anything CSV::new() understands.
1339
+ #
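+ # For example (illustrative values):
+ #
+ #   CSV.parse_line("1,2,3")                # => ["1", "2", "3"]
+ #   CSV.parse_line("1;2;3", col_sep: ";")  # => ["1", "2", "3"]
+ #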
1340
+ def self.parse_line(line, options = Hash.new)
1341
+ new(line, options).shift
1342
+ end
1343
+
1344
+ #
1345
+ # Use to slurp a CSV file into an Array of Arrays. Pass the +path+ to the
1346
+ # file and any +options+ CSV::new() understands. This method also understands
1347
+ # an additional <tt>:encoding</tt> parameter that you can use to specify the
1348
+ # Encoding of the data in the file to be read. You must provide this unless
1349
+ # your data is in Encoding::default_external(). CSV will use this to determine
1350
+ # how to parse the data. You may provide a second Encoding to have the data
1351
+ # transcoded as it is read. For example,
1352
+ # <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
1353
+ # but transcode it to UTF-8 before CSV parses it.
1354
+ #
1355
+ def self.read(path, *options)
1356
+ open(path, *options) { |csv| csv.read }
1357
+ end
1358
+
1359
+ # Alias for CSV::read().
1360
+ def self.readlines(*args)
1361
+ read(*args)
1362
+ end
1363
+
1364
+ #
1365
+ # A shortcut for:
1366
+ #
1367
+ # CSV.read( path, { headers: true,
1368
+ # converters: :numeric,
1369
+ # header_converters: :symbol }.merge(options) )
1370
+ #
1371
+ def self.table(path, options = Hash.new)
1372
+ read( path, { headers: true,
1373
+ converters: :numeric,
1374
+ header_converters: :symbol }.merge(options) )
1375
+ end
1376
+
1377
+ #
1378
+ # This constructor will wrap either a String or IO object passed in +data+ for
1379
+ # reading and/or writing. In addition to the CSV instance methods, several IO
1380
+ # methods are delegated. (See CSV::open() for a complete list.) If you pass
1381
+ # a String for +data+, you can later retrieve it (after writing to it, for
1382
+ # example) with CSV.string().
1383
+ #
1384
+ # Note that a wrapped String will be positioned at the beginning (for
1385
+ # reading). If you want it at the end (for writing), use CSV::generate().
1386
+ # If you want any other positioning, pass a preset StringIO object instead.
1387
+ #
1388
+ # You may set any reading and/or writing preferences in the +options+ Hash.
1389
+ # Available options are:
1390
+ #
1391
+ # <b><tt>:col_sep</tt></b>:: The String placed between each field.
1392
+ # This String will be transcoded into
1393
+ # the data's Encoding before parsing.
1394
+ # <b><tt>:row_sep</tt></b>:: The String appended to the end of each
1395
+ # row. This can be set to the special
1396
+ # <tt>:auto</tt> setting, which requests
1397
+ # that CSV automatically discover this
1398
+ # from the data. Auto-discovery reads
1399
+ # ahead in the data looking for the next
1400
+ # <tt>"\r\n"</tt>, <tt>"\n"</tt>, or
1401
+ # <tt>"\r"</tt> sequence. A sequence
1402
+ # will be selected even if it occurs in
1403
+ # a quoted field, assuming that you
1404
+ # would have the same line endings
1405
+ # there. If none of those sequences is
1406
+ # found, +data+ is <tt>ARGF</tt>,
1407
+ # <tt>STDIN</tt>, <tt>STDOUT</tt>, or
1408
+ # <tt>STDERR</tt>, or the stream is only
1409
+ # available for output, the default
1410
+ # <tt>$INPUT_RECORD_SEPARATOR</tt>
1411
+ # (<tt>$/</tt>) is used. Obviously,
1412
+ # discovery takes a little time. Set
1413
+ # manually if speed is important. Also
1414
+ # note that IO objects should be opened
1415
+ # in binary mode on Windows if this
1416
+ # feature will be used as the
1417
+ # line-ending translation can cause
1418
+ # problems with resetting the document
1419
+ # position to where it was before the
1420
+ # read ahead. This String will be
1421
+ # transcoded into the data's Encoding
1422
+ # before parsing.
1423
+ # <b><tt>:quote_char</tt></b>:: The character used to quote fields.
1424
+ # This has to be a single character
1425
+ # String. This is useful for
1426
+ # applications that incorrectly use
1427
+ # <tt>'</tt> as the quote character
1428
+ # instead of the correct <tt>"</tt>.
1429
+ # CSV will always consider a double
1430
+ # sequence of this character to be an
1431
+ # escaped quote. This String will be
1432
+ # transcoded into the data's Encoding
1433
+ # before parsing.
1434
+ # <b><tt>:field_size_limit</tt></b>:: This is a maximum size CSV will read
1435
+ # ahead looking for the closing quote
1436
+ # for a field. (In truth, it reads to
1437
+ # the first line ending beyond this
1438
+ # size.) If a quote cannot be found
1439
+ # within the limit CSV will raise a
1440
+ # MalformedCSVError, assuming the data
1441
+ # is faulty. You can use this limit to
1442
+ # prevent what are effectively DoS
1443
+ # attacks on the parser. However, this
1444
+ # limit can cause a legitimate parse to
1445
+ # fail and thus is set to +nil+, or off,
1446
+ # by default.
1447
+ # <b><tt>:converters</tt></b>:: An Array of names from the Converters
1448
+ # Hash and/or lambdas that handle custom
1449
+ # conversion. A single converter
1450
+ # doesn't have to be in an Array. All
1451
+ # built-in converters try to transcode
1452
+ # fields to UTF-8 before converting.
1453
+ # The conversion will fail if the data
1454
+ # cannot be transcoded, leaving the
1455
+ # field unchanged.
1456
+ # <b><tt>:unconverted_fields</tt></b>:: If set to +true+, an
1457
+ # unconverted_fields() method will be
1458
+ # added to all returned rows (Array or
1459
+ # CSV::Row) that will return the fields
1460
+ # as they were before conversion. Note
1461
+ # that <tt>:headers</tt> supplied by
1462
+ # Array or String were not fields of the
1463
+ # document and thus will have an empty
1464
+ # Array attached.
1465
+ # <b><tt>:headers</tt></b>:: If set to <tt>:first_row</tt> or
1466
+ # +true+, the initial row of the CSV
1467
+ # file will be treated as a row of
1468
+ # headers. If set to an Array, the
1469
+ # contents will be used as the headers.
1470
+ # If set to a String, the String is run
1471
+ # through a call of CSV::parse_line()
1472
+ # with the same <tt>:col_sep</tt>,
1473
+ # <tt>:row_sep</tt>, and
1474
+ # <tt>:quote_char</tt> as this instance
1475
+ # to produce an Array of headers. This
1476
+ # setting causes CSV#shift() to return
1477
+ # rows as CSV::Row objects instead of
1478
+ # Arrays and CSV#read() to return
1479
+ # CSV::Table objects instead of an Array
1480
+ # of Arrays.
1481
+ # <b><tt>:return_headers</tt></b>:: When +false+, header rows are silently
1482
+ # swallowed. If set to +true+, header
1483
+ # rows are returned in a CSV::Row object
1484
+ # with identical headers and
1485
+ # fields (save that the fields do not go
1486
+ # through the converters).
1487
+ # <b><tt>:write_headers</tt></b>:: When +true+ and <tt>:headers</tt> is
1488
+ # set, a header row will be added to the
1489
+ # output.
1490
+ # <b><tt>:header_converters</tt></b>:: Identical in functionality to
1491
+ # <tt>:converters</tt> save that the
1492
+ # conversions are only made to header
1493
+ # rows. All built-in converters try to
1494
+ # transcode headers to UTF-8 before
1495
+ # converting. The conversion will fail
1496
+ # if the data cannot be transcoded,
1497
+ # leaving the header unchanged.
1498
+ # <b><tt>:skip_blanks</tt></b>:: When set to a +true+ value, CSV will
1499
+ # skip over any empty rows. Note that
1500
+ # this setting will not skip rows that
1501
+ # contain column separators, even if
1502
+ # the rows contain no actual data. If
1503
+ # you want to skip rows that contain
1504
+ # separators but no content, consider
1505
+ # using <tt>:skip_lines</tt>, or
1506
+ # inspecting fields.compact.empty? on
1507
+ # each row.
1508
+ # <b><tt>:force_quotes</tt></b>:: When set to a +true+ value, CSV will
1509
+ # quote all CSV fields it creates.
1510
+ # <b><tt>:skip_lines</tt></b>:: When set to an object responding to
1511
+ # <tt>match</tt>, every line matching
1512
+ # it is considered a comment and ignored
1513
+ # during parsing. When set to a String,
1514
+ # it is first converted to a Regexp.
1515
+ # When set to +nil+ no line is considered
1516
+ # a comment. If the passed object does
1517
+ # not respond to <tt>match</tt>,
1518
+ # <tt>ArgumentError</tt> is raised.
1519
+ # <b><tt>:liberal_parsing</tt></b>:: When set to a +true+ value, CSV will
1520
+ # attempt to parse input not conformant
1521
+ # with RFC 4180, such as double quotes
1522
+ # in unquoted fields.
1523
+ #
1524
+ # See CSV::DEFAULT_OPTIONS for the default settings.
1525
+ #
1526
+ # Options cannot be overridden in the instance methods for performance reasons,
1527
+ # so be sure to set what you want here.
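+ # A short sketch of the headers support (illustrative data):
+ #
+ #   csv = CSV.new("name,age\nAlice,30\n", headers: true, converters: :numeric)
+ #   csv.shift  # => #<CSV::Row "name":"Alice" "age":30>
+ #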
1528
+ #
1529
+ def initialize(data, options = Hash.new)
1530
+ if data.nil?
1531
+ raise ArgumentError.new("Cannot parse nil as CSV")
1532
+ end
1533
+
1534
+ # build the options for this read/write
1535
+ options = DEFAULT_OPTIONS.merge(options)
1536
+
1537
+ # create the IO object we will read from
1538
+ @io = data.is_a?(String) ? StringIO.new(data) : data
1539
+ # honor the IO encoding if we can, otherwise default to ASCII-8BIT
1540
+ @encoding = raw_encoding(nil) ||
1541
+ ( if encoding = options.delete(:internal_encoding)
1542
+ case encoding
1543
+ when Encoding; encoding
1544
+ else Encoding.find(encoding)
1545
+ end
1546
+ end ) ||
1547
+ ( case encoding = options.delete(:encoding)
1548
+ when Encoding; encoding
1549
+ when /\A[^:]+/; Encoding.find($&)
1550
+ end ) ||
1551
+ Encoding.default_internal || Encoding.default_external
1552
+ #
1553
+ # prepare for building safe regular expressions in the target encoding,
1554
+ # if we can transcode the needed characters
1555
+ #
1556
+ @re_esc = "\\".encode(@encoding).freeze rescue ""
1557
+ @re_chars = /#{%"[-\\]\\[\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
1558
+
1559
+ init_separators(options)
1560
+ init_parsers(options)
1561
+ init_converters(options)
1562
+ init_headers(options)
1563
+ init_comments(options)
1564
+
1565
+ @force_encoding = !!(encoding || options.delete(:encoding))
1566
+ options.delete(:internal_encoding)
1567
+ options.delete(:external_encoding)
1568
+ unless options.empty?
1569
+ raise ArgumentError, "Unknown options: #{options.keys.join(', ')}."
1570
+ end
1571
+
1572
+ # track our own lineno since IO gets confused about line-ends in CSV fields
1573
+ @lineno = 0
1574
+ end
1575
+
1576
+ #
1577
+ # The encoded <tt>:col_sep</tt> used in parsing and writing. See CSV::new
1578
+ # for details.
1579
+ #
1580
+ attr_reader :col_sep
1581
+ #
1582
+ # The encoded <tt>:row_sep</tt> used in parsing and writing. See CSV::new
1583
+ # for details.
1584
+ #
1585
+ attr_reader :row_sep
1586
+ #
1587
+ # The encoded <tt>:quote_char</tt> used in parsing and writing. See CSV::new
1588
+ # for details.
1589
+ #
1590
+ attr_reader :quote_char
1591
+ # The limit for field size, if any. See CSV::new for details.
1592
+ attr_reader :field_size_limit
1593
+
1594
+ # The regex marking a line as a comment. See CSV::new for details.
1595
+ attr_reader :skip_lines
1596
+
1597
+ #
1598
+ # Returns the current list of converters in effect. See CSV::new for details.
1599
+ # Built-in converters will be returned by name, while others will be returned
1600
+ # as is.
1601
+ #
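+ # For example (an illustrative sketch, not from the original docs), a combo
+ # converter such as <tt>:numeric</tt> is reported by the names it expands to:
+ #
+ #   csv = CSV.new("1,2\n", converters: :numeric)
+ #   csv.converters  # => [:integer, :float]
+ #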
1602
+ def converters
1603
+ @converters.map do |converter|
1604
+ name = Converters.rassoc(converter)
1605
+ name ? name.first : converter
1606
+ end
1607
+ end
1608
+ #
1609
+ # Returns +true+ if unconverted_fields() will be added to parsed results. See CSV::new
1610
+ # for details.
1611
+ #
1612
+ def unconverted_fields?() @unconverted_fields end
1613
+ #
1614
+ # Returns +nil+ if headers will not be used, +true+ if they will but have not
1615
+ # yet been read, or the actual headers after they have been read. See
1616
+ # CSV::new for details.
1617
+ #
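+ # A sketch of the three states (illustrative, not from the original docs):
+ #
+ #   csv = CSV.new("a,b\n1,2\n", headers: true)
+ #   csv.headers  # => true         (headers requested, not read yet)
+ #   csv.shift                      # reads the header row, returns first data row
+ #   csv.headers  # => ["a", "b"]   (headers have now been read)
+ #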
1618
+ def headers
1619
+ @headers || true if @use_headers
1620
+ end
1621
+ #
1622
+ # Returns +true+ if headers will be returned as a row of results.
1623
+ # See CSV::new for details.
1624
+ #
1625
+ def return_headers?() @return_headers end
1626
+ # Returns +true+ if headers are written in output. See CSV::new for details.
1627
+ def write_headers?() @write_headers end
1628
+ #
1629
+ # Returns the current list of converters in effect for headers. See CSV::new
1630
+ # for details. Built-in converters will be returned by name, while others
1631
+ # will be returned as is.
1632
+ #
1633
+ def header_converters
1634
+ @header_converters.map do |converter|
1635
+ name = HeaderConverters.rassoc(converter)
1636
+ name ? name.first : converter
1637
+ end
1638
+ end
1639
+ #
1640
+ # Returns +true+ if blank lines are skipped by the parser. See CSV::new
1641
+ # for details.
1642
+ #
1643
+ def skip_blanks?() @skip_blanks end
1644
+ # Returns +true+ if all output fields are quoted. See CSV::new for details.
1645
+ def force_quotes?() @force_quotes end
1646
+ # Returns +true+ if illegal input is handled. See CSV::new for details.
1647
+ def liberal_parsing?() @liberal_parsing end
1648
+
1649
+ #
1650
+ # The Encoding CSV is parsing or writing in. This will be the Encoding you
1651
+ # receive parsed data in and/or the Encoding data will be written in.
1652
+ #
1653
+ attr_reader :encoding
1654
+
1655
+ #
1656
+ # The line number of the last row read from this file. Fields with nested
1657
+ # line-end characters will not affect this count.
1658
+ #
1659
+ attr_reader :lineno
1660
+
1661
+ ### IO and StringIO Delegation ###
1662
+
1663
+ extend Forwardable
1664
+ def_delegators :@io, :binmode, :binmode?, :close, :close_read, :close_write,
1665
+ :closed?, :eof, :eof?, :external_encoding, :fcntl,
1666
+ :fileno, :flock, :flush, :fsync, :internal_encoding,
1667
+ :ioctl, :isatty, :path, :pid, :pos, :pos=, :reopen,
1668
+ :seek, :stat, :string, :sync, :sync=, :tell, :to_i,
1669
+ :to_io, :truncate, :tty?
1670
+
1671
+ # Rewinds the underlying IO object and resets CSV's lineno() counter.
1672
+ def rewind
1673
+ @headers = nil
1674
+ @lineno = 0
1675
+
1676
+ @io.rewind
1677
+ end
1678
+
1679
+ ### End Delegation ###
1680
+
1681
+ #
1682
+ # The primary write method for wrapped Strings and IOs, +row+ (an Array or
1683
+ # CSV::Row) is converted to CSV and appended to the data source. When a
1684
+ # CSV::Row is passed, only the row's fields() are appended to the output.
1685
+ #
1686
+ # The data source must be open for writing.
1687
+ #
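+ # A short sketch (illustrative only; not part of the original docs):
+ #
+ #   out = CSV.new("", headers: %w[name age], write_headers: true)
+ #   out << ["Alice", 30]        # an Array of fields
+ #   out << {"name" => "Bob"}    # a Hash keyed by header
+ #   out.string  # => "name,age\nAlice,30\nBob,\n"
+ #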
1688
+ def <<(row)
1689
+ # make sure headers have been assigned
1690
+ if header_row? and [Array, String].include? @use_headers.class
1691
+ parse_headers # won't read data for Array or String
1692
+ self << @headers if @write_headers
1693
+ end
1694
+
1695
+ # handle CSV::Row objects and Hashes
1696
+ row = case row
1697
+ when self.class::Row then row.fields
1698
+ when Hash then @headers.map { |header| row[header] }
1699
+ else row
1700
+ end
1701
+
1702
+ @headers = row if header_row?
1703
+ @lineno += 1
1704
+
1705
+ output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
1706
+ if @io.is_a?(StringIO) and
1707
+ output.encoding != (encoding = raw_encoding)
1708
+ if @force_encoding
1709
+ output = output.encode(encoding)
1710
+ elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
1711
+ @io.set_encoding(compatible_encoding)
1712
+ @io.seek(0, IO::SEEK_END)
1713
+ end
1714
+ end
1715
+ @io << output
1716
+
1717
+ self # for chaining
1718
+ end
1719
+ alias_method :add_row, :<<
1720
+ alias_method :puts, :<<
1721
+
1722
+ #
1723
+ # :call-seq:
1724
+ # convert( name )
1725
+ # convert { |field| ... }
1726
+ # convert { |field, field_info| ... }
1727
+ #
1728
+ # You can use this method to install a CSV::Converters built-in, or provide a
1729
+ # block that handles a custom conversion.
1730
+ #
1731
+ # If you provide a block that takes one argument, it will be passed the field
1732
+ # and is expected to return the converted value or the field itself. If your
1733
+ # block takes two arguments, it will also be passed a CSV::FieldInfo Struct,
1734
+ # containing details about the field. Again, the block should return a
1735
+ # converted field or the field itself.
1736
+ #
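+ # For instance (an illustrative sketch, not from the original docs):
+ #
+ #   csv = CSV.new("3,x\n")
+ #   csv.convert(:integer)                  # built-in converter, by name
+ #   csv.convert { |field| field.upcase }   # custom block converter
+ #   csv.shift  # => [3, "X"]
+ #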
1737
+ def convert(name = nil, &converter)
1738
+ add_converter(:converters, self.class::Converters, name, &converter)
1739
+ end
1740
+
1741
+ #
1742
+ # :call-seq:
1743
+ # header_convert( name )
1744
+ # header_convert { |field| ... }
1745
+ # header_convert { |field, field_info| ... }
1746
+ #
1747
+ # Identical to CSV#convert(), but for header rows.
1748
+ #
1749
+ # Note that this method must be called before header rows are read to have any
1750
+ # effect.
1751
+ #
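+ # For example (an illustrative sketch, not from the original docs):
+ #
+ #   csv = CSV.new("First Name,Age\n1,2\n", headers: true)
+ #   csv.header_convert(:symbol)
+ #   csv.shift           # reads headers, returns the first data row
+ #   csv.headers         # => [:first_name, :age]
+ #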
1752
+ def header_convert(name = nil, &converter)
1753
+ add_converter( :header_converters,
1754
+ self.class::HeaderConverters,
1755
+ name,
1756
+ &converter )
1757
+ end
1758
+
1759
+ include Enumerable
1760
+
1761
+ #
1762
+ # Yields each row of the data source in turn.
1763
+ #
1764
+ # Support for Enumerable.
1765
+ #
1766
+ # The data source must be open for reading.
1767
+ #
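+ # For example (an illustrative sketch):
+ #
+ #   CSV.new("1,2\n3,4\n").each { |row| p row }
+ #   # prints ["1", "2"] and then ["3", "4"]
+ #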
1768
+ def each
1769
+ if block_given?
1770
+ while row = shift
1771
+ yield row
1772
+ end
1773
+ else
1774
+ to_enum
1775
+ end
1776
+ end
1777
+
1778
+ #
1779
+ # Slurps the remaining rows and returns an Array of Arrays.
1780
+ #
1781
+ # The data source must be open for reading.
1782
+ #
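+ # For example (an illustrative sketch):
+ #
+ #   CSV.new("a,b\n1,2\n").read                 # => [["a", "b"], ["1", "2"]]
+ #   CSV.new("a,b\n1,2\n", headers: true).read  # => a CSV::Table instead
+ #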
1783
+ def read
1784
+ rows = to_a
1785
+ if @use_headers
1786
+ Table.new(rows)
1787
+ else
1788
+ rows
1789
+ end
1790
+ end
1791
+ alias_method :readlines, :read
1792
+
1793
+ # Returns +true+ if the next row read will be a header row.
1794
+ def header_row?
1795
+ @use_headers and @headers.nil?
1796
+ end
1797
+
1798
+ #
1799
+ # The primary read method for wrapped Strings and IOs, a single row is pulled
1800
+ # from the data source, parsed and returned as an Array of fields (if header
1801
+ # rows are not used) or a CSV::Row (when header rows are used).
1802
+ #
1803
+ # The data source must be open for reading.
1804
+ #
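+ # For example (an illustrative sketch; the Row inspect output is abbreviated):
+ #
+ #   csv = CSV.new("a,b\n1,2\n", headers: true)
+ #   csv.shift  # => #<CSV::Row "a":"1" "b":"2">
+ #   csv.shift  # => nil (end of data)
+ #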
1805
+ def shift
1806
+ #########################################################################
1807
+ ### This method is purposefully kept a bit long as simple conditional ###
1808
+ ### checks are faster than numerous (expensive) method calls. ###
1809
+ #########################################################################
1810
+
1811
+ # handle headers not based on document content
1812
+ if header_row? and @return_headers and
1813
+ [Array, String].include? @use_headers.class
1814
+ if @unconverted_fields
1815
+ return add_unconverted_fields(parse_headers, Array.new)
1816
+ else
1817
+ return parse_headers
1818
+ end
1819
+ end
1820
+
1821
+ #
1822
+ # it can take multiple calls to <tt>@io.gets()</tt> to get a full line,
1823
+ # because of \r and/or \n characters embedded in quoted fields
1824
+ #
1825
+ in_extended_col = false
1826
+ csv = Array.new
1827
+
1828
+ loop do
1829
+ # add another read to the line
1830
+ unless parse = @io.gets(@row_sep)
1831
+ return nil
1832
+ end
1833
+
1834
+ parse.sub!(@parsers[:line_end], "")
1835
+
1836
+ if csv.empty?
1837
+ #
1838
+ # I believe a blank line should be an <tt>Array.new</tt>, not Ruby 1.8
1839
+ # CSV's <tt>[nil]</tt>
1840
+ #
1841
+ if parse.empty?
1842
+ @lineno += 1
1843
+ if @skip_blanks
1844
+ next
1845
+ elsif @unconverted_fields
1846
+ return add_unconverted_fields(Array.new, Array.new)
1847
+ elsif @use_headers
1848
+ return self.class::Row.new(Array.new, Array.new)
1849
+ else
1850
+ return Array.new
1851
+ end
1852
+ end
1853
+ end
1854
+
1855
+ next if @skip_lines and @skip_lines.match parse
1856
+
1857
+ parts = parse.split(@col_sep, -1)
1858
+ if parts.empty?
1859
+ if in_extended_col
1860
+ csv[-1] << @col_sep # will be replaced with a @row_sep after the parts.each loop
1861
+ else
1862
+ csv << nil
1863
+ end
1864
+ end
1865
+
1866
+ # This loop is the hot path of csv parsing. Some things may be non-dry
1867
+ # for a reason. Make sure to benchmark when refactoring.
1868
+ parts.each do |part|
1869
+ if in_extended_col
1870
+ # If we are continuing a previous column
1871
+ if part[-1] == @quote_char && part.count(@quote_char) % 2 != 0
1872
+ # extended column ends
1873
+ csv[-1] = csv[-1].push(part[0..-2]).join("")
1874
+ if csv.last =~ @parsers[:stray_quote]
1875
+ raise MalformedCSVError,
1876
+ "Missing or stray quote in line #{lineno + 1}"
1877
+ end
1878
+ csv.last.gsub!(@quote_char * 2, @quote_char)
1879
+ in_extended_col = false
1880
+ else
1881
+ csv.last.push(part, @col_sep)
1882
+ end
1883
+ elsif part[0] == @quote_char
1884
+ # If we are starting a new quoted column
1885
+ if part.count(@quote_char) % 2 != 0
1886
+ # start an extended column
1887
+ csv << [part[1..-1], @col_sep]
1888
+ in_extended_col = true
1889
+ elsif part[-1] == @quote_char
1890
+ # regular quoted column
1891
+ csv << part[1..-2]
1892
+ if csv.last =~ @parsers[:stray_quote]
1893
+ raise MalformedCSVError,
1894
+ "Missing or stray quote in line #{lineno + 1}"
1895
+ end
1896
+ csv.last.gsub!(@quote_char * 2, @quote_char)
1897
+ elsif @liberal_parsing
1898
+ csv << part
1899
+ else
1900
+ raise MalformedCSVError,
1901
+ "Missing or stray quote in line #{lineno + 1}"
1902
+ end
1903
+ elsif part =~ @parsers[:quote_or_nl]
1904
+ # Unquoted field with bad characters.
1905
+ if part =~ @parsers[:nl_or_lf]
1906
+ raise MalformedCSVError, "Unquoted fields do not allow " +
1907
+ "\\r or \\n (line #{lineno + 1})."
1908
+ else
1909
+ if @liberal_parsing
1910
+ csv << part
1911
+ else
1912
+ raise MalformedCSVError, "Illegal quoting in line #{lineno + 1}."
1913
+ end
1914
+ end
1915
+ else
1916
+ # Regular ole unquoted field.
1917
+ csv << (part.empty? ? nil : part)
1918
+ end
1919
+ end
1920
+
1921
+ # Replace tacked on @col_sep with @row_sep if we are still in an extended
1922
+ # column.
1923
+ csv[-1][-1] = @row_sep if in_extended_col
1924
+
1925
+ if in_extended_col
1926
+ # if we're at eof?(), a quoted field wasn't closed...
1927
+ if @io.eof?
1928
+ raise MalformedCSVError,
1929
+ "Unclosed quoted field on line #{lineno + 1}."
1930
+ elsif @field_size_limit and csv.last.sum(&:size) >= @field_size_limit
1931
+ raise MalformedCSVError, "Field size exceeded on line #{lineno + 1}."
1932
+ end
1933
+ # otherwise, we need to loop and pull some more data to complete the row
1934
+ else
1935
+ @lineno += 1
1936
+
1937
+ # save unconverted fields, if needed...
1938
+ unconverted = csv.dup if @unconverted_fields
1939
+
1940
+ # convert fields, if needed...
1941
+ csv = convert_fields(csv) unless @use_headers or @converters.empty?
1942
+ # parse out header rows and handle CSV::Row conversions...
1943
+ csv = parse_headers(csv) if @use_headers
1944
+
1945
+ # inject unconverted fields and accessor, if requested...
1946
+ if @unconverted_fields and not csv.respond_to? :unconverted_fields
1947
+ add_unconverted_fields(csv, unconverted)
1948
+ end
1949
+
1950
+ # return the results
1951
+ break csv
1952
+ end
1953
+ end
1954
+ end
1955
+ alias_method :gets, :shift
1956
+ alias_method :readline, :shift
1957
+
1958
+ #
1959
+ # Returns a simplified description of the key CSV attributes in an
1960
+ # ASCII compatible String.
1961
+ #
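+ # For example (an illustrative sketch; the exact fields shown depend on the
+ # wrapped IO, the active options, and the default encodings):
+ #
+ #   CSV.new("a,b\n").inspect
+ #   # => <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
+ #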
1962
+ def inspect
1963
+ str = ["<#", self.class.to_s, " io_type:"]
1964
+ # show type of wrapped IO
1965
+ if @io == $stdout then str << "$stdout"
1966
+ elsif @io == $stdin then str << "$stdin"
1967
+ elsif @io == $stderr then str << "$stderr"
1968
+ else str << @io.class.to_s
1969
+ end
1970
+ # show IO.path(), if available
1971
+ if @io.respond_to?(:path) and (p = @io.path)
1972
+ str << " io_path:" << p.inspect
1973
+ end
1974
+ # show encoding
1975
+ str << " encoding:" << @encoding.name
1976
+ # show other attributes
1977
+ %w[ lineno col_sep row_sep
1978
+ quote_char skip_blanks liberal_parsing ].each do |attr_name|
1979
+ if a = instance_variable_get("@#{attr_name}")
1980
+ str << " " << attr_name << ":" << a.inspect
1981
+ end
1982
+ end
1983
+ if @use_headers
1984
+ str << " headers:" << headers.inspect
1985
+ end
1986
+ str << ">"
1987
+ begin
1988
+ str.join('')
1989
+ rescue # any encoding error
1990
+ str.map do |s|
1991
+ e = Encoding::Converter.asciicompat_encoding(s.encoding)
1992
+ e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
1993
+ end.join('')
1994
+ end
1995
+ end
1996
+
1997
+ private
1998
+
1999
+ #
2000
+ # Stores the indicated separators for later use.
2001
+ #
2002
+ # If auto-discovery was requested for <tt>@row_sep</tt>, this method will read
2003
+ # ahead in the <tt>@io</tt> and try to find one. +ARGF+, +STDIN+, +STDOUT+,
2004
+ # +STDERR+ and any stream open for output only skip detection and fall back
2005
+ # to a default <tt>@row_sep</tt> of <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
2006
+ #
2007
+ # This method also establishes the quoting rules used for CSV output.
2008
+ #
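+ # From the public side the auto-detection can be observed like this (an
+ # illustrative sketch):
+ #
+ #   CSV.new("a,b\r\nc,d\r\n").row_sep  # => "\r\n"
+ #   CSV.new("a,b\nc,d\n").row_sep      # => "\n"
+ #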
2009
+ def init_separators(options)
2010
+ # store the selected separators
2011
+ @col_sep = options.delete(:col_sep).to_s.encode(@encoding)
2012
+ @row_sep = options.delete(:row_sep) # encode after resolving :auto
2013
+ @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
2014
+
2015
+ if @quote_char.length != 1
2016
+ raise ArgumentError, ":quote_char has to be a single character String"
2017
+ end
2018
+
2019
+ #
2020
+ # automatically discover row separator when requested
2021
+ # (not fully encoding safe)
2022
+ #
2023
+ if @row_sep == :auto
2024
+ if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
2025
+ (defined?(Zlib) and @io.class == Zlib::GzipWriter)
2026
+ @row_sep = $INPUT_RECORD_SEPARATOR
2027
+ else
2028
+ begin
2029
+ #
2030
+ # remember where we were (pos() will raise an exception if @io is pipe
2031
+ # or not opened for reading)
2032
+ #
2033
+ saved_pos = @io.pos
2034
+ while @row_sep == :auto
2035
+ #
2036
+ # if we run out of data, it's probably a single line
2037
+ # (ensure will set default value)
2038
+ #
2039
+ break unless sample = @io.gets(nil, 1024)
2040
+ # extend sample if we're unsure of the line ending
2041
+ if sample.end_with? encode_str("\r")
2042
+ sample << (@io.gets(nil, 1) || "")
2043
+ end
2044
+
2045
+ # try to find a standard separator
2046
+ if sample =~ encode_re("\r\n?|\n")
2047
+ @row_sep = $&
2048
+ break
2049
+ end
2050
+ end
2051
+
2052
+ # tricky seek() clone to work around GzipReader's lack of seek()
2053
+ @io.rewind
2054
+ # reset back to the remembered position
2055
+ while saved_pos > 1024 # avoid loading a lot of data into memory
2056
+ @io.read(1024)
2057
+ saved_pos -= 1024
2058
+ end
2059
+ @io.read(saved_pos) if saved_pos.nonzero?
2060
+ rescue IOError # not opened for reading
2061
+ # do nothing: ensure will set default
2062
+ rescue NoMethodError # Zlib::GzipWriter doesn't have some IO methods
2063
+ # do nothing: ensure will set default
2064
+ rescue SystemCallError # pipe
2065
+ # do nothing: ensure will set default
2066
+ ensure
2067
+ #
2068
+ # set default if we failed to detect
2069
+ # (stream not opened for reading, a pipe, or a single line of data)
2070
+ #
2071
+ @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
2072
+ end
2073
+ end
2074
+ end
2075
+ @row_sep = @row_sep.to_s.encode(@encoding)
2076
+
2077
+ # establish quoting rules
2078
+ @force_quotes = options.delete(:force_quotes)
2079
+ do_quote = lambda do |field|
2080
+ field = String(field)
2081
+ encoded_quote = @quote_char.encode(field.encoding)
2082
+ encoded_quote +
2083
+ field.gsub(encoded_quote, encoded_quote * 2) +
2084
+ encoded_quote
2085
+ end
2086
+ quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
2087
+ @quote = if @force_quotes
2088
+ do_quote
2089
+ else
2090
+ lambda do |field|
2091
+ if field.nil? # represent +nil+ fields as empty unquoted fields
2092
+ ""
2093
+ else
2094
+ field = String(field) # Stringify fields
2095
+ # represent empty fields as empty quoted fields
2096
+ if field.empty? or
2097
+ field.count(quotable_chars).nonzero?
2098
+ do_quote.call(field)
2099
+ else
2100
+ field # unquoted field
2101
+ end
2102
+ end
2103
+ end
2104
+ end
2105
+ end
2106
+
2107
+ # Pre-compiles parsers and stores them by name for access during reads.
2108
+ def init_parsers(options)
2109
+ # store the parser behaviors
2110
+ @skip_blanks = options.delete(:skip_blanks)
2111
+ @field_size_limit = options.delete(:field_size_limit)
2112
+ @liberal_parsing = options.delete(:liberal_parsing)
2113
+
2114
+ # prebuild Regexps for faster parsing
2115
+ esc_row_sep = escape_re(@row_sep)
2116
+ esc_quote = escape_re(@quote_char)
2117
+ @parsers = {
2118
+ # for detecting parse errors
2119
+ quote_or_nl: encode_re("[", esc_quote, "\r\n]"),
2120
+ nl_or_lf: encode_re("[\r\n]"),
2121
+ stray_quote: encode_re( "[^", esc_quote, "]", esc_quote,
2122
+ "[^", esc_quote, "]" ),
2123
+ # safer than chomp!()
2124
+ line_end: encode_re(esc_row_sep, "\\z"),
2125
+ # illegal unquoted characters
2126
+ return_newline: encode_str("\r\n")
2127
+ }
2128
+ end
2129
+
2130
+ #
2131
+ # Loads any converters requested during construction.
2132
+ #
2133
+ # If +field_name+ is set to <tt>:converters</tt> (the default), field
2134
+ # converters are set. When +field_name+ is <tt>:header_converters</tt>,
2135
+ # header converters are added instead.
2136
+ #
2137
+ # The <tt>:unconverted_fields</tt> option is also activated for
2138
+ # <tt>:converters</tt> calls, if requested.
2139
+ #
2140
+ def init_converters(options, field_name = :converters)
2141
+ if field_name == :converters
2142
+ @unconverted_fields = options.delete(:unconverted_fields)
2143
+ end
2144
+
2145
+ instance_variable_set("@#{field_name}", Array.new)
2146
+
2147
+ # find the correct method to add the converters
2148
+ convert = method(field_name.to_s.sub(/ers\Z/, ""))
2149
+
2150
+ # load converters
2151
+ unless options[field_name].nil?
2152
+ # allow a single converter not wrapped in an Array
2153
+ unless options[field_name].is_a? Array
2154
+ options[field_name] = [options[field_name]]
2155
+ end
2156
+ # load each converter...
2157
+ options[field_name].each do |converter|
2158
+ if converter.is_a? Proc # custom code block
2159
+ convert.call(&converter)
2160
+ else # by name
2161
+ convert.call(converter)
2162
+ end
2163
+ end
2164
+ end
2165
+
2166
+ options.delete(field_name)
2167
+ end
2168
+
2169
+ # Stores header row settings and loads header converters, if needed.
2170
+ def init_headers(options)
2171
+ @use_headers = options.delete(:headers)
2172
+ @return_headers = options.delete(:return_headers)
2173
+ @write_headers = options.delete(:write_headers)
2174
+
2175
+ # headers must be delayed until shift(), in case they need a row of content
2176
+ @headers = nil
2177
+
2178
+ init_converters(options, :header_converters)
2179
+ end
2180
+
2181
+ # Stores the pattern of comments to skip from the provided options.
2182
+ #
2183
+ # The pattern must respond to +.match+, else ArgumentError is raised.
2184
+ # Strings are converted to a Regexp.
2185
+ #
2186
+ # See also CSV.new
2187
+ def init_comments(options)
2188
+ @skip_lines = options.delete(:skip_lines)
2189
+ @skip_lines = Regexp.new(@skip_lines) if @skip_lines.is_a? String
2190
+ if @skip_lines and not @skip_lines.respond_to?(:match)
2191
+ raise ArgumentError, ":skip_lines has to respond to matches"
2192
+ end
2193
+ end
2194
+ #
2195
+ # The actual work method for adding converters, used by both CSV.convert() and
2196
+ # CSV.header_convert().
2197
+ #
2198
+ # This method requires the +var_name+ of the instance variable to place the
2199
+ # converters in, the +const+ Hash to lookup named converters in, and the
2200
+ # normal parameters of the CSV.convert() and CSV.header_convert() methods.
2201
+ #
2202
+ def add_converter(var_name, const, name = nil, &converter)
2203
+ if name.nil? # custom converter
2204
+ instance_variable_get("@#{var_name}") << converter
2205
+ else # named converter
2206
+ combo = const[name]
2207
+ case combo
2208
+ when Array # combo converter
2209
+ combo.each do |converter_name|
2210
+ add_converter(var_name, const, converter_name)
2211
+ end
2212
+ else # individual named converter
2213
+ instance_variable_get("@#{var_name}") << combo
2214
+ end
2215
+ end
2216
+ end
2217
+
2218
+ #
2219
+ # Processes +fields+ with <tt>@converters</tt>, or <tt>@header_converters</tt>
2220
+ # if +headers+ is passed as +true+, returning the converted field set. Any
2221
+ # converter that changes the field into something other than a String halts
2222
+ # the pipeline of conversion for that field. This is primarily an efficiency
2223
+ # shortcut.
2224
+ #
2225
+ def convert_fields(fields, headers = false)
2226
+ # see if we are converting headers or fields
2227
+ converters = headers ? @header_converters : @converters
2228
+
2229
+ fields.map.with_index do |field, index|
2230
+ converters.each do |converter|
2231
+ break if field.nil?
2232
+ field = if converter.arity == 1 # straight field converter
2233
+ converter[field]
2234
+ else # FieldInfo converter
2235
+ header = @use_headers && !headers ? @headers[index] : nil
2236
+ converter[field, FieldInfo.new(index, lineno, header)]
2237
+ end
2238
+ break unless field.is_a? String # short-circuit pipeline for speed
2239
+ end
2240
+ field # final state of each field, converted or original
2241
+ end
2242
+ end
2243
+
2244
+ #
2245
+ # This method is used to turn a finished +row+ into a CSV::Row. Header rows
2246
+ # are also dealt with here, either by returning a CSV::Row with identical
2247
+ # headers and fields (save that the fields do not go through the converters)
2248
+ # or by reading past them to return a field row. Headers are also saved in
2249
+ # <tt>@headers</tt> for use in future rows.
2250
+ #
2251
+ # When +nil+, +row+ is assumed to be a header row not based on an actual row
2252
+ # of the stream.
2253
+ #
2254
+ def parse_headers(row = nil)
2255
+ if @headers.nil? # header row
2256
+ @headers = case @use_headers # save headers
2257
+ # Array of headers
2258
+ when Array then @use_headers
2259
+ # CSV header String
2260
+ when String
2261
+ self.class.parse_line( @use_headers,
2262
+ col_sep: @col_sep,
2263
+ row_sep: @row_sep,
2264
+ quote_char: @quote_char )
2265
+ # first row is headers
2266
+ else row
2267
+ end
2268
+
2269
+ # prepare converted and unconverted copies
2270
+ row = @headers if row.nil?
2271
+ @headers = convert_fields(@headers, true)
2272
+ @headers.each { |h| h.freeze if h.is_a? String }
2273
+
2274
+ if @return_headers # return headers
2275
+ return self.class::Row.new(@headers, row, true)
2276
+ elsif not [Array, String].include? @use_headers.class # skip to field row
2277
+ return shift
2278
+ end
2279
+ end
2280
+
2281
+ self.class::Row.new(@headers, convert_fields(row)) # field row
2282
+ end
2283
+
2284
+ #
2285
+ # This method injects an instance variable <tt>unconverted_fields</tt> into
2286
+ # +row+ and an accessor method for +row+ called unconverted_fields(). The
2287
+ # variable is set to the contents of +fields+.
2288
+ #
2289
+ def add_unconverted_fields(row, fields)
2290
+ class << row
2291
+ attr_reader :unconverted_fields
2292
+ end
2293
+ row.instance_eval { @unconverted_fields = fields }
2294
+ row
2295
+ end
2296
+
2297
+ #
2298
+ # This method is an encoding safe version of Regexp::escape(). It will escape
2299
+ # any characters that would change the meaning of a regular expression in the
2300
+ # encoding of +str+. Regular expression characters that cannot be transcoded
2301
+ # to the target encoding will be skipped, and no escaping will be performed
2302
+ # at all if a backslash cannot be transcoded.
2303
+ #
2304
+ def escape_re(str)
2305
+ str.gsub(@re_chars) {|c| @re_esc + c}
2306
+ end
2307
+
2308
+ #
2309
+ # Builds a regular expression in <tt>@encoding</tt>. All +chunks+ will be
2310
+ # transcoded to that encoding.
2311
+ #
2312
+ def encode_re(*chunks)
2313
+ Regexp.new(encode_str(*chunks))
2314
+ end
2315
+
2316
+ #
2317
+ # Builds a String in <tt>@encoding</tt>. All +chunks+ will be transcoded to
2318
+ # that encoding.
2319
+ #
2320
+ def encode_str(*chunks)
2321
+ chunks.map { |chunk| chunk.encode(@encoding.name) }.join('')
2322
+ end
2323
+
2324
+ private
2325
+
2326
+ #
2327
+ # Returns the encoding of the internal IO object or the +default+ if the
2328
+ # encoding cannot be determined.
2329
+ #
2330
+ def raw_encoding(default = Encoding::ASCII_8BIT)
2331
+ if @io.respond_to? :internal_encoding
2332
+ @io.internal_encoding || @io.external_encoding
2333
+ elsif @io.is_a? StringIO
2334
+ @io.string.encoding
2335
+ elsif @io.respond_to? :encoding
2336
+ @io.encoding
2337
+ else
2338
+ default
2339
+ end
2340
+ end
2341
+ end
2342
+
2343
+ # Passes +args+ to CSV::instance.
2344
+ #
2345
+ # CSV("CSV,data").read
2346
+ # #=> [["CSV", "data"]]
2347
+ #
2348
+ # If a block is given, the instance is passed the block and the return value
2349
+ # becomes the return value of the block.
2350
+ #
2351
+ # CSV("CSV,data") { |c|
2352
+ # c.read.any? { |a| a.include?("data") }
2353
+ # } #=> true
2354
+ #
2355
+ # CSV("CSV,data") { |c|
2356
+ # c.read.any? { |a| a.include?("zombies") }
2357
+ # } #=> false
2358
+ #
2359
+ def CSV(*args, &block)
2360
+ CSV.instance(*args, &block)
2361
+ end
2362
+
2363
+ class Array # :nodoc:
2364
+ # Equivalent to CSV::generate_line(self, options)
2365
+ #
2366
+ # ["CSV", "data"].to_csv
2367
+ # #=> "CSV,data\n"
2368
+ def to_csv(options = Hash.new)
2369
+ CSV.generate_line(self, options)
2370
+ end
2371
+ end
2372
+
2373
+ class String # :nodoc:
2374
+ # Equivalent to CSV::parse_line(self, options)
2375
+ #
2376
+ # "CSV,data".parse_csv
2377
+ # #=> ["CSV", "data"]
2378
+ def parse_csv(options = Hash.new)
2379
+ CSV.parse_line(self, options)
2380
+ end
2381
+ end