rubysl-csv 1.0.1 → 2.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 67c6fb34a5cd1839b27f1b3db51b962e4966fe3c
- data.tar.gz: c5c5befc7ebcf24aa4b45b7be33ae659d4680262
+ metadata.gz: 06b35a01dd6e04add0701020be3f918d840698d4
+ data.tar.gz: b8addc03698ccd506f75a0370204553710dcfe1c
  SHA512:
- metadata.gz: 5444f8467b23724368c2d33e4cd0ff4666b58a354194ffada01da0c625a618cf6d23d3abe8caf3256b73d7d89c94fb6fef841745a405747b855e8bd92a97dbad
- data.tar.gz: ab7bbb1b410705d370f4a4aa362bb39541113f9c2134d4ff94dd02d36c15b9a4ac8c7effa9009db0f422865c5720ef0ded4f2b4427509b45030d7ed849d3beb2
+ metadata.gz: 12419103eb8b2833f0aca7c11477c69e7175d0e1e8964616b5077dd9ef84e028b93dfbd6ddfc0e1f0f8d25d9939cd52bf4088ba708d3958271fc4ea3b663b27c
+ data.tar.gz: f388bfd4ba16eae0714a9fed2031f2175b4a435f9176c39d9155d1a07bad33ba882e72a94c3d370b22f0ac08a15b3f036c6a0fff0ff92a6910b77b320bf7c730
data/.gitignore CHANGED
@@ -15,4 +15,3 @@ spec/reports
  test/tmp
  test/version_tmp
  tmp
- .rbx
@@ -1,8 +1,9 @@
  language: ruby
  before_install:
+ - rvm use $RVM --install --binary --fuzzy
  - gem update --system
  - gem --version
  - gem install rubysl-bundler
+ env:
+ - RVM=rbx-nightly-d21 RUBYLIB=lib
  script: bundle exec mspec spec
- rvm:
- - rbx-nightly-18mode
data/README.md CHANGED
@@ -1,4 +1,4 @@
- # RubySL::Csv
+ # Rubysl::Csv
 
  TODO: Write a gem description
 
@@ -24,6 +24,6 @@ TODO: Write usage instructions here
 
  1. Fork it
  2. Create your feature branch (`git checkout -b my-new-feature`)
- 3. Commit your changes (`git commit -am 'Added some feature'`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
  4. Push to the branch (`git push origin my-new-feature`)
  5. Create new Pull Request
data/Rakefile CHANGED
@@ -1,2 +1 @@
- #!/usr/bin/env rake
  require "bundler/gem_tasks"
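The hunk that follows replaces the Ruby 1.8-era CSV module with the FasterCSV-derived implementation. As a rough orientation, here is a minimal sketch of the interface its documentation describes. It assumes Ruby's standard `csv` library is available rather than this gem specifically, and the sample data and variable names are made up for illustration.

```ruby
require "csv"

# Hypothetical sample data, not taken from the gem.
data = "name,qty\napple,3\npear,5\n"

# With header processing on, CSV.parse returns a CSV::Table of CSV::Row objects.
table = CSV.parse(data, headers: true)

first_row = table[0]       # mixed mode: an Integer index selects a row
names     = table["name"]  # mixed mode: anything else selects a column by header

qty     = first_row.field("qty")           # look a field up by header
missing = first_row.fetch("price", "n/a")  # default returned for an unknown header

# Duplicate headers are allowed; the offset picks a match at or after that index.
row    = CSV::Row.new(%w[a b a], [1, 2, 3])
second = row.field("a", 1)

# Single-line conversions on Array and String.
line   = ["CSV", "data"].to_csv
fields = "CSV,String".parse_csv
```

Code written against the old interface (positional `fs`/`rs` arguments, `CSV::Reader`, `CSV::Writer`, `parse_row`, `generate_row`) has no direct equivalent here and would likely need to move to calls like these.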
@@ -1,992 +1,2315 @@
1
- # CSV -- module for generating/parsing CSV data.
2
- # Copyright (C) 2000-2004 NAKAMURA, Hiroshi <nakahiro@sarion.co.jp>.
3
-
4
- # $Id: csv.rb 11708 2007-02-12 23:01:19Z shyouhei $
5
-
6
- # This program is copyrighted free software by NAKAMURA, Hiroshi. You can
7
- # redistribute it and/or modify it under the same terms of Ruby's license;
8
- # either the dual license version in 2003, or any later version.
9
-
10
-
1
+ # encoding: US-ASCII
2
+ # = csv.rb -- CSV Reading and Writing
3
+ #
4
+ # Created by James Edward Gray II on 2005-10-31.
5
+ # Copyright 2005 James Edward Gray II. You can redistribute or modify this code
6
+ # under the terms of Ruby's license.
7
+ #
8
+ # See CSV for documentation.
9
+ #
10
+ # == Description
11
+ #
12
+ # Welcome to the new and improved CSV.
13
+ #
14
+ # This version of the CSV library began its life as FasterCSV. FasterCSV was
15
+ # intended as a replacement to Ruby's then standard CSV library. It was
16
+ # designed to address concerns users of that library had and it had three
17
+ # primary goals:
18
+ #
19
+ # 1. Be significantly faster than CSV while remaining a pure Ruby library.
20
+ # 2. Use a smaller and easier to maintain code base. (FasterCSV eventually
21
+ # grew larger, but was also considerably richer in features. The parsing
22
+ # core remains quite small.)
23
+ # 3. Improve on the CSV interface.
24
+ #
25
+ # Obviously, the last one is subjective. I did try to defer to the original
26
+ # interface whenever I didn't have a compelling reason to change it though, so
27
+ # hopefully this won't be too radically different.
28
+ #
29
+ # We must have met our goals because FasterCSV was renamed to CSV and replaced
30
+ # the original library as of Ruby 1.9. If you are migrating code from 1.8 or
31
+ # earlier, you may have to change your code to comply with the new interface.
32
+ #
33
+ # == What's Different From the Old CSV?
34
+ #
35
+ # I'm sure I'll miss something, but I'll try to mention most of the major
36
+ # differences I am aware of, to help others quickly get up to speed:
37
+ #
38
+ # === CSV Parsing
39
+ #
40
+ # * This parser is m17n aware. See CSV for full details.
41
+ # * This library has a stricter parser and will throw MalformedCSVErrors on
42
+ # problematic data.
43
+ # * This library has a less liberal idea of a line ending than CSV. What you
44
+ # set as the <tt>:row_sep</tt> is law. It can auto-detect your line endings
45
+ # though.
46
+ # * The old library returned empty lines as <tt>[nil]</tt>. This library calls
47
+ # them <tt>[]</tt>.
48
+ # * This library has a much faster parser.
49
+ #
50
+ # === Interface
51
+ #
52
+ # * CSV now uses Hash-style parameters to set options.
53
+ # * CSV no longer has generate_row() or parse_row().
54
+ # * The old CSV's Reader and Writer classes have been dropped.
55
+ # * CSV::open() is now more like Ruby's open().
56
+ # * CSV objects now support most standard IO methods.
57
+ # * CSV now has a new() method used to wrap objects like String and IO for
58
+ # reading and writing.
59
+ # * CSV::generate() is different from the old method.
60
+ # * CSV no longer supports partial reads. It works line-by-line.
61
+ # * CSV no longer allows the instance methods to override the separators for
62
+ # performance reasons. They must be set in the constructor.
63
+ #
64
+ # If you use this library and find yourself missing any functionality I have
65
+ # trimmed, please {let me know}[mailto:james@grayproductions.net].
66
+ #
67
+ # == Documentation
68
+ #
69
+ # See CSV for documentation.
70
+ #
71
+ # == What is CSV, really?
72
+ #
73
+ # CSV maintains a pretty strict definition of CSV taken directly from
74
+ # {the RFC}[http://www.ietf.org/rfc/rfc4180.txt]. I relax the rules in only one
75
+ # place and that is to make using this library easier. CSV will parse all valid
76
+ # CSV.
77
+ #
78
+ # What you don't want to do is feed CSV invalid data. Because of the way the
79
+ # CSV format works, it's common for a parser to need to read until the end of
80
+ # the file to be sure a field is invalid. This eats a lot of time and memory.
81
+ #
82
+ # Luckily, when working with invalid CSV, Ruby's built-in methods will almost
83
+ # always be superior in every way. For example, parsing non-quoted fields is as
84
+ # easy as:
85
+ #
86
+ # data.split(",")
87
+ #
88
+ # == Questions and/or Comments
89
+ #
90
+ # Feel free to email {James Edward Gray II}[mailto:james@grayproductions.net]
91
+ # with any questions.
92
+
93
+ require "forwardable"
94
+ require "English"
95
+ require "date"
96
+ require "stringio"
97
+
98
+ #
99
+ # This class provides a complete interface to CSV files and data. It offers
100
+ # tools to enable you to read and write to and from Strings or IO objects, as
101
+ # needed.
102
+ #
103
+ # == Reading
104
+ #
105
+ # === From a File
106
+ #
107
+ # ==== A Line at a Time
108
+ #
109
+ # CSV.foreach("path/to/file.csv") do |row|
110
+ # # use row here...
111
+ # end
112
+ #
113
+ # ==== All at Once
114
+ #
115
+ # arr_of_arrs = CSV.read("path/to/file.csv")
116
+ #
117
+ # === From a String
118
+ #
119
+ # ==== A Line at a Time
120
+ #
121
+ # CSV.parse("CSV,data,String") do |row|
122
+ # # use row here...
123
+ # end
124
+ #
125
+ # ==== All at Once
126
+ #
127
+ # arr_of_arrs = CSV.parse("CSV,data,String")
128
+ #
129
+ # == Writing
130
+ #
131
+ # === To a File
132
+ #
133
+ # CSV.open("path/to/file.csv", "wb") do |csv|
134
+ # csv << ["row", "of", "CSV", "data"]
135
+ # csv << ["another", "row"]
136
+ # # ...
137
+ # end
138
+ #
139
+ # === To a String
140
+ #
141
+ # csv_string = CSV.generate do |csv|
142
+ # csv << ["row", "of", "CSV", "data"]
143
+ # csv << ["another", "row"]
144
+ # # ...
145
+ # end
146
+ #
147
+ # == Convert a Single Line
148
+ #
149
+ # csv_string = ["CSV", "data"].to_csv # to CSV
150
+ # csv_array = "CSV,String".parse_csv # from CSV
151
+ #
152
+ # == Shortcut Interface
153
+ #
154
+ # CSV { |csv_out| csv_out << %w{my data here} } # to $stdout
155
+ # CSV(csv = "") { |csv_str| csv_str << %w{my data here} } # to a String
156
+ # CSV($stderr) { |csv_err| csv_err << %w{my data here} } # to $stderr
157
+ # CSV($stdin) { |csv_in| csv_in.each { |row| p row } } # from $stdin
158
+ #
159
+ # == Advanced Usage
160
+ #
161
+ # === Wrap an IO Object
162
+ #
163
+ # csv = CSV.new(io, options)
164
+ # # ... read (with gets() or each()) from and write (with <<) to csv here ...
165
+ #
166
+ # == CSV and Character Encodings (M17n or Multilingualization)
167
+ #
168
+ # This new CSV parser is m17n savvy. The parser works in the Encoding of the IO
169
+ # or String object being read from or written to. Your data is never transcoded
170
+ # (unless you ask Ruby to transcode it for you) and will literally be parsed in
171
+ # the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the
172
+ # Encoding of your data. This is accomplished by transcoding the parser itself
173
+ # into your Encoding.
174
+ #
175
+ # Some transcoding must take place, of course, to accomplish this multiencoding
176
+ # support. For example, <tt>:col_sep</tt>, <tt>:row_sep</tt>, and
177
+ # <tt>:quote_char</tt> must be transcoded to match your data. Hopefully this
178
+ # makes the entire process feel transparent, since CSV's defaults should just
179
+ # magically work for your data. However, you can set these values manually in
180
+ # the target Encoding to avoid the translation.
181
+ #
182
+ # It's also important to note that while all of CSV's core parser is now
183
+ # Encoding agnostic, some features are not. For example, the built-in
184
+ # converters will try to transcode data to UTF-8 before making conversions.
185
+ # Again, you can provide custom converters that are aware of your Encodings to
186
+ # avoid this translation. It's just too hard for me to support native
187
+ # conversions in all of Ruby's Encodings.
188
+ #
189
+ # Anyway, the practical side of this is simple: make sure IO and String objects
190
+ # passed into CSV have the proper Encoding set and everything should just work.
191
+ # CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(),
192
+ # CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
193
+ #
194
+ # One minor exception comes when generating CSV into a String with an Encoding
195
+ # that is not ASCII compatible. There's no existing data for CSV to use to
196
+ # prepare itself and thus you will probably need to manually specify the desired
197
+ # Encoding for most of those cases. It will try to guess using the fields in a
198
+ # row of output though, when using CSV::generate_line() or Array#to_csv().
199
+ #
200
+ # I try to point out any other Encoding issues in the documentation of methods
201
+ # as they come up.
202
+ #
203
+ # This has been tested to the best of my ability with all non-"dummy" Encodings
204
+ # Ruby ships with. However, it is brave new code and may have some bugs.
205
+ # Please feel free to {report}[mailto:james@grayproductions.net] any issues you
206
+ # find with it.
207
+ #
11
208
  class CSV
12
- class IllegalFormatError < RuntimeError; end
209
+ # The version of the installed library.
210
+ VERSION = "2.4.8".freeze
211
+
212
+ #
213
+ # A CSV::Row is part Array and part Hash. It retains an order for the fields
214
+ # and allows duplicates just as an Array would, but also allows you to access
215
+ # fields by name just as you could if they were in a Hash.
216
+ #
217
+ # All rows returned by CSV will be constructed from this class, if header row
218
+ # processing is activated.
219
+ #
220
+ class Row
221
+ #
222
+ # Construct a new CSV::Row from +headers+ and +fields+, which are expected
223
+ # to be Arrays. If one Array is shorter than the other, it will be padded
224
+ # with +nil+ objects.
225
+ #
226
+ # The optional +header_row+ parameter can be set to +true+ to indicate, via
227
+ # CSV::Row.header_row?() and CSV::Row.field_row?(), that this is a header
228
+ # row. Otherwise, the row is assumed to be a field row.
229
+ #
230
+ # A CSV::Row object supports the following Array methods through delegation:
231
+ #
232
+ # * empty?()
233
+ # * length()
234
+ # * size()
235
+ #
236
+ def initialize(headers, fields, header_row = false)
237
+ @header_row = header_row
13
238
 
14
- # deprecated
15
- class Cell < String
16
- def initialize(data = "", is_null = false)
17
- super(is_null ? "" : data)
239
+ # handle extra headers or fields
240
+ @row = if headers.size > fields.size
241
+ headers.zip(fields)
242
+ else
243
+ fields.zip(headers).map { |pair| pair.reverse }
244
+ end
18
245
  end
19
246
 
20
- def data
21
- to_s
247
+ # Internal data format used to compare equality.
248
+ attr_reader :row
249
+ protected :row
250
+
251
+ ### Array Delegation ###
252
+
253
+ extend Forwardable
254
+ def_delegators :@row, :empty?, :length, :size
255
+
256
+ # Returns +true+ if this is a header row.
257
+ def header_row?
258
+ @header_row
22
259
  end
23
- end
24
260
 
25
- # deprecated
26
- class Row < Array
27
- end
261
+ # Returns +true+ if this is a field row.
262
+ def field_row?
263
+ not header_row?
264
+ end
28
265
 
29
- # Open a CSV formatted file for reading or writing.
30
- #
31
- # For reading.
32
- #
33
- # EXAMPLE 1
34
- # CSV.open('csvfile.csv', 'r') do |row|
35
- # p row
36
- # end
37
- #
38
- # EXAMPLE 2
39
- # reader = CSV.open('csvfile.csv', 'r')
40
- # row1 = reader.shift
41
- # row2 = reader.shift
42
- # if row2.empty?
43
- # p 'row2 not find.'
44
- # end
45
- # reader.close
46
- #
47
- # ARGS
48
- # filename: filename to parse.
49
- # col_sep: Column separator. ?, by default. If you want to separate
50
- # fields with semicolon, give ?; here.
51
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
52
- # want to separate records with \r, give ?\r here.
53
- #
54
- # RETURNS
55
- # reader instance. To get parse result, see CSV::Reader#each.
56
- #
57
- #
58
- # For writing.
59
- #
60
- # EXAMPLE 1
61
- # CSV.open('csvfile.csv', 'w') do |writer|
62
- # writer << ['r1c1', 'r1c2']
63
- # writer << ['r2c1', 'r2c2']
64
- # writer << [nil, nil]
65
- # end
66
- #
67
- # EXAMPLE 2
68
- # writer = CSV.open('csvfile.csv', 'w')
69
- # writer << ['r1c1', 'r1c2'] << ['r2c1', 'r2c2'] << [nil, nil]
70
- # writer.close
71
- #
72
- # ARGS
73
- # filename: filename to generate.
74
- # col_sep: Column separator. ?, by default. If you want to separate
75
- # fields with semicolon, give ?; here.
76
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
77
- # want to separate records with \r, give ?\r here.
78
- #
79
- # RETURNS
80
- # writer instance. See CSV::Writer#<< and CSV::Writer#add_row to know how
81
- # to generate CSV string.
82
- #
83
- def CSV.open(path, mode, fs = nil, rs = nil, &block)
84
- if mode == 'r' or mode == 'rb'
85
- open_reader(path, mode, fs, rs, &block)
86
- elsif mode == 'w' or mode == 'wb'
87
- open_writer(path, mode, fs, rs, &block)
88
- else
89
- raise ArgumentError.new("'mode' must be 'r', 'rb', 'w', or 'wb'")
266
+ # Returns the headers of this row.
267
+ def headers
268
+ @row.map { |pair| pair.first }
90
269
  end
91
- end
92
270
 
93
- def CSV.foreach(path, rs = nil, &block)
94
- open_reader(path, 'r', ',', rs, &block)
95
- end
271
+ #
272
+ # :call-seq:
273
+ # field( header )
274
+ # field( header, offset )
275
+ # field( index )
276
+ #
277
+ # This method will return the field value by +header+ or +index+. If a field
278
+ # is not found, +nil+ is returned.
279
+ #
280
+ # When provided, +offset+ ensures that a header match occurs on or later
281
+ # than the +offset+ index. You can use this to find duplicate headers,
282
+ # without resorting to hard-coding exact indices.
283
+ #
284
+ def field(header_or_index, minimum_index = 0)
285
+ # locate the pair
286
+ finder = header_or_index.is_a?(Integer) ? :[] : :assoc
287
+ pair = @row[minimum_index..-1].send(finder, header_or_index)
96
288
 
97
- def CSV.read(path, length = nil, offset = nil)
98
- CSV.parse(IO.read(path, length, offset))
99
- end
100
-
101
- def CSV.readlines(path, rs = nil)
102
- reader = open_reader(path, 'r', ',', rs)
103
- begin
104
- reader.collect { |row| row }
105
- ensure
106
- reader.close
289
+ # return the field if we have a pair
290
+ pair.nil? ? nil : pair.last
107
291
  end
108
- end
292
+ alias_method :[], :field
109
293
 
110
- def CSV.generate(path, fs = nil, rs = nil, &block)
111
- open_writer(path, 'w', fs, rs, &block)
112
- end
294
+ #
295
+ # :call-seq:
296
+ # fetch( header )
297
+ # fetch( header ) { |row| ... }
298
+ # fetch( header, default )
299
+ #
300
+ # This method will fetch the field value by +header+. It has the same
301
+ # behavior as Hash#fetch: if there is a field with the given +header+, its
302
+ # value is returned. Otherwise, if a block is given, it is yielded the
303
+ # +header+ and its result is returned; if a +default+ is given as the
304
+ # second argument, it is returned; otherwise a KeyError is raised.
305
+ #
306
+ def fetch(header, *varargs)
307
+ raise ArgumentError, "Too many arguments" if varargs.length > 1
308
+ pair = @row.assoc(header)
309
+ if pair
310
+ pair.last
311
+ else
312
+ if block_given?
313
+ yield header
314
+ elsif varargs.empty?
315
+ raise KeyError, "key not found: #{header}"
316
+ else
317
+ varargs.first
318
+ end
319
+ end
320
+ end
113
321
 
114
- # Parse lines from given string or stream. Return rows as an Array of Arrays.
115
- def CSV.parse(str_or_readable, fs = nil, rs = nil, &block)
116
- if File.exist?(str_or_readable)
117
- STDERR.puts("CSV.parse(filename) is deprecated." +
118
- " Use CSV.open(filename, 'r') instead.")
119
- return open_reader(str_or_readable, 'r', fs, rs, &block)
322
+ # Returns +true+ if there is a field with the given +header+.
323
+ def has_key?(header)
324
+ !!@row.assoc(header)
120
325
  end
121
- if block
122
- CSV::Reader.parse(str_or_readable, fs, rs) do |row|
123
- yield(row)
326
+ alias_method :include?, :has_key?
327
+ alias_method :key?, :has_key?
328
+ alias_method :member?, :has_key?
329
+
330
+ #
331
+ # :call-seq:
332
+ # []=( header, value )
333
+ # []=( header, offset, value )
334
+ # []=( index, value )
335
+ #
336
+ # Looks up the field by the semantics described in CSV::Row.field() and
337
+ # assigns the +value+.
338
+ #
339
+ # Assigning past the end of the row with an index will set all pairs between
340
+ # to <tt>[nil, nil]</tt>. Assigning to an unused header appends the new
341
+ # pair.
342
+ #
343
+ def []=(*args)
344
+ value = args.pop
345
+
346
+ if args.first.is_a? Integer
347
+ if @row[args.first].nil? # extending past the end with index
348
+ @row[args.first] = [nil, value]
349
+ @row.map! { |pair| pair.nil? ? [nil, nil] : pair }
350
+ else # normal index assignment
351
+ @row[args.first][1] = value
352
+ end
353
+ else
354
+ index = index(*args)
355
+ if index.nil? # appending a field
356
+ self << [args.first, value]
357
+ else # normal header assignment
358
+ @row[index][1] = value
359
+ end
124
360
  end
125
- nil
126
- else
127
- CSV::Reader.create(str_or_readable, fs, rs).collect { |row| row }
128
361
  end
129
- end
130
362
 
131
- # Parse a line from given string. Bear in mind it parses ONE LINE. Rest of
132
- # the string is ignored for example "a,b\r\nc,d" => ['a', 'b'] and the
133
- # second line 'c,d' is ignored.
134
- #
135
- # If you don't know whether a target string to parse is exactly 1 line or
136
- # not, use CSV.parse_row instead of this method.
137
- def CSV.parse_line(src, fs = nil, rs = nil)
138
- fs ||= ','
139
- if fs.is_a?(Fixnum)
140
- fs = fs.chr
363
+ #
364
+ # :call-seq:
365
+ # <<( field )
366
+ # <<( header_and_field_array )
367
+ # <<( header_and_field_hash )
368
+ #
369
+ # If a two-element Array is provided, it is assumed to be a header and field
370
+ # and the pair is appended. A Hash works the same way with the key being
371
+ # the header and the value being the field. Anything else is assumed to be
372
+ # a lone field which is appended with a +nil+ header.
373
+ #
374
+ # This method returns the row for chaining.
375
+ #
376
+ def <<(arg)
377
+ if arg.is_a?(Array) and arg.size == 2 # appending a header and name
378
+ @row << arg
379
+ elsif arg.is_a?(Hash) # append header and name pairs
380
+ arg.each { |pair| @row << pair }
381
+ else # append field value
382
+ @row << [nil, arg]
383
+ end
384
+
385
+ self # for chaining
141
386
  end
142
- if !rs.nil? and rs.is_a?(Fixnum)
143
- rs = rs.chr
387
+
388
+ #
389
+ # A shortcut for appending multiple fields. Equivalent to:
390
+ #
391
+ # args.each { |arg| csv_row << arg }
392
+ #
393
+ # This method returns the row for chaining.
394
+ #
395
+ def push(*args)
396
+ args.each { |arg| self << arg }
397
+
398
+ self # for chaining
144
399
  end
145
- idx = 0
146
- res_type = :DT_COLSEP
147
- row = []
148
- begin
149
- while res_type == :DT_COLSEP
150
- res_type, idx, cell = parse_body(src, idx, fs, rs)
151
- row << cell
400
+
401
+ #
402
+ # :call-seq:
403
+ # delete( header )
404
+ # delete( header, offset )
405
+ # delete( index )
406
+ #
407
+ # Used to remove a pair from the row by +header+ or +index+. The pair is
408
+ # located as described in CSV::Row.field(). The deleted pair is returned,
409
+ # or +nil+ if a pair could not be found.
410
+ #
411
+ def delete(header_or_index, minimum_index = 0)
412
+ if header_or_index.is_a? Integer # by index
413
+ @row.delete_at(header_or_index)
414
+ elsif i = index(header_or_index, minimum_index) # by header
415
+ @row.delete_at(i)
416
+ else
417
+ [ ]
152
418
  end
153
- rescue IllegalFormatError
154
- return []
155
419
  end
156
- row
157
- end
158
420
 
159
- # Create a line from cells. each cell is stringified by to_s.
160
- def CSV.generate_line(row, fs = nil, rs = nil)
161
- if row.size == 0
162
- return ''
163
- end
164
- fs ||= ','
165
- if fs.is_a?(Fixnum)
166
- fs = fs.chr
167
- end
168
- if !rs.nil? and rs.is_a?(Fixnum)
169
- rs = rs.chr
170
- end
171
- res_type = :DT_COLSEP
172
- result_str = ''
173
- idx = 0
174
- while true
175
- generate_body(row[idx], result_str, fs, rs)
176
- idx += 1
177
- if (idx == row.size)
178
- break
179
- end
180
- generate_separator(:DT_COLSEP, result_str, fs, rs)
181
- end
182
- result_str
183
- end
184
-
185
- # Parse a line from string. Consider using CSV.parse_line instead.
186
- # To parse lines in CSV string, see EXAMPLE below.
187
- #
188
- # EXAMPLE
189
- # src = "a,b\r\nc,d\r\ne,f"
190
- # idx = 0
191
- # begin
192
- # parsed = []
193
- # parsed_cells, idx = CSV.parse_row(src, idx, parsed)
194
- # puts "Parsed #{ parsed_cells } cells."
195
- # p parsed
196
- # end while parsed_cells > 0
197
- #
198
- # ARGS
199
- # src: a CSV data to be parsed. Must respond '[](idx)'.
200
- # src[](idx) must return a char. (Not a string such as 'a', but 97).
201
- # src[](idx_out_of_bounds) must return nil. A String satisfies this
202
- # requirement.
203
- # idx: index of parsing location of 'src'. 0 origin.
204
- # out_dev: buffer for parsed cells. Must respond '<<(aString)'.
205
- # col_sep: Column separator. ?, by default. If you want to separate
206
- # fields with semicolon, give ?; here.
207
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
208
- # want to separate records with \r, give ?\r here.
209
- #
210
- # RETURNS
211
- # parsed_cells: num of parsed cells.
212
- # idx: index of next parsing location of 'src'.
213
- #
214
- def CSV.parse_row(src, idx, out_dev, fs = nil, rs = nil)
215
- fs ||= ','
216
- if fs.is_a?(Fixnum)
217
- fs = fs.chr
218
- end
219
- if !rs.nil? and rs.is_a?(Fixnum)
220
- rs = rs.chr
221
- end
222
- idx_backup = idx
223
- parsed_cells = 0
224
- res_type = :DT_COLSEP
225
- begin
226
- while res_type != :DT_ROWSEP
227
- res_type, idx, cell = parse_body(src, idx, fs, rs)
228
- if res_type == :DT_EOS
229
- if idx == idx_backup #((parsed_cells == 0) and cell.nil?)
230
- return 0, 0
421
+ #
422
+ # The provided +block+ is passed a header and field for each pair in the row
423
+ # and expected to return +true+ or +false+, depending on whether the pair
424
+ # should be deleted.
425
+ #
426
+ # This method returns the row for chaining.
427
+ #
428
+ def delete_if(&block)
429
+ @row.delete_if(&block)
430
+
431
+ self # for chaining
432
+ end
433
+
434
+ #
435
+ # This method accepts any number of arguments which can be headers, indices,
436
+ # Ranges of either, or two-element Arrays containing a header and offset.
437
+ # Each argument will be replaced with a field lookup as described in
438
+ # CSV::Row.field().
439
+ #
440
+ # If called with no arguments, all fields are returned.
441
+ #
442
+ def fields(*headers_and_or_indices)
443
+ if headers_and_or_indices.empty? # return all fields--no arguments
444
+ @row.map { |pair| pair.last }
445
+ else # or work like values_at()
446
+ headers_and_or_indices.inject(Array.new) do |all, h_or_i|
447
+ all + if h_or_i.is_a? Range
448
+ index_begin = h_or_i.begin.is_a?(Integer) ? h_or_i.begin :
449
+ index(h_or_i.begin)
450
+ index_end = h_or_i.end.is_a?(Integer) ? h_or_i.end :
451
+ index(h_or_i.end)
452
+ new_range = h_or_i.exclude_end? ? (index_begin...index_end) :
453
+ (index_begin..index_end)
454
+ fields.values_at(new_range)
455
+ else
456
+ [field(*Array(h_or_i))]
231
457
  end
232
- res_type = :DT_ROWSEP
233
458
  end
234
- parsed_cells += 1
235
- out_dev << cell
236
- end
237
- rescue IllegalFormatError
238
- return 0, 0
239
- end
240
- return parsed_cells, idx
241
- end
242
-
243
- # Convert a line from cells data to string. Consider using CSV.generate_line
244
- # instead. To generate multi-row CSV string, see EXAMPLE below.
245
- #
246
- # EXAMPLE
247
- # row1 = ['a', 'b']
248
- # row2 = ['c', 'd']
249
- # row3 = ['e', 'f']
250
- # src = [row1, row2, row3]
251
- # buf = ''
252
- # src.each do |row|
253
- # parsed_cells = CSV.generate_row(row, 2, buf)
254
- # puts "Created #{ parsed_cells } cells."
255
- # end
256
- # p buf
257
- #
258
- # ARGS
259
- # src: an Array of String to be converted to CSV string. Must respond to
260
- # 'size' and '[](idx)'. src[idx] must return String.
261
- # cells: num of cells in a line.
262
- # out_dev: buffer for generated CSV string. Must respond to '<<(string)'.
263
- # col_sep: Column separator. ?, by default. If you want to separate
264
- # fields with semicolon, give ?; here.
265
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
266
- # want to separate records with \r, give ?\r here.
267
- #
268
- # RETURNS
269
- # parsed_cells: num of converted cells.
270
- #
271
- def CSV.generate_row(src, cells, out_dev, fs = nil, rs = nil)
272
- fs ||= ','
273
- if fs.is_a?(Fixnum)
274
- fs = fs.chr
275
- end
276
- if !rs.nil? and rs.is_a?(Fixnum)
277
- rs = rs.chr
278
- end
279
- src_size = src.size
280
- if (src_size == 0)
281
- if cells == 0
282
- generate_separator(:DT_ROWSEP, out_dev, fs, rs)
283
- end
284
- return 0
285
- end
286
- res_type = :DT_COLSEP
287
- parsed_cells = 0
288
- generate_body(src[parsed_cells], out_dev, fs, rs)
289
- parsed_cells += 1
290
- while ((parsed_cells < cells) and (parsed_cells != src_size))
291
- generate_separator(:DT_COLSEP, out_dev, fs, rs)
292
- generate_body(src[parsed_cells], out_dev, fs, rs)
293
- parsed_cells += 1
294
- end
295
- if (parsed_cells == cells)
296
- generate_separator(:DT_ROWSEP, out_dev, fs, rs)
297
- else
298
- generate_separator(:DT_COLSEP, out_dev, fs, rs)
459
+ end
460
+ end
461
+ alias_method :values_at, :fields
462
+
463
+ #
464
+ # :call-seq:
465
+ # index( header )
466
+ # index( header, offset )
467
+ #
468
+ # This method will return the index of a field with the provided +header+.
469
+ # The +offset+ can be used to locate duplicate header names, as described in
470
+ # CSV::Row.field().
471
+ #
472
+ def index(header, minimum_index = 0)
473
+ # find the pair
474
+ index = headers[minimum_index..-1].index(header)
475
+ # return the index at the right offset, if we found one
476
+ index.nil? ? nil : index + minimum_index
477
+ end
478
+
479
+ # Returns +true+ if +name+ is a header for this row, and +false+ otherwise.
480
+ def header?(name)
481
+ headers.include? name
482
+ end
483
+ alias_method :include?, :header?
484
+
485
+ #
486
+ # Returns +true+ if +data+ matches a field in this row, and +false+
487
+ # otherwise.
488
+ #
489
+ def field?(data)
490
+ fields.include? data
491
+ end
492
+
493
+ include Enumerable
494
+
495
+ #
496
+ # Yields each pair of the row as header and field tuples (much like
497
+ # iterating over a Hash).
498
+ #
499
+ # Support for Enumerable.
500
+ #
501
+ # This method returns the row for chaining.
502
+ #
503
+ def each(&block)
504
+ @row.each(&block)
505
+
506
+ self # for chaining
507
+ end
508
+
509
+ #
510
+ # Returns +true+ if this row contains the same headers and fields in the
511
+ # same order as +other+.
512
+ #
513
+ def ==(other)
514
+ return @row == other.row if other.is_a? CSV::Row
515
+ @row == other
516
+ end
517
+
518
+ #
519
+ # Collapses the row into a simple Hash. Be warned that this discards field
520
+ # order and clobbers duplicate fields.
521
+ #
522
+ def to_hash
523
+ # flatten just one level of the internal Array
524
+ Hash[*@row.inject(Array.new) { |ary, pair| ary.push(*pair) }]
525
+ end
526
+
527
+ #
528
+ # Returns the row as a CSV String. Headers are not used. Equivalent to:
529
+ #
530
+ # csv_row.fields.to_csv( options )
531
+ #
532
+ def to_csv(options = Hash.new)
533
+ fields.to_csv(options)
534
+ end
535
+ alias_method :to_s, :to_csv
536
+
537
+ # A summary of fields, by header, in an ASCII compatible String.
538
+ def inspect
539
+ str = ["#<", self.class.to_s]
540
+ each do |header, field|
541
+ str << " " << (header.is_a?(Symbol) ? header.to_s : header.inspect) <<
542
+ ":" << field.inspect
543
+ end
544
+ str << ">"
545
+ begin
546
+ str.join('')
547
+ rescue # any encoding error
548
+ str.map do |s|
549
+ e = Encoding::Converter.asciicompat_encoding(s.encoding)
550
+ e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
551
+ end.join('')
552
+ end
299
553
  end
300
- parsed_cells
301
554
  end
302
-
303
- # Private class methods.
304
- class << self
305
- private
306
555
 
307
- def open_reader(path, mode, fs, rs, &block)
308
- file = File.open(path, mode)
309
- if block
310
- begin
311
- CSV::Reader.parse(file, fs, rs) do |row|
312
- yield(row)
313
- end
314
- ensure
315
- file.close
316
- end
317
- nil
556
+ #
557
+ # A CSV::Table is a two-dimensional data structure for representing CSV
558
+ # documents. Tables allow you to work with the data by row or column,
559
+ # manipulate the data, and even convert the results back to CSV, if needed.
560
+ #
561
+ # All tables returned by CSV will be constructed from this class, if header
562
+ # row processing is activated.
563
+ #
564
+ class Table
565
+ #
566
+ # Construct a new CSV::Table from +array_of_rows+, which are expected
567
+ # to be CSV::Row objects. All rows are assumed to have the same headers.
568
+ #
569
+ # A CSV::Table object supports the following Array methods through
570
+ # delegation:
571
+ #
572
+ # * empty?()
573
+ # * length()
574
+ # * size()
575
+ #
576
+ def initialize(array_of_rows)
577
+ @table = array_of_rows
578
+ @mode = :col_or_row
579
+ end
580
+
581
+ # The current access mode for indexing and iteration.
582
+ attr_reader :mode
583
+
584
+ # Internal data format used to compare equality.
585
+ attr_reader :table
586
+ protected :table
587
+
588
+ ### Array Delegation ###
589
+
590
+ extend Forwardable
591
+ def_delegators :@table, :empty?, :length, :size
592
+
593
+ #
594
+ # Returns a duplicate table object, in column mode. This is handy for
595
+ # chaining in a single call without changing the table mode, but be aware
596
+ # that this method can consume a fair amount of memory for bigger data sets.
597
+ #
598
+ # This method returns the duplicate table for chaining. Don't chain
599
+ # destructive methods (like []=()) this way though, since you are working
600
+ # with a duplicate.
601
+ #
602
+ def by_col
603
+ self.class.new(@table.dup).by_col!
604
+ end
605
+
606
+ #
607
+ # Switches the mode of this table to column mode. All calls to indexing and
608
+ # iteration methods will work with columns until the mode is changed again.
609
+ #
610
+ # This method returns the table and is safe to chain.
611
+ #
612
+ def by_col!
613
+ @mode = :col
614
+
615
+ self
616
+ end
617
+
618
+ #
619
+ # Returns a duplicate table object, in mixed mode. This is handy for
620
+ # chaining in a single call without changing the table mode, but be aware
621
+ # that this method can consume a fair amount of memory for bigger data sets.
622
+ #
623
+ # This method returns the duplicate table for chaining. Don't chain
624
+ # destructive methods (like []=()) this way though, since you are working
625
+ # with a duplicate.
626
+ #
627
+ def by_col_or_row
628
+ self.class.new(@table.dup).by_col_or_row!
629
+ end
630
+
631
+ #
632
+ # Switches the mode of this table to mixed mode. All calls to indexing and
633
+ # iteration methods will use the default intelligent indexing system until
634
+ # the mode is changed again. In mixed mode an index is assumed to be a row
635
+ # reference while anything else is assumed to be column access by headers.
636
+ #
637
+ # This method returns the table and is safe to chain.
638
+ #
639
+ def by_col_or_row!
640
+ @mode = :col_or_row
641
+
642
+ self
643
+ end
644
+
645
+ #
646
+ # Returns a duplicate table object, in row mode. This is handy for chaining
647
+ # in a single call without changing the table mode, but be aware that this
648
+ # method can consume a fair amount of memory for bigger data sets.
649
+ #
650
+ # This method returns the duplicate table for chaining. Don't chain
651
+ # destructive methods (like []=()) this way though, since you are working
652
+ # with a duplicate.
653
+ #
654
+ def by_row
655
+ self.class.new(@table.dup).by_row!
656
+ end
657
+
658
+ #
659
+ # Switches the mode of this table to row mode. All calls to indexing and
660
+ # iteration methods will work with rows until the mode is changed again.
661
+ #
662
+ # This method returns the table and is safe to chain.
663
+ #
664
+ def by_row!
665
+ @mode = :row
666
+
667
+ self
668
+ end
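The mode switches documented above can be tried with a small sketch. It assumes Ruby's standard `csv` library, which exposes the same `CSV::Table` API that this diff brings in; the sample data is made up for illustration.

```ruby
require "csv"

# Hypothetical sample data; any CSV with a header row would do.
table = CSV.parse("name,age\nAda,36\nBob,41\n", headers: true)

first_column = table.by_col[0]   # works on a duplicate in column mode
mode_after   = table.mode        # the original table keeps its mode

table.by_row!                    # switches this table in place
first_row = table[0]
table.by_col_or_row!             # back to the default mixed mode
```

The non-bang methods suit one-off lookups; the bang methods suit a longer run of calls in a single mode.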
669
+
670
+ #
671
+ # Returns the headers for the first row of this table (assumed to match all
672
+ # other rows). An empty Array is returned for empty tables.
673
+ #
674
+ def headers
675
+ if @table.empty?
676
+ Array.new
318
677
  else
319
- reader = CSV::Reader.create(file, fs, rs)
320
- reader.close_on_terminate
321
- reader
678
+ @table.first.headers
322
679
  end
323
680
  end
324
681
 
325
- def open_writer(path, mode, fs, rs, &block)
326
- file = File.open(path, mode)
327
- if block
328
- begin
329
- CSV::Writer.generate(file, fs, rs) do |writer|
330
- yield(writer)
331
- end
332
- ensure
333
- file.close
334
- end
335
- nil
336
- else
337
- writer = CSV::Writer.create(file, fs, rs)
338
- writer.close_on_terminate
339
- writer
340
- end
341
- end
342
-
343
- def parse_body(src, idx, fs, rs)
344
- fs_str = fs
345
- fs_size = fs_str.size
346
- rs_str = rs || "\n"
347
- rs_size = rs_str.size
348
- fs_idx = rs_idx = 0
349
- cell = Cell.new
350
- state = :ST_START
351
- quoted = cr = false
352
- c = nil
353
- last_idx = idx
354
- while c = src[idx]
355
- unless quoted
356
- fschar = (c == fs_str[fs_idx])
357
- rschar = (c == rs_str[rs_idx])
358
- # simple 1 char backtrack
359
- if !fschar and c == fs_str[0]
360
- fs_idx = 0
361
- fschar = true
362
- if state == :ST_START
363
- state = :ST_DATA
364
- elsif state == :ST_QUOTE
365
- raise IllegalFormatError
366
- end
367
- end
368
- if !rschar and c == rs_str[0]
369
- rs_idx = 0
370
- rschar = true
371
- if state == :ST_START
372
- state = :ST_DATA
373
- elsif state == :ST_QUOTE
374
- raise IllegalFormatError
375
- end
376
- end
682
+ #
683
+ # In the default mixed mode, this method returns rows for index access and
684
+ # columns for header access. You can force the index association by first
685
+ # calling by_col!() or by_row!().
686
+ #
687
+ # Columns are returned as an Array of values. Altering that Array has no
688
+ # effect on the table.
689
+ #
690
+ def [](index_or_header)
691
+ if @mode == :row or # by index
692
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
693
+ @table[index_or_header]
694
+ else # by header
695
+ @table.map { |row| row[index_or_header] }
696
+ end
697
+ end
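A minimal sketch of mixed-mode indexing as described for `[]`, assuming the standard `csv` library and invented sample data:

```ruby
require "csv"

table = CSV.parse("name,age\nAda,36\nBob,41\n", headers: true)

row    = table[1]        # Integer index: a CSV::Row
column = table["name"]   # header: an Array of that column's values

column << "Eve"          # changes only the returned Array
```

Because the column comes back as a fresh Array, appending to it leaves the table as it was.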
698
+
699
+ #
700
+ # In the default mixed mode, this method assigns rows for index access and
701
+ # columns for header access. You can force the index association by first
702
+ # calling by_col!() or by_row!().
703
+ #
704
+ # Rows may be set to an Array of values (which will inherit the table's
705
+ # headers()) or a CSV::Row.
706
+ #
707
+ # Columns may be set to a single value, which is copied to each row of the
708
+ # column, or an Array of values. Arrays of values are assigned to rows top
709
+ # to bottom in row major order. Excess values are ignored and if the Array
710
+ # does not have a value for each row the extra rows will receive a +nil+.
711
+ #
712
+ # Assigning to an existing column or row clobbers the data. Assigning to
713
+ # new columns creates them at the right end of the table.
714
+ #
715
+ def []=(index_or_header, value)
716
+ if @mode == :row or # by index
717
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
718
+ if value.is_a? Array
719
+ @table[index_or_header] = Row.new(headers, value)
720
+ else
721
+ @table[index_or_header] = value
377
722
  end
378
- if c == ?"
379
- fs_idx = rs_idx = 0
380
- if cr
381
- raise IllegalFormatError
382
- end
383
- cell << src[last_idx, (idx - last_idx)]
384
- last_idx = idx
385
- if state == :ST_DATA
386
- if quoted
387
- last_idx += 1
388
- quoted = false
389
- state = :ST_QUOTE
723
+ else # set column
724
+ if value.is_a? Array # multiple values
725
+ @table.each_with_index do |row, i|
726
+ if row.header_row?
727
+ row[index_or_header] = index_or_header
390
728
  else
391
- raise IllegalFormatError
392
- end
393
- elsif state == :ST_QUOTE
394
- cell << c.chr
395
- last_idx += 1
396
- quoted = true
397
- state = :ST_DATA
398
- else # :ST_START
399
- quoted = true
400
- last_idx += 1
401
- state = :ST_DATA
402
- end
403
- elsif fschar or rschar
404
- if fschar
405
- fs_idx += 1
406
- end
407
- if rschar
408
- rs_idx += 1
409
- end
410
- sep = nil
411
- if fs_idx == fs_size
412
- if state == :ST_START and rs_idx > 0 and fs_idx < rs_idx
413
- state = :ST_DATA
414
- end
415
- cell << src[last_idx, (idx - last_idx - (fs_size - 1))]
416
- last_idx = idx
417
- fs_idx = rs_idx = 0
418
- if cr
419
- raise IllegalFormatError
420
- end
421
- sep = :DT_COLSEP
422
- elsif rs_idx == rs_size
423
- if state == :ST_START and fs_idx > 0 and rs_idx < fs_idx
424
- state = :ST_DATA
425
- end
426
- if !(rs.nil? and cr)
427
- cell << src[last_idx, (idx - last_idx - (rs_size - 1))]
428
- last_idx = idx
729
+ row[index_or_header] = value[i]
429
730
  end
430
- fs_idx = rs_idx = 0
431
- sep = :DT_ROWSEP
432
- end
433
- if sep
434
- if state == :ST_DATA
435
- return sep, idx + 1, cell;
436
- elsif state == :ST_QUOTE
437
- return sep, idx + 1, cell;
438
- else # :ST_START
439
- return sep, idx + 1, nil
440
- end
441
- end
442
- elsif rs.nil? and c == ?\r
443
- # special \r treatment for backward compatibility
444
- fs_idx = rs_idx = 0
445
- if cr
446
- raise IllegalFormatError
447
- end
448
- cell << src[last_idx, (idx - last_idx)]
449
- last_idx = idx
450
- if quoted
451
- state = :ST_DATA
452
- else
453
- cr = true
454
731
  end
455
- else
456
- fs_idx = rs_idx = 0
457
- if state == :ST_DATA or state == :ST_START
458
- if cr
459
- raise IllegalFormatError
732
+ else # repeated value
733
+ @table.each do |row|
734
+ if row.header_row?
735
+ row[index_or_header] = index_or_header
736
+ else
737
+ row[index_or_header] = value
460
738
  end
461
- state = :ST_DATA
462
- else # :ST_QUOTE
463
- raise IllegalFormatError
464
739
  end
465
740
  end
466
- idx += 1
467
741
  end
468
- if state == :ST_START
469
- if fs_idx > 0 or rs_idx > 0
470
- state = :ST_DATA
471
- else
472
- return :DT_EOS, idx, nil
473
- end
474
- elsif quoted
475
- raise IllegalFormatError
476
- elsif cr
477
- raise IllegalFormatError
478
- end
479
- cell << src[last_idx, (idx - last_idx)]
480
- last_idx = idx
481
- return :DT_EOS, idx, cell
482
- end
483
-
484
- def generate_body(cell, out_dev, fs, rs)
485
- if cell.nil?
486
- # empty
487
- else
488
- cell = cell.to_s
489
- row_data = cell.dup
490
- if (row_data.gsub!('"', '""') or
491
- row_data.index(fs) or
492
- (rs and row_data.index(rs)) or
493
- (/[\r\n]/ =~ row_data) or
494
- (cell.empty?))
495
- out_dev << '"' << row_data << '"'
496
- else
497
- out_dev << row_data
498
- end
742
+ end
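The assignment rules for `[]=` can be illustrated as follows. This is a sketch against the standard `csv` library; the column names `city` and `seen` are hypothetical.

```ruby
require "csv"

table = CSV.parse("name,age\nAda,36\nBob,41\n", headers: true)

table["city"] = ["London"]   # short Array: remaining rows receive nil
table["seen"] = "yes"        # single value: copied to every row

# An Array assigned to a row index inherits the table's headers.
table[0] = ["Eve", "29", "Oslo", "no"]
```

New columns land at the right end of the table, so `headers` grows in assignment order.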
743
+
744
+ #
745
+ # The mixed mode default is to treat a list of indices as row access,
746
+ # returning the rows indicated. Anything else is considered columnar
747
+ # access. For columnar access, the return set has an Array for each row
748
+ # with the values indicated by the headers in each Array. You can force
749
+ # column or row mode using by_col!() or by_row!().
750
+ #
751
+ # You cannot mix column and row access.
752
+ #
753
+ def values_at(*indices_or_headers)
754
+ if @mode == :row or # by indices
755
+ ( @mode == :col_or_row and indices_or_headers.all? do |index|
756
+ index.is_a?(Integer) or
757
+ ( index.is_a?(Range) and
758
+ index.first.is_a?(Integer) and
759
+ index.last.is_a?(Integer) )
760
+ end )
761
+ @table.values_at(*indices_or_headers)
762
+ else # by headers
763
+ @table.map { |row| row.values_at(*indices_or_headers) }
499
764
  end
500
765
  end
501
-
502
- def generate_separator(type, out_dev, fs, rs)
503
- case type
504
- when :DT_COLSEP
505
- out_dev << fs
506
- when :DT_ROWSEP
507
- out_dev << (rs || "\n")
766
+
767
+ #
768
+ # Adds a new row to the bottom end of this table. You can provide an Array,
769
+ # which will be converted to a CSV::Row (inheriting the table's headers()),
770
+ # or a CSV::Row.
771
+ #
772
+ # This method returns the table for chaining.
773
+ #
774
+ def <<(row_or_array)
775
+ if row_or_array.is_a? Array # append Array
776
+ @table << Row.new(headers, row_or_array)
777
+ else # append Row
778
+ @table << row_or_array
508
779
  end
780
+
781
+ self # for chaining
509
782
  end
510
- end
511
783
 
784
+ #
785
+ # A shortcut for appending multiple rows. Equivalent to:
786
+ #
787
+ # rows.each { |row| self << row }
788
+ #
789
+ # This method returns the table for chaining.
790
+ #
791
+ def push(*rows)
792
+ rows.each { |row| self << row }
512
793
 
513
- # CSV formatted string/stream reader.
514
- #
515
- # EXAMPLE
516
- # read CSV lines untill the first column is 'stop'.
517
- #
518
- # CSV::Reader.parse(File.open('bigdata', 'rb')) do |row|
519
- # p row
520
- # break if !row[0].is_null && row[0].data == 'stop'
521
- # end
522
- #
523
- class Reader
524
- include Enumerable
794
+ self # for chaining
795
+ end
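Appending with `<<` and `push` might look like the sketch below (standard `csv` library assumed, sample rows invented):

```ruby
require "csv"

table = CSV.parse("name,age\nAda,36\n", headers: true)

table << ["Bob", "41"]   # Array is wrapped in a CSV::Row with the table's headers
table.push(["Eve", "29"], CSV::Row.new(%w[name age], %w[Kim 52]))

chained = (table << ["Lee", "33"])   # both methods return the table
```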
796
+
797
+ #
798
+ # Removes and returns the indicated column or row. In the default mixed
799
+ # mode indices refer to rows and everything else is assumed to be a column
800
+ # header. Use by_col!() or by_row!() to force the lookup.
801
+ #
802
+ def delete(index_or_header)
803
+ if @mode == :row or # by index
804
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
805
+ @table.delete_at(index_or_header)
806
+ else # by header
807
+ @table.map { |row| row.delete(index_or_header).last }
808
+ end
809
+ end
525
810
 
526
- # Parse CSV data and get lines. Given block is called for each parsed row.
527
- # Block value is always nil. Rows are not cached for performance reason.
528
- def Reader.parse(str_or_readable, fs = ',', rs = nil, &block)
529
- reader = Reader.create(str_or_readable, fs, rs)
530
- if block
531
- reader.each do |row|
532
- yield(row)
811
+ #
812
+ # Removes any column or row for which the block returns +true+. In the
813
+ # default mixed mode or row mode, iteration is the standard row major
814
+ # walking of rows. In column mode, iteration will +yield+ two element
815
+ # tuples containing the column name and an Array of values for that column.
816
+ #
817
+ # This method returns the table for chaining.
818
+ #
819
+ def delete_if(&block)
820
+ if @mode == :row or @mode == :col_or_row # by index
821
+ @table.delete_if(&block)
822
+ else # by header
823
+ to_delete = Array.new
824
+ headers.each_with_index do |header, i|
825
+ to_delete << header if block[[header, self[header]]]
533
826
  end
534
- reader.close
535
- nil
536
- else
537
- reader
827
+ to_delete.map { |header| delete(header) }
538
828
  end
829
+
830
+ self # for chaining
539
831
  end
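The two removal methods above can be exercised like this. The sketch assumes the standard `csv` library in the default mixed mode; the data is illustrative only.

```ruby
require "csv"

table = CSV.parse("name,age\nAda,36\nBob,41\nEve,29\n", headers: true)

removed_row = table.delete(1)       # Integer: removes and returns a row
removed_col = table.delete("age")   # header: removes the column, returns its values

# In mixed or row mode the block receives each row.
table.delete_if { |row| row["name"] == "Eve" }
```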
540
832
 
541
- # Returns reader instance.
542
- def Reader.create(str_or_readable, fs = ',', rs = nil)
543
- case str_or_readable
544
- when IO
545
- IOReader.new(str_or_readable, fs, rs)
546
- when String
547
- StringReader.new(str_or_readable, fs, rs)
833
+ include Enumerable
834
+
835
+ #
836
+ # In the default mixed mode or row mode, iteration is the standard row major
837
+ # walking of rows. In column mode, iteration will +yield+ two element
838
+ # tuples containing the column name and an Array of values for that column.
839
+ #
840
+ # This method returns the table for chaining.
841
+ #
842
+ def each(&block)
843
+ if @mode == :col
844
+ headers.each { |header| block[[header, self[header]]] }
548
845
  else
549
- IOReader.new(str_or_readable, fs, rs)
846
+ @table.each(&block)
550
847
  end
848
+
849
+ self # for chaining
850
+ end
851
+
852
+ # Returns +true+ if all rows of this table ==() +other+'s rows.
853
+ def ==(other)
854
+ @table == other.table
551
855
  end
552
856
 
553
- def each
554
- while true
555
- row = []
556
- parsed_cells = get_row(row)
557
- if parsed_cells == 0
558
- break
857
+ #
858
+ # Returns the table as an Array of Arrays. Headers will be the first row,
859
+ # then all of the field rows will follow.
860
+ #
861
+ def to_a
862
+ @table.inject([headers]) do |array, row|
863
+ if row.header_row?
864
+ array
865
+ else
866
+ array + [row.fields]
559
867
  end
560
- yield(row)
561
868
  end
562
- nil
563
869
  end
564
870
 
565
- def shift
566
- row = []
567
- parsed_cells = get_row(row)
568
- row
871
+ #
872
+ # Returns the table as a complete CSV String. Headers will be listed first,
873
+ # then all of the field rows.
874
+ #
875
+ # This method assumes you want the Table.headers(), unless you explicitly
876
+ # pass <tt>:write_headers => false</tt>.
877
+ #
878
+ def to_csv(options = Hash.new)
879
+ wh = options.fetch(:write_headers, true)
880
+ @table.inject(wh ? [headers.to_csv(options)] : [ ]) do |rows, row|
881
+ if row.header_row?
882
+ rows
883
+ else
884
+ rows + [row.fields.to_csv(options)]
885
+ end
886
+ end.join('')
569
887
  end
888
+ alias_method :to_s, :to_csv
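The conversions above, `to_a` and `to_csv`, can be sketched as follows (standard `csv` library assumed, sample data invented):

```ruby
require "csv"

table = CSV.parse("name,age\nAda,36\nBob,41\n", headers: true)

as_arrays    = table.to_a                          # headers first, then field rows
with_headers = table.to_csv                        # headers written by default
body_only    = table.to_csv(write_headers: false)  # field rows only
```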
570
889
 
571
- def close
572
- terminate
890
+ # Shows the mode and size of this table in a US-ASCII String.
891
+ def inspect
892
+ "#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>".encode("US-ASCII")
573
893
  end
894
+ end
574
895
 
575
- private
896
+ # The error thrown when the parser encounters illegal CSV formatting.
897
+ class MalformedCSVError < RuntimeError; end
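Callers would typically rescue this error around a parse. A minimal sketch, assuming the standard `csv` library and an intentionally broken input:

```ruby
require "csv"

error = nil
begin
  CSV.parse(%Q{a,"b\n})   # the quoted field is never closed
rescue CSV::MalformedCSVError => e
  error = e
end
```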
576
898
 
577
- def initialize(dev)
578
- raise RuntimeError.new('Do not instanciate this class directly.')
579
- end
899
+ #
900
+ # A FieldInfo Struct contains details about a field's position in the data
901
+ # source it was read from. CSV will pass this Struct to some blocks that make
902
+ # decisions based on field structure. See CSV.convert_fields() for an
903
+ # example.
904
+ #
905
+ # <b><tt>index</tt></b>:: The zero-based index of the field in its row.
906
+ # <b><tt>line</tt></b>:: The line of the data source this row is from.
907
+ # <b><tt>header</tt></b>:: The header for the column, when available.
908
+ #
909
+ FieldInfo = Struct.new(:index, :line, :header)
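One place a `FieldInfo` shows up is in a converter that takes two arguments. The sketch below assumes the standard `csv` library; the tagging lambda is purely illustrative.

```ruby
require "csv"

# A converter with two parameters receives the field and its FieldInfo.
tag_with_index = lambda { |field, info| "#{info.index}:#{field}" }

tagged = CSV.parse_line("a,b", converters: [tag_with_index])
```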
580
910
 
581
- def get_row(row)
582
- raise NotImplementedError.new('Method get_row must be defined in a derived class.')
583
- end
911
+ # A Regexp used to find and convert some common Date formats.
912
+ DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
913
+ \d{4}-\d{2}-\d{2} )\z /x
914
+ # A Regexp used to find and convert some common DateTime formats.
915
+ DateTimeMatcher =
916
+ / \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
917
+ \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} )\z /x
584
918
 
585
- def terminate
586
- # Define if needed.
919
+ # The encoding used by all converters.
920
+ ConverterEncoding = Encoding.find("UTF-8")
921
+
922
+ #
923
+ # This Hash holds the built-in converters of CSV that can be accessed by name.
924
+ # You can select Converters with CSV.convert() or through the +options+ Hash
925
+ # passed to CSV::new().
926
+ #
927
+ # <b><tt>:integer</tt></b>:: Converts any field Integer() accepts.
928
+ # <b><tt>:float</tt></b>:: Converts any field Float() accepts.
929
+ # <b><tt>:numeric</tt></b>:: A combination of <tt>:integer</tt>
930
+ # and <tt>:float</tt>.
931
+ # <b><tt>:date</tt></b>:: Converts any field Date::parse() accepts.
932
+ # <b><tt>:date_time</tt></b>:: Converts any field DateTime::parse() accepts.
933
+ # <b><tt>:all</tt></b>:: All built-in converters. A combination of
934
+ # <tt>:date_time</tt> and <tt>:numeric</tt>.
935
+ #
936
+ # All built-in converters transcode field data to UTF-8 before attempting a
937
+ # conversion. If your data cannot be transcoded to UTF-8 the conversion will
938
+ # fail and the field will remain unchanged.
939
+ #
940
+ # This Hash is intentionally left unfrozen and users should feel free to add
941
+ # values to it that can be accessed by all CSV objects.
942
+ #
943
+ # To add a combo field, the value should be an Array of names. Combo fields
944
+ # can be nested with other combo fields.
945
+ #
946
+ Converters = { integer: lambda { |f|
947
+ Integer(f.encode(ConverterEncoding)) rescue f
948
+ },
949
+ float: lambda { |f|
950
+ Float(f.encode(ConverterEncoding)) rescue f
951
+ },
952
+ numeric: [:integer, :float],
953
+ date: lambda { |f|
954
+ begin
955
+ e = f.encode(ConverterEncoding)
956
+ e =~ DateMatcher ? Date.parse(e) : f
957
+ rescue # encoding conversion or date parse errors
958
+ f
959
+ end
960
+ },
961
+ date_time: lambda { |f|
962
+ begin
963
+ e = f.encode(ConverterEncoding)
964
+ e =~ DateTimeMatcher ? DateTime.parse(e) : f
965
+ rescue # encoding conversion or date parse errors
966
+ f
967
+ end
968
+ },
969
+ all: [:date_time, :numeric] }
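Selecting built-in converters by name might look like this. It is a sketch against the standard `csv` library; fields that a converter cannot handle are left as Strings.

```ruby
require "csv"

# :numeric is the combo of :integer and :float.
numeric = CSV.parse_line("1,2.5,abc", converters: :numeric)

# :date only touches fields that look like a date.
dated = CSV.parse_line("2013-10-01,x", converters: :date)
```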
970
+
971
+ #
972
+ # This Hash holds the built-in header converters of CSV that can be accessed
973
+ # by name. You can select HeaderConverters with CSV.header_convert() or
974
+ # through the +options+ Hash passed to CSV::new().
975
+ #
976
+ # <b><tt>:downcase</tt></b>:: Calls downcase() on the header String.
977
+ # <b><tt>:symbol</tt></b>:: The header String is downcased, spaces are
978
+ # replaced with underscores, non-word characters
979
+ # are dropped, and finally to_sym() is called.
980
+ #
981
+ # All built-in header converters transcode header data to UTF-8 before
982
+ # attempting a conversion. If your data cannot be transcoded to UTF-8 the
983
+ # conversion will fail and the header will remain unchanged.
984
+ #
985
+ # This Hash is intentionally left unfrozen and users should feel free to add
986
+ # values to it that can be accessed by all CSV objects.
987
+ #
988
+ # To add a combo field, the value should be an Array of names. Combo fields
989
+ # can be nested with other combo fields.
990
+ #
991
+ HeaderConverters = {
992
+ downcase: lambda { |h| h.encode(ConverterEncoding).downcase },
993
+ symbol: lambda { |h|
994
+ h.encode(ConverterEncoding).downcase.gsub(/\s+/, "_").
995
+ gsub(/\W+/, "").to_sym
996
+ }
997
+ }
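A small sketch of the `:symbol` header converter in use, assuming the standard `csv` library and a made-up header row:

```ruby
require "csv"

table = CSV.parse("First Name,Age\nAda,36\n",
                  headers: true, header_converters: :symbol)

symbol_headers = table.headers
first_name     = table[0][:first_name]   # rows are keyed by the converted headers
```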
998
+
999
+ #
1000
+ # The options used when no overrides are given by calling code. They are:
1001
+ #
1002
+ # <b><tt>:col_sep</tt></b>:: <tt>","</tt>
1003
+ # <b><tt>:row_sep</tt></b>:: <tt>:auto</tt>
1004
+ # <b><tt>:quote_char</tt></b>:: <tt>'"'</tt>
1005
+ # <b><tt>:field_size_limit</tt></b>:: +nil+
1006
+ # <b><tt>:converters</tt></b>:: +nil+
1007
+ # <b><tt>:unconverted_fields</tt></b>:: +nil+
1008
+ # <b><tt>:headers</tt></b>:: +false+
1009
+ # <b><tt>:return_headers</tt></b>:: +false+
1010
+ # <b><tt>:header_converters</tt></b>:: +nil+
1011
+ # <b><tt>:skip_blanks</tt></b>:: +false+
1012
+ # <b><tt>:force_quotes</tt></b>:: +false+
1013
+ # <b><tt>:skip_lines</tt></b>:: +nil+
1014
+ #
1015
+ DEFAULT_OPTIONS = { col_sep: ",",
1016
+ row_sep: :auto,
1017
+ quote_char: '"',
1018
+ field_size_limit: nil,
1019
+ converters: nil,
1020
+ unconverted_fields: nil,
1021
+ headers: false,
1022
+ return_headers: false,
1023
+ header_converters: nil,
1024
+ skip_blanks: false,
1025
+ force_quotes: false,
1026
+ skip_lines: nil }.freeze
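Since `DEFAULT_OPTIONS` is frozen, callers override individual keys per call rather than editing the Hash. A sketch, assuming the standard `csv` library and an invented semicolon-separated input:

```ruby
require "csv"

# Override two defaults for this call only.
rows = CSV.parse("a;'b;c'\n", col_sep: ";", quote_char: "'")

default_col_sep = CSV::DEFAULT_OPTIONS[:col_sep]   # the shared default is unchanged
```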
1027
+
1028
+ #
1029
+ # This method will return a CSV instance, just like CSV::new(), but the
1030
+ # instance will be cached and returned for all future calls to this method for
1031
+ # the same +data+ object (tested by Object#object_id()) with the same
1032
+ # +options+.
1033
+ #
1034
+ # If a block is given, the instance is passed to the block and the return
1035
+ # value becomes the return value of the block.
1036
+ #
1037
+ def self.instance(data = $stdout, options = Hash.new)
1038
+ # create a _signature_ for this method call, data object and options
1039
+ sig = [data.object_id] +
1040
+ options.values_at(*DEFAULT_OPTIONS.keys.sort_by { |sym| sym.to_s })
1041
+
1042
+ # fetch or create the instance for this signature
1043
+ @@instances ||= Hash.new
1044
+ instance = (@@instances[sig] ||= new(data, options))
1045
+
1046
+ if block_given?
1047
+ yield instance # run block, if given, returning result
1048
+ else
1049
+ instance # or return the instance
587
1050
  end
588
1051
  end
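The caching behavior of `CSV.instance` can be observed with a sketch like the following. It assumes the standard `csv` library and uses a `StringIO` as a stand-in for `$stdout`.

```ruby
require "csv"
require "stringio"

io = StringIO.new

first  = CSV.instance(io)
second = CSV.instance(io)                  # same data object, same options
other  = CSV.instance(io, col_sep: ";")    # different options: separate instance

# With a block, the block's value is returned.
result = CSV.instance(io) { |csv| csv << [1, 2]; :done }
```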
589
-
590
1052
 
591
- class StringReader < Reader
592
- def initialize(string, fs = ',', rs = nil)
593
- @fs = fs
594
- @rs = rs
595
- @dev = string
596
- @idx = 0
597
- if @dev[0, 3] == "\xef\xbb\xbf"
598
- @idx += 3
1053
+ #
1054
+ # :call-seq:
1055
+ # filter( options = Hash.new ) { |row| ... }
1056
+ # filter( input, options = Hash.new ) { |row| ... }
1057
+ # filter( input, output, options = Hash.new ) { |row| ... }
1058
+ #
1059
+ # This method is a convenience for building Unix-like filters for CSV data.
1060
+ # Each row is yielded to the provided block which can alter it as needed.
1061
+ # After the block returns, the row is appended to +output+ altered or not.
1062
+ #
1063
+ # The +input+ and +output+ arguments can be anything CSV::new() accepts
1064
+ # (generally String or IO objects). If not given, they default to
1065
+ # <tt>ARGF</tt> and <tt>$stdout</tt>.
1066
+ #
1067
+ # The +options+ parameter is also filtered down to CSV::new() after some
1068
+ # clever key parsing. Any key beginning with <tt>:in_</tt> or
1069
+ # <tt>:input_</tt> will have that leading identifier stripped and will only
1070
+ # be used in the +options+ Hash for the +input+ object. Keys starting with
1071
+ # <tt>:out_</tt> or <tt>:output_</tt> affect only +output+. All other keys
1072
+ # are assigned to both objects.
1073
+ #
1074
+ # The <tt>:output_row_sep</tt> +option+ defaults to
1075
+ # <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
1076
+ #
1077
+ def self.filter(*args)
1078
+ # parse options for input, output, or both
1079
+ in_options, out_options = Hash.new, {row_sep: $INPUT_RECORD_SEPARATOR}
1080
+ if args.last.is_a? Hash
1081
+ args.pop.each do |key, value|
1082
+ case key.to_s
1083
+ when /\Ain(?:put)?_(.+)\Z/
1084
+ in_options[$1.to_sym] = value
1085
+ when /\Aout(?:put)?_(.+)\Z/
1086
+ out_options[$1.to_sym] = value
1087
+ else
1088
+ in_options[key] = value
1089
+ out_options[key] = value
1090
+ end
599
1091
  end
600
1092
  end
1093
+ # build input and output wrappers
1094
+ input = new(args.shift || ARGF, in_options)
1095
+ output = new(args.shift || $stdout, out_options)
601
1096
 
602
- private
603
-
604
- def get_row(row)
605
- parsed_cells, next_idx = CSV.parse_row(@dev, @idx, row, @fs, @rs)
606
- if parsed_cells == 0 and next_idx == 0 and @idx != @dev.size
607
- raise IllegalFormatError.new
608
- end
609
- @idx = next_idx
610
- parsed_cells
1097
+ # read, yield, write
1098
+ input.each do |row|
1099
+ yield row
1100
+ output << row
611
1101
  end
612
1102
  end
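The `:in_` and `:out_` key prefixes described for `filter` can be sketched with String input and output. This assumes the standard `csv` library and that `$/` is still the default newline; the data is illustrative.

```ruby
require "csv"

input  = "a;b\n1;2\n"
output = String.new   # mutable String that receives the filtered rows

# in_col_sep applies to the reader only, out_col_sep to the writer only.
CSV.filter(input, output, in_col_sep: ";", out_col_sep: "\t") do |row|
  row.map!(&:upcase)  # rows are altered in place before being written
end
```

With no `input` or `output` arguments the same block would read from `ARGF` and write to `$stdout`, which is what makes it usable in a shell pipeline.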
613
1103
 
614
-
615
- class IOReader < Reader
616
- def initialize(io, fs = ',', rs = nil)
617
- @io = io
618
- @fs = fs
619
- @rs = rs
620
- @dev = CSV::IOBuf.new(@io)
621
- @idx = 0
622
- if @dev[0] == 0xef and @dev[1] == 0xbb and @dev[2] == 0xbf
623
- @idx += 3
624
- end
625
- @close_on_terminate = false
1104
+ #
1105
+ # This method is intended as the primary interface for reading CSV files. You
1106
+ # pass a +path+ and any +options+ you wish to set for the read. Each row of
1107
+ # the file will be passed to the provided +block+ in turn.
1108
+ #
1109
+ # The +options+ parameter can be anything CSV::new() understands. This method
1110
+ # also understands an additional <tt>:encoding</tt> parameter that you can use
1111
+ # to specify the Encoding of the data in the file to be read. You must provide
1112
+ # this unless your data is in Encoding::default_external(). CSV will use this
1113
+ # to determine how to parse the data. You may provide a second Encoding to
1114
+ # have the data transcoded as it is read. For example,
1115
+ # <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
1116
+ # but transcode it to UTF-8 before CSV parses it.
1117
+ #
1118
+ def self.foreach(path, options = Hash.new, &block)
1119
+ open(path, options) do |csv|
1120
+ csv.each(&block)
626
1121
  end
1122
+ end
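Reading a file row by row with `foreach` might look like this. The sketch assumes the standard `csv` and `tempfile` libraries; the scratch file name is hypothetical.

```ruby
require "csv"
require "tempfile"

rows = []
Tempfile.create("sample.csv") do |file|   # hypothetical scratch file
  file.write("name,age\nAda,36\n")
  file.flush

  # Each row is handed to the block as it is read.
  CSV.foreach(file.path, headers: true) { |row| rows << row.to_hash }
end
```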
627
1123
 
628
- # Tell this reader to close the IO when terminated (Triggered by invoking
629
- # CSV::IOReader#close).
630
- def close_on_terminate
631
- @close_on_terminate = true
1124
+ #
1125
+ # :call-seq:
1126
+ # generate( str, options = Hash.new ) { |csv| ... }
1127
+ # generate( options = Hash.new ) { |csv| ... }
1128
+ #
1129
+ # This method wraps a String you provide, or an empty default String, in a
1130
+ # CSV object which is passed to the provided block. You can use the block to
1131
+ # append CSV rows to the String and when the block exits, the final String
1132
+ # will be returned.
1133
+ #
1134
+ # Note that a passed String *is* modified by this method. Call dup() before
1135
+ # passing if you need a new String.
1136
+ #
1137
+ # The +options+ parameter can be anything CSV::new() understands. This method
1138
+ # understands an additional <tt>:encoding</tt> parameter when not passed a
1139
+ # String to set the base Encoding for the output. CSV needs this hint if you
1140
+ # plan to output non-ASCII compatible data.
1141
+ #
1142
+ def self.generate(*args)
1143
+ # add a default empty String, if none was given
1144
+ if args.first.is_a? String
1145
+ io = StringIO.new(args.shift)
1146
+ io.seek(0, IO::SEEK_END)
1147
+ args.unshift(io)
1148
+ else
1149
+ encoding = (args[-1] = args[-1].dup).delete(:encoding) if args.last.is_a?(Hash)
1150
+ str = ""
1151
+ str.encode!(encoding) if encoding
1152
+ args.unshift(str)
632
1153
  end
1154
+ csv = new(*args) # wrap
1155
+ yield csv # yield for appending
1156
+ csv.string # return final String
1157
+ end
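Both call forms of `generate` can be sketched as follows, assuming the standard `csv` library and the default `$/`:

```ruby
require "csv"

# No String given: a new String is built and returned.
fresh = CSV.generate do |csv|
  csv << ["id", "note"]
  csv << [1, "a,b"]      # the embedded comma forces quoting
end

# A passed String is appended to in place.
existing = String.new("id,note\n")
CSV.generate(existing) { |csv| csv << [2, nil] }
```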
633
1158
 
634
- private
1159
+ #
1160
+ # This method is a shortcut for converting a single row (Array) into a CSV
1161
+ # String.
1162
+ #
1163
+ # The +options+ parameter can be anything CSV::new() understands. This method
1164
+ # understands an additional <tt>:encoding</tt> parameter to set the base
1165
+ # Encoding for the output. This method will try to guess your Encoding from
1166
+ # the first non-+nil+ field in +row+, if possible, but you may need to use
1167
+ # this parameter as a backup plan.
1168
+ #
1169
+ # The <tt>:row_sep</tt> +option+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>
1170
+ # (<tt>$/</tt>) when calling this method.
1171
+ #
1172
+ def self.generate_line(row, options = Hash.new)
1173
+ options = {row_sep: $INPUT_RECORD_SEPARATOR}.merge(options)
1174
+ encoding = options.delete(:encoding)
1175
+ str = ""
1176
+ if encoding
1177
+ str.force_encoding(encoding)
1178
+ elsif field = row.find { |f| not f.nil? }
1179
+ str.force_encoding(String(field).encoding)
1180
+ end
1181
+ (new(str, options) << row).string
1182
+ end
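A minimal sketch of `generate_line`, assuming the standard `csv` library and the default `$/`:

```ruby
require "csv"

plain  = CSV.generate_line(["a", nil, "b c"])        # nil becomes an empty field
custom = CSV.generate_line(["x", "y;z"], col_sep: ";")  # separator inside a field is quoted
```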
635
1183
 
636
- def get_row(row)
637
- parsed_cells, next_idx = CSV.parse_row(@dev, @idx, row, @fs, @rs)
638
- if parsed_cells == 0 and next_idx == 0 and !@dev.is_eos?
639
- raise IllegalFormatError.new
640
- end
641
- dropped = @dev.drop(next_idx)
642
- @idx = next_idx - dropped
643
- parsed_cells
1184
+ #
1185
+ # :call-seq:
1186
+ # open( filename, mode = "rb", options = Hash.new ) { |faster_csv| ... }
1187
+ # open( filename, options = Hash.new ) { |faster_csv| ... }
1188
+ # open( filename, mode = "rb", options = Hash.new )
1189
+ # open( filename, options = Hash.new )
1190
+ #
1191
+ # This method opens an IO object, and wraps that with CSV. This is intended
1192
+ # as the primary interface for writing a CSV file.
1193
+ #
1194
+ # You must pass a +filename+ and may optionally add a +mode+ for Ruby's
1195
+ # open(). You may also pass an optional Hash containing any +options+
1196
+ # CSV::new() understands as the final argument.
1197
+ #
1198
+ # This method works like Ruby's open() call, in that it will pass a CSV object
1199
+ # to a provided block and close it when the block terminates, or it will
1200
+ # return the CSV object when no block is provided. (*Note*: This is different
1201
+ # from the Ruby 1.8 CSV library which passed rows to the block. Use
1202
+ # CSV::foreach() for that behavior.)
1203
+ #
1204
+ # You must provide a +mode+ with an embedded Encoding designator unless your
1205
+ # data is in Encoding::default_external(). CSV will check the Encoding of the
1206
+ # underlying IO object (set by the +mode+ you pass) to determine how to parse
1207
+ # the data. You may provide a second Encoding to have the data transcoded as
1208
+ # it is read just as you can with a normal call to IO::open(). For example,
1209
+ # <tt>"rb:UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file but
1210
+ # transcode it to UTF-8 before CSV parses it.
1211
+ #
1212
+ # An opened CSV object will delegate to many IO methods for convenience. You
1213
+ # may call:
1214
+ #
1215
+ # * binmode()
1216
+ # * binmode?()
1217
+ # * close()
1218
+ # * close_read()
1219
+ # * close_write()
1220
+ # * closed?()
1221
+ # * eof()
1222
+ # * eof?()
1223
+ # * external_encoding()
1224
+ # * fcntl()
1225
+ # * fileno()
1226
+ # * flock()
1227
+ # * flush()
1228
+ # * fsync()
1229
+ # * internal_encoding()
1230
+ # * ioctl()
1231
+ # * isatty()
1232
+ # * path()
1233
+ # * pid()
1234
+ # * pos()
1235
+ # * pos=()
1236
+ # * reopen()
1237
+ # * seek()
1238
+ # * stat()
1239
+ # * sync()
1240
+ # * sync=()
1241
+ # * tell()
1242
+ # * to_i()
1243
+ # * to_io()
1244
+ # * truncate()
1245
+ # * tty?()
1246
+ #
1247
+ def self.open(*args)
1248
+ # find the +options+ Hash
1249
+ options = if args.last.is_a? Hash then args.pop else Hash.new end
1250
+ # wrap a File opened with the remaining +args+ with no newline
1251
+ # decorator
1252
+ file_opts = {universal_newline: false}.merge(options)
1253
+ begin
1254
+ f = File.open(*args, file_opts)
1255
+ rescue ArgumentError => e
1256
+ raise unless /needs binmode/ =~ e.message and args.size == 1
1257
+ args << "rb"
1258
+ file_opts = {encoding: Encoding.default_external}.merge(file_opts)
1259
+ retry
644
1260
  end
1261
+ csv = new(f, options)
645
1262
 
646
- def terminate
647
- if @close_on_terminate
648
- @io.close
1263
+ # handle blocks like Ruby's open(), not like the CSV library
1264
+ if block_given?
1265
+ begin
1266
+ yield csv
1267
+ ensure
1268
+ csv.close
649
1269
  end
1270
+ else
1271
+ csv
1272
+ end
1273
+ end
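The block and non-block forms of `open` can be compared with a sketch like this one. It assumes the standard `csv` and `tempfile` libraries; the file name is hypothetical.

```ruby
require "csv"
require "tempfile"

written = first = reader = nil
Tempfile.create("out.csv") do |file|   # hypothetical scratch file
  # Block form: the CSV object is closed when the block exits.
  CSV.open(file.path, "w") do |csv|
    csv << %w[name age]
    csv << ["Ada", 36]
  end
  written = File.read(file.path)

  # Without a block the caller is responsible for closing.
  reader = CSV.open(file.path, "r")
  first  = reader.shift
  reader.close
end
```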
650
1274
 
651
- if @dev
652
- @dev.close
1275
+ #
1276
+ # :call-seq:
1277
+ # parse( str, options = Hash.new ) { |row| ... }
1278
+ # parse( str, options = Hash.new )
1279
+ #
1280
+ # This method can be used to easily parse CSV out of a String. You may either
1281
+ # provide a +block+ which will be called with each row of the String in turn,
1282
+ # or just use the returned Array of Arrays (when no +block+ is given).
1283
+ #
1284
+ # You pass your +str+ to read from, and an optional +options+ Hash containing
1285
+ # anything CSV::new() understands.
1286
+ #
1287
+ def self.parse(*args, &block)
1288
+ csv = new(*args)
1289
+ if block.nil? # slurp contents, if no block is given
1290
+ begin
1291
+ csv.read
1292
+ ensure
1293
+ csv.close
653
1294
  end
1295
+ else # or pass each row to a provided block
1296
+ csv.each(&block)
654
1297
  end
655
1298
  end
656
1299
 
1300
+ #
1301
+ # This method is a shortcut for converting a single line of a CSV String into
1302
+ # an Array. Note that if +line+ contains multiple rows, anything
1303
+ # beyond the first row is ignored.
1304
+ #
1305
+ # The +options+ parameter can be anything CSV::new() understands.
1306
+ #
1307
+ def self.parse_line(line, options = Hash.new)
1308
+ new(line, options).shift
1309
+ end
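The two String-based entry points documented above can be sketched as follows. This is an illustrative example, not part of the gem; the constant names are made up for the sketch.

```ruby
require "csv"

SAMPLE = "a,b\n1,2\n"

# Without a block, parse slurps the whole String into an Array of Arrays.
ALL_ROWS = CSV.parse(SAMPLE)

# With a block, each row is yielded in turn.
YIELDED = []
CSV.parse(SAMPLE) { |row| YIELDED << row }

# parse_line returns only the first row; later rows are ignored.
FIRST_ONLY = CSV.parse_line("x,y\nz,w")

p ALL_ROWS
p FIRST_ONLY
```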
657
1310
 
658
- # CSV formatted string/stream writer.
659
1311
  #
660
- # EXAMPLE
661
- # Write rows to 'csvout' file.
1312
+ # Use to slurp a CSV file into an Array of Arrays. Pass the +path+ to the
1313
+ # file and any +options+ CSV::new() understands. This method also understands
1314
+ # an additional <tt>:encoding</tt> parameter that you can use to specify the
1315
+ # Encoding of the data in the file to be read. You must provide this unless
1316
+ # your data is in Encoding::default_external(). CSV will use this to determine
1317
+ # how to parse the data. You may provide a second Encoding to have the data
1318
+ # transcoded as it is read. For example,
1319
+ # <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
1320
+ # but transcode it to UTF-8 before CSV parses it.
662
1321
  #
663
- # outfile = File.open('csvout', 'wb')
664
- # CSV::Writer.generate(outfile) do |csv|
665
- # csv << ['c1', nil, '', '"', "\r\n", 'c2']
666
- # ...
667
- # end
1322
+ def self.read(path, *options)
1323
+ open(path, *options) { |csv| csv.read }
1324
+ end
1325
+
1326
+ # Alias for CSV::read().
1327
+ def self.readlines(*args)
1328
+ read(*args)
1329
+ end
1330
+
668
1331
  #
669
- # outfile.close
1332
+ # A shortcut for:
670
1333
  #
671
- class Writer
672
- # Given block is called with the writer instance. str_or_writable must
673
- # handle '<<(string)'.
674
- def Writer.generate(str_or_writable, fs = ',', rs = nil, &block)
675
- writer = Writer.create(str_or_writable, fs, rs)
676
- if block
677
- yield(writer)
678
- writer.close
679
- nil
680
- else
681
- writer
682
- end
1334
+ # CSV.read( path, { headers: true,
1335
+ # converters: :numeric,
1336
+ # header_converters: :symbol }.merge(options) )
1337
+ #
1338
+ def self.table(path, options = Hash.new)
1339
+ read( path, { headers: true,
1340
+ converters: :numeric,
1341
+ header_converters: :symbol }.merge(options) )
1342
+ end
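As a rough illustration of what the CSV.table() shortcut yields, the sketch below writes a small file and reads it back. The file contents and the `table_example` helper are assumptions made for this example.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: builds a scratch CSV file and loads it with CSV.table,
# which turns on headers, symbolizes them, and converts numeric fields.
def table_example
  Tempfile.create(["table_example", ".csv"]) do |tmp|
    tmp.write("Name,Qty\napple,3\npear,5\n")
    tmp.close
    CSV.table(tmp.path)
  end
end

table = table_example
p table.headers # header names converted by the :symbol header converter
p table[:qty]   # column values converted by the :numeric converter
```

Passing an explicit options Hash to CSV.table() overrides any of the three defaults, since the defaults are merged with the caller's options.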
1343
+
1344
+ #
1345
+ # This constructor will wrap either a String or IO object passed in +data+ for
1346
+ # reading and/or writing. In addition to the CSV instance methods, several IO
1347
+ # methods are delegated. (See CSV::open() for a complete list.) If you pass
1348
+ # a String for +data+, you can later retrieve it (after writing to it, for
1349
+ # example) with CSV.string().
1350
+ #
1351
+ # Note that a wrapped String will be positioned at the beginning (for
1352
+ # reading). If you want it at the end (for writing), use CSV::generate().
1353
+ # If you want any other positioning, pass a preset StringIO object instead.
1354
+ #
1355
+ # You may set any reading and/or writing preferences in the +options+ Hash.
1356
+ # Available options are:
1357
+ #
1358
+ # <b><tt>:col_sep</tt></b>:: The String placed between each field.
1359
+ # This String will be transcoded into
1360
+ # the data's Encoding before parsing.
1361
+ # <b><tt>:row_sep</tt></b>:: The String appended to the end of each
1362
+ # row. This can be set to the special
1363
+ # <tt>:auto</tt> setting, which requests
1364
+ # that CSV automatically discover this
1365
+ # from the data. Auto-discovery reads
1366
+ # ahead in the data looking for the next
1367
+ # <tt>"\r\n"</tt>, <tt>"\n"</tt>, or
1368
+ # <tt>"\r"</tt> sequence. A sequence
1369
+ # will be selected even if it occurs in
1370
+ # a quoted field, assuming that you
1371
+ # would have the same line endings
1372
+ # there. If none of those sequences is
1373
+ # found, +data+ is <tt>ARGF</tt>,
1374
+ # <tt>STDIN</tt>, <tt>STDOUT</tt>, or
1375
+ # <tt>STDERR</tt>, or the stream is only
1376
+ # available for output, the default
1377
+ # <tt>$INPUT_RECORD_SEPARATOR</tt>
1378
+ # (<tt>$/</tt>) is used. Obviously,
1379
+ # discovery takes a little time. Set
1380
+ # manually if speed is important. Also
1381
+ # note that IO objects should be opened
1382
+ # in binary mode on Windows if this
1383
+ # feature will be used as the
1384
+ # line-ending translation can cause
1385
+ # problems with resetting the document
1386
+ # position to where it was before the
1387
+ # read ahead. This String will be
1388
+ # transcoded into the data's Encoding
1389
+ # before parsing.
1390
+ # <b><tt>:quote_char</tt></b>:: The character used to quote fields.
1391
+ # This has to be a single character
1392
+ # String. This is useful for
1393
+ # applications that incorrectly use
1394
+ # <tt>'</tt> as the quote character
1395
+ # instead of the correct <tt>"</tt>.
1396
+ # CSV will always consider a double
1397
+ # sequence of this character to be an
1398
+ # escaped quote. This String will be
1399
+ # transcoded into the data's Encoding
1400
+ # before parsing.
1401
+ # <b><tt>:field_size_limit</tt></b>:: This is a maximum size CSV will read
1402
+ # ahead looking for the closing quote
1403
+ # for a field. (In truth, it reads to
1404
+ # the first line ending beyond this
1405
+ # size.) If a quote cannot be found
1406
+ # within the limit CSV will raise a
1407
+ # MalformedCSVError, assuming the data
1408
+ # is faulty. You can use this limit to
1409
+ # prevent what are effectively DoS
1410
+ # attacks on the parser. However, this
1411
+ # limit can cause a legitimate parse to
1412
+ # fail and thus is set to +nil+, or off,
1413
+ # by default.
1414
+ # <b><tt>:converters</tt></b>:: An Array of names from the Converters
1415
+ # Hash and/or lambdas that handle custom
1416
+ # conversion. A single converter
1417
+ # doesn't have to be in an Array. All
1418
+ # built-in converters try to transcode
1419
+ # fields to UTF-8 before converting.
1420
+ # The conversion will fail if the data
1421
+ # cannot be transcoded, leaving the
1422
+ # field unchanged.
1423
+ # <b><tt>:unconverted_fields</tt></b>:: If set to +true+, an
1424
+ # unconverted_fields() method will be
1425
+ # added to all returned rows (Array or
1426
+ # CSV::Row) that will return the fields
1427
+ # as they were before conversion. Note
1428
+ # that <tt>:headers</tt> supplied by
1429
+ # Array or String were not fields of the
1430
+ # document and thus will have an empty
1431
+ # Array attached.
1432
+ # <b><tt>:headers</tt></b>:: If set to <tt>:first_row</tt> or
1433
+ # +true+, the initial row of the CSV
1434
+ # file will be treated as a row of
1435
+ # headers. If set to an Array, the
1436
+ # contents will be used as the headers.
1437
+ # If set to a String, the String is run
1438
+ # through a call of CSV::parse_line()
1439
+ # with the same <tt>:col_sep</tt>,
1440
+ # <tt>:row_sep</tt>, and
1441
+ # <tt>:quote_char</tt> as this instance
1442
+ # to produce an Array of headers. This
1443
+ # setting causes CSV#shift() to return
1444
+ # rows as CSV::Row objects instead of
1445
+ # Arrays and CSV#read() to return
1446
+ # CSV::Table objects instead of an Array
1447
+ # of Arrays.
1448
+ # <b><tt>:return_headers</tt></b>:: When +false+, header rows are silently
1449
+ # swallowed. If set to +true+, header
1450
+ # rows are returned in a CSV::Row object
1451
+ # with identical headers and
1452
+ # fields (save that the fields do not go
1453
+ # through the converters).
1454
+ # <b><tt>:write_headers</tt></b>:: When +true+ and <tt>:headers</tt> is
1455
+ # set, a header row will be added to the
1456
+ # output.
1457
+ # <b><tt>:header_converters</tt></b>:: Identical in functionality to
1458
+ # <tt>:converters</tt> save that the
1459
+ # conversions are only made to header
1460
+ # rows. All built-in converters try to
1461
+ # transcode headers to UTF-8 before
1462
+ # converting. The conversion will fail
1463
+ # if the data cannot be transcoded,
1464
+ # leaving the header unchanged.
1465
+ # <b><tt>:skip_blanks</tt></b>:: When set to a +true+ value, CSV will
1466
+ # skip over any rows with no content.
1467
+ # <b><tt>:force_quotes</tt></b>:: When set to a +true+ value, CSV will
1468
+ # quote all CSV fields it creates.
1469
+ # <b><tt>:skip_lines</tt></b>:: When set to an object responding to
1470
+ # <tt>match</tt>, every line matching
1471
+ # it is considered a comment and ignored
1472
+ # during parsing. When set to +nil+
1473
+ # no line is considered a comment.
1474
+ # If the passed object does not respond
1475
+ # to <tt>match</tt>, <tt>ArgumentError</tt>
1476
+ # is raised.
1477
+ #
1478
+ # See CSV::DEFAULT_OPTIONS for the default settings.
1479
+ #
1480
+ # Options cannot be overridden in the instance methods for performance reasons,
1481
+ # so be sure to set what you want here.
1482
+ #
1483
+ def initialize(data, options = Hash.new)
1484
+ # build the options for this read/write
1485
+ options = DEFAULT_OPTIONS.merge(options)
1486
+
1487
+ # create the IO object we will read from
1488
+ @io = data.is_a?(String) ? StringIO.new(data) : data
1489
+ # honor the IO encoding if we can, otherwise default to ASCII-8BIT
1490
+ @encoding = raw_encoding(nil) ||
1491
+ ( if encoding = options.delete(:internal_encoding)
1492
+ case encoding
1493
+ when Encoding; encoding
1494
+ else Encoding.find(encoding)
1495
+ end
1496
+ end ) ||
1497
+ ( case encoding = options.delete(:encoding)
1498
+ when Encoding; encoding
1499
+ when /\A[^:]+/; Encoding.find($&)
1500
+ end ) ||
1501
+ Encoding.default_internal || Encoding.default_external
1502
+ #
1503
+ # prepare for building safe regular expressions in the target encoding,
1504
+ # if we can transcode the needed characters
1505
+ #
1506
+ @re_esc = "\\".encode(@encoding) rescue ""
1507
+ @re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
1508
+ # @re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding, fallback: proc{""})}/
1509
+
1510
+ init_separators(options)
1511
+ init_parsers(options)
1512
+ init_converters(options)
1513
+ init_headers(options)
1514
+ init_comments(options)
1515
+
1516
+ options.delete(:encoding)
1517
+ options.delete(:internal_encoding)
1518
+ options.delete(:external_encoding)
1519
+ unless options.empty?
1520
+ raise ArgumentError, "Unknown options: #{options.keys.join(', ')}."
683
1521
  end
684
1522
 
685
- # str_or_writable must handle '<<(string)'.
686
- def Writer.create(str_or_writable, fs = ',', rs = nil)
687
- BasicWriter.new(str_or_writable, fs, rs)
1523
+ # track our own lineno since IO gets confused about line-ends in CSV fields
1524
+ @lineno = 0
1525
+ end
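A short sketch of how a few of the options documented above combine. The sample data and the `parse_with_options` and `unknown_option_rejected?` helpers are hypothetical; the options are written without braces so the call reads the same whether the library takes an options Hash (as in this version) or keyword arguments.

```ruby
require "csv"

# Hypothetical helper: semicolon-separated data that uses ' as its quote
# character, has a header row, and contains a blank line to be skipped.
def parse_with_options
  data = "id;note\n1;'a;b'\n\n2;plain\n"
  csv = CSV.new(data, col_sep: ";", quote_char: "'",
                      headers: true, skip_blanks: true)
  csv.read.map(&:to_hash) # with :headers set, read returns a CSV::Table
end

# Options the constructor does not recognize are rejected up front.
def unknown_option_rejected?
  CSV.new("a,b", not_a_real_option: 1)
  false
rescue ArgumentError
  true
end

p parse_with_options
p unknown_option_rejected?
```

Because options cannot be changed through the instance methods, the complete set has to be chosen in this call.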
1526
+
1527
+ #
1528
+ # The encoded <tt>:col_sep</tt> used in parsing and writing. See CSV::new
1529
+ # for details.
1530
+ #
1531
+ attr_reader :col_sep
1532
+ #
1533
+ # The encoded <tt>:row_sep</tt> used in parsing and writing. See CSV::new
1534
+ # for details.
1535
+ #
1536
+ attr_reader :row_sep
1537
+ #
1538
+ # The encoded <tt>:quote_char</tt> used in parsing and writing. See CSV::new
1539
+ # for details.
1540
+ #
1541
+ attr_reader :quote_char
1542
+ # The limit for field size, if any. See CSV::new for details.
1543
+ attr_reader :field_size_limit
1544
+
1545
+ # The regex marking a line as a comment. See CSV::new for details.
1546
+ attr_reader :skip_lines
1547
+
1548
+ #
1549
+ # Returns the current list of converters in effect. See CSV::new for details.
1550
+ # Built-in converters will be returned by name, while others will be returned
1551
+ # as is.
1552
+ #
1553
+ def converters
1554
+ @converters.map do |converter|
1555
+ name = Converters.rassoc(converter)
1556
+ name ? name.first : converter
688
1557
  end
1558
+ end
1559
+ #
1560
+ # Returns +true+ if unconverted_fields() will be added to parsed results. See CSV::new
1561
+ # for details.
1562
+ #
1563
+ def unconverted_fields?() @unconverted_fields end
1564
+ #
1565
+ # Returns +nil+ if headers will not be used, +true+ if they will but have not
1566
+ # yet been read, or the actual headers after they have been read. See
1567
+ # CSV::new for details.
1568
+ #
1569
+ def headers
1570
+ @headers || true if @use_headers
1571
+ end
1572
+ #
1573
+ # Returns +true+ if headers will be returned as a row of results.
1574
+ # See CSV::new for details.
1575
+ #
1576
+ def return_headers?() @return_headers end
1577
+ # Returns +true+ if headers are written in output. See CSV::new for details.
1578
+ def write_headers?() @write_headers end
1579
+ #
1580
+ # Returns the current list of converters in effect for headers. See CSV::new
1581
+ # for details. Built-in converters will be returned by name, while others
1582
+ # will be returned as is.
1583
+ #
1584
+ def header_converters
1585
+ @header_converters.map do |converter|
1586
+ name = HeaderConverters.rassoc(converter)
1587
+ name ? name.first : converter
1588
+ end
1589
+ end
1590
+ #
1591
+ # Returns +true+ if blank lines are skipped by the parser. See CSV::new
1592
+ # for details.
1593
+ #
1594
+ def skip_blanks?() @skip_blanks end
1595
+ # Returns +true+ if all output fields are quoted. See CSV::new for details.
1596
+ def force_quotes?() @force_quotes end
689
1597
 
690
- # dump CSV stream to the device. argument must be an Array of String.
691
- def <<(row)
692
- CSV.generate_row(row, row.size, @dev, @fs, @rs)
693
- self
1598
+ #
1599
+ # The Encoding CSV is parsing or writing in. This will be the Encoding you
1600
+ # receive parsed data in and/or the Encoding data will be written in.
1601
+ #
1602
+ attr_reader :encoding
1603
+
1604
+ #
1605
+ # The line number of the last row read from this file. Fields with nested
1606
+ # line-end characters will not affect this count.
1607
+ #
1608
+ attr_reader :lineno
1609
+
1610
+ ### IO and StringIO Delegation ###
1611
+
1612
+ extend Forwardable
1613
+ def_delegators :@io, :binmode, :binmode?, :close, :close_read, :close_write,
1614
+ :closed?, :eof, :eof?, :external_encoding, :fcntl,
1615
+ :fileno, :flock, :flush, :fsync, :internal_encoding,
1616
+ :ioctl, :isatty, :path, :pid, :pos, :pos=, :reopen,
1617
+ :seek, :stat, :string, :sync, :sync=, :tell, :to_i,
1618
+ :to_io, :truncate, :tty?
1619
+
1620
+ # Rewinds the underlying IO object and resets CSV's lineno() counter.
1621
+ def rewind
1622
+ @headers = nil
1623
+ @lineno = 0
1624
+
1625
+ @io.rewind
1626
+ end
1627
+
1628
+ ### End Delegation ###
1629
+
1630
+ #
1631
+ # The primary write method for wrapped Strings and IOs, +row+ (an Array or
1632
+ # CSV::Row) is converted to CSV and appended to the data source. When a
1633
+ # CSV::Row is passed, only the row's fields() are appended to the output.
1634
+ #
1635
+ # The data source must be open for writing.
1636
+ #
1637
+ def <<(row)
1638
+ # make sure headers have been assigned
1639
+ if header_row? and [Array, String].include? @use_headers.class
1640
+ parse_headers # won't read data for Array or String
1641
+ self << @headers if @write_headers
694
1642
  end
695
- alias add_row <<
696
1643
 
697
- def close
698
- terminate
1644
+ # handle CSV::Row objects and Hashes
1645
+ row = case row
1646
+ when self.class::Row then row.fields
1647
+ when Hash then @headers.map { |header| row[header] }
1648
+ else row
1649
+ end
1650
+
1651
+ @headers = row if header_row?
1652
+ @lineno += 1
1653
+
1654
+ output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
1655
+ if @io.is_a?(StringIO) and
1656
+ output.encoding != raw_encoding and
1657
+ (compatible_encoding = Encoding.compatible?(@io.string, output))
1658
+ @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
1659
+ @io.seek(0, IO::SEEK_END)
699
1660
  end
1661
+ @io << output
700
1662
 
701
- private
1663
+ self # for chaining
1664
+ end
1665
+ alias_method :add_row, :<<
1666
+ alias_method :puts, :<<
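The write path above can be exercised with a wrapped String. This is a sketch under the assumption that the default row separator resolves to "\n" for an empty String; the `write_rows` helper and its data are made up for illustration.

```ruby
require "csv"

# Hypothetical helper: writes Array and Hash rows into a String.
def write_rows
  out = "".dup # an unfrozen String for CSV to append to
  csv = CSV.new(out, headers: ["id", "name"], write_headers: true)
  csv << [1, "x"]                      # the header row is emitted first
  csv << { "id" => 2, "name" => "y" }  # Hash values are ordered by header
  csv << [3, "a,b"] << [nil, "z"]      # << returns self, so calls chain
  out
end

print write_rows
```

A field containing the column separator is quoted in the output, and a +nil+ field is written as an empty, unquoted field.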
702
1667
 
703
- def initialize(dev)
704
- raise RuntimeError.new('Do not instanciate this class directly.')
1668
+ #
1669
+ # :call-seq:
1670
+ # convert( name )
1671
+ # convert { |field| ... }
1672
+ # convert { |field, field_info| ... }
1673
+ #
1674
+ # You can use this method to install a CSV::Converters built-in, or provide a
1675
+ # block that handles a custom conversion.
1676
+ #
1677
+ # If you provide a block that takes one argument, it will be passed the field
1678
+ # and is expected to return the converted value or the field itself. If your
1679
+ # block takes two arguments, it will also be passed a CSV::FieldInfo Struct,
1680
+ # containing details about the field. Again, the block should return a
1681
+ # converted field or the field itself.
1682
+ #
1683
+ def convert(name = nil, &converter)
1684
+ add_converter(:converters, self.class::Converters, name, &converter)
1685
+ end
1686
+
1687
+ #
1688
+ # :call-seq:
1689
+ # header_convert( name )
1690
+ # header_convert { |field| ... }
1691
+ # header_convert { |field, field_info| ... }
1692
+ #
1693
+ # Identical to CSV#convert(), but for header rows.
1694
+ #
1695
+ # Note that this method must be called before header rows are read to have any
1696
+ # effect.
1697
+ #
1698
+ def header_convert(name = nil, &converter)
1699
+ add_converter( :header_converters,
1700
+ self.class::HeaderConverters,
1701
+ name,
1702
+ &converter )
1703
+ end
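A small sketch of installing converters with CSV#convert(), combining a built-in name with a two-argument block. The `converted_row` helper and the choice to upcase the second column are hypothetical.

```ruby
require "csv"

# Hypothetical helper: converts integers with the built-in :integer
# converter, then upcases whatever String remains in column index 1.
def converted_row
  csv = CSV.new("1,two,3\n")
  csv.convert(:integer)
  csv.convert { |field, info| info.index == 1 ? field.upcase : field }
  csv.shift
end

p converted_row
```

Converters run in the order they were added, and a field stops being converted once it is no longer a String, so the block here only ever sees the fields that :integer left untouched.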
1704
+
1705
+ include Enumerable
1706
+
1707
+ #
1708
+ # Yields each row of the data source in turn.
1709
+ #
1710
+ # Support for Enumerable.
1711
+ #
1712
+ # The data source must be open for reading.
1713
+ #
1714
+ def each
1715
+ if block_given?
1716
+ while row = shift
1717
+ yield row
1718
+ end
1719
+ else
1720
+ to_enum
705
1721
  end
1722
+ end
706
1723
 
707
- def terminate
708
- # Define if needed.
1724
+ #
1725
+ # Slurps the remaining rows and returns an Array of Arrays.
1726
+ #
1727
+ # The data source must be open for reading.
1728
+ #
1729
+ def read
1730
+ rows = to_a
1731
+ if @use_headers
1732
+ Table.new(rows)
1733
+ else
1734
+ rows
709
1735
  end
710
1736
  end
1737
+ alias_method :readlines, :read
711
1738
 
1739
+ # Returns +true+ if the next row read will be a header row.
1740
+ def header_row?
1741
+ @use_headers and @headers.nil?
1742
+ end
712
1743
 
713
- class BasicWriter < Writer
714
- def initialize(str_or_writable, fs = ',', rs = nil)
715
- @fs = fs
716
- @rs = rs
717
- @dev = str_or_writable
718
- @close_on_terminate = false
719
- end
1744
+ #
1745
+ # The primary read method for wrapped Strings and IOs, a single row is pulled
1746
+ # from the data source, parsed and returned as an Array of fields (if header
1747
+ # rows are not used) or a CSV::Row (when header rows are used).
1748
+ #
1749
+ # The data source must be open for reading.
1750
+ #
1751
+ def shift
1752
+ #########################################################################
1753
+ ### This method is purposefully kept a bit long as simple conditional ###
1754
+ ### checks are faster than numerous (expensive) method calls. ###
1755
+ #########################################################################
720
1756
 
721
- # Tell this writer to close the IO when terminated (Triggered by invoking
722
- # CSV::BasicWriter#close).
723
- def close_on_terminate
724
- @close_on_terminate = true
1757
+ # handle headers not based on document content
1758
+ if header_row? and @return_headers and
1759
+ [Array, String].include? @use_headers.class
1760
+ if @unconverted_fields
1761
+ return add_unconverted_fields(parse_headers, Array.new)
1762
+ else
1763
+ return parse_headers
1764
+ end
725
1765
  end
726
1766
 
727
- private
1767
+ #
1768
+ # it can take multiple calls to <tt>@io.gets()</tt> to get a full line,
1769
+ # because of \r and/or \n characters embedded in quoted fields
1770
+ #
1771
+ in_extended_col = false
1772
+ csv = Array.new
728
1773
 
729
- def terminate
730
- if @close_on_terminate
731
- @dev.close
732
- end
733
- end
734
- end
735
-
736
- private
737
-
738
- # Buffered stream.
739
- #
740
- # EXAMPLE 1 -- an IO.
741
- # class MyBuf < StreamBuf
742
- # # Do initialize myself before a super class. Super class might call my
743
- # # method 'read'. (Could be awful for C++ user. :-)
744
- # def initialize(s)
745
- # @s = s
746
- # super()
747
- # end
748
- #
749
- # # define my own 'read' method.
750
- # # CAUTION: Returning nil means EnfOfStream.
751
- # def read(size)
752
- # @s.read(size)
753
- # end
754
- #
755
- # # release buffers. in Ruby which has GC, you do not have to call this...
756
- # def terminate
757
- # @s = nil
758
- # super()
759
- # end
760
- # end
761
- #
762
- # buf = MyBuf.new(STDIN)
763
- # my_str = ''
764
- # p buf[0, 0] # => '' (null string)
765
- # p buf[0] # => 97 (char code of 'a')
766
- # p buf[0, 1] # => 'a'
767
- # my_str = buf[0, 5]
768
- # p my_str # => 'abcde' (5 chars)
769
- # p buf[0, 6] # => "abcde\n" (6 chars)
770
- # p buf[0, 7] # => "abcde\n" (6 chars)
771
- # p buf.drop(3) # => 3 (dropped chars)
772
- # p buf.get(0, 2) # => 'de' (2 chars)
773
- # p buf.is_eos? # => false (is not EOS here)
774
- # p buf.drop(5) # => 3 (dropped chars)
775
- # p buf.is_eos? # => true (is EOS here)
776
- # p buf[0] # => nil (is EOS here)
777
- #
778
- # EXAMPLE 2 -- String.
779
- # This is a conceptual example. No pros with this.
780
- #
781
- # class StrBuf < StreamBuf
782
- # def initialize(s)
783
- # @str = s
784
- # @idx = 0
785
- # super()
786
- # end
787
- #
788
- # def read(size)
789
- # str = @str[@idx, size]
790
- # @idx += str.size
791
- # str
792
- # end
793
- # end
794
- #
795
- class StreamBuf
796
- # get a char or a partial string from the stream.
797
- # idx: index of a string to specify a start point of a string to get.
798
- # unlike String instance, idx < 0 returns nil.
799
- # n: size of a string to get.
800
- # returns char at idx if n == nil.
801
- # returns a partial string, from idx to (idx + n) if n != nil. at EOF,
802
- # the string size could not equal to arg n.
803
- def [](idx, n = nil)
804
- if idx < 0
1774
+ loop do
1775
+ # add another read to the line
1776
+ unless parse = @io.gets(@row_sep)
805
1777
  return nil
806
1778
  end
807
- if (idx_is_eos?(idx))
808
- if n and (@offset + idx == buf_size(@cur_buf))
809
- # Like a String, 'abc'[4, 1] returns nil and
810
- # 'abc'[3, 1] returns '' not nil.
811
- return ''
812
- else
813
- return nil
1779
+
1780
+ parse.sub!(@parsers[:line_end], "")
1781
+
1782
+ if csv.empty?
1783
+ #
1784
+ # I believe a blank line should be an <tt>Array.new</tt>, not Ruby 1.8
1785
+ # CSV's <tt>[nil]</tt>
1786
+ #
1787
+ if parse.empty?
1788
+ @lineno += 1
1789
+ if @skip_blanks
1790
+ next
1791
+ elsif @unconverted_fields
1792
+ return add_unconverted_fields(Array.new, Array.new)
1793
+ elsif @use_headers
1794
+ return self.class::Row.new(Array.new, Array.new)
1795
+ else
1796
+ return Array.new
1797
+ end
814
1798
  end
815
1799
  end
816
- my_buf = @cur_buf
817
- my_offset = @offset
818
- next_idx = idx
819
- while (my_offset + next_idx >= buf_size(my_buf))
820
- if (my_buf == @buf_tail_idx)
821
- unless add_buf
822
- break
823
- end
1800
+
1801
+ next if @skip_lines and @skip_lines.match parse
1802
+
1803
+ parts = parse.split(@col_sep, -1)
1804
+ if parts.empty?
1805
+ if in_extended_col
1806
+ csv[-1] << @col_sep # will be replaced with a @row_sep after the parts.each loop
1807
+ else
1808
+ csv << nil
824
1809
  end
825
- next_idx = my_offset + next_idx - buf_size(my_buf)
826
- my_buf += 1
827
- my_offset = 0
828
- end
829
- loc = my_offset + next_idx
830
- if !n
831
- return @buf_list[my_buf][loc] # Fixnum of char code.
832
- elsif (loc + n - 1 < buf_size(my_buf))
833
- return @buf_list[my_buf][loc, n] # String.
834
- else # should do loop insted of (tail) recursive call...
835
- res = @buf_list[my_buf][loc, BufSize]
836
- size_added = buf_size(my_buf) - loc
837
- if size_added > 0
838
- idx += size_added
839
- n -= size_added
840
- ret = self[idx, n]
841
- if ret
842
- res << ret
1810
+ end
1811
+
1812
+ # This loop is the hot path of csv parsing. Some things may be non-dry
1813
+ # for a reason. Make sure to benchmark when refactoring.
1814
+ parts.each do |part|
1815
+ if in_extended_col
1816
+ # If we are continuing a previous column
1817
+ if part[-1] == @quote_char && part.count(@quote_char) % 2 != 0
1818
+ # extended column ends
1819
+ csv.last << part[0..-2]
1820
+ if csv.last =~ @parsers[:stray_quote]
1821
+ raise MalformedCSVError,
1822
+ "Missing or stray quote in line #{lineno + 1}"
1823
+ end
1824
+ csv.last.gsub!(@quote_char * 2, @quote_char)
1825
+ in_extended_col = false
1826
+ else
1827
+ csv.last << part
1828
+ csv.last << @col_sep
843
1829
  end
844
- end
845
- return res
846
- end
847
- end
848
- alias get []
849
-
850
- # drop a string from the stream.
851
- # returns dropped size. at EOF, dropped size might not equals to arg n.
852
- # Once you drop the head of the stream, access to the dropped part via []
853
- # or get returns nil.
854
- def drop(n)
855
- if is_eos?
856
- return 0
857
- end
858
- size_dropped = 0
859
- while (n > 0)
860
- if !@is_eos or (@cur_buf != @buf_tail_idx)
861
- if (@offset + n < buf_size(@cur_buf))
862
- size_dropped += n
863
- @offset += n
864
- n = 0
1830
+ elsif part[0] == @quote_char
1831
+ # If we are starting a new quoted column
1832
+ if part[-1] != @quote_char || part.count(@quote_char) % 2 != 0
1833
+ # start an extended column
1834
+ csv << part[1..-1]
1835
+ csv.last << @col_sep
1836
+ in_extended_col = true
865
1837
  else
866
- size = buf_size(@cur_buf) - @offset
867
- size_dropped += size
868
- n -= size
869
- @offset = 0
870
- unless rel_buf
871
- unless add_buf
872
- break
873
- end
874
- @cur_buf = @buf_tail_idx
1838
+ # regular quoted column
1839
+ csv << part[1..-2]
1840
+ if csv.last =~ @parsers[:stray_quote]
1841
+ raise MalformedCSVError,
1842
+ "Missing or stray quote in line #{lineno + 1}"
875
1843
  end
1844
+ csv.last.gsub!(@quote_char * 2, @quote_char)
1845
+ end
1846
+ elsif part =~ @parsers[:quote_or_nl]
1847
+ # Unquoted field with bad characters.
1848
+ if part =~ @parsers[:nl_or_lf]
1849
+ raise MalformedCSVError, "Unquoted fields do not allow " +
1850
+ "\\r or \\n (line #{lineno + 1})."
1851
+ else
1852
+ raise MalformedCSVError, "Illegal quoting in line #{lineno + 1}."
876
1853
  end
1854
+ else
1855
+ # Regular ole unquoted field.
1856
+ csv << (part.empty? ? nil : part)
877
1857
  end
878
1858
  end
879
- size_dropped
880
- end
881
-
882
- def is_eos?
883
- return idx_is_eos?(0)
884
- end
885
-
886
- # WARN: Do not instantiate this class directly. Define your own class
887
- # which derives this class and define 'read' instance method.
888
- def initialize
889
- @buf_list = []
890
- @cur_buf = @buf_tail_idx = -1
891
- @offset = 0
892
- @is_eos = false
893
- add_buf
894
- @cur_buf = @buf_tail_idx
895
- end
896
-
897
- protected
898
-
899
- def terminate
900
- while (rel_buf); end
901
- end
902
-
903
- # protected method 'read' must be defined in derived classes.
904
- # CAUTION: Returning a string which size is not equal to 'size' means
905
- # EnfOfStream. When it is not at EOS, you must block the callee, try to
906
- # read and return the sized string.
907
- def read(size) # raise EOFError
908
- raise NotImplementedError.new('Method read must be defined in a derived class.')
909
- end
910
-
911
- private
912
-
913
- def buf_size(idx)
914
- @buf_list[idx].size
1859
+
1860
+ # Replace tacked on @col_sep with @row_sep if we are still in an extended
1861
+ # column.
1862
+ csv[-1][-1] = @row_sep if in_extended_col
1863
+
1864
+ if in_extended_col
1865
+ # if we're at eof?(), a quoted field wasn't closed...
1866
+ if @io.eof?
1867
+ raise MalformedCSVError,
1868
+ "Unclosed quoted field on line #{lineno + 1}."
1869
+ elsif @field_size_limit and csv.last.size >= @field_size_limit
1870
+ raise MalformedCSVError, "Field size exceeded on line #{lineno + 1}."
1871
+ end
1872
+ # otherwise, we need to loop and pull some more data to complete the row
1873
+ else
1874
+ @lineno += 1
1875
+
1876
+ # save unconverted fields, if needed...
1877
+ unconverted = csv.dup if @unconverted_fields
1878
+
1879
+ # convert fields, if needed...
1880
+ csv = convert_fields(csv) unless @use_headers or @converters.empty?
1881
+ # parse out header rows and handle CSV::Row conversions...
1882
+ csv = parse_headers(csv) if @use_headers
1883
+
1884
+ # inject unconverted fields and accessor, if requested...
1885
+ if @unconverted_fields and not csv.respond_to? :unconverted_fields
1886
+ add_unconverted_fields(csv, unconverted)
1887
+ end
1888
+
1889
+ # return the results
1890
+ break csv
1891
+ end
915
1892
  end
1893
+ end
1894
+ alias_method :gets, :shift
1895
+ alias_method :readline, :shift
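The behaviours handled inside shift() (quoted fields that span physical lines, blank lines, and malformed input) can be observed from the outside. This is an illustrative sketch; the helper names and sample data are assumptions.

```ruby
require "csv"

# Hypothetical helper: pulls rows one at a time until shift returns nil.
def shift_all(text)
  csv = CSV.new(text)
  rows = []
  while (row = csv.shift)
    rows << row
  end
  rows
end

# Hypothetical helper: reports whether parsing the text raises
# CSV::MalformedCSVError.
def malformed?(text)
  CSV.parse_line(text)
  false
rescue CSV::MalformedCSVError
  true
end

p shift_all("a,\"line one\nline two\"\nb,c\n") # quoted field spans two lines
p shift_all("a\n\nb\n")                         # blank line becomes []
p malformed?("a,\"unclosed")                    # quote never closed
```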
916
1896
 
917
- def add_buf
918
- if @is_eos
919
- return false
1897
+ #
1898
+ # Returns a simplified description of the key CSV attributes in an
1899
+ # ASCII compatible String.
1900
+ #
1901
+ def inspect
1902
+ str = ["<#", self.class.to_s, " io_type:"]
1903
+ # show type of wrapped IO
1904
+ if @io == $stdout then str << "$stdout"
1905
+ elsif @io == $stdin then str << "$stdin"
1906
+ elsif @io == $stderr then str << "$stderr"
1907
+ else str << @io.class.to_s
1908
+ end
1909
+ # show IO.path(), if available
1910
+ if @io.respond_to?(:path) and (p = @io.path)
1911
+ str << " io_path:" << p.inspect
1912
+ end
1913
+ # show encoding
1914
+ str << " encoding:" << @encoding.name
1915
+ # show other attributes
1916
+ %w[ lineno col_sep row_sep
1917
+ quote_char skip_blanks ].each do |attr_name|
1918
+ if a = instance_variable_get("@#{attr_name}")
1919
+ str << " " << attr_name << ":" << a.inspect
920
1920
  end
921
- begin
922
- str_read = read(BufSize)
923
- rescue EOFError
924
- str_read = nil
925
- rescue
926
- terminate
927
- raise
928
- end
929
- if str_read.nil?
930
- @is_eos = true
931
- @buf_list.push('')
932
- @buf_tail_idx += 1
933
- false
1921
+ end
1922
+ if @use_headers
1923
+ str << " headers:" << headers.inspect
1924
+ end
1925
+ str << ">"
1926
+ begin
1927
+ str.join('')
1928
+ rescue # any encoding error
1929
+ str.map do |s|
1930
+ e = Encoding::Converter.asciicompat_encoding(s.encoding)
1931
+ e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
1932
+ end.join('')
1933
+ end
1934
+ end
1935
+
1936
+ private
1937
+
1938
+ #
1939
+ # Stores the indicated separators for later use.
1940
+ #
1941
+ # If auto-discovery was requested for <tt>@row_sep</tt>, this method will read
1942
+ # ahead in the <tt>@io</tt> and try to find one. +ARGF+, +STDIN+, +STDOUT+,
1943
+ # +STDERR+, and any stream open for output only fall back to a default
1944
+ # <tt>@row_sep</tt> of <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
1945
+ #
1946
+ # This method also establishes the quoting rules used for CSV output.
1947
+ #
1948
+ def init_separators(options)
1949
+ # store the selected separators
1950
+ @col_sep = options.delete(:col_sep).to_s.encode(@encoding)
1951
+ @row_sep = options.delete(:row_sep) # encode after resolving :auto
1952
+ @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
1953
+
1954
+ if @quote_char.length != 1
1955
+ raise ArgumentError, ":quote_char has to be a single character String"
1956
+ end
1957
+
1958
+ #
1959
+ # automatically discover row separator when requested
1960
+ # (not fully encoding safe)
1961
+ #
1962
+ if @row_sep == :auto
1963
+ if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
1964
+ (defined?(Zlib) and @io.class == Zlib::GzipWriter)
1965
+ @row_sep = $INPUT_RECORD_SEPARATOR
934
1966
  else
935
- @buf_list.push(str_read)
936
- @buf_tail_idx += 1
937
- true
1967
+ begin
1968
+ #
1969
+ # remember where we were (pos() will raise an exception if @io is a pipe
1970
+ # or not opened for reading)
1971
+ #
1972
+ saved_pos = @io.pos
1973
+ while @row_sep == :auto
1974
+ #
1975
+ # if we run out of data, it's probably a single line
1976
+ # (ensure will set default value)
1977
+ #
1978
+ break unless sample = @io.gets(nil, 1024)
1979
+ # extend sample if we're unsure of the line ending
1980
+ if sample.end_with? encode_str("\r")
1981
+ sample << (@io.gets(nil, 1) || "")
1982
+ end
1983
+
1984
+ # try to find a standard separator
1985
+ if sample =~ encode_re("\r\n?|\n")
1986
+ @row_sep = $&
1987
+ break
1988
+ end
1989
+ end
1990
+
1991
+ # tricky seek() clone to work around GzipReader's lack of seek()
1992
+ @io.rewind
1993
+ # reset back to the remembered position
1994
+ while saved_pos > 1024 # avoid loading a lot of data into memory
1995
+ @io.read(1024)
1996
+ saved_pos -= 1024
1997
+ end
1998
+ @io.read(saved_pos) if saved_pos.nonzero?
1999
+ rescue IOError # not opened for reading
2000
+ # do nothing: ensure will set default
2001
+ rescue NoMethodError # Zlib::GzipWriter doesn't have some IO methods
2002
+ # do nothing: ensure will set default
2003
+ rescue SystemCallError # pipe
2004
+ # do nothing: ensure will set default
2005
+ ensure
2006
+ #
2007
+ # set default if we failed to detect
2008
+ # (stream not opened for reading, a pipe, or a single line of data)
2009
+ #
2010
+ @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
2011
+ end
938
2012
  end
939
2013
  end
940
-
941
- def rel_buf
942
- if (@cur_buf < 0)
943
- return false
2014
+ @row_sep = @row_sep.to_s.encode(@encoding)
2015
+
2016
+ # establish quoting rules
2017
+ @force_quotes = options.delete(:force_quotes)
2018
+ do_quote = lambda do |field|
2019
+ field = String(field)
2020
+ encoded_quote = @quote_char.encode(field.encoding)
2021
+ encoded_quote +
2022
+ field.gsub(encoded_quote, encoded_quote * 2) +
2023
+ encoded_quote
2024
+ end
2025
+ quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
2026
+ @quote = if @force_quotes
2027
+ do_quote
2028
+ else
2029
+ lambda do |field|
2030
+ if field.nil? # represent +nil+ fields as empty unquoted fields
2031
+ ""
2032
+ else
2033
+ field = String(field) # Stringify fields
2034
+ # represent empty fields as empty quoted fields
2035
+ if field.empty? or
2036
+ field.count(quotable_chars).nonzero?
2037
+ do_quote.call(field)
2038
+ else
2039
+ field # unquoted field
2040
+ end
2041
+ end
944
2042
  end
945
- @buf_list[@cur_buf] = nil
946
- if (@cur_buf == @buf_tail_idx)
947
- @cur_buf = -1
948
- return false
949
- else
950
- @cur_buf += 1
951
- return true
2043
+ end
2044
+ end
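The separator discovery and quoting rules above can be sketched outside the class. This is a minimal sketch, not the library's code: `detect_row_sep` and `quote_field` are hypothetical names, encoding handling is left out, and the stream is assumed to support `pos=` (the original rewinds and re-reads instead, to cope with GzipReader).

```ruby
require "stringio"

# Hypothetical helper: sample the stream in 1024-byte chunks until a standard
# line ending is found, then restore the original read position.
def detect_row_sep(io, default = "\n")
  saved_pos = io.pos
  row_sep = nil
  while (sample = io.gets(nil, 1024))
    # a trailing "\r" may be the first half of "\r\n", so read one more char
    sample << (io.gets(nil, 1) || "") if sample.end_with?("\r")
    if sample =~ /\r\n?|\n/
      row_sep = $&
      break
    end
  end
  io.pos = saved_pos # assumption: the stream is seekable
  row_sep || default
end

# Hypothetical helper mirroring the quoting rules: nil becomes an empty
# unquoted field, empty strings and fields containing special characters are
# quoted, and embedded quote characters are doubled.
def quote_field(field, col_sep: ",", quote_char: '"', force_quotes: false)
  return "" if field.nil? && !force_quotes
  field = String(field)
  quoted = quote_char + field.gsub(quote_char, quote_char * 2) + quote_char
  return quoted if force_quotes
  quotable = "\r\n" + col_sep + quote_char
  (field.empty? || field.count(quotable).nonzero?) ? quoted : field
end

detect_row_sep(StringIO.new("a,b\r\nc,d\r\n")) # => "\r\n"
quote_field("a,b")                             # => "\"a,b\""
quote_field("plain")                           # => "plain"
```

The sketch falls back to the default separator when no line ending is found, which corresponds to the `ensure` branch in the method above.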
2045
+
2046
+ # Pre-compiles parsers and stores them by name for access during reads.
2047
+ def init_parsers(options)
2048
+ # store the parser behaviors
2049
+ @skip_blanks = options.delete(:skip_blanks)
2050
+ @field_size_limit = options.delete(:field_size_limit)
2051
+
2052
+ # prebuild Regexps for faster parsing
2053
+ esc_row_sep = escape_re(@row_sep)
2054
+ esc_quote = escape_re(@quote_char)
2055
+ @parsers = {
2056
+ # for detecting parse errors
2057
+ quote_or_nl: encode_re("[", esc_quote, "\r\n]"),
2058
+ nl_or_lf: encode_re("[\r\n]"),
2059
+ stray_quote: encode_re( "[^", esc_quote, "]", esc_quote,
2060
+ "[^", esc_quote, "]" ),
2061
+ # safer than chomp!()
2062
+ line_end: encode_re(esc_row_sep, "\\z"),
2063
+ # illegal unquoted characters
2064
+ return_newline: encode_str("\r\n")
2065
+ }
2066
+ end
2067
+
2068
+ #
2069
+ # Loads any converters requested during construction.
2070
+ #
2071
+ # If +field_name+ is set to <tt>:converters</tt> (the default), field converters
2072
+ # are set. When +field_name+ is <tt>:header_converters</tt> header converters
2073
+ # are added instead.
2074
+ #
2075
+ # The <tt>:unconverted_fields</tt> option is also activated for
2076
+ # <tt>:converters</tt> calls, if requested.
2077
+ #
2078
+ def init_converters(options, field_name = :converters)
2079
+ if field_name == :converters
2080
+ @unconverted_fields = options.delete(:unconverted_fields)
2081
+ end
2082
+
2083
+ instance_variable_set("@#{field_name}", Array.new)
2084
+
2085
+ # find the correct method to add the converters
2086
+ convert = method(field_name.to_s.sub(/ers\Z/, ""))
2087
+
2088
+ # load converters
2089
+ unless options[field_name].nil?
2090
+ # allow a single converter not wrapped in an Array
2091
+ unless options[field_name].is_a? Array
2092
+ options[field_name] = [options[field_name]]
2093
+ end
2094
+ # load each converter...
2095
+ options[field_name].each do |converter|
2096
+ if converter.is_a? Proc # custom code block
2097
+ convert.call(&converter)
2098
+ else # by name
2099
+ convert.call(converter)
2100
+ end
952
2101
  end
953
2102
  end
954
-
955
- def idx_is_eos?(idx)
956
- (@is_eos and ((@cur_buf < 0) or (@cur_buf == @buf_tail_idx)))
2103
+
2104
+ options.delete(field_name)
2105
+ end
2106
+
2107
+ # Stores header row settings and loads header converters, if needed.
2108
+ def init_headers(options)
2109
+ @use_headers = options.delete(:headers)
2110
+ @return_headers = options.delete(:return_headers)
2111
+ @write_headers = options.delete(:write_headers)
2112
+
2113
+ # headers must be delayed until shift(), in case they need a row of content
2114
+ @headers = nil
2115
+
2116
+ init_converters(options, :header_converters)
2117
+ end
2118
+
2119
+ # Stores the pattern of comments to skip from the provided options.
2120
+ #
2121
+ # The pattern must respond to +.match+, else ArgumentError is raised.
2122
+ #
2123
+ # See also CSV.new
2124
+ def init_comments(options)
2125
+ @skip_lines = options.delete(:skip_lines)
2126
+ if @skip_lines and not @skip_lines.respond_to?(:match)
2127
+ raise ArgumentError, ":skip_lines has to respond to matches"
2128
+ end
2129
+ end
2130
+ #
2131
+ # The actual work method for adding converters, used by both CSV.convert() and
2132
+ # CSV.header_convert().
2133
+ #
2134
+ # This method requires the +var_name+ of the instance variable to place the
2135
+ # converters in, the +const+ Hash to lookup named converters in, and the
2136
+ # normal parameters of the CSV.convert() and CSV.header_convert() methods.
2137
+ #
2138
+ def add_converter(var_name, const, name = nil, &converter)
2139
+ if name.nil? # custom converter
2140
+ instance_variable_get("@#{var_name}") << converter
2141
+ else # named converter
2142
+ combo = const[name]
2143
+ case combo
2144
+ when Array # combo converter
2145
+ combo.each do |converter_name|
2146
+ add_converter(var_name, const, converter_name)
2147
+ end
2148
+ else # individual named converter
2149
+ instance_variable_get("@#{var_name}") << combo
2150
+ end
957
2151
  end
958
-
959
- BufSize = 1024 * 8
960
2152
  end
961
2153
 
962
- # Buffered IO.
963
2154
  #
964
- # EXAMPLE
965
- # # File 'bigdata' could be a giga-byte size one!
966
- # buf = CSV::IOBuf.new(File.open('bigdata', 'rb'))
967
- # CSV::Reader.new(buf).each do |row|
968
- # p row
969
- # break if row[0].data == 'admin'
970
- # end
2155
+ # Processes +fields+ with <tt>@converters</tt>, or <tt>@header_converters</tt>
2156
+ # if +headers+ is passed as +true+, returning the converted field set. Any
2157
+ # converter that changes the field into something other than a String halts
2158
+ # the pipeline of conversion for that field. This is primarily an efficiency
2159
+ # shortcut.
971
2160
  #
972
- class IOBuf < StreamBuf
973
- def initialize(s)
974
- @s = s
975
- super()
2161
+ def convert_fields(fields, headers = false)
2162
+ # see if we are converting headers or fields
2163
+ converters = headers ? @header_converters : @converters
2164
+
2165
+ fields.map.with_index do |field, index|
2166
+ converters.each do |converter|
2167
+ field = if converter.arity == 1 # straight field converter
2168
+ converter[field]
2169
+ else # FieldInfo converter
2170
+ header = @use_headers && !headers ? @headers[index] : nil
2171
+ converter[field, FieldInfo.new(index, lineno, header)]
2172
+ end
2173
+ break unless field.is_a? String # short-circuit pipeline for speed
2174
+ end
2175
+ field # final state of each field, converted or original
976
2176
  end
977
-
978
- def close
979
- terminate
2177
+ end
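The short-circuit behaviour of the conversion pipeline is easier to see in isolation. The following is a simplified, standalone sketch: the converters are passed in explicitly, a plain Hash with an `:index` key stands in for `FieldInfo`, and the three lambdas are made up for illustration.

```ruby
# Simplified stand-in for the method above: run each field through the
# converters in order and stop as soon as the field is no longer a String.
def convert_fields(fields, converters)
  fields.map.with_index do |field, index|
    converters.each do |converter|
      field = if converter.arity == 1
        converter.call(field)                   # straight field converter
      else
        converter.call(field, { index: index }) # converter that wants context
      end
      break unless field.is_a?(String) # converted value skips later converters
    end
    field
  end
end

# Illustrative converters; each returns the field unchanged when it does not apply.
to_integer = lambda { |f| Integer(f) rescue f }
to_float   = lambda { |f| Float(f) rescue f }
tag_index  = lambda { |f, info| "#{info[:index]}:#{f}" }

convert_fields(["1", "2.5", "x"], [to_integer, to_float]) # => [1, 2.5, "x"]
convert_fields(["a", "b"], [tag_index])                   # => ["0:a", "1:b"]
```

In the first call, `"1"` becomes an Integer in the first converter and never reaches `to_float`, while `"2.5"` passes through `to_integer` unchanged and is converted by the second.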
2178
+
2179
+ #
2180
+ # This method is used to turn a finished +row+ into a CSV::Row. Header rows
2181
+ # are also dealt with here, either by returning a CSV::Row with identical
2182
+ # headers and fields (save that the fields do not go through the converters)
2183
+ # or by reading past them to return a field row. Headers are also saved in
2184
+ # <tt>@headers</tt> for use in future rows.
2185
+ #
2186
+ # When +nil+, +row+ is assumed to be a header row not based on an actual row
2187
+ # of the stream.
2188
+ #
2189
+ def parse_headers(row = nil)
2190
+ if @headers.nil? # header row
2191
+ @headers = case @use_headers # save headers
2192
+ # Array of headers
2193
+ when Array then @use_headers
2194
+ # CSV header String
2195
+ when String
2196
+ self.class.parse_line( @use_headers,
2197
+ col_sep: @col_sep,
2198
+ row_sep: @row_sep,
2199
+ quote_char: @quote_char )
2200
+ # first row is headers
2201
+ else row
2202
+ end
2203
+
2204
+ # prepare converted and unconverted copies
2205
+ row = @headers if row.nil?
2206
+ @headers = convert_fields(@headers, true)
2207
+
2208
+ if @return_headers # return headers
2209
+ return self.class::Row.new(@headers, row, true)
2210
+ elsif not [Array, String].include? @use_headers.class # skip to field row
2211
+ return shift
2212
+ end
980
2213
  end
981
2214
 
982
- private
2215
+ self.class::Row.new(@headers, convert_fields(row)) # field row
2216
+ end
983
2217
 
984
- def read(size)
985
- @s.read(size)
2218
+ #
2219
+ # This method injects an instance variable <tt>unconverted_fields</tt> into
2220
+ # +row+ and an accessor method for +row+ called unconverted_fields(). The
2221
+ # variable is set to the contents of +fields+.
2222
+ #
2223
+ def add_unconverted_fields(row, fields)
2224
+ class << row
2225
+ attr_reader :unconverted_fields
986
2226
  end
987
-
988
- def terminate
989
- super()
2227
+ row.instance_eval { @unconverted_fields = fields }
2228
+ row
2229
+ end
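The per-object accessor trick used here works on any Ruby object, not only on CSV rows. A small standalone sketch, using a plain Array in place of a row and `instance_variable_set` in place of `instance_eval`:

```ruby
# Add a reader to one object's singleton class, so other instances of the
# same class are left untouched.
def add_unconverted_fields(row, fields)
  class << row
    attr_reader :unconverted_fields
  end
  row.instance_variable_set(:@unconverted_fields, fields)
  row
end

row = add_unconverted_fields([1, 2.5], ["1", "2.5"])
row.unconverted_fields                # => ["1", "2.5"]
[].respond_to?(:unconverted_fields)   # => false
```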
2230
+
2231
+ #
2232
+ # This method is an encoding safe version of Regexp::escape(). It will escape
2233
+ # any characters that would change the meaning of a regular expression in the
2234
+ # encoding of +str+. Regular expression characters that cannot be transcoded
2235
+ # to the target encoding will be skipped and no escaping will be performed if
2236
+ # a backslash cannot be transcoded.
2237
+ #
2238
+ def escape_re(str)
2239
+ str.gsub(@re_chars) {|c| @re_esc + c}
2240
+ end
2241
+
2242
+ #
2243
+ # Builds a regular expression in <tt>@encoding</tt>. All +chunks+ will be
2244
+ # transcoded to that encoding.
2245
+ #
2246
+ def encode_re(*chunks)
2247
+ Regexp.new(encode_str(*chunks))
2248
+ end
2249
+
2250
+ #
2251
+ # Builds a String in <tt>@encoding</tt>. All +chunks+ will be transcoded to
2252
+ # that encoding.
2253
+ #
2254
+ def encode_str(*chunks)
2255
+ chunks.map { |chunk| chunk.encode(@encoding.name) }.join('')
2256
+ end
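The two encoding helpers can be tried on their own with the target encoding passed as an argument. This is a sketch under that assumption (the methods above read it from `@encoding`), and it only covers ASCII-compatible encodings such as ISO-8859-1.

```ruby
# Transcode every chunk to the target encoding, then concatenate.
def encode_str(encoding, *chunks)
  chunks.map { |chunk| chunk.encode(encoding) }.join
end

# Build a Regexp from chunks that have been transcoded to the target encoding.
def encode_re(encoding, *chunks)
  Regexp.new(encode_str(encoding, *chunks))
end

latin1 = Encoding::ISO_8859_1
newline = encode_re(latin1, "[", "\r\n", "]")
newline.match?("a\nb".encode(latin1))                        # => true
encode_re(latin1, "\u00e9").match?("caf\u00e9".encode(latin1)) # => true
```

Building the pattern in the data's own encoding matters once non-ASCII characters are involved: a Regexp containing a Latin-1 byte cannot be matched against a UTF-8 String holding non-ASCII characters.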
2257
+
2258
+ private
2259
+
2260
+ #
2261
+ # Returns the encoding of the internal IO object or the +default+ if the
2262
+ # encoding cannot be determined.
2263
+ #
2264
+ def raw_encoding(default = Encoding::ASCII_8BIT)
2265
+ if @io.respond_to? :internal_encoding
2266
+ @io.internal_encoding || @io.external_encoding
2267
+ elsif @io.is_a? StringIO
2268
+ @io.string.encoding
2269
+ elsif @io.respond_to? :encoding
2270
+ @io.encoding
2271
+ else
2272
+ default
990
2273
  end
991
2274
  end
992
2275
  end
2276
+
2277
+ # Passes +args+ to CSV::instance.
2278
+ #
2279
+ # CSV("CSV,data").read
2280
+ # #=> [["CSV", "data"]]
2281
+ #
2282
+ # If a block is given, the instance is passed the block and the return value
2283
+ # becomes the return value of the block.
2284
+ #
2285
+ # CSV("CSV,data") { |c|
2286
+ # c.read.any? { |a| a.include?("data") }
2287
+ # } #=> true
2288
+ #
2289
+ # CSV("CSV,data") { |c|
2290
+ # c.read.any? { |a| a.include?("zombies") }
2291
+ # } #=> false
2292
+ #
2293
+ def CSV(*args, &block)
2294
+ CSV.instance(*args, &block)
2295
+ end
2296
+
2297
+ class Array # :nodoc:
2298
+ # Equivalent to CSV::generate_line(self, options)
2299
+ #
2300
+ # ["CSV", "data"].to_csv
2301
+ # #=> "CSV,data\n"
2302
+ def to_csv(options = Hash.new)
2303
+ CSV.generate_line(self, options)
2304
+ end
2305
+ end
2306
+
2307
+ class String # :nodoc:
2308
+ # Equivalent to CSV::parse_line(self, options)
2309
+ #
2310
+ # "CSV,data".parse_csv
2311
+ # #=> ["CSV", "data"]
2312
+ def parse_csv(options = Hash.new)
2313
+ CSV.parse_line(self, options)
2314
+ end
2315
+ end