rubysl-csv 1.0.1 → 2.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 67c6fb34a5cd1839b27f1b3db51b962e4966fe3c
4
- data.tar.gz: c5c5befc7ebcf24aa4b45b7be33ae659d4680262
3
+ metadata.gz: 06b35a01dd6e04add0701020be3f918d840698d4
4
+ data.tar.gz: b8addc03698ccd506f75a0370204553710dcfe1c
5
5
  SHA512:
6
- metadata.gz: 5444f8467b23724368c2d33e4cd0ff4666b58a354194ffada01da0c625a618cf6d23d3abe8caf3256b73d7d89c94fb6fef841745a405747b855e8bd92a97dbad
7
- data.tar.gz: ab7bbb1b410705d370f4a4aa362bb39541113f9c2134d4ff94dd02d36c15b9a4ac8c7effa9009db0f422865c5720ef0ded4f2b4427509b45030d7ed849d3beb2
6
+ metadata.gz: 12419103eb8b2833f0aca7c11477c69e7175d0e1e8964616b5077dd9ef84e028b93dfbd6ddfc0e1f0f8d25d9939cd52bf4088ba708d3958271fc4ea3b663b27c
7
+ data.tar.gz: f388bfd4ba16eae0714a9fed2031f2175b4a435f9176c39d9155d1a07bad33ba882e72a94c3d370b22f0ac08a15b3f036c6a0fff0ff92a6910b77b320bf7c730
data/.gitignore CHANGED
@@ -15,4 +15,3 @@ spec/reports
15
15
  test/tmp
16
16
  test/version_tmp
17
17
  tmp
18
- .rbx
@@ -1,8 +1,9 @@
1
1
  language: ruby
2
2
  before_install:
3
+ - rvm use $RVM --install --binary --fuzzy
3
4
  - gem update --system
4
5
  - gem --version
5
6
  - gem install rubysl-bundler
7
+ env:
8
+ - RVM=rbx-nightly-d21 RUBYLIB=lib
6
9
  script: bundle exec mspec spec
7
- rvm:
8
- - rbx-nightly-18mode
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- # RubySL::Csv
1
+ # Rubysl::Csv
2
2
 
3
3
  TODO: Write a gem description
4
4
 
@@ -24,6 +24,6 @@ TODO: Write usage instructions here
24
24
 
25
25
  1. Fork it
26
26
  2. Create your feature branch (`git checkout -b my-new-feature`)
27
- 3. Commit your changes (`git commit -am 'Added some feature'`)
27
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
28
28
  4. Push to the branch (`git push origin my-new-feature`)
29
29
  5. Create new Pull Request
data/Rakefile CHANGED
@@ -1,2 +1 @@
1
- #!/usr/bin/env rake
2
1
  require "bundler/gem_tasks"
@@ -1,992 +1,2315 @@
1
- # CSV -- module for generating/parsing CSV data.
2
- # Copyright (C) 2000-2004 NAKAMURA, Hiroshi <nakahiro@sarion.co.jp>.
3
-
4
- # $Id: csv.rb 11708 2007-02-12 23:01:19Z shyouhei $
5
-
6
- # This program is copyrighted free software by NAKAMURA, Hiroshi. You can
7
- # redistribute it and/or modify it under the same terms of Ruby's license;
8
- # either the dual license version in 2003, or any later version.
9
-
10
-
1
+ # encoding: US-ASCII
2
+ # = csv.rb -- CSV Reading and Writing
3
+ #
4
+ # Created by James Edward Gray II on 2005-10-31.
5
+ # Copyright 2005 James Edward Gray II. You can redistribute or modify this code
6
+ # under the terms of Ruby's license.
7
+ #
8
+ # See CSV for documentation.
9
+ #
10
+ # == Description
11
+ #
12
+ # Welcome to the new and improved CSV.
13
+ #
14
+ # This version of the CSV library began its life as FasterCSV. FasterCSV was
15
+ # intended as a replacement to Ruby's then standard CSV library. It was
16
+ # designed to address concerns users of that library had and it had three
17
+ # primary goals:
18
+ #
19
+ # 1. Be significantly faster than CSV while remaining a pure Ruby library.
20
+ # 2. Use a smaller and easier to maintain code base. (FasterCSV eventually
21
+ # grew larger, but was also considerably richer in features. The parsing
22
+ # core remains quite small.)
23
+ # 3. Improve on the CSV interface.
24
+ #
25
+ # Obviously, the last one is subjective. I did try to defer to the original
26
+ # interface whenever I didn't have a compelling reason to change it though, so
27
+ # hopefully this won't be too radically different.
28
+ #
29
+ # We must have met our goals because FasterCSV was renamed to CSV and replaced
30
+ # the original library as of Ruby 1.9. If you are migrating code from 1.8 or
31
+ # earlier, you may have to change your code to comply with the new interface.
32
+ #
33
+ # == What's Different From the Old CSV?
34
+ #
35
+ # I'm sure I'll miss something, but I'll try to mention most of the major
36
+ # differences I am aware of, to help others quickly get up to speed:
37
+ #
38
+ # === CSV Parsing
39
+ #
40
+ # * This parser is m17n aware. See CSV for full details.
41
+ # * This library has a stricter parser and will throw MalformedCSVErrors on
42
+ # problematic data.
43
+ # * This library has a less liberal idea of a line ending than CSV. What you
44
+ # set as the <tt>:row_sep</tt> is law. It can auto-detect your line endings
45
+ # though.
46
+ # * The old library returned empty lines as <tt>[nil]</tt>. This library calls
47
+ # them <tt>[]</tt>.
48
+ # * This library has a much faster parser.
49
+ #
50
+ # === Interface
51
+ #
52
+ # * CSV now uses Hash-style parameters to set options.
53
+ # * CSV no longer has generate_row() or parse_row().
54
+ # * The old CSV's Reader and Writer classes have been dropped.
55
+ # * CSV::open() is now more like Ruby's open().
56
+ # * CSV objects now support most standard IO methods.
57
+ # * CSV now has a new() method used to wrap objects like String and IO for
58
+ # reading and writing.
59
+ # * CSV::generate() is different from the old method.
60
+ # * CSV no longer supports partial reads. It works line-by-line.
61
+ # * CSV no longer allows the instance methods to override the separators for
62
+ # performance reasons. They must be set in the constructor.
63
+ #
64
+ # If you use this library and find yourself missing any functionality I have
65
+ # trimmed, please {let me know}[mailto:james@grayproductions.net].
66
+ #
67
+ # == Documentation
68
+ #
69
+ # See CSV for documentation.
70
+ #
71
+ # == What is CSV, really?
72
+ #
73
+ # CSV maintains a pretty strict definition of CSV taken directly from
74
+ # {the RFC}[http://www.ietf.org/rfc/rfc4180.txt]. I relax the rules in only one
75
+ # place and that is to make using this library easier. CSV will parse all valid
76
+ # CSV.
77
+ #
78
+ # What you don't want to do is feed CSV invalid data. Because of the way the
79
+ # CSV format works, it's common for a parser to need to read until the end of
80
+ # the file to be sure a field is invalid. This eats a lot of time and memory.
81
+ #
82
+ # Luckily, when working with invalid CSV, Ruby's built-in methods will almost
83
+ # always be superior in every way. For example, parsing non-quoted fields is as
84
+ # easy as:
85
+ #
86
+ # data.split(",")
87
+ #
88
+ # == Questions and/or Comments
89
+ #
90
+ # Feel free to email {James Edward Gray II}[mailto:james@grayproductions.net]
91
+ # with any questions.
92
+
93
+ require "forwardable"
94
+ require "English"
95
+ require "date"
96
+ require "stringio"
97
+
98
+ #
99
+ # This class provides a complete interface to CSV files and data. It offers
100
+ # tools to enable you to read and write to and from Strings or IO objects, as
101
+ # needed.
102
+ #
103
+ # == Reading
104
+ #
105
+ # === From a File
106
+ #
107
+ # ==== A Line at a Time
108
+ #
109
+ # CSV.foreach("path/to/file.csv") do |row|
110
+ # # use row here...
111
+ # end
112
+ #
113
+ # ==== All at Once
114
+ #
115
+ # arr_of_arrs = CSV.read("path/to/file.csv")
116
+ #
117
+ # === From a String
118
+ #
119
+ # ==== A Line at a Time
120
+ #
121
+ # CSV.parse("CSV,data,String") do |row|
122
+ # # use row here...
123
+ # end
124
+ #
125
+ # ==== All at Once
126
+ #
127
+ # arr_of_arrs = CSV.parse("CSV,data,String")
128
+ #
129
+ # == Writing
130
+ #
131
+ # === To a File
132
+ #
133
+ # CSV.open("path/to/file.csv", "wb") do |csv|
134
+ # csv << ["row", "of", "CSV", "data"]
135
+ # csv << ["another", "row"]
136
+ # # ...
137
+ # end
138
+ #
139
+ # === To a String
140
+ #
141
+ # csv_string = CSV.generate do |csv|
142
+ # csv << ["row", "of", "CSV", "data"]
143
+ # csv << ["another", "row"]
144
+ # # ...
145
+ # end
146
+ #
147
+ # == Convert a Single Line
148
+ #
149
+ # csv_string = ["CSV", "data"].to_csv # to CSV
150
+ # csv_array = "CSV,String".parse_csv # from CSV
151
+ #
152
+ # == Shortcut Interface
153
+ #
154
+ # CSV { |csv_out| csv_out << %w{my data here} } # to $stdout
155
+ # CSV(csv = "") { |csv_str| csv_str << %w{my data here} } # to a String
156
+ # CSV($stderr) { |csv_err| csv_err << %w{my data here} } # to $stderr
157
+ # CSV($stdin) { |csv_in| csv_in.each { |row| p row } } # from $stdin
158
+ #
159
+ # == Advanced Usage
160
+ #
161
+ # === Wrap an IO Object
162
+ #
163
+ # csv = CSV.new(io, options)
164
+ # # ... read (with gets() or each()) from and write (with <<) to csv here ...
165
+ #
166
+ # == CSV and Character Encodings (M17n or Multilingualization)
167
+ #
168
+ # This new CSV parser is m17n savvy. The parser works in the Encoding of the IO
169
+ # or String object being read from or written to. Your data is never transcoded
170
+ # (unless you ask Ruby to transcode it for you) and will literally be parsed in
171
+ # the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the
172
+ # Encoding of your data. This is accomplished by transcoding the parser itself
173
+ # into your Encoding.
174
+ #
175
+ # Some transcoding must take place, of course, to accomplish this multiencoding
176
+ # support. For example, <tt>:col_sep</tt>, <tt>:row_sep</tt>, and
177
+ # <tt>:quote_char</tt> must be transcoded to match your data. Hopefully this
178
+ # makes the entire process feel transparent, since CSV's defaults should just
179
+ # magically work for your data. However, you can set these values manually in
180
+ # the target Encoding to avoid the translation.
181
+ #
182
+ # It's also important to note that while all of CSV's core parser is now
183
+ # Encoding agnostic, some features are not. For example, the built-in
184
+ # converters will try to transcode data to UTF-8 before making conversions.
185
+ # Again, you can provide custom converters that are aware of your Encodings to
186
+ # avoid this translation. It's just too hard for me to support native
187
+ # conversions in all of Ruby's Encodings.
188
+ #
189
+ # Anyway, the practical side of this is simple: make sure IO and String objects
190
+ # passed into CSV have the proper Encoding set and everything should just work.
191
+ # CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(),
192
+ # CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
193
+ #
194
+ # One minor exception comes when generating CSV into a String with an Encoding
195
+ # that is not ASCII compatible. There's no existing data for CSV to use to
196
+ # prepare itself and thus you will probably need to manually specify the desired
197
+ # Encoding for most of those cases. It will try to guess using the fields in a
198
+ # row of output though, when using CSV::generate_line() or Array#to_csv().
199
+ #
200
+ # I try to point out any other Encoding issues in the documentation of methods
201
+ # as they come up.
202
+ #
203
+ # This has been tested to the best of my ability with all non-"dummy" Encodings
204
+ # Ruby ships with. However, it is brave new code and may have some bugs.
205
+ # Please feel free to {report}[mailto:james@grayproductions.net] any issues you
206
+ # find with it.
207
+ #
11
208
  class CSV
12
- class IllegalFormatError < RuntimeError; end
209
+ # The version of the installed library.
210
+ VERSION = "2.4.8".freeze
211
+
212
+ #
213
+ # A CSV::Row is part Array and part Hash. It retains an order for the fields
214
+ # and allows duplicates just as an Array would, but also allows you to access
215
+ # fields by name just as you could if they were in a Hash.
216
+ #
217
+ # All rows returned by CSV will be constructed from this class, if header row
218
+ # processing is activated.
219
+ #
220
+ class Row
221
+ #
222
+ # Construct a new CSV::Row from +headers+ and +fields+, which are expected
223
+ # to be Arrays. If one Array is shorter than the other, it will be padded
224
+ # with +nil+ objects.
225
+ #
226
+ # The optional +header_row+ parameter can be set to +true+ to indicate, via
227
+ # CSV::Row.header_row?() and CSV::Row.field_row?(), that this is a header
228
+ # row. Otherwise, the row is assumed to be a field row.
229
+ #
230
+ # A CSV::Row object supports the following Array methods through delegation:
231
+ #
232
+ # * empty?()
233
+ # * length()
234
+ # * size()
235
+ #
236
+ def initialize(headers, fields, header_row = false)
237
+ @header_row = header_row
13
238
 
14
- # deprecated
15
- class Cell < String
16
- def initialize(data = "", is_null = false)
17
- super(is_null ? "" : data)
239
+ # handle extra headers or fields
240
+ @row = if headers.size > fields.size
241
+ headers.zip(fields)
242
+ else
243
+ fields.zip(headers).map { |pair| pair.reverse }
244
+ end
18
245
  end
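Note on the new `CSV::Row#initialize`: the padding rule is easiest to see with Arrays of different lengths. The following is a minimal sketch, assuming the `csv` library can be loaded with `require "csv"`; the header names and values are made up for illustration.

```ruby
require "csv"

# More headers than fields: the missing field is padded with nil.
short_fields = CSV::Row.new(%w[name age city], ["Ann", 30])
p short_fields.fields    # ["Ann", 30, nil]

# More fields than headers: the missing header is padded with nil.
short_headers = CSV::Row.new(%w[name], ["Ann", 30])
p short_headers.headers  # ["name", nil]

# The third argument only flags the row type; the default is a field row.
p CSV::Row.new(%w[name], ["name"], true).header_row?  # true
p short_fields.field_row?                             # true
```

Either way the row ends up as long as the longer of the two Arrays, so `size` reports the padded length.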
19
246
 
20
- def data
21
- to_s
247
+ # Internal data format used to compare equality.
248
+ attr_reader :row
249
+ protected :row
250
+
251
+ ### Array Delegation ###
252
+
253
+ extend Forwardable
254
+ def_delegators :@row, :empty?, :length, :size
255
+
256
+ # Returns +true+ if this is a header row.
257
+ def header_row?
258
+ @header_row
22
259
  end
23
- end
24
260
 
25
- # deprecated
26
- class Row < Array
27
- end
261
+ # Returns +true+ if this is a field row.
262
+ def field_row?
263
+ not header_row?
264
+ end
28
265
 
29
- # Open a CSV formatted file for reading or writing.
30
- #
31
- # For reading.
32
- #
33
- # EXAMPLE 1
34
- # CSV.open('csvfile.csv', 'r') do |row|
35
- # p row
36
- # end
37
- #
38
- # EXAMPLE 2
39
- # reader = CSV.open('csvfile.csv', 'r')
40
- # row1 = reader.shift
41
- # row2 = reader.shift
42
- # if row2.empty?
43
- # p 'row2 not find.'
44
- # end
45
- # reader.close
46
- #
47
- # ARGS
48
- # filename: filename to parse.
49
- # col_sep: Column separator. ?, by default. If you want to separate
50
- # fields with semicolon, give ?; here.
51
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
52
- # want to separate records with \r, give ?\r here.
53
- #
54
- # RETURNS
55
- # reader instance. To get parse result, see CSV::Reader#each.
56
- #
57
- #
58
- # For writing.
59
- #
60
- # EXAMPLE 1
61
- # CSV.open('csvfile.csv', 'w') do |writer|
62
- # writer << ['r1c1', 'r1c2']
63
- # writer << ['r2c1', 'r2c2']
64
- # writer << [nil, nil]
65
- # end
66
- #
67
- # EXAMPLE 2
68
- # writer = CSV.open('csvfile.csv', 'w')
69
- # writer << ['r1c1', 'r1c2'] << ['r2c1', 'r2c2'] << [nil, nil]
70
- # writer.close
71
- #
72
- # ARGS
73
- # filename: filename to generate.
74
- # col_sep: Column separator. ?, by default. If you want to separate
75
- # fields with semicolon, give ?; here.
76
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
77
- # want to separate records with \r, give ?\r here.
78
- #
79
- # RETURNS
80
- # writer instance. See CSV::Writer#<< and CSV::Writer#add_row to know how
81
- # to generate CSV string.
82
- #
83
- def CSV.open(path, mode, fs = nil, rs = nil, &block)
84
- if mode == 'r' or mode == 'rb'
85
- open_reader(path, mode, fs, rs, &block)
86
- elsif mode == 'w' or mode == 'wb'
87
- open_writer(path, mode, fs, rs, &block)
88
- else
89
- raise ArgumentError.new("'mode' must be 'r', 'rb', 'w', or 'wb'")
266
+ # Returns the headers of this row.
267
+ def headers
268
+ @row.map { |pair| pair.first }
90
269
  end
91
- end
92
270
 
93
- def CSV.foreach(path, rs = nil, &block)
94
- open_reader(path, 'r', ',', rs, &block)
95
- end
271
+ #
272
+ # :call-seq:
273
+ # field( header )
274
+ # field( header, offset )
275
+ # field( index )
276
+ #
277
+ # This method will return the field value by +header+ or +index+. If a field
278
+ # is not found, +nil+ is returned.
279
+ #
280
+ # When provided, +offset+ ensures that a header match occurs on or later
281
+ # than the +offset+ index. You can use this to find duplicate headers,
282
+ # without resorting to hard-coding exact indices.
283
+ #
284
+ def field(header_or_index, minimum_index = 0)
285
+ # locate the pair
286
+ finder = header_or_index.is_a?(Integer) ? :[] : :assoc
287
+ pair = @row[minimum_index..-1].send(finder, header_or_index)
96
288
 
97
- def CSV.read(path, length = nil, offset = nil)
98
- CSV.parse(IO.read(path, length, offset))
99
- end
100
-
101
- def CSV.readlines(path, rs = nil)
102
- reader = open_reader(path, 'r', ',', rs)
103
- begin
104
- reader.collect { |row| row }
105
- ensure
106
- reader.close
289
+ # return the field if we have a pair
290
+ pair.nil? ? nil : pair.last
107
291
  end
108
- end
292
+ alias_method :[], :field
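Note on `field()` and its `[]` alias: the `offset` argument matters mainly when a header appears more than once. A small sketch under the same assumption that `require "csv"` works; the row contents are invented for the example.

```ruby
require "csv"

row = CSV::Row.new(%w[id tag tag], [7, "red", "blue"])

p row.field("tag")     # "red"  (first matching header)
p row.field("tag", 2)  # "blue" (first match at index 2 or later)
p row.field(1)         # "red"  (plain positional lookup)
p row["missing"]       # nil    ([] is an alias for field)
```

This lets callers reach the second `tag` column without hard-coding its exact position.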
109
293
 
110
- def CSV.generate(path, fs = nil, rs = nil, &block)
111
- open_writer(path, 'w', fs, rs, &block)
112
- end
294
+ #
295
+ # :call-seq:
296
+ # fetch( header )
297
+ # fetch( header ) { |row| ... }
298
+ # fetch( header, default )
299
+ #
300
+ # This method will fetch the field value by +header+. It has the same
301
+ # behavior as Hash#fetch: if there is a field with the given +header+, its
302
+ # value is returned. Otherwise, if a block is given, it is yielded the
303
+ # +header+ and its result is returned; if a +default+ is given as the
304
+ # second argument, it is returned; otherwise a KeyError is raised.
305
+ #
306
+ def fetch(header, *varargs)
307
+ raise ArgumentError, "Too many arguments" if varargs.length > 1
308
+ pair = @row.assoc(header)
309
+ if pair
310
+ pair.last
311
+ else
312
+ if block_given?
313
+ yield header
314
+ elsif varargs.empty?
315
+ raise KeyError, "key not found: #{header}"
316
+ else
317
+ varargs.first
318
+ end
319
+ end
320
+ end
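Note on `fetch()`: the four outcomes described in the comment (hit, block, default, `KeyError`) can be exercised as follows. This is an illustrative sketch with made-up data, assuming `require "csv"` is available.

```ruby
require "csv"

row = CSV::Row.new(%w[id name], [1, "Ann"])

p row.fetch("name")                       # "Ann"      (header present)
p row.fetch("email", "n/a")               # "n/a"      (explicit default)
p(row.fetch("email") { |h| "no #{h}" })   # "no email" (block result)

begin
  row.fetch("email")                      # no block, no default
rescue KeyError => e
  puts "KeyError: #{e.message}"
end
```

Unlike `field()`, which quietly returns `nil`, `fetch()` makes a missing header an explicit error unless a fallback is supplied.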
113
321
 
114
- # Parse lines from given string or stream. Return rows as an Array of Arrays.
115
- def CSV.parse(str_or_readable, fs = nil, rs = nil, &block)
116
- if File.exist?(str_or_readable)
117
- STDERR.puts("CSV.parse(filename) is deprecated." +
118
- " Use CSV.open(filename, 'r') instead.")
119
- return open_reader(str_or_readable, 'r', fs, rs, &block)
322
+ # Returns +true+ if there is a field with the given +header+.
323
+ def has_key?(header)
324
+ !!@row.assoc(header)
120
325
  end
121
- if block
122
- CSV::Reader.parse(str_or_readable, fs, rs) do |row|
123
- yield(row)
326
+ alias_method :include?, :has_key?
327
+ alias_method :key?, :has_key?
328
+ alias_method :member?, :has_key?
329
+
330
+ #
331
+ # :call-seq:
332
+ # []=( header, value )
333
+ # []=( header, offset, value )
334
+ # []=( index, value )
335
+ #
336
+ # Looks up the field by the semantics described in CSV::Row.field() and
337
+ # assigns the +value+.
338
+ #
339
+ # Assigning past the end of the row with an index will set all pairs between
340
+ # to <tt>[nil, nil]</tt>. Assigning to an unused header appends the new
341
+ # pair.
342
+ #
343
+ def []=(*args)
344
+ value = args.pop
345
+
346
+ if args.first.is_a? Integer
347
+ if @row[args.first].nil? # extending past the end with index
348
+ @row[args.first] = [nil, value]
349
+ @row.map! { |pair| pair.nil? ? [nil, nil] : pair }
350
+ else # normal index assignment
351
+ @row[args.first][1] = value
352
+ end
353
+ else
354
+ index = index(*args)
355
+ if index.nil? # appending a field
356
+ self << [args.first, value]
357
+ else # normal header assignment
358
+ @row[index][1] = value
359
+ end
124
360
  end
125
- nil
126
- else
127
- CSV::Reader.create(str_or_readable, fs, rs).collect { |row| row }
128
361
  end
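Note on `[]=`: the three branches (existing header, unused header, index past the end) behave differently enough that an example helps. A minimal sketch with invented data, assuming `require "csv"` works:

```ruby
require "csv"

row = CSV::Row.new(%w[a b], [1, 2])

row["b"] = 20   # existing header: value replaced in place
row["c"] = 3    # unused header: a new pair is appended
row[4]   = 5    # index past the end: the gap is filled with [nil, nil]

p row.headers   # ["a", "b", "c", nil, nil]
p row.fields    # [1, 20, 3, nil, 5]
```

Index assignment past the end never invents a header, so both the gap and the new field carry a `nil` header.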
129
- end
130
362
 
131
- # Parse a line from given string. Bear in mind it parses ONE LINE. Rest of
132
- # the string is ignored for example "a,b\r\nc,d" => ['a', 'b'] and the
133
- # second line 'c,d' is ignored.
134
- #
135
- # If you don't know whether a target string to parse is exactly 1 line or
136
- # not, use CSV.parse_row instead of this method.
137
- def CSV.parse_line(src, fs = nil, rs = nil)
138
- fs ||= ','
139
- if fs.is_a?(Fixnum)
140
- fs = fs.chr
363
+ #
364
+ # :call-seq:
365
+ # <<( field )
366
+ # <<( header_and_field_array )
367
+ # <<( header_and_field_hash )
368
+ #
369
+ # If a two-element Array is provided, it is assumed to be a header and field
370
+ # and the pair is appended. A Hash works the same way with the key being
371
+ # the header and the value being the field. Anything else is assumed to be
372
+ # a lone field which is appended with a +nil+ header.
373
+ #
374
+ # This method returns the row for chaining.
375
+ #
376
+ def <<(arg)
377
+ if arg.is_a?(Array) and arg.size == 2 # appending a header and name
378
+ @row << arg
379
+ elsif arg.is_a?(Hash) # append header and name pairs
380
+ arg.each { |pair| @row << pair }
381
+ else # append field value
382
+ @row << [nil, arg]
383
+ end
384
+
385
+ self # for chaining
141
386
  end
142
- if !rs.nil? and rs.is_a?(Fixnum)
143
- rs = rs.chr
387
+
388
+ #
389
+ # A shortcut for appending multiple fields. Equivalent to:
390
+ #
391
+ # args.each { |arg| csv_row << arg }
392
+ #
393
+ # This method returns the row for chaining.
394
+ #
395
+ def push(*args)
396
+ args.each { |arg| self << arg }
397
+
398
+ self # for chaining
144
399
  end
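Note on `<<` and `push()`: the argument type decides how the value is appended. The sketch below uses made-up values and assumes `require "csv"` is available.

```ruby
require "csv"

row = CSV::Row.new([], [])

row << ["id", 1]            # two-element Array: header and field
row << { "name" => "Ann" }  # Hash: each key/value becomes a pair
row << "extra"              # anything else: lone field with a nil header
row.push(["x", 10], "y")    # push applies << to each argument in turn

p row.headers  # ["id", "name", nil, "x", nil]
p row.fields   # [1, "Ann", "extra", 10, "y"]
```

One consequence of the first branch: a field value that happens to be a two-element Array is taken as a header and field pair, not as a lone field.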
145
- idx = 0
146
- res_type = :DT_COLSEP
147
- row = []
148
- begin
149
- while res_type == :DT_COLSEP
150
- res_type, idx, cell = parse_body(src, idx, fs, rs)
151
- row << cell
400
+
401
+ #
402
+ # :call-seq:
403
+ # delete( header )
404
+ # delete( header, offset )
405
+ # delete( index )
406
+ #
407
+ # Used to remove a pair from the row by +header+ or +index+. The pair is
408
+ # located as described in CSV::Row.field(). The deleted pair is returned,
409
+ # or +nil+ if a pair could not be found.
410
+ #
411
+ def delete(header_or_index, minimum_index = 0)
412
+ if header_or_index.is_a? Integer # by index
413
+ @row.delete_at(header_or_index)
414
+ elsif i = index(header_or_index, minimum_index) # by header
415
+ @row.delete_at(i)
416
+ else
417
+ [ ]
152
418
  end
153
- rescue IllegalFormatError
154
- return []
155
419
  end
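Note on `delete()`: lookup follows the same header, offset and index rules as `field()`, and the removed pair is returned. An illustrative sketch with invented data, assuming `require "csv"` works:

```ruby
require "csv"

row = CSV::Row.new(%w[tag id tag], ["red", 7, "blue"])

p row.delete("tag", 1)  # ["tag", "blue"] (first "tag" at index 1 or later)
p row.delete(0)         # ["tag", "red"]  (by position)
p row.headers           # ["id"]
```

In the code shown in this diff, the not-found branch returns an empty Array (`[ ]`), while the comment above the method promises `nil`; callers may want to test the result with `empty?` or `nil?` accordingly.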
156
- row
157
- end
158
420
 
159
- # Create a line from cells. each cell is stringified by to_s.
160
- def CSV.generate_line(row, fs = nil, rs = nil)
161
- if row.size == 0
162
- return ''
163
- end
164
- fs ||= ','
165
- if fs.is_a?(Fixnum)
166
- fs = fs.chr
167
- end
168
- if !rs.nil? and rs.is_a?(Fixnum)
169
- rs = rs.chr
170
- end
171
- res_type = :DT_COLSEP
172
- result_str = ''
173
- idx = 0
174
- while true
175
- generate_body(row[idx], result_str, fs, rs)
176
- idx += 1
177
- if (idx == row.size)
178
- break
179
- end
180
- generate_separator(:DT_COLSEP, result_str, fs, rs)
181
- end
182
- result_str
183
- end
184
-
185
- # Parse a line from string. Consider using CSV.parse_line instead.
186
- # To parse lines in CSV string, see EXAMPLE below.
187
- #
188
- # EXAMPLE
189
- # src = "a,b\r\nc,d\r\ne,f"
190
- # idx = 0
191
- # begin
192
- # parsed = []
193
- # parsed_cells, idx = CSV.parse_row(src, idx, parsed)
194
- # puts "Parsed #{ parsed_cells } cells."
195
- # p parsed
196
- # end while parsed_cells > 0
197
- #
198
- # ARGS
199
- # src: a CSV data to be parsed. Must respond '[](idx)'.
200
- # src[](idx) must return a char. (Not a string such as 'a', but 97).
201
- # src[](idx_out_of_bounds) must return nil. A String satisfies this
202
- # requirement.
203
- # idx: index of parsing location of 'src'. 0 origin.
204
- # out_dev: buffer for parsed cells. Must respond '<<(aString)'.
205
- # col_sep: Column separator. ?, by default. If you want to separate
206
- # fields with semicolon, give ?; here.
207
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
208
- # want to separate records with \r, give ?\r here.
209
- #
210
- # RETURNS
211
- # parsed_cells: num of parsed cells.
212
- # idx: index of next parsing location of 'src'.
213
- #
214
- def CSV.parse_row(src, idx, out_dev, fs = nil, rs = nil)
215
- fs ||= ','
216
- if fs.is_a?(Fixnum)
217
- fs = fs.chr
218
- end
219
- if !rs.nil? and rs.is_a?(Fixnum)
220
- rs = rs.chr
221
- end
222
- idx_backup = idx
223
- parsed_cells = 0
224
- res_type = :DT_COLSEP
225
- begin
226
- while res_type != :DT_ROWSEP
227
- res_type, idx, cell = parse_body(src, idx, fs, rs)
228
- if res_type == :DT_EOS
229
- if idx == idx_backup #((parsed_cells == 0) and cell.nil?)
230
- return 0, 0
421
+ #
422
+ # The provided +block+ is passed a header and field for each pair in the row
423
+ # and expected to return +true+ or +false+, depending on whether the pair
424
+ # should be deleted.
425
+ #
426
+ # This method returns the row for chaining.
427
+ #
428
+ def delete_if(&block)
429
+ @row.delete_if(&block)
430
+
431
+ self # for chaining
432
+ end
433
+
434
+ #
435
+ # This method accepts any number of arguments which can be headers, indices,
436
+ # Ranges of either, or two-element Arrays containing a header and offset.
437
+ # Each argument will be replaced with a field lookup as described in
438
+ # CSV::Row.field().
439
+ #
440
+ # If called with no arguments, all fields are returned.
441
+ #
442
+ def fields(*headers_and_or_indices)
443
+ if headers_and_or_indices.empty? # return all fields--no arguments
444
+ @row.map { |pair| pair.last }
445
+ else # or work like values_at()
446
+ headers_and_or_indices.inject(Array.new) do |all, h_or_i|
447
+ all + if h_or_i.is_a? Range
448
+ index_begin = h_or_i.begin.is_a?(Integer) ? h_or_i.begin :
449
+ index(h_or_i.begin)
450
+ index_end = h_or_i.end.is_a?(Integer) ? h_or_i.end :
451
+ index(h_or_i.end)
452
+ new_range = h_or_i.exclude_end? ? (index_begin...index_end) :
453
+ (index_begin..index_end)
454
+ fields.values_at(new_range)
455
+ else
456
+ [field(*Array(h_or_i))]
231
457
  end
232
- res_type = :DT_ROWSEP
233
458
  end
234
- parsed_cells += 1
235
- out_dev << cell
236
- end
237
- rescue IllegalFormatError
238
- return 0, 0
239
- end
240
- return parsed_cells, idx
241
- end
242
-
243
- # Convert a line from cells data to string. Consider using CSV.generate_line
244
- # instead. To generate multi-row CSV string, see EXAMPLE below.
245
- #
246
- # EXAMPLE
247
- # row1 = ['a', 'b']
248
- # row2 = ['c', 'd']
249
- # row3 = ['e', 'f']
250
- # src = [row1, row2, row3]
251
- # buf = ''
252
- # src.each do |row|
253
- # parsed_cells = CSV.generate_row(row, 2, buf)
254
- # puts "Created #{ parsed_cells } cells."
255
- # end
256
- # p buf
257
- #
258
- # ARGS
259
- # src: an Array of String to be converted to CSV string. Must respond to
260
- # 'size' and '[](idx)'. src[idx] must return String.
261
- # cells: num of cells in a line.
262
- # out_dev: buffer for generated CSV string. Must respond to '<<(string)'.
263
- # col_sep: Column separator. ?, by default. If you want to separate
264
- # fields with semicolon, give ?; here.
265
- # row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
266
- # want to separate records with \r, give ?\r here.
267
- #
268
- # RETURNS
269
- # parsed_cells: num of converted cells.
270
- #
271
- def CSV.generate_row(src, cells, out_dev, fs = nil, rs = nil)
272
- fs ||= ','
273
- if fs.is_a?(Fixnum)
274
- fs = fs.chr
275
- end
276
- if !rs.nil? and rs.is_a?(Fixnum)
277
- rs = rs.chr
278
- end
279
- src_size = src.size
280
- if (src_size == 0)
281
- if cells == 0
282
- generate_separator(:DT_ROWSEP, out_dev, fs, rs)
283
- end
284
- return 0
285
- end
286
- res_type = :DT_COLSEP
287
- parsed_cells = 0
288
- generate_body(src[parsed_cells], out_dev, fs, rs)
289
- parsed_cells += 1
290
- while ((parsed_cells < cells) and (parsed_cells != src_size))
291
- generate_separator(:DT_COLSEP, out_dev, fs, rs)
292
- generate_body(src[parsed_cells], out_dev, fs, rs)
293
- parsed_cells += 1
294
- end
295
- if (parsed_cells == cells)
296
- generate_separator(:DT_ROWSEP, out_dev, fs, rs)
297
- else
298
- generate_separator(:DT_COLSEP, out_dev, fs, rs)
459
+ end
460
+ end
461
+ alias_method :values_at, :fields
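Note on `fields()` and its `values_at` alias: each argument form listed in the comment resolves as sketched below. The data is made up, and the example assumes `require "csv"` is available.

```ruby
require "csv"

row = CSV::Row.new(%w[a b c a], [1, 2, 3, 4])

p row.fields               # [1, 2, 3, 4] (no arguments: everything)
p row.fields("a", 2)       # [1, 3]       (a header, then an index)
p row.fields("b".."c")     # [2, 3]       (header Range, resolved to 1..2)
p row.fields(0...2)        # [1, 2]       (index Range, end excluded)
p row.fields(["a", 1])     # [4]          (header with offset)
p row.values_at("c", "a")  # [3, 1]       (alias, caller's order kept)
```

Range endpoints given as headers are first translated to indices with `index()`, which is why `"b".."c"` behaves like `1..2` here.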
462
+
463
+ #
464
+ # :call-seq:
465
+ # index( header )
466
+ # index( header, offset )
467
+ #
468
+ # This method will return the index of a field with the provided +header+.
469
+ # The +offset+ can be used to locate duplicate header names, as described in
470
+ # CSV::Row.field().
471
+ #
472
+ def index(header, minimum_index = 0)
473
+ # find the pair
474
+ index = headers[minimum_index..-1].index(header)
475
+ # return the index at the right offset, if we found one
476
+ index.nil? ? nil : index + minimum_index
477
+ end
478
+
479
+ # Returns +true+ if +name+ is a header for this row, and +false+ otherwise.
480
+ def header?(name)
481
+ headers.include? name
482
+ end
483
+ alias_method :include?, :header?
484
+
485
+ #
486
+ # Returns +true+ if +data+ matches a field in this row, and +false+
487
+ # otherwise.
488
+ #
489
+ def field?(data)
490
+ fields.include? data
491
+ end
492
+
493
+ include Enumerable
494
+
495
+ #
496
+ # Yields each pair of the row as header and field tuples (much like
497
+ # iterating over a Hash).
498
+ #
499
+ # Support for Enumerable.
500
+ #
501
+ # This method returns the row for chaining.
502
+ #
503
+ def each(&block)
504
+ @row.each(&block)
505
+
506
+ self # for chaining
507
+ end
508
+
509
+ #
510
+ # Returns +true+ if this row contains the same headers and fields in the
511
+ # same order as +other+.
512
+ #
513
+ def ==(other)
514
+ return @row == other.row if other.is_a? CSV::Row
515
+ @row == other
516
+ end
517
+
518
+ #
519
+ # Collapses the row into a simple Hash. Be warned that this discards field
520
+ # order and clobbers duplicate fields.
521
+ #
522
+ def to_hash
523
+ # flatten just one level of the internal Array
524
+ Hash[*@row.inject(Array.new) { |ary, pair| ary.push(*pair) }]
525
+ end
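Note on `to_hash`: the "clobbers duplicate fields" warning follows directly from the expression used. The sketch below applies the same expression to a plain Array of pairs; `collapse_pairs` is a hypothetical helper written for this note, not part of the library, and other releases of the library may resolve duplicates differently.

```ruby
# Hypothetical helper: the same expression as Row#to_hash in this diff,
# applied to a plain Array of [header, field] pairs.
def collapse_pairs(pairs)
  # Flatten one level, then let Hash[] pair the elements up again.
  Hash[*pairs.inject(Array.new) { |ary, pair| ary.push(*pair) }]
end

pairs  = [["tag", "red"], ["id", 7], ["tag", "blue"]]
result = collapse_pairs(pairs)

p result["tag"]  # "blue": the later duplicate overwrites "red"
p result.size    # 2: three pairs collapse into two keys
```

If duplicate headers matter, iterating the row with `each` keeps every pair.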
526
+
527
+ #
528
+ # Returns the row as a CSV String. Headers are not used. Equivalent to:
529
+ #
530
+ # csv_row.fields.to_csv( options )
531
+ #
532
+ def to_csv(options = Hash.new)
533
+ fields.to_csv(options)
534
+ end
535
+ alias_method :to_s, :to_csv
536
+
537
+ # A summary of fields, by header, in an ASCII compatible String.
538
+ def inspect
539
+ str = ["#<", self.class.to_s]
540
+ each do |header, field|
541
+ str << " " << (header.is_a?(Symbol) ? header.to_s : header.inspect) <<
542
+ ":" << field.inspect
543
+ end
544
+ str << ">"
545
+ begin
546
+ str.join('')
547
+ rescue # any encoding error
548
+ str.map do |s|
549
+ e = Encoding::Converter.asciicompat_encoding(s.encoding)
550
+ e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
551
+ end.join('')
552
+ end
299
553
  end
300
- parsed_cells
301
554
  end
302
-
303
- # Private class methods.
304
- class << self
305
- private
306
555
 
307
- def open_reader(path, mode, fs, rs, &block)
308
- file = File.open(path, mode)
309
- if block
310
- begin
311
- CSV::Reader.parse(file, fs, rs) do |row|
312
- yield(row)
313
- end
314
- ensure
315
- file.close
316
- end
317
- nil
556
+ #
557
+ # A CSV::Table is a two-dimensional data structure for representing CSV
558
+ # documents. Tables allow you to work with the data by row or column,
559
+ # manipulate the data, and even convert the results back to CSV, if needed.
560
+ #
561
+ # All tables returned by CSV will be constructed from this class, if header
562
+ # row processing is activated.
563
+ #
564
+ class Table
565
+ #
566
+ # Construct a new CSV::Table from +array_of_rows+, which are expected
567
+ # to be CSV::Row objects. All rows are assumed to have the same headers.
568
+ #
569
+ # A CSV::Table object supports the following Array methods through
570
+ # delegation:
571
+ #
572
+ # * empty?()
573
+ # * length()
574
+ # * size()
575
+ #
576
+ def initialize(array_of_rows)
577
+ @table = array_of_rows
578
+ @mode = :col_or_row
579
+ end
580
+
581
+ # The current access mode for indexing and iteration.
582
+ attr_reader :mode
583
+
584
+ # Internal data format used to compare equality.
585
+ attr_reader :table
586
+ protected :table
587
+
588
+ ### Array Delegation ###
589
+
590
+ extend Forwardable
591
+ def_delegators :@table, :empty?, :length, :size
592
+
593
+ #
594
+ # Returns a duplicate table object, in column mode. This is handy for
595
+ # chaining in a single call without changing the table mode, but be aware
596
+ # that this method can consume a fair amount of memory for bigger data sets.
597
+ #
598
+ # This method returns the duplicate table for chaining. Don't chain
599
+ # destructive methods (like []=()) this way though, since you are working
600
+ # with a duplicate.
601
+ #
602
+ def by_col
603
+ self.class.new(@table.dup).by_col!
604
+ end
605
+
606
+ #
607
+ # Switches the mode of this table to column mode. All calls to indexing and
608
+ # iteration methods will work with columns until the mode is changed again.
609
+ #
610
+ # This method returns the table and is safe to chain.
611
+ #
612
+ def by_col!
613
+ @mode = :col
614
+
615
+ self
616
+ end
617
+
618
+ #
619
+ # Returns a duplicate table object, in mixed mode. This is handy for
620
+ # chaining in a single call without changing the table mode, but be aware
621
+ # that this method can consume a fair amount of memory for bigger data sets.
622
+ #
623
+ # This method returns the duplicate table for chaining. Don't chain
624
+ # destructive methods (like []=()) this way though, since you are working
625
+ # with a duplicate.
626
+ #
627
+ def by_col_or_row
628
+ self.class.new(@table.dup).by_col_or_row!
629
+ end
630
+
631
+ #
632
+ # Switches the mode of this table to mixed mode. All calls to indexing and
633
+ # iteration methods will use the default intelligent indexing system until
634
+ # the mode is changed again. In mixed mode an index is assumed to be a row
635
+ # reference while anything else is assumed to be column access by headers.
636
+ #
637
+ # This method returns the table and is safe to chain.
638
+ #
639
+ def by_col_or_row!
640
+ @mode = :col_or_row
641
+
642
+ self
643
+ end
644
+
645
+ #
646
+ # Returns a duplicate table object, in row mode. This is handy for chaining
647
+ # in a single call without changing the table mode, but be aware that this
648
+ # method can consume a fair amount of memory for bigger data sets.
649
+ #
650
+ # This method returns the duplicate table for chaining. Don't chain
651
+ # destructive methods (like []=()) this way though, since you are working
652
+ # with a duplicate.
653
+ #
654
+ def by_row
655
+ self.class.new(@table.dup).by_row!
656
+ end
657
+
658
+ #
659
+ # Switches the mode of this table to row mode. All calls to indexing and
660
+ # iteration methods will work with rows until the mode is changed again.
661
+ #
662
+ # This method returns the table and is safe to chain.
663
+ #
664
+ def by_row!
665
+ @mode = :row
666
+
667
+ self
668
+ end
669
+
670
+ #
671
+ # Returns the headers for the first row of this table (assumed to match all
672
+ # other rows). An empty Array is returned for empty tables.
673
+ #
674
+ def headers
675
+ if @table.empty?
676
+ Array.new
318
677
  else
319
- reader = CSV::Reader.create(file, fs, rs)
320
- reader.close_on_terminate
321
- reader
678
+ @table.first.headers
322
679
  end
323
680
  end
324
681
 
325
- def open_writer(path, mode, fs, rs, &block)
326
- file = File.open(path, mode)
327
- if block
328
- begin
329
- CSV::Writer.generate(file, fs, rs) do |writer|
330
- yield(writer)
331
- end
332
- ensure
333
- file.close
334
- end
335
- nil
336
- else
337
- writer = CSV::Writer.create(file, fs, rs)
338
- writer.close_on_terminate
339
- writer
340
- end
341
- end
342
-
343
- def parse_body(src, idx, fs, rs)
344
- fs_str = fs
345
- fs_size = fs_str.size
346
- rs_str = rs || "\n"
347
- rs_size = rs_str.size
348
- fs_idx = rs_idx = 0
349
- cell = Cell.new
350
- state = :ST_START
351
- quoted = cr = false
352
- c = nil
353
- last_idx = idx
354
- while c = src[idx]
355
- unless quoted
356
- fschar = (c == fs_str[fs_idx])
357
- rschar = (c == rs_str[rs_idx])
358
- # simple 1 char backtrack
359
- if !fschar and c == fs_str[0]
360
- fs_idx = 0
361
- fschar = true
362
- if state == :ST_START
363
- state = :ST_DATA
364
- elsif state == :ST_QUOTE
365
- raise IllegalFormatError
366
- end
367
- end
368
- if !rschar and c == rs_str[0]
369
- rs_idx = 0
370
- rschar = true
371
- if state == :ST_START
372
- state = :ST_DATA
373
- elsif state == :ST_QUOTE
374
- raise IllegalFormatError
375
- end
376
- end
682
+ #
683
+ # In the default mixed mode, this method returns rows for index access and
684
+ # columns for header access. You can force the index association by first
685
+ # calling by_col!() or by_row!().
686
+ #
687
+ # Columns are returned as an Array of values. Altering that Array has no
688
+ # effect on the table.
689
+ #
690
+ def [](index_or_header)
691
+ if @mode == :row or # by index
692
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
693
+ @table[index_or_header]
694
+ else # by header
695
+ @table.map { |row| row[index_or_header] }
696
+ end
697
+ end
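The access modes described above can be illustrated with a short sketch. It assumes the standard `csv` library is loadable and uses made-up sample data (`name`, `age`); it is not part of the package diff.

```ruby
require "csv"

# Hypothetical sample data, parsed with header processing so a CSV::Table is returned.
table = CSV.parse("name,age\nAda,36\nBob,41\n", headers: true)

first_row = table[0]         # mixed mode: an Integer index returns a CSV::Row
names     = table["name"]    # mixed mode: a header returns an Array of column values
ages      = table.by_col[1]  # column mode on a duplicate: an Integer selects a column
second    = table.by_row[1]  # row mode on a duplicate: an Integer selects a row
```

Because `by_col` and `by_row` work on duplicates, the mode of `table` itself stays `:col_or_row` throughout.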
698
+
699
+ #
700
+ # In the default mixed mode, this method assigns rows for index access and
701
+ # columns for header access. You can force the index association by first
702
+ # calling by_col!() or by_row!().
703
+ #
704
+ # Rows may be set to an Array of values (which will inherit the table's
705
+ # headers()) or a CSV::Row.
706
+ #
707
+ # Columns may be set to a single value, which is copied to each row of the
708
+ # column, or an Array of values. Arrays of values are assigned to rows top
709
+ # to bottom in row major order. Excess values are ignored and if the Array
710
+ # does not have a value for each row the extra rows will receive a +nil+.
711
+ #
712
+ # Assigning to an existing column or row clobbers the data. Assigning to
713
+ # new columns creates them at the right end of the table.
714
+ #
715
+ def []=(index_or_header, value)
716
+ if @mode == :row or # by index
717
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
718
+ if value.is_a? Array
719
+ @table[index_or_header] = Row.new(headers, value)
720
+ else
721
+ @table[index_or_header] = value
377
722
  end
378
- if c == ?"
379
- fs_idx = rs_idx = 0
380
- if cr
381
- raise IllegalFormatError
382
- end
383
- cell << src[last_idx, (idx - last_idx)]
384
- last_idx = idx
385
- if state == :ST_DATA
386
- if quoted
387
- last_idx += 1
388
- quoted = false
389
- state = :ST_QUOTE
723
+ else # set column
724
+ if value.is_a? Array # multiple values
725
+ @table.each_with_index do |row, i|
726
+ if row.header_row?
727
+ row[index_or_header] = index_or_header
390
728
  else
391
- raise IllegalFormatError
392
- end
393
- elsif state == :ST_QUOTE
394
- cell << c.chr
395
- last_idx += 1
396
- quoted = true
397
- state = :ST_DATA
398
- else # :ST_START
399
- quoted = true
400
- last_idx += 1
401
- state = :ST_DATA
402
- end
403
- elsif fschar or rschar
404
- if fschar
405
- fs_idx += 1
406
- end
407
- if rschar
408
- rs_idx += 1
409
- end
410
- sep = nil
411
- if fs_idx == fs_size
412
- if state == :ST_START and rs_idx > 0 and fs_idx < rs_idx
413
- state = :ST_DATA
414
- end
415
- cell << src[last_idx, (idx - last_idx - (fs_size - 1))]
416
- last_idx = idx
417
- fs_idx = rs_idx = 0
418
- if cr
419
- raise IllegalFormatError
420
- end
421
- sep = :DT_COLSEP
422
- elsif rs_idx == rs_size
423
- if state == :ST_START and fs_idx > 0 and rs_idx < fs_idx
424
- state = :ST_DATA
425
- end
426
- if !(rs.nil? and cr)
427
- cell << src[last_idx, (idx - last_idx - (rs_size - 1))]
428
- last_idx = idx
729
+ row[index_or_header] = value[i]
429
730
  end
430
- fs_idx = rs_idx = 0
431
- sep = :DT_ROWSEP
432
- end
433
- if sep
434
- if state == :ST_DATA
435
- return sep, idx + 1, cell;
436
- elsif state == :ST_QUOTE
437
- return sep, idx + 1, cell;
438
- else # :ST_START
439
- return sep, idx + 1, nil
440
- end
441
- end
442
- elsif rs.nil? and c == ?\r
443
- # special \r treatment for backward compatibility
444
- fs_idx = rs_idx = 0
445
- if cr
446
- raise IllegalFormatError
447
- end
448
- cell << src[last_idx, (idx - last_idx)]
449
- last_idx = idx
450
- if quoted
451
- state = :ST_DATA
452
- else
453
- cr = true
454
731
  end
455
- else
456
- fs_idx = rs_idx = 0
457
- if state == :ST_DATA or state == :ST_START
458
- if cr
459
- raise IllegalFormatError
732
+ else # repeated value
733
+ @table.each do |row|
734
+ if row.header_row?
735
+ row[index_or_header] = index_or_header
736
+ else
737
+ row[index_or_header] = value
460
738
  end
461
- state = :ST_DATA
462
- else # :ST_QUOTE
463
- raise IllegalFormatError
464
739
  end
465
740
  end
466
- idx += 1
467
741
  end
468
- if state == :ST_START
469
- if fs_idx > 0 or rs_idx > 0
470
- state = :ST_DATA
471
- else
472
- return :DT_EOS, idx, nil
473
- end
474
- elsif quoted
475
- raise IllegalFormatError
476
- elsif cr
477
- raise IllegalFormatError
478
- end
479
- cell << src[last_idx, (idx - last_idx)]
480
- last_idx = idx
481
- return :DT_EOS, idx, cell
482
- end
483
-
484
- def generate_body(cell, out_dev, fs, rs)
485
- if cell.nil?
486
- # empty
487
- else
488
- cell = cell.to_s
489
- row_data = cell.dup
490
- if (row_data.gsub!('"', '""') or
491
- row_data.index(fs) or
492
- (rs and row_data.index(rs)) or
493
- (/[\r\n]/ =~ row_data) or
494
- (cell.empty?))
495
- out_dev << '"' << row_data << '"'
496
- else
497
- out_dev << row_data
498
- end
742
+ end
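As a rough sketch of the assignment rules in the comment above, assuming the standard `csv` library and invented sample values (`city`, `rank`, and the row contents are illustrative only):

```ruby
require "csv"

table = CSV.parse("name,age\nAda,36\nBob,41\n", headers: true)

table["city"] = "Oslo"  # a single value is copied into every row of a new column
table["rank"] = [1]     # a short Array: rows without a value receive nil

# Assigning an Array to an Integer index builds a row that inherits the table's headers.
table[0] = ["Eve", "29", "Rome", 9]
```

New columns are appended at the right end of the table, so the headers here end up as `name`, `age`, `city`, `rank`.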
743
+
744
+ #
745
+ # The mixed mode default is to treat a list of indices as row access,
746
+ # returning the rows indicated. Anything else is considered columnar
747
+ # access. For columnar access, the return set has an Array for each row
748
+ # with the values indicated by the headers in each Array. You can force
749
+ # column or row mode using by_col!() or by_row!().
750
+ #
751
+ # You cannot mix column and row access.
752
+ #
753
+ def values_at(*indices_or_headers)
754
+ if @mode == :row or # by indices
755
+ ( @mode == :col_or_row and indices_or_headers.all? do |index|
756
+ index.is_a?(Integer) or
757
+ ( index.is_a?(Range) and
758
+ index.first.is_a?(Integer) and
759
+ index.last.is_a?(Integer) )
760
+ end )
761
+ @table.values_at(*indices_or_headers)
762
+ else # by headers
763
+ @table.map { |row| row.values_at(*indices_or_headers) }
499
764
  end
500
765
  end
501
-
502
- def generate_separator(type, out_dev, fs, rs)
503
- case type
504
- when :DT_COLSEP
505
- out_dev << fs
506
- when :DT_ROWSEP
507
- out_dev << (rs || "\n")
766
+
767
+ #
768
+ # Adds a new row to the bottom end of this table. You can provide an Array,
769
+ # which will be converted to a CSV::Row (inheriting the table's headers()),
770
+ # or a CSV::Row.
771
+ #
772
+ # This method returns the table for chaining.
773
+ #
774
+ def <<(row_or_array)
775
+ if row_or_array.is_a? Array # append Array
776
+ @table << Row.new(headers, row_or_array)
777
+ else # append Row
778
+ @table << row_or_array
508
779
  end
780
+
781
+ self # for chaining
509
782
  end
510
- end
511
783
 
784
+ #
785
+ # A shortcut for appending multiple rows. Equivalent to:
786
+ #
787
+ # rows.each { |row| self << row }
788
+ #
789
+ # This method returns the table for chaining.
790
+ #
791
+ def push(*rows)
792
+ rows.each { |row| self << row }
512
793
 
513
- # CSV formatted string/stream reader.
514
- #
515
- # EXAMPLE
516
- # read CSV lines untill the first column is 'stop'.
517
- #
518
- # CSV::Reader.parse(File.open('bigdata', 'rb')) do |row|
519
- # p row
520
- # break if !row[0].is_null && row[0].data == 'stop'
521
- # end
522
- #
523
- class Reader
524
- include Enumerable
794
+ self # for chaining
795
+ end
796
+
797
+ #
798
+ # Removes and returns the indicated column or row. In the default mixed
799
+ # mode indices refer to rows and everything else is assumed to be a column
800
+ # header. Use by_col!() or by_row!() to force the lookup.
801
+ #
802
+ def delete(index_or_header)
803
+ if @mode == :row or # by index
804
+ (@mode == :col_or_row and index_or_header.is_a? Integer)
805
+ @table.delete_at(index_or_header)
806
+ else # by header
807
+ @table.map { |row| row.delete(index_or_header).last }
808
+ end
809
+ end
525
810
 
526
- # Parse CSV data and get lines. Given block is called for each parsed row.
527
- # Block value is always nil. Rows are not cached for performance reason.
528
- def Reader.parse(str_or_readable, fs = ',', rs = nil, &block)
529
- reader = Reader.create(str_or_readable, fs, rs)
530
- if block
531
- reader.each do |row|
532
- yield(row)
811
+ #
812
+ # Removes any column or row for which the block returns +true+. In the
813
+ # default mixed mode or row mode, iteration is the standard row major
814
+ # walking of rows. In column mode, iteration will +yield+ two element
815
+ # tuples containing the column name and an Array of values for that column.
816
+ #
817
+ # This method returns the table for chaining.
818
+ #
819
+ def delete_if(&block)
820
+ if @mode == :row or @mode == :col_or_row # by index
821
+ @table.delete_if(&block)
822
+ else # by header
823
+ to_delete = Array.new
824
+ headers.each_with_index do |header, i|
825
+ to_delete << header if block[[header, self[header]]]
533
826
  end
534
- reader.close
535
- nil
536
- else
537
- reader
827
+ to_delete.map { |header| delete(header) }
538
828
  end
829
+
830
+ self # for chaining
539
831
  end
540
832
 
541
- # Returns reader instance.
542
- def Reader.create(str_or_readable, fs = ',', rs = nil)
543
- case str_or_readable
544
- when IO
545
- IOReader.new(str_or_readable, fs, rs)
546
- when String
547
- StringReader.new(str_or_readable, fs, rs)
833
+ include Enumerable
834
+
835
+ #
836
+ # In the default mixed mode or row mode, iteration is the standard row major
837
+ # walking of rows. In column mode, iteration will +yield+ two element
838
+ # tuples containing the column name and an Array of values for that column.
839
+ #
840
+ # This method returns the table for chaining.
841
+ #
842
+ def each(&block)
843
+ if @mode == :col
844
+ headers.each { |header| block[[header, self[header]]] }
548
845
  else
549
- IOReader.new(str_or_readable, fs, rs)
846
+ @table.each(&block)
550
847
  end
848
+
849
+ self # for chaining
850
+ end
851
+
852
+ # Returns +true+ if all rows of this table ==() +other+'s rows.
853
+ def ==(other)
854
+ @table == other.table
551
855
  end
552
856
 
553
- def each
554
- while true
555
- row = []
556
- parsed_cells = get_row(row)
557
- if parsed_cells == 0
558
- break
857
+ #
858
+ # Returns the table as an Array of Arrays. Headers will be the first row,
859
+ # then all of the field rows will follow.
860
+ #
861
+ def to_a
862
+ @table.inject([headers]) do |array, row|
863
+ if row.header_row?
864
+ array
865
+ else
866
+ array + [row.fields]
559
867
  end
560
- yield(row)
561
868
  end
562
- nil
563
869
  end
564
870
 
565
- def shift
566
- row = []
567
- parsed_cells = get_row(row)
568
- row
871
+ #
872
+ # Returns the table as a complete CSV String. Headers will be listed first,
873
+ # then all of the field rows.
874
+ #
875
+ # This method assumes you want the Table.headers(), unless you explicitly
876
+ # pass <tt>:write_headers => false</tt>.
877
+ #
878
+ def to_csv(options = Hash.new)
879
+ wh = options.fetch(:write_headers, true)
880
+ @table.inject(wh ? [headers.to_csv(options)] : [ ]) do |rows, row|
881
+ if row.header_row?
882
+ rows
883
+ else
884
+ rows + [row.fields.to_csv(options)]
885
+ end
886
+ end.join('')
569
887
  end
888
+ alias_method :to_s, :to_csv
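A minimal sketch of the two conversions documented above, assuming the standard `csv` library and illustrative sample data:

```ruby
require "csv"

table = CSV.parse("name,age\nAda,36\n", headers: true)

as_arrays       = table.to_a                         # headers first, then field rows
with_headers    = table.to_csv                       # headers are written by default
without_headers = table.to_csv(write_headers: false) # field rows only
```

Since `to_s` is aliased to `to_csv`, interpolating a table into a String yields the full CSV text including headers.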
570
889
 
571
- def close
572
- terminate
890
+ # Shows the mode and size of this table in a US-ASCII String.
891
+ def inspect
892
+ "#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>".encode("US-ASCII")
573
893
  end
894
+ end
574
895
 
575
- private
896
+ # The error thrown when the parser encounters illegal CSV formatting.
897
+ class MalformedCSVError < RuntimeError; end
576
898
 
577
- def initialize(dev)
578
- raise RuntimeError.new('Do not instanciate this class directly.')
579
- end
899
+ #
900
+ # A FieldInfo Struct contains details about a field's position in the data
901
+ # source it was read from. CSV will pass this Struct to some blocks that make
902
+ # decisions based on field structure. See CSV.convert_fields() for an
903
+ # example.
904
+ #
905
+ # <b><tt>index</tt></b>:: The zero-based index of the field in its row.
906
+ # <b><tt>line</tt></b>:: The line of the data source this row is from.
907
+ # <b><tt>header</tt></b>:: The header for the column, when available.
908
+ #
909
+ FieldInfo = Struct.new(:index, :line, :header)
580
910
 
581
- def get_row(row)
582
- raise NotImplementedError.new('Method get_row must be defined in a derived class.')
583
- end
911
+ # A Regexp used to find and convert some common Date formats.
912
+ DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
913
+ \d{4}-\d{2}-\d{2} )\z /x
914
+ # A Regexp used to find and convert some common DateTime formats.
915
+ DateTimeMatcher =
916
+ / \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
917
+ \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} )\z /x
584
918
 
585
- def terminate
586
- # Define if needed.
919
+ # The encoding used by all converters.
920
+ ConverterEncoding = Encoding.find("UTF-8")
921
+
922
+ #
923
+ # This Hash holds the built-in converters of CSV that can be accessed by name.
924
+ # You can select Converters with CSV.convert() or through the +options+ Hash
925
+ # passed to CSV::new().
926
+ #
927
+ # <b><tt>:integer</tt></b>:: Converts any field Integer() accepts.
928
+ # <b><tt>:float</tt></b>:: Converts any field Float() accepts.
929
+ # <b><tt>:numeric</tt></b>:: A combination of <tt>:integer</tt>
930
+ # and <tt>:float</tt>.
931
+ # <b><tt>:date</tt></b>:: Converts any field Date::parse() accepts.
932
+ # <b><tt>:date_time</tt></b>:: Converts any field DateTime::parse() accepts.
933
+ # <b><tt>:all</tt></b>:: All built-in converters. A combination of
934
+ # <tt>:date_time</tt> and <tt>:numeric</tt>.
935
+ #
936
+ # All built-in converters transcode field data to UTF-8 before attempting a
937
+ # conversion. If your data cannot be transcoded to UTF-8 the conversion will
938
+ # fail and the field will remain unchanged.
939
+ #
940
+ # This Hash is intentionally left unfrozen and users should feel free to add
941
+ # values to it that can be accessed by all CSV objects.
942
+ #
943
+ # To add a combo field, the value should be an Array of names. Combo fields
944
+ # can be nested with other combo fields.
945
+ #
946
+ Converters = { integer: lambda { |f|
947
+ Integer(f.encode(ConverterEncoding)) rescue f
948
+ },
949
+ float: lambda { |f|
950
+ Float(f.encode(ConverterEncoding)) rescue f
951
+ },
952
+ numeric: [:integer, :float],
953
+ date: lambda { |f|
954
+ begin
955
+ e = f.encode(ConverterEncoding)
956
+ e =~ DateMatcher ? Date.parse(e) : f
957
+ rescue # encoding conversion or date parse errors
958
+ f
959
+ end
960
+ },
961
+ date_time: lambda { |f|
962
+ begin
963
+ e = f.encode(ConverterEncoding)
964
+ e =~ DateTimeMatcher ? DateTime.parse(e) : f
965
+ rescue # encoding conversion or date parse errors
966
+ f
967
+ end
968
+ },
969
+ all: [:date_time, :numeric] }
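The built-in converters can be selected by name through the `:converters` option. The sketch below assumes the standard `csv` and `date` libraries and uses invented input; fields that a converter cannot handle are left as Strings.

```ruby
require "csv"
require "date"

# :numeric is a combo of :integer and :float; "abc" matches neither and is left alone.
numbers = CSV.parse("1,2.5,abc\n", converters: :numeric)

# :date only converts fields that match CSV::DateMatcher; "n/a" is left alone.
dates = CSV.parse("2013-09-01,n/a\n", converters: :date)
```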
970
+
971
+ #
972
+ # This Hash holds the built-in header converters of CSV that can be accessed
973
+ # by name. You can select HeaderConverters with CSV.header_convert() or
974
+ # through the +options+ Hash passed to CSV::new().
975
+ #
976
+ # <b><tt>:downcase</tt></b>:: Calls downcase() on the header String.
977
+ # <b><tt>:symbol</tt></b>:: The header String is downcased, spaces are
978
+ # replaced with underscores, non-word characters
979
+ # are dropped, and finally to_sym() is called.
980
+ #
981
+ # All built-in header converters transcode header data to UTF-8 before
982
+ # attempting a conversion. If your data cannot be transcoded to UTF-8 the
983
+ # conversion will fail and the header will remain unchanged.
984
+ #
985
+ # This Hash is intentionally left unfrozen and users should feel free to add
986
+ # values to it that can be accessed by all CSV objects.
987
+ #
988
+ # To add a combo field, the value should be an Array of names. Combo fields
989
+ # can be nested with other combo fields.
990
+ #
991
+ HeaderConverters = {
992
+ downcase: lambda { |h| h.encode(ConverterEncoding).downcase },
993
+ symbol: lambda { |h|
994
+ h.encode(ConverterEncoding).downcase.gsub(/\s+/, "_").
995
+ gsub(/\W+/, "").to_sym
996
+ }
997
+ }
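Header converters are chosen the same way, through `:header_converters`. A small sketch, assuming the standard `csv` library and a made-up header row:

```ruby
require "csv"

data = "First Name,Age\nAda,36\n"

# :symbol downcases, replaces whitespace with underscores, and calls to_sym.
symbolized = CSV.parse(data, headers: true, header_converters: :symbol)

# :downcase only lowercases the header String.
downcased = CSV.parse(data, headers: true, header_converters: :downcase)
```

With `:symbol`, fields are then looked up by Symbol, for example `symbolized[0][:first_name]`.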
998
+
999
+ #
1000
+ # The options used when no overrides are given by calling code. They are:
1001
+ #
1002
+ # <b><tt>:col_sep</tt></b>:: <tt>","</tt>
1003
+ # <b><tt>:row_sep</tt></b>:: <tt>:auto</tt>
1004
+ # <b><tt>:quote_char</tt></b>:: <tt>'"'</tt>
1005
+ # <b><tt>:field_size_limit</tt></b>:: +nil+
1006
+ # <b><tt>:converters</tt></b>:: +nil+
1007
+ # <b><tt>:unconverted_fields</tt></b>:: +nil+
1008
+ # <b><tt>:headers</tt></b>:: +false+
1009
+ # <b><tt>:return_headers</tt></b>:: +false+
1010
+ # <b><tt>:header_converters</tt></b>:: +nil+
1011
+ # <b><tt>:skip_blanks</tt></b>:: +false+
1012
+ # <b><tt>:force_quotes</tt></b>:: +false+
1013
+ # <b><tt>:skip_lines</tt></b>:: +nil+
1014
+ #
1015
+ DEFAULT_OPTIONS = { col_sep: ",",
1016
+ row_sep: :auto,
1017
+ quote_char: '"',
1018
+ field_size_limit: nil,
1019
+ converters: nil,
1020
+ unconverted_fields: nil,
1021
+ headers: false,
1022
+ return_headers: false,
1023
+ header_converters: nil,
1024
+ skip_blanks: false,
1025
+ force_quotes: false,
1026
+ skip_lines: nil }.freeze
1027
+
1028
+ #
1029
+ # This method will return a CSV instance, just like CSV::new(), but the
1030
+ # instance will be cached and returned for all future calls to this method for
1031
+ # the same +data+ object (tested by Object#object_id()) with the same
1032
+ # +options+.
1033
+ #
1034
+ # If a block is given, the instance is passed to the block and the return
1035
+ # value becomes the return value of the block.
1036
+ #
1037
+ def self.instance(data = $stdout, options = Hash.new)
1038
+ # create a _signature_ for this method call, data object and options
1039
+ sig = [data.object_id] +
1040
+ options.values_at(*DEFAULT_OPTIONS.keys.sort_by { |sym| sym.to_s })
1041
+
1042
+ # fetch or create the instance for this signature
1043
+ @@instances ||= Hash.new
1044
+ instance = (@@instances[sig] ||= new(data, options))
1045
+
1046
+ if block_given?
1047
+ yield instance # run block, if given, returning result
1048
+ else
1049
+ instance # or return the instance
587
1050
  end
588
1051
  end
589
-
590
1052
 
591
- class StringReader < Reader
592
- def initialize(string, fs = ',', rs = nil)
593
- @fs = fs
594
- @rs = rs
595
- @dev = string
596
- @idx = 0
597
- if @dev[0, 3] == "\xef\xbb\xbf"
598
- @idx += 3
1053
+ #
1054
+ # :call-seq:
1055
+ # filter( options = Hash.new ) { |row| ... }
1056
+ # filter( input, options = Hash.new ) { |row| ... }
1057
+ # filter( input, output, options = Hash.new ) { |row| ... }
1058
+ #
1059
+ # This method is a convenience for building Unix-like filters for CSV data.
1060
+ # Each row is yielded to the provided block which can alter it as needed.
1061
+ # After the block returns, the row is appended to +output+ altered or not.
1062
+ #
1063
+ # The +input+ and +output+ arguments can be anything CSV::new() accepts
1064
+ # (generally String or IO objects). If not given, they default to
1065
+ # <tt>ARGF</tt> and <tt>$stdout</tt>.
1066
+ #
1067
+ # The +options+ parameter is also filtered down to CSV::new() after some
1068
+ # clever key parsing. Any key beginning with <tt>:in_</tt> or
1069
+ # <tt>:input_</tt> will have that leading identifier stripped and will only
1070
+ # be used in the +options+ Hash for the +input+ object. Keys starting with
1071
+ # <tt>:out_</tt> or <tt>:output_</tt> affect only +output+. All other keys
1072
+ # are assigned to both objects.
1073
+ #
1074
+ # The <tt>:output_row_sep</tt> +option+ defaults to
1075
+ # <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
1076
+ #
1077
+ def self.filter(*args)
1078
+ # parse options for input, output, or both
1079
+ in_options, out_options = Hash.new, {row_sep: $INPUT_RECORD_SEPARATOR}
1080
+ if args.last.is_a? Hash
1081
+ args.pop.each do |key, value|
1082
+ case key.to_s
1083
+ when /\Ain(?:put)?_(.+)\Z/
1084
+ in_options[$1.to_sym] = value
1085
+ when /\Aout(?:put)?_(.+)\Z/
1086
+ out_options[$1.to_sym] = value
1087
+ else
1088
+ in_options[key] = value
1089
+ out_options[key] = value
1090
+ end
599
1091
  end
600
1092
  end
1093
+ # build input and output wrappers
1094
+ input = new(args.shift || ARGF, in_options)
1095
+ output = new(args.shift || $stdout, out_options)
601
1096
 
602
- private
603
-
604
- def get_row(row)
605
- parsed_cells, next_idx = CSV.parse_row(@dev, @idx, row, @fs, @rs)
606
- if parsed_cells == 0 and next_idx == 0 and @idx != @dev.size
607
- raise IllegalFormatError.new
608
- end
609
- @idx = next_idx
610
- parsed_cells
1097
+ # read, yield, write
1098
+ input.each do |row|
1099
+ yield row
1100
+ output << row
611
1101
  end
612
1102
  end
613
1103
 
614
-
615
- class IOReader < Reader
616
- def initialize(io, fs = ',', rs = nil)
617
- @io = io
618
- @fs = fs
619
- @rs = rs
620
- @dev = CSV::IOBuf.new(@io)
621
- @idx = 0
622
- if @dev[0] == 0xef and @dev[1] == 0xbb and @dev[2] == 0xbf
623
- @idx += 3
624
- end
625
- @close_on_terminate = false
1104
+ #
1105
+ # This method is intended as the primary interface for reading CSV files. You
1106
+ # pass a +path+ and any +options+ you wish to set for the read. Each row of
1107
+ # the file will be passed to the provided +block+ in turn.
1108
+ #
1109
+ # The +options+ parameter can be anything CSV::new() understands. This method
1110
+ # also understands an additional <tt>:encoding</tt> parameter that you can use
1111
+ # to specify the Encoding of the data in the file to be read. You must provide
1112
+ # this unless your data is in Encoding::default_external(). CSV will use this
1113
+ # to determine how to parse the data. You may provide a second Encoding to
1114
+ # have the data transcoded as it is read. For example,
1115
+ # <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
1116
+ # but transcode it to UTF-8 before CSV parses it.
1117
+ #
1118
+ def self.foreach(path, options = Hash.new, &block)
1119
+ open(path, options) do |csv|
1120
+ csv.each(&block)
626
1121
  end
1122
+ end
627
1123
 
628
- # Tell this reader to close the IO when terminated (Triggered by invoking
629
- # CSV::IOReader#close).
630
- def close_on_terminate
631
- @close_on_terminate = true
1124
+ #
1125
+ # :call-seq:
1126
+ # generate( str, options = Hash.new ) { |csv| ... }
1127
+ # generate( options = Hash.new ) { |csv| ... }
1128
+ #
1129
+ # This method wraps a String you provide, or an empty default String, in a
1130
+ # CSV object which is passed to the provided block. You can use the block to
1131
+ # append CSV rows to the String and when the block exits, the final String
1132
+ # will be returned.
1133
+ #
1134
+ # Note that a passed String *is* modified by this method. Call dup() before
1135
+ # passing if you need a new String.
1136
+ #
1137
+ # The +options+ parameter can be anything CSV::new() understands. This method
1138
+ # understands an additional <tt>:encoding</tt> parameter when not passed a
1139
+ # String to set the base Encoding for the output. CSV needs this hint if you
1140
+ # plan to output non-ASCII compatible data.
1141
+ #
1142
+ def self.generate(*args)
1143
+ # add a default empty String, if none was given
1144
+ if args.first.is_a? String
1145
+ io = StringIO.new(args.shift)
1146
+ io.seek(0, IO::SEEK_END)
1147
+ args.unshift(io)
1148
+ else
1149
+ encoding = (args[-1] = args[-1].dup).delete(:encoding) if args.last.is_a?(Hash)
1150
+ str = ""
1151
+ str.encode!(encoding) if encoding
1152
+ args.unshift(str)
632
1153
  end
1154
+ csv = new(*args) # wrap
1155
+ yield csv # yield for appending
1156
+ csv.string # return final String
1157
+ end
633
1158
 
634
- private
1159
+ #
1160
+ # This method is a shortcut for converting a single row (Array) into a CSV
1161
+ # String.
1162
+ #
1163
+ # The +options+ parameter can be anything CSV::new() understands. This method
1164
+ # understands an additional <tt>:encoding</tt> parameter to set the base
1165
+ # Encoding for the output. This method will try to guess your Encoding from
1166
+ # the first non-+nil+ field in +row+, if possible, but you may need to use
1167
+ # this parameter as a backup plan.
1168
+ #
1169
+ # The <tt>:row_sep</tt> +option+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>
1170
+ # (<tt>$/</tt>) when calling this method.
1171
+ #
1172
+ def self.generate_line(row, options = Hash.new)
1173
+ options = {row_sep: $INPUT_RECORD_SEPARATOR}.merge(options)
1174
+ encoding = options.delete(:encoding)
1175
+ str = ""
1176
+ if encoding
1177
+ str.force_encoding(encoding)
1178
+ elsif field = row.find { |f| not f.nil? }
1179
+ str.force_encoding(String(field).encoding)
1180
+ end
1181
+ (new(str, options) << row).string
1182
+ end
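The two generation helpers above can be sketched as follows. This assumes the standard `csv` library and the default `$/` of `"\n"`; the variable names and sample rows are illustrative.

```ruby
require "csv"

# Build a CSV String from scratch; the block receives a CSV object to append to.
text = CSV.generate do |csv|
  csv << ["name", "age"]
  csv << ["Ada", 36]
end

# A String passed in is appended to in place (hence the unfrozen +"" literal).
buffer = +"x,y\n"
CSV.generate(buffer) { |csv| csv << [1, 2] }

# One-row shortcut: fields containing the separator are quoted, nil becomes empty.
line = CSV.generate_line(["a", "b,c", nil])
```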
635
1183
 
636
- def get_row(row)
637
- parsed_cells, next_idx = CSV.parse_row(@dev, @idx, row, @fs, @rs)
638
- if parsed_cells == 0 and next_idx == 0 and !@dev.is_eos?
639
- raise IllegalFormatError.new
640
- end
641
- dropped = @dev.drop(next_idx)
642
- @idx = next_idx - dropped
643
- parsed_cells
1184
+ #
1185
+ # :call-seq:
1186
+ # open( filename, mode = "rb", options = Hash.new ) { |faster_csv| ... }
1187
+ # open( filename, options = Hash.new ) { |faster_csv| ... }
1188
+ # open( filename, mode = "rb", options = Hash.new )
1189
+ # open( filename, options = Hash.new )
1190
+ #
1191
+ # This method opens an IO object, and wraps that with CSV. This is intended
1192
+ # as the primary interface for writing a CSV file.
1193
+ #
1194
+ # You must pass a +filename+ and may optionally add a +mode+ for Ruby's
1195
+ # open(). You may also pass an optional Hash containing any +options+
1196
+ # CSV::new() understands as the final argument.
1197
+ #
1198
+ # This method works like Ruby's open() call, in that it will pass a CSV object
1199
+ # to a provided block and close it when the block terminates, or it will
1200
+ # return the CSV object when no block is provided. (*Note*: This is different
1201
+ # from the Ruby 1.8 CSV library which passed rows to the block. Use
1202
+ # CSV::foreach() for that behavior.)
1203
+ #
1204
+ # You must provide a +mode+ with an embedded Encoding designator unless your
1205
+ # data is in Encoding::default_external(). CSV will check the Encoding of the
1206
+ # underlying IO object (set by the +mode+ you pass) to determine how to parse
1207
+ # the data. You may provide a second Encoding to have the data transcoded as
1208
+ # it is read just as you can with a normal call to IO::open(). For example,
1209
+ # <tt>"rb:UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file but
1210
+ # transcode it to UTF-8 before CSV parses it.
1211
+ #
1212
+ # An opened CSV object will delegate to many IO methods for convenience. You
1213
+ # may call:
1214
+ #
1215
+ # * binmode()
1216
+ # * binmode?()
1217
+ # * close()
1218
+ # * close_read()
1219
+ # * close_write()
1220
+ # * closed?()
1221
+ # * eof()
1222
+ # * eof?()
1223
+ # * external_encoding()
1224
+ # * fcntl()
1225
+ # * fileno()
1226
+ # * flock()
1227
+ # * flush()
1228
+ # * fsync()
1229
+ # * internal_encoding()
1230
+ # * ioctl()
1231
+ # * isatty()
1232
+ # * path()
1233
+ # * pid()
1234
+ # * pos()
1235
+ # * pos=()
1236
+ # * reopen()
1237
+ # * seek()
1238
+ # * stat()
1239
+ # * sync()
1240
+ # * sync=()
1241
+ # * tell()
1242
+ # * to_i()
1243
+ # * to_io()
1244
+ # * truncate()
1245
+ # * tty?()
1246
+ #
1247
+ def self.open(*args)
1248
+ # find the +options+ Hash
1249
+ options = if args.last.is_a? Hash then args.pop else Hash.new end
1250
+ # wrap a File opened with the remaining +args+ with no newline
1251
+ # decorator
1252
+ file_opts = {universal_newline: false}.merge(options)
1253
+ begin
1254
+ f = File.open(*args, file_opts)
1255
+ rescue ArgumentError => e
1256
+ raise unless /needs binmode/ =~ e.message and args.size == 1
1257
+ args << "rb"
1258
+ file_opts = {encoding: Encoding.default_external}.merge(file_opts)
1259
+ retry
644
1260
  end
1261
+ csv = new(f, options)
645
1262
 
646
- def terminate
647
- if @close_on_terminate
648
- @io.close
1263
+ # handle blocks like Ruby's open(), not like the CSV library
1264
+ if block_given?
1265
+ begin
1266
+ yield csv
1267
+ ensure
1268
+ csv.close
649
1269
  end
1270
+ else
1271
+ csv
1272
+ end
1273
+ end
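A minimal sketch of the two calling styles of CSV::open() described above, assuming a Ruby with the standard csv library available. The scratch file and its contents are hypothetical and exist only for illustration.

```ruby
require "csv"
require "tempfile"

# Hypothetical scratch file used only for this illustration.
file = Tempfile.new(["example", ".csv"])
path = file.path
file.close

# Block form: the CSV object is closed when the block exits.
CSV.open(path, "w") do |csv|
  csv << ["name", "qty"]
  csv << ["apple", "3"]
end

# Non-block form: the caller is responsible for closing.
csv = CSV.open(path, "r")
rows = csv.read
csv.close

file.unlink
```

The block form mirrors Ruby's own open(): the ensure clause in the method closes the CSV object even if the block raises.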
650
1274
 
651
- if @dev
652
- @dev.close
1275
+ #
1276
+ # :call-seq:
1277
+ # parse( str, options = Hash.new ) { |row| ... }
1278
+ # parse( str, options = Hash.new )
1279
+ #
1280
+ # This method can be used to easily parse CSV out of a String. You may either
1281
+ # provide a +block+ which will be called with each row of the String in turn,
1282
+ # or just use the returned Array of Arrays (when no +block+ is given).
1283
+ #
1284
+ # You pass your +str+ to read from, and an optional +options+ Hash containing
1285
+ # anything CSV::new() understands.
1286
+ #
1287
+ def self.parse(*args, &block)
1288
+ csv = new(*args)
1289
+ if block.nil? # slurp contents, if no block is given
1290
+ begin
1291
+ csv.read
1292
+ ensure
1293
+ csv.close
653
1294
  end
1295
+ else # or pass each row to a provided block
1296
+ csv.each(&block)
654
1297
  end
655
1298
  end
656
1299
 
1300
+ #
1301
+ # This method is a shortcut for converting a single line of a CSV String into
1302
+ # an Array. Note that if +line+ contains multiple rows, anything
1303
+ # beyond the first row is ignored.
1304
+ #
1305
+ # The +options+ parameter can be anything CSV::new() understands.
1306
+ #
1307
+ def self.parse_line(line, options = Hash.new)
1308
+ new(line, options).shift
1309
+ end
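The behaviour of CSV::parse() and CSV::parse_line() documented above can be sketched as follows. The sample strings are made up for illustration.

```ruby
require "csv"

data = "a,b\n1,2\n"

# Without a block, parse returns an Array of Arrays.
all_rows = CSV.parse(data)

# With a block, each row is yielded in turn.
seen = []
CSV.parse(data) { |row| seen << row }

# parse_line only looks at the first row of its input.
first = CSV.parse_line("x,y\nignored,row\n")

# With no data to read, the underlying shift returns nil.
empty = CSV.parse_line("")
```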
657
1310
 
658
- # CSV formatted string/stream writer.
659
1311
  #
660
- # EXAMPLE
661
- # Write rows to 'csvout' file.
1312
+ # Use to slurp a CSV file into an Array of Arrays. Pass the +path+ to the
1313
+ # file and any +options+ CSV::new() understands. This method also understands
1314
+ # an additional <tt>:encoding</tt> parameter that you can use to specify the
1315
+ # Encoding of the data in the file to be read. You must provide this unless
1316
+ # your data is in Encoding::default_external(). CSV will use this to determine
1317
+ # how to parse the data. You may provide a second Encoding to have the data
1318
+ # transcoded as it is read. For example,
1319
+ # <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
1320
+ # but transcode it to UTF-8 before CSV parses it.
662
1321
  #
663
- # outfile = File.open('csvout', 'wb')
664
- # CSV::Writer.generate(outfile) do |csv|
665
- # csv << ['c1', nil, '', '"', "\r\n", 'c2']
666
- # ...
667
- # end
1322
+ def self.read(path, *options)
1323
+ open(path, *options) { |csv| csv.read }
1324
+ end
1325
+
1326
+ # Alias for CSV::read().
1327
+ def self.readlines(*args)
1328
+ read(*args)
1329
+ end
1330
+
668
1331
  #
669
- # outfile.close
1332
+ # A shortcut for:
670
1333
  #
671
- class Writer
672
- # Given block is called with the writer instance. str_or_writable must
673
- # handle '<<(string)'.
674
- def Writer.generate(str_or_writable, fs = ',', rs = nil, &block)
675
- writer = Writer.create(str_or_writable, fs, rs)
676
- if block
677
- yield(writer)
678
- writer.close
679
- nil
680
- else
681
- writer
682
- end
1334
+ # CSV.read( path, { headers: true,
1335
+ # converters: :numeric,
1336
+ # header_converters: :symbol }.merge(options) )
1337
+ #
1338
+ def self.table(path, options = Hash.new)
1339
+ read( path, { headers: true,
1340
+ converters: :numeric,
1341
+ header_converters: :symbol }.merge(options) )
1342
+ end
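A short sketch of what the CSV::table() defaults do in practice, assuming a hypothetical temporary file. Headers are run through the :symbol header converter and fields through the :numeric converter.

```ruby
require "csv"
require "tempfile"

# Hypothetical input file, created only for this illustration.
file = Tempfile.new(["inventory", ".csv"])
file.write("Name,Qty\napple,3\npear,4.5\n")
file.close

table   = CSV.table(file.path)
headers = table.headers   # header_converters: :symbol
qtys    = table[:qty]     # converters: :numeric

file.unlink
```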
1343
+
1344
+ #
1345
+ # This constructor will wrap either a String or IO object passed in +data+ for
1346
+ # reading and/or writing. In addition to the CSV instance methods, several IO
1347
+ # methods are delegated. (See CSV::open() for a complete list.) If you pass
1348
+ # a String for +data+, you can later retrieve it (after writing to it, for
1349
+ # example) with CSV.string().
1350
+ #
1351
+ # Note that a wrapped String will be positioned at the beginning (for
1352
+ # reading). If you want it at the end (for writing), use CSV::generate().
1353
+ # If you want any other positioning, pass a preset StringIO object instead.
1354
+ #
1355
+ # You may set any reading and/or writing preferences in the +options+ Hash.
1356
+ # Available options are:
1357
+ #
1358
+ # <b><tt>:col_sep</tt></b>:: The String placed between each field.
1359
+ # This String will be transcoded into
1360
+ # the data's Encoding before parsing.
1361
+ # <b><tt>:row_sep</tt></b>:: The String appended to the end of each
1362
+ # row. This can be set to the special
1363
+ # <tt>:auto</tt> setting, which requests
1364
+ # that CSV automatically discover this
1365
+ # from the data. Auto-discovery reads
1366
+ # ahead in the data looking for the next
1367
+ # <tt>"\r\n"</tt>, <tt>"\n"</tt>, or
1368
+ # <tt>"\r"</tt> sequence. A sequence
1369
+ # will be selected even if it occurs in
1370
+ # a quoted field, assuming that you
1371
+ # would have the same line endings
1372
+ # there. If none of those sequences is
1373
+ # found, +data+ is <tt>ARGF</tt>,
1374
+ # <tt>STDIN</tt>, <tt>STDOUT</tt>, or
1375
+ # <tt>STDERR</tt>, or the stream is only
1376
+ # available for output, the default
1377
+ # <tt>$INPUT_RECORD_SEPARATOR</tt>
1378
+ # (<tt>$/</tt>) is used. Obviously,
1379
+ # discovery takes a little time. Set
1380
+ # manually if speed is important. Also
1381
+ # note that IO objects should be opened
1382
+ # in binary mode on Windows if this
1383
+ # feature will be used as the
1384
+ # line-ending translation can cause
1385
+ # problems with resetting the document
1386
+ # position to where it was before the
1387
+ # read ahead. This String will be
1388
+ # transcoded into the data's Encoding
1389
+ # before parsing.
1390
+ # <b><tt>:quote_char</tt></b>:: The character used to quote fields.
1391
+ # This has to be a single character
1392
+ # String. This is useful for
1393
+ # applications that incorrectly use
1394
+ # <tt>'</tt> as the quote character
1395
+ # instead of the correct <tt>"</tt>.
1396
+ # CSV will always consider a double
1397
+ # sequence of this character to be an
1398
+ # escaped quote. This String will be
1399
+ # transcoded into the data's Encoding
1400
+ # before parsing.
1401
+ # <b><tt>:field_size_limit</tt></b>:: This is a maximum size CSV will read
1402
+ # ahead looking for the closing quote
1403
+ # for a field. (In truth, it reads to
1404
+ # the first line ending beyond this
1405
+ # size.) If a quote cannot be found
1406
+ # within the limit CSV will raise a
1407
+ # MalformedCSVError, assuming the data
1408
+ # is faulty. You can use this limit to
1409
+ # prevent what are effectively DoS
1410
+ # attacks on the parser. However, this
1411
+ # limit can cause a legitimate parse to
1412
+ # fail and thus is set to +nil+, or off,
1413
+ # by default.
1414
+ # <b><tt>:converters</tt></b>:: An Array of names from the Converters
1415
+ # Hash and/or lambdas that handle custom
1416
+ # conversion. A single converter
1417
+ # doesn't have to be in an Array. All
1418
+ # built-in converters try to transcode
1419
+ # fields to UTF-8 before converting.
1420
+ # The conversion will fail if the data
1421
+ # cannot be transcoded, leaving the
1422
+ # field unchanged.
1423
+ # <b><tt>:unconverted_fields</tt></b>:: If set to +true+, an
1424
+ # unconverted_fields() method will be
1425
+ # added to all returned rows (Array or
1426
+ # CSV::Row) that will return the fields
1427
+ # as they were before conversion. Note
1428
+ # that <tt>:headers</tt> supplied by
1429
+ # Array or String were not fields of the
1430
+ # document and thus will have an empty
1431
+ # Array attached.
1432
+ # <b><tt>:headers</tt></b>:: If set to <tt>:first_row</tt> or
1433
+ # +true+, the initial row of the CSV
1434
+ # file will be treated as a row of
1435
+ # headers. If set to an Array, the
1436
+ # contents will be used as the headers.
1437
+ # If set to a String, the String is run
1438
+ # through a call of CSV::parse_line()
1439
+ # with the same <tt>:col_sep</tt>,
1440
+ # <tt>:row_sep</tt>, and
1441
+ # <tt>:quote_char</tt> as this instance
1442
+ # to produce an Array of headers. This
1443
+ # setting causes CSV#shift() to return
1444
+ # rows as CSV::Row objects instead of
1445
+ # Arrays and CSV#read() to return
1446
+ # CSV::Table objects instead of an Array
1447
+ # of Arrays.
1448
+ # <b><tt>:return_headers</tt></b>:: When +false+, header rows are silently
1449
+ # swallowed. If set to +true+, header
1450
+ # rows are returned in a CSV::Row object
1451
+ # with identical headers and
1452
+ # fields (save that the fields do not go
1453
+ # through the converters).
1454
+ # <b><tt>:write_headers</tt></b>:: When +true+ and <tt>:headers</tt> is
1455
+ # set, a header row will be added to the
1456
+ # output.
1457
+ # <b><tt>:header_converters</tt></b>:: Identical in functionality to
1458
+ # <tt>:converters</tt> save that the
1459
+ # conversions are only made to header
1460
+ # rows. All built-in converters try to
1461
+ # transcode headers to UTF-8 before
1462
+ # converting. The conversion will fail
1463
+ # if the data cannot be transcoded,
1464
+ # leaving the header unchanged.
1465
+ # <b><tt>:skip_blanks</tt></b>:: When set to a +true+ value, CSV will
1466
+ # skip over any rows with no content.
1467
+ # <b><tt>:force_quotes</tt></b>:: When set to a +true+ value, CSV will
1468
+ # quote all CSV fields it creates.
1469
+ # <b><tt>:skip_lines</tt></b>:: When set to an object responding to
1470
+ # <tt>match</tt>, every line matching
1471
+ # it is considered a comment and ignored
1472
+ # during parsing. When set to +nil+
1473
+ # no line is considered a comment.
1474
+ # If the passed object does not respond
1475
+ # to <tt>match</tt>, <tt>ArgumentError</tt>
1476
+ # is raised.
1477
+ #
1478
+ # See CSV::DEFAULT_OPTIONS for the default settings.
1479
+ #
1480
+ # Options cannot be overridden in the instance methods for performance reasons,
1481
+ # so be sure to set what you want here.
1482
+ #
1483
+ def initialize(data, options = Hash.new)
1484
+ # build the options for this read/write
1485
+ options = DEFAULT_OPTIONS.merge(options)
1486
+
1487
+ # create the IO object we will read from
1488
+ @io = data.is_a?(String) ? StringIO.new(data) : data
1489
+ # honor the IO encoding if we can, otherwise default to ASCII-8BIT
1490
+ @encoding = raw_encoding(nil) ||
1491
+ ( if encoding = options.delete(:internal_encoding)
1492
+ case encoding
1493
+ when Encoding; encoding
1494
+ else Encoding.find(encoding)
1495
+ end
1496
+ end ) ||
1497
+ ( case encoding = options.delete(:encoding)
1498
+ when Encoding; encoding
1499
+ when /\A[^:]+/; Encoding.find($&)
1500
+ end ) ||
1501
+ Encoding.default_internal || Encoding.default_external
1502
+ #
1503
+ # prepare for building safe regular expressions in the target encoding,
1504
+ # if we can transcode the needed characters
1505
+ #
1506
+ @re_esc = "\\".encode(@encoding) rescue ""
1507
+ @re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
1508
+ # @re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding, fallback: proc{""})}/
1509
+
1510
+ init_separators(options)
1511
+ init_parsers(options)
1512
+ init_converters(options)
1513
+ init_headers(options)
1514
+ init_comments(options)
1515
+
1516
+ options.delete(:encoding)
1517
+ options.delete(:internal_encoding)
1518
+ options.delete(:external_encoding)
1519
+ unless options.empty?
1520
+ raise ArgumentError, "Unknown options: #{options.keys.join(', ')}."
683
1521
  end
684
1522
 
685
- # str_or_writable must handle '<<(string)'.
686
- def Writer.create(str_or_writable, fs = ',', rs = nil)
687
- BasicWriter.new(str_or_writable, fs, rs)
1523
+ # track our own lineno since IO gets confused about line-ends in CSV fields
1524
+ @lineno = 0
1525
+ end
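A few of the options documented above, sketched with made-up data. The options are written as trailing key/value pairs, which this version accepts as an options Hash and which later Ruby versions accept as keyword arguments. The :bogus key is a deliberately invalid, hypothetical option.

```ruby
require "csv"

# :col_sep, :skip_blanks and :skip_lines on header-less data.
rows = CSV.new("a;b\n# note\n\n1;2\n",
               col_sep: ";",
               skip_blanks: true,
               skip_lines: /\A#/).read

# :headers switches read() to return a CSV::Table; :converters
# is applied to the data fields.
table = CSV.new("name,qty\napple,3\n",
                headers: true,
                converters: :integer).read

# Unknown options are rejected when the object is constructed.
rejected = begin
  CSV.new("a,b\n", bogus: true)
  false
rescue ArgumentError
  true
end
```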
1526
+
1527
+ #
1528
+ # The encoded <tt>:col_sep</tt> used in parsing and writing. See CSV::new
1529
+ # for details.
1530
+ #
1531
+ attr_reader :col_sep
1532
+ #
1533
+ # The encoded <tt>:row_sep</tt> used in parsing and writing. See CSV::new
1534
+ # for details.
1535
+ #
1536
+ attr_reader :row_sep
1537
+ #
1538
+ # The encoded <tt>:quote_char</tt> used in parsing and writing. See CSV::new
1539
+ # for details.
1540
+ #
1541
+ attr_reader :quote_char
1542
+ # The limit for field size, if any. See CSV::new for details.
1543
+ attr_reader :field_size_limit
1544
+
1545
+ # The regex marking a line as a comment. See CSV::new for details.
1546
+ attr_reader :skip_lines
1547
+
1548
+ #
1549
+ # Returns the current list of converters in effect. See CSV::new for details.
1550
+ # Built-in converters will be returned by name, while others will be returned
1551
+ # as is.
1552
+ #
1553
+ def converters
1554
+ @converters.map do |converter|
1555
+ name = Converters.rassoc(converter)
1556
+ name ? name.first : converter
688
1557
  end
1558
+ end
1559
+ #
1560
+ # Returns +true+ if unconverted_fields() will be added to parsed results. See CSV::new
1561
+ # for details.
1562
+ #
1563
+ def unconverted_fields?() @unconverted_fields end
1564
+ #
1565
+ # Returns +nil+ if headers will not be used, +true+ if they will but have not
1566
+ # yet been read, or the actual headers after they have been read. See
1567
+ # CSV::new for details.
1568
+ #
1569
+ def headers
1570
+ @headers || true if @use_headers
1571
+ end
1572
+ #
1573
+ # Returns +true+ if headers will be returned as a row of results.
1574
+ # See CSV::new for details.
1575
+ #
1576
+ def return_headers?() @return_headers end
1577
+ # Returns +true+ if headers are written in output. See CSV::new for details.
1578
+ def write_headers?() @write_headers end
1579
+ #
1580
+ # Returns the current list of converters in effect for headers. See CSV::new
1581
+ # for details. Built-in converters will be returned by name, while others
1582
+ # will be returned as is.
1583
+ #
1584
+ def header_converters
1585
+ @header_converters.map do |converter|
1586
+ name = HeaderConverters.rassoc(converter)
1587
+ name ? name.first : converter
1588
+ end
1589
+ end
1590
+ #
1591
+ # Returns +true+ if blank lines are skipped by the parser. See CSV::new
1592
+ # for details.
1593
+ #
1594
+ def skip_blanks?() @skip_blanks end
1595
+ # Returns +true+ if all output fields are quoted. See CSV::new for details.
1596
+ def force_quotes?() @force_quotes end
689
1597
 
690
- # dump CSV stream to the device. argument must be an Array of String.
691
- def <<(row)
692
- CSV.generate_row(row, row.size, @dev, @fs, @rs)
693
- self
1598
+ #
1599
+ # The Encoding CSV is parsing or writing in. This will be the Encoding you
1600
+ # receive parsed data in and/or the Encoding data will be written in.
1601
+ #
1602
+ attr_reader :encoding
1603
+
1604
+ #
1605
+ # The line number of the last row read from this file. Fields with nested
1606
+ # line-end characters will not affect this count.
1607
+ #
1608
+ attr_reader :lineno
1609
+
1610
+ ### IO and StringIO Delegation ###
1611
+
1612
+ extend Forwardable
1613
+ def_delegators :@io, :binmode, :binmode?, :close, :close_read, :close_write,
1614
+ :closed?, :eof, :eof?, :external_encoding, :fcntl,
1615
+ :fileno, :flock, :flush, :fsync, :internal_encoding,
1616
+ :ioctl, :isatty, :path, :pid, :pos, :pos=, :reopen,
1617
+ :seek, :stat, :string, :sync, :sync=, :tell, :to_i,
1618
+ :to_io, :truncate, :tty?
1619
+
1620
+ # Rewinds the underlying IO object and resets CSV's lineno() counter.
1621
+ def rewind
1622
+ @headers = nil
1623
+ @lineno = 0
1624
+
1625
+ @io.rewind
1626
+ end
1627
+
1628
+ ### End Delegation ###
1629
+
1630
+ #
1631
+ # The primary write method for wrapped Strings and IOs, +row+ (an Array or
1632
+ # CSV::Row) is converted to CSV and appended to the data source. When a
1633
+ # CSV::Row is passed, only the row's fields() are appended to the output.
1634
+ #
1635
+ # The data source must be open for writing.
1636
+ #
1637
+ def <<(row)
1638
+ # make sure headers have been assigned
1639
+ if header_row? and [Array, String].include? @use_headers.class
1640
+ parse_headers # won't read data for Array or String
1641
+ self << @headers if @write_headers
694
1642
  end
695
- alias add_row <<
696
1643
 
697
- def close
698
- terminate
1644
+ # handle CSV::Row objects and Hashes
1645
+ row = case row
1646
+ when self.class::Row then row.fields
1647
+ when Hash then @headers.map { |header| row[header] }
1648
+ else row
1649
+ end
1650
+
1651
+ @headers = row if header_row?
1652
+ @lineno += 1
1653
+
1654
+ output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
1655
+ if @io.is_a?(StringIO) and
1656
+ output.encoding != raw_encoding and
1657
+ (compatible_encoding = Encoding.compatible?(@io.string, output))
1658
+ @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
1659
+ @io.seek(0, IO::SEEK_END)
699
1660
  end
1661
+ @io << output
700
1662
 
701
- private
1663
+ self # for chaining
1664
+ end
1665
+ alias_method :add_row, :<<
1666
+ alias_method :puts, :<<
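A minimal sketch of writing through CSV#<<() to a wrapped String, assuming the default separators. The sample rows are invented for illustration.

```ruby
require "csv"

csv = CSV.new(+"")          # unfrozen, empty String as the data sink
csv << ["a", nil, "b,c"]    # nil becomes an empty field
csv << ["say \"hi\"", ""] << ["chained"]   # << returns self
output = csv.string
```

Fields containing the column separator or the quote character are quoted, and embedded quotes are doubled, so output holds three rows: `a,,"b,c"`, then `"say ""hi""",""`, then `chained`.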
702
1667
 
703
- def initialize(dev)
704
- raise RuntimeError.new('Do not instanciate this class directly.')
1668
+ #
1669
+ # :call-seq:
1670
+ # convert( name )
1671
+ # convert { |field| ... }
1672
+ # convert { |field, field_info| ... }
1673
+ #
1674
+ # You can use this method to install a CSV::Converters built-in, or provide a
1675
+ # block that handles a custom conversion.
1676
+ #
1677
+ # If you provide a block that takes one argument, it will be passed the field
1678
+ # and is expected to return the converted value or the field itself. If your
1679
+ # block takes two arguments, it will also be passed a CSV::FieldInfo Struct,
1680
+ # containing details about the field. Again, the block should return a
1681
+ # converted field or the field itself.
1682
+ #
1683
+ def convert(name = nil, &converter)
1684
+ add_converter(:converters, self.class::Converters, name, &converter)
1685
+ end
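The two styles of CSV#convert() described above, sketched on a single made-up row. The second converter uses the two-argument block form and touches only the field at index 2.

```ruby
require "csv"

csv = CSV.new("1,2.5,abc\n")

# Install a built-in converter by name.
csv.convert(:numeric)

# Install a custom converter; the second block argument is the
# CSV::FieldInfo Struct for the field being converted.
csv.convert { |field, info| info.index == 2 ? field.upcase : field }

row = csv.shift
```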
1686
+
1687
+ #
1688
+ # :call-seq:
1689
+ # header_convert( name )
1690
+ # header_convert { |field| ... }
1691
+ # header_convert { |field, field_info| ... }
1692
+ #
1693
+ # Identical to CSV#convert(), but for header rows.
1694
+ #
1695
+ # Note that this method must be called before header rows are read to have any
1696
+ # effect.
1697
+ #
1698
+ def header_convert(name = nil, &converter)
1699
+ add_converter( :header_converters,
1700
+ self.class::HeaderConverters,
1701
+ name,
1702
+ &converter )
1703
+ end
1704
+
1705
+ include Enumerable
1706
+
1707
+ #
1708
+ # Yields each row of the data source in turn.
1709
+ #
1710
+ # Support for Enumerable.
1711
+ #
1712
+ # The data source must be open for reading.
1713
+ #
1714
+ def each
1715
+ if block_given?
1716
+ while row = shift
1717
+ yield row
1718
+ end
1719
+ else
1720
+ to_enum
705
1721
  end
1722
+ end
706
1723
 
707
- def terminate
708
- # Define if needed.
1724
+ #
1725
+ # Slurps the remaining rows and returns an Array of Arrays.
1726
+ #
1727
+ # The data source must be open for reading.
1728
+ #
1729
+ def read
1730
+ rows = to_a
1731
+ if @use_headers
1732
+ Table.new(rows)
1733
+ else
1734
+ rows
709
1735
  end
710
1736
  end
1737
+ alias_method :readlines, :read
711
1738
 
1739
+ # Returns +true+ if the next row read will be a header row.
1740
+ def header_row?
1741
+ @use_headers and @headers.nil?
1742
+ end
712
1743
 
713
- class BasicWriter < Writer
714
- def initialize(str_or_writable, fs = ',', rs = nil)
715
- @fs = fs
716
- @rs = rs
717
- @dev = str_or_writable
718
- @close_on_terminate = false
719
- end
1744
+ #
1745
+ # The primary read method for wrapped Strings and IOs, a single row is pulled
1746
+ # from the data source, parsed and returned as an Array of fields (if header
1747
+ # rows are not used) or a CSV::Row (when header rows are used).
1748
+ #
1749
+ # The data source must be open for reading.
1750
+ #
1751
+ def shift
1752
+ #########################################################################
1753
+ ### This method is purposefully kept a bit long as simple conditional ###
1754
+ ### checks are faster than numerous (expensive) method calls. ###
1755
+ #########################################################################
720
1756
 
721
- # Tell this writer to close the IO when terminated (Triggered by invoking
722
- # CSV::BasicWriter#close).
723
- def close_on_terminate
724
- @close_on_terminate = true
1757
+ # handle headers not based on document content
1758
+ if header_row? and @return_headers and
1759
+ [Array, String].include? @use_headers.class
1760
+ if @unconverted_fields
1761
+ return add_unconverted_fields(parse_headers, Array.new)
1762
+ else
1763
+ return parse_headers
1764
+ end
725
1765
  end
726
1766
 
727
- private
1767
+ #
1768
+ # it can take multiple calls to <tt>@io.gets()</tt> to get a full line,
1769
+ # because of \r and/or \n characters embedded in quoted fields
1770
+ #
1771
+ in_extended_col = false
1772
+ csv = Array.new
728
1773
 
729
- def terminate
730
- if @close_on_terminate
731
- @dev.close
732
- end
733
- end
734
- end
735
-
736
- private
737
-
738
- # Buffered stream.
739
- #
740
- # EXAMPLE 1 -- an IO.
741
- # class MyBuf < StreamBuf
742
- # # Do initialize myself before a super class. Super class might call my
743
- # # method 'read'. (Could be awful for C++ user. :-)
744
- # def initialize(s)
745
- # @s = s
746
- # super()
747
- # end
748
- #
749
- # # define my own 'read' method.
750
- # # CAUTION: Returning nil means EnfOfStream.
751
- # def read(size)
752
- # @s.read(size)
753
- # end
754
- #
755
- # # release buffers. in Ruby which has GC, you do not have to call this...
756
- # def terminate
757
- # @s = nil
758
- # super()
759
- # end
760
- # end
761
- #
762
- # buf = MyBuf.new(STDIN)
763
- # my_str = ''
764
- # p buf[0, 0] # => '' (null string)
765
- # p buf[0] # => 97 (char code of 'a')
766
- # p buf[0, 1] # => 'a'
767
- # my_str = buf[0, 5]
768
- # p my_str # => 'abcde' (5 chars)
769
- # p buf[0, 6] # => "abcde\n" (6 chars)
770
- # p buf[0, 7] # => "abcde\n" (6 chars)
771
- # p buf.drop(3) # => 3 (dropped chars)
772
- # p buf.get(0, 2) # => 'de' (2 chars)
773
- # p buf.is_eos? # => false (is not EOS here)
774
- # p buf.drop(5) # => 3 (dropped chars)
775
- # p buf.is_eos? # => true (is EOS here)
776
- # p buf[0] # => nil (is EOS here)
777
- #
778
- # EXAMPLE 2 -- String.
779
- # This is a conceptual example. No pros with this.
780
- #
781
- # class StrBuf < StreamBuf
782
- # def initialize(s)
783
- # @str = s
784
- # @idx = 0
785
- # super()
786
- # end
787
- #
788
- # def read(size)
789
- # str = @str[@idx, size]
790
- # @idx += str.size
791
- # str
792
- # end
793
- # end
794
- #
795
- class StreamBuf
796
- # get a char or a partial string from the stream.
797
- # idx: index of a string to specify a start point of a string to get.
798
- # unlike String instance, idx < 0 returns nil.
799
- # n: size of a string to get.
800
- # returns char at idx if n == nil.
801
- # returns a partial string, from idx to (idx + n) if n != nil. at EOF,
802
- # the string size could not equal to arg n.
803
- def [](idx, n = nil)
804
- if idx < 0
1774
+ loop do
1775
+ # add another read to the line
1776
+ unless parse = @io.gets(@row_sep)
805
1777
  return nil
806
1778
  end
807
- if (idx_is_eos?(idx))
808
- if n and (@offset + idx == buf_size(@cur_buf))
809
- # Like a String, 'abc'[4, 1] returns nil and
810
- # 'abc'[3, 1] returns '' not nil.
811
- return ''
812
- else
813
- return nil
1779
+
1780
+ parse.sub!(@parsers[:line_end], "")
1781
+
1782
+ if csv.empty?
1783
+ #
1784
+ # I believe a blank line should be an <tt>Array.new</tt>, not Ruby 1.8
1785
+ # CSV's <tt>[nil]</tt>
1786
+ #
1787
+ if parse.empty?
1788
+ @lineno += 1
1789
+ if @skip_blanks
1790
+ next
1791
+ elsif @unconverted_fields
1792
+ return add_unconverted_fields(Array.new, Array.new)
1793
+ elsif @use_headers
1794
+ return self.class::Row.new(Array.new, Array.new)
1795
+ else
1796
+ return Array.new
1797
+ end
814
1798
  end
815
1799
  end
816
- my_buf = @cur_buf
817
- my_offset = @offset
818
- next_idx = idx
819
- while (my_offset + next_idx >= buf_size(my_buf))
820
- if (my_buf == @buf_tail_idx)
821
- unless add_buf
822
- break
823
- end
1800
+
1801
+ next if @skip_lines and @skip_lines.match parse
1802
+
1803
+ parts = parse.split(@col_sep, -1)
1804
+ if parts.empty?
1805
+ if in_extended_col
1806
+ csv[-1] << @col_sep # will be replaced with a @row_sep after the parts.each loop
1807
+ else
1808
+ csv << nil
824
1809
  end
825
- next_idx = my_offset + next_idx - buf_size(my_buf)
826
- my_buf += 1
827
- my_offset = 0
828
- end
829
- loc = my_offset + next_idx
830
- if !n
831
- return @buf_list[my_buf][loc] # Fixnum of char code.
832
- elsif (loc + n - 1 < buf_size(my_buf))
833
- return @buf_list[my_buf][loc, n] # String.
834
- else # should do loop insted of (tail) recursive call...
835
- res = @buf_list[my_buf][loc, BufSize]
836
- size_added = buf_size(my_buf) - loc
837
- if size_added > 0
838
- idx += size_added
839
- n -= size_added
840
- ret = self[idx, n]
841
- if ret
842
- res << ret
1810
+ end
1811
+
1812
+ # This loop is the hot path of csv parsing. Some things may be non-dry
1813
+ # for a reason. Make sure to benchmark when refactoring.
1814
+ parts.each do |part|
1815
+ if in_extended_col
1816
+ # If we are continuing a previous column
1817
+ if part[-1] == @quote_char && part.count(@quote_char) % 2 != 0
1818
+ # extended column ends
1819
+ csv.last << part[0..-2]
1820
+ if csv.last =~ @parsers[:stray_quote]
1821
+ raise MalformedCSVError,
1822
+ "Missing or stray quote in line #{lineno + 1}"
1823
+ end
1824
+ csv.last.gsub!(@quote_char * 2, @quote_char)
1825
+ in_extended_col = false
1826
+ else
1827
+ csv.last << part
1828
+ csv.last << @col_sep
843
1829
  end
844
- end
845
- return res
846
- end
847
- end
848
- alias get []
849
-
850
- # drop a string from the stream.
851
- # returns dropped size. at EOF, dropped size might not equals to arg n.
852
- # Once you drop the head of the stream, access to the dropped part via []
853
- # or get returns nil.
854
- def drop(n)
855
- if is_eos?
856
- return 0
857
- end
858
- size_dropped = 0
859
- while (n > 0)
860
- if !@is_eos or (@cur_buf != @buf_tail_idx)
861
- if (@offset + n < buf_size(@cur_buf))
862
- size_dropped += n
863
- @offset += n
864
- n = 0
1830
+ elsif part[0] == @quote_char
1831
+ # If we are starting a new quoted column
1832
+ if part[-1] != @quote_char || part.count(@quote_char) % 2 != 0
1833
+ # start an extended column
1834
+ csv << part[1..-1]
1835
+ csv.last << @col_sep
1836
+ in_extended_col = true
865
1837
  else
866
- size = buf_size(@cur_buf) - @offset
867
- size_dropped += size
868
- n -= size
869
- @offset = 0
870
- unless rel_buf
871
- unless add_buf
872
- break
873
- end
874
- @cur_buf = @buf_tail_idx
1838
+ # regular quoted column
1839
+ csv << part[1..-2]
1840
+ if csv.last =~ @parsers[:stray_quote]
1841
+ raise MalformedCSVError,
1842
+ "Missing or stray quote in line #{lineno + 1}"
875
1843
  end
1844
+ csv.last.gsub!(@quote_char * 2, @quote_char)
1845
+ end
1846
+ elsif part =~ @parsers[:quote_or_nl]
1847
+ # Unquoted field with bad characters.
1848
+ if part =~ @parsers[:nl_or_lf]
1849
+ raise MalformedCSVError, "Unquoted fields do not allow " +
1850
+ "\\r or \\n (line #{lineno + 1})."
1851
+ else
1852
+ raise MalformedCSVError, "Illegal quoting in line #{lineno + 1}."
876
1853
  end
1854
+ else
1855
+ # Regular ole unquoted field.
1856
+ csv << (part.empty? ? nil : part)
877
1857
  end
878
1858
  end
879
- size_dropped
880
- end
881
-
882
- def is_eos?
883
- return idx_is_eos?(0)
884
- end
885
-
886
- # WARN: Do not instantiate this class directly. Define your own class
887
- # which derives this class and define 'read' instance method.
888
- def initialize
889
- @buf_list = []
890
- @cur_buf = @buf_tail_idx = -1
891
- @offset = 0
892
- @is_eos = false
893
- add_buf
894
- @cur_buf = @buf_tail_idx
895
- end
896
-
897
- protected
898
-
899
- def terminate
900
- while (rel_buf); end
901
- end
902
-
903
- # protected method 'read' must be defined in derived classes.
904
- # CAUTION: Returning a string which size is not equal to 'size' means
905
- # EnfOfStream. When it is not at EOS, you must block the callee, try to
906
- # read and return the sized string.
907
- def read(size) # raise EOFError
908
- raise NotImplementedError.new('Method read must be defined in a derived class.')
909
- end
910
-
911
- private
912
-
913
- def buf_size(idx)
914
- @buf_list[idx].size
1859
+
1860
+ # Replace tacked on @col_sep with @row_sep if we are still in an extended
1861
+ # column.
1862
+ csv[-1][-1] = @row_sep if in_extended_col
1863
+
1864
+ if in_extended_col
1865
+ # if we're at eof?(), a quoted field wasn't closed...
1866
+ if @io.eof?
1867
+ raise MalformedCSVError,
1868
+ "Unclosed quoted field on line #{lineno + 1}."
1869
+ elsif @field_size_limit and csv.last.size >= @field_size_limit
1870
+ raise MalformedCSVError, "Field size exceeded on line #{lineno + 1}."
1871
+ end
1872
+ # otherwise, we need to loop and pull some more data to complete the row
1873
+ else
1874
+ @lineno += 1
1875
+
1876
+ # save unconverted fields, if needed...
1877
+ unconverted = csv.dup if @unconverted_fields
1878
+
1879
+ # convert fields, if needed...
1880
+ csv = convert_fields(csv) unless @use_headers or @converters.empty?
1881
+ # parse out header rows and handle CSV::Row conversions...
1882
+ csv = parse_headers(csv) if @use_headers
1883
+
1884
+ # inject unconverted fields and accessor, if requested...
1885
+ if @unconverted_fields and not csv.respond_to? :unconverted_fields
1886
+ add_unconverted_fields(csv, unconverted)
1887
+ end
1888
+
1889
+ # return the results
1890
+ break csv
1891
+ end
915
1892
  end
1893
+ end
1894
+ alias_method :gets, :shift
1895
+ alias_method :readline, :shift
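A sketch of how CSV#shift() behaves from the caller's side, using invented data: a quoted field that spans two physical lines, end of data, a header-based row, and an unclosed quote.

```ruby
require "csv"

# A quoted field may span physical lines; shift still returns one row.
plain  = CSV.new("\"line one\nline two\",x\nlast,row\n")
first  = plain.shift
second = plain.shift
done   = plain.shift   # nil at end of data

# With headers in use, shift returns CSV::Row objects.
with_headers = CSV.new("name,qty\napple,3\n", headers: true)
row = with_headers.shift

# A quoted field that is never closed is reported as malformed.
error = begin
  CSV.new("a,\"unclosed\n").shift
  nil
rescue CSV::MalformedCSVError => e
  e
end
```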
916
1896
 
917
- def add_buf
918
- if @is_eos
919
- return false
1897
+ #
1898
+ # Returns a simplified description of the key CSV attributes in an
1899
+ # ASCII compatible String.
1900
+ #
1901
+ def inspect
1902
+ str = ["<#", self.class.to_s, " io_type:"]
1903
+ # show type of wrapped IO
1904
+ if @io == $stdout then str << "$stdout"
1905
+ elsif @io == $stdin then str << "$stdin"
1906
+ elsif @io == $stderr then str << "$stderr"
1907
+ else str << @io.class.to_s
1908
+ end
1909
+ # show IO.path(), if available
1910
+ if @io.respond_to?(:path) and (p = @io.path)
1911
+ str << " io_path:" << p.inspect
1912
+ end
1913
+ # show encoding
1914
+ str << " encoding:" << @encoding.name
1915
+ # show other attributes
1916
+ %w[ lineno col_sep row_sep
1917
+ quote_char skip_blanks ].each do |attr_name|
1918
+ if a = instance_variable_get("@#{attr_name}")
1919
+ str << " " << attr_name << ":" << a.inspect
920
1920
  end
921
- begin
922
- str_read = read(BufSize)
923
- rescue EOFError
924
- str_read = nil
925
- rescue
926
- terminate
927
- raise
928
- end
929
- if str_read.nil?
930
- @is_eos = true
931
- @buf_list.push('')
932
- @buf_tail_idx += 1
933
- false
1921
+ end
1922
+ if @use_headers
1923
+ str << " headers:" << headers.inspect
1924
+ end
1925
+ str << ">"
1926
+ begin
1927
+ str.join('')
1928
+ rescue # any encoding error
1929
+ str.map do |s|
1930
+ e = Encoding::Converter.asciicompat_encoding(s.encoding)
1931
+ e ? s.encode(e) : s.force_encoding("ASCII-8BIT")
1932
+ end.join('')
1933
+ end
1934
+ end
1935
+
1936
+ private
1937
+
1938
+ #
1939
+ # Stores the indicated separators for later use.
1940
+ #
1941
+ # If auto-discovery was requested for <tt>@row_sep</tt>, this method will read
1942
+ # ahead in the <tt>@io</tt> and try to find one. +ARGF+, +STDIN+, +STDOUT+,
1943
+ # +STDERR+ and any stream open for output only get a default
1944
+ # <tt>@row_sep</tt> of <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
1945
+ #
1946
+ # This method also establishes the quoting rules used for CSV output.
1947
+ #
1948
+ def init_separators(options)
1949
+ # store the selected separators
1950
+ @col_sep = options.delete(:col_sep).to_s.encode(@encoding)
1951
+ @row_sep = options.delete(:row_sep) # encode after resolving :auto
1952
+ @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
1953
+
1954
+ if @quote_char.length != 1
1955
+ raise ArgumentError, ":quote_char has to be a single character String"
1956
+ end
1957
+
1958
+ #
1959
+ # automatically discover row separator when requested
1960
+ # (not fully encoding safe)
1961
+ #
1962
+ if @row_sep == :auto
1963
+ if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
1964
+ (defined?(Zlib) and @io.class == Zlib::GzipWriter)
1965
+ @row_sep = $INPUT_RECORD_SEPARATOR
934
1966
  else
935
- @buf_list.push(str_read)
936
- @buf_tail_idx += 1
937
- true
1967
+ begin
1968
+ #
1969
+ # remember where we were (pos() will raise an exception if @io is a pipe
1970
+ # or not opened for reading)
1971
+ #
1972
+ saved_pos = @io.pos
1973
+ while @row_sep == :auto
1974
+ #
1975
+ # if we run out of data, it's probably a single line
1976
+ # (ensure will set default value)
1977
+ #
1978
+ break unless sample = @io.gets(nil, 1024)
1979
+ # extend sample if we're unsure of the line ending
1980
+ if sample.end_with? encode_str("\r")
1981
+ sample << (@io.gets(nil, 1) || "")
1982
+ end
1983
+
1984
+ # try to find a standard separator
1985
+ if sample =~ encode_re("\r\n?|\n")
1986
+ @row_sep = $&
1987
+ break
1988
+ end
1989
+ end
1990
+
1991
+ # tricky seek() clone to work around GzipReader's lack of seek()
1992
+ @io.rewind
1993
+ # reset back to the remembered position
1994
+ while saved_pos > 1024 # avoid loading a lot of data into memory
1995
+ @io.read(1024)
1996
+ saved_pos -= 1024
1997
+ end
1998
+ @io.read(saved_pos) if saved_pos.nonzero?
1999
+ rescue IOError # not opened for reading
2000
+ # do nothing: ensure will set default
2001
+ rescue NoMethodError # Zlib::GzipWriter doesn't have some IO methods
2002
+ # do nothing: ensure will set default
2003
+ rescue SystemCallError # pipe
2004
+ # do nothing: ensure will set default
2005
+ ensure
2006
+ #
2007
+ # set default if we failed to detect
2008
+ # (stream not opened for reading, a pipe, or a single line of data)
2009
+ #
2010
+ @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
2011
+ end
938
2012
  end
939
2013
  end
940
-
941
- def rel_buf
942
- if (@cur_buf < 0)
943
- return false
2014
+ @row_sep = @row_sep.to_s.encode(@encoding)
2015
+
2016
+ # establish quoting rules
2017
+ @force_quotes = options.delete(:force_quotes)
2018
+ do_quote = lambda do |field|
2019
+ field = String(field)
2020
+ encoded_quote = @quote_char.encode(field.encoding)
2021
+ encoded_quote +
2022
+ field.gsub(encoded_quote, encoded_quote * 2) +
2023
+ encoded_quote
2024
+ end
2025
+ quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
2026
+ @quote = if @force_quotes
2027
+ do_quote
2028
+ else
2029
+ lambda do |field|
2030
+ if field.nil? # represent +nil+ fields as empty unquoted fields
2031
+ ""
2032
+ else
2033
+ field = String(field) # Stringify fields
2034
+ # represent empty fields as empty quoted fields
2035
+ if field.empty? or
2036
+ field.count(quotable_chars).nonzero?
2037
+ do_quote.call(field)
2038
+ else
2039
+ field # unquoted field
2040
+ end
2041
+ end
944
2042
  end
945
- @buf_list[@cur_buf] = nil
946
- if (@cur_buf == @buf_tail_idx)
947
- @cur_buf = -1
948
- return false
949
- else
950
- @cur_buf += 1
951
- return true
2043
+ end
2044
+ end
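
The quoting rules established at the end of init_separators can be sketched as a standalone function. This is a simplified illustration and not the library's code: `quote_field` and its keyword parameters are hypothetical names, and the encoding handling done by the real method is left out.

```ruby
# Standalone sketch of the quoting rules (hypothetical helper, no encoding handling).
def quote_field(field, col_sep: ",", quote_char: '"', force_quotes: false)
  # Wrap the field in quotes and double any embedded quote characters.
  do_quote = lambda do |f|
    f = String(f)
    quote_char + f.gsub(quote_char, quote_char * 2) + quote_char
  end

  return do_quote.call(field) if force_quotes
  return "" if field.nil?                 # nil becomes an empty unquoted field

  field = String(field)
  quotable_chars = "\r\n" + col_sep + quote_char
  if field.empty? || field.count(quotable_chars).nonzero?
    do_quote.call(field)                  # empty or "dangerous" fields get quoted
  else
    field                                 # everything else is written as-is
  end
end

puts quote_field("plain")
puts quote_field("a,b")
puts quote_field('say "hi"')
```

Under these assumptions an empty String is written as a pair of quotes while +nil+ is written as nothing at all, which is what lets a reader tell the two apart.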
2045
+
2046
+ # Pre-compiles parsers and stores them by name for access during reads.
2047
+ def init_parsers(options)
2048
+ # store the parser behaviors
2049
+ @skip_blanks = options.delete(:skip_blanks)
2050
+ @field_size_limit = options.delete(:field_size_limit)
2051
+
2052
+ # prebuild Regexps for faster parsing
2053
+ esc_row_sep = escape_re(@row_sep)
2054
+ esc_quote = escape_re(@quote_char)
2055
+ @parsers = {
2056
+ # for detecting parse errors
2057
+ quote_or_nl: encode_re("[", esc_quote, "\r\n]"),
2058
+ nl_or_lf: encode_re("[\r\n]"),
2059
+ stray_quote: encode_re( "[^", esc_quote, "]", esc_quote,
2060
+ "[^", esc_quote, "]" ),
2061
+ # safer than chomp!()
2062
+ line_end: encode_re(esc_row_sep, "\\z"),
2063
+ # illegal unquoted characters
2064
+ return_newline: encode_str("\r\n")
2065
+ }
2066
+ end
2067
+
2068
+ #
2069
+ # Loads any converters requested during construction.
2070
+ #
2071
+ # If +field_name+ is set to <tt>:converters</tt> (the default) field converters
2072
+ # are set. When +field_name+ is <tt>:header_converters</tt> header converters
2073
+ # are added instead.
2074
+ #
2075
+ # The <tt>:unconverted_fields</tt> option is also activated for
2076
+ # <tt>:converters</tt> calls, if requested.
2077
+ #
2078
+ def init_converters(options, field_name = :converters)
2079
+ if field_name == :converters
2080
+ @unconverted_fields = options.delete(:unconverted_fields)
2081
+ end
2082
+
2083
+ instance_variable_set("@#{field_name}", Array.new)
2084
+
2085
+ # find the correct method to add the converters
2086
+ convert = method(field_name.to_s.sub(/ers\Z/, ""))
2087
+
2088
+ # load converters
2089
+ unless options[field_name].nil?
2090
+ # allow a single converter not wrapped in an Array
2091
+ unless options[field_name].is_a? Array
2092
+ options[field_name] = [options[field_name]]
2093
+ end
2094
+ # load each converter...
2095
+ options[field_name].each do |converter|
2096
+ if converter.is_a? Proc # custom code block
2097
+ convert.call(&converter)
2098
+ else # by name
2099
+ convert.call(converter)
2100
+ end
952
2101
  end
953
2102
  end
954
-
955
- def idx_is_eos?(idx)
956
- (@is_eos and ((@cur_buf < 0) or (@cur_buf == @buf_tail_idx)))
2103
+
2104
+ options.delete(field_name)
2105
+ end
2106
+
2107
+ # Stores header row settings and loads header converters, if needed.
2108
+ def init_headers(options)
2109
+ @use_headers = options.delete(:headers)
2110
+ @return_headers = options.delete(:return_headers)
2111
+ @write_headers = options.delete(:write_headers)
2112
+
2113
+ # headers must be delayed until shift(), in case they need a row of content
2114
+ @headers = nil
2115
+
2116
+ init_converters(options, :header_converters)
2117
+ end
2118
+
2119
+ # Stores the pattern of comments to skip from the provided options.
2120
+ #
2121
+ # The pattern must respond to +.match+, else ArgumentError is raised.
2122
+ #
2123
+ # See also CSV.new
2124
+ def init_comments(options)
2125
+ @skip_lines = options.delete(:skip_lines)
2126
+ if @skip_lines and not @skip_lines.respond_to?(:match)
2127
+ raise ArgumentError, ":skip_lines has to respond to matches"
2128
+ end
2129
+ end
2130
+ #
2131
+ # The actual work method for adding converters, used by both CSV.convert() and
2132
+ # CSV.header_convert().
2133
+ #
2134
+ # This method requires the +var_name+ of the instance variable to place the
2135
+ # converters in, the +const+ Hash to lookup named converters in, and the
2136
+ # normal parameters of the CSV.convert() and CSV.header_convert() methods.
2137
+ #
2138
+ def add_converter(var_name, const, name = nil, &converter)
2139
+ if name.nil? # custom converter
2140
+ instance_variable_get("@#{var_name}") << converter
2141
+ else # named converter
2142
+ combo = const[name]
2143
+ case combo
2144
+ when Array # combo converter
2145
+ combo.each do |converter_name|
2146
+ add_converter(var_name, const, converter_name)
2147
+ end
2148
+ else # individual named converter
2149
+ instance_variable_get("@#{var_name}") << combo
2150
+ end
957
2151
  end
958
-
959
- BufSize = 1024 * 8
960
2152
  end
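
The way add_converter expands a "combo" converter name into individual converters can be sketched on its own. The table and helper below are hypothetical stand-ins (`SKETCH_CONVERTERS`, `resolve_converters`), not the library's constants, and the sketch returns a list instead of appending to an instance variable.

```ruby
# Hypothetical lookup table: names map either to a lambda or to an Array of names.
SKETCH_CONVERTERS = {
  integer: ->(f) { Integer(f) rescue f },
  float:   ->(f) { Float(f) rescue f },
  numeric: [:integer, :float]             # combo: expands to the two above
}.freeze

# Recursively expand a name into a flat list of callable converters.
def resolve_converters(name, table = SKETCH_CONVERTERS)
  entry = table[name]
  case entry
  when Array then entry.flat_map { |n| resolve_converters(n, table) }
  else [entry]
  end
end

puts resolve_converters(:numeric).length
```

Because the expansion is recursive, a combo entry could itself name another combo entry and would still flatten to individual converters.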
961
2153
 
962
- # Buffered IO.
963
2154
  #
964
- # EXAMPLE
965
- # # File 'bigdata' could be a giga-byte size one!
966
- # buf = CSV::IOBuf.new(File.open('bigdata', 'rb'))
967
- # CSV::Reader.new(buf).each do |row|
968
- # p row
969
- # break if row[0].data == 'admin'
970
- # end
2155
+ # Processes +fields+ with <tt>@converters</tt>, or <tt>@header_converters</tt>
2156
+ # if +headers+ is passed as +true+, returning the converted field set. Any
2157
+ # converter that changes the field into something other than a String halts
2158
+ # the pipeline of conversion for that field. This is primarily an efficiency
2159
+ # shortcut.
971
2160
  #
972
- class IOBuf < StreamBuf
973
- def initialize(s)
974
- @s = s
975
- super()
2161
+ def convert_fields(fields, headers = false)
2162
+ # see if we are converting headers or fields
2163
+ converters = headers ? @header_converters : @converters
2164
+
2165
+ fields.map.with_index do |field, index|
2166
+ converters.each do |converter|
2167
+ field = if converter.arity == 1 # straight field converter
2168
+ converter[field]
2169
+ else # FieldInfo converter
2170
+ header = @use_headers && !headers ? @headers[index] : nil
2171
+ converter[field, FieldInfo.new(index, lineno, header)]
2172
+ end
2173
+ break unless field.is_a? String # short-circuit pipeline for speed
2174
+ end
2175
+ field # final state of each field, converted or original
976
2176
  end
977
-
978
- def close
979
- terminate
2177
+ end
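
A minimal sketch of the short-circuit behavior in convert_fields, assuming only single-argument converters (the FieldInfo branch is omitted). `convert_fields_sketch` is a hypothetical name used for illustration.

```ruby
# Run each field through the converters in order, stopping for that field
# as soon as a converter returns something that is no longer a String.
def convert_fields_sketch(fields, converters)
  fields.map do |field|
    converters.each do |converter|
      field = converter.call(field)
      break unless field.is_a?(String)    # converted: skip remaining converters
    end
    field                                 # converted value, or the original String
  end
end

to_int   = ->(f) { Integer(f) rescue f }
to_float = ->(f) { Float(f) rescue f }

p convert_fields_sketch(["1", "2.5", "x"], [to_int, to_float])
```

Here "1" stops after the integer converter, "2.5" falls through to the float converter, and "x" passes through both unchanged.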
2178
+
2179
+ #
2180
+ # This method is used to turn a finished +row+ into a CSV::Row. Header rows
2181
+ # are also dealt with here, either by returning a CSV::Row with identical
2182
+ # headers and fields (save that the fields do not go through the converters)
2183
+ # or by reading past them to return a field row. Headers are also saved in
2184
+ # <tt>@headers</tt> for use in future rows.
2185
+ #
2186
+ # When +nil+, +row+ is assumed to be a header row not based on an actual row
2187
+ # of the stream.
2188
+ #
2189
+ def parse_headers(row = nil)
2190
+ if @headers.nil? # header row
2191
+ @headers = case @use_headers # save headers
2192
+ # Array of headers
2193
+ when Array then @use_headers
2194
+ # CSV header String
2195
+ when String
2196
+ self.class.parse_line( @use_headers,
2197
+ col_sep: @col_sep,
2198
+ row_sep: @row_sep,
2199
+ quote_char: @quote_char )
2200
+ # first row is headers
2201
+ else row
2202
+ end
2203
+
2204
+ # prepare converted and unconverted copies
2205
+ row = @headers if row.nil?
2206
+ @headers = convert_fields(@headers, true)
2207
+
2208
+ if @return_headers # return headers
2209
+ return self.class::Row.new(@headers, row, true)
2210
+ elsif not [Array, String].include? @use_headers.class # skip to field row
2211
+ return shift
2212
+ end
980
2213
  end
981
2214
 
982
- private
2215
+ self.class::Row.new(@headers, convert_fields(row)) # field row
2216
+ end
983
2217
 
984
- def read(size)
985
- @s.read(size)
2218
+ #
2219
+ # This method injects an instance variable <tt>unconverted_fields</tt> into
2220
+ # +row+ and an accessor method for +row+ called unconverted_fields(). The
2221
+ # variable is set to the contents of +fields+.
2222
+ #
2223
+ def add_unconverted_fields(row, fields)
2224
+ class << row
2225
+ attr_reader :unconverted_fields
986
2226
  end
987
-
988
- def terminate
989
- super()
2227
+ row.instance_eval { @unconverted_fields = fields }
2228
+ row
2229
+ end
2230
+
2231
+ #
2232
+ # This method is an encoding safe version of Regexp::escape(). It will escape
2233
+ # any characters that would change the meaning of a regular expression in the
2234
+ # encoding of +str+. Regular expression characters that cannot be transcoded
2235
+ # to the target encoding will be skipped and no escaping will be performed if
2236
+ # a backslash cannot be transcoded.
2237
+ #
2238
+ def escape_re(str)
2239
+ str.gsub(@re_chars) {|c| @re_esc + c}
2240
+ end
2241
+
2242
+ #
2243
+ # Builds a regular expression in <tt>@encoding</tt>. All +chunks+ will be
2244
+ # transcoded to that encoding.
2245
+ #
2246
+ def encode_re(*chunks)
2247
+ Regexp.new(encode_str(*chunks))
2248
+ end
2249
+
2250
+ #
2251
+ # Builds a String in <tt>@encoding</tt>. All +chunks+ will be transcoded to
2252
+ # that encoding.
2253
+ #
2254
+ def encode_str(*chunks)
2255
+ chunks.map { |chunk| chunk.encode(@encoding.name) }.join('')
2256
+ end
2257
+
2258
+ private
2259
+
2260
+ #
2261
+ # Returns the encoding of the internal IO object or the +default+ if the
2262
+ # encoding cannot be determined.
2263
+ #
2264
+ def raw_encoding(default = Encoding::ASCII_8BIT)
2265
+ if @io.respond_to? :internal_encoding
2266
+ @io.internal_encoding || @io.external_encoding
2267
+ elsif @io.is_a? StringIO
2268
+ @io.string.encoding
2269
+ elsif @io.respond_to? :encoding
2270
+ @io.encoding
2271
+ else
2272
+ default
990
2273
  end
991
2274
  end
992
2275
  end
2276
+
2277
+ # Passes +args+ to CSV::instance.
2278
+ #
2279
+ # CSV("CSV,data").read
2280
+ # #=> [["CSV", "data"]]
2281
+ #
2282
+ # If a block is given, the instance is passed the block and the return value
2283
+ # becomes the return value of the block.
2284
+ #
2285
+ # CSV("CSV,data") { |c|
2286
+ # c.read.any? { |a| a.include?("data") }
2287
+ # } #=> true
2288
+ #
2289
+ # CSV("CSV,data") { |c|
2290
+ # c.read.any? { |a| a.include?("zombies") }
2291
+ # } #=> false
2292
+ #
2293
+ def CSV(*args, &block)
2294
+ CSV.instance(*args, &block)
2295
+ end
2296
+
2297
+ class Array # :nodoc:
2298
+ # Equivalent to CSV::generate_line(self, options)
2299
+ #
2300
+ # ["CSV", "data"].to_csv
2301
+ # #=> "CSV,data\n"
2302
+ def to_csv(options = Hash.new)
2303
+ CSV.generate_line(self, options)
2304
+ end
2305
+ end
2306
+
2307
+ class String # :nodoc:
2308
+ # Equivalent to CSV::parse_line(self, options)
2309
+ #
2310
+ # "CSV,data".parse_csv
2311
+ # #=> ["CSV", "data"]
2312
+ def parse_csv(options = Hash.new)
2313
+ CSV.parse_line(self, options)
2314
+ end
2315
+ end
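
The Array#to_csv and String#parse_csv shortcuts added above can be exercised as follows. This assumes a `csv` library that provides these core extensions is installed and loadable with `require "csv"`; the first two calls are taken from the documentation comments in the diff.

```ruby
require "csv"

# Generate one CSV line from an Array of fields.
puts ["CSV", "data"].to_csv

# Parse one CSV line back into an Array of fields.
p "CSV,data".parse_csv

# A field containing the column separator is quoted, nil is written as an
# empty unquoted field, and an empty String is written as a pair of quotes.
puts ["a,b", nil, ""].to_csv
```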