csv 3.2.0 → 3.2.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c48c0d15454e002ff10270a9c56cf4311ce635a8a9dfb527f7a7541f29f801b2
4
- data.tar.gz: 505d1d0dbb4cff0a544b2e00925cb1101ed71642a584d534f443405fba8bd820
3
+ metadata.gz: 915b3ed5a51bf4836f08f7bb06efc3b07bdc90e09209a5253092130e2cad2ab6
4
+ data.tar.gz: 6bce2e39329afcf200691b4b2f422b6a48d45da66368f5d5e136e0c761cd6217
5
5
  SHA512:
6
- metadata.gz: 1c9ecd18d5b9a4f663c0676694ffc133a4657e2f7a07cafe2f0a5d9ddd2d7846f505bc62c21698fcf1117126efc6978b7aa1b497d2fef8d532a8a4246c58bff2
7
- data.tar.gz: e4fe05b49f92c68c011060d1dcd39ead1785d886eabbd3689a12884df9eb30124694417314e39bcf656cd05d6d0dea6a80a701dd5fe6cac42efc33c67be54926
6
+ metadata.gz: 5c1434c8e91c16de40d19d4d1200f193248e786720b67f2bbecf26a481859fe814b8cbaa02d22027668ff02588541266c8ff5d00b9fc1cfc2163b358b8e9ece9
7
+ data.tar.gz: 1978e933549049129f0ec99e80a10f2838b3c75a282103aa177d8421fe7589d428308e2786b29a961a4a7a5565ede77e3b1ef44ba8f4bc91b593a5a884ded7aa
data/NEWS.md CHANGED
@@ -1,5 +1,104 @@
1
1
  # News
2
2
 
3
+ ## 3.2.3 - 2022-04-09
4
+
5
+ ### Improvements
6
+
7
+ * Added contents summary to `CSV::Table#inspect`.
8
+ [GitHub#229][Patch by Eriko Sugiyama]
9
+ [GitHub#235][Patch by Sampat Badhe]
10
+
11
+ * Suppressed `$INPUT_RECORD_SEPARATOR` deprecation warning by
12
+ `Warning.warn`.
13
+ [GitHub#233][Reported by Jean byroot Boussier]
14
+
15
+ * Improved error message for liberal parsing with quoted values.
16
+ [GitHub#231][Patch by Nikolay Rys]
17
+
18
+ * Fixed typos in documentation.
19
+ [GitHub#236][Patch by Sampat Badhe]
20
+
21
+ * Added `:max_field_size` option and deprecated `:field_size_limit` option.
22
+ [GitHub#238][Reported by Dan Buettner]
23
+
24
+ * Added `:symbol_raw` to built-in header converters.
25
+ [GitHub#237][Reported by taki]
26
+ [GitHub#239][Patch by Eriko Sugiyama]
27
+
28
+ ### Fixes
29
+
30
+ * Fixed a bug that some texts may be dropped unexpectedly.
31
+ [Bug #18245][ruby-core:105587][Reported by Hassan Abdul Rehman]
32
+
33
+ * Fixed a bug that `:field_size_limit` doesn't work with not complex row.
34
+ [GitHub#238][Reported by Dan Buettner]
35
+
36
+ ### Thanks
37
+
38
+ * Hassan Abdul Rehman
39
+
40
+ * Eriko Sugiyama
41
+
42
+ * Jean byroot Boussier
43
+
44
+ * Nikolay Rys
45
+
46
+ * Sampat Badhe
47
+
48
+ * Dan Buettner
49
+
50
+ * taki
51
+
52
+ ## 3.2.2 - 2021-12-24
53
+
54
+ ### Improvements
55
+
56
+ * Added a validation for invalid option combination.
57
+ [GitHub#225][Patch by adamroyjones]
58
+
59
+ * Improved documentation for developers.
60
+ [GitHub#227][Patch by Eriko Sugiyama]
61
+
62
+ ### Fixes
63
+
64
+ * Fixed a bug that all of `ARGF` contents may not be consumed.
65
+ [GitHub#228][Reported by Rafael Navaza]
66
+
67
+ ### Thanks
68
+
69
+ * adamroyjones
70
+
71
+ * Eriko Sugiyama
72
+
73
+ * Rafael Navaza
74
+
75
+ ## 3.2.1 - 2021-10-23
76
+
77
+ ### Improvements
78
+
79
+ * doc: Fixed wrong class name.
80
+ [GitHub#217][Patch by Vince]
81
+
82
+ * Changed to always use `"\n"` for the default row separator on Ruby
83
+ 3.0 or later because `$INPUT_RECORD_SEPARATOR` was deprecated
84
+ since Ruby 3.0.
85
+
86
+ * Added support for Ractor.
87
+ [GitHub#218][Patch by rm155]
88
+
89
+ * Users who want to use the built-in converters in non-main
90
+ Ractors need to call `Ractor.make_shareable(CSV::Converters)`
91
+ and/or `Ractor.make_shareable(CSV::HeaderConverters)` before
92
+ creating non-main Ractors.
93
+
94
+ ### Thanks
95
+
96
+ * Vince
97
+
98
+ * Joakim Antman
99
+
100
+ * rm155
101
+
3
102
  ## 3.2.0 - 2021-06-06
4
103
 
5
104
  ### Improvements
data/README.md CHANGED
@@ -35,7 +35,7 @@ end
35
35
 
36
36
  ## Development
37
37
 
38
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
38
+ After checking out the repo, run `ruby run-test.rb` to check if your changes can pass the test.
39
39
 
40
40
  To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
41
41
 
@@ -19,7 +19,7 @@ Without +write_headers+:
19
19
 
20
20
  With +write_headers+":
21
21
  CSV.open(file_path,'w',
22
- :write_headers=> true,
22
+ :write_headers => true,
23
23
  :headers => ['Name','Value']
24
24
  ) do |csv|
25
25
  csv << ['foo', '0']
@@ -148,7 +148,7 @@ This example defines and uses a custom write converter to strip whitespace from
148
148
 
149
149
  ==== Recipe: Specify Multiple Write Converters
150
150
 
151
- Use option <tt>:write_converters</tt> and multiple custom coverters
151
+ Use option <tt>:write_converters</tt> and multiple custom converters
152
152
  to convert field values when generating \CSV.
153
153
 
154
154
  This example defines and uses two custom write converters to strip and upcase generated fields:
@@ -83,7 +83,7 @@ Use instance method CSV#each with option +headers+ to read a source \String one
83
83
  CSV.new(string, headers: true).each do |row|
84
84
  p row
85
85
  end
86
- Ouput:
86
+ Output:
87
87
  #<CSV::Row "Name":"foo" "Value":"0">
88
88
  #<CSV::Row "Name":"bar" "Value":"1">
89
89
  #<CSV::Row "Name":"baz" "Value":"2">
data/lib/csv/fields_converter.rb CHANGED
@@ -16,7 +16,7 @@ class CSV
16
16
  @empty_value = options[:empty_value]
17
17
  @empty_value_is_empty_string = (@empty_value == "")
18
18
  @accept_nil = options[:accept_nil]
19
- @builtin_converters = options[:builtin_converters]
19
+ @builtin_converters_name = options[:builtin_converters_name]
20
20
  @need_static_convert = need_static_convert?
21
21
  end
22
22
 
@@ -24,7 +24,7 @@ class CSV
24
24
  if name.nil? # custom converter
25
25
  @converters << converter
26
26
  else # named converter
27
- combo = @builtin_converters[name]
27
+ combo = builtin_converters[name]
28
28
  case combo
29
29
  when Array # combo converter
30
30
  combo.each do |sub_name|
@@ -80,5 +80,9 @@ class CSV
80
80
  @need_static_convert or
81
81
  (not @converters.empty?)
82
82
  end
83
+
84
+ def builtin_converters
85
+ @builtin_converters ||= ::CSV.const_get(@builtin_converters_name)
86
+ end
83
87
  end
84
88
  end
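The hunks above replace a stored reference to the converters Hash with a constant name that is resolved on first use. The following is a minimal sketch of that lazy-lookup pattern. The `LazyConverters` class is hypothetical and only mirrors the shape of the change; that the motivation is the Ractor support mentioned in NEWS.md is an assumption.

```ruby
require "csv"

# Hypothetical holder (not part of the csv gem): it keeps only the name of
# the constant and resolves it to the Hash the first time it is needed.
class LazyConverters
  def initialize(constant_name)
    @constant_name = constant_name
  end

  def [](name)
    table[name]
  end

  private

  def table
    # ::CSV.const_get(:Converters) returns the same object as CSV::Converters.
    @table ||= ::CSV.const_get(@constant_name)
  end
end

lazy = LazyConverters.new(:Converters)
puts lazy[:integer].equal?(CSV::Converters[:integer])
```

Because the lookup is memoized with `||=`, the constant is resolved at most once per object.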
data/lib/csv/input_record_separator.rb ADDED
@@ -0,0 +1,18 @@
1
+ require "English"
2
+ require "stringio"
3
+
4
+ class CSV
5
+ module InputRecordSeparator
6
+ class << self
7
+ if RUBY_VERSION >= "3.0.0"
8
+ def value
9
+ "\n"
10
+ end
11
+ else
12
+ def value
13
+ $INPUT_RECORD_SEPARATOR
14
+ end
15
+ end
16
+ end
17
+ end
18
+ end
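As a rough illustration of what this new module affects: callers that do not pass `:row_sep` get the value returned by `InputRecordSeparator.value`, which is a literal `"\n"` on Ruby 3.0 or later. The sketch below uses `CSV.generate_line`, which appears later in this diff, and assumes `$INPUT_RECORD_SEPARATOR` has not been changed from its default.

```ruby
require "csv"

# With no :row_sep given, generate_line falls back to the default separator.
default_line = CSV.generate_line(["foo", "0"])

# An explicit :row_sep still overrides the default.
crlf_line = CSV.generate_line(["foo", "0"], row_sep: "\r\n")

p default_line
p crlf_line
```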
data/lib/csv/parser.rb CHANGED
@@ -3,6 +3,7 @@
3
3
  require "strscan"
4
4
 
5
5
  require_relative "delete_suffix"
6
+ require_relative "input_record_separator"
6
7
  require_relative "match_p"
7
8
  require_relative "row"
8
9
  require_relative "table"
@@ -26,6 +27,10 @@ class CSV
26
27
  class InvalidEncoding < StandardError
27
28
  end
28
29
 
30
+ # Raised when unexpected case is happen.
31
+ class UnexpectedError < StandardError
32
+ end
33
+
29
34
  #
30
35
  # CSV::Scanner receives a CSV output, scans it and return the content.
31
36
  # It also controls the life cycle of the object with its methods +keep_start+,
@@ -77,16 +82,17 @@ class CSV
77
82
  # +keep_end+, +keep_back+, +keep_drop+.
78
83
  #
79
84
  # CSV::InputsScanner.scan() tries to match with pattern at the current position.
80
- # If there's a match, the scanner advances the scan pointer and returns the matched string.
85
+ # If there's a match, the scanner advances the "scan pointer" and returns the matched string.
81
86
  # Otherwise, the scanner returns nil.
82
87
  #
83
- # CSV::InputsScanner.rest() returns the rest of the string (i.e. everything after the scan pointer).
88
+ # CSV::InputsScanner.rest() returns the "rest" of the string (i.e. everything after the scan pointer).
84
89
  # If there is no more data (eos? = true), it returns "".
85
90
  #
86
91
  class InputsScanner
87
- def initialize(inputs, encoding, chunk_size: 8192)
92
+ def initialize(inputs, encoding, row_separator, chunk_size: 8192)
88
93
  @inputs = inputs.dup
89
94
  @encoding = encoding
95
+ @row_separator = row_separator
90
96
  @chunk_size = chunk_size
91
97
  @last_scanner = @inputs.empty?
92
98
  @keeps = []
@@ -94,11 +100,13 @@ class CSV
94
100
  end
95
101
 
96
102
  def each_line(row_separator)
103
+ return enum_for(__method__, row_separator) unless block_given?
97
104
  buffer = nil
98
105
  input = @scanner.rest
99
106
  position = @scanner.pos
100
107
  offset = 0
101
108
  n_row_separator_chars = row_separator.size
109
+ # trace(__method__, :start, line, input)
102
110
  while true
103
111
  input.each_line(row_separator) do |line|
104
112
  @scanner.pos += line.bytesize
@@ -138,25 +146,28 @@ class CSV
138
146
  end
139
147
 
140
148
  def scan(pattern)
149
+ # trace(__method__, pattern, :start)
141
150
  value = @scanner.scan(pattern)
151
+ # trace(__method__, pattern, :done, :last, value) if @last_scanner
142
152
  return value if @last_scanner
143
153
 
144
- if value
145
- read_chunk if @scanner.eos?
146
- return value
147
- else
148
- nil
149
- end
154
+ read_chunk if value and @scanner.eos?
155
+ # trace(__method__, pattern, :done, value)
156
+ value
150
157
  end
151
158
 
152
159
  def scan_all(pattern)
160
+ # trace(__method__, pattern, :start)
153
161
  value = @scanner.scan(pattern)
162
+ # trace(__method__, pattern, :done, :last, value) if @last_scanner
154
163
  return value if @last_scanner
155
164
 
156
165
  return nil if value.nil?
157
166
  while @scanner.eos? and read_chunk and (sub_value = @scanner.scan(pattern))
167
+ # trace(__method__, pattern, :sub, sub_value)
158
168
  value << sub_value
159
169
  end
170
+ # trace(__method__, pattern, :done, value)
160
171
  value
161
172
  end
162
173
 
@@ -165,76 +176,135 @@ class CSV
165
176
  end
166
177
 
167
178
  def keep_start
168
- @keeps.push([@scanner.pos, nil])
179
+ # trace(__method__, :start)
180
+ adjust_last_keep
181
+ @keeps.push([@scanner, @scanner.pos, nil])
182
+ # trace(__method__, :done)
169
183
  end
170
184
 
171
185
  def keep_end
172
- start, buffer = @keeps.pop
173
- keep = @scanner.string.byteslice(start, @scanner.pos - start)
186
+ # trace(__method__, :start)
187
+ scanner, start, buffer = @keeps.pop
188
+ if scanner == @scanner
189
+ keep = @scanner.string.byteslice(start, @scanner.pos - start)
190
+ else
191
+ keep = @scanner.string.byteslice(0, @scanner.pos)
192
+ end
174
193
  if buffer
175
194
  buffer << keep
176
195
  keep = buffer
177
196
  end
197
+ # trace(__method__, :done, keep)
178
198
  keep
179
199
  end
180
200
 
181
201
  def keep_back
182
- start, buffer = @keeps.pop
202
+ # trace(__method__, :start)
203
+ scanner, start, buffer = @keeps.pop
183
204
  if buffer
205
+ # trace(__method__, :rescan, start, buffer)
184
206
  string = @scanner.string
185
- keep = string.byteslice(start, string.bytesize - start)
207
+ if scanner == @scanner
208
+ keep = string.byteslice(start, string.bytesize - start)
209
+ else
210
+ keep = string
211
+ end
186
212
  if keep and not keep.empty?
187
213
  @inputs.unshift(StringIO.new(keep))
188
214
  @last_scanner = false
189
215
  end
190
216
  @scanner = StringScanner.new(buffer)
191
217
  else
218
+ if @scanner != scanner
219
+ message = "scanners are different but no buffer: "
220
+ message += "#{@scanner.inspect}(#{@scanner.object_id}): "
221
+ message += "#{scanner.inspect}(#{scanner.object_id})"
222
+ raise UnexpectedError, message
223
+ end
224
+ # trace(__method__, :repos, start, buffer)
192
225
  @scanner.pos = start
193
226
  end
194
227
  read_chunk if @scanner.eos?
195
228
  end
196
229
 
197
230
  def keep_drop
198
- @keeps.pop
231
+ _, _, buffer = @keeps.pop
232
+ # trace(__method__, :done, :empty) unless buffer
233
+ return unless buffer
234
+
235
+ last_keep = @keeps.last
236
+ # trace(__method__, :done, :no_last_keep) unless last_keep
237
+ return unless last_keep
238
+
239
+ if last_keep[2]
240
+ last_keep[2] << buffer
241
+ else
242
+ last_keep[2] = buffer
243
+ end
244
+ # trace(__method__, :done)
199
245
  end
200
246
 
201
247
  def rest
202
248
  @scanner.rest
203
249
  end
204
250
 
251
+ def check(pattern)
252
+ @scanner.check(pattern)
253
+ end
254
+
205
255
  private
206
- def read_chunk
207
- return false if @last_scanner
256
+ def trace(*args)
257
+ pp([*args, @scanner, @scanner&.string, @scanner&.pos, @keeps])
258
+ end
208
259
 
209
- unless @keeps.empty?
210
- keep = @keeps.last
211
- keep_start = keep[0]
212
- string = @scanner.string
213
- keep_data = string.byteslice(keep_start, @scanner.pos - keep_start)
214
- if keep_data
215
- keep_buffer = keep[1]
216
- if keep_buffer
217
- keep_buffer << keep_data
218
- else
219
- keep[1] = keep_data.dup
220
- end
260
+ def adjust_last_keep
261
+ # trace(__method__, :start)
262
+
263
+ keep = @keeps.last
264
+ # trace(__method__, :done, :empty) if keep.nil?
265
+ return if keep.nil?
266
+
267
+ scanner, start, buffer = keep
268
+ string = @scanner.string
269
+ if @scanner != scanner
270
+ start = 0
271
+ end
272
+ if start == 0 and @scanner.eos?
273
+ keep_data = string
274
+ else
275
+ keep_data = string.byteslice(start, @scanner.pos - start)
276
+ end
277
+ if keep_data
278
+ if buffer
279
+ buffer << keep_data
280
+ else
281
+ keep[2] = keep_data.dup
221
282
  end
222
- keep[0] = 0
223
283
  end
224
284
 
285
+ # trace(__method__, :done)
286
+ end
287
+
288
+ def read_chunk
289
+ return false if @last_scanner
290
+
291
+ adjust_last_keep
292
+
225
293
  input = @inputs.first
226
294
  case input
227
295
  when StringIO
228
296
  string = input.read
229
297
  raise InvalidEncoding unless string.valid_encoding?
298
+ # trace(__method__, :stringio, string)
230
299
  @scanner = StringScanner.new(string)
231
300
  @inputs.shift
232
301
  @last_scanner = @inputs.empty?
233
302
  true
234
303
  else
235
- chunk = input.gets(nil, @chunk_size)
304
+ chunk = input.gets(@row_separator, @chunk_size)
236
305
  if chunk
237
306
  raise InvalidEncoding unless chunk.valid_encoding?
307
+ # trace(__method__, :chunk, chunk)
238
308
  @scanner = StringScanner.new(chunk)
239
309
  if input.respond_to?(:eof?) and input.eof?
240
310
  @inputs.shift
@@ -242,6 +312,7 @@ class CSV
242
312
  end
243
313
  true
244
314
  else
315
+ # trace(__method__, :no_chunk)
245
316
  @scanner = StringScanner.new("".encode(@encoding))
246
317
  @inputs.shift
247
318
  @last_scanner = @inputs.empty?
@@ -276,7 +347,11 @@ class CSV
276
347
  end
277
348
 
278
349
  def field_size_limit
279
- @field_size_limit
350
+ @max_field_size&.succ
351
+ end
352
+
353
+ def max_field_size
354
+ @max_field_size
280
355
  end
281
356
 
282
357
  def skip_lines
@@ -344,6 +419,16 @@ class CSV
344
419
  end
345
420
  message = "Invalid byte sequence in #{@encoding}"
346
421
  raise MalformedCSVError.new(message, lineno)
422
+ rescue UnexpectedError => error
423
+ if @scanner
424
+ ignore_broken_line
425
+ lineno = @lineno
426
+ else
427
+ lineno = @lineno + 1
428
+ end
429
+ message = "This should not be happen: #{error.message}: "
430
+ message += "Please report this to https://github.com/ruby/csv/issues"
431
+ raise MalformedCSVError.new(message, lineno)
347
432
  end
348
433
  end
349
434
 
@@ -360,6 +445,7 @@ class CSV
360
445
  prepare_skip_lines
361
446
  prepare_strip
362
447
  prepare_separators
448
+ validate_strip_and_col_sep_options
363
449
  prepare_quoted
364
450
  prepare_unquoted
365
451
  prepare_line
@@ -387,7 +473,7 @@ class CSV
387
473
  @backslash_quote = false
388
474
  end
389
475
  @unconverted_fields = @options[:unconverted_fields]
390
- @field_size_limit = @options[:field_size_limit]
476
+ @max_field_size = @options[:max_field_size]
391
477
  @skip_blanks = @options[:skip_blanks]
392
478
  @fields_converter = @options[:fields_converter]
393
479
  @header_fields_converter = @options[:header_fields_converter]
@@ -479,9 +565,9 @@ class CSV
479
565
  begin
480
566
  StringScanner.new("x").scan("x")
481
567
  rescue TypeError
482
- @@string_scanner_scan_accept_string = false
568
+ STRING_SCANNER_SCAN_ACCEPT_STRING = false
483
569
  else
484
- @@string_scanner_scan_accept_string = true
570
+ STRING_SCANNER_SCAN_ACCEPT_STRING = true
485
571
  end
486
572
 
487
573
  def prepare_separators
@@ -505,7 +591,7 @@ class CSV
505
591
  @first_column_separators = Regexp.new(@escaped_first_column_separator +
506
592
  "+".encode(@encoding))
507
593
  else
508
- if @@string_scanner_scan_accept_string
594
+ if STRING_SCANNER_SCAN_ACCEPT_STRING
509
595
  @column_end = @column_separator
510
596
  else
511
597
  @column_end = Regexp.new(@escaped_column_separator)
@@ -526,10 +612,32 @@ class CSV
526
612
 
527
613
  @cr = "\r".encode(@encoding)
528
614
  @lf = "\n".encode(@encoding)
529
- @cr_or_lf = Regexp.new("[\r\n]".encode(@encoding))
615
+ @line_end = Regexp.new("\r\n|\n|\r".encode(@encoding))
530
616
  @not_line_end = Regexp.new("[^\r\n]+".encode(@encoding))
531
617
  end
532
618
 
619
+ # This method verifies that there are no (obvious) ambiguities with the
620
+ # provided +col_sep+ and +strip+ parsing options. For example, if +col_sep+
621
+ # and +strip+ were both equal to +\t+, then there would be no clear way to
622
+ # parse the input.
623
+ def validate_strip_and_col_sep_options
624
+ return unless @strip
625
+
626
+ if @strip.is_a?(String)
627
+ if @column_separator.start_with?(@strip) || @column_separator.end_with?(@strip)
628
+ raise ArgumentError,
629
+ "The provided strip (#{@escaped_strip}) and " \
630
+ "col_sep (#{@escaped_column_separator}) options are incompatible."
631
+ end
632
+ else
633
+ if Regexp.new("\\A[#{@escaped_strip}]|[#{@escaped_strip}]\\z").match?(@column_separator)
634
+ raise ArgumentError,
635
+ "The provided strip (true) and " \
636
+ "col_sep (#{@escaped_column_separator}) options are incompatible."
637
+ end
638
+ end
639
+ end
640
+
533
641
  def prepare_quoted
534
642
  if @quote_character
535
643
  @quotes = Regexp.new(@escaped_quote_character +
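The `validate_strip_and_col_sep_options` check in this hunk can be sketched outside the parser as follows. This is a hypothetical standalone mirror, not the library's API. In particular, `WHITESPACE` is an assumption about which characters `strip: true` removes, since `@escaped_strip` is set elsewhere in the parser and is not shown in this diff.

```ruby
# Hypothetical standalone mirror of the check above (not the csv gem's API).
# WHITESPACE is an assumed stand-in for the parser's @escaped_strip value
# when strip: true is given.
WHITESPACE = " \t\f\v"

def incompatible_strip_and_col_sep?(strip, col_sep)
  return false unless strip

  if strip.is_a?(String)
    # A strip string at either edge of col_sep makes parsing ambiguous.
    col_sep.start_with?(strip) || col_sep.end_with?(strip)
  else
    # strip: true, so any stripped character at an edge of col_sep conflicts.
    edge = Regexp.new("\\A[#{WHITESPACE}]|[#{WHITESPACE}]\\z")
    edge.match?(col_sep)
  end
end

p incompatible_strip_and_col_sep?("\t", "\t")
p incompatible_strip_and_col_sep?("\t", ",")
p incompatible_strip_and_col_sep?(true, "\t")
p incompatible_strip_and_col_sep?(true, ",")
```

In the parser itself, the conflicting combinations raise `ArgumentError` instead of returning `true`.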
@@ -605,7 +713,7 @@ class CSV
605
713
  # do nothing: ensure will set default
606
714
  end
607
715
  end
608
- separator = $INPUT_RECORD_SEPARATOR if separator == :auto
716
+ separator = InputRecordSeparator.value if separator == :auto
609
717
  end
610
718
  separator.to_s.encode(@encoding)
611
719
  end
@@ -704,26 +812,28 @@ class CSV
704
812
  sample[0, 128].index(@quote_character)
705
813
  end
706
814
 
707
- SCANNER_TEST = (ENV["CSV_PARSER_SCANNER_TEST"] == "yes")
708
- if SCANNER_TEST
709
- class UnoptimizedStringIO
710
- def initialize(string)
711
- @io = StringIO.new(string, "rb:#{string.encoding}")
712
- end
815
+ class UnoptimizedStringIO # :nodoc:
816
+ def initialize(string)
817
+ @io = StringIO.new(string, "rb:#{string.encoding}")
818
+ end
713
819
 
714
- def gets(*args)
715
- @io.gets(*args)
716
- end
820
+ def gets(*args)
821
+ @io.gets(*args)
822
+ end
717
823
 
718
- def each_line(*args, &block)
719
- @io.each_line(*args, &block)
720
- end
824
+ def each_line(*args, &block)
825
+ @io.each_line(*args, &block)
826
+ end
721
827
 
722
- def eof?
723
- @io.eof?
724
- end
828
+ def eof?
829
+ @io.eof?
725
830
  end
831
+ end
726
832
 
833
+ SCANNER_TEST = (ENV["CSV_PARSER_SCANNER_TEST"] == "yes")
834
+ if SCANNER_TEST
835
+ SCANNER_TEST_CHUNK_SIZE_NAME = "CSV_PARSER_SCANNER_TEST_CHUNK_SIZE"
836
+ SCANNER_TEST_CHUNK_SIZE_VALUE = ENV[SCANNER_TEST_CHUNK_SIZE_NAME]
727
837
  def build_scanner
728
838
  inputs = @samples.collect do |sample|
729
839
  UnoptimizedStringIO.new(sample)
@@ -733,17 +843,27 @@ class CSV
733
843
  else
734
844
  inputs << @input
735
845
  end
736
- chunk_size = ENV["CSV_PARSER_SCANNER_TEST_CHUNK_SIZE"] || "1"
846
+ begin
847
+ chunk_size_value = ENV[SCANNER_TEST_CHUNK_SIZE_NAME]
848
+ rescue # Ractor::IsolationError
849
+ # Ractor on Ruby 3.0 can't read ENV value.
850
+ chunk_size_value = SCANNER_TEST_CHUNK_SIZE_VALUE
851
+ end
852
+ chunk_size = Integer((chunk_size_value || "1"), 10)
737
853
  InputsScanner.new(inputs,
738
854
  @encoding,
739
- chunk_size: Integer(chunk_size, 10))
855
+ @row_separator,
856
+ chunk_size: chunk_size)
740
857
  end
741
858
  else
742
859
  def build_scanner
743
860
  string = nil
744
861
  if @samples.empty? and @input.is_a?(StringIO)
745
862
  string = @input.read
746
- elsif @samples.size == 1 and @input.respond_to?(:eof?) and @input.eof?
863
+ elsif @samples.size == 1 and
864
+ @input != ARGF and
865
+ @input.respond_to?(:eof?) and
866
+ @input.eof?
747
867
  string = @samples[0]
748
868
  end
749
869
  if string
@@ -762,7 +882,7 @@ class CSV
762
882
  StringIO.new(sample)
763
883
  end
764
884
  inputs << @input
765
- InputsScanner.new(inputs, @encoding)
885
+ InputsScanner.new(inputs, @encoding, @row_separator)
766
886
  end
767
887
  end
768
888
  end
@@ -796,6 +916,14 @@ class CSV
796
916
  end
797
917
  end
798
918
 
919
+ def validate_field_size(field)
920
+ return unless @max_field_size
921
+ return if field.size <= @max_field_size
922
+ ignore_broken_line
923
+ message = "Field size exceeded: #{field.size} > #{@max_field_size}"
924
+ raise MalformedCSVError.new(message, @lineno)
925
+ end
926
+
799
927
  def parse_no_quote(&block)
800
928
  @scanner.each_line(@row_separator) do |line|
801
929
  next if @skip_lines and skip_line?(line)
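The relationship between the deprecated `:field_size_limit` and the new `:max_field_size` can be sketched in isolation. The helpers below are hypothetical standalone mirrors of `validate_field_size` above and of the `field_size_limit - 1` conversion in `CSV.new` near the end of this diff. `FieldTooLarge` stands in for `MalformedCSVError` and is not part of the library.

```ruby
# Hypothetical stand-in for CSV::MalformedCSVError.
class FieldTooLarge < StandardError; end

# Mirrors the conversion in CSV.new: the deprecated option means "limit + 1",
# and an explicit max_field_size takes precedence.
def effective_max_field_size(max_field_size: nil, field_size_limit: nil)
  if max_field_size.nil? && field_size_limit
    max_field_size = field_size_limit - 1
  end
  max_field_size
end

# Mirrors validate_field_size: a field of exactly the maximum size passes.
def validate_field_size(field, max_field_size)
  return unless max_field_size
  return if field.size <= max_field_size

  raise FieldTooLarge, "Field size exceeded: #{field.size} > #{max_field_size}"
end

max = effective_max_field_size(field_size_limit: 4) # same as max_field_size: 3
validate_field_size("abc", max)                     # accepted
begin
  validate_field_size("abcd", max)
rescue FieldTooLarge => error
  puts error.message
end
```

The old inline check used `>=` against `@field_size_limit`, while the new helper uses `<=` against `@max_field_size`, which is why the two options differ by one.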
@@ -808,6 +936,11 @@ class CSV
808
936
  else
809
937
  line = strip_value(line)
810
938
  row = line.split(@split_column_separator, -1)
939
+ if @max_field_size
940
+ row.each do |column|
941
+ validate_field_size(column)
942
+ end
943
+ end
811
944
  n_columns = row.size
812
945
  i = 0
813
946
  while i < n_columns
@@ -863,6 +996,7 @@ class CSV
863
996
  @need_robust_parsing = true
864
997
  return parse_quotable_robust(&block)
865
998
  end
999
+ validate_field_size(row[i])
866
1000
  end
867
1001
  i += 1
868
1002
  end
@@ -886,10 +1020,7 @@ class CSV
886
1020
  value = parse_column_value
887
1021
  if value
888
1022
  @scanner.scan_all(@strip_value) if @strip_value
889
- if @field_size_limit and value.size >= @field_size_limit
890
- ignore_broken_line
891
- raise MalformedCSVError.new("Field size exceeded", @lineno)
892
- end
1023
+ validate_field_size(value)
893
1024
  end
894
1025
  if parse_column_end
895
1026
  row << value
@@ -910,11 +1041,17 @@ class CSV
910
1041
  break
911
1042
  else
912
1043
  if @quoted_column_value
1044
+ if liberal_parsing? and (new_line = @scanner.check(@line_end))
1045
+ message =
1046
+ "Illegal end-of-line sequence outside of a quoted field " +
1047
+ "<#{new_line.inspect}>"
1048
+ else
1049
+ message = "Any value after quoted field isn't allowed"
1050
+ end
913
1051
  ignore_broken_line
914
- message = "Any value after quoted field isn't allowed"
915
1052
  raise MalformedCSVError.new(message, @lineno)
916
1053
  elsif @unquoted_column_value and
917
- (new_line = @scanner.scan(@cr_or_lf))
1054
+ (new_line = @scanner.scan(@line_end))
918
1055
  ignore_broken_line
919
1056
  message = "Unquoted fields do not allow new line " +
920
1057
  "<#{new_line.inspect}>"
@@ -923,7 +1060,7 @@ class CSV
923
1060
  ignore_broken_line
924
1061
  message = "Illegal quoting"
925
1062
  raise MalformedCSVError.new(message, @lineno)
926
- elsif (new_line = @scanner.scan(@cr_or_lf))
1063
+ elsif (new_line = @scanner.scan(@line_end))
927
1064
  ignore_broken_line
928
1065
  message = "New line must be <#{@row_separator.inspect}> " +
929
1066
  "not <#{new_line.inspect}>"
@@ -1089,7 +1226,7 @@ class CSV
1089
1226
 
1090
1227
  def ignore_broken_line
1091
1228
  @scanner.scan_all(@not_line_end)
1092
- @scanner.scan_all(@cr_or_lf)
1229
+ @scanner.scan_all(@line_end)
1093
1230
  @lineno += 1
1094
1231
  end
1095
1232
 
data/lib/csv/table.rb CHANGED
@@ -999,9 +999,15 @@ class CSV
999
999
  # Omits the headers if option +write_headers+ is given as +false+
1000
1000
  # (see {Option +write_headers+}[../CSV.html#class-CSV-label-Option+write_headers]):
1001
1001
  # table.to_csv(write_headers: false) # => "foo,0\nbar,1\nbaz,2\n"
1002
- def to_csv(write_headers: true, **options)
1002
+ #
1003
+ # Limit rows if option +limit+ is given like +2+:
1004
+ # table.to_csv(limit: 2) # => "Name,Value\nfoo,0\nbar,1\n"
1005
+ def to_csv(write_headers: true, limit: nil, **options)
1003
1006
  array = write_headers ? [headers.to_csv(**options)] : []
1004
- @table.each do |row|
1007
+ limit ||= @table.size
1008
+ limit = @table.size + 1 + limit if limit < 0
1009
+ limit = 0 if limit < 0
1010
+ @table.first(limit).each do |row|
1005
1011
  array.push(row.fields.to_csv(**options)) unless row.header_row?
1006
1012
  end
1007
1013
 
@@ -1038,9 +1044,13 @@ class CSV
1038
1044
  # Example:
1039
1045
  # source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
1040
1046
  # table = CSV.parse(source, headers: true)
1041
- # table.inspect # => "#<CSV::Table mode:col_or_row row_count:4>"
1047
+ # table.inspect # => "#<CSV::Table mode:col_or_row row_count:4>\nName,Value\nfoo,0\nbar,1\nbaz,2\n"
1048
+ #
1042
1049
  def inspect
1043
- "#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>".encode("US-ASCII")
1050
+ inspected = +"#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>"
1051
+ summary = to_csv(limit: 5)
1052
+ inspected << "\n" << summary if summary.encoding.ascii_compatible?
1053
+ inspected
1044
1054
  end
1045
1055
  end
1046
1056
  end
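The `limit` handling added to `CSV::Table#to_csv` treats `nil` as "all rows" and counts negative values back from the end. A small standalone sketch of that arithmetic follows; `normalize_limit` is a hypothetical helper name, and plain strings stand in for table rows.

```ruby
# Hypothetical helper mirroring the limit handling in CSV::Table#to_csv.
def normalize_limit(limit, row_count)
  limit ||= row_count                        # nil means "all rows"
  limit = row_count + 1 + limit if limit < 0 # -1 keeps all rows, -2 drops the last
  limit = 0 if limit < 0                     # clamp limits beyond the start
  limit
end

rows = ["foo,0", "bar,1", "baz,2"]
[nil, 2, -1, -2, -10].each do |limit|
  kept = rows.first(normalize_limit(limit, rows.size))
  puts "limit=#{limit.inspect}: #{kept.inspect}"
end
```

`CSV::Table#inspect` calls `to_csv(limit: 5)`, so its summary shows at most five rows in addition to the header line.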
data/lib/csv/version.rb CHANGED
@@ -2,5 +2,5 @@
2
2
 
3
3
  class CSV
4
4
  # The version of the installed library.
5
- VERSION = "3.2.0"
5
+ VERSION = "3.2.3"
6
6
  end
data/lib/csv/writer.rb CHANGED
@@ -1,5 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require_relative "input_record_separator"
3
4
  require_relative "match_p"
4
5
  require_relative "row"
5
6
 
@@ -133,7 +134,7 @@ class CSV
133
134
  @column_separator = @options[:column_separator].to_s.encode(@encoding)
134
135
  row_separator = @options[:row_separator]
135
136
  if row_separator == :auto
136
- @row_separator = $INPUT_RECORD_SEPARATOR.encode(@encoding)
137
+ @row_separator = InputRecordSeparator.value.encode(@encoding)
137
138
  else
138
139
  @row_separator = row_separator.to_s.encode(@encoding)
139
140
  end
data/lib/csv.rb CHANGED
@@ -90,11 +90,11 @@
90
90
  # with any questions.
91
91
 
92
92
  require "forwardable"
93
- require "English"
94
93
  require "date"
95
94
  require "stringio"
96
95
 
97
96
  require_relative "csv/fields_converter"
97
+ require_relative "csv/input_record_separator"
98
98
  require_relative "csv/match_p"
99
99
  require_relative "csv/parser"
100
100
  require_relative "csv/row"
@@ -341,6 +341,7 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
341
341
  # liberal_parsing: false,
342
342
  # nil_value: nil,
343
343
  # empty_value: "",
344
+ # strip: false,
344
345
  # # For generating.
345
346
  # write_headers: nil,
346
347
  # quote_empty: true,
@@ -348,7 +349,6 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
348
349
  # write_converters: nil,
349
350
  # write_nil_value: nil,
350
351
  # write_empty_value: "",
351
- # strip: false,
352
352
  # }
353
353
  #
354
354
  # ==== Options for Parsing
@@ -357,7 +357,9 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
357
357
  # - +row_sep+: Specifies the row separator; used to delimit rows.
358
358
  # - +col_sep+: Specifies the column separator; used to delimit fields.
359
359
  # - +quote_char+: Specifies the quote character; used to quote fields.
360
- # - +field_size_limit+: Specifies the maximum field size allowed.
360
+ # - +field_size_limit+: Specifies the maximum field size + 1 allowed.
361
+ # Deprecated since 3.2.3. Use +max_field_size+ instead.
362
+ # - +max_field_size+: Specifies the maximum field size allowed.
361
363
  # - +converters+: Specifies the field converters to be used.
362
364
  # - +unconverted_fields+: Specifies whether unconverted fields are to be available.
363
365
  # - +headers+: Specifies whether data contains headers,
@@ -366,8 +368,9 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
366
368
  # - +header_converters+: Specifies the header converters to be used.
367
369
  # - +skip_blanks+: Specifies whether blanks lines are to be ignored.
368
370
  # - +skip_lines+: Specifies how comments lines are to be recognized.
369
- # - +strip+: Specifies whether leading and trailing whitespace are
370
- # to be stripped from fields..
371
+ # - +strip+: Specifies whether leading and trailing whitespace are to be
372
+ # stripped from fields. This must be compatible with +col_sep+; if it is not,
373
+ # then an +ArgumentError+ exception will be raised.
371
374
  # - +liberal_parsing+: Specifies whether \CSV should attempt to parse
372
375
  # non-compliant data.
373
376
  # - +nil_value+: Specifies the object that is to be substituted for each null (no-text) field.
@@ -513,7 +516,7 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
513
516
  # [" 1 ", #<struct CSV::FieldInfo index=1, line=2, header=nil>]
514
517
  # [" baz ", #<struct CSV::FieldInfo index=0, line=3, header=nil>]
515
518
  # [" 2 ", #<struct CSV::FieldInfo index=1, line=3, header=nil>]
516
- # Each CSV::Info object shows:
519
+ # Each CSV::FieldInfo object shows:
517
520
  # - The 0-based field index.
518
521
  # - The 1-based line index.
519
522
  # - The field header, if any.
@@ -547,6 +550,14 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
547
550
  #
548
551
  # There is no such storage structure for write headers.
549
552
  #
553
+ # In order for the parsing methods to access stored converters in non-main-Ractors, the
554
+ # storage structure must be made shareable first.
555
+ # Therefore, <tt>Ractor.make_shareable(CSV::Converters)</tt> and
556
+ # <tt>Ractor.make_shareable(CSV::HeaderConverters)</tt> must be called before the creation
557
+ # of Ractors that use the converters stored in these structures. (Since making the storage
558
+ # structures shareable involves freezing them, any custom converters that are to be used
559
+ # must be added first.)
560
+ #
550
561
  # ===== Converter Lists
551
562
  #
552
563
  # A _converter_ _list_ is an \Array that may include any assortment of:
@@ -917,8 +928,10 @@ class CSV
917
928
  symbol: lambda { |h|
918
929
  h.encode(ConverterEncoding).downcase.gsub(/[^\s\w]+/, "").strip.
919
930
  gsub(/\s+/, "_").to_sym
920
- }
931
+ },
932
+ symbol_raw: lambda { |h| h.encode(ConverterEncoding).to_sym }
921
933
  }
934
+
922
935
  # Default values for method options.
923
936
  DEFAULT_OPTIONS = {
924
937
  # For both parsing and generating.
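To make the difference between the existing `:symbol` header converter and the new `:symbol_raw` concrete, the sketch below copies the two lambdas from the hunk above into standalone constants. `CONVERTER_ENCODING` is an assumed stand-in for the library's `ConverterEncoding` constant, and the constant names `SYMBOL` and `SYMBOL_RAW` are hypothetical.

```ruby
# Assumed stand-in for the library's ConverterEncoding constant.
CONVERTER_ENCODING = Encoding::UTF_8

# Copy of the :symbol lambda: downcases, drops punctuation, and joins
# whitespace-separated words with underscores.
SYMBOL = lambda { |h|
  h.encode(CONVERTER_ENCODING).downcase.gsub(/[^\s\w]+/, "").strip.
    gsub(/\s+/, "_").to_sym
}

# Copy of the new :symbol_raw lambda: converts the header as-is.
SYMBOL_RAW = lambda { |h| h.encode(CONVERTER_ENCODING).to_sym }

header = "First Name (legal)"
p SYMBOL.call(header)
p SYMBOL_RAW.call(header)
```

`:symbol_raw` is useful when header text must round-trip unchanged, for example when the symbols are later turned back into column names.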
@@ -927,6 +940,7 @@ class CSV
927
940
  quote_char: '"',
928
941
  # For parsing.
929
942
  field_size_limit: nil,
943
+ max_field_size: nil,
930
944
  converters: nil,
931
945
  unconverted_fields: nil,
932
946
  headers: false,
@@ -937,6 +951,7 @@ class CSV
937
951
  liberal_parsing: false,
938
952
  nil_value: nil,
939
953
  empty_value: "",
954
+ strip: false,
940
955
  # For generating.
941
956
  write_headers: nil,
942
957
  quote_empty: true,
@@ -944,7 +959,6 @@ class CSV
944
959
  write_converters: nil,
945
960
  write_nil_value: nil,
946
961
  write_empty_value: "",
947
- strip: false,
948
962
  }.freeze
949
963
 
950
964
  class << self
@@ -957,6 +971,8 @@ class CSV
957
971
  # Creates or retrieves cached \CSV objects.
958
972
  # For arguments and options, see CSV.new.
959
973
  #
974
+ # This API is not Ractor-safe.
975
+ #
960
976
  # ---
961
977
  #
962
978
  # With no block given, returns a \CSV object.
@@ -1187,7 +1203,7 @@ class CSV
1187
1203
  # See {Options for Parsing}[#class-CSV-label-Options+for+Parsing].
1188
1204
  def filter(input=nil, output=nil, **options)
1189
1205
  # parse options for input, output, or both
1190
- in_options, out_options = Hash.new, {row_sep: $INPUT_RECORD_SEPARATOR}
1206
+ in_options, out_options = Hash.new, {row_sep: InputRecordSeparator.value}
1191
1207
  options.each do |key, value|
1192
1208
  case key.to_s
1193
1209
  when /\Ain(?:put)?_(.+)\Z/
@@ -1407,8 +1423,8 @@ class CSV
1407
1423
  # Argument +ary+ must be an \Array.
1408
1424
  #
1409
1425
  # Special options:
1410
- # * Option <tt>:row_sep</tt> defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>
1411
- # (<tt>$/</tt>).:
1426
+ # * Option <tt>:row_sep</tt> defaults to <tt>"\n"</tt> on Ruby 3.0 or later
1427
+ # and <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>) otherwise:
1412
1428
  # $INPUT_RECORD_SEPARATOR # => "\n"
1413
1429
  # * This method accepts an additional option, <tt>:encoding</tt>, which sets the base
1414
1430
  # Encoding for the output. This method will try to guess your Encoding from
@@ -1430,7 +1446,7 @@ class CSV
1430
1446
  # CSV.generate_line(:foo)
1431
1447
  #
1432
1448
  def generate_line(row, **options)
1433
- options = {row_sep: $INPUT_RECORD_SEPARATOR}.merge(options)
1449
+ options = {row_sep: InputRecordSeparator.value}.merge(options)
1434
1450
  str = +""
1435
1451
  if options[:encoding]
1436
1452
  str.force_encoding(options[:encoding])
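Review note on the `row_sep` hunks above: both `filter` and `generate_line` now take their default row separator from `InputRecordSeparator.value` instead of reading `$INPUT_RECORD_SEPARATOR` directly. The sketch below only illustrates the behavior described in the updated doc comment (`"\n"` on Ruby 3.0 or later, `$/` otherwise). `RowSepDefault` and `generate_line_options` are hypothetical names, and the version check is an assumption, not the gem's actual implementation in `lib/csv/input_record_separator.rb`.

```ruby
# Hypothetical stand-in for CSV::InputRecordSeparator; not the gem's code.
module RowSepDefault
  def self.value
    if RUBY_VERSION.split(".").first.to_i >= 3
      # Ruby 3.0+ warns when $/ is assigned a non-nil value, so use a
      # fixed default.
      "\n"
    else
      $/
    end
  end
end

# Mirrors the merge in generate_line: a caller-supplied :row_sep wins.
def generate_line_options(**options)
  {row_sep: RowSepDefault.value}.merge(options)
end

default_opts = generate_line_options
custom_opts = generate_line_options(row_sep: "\r\n")
p default_opts, custom_opts
```

Because the default is merged first, an explicit `row_sep:` from the caller still overrides it, as in the original code.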
@@ -1853,6 +1869,7 @@ class CSV
1853
1869
  row_sep: :auto,
1854
1870
  quote_char: '"',
1855
1871
  field_size_limit: nil,
1872
+ max_field_size: nil,
1856
1873
  converters: nil,
1857
1874
  unconverted_fields: nil,
1858
1875
  headers: false,
@@ -1868,11 +1885,11 @@ class CSV
1868
1885
  encoding: nil,
1869
1886
  nil_value: nil,
1870
1887
  empty_value: "",
1888
+ strip: false,
1871
1889
  quote_empty: true,
1872
1890
  write_converters: nil,
1873
1891
  write_nil_value: nil,
1874
- write_empty_value: "",
1875
- strip: false)
1892
+ write_empty_value: "")
1876
1893
  raise ArgumentError.new("Cannot parse nil as CSV") if data.nil?
1877
1894
 
1878
1895
  if data.is_a?(String)
@@ -1895,11 +1912,14 @@ class CSV
1895
1912
  @initial_header_converters = header_converters
1896
1913
  @initial_write_converters = write_converters
1897
1914
 
1915
+ if max_field_size.nil? and field_size_limit
1916
+ max_field_size = field_size_limit - 1
1917
+ end
1898
1918
  @parser_options = {
1899
1919
  column_separator: col_sep,
1900
1920
  row_separator: row_sep,
1901
1921
  quote_character: quote_char,
1902
- field_size_limit: field_size_limit,
1922
+ max_field_size: max_field_size,
1903
1923
  unconverted_fields: unconverted_fields,
1904
1924
  headers: headers,
1905
1925
  return_headers: return_headers,
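Review note on the hunk above: when only the deprecated `field_size_limit` is given, the constructor derives `max_field_size` as `field_size_limit - 1`, which suggests the old option was one larger than the biggest field it actually accepted. An explicit `max_field_size` takes precedence. A minimal sketch of that fallback, with the logic lifted from the diff into a hypothetical helper (`effective_max_field_size` is not part of the gem):

```ruby
# Hypothetical helper mirroring the fallback in the hunk; not part of the gem.
def effective_max_field_size(max_field_size: nil, field_size_limit: nil)
  if max_field_size.nil? and field_size_limit
    # The deprecated option is converted by subtracting one.
    max_field_size = field_size_limit - 1
  end
  max_field_size
end

from_old_option = effective_max_field_size(field_size_limit: 10)
both_given = effective_max_field_size(max_field_size: 5, field_size_limit: 10)
neither_given = effective_max_field_size
p from_old_option, both_given, neither_given
```

Under this reading, callers migrating from `field_size_limit: n` would pass `max_field_size: n - 1` to keep the same effective limit.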
@@ -1967,10 +1987,24 @@ class CSV
1967
1987
  # Returns the limit for field size; used for parsing;
1968
1988
  # see {Option +field_size_limit+}[#class-CSV-label-Option+field_size_limit]:
1969
1989
  # CSV.new('').field_size_limit # => nil
1990
+ #
1991
+ # Deprecated since 3.2.3. Use +max_field_size+ instead.
1970
1992
  def field_size_limit
1971
1993
  parser.field_size_limit
1972
1994
  end
1973
1995
 
1996
+ # :call-seq:
1997
+ # csv.max_field_size -> integer or nil
1998
+ #
1999
+ # Returns the limit for field size; used for parsing;
2000
+ # see {Option +max_field_size+}[#class-CSV-label-Option+max_field_size]:
2001
+ # CSV.new('').max_field_size # => nil
2002
+ #
2003
+ # Since 3.2.3.
2004
+ def max_field_size
2005
+ parser.max_field_size
2006
+ end
2007
+
1974
2008
  # :call-seq:
1975
2009
  # csv.skip_lines -> regexp or nil
1976
2010
  #
@@ -1992,6 +2026,10 @@ class CSV
1992
2026
  # csv.converters # => [:integer]
1993
2027
  # csv.convert(proc {|x| x.to_s })
1994
2028
  # csv.converters
2029
+ #
2030
+ # Note that you need to call
2031
+ # +Ractor.make_shareable(CSV::Converters)+ on the main Ractor to use
2032
+ # this method.
1995
2033
  def converters
1996
2034
  parser_fields_converter.map do |converter|
1997
2035
  name = Converters.rassoc(converter)
@@ -2054,6 +2092,10 @@ class CSV
2054
2092
  # Returns an \Array containing header converters; used for parsing;
2055
2093
  # see {Header Converters}[#class-CSV-label-Header+Converters]:
2056
2094
  # CSV.new('').header_converters # => []
2095
+ #
2096
+ # Note that you need to call
2097
+ # +Ractor.make_shareable(CSV::HeaderConverters)+ on the main Ractor
2098
+ # to use this method.
2057
2099
  def header_converters
2058
2100
  header_fields_converter.map do |converter|
2059
2101
  name = HeaderConverters.rassoc(converter)
@@ -2694,7 +2736,7 @@ class CSV
2694
2736
 
2695
2737
  def build_parser_fields_converter
2696
2738
  specific_options = {
2697
- builtin_converters: Converters,
2739
+ builtin_converters_name: :Converters,
2698
2740
  }
2699
2741
  options = @base_fields_converter_options.merge(specific_options)
2700
2742
  build_fields_converter(@initial_converters, options)
@@ -2706,7 +2748,7 @@ class CSV
2706
2748
 
2707
2749
  def build_header_fields_converter
2708
2750
  specific_options = {
2709
- builtin_converters: HeaderConverters,
2751
+ builtin_converters_name: :HeaderConverters,
2710
2752
  accept_nil: true,
2711
2753
  }
2712
2754
  options = @base_fields_converter_options.merge(specific_options)
@@ -2774,6 +2816,8 @@ end
2774
2816
  # io = StringIO.new
2775
2817
  # CSV(io, col_sep: ";") { |csv| csv << ["a", "b", "c"] }
2776
2818
  #
2819
+ # This API is not Ractor-safe.
2820
+ #
2777
2821
  def CSV(*args, **options, &block)
2778
2822
  CSV.instance(*args, **options, &block)
2779
2823
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.2.0
4
+ version: 3.2.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - James Edward Gray II
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2021-06-05 00:00:00.000000000 Z
12
+ date: 2022-04-08 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: bundler
@@ -59,14 +59,14 @@ dependencies:
59
59
  requirements:
60
60
  - - ">="
61
61
  - !ruby/object:Gem::Version
62
- version: 3.4.3
62
+ version: 3.4.8
63
63
  type: :development
64
64
  prerelease: false
65
65
  version_requirements: !ruby/object:Gem::Requirement
66
66
  requirements:
67
67
  - - ">="
68
68
  - !ruby/object:Gem::Version
69
- version: 3.4.3
69
+ version: 3.4.8
70
70
  description: The CSV library provides a complete interface to CSV files and data.
71
71
  It offers tools to enable you to read and write to and from Strings or IO objects,
72
72
  as needed.
@@ -118,6 +118,7 @@ files:
118
118
  - lib/csv/core_ext/string.rb
119
119
  - lib/csv/delete_suffix.rb
120
120
  - lib/csv/fields_converter.rb
121
+ - lib/csv/input_record_separator.rb
121
122
  - lib/csv/match_p.rb
122
123
  - lib/csv/parser.rb
123
124
  - lib/csv/row.rb
@@ -146,7 +147,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
146
147
  - !ruby/object:Gem::Version
147
148
  version: '0'
148
149
  requirements: []
149
- rubygems_version: 3.3.0.dev
150
+ rubygems_version: 3.4.0.dev
150
151
  signing_key:
151
152
  specification_version: 4
152
153
  summary: CSV Reading and Writing