csv 3.2.1 → 3.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

checksums.yaml +4 -4
data/NEWS.md +113 -0
data/README.md +1 -1
data/doc/csv/options/generating/write_headers.rdoc +1 -1
data/doc/csv/recipes/generating.rdoc +1 -1
data/doc/csv/recipes/parsing.rdoc +1 -1
data/lib/csv/fields_converter.rb +3 -2
data/lib/csv/input_record_separator.rb +1 -14
data/lib/csv/parser.rb +237 -92
data/lib/csv/row.rb +1 -1
data/lib/csv/table.rb +14 -4
data/lib/csv/version.rb +1 -1
data/lib/csv/writer.rb +5 -5
data/lib/csv.rb +48 -17
metadata +3 -5
data/lib/csv/delete_suffix.rb +0 -18
data/lib/csv/match_p.rb +0 -20

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6aee33c500a979f9f1a9afdbc394545932700ca3442c1b4326b417e59b654ef4
-  data.tar.gz: 6622e0c4f190f10aa6d3a49b023a6c24540b1133da4d0ef91c07016ba224073f
+  metadata.gz: 3ef88fc9b205f8f64c5e817b22f121d5d03534c155cb8d27dc9d87aa62e6b7e0
+  data.tar.gz: e12b86cee946837a96ae609314a50246e2135fab855dca58e800de1ddc50e524
 SHA512:
-  metadata.gz: 7ba35310fd8dc9ffd4075ca31d786b7458fefefaee4c5a00fc326063fb13d020d1959d9970dcf17c222dd5d2422e3d20b9ddcd503caf5a4aad04e0b93159ff60
-  data.tar.gz: 9d1971baae109ad7396124cfa6bda15bc6db635ff09c6166a750dc22633b552679e5d589b15a57724f7b4aef3ae33c2ccd2fcfcfb1084e71df7f3734347e9926
+  metadata.gz: d663d0917d63315e4fc5802f9538727731f39ea06997adea485f5f137f37639f0791db1f9697476e7d0c4a8048a0339b9e4fb7aecd79f3ec8ee8eb24a4cb676e
+  data.tar.gz: f9d22fe50d227b8f8ca0d075ed0fbf1f39a10bf59a1adb68835fede727f3d1dfd1d2f88754caa1098fe9c24df738c71b29ae5fc3080a25d80df484f76530344e

data/NEWS.md CHANGED

@@ -1,5 +1,118 @@
 # News
+## 3.2.4 - 2022-08-22
+### Improvements
+  * Cleaned up internal implementations.
+    [[GitHub#249](https://github.com/ruby/csv/pull/249)]
+    [[GitHub#250](https://github.com/ruby/csv/pull/250)]
+    [[GitHub#251](https://github.com/ruby/csv/pull/251)]
+    [Patch by Mau Magnaguagno]
+  * Added support for RFC 3339 style time.
+    [[GitHub#248](https://github.com/ruby/csv/pull/248)]
+    [Patch by Thierry Lambert]
+  * Added support for transcoding String CSV. Syntax is
+    `from-encoding:to-encoding`.
+    [[GitHub#254](https://github.com/ruby/csv/issues/254)]
+    [Reported by Richard Stueven]
+  * Added quoted information to `CSV::FieldInfo`.
+    [[GitHub#254](https://github.com/ruby/csv/pull/253)]
+    [Reported by Hirokazu SUZUKI]
+### Fixes
+  * Fixed a link in documents.
+    [[GitHub#244](https://github.com/ruby/csv/pull/244)]
+    [Patch by Peter Zhu]
+### Thanks
+  * Peter Zhu
+  * Mau Magnaguagno
+  * Thierry Lambert
+  * Richard Stueven
+  * Hirokazu SUZUKI
+## 3.2.3 - 2022-04-09
+### Improvements
+  * Added contents summary to `CSV::Table#inspect`.
+    [GitHub#229][Patch by Eriko Sugiyama]
+    [GitHub#235][Patch by Sampat Badhe]
+  * Suppressed `$INPUT_RECORD_SEPARATOR` deprecation warning by
+    `Warning.warn`.
+    [GitHub#233][Reported by Jean byroot Boussier]
+  * Improved error message for liberal parsing with quoted values.
+    [GitHub#231][Patch by Nikolay Rys]
+  * Fixed typos in documentation.
+    [GitHub#236][Patch by Sampat Badhe]
+  * Added `:max_field_size` option and deprecated `:field_size_limit` option.
+    [GitHub#238][Reported by Dan Buettner]
+  * Added `:symbol_raw` to built-in header converters.
+    [GitHub#237][Reported by taki]
+    [GitHub#239][Patch by Eriko Sugiyama]
+### Fixes
+  * Fixed a bug that some texts may be dropped unexpectedly.
+    [Bug #18245][ruby-core:105587][Reported by Hassan Abdul Rehman]
+  * Fixed a bug that `:field_size_limit` doesn't work with not complex row.
+    [GitHub#238][Reported by Dan Buettner]
+### Thanks
+  * Hassan Abdul Rehman
+  * Eriko Sugiyama
+  * Jean byroot Boussier
+  * Nikolay Rys
+  * Sampat Badhe
+  * Dan Buettner
+  * taki
+## 3.2.2 - 2021-12-24
+### Improvements
+  * Added a validation for invalid option combination.
+    [GitHub#225][Patch by adamroyjones]
+  * Improved documentation for developers.
+    [GitHub#227][Patch by Eriko Sugiyama]
+### Fixes
+  * Fixed a bug that all of `ARGF` contents may not be consumed.
+    [GitHub#228][Reported by Rafael Navaza]
+### Thanks
+  * adamroyjones
+  * Eriko Sugiyama
+  * Rafael Navaza
 ## 3.2.1 - 2021-10-23
 ### Improvements

data/README.md CHANGED

@@ -35,7 +35,7 @@ end
 ## Development
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+After checking out the repo, run `ruby run-test.rb` to check if your changes can pass the test.
 To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

data/doc/csv/options/generating/write_headers.rdoc CHANGED

@@ -19,7 +19,7 @@ Without +write_headers+:
 With +write_headers+":
   CSV.open(file_path,'w',
-      :write_headers=> true,
+      :write_headers => true,
       :headers => ['Name','Value']
     ) do |csv|
       csv << ['foo', '0']
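
A minimal in-memory sketch of the same option, assuming `CSV.generate` in place of `CSV.open` so that no file is needed. The helper name `generate_with_headers` is hypothetical and only wraps the call for reuse.

```ruby
require "csv"

# Builds the CSV text in memory; CSV.generate stands in for CSV.open here.
def generate_with_headers
  CSV.generate(write_headers: true, headers: ["Name", "Value"]) do |csv|
    csv << ["foo", "0"]
    csv << ["bar", "1"]
  end
end

puts generate_with_headers
# Name,Value
# foo,0
# bar,1
```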

data/doc/csv/recipes/generating.rdoc CHANGED

@@ -148,7 +148,7 @@ This example defines and uses a custom write converter to strip whitespace from
 ==== Recipe: Specify Multiple Write Converters
-Use option <tt>:write_converters</tt> and multiple custom coverters
+Use option <tt>:write_converters</tt> and multiple custom converters
 to convert field values when generating \CSV.
 This example defines and uses two custom write converters to strip and upcase generated fields:
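
The hunk cuts off before the recipe's own code, so here is a rough sketch of what such a pair of write converters might look like. The constant and method names are illustrative and not taken from the recipe.

```ruby
require "csv"

# Each converter receives one field and returns the (possibly changed) field.
STRIP_CONVERTER = ->(field) { field.respond_to?(:strip) ? field.strip : field }
UPCASE_CONVERTER = ->(field) { field.respond_to?(:upcase) ? field.upcase : field }

# Hypothetical helper: applies both converters, in order, to every field.
def generate_stripped_and_upcased(rows)
  CSV.generate(write_converters: [STRIP_CONVERTER, UPCASE_CONVERTER]) do |csv|
    rows.each { |row| csv << row }
  end
end

generate_stripped_and_upcased([[" foo ", 0], ["bar ", 1]])
# => "FOO,0\nBAR,1\n"
```

Non-String fields such as the integers pass through unchanged because both converters check `respond_to?` first.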

data/doc/csv/recipes/parsing.rdoc CHANGED

@@ -83,7 +83,7 @@ Use instance method CSV#each with option +headers+ to read a source \String one
   CSV.new(string, headers: true).each do |row|
     p row
   end
-Ouput:
+Output:
   #<CSV::Row "Name":"foo" "Value":"0">
   #<CSV::Row "Name":"bar" "Value":"1">
   #<CSV::Row "Name":"baz" "Value":"2">

data/lib/csv/fields_converter.rb CHANGED

@@ -44,7 +44,7 @@ class CSV
       @converters.empty?
     end
-    def convert(fields, headers, lineno)
+    def convert(fields, headers, lineno, quoted_fields)
       return fields unless need_convert?
       fields.collect.with_index do |field, index|
@@ -63,7 +63,8 @@ class CSV
             else
               header = nil
             end
-            field = converter[field, FieldInfo.new(index, lineno, header)]
+            quoted = quoted_fields[index]
+            field = converter[field, FieldInfo.new(index, lineno, header, quoted)]
           end
           break unless field.is_a?(String)  # short-circuit pipeline for speed
         end
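
The change above threads a per-field quoted flag into the info object that two-argument converters receive. The following standalone sketch shows how a converter could use such a flag. `FieldInfoSketch` and `convert_row` are hypothetical stand-ins written for this illustration, not the library's classes.

```ruby
# Hypothetical stand-in for the field info object passed to converters.
FieldInfoSketch = Struct.new(:index, :line, :header, :quoted) do
  def quoted?
    quoted
  end
end

# Converts unquoted, digit-only fields to Integer; quoted fields stay Strings.
INTEGER_UNLESS_QUOTED = lambda do |field, info|
  return field if info.quoted?
  field.match?(/\A\d+\z/) ? Integer(field, 10) : field
end

# Applies one converter to each field, pairing it with its quoted flag.
def convert_row(fields, quoted_fields, lineno, converter)
  fields.each_with_index.map do |field, index|
    info = FieldInfoSketch.new(index, lineno, nil, quoted_fields[index])
    converter.call(field, info)
  end
end

convert_row(["1", "2", "x"], [false, true, false], 1, INTEGER_UNLESS_QUOTED)
# => [1, "2", "x"]
```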

data/lib/csv/input_record_separator.rb CHANGED

@@ -4,20 +4,7 @@ require "stringio"
 class CSV
   module InputRecordSeparator
     class << self
-      is_input_record_separator_deprecated = false
-      verbose, $VERBOSE = $VERBOSE, true
-      stderr, $stderr = $stderr, StringIO.new
-      input_record_separator = $INPUT_RECORD_SEPARATOR
-      begin
-        $INPUT_RECORD_SEPARATOR = "\r\n"
-        is_input_record_separator_deprecated = (not $stderr.string.empty?)
-      ensure
-        $INPUT_RECORD_SEPARATOR = input_record_separator
-        $stderr = stderr
-        $VERBOSE = verbose
-      end
-      if is_input_record_separator_deprecated
+      if RUBY_VERSION >= "3.0.0"
         def value
           "\n"
         end
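
This hunk replaces a runtime probe with a plain string comparison on `RUBY_VERSION`. String comparison is lexicographic, which gives the expected answer for single-digit major versions. Should segment-wise comparison ever be needed, one possible sketch uses `Gem::Version`, assuming RubyGems is loaded (the default). The helper name `ruby_at_least?` is hypothetical.

```ruby
# Hypothetical helper: compares version strings segment by segment.
def ruby_at_least?(minimum, current = RUBY_VERSION)
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end

ruby_at_least?("3.0.0", "2.7.8")  # => false
ruby_at_least?("3.0.0", "3.1.2")  # => true
ruby_at_least?("3.0.0", "10.0.0") # => true
"10.0.0" >= "3.0.0"               # => false, because "1" sorts before "3"
```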

data/lib/csv/parser.rb CHANGED

@@ -2,15 +2,10 @@
 require "strscan"
-require_relative "delete_suffix"
 require_relative "input_record_separator"
-require_relative "match_p"
 require_relative "row"
 require_relative "table"
-using CSV::DeleteSuffix if CSV.const_defined?(:DeleteSuffix)
-using CSV::MatchP if CSV.const_defined?(:MatchP)
 class CSV
   # Note: Don't use this class directly. This is an internal class.
   class Parser
@@ -27,6 +22,10 @@ class CSV
     class InvalidEncoding < StandardError
     end
+    # Raised when unexpected case is happen.
+    class UnexpectedError < StandardError
+    end
     #
     # CSV::Scanner receives a CSV output, scans it and return the content.
     # It also controls the life cycle of the object with its methods +keep_start+,
@@ -78,16 +77,17 @@ class CSV
     # +keep_end+, +keep_back+, +keep_drop+.
     #
     # CSV::InputsScanner.scan() tries to match with pattern at the current position.
-    # If there's a match, the scanner advances the “scan pointer” and returns the matched string.
+    # If there's a match, the scanner advances the "scan pointer" and returns the matched string.
     # Otherwise, the scanner returns nil.
     #
-    # CSV::InputsScanner.rest() returns the “rest” of the string (i.e. everything after the scan pointer).
+    # CSV::InputsScanner.rest() returns the "rest" of the string (i.e. everything after the scan pointer).
     # If there is no more data (eos? = true), it returns "".
     #
     class InputsScanner
-      def initialize(inputs, encoding, chunk_size: 8192)
+      def initialize(inputs, encoding, row_separator, chunk_size: 8192)
         @inputs = inputs.dup
         @encoding = encoding
+        @row_separator = row_separator
         @chunk_size = chunk_size
         @last_scanner = @inputs.empty?
         @keeps = []
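
The comments in this hunk describe scan-pointer behaviour that `InputsScanner` delegates to `StringScanner`. A small standalone sketch of that behaviour with the standard `strscan` library follows. The helper `scan_demo` is hypothetical.

```ruby
require "strscan"

# Hypothetical helper: reports what each StringScanner call returns.
def scan_demo(text)
  scanner = StringScanner.new(text)
  peeked = scanner.check(/\w+/) # matches at the pointer without advancing
  first = scanner.scan(/\w+/)   # matches at the pointer and advances past it
  miss = scanner.scan(/\d+/)    # no match at the pointer, so nil and no movement
  { peeked: peeked, first: first, miss: miss, rest: scanner.rest, eos: scanner.eos? }
end

result = scan_demo("foo,bar")
# result[:peeked] and result[:first] are both "foo", result[:miss] is nil,
# result[:rest] is ",bar", and result[:eos] is false.
```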
@@ -95,11 +95,13 @@ class CSV
       end
       def each_line(row_separator)
+        return enum_for(__method__, row_separator) unless block_given?
         buffer = nil
         input = @scanner.rest
         position = @scanner.pos
         offset = 0
         n_row_separator_chars = row_separator.size
+        # trace(__method__, :start, line, input)
         while true
           input.each_line(row_separator) do |line|
             @scanner.pos += line.bytesize
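
The added first line of `each_line` is the usual Ruby idiom for returning an Enumerator when no block is given. A minimal sketch of the idiom in isolation, using a hypothetical `LineSource` class:

```ruby
# Hypothetical class showing the enum_for idiom on its own.
class LineSource
  def initialize(text)
    @text = text
  end

  def each_line(separator)
    # Without a block, hand back an Enumerator that re-invokes this method.
    return enum_for(__method__, separator) unless block_given?
    @text.each_line(separator) { |line| yield line }
  end
end

LineSource.new("a\nb\n").each_line("\n").to_a # => ["a\n", "b\n"]
```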
@@ -139,25 +141,28 @@ class CSV
       end
       def scan(pattern)
+        # trace(__method__, pattern, :start)
         value = @scanner.scan(pattern)
+        # trace(__method__, pattern, :done, :last, value) if @last_scanner
         return value if @last_scanner
-        if value
-          read_chunk if @scanner.eos?
-          return value
-        else
-          nil
-        end
+        read_chunk if value and @scanner.eos?
+        # trace(__method__, pattern, :done, value)
+        value
       end
       def scan_all(pattern)
+        # trace(__method__, pattern, :start)
         value = @scanner.scan(pattern)
+        # trace(__method__, pattern, :done, :last, value) if @last_scanner
         return value if @last_scanner
         return nil if value.nil?
         while @scanner.eos? and read_chunk and (sub_value = @scanner.scan(pattern))
+          # trace(__method__, pattern, :sub, sub_value)
           value << sub_value
         end
+        # trace(__method__, pattern, :done, value)
         value
       end
@@ -166,76 +171,135 @@ class CSV
       end
       def keep_start
-        @keeps.push([@scanner.pos, nil])
+        # trace(__method__, :start)
+        adjust_last_keep
+        @keeps.push([@scanner, @scanner.pos, nil])
+        # trace(__method__, :done)
       end
       def keep_end
-        start, buffer = @keeps.pop
-        keep = @scanner.string.byteslice(start, @scanner.pos - start)
+        # trace(__method__, :start)
+        scanner, start, buffer = @keeps.pop
+        if scanner == @scanner
+          keep = @scanner.string.byteslice(start, @scanner.pos - start)
+        else
+          keep = @scanner.string.byteslice(0, @scanner.pos)
+        end
         if buffer
           buffer << keep
           keep = buffer
         end
+        # trace(__method__, :done, keep)
         keep
       end
       def keep_back
-        start, buffer = @keeps.pop
+        # trace(__method__, :start)
+        scanner, start, buffer = @keeps.pop
         if buffer
+          # trace(__method__, :rescan, start, buffer)
           string = @scanner.string
-          keep = string.byteslice(start, string.bytesize - start)
+          if scanner == @scanner
+            keep = string.byteslice(start, string.bytesize - start)
+          else
+            keep = string
+          end
           if keep and not keep.empty?
             @inputs.unshift(StringIO.new(keep))
             @last_scanner = false
           end
           @scanner = StringScanner.new(buffer)
         else
+          if @scanner != scanner
+            message = "scanners are different but no buffer: "
+            message += "#{@scanner.inspect}(#{@scanner.object_id}): "
+            message += "#{scanner.inspect}(#{scanner.object_id})"
+            raise UnexpectedError, message
+          end
+          # trace(__method__, :repos, start, buffer)
           @scanner.pos = start
         end
         read_chunk if @scanner.eos?
       end
       def keep_drop
-        @keeps.pop
+        _, _, buffer = @keeps.pop
+        # trace(__method__, :done, :empty) unless buffer
+        return unless buffer
+        last_keep = @keeps.last
+        # trace(__method__, :done, :no_last_keep) unless last_keep
+        return unless last_keep
+        if last_keep[2]
+          last_keep[2] << buffer
+        else
+          last_keep[2] = buffer
+        end
+        # trace(__method__, :done)
       end
       def rest
         @scanner.rest
       end
+      def check(pattern)
+        @scanner.check(pattern)
+      end
       private
-      def read_chunk
-        return false if @last_scanner
+      def trace(*args)
+        pp([*args, @scanner, @scanner&.string, @scanner&.pos, @keeps])
+      end
-        unless @keeps.empty?
-          keep = @keeps.last
-          keep_start = keep[0]
-          string = @scanner.string
-          keep_data = string.byteslice(keep_start, @scanner.pos - keep_start)
-          if keep_data
-            keep_buffer = keep[1]
-            if keep_buffer
-              keep_buffer << keep_data
-            else
-              keep[1] = keep_data.dup
-            end
+      def adjust_last_keep
+        # trace(__method__, :start)
+        keep = @keeps.last
+        # trace(__method__, :done, :empty) if keep.nil?
+        return if keep.nil?
+        scanner, start, buffer = keep
+        string = @scanner.string
+        if @scanner != scanner
+          start = 0
+        end
+        if start == 0 and @scanner.eos?
+          keep_data = string
+        else
+          keep_data = string.byteslice(start, @scanner.pos - start)
+        end
+        if keep_data
+          if buffer
+            buffer << keep_data
+          else
+            keep[2] = keep_data.dup
           end
-          keep[0] = 0
         end
+        # trace(__method__, :done)
+      end
+      def read_chunk
+        return false if @last_scanner
+        adjust_last_keep
         input = @inputs.first
         case input
         when StringIO
           string = input.read
           raise InvalidEncoding unless string.valid_encoding?
+          # trace(__method__, :stringio, string)
           @scanner = StringScanner.new(string)
           @inputs.shift
           @last_scanner = @inputs.empty?
           true
         else
-          chunk = input.gets(nil, @chunk_size)
+          chunk = input.gets(@row_separator, @chunk_size)
           if chunk
             raise InvalidEncoding unless chunk.valid_encoding?
+            # trace(__method__, :chunk, chunk)
             @scanner = StringScanner.new(chunk)
             if input.respond_to?(:eof?) and input.eof?
               @inputs.shift
@@ -243,6 +307,7 @@ class CSV
             end
             true
           else
+            # trace(__method__, :no_chunk)
             @scanner = StringScanner.new("".encode(@encoding))
             @inputs.shift
             @last_scanner = @inputs.empty?
@@ -277,7 +342,11 @@ class CSV
     end
     def field_size_limit
-      @field_size_limit
+      @max_field_size&.succ
+    end
+    def max_field_size
+      @max_field_size
     end
     def skip_lines
@@ -345,6 +414,16 @@ class CSV
         end
         message = "Invalid byte sequence in #{@encoding}"
         raise MalformedCSVError.new(message, lineno)
+      rescue UnexpectedError => error
+        if @scanner
+          ignore_broken_line
+          lineno = @lineno
+        else
+          lineno = @lineno + 1
+        end
+        message = "This should not be happen: #{error.message}: "
+        message += "Please report this to https://github.com/ruby/csv/issues"
+        raise MalformedCSVError.new(message, lineno)
       end
     end
@@ -361,6 +440,7 @@ class CSV
       prepare_skip_lines
       prepare_strip
       prepare_separators
+      validate_strip_and_col_sep_options
       prepare_quoted
       prepare_unquoted
       prepare_line
@@ -388,7 +468,7 @@ class CSV
         @backslash_quote = false
       end
       @unconverted_fields = @options[:unconverted_fields]
-      @field_size_limit = @options[:field_size_limit]
+      @max_field_size = @options[:max_field_size]
       @skip_blanks = @options[:skip_blanks]
       @fields_converter = @options[:fields_converter]
       @header_fields_converter = @options[:header_fields_converter]
@@ -531,6 +611,28 @@ class CSV
       @not_line_end = Regexp.new("[^\r\n]+".encode(@encoding))
     end
+    # This method verifies that there are no (obvious) ambiguities with the
+    # provided +col_sep+ and +strip+ parsing options. For example, if +col_sep+
+    # and +strip+ were both equal to +\t+, then there would be no clear way to
+    # parse the input.
+    def validate_strip_and_col_sep_options
+      return unless @strip
+      if @strip.is_a?(String)
+        if @column_separator.start_with?(@strip) || @column_separator.end_with?(@strip)
+          raise ArgumentError,
+                "The provided strip (#{@escaped_strip}) and " \
+                "col_sep (#{@escaped_column_separator}) options are incompatible."
+        end
+      else
+        if Regexp.new("\\A[#{@escaped_strip}]|[#{@escaped_strip}]\\z").match?(@column_separator)
+          raise ArgumentError,
+                "The provided strip (true) and " \
+                "col_sep (#{@escaped_column_separator}) options are incompatible."
+        end
+      end
+    end
     def prepare_quoted
       if @quote_character
         @quotes = Regexp.new(@escaped_quote_character +
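
The validation added in this hunk can be sketched as a standalone predicate. This is an approximation for illustration only: the method name is hypothetical, and the whitespace set used for `strip: true` (space, tab, form feed, vertical tab) is an assumption, since the diff does not show what `@escaped_strip` contains in that case.

```ruby
# Hypothetical predicate mirroring the shape of the validation above.
def incompatible_strip_and_col_sep?(strip, col_sep)
  return false unless strip
  if strip.is_a?(String)
    # A String strip value conflicts when col_sep begins or ends with it.
    col_sep.start_with?(strip) || col_sep.end_with?(strip)
  else
    # Assumed whitespace set for strip: true (not shown in the diff).
    col_sep.match?(/\A[ \t\f\v]|[ \t\f\v]\z/)
  end
end

incompatible_strip_and_col_sep?("\t", "\t") # => true
incompatible_strip_and_col_sep?(true, "\t") # => true
incompatible_strip_and_col_sep?(true, ",")  # => false
```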
@@ -656,9 +758,10 @@ class CSV
       case headers
       when Array
         @raw_headers = headers
+        quoted_fields = [false] * @raw_headers.size
         @use_headers = true
       when String
-        @raw_headers = parse_headers(headers)
+        @raw_headers, quoted_fields = parse_headers(headers)
         @use_headers = true
       when nil, false
         @raw_headers = nil
@@ -668,21 +771,28 @@ class CSV
         @use_headers = true
       end
       if @raw_headers
-        @headers = adjust_headers(@raw_headers)
+        @headers = adjust_headers(@raw_headers, quoted_fields)
       else
         @headers = nil
       end
     end
     def parse_headers(row)
-      CSV.parse_line(row,
-                     col_sep:    @column_separator,
-                     row_sep:    @row_separator,
-                     quote_char: @quote_character)
+      quoted_fields = []
+      converter = lambda do |field, info|
+        quoted_fields << info.quoted?
+        field
+      end
+      headers = CSV.parse_line(row,
+                               col_sep:    @column_separator,
+                               row_sep:    @row_separator,
+                               quote_char: @quote_character,
+                               converters: [converter])
+      [headers, quoted_fields]
     end
-    def adjust_headers(headers)
-      adjusted_headers = @header_fields_converter.convert(headers, nil, @lineno)
+    def adjust_headers(headers, quoted_fields)
+      adjusted_headers = @header_fields_converter.convert(headers, nil, @lineno, quoted_fields)
       adjusted_headers.each {|h| h.freeze if h.is_a? String}
       adjusted_headers
     end
@@ -705,28 +815,28 @@ class CSV
       sample[0, 128].index(@quote_character)
     end
-    SCANNER_TEST = (ENV["CSV_PARSER_SCANNER_TEST"] == "yes")
-    if SCANNER_TEST
-      class UnoptimizedStringIO
-        def initialize(string)
-          @io = StringIO.new(string, "rb:#{string.encoding}")
-        end
+    class UnoptimizedStringIO # :nodoc:
+      def initialize(string)
+        @io = StringIO.new(string, "rb:#{string.encoding}")
+      end
-        def gets(*args)
-          @io.gets(*args)
-        end
+      def gets(*args)
+        @io.gets(*args)
+      end
-        def each_line(*args, &block)
-          @io.each_line(*args, &block)
-        end
+      def each_line(*args, &block)
+        @io.each_line(*args, &block)
+      end
-        def eof?
-          @io.eof?
-        end
+      def eof?
+        @io.eof?
       end
+    end
-      SCANNER_TEST_CHUNK_SIZE =
-        Integer((ENV["CSV_PARSER_SCANNER_TEST_CHUNK_SIZE"] || "1"), 10)
+    SCANNER_TEST = (ENV["CSV_PARSER_SCANNER_TEST"] == "yes")
+    if SCANNER_TEST
+      SCANNER_TEST_CHUNK_SIZE_NAME = "CSV_PARSER_SCANNER_TEST_CHUNK_SIZE"
+      SCANNER_TEST_CHUNK_SIZE_VALUE = ENV[SCANNER_TEST_CHUNK_SIZE_NAME]
       def build_scanner
         inputs = @samples.collect do |sample|
           UnoptimizedStringIO.new(sample)
@@ -736,16 +846,27 @@ class CSV
         else
           inputs << @input
         end
+        begin
+          chunk_size_value = ENV[SCANNER_TEST_CHUNK_SIZE_NAME]
+        rescue # Ractor::IsolationError
+          # Ractor on Ruby 3.0 can't read ENV value.
+          chunk_size_value = SCANNER_TEST_CHUNK_SIZE_VALUE
+        end
+        chunk_size = Integer((chunk_size_value || "1"), 10)
         InputsScanner.new(inputs,
                           @encoding,
-                          chunk_size: SCANNER_TEST_CHUNK_SIZE)
+                          @row_separator,
+                          chunk_size: chunk_size)
       end
     else
       def build_scanner
         string = nil
         if @samples.empty? and @input.is_a?(StringIO)
           string = @input.read
-        elsif @samples.size == 1 and @input.respond_to?(:eof?) and @input.eof?
+        elsif @samples.size == 1 and
+              @input != ARGF and
+              @input.respond_to?(:eof?) and
+              @input.eof?
           string = @samples[0]
         end
         if string
@@ -764,7 +885,7 @@ class CSV
             StringIO.new(sample)
           end
           inputs << @input
-          InputsScanner.new(inputs, @encoding)
+          InputsScanner.new(inputs, @encoding, @row_separator)
         end
       end
     end
@@ -798,6 +919,14 @@ class CSV
       end
     end
+    def validate_field_size(field)
+      return unless @max_field_size
+      return if field.size <= @max_field_size
+      ignore_broken_line
+      message = "Field size exceeded: #{field.size} > #{@max_field_size}"
+      raise MalformedCSVError.new(message, @lineno)
+    end
     def parse_no_quote(&block)
       @scanner.each_line(@row_separator) do |line|
         next if @skip_lines and skip_line?(line)
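
Taken together with the accessor hunk earlier in this file, the new check treats `max_field_size` as an inclusive maximum, while the old `field_size_limit` was an exclusive bound one larger. A standalone sketch of that relationship, with hypothetical names throughout:

```ruby
# Hypothetical error class standing in for the parser's malformed-CSV error.
class FieldTooLarge < StandardError; end

# Inclusive check: a field of exactly max_field_size characters is accepted.
def check_field_size(field, max_field_size)
  return field if max_field_size.nil?
  return field if field.size <= max_field_size
  raise FieldTooLarge, "Field size exceeded: #{field.size} > #{max_field_size}"
end

# The deprecated limit is derived as one more than the new maximum.
def legacy_field_size_limit(max_field_size)
  max_field_size&.succ
end

check_field_size("abc", 3)   # => "abc"
legacy_field_size_limit(3)   # => 4
legacy_field_size_limit(nil) # => nil
```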
@@ -807,9 +936,16 @@ class CSV
         if line.empty?
           next if @skip_blanks
           row = []
+          quoted_fields = []
         else
           line = strip_value(line)
           row = line.split(@split_column_separator, -1)
+          quoted_fields = [false] * row.size
+          if @max_field_size
+            row.each do |column|
+              validate_field_size(column)
+            end
+          end
           n_columns = row.size
           i = 0
           while i < n_columns
@@ -818,7 +954,7 @@ class CSV
           end
         end
         @last_line = original_line
-        emit_row(row, &block)
+        emit_row(row, quoted_fields, &block)
       end
     end
@@ -840,31 +976,37 @@ class CSV
             next
           end
           row = []
+          quoted_fields = []
         elsif line.include?(@cr) or line.include?(@lf)
           @scanner.keep_back
           @need_robust_parsing = true
           return parse_quotable_robust(&block)
         else
           row = line.split(@split_column_separator, -1)
+          quoted_fields = []
           n_columns = row.size
           i = 0
           while i < n_columns
             column = row[i]
             if column.empty?
+              quoted_fields << false
               row[i] = nil
             else
               n_quotes = column.count(@quote_character)
               if n_quotes.zero?
+                quoted_fields << false
                 # no quote
               elsif n_quotes == 2 and
                    column.start_with?(@quote_character) and
                    column.end_with?(@quote_character)
+                quoted_fields << true
                 row[i] = column[1..-2]
               else
                 @scanner.keep_back
                 @need_robust_parsing = true
                 return parse_quotable_robust(&block)
               end
+              validate_field_size(row[i])
             end
             i += 1
           end
@@ -872,13 +1014,14 @@ class CSV
         @scanner.keep_drop
         @scanner.keep_start
         @last_line = original_line
-        emit_row(row, &block)
+        emit_row(row, quoted_fields, &block)
       end
       @scanner.keep_drop
     end
     def parse_quotable_robust(&block)
       row = []
+      quoted_fields = []
       skip_needless_lines
       start_row
       while true
@@ -888,32 +1031,39 @@ class CSV
         value = parse_column_value
         if value
           @scanner.scan_all(@strip_value) if @strip_value
-          if @field_size_limit and value.size >= @field_size_limit
-            ignore_broken_line
-            raise MalformedCSVError.new("Field size exceeded", @lineno)
-          end
+          validate_field_size(value)
         end
         if parse_column_end
           row << value
+          quoted_fields << @quoted_column_value
         elsif parse_row_end
           if row.empty? and value.nil?
-            emit_row([], &block) unless @skip_blanks
+            emit_row([], [], &block) unless @skip_blanks
           else
             row << value
-            emit_row(row, &block)
+            quoted_fields << @quoted_column_value
+            emit_row(row, quoted_fields, &block)
             row = []
+            quoted_fields = []
           end
           skip_needless_lines
           start_row
         elsif @scanner.eos?
           break if row.empty? and value.nil?
           row << value
-          emit_row(row, &block)
+          quoted_fields << @quoted_column_value
+          emit_row(row, quoted_fields, &block)
           break
         else
           if @quoted_column_value
+            if liberal_parsing? and (new_line = @scanner.check(@line_end))
+              message =
+                "Illegal end-of-line sequence outside of a quoted field " +
+                "<#{new_line.inspect}>"
+            else
+              message = "Any value after quoted field isn't allowed"
+            end
             ignore_broken_line
-            message = "Any value after quoted field isn't allowed"
             raise MalformedCSVError.new(message, @lineno)
           elsif @unquoted_column_value and
                 (new_line = @scanner.scan(@line_end))
@@ -1006,7 +1156,7 @@ class CSV
       if (n_quotes % 2).zero?
         quotes[0, (n_quotes - 2) / 2]
       else
-        value = quotes[0, (n_quotes - 1) / 2]
+        value = quotes[0, n_quotes / 2]
         while true
           quoted_value = @scanner.scan_all(@quoted_value)
           value << quoted_value if quoted_value
@@ -1030,11 +1180,9 @@ class CSV
           n_quotes = quotes.size
           if n_quotes == 1
             break
-          elsif (n_quotes % 2) == 1
-            value << quotes[0, (n_quotes - 1) / 2]
-            break
           else
             value << quotes[0, n_quotes / 2]
+            break if (n_quotes % 2) == 1
           end
         end
         value
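
Both quote-handling hunks lean on integer division: for an odd count `n`, `(n - 1) / 2` and `n / 2` are the same number, so the separate odd branch could be folded away. One reading of the hunk above, sketched with a hypothetical helper, is that a run of `n` quote characters inside a quoted value yields `n / 2` literal quotes and, when `n` is odd, the leftover quote ends the value.

```ruby
# Hypothetical helper: returns [literal quote count, whether the run closes the value].
def split_quote_run(n_quotes)
  literal = n_quotes / 2 # each doubled quote is one literal quote
  closes = n_quotes.odd? # a leftover single quote terminates the value
  [literal, closes]
end

split_quote_run(1) # => [0, true]
split_quote_run(3) # => [1, true]
split_quote_run(4) # => [2, false]
```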
@@ -1070,18 +1218,15 @@ class CSV
     def strip_value(value)
       return value unless @strip
-      return nil if value.nil?
+      return value if value.nil?
       case @strip
       when String
-        size = value.size
-        while value.start_with?(@strip)
-          size -= 1
-          value = value[1, size]
+        while value.delete_prefix!(@strip)
+          # do nothing
         end
-        while value.end_with?(@strip)
-          size -= 1
-          value = value[0, size]
+        while value.delete_suffix!(@strip)
+          # do nothing
         end
       else
         value.strip!
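
The rewritten String branch works because `delete_prefix!` and `delete_suffix!` return `nil` once nothing was removed, which ends each loop. A standalone sketch of the same loop shape follows. The helper name is hypothetical, and it copies the input first so the caller's string is not mutated.

```ruby
# Hypothetical helper: removes every leading and trailing occurrence of strip.
def strip_string(value, strip)
  value = value.dup
  while value.delete_prefix!(strip)
    # keep removing leading occurrences
  end
  while value.delete_suffix!(strip)
    # keep removing trailing occurrences
  end
  value
end

strip_string("--a-b--", "-") # => "a-b"
strip_string("abc", "-")     # => "abc"
```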
@@ -1104,22 +1249,22 @@ class CSV
       @scanner.keep_start
     end
-    def emit_row(row, &block)
+    def emit_row(row, quoted_fields, &block)
       @lineno += 1
       raw_row = row
       if @use_headers
         if @headers.nil?
-          @headers = adjust_headers(row)
+          @headers = adjust_headers(row, quoted_fields)
           return unless @return_headers
           row = Row.new(@headers, row, true)
         else
           row = Row.new(@headers,
-                        @fields_converter.convert(raw_row, @headers, @lineno))
+                        @fields_converter.convert(raw_row, @headers, @lineno, quoted_fields))
         end
       else
         # convert fields, if needed...
-        row = @fields_converter.convert(raw_row, nil, @lineno)
+        row = @fields_converter.convert(raw_row, nil, @lineno, quoted_fields)
       end
       # inject unconverted fields and accessor, if requested...

data/lib/csv/row.rb CHANGED

@@ -703,7 +703,7 @@ class CSV
     # by +index_or_header+ and +specifiers+.
     #
     # The nested objects may be instances of various classes.
-    # See {Dig Methods}[https://docs.ruby-lang.org/en/master/doc/dig_methods_rdoc.html].
+    # See {Dig Methods}[https://docs.ruby-lang.org/en/master/dig_methods_rdoc.html].
     #
     # Examples:
     #   source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"

data/lib/csv/table.rb CHANGED

@@ -999,9 +999,15 @@ class CSV
     # Omits the headers if option +write_headers+ is given as +false+
     # (see {Option +write_headers+}[../CSV.html#class-CSV-label-Option+write_headers]):
     #   table.to_csv(write_headers: false) # => "foo,0\nbar,1\nbaz,2\n"
-    def to_csv(write_headers: true, **options)
+    #
+    # Limit rows if option +limit+ is given like +2+:
+    #   table.to_csv(limit: 2) # => "Name,Value\nfoo,0\nbar,1\n"
+    def to_csv(write_headers: true, limit: nil, **options)
       array = write_headers ? [headers.to_csv(**options)] : []
-      @table.each do |row|
+      limit ||= @table.size
+      limit = @table.size + 1 + limit if limit < 0
+      limit = 0 if limit < 0
+      @table.first(limit).each do |row|
         array.push(row.fields.to_csv(**options)) unless row.header_row?
       end
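
The three `limit` lines in this hunk normalize the argument before calling `first`. A standalone sketch of that arithmetic on a plain Array, with hypothetical helper names, makes the negative cases easier to follow: `-1` means all rows, `-2` all but the last, and anything further below zero clamps to no rows.

```ruby
# Hypothetical helper: mirrors the limit normalization shown in the hunk.
def normalize_limit(limit, size)
  limit ||= size
  limit = size + 1 + limit if limit < 0
  limit = 0 if limit < 0
  limit
end

# Hypothetical helper: takes the first rows according to the normalized limit.
def first_rows(rows, limit = nil)
  rows.first(normalize_limit(limit, rows.size))
end

rows = ["a", "b", "c"]
first_rows(rows)      # => ["a", "b", "c"]
first_rows(rows, 2)   # => ["a", "b"]
first_rows(rows, -1)  # => ["a", "b", "c"]
first_rows(rows, -2)  # => ["a", "b"]
first_rows(rows, -10) # => []
```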
@@ -1038,9 +1044,13 @@ class CSV
     # Example:
     #   source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
     #   table = CSV.parse(source, headers: true)
-    #   table.inspect # => "#<CSV::Table mode:col_or_row row_count:4>"
+    #   table.inspect # => "#<CSV::Table mode:col_or_row row_count:4>\nName,Value\nfoo,0\nbar,1\nbaz,2\n"
+    #
     def inspect
-      "#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>".encode("US-ASCII")
+      inspected = +"#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>"
+      summary = to_csv(limit: 5)
+      inspected << "\n" << summary if summary.encoding.ascii_compatible?
+      inspected
     end
   end
 end

data/lib/csv/version.rb CHANGED

@@ -2,5 +2,5 @@
 class CSV
   # The version of the installed library.
-  VERSION = "3.2.1"
+  VERSION = "3.2.4"
 end

data/lib/csv/writer.rb CHANGED

@@ -1,11 +1,8 @@
 # frozen_string_literal: true
 require_relative "input_record_separator"
-require_relative "match_p"
 require_relative "row"
-using CSV::MatchP if CSV.const_defined?(:MatchP)
 class CSV
   # Note: Don't use this class directly. This is an internal class.
   class Writer
@@ -42,7 +39,10 @@ class CSV
       @headers ||= row if @use_headers
       @lineno += 1
-      row = @fields_converter.convert(row, nil, lineno) if @fields_converter
+      if @fields_converter
+        quoted_fields = [false] * row.size
+        row = @fields_converter.convert(row, nil, lineno, quoted_fields)
+      end
       i = -1
       converted_row = row.collect do |field|
@@ -97,7 +97,7 @@ class CSV
       return unless @headers
       converter = @options[:header_fields_converter]
-      @headers = converter.convert(@headers, nil, 0)
+      @headers = converter.convert(@headers, nil, 0, [])
       @headers.each do |header|
         header.freeze if header.is_a?(String)
       end

data/lib/csv.rb CHANGED

@@ -95,14 +95,11 @@ require "stringio"
 require_relative "csv/fields_converter"
 require_relative "csv/input_record_separator"
-require_relative "csv/match_p"
 require_relative "csv/parser"
 require_relative "csv/row"
 require_relative "csv/table"
 require_relative "csv/writer"
-using CSV::MatchP if CSV.const_defined?(:MatchP)
 # == \CSV
 #
 # === In a Hurry?
@@ -341,6 +338,7 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
 #     liberal_parsing:    false,
 #     nil_value:          nil,
 #     empty_value:        "",
+#     strip:              false,
 #     # For generating.
 #     write_headers:      nil,
 #     quote_empty:        true,
@@ -348,7 +346,6 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
 #     write_converters:   nil,
 #     write_nil_value:    nil,
 #     write_empty_value:  "",
-#     strip:              false,
 #   }
 #
 # ==== Options for Parsing
@@ -357,7 +354,9 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
 # - +row_sep+: Specifies the row separator; used to delimit rows.
 # - +col_sep+: Specifies the column separator; used to delimit fields.
 # - +quote_char+: Specifies the quote character; used to quote fields.
-# - +field_size_limit+: Specifies the maximum field size allowed.
+# - +field_size_limit+: Specifies the maximum field size + 1 allowed.
+#   Deprecated since 3.2.3. Use +max_field_size+ instead.
+# - +max_field_size+: Specifies the maximum field size allowed.
 # - +converters+: Specifies the field converters to be used.
 # - +unconverted_fields+: Specifies whether unconverted fields are to be available.
 # - +headers+: Specifies whether data contains headers,
@@ -366,8 +365,9 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
 # - +header_converters+: Specifies the header converters to be used.
 # - +skip_blanks+: Specifies whether blanks lines are to be ignored.
 # - +skip_lines+: Specifies how comments lines are to be recognized.
-# - +strip+: Specifies whether leading and trailing whitespace are
-#   to be stripped from fields..
+# - +strip+: Specifies whether leading and trailing whitespace are to be
+#   stripped from fields. This must be compatible with +col_sep+; if it is not,
+#   then an +ArgumentError+ exception will be raised.
 # - +liberal_parsing+: Specifies whether \CSV should attempt to parse
 #   non-compliant data.
 # - +nil_value+: Specifies the object that is to be substituted for each null (no-text) field.
@@ -863,8 +863,9 @@ class CSV
   # <b><tt>index</tt></b>::  The zero-based index of the field in its row.
   # <b><tt>line</tt></b>::   The line of the data source this row is from.
   # <b><tt>header</tt></b>:: The header for the column, when available.
+  # <b><tt>quoted?</tt></b>:: True or false, whether the original value is quoted or not.
   #
-  FieldInfo = Struct.new(:index, :line, :header)
+  FieldInfo = Struct.new(:index, :line, :header, :quoted?)
   # A Regexp used to find and convert some common Date formats.
   DateMatcher     = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
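
Note on the `FieldInfo` hunk above: the struct gains a fourth member, `quoted?`, which a two-argument converter can read. The sketch below is an illustration only. `SketchFieldInfo` and `integer_unless_quoted` are hypothetical names chosen so they cannot clash with the library's constants, and the conversion rule is an assumption made for the example.

```ruby
# Stand-in for the redefined struct; same member list as in the hunk above.
SketchFieldInfo = Struct.new(:index, :line, :header, :quoted?)

# Hypothetical converter: digit-only fields become Integers, unless the
# original field was quoted, in which case the String is kept untouched.
def integer_unless_quoted(field, info)
  return field if info.quoted?
  field.match?(/\A\d+\z/) ? field.to_i : field
end

unquoted = SketchFieldInfo.new(0, 2, "Value", false)
quoted   = SketchFieldInfo.new(0, 2, "Value", true)

integer_unless_quoted("42", unquoted)  # => 42
integer_unless_quoted("42", quoted)    # => "42"
```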
@@ -872,10 +873,9 @@ class CSV
   # A Regexp used to find and convert some common DateTime formats.
   DateTimeMatcher =
     / \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
-            \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} |
-            # ISO-8601
+            # ISO-8601 and RFC-3339 (space instead of T) recognized by DateTime.parse
             \d{4}-\d{2}-\d{2}
-              (?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?(?:[+-]\d{2}(?::\d{2})|Z)?)?)?
+              (?:[T\s]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?(?:[+-]\d{2}(?::\d{2})|Z)?)?)?
         )\z /x
   # The encoding used by all converters.
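
Note on the `DateTimeMatcher` hunk above: the separate space-separated alternative is folded into the ISO-8601 branch by accepting `[T\s]` as the date/time separator. The sketch below isolates only the `\d{4}-\d{2}-\d{2}` alternatives from the old and new patterns, written without the `/x` flag. The constant names `OLD_ISO_PART` and `NEW_ISO_PART` are hypothetical, and the first (English-style) alternative of the real matcher is left out.

```ruby
# Old behaviour: a fixed "date<space>HH:MM:SS" form, or ISO-8601 with "T" only.
OLD_ISO_PART = /\A(?:\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}|\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?(?:[+-]\d{2}(?::\d{2})|Z)?)?)?)\z/

# New behaviour: one branch, with either "T" or whitespace as the separator.
NEW_ISO_PART = /\A\d{4}-\d{2}-\d{2}(?:[T\s]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?(?:[+-]\d{2}(?::\d{2})|Z)?)?)?\z/

sample = "2022-08-22 12:34:56.789Z"
OLD_ISO_PART.match?(sample)                 # => false
NEW_ISO_PART.match?(sample)                 # => true
NEW_ISO_PART.match?("2022-08-22T12:34:56")  # => true
```

The practical effect, as far as this sketch shows, is that space-separated values carrying fractional seconds or a zone suffix now match as well.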
@@ -925,7 +925,8 @@ class CSV
     symbol:   lambda { |h|
       h.encode(ConverterEncoding).downcase.gsub(/[^\s\w]+/, "").strip.
                                            gsub(/\s+/, "_").to_sym
-    }
+    },
+    symbol_raw: lambda { |h| h.encode(ConverterEncoding).to_sym }
   }
   # Default values for method options.
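
Note on the `symbol_raw` hunk above: the new header converter only transcodes and calls `to_sym`, while the existing `symbol` converter also downcases, strips punctuation, and replaces whitespace with underscores. The sketch below copies both lambdas into standalone constants. The names `SKETCH_SYMBOL` and `SKETCH_SYMBOL_RAW` are hypothetical, and `Encoding::UTF_8` is used on the assumption that it matches the library's `ConverterEncoding`.

```ruby
# Assumption: ConverterEncoding in the library is UTF-8.
SKETCH_SYMBOL = lambda { |h|
  h.encode(Encoding::UTF_8).downcase.gsub(/[^\s\w]+/, "").strip.
                                     gsub(/\s+/, "_").to_sym
}
SKETCH_SYMBOL_RAW = lambda { |h| h.encode(Encoding::UTF_8).to_sym }

SKETCH_SYMBOL.call("First Name!")      # => :first_name
SKETCH_SYMBOL_RAW.call("First Name!")  # => :"First Name!"
```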
@@ -936,6 +937,7 @@ class CSV
     quote_char:         '"',
     # For parsing.
     field_size_limit:   nil,
+    max_field_size:     nil,
     converters:         nil,
     unconverted_fields: nil,
     headers:            false,
@@ -946,6 +948,7 @@ class CSV
     liberal_parsing:    false,
     nil_value:          nil,
     empty_value:        "",
+    strip:              false,
     # For generating.
     write_headers:      nil,
     quote_empty:        true,
@@ -953,7 +956,6 @@ class CSV
     write_converters:   nil,
     write_nil_value:    nil,
     write_empty_value:  "",
-    strip:              false,
   }.freeze
   class << self
@@ -1864,6 +1866,7 @@ class CSV
                  row_sep: :auto,
                  quote_char: '"',
                  field_size_limit: nil,
+                 max_field_size: nil,
                  converters: nil,
                  unconverted_fields: nil,
                  headers: false,
@@ -1879,16 +1882,27 @@ class CSV
                  encoding: nil,
                  nil_value: nil,
                  empty_value: "",
+                 strip: false,
                  quote_empty: true,
                  write_converters: nil,
                  write_nil_value: nil,
-                 write_empty_value: "",
-                 strip: false)
+                 write_empty_value: "")
     raise ArgumentError.new("Cannot parse nil as CSV") if data.nil?
     if data.is_a?(String)
+      if encoding
+        if encoding.is_a?(String)
+          data_external_encoding, data_internal_encoding = encoding.split(":", 2)
+          if data_internal_encoding
+            data = data.encode(data_internal_encoding, data_external_encoding)
+          else
+            data = data.dup.force_encoding(data_external_encoding)
+          end
+        else
+          data = data.dup.force_encoding(encoding)
+        end
+      end
       @io = StringIO.new(data)
-      @io.set_encoding(encoding || data.encoding)
     else
       @io = data
     end
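
Note on the encoding hunk above: for String input, `encoding:` is now applied to the data itself instead of to the `StringIO`. A String value of the form `"external:internal"` transcodes, while a single name (or an `Encoding` object) only relabels a copy. The sketch below restates that branch as a standalone method. The name `apply_encoding` is hypothetical, and the early return for a missing `encoding` is added for the sketch.

```ruby
# Hypothetical helper mirroring the new String branch of CSV#initialize.
def apply_encoding(data, encoding)
  return data unless encoding
  if encoding.is_a?(String)
    external, internal = encoding.split(":", 2)
    if internal
      data.encode(internal, external)    # transcode external -> internal
    else
      data.dup.force_encoding(external)  # relabel a copy, bytes unchanged
    end
  else
    data.dup.force_encoding(encoding)    # Encoding object: relabel a copy
  end
end

latin1 = "caf\xE9".b                                # raw ISO-8859-1 bytes
apply_encoding(latin1, "ISO-8859-1:UTF-8").encoding # => #<Encoding:UTF-8>
apply_encoding(latin1, "ISO-8859-1").encoding       # => #<Encoding:ISO-8859-1>
```

In both relabel branches the caller's string is left as it was, because `force_encoding` runs on a `dup`.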
@@ -1906,11 +1920,14 @@ class CSV
     @initial_header_converters = header_converters
     @initial_write_converters = write_converters
+    if max_field_size.nil? and field_size_limit
+      max_field_size = field_size_limit - 1
+    end
     @parser_options = {
       column_separator: col_sep,
       row_separator: row_sep,
       quote_character: quote_char,
-      field_size_limit: field_size_limit,
+      max_field_size: max_field_size,
       unconverted_fields: unconverted_fields,
       headers: headers,
       return_headers: return_headers,
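
Note on the hunk above: the deprecated `field_size_limit` is translated to `max_field_size` by subtracting one, and only when `max_field_size` was not given. The sketch below isolates that rule; the method name `resolve_max_field_size` is hypothetical.

```ruby
# Hypothetical helper mirroring the fallback added to CSV#initialize.
def resolve_max_field_size(max_field_size, field_size_limit)
  if max_field_size.nil? and field_size_limit
    max_field_size = field_size_limit - 1  # old option meant "max size + 1"
  end
  max_field_size
end

resolve_max_field_size(nil, 10)   # => 9
resolve_max_field_size(5, 10)     # => 5   (explicit max_field_size wins)
resolve_max_field_size(nil, nil)  # => nil (no limit)
```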
@@ -1978,10 +1995,24 @@ class CSV
   # Returns the limit for field size; used for parsing;
   # see {Option +field_size_limit+}[#class-CSV-label-Option+field_size_limit]:
   #   CSV.new('').field_size_limit # => nil
+  #
+  # Deprecated since 3.2.3. Use +max_field_size+ instead.
   def field_size_limit
     parser.field_size_limit
   end
+  # :call-seq:
+  #   csv.max_field_size -> integer or nil
+  #
+  # Returns the limit for field size; used for parsing;
+  # see {Option +max_field_size+}[#class-CSV-label-Option+max_field_size]:
+  #   CSV.new('').max_field_size # => nil
+  #
+  # Since 3.2.3.
+  def max_field_size
+    parser.max_field_size
+  end
   # :call-seq:
   #   csv.skip_lines -> regexp or nil
   #

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: csv
 version: !ruby/object:Gem::Version
-  version: 3.2.1
+  version: 3.2.4
 platform: ruby
 authors:
 - James Edward Gray II
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-10-22 00:00:00.000000000 Z
+date: 2022-08-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -116,10 +116,8 @@ files:
 - lib/csv.rb
 - lib/csv/core_ext/array.rb
 - lib/csv/core_ext/string.rb
-- lib/csv/delete_suffix.rb
 - lib/csv/fields_converter.rb
 - lib/csv/input_record_separator.rb
-- lib/csv/match_p.rb
 - lib/csv/parser.rb
 - lib/csv/row.rb
 - lib/csv/table.rb
@@ -147,7 +145,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.3.0.dev
+rubygems_version: 3.4.0.dev
 signing_key:
 specification_version: 4
 summary: CSV Reading and Writing

data/lib/csv/delete_suffix.rb DELETED

@@ -1,18 +0,0 @@
-# frozen_string_literal: true
-# This provides String#delete_suffix? for Ruby 2.4.
-unless String.method_defined?(:delete_suffix)
-  class CSV
-    module DeleteSuffix
-      refine String do
-        def delete_suffix(suffix)
-          if end_with?(suffix)
-            self[0...-suffix.size]
-          else
-            self
-          end
-        end
-      end
-    end
-  end
-end

data/lib/csv/match_p.rb DELETED

@@ -1,20 +0,0 @@
-# frozen_string_literal: true
-# This provides String#match? and Regexp#match? for Ruby 2.3.
-unless String.method_defined?(:match?)
-  class CSV
-    module MatchP
-      refine String do
-        def match?(pattern)
-          self =~ pattern
-        end
-      end
-      refine Regexp do
-        def match?(string)
-          self =~ string
-        end
-      end
-    end
-  end
-end