smarter_csv 1.9.2 → 1.10.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 3e4032569303bd062a92b3c3f45f5166346808291667dda9ebd91af123f532ef
-   data.tar.gz: 78b73abc411d8ed866feae600b87b72c3c99fd3b00b67c81eac227c17f8d38ea
+   metadata.gz: f1d0b58acf0135b621e3182470674230ef73b48c829810e74fffa975fc318cf5
+   data.tar.gz: ee404c5c485748d35cda36b8d249cb6813a3f80005182fe8c05feac1694aba57
  SHA512:
-   metadata.gz: 1712951a2ce4f6e8ad93a6e76a105a3a8d4890babacfbb9ae3eead11ac638962d9da3d45421a327049e87c9d54b43c0dca1327f11a13bbd54440d3a7fefc6253
-   data.tar.gz: 3d8b81f04c8eb16a7b2ab9ddf27bdaf2b2bfdd2ee3a8b70765a88f809fc9869500debe950d8ec27e3a6af818e6f1e415d96d078e52784d638f1363619088faa3
+   metadata.gz: 4fee097fe2237f863510100155062da6815237260da5b15189f104f54596f7d5ff0479deb80596544e0bb1b9ba7b78126d2251798721e8d2f91e06b430950cd6
+   data.tar.gz: c30562965452ef296b5e5aaf2a9a12887aa42d8e8396780b73b34f99a2386d232bf020578618fcbd65186fc864518c81a3e7555cae9b00a005322f3599e18c5a
data/CHANGELOG.md CHANGED
@@ -1,6 +1,29 @@
 
  # SmarterCSV 1.x Change Log
 
+ ## 1.10.0 (2023-12-31) ⚡ BREAKING ⚡
+
+ * BREAKING CHANGES:
+
+   Changed behavior:
+   + when `user_provided_headers` are provided:
+     * if they are not unique, an exception will now be raised
+     * they are taken "as is", no header transformations can be applied
+     * when they are given as strings or as symbols, it is assumed that this is the desired format
+     * the value of the `strings_as_keys` option will be ignored
+
+   + option `duplicate_header_suffix` now defaults to `''` instead of `nil`.
+     * this allows automatic disambiguation when processing CSV files with duplicate headers, by appending a number
+     * explicitly set this option to `nil` to get the behavior from previous versions.
+
+ * performance and memory improvements
+ * code refactor
+
+ ## 1.9.3 (2023-12-16)
+ * raise SmarterCSV::IncorrectOption when `user_provided_headers` are empty
+ * code refactor / no functional changes
+ * added test cases
+
  ## 1.9.2 (2023-11-12)
  * fixed bug with '\\' at end of line (issue #252, thanks to averycrespi-moz)
  * fixed require statements (issue #249, thanks to PikachuEXE, courtsimas)
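The change to `duplicate_header_suffix` described in the 1.10.0 entry can be illustrated with a small standalone sketch. It does not call the gem: `resolve_headers` and the local `DuplicateHeaders` class are hypothetical stand-ins that only mirror the behavior stated in the changelog (an empty-string suffix appends a number, `nil` leaves duplicates in place so they are reported as an error).

```ruby
# Hypothetical stand-ins for illustration only; not the gem's API.
class DuplicateHeaders < StandardError; end

def resolve_headers(headers, suffix)
  if suffix.nil?
    # previous default: duplicates are not renamed, so they surface as an error
    counts = headers.each_with_object(Hash.new(0)) { |h, acc| acc[h] += 1 }
    dups = counts.select { |_, n| n > 1 }
    raise DuplicateHeaders, "Duplicate Headers in CSV: #{dups.inspect}" unless dups.empty?

    return headers
  end

  # new default (suffix ''): second and later occurrences get a running number
  seen = Hash.new(0)
  headers.map do |h|
    seen[h] += 1
    seen[h] > 1 ? "#{h}#{suffix}#{seen[h]}" : h
  end
end

p resolve_headers(%w[name email email], '')   # ["name", "email", "email2"]
p resolve_headers(%w[name email email], '_')  # ["name", "email", "email_2"]
```

Passing `nil` as the suffix in this sketch raises for the same input, which corresponds to the "behavior from previous versions" mentioned above.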
data/README.md CHANGED
@@ -2,15 +2,33 @@
  # SmarterCSV
 
  [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
-
+
+
+ #### LATEST CHANGES
+
+ * Version 1.10.0 has BREAKING CHANGES:
+
+   Changed behavior:
+   + when `user_provided_headers` are provided:
+     * if they are not unique, an exception will now be raised
+     * they are taken "as is", no header transformations can be applied
+     * when they are given as strings or as symbols, it is assumed that this is the desired format
+     * the value of the `strings_as_keys` option will be ignored
+
+   + option `duplicate_header_suffix` now defaults to `''` instead of `nil`.
+     * this allows automatic disambiguation when processing CSV files with duplicate headers, by appending a number
+     * explicitly set this option to `nil` to get the behavior from previous versions.
+
  #### Development Branches
 
  * default branch is `main` for 1.x development
- * 2.x development is on `2.0-development` (check this branch for 2.0 documentation)
+
+ * 2.x development is on `2.0-development` (check this branch for 2.0 documentation)
+   - This is an EXPERIMENTAL branch - DO NOT USE in production
 
- #### Work towards Future Version 2.0
+ #### Work towards Future Version 2.x
 
- * Work towards SmarterCSV 2.0 is still ongoing, with improved features, and more streamlined options, but consider it as experimental at this time.
+ * Work towards SmarterCSV 2.x is still ongoing, with improved features, and more streamlined options, but consider it as experimental at this time.
  Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/tree/2.0-develop), open any issues and pull requests with mention of tag v2.0.
 
  ---------------
@@ -84,6 +102,10 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
  00000040 73 2c 35 36 37 38 0d 0a |s,5678..|
  ```
 
+ ### Articles
+ * [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
+ * [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
+
  ### Examples
 
  Here are some examples to demonstrate the versatility of SmarterCSV.
@@ -243,8 +265,6 @@ NOTE: If you use `key_mappings` and `value_converters`, make sure that the value
  data[0][:price].class
  => Float
  ```
- ## Parallel Processing
- [Jack](https://github.com/xjlin0) wrote an interesting article about [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 
  ## Documentation
 
@@ -280,7 +300,8 @@ The options and the block are optional.
  | :headers_in_file | true | Whether or not the file contains headers as the first line. |
  | | | Important if the file does not contain headers, |
  | | | otherwise you would lose the first line of data. |
- | :duplicate_header_suffix | nil | If set, adds numbers to duplicated headers and separates them by the given suffix |
+ | :duplicate_header_suffix | '' | Adds numbers to duplicated headers and separates them by the given suffix. |
+ | | | Set this to nil to raise `DuplicateHeaders` error instead (previous behavior) |
  | :user_provided_headers | nil | *careful with that axe!* |
  | | | user provided Array of header strings or symbols, to define |
  | | | what headers should be used, overriding any in-file headers. |
@@ -300,7 +321,7 @@ And header and data validations will also be supported in 2.x
  | Option | Default | Explanation |
  ---------------------------------------------------------------------------------------------------------------------------------
  | :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
- | :silence_missing_key | false | ignore missing keys in `key_mapping` |
+ | :silence_missing_keys | false | ignore missing keys in `key_mapping` |
  | | | if set to true: makes all mapped keys optional |
  | | | if given an array, makes only the keys listed in it optional |
  | :required_keys | nil | An array. Specify the required names AFTER header transformation. |
@@ -0,0 +1,73 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   class << self
+     protected
+
+     # If file has headers, then guesses column separator from headers.
+     # Otherwise guesses column separator from contents.
+     # Raises exception if none is found.
+     def guess_column_separator(filehandle, options)
+       skip_lines(filehandle, options)
+
+       delimiters = [',', "\t", ';', ':', '|']
+
+       line = nil
+       has_header = options[:headers_in_file]
+       candidates = Hash.new(0)
+       count = has_header ? 1 : 5
+       count.times do
+         line = readline_with_counts(filehandle, options)
+         delimiters.each do |d|
+           candidates[d] += line.scan(d).count
+         end
+       rescue EOFError # short files
+         break
+       end
+       rewind(filehandle)
+
+       if candidates.values.max == 0
+         # if the header only contains word characters (a single column), assume ','
+         return ',' if line.chomp(options[:row_sep]) =~ /^\w+$/
+
+         raise SmarterCSV::NoColSepDetected
+       end
+
+       candidates.key(candidates.values.max)
+     end
+
+     # limitation: this currently reads the whole file in before making a decision
+     def guess_line_ending(filehandle, options)
+       counts = {"\n" => 0, "\r" => 0, "\r\n" => 0}
+       quoted_char = false
+
+       # count how many of the pre-defined line-endings we find
+       # ignoring those contained within quote characters
+       last_char = nil
+       lines = 0
+       filehandle.each_char do |c|
+         quoted_char = !quoted_char if c == options[:quote_char]
+         next if quoted_char
+
+         if last_char == "\r"
+           if c == "\n"
+             counts["\r\n"] += 1
+           else
+             counts["\r"] += 1 # \r are counted after they appeared
+           end
+         elsif c == "\n"
+           counts["\n"] += 1
+         end
+         last_char = c
+         lines += 1
+         break if options[:auto_row_sep_chars] && options[:auto_row_sep_chars] > 0 && lines >= options[:auto_row_sep_chars]
+       end
+       rewind(filehandle)
+
+       counts["\r"] += 1 if last_char == "\r"
+       # find the most frequent key/value pair:
+       most_frequent_key, _count = counts.max_by{|_, v| v}
+       most_frequent_key
+     end
+   end
+ end
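The counting heuristic in `guess_column_separator` can be tried in isolation. The following is a minimal sketch that works on an array of already-read lines instead of a file handle; `guess_separator` and `DELIMITERS` are hypothetical names, and returning `nil` (rather than raising) when nothing is found is a simplification made for this example.

```ruby
# Candidate delimiters, same list as in the added file.
DELIMITERS = [',', "\t", ';', ':', '|'].freeze

# Hypothetical helper: tallies each candidate over the sampled lines and
# returns the most frequent one, or nil when no candidate occurs at all.
def guess_separator(lines)
  counts = Hash.new(0)
  lines.each do |line|
    DELIMITERS.each { |d| counts[d] += line.scan(d).count }
  end
  best = counts.values.max
  best.nil? || best.zero? ? nil : counts.key(best)
end

p guess_separator(["id;name;created_at\n"])  # ";"
p guess_separator(["a,b\tc,d\n"])           # ","
p guess_separator(["single_column\n"])       # nil
```

Because the heuristic is a plain frequency count, a sampled line with more colons than commas (for example a row of timestamps) would be guessed as `:`-separated. When `headers_in_file` is true only the header line is sampled, which presumably keeps this less likely in practice.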
@@ -0,0 +1,50 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   class << self
+     protected
+
+     def readline_with_counts(filehandle, options)
+       line = filehandle.readline(options[:row_sep])
+       @file_line_count += 1
+       @csv_line_count += 1
+       line = remove_bom(line) if @csv_line_count == 1
+       line
+     end
+
+     def skip_lines(filehandle, options)
+       options[:skip_lines].to_i.times do
+         readline_with_counts(filehandle, options)
+       end
+     end
+
+     def rewind(filehandle)
+       @file_line_count = 0
+       @csv_line_count = 0
+       filehandle.rewind
+     end
+
+     private
+
+     UTF_32_BOM = %w[0 0 fe ff].freeze
+     UTF_32LE_BOM = %w[ff fe 0 0].freeze
+     UTF_8_BOM = %w[ef bb bf].freeze
+     UTF_16_BOM = %w[fe ff].freeze
+     UTF_16LE_BOM = %w[ff fe].freeze
+
+     def remove_bom(str)
+       str_as_hex = str.bytes.map{|x| x.to_s(16)}
+       # if string does not start with one of the bytes, there is no BOM
+       return str unless %w[ef fe ff 0].include?(str_as_hex[0])
+
+       return str.byteslice(4..-1) if [UTF_32_BOM, UTF_32LE_BOM].include?(str_as_hex[0..3])
+       return str.byteslice(3..-1) if str_as_hex[0..2] == UTF_8_BOM
+       return str.byteslice(2..-1) if [UTF_16_BOM, UTF_16LE_BOM].include?(str_as_hex[0..1])
+
+       # :nocov:
+       puts "SmarterCSV found unhandled BOM! #{str.chars[0..7].inspect}"
+       str
+       # :nocov:
+     end
+   end
+ end
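To make the byte-level BOM handling in `remove_bom` concrete, here is a reduced sketch that only covers the UTF-8 case. `strip_utf8_bom` and `UTF8_BOM_BYTES` are hypothetical names, and the sketch compares integer byte values rather than the hex strings used in the diff (where a zero byte becomes `"0"` via `to_s(16)`).

```ruby
# The three bytes of a UTF-8 byte order mark.
UTF8_BOM_BYTES = [0xEF, 0xBB, 0xBF].freeze

# Hypothetical helper: drops a leading UTF-8 BOM, leaves other strings untouched.
def strip_utf8_bom(str)
  return str unless str.bytes.first(3) == UTF8_BOM_BYTES

  str.byteslice(3..-1)
end

with_bom = "\xEF\xBB\xBFname,age\n"
p strip_utf8_bom(with_bom)       # "name,age\n"
p strip_utf8_bom("name,age\n")   # "name,age\n"
```

As in the diff, the check is only meaningful on the first line read from the file, which is why `readline_with_counts` applies it when `@csv_line_count == 1`.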
@@ -0,0 +1,91 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   class << self
+     def hash_transformations(hash, options)
+       # there may be unmapped keys, or keys purposely mapped to nil or an empty key..
+       # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
+       remove_empty_values = options[:remove_empty_values] == true
+       remove_zero_values = options[:remove_zero_values]
+       remove_values_matching = options[:remove_values_matching]
+       convert_to_numeric = options[:convert_values_to_numeric]
+       value_converters = options[:value_converters]
+
+       hash.each_with_object({}) do |(k, v), new_hash|
+         next if k.nil? || k == '' || k == :""
+         next if remove_empty_values && (has_rails ? v.blank? : blank?(v))
+         next if remove_zero_values && v.is_a?(String) && v =~ /^(0+|0+\.0+)$/ # values are Strings
+         next if remove_values_matching && v =~ remove_values_matching
+
+         # deal with the :only / :except options to :convert_values_to_numeric
+         if convert_to_numeric && !limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
+           if v =~ /^[+-]?\d+\.\d+$/
+             v = v.to_f
+           elsif v =~ /^[+-]?\d+$/
+             v = v.to_i
+           end
+         end
+
+         converter = value_converters[k] if value_converters
+         v = converter.convert(v) if converter
+
+         new_hash[k] = v
+       end
+     end
+
+     # def hash_transformations(hash, options)
+     #   # there may be unmapped keys, or keys purposely mapped to nil or an empty key..
+     #   # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
+     #   hash.delete(nil)
+     #   hash.delete('')
+     #   hash.delete(:"")
+
+     #   if options[:remove_empty_values] == true
+     #     hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
+     #   end
+
+     #   hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
+     #   hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
+
+     #   if options[:convert_values_to_numeric]
+     #     hash.each do |k, v|
+     #       # deal with the :only / :except options to :convert_values_to_numeric
+     #       next if limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
+
+     #       # convert if it's a numeric value:
+     #       case v
+     #       when /^[+-]?\d+\.\d+$/
+     #         hash[k] = v.to_f
+     #       when /^[+-]?\d+$/
+     #         hash[k] = v.to_i
+     #       end
+     #     end
+     #   end
+
+     #   if options[:value_converters]
+     #     hash.each do |k, v|
+     #       converter = options[:value_converters][k]
+     #       next unless converter
+
+     #       hash[k] = converter.convert(v)
+     #     end
+     #   end
+
+     #   hash
+     # end
+
+     protected
+
+     # acts as a road-block to limit processing when iterating over all k/v pairs of a CSV-hash:
+     def limit_execution_for_only_or_except(options, option_name, key)
+       if options[option_name].is_a?(Hash)
+         if options[option_name].has_key?(:except)
+           return true if Array(options[option_name][:except]).include?(key)
+         elsif options[option_name].has_key?(:only)
+           return true unless Array(options[option_name][:only]).include?(key)
+         end
+       end
+       false
+     end
+   end
+ end
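The interaction between numeric conversion and the `:only` / `:except` limits is easier to see on a small hash. This sketch isolates just that part of `hash_transformations`; `skip_key?` and `convert_numeric` are hypothetical names, and the patterns are anchored with `\A`/`\z` here instead of the `^`/`$` used in the diff, which is a deliberate deviation for single-line values.

```ruby
# Hypothetical helper mirroring the :only / :except road-block.
def skip_key?(option, key)
  return false unless option.is_a?(Hash)

  if option.key?(:except)
    Array(option[:except]).include?(key)
  elsif option.key?(:only)
    !Array(option[:only]).include?(key)
  else
    false
  end
end

# Hypothetical helper: builds a new hash, converting numeric-looking strings.
def convert_numeric(hash, option)
  hash.each_with_object({}) do |(k, v), out|
    if option && !skip_key?(option, k)
      case v
      when /\A[+-]?\d+\.\d+\z/ then v = v.to_f
      when /\A[+-]?\d+\z/ then v = v.to_i
      end
    end
    out[k] = v
  end
end

row = { id: '42', price: '9.50', zip: '00501' }
p convert_numeric(row, true)
p convert_numeric(row, { except: :zip })
p convert_numeric(row, { only: [:price] })
```

With the option set to plain `true`, the zip code loses its leading zeros because `'00501'.to_i` is `501`; excluding the key via `:except` keeps it as a string.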
@@ -0,0 +1,63 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   class << self
+     # transform the headers that were in the file:
+     def header_transformations(header_array, options)
+       header_array.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
+       header_array.map!{|x| x.strip} if options[:strip_whitespace]
+
+       unless options[:keep_original_headers]
+         header_array.map!{|x| x.gsub(/\s+|-+/, '_')}
+         header_array.map!{|x| x.downcase} if options[:downcase_header]
+       end
+
+       # detect duplicate headers and disambiguate
+       header_array = disambiguate_headers(header_array, options) if options[:duplicate_header_suffix]
+       # symbolize headers
+       header_array = header_array.map{|x| x.to_sym } unless options[:strings_as_keys] || options[:keep_original_headers]
+       # doesn't make sense to re-map when we have user_provided_headers
+       header_array = remap_headers(header_array, options) if options[:key_mapping]
+
+       header_array
+     end
+
+     def disambiguate_headers(headers, options)
+       counts = Hash.new(0)
+       headers.map do |header|
+         counts[header] += 1
+         counts[header] > 1 ? "#{header}#{options[:duplicate_header_suffix]}#{counts[header]}" : header
+       end
+     end
+
+     # do some key mapping on the keys in the file header
+     # if you want to completely delete a key, then map it to nil or to ''
+     def remap_headers(headers, options)
+       key_mapping = options[:key_mapping]
+       if key_mapping.empty? || !key_mapping.is_a?(Hash) || key_mapping.keys.empty?
+         raise(SmarterCSV::IncorrectOption, "ERROR: incorrect format for key_mapping! Expecting hash with from -> to mappings")
+       end
+
+       key_mapping = options[:key_mapping]
+       # if silence_missing_keys are not set, raise error if missing header
+       missing_keys = key_mapping.keys - headers
+       # if the user passes a list of specific mapped keys that are optional
+       missing_keys -= options[:silence_missing_keys] if options[:silence_missing_keys].is_a?(Array)
+
+       unless missing_keys.empty? || options[:silence_missing_keys] == true
+         raise SmarterCSV::KeyMappingError, "ERROR: can not map headers: #{missing_keys.join(', ')}"
+       end
+
+       headers.map! do |header|
+         if key_mapping.has_key?(header)
+           key_mapping[header].nil? ? nil : key_mapping[header]
+         elsif options[:remove_unmapped_keys]
+           nil
+         else
+           header
+         end
+       end
+       headers
+     end
+   end
+ end
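The key-mapping step in `remap_headers` combines three options (`key_mapping`, `silence_missing_keys`, `remove_unmapped_keys`). The sketch below restates that logic as a standalone function so the combinations can be tried without the gem; `remap` and the local `KeyMappingError` class are hypothetical, and unlike the diff the sketch returns a new array instead of mutating its input.

```ruby
# Hypothetical stand-in for SmarterCSV::KeyMappingError.
class KeyMappingError < StandardError; end

# Hypothetical helper mirroring the mapping rules from the diff.
def remap(headers, key_mapping, silence_missing_keys: false, remove_unmapped_keys: false)
  missing = key_mapping.keys - headers
  # an Array makes only the listed mapped keys optional
  missing -= silence_missing_keys if silence_missing_keys.is_a?(Array)
  unless missing.empty? || silence_missing_keys == true
    raise KeyMappingError, "can not map headers: #{missing.join(', ')}"
  end

  headers.map do |h|
    if key_mapping.key?(h)
      key_mapping[h] # nil here marks the column for removal
    elsif remove_unmapped_keys
      nil
    else
      h
    end
  end
end

headers = %i[first_name email]
p remap(headers, { first_name: :given_name })
p remap(headers, { first_name: :given_name, phone: :tel }, silence_missing_keys: [:phone])
p remap(headers, { first_name: :given_name }, remove_unmapped_keys: true)
```

Headers mapped to `nil` stay in the array as placeholders; in the gem they are dropped later, when `hash_transformations` skips entries whose key is `nil` or empty.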
@@ -0,0 +1,34 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   class << self
+     def header_validations(headers, options)
+       check_duplicate_headers(headers, options)
+       check_required_headers(headers, options)
+     end
+
+     def check_duplicate_headers(headers, _options)
+       header_counts = Hash.new(0)
+       headers.each { |header| header_counts[header] += 1 unless header.nil? }
+
+       duplicates = header_counts.select { |_, count| count > 1 }
+
+       unless duplicates.empty?
+         raise(SmarterCSV::DuplicateHeaders, "Duplicate Headers in CSV: #{duplicates.inspect}")
+       end
+     end
+
+     require 'set'
+
+     def check_required_headers(headers, options)
+       if options[:required_keys] && options[:required_keys].is_a?(Array)
+         headers_set = headers.to_set
+         missing_keys = options[:required_keys].select { |k| !headers_set.include?(k) }
+
+         unless missing_keys.empty?
+           raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}"
+         end
+       end
+     end
+   end
+ end
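Since `check_required_headers` compares keys by equality against the already transformed headers, the type of each required key matters. A minimal sketch, with `missing_required` as a hypothetical helper name:

```ruby
require 'set'

# Hypothetical helper: returns the required keys that are absent from the headers.
def missing_required(headers, required_keys)
  header_set = headers.to_set
  required_keys.reject { |k| header_set.include?(k) }
end

headers = %i[name email] # headers after symbolization
p missing_required(headers, %i[email])        # []
p missing_required(headers, ['email'])        # ["email"]
p missing_required(headers, %i[email phone])  # [:phone]
```

The second call shows why the README asks for required names to be given "AFTER header transformation": a string does not match a symbolized header.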
@@ -0,0 +1,68 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   class << self
+     def process_headers(filehandle, options)
+       @raw_header = nil # header as it appears in the file
+       @headers = nil # the processed headers
+       header_array = []
+       file_header_size = nil
+
+       # if headers_in_file, get the headers -> We get the number of columns, even when user provided headers
+       if options[:headers_in_file] # extract the header line
+         # process the header line in the CSV file..
+         # the first line of a CSV file contains the header .. it might be commented out, so we need to read it anyhow
+         header_line = @raw_header = readline_with_counts(filehandle, options)
+         header_line = preprocess_header_line(header_line, options)
+
+         file_header_array, file_header_size = parse(header_line, options)
+
+         file_header_array = header_transformations(file_header_array, options)
+
+       else
+         unless options[:user_provided_headers]
+           raise SmarterCSV::IncorrectOption, "ERROR: If :headers_in_file is set to false, you have to provide :user_provided_headers"
+         end
+       end
+
+       if options[:user_provided_headers]
+         unless options[:user_provided_headers].is_a?(Array) && !options[:user_provided_headers].empty?
+           raise(SmarterCSV::IncorrectOption, "ERROR: incorrect format for user_provided_headers! Expecting array with headers.")
+         end
+
+         # use user-provided headers
+         user_header_array = options[:user_provided_headers]
+         # user_provided_headers: their count should match the headers_in_file if any
+         if defined?(file_header_size) && !file_header_size.nil?
+           if user_header_array.size != file_header_size
+             raise SmarterCSV::HeaderSizeMismatch, "ERROR: :user_provided_headers defines #{user_header_array.size} headers != CSV-file has #{file_header_size} headers"
+           else
+             # we could print out the mapping of file_header_array to header_array here
+           end
+         end
+
+         header_array = user_header_array
+       else
+         header_array = file_header_array
+       end
+
+       [header_array, header_array.size]
+     end
+
+     private
+
+     def preprocess_header_line(header_line, options)
+       header_line = enforce_utf8_encoding(header_line, options)
+       header_line = remove_comments_from_header(header_line, options)
+       header_line = header_line.chomp(options[:row_sep])
+       header_line.gsub!(options[:strip_chars_from_headers], '') if options[:strip_chars_from_headers]
+       header_line
+     end
+
+     def remove_comments_from_header(header, options)
+       return header unless options[:comment_regexp]
+
+       header.sub(options[:comment_regexp], '')
+     end
+   end
+ end
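The precedence between in-file headers and `user_provided_headers` in `process_headers` reduces to a small decision, sketched here with hypothetical names (`choose_headers`, a local `HeaderSizeMismatch` class). The sketch assumes the file headers have already been parsed; `nil` stands for "no headers in file".

```ruby
# Hypothetical stand-in for SmarterCSV::HeaderSizeMismatch.
class HeaderSizeMismatch < StandardError; end

# Hypothetical helper: user-provided headers win, but must match the column count
# of the file header when one was read.
def choose_headers(file_headers, user_headers)
  return file_headers if user_headers.nil?

  if file_headers && user_headers.size != file_headers.size
    raise HeaderSizeMismatch,
          "user_provided_headers defines #{user_headers.size} headers != CSV-file has #{file_headers.size} headers"
  end
  user_headers # taken "as is", no transformations applied
end

p choose_headers(%i[a b], nil)        # [:a, :b]
p choose_headers(%i[a b], %w[X Y])    # ["X", "Y"]
p choose_headers(nil, %w[X Y])        # ["X", "Y"]
```

Returning the user-provided array untouched matches the changelog statement that such headers are taken "as is", whether given as strings or as symbols.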
@@ -9,7 +9,7 @@ module SmarterCSV
      comment_regexp: nil, # was: /\A#/,
      convert_values_to_numeric: true,
      downcase_header: true,
-     duplicate_header_suffix: nil,
+     duplicate_header_suffix: '', # was: nil,
      file_encoding: 'utf-8',
      force_simple_split: false,
      force_utf8: false,
@@ -62,6 +62,15 @@ module SmarterCSV
      private
 
      def validate_options!(options)
+       # deprecate required_headers
+       unless options[:required_headers].nil?
+         puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
+         if options[:required_keys].nil?
+           options[:required_keys] = options[:required_headers]
+           options[:required_headers] = nil
+         end
+       end
+
        keys = options.keys
        errors = []
        errors << "invalid row_sep" if keys.include?(:row_sep) && !option_valid?(options[:row_sep])
@@ -0,0 +1,90 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+   class << self
+     protected
+
+     ###
+     ### Thin wrapper around C-extension
+     ###
+     def parse(line, options, header_size = nil)
+       # puts "SmarterCSV.parse OPTIONS: #{options[:acceleration]}" if options[:verbose]
+
+       if options[:acceleration] && has_acceleration?
+         # :nocov:
+         has_quotes = line =~ /#{options[:quote_char]}/
+         elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
+         elements.map!{|x| cleanup_quotes(x, options[:quote_char])} if has_quotes
+         [elements, elements.size]
+         # :nocov:
+       else
+         # puts "WARNING: SmarterCSV is using un-accelerated parsing of lines. Check options[:acceleration]"
+         parse_csv_line_ruby(line, options, header_size)
+       end
+     end
+
+     # ------------------------------------------------------------------
+     # Ruby equivalent of the C-extension for parse_line
+     #
+     # parses a single line: either a CSV header or a body line
+     # - quoting rules compared to RFC-4180 are somewhat relaxed
+     # - we are not assuming that quotes inside a field need to be doubled
+     # - we are not assuming that all fields need to be quoted (0 is even)
+     # - works with multi-char col_sep
+     # - if header_size is given, only up to header_size fields are parsed
+     #
+     # We use header_size for parsing the body lines to make sure we always match the number of headers
+     # in case there are trailing col_sep characters in line
+     #
+     # Our convention is that empty fields are returned as empty strings, not as nil.
+     #
+     #
+     # the purpose of the header_size parameter is to handle a corner case where
+     # CSV lines contain more fields than the header.
+     # In which case the remaining fields in the line are ignored
+     #
+     def parse_csv_line_ruby(line, options, header_size = nil)
+       return [] if line.nil?
+
+       line_size = line.size
+       col_sep = options[:col_sep]
+       col_sep_size = col_sep.size
+       quote = options[:quote_char]
+       quote_count = 0
+       elements = []
+       start = 0
+       i = 0
+
+       previous_char = ''
+       while i < line_size
+         if line[i...i+col_sep_size] == col_sep && quote_count.even?
+           break if !header_size.nil? && elements.size >= header_size
+
+           elements << cleanup_quotes(line[start...i], quote)
+           previous_char = line[i]
+           i += col_sep.size
+           start = i
+         else
+           quote_count += 1 if line[i] == quote && previous_char != '\\'
+           previous_char = line[i]
+           i += 1
+         end
+       end
+       elements << cleanup_quotes(line[start..-1], quote) if header_size.nil? || elements.size < header_size
+       [elements, elements.size]
+     end
+
+     def cleanup_quotes(field, quote)
+       return field if field.nil?
+
+       # return if field !~ /#{quote}/ # this check can probably be eliminated
+
+       if field.start_with?(quote) && field.end_with?(quote)
+         field.delete_prefix!(quote)
+         field.delete_suffix!(quote)
+       end
+       field.gsub!("#{quote}#{quote}", quote)
+       field
+     end
+   end
+ end
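The quote-parity idea behind `parse_csv_line_ruby` (a separator only splits when an even number of quote characters has been seen so far) can be exercised with a reduced sketch. `split_line` and `unquote` are hypothetical names; this version returns only the fields, does not mutate its input, and omits the backslash-escape check from the diff.

```ruby
# Hypothetical helper: strips one pair of enclosing quotes and un-doubles inner quotes.
def unquote(field, quote)
  if field.size >= 2 * quote.size && field.start_with?(quote) && field.end_with?(quote)
    field = field[quote.size...-quote.size]
  end
  field.gsub(quote * 2, quote)
end

# Hypothetical helper: splits on col_sep only while the quote count is even.
def split_line(line, col_sep: ',', quote: '"', max_fields: nil)
  fields = []
  start = 0
  quotes = 0
  i = 0
  while i < line.size
    if quotes.even? && line[i, col_sep.size] == col_sep
      break if max_fields && fields.size >= max_fields

      fields << unquote(line[start...i], quote)
      i += col_sep.size
      start = i
    else
      quotes += 1 if line[i] == quote
      i += 1
    end
  end
  fields << unquote(line[start..-1], quote) if max_fields.nil? || fields.size < max_fields
  fields
end

p split_line('a,"b,c",d')              # ["a", "b,c", "d"]
p split_line('x,"say ""hi"""')         # ["x", "say \"hi\""]
p split_line('1,2,,,', max_fields: 2)  # ["1", "2"]
p split_line('a||b', col_sep: '||')    # ["a", "b"]
```

The `max_fields` argument plays the role of `header_size` in the diff: trailing separators or surplus fields beyond the header count are ignored, and empty fields come back as empty strings rather than `nil`.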