smarter_csv 1.11.0.pre2 → 1.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9027a37c4b29e68fcbc559a6a8285e5076684883612e98b13f116526dadc6e4b
- data.tar.gz: 43cfa0254ac2caa8ca02a8863fc790f1c56beb0b16e175a16fd947f92eda8c08
+ metadata.gz: 4a650fbccc5bf703199c6771da75ed95c87c9e3763e176b688a2daaa1b0c4669
+ data.tar.gz: 89cfff5bb92ee8faaeea7a644b61e63a7e47c61f30faefc515b602808eec0989
  SHA512:
- metadata.gz: d5b2eac35e33bdeb9ec632578207c47f34a08a4595c1ebf04929a7fe302efc3fc08565c0d6fc6454fd538b99cc1c4a5662599a88693e3aa1b80c5c0c7fc1b05e
- data.tar.gz: 36f622f12d5412ef8919c30a267def315985aef4b3f33c209341a046442795bb66df6c8ebb97399f35e24fd565df658a0e4b7a94d3563831d7ce32facbeab33f
+ metadata.gz: 70ea1888be23a467d15001086935117252a30860ee5a418e8b61b61bfb19a03377391ffb28589db412b0aac4293ac029a047b268d0fa6836ec60f2319cabf102
+ data.tar.gz: 49d7d1d1c4e258611168056a0b8b7433dfedcf37eeac25c22c0aade552a878361d019b86156cbadcacc5996a43cae11d9d55990a0ef9c4a599cf447019cf8d7f
data/.rspec ADDED
@@ -0,0 +1 @@
+ --require spec_helper
data/.rubocop.yml CHANGED
@@ -88,18 +88,12 @@ Style/IfInsideElse:
88
88
  Style/IfUnlessModifier:
89
89
  Enabled: false
90
90
 
91
- Style/InverseMethods:
92
- Enabled: false
93
-
94
91
  Style/NestedTernaryOperator:
95
92
  Enabled: false
96
93
 
97
94
  Style/PreferredHashMethods:
98
95
  Enabled: false
99
96
 
100
- Style/Proc:
101
- Enabled: false
102
-
103
97
  Style/NumericPredicate:
104
98
  Enabled: false
105
99
 
@@ -118,6 +112,9 @@ Style/SlicingWithRange:
118
112
  Style/SpecialGlobalVars: # DANGER: unsafe rule!!
119
113
  Enabled: false
120
114
 
115
+ Style/StringConcatenation:
116
+ Enabled: false
117
+
121
118
  Style/StringLiterals:
122
119
  Enabled: false
123
120
  EnforcedStyle: double_quotes
@@ -135,9 +132,6 @@ Style/SymbolProc: # old Ruby versions can't do this
135
132
  Style/TrailingCommaInHashLiteral:
136
133
  Enabled: false
137
134
 
138
- Style/TrailingCommaInArrayLiteral:
139
- Enabled: false
140
-
141
135
  Style/TrailingUnderscoreVariable:
142
136
  Enabled: false
143
137
 
@@ -147,9 +141,6 @@ Style/TrivialAccessors:
147
141
  # Style/UnlessModifier:
148
142
  # Enabled: false
149
143
 
150
- Style/WordArray:
151
- Enabled: false
152
-
153
144
  Style/ZeroLengthPredicate:
154
145
  Enabled: false
155
146
 
data/CHANGELOG.md CHANGED
@@ -1,42 +1,23 @@

  # SmarterCSV 1.x Change Log

- ## T.B.D.
-
- * code refactor
+ ## 1.11.2 (2024-07-05)
+ * fixing missing errors definition
+
+ ## 1.11.1 (2024-07-05) (YANKED)
+ * improved behavior of Writer class
+ * added SmarterCSV.generate shortcut for CSV writing
+
+ ## 1.11.0 (2024-07-02)
+ * added SmarterCSV::Writer to output CSV files ([issue #44](https://github.com/tilo/smarter_csv/issues/44))

- * NEW BEHAVIOR:
- - hidden `:v2_mode` options (incomplete!)
- - pre-processing for v2 options
- - implemented v2 `:header_transformations` (DO NOT USE YET!)
- + -> check if all v1 transformations are correctly done
- How are we going to
- * disambiguate headers?
-
-
- * do key_mapping? -> seems to work
- - remove_unmapped_keys ?
- - silence missing keys ... a missing mapped key should raise an exception, except when silenced
- - required_keys needs to be a header-validation
+ ## 1.10.3 (2024-03-10)
+ * fixed issue when frozen options are handed in (thanks to Daniel Pepper)
+ * cleaned-up rspec tests (thanks to Daniel Pepper)
+ * fixed link in README (issue #251)

-
- * keep original headers? -> :none
- * do strings_as_* ? -> either :keys_as_symbols, :keys_as_strings
- * remove quote_chars? -> included in keys_as_*
- * strip whitespace? -> included in keys_as_*
-
- TODO:
-
- - add tests for header_validations
-
- - modify options to handle v1 and v2 options
- - add v1 defaults in v2 processing
- - add tests for all options processing
- - 100% backwards compatibility when working in v1 mode
-
-
- ## 1.10.1 (2024-01-07)
- * fix incorrect warning about UTF-8 (issue #268, thanks hirowatari)
+ ## 1.10.2 (2024-02-11)
+ * improve error message for missing keys

  ## 1.10.1 (2024-01-07)
  * fix incorrect warning about UTF-8 (issue #268, thanks hirowatari)
data/CONTRIBUTORS.md CHANGED
@@ -51,4 +51,5 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
  * [Rahul Chaudhary](https://github.com/rahulch95)
  * [Alessandro Fazzi](https://github.com/pioneerskies)
  * [JP Camara](https://github.com/jpcamara)
- * [Hiro Watari](https://github.com/hirowatari)
+ * [Kenton Hirowatari](https://github.com/hirowatari)
+ * [Daniel Pepper](https://github.com/dpep)
data/README.md CHANGED
@@ -3,10 +3,17 @@
3
3
 
4
4
  [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)
5
5
 
6
+ SmarterCSV provides a complete interface to CSV files and data. It offers tools to enable you to read and write to and from Strings or IO objects, as needed.
6
7
 
7
- #### LATEST CHANGES
8
+ SmarterCSV focuses on representing the data for each row as a hash.
8
9
 
9
- * Version 1.10.0 has BREAKING CHANGES:
10
+ When reading CSV files, using an array-of-hashes format makes it much easier to further process the data or to create database records from it.
11
+
12
+ When writing CSV data to file, it similarly takes arrays of hashes, and converts them to a CSV file.
13
+
14
+ #### BREAKING CHANGES
15
+
16
+ * Version 1.10.0 had BREAKING CHANGES:
10
17
 
11
18
  Changed behavior:
12
19
  + when `user_provided_headers` are provided:
@@ -23,13 +30,8 @@
23
30
 
24
31
  * default branch is `main` for 1.x development
25
32
 
26
- * 2.x development is on `2.0-development` (check this branch for 2.0 documentation)
27
- - This is an EXPERIMENTAL branch - DO NOT USE in production
28
-
29
- #### Work towards Future Version 2.x
30
-
31
- * Work towards SmarterCSV 2.x is still ongoing, with improved features, and more streamlined options, but consider it as experimental at this time.
32
- Please check the [2.0-develop branch](https://github.com/tilo/smarter_csv/tree/2.0-develop), open any issues and pull requests with mention of tag v2.0.
33
+ * 2.x development is [MOVED TO THIS PR](https://github.com/tilo/smarter_csv/pull/267)
34
+ - 2.x behavior is still EXPERIMENTAL - DO NOT USE in production
33
35
 
34
36
  ---------------
35
37
 
@@ -394,10 +396,9 @@ And header and data validations will also be supported in 2.x
394
396
  * some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/` . This will also significantly speed up the import.
395
397
  If you would force a different :quote_char instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.
396
398
 
397
- ## See also:
398
-
399
- http://www.unixgods.org/~tilo/Ruby/process_csv_as_hashes.html
399
+ ## The original post that started SmarterCSV:
400
400
 
401
+ http://www.unixgods.org/Ruby/process_csv_as_hashes.html
401
402
 
402
403
 
403
404
  ## Installation
@@ -0,0 +1,16 @@
+ # frozen_string_literal: true
+
+ module SmarterCSV
+ class Error < StandardError; end # new code should rescue this instead
+ # Reader:
+ class SmarterCSVException < Error; end # for backwards compatibility
+ class HeaderSizeMismatch < SmarterCSVException; end
+ class IncorrectOption < SmarterCSVException; end
+ class ValidationError < SmarterCSVException; end
+ class DuplicateHeaders < SmarterCSVException; end
+ class MissingKeys < SmarterCSVException; end # previously known as MissingHeaders
+ class NoColSepDetected < SmarterCSVException; end
+ class KeyMappingError < SmarterCSVException; end
+ # Writer:
+ class InvalidInputData < SmarterCSVException; end
+ end
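The hierarchy added in this hunk lets new code rescue the base class `SmarterCSV::Error`, while older code that rescues `SmarterCSVException` keeps working. A minimal sketch of that behavior, assuming a stand-in module `DemoCSV` and a helper `rescued_by` (both hypothetical, used only so the example runs without the gem installed):

```ruby
# Stand-in for part of the hierarchy in the hunk above; DemoCSV is a
# hypothetical name, not part of the gem.
module DemoCSV
  class Error < StandardError; end                  # new base class
  class SmarterCSVException < Error; end            # kept for backwards compatibility
  class MissingKeys < SmarterCSVException; end      # a reader error
  class InvalidInputData < SmarterCSVException; end # a writer error
end

# Hypothetical helper: raises `raised_class` and reports which rescue
# clause caught it.
def rescued_by(base_class, raised_class)
  raise raised_class, "demo"
rescue base_class => e
  "#{e.class} rescued as #{base_class}"
end

puts rescued_by(DemoCSV::Error, DemoCSV::MissingKeys)
puts rescued_by(DemoCSV::SmarterCSVException, DemoCSV::InvalidInputData)
```

Because `SmarterCSVException` now inherits from `Error`, both the old and the new rescue style catch reader and writer errors alike.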
@@ -2,16 +2,7 @@
2
2
 
3
3
  module SmarterCSV
4
4
  class << self
5
- # this is processing the headers from the input file
6
5
  def hash_transformations(hash, options)
7
- if options[:v2_mode]
8
- hash_transformations_v2(hash, options)
9
- else
10
- hash_transformations_v1(hash, options)
11
- end
12
- end
13
-
14
- def hash_transformations_v1(hash, options)
15
6
  # there may be unmapped keys, or keys purposedly mapped to nil or an empty key..
16
7
  # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
17
8
  remove_empty_values = options[:remove_empty_values] == true
@@ -42,117 +33,46 @@ module SmarterCSV
42
33
  end
43
34
  end
44
35
 
45
- def hash_transformations_v2(hash, options)
46
- return hash if options[:hash_transformations].nil? || options[:hash_transformations].empty?
47
-
48
- # do the header transformations the user requested:
49
- if options[:hash_transformations]
50
- options[:hash_transformations].each do |transformation|
51
- if transformation.respond_to?(:call) # this is used when a user-provided Proc is passed in
52
- hash = transformation.call(hash, options)
53
- else
54
- case transformation
55
- when Symbol # this is used for pre-defined transformations that are defined in the SmarterCSV module
56
- hash = public_send(transformation, hash, options)
57
- when Hash # this is called for hash arguments, e.g. hash_transformations
58
- trans, args = transformation.first # .first treats the hash first element as an array
59
- hash = apply_transformation(trans, hash, args, options)
60
- when Array # this can be used for passing additional arguments in array form (e.g. into a Proc)
61
- trans, *args = transformation
62
- hash = apply_transformation(trans, hash, args, options)
63
- else
64
- raise SmarterCSV::IncorrectOption, "Invalid transformation type: #{transformation.class}"
65
- end
66
- end
67
- end
68
- end
69
-
70
- hash
71
- end
72
-
73
- #
74
- # To handle v1-backward-compatible behavior, it is faster to roll all behavior into one method
75
- #
76
- def v1_backwards_compatibility(hash, options)
77
- hash.each_with_object({}) do |(k, v), new_hash|
78
- next if k.nil? || k == '' || k == :"" # remove_empty_keys
79
- next if has_rails ? v.blank? : blank?(v) # remove_empty_values
80
-
81
- # convert_values_to_numeric:
82
- # deal with the :only / :except options to :convert_values_to_numeric
83
- unless limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
84
- if v =~ /^[+-]?\d+\.\d+$/
85
- v = v.to_f
86
- elsif v =~ /^[+-]?\d+$/
87
- v = v.to_i
88
- end
89
- end
90
-
91
- new_hash[k] = v
92
- end
93
- end
94
-
95
- #
96
- # Building Blocks in case you want to build your own flow:
97
- #
98
-
99
- def value_converters(hash, _options)
100
- #
101
- # TO BE IMPLEMENTED
102
- #
103
- end
104
-
105
- def strip_spaces(hash, _options)
106
- hash.each_key {|key| hash[key].strip! unless hash[key].nil? } # &. syntax was introduced in Ruby 2.3 - need to stay backwards compatible
107
- end
108
-
109
- def remove_blank_values(hash, _options)
110
- hash.each_key {|key| hash.delete(key) if hash[key].nil? || hash[key].is_a?(String) && hash[key] !~ /[^[:space:]]/ }
111
- end
112
-
113
- def remove_zero_values(hash, _options)
114
- hash.each_key {|key| hash.delete(key) if hash[key].is_a?(Numeric) && hash[key].zero? }
115
- end
116
-
117
- def remove_empty_keys(hash, _options)
118
- hash.reject!{|key, _v| key.nil? || key.empty?}
119
- end
120
-
121
- def convert_values_to_numeric(hash, _options)
122
- hash.each_key do |k|
123
- case hash[k]
124
- when /^[+-]?\d+\.\d+$/
125
- hash[k] = hash[k].to_f
126
- when /^[+-]?\d+$/
127
- hash[k] = hash[k].to_i
128
- end
129
- end
130
- end
131
-
132
- def convert_values_to_numeric_unless_leading_zeroes(hash, _options)
133
- hash.each_key do |k|
134
- case hash[k]
135
- when /^[+-]?[1-9]\d*\.\d+$/
136
- hash[k] = hash[k].to_f
137
- when /^[+-]?[1-9]\d*$/
138
- hash[k] = hash[k].to_i
139
- end
140
- end
141
- end
142
-
143
- # IMPORTANT NOTE:
144
- # this can lead to cases where a nil or empty value gets converted into 0 or 0.0,
145
- # and can then not be properly removed!
146
- #
147
- # you should first try to use convert_values_to_numeric or convert_values_to_numeric_unless_leading_zeroes
148
- #
149
- def convert_to_integer(hash, _options)
150
- hash.each_key {|key| hash[key] = hash[key].to_i }
151
- end
152
-
153
- def convert_to_float(hash, _options)
154
- hash.each_key {|key| hash[key] = hash[key].to_f }
155
- end
36
+ # def hash_transformations(hash, options)
37
+ # # there may be unmapped keys, or keys purposedly mapped to nil or an empty key..
38
+ # # make sure we delete any key/value pairs from the hash, which the user wanted to delete:
39
+ # hash.delete(nil)
40
+ # hash.delete('')
41
+ # hash.delete(:"")
42
+
43
+ # if options[:remove_empty_values] == true
44
+ # hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
45
+ # end
46
+
47
+ # hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
48
+ # hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]
49
+
50
+ # if options[:convert_values_to_numeric]
51
+ # hash.each do |k, v|
52
+ # # deal with the :only / :except options to :convert_values_to_numeric
53
+ # next if limit_execution_for_only_or_except(options, :convert_values_to_numeric, k)
54
+
55
+ # # convert if it's a numeric value:
56
+ # case v
57
+ # when /^[+-]?\d+\.\d+$/
58
+ # hash[k] = v.to_f
59
+ # when /^[+-]?\d+$/
60
+ # hash[k] = v.to_i
61
+ # end
62
+ # end
63
+ # end
64
+
65
+ # if options[:value_converters]
66
+ # hash.each do |k, v|
67
+ # converter = options[:value_converters][k]
68
+ # next unless converter
69
+
70
+ # hash[k] = converter.convert(v)
71
+ # end
72
+ # end
73
+
74
+ # hash
75
+ # end
156
76
 
157
77
  protected
158
78
 
@@ -2,18 +2,8 @@
2
2
 
3
3
  module SmarterCSV
4
4
  class << self
5
- # this is processing the headers from the input file
5
+ # transform the headers that were in the file:
6
6
  def header_transformations(header_array, options)
7
- if options[:v2_mode]
8
- header_transformations_v2(header_array, options)
9
- else
10
- header_transformations_v1(header_array, options)
11
- end
12
- end
13
-
14
- # ---- V1.x Version: transform the headers that were in the file: ------------------------------------------
15
- #
16
- def header_transformations_v1(header_array, options)
17
7
  header_array.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
18
8
  header_array.map!{|x| x.strip} if options[:strip_whitespace]
19
9
 
@@ -67,99 +57,7 @@ module SmarterCSV
67
57
  header
68
58
  end
69
59
  end
70
-
71
60
  headers
72
61
  end
73
-
74
- # ---- V2.x Version: transform the headers that were in the file: ------------------------------------------
75
- #
76
- def header_transformations_v2(header_array, options)
77
- return header_array if options[:header_transformations].nil? || options[:header_transformations].empty?
78
-
79
- # do the header transformations the user requested:
80
- if options[:header_transformations]
81
- options[:header_transformations].each do |transformation|
82
- if transformation.respond_to?(:call) # this is used when a user-provided Proc is passed in
83
- header_array = transformation.call(header_array, options)
84
- else
85
- case transformation
86
- when Symbol # this is used for pre-defined transformations that are defined in the SmarterCSV module
87
- header_array = public_send(transformation, header_array, options)
88
- when Hash # this is called for hash arguments, e.g. header_transformations
89
- trans, args = transformation.first # .first treats the hash first element as an array
90
- header_array = apply_transformation(trans, header_array, args, options)
91
- when Array # this can be used for passing additional arguments in array form (e.g. into a Proc)
92
- trans, *args = transformation
93
- header_array = apply_transformation(trans, header_array, args, options)
94
- else
95
- raise SmarterCSV::IncorrectOption, "Invalid transformation type: #{transformation.class}"
96
- end
97
- end
98
- end
99
- end
100
-
101
- header_array
102
- end
103
-
104
- def apply_transformation(transformation, header_array, args, options)
105
- if transformation.respond_to?(:call)
106
- # If transformation is a callable object (like a Proc)
107
- transformation.call(header_array, args, options)
108
- else
109
- # If transformation is a symbol (method name)
110
- public_send(transformation, header_array, args, options)
111
- end
112
- end
113
-
114
- # pre-defined v2 header transformations:
115
-
116
- # these are some pre-defined header transformations which can be used
117
- # all these take the headers array as the input
118
- #
119
- # the computed options can be accessed via @options
120
-
121
- def keys_as_symbols(headers, options)
122
- headers.map do |header|
123
- header.strip.downcase.gsub(%r{#{options[:quote_char]}}, '').gsub(/(\s|-)+/, '_').to_sym
124
- end
125
- end
126
-
127
- def keys_as_strings(headers, options)
128
- headers.map do |header|
129
- header.strip.gsub(%r{#{options[:quote_char]}}, '').downcase.gsub(/(\s|-)+/, '_')
130
- end
131
- end
132
-
133
- def downcase_headers(headers, _options)
134
- headers.map do |header|
135
- header.strip.downcase!
136
- end
137
- end
138
-
139
- def key_mapping(headers, mapping = {}, options)
140
- raise(SmarterCSV::IncorrectOption, "ERROR: incorrect format for key_mapping! Expecting hash with from -> to mappings") if mapping.empty? || !mapping.is_a?(Hash)
141
-
142
- headers_set = headers.to_set
143
- mapping_keys_set = mapping.keys.to_set
144
- silence_keys_set = (options[:silence_missing_keys] || []).to_set
145
-
146
- # Check for missing keys
147
- missing_keys = mapping_keys_set - headers_set - silence_keys_set
148
- raise SmarterCSV::KeyMappingError, "ERROR: cannot map headers: #{missing_keys.to_a.join(', ')}" if missing_keys.any? && !options[:silence_missing_keys]
149
-
150
- # Apply key mapping, retaining nils for explicitly mapped headers
151
- headers.map do |header|
152
- if mapping.key?(header)
153
- # Maps the key according to the mapping, including nil mapping
154
- mapping[header]
155
- elsif options[:remove_unmapped_keys]
156
- # Remove headers not specified in the mapping
157
- nil
158
- else
159
- # Keep the original header if not specified in the mapping
160
- header
161
- end
162
- end
163
- end
164
62
  end
165
63
  end
@@ -3,21 +3,11 @@
3
3
  module SmarterCSV
4
4
  class << self
5
5
  def header_validations(headers, options)
6
- if options[:v2_mode]
7
- header_validations_v2(headers, options)
8
- else
9
- header_validations_v1(headers, options)
10
- end
11
- end
12
-
13
- # ---- V1.x Version: validate the headers -----------------------------------------------------------------
14
-
15
- def header_validations_v1(headers, options)
16
- check_duplicate_headers_v1(headers, options)
17
- check_required_headers_v1(headers, options)
6
+ check_duplicate_headers(headers, options)
7
+ check_required_headers(headers, options)
18
8
  end
19
9
 
20
- def check_duplicate_headers_v1(headers, _options)
10
+ def check_duplicate_headers(headers, _options)
21
11
  header_counts = Hash.new(0)
22
12
  headers.each { |header| header_counts[header] += 1 unless header.nil? }
23
13
 
@@ -28,109 +18,17 @@ module SmarterCSV
28
18
  end
29
19
  end
30
20
 
31
- def check_required_headers_v1(headers, options)
21
+ require 'set'
22
+
23
+ def check_required_headers(headers, options)
32
24
  if options[:required_keys] && options[:required_keys].is_a?(Array)
33
25
  headers_set = headers.to_set
34
26
  missing_keys = options[:required_keys].select { |k| !headers_set.include?(k) }
35
27
 
36
28
  unless missing_keys.empty?
37
- raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}"
29
+ raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}. Check `SmarterCSV.headers` for original headers."
38
30
  end
39
31
  end
40
32
  end
41
-
42
- # ---- V2.x Version: validate the headers -----------------------------------------------------------------
43
-
44
- # def header_validations_v2(headers, options)
45
- # return unless options[:header_validations]
46
-
47
- # options[:header_validations].each do |validation|
48
- # if validation.respond_to?(:call)
49
- # # Directly call if it's a Proc or lambda
50
- # validation.call(headers)
51
- # else
52
- # binding.pry
53
- # # Handle Symbol, Hash, or Array
54
- # method_name, args = validation.is_a?(Symbol) ? [validation, []] : validation
55
- # public_send(method_name, headers, *Array(args))
56
- # end
57
- # end
58
- # end
59
-
60
- def header_validations_v2(headers, options)
61
- return unless options[:header_validations]
62
-
63
- # do the header validations the user requested:
64
- # Header validations typically raise errors directly
65
- #
66
- options[:header_validations].each do |validation|
67
- if validation.respond_to?(:call)
68
- # Directly call if it's a Proc or lambda
69
- validation.call(headers)
70
- else
71
- case validation
72
- when Symbol
73
- public_send(validation, headers)
74
- when Hash
75
- val, args = validation.first
76
- public_send(val, headers, args)
77
- when Array
78
- val, *args = validation
79
- public_send(val, headers, args)
80
- else
81
- raise SmarterCSV::IncorrectOption, "Invalid validation type: #{validation.class}"
82
- end
83
- end
84
- end
85
- end
86
-
87
- # def header_validations_v2_orig(headers, options)
88
- # # do the header validations the user requested:
89
- # # Header validations typically raise errors directly
90
- # #
91
- # if options[:header_validations]
92
- # options[:header_validations].each do |validation|
93
- # case validation
94
- # when Symbol
95
- # public_send(validation, headers)
96
- # when Hash
97
- # val, args = validation.first
98
- # public_send(val, headers, args)
99
- # when Array
100
- # val, args = validation
101
- # public_send(val, headers, args)
102
- # else
103
- # validation.call(headers) unless validation.nil?
104
- # end
105
- # end
106
- # end
107
- # end
108
-
109
- # these are some pre-defined header validations which can be used
110
- # all these take the headers array as the input
111
- #
112
- # the computed options can be accessed via @options
113
-
114
- def unique_headers(headers)
115
- header_counts = Hash.new(0)
116
- headers.each { |header| header_counts[header] += 1 unless header.nil? }
117
-
118
- duplicates = header_counts.select { |_, count| count > 1 }
119
-
120
- unless duplicates.empty?
121
- raise(SmarterCSV::DuplicateHeaders, "Duplicate Headers in CSV: #{duplicates.inspect}")
122
- end
123
- end
124
-
125
- def required_headers(headers, required = [])
126
- raise(SmarterCSV::IncorrectOption, "ERROR: required_headers validation needs an array argument") unless required.is_a?(Array)
127
-
128
- headers_set = headers.to_set
129
- missing = required.select { |r| !headers_set.include?(r) }
130
-
131
- unless missing.empty?
132
- raise(SmarterCSV::MissingKeys, "Missing Headers in CSV: #{missing.inspect}")
133
- end
134
- end
135
33
  end
136
34
  end
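The required-keys check in this hunk can be exercised in isolation. The sketch below mirrors its logic under two assumptions: the top-level `MissingKeys` class is a hypothetical stand-in for `SmarterCSV::MissingKeys`, and the method is defined at top level rather than inside `class << self`.

```ruby
require 'set'

# Hypothetical stand-in for SmarterCSV::MissingKeys so the sketch is self-contained.
class MissingKeys < StandardError; end

# Mirrors the logic of check_required_headers from the hunk above:
# raise when any required key is absent from the processed headers.
def check_required_headers(headers, options)
  return unless options[:required_keys].is_a?(Array)

  headers_set = headers.to_set
  missing_keys = options[:required_keys].reject { |k| headers_set.include?(k) }
  return if missing_keys.empty?

  raise MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}. Check `SmarterCSV.headers` for original headers."
end

begin
  check_required_headers(%i[first_name last_name], { required_keys: %i[first_name email] })
rescue MissingKeys => e
  puts e.message
end
```

The only behavioral change in the hunk itself is the longer error message, which points the user at `SmarterCSV.headers`.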
@@ -1,7 +1,7 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterCSV
4
- COMMON_OPTIONS = {
4
+ DEFAULT_OPTIONS = {
5
5
  acceleration: true,
6
6
  auto_row_sep_chars: 500,
7
7
  chunk_size: nil,
@@ -15,66 +15,39 @@ module SmarterCSV
15
15
  force_utf8: false,
16
16
  headers_in_file: true,
17
17
  invalid_byte_sequence: '',
18
- quote_char: '"',
19
- remove_unmapped_keys: false,
20
- row_sep: :auto, # was: $/,
21
- silence_deprecations: false, # new in 1.11
22
- silence_missing_keys: false,
23
- skip_lines: nil,
24
- user_provided_headers: nil,
25
- verbose: false,
26
- with_line_numbers: false,
27
- v2_mode: false,
28
- }.freeze
29
-
30
- V1_DEFAULT_OPTIONS = {
31
18
  keep_original_headers: false,
32
19
  key_mapping: nil,
20
+ quote_char: '"',
33
21
  remove_empty_hashes: true,
34
22
  remove_empty_values: true,
23
+ remove_unmapped_keys: false,
35
24
  remove_values_matching: nil,
36
25
  remove_zero_values: false,
37
26
  required_headers: nil,
38
27
  required_keys: nil,
28
+ row_sep: :auto, # was: $/,
29
+ silence_missing_keys: false,
30
+ skip_lines: nil,
39
31
  strings_as_keys: false,
40
32
  strip_chars_from_headers: nil,
41
33
  strip_whitespace: true,
34
+ user_provided_headers: nil,
42
35
  value_converters: nil,
43
- v2_mode: false,
36
+ verbose: false,
37
+ with_line_numbers: false,
44
38
  }.freeze
45
39
 
46
- DEPRECATED_OPTIONS = [
47
- :convert_values_to_numeric,
48
- :downcase_headers,
49
- :keep_original_headers,
50
- :key_mapping,
51
- :remove_empty_hashes,
52
- :remove_empty_values,
53
- :remove_values_matching,
54
- :remove_zero_values,
55
- :required_headers,
56
- :required_keys,
57
- :stirngs_as_keys,
58
- :strip_cars_from_headers,
59
- :strip_whitespace,
60
- :value_converters,
61
- ].freeze
62
-
63
40
  class << self
64
41
  # NOTE: this is not called when "parse" methods are tested by themselves
65
42
  def process_options(given_options = {})
66
43
  puts "User provided options:\n#{pp(given_options)}\n" if given_options[:verbose]
67
44
 
68
- # fix invalid input
69
- given_options[:invalid_byte_sequence] = '' if given_options[:invalid_byte_sequence].nil?
70
-
71
- # warn about deprecated options / raises error for v2_mode
72
- handle_deprecations(given_options)
45
+ @options = DEFAULT_OPTIONS.dup.merge!(given_options)
73
46
 
74
- given_options = preprocess_v2_options(given_options) if given_options[:v2_mode]
47
+ # fix invalid input
48
+ @options[:invalid_byte_sequence] ||= ''
75
49
 
76
- @options = compute_default_options(given_options).merge!(given_options)
77
- puts "Computed options:\n#{pp(@options)}\n" if given_options[:verbose]
50
+ puts "Computed options:\n#{pp(@options)}\n" if @options[:verbose]
78
51
 
79
52
  validate_options!(@options)
80
53
  @options
@@ -84,35 +57,11 @@ module SmarterCSV
84
57
  #
85
58
  # ONLY FOR BACKWARDS-COMPATIBILITY
86
59
  def default_options
87
- COMMON_OPTIONS.merge(V1_DEFAULT_OPTIONS)
60
+ DEFAULT_OPTIONS
88
61
  end
89
62
 
90
63
  private
91
64
 
92
- def compute_default_options(options = {})
93
- return COMMON_OPTIONS.merge(V1_DEFAULT_OPTIONS) unless options[:v2_mode]
94
-
95
- default_options = {}
96
- if options[:defaults].to_s != 'none'
97
- default_options = COMMON_OPTIONS.dup.merge(V2_DEFAULT_OPTIONS)
98
- if options[:defaults].to_s == 'v1'
99
- default_options.merge(V1_TRANSFORMATIONS)
100
- else
101
- default_options.merge(V2_TRANSFORMATIONS)
102
- end
103
- end
104
- end
105
-
106
- def handle_deprecations(options)
107
- used_deprecated_options = DEPRECATED_OPTIONS & options.keys
108
- message = "SmarterCSV #{VERSION} DEPRECATED OPTIONS: #{pp(used_deprecated_options)}"
109
- if options[:v2_mode]
110
- raise(SmarterCSV::DeprecatedOptions, "ERROR: #{message}") unless used_deprecated_options.empty? || options[:silence_deprecations]
111
- else
112
- puts "DEPRECATION WARNING: #{message}" unless used_deprecated_options.empty? || options[:silence_deprecations]
113
- end
114
- end
115
-
116
65
  def validate_options!(options)
117
66
  # deprecate required_headers
118
67
  unless options[:required_headers].nil?
@@ -141,57 +90,5 @@ module SmarterCSV
141
90
  def pp(value)
142
91
  defined?(AwesomePrint) ? value.awesome_inspect(index: nil) : value.inspect
143
92
  end
144
-
145
- # ---- V2 code ----------------------------------------------------------------------------------------
146
-
147
- V2_DEFAULT_OPTIONS = {
148
- # These need to go to the COMMON_OPTIONS:
149
- remove_empty_hashes: true, # this might need a transformation or move to common options
150
- # ------------
151
- header_transformations: [:keys_as_symbols],
152
- header_validations: [:unique_headers],
153
- # data_transformations: [:replace_blank_with_nil],
154
- # data_validations: [],
155
- hash_transformations: [:strip_spaces, :remove_blank_values],
156
- hash_validations: [],
157
- v2_mode: true,
158
- }.freeze
159
-
160
- V2_TRANSFORMATIONS = {
161
- header_transformations: [:keys_as_symbols],
162
- header_validations: [:unique_headers],
163
- # data_transformations: [:replace_blank_with_nil],
164
- # data_validations: [],
165
- hash_transformations: [:v1_backwards_compatibility],
166
- # hash_transformations: [:remove_empty_keys, :strip_spaces, :remove_blank_values, :convert_values_to_numeric], # ??? :convert_values_to_numeric]
167
- hash_validations: [],
168
- }.freeze
169
-
170
- V1_TRANSFORMATIONS = {
171
- header_transformations: [:keys_as_symbols],
172
- header_validations: [:unique_headers],
173
- # data_transformations: [:replace_blank_with_nil],
174
- # data_validations: [],
175
- hash_transformations: [:strip_spaces, :remove_blank_values, :convert_values_to_numeric],
176
- hash_validations: [],
177
- }.freeze
178
-
179
- def preprocess_v2_options(options)
180
- return options unless options[:v2_mode] || options[:header_transformations]
181
-
182
- # We want to provide safe defaults for easy processing, that is why we have a special keyword :none
183
- # to not do any header transformations..
184
- #
185
- # this is why we need to remove the 'none' here:
186
- #
187
- requested_header_transformations = options[:header_transformations]
188
- if requested_header_transformations.to_s == 'none'
189
- requested_header_transformations = []
190
- else
191
- requested_header_transformations = requested_header_transformations.reject {|x| x.to_s == 'none'} unless requested_header_transformations.nil?
192
- end
193
- options[:header_transformations] = requested_header_transformations || []
194
- options
195
- end
196
93
  end
197
94
  end
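The new `process_options` merges user options into a copy of the frozen defaults instead of writing to the hash the caller passed in. A minimal sketch of that pattern, assuming a shortened defaults hash `DEMO_DEFAULTS` and a method `demo_process_options` (both hypothetical; the real `DEFAULT_OPTIONS` has many more keys and the real method also validates the result):

```ruby
# Hypothetical, shortened defaults for illustration only.
DEMO_DEFAULTS = { quote_char: '"', invalid_byte_sequence: '', verbose: false }.freeze

def demo_process_options(given_options = {})
  # dup returns an unfrozen copy, so merge! never touches the frozen constant
  # and never writes to the caller's hash.
  options = DEMO_DEFAULTS.dup.merge!(given_options)

  # fix invalid input on the merged copy rather than on the caller's hash
  options[:invalid_byte_sequence] ||= ''
  options
end

frozen_input = { quote_char: "'", invalid_byte_sequence: nil }.freeze
puts demo_process_options(frozen_input).inspect
puts frozen_input.inspect # unchanged
```

The removed code assigned to `given_options[:invalid_byte_sequence]` directly, which is the kind of write that fails on a frozen hash.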
@@ -1,16 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterCSV
4
- class SmarterCSVException < StandardError; end
5
- class DeprecatedOptions < SmarterCSVException; end
6
- class HeaderSizeMismatch < SmarterCSVException; end
7
- class IncorrectOption < SmarterCSVException; end
8
- class ValidationError < SmarterCSVException; end
9
- class DuplicateHeaders < SmarterCSVException; end
10
- class MissingKeys < SmarterCSVException; end # previously known as MissingHeaders
11
- class NoColSepDetected < SmarterCSVException; end
12
- class KeyMappingError < SmarterCSVException; end
13
-
14
4
  # first parameter: filename or input object which responds to readline method
15
5
  def SmarterCSV.process(input, given_options = {}, &block) # rubocop:disable Lint/UnusedMethodArgument
16
6
  initialize_variables
@@ -109,10 +99,6 @@ module SmarterCSV
109
99
 
110
100
  next if options[:remove_empty_hashes] && hash.empty?
111
101
 
112
- #
113
- # should HASH VALIDATIONS go here instead?
114
- #
115
-
116
102
  puts "CSV Line #{@file_line_count}: #{pp(hash)}" if @verbose == '2' # very verbose setting
117
103
  # optional adding of csv_line_number to the hash to help debugging
118
104
  hash[:csv_line_number] = @csv_line_count if options[:with_line_numbers]
@@ -170,19 +156,22 @@ module SmarterCSV
170
156
  end
171
157
 
172
158
  class << self
173
- # Counts the number of quote characters in a line, excluding escaped quotes.
174
- # FYI: using Ruby built-in regex processing to determine the number of quotes
175
159
  def count_quote_chars(line, quote_char)
176
160
  return 0 if line.nil? || quote_char.nil? || quote_char.empty?
177
161
 
178
- # Escaped quote character (e.g., if quote_char is ", then escaped is \")
179
- escaped_quote = Regexp.escape(quote_char)
162
+ count = 0
163
+ escaped = false
180
164
 
181
- # Pattern to match a quote character not preceded by a backslash
182
- pattern = /(?<!\\)(?:\\\\)*#{escaped_quote}/
165
+ line.each_char do |char|
166
+ if char == '\\' && !escaped
167
+ escaped = true
168
+ else
169
+ count += 1 if char == quote_char && !escaped
170
+ escaped = false
171
+ end
172
+ end
183
173
 
184
- # Count occurrences
185
- line.scan(pattern).count
174
+ count
186
175
  end
187
176
 
188
177
  def has_acceleration?
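The hunk above replaces the regex-based scan in `count_quote_chars` with a single pass over the characters that tracks whether the previous character was an unescaped backslash. A minimal standalone sketch of that loop follows; it is lifted out of the `SmarterCSV` module purely for illustration, and the sample inputs are made up. Because the loop compares one character at a time, it appears to assume a single-character `quote_char`.

```ruby
# Standalone copy of the character-scanning loop from the hunk above,
# defined at top level only so it can be run without the gem.
def count_quote_chars(line, quote_char)
  return 0 if line.nil? || quote_char.nil? || quote_char.empty?

  count = 0
  escaped = false # true when the previous char was an unescaped backslash

  line.each_char do |char|
    if char == '\\' && !escaped
      escaped = true
    else
      count += 1 if char == quote_char && !escaped
      escaped = false
    end
  end

  count
end

# Illustrative inputs (single-quoted, so backslashes are literal):
puts count_quote_chars('"a","b"', '"')          # four unescaped quotes
puts count_quote_chars('"a \"quoted\" b"', '"') # inner quotes are escaped
puts count_quote_chars('\\\\"', '"')            # escaped backslash, then a real quote
```

An odd count on a line is the usual signal that a quoted field continues on the next physical line, which is presumably why escaped quotes must be excluded here.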
@@ -15,7 +15,6 @@ module SmarterCSV
15
15
  @raw_header = nil # header as it appears in the file
16
16
  @result = []
17
17
  @warnings = {}
18
- @v2_mode = false
19
18
  @enforce_utf8 = false # only set to true if needed (after options parsing)
20
19
  end
21
20
 
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SmarterCSV
4
- VERSION = "1.11.0.pre2"
4
+ VERSION = "1.11.2"
5
5
  end
@@ -0,0 +1,115 @@
1
+ # frozen_string_literal: true
2
+
3
+ module SmarterCSV
4
+ #
5
+ # Generate CSV files
6
+ #
7
+ # Create an instance of the Writer class with the filename and options.
8
+ # call `<<` one or multiple times to append data to the file.
9
+ # call `finalize` to save the file.
10
+ #
11
+ # The `<<` method can take different arguments:
12
+ # * a single Hash
13
+ # * an array of Hashes
14
+ # * nested arrays of arrays of Hashes
15
+ #
16
+ # By default SmarterCSV::Writer automatically discovers all headers that are present
17
+ # in the data on-the-fly. This can be disabled, then only given headers are used.
18
+ # Disabling can be useful when you want to select attributes from hashes, or ActiveRecord instances.
19
+ #
20
+ # If `discover_headers` is enabled, and headers are given, any new headers that are found in the data will still be appended.
21
+ #
22
+ # The Writer automatically quotes fields containing the col_sep, row_sep, or the quote_char.
23
+ #
24
+ # Options:
25
+ # col_sep : defaults to , but can be set to any other character
26
+ # row_sep : defaults to LF \n , but can be set to \r\n or \r or anything else
27
+ # quote_char : defaults to "
28
+ # discover_headers : defaults to true
29
+ # headers : defaults to []
30
+ # force_quotes: defaults to false
31
+ # map_headers: defaults to {}, can be a hash of key -> value mappings
32
+
33
+ # IMPORTANT NOTES:
34
+ # * Data hashes could contain strings or symbols as keys.
35
+ # Make sure to use the correct form when specifying headers manually,
36
+ # in combination with the :discover_headers option
37
+
38
+ attr_reader :options, :row_sep, :col_sep, :quote_char, :force_quotes, :discover_headers, :headers, :map_headers, :output_file
39
+
40
+ class Writer
41
+ def initialize(file_path, options = {})
42
+ @options = options
43
+ @row_sep = options[:row_sep] || "\n" # RFC4180 "\r\n"
44
+ @col_sep = options[:col_sep] || ','
45
+ @quote_char = options[:quote_char] || '"'
46
+ @force_quotes = options[:force_quotes] == true
47
+ @discover_headers = true # defaults to true
48
+ if options.has_key?(:discover_headers)
49
+ # passing in the option overrides the default behavior
50
+ @discover_headers = options[:discover_headers] == true
51
+ else
52
+ # disable discover_headers when headers are given explicitly
53
+ @discover_headers = !(options.has_key?(:map_headers) || options.has_key?(:headers))
54
+ end
55
+ @headers = [] # start with empty headers
56
+ @headers = options[:headers] if options.has_key?(:headers) # unless explicitly given
57
+ @headers = options[:map_headers].keys if options.has_key?(:map_headers) && !options.has_key?(:headers)
58
+ @map_headers = options[:map_headers] || {}
59
+
60
+ @output_file = File.open(file_path, 'w+')
61
+ # hidden state:
62
+ @temp_file = Tempfile.new('tempfile', '/tmp')
63
+ @quote_regex = Regexp.union(@col_sep, @row_sep, @quote_char)
64
+ end
65
+
66
+ # this can be called many times in order to append lines to the csv file
67
+ def <<(data)
68
+ case data
69
+ when Hash
70
+ process_hash(data)
71
+ when Array
72
+ data.each { |item| self << item }
73
+ when NilClass
74
+ # ignore
75
+ else
76
+ raise InvalidInputData, "Invalid data type: #{data.class}. Must be a Hash or an Array."
77
+ end
78
+ end
79
+
80
+ def finalize
81
+ # Map headers if :map_headers option is provided
82
+ mapped_headers = @headers.map { |header| @map_headers[header] || header }
83
+
84
+ @temp_file.rewind
85
+ @output_file.write(mapped_headers.join(@col_sep) + @row_sep)
86
+ @output_file.write(@temp_file.read)
87
+ @output_file.flush
88
+ @output_file.close
89
+ @temp_file.delete
90
+ end
91
+
92
+ private
93
+
94
+ def process_hash(hash)
95
+ if @discover_headers
96
+ hash_keys = hash.keys
97
+ new_keys = hash_keys - @headers
98
+ @headers.concat(new_keys)
99
+ end
100
+
101
+ # Reorder the hash to match the current headers order and fill missing fields
102
+ ordered_row = @headers.map { |header| hash[header] || '' }
103
+
104
+ @temp_file.write ordered_row.map { |value| escape_csv_field(value) }.join(@col_sep) + @row_sep
105
+ end
106
+
107
+ def escape_csv_field(field)
108
+ if @force_quotes || field.to_s.match(@quote_regex)
109
+ "\"#{field}\""
110
+ else
111
+ field.to_s
112
+ end
113
+ end
114
+ end
115
+ end
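The header-discovery step in `Writer#process_hash` can be tried in isolation. The sketch below reproduces only that step; `discover_rows` is a hypothetical helper that does not exist in SmarterCSV, and the sample hashes are invented. It shows that each row is ordered against the headers known at the time it is processed, so rows seen before a new key appears carry fewer fields than later rows.

```ruby
# Standalone sketch of the header-discovery logic from Writer#process_hash.
# `discover_rows` is a hypothetical name used for illustration only.
def discover_rows(hashes, headers = [])
  rows = hashes.map do |hash|
    # Append keys not seen so far, preserving first-seen order.
    headers.concat(hash.keys - headers)
    # Order values by the current headers; missing fields become ''.
    headers.map { |header| hash[header] || '' }
  end
  [headers, rows]
end

headers, rows = discover_rows(
  [
    { name: 'Ann', age: 30 },
    { name: 'Bob', city: 'Oslo' }
  ]
)

p headers # [:name, :age, :city]
p rows    # [["Ann", 30], ["Bob", "", "Oslo"]]
```

Since the header line is only written in `finalize`, it contains every discovered header, while the first row above was serialized before `:city` was known.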
data/lib/smarter_csv.rb CHANGED
@@ -1,8 +1,8 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'set'
4
-
5
3
  require "smarter_csv/version"
4
+ require "smarter_csv/errors"
5
+
6
6
  require "smarter_csv/file_io"
7
7
  require "smarter_csv/options_processing"
8
8
  require "smarter_csv/auto_detection"
@@ -11,7 +11,10 @@ require 'smarter_csv/header_transformations'
11
11
  require 'smarter_csv/header_validations'
12
12
  require "smarter_csv/headers"
13
13
  require "smarter_csv/hash_transformations"
14
+
14
15
  require "smarter_csv/parse"
16
+ require "smarter_csv/writer"
17
+ require "smarter_csv/smarter_csv"
15
18
 
16
19
  # load the C-extension:
17
20
  case RUBY_ENGINE
@@ -50,4 +53,24 @@ else
50
53
  BLOCK_COMMENT
51
54
  end
52
55
  # :nocov:
53
- require "smarter_csv/smarter_csv"
56
+
57
+ module SmarterCSV
58
+ # SmarterCSV.generate(filename, options) do |csv_writer|
59
+ # MyModel.find_in_batches(batch_size: 100) do |batch|
60
+ # batch.pluck(:name, :description, :instructor).each do |record|
61
+ # csv_writer << record
62
+ # end
63
+ # end
64
+ # end
65
+ #
66
+ # rubocop:disable Lint/UnusedMethodArgument
67
+ def self.generate(filename, options = {}, &block)
68
+ raise unless block_given?
69
+
70
+ writer = Writer.new(filename, options)
71
+ yield writer
72
+ ensure
73
+ writer.finalize
74
+ end
75
+ # rubocop:enable Lint/UnusedMethodArgument
76
+ end
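The `generate` method above follows a yield-then-`ensure` pattern. The sketch below mirrors that shape with two assumed changes that are not in the released code: an explicit `ArgumentError` when no block is given, and a nil-safe `finalize` call, since `writer` is still `nil` in the `ensure` clause if the method fails before the writer is created. `FakeWriter` and `generate_sketch` are hypothetical names, used so the example runs without the gem.

```ruby
# Hypothetical stand-in for SmarterCSV::Writer (illustration only).
class FakeWriter
  attr_reader :rows, :finalized

  def initialize
    @rows = []
    @finalized = false
  end

  def <<(row)
    @rows << row
    self
  end

  def finalize
    @finalized = true
  end
end

# Same shape as SmarterCSV.generate, with the two assumed changes noted above.
def generate_sketch(writer_class)
  raise ArgumentError, 'a block is required' unless block_given?

  writer = writer_class.new
  yield writer
  writer
ensure
  # Safe navigation: skips finalize when the writer was never created.
  writer&.finalize
end

writer = generate_sketch(FakeWriter) { |csv| csv << { name: 'Ann' } }
p writer.rows
p writer.finalized
```

With this shape, a missing block surfaces as the `ArgumentError` itself instead of being replaced by an error raised from the `ensure` clause.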
data/smarter_csv.gemspec CHANGED
@@ -9,8 +9,8 @@ Gem::Specification.new do |spec|
9
9
  spec.authors = ["Tilo Sloboda"]
10
10
  spec.email = ["tilo.sloboda@gmail.com"]
11
11
 
12
- spec.summary = "Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files"
13
- spec.description = "Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
12
+ spec.summary = "CSV Reading and Writing"
13
+ spec.description = "Ruby Gem for convenient reading and writing: importing of CSV Files as Array(s) of Hashes, with lots of features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
14
14
  spec.homepage = "https://github.com/tilo/smarter_csv"
15
15
  spec.license = 'MIT'
16
16
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: smarter_csv
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.11.0.pre2
4
+ version: 1.11.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tilo Sloboda
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2024-01-14 00:00:00.000000000 Z
11
+ date: 2024-07-06 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: awesome_print
@@ -94,9 +94,10 @@ dependencies:
94
94
  - - ">="
95
95
  - !ruby/object:Gem::Version
96
96
  version: '0'
97
- description: Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with
98
- optional features for processing large files in parallel, embedded comments, unusual
99
- field- and record-separators, flexible mapping of CSV-headers to Hash-keys
97
+ description: 'Ruby Gem for convenient reading and writing: importing of CSV Files
98
+ as Array(s) of Hashes, with lots of features for processing large files in parallel,
99
+ embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers
100
+ to Hash-keys'
100
101
  email:
101
102
  - tilo.sloboda@gmail.com
102
103
  executables: []
@@ -104,6 +105,7 @@ extensions:
104
105
  - ext/smarter_csv/extconf.rb
105
106
  extra_rdoc_files: []
106
107
  files:
108
+ - ".rspec"
107
109
  - ".rubocop.yml"
108
110
  - ".rvmrc"
109
111
  - CHANGELOG.md
@@ -117,6 +119,7 @@ files:
117
119
  - ext/smarter_csv/smarter_csv.c
118
120
  - lib/smarter_csv.rb
119
121
  - lib/smarter_csv/auto_detection.rb
122
+ - lib/smarter_csv/errors.rb
120
123
  - lib/smarter_csv/file_io.rb
121
124
  - lib/smarter_csv/hash_transformations.rb
122
125
  - lib/smarter_csv/header_transformations.rb
@@ -127,6 +130,7 @@ files:
127
130
  - lib/smarter_csv/smarter_csv.rb
128
131
  - lib/smarter_csv/variables.rb
129
132
  - lib/smarter_csv/version.rb
133
+ - lib/smarter_csv/writer.rb
130
134
  - smarter_csv.gemspec
131
135
  homepage: https://github.com/tilo/smarter_csv
132
136
  licenses:
@@ -147,13 +151,12 @@ required_ruby_version: !ruby/object:Gem::Requirement
147
151
  version: 2.5.0
148
152
  required_rubygems_version: !ruby/object:Gem::Requirement
149
153
  requirements:
150
- - - ">"
154
+ - - ">="
151
155
  - !ruby/object:Gem::Version
152
- version: 1.3.1
156
+ version: '0'
153
157
  requirements: []
154
158
  rubygems_version: 3.2.3
155
159
  signing_key:
156
160
  specification_version: 4
157
- summary: Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots
158
- of optional features, e.g. chunked processing for huge CSV files
161
+ summary: CSV Reading and Writing
159
162
  test_files: []