smarter_csv 1.12.0.pre1 → 1.12.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: a7573b21c853accca5035c8ba3b1db8f0d3a7fdddc961a535440adedcf0b6b82
- data.tar.gz: 19fffb74289999f01210ad359ac349286a8b267df3910846bec81934f6cde333
+ metadata.gz: 05aa9e7d2d22ec6e1beb3790e2b727cd3e615cadcd537716f2dfbb190cc87a09
+ data.tar.gz: e37b072c7c81a3b6cdc6192ed2bfab046c924f3aa7a8a3e2a66f55fafa25b7ff
  SHA512:
- metadata.gz: 1f9bcb549185941fec0ee7a238df470a8bfdba7cc7ec007057afed0f9dfda8e7a298d1fcfe3bcb2911337827900ccb71df6bd65ed917aed322a3499d4cf3c3a9
- data.tar.gz: 3350a2d318e351f5d5a192fa1aa0664ed0ba42910c5742c18e4a92cdcd145f0e27ddea928e968e834b50e9ea2906b4b4aa573939540e661b65e11acffa739c0b
+ metadata.gz: 07c149aaa123ef75fb65fd596fbab64359e24cf2b8606fe406d714358a1c14696fa9ecb420e6dd0a95d40f6af6d41e4988b16df9eac4346d9e1295e3c32f22b1
+ data.tar.gz: 71341c1cf1092fabbfe9106ce533adb872e2bc1b0c30fbc032f3ceaea1832e2ddef5d4156f1465658a67dddaae508cd23b12cfe9fdf34edea3f1f3ede0385688
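Reviewer note: the digests above cover the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. Below is a minimal sketch of how one such digest could be checked locally with Ruby's standard `Digest` library. The helper name `sha256_matches?` and the temp-file demo are illustrative assumptions, not part of smarter_csv or RubyGems.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper (not part of smarter_csv or RubyGems):
# true if the file's SHA256 hex digest equals the expected value.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Self-contained demo on a throwaway file. For a real check, pass the path of
# an extracted data.tar.gz and the SHA256 value listed in checksums.yaml.
demo_ok = Tempfile.create("demo") do |f|
  f.write("abc")
  f.flush
  sha256_matches?(f.path, Digest::SHA256.hexdigest("abc"))
end
puts demo_ok
```

The same pattern should work for the SHA512 entries by swapping in `Digest::SHA512`.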
data/CHANGELOG.md CHANGED
@@ -1,10 +1,13 @@

  # SmarterCSV 1.x Change Log

- ## 1.12.0 (2024-07-08)
- * added SmarterCSV::Reader to process CSV files ([issue #277](https://github.com/tilo/smarter_csv/pull/277))
+ ## 1.12.1 (2024-07-10)
+ * Improved column separator detection by ignoring quoted sections [#276](https://github.com/tilo/smarter_csv/pull/276) (thanks to Nicolas Castellanos)
+
+ ## 1.12.0 (2024-07-09)
+ * Added Thread-Safety: added SmarterCSV::Reader to process CSV files in a thread-safe manner ([issue #277](https://github.com/tilo/smarter_csv/pull/277))
  * SmarterCSV::Writer changed default row separator to the system's row separator (`\n` on Linux, `\r\n` on Windows)
- * added a lot of docs
+ * added a doc tree

  * POTENTIAL ISSUE:

data/CONTRIBUTORS.md CHANGED
@@ -53,3 +53,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
  * [JP Camara](https://github.com/jpcamara)
  * [Kenton Hirowatari](https://github.com/hirowatari)
  * [Daniel Pepper](https://github.com/dpep)
+ * [Nicolas Castellanos](https://github.com/nicastelo)
data/README.md CHANGED
@@ -36,6 +36,7 @@ Or install it yourself as:

  * [Introduction](docs/_introduction.md)
  * [The Basic API](docs/basic_api.md)
+ * [Batch Processing](./docs/batch_processing.md)
  * [Configuration Options](docs/options.md)
  * [Row and Column Separators](docs/row_col_sep.md)
  * [Header Transformations](docs/header_transformations.md)
@@ -43,9 +44,8 @@ Or install it yourself as:
  * [Data Transformations](docs/data_transformations.md)
  * [Value Converters](docs/value_converters.md)

- * [Notes](docs/notes.md) <--- this info needs to be moved to individual pages
-
  # Articles
+ * [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
  * [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
  * [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
  * [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) that started SmarterCSV
@@ -1,8 +1,21 @@

- # SmarterCSV Introduction
+ ### Contents
+
+ * [**Introduction**](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------

- `smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, parallel processing, kicking-off batch jobs with Sidekiq, or oploading data to S3.
+ # SmarterCSV Introduction

+ `smarter_csv` is a Ruby Gem for convenient reading and writing of CSV files. It has intelligent defaults, and auto-discovery of column and row separators. It imports CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off batch jobs with Sidekiq, parallel processing, or oploading data to S3. Similarly, writing CSV files takes Hashes, or Arrays of Hashes to create a CSV file.

  ## Why another CSV library?

@@ -38,3 +51,6 @@ The CSV processing also needed to be robust against variations in the input data

  * Data Validations
  (planned feature)
+
+ ---------------
+ PREVIOUS [README](../README.md) | NEXT: [The Basic API](./basic_api.md)
data/docs/basic_api.md CHANGED
@@ -1,5 +1,19 @@

- # SmarterCSV API
+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [**The Basic API**](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
+ # SmarterCSV Basic API

  Let's explore the basic APIs for reading and writing CSV files. There is a simplified API (backwards conpatible with previous SmarterCSV versions) and the full API, which allows you to access the internal state of the reader or writer instance after processing.

@@ -138,3 +152,6 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
  data = SmarterCSV.process(f)
  end
  ```
+
+ ----------------
+ PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Batch Processing](./batch_processing.md)
@@ -1,4 +1,18 @@

+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [**Batch Processing**](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
  # Batch Processing

  Processing CSV data in batches (chunks), allows you to parallelize the workload of importing data.
@@ -44,10 +58,11 @@ and how the `process` method returns the number of chunks when called with a blo
  n = SmarterCSV.process(filename, options) do |chunk|
  # we're passing a block in, to process each resulting hash / row (block takes array of hashes)
  # when chunking is enabled, there are up to :chunk_size hashes in each chunk
- MyModel.collection.insert( chunk ) # insert up to 100 records at a time
+ MyModel.insert_all( chunk ) # insert up to 100 records at a time
  end

  => returns number of chunks we processed
  ```

-
+ ----------------
+ PREVIOUS: [The Basic API](./basic_api.md) | NEXT: [Configuration Options](./options.md)
@@ -1,3 +1,18 @@
+
+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [**Data Transformations**](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
  # Data Transformations

  SmarterCSV automatically transforms the values in each colum in order to normalize the data.
@@ -30,3 +45,6 @@ It can happen that after all transformations, a row of the CSV file would produc
  By default SmarterCSV uses `remove_empty_hashes: true` to remove these empty hashes from the result.

  This can be set to `true`, to keep these empty hashes in the results.
+
+ -------------------
+ PREVIOUS: [Header Validations](./header_validations.md) | NEXT: [Value Converters](./value_converters.md)
data/docs/examples.md CHANGED
@@ -1,4 +1,18 @@

+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
  # Examples

  Here are some examples to demonstrate the versatility of SmarterCSV.
@@ -1,3 +1,18 @@
+
+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [**Header Transformations**](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
  # Header Transformations

  By default SmarterCSV assumes that a CSV file has headers, and it automatically normalizes the headers and transforms them into Ruby symbols. You can completely customize or override this (see below).
@@ -93,3 +108,6 @@ For CSV files with headers, you can either:
  * some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/` . This will also significantly speed up the import.
  If you would force a different :quote_char instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.

+ ---------------
+ PREVIOUS: [Row and Column Separators](./row_col_sep.md) | NEXT: [Header Validations](./header_validations.md)
+
@@ -1,3 +1,18 @@
+
+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [**Header Validations**](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
  # Header Validations

  When you are importing data, it can be important to verify that all required data is present, to ensure consistent quality when importing data.
@@ -16,3 +31,6 @@ If these keys are not present, `SmarterCSV::MissingKeys` will be raised to infor

  => this will raise SmarterCSV::MissingKeys if any row does not contain these three keys
  ```
+
+ ----------------
+ PREVIOUS: [Header Transformations](./header_transformations.md) | NEXT: [Data Transformations](./data_transformations.md)
data/docs/options.md CHANGED
@@ -1,5 +1,19 @@

- # SmarterCSV Options
+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [**Configuration Options**](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
+ # Configuration Options

  ## CSV Writing

@@ -80,3 +94,5 @@ There have been a lot of 1-offs and feature creep around these options, and goin
  | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
  ---------------------------------------------------------------------------------------------------------------------------------

+ -------------
+ PREVIOUS: [Batch Processing](./batch_processing.md) | NEXT: [Row and Column Separators](./row_col_sep.md)
data/docs/row_col_sep.md CHANGED
@@ -1,4 +1,18 @@

+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [**Row and Column Separators**](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [Value Converters](./value_converters.md)
+
+ --------------
+
  # Row and Column Separators

  ## Automatic Detection
@@ -85,3 +99,6 @@ In this example, we use `comment_regexp` to filter out and ignore any lines star
  end
  => returns number of chunks
  ```
+
+ ----------------
+ PREVIOUS: [Configuration Options](./options.md) | NEXT: [Header Transformations](./header_transformations.md)
@@ -1,4 +1,18 @@

+ ### Contents
+
+ * [Introduction](./_introduction.md)
+ * [The Basic API](./basic_api.md)
+ * [Batch Processing](././batch_processing.md)
+ * [Configuration Options](./options.md)
+ * [Row and Column Separators](./row_col_sep.md)
+ * [Header Transformations](./header_transformations.md)
+ * [Header Validations](./header_validations.md)
+ * [Data Transformations](./data_transformations.md)
+ * [**Value Converters**](./value_converters.md)
+
+ --------------
+
  # Using Value Converters

  Value Converters allow you to do custom transformations specific rows, to help you massage the data so it fits the expectations of your down-stream process, such as creating a DB record.
@@ -49,3 +63,6 @@ If you use `key_mappings` and `value_converters`, make sure that the value conve
  first_record[:price].class
  => Float
  ```
+
+ --------------------
+ PREVIOUS: [Data Transformations](./data_transformations.md) | UP: [README](../README.md)
@@ -19,7 +19,12 @@ module SmarterCSV
  count.times do
  line = readline_with_counts(filehandle, options)
  delimiters.each do |d|
- candidates[d] += line.scan(d).count
+ escaped_quote = Regexp.escape(options[:quote_char])
+
+ # Count only non-quoted occurrences of the delimiter
+ non_quoted_text = line.split(/#{escaped_quote}[^#{escaped_quote}]*#{escaped_quote}/).join
+
+ candidates[d] += non_quoted_text.scan(d).count
  end
  rescue EOFError # short files
  break
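Reviewer note: to illustrate what the hunk above changes, here is a minimal standalone sketch that applies the same split-and-join expression to a single line and compares the counts with and without it. The helper `count_delimiters`, its keyword arguments, and the sample line are hypothetical and not part of smarter_csv; only the regular expression mirrors the diff.

```ruby
# Hypothetical helper (not part of smarter_csv): counts each delimiter
# candidate in one line, optionally ignoring text inside quoted sections.
def count_delimiters(line, delimiters, quote_char: '"', ignore_quoted: true)
  text = line
  if ignore_quoted
    q = Regexp.escape(quote_char)
    # Same idea as the hunk above: drop every quoted section and keep
    # only the text between them.
    text = line.split(/#{q}[^#{q}]*#{q}/).join
  end
  delimiters.map { |d| [d, text.scan(d).count] }.to_h
end

# Semicolon-separated line whose quoted fields contain commas.
line = %(name;"Doe, John, Jr.";"a,b,c";42)
delimiters = [",", ";", "\t", "|", ":"]

naive  = count_delimiters(line, delimiters, ignore_quoted: false)
quoted = count_delimiters(line, delimiters)

puts naive[","], naive[";"]    # 4 commas vs 3 semicolons: comma would win
puts quoted[","], quoted[";"]  # 0 commas vs 3 semicolons: semicolon wins
```

On this sample, counting raw occurrences would favor `,` even though the real separator is `;`; ignoring the quoted sections removes that bias.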
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module SmarterCSV
- VERSION = "1.12.0.pre1"
+ VERSION = "1.12.1"
  end
data/smarter_csv.gemspec CHANGED
@@ -10,7 +10,7 @@ Gem::Specification.new do |spec|
  spec.email = ["tilo.sloboda@gmail.com"]

  spec.summary = "Convenient CSV Reading and Writing"
- spec.description = "Ruby Gem for convenient reading and writing: importing of CSV Files as Array(s) of Hashes, with lots of features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
+ spec.description = "Ruby Gem for convenient reading and writing of CSV files. It has intelligent defaults, and auto-discovery of column and row separators. It imports CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off batch jobs with Sidekiq, parallel processing, or oploading data to S3. Similarly, writing CSV files takes Hashes, or Arrays of Hashes to create a CSV file."
  spec.homepage = "https://github.com/tilo/smarter_csv"
  spec.license = 'MIT'

metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: smarter_csv
  version: !ruby/object:Gem::Version
- version: 1.12.0.pre1
+ version: 1.12.1
  platform: ruby
  authors:
  - Tilo Sloboda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-07-08 00:00:00.000000000 Z
+ date: 2024-07-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: awesome_print
@@ -94,10 +94,11 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- description: 'Ruby Gem for convenient reading and writing: importing of CSV Files
- as Array(s) of Hashes, with lots of features for processing large files in parallel,
- embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers
- to Hash-keys'
+ description: Ruby Gem for convenient reading and writing of CSV files. It has intelligent
+ defaults, and auto-discovery of column and row separators. It imports CSV Files
+ as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off
+ batch jobs with Sidekiq, parallel processing, or oploading data to S3. Similarly,
+ writing CSV files takes Hashes, or Arrays of Hashes to create a CSV file.
  email:
  - tilo.sloboda@gmail.com
  executables: []
@@ -122,7 +123,6 @@ files:
  - docs/examples.md
  - docs/header_transformations.md
  - docs/header_validations.md
- - docs/notes.md
  - docs/options.md
  - docs/row_col_sep.md
  - docs/value_converters.md
@@ -161,9 +161,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: 2.5.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">"
+ - - ">="
  - !ruby/object:Gem::Version
- version: 1.3.1
+ version: '0'
  requirements: []
  rubygems_version: 3.2.3
  signing_key:
data/docs/notes.md DELETED
@@ -1,29 +0,0 @@
-
- # Notes
-
-
-
-
- ## NOTES on the use of Chunking and Blocks:
- * chunking can be VERY USEFUL if used in combination with passing a block to File.read_csv FOR LARGE FILES
- * if you pass a block to File.read_csv, that block will be executed and given an Array of Hashes as the parameter.
- * if the chunk_size is not set, then the array will only contain one Hash.
- * if the chunk_size is > 0 , then the array may contain up to chunk_size Hashes.
- * this can be very useful when passing chunked data to a post-processing step, e.g. through Sidekiq
-
- ## NOTES about File Encodings:
- * if you have a CSV file which contains unicode characters, you can process it as follows:
-
- ```ruby
- File.open(filename, "r:bom|utf-8") do |f|
- data = SmarterCSV.process(f);
- end
- ```
- * if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
- ```ruby
- require 'open-uri'
- file_location = 'http://your.remote.org/sample.csv'
- open(file_location, 'r:utf-8') do |f| # don't forget to specify the UTF-8 encoding!!
- data = SmarterCSV.process(f)
- end
- ```