smarter_csv 1.12.0.pre1 → 1.12.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a7573b21c853accca5035c8ba3b1db8f0d3a7fdddc961a535440adedcf0b6b82
-  data.tar.gz: 19fffb74289999f01210ad359ac349286a8b267df3910846bec81934f6cde333
+  metadata.gz: 05aa9e7d2d22ec6e1beb3790e2b727cd3e615cadcd537716f2dfbb190cc87a09
+  data.tar.gz: e37b072c7c81a3b6cdc6192ed2bfab046c924f3aa7a8a3e2a66f55fafa25b7ff
 SHA512:
-  metadata.gz: 1f9bcb549185941fec0ee7a238df470a8bfdba7cc7ec007057afed0f9dfda8e7a298d1fcfe3bcb2911337827900ccb71df6bd65ed917aed322a3499d4cf3c3a9
-  data.tar.gz: 3350a2d318e351f5d5a192fa1aa0664ed0ba42910c5742c18e4a92cdcd145f0e27ddea928e968e834b50e9ea2906b4b4aa573939540e661b65e11acffa739c0b
+  metadata.gz: 07c149aaa123ef75fb65fd596fbab64359e24cf2b8606fe406d714358a1c14696fa9ecb420e6dd0a95d40f6af6d41e4988b16df9eac4346d9e1295e3c32f22b1
+  data.tar.gz: 71341c1cf1092fabbfe9106ce533adb872e2bc1b0c30fbc032f3ceaea1832e2ddef5d4156f1465658a67dddaae508cd23b12cfe9fdf34edea3f1f3ede0385688
data/CHANGELOG.md CHANGED
@@ -1,10 +1,13 @@
 
 # SmarterCSV 1.x Change Log
 
-## 1.12.0 (2024-07-08)
-* added SmarterCSV::Reader to process CSV files ([issue #277](https://github.com/tilo/smarter_csv/pull/277))
+## 1.12.1 (2024-07-10)
+* Improved column separator detection by ignoring quoted sections [#276](https://github.com/tilo/smarter_csv/pull/276) (thanks to Nicolas Castellanos)
+
+## 1.12.0 (2024-07-09)
+* Added Thread-Safety: added SmarterCSV::Reader to process CSV files in a thread-safe manner ([issue #277](https://github.com/tilo/smarter_csv/pull/277))
 * SmarterCSV::Writer changed default row separator to the system's row separator (`\n` on Linux, `\r\n` on Windows)
-* added a lot of docs
+* added a doc tree
 
 * POTENTIAL ISSUE:
 
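To illustrate the two headline 1.12 changes above, here is a minimal usage sketch (not part of the diff); the instance-based `SmarterCSV::Reader` / `SmarterCSV::Writer` calls are assumptions based on the changelog entries, and the file names and options are hypothetical:

```ruby
require 'smarter_csv'

# Thread-safe reading: each Reader holds its own parsing state, so concurrent
# threads can each create their own instance instead of sharing module state.
reader = SmarterCSV::Reader.new('data/users.csv')   # assumed constructor
data   = reader.process                             # => array of hashes

# Writing: the Writer takes hashes (or arrays of hashes); row_sep now defaults
# to the system's row separator, but can still be set explicitly.
writer = SmarterCSV::Writer.new('out.csv', row_sep: "\n")
writer << { first_name: 'Ada', last_name: 'Lovelace' }
writer.finalize
```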
data/CONTRIBUTORS.md CHANGED
@@ -53,3 +53,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
 * [JP Camara](https://github.com/jpcamara)
 * [Kenton Hirowatari](https://github.com/hirowatari)
 * [Daniel Pepper](https://github.com/dpep)
+* [Nicolas Castellanos](https://github.com/nicastelo)
data/README.md CHANGED
@@ -36,6 +36,7 @@ Or install it yourself as:
 * [Introduction](docs/_introduction.md)
 * [The Basic API](docs/basic_api.md)
+* [Batch Processing](./docs/batch_processing.md)
 * [Configuration Options](docs/options.md)
 * [Row and Column Separators](docs/row_col_sep.md)
 * [Header Transformations](docs/header_transformations.md)
@@ -43,9 +44,8 @@ Or install it yourself as:
 * [Data Transformations](docs/data_transformations.md)
 * [Value Converters](docs/value_converters.md)
 
-* [Notes](docs/notes.md) <--- this info needs to be moved to individual pages
-
 # Articles
+* [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
 * [Processing 1.4 Million CSV Records in Ruby, fast](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
 * [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 * [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) that started SmarterCSV
data/docs/_introduction.md CHANGED
@@ -1,8 +1,21 @@
 
-# SmarterCSV Introduction
+### Contents
+
+* [**Introduction**](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
 
-`smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, parallel processing, kicking-off batch jobs with Sidekiq, or oploading data to S3.
+# SmarterCSV Introduction
 
+`smarter_csv` is a Ruby Gem for convenient reading and writing of CSV files. It has intelligent defaults and auto-discovery of column and row separators. It imports CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off batch jobs with Sidekiq, parallel processing, or uploading data to S3. Similarly, writing CSV files takes Hashes, or Arrays of Hashes, to create a CSV file.
 
 ## Why another CSV library?
 
@@ -38,3 +51,6 @@ The CSV processing also needed to be robust against variations in the input data
 
 * Data Validations
   (planned feature)
+
+---------------
+PREVIOUS: [README](../README.md) | NEXT: [The Basic API](./basic_api.md)
data/docs/basic_api.md CHANGED
@@ -1,5 +1,19 @@
 
-# SmarterCSV API
+### Contents
+
+* [Introduction](./_introduction.md)
+* [**The Basic API**](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
+# SmarterCSV Basic API
 
 Let's explore the basic APIs for reading and writing CSV files. There is a simplified API (backwards compatible with previous SmarterCSV versions) and the full API, which allows you to access the internal state of the reader or writer instance after processing.
 
@@ -138,3 +152,6 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
   data = SmarterCSV.process(f)
 end
 ```
+
+----------------
+PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Batch Processing](./batch_processing.md)
data/docs/batch_processing.md CHANGED
@@ -1,4 +1,18 @@
 
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [**Batch Processing**](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
 # Batch Processing
 
 Processing CSV data in batches (chunks) allows you to parallelize the workload of importing data.
@@ -44,10 +58,11 @@ and how the `process` method returns the number of chunks when called with a blo
 n = SmarterCSV.process(filename, options) do |chunk|
   # we're passing a block in, to process each resulting hash / row (block takes array of hashes)
   # when chunking is enabled, there are up to :chunk_size hashes in each chunk
-  MyModel.collection.insert( chunk ) # insert up to 100 records at a time
+  MyModel.insert_all( chunk ) # insert up to 100 records at a time
 end
 
 => returns number of chunks we processed
 ```
-
+----------------
+PREVIOUS: [The Basic API](./basic_api.md) | NEXT: [Configuration Options](./options.md)
data/docs/data_transformations.md CHANGED
@@ -1,3 +1,18 @@
+
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [**Data Transformations**](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
 # Data Transformations
 
 SmarterCSV automatically transforms the values in each column in order to normalize the data.
@@ -30,3 +45,6 @@ It can happen that after all transformations, a row of the CSV file would produc
 By default SmarterCSV uses `remove_empty_hashes: true` to remove these empty hashes from the result.
 
 This can be set to `false` to keep these empty hashes in the results.
+
+-------------------
+PREVIOUS: [Header Validations](./header_validations.md) | NEXT: [Value Converters](./value_converters.md)
data/docs/examples.md CHANGED
@@ -1,4 +1,18 @@
 
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
 # Examples
 
 Here are some examples to demonstrate the versatility of SmarterCSV.
data/docs/header_transformations.md CHANGED
@@ -1,3 +1,18 @@
+
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [**Header Transformations**](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
 # Header Transformations
 
 By default SmarterCSV assumes that a CSV file has headers, and it automatically normalizes the headers and transforms them into Ruby symbols. You can completely customize or override this (see below).
@@ -93,3 +108,6 @@ For CSV files with headers, you can either:
 * some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/`. This will also significantly speed up the import.
   If you would force a different `:quote_char` instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.
 
+---------------
+PREVIOUS: [Row and Column Separators](./row_col_sep.md) | NEXT: [Header Validations](./header_validations.md)
+
data/docs/header_validations.md CHANGED
@@ -1,3 +1,18 @@
+
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [**Header Validations**](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
 # Header Validations
 
 When you are importing data, it can be important to verify that all required data is present, to ensure consistent quality of the imported data.
@@ -16,3 +31,6 @@ If these keys are not present, `SmarterCSV::MissingKeys` will be raised to infor
 
 => this will raise SmarterCSV::MissingKeys if any row does not contain these three keys
 ```
+
+----------------
+PREVIOUS: [Header Transformations](./header_transformations.md) | NEXT: [Data Transformations](./data_transformations.md)
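For context (not part of this diff), a minimal sketch of the header validation this hunk refers to; the option name `required_keys` and the key names are assumptions based on the surrounding docs, only `SmarterCSV::MissingKeys` is taken from the diff:

```ruby
require 'smarter_csv'

# Assumed option name (`required_keys`) and example keys -- illustrative only.
options = { required_keys: [:first_name, :last_name, :email] }

begin
  data = SmarterCSV.process('users.csv', options)
rescue SmarterCSV::MissingKeys => e
  # raised when any row is missing one of the required keys
  puts "Missing required keys: #{e.message}"
end
```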
data/docs/options.md CHANGED
@@ -1,5 +1,19 @@
 
-# SmarterCSV Options
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [**Configuration Options**](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
+# Configuration Options
 
 ## CSV Writing
 
@@ -80,3 +94,5 @@ There have been a lot of 1-offs and feature creep around these options, and goin
 | | | also accepts either {:except => [:key1,:key2]} or {:only => :key3} |
 ---------------------------------------------------------------------------------------------------------------------------------
 
+-------------
+PREVIOUS: [Batch Processing](./batch_processing.md) | NEXT: [Row and Column Separators](./row_col_sep.md)
data/docs/row_col_sep.md CHANGED
@@ -1,4 +1,18 @@
 
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [**Row and Column Separators**](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [Value Converters](./value_converters.md)
+
+--------------
+
 # Row and Column Separators
 
 ## Automatic Detection
@@ -85,3 +99,6 @@ In this example, we use `comment_regexp` to filter out and ignore any lines star
 end
 => returns number of chunks
 ```
+
+----------------
+PREVIOUS: [Configuration Options](./options.md) | NEXT: [Header Transformations](./header_transformations.md)
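A short sketch (not from the diff) of the separator options this page documents; the `:auto` values and `comment_regexp` pattern are assumptions based on the docs context, and the file name is hypothetical:

```ruby
require 'smarter_csv'

# Explicitly request auto-detection of separators (assumed to also be the
# default in recent versions) and skip comment lines starting with '#'.
options = {
  row_sep: :auto,          # detect \n, \r\n, or \r
  col_sep: :auto,          # detect ',', ';', "\t", ... -- quoted sections are now ignored (PR #276)
  comment_regexp: /\A#/    # filter out lines starting with '#'
}
data = SmarterCSV.process('vendors.csv', options)
```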
data/docs/value_converters.md CHANGED
@@ -1,4 +1,18 @@
 
+### Contents
+
+* [Introduction](./_introduction.md)
+* [The Basic API](./basic_api.md)
+* [Batch Processing](././batch_processing.md)
+* [Configuration Options](./options.md)
+* [Row and Column Separators](./row_col_sep.md)
+* [Header Transformations](./header_transformations.md)
+* [Header Validations](./header_validations.md)
+* [Data Transformations](./data_transformations.md)
+* [**Value Converters**](./value_converters.md)
+
+--------------
+
 # Using Value Converters
 
 Value Converters allow you to do custom transformations on specific rows, to help you massage the data so it fits the expectations of your down-stream process, such as creating a DB record.
@@ -49,3 +63,6 @@ If you use `key_mappings` and `value_converters`, make sure that the value conve
 first_record[:price].class
 => Float
 ```
+
+--------------------
+PREVIOUS: [Data Transformations](./data_transformations.md) | UP: [README](../README.md)
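For readers unfamiliar with the API referenced in this hunk, a minimal value-converter sketch; the column name, converter class, and input file are hypothetical, only the `convert(value)` contract and the `value_converters` option come from the SmarterCSV docs:

```ruby
require 'smarter_csv'

# A converter is any object responding to convert(value); the class name
# and the :price column are illustrative.
class MoneyConverter
  def self.convert(value)
    value.sub(/\$/, '').to_f   # "$9.99" => 9.99
  end
end

options = { value_converters: { price: MoneyConverter } }
data = SmarterCSV.process('products.csv', options)
data.first[:price].class   # => Float, matching the docs example above
```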
@@ -19,7 +19,12 @@ module SmarterCSV
     count.times do
       line = readline_with_counts(filehandle, options)
       delimiters.each do |d|
-        candidates[d] += line.scan(d).count
+        escaped_quote = Regexp.escape(options[:quote_char])
+
+        # Count only non-quoted occurrences of the delimiter
+        non_quoted_text = line.split(/#{escaped_quote}[^#{escaped_quote}]*#{escaped_quote}/).join
+
+        candidates[d] += non_quoted_text.scan(d).count
       end
     rescue EOFError # short files
       break
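To make the effect of this change concrete, here is a small standalone sketch (not the library's code) of the same quoted-section stripping applied to one sample line; the sample data and variable names are illustrative:

```ruby
# Standalone illustration of the quoted-section stripping used above.
line          = 'name,"address, with, commas",city'
quote_char    = '"'
escaped_quote = Regexp.escape(quote_char)

# Remove every "..." section, then count delimiters in what remains.
non_quoted_text = line.split(/#{escaped_quote}[^#{escaped_quote}]*#{escaped_quote}/).join

line.scan(',').count             # => 4 -- the old behavior over-counts commas inside quotes
non_quoted_text.scan(',').count  # => 2 -- only the real column separators are counted
```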
data/lib/smarter_csv/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module SmarterCSV
-  VERSION = "1.12.0.pre1"
+  VERSION = "1.12.1"
 end
data/smarter_csv.gemspec CHANGED
@@ -10,7 +10,7 @@ Gem::Specification.new do |spec|
   spec.email = ["tilo.sloboda@gmail.com"]
 
   spec.summary = "Convenient CSV Reading and Writing"
-  spec.description = "Ruby Gem for convenient reading and writing: importing of CSV Files as Array(s) of Hashes, with lots of features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
+  spec.description = "Ruby Gem for convenient reading and writing of CSV files. It has intelligent defaults and auto-discovery of column and row separators. It imports CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off batch jobs with Sidekiq, parallel processing, or uploading data to S3. Similarly, writing CSV files takes Hashes, or Arrays of Hashes, to create a CSV file."
   spec.homepage = "https://github.com/tilo/smarter_csv"
   spec.license = 'MIT'
 
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.12.0.pre1
+  version: 1.12.1
 platform: ruby
 authors:
 - Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-07-08 00:00:00.000000000 Z
+date: 2024-07-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: awesome_print
@@ -94,10 +94,11 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description: 'Ruby Gem for convenient reading and writing: importing of CSV Files
-  as Array(s) of Hashes, with lots of features for processing large files in parallel,
-  embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers
-  to Hash-keys'
+description: Ruby Gem for convenient reading and writing of CSV files. It has intelligent
+  defaults and auto-discovery of column and row separators. It imports CSV Files
+  as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off
+  batch jobs with Sidekiq, parallel processing, or uploading data to S3. Similarly,
+  writing CSV files takes Hashes, or Arrays of Hashes, to create a CSV file.
 email:
 - tilo.sloboda@gmail.com
 executables: []
@@ -122,7 +123,6 @@ files:
 - docs/examples.md
 - docs/header_transformations.md
 - docs/header_validations.md
-- docs/notes.md
 - docs/options.md
 - docs/row_col_sep.md
 - docs/value_converters.md
@@ -161,9 +161,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
     version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">"
+  - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
 rubygems_version: 3.2.3
 signing_key:
data/docs/notes.md DELETED
@@ -1,29 +0,0 @@
-
-# Notes
-
-
-
-
-## NOTES on the use of Chunking and Blocks:
-* chunking can be VERY USEFUL if used in combination with passing a block to File.read_csv FOR LARGE FILES
-* if you pass a block to File.read_csv, that block will be executed and given an Array of Hashes as the parameter.
-* if the chunk_size is not set, then the array will only contain one Hash.
-* if the chunk_size is > 0 , then the array may contain up to chunk_size Hashes.
-* this can be very useful when passing chunked data to a post-processing step, e.g. through Sidekiq
-
-## NOTES about File Encodings:
-* if you have a CSV file which contains unicode characters, you can process it as follows:
-
-```ruby
-File.open(filename, "r:bom|utf-8") do |f|
-  data = SmarterCSV.process(f);
-end
-```
-* if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
-```ruby
-require 'open-uri'
-file_location = 'http://your.remote.org/sample.csv'
-open(file_location, 'r:utf-8') do |f|  # don't forget to specify the UTF-8 encoding!!
-  data = SmarterCSV.process(f)
-end
-```