RubyGems - smarter_csv - Versions diffs - 1.12.0.pre1 → 1.12.1 - Mend

smarter_csv 1.12.0.pre1 → 1.12.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

checksums.yaml +4 -4
data/CHANGELOG.md +6 -3
data/CONTRIBUTORS.md +1 -0
data/README.md +2 -2
data/docs/_introduction.md +18 -2
data/docs/basic_api.md +18 -1
data/docs/batch_processing.md +17 -2
data/docs/data_transformations.md +18 -0
data/docs/examples.md +14 -0
data/docs/header_transformations.md +18 -0
data/docs/header_validations.md +18 -0
data/docs/options.md +17 -1
data/docs/row_col_sep.md +17 -0
data/docs/value_converters.md +17 -0
data/lib/smarter_csv/auto_detection.rb +6 -1
data/lib/smarter_csv/version.rb +1 -1
data/smarter_csv.gemspec +1 -1
metadata +9 -9
data/docs/notes.md +0 -29

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a7573b21c853accca5035c8ba3b1db8f0d3a7fdddc961a535440adedcf0b6b82
-  data.tar.gz: 19fffb74289999f01210ad359ac349286a8b267df3910846bec81934f6cde333
+  metadata.gz: 05aa9e7d2d22ec6e1beb3790e2b727cd3e615cadcd537716f2dfbb190cc87a09
+  data.tar.gz: e37b072c7c81a3b6cdc6192ed2bfab046c924f3aa7a8a3e2a66f55fafa25b7ff
 SHA512:
-  metadata.gz: 1f9bcb549185941fec0ee7a238df470a8bfdba7cc7ec007057afed0f9dfda8e7a298d1fcfe3bcb2911337827900ccb71df6bd65ed917aed322a3499d4cf3c3a9
-  data.tar.gz: 3350a2d318e351f5d5a192fa1aa0664ed0ba42910c5742c18e4a92cdcd145f0e27ddea928e968e834b50e9ea2906b4b4aa573939540e661b65e11acffa739c0b
+  metadata.gz: 07c149aaa123ef75fb65fd596fbab64359e24cf2b8606fe406d714358a1c14696fa9ecb420e6dd0a95d40f6af6d41e4988b16df9eac4346d9e1295e3c32f22b1
+  data.tar.gz: 71341c1cf1092fabbfe9106ce533adb872e2bc1b0c30fbc032f3ceaea1832e2ddef5d4156f1465658a67dddaae508cd23b12cfe9fdf34edea3f1f3ede0385688

data/CHANGELOG.md CHANGED

@@ -1,10 +1,13 @@
 # SmarterCSV 1.x Change Log
-## 1.12.0 (2024-07-08)
-  * added SmarterCSV::Reader to process CSV files ([issue #277](https://github.com/tilo/smarter_csv/pull/277))
+## 1.12.1 (2024-07-10)
+  * Improved column separator detection by ignoring quoted sections [#276](https://github.com/tilo/smarter_csv/pull/276) (thanks to Nicolas Castellanos)
+## 1.12.0 (2024-07-09)
+  * Added Thread-Safety: added SmarterCSV::Reader to process CSV files in a thread-safe manner ([issue #277](https://github.com/tilo/smarter_csv/pull/277))
   * SmarterCSV::Writer changed default row separator to the system's row separator (`\n` on Linux, `\r\n` on Windows)
-  * added a lot of docs
+  * added a doc tree
   * POTENTIAL ISSUE:

data/CONTRIBUTORS.md CHANGED

@@ -53,3 +53,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
  * [JP Camara](https://github.com/jpcamara)
  * [Kenton Hirowatari](https://github.com/hirowatari)
  * [Daniel Pepper](https://github.com/dpep)
+ * [Nicolas Castellanos](https://github.com/nicastelo)

data/README.md CHANGED

@@ -36,6 +36,7 @@ Or install it yourself as:
   * [Introduction](docs/_introduction.md)
   * [The Basic API](docs/basic_api.md)
+  * [Batch Processing](./docs/batch_processing.md)
   * [Configuration Options](docs/options.md)
   * [Row and Column Separators](docs/row_col_sep.md)
   * [Header Transformations](docs/header_transformations.md)
@@ -43,9 +44,8 @@ Or install it yourself as:
   * [Data Transformations](docs/data_transformations.md)
   * [Value Converters](docs/value_converters.md)
-  * [Notes](docs/notes.md)  <--- this info needs to be moved to individual pages
 # Articles
+* [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
 * [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
 * [Speeding up CSV parsing with parallel processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing)
 * [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) that started SmarterCSV

data/docs/_introduction.md CHANGED

@@ -1,8 +1,21 @@
-# SmarterCSV Introduction
+### Contents
+  * [**Introduction**](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
-`smarter_csv` is a Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, parallel processing, kicking-off batch jobs with Sidekiq, or oploading data to S3.
+# SmarterCSV Introduction
+`smarter_csv` is a Ruby Gem for convenient reading and writing of CSV files. It has intelligent defaults, and auto-discovery of column and row separators. It imports CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off batch jobs with Sidekiq, parallel processing, or oploading data to S3. Similarly, writing CSV files takes Hashes, or Arrays of Hashes to create a CSV file.
 ## Why another CSV library?
@@ -38,3 +51,6 @@ The CSV processing also needed to be robust against variations in the input data
 * Data Validations
   (planned feature)
+---------------
+PREVIOUS [README](../README.md) | NEXT: [The Basic API](./basic_api.md)

data/docs/basic_api.md CHANGED

@@ -1,5 +1,19 @@
-# SmarterCSV API
+### Contents
+  * [Introduction](./_introduction.md)
+  * [**The Basic API**](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
+# SmarterCSV Basic API
 Let's explore the basic APIs for reading and writing CSV files. There is a simplified API (backwards conpatible with previous SmarterCSV versions) and the full API, which allows you to access the internal state of the reader or writer instance after processing.
@@ -138,3 +152,6 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
          data = SmarterCSV.process(f)
        end
 ```
+----------------
+PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Batch Processing](./batch_processing.md)

data/docs/batch_processing.md CHANGED

@@ -1,4 +1,18 @@
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [**Batch Processing**](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
 # Batch Processing
 Processing CSV data in batches (chunks), allows you to parallelize the workload of importing data.
@@ -44,10 +58,11 @@ and how the `process` method returns the number of chunks when called with a blo
     n = SmarterCSV.process(filename, options) do |chunk|
           # we're passing a block in, to process each resulting hash / row (block takes array of hashes)
           # when chunking is enabled, there are up to :chunk_size hashes in each chunk
-          MyModel.collection.insert( chunk )   # insert up to 100 records at a time
+          MyModel.insert_all( chunk )   # insert up to 100 records at a time
     end
      => returns number of chunks we processed
 ```
+----------------
+PREVIOUS: [The Basic API](./basic_api.md)  | NEXT: [Configuration Options](./options.md)

data/docs/data_transformations.md CHANGED

@@ -1,3 +1,18 @@
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [**Data Transformations**](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
 # Data Transformations
 SmarterCSV automatically transforms the values in each colum in order to normalize the data.
@@ -30,3 +45,6 @@ It can happen that after all transformations, a row of the CSV file would produc
 By default SmarterCSV uses `remove_empty_hashes: true` to remove these empty hashes from the result.
 This can be set to `true`, to keep these empty hashes in the results.
+-------------------
+PREVIOUS: [Header Validations](./header_validations.md) | NEXT: [Value Converters](./value_converters.md)

data/docs/examples.md CHANGED

@@ -1,4 +1,18 @@
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
 # Examples
 Here are some examples to demonstrate the versatility of SmarterCSV.

data/docs/header_transformations.md CHANGED

@@ -1,3 +1,18 @@
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [**Header Transformations**](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
 # Header Transformations
 By default SmarterCSV assumes that a CSV file has headers, and it automatically normalizes the headers and transforms them into Ruby symbols. You can completely customize or override this (see below).
@@ -93,3 +108,6 @@ For CSV files with headers, you can either:
  * some CSV files use un-escaped quotation characters inside fields. This can cause the import to break. To get around this, use the `:force_simple_split => true` option in combination with `:strip_chars_from_headers => /[\-"]/` . This will also significantly speed up the import.
    If you would force a different :quote_char instead (setting it to a non-used character), then the import would be up to 5-times slower than using `:force_simple_split`.
+---------------
+PREVIOUS: [Row and Column Separators](./row_col_sep.md) | NEXT: [Header Validations](./header_validations.md)

data/docs/header_validations.md CHANGED

@@ -1,3 +1,18 @@
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [**Header Validations**](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
 # Header Validations
 When you are importing data, it can be important to verify that all required data is present, to ensure consistent quality when importing data.
@@ -16,3 +31,6 @@ If these keys are not present, `SmarterCSV::MissingKeys` will be raised to infor
   => this will raise SmarterCSV::MissingKeys if any row does not contain these three keys
 ```
+----------------
+PREVIOUS: [Header Transformations](./header_transformations.md) | NEXT: [Data Transformations](./data_transformations.md)

data/docs/options.md CHANGED

@@ -1,5 +1,19 @@
-# SmarterCSV Options
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [**Configuration Options**](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
+# Configuration Options
 ## CSV Writing
@@ -80,3 +94,5 @@ There have been a lot of 1-offs and feature creep around these options, and goin
      |                             |          |      also accepts either {:except => [:key1,:key2]} or {:only => :key3}              |
      ---------------------------------------------------------------------------------------------------------------------------------
+-------------
+PREVIOUS: [Batch Processing](./batch_processing.md) | NEXT: [Row and Column Separators](./row_col_sep.md)

data/docs/row_col_sep.md CHANGED

@@ -1,4 +1,18 @@
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [**Row and Column Separators**](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [Value Converters](./value_converters.md)
+--------------
 # Row and Column Separators
 ## Automatic Detection
@@ -85,3 +99,6 @@ In this example, we use `comment_regexp` to filter out and ignore any lines star
     end
     => returns number of chunks
 ```
+----------------
+PREVIOUS: [Configuration Options](./options.md) | NEXT: [Header Transformations](./header_transformations.md)

data/docs/value_converters.md CHANGED

@@ -1,4 +1,18 @@
+### Contents
+  * [Introduction](./_introduction.md)
+  * [The Basic API](./basic_api.md)
+  * [Batch Processing](././batch_processing.md)
+  * [Configuration Options](./options.md)
+  * [Row and Column Separators](./row_col_sep.md)
+  * [Header Transformations](./header_transformations.md)
+  * [Header Validations](./header_validations.md)
+  * [Data Transformations](./data_transformations.md)
+  * [**Value Converters**](./value_converters.md)
+--------------
 # Using Value Converters
 Value Converters allow you to do custom transformations specific rows, to help you massage the data so it fits the expectations of your down-stream process, such as creating a DB record.
@@ -49,3 +63,6 @@ If you use `key_mappings` and `value_converters`, make sure that the value conve
     first_record[:price].class
       => Float
 ```
+--------------------
+PREVIOUS: [Data Transformations](./data_transformations.md) | UP: [README](../README.md)

data/lib/smarter_csv/auto_detection.rb CHANGED

@@ -19,7 +19,12 @@ module SmarterCSV
       count.times do
         line = readline_with_counts(filehandle, options)
         delimiters.each do |d|
-          candidates[d] += line.scan(d).count
+          escaped_quote = Regexp.escape(options[:quote_char])
+          # Count only non-quoted occurrences of the delimiter
+          non_quoted_text = line.split(/#{escaped_quote}[^#{escaped_quote}]*#{escaped_quote}/).join
+          candidates[d] += non_quoted_text.scan(d).count
         end
       rescue EOFError # short files
         break
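
The hunk above changes delimiter detection so that only occurrences outside quoted sections are counted. The following is a minimal standalone sketch of that idea, not the gem's API: the method name `count_unquoted_delimiters`, the candidate delimiter list, and the default quote character are assumptions made for illustration.

```ruby
# Hypothetical helper (not part of smarter_csv): counts how often each
# candidate delimiter appears outside the quoted sections of one line.
# The delimiter list and quote_char default are assumed for this sketch.
def count_unquoted_delimiters(line, delimiters: [',', "\t", ';', ':', '|'], quote_char: '"')
  escaped_quote = Regexp.escape(quote_char)
  # Remove every quoted section, then join the remaining pieces.
  non_quoted_text = line.split(/#{escaped_quote}[^#{escaped_quote}]*#{escaped_quote}/).join
  delimiters.each_with_object({}) do |d, counts|
    counts[d] = non_quoted_text.scan(d).count
  end
end

# A semicolon-separated line whose quoted fields contain commas.
line = %(name;"Doe, John, Jr.";"a,b,c";42)

naive_commas = line.count(',')            # counts commas inside quotes as well
counts = count_unquoted_delimiters(line)  # counts only unquoted occurrences
```

For this sample line a naive count finds more commas than semicolons, while the quote-aware count finds no commas and three semicolons, so `;` is the stronger candidate. Like the regular expression in the hunk, the sketch does not handle doubled quote characters inside a quoted field.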

data/lib/smarter_csv/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module SmarterCSV
-  VERSION = "1.12.0.pre1"
+  VERSION = "1.12.1"
 end

data/smarter_csv.gemspec CHANGED

@@ -10,7 +10,7 @@ Gem::Specification.new do |spec|
   spec.email         = ["tilo.sloboda@gmail.com"]
   spec.summary       = "Convenient CSV Reading and Writing"
-  spec.description   = "Ruby Gem for convenient reading and writing: importing of CSV Files as Array(s) of Hashes, with lots of features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
+  spec.description   = "Ruby Gem for convenient reading and writing of CSV files. It has intelligent defaults, and auto-discovery of column and row separators. It imports CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off batch jobs with Sidekiq, parallel processing, or oploading data to S3. Similarly, writing CSV files takes Hashes, or Arrays of Hashes to create a CSV file."
   spec.homepage      = "https://github.com/tilo/smarter_csv"
   spec.license       = 'MIT'

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smarter_csv
 version: !ruby/object:Gem::Version
-  version: 1.12.0.pre1
+  version: 1.12.1
 platform: ruby
 authors:
 - Tilo Sloboda
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-07-08 00:00:00.000000000 Z
+date: 2024-07-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: awesome_print
@@ -94,10 +94,11 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description: 'Ruby Gem for convenient reading and writing: importing of CSV Files
-  as Array(s) of Hashes, with lots of features for processing large files in parallel,
-  embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers
-  to Hash-keys'
+description: Ruby Gem for convenient reading and writing of CSV files. It has intelligent
+  defaults, and auto-discovery of column and row separators. It imports CSV Files
+  as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off
+  batch jobs with Sidekiq, parallel processing, or oploading data to S3. Similarly,
+  writing CSV files takes Hashes, or Arrays of Hashes to create a CSV file.
 email:
 - tilo.sloboda@gmail.com
 executables: []
@@ -122,7 +123,6 @@ files:
 - docs/examples.md
 - docs/header_transformations.md
 - docs/header_validations.md
-- docs/notes.md
 - docs/options.md
 - docs/row_col_sep.md
 - docs/value_converters.md
@@ -161,9 +161,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">"
+  - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
 rubygems_version: 3.2.3
 signing_key:

data/docs/notes.md DELETED

@@ -1,29 +0,0 @@
-# Notes
-## NOTES on the use of Chunking and Blocks:
- * chunking can be VERY USEFUL if used in combination with passing a block to File.read_csv FOR LARGE FILES
- * if you pass a block to File.read_csv, that block will be executed and given an Array of Hashes as the parameter.
- * if the chunk_size is not set, then the array will only contain one Hash.
- * if the chunk_size is > 0 , then the array may contain up to chunk_size Hashes.
- * this can be very useful when passing chunked data to a post-processing step, e.g. through Sidekiq
-## NOTES about File Encodings:
- * if you have a CSV file which contains unicode characters, you can process it as follows:
-```ruby
-       File.open(filename, "r:bom|utf-8") do |f|
-         data = SmarterCSV.process(f);
-       end
-```
-* if the CSV file with unicode characters is in a remote location, similarly you need to give the encoding as an option to the `open` call:
-```ruby
-       require 'open-uri'
-       file_location = 'http://your.remote.org/sample.csv'
-       open(file_location, 'r:utf-8') do |f|   # don't forget to specify the UTF-8 encoding!!
-         data = SmarterCSV.process(f)
-       end
-```