RubyGems - smarter_csv - Versions diffs - 1.15.2 → 1.16.1 - Mend

smarter_csv 1.15.2 → 1.16.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (50)

checksums.yaml +4 -4
data/.rspec +2 -0
data/.rubocop.yml +9 -0
data/CHANGELOG.md +112 -1
data/CONTRIBUTORS.md +4 -1
data/Gemfile +1 -0
data/README.md +129 -27
data/docs/_introduction.md +45 -24
data/docs/bad_row_quarantine.md +342 -0
data/docs/basic_read_api.md +152 -9
data/docs/basic_write_api.md +475 -59
data/docs/batch_processing.md +162 -4
data/docs/column_selection.md +184 -0
data/docs/data_transformations.md +163 -29
data/docs/examples.md +340 -46
data/docs/header_transformations.md +94 -12
data/docs/header_validations.md +57 -18
data/docs/history.md +119 -0
data/docs/instrumentation.md +166 -0
data/docs/migrating_from_csv.md +565 -0
data/docs/options.md +151 -87
data/docs/parsing_strategy.md +64 -1
data/docs/real_world_csv.md +263 -0
data/docs/releases/1.16.0/benchmarks.md +223 -0
data/docs/releases/1.16.0/changes.md +273 -0
data/docs/releases/1.16.0/performance_notes.md +114 -0
data/docs/row_col_sep.md +15 -5
data/docs/ruby_csv_pitfalls.md +514 -0
data/docs/value_converters.md +194 -57
data/ext/smarter_csv/extconf.rb +3 -0
data/ext/smarter_csv/smarter_csv.c +1017 -82
data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png +0 -0
data/images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.svg +108 -0
data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.png +0 -0
data/images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg +141 -0
data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.png +0 -0
data/images/SmarterCSV_1.16.0_vs_previous_Rb-speedup.svg +139 -0
data/lib/smarter_csv/errors.rb +8 -0
data/lib/smarter_csv/file_io.rb +1 -1
data/lib/smarter_csv/hash_transformations.rb +14 -13
data/lib/smarter_csv/header_transformations.rb +21 -2
data/lib/smarter_csv/headers.rb +2 -1
data/lib/smarter_csv/options.rb +124 -7
data/lib/smarter_csv/parser.rb +358 -74
data/lib/smarter_csv/reader.rb +494 -46
data/lib/smarter_csv/version.rb +1 -1
data/lib/smarter_csv/writer.rb +71 -19
data/lib/smarter_csv.rb +134 -13
data/smarter_csv.gemspec +20 -10
metadata +38 -80

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 41a8d63c5aea4500d77b4268079521194f0d2d34de2b3e5f2264c48181159273
-  data.tar.gz: 586facc801af166270eebf0ece90949061ccfeaadfa3e7837678cb935e032bcb
+  metadata.gz: 043745aedb1c63fd4a044b9ae46bb8e5d98324c14e609214ee3d895acfd5f501
+  data.tar.gz: c39a10521b767daf51887278c9020c9ff6d8d93c32c5ec3f95a17ec575ebdab5
 SHA512:
-  metadata.gz: ed4072e64c4e66fb5b982dfaffe49d32370b087aa9a1ff689c2f73bfa6450ae275547bb17818ff227e8843834bcb981a8a906b5e7936bbf999f497e89b2cb91d
-  data.tar.gz: 31ecb71b2b50e1bb5f2aa037583550eb878f2e1faf66adf0803c8dcdeafbd52b0fa24c3b78bcc9bcdc3a3c759b53667004541257c32799d08b944a4ed53d9b49
+  metadata.gz: 5f1d125138443f02e0276e964dac9e584b996de6acafe8b3856316852a38220094e69a2c14302f922bcc93b6d23cc594bbf926940ccd70a2bd65ab08c5a18b49
+  data.tar.gz: '0929051996781c8643c0239556d123c840e7041d9c12f7d867e3800dfb2c2eb92e6f6fb77b5fc08660a36f34f470cb826f255943fa445d3e29d231e648da51b4'
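
The SHA256 values above can be checked locally. A minimal sketch, assuming the `.gem` archive has already been unpacked (a `.gem` file is a tar archive containing `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`); the helper name `digest_matches?` is hypothetical:

```ruby
require "digest"

# Hypothetical helper: compare a file's SHA256 against an expected hex digest,
# such as the values listed under `SHA256:` in checksums.yaml.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

For the 1.16.1 release this would be called as `digest_matches?("data.tar.gz", "c39a10521b767daf51887278c9020c9ff6d8d93c32c5ec3f95a17ec575ebdab5")`; `Digest::SHA512` works the same way for the second set of digests.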

data/.rspec CHANGED

@@ -1 +1,3 @@
 --require spec_helper
+--color
+--format documentation

data/.rubocop.yml CHANGED

@@ -133,6 +133,9 @@ Style/SoleNestedConditional:
 Style/SpecialGlobalVars: # DANGER: unsafe rule!!
   Enabled: false
+Style/StderrPuts:
+  Enabled: false # DANGER: unsafe rule!! we DO NOT want warn here
 Style/StringConcatenation:
   Enabled: false
@@ -164,6 +167,12 @@ Style/TrailingUnderscoreVariable:
 Style/TrivialAccessors:
   Enabled: false
+Style/WhileUntilModifier:
+  Enabled: false
+Style/WordArray:
+  Enabled: false
 # Style/UnlessModifier:
 #   Enabled: false
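
The `Style/StderrPuts` cop would rewrite `$stderr.puts` to `warn`, and the two are not interchangeable: `Kernel#warn` prints nothing when warnings are disabled (`$VERBOSE = nil`, for example under `ruby -W0`). That may be the reason the config comment calls the rule unsafe, though the diff does not say so. A small illustration, with a hypothetical helper name:

```ruby
require "stringio"

# Hypothetical helper: run a block with $stderr captured and warnings disabled,
# then return whatever was written to $stderr.
def capture_stderr_with_warnings_off
  old_stderr = $stderr
  old_verbose = $VERBOSE
  $stderr = StringIO.new
  $VERBOSE = nil
  yield
  $stderr.string
ensure
  # Always restore the globals, even if the block raises.
  $stderr = old_stderr
  $VERBOSE = old_verbose
end
```

Calling it with `{ warn "bad row" }` and with `{ $stderr.puts "bad row" }` shows the difference: only the second form still produces output once warnings are switched off.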

data/CHANGELOG.md CHANGED

@@ -1,9 +1,120 @@
 # SmarterCSV 1.x Change Log
+## 1.16.1 (2026-03-16) — Bug Fixes & New Features
+RSpec tests: **1,247 → 1,410** (+163 tests)
+### New Features
+* **`SmarterCSV.errors`** — class-level error access after any `process`, `parse`, `each`, or `each_chunk` call.
+  Exposes the same `reader.errors` hash without requiring access to the `Reader` instance.
+  Errors are cleared at the start of each call and stored per-thread (safe in Puma/Sidekiq).
+  ```ruby
+  # Previously — required Reader instance to access errors
+  reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :skip)
+  reader.process
+  puts reader.errors[:bad_row_count]
+  # Now — works with the class-level API too
+  SmarterCSV.process('data.csv', on_bad_row: :skip)
+  puts SmarterCSV.errors[:bad_row_count]
+  ```
+> **Note:** `SmarterCSV.errors` only surfaces errors from the **most recent run on the
+> current thread**. In a multi-threaded environment (Puma, Sidekiq), each thread maintains
+> its own error state independently. If you call `SmarterCSV.process` twice in the same
+> thread, the second call's errors replace the first's. For long-running or complex
+> pipelines where you need to aggregate errors across multiple files, use the Reader API.
+>
+> ⚠️ **Fibers:** `SmarterCSV.errors` uses `Thread.current` for storage, which is **shared
+> across all fibers running in the same thread**. If you process CSV files concurrently
+> in fibers (e.g. with `Async`, `Falcon`, or manual `Fiber` scheduling), `SmarterCSV.errors`
+> may return stale or wrong results. **Use `SmarterCSV::Reader` directly** — errors are
+> scoped to the reader instance and are always correct regardless of fiber context.
+### Bug Fixes
+* fixed [#325](https://github.com/tilo/smarter_csv/issues/325): `col_sep` in quoted headers was handled incorrectly; Thanks to Paho Lurie-Gregg.
+* fixed issue with quoted numeric fields that were not converted to numeric
+### Tests
+* Added 163 tests covering new features and corner cases
+## 1.16.0 (2026-03-12) — Minor Breaking Change
+[Full details](docs/releases/1.16.0/changes.md) · [Benchmarks](docs/releases/1.16.0/benchmarks.md) · [Performance notes](docs/releases/1.16.0/performance_notes.md)
+RSpec tests: **714 → 1,247** (+533 tests)
+### Minor Breaking Change
+New option **`quote_boundary:`**
+* defaults to `:standard`: quotes are now only recognized as field delimiters at field boundaries;
+  mid-field quotes are treated as literal characters.
+  This aligns SmarterCSV with RFC 4180 and other CSV libraries. In practice, mid-field quotes
+  were already producing silently corrupt output in previous versions — so most users will see
+  correct behavior improve, not regress.
+* Use `quote_boundary: :legacy` only in exceptional cases to restore previous behavior. See [Parsing Strategy](../../parsing_strategy.md).
+### Performance
+ * **1.8×–8.6× faster** than Ruby `CSV.read` (raw tokenization only; no post-processing)
+ * **7×–129× faster** than Ruby `CSV.table` (nearest equivalent output)
+ * **up to 2.4× faster** for accelerated path vs 1.15.2 (15/19 benchmark files faster)
+ * **up to 2× faster** for Ruby path vs 1.15.2
+ * **9×–65× faster** for accelerated path vs 1.14.4
+Measured on 19 benchmark files, Apple M1, Ruby 3.4.7. See [benchmarks](docs/releases/1.16.0/benchmarks.md).
+### New Read API
+ * **`SmarterCSV.parse(csv_string, options)`**: can now parse a CSV string directly. See [Migrating from Ruby CSV](docs/migrating_from_csv.md).
+ * **`SmarterCSV.each` / `Reader#each`**: row-by-row enumerator; `Reader` now includes `Enumerable`.
+ * **`SmarterCSV.each_chunk` / `Reader#each_chunk`**: chunked enumerator yielding `(Array<Hash>, chunk_index)`.
+### New Options
+ * **`on_bad_row:`** — bad row quarantine: `:skip`, `:collect`, `:raise`, or callable. See [Bad Row Quarantine](docs/bad_row_quarantine.md).
+ * **`bad_row_limit: N`** — raises `SmarterCSV::TooManyBadRows` after N bad rows.
+ * **`collect_raw_lines:`** (default: `true`) — include raw line in bad-row error records.
+ * **`field_size_limit: N`** — cap field size in bytes; prevents DoS from unclosed quotes. Raises `SmarterCSV::FieldSizeLimitExceeded`.
+ * **`headers: { only: [...] }` / `headers: { except: [...] }`** — column selection; excluded columns skipped in C hot path. See [Column Selection](docs/column_selection.md).
+ * **`nil_values_matching:`** — replaces deprecated `remove_values_matching:`.
+ * **`missing_headers:`** (default: `:auto`) — replaces deprecated `strict:`.
+ * **`verbose: :quiet/:normal/:debug`** — replaces deprecated `verbose: true/false`.
+ * **`on_start:` / `on_chunk:` / `on_complete:`** — instrumentation hooks. See [Instrumentation](docs/instrumentation.md).
+### New Write API
+ * **IO/StringIO support**: `SmarterCSV.generate` and `Writer.new` now accept any `IO`-compatible object. See [Write API](docs/basic_write_api.md).
+ * **`SmarterCSV.generate` returns a String** when called without a destination argument.
+ * **Streaming mode**: when `headers:` or `map_headers:` is provided upfront, Writer skips the temp file and streams directly.
+ * **`encoding:` / `write_nil_value:` / `write_empty_value:` / `write_bom:`** — new writer options.
+### Deprecations
+ * `remove_values_matching:` → use `nil_values_matching:`
+ * `strict:` → use `missing_headers: :raise/:auto`
+ * `verbose: true/false` → use `verbose: :debug/:normal`
+ * `only_headers:` / `except_headers:` → use `headers: { only: }` / `headers: { except: }`
+### Bug Fixes
+ * **Empty headers** ([#324](https://github.com/tilo/smarter_csv/issues/324), [#312](https://github.com/tilo/smarter_csv/issues/312)): empty/whitespace-only header fields now auto-generate names via `missing_header_prefix`.
+ * **All library output now goes to `$stderr`** — nothing written to `$stdout`.
+ * **`SmarterCSV.generate` raises `ArgumentError`** (not blank `RuntimeError`) when called without a block.
+ * **Writer temp file** no longer hardcoded to `/tmp` (fixes Windows); properly cleaned up with `Tempfile#close!`.
+ * **Writer `StringIO`**: `finalize` no longer attempts to close a caller-owned `StringIO`.
 ## 1.15.2 (2026-02-20)
-* Performance Optimizations
+### Performance Optimizations
  - 1.6× to 7.2× faster than CSV.read
  - 6× to 113× faster than Ruby’s CSV.table
  - 5.4× to 37.4× faster than SmarterCSV 1.14.4 (with C-acceleration)
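
The fiber caveat in the 1.16.1 note depends on which kind of per-thread storage is involved, and this diff excerpt does not show the implementation. As a general Ruby illustration (not SmarterCSV code): values written with `Thread#thread_variable_set` are visible to every fiber on that thread, whereas values written with `Thread.current[:key]` are fiber-local. The key names below are made up for the example:

```ruby
# Plain-Ruby illustration; :demo_errors and :demo_local are hypothetical keys.
def storage_seen_from_fiber
  result = {}
  Thread.new do
    Thread.current.thread_variable_set(:demo_errors, { bad_row_count: 1 })
    Thread.current[:demo_local] = { bad_row_count: 1 }

    Fiber.new do
      # Thread variables are shared by all fibers running on this thread.
      result[:thread_variable] = Thread.current.thread_variable_get(:demo_errors)
      # Thread#[] is fiber-local, so a new fiber does not see the value.
      result[:fiber_local] = Thread.current[:demo_local]
    end.resume
  end.join
  result
end
```

Whichever mechanism applies, error state held on a `SmarterCSV::Reader` instance sidesteps the question, which matches the recommendation in the note.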

data/CONTRIBUTORS.md CHANGED

@@ -1,4 +1,4 @@
-# A Big Thank You to all 59 Contributors!!
+# A Big Thank You to all 61 Contributors!!
 A Big Thank you to everyone who filed issues, sent comments, and who contributed with pull requests:
@@ -62,3 +62,6 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
  * [Felipe Cabezudo](https://github.com/felipekb)
  * [Skye Shaw](https://github.com/sshaw)
  * [Mark Bumiller](https://github.com/makrsmark)
+ * [Tophe](https://github.com/tophe)
+ * [Dom Lebron](https://github.com/biglebronski)
+ * [Paho Lurie-Gregg](https://github.com/paholg)

data/Gemfile CHANGED

@@ -8,6 +8,7 @@ gemspec
 gem "rake"
 gem "rake-compiler"
+gem "awesome_print"
 gem 'pry'
 gem "rubocop"

data/README.md CHANGED

@@ -3,19 +3,27 @@
   ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [View on RubyGems](https://rubygems.org/gems/smarter_csv) [View on RubyToolbox](https://www.ruby-toolbox.com/search?q=smarter_csv)
- SmarterCSV provides a convenient interface for reading and writing CSV files and data.
+  SmarterCSV is a high-performance CSV ingestion and generation library for Ruby, focused on fast end-to-end CSV ingestion of real-world data: no silent failures, no surprises, not just tokenization.
- Unlike traditional CSV parsing methods, SmarterCSV focuses on representing the data for each row as a Ruby hash, which lends itself perfectly for direct use with ActiveRecord, Sidekiq, and JSON stores such as S3. For large files it supports processing CSV data in chunks of array-of-hashes, which allows parallel or batch processing of the data.
+  ⭐ If SmarterCSV saved you hours of import time, please star the repo, and consider sponsoring this project.
- Its powerful interface is designed to simplify and optimize the process of handling CSV data, and allows for highly customizable and efficient data processing by enabling the user to easily map CSV headers to Hash keys, skip unwanted rows, and transform data on-the-fly.
+  Ruby's built-in CSV library has 10 documented failure modes that can silently corrupt or lose data — duplicate headers, blank header cells, extra columns, BOMs, whitespace, encoding issues, and more — all without raising an exception.
+  SmarterCSV handles 8 out of 10 by default, and the remaining 2 with a single option each.
- This results in a more readable, maintainable, and performant codebase. Whether you're dealing with large datasets or complex data transformations, SmarterCSV streamlines CSV operations, making it an invaluable tool for developers seeking to enhance their data processing workflows.
+  > See [**Ruby CSV Pitfalls**](docs/ruby_csv_pitfalls.md) for 10 ways `CSV.read` silently corrupts or loses data, and how SmarterCSV handles them.
-  When writing CSV data to file, it similarly takes arrays of hashes, and converts them to a CSV file.
+  Beyond raw speed, SmarterCSV is designed to provide a significantly more convenient and developer-friendly interface than traditional CSV libraries. Instead of returning raw arrays that require substantial post-processing, SmarterCSV produces Rails-ready hashes for each row, making the data immediately usable with ActiveRecord, Sidekiq pipelines, parallel processing, and JSON-based workflows such as S3.
-One user wrote:
+  The library includes intelligent defaults, automatic detection of column and row separators, and flexible header/value transformations. These features eliminate much of the boilerplate typically required when working with CSV data and help keep ingestion code concise and maintainable.
-  > *Best gem for CSV for us yet. [...] taking an import process from 7+ hours to about 3 minutes. [...] Smarter CSV was a big part and helped clean up our code ALOT*
+  For large files, SmarterCSV supports both chunked processing (arrays of hashes) and streaming via Enumerable APIs, enabling efficient batch jobs and low-memory pipelines. The C acceleration further optimizes the full ingestion path — including parsing, hash construction, and conversions — so performance gains reflect real-world workloads, not just tokenizer benchmarks.
+  The interface is intentionally designed to robustly handle messy real-world CSV while keeping application code clean. Developers can easily map headers, skip unwanted rows, quarantine problematic data, and transform values on the fly without building custom post-processing pipelines. See [Real-World CSV Files](docs/real_world_csv.md) for a comprehensive guide to production CSV patterns.
+  When exporting data, SmarterCSV converts arrays of hashes back into properly formatted CSV, maintaining the same focus on convenience and correctness.
+**User Testimonial:**
+  > "Best gem for CSV for us yet. […] taking an import process from 7+ hours to about 3 minutes. […] SmarterCSV was a big part and helped clean up our code A LOT."
 ## Performance
@@ -25,19 +33,45 @@ SmarterCSV is designed for **real-world CSV processing**, returning fully usable
 For a fair comparison, `CSV.table` is the closest Ruby CSV equivalent to SmarterCSV.
-| Comparison                               | Range                |
-|------------------------------------------|----------------------|
-| vs SmarterCSV 1.14.4 (with acceleration) | 5.4× to 37.4x faster |
-| vs SmarterCSV 1.14.4 (pure Ruby)         | 1.4× to 9.5× faster  |
-| vs CSV.read  (arrays of arrays)          | 1.6x to 7.2x faster  |
-| vs CSV.table (arrays of hashes)          | 6× to 113× faster    |
-| vs ZSV (arrays of hashes)                | 1.4× to 6.3× faster  |
+| Comparison (SmarterCSV 1.16.0, C-accelerated)  | Range                   |
+|-------------------------------------------------|-------------------------|
+| vs SmarterCSV 1.15.2 (with C acceleration)      | up to 2.4× faster       |
+| vs SmarterCSV 1.14.4 (with C acceleration)      | 9×–65× faster           |
+| vs SmarterCSV 1.14.4 (Ruby path)                | 1.7×–10.6× faster       |
+| vs CSV.read  (arrays of arrays)                 | 1.7×–8.6× faster        |
+| vs CSV.table (arrays of hashes)                 | 7×–129× faster          |
+| vs ZSV (arrays of hashes, equiv. output)        | 1.1×–6.6× faster †      |
+† SmarterCSV faster on 15 of 16 files. ZSV raw arrays (no hashes, no conversions) are 2×–14× faster — but that omits the post-processing work needed to produce usable output.
+_Benchmarks: 19 CSV files (20k–80k rows), Ruby 3.4.7, Apple M1._
+![SmarterCSV 1.16.0 vs Ruby CSV 3.3.5 speedup](images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png)
- [More details here](https://tilo-sloboda.medium.com/smartercsv-1-15-2-faster-than-raw-csv-arrays-benchmarks-zsv-and-the-full-pipeline-2c12a798032e) and [here](https://github.com/tilo/smarter_csv/pull/319)
+![SmarterCSV 1.16.0 vs previous versions — C-accelerated path](images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg)
-SmarterCSV also wins 14 of 16 benchmark files head-to-head against ZSV+wrapper (SIMD-accelerated C parser with Ruby wrapper to produce equivalent hash output).
+See [SmarterCSV 1.15.2: Faster Than Raw CSV Arrays](https://tilo-sloboda.medium.com/smartercsv-1-15-2-faster-than-raw-csv-arrays-benchmarks-zsv-and-the-full-pipeline-2c12a798032e) and [PR #319](https://github.com/tilo/smarter_csv/pull/319) for more details.
-_Benchmarks: 16 CSV files (43k–80k rows), Ruby 3.4.7, Apple M1. Memory: 39% less allocated, 43% fewer objects. See [CHANGELOG](./CHANGELOG.md) and [PR #319](https://github.com/tilo/smarter_csv/pull/319) for details._
+## Switching from Ruby CSV?
+It's a one-line change:
+```ruby
+# Before
+rows = CSV.table('data.csv').map(&:to_h)
+# After — up to 129× faster, same symbol keys
+rows = SmarterCSV.process('data.csv')
+```
+`SmarterCSV.parse(string)` works like `CSV.parse(string, headers: true, header_converters: :symbol)` — with numeric conversion included by default:
+```ruby
+data = SmarterCSV.parse(csv_string)
+```
+* See [**Migrating from Ruby CSV**](docs/migrating_from_csv.md) for a full comparison of options, behavior differences, and a quick-reference table.
 ## Examples
@@ -67,6 +101,29 @@ Notice how SmarterCSV automatically (all defaults):
 - Removes empty values → `remove_empty_values: true`
 - Preserves Unicode and emoji characters
+### Header Transformation Pipeline
+Once the header line is read, SmarterCSV normalizes it through these steps:
+```
+comment_regexp → strip_chars_from_headers → split on col_sep → strip quote_char
+    → strip_whitespace → [gsub spaces/dashes→_ → downcase_header]
+    → disambiguate_headers → symbolize → key_mapping
+```
+`user_provided_headers` bypasses all of the above. Each step is individually configurable. See [Header Transformations](docs/header_transformations.md) for the full step-by-step table and options.
+### Value Transformation Pipeline
+After each row is parsed, SmarterCSV applies a transformation pipeline to field values:
+```
+strip_whitespace → nil_values_matching → remove_empty_values → remove_zero_values
+    → convert_values_to_numeric → value_converters → remove_empty_hashes
+```
+Each step is individually configurable. See [Data Transformations](docs/data_transformations.md) and [Value Converters](docs/value_converters.md) for details.
 ### Batch Processing:
 Processing large CSV files in chunks minimizes memory usage and enables powerful workflows:
@@ -86,11 +143,46 @@ end
 # Parallel processing with Sidekiq
 SmarterCSV.process(filename, chunk_size: 100) do |chunk|
-  MyWorker.perform_async(chunk)  # each chunk processed in parallel
+  Sidekiq::Client.push_bulk('class' => MyWorker, 'args' => chunk) # each chunk processed in parallel
+end
+```
+### Modern Enumerator API:
+`Reader#each` is the modern, idiomatic way to process rows — `Reader` includes `Enumerable`, so all standard Ruby methods work:
+```ruby
+reader = SmarterCSV::Reader.new('data.csv', options)
+reader.each { |hash| MyModel.upsert(hash) }
+# Enumerable methods
+active = reader.select { |h| h[:status] == 'active' }
+names  = reader.map    { |h| h[:name] }
+# Lazy — stop early without reading the whole file
+first_ten = reader.lazy.select { |h| h[:active] }.first(10)
+# Manual batching without chunk_size
+reader.each_slice(500) { |batch| MyModel.insert_all(batch) }
+```
+### Bad Row Handling:
+SmarterCSV can quarantine malformed rows instead of crashing the entire import:
+```ruby
+reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
+good_rows = reader.process
+puts "#{good_rows.size} imported, #{reader.errors[:bad_rows].size} bad rows"
+reader.errors[:bad_rows].each do |rec|
+  puts "Line #{rec[:file_line_number]}: #{rec[:error_message]}"
 end
 ```
-See [Examples](docs/examples.md), [Batch Processing](docs/batch_processing.md), and [Configuration Options](docs/options.md) for more.
+See [Bad Row Quarantine](docs/bad_row_quarantine.md) for full details including `bad_row_limit` and `field_size_limit`.
+See [13 Examples](docs/examples.md) for more, including value converters, header validation, writing CSV, encoding handling, and resumable Rails ActiveJob imports.
 ## Requirements
@@ -99,7 +191,7 @@ See [Examples](docs/examples.md), [Batch Processing](docs/batch_processing.md),
 **C Extension:** SmarterCSV includes a native C extension for accelerated CSV parsing.
 The C extension is automatically compiled on MRI Ruby. For JRuby and TruffleRuby, SmarterCSV falls back to a pure Ruby implementation.
-# Installation
+## Installation
 Add this line to your application's Gemfile:
 ```ruby
@@ -114,31 +206,41 @@ Or install it yourself as:
     $ gem install smarter_csv
 ```
-# Documentation
+## Documentation
   * [Introduction](docs/_introduction.md)
+  * [**Migrating from Ruby CSV**](docs/migrating_from_csv.md)
+  * [Ruby CSV Pitfalls](docs/ruby_csv_pitfalls.md)
   * [Parsing Strategy](docs/parsing_strategy.md)
   * [The Basic Read API](docs/basic_read_api.md)
   * [The Basic Write API](docs/basic_write_api.md)
-  * [Batch Processing](./docs/batch_processing.md)
+  * [Batch Processing](docs/batch_processing.md)
   * [Configuration Options](docs/options.md)
   * [Row and Column Separators](docs/row_col_sep.md)
   * [Header Transformations](docs/header_transformations.md)
   * [Header Validations](docs/header_validations.md)
+  * [Column Selection](docs/column_selection.md)
   * [Data Transformations](docs/data_transformations.md)
   * [Value Converters](docs/value_converters.md)
-# Articles
+  * [Bad Row Quarantine](docs/bad_row_quarantine.md)
+  * [Instrumentation Hooks](docs/instrumentation.md)
+  * [Examples](docs/examples.md)
+  * [Real-World CSV Files](docs/real_world_csv.md)
+  * [SmarterCSV over the Years](docs/history.md)
+  * [Release Notes](docs/releases/1.16.0/changes.md)
+## Articles
   * [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
   * [CSV Writing with SmarterCSV](https://tilo-sloboda.medium.com/csv-writing-with-smartercsv-26136d47ad0c)
   * [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
   * [Faster Parsing CSV with Parallel Processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing) by [Jack lin](https://github.com/xjlin0/)
   * The original [Stackoverflow Question](https://stackoverflow.com/questions/7788618/update-mongodb-with-array-from-csv-join-table/7788746#7788746) that inspired SmarterCSV
   * [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) for SmarterCSV
+  * [SmarterCSV over the Years](docs/history.md) — version timeline and performance journey (9×–65× faster than v1.14.4)
 # [ChangeLog](./CHANGELOG.md)
-# Reporting Bugs / Feature Requests
+## Reporting Bugs / Feature Requests
 Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if you have feedback, new feature requests, or want to report a bug. Thank you!
@@ -147,10 +249,10 @@ For reporting issues, please:
   * open a pull-request adding a test that demonstrates the issue
   * mention your version of SmarterCSV, Ruby, Rails
-# [A Special Thanks to all 59 Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
+# [A Special Thanks to all 62 Contributors!](CONTRIBUTORS.md) 🎉🎉🎉
-# Contributing
+## Contributing
 1. Fork it
 2. Create your feature branch (`git checkout -b my-new-feature`)
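
The value transformation pipeline listed in the README diff above applies its steps in a fixed order. The sketch below imitates four of them (`strip_whitespace`, `nil_values_matching`, `remove_empty_values`, `convert_values_to_numeric`) in plain Ruby to show why the order matters. It is a hypothetical stand-in, not SmarterCSV's implementation, and `transform_row` and its `nil_matching:` parameter are invented for this example:

```ruby
# Hypothetical stand-in for a few pipeline steps; not SmarterCSV's code.
def transform_row(row, nil_matching: nil)
  row.each_with_object({}) do |(key, value), out|
    # Step 1: strip surrounding whitespace.
    value = value.strip if value.is_a?(String)

    # Step 2: turn values matching the pattern into nil.
    value = nil if nil_matching && value.is_a?(String) && value.match?(nil_matching)

    # Step 3: drop nil and empty values entirely.
    next if value.nil? || value == ""

    # Step 4: convert numeric-looking strings to Integer or Float.
    if value.is_a?(String)
      value = Integer(value, 10, exception: false) ||
              Float(value, exception: false) ||
              value
    end

    out[key] = value
  end
end
```

Because stripping runs first, a field such as `" 42 "` is converted to the integer `42`, and because nil-matching runs before the empty-value step, a placeholder like `"NULL"` disappears from the resulting hash instead of surviving as a string.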

data/docs/_introduction.md CHANGED

@@ -2,6 +2,8 @@
 ### Contents
   * [**Introduction**](./_introduction.md)
+  * [Migrating from Ruby CSV](./migrating_from_csv.md)
+  * [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md)
   * [Parsing Strategy](./parsing_strategy.md)
   * [The Basic Read API](./basic_read_api.md)
   * [The Basic Write API](./basic_write_api.md)
@@ -10,49 +12,68 @@
   * [Row and Column Separators](./row_col_sep.md)
   * [Header Transformations](./header_transformations.md)
   * [Header Validations](./header_validations.md)
+  * [Column Selection](./column_selection.md)
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
---------------
+  * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Instrumentation Hooks](./instrumentation.md)
+  * [Examples](./examples.md)
+  * [Real-World CSV Files](./real_world_csv.md)
+  * [SmarterCSV over the Years](./history.md)
+  * [Release Notes](./releases/1.16.0/changes.md)
+--------------
 # SmarterCSV Introduction
-`smarter_csv` is a Ruby Gem for convenient reading and writing of CSV files. It has intelligent defaults, and auto-discovery of column and row separators. It imports CSV Files as Array(s) of Hashes, suitable for direct processing with ActiveRecord, kicking-off batch jobs with Sidekiq, parallel processing, or oploading data to S3. Similarly, writing CSV files takes Hashes, or Arrays of Hashes to create a CSV file.
+`smarter_csv` is a Ruby gem for fast & convenient importing and exporting of CSV files. It has intelligent defaults and auto-discovery of column and row separators. Importing returns Rails-ready hashes — suitable for direct use with ActiveRecord, Sidekiq, parallel processing, or S3 workflows. Exporting takes hashes or arrays of hashes and writes properly formatted CSV.
 ## Why another CSV library?
-Ruby's original 'csv' library's API is pretty old, and its processing of CSV-files returning an array-of-array format feels unnecessarily 'close to the metal'. Its output is not easy to use - especially not if you need a data hash to create database records, or JSON from it, or pass it to Sidekiq or S3. Another shortcoming is that Ruby's 'csv' library does not have good support for huge CSV-files, e.g. there is no support for batching and/or parallel processing of the CSV-content (e.g. with Sidekiq jobs).
+**Inconvenient.** Ruby's built-in `csv` library returns arrays of arrays, which means your application code must handle column indexing, header normalization, type conversion, and whitespace stripping manually. It also has no built-in support for chunked or parallel processing of large files.
+**Hidden failure modes.** `CSV.read` has 10 ways to silently corrupt or lose data — no exception, no warning, no log line. Duplicate headers, blank header cells, extra columns, BOMs, whitespace, inconsistent empty-field representation, runaway quoted fields, and encoding issues all fail silently. See [Ruby CSV Pitfalls](./ruby_csv_pitfalls.md) for reproducible examples and the SmarterCSV fix for each.
+**Slow.** On top of everything else, it is up to 129× slower than SmarterCSV for equivalent end-to-end work.
+![SmarterCSV 1.16.0 vs Ruby CSV 3.3.5 speedup](../images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png)
-When SmarterCSV was envisioned, I needed to do nightly imports of very large data sets that came in CSV format, that needed to be upserted into a database, and because of the sheer volume of data needed to be processed in parallel.
-The CSV processing also needed to be robust against variations in the input data.
+SmarterCSV was created to solve exactly these problems: nightly imports of large datasets that needed to be upserted into a database, processed in parallel, and remain robust against real-world variations in input data.
 ## Benefits of using SmarterCSV
-* Improved Robustness:
-  Typically you have little control over the data quality of CSV files that need to be imported. Because SmarterCSV has intelligent defaults and auto-detection of typical formats, this improves the robustness of your CSV imports without having to manually tweak options.
+* **Performance:**
+  SmarterCSV's C extension accelerates the full ingestion pipeline — parsing, hash construction, and value conversions — not just tokenization. Real-world benchmarks against `CSV.table` (the closest equivalent) show 7×–129× faster end-to-end throughput.
-* Easy-to-use Format:
-  By using a Ruby hash to represent a CSV row, SmarterCSV allows you to directly use this data and insert it into a database, or use it with Sidekiq, S3, message queues, etc
+* **Rails-ready output:**
+  Each CSV row is returned as a Ruby hash with symbol keys, numeric conversion, and whitespace stripping applied automatically. No post-processing boilerplate needed — records can be passed directly to `ActiveRecord`, `insert_all`, Sidekiq, message queues, or JSON serializers.
-* Normalized Headers:
-  SmarterCSV automatically transforms CSV headers to Ruby symbols, stripping leading or trailing whitespace.
-  There are many ways to customize the header transformation to your liking. You can re-map CSV headers to hash keys, and you can ignore CSV columns.
+* **Intelligent defaults and robustness:**
+  SmarterCSV auto-detects row and column separators, handles BOMs, strips extra whitespace, and tolerates common real-world inconsistencies — all without manual configuration. This makes imports robust against data you don't fully control, such as user-uploaded files or third-party exports.
-* Normalized Data:
-  SmarterCSV transforms the data in each CSV row automatically, stripping whitespace, converting numerical data into numbers, ignoring nil or empty fields, and more. There are many ways to customize this. You can even add your own value converters.
+* **Flexible header and value transformations:**
+  Headers are automatically downcased, symbolized, and normalized. You can remap or drop columns with `key_mapping`, override headers entirely with `user_provided_headers`, and apply per-field value converters for custom type coercion (dates, booleans, currency, etc.).
-* Batch Processing of large CSV files:
-  Processing large CSV files in chunks, reduces the memory impact and allows for faster / parallel processing.
-  By adding the option `chunk_size: numeric_value`, you can switch to batch processing. SmarterCSV will then return arrays-of-hashes. This makes parallel processing easy: you can pass whole chunks of data to Sidekiq, bulk-insert into a DB, or pass it to other data sinks.
+* **Batch and streaming processing:**
+  `chunk_size` enables memory-efficient batch processing of arbitrarily large files — each chunk is an array of hashes ready for `insert_all`, Sidekiq, or other data sinks. The `Reader#each` enumerator includes `Enumerable`, giving you lazy evaluation, `each_slice`, `select`, `map`, and more.
+* **Bad row quarantine:**
+  Malformed rows can be collected or skipped instead of crashing the entire import. `on_bad_row: :collect` lets you inspect and log bad rows after processing completes.
 ## Additional Features
-* Header Validation:
-  You can validate that a set of hash keys is present in each record after header transformations are applied.
-  This can help ensure importing data with consistent quality.
+* **Header validation:**
+  Use `required_keys` to raise an error before any data rows are processed if expected columns are missing. Works with post-transformation key names, so it's safe to combine with `key_mapping`. See [Header Validations](./header_validations.md).
+* **Instrumentation hooks:**
+  `on_start`, `on_chunk`, and `on_complete` callbacks give you visibility into import progress — useful for logging, progress bars, and alerting in long-running jobs. See [Instrumentation Hooks](./instrumentation.md).
-* Data Validations
-  (planned feature)
+* **Resumable imports:**
+  The `chunk_index` parameter pairs naturally with Rails 8.1's `ActiveJob::Continuable` for jobs that can pause and resume mid-import without reprocessing already-completed chunks. See [Examples](./examples.md#example-12-resumable-csv-import-with-rails-activejob-rails-81).
+* **CSV writing:**
+  `SmarterCSV.generate` writes arrays of hashes to CSV, with support for header renaming and value converters on output. See [The Basic Write API](./basic_write_api.md).
 ---------------
-PREVIOUS [README](../README.md) | NEXT: [Parsing Strategy](./parsing_strategy.md)
+NEXT: [Migrating from Ruby CSV](./migrating_from_csv.md) | UP: [README](../README.md)
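
Both the README and the introduction note that `Reader` includes `Enumerable`. As a general Ruby illustration of what that buys, the toy class below defines only `#each` and gets `select`, `map`, `lazy`, and `each_slice` for free. `TinyReader` is hypothetical and uses a naive comma split with no quoting support; it is not how SmarterCSV parses:

```ruby
# Hypothetical toy reader; it only demonstrates what including Enumerable provides.
class TinyReader
  include Enumerable

  def initialize(text)
    @text = text
  end

  # Enumerable needs nothing more than #each, which yields one hash per data row.
  def each
    return enum_for(:each) unless block_given?

    headers = nil
    @text.each_line(chomp: true) do |line|
      fields = line.split(",") # naive split: no quoted fields
      if headers.nil?
        headers = fields.map(&:to_sym)
      else
        yield headers.zip(fields).to_h
      end
    end
  end
end
```

With `reader = TinyReader.new("name,status\nann,active\nbob,inactive\n")`, calls such as `reader.select { |h| h[:status] == "active" }`, `reader.lazy.map { |h| h[:name] }.first(1)`, and `reader.each_slice(500)` all work without any further code in the class.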