RubyGems - smarter_csv - Versions diffs - 1.16.4 → 1.17.0 - Mend

smarter_csv 1.16.4 → 1.17.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (46)

checksums.yaml +4 -4
data/.rubocop.yml +10 -1
data/CHANGELOG.md +54 -0
data/Gemfile +10 -5
data/README.md +98 -14
data/TO_DO.md +109 -0
data/docs/_introduction.md +1 -0
data/docs/bad_row_quarantine.md +2 -1
data/docs/basic_read_api.md +6 -1
data/docs/basic_write_api.md +30 -0
data/docs/batch_processing.md +25 -0
data/docs/column_selection.md +1 -0
data/docs/data_transformations.md +1 -0
data/docs/examples.md +126 -0
data/docs/header_transformations.md +23 -0
data/docs/header_validations.md +1 -0
data/docs/history.md +1 -0
data/docs/instrumentation.md +2 -1
data/docs/migrating_from_csv.md +1 -0
data/docs/options.md +20 -18
data/docs/parsing_strategy.md +1 -0
data/docs/real_world_csv.md +51 -1
data/docs/releases/1.16.0/performance_notes.md +15 -15
data/docs/releases/1.17.0/benchmarks.md +121 -0
data/docs/releases/1.17.0/changes.md +161 -0
data/docs/releases/1.17.0/performance_notes.md +126 -0
data/docs/row_col_sep.md +21 -1
data/docs/ruby_csv_pitfalls.md +1 -0
data/docs/value_converters.md +24 -0
data/docs/warnings.md +141 -0
data/ext/smarter_csv/smarter_csv.c +98 -32
data/images/SmarterCSV_1.17.0_vs_RubyCSV_3.3.5_speedup.svg +106 -0
data/images/SmarterCSV_1.17.0_vs_previous_C-speedup.svg +181 -0
data/images/SmarterCSV_1.17.0_vs_previous_Rb-speedup.svg +179 -0
data/lib/smarter_csv/auto_detection.rb +215 -30
data/lib/smarter_csv/file_io.rb +2 -2
data/lib/smarter_csv/hash_transformations.rb +29 -13
data/lib/smarter_csv/parser.rb +42 -33
data/lib/smarter_csv/peekable_io.rb +453 -0
data/lib/smarter_csv/reader.rb +119 -23
data/lib/smarter_csv/reader_options.rb +61 -1
data/lib/smarter_csv/version.rb +1 -1
data/lib/smarter_csv.rb +40 -12
metadata +12 -5
data/TO_DO_v2.md +0 -14
data/ext/smarter_csv/Makefile +0 -270

data/docs/examples.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [**Examples**](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)
@@ -43,6 +44,12 @@
 11. [Batch Processing with Sidekiq](#example-11-batch-processing-with-sidekiq)
 12. [Resumable CSV Import with Rails ActiveJob](#example-12-resumable-csv-import-with-rails-activejob-rails-81)
 13. [Instrumentation](#example-13-instrumentation)
+14. [Streaming Inputs (Non-Seekable IO)](#example-14-streaming-inputs-non-seekable-io)
+15. [Resumable Import (Plain Ruby)](#example-15-resumable-import-plain-ruby)
+16. [CSV Files with Comment Lines](#example-16-csv-files-with-comment-lines)
+17. [Tab-Separated Values (TSV)](#example-17-tab-separated-values-tsv)
+18. [Multi-Line Fields](#example-18-multi-line-fields)
+19. [Filtering and Transforming a CSV File](#example-19-filtering-and-transforming-a-csv-file)
 ---
@@ -369,5 +376,124 @@ SmarterCSV.process('large_import.csv',
 See [Instrumentation Hooks](./instrumentation.md).
+---
+## Example 14: Streaming Inputs (Non-Seekable IO)
+*(1.17.0+)* SmarterCSV reads from gzipped files, HTTP responses, S3 objects, or piped STDIN — no need to materialize the file on disk first.
+```ruby
+require 'zlib'
+Zlib::GzipReader.open('huge.csv.gz') do |io|
+  SmarterCSV.process(io) { |row| MyModel.upsert(row.first) }
+end
+```
+See [Real-World CSV Files → I/O Patterns](./real_world_csv.md#io-patterns) for gzip, S3, HTTP, STDIN, and `IO.popen` worked examples.
+---
+## Example 15: Resumable Import (Plain Ruby)
+A non-Rails counterpart to Example 12 — track the chunk cursor in a JSON file so an interrupted import resumes where it left off.
+See [Batch Processing → Resumable Import (Plain Ruby)](./batch_processing.md#example-resumable-import-plain-ruby) for the worked example.
+---
+## Example 16: CSV Files with Comment Lines
+Strip lines matching a pattern (e.g. `#`-prefixed comments in DB dumps and log exports) using `comment_regexp`:
+```ruby
+SmarterCSV.process('data.csv', comment_regexp: /\A#/)
+```
+See [Header Transformations → CSV Files with Comment Lines](./header_transformations.md#csv-files-with-comment-lines) for the worked example.
+---
+## Example 17: Tab-Separated Values (TSV)
+```ruby
+SmarterCSV.process('data.tsv')                  # auto-detected
+SmarterCSV.process('data.tsv', col_sep: "\t")   # explicit
+```
+See [Row and Column Separators → Tab-Separated Values (TSV)](./row_col_sep.md#tab-separated-values-tsv) for details.
+---
+## Example 18: Multi-Line Fields
+Newlines inside `"..."` are preserved as part of the field — common in addresses, CRM notes, and free-text comments. No configuration needed.
+See [Real-World CSV Files → Multi-Line Quoted Fields](./real_world_csv.md#multi-line-quoted-fields) for the worked example.
+---
+## Example 19: Filtering and Transforming a CSV File
+The Ruby CSV library has `CSV.filter` for "read CSV, mutate each row, write CSV." In SmarterCSV this is a two-line composition of `SmarterCSV.each` and `SmarterCSV.generate`:
+```ruby
+SmarterCSV.generate('out.csv') do |csv|
+  SmarterCSV.each('in.csv') do |row|
+    row[:price] = (row[:price] * 1.1).round(2)
+    row.delete(:internal_notes)
+    csv << row
+  end
+end
+```
+The explicit `csv << row` is the win over `CSV.filter` — emission is intentional, not a side effect of mutating the block argument.
+### Pipeline (STDIN → STDOUT)
+```ruby
+# cat in.csv | ruby filter.rb > out.csv
+SmarterCSV.generate($stdout) do |csv|
+  SmarterCSV.each($stdin) { |row| csv << row }
+end
+```
+### Skipping rows
+```ruby
+SmarterCSV.generate('out.csv') do |csv|
+  SmarterCSV.each('in.csv') do |row|
+    next if row[:status] == 'archived'   # just skip — no emit
+    csv << row
+  end
+end
+```
+### Compressed in, compressed out
+```ruby
+require 'zlib'
+Zlib::GzipWriter.open('out.csv.gz') do |gz_out|
+  SmarterCSV.generate(gz_out) do |csv|
+    Zlib::GzipReader.open('in.csv.gz') do |gz_in|
+      SmarterCSV.each(gz_in) { |row| csv << row }
+    end
+  end
+end
+```
+Both endpoints are non-seekable streams — a pattern `CSV.filter` cannot handle, since it requires seekable input/output.
+### Header renaming on the way through
+```ruby
+SmarterCSV.generate('out.csv', headers: [:given_name, :family_name, :email]) do |csv|
+  SmarterCSV.each('in.csv',
+    key_mapping: { first_name: :given_name, last_name: :family_name }
+  ) { |row| csv << row }
+end
+```
+Use `key_mapping:` on the read side to rename columns and `headers:` on the write side to enforce output column order.
 --------------------
 PREVIOUS: [Instrumentation Hooks](./instrumentation.md) | NEXT: [Real-World CSV Files](./real_world_csv.md) | UP: [README](../README.md)
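
Example 15 in the hunk above describes tracking the chunk cursor in a JSON file so an interrupted import can resume, but the worked example lives in another file. The following is a minimal sketch of that idea in plain Ruby, using only the standard library. The helper name `resumable_each_chunk`, the `next_chunk` key, and the cursor file layout are assumptions made for illustration and are not taken from the gem; chunks are simulated with `each_slice` rather than read by SmarterCSV.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper, not part of SmarterCSV. Chunks are simulated with
# each_slice; in a real import they would come from the CSV reader.
def resumable_each_chunk(rows, chunk_size:, cursor_path:)
  # Index of the first chunk that still needs processing (0 on a fresh run).
  next_chunk =
    if File.exist?(cursor_path)
      JSON.parse(File.read(cursor_path)).fetch('next_chunk')
    else
      0
    end

  rows.each_slice(chunk_size).with_index do |chunk, index|
    next if index < next_chunk # finished in an earlier run

    yield chunk, index
    # Persist the cursor only after the chunk succeeded.
    File.write(cursor_path, JSON.generate({ 'next_chunk' => index + 1 }))
  end
end

rows     = (1..7).map { |i| { id: i } }
cursor   = File.join(Dir.mktmpdir, 'cursor.json')
imported = []

begin
  resumable_each_chunk(rows, chunk_size: 3, cursor_path: cursor) do |chunk, index|
    raise 'simulated crash' if index == 1 # first run dies on the second chunk

    imported << chunk.map { |r| r[:id] }
  end
rescue RuntimeError
  # The cursor file still points at chunk 1.
end

# Second run skips chunk 0 and picks up at chunk 1.
resumable_each_chunk(rows, chunk_size: 3, cursor_path: cursor) do |chunk, _index|
  imported << chunk.map { |r| r[:id] }
end

p imported
```

Writing the cursor only after the block returns means a crash mid-chunk replays that chunk on the next run, so the per-chunk work should be idempotent (an upsert, for instance).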

data/docs/header_transformations.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [Examples](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)
@@ -61,6 +62,28 @@ See [Configuration Options](./options.md) for full option reference.
 ---
+## CSV Files with Comment Lines
+Strip comment lines anywhere in the file — including before the header — using `comment_regexp`:
+```ruby
+$ cat data.csv
+# Generated 2026-01-15 by exporter v3.2
+# Confidential — internal use only
+id,name,amount
+1,Alice,100
+2,Bob,200
+# end of file
+data = SmarterCSV.process('data.csv', comment_regexp: /\A#/)
+# => [{id: 1, name: "Alice", amount: 100},
+#     {id: 2, name: "Bob",   amount: 200}]
+```
+Common in database dumps, log exports, and pipelines that prepend provenance metadata. The regexp is applied per line — any line matching is dropped before parsing.
+---
 ## Header Normalization
 When processing the headers, it transforms them into Ruby symbols, stripping extra spaces, lower-casing them and replacing spaces with underscores. e.g. " \t Annual Sales  " becomes `:annual_sales`. (see Notes below)
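
The `comment_regexp` hunk above states that the regexp is applied per line and that matching lines are dropped before parsing. A rough stdlib-only sketch of that behavior follows; `strip_comment_lines` is a hypothetical helper written for illustration and says nothing about how the gem implements the option internally.

```ruby
require 'stringio'

# Hypothetical helper, not part of SmarterCSV: drops every line that
# matches the pattern before any CSV parsing happens.
def strip_comment_lines(io, comment_regexp)
  io.each_line.reject { |line| line.match?(comment_regexp) }
end

sample = <<~CSV
  # Generated 2026-01-15 by exporter v3.2
  id,name,amount
  1,Alice,100
  # end of file
CSV

# Comment lines before the header and after the data are both removed.
puts strip_comment_lines(StringIO.new(sample), /\A#/)
```

Because `\A` anchors at the start of each line string, a `#` that appears inside a field later in a row does not cause that row to be dropped.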

data/docs/header_validations.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [Examples](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)

data/docs/history.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [Examples](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)

data/docs/instrumentation.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [**Instrumentation Hooks**](./instrumentation.md)
   * [Examples](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)
@@ -163,4 +164,4 @@ SmarterCSV.process(file, on_start: ON_START, on_complete: ON_COMPLETE)
 ```
 --------------------
-PREVIOUS: [Bad Row Quarantine](./bad_row_quarantine.md) | NEXT: [Examples](./examples.md) | UP: [README](../README.md)
+PREVIOUS: [Warnings](./warnings.md) | NEXT: [Examples](./examples.md) | UP: [README](../README.md)

data/docs/migrating_from_csv.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [Examples](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)

data/docs/options.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [Examples](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)
@@ -52,27 +53,28 @@
 ### File Input & Encoding
-| Option | Default | Explanation |
-|--------|---------|-------------|
-| `:file_encoding` | `utf-8` | Set the file encoding, e.g. `'windows-1252'` or `'iso-8859-1'`. |
-| `:invalid_byte_sequence` | `''` | What to replace invalid byte sequences with. |
-| `:force_utf8` | `false` | Force UTF-8 encoding of all lines (including headers) in the CSV file. |
+| Option                   | Default | Explanation                                                            |
+|--------------------------|---------|------------------------------------------------------------------------|
+| `:file_encoding`         | `utf-8` | Set the file encoding, e.g. `'windows-1252'` or `'iso-8859-1'`.        |
+| `:invalid_byte_sequence` | `''`    | What to replace invalid byte sequences with.                           |
+| `:force_utf8`            | `false` | Force UTF-8 encoding of all lines (including headers) in the CSV file. |
 ### File Layout
-| Option | Default | Explanation |
-|--------|---------|-------------|
-| `:skip_lines` | `nil` | How many lines to skip before the first line or header line is processed. |
-| `:comment_regexp` | `nil` | Regular expression to ignore comment lines (e.g. `/\A#/`). See NOTE on CSV header. |
-| `:chunk_size` | `nil` | If set, data is yielded in chunks of this many rows instead of all at once. Use with `SmarterCSV.each_chunk` for memory-efficient batch processing. |
+| Option            | Default | Explanation                                                                                                                                         |
+|-------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
+| `:skip_lines`     | `nil`   | How many lines to skip before the first line or header line is processed.                                                                           |
+| `:comment_regexp` | `nil`   | Regular expression to ignore comment lines (e.g. `/\A#/`). See NOTE on CSV header.                                                                  |
+| `:chunk_size`     | `nil`   | If set, data is yielded in chunks of this many rows instead of all at once. Use with `SmarterCSV.each_chunk` for memory-efficient batch processing. |
 ### Separators
 | Option | Default | Explanation |
 |--------|---------|-------------|
 | `:col_sep` | `:auto` | Column separator. `:auto` detects from file content (previous default was `','`). |
-| `:row_sep` | `:auto` | Row / record separator. `:auto` detects from file content. Manual detection reads the whole file first (slow on large files). |
-| `:auto_row_sep_chars` | `500` | How many characters to analyze when using `:row_sep => :auto`. `nil` or `0` means whole file. |
+| `:row_sep` | `:auto` | Row / record separator. `:auto` detects from file content by scanning in chunks of `auto_row_sep_chars` bytes, up to a 64KB hard cap. |
+| `:auto_row_sep_chars` | `4096` | Initial scan size for `:row_sep => :auto` detection. Scan stops as soon as one separator has a clear majority, up to a 64KB cap. Bump this if your files have very wide headers or long comment preambles. Out-of-range values, `nil`, or `0` fall back to the default with a warning. |
+| `:buffer_size` | `16_384` | Peek buffer chunk size for non-seekable inputs (pipes, gzip readers, HTTP/S3 bodies). Out-of-range values warn and clamp to the supported range. Has no effect on seekable inputs (file paths, `File`, `StringIO`, `Tempfile`). |
 ### Quoting
@@ -121,8 +123,8 @@ See [Parsing Strategy](./parsing_strategy.md) for full details on quote handling
 | `:strip_whitespace` | `true` | Remove whitespace before/after values and headers. |
 | `:convert_values_to_numeric` | `true` | Convert strings containing integers or floats to the appropriate numeric type. Accepts `{except: [:key1, :key2]}` or `{only: :key3}` to limit which columns. |
 | `:value_converters` | `nil` | Hash of `:header => converter`; converter can be a lambda/Proc or a class implementing `self.convert(value)`. See [Value Converters](./value_converters.md). |
-| `:remove_empty_values` | `true` | Remove key/value pairs where the value is `nil` or an empty string. |
-| `:remove_zero_values` | `false` | Remove key/value pairs where the numeric value equals zero. |
+| `:remove_empty_values` | `true` | Remove key/value pairs where the value is `nil`, empty, or whitespace-only — any Unicode whitespace, same as Ruby's `String#blank?`. |
+| `:remove_zero_values` | `false` | Remove key/value pairs whose value is zero — numeric `0` / `0.0`, or any textual form of zero (`"0"`, `"0.0"`, `"00.00"`, `"+0"`, `"-0.0"`, …). |
 | `:nil_values_matching` | `nil` | Set matching values to `nil`. Accepts a regular expression matched against the string representation of each value (e.g. `/\ANAN\z/` for NaN, `/\A#VALUE!\z/` for Excel errors). With `remove_empty_values: true` (default), nil-ified values are then removed. With `remove_empty_values: false`, the key is retained with a `nil` value. |
 | `:remove_empty_hashes` | `true` | Remove result hashes that have no key/value pairs or all-empty values. |
@@ -142,7 +144,7 @@ See [Bad Row Quarantine](./bad_row_quarantine.md) for full details.
 | Option | Default | Explanation |
 |--------|---------|-------------|
 | `:with_line_numbers` | `false` | Add `:csv_line_number` to each result hash. |
-| `:verbose` | `:normal` | Controls warning and diagnostic output. Accepted values:<br>• `:quiet` — suppress all warnings and notices (recommended for production)<br>• `:normal` — show behavioral warnings, e.g. auto-configuration notices **(default)**<br>• `:debug` — `:normal` + print computed options and per-row diagnostics to stderr<br>`nil` is silently treated as `:normal`. Passing `true` or `false` still works but is deprecated — see below. |
+| `:verbose` | `:normal` | Controls warning and diagnostic output. Accepted values:<br>• `:quiet` — suppress all warnings and notices (recommended for production)<br>• `:normal` — show behavioral warnings, e.g. auto-configuration notices **(default)**<br>• `:debug` — `:normal` + print computed options and per-row diagnostics to stderr<br>`nil` is silently treated as `:normal`. Passing `true` or `false` still works but is deprecated — see below. See [Warnings](./warnings.md) for the structured warning collection. |
 ### Instrumentation Hooks
@@ -156,9 +158,9 @@ See [Instrumentation Hooks](./instrumentation.md) for full details and payload r
 ### Performance
-| Option | Default | Explanation |
-|--------|---------|-------------|
-| `:acceleration` | `true` | Use the C extension for parsing (MRI Ruby only). Set to `false` to force the pure-Ruby fallback (always used on JRuby/TruffleRuby). |
+| Option            | Default | Explanation                                                                                                                         |
+|-------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------|
+| `:acceleration`   | `true`  | Use the C extension for parsing (MRI Ruby only). Set to `false` to force the pure-Ruby fallback (always used on JRuby/TruffleRuby). |
 ---
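
The updated `:remove_empty_values` and `:remove_zero_values` rows above widen what counts as "empty" (whitespace-only, including Unicode whitespace) and "zero" (textual forms such as `"00.00"` or `"-0.0"`). The sketch below shows one way such checks could be written. The two regular expressions and the helper names are assumptions for illustration only; the gem's actual matching rules may differ. Note that `String#blank?`, which the option text refers to, is provided by ActiveSupport rather than Ruby core, so this sketch avoids it.

```ruby
# Assumed patterns, for illustration only.
BLANK_RE = /\A[[:space:]]*\z/     # empty or whitespace-only (Unicode-aware)
ZERO_RE  = /\A[+-]?0+(\.0+)?\z/   # "0", "0.0", "00.00", "+0", "-0.0"

# True for nil and for strings that are empty or whitespace-only.
def blank_value?(value)
  value.nil? || (value.is_a?(String) && value.match?(BLANK_RE))
end

# True for numeric zero and for strings that spell zero.
def zero_value?(value)
  return value.zero? if value.is_a?(Numeric)

  value.is_a?(String) && value.match?(ZERO_RE)
end

row = { id: 7, qty: '00.00', note: "\u00A0 ", delta: '-0.0', name: 'Bob', count: 0 }

# Mimics remove_empty_values: true combined with remove_zero_values: true.
cleaned = row.reject { |_key, value| blank_value?(value) || zero_value?(value) }
p cleaned
```

Only `id` and `name` survive here: `note` holds a non-breaking space plus a regular space, which `[[:space:]]` treats as whitespace.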

data/docs/parsing_strategy.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [Examples](./examples.md)
   * [Real-World CSV Files](./real_world_csv.md)

data/docs/real_world_csv.md CHANGED

@@ -16,6 +16,7 @@
   * [Data Transformations](./data_transformations.md)
   * [Value Converters](./value_converters.md)
   * [Bad Row Quarantine](./bad_row_quarantine.md)
+  * [Warnings](./warnings.md)
   * [Instrumentation Hooks](./instrumentation.md)
   * [Examples](./examples.md)
   * [**Real-World CSV Files**](./real_world_csv.md)
@@ -186,10 +187,59 @@ Numeric conversion is one of the most common sources of data loss. SmarterCSV co
 ### I/O Patterns
+SmarterCSV accepts any IO-compatible source — file paths, open `File` handles, `StringIO`, and **non-seekable streams** like pipes, `STDIN`, and `Zlib::GzipReader`. Auto-detection of `row_sep` / `col_sep` works on streaming sources too thanks to internal buffering — the underlying source never needs to support `rewind` or `seek`. (Streaming IO support landed in 1.17.0.)
 | Source | Issue | Status | Notes |
 |--------|-------|--------|-------|
-| Gzipped CSV (`.csv.gz`) | Compressed file | 🔘 | Decompress and pass the resulting IO object: `SmarterCSV.process(Zlib::GzipReader.open(path))`. |
+| Gzipped CSV (`.csv.gz`) | Compressed, non-seekable stream | 🔘 | `SmarterCSV.process(Zlib::GzipReader.open(path))` — no need to decompress to disk first. |
 | HTTP streaming | Parsing from a live HTTP response | 🔘 | Pass any IO-compatible object that responds to `#gets`. |
+| `STDIN` / shell pipes | Non-seekable input | 🔘 | `cat data.csv \| ruby -rsmarter_csv -e 'SmarterCSV.process(STDIN) { \|h\| ... }'` |
+| `IO.popen` output | Non-seekable subprocess stream | 🔘 | `IO.popen('zcat data.csv.gz') { \|io\| SmarterCSV.process(io) }` |
+| S3 object body | Non-seekable HTTP stream | 🔘 | `SmarterCSV.process(s3.get_object(...).body)` — see worked example below. |
+#### Streaming Inputs
+```ruby
+# Gzipped CSV — stream-decompressed, never written to disk
+require 'zlib'
+Zlib::GzipReader.open('huge.csv.gz') do |io|
+  SmarterCSV.process(io) { |row| MyModel.upsert(row.first) }
+end
+# STDIN / pipes
+SmarterCSV.process($stdin) { |row, _| MyModel.upsert(row.first) }
+# HTTP response body
+require 'open-uri'
+URI.open('https://example.com/data.csv') { |io| SmarterCSV.process(io) }
+# S3 — stream the response body directly
+require 'aws-sdk-s3'
+obj = Aws::S3::Client.new.get_object(bucket: 'data', key: 'imports/users.csv')
+SmarterCSV::Reader.new(obj.body, chunk_size: 500).each_chunk do |chunk, _index|
+  MyModel.insert_all(chunk)
+end
+# Subprocess output
+IO.popen('zcat data.csv.gz') { |io| SmarterCSV.process(io) }
+```
+#### Multi-Line Quoted Fields
+Newlines inside `"..."` are preserved as part of the field — useful for address blocks, CRM notes, and free-text comments. No configuration needed:
+```ruby
+$ cat addresses.csv
+id,name,address
+1,Alice,"123 Main St
+Apt 4B
+Brooklyn, NY 11201"
+2,Bob,"42 Elm Ave"
+data = SmarterCSV.process('addresses.csv')
+# => [{id: 1, name: "Alice", address: "123 Main St\nApt 4B\nBrooklyn, NY 11201"},
+#     {id: 2, name: "Bob",   address: "42 Elm Ave"}]
+```
 †: Legacy Apple DB Dump and older UNIX data dumps use ASCII control characters as delimiters:
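
The I/O Patterns hunk above says separator auto-detection works on non-seekable sources thanks to internal buffering, so the source never needs `rewind` or `seek`. A minimal sketch of that general technique is shown below: peek at the start of the stream, keep the peeked data in a buffer, and serve it back before reading further. `TinyPeekableIO` is a hypothetical class written for this illustration; it is not the gem's `peekable_io.rb`, and it assumes `"\n"` row separators and a naive count-based separator guess.

```ruby
# Hypothetical wrapper for illustration; NOT the gem's implementation.
class TinyPeekableIO
  def initialize(io)
    @io = io
    @buffer = +''
  end

  # Return up to n characters from the front of the stream without
  # consuming them. Data is pulled line by line and kept in @buffer.
  def peek(n)
    while @buffer.length < n
      line = @io.gets
      break if line.nil?

      @buffer << line
    end
    @buffer[0, n]
  end

  # Serve buffered lines first, then fall through to the source.
  def gets
    return @io.gets if @buffer.empty?

    idx = @buffer.index("\n")
    idx ? @buffer.slice!(0..idx) : @buffer.slice!(0..-1)
  end
end

# A pipe is a real non-seekable source.
reader, writer = IO.pipe
writer.write("id;name\n1;Alice\n2;Bob\n")
writer.close

io      = TinyPeekableIO.new(reader)
sample  = io.peek(16)
col_sep = [',', ';', "\t"].max_by { |candidate| sample.count(candidate) }

rows = []
while (line = io.gets)
  rows << line.chomp.split(col_sep)
end
p col_sep, rows
```

The peeked lines are replayed by `gets`, so the header row is still the first line the parser sees even though the pipe was never rewound.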

data/docs/releases/1.16.0/performance_notes.md CHANGED

@@ -45,14 +45,14 @@ rows with type conversion applied. SmarterCSV/C is dramatically faster:
 ### C path
-| Gain         | Files                                                               |
-|--------------|---------------------------------------------------------------------|
-| **2.4×**     | long_fields — biggest win; `memchr` skip-ahead in quoted fields     |
-| **1.5×**     | heavy_quoting — same skip-ahead benefit                             |
-| **1.4×**     | tab_separated                                                       |
+| Gain         | Files                                                                       |
+|--------------|-----------------------------------------------------------------------------|
+| **2.4×**     | long_fields — biggest win; `memchr` skip-ahead in quoted fields             |
+| **1.5×**     | heavy_quoting — same skip-ahead benefit                                     |
+| **1.4×**     | tab_separated                                                               |
 | **1.2–1.3×** | embedded_sep, utf8, PEOPLE_IMPORT_C/NC, worldcities, whitespace, multi_char |
-| **1.1–1.2×** | PEOPLE_IMPORT_B/NB, uszips, sample_10M, wide_500_cols               |
-| **~1.0×**    | sensor_data, embedded_newlines (within noise)                       |
+| **1.1–1.2×** | PEOPLE_IMPORT_B/NB, uszips, sample_10M, wide_500_cols                       |
+| **~1.0×**    | sensor_data, embedded_newlines (within noise)                               |
 15 of 19 files are measurably faster; 2 within noise; 2 files show a small regression
 (PEOPLE_IMPORT_NB −7%, wide_500_cols −5%) attributable to the new `quote_boundary: :standard`
@@ -60,11 +60,11 @@ default adding one extra state check on the unquoted fast path.
 ### Ruby path
-| Gain         | Files                                                               |
-|--------------|---------------------------------------------------------------------|
+| Gain         | Files                                                                             |
+|--------------|-----------------------------------------------------------------------------------|
 | **1.9×**     | PEOPLE_IMPORT_C (117 cols) — direct hash construction bypasses intermediate Array |
-| **1.5×**     | PEOPLE_IMPORT_NC, multi_char_sep                                    |
-| **1.0–1.1×** | most other files                                                    |
+| **1.5×**     | PEOPLE_IMPORT_NC, multi_char_sep                                                  |
+| **1.0–1.1×** | most other files                                                                  |
 The Ruby path gains are concentrated on wide/complex files where the direct-hash
 construction optimization (Opt #11) has the most impact.
@@ -106,9 +106,9 @@ are skipped entirely in the C hot path — no string allocation, no conversion,
 insertion. Benchmark on `wide_500_cols_20k.csv` (500 columns):
 | Columns kept | Speedup vs no selection |
-|---|---|
-| 2 of 500  | ~16× faster |
-| 10 of 500 | ~8× faster  |
-| 50 of 500 | ~3× faster  |
+|--------------|-------------------------|
+|    2 of 500  |             ~16× faster |
+|   10 of 500  |             ~8× faster  |
+|   50 of 500  |             ~3× faster  |
 This is additive on top of the baseline gains above.

data/docs/releases/1.17.0/benchmarks.md ADDED

@@ -0,0 +1,121 @@
+# SmarterCSV 1.17.0 — Benchmark Results
+- **Date:** 2026-05-06
+- **Ruby:** 3.4.7 [arm64-darwin25] on Apple M1 Pro
+- **SmarterCSV:** 1.17.0
+- **Versions compared:** 1.14.4, 1.15.2, 1.16.4, 1.17.0
+- **Ruby CSV:** 3.3.5
+- **Methodology:** best of 40 measured runs (2 warm-up)
+- **Raw data files:**
+  - [`2026-05-06_1250_ruby3.4.7.md`](2026-05-06_1250_ruby3.4.7.md) / [`.json`](2026-05-06_1250_ruby3.4.7.json) — version comparison (1.14.4 / 1.15.2 / 1.16.4 / 1.17.0)
+  - [`2026-05-06_1511_ruby3.4.7.md`](2026-05-06_1511_ruby3.4.7.md) / [`.json`](2026-05-06_1511_ruby3.4.7.json) — vs Ruby CSV 3.3.5
+See [performance_notes.md](performance_notes.md) for analysis of these numbers.
+---
+## SmarterCSV C accelerated — version comparison
+| File                             |   Rows | v1.14.4    | v1.15.2   | v1.16.4   | v1.17.0   | newest vs oldest |
+|----------------------------------|--------|------------|-----------|-----------|-----------|------------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  1.6175s   |  0.1049s  |  0.0867s  |  0.0872s  | 18.54× faster    |
+| PEOPLE_IMPORT_C.csv              |  50000 |  8.0347s   |  0.2055s  |  0.1763s  |  0.1746s  | 46.02× faster    |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  1.5629s   |  0.0994s  |  0.0694s  |  0.0708s  | 22.08× faster    |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  1.4679s   |  0.0855s  |  0.0711s  |  0.0705s  | 20.83× faster    |
+| uscities.csv                     |  31257 |  1.0357s   |  0.1129s  |  0.0878s  |  0.0819s  | 12.64× faster    |
+| uszips.csv                       |  33782 |  1.2419s   |  0.1121s  |  0.0880s  |  0.0879s  | 14.13× faster    |
+| worldcities.csv                  |  48059 |  1.0420s   |  0.1174s  |  0.0861s  |  0.0773s  | 13.49× faster    |
+| embedded_newlines_20k.csv        |  80000 |  0.5337s   |  0.0633s  |  0.0591s  |  0.0545s  |  9.80× faster    |
+| embedded_separators_20k.csv      |  20000 |  0.2761s   |  0.0328s  |  0.0215s  |  0.0214s  | 12.90× faster    |
+| heavy_quoting_20k.csv            |  20000 |  0.5129s   |  0.0561s  |  0.0364s  |  0.0358s  | 14.34× faster    |
+| long_fields_20k.csv              |  20000 |  2.9215s   |  0.1082s  |  0.0464s  |  0.0392s  | 74.54× faster    |
+| many_empty_fields_20k.csv        |  20000 |  0.3885s   |  0.0314s  |  0.0240s  |  0.0262s  | 14.81× faster    |
+| multi_char_separator_20k.csv     |  20000 |  0.5305s   |  0.0340s  |  0.0272s  |  0.0296s  | 17.90× faster    |
+| sample_10M.csv                   |  50000 |  0.4513s   |  0.0619s  |  0.0480s  |  0.0446s  | 10.11× faster    |
+| sensor_data_50krows_50cols.csv   |  50000 |  3.8704s   |  0.2714s  |  0.2559s  |  0.2549s  | 15.19× faster    |
+| tab_separated_20k.tsv            |  20000 |  0.4496s   |  0.0337s  |  0.0255s  |  0.0256s  | 17.54× faster    |
+| utf8_multibyte_20k.csv           |  20000 |  0.2233s   |  0.0210s  |  0.0152s  |  0.0149s  | 14.96× faster    |
+| whitespace_heavy_20k.csv         |  20000 |  0.5244s   |  0.0349s  |  0.0250s  |  0.0286s  | 18.34× faster    |
+| wide_500_cols_20k.csv            |  20000 | 17.3477s   |  1.2805s  |  1.2798s  |  1.2701s  | 13.66× faster    |
+## SmarterCSV Ruby path — version comparison
+| File                             |   Rows | v1.14.4    | v1.15.2   | v1.16.4   | v1.17.0   | newest vs oldest |
+|----------------------------------|--------|------------|-----------|-----------|-----------|------------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  4.5718s   |  0.5635s  |  0.5272s  |  0.4971s  |  9.20× faster    |
+| PEOPLE_IMPORT_C.csv              |  50000 | 26.0194s   |  2.5511s  |  1.3401s  |  1.3328s  | 19.52× faster    |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  4.4999s   |  0.5268s  |  0.4757s  |  0.4791s  |  9.39× faster    |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  4.3233s   |  0.5752s  |  0.3989s  |  0.4017s  | 10.76× faster    |
+| uscities.csv                     |  31257 |  2.6702s   |  1.8124s  |  1.0662s  |  1.0944s  |  2.44× faster    |
+| uszips.csv                       |  33782 |  3.1853s   |  2.1641s  |  1.3332s  |  1.3434s  |  2.37× faster    |
+| worldcities.csv                  |  48059 |  2.8397s   |  1.8978s  |  1.0910s  |  1.0909s  |  2.60× faster    |
+| embedded_newlines_20k.csv        |  80000 |  0.9578s   |  0.4629s  |  0.4291s  |  0.4314s  |  2.22× faster    |
+| embedded_separators_20k.csv      |  20000 |  0.7074s   |  0.4535s  |  0.2748s  |  0.2748s  |  2.57× faster    |
+| heavy_quoting_20k.csv            |  20000 |  1.4361s   |  0.8598s  |  0.5241s  |  0.5273s  |  2.72× faster    |
+| long_fields_20k.csv              |  20000 |  8.8715s   |  4.7839s  |  2.5696s  |  2.5624s  |  3.46× faster    |
+| many_empty_fields_20k.csv        |  20000 |  0.8635s   |  0.2521s  |  0.1680s  |  0.1664s  |  5.19× faster    |
+| multi_char_separator_20k.csv     |  20000 |  1.4172s   |  0.2463s  |  0.1853s  |  0.1879s  |  7.54× faster    |
+| sample_10M.csv                   |  50000 |  1.0547s   |  0.2388s  |  0.2238s  |  0.2211s  |  4.77× faster    |
+| sensor_data_50krows_50cols.csv   |  50000 |  8.9445s   |  1.8246s  |  1.8348s  |  1.8181s  |  4.92× faster    |
+| tab_separated_20k.tsv            |  20000 |  1.2664s   |  0.1596s  |  0.1553s  |  0.1536s  |  8.24× faster    |
+| utf8_multibyte_20k.csv           |  20000 |  0.6484s   |  0.1124s  |  0.1068s  |  0.1066s  |  6.08× faster    |
+| whitespace_heavy_20k.csv         |  20000 |  1.5513s   |  0.1613s  |  0.1654s  |  0.1610s  |  9.63× faster    |
+| wide_500_cols_20k.csv            |  20000 | 44.5782s   |  7.2023s  |  6.9748s  |  6.9261s  |  6.44× faster    |
+---
+## SmarterCSV 1.17.0 vs Ruby CSV 3.3.5 — full results
+| File                             |   Rows | CSV.read¹  | CSV.hashes¹ | SmarterCSV/C  | SmarterCSV/Rb |
+|----------------------------------|--------|------------|-------------|---------------|---------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  0.2718s   |  0.7750s    |  0.0673s      |  0.5034s      |
+| PEOPLE_IMPORT_C.csv              |  50000 |  1.4111s   |  8.0199s    |  0.1907s      |  1.4032s      |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  0.2659s   |  0.7603s    |  0.0638s      |  0.4800s      |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  0.2860s   |  0.9173s    |  0.0630s      |  0.4132s      |
+| uscities.csv                     |  31257 |  0.5640s   |  0.8803s    |  0.0789s      |  1.1120s      |
+| uszips.csv                       |  33782 |  0.7414s   |  1.1604s    |  0.0929s      |  1.3645s      |
+| worldcities.csv                  |  48059 |  0.6313s   |  0.9906s    |  0.0794s      |  1.0945s      |
+| embedded_newlines_20k.csv        |  80000 |  0.1693s   |  0.2245s    |  0.0554s      |  0.4451s      |
+| embedded_separators_20k.csv      |  20000 |  0.1312s   |  0.1838s    |  0.0206s      |  0.2830s      |
+| heavy_quoting_20k.csv            |  20000 |  0.1167s   |  0.2410s    |  0.0338s      |  0.5400s      |
+| long_fields_20k.csv              |  20000 |  0.2373s   |  0.2762s    |  0.0392s      |  2.6172s      |
+| many_empty_fields_20k.csv        |  20000 |  0.1145s   |  0.3622s    |  0.0216s      |  0.1727s      |
+| multi_char_separator_20k.csv     |  20000 |  0.0890s   |  0.2122s    |  0.0293s      |  0.1662s      |
+| sample_10M.csv                   |  50000 |  0.1685s   |  0.3012s    |  0.0357s      |  0.2361s      |
+| sensor_data_50krows_50cols.csv   |  50000 |  0.5655s   |  2.6744s    |  0.2442s      |  1.8878s      |
+| tab_separated_20k.tsv            |  20000 |  0.0832s   |  0.2029s    |  0.0219s      |  0.1651s      |
+| utf8_multibyte_20k.csv           |  20000 |  0.0662s   |  0.1427s    |  0.0156s      |  0.1138s      |
+| whitespace_heavy_20k.csv         |  20000 |  0.0890s   |  0.2169s    |  0.0278s      |  0.1670s      |
+| wide_500_cols_20k.csv            |  20000 |  2.3351s   | 32.4002s    |  1.2823s      |  7.3504s      |
+## Ruby CSV 3.3.5 vs SmarterCSV 1.17.0 (C accelerated)
+| File                             |   Rows | CSV.read¹     | CSV.hashes¹   |
+|----------------------------------|--------|---------------|---------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  4.04× slower | 11.51× slower |
+| PEOPLE_IMPORT_C.csv              |  50000 |  7.40× slower | 42.04× slower |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  4.17× slower | 11.92× slower |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  4.54× slower | 14.55× slower |
+| uscities.csv                     |  31257 |  7.15× slower | 11.16× slower |
+| uszips.csv                       |  33782 |  7.98× slower | 12.50× slower |
+| worldcities.csv                  |  48059 |  7.95× slower | 12.48× slower |
+| embedded_newlines_20k.csv        |  80000 |  3.05× slower |  4.05× slower |
+| embedded_separators_20k.csv      |  20000 |  6.36× slower |  8.91× slower |
+| heavy_quoting_20k.csv            |  20000 |  3.46× slower |  7.14× slower |
+| long_fields_20k.csv              |  20000 |  6.05× slower |  7.04× slower |
+| many_empty_fields_20k.csv        |  20000 |  5.29× slower | 16.73× slower |
+| multi_char_separator_20k.csv     |  20000 |  3.04× slower |  7.25× slower |
+| sample_10M.csv                   |  50000 |  4.72× slower |  8.43× slower |
+| sensor_data_50krows_50cols.csv   |  50000 |  2.32× slower | 10.95× slower |
+| tab_separated_20k.tsv            |  20000 |  3.80× slower |  9.28× slower |
+| utf8_multibyte_20k.csv           |  20000 |  4.24× slower |  9.14× slower |
+| whitespace_heavy_20k.csv         |  20000 |  3.20× slower |  7.81× slower |
+| wide_500_cols_20k.csv            |  20000 |  1.82× slower | 25.27× slower |
+---
+¹ **Raw output** — no post-processing applied. Returns plain arrays or string-keyed hashes. No header normalization, type conversion, whitespace stripping, or empty-value removal. Your own post-processing must be added to produce usable data.
+---
+PREVIOUS: [Performance Notes](./performance_notes.md) | UP: [README](../../../README.md)
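
The ratio columns in the benchmark tables above ("newest vs oldest", "× slower") are not defined in the diff. The sketch below assumes they are simply the slower time divided by the faster time; that formula is an inference, not something the source states. Because the published timings are rounded to four decimals, recomputing from them can differ from the published ratio in the last digit.

```ruby
# Assumed formula: ratio = slower time / faster time, rounded to two decimals.
def speedup(slow_seconds, fast_seconds)
  (slow_seconds / fast_seconds).round(2)
end

# Timings copied from the PEOPLE_IMPORT_B.csv rows of the tables above.
puts speedup(1.6175, 0.0872) # v1.14.4 vs v1.17.0, C path (table: 18.54x faster)
puts speedup(0.2718, 0.0673) # CSV.read vs SmarterCSV/C   (table: 4.04x slower)
```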