RubyGems - smarter_csv - Versions diffs - 1.17.0.pre5 → 1.17.1 - Mend

smarter_csv 1.17.0.pre5 → 1.17.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

checksums.yaml +4 -4
data/.rubocop.yml +3 -0
data/CHANGELOG.md +47 -5
data/Gemfile +10 -5
data/README.md +94 -13
data/TO_DO.md +109 -0
data/docs/basic_read_api.md +4 -0
data/docs/basic_write_api.md +29 -0
data/docs/batch_processing.md +24 -0
data/docs/examples.md +125 -0
data/docs/header_transformations.md +22 -0
data/docs/options.md +17 -16
data/docs/real_world_csv.md +46 -1
data/docs/releases/1.16.0/performance_notes.md +15 -15
data/docs/releases/1.17.0/benchmarks.md +121 -0
data/docs/releases/1.17.0/changes.md +161 -0
data/docs/releases/1.17.0/performance_notes.md +126 -0
data/docs/row_col_sep.md +20 -1
data/docs/warnings.md +22 -0
data/ext/smarter_csv/smarter_csv.c +120 -42
data/images/SmarterCSV_1.17.0_vs_RubyCSV_3.3.5_speedup.svg +106 -0
data/images/SmarterCSV_1.17.0_vs_previous_C-speedup.svg +181 -0
data/images/SmarterCSV_1.17.0_vs_previous_Rb-speedup.svg +179 -0
data/lib/smarter_csv/auto_detection.rb +169 -25
data/lib/smarter_csv/hash_transformations.rb +29 -13
data/lib/smarter_csv/parser.rb +42 -33
data/lib/smarter_csv/peekable_io.rb +23 -2
data/lib/smarter_csv/reader.rb +9 -15
data/lib/smarter_csv/reader_options.rb +58 -11
data/lib/smarter_csv/version.rb +1 -1
data/lib/smarter_csv.rb +1 -1
metadata +10 -5
data/TO_DO_v2.md +0 -20
data/ext/smarter_csv/Makefile +0 -270

data/docs/options.md CHANGED

@@ -53,19 +53,19 @@
 ### File Input & Encoding
-| Option | Default | Explanation |
-|--------|---------|-------------|
-| `:file_encoding` | `utf-8` | Set the file encoding, e.g. `'windows-1252'` or `'iso-8859-1'`. |
-| `:invalid_byte_sequence` | `''` | What to replace invalid byte sequences with. |
-| `:force_utf8` | `false` | Force UTF-8 encoding of all lines (including headers) in the CSV file. |
+| Option                   | Default | Explanation                                                            |
+|--------------------------|---------|------------------------------------------------------------------------|
+| `:file_encoding`         | `utf-8` | Set the file encoding, e.g. `'windows-1252'` or `'iso-8859-1'`.        |
+| `:invalid_byte_sequence` | `''`    | What to replace invalid byte sequences with.                           |
+| `:force_utf8`            | `false` | Force UTF-8 encoding of all lines (including headers) in the CSV file. |
 ### File Layout
-| Option | Default | Explanation |
-|--------|---------|-------------|
-| `:skip_lines` | `nil` | How many lines to skip before the first line or header line is processed. |
-| `:comment_regexp` | `nil` | Regular expression to ignore comment lines (e.g. `/\A#/`). See NOTE on CSV header. |
-| `:chunk_size` | `nil` | If set, data is yielded in chunks of this many rows instead of all at once. Use with `SmarterCSV.each_chunk` for memory-efficient batch processing. |
+| Option            | Default | Explanation                                                                                                                                         |
+|-------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
+| `:skip_lines`     | `nil`   | How many lines to skip before the first line or header line is processed.                                                                           |
+| `:comment_regexp` | `nil`   | Regular expression to ignore comment lines (e.g. `/\A#/`). See NOTE on CSV header.                                                                  |
+| `:chunk_size`     | `nil`   | If set, data is yielded in chunks of this many rows instead of all at once. Use with `SmarterCSV.each_chunk` for memory-efficient batch processing. |
 ### Separators
@@ -73,7 +73,8 @@
 |--------|---------|-------------|
 | `:col_sep` | `:auto` | Column separator. `:auto` detects from file content (previous default was `','`). |
 | `:row_sep` | `:auto` | Row / record separator. `:auto` detects from file content by scanning in chunks of `auto_row_sep_chars` bytes, up to a 64KB hard cap. |
-| `:auto_row_sep_chars` | `8192` | Chunk size used while scanning for `:row_sep => :auto`. Detection stops as soon as one separator has a clear majority, with a 64KB hard cap. Must be an Integer ≥ 8192; smaller values, `nil`, or `0` are rejected and fall back to the default with a warning. |
+| `:auto_row_sep_chars` | `4096` | Initial scan size for `:row_sep => :auto` detection. Scan stops as soon as one separator has a clear majority, up to a 64KB cap. Bump this if your files have very wide headers or long comment preambles. Out-of-range values, `nil`, or `0` fall back to the default with a warning. |
+| `:buffer_size` | `16_384` | Peek buffer chunk size for non-seekable inputs (pipes, gzip readers, HTTP/S3 bodies). Out-of-range values warn and clamp to the supported range. Has no effect on seekable inputs (file paths, `File`, `StringIO`, `Tempfile`). |
 ### Quoting
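The chunked scan described for `:auto_row_sep_chars` can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the gem's detection code: `guess_row_sep` and `HARD_CAP` are hypothetical names, only `\r\n`, `\n`, and `\r` are considered, and "clear majority" is assumed to mean one separator outnumbering all others combined.

```ruby
require 'stringio'

HARD_CAP = 64 * 1024 # total bytes this sketch will ever scan (the 64KB cap)

# Hypothetical helper, not part of SmarterCSV.
# Reads the stream in chunks and returns the dominant row separator, or nil.
def guess_row_sep(io, chunk_size: 4096)
  counts  = Hash.new(0)
  scanned = 0
  carry   = ''

  while scanned < HARD_CAP
    chunk = io.read([chunk_size, HARD_CAP - scanned].min)
    break if chunk.nil? # end of input

    scanned += chunk.bytesize
    data = carry + chunk
    # Hold back a trailing "\r" so a "\r\n" split across two chunks is counted once.
    carry = data.end_with?("\r") ? data.slice!(-1) : ''
    data.scan(/\r\n|\n|\r/) { |sep| counts[sep] += 1 }

    # Stop early once one separator outnumbers all others combined.
    leader, top = counts.max_by { |_, n| n }
    return leader if leader && top > counts.values.sum - top
  end

  # A held-back "\r" at end of input is a lone "\r" terminator.
  counts[carry] += 1 unless carry.empty?
  leader, top = counts.max_by { |_, n| n }
  return nil if leader.nil? || counts.values.count(top) > 1 # nothing found, or a tie

  leader
end

crlf = guess_row_sep(StringIO.new("a,b\r\n1,2\r\n"), chunk_size: 2)
lone = guess_row_sep(StringIO.new("a,b\r1,2\r"))
none = guess_row_sep(StringIO.new("abc"))
```

With the small `chunk_size: 2` in the first call, the `\r\n` pair straddles a chunk boundary, which is the case the `carry` variable exists for.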
@@ -122,8 +123,8 @@ See [Parsing Strategy](./parsing_strategy.md) for full details on quote handling
 | `:strip_whitespace` | `true` | Remove whitespace before/after values and headers. |
 | `:convert_values_to_numeric` | `true` | Convert strings containing integers or floats to the appropriate numeric type. Accepts `{except: [:key1, :key2]}` or `{only: :key3}` to limit which columns. |
 | `:value_converters` | `nil` | Hash of `:header => converter`; converter can be a lambda/Proc or a class implementing `self.convert(value)`. See [Value Converters](./value_converters.md). |
-| `:remove_empty_values` | `true` | Remove key/value pairs where the value is `nil` or an empty string. |
-| `:remove_zero_values` | `false` | Remove key/value pairs where the numeric value equals zero. |
+| `:remove_empty_values` | `true` | Remove key/value pairs where the value is `nil`, empty, or whitespace-only (any Unicode whitespace, matching ActiveSupport's `String#blank?`). |
+| `:remove_zero_values` | `false` | Remove key/value pairs whose value is zero — numeric `0` / `0.0`, or any textual form of zero (`"0"`, `"0.0"`, `"00.00"`, `"+0"`, `"-0.0"`, …). |
 | `:nil_values_matching` | `nil` | Set matching values to `nil`. Accepts a regular expression matched against the string representation of each value (e.g. `/\ANAN\z/` for NaN, `/\A#VALUE!\z/` for Excel errors). With `remove_empty_values: true` (default), nil-ified values are then removed. With `remove_empty_values: false`, the key is retained with a `nil` value. |
 | `:remove_empty_hashes` | `true` | Remove result hashes that have no key/value pairs or all-empty values. |
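The widened matching rules for `:remove_empty_values` and `:remove_zero_values` can be approximated in plain Ruby. This is a sketch of the documented behavior only: `blank_value?`, `zero_value?`, and `ZERO_TEXT` are hypothetical names, and the regular expression is an assumption about which textual forms count as zero, not the gem's own pattern.

```ruby
# Hypothetical helpers, not SmarterCSV internals.
ZERO_TEXT = /\A[+-]?0+(?:\.0+)?\z/ # "0", "0.0", "00.00", "+0", "-0.0"

# nil, empty, or made up entirely of (Unicode) whitespace.
def blank_value?(value)
  value.nil? || (value.is_a?(String) && value.match?(/\A[[:space:]]*\z/))
end

# Numeric zero, or a string that spells zero.
def zero_value?(value)
  case value
  when Numeric then value.zero?
  when String  then value.match?(ZERO_TEXT)
  else false
  end
end

row  = { id: 7, note: " \u00A0", balance: "00.00", score: 0.0, name: "Ada" }
kept = row.reject { |_, v| blank_value?(v) || zero_value?(v) }
```

Here `note` holds a space plus a non-breaking space, so it is treated as blank, and both `balance` and `score` are treated as zero; only `id` and `name` remain in `kept`.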
@@ -157,9 +158,9 @@ See [Instrumentation Hooks](./instrumentation.md) for full details and payload r
 ### Performance
-| Option | Default | Explanation |
-|--------|---------|-------------|
-| `:acceleration` | `true` | Use the C extension for parsing (MRI Ruby only). Set to `false` to force the pure-Ruby fallback (always used on JRuby/TruffleRuby). |
+| Option            | Default | Explanation                                                                                                                         |
+|-------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------|
+| `:acceleration`   | `true`  | Use the C extension for parsing (MRI Ruby only). Set to `false` to force the pure-Ruby fallback (always used on JRuby/TruffleRuby). |
 ---

data/docs/real_world_csv.md CHANGED

@@ -187,7 +187,7 @@ Numeric conversion is one of the most common sources of data loss. SmarterCSV co
 ### I/O Patterns
-SmarterCSV accepts any IO-compatible source — file paths, open `File` handles, `StringIO`, and **non-seekable streams** like pipes, `STDIN`, and `Zlib::GzipReader`. Auto-detection of `row_sep` / `col_sep` works on streaming sources too: SmarterCSV captures the first bytes in an internal peek buffer and replays them, so the underlying source never needs to support `rewind` or `seek`. (Streaming IO support landed in 1.17.0.)
+SmarterCSV accepts any IO-compatible source — file paths, open `File` handles, `StringIO`, and **non-seekable streams** like pipes, `STDIN`, and `Zlib::GzipReader`. Auto-detection of `row_sep` / `col_sep` works on streaming sources too thanks to internal buffering — the underlying source never needs to support `rewind` or `seek`. (Streaming IO support landed in 1.17.0.)
 | Source | Issue | Status | Notes |
 |--------|-------|--------|-------|
@@ -195,6 +195,51 @@ SmarterCSV accepts any IO-compatible source — file paths, open `File` handles,
 | HTTP streaming | Parsing from a live HTTP response | 🔘 | Pass any IO-compatible object that responds to `#gets`. |
 | `STDIN` / shell pipes | Non-seekable input | 🔘 | `cat data.csv \| ruby -rsmarter_csv -e 'SmarterCSV.process(STDIN) { \|h\| ... }'` |
 | `IO.popen` output | Non-seekable subprocess stream | 🔘 | `IO.popen('zcat data.csv.gz') { \|io\| SmarterCSV.process(io) }` |
+| S3 object body | Non-seekable HTTP stream | 🔘 | `SmarterCSV.process(s3.get_object(...).body)` — see worked example below. |
+#### Streaming Inputs
+```ruby
+# Gzipped CSV — stream-decompressed, never written to disk
+require 'zlib'
+Zlib::GzipReader.open('huge.csv.gz') do |io|
+  SmarterCSV.process(io) { |row| MyModel.upsert(row.first) }
+end
+# STDIN / pipes
+SmarterCSV.process($stdin) { |row, _| MyModel.upsert(row.first) }
+# HTTP response body
+require 'open-uri'
+URI.open('https://example.com/data.csv') { |io| SmarterCSV.process(io) }
+# S3 — stream the response body directly
+require 'aws-sdk-s3'
+obj = Aws::S3::Client.new.get_object(bucket: 'data', key: 'imports/users.csv')
+SmarterCSV::Reader.new(obj.body, chunk_size: 500).each_chunk do |chunk, _index|
+  MyModel.insert_all(chunk)
+end
+# Subprocess output
+IO.popen('zcat data.csv.gz') { |io| SmarterCSV.process(io) }
+```
+#### Multi-Line Quoted Fields
+Newlines inside `"..."` are preserved as part of the field — useful for address blocks, CRM notes, and free-text comments. No configuration needed:
+```ruby
+# addresses.csv contains:
+#   id,name,address
+#   1,Alice,"123 Main St
+#   Apt 4B
+#   Brooklyn, NY 11201"
+#   2,Bob,"42 Elm Ave"
+data = SmarterCSV.process('addresses.csv')
+# => [{id: 1, name: "Alice", address: "123 Main St\nApt 4B\nBrooklyn, NY 11201"},
+#     {id: 2, name: "Bob",   address: "42 Elm Ave"}]
+```
 †: Legacy Apple DB Dump and older UNIX data dumps use ASCII control characters as delimiters:
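The "internal buffering" that lets auto-detection work on non-seekable sources can be sketched as a small wrapper that remembers the bytes it has peeked at and hands them back before reading further. This is a minimal illustration, assuming a source that responds to `read` and `gets`; the `PeekBuffer` class is hypothetical and is not the gem's `peekable_io.rb`. It also ignores string encodings and returns binary-tagged lines.

```ruby
# Hypothetical wrapper, not SmarterCSV's implementation.
class PeekBuffer
  def initialize(io, buffer_size: 16_384)
    @io = io
    @buffer_size = buffer_size
    @buffer = String.new # binary-tagged scratch buffer
  end

  # Return up to `bytes` from the start of the stream without consuming them.
  def peek(bytes)
    while @buffer.bytesize < bytes
      chunk = @io.read(@buffer_size)
      break if chunk.nil? # source exhausted
      @buffer << chunk
    end
    @buffer.byteslice(0, bytes)
  end

  # Serve lines from the buffered bytes first, then fall through to the source.
  def gets(sep = "\n")
    return @io.gets(sep) if @buffer.empty?

    if (idx = @buffer.index(sep))
      @buffer.slice!(0, idx + sep.length)
    else
      # The current line continues past the buffered bytes.
      rest = @buffer + (@io.gets(sep) || '').b
      @buffer = String.new
      rest
    end
  end
end

reader, writer = IO.pipe # a pipe cannot rewind or seek
writer.write("id;name\r\n1;Ada\r\n")
writer.close

source = PeekBuffer.new(reader, buffer_size: 4)
head   = source.peek(10) # enough bytes to inspect separators
lines  = []
while (line = source.gets("\r\n"))
  lines << line
end
```

The tiny `buffer_size: 4` is only there to force several reads; after `peek`, both lines still come back complete even though the pipe itself was never rewound.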

data/docs/releases/1.16.0/performance_notes.md CHANGED

@@ -45,14 +45,14 @@ rows with type conversion applied. SmarterCSV/C is dramatically faster:
 ### C path
-| Gain         | Files                                                               |
-|--------------|---------------------------------------------------------------------|
-| **2.4×**     | long_fields — biggest win; `memchr` skip-ahead in quoted fields     |
-| **1.5×**     | heavy_quoting — same skip-ahead benefit                             |
-| **1.4×**     | tab_separated                                                       |
+| Gain         | Files                                                                       |
+|--------------|-----------------------------------------------------------------------------|
+| **2.4×**     | long_fields — biggest win; `memchr` skip-ahead in quoted fields             |
+| **1.5×**     | heavy_quoting — same skip-ahead benefit                                     |
+| **1.4×**     | tab_separated                                                               |
 | **1.2–1.3×** | embedded_sep, utf8, PEOPLE_IMPORT_C/NC, worldcities, whitespace, multi_char |
-| **1.1–1.2×** | PEOPLE_IMPORT_B/NB, uszips, sample_10M, wide_500_cols               |
-| **~1.0×**    | sensor_data, embedded_newlines (within noise)                       |
+| **1.1–1.2×** | PEOPLE_IMPORT_B/NB, uszips, sample_10M, wide_500_cols                       |
+| **~1.0×**    | sensor_data, embedded_newlines (within noise)                               |
 15 of 19 files are measurably faster; 2 within noise; 2 files show a small regression
 (PEOPLE_IMPORT_NB −7%, wide_500_cols −5%) attributable to the new `quote_boundary: :standard`
@@ -60,11 +60,11 @@ default adding one extra state check on the unquoted fast path.
 ### Ruby path
-| Gain         | Files                                                               |
-|--------------|---------------------------------------------------------------------|
+| Gain         | Files                                                                             |
+|--------------|-----------------------------------------------------------------------------------|
 | **1.9×**     | PEOPLE_IMPORT_C (117 cols) — direct hash construction bypasses intermediate Array |
-| **1.5×**     | PEOPLE_IMPORT_NC, multi_char_sep                                    |
-| **1.0–1.1×** | most other files                                                    |
+| **1.5×**     | PEOPLE_IMPORT_NC, multi_char_sep                                                  |
+| **1.0–1.1×** | most other files                                                                  |
 The Ruby path gains are concentrated on wide/complex files where the direct-hash
 construction optimization (Opt #11) has the most impact.
@@ -106,9 +106,9 @@ are skipped entirely in the C hot path — no string allocation, no conversion,
 insertion. Benchmark on `wide_500_cols_20k.csv` (500 columns):
 | Columns kept | Speedup vs no selection |
-|---|---|
-| 2 of 500  | ~16× faster |
-| 10 of 500 | ~8× faster  |
-| 50 of 500 | ~3× faster  |
+|--------------|-------------------------|
+|    2 of 500  |             ~16× faster |
+|   10 of 500  |             ~8× faster  |
+|   50 of 500  |             ~3× faster  |
 This is additive on top of the baseline gains above.

data/docs/releases/1.17.0/benchmarks.md ADDED

@@ -0,0 +1,121 @@
+# SmarterCSV 1.17.0 — Benchmark Results
+- **Date:** 2026-05-06
+- **Ruby:** 3.4.7 [arm64-darwin25] on Apple M1 Pro
+- **SmarterCSV:** 1.17.0
+- **Versions compared:** 1.14.4, 1.15.2, 1.16.4, 1.17.0
+- **Ruby CSV:** 3.3.5
+- **Methodology:** best of 40 measured runs (2 warm-up)
+- **Raw data files:**
+  - [`2026-05-06_1250_ruby3.4.7.md`](2026-05-06_1250_ruby3.4.7.md) / [`.json`](2026-05-06_1250_ruby3.4.7.json) — version comparison (1.14.4 / 1.15.2 / 1.16.4 / 1.17.0)
+  - [`2026-05-06_1511_ruby3.4.7.md`](2026-05-06_1511_ruby3.4.7.md) / [`.json`](2026-05-06_1511_ruby3.4.7.json) — vs Ruby CSV 3.3.5
+See [performance_notes.md](performance_notes.md) for analysis of these numbers.
+---
+## SmarterCSV C accelerated — version comparison
+| File                             |   Rows | v1.14.4    | v1.15.2   | v1.16.4   | v1.17.0   | newest vs oldest |
+|----------------------------------|--------|------------|-----------|-----------|-----------|------------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  1.6175s   |  0.1049s  |  0.0867s  |  0.0872s  | 18.54× faster    |
+| PEOPLE_IMPORT_C.csv              |  50000 |  8.0347s   |  0.2055s  |  0.1763s  |  0.1746s  | 46.02× faster    |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  1.5629s   |  0.0994s  |  0.0694s  |  0.0708s  | 22.08× faster    |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  1.4679s   |  0.0855s  |  0.0711s  |  0.0705s  | 20.83× faster    |
+| uscities.csv                     |  31257 |  1.0357s   |  0.1129s  |  0.0878s  |  0.0819s  | 12.64× faster    |
+| uszips.csv                       |  33782 |  1.2419s   |  0.1121s  |  0.0880s  |  0.0879s  | 14.13× faster    |
+| worldcities.csv                  |  48059 |  1.0420s   |  0.1174s  |  0.0861s  |  0.0773s  | 13.49× faster    |
+| embedded_newlines_20k.csv        |  80000 |  0.5337s   |  0.0633s  |  0.0591s  |  0.0545s  |  9.80× faster    |
+| embedded_separators_20k.csv      |  20000 |  0.2761s   |  0.0328s  |  0.0215s  |  0.0214s  | 12.90× faster    |
+| heavy_quoting_20k.csv            |  20000 |  0.5129s   |  0.0561s  |  0.0364s  |  0.0358s  | 14.34× faster    |
+| long_fields_20k.csv              |  20000 |  2.9215s   |  0.1082s  |  0.0464s  |  0.0392s  | 74.54× faster    |
+| many_empty_fields_20k.csv        |  20000 |  0.3885s   |  0.0314s  |  0.0240s  |  0.0262s  | 14.81× faster    |
+| multi_char_separator_20k.csv     |  20000 |  0.5305s   |  0.0340s  |  0.0272s  |  0.0296s  | 17.90× faster    |
+| sample_10M.csv                   |  50000 |  0.4513s   |  0.0619s  |  0.0480s  |  0.0446s  | 10.11× faster    |
+| sensor_data_50krows_50cols.csv   |  50000 |  3.8704s   |  0.2714s  |  0.2559s  |  0.2549s  | 15.19× faster    |
+| tab_separated_20k.tsv            |  20000 |  0.4496s   |  0.0337s  |  0.0255s  |  0.0256s  | 17.54× faster    |
+| utf8_multibyte_20k.csv           |  20000 |  0.2233s   |  0.0210s  |  0.0152s  |  0.0149s  | 14.96× faster    |
+| whitespace_heavy_20k.csv         |  20000 |  0.5244s   |  0.0349s  |  0.0250s  |  0.0286s  | 18.34× faster    |
+| wide_500_cols_20k.csv            |  20000 | 17.3477s   |  1.2805s  |  1.2798s  |  1.2701s  | 13.66× faster    |
+## SmarterCSV Ruby path — version comparison
+| File                             |   Rows | v1.14.4    | v1.15.2   | v1.16.4   | v1.17.0   | newest vs oldest |
+|----------------------------------|--------|------------|-----------|-----------|-----------|------------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  4.5718s   |  0.5635s  |  0.5272s  |  0.4971s  |  9.20× faster    |
+| PEOPLE_IMPORT_C.csv              |  50000 | 26.0194s   |  2.5511s  |  1.3401s  |  1.3328s  | 19.52× faster    |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  4.4999s   |  0.5268s  |  0.4757s  |  0.4791s  |  9.39× faster    |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  4.3233s   |  0.5752s  |  0.3989s  |  0.4017s  | 10.76× faster    |
+| uscities.csv                     |  31257 |  2.6702s   |  1.8124s  |  1.0662s  |  1.0944s  |  2.44× faster    |
+| uszips.csv                       |  33782 |  3.1853s   |  2.1641s  |  1.3332s  |  1.3434s  |  2.37× faster    |
+| worldcities.csv                  |  48059 |  2.8397s   |  1.8978s  |  1.0910s  |  1.0909s  |  2.60× faster    |
+| embedded_newlines_20k.csv        |  80000 |  0.9578s   |  0.4629s  |  0.4291s  |  0.4314s  |  2.22× faster    |
+| embedded_separators_20k.csv      |  20000 |  0.7074s   |  0.4535s  |  0.2748s  |  0.2748s  |  2.57× faster    |
+| heavy_quoting_20k.csv            |  20000 |  1.4361s   |  0.8598s  |  0.5241s  |  0.5273s  |  2.72× faster    |
+| long_fields_20k.csv              |  20000 |  8.8715s   |  4.7839s  |  2.5696s  |  2.5624s  |  3.46× faster    |
+| many_empty_fields_20k.csv        |  20000 |  0.8635s   |  0.2521s  |  0.1680s  |  0.1664s  |  5.19× faster    |
+| multi_char_separator_20k.csv     |  20000 |  1.4172s   |  0.2463s  |  0.1853s  |  0.1879s  |  7.54× faster    |
+| sample_10M.csv                   |  50000 |  1.0547s   |  0.2388s  |  0.2238s  |  0.2211s  |  4.77× faster    |
+| sensor_data_50krows_50cols.csv   |  50000 |  8.9445s   |  1.8246s  |  1.8348s  |  1.8181s  |  4.92× faster    |
+| tab_separated_20k.tsv            |  20000 |  1.2664s   |  0.1596s  |  0.1553s  |  0.1536s  |  8.24× faster    |
+| utf8_multibyte_20k.csv           |  20000 |  0.6484s   |  0.1124s  |  0.1068s  |  0.1066s  |  6.08× faster    |
+| whitespace_heavy_20k.csv         |  20000 |  1.5513s   |  0.1613s  |  0.1654s  |  0.1610s  |  9.63× faster    |
+| wide_500_cols_20k.csv            |  20000 | 44.5782s   |  7.2023s  |  6.9748s  |  6.9261s  |  6.44× faster    |
+---
+## SmarterCSV 1.17.0 vs Ruby CSV 3.3.5 — full results
+| File                             |   Rows | CSV.read¹  | CSV.hashes¹ | SmarterCSV/C  | SmarterCSV/Rb |
+|----------------------------------|--------|------------|-------------|---------------|---------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  0.2718s   |  0.7750s    |  0.0673s      |  0.5034s      |
+| PEOPLE_IMPORT_C.csv              |  50000 |  1.4111s   |  8.0199s    |  0.1907s      |  1.4032s      |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  0.2659s   |  0.7603s    |  0.0638s      |  0.4800s      |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  0.2860s   |  0.9173s    |  0.0630s      |  0.4132s      |
+| uscities.csv                     |  31257 |  0.5640s   |  0.8803s    |  0.0789s      |  1.1120s      |
+| uszips.csv                       |  33782 |  0.7414s   |  1.1604s    |  0.0929s      |  1.3645s      |
+| worldcities.csv                  |  48059 |  0.6313s   |  0.9906s    |  0.0794s      |  1.0945s      |
+| embedded_newlines_20k.csv        |  80000 |  0.1693s   |  0.2245s    |  0.0554s      |  0.4451s      |
+| embedded_separators_20k.csv      |  20000 |  0.1312s   |  0.1838s    |  0.0206s      |  0.2830s      |
+| heavy_quoting_20k.csv            |  20000 |  0.1167s   |  0.2410s    |  0.0338s      |  0.5400s      |
+| long_fields_20k.csv              |  20000 |  0.2373s   |  0.2762s    |  0.0392s      |  2.6172s      |
+| many_empty_fields_20k.csv        |  20000 |  0.1145s   |  0.3622s    |  0.0216s      |  0.1727s      |
+| multi_char_separator_20k.csv     |  20000 |  0.0890s   |  0.2122s    |  0.0293s      |  0.1662s      |
+| sample_10M.csv                   |  50000 |  0.1685s   |  0.3012s    |  0.0357s      |  0.2361s      |
+| sensor_data_50krows_50cols.csv   |  50000 |  0.5655s   |  2.6744s    |  0.2442s      |  1.8878s      |
+| tab_separated_20k.tsv            |  20000 |  0.0832s   |  0.2029s    |  0.0219s      |  0.1651s      |
+| utf8_multibyte_20k.csv           |  20000 |  0.0662s   |  0.1427s    |  0.0156s      |  0.1138s      |
+| whitespace_heavy_20k.csv         |  20000 |  0.0890s   |  0.2169s    |  0.0278s      |  0.1670s      |
+| wide_500_cols_20k.csv            |  20000 |  2.3351s   | 32.4002s    |  1.2823s      |  7.3504s      |
+## Ruby CSV 3.3.5 vs SmarterCSV 1.17.0 (C accelerated)
+| File                             |   Rows | CSV.read¹     | CSV.hashes¹   |
+|----------------------------------|--------|---------------|---------------|
+| PEOPLE_IMPORT_B.csv              |  50000 |  4.04× slower | 11.51× slower |
+| PEOPLE_IMPORT_C.csv              |  50000 |  7.40× slower | 42.04× slower |
+| PEOPLE_IMPORT_NB.csv             |  50000 |  4.17× slower | 11.92× slower |
+| PEOPLE_IMPORT_NC.csv             |  50000 |  4.54× slower | 14.55× slower |
+| uscities.csv                     |  31257 |  7.15× slower | 11.16× slower |
+| uszips.csv                       |  33782 |  7.98× slower | 12.50× slower |
+| worldcities.csv                  |  48059 |  7.95× slower | 12.48× slower |
+| embedded_newlines_20k.csv        |  80000 |  3.05× slower |  4.05× slower |
+| embedded_separators_20k.csv      |  20000 |  6.36× slower |  8.91× slower |
+| heavy_quoting_20k.csv            |  20000 |  3.46× slower |  7.14× slower |
+| long_fields_20k.csv              |  20000 |  6.05× slower |  7.04× slower |
+| many_empty_fields_20k.csv        |  20000 |  5.29× slower | 16.73× slower |
+| multi_char_separator_20k.csv     |  20000 |  3.04× slower |  7.25× slower |
+| sample_10M.csv                   |  50000 |  4.72× slower |  8.43× slower |
+| sensor_data_50krows_50cols.csv   |  50000 |  2.32× slower | 10.95× slower |
+| tab_separated_20k.tsv            |  20000 |  3.80× slower |  9.28× slower |
+| utf8_multibyte_20k.csv           |  20000 |  4.24× slower |  9.14× slower |
+| whitespace_heavy_20k.csv         |  20000 |  3.20× slower |  7.81× slower |
+| wide_500_cols_20k.csv            |  20000 |  1.82× slower | 25.27× slower |
+---
+¹ **Raw output** — no post-processing applied. Returns plain arrays or string-keyed hashes. No header normalization, type conversion, whitespace stripping, or empty-value removal. Your own post-processing must be added to produce usable data.
+---
+PREVIOUS: [Performance Notes](./performance_notes.md) | UP: [README](../../../README.md)
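The stated methodology ("best of 40 measured runs, 2 warm-up") and the ratio columns can be reproduced in outline with a few lines of Ruby. This is a sketch under assumptions: `best_of` and `ratio` are hypothetical helpers, not the project's benchmark script, and the workload in the example is a stand-in for a real parse.

```ruby
# Hypothetical harness, not the project's benchmark code.
# Runs the block `warmup` times untimed, then returns the fastest of `runs` timed runs.
def best_of(runs: 40, warmup: 2)
  warmup.times { yield }
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end.min
end

# "N× slower/faster" is the slower time divided by the faster time.
def ratio(slower_seconds, faster_seconds)
  (slower_seconds / faster_seconds).round(2)
end

calls = 0
best  = best_of(runs: 5, warmup: 2) do
  calls += 1
  1_000.times.sum # stand-in workload
end

# PEOPLE_IMPORT_B.csv: CSV.read 0.2718s vs SmarterCSV/C 0.0673s
people_b = ratio(0.2718, 0.0673)
```

Ratios recomputed from the rounded timings in the tables can differ from the published column in the last digit (for example 1.6175 / 0.0872 rounds to 18.55 where the table shows 18.54), presumably because the published figures were derived from unrounded measurements.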

data/docs/releases/1.17.0/changes.md ADDED

@@ -0,0 +1,161 @@
+### Contents
+  * [Introduction](../../_introduction.md)
+  * [Migrating from Ruby CSV](../../migrating_from_csv.md)
+  * [Ruby CSV Pitfalls](../../ruby_csv_pitfalls.md)
+  * [Parsing Strategy](../../parsing_strategy.md)
+  * [The Basic Read API](../../basic_read_api.md)
+  * [The Basic Write API](../../basic_write_api.md)
+  * [Batch Processing](../../batch_processing.md)
+  * [Configuration Options](../../options.md)
+  * [Row and Column Separators](../../row_col_sep.md)
+  * [Header Transformations](../../header_transformations.md)
+  * [Header Validations](../../header_validations.md)
+  * [Column Selection](../../column_selection.md)
+  * [Data Transformations](../../data_transformations.md)
+  * [Value Converters](../../value_converters.md)
+  * [Bad Row Quarantine](../../bad_row_quarantine.md)
+  * [Warnings](../../warnings.md)
+  * [Instrumentation Hooks](../../instrumentation.md)
+  * [Examples](../../examples.md)
+  * [Real-World CSV Files](../../real_world_csv.md)
+  * [SmarterCSV over the Years](../../history.md)
+  * [**Release Notes**](./changes.md)
+--------------
+# SmarterCSV 1.17.0 — Changes
+RSpec tests: **1,434 → 2,210** (+776 tests since 1.16.4)
+1.17.0 is a **features-and-quality** release, focused on three things: streaming IO inputs, a structured warnings system, and Rails-friendly defaults. The C parser's core line-parsing (separator splitting, quote/escape handling, multiline stitching) is unchanged from 1.16.0 (see [`docs/releases/1.16.0/`](../1.16.0/changes.md) for the parser performance story); what changed in the C path this cycle is a faster code path for quoted-field-heavy files and Unicode-aware blank detection. On the C-accelerated path, 1.17.0 vs 1.16.4 is a **mixed picture**: quoted-field-heavy and wide files run meaningfully faster, a handful of short-line / many-small-field files run a little slower, and the rest are within noise. The Ruby path is at parity throughout. The wins come from the faster quoted-field handling; the small regressions trace to the new auto-detection default (`auto_row_sep_chars` 500→4096) plus a tiny per-line overhead. See [performance_notes.md](performance_notes.md) and [benchmarks.md](benchmarks.md) for the per-file breakdown.
+---
+## Compatibility
+* **No breaking changes.** All 1.16.x code continues to work without modification.
+* **Behavior change worth noting:** `auto_row_sep_chars: nil` / `0` no longer means "scan whole file" — these values fall back to the default with a warning. The total scan is hard-capped at 64KB. If you relied on the previous undocumented "scan whole file" semantics, this is a visible change.
+---
+## Headline Features
+### 1. Non-Seekable Streaming Inputs
+SmarterCSV now reads directly from any IO source — including streams that don't support `rewind` or `seek`. No need to materialize the file on disk first.
+```ruby
+# Gzipped CSV — stream-decompressed, never written to disk
+require 'zlib'
+Zlib::GzipReader.open('huge.csv.gz') do |io|
+  SmarterCSV.process(io) { |row| MyModel.upsert(row.first) }
+end
+# STDIN / pipes
+SmarterCSV.process($stdin) { |row, _| MyModel.upsert(row.first) }
+# HTTP response body
+require 'open-uri'
+URI.open('https://example.com/data.csv') { |io| SmarterCSV.process(io) }
+# S3 — stream the response body directly
+require 'aws-sdk-s3'
+obj = Aws::S3::Client.new.get_object(bucket: 'data', key: 'imports/users.csv')
+SmarterCSV::Reader.new(obj.body, chunk_size: 500).each_chunk do |chunk, _|
+  MyModel.insert_all(chunk)
+end
+```
+Auto-detection of `row_sep` and `col_sep` works on these streaming sources thanks to internal buffering — the underlying source never needs to support `rewind` or `seek`. See [Real-World CSV Files → I/O Patterns](../../real_world_csv.md#io-patterns) and [Examples → Streaming Inputs](../../examples.md#example-14-streaming-inputs-non-seekable-io).
+### 2. Structured Warnings Collection
+Auto-detection and configuration warnings are now collected on the Reader as a deduped histogram, in addition to being emitted to a log sink:
+```ruby
+reader = SmarterCSV::Reader.new('data.csv')
+reader.process
+reader.warnings
+# => [
+#   { type: :config, code: :chunk_size_default, severity: :warn,
+#     message: "chunk_size not set, defaulting to 100. ...", count: 1 },
+#   ...
+# ]
+```
+Repeated warnings of the same `(type, code)` are deduped — `count` tracks occurrences across the run. This lets you surface warnings programmatically (dashboards, fail-deploys-on-codes, etc.) without parsing stderr text.
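The deduplication by `(type, code)` described above amounts to a small keyed histogram. The following is a minimal sketch of that idea, assuming the record fields shown in the example output; the `WarningLog` class and its method names are hypothetical and not part of SmarterCSV.

```ruby
# Hypothetical collector, not SmarterCSV's internal class.
class WarningLog
  def initialize
    @records = {}
  end

  # The first occurrence creates the record; repeats only bump `count`.
  def add(type:, code:, message:, severity: :warn)
    record = (@records[[type, code]] ||=
      { type: type, code: code, severity: severity, message: message, count: 0 })
    record[:count] += 1
    record
  end

  # Copies, so callers cannot mutate the collected records.
  def to_a
    @records.values.map(&:dup)
  end
end

log = WarningLog.new
3.times do
  log.add(type: :row_sep, code: :no_clear_row_sep, severity: :error,
          message: 'no clear row separator')
end
log.add(type: :config, code: :chunk_size_default,
        message: 'chunk_size not set, defaulting to 100.')

# Programmatic use: pick out anything that should block a deploy.
blocking = log.to_a.select { |w| w[:severity] == :error }
```

Keying on the pair rather than on the message text keeps the histogram stable even if a message embeds variable details.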
+**Warning codes available in 1.17.0:**
+| Code                          | Type           | Severity | Triggered when                                                                                |
+|-------------------------------|----------------|----------|-----------------------------------------------------------------------------------------------|
+| `:chunk_size_default`         | `:config`      | `:warn`  | `each_chunk` is called without `chunk_size:` and the default of `100` is used.                |
+| `:header_a_method`            | `:deprecation` | `:warn`  | The deprecated `Reader#headerA` accessor is called.                                           |
+| `:utf8_missing_binary_mode`   | `:encoding`    | `:warn`  | UTF-8 input is being processed but the IO was not opened with `"b:utf-8"`.                    |
+| `:no_clear_row_sep`           | `:row_sep`     | `:error` | Auto-detection found a true tie between separators after scanning 64KB. Silent misparse risk. |
+| `:no_row_sep_found`           | `:row_sep`     | `:error` | No known row separator was found in the first 64KB. Likely an exotic separator like ` `. |
+See [Warnings](../../warnings.md) for the full record shape, suppression options, and Rails integration details.
+### 3. Class-Level `SmarterCSV.warnings` Accessor
+Mirrors `SmarterCSV.errors`. Returns warnings from the most recent call to `process`, `parse`, `each`, or `each_chunk` on the current thread. Cleared at the start of each new call.
+```ruby
+SmarterCSV.process('data.csv')
+SmarterCSV.warnings.each do |w|
+  logger.warn("[#{w[:type]}/#{w[:code]}] #{w[:message]} (×#{w[:count]})")
+end
+```
+Per-thread (uses `Thread.current`) — safe under Puma and Sidekiq. Not fiber-safe; use `SmarterCSV::Reader` directly if processing CSV concurrently with `Async`/`Falcon`/manual `Fiber` scheduling.
+### 4. Rails.logger Auto-Routing
+When `Rails.logger` is present, warnings are routed through it at the severity declared at the call site (`:debug` / `:info` / `:warn` / `:error` / `:fatal`):
+```
+# In log/development.log
+[WARN]  SmarterCSV: chunk_size not set, defaulting to 100. ...
+```
+Without Rails, falls back to `Kernel#warn` (writes to `$stderr`). Detection is one-shot at Reader construction — no per-call overhead. The programmatic `reader.warnings` collection is identical in both modes.
+See [Warnings → Log sink routing](../../warnings.md#log-sink-routing).
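The one-shot sink selection described above can be sketched as a function that decides once where messages go and returns a callable. This is an assumption-laden illustration: `pick_sink` is a hypothetical name, the `defined?(Rails)` check is one plausible way to detect Rails, and a standard-library `Logger` stands in for `Rails.logger` so the example runs outside Rails.

```ruby
require 'logger'
require 'stringio'

# Hypothetical router, not SmarterCSV's implementation.
# Chooses the sink once; every later call reuses that choice.
def pick_sink(logger = nil)
  logger ||= Rails.logger if defined?(Rails) && Rails.respond_to?(:logger)

  if logger
    # Route at the declared severity: logger.debug / info / warn / error / fatal.
    ->(severity, message) { logger.public_send(severity, "SmarterCSV: #{message}") }
  else
    # No logger available: fall back to Kernel#warn, which writes to $stderr.
    ->(_severity, message) { warn("SmarterCSV: #{message}") }
  end
end

out  = StringIO.new
sink = pick_sink(Logger.new(out)) # stand-in for Rails.logger
sink.call(:warn, 'chunk_size not set, defaulting to 100.')
```

Because the decision is captured in the returned lambda, there is no per-message detection cost, which mirrors the "no per-call overhead" point above.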
+---
+## Improvements
+* **Better auto-detection of `row_sep` and `col_sep`** — more accurate results on files with comment headers and other irregularities at the start of the stream.
+* **`auto_row_sep_chars` default changed to `4096`** (was `500` in 1.16.x). Sized to cover wide-header CSVs in a single read. Out-of-range values, `nil`, or `0` fall back to the default with a warning. **Behavior change vs 1.16.x:** the previous undocumented "scan whole file" semantics on `nil`/`0` is removed; the total scan is hard-capped at 64KB.
+* **`buffer_size` is now a public option** — peek buffer chunk size for non-seekable inputs (pipes, gzip readers, HTTP/S3 bodies). Default `16_384`. Out-of-range values warn and clamp to the supported range rather than raising. Has no effect on seekable inputs (file paths, `File`, `StringIO`).
+* **Files ending in a lone `\r`** are now correctly detected as `\r`-terminated instead of falling through to a "no clear row separator" warning.
+* **`SmarterCSV.errors` mid-stream preservation** *(merged from 1.16.4)* — fixed a bug where collected error records could be lost when processing raised mid-stream (e.g. `bad_row_limit:` exceeded → `TooManyBadRows`, or a user block raising through `.process` / `.each` / `.each_chunk`).
+* **`enforce_utf8_encoding` for `ASCII-8BIT` inputs** *(merged from 1.16.4)* — fixed incorrect replacement of all non-ASCII bytes when the input was tagged binary. Encoding is now relabeled to UTF-8 before transcoding so only genuinely invalid byte sequences are replaced.
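The `ASCII-8BIT` fix in the last bullet can be illustrated with core `String` methods. This is a sketch of the general relabel-then-clean technique, assuming `force_encoding` plus `scrub` as the mechanism; it is not the gem's actual implementation.

```ruby
# Valid UTF-8 ("café") followed by one invalid byte, tagged as binary.
raw = "caf\xC3\xA9 \xFF".b

# Transcoding straight from ASCII-8BIT: bytes >= 0x80 have no mapping,
# so every non-ASCII byte is replaced, including the valid "é".
transcoded = raw.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')

# Relabeling as UTF-8 first keeps valid sequences intact;
# scrub then replaces only the genuinely invalid bytes.
relabeled = raw.dup.force_encoding('UTF-8').scrub('?')
```

The first result loses the accented character, while the second keeps it and replaces only the stray `\xFF` byte.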
+---
+## Documentation
+Substantive expansion of the user-facing docs to match the new capabilities:
+* **`docs/examples.md`** — six new cookbook entries (Examples 14–19): Streaming Inputs, Resumable Plain-Ruby Import, CSV Files with Comment Lines, Tab-Separated Values (TSV), Multi-Line Fields, and Filtering and Transforming a CSV File (the `CSV.filter` replacement pattern).
+* **`docs/real_world_csv.md`** — expanded I/O Patterns section with worked examples for gzip, S3, HTTP, STDIN, and `IO.popen`. Added a Multi-Line Quoted Fields worked example.
+* **`docs/warnings.md`** *(new)* — full coverage of the structured warnings system: record shape, available codes, log-sink routing for Rails vs non-Rails, suppression via `verbose: :quiet`.
+* **`docs/header_transformations.md`** — added a worked example for `comment_regexp:` (CSV files with comment lines).
+* **`docs/row_col_sep.md`** — added a worked TSV example.
+* **`docs/batch_processing.md`** — added a Resumable Import (Plain Ruby) example using `chunk_index` + a JSON state file (companion to the Rails 8.1 ActiveJob version in `examples.md`).
+* **`docs/basic_read_api.md`** / **`docs/basic_write_api.md`** — cross-references to the read-transform-write composition pattern; added `$stdout` and S3 streaming write examples.
+* **`README.md`** — added inline examples for streaming inputs, value converters, header validation, and writing CSV; one-sentence note on Rails.logger auto-routing.
+---
+PREVIOUS: [SmarterCSV over the Years](../../history.md) | UP: [README](../../../README.md)

data/docs/releases/1.17.0/performance_notes.md ADDED

@@ -0,0 +1,126 @@
+# SmarterCSV 1.17.0 — Performance Notes
+Setup for the per-file tables below: Apple M4, Ruby 3.4.7 [arm64], 40 iterations per run × 8 runs, median across runs (p10-trimmed), measured 2026-05-11–12. The corpus has 19 files and compares `1.16.4 → 1.17.0`. Times are in seconds; lower is better. (The "vs Ruby CSV" tables further down are from the earlier 2026-05-06 run; see Methodology.)
+---
+## 1.16.4 → 1.17.0 — C-accelerated path (the default)
+The C parser's core line-parsing (separator splitting, quote/escape handling, multiline stitching) is unchanged from 1.16.0. The C-path changes this cycle are a faster code path for quoted-field-heavy files — the big wins — and Unicode-aware blank detection.
+| file                           | 1.16.4 (s) | 1.17.0 (s) | 1.17.0 vs 1.16.4 |
+| ------------------------------ | ---------- | ---------- | ---------------- |
+| PEOPLE_IMPORT_B.csv            |    0.06255 |    0.06305 | ~1% noise        |
+| PEOPLE_IMPORT_C.csv            |    0.13072 |    0.13274 | ~2% noise        |
+| PEOPLE_IMPORT_NB.csv           |    0.05985 |    0.06079 | ~2% noise        |
+| PEOPLE_IMPORT_NC.csv           |    0.05273 |    0.05420 | ~3% noise        |
+| uscities.csv                   |    0.06325 |    0.05545 | 12.3% faster     |
+| uszips.csv                     |    0.06957 |    0.06255 | 10.1% faster     |
+| worldcities.csv                |    0.06824 |    0.06134 | 10.1% faster     |
+| embedded_newlines_60k.csv      |    0.12795 |    0.11951 | 6.6% faster      |
+| embedded_separators_60k.csv    |    0.05093 |    0.04591 | 9.9% faster      |
+| heavy_quoting_60k.csv          |    0.08926 |    0.07490 | 16.1% faster     |
+| long_fields_40k.csv            |    0.06375 |    0.04970 | 22.0% faster     |
+| many_empty_fields_60k.csv      |    0.06813 |    0.06888 | ~1% noise        |
+| multi_char_separator_60k.csv   |    0.07720 |    0.07830 | ~1% noise        |
+| sample_100k.csv                |    0.07051 |    0.07139 | ~1% noise        |
+| sensor_data_50krows_50cols.csv |    0.17839 |    0.17897 | ~1% noise        |
+| tab_separated_60k.tsv          |    0.06704 |    0.06798 | ~1% noise        |
+| utf8_multibyte_60k.csv         |    0.04391 |    0.04376 | ~ same           |
+| whitespace_heavy_60k.csv       |    0.06803 |    0.06897 | ~1% noise        |
+| wide_500_cols_20k.csv          |    1.07019 |    1.07348 | ~1% noise        |
+*`~N% noise` means the measured difference (≈N%, always a small slowdown here) is within the run-to-run variance of this setup (8 runs × 40 iterations, median across runs, p10-trimmed) — i.e. effectively unchanged, not a real regression. The raw per-version times are in the table for the exact figure.*
+Quote-heavy and large-field files run **7–22% faster** than 1.16.4 (`long_fields_40k` 22%, `heavy_quoting_60k` 16%, the city files 10–12%, `embedded_separators` 10%, `embedded_newlines` 7%). Everything else, including `wide_500_cols_20k`, is within ±3% of 1.16.4, which is effectively unchanged. (The short-line / many-small-field files do show a small, *consistent* uptick at the bottom of that band, traceable to the larger default auto-detection scan window plus a tiny per-line overhead; if that matters for your workload, set `auto_row_sep_chars` lower. See [What's driving the mixed C-path picture](#whats-driving-the-mixed-c-path-picture) below.)
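The percentage column can be reproduced from the raw times. A small sketch, using four rows copied from the C-path table; `percent_faster` is a helper name introduced here for illustration.

```ruby
# Relative change from the old time to the new time, in percent.
# Positive means faster, negative means slower.
def percent_faster(old_seconds, new_seconds)
  ((old_seconds - new_seconds) / old_seconds * 100).round(1)
end

# [1.16.4 seconds, 1.17.0 seconds], copied from the table above.
rows = {
  'long_fields_40k.csv'   => [0.06375, 0.04970],
  'heavy_quoting_60k.csv' => [0.08926, 0.07490],
  'uscities.csv'          => [0.06325, 0.05545],
  'PEOPLE_IMPORT_NC.csv'  => [0.05273, 0.05420]
}

results = rows.transform_values { |old_s, new_s| percent_faster(old_s, new_s) }
results.each { |file, pct| puts format('%-24s %6.1f%%', file, pct) }
```

The negative value for `PEOPLE_IMPORT_NC.csv` corresponds to the `~3% noise` entry: a slowdown small enough to sit inside run-to-run variance.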
+---
+## 1.16.4 → 1.17.0 — Ruby fallback path (`acceleration: false`)
+Faster on nearly every file this cycle, from three changes: in-place stripping in the no-quote split path, a first-byte fast-reject before numeric conversion, and per-row / per-value overhead removed from the hash transformations.
+| file                           | 1.16.4 (s) | 1.17.0 (s) | 1.17.0 vs 1.16.4 |
+| ------------------------------ | ---------- | ---------- | ---------------- |
+| PEOPLE_IMPORT_B.csv            |    0.38220 |    0.35281 | 7.7% faster      |
+| PEOPLE_IMPORT_C.csv            |    0.99047 |    0.95728 | 3.4% faster      |
+| PEOPLE_IMPORT_NB.csv           |    0.36110 |    0.31716 | 12.2% faster     |
+| PEOPLE_IMPORT_NC.csv           |    0.28762 |    0.25849 | 10.1% faster     |
+| uscities.csv                   |    0.74246 |    0.71183 | 4.1% faster      |
+| uszips.csv                     |    0.90817 |    0.87628 | 3.5% faster      |
+| worldcities.csv                |    0.75714 |    0.72641 | 4.1% faster      |
+| embedded_newlines_60k.csv      |    0.88887 |    0.86252 | 3.0% faster      |
+| embedded_separators_60k.csv    |    0.57053 |    0.53401 | 6.4% faster      |
+| heavy_quoting_60k.csv          |    1.09395 |    1.02829 | 6.0% faster      |
+| long_fields_40k.csv            |    3.27964 |    3.29366 | ~ same           |
+| many_empty_fields_60k.csv      |    0.37815 |    0.33153 | 12.3% faster     |
+| multi_char_separator_60k.csv   |    0.45717 |    0.38380 | 16.0% faster     |
+| sample_100k.csv                |    0.34527 |    0.30690 | 11.1% faster     |
+| sensor_data_50krows_50cols.csv |    1.32705 |    1.33218 | ~ same           |
+| tab_separated_60k.tsv          |    0.38261 |    0.31359 | 18.0% faster     |
+| utf8_multibyte_60k.csv         |    0.24212 |    0.21281 | 12.1% faster     |
+| whitespace_heavy_60k.csv       |    0.37635 |    0.30848 | 18.0% faster     |
+| wide_500_cols_20k.csv          |    5.28395 |    4.23045 | 19.9% faster     |
+Gains run **3–20%** vs 1.16.4, biggest on wide / many-small-field files (`wide_500_cols` 20%, `whitespace_heavy` / `tab_separated` 18%, `multi_char_separator` 16%). Only `long_fields_40k` (dominated by large-field allocation, not per-field work) and `sensor_data` (numeric-heavy — the fast-reject's per-value cost and a saved per-value method call cancel out) sit at parity.
+---
+## What's driving the mixed C-path picture
+The C parser's core line-parsing — separator splitting, quote/escape handling, multiline stitching — is unchanged from 1.16.0; all of that hot-path work carries forward (see [the 1.16.0 changes](../1.16.0/changes.md) for the parser performance story). So why the split — some files faster, a band of small files a hair slower?
+**The wins come from quoted-field handling.** 1.17.0 added a faster path for fields wrapped in quotes: the common case, a quoted field with no doubled `""` inside, now skips a copy step. Files where most or all fields are quoted (city/address-style data, long quoted text) pick up 7–22%.
+**The small slowdowns come from the bigger default auto-detection window.** The benchmark leaves `row_sep` at `:auto` for every file, so each run reads `auto_row_sep_chars` bytes up front (now `4096`, was `500`) and scans them for the row separator.
+  * On tiny files where total parse time is only ~50–80 ms, that one-time scan shows up as a ≤3% uptick.
+  * On larger files it's noise (and often net-positive — the wider window usually settles the separator on the first read, avoiding the doubling-escalation loop).
+If you parse lots of very small files and care about that 1–3%, set `auto_row_sep_chars` lower, or pin `row_sep` explicitly to skip detection entirely. (The related `guess_line_ending` change — a chunked scan that doubles up to a 64 KB hard cap, replacing the old undocumented "scan whole file" on `nil`/`0` — is the same trade-off.)
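The doubling scan with a hard cap can be sketched in plain Ruby. This is an illustration under stated assumptions, not the gem's `guess_line_ending`: the "clear majority" rule used here (the leading separator outnumbers the other two combined) and the names `guess_row_sep` and `count_row_seps` are hypothetical.

```ruby
require 'stringio'

HARD_CAP = 64 * 1024 # total scan limit in bytes, per the notes above

# Count each candidate; a "\r\n" pair must not also count as "\n" or "\r".
# (A pair split across a read boundary is miscounted until the next read.)
def count_row_seps(text)
  crlf = text.scan("\r\n").size
  { "\r\n" => crlf, "\n" => text.count("\n") - crlf, "\r" => text.count("\r") - crlf }
end

def guess_row_sep(io, initial_window: 4096)
  window = initial_window
  buffer = String.new(encoding: Encoding::BINARY)
  loop do
    chunk = io.read(window - buffer.bytesize) # nil at end of input
    buffer << chunk if chunk
    counts = count_row_seps(buffer)
    best, best_count = counts.max_by { |_sep, n| n }
    rest = counts.values.sum - best_count
    return best if best_count > rest # assumed "clear majority" rule
    return nil if chunk.nil? || window >= HARD_CAP

    window = [window * 2, HARD_CAP].min # escalate, but never past the cap
  end
end

crlf_guess = guess_row_sep(StringIO.new("a,b\r\n1,2\r\n"))
lf_guess   = guess_row_sep(StringIO.new("a,b\n1,2\n"))
cr_guess   = guess_row_sep(StringIO.new("a,b\r1,2\r"))
no_guess   = guess_row_sep(StringIO.new('no separators here'))
```

With a larger initial window the first read usually settles the answer, so the doubling branch rarely runs; pinning `row_sep` explicitly skips the scan altogether.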
+**Not a factor here:** the buffering layer for non-seekable streams. The benchmark passes file paths to `SmarterCSV.process`, which opens them as seekable `File` objects, so the seekable fast path is taken and no buffering wrapper is instantiated. That layer only runs for pipes / gzip readers / HTTP/S3 bodies, which have much higher latency anyway — any extra work the buffer does there is negligible.
+---
+## vs Ruby CSV 3.3.5 (1.17.0 reference)
+### vs `CSV.read` (raw arrays — minimum equivalent work)
+`CSV.read` is the *fastest* Ruby CSV mode: plain string arrays, no symbol keys, no numeric conversion. SmarterCSV/C delivers fully processed hashes — and still beats it on every file:
+| Range     | Files                                                                   |
+|-----------|-------------------------------------------------------------------------|
+| **7–8×**  | PEOPLE_IMPORT_C (7.8×), uszips (7.8×)                                   |
+| **6–7×**  | long_fields (6.9×), uscities (6.8×), worldcities (6.8×)                 |
+| **5–6×**  | embedded_separators (5.4×)                                              |
+| **3–4×**  | utf8_multibyte (3.9×), PEOPLE_IMPORT_NC (3.7×), many_empty (3.5×), heavy_quoting (3.4×), sample_100k (3.4×), PEOPLE_IMPORT_NB (3.2×) |
+| **2–3×**  | PEOPLE_IMPORT_B (2.9×), embedded_newlines (2.9×), whitespace_heavy (2.9×), sensor_data (2.5×) |
+| **1–2×**  | wide_500_cols (1.7×), tab_separated (1.6×), multi_char_separator (1.4×) |
+**Summary: 1.4×–7.8× faster than `CSV.read`, while returning fully processed hashes.**
+### vs `CSV.hashes` (string-keyed hashes — closer to SmarterCSV output)
+| Range      | Files                                                                  |
+|------------|------------------------------------------------------------------------|
+| **40–50×** | PEOPLE_IMPORT_C (47.3×)                                                |
+| **20–25×** | wide_500_cols (22.1×)                                                  |
+| **10–15×** | uszips (12.5×), PEOPLE_IMPORT_NC (12.1×), many_empty (11.8×), worldcities (11.4×), uscities (11.2×), sensor_data (11.1×) |
+| **7–10×**  | embedded_separators (8.3×), long_fields (8.1×), PEOPLE_IMPORT_NB (8.1×), PEOPLE_IMPORT_B (7.9×), heavy_quoting (7.0×) |
+| **5–7×**   | whitespace_heavy (6.9×), utf8_multibyte (6.7×), sample_100k (6.2×)     |
+| **4–5×**   | embedded_newlines (4.2×)                                               |
+| **2–3×**   | tab_separated (2.3×), multi_char_separator (2.2×)                      |
+**Summary: 2.2×–47.3× faster than `CSV.hashes`.**
+---
+## Methodology
+Same as 1.16.0:
+- Apple M4, Ruby 3.4.7
+- 40 iterations per run × 8 runs (2 warm-up), median across runs (p10-trimmed)
+- Raw .json captures preserved alongside the .md tables for reproducibility
+---
+PREVIOUS: [Changes](./changes.md) | UP: [README](../../../README.md)

data/docs/row_col_sep.md CHANGED

@@ -31,7 +31,7 @@
 Convenient defaults allow automatic detection of the column and row separators: `row_sep: :auto`, `col_sep: :auto`. This makes it easier to process any CSV files without having to examine the line endings or column separators, e.g. when users upload CSV files to your service and you have no control over the incoming files.
-The setting `:auto_row_sep_chars` controls the chunk size used while scanning for the row separator (default is 8192). Detection reads in chunks of this size and stops as soon as one separator has a clear majority, with a 64KB hard cap. Values below 8192 (and `nil` / `0`) are rejected and fall back to the default with a warning. Of course you can also set the `:row_sep` manually.
+The setting `:auto_row_sep_chars` controls the initial scan size used while detecting the row separator (default is `4096`). Detection stops as soon as one separator has a clear majority, up to a 64KB cap. Bump it higher if your files have very wide headers or long comment preambles; out-of-range values, `nil`, or `0` fall back to the default with a warning. Of course you can also set the `:row_sep` manually to skip auto-detection entirely.
 ## Column Separator `col_sep`
@@ -40,6 +40,25 @@ The automatic detection of column separators considers: `,`, `\t`, `;`, `:`, `|`
 Some CSV files may contain an unusual column separator, which could even be a control character.
+### Tab-Separated Values (TSV)
+Tab-separated files are auto-detected by default — no options needed:
+```ruby
+require 'smarter_csv'
+
+# Contents of data.tsv (<TAB> stands for a literal tab character):
+#   id<TAB>name<TAB>amount
+#   1<TAB>Alice<TAB>100
+#   2<TAB>Bob<TAB>200
+
+# Auto-detected: col_sep: :auto is the default
+SmarterCSV.process('data.tsv')
+
+# Or set the separator explicitly
+SmarterCSV.process('data.tsv', col_sep: "\t")
+```
+The default `col_sep: :auto` picks tab when it's the dominant delimiter in the first chunk of the file. The explicit form is useful in test fixtures or when you want to fail fast on unexpected formats.
 ## Row Separator `row_sep`
 The automatic detection of row separators considers: `\n`, `\r\n`, `\r`.